Evidence Capture for AI Red Team Findings

An intermediate LAB on reviewer-safe AI red-team evidence capture. It covers objectives, scope, preconditions, expected controls, observed behavior, uncertainty, risk explanation, remediation guidance, and portfolio-ready findings.

Status: Intermediate

Domain: AI Security

Track: AI Red Team Scenario Design

Runtime: Read-only course

Overview

This LAB teaches how to capture AI red-team evidence in a reviewer-safe format. The goal is to document what was evaluated, what control was expected, what behavior was observed, what uncertainty remains, and what remediation is recommended.

Reviewer-safe evidence | Expected controls | Observed behavior | No sensitive artifacts

Concept Deep Dives

What is evidence capture for AI red-team findings?

Evidence capture is the structured documentation of what was reviewed, what control was expected, what behavior was observed, what uncertainty remains, and what remediation is recommended. The goal is to create a reviewer-safe, reproducible finding.

Why must findings separate observation from conclusion?

Observation records what happened. Conclusion explains what it means. Separating them prevents overclaiming, preserves uncertainty, and makes the finding easier for reviewers, auditors, and engineers to validate.
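
One way to keep the two apart is to store them as separate record types and require every conclusion to cite the observations it rests on. The sketch below is illustrative only: the names `Observation`, `Conclusion`, and `unsupported_conclusions` are hypothetical and not part of this LAB, and the statements are synthetic.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Observation:
    """What happened, stated without interpretation."""
    obs_id: str
    statement: str


@dataclass(frozen=True)
class Conclusion:
    """What the observations mean, with the observation IDs it relies on."""
    statement: str
    supported_by: Tuple[str, ...]


def unsupported_conclusions(
    observations: Iterable[Observation],
    conclusions: Iterable[Conclusion],
) -> List[Conclusion]:
    """Return conclusions that cite no observation or cite an unknown one."""
    known_ids = {obs.obs_id for obs in observations}
    return [
        c for c in conclusions
        if not c.supported_by or not set(c.supported_by) <= known_ids
    ]


# Synthetic example data (assumed for illustration).
observations = [
    Observation("O1", "The synthetic assistant answered instead of refusing."),
]
conclusions = [
    Conclusion("The refusal control did not trigger in this scenario.", ("O1",)),
    Conclusion("Production systems are compromised.", ()),  # overclaim, no support
]

flagged = unsupported_conclusions(observations, conclusions)
```

Here `flagged` holds only the second conclusion, which is the kind of overclaim a reviewer would push back on.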

What belongs in reviewer-safe evidence?

Reviewer-safe evidence includes scenario objective, authorized scope, synthetic artifacts, preconditions, expected controls, observed behavior, uncertainty, and remediation. It excludes secrets, customer records, live exploit output, and reusable attack payloads.
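
The inclusion and exclusion lists above can be turned into a simple pre-review check. This is a minimal sketch under stated assumptions: the field names, the category labels, and the `review_evidence` helper are hypothetical, and it assumes each attached artifact has already been labeled with a category by a human reviewer.

```python
# Fields a reviewer-safe finding is expected to contain (names are assumed).
REQUIRED_FIELDS = (
    "objective",
    "scope",
    "synthetic_artifacts",
    "preconditions",
    "expected_controls",
    "observed_behavior",
    "uncertainty",
    "remediation",
)

# Artifact categories that must never appear in the evidence package.
EXCLUDED_CATEGORIES = frozenset({
    "secrets",
    "customer_records",
    "live_exploit_output",
    "reusable_attack_payloads",
})


def review_evidence(finding: dict, artifact_categories: set) -> list:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        # Treat missing keys and blank values the same way.
        if not str(finding.get(field, "")).strip():
            problems.append(f"missing: {field}")
    for category in sorted(artifact_categories & EXCLUDED_CATEGORIES):
        problems.append(f"excluded: {category}")
    return problems


# Synthetic example: every field filled in except uncertainty.
draft = {name: "synthetic placeholder text" for name in REQUIRED_FIELDS}
draft["uncertainty"] = ""

problems = review_evidence(draft, {"synthetic_log", "secrets"})
```

A check like this catches only structural omissions and labeled categories; it does not replace a human review of the evidence content.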

How should uncertainty and limits be recorded?

Uncertainty should identify what was not tested, what assumptions were made, what evidence is synthetic, what system state was simulated, and what the finding cannot prove.

How should remediation be tied to evidence?

Remediation should map directly to the observed control gap. A finding should not recommend broad fixes unless the evidence supports them. Strong remediation is specific, bounded, and control-oriented.
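
The mapping rule can be checked mechanically if each remediation item names the control gap it addresses. The following sketch assumes hypothetical gap identifiers such as `GAP-1` and a hypothetical helper, `check_remediation_mapping`; neither comes from this LAB.

```python
def check_remediation_mapping(observed_gaps: set, remediations: dict):
    """Compare remediation items against the observed control gaps.

    remediations maps each remediation description to the gap ID it addresses.
    Returns (unsupported, unaddressed):
      unsupported  - remediation items whose gap was never observed
      unaddressed  - observed gaps with no remediation item
    """
    addressed = set(remediations.values())
    unsupported = sorted(
        item for item, gap in remediations.items() if gap not in observed_gaps
    )
    unaddressed = sorted(observed_gaps - addressed)
    return unsupported, unaddressed


# Synthetic example data (assumed for illustration).
observed_gaps = {"GAP-1", "GAP-2"}
remediations = {
    "Add an approval gate before the simulated tool call": "GAP-1",
    "Rewrite the entire policy layer": "GAP-9",  # broad fix, no supporting evidence
}

unsupported, unaddressed = check_remediation_mapping(observed_gaps, remediations)
```

In this example the broad rewrite is reported as unsupported because no observed gap backs it, and `GAP-2` is reported as unaddressed.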

What makes an AI red-team finding portfolio-ready?

A portfolio-ready finding is clear, scoped, reproducible, reviewer-safe, honest about uncertainty, and mapped to expected controls and remediation without claiming live compromise or exposing sensitive material.

Visual Evidence Capture Model

A strong AI red-team finding turns scenario observations into a bounded, reviewer-safe evidence package.

AI Red-Team Scenario: Safe scenario objective, failure mode, and authorized review boundary

→

Scope and Preconditions: What was in bounds, what was assumed, and what system state was simulated

→

Expected Control: Refusal, approval gate, source boundary, data boundary, or fail-closed behavior

→

Observed Behavior: What happened during the safe, synthetic review

→

Uncertainty and Risk: What is known, what is not proven, and why the behavior matters

→

Reviewer-Safe Finding: Evidence-backed conclusion and remediation mapped to the control gap

Learning rule: Evidence capture is safe when it documents synthetic, bounded observations without collecting secrets, customer data, live exploit output, or sensitive runtime artifacts.

Reviewer-Safe Finding Template

Use a structured template so that AI red-team findings are readable, bounded, reproducible, and safe to share with reviewers.

Objective: What behavior, control, or boundary was being evaluated?

Scope: What was in bounds, out of bounds, synthetic, simulated, or explicitly excluded?

Expected Control: What should have prevented the issue or contained the behavior?

Observed Behavior: What actually happened, stated without unsupported conclusions?

Reviewer-safe finding:

Objective:
[What was being evaluated?]

Scope:
[What was in bounds and out of bounds?]

Preconditions:
[What was assumed, simulated, or set up?]

Expected control:
[What should have prevented or contained the issue?]

Synthetic evidence artifact:
[Description of safe evidence used]

Observed behavior:
[What actually happened?]

Uncertainty:
[What was not tested or proven?]

Risk explanation:
[Why does this matter?]

Remediation:
[What should change?]

Finding confidence:
[High / Medium / Low with justification]

Reviewer-safe conclusion:
[Bounded conclusion without live compromise claims]
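
For teams that keep findings as structured data, the template above could be rendered from a small record type so that every finding has the same sections in the same order. This is a sketch, not a prescribed format: the `Finding` class, the `render_finding` helper, and all example values are hypothetical and synthetic.

```python
from dataclasses import dataclass, fields

# Labels that do not follow the "replace underscores, capitalize" rule.
LABEL_OVERRIDES = {"reviewer_safe_conclusion": "Reviewer-safe conclusion"}


@dataclass(frozen=True)
class Finding:
    # Field order mirrors the section order of the template.
    objective: str
    scope: str
    preconditions: str
    expected_control: str
    synthetic_evidence_artifact: str
    observed_behavior: str
    uncertainty: str
    risk_explanation: str
    remediation: str
    finding_confidence: str
    reviewer_safe_conclusion: str


def render_finding(finding: Finding) -> str:
    """Render a finding as plain text in the template's layout."""
    sections = []
    for field in fields(finding):
        default_label = field.name.replace("_", " ").capitalize()
        label = LABEL_OVERRIDES.get(field.name, default_label)
        sections.append(f"{label}:\n{getattr(finding, field.name)}")
    return "Reviewer-safe finding:\n\n" + "\n\n".join(sections)


# Entirely synthetic example values.
example = Finding(
    objective="Check whether a synthetic assistant refuses an out-of-policy request.",
    scope="In bounds: simulated chat session. Out of bounds: any live system.",
    preconditions="Assistant behavior was simulated from a written scenario.",
    expected_control="Refusal before any simulated tool call.",
    synthetic_evidence_artifact="Hand-written transcript with placeholder values.",
    observed_behavior="The simulated assistant proceeded without refusing.",
    uncertainty="Not tested against a real model or real data.",
    risk_explanation="A missing refusal could allow an unapproved action.",
    remediation="Add a refusal check ahead of the simulated tool call.",
    finding_confidence="Low: single synthetic scenario.",
    reviewer_safe_conclusion="The scenario suggests a control gap; no live compromise is claimed.",
)

report_text = render_finding(example)
```

Because the dataclass has no default values, constructing a `Finding` without one of the sections raises a `TypeError`, which makes a skipped uncertainty or confidence entry hard to miss.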

High-Risk Anti-Pattern

A dangerous pattern is treating raw exploit output, sensitive data, screenshots with secrets, or unsupported claims as proof of an AI red-team finding.

Unsafe pattern:

raw sensitive evidence
→ real secrets or tokens
→ customer records
→ exploit output copied into a report
→ unclear scope
→ unsupported compromise claims
→ missing uncertainty

Risk:

secret exposure
customer data leakage
misleading finding
policy violation
audit evidence contamination
unsafe portfolio artifact
loss of reviewer trust

Secure alternative:

Use synthetic evidence artifacts.
State the scoped objective.
Record expected controls.
Separate observation from conclusion.
Document uncertainty.
Map remediation to evidence.
Write reviewer-safe conclusions.
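
As a supporting safeguard, evidence text can be screened for secret-like or personal-data-like strings before it goes into a report. The patterns below are illustrative assumptions, not a complete detector, and `flag_sensitive` is a hypothetical helper; a clean result does not prove the text is safe.

```python
import re

# Illustrative patterns only; a real review would need a broader, vetted set.
SENSITIVE_PATTERNS = {
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._\-]{20,}"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def flag_sensitive(text: str) -> list:
    """Return the sorted names of all patterns that match the text."""
    return sorted(
        name for name, pattern in SENSITIVE_PATTERNS.items()
        if pattern.search(text)
    )


# Synthetic inputs: the token is a run of filler characters, not a credential.
clean_note = "The synthetic assistant declined the simulated request."
risky_note = "Header seen: Authorization: Bearer " + "a" * 24

clean_flags = flag_sensitive(clean_note)
risky_flags = flag_sensitive(risky_note)
```

Text that triggers any pattern would be replaced with a synthetic description of what was seen rather than pasted into the finding.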

Governance Boundary

This LAB is read-only and deterministic. It teaches safe evidence capture only. It does not collect sensitive evidence, store secrets, access customer records, copy exploit output, run live red-team tests, mutate runtime systems, or claim production enforcement.

Runtime = read-only learning

Backend exposure = false
Public backend exposed = false
Live red-team execution = false
Live exploit execution = false
Sensitive evidence collection = false
Customer data access = false
Credential handling = false
Secret handling = false
Real sensitive data usage = false
Runtime mutation = false
Production enforcement claim = false
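
The boolean flags above lend themselves to an automated check that none has been switched on. The sketch below assumes the flags are stored as `name = value` text exactly as listed; the `parse_flags` and `enabled_flags` helpers are hypothetical and not part of the LAB itself.

```python
GOVERNANCE_FLAGS = """\
Backend exposure = false
Public backend exposed = false
Live red-team execution = false
Live exploit execution = false
Sensitive evidence collection = false
Customer data access = false
Credential handling = false
Secret handling = false
Real sensitive data usage = false
Runtime mutation = false
Production enforcement claim = false
"""


def parse_flags(text: str) -> dict:
    """Parse 'name = value' lines into a dict of booleans."""
    flags = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or non-flag lines
        name, _, value = line.partition("=")
        flags[name.strip()] = value.strip().lower() == "true"
    return flags


def enabled_flags(flags: dict) -> list:
    """Return the sorted names of flags that are set to true."""
    return sorted(name for name, value in flags.items() if value)


flags = parse_flags(GOVERNANCE_FLAGS)
violations = enabled_flags(flags)
```

An empty `violations` list is consistent with the read-only boundary described above; any entry would mean the declared boundary and the configuration disagree.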