Evidence Capture for AI Red Team Findings
Intermediate LAB on reviewer-safe evidence capture for AI red-team findings. Topics: objectives, scope, preconditions, expected controls, observed behavior, uncertainty, risk explanation, remediation guidance, and portfolio-ready findings.
Overview
This LAB teaches how to capture AI red-team evidence in a reviewer-safe format. The goal is to document what was evaluated, what control was expected, what behavior was observed, what uncertainty remains, and what remediation is recommended.
Concept Deep Dives
What is evidence capture for AI red-team findings?
Evidence capture is the structured documentation of what was reviewed, what control was expected, what behavior was observed, what uncertainty remains, and what remediation is recommended. The goal is to create a reviewer-safe, reproducible finding.
Why must findings separate observation from conclusion?
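An observation records what happened. A conclusion explains what it means. For example, "the model returned the synthetic canary string" is an observation, while "the output filter does not cover this format" is a conclusion drawn from it. Keeping the two separate prevents overclaiming, preserves uncertainty, and makes the finding easier for reviewers, auditors, and engineers to validate.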
What belongs in reviewer-safe evidence?
Reviewer-safe evidence includes scenario objective, authorized scope, synthetic artifacts, preconditions, expected controls, observed behavior, uncertainty, and remediation. It excludes secrets, customer records, live exploit output, and reusable attack payloads.
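As a minimal sketch of the exclusion rule above, a draft finding can be screened for secret-like strings before it is shared. The pattern set and the function name `screen_evidence` are assumptions made for illustration; they are not part of this LAB and are far from exhaustive. A clean result does not prove the text is safe, so this kind of check supplements human review rather than replacing it.

```python
import re

# Hypothetical patterns for illustration only. A real review would rely on an
# organization-approved secret scanner and data classification policy.
SENSITIVE_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._\-]{20,}"),
}


def screen_evidence(text: str) -> list[str]:
    """Return the sorted names of patterns found in text (empty list if none)."""
    return sorted(
        name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(text)
    )


# Usage with a synthetic artifact: nothing secret-like is present.
draft = "Observed: the assistant echoed the synthetic canary CANARY-0001."
print(screen_evidence(draft))  # []
```

Any non-empty result would be a reason to replace the flagged material with a synthetic artifact before the finding goes to reviewers.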
How should uncertainty and limits be recorded?
Uncertainty should identify what was not tested, what assumptions were made, what evidence is synthetic, what system state was simulated, and what the finding cannot prove.
How should remediation be tied to evidence?
Remediation should map directly to the observed control gap. A finding should not recommend broad fixes unless the evidence supports them. Strong remediation is specific, bounded, and control-oriented.
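One way to make the evidence-to-remediation mapping checkable is to give each observation an identifier and require every remediation to cite at least one. The sketch below assumes that convention; the `Observation` and `Remediation` types, the `OBS-1` identifier format, and the helper name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    obs_id: str    # hypothetical identifier, e.g. "OBS-1"
    control: str   # the expected control this observation relates to
    summary: str   # what was observed, stated without interpretation


@dataclass(frozen=True)
class Remediation:
    action: str
    addresses: tuple[str, ...]  # obs_id values this action is meant to close


def unsupported_remediations(observations, remediations):
    """Return actions that cite no observation, or cite one that was never recorded."""
    known = {o.obs_id for o in observations}
    return [
        r.action
        for r in remediations
        if not r.addresses or not set(r.addresses) <= known
    ]


observations = [
    Observation("OBS-1", "output filter", "Synthetic canary string appeared in a response"),
]
remediations = [
    Remediation("Extend the output filter to cover the canary format", ("OBS-1",)),
    Remediation("Rewrite the entire authentication stack", ()),
]
print(unsupported_remediations(observations, remediations))
# ['Rewrite the entire authentication stack']
```

The second recommendation is flagged because nothing in the recorded evidence supports it, which is the "broad fix" case described above.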
What makes an AI red-team finding portfolio-ready?
A portfolio-ready finding is clear, scoped, reproducible, reviewer-safe, honest about uncertainty, and mapped to expected controls and remediation without claiming live compromise or exposing sensitive material.
Visual Evidence Capture Model
A strong AI red-team finding turns scenario observations into a bounded, reviewer-safe evidence package.
Reviewer-Safe Finding Template
Use a structured template so that AI red-team findings are readable, bounded, reproducible, and safe to share with reviewers.
Reviewer-safe finding:
Objective:
[What was being evaluated?]
Scope:
[What was in bounds and out of bounds?]
Preconditions:
[What was assumed, simulated, or set up?]
Expected control:
[What should have prevented or contained the issue?]
Synthetic evidence artifact:
[Description of safe evidence used]
Observed behavior:
[What actually happened?]
Uncertainty:
[What was not tested or proven?]
Risk explanation:
[Why does this matter?]
Remediation:
[What should change?]
Finding confidence:
[High / Medium / Low with justification]
Reviewer-safe conclusion:
[Bounded conclusion without live compromise claims]
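The template can also be treated as data so that an incomplete finding is caught before review. The following is a minimal sketch, assuming a finding is stored as a plain dictionary keyed by the section labels above; the helper names `missing_sections` and `render_finding` are hypothetical.

```python
# Section labels copied from the template, in template order.
SECTIONS = (
    "Objective",
    "Scope",
    "Preconditions",
    "Expected control",
    "Synthetic evidence artifact",
    "Observed behavior",
    "Uncertainty",
    "Risk explanation",
    "Remediation",
    "Finding confidence",
    "Reviewer-safe conclusion",
)


def missing_sections(finding: dict[str, str]) -> list[str]:
    """Return template sections that are absent or blank, in template order."""
    return [s for s in SECTIONS if not finding.get(s, "").strip()]


def render_finding(finding: dict[str, str]) -> str:
    """Render a complete finding as text; refuse to render an incomplete one."""
    missing = missing_sections(finding)
    if missing:
        raise ValueError(f"Incomplete finding, missing: {', '.join(missing)}")
    lines = ["Reviewer-safe finding:"]
    for section in SECTIONS:
        lines.append(f"{section}:")
        lines.append(finding[section].strip())
    return "\n".join(lines)


# Usage: a draft that skips the uncertainty section is reported, not rendered.
draft = {s: "synthetic placeholder text" for s in SECTIONS}
draft["Uncertainty"] = ""
print(missing_sections(draft))  # ['Uncertainty']
```

Refusing to render an incomplete finding is a deliberate choice in this sketch: a missing uncertainty or scope section is treated as a blocking defect rather than something to fill in later.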
High-Risk Anti-Pattern
A dangerous pattern is treating raw exploit output, sensitive data, screenshots with secrets, or unsupported claims as proof of an AI red-team finding.
Unsafe pattern:
raw sensitive evidence
→ real secrets or tokens
→ customer records
→ exploit output copied into a report
→ unclear scope
→ unsupported compromise claims
→ missing uncertainty
Risk:
secret exposure
customer data leakage
misleading finding
policy violation
audit evidence contamination
unsafe portfolio artifact
loss of reviewer trust
Secure alternative:
Use synthetic evidence artifacts.
State the scoped objective.
Record expected controls.
Separate observation from conclusion.
Document uncertainty.
Map remediation to evidence.
Write reviewer-safe conclusions.
Governance Boundary
This LAB is read-only and deterministic. It teaches safe evidence capture only. It does not collect sensitive evidence, store secrets, access customer records, copy exploit output, run live red-team tests, mutate runtime systems, or claim production enforcement.
Runtime = read-only learning
Backend exposure = false
Public backend exposed = false
Live red-team execution = false
Live exploit execution = false
Sensitive evidence collection = false
Customer data access = false
Credential handling = false
Secret handling = false
Real sensitive data usage = false
Runtime mutation = false
Production enforcement claim = false
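The boundary flags above could be kept as data and checked mechanically, for instance before lab material is published. This is a sketch only: the key names are an assumed snake_case rendering of the flag labels, and `violated_boundaries` is a hypothetical helper, not something this LAB provides.

```python
# Assumed snake_case rendering of the governance flags listed above.
RUNTIME_MODE = "read-only learning"

GOVERNANCE_FLAGS = {
    "backend_exposure": False,
    "public_backend_exposed": False,
    "live_red_team_execution": False,
    "live_exploit_execution": False,
    "sensitive_evidence_collection": False,
    "customer_data_access": False,
    "credential_handling": False,
    "secret_handling": False,
    "real_sensitive_data_usage": False,
    "runtime_mutation": False,
    "production_enforcement_claim": False,
}


def violated_boundaries(flags: dict[str, bool]) -> list[str]:
    """Return the sorted names of flags that are not exactly False."""
    return sorted(name for name, value in flags.items() if value is not False)


print(violated_boundaries(GOVERNANCE_FLAGS))  # []
```

The check uses `is not False` rather than truthiness so that an unset or mistyped value (for example `None` or `"false"`) is reported instead of silently passing.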