AI Security Engineering · Testing and Evidence · L2
AI Security Testing and Evidence Harness
Intermediate LAB teaching AI security testing and evidence harness design: prompt boundary tests, tool permission tests, retrieval tests, output safety checks, runtime guardrail tests, abuse/cost tests, expected outcomes, and audit-ready evidence packages.
Overview
This LAB teaches how to design AI security tests that prove controls work across prompt, tool, retrieval, output, runtime, abuse, cost, and evidence boundaries.
Concept Deep Dives
Expand each concept when studying AI security testing and evidence harness design.
What is an AI security testing harness?
An AI security testing harness is a structured way to define test cases, expected outcomes, control checks, pass/fail results, and evidence records for AI security boundaries.
Why do AI controls need negative tests?
Negative tests prove that unsafe, unauthorized, overscoped, or unsupported behavior is blocked. They validate deny paths, refusal behavior, escalation, throttling, and fail-closed outcomes.
What should be tested in prompt boundaries?
Tests should verify that untrusted user input, retrieved content, and tool output cannot override trusted system or developer instructions.
What should be tested in tool permissions?
Tests should verify tool classification, permission scope, approval requirements, self-approval prevention, deny decisions, and decision evidence.
What should be tested in retrieval security?
Tests should verify tenant scope, source authority, sensitivity filters, freshness checks, conflict handling, poisoning resistance, and context packaging.
What makes evidence audit-ready?
Audit-ready evidence explains the input, control boundary, expected outcome, actual decision, reason, final outcome, and reviewer-safe summary without claiming production enforcement.
Visual AI Security Testing and Evidence Harness Model
A secure testing harness turns AI security controls into repeatable proof.
Example Scenario
A test case attempts to make retrieved content override tool permissions and request an unauthorized mutating action.
Secure test harness handling:
define scenario and threat condition
declare expected outcome before review
validate prompt boundary behavior
validate retrieval source trust
validate tool permission denial
validate output safety response
record pass/fail outcome
produce reviewer-safe evidence summary
Result:
The AI system can prove that retrieved content cannot override tool authorization.
High-Risk Anti-Pattern
A dangerous pattern relies on informal testing or screenshots without structured evidence.
Unsafe pattern:
AI controls exist
-> tests are informal or manual
-> expected outcomes are undefined
-> failures are not captured
-> evidence is incomplete
-> control effectiveness cannot be proven
Risk:
prompt boundary bypass goes unnoticed
tool permission denials are not regression-tested
retrieval poisoning controls are assumed, not proven
output safety failures lack evidence
runtime and cost controls are not repeatable
audit reviewer cannot trace the decision path
Secure alternative:
Define deterministic test cases.
Declare expected outcomes.
Run positive and negative tests.
Record pass/fail evidence.
Summarize for reviewers.
Repeat as regression suite.
Governance Boundary
This LAB is read-only and deterministic. It does not call models, execute tools, retrieve enterprise data, query vector databases, expose backend APIs, mutate runtime systems, or claim production enforcement.
Runtime = read-only learning
Backend exposure = false
Live model integration = false
Live tool execution = false
Live retrieval execution = false
Vector database access = false
Enterprise data access = false
Provider quota mutation = false
Runtime mutation = false
Production enforcement claim = false