AI Security Engineering · Testing and Evidence · L2

AI Security Testing and Evidence Harness

Intermediate LAB teaching AI security testing and evidence harness design: prompt boundary tests, tool permission tests, retrieval tests, output safety checks, runtime guardrail tests, abuse/cost tests, expected outcomes, and audit-ready evidence packages.

StatusIntermediate

DomainAI Security

TrackAI Security Engineering

RuntimeRead-only course

Overview

This LAB teaches how to design AI security tests that prove controls work across prompt, tool, retrieval, output, runtime, abuse, cost, and evidence boundaries.

Security testing Evidence harness Negative tests Audit-ready proof

Concept Deep Dives

Expand each concept when studying AI security testing and evidence harness design.

What is an AI security testing harness?

An AI security testing harness is a structured way to define test cases, expected outcomes, control checks, pass/fail results, and evidence records for AI security boundaries.

Why do AI controls need negative tests?

Negative tests prove that unsafe, unauthorized, overscoped, or unsupported behavior is blocked. They validate deny paths, refusal behavior, escalation, throttling, and fail-closed outcomes.

What should be tested in prompt boundaries?

Tests should verify that untrusted user input, retrieved content, and tool output cannot override trusted system or developer instructions.

What should be tested in tool permissions?

Tests should verify tool classification, permission scope, approval requirements, self-approval prevention, deny decisions, and decision evidence.

What should be tested in retrieval security?

Tests should verify tenant scope, source authority, sensitivity filters, freshness checks, conflict handling, poisoning resistance, and context packaging.

What makes evidence audit-ready?

Audit-ready evidence explains the input, control boundary, expected outcome, actual decision, reason, final outcome, and reviewer-safe summary without claiming production enforcement.

Visual AI Security Testing and Evidence Harness Model

A secure testing harness turns AI security controls into repeatable proof.

Test Case Scenario, input, threat, control boundary

→

Expected Outcome Allow, deny, redact, throttle, escalate, fail closed

→

Control Check Prompt, tool, retrieval, output, runtime, cost

Negative Test Unsafe path should be blocked or escalated

→

Pass / Fail Result Compare actual outcome to expected decision

→

Evidence Record Input, checks, reason, result, outcome

Regression Set Repeatable suite for future changes

→

Reviewer Summary Audit-ready explanation of control behavior

→

Governance Package Safe evidence without production enforcement claims

Learning rule: Untested AI controls are assumptions; tested controls become evidence.

Example Scenario

A test case attempts to make retrieved content override tool permissions and request an unauthorized mutating action.

Threat input Retrieved content contains an instruction to bypass tool approval.

Expected outcome Deny unauthorized tool path and preserve tool permission boundary.

Control checks Prompt boundary, retrieval trust, tool permission, and output policy.

Evidence package Records input, expected result, actual decision, reason, and reviewer summary.

Secure test harness handling:
define scenario and threat condition
declare expected outcome before review
validate prompt boundary behavior
validate retrieval source trust
validate tool permission denial
validate output safety response
record pass/fail outcome
produce reviewer-safe evidence summary

Result:
The AI system can prove that retrieved content cannot override tool authorization.

High-Risk Anti-Pattern

A dangerous pattern relies on informal testing or screenshots without structured evidence.

Unsafe pattern:

AI controls exist
-> tests are informal or manual
-> expected outcomes are undefined
-> failures are not captured
-> evidence is incomplete
-> control effectiveness cannot be proven

Risk:

prompt boundary bypass goes unnoticed
tool permission denials are not regression-tested
retrieval poisoning controls are assumed, not proven
output safety failures lack evidence
runtime and cost controls are not repeatable
audit reviewer cannot trace the decision path

Secure alternative:
Define deterministic test cases.
Declare expected outcomes.
Run positive and negative tests.
Record pass/fail evidence.
Summarize for reviewers.
Repeat as regression suite.

Governance Boundary

This LAB is read-only and deterministic. It does not call models, execute tools, retrieve enterprise data, query vector databases, expose backend APIs, mutate runtime systems, or claim production enforcement.

Runtime = read-only learning

Backend exposure = false
Live model integration = false
Live tool execution = false
Live retrieval execution = false
Vector database access = false
Enterprise data access = false
Provider quota mutation = false
Runtime mutation = false
Production enforcement claim = false