Retrieval Poisoning Scenario Design
An intermediate LAB on designing safe retrieval-poisoning scenarios. It covers source authority, tenant boundaries, stale-content risk, retrieval trust, expected controls, reviewer-safe evidence, and non-mutation boundaries.
Overview
This LAB teaches how to design safe retrieval-poisoning scenarios that evaluate whether an AI workflow can be influenced by untrusted, stale, low-authority, tenant-crossing, or misleading retrieved content.
Concept Deep Dives
What is retrieval-poisoning scenario design?
Retrieval-poisoning scenario design is the safe planning of tests that evaluate whether an AI workflow can be influenced by untrusted, stale, low-authority, tenant-crossing, or misleading retrieved content. The goal is to assess retrieval controls, not to poison real data sources.
Why does source authority matter?
Source authority helps determine whether retrieved content should be trusted, treated as untrusted context, excluded, or routed for human review. A safe scenario checks whether the system distinguishes approved knowledge from untrusted content.
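The four outcomes above can be written down as an expected-handling table that a scenario then checks against. This is a minimal sketch; the label names ("approved", "community", "unknown", "blocked") and the function name are hypothetical, and a real system would define its own labels.

```python
from enum import Enum


class Handling(Enum):
    TRUST = "trust"
    UNTRUSTED_CONTEXT = "untrusted_context"
    EXCLUDE = "exclude"
    HUMAN_REVIEW = "human_review"


# Hypothetical authority labels, chosen for illustration only.
AUTHORITY_HANDLING = {
    "approved": Handling.TRUST,
    "community": Handling.UNTRUSTED_CONTEXT,
    "unknown": Handling.HUMAN_REVIEW,
    "blocked": Handling.EXCLUDE,
}


def expected_handling(authority_label: str) -> Handling:
    """Return the handling a scenario expects for a given authority label.

    Labels that are not in the table are excluded, so a typo or a
    missing label fails closed instead of being trusted by default.
    """
    return AUTHORITY_HANDLING.get(authority_label, Handling.EXCLUDE)
```

The scenario designer records this table up front, so that the observed behavior can later be compared against a stated expectation instead of an impression.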
How do stale or low-authority sources create risk?
Stale or low-authority sources can cause outdated guidance, weak recommendations, incorrect summaries, policy drift, or unsupported decisions. A retrieval-risk scenario should test freshness, provenance, relevance, and authority labels.
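As an illustration, the freshness, provenance, and authority checks could be expressed over synthetic source records as below. The record fields, the 365-day threshold, and the function name are assumptions made for this sketch. Relevance is left out because it depends on the query, not on the record alone.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class SyntheticSource:
    """A made-up source record; it never refers to a real document."""
    source_id: str
    authority: Optional[str]
    provenance: Optional[str]
    last_reviewed: Optional[date]


def label_gaps(source: SyntheticSource, today: date,
               max_age_days: int = 365) -> List[str]:
    """List the trust labels that are missing or out of date."""
    gaps = []
    if source.authority is None:
        gaps.append("authority label missing")
    if source.provenance is None:
        gaps.append("provenance missing")
    if source.last_reviewed is None:
        gaps.append("freshness unknown")
    elif today - source.last_reviewed > timedelta(days=max_age_days):
        gaps.append("stale")
    return gaps
```

A record with any gap is a candidate for the scenario: the question is whether the workflow under review would still let it shape an answer.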
Where can tenant or context boundaries fail?
Boundary failures can occur when content from one tenant, workspace, user group, data source, or sensitivity tier is retrieved into another context without isolation, authorization, or filtering.
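A boundary expectation can be stated as a pure function over synthetic chunks: given the request context, which retrieved items should never have appeared? The sketch below assumes a single tenant field and an integer sensitivity tier, both hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SyntheticChunk:
    chunk_id: str
    tenant: str
    sensitivity: int  # higher number = more sensitive tier


@dataclass(frozen=True)
class RequestContext:
    tenant: str
    clearance: int  # highest sensitivity tier the requester may see


def boundary_violations(chunks: List[SyntheticChunk],
                        ctx: RequestContext) -> List[str]:
    """Return IDs of chunks that cross the tenant or sensitivity boundary."""
    return [
        c.chunk_id
        for c in chunks
        if c.tenant != ctx.tenant or c.sensitivity > ctx.clearance
    ]
```

The function only inspects a list it is handed. It does not query or write to any retrieval system, which keeps the check inside the LAB's read-only boundary.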
What controls should a retrieval-risk scenario test?
Controls include source allow-lists, authority labels, tenant isolation, freshness checks, sensitivity filtering, context packaging, citation requirements, retrieval logging, and fail-closed behavior.
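One way to keep a scenario honest about coverage is to track which of these controls have a recorded observation. The sketch below copies the control names from the list above; the function name and the shape of the observations mapping are assumptions.

```python
from typing import Dict, List

# Control names taken from the list above.
EXPECTED_CONTROLS = (
    "source allow-lists",
    "authority labels",
    "tenant isolation",
    "freshness checks",
    "sensitivity filtering",
    "context packaging",
    "citation requirements",
    "retrieval logging",
    "fail-closed behavior",
)


def untested_controls(observations: Dict[str, str]) -> List[str]:
    """Return the controls that have no non-empty observation recorded."""
    return [
        control
        for control in EXPECTED_CONTROLS
        if not observations.get(control, "").strip()
    ]
```

Listing untested controls explicitly helps a reviewer tell "the control held" apart from "the control was never examined".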
How should retrieval-risk findings be documented safely?
A safe finding records the objective, scope, source category, expected retrieval control, observed behavior, evidence, uncertainty, risk, and remediation. Collecting that evidence must not modify real corpora, vector stores, documents, or production data sources.
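The fields of a finding map naturally onto a small immutable record. This is a sketch only: the class and function names are hypothetical, and every field is kept as free text for simplicity.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class RetrievalFinding:
    objective: str
    scope: str
    source_category: str
    expected_control: str
    observed_behavior: str
    evidence: str
    uncertainty: str
    risk: str
    remediation: str


def incomplete_fields(finding: RetrievalFinding) -> List[str]:
    """Return the names of fields that were left blank."""
    return [
        f.name
        for f in fields(finding)
        if not getattr(finding, f.name).strip()
    ]
```

Making uncertainty a required field, and flagging it when blank, discourages findings that overstate what was actually observed.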
Visual Retrieval Poisoning Scenario Design Model
A strong retrieval-poisoning scenario turns source-trust risk into a scoped, evidence-backed control review.
Example Scenario
An AI assistant retrieves internal policy guidance before answering employee questions. The learner must design a safe scenario to check whether low-authority or stale content can influence the assistant’s response.
Safe scenario handling:
define the retrieval workflow
map approved and untrusted sources
identify source authority and freshness
state tenant and sensitivity boundaries
use synthetic source records only
do not mutate corpora or vector stores
observe whether retrieval controls are preserved
record uncertainty and limits
write remediation tied to source-trust controls
Result:
The scenario becomes a retrieval-control review, not a data poisoning exercise.
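The policy-assistant scenario above could be sketched with three synthetic records and one comparison. All source IDs, field names, and the eligibility rule (approved and current) are assumptions made for illustration; the cited IDs would come from a reviewer's read-only observation of a response.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class SyntheticPolicySource:
    source_id: str
    authority: str    # "approved" or "low" (hypothetical labels)
    is_current: bool  # False marks a stale or archived record


# Synthetic records only; none of these refer to real documents.
SOURCES = [
    SyntheticPolicySource("syn-policy-current", "approved", True),
    SyntheticPolicySource("syn-policy-archived", "approved", False),
    SyntheticPolicySource("syn-forum-post", "low", True),
]


def eligible_ids(sources: Iterable[SyntheticPolicySource]) -> Set[str]:
    """Sources the scenario expects the assistant to be allowed to use."""
    return {
        s.source_id
        for s in sources
        if s.authority == "approved" and s.is_current
    }


def review_citations(cited_ids: Iterable[str],
                     sources: Iterable[SyntheticPolicySource]) -> List[str]:
    """Return cited IDs that fall outside the eligible set, sorted."""
    return sorted(set(cited_ids) - eligible_ids(sources))
```

An empty result means the observed citations stayed within approved, current sources. A non-empty result is the starting point for a finding, with remediation tied to the control that should have filtered the record.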
High-Risk Anti-Pattern
A dangerous pattern is modifying or seeding real retrieval sources to influence model behavior while treating the activity as safe testing or research.
Unsafe pattern:
real corpus mutation
→ vector database writes
→ source tampering
→ customer data exposure
→ tenant boundary crossing
→ unsupported compromise claims
Risk:
production data integrity loss
customer data exposure
tenant isolation failure
misleading model output
audit evidence contamination
loss of trust in retrieval systems
Secure alternative:
Use synthetic source records.
Do not modify real corpora.
Do not write to vector databases.
Label source authority.
Preserve tenant boundaries.
Record expected controls.
Capture reviewer-safe evidence.
Recommend retrieval-control remediation.
Governance Boundary
This LAB is read-only and deterministic. It teaches safe scenario design only. It does not poison retrieval systems, modify corpora, write to vector databases, tamper with sources, access customer data, mutate runtime systems, or claim production enforcement.
Runtime = read-only learning
Backend exposure = false
Public backend exposed = false
Live retrieval poisoning = false
Corpus mutation = false
Vector database write = false
Source tampering = false
Live model abuse execution = false
Live exploit execution = false
Live red-team execution = false
Customer data access = false
Credential handling = false
Runtime mutation = false
Production enforcement claim = false
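The flags above can also be checked mechanically. The following sketch parses the "Key = value" lines as plain text and reports any flag that departs from the stated boundary. The parsing approach and function names are assumptions; nothing here describes how the LAB itself enforces these settings.

```python
from typing import Dict, List

GOVERNANCE_BLOCK = """\
Runtime = read-only learning
Backend exposure = false
Public backend exposed = false
Live retrieval poisoning = false
Corpus mutation = false
Vector database write = false
Source tampering = false
Live model abuse execution = false
Live exploit execution = false
Live red-team execution = false
Customer data access = false
Credential handling = false
Runtime mutation = false
Production enforcement claim = false
"""


def parse_flags(text: str) -> Dict[str, str]:
    """Parse 'Key = value' lines into a dictionary, skipping other lines."""
    flags = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        flags[key.strip()] = value.strip()
    return flags


def governance_violations(flags: Dict[str, str]) -> List[str]:
    """Return flag names whose values break the read-only boundary."""
    violations = []
    for key, value in flags.items():
        if key == "Runtime":
            if value != "read-only learning":
                violations.append(key)
        elif value.lower() != "false":
            violations.append(key)
    return violations
```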