Retrieval Poisoning Scenario Design

An intermediate LAB on designing safe retrieval-poisoning scenarios. It covers source authority, tenant boundaries, stale-content risk, retrieval trust, expected controls, reviewer-safe evidence, and non-mutation boundaries.

Status: Intermediate
Domain: AI Security
Track: AI Red Team Scenario Design
Runtime: Read-only course

Overview

This LAB teaches how to design safe retrieval-poisoning scenarios that evaluate whether an AI workflow can be influenced by untrusted, stale, low-authority, tenant-crossing, or misleading retrieved content.

Source authority · Tenant scope · Retrieval trust · No corpus mutation

Concept Deep Dives

What is retrieval-poisoning scenario design?

Retrieval-poisoning scenario design is the safe planning of tests that evaluate whether an AI workflow can be influenced by untrusted, stale, low-authority, tenant-crossing, or misleading retrieved content. The goal is to assess retrieval controls, not to poison real data sources.

Why does source authority matter?

Source authority helps determine whether retrieved content should be trusted, treated as untrusted context, excluded, or routed for human review. A safe scenario checks whether the system distinguishes approved knowledge from untrusted content.
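The four outcomes above (trust, treat as untrusted context, exclude, route for review) can be sketched as a small lookup. This is a minimal illustration only: the label names (`approved`, `community`, `denied`, `unverified`) and the `handling_for` helper are assumptions made for this example, not part of any real system described in the LAB.

```python
from enum import Enum


class Handling(Enum):
    TRUSTED = "trusted"
    UNTRUSTED_CONTEXT = "untrusted_context"
    EXCLUDED = "excluded"
    HUMAN_REVIEW = "human_review"


# Hypothetical authority labels for a synthetic source catalog.
AUTHORITY_POLICY = {
    "approved": Handling.TRUSTED,
    "community": Handling.UNTRUSTED_CONTEXT,
    "denied": Handling.EXCLUDED,
    "unverified": Handling.HUMAN_REVIEW,
}


def handling_for(authority_label: str) -> Handling:
    """Map an authority label to a handling decision.

    Labels that are missing from the policy are excluded, so an
    unlabeled source never becomes trusted by accident.
    """
    return AUTHORITY_POLICY.get(authority_label, Handling.EXCLUDED)
```

A scenario can then state its expected control in these terms, for example "content labeled `community` must be handled as `UNTRUSTED_CONTEXT`", and compare that expectation with what the workflow under review actually does.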

How do stale or low-authority sources create risk?

Stale or low-authority sources can cause outdated guidance, weak recommendations, incorrect summaries, policy drift, or unsupported decisions. A retrieval-risk scenario should test freshness, provenance, relevance, and authority labels.
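A freshness test can be expressed against synthetic records without touching any real source. The sketch below assumes a hypothetical `SourceRecord` shape and a one-year review window; both are illustrative choices, and a real workflow may define staleness differently.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class SourceRecord:
    """Synthetic catalog entry; field names are assumptions for this sketch."""
    source_id: str
    authority: str
    last_reviewed: date


def is_stale(record: SourceRecord, today: date, max_age_days: int = 365) -> bool:
    """Return True when the record was last reviewed more than max_age_days ago."""
    return (today - record.last_reviewed) > timedelta(days=max_age_days)


def freshness_flags(records, today: date, max_age_days: int = 365) -> dict:
    """Map each source_id to its staleness flag for the reviewer's notes."""
    return {r.source_id: is_stale(r, today, max_age_days) for r in records}
```

Passing `today` explicitly keeps the check deterministic, which matters when the same scenario is replayed by a second reviewer.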

Where can tenant or context boundaries fail?

Boundary failures can occur when content from one tenant, workspace, user group, data source, or sensitivity tier is retrieved into another context without isolation, authorization, or filtering.
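One way to make a boundary expectation concrete is to partition simulated retrieval results into those that respect the requesting context and those that do not. The `RetrievedChunk` fields and the numeric sensitivity tiers below are assumptions for illustration; the data is synthetic and nothing is read from or written to a real store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedChunk:
    """Synthetic retrieval result; field names are hypothetical."""
    chunk_id: str
    tenant_id: str
    sensitivity: int  # 0 = public, higher values = more restricted


def partition_by_boundary(chunks, request_tenant: str, max_sensitivity: int):
    """Split chunks into (allowed, violations) for one requesting context.

    A chunk is allowed only if it belongs to the requesting tenant and its
    sensitivity tier does not exceed what the requester may see.
    """
    allowed, violations = [], []
    for chunk in chunks:
        same_tenant = chunk.tenant_id == request_tenant
        within_tier = chunk.sensitivity <= max_sensitivity
        (allowed if same_tenant and within_tier else violations).append(chunk)
    return allowed, violations
```

In a scenario write-up, a non-empty `violations` list is the observed behavior to record; it shows that isolation or filtering was not applied before context packaging.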

What controls should a retrieval-risk scenario test?

Controls include source allow-lists, authority labels, tenant isolation, freshness checks, sensitivity filtering, context packaging, citation requirements, retrieval logging, and fail-closed behavior.
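Several of these controls can be combined in a single context-packaging step. The following sketch assumes simulated results are plain dictionaries and that the allow-list entries (`policy-handbook`, `hr-portal`) are invented names; it illustrates allow-listing, required metadata, a simple drop log, and fail-closed behavior, not a complete control set.

```python
# Hypothetical allow-list and required metadata fields for this sketch.
ALLOWED_SOURCES = {"policy-handbook", "hr-portal"}
REQUIRED_FIELDS = ("source", "authority", "tenant", "citation")


def package_context(results):
    """Keep only allow-listed results that carry complete metadata.

    Results with missing or empty metadata are dropped rather than guessed
    at. If nothing survives, the status is "fail_closed" so the caller can
    decline to answer instead of answering from unvetted context.
    """
    kept, dropped = [], []
    for result in results:
        complete = all(result.get(field) for field in REQUIRED_FIELDS)
        if complete and result["source"] in ALLOWED_SOURCES:
            kept.append(result)
        else:
            dropped.append(result)  # serves as a minimal retrieval log
    status = "answer_with_citations" if kept else "fail_closed"
    return {"status": status, "context": kept, "dropped": dropped}
```

Dropping incomplete results, instead of filling in defaults, is the design choice that makes the step fail closed.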

How should retrieval-risk findings be documented safely?

A safe finding records objective, scope, source category, expected retrieval control, observed behavior, evidence, uncertainty, risk, and remediation without modifying real corpora, vector stores, documents, or production data sources.
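The fields listed above map naturally onto a fixed record. The sketch below is one possible shape, assuming every field is free text and that a JSON report is acceptable to reviewers; the `Finding` class and `to_report` helper are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class Finding:
    """Reviewer-safe finding; one string per field named in the LAB text."""
    objective: str
    scope: str
    source_category: str
    expected_control: str
    observed_behavior: str
    evidence: str
    uncertainty: str
    risk: str
    remediation: str


def to_report(finding: Finding) -> str:
    """Serialize a finding to JSON, refusing records with empty fields."""
    missing = [f.name for f in fields(finding) if not getattr(finding, f.name).strip()]
    if missing:
        raise ValueError(f"finding is incomplete, missing: {', '.join(missing)}")
    return json.dumps(asdict(finding), indent=2)
```

Rejecting empty fields keeps `uncertainty` from being silently omitted, which is the field most often skipped in practice.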

Visual Retrieval Poisoning Scenario Design Model

A strong retrieval-poisoning scenario turns source-trust risk into a scoped, evidence-backed control review.

AI Workflow Under Review: RAG application, search assistant, knowledge assistant, or retrieval-enabled workflow
Retrieval Source Map: Approved sources, untrusted sources, tenant scope, freshness, and sensitivity
Source Authority Boundary: Allowed, denied, untrusted, stale, low-authority, or review-required content
Poisoning Hypothesis: Misleading retrieval, stale source influence, tenant crossing, or source tampering risk
Expected Control: Filtering, source labeling, citation enforcement, tenant isolation, or fail-closed behavior
Reviewer-Safe Finding: Observed behavior, evidence, uncertainty, risk, and remediation
Learning rule: Retrieval-poisoning testing is safe only when it evaluates source-trust controls without mutating corpora, vector databases, files, or production content.

Example Scenario

An AI assistant retrieves internal policy guidance before answering employee questions. The learner must design a safe scenario to check whether low-authority or stale content can influence the assistant’s response.

Objective: Evaluate whether the retrieval workflow respects source authority, freshness, and tenant boundaries.
Scope: Synthetic source catalog and simulated retrieval results only. No real corpus mutation, no vector database writes, and no production data sources.
Expected Control: The assistant should prefer approved sources, label low-authority context, preserve tenant boundaries, and avoid unsupported claims.
Evidence: Reviewer-safe record of source category, expected retrieval control, observed behavior, uncertainty, and remediation.
Safe scenario handling:
define the retrieval workflow
map approved and untrusted sources
identify source authority and freshness
state tenant and sensitivity boundaries
use synthetic source records only
do not mutate corpora or vector stores
observe whether retrieval controls are preserved
record uncertainty and limits
write remediation tied to source-trust controls

Result:
The scenario becomes a retrieval-control review, not a data poisoning exercise.
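The handling steps in this example can be sketched end to end against a synthetic catalog. Everything below is invented for illustration: the source IDs, tenant names, dates, the two-year freshness window, and the `review` function. The "simulated result" is just a list of source IDs standing in for what the assistant is assumed to have retrieved; no retrieval system is called and nothing is written anywhere.

```python
from datetime import date

TODAY = date(2024, 6, 1)  # fixed date so the sketch is deterministic

# Synthetic source catalog (hypothetical records, not real data).
CATALOG = {
    "pol-001": {"authority": "approved", "tenant": "acme", "reviewed": date(2024, 2, 1)},
    "wiki-17": {"authority": "low", "tenant": "acme", "reviewed": date(2023, 11, 5)},
    "pol-old": {"authority": "approved", "tenant": "acme", "reviewed": date(2020, 1, 10)},
    "pol-ext": {"authority": "approved", "tenant": "globex", "reviewed": date(2024, 3, 3)},
}


def review(simulated_result, tenant: str, max_age_days: int = 730):
    """Compare a simulated retrieval result with the expected controls.

    Returns (source_id, observation) pairs for every source that should
    not have influenced the answer. Tenant crossing is checked first,
    then authority, then freshness.
    """
    observations = []
    for source_id in simulated_result:
        meta = CATALOG[source_id]
        if meta["tenant"] != tenant:
            observations.append((source_id, "tenant boundary not preserved"))
        elif meta["authority"] != "approved":
            observations.append((source_id, "low-authority source used"))
        elif (TODAY - meta["reviewed"]).days > max_age_days:
            observations.append((source_id, "stale source used"))
    return observations
```

An empty list means the simulated result is consistent with the expected controls; each returned pair is a candidate observed behavior for the reviewer-safe record, to be written up with its uncertainty and a remediation tied to the relevant source-trust control.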

High-Risk Anti-Pattern

A dangerous pattern is modifying or seeding real retrieval sources to influence model behavior while treating the activity as safe testing or research.

Unsafe pattern:

real corpus mutation
→ vector database writes
→ source tampering
→ customer data exposure
→ tenant boundary crossing
→ unsupported compromise claims

Risk:

production data integrity loss
customer data exposure
tenant isolation failure
misleading model output
audit evidence contamination
loss of trust in retrieval systems

Secure alternative:
Use synthetic source records.
Do not modify real corpora.
Do not write to vector databases.
Label source authority.
Preserve tenant boundaries.
Record expected controls.
Capture reviewer-safe evidence.
Recommend retrieval-control remediation.

Governance Boundary

This LAB is read-only and deterministic. It teaches safe scenario design only. It does not poison retrieval systems, modify corpora, write to vector databases, tamper with sources, access customer data, mutate runtime systems, or claim production enforcement.

Runtime = read-only learning

Backend exposure = false
Public backend exposed = false
Live retrieval poisoning = false
Corpus mutation = false
Vector database write = false
Source tampering = false
Live model abuse execution = false
Live exploit execution = false
Live red-team execution = false
Customer data access = false
Credential handling = false
Runtime mutation = false
Production enforcement claim = false
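A reviewer who wants to confirm the boundary mechanically could parse flags of this form and check that none is enabled. This is a minimal sketch, assuming the flags are available as plain `Name = value` lines; `parse_flags` and `enabled_flags` are hypothetical helpers, and the sample text copies only a few of the flags listed above.

```python
SAMPLE_FLAGS = """\
Backend exposure = false
Live retrieval poisoning = false
Corpus mutation = false
Vector database write = false
Source tampering = false
Customer data access = false
"""


def parse_flags(text: str) -> dict:
    """Parse 'Name = value' lines into a dict of booleans.

    Lines without '=' are ignored. Only the literal value 'true'
    (case-insensitive) counts as enabled.
    """
    flags = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        flags[name.strip()] = value.strip().lower() == "true"
    return flags


def enabled_flags(flags: dict) -> list:
    """Return the names of flags that are enabled, sorted for stable output."""
    return sorted(name for name, enabled in flags.items() if enabled)
```

For the boundary stated in this section, `enabled_flags(parse_flags(SAMPLE_FLAGS))` is expected to be an empty list; any name it returns marks a capability that falls outside read-only learning.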