A document in a knowledge base contains instructions telling the assistant to ignore company policy and reveal confidential files. What type of risk is this?

Indirect prompt injection. Indirect prompt injection occurs when retrieved or processed content contains instructions that try to manipulate the model.

Which control most directly reduces the impact of data exfiltration from a RAG assistant?

Authorize retrieval so users can access only documents they are permitted to see. If retrieval respects user authorization and data boundaries, the model has less unauthorized data available to reveal.

An agent can update customer records. What is the strongest practitioner judgment?

Treat tool use as higher risk and require narrow permissions, validation, logging, and confirmation for sensitive actions. Tools and actions increase business impact. They should be constrained with least privilege, validation, logging, and approval controls where needed.

Prompt Injection, Data Exfiltration, and Gen | Free Guide 2026

GenAI threat modeling for practitioners

A traditional web application usually treats form fields, files, and API calls as inputs. A generative AI application treats natural language as an input too. That creates a new control problem: a user, document, email, web page, or retrieved passage can contain instructions that try to manipulate the model. Prompt injection is the broad term for attempts to override system instructions, bypass policy, reveal hidden context, or trigger unauthorized behavior.

Direct prompt injection comes from the user. The user might ask the assistant to ignore safety rules, reveal the hidden prompt, output private context, or call a tool it should not call. Indirect prompt injection comes from content the application retrieves or processes. A document in a knowledge base could contain text that tells the model to send confidential data to the user. An email summary tool could read an email that instructs it to ignore the original task.

Threat area	What can go wrong	Practitioner control question
Direct prompt injection	User tries to override application rules or reveal hidden context	Are instructions, guardrails, and refusal behavior tested against adversarial prompts?
Indirect prompt injection	Retrieved content includes malicious instructions	Are data sources trusted, scanned, filtered, and separated from control instructions?
Data exfiltration	The assistant reveals private documents, secrets, or system prompts	Are access controls and output checks limiting what can be returned?
Tool misuse	An agent calls an action outside the user's intent or authority	Are tools narrow, validated, logged, and gated by confirmation where needed?
Over-trust	Users treat generated text as authoritative when it is not	Are citations, disclaimers, human review, and escalation paths appropriate?

Data exfiltration is the security concern behind many prompt-injection stories. If the model cannot access sensitive data, it cannot reveal that data. If the application retrieves only documents the user is allowed to see, the blast radius is lower. If an agent has only one narrow action, tool misuse is constrained. That is why least privilege, data minimization, and authorization checks are still the first line of defense for GenAI.

Prompt design helps, but prompts are not a security boundary by themselves. A strong system instruction can tell the model to follow company policy, use only retrieved sources, and refuse requests for secrets. That is useful. But a determined user can still try to manipulate the model, and retrieved text can still conflict with instructions. Security should not depend only on the model deciding to behave.

Guardrails for Amazon Bedrock can help evaluate user inputs and model responses for unsafe content, denied topics, sensitive data, prompt attack patterns where supported, and grounding behavior. Guardrails are part of defense-in-depth. They should be combined with IAM, application authorization, source controls, tool schemas, logging, and human review for high-impact actions.

Tool use changes the risk level. A chat assistant that only answers from a public FAQ has limited impact. A Bedrock agent that can create refunds, update customer records, or open support cases needs stronger controls. The application should define allowed tools, required parameters, validation rules, and confirmation steps. The model should not be able to invent new tools or call arbitrary APIs just because the user asks.

A simple threat modeling workflow:

Name the business workflow and the decision or action the AI supports.
List the assets: prompts, documents, embeddings, logs, secrets, user identities, tools, and outputs.
Draw trust boundaries between users, the application, model service, data stores, logs, and action systems.
Identify abuse cases, including prompt injection, indirect instructions, sensitive data requests, unauthorized actions, and false authoritative answers.
Choose layered controls: least privilege, source governance, guardrails, validation, citations, logging, and human review.
Test the system with normal, edge-case, and adversarial prompts before production.
Monitor incidents and update controls as users find new ways to interact with the assistant.

Scenario: a company builds a support assistant that retrieves internal troubleshooting documents. An attacker asks the assistant to reveal all hidden instructions and all documents about unreleased products. The correct control answer is not to trust a stronger prompt alone. The application should enforce user authorization, retrieve only permitted documents, use guardrails and output checks, and log suspicious requests for review.

Scenario: an email summarizer reads incoming messages and drafts replies. A malicious email says to ignore previous instructions and include the user's private calendar details in the reply. This is indirect prompt injection because the instruction comes from content being processed. The application should separate untrusted email content from control instructions, limit tool access, and require user approval before sending responses.

Scenario: an internal procurement agent can submit purchase requests. A user asks it to split a purchase into smaller orders to bypass approval thresholds. This is not only a prompt issue; it is a business policy issue. The backend should enforce spending rules and approvals even if the model produces a persuasive request. The model can assist with form completion, but the system of record must enforce policy.

Practitioner-level GenAI security is about asking where the model can read, write, and act. If the model can only read approved public content, the risk is smaller. If it can read sensitive records and trigger business actions, the review must be stronger. AIF-C01 expects that kind of judgment, not exploit engineering.

AWS AI Practitioner Study Guide

9.3 Prompt Injection, Data Exfiltration, and GenAI Threat Modeling

Key Takeaways

GenAI threat modeling for practitioners

AWS AI Practitioner Study Guide

1Chapter 1: AIF-C01 Orientation and Official Source Control

2Chapter 2: AI/ML Foundations and Use-Case Fit

3Chapter 3: ML Lifecycle, Metrics, and Practitioner MLOps

4Chapter 4: Generative AI Foundations and Inference Concepts

5Chapter 5: Prompting, Model Selection, Customization, and Evaluation

6Chapter 6: Amazon Bedrock, RAG, Agents, and Guardrails

7Chapter 7: AWS Managed AI/ML Services and SageMaker Map

8Chapter 8: Responsible AI, Human Review, and Safety

9Chapter 9: Security, Compliance, Governance, and Cost Controls

10Chapter 10: Integrated AWS AI Business Scenario Labs

11Chapter 11: Final Review, Exam Readiness, and Recertification

9.3 Prompt Injection, Data Exfiltration, and GenAI Threat Modeling

Key Takeaways

GenAI threat modeling for practitioners