4.6 Prompt Injection, Data Leakage, and User Input Risk

Key Takeaways

  • Prompt injection is an attempt to manipulate a model or application into ignoring instructions, revealing data, or taking unsafe actions.
  • Indirect prompt injection can hide malicious instructions inside retrieved documents, web pages, tickets, emails, or other user-controlled content.
  • Data leakage can occur through prompts, retrieved context, model outputs, logs, integrations, or poorly governed customization data.
  • User input should be treated as untrusted and controlled with least privilege, context filtering, guardrails, monitoring, and human review where risk is high.
Last updated: May 2026

Treat user input as untrusted

Generative AI applications often combine system instructions, user prompts, retrieved documents, conversation history, and tool outputs. Prompt injection attacks try to exploit that mixture. A direct prompt injection might tell the assistant to ignore previous instructions and reveal confidential data. An indirect prompt injection hides instructions inside content the application retrieves, such as a document, ticket, web page, or email. The user may not type the malicious instruction directly, but the model still sees it in context.

Prompt injection is different from a normal bad question. A normal bad question may be vague, rude, or outside scope. A prompt injection is trying to override the application design. It may ask for hidden prompts, secrets, credentials, private documents, chain-of-thought, or actions the user should not be allowed to take. A practitioner should not assume that polite interface text will stop this behavior. The application needs technical and process controls.

Data leakage is the related risk that sensitive information leaves its intended boundary. Leakage can happen when a prompt includes confidential data that should not be sent, when retrieval returns documents the user should not access, when the model output reveals another user's information, when logs store sensitive prompts without a retention plan, or when customization data is collected without approval. The issue is not only the model. It is the full data path around the model.

Control matrix

RiskExampleControl direction
Direct prompt injectionUser asks the assistant to ignore system instructionsClear refusal behavior, prompt design, guardrails, and tests
Indirect prompt injectionRetrieved document contains malicious instructionsTreat retrieved text as data, filter sources, and limit tool permissions
Over-retrievalAssistant receives confidential documents unrelated to the userPermission-aware retrieval and least privilege
Secret exposurePrompt or output includes passwords, API keys, or tokensAWS Secrets Manager, scanning, redaction, and no secrets in prompts
Sensitive data in logsPrompts with personal data are retained too broadlyCloudWatch log controls, retention policy, access review, and masking
Weak auditabilityNo record of model calls or administrative changesCloudTrail, application logs, and governance review

IAM remains central. A generative AI application should run with the least privilege needed for its task. If it can call tools, query databases, retrieve files, or trigger actions, those permissions should be scoped tightly. The model should not be trusted as the authorization layer. Authorization belongs in the application and AWS control plane. Users should only retrieve data and invoke actions they are allowed to use.

AWS-aware risk reduction

Several AWS services fit naturally into the control conversation. IAM controls who and what can access AWS resources. AWS KMS supports encryption key management for data at rest across many AWS services. AWS Secrets Manager helps keep secrets out of code and prompts. Amazon Macie can help discover sensitive data in Amazon S3 so teams understand what may be exposed before indexing or sharing content. CloudTrail records API activity for audit visibility. CloudWatch supports application and operational monitoring. Guardrails for Amazon Bedrock can help apply safety policies to Bedrock-based applications.

These controls should be paired with design choices. Separate instructions from retrieved data. Tell the model that retrieved content is untrusted evidence, not new operating instructions. Avoid putting secrets, credentials, or unnecessary personal data into prompts. Filter retrieval by user permissions and business need. Validate structured outputs before downstream systems use them. Use human approval before high-impact actions. Test with adversarial prompts and hostile documents before launch.

A non-builder can contribute by asking concrete questions:

  • What data can users cause the assistant to retrieve?
  • Can the assistant take actions, or only produce draft text?
  • Where are prompts, retrieved passages, and outputs logged?
  • How are sensitive fields redacted or excluded?
  • What happens when a user asks for private data or hidden instructions?
  • Who reviews prompt injection tests and incident logs?

Prompt injection cannot be solved by one clever phrase in the system prompt. It is a security and governance pattern. The business must decide what the assistant is allowed to know, say, and do. The application must enforce those decisions with access control, retrieval boundaries, output controls, monitoring, and escalation. That is why prompt injection belongs in both generative AI fundamentals and security governance discussions.

Test Your Knowledge

Which situation is the best example of indirect prompt injection?

A
B
C
D
Test Your Knowledge

Why should a GenAI application avoid placing secrets in prompts?

A
B
C
D
Test Your Knowledge

Which design principle is most important for a GenAI assistant that can retrieve internal documents?

A
B
C
D