AWS AI Practitioner Study Guide

Exam

1.1 Credential Scope, Code, and Search Language 1.2 Exam Format, Cost, Score, and Delivery 1.3 Official Exam Guide, Skill Builder, and Practice Workflow 1.4 Target Candidate Boundaries and Out-of-Scope Tasks 1.5 Domain Weights and Study Prioritization 1.6 Exam Policy, Retake, Results, and Recertification

2.1 AI, ML, Deep Learning, and Core Terminology 2.2 Supervised, Unsupervised, and Reinforcement Learning 2.3 Data Types, Labels, Structure, and Quality 2.4 Inference Patterns: Batch, Real-Time, and Embedded AI 2.5 AI Use-Case Fit and No-AI Decisions 2.6 Classification, Regression, Clustering, Forecasting, and Recommendation 2.7 AI/ML Foundations Case Lab

3.1 Data Collection, EDA, Preprocessing, and Feature Concepts 3.2 Training, Evaluation, Deployment, and Monitoring Lifecycle 3.3 Model Sources: Managed APIs, Open Source, and Custom Models 3.4 Evaluation Metrics and Business Metrics 3.5 Practitioner MLOps: Repeatability, Monitoring, and Retraining 3.6 SageMaker Lifecycle Service Map 3.7 ML Lifecycle Case Lab

4.1 Foundation Models, LLMs, Transformers, and Modalities 4.2 Tokens, Context Windows, Embeddings, and Vector Search 4.3 Inference Parameters: Temperature, Top-p, and Output Controls 4.4 Hallucination, Grounding, and Context Quality 4.5 GenAI Use-Case Fit and Risk Triage 4.6 Prompt Injection, Data Leakage, and User Input Risk 4.7 Generative AI Foundations Case Lab

5.1 Prompt Engineering Patterns and Business Quality 5.2 Zero-Shot, Few-Shot, Templates, and Instruction Design 5.3 Model Selection: Capability, Latency, Cost, and Risk 5.4 RAG vs Fine-Tuning vs Prompting vs Custom Models 5.5 Model Evaluation, Human Review, and Red-Team Feedback 5.6 Cost, Performance, and Throughput Decision-Making 5.7 Prompting and Model Selection Case Lab

6.1 Bedrock Core Concepts, Model Access, and Managed FM Choice 6.2 Knowledge Bases, RAG, Data Sources, and Grounding 6.3 Agents, Action Groups, Orchestration, and Business Workflows 6.4 Guardrails, Content Filters, Denied Topics, and Sensitive Data 6.5 Bedrock Model Evaluation, Monitoring, and Human Feedback 6.6 Bedrock Cost, Latency, Throughput, and Operational Fit 6.7 Bedrock, RAG, and Agents Case Lab

7.1 Managed AI Services vs Foundation Model Apps vs Custom ML 7.2 Text, Language, Search, and Document AI Services 7.3 Vision, Speech, Contact Center, and Personalization Services 7.4 Amazon Q Business, Developer, and Practitioner Fit 7.5 SageMaker Canvas, Studio, Clarify, Autopilot, and Data Wrangler 7.6 Data Foundation Services: S3, Glue, OpenSearch, and QuickSight 7.7 AWS AI Service Selection Case Lab

8.1 Fairness, Bias, Transparency, and Explainability 8.2 Privacy, Safety, Human Review, and Accountability 8.3 Guardrails, Clarify, A2I, and Content Safety Controls 8.4 Responsible AI Risk Registers and Governance Workflows 8.5 Monitoring, Feedback, Drift, and Incident Response 8.6 Responsible AI Case Lab

9.1 Shared Responsibility, IAM, and Least Privilege for AI 9.2 Encryption, Secrets, Networking, and Data Privacy 9.3 Prompt Injection, Data Exfiltration, and GenAI Threat Modeling 9.4 Logging, Monitoring, CloudTrail, CloudWatch, and Config 9.5 Compliance Artifact, Audit Manager, Macie, and Policy Evidence 9.6 AI Cost Controls, Pricing, Throughput, and Budget Governance 9.7 Security and Governance Case Lab

10.1 Customer Support GenAI Assistant Lab 10.2 Document Intelligence and Compliance Review Lab 10.3 Personalization, Forecasting, and Fraud Detection Lab 10.4 Enterprise Search, RAG, and Knowledge Management Lab 10.5 Responsible AI and Security Review Board Lab 10.6 Cost, Performance, and Operations Review Lab 10.7 Full AIF-C01 Business Simulation

11.1 Final 30-Day AIF-C01 Study Plan 11.2 Official Practice Resources and Weak-Domain Remediation 11.3 90-Minute Exam Timing, Flagging, and Guessing Workflow 11.4 Test-Day Checklist: Online or Test Center 11.5 Post-Exam Results, Retake, and Recertification Plan 11.6 AWS AI Practitioner Final Mixed Review

4.6 Prompt Injection, Data Leakage, and User Input Risk

Key Takeaways

Prompt injection is an attempt to manipulate a model or application into ignoring instructions, revealing data, or taking unsafe actions.
Indirect prompt injection can hide malicious instructions inside retrieved documents, web pages, tickets, emails, or other user-controlled content.
Data leakage can occur through prompts, retrieved context, model outputs, logs, integrations, or poorly governed customization data.
User input should be treated as untrusted and controlled with least privilege, context filtering, guardrails, monitoring, and human review where risk is high.

Last updated: May 2026

Treat user input as untrusted

Generative AI applications often combine system instructions, user prompts, retrieved documents, conversation history, and tool outputs. Prompt injection attacks try to exploit that mixture. A direct prompt injection might tell the assistant to ignore previous instructions and reveal confidential data. An indirect prompt injection hides instructions inside content the application retrieves, such as a document, ticket, web page, or email. The user may not type the malicious instruction directly, but the model still sees it in context.

Prompt injection is different from a normal bad question. A normal bad question may be vague, rude, or outside scope. A prompt injection is trying to override the application design. It may ask for hidden prompts, secrets, credentials, private documents, chain-of-thought, or actions the user should not be allowed to take. A practitioner should not assume that polite interface text will stop this behavior. The application needs technical and process controls.

Data leakage is the related risk that sensitive information leaves its intended boundary. Leakage can happen when a prompt includes confidential data that should not be sent, when retrieval returns documents the user should not access, when the model output reveals another user's information, when logs store sensitive prompts without a retention plan, or when customization data is collected without approval. The issue is not only the model. It is the full data path around the model.

Control matrix

Risk	Example	Control direction
Direct prompt injection	User asks the assistant to ignore system instructions	Clear refusal behavior, prompt design, guardrails, and tests
Indirect prompt injection	Retrieved document contains malicious instructions	Treat retrieved text as data, filter sources, and limit tool permissions
Over-retrieval	Assistant receives confidential documents unrelated to the user	Permission-aware retrieval and least privilege
Secret exposure	Prompt or output includes passwords, API keys, or tokens	AWS Secrets Manager, scanning, redaction, and no secrets in prompts
Sensitive data in logs	Prompts with personal data are retained too broadly	CloudWatch log controls, retention policy, access review, and masking
Weak auditability	No record of model calls or administrative changes	CloudTrail, application logs, and governance review

IAM remains central. A generative AI application should run with the least privilege needed for its task. If it can call tools, query databases, retrieve files, or trigger actions, those permissions should be scoped tightly. The model should not be trusted as the authorization layer. Authorization belongs in the application and AWS control plane. Users should only retrieve data and invoke actions they are allowed to use.

AWS-aware risk reduction

Several AWS services fit naturally into the control conversation. IAM controls who and what can access AWS resources. AWS KMS supports encryption key management for data at rest across many AWS services. AWS Secrets Manager helps keep secrets out of code and prompts. Amazon Macie can help discover sensitive data in Amazon S3 so teams understand what may be exposed before indexing or sharing content. CloudTrail records API activity for audit visibility. CloudWatch supports application and operational monitoring. Guardrails for Amazon Bedrock can help apply safety policies to Bedrock-based applications.

These controls should be paired with design choices. Separate instructions from retrieved data. Tell the model that retrieved content is untrusted evidence, not new operating instructions. Avoid putting secrets, credentials, or unnecessary personal data into prompts. Filter retrieval by user permissions and business need. Validate structured outputs before downstream systems use them. Use human approval before high-impact actions. Test with adversarial prompts and hostile documents before launch.

A non-builder can contribute by asking concrete questions:

What data can users cause the assistant to retrieve?
Can the assistant take actions, or only produce draft text?
Where are prompts, retrieved passages, and outputs logged?
How are sensitive fields redacted or excluded?
What happens when a user asks for private data or hidden instructions?
Who reviews prompt injection tests and incident logs?

Prompt injection cannot be solved by one clever phrase in the system prompt. It is a security and governance pattern. The business must decide what the assistant is allowed to know, say, and do. The application must enforce those decisions with access control, retrieval boundaries, output controls, monitoring, and escalation. That is why prompt injection belongs in both generative AI fundamentals and security governance discussions.

Test Your Knowledge

Which situation is the best example of indirect prompt injection?

A malicious instruction is hidden inside a document that the assistant retrieves as context.

A user asks a short but unclear question.

A model response is limited by maximum tokens.

An embedding model converts approved text into vectors.

Test Your Knowledge

Why should a GenAI application avoid placing secrets in prompts?

Prompts, context, outputs, or logs may expose data beyond its intended boundary.

Models cannot process text that contains numbers.

Secrets Manager requires secrets to be typed into every prompt.

Temperature settings encrypt prompt data automatically.

Test Your Knowledge

Which design principle is most important for a GenAI assistant that can retrieve internal documents?

Use permission-aware retrieval so users only access content they are authorized to see.

Let the model decide whether the user is authorized based on writing style.

Index every file in the company with no metadata.

Disable audit logs to reduce noise.

Up Next

4.7 Generative AI Foundations Case Lab

Continue learning

AWS AI Practitioner Study Guide

1Chapter 1: AIF-C01 Orientation and Official Source Control

2Chapter 2: AI/ML Foundations and Use-Case Fit

3Chapter 3: ML Lifecycle, Metrics, and Practitioner MLOps

4Chapter 4: Generative AI Foundations and Inference Concepts

5Chapter 5: Prompting, Model Selection, Customization, and Evaluation

6Chapter 6: Amazon Bedrock, RAG, Agents, and Guardrails

7Chapter 7: AWS Managed AI/ML Services and SageMaker Map

8Chapter 8: Responsible AI, Human Review, and Safety

9Chapter 9: Security, Compliance, Governance, and Cost Controls

10Chapter 10: Integrated AWS AI Business Scenario Labs

11Chapter 11: Final Review, Exam Readiness, and Recertification

4.6 Prompt Injection, Data Leakage, and User Input Risk

Key Takeaways

Treat user input as untrusted

Control matrix

AWS-aware risk reduction