4.2 Tokens, Context Windows, Embeddings, and Vector Search

Key Takeaways

  • Tokens are the units a generative model reads and writes, so token volume affects cost, latency, and how much context can fit in one request.
  • A context window is the total space available for instructions, user input, retrieved content, and output, but a larger window does not automatically mean better answers.
  • Embeddings convert content into numeric vectors that represent semantic meaning and support similarity search.
  • Vector search is a core building block for retrieval-augmented generation (RAG), where trusted content is retrieved before a model generates an answer.

Tokens and context windows

A token is a chunk of text that a model processes. In English, a token may be a whole word, part of a word, punctuation, or a formatting marker, and the exact tokenization depends on the model. Practitioners do not need to tokenize text manually, but they should understand that model input and output are measured in tokens, because token counts drive cost, latency, and the maximum amount of information a model can consider in one request.
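
To make that concrete, the snippet below counts tokens with the open-source tiktoken library. This is illustrative only: each model family ships its own tokenizer, so the counts for the same text differ from model to model.

```python
# Illustrative token counting. "cl100k_base" is one of tiktoken's built-in
# encodings, used here only as an example; Bedrock models use their own
# tokenizers, so real counts will differ.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

text = "Please reset my corporate password."
tokens = encoding.encode(text)

print(len(tokens))               # how many tokens this encoding produces
print(encoding.decode(tokens))   # decoding round-trips the original text
```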

A context window is the model's working space for a single interaction. It includes system instructions, developer or application instructions, user input, retrieved passages, examples, conversation history, and the model's generated output. If too much content is included, the request may exceed the window or force the application to remove useful material. If too little content is included, the model may answer from general knowledge rather than the organization's source of truth.
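
A sketch of that budgeting decision appears below. The count_tokens helper is a crude stand-in (roughly four characters per token is a common rule of thumb for English); a real application would use the tokenizer that matches its model.

```python
# Context-budget sketch. count_tokens() is a rough stand-in; swap in the
# tokenizer that matches your model before trusting the numbers.

def count_tokens(text: str) -> int:
    return max(1, len(text) // 4)   # ~4 characters per token, English heuristic

def fit_context(instructions: str, question: str, passages: list[str],
                window: int = 8_000, reserved_for_output: int = 1_000) -> list[str]:
    """Keep the highest-ranked passages that still fit in the window."""
    budget = window - reserved_for_output
    budget -= count_tokens(instructions) + count_tokens(question)
    kept = []
    for passage in passages:        # assumed sorted most-relevant first
        cost = count_tokens(passage)
        if cost > budget:
            break                   # dropping, not truncating, keeps chunks intact
        kept.append(passage)
        budget -= cost
    return kept
```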

A larger context window is helpful, but it is not the same as accuracy or memory. Long prompts can bury the most important facts, mix outdated and current policy, or include irrelevant content that distracts the model. The practical question is not simply how many tokens fit, but whether the application gives the model the right evidence, in the right order, with clear instructions about what to do when evidence is missing.

Embeddings and vector search

Embeddings are numeric representations of meaning. An embedding model converts text, and sometimes other content types, into a vector: a list of numbers that captures semantic relationships. Similar ideas tend to have vectors that are close together, even when the exact words differ. That is why a search for "password reset" can find a document that says "account recovery." The application is not just matching keywords; it is comparing meaning.
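
The geometry behind that claim can be shown with plain vector math. The three-dimensional vectors below are toy values; real embedding models produce hundreds or thousands of dimensions, but cosine similarity works the same way.

```python
# Cosine similarity over toy 3-dimensional "embeddings". Real embeddings
# have hundreds or thousands of dimensions; the comparison is identical.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

password_reset   = [0.9, 0.1, 0.3]   # "password reset"
account_recovery = [0.8, 0.2, 0.4]   # "account recovery": close in meaning
lunch_menu       = [0.1, 0.9, 0.0]   # "cafeteria lunch menu": unrelated

print(cosine_similarity(password_reset, account_recovery))  # high, ~0.98
print(cosine_similarity(password_reset, lunch_menu))        # low,  ~0.21
```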

Vector search uses those embeddings to find similar content quickly. In a retrieval-augmented generation workflow, the application embeds a user question, searches a vector index for relevant chunks, sends those chunks to the foundation model as context, and asks the model to answer using the retrieved material. AWS services support this pattern in different ways. Knowledge Bases for Amazon Bedrock can connect data sources, create embeddings, manage a vector store, and retrieve relevant context for Bedrock applications.
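
The sketch below shows that flow through a Bedrock knowledge base with boto3. The knowledge base ID and model ARN are placeholders, and the request shape should be verified against current AWS documentation before use.

```python
# RAG in one managed call via a Bedrock knowledge base. The knowledge base
# ID and model ARN are placeholders; verify the request shape against the
# current boto3 / Amazon Bedrock documentation.
import boto3

client = boto3.client("bedrock-agent-runtime")

response = client.retrieve_and_generate(
    input={"text": "How do I reset my corporate password?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR-KB-ID",   # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/YOUR-MODEL-ID",
        },
    },
)

print(response["output"]["text"])   # generated answer; the response also
                                    # carries citations back to source chunks
```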

Amazon OpenSearch Service and Amazon OpenSearch Serverless are common choices for the vector store itself in AWS architectures.
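
As a sketch, a k-nearest-neighbor query against an OpenSearch index could look like the following. The index name, the vector field, and the embed() helper are hypothetical, and the index is assumed to be mapped with a knn_vector field.

```python
# k-NN query sketch with the opensearch-py client. The index name, the
# chunk_embedding field, and embed() are hypothetical; the index is assumed
# to have a knn_vector mapping.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=["https://your-domain-endpoint:443"])

query_vector = embed("how do I reset my password?")   # hypothetical helper

results = client.search(
    index="company-docs",
    body={
        "size": 3,   # return the 3 nearest chunks
        "query": {
            "knn": {
                "chunk_embedding": {"vector": query_vector, "k": 3}
            }
        },
    },
)

for hit in results["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["text"])
```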

| Building block | Practical purpose | Common pitfall |
| --- | --- | --- |
| Token | Unit of model input or output | Ignoring token cost and response length |
| Context window | Space for prompt, evidence, history, and output | Filling it with noisy or stale material |
| Embedding | Numeric representation of meaning | Assuming embeddings prove facts are true |
| Vector index | Search structure for similar vectors | Forgetting permissions, freshness, or source quality |
| RAG | Retrieve trusted context before generation | Treating retrieval as a guarantee against every error |

What a non-builder should ask

A business sponsor does not need to tune embedding dimensions, but the sponsor should ask whether the content pipeline is trustworthy. Retrieval quality depends on clean source documents, sensible chunking, useful metadata, access controls, and regular refresh. A policy manual split into arbitrary fragments may surface incomplete guidance. A knowledge base that indexes obsolete files may produce answers that are fluent but wrong. A vector index without permission filtering can expose information to users who should not see it.
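
To see why chunking matters, the naive splitter below cuts on a fixed character count with overlap, so a sentence severed at one boundary still appears whole in a neighboring chunk. This is a sketch; production pipelines usually split on document structure such as headings and paragraphs.

```python
# Naive fixed-size chunker with overlap. Production pipelines usually split
# on structure (headings, paragraphs) and attach metadata such as source,
# section, and last-reviewed date to every chunk.

def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap   # step back so boundary sentences repeat
    return chunks
```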

For an AWS AI Practitioner candidate, the scenario judgment is straightforward. If a chatbot must answer questions from private company documents, plain prompting is usually weaker than a retrieval design. If the task is open-ended creative drafting, embeddings may be less central. If the answer must cite approved procedures, retrieval, citations, and refusal behavior become important. When cost or latency is tight, the team may need smaller prompts, better retrieval filters, or a model that balances speed and quality.

Use this readiness checklist before approving a vector-search-backed GenAI workflow:

  • Source documents are authoritative, current, and owned by a business team.
  • Sensitive data has been reviewed before indexing, with help from services such as Amazon Macie when appropriate.
  • Users can only retrieve content they are allowed to access, enforced through IAM-aware application design and source permissions (see the sketch after this list).
  • The application has a plan for content refresh, deletion, retention, and audit logging.
  • Test questions cover synonyms, abbreviations, missing information, and conflicting documents.
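
One way to make the access-control item concrete is an application-level filter that drops any retrieved chunk the requesting user cannot see. The allowed_groups metadata field below is hypothetical; stronger designs also filter inside the vector store itself so restricted chunks never leave the index.

```python
# Application-level permission filter. The allowed_groups metadata field is
# hypothetical; stronger designs also filter inside the vector store so
# restricted chunks are never retrieved at all.

def filter_by_permission(chunks: list[dict], user_groups: set[str]) -> list[dict]:
    """Keep only chunks whose allowed_groups overlap the user's groups."""
    return [
        chunk for chunk in chunks
        if user_groups & set(chunk.get("allowed_groups", []))
    ]

chunks = [
    {"text": "HR leave policy...", "allowed_groups": ["hr", "all-staff"]},
    {"text": "M&A due diligence notes...", "allowed_groups": ["exec"]},
]

print(filter_by_permission(chunks, user_groups={"all-staff"}))
# only the HR chunk survives
```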

The important mental model is that embeddings find likely relevant content, while the foundation model turns selected context into a natural-language answer. Both pieces can fail in different ways. Retrieval can miss the right document, retrieve a weak chunk, or expose stale information. Generation can overstate, omit uncertainty, or blend retrieved facts with general language. Strong designs evaluate the full path, not only the model.
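
A lightweight way to evaluate the retrieval half of that path is recall@k over a hand-written test set, as sketched below; search() is a hypothetical helper that returns the document IDs of the top-k retrieved chunks.

```python
# Recall@k over a hand-written test set. search() is a hypothetical helper
# returning the document IDs of the top-k retrieved chunks for a question.

def recall_at_k(test_set: list[dict], k: int = 5) -> float:
    """Fraction of questions whose expected document shows up in the top k."""
    hits = sum(
        1 for case in test_set
        if case["expected_doc_id"] in search(case["question"], k=k)
    )
    return hits / len(test_set)

test_set = [
    {"question": "How do I reset my password?",      "expected_doc_id": "it-0042"},
    {"question": "What is the PTO carryover limit?", "expected_doc_id": "hr-0007"},
    {"question": "Who approves travel over $5,000?", "expected_doc_id": "fin-0113"},
]
```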

Test Your Knowledge

Why do tokens matter in a generative AI application?

What is the role of embeddings in a RAG workflow?

A model has a very large context window. What should a practitioner still worry about?
