5.4 Retrieval-Augmented Generation (RAG)
Key Takeaways
- RAG (Retrieval-Augmented Generation) enhances generative AI by retrieving relevant information from external data sources and including it in the prompt as grounding data.
- RAG solves the knowledge cutoff problem — the model can answer questions about your proprietary data and recent events that were not in its training data.
- The RAG pattern follows three steps: (1) retrieve relevant documents from a knowledge base, (2) include them in the prompt as context, (3) generate a grounded response.
- Azure AI Search is the primary Azure service for implementing RAG — it provides vector search, semantic ranking, and hybrid search for finding relevant documents.
- Embeddings convert text into numerical vectors that capture meaning, enabling semantic search (finding content by meaning, not just keywords).
Retrieval-Augmented Generation (RAG)
Quick Answer: RAG enhances generative AI by retrieving relevant documents from external sources and including them as context in the prompt. This grounds the model's response in factual data, reducing hallucinations and enabling answers about proprietary or recent information. Azure AI Search is the primary Azure service for implementing RAG.
What Is RAG?
Retrieval-Augmented Generation (RAG) is an architecture pattern that combines information retrieval with generative AI to produce more accurate, grounded, and up-to-date responses.
The Problem RAG Solves
Generative AI models have two fundamental limitations:
- Knowledge cutoff — they do not know about events after their training data was collected
- No proprietary data — they have no access to your organization's private documents, databases, or systems
RAG solves both problems by retrieving relevant information and injecting it into the prompt before the model generates a response.
RAG vs. Standard Generation
| Approach | How It Works | Accuracy | Data Access |
|---|---|---|---|
| Standard generation | Model responds from pre-trained knowledge only | Risk of hallucinations and outdated info | Only pre-training data |
| RAG | Retrieve relevant documents → include in prompt → generate grounded response | Higher accuracy, fewer hallucinations | Your data + pre-training |
How RAG Works
The Three-Step RAG Process
Step 1: Retrieve
- User asks a question
- The system searches a knowledge base (for example, an Azure AI Search index) for relevant documents
- Retrieval can use keyword search, vector search, or hybrid search
- Top-K most relevant documents or passages are selected
Step 2: Augment
- Retrieved documents are added to the prompt as context
- The prompt now includes: system message + retrieved context + user question
- This "augments" the model's knowledge with specific, relevant data
Step 3: Generate
- The generative AI model produces a response grounded in the retrieved context
- The response is based on the provided documents, not just pre-trained knowledge
- This significantly reduces hallucinations
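The three steps above can be sketched as a single function. Here `search_fn` and `generate_fn` are hypothetical stand-ins for a retrieval call (such as a query against Azure AI Search) and a chat-model call; the names and prompt wording are illustrative, not real SDK calls.

```python
def answer_with_rag(question, search_fn, generate_fn, top_k=3):
    # Step 1: Retrieve — find the top-K most relevant passages
    # for the user's question.
    passages = search_fn(question)[:top_k]

    # Step 2: Augment — build a prompt that injects the retrieved
    # passages as grounding context alongside the question.
    context = "\n\n".join(passages)
    prompt = (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

    # Step 3: Generate — the model produces a response grounded
    # in the retrieved context, not just its pre-trained knowledge.
    return generate_fn(prompt)
```

Because the retrieval step runs at query time, updating the knowledge base immediately changes the answers, with no retraining of the model.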
RAG Example
Without RAG:
User: "What is our company's parental leave policy?"
Model: "I'm sorry, I don't have information about your specific company's policies." (Or worse, it hallucinates a plausible-sounding policy.)
With RAG:
- Retrieve: Search company HR documents → find parental_leave_policy.pdf
- Augment: Add policy document to prompt as context
- Generate: Model responds: "According to your company's policy, employees are entitled to 16 weeks of paid parental leave, applicable to both birth and adoptive parents..."
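For concreteness, here is roughly what the augmented prompt from the "With RAG" flow might look like. The policy excerpt and wording are invented for illustration, not taken from a real document.

```python
# Hypothetical passage retrieved from parental_leave_policy.pdf
retrieved = (
    "Parental Leave Policy: Employees are entitled to 16 weeks of paid "
    "parental leave, applicable to both birth and adoptive parents."
)
question = "What is our company's parental leave policy?"

# System message + retrieved context + user question
prompt = (
    "You are an HR assistant. Answer using only the context provided.\n\n"
    f"Context:\n{retrieved}\n\n"
    f"Question: {question}"
)
print(prompt)
```

The model never needs the policy in its training data; the answer comes from the document placed in the prompt.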
Embeddings and Vector Search
What Are Embeddings?
Embeddings are numerical vector representations of text that capture its semantic meaning. Similar meanings produce similar vectors, enabling semantic search.
| Text | Vector (simplified) | Similar To |
|---|---|---|
| "How to reset my password" | [0.12, 0.87, 0.34, ...] | "Change login credentials" |
| "What's the weather like" | [0.91, 0.05, 0.73, ...] | "Temperature forecast today" |
Why Embeddings Matter for RAG
Traditional keyword search matches exact words: searching "reset password" only finds documents containing those exact words.
Semantic search with embeddings matches MEANING: searching "reset password" also finds documents about "change credentials," "update login," or "recover account access" — even if they don't contain the exact words "reset" or "password."
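A minimal sketch of why vector similarity captures meaning, using cosine similarity over the toy vectors from the table above. Real embeddings have hundreds or thousands of dimensions; these 3-D values are illustrative only.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors:
    # close to 1.0 = similar meaning, close to 0.0 = unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

password_q  = [0.12, 0.87, 0.34]  # "How to reset my password"
credentials = [0.15, 0.82, 0.30]  # "Change login credentials" (toy vector)
weather     = [0.91, 0.05, 0.73]  # "What's the weather like"

# The semantically related pair scores far higher than the unrelated one,
# even though the two phrases share no keywords.
assert cosine_similarity(password_q, credentials) > cosine_similarity(password_q, weather)
```

Vector search in a RAG system boils down to this comparison at scale: embed the query, then return the stored passages whose vectors score highest against it.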
Types of Search in Azure AI Search
| Search Type | How It Works | Best For |
|---|---|---|
| Keyword search | Matches exact terms (TF-IDF, BM25) | Finding specific terms and phrases |
| Vector search | Matches meaning using embeddings | Finding semantically similar content |
| Hybrid search | Combines keyword + vector search | Best overall retrieval quality |
| Semantic ranking | Re-ranks results using a language model | Improving result relevance |
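One common way to merge keyword and vector rankings into a single hybrid result list is Reciprocal Rank Fusion (RRF), which Azure AI Search uses for hybrid search. The sketch below is a generic illustration of the technique, not the service's exact implementation.

```python
def reciprocal_rank_fusion(rankings, k=60):
    # Merge several ranked result lists (e.g., keyword and vector results)
    # into one. Each document earns 1 / (k + rank) per list it appears in,
    # so documents ranked highly by BOTH searches rise to the top.
    # k = 60 is the conventional smoothing constant.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_results = ["doc_a", "doc_b", "doc_c"]
vector_results  = ["doc_b", "doc_d", "doc_a"]
print(reciprocal_rank_fusion([keyword_results, vector_results]))
# → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

`doc_b` wins because it ranks well in both lists, which is exactly the behavior that makes hybrid search more robust than either search type alone.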
Azure AI Search for RAG
Azure AI Search is the primary service for building RAG solutions on Azure:
Key Capabilities for RAG
| Feature | Description |
|---|---|
| Indexing | Ingest and index documents from Azure Blob, SQL, Cosmos DB, and more |
| Vector search | Store and search embedding vectors for semantic matching |
| Hybrid search | Combine keyword and vector search for best results |
| Semantic ranking | Use AI to re-rank results by relevance |
| AI enrichment | Apply AI skills during indexing (OCR, entity extraction, translation) |
| Integrated vectorization | Automatically generate embeddings during indexing and querying |
RAG Architecture on Azure
```
User Question
      │
      ▼
[Azure AI Search] ──── retrieve relevant documents
      │
      ▼
[Prompt Construction] ──── system message + retrieved docs + question
      │
      ▼
[Azure OpenAI Service] ──── generate grounded response
      │
      ▼
Grounded Answer
```
On the Exam: Know that RAG uses a retrieval step (Azure AI Search) to find relevant documents, then includes them in the prompt for the generative model (Azure OpenAI). This grounds responses in factual data and reduces hallucinations. You do NOT need to know how to implement RAG — just understand the concept and why it is important.
RAG vs. Fine-Tuning
| Aspect | RAG | Fine-Tuning |
|---|---|---|
| How it works | Retrieve relevant data at query time | Modify the model's weights with training data |
| Data freshness | Always uses latest data from knowledge base | Reflects data at time of fine-tuning |
| Cost | Search infrastructure costs | Training compute costs |
| Speed to implement | Fast (index documents, configure search) | Slow (prepare data, train, validate) |
| Best for | Dynamic data, Q&A over documents, current info | Consistent behavior change, style adaptation |
| Hallucination reduction | Strong (grounded in retrieved data) | Moderate (model still generates from weights) |
On the Exam: If a question asks about answering questions from company documents or providing up-to-date information, RAG is usually the correct answer. Fine-tuning is for changing the model's overall behavior or style, not for adding specific knowledge.
Review Questions
What is the primary purpose of RAG (Retrieval-Augmented Generation)?
Which Azure service is primarily used as the retrieval component in a RAG architecture?
What are "embeddings" in the context of generative AI?
A company wants employees to ask natural language questions about internal policies and receive accurate answers from their HR documents. Which approach is most appropriate?
Put the RAG process steps in the correct order: