RAG, Grounding, Embeddings, and Azure AI Search

Key Takeaways

Retrieval-augmented generation grounds responses in trusted content by retrieving relevant chunks at query time and passing them to the model as context.
Chunking quality matters because the retriever matches chunks, not whole libraries; preserve headings, source IDs, permissions, and citation metadata.
Embeddings turn text or multimodal content into vectors so Azure AI Search can retrieve semantic matches even when query words differ from document words.
Hybrid search combines keyword and vector retrieval, while semantic ranker can rerank rich text results for better relevance in RAG workloads.
Evaluate RAG at both stages: score the retrieved chunks for retrieval relevance, then score the final answer for groundedness, relevance, completeness, safety, and citation quality.

Last updated: June 2026

Why RAG Is Tested Heavily

Retrieval-augmented generation (RAG) is the standard pattern for answering with current or private knowledge that a base model was not trained on. The model still generates the final answer, but the factual material comes from retrieved grounding data. On AI-103, RAG appears in Implement generative AI and agentic solutions (30-35%) and again in Implement information extraction solutions (10-15%), where the blueprint explicitly lists configuring semantic, hybrid, and vector search and building RAG ingestion flows with OCR.

Questions often ask you to diagnose hallucinations, select an Azure AI Search feature, or decide whether RAG, fine-tuning, or an agent tool is the right answer.

Classic RAG Pipeline

Stage	What happens	Common AI-103 trap
Ingest	Bring in documents, pages, records, images, or PDFs	Ignoring permissions and source metadata
Chunk	Split content into meaningful passages	Chunks too large, too small, or missing headings
Enrich	OCR, layout extraction, language analysis, cleanup	Treating scanned PDFs as plain text
Embed	Convert chunks into vectors, plus the query itself when using vector or hybrid retrieval	Changing embedding model without regenerating vectors
Index	Store text, vectors, filters, fields, and citations	Forgetting filterable fields for security trimming
Retrieve	Run vector, keyword, hybrid, or agentic retrieval	Using only one method when recall is poor
Generate	Put selected chunks into the prompt and answer with citations	Letting the model answer beyond retrieved evidence
Evaluate	Score retrieval and response quality	Testing only final prose, not retrieved chunks
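The Evaluate row warns against testing only the final prose. Below is a minimal sketch of scoring the retrieval stage on its own. It assumes you have a small labeled set that maps each test query to the chunk IDs that should come back. The metric choice (recall@k and mean reciprocal rank) and every name here are illustrative, not a Foundry or Azure AI Search API. Answer-side metrics such as groundedness usually need an LLM-based evaluator and are not shown.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the labeled relevant chunk IDs found in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for chunk_id in retrieved[:k] if chunk_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate_retrieval(runs: Dict[str, List[str]],
                       labels: Dict[str, Set[str]], k: int = 5) -> Dict[str, float]:
    """Average recall@k and MRR over every labeled test query."""
    queries = list(labels)
    if not queries:
        raise ValueError("need at least one labeled query")
    recall = sum(recall_at_k(runs.get(q, []), labels[q], k) for q in queries) / len(queries)
    mrr = sum(reciprocal_rank(runs.get(q, []), labels[q]) for q in queries) / len(queries)
    return {"recall@k": recall, "mrr": mrr}
```

If recall is low, fix chunking or retrieval before tuning the prompt. A fluent answer built on the wrong chunks will still fail groundedness.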

Chunking and Grounding

Good RAG starts before the first chat request. A policy handbook, API reference, or contract collection should be chunked around natural structure: headings, sections, tables, paragraphs, and page boundaries. Each chunk should carry document title, URL or blob path, page or section, last-modified date, access-control metadata, and any citation fields the user interface needs. These fields are what later make a security-trimming filter possible and a citation clickable.
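Here is a minimal sketch of structure-aware chunking that stamps every chunk with the metadata listed above. It assumes, hypothetically, that headings in the extracted text are lines starting with "#". Real documents would rely on layout extraction instead. The field names (`source_path`, `allowed_groups`, and so on) are illustrative and not a required Azure AI Search schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chunk:
    chunk_id: str
    text: str
    title: str
    source_path: str       # URL or blob path used to build a clickable citation
    section: str           # heading the chunk came from
    last_modified: str     # ISO date string
    allowed_groups: List[str] = field(default_factory=list)  # for security trimming


def chunk_by_headings(doc_text: str, title: str, source_path: str,
                      last_modified: str, allowed_groups: List[str]) -> List[Chunk]:
    """Split on heading lines (hypothetical '#' convention) and copy the
    document-level metadata onto every chunk."""
    chunks: List[Chunk] = []
    heading, buffer = "Introduction", []

    def flush() -> None:
        body = "\n".join(buffer).strip()
        if body:
            chunks.append(Chunk(
                chunk_id=f"{source_path}#{len(chunks)}",
                text=f"{heading}\n{body}",  # keep the heading inside the chunk text
                title=title, source_path=source_path, section=heading,
                last_modified=last_modified, allowed_groups=list(allowed_groups)))

    for line in doc_text.splitlines():
        if line.startswith("#"):
            flush()  # close the previous section before starting a new one
            heading, buffer = line.lstrip("#").strip(), []
        else:
            buffer.append(line)
    flush()
    return chunks
```

Keeping the heading inside `text` helps both the embedding and the keyword index. The separate `section` and `source_path` fields are what the UI later uses for citations.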

Chunk overlap (carrying a small window of text from the previous chunk into the next) helps when an answer straddles a section boundary, but too much overlap multiplies storage cost and returns near-duplicate hits. Very large chunks waste context-window space and dilute the embedding's meaning; very small chunks lose the explanation needed to answer. The practical target is not a magic token number: it is the smallest chunk that still preserves enough meaning for one answer and one citation. A common starting point is a few hundred tokens per chunk with modest overlap, which teams then tune through evaluation.
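The size/overlap trade-off can be made concrete with a fixed-size sliding window. This is a minimal sketch that approximates tokens as whitespace-separated words, which is an assumption. A production pipeline would count tokens with the embedding model's tokenizer. The defaults of 300 and 50 are placeholders to tune, not recommendations.

```python
from typing import List


def sliding_window_chunks(text: str, size: int = 300, overlap: int = 50) -> List[str]:
    """Split text into windows of `size` words; each window repeats the last
    `overlap` words of the previous one (words stand in for tokens here)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

Because each window advances by `size - overlap`, storage grows by roughly `size / (size - overlap)`. That is why heavy overlap inflates the index and produces near-duplicate hits.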

Why grounding beats fine-tuning for facts

When a model invents a plan code or a price, the defect is missing or wrong retrieved context, not a missing skill. Fine-tuning teaches behavior and style, not fresh facts; it cannot keep up with documents that change weekly, and retraining is slow and costly. RAG fixes the same problem by changing what the model sees at query time, which is why an AI-103 hallucination scenario almost always points to a retrieval or grounding fix rather than fine-tuning.

Embeddings, Vector Search, Hybrid Search, and Semantic Ranker

An embedding model converts text or supported multimodal content into numeric vectors. Azure AI Search stores those vectors and performs approximate nearest-neighbor retrieval for semantic similarity. That is why a query for "access badge replacement" can find a policy chunk titled "lost credential procedure" even though the two share no words. Critical rule: if you change the embedding model, you must re-embed and re-index every chunk, because vectors from the old and new models are not comparable.
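To see why mixing models breaks retrieval, here is a toy in-memory vector index. It ranks chunks by cosine similarity and records which embedding model produced its vectors, so it refuses queries from a different model. This is a hypothetical sketch: it does exact (exhaustive) search rather than Azure AI Search's approximate algorithms. The three-dimensional vectors and the model names are made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; vectors must have the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors have different dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorIndex:
    """Exact nearest-neighbor search over stored vectors (a teaching model only)."""

    def __init__(self, embedding_model: str):
        self.embedding_model = embedding_model  # model that produced the stored vectors
        self.vectors: Dict[str, List[float]] = {}

    def add(self, chunk_id: str, vector: List[float]) -> None:
        self.vectors[chunk_id] = vector

    def search(self, query_vector: List[float], query_model: str,
               k: int = 3) -> List[Tuple[str, float]]:
        # Different embedding models produce different vector spaces, so a score
        # between an old chunk vector and a new query vector is meaningless.
        if query_model != self.embedding_model:
            raise ValueError(
                f"index built with {self.embedding_model!r}; "
                f"re-embed all chunks before querying with {query_model!r}")
        scored = [(cid, cosine(query_vector, v)) for cid, v in self.vectors.items()]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:k]
```

A real index cannot always detect a model swap this way, especially when both models emit vectors of the same dimension. Recording the model name alongside the index and re-embedding on change is the safe habit.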

Use this contrast to answer retrieval questions:

Retrieval method	Strength	Weakness
Vector search	Concepts, synonyms, paraphrases	Misses exact codes, IDs, rare terms
Keyword (BM25) search	Exact identifiers, product names, error codes, legal terms	Misses paraphrased intent
Hybrid search	Runs both in one request and merges the result sets with Reciprocal Rank Fusion (RRF); the best enterprise default	Slightly higher complexity
Semantic ranker	Reranks top text results with language understanding	Adds latency and cost; needs a result set to rerank
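Azure AI Search merges the keyword and vector result lists of a hybrid query with Reciprocal Rank Fusion. This minimal sketch of RRF uses only document ranks, not raw scores, so BM25 and cosine scores never need to be put on the same scale. The constant k=60 comes from the original RRF formulation and is an assumption here, not a confirmed service setting. The document IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(result_lists: List[List[str]], k: int = 60) -> List[str]:
    """Each list adds 1 / (k + rank) to a document's score; documents ranked
    well by either retriever (or by both) rise to the top of the fused list."""
    scores: Dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


if __name__ == "__main__":
    keyword_hits = ["plan-B42", "dental-faq", "vision"]          # exact code matched
    vector_hits = ["dental-faq", "benefits-overview", "plan-B42"]  # paraphrase matched
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

Documents found by both retrievers, like the exact plan code and the paraphrased FAQ here, outrank documents found by only one. This is the behavior the benefits-assistant scenario needs.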

Azure AI Search also supports integrated vectorization, where indexers and skillsets chunk and embed content during indexing instead of requiring custom ingestion code. For production, add security trimming with a filterable field (such as group IDs) so each user retrieves only chunks they are allowed to see.
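The following sketch builds a security-trimming filter and a hybrid request body that uses it. The filter follows the documented OData pattern of a collection field with `any()` and `search.in`. The field names `group_ids` and `content_vector` and the citation field names are hypothetical. The `"kind": "text"` vector query assumes integrated vectorization, meaning a vectorizer is configured on the vector field so the service embeds the query text itself. Treat the body shape as an assumption and verify it against the REST API version you target.

```python
from typing import List, Optional


def security_filter(user_groups: List[str], field_name: str = "group_ids") -> str:
    """OData filter keeping only chunks whose group collection shares at least
    one group with the current user (security trimming)."""
    if not user_groups:
        # Fail closed: never send a query without a trimming filter.
        raise ValueError("user has no groups; refuse to query")
    if any("|" in g for g in user_groups):
        raise ValueError("group IDs must not contain the '|' delimiter")
    # Single quotes inside an OData string literal are escaped by doubling them.
    joined = "|".join(g.replace("'", "''") for g in user_groups)
    # '|' is passed as the search.in delimiter so IDs may contain commas or spaces.
    return f"{field_name}/any(g: search.in(g, '{joined}', '|'))"


def hybrid_request_body(query: str, user_groups: List[str],
                        semantic_config: Optional[str] = None, top: int = 5) -> dict:
    """Sketch of a hybrid search request: keyword + vector + security filter."""
    body = {
        "search": query,                  # keyword (BM25) half of the hybrid query
        "vectorQueries": [{
            "kind": "text",               # service-side vectorizer embeds the query
            "text": query,
            "fields": "content_vector",   # hypothetical vector field
            "k": 50,                      # nearest neighbors passed into fusion
        }],
        "filter": security_filter(user_groups),
        "select": "chunk_id,title,source_path,section",  # hypothetical citation fields
        "top": top,
    }
    if semantic_config:
        # Optional: rerank the fused results with semantic ranker.
        body["queryType"] = "semantic"
        body["semanticConfiguration"] = semantic_config
    return body
```

You could POST this body to the index's docs/search endpoint, or pass the same filter string to the `filter` parameter of `SearchClient.search` in the azure-search-documents Python package. Failing closed on an empty group list is a deliberate choice: an unfiltered query would leak every chunk.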

Citations and Refusal Behavior

Citations are not decorative — they let a user verify a generated claim against the source chunk and let evaluators measure groundedness. A grounded app should render citation metadata, refuse to make claims beyond the retrieved evidence, and ask a clarifying follow-up when retrieval returns nothing relevant. Two RAG-specific metrics matter on the exam: retrieval relevance (did we fetch the right chunks?) and groundedness (does the answer stay inside those chunks?). Test both — strong prose grounded in the wrong chunk is still a failure.
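Here is a minimal sketch of the behaviors described above. Retrieved chunks become numbered sources the model must cite, and the prompt forbids claims beyond them. When nothing clears a relevance threshold, the app skips the model call and asks a clarifying question. The `min_score` threshold, the prompt wording, and all names are hypothetical. Score scales differ by query type (keyword, vector, hybrid, semantic), so any threshold has to be tuned for your setup.

```python
from dataclasses import dataclass
from typing import List, Optional

CLARIFY = "I couldn't find that in the documents I can access. Could you rephrase or add details?"


@dataclass
class RetrievedChunk:
    chunk_id: str
    text: str
    source_path: str   # citation target shown to the user
    score: float       # retriever relevance score (scale depends on query type)


def build_grounded_prompt(question: str, chunks: List[RetrievedChunk],
                          min_score: float) -> Optional[str]:
    """Return a prompt restricted to numbered sources, or None when nothing
    relevant was retrieved so the caller can ask a clarifying question."""
    relevant = [c for c in chunks if c.score >= min_score]
    if not relevant:
        return None
    sources = "\n\n".join(
        f"[{i}] ({c.source_path})\n{c.text}" for i, c in enumerate(relevant, start=1))
    return (
        "Answer using only the numbered sources below. Cite each claim as [n]. "
        "If the sources do not contain the answer, say you do not know.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    hits = [
        RetrievedChunk("c1", "Lost badges are replaced by Security.", "handbook.pdf#p4", 2.8),
        RetrievedChunk("c2", "Parking permits renew yearly.", "facilities.pdf#p1", 0.4),
    ]
    prompt = build_grounded_prompt("How do I replace my badge?", hits, min_score=1.5)
    print(CLARIFY if prompt is None else prompt)  # a real app would send the prompt to a model
```

The `[n]` markers map back to `source_path`, so the UI can render clickable citations. Evaluators can also check whether each cited claim is supported by the chunk it points to.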

Official Anchors

RAG in Azure AI Search: https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview
Vector search overview: https://learn.microsoft.com/en-us/azure/search/vector-search-overview
Hybrid search in Azure AI Search: https://learn.microsoft.com/en-us/azure/search/hybrid-search-how-to-query
RAG evaluators in Foundry: https://learn.microsoft.com/en-us/azure/foundry/concepts/evaluation-evaluators/rag-evaluators

Test Your Knowledge

A benefits assistant answers from company PDFs. It misses queries that use everyday employee wording instead of policy terminology, and it also misses exact plan codes when employees provide them. What retrieval approach is the best first fix?

Use hybrid search so keyword matches cover exact codes while vector search covers semantic wording, then evaluate whether semantic ranker improves the top results

Disable chunking and send every benefits document to the model in each prompt

Fine-tune the model on the PDFs and remove all source citations

Use only semantic ranker without storing searchable text or vectors

Test Your Knowledge

An engineer swaps the embedding model to a newer one but keeps the existing vectors in the Azure AI Search index. Vector search relevance collapses. What is the root cause?

Semantic ranker was left enabled

The index lost its filterable fields for security trimming

Old vectors were produced by a different embedding model and are not comparable to new query vectors, so the index must be re-embedded

Chunk overlap is now too small

