RAG, Grounding, Embeddings, and Azure AI Search

Key Takeaways

  • Retrieval-augmented generation grounds responses in trusted content by retrieving relevant chunks at query time and passing them to the model as context.
  • Chunking quality matters because the retriever matches chunks, not whole libraries; preserve headings, source IDs, permissions, and citation metadata.
  • Embeddings turn text or multimodal content into vectors so Azure AI Search can retrieve semantic matches even when query words differ from document words.
  • Hybrid search combines keyword and vector retrieval, while the semantic ranker can rerank text-rich results for better relevance in RAG workloads.
  • Evaluate RAG at both stages: retrieval relevance for the chunks and groundedness, relevance, completeness, safety, and citation quality for the final answer.
Last updated: June 2026

Why RAG Is Tested Heavily

Retrieval-augmented generation, or RAG, is the standard pattern for answering with current or private knowledge that a base model was not trained on. The model still generates the final answer, but the factual material comes from retrieved grounding data. AI-103 questions often ask you to diagnose hallucinations, select Azure AI Search features, or decide whether RAG, fine-tuning, or an agent tool is the right answer.

Classic RAG Pipeline

Stage | What happens | Common AI-103 trap
Ingest | Bring in documents, pages, records, images, or PDFs | Ignoring permissions and source metadata
Chunk | Split content into meaningful passages | Chunks too large, too small, or missing headings
Enrich | OCR, layout extraction, language analysis, cleanup | Treating scanned PDFs as plain text
Embed | Convert chunks and sometimes queries into vectors | Changing embedding model without regenerating vectors
Index | Store text, vectors, filters, fields, and citations | Forgetting filterable fields for security trimming
Retrieve | Run vector, keyword, hybrid, or agentic retrieval | Using only one method when recall is poor
Generate | Put selected chunks into the prompt and answer with citations | Letting the model answer beyond retrieved evidence
Evaluate | Score retrieval and response quality | Testing only final prose, not retrieved chunks
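
The Evaluate stage's trap, testing only the final prose, can be avoided by scoring the retriever on its own. The sketch below shows two common retrieval metrics, recall@k and reciprocal rank. The function names and the idea of a hand-labeled set of relevant chunk IDs per question are assumptions made for illustration, not part of any Azure SDK.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the labeled relevant chunks that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # nothing labeled relevant, so recall is undefined; report 0
    hits = set(retrieved_ids[:k]) & relevant
    return len(hits) / len(relevant)


def reciprocal_rank(retrieved_ids, relevant_ids):
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    relevant = set(relevant_ids)
    for rank, chunk_id in enumerate(retrieved_ids, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0
```

Averaging these scores over a set of test questions gives a retrieval baseline that can be tracked separately from answer-quality measures such as groundedness.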

Chunking and Grounding

Good RAG starts before the first chat request. A policy handbook, API reference, or contract collection should be chunked around natural structure: headings, sections, tables, paragraphs, and page boundaries. Each chunk should carry document title, URL or blob path, page or section, last modified date, access-control metadata, and any citation fields the UI needs.

Chunk overlap can help when answers span section boundaries, but too much overlap increases cost and duplicate retrieval. Very large chunks waste context window space. Very tiny chunks lose the explanation needed to answer a question. The practical target is not a magic token number; it is the smallest chunk that still preserves enough meaning for an answer and citation.
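
The overlap trade-off can be sketched as a sliding window over one section of a document. This is a minimal illustration, not a recommended chunker: it counts words as a stand-in for tokens, and the `Chunk` fields, `max_words`, and `overlap` values are hypothetical names and defaults chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source_id: str  # hypothetical field: blob path or URL of the parent document
    heading: str    # nearest heading, kept so the chunk stays interpretable
    position: int   # order of the chunk within its section


def chunk_section(section_text, source_id, heading, max_words=200, overlap=40):
    """Split one section into word windows that share `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = section_text.split()
    step = max_words - overlap
    chunks = []
    start = 0
    while start < len(words):
        window = words[start:start + max_words]
        chunks.append(Chunk(" ".join(window), source_id, heading, len(chunks)))
        if start + max_words >= len(words):
            break  # this window already reached the end of the section
        start += step
    return chunks
```

Calling `chunk_section` once per heading keeps chunks from straddling unrelated sections, and every chunk carries the metadata needed to render a citation later. Raising `overlap` produces more chunks to embed and store, which is the cost side of the trade-off described above.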

Embeddings, Vector Search, Hybrid Search, and Semantic Ranker

An embedding model converts text or supported multimodal content into numeric vectors. Azure AI Search can store those vectors and perform nearest-neighbor retrieval for semantic similarity. That is why a query for "access badge replacement" can find a policy chunk titled "Lost credential procedure" even without exact word overlap.
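
Nearest-neighbor retrieval can be illustrated with cosine similarity over toy vectors. Everything in this sketch is an assumption made for illustration: the three-dimensional vectors are invented rather than produced by a real embedding model, and the brute-force scan stands in for the approximate nearest-neighbor structures a production index would typically use.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def nearest_chunks(query_vector, chunk_vectors, k=3):
    """Return the k (chunk_id, score) pairs most similar to the query."""
    scored = [
        (chunk_id, cosine_similarity(query_vector, vector))
        for chunk_id, vector in chunk_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Invented vectors; a real model would emit hundreds or thousands of dimensions.
toy_index = {
    "lost-credential-procedure": [0.9, 0.1, 0.0],
    "parking-policy": [0.1, 0.9, 0.1],
    "expense-reports": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]  # stands in for "access badge replacement"
best_id, best_score = nearest_chunks(toy_query, toy_index, k=1)[0]
```

With these toy values the query lands closest to the lost-credential chunk even though the two share no words. The same comparison only makes sense when query and chunks were embedded by the same model, which is why switching models requires regenerating the stored vectors.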

Pure vector search is strong for concepts. Keyword search is strong for exact identifiers, product names, error codes, and legal terms. Hybrid search runs both and merges the result sets, which is often the best default for enterprise RAG. The semantic ranker adds another query-time step that uses language understanding to rerank text-rich results. Use it when measured relevance improves enough to justify the added latency and cost.
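
Merging keyword and vector results is commonly done with Reciprocal Rank Fusion (RRF), which Azure AI Search documents as its hybrid merge method. The sketch below illustrates the idea only and is not the service's implementation; the constant `k=60` is a conventional choice assumed here, and the document IDs are invented.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Each appearance of an ID contributes 1 / (k + rank), so items ranked
    well by more than one retriever rise to the top.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["plan-7731", "benefits-faq", "dental"]   # exact plan code match first
vector_hits = ["benefits-faq", "vision", "plan-7731"]    # conceptual matches
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists outrank documents that only one retriever found, which is how hybrid search can recover both an exact identifier and a paraphrased concept in the same query.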

Azure AI Search also supports integrated vectorization, where indexers and skillsets can chunk and embed content during indexing instead of requiring custom ingestion code. For production, add security trimming so users retrieve only chunks they are allowed to see.
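
Security trimming is usually expressed as a filter on a filterable field that lists who may see each chunk. The sketch below builds an OData filter string using the `search.in` function inside an `any` lambda, a pattern documented for Azure AI Search. The field name `group_ids` and the helper itself are assumptions for illustration, and the index would need a matching filterable collection field.

```python
def build_group_filter(user_group_ids, field="group_ids"):
    """Build an OData filter matching chunks visible to any of the user's groups."""
    cleaned = []
    for group_id in user_group_ids:
        # Reject values that could break out of the quoted, comma-delimited list.
        if not group_id or any(ch in group_id for ch in ",'"):
            raise ValueError(f"unsupported group id: {group_id!r}")
        cleaned.append(group_id)
    if not cleaned:
        # Never fall back to an unfiltered search for a user with no groups.
        raise ValueError("user has no groups; skip retrieval instead")
    joined = ",".join(cleaned)
    return f"{field}/any(g: search.in(g, '{joined}', ','))"
```

The resulting string would be passed as the filter of the search request so trimming happens inside the index, before any chunk reaches the prompt. Filtering after generation is too late, because the model has already seen the restricted text.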

Citations and Refusal Behavior

Citations are not decorative. They let the user verify a generated claim against the source chunk. A grounded app should render citation metadata, avoid unsupported claims, and refuse or ask a follow-up question when retrieval does not provide enough evidence.
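
The refusal behavior can be sketched as a gate in front of the model call. This is a minimal illustration under stated assumptions: the chunk dictionaries with `text`, `source`, and `score` keys, the `min_score` threshold of 0.5, and the `generate` callable are all hypothetical, and a usable threshold depends on the retrieval method and has to be tuned against real queries.

```python
REFUSAL = "I could not find enough evidence in the indexed sources to answer that."


def build_grounded_prompt(question, chunks, min_score=0.5):
    """Return a prompt with numbered sources, or None if evidence is too weak."""
    usable = [chunk for chunk in chunks if chunk["score"] >= min_score]
    if not usable:
        return None  # caller should refuse or ask a follow-up question
    sources = []
    for number, chunk in enumerate(usable, start=1):
        sources.append(f"[{number}] ({chunk['source']}) {chunk['text']}")
    return (
        "Answer using only the numbered sources below. "
        "Cite sources as [n]. If they do not contain the answer, say so.\n\n"
        + "\n".join(sources)
        + f"\n\nQuestion: {question}"
    )


def answer_or_refuse(question, chunks, generate):
    """Call the model only when retrieval produced usable evidence."""
    prompt = build_grounded_prompt(question, chunks)
    if prompt is None:
        return REFUSAL
    return generate(prompt)
```

Numbering the sources in the prompt gives the UI something to map back to citation metadata, so each `[n]` in the answer can link to the chunk's document and page.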

Test Your Knowledge

A benefits assistant answers from company PDFs. It misses queries that use everyday employee wording instead of policy terminology, and it also misses exact plan codes when employees provide them. What retrieval approach is the best first fix?

A
B
C
D