7.4 End-to-End Solution Architectures

Key Takeaways

Document processing pipelines chain Document Intelligence (extract) -> Language (enrich: NER, PII, key phrases) -> AI Search (index + vectors) -> downstream app or Power BI.
Intelligent bots use a CLU orchestration model to route: FAQ intents to Custom Question Answering, action intents to a CLU domain model + backend API, and open-ended intents to Azure OpenAI grounded with RAG.
RAG is the flagship generative pattern (Implement generative AI, 15-20%): retrieve from AI Search, construct the prompt with context, generate with Azure OpenAI, then filter the output through Content Safety before returning with citations.
Safety checks belong at BOTH ends of a generative pipeline: Prompt Shields screen the input before generation; content filters and groundedness checks screen the output before it reaches the user.
A composed Document Intelligence model auto-classifies an incoming document and routes it to the correct component model, removing the need for a separate pre-classification step.

Last updated: June 2026

Quick Answer: Four patterns recur: document processing (Document Intelligence -> Language -> AI Search), orchestrated chatbot (CLU routes to Q&A, a CLU domain model, or RAG), enterprise RAG (AI Search retrieve -> OpenAI generate -> Content Safety filter), and moderation (Content Safety + blocklists + human review). The exam tests service responsibility and order of operations.

Architecture 1: Document Processing Pipeline

Upload -> Azure Blob Storage
  -> Document Intelligence  (text, tables, key-value pairs, classify, custom fields)
  -> Azure AI Language       (NER, key phrases, PII redaction, language detect)
  -> Azure AI Search         (index enriched content + vector embeddings)
  -> App / Power BI

Use cases: invoice processing, contract analysis, medical-record digitization, compliance review. A composed model in Document Intelligence auto-classifies mixed inbound documents (invoice vs receipt vs purchase order) and routes each to the right component model in one call.
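The stage order above can be sketched as a chain of functions. This is a minimal illustration, not a real integration: the `Document` class, the stage functions, and the sample text are all hypothetical stand-ins for the Document Intelligence, Azure AI Language, and AI Search calls, and no Azure SDK is used.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """Hypothetical record carried through the pipeline."""
    blob_url: str
    text: str = ""
    entities: List[str] = field(default_factory=list)
    indexed: bool = False
    trace: List[str] = field(default_factory=list)


Stage = Callable[[Document], Document]


def extract(doc: Document) -> Document:
    # Stand-in for Document Intelligence: returns fixed sample text.
    doc.text = "Invoice 1042 from Contoso, total 310.00 USD"
    doc.trace.append("extract")
    return doc


def enrich(doc: Document) -> Document:
    # Stand-in for Azure AI Language: a toy "entity" is any capitalized token.
    if not doc.text:
        raise ValueError("enrich must run after extract")
    doc.entities = [t.strip(",") for t in doc.text.split() if t[0].isupper()]
    doc.trace.append("enrich")
    return doc


def index(doc: Document) -> Document:
    # Stand-in for pushing the enriched document into an AI Search index.
    doc.indexed = True
    doc.trace.append("index")
    return doc


PIPELINE: List[Stage] = [extract, enrich, index]


def run_pipeline(blob_url: str, stages: List[Stage] = PIPELINE) -> Document:
    doc = Document(blob_url=blob_url)
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline("https://example.invalid/invoices/1042.pdf")
print(result.trace)
```

Keeping each service behind its own function makes the order of operations explicit, which is the point exam questions on this pattern tend to probe.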

Architecture 2: Orchestrated Intelligent Chatbot

User message -> Azure Bot Service (Teams / Web Chat)
  -> CLU orchestration model
       FAQ intent      -> Custom Question Answering -> curated answer
       Action intent   -> CLU domain model -> extract entities -> backend API
       Open-ended      -> Azure OpenAI + RAG -> grounded answer
       None            -> "I didn't catch that, please rephrase"
  -> Content Safety (filter response)
  -> Bot Service (reply)

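The orchestration model is the key exam concept: a single CLU orchestration workflow project links child Custom Question Answering and CLU projects and returns the winning intent, and the bot code uses that intent to hand open-ended requests to Azure OpenAI, so one bot handles FAQs, structured actions, and open-ended questions together.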
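The routing step can be sketched as a dispatch table keyed on the top intent. The intent names, the handler functions, and the 0.5 confidence threshold are assumptions made for illustration; a real bot would take the intent and score from the orchestration response and call the actual services.

```python
from typing import Callable, Dict

FALLBACK = "I didn't catch that, please rephrase"


def answer_faq(message: str) -> str:
    # Stand-in for Custom Question Answering.
    return f"[Custom Question Answering] curated answer for: {message}"


def run_action(message: str) -> str:
    # Stand-in for a CLU domain model plus backend API call.
    return f"[CLU domain model] entities extracted, backend called for: {message}"


def answer_open_ended(message: str) -> str:
    # Stand-in for Azure OpenAI grounded with RAG.
    return f"[Azure OpenAI + RAG] grounded answer for: {message}"


# Hypothetical intent names; use whatever the orchestration project defines.
ROUTES: Dict[str, Callable[[str], str]] = {
    "FAQ": answer_faq,
    "Action": run_action,
    "OpenEnded": answer_open_ended,
}


def route(message: str, top_intent: str, confidence: float,
          threshold: float = 0.5) -> str:
    """Dispatch on the top intent; unknown or low-confidence intents fall back."""
    handler = ROUTES.get(top_intent)
    if handler is None or confidence < threshold:
        return FALLBACK
    return handler(message)


print(route("What are your opening hours?", "FAQ", 0.92))
```

In the architecture above, whatever `route` returns would still pass through Content Safety before Bot Service sends the reply.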

Architecture 3: Enterprise RAG System

Ingestion: Blob / SharePoint / SQL
  -> AI Search indexer + skillset (OCR, NER, key phrases, embeddings)
  -> hybrid index (keyword + vector) + semantic ranking

Query time:
  1. User query -> Prompt Shields (block injection)
  2. Embed query -> vector + keyword retrieve from AI Search
  3. Build prompt = system message + retrieved context + query
  4. Azure OpenAI generates the answer
  5. Content Safety filters the output
  6. Groundedness check verifies support
  7. Return answer with citations

RAG, not fine-tuning, is the answer when the requirement is current or proprietary knowledge with citations. Fine-tuning changes style/format, not real-time facts.
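The query-time steps above can be sketched end to end with stubs. Every function here is a hypothetical placeholder: `shield_prompt` stands in for Prompt Shields, `retrieve` uses toy keyword overlap instead of hybrid search, and `generate`, `output_is_safe`, and `is_grounded` return canned values rather than calling Azure OpenAI or Content Safety.

```python
from typing import Dict, List

SYSTEM = "Answer only from the provided context and cite source ids."


def shield_prompt(query: str) -> bool:
    # Stand-in for Prompt Shields: True means the query looks like an injection.
    return "ignore previous instructions" in query.lower()


def retrieve(query: str, corpus: List[Dict[str, str]], top_k: int = 2) -> List[Dict[str, str]]:
    # Toy keyword-overlap ranking in place of hybrid retrieval from AI Search.
    terms = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda d: len(terms & set(d["text"].lower().split())),
                    reverse=True)
    return ranked[:top_k]


def build_prompt(system: str, docs: List[Dict[str, str]], query: str) -> str:
    # Prompt = system message + retrieved context + user query.
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in docs)
    return f"{system}\n\nContext:\n{context}\n\nQuestion: {query}"


def generate(prompt: str) -> str:
    return "Canned answer based on the supplied context."  # model stand-in


def output_is_safe(text: str) -> bool:
    return True  # Content Safety stand-in


def is_grounded(answer: str, docs: List[Dict[str, str]]) -> bool:
    return bool(docs)  # groundedness-check stand-in


def answer(query: str, corpus: List[Dict[str, str]]) -> Dict[str, object]:
    if shield_prompt(query):                       # step 1, before generation
        return {"answer": "Request blocked.", "citations": []}
    docs = retrieve(query, corpus)                 # step 2
    draft = generate(build_prompt(SYSTEM, docs, query))  # steps 3 and 4
    if not output_is_safe(draft) or not is_grounded(draft, docs):  # steps 5 and 6
        return {"answer": "No supported answer available.", "citations": []}
    return {"answer": draft, "citations": [d["id"] for d in docs]}  # step 7


CORPUS = [
    {"id": "hr-1", "text": "Employees receive 20 vacation days per year"},
    {"id": "fin-1", "text": "Expense reports are due monthly"},
    {"id": "hr-2", "text": "Vacation requests need manager approval"},
]
print(answer("how many vacation days", CORPUS))
```

Both safety gates sit inside `answer`, so nothing reaches the caller before the input and output checks have run.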

Architecture 4: Content Moderation Pipeline

User-generated content
  -> Content Safety (4 harm categories: hate, sexual, violence, self-harm)
  -> Blocklist check (org-specific terms)
  -> PII detection / redaction
  -> Decision:
       Safe (severity 0)      -> auto-approve
       Low/Medium (2-4)       -> human review queue
       High (6)               -> auto-reject + notify

For AI-generated content add Prompt Shields (pre-generation), content filters (input + output), groundedness, and protected-material detection.
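The decision step in the moderation pipeline can be sketched as a function over per-category severities. Taking the worst severity across categories is an assumption made here, as are the category keys and the returned labels; the thresholds follow the 0 / 2-4 / 6 bands shown above.

```python
from typing import Dict


def decide(severities: Dict[str, int]) -> str:
    """Map per-category severities (0, 2, 4, 6) to a moderation decision."""
    worst = max(severities.values(), default=0)
    if worst >= 6:
        return "auto-reject"
    if worst >= 2:
        return "human-review"
    return "auto-approve"


sample = {"Hate": 0, "Sexual": 0, "Violence": 4, "SelfHarm": 0}
print(decide(sample))
```

A stricter deployment could lower the review threshold per category; the single worst-case rule is only the simplest policy that matches the diagram.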

Production Deployment Checklist

Category	Requirement	Implementation
Security	No keys in code	Managed identity + Key Vault
Security	Network isolation	Private endpoints + VNet
Security	Least privilege	RBAC roles
Reliability	High availability	Multi-region deployment
Reliability	Transient faults	Exponential backoff, honor Retry-After
Reliability	Graceful degradation	Fallback response when AI fails
Monitoring	API metrics	Azure Monitor diagnostics
Monitoring	Safety events	Log content-filter triggers
Monitoring	Cost	Azure Cost Management + quota alerts
Compliance	Privacy	Encryption, residency, no model training
Compliance	Responsible AI	Filters, human oversight, transparency
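The transient-fault row in the checklist can be sketched as a retry wrapper. `TransientError` and `call_with_retry` are hypothetical names, and the delay parameters are illustrative defaults rather than recommended values; the idea is that a server-supplied Retry-After value wins, and exponential backoff with jitter is used otherwise.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error for throttling or 5xx responses."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("transient failure")
        self.retry_after = retry_after  # seconds from a Retry-After header, if any


def call_with_retry(operation: Callable[[], T], max_attempts: int = 5,
                    base_delay: float = 0.5, max_delay: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller fall back gracefully
            if err.retry_after is not None:
                delay = err.retry_after  # honor the server's hint
            else:
                delay = min(max_delay, base_delay * 2 ** attempt)
                delay += random.uniform(0, delay / 2)  # jitter spreads retries out
            sleep(delay)
    raise RuntimeError("unreachable")
```

The caller can catch the final `TransientError` and return a fallback response, which covers the graceful-degradation row as well.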

On the Exam: Order matters. Safety screening of generated output happens BEFORE the response reaches the user, and Prompt Shields run BEFORE generation — answers that place safety checks after delivery are wrong.

RAG vs Fine-Tuning vs Prompt Engineering

Generative-AI architecture questions almost always force a choice among three ways to make a model produce the right answer, and each fits a different need. Prompt engineering — system messages, few-shot examples, and templates — is the cheapest lever and the answer when the requirement is tone, format, or simple instruction-following. RAG grounds the model in retrieved data and is the answer whenever the requirement mentions current, proprietary, frequently changing, or citable knowledge: you update the index, not the model.

Fine-tuning retrains the model on labeled examples and is the answer only when you need a consistent specialized style or must encode a narrow task that prompting cannot reliably achieve. It is not a dependable way to teach the model new facts, and whatever it does learn goes stale the moment your data changes. A reliable elimination rule: if the stem says "up-to-date", "latest", or "company documents with citations", pick RAG; if it says "always respond in this exact format/persona", consider fine-tuning.
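As a study aid, the elimination rule can be written as a keyword check. The cue lists and the `pick_approach` function are hypothetical and deliberately crude; real exam stems need reading in full, so treat this as a mnemonic rather than a classifier.

```python
RAG_CUES = ("up-to-date", "latest", "current", "proprietary",
            "frequently changing", "citations")
FINE_TUNE_CUES = ("exact format", "persona", "consistent style")


def pick_approach(stem: str) -> str:
    """Apply the elimination rule: RAG cues first, then fine-tuning cues."""
    text = stem.lower()
    if any(cue in text for cue in RAG_CUES):
        return "RAG"
    if any(cue in text for cue in FINE_TUNE_CUES):
        return "consider fine-tuning"
    return "prompt engineering"


print(pick_approach("Answers must use the latest company documents with citations"))
```

RAG cues are checked first because a requirement for fresh or citable knowledge rules out fine-tuning regardless of any formatting needs in the same stem.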

Skillsets, Indexers, and the Knowledge Mining Pipeline

The knowledge-mining domain (15-20%) tests the AI Search ingestion chain in detail. A data source points at Blob Storage, SQL, or Cosmos DB. An indexer crawls that source on a schedule. A skillset is the enrichment graph attached to the indexer — built-in skills run OCR, entity recognition, key-phrase extraction, language detection, and embedding generation, and a custom skill calls your own Azure Function for logic the built-ins lack. Enriched output lands in the search index for querying and can also be persisted to a knowledge store for downstream analytics in tables, objects, or files.
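The relationship between indexer and skillset can be sketched as the JSON definitions sent to the service. This is an abbreviated, unvalidated sketch: the resource names, the custom skill URI, the output target names, and the two-hour schedule are placeholders, and a real skillset would need more properties than shown here.

```python
import json
from typing import Dict


def skill(odata_type: str, inputs: Dict[str, str], outputs: Dict[str, str], **extra) -> dict:
    """Build one skill entry in the general shape of a skillset definition."""
    entry = {
        "@odata.type": odata_type,
        "inputs": [{"name": n, "source": s} for n, s in inputs.items()],
        "outputs": [{"name": n, "targetName": t} for n, t in outputs.items()],
    }
    entry.update(extra)
    return entry


skillset = {
    "name": "docs-skillset",  # placeholder name
    "skills": [
        # Built-in OCR over images cracked out of each document.
        skill("#Microsoft.Skills.Vision.OcrSkill",
              {"image": "/document/normalized_images/*"}, {"text": "ocrText"},
              context="/document/normalized_images/*"),
        # Built-in key-phrase extraction over the document text.
        skill("#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
              {"text": "/document/content"}, {"keyPhrases": "keyPhrases"}),
        # Custom skill: calls your own endpoint (placeholder URI).
        skill("#Microsoft.Skills.Custom.WebApiSkill",
              {"text": "/document/content"}, {"result": "customResult"},
              uri="https://example.invalid/api/enrich", httpMethod="POST"),
    ],
}

indexer = {
    "name": "docs-indexer",
    "dataSourceName": "docs-blob-datasource",  # what to crawl
    "targetIndexName": "docs-index",           # where enriched output lands
    "skillsetName": skillset["name"],          # enrichment attached to the crawl
    "schedule": {"interval": "PT2H"},          # ISO 8601 duration
}

print(json.dumps(indexer, indent=2))
```

The indexer is the object that ties the other three together: it names the data source, the skillset, and the target index.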

For modern grounding you add vector fields plus semantic ranking so retrieval combines keyword precision with embedding recall. Architecture questions probe which component owns which step, so anchor each verb — crawl, enrich, project, index, rank — to its component.
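Combining a keyword ranking with a vector ranking is commonly done with Reciprocal Rank Fusion, which Azure AI Search documents as its method for merging hybrid results. The sketch below is a plain-Python illustration of the formula, not the service's implementation; the constant `k=60` and the tie-break on document id are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked id lists: each list adds 1 / (k + rank) to a document's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

Documents that appear in both lists accumulate score from each, so they tend to rise above documents that only one retrieval method found.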

Multimodal and Edge Deployment Patterns

Two deployment variations show up repeatedly. First, multimodal solutions chain modalities: Speech transcribes an audio clip, Language extracts entities and sentiment from the transcript, Vision or Content Understanding reads attached images, and the combined signal drives a decision — the exam wants the correct service per modality in the correct order. Second, container and edge deployment runs select Azure AI models (such as Language sentiment, Vision Read, or Speech) in Docker containers on-premises or on Azure IoT Edge for data-residency, low-latency, or intermittent-connectivity needs.

Standard (connected) containers still require periodic connectivity to report usage for billing, so "fully offline forever" is a wrong-answer trap unless the stem explicitly mentions disconnected containers, which are a separate, approval-gated offering. Choosing containers is correct when the requirement is "data cannot leave the building" or "must function during network outages".

Test Your Knowledge

In an orchestrated chatbot, the CLU orchestration model classifies a user message as an FAQ intent. Where should it route the message?

Azure OpenAI for a generative response

Custom Question Answering for a curated answer

Azure AI Language for entity extraction

Azure AI Content Safety for moderation

Test Your Knowledge

In a RAG pipeline, what is the correct order of operations after a user submits a query?

Generate the response, then retrieve documents, then check content safety

Retrieve documents, then generate the response, then return without further checks

Run Prompt Shields, retrieve from AI Search, generate with Azure OpenAI, filter output with Content Safety, return with citations

Check content safety, then generate, then retrieve documents

Test Your Knowledge

A document workflow receives a mix of invoices, receipts, and purchase orders and must route each to the right extraction model automatically. Which Document Intelligence feature handles this?

Prebuilt layout model

A standalone custom field model

A composed model

The general read model
