7.4 End-to-End Solution Architectures

Key Takeaways

  • Document processing pipelines chain Document Intelligence (extract) -> Language (enrich: NER, PII, key phrases) -> AI Search (index + vectors) -> downstream app or Power BI.
  • Intelligent bots use a CLU orchestration model to route: FAQ intents to Custom Question Answering, action intents to a CLU domain model + backend API, and open-ended intents to Azure OpenAI grounded with RAG.
  • RAG is the flagship generative pattern (Implement generative AI, 15-20%): retrieve from AI Search, construct the prompt with context, generate with Azure OpenAI, then filter the output through Content Safety before returning with citations.
  • Safety checks belong at BOTH ends of a generative pipeline: Prompt Shields screen the input before generation; content filters and groundedness checks screen the output before it reaches the user.
  • A composed Document Intelligence model auto-classifies an incoming document and routes it to the correct component model, removing the need for a separate pre-classification step.

Last updated: June 2026

Quick Answer: Four patterns recur: document processing (Document Intelligence -> Language -> AI Search), orchestrated chatbot (CLU routes to Q&A, a CLU domain model, or RAG), enterprise RAG (AI Search retrieve -> OpenAI generate -> Content Safety filter), and moderation (Content Safety + blocklists + human review). The exam tests service responsibility and order of operations.

Architecture 1: Document Processing Pipeline

Upload -> Azure Blob Storage
  -> Document Intelligence  (text, tables, key-value pairs, classify, custom fields)
  -> Azure AI Language       (NER, key phrases, PII redaction, language detect)
  -> Azure AI Search         (index enriched content + vector embeddings)
  -> App / Power BI

Use cases: invoice processing, contract analysis, medical-record digitization, compliance review. A composed model in Document Intelligence auto-classifies mixed inbound documents (invoice vs receipt vs purchase order) and routes each to the right component model in one call.
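The stage order above can be sketched as a chain of functions, each standing in for one service. This is a minimal illustration only: the `Document` class and the `extract`, `enrich`, and `index` functions are hypothetical placeholders, not SDK calls, and the capitalized-word "entity" logic is a toy stand-in for real NER.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Document:
    name: str
    raw: bytes
    text: str = ""
    fields: dict = field(default_factory=dict)
    entities: list = field(default_factory=list)
    indexed: bool = False


Stage = Callable[[Document], Document]


def run_pipeline(doc: Document, stages: list[Stage]) -> Document:
    """Apply each stage in order; the list order IS the architecture."""
    for stage in stages:
        doc = stage(doc)
    return doc


def extract(doc: Document) -> Document:
    # Placeholder for Document Intelligence: text + structured fields.
    doc.text = doc.raw.decode("utf-8")
    doc.fields = {"doc_type": "invoice"}  # assumed classification result
    return doc


def enrich(doc: Document) -> Document:
    # Placeholder for Azure AI Language: toy "NER" keeps capitalized words.
    doc.entities = [w for w in doc.text.split() if w.istitle()]
    return doc


def index(doc: Document) -> Document:
    # Placeholder for Azure AI Search: refuse to index un-extracted content.
    if not doc.text:
        raise ValueError("extract must run before index")
    doc.indexed = True
    return doc


result = run_pipeline(
    Document("inv-001.pdf", b"Invoice from Contoso for 42 USD"),
    [extract, enrich, index],
)
```

Keeping the stages in one ordered list makes the exam's "order of operations" explicit: swapping `index` ahead of `extract` fails immediately.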

Architecture 2: Orchestrated Intelligent Chatbot

User message -> Azure Bot Service (Teams / Web Chat)
  -> CLU orchestration model
       FAQ intent      -> Custom Question Answering -> curated answer
       Action intent   -> CLU domain model -> extract entities -> backend API
       Open-ended      -> Azure OpenAI + RAG -> grounded answer
       None            -> "I didn't catch that, please rephrase"
  -> Content Safety (filter response)
  -> Bot Service (reply)

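The orchestration model is the key exam concept: a single orchestration workflow project whose intents link to child Custom Question Answering and CLU projects, so one bot handles FAQs and structured actions together. Azure OpenAI is not a linked child project; the bot code calls it when the orchestrator returns the open-ended intent.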
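The routing step can be sketched as a small dispatcher that takes the orchestrator's top intent and confidence and picks a handler. The intent names, the 0.5 threshold, and the three handler functions are assumptions made for illustration; they are not taken from any SDK.

```python
from typing import Callable

FALLBACK = "I didn't catch that, please rephrase"


def route(
    top_intent: str,
    confidence: float,
    utterance: str,
    handlers: dict[str, Callable[[str], str]],
    threshold: float = 0.5,  # assumed cut-off; tune per project
) -> str:
    """Send the utterance to the handler for the predicted intent.

    Low-confidence predictions and unknown intents (including "None")
    fall through to the rephrase prompt.
    """
    if confidence < threshold or top_intent not in handlers:
        return FALLBACK
    return handlers[top_intent](utterance)


# Hypothetical handlers standing in for the three back ends.
handlers = {
    "FAQ": lambda u: f"[Custom Question Answering] {u}",
    "Action": lambda u: f"[CLU domain model -> backend API] {u}",
    "OpenEnded": lambda u: f"[Azure OpenAI + RAG] {u}",
}

faq_reply = route("FAQ", 0.92, "What are your opening hours?", handlers)
none_reply = route("None", 0.88, "asdf qwerty", handlers)
unsure_reply = route("Action", 0.31, "book maybe something", handlers)
```

In the architecture above, whatever the chosen handler returns would still pass through Content Safety before the bot replies.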

Architecture 3: Enterprise RAG System

Ingestion: Blob / SharePoint / SQL
  -> AI Search indexer + skillset (OCR, NER, key phrases, embeddings)
  -> hybrid index (keyword + vector) + semantic ranking

Query time:
  1. User query -> Prompt Shields (block injection)
  2. Embed query -> vector + keyword retrieve from AI Search
  3. Build prompt = system message + retrieved context + query
  4. Azure OpenAI generates the answer
  5. Content Safety filters the output
  6. Groundedness check verifies support
  7. Return answer with citations

RAG, not fine-tuning, is the answer when the requirement is current or proprietary knowledge with citations. Fine-tuning changes style/format, not real-time facts.
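The seven query-time steps can be sketched as one function whose body follows the numbered order. Every service is passed in as a plain callable, so the parameter names (`shield`, `retrieve`, `generate`, `is_safe`, `is_grounded`) and the refusal text are hypothetical; a real system would wire these to Prompt Shields, AI Search, Azure OpenAI, and Content Safety.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Passage:
    source: str
    text: str


class Blocked(Exception):
    """Raised when an input or output safety check fails."""


def answer(
    query: str,
    *,
    shield: Callable[[str], bool],
    retrieve: Callable[[str], list[Passage]],
    generate: Callable[[str], str],
    is_safe: Callable[[str], bool],
    is_grounded: Callable[[str, list[Passage]], bool],
) -> dict:
    # 1. Screen the input BEFORE anything else runs.
    if not shield(query):
        raise Blocked("input rejected by prompt shield")
    # 2. Retrieve supporting passages (hybrid search in the real system).
    passages = retrieve(query)
    # 3. Build prompt = system message + retrieved context + query.
    context = "\n".join(f"[{i}] {p.text}" for i, p in enumerate(passages, 1))
    prompt = f"Answer only from the sources.\n{context}\nQuestion: {query}"
    # 4. Generate.
    draft = generate(prompt)
    # 5. Filter the output BEFORE it reaches the user.
    if not is_safe(draft):
        raise Blocked("output rejected by content filter")
    # 6. Verify the draft is supported by the retrieved passages.
    if not is_grounded(draft, passages):
        return {"answer": "No supported answer found.", "citations": []}
    # 7. Return with citations.
    return {"answer": draft, "citations": [p.source for p in passages]}
```

Passing the checks in as parameters keeps the ordering visible and makes each step easy to replace with a stub when testing.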

Architecture 4: Content Moderation Pipeline

User-generated content
  -> Content Safety (4 categories)
  -> Blocklist check (org-specific terms)
  -> PII detection / redaction
  -> Decision:
       Safe (severity 0)      -> auto-approve
       Low/Medium (2-4)       -> human review queue
       High (6)               -> auto-reject + notify

For AI-generated content add Prompt Shields (pre-generation), content filters (input + output), groundedness, and protected-material detection.
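The decision tier for user-generated content can be sketched as a function over the per-category severities. The thresholds follow the table above (0, 2-4, 6); treating a blocklist hit as an automatic reject is an assumption of this sketch, since the pipeline only says the check runs before the decision.

```python
def decide(severities: dict[str, int], blocklist_hit: bool = False) -> str:
    """Map Content Safety severities to an action.

    `severities` holds one score per harm category (hate, sexual,
    violence, self-harm). The worst category drives the decision.
    """
    worst = max(severities.values(), default=0)
    if blocklist_hit or worst >= 6:   # assumption: blocklist hit => reject
        return "auto-reject"
    if worst >= 2:
        return "human-review"
    return "auto-approve"


clean = decide({"hate": 0, "sexual": 0, "violence": 0, "self_harm": 0})
borderline = decide({"hate": 0, "sexual": 0, "violence": 4, "self_harm": 0})
severe = decide({"hate": 6, "sexual": 0, "violence": 2, "self_harm": 0})
```

Using the maximum rather than an average means a single high-severity category cannot be diluted by clean scores elsewhere.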

Production Deployment Checklist

Category      Requirement            Implementation
Security      No keys in code        Managed identity + Key Vault
Security      Network isolation      Private endpoints + VNet
Security      Least privilege        RBAC roles
Reliability   High availability      Multi-region deployment
Reliability   Transient faults       Exponential backoff, honor Retry-After
Reliability   Graceful degradation   Fallback response when AI fails
Monitoring    API metrics            Azure Monitor diagnostics
Monitoring    Safety events          Log content-filter triggers
Monitoring    Cost                   Azure Cost Management + quota alerts
Compliance    Privacy                Encryption, residency, no model training
Compliance    Responsible AI         Filters, human oversight, transparency
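The "Transient faults" row can be illustrated with a small retry helper. This is a sketch under stated assumptions: `TransientError` is a hypothetical exception standing in for a throttling or 5xx response, and its `retry_after` attribute stands in for the parsed Retry-After header.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a 429/503 response."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("transient failure")
        self.retry_after = retry_after  # seconds, if the service supplied it


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `fn` on TransientError.

    Honors the server's Retry-After when present; otherwise waits
    base_delay * 2**attempt plus a little random jitter.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller fall back
            if err.retry_after is not None:
                delay = err.retry_after
            else:
                delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

When the final attempt still fails, the caller is the natural place for the "Graceful degradation" row: catch the exception and return a fallback response.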

On the Exam: Order matters. Safety screening of generated output happens BEFORE the response reaches the user, and Prompt Shields run BEFORE generation — answers that place safety checks after delivery are wrong.

RAG vs Fine-Tuning vs Prompt Engineering

Generative-AI architecture questions almost always force a choice among three ways to make a model produce the right answer, and each fits a different need. Prompt engineering — system messages, few-shot examples, and templates — is the cheapest lever and the answer when the requirement is tone, format, or simple instruction-following. RAG grounds the model in retrieved data and is the answer whenever the requirement mentions current, proprietary, frequently changing, or citable knowledge: you update the index, not the model.

Fine-tuning retrains the model on labeled examples and is the answer only when you need a consistent specialized style or to encode a narrow task that prompting cannot reliably achieve; it does not teach the model new facts and goes stale the moment your data changes. A reliable elimination rule: if the stem says "up-to-date", "latest", or "company documents with citations", pick RAG; if it says "always respond in this exact format/persona", consider fine-tuning.
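The elimination rule can be written down as a toy keyword check. The trigger phrases come from the rule above; the function name and the choice to default to prompt engineering (the cheapest lever) are assumptions of this sketch, and real exam stems need reading, not string matching.

```python
RAG_CUES = ("up-to-date", "latest", "citations", "company documents")
FINE_TUNE_CUES = ("exact format", "persona")


def pick_approach(stem: str) -> str:
    """Apply the elimination rule to an exam question stem."""
    text = stem.lower()
    # RAG cues win first: fresh or citable knowledge cannot be fine-tuned in.
    if any(cue in text for cue in RAG_CUES):
        return "RAG"
    if any(cue in text for cue in FINE_TUNE_CUES):
        return "fine-tuning"
    return "prompt engineering"


a = pick_approach("Answers must reflect the latest HR policy with citations.")
b = pick_approach("The model must always respond in this exact format.")
c = pick_approach("Make the assistant sound friendlier.")
```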

Skillsets, Indexers, and the Knowledge Mining Pipeline

The knowledge-mining domain (15-20%) tests the AI Search ingestion chain in detail. A data source points at Blob Storage, SQL, or Cosmos DB. An indexer crawls that source on a schedule. A skillset is the enrichment graph attached to the indexer — built-in skills run OCR, entity recognition, key-phrase extraction, language detection, and embedding generation, and a custom skill calls your own Azure Function for logic the built-ins lack. Enriched output lands in the search index for querying and can also be persisted to a knowledge store for downstream analytics in tables, objects, or files.
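A skillset is defined as JSON, and its shape can be sketched as a Python dictionary. The `@odata.type` values below are the built-in skill identifiers as far as I know them; the skillset name and the custom-skill URI are hypothetical, and each skill's required `inputs` and `outputs` are omitted for brevity, so this is not a deployable definition.

```python
# Abbreviated skillset sketch: real skills also need "context",
# "inputs", and "outputs" entries.
skillset = {
    "name": "docs-enrichment",  # hypothetical name
    "skills": [
        {"@odata.type": "#Microsoft.Skills.Vision.OcrSkill"},
        {"@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill"},
        {"@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill"},
        {"@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill"},
        {"@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill"},
        {
            # Custom skill: calls your own endpoint, e.g. an Azure Function.
            "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
            "uri": "https://example.invalid/api/my-skill",  # placeholder
        },
    ],
}


def skill_kinds(definition: dict) -> list[str]:
    """Return the short skill names, in execution-graph listing order."""
    return [s["@odata.type"].rsplit(".", 1)[-1] for s in definition["skills"]]


kinds = skill_kinds(skillset)
```

The indexer references the skillset by name, which is how the enrichment graph attaches to the crawl.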

For modern grounding you add vector fields plus semantic ranking so retrieval combines keyword precision with embedding recall. Architecture questions probe which component owns which step, so anchor each verb — crawl, enrich, project, index, rank — to its component.
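Hybrid retrieval has to merge a keyword ranking and a vector ranking into one list, and AI Search does this with Reciprocal Rank Fusion. The sketch below shows the idea only: the constant `k = 60` is the commonly cited default, and the document ids are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists into one.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked well by BOTH keyword and vector search rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending; break ties by id for a stable result.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["a", "b", "c"]   # hypothetical keyword (BM25) ranking
vector_hits = ["b", "d", "a"]    # hypothetical embedding ranking
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Semantic ranking, when enabled, then reorders the top of this fused list as a separate step.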

Multimodal and Edge Deployment Patterns

Two deployment variations show up repeatedly. First, multimodal solutions chain modalities: Speech transcribes an audio clip, Language extracts entities and sentiment from the transcript, Vision or Content Understanding reads attached images, and the combined signal drives a decision — the exam wants the correct service per modality in the correct order. Second, container and edge deployment runs select Azure AI models (such as Language sentiment, Vision Read, or Speech) in Docker containers on-premises or on Azure IoT Edge for data-residency, low-latency, or intermittent-connectivity needs.

Containers still require periodic connectivity to report usage for billing, so "fully offline forever" is a wrong-answer trap. Choosing containers is correct when the requirement is "data cannot leave the building" or "must function during network outages".
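The billing requirement shows up directly in how a container is started: the `Eula`, `Billing`, and `ApiKey` arguments are mandatory. The helper below only assembles the command as a list; the image name, endpoint, and key are placeholders to be replaced with values from your own Azure resource.

```python
def container_run_args(
    image: str,
    billing_endpoint: str,
    api_key: str,
    host_port: int = 5000,
) -> list[str]:
    """Build a `docker run` argument list for an Azure AI container."""
    return [
        "docker", "run", "--rm",
        "-p", f"{host_port}:5000",
        image,
        "Eula=accept",                   # license acceptance is required
        f"Billing={billing_endpoint}",   # usage is reported to this resource
        f"ApiKey={api_key}",
    ]


args = container_run_args(
    image="mcr.microsoft.com/azure-cognitive-services/textanalytics/sentiment:latest",
    billing_endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    api_key="<your-key>",
)
```

Inference data stays on the local host; only usage metering goes to the `Billing` endpoint, which is why periodic connectivity is still needed.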

Test Your Knowledge

In an orchestrated chatbot, the CLU orchestration model classifies a user message as an FAQ intent. Where should it route the message?


In a RAG pipeline, what is the correct order of operations after a user submits a query?


A document workflow receives a mix of invoices, receipts, and purchase orders and must route each to the right extraction model automatically. Which Document Intelligence feature handles this?
