7.4 End-to-End Solution Architectures
Key Takeaways
- Document processing pipelines chain Document Intelligence (extract) -> Language (enrich: NER, PII, key phrases) -> AI Search (index + vectors) -> downstream app or Power BI.
- Intelligent bots use a CLU orchestration model to route: FAQ intents to Custom Question Answering, action intents to a CLU domain model + backend API, and open-ended intents to Azure OpenAI grounded with RAG.
- RAG is the flagship generative pattern (exam domain: Implement generative AI solutions, 15-20%): retrieve from AI Search, construct the prompt with context, generate with Azure OpenAI, then filter the output through Content Safety before returning with citations.
- Safety checks belong at BOTH ends of a generative pipeline: Prompt Shields screen the input before generation; content filters and groundedness checks screen the output before it reaches the user.
- A composed Document Intelligence model auto-classifies an incoming document and routes it to the correct component model, removing the need for a separate pre-classification step.
Quick Answer: Four patterns recur: document processing (Document Intelligence -> Language -> AI Search), orchestrated chatbot (CLU routes to Q&A, a CLU domain model, or RAG), enterprise RAG (AI Search retrieve -> OpenAI generate -> Content Safety filter), and moderation (Content Safety + blocklists + human review). The exam tests service responsibility and order of operations.
Architecture 1: Document Processing Pipeline
Upload -> Azure Blob Storage
-> Document Intelligence (text, tables, key-value pairs, classify, custom fields)
-> Azure AI Language (NER, key phrases, PII redaction, language detect)
-> Azure AI Search (index enriched content + vector embeddings)
-> App / Power BI
Use cases: invoice processing, contract analysis, medical-record digitization, compliance review. A composed model in Document Intelligence auto-classifies mixed inbound documents (invoice vs receipt vs purchase order) and routes each to the right component model in one call.
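The enrichment step of Architecture 1 can be sketched in self-contained Python under stated assumptions. The extracted text and entity list stand in for what Document Intelligence and Azure AI Language would return. `classify` is a hypothetical keyword stub standing in for a composed model's learned classifier; the real service does not classify by keywords. The sketch shows the ordering: classify, redact PII by character offset, then shape a document for the search index. All names (`Entity`, `IndexDocument`, `process`, the PII category set) are illustrative, not SDK types.

```python
from dataclasses import dataclass, field

# Hypothetical PII categories to mask; the real PII API defines its own list.
PII_CATEGORIES = {"Email", "PhoneNumber"}


@dataclass
class Entity:
    """Illustrative entity shape: text plus its character span in the source."""
    text: str
    category: str
    offset: int
    length: int


@dataclass
class IndexDocument:
    """Illustrative search-index document (not an Azure SDK type)."""
    id: str
    doc_type: str
    content: str
    entities: list = field(default_factory=list)


def classify(text: str) -> str:
    """Hypothetical stand-in for a composed model's classification step."""
    lowered = text.lower()
    if "purchase order" in lowered:
        return "purchase_order"
    if "invoice" in lowered:
        return "invoice"
    return "receipt"


def redact(text: str, entities: list) -> str:
    """Mask PII spans with '*', keeping the text length unchanged."""
    chars = list(text)
    for e in entities:
        if e.category in PII_CATEGORIES:
            chars[e.offset:e.offset + e.length] = "*" * e.length
    return "".join(chars)


def process(doc_id: str, extracted_text: str, entities: list) -> IndexDocument:
    # Classify and enrich first; only the redacted text reaches the index.
    return IndexDocument(
        id=doc_id,
        doc_type=classify(extracted_text),
        content=redact(extracted_text, entities),
        entities=[e.text for e in entities if e.category not in PII_CATEGORIES],
    )
```

Because masking preserves length, entity offsets computed on the original text stay valid for every span, whatever order they are processed in.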
Architecture 2: Orchestrated Intelligent Chatbot
User message -> Azure Bot Service (Teams / Web Chat)
-> CLU orchestration model
FAQ intent -> Custom Question Answering -> curated answer
Action intent -> CLU domain model -> extract entities -> backend API
Open-ended -> Azure OpenAI + RAG -> grounded answer
None -> "I didn't catch that, please rephrase"
-> Content Safety (filter response)
-> Bot Service (reply)
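The orchestration model is the key exam concept: a single orchestration workflow project in Azure AI Language connects child CLU and Custom Question Answering projects, so one bot handles FAQs and structured actions together. Azure OpenAI is not a connected child project; an intent with no connected project (such as open-ended) is handled by application code that calls Azure OpenAI with RAG.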
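The dispatch logic around the orchestration prediction can be sketched in Python. This is a hedged sketch: it assumes the app has already read the top intent name and its confidence score from the orchestration response. The intent names (`FAQ`, `Action`, `OpenEnded`), the 0.7 threshold, and the handler functions are hypothetical placeholders for calls to Custom Question Answering, a CLU domain model plus backend API, and Azure OpenAI with RAG.

```python
from typing import Callable, Dict

CONFIDENCE_THRESHOLD = 0.7  # assumption: tune per project
FALLBACK = "I didn't catch that, please rephrase"


# Hypothetical handlers; real ones would call the respective services.
def answer_faq(utterance: str) -> str:
    return f"[Custom Question Answering] curated answer for: {utterance}"


def run_action(utterance: str) -> str:
    return f"[CLU domain model -> backend API] {utterance}"


def ask_openai_rag(utterance: str) -> str:
    return f"[Azure OpenAI + RAG] grounded answer for: {utterance}"


ROUTES: Dict[str, Callable[[str], str]] = {
    "FAQ": answer_faq,
    "Action": run_action,
    "OpenEnded": ask_openai_rag,
}


def route(top_intent: str, confidence: float, utterance: str) -> str:
    handler = ROUTES.get(top_intent)
    # Unknown intents (including None) and low-confidence guesses share the fallback.
    if handler is None or confidence < CONFIDENCE_THRESHOLD:
        return FALLBACK
    return handler(utterance)
```

Sending low-confidence predictions to the same reply as the None intent is a design choice: it keeps the bot from acting on a guess.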
Architecture 3: Enterprise RAG System
Ingestion: Blob / SharePoint / SQL
-> AI Search indexer + skillset (OCR, NER, key phrases, embeddings)
-> hybrid index (keyword + vector) + semantic ranking
Query time:
1. User query -> Prompt Shields (block injection)
2. Embed query -> vector + keyword retrieve from AI Search
3. Build prompt = system message + retrieved context + query
4. Azure OpenAI generates the answer
5. Content Safety filters the output
6. Groundedness detection verifies the answer is supported by the retrieved sources
7. Return answer with citations
RAG, not fine-tuning, is the answer when the requirement is current or proprietary knowledge with citations. Fine-tuning changes style/format, not real-time facts.
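The query-time order can be made concrete with a Python sketch in which each Azure service is an injected callable. The callables (`shield`, `retrieve`, `generate`, `is_safe`, `is_grounded`) and the fallback messages are hypothetical stand-ins for Prompt Shields, AI Search, Azure OpenAI, Content Safety, and groundedness detection, not SDK functions. The point is that each check gates the next step, so nothing unscreened is returned.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chunk:
    source: str  # e.g. a file name, used as the citation
    text: str


SYSTEM_MESSAGE = "Answer only from the sources below and cite them as [n]."


def build_prompt(query: str, chunks: List[Chunk]) -> str:
    """Step 3: system message + numbered retrieved context + user query."""
    context = "\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(chunks, start=1)
    )
    return f"{SYSTEM_MESSAGE}\n\nSources:\n{context}\n\nQuestion: {query}"


def answer(
    query: str,
    shield: Callable[[str], bool],  # True if an attack is detected
    retrieve: Callable[[str], List[Chunk]],
    generate: Callable[[str], str],
    is_safe: Callable[[str], bool],
    is_grounded: Callable[[str, List[Chunk]], bool],
) -> dict:
    if shield(query):                                # 1. Prompt Shields
        return {"answer": "Request blocked.", "citations": []}
    chunks = retrieve(query)                         # 2. hybrid retrieval
    draft = generate(build_prompt(query, chunks))    # 3-4. prompt + generate
    if not is_safe(draft):                           # 5. output filter
        return {"answer": "Response withheld.", "citations": []}
    if not is_grounded(draft, chunks):               # 6. groundedness
        return {"answer": "No supported answer found.", "citations": []}
    return {"answer": draft, "citations": [c.source for c in chunks]}  # 7.
```

Injecting the services keeps the ordering testable without network calls; a blocked query never reaches `generate`.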
Architecture 4: Content Moderation Pipeline
User-generated content
-> Content Safety (4 harm categories: hate, sexual, violence, self-harm)
-> Blocklist check (org-specific terms)
-> PII detection / redaction
-> Decision:
Safe (severity 0) -> auto-approve
Low/Medium (2-4) -> human review queue
High (6) -> auto-reject + notify
For AI-generated content, add Prompt Shields (pre-generation), content filters (input + output), groundedness detection, and protected-material detection.
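The severity decision in the moderation pipeline can be sketched as a small Python function. It assumes the default four-level severity output for text (0, 2, 4, 6) and takes the worst severity across the four harm categories, keyed by the category names the Content Safety text API returns (`Hate`, `SelfHarm`, `Sexual`, `Violence`). The action labels are illustrative, and the blocklist and PII steps are left out.

```python
from typing import Dict

CATEGORIES = ("Hate", "SelfHarm", "Sexual", "Violence")


def decide(severities: Dict[str, int]) -> str:
    """Map the worst per-category severity to a moderation action."""
    # Missing categories are treated as severity 0 (safe).
    worst = max(severities.get(c, 0) for c in CATEGORIES)
    if worst == 0:
        return "auto-approve"
    if worst >= 6:
        return "auto-reject"
    return "human-review"  # low (2) or medium (4)
```

Taking the maximum means one high-severity category is enough to reject, even if the other three are clean.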
Production Deployment Checklist
| Category | Requirement | Implementation |
|---|---|---|
| Security | No keys in code | Managed identity + Key Vault |
| Security | Network isolation | Private endpoints + VNet |
| Security | Least privilege | RBAC roles |
| Reliability | High availability | Multi-region deployment |
| Reliability | Transient faults | Exponential backoff, honor Retry-After |
| Reliability | Graceful degradation | Fallback response when AI fails |
| Monitoring | API metrics | Azure Monitor diagnostics |
| Monitoring | Safety events | Log content-filter triggers |
| Monitoring | Cost | Azure Cost Management + quota alerts |
| Compliance | Privacy | Encryption, data residency, customer data not used for model training |
| Compliance | Responsible AI | Filters, human oversight, transparency |
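The "Exponential backoff, honor Retry-After" row can be sketched in Python. Assumptions: `ServiceError` is a hypothetical exception that your HTTP layer raises with the status code and a parsed Retry-After value in seconds, and the retryable status set and delay constants are illustrative defaults. Azure SDK clients generally include configurable retry policies, so in practice you would tune those rather than hand-roll this loop.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Status codes commonly treated as transient (throttling and server errors).
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


class ServiceError(Exception):
    """Hypothetical error with HTTP status and optional Retry-After (seconds)."""

    def __init__(self, status: int, retry_after: Optional[float] = None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after


def call_with_retry(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for attempt in range(max_attempts):
        try:
            return op()
        except ServiceError as err:
            last_attempt = attempt == max_attempts - 1
            if err.status not in RETRYABLE_STATUS or last_attempt:
                raise
            if err.retry_after is not None:
                delay = err.retry_after  # the server's hint takes precedence
            else:
                delay = min(max_delay, base_delay * 2 ** attempt)  # 1, 2, 4, ...
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

Injecting `sleep` keeps the function testable: passing a list's `append` method records the delays instead of waiting. Production code often adds random jitter to the computed delay so many clients do not retry in lockstep.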
On the Exam: Order matters. Safety screening of generated output happens BEFORE the response reaches the user, and Prompt Shields run BEFORE generation — answers that place safety checks after delivery are wrong.
RAG vs Fine-Tuning vs Prompt Engineering
Generative-AI architecture questions almost always force a choice among three ways to make a model produce the right answer, and each fits a different need. Prompt engineering — system messages, few-shot examples, and templates — is the cheapest lever and the answer when the requirement is tone, format, or simple instruction-following. RAG grounds the model in retrieved data and is the answer whenever the requirement mentions current, proprietary, frequently changing, or citable knowledge: you update the index, not the model.
Fine-tuning retrains the model on labeled examples and is the answer only when you need a consistent specialized style or to encode a narrow task that prompting cannot reliably achieve; it does not teach the model new facts and goes stale the moment your data changes. A reliable elimination rule: if the stem says "up-to-date", "latest", or "company documents with citations", pick RAG; if it says "always respond in this exact format/persona", consider fine-tuning.
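The elimination rule can be written as a toy Python function, purely as a study aid. The cue phrases are taken from or modeled on the wording above and are assumptions; real exam stems need judgment rather than string matching. RAG cues are checked first because fine-tuning cannot satisfy a freshness or citation requirement, and prompt engineering is the default as the cheapest lever.

```python
# Cue phrases (illustrative) modeled on the elimination rule above.
RAG_CUES = ("up-to-date", "latest", "company documents", "citations",
            "proprietary", "frequently changing")
FINE_TUNE_CUES = ("exact format", "persona", "consistent style")


def pick_approach(stem: str) -> str:
    s = stem.lower()
    # Knowledge requirements first: fine-tuning cannot keep facts current.
    if any(cue in s for cue in RAG_CUES):
        return "RAG"
    if any(cue in s for cue in FINE_TUNE_CUES):
        return "consider fine-tuning"
    # Tone, format, or simple instructions: start with the cheapest lever.
    return "prompt engineering"
```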
Skillsets, Indexers, and the Knowledge Mining Pipeline
The knowledge-mining domain (15-20%) tests the AI Search ingestion chain in detail. A data source points at Blob Storage, SQL, or Cosmos DB. An indexer crawls that source on a schedule. A skillset is the enrichment graph attached to the indexer — built-in skills run OCR, entity recognition, key-phrase extraction, language detection, and embedding generation, and a custom skill calls your own Azure Function for logic the built-ins lack. Enriched output lands in the search index for querying and can also be persisted to a knowledge store for downstream analytics in tables, objects, or files.
For modern grounding you add vector fields plus semantic ranking so retrieval combines keyword precision with embedding recall. Architecture questions probe which component owns which step, so anchor each verb — crawl, enrich, project, index, rank — to its component.
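How keyword and vector results are merged can be illustrated with Reciprocal Rank Fusion (RRF), the fusion method Azure AI Search documents for hybrid queries. This Python sketch scores each document as the sum of 1 / (k + rank) over the ranked lists it appears in. The value k = 60 is the constant commonly used for RRF, and the document IDs are made up. Semantic ranking, a separate reranking pass over the top fused results, is not modeled.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists; a higher fused score ranks first."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # sorted() is stable even with reverse=True, so ties keep first-seen order.
    return sorted(scores, key=scores.get, reverse=True)


keyword_hits = ["a", "b", "c"]  # e.g. keyword (BM25) order
vector_hits = ["c", "a", "d"]   # e.g. nearest-neighbour order
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))  # ['a', 'c', 'b', 'd']
```

Documents found by both retrievers ("a" and "c") rise to the top even though the two lists ordered them differently, which is the precision-plus-recall effect described above.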
Multimodal and Edge Deployment Patterns
Two deployment variations show up repeatedly. First, multimodal solutions chain modalities: Speech transcribes an audio clip, Language extracts entities and sentiment from the transcript, Vision or Content Understanding reads attached images, and the combined signal drives a decision — the exam wants the correct service per modality in the correct order. Second, container and edge deployment runs select Azure AI models (such as Language sentiment, Vision Read, or Speech) in Docker containers on-premises or on Azure IoT Edge for data-residency, low-latency, or intermittent-connectivity needs.
Standard (connected) containers still require periodic connectivity to Azure to report usage for billing, so "fully offline forever" is usually a wrong-answer trap; fully offline operation requires the separately approved disconnected-container offering. Choosing containers is correct when the requirement is "data cannot leave the building" or "must function during network outages".
In an orchestrated chatbot, the CLU orchestration model classifies a user message as an FAQ intent. Where should it route the message?
In a RAG pipeline, what is the correct order of operations after a user submits a query?
A document workflow receives a mix of invoices, receipts, and purchase orders and must route each to the right extraction model automatically. Which Document Intelligence feature handles this?