1.5 Azure AI Solution Architecture Patterns

Key Takeaways

Microservices isolate each AI capability for independent scaling and fault containment; orchestration coordinates services in a pipeline via Functions, Logic Apps, or Durable Functions.
RAG (Azure AI Search retrieval + Azure OpenAI generation) is the most-tested architecture pattern on AI-102 and the cure for hallucination on enterprise data.
Edge deployment runs containerized AI on Azure IoT Edge for low-latency, intermittent-connectivity scenarios, but containers still need periodic connectivity for billing.
Language, Speech, Vision Read, Document Intelligence, and Translator offer Docker containers; not all cloud features are available in the container.
Multi-region deployment behind Front Door or Traffic Manager delivers HA and DR for AI workloads.

Last updated: June 2026

Quick Answer: The four patterns AI-102 tests are microservices (independent services), orchestration (pipeline coordination), RAG (Azure AI Search + Azure OpenAI), and edge/container (IoT Edge). Design for scalability, fault tolerance, cost, security, and compliance. RAG is the most heavily tested.

Pattern 1: Microservices

Each capability is its own deployable service behind an API gateway, scaling and failing independently.

[Client] -> [API Gateway / Azure API Management]
              |-- Vision Service   -> Azure AI Vision
              |-- Language Service -> Azure AI Language
              |-- Speech Service   -> Azure AI Speech
              |-- Search Service   -> Azure AI Search

Use when: large apps where capabilities have different scaling and release cadences and you want fault isolation (a Vision outage must not take down Language).

Pattern 2: Orchestration pipeline

A central coordinator — Azure Functions, Durable Functions, or Logic Apps — runs services in sequence or fan-out/fan-in.

[Input] -> [Orchestrator]
            |-- 1. OCR            (Document Intelligence)
            |-- 2. Entity + PII   (AI Language)
            |-- 3. Sentiment      (AI Language)
            |-- 4. Safety check   (Content Safety)
            |-- 5. Index results  (AI Search)

Use when: document-processing and content-enrichment workflows with multiple ordered steps. Durable Functions is the right pick when a question stresses long-running, stateful, or fan-out orchestration.

Pattern 3: RAG (Retrieval-Augmented Generation)

The flagship pattern. Retrieve grounded context, then generate.

[User query]
  -> embed query, search Azure AI Search (vector / hybrid + semantic ranking)
  -> build prompt: system message + retrieved chunks + user query
  -> Azure OpenAI chat completion (grounded answer + citations)
  -> Content Safety / groundedness check
  -> response to user

Use when: enterprise chatbots, knowledge bases, support assistants, document Q&A. RAG solves the core hallucination problem by forcing the model to answer from retrieved enterprise content. Exam questions pair Azure AI Search (retrieval) with Azure OpenAI (generation) — recognize that two-service signature instantly. Vector or hybrid search plus semantic ranking is the recommended retrieval configuration.

Pattern 4: Edge deployment with containers

Run inference locally on Azure IoT Edge for latency, offline tolerance, or data-residency.

[Camera / IoT device]
  -> Azure IoT Edge
       |-- Custom Vision container (local inference)
       |-- Speech container (local STT/TTS)
  -> Azure IoT Hub (sync results to cloud)

Use when: assembly-line defect detection, retail analytics, or remote sites with poor connectivity.

Container support and constraints

Service	Containerized	Typical edge use
Azure AI Language	Yes	Sentiment, NER, key phrases on-prem
Azure AI Speech	Yes	STT/TTS without internet dependency
Azure AI Vision (Read/OCR)	Yes	OCR at the edge
Document Intelligence	Yes	Form processing on-prem
Azure AI Translator	Yes	Offline translation

Four container facts the exam repeats: (1) the model runs locally, but the container still needs periodic connectivity to Azure for billing/metering and will stop after extended disconnection; (2) you must accept the EULA and pass Endpoint + ApiKey + Billing environment variables on docker run; (3) not every cloud feature is available in the container; (4) containers are chosen for latency, compliance, or connectivity, never for cheaper compute.

On the Exam: "Real-time inference with no reliable internet" => an edge container on IoT Edge, with the caveat that billing still requires periodic connectivity. A purely cloud answer fails the connectivity requirement.

Cost optimization

Lever	How it saves	Typical savings
Commitment-tier pricing	Pre-purchase usage at a discount	15-30%
Right-sizing	Match provisioned throughput to demand	Variable
Batch APIs	Defer non-urgent work to batch	40-60%
Caching	Reuse repeated results	Variable
F0 for dev	Free dev/test usage	100% (dev only)
Lower-cost regions	Deploy where compliance allows	5-20%

High availability and disaster recovery

Multi-region: deploy the AI resource in two or more regions and route with Azure Front Door or Traffic Manager, failing over automatically when a region is unhealthy.
Data protection: keep training data and exported custom models in geo-redundant storage (GRS), back up custom model configurations, and document training parameters for reproducibility.
Quota awareness: for Azure OpenAI, spread provisioned throughput or pay-as-you-go deployments across regions so a single-region quota limit cannot become a single point of failure.

Choosing between the patterns

The patterns are not mutually exclusive, and exam scenarios often blend them — a RAG chatbot is usually also a set of microservices behind an API gateway, with an orchestrator stitching retrieval, generation, and a safety check together. The skill is choosing the dominant pattern the scenario is really asking about. If the emphasis is independent scaling and fault isolation of distinct capabilities, the answer is microservices. If the emphasis is coordinating ordered, possibly stateful steps, the answer is orchestration (and Durable Functions for long-running or fan-out work).

If the emphasis is grounding generative answers in private enterprise data, the answer is RAG. If the emphasis is latency, offline operation, or keeping data on-premises, the answer is the edge/container pattern. Treat the requirement keywords as the discriminator: "scale each independently," "multi-step pipeline," "ground the answers," and "no reliable internet" each point to exactly one pattern.

Throughput, scaling, and resilience details

For Azure OpenAI specifically, two deployment models matter for architecture questions. Standard (pay-as-you-go) deployments share regional capacity and are billed per token — good for variable or low-volume workloads. Provisioned Throughput Units (PTUs) reserve dedicated capacity for predictable, high-volume, low-latency workloads with stable cost. When a scenario stresses consistent latency at scale or guaranteed capacity, PTUs are the answer; when it stresses cost efficiency for spiky traffic, standard is the answer.

Add retry with exponential backoff for 429 (throttling) responses, caching for repeated prompts, and a multi-region failover so a regional capacity shortfall degrades gracefully rather than failing outright. These resilience patterns combine with the cost levers above: batch APIs and caching cut spend, while PTUs and multi-region routing protect availability and latency.

On the Exam: When the requirement is grounded answers over enterprise data, the two-service RAG signature (Azure AI Search + Azure OpenAI) is the answer even if microservices or orchestration also appear in the diagram — the grounding requirement is what the question is really testing.

Test Your Knowledge

Which architecture pattern combines Azure AI Search retrieval with Azure OpenAI generation to ground answers in enterprise data?

RAG (Retrieval-Augmented Generation)

Microservices pattern

Edge deployment pattern

Batch processing pattern

Test Your Knowledge

Why does an Azure AI container deployed on-premises still require periodic internet connectivity?

For billing and metering

To download the model on every request

To authenticate end users via Entra ID

To fetch encryption keys from Key Vault

Test Your Knowledge

A factory needs real-time defect detection on a line with unreliable internet. Which deployment best fits?

Cloud-only Azure AI Vision

Batch processing with Azure Functions

A RAG pipeline with Azure OpenAI

A Custom Vision container on Azure IoT Edge

Test Your Knowledge

Which orchestration choice best fits a long-running, stateful, fan-out/fan-in AI processing workflow?

Azure Durable Functions

A single stateless Azure Function per request

A static Bicep template

An Azure Key Vault access policy

Up Next

2.1 Azure AI Content Safety Overview

Content Safety and Moderation (within Plan and Manage, Domain 1)

Azure AI Engineer Associate

Azure AI-102

1.5 Azure AI Solution Architecture Patterns

Key Takeaways

Pattern 1: Microservices

Pattern 2: Orchestration pipeline

Pattern 3: RAG (Retrieval-Augmented Generation)

Pattern 4: Edge deployment with containers

Container support and constraints

Cost optimization

High availability and disaster recovery

Choosing between the patterns

Throughput, scaling, and resilience details

Azure AI Engineer Associate

1Introduction

2Domain 1: Plan and Manage an Azure AI Solution (20-25%)

3Content Safety and Moderation (within Plan and Manage, Domain 1)

4Domain 4: Implement Computer Vision Solutions (10-15%)

5Domain 5: Implement Natural Language Processing Solutions (15-20%)

6Domain 6: Implement Knowledge Mining and Information Extraction Solutions (15-20%)

7Domain 2: Implement Generative AI Solutions (15-20%)

8Domain 3: Implement an Agentic Solution (5-10%)

9Exam Review: Cross-Domain Topics and Advanced Practice

Azure AI-102

1.5 Azure AI Solution Architecture Patterns

Key Takeaways

Pattern 1: Microservices

Pattern 2: Orchestration pipeline

Pattern 3: RAG (Retrieval-Augmented Generation)

Pattern 4: Edge deployment with containers

Container support and constraints

Cost optimization

High availability and disaster recovery

Choosing between the patterns

Throughput, scaling, and resilience details