6.1 Azure OpenAI Service — Models and Deployment

Key Takeaways

  • Azure OpenAI Service provides access to OpenAI models (GPT-4o, GPT-4, GPT-3.5 Turbo, DALL-E 3, Whisper, text-embedding-3-large) within the Azure platform with enterprise security.
  • Models must be explicitly deployed to an Azure OpenAI resource before they can be used — deployment creates an endpoint with a specific model version.
  • Deployment types include Standard (shared infrastructure, pay-per-token), Provisioned (dedicated throughput units), and Global (cross-region routing).
  • Azure OpenAI offers the same models as OpenAI but adds enterprise features: VNet integration, managed identity, content filtering, and compliance certifications.
  • Each model family has different capabilities: GPT-4o (multimodal text+vision), GPT-4 (text generation), DALL-E 3 (image generation), Whisper (speech-to-text), embeddings (vector representations).
Last updated: March 2026


Quick Answer: Azure OpenAI provides GPT-4o (multimodal), GPT-4 (text), GPT-3.5 Turbo (fast text), DALL-E 3 (images), Whisper (speech-to-text), and embedding models within Azure. Deploy models to your Azure OpenAI resource with Standard, Provisioned, or Global deployment types.

Available Models

| Model | Type | Capabilities | Context window |
|---|---|---|---|
| GPT-4o | Multimodal | Text + image understanding, code generation, reasoning | 128K tokens |
| GPT-4o mini | Multimodal | Faster, cheaper GPT-4o variant | 128K tokens |
| GPT-4 | Text | Advanced reasoning, code generation | 8K–128K tokens |
| GPT-4 Turbo | Text | Faster GPT-4 with vision support | 128K tokens |
| GPT-3.5 Turbo | Text | Fast, cost-effective text generation | 16K tokens |
| DALL-E 3 | Image generation | Generate images from text descriptions | N/A |
| Whisper | Audio | Speech-to-text transcription and translation | 25 MB audio file limit |
| text-embedding-3-large | Embeddings | Convert text to vector representations | 8K tokens |
| text-embedding-3-small | Embeddings | Smaller, faster embedding model | 8K tokens |
| text-embedding-ada-002 | Embeddings | Legacy embedding model | 8K tokens |
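The embedding models in the table return vectors whose similarity can be compared numerically, which is what powers semantic search and retrieval scenarios. A minimal pure-Python sketch of cosine similarity, the usual comparison metric; the 3-dimensional vectors here are hypothetical (real embedding models return vectors with 1,536 or more dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings" for illustration only.
v1 = [0.1, 0.3, 0.5]
v2 = [0.1, 0.3, 0.5]   # same text would embed to (nearly) the same vector
v3 = [0.5, -0.3, 0.1]  # unrelated text points in a different direction

print(cosine_similarity(v1, v2))  # identical vectors score 1.0
print(cosine_similarity(v1, v3))  # dissimilar vectors score much lower
```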

Creating an Azure OpenAI Resource

# Create an Azure OpenAI resource
az cognitiveservices account create \
    --name my-openai-service \
    --resource-group rg-ai-prod \
    --kind OpenAI \
    --sku S0 \
    --location eastus \
    --yes

Important: Azure OpenAI Service is NOT available through a multi-service Azure AI Services resource. It requires its own dedicated resource with --kind OpenAI.

Model Deployment

Deploying a Model via Azure CLI

# Deploy GPT-4o
az cognitiveservices account deployment create \
    --name my-openai-service \
    --resource-group rg-ai-prod \
    --deployment-name gpt4o-deployment \
    --model-name gpt-4o \
    --model-version "2024-08-06" \
    --model-format OpenAI \
    --sku-name Standard \
    --sku-capacity 10  # Thousands of tokens per minute (TPM)
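Once deployed, API requests are addressed to the deployment name you chose (here `gpt4o-deployment`), not to the underlying model name. A small sketch of how the Azure OpenAI REST endpoint for a chat deployment is composed; the `2024-06-01` api-version is an assumption for illustration, so substitute a current one:

```python
def chat_completions_url(resource_name: str, deployment_name: str,
                         api_version: str = "2024-06-01") -> str:
    """Build the REST URL for a chat-capable deployment.

    The path contains the deployment name, not the model name --
    the model/version mapping was fixed when the deployment was created.
    """
    return (f"https://{resource_name}.openai.azure.com"
            f"/openai/deployments/{deployment_name}"
            f"/chat/completions?api-version={api_version}")

print(chat_completions_url("my-openai-service", "gpt4o-deployment"))
```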

Deployment Types

| Type | Description | Billing | Best for |
|---|---|---|---|
| Standard | Shared infrastructure, variable throughput | Pay per token consumed | Development, variable workloads |
| Provisioned | Dedicated throughput units (PTU) | Monthly commitment | Production with predictable loads |
| Global Standard | Routes traffic across regions | Pay per token | Multi-region, highest availability |

Provisioned Throughput Units (PTU)

  • Purchase a fixed number of throughput units
  • Guaranteed minimum throughput (tokens per minute)
  • Lower per-token cost for high-volume workloads
  • Monthly commitment required
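The PTU trade-off is essentially a break-even calculation: a fixed monthly commitment versus pay-per-token billing. A sketch of that comparison; both prices below are hypothetical placeholders, so check the Azure pricing page for real figures:

```python
def monthly_token_cost(tokens_millions: float, price_per_1k: float) -> float:
    """Pay-per-token cost for a month of traffic, priced per 1,000 tokens."""
    return tokens_millions * 1_000_000 / 1_000 * price_per_1k

# Hypothetical, illustrative prices -- NOT real Azure pricing.
STANDARD_PRICE_PER_1K = 0.005      # pay-per-token rate
PTU_MONTHLY_COMMITMENT = 2_000.0   # fixed monthly PTU cost

for volume in (100, 400, 800):  # millions of tokens per month
    standard = monthly_token_cost(volume, STANDARD_PRICE_PER_1K)
    cheaper = "Provisioned" if PTU_MONTHLY_COMMITMENT < standard else "Standard"
    print(f"{volume}M tokens/month: Standard=${standard:,.0f} -> {cheaper} wins")
```

At low volume the pay-per-token bill stays under the commitment; past the break-even point the fixed PTU cost is cheaper per token, which matches the guidance in the table above.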

Key Differences: Azure OpenAI vs. OpenAI

| Feature | Azure OpenAI | OpenAI (direct) |
|---|---|---|
| Authentication | API key OR Entra ID / managed identity | API key only |
| Network security | VNet, private endpoints, IP filtering | Internet-only |
| Content filtering | Built-in, configurable | Limited |
| Compliance | SOC 2, HIPAA, GDPR, FedRAMP | Limited |
| Data privacy | Data NOT used to train models | Data NOT used (API usage) |
| SLA | 99.9% (Standard), 99.95% (PTU) | Best effort |
| Regional deployment | Choose specific Azure region | OpenAI-managed infrastructure |

On the Exam: Azure OpenAI is differentiated by enterprise features: VNet/private endpoints, managed identity, built-in content filtering, and compliance certifications. Questions may ask why a company would choose Azure OpenAI over direct OpenAI — the answer is enterprise security and compliance.

Token Concepts

What Are Tokens?

  • Tokens are the fundamental units of text processed by LLMs
  • English: approximately 1 token = 0.75 words (or 4 characters)
  • Total tokens = input tokens (prompt) + output tokens (completion)
  • Cost is calculated per 1,000 tokens (input and output priced separately)
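The rule of thumb above (roughly 4 characters or 0.75 words per English token) is enough for back-of-the-envelope cost estimates. A minimal sketch of that approximation; for exact counts you would use a real tokenizer such as tiktoken:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4-characters-per-token rule of thumb.

    This is only the heuristic quoted above, not a real tokenizer --
    actual token counts vary with the model's vocabulary.
    """
    return max(1, round(len(text) / 4))

prompt = "Summarize the quarterly sales report in three bullet points."
print(estimate_tokens(prompt))  # 60 characters / 4 -> ~15 tokens
```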

Token Limits

| Concept | Description |
|---|---|
| Context window | Maximum total tokens (input + output) per request |
| Max output tokens | Maximum tokens in the generated response |
| TPM (tokens per minute) | Rate limit for your deployment |
| RPM (requests per minute) | Maximum API calls per minute |
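Because the context window bounds input and output together, a request can fail even when the prompt alone fits. A small sketch of the pre-flight check an application might run, using GPT-4o's 128K window from the model table (the token counts are illustrative):

```python
def fits_context(input_tokens: int, max_output_tokens: int,
                 context_window: int = 128_000) -> bool:
    """True if prompt tokens plus the requested output budget fit the window."""
    return input_tokens + max_output_tokens <= context_window

# A long prompt can leave too little room for the requested completion.
print(fits_context(120_000, 4_096))  # fits
print(fits_context(126_000, 4_096))  # prompt + output budget exceeds 128K
```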
Test Your Knowledge

  1. Which deployment type should you choose for a production Azure OpenAI workload with predictable, high-volume traffic?
  2. Can Azure OpenAI Service be accessed through a multi-service Azure AI Services resource?
  3. Which Azure OpenAI model supports both text and image understanding (multimodal)?
  4. Approximately how many English words does one token represent?