6.1 Azure OpenAI Service — Models and Deployment
Key Takeaways
- Azure OpenAI Service provides access to OpenAI models (GPT-4o, GPT-4, GPT-3.5 Turbo, DALL-E 3, Whisper, text-embedding-3-large) within the Azure platform with enterprise security.
- Models must be explicitly deployed to an Azure OpenAI resource before they can be used — deployment creates an endpoint with a specific model version.
- Deployment types include Standard (shared infrastructure, pay-per-token), Provisioned (dedicated throughput units), and Global (cross-region routing).
- Azure OpenAI offers the same models as OpenAI but adds enterprise features: VNet integration, managed identity, content filtering, and compliance certifications.
- Each model family has different capabilities: GPT-4o (multimodal text+vision), GPT-4 (text generation), DALL-E 3 (image generation), Whisper (speech-to-text), embeddings (vector representations).
Azure OpenAI Service — Models and Deployment
Quick Answer: Azure OpenAI provides GPT-4o (multimodal), GPT-4 (text), GPT-3.5 Turbo (fast text), DALL-E 3 (images), Whisper (speech-to-text), and embedding models within Azure. Deploy models to your Azure OpenAI resource with Standard, Provisioned, or Global deployment types.
Available Models
| Model | Type | Capabilities | Context Window |
|---|---|---|---|
| GPT-4o | Multimodal | Text + image understanding, code generation, reasoning | 128K tokens |
| GPT-4o mini | Multimodal | Faster, cheaper GPT-4o variant | 128K tokens |
| GPT-4 | Text | Advanced reasoning, code generation | 8K-32K tokens |
| GPT-4 Turbo | Text | Faster GPT-4 with vision support | 128K tokens |
| GPT-3.5 Turbo | Text | Fast, cost-effective text generation | 16K tokens |
| DALL-E 3 | Image generation | Generate images from text descriptions | N/A |
| Whisper | Audio | Speech-to-text transcription and translation | N/A (25 MB audio file limit) |
| text-embedding-3-large | Embeddings | Convert text to vector representations | 8K tokens |
| text-embedding-3-small | Embeddings | Smaller, faster embedding model | 8K tokens |
| text-embedding-ada-002 | Embeddings | Legacy embedding model | 8K tokens |
Creating an Azure OpenAI Resource
# Create an Azure OpenAI resource
az cognitiveservices account create \
  --name my-openai-service \
  --resource-group rg-ai-prod \
  --kind OpenAI \
  --sku S0 \
  --location eastus \
  --yes
Important: Azure OpenAI Service is NOT available through a multi-service Azure AI Services resource. It requires its own dedicated resource created with --kind OpenAI.
Model Deployment
Deploying a Model via Azure CLI
# Deploy GPT-4o
az cognitiveservices account deployment create \
  --name my-openai-service \
  --resource-group rg-ai-prod \
  --deployment-name gpt4o-deployment \
  --model-name gpt-4o \
  --model-version "2024-08-06" \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 10  # Capacity in units of 1,000 tokens per minute (10 = 10K TPM)
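Once deployed, the model is reachable at a deployment-scoped REST endpoint rather than a generic model endpoint. A minimal sketch of how that URL is composed, using the resource and deployment names from the CLI example above (the api-version shown is an assumption; check your service for supported versions):

```python
# Sketch: building the chat-completions URL for an Azure OpenAI deployment.
# Resource name, deployment name, and api-version match the CLI example
# above but are placeholders, not fixed values.

def chat_completions_url(resource: str, deployment: str, api_version: str) -> str:
    """Build the deployment-scoped Azure OpenAI chat-completions URL."""
    return (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment}/chat/completions?api-version={api_version}"
    )

url = chat_completions_url("my-openai-service", "gpt4o-deployment", "2024-06-01")
print(url)
```

Note that requests target the deployment name you chose (gpt4o-deployment), not the underlying model name — this is what lets you upgrade model versions without changing client code.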
Deployment Types
| Type | Description | Billing | Best For |
|---|---|---|---|
| Standard | Shared infrastructure, variable throughput | Pay per token consumed | Development, variable workloads |
| Provisioned | Dedicated throughput units (PTU) | Monthly commitment | Production with predictable loads |
| Global Standard | Routes traffic across regions | Pay per token | Multi-region, highest availability |
Provisioned Throughput Units (PTU)
- Purchase a fixed number of throughput units
- Guaranteed minimum throughput (tokens per minute)
- Lower per-token cost for high-volume workloads
- Monthly commitment required
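The trade-off between Standard and Provisioned comes down to monthly volume: below some break-even point, paying per token is cheaper; above it, the fixed PTU commitment wins. A rough sketch of that comparison (all prices are hypothetical placeholders, not published Azure rates):

```python
# Sketch: Standard (pay-per-token) vs Provisioned (PTU) break-even estimate.
# Both prices below are HYPOTHETICAL, chosen only to illustrate the comparison.

STANDARD_PRICE_PER_1K_TOKENS = 0.01  # hypothetical pay-per-token rate (USD)
PTU_MONTHLY_COST = 5000.0            # hypothetical monthly PTU commitment (USD)

def cheaper_option(tokens_per_month: int) -> str:
    """Return which deployment type costs less at a given monthly volume."""
    standard_cost = tokens_per_month / 1000 * STANDARD_PRICE_PER_1K_TOKENS
    return "Provisioned" if standard_cost > PTU_MONTHLY_COST else "Standard"

print(cheaper_option(100_000_000))    # low volume: pay-per-token wins
print(cheaper_option(1_000_000_000))  # high volume: PTU commitment wins
```

With real pricing the same structure applies: compute the pay-per-token cost at your expected volume and compare it to the PTU commitment.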
Key Differences: Azure OpenAI vs. OpenAI
| Feature | Azure OpenAI | OpenAI (Direct) |
|---|---|---|
| Authentication | API key OR Entra ID / managed identity | API key only |
| Network security | VNet, private endpoints, IP filtering | Internet-only |
| Content filtering | Built-in, configurable | Limited |
| Compliance | SOC 2, HIPAA, GDPR, FedRAMP | Limited |
| Data privacy | Data NOT used to train models | Data NOT used to train models (API traffic) |
| SLA | 99.9% (Standard), 99.95% (PTU) | Best effort |
| Regional deployment | Choose specific Azure region | OpenAI infrastructure |
On the Exam: Azure OpenAI is differentiated by enterprise features: VNet/private endpoints, managed identity, built-in content filtering, and compliance certifications. Questions may ask why a company would choose Azure OpenAI over direct OpenAI — the answer is enterprise security and compliance.
Token Concepts
What Are Tokens?
- Tokens are the fundamental units of text processed by LLMs
- English: approximately 1 token = 0.75 words (or 4 characters)
- Total tokens = input tokens (prompt) + output tokens (completion)
- Cost is calculated per 1,000 tokens (input and output priced separately)
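The 0.75-words-per-token rule of thumb above can be turned into a quick back-of-the-envelope estimate (English text only; real tokenizers such as tiktoken will give different counts for code, other languages, or unusual text):

```python
# Rough token estimate using the ~0.75 words-per-token rule of thumb
# described above. This is an approximation, not a real tokenizer.

def estimate_tokens(word_count: int) -> int:
    """Approximate tokens: 1 token ~= 0.75 words, so tokens ~= words / 0.75."""
    return round(word_count / 0.75)

print(estimate_tokens(750))  # a 750-word prompt is roughly 1,000 tokens
```

Since input and output tokens are billed separately, apply the estimate to both the prompt and the expected completion length when forecasting cost.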
Token Limits
| Concept | Description |
|---|---|
| Context window | Maximum total tokens (input + output) per request |
| Max output tokens | Maximum tokens in the generated response |
| TPM (Tokens Per Minute) | Rate limit for your deployment |
| RPM (Requests Per Minute) | Maximum API calls per minute |
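The context window constraint in the table above means the prompt and the requested completion must fit together in one budget. A minimal check, using the 128K figure for GPT-4o from the model table:

```python
# Sketch: checking whether a request fits a model's context window.
# Context window = input (prompt) tokens + output (completion) tokens.

CONTEXT_WINDOW = 128_000  # GPT-4o context window, in tokens

def fits_context(input_tokens: int, max_output_tokens: int) -> bool:
    """True if prompt plus requested completion fits the context window."""
    return input_tokens + max_output_tokens <= CONTEXT_WINDOW

print(fits_context(100_000, 4_000))  # fits: 104K <= 128K
print(fits_context(127_000, 4_000))  # does not fit: 131K > 128K
```

If the check fails you must shorten the prompt, lower the max-output-tokens setting, or switch to a model with a larger context window.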
Review Questions
- Which deployment type should you choose for a production Azure OpenAI workload with predictable, high-volume traffic?
- Can Azure OpenAI Service be accessed through a multi-service Azure AI Services resource?
- Which Azure OpenAI model supports both text and image understanding (multimodal)?
- Approximately how many English words does one token represent?