6.1 Azure OpenAI Service — Models and Deployment
Key Takeaways
- Azure OpenAI Service provides access to OpenAI models (GPT-4o, GPT-4, GPT-3.5 Turbo, DALL-E 3, Whisper, text-embedding-3-large) within the Azure platform with enterprise security.
- Models must be explicitly deployed to an Azure OpenAI resource before they can be used — deployment creates an endpoint with a specific model version.
- Deployment types include Standard (shared infrastructure, pay-per-token), Provisioned (dedicated throughput units), and Global (cross-region routing).
- Azure OpenAI offers the same models as OpenAI but adds enterprise features: VNet integration, managed identity, content filtering, and compliance certifications.
- Each model family has different capabilities: GPT-4o (multimodal text+vision), GPT-4 (text generation), DALL-E 3 (image generation), Whisper (speech-to-text), embeddings (vector representations).
Azure OpenAI Service — Models and Deployment
Quick Answer: Azure OpenAI provides GPT-4o (multimodal), GPT-4 (text), GPT-3.5 Turbo (fast text), DALL-E 3 (images), Whisper (speech-to-text), and embedding models within Azure. Deploy models to your Azure OpenAI resource with Standard, Provisioned, or Global deployment types.
Available Models
| Model | Type | Capabilities | Context Window |
|---|---|---|---|
| GPT-4o | Multimodal | Text + image understanding, code generation, reasoning | 128K tokens |
| GPT-4o mini | Multimodal | Faster, cheaper GPT-4o variant | 128K tokens |
| GPT-4 | Text | Advanced reasoning, code generation | 8K-32K tokens |
| GPT-4 Turbo | Text | Faster GPT-4 with vision support | 128K tokens |
| GPT-3.5 Turbo | Text | Fast, cost-effective text generation | 16K tokens |
| DALL-E 3 | Image generation | Generate images from text descriptions | N/A |
| Whisper | Audio | Speech-to-text transcription and translation | N/A (25 MB audio file limit) |
| text-embedding-3-large | Embeddings | Convert text to vector representations | 8K tokens |
| text-embedding-3-small | Embeddings | Smaller, faster embedding model | 8K tokens |
| text-embedding-ada-002 | Embeddings | Legacy embedding model | 8K tokens |
Creating an Azure OpenAI Resource
# Create an Azure OpenAI resource
az cognitiveservices account create \
  --name my-openai-service \
  --resource-group rg-ai-prod \
  --kind OpenAI \
  --sku S0 \
  --location eastus \
  --yes
Important: Azure OpenAI Service is NOT available through a multi-service Azure AI Services resource. It requires its own dedicated resource created with --kind OpenAI.
Model Deployment
Deploying a Model via Azure CLI
# Deploy GPT-4o
az cognitiveservices account deployment create \
  --name my-openai-service \
  --resource-group rg-ai-prod \
  --deployment-name gpt4o-deployment \
  --model-name gpt-4o \
  --model-version "2024-08-06" \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 10  # Capacity in units of 1,000 tokens per minute (10 = 10K TPM)
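Once deployed, the model is reachable at a deployment-scoped REST endpoint rather than a generic model endpoint. A minimal sketch of how that URL is composed, using the resource and deployment names from the CLI example above (the api-version shown is an assumption; check your service for supported versions):

```python
# Sketch: building the chat-completions URL for an Azure OpenAI deployment.
# Resource name, deployment name, and api-version match the CLI example
# above but are placeholders, not fixed values.

def chat_completions_url(resource: str, deployment: str, api_version: str) -> str:
    """Build the deployment-scoped Azure OpenAI chat-completions URL."""
    return (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment}/chat/completions?api-version={api_version}"
    )

url = chat_completions_url("my-openai-service", "gpt4o-deployment", "2024-06-01")
print(url)
```

Note that requests target the deployment name you chose (gpt4o-deployment), not the underlying model name — this is what lets you upgrade model versions without changing client code.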
Deployment Types
| Type | Description | Billing | Best For |
|---|---|---|---|
| Standard | Shared infrastructure, variable throughput | Pay per token consumed | Development, variable workloads |
| Provisioned | Dedicated throughput units (PTU) | Monthly commitment | Production with predictable loads |
| Global Standard | Routes traffic across regions | Pay per token | Multi-region, highest availability |
Provisioned Throughput Units (PTU)
- Purchase a fixed number of throughput units
- Guaranteed minimum throughput (tokens per minute)
- Lower per-token cost for high-volume workloads
- Monthly commitment required
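The trade-off between Standard and Provisioned comes down to monthly volume: below some break-even point, paying per token is cheaper; above it, the fixed PTU commitment wins. A rough sketch of that comparison (all prices are hypothetical placeholders, not published Azure rates):

```python
# Sketch: Standard (pay-per-token) vs Provisioned (PTU) break-even estimate.
# Both prices below are HYPOTHETICAL, chosen only to illustrate the comparison.

STANDARD_PRICE_PER_1K_TOKENS = 0.01  # hypothetical pay-per-token rate (USD)
PTU_MONTHLY_COST = 5000.0            # hypothetical monthly PTU commitment (USD)

def cheaper_option(tokens_per_month: int) -> str:
    """Return which deployment type costs less at a given monthly volume."""
    standard_cost = tokens_per_month / 1000 * STANDARD_PRICE_PER_1K_TOKENS
    return "Provisioned" if standard_cost > PTU_MONTHLY_COST else "Standard"

print(cheaper_option(100_000_000))    # low volume: pay-per-token wins
print(cheaper_option(1_000_000_000))  # high volume: PTU commitment wins
```

With real pricing the same structure applies: compute the pay-per-token cost at your expected volume and compare it to the PTU commitment.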
Key Differences: Azure OpenAI vs. OpenAI
| Feature | Azure OpenAI | OpenAI (Direct) |
|---|---|---|
| Authentication | API key OR Entra ID / managed identity | API key only |
| Network security | VNet, private endpoints, IP filtering | Internet-only |
| Content filtering | Built-in, configurable | Limited |
| Compliance | SOC 2, HIPAA, GDPR, FedRAMP | Limited |
| Data privacy | Data NOT used to train models | Data NOT used to train models (API traffic) |
| SLA | 99.9% (Standard), 99.95% (PTU) | Best effort |
| Regional deployment | Choose specific Azure region | OpenAI infrastructure |
On the Exam: Azure OpenAI is differentiated by enterprise features: VNet/private endpoints, managed identity, built-in content filtering, and compliance certifications. Questions may ask why a company would choose Azure OpenAI over direct OpenAI — the answer is enterprise security and compliance.
Token Concepts
What Are Tokens?
- Tokens are the fundamental units of text processed by LLMs
- English: approximately 1 token = 0.75 words (or 4 characters)
- Total tokens = input tokens (prompt) + output tokens (completion)
- Cost is calculated per 1,000 tokens (input and output priced separately)
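The 0.75-words-per-token rule of thumb above can be turned into a quick back-of-the-envelope estimate (English text only; real tokenizers such as tiktoken will give different counts for code, other languages, or unusual text):

```python
# Rough token estimate using the ~0.75 words-per-token rule of thumb
# described above. This is an approximation, not a real tokenizer.

def estimate_tokens(word_count: int) -> int:
    """Approximate tokens: 1 token ~= 0.75 words, so tokens ~= words / 0.75."""
    return round(word_count / 0.75)

print(estimate_tokens(750))  # a 750-word prompt is roughly 1,000 tokens
```

Since input and output tokens are billed separately, apply the estimate to both the prompt and the expected completion length when forecasting cost.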
Token Limits
| Concept | Description |
|---|---|
| Context window | Maximum total tokens (input + output) per request |
| Max output tokens | Maximum tokens in the generated response |
| TPM (Tokens Per Minute) | Rate limit for your deployment |
| RPM (Requests Per Minute) | Maximum API calls per minute |
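The context window constraint in the table above means the prompt and the requested completion must fit together in one budget. A minimal check, using the 128K figure for GPT-4o from the model table:

```python
# Sketch: checking whether a request fits a model's context window.
# Context window = input (prompt) tokens + output (completion) tokens.

CONTEXT_WINDOW = 128_000  # GPT-4o context window, in tokens

def fits_context(input_tokens: int, max_output_tokens: int) -> bool:
    """True if prompt plus requested completion fits the context window."""
    return input_tokens + max_output_tokens <= CONTEXT_WINDOW

print(fits_context(100_000, 4_000))  # fits: 104K <= 128K
print(fits_context(127_000, 4_000))  # does not fit: 131K > 128K
```

If the check fails you must shorten the prompt, lower the max-output-tokens setting, or switch to a model with a larger context window.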
Review Questions
- Which deployment type should you choose for a production Azure OpenAI workload with predictable, high-volume traffic?
- Can Azure OpenAI Service be accessed through a multi-service Azure AI Services resource?
- Which Azure OpenAI model supports both text and image understanding (multimodal)?
- Approximately how many English words does one token represent?