6.5 DALL-E Image Generation
Key Takeaways
- Azure OpenAI generates images through the Images API using DALL-E 3 and the newer GPT-image-1 family; DALL-E 3 supports sizes 1024x1024, 1792x1024, and 1024x1792 only.
- DALL-E 3 accepts n=1 per request (one image), with quality standard or hd and style vivid or natural.
- DALL-E 3 automatically rewrites your prompt for detail and safety and returns the revised_prompt actually used in the response.
- Content filtering runs on BOTH the input prompt and the generated image, using the same harm categories (hate, sexual, violence, self-harm) as text.
- Responsible-AI restrictions block photorealistic images of identifiable real people, copyrighted characters, and trademarked logos.
Quick Answer: Azure OpenAI creates images via the Images API with DALL-E 3 (and the newer GPT-image-1 family). DALL-E 3 supports exactly 1024x1024, 1792x1024, and 1024x1792, n=1 per call, quality standard/hd, and style vivid/natural. DALL-E 3 auto-revises the prompt and returns revised_prompt. Content filters screen both the prompt and the output image.
Generating an Image
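```python
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# Microsoft Entra ID auth: tokens scoped to Azure AI services (previously undefined token_provider)
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    api_version="2024-10-21",
    azure_endpoint="https://my-openai.openai.azure.com/",
    azure_ad_token_provider=token_provider,
)

resp = client.images.generate(
    model="dalle3",                  # deployment name
    prompt="A serene mountain lake at sunset, digital art",
    n=1,                             # DALL-E 3 supports only 1
    size="1024x1024",                # or 1792x1024 / 1024x1792
    quality="hd",                    # or standard
    style="vivid",                   # or natural
)

image_url = resp.data[0].url             # temporary; valid ~24 hours
revised = resp.data[0].revised_prompt    # the prompt DALL-E actually used
```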
On the Exam: The returned url is temporary (it expires after roughly 24 hours), so download and persist the image (e.g., to Blob Storage) if you need it long term. You can also request response_format="b64_json" to get the bytes inline instead of a URL.
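When you request response_format="b64_json", the image arrives as a base64 string (exposed as resp.data[0].b64_json in the openai Python SDK) instead of a URL. The sketch below shows one way to decode that string and write it out; the helper names decode_b64_image and save_image and the local file path are hypothetical, and a local file merely stands in for durable storage such as Blob Storage.

```python
import base64
from pathlib import Path


def decode_b64_image(b64_data: str) -> bytes:
    """Turn the base64 string from a b64_json response into raw image bytes."""
    # validate=True rejects stray non-base64 characters instead of silently skipping them
    return base64.b64decode(b64_data, validate=True)


def save_image(image_bytes: bytes, path: Path) -> None:
    """Write image bytes to a local file (stand-in for durable storage)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(image_bytes)


# Usage, assuming resp came from client.images.generate(..., response_format="b64_json"):
#   save_image(decode_b64_image(resp.data[0].b64_json), Path("images/lake.png"))
```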
DALL-E 3 Parameters
| Parameter | Allowed values | Notes |
|---|---|---|
| size | 1024x1024, 1792x1024, 1024x1792 | No custom sizes; DALL-E 3 dropped 256/512 |
| quality | standard, hd | hd adds detail at higher cost |
| style | vivid, natural | vivid = dramatic/hyper-real; natural = photographic |
| n | 1 only | Multiple images = multiple calls |
Trap: Options offering 256x256 or 512x512 are testing whether you remember those are DALL-E 2 sizes, removed in DALL-E 3. Likewise, asking for n=4 in one DALL-E 3 call fails.
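The limits in the parameter table lend themselves to a client-side check before you spend a request. This is a hypothetical helper, not part of any SDK; it simply encodes the allowed values listed in this section so that a 512x512 size or n=4 is caught locally.

```python
# Allowed DALL-E 3 values, copied from the parameter table in this section.
DALLE3_SIZES = {"1024x1024", "1792x1024", "1024x1792"}
DALLE3_QUALITIES = {"standard", "hd"}
DALLE3_STYLES = {"vivid", "natural"}


def validate_dalle3_request(size: str, quality: str, style: str, n: int) -> list:
    """Return human-readable problems; an empty list means the request fits DALL-E 3's limits."""
    problems = []
    if size not in DALLE3_SIZES:
        problems.append(f"size {size!r} is not one of {sorted(DALLE3_SIZES)}")
    if quality not in DALLE3_QUALITIES:
        problems.append(f"quality {quality!r} must be 'standard' or 'hd'")
    if style not in DALLE3_STYLES:
        problems.append(f"style {style!r} must be 'vivid' or 'natural'")
    if n != 1:
        problems.append(f"n={n}: DALL-E 3 returns one image per call; loop instead")
    return problems
```

For example, validate_dalle3_request("512x512", "standard", "vivid", 4) reports two problems: the DALL-E 2 size and the multi-image request.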
Prompt Revision
DALL-E 3 rewrites your prompt before generating, adding descriptive detail, artistic direction for vague requests, and safety-oriented phrasing. The revised_prompt field returns the exact text used, so you can log it, display it, or detect when the model substantially reinterpreted your intent. This behavior is unique to DALL-E 3 and is a favorite exam fact.
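One way to act on revised_prompt is to flag revisions that dropped much of what the user asked for. The sketch below is a hypothetical heuristic, not a service feature: it measures what fraction of the user's words survive in the revised prompt. Word coverage is used rather than overall string similarity because DALL-E 3 revisions tend to be much longer than the original, which would make a plain similarity ratio look low even for faithful expansions. The 0.6 threshold is an arbitrary illustration.

```python
import re


def _words(text: str) -> set:
    # Lowercase alphanumeric tokens; punctuation is ignored.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_coverage(original: str, revised: str) -> float:
    """Fraction of the user's words that still appear in the revised prompt."""
    wanted = _words(original)
    if not wanted:
        return 1.0  # nothing to lose
    return len(wanted & _words(revised)) / len(wanted)


def substantially_reinterpreted(original: str, revised: str, threshold: float = 0.6) -> bool:
    """True when too few of the user's words survived the rewrite (threshold is illustrative)."""
    return keyword_coverage(original, revised) < threshold
```

An expanded but faithful revision such as "A digital artwork of a serene mountain lake at sunset, with warm orange light..." keeps 7 of the 8 words in "A serene mountain lake at sunset, digital art" and is not flagged.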
Dual Content Filtering
As with text generation, where both the prompt and the completion pass through content filters, image generation is screened before and after the model runs:
| Stage | When | What it checks |
|---|---|---|
| Prompt filter | Before generation | Blocks requests for harmful imagery |
| Image filter | After generation | Analyzes the produced image for harmful content |
| Copyright / IP check | After generation | Blocks copyrighted characters & trademarked logos |
Both filters use the same four harm categories as text (hate, sexual, violence, self-harm) at configurable severity. A prompt may pass yet still have its output blocked, so a content-filter block can arrive at either stage.
Responsible-AI Restrictions
- No photorealistic images of identifiable real people (celebrities, public figures).
- No reproduction of copyrighted characters or trademarked logos.
- Refuses prompts for misleading, deceptive, or otherwise harmful imagery.
- A blocked request returns an error (content_filter flag) rather than an image.
Scenario: an app sends "a photorealistic portrait of [a named celebrity]". The prompt filter rejects it under the real-person rule — the fix is to describe a fictional or non-identifiable subject, not to raise any parameter.
Reading the finish reason
When a request is blocked, you do not get a usable image. The Images API surfaces this either as an HTTP error carrying a content-filter code or, in chat-style flows, as a content_filter finish reason. Your application must handle this gracefully — show a friendly message and let the user revise the prompt rather than silently failing. This dual-stage screening means defensive code should anticipate a block at either the prompt or the image stage, not just on submission.
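The handling advice above can be sketched as a small wrapper. To stay runnable without the SDK or an Azure resource, it assumes a hypothetical generate callable that performs one request and returns an image URL, and a hypothetical ContentFilterBlocked exception carrying the stage. With the real openai Python SDK, a rejected request surfaces as an HTTP error exception (for example openai.BadRequestError for a 400), and you would inspect its error details instead.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class ContentFilterBlocked(Exception):
    """Hypothetical stand-in for the SDK error raised when a content filter blocks a request."""

    def __init__(self, stage: str) -> None:
        super().__init__(f"blocked at the {stage} stage")
        self.stage = stage  # "prompt" (before generation) or "image" (after generation)


@dataclass
class ImageOutcome:
    url: Optional[str]           # set on success
    user_message: Optional[str]  # friendly text shown to the user on a block


def generate_with_friendly_errors(generate: Callable[[str], str], prompt: str) -> ImageOutcome:
    """Run one generation and turn a content-filter block into a message instead of a crash."""
    try:
        return ImageOutcome(url=generate(prompt), user_message=None)
    except ContentFilterBlocked as err:
        if err.stage == "prompt":
            msg = "Your request was blocked by the safety filter. Please rephrase it and try again."
        else:
            msg = "The generated image was blocked by the safety filter. Please try a different description."
        return ImageOutcome(url=None, user_message=msg)
```

Distinguishing the two stages lets the UI tell the user whether the wording itself or the resulting picture was the problem.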
Persisting generated images
Because the returned url expires (~24 hours), production apps follow a consistent pattern: generate the image, immediately download the bytes (or request response_format="b64_json"), and store them in durable storage such as Azure Blob Storage with your own access controls. Relying on the temporary URL for anything beyond an immediate preview is a frequently tested mistake. If you need several variations, loop and call the API multiple times with n=1 each, since DALL-E 3 never returns more than one image per request.
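The loop-then-persist pattern just described might look like the following sketch. Both callables are hypothetical: generate_one stands for a single n=1 request that returns image bytes (for instance decoded from b64_json), and store stands for a durable write, which in production could wrap an Azure Blob Storage upload such as ContainerClient.upload_blob(name, data). The .png naming scheme is an assumption for illustration.

```python
from typing import Callable, List


def generate_variations(generate_one: Callable[[str], bytes], prompt: str, count: int) -> List[bytes]:
    """Call the image API once per image, since DALL-E 3 accepts only n=1."""
    if count < 1:
        raise ValueError("count must be at least 1")
    images = []
    for _ in range(count):
        images.append(generate_one(prompt))  # one request -> one image
    return images


def persist_all(images: List[bytes], store: Callable[[str, bytes], None], prefix: str) -> List[str]:
    """Write every image through the store callable right away; return the names used."""
    names = []
    for i, data in enumerate(images):
        name = f"{prefix}-{i}.png"  # hypothetical naming scheme
        store(name, data)
        names.append(name)
    return names
```

Keeping the generation and storage steps as injected callables makes the pattern easy to test with in-memory stubs before wiring in the real client and storage account.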
DALL-E 3 vs. GPT-image-1
| Capability | DALL-E 3 | GPT-image-1 (newer) |
|---|---|---|
| Text-to-image | Yes | Yes |
| Auto prompt revision | Yes | Model-managed |
| Image editing / inpainting | No | Yes (image + mask input) |
| Transparent backgrounds, better text rendering | Limited | Improved |
Quality vs. style, decoded
The two creative knobs are independent and often confused. quality (standard vs. hd) controls how much rendering effort and detail go into the image — hd costs more and takes longer but sharpens fine detail. style (vivid vs. natural) controls the aesthetic: vivid pushes dramatic, saturated, hyper-real results, while natural yields more muted, photographic output. A scenario asking for "realistic, true-to-life product photos" points to style=natural; one asking for "eye-catching, dramatic marketing art" points to style=vivid. Neither knob changes the allowed sizes or the n=1 limit.
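Because quality and style are independent, a scenario can be decoded into the two knobs separately. The hypothetical helper below makes that explicit; it returns keyword arguments that could be merged into a client.images.generate(...) call alongside model, prompt, and one of the three allowed sizes, and it pins n=1 regardless of the creative choices.

```python
def dalle3_creative_params(realistic: bool, high_detail: bool) -> dict:
    """Map two independent intents onto DALL-E 3's style and quality knobs.

    style sets the aesthetic; quality sets rendering effort. Neither changes
    the allowed sizes or the one-image-per-call limit.
    """
    return {
        "style": "natural" if realistic else "vivid",
        "quality": "hd" if high_detail else "standard",
        "n": 1,
    }
```

So "realistic, true-to-life product photos" maps to dalle3_creative_params(realistic=True, high_detail=...), with the quality choice driven only by the detail and cost trade-off.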
Responsible-AI summary for images
| Restriction | Practical effect |
|---|---|
| No identifiable real people | Photorealistic celebrity/public-figure portraits blocked |
| No copyrighted characters | Branded cartoon/film characters refused |
| No trademarked logos | Company marks not reproduced |
| Harm-category filters | Hate, sexual, violence, self-harm screened on prompt and image |
These are enforced by the service, not optional toggles a developer can disable, and they are the most likely responsible-AI image questions on the exam.
On the Exam: Remember the dual filter (prompt AND image), the fixed three sizes with n=1, the revised_prompt field, the temporary URL that must be persisted, and the real-person / copyright restrictions. If a scenario requires editing an existing image with a mask, that points to GPT-image-1, not DALL-E 3, which only generates from text.
Which image sizes does DALL-E 3 support on Azure OpenAI?
What does the revised_prompt field in a DALL-E 3 response contain?
How does content filtering apply to image generation on Azure OpenAI?
An application must edit an existing photo by replacing a masked region with new content. Which model is appropriate?