2.4 Content Filtering in Azure OpenAI Service
Key Takeaways
- Azure OpenAI / Microsoft Foundry deployments have an always-on content filter that screens both input prompts and output completions across the four harm categories.
- Default filters block Medium and High severity on input and output; thresholds are configurable to High-only, Medium+High, or Low+Medium+High.
- A filtered input returns an HTTP 400 with a content_filter error; a filtered output returns finish_reason content_filter instead of stop.
- prompt_filter_results covers the input (including jailbreak and indirect_attack flags); content_filter_results covers the output (including protected_material).
- Custom content filter configurations are created in the Microsoft Foundry portal under Safety + Security and then assigned to a model deployment.
Quick Answer: Every Azure OpenAI deployment ships with an always-on filter screening input prompts and output completions across Hate, Sexual, Violence, and Self-Harm. Defaults block Medium + High. A filtered input raises an HTTP 400 content_filter error (the model never sees it); a filtered output returns finish_reason = "content_filter". Build custom filters in the Microsoft Foundry portal and assign them to a deployment.
Default Filter Behavior
The Azure OpenAI content filter is on by default and cannot be silently turned off by an application — disabling it for the highest-risk categories requires a Microsoft-approved modified-filter application. The default policy blocks Medium and High severity in all four categories, on both directions.
| Category | Input (default) | Output (default) |
|---|---|---|
| Hate | Block Medium + High | Block Medium + High |
| Sexual | Block Medium + High | Block Medium + High |
| Violence | Block Medium + High | Block Medium + High |
| Self-Harm | Block Medium + High | Block Medium + High |
Thresholds can be set to one of three configurable levels: High only, Medium + High, or Low + Medium + High (strictest). Note: blocking "Low+Medium+High" means even low-severity content is blocked — useful for child-facing apps.
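Each threshold level can be read as "block at this severity and above". The sketch below illustrates that rule only; it assumes the four severity labels that appear in filter annotations (safe, low, medium, high), and the function name `is_blocked` and the threshold keys are hypothetical, not part of any SDK.

```python
# Severity labels in ascending order, as they appear in filter annotations.
SEVERITY_ORDER = ["safe", "low", "medium", "high"]

# Hypothetical keys for the three configurable levels; each maps to the
# lowest severity that gets blocked at that level.
THRESHOLDS = {
    "high_only": "high",
    "medium_and_high": "medium",  # the default policy
    "low_medium_high": "low",     # strictest
}


def is_blocked(severity: str, threshold: str = "medium_and_high") -> bool:
    """Return True if content at `severity` is blocked under `threshold`."""
    floor = THRESHOLDS[threshold]
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(floor)
```

Under the default, `is_blocked("low")` is False and `is_blocked("medium")` is True; switching to `"low_medium_high"` makes low-severity content blocked as well.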
What Happens When Content Is Filtered
- Input filtered: the call fails with HTTP 400 and an error body whose code is content_filter. The prompt is never sent to the model, so you are not billed for a completion.
- Output filtered: the call succeeds, but the choice has finish_reason = "content_filter" (instead of "stop"), and the offending text is replaced with an empty string.
from openai import AzureOpenAI, BadRequestError

# KEY, ENDPOINT, user_text, and show_safe_fallback() are defined elsewhere in your app.
client = AzureOpenAI(api_key=KEY, api_version="2024-10-21",
                     azure_endpoint=ENDPOINT)
try:
    resp = client.chat.completions.create(
        model="gpt-4o",  # the name of your Azure deployment
        messages=[{"role": "user", "content": user_text}])
    choice = resp.choices[0]
    if choice.finish_reason == "content_filter":
        show_safe_fallback()  # output was blocked
    else:
        print(choice.message.content)
except BadRequestError as e:  # HTTP 400 from the service
    if e.code == "content_filter":
        show_safe_fallback()  # input (HTTP 400) was blocked
    else:
        raise
Always branch on finish_reason and catch the 400 — a robust app handles both an empty filtered completion and a rejected prompt without crashing.
Custom Content Filter Configurations
You tune filtering in the Microsoft Foundry portal (formerly Azure AI Foundry / Azure OpenAI Studio), not the base Azure portal blade and not the OpenAI SDK.
- Open your Foundry project → Safety + Security → Content filters.
- Create content filter and set per-category, per-direction thresholds.
- Toggle the extra detectors.
- Assign the filter to one or more model deployments.
| Setting | Options | Applies to |
|---|---|---|
| Input filter | Off / Low / Medium / High | Prompts |
| Output filter | Off / Low / Medium / High | Completions |
| Prompt Shields (jailbreak) | On / Off | Input |
| Indirect attack (XPIA) | On / Off | Input documents |
| Protected material – text | On / Off | Output |
| Protected material – code | On / Off | Output |
| Groundedness | On / Off | Output |
On the Exam: "Where do you change content filter severity for a GPT-4o deployment?" → Microsoft Foundry portal, Safety + Security, Content filters, then assign to the deployment. CLI/SDK answers are wrong.
Filter Annotations
With annotations enabled, the response carries detailed results. Memorize which object holds which flag:
{
  "choices": [{
    "content_filter_results": {
      "hate": {"filtered": false, "severity": "safe"},
      "self_harm": {"filtered": false, "severity": "safe"},
      "sexual": {"filtered": false, "severity": "safe"},
      "violence": {"filtered": false, "severity": "safe"},
      "protected_material_text": {"filtered": false, "detected": false},
      "protected_material_code": {"filtered": false, "detected": false}
    },
    "finish_reason": "stop"
  }],
  "prompt_filter_results": [{
    "content_filter_results": {
      "hate": {"filtered": false, "severity": "safe"},
      "jailbreak": {"filtered": false, "detected": false},
      "indirect_attack": {"filtered": false, "detected": false}
    }
  }]
}
- prompt_filter_results = input analysis; this is where jailbreak and indirect_attack live.
- content_filter_results (inside choices) = output analysis; this is where protected_material_text / protected_material_code live.
On the Exam: If a question asks which field reveals a jailbreak attempt, it is prompt_filter_results[...].content_filter_results.jailbreak, never the output content_filter_results. Protected material is the mirror image: output only.
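To make the input/output split concrete, here is a minimal sketch that walks a parsed response (a plain dict with the shape shown in the annotated JSON above) and collects the detector flags from each side. The helper name `summarize_annotations` is hypothetical, and the code assumes each detector entry carries a boolean `detected` field as in that sample.

```python
def summarize_annotations(response: dict) -> dict:
    """Collect input-side and output-side detector flags from a parsed response."""
    # Input-side detectors live under prompt_filter_results.
    input_flags = {"jailbreak": False, "indirect_attack": False}
    for entry in response.get("prompt_filter_results", []):
        results = entry.get("content_filter_results", {})
        for name in input_flags:
            if results.get(name, {}).get("detected", False):
                input_flags[name] = True

    # Output-side detectors live under each choice's content_filter_results.
    output_flags = {"protected_material_text": False,
                    "protected_material_code": False}
    for choice in response.get("choices", []):
        results = choice.get("content_filter_results", {})
        for name in output_flags:
            if results.get(name, {}).get("detected", False):
                output_flags[name] = True

    return {"input": input_flags, "output": output_flags}
```

Missing keys are treated as "not detected", so the helper also works when annotations are only partially present.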
Content Filtering for DALL-E and Image Generation
Image-generation models add their own layers: a prompt filter on the request, an output filter on the generated image, an automatic revised_prompt that may rewrite your prompt for safety, and copyright protection against close reproductions of known works.
resp = client.images.generate(model="dall-e-3",
    prompt="A peaceful landscape painting", n=1, size="1024x1024")
if resp.data[0].revised_prompt:
    print("Revised:", resp.data[0].revised_prompt)
If the prompt is unsafe the generate call is rejected; if you see a revised_prompt, the safety system rewrote your request before generation.
Standalone Content Safety vs the Built-In Filter
The built-in Azure OpenAI filter and the standalone Content Safety APIs share the same harm models, but they are not interchangeable on the exam.
| Question signal | Correct answer |
|---|---|
| "Screen prompts/completions for an Azure OpenAI deployment" | Built-in filter (configure in Foundry) |
| "Moderate user comments before they ever reach a model" | Standalone Analyze Text API |
| "Adjust how strictly GPT-4o blocks violent prompts" | Custom content filter in Foundry, assigned to the deployment |
| "Get the raw 0-7 severity to apply my own policy" | Standalone API (built-in filter only allows/blocks) |
The built-in filter applies a policy and short-circuits the call; the standalone API hands back severities and lets your code decide. Both can run in the same solution — for example, standalone moderation on uploaded files and the built-in filter on the chat completion.
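The "your code decides" half of that contrast might look like the sketch below. It assumes the standalone Analyze Text response has already been parsed into a dict with a `categoriesAnalysis` list of `{"category", "severity"}` entries; the `POLICY` limits and the `violations` helper are hypothetical and exist only to illustrate applying a custom policy to raw severities.

```python
# Hypothetical per-category limits: the highest severity this app tolerates.
POLICY = {"Hate": 2, "Sexual": 2, "Violence": 4, "SelfHarm": 0}


def violations(analysis: dict, policy: dict = POLICY) -> list:
    """Return the categories whose severity exceeds the policy limit."""
    exceeded = []
    for item in analysis.get("categoriesAnalysis", []):
        limit = policy.get(item["category"])
        # Categories without a configured limit are ignored by this sketch.
        if limit is not None and item["severity"] > limit:
            exceeded.append(item["category"])
    return exceeded
```

An empty list means the text passes your policy; a non-empty list names the categories to act on, for example by holding a comment for review instead of rejecting it outright.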
Models Covered and Approval to Loosen Filters
The content filter covers the generative Azure OpenAI models: the GPT-4o / GPT-4 / GPT-3.5 chat families and DALL-E for images, plus Whisper for transcription where applicable. Embeddings are not filtered (no harmful-text surface). You cannot fully disable filtering on the highest-risk categories from the portal: lowering or turning off filters for an approved subset requires submitting Microsoft's modified content filter application and being granted Limited Access. Exam scenarios that say "the customer wants to turn the filter off entirely" should steer you to that approval process, not a simple toggle.
Interpreting finish_reason and Errors — Worked Example
Consider a chat call where the user prompt is clean but the model drifts into describing weapon construction. The request succeeds (HTTP 200), but choices[0].finish_reason comes back as content_filter and choices[0].message.content is empty; content_filter_results.violence.filtered is true with severity: "high". Your code must detect the empty filtered completion and present a safe fallback — it must not assume an empty string means the model had nothing to say.
Now consider a prompt that itself contains a high-severity hate slur: the SDK raises an exception, the HTTP status is 400, and the error payload's code is content_filter, with the filter result for the offending category nested in the error's innererror.content_filter_result object. No completion is generated and you are not billed for output tokens. A production handler must branch on both outcomes, the 200 with a content_filter finish reason and the 400 input rejection, which is exactly the try/except plus finish_reason check shown earlier.
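The two scenarios reduce to a small decision over the HTTP status and the parsed JSON body. The sketch below works on plain dicts so it can be exercised without calling the service; the function name `classify_outcome` and its return labels are hypothetical, and the 400 body is assumed to have the shape `{"error": {"code": "content_filter", ...}}`.

```python
def classify_outcome(status_code: int, body: dict) -> str:
    """Map an HTTP status and parsed JSON body to a coarse outcome label."""
    if status_code == 400:
        # Input rejection: the prompt never reached the model.
        if body.get("error", {}).get("code") == "content_filter":
            return "input_blocked"
        return "bad_request"
    if status_code == 200:
        # Output rejection: the call succeeded but a choice was filtered.
        for choice in body.get("choices", []):
            if choice.get("finish_reason") == "content_filter":
                return "output_blocked"
        return "ok"
    return "other_error"
```

Both `"input_blocked"` and `"output_blocked"` should lead to the same safe fallback in the UI, while `"bad_request"` and `"other_error"` are ordinary failures that should be logged or re-raised.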
Common Traps
- A filtered output is HTTP 200, not an error; only a filtered input is a 400. Mixing these up is a classic distractor.
- jailbreak and indirect_attack live in prompt_filter_results; protected_material_* lives in the output content_filter_results.
- Custom filters are configured in Microsoft Foundry, never via az CLI or the SDK constructor.
What does an Azure OpenAI response show when the generated output is blocked by the content filter?
Where do you create and assign custom content filter configurations for an Azure OpenAI deployment?
By default, which severity levels does the Azure OpenAI content filter block?
Which annotation field reveals that an input prompt contained a jailbreak attempt?