2.4 Content Filtering in Azure OpenAI Service

Key Takeaways

Azure OpenAI / Microsoft Foundry deployments have an always-on content filter that screens both input prompts and output completions across the four harm categories.
Default filters block Medium and High severity on input and output; thresholds are configurable to High-only, Medium+High, or Low+Medium+High.
A filtered input returns an HTTP 400 with a content_filter error; a filtered output returns finish_reason content_filter instead of stop.
prompt_filter_results covers the input (including jailbreak and indirect_attack flags); content_filter_results covers the output (including protected_material).
Custom content filter configurations are created in the Microsoft Foundry portal under Safety + Security and then assigned to a model deployment.

Last updated: June 2026

Quick Answer: Every Azure OpenAI deployment ships with an always-on filter screening input prompts and output completions across Hate, Sexual, Violence, and Self-Harm. Defaults block Medium + High. A filtered input raises an HTTP 400 content_filter error (the model never sees it); a filtered output returns finish_reason = "content_filter". Build custom filters in the Microsoft Foundry portal and assign them to a deployment.

Default Filter Behavior

The Azure OpenAI content filter is on by default and cannot be silently turned off by an application — disabling it for the highest-risk categories requires a Microsoft-approved modified-filter application. The default policy blocks Medium and High severity in all four categories, on both directions.

Category	Input (default)	Output (default)
Hate	Block Medium + High	Block Medium + High
Sexual	Block Medium + High	Block Medium + High
Violence	Block Medium + High	Block Medium + High
Self-Harm	Block Medium + High	Block Medium + High

Thresholds can be set to one of three configurable levels: High only, Medium + High, or Low + Medium + High (strictest). Note: blocking "Low+Medium+High" means even low-severity content is blocked — useful for child-facing apps.
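One way to picture the three levels is as a cutoff on an ordered severity scale. The sketch below is an illustrative model of that rule only, not an Azure API; the names SEVERITY_ORDER, THRESHOLDS, and is_blocked are hypothetical.

```python
# Illustrative model only: these names are hypothetical, not part of any Azure SDK.
SEVERITY_ORDER = ["safe", "low", "medium", "high"]

# Each configurable level maps to the lowest severity it blocks.
THRESHOLDS = {
    "high_only": "high",
    "medium_and_high": "medium",      # the default policy
    "low_medium_and_high": "low",     # strictest
}

def is_blocked(severity: str, threshold: str = "medium_and_high") -> bool:
    """Return True if content at `severity` is blocked under `threshold`."""
    cutoff = THRESHOLDS[threshold]
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(cutoff)
```

Under the default, is_blocked("low") is False and is_blocked("medium") is True; switching to "low_medium_and_high" blocks low-severity content as well.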

What Happens When Content Is Filtered

Input filtered — the call fails with HTTP 400 and an error body whose code is content_filter. The prompt is never sent to the model, so you are not billed for a completion.
Output filtered — the call succeeds but the choice has finish_reason = "content_filter" (instead of "stop"), and the offending text is replaced with an empty string.

from openai import AzureOpenAI, BadRequestError

# KEY, ENDPOINT, user_text, and show_safe_fallback() are supplied by your application.
client = AzureOpenAI(api_key=KEY, api_version="2024-10-21",
                     azure_endpoint=ENDPOINT)
try:
    resp = client.chat.completions.create(
        model="gpt-4o",            # your deployment name
        messages=[{"role": "user", "content": user_text}])
    choice = resp.choices[0]
    if choice.finish_reason == "content_filter":
        show_safe_fallback()       # output was blocked (HTTP 200)
    else:
        print(choice.message.content)
except BadRequestError as e:       # HTTP 400
    if e.code == "content_filter":
        show_safe_fallback()       # input was blocked
    else:
        raise                      # some other bad request; do not swallow it

Always branch on finish_reason and catch the 400 — a robust app handles both an empty filtered completion and a rejected prompt without crashing.

Custom Content Filter Configurations

You tune filtering in the Microsoft Foundry portal (formerly Azure AI Foundry / Azure OpenAI Studio), not the base Azure portal blade and not the OpenAI SDK.

Open your Foundry project → Safety + Security → Content filters.
Create content filter and set per-category, per-direction thresholds.
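Toggle the extra detectors you need: Prompt Shields (jailbreak), indirect attack, protected material (text and code), and groundedness.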
Assign the filter to one or more model deployments.

Setting	Options	Applies to
Input filter	Off / Low / Medium / High	Prompts
Output filter	Off / Low / Medium / High	Completions
Prompt Shields (jailbreak)	On / Off	Input
Indirect attack (XPIA)	On / Off	Input documents
Protected material – text	On / Off	Output
Protected material – code	On / Off	Output
Groundedness	On / Off	Output

On the Exam: "Where do you change content filter severity for a GPT-4o deployment?" → Microsoft Foundry portal, Safety + Security, Content filters, then assign to the deployment. CLI/SDK answers are wrong.

Filter Annotations

With annotations enabled, the response carries detailed results. Memorize which object holds which flag:

{
  "choices": [{
    "content_filter_results": {
      "hate": {"filtered": false, "severity": "safe"},
      "self_harm": {"filtered": false, "severity": "safe"},
      "sexual": {"filtered": false, "severity": "safe"},
      "violence": {"filtered": false, "severity": "safe"},
      "protected_material_text": {"filtered": false, "detected": false},
      "protected_material_code": {"filtered": false, "detected": false}
    },
    "finish_reason": "stop"
  }],
  "prompt_filter_results": [{
    "content_filter_results": {
      "hate": {"filtered": false, "severity": "safe"},
      "jailbreak": {"filtered": false, "detected": false},
      "indirect_attack": {"filtered": false, "detected": false}
    }
  }]
}

prompt_filter_results = input analysis; this is where jailbreak and indirect_attack live.
content_filter_results (inside choices) = output analysis; this is where protected_material_text/code live.
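The split between the two objects can be checked in code. The following is a minimal sketch that assumes the response has already been parsed into a plain dict with the shape shown above; fired_detectors is a hypothetical helper name, not an SDK function.

```python
def fired_detectors(response: dict) -> dict:
    """Collect the names of annotations that flagged something, split by direction."""
    def flagged(results: dict) -> list:
        # An entry counts as flagged if it was filtered or (for detectors) detected.
        return sorted(
            name for name, result in results.items()
            if result.get("filtered") or result.get("detected")
        )

    input_flags = []
    for item in response.get("prompt_filter_results", []):
        input_flags.extend(flagged(item.get("content_filter_results", {})))

    output_flags = []
    for choice in response.get("choices", []):
        output_flags.extend(flagged(choice.get("content_filter_results", {})))

    return {"input": input_flags, "output": output_flags}
```

A jailbreak hit shows up under "input" and a protected-material hit under "output", which mirrors where each flag lives in the JSON.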

On the Exam: If a question asks which field reveals a jailbreak attempt, it is prompt_filter_results[...].content_filter_results.jailbreak — never the output content_filter_results. Protected material is the mirror image: output only.

Content Filtering for DALL-E and Image Generation

Image-generation models add their own layers: a prompt filter on the request, an output filter on the generated image, an automatic revised_prompt that may rewrite your prompt for safety, and copyright protection against close reproductions of known works.

resp = client.images.generate(model="dall-e-3",
    prompt="A peaceful landscape painting", n=1, size="1024x1024")
if resp.data[0].revised_prompt:
    print("Revised:", resp.data[0].revised_prompt)

If the prompt is unsafe the generate call is rejected; if you see a revised_prompt, the safety system rewrote your request before generation.
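If you want to log when a rewrite happened, a small comparison is enough. This is a sketch under the assumption that the image item exposes revised_prompt as in the snippet above; prompt_was_revised is a hypothetical helper, and SimpleNamespace stands in for resp.data[0] so the example runs without calling the service.

```python
from types import SimpleNamespace

def prompt_was_revised(requested: str, image) -> bool:
    """True if the service returned a revised_prompt that differs from the request."""
    revised = getattr(image, "revised_prompt", None)
    return bool(revised) and revised.strip() != requested.strip()

# Stand-in for resp.data[0]; a real call returns an SDK object with the same attribute.
fake_image = SimpleNamespace(revised_prompt="A calm landscape painting of rolling hills")
print(prompt_was_revised("A peaceful landscape painting", fake_image))
```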

Standalone Content Safety vs the Built-In Filter

The built-in Azure OpenAI filter and the standalone Content Safety APIs share the same harm models, but they are not interchangeable on the exam.

Question signal	Correct answer
"Screen prompts/completions for an Azure OpenAI deployment"	Built-in filter (configure in Foundry)
"Moderate user comments before they ever reach a model"	Standalone Analyze Text API
"Adjust how strictly GPT-4o blocks violent prompts"	Custom content filter in Foundry, assigned to the deployment
"Get the raw 0-7 severity to apply my own policy"	Standalone API (built-in filter only allows/blocks)

The built-in filter applies a policy and short-circuits the call; the standalone API hands back severities and lets your code decide. Both can run in the same solution — for example, standalone moderation on uploaded files and the built-in filter on the chat completion.
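To make "lets your code decide" concrete, here is a minimal sketch of a local policy applied to standalone results. It assumes the parsed response contains a categoriesAnalysis list of category/severity pairs; MAX_ALLOWED and violations are hypothetical names, and the limits are example values, not recommendations.

```python
# Hypothetical policy: the highest severity (0-7 scale) tolerated per category.
MAX_ALLOWED = {"Hate": 1, "Sexual": 1, "Violence": 3, "SelfHarm": 0}

def violations(analysis: dict, max_allowed: dict = MAX_ALLOWED) -> list:
    """Return the categories whose reported severity exceeds the local policy."""
    return [
        item["category"]
        for item in analysis.get("categoriesAnalysis", [])
        if item["severity"] > max_allowed.get(item["category"], 0)
    ]

# Assumed response shape, trimmed to the fields this sketch reads.
sample = {"categoriesAnalysis": [
    {"category": "Hate", "severity": 0},
    {"category": "Violence", "severity": 4},
]}
print(violations(sample))
```

The built-in filter would simply block or allow; here the same severities feed a rule you own, so different surfaces of an app can apply different limits.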

Models Covered and Approval to Loosen Filters

The content filter applies to the generative Azure OpenAI models: the GPT-4o / GPT-4 / GPT-3.5 chat families and DALL-E for images. Embeddings are not filtered (no harmful-text surface), and Whisper transcription is covered only where applicable. You cannot fully disable filtering on the highest-risk categories from the portal: lowering or turning off filters for an approved subset requires submitting Microsoft's modified content filter application and being granted Limited Access. Exam scenarios that say "the customer wants to turn the filter off entirely" should steer you to that approval process, not a simple toggle.

Interpreting finish_reason and Errors — Worked Example

Consider a chat call where the user prompt is clean but the model drifts into describing weapon construction. The request succeeds (HTTP 200), but choices[0].finish_reason comes back as content_filter and choices[0].message.content is empty; content_filter_results.violence.filtered is true with severity: "high". Your code must detect the empty filtered completion and present a safe fallback — it must not assume an empty string means the model had nothing to say.

Now consider a prompt that itself contains a high-severity hate slur: the SDK raises an exception, the HTTP status is 400, and the error payload's code is content_filter, with the offending category reported in the error's innererror.content_filter_result. No completion is generated and you are not billed for output tokens. A production handler must branch on both outcomes (the 200 with a content_filter finish reason and the 400 input rejection), which is exactly the try/except plus finish_reason check shown earlier.
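The two scenarios reduce to a small decision rule. The sketch below works on a raw HTTP status and parsed JSON body rather than SDK objects, so it can be exercised offline; classify_outcome and its return labels are hypothetical.

```python
def classify_outcome(status: int, body: dict) -> str:
    """Map an HTTP status and parsed JSON body to one of four outcomes."""
    # Input rejected: HTTP 400 whose error code is content_filter.
    if status == 400 and body.get("error", {}).get("code") == "content_filter":
        return "input_blocked"
    if status == 200:
        reasons = [c.get("finish_reason") for c in body.get("choices", [])]
        # Output withheld: the call succeeded but a choice ended on content_filter.
        if "content_filter" in reasons:
            return "output_blocked"
        return "ok"
    return "other_error"
```

For example, a 200 body whose only choice has finish_reason "content_filter" classifies as "output_blocked", while a 400 with error code "content_filter" classifies as "input_blocked".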

Common Traps

A filtered output is HTTP 200, not an error; only a filtered input is a 400. Mixing these up is a classic distractor.
jailbreak and indirect_attack live in prompt_filter_results; protected_material_* lives in the output content_filter_results.
Custom filters are configured in Microsoft Foundry, never via az CLI or the SDK constructor.

Test Your Knowledge

What does an Azure OpenAI response show when the generated output is blocked by the content filter?

An HTTP 500 server error

An automatic retry against a different model

The unfiltered content with a warning header

A response whose finish_reason is content_filter

Test Your Knowledge

Where do you create and assign custom content filter configurations for an Azure OpenAI deployment?

Azure portal, on the Azure OpenAI resource Content filters blade

Directly in the OpenAI Python SDK constructor

Azure CLI using az openai filter commands

Microsoft Foundry portal, Safety + Security, Content filters

Test Your Knowledge

By default, which severity levels does the Azure OpenAI content filter block?

Only High severity

All severities including Safe

Low, Medium, and High severity

Medium and High severity

Test Your Knowledge

Which annotation field reveals that an input prompt contained a jailbreak attempt?

choices[].content_filter_results.violence

prompt_filter_results[].content_filter_results.protected_material_code

choices[].content_filter_results.protected_material_text

prompt_filter_results[].content_filter_results.jailbreak
