2.4 Content Filtering in Azure OpenAI Service
Key Takeaways
- Azure OpenAI Service has a built-in content filtering system that screens both input prompts and output completions.
- Default content filters block medium- and high-severity content across all four harm categories and are enabled on every deployment; disabling or weakening them requires an approved modified-content-filtering request.
- Custom content filter configurations allow adjusting severity thresholds per category and per direction (input vs. output).
- Annotations in the API response indicate which content filter categories were triggered and their severity scores.
- Content filtering applies to Azure OpenAI text and image models, including GPT-4o, GPT-4, GPT-3.5 Turbo, and DALL-E; Whisper (speech-to-text) models are not covered by the content filtering system.
Content Filtering in Azure OpenAI Service
Quick Answer: Azure OpenAI Service includes built-in content filters that screen input prompts and output completions across violence, self-harm, sexual, and hate categories. Default filters block medium+ severity. Create custom configurations in Azure AI Foundry to adjust thresholds per category.
Default Content Filter Behavior
Every Azure OpenAI deployment has content filters enabled by default:
| Category | Input Filter (Default) | Output Filter (Default) |
|---|---|---|
| Violence | Block Medium + High | Block Medium + High |
| Self-Harm | Block Medium + High | Block Medium + High |
| Sexual | Block Medium + High | Block Medium + High |
| Hate | Block Medium + High | Block Medium + High |
What Happens When Content Is Filtered
Input filtered: The API returns an HTTP 400 error with a content_filter error code. The model never sees the prompt.
Output filtered: The API returns a response with a finish_reason of "content_filter" instead of "stop". The completion content is withheld (returned empty or null) from the point where the filter triggered.
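When an input prompt is blocked, the 400 error body also identifies which category triggered the filter. The payload below is an illustrative sketch of that error shape, with placeholder values; consult the current API reference for the exact fields:

```json
{
  "error": {
    "code": "content_filter",
    "message": "The prompt triggered Azure OpenAI's content management policy.",
    "innererror": {
      "code": "ResponsibleAIPolicyViolation",
      "content_filter_result": {
        "hate": { "filtered": false, "severity": "safe" },
        "self_harm": { "filtered": false, "severity": "safe" },
        "sexual": { "filtered": false, "severity": "safe" },
        "violence": { "filtered": true, "severity": "medium" }
      }
    }
  }
}
```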
Handling Filtered Responses in Code
```python
from openai import AzureOpenAI, BadRequestError

client = AzureOpenAI(
    api_key="<your-key>",
    api_version="2024-06-01",
    azure_endpoint="https://my-openai.openai.azure.com/",
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",  # your deployment name
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "User message here"},
        ],
    )
    choice = response.choices[0]
    # Check if the output was filtered
    if choice.finish_reason == "content_filter":
        # Handle gracefully, e.g. show an alternative message
        print("Response was filtered by content safety.")
    else:
        print(choice.message.content)
except BadRequestError as e:
    # Input filtering returns HTTP 400 with a content_filter error code
    if "content_filter" in str(e):
        print("Input was blocked by content safety filters.")
    else:
        raise
```
Custom Content Filter Configurations
Creating a Custom Filter in Azure AI Foundry
- Navigate to Azure AI Foundry portal
- Open your project and go to Safety + Security → Content filters
- Click Create content filter
- Configure severity thresholds per category:
| Setting | Options | Description |
|---|---|---|
| Input filter | Allow all / Low / Medium / High | Severity threshold for input prompts; the selected level and above is blocked (Low is strictest, Allow all disables the filter) |
| Output filter | Allow all / Low / Medium / High | Severity threshold for generated responses, with the same semantics |
| Prompt Shields | On / Off | Enable/disable jailbreak detection |
| Protected material | On / Off | Enable/disable copyright detection |
| Groundedness | On / Off | Enable/disable groundedness checks |
- Assign the custom filter to a model deployment
Custom Filter Examples
Medical Application:
- Violence input: Medium threshold (low-severity clinical descriptions of injury pass through)
- Violence output: Medium threshold
- Self-Harm input: Low threshold (strictest; blocks low, medium, and high severity)
- Self-Harm output: Low threshold
- Sexual input: Medium threshold (default)
- Sexual output: Medium threshold
- Hate input: Medium threshold (default)
- Hate output: Medium threshold
Customer Service Bot:
- All categories, input: Medium threshold (default)
- All categories, output: Low threshold (stricter on outputs; blocks low severity and above)
- Prompt Shields: Enabled
- Protected material: Enabled
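The threshold semantics used in the examples above can be sketched as a small helper. This is an illustration of how severity levels relate to a configured threshold, not an Azure SDK API; the function and level names are assumptions for the sketch:

```python
# Severity levels in ascending order; "safe" is never blocked.
SEVERITY_ORDER = ["safe", "low", "medium", "high"]


def is_blocked(severity: str, threshold: str) -> bool:
    """Return True if content at `severity` is blocked by a filter
    configured to block `threshold` severity and above.

    threshold="allow_all" disables the filter entirely.
    """
    if threshold == "allow_all":
        return False
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(threshold)


# Default filter (Medium threshold): low-severity content passes.
print(is_blocked("low", "medium"))     # False
print(is_blocked("medium", "medium"))  # True
# Strict filter (Low threshold) blocks everything except safe content.
print(is_blocked("low", "low"))        # True
```

This mirrors the table above: a Medium threshold blocks medium and high severity, while a Low threshold also catches low-severity content.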
Content Filter Annotations
When annotations are enabled, the API response includes detailed content filter results:
```json
{
  "choices": [{
    "message": {
      "content": "Generated response text"
    },
    "content_filter_results": {
      "hate": {
        "filtered": false,
        "severity": "safe"
      },
      "self_harm": {
        "filtered": false,
        "severity": "safe"
      },
      "sexual": {
        "filtered": false,
        "severity": "safe"
      },
      "violence": {
        "filtered": false,
        "severity": "safe"
      },
      "protected_material_text": {
        "filtered": false,
        "detected": false
      },
      "protected_material_code": {
        "filtered": false,
        "detected": false
      }
    },
    "finish_reason": "stop"
  }],
  "prompt_filter_results": [{
    "content_filter_results": {
      "hate": {"filtered": false, "severity": "safe"},
      "self_harm": {"filtered": false, "severity": "safe"},
      "sexual": {"filtered": false, "severity": "safe"},
      "violence": {"filtered": false, "severity": "safe"},
      "jailbreak": {"filtered": false, "detected": false},
      "indirect_attack": {"filtered": false, "detected": false}
    }
  }]
}
```
On the Exam: Know the difference between `prompt_filter_results` (input analysis, which includes jailbreak and indirect attack detection) and `content_filter_results` (output analysis, which includes protected material detection). Questions may ask you to interpret annotation JSON.
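Annotation fields like these are straightforward to inspect in code. The sketch below walks a response dictionary shaped like the JSON above (the values here are hypothetical sample data) and reports triggered categories and jailbreak detections:

```python
# Hypothetical sample mirroring the annotation JSON shape shown above.
response = {
    "choices": [{
        "content_filter_results": {
            "hate": {"filtered": False, "severity": "safe"},
            "violence": {"filtered": True, "severity": "medium"},
        },
        "finish_reason": "content_filter",
    }],
    "prompt_filter_results": [{
        "content_filter_results": {
            "jailbreak": {"filtered": False, "detected": True},
        }
    }],
}

# Output-side annotations: which harm categories were filtered?
output_results = response["choices"][0]["content_filter_results"]
triggered = [
    cat for cat, result in output_results.items() if result.get("filtered")
]
print(triggered)  # ['violence']

# Input-side annotations: was a jailbreak attempt detected?
prompt_results = response["prompt_filter_results"][0]["content_filter_results"]
jailbreak = prompt_results.get("jailbreak", {})
print(jailbreak.get("detected"))  # True
```

Note that jailbreak and indirect attack detections appear only under `prompt_filter_results`, since they analyze the input.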
Content Filtering for DALL-E
Image generation models (DALL-E 3) have additional content filters:
| Filter | Description |
|---|---|
| Prompt filter | Screens the text prompt for harmful image generation requests |
| Output filter | Analyzes the generated image for harmful visual content |
| Revised prompt | DALL-E may revise the prompt to add safety-oriented language |
| Copyright protection | Prevents generation of images closely resembling copyrighted works |
```python
from openai import BadRequestError

try:
    response = client.images.generate(
        model="dall-e-3",  # your deployment name
        prompt="A peaceful landscape painting",
        n=1,
        size="1024x1024",
    )
    # DALL-E 3 may rewrite the prompt to add safety-oriented language
    if response.data[0].revised_prompt:
        print(f"Prompt was revised to: {response.data[0].revised_prompt}")
except BadRequestError:
    # A prompt blocked by the content filter returns HTTP 400
    print("Image generation request was blocked by content safety filters.")
```
Review Questions
- What happens when an Azure OpenAI output is blocked by the content filter?
- Where do you create and manage custom content filter configurations for Azure OpenAI Service?
- Which content filter annotation field indicates whether an input prompt contains a jailbreak attempt?