2.4 Content Filtering in Azure OpenAI Service

Key Takeaways

  • Azure OpenAI Service has a built-in content filtering system that screens both input prompts and output completions.
  • Default content filters block medium- and high-severity content across all four harm categories and are enabled on every deployment.
  • Custom content filter configurations allow adjusting severity thresholds per category and per direction (input vs. output).
  • Annotations in the API response indicate which content filter categories were triggered and their severity scores.
  • Content filtering applies to Azure OpenAI text and image models, including GPT-4o, GPT-4, GPT-3.5 Turbo, and DALL-E (audio models such as Whisper are not covered by the filtering system).
Last updated: March 2026

Quick Answer: Azure OpenAI Service includes built-in content filters that screen input prompts and output completions across violence, self-harm, sexual, and hate categories. Default filters block medium+ severity. Create custom configurations in Azure AI Foundry to adjust thresholds per category.

Default Content Filter Behavior

Every Azure OpenAI deployment has content filters enabled by default:

Category  | Input Filter (Default) | Output Filter (Default)
----------|------------------------|------------------------
Violence  | Block Medium + High    | Block Medium + High
Self-Harm | Block Medium + High    | Block Medium + High
Sexual    | Block Medium + High    | Block Medium + High
Hate      | Block Medium + High    | Block Medium + High

What Happens When Content Is Filtered

Input filtered: The API returns an HTTP 400 error with a content_filter error code. The model never sees the prompt.

Output filtered: The API returns a response with a finish_reason of "content_filter" instead of "stop". The returned content may be truncated or empty.
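
For reference, a blocked input surfaces as an error payload shaped roughly like the one below. This is an illustrative sketch; the exact message text varies, and the nested content_filter_result mirrors the annotation format covered later in this section:

```json
{
    "error": {
        "code": "content_filter",
        "message": "The prompt triggered the content management policy.",
        "innererror": {
            "code": "ResponsibleAIPolicyViolation",
            "content_filter_result": {
                "hate": {"filtered": true, "severity": "high"},
                "self_harm": {"filtered": false, "severity": "safe"},
                "sexual": {"filtered": false, "severity": "safe"},
                "violence": {"filtered": false, "severity": "safe"}
            }
        }
    }
}
```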

Handling Filtered Responses in Code

from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="<your-key>",
    api_version="2024-06-01",
    azure_endpoint="https://my-openai.openai.azure.com/"
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "User message here"}
        ]
    )

    # Check if the response was filtered
    choice = response.choices[0]
    if choice.finish_reason == "content_filter":
        print("Response was filtered by content safety.")
        # Handle gracefully — show alternative message
    else:
        print(choice.message.content)

except Exception as e:
    # A filtered input surfaces as an HTTP 400 error whose code is "content_filter"
    if "content_filter" in str(e):
        print("Input was blocked by content safety filters.")
    else:
        raise

Custom Content Filter Configurations

Creating a Custom Filter in Azure AI Foundry

  1. Navigate to the Azure AI Foundry portal
  2. Open your project and go to Safety + Security → Content filters
  3. Click Create content filter
  4. Configure severity thresholds per category:

     Setting            | Options                         | Description
     -------------------|---------------------------------|------------------------------------
     Input filter       | Allow all / Low / Medium / High | Threshold for input prompts
     Output filter      | Allow all / Low / Medium / High | Threshold for generated responses
     Prompt Shields     | On / Off                        | Enable/disable jailbreak detection
     Protected material | On / Off                        | Enable/disable copyright detection
     Groundedness       | On / Off                        | Enable/disable groundedness checks

  5. Assign the custom filter to a model deployment

Custom Filter Examples

Medical Application:

  • Violence input: Allow Low (permits clinical descriptions of injuries); block Medium + High
  • Violence output: Allow Low; block Medium + High
  • Self-Harm input: Block Medium + High (default threshold)
  • Self-Harm output: Block Medium + High
  • Sexual input: Block Medium + High
  • Sexual output: Block Medium + High
  • Hate input: Block Medium + High
  • Hate output: Block Medium + High

Customer Service Bot:

  • All categories input: Block Medium + High (default)
  • All categories output: Block Low + Medium + High (stricter on outputs)
  • Prompt Shields: Enabled
  • Protected material: Enabled
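
The threshold settings above follow one rule: choosing a threshold blocks content at that severity and everything above it. A minimal sketch of that semantics in Python (illustrative only, not an Azure SDK API):

```python
from typing import Optional

# Severity scale in ascending order, matching the annotation values
SEVERITIES = ["safe", "low", "medium", "high"]

def is_blocked(severity: str, threshold: Optional[str]) -> bool:
    """True if content at `severity` is blocked by a filter set to `threshold`.

    A threshold of "low" blocks low, medium, and high; "medium" blocks
    medium and high; None corresponds to "Allow all" (category filter off).
    """
    if threshold is None:
        return False
    return SEVERITIES.index(severity) >= SEVERITIES.index(threshold)

# Default configuration: Block Medium + High
print(is_blocked("medium", "medium"))  # True
print(is_blocked("low", "medium"))     # False
```

This also makes the customer-service example concrete: setting the output threshold to "low" blocks strictly more content than the default "medium".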

Content Filter Annotations

When annotations are enabled, the API response includes detailed content filter results:

{
    "choices": [{
        "message": {
            "content": "Generated response text"
        },
        "content_filter_results": {
            "hate": {
                "filtered": false,
                "severity": "safe"
            },
            "self_harm": {
                "filtered": false,
                "severity": "safe"
            },
            "sexual": {
                "filtered": false,
                "severity": "safe"
            },
            "violence": {
                "filtered": false,
                "severity": "safe"
            },
            "protected_material_text": {
                "filtered": false,
                "detected": false
            },
            "protected_material_code": {
                "filtered": false,
                "detected": false
            }
        },
        "finish_reason": "stop"
    }],
    "prompt_filter_results": [{
        "content_filter_results": {
            "hate": {"filtered": false, "severity": "safe"},
            "self_harm": {"filtered": false, "severity": "safe"},
            "sexual": {"filtered": false, "severity": "safe"},
            "violence": {"filtered": false, "severity": "safe"},
            "jailbreak": {"filtered": false, "detected": false},
            "indirect_attack": {"filtered": false, "detected": false}
        }
    }]
}
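
In code, these annotations can be walked as plain dictionaries. The sketch below uses a hypothetical response in which the violence category was flagged at medium severity, so the helper returns a non-empty list:

```python
# Hypothetical (abbreviated) response with a filtered violence category
response = {
    "choices": [{
        "content_filter_results": {
            "hate": {"filtered": False, "severity": "safe"},
            "violence": {"filtered": True, "severity": "medium"},
        },
        "finish_reason": "content_filter",
    }],
    "prompt_filter_results": [{
        "content_filter_results": {
            "jailbreak": {"filtered": False, "detected": False},
        }
    }],
}

def triggered_categories(filter_results):
    """Names of categories whose `filtered` flag is set."""
    return sorted(name for name, r in filter_results.items() if r.get("filtered"))

choice = response["choices"][0]
print(triggered_categories(choice["content_filter_results"]))  # ['violence']
print(choice["finish_reason"])                                 # content_filter

prompt = response["prompt_filter_results"][0]["content_filter_results"]
print(prompt["jailbreak"]["detected"])                         # False
```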

On the Exam: Know the difference between prompt_filter_results (input analysis — includes jailbreak and indirect attack detection) and content_filter_results (output analysis — includes protected material detection). Questions may ask you to interpret annotation JSON.

Content Filtering for DALL-E

Image generation models (DALL-E 3) have additional content filters:

Filter               | Description
---------------------|----------------------------------------------------------------------
Prompt filter        | Screens the text prompt for harmful image generation requests
Output filter        | Analyzes the generated image for harmful visual content
Revised prompt       | DALL-E may revise the prompt to add safety-oriented language
Copyright protection | Prevents generation of images closely resembling copyrighted works

try:
    response = client.images.generate(
        model="dall-e-3",
        prompt="A peaceful landscape painting",
        n=1,
        size="1024x1024"
    )

    # DALL-E 3 may rewrite the prompt; revised_prompt shows the adjusted text
    if response.data[0].revised_prompt:
        print(f"Prompt was revised to: {response.data[0].revised_prompt}")

except Exception as e:
    # A blocked image prompt surfaces as an HTTP 400 error referencing the content policy
    if "content_policy_violation" in str(e):
        print("Image prompt was blocked by content safety filters.")
    else:
        raise

Test Your Knowledge

  • What happens when an Azure OpenAI output is blocked by the content filter?
  • Where do you create and manage custom content filter configurations for Azure OpenAI Service?
  • Which content filter annotation field indicates whether an input prompt contains a jailbreak attempt?
D