2.4 Content Filtering in Azure OpenAI Service
Key Takeaways
- Azure OpenAI Service has a built-in content filtering system that screens both input prompts and output completions.
- Default content filters block medium- and high-severity content across all four harm categories and are enabled on every deployment; disabling or weakening them requires an approved modified-content-filtering request.
- Custom content filter configurations allow adjusting severity thresholds per category and per direction (input vs. output).
- Annotations in the API response indicate which content filter categories were triggered and their severity scores.
- Content filtering applies to Azure OpenAI text and image models, including GPT-4o, GPT-4, GPT-3.5 Turbo, and DALL-E; Whisper (speech-to-text) models are not covered by the content filtering system.
Content Filtering in Azure OpenAI Service
Quick Answer: Azure OpenAI Service includes built-in content filters that screen input prompts and output completions across violence, self-harm, sexual, and hate categories. Default filters block medium+ severity. Create custom configurations in Azure AI Foundry to adjust thresholds per category.
Default Content Filter Behavior
Every Azure OpenAI deployment has content filters enabled by default:
| Category | Input Filter (Default) | Output Filter (Default) |
|---|---|---|
| Violence | Block Medium + High | Block Medium + High |
| Self-Harm | Block Medium + High | Block Medium + High |
| Sexual | Block Medium + High | Block Medium + High |
| Hate | Block Medium + High | Block Medium + High |
What Happens When Content Is Filtered
Input filtered: The API returns an HTTP 400 error with a content_filter error code. The model never sees the prompt.
Output filtered: The API returns a response with a finish_reason of "content_filter" instead of "stop". The completion content is withheld (returned empty or null) from the point where the filter triggered.
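When an input prompt is blocked, the 400 error body also identifies which category triggered the filter. The payload below is an illustrative sketch of that error shape, with placeholder values; consult the current API reference for the exact fields:

```json
{
  "error": {
    "code": "content_filter",
    "message": "The prompt triggered Azure OpenAI's content management policy.",
    "innererror": {
      "code": "ResponsibleAIPolicyViolation",
      "content_filter_result": {
        "hate": { "filtered": false, "severity": "safe" },
        "self_harm": { "filtered": false, "severity": "safe" },
        "sexual": { "filtered": false, "severity": "safe" },
        "violence": { "filtered": true, "severity": "medium" }
      }
    }
  }
}
```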
Handling Filtered Responses in Code
```python
from openai import AzureOpenAI, BadRequestError

client = AzureOpenAI(
    api_key="<your-key>",
    api_version="2024-06-01",
    azure_endpoint="https://my-openai.openai.azure.com/",
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",  # your deployment name
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "User message here"},
        ],
    )
    choice = response.choices[0]
    # Check if the output was filtered
    if choice.finish_reason == "content_filter":
        # Handle gracefully, e.g. show an alternative message
        print("Response was filtered by content safety.")
    else:
        print(choice.message.content)
except BadRequestError as e:
    # Input filtering returns HTTP 400 with a content_filter error code
    if "content_filter" in str(e):
        print("Input was blocked by content safety filters.")
    else:
        raise
```
Custom Content Filter Configurations
Creating a Custom Filter in Azure AI Foundry
- Navigate to Azure AI Foundry portal
- Open your project and go to Safety + Security → Content filters
- Click Create content filter
- Configure severity thresholds per category:
| Setting | Options | Description |
|---|---|---|
| Input filter | Allow all / Low / Medium / High | Severity threshold for input prompts; the selected level and above is blocked (Low is strictest, Allow all disables the filter) |
| Output filter | Allow all / Low / Medium / High | Severity threshold for generated responses, with the same semantics |
| Prompt Shields | On / Off | Enable/disable jailbreak detection |
| Protected material | On / Off | Enable/disable copyright detection |
| Groundedness | On / Off | Enable/disable groundedness checks |
- Assign the custom filter to a model deployment
Custom Filter Examples
Medical Application:
- Violence input: Medium threshold (low-severity clinical descriptions of injury pass through)
- Violence output: Medium threshold
- Self-Harm input: Low threshold (strictest; blocks low, medium, and high severity)
- Self-Harm output: Low threshold
- Sexual input: Medium threshold (default)
- Sexual output: Medium threshold
- Hate input: Medium threshold (default)
- Hate output: Medium threshold
Customer Service Bot:
- All categories, input: Medium threshold (default)
- All categories, output: Low threshold (stricter on outputs; blocks low severity and above)
- Prompt Shields: Enabled
- Protected material: Enabled
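The threshold semantics used in the examples above can be sketched as a small helper. This is an illustration of how severity levels relate to a configured threshold, not an Azure SDK API; the function and level names are assumptions for the sketch:

```python
# Severity levels in ascending order; "safe" is never blocked.
SEVERITY_ORDER = ["safe", "low", "medium", "high"]


def is_blocked(severity: str, threshold: str) -> bool:
    """Return True if content at `severity` is blocked by a filter
    configured to block `threshold` severity and above.

    threshold="allow_all" disables the filter entirely.
    """
    if threshold == "allow_all":
        return False
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(threshold)


# Default filter (Medium threshold): low-severity content passes.
print(is_blocked("low", "medium"))     # False
print(is_blocked("medium", "medium"))  # True
# Strict filter (Low threshold) blocks everything except safe content.
print(is_blocked("low", "low"))        # True
```

This mirrors the table above: a Medium threshold blocks medium and high severity, while a Low threshold also catches low-severity content.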
Content Filter Annotations
When annotations are enabled, the API response includes detailed content filter results:
```json
{
  "choices": [{
    "message": {
      "content": "Generated response text"
    },
    "content_filter_results": {
      "hate": {
        "filtered": false,
        "severity": "safe"
      },
      "self_harm": {
        "filtered": false,
        "severity": "safe"
      },
      "sexual": {
        "filtered": false,
        "severity": "safe"
      },
      "violence": {
        "filtered": false,
        "severity": "safe"
      },
      "protected_material_text": {
        "filtered": false,
        "detected": false
      },
      "protected_material_code": {
        "filtered": false,
        "detected": false
      }
    },
    "finish_reason": "stop"
  }],
  "prompt_filter_results": [{
    "content_filter_results": {
      "hate": {"filtered": false, "severity": "safe"},
      "self_harm": {"filtered": false, "severity": "safe"},
      "sexual": {"filtered": false, "severity": "safe"},
      "violence": {"filtered": false, "severity": "safe"},
      "jailbreak": {"filtered": false, "detected": false},
      "indirect_attack": {"filtered": false, "detected": false}
    }
  }]
}
```
On the Exam: Know the difference between `prompt_filter_results` (input analysis, which includes jailbreak and indirect attack detection) and `content_filter_results` (output analysis, which includes protected material detection). Questions may ask you to interpret annotation JSON.
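Annotation fields like these are straightforward to inspect in code. The sketch below walks a response dictionary shaped like the JSON above (the values here are hypothetical sample data) and reports triggered categories and jailbreak detections:

```python
# Hypothetical sample mirroring the annotation JSON shape shown above.
response = {
    "choices": [{
        "content_filter_results": {
            "hate": {"filtered": False, "severity": "safe"},
            "violence": {"filtered": True, "severity": "medium"},
        },
        "finish_reason": "content_filter",
    }],
    "prompt_filter_results": [{
        "content_filter_results": {
            "jailbreak": {"filtered": False, "detected": True},
        }
    }],
}

# Output-side annotations: which harm categories were filtered?
output_results = response["choices"][0]["content_filter_results"]
triggered = [
    cat for cat, result in output_results.items() if result.get("filtered")
]
print(triggered)  # ['violence']

# Input-side annotations: was a jailbreak attempt detected?
prompt_results = response["prompt_filter_results"][0]["content_filter_results"]
jailbreak = prompt_results.get("jailbreak", {})
print(jailbreak.get("detected"))  # True
```

Note that jailbreak and indirect attack detections appear only under `prompt_filter_results`, since they analyze the input.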
Content Filtering for DALL-E
Image generation models (DALL-E 3) have additional content filters:
| Filter | Description |
|---|---|
| Prompt filter | Screens the text prompt for harmful image generation requests |
| Output filter | Analyzes the generated image for harmful visual content |
| Revised prompt | DALL-E may revise the prompt to add safety-oriented language |
| Copyright protection | Prevents generation of images closely resembling copyrighted works |
```python
from openai import BadRequestError

try:
    response = client.images.generate(
        model="dall-e-3",  # your deployment name
        prompt="A peaceful landscape painting",
        n=1,
        size="1024x1024",
    )
    # DALL-E 3 may rewrite the prompt to add safety-oriented language
    if response.data[0].revised_prompt:
        print(f"Prompt was revised to: {response.data[0].revised_prompt}")
except BadRequestError:
    # A prompt blocked by the content filter returns HTTP 400
    print("Image generation request was blocked by content safety filters.")
```
Review Questions
- What happens when an Azure OpenAI output is blocked by the content filter?
- Where do you create and manage custom content filter configurations for Azure OpenAI Service?
- Which content filter annotation field indicates whether an input prompt contains a jailbreak attempt?