2.1 Azure AI Content Safety Overview
Key Takeaways
- Azure AI Content Safety is a dedicated service for detecting harmful content in text, images, and AI-generated outputs.
- The service evaluates content across four harm categories — violence, self-harm, sexual content, and hate speech; each is rated on a 0-7 severity scale.
- Content Safety supports both user-generated content (UGC) moderation and AI-generated content filtering in generative AI applications.
- The service is available as a standalone API and is integrated into Azure OpenAI Service as a built-in content filter.
- Prompt Shields detect and block adversarial prompt injection attacks — both direct jailbreaks and indirect (XPIA) attacks.
Azure AI Content Safety Overview
Quick Answer: Azure AI Content Safety detects harmful content in text and images across four categories (violence, self-harm, sexual, hate) rated on a 0-7 severity scale. It includes Prompt Shields for adversarial attack detection, groundedness detection to combat hallucination, and protected material detection to prevent copyright infringement.
What Is Azure AI Content Safety?
Azure AI Content Safety is a cloud-based API service that uses AI models to detect potentially harmful content in text, images, and multi-modal inputs. It powers the content filtering in Azure OpenAI Service and is also available as a standalone service for moderating user-generated content in applications.
Core Capabilities
| Capability | Description | Input Types |
|---|---|---|
| Text Moderation | Analyze text for harmful content across four categories | Text strings (up to 10,000 characters) |
| Image Moderation | Analyze images for harmful visual content | JPEG, PNG, GIF, BMP, TIFF, WEBP |
| Prompt Shields | Detect adversarial prompt injection attacks | Text prompts and documents |
| Groundedness Detection | Check if AI responses are grounded in source material | AI response + source documents |
| Protected Material Detection | Identify copyrighted text in AI outputs | AI-generated text |
| Custom Categories | Define organization-specific content categories | Text content |
Harm Categories
Azure AI Content Safety evaluates content across four primary harm categories:
1. Violence
Content that depicts, promotes, or glorifies violence, physical injury, or threats.
Examples detected:
- Descriptions of physical attacks or weapons use
- Graphic depictions of injuries or death
- Threats of violence against individuals or groups
- Instructions for causing physical harm
2. Self-Harm
Content that promotes, depicts, or instructs self-injury or suicide.
Examples detected:
- Descriptions of self-harm methods
- Promotion or glorification of self-harm
- Suicide instructions or encouragement
- Content that may trigger self-harm behaviors
3. Sexual Content
Sexually explicit or suggestive content.
Examples detected:
- Explicit sexual descriptions or imagery
- Suggestive content involving minors (always blocked)
- Non-consensual sexual scenarios
- Sexually explicit jokes or innuendo
4. Hate Speech
Content targeting protected groups with hostility, discrimination, or dehumanization.
Examples detected:
- Slurs and derogatory language targeting protected groups
- Content promoting discrimination based on race, gender, religion, etc.
- Dehumanizing language or comparisons
- Content promoting hate groups or ideologies
Severity Scale
Each harm category is rated on a 0-7 severity scale. (Note that by default the text analysis API returns a trimmed four-level output of 0, 2, 4, and 6; the full eight-level 0-7 output must be requested explicitly.)
| Severity Level | Range | Description |
|---|---|---|
| Safe | 0-1 | Content is appropriate and harmless |
| Low | 2-3 | Mildly concerning but generally acceptable in most contexts |
| Medium | 4-5 | Moderately harmful; inappropriate in many contexts |
| High | 6-7 | Severely harmful; inappropriate in virtually all contexts |
On the Exam: The severity scale is 0-7 (not 0-3 or 0-10). Questions may test whether you know the correct range and how to set thresholds in content filter configurations.
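The band boundaries in the table above can be expressed as a small helper. This is a minimal sketch for building intuition — the function names and the block-at-Medium threshold are illustrative conventions, not part of the Azure SDK:

```python
def severity_band(severity: int) -> str:
    """Map a 0-7 Content Safety severity score to its named band."""
    if not 0 <= severity <= 7:
        raise ValueError("severity must be in the range 0-7")
    if severity <= 1:
        return "Safe"
    if severity <= 3:
        return "Low"
    if severity <= 5:
        return "Medium"
    return "High"

def is_blocked(severity: int, threshold: int = 4) -> bool:
    """A common filter configuration blocks at Medium (4) and above."""
    return severity >= threshold
```

Lowering the threshold to 2 would also block Low-severity content, which is how stricter filter configurations are typically expressed.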
Creating a Content Safety Resource
```bash
# Create an Azure AI Content Safety resource
az cognitiveservices account create \
  --name my-content-safety \
  --resource-group rg-ai-prod \
  --kind ContentSafety \
  --sku S0 \
  --location eastus \
  --yes
```
Calling the Text Moderation API
```python
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions, TextCategory
from azure.core.credentials import AzureKeyCredential

client = ContentSafetyClient(
    endpoint="https://my-content-safety.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>")
)

# Analyze text for harmful content
request = AnalyzeTextOptions(
    text="Text content to analyze for safety",
    categories=[
        TextCategory.HATE,
        TextCategory.SELF_HARM,
        TextCategory.SEXUAL,
        TextCategory.VIOLENCE
    ]
)
response = client.analyze_text(request)

# Check results
for category_result in response.categories_analysis:
    print(f"Category: {category_result.category}")
    print(f"Severity: {category_result.severity}")
```
Calling the Image Moderation API
```python
from azure.ai.contentsafety.models import AnalyzeImageOptions, ImageData

# Analyze an image for harmful content
with open("image.jpg", "rb") as f:
    image_data = f.read()

request = AnalyzeImageOptions(
    image=ImageData(content=image_data)
)
response = client.analyze_image(request)

for category_result in response.categories_analysis:
    print(f"Category: {category_result.category}")
    print(f"Severity: {category_result.severity}")
```
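Prompt Shields (listed in the capabilities table) is exposed as a separate operation rather than through `analyze_text`. The sketch below uses only the standard library to call the REST surface directly; the endpoint path, request shape, and `api-version` value are assumptions based on the public REST documentation and should be verified against current docs, and `attack_detected` is an illustrative helper for the documented response shape:

```python
import json
import urllib.request

ENDPOINT = "https://my-content-safety.cognitiveservices.azure.com"
API_VERSION = "2024-09-01"  # assumed api-version; verify against current docs

def shield_prompt(user_prompt: str, documents: list, key: str) -> dict:
    """Call the Prompt Shields operation and return the parsed JSON body."""
    url = f"{ENDPOINT}/contentsafety/text:shieldPrompt?api-version={API_VERSION}"
    body = json.dumps({"userPrompt": user_prompt, "documents": documents}).encode()
    req = urllib.request.Request(
        url,
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def attack_detected(payload: dict) -> bool:
    """True if the user prompt (direct jailbreak) or any attached
    document (indirect/XPIA attack) triggered a shield."""
    if payload.get("userPromptAnalysis", {}).get("attackDetected"):
        return True
    return any(d.get("attackDetected")
               for d in payload.get("documentsAnalysis", []))
```

The two analysis sections mirror the two attack types from the Key Takeaways: `userPromptAnalysis` covers direct jailbreak attempts, while `documentsAnalysis` covers indirect injection hidden in grounding documents.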
Review Questions
1. What is the severity scale range used by Azure AI Content Safety?
2. Which four harm categories does Azure AI Content Safety evaluate?
3. Which Python class is used to create a Content Safety client in the Azure SDK?