2.1 Azure AI Content Safety Overview

Key Takeaways

  • Azure AI Content Safety is a dedicated service for detecting harmful content in text, images, and AI-generated outputs.
  • The service evaluates content across four harm categories: violence, self-harm, sexual content, and hate speech, each rated on a 0-7 severity scale.
  • Content Safety supports both user-generated content (UGC) moderation and AI-generated content filtering in generative AI applications.
  • The service is available as a standalone API and is integrated into Azure OpenAI Service as a built-in content filter.
  • Prompt Shields detect and block adversarial prompt injection attacks — both direct jailbreaks and indirect (XPIA) attacks.
Last updated: March 2026

Azure AI Content Safety Overview

Quick Answer: Azure AI Content Safety detects harmful content in text and images across four categories (violence, self-harm, sexual, hate) rated on a 0-7 severity scale. It includes Prompt Shields for adversarial attack detection, groundedness detection to combat hallucination, and protected material detection to prevent copyright infringement.

What Is Azure AI Content Safety?

Azure AI Content Safety is a cloud-based API service that uses AI models to detect potentially harmful content in text, images, and multi-modal inputs. It powers the content filtering in Azure OpenAI Service and is also available as a standalone service for moderating user-generated content in applications.

Core Capabilities

CapabilityDescriptionInput Types
Text ModerationAnalyze text for harmful content across four categoriesText strings (up to 10,000 characters)
Image ModerationAnalyze images for harmful visual contentJPEG, PNG, GIF, BMP, TIFF, WEBP
Prompt ShieldsDetect adversarial prompt injection attacksText prompts and documents
Groundedness DetectionCheck if AI responses are grounded in source materialAI response + source documents
Protected Material DetectionIdentify copyrighted text in AI outputsAI-generated text
Custom CategoriesDefine organization-specific content categoriesText content

Harm Categories

Azure AI Content Safety evaluates content across four primary harm categories:

1. Violence

Content that depicts, promotes, or glorifies violence, physical injury, or threats.

Examples detected:

  • Descriptions of physical attacks or weapons use
  • Graphic depictions of injuries or death
  • Threats of violence against individuals or groups
  • Instructions for causing physical harm

2. Self-Harm

Content that promotes, depicts, or instructs self-injury or suicide.

Examples detected:

  • Descriptions of self-harm methods
  • Promotion or glorification of self-harm
  • Suicide instructions or encouragement
  • Content that may trigger self-harm behaviors

3. Sexual Content

Sexually explicit or suggestive content.

Examples detected:

  • Explicit sexual descriptions or imagery
  • Suggestive content involving minors (always blocked)
  • Non-consensual sexual scenarios
  • Sexually explicit jokes or innuendo

4. Hate Speech

Content targeting protected groups with hostility, discrimination, or dehumanization.

Examples detected:

  • Slurs and derogatory language targeting protected groups
  • Content promoting discrimination based on race, gender, religion, etc.
  • Dehumanizing language or comparisons
  • Content promoting hate groups or ideologies

Severity Scale

Each harm category is rated on a 0-7 severity scale:

Severity LevelRangeDescription
Safe0-1Content is appropriate and harmless
Low2-3Mildly concerning but generally acceptable in most contexts
Medium4-5Moderately harmful; inappropriate in many contexts
High6-7Severely harmful; inappropriate in virtually all contexts

On the Exam: The severity scale is 0-7 (not 0-3 or 0-10). Questions may test whether you know the correct range and how to set thresholds in content filter configurations.

Creating a Content Safety Resource

# Create an Azure AI Content Safety resource
az cognitiveservices account create \
    --name my-content-safety \
    --resource-group rg-ai-prod \
    --kind ContentSafety \
    --sku S0 \
    --location eastus \
    --yes

Calling the Text Moderation API

from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions, TextCategory
from azure.core.credentials import AzureKeyCredential

client = ContentSafetyClient(
    endpoint="https://my-content-safety.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>")
)

# Analyze text for harmful content
request = AnalyzeTextOptions(
    text="Text content to analyze for safety",
    categories=[
        TextCategory.HATE,
        TextCategory.SELF_HARM,
        TextCategory.SEXUAL,
        TextCategory.VIOLENCE
    ]
)

response = client.analyze_text(request)

# Check results
for category_result in response.categories_analysis:
    print(f"Category: {category_result.category}")
    print(f"Severity: {category_result.severity}")

Calling the Image Moderation API

from azure.ai.contentsafety.models import AnalyzeImageOptions, ImageData

# Analyze an image for harmful content
with open("image.jpg", "rb") as f:
    image_data = f.read()

request = AnalyzeImageOptions(
    image=ImageData(content=image_data)
)

response = client.analyze_image(request)

for category_result in response.categories_analysis:
    print(f"Category: {category_result.category}")
    print(f"Severity: {category_result.severity}")
Test Your Knowledge

What is the severity scale range used by Azure AI Content Safety?

A
B
C
D
Test Your Knowledge

Which four harm categories does Azure AI Content Safety evaluate?

A
B
C
D
Test Your Knowledge

Which Python class is used to create a Content Safety client in the Azure SDK?

A
B
C
D