2.1 Azure AI Content Safety Overview

Key Takeaways

Azure AI Content Safety is a dedicated Azure AI service that detects harmful content in text, images, and AI-generated outputs through REST APIs and SDKs.
The service evaluates content across four harm categories — Hate, Sexual, Violence, and Self-Harm — each rated on a 0-7 severity scale.
The text API returns severities on the trimmed scale 0, 2, 4, 6 by default (mapping Safe/Low/Medium/High) and can return the full 0-7 scale when requested.
Content Safety powers both standalone user-generated-content moderation and the built-in content filter inside Azure OpenAI / Microsoft Foundry deployments.
Create the resource with kind ContentSafety (or via a multi-service Azure AI Services resource); the AI-102 exam (40-60 questions, 700/1000 to pass) tests which kind to deploy.

Last updated: June 2026

Quick Answer: Azure AI Content Safety detects harmful content in text and images across four categories — Hate, Sexual, Violence, and Self-Harm — each rated on a 0-7 severity scale. The text API returns the trimmed values 0, 2, 4, 6 by default. The service also adds Prompt Shields (adversarial attack detection), groundedness detection (hallucination), and protected material detection (copyright). It runs standalone or as the built-in filter inside Azure OpenAI.

What Is Azure AI Content Safety?

Azure AI Content Safety is a cloud API service that uses Microsoft's harm-classification models to flag potentially harmful material in text, images, and multimodal inputs. It is the same engine that powers the content filter inside Azure OpenAI Service (now surfaced through Microsoft Foundry), but it is also sold as an independent service so you can moderate user-generated content in any app — chat rooms, marketplaces, comment sections — even when no large language model is involved.

For AI-102, keep this framing in mind: Content Safety is a moderation service, distinct from Azure AI Language sentiment/PII or Azure AI Vision tagging. If a scenario says "detect hateful or violent material in user posts and block it," the answer is Content Safety, not Language or Vision.

Core Capabilities

Capability	What it does	Input types
Analyze Text	Scores text across the four harm categories	UTF-8 text, up to 10,000 characters per request
Analyze Image	Scores images for harmful visual content	JPEG, PNG, GIF, BMP, TIFF, WEBP (max 4 MB)
Prompt Shields	Detects direct jailbreaks and indirect (XPIA) injection	User prompt + grounding documents
Groundedness detection	Flags AI responses not supported by source text	Response + grounding sources
Protected material	Detects copyrighted text/code in AI output	AI-generated text or code
Custom categories	Org-specific categories you train	Text + labeled examples
Blocklists	Exact-match term filtering	Text + named blocklist

The Four Harm Categories

Content Safety classifies into exactly four categories. Memorize them — distractors on the exam love to insert plausible extras like "Profanity," "Spam," or "Misinformation," which are not built-in categories (those would be custom categories or blocklists).

Hate — content attacking or using discriminatory language toward people based on protected attributes (race, ethnicity, gender, religion, sexual orientation, disability, and similar).
Sexual — sexually explicit or suggestive content; sexual content involving minors is always treated as the most severe and is non-configurable.
Violence — physical harm, weapons, injury, or threats against people, animals, or property.
Self-Harm — content that depicts, encourages, or instructs self-injury or suicide.

The 0-7 Severity Scale

Each category receives a numeric severity from 0 to 7. The current text model returns the trimmed values 0, 2, 4, 6 by default (each pair of adjacent full-scale levels collapses into one returned value), and you can opt into the full 0-7 scale. Image moderation returns only the trimmed values 0, 2, 4, 6.

Returned value	Band (full-scale levels)	Meaning
0	Safe (0-1)	Harmless in context
2	Low (2-3)	Mildly concerning, often acceptable
4	Medium (4-5)	Moderately harmful
6	High (6-7)	Severely harmful
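The banding in the table above amounts to rounding each full-scale level down to the nearest even number. The following is a minimal sketch of that arithmetic only; the function names `trimmed_severity` and `band` are hypothetical, and this is not presented as how the service computes its scores internally.

```python
# Band names taken from the table above, keyed by the trimmed value.
BANDS = {0: "Safe", 2: "Low", 4: "Medium", 6: "High"}


def trimmed_severity(full_scale: int) -> int:
    """Collapse a full-scale level (0-7) to its trimmed value (0, 2, 4, 6)."""
    if not 0 <= full_scale <= 7:
        raise ValueError("severity must be between 0 and 7")
    # Each pair of adjacent levels shares the even value below it.
    return full_scale - (full_scale % 2)


def band(full_scale: int) -> str:
    """Return the band name for a full-scale level."""
    return BANDS[trimmed_severity(full_scale)]


if __name__ == "__main__":
    for level in range(8):
        print(level, trimmed_severity(level), band(level))
```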

On the Exam: The scale is 0-7, not 0-3 or 0-10. If asked what a text-moderation call returns, the safe answer is the four discrete values 0/2/4/6. You set your own threshold and compare each returned severity against it — the service itself does not decide allow/block.
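Because the allow/block decision is yours, a policy layer usually sits on top of the returned severities. The sketch below is one possible shape for it, assuming the severities have already been collected into a dictionary keyed by category name. The threshold values, the fallback, and the function names are illustrative assumptions, not service defaults.

```python
from typing import Dict, List

# Hypothetical policy: block at Medium (4) and above, but be stricter on self-harm.
DEFAULT_THRESHOLDS: Dict[str, int] = {
    "Hate": 4,
    "Sexual": 4,
    "Violence": 4,
    "SelfHarm": 2,
}


def categories_to_block(
    severities: Dict[str, int],
    thresholds: Dict[str, int] = DEFAULT_THRESHOLDS,
    fallback: int = 4,
) -> List[str]:
    """Return the categories whose severity meets or exceeds its threshold."""
    return [
        category
        for category, severity in severities.items()
        if severity >= thresholds.get(category, fallback)
    ]


def is_allowed(severities: Dict[str, int]) -> bool:
    """Content is allowed only when no category crosses its threshold."""
    return not categories_to_block(severities)


if __name__ == "__main__":
    sample = {"Hate": 0, "Violence": 4, "Sexual": 2, "SelfHarm": 2}
    print(categories_to_block(sample))
    print(is_allowed(sample))
```

Keeping thresholds per category lets an application tolerate, say, low-severity violence in a news forum while still blocking any self-harm signal.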

Creating a Content Safety Resource

You can deploy a dedicated ContentSafety resource or use a multi-service Azure AI Services resource, which shares one key and endpoint across services. The free tier (F0) allows a limited number of requests; S0 is the standard paid tier needed for blocklists and custom categories.

az cognitiveservices account create \
    --name my-content-safety \
    --resource-group rg-ai-prod \
    --kind ContentSafety \
    --sku S0 \
    --location eastus \
    --yes

Authenticate calls with either the resource key (Ocp-Apim-Subscription-Key) or, preferred for production, Microsoft Entra ID via managed identity and the Cognitive Services User role — exam scenarios about keyless auth point to managed identity, not embedding keys.

from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions, TextCategory
from azure.core.credentials import AzureKeyCredential

# Key-based client; replace the endpoint and key with your resource's values.
client = ContentSafetyClient(
    endpoint="https://my-content-safety.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>")
)

# Request all four harm categories explicitly.
request = AnalyzeTextOptions(
    text="Text content to analyze for safety",
    categories=[TextCategory.HATE, TextCategory.SELF_HARM,
                TextCategory.SEXUAL, TextCategory.VIOLENCE]
)
response = client.analyze_text(request)

# Each entry carries one category and the severity assigned to it.
for result in response.categories_analysis:
    print(result.category, result.severity)  # e.g. Hate 0, Violence 4

If you omit categories, the service evaluates all four by default. The class name is ContentSafetyClient (package azure.ai.contentsafety), not ContentModerationClient, a distractor that echoes ContentModeratorClient from the older, deprecated Content Moderator service.
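The same call can be made over REST, which makes the two authentication options and the severity-scale switch easier to see. The sketch below only builds the request and does not send it. The helper name `build_analyze_text_request`, the `API_VERSION` value, and the placeholder endpoint are assumptions for illustration; acquiring a Microsoft Entra ID token (for example through a managed identity) is outside its scope.

```python
import json
import urllib.request
from typing import Optional

# Assumed API version; check the current REST reference before relying on it.
API_VERSION = "2024-09-01"


def build_analyze_text_request(
    endpoint: str,
    text: str,
    *,
    key: Optional[str] = None,
    bearer_token: Optional[str] = None,
    full_scale: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) a text:analyze request with key or token auth."""
    if (key is None) == (bearer_token is None):
        raise ValueError("provide exactly one of key or bearer_token")

    url = f"{endpoint.rstrip('/')}/contentsafety/text:analyze?api-version={API_VERSION}"
    body = {
        "text": text,
        # FourSeverityLevels returns 0/2/4/6; EightSeverityLevels returns 0-7.
        "outputType": "EightSeverityLevels" if full_scale else "FourSeverityLevels",
    }
    headers = {"Content-Type": "application/json"}
    if key is not None:
        headers["Ocp-Apim-Subscription-Key"] = key  # resource key auth
    else:
        headers["Authorization"] = f"Bearer {bearer_token}"  # Entra ID auth

    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


if __name__ == "__main__":
    req = build_analyze_text_request(
        "https://my-content-safety.cognitiveservices.azure.com/",
        "Text content to analyze for safety",
        key="<your-key>",
    )
    print(req.get_method(), req.full_url)
```

Omitting `categories` from the body, as here, leaves the service to evaluate all four categories.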

Standalone Moderation vs the Azure OpenAI Built-In Filter

A recurring exam decision is which surface of Content Safety to use. The same harm models appear in two places, and the right choice depends on whether a large language model is involved.

Scenario	Use
Moderate user comments, listings, or chat with no LLM	Standalone Content Safety Analyze Text / Analyze Image APIs
Screen prompts and completions for an Azure OpenAI deployment	The built-in content filter (configured in Foundry)
Moderate content from a third-party model or your own model	Standalone APIs called from your code
Catch jailbreaks and XPIA before generation	Prompt Shields, standalone or as a filter toggle

The standalone APIs give you the raw per-category severities so you decide the policy; the built-in filter applies a configured policy automatically and blocks a flagged prompt or completion for you. Standalone calls are billed as Content Safety usage, whereas the built-in filter is included with Azure OpenAI usage.
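The difference shows up in what your code receives. A rough sketch follows, assuming a standalone response that lists numeric severities under `categoriesAnalysis` and a built-in filter annotation that reports a `filtered` flag per category. Both sample payloads are hand-written illustrations rather than captured service output, and the helper names are hypothetical.

```python
from typing import Dict, List


def standalone_severities(response: dict) -> Dict[str, int]:
    """Flatten a standalone Analyze Text response into {category: severity}."""
    return {
        item["category"]: item["severity"]
        for item in response.get("categoriesAnalysis", [])
    }


def builtin_filtered_categories(content_filter_results: dict) -> List[str]:
    """List the categories the built-in filter has already acted on."""
    return [
        name
        for name, result in content_filter_results.items()
        if result.get("filtered")
    ]


if __name__ == "__main__":
    # Illustrative standalone payload: you still have to apply a threshold.
    standalone = {
        "categoriesAnalysis": [
            {"category": "Hate", "severity": 0},
            {"category": "Violence", "severity": 4},
        ]
    }
    # Illustrative built-in filter annotation: the policy decision is already made.
    builtin = {
        "hate": {"filtered": False, "severity": "safe"},
        "violence": {"filtered": True, "severity": "medium"},
    }
    print(standalone_severities(standalone))
    print(builtin_filtered_categories(builtin))
```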

Service Limits and Tiers You Should Know

A free (F0) tier exists for evaluation, with low requests-per-second and monthly caps; blocklists and custom categories require the standard (S0) tier.
Text requests are limited to 10,000 characters; images to 4 MB.
Regional availability matters — Content Safety, and features like custom categories, are not in every region, so a deployment-region mismatch is a plausible exam wrong answer.
Data sent for analysis is not used to train Microsoft's models; this is a Responsible AI / data-privacy point that exam questions sometimes probe when comparing managed moderation to building your own classifier.
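The 10,000-character text limit means longer posts have to be split before submission. One way to do that is sketched below, assuming Python's `len` counts characters the same way the service does; the function name `split_for_analysis` is hypothetical.

```python
from typing import List

MAX_CHARS = 10_000  # per-request text limit stated above


def split_for_analysis(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most max_chars, preferring to cut after a space."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Look for the last space inside the window to avoid cutting a word.
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut + 1
        chunks.append(text[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    long_post = "word " * 5000  # 25,000 characters
    parts = split_for_analysis(long_post)
    print(len(parts), [len(p) for p in parts])
```

Harmful content that straddles a chunk boundary can score lower in each half than it would as a whole, so overlapping chunks may be worth considering for high-risk content.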

When a scenario stresses "no infrastructure to manage" and "detect harmful UGC at scale," the managed Content Safety service is the intended answer over training a bespoke model in Azure Machine Learning.

Test Your Knowledge

What is the severity scale range used by Azure AI Content Safety?

0-7

0-5

0-3

0-10

Test Your Knowledge

Which set lists the four harm categories evaluated by Azure AI Content Safety?

Spam, Phishing, Malware, Fraud

Toxicity, Danger, Explicit, Discrimination

Profanity, Harassment, Misinformation, Bias

Hate, Sexual, Violence, Self-Harm

Test Your Knowledge

Which client class interacts with Azure AI Content Safety in the Python SDK?

ContentSafetyClient

ContentModerationClient

TextAnalyticsClient

SafetyAnalysisClient

