2.1 Azure AI Content Safety Overview
Key Takeaways
- Azure AI Content Safety is a dedicated Azure AI service that detects harmful content in text, images, and AI-generated outputs through REST APIs and SDKs.
- The service evaluates content across four harm categories — Hate, Sexual, Violence, and Self-Harm — each rated on a 0-7 severity scale.
- The text API returns severities on the trimmed scale 0, 2, 4, 6 by default (mapping Safe/Low/Medium/High) and can return the full 0-7 scale when requested.
- Content Safety powers both standalone user-generated-content moderation and the built-in content filter inside Azure OpenAI / Microsoft Foundry deployments.
- Create the resource with kind ContentSafety (or use a multi-service Azure AI Services resource). The AI-102 exam (40-60 questions, 700 of 1000 to pass) tests which kind to deploy.
Quick Answer: Azure AI Content Safety detects harmful content in text and images across four categories — Hate, Sexual, Violence, and Self-Harm — each rated on a 0-7 severity scale. The text API returns the trimmed values 0, 2, 4, 6 by default. The service also adds Prompt Shields (adversarial attack detection), groundedness detection (hallucination), and protected material detection (copyright). It runs standalone or as the built-in filter inside Azure OpenAI.
What Is Azure AI Content Safety?
Azure AI Content Safety is a cloud API service that uses Microsoft's harm-classification models to flag potentially harmful material in text, images, and multimodal inputs. It is the same engine that powers the content filter inside Azure OpenAI Service (now surfaced through Microsoft Foundry), but it is also sold as an independent service so you can moderate user-generated content in any app — chat rooms, marketplaces, comment sections — even when no large language model is involved.
For AI-102, keep this framing in mind: Content Safety is a moderation service, distinct from Azure AI Language sentiment/PII detection or Azure AI Vision tagging. If a scenario says "detect hateful or violent material in user posts and block it," the answer is Content Safety, not Language or Vision.
Core Capabilities
| Capability | What it does | Input types |
|---|---|---|
| Analyze Text | Scores text across the four harm categories | UTF-8 text, up to 10,000 characters per request |
| Analyze Image | Scores images for harmful visual content | JPEG, PNG, GIF, BMP, TIFF, WEBP (max 4 MB) |
| Prompt Shields | Detects direct jailbreaks and indirect (XPIA) injection | User prompt + grounding documents |
| Groundedness detection | Flags AI responses not supported by source text | Response + grounding sources |
| Protected material | Detects copyrighted text/code in AI output | AI-generated text or code |
| Custom categories | Org-specific categories you train | Text + labeled examples |
| Blocklists | Exact-match term filtering | Text + named blocklist |
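The Analyze Image row can be made concrete with a client-side request builder. This is a minimal sketch. It assumes the REST `image:analyze` body shape: `image.content` as base64, plus `categories` and `outputType`. It also treats "4 MB" as 4 MiB. The helper name `build_image_analyze_body` is hypothetical, and the code only prepares the body; it does not call the service.

```python
import base64
import json

# Assumption: "4 MB" in the table means 4 MiB; check the current service limits.
MAX_IMAGE_BYTES = 4 * 1024 * 1024

def build_image_analyze_body(image_bytes: bytes) -> str:
    """Return a JSON body for the image:analyze operation (assumed REST shape)."""
    if not image_bytes:
        raise ValueError("image is empty")
    if len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError(
            f"image is {len(image_bytes)} bytes; limit is {MAX_IMAGE_BYTES}")
    body = {
        # Inline image as base64 text (the API also accepts a blobUrl instead).
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "categories": ["Hate", "SelfHarm", "Sexual", "Violence"],
        # Images only support the trimmed 0/2/4/6 output.
        "outputType": "FourSeverityLevels",
    }
    return json.dumps(body)
```

The helper enforces only the size limit from the table. Format and dimension checks are left to the service, which rejects unsupported input itself.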
The Four Harm Categories
Content Safety classifies into exactly four categories. Memorize them — distractors on the exam love to insert plausible extras like "Profanity," "Spam," or "Misinformation," which are not built-in categories (those would be custom categories or blocklists).
- Hate — content attacking or using discriminatory language toward people based on protected attributes (race, ethnicity, gender, religion, sexual orientation, disability, and similar).
- Sexual — sexually explicit or suggestive content; sexual content involving minors is always treated as the most severe and is non-configurable.
- Violence — physical harm, weapons, injury, or threats against people, animals, or property.
- Self-Harm — content that depicts, encourages, or instructs self-injury or suicide.
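Because the category list is fixed, any label outside these four must be handled by a custom category or a blocklist. The sketch below shows that triage. It assumes the REST API spells the categories `Hate`, `Sexual`, `Violence`, and `SelfHarm`. `split_categories` is a hypothetical helper, not part of any SDK.

```python
from typing import List, Tuple

# The four built-in categories, keyed by normalized spelling and mapped to
# the enum strings the REST API is assumed to use.
BUILT_IN = {
    "hate": "Hate",
    "sexual": "Sexual",
    "violence": "Violence",
    "self-harm": "SelfHarm",
    "selfharm": "SelfHarm",
}

def split_categories(labels: List[str]) -> Tuple[List[str], List[str]]:
    """Return (built-in API names, labels that need custom categories or blocklists)."""
    built_in, other = [], []
    for label in labels:
        key = label.strip().lower().replace("_", "-").replace(" ", "-")
        if key in BUILT_IN:
            built_in.append(BUILT_IN[key])
        else:
            other.append(label)  # e.g. "Profanity", "Spam", "Misinformation"
    return built_in, other

print(split_categories(["Hate", "Self-Harm", "Profanity"]))
# (['Hate', 'SelfHarm'], ['Profanity'])
```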
The 0-7 Severity Scale
Each category receives a numeric severity from 0 to 7. By default, the current text model returns the trimmed values 0, 2, 4, 6, collapsing each pair of adjacent full-scale levels into one returned value. Set outputType to EightSeverityLevels to get the full 0-7 scale. Image moderation supports only the trimmed 0, 2, 4, 6 output.
| Returned value | Band | Meaning |
|---|---|---|
| 0 | Safe (0-1) | Harmless in context |
| 2 | Low (2-3) | Mildly concerning, often acceptable |
| 4 | Medium (4-5) | Moderately harmful |
| 6 | High (6-7) | Severely harmful |
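If you opt into the full 0-7 scale but still want to reason in the four bands, the table above reduces to a rounding-down rule. This is a minimal client-side sketch with hypothetical helper names; it does not call the service.

```python
BANDS = {0: "Safe", 2: "Low", 4: "Medium", 6: "High"}

def to_trimmed(severity: int) -> int:
    """Collapse a full-scale severity (0-7) to the trimmed value 0, 2, 4 or 6."""
    if not 0 <= severity <= 7:
        raise ValueError(f"severity must be 0-7, got {severity}")
    return severity - severity % 2  # 0,1 -> 0; 2,3 -> 2; 4,5 -> 4; 6,7 -> 6

def band(severity: int) -> str:
    """Name the band for either a full-scale or an already-trimmed severity."""
    return BANDS[to_trimmed(severity)]

print([to_trimmed(s) for s in range(8)])  # [0, 0, 2, 2, 4, 4, 6, 6]
print(band(5))                            # Medium
```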
On the Exam: The scale is 0-7, not 0-3 or 0-10. If asked what a text-moderation call returns by default, the safe answer is the four discrete values 0/2/4/6. You set your own threshold and compare each returned severity against it; the standalone service reports severities and does not decide allow/block.
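Because the allow/block decision is yours, it usually reduces to a per-category threshold check. The sketch below assumes severities have already been collected into a dict keyed by REST category name. The threshold values and the `decide` helper are hypothetical policy choices, not service defaults.

```python
from typing import Dict, List, Tuple

# Hypothetical policy: block when a severity is at or above its threshold.
DEFAULT_THRESHOLDS: Dict[str, int] = {
    "Hate": 4,
    "Sexual": 4,
    "Violence": 4,
    "SelfHarm": 2,  # e.g. a stricter bar for self-harm
}

def decide(severities: Dict[str, int],
           thresholds: Dict[str, int] = DEFAULT_THRESHOLDS) -> Tuple[str, List[str]]:
    """Return ("block", hit categories) or ("allow", []); unknown categories are ignored."""
    hits = [cat for cat, sev in severities.items()
            if cat in thresholds and sev >= thresholds[cat]]
    return ("block", hits) if hits else ("allow", [])

print(decide({"Hate": 0, "Sexual": 0, "Violence": 4, "SelfHarm": 0}))  # ('block', ['Violence'])
print(decide({"Hate": 2, "Sexual": 0, "Violence": 2, "SelfHarm": 0}))  # ('allow', [])
```

Keeping thresholds in data rather than code lets each surface (comments, listings, chat) use a different policy against the same scores.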
Creating a Content Safety Resource
You can deploy a dedicated ContentSafety resource or use a multi-service Azure AI Services resource (the latter shares one key/endpoint across services). Free tier F0 allows limited requests; S0 is the standard paid tier needed for blocklists and custom categories.
```shell
az cognitiveservices account create \
  --name my-content-safety \
  --resource-group rg-ai-prod \
  --kind ContentSafety \
  --sku S0 \
  --location eastus \
  --yes
```
Authenticate calls with either the resource key (Ocp-Apim-Subscription-Key header) or, preferred for production, Microsoft Entra ID via managed identity and the Cognitive Services User role. Exam scenarios about keyless auth point to managed identity, not embedding keys in code or configuration.
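At the REST level, the key-versus-Entra choice is only a different request header. This is a minimal standard-library sketch that builds the request without sending it. The `contentsafety/text:analyze` path and the `2024-09-01` api-version are assumptions to check against the current REST reference. `build_analyze_text_request` is a hypothetical helper.

```python
import json
import urllib.request

API_VERSION = "2024-09-01"  # assumed GA version; verify before use

def build_analyze_text_request(endpoint, text, *, key=None, bearer_token=None):
    """Build (but do not send) a POST to the text:analyze operation."""
    if (key is None) == (bearer_token is None):
        raise ValueError("pass exactly one of key or bearer_token")
    url = f"{endpoint.rstrip('/')}/contentsafety/text:analyze?api-version={API_VERSION}"
    headers = {"Content-Type": "application/json"}
    if key is not None:
        headers["Ocp-Apim-Subscription-Key"] = key
    else:
        # Entra ID access token (e.g. from a managed identity) for the scope
        # https://cognitiveservices.azure.com/.default; needs a custom-subdomain endpoint.
        headers["Authorization"] = f"Bearer {bearer_token}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

In production you would not handle the token yourself. A credential object such as a managed identity acquires and refreshes it, which is why exam answers favour it over keys.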
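```python
# Python SDK (pip install azure-ai-contentsafety)
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions, TextCategory
from azure.core.credentials import AzureKeyCredential
from azure.core.exceptions import HttpResponseError

# Key-based auth; pass a token credential (e.g. DefaultAzureCredential) for Entra ID.
client = ContentSafetyClient(
    endpoint="https://my-content-safety.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

# Request all four built-in categories explicitly (omitting them has the same effect).
request = AnalyzeTextOptions(
    text="Text content to analyze for safety",
    categories=[TextCategory.HATE, TextCategory.SELF_HARM,
                TextCategory.SEXUAL, TextCategory.VIOLENCE],
)

try:
    response = client.analyze_text(request)
except HttpResponseError as e:
    print("Analyze text failed:", e)
    raise

# One entry per requested category; severity is 0, 2, 4 or 6 by default.
for result in response.categories_analysis:
    print(result.category, result.severity)
```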
If you omit categories, the service evaluates all four. The class name is ContentSafetyClient (package azure.ai.contentsafety), not ContentModeratorClient, which belongs to the deprecated Content Moderator service.
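The same options map onto REST body fields. This sketch assumes the field names `categories`, `blocklistNames`, `haltOnBlocklistHit`, and `outputType`, where `EightSeverityLevels` requests the full 0-7 scale. In the Python SDK the matching `AnalyzeTextOptions` keyword should be `output_type`; confirm it in the SDK reference. `analyze_text_body` is a hypothetical helper.

```python
import json

def analyze_text_body(text, categories=None, blocklists=None,
                      halt_on_blocklist_hit=False, full_scale=False):
    """Build a JSON-ready text:analyze body (field names assumed from the REST API)."""
    body = {"text": text}
    if categories is not None:
        # Leaving "categories" out means the service evaluates all four.
        body["categories"] = list(categories)
    if blocklists:
        # Named blocklists are exact-match; optionally skip harm analysis on a hit.
        body["blocklistNames"] = list(blocklists)
        body["haltOnBlocklistHit"] = halt_on_blocklist_hit
    if full_scale:
        # Default output is FourSeverityLevels (0, 2, 4, 6); this asks for 0-7.
        body["outputType"] = "EightSeverityLevels"
    return body

print(json.dumps(analyze_text_body("hello", categories=["Hate", "Violence"], full_scale=True)))
# {"text": "hello", "categories": ["Hate", "Violence"], "outputType": "EightSeverityLevels"}
```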
Standalone Moderation vs the Azure OpenAI Built-In Filter
A recurring exam decision is which surface of Content Safety to use. The same harm models appear in two places, and the right choice depends on whether a large language model is involved.
| Scenario | Use |
|---|---|
| Moderate user comments, listings, or chat with no LLM | Standalone Content Safety Analyze Text / Analyze Image APIs |
| Screen prompts and completions for an Azure OpenAI deployment | The built-in content filter (configured in Foundry) |
| Moderate content from a third-party model or your own model | Standalone APIs called from your code |
| Catch jailbreaks and XPIA before generation | Prompt Shields, standalone or as a filter toggle |
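The standalone APIs return the raw per-category severities so you decide the policy. The built-in filter applies a configured policy automatically: a flagged prompt is rejected before the model runs (HTTP 400 with error code content_filter), and flagged output is withheld with finish_reason set to content_filter. Standalone calls are billed as Content Safety usage, whereas the built-in filter is included with Azure OpenAI usage.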
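With the built-in filter, your code mostly reacts to the filter's verdict rather than to scores. The sketch below assumes the documented Azure OpenAI conventions. A filtered prompt produces HTTP 400 with `error.code == "content_filter"`, and a filtered completion carries `finish_reason == "content_filter"`. `filter_outcome` is a hypothetical helper that works on already-parsed JSON.

```python
from typing import Any, Dict

def filter_outcome(status_code: int, payload: Dict[str, Any]) -> str:
    """Classify a parsed Azure OpenAI chat-completions response by filter outcome."""
    error = payload.get("error") or {}
    if status_code == 400 and error.get("code") == "content_filter":
        # The prompt itself was flagged; no completion was generated.
        return "prompt_blocked"
    for choice in payload.get("choices", []):
        if choice.get("finish_reason") == "content_filter":
            # Generated output was flagged and withheld or truncated.
            return "completion_filtered"
    return "ok"

print(filter_outcome(400, {"error": {"code": "content_filter"}}))  # prompt_blocked
```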
Service Limits and Tiers You Should Know
- Free (F0) tier exists for evaluation with low request-per-second and monthly caps; blocklists and custom categories require the standard (S0) tier.
- Text requests are limited to 10,000 characters; images to 4 MB.
- Regional availability matters — Content Safety, and features like custom categories, are not in every region, so a deployment-region mismatch is a plausible exam wrong answer.
- Data sent for analysis is not used to train Microsoft's models; this is a Responsible AI / data-privacy point that exam questions sometimes probe when comparing managed moderation to building your own classifier.
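Text requests are capped at 10,000 characters, so longer documents must be split before analysis. This is a minimal sketch with hypothetical helper names. It assumes you analyze each chunk separately (the call itself is omitted) and treat the document as severe as its worst chunk.

```python
from typing import Dict, Iterable, List

MAX_TEXT_CHARS = 10_000  # per-request text limit stated above

def chunk_text(text: str, limit: int = MAX_TEXT_CHARS) -> List[str]:
    """Split text into pieces of at most `limit` characters, preferring space breaks."""
    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            # Back up to the last space in the window so words are not cut.
            split = text.rfind(" ", start, end)
            if split > start:
                end = split
        chunks.append(text[start:end])
        start = end
    return chunks

def worst_severities(per_chunk: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Combine per-chunk results by keeping the highest severity per category."""
    worst: Dict[str, int] = {}
    for result in per_chunk:
        for category, severity in result.items():
            worst[category] = max(severity, worst.get(category, 0))
    return worst
```

Splitting can separate context that only reads as harmful when kept together. Overlapping windows reduce that risk at the cost of extra requests.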
When a scenario stresses "no infrastructure to manage" and "detect harmful UGC at scale," the managed Content Safety service is the intended answer over training a bespoke model in Azure Machine Learning.
What is the severity scale range used by Azure AI Content Safety?
Which set lists the four harm categories evaluated by Azure AI Content Safety?
Which client class interacts with Azure AI Content Safety in the Python SDK?