1.3 Content Safety and Governance

Key Takeaways

Azure AI Content Safety and Microsoft Foundry guardrails classify harmful content such as hate, sexual content, violence, and self-harm across severity levels for text and images.
Foundry guardrails can inspect user input, final output, and, for Foundry Agent Service preview scenarios, tool calls and tool responses.
Prompt Shields address direct user prompt attacks and indirect attacks hidden in documents or other grounded content.
Optional controls such as groundedness, protected material, PII detection, blocklists, and task adherence help manage risks beyond the four core harm categories.
Governance requires testing, assignment to deployments or agents, monitoring, incident review, and human ownership; filters alone do not make an AI solution responsible.

Last updated: June 2026

From Principles To Controls

Responsible AI principles tell you what should be protected. Content safety and governance tell you how a Foundry solution starts enforcing those protections. On AI-901, expect scenario questions that ask which control belongs in a generative app, a user-upload workflow, or an agent that can call tools.

Microsoft Foundry guardrails are named collections of controls. A control defines the risk to detect, where to inspect the interaction, and what action to take. Azure AI Content Safety supplies classification models that help flag harmful content, while Foundry applies those controls to model deployments and, in preview, to agents built in Foundry Agent Service.

Core Harm Categories

Risk category	What it covers in exam terms	Common scenario
Hate and fairness	Attacks, discriminatory language, harassment, or identity-based abuse	Moderating comments before publishing
Sexual	Adult sexual content, exploitation, explicit sexual material, or related abuse	Filtering image uploads in a community app
Violence	Threats, weapons, extremist content, graphic harm, or instructions for injury	Blocking unsafe generated instructions
Self-harm	Suicide, self-injury, eating-disorder harm, or encouragement of self-harm	Detecting crisis language in user text
Task adherence	Agent behavior that drifts from user instructions or task objectives	Checking whether an agent's tool use matches the user's intent

Microsoft's Foundry severity documentation describes safe, low, medium, and high levels. The practical exam point is that a threshold controls what gets flagged or blocked. A stricter policy catches more borderline content but can create false positives; a looser policy reduces blocking but may miss risky material. Content at the safe level may still appear in annotations but is not the target of blocking.

Input, Output, And Agent Intervention Points

For basic model calls, the two most important checkpoints are user input and output. User input is the prompt or request sent to the model. Output is the completion returned to the user. A support chatbot, for example, should screen both the customer's message and the generated answer.

Agents add more risk because they can call tools. Foundry guardrails support preview intervention points for tool calls and tool responses in Foundry Agent Service. That matters when an agent can search documents, call APIs, or trigger actions. A harmful or manipulated tool result should not silently become the agent's final answer.

Controls Beyond Basic Harm Filtering

Foundry and Azure OpenAI safety features include more than the four core harm categories:

Prompt Shields for user prompt attacks detect attempts to override system instructions, change the assistant's role, or bypass safety rules.
Prompt Shields for indirect attacks detect malicious instructions embedded in documents, emails, webpages, or other external content the model uses for grounding.
Groundedness detection helps flag answers that are not supported by the source materials provided to the model.
Protected material detection helps identify known text or code that a model might reproduce too closely.
PII detection helps identify personally identifiable information in generated content.
Blocklists let teams add custom terms or patterns for their application context.

These controls map directly to AI-901 scenarios. A model producing unsupported policy answers points to groundedness. A document with hidden instructions points to indirect attack protection. A public content app that must block violent or sexual images points to content safety moderation.

Governance Process For A Foundry App

Use this process when reasoning through exam cases:

Define the allowed use. State what the app or agent should and should not do.
Identify risks. Include harmful content, prompt attacks, PII, protected material, hallucination, and misuse of tools.
Choose intervention points. Inspect user input, output, and agent tool activity when applicable.
Set actions. Decide whether each control should annotate, block, or annotate and block.
Assign controls. Apply the guardrail or content filter to the model deployment or agent that needs it.
Test in a non-production path. Use the playground, adversarial prompts, edge cases, and expected false-positive examples.
Monitor and review. Log incidents, tune thresholds, and keep a human owner accountable for policy changes.

Exam Framing

Do not treat content filters as magic. Microsoft documentation notes that application design and API configuration affect filtering behavior, and preview features can change. A responsible architecture combines guardrails with good prompts, least-privilege access, grounding, clear user disclosures, human review, and monitoring.

The strongest AI-901 answer usually names the specific risk and the specific control. "Use responsible AI" is too vague. "Apply a Foundry guardrail that checks user input and output for violence and self-harm, then test it in the playground before production" is the kind of concrete reasoning the exam rewards.

Test Your Knowledge

A Foundry agent answers questions from uploaded supplier documents. One document contains hidden instructions telling the agent to ignore its policy and send confidential customer records to an outside address. Which safety control is most relevant?

Prompt Shields for indirect attacks

Image generation

Speech synthesis

A higher max tokens setting

Test Your Knowledge

A RAG-based helpdesk app gives an answer that sounds confident but is not supported by the policy documents retrieved for the user. Which optional guardrail capability is intended to help flag this risk?

Groundedness detection

Text to speech

A custom color theme

Increasing the response temperature

Up Next

2.1 Model Deployments and Playgrounds

Chapter 2: Microsoft Foundry, Models, and Agents

Microsoft Certified: Azure AI Fundamentals

Microsoft Certified: Azure AI Fundamentals (AI-901)

1.3 Content Safety and Governance

Key Takeaways

From Principles To Controls

Core Harm Categories

Input, Output, And Agent Intervention Points

Controls Beyond Basic Harm Filtering

Governance Process For A Foundry App

Exam Framing

Microsoft Certified: Azure AI Fundamentals

1Chapter 1: AI-901 Format and Responsible AI

2Chapter 2: Microsoft Foundry, Models, and Agents

3Chapter 3: Azure AI Services, Vision, Language, and Extraction

4Chapter 4: AI-901 Scenario and Service Selection

5Chapter 5: Practice Labs, Common Traps, and Final Review

Microsoft Certified: Azure AI Fundamentals (AI-901)

1.3 Content Safety and Governance

Key Takeaways

From Principles To Controls

Core Harm Categories

Input, Output, And Agent Intervention Points

Controls Beyond Basic Harm Filtering

Governance Process For A Foundry App

Exam Framing