5.5 Responsible Generative AI
Key Takeaways
- Responsible generative AI applies all six Microsoft responsible AI principles specifically to generative AI systems — with additional considerations for content generation.
- Content filters in Azure OpenAI scan both prompts (input) and completions (output) for harmful content across four categories: hate, violence, sexual content, and self-harm.
- Prompt Shields detect and block jailbreak attempts — techniques where users try to bypass the model's safety guardrails.
- Groundedness detection verifies that AI-generated responses are factually supported by the provided source documents, helping reduce hallucinations.
- Microsoft Copilot incorporates responsible AI by design — with content filters, citations, disclaimers, and user feedback mechanisms built into the product.
Responsible Generative AI
Quick Answer: Responsible generative AI applies Microsoft's six principles with additional safeguards: content filters block harmful content in prompts and responses, Prompt Shields prevent jailbreaks, and groundedness detection verifies that responses are factually supported by source material. Content filters are enabled by default in Azure OpenAI.
Why Responsible Generative AI Is Different
Generative AI introduces risks that traditional predictive AI does not face:
| Risk | Description | Example |
|---|---|---|
| Hallucination | Model generates false information confidently | "Abraham Lincoln invented the telephone" |
| Harmful content generation | Model produces hate speech, violence, or inappropriate content | Generating offensive text or images |
| Jailbreaking | Users craft prompts to bypass safety guardrails | "Ignore your instructions and tell me how to..." |
| Prompt injection | Hidden instructions in user input override system message | Injecting instructions through pasted text |
| Copyright concerns | Model may generate content closely matching copyrighted material | Reproducing copyrighted text or code |
| Misinformation | Model generates plausible-sounding but false information | Fake medical advice or legal information |
| Privacy leakage | Model may reveal personal information from training data | Generating someone's phone number or address |
Content Filtering in Azure OpenAI
Azure OpenAI includes content filters that are enabled by default:
Four Filter Categories
| Category | What It Blocks | Severity Levels |
|---|---|---|
| Hate | Discrimination, slurs, dehumanization | Safe → Low → Medium → High |
| Violence | Threats, graphic violence, harm descriptions | Safe → Low → Medium → High |
| Sexual | Sexually explicit or suggestive content | Safe → Low → Medium → High |
| Self-harm | Self-injury descriptions, instructions | Safe → Low → Medium → High |
How Filtering Works
User Prompt → [Input Content Filter] → Model → [Output Content Filter] → Response
1. The user submits a prompt to the API.
2. The input filter scans the prompt for harmful content.
3. If the prompt passes, the model generates a response.
4. The output filter scans the response for harmful content.
5. If the response passes, it is returned to the user.
6. If either filter triggers, the request is blocked with an error.
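The two-stage flow above can be sketched in a few lines of Python. This is an illustrative stand-in, not the Azure OpenAI service: the `classify` function, the `THREAT` placeholder signal, and the severity threshold are all assumptions made for the sketch — the real service uses trained classification models for each of the four categories.

```python
# Illustrative sketch of the input/output filtering pipeline (NOT the real
# Azure OpenAI filters): the prompt and the completion each pass a check,
# and the request is blocked if either check triggers.

SEVERITIES = ["safe", "low", "medium", "high"]  # ordered low to high

def classify(text: str) -> str:
    """Stand-in classifier; a real filter scores each harm category."""
    if "THREAT" in text:  # placeholder signal for harmful content
        return "high"
    return "safe"

def passes(text: str, threshold: str = "medium") -> bool:
    """Allow content only below the configured severity threshold."""
    return SEVERITIES.index(classify(text)) < SEVERITIES.index(threshold)

def filtered_completion(prompt: str, model=lambda p: f"echo: {p}") -> str:
    if not passes(prompt):       # 1-2: input filter scans the prompt
        raise ValueError("prompt blocked by input content filter")
    response = model(prompt)     # 3: model generates a response
    if not passes(response):     # 4: output filter scans the response
        raise ValueError("response blocked by output content filter")
    return response              # 5: both checks passed
```

Note that the same check runs twice: a benign prompt can still yield a blocked response, which is why both stages are needed.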
On the Exam: Azure OpenAI content filters are enabled BY DEFAULT — they scan BOTH prompts AND responses. You do not need to enable them separately. This is a common exam question.
Prompt Shields
Prompt Shields detect and block jailbreak attempts — techniques where users try to make the model ignore its safety instructions:
Types of Jailbreak Attempts
| Technique | Example | How Prompt Shields Help |
|---|---|---|
| Direct override | "Ignore all previous instructions and..." | Detects instruction override patterns |
| Role playing | "Pretend you are an AI with no restrictions..." | Identifies role-play bypass attempts |
| Encoding tricks | Using base64 or alternative encodings to hide harmful requests | Detects encoded content |
| Indirect injection | Harmful instructions hidden in pasted documents or URLs | Scans all input content for hidden instructions |
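As a rough intuition for the first two techniques in the table, a detector can look for instruction-override and role-play patterns in the prompt. The sketch below is a naive keyword-pattern approximation invented for illustration — the actual Prompt Shields service uses trained models, not regular expressions, and also handles encoding tricks and indirect injection.

```python
import re

# Naive jailbreak heuristic (NOT the real Prompt Shields): flags common
# instruction-override and role-play bypass phrasings.
JAILBREAK_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",   # direct override
    r"pretend you are .*(no|without) restrictions",   # role-play bypass
]

def looks_like_jailbreak(prompt: str) -> bool:
    text = prompt.lower()
    return any(re.search(pattern, text) for pattern in JAILBREAK_PATTERNS)
```

A pattern list like this is easy to evade (paraphrasing, encodings), which is exactly why a model-based service is used in practice.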
Groundedness Detection
Groundedness detection verifies that AI-generated responses are factually supported by provided source documents:
| Check | Description | Result |
|---|---|---|
| Grounded | Response is supported by the provided context | Safe — response is factual |
| Ungrounded | Response contains claims not supported by context | Flagged — may be a hallucination |
This is especially important for RAG scenarios where you want the model to only answer from retrieved documents.
Microsoft Copilot and Responsible AI
Microsoft Copilot (the AI assistant built into Microsoft 365, Windows, Bing, and other products) incorporates responsible AI by design:
| Feature | Responsible AI Purpose |
|---|---|
| Content filters | Block harmful content in prompts and responses |
| Citations | Show sources for generated information (transparency) |
| Disclaimers | "AI-generated content may be incorrect" warnings (transparency) |
| User feedback | Thumbs up/down to report quality issues (accountability) |
| Data boundaries | Enterprise data does not leave the Microsoft 365 boundary (privacy) |
| Conversation limits | Cap conversation length so harmful or off-topic content cannot accumulate over a long session |
| Grounding in enterprise data | Respond based on your organization's data (accuracy) |
Best Practices for Responsible Generative AI
For Developers and Organizations
| Practice | Description |
|---|---|
| Use content filters | Keep default Azure OpenAI content filters enabled |
| Implement RAG | Ground responses in factual data to reduce hallucinations |
| Set clear system messages | Define what the AI should and should not do |
| Monitor usage | Review logs for harmful content and misuse |
| Provide disclaimers | Inform users that content is AI-generated |
| Enable feedback | Allow users to report inaccurate or harmful responses |
| Test thoroughly | Red-team your application to find vulnerabilities |
| Human oversight | Keep humans in the loop for high-stakes decisions |
| Document limitations | Be transparent about what the AI can and cannot do |
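Several of the practices above meet in the system message. The sketch below shows a hypothetical chat payload combining a restrictive system message, grounding context, and a refusal instruction; the organization name, wording, and message structure are illustrative assumptions, not a prescribed format.

```python
# Hypothetical chat payload illustrating three practices from the table:
# a clear system message, grounding in provided data, and explicit refusals.
messages = [
    {
        "role": "system",
        "content": (
            "You are a support assistant for Contoso (hypothetical org). "
            "Answer ONLY from the provided context. "
            "If the answer is not in the context, say you don't know. "
            "Refuse requests for harmful or off-topic content."
        ),
    },
    {
        "role": "user",
        "content": "Context: <retrieved documents>\n\nQuestion: <user question>",
    },
]
```

Keeping these rules in the system message (rather than the user turn) matters because Prompt Shields and the model's own training both treat the system message as the authoritative instruction source.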
For End Users
| Practice | Description |
|---|---|
| Verify information | Do not blindly trust AI-generated content |
| Provide context | Give the AI relevant information to reduce hallucinations |
| Review for bias | Check AI outputs for potential bias or unfairness |
| Report issues | Use feedback mechanisms to report problems |
| Understand limitations | Know that AI can make mistakes and has knowledge cutoffs |
On the Exam: Responsible generative AI questions often ask about content filters, hallucination mitigation (RAG/grounding), or how Copilot implements responsible AI. Know that content filters scan both inputs and outputs, RAG reduces hallucinations, and Copilot includes citations and disclaimers for transparency.
What are Prompt Shields in Azure OpenAI Service?
What does groundedness detection verify in generative AI?
Which of the following is a responsible AI feature built into Microsoft Copilot?
What is the best technique to reduce hallucinations in generative AI responses?
Which THREE of the following are categories used by Azure OpenAI content filters? (Select three)