6.4 Guardrails, Content Filters, Denied Topics, and Sensitive Data
Key Takeaways
- Guardrails for Amazon Bedrock evaluate user inputs and model responses to help enforce application-specific safety and policy controls.
- Guardrail policies can include content filters, denied topics, word filters, sensitive information filters, contextual grounding checks, and other controls where supported.
- Sensitive information filters can block or mask recognized PII and custom regex entities, but data privacy still requires application and logging design.
- Guardrails are defense-in-depth, not a replacement for IAM, data classification, human review, prompt design, or business policy.
- The safest rollout uses detect or test behavior first, reviews false positives and false negatives, then tunes policies before production enforcement.
Guardrails as application safety controls
Generative AI applications need boundaries. A customer assistant should not produce hateful content, reveal personal data, provide disallowed advice, or ignore a prompt injection that asks it to bypass policy. Guardrails for Amazon Bedrock helps evaluate both user inputs and model responses. You can configure policies and then attach or reference guardrails in supported Bedrock inference, Knowledge Bases, and Agents workflows. The goal is consistent safety behavior for a specific use case.
A guardrail is not a generic promise that an AI application is safe. It is a configured set of policies, messages, and handling behavior. It must match the application. A retail product assistant, an internal code helper, a benefits FAQ bot, and a financial services workflow may need different denied topics, sensitive data rules, and response messages. Practitioners should ask what policy the guardrail is enforcing, not merely whether a guardrail exists.
| Guardrail policy area | What it helps control | Example practitioner use |
|---|---|---|
| Content filters | Harmful categories such as hate, insults, sexual content, violence, misconduct, and prompt attacks where supported | Reduce unsafe inputs or outputs in customer-facing chat. |
| Denied topics | Topics the application should avoid | Stop a banking assistant from giving prohibited investment advice. |
| Word filters | Exact words or phrases to block | Block profanity, restricted product terms, or custom phrases. |
| Sensitive information filters | PII and custom regex entities | Mask or block email, phone, account, or policy identifiers where appropriate. |
| Contextual grounding checks | Whether responses are supported by provided context | Flag or block ungrounded RAG answers. |
| Automated reasoning checks | Logical rule or policy adherence where configured | Validate that generated recommendations follow defined constraints. |
Content filters are useful when the problem is unsafe or unwanted content. They can evaluate prompts and responses and can be configured by category and strength. Prompt attack detection is especially relevant for applications that use instructions, tools, or retrieved context. A prompt injection might tell the assistant to ignore previous instructions, reveal hidden prompts, or call an action outside policy. Guardrails can help detect these attempts, while the application should also use strict tool schemas and least privilege.
Denied topics are for business policy boundaries. If a health insurance assistant is approved only to explain enrollment deadlines and plan terms, a denied topic might block individualized medical advice. If a tax software assistant is approved only for product navigation, it may need to avoid legal tax advice. The denied topic description should be specific enough to guide detection, and the blocked response should tell users what the assistant can help with instead.
Sensitive information filters help detect standard PII patterns and custom regex entities in prompts or responses. Depending on configuration, sensitive data can be blocked or masked. This is valuable for transcript summarization, customer chat, and internal support tools. But masking is not a complete privacy program. If model invocation logging is enabled, teams must understand what inputs and blocked content may be written to CloudWatch Logs or S3 and protect those destinations with encryption, retention, and access controls.
Contextual grounding checks matter for RAG. They can help detect responses that are not supported by retrieved passages or that are irrelevant to the user's query. This is especially useful when a generated answer could be treated as policy or procedure. A good design combines citations, grounding checks, prompt instructions, source-quality controls, and human review for high-impact decisions. Do not use grounding checks to hide the fact that the corpus is outdated or poorly curated.
Guardrail rollout workflow:
- Classify the application risk: public, internal, regulated, advisory, transactional, or creative.
- Define unacceptable inputs, unacceptable outputs, sensitive entities, denied topics, and required refusal messages.
- Test the guardrail with normal questions, edge cases, adversarial prompts, and false-positive examples.
- Use detect or evaluation modes where available to understand behavior before strict blocking.
- Tune category strengths, denied topic language, custom word lists, and sensitive data entities.
- Review logging and retention because safety events can include sensitive text.
- Monitor blocked rates, user complaints, missed policy violations, and business impact after launch.
Scenario: a customer support assistant summarizes chat transcripts. Sensitive information filters can mask PII in generated summaries, while content filters reduce harmful output. The application should still restrict which agents can view transcripts, store logs securely, and avoid sending unnecessary fields to the model. Guardrails reduce risk, but privacy-by-design starts before the prompt is built.
Scenario: an internal engineering assistant helps explain code. A prompt attack could be embedded in a README or issue description. Guardrails with prompt attack detection can help, but the tool should not have broad repository write access unless the workflow is explicitly designed for it. The safest pattern separates code explanation, code generation, and code-changing actions with different permissions and review steps.
For AWS Skill Builder practice, look for labs or demos that show guardrail testing. Change one policy at a time and observe which prompts are blocked, masked, or allowed. The learning goal is policy reasoning: identify the risk, choose the guardrail control, and explain what additional security or review control is still needed.
A public chatbot must avoid answering questions about a prohibited advisory topic. Which Guardrails feature is the best fit?
A transcript summarizer should prevent customer email addresses from appearing in generated summaries. Which guardrail policy area is most relevant?
Which statement best describes the role of Guardrails for Amazon Bedrock?