5.7 Prompting and Model Selection Case Lab
Key Takeaways
- A good AI recommendation starts with the business workflow, allowed data, user audience, measurable quality bar, and risk level.
- Prompting, RAG, model selection, guardrails, evaluation, and human review should be combined only as needed for the scenario.
- Practitioner case work should explain tradeoffs, not just name an AWS service.
- AWS Skill Builder, official practice resources, and small console explorations can reinforce service-selection judgment without relying on unauthorized question collections.
Case Lab: Internal Product Support Assistant
Scenario: a software company wants an internal assistant for support agents. Agents need quick answers from product manuals, release notes, and troubleshooting articles. The assistant should summarize relevant steps, cite approved sources when possible, avoid inventing fixes, and escalate safety or legal issues to a human lead. The company wants to start with one product line and expand if quality is acceptable.
Start with the workflow, not the model. The user is a support agent, not an end customer. The output helps the agent decide what to do next, but should not automatically close cases or send customer messages. The approved sources are product manuals, release notes, and troubleshooting articles. The risk is moderate because bad advice could waste time or harm customer trust, but a trained agent remains in the loop.
| Decision point | Recommended case choice | Reason |
|---|---|---|
| First scope | One product line and internal users | Limits risk and makes evaluation practical. |
| Prompt pattern | Template with task, constraints, sources, format, and fallback | Makes output consistent for agents. |
| Knowledge method | RAG through a managed knowledge approach such as Knowledge Bases for Amazon Bedrock | Product facts change and need grounding. |
| Model selection | Compare candidate Amazon Bedrock models against a rubric | Select by measured quality, latency, and cost. |
| Safeguards | Guardrails and escalation rules | Reduces unsafe or out-of-scope responses. |
| Review | Human agent reviews before customer response | Keeps final action accountable. |
A first prompt template could ask for a concise support answer using only retrieved source excerpts. It should include sections for likely issue, evidence from sources, recommended next step, missing information, and escalation flag. It should tell the model not to invent product behavior, not to use unsupported fixes, and to mark unknown when the retrieved context is insufficient.
Prompting alone is not enough here because the manuals and release notes are too large and change over time. Fine-tuning is also not the first choice because the problem is current factual knowledge, not mainly tone or repeated style. RAG fits because approved content remains outside the model weights and can be updated as products change.
Model selection should compare at least two candidates. The team should measure whether each model finds the right answer from retrieved context, handles missing context, refuses unsafe requests, and returns a useful structured response. The selected model should meet the support quality bar at acceptable latency and cost. If a smaller model passes, it may be the better production choice.
Evaluation set design:
- Common how-to questions with clear source answers.
- Recent release changes that older content might contradict.
- Questions with missing source information.
- Customer-sensitive or legal wording that should trigger escalation.
- Prompt injection attempts inside user text and retrieved-like content.
- Ambiguous cases where the assistant should ask for more information.
The team should define failure conditions before launch. Examples include inventing a fix, ignoring a newer release note, exposing restricted internal notes, failing to escalate safety language, or presenting a source-free answer as certain. These failures should lead to prompt changes, source cleanup, retrieval tuning, guardrail updates, or a narrower rollout.
Cost and performance planning should be practical. Estimate the number of support agents, expected requests per case, average retrieved context length, output length, and peak traffic during incidents. Limit the answer to the fields agents need. Avoid sending whole manuals when retrieved chunks are enough. Use usage monitoring and cost allocation so the product support owner can see adoption and spend.
Security and governance should be explicit. Use IAM least privilege for the application components. Protect source documents in Amazon S3 or the chosen repository according to sensitivity. Use encryption at rest and in transit where applicable. Log activity for troubleshooting and audit needs. Make sure any retrieved content is approved for support-agent use.
A non-builder approval review should ask five questions: What business problem are we solving? What data can the assistant use? How did candidate models perform against the rubric? What happens when the model is uncertain or unsafe? How will we monitor quality, cost, and user feedback after launch?
Practice drill for study: in AWS Skill Builder or official AWS training, focus on the distinction between Amazon Bedrock, Amazon Q, SageMaker AI, managed AI services, RAG, guardrails, and model evaluation. In the AWS console, safely explore service pages and documentation without uploading sensitive data. Use official practice resources to test service-selection reasoning. Do not rely on unauthorized question collections or claims about live test items.
The final recommendation for this case is a scoped RAG assistant on Amazon Bedrock, using a prompt template, approved knowledge sources, candidate model evaluation, guardrails, support-agent review, monitoring, and a pilot rollout. The recommendation also says what not to do: do not start with a custom model, do not fine-tune as the only knowledge source, do not auto-send responses to customers, and do not expand beyond the pilot until evaluation evidence supports it.
In the product support case, why is RAG a better first customization choice than fine-tuning only?
What is the safest first rollout for the internal support assistant?
Which approval question best reflects practitioner-level judgment for this case?