5.4 RAG vs Fine-Tuning vs Prompting vs Custom Models
Key Takeaways
- Prompting is the simplest customization layer and should be tried first when the needed facts fit in the request.
- RAG is preferred when the model must answer from changing enterprise knowledge, cite sources, or use information that should not be baked into model weights.
- Fine-tuning changes model behavior for a task, style, or pattern, but it is not the best way to keep fast-changing factual knowledge current.
- Custom model paths require stronger data, skills, governance, and lifecycle ownership, so they should be justified by clear business value.
Matching the Fix to the Failure
When a generative AI application performs poorly, the first question is why. If the model does not understand the task, improve the prompt. If it lacks current company knowledge, use retrieval. If it follows the task but not the desired style or pattern, consider fine-tuning. If the business need is specialized and cannot be met by managed models or services, evaluate a custom model path.
This distinction is important for AWS AI Practitioner scenarios because the best answer is often the least complex option that meets the requirement. Overbuilding adds cost, delivery time, operational risk, and governance work. Underbuilding can create hallucinations, stale answers, or unsafe workflow decisions.
| Approach | Best when | Not best when |
|---|---|---|
| Prompting | The task needs clearer instructions or output format | Facts exceed the context window or change often |
| RAG | Answers need current enterprise knowledge | The issue is only tone or response style |
| Fine-tuning | Consistent behavior is needed across many examples | Frequent factual updates or source citations are needed |
| Continued pretraining | Domain adaptation from a broad corpus is needed | A small team lacks the data, budget, or ML ownership |
| Custom model | A unique capability or level of control is needed | Managed services already meet the need |
RAG means retrieval-augmented generation. The application retrieves relevant content from a knowledge source, then provides that context to the model for answer generation. In AWS designs, Knowledge Bases for Amazon Bedrock can help connect foundation models to enterprise data through embeddings and a vector store. The practitioner should understand the concept without needing to implement vector indexing details.
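As a rough orientation, the sketch below shows how an application might query an existing Bedrock knowledge base with the boto3 RetrieveAndGenerate API. The region, knowledge base ID, and model ARN are placeholders, not values from this chapter; chunking, embedding, and vector search are handled by the managed service.

```python
import boto3

# Runtime client for querying an existing Bedrock knowledge base (placeholder region).
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.retrieve_and_generate(
    input={"text": "How many vacation days do new employees receive?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            # Placeholder knowledge base ID and model ARN; substitute real values.
            "knowledgeBaseId": "EXAMPLEKBID",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)

# The generated answer, grounded in the retrieved passages.
print(response["output"]["text"])

# Citations let a human trace the answer back to the source documents.
for citation in response.get("citations", []):
    for ref in citation.get("retrievedReferences", []):
        print(ref.get("location"))
```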
RAG is strong when facts change. Employee policies, product manuals, support articles, legal terms, and internal procedures should usually stay in managed content repositories, not in model weights. When a policy changes, the knowledge source can be updated and re-indexed. The model can then answer from retrieved context instead of relying on older training data.
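When a policy document changes, the fix is to re-sync the knowledge base rather than retrain anything. A minimal sketch, assuming a knowledge base and data source already exist (the IDs are placeholders):

```python
import boto3

# Control-plane client for managing Bedrock knowledge bases (placeholder region).
agent_client = boto3.client("bedrock-agent", region_name="us-east-1")

# Re-index the updated documents; no model weights change.
job = agent_client.start_ingestion_job(
    knowledgeBaseId="EXAMPLEKBID",  # placeholder
    dataSourceId="EXAMPLEDSID",     # placeholder
)
print(job["ingestionJob"]["status"])
```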
RAG also supports traceability. If the application returns source links or document snippets, a human can verify the answer. This is valuable for customer support, compliance, and internal knowledge work. However, RAG quality depends on document quality, chunking, retrieval relevance, permissions, and prompt design. Bad retrieval still produces bad answers.
Fine-tuning is different. Fine-tuning updates a model using additional examples so it better follows a pattern, task, style, or domain-specific behavior. It can help with classification formats, brand voice, repeated transformations, or specialized response style. It is not ideal as a knowledge update mechanism because factual changes require new training work and may not be easy to trace back to a source.
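For contrast with the retrieval flow, a fine-tuning job on a supported Bedrock model might be started along the lines of the sketch below. The job name, role ARN, S3 paths, base model, and hyperparameter values are placeholders, and supported models and parameters vary by Region.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Start a fine-tuning job using labeled examples stored in S3.
bedrock.create_model_customization_job(
    jobName="brand-voice-ft-001",                              # placeholder
    customModelName="brand-voice-v1",                          # placeholder
    roleArn="arn:aws:iam::123456789012:role/BedrockFtRole",    # placeholder
    baseModelIdentifier="amazon.titan-text-express-v1",        # supported base models vary
    customizationType="FINE_TUNING",
    trainingDataConfig={"s3Uri": "s3://example-bucket/train.jsonl"},  # placeholder
    outputDataConfig={"s3Uri": "s3://example-bucket/output/"},        # placeholder
    hyperParameters={"epochCount": "2", "batchSize": "1", "learningRate": "0.00001"},
)
```

The point of the sketch is that facts are frozen into the custom model at training time, so keeping knowledge current would mean repeating this job, which is why fine-tuning suits behavior and style rather than fast-changing facts.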
Prompting versus RAG versus fine-tuning decision checklist (a small sketch of this routing logic follows the list):
- If the issue is unclear instructions, improve the prompt first.
- If the issue is missing or changing facts, use RAG or a managed knowledge solution.
- If the issue is consistent style or task behavior across many examples, evaluate fine-tuning.
- If the output must cite approved sources, favor RAG over fine-tuning alone.
- If the workflow is deterministic, use rules or application logic instead of a model.
- If risk is high, include guardrails, monitoring, and human review regardless of customization method.
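The checklist can be expressed as plain application logic. The helper below is purely illustrative, and the failure categories are assumptions chosen for this sketch, but it shows that the routing decision itself is deterministic code rather than a model call:

```python
def choose_customization(failure: str, needs_citations: bool = False) -> str:
    """Map an observed failure mode to the lightest-weight fix (illustrative only)."""
    if failure == "unclear_instructions":
        return "improve the prompt"
    if failure == "missing_or_stale_facts":
        return "RAG over an approved knowledge source" + (
            " with source citations" if needs_citations else ""
        )
    if failure == "inconsistent_style_or_format":
        return "prompt templates first, then evaluate fine-tuning"
    if failure == "deterministic_workflow":
        return "rules or application logic, no model needed"
    return "re-measure the failure before choosing a fix"

print(choose_customization("missing_or_stale_facts", needs_citations=True))
```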
Custom model development is the heaviest path. It may involve data collection, labeling, training, evaluation, deployment, monitoring, retraining, and security controls. Amazon SageMaker AI supports many ML lifecycle activities, while Amazon Bedrock offers customization options for supported models. For the foundational practitioner, the key is not to build the pipeline. The key is to know when a custom path is justified and what questions to ask.
A business should ask whether it has enough high-quality data, clear ownership, evaluation criteria, compliance approval, and ongoing budget. It should ask whether a managed AI service already solves the task. It should ask who will monitor drift, quality, safety, and cost after launch. A custom model without lifecycle ownership is a risk, not a strategy.
A common scenario is a company chatbot that gives outdated benefit answers. Fine-tuning the model on the employee handbook is usually the wrong first answer. The better design is a RAG application over the approved handbook or an enterprise assistant that respects access controls and current sources. Fine-tuning might later help the assistant speak in a preferred tone, but it should not be the only source of truth.
Another scenario is a support team that wants replies in a very specific internal style. If the approved facts are already available and the problem is tone consistency, prompt templates and few-shot examples may be enough. If many examples show a stable pattern that prompts cannot capture reliably, fine-tuning can be evaluated. The decision is based on measured failure, not novelty.
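For the style-consistency case, a prompt template with a couple of worked examples is often enough before any fine-tuning is considered. The template below is a generic sketch; the tickets and reply style are invented for illustration:

```python
# A simple few-shot template for enforcing an internal reply style (illustrative).
FEW_SHOT_TEMPLATE = """You are a support agent. Reply in two short paragraphs:
first acknowledge the issue, then give the next step. Keep a calm, direct tone.

Example ticket: "My badge stopped working at the main entrance."
Example reply: "Thanks for flagging the badge issue. Please visit the security
desk in the lobby with photo ID so they can reissue it today."

Example ticket: "The VPN disconnects every few minutes."
Example reply: "Sorry about the repeated VPN drops. Please restart the client,
and if it continues, open a network ticket so we can check your profile."

Ticket: "{ticket_text}"
Reply:"""

prompt = FEW_SHOT_TEMPLATE.format(ticket_text="I can't log in to the expense portal.")
print(prompt)
```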
Review Questions
- A company chatbot must answer from a policy library that changes every month and should show source references. Which approach is the best fit?
- When is fine-tuning most appropriate compared with RAG?
- What should a practitioner do first when a model output is poor because the prompt does not clearly define the task?