5.3 Model Selection: Capability, Latency, Cost, and Risk

Key Takeaways

  • Model selection balances capability, modality, context needs, latency, throughput, cost, safety, data handling, and operational support.
  • The largest or most capable model is not automatically the best choice if a smaller model meets the business quality bar at lower cost and latency.
  • AWS practitioners should distinguish Amazon Bedrock foundation models, Amazon Q applications, managed AI services, SageMaker AI paths, and non-AI alternatives.
  • Riskier use cases need stronger grounding, guardrails, evaluation, monitoring, and human review before model approval.
Last updated: May 2026

A Decision Framework For Model Choice

Model selection starts with the job to be done. A chatbot that answers employee policy questions, a document classifier, a sales email drafter, and an image moderation workflow have different requirements. Do not start by asking for the biggest model. Start by asking what the user needs, what data is allowed, how fast the response must be, how much variation is acceptable, and what happens if the output is wrong.

Amazon Bedrock gives teams access to foundation models through a managed service. Models differ in text capability, image support, embedding availability, context window, language support, latency, cost, safety behavior, and customization options. A practitioner does not need to memorize every provider detail, but should know that model choice affects both quality and operations.

Requirement | Model or service question | AWS direction
General text generation | Does the task need broad language capability? | Compare suitable Amazon Bedrock text models.
Semantic search or RAG | Do we need embeddings for retrieval? | Use an embedding model with a supported vector store.
Enterprise assistant | Is the main need a managed business assistant? | Consider Amazon Q where it fits the use case.
Document extraction | Is there a purpose-built managed service? | Consider Amazon Textract before a generic FM.
Sentiment or entity detection | Is standard NLP enough? | Consider Amazon Comprehend.
Custom ML prediction | Is this a classic ML problem with data and labels? | Consider SageMaker AI or SageMaker Canvas.
Deterministic rule | Must the result always follow exact rules? | Use workflow logic, queries, or rules instead of GenAI.
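The routing logic in the table above can be sketched as a simple lookup. This is an illustrative helper, not an AWS API; the function name `route_requirement` and the `ROUTES` keys are invented for the example, while the directions come from the table.

```python
# Hypothetical sketch: map a requirement category to the first AWS option to
# evaluate, per the table above. Not an AWS API; names are illustrative.
ROUTES = {
    "general_text_generation": "Compare suitable Amazon Bedrock text models",
    "semantic_search_or_rag": "Embedding model with a supported vector store",
    "enterprise_assistant": "Amazon Q",
    "document_extraction": "Amazon Textract",
    "sentiment_or_entities": "Amazon Comprehend",
    "custom_ml_prediction": "SageMaker AI or SageMaker Canvas",
    "deterministic_rule": "Workflow logic, queries, or rules (no GenAI)",
}

def route_requirement(requirement: str) -> str:
    """Return the first-look AWS direction for a requirement key."""
    return ROUTES.get(requirement, "Re-examine the use case before choosing a model")
```

The point of the lookup is the order of consideration: purpose-built services and ordinary automation are checked before reaching for a generic foundation model.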

Capability includes more than intelligence. A model must support the input and output modality: text, image, embeddings, or another format. It must support the needed language, context length, tool use pattern, and customization method. It should also be available in the intended AWS Region and meet the organization's data handling and compliance expectations.

Latency matters when the user is waiting. A customer chat response may need to arrive quickly. A nightly report summary can run more slowly. Larger prompts, larger models, longer outputs, and retrieval steps can increase response time. If a smaller model meets the quality bar, it may be the better production choice.

Cost is not only model price. It includes input tokens, output tokens, retrieval storage and search, orchestration, monitoring, evaluation, human review, and engineering support. A model that produces overly long outputs can be expensive even when the per-token rate looks acceptable. Prompt templates should limit output to what the workflow actually needs.
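The output-length effect is easy to see with per-token arithmetic. The prices below are invented placeholders, not quotes for any real model; the sketch only shows how output tokens, usually priced higher than input tokens, dominate the bill when responses run long.

```python
def estimate_request_cost(input_tokens: int, output_tokens: int,
                          input_price_per_1k: float,
                          output_price_per_1k: float) -> float:
    """Estimate per-request model cost from token counts and per-1K-token prices."""
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

# Same prompt, same (hypothetical) prices; only the response length differs.
concise = estimate_request_cost(1500, 200, 0.003, 0.015)   # 0.0075
verbose = estimate_request_cost(1500, 2000, 0.003, 0.015)  # 0.0345
```

A 10x longer output here makes the request more than four times as expensive, which is why prompt templates should constrain output length to what the workflow needs.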

Risk changes the approval standard. A low-risk internal brainstorming tool can tolerate more variation than a customer-facing agent answering medical, legal, financial, safety, or employment questions. Higher-risk scenarios need stronger grounding, Amazon Bedrock Guardrails where applicable, human review, logging, and clear escalation rules. Some scenarios should be rejected or redesigned rather than approved.
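The "approval standard scales with risk" idea can be expressed as a small policy check. The tier names and control lists below are a hypothetical policy sketch, not an AWS feature or a prescribed standard.

```python
# Hypothetical approval policy: each risk tier requires a superset of the
# controls below it. Tier names and control labels are illustrative.
REQUIRED_CONTROLS = {
    "low": {"evaluation_set", "monitoring"},
    "medium": {"evaluation_set", "monitoring", "grounding", "guardrails"},
    "high": {"evaluation_set", "monitoring", "grounding", "guardrails",
             "human_review", "logging", "escalation_rules"},
}

def approve(risk_tier: str, controls_in_place: set) -> bool:
    """Approve only when every control required for the tier is present."""
    required = REQUIRED_CONTROLS.get(risk_tier)
    if required is None:
        return False  # unknown tier: reject or redesign, never default-approve
    return required <= controls_in_place  # subset check
```

Rejecting unknown tiers by default mirrors the text: when a scenario does not fit an approved pattern, the right move is redesign, not quiet approval.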

Model selection workflow:

  1. State the use case, user audience, and decision impact.
  2. Identify data sources, sensitivity, and whether outputs are customer-facing.
  3. Decide whether a managed AI service, Amazon Q, Amazon Bedrock, SageMaker AI, or ordinary automation best fits.
  4. Create a small evaluation set with representative cases and edge cases.
  5. Test candidate models against quality, latency, cost, and safety criteria.
  6. Choose the simplest model and service design that meets the acceptance bar.
  7. Document fallback behavior, monitoring, and the review cadence.
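Steps 4 through 6 of the workflow above can be sketched as a selection function: score candidates against acceptance thresholds, then pick the simplest (here, cheapest) one that clears the bar. The candidate metrics and thresholds are invented for illustration; in practice they come from the evaluation set in step 4.

```python
# Illustrative candidate scores; in practice these come from measured
# evaluation runs, not hard-coded values.
CANDIDATES = [
    {"name": "small-model", "quality": 0.86, "p95_latency_ms": 400,
     "cost_per_1k_requests": 2.0},
    {"name": "large-model", "quality": 0.93, "p95_latency_ms": 1200,
     "cost_per_1k_requests": 9.0},
]

def select_model(candidates, min_quality, max_latency_ms):
    """Return the cheapest candidate meeting quality and latency bars, or None."""
    passing = [c for c in candidates
               if c["quality"] >= min_quality
               and c["p95_latency_ms"] <= max_latency_ms]
    return min(passing, key=lambda c: c["cost_per_1k_requests"], default=None)
```

With a 0.85 quality bar and an 800 ms latency budget, the smaller model wins despite the larger model's higher quality score; raise the bar beyond what any candidate meets and the function returns None, signaling a redesign rather than a forced choice.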

A common practitioner mistake is using a foundation model for every AI-related need. If the task is to extract tables from invoices, Amazon Textract may be a better first option. If the task is to translate text, Amazon Translate may be a better fit. If business users need no-code predictions from tabular data, SageMaker Canvas may be appropriate. If the task is an enterprise assistant with access to company systems, Amazon Q may fit better than building from scratch.

SageMaker AI matters when the organization needs to build, train, customize, evaluate, or deploy ML models beyond a simple managed API pattern. That path can be powerful, but it brings more responsibility for data preparation, lifecycle management, deployment, monitoring, and model governance. The target exam candidate is not expected to implement those pipelines, but should recognize when that path is heavier than the problem requires.

Good model selection produces a defendable decision. The answer should say why the selected model or service fits the data, task, latency, cost, and risk. It should also say what was not selected and why. That judgment is central to practitioner-level AWS AI work.

Test Your Knowledge

A team wants fast, low-cost internal summaries and a smaller model meets the measured quality bar. What is the best model-selection decision?

Test Your Knowledge

A workflow needs to extract structured text and tables from scanned forms. Which first AWS option should a practitioner consider before a generic foundation model?

Test Your Knowledge

Which factor most increases the approval requirements for a model used in production?
