4.2 Service Picker Drills

Key Takeaways

  • A repeatable picker beats memorization: choose among a Foundry model deployment, an agent, a Foundry Tool, RAG, fine-tuning, or custom training based on the scenario requirements.
  • Use prebuilt Azure AI services for common stable tasks such as sentiment, entities, speech to text, text to speech, OCR, image analysis, and content moderation.
  • Use a deployed Foundry model when the app needs open-ended language, image generation, multimodal reasoning, or flexible response drafting.
  • Use retrieval-augmented generation when answers must use private or changing source content; use fine-tuning only when repeated task behavior or style must be learned from examples.
  • Model size, modality, region availability, latency, quota, and token cost are valid service-selection constraints, not afterthoughts.
Last updated: June 2026

The Picker Pattern

After you classify the workload, choose the build path. AI-901 expects you to understand Microsoft Foundry as the place to build with models, agents, tools, evaluations, monitoring, and SDKs. It also expects you to recognize when a Foundry Tool is a better match than a general-purpose model.

Use this rule: choose the most specific reliable capability that solves the task. A general model can often imitate many behaviors, but a prebuilt service may be cheaper, easier to test, more consistent, or easier to explain.

Service Picker Table

| Requirement | Choose first | Why | Watch for |
| --- | --- | --- | --- |
| Standard text analytics such as sentiment, key phrases, entities, language detection, or PII | Azure Language | Prebuilt natural language processing APIs and Foundry Tool access | Custom labels may require custom text classification or custom NER |
| Translate written documents or strings | Translator | Dedicated translation capability | Spoken translation belongs with Azure Speech |
| Transcribe meetings, caption media, synthesize voice, or translate speech | Azure Speech | Handles audio input and audio output directly | Speaker recognition is not the same as speech recognition |
| Describe images, read text in images, or detect visual features | Azure Vision or multimodal model | Vision handles image analysis; multimodal models reason over images in prompts | Image generation is a different workload |
| Extract fields from forms, contracts, images, calls, and videos | Content Understanding | Supports schemas, confidence, grounding, and multimodal inputs | Simple OCR may be enough for one image-text task |
| Moderate prompts, outputs, text, or images | Content Safety or guardrails | Detects harmful content, prompt attacks, protected material, and groundedness risks | Safety controls still need monitoring and review |
| Draft, rewrite, reason, code, summarize flexibly, or generate images | Foundry model deployment | General generative capability with prompt and parameter control | Ground source-dependent answers with RAG |
| Decide when to search, call APIs, or complete a workflow | Foundry agent | Tools and instructions let the system act across steps | Tools need least privilege and traceability |
| Answer over private or fast-changing knowledge | RAG with search and a deployed model | Retrieval adds current source context without changing model weights | Poor retrieval can still produce weak answers |
| Repeated style or behavior learned from examples | Fine-tuning | Changes model behavior after training examples | Fine-tuned deployments can add hosting cost |

Five Quick Drills

Drill 1: Support inbox triage. The app labels urgency, detects customer names, finds product codes, and masks account numbers in emails. Start with Azure Language. If it also drafts replies, add a Foundry model, but do not replace deterministic extraction with a free-form answer.

Drill 2: Voice kiosk. A visitor speaks a question, the app transcribes it, a model answers, and the response is read aloud. Use Azure Speech for speech to text and text to speech. Add a deployed model for the answer. If the model directly handles spoken prompts in the scenario, a multimodal model may be part of the implementation.
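
The kiosk flow in Drill 2 can be sketched as three stages wired in sequence. This is a minimal illustration only: every function name below is hypothetical, and the stub stages stand in for Azure Speech and a deployed model rather than calling any real SDK.

```python
from typing import Callable

# Hypothetical stage signatures. Real implementations would wrap Azure Speech
# and a deployed Foundry model; none of these names are SDK calls.
SpeechToText = Callable[[bytes], str]
AnswerModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


def run_kiosk_turn(audio_in: bytes,
                   transcribe: SpeechToText,
                   answer: AnswerModel,
                   synthesize: TextToSpeech) -> bytes:
    """Run one kiosk turn: speech to text, model answer, text to speech."""
    question = transcribe(audio_in)
    reply = answer(question)
    return synthesize(reply)


# Stub stages so the sketch runs without any cloud service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_answer(question: str) -> str:
    return f"You asked: {question}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")
```

Keeping each stage behind its own function makes it easy to swap the speech stages for a multimodal model later if the scenario calls for direct spoken prompts.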

Drill 3: Warehouse photo check. The app reads shelf labels and identifies damaged packages in uploaded images. Use Azure Vision for image analysis and OCR. If the app must return structured inspection fields from photos and videos, Content Understanding becomes the stronger answer.

Drill 4: Contract packet automation. The business receives PDFs, scanned pages, and recorded calls, then needs clause type, dates, parties, obligations, and confidence values. Use Content Understanding because the key requirement is schema-driven extraction across messy content.
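
One way to picture why confidence values matter in Drill 4: extracted fields can be split into those accepted automatically and those routed to a person. The sketch below assumes a hypothetical field shape and an arbitrary 0.8 threshold; it is not the Content Understanding response format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ExtractedField:
    # Hypothetical shape for one schema field; not the service's response format.
    name: str
    value: str
    confidence: float  # 0.0 to 1.0


def split_by_confidence(
    fields: List[ExtractedField], threshold: float = 0.8
) -> Tuple[List[ExtractedField], List[ExtractedField]]:
    """Separate fields to accept automatically from fields needing human review."""
    accepted = [f for f in fields if f.confidence >= threshold]
    review = [f for f in fields if f.confidence < threshold]
    return accepted, review


# Made-up values for illustration only.
sample_fields = [
    ExtractedField("effective_date", "2026-01-15", 0.97),
    ExtractedField("party_a", "Contoso Ltd", 0.91),
    ExtractedField("obligation", "deliver within 30 days", 0.62),
]
```

The threshold is a design choice that would normally be tuned per field against reviewed samples.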

Drill 5: IT request handler. The app answers questions, searches a knowledge base, checks device status, and opens tickets. Use a Foundry agent with constrained tools. Add RAG for knowledge answers and monitoring for tool behavior.
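
The phrase "constrained tools" in Drill 5 can be illustrated with an allowlist plus a call trace. This is a rough sketch under stated assumptions: the tool functions and the `ConstrainedToolbox` class are hypothetical and do not represent how Foundry agents register tools.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical tools; real ones would call a search index, a device API,
# and a ticketing system.
def search_kb(query: str) -> str:
    return f"kb results for {query!r}"


def check_device(device_id: str) -> str:
    return f"device {device_id}: online"


def open_ticket(summary: str) -> str:
    return f"ticket opened: {summary}"


class ConstrainedToolbox:
    """Dispatch only allowlisted tools and record every call for later review."""

    def __init__(self, allowed: Dict[str, Callable[[str], str]]):
        self._allowed = dict(allowed)
        self.trace: List[Tuple[str, str, str]] = []  # (tool, argument, outcome)

    def call(self, tool: str, argument: str) -> str:
        if tool not in self._allowed:
            # Denied attempts are traced too, so reviewers can see them.
            self.trace.append((tool, argument, "denied"))
            raise PermissionError(f"tool {tool!r} is not allowed")
        result = self._allowed[tool](argument)
        self.trace.append((tool, argument, "ok"))
        return result
```

A read-only deployment could pass only `search_kb` and `check_device`, leaving `open_ticket` out until the ticketing path has been reviewed.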

Model Choice Discipline

The best model is not automatically the largest model. Choose by task complexity, modality, accuracy needs, latency, region, quota, and cost. Small models can work for routing, formatting, and simple extraction. Stronger models fit complex reasoning, multimodal interpretation, and ambiguous instructions. The Foundry playground is a practical place to compare prompts, parameters, and model behavior before writing client code.
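
Treating modality, region, and latency as hard constraints and cost as the tie-breaker can be sketched as follows. The catalog entries and numbers are invented for illustration, and picking the cheapest eligible option is one possible policy, not a rule from the exam; accuracy needs and quota are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ModelOption:
    # All values are made-up illustrations, not real catalog data.
    name: str
    modalities: FrozenSet[str]
    regions: FrozenSet[str]
    p95_latency_ms: int
    cost_per_1k_tokens: float


def pick_model(options: List[ModelOption], modality: str, region: str,
               max_latency_ms: int) -> Optional[ModelOption]:
    """Return the cheapest option that satisfies every hard constraint, or None."""
    eligible = [
        m for m in options
        if modality in m.modalities
        and region in m.regions
        and m.p95_latency_ms <= max_latency_ms
    ]
    return min(eligible, key=lambda m: m.cost_per_1k_tokens, default=None)


catalog = [
    ModelOption("small-text", frozenset({"text"}),
                frozenset({"westeurope", "eastus"}), 300, 0.2),
    ModelOption("large-multimodal", frozenset({"text", "image"}),
                frozenset({"westeurope"}), 1200, 2.0),
]
```

With this catalog, a text-only request lands on the smaller option, while an image request is served only by the multimodal option and only if the latency budget allows it.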

Use the picker as a decision tree: prebuilt tool when the task is standard, model deployment when the task is generative, RAG when the answer needs source grounding, agent when the model must act, and fine-tuning only when examples need to change repeated behavior.
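
The decision tree above can be written down as a small function for self-quizzing. The `Scenario` flags and the returned labels are hypothetical names chosen for this sketch; components can combine, as the drills show.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scenario:
    # Hypothetical flags describing a scenario; names are illustrative only.
    standard_task: bool = False           # e.g. sentiment, OCR, speech to text
    generative: bool = False              # open-ended drafting or image generation
    needs_source_grounding: bool = False  # private or changing source content
    must_act: bool = False                # search, call APIs, complete workflows
    learn_from_examples: bool = False     # repeated style or behavior


def pick_build_path(s: Scenario) -> List[str]:
    """Return the build-path components suggested by the picker, in order."""
    path: List[str] = []
    if s.standard_task:
        path.append("prebuilt tool")
    if s.generative:
        path.append("model deployment")
    if s.needs_source_grounding:
        path.append("RAG")
    if s.must_act:
        path.append("agent")
    if s.learn_from_examples:
        path.append("fine-tuning")
    return path
```

For example, the IT request handler from Drill 5 would set `generative`, `needs_source_grounding`, and `must_act`, which yields a model deployment, RAG, and an agent together.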

Test Your Knowledge

A legal department wants employees to ask questions about internal policies. The documents change weekly, and answers should cite the current source passages. Which approach best fits?

Test Your Knowledge

A marketing prototype needs short campaign taglines and several new product-background images from text prompts. Which service direction is most appropriate?
