4.2 Service Picker Drills

Key Takeaways

A repeatable picker beats memorization: choose among a Foundry model deployment, an agent, a Foundry Tool, RAG, fine-tuning, or custom training based on the scenario requirements.
Use prebuilt Azure AI services for common stable tasks such as sentiment, entities, speech to text, text to speech, OCR, image analysis, and content moderation.
Use a deployed Foundry model when the app needs open-ended language, image generation, multimodal reasoning, or flexible response drafting.
Use retrieval-augmented generation when answers must use private or changing source content; use fine-tuning only when repeated task behavior or style must be learned from examples.
Model size, modality, region availability, latency, quota, and token cost are valid service-selection constraints, not afterthoughts.

Last updated: June 2026

The Picker Pattern

After you classify the workload, choose the build path. AI-901 expects you to understand Microsoft Foundry as the place to build with models, agents, tools, evaluations, monitoring, and SDKs. It also expects you to recognize when a Foundry Tool is a better match than a general-purpose model.

Use this rule: choose the most specific reliable capability that solves the task. A general model can often imitate many behaviors, but a prebuilt service may be cheaper, easier to test, more consistent, or easier to explain.

Service Picker Table

Requirement	Choose first	Why	Watch for
Standard text analytics such as sentiment, key phrases, entities, language detection, or PII	Azure Language	Prebuilt natural language processing APIs and Foundry Tool access	Custom labels may require custom text classification or custom NER
Translate written documents or strings	Translator	Dedicated translation capability	Spoken translation belongs with Azure Speech
Transcribe meetings, caption media, synthesize voice, or translate speech	Azure Speech	Handles audio input and audio output directly	Speaker recognition is not the same as speech recognition
Describe images, read text in images, or detect visual features	Azure Vision or multimodal model	Vision handles image analysis; multimodal models reason over images in prompts	Image generation is a different workload
Extract fields from forms, contracts, images, calls, and videos	Content Understanding	Supports schemas, confidence, grounding, and multimodal inputs	Simple OCR may be enough for one image-text task
Moderate prompts, outputs, text, or images	Content Safety or guardrails	Detects harmful content, prompt attacks, protected material, and groundedness risks	Safety controls still need monitoring and review
Draft, rewrite, reason, code, summarize flexibly, or generate images	Foundry model deployment	General generative capability with prompt and parameter control	Ground source-dependent answers with RAG
Decide when to search, call APIs, or complete a workflow	Foundry agent	Tools and instructions let the system act across steps	Tools need least privilege and traceability
Answer over private or fast-changing knowledge	RAG with search and a deployed model	Retrieval adds current source context without changing model weights	Poor retrieval can still produce weak answers
Repeated style or behavior learned from examples	Fine-tuning	Adjusts model behavior by training on examples	Fine-tuned deployments can add hosting cost
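The RAG row in the table can be made concrete with a minimal sketch. It assumes a toy keyword-overlap retriever in place of a real search index, and it stops at building the grounded prompt rather than calling a deployed model. The function names and the sample passages are hypothetical and exist only for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a stand-in for real search-index analysis."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Rank passages by shared words and drop those with no overlap."""
    q = tokens(question)
    scored = [(len(q & tokens(p)), p) for p in passages]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def build_prompt(question: str, passages: list[str]):
    """Return a grounded prompt, or None when retrieval finds nothing."""
    sources = retrieve(question, passages)
    if not sources:
        return None  # weak retrieval: better to decline than to guess
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(sources, 1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}"
    )


# Hypothetical policy passages; updating this list changes the answers
# without touching any model weights.
POLICIES = [
    "Remote work requires manager approval.",
    "Expense reports are due within 30 days.",
    "Laptops are refreshed every three years.",
]
```

Calling `build_prompt("When are expense reports due?", POLICIES)` puts the expense passage first, but the laptop passage also slips in because it shares the word "are". That is a small instance of the table's warning that poor retrieval can still produce weak answers.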

Five Quick Drills

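Drill 1: Support inbox triage. The app labels urgency, detects customer names, finds product codes, and masks account numbers in emails. Start with Azure Language: prebuilt entity and PII detection cover names and account numbers, while urgency labels and product codes may need custom text classification or custom NER. If the app also drafts replies, add a Foundry model, but do not replace the dedicated extraction step with a free-form model answer.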

Drill 2: Voice kiosk. A visitor speaks a question, the app transcribes it, a model answers, and the response is read aloud. Use Azure Speech for speech to text and text to speech. Add a deployed model for the answer. If the scenario states that the model accepts spoken prompts directly, an audio-capable multimodal model may be part of the implementation.

Drill 3: Warehouse photo check. The app reads shelf labels and identifies damaged packages in uploaded images. Use Azure Vision for image analysis and OCR. If the app must return structured inspection fields from photos and videos, Content Understanding becomes the stronger answer.

Drill 4: Contract packet automation. The business receives PDFs, scanned pages, and recorded calls, then needs clause type, dates, parties, obligations, and confidence values. Use Content Understanding because the key requirement is schema-driven extraction across messy content.

Drill 5: IT request handler. The app answers questions, searches a knowledge base, checks device status, and opens tickets. Use a Foundry agent with constrained tools. Add RAG for knowledge answers and monitoring for tool behavior.
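The idea of constrained tools in Drill 5 can be sketched without any SDK. The example below is a minimal illustration in plain Python, not the Foundry agent API: the `ToolRegistry` class and the three tool functions are hypothetical. It shows an allowlist (least privilege) and a call trace (traceability) wrapped around whatever tool the model asks for.

```python
from typing import Callable


# Hypothetical tools; real ones would call a search index, a device
# management API, and a ticketing system.
def search_kb(query: str) -> str:
    return f"kb results for {query!r}"


def check_device(device_id: str) -> str:
    return f"device {device_id} is online"


def open_ticket(summary: str) -> str:
    return f"ticket opened: {summary}"


class ToolRegistry:
    """Runs only allowlisted tools and records every attempt."""

    def __init__(self, allowed: dict[str, Callable[[str], str]]):
        self._allowed = dict(allowed)
        self.trace: list[tuple[str, str, str]] = []  # (tool, argument, outcome)

    def call(self, name: str, argument: str) -> str:
        if name not in self._allowed:
            self.trace.append((name, argument, "denied"))
            raise PermissionError(f"tool {name!r} is not allowed")
        result = self._allowed[name](argument)
        self.trace.append((name, argument, "ok"))
        return result


# A read-only handler: it can look things up but cannot open tickets.
read_only = ToolRegistry({"search_kb": search_kb, "check_device": check_device})
```

With this setup, `read_only.call("check_device", "LT-042")` succeeds, while `read_only.call("open_ticket", "reset password")` raises `PermissionError`. Both attempts land in `read_only.trace`, which is the kind of record that monitoring for tool behavior would review.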

Model Choice Discipline

The best model is not automatically the largest model. Choose by task complexity, modality, accuracy needs, latency, region, quota, and cost. Small models can work for routing, formatting, and simple extraction. Larger, more capable models fit complex reasoning, multimodal interpretation, and ambiguous instructions. The Foundry playground is a good place to compare prompts, parameters, and model behavior before writing client code.
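One way to picture this discipline is as a filter followed by a sort: drop candidates that fail hard constraints, then prefer the cheapest of what remains. The sketch below assumes an invented catalog. The model names, region availability, latency figures, and prices are made up for illustration and do not describe any real deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    modalities: frozenset
    regions: frozenset
    p95_latency_ms: int
    cost_per_1k_tokens: float


# Invented catalog; every value here is an assumption for illustration.
CATALOG = [
    ModelOption("small-text", frozenset({"text"}),
                frozenset({"eastus", "westeurope"}), 200, 0.1),
    ModelOption("mid-multimodal", frozenset({"text", "image"}),
                frozenset({"westeurope"}), 500, 0.4),
    ModelOption("large-multimodal", frozenset({"text", "image"}),
                frozenset({"eastus"}), 900, 1.0),
]


def shortlist(options, modality: str, region: str, max_latency_ms: int):
    """Keep options that meet every hard constraint, cheapest first."""
    eligible = [
        o for o in options
        if modality in o.modalities
        and region in o.regions
        and o.p95_latency_ms <= max_latency_ms
    ]
    return sorted(eligible, key=lambda o: o.cost_per_1k_tokens)
```

For a text routing task in `eastus` with a 1000 ms budget, the small model sorts ahead of the large one. For an image task in `eastus` with a 500 ms budget, the shortlist comes back empty, which signals that a constraint (region or latency) has to be relaxed before any model qualifies.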

Use the picker as a decision tree: prebuilt tool when the task is standard, model deployment when the task is generative, RAG when the answer needs source grounding, agent when the model must act, and fine-tuning only when examples need to change repeated behavior.
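The decision tree above can be written down as a small function. This is a sketch under stated assumptions: the `Scenario` flags are hypothetical names for the requirements the picker asks about, and the function adds a model deployment whenever RAG or an agent is chosen, because both rely on a deployed model.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical requirement flags for one scenario."""
    standard_task: bool = False           # sentiment, OCR, speech to text
    generative: bool = False              # drafting, rewriting, image generation
    needs_source_grounding: bool = False  # private or changing content
    must_act: bool = False                # search, call APIs, open tickets
    needs_learned_behavior: bool = False  # repeated style learned from examples


def pick_build_paths(s: Scenario) -> list[str]:
    """Return the build paths a scenario calls for, in picker order."""
    paths = []
    if s.standard_task:
        paths.append("prebuilt tool")
    # RAG and agents both sit on top of a deployed model.
    if s.generative or s.needs_source_grounding or s.must_act:
        paths.append("model deployment")
    if s.needs_source_grounding:
        paths.append("RAG")
    if s.must_act:
        paths.append("agent")
    if s.needs_learned_behavior:
        paths.append("fine-tuning")
    return paths


# Drill 5 (IT request handler): knowledge answers plus actions.
it_handler = Scenario(needs_source_grounding=True, must_act=True)
print(pick_build_paths(it_handler))
```

A scenario can return more than one path. Drill 1, for example, sets both `standard_task` and `generative`, so the result pairs a prebuilt tool with a model deployment rather than choosing one over the other.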

Test Your Knowledge

A legal department wants employees to ask questions about internal policies. The documents change weekly, and answers should cite the current source passages. Which approach best fits?

Use retrieval-augmented generation with a search index and deployed model so answers are grounded in current documents.

Fine-tune a model once and remove all retrieval because policy documents are stable forever.

Use Azure Speech only because the users are asking natural-language questions.

Use image generation because the policies might contain diagrams.

Test Your Knowledge

A marketing prototype needs short campaign taglines and several new product-background images from text prompts. Which service direction is most appropriate?

Use generative models in Foundry, including an image-generation model for new visuals.

Use Azure Vision OCR because the app is creating images from text.

Use Azure Language named entity recognition because the output must be creative.

Use Content Understanding because all visual tasks require structured extraction.
