4.2 Service Picker Drills
Key Takeaways
- A repeatable picker beats memorization: choose among a Foundry model deployment, an agent, a Foundry Tool, RAG, fine-tuning, or custom training based on the scenario requirements.
- Use prebuilt Azure AI services for common, stable tasks such as sentiment, entities, speech to text, text to speech, OCR, image analysis, and content moderation.
- Use a deployed Foundry model when the app needs open-ended language, image generation, multimodal reasoning, or flexible response drafting.
- Use retrieval-augmented generation when answers must use private or changing source content; use fine-tuning only when repeated task behavior or style must be learned from examples.
- Model size, modality, region availability, latency, quota, and token cost are valid service-selection constraints, not afterthoughts.
The Picker Pattern
After you classify the workload, choose the build path. AI-901 expects you to understand Microsoft Foundry as the place to build with models, agents, tools, evaluations, monitoring, and SDKs. It also expects you to recognize when a Foundry Tool is a better match than a general-purpose model.
Use this rule: choose the most specific reliable capability that solves the task. A general model can often imitate many behaviors, but a prebuilt service may be cheaper, easier to test, more consistent, or easier to explain.
Service Picker Table
| Requirement | Choose first | Why | Watch for |
|---|---|---|---|
| Standard text analytics such as sentiment, key phrases, entities, language detection, or PII | Azure Language | Prebuilt natural language processing APIs and Foundry Tool access | Custom labels may require custom text classification or custom NER |
| Translate written documents or strings | Translator | Dedicated translation capability | Spoken translation belongs with Azure Speech |
| Transcribe meetings, caption media, synthesize voice, or translate speech | Azure Speech | Handles audio input and audio output directly | Speaker recognition is not the same as speech recognition |
| Describe images, read text in images, or detect visual features | Azure Vision or multimodal model | Vision handles image analysis; multimodal models reason over images in prompts | Image generation is a different workload |
| Extract fields from forms, contracts, images, calls, and videos | Content Understanding | Supports schemas, confidence, grounding, and multimodal inputs | Simple OCR may be enough for one image-text task |
| Moderate prompts, outputs, text, or images | Content Safety or guardrails | Detects harmful content, prompt attacks, protected material, and groundedness risks | Safety controls still need monitoring and review |
| Draft, rewrite, reason, code, summarize flexibly, or generate images | Foundry model deployment | General generative capability with prompt and parameter control | Ground source-dependent answers with RAG |
| Decide when to search, call APIs, or complete a workflow | Foundry agent | Tools and instructions let the system act across steps | Tools need least privilege and traceability |
| Answer over private or fast-changing knowledge | RAG with search and a deployed model | Retrieval adds current source context without changing model weights | Poor retrieval can still produce weak answers |
| Repeated style or behavior learned from examples | Fine-tuning | Adjusts model behavior by training on examples | Fine-tuned deployments can add hosting cost |
Five Quick Drills
Drill 1: Support inbox triage. The app labels urgency, detects customer names, finds product codes, and masks account numbers in emails. Start with Azure Language. If it also drafts replies, add a Foundry model, but do not replace deterministic extraction with a free-form answer.
Drill 2: Voice kiosk. A visitor speaks a question, the app transcribes it, a model answers, and the response is read aloud. Use Azure Speech for speech to text and text to speech. Add a deployed model for the answer. If the scenario says the model handles spoken prompts directly, an audio-capable multimodal model may be part of the implementation.
Drill 3: Warehouse photo check. The app reads shelf labels and identifies damaged packages in uploaded images. Use Azure Vision for image analysis and OCR. If the app must return structured inspection fields from photos and videos, Content Understanding becomes the stronger answer.
Drill 4: Contract packet automation. The business receives PDFs, scanned pages, and recorded calls, then needs clause type, dates, parties, obligations, and confidence values. Use Content Understanding because the key requirement is schema-driven extraction across messy content.
Drill 5: IT request handler. The app answers questions, searches a knowledge base, checks device status, and opens tickets. Use a Foundry agent with constrained tools. Add RAG for knowledge answers and monitoring for tool behavior.
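The "constrained tools" idea in Drill 5 can be sketched in plain Python. This is a minimal illustration, not the Foundry agent API: the `ConstrainedTools` class and the `check_device_status` and `open_ticket` functions are hypothetical stand-ins. It shows an allowlist (least privilege) plus a trace of every call attempt (monitoring).

```python
from typing import Callable, Dict, List, Tuple


class ConstrainedTools:
    """Expose only allowlisted tools and keep a trace of every call attempt."""

    def __init__(self, allowed: Dict[str, Callable[..., str]]) -> None:
        self._allowed = dict(allowed)
        # Each trace entry: (tool name, arguments, outcome).
        self.trace: List[Tuple[str, dict, str]] = []

    def call(self, name: str, **kwargs) -> str:
        if name not in self._allowed:
            # Record the denied attempt before refusing it.
            self.trace.append((name, kwargs, "denied"))
            raise PermissionError(f"tool not allowed: {name}")
        result = self._allowed[name](**kwargs)
        self.trace.append((name, kwargs, "ok"))
        return result


# Hypothetical stand-ins for real integrations (assumptions, not real APIs).
def check_device_status(device_id: str) -> str:
    return f"{device_id}: online"


def open_ticket(summary: str) -> str:
    return f"ticket opened: {summary}"


tools = ConstrainedTools({
    "check_device_status": check_device_status,
    "open_ticket": open_ticket,
})
```

Calling `tools.call("check_device_status", device_id="laptop-42")` succeeds, while a request for any tool outside the allowlist raises `PermissionError` and still leaves a "denied" entry in `tools.trace` for review.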
Model Choice Discipline
The best model is not automatically the largest model. Choose by task complexity, modality, accuracy needs, latency, region, quota, and cost. Small models can work for routing, formatting, and simple extraction. Stronger models fit complex reasoning, multimodal interpretation, and ambiguous instructions. The Foundry playground is a good place to compare prompts, parameters, and model behavior before writing client code.
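One way to picture this discipline is as a filter followed by a cost sort. The sketch below is illustrative only: the `Candidate` fields, the model names, the region labels, and every number are made-up assumptions, not real catalog data. It keeps the candidates that satisfy the hard constraints and orders the survivors from cheapest to most expensive.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Candidate:
    name: str
    modalities: FrozenSet[str]
    regions: FrozenSet[str]
    p95_latency_ms: int        # illustrative figure, not a measurement
    cost_per_1k_tokens: float  # illustrative figure, not a real price
    quality_tier: int          # 1 = basic, 3 = strongest (hypothetical scale)


def shortlist(candidates: List[Candidate], *, modality: str, region: str,
              max_latency_ms: int, min_quality: int) -> List[Candidate]:
    """Drop candidates that miss a hard constraint, then sort by cost."""
    ok = [
        c for c in candidates
        if modality in c.modalities
        and region in c.regions
        and c.p95_latency_ms <= max_latency_ms
        and c.quality_tier >= min_quality
    ]
    return sorted(ok, key=lambda c: c.cost_per_1k_tokens)


# Hypothetical catalog entries used only to exercise the function.
CATALOG = [
    Candidate("large-model", frozenset({"text", "image"}),
              frozenset({"region-a"}), 900, 1.0, 3),
    Candidate("small-model", frozenset({"text"}),
              frozenset({"region-a", "region-b"}), 200, 0.1, 1),
]

routing = shortlist(CATALOG, modality="text", region="region-a",
                    max_latency_ms=1000, min_quality=1)
```

With these made-up entries, a simple text routing task lists `small-model` first because it meets every constraint at lower cost, while an image task in `region-a` leaves only `large-model`.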
Use the picker as a decision tree: prebuilt tool when the task is standard, model deployment when the task is generative, RAG when the answer needs source grounding, agent when the model must act, and fine-tuning only when examples need to change repeated behavior.
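The decision tree above can be written down as a small function. This is a study aid under stated assumptions: the `Scenario` field names and the returned labels are hypothetical, chosen only to mirror the wording of the picker, and a real scenario often combines several paths.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scenario:
    standard_task: bool = False            # sentiment, OCR, speech to text, ...
    generative: bool = False               # drafting, rewriting, image generation
    needs_source_grounding: bool = False   # private or changing content
    must_act: bool = False                 # search, call APIs, run a workflow
    needs_learned_behavior: bool = False   # repeated style taught by examples


def pick(s: Scenario) -> List[str]:
    """Return every build path the scenario calls for, in picker order."""
    picks: List[str] = []
    if s.standard_task:
        picks.append("prebuilt tool")
    if s.generative:
        picks.append("model deployment")
    if s.needs_source_grounding:
        picks.append("RAG")
    if s.must_act:
        picks.append("agent")
    if s.needs_learned_behavior:
        picks.append("fine-tuning")
    return picks


# Drill 5 (IT request handler): answers from a knowledge base and takes actions.
it_handler = pick(Scenario(generative=True, needs_source_grounding=True,
                           must_act=True))
```

For the IT request handler, `it_handler` contains a model deployment, RAG, and an agent, which matches the drill: the paths stack rather than exclude one another.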
A legal department wants employees to ask questions about internal policies. The documents change weekly, and answers should cite the current source passages. Which approach best fits?
A marketing prototype needs short campaign taglines and several new product-background images from text prompts. Which service direction is most appropriate?