4.1 Workload Classification Scenarios

Key Takeaways

AI-901 service selection starts by classifying the workload before naming a product: generative, agentic, text analysis, speech, vision, information extraction, content safety, or predictive machine learning.
The fastest scenario clue is the input type: written text points to Azure Language or Translator, audio points to Azure Speech, visual input points to Azure Vision or a multimodal model, and mixed-media extraction points to Content Understanding.
Use Microsoft Foundry models for open-ended generation and reasoning, but use Foundry Tools when the need is a proven API such as transcription, entity extraction, OCR, or content moderation.
Agentic AI is different from a normal chat response because an agent works through steps and can use tools, memory, or external systems to complete a task.
Responsible AI, privacy, region, latency, cost, and human-review needs can change the right service even when the workload label looks obvious.

Last updated: June 2026

Classify Before You Pick

AI-901 scenario questions often look like service-name questions, but the underlying skill is workload classification. Microsoft explicitly lists common workloads in the AI-901 study guide: generative and agentic AI, text analysis, speech, computer vision, and information extraction. Treat those as the first sorting buckets. If you jump straight to a product name, you can miss a detail such as audio input, structured output, or tool use.

Start with the business action. Is the system creating new content, understanding existing content, extracting fields, predicting a category, or taking steps through tools? Then identify the input and output. The same customer-support workflow might use Azure Language to score sentiment, Azure Speech to transcribe calls, Content Understanding to extract call topics from recordings, and a Foundry model to draft a response.
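One way to picture that customer-support example is as a list of steps, each classified on its own. The sketch below is illustrative only: the step descriptions and the `services_used` helper are assumptions made for this example, and the service names are simply the ones mentioned in the paragraph above.

```python
# Each step of one customer-support workflow, classified separately.
# Step wording and this data layout are hypothetical; the services are
# the ones named in the surrounding text.
SUPPORT_WORKFLOW = [
    ("score sentiment of a written message", "text analysis", "Azure Language"),
    ("transcribe a recorded call", "speech", "Azure Speech"),
    ("extract call topics from recordings", "information extraction", "Content Understanding"),
    ("draft a response", "generative AI", "Foundry model"),
]


def services_used(workflow):
    """Return the distinct services a workflow touches, in first-use order."""
    seen = []
    for _step, _workload, service in workflow:
        if service not in seen:
            seen.append(service)
    return seen


if __name__ == "__main__":
    for step, workload, service in SUPPORT_WORKFLOW:
        print(f"{step}: {workload} -> {service}")
```

The point of the layout is that one business workflow rarely maps to one workload, so each step gets its own classification before any service is named.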

Workload Map

Scenario signal	Classify as	Likely Azure fit	Main exam question
Write an answer, draft code, summarize with reasoning, or create an image	Generative AI	Foundry model or image-generation model	Is the output new content?
Decide when to call an API, search files, or complete a multi-step task	Agentic AI	Foundry Agent Service or agent client	Does the AI need tools or actions?
Sentiment, entities, key phrases, language detection, or PII in written text	Text analysis	Azure Language in Foundry Tools	Is the input already text?
Transcription, captions, spoken translation, neural voice, speaker identity	Speech	Azure Speech in Foundry Tools	Is audio the input or output?
Analyze an existing photo, read visible text, detect objects, or caption an image	Computer vision	Azure Vision or a multimodal model	Is the app interpreting visual input?
Pull fields, sections, topics, or JSON from documents, images, audio, or video	Information extraction	Azure Content Understanding	Is the goal structured output from messy media?
Detect harmful prompts, image content, groundedness, protected material, or prompt attacks	Content safety	Azure AI Content Safety or Foundry guardrails	Is the risk unsafe or policy-breaking content?
Predict a category, number, future value, or outlier from historical data	Predictive machine learning	Azure Machine Learning or a trained model path	Is it estimating rather than generating?
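As a study aid, the map above can be held as a simple lookup. This is a minimal sketch, not an Azure API: `WORKLOAD_MAP` and `likely_fit` are hypothetical names, and the values are copied from the "Likely Azure fit" column.

```python
# Hypothetical lookup mirroring the workload map; not part of any Azure SDK.
WORKLOAD_MAP = {
    "generative ai": "Foundry model or image-generation model",
    "agentic ai": "Foundry Agent Service or agent client",
    "text analysis": "Azure Language in Foundry Tools",
    "speech": "Azure Speech in Foundry Tools",
    "computer vision": "Azure Vision or a multimodal model",
    "information extraction": "Azure Content Understanding",
    "content safety": "Azure AI Content Safety or Foundry guardrails",
    "predictive machine learning": "Azure Machine Learning or a trained model path",
}


def likely_fit(workload: str) -> str:
    """Return the 'Likely Azure fit' entry for a workload label (case-insensitive)."""
    key = workload.strip().lower()
    if key not in WORKLOAD_MAP:
        raise KeyError(f"Unknown workload: {workload!r}")
    return WORKLOAD_MAP[key]


if __name__ == "__main__":
    print(likely_fit("Speech"))
```

Keying the lookup on the workload label rather than on a product name reflects the order this section recommends: classify first, then pick.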

The Four-Question Read

Use this process every time:

Input: What does the app receive: text, speech, image, video, document, table data, or a user goal?
Verb: What must the AI do: generate, classify, translate, transcribe, extract, moderate, search, or act?
Output: Does the business need text, audio, an image, a label, a transcript, a confidence score, or structured JSON?
Risk: Does the output affect people, expose sensitive data, require human review, or need content filtering?

That last question matters because service fit is not only functional. A model that can answer a question may still be the wrong production choice if it cannot cite sources, keep private data protected, or provide confidence signals for review.
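The four questions can be written down as a small record plus a rule of thumb. The sketch below is an assumption-heavy simplification: `ScenarioRead`, `classify`, and `needs_review` are hypothetical names, the rule order is one plausible reading of this section, and real exam scenarios can carry details these rules miss.

```python
from dataclasses import dataclass, field


@dataclass
class ScenarioRead:
    """Answers to the four questions for one scenario (hypothetical structure)."""
    input_kinds: set[str]            # e.g. {"text"} or {"image", "video", "speech"}
    verb: str                        # e.g. "generate", "extract", "act"
    output_kind: str                 # e.g. "structured JSON", "transcript"
    risks: set[str] = field(default_factory=set)


def classify(read: ScenarioRead) -> str:
    """Map a four-question read to a workload label using simplified rules."""
    verb = read.verb.lower()
    inputs = {kind.lower() for kind in read.input_kinds}
    if verb == "act":
        return "agentic AI"
    if verb == "generate":
        return "generative AI"
    if verb == "moderate":
        return "content safety"
    # Extraction from anything other than plain text is treated as
    # information extraction; text-only extraction falls through to text analysis.
    if verb == "extract" and inputs != {"text"}:
        return "information extraction"
    if verb == "transcribe" or "speech" in inputs:
        return "speech"
    if verb == "predict" or "table data" in inputs:
        return "predictive machine learning"
    if inputs & {"image", "video"}:
        return "computer vision"
    if "text" in inputs:
        return "text analysis"
    return "unclassified"


def needs_review(read: ScenarioRead) -> bool:
    """The risk question: any recorded risk means the service choice needs a second look."""
    return bool(read.risks)
```

For example, a read with inputs `{"image", "video", "speech"}`, verb `"extract"`, and output `"structured JSON"` classifies as information extraction, while a recorded risk such as `"human review"` flags the scenario for a closer look at service fit.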

Traps That Separate Similar Workloads

OCR versus extraction: Optical character recognition reads visible text. Content Understanding can also organize fields, tables, confidence, grounding, classification, and summaries across documents, images, audio, and video. If the scenario says read one sign, OCR may be enough. If it says process applications, invoices, recordings, or videos into a schema, think Content Understanding.

Speech recognition versus speaker recognition: Speech recognition finds the words. Speaker recognition identifies or verifies the voice. A captioning app needs speech to text. A secure voice gate may need speaker verification, with privacy controls.

Vision analysis versus image generation: Vision interprets an existing visual. Image generation creates a new visual from a prompt. A product-photo description is analysis; a new product mockup is generation.

Chat versus agent: A chat client answers a prompt. An agent can choose tools and work through steps. If the scenario needs the AI to check inventory, create a ticket, or update a record, classify it as agentic.
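The chat versus agent distinction can be sketched with plain functions. Everything here is hypothetical: the in-memory `INVENTORY` and `TICKETS` stand in for external systems, and the step order inside `agent_run` is hard-coded, whereas a real agent would let the model decide which tool to call next.

```python
# Hypothetical stand-ins for external systems an agent might reach through tools.
INVENTORY = {"blue mug": 3}
TICKETS = []


def check_inventory(item: str) -> int:
    """Tool: look up how many units are in stock."""
    return INVENTORY.get(item, 0)


def create_ticket(summary: str) -> str:
    """Tool: record a ticket and return its identifier."""
    TICKETS.append(summary)
    return f"T-{len(TICKETS)}"


def chat_reply(prompt: str) -> str:
    """Chat: one prompt in, one piece of text out, no tools and no side effects."""
    return f"(model text answering: {prompt})"


def agent_run(item: str, quantity: int) -> list:
    """Agent: works through steps, calling tools and acting on their results."""
    steps = []
    stock = check_inventory(item)
    steps.append(f"check_inventory({item!r}) -> {stock}")
    if stock >= quantity:
        ticket = create_ticket(f"Reserve {quantity} x {item}")
        steps.append(f"create_ticket -> {ticket}")
    else:
        steps.append("no ticket: insufficient stock")
    return steps
```

The contrast to notice is the side effect: `chat_reply` only returns text, while `agent_run` reads from one system and writes to another, which is the signal this section uses to classify a scenario as agentic.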

For AI-901, write the workload name before the service name in your scratch notes. That single habit prevents most service-selection errors.

Test Your Knowledge

A city inspection team uploads photos, short videos, and inspector voice notes. The app must return a JSON record of likely code violations, confidence scores, and source references for a reviewer. Which workload and service fit best?

Information extraction with Azure Content Understanding, because the inputs are mixed media and the output must be structured.

Text to speech with Azure Speech, because all AI workloads eventually produce audio.

Image generation, because the app receives photos and videos.

Azure AI Content Safety only, because the requirement is to classify every uploaded item as harmful.

Test Your Knowledge

A retail assistant must answer a customer, check live inventory, reserve two items, and create a pickup task if stock is available. How should this be classified?

Agentic AI, because the system needs a model plus tools to complete steps beyond a single response.

Optical character recognition, because the app needs to read text from product labels.

Speech synthesis, because the customer wants a useful response.

Image classification, because retail scenarios always involve computer vision.

