3.1 Language and Speech Services

Key Takeaways

Azure AI Language is the service to map to written text analysis, including language detection, named entity recognition, PII detection, key phrase extraction, sentiment analysis, and summarization.
Azure AI Speech is the service to map to audio input or audio output, including speech to text, text to speech, speech translation, language identification, pronunciation assessment, and speaker-related scenarios.
Azure Translator is the service-specific answer for text and document translation, while speech translation belongs under Azure AI Speech.
For AI-901 implementation scenarios, recognize the Foundry path: test in a portal or studio surface, create or use a Foundry resource, then call the service through an SDK, REST API, or tool.
A spoken-prompt solution may use Azure AI Speech or a deployed multimodal model, depending on whether the task is transcription or synthesis, or direct audio-aware reasoning.

Last updated: June 2026

Service Choice Starts With The Input

AI-901 does not expect you to train a language model from scratch. It expects you to recognize when a ready-made Azure AI service is the better fit than a generic chat model. Start every scenario by asking what the user gives the app and what the app must return.

If the input is written text, think Azure AI Language for analysis or Azure Translator for translation. If the input or output is audio, think Azure AI Speech. If the request is an open-ended prompt that includes audio or needs a conversational model response, a deployed multimodal model in Foundry may also appear in the answer, but you still need to tell the prebuilt services apart.
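The two questions above (what goes in, what comes out) can be sketched as a tiny routing function. This is only a study aid under stated assumptions: the function name, its parameters, and the "text"/"audio" labels are hypothetical and are not part of any Azure SDK.

```python
def service_family(input_kind: str, output_kind: str, translate: bool = False) -> str:
    """Suggest an Azure service family from input and output modality.

    Hypothetical helper for exam practice, not an Azure API.
    """
    kinds = {"text", "audio"}
    if input_kind not in kinds or output_kind not in kinds:
        raise ValueError("input_kind and output_kind must be 'text' or 'audio'")
    # Audio on either side points to Azure AI Speech,
    # which also covers speech translation.
    if "audio" in (input_kind, output_kind):
        return "Azure AI Speech"
    # Text in, text out: translation goes to Translator, analysis to Language.
    return "Azure Translator" if translate else "Azure AI Language"


print(service_family("text", "text"))                  # Azure AI Language
print(service_family("text", "text", translate=True))  # Azure Translator
print(service_family("audio", "text"))                 # Azure AI Speech
```

The sketch deliberately ignores the multimodal-model case, which depends on whether the app needs open-ended reasoning rather than a fixed conversion.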

Language, Translation, And Speech Map

Scenario signal	Best Azure capability	What it returns
Find people, places, organizations, dates, or PII in text	Azure AI Language NER or PII detection	Typed entities and offsets
Pull main concepts from emails or tickets	Azure AI Language key phrase extraction	Important terms and phrases
Score reviews as positive, neutral, negative, or mixed	Azure AI Language sentiment analysis	Sentiment labels and scores
Shorten a document or conversation transcript	Azure AI Language summarization	Extractive or abstractive summary
Translate written content between languages	Azure Translator	Target-language text or translated document
Turn audio into a transcript or caption stream	Azure AI Speech speech to text	Text transcript, often with timing
Read a generated response aloud	Azure AI Speech text to speech	Synthesized audio
Translate spoken audio in real time	Azure AI Speech speech translation	Translated text and optionally speech
Verify or identify a person by voice traits	Azure AI Speech speaker recognition	Speaker identity or verification result
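To make the "What it returns" column concrete, here is a minimal sketch that reads a sentiment result. The sample response is hand-written for illustration, and its field names (results, documents, sentiment, confidenceScores) reflect my understanding of the Language REST response shape; treat them as assumptions and check them against the current API reference.

```python
import json

# Hand-written illustrative payload, not real service output.
SAMPLE_RESPONSE = json.loads("""
{
  "kind": "SentimentAnalysisResults",
  "results": {
    "documents": [
      {
        "id": "1",
        "sentiment": "positive",
        "confidenceScores": {"positive": 0.98, "neutral": 0.01, "negative": 0.01}
      }
    ],
    "errors": []
  }
}
""")


def summarize_sentiment(response: dict) -> list:
    """Return (document id, sentiment label, score) for each document."""
    rows = []
    for doc in response["results"]["documents"]:
        scores = doc["confidenceScores"]
        label = doc["sentiment"]
        # "mixed" has no score of its own, so fall back to the highest class score.
        score = scores.get(label, max(scores.values()))
        rows.append((doc["id"], label, score))
    return rows


print(summarize_sentiment(SAMPLE_RESPONSE))  # [('1', 'positive', 0.98)]
```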

Azure AI Language

Azure AI Language is a cloud service for natural language processing. Core capabilities include language detection, named entity recognition, PII detection, custom named entity recognition, and text analytics for health. Microsoft also lists established capabilities such as key phrase extraction, sentiment analysis, custom text classification, conversational language understanding, question answering, and summarization. The exam still names key phrase extraction, entity detection, sentiment analysis, and summarization explicitly, so study those even if a documentation page labels some as legacy or established.

A practical text analysis app usually follows this flow:

1. Create or select an Azure AI Language or Foundry resource.
2. Decide whether a prebuilt feature is enough or a custom project is needed.
3. Test sample text in Microsoft Foundry or Language Studio where available.
4. Call the REST API or client library from a lightweight app.
5. Protect sensitive input and output, especially when detecting PII or health information.
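The "call the REST API" step in the flow above might look like the following sketch, which only builds a PII detection request and does not send it. The endpoint host, the key, and the api-version value are placeholders or assumptions; the route and body layout follow my understanding of the analyze-text REST operation and should be confirmed against the current reference before use.

```python
import json
import urllib.request

# Assumption: confirm the current api-version in the REST reference.
API_VERSION = "2023-04-01"


def build_pii_request(endpoint: str, key: str, text: str,
                      language: str = "en") -> urllib.request.Request:
    """Build (but do not send) a PII detection request for one document."""
    url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version={API_VERSION}"
    body = {
        "kind": "PiiEntityRecognition",
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {
            "documents": [{"id": "1", "language": language, "text": text}]
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical resource name and key, for illustration only.
req = build_pii_request(
    "https://example-resource.cognitiveservices.azure.com",
    "<your-key>",
    "My account number is 12345678.",
)
print(req.get_method(), req.full_url)
```

Sending the request would be a call such as urllib.request.urlopen(req). In line with the last step of the flow, avoid logging the key or the raw text, since both are sensitive.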

Use prebuilt features when the categories are standard. Use custom named entity recognition or custom text classification when your business needs labels that the prebuilt model does not understand, such as internal product codes, contract clauses, or support-ticket categories.

Azure AI Speech

Azure AI Speech covers speech to text, text to speech, speech translation, language identification, pronunciation assessment, and related voice scenarios. The exam wording often hides the answer in the direction of conversion. Speech to text means audio becomes written words. Text to speech means written words become spoken audio. Speech translation means spoken language is translated, with text or synthesized speech as the output.

Speech also has implementation choices. Real-time transcription fits live captions, voice commands, and interactive meetings. Batch transcription fits a backlog of recordings. Text to speech uses neural voices and can be adjusted with Speech Synthesis Markup Language (SSML) to control pronunciation, pitch, rate, and volume. Containers or sovereign-cloud options may appear in production discussions, but AI-901 usually only needs the basic service fit.
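As a rough illustration of the markup mentioned above, the sketch below builds a small Speech Synthesis Markup Language document that sets rate, pitch, and volume. The helper name and its defaults are hypothetical, and the voice name is an assumed example; pick a voice from the list your Speech resource actually offers.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US",
               rate: str = "medium", pitch: str = "medium",
               volume: str = "medium") -> str:
    """Return an SSML string that wraps text in voice and prosody settings."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        f'<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} '
        f'volume={quoteattr(volume)}>'
        # Escape &, <, and > so user text cannot break the markup.
        f'{escape(text)}'
        '</prosody></voice></speak>'
    )


print(build_ssml("Your order has shipped.", rate="slow", volume="loud"))
```

The resulting string is what a text to speech call would receive in place of plain text when the app needs control over delivery.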

Exam Process

Use this decision process in scenarios:

1. Is the source written text, audio, or both?
2. Does the app need analysis, translation, transcription, synthesis, or speaker identity?
3. Is the desired output a label, extracted value, summary, translation, transcript, or audio file?
4. Does a prebuilt service solve it, or does the business need custom labels or a multimodal model?
5. After choosing, connect it through Foundry, Speech Studio, a REST API, or an SDK and apply responsible AI controls.
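For practice, the first four questions can be condensed into a lookup keyed on source and desired output, using the capability names from the map earlier in this section. The dictionary keys and the function name are hypothetical labels chosen for this sketch, and the fallback message simply restates the fourth question.

```python
# Keys are (source, desired output); values are capability names
# taken from the service map in this section.
CAPABILITIES = {
    ("text", "entities"): "Azure AI Language NER or PII detection",
    ("text", "key phrases"): "Azure AI Language key phrase extraction",
    ("text", "sentiment"): "Azure AI Language sentiment analysis",
    ("text", "summary"): "Azure AI Language summarization",
    ("text", "translation"): "Azure Translator",
    ("text", "audio"): "Azure AI Speech text to speech",
    ("audio", "transcript"): "Azure AI Speech speech to text",
    ("audio", "translation"): "Azure AI Speech speech translation",
    ("audio", "speaker identity"): "Azure AI Speech speaker recognition",
}

FALLBACK = "No prebuilt match: consider a custom project or a multimodal model"


def choose_capability(source: str, output: str) -> str:
    """Map a scenario's source and desired output to a capability name."""
    return CAPABILITIES.get((source.lower(), output.lower()), FALLBACK)


print(choose_capability("audio", "transcript"))  # Azure AI Speech speech to text
print(choose_capability("text", "summary"))      # Azure AI Language summarization
```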

The common trap is mixing up similar terms. Speech recognition transcribes what was said; speaker recognition identifies or verifies who said it. Translation changes the language of content; summarization shortens it. Entity recognition extracts typed items from text; it does not read text from an image. Keep the data direction clear and the service choice becomes straightforward.

Test Your Knowledge

A support team stores chat transcripts and wants an app to mask customer account numbers, identify product names, and produce a short case recap. Which Azure service is the best starting point?

Azure AI Language, because the workload is text analysis with PII, entity, and summarization needs.

Azure AI Vision, because all extraction tasks require an image model.

Azure AI Speech only, because transcripts are already written text.

Azure Load Balancer, because the app needs to distribute language requests.

Test Your Knowledge

A travel kiosk must listen to a spoken question in one language and play back the answer in another language. Which capability combination is most relevant?

Azure AI Speech with speech recognition, speech translation, and text to speech.

Azure AI Vision with OCR and smart crop.

Azure AI Language key phrase extraction only.

Azure Content Understanding with a receipt analyzer.

