3.1 Language and Speech Services

Key Takeaways

  • Azure AI Language is the service for analyzing written text, including language detection, named entity recognition, PII detection, key phrase extraction, sentiment analysis, and summarization.
  • Azure AI Speech is the service for audio input or audio output, including speech to text, text to speech, speech translation, language identification, pronunciation assessment, and speaker-related scenarios.
  • Azure Translator is the service-specific answer for text and document translation, while speech translation belongs under Azure AI Speech.
  • For AI-901 implementation scenarios, recognize the Foundry path: test in a portal or studio surface, create or use a Foundry resource, then call the service through an SDK, REST API, or tool.
  • A spoken-prompt solution may use Azure AI Speech or a deployed multimodal model, depending on whether the task is transcription/synthesis or direct audio-aware reasoning.

Last updated: June 2026

Service Choice Starts With The Input

AI-901 does not expect you to train a language model from scratch. It expects you to recognize when a ready-made Azure AI service is the better fit than a generic chat model. Start every scenario by asking what the user gives the app and what the app must return.

If the input is written text, think Azure AI Language or Azure Translator. If the input or output is audio, think Azure AI Speech. If the request is an open-ended prompt that includes audio or needs a conversational model response, a deployed multimodal model in Foundry may also appear in the answer, but the service distinction still matters.

Language, Translation, And Speech Map

| Scenario signal | Best Azure capability | What it returns |
|---|---|---|
| Find people, places, organizations, dates, or PII in text | Azure AI Language NER or PII detection | Typed entities and offsets |
| Pull main concepts from emails or tickets | Azure AI Language key phrase extraction | Important terms and phrases |
| Score reviews as positive, neutral, negative, or mixed | Azure AI Language sentiment analysis | Sentiment labels and scores |
| Shorten a document or conversation transcript | Azure AI Language summarization | Extractive or abstractive summary |
| Translate written content between languages | Azure Translator | Target-language text or translated document |
| Turn audio into a transcript or caption stream | Azure AI Speech speech to text | Text transcript, often with timing |
| Read a generated response aloud | Azure AI Speech text to speech | Synthesized audio |
| Translate spoken audio in real time | Azure AI Speech speech translation | Translated text and optionally speech |
| Verify or identify a person by voice traits | Azure AI Speech speaker recognition | Speaker identity or verification result |

Azure AI Language

Azure AI Language is a cloud service for natural language processing. Core capabilities include language detection, named entity recognition, PII detection, custom named entity recognition, and text analytics for health. Microsoft also lists established capabilities such as key phrase extraction, sentiment analysis, custom text classification, conversational language understanding, question answering, and summarization. The exam still names key phrase extraction, entity detection, sentiment analysis, and summarization explicitly, so study those even if a documentation page labels some as legacy or established.

A practical text analysis app usually follows this flow:

  1. Create or select an Azure AI Language or Foundry resource.
  2. Decide whether a prebuilt feature is enough or a custom project is needed.
  3. Test sample text in Microsoft Foundry or Language Studio where available.
  4. Call the REST API or client library from a lightweight app.
  5. Protect sensitive input and output, especially when detecting PII or health information.
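Step 4 of the flow can be sketched with the Python standard library. This is a minimal illustration, not an official sample: the endpoint host, the placeholder key, the helper name build_analyze_text_request, and the api-version value are assumptions, so check them against your own resource and the current REST reference. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: a commonly documented GA version; confirm the current one.
API_VERSION = "2023-04-01"


def build_analyze_text_request(endpoint, key, kind, texts, language="en"):
    """Build (but do not send) a POST request for the analyze-text operation.

    kind is a task name such as "PiiEntityRecognition", "EntityRecognition",
    "KeyPhraseExtraction", or "SentimentAnalysis".
    """
    documents = [
        {"id": str(i), "language": language, "text": text}
        for i, text in enumerate(texts, start=1)
    ]
    body = {
        "kind": kind,
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {"documents": documents},
    }
    url = f"{endpoint.rstrip('/')}/language/:analyze-text?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical endpoint and key, for illustration only.
request = build_analyze_text_request(
    "https://example-resource.cognitiveservices.azure.com",
    "<your-key>",
    "PiiEntityRecognition",
    ["Call me at 555-0100 about account 12345."],
)
# With a real endpoint and key, urllib.request.urlopen(request) would send it.
```

Switching the kind argument is enough to move between prebuilt features, which is why the exam treats them as one service with several capabilities. In line with step 5, avoid logging the request body when it may contain PII.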

Use prebuilt features when the categories are standard. Use custom named entity recognition or custom text classification when your business needs labels that the prebuilt model does not understand, such as internal product codes, contract clauses, or support-ticket categories.

Azure AI Speech

Azure AI Speech covers speech to text, text to speech, speech translation, language identification, pronunciation assessment, and related voice scenarios. The exam wording often hides the answer in the direction of conversion. Speech to text means audio becomes written words. Text to speech means written words become spoken audio. Speech translation means spoken language is translated, with text or synthesized speech as the output.

Speech also has implementation choices. Real-time transcription fits live captions, voice commands, and interactive meetings. Batch transcription fits a backlog of recordings. Text to speech uses neural voices and can be adjusted with Speech Synthesis Markup Language (SSML) for pronunciation, pitch, rate, and volume. Containers or sovereign-cloud options may appear in production discussions, but AI-901 usually only needs the basic service fit.
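To make the SSML adjustment concrete, here is a small sketch that builds an SSML document with prosody settings for rate, pitch, and volume. The helper name build_ssml and the default voice name are assumptions chosen for illustration; pronunciation control (for example with phoneme elements) is left out to keep the example short.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US",
               rate="medium", pitch="medium", volume="medium"):
    """Return an SSML string that wraps text in voice and prosody elements.

    The text is XML-escaped so characters such as & and < stay valid.
    The default voice name is an assumption; pick one available to you.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        f'<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} '
        f'volume={quoteattr(volume)}>'
        f'{escape(text)}'
        '</prosody></voice></speak>'
    )


# Slow the speech slightly for a kiosk announcement.
ssml = build_ssml("Your gate is B12.", rate="-10%")
```

The resulting string is what a text to speech request would carry in place of plain text, whether it is sent through an SDK or a REST call.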

Exam Process

Use this decision process in scenarios:

  1. Is the source written text, audio, or both?
  2. Does the app need analysis, translation, transcription, synthesis, or speaker identity?
  3. Is the desired output a label, extracted value, summary, translation, transcript, or audio file?
  4. Does a prebuilt service solve it, or does the business need custom labels or a multimodal model?
  5. After choosing, connect it through Foundry, Speech Studio, a REST API, or an SDK and apply responsible AI controls.
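The decision process can also be written as a small function, which is a handy way to check your own reasoning against practice questions. This is a study aid under stated assumptions, not an Azure API: the Scenario fields, the task names, and the returned labels are hypothetical, and the rule for analyzing audio (transcribe first, then analyze the text) is an inference rather than something the exam outline states.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    source: str                      # "text" or "audio"
    task: str                        # analysis, translation, transcription,
                                     # synthesis, or speaker identity
    needs_custom_labels: bool = False
    needs_audio_reasoning: bool = False


def choose_capability(s: Scenario) -> str:
    """Map a scenario to the capability named in this section."""
    if s.needs_audio_reasoning:
        return "Deployed multimodal model in Foundry"
    if s.task == "transcription":
        return "Azure AI Speech speech to text"
    if s.task == "synthesis":
        return "Azure AI Speech text to speech"
    if s.task == "speaker identity":
        return "Azure AI Speech speaker recognition"
    if s.task == "translation":
        if s.source == "audio":
            return "Azure AI Speech speech translation"
        return "Azure Translator"
    if s.task == "analysis":
        if s.source == "audio":
            # Assumption: audio is transcribed before text analysis.
            return "Azure AI Speech speech to text, then Azure AI Language"
        if s.needs_custom_labels:
            return "Azure AI Language custom project"
        return "Azure AI Language prebuilt feature"
    raise ValueError(f"unknown task: {s.task!r}")


# A written review that needs a sentiment label.
choice = choose_capability(Scenario(source="text", task="analysis"))
```

The order of the checks mirrors the questions above: data direction first, then task, then whether a prebuilt feature is enough.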

The common trap is mixing similar words. Speech recognition is not speaker recognition. Translation is not summarization. Entity recognition extracts typed items from text; it does not read text from an image. Keep the data direction clear and the service choice becomes straightforward.

Test Your Knowledge

A support team stores chat transcripts and wants an app to mask customer account numbers, identify product names, and produce a short case recap. Which Azure service is the best starting point?


A travel kiosk must listen to a spoken question in one language and play back the answer in another language. Which capability combination is most relevant?
