4.1 Azure AI Language — Text Analytics Features

Key Takeaways

Azure AI Language exposes pre-built NLP features through one resource and one client: sentiment analysis, named entity recognition (NER), key phrase extraction, language detection, PII detection, entity linking, and summarization.
Sentiment analysis returns document-level labels (positive, negative, neutral, mixed) and sentence-level labels (positive, negative, neutral), plus confidence scores that sum to 1.0 per item.
Opinion mining (set show_opinion_mining=True) links each sentiment to a target (aspect) and one or more assessments (opinions).
PII detection returns categorized entities AND a redacted_text where matches are masked; use the domain parameter (phi) for protected health information.
Most pre-built features are synchronous single calls, but summarization and Text Analytics for health run as long-running operations (begin_analyze_actions and begin_analyze_healthcare_entities, respectively).

Last updated: June 2026

Quick Answer: Azure AI Language delivers pre-built Natural Language Processing (NLP) features through a single resource: sentiment analysis (with opinion mining), Named Entity Recognition (NER), key phrase extraction, language detection, Personally Identifiable Information (PII) detection/redaction, entity linking, and summarization. They work with no training. The AI-102 exam (40-60 questions, 100 minutes, pass 700/1000, $165) weights NLP at 15-20% under the current (December 23, 2025) outline, so this domain is heavily tested.

Pre-Built Features Overview

Feature	Client method	Output	Sync/Async
Sentiment Analysis	`analyze_sentiment`	positive/negative/neutral/mixed + scores	Sync
Opinion Mining	`analyze_sentiment(show_opinion_mining=True)`	target + assessment pairs	Sync
NER	`recognize_entities`	text, category, subcategory, offset, score	Sync
Key Phrase Extraction	`extract_key_phrases`	array of phrases	Sync
Language Detection	`detect_language`	language + ISO 639-1 code + score	Sync
PII Detection	`recognize_pii_entities`	entities + `redacted_text`	Sync
Entity Linking	`recognize_linked_entities`	entities linked to Wikipedia	Sync
Summarization	`begin_analyze_actions`	extractive or abstractive summary	Async
Text Analytics for health	`begin_analyze_healthcare_entities`	medical entities + relations	Async

Sentiment Analysis and Opinion Mining

The service returns a label at two levels. The document label is derived from sentence labels, and mixed appears when one document contains both positive and negative sentences. The three confidence scores (positive, negative, neutral) sum to 1.0 per item — there is no separate "mixed" score; mixed is inferred.

```python
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Replace the endpoint and key with the values from your own Language resource.
client = TextAnalyticsClient(
    endpoint="https://my-language.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"))

# Two sentences with opposite polarity, so the document label can come back as mixed.
docs = ["The room was spacious and clean. However, the food was terrible."]
response = client.analyze_sentiment(docs, show_opinion_mining=True)

for doc in response:
    if doc.is_error:                           # skip documents the service rejected
        continue
    print(doc.sentiment)                       # e.g. mixed
    print(doc.confidence_scores.positive)      # e.g. 0.45
    for sentence in doc.sentences:
        for op in sentence.mined_opinions:
            print(op.target.text, op.target.sentiment)        # e.g. food negative
            for a in op.assessments:
                print("  ", a.text, a.sentiment)              # e.g. terrible negative
```

On the Exam: Sentiment has FOUR document values — positive, negative, neutral, mixed. Opinion mining adds target (the aspect, e.g. "food") and assessment (the opinion word, e.g. "terrible"). If a scenario asks "which aspect of a product was criticized," the answer is opinion mining, not plain sentiment.
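The way a document label follows from sentence labels can be sketched in a few lines. This is a simplified illustration of the rule described in this section, not the service's implementation (the service works from model scores); the function name document_label is hypothetical.

```python
def document_label(sentence_labels):
    """Combine sentence-level labels into one document-level label.

    Illustration only: mirrors the rule that "mixed" appears when a
    document holds both positive and negative sentences.
    """
    labels = set(sentence_labels)
    has_positive = "positive" in labels
    has_negative = "negative" in labels
    if has_positive and has_negative:
        return "mixed"
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    return "neutral"


print(document_label(["positive", "negative"]))  # mixed
print(document_label(["neutral", "positive"]))   # positive
print(document_label(["neutral"]))               # neutral
```

Note that "mixed" never appears in the input list: it exists only as a document-level outcome.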

Named Entity Recognition (NER)

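NER classifies spans into categories and subcategories and reports a character offset and length for each. The offset unit matters for emoji and other characters that need a surrogate pair in UTF-16. The Python SDK defaults to Unicode code points (string_index_type="UnicodeCodePoint"), which matches Python string indexing; pass string_index_type="Utf16CodeUnit" when the offsets will be consumed by a runtime that indexes strings in UTF-16 code units, such as .NET, Java, or JavaScript.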

Category	Subcategories	Examples
Person	—	"Bill Gates"
Organization	Medical, Stock exchange, Sports	"Microsoft", "NYSE"
Location	GPE, Structural, Geographical	"Paris", "Eiffel Tower"
DateTime	Date, Time, DateRange, Duration, Set	"April 4 1975", "3 PM"
Quantity	Number, Percentage, Currency, Age, Dimension	"42", "15%", "$1,000"
Email / URL / Phone / IP	—	"a@b.com", "192.168.1.1"
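Why the offset unit matters is easiest to see with an emoji in front of an entity. The sketch below uses only the standard library and a made-up sentence; it does not call the service, it just counts the same position in the two units.

```python
# "\U0001F44D" is the thumbs-up emoji: one code point, two UTF-16 code units.
text = "Great \U0001F44D Microsoft"
target = "Microsoft"

# Offset in Unicode code points (how Python indexes str).
code_point_offset = text.index(target)

# Offset in UTF-16 code units: encode the prefix and count 2-byte units.
utf16_offset = len(text[:code_point_offset].encode("utf-16-le")) // 2

print(code_point_offset)  # 8
print(utf16_offset)       # 9
```

The two offsets differ by one for every surrogate-pair character that precedes the entity, so slicing with the wrong unit shifts the span.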

PII Detection and Redaction

recognize_pii_entities returns both the categorized entities and a masked redacted_text. Pass domain="phi" to detect Protected Health Information, and use categories_filter to restrict to specific types (e.g. only USSocialSecurityNumber). Redaction replaces matches with asterisks by default.

```python
# Reuses the client created in the sentiment example.
response = client.recognize_pii_entities(
    ["My SSN is 123-45-6789, email a@b.com"], domain="phi")
for doc in response:
    print(doc.redacted_text)   # My SSN is ***********, email *******
    for e in doc.entities:
        print(e.text, e.category, e.confidence_score)
```
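To make the relationship between the entities and redacted_text concrete, here is a minimal offline sketch of asterisk masking. The (offset, length) pairs stand in for the offset and length reported on each PII entity; the mask helper is hypothetical and is not part of the SDK.

```python
def mask(text, spans, mask_char="*"):
    """Replace each (offset, length) span with mask characters of equal length."""
    chars = list(text)
    for offset, length in spans:
        chars[offset:offset + length] = mask_char * length
    return "".join(chars)


text = "My SSN is 123-45-6789, email a@b.com"

# Stand-ins for entity.offset and entity.length, computed locally here.
spans = [(text.index(s), len(s)) for s in ("123-45-6789", "a@b.com")]

print(mask(text, spans))  # My SSN is ***********, email *******
```

Because each match is replaced character for character, the masked string keeps the original length, so entity offsets remain valid against it.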

Summarization, Language Detection, and Limits

Summarization is asynchronous. Extractive ranks and returns original sentences (control count with max_sentence_count); abstractive generates new paraphrased text. Both run through begin_analyze_actions with ExtractiveSummaryAction or AbstractiveSummaryAction.
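A sketch of reading the results once the long-running operation finishes. It assumes the result shape of recent azure-ai-textanalytics releases: a kind string per result, sentences on extractive results, and summaries on abstractive results. Treat those attribute names as assumptions to verify against your SDK version. The collect_summaries helper is hypothetical, and stand-in objects are used so the example runs without a Language resource.

```python
from types import SimpleNamespace


def collect_summaries(pages):
    """Group summary text by action kind from per-document action results."""
    out = {"extractive": [], "abstractive": []}
    for doc_results in pages:
        for result in doc_results:
            if result.is_error:
                continue  # skip failed document/action pairs
            if result.kind == "ExtractiveSummarization":
                out["extractive"].extend(s.text for s in result.sentences)
            elif result.kind == "AbstractiveSummarization":
                out["abstractive"].extend(s.text for s in result.summaries)
    return out


# Stand-in objects shaped like the assumed SDK results, for offline use.
fake_pages = [[
    SimpleNamespace(is_error=False, kind="ExtractiveSummarization",
                    sentences=[SimpleNamespace(text="Original sentence one.")]),
    SimpleNamespace(is_error=False, kind="AbstractiveSummarization",
                    summaries=[SimpleNamespace(text="A short paraphrase.")]),
    SimpleNamespace(is_error=True, kind="DocumentError"),
]]
print(collect_summaries(fake_pages))

# With a real client, the call would look roughly like this:
# poller = client.begin_analyze_actions(
#     docs,
#     actions=[ExtractiveSummaryAction(max_sentence_count=3),
#              AbstractiveSummaryAction()])
# summaries = collect_summaries(poller.result())
```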

Key service limits worth memorizing:

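Per-request document limits vary by feature. At the time of writing, the documented limits are 5 documents for NER, PII detection, and entity linking, 10 for sentiment analysis and key phrase extraction, and 1,000 for language detection, while asynchronous document summarization and Text Analytics for health accept up to 25; confirm the current per-feature limits before relying on them.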
Each document is capped at 5,120 characters for synchronous calls (longer text must be chunked).
Language detection returns the primary language with an ISO 639-1 code; ambiguous input returns (Unknown) with a low score.
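The chunking and batching implied by these limits can be sketched with the standard library. The helper names are hypothetical, the batch size of 5 is only an example value, and len() counts Unicode code points, which may differ from how the service measures document length, so leave some headroom in practice.

```python
def chunk_text(text, max_chars=5120):
    """Split text into pieces of at most max_chars, preferring to break at a space."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back up to the last space inside the window, if there is one.
            split = text.rfind(" ", start, end)
            if split > start:
                end = split
        chunks.append(text[start:end].strip())
        start = end
    return [c for c in chunks if c]


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


long_text = "word " * 3000          # 15,000 characters of sample input
chunks = chunk_text(long_text)
for batch in batches(chunks, 5):    # 5 documents per request, as an example
    print(len(batch), [len(c) for c in batch])
```

Each batch can then be passed to a synchronous call such as analyze_sentiment, with the per-chunk results merged afterward.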

Common Trap: Candidates confuse recognize_entities (general NER, no masking) with recognize_pii_entities (PII + redacted_text). Only PII detection returns redacted output. Likewise, entity linking (recognize_linked_entities) disambiguates entities against Wikipedia — it is not the same as NER.

Choosing the Right Feature in a Scenario

Exam scenarios rarely name the method; they describe a business outcome and expect you to map it to the correct capability. Work backward from the desired output. If the requirement is "flag whether each review is favorable or unfavorable," that is document sentiment. If it is "tell us which feature of the product customers complain about," that is opinion mining, because only opinion mining surfaces the aspect (target) and the opinion word (assessment) tied to it. If the requirement is "pull out company names, dates, and amounts from contracts," that is general named entity recognition, since those are pre-built categories.

When the requirement is "mask social security numbers and credit cards before storing transcripts," pick personally identifiable information detection with redaction, and reach for the protected-health-information domain when clinical confidentiality is in scope. "Summarize a 40-page report into three sentences taken verbatim from the source" is extractive summarization, whereas "rewrite the report into a short paragraph in our own words" is abstractive summarization. "Route incoming chat messages to English, French, or Spanish agents" maps to language detection feeding a router.

A second pattern is distinguishing pre-built features from the trainable custom features in the next section. Pre-built sentiment, entity recognition, key phrase extraction, language detection, and summarization need no training data. The moment a scenario mentions company-specific categories, proprietary entity types, or labeled examples in storage, the answer shifts to a custom project rather than a pre-built call.
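The "work backward from the desired output" habit can be captured as a small lookup table. This is a study aid only: the keys are informal paraphrases of the scenarios above, and FEATURE_BY_OUTPUT and pick_feature are hypothetical names, not part of any SDK.

```python
# Desired output -> (capability, client method from the overview table).
FEATURE_BY_OUTPUT = {
    "favorable or unfavorable label": ("sentiment analysis", "analyze_sentiment"),
    "aspect and opinion word": ("opinion mining", "analyze_sentiment(show_opinion_mining=True)"),
    "names, dates, and amounts": ("named entity recognition", "recognize_entities"),
    "masked sensitive values": ("PII detection", "recognize_pii_entities"),
    "verbatim key sentences": ("extractive summarization", "begin_analyze_actions"),
    "paraphrased short summary": ("abstractive summarization", "begin_analyze_actions"),
    "language of each message": ("language detection", "detect_language"),
}


def pick_feature(desired_output):
    """Return (capability, method) for a desired output, or None if it is not pre-built."""
    return FEATURE_BY_OUTPUT.get(desired_output)


print(pick_feature("aspect and opinion word"))
print(pick_feature("company-specific categories"))  # None: points to a custom project
```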

Test Your Knowledge

A reviewer writes: "The screen is gorgeous but the battery life is awful." Which capability identifies that "battery life" specifically received negative sentiment?

A. Language detection
B. Key phrase extraction
C. Opinion mining (aspect-based sentiment)
D. Entity linking

Test Your Knowledge

Which method returns both the detected PII entities and a masked version of the input string?

A. analyze_sentiment()
B. recognize_entities()
C. extract_key_phrases()
D. recognize_pii_entities()

Test Your Knowledge

You must summarize a long support article by returning new, paraphrased prose rather than copied sentences. Which action do you use?

A. AbstractiveSummaryAction
B. ExtractiveSummaryAction
C. RecognizeEntitiesAction
D. ExtractKeyPhrasesAction

Up Next

4.2 Conversational Language Understanding (CLU)
