4.1 Azure AI Language — Text Analytics Features

Key Takeaways

  • Azure AI Language exposes pre-built NLP features through one resource and one client: sentiment analysis, named entity recognition (NER), key phrase extraction, language detection, PII detection, entity linking, and summarization.
  • Sentiment analysis returns a document-level label (positive, negative, neutral, or mixed) and sentence-level labels (positive, negative, or neutral), plus confidence scores that sum to 1.0 per item.
  • Opinion mining (set show_opinion_mining=True) links each sentiment to a target (aspect) and one or more assessments (opinions).
  • PII detection returns categorized entities AND a redacted_text in which matches are masked; restrict detection to protected health information with the phi domain (domain_filter="phi" in the Python SDK).
  • Most pre-built features are synchronous single calls, but summarization and Text Analytics for health run as long-running operations started through poller methods (begin_analyze_actions and begin_analyze_healthcare_entities).

Last updated: June 2026

Quick Answer: Azure AI Language delivers pre-built Natural Language Processing (NLP) features through a single resource: sentiment analysis (with opinion mining), Named Entity Recognition (NER), key phrase extraction, language detection, Personally Identifiable Information (PII) detection/redaction, entity linking, and summarization. They work with no training. The AI-102 exam (40-60 questions, 100 minutes, pass 700/1000, $165) weights NLP at 15-20% under the current (December 23, 2025) outline, so this domain is heavily tested.

Pre-Built Features Overview

Feature | Client method | Output | Sync/Async
Sentiment Analysis | analyze_sentiment | positive/negative/neutral/mixed + scores | Sync
Opinion Mining | analyze_sentiment(show_opinion_mining=True) | target + assessment pairs | Sync
NER | recognize_entities | text, category, subcategory, offset, score | Sync
Key Phrase Extraction | extract_key_phrases | array of phrases | Sync
Language Detection | detect_language | language + ISO 639-1 code + score | Sync
PII Detection | recognize_pii_entities | entities + redacted_text | Sync
Entity Linking | recognize_linked_entities | entities linked to Wikipedia | Sync
Summarization | begin_analyze_actions | extractive or abstractive summary | Async
Text Analytics for health | begin_analyze_healthcare_entities | medical entities + relations | Async

Sentiment Analysis and Opinion Mining

The service returns a label at two levels. The document label is derived from sentence labels, and mixed appears when one document contains both positive and negative sentences. The three confidence scores (positive, negative, neutral) sum to 1.0 per item — there is no separate "mixed" score; mixed is inferred.

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint="https://my-language.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"))

# Two sentences, one positive and one negative, so the document label can be mixed.
docs = ["The room was spacious and clean. The food was terrible."]
response = client.analyze_sentiment(docs, show_opinion_mining=True)

for doc in response:
    if doc.is_error:                           # per-document failures are returned in-line
        print(doc.error.code, doc.error.message)
        continue
    print(doc.sentiment)                       # e.g. mixed
    print(doc.confidence_scores.positive)      # e.g. 0.45
    for sentence in doc.sentences:
        for op in sentence.mined_opinions:
            print(op.target.text, op.target.sentiment)        # e.g. food negative
            for a in op.assessments:
                print("  ", a.text, a.sentiment)              # e.g. terrible negative

On the Exam: Sentiment has FOUR document values — positive, negative, neutral, mixed. Opinion mining adds target (the aspect, e.g. "food") and assessment (the opinion word, e.g. "terrible"). If a scenario asks "which aspect of a product was criticized," the answer is opinion mining, not plain sentiment.
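
The relationship between sentence labels and the document label can be sketched locally. This is only an illustration of the rule described in this section (mixed when positive and negative sentences co-occur); the service computes the label server-side, and the function name document_sentiment is hypothetical.

```python
from typing import Iterable


def document_sentiment(sentence_labels: Iterable[str]) -> str:
    """Aggregate sentence labels into a document label (local illustration only)."""
    labels = set(sentence_labels)
    has_positive = "positive" in labels
    has_negative = "negative" in labels
    if has_positive and has_negative:
        return "mixed"      # both polarities appear in the same document
    if has_positive:
        return "positive"   # positive plus any number of neutral sentences
    if has_negative:
        return "negative"   # negative plus any number of neutral sentences
    return "neutral"        # only neutral sentences


print(document_sentiment(["positive", "negative"]))  # mixed
print(document_sentiment(["positive", "neutral"]))   # positive
print(document_sentiment(["neutral"]))               # neutral
```

Note that mixed never needs its own confidence score in this sketch: it falls out of the sentence labels, which matches the point that mixed is inferred.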

Named Entity Recognition (NER)

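NER classifies spans into categories and subcategories and reports a character offset and length for each. The unit of those offsets depends on the SDK: the Python client defaults to Unicode code points (string_index_type="UnicodeCodePoint"), while runtimes whose strings are UTF-16, such as .NET, default to UTF-16 code units. The difference matters for emoji and other surrogate pairs, so set string_index_type (for example "Utf16CodeUnit") to match the runtime that will slice the text.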

Category | Subcategories | Examples
Person | - | "Bill Gates"
Organization | Medical, Stock exchange, Sports | "Microsoft", "NYSE"
Location | GPE, Structural, Geographical | "Paris", "Eiffel Tower"
DateTime | Date, Time, DateRange, Duration, Set | "April 4 1975", "3 PM"
Quantity | Number, Percentage, Currency, Age, Dimension | "42", "15%", "$1,000"
Email / URL / Phone / IP | - | "a@b.com", "192.168.1.1"
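
To see why the offset unit matters, the sketch below compares a code-point offset with the equivalent UTF-16 code-unit offset for text containing an emoji. It uses only the Python standard library and makes no service call; the helper name utf16_offset and the sample sentence are illustrative assumptions.

```python
def utf16_offset(text: str, code_point_offset: int) -> int:
    """Convert a code-point offset into the equivalent UTF-16 code-unit offset."""
    prefix = text[:code_point_offset]
    # Each UTF-16 code unit is 2 bytes; characters outside the BMP take two units.
    return len(prefix.encode("utf-16-le")) // 2


text = "Great 👍 service from Microsoft"
cp_offset = text.index("Microsoft")       # Python strings index by code point
print(cp_offset)                          # 21
print(utf16_offset(text, cp_offset))      # 22 (the emoji occupies two UTF-16 units)
```

Slicing with the wrong unit shifts every entity that follows an emoji by one position per surrogate pair, which is why the offset type should match the language that consumes the result.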

PII Detection and Redaction

recognize_pii_entities returns both the categorized entities and a masked redacted_text. In the Python SDK, pass domain_filter="phi" (the REST API names this parameter domain) to restrict detection to Protected Health Information, and use categories_filter to restrict results to specific types (e.g. only USSocialSecurityNumber). Redaction replaces each matched character with an asterisk by default.

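# Reuses the client created in the sentiment example above.
response = client.recognize_pii_entities(
    ["My SSN is 123-45-6789, email a@b.com"], domain_filter="phi")
for doc in response:
    if doc.is_error:           # skip documents the service rejected
        continue
    print(doc.redacted_text)   # e.g. My SSN is ***********, email *******
    for e in doc.entities:
        print(e.text, e.category, e.confidence_score)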
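
The connection between entity offsets and redacted_text can be illustrated without calling the service. The sketch below is a local stand-in, not the service's implementation: the Span class and mask function are hypothetical, and the offsets are computed by hand rather than returned by recognize_pii_entities.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Span:
    offset: int   # start position, in code points
    length: int   # number of code points covered


def mask(text: str, spans: Iterable[Span], mask_char: str = "*") -> str:
    """Replace every character covered by a span with mask_char."""
    chars = list(text)
    for span in spans:
        end = min(span.offset + span.length, len(chars))
        for i in range(span.offset, end):
            chars[i] = mask_char
    return "".join(chars)


text = "My SSN is 123-45-6789, email a@b.com"
spans = [
    Span(text.index("123-45-6789"), len("123-45-6789")),
    Span(text.index("a@b.com"), len("a@b.com")),
]
print(mask(text, spans))  # My SSN is ***********, email *******
```

Because masking is character-for-character, the redacted string keeps the same length as the input, so offsets from the entity list still line up with it.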

Summarization, Language Detection, and Limits

Summarization is asynchronous. Extractive ranks and returns original sentences (control count with max_sentence_count); abstractive generates new paraphrased text. Both run through begin_analyze_actions with ExtractiveSummaryAction or AbstractiveSummaryAction.
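
The shape of a long-running result can be sketched as follows. This is a hedged illustration: it assumes that poller.result() yields one list of action results per document, and that an extractive result exposes is_error and a sentences collection whose items have a text attribute. Verify those names against the SDK version you have installed. The helper collect_extractive_sentences is hypothetical, and the stand-in objects below replace a real service response.

```python
from types import SimpleNamespace
from typing import Iterable, List


def collect_extractive_sentences(doc_results: Iterable[Iterable[object]]) -> List[List[str]]:
    """Return the extracted sentence texts for each document, skipping errors."""
    summaries = []
    for action_results in doc_results:          # one entry per input document
        sentences = []
        for result in action_results:           # one entry per requested action
            if getattr(result, "is_error", False):
                continue
            sentences.extend(s.text for s in getattr(result, "sentences", []))
        summaries.append(sentences)
    return summaries


# With a real client the input would come from something like:
#   poller = client.begin_analyze_actions(
#       docs, actions=[ExtractiveSummaryAction(max_sentence_count=3)])
#   doc_results = poller.result()
# The stand-in below only mimics that assumed shape.
fake_results = [[SimpleNamespace(
    is_error=False,
    sentences=[SimpleNamespace(text="First key sentence."),
               SimpleNamespace(text="Second key sentence.")],
)]]
print(collect_extractive_sentences(fake_results))
```

The poller pattern is the practical difference from the synchronous features: the call returns immediately, and the results arrive only once the operation completes.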

Key service limits worth memorizing:

  • A single sync request accepts up to 5 documents for some features and 25 for others; check per-feature limits.
  • Each document is capped at 5,120 characters for synchronous calls (longer text must be chunked).
  • Language detection returns the primary language with an ISO 639-1 code; ambiguous input returns (Unknown) with a low score.
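
The limits above translate into two client-side chores: chunking long text and batching documents. The sketch below shows one way to do both. It assumes Python's len() is a close enough proxy for the service's character count (the two can differ for some Unicode text), and the batch size of 5 is only an example value, so check the limit for the feature you call. The function names are hypothetical.

```python
from typing import Iterator, List

MAX_CHARS = 5120  # per-document cap for synchronous calls, as stated above


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split on whitespace so that no chunk exceeds max_chars."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the cap is hard-split.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def batches(docs: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


long_text = "word " * 3000                      # 15,000 characters
chunks = chunk_text(long_text)
print(len(chunks), max(len(c) for c in chunks))
for batch in batches(chunks, size=5):           # example batch size only
    print(len(batch))
```

Splitting on whitespace keeps words intact, but it can still cut a sentence in half; splitting on sentence boundaries first is usually kinder to sentiment and entity results.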

Common Trap: Candidates confuse recognize_entities (general NER, no masking) with recognize_pii_entities (PII + redacted_text). Only PII detection returns redacted output. Likewise, entity linking (recognize_linked_entities) disambiguates entities against Wikipedia — it is not the same as NER.

Choosing the Right Feature in a Scenario

Exam scenarios rarely name the method; they describe a business outcome and expect you to map it to the correct capability. Work backward from the desired output. If the requirement is "flag whether each review is favorable or unfavorable," that is document sentiment. If it is "tell us which feature of the product customers complain about," that is opinion mining, because only opinion mining surfaces the aspect (target) and the opinion word (assessment) tied to it. If the requirement is "pull out company names, dates, and amounts from contracts," that is general named entity recognition, since those are pre-built categories.

When the requirement is "mask social security numbers and credit cards before storing transcripts," pick personally identifiable information detection with redaction, and reach for the protected-health-information domain when clinical confidentiality is in scope. "Summarize a 40-page report into three sentences taken verbatim from the source" is extractive summarization, whereas "rewrite the report into a short paragraph in our own words" is abstractive summarization. "Route incoming chat messages to English, French, or Spanish agents" maps to language detection feeding a router.
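
The language-routing scenario can be sketched as a small lookup. The inputs stand for the ISO 639-1 code and confidence score that language detection returns (in the Python SDK these are read from primary_language.iso6391_name and primary_language.confidence_score). The queue names, the 0.5 threshold, and the route function itself are hypothetical choices made for illustration.

```python
QUEUES = {
    "en": "english-agents",
    "fr": "french-agents",
    "es": "spanish-agents",
}
FALLBACK_QUEUE = "triage"   # hypothetical queue for unsupported or uncertain input


def route(iso_code: str, confidence: float, min_confidence: float = 0.5) -> str:
    """Pick an agent queue from a detected language code and its score."""
    if confidence < min_confidence:
        return FALLBACK_QUEUE            # ambiguous input, e.g. (Unknown) with a low score
    return QUEUES.get(iso_code, FALLBACK_QUEUE)


print(route("fr", 0.98))         # french-agents
print(route("de", 0.99))         # triage
print(route("(Unknown)", 0.0))   # triage
```

Keeping a fallback queue means a message is never dropped when detection is unsure or the language has no dedicated team.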

A second pattern is distinguishing pre-built features from the trainable custom features in the next section. Pre-built sentiment, entity recognition, key phrase extraction, language detection, and summarization need no training data. The moment a scenario mentions company-specific categories, proprietary entity types, or labeled examples in storage, the answer shifts to a custom project rather than a pre-built call.

Test Your Knowledge

A reviewer writes: "The screen is gorgeous but the battery life is awful." Which capability identifies that "battery life" specifically received negative sentiment?

Which method returns both the detected PII entities and a masked version of the input string?

You must summarize a long support article by returning new, paraphrased prose rather than copied sentences. Which action do you use?
