4.1 Azure AI Language — Text Analytics Features
Key Takeaways
- Azure AI Language exposes pre-built NLP features through one resource and one client: sentiment analysis, named entity recognition (NER), key phrase extraction, language detection, PII detection, entity linking, and summarization.
- Sentiment analysis returns a document-level label (positive, negative, neutral, or mixed) and sentence-level labels (positive, negative, or neutral), plus confidence scores that sum to 1.0 per item.
- Opinion mining (set show_opinion_mining=True) links each sentiment to a target (aspect) and one or more assessments (opinions).
- PII detection returns categorized entities AND a redacted_text where matches are masked; use the domain parameter (phi) for protected health information.
- Most pre-built features are synchronous single calls, but summarization (begin_analyze_actions) and Text Analytics for health (begin_analyze_healthcare_entities) run as long-running operations.
Quick Answer: Azure AI Language delivers pre-built Natural Language Processing (NLP) features through a single resource: sentiment analysis (with opinion mining), Named Entity Recognition (NER), key phrase extraction, language detection, Personally Identifiable Information (PII) detection/redaction, entity linking, and summarization. They work with no training. The AI-102 exam (40-60 questions, 100 minutes, pass 700/1000, $165) weights NLP at 15-20% under the current (December 23, 2025) outline, so this domain is heavily tested.
Pre-Built Features Overview
| Feature | Client method | Output | Sync/Async |
|---|---|---|---|
| Sentiment Analysis | analyze_sentiment | positive/negative/neutral/mixed + scores | Sync |
| Opinion Mining | analyze_sentiment(show_opinion_mining=True) | target + assessment pairs | Sync |
| NER | recognize_entities | text, category, subcategory, offset, score | Sync |
| Key Phrase Extraction | extract_key_phrases | array of phrases | Sync |
| Language Detection | detect_language | language + ISO 639-1 code + score | Sync |
| PII Detection | recognize_pii_entities | entities + redacted_text | Sync |
| Entity Linking | recognize_linked_entities | entities linked to Wikipedia | Sync |
| Summarization | begin_analyze_actions | extractive or abstractive summary | Async |
| Text Analytics for health | begin_analyze_healthcare_entities | medical entities + relations | Async |
Sentiment Analysis and Opinion Mining
The service returns a label at two levels. The document label is derived from sentence labels, and mixed appears when one document contains both positive and negative sentences. The three confidence scores (positive, negative, neutral) sum to 1.0 per item — there is no separate "mixed" score; mixed is inferred.
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint="https://my-language.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"))

docs = ["The room was spacious and clean, but the food was terrible."]
response = client.analyze_sentiment(docs, show_opinion_mining=True)

for doc in response:
    print(doc.sentiment)                        # mixed
    print(doc.confidence_scores.positive)       # e.g. 0.45
    for sentence in doc.sentences:
        # One mined opinion per target (aspect) found in the sentence
        for op in sentence.mined_opinions:
            print(op.target.text, op.target.sentiment)   # e.g. food negative
            for a in op.assessments:
                print("  ", a.text, a.sentiment)         # e.g. terrible negative
On the Exam: Sentiment has FOUR document values — positive, negative, neutral, mixed. Opinion mining adds target (the aspect, e.g. "food") and assessment (the opinion word, e.g. "terrible"). If a scenario asks "which aspect of a product was criticized," the answer is opinion mining, not plain sentiment.
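To make the label roll-up concrete, here is a minimal sketch of how sentence labels could combine into a document label. It is a simplified illustration of the rule described above (both positive and negative sentences present means mixed), not the service's actual implementation. The function names `document_label` and `scores_sum_to_one` are hypothetical.

```python
def document_label(sentence_labels):
    """Roll sentence labels ("positive", "negative", "neutral") up to one document label."""
    has_positive = "positive" in sentence_labels
    has_negative = "negative" in sentence_labels
    if has_positive and has_negative:
        return "mixed"          # mixed is inferred; it has no confidence score of its own
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    return "neutral"


def scores_sum_to_one(positive, negative, neutral, tolerance=0.01):
    """Check that the three confidence scores add up to 1.0, allowing for rounding."""
    return abs((positive + negative + neutral) - 1.0) <= tolerance


print(document_label(["positive", "negative"]))  # mixed
```

Note that only the document can be mixed in this sketch; each sentence carries one of the three basic labels.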
Named Entity Recognition (NER)
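NER classifies spans into categories and subcategories and reports character offsets. The unit of those offsets matters for emoji and surrogate pairs: the Python SDK defaults to Unicode code points (how Python indexes strings), while .NET and JavaScript strings index by UTF-16 code units. Set string_index_type to "UnicodeCodePoint" or "Utf16CodeUnit" to match your runtime.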
| Category | Subcategories | Examples |
|---|---|---|
| Person | — | "Bill Gates" |
| Organization | Medical, Stock exchange, Sports | "Microsoft", "NYSE" |
| Location | GPE, Structural, Geographical | "Paris", "Eiffel Tower" |
| DateTime | Date, Time, DateRange, Duration, Set | "April 4 1975", "3 PM" |
| Quantity | Number, Percentage, Currency, Age, Dimension | "42", "15%", "$1,000" |
| Email / URL / Phone / IP | — | "a@b.com", "192.168.1.1" |
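The offset-unit point is easiest to see with an emoji in front of an entity. The sketch below uses only the standard library and a hypothetical helper, `utf16_offset`, to show how the same position is counted in code points versus UTF-16 code units. It does not call the service.

```python
def utf16_offset(text, code_point_offset):
    """Convert an offset counted in Unicode code points to UTF-16 code units."""
    prefix = text[:code_point_offset]
    # Each UTF-16 code unit is 2 bytes; characters outside the BMP take two units.
    return len(prefix.encode("utf-16-le")) // 2


text = "😀 Contoso opened in Paris"
cp_offset = text.index("Paris")                   # Python str indexes by code point
print(cp_offset, utf16_offset(text, cp_offset))   # 20 21
```

If the offsets you receive are in one unit and your string slicing uses the other, every entity after the first emoji will be off by one or more positions.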
PII Detection and Redaction
recognize_pii_entities returns both the categorized entities and a masked redacted_text. Pass domain="phi" to detect Protected Health Information, and use categories_filter to restrict to specific types (e.g. only USSocialSecurityNumber). Redaction replaces matches with asterisks by default.
response = client.recognize_pii_entities(
    ["My SSN is 123-45-6789, email a@b.com"], domain="phi")

for doc in response:
    print(doc.redacted_text)   # My SSN is ***********, email *******
    for e in doc.entities:
        print(e.text, e.category, e.confidence_score)
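The service produces redacted_text for you, but it can help to see how entity positions relate to the masked string. The sketch below is a local illustration only: `mask_entities` is a hypothetical helper, and the (offset, length) pairs are computed from the sample text rather than returned by the API.

```python
def mask_entities(text, spans, mask_char="*"):
    """Replace each (offset, length) span in text with the same number of mask characters."""
    chars = list(text)
    for offset, length in spans:
        chars[offset:offset + length] = mask_char * length
    return "".join(chars)


text = "My SSN is 123-45-6789, email a@b.com"
# Stand-ins for the offset and length a PII entity would carry.
spans = [(text.index("123-45-6789"), 11), (text.index("a@b.com"), 7)]
print(mask_entities(text, spans))   # My SSN is ***********, email *******
```

The same approach could apply a different mask character or a category tag if asterisks do not suit your storage format.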
Summarization, Language Detection, and Limits
Summarization is asynchronous. Extractive ranks and returns original sentences (control count with max_sentence_count); abstractive generates new paraphrased text. Both run through begin_analyze_actions with ExtractiveSummaryAction or AbstractiveSummaryAction.
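A minimal sketch of the long-running pattern follows, assuming the azure-ai-textanalytics 5.3 package and a `client` built as in the sentiment example. The helper names `summary_text` and `summarize` are hypothetical. The SDK import is placed inside `summarize` so that the result-handling helper can be tried offline with a stand-in object.

```python
from types import SimpleNamespace


def summary_text(action_result):
    """Join the pieces of one summarization result into a single string."""
    if action_result.is_error:
        raise RuntimeError(action_result.error.message)
    # Extractive results expose .sentences; abstractive results expose .summaries.
    if hasattr(action_result, "sentences"):
        parts = action_result.sentences
    else:
        parts = action_result.summaries
    return " ".join(part.text for part in parts)


def summarize(client, documents, abstractive=False, max_sentence_count=3):
    """Run one summarization action and return one summary string per document."""
    from azure.ai.textanalytics import AbstractiveSummaryAction, ExtractiveSummaryAction

    if abstractive:
        action = AbstractiveSummaryAction()
    else:
        action = ExtractiveSummaryAction(max_sentence_count=max_sentence_count)
    poller = client.begin_analyze_actions(documents, actions=[action])
    # poller.result() blocks until done and yields, per document, one result per action.
    return [summary_text(results[0]) for results in poller.result()]


# Offline check with a stand-in shaped like an extractive result.
fake = SimpleNamespace(
    is_error=False,
    sentences=[SimpleNamespace(text="First."), SimpleNamespace(text="Second.")],
)
print(summary_text(fake))   # First. Second.
```

Calling `summarize(client, docs)` would return verbatim sentences, while `summarize(client, docs, abstractive=True)` would return generated text.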
Key service limits worth memorizing:
- A single sync request accepts up to 5 documents for some features and 25 for others; check per-feature limits.
- Each document is capped at 5,120 characters for synchronous calls (longer text must be chunked).
- Language detection returns the primary language with an ISO 639-1 code; ambiguous input returns (Unknown) with a low score.
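The chunking requirement above can be sketched as a small helper. This is one possible approach, not a prescribed one: `chunk_text` is a hypothetical function, it splits on spaces where it can, and it counts characters with Python's `len`, which may not match the service's own character counting exactly.

```python
def chunk_text(text, limit=5120):
    """Split text into pieces of at most `limit` characters, preferring breaks at spaces."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            # Back up to the last space inside the window, if there is one.
            split = text.rfind(" ", start, end)
            if split > start:
                end = split
        chunks.append(text[start:end].strip())
        start = end
    # Drop any empty pieces left over from stripping whitespace.
    return [chunk for chunk in chunks if chunk]


pieces = chunk_text("word " * 3000)
print(len(pieces), max(len(p) for p in pieces))
```

Each piece can then be sent as its own document, keeping within the per-request document limit for the feature in use.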
Common Trap: Candidates confuse recognize_entities (general NER, no masking) with recognize_pii_entities (PII + redacted_text). Only PII detection returns redacted output. Likewise, entity linking (recognize_linked_entities) disambiguates entities against Wikipedia; it is not the same as NER.
Choosing the Right Feature in a Scenario
Exam scenarios rarely name the method; they describe a business outcome and expect you to map it to the correct capability. Work backward from the desired output. If the requirement is "flag whether each review is favorable or unfavorable," that is document sentiment. If it is "tell us which feature of the product customers complain about," that is opinion mining, because only opinion mining surfaces the aspect (target) and the opinion word (assessment) tied to it. If the requirement is "pull out company names, dates, and amounts from contracts," that is general named entity recognition, since those are pre-built categories.
When the requirement is "mask social security numbers and credit cards before storing transcripts," pick personally identifiable information detection with redaction, and reach for the protected-health-information domain when clinical confidentiality is in scope. "Summarize a 40-page report into three sentences taken verbatim from the source" is extractive summarization, whereas "rewrite the report into a short paragraph in our own words" is abstractive summarization. "Route incoming chat messages to English, French, or Spanish agents" maps to language detection feeding a router.
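The "work backward from the output" habit can be captured as a lookup table. The sketch below is a study aid only: the dictionary keys are informal labels chosen for illustration, and the method names are the ones listed in the overview table of this section.

```python
# Desired output -> (capability, Python client call named in this section)
FEATURE_FOR_OUTPUT = {
    "overall sentiment": ("Sentiment analysis", "analyze_sentiment"),
    "aspect-level sentiment": ("Opinion mining", "analyze_sentiment(show_opinion_mining=True)"),
    "names, dates, amounts": ("NER", "recognize_entities"),
    "masked sensitive data": ("PII detection", "recognize_pii_entities"),
    "verbatim summary": ("Extractive summarization", "begin_analyze_actions"),
    "paraphrased summary": ("Abstractive summarization", "begin_analyze_actions"),
    "language of the text": ("Language detection", "detect_language"),
}


def pick_feature(desired_output):
    """Return the (capability, client call) pair for an informal output label."""
    return FEATURE_FOR_OUTPUT[desired_output]


print(pick_feature("aspect-level sentiment")[0])   # Opinion mining
```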
A second pattern is distinguishing pre-built features from the trainable custom features in the next section. Pre-built sentiment, entity recognition, key phrase extraction, language detection, and summarization need no training data. The moment a scenario mentions company-specific categories, proprietary entity types, or labeled examples in storage, the answer shifts to a custom project rather than a pre-built call.
A reviewer writes: "The screen is gorgeous but the battery life is awful." Which capability identifies that "battery life" specifically received negative sentiment?
Which method returns both the detected PII entities and a masked version of the input string?
You must summarize a long support article by returning new, paraphrased prose rather than copied sentences. Which action do you use?