4.1 Azure AI Language — Text Analytics Features
Key Takeaways
- Azure AI Language provides pre-built NLP features: sentiment analysis, named entity recognition (NER), key phrase extraction, language detection, PII detection, and text summarization.
- Sentiment analysis returns document-level and sentence-level sentiment with confidence scores for each. The labels are positive, negative, and neutral, plus mixed at the document level only.
- Opinion mining is an extension of sentiment analysis that links sentiments to specific aspects/targets within the text.
- Named Entity Recognition (NER) identifies and categorizes entities like persons, organizations, locations, dates, quantities, and URLs.
- PII detection identifies and optionally redacts personally identifiable information including names, addresses, phone numbers, SSNs, and financial data.
Quick Answer: Azure AI Language provides pre-built NLP features through a single API: sentiment analysis (with opinion mining), NER, key phrase extraction, language detection, PII detection/redaction, entity linking, and text summarization. These features work out-of-the-box with no training required.
Pre-Built Features Overview
| Feature | Input | Output | Use Case |
|---|---|---|---|
| Sentiment Analysis | Text | Positive/negative/neutral + confidence scores | Customer feedback analysis |
| Opinion Mining | Text | Aspect-level sentiment (target + assessment) | Product review analysis |
| NER | Text | Entity text + category + subcategory + offset | Information extraction |
| Key Phrase Extraction | Text | Array of key phrases | Topic summarization |
| Language Detection | Text | Detected language + confidence score | Routing multilingual content |
| PII Detection | Text | PII entities + redacted text | Privacy compliance |
| Entity Linking | Text | Entities linked to Wikipedia entries | Knowledge enrichment |
| Text Summarization | Text | Extractive or abstractive summary | Document summarization |
| Text Analytics for Health | Clinical text | Medical entities + relations + assertions | Healthcare NLP |
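As a quick reference, the features in the table can be mapped to client methods in the `azure-ai-textanalytics` Python package. The sketch below is only a lookup table: `FEATURE_TO_METHOD` and `method_for` are hypothetical names introduced here, and the method names should be checked against the SDK version you have installed.

```python
# Illustrative lookup: feature name -> TextAnalyticsClient method name.
# FEATURE_TO_METHOD and method_for are hypothetical helpers for this sketch.
FEATURE_TO_METHOD = {
    "Sentiment Analysis": "analyze_sentiment",
    "Opinion Mining": "analyze_sentiment",  # same call, show_opinion_mining=True
    "NER": "recognize_entities",
    "Key Phrase Extraction": "extract_key_phrases",
    "Language Detection": "detect_language",
    "PII Detection": "recognize_pii_entities",
    "Entity Linking": "recognize_linked_entities",
    "Text Summarization": "begin_analyze_actions",  # long-running operation
    "Text Analytics for Health": "begin_analyze_healthcare_entities",
}


def method_for(feature: str) -> str:
    """Return the client method name for a feature; raises KeyError if unknown."""
    return FEATURE_TO_METHOD[feature]


for feature, method in FEATURE_TO_METHOD.items():
    print(f"{feature:28s} -> client.{method}()")
```

Methods whose names start with `begin_` return a poller, so their results are read with `poller.result()` rather than directly from the call.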
Sentiment Analysis and Opinion Mining
Basic Sentiment Analysis
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential
client = TextAnalyticsClient(
    endpoint="https://my-language.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>")
)
documents = [
    "The hotel room was spacious and clean, but the restaurant food was terrible.",
    "I absolutely loved the customer service experience!"
]
response = client.analyze_sentiment(documents, show_opinion_mining=True)
for doc in response:
    print(f"Document sentiment: {doc.sentiment}")
    print(f" Positive: {doc.confidence_scores.positive:.2f}")
    print(f" Negative: {doc.confidence_scores.negative:.2f}")
    print(f" Neutral: {doc.confidence_scores.neutral:.2f}")
    for sentence in doc.sentences:
        print(f" Sentence: '{sentence.text}'")
        print(f" Sentiment: {sentence.sentiment}")
        # Opinion mining - aspect-level sentiment
        if sentence.mined_opinions:
            for opinion in sentence.mined_opinions:
                target = opinion.target
                print(f" Target: {target.text} ({target.sentiment})")
                for assessment in opinion.assessments:
                    print(f" Assessment: {assessment.text} "
                          f"({assessment.sentiment})")
Output Example
Document sentiment: mixed
Positive: 0.45
Negative: 0.50
Neutral: 0.05
Sentence: 'The hotel room was spacious and clean...'
Sentiment: positive
Target: hotel room (positive)
Assessment: spacious (positive)
Assessment: clean (positive)
Sentence: '...but the restaurant food was terrible.'
Sentiment: negative
Target: restaurant food (negative)
Assessment: terrible (negative)
On the Exam: Document-level sentiment analysis returns four possible values: positive, negative, neutral, and mixed. "Mixed" appears when the document contains both positive and negative sentiment; individual sentences are labeled only positive, negative, or neutral. Opinion mining adds target-assessment pairs to identify WHAT is being praised or criticized.
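To make the "mixed" label concrete, here is a minimal sketch of how a document label can be derived from sentence labels. It assumes the rule that positive plus negative sentences give "mixed", and that neutral sentences do not change the outcome. The function name `document_label` is hypothetical, and the real service also weighs confidence scores, so treat this as a mental model rather than the service's implementation.

```python
from typing import Iterable


def document_label(sentence_labels: Iterable[str]) -> str:
    """Combine sentence-level labels into one document-level label.

    Assumed rule: positive + negative -> mixed; otherwise the single
    non-neutral polarity wins; all-neutral documents stay neutral.
    """
    labels = set(sentence_labels)
    has_positive = "positive" in labels
    has_negative = "negative" in labels
    if has_positive and has_negative:
        return "mixed"
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    return "neutral"


print(document_label(["positive", "negative"]))  # mixed
print(document_label(["positive", "neutral"]))   # positive
print(document_label(["neutral", "neutral"]))    # neutral
```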
Named Entity Recognition (NER)
documents = [
    "Microsoft was founded by Bill Gates and Paul Allen in Albuquerque on April 4, 1975."
]
response = client.recognize_entities(documents)
for doc in response:
    for entity in doc.entities:
        print(f"Entity: {entity.text}")
        print(f" Category: {entity.category}")
        print(f" Subcategory: {entity.subcategory}")
        print(f" Confidence: {entity.confidence_score:.2f}")
        print(f" Offset: {entity.offset}")
NER Entity Categories
| Category | Subcategories | Examples |
|---|---|---|
| Person | — | "Bill Gates", "Marie Curie" |
| Organization | — | "Microsoft", "United Nations" |
| Location | GPE, Structural, Geographical | "Paris", "Eiffel Tower", "Pacific Ocean" |
| DateTime | Date, Time, DateRange, Duration | "April 4, 1975", "3 PM", "two weeks" |
| Quantity | Number, Percentage, Currency, Age | "42", "15%", "$1,000", "30 years old" |
| URL | — | "https://example.com" |
| Email | — | "user@example.com" |
| Phone Number | — | "(555) 123-4567" |
| IP Address | — | "192.168.1.1" |
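Once entities are returned, a common next step is to keep only the confident ones and bucket them by category. The sketch below runs offline: the `Entity` dataclass is a hypothetical stand-in that borrows field names from the SDK result (`text`, `category`, `confidence_score`, `offset`, `length`), and the confidence values are made up for illustration, not service output.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Entity:
    """Hypothetical stand-in for an NER result (not the SDK class)."""
    text: str
    category: str
    confidence_score: float
    offset: int
    length: int


SOURCE = ("Microsoft was founded by Bill Gates and Paul Allen "
          "in Albuquerque on April 4, 1975.")


def make_entity(text: str, category: str, score: float) -> Entity:
    """Build an Entity whose offset/length are computed from SOURCE."""
    return Entity(text, category, score, SOURCE.index(text), len(text))


def group_by_category(entities: Iterable[Entity],
                      min_confidence: float = 0.8) -> Dict[str, List[str]]:
    """Drop low-confidence entities, then group entity text by category."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        if entity.confidence_score >= min_confidence:
            grouped[entity.category].append(entity.text)
    return dict(grouped)


# Illustrative scores only; real values come from the service.
entities = [
    make_entity("Microsoft", "Organization", 0.99),
    make_entity("Bill Gates", "Person", 0.98),
    make_entity("Paul Allen", "Person", 0.97),
    make_entity("Albuquerque", "Location", 0.62),
    make_entity("April 4, 1975", "DateTime", 0.80),
]

# offset + length locate the entity inside the original string.
first = entities[1]
print(SOURCE[first.offset:first.offset + first.length])
print(group_by_category(entities))
```

The threshold of 0.8 is arbitrary; in practice it would be tuned against how costly a missed or spurious entity is for the application.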
PII Detection and Redaction
documents = [
    "My SSN is 123-45-6789 and my email is john@example.com. "
    "Call me at (555) 123-4567."
]
response = client.recognize_pii_entities(documents)
for doc in response:
    print(f"Redacted text: {doc.redacted_text}")
    for entity in doc.entities:
        print(f" PII: {entity.text} ({entity.category})")
# Output:
# Redacted text: My SSN is *********** and my email is ****************.
# Call me at **************.
# PII: 123-45-6789 (USSocialSecurityNumber)
# PII: john@example.com (Email)
# PII: (555) 123-4567 (PhoneNumber)
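The redacted text in the output above keeps the original length: each PII span is replaced character for character with a mask. A minimal offline sketch of that idea follows. The `redact` helper is hypothetical, and the spans are located with `str.index` purely for illustration, whereas the service reports them as `offset` and `length` on each entity.

```python
from typing import Sequence, Tuple


def redact(text: str, spans: Sequence[Tuple[int, int]], mask: str = "*") -> str:
    """Replace every (offset, length) span in text with mask characters."""
    chars = list(text)
    for offset, length in spans:
        for i in range(offset, offset + length):
            chars[i] = mask
    return "".join(chars)


text = ("My SSN is 123-45-6789 and my email is john@example.com. "
        "Call me at (555) 123-4567.")

# Stand-ins for the entities the service would detect.
pii_values = ["123-45-6789", "john@example.com", "(555) 123-4567"]
spans = [(text.index(value), len(value)) for value in pii_values]

redacted = redact(text, spans)
print(redacted)
```

Because the masked string has the same length as the input, offsets reported for other entities remain valid after redaction.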
PII Categories
| Category | Examples |
|---|---|
| USSocialSecurityNumber | SSNs |
| CreditCardNumber | Visa, Mastercard numbers |
| Email | Email addresses |
| PhoneNumber | Phone numbers |
| Address | Physical addresses |
| IPAddress | IP addresses |
| Person | Personal names |
| Organization | Organization names (when contextually PII) |
Key Phrase Extraction
documents = [
    "Azure AI Language provides powerful natural language processing "
    "capabilities for text analytics, sentiment analysis, and entity recognition."
]
response = client.extract_key_phrases(documents)
for doc in response:
    print(f"Key phrases: {doc.key_phrases}")
# Output: ['Azure AI Language', 'natural language processing',
#          'text analytics', 'sentiment analysis', 'entity recognition']
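For the topic summarization use case, key phrases from many documents are often tallied to surface recurring themes. The sketch below assumes you have already collected one list of key phrases per document; `top_phrases` is a hypothetical helper, and the sample lists are invented for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_phrases(key_phrases_per_doc: Iterable[List[str]],
                n: int = 3) -> List[Tuple[str, int]]:
    """Count how many documents mention each phrase (case-insensitive)."""
    counts: Counter = Counter()
    for phrases in key_phrases_per_doc:
        # Deduplicate within a document so each document counts once;
        # sorting keeps the tally order deterministic.
        counts.update(sorted({phrase.lower() for phrase in phrases}))
    return counts.most_common(n)


# Invented sample data standing in for doc.key_phrases from several documents.
collected = [
    ["sentiment analysis", "Azure AI Language", "text analytics"],
    ["Sentiment Analysis", "entity recognition"],
    ["sentiment analysis", "text analytics"],
]

print(top_phrases(collected, n=2))
```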
Language Detection
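documents = [
    "Bonjour le monde",
    "Hello world",
    "Hola mundo"
]
response = client.detect_language(documents)
# Results come back in input order, so pair each one with its source text
# (doc.id holds only the document ID, not the text).
for text, doc in zip(documents, response):
    lang = doc.primary_language
    print(f"Text: '{text}' → {lang.name} ({lang.iso6391_name})"
          f" confidence: {lang.confidence_score:.2f}")
# Example output (confidence values are illustrative):
# Text: 'Bonjour le monde' → French (fr) confidence: 1.00
# Text: 'Hello world' → English (en) confidence: 1.00
# Text: 'Hola mundo' → Spanish (es) confidence: 1.00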
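Language detection is typically used to route multilingual content, as noted in the features table. A minimal routing sketch, runnable without the service, is shown below. `DetectedLanguage` here is a hypothetical dataclass that borrows the SDK's field names, and the queue names and the 0.8 threshold are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class DetectedLanguage:
    """Hypothetical stand-in for a detection result (not the SDK class)."""
    name: str
    iso6391_name: str
    confidence_score: float


# Assumed queue names, keyed by ISO 639-1 code.
QUEUES = {"en": "support-en", "fr": "support-fr", "es": "support-es"}


def route(lang: DetectedLanguage,
          min_confidence: float = 0.8,
          fallback: str = "manual-review") -> str:
    """Pick a queue for a detected language, falling back when unsure."""
    if lang.confidence_score < min_confidence:
        return fallback
    return QUEUES.get(lang.iso6391_name, fallback)


print(route(DetectedLanguage("French", "fr", 1.00)))
print(route(DetectedLanguage("English", "en", 0.40)))
print(route(DetectedLanguage("German", "de", 0.99)))
```

Low-confidence results and languages without a queue both go to the fallback, which avoids silently sending content to the wrong team.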
Text Summarization
Azure AI Language offers two types of summarization:
| Type | Description | Output |
|---|---|---|
| Extractive | Selects key sentences from the original text | Original sentences (ranked by importance) |
| Abstractive | Generates new summary text | New text that paraphrases the original |
from azure.ai.textanalytics import (
    ExtractiveSummaryAction,
    AbstractiveSummaryAction
)
# Extractive summarization
poller = client.begin_analyze_actions(
    documents=["Long document text here..."],
    actions=[ExtractiveSummaryAction(max_sentence_count=3)]
)
for result in poller.result():
    for summary in result:
        for sentence in summary.sentences:
            print(f" - {sentence.text} (rank: {sentence.rank_score:.2f})")
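The difference between ranking and ordering in extractive summarization can be sketched offline. In the example below the rank scores are invented (the service computes them), `RankedSentence` and `select_summary` are hypothetical names, and the `order_by` argument is modeled on the idea of returning sentences either in document order or by rank. Treat that parameter as an assumption rather than a statement of the SDK signature.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RankedSentence:
    """Hypothetical stand-in for an extracted sentence with its score."""
    text: str
    rank_score: float
    offset: int


def select_summary(sentences: Sequence[RankedSentence],
                   max_sentence_count: int = 3,
                   order_by: str = "Offset") -> List[RankedSentence]:
    """Keep the highest-ranked sentences, then order them for display."""
    top = sorted(sentences, key=lambda s: s.rank_score,
                 reverse=True)[:max_sentence_count]
    if order_by == "Offset":
        # Restore original document order so the summary reads naturally.
        return sorted(top, key=lambda s: s.offset)
    return top  # already ordered by rank, highest first


# Invented scores for illustration.
sentences = [
    RankedSentence("Intro sentence.", 0.70, 0),
    RankedSentence("Minor detail.", 0.20, 16),
    RankedSentence("Main finding.", 0.90, 30),
]

for s in select_summary(sentences, max_sentence_count=2):
    print(f" - {s.text} (rank: {s.rank_score:.2f})")
```

Every sentence in the result is copied verbatim from the input, which is what distinguishes extractive output from an abstractive summary that generates new wording.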
Review Questions
- What sentiment values can Azure AI Language sentiment analysis return?
- What does opinion mining add to basic sentiment analysis?
- Which method should you call to get PII entities AND a redacted version of the text?
- What is the difference between extractive and abstractive summarization?