4.1 NLP Concepts and Common Tasks
Key Takeaways
- Natural Language Processing (NLP) enables AI to understand, interpret, and generate human language — both text and speech.
- Key phrase extraction identifies the main topics and concepts in a text document, distilling long text into its most important ideas.
- Named entity recognition (NER) detects and classifies entities in text — people, organizations, locations, dates, quantities, and more.
- Sentiment analysis determines whether text expresses positive, negative, or neutral sentiment, with confidence scores for each category.
- Language detection identifies which language a piece of text is written in, supporting 120+ languages.
Quick Answer: NLP enables AI to understand and process human language. Key tasks include key phrase extraction (main topics), named entity recognition (people, places, dates), sentiment analysis (positive/negative/neutral), language detection (which language), text summarization (condensed text), and PII detection (sensitive personal data). Azure AI Language provides all these capabilities as pre-built APIs.
What Is Natural Language Processing?
Natural Language Processing (NLP) is a branch of AI that deals with the interaction between computers and human language. NLP enables machines to read, understand, and derive meaning from text and speech.
NLP powers applications you use daily:
- Spam filters analyzing email content
- Virtual assistants understanding voice commands
- Auto-complete suggestions while typing
- Machine translation between languages
- Chatbots answering customer questions
- Search engines understanding your queries
Core NLP Tasks
1. Key Phrase Extraction
What it does: Identifies the main topics and concepts in text, extracting the most important phrases.
Input: "The food at the restaurant was delicious and the service was excellent. The atmosphere was cozy and the prices were reasonable."
Output: ["food", "restaurant", "service", "atmosphere", "prices"]
Use cases:
- Summarize large document collections by their key topics
- Extract main themes from customer feedback
- Automatically tag articles and documents
- Identify trending topics in social media
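As a rough illustration of the "main themes from customer feedback" use case, the sketch below aggregates key phrases that a key phrase extraction service has already returned for several documents. The function name `top_themes` and the second and third review lists are hypothetical, made up for this example; only the first list comes from the sample output above.

```python
from collections import Counter

def top_themes(key_phrases_per_doc, n=3):
    """Return the n phrases that appear in the most documents."""
    counts = Counter()
    for phrases in key_phrases_per_doc:
        # Use a set so a phrase counts at most once per document.
        counts.update({p.lower() for p in phrases})
    return counts.most_common(n)

# Key phrases per review (first list is the sample output above;
# the other two are invented for illustration).
reviews = [
    ["food", "restaurant", "service", "atmosphere", "prices"],
    ["service", "wait time"],
    ["Food", "service"],
]

print(top_themes(reviews, n=2))  # [('service', 3), ('food', 2)]
```

The extraction itself is left to the service; this only shows what can be done with its output.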
2. Named Entity Recognition (NER)
What it does: Detects and classifies entities mentioned in text — people, organizations, locations, dates, quantities, URLs, emails, and more.
Input: "Microsoft CEO Satya Nadella announced a $10 billion investment in Azure AI at the Build 2026 conference in Seattle."
Output:
| Entity | Type | Category |
|---|---|---|
| Microsoft | Organization | Company |
| Satya Nadella | Person | Name |
| $10 billion | Quantity | Currency |
| Azure AI | Product | Technology |
| Build 2026 | Event | Name |
| Seattle | Location | City |
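One way to picture NER output is as a list of records that downstream code can filter or group. The sketch below hard-codes the rows of the table above; the `Entity` class and `group_by_type` helper are hypothetical names for illustration, not part of any SDK.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    text: str      # the span found in the input
    type: str      # "Type" column of the table
    category: str  # "Category" column of the table

# Rows copied from the example table.
ENTITIES = [
    Entity("Microsoft", "Organization", "Company"),
    Entity("Satya Nadella", "Person", "Name"),
    Entity("$10 billion", "Quantity", "Currency"),
    Entity("Azure AI", "Product", "Technology"),
    Entity("Build 2026", "Event", "Name"),
    Entity("Seattle", "Location", "City"),
]

def group_by_type(entities):
    """Map each entity type to the list of texts recognized with that type."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity.type].append(entity.text)
    return dict(grouped)

print(group_by_type(ENTITIES)["Person"])  # ['Satya Nadella']
```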
Use cases:
- Extract contact information from documents
- Identify people and organizations mentioned in news articles
- Categorize support tickets by mentioned products
- Build knowledge graphs from unstructured text
3. Sentiment Analysis
What it does: Determines whether text expresses positive, negative, or neutral sentiment, with confidence scores for each.
Input: "The hotel room was beautiful, but the check-in process was terrible."
Output:
| Scope | Sentiment | Positive | Neutral | Negative |
|---|---|---|---|---|
| Overall | Mixed | 0.45 | 0.10 | 0.45 |
| "The hotel room was beautiful" | Positive | 0.95 | 0.03 | 0.02 |
| "the check-in process was terrible" | Negative | 0.02 | 0.05 | 0.93 |
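Important: Sentiment analysis can operate at document level (overall sentiment) or sentence level (sentiment per sentence), allowing fine-grained analysis. In the example above, the two parts carry opposite sentiment, which is why the overall label is Mixed rather than Positive or Negative.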
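The relationship between the per-part scores and the overall label can be sketched in a few lines. This is a simplified rule written for illustration (any positive part plus any negative part gives "mixed"), not a statement of how a particular service computes its result; `ScoredSpan` and `document_label` are hypothetical names, and the scores are the ones from the table.

```python
from dataclasses import dataclass

@dataclass
class ScoredSpan:
    text: str
    positive: float
    neutral: float
    negative: float

    @property
    def label(self) -> str:
        """The category with the highest confidence score."""
        scores = {
            "positive": self.positive,
            "neutral": self.neutral,
            "negative": self.negative,
        }
        return max(scores, key=scores.get)

def document_label(spans):
    """Simplified roll-up: opposite labels anywhere in the text give 'mixed'."""
    labels = {span.label for span in spans}
    if "positive" in labels and "negative" in labels:
        return "mixed"
    if "positive" in labels:
        return "positive"
    if "negative" in labels:
        return "negative"
    return "neutral"

spans = [
    ScoredSpan("The hotel room was beautiful", 0.95, 0.03, 0.02),
    ScoredSpan("the check-in process was terrible", 0.02, 0.05, 0.93),
]

print([s.label for s in spans])  # ['positive', 'negative']
print(document_label(spans))     # mixed
```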
Use cases:
- Monitor brand sentiment on social media
- Analyze customer review trends
- Route customer support tickets by urgency (negative = urgent)
- Track employee satisfaction in survey responses
4. Language Detection
What it does: Identifies which language a piece of text is written in.
Input: "Bonjour le monde, comment allez-vous?"
Output: French (fr), Confidence: 0.99
Capabilities:
- Supports 120+ languages
- Detects the dominant language in mixed-language text
- Returns ISO 639-1 language codes (en, fr, es, de, etc.)
- Handles short text and ambiguous input
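To build intuition for what a language detector does, here is a deliberately tiny sketch that guesses a language by counting common function words. Production detectors use far richer models; the word lists, the `guess_language` name, and the three supported codes are all assumptions made for this example, and the returned score is a simple hit ratio, not a calibrated confidence like the 0.99 shown above.

```python
import re

# Tiny hand-picked function-word lists (illustrative only).
COMMON_WORDS = {
    "en": {"the", "and", "is", "are", "you", "how", "of", "to", "in", "it"},
    "fr": {"le", "la", "les", "et", "est", "vous", "comment", "de", "un", "une"},
    "es": {"el", "los", "las", "y", "es", "usted", "como", "que", "una", "por"},
}

def guess_language(text):
    """Return (ISO 639-1 code, hit ratio), or ("unknown", 0.0) if nothing matches."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    if not tokens:
        return ("unknown", 0.0)
    scores = {
        code: sum(token in words for token in tokens) / len(tokens)
        for code, words in COMMON_WORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return ("unknown", 0.0)
    return (best, scores[best])

code, ratio = guess_language("Bonjour le monde, comment allez-vous?")
print(code)  # fr
```

The "unknown" fallback mirrors the idea that very short or ambiguous input may not be attributable to any language.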
Use cases:
- Route customer communications to language-appropriate agents
- Pre-process text before translation
- Analyze the language distribution of social media mentions
- Validate expected language in form submissions
5. Text Summarization
What it does: Condenses long text into a shorter summary that captures the key points.
Types:
- Extractive summarization — selects the most important sentences from the original text
- Abstractive summarization — generates new sentences that capture the meaning (uses generative AI)
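Extractive summarization can be sketched with a classic word-frequency heuristic: score each sentence by how often its words occur in the whole text, then keep the top-scoring sentences in their original order. This is a minimal illustration of the extractive idea, not the method any particular service uses; the stopword list and function names are assumptions. Abstractive summarization has no comparably short sketch, since it relies on a generative model.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was",
             "it", "that", "for", "on", "with"}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def extractive_summary(text, max_sentences=2):
    """Select the highest-scoring sentences, returned in original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]

sample = ("Azure AI Language offers summarization. "
          "Summarization condenses long text. "
          "The weather was nice.")
print(extractive_summary(sample, max_sentences=2))
```

Every sentence in the result is copied verbatim from the input, which is the defining property of the extractive approach.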
Use cases:
- Summarize meeting transcripts
- Create article summaries for newsletters
- Condense legal documents into key points
- Generate executive summaries from reports
6. PII (Personally Identifiable Information) Detection
What it does: Detects and optionally redacts sensitive personal information in text, such as names, phone numbers, email addresses, Social Security numbers, and credit card numbers.
Input: "Call John Smith at 555-123-4567 or email john@example.com"
Output: "Call [PERSON] at [PHONE NUMBER] or email [EMAIL]"
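Some PII follows a fixed pattern and some does not, which is worth seeing side by side. The sketch below redacts only the pattern-based items from the example (a US-style phone number and an email address) using regular expressions. It leaves "John Smith" untouched, because recognizing a person's name generally needs a trained model rather than a pattern. The regexes and the `redact_patterns` name are assumptions for illustration and are far narrower than a real PII detector.

```python
import re

# Narrow, illustrative patterns; real PII detection covers many more formats.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def redact_patterns(text):
    """Replace email addresses and NNN-NNN-NNNN phone numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE NUMBER]", text)
    return text

print(redact_patterns("Call John Smith at 555-123-4567 or email john@example.com"))
# Call John Smith at [PHONE NUMBER] or email [EMAIL]
```

The gap between this result and the fully redacted output above is what a model-based PII detection service fills in.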
Use cases:
- Support compliance with GDPR, HIPAA, and other privacy regulations
- Redact PII before storing or sharing documents
- Monitor communications for data leak prevention
- Anonymize customer data for analytics
On the Exam: Questions often present a text analysis scenario and ask which NLP task is needed. Focus on what information needs to be extracted: main topics = key phrase extraction, specific entities = NER, emotional tone = sentiment, language identification = language detection, sensitive data = PII detection.
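The mapping in the exam tip can be written down as a simple lookup, which may help when drilling scenario questions. The dictionary keys are the phrases from the tip; the `pick_task` helper is a hypothetical study aid, not an API.

```python
# "What needs to be extracted" -> NLP task, as listed in the exam tip.
TASK_FOR_GOAL = {
    "main topics": "key phrase extraction",
    "specific entities": "named entity recognition",
    "emotional tone": "sentiment analysis",
    "language identification": "language detection",
    "sensitive data": "PII detection",
}

def pick_task(goal):
    """Look up the task for a goal, ignoring case and surrounding spaces."""
    key = goal.strip().lower()
    if key not in TASK_FOR_GOAL:
        raise ValueError(f"No task mapped for goal: {goal!r}")
    return TASK_FOR_GOAL[key]

print(pick_task("Emotional tone"))  # sentiment analysis
```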
A company wants to analyze customer reviews to determine if customers feel positively or negatively about their products. Which NLP task should they use?
Which NLP task would identify "Microsoft", "Satya Nadella", and "Seattle" as entities of type Organization, Person, and Location in a news article?
A global company receives customer messages in many languages and needs to route them to agents who speak the appropriate language. Which NLP task should they use first?
A healthcare organization needs to automatically redact patient names, phone numbers, and Social Security numbers from medical documents. Which NLP task is most appropriate?