An administrator needs to detect documents that contain credit card numbers in the predictable card format. Which Microsoft Purview classification engine is designed for this?

A sensitive information type (SIT). A sensitive information type matches predictable patterns using regular expressions, keywords, a checksum (the Luhn check for cards), and a confidence level, which is exactly how Purview detects credit card numbers.

An organization wants to identify documents that 'look like resumes' even though resumes have no fixed pattern. Which capability fits best?

A trainable classifier. Trainable classifiers use machine learning trained on sample content to recognize categories by meaning, such as resumes or source code, where no reliable pattern exists. Microsoft even ships a pre-trained Resume classifier.

Why is data classification considered the foundation for other Microsoft Purview controls?

Because the same SITs and trainable classifiers are reused by sensitivity labels, DLP, and retention. Classification only identifies and tags data; sensitivity labels, DLP policies, and retention rules then consume those classifiers as conditions, so classification feeds every downstream control.

Data Classification Foundations — Free Study Guide 2026

Key Takeaways

Microsoft Purview classifies data with two engines: sensitive information types (pattern-based) and trainable classifiers (machine-learning, trained from samples).
A sensitive information type (SIT) detects patterns such as credit card, passport, or national ID numbers using regular expressions, keywords, checksums, and a confidence level.
Trainable classifiers recognize content by meaning rather than pattern, and ship as pre-trained categories (for example resumes, source code, harassment) plus custom classifiers you train.
Classification feeds the rest of Purview: the same SITs and classifiers are reused by sensitivity labels, DLP policies, retention, and auto-labeling.

What Data Classification Does

Data classification in Microsoft Purview is the capability that discovers, identifies, and tags data by type and sensitivity so the organization knows what it holds before it tries to protect it. Classification answers the first question of every data-governance project: where is our sensitive data, and what kind is it? For SC-900 you need to recognize the two classification engines, because the same engines are reused everywhere else in Purview — by sensitivity labels, data loss prevention (DLP), retention, and auto-labeling.

The first engine is the sensitive information type (SIT). A SIT is a pattern-based classifier that looks for a recognizable format. Microsoft ships hundreds of built-in SITs covering common identifiers across many regions — credit card numbers, bank account numbers, passport numbers, national IDs such as the U.S. Social Security number, ABA routing numbers, and health identifiers. Each SIT is defined by a combination of:

A primary pattern — usually a regular expression or a keyword list (for example the digit format of a credit card).
Supporting evidence — additional keywords, dates, or formatting near the match (such as the word "card" or an expiry date) that raise confidence.
A checksum where the identifier has one (a credit card number must pass the Luhn algorithm).
A confidence level (low, medium, high) and a proximity window describing how close the supporting evidence must be.

You can also create custom SITs when the built-ins do not cover an organization-specific format, such as an internal employee ID. SITs are ideal when sensitive data has a predictable shape.
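
The pieces of a SIT described above (primary pattern, checksum, supporting evidence, proximity window, confidence level) can be sketched in a few lines of Python. This is an illustrative toy only, not how Purview implements its built-in credit card SIT: the regular expression, the keyword list, the 50-character window, and the two-level confidence are all assumptions made for the example.

```python
import re

# Primary pattern (assumed): 13-19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# Supporting evidence (assumed keyword list for this sketch).
KEYWORDS = ("card", "visa", "mastercard", "expiry", "cvv")
# Proximity window (assumed): characters inspected on either side of the match.
PROXIMITY = 50


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_cards(text: str) -> list[tuple[str, str]]:
    """Return (digits, confidence) for each Luhn-valid candidate in text."""
    results = []
    lowered = text.lower()
    for m in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if not luhn_valid(digits):
            continue  # checksum failed, so this is not treated as a card number
        window = lowered[max(0, m.start() - PROXIMITY): m.end() + PROXIMITY]
        has_evidence = any(k in window for k in KEYWORDS)
        # Simplified to two levels; the text above describes low, medium, and high.
        results.append((digits, "high" if has_evidence else "low"))
    return results
```

In this sketch a Luhn-valid number with a nearby keyword is reported at high confidence, the same number with no supporting evidence is reported at low confidence, and a number that fails the checksum is not reported at all.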

Classification engine	How it decides	Best for
Sensitive information type (SIT)	Pattern match: regex, keywords, checksum, confidence level	Structured identifiers (credit cards, SSNs, passports)
Trainable classifier	Machine learning trained on sample documents	Unstructured content by meaning (contracts, resumes, source code)
Exact Data Match (EDM)	Compares against a hashed table of your real values	Reducing false positives on known records (optional, advanced)
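
The idea behind the EDM row, comparing candidates against a hashed table of real values, can be illustrated with a small sketch. The salt, the normalization rule, and the function names here are hypothetical, and the hashing scheme is a stand-in chosen for clarity rather than the one Purview actually uses.

```python
import hashlib

SALT = b"demo-salt"  # hypothetical value for this sketch only


def fingerprint(value: str) -> str:
    """Normalize a value and return its salted SHA-256 hex digest."""
    normalized = "".join(ch for ch in value if ch.isalnum()).lower()
    return hashlib.sha256(SALT + normalized.encode("utf-8")).hexdigest()


def build_index(known_values: list[str]) -> set[str]:
    """Hash every known record so the raw values are not kept in the index."""
    return {fingerprint(v) for v in known_values}


def exact_matches(candidates: list[str], index: set[str]) -> list[str]:
    """Keep only the candidates whose hash appears in the index."""
    return [c for c in candidates if fingerprint(c) in index]
```

Because only values that really exist in the organization's records can match, a pattern hit on a random but well-formed number is discarded, which is how this approach reduces false positives.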

Trainable Classifiers and Auto-Labeling

The second engine is the trainable classifier, which recognizes content by what it is about rather than by a fixed pattern. You cannot write a regular expression for "this looks like a resume" or "this is harassing language," so Purview uses machine learning. A trainable classifier is taught by feeding it sample content; it learns the characteristics of that category and can then identify similar items. There are two kinds:

Pre-trained classifiers that Microsoft ships ready to use, including categories such as Resume, Source Code, Harassment, Profanity, Threat, Discrimination, Customer Complaints, Healthcare, and Finance. These are available immediately with no training.
Custom trainable classifiers that you build by providing seed content (Microsoft recommends roughly 50–500 sample items) and then validating the results before publishing.
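
To make "taught by feeding it sample content" concrete, here is a minimal sketch of a text classifier that learns from labeled examples. It uses a small multinomial naive Bayes model written from scratch; that technique is chosen for brevity and is an assumption of this example, not a description of the model behind Purview's trainable classifiers. The class name and the sample texts are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class TinyClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = {}
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def train(self, samples: list[tuple[str, str]]) -> None:
        """Learn word frequencies per label from (text, label) seed samples."""
        for text, label in samples:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for tok in tokenize(text):
                counts[tok] += 1
                self.vocab.add(tok)

    def predict(self, text: str) -> str:
        """Return the label with the highest log-probability for the text."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = "", -math.inf
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Nothing in the model is a hand-written pattern: it recognizes a resume only because the seed samples labeled "resume" shared vocabulary with the new text. A real custom classifier needs far more seed content than a toy like this, which is why the recommended sample count above is in the tens to hundreds.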

Once trained, a classifier can be used wherever Purview accepts a condition: to auto-apply a sensitivity label, to drive a DLP policy, to trigger an auto-apply retention label, or inside Communication Compliance. This reuse is the key idea SC-900 tests — you classify once, then many controls consume the result.

Classification powers automatic labeling. Instead of relying on every user to tag content correctly, an administrator can configure an auto-labeling policy that applies a sensitivity or retention label whenever a SIT or trainable classifier matches. Sensitivity auto-labeling runs in two places: client-side, as a recommended or automatically applied label inside Office apps, and service-side, as a policy that covers data at rest in SharePoint and OneDrive and email in transit through Exchange.

Where Classification Fits in the Workflow

Classification is the first stage of nearly every Microsoft Purview governance scenario, and SC-900 likes to test the order of operations. The workflow is: know your data → protect your data → prevent data loss → govern your data. Classification serves the know your data stage, producing the signals that the protection, prevention, and governance stages consume.

A worked example makes the reuse obvious. Suppose an organization wants to protect customer health records. The steps are:

Classify — create or select a sensitive information type for the health identifier, and optionally a trainable classifier for clinical documents.
Label — build an auto-apply sensitivity label ("Highly Confidential\Health") whose condition is that the SIT or classifier matches.
Prevent — create a DLP policy whose condition is content labeled Highly Confidential\Health and whose action blocks external sharing.
Govern — add an auto-apply retention label so those records are kept for the legally required period.

Notice that the classification result from the first step drives all three steps that follow, which is why classification is foundational rather than a standalone feature.
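
The four steps can be modeled as a short pipeline to show the "classify once, consume many times" idea. This is a conceptual sketch, not Purview configuration: the `Document` type, the "patient id" keyword standing in for a real SIT, the `HealthIdentifier` name, and the 7-year retention period are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

HEALTH_LABEL = "Highly Confidential\\Health"


@dataclass
class Document:
    name: str
    text: str
    matches: set = field(default_factory=set)  # classification results
    label: Optional[str] = None                # sensitivity label, if any
    retention_years: Optional[int] = None


def classify(doc: Document) -> None:
    """Classify: identify only. A keyword stands in for a real SIT here."""
    if "patient id" in doc.text.lower():
        doc.matches.add("HealthIdentifier")


def auto_label(doc: Document) -> None:
    """Label: apply the sensitivity label when the classification matches."""
    if "HealthIdentifier" in doc.matches:
        doc.label = HEALTH_LABEL


def can_share_externally(doc: Document) -> bool:
    """Prevent: a DLP-style rule keyed on the label, not on the raw text."""
    return doc.label != HEALTH_LABEL


def apply_retention(doc: Document) -> None:
    """Govern: retention keyed on the same classification (7 years assumed)."""
    if "HealthIdentifier" in doc.matches:
        doc.retention_years = 7
```

Calling `classify` alone leaves the document unlabeled and still shareable; nothing is protected until `auto_label` runs and the sharing rule acts on the label.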

Common trap: classification is identification, not protection. A SIT match by itself does nothing to the file — it simply tells Purview the data is there. Protection only happens once a sensitivity label, DLP policy, or retention rule acts on that classification. On the exam, if a scenario says "identify" or "detect" sensitive data, the answer is classification (SITs / trainable classifiers). If it says "protect," "encrypt," "prevent sharing," or "keep for X years," the answer is a different control that uses the classification.

Finally, do not confuse Purview classification with a security-detection product. Microsoft Defender products detect threats and protect endpoints, identities, and workloads; Microsoft Sentinel is the enterprise SIEM/SOAR. Discovering, classifying, and tagging data itself is always a Microsoft Purview task, never a Defender or Sentinel one.

Azure SC-900 Study Guide

Microsoft Certified: Security, Compliance, and Identity Fundamentals (SC-900)

10.1 Data Classification Foundations
