5.8 Compliance and PII Data Management
Key Takeaways
- Tags are key-value metadata attached to securable objects (e.g., classification='PII') to organize, search, and drive policy on sensitive data.
- Governed tags define an account-level vocabulary of allowed tag keys/values, with permissions over who may apply which tags.
- Automated data classification scans columns to detect and tag sensitive data such as PII without manual review.
- Column masks, row filters, and ABAC policies enforce PII protection; audit logs and lineage provide the compliance evidence trail.
- Combining classification (find PII), tags (label it), and ABAC policies (protect it) operationalizes regulations like GDPR and HIPAA.
Quick Answer: Unity Catalog manages PII and compliance with tags (key-value metadata like classification=PII), governed tags (an account-level controlled vocabulary with permissions on who can apply them), and automated data classification that scans columns to detect sensitive data. Tags drive ABAC policies, masks, and filters, while lineage and audit logs provide the compliance evidence trail.
Tags and Classification
A tag is key-value metadata attached to a securable object — a catalog, schema, table, or column. Tags such as classification = 'PII', sensitivity = 'high', or domain = 'finance' let you organize, search, and govern data at scale. Applying a tag requires the APPLY TAG privilege. Tags are searchable in Catalog Explorer, so a privacy officer can instantly list every column tagged as PII across the account.
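As a minimal sketch of the privilege and the different tag scopes described above: the catalog `main`, schema `sales`, table `customers`, and group `privacy-officers` are hypothetical names chosen for illustration, and the grantee is assumed to already hold the USE CATALOG and USE SCHEMA privileges needed to reach the table.

```sql
-- Hypothetical objects: main.sales.customers and the group `privacy-officers`.
-- Allow the group to apply tags to one table.
GRANT APPLY TAG ON TABLE main.sales.customers TO `privacy-officers`;

-- Tag at schema level ...
ALTER SCHEMA main.sales SET TAGS ('domain' = 'finance');

-- ... and at table level.
ALTER TABLE main.sales.customers SET TAGS ('sensitivity' = 'high');
```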
Automated data classification goes further: Databricks scans column data and automatically detects and tags likely sensitive content (emails, SSNs, phone numbers) so you do not have to find PII by hand across thousands of tables.
Governed Tags and Policy Enforcement
Free-form tags drift — three teams might write pii, PII, and personal. Governed tags fix this by defining an account-level vocabulary of allowed tag keys and values, plus permissions controlling who may apply which tags to which objects. This standardization is what makes tags reliable enough to drive automated security.
The operational pattern ties the chapter together:
| Step | Capability |
|---|---|
| Find sensitive data | Automated data classification |
| Label it consistently | Governed tags (e.g., classification=PII) |
| Protect it | ABAC policies, column masks, row filters keyed off the tag |
| Prove compliance | Lineage + audit logs (system.access.audit) |
Because ABAC policies attach to a tag, the moment classification tags a new email column as PII, the masking policy applies automatically — no per-table work.
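A tag-driven masking policy might look roughly like the sketch below. It reflects my understanding of the Databricks ABAC policy DDL (`CREATE POLICY ... MATCH COLUMNS hasTagValue(...)`), so check the exact syntax against current documentation before relying on it. The function `main.governance.mask_pii`, the policy name, and the group names are all hypothetical.

```sql
-- Hypothetical masking function: replaces any string value with a fixed token.
CREATE OR REPLACE FUNCTION main.governance.mask_pii(val STRING)
RETURNS STRING
RETURN '****';

-- Assumed ABAC syntax: mask every column tagged classification=PII in the
-- catalog, for everyone except the (hypothetical) privacy-officers group.
CREATE POLICY mask_pii_columns
ON CATALOG main
COMMENT 'Mask every column tagged classification=PII'
COLUMN MASK main.governance.mask_pii
TO `account users`
EXCEPT `privacy-officers`
FOR TABLES
MATCH COLUMNS hasTagValue('classification', 'PII') AS pii_col
ON COLUMN pii_col;
```

The policy names no table or column explicitly; it matches on the tag, which is why a newly tagged column needs no per-table change.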
Meeting Regulatory Requirements
Regulations such as GDPR, CCPA, and HIPAA require organizations to know where personal data lives, restrict who can access it, and prove that access was controlled. Unity Catalog supports each obligation:
- Data discovery / inventory: classification + tags answer "where is our PII?"
- Access restriction: grants, column masks, row filters, and ABAC policies limit exposure to authorized roles only.
- Right to erasure / minimization: Delta DELETE/MERGE plus VACUUM remove personal data; tags help locate the tables and columns that hold it.
- Auditability: system.access.audit records access events on governed data, and lineage shows where PII propagated downstream. Both are the evidence auditors demand.
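The erasure item above can be sketched as follows, assuming a hypothetical table `main.sales.customers` with a `customer_id` column. Note that DELETE only removes the rows logically; the underlying data files remain available to time travel until VACUUM removes files older than the retention threshold (7 days by default).

```sql
-- Hypothetical table and key value, for illustration only.
-- Step 1: logically delete the data subject's rows.
DELETE FROM main.sales.customers
WHERE customer_id = 1234;

-- Step 2: physically remove data files that are no longer referenced and
-- are older than the retention threshold (default: 7 days).
VACUUM main.sales.customers;
```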
The exam framing: compliance is not a single feature but the combination of classification (find), tags/governed tags (label), masks/filters/ABAC (protect), and audit/lineage (prove). A governance plan that omits any one leg fails an audit — for example, masking PII but having no audit trail leaves you unable to demonstrate controlled access.
Tagging in Practice
Tags are applied with SQL and require the APPLY TAG privilege on the object. A column-level tag looks like:
ALTER TABLE customers
ALTER COLUMN ssn SET TAGS ('classification' = 'PII', 'sensitivity' = 'high');
Once tags are applied, you can query the information schema or use Catalog Explorer search to find every object carrying a given tag. This inventory capability is the foundation of any privacy program: you cannot protect PII you cannot locate.
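One way to build that inventory is to query the column tag view in the information schema. This is a sketch that assumes the tag key and value used in the example above (`classification` = `PII`):

```sql
-- List every column tagged classification=PII across catalogs the caller can see.
SELECT catalog_name,
       schema_name,
       table_name,
       column_name,
       tag_value
FROM system.information_schema.column_tags
WHERE tag_name = 'classification'
  AND tag_value = 'PII'
ORDER BY catalog_name, schema_name, table_name, column_name;
```

Results are limited to objects the querying user has privileges on, so a privacy officer needs broad read access to the metadata for the inventory to be complete.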
Defense in Depth for PII
No single control is sufficient for compliance; Unity Catalog expects layered protection:
| Layer | Control |
|---|---|
| Coarse access | GRANT/REVOKE (only authorized groups reach the table) |
| Fine-grained | Column masks + row filters (limit what those users see) |
| Scaled enforcement | ABAC policies keyed off classification=PII tags |
| Evidence | system.access.audit + lineage |
Because an ABAC policy attaches to a tag, the workflow becomes self-maintaining: when automated classification tags a newly added email column as PII, the masking policy applies to it instantly, with no per-table change. This closes the common gap where new sensitive columns slip through manual review. The exam's compliance message is that governance is operationalized by chaining classification, governed tags, policy enforcement, and audit/lineage into one repeatable system rather than relying on any single feature.
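The coarse and fine-grained layers from the table can be sketched with per-table controls. All object names here (`main.sales.customers`, the `governance` schema, the `region` column, and the groups `analysts` and `privacy-officers`) are hypothetical.

```sql
-- Coarse access: only the analysts group can read the table.
GRANT SELECT ON TABLE main.sales.customers TO `analysts`;

-- Fine-grained: column mask that reveals SSNs only to privacy officers.
CREATE OR REPLACE FUNCTION main.governance.ssn_mask(ssn STRING)
RETURNS STRING
RETURN CASE
  WHEN is_account_group_member('privacy-officers') THEN ssn
  ELSE '***-**-****'
END;

ALTER TABLE main.sales.customers
  ALTER COLUMN ssn SET MASK main.governance.ssn_mask;

-- Fine-grained: row filter that limits everyone else to US rows.
CREATE OR REPLACE FUNCTION main.governance.region_filter(region STRING)
RETURNS BOOLEAN
RETURN is_account_group_member('privacy-officers') OR region = 'US';

ALTER TABLE main.sales.customers
  SET ROW FILTER main.governance.region_filter ON (region);
```

These statements must be repeated for each table, which is the maintenance burden that tag-based ABAC policies are meant to remove.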
The Compliance Workflow End to End
The exam frames PII and regulatory compliance as an integrated workflow rather than a checklist of isolated features. First you discover sensitive data — automated classification scans columns and surfaces likely PII so nothing is missed across thousands of tables. Then you label it consistently using governed tags, an account-controlled vocabulary that prevents the pii/PII/personal drift that would otherwise break automation.
Next you protect it: ABAC policies keyed to the classification=PII tag apply masks and filters automatically, and because policies bind to tags, a newly classified column is protected as soon as it is tagged. Finally you prove it: query system.access.audit for who accessed what and lineage for where the data flowed, which together satisfy the auditability demands of GDPR, CCPA, and HIPAA.
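The evidence step might be sketched with two system-table queries. The table name `main.sales.customers` is hypothetical, and the `full_name_arg` key inside `request_params` is an assumption about how Unity Catalog events record the target object; inspect a few audit rows in your own workspace to confirm which keys are populated.

```sql
-- Who touched the table in the last 30 days?
-- Assumes request_params carries the table name under 'full_name_arg'.
SELECT event_time,
       user_identity.email AS user_email,
       action_name
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND request_params['full_name_arg'] = 'main.sales.customers'
  AND event_date >= current_date() - INTERVAL 30 DAYS
ORDER BY event_time DESC;

-- Where did the data flow downstream?
SELECT DISTINCT target_table_full_name
FROM system.access.table_lineage
WHERE source_table_full_name = 'main.sales.customers'
  AND target_table_full_name IS NOT NULL;
```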
The lesson to carry into the exam is that governance is operationalized by chaining classification, governed tags, policy enforcement, and audit/lineage into one self-maintaining system; remove any leg and an audit fails — masking without an audit trail, or tagging without enforcement, is incomplete.
Compliance quick recap
- Govern PII with a discover → label → protect → prove chain: classify sensitive columns, apply tags, enforce column masks / row filters, and prove it with audit logs and lineage.
- Tag-based (ABAC) policies apply masking consistently wherever a tagged column appears.
- Keep regulated data in Unity Catalog managed tables so governance, lineage, and audit apply uniformly.
What is the purpose of governed tags in Unity Catalog?
How does automated data classification help with PII compliance?
An auditor asks an organization to prove that access to a PII-tagged table was controlled and to show where that data flowed downstream. Which two Unity Catalog capabilities provide this evidence?