5.8 Compliance and PII Data Management

Key Takeaways

  • Tag tables and columns with metadata labels (e.g., "PII", "GDPR", "PHI") in Unity Catalog to track sensitive data across the organization.
  • Information schema tables enable programmatic discovery of all tables and columns tagged with specific sensitivity labels.
  • Right to deletion (GDPR Article 17) can be implemented using DELETE FROM or MERGE INTO with a list of user IDs to remove.
  • VACUUM must be run after deletion to physically remove the data files — Delta time travel could otherwise still access deleted records.
  • Data retention policies should be implemented using scheduled jobs that delete or archive data beyond the retention period.
Last updated: March 2026

Compliance and PII Data Management

Quick Answer: Tag sensitive columns with metadata labels in Unity Catalog for discovery. Implement right-to-deletion with DELETE + VACUUM. Use scheduled jobs for data retention. Combine column masks and row filters for access control.

Tagging Sensitive Data

-- Add tags to tables
ALTER TABLE prod.sales.customers SET TAGS ('sensitivity' = 'PII', 'regulation' = 'GDPR');

-- Add tags to columns
ALTER TABLE prod.sales.customers ALTER COLUMN email SET TAGS ('pii_type' = 'email');
ALTER TABLE prod.sales.customers ALTER COLUMN ssn SET TAGS ('pii_type' = 'ssn', 'sensitivity' = 'high');
ALTER TABLE prod.sales.customers ALTER COLUMN phone SET TAGS ('pii_type' = 'phone');

-- Remove tags
ALTER TABLE prod.sales.customers UNSET TAGS ('sensitivity');

Discovering Tagged Data

-- Find all columns tagged as PII
SELECT
    table_catalog,
    table_schema,
    table_name,
    column_name,
    tag_name,
    tag_value
FROM system.information_schema.column_tags
WHERE tag_name = 'pii_type';

Right to Deletion (GDPR)

-- Step 1: Delete user records
DELETE FROM prod.sales.customers
WHERE customer_id IN (SELECT customer_id FROM deletion_requests);

DELETE FROM prod.sales.orders
WHERE customer_id IN (SELECT customer_id FROM deletion_requests);

-- Step 2: VACUUM to physically remove data files
-- (without VACUUM, time travel could access deleted records)
VACUUM prod.sales.customers RETAIN 0 HOURS;
VACUUM prod.sales.orders RETAIN 0 HOURS;

-- Note: RETAIN 0 HOURS requires disabling the safety check:
-- SET spark.databricks.delta.retentionDurationCheck.enabled = false;

Important: VACUUM After Deletion

  • DELETE removes records from the logical table (current version)
  • But the data still exists in old Parquet files (accessible via time travel)
  • VACUUM physically removes the old files, making the deletion permanent
  • For GDPR compliance, VACUUM must run with a short retention period

Data Retention Policies

-- Scheduled job: delete data older than 2 years
DELETE FROM prod.logs.user_activity
WHERE event_date < date_sub(current_date(), 730);

-- Archive before deletion
INSERT INTO prod.archive.user_activity
SELECT * FROM prod.logs.user_activity
WHERE event_date < date_sub(current_date(), 730);

DELETE FROM prod.logs.user_activity
WHERE event_date < date_sub(current_date(), 730);

VACUUM prod.logs.user_activity;

Compliance Checklist

RequirementImplementation
Data discoveryTag tables/columns with sensitivity labels
Access controlGRANT/REVOKE + column masks + row filters
Audit trailUnity Catalog audit logs (system.access.audit)
Data lineageAutomatic Unity Catalog lineage
Right to deletionDELETE + VACUUM
Data retentionScheduled deletion/archival jobs
EncryptionCloud-native encryption at rest + in transit
Cross-borderRegional metastores + location-aware catalogs

On the Exam: Understand that DELETE alone is not sufficient for GDPR compliance — VACUUM must also run to physically remove data from storage. Know that tags help discover and classify sensitive data across the organization.

Test Your Knowledge

A company receives a GDPR deletion request. A data engineer runs DELETE FROM customers WHERE customer_id = 12345. Is the data fully deleted?

A
B
C
D
Test Your Knowledge

How can a data governance team find all columns in the Lakehouse that contain email addresses?

A
B
C
D
Congratulations!

You've completed this section

Continue exploring other exams