5.3 Data Lineage and Audit Logging
Key Takeaways
- Unity Catalog automatically tracks column-level data lineage across tables, views, notebooks, and jobs.
- The lineage graph shows how data flows from source tables through transformations to destination tables.
- Audit logs record every data access event including who accessed what data, when, and from which workspace.
- System tables (system.access.audit) provide programmatic access to audit log data for compliance reporting.
- Lineage information is captured at query time — no additional configuration is needed beyond Unity Catalog.
Data Lineage and Audit Logging
Quick Answer: Unity Catalog automatically captures column-level lineage (how data flows between tables) and audit logs (who accessed what, when). No extra configuration is needed. View lineage in the Catalog Explorer UI or query system tables for programmatic access.
Data Lineage
What Lineage Captures
| Lineage Type | Description | Example |
|---|---|---|
| Table-level | Which tables feed into which tables | orders → daily_sales |
| Column-level | Which source columns map to destination columns | orders.amount → daily_sales.total_revenue |
| Notebook/Job | Which notebooks and jobs produced a table | Job "daily_etl" wrote to silver.orders |
| Dashboard | Which dashboards consume a table | "Executive Dashboard" reads gold.kpis |
How Lineage Works
- Unity Catalog intercepts every query executed against Unity Catalog tables
- The query plan is analyzed to determine source and destination relationships
- Column-level mappings are extracted from the query plan
- Lineage is stored in Unity Catalog metadata and available in the UI
Viewing Lineage
- Catalog Explorer: Navigate to a table → click "Lineage" tab
- Lineage graph: Visual DAG showing upstream and downstream dependencies
- Column lineage: Click a column to see its source columns across tables
Use Cases for Lineage
- Impact analysis: "If I change this table, what downstream tables and dashboards will be affected?"
- Root cause analysis: "Where did this bad data originate?"
- Compliance: "Prove that PII data flows only to authorized tables"
- Documentation: Auto-generated data flow documentation
Audit Logging
What Audit Logs Capture
| Event Category | Examples |
|---|---|
| Data access | SELECT queries, table reads, file downloads |
| Data modification | INSERT, UPDATE, DELETE, MERGE operations |
| Permission changes | GRANT, REVOKE, ownership transfers |
| Schema changes | CREATE TABLE, ALTER TABLE, DROP TABLE |
| Authentication | Login events, token creation |
| Workspace events | Cluster creation, job runs, notebook executions |
Accessing Audit Logs via System Tables
-- Query audit logs for recent table access events
SELECT
event_time,
user_identity.email AS user_email,
action_name,
request_params.full_name_arg AS table_name,
source_ip_address,
workspace_id
FROM system.access.audit
WHERE action_name IN ('getTable', 'commandSubmit')
AND event_date >= current_date() - INTERVAL 7 DAYS
ORDER BY event_time DESC;
-- Find all users who accessed a specific table
SELECT DISTINCT
user_identity.email AS user_email,
COUNT(*) AS access_count,
MIN(event_time) AS first_access,
MAX(event_time) AS last_access
FROM system.access.audit
WHERE request_params.full_name_arg = 'prod.sales.customer_pii'
AND action_name = 'commandSubmit'
GROUP BY user_identity.email
ORDER BY access_count DESC;
System Tables for Governance
| System Table | Purpose |
|---|---|
| system.access.audit | All audit events across workspaces |
| system.information_schema.tables | Metadata for all tables |
| system.information_schema.columns | Column metadata for all tables |
| system.compute.clusters | Cluster usage and cost data |
| system.billing.usage | Compute and storage billing data |
Compliance Patterns
Data Retention Compliance
-- Track data retention: when was data last accessed?
SELECT
request_params.full_name_arg AS table_name,
MAX(event_time) AS last_accessed
FROM system.access.audit
WHERE action_name = 'commandSubmit'
GROUP BY table_name
HAVING last_accessed < current_timestamp() - INTERVAL 90 DAYS;
-- Tables not accessed in 90 days — candidates for archival
Access Review
-- Quarterly access review: who has permissions on sensitive tables?
SHOW GRANTS ON TABLE prod.sales.customer_pii;
-- Verify no unauthorized grants were made
SELECT *
FROM system.access.audit
WHERE action_name = 'updatePermissions'
AND event_date >= current_date() - INTERVAL 90 DAYS
ORDER BY event_time DESC;
On the Exam: Know that lineage is automatic (no configuration needed), that audit logs are stored in system.access.audit, and that lineage works at both table and column level. Understand common use cases for lineage (impact analysis, root cause analysis).
A data engineer needs to determine which downstream tables and dashboards would be affected by changing the schema of the "orders" table. Which Unity Catalog feature should they use?
Does Unity Catalog data lineage require any special configuration to enable?
Which system table contains audit log data for tracking who accessed specific data assets in Unity Catalog?