5.3 Data Lineage and Audit Logging

Key Takeaways

  • Unity Catalog automatically tracks column-level data lineage across tables, views, notebooks, and jobs.
  • The lineage graph shows how data flows from source tables through transformations to destination tables.
  • Audit logs record every data access event including who accessed what data, when, and from which workspace.
  • System tables (system.access.audit) provide programmatic access to audit log data for compliance reporting.
  • Lineage information is captured at query time — no additional configuration is needed beyond Unity Catalog.
Last updated: March 2026

Data Lineage and Audit Logging

Quick Answer: Unity Catalog automatically captures column-level lineage (how data flows between tables) and audit logs (who accessed what, when). No extra configuration is needed. View lineage in the Catalog Explorer UI or query system tables for programmatic access.

Data Lineage

What Lineage Captures

Lineage TypeDescriptionExample
Table-levelWhich tables feed into which tablesorders → daily_sales
Column-levelWhich source columns map to destination columnsorders.amount → daily_sales.total_revenue
Notebook/JobWhich notebooks and jobs produced a tableJob "daily_etl" wrote to silver.orders
DashboardWhich dashboards consume a table"Executive Dashboard" reads gold.kpis

How Lineage Works

  1. Unity Catalog intercepts every query executed against Unity Catalog tables
  2. The query plan is analyzed to determine source and destination relationships
  3. Column-level mappings are extracted from the query plan
  4. Lineage is stored in Unity Catalog metadata and available in the UI

Viewing Lineage

  • Catalog Explorer: Navigate to a table → click "Lineage" tab
  • Lineage graph: Visual DAG showing upstream and downstream dependencies
  • Column lineage: Click a column to see its source columns across tables

Use Cases for Lineage

  • Impact analysis: "If I change this table, what downstream tables and dashboards will be affected?"
  • Root cause analysis: "Where did this bad data originate?"
  • Compliance: "Prove that PII data flows only to authorized tables"
  • Documentation: Auto-generated data flow documentation

Audit Logging

What Audit Logs Capture

Event CategoryExamples
Data accessSELECT queries, table reads, file downloads
Data modificationINSERT, UPDATE, DELETE, MERGE operations
Permission changesGRANT, REVOKE, ownership transfers
Schema changesCREATE TABLE, ALTER TABLE, DROP TABLE
AuthenticationLogin events, token creation
Workspace eventsCluster creation, job runs, notebook executions

Accessing Audit Logs via System Tables

-- Query audit logs for recent table access events
SELECT
    event_time,
    user_identity.email AS user_email,
    action_name,
    request_params.full_name_arg AS table_name,
    source_ip_address,
    workspace_id
FROM system.access.audit
WHERE action_name IN ('getTable', 'commandSubmit')
    AND event_date >= current_date() - INTERVAL 7 DAYS
ORDER BY event_time DESC;
-- Find all users who accessed a specific table
SELECT DISTINCT
    user_identity.email AS user_email,
    COUNT(*) AS access_count,
    MIN(event_time) AS first_access,
    MAX(event_time) AS last_access
FROM system.access.audit
WHERE request_params.full_name_arg = 'prod.sales.customer_pii'
    AND action_name = 'commandSubmit'
GROUP BY user_identity.email
ORDER BY access_count DESC;

System Tables for Governance

System TablePurpose
system.access.auditAll audit events across workspaces
system.information_schema.tablesMetadata for all tables
system.information_schema.columnsColumn metadata for all tables
system.compute.clustersCluster usage and cost data
system.billing.usageCompute and storage billing data

Compliance Patterns

Data Retention Compliance

-- Track data retention: when was data last accessed?
SELECT
    request_params.full_name_arg AS table_name,
    MAX(event_time) AS last_accessed
FROM system.access.audit
WHERE action_name = 'commandSubmit'
GROUP BY table_name
HAVING last_accessed < current_timestamp() - INTERVAL 90 DAYS;
-- Tables not accessed in 90 days — candidates for archival

Access Review

-- Quarterly access review: who has permissions on sensitive tables?
SHOW GRANTS ON TABLE prod.sales.customer_pii;

-- Verify no unauthorized grants were made
SELECT *
FROM system.access.audit
WHERE action_name = 'updatePermissions'
    AND event_date >= current_date() - INTERVAL 90 DAYS
ORDER BY event_time DESC;

On the Exam: Know that lineage is automatic (no configuration needed), that audit logs are stored in system.access.audit, and that lineage works at both table and column level. Understand common use cases for lineage (impact analysis, root cause analysis).

Test Your Knowledge

A data engineer needs to determine which downstream tables and dashboards would be affected by changing the schema of the "orders" table. Which Unity Catalog feature should they use?

A
B
C
D
Test Your Knowledge

Does Unity Catalog data lineage require any special configuration to enable?

A
B
C
D
Test Your Knowledge

Which system table contains audit log data for tracking who accessed specific data assets in Unity Catalog?

A
B
C
D