5.1 Unity Catalog: Architecture and Three-Level Namespace
Key Takeaways
- Unity Catalog is the centralized governance solution for the Databricks Lakehouse, providing unified access control, auditing, lineage, and data discovery.
- The three-level namespace organizes data as catalog.schema.object (e.g., prod_catalog.sales.orders).
- A metastore is the top-level container for Unity Catalog metadata, typically one per cloud region.
- Securable objects include catalogs, schemas, tables, views, volumes, functions, models, connections, and external locations.
- Unity Catalog works across all Databricks workspaces attached to the same metastore, enabling centralized governance.
Quick Answer: Unity Catalog provides centralized governance for the Databricks Lakehouse using a three-level namespace: catalog.schema.object. A metastore is the top-level regional container. Unity Catalog manages access control, data lineage, audit logging, and data discovery across all workspaces.
What Is Unity Catalog?
Unity Catalog is the unified governance layer for all data and AI assets in Databricks. Before Unity Catalog, each workspace had its own Hive metastore with separate access controls. Unity Catalog centralizes governance across workspaces.
Key Capabilities
| Capability | Description |
|---|---|
| Centralized access control | GRANT/REVOKE permissions on any data asset |
| Data lineage | Track how data flows from source to destination |
| Audit logging | Record who accessed what data and when |
| Data discovery | Search and browse data assets across the organization |
| Cross-workspace sharing | Same governance rules apply across all attached workspaces |
| Fine-grained access | Column-level and row-level security (column masks and row filters) |
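
As a minimal sketch of the centralized access control row above, the statements below give a group read access to one table, inspect the result, and then revoke it. The group name `analysts` is hypothetical; the object names reuse this section's examples. Reading a table requires USE CATALOG and USE SCHEMA on its parent containers in addition to SELECT on the table itself.

```sql
-- Assumption: `analysts` is a group that already exists in the account.
GRANT USE CATALOG ON CATALOG prod_catalog TO `analysts`;
GRANT USE SCHEMA ON SCHEMA prod_catalog.sales TO `analysts`;
GRANT SELECT ON TABLE prod_catalog.sales.orders TO `analysts`;

-- Inspect the privileges currently granted on the table
SHOW GRANTS ON TABLE prod_catalog.sales.orders;

-- Remove read access again
REVOKE SELECT ON TABLE prod_catalog.sales.orders FROM `analysts`;
```

Because these grants are stored in the metastore rather than in a single workspace, they take effect in every workspace attached to it.
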
Three-Level Namespace
All data objects in Unity Catalog follow a three-level naming convention:
catalog.schema.object
| Level | Description | Example | Analogy |
|---|---|---|---|
| Catalog | Top-level container for organizing data | prod_catalog | Database server |
| Schema | Logical grouping of related objects within a catalog | sales | Database |
| Object | The actual data asset (table, view, function, etc.) | orders | Table |
-- Full three-level reference
SELECT * FROM prod_catalog.sales.orders;
-- Set default catalog and schema
USE CATALOG prod_catalog;
USE SCHEMA sales;
-- Now you can reference just the object name
SELECT * FROM orders;
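To check which defaults are in effect, something like the following may help. It is a sketch that assumes the session can access the catalogs named in this section, and it uses the built-in `current_catalog()` and `current_schema()` functions.

```sql
USE CATALOG prod_catalog;
USE SCHEMA sales;

-- Show the defaults used to resolve unqualified names
SELECT current_catalog() AS catalog_name, current_schema() AS schema_name;

-- Two-part name (schema.object): resolved against the current catalog
SELECT * FROM marketing.campaigns;

-- A fully qualified name does not depend on the defaults at all
SELECT * FROM dev_catalog.sandbox.test_data;
```

Fully qualified names are the safer choice in shared notebooks and jobs, since they do not depend on whichever `USE` statements ran earlier in the session.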
Metastore
A metastore is the top-level container for Unity Catalog metadata:
- One metastore per cloud region (e.g., one for us-east-1, one for eu-west-1)
- Multiple workspaces can be attached to the same metastore
- Workspaces attached to the same metastore share its catalogs and permissions, so governance rules are defined once and apply in each of them
- The metastore stores metadata only — actual data resides in cloud storage
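One way to confirm which metastore a workspace is attached to is sketched below. It assumes a Unity Catalog-enabled SQL warehouse or cluster.

```sql
-- Returns the ID of the metastore attached to the current workspace
SELECT current_metastore();

-- Lists the catalogs in that metastore that the current user can see
SHOW CATALOGS;
```

Running the first query in two workspaces attached to the same metastore should return the same ID.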
Hierarchy
Metastore (regional)
├── Catalog: prod_catalog
│   ├── Schema: sales
│   │   ├── Table: orders
│   │   ├── Table: customers
│   │   ├── View: high_value_orders
│   │   └── Function: calculate_ltv
│   └── Schema: marketing
│       ├── Table: campaigns
│       └── Table: leads
├── Catalog: dev_catalog
│   └── Schema: sandbox
│       └── Table: test_data
└── Catalog: staging_catalog
    └── Schema: sales
        └── Table: orders
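The hierarchy above can be walked level by level from SQL. The following is a sketch that reuses the example names from the diagram and assumes the current user has privileges to see them.

```sql
-- Schemas inside a catalog
SHOW SCHEMAS IN prod_catalog;

-- Tables and views inside a schema
SHOW TABLES IN prod_catalog.sales;
SHOW VIEWS IN prod_catalog.sales;

-- The same metadata is queryable through the catalog's information_schema
SELECT table_schema, table_name, table_type
FROM prod_catalog.information_schema.tables
WHERE table_schema = 'sales';
```

Results are limited to objects the current user is allowed to see, so two users may get different listings from the same statements.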
Securable Objects
Unity Catalog manages permissions on these object types:
| Object | Level | Description |
|---|---|---|
| Catalog | Top | Container for schemas |
| Schema | Middle | Container for data objects |
| Table | Bottom | Managed or external data tables |
| View | Bottom | Virtual tables (saved queries) |
| Volume | Bottom | Storage for non-tabular files (images, logs, CSVs) |
| Function | Bottom | SQL or Python UDFs |
| Model | Bottom | ML models (MLflow) |
| Connection | Top | Lakehouse Federation connections to external databases |
| External Location | Top | Cloud storage path paired with a storage credential, used for external tables and volumes |
| Storage Credential | Top | Credentials for accessing cloud storage |
| Share | Top | Delta Sharing collections of assets shared with recipients |
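
To make two of the less familiar object types concrete, the sketch below creates a volume and an external location. The names `raw_files`, `landing_zone`, `my_storage_credential`, and the group `analysts` are hypothetical, as is the S3 path. The storage credential is assumed to exist already.

```sql
-- Managed volume for non-tabular files (hypothetical name: raw_files)
CREATE VOLUME IF NOT EXISTS prod_catalog.sales.raw_files;

-- Files in a volume are addressed by path: /Volumes/<catalog>/<schema>/<volume>/
LIST '/Volumes/prod_catalog/sales/raw_files/';

-- Allow a (hypothetical) group to read files in the volume
GRANT READ VOLUME ON VOLUME prod_catalog.sales.raw_files TO `analysts`;

-- External location: a storage path paired with an existing storage credential.
-- Assumption: a storage credential named my_storage_credential already exists.
CREATE EXTERNAL LOCATION IF NOT EXISTS landing_zone
URL 's3://my-bucket/landing/'
WITH (STORAGE CREDENTIAL my_storage_credential);
```

The volume follows the three-level namespace like any table, while the external location is created directly under the metastore and has a single-part name.
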
Managed Storage Locations
| Level | Storage Location | Description |
|---|---|---|
| Metastore | Default managed storage for all catalogs | Fallback if no catalog/schema location set |
| Catalog | Override storage for all schemas in the catalog | Common for environment isolation |
| Schema | Override storage for all tables in the schema | Granular control per schema |
-- Create a catalog with a specific storage location
CREATE CATALOG prod_catalog
MANAGED LOCATION 's3://my-bucket/prod/';
-- Create a schema with a specific storage location
CREATE SCHEMA prod_catalog.sales
MANAGED LOCATION 's3://my-bucket/prod/sales/';
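To see where a managed table actually lands, a check along these lines can be used. It is a sketch: the table `orders_demo` is hypothetical, and the exact subdirectory under the managed location is chosen by Unity Catalog, so no specific path is shown here.

```sql
-- Hypothetical managed table: no LOCATION clause, so Unity Catalog picks the path
CREATE TABLE IF NOT EXISTS prod_catalog.sales.orders_demo (
  order_id BIGINT,
  amount   DECIMAL(10, 2)
);

-- The `location` column in the output shows where the table's files were placed
DESCRIBE DETAIL prod_catalog.sales.orders_demo;

-- Show the schema's own metadata and properties
DESCRIBE SCHEMA EXTENDED prod_catalog.sales;
```

With the schema-level location from the example above in place, the table's files would be expected under `s3://my-bucket/prod/sales/`. Without it, the catalog-level location would apply, and then the metastore-level one.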
On the Exam: Know the three-level namespace (catalog.schema.object), that metastores are regional, and that managed storage can be set at the metastore, catalog, or schema level. Also understand that Unity Catalog governance applies across every workspace attached to the same metastore.
What is the correct three-level namespace for referencing a table called "orders" in the "sales" schema of the "prod" catalog?
Answer: prod.sales.orders
A Databricks deployment spans two cloud regions (us-east-1 and eu-west-1). How many Unity Catalog metastores are needed?
Answer: Two, one per region.
Which Unity Catalog object type is used to store non-tabular files like images, logs, and raw CSVs?
Answer: Volume