5.5 Lakehouse Federation
Key Takeaways
- Lakehouse Federation enables querying external databases (PostgreSQL, MySQL, Snowflake, Redshift, BigQuery, SQL Server) directly from Databricks without data movement.
- Connections define the authentication credentials for accessing external databases, and foreign catalogs expose external database objects in the Unity Catalog namespace.
- Federated queries run through Unity Catalog governance, providing consistent access control and audit logging for external data.
- Query optimization pushes predicates and projections down to the external database to minimize data transfer.
- Lakehouse Federation requires Pro or Serverless SQL warehouses — it does not work with classic SQL warehouses or all-purpose clusters.
Lakehouse Federation
Quick Answer: Lakehouse Federation lets you query external databases (PostgreSQL, MySQL, Snowflake, Redshift, BigQuery) directly from Databricks SQL without moving data. Connections define credentials; foreign catalogs expose external tables in Unity Catalog for governed access.
What Is Lakehouse Federation?
Lakehouse Federation enables federated queries — querying data in external databases as if it were a Unity Catalog table, without ETL or data movement.
Supported External Sources
| Source | Connection Type |
|---|---|
| PostgreSQL | postgresql |
| MySQL | mysql |
| Microsoft SQL Server | sqlserver |
| Azure Synapse | sqldw |
| Amazon Redshift | redshift |
| Snowflake | snowflake |
| Google BigQuery | bigquery |
| Other Databricks workspaces | databricks |
Setup Process
Step 1: Create a Connection
CREATE CONNECTION my_postgres_conn
TYPE postgresql
OPTIONS (
host 'my-postgres-host.example.com',
port '5432',
user 'readonly_user',
password secret('my-scope', 'pg-password')
);
Step 2: Create a Foreign Catalog
-- Map an external database to a Unity Catalog catalog
CREATE FOREIGN CATALOG postgres_data
USING CONNECTION my_postgres_conn
OPTIONS (database 'production_db');
Step 3: Query External Data
-- Query external tables using standard three-level namespace
SELECT * FROM postgres_data.public.customers
WHERE country = 'US';
-- Join external data with Lakehouse data
SELECT
c.customer_name,
o.total_amount
FROM postgres_data.public.customers c
JOIN prod.sales.orders o ON c.customer_id = o.customer_id;
Governance
Federated queries go through Unity Catalog governance:
| Feature | Support |
|---|---|
| Access control | GRANT/REVOKE on foreign catalogs, schemas, and tables |
| Audit logging | All federated queries logged in system.access.audit |
| Data lineage | Lineage tracked for federated queries |
| Consistent namespace | External tables appear in the three-level namespace |
-- Grant access to federated data
GRANT USE CATALOG ON CATALOG postgres_data TO `data-analysts`;
GRANT USE SCHEMA ON SCHEMA postgres_data.public TO `data-analysts`;
GRANT SELECT ON TABLE postgres_data.public.customers TO `data-analysts`;
Query Optimization
Lakehouse Federation optimizes federated queries by:
- Predicate pushdown: WHERE clauses are sent to the external database
- Projection pushdown: Only requested columns are fetched
- Aggregation pushdown: Simple aggregations (COUNT, SUM, AVG) pushed to the source
- Limit pushdown: LIMIT clauses applied at the source
-- This query pushes the WHERE and SELECT to PostgreSQL
-- Only matching rows and selected columns are transferred
SELECT customer_name, email
FROM postgres_data.public.customers
WHERE country = 'US' AND active = true;
Compute Requirements
| Compute Type | Federation Support |
|---|---|
| Serverless SQL warehouse | Supported |
| Pro SQL warehouse | Supported |
| Classic SQL warehouse | NOT supported |
| All-purpose cluster | NOT supported |
| Job cluster | NOT supported |
On the Exam: Know that Lakehouse Federation requires Pro or Serverless SQL warehouses, that external data is accessed without copying, and that Unity Catalog governance (permissions, lineage, auditing) applies to federated queries just like local tables.
Which Databricks compute type supports Lakehouse Federation for querying external databases?
What does "predicate pushdown" mean in the context of Lakehouse Federation?
After creating a foreign catalog in Lakehouse Federation, how do you control who can access the external data?