In the medallion architecture, which layer holds raw, unprocessed data ingested as-is from source systems?

Bronze. The Bronze layer lands data as-is from sources with little or no transformation, preserving a full replayable history; Silver cleanses and conforms it, and Gold holds curated business-level aggregates.

Which two technologies are the foundational layers that make the lakehouse transactional and governed on Databricks?

Delta Lake for storage transactions and Unity Catalog for unified governance. Delta Lake adds a transaction log on Parquet to deliver ACID reliability, while Unity Catalog provides one governance layer (catalog.schema.table namespace, access control, lineage) across all workspaces.

The Lakehouse Architecture — Free Study Guide 2026

Key Takeaways

The lakehouse unifies the low-cost, open storage of a data lake with the ACID transactions, governance, and performance of a data warehouse on one copy of data.
Delta Lake is the open storage layer that turns commodity object storage (S3, ADLS, GCS) into a transactional lakehouse by adding a JSON transaction log on top of Parquet.
The medallion architecture organizes data into Bronze (raw), Silver (cleansed/conformed), and Gold (business-level aggregate) layers for progressive refinement.
Separating compute from storage lets many compute resources (Spark clusters, job pipelines, and SQL warehouses, with Photon as the accelerated query engine) read the same governed tables without copying data.
Unity Catalog provides one governance layer across all workspaces, so tables, volumes, and models share a single permission and lineage model.

What Problem the Lakehouse Solves

Before the lakehouse, organizations ran two disconnected systems. A data lake stored raw files (JSON, CSV, Parquet) cheaply on object storage but had no transactions, no schema enforcement, and poor query performance — it easily degraded into a "data swamp." A separate data warehouse offered fast SQL, ACID guarantees, and governance, but required copying data into a proprietary format, was expensive, and could not handle unstructured data, machine learning, or streaming well.

The two-system pattern forced teams to maintain brittle ETL that copied data from the lake into the warehouse, creating stale duplicates, extra cost, and governance gaps. The lakehouse collapses these into one architecture: a single copy of data in open formats on cheap object storage, with a transactional metadata layer that delivers warehouse-grade reliability and speed directly on the lake.

The Databricks Data Intelligence Platform

Databricks brands its lakehouse as the Data Intelligence Platform. Its defining traits are:

Capability	How the lakehouse delivers it
Open storage	Data lives as Parquet files in your own cloud object store (S3, ADLS, GCS)
ACID transactions	The Delta Lake transaction log makes concurrent reads/writes safe
Schema enforcement & evolution	Writes that do not match the table schema are rejected; intentional schema changes can be merged in
Decoupled compute	Many engines read the same tables; storage and compute scale independently
Unified governance	Unity Catalog secures tables, volumes, and models across workspaces
All workloads	SQL analytics, ETL, streaming, BI, and ML run on the same data

Because storage is decoupled from compute, you can spin up a SQL warehouse, an all-purpose cluster, and a jobs pipeline that all read the identical governed Delta tables — with no data movement and no copies to keep in sync.
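The difference between schema enforcement and schema evolution in the table above can be illustrated with a small sketch. This is a toy model in plain Python, not the Delta Lake API: the names append_rows, SchemaMismatch, and merge_schema are hypothetical and chosen only for illustration (on Databricks, evolution is typically requested through a write option such as mergeSchema).

```python
# Toy model of schema enforcement vs. schema evolution (not the Delta Lake API).

class SchemaMismatch(Exception):
    """Raised when an incoming row does not match the table schema."""


def append_rows(schema, table_rows, new_rows, merge_schema=False):
    """Validate new_rows against schema and return (schema, rows) after the append.

    schema maps column name -> Python type. With merge_schema=False, any
    unknown column rejects the whole batch (enforcement). With
    merge_schema=True, unknown columns are added to the schema (evolution).
    Type conflicts on existing columns are rejected in both modes.
    """
    evolved = dict(schema)
    for row in new_rows:
        for col, value in row.items():
            if col not in evolved:
                if not merge_schema:
                    raise SchemaMismatch(f"unexpected column: {col}")
                evolved[col] = type(value)
            elif not isinstance(value, evolved[col]):
                raise SchemaMismatch(f"{col}: expected {evolved[col].__name__}")
    # Reached only if every row passed, so the batch is all-or-nothing.
    return evolved, table_rows + [dict(r) for r in new_rows]


schema = {"order_id": int, "amount": float}
rows = [{"order_id": 1, "amount": 9.5}]
batch = [{"order_id": 2, "amount": 3.0, "coupon": "SPRING"}]

try:
    append_rows(schema, rows, batch)            # enforcement: rejected
except SchemaMismatch as err:
    print("rejected:", err)

schema2, rows2 = append_rows(schema, rows, batch, merge_schema=True)
print(sorted(schema2))                          # evolution: coupon was added
```

The key design point is that a rejected batch leaves the existing rows and schema untouched, which mirrors the all-or-nothing behavior a transactional table is expected to provide.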

The Medallion Architecture

Databricks recommends organizing lakehouse data into three quality tiers, the medallion (or multi-hop) architecture. Data flows forward, getting cleaner and more business-ready at each hop:

Bronze (raw): Ingested data landed as-is from source systems, with little or no transformation. Bronze preserves a full, replayable history of what arrived, including ingestion metadata (load time, source file). It is the system of record for reprocessing.
Silver (cleansed/conformed): Data is filtered, deduplicated, type-cast, and joined into a clean, queryable model. Silver enforces data-quality rules and conforms entities (one consistent customer, product, etc.) across sources.
Gold (curated): Business-level aggregates and project-specific tables optimized for analytics, dashboards, and reporting — for example, daily revenue by region or features for a model.

Because validation runs downstream of ingestion, a bad source file contaminates only Bronze; you can fix the transformation logic and rebuild Silver and Gold from Bronze without re-ingesting from the source. This progressive refinement is the backbone of reliable pipelines on the platform.
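The three hops described above can be sketched in a few lines of plain Python. This is a minimal illustration, not Databricks code: the function names, the order fields (order_id, amount, region), and the metadata column names are all assumptions made for the example, and a real pipeline would operate on Delta tables rather than lists.

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_bronze(raw_records, source_file):
    """Land records as-is, adding only ingestion metadata."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    return [
        {"payload": rec, "_source_file": source_file, "_loaded_at": loaded_at}
        for rec in raw_records
    ]


def to_silver(bronze_rows):
    """Cleanse: cast types, drop rows that fail the casts, deduplicate by order_id."""
    seen, silver = set(), []
    for row in bronze_rows:
        payload = row["payload"]
        try:
            order_id = int(payload["order_id"])
            amount = float(payload["amount"])
            region = payload["region"].strip().upper()
        except (KeyError, ValueError, TypeError, AttributeError):
            continue  # the bad record stays in Bronze and goes no further
        if order_id in seen:
            continue  # duplicate delivery of the same order
        seen.add(order_id)
        silver.append({"order_id": order_id, "region": region, "amount": amount})
    return silver


def to_gold(silver_rows):
    """Aggregate to a business-level table: total revenue per region."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


raw = [
    {"order_id": "1", "amount": "10.0", "region": " emea "},
    {"order_id": "1", "amount": "10.0", "region": "emea"},   # duplicate
    {"order_id": "2", "amount": "oops", "region": "amer"},   # bad amount
    {"order_id": "3", "amount": "5.5", "region": "AMER"},
]
bronze = to_bronze(raw, "orders.json")
silver = to_silver(bronze)
gold = to_gold(silver)
print(len(bronze), len(silver), gold)  # 4 2 {'EMEA': 10.0, 'AMER': 5.5}
```

Note that Bronze keeps all four records, including the malformed one, so if the Silver rules change later the downstream layers can be rebuilt from Bronze alone.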

Delta Lake and Unity Catalog as Foundations

Two components make the lakehouse real on Databricks. Delta Lake is the open-source storage layer that adds a transaction log on top of Parquet, providing ACID guarantees, time travel, and schema management — every managed table on Databricks is a Delta table by default. Unity Catalog is the unified governance layer using a three-level namespace, catalog.schema.table, that centralizes access control, auditing, lineage, and discovery across every workspace in an account.
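To make the idea of a transaction log over data files concrete, here is a heavily simplified sketch. It assumes a directory of numbered JSON commit files, each listing data files added or removed; replaying the commits in order yields the set of files that make up the table at a given version. The function names and the flat "add"/"remove" action shape are simplifications for illustration and do not reproduce the actual Delta Lake protocol, which records much richer metadata.

```python
import json
import os
import tempfile


def commit(log_dir, actions):
    """Write the next numbered commit file and return its version."""
    version = len([f for f in os.listdir(log_dir) if f.endswith(".json")])
    path = os.path.join(log_dir, f"{version:020d}.json")
    # Mode "x" creates the file exclusively, so two writers cannot both
    # claim the same version number: the second one gets FileExistsError.
    with open(path, "x") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return version


def snapshot(log_dir, as_of=None):
    """Replay commits in order to find the data files live at a version."""
    live = set()
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".json"):
            continue
        version = int(name.split(".")[0])
        if as_of is not None and version > as_of:
            break
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"])
                elif "remove" in action:
                    live.discard(action["remove"])
    return live


log_dir = tempfile.mkdtemp()
commit(log_dir, [{"add": "part-0.parquet"}, {"add": "part-1.parquet"}])
# An overwrite of part-0 is one atomic commit: remove the old file, add the new.
commit(log_dir, [{"remove": "part-0.parquet"}, {"add": "part-2.parquet"}])

print(sorted(snapshot(log_dir)))           # ['part-1.parquet', 'part-2.parquet']
print(sorted(snapshot(log_dir, as_of=0)))  # ['part-0.parquet', 'part-1.parquet']
```

Readers only ever see files referenced by a complete commit, which is the intuition behind atomic writes, and reading the log up to an earlier version is the intuition behind time travel.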

For the exam, remember the core value proposition: one copy of open-format data, made reliable by Delta Lake, governed by Unity Catalog, and served to every workload — without the cost and drift of a separate warehouse.
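The three-level namespace can also be sketched in code. The example below parses a catalog.schema.table name and checks a grant against it, loosely modelled on the idea that a grant higher in the hierarchy covers the objects beneath it. Everything here is hypothetical (TableName, parse_table_name, can_select, and the sample catalog and group names), and the real Unity Catalog privilege model has additional rules, such as usage privileges on the parent catalog and schema, that this toy omits.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    catalog: str
    schema: str
    table: str


def parse_table_name(full_name):
    """Split 'catalog.schema.table' into its three levels."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return TableName(*parts)


def can_select(grants, principal, name):
    """True if principal holds SELECT on the table or on a parent level.

    grants is a set of (principal, privilege, scope) tuples, where scope is
    a catalog, a catalog.schema, or a full catalog.schema.table name.
    """
    scopes = [
        name.catalog,
        f"{name.catalog}.{name.schema}",
        f"{name.catalog}.{name.schema}.{name.table}",
    ]
    return any((principal, "SELECT", scope) in grants for scope in scopes)


# One shared grant store stands in for the single governance layer.
grants = {("analysts", "SELECT", "main.sales")}

orders = parse_table_name("main.sales.orders")
salaries = parse_table_name("main.hr.salaries")
print(can_select(grants, "analysts", orders))    # True
print(can_select(grants, "analysts", salaries))  # False
```

Because the grants live in one place rather than per workspace, the same answer comes back regardless of which workspace or engine asks, which is the point of centralized governance.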

Decoupled Compute and Storage

A defining property of the lakehouse is the separation of compute from storage. In a classic warehouse the two are bound together, so to query more data you must scale the whole appliance, and idle compute still costs money. On the lakehouse, data sits in your cloud object store as a durable, independent layer, and compute is provisioned separately and on demand. The practical consequences matter for everyday engineering:

Independent scaling: you grow storage simply by writing more files; you grow compute by adding clusters or larger warehouses, and the two never have to move together.
Many engines, one copy: an interactive notebook cluster, a scheduled jobs pipeline, a serverless SQL warehouse, and an external BI tool can all read the same Delta tables concurrently. No engine owns the data, and there is no extract step to keep current.
Elastic cost: compute can autoscale and auto-terminate when idle, so you pay only while actually processing, which a tightly coupled warehouse generally cannot offer.

This design is what lets a single governed table serve SQL analytics, machine learning, and streaming at once. It also explains why the platform talks about workloads rather than databases: the table is the durable asset, and any engine attaches to it.
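The elastic-cost point can be made concrete with some back-of-the-envelope arithmetic. The sketch below compares an always-on compute bill with an on-demand one that auto-terminates after a short idle window. The hourly rate, run counts, and idle timeout are made-up numbers for illustration only, not Databricks pricing, and the function names are hypothetical.

```python
def always_on_cost(rate_per_hour, hours_in_period):
    """Cost of compute that runs for the whole period, busy or not."""
    return rate_per_hour * hours_in_period


def on_demand_cost(rate_per_hour, busy_hours, runs, idle_minutes_before_stop):
    """Cost of compute that starts per run and auto-terminates when idle.

    Each run still pays for the idle window before auto-termination kicks in.
    """
    idle_hours = runs * idle_minutes_before_stop / 60
    return rate_per_hour * (busy_hours + idle_hours)


# Hypothetical month: 30 days, one 2-hour job per day, 10-minute idle timeout.
rate = 4.0  # assumed currency units per compute hour
print(always_on_cost(rate, hours_in_period=30 * 24))   # 2880.0
print(on_demand_cost(rate, busy_hours=60, runs=30,
                     idle_minutes_before_stop=10))     # 260.0
```

Under these assumed numbers the workload is busy for only 60 of 720 hours, so nearly all of the always-on bill pays for idle capacity; the gap narrows as utilization rises.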

How the Pieces Fit Together

Putting the layers in order clarifies the whole architecture. Cloud object storage holds the bytes cheaply and durably. Delta Lake wraps those Parquet files with a transaction log to add ACID reliability, time travel, and schema management. Unity Catalog governs the resulting tables, volumes, and models with one permission and lineage model across every workspace. The medallion architecture organizes the data into Bronze, Silver, and Gold quality tiers as it is refined. Finally, compute — clusters, jobs, and SQL warehouses, accelerated by Photon — runs the actual work against that governed data.

Internalizing this stack, from raw object storage up through governed, query-ready Gold tables, is the mental model the rest of the exam builds on, and it is the reason the lakehouse can replace a separate lake-plus-warehouse pipeline with a single, reliable system.

Test Your Knowledge

What is the central architectural advantage of the lakehouse over the traditional two-tier (data lake + separate data warehouse) approach?

It stores all data in a proprietary warehouse format for faster queries

It maintains one copy of open-format data with ACID reliability, eliminating the need to copy data into a separate warehouse

It removes the need for any transaction log or governance layer

It requires coupling compute and storage so they scale together

Databricks Certified Data Engineer Associate

1. Introduction

2. Domain 1: Databricks Intelligence Platform (10%)

3. Domain 2: Development and Ingestion (30%)

4. Domain 3: Data Processing & Transformations (31%)

5. Domain 4: Productionizing Data Pipelines (18%)

6. Domain 5: Data Governance & Quality (11%)
