3.1 The Medallion Architecture (Bronze, Silver, Gold)

Key Takeaways

The medallion architecture organizes data into three progressive quality layers: bronze (raw), silver (validated/cleansed), and gold (business-ready aggregates).
Bronze tables capture source data exactly as ingested, typically append-only, preserving the raw record so the pipeline can be reprocessed or audited later.
Silver tables enforce schema, deduplicate, cast types, join, and apply quality rules to produce a conformed enterprise view of business entities.
Gold tables hold curated, usually aggregated data modeled for dashboards, reporting, and ML features, often with one gold table per use case.
Each hop incrementally raises data quality, and the same layered design supports both batch and streaming ingestion in the lakehouse.

Last updated: June 2026

What the Medallion Architecture Solves

The medallion architecture (also called the multi-hop architecture) is a data design pattern that incrementally improves the structure and quality of data as it flows through successive layers in the lakehouse. Each layer is a set of Delta Lake tables, and data moves "hop by hop" from raw to refined. The exam expects you to identify which layer a given table belongs to from its characteristics, not just to recite definitions.

The pattern exists because raw source data is messy: it has malformed records, duplicates, inconsistent types, and missing values. Rather than fixing everything in one giant transformation, you isolate concerns across three named layers. This makes pipelines easier to debug, lets you reprocess from a known-good intermediate state, and gives different consumers a table at the right quality level for their needs.

Layer	Also called	Quality	Typical write mode	Primary consumer
Bronze	Raw / landing	Lowest — as ingested	Append-only	Data engineers, reprocessing
Silver	Cleansed / conformed	Validated, deduplicated	Merge/upsert	Analysts, data scientists
Gold	Curated / aggregated	Highest — business-ready	Overwrite / merge	BI dashboards, ML, executives
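The three write modes in the table can be modeled with plain Python lists and dicts. This is a minimal sketch of the semantics only, not Databricks code. It assumes each table is an in-memory list of row dicts and that `id` is the merge key (both hypothetical). On Databricks, these modes correspond roughly to appending to a Delta table, running `MERGE INTO`, and overwriting the table.

```python
from typing import Dict, List

Row = Dict[str, object]


def append(table: List[Row], new_rows: List[Row]) -> List[Row]:
    """Bronze-style append: existing rows are never modified."""
    return table + new_rows


def merge(table: List[Row], new_rows: List[Row], key: str) -> List[Row]:
    """Silver-style upsert: update rows whose key matches, insert the rest."""
    by_key = {row[key]: row for row in table}
    for row in new_rows:
        # Build a new dict so the caller's rows are not mutated.
        by_key[row[key]] = {**by_key.get(row[key], {}), **row}
    return list(by_key.values())


def overwrite(table: List[Row], new_rows: List[Row]) -> List[Row]:
    """Gold-style full refresh: the old contents are discarded entirely."""
    return list(new_rows)


existing = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
incoming = [{"id": 2, "amount": 25}, {"id": 3, "amount": 30}]

print(len(append(existing, incoming)))      # 4: the second copy of id 2 is kept
print(merge(existing, incoming, key="id"))  # id 2 updated, id 3 inserted
print(overwrite(existing, incoming))        # only the incoming rows remain
```

The append result keeps both versions of `id` 2, which is exactly why deduplication is deferred to silver.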

Bronze Layer: Raw Ingestion

The bronze layer stores data exactly as it arrives from source systems — JSON, CSV, Parquet, CDC feeds, IoT events — with little or no transformation. Bronze tables are almost always append-only and frequently add metadata columns such as the ingestion timestamp (_ingest_time), source file name (_metadata.file_path from Auto Loader), and processing date. Preserving the unmodified raw record is the whole point: if a downstream bug is discovered, you can replay everything from bronze without re-pulling from the (possibly transient) source.

Because bronze is the system of record for raw data, you generally do not delete or update bronze rows. Schema is kept permissive — Auto Loader can even rescue unexpected columns into a _rescued_data column rather than failing the load.
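To make the bronze contract concrete, here is a plain-Python sketch (not the Auto Loader API) of an append-only load. It keeps every source value untouched, adds ingestion metadata, and moves unexpected fields into a `_rescued_data` column instead of failing. The expected schema, the `_source_file` and `_ingest_time` column names, and the file path are hypothetical. On Databricks, Auto Loader and the `_metadata` column provide this behavior for you.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Set

# Hypothetical expected schema for a click-event source.
EXPECTED_COLUMNS: Set[str] = {"event_id", "user_id", "ts"}


def ingest_bronze(lines: List[str], source_file: str,
                  ingest_time: datetime) -> List[Dict[str, object]]:
    """Append-only bronze load: keep values as-is, add metadata, never drop fields."""
    rows = []
    for line in lines:
        record = json.loads(line)
        # Known columns are copied untouched: no casting, no cleaning.
        row = {col: record.get(col) for col in sorted(EXPECTED_COLUMNS)}
        # Unexpected fields are preserved as a JSON string rather than failing the load.
        extras = {k: v for k, v in record.items() if k not in EXPECTED_COLUMNS}
        row["_rescued_data"] = json.dumps(extras) if extras else None
        # Ingestion metadata (hypothetical column names).
        row["_ingest_time"] = ingest_time.isoformat()
        row["_source_file"] = source_file
        rows.append(row)
    return rows


raw = [
    '{"event_id": "e1", "user_id": "u1", "ts": "2026-06-01T10:00:00"}',
    '{"event_id": "e2", "user_id": "u2", "ts": "2026-06-01T10:05:00", "device": "ios"}',
]
bronze = ingest_bronze(raw, "landing/clicks/part-0001.json",
                       datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc))
print(bronze[1]["_rescued_data"])  # {"device": "ios"}
```

Note that `ts` stays a string here. Casting it to a real timestamp is deliberately left to silver.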

Silver Layer: Cleansed and Conformed

The silver layer is where data becomes trustworthy. Transformations applied here typically include:

Schema enforcement and explicit type casting (string timestamps to real TIMESTAMP, etc.).
Deduplication of records that arrived multiple times.
Null handling, filtering of invalid rows, and applying data-quality expectations.
Joins and enrichment that combine multiple bronze sources into conformed business entities (a single customers or orders view across systems).

Silver gives the organization a clean, queryable, enterprise-wide view. It is detailed (row-level) rather than aggregated, so analysts and data scientists can slice it however they need. Silver tables are commonly written with MERGE INTO (upsert) so that updates and late-arriving changes are reflected correctly.
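The silver steps listed above (casting, filtering invalid rows, deduplication, and an upsert) can be sketched in plain Python. This is an illustrative model, not Databricks code. The order schema, the `order_id` key, and the rule "keep the row with the latest `updated_at`" are assumptions chosen for the example. In a lakehouse, the final step would be a `MERGE INTO` against the silver Delta table.

```python
from datetime import datetime
from typing import Dict, List, Optional

Row = Dict[str, object]


def to_silver_row(raw: Row) -> Optional[Row]:
    """Enforce schema: cast types and reject rows that fail basic quality rules."""
    if not raw.get("order_id") or raw.get("amount") in (None, ""):
        return None  # invalid row is filtered out (a real pipeline might quarantine it)
    try:
        return {
            "order_id": str(raw["order_id"]),
            "amount": float(raw["amount"]),                                # string -> number
            "updated_at": datetime.fromisoformat(str(raw["updated_at"])),  # string -> timestamp
        }
    except (KeyError, ValueError):
        return None  # uncastable values are treated as invalid too


def merge_into_silver(silver: Dict[str, Row], bronze_batch: List[Row]) -> Dict[str, Row]:
    """Upsert keyed on order_id, keeping only the most recent version of each order."""
    result = dict(silver)
    for raw in bronze_batch:
        row = to_silver_row(raw)
        if row is None:
            continue
        current = result.get(row["order_id"])
        # Exact duplicates and late-arriving older versions never replace newer data.
        if current is None or row["updated_at"] > current["updated_at"]:
            result[row["order_id"]] = row
    return result


batch = [
    {"order_id": "A1", "amount": "19.99", "updated_at": "2026-06-01T10:00:00"},
    {"order_id": "A1", "amount": "19.99", "updated_at": "2026-06-01T10:00:00"},  # duplicate
    {"order_id": "A2", "amount": "5.00", "updated_at": "2026-06-01T11:00:00"},
    {"order_id": "A1", "amount": "24.99", "updated_at": "2026-06-01T12:00:00"},  # update
    {"order_id": "A3", "amount": "", "updated_at": "2026-06-01T12:30:00"},       # invalid
]
silver = merge_into_silver({}, batch)
print(sorted(silver))          # ['A1', 'A2']
print(silver["A1"]["amount"])  # 24.99
```

The result stays row-level (one row per order), consistent with silver being detailed rather than aggregated.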

Gold Layer: Business-Ready Aggregates

The gold layer contains highly refined, aggregated data modeled for a specific business purpose. Where silver is general-purpose, gold is purpose-built: a daily-revenue-by-region table, a churn-feature table for an ML model, or a star-schema fact/dimension set for a BI tool. Gold tables apply the heavy GROUP BY, window functions, and joins that power dashboards, so the BI layer reads pre-computed results instead of scanning raw data.

A single silver table often feeds many gold tables, each shaped for one consumer. Because gold is derived, it can be rebuilt from silver at any time, which is why materialized views are a natural fit for gold in declarative pipelines.
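A gold table such as the daily-revenue-by-region example is an aggregation over silver. The sketch below mirrors a `GROUP BY` in plain Python. The output shape and column names are hypothetical. In a declarative pipeline, the same logic would typically be a materialized view defined in SQL or PySpark.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

Row = Dict[str, object]


def build_gold_daily_revenue(silver_orders: List[Row]) -> List[Row]:
    """Roughly: SELECT date, region, SUM(amount), COUNT(*) ... GROUP BY date, region."""
    revenue: Dict[Tuple[str, str], float] = defaultdict(float)
    orders: Dict[Tuple[str, str], int] = defaultdict(int)
    for row in silver_orders:
        key = (row["updated_at"].date().isoformat(), row["region"])
        revenue[key] += row["amount"]
        orders[key] += 1
    # Gold is fully derived, so each run rebuilds (overwrites) it from silver.
    return [
        {"order_date": day, "region": region,
         "revenue": round(revenue[(day, region)], 2), "orders": orders[(day, region)]}
        for day, region in sorted(revenue)
    ]


silver_orders = [
    {"order_id": "A1", "region": "EU", "amount": 24.99, "updated_at": datetime(2026, 6, 1, 12)},
    {"order_id": "A2", "region": "EU", "amount": 5.00, "updated_at": datetime(2026, 6, 1, 11)},
    {"order_id": "A4", "region": "US", "amount": 12.50, "updated_at": datetime(2026, 6, 2, 9)},
]
for row in build_gold_daily_revenue(silver_orders):
    print(row)
```

Because the function only reads silver, calling it again after a silver fix simply produces the corrected gold table.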

Why the layering matters on the exam

Quality increases left to right — never the reverse. You don't write "cleaned" data back into bronze.
Each hop is reproducible — you can truncate and rebuild silver/gold from the upstream layer.
Batch and streaming both fit — a bronze table can be a streaming table fed by Auto Loader while gold is a batch-refreshed materialized view.
Don't skip layers. Be wary of answer choices that have dashboards reading directly from bronze; that violates the pattern's intent.
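The reproducibility and batch/streaming points can be checked with a toy model. Processing bronze incrementally in micro-batches should yield the same silver state as truncating silver and replaying all of bronze. This is a plain-Python sketch, not Structured Streaming. The event schema and the last-write-wins dedup rule are assumptions. In a real pipeline, this property usually comes from keyed upserts such as `MERGE INTO`.

```python
from typing import Dict, List

Row = Dict[str, str]


def bronze_to_silver(silver: Dict[str, float], batch: List[Row]) -> Dict[str, float]:
    """Incremental hop: cast amounts and dedupe on event_id (last write wins)."""
    updated = dict(silver)
    for row in batch:
        updated[row["event_id"]] = float(row["amount"])
    return updated


def silver_to_gold(silver: Dict[str, float]) -> Dict[str, object]:
    """Batch hop: recompute the aggregate from scratch on every refresh."""
    return {"event_count": len(silver), "total_amount": round(sum(silver.values()), 2)}


bronze: List[Row] = [
    {"event_id": "e1", "amount": "10.00"},
    {"event_id": "e2", "amount": "2.50"},
    {"event_id": "e1", "amount": "10.00"},  # duplicate delivery from the source
    {"event_id": "e3", "amount": "7.25"},
]

# Streaming style: process bronze as two micro-batches.
silver_incremental = bronze_to_silver({}, bronze[:2])
silver_incremental = bronze_to_silver(silver_incremental, bronze[2:])

# Truncate and rebuild: replay all of bronze into an empty silver table.
silver_rebuilt = bronze_to_silver({}, bronze)

assert silver_incremental == silver_rebuilt
print(silver_to_gold(silver_rebuilt))  # {'event_count': 3, 'total_amount': 19.75}
```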

How the Layers Interact in Practice

A realistic lakehouse rarely has just three tables; it has many bronze ingests, several conformed silver entities, and dozens of gold marts. The medallion pattern keeps this manageable because each table has a single, well-understood quality contract. Consider a retail example:

Bronze holds bronze_orders, bronze_inventory, and bronze_clickstream, each an append-only landing table fed by Auto Loader from a different source.
Silver produces silver_orders (typed, deduplicated, with currency normalized) and silver_customers (a conformed view joining CRM and billing sources). Silver is the layer most users actually query for ad-hoc analysis.
Gold builds gold_daily_revenue, gold_customer_360, and a gold_churn_features table for the ML team — each derived from silver.
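The conformed silver_customers entity from this example amounts to a join across sources after their keys have been normalized. Below is a plain-Python sketch. The source column names (`full_name`, `contact_email`, `account_no`) and the choice of email as the matching key are hypothetical. Real entity resolution across systems is usually more involved.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def normalize_email(email: Optional[str]) -> Optional[str]:
    """Conform the join key: the two sources format emails differently."""
    return email.strip().lower() if email else None


def build_silver_customers(crm: List[Row], billing: List[Row]) -> List[Row]:
    """Full outer join of CRM and billing records on normalized email."""
    by_email: Dict[str, Row] = {}
    for rec in crm:
        key = normalize_email(rec.get("email"))
        if key:
            by_email.setdefault(key, {"email": key})["name"] = rec.get("full_name")
    for rec in billing:
        key = normalize_email(rec.get("contact_email"))
        if key:
            by_email.setdefault(key, {"email": key})["billing_account"] = rec.get("account_no")
    # Conformed schema: every customer row has the same columns, with None for gaps.
    return [
        {"email": k, "name": v.get("name"), "billing_account": v.get("billing_account")}
        for k, v in sorted(by_email.items())
    ]


crm = [{"full_name": "Ada Lovelace", "email": "Ada@Example.com "}]
billing = [
    {"account_no": "B-100", "contact_email": "ada@example.com"},
    {"account_no": "B-200", "contact_email": "grace@example.com"},
]
for customer in build_silver_customers(crm, billing):
    print(customer)
```

The output is still one row per customer rather than an aggregate. Gold tables such as gold_customer_360 would then be built on top of it.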

Common exam misconceptions

Gold is not always the smallest. It can be large if it stores fine-grained features; "gold" denotes purpose-built and curated, not small.
Silver is row-level, not aggregated. Aggregation is gold's job; a silver table that is already grouped is a design smell.
Bronze is not throwaway. It is the durable raw record that makes the whole pipeline replayable, so it is retained, not discarded after silver loads.
The medallion pattern is a convention, not a Databricks-enforced feature — Delta tables are just Delta tables; the layering is how you organize them. The exam still expects you to apply the convention correctly.

Test Your Knowledge

A pipeline ingests raw JSON click events directly from cloud storage and writes them, untransformed, to a Delta table with an added ingestion-timestamp column. Which medallion layer is this table?

Gold, because it is the first table queried by dashboards

Silver, because a column was added during ingestion

Bronze, because it stores raw source data as ingested with minimal transformation

It is outside the medallion architecture since it uses Auto Loader

Test Your Knowledge

Which transformation is MOST characteristic of moving data from the silver layer to the gold layer?

Aggregating detailed records into business-level metrics with GROUP BY for a specific dashboard

Rescuing unexpected columns into a _rescued_data field

Appending raw events without modification

Adding the source file name as a metadata column

Test Your Knowledge

Why is the bronze layer typically kept append-only and minimally transformed?

Because Delta Lake does not support updates on bronze tables

So the raw record is preserved and the entire pipeline can be reprocessed from bronze without re-pulling from the source

Because dashboards read directly from bronze and need stable schemas

To force all deduplication to happen in the gold layer

Up Next

3.2 Delta Lake Optimization: OPTIMIZE, VACUUM, and Liquid Clustering

Databricks Certified Data Engineer Associate
