3.6 Implementing a Multi-Hop Pipeline

Key Takeaways

A multi-hop pipeline implements the medallion architecture as one declarative pipeline of bronze, silver, and gold datasets with automatic dependency resolution.
Bronze ingests raw files with Auto Loader via read_files (SQL) or cloudFiles (Python) into an append-only streaming table.
Silver applies expectations, type casting, deduplication, and STREAM() reads to produce a cleansed streaming table.
Gold builds business aggregations and KPIs as materialized views using JOINs and GROUP BY over silver.
The whole bronze-to-gold flow is declared once and executed as a single managed unit, with quality metrics in the event log.

Last updated: June 2026

Putting Medallion + Declarative Pipelines Together

A multi-hop pipeline is the medallion architecture expressed as a single Lakeflow Declarative Pipeline. Instead of three separate jobs, you declare bronze, silver, and gold datasets in one pipeline and let the framework resolve their order and run them as a managed unit. This is the most-tested end-to-end scenario in this domain, so know each hop's responsibilities.

Bronze: Raw Ingestion with Auto Loader

The bronze hop ingests raw files incrementally. In SQL you use read_files (Auto Loader under the hood); in Python you use cloudFiles. Bronze is an append-only streaming table that keeps the raw record plus ingestion metadata.

CREATE OR REFRESH STREAMING TABLE bronze_orders
AS SELECT *, current_timestamp() AS ingest_ts, _metadata.file_path AS src_file
FROM STREAM read_files('/landing/orders', format => 'json');

Because Auto Loader checkpoints which files it has consumed, each run picks up only newly arrived files — true incremental ingestion with no manual file bookkeeping.

Silver: Cleanse, Enforce, Deduplicate

The silver hop reads from bronze using STREAM(bronze_orders) and applies quality and shaping logic: expectations to drop or fail on bad rows, explicit type casts, null handling, and deduplication.

CREATE OR REFRESH STREAMING TABLE silver_orders (
  CONSTRAINT valid_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW,
  CONSTRAINT valid_amt EXPECT (amount > 0)
)
AS SELECT
  order_id,
  CAST(amount AS DECIMAL(10,2)) AS amount,
  to_timestamp(order_ts) AS order_ts,
  customer_id
FROM STREAM(bronze_orders);

Silver is where the data becomes analytics-grade: conformed, typed, validated, and deduplicated. Note the use of STREAM(...) to declare a streaming read from an upstream pipeline table — a frequent syntax point on the exam.

Gold: Business Aggregations

The gold hop produces curated metrics as materialized views, joining and aggregating silver into business KPIs.

CREATE OR REFRESH MATERIALIZED VIEW gold_daily_revenue
AS SELECT
  date(order_ts) AS day,
  customer_region,
  SUM(amount) AS revenue,
  COUNT(*) AS order_count
FROM silver_orders s
JOIN dim_customers c ON s.customer_id = c.id
GROUP BY date(order_ts), customer_region;

One Declared Unit, Automatically Orchestrated

You define all three datasets in the same pipeline and never specify run order. The framework reads the queries, sees silver_orders depends on bronze_orders and gold_daily_revenue depends on silver_orders, and constructs the DAG:

read_files (Auto Loader)
      |
  bronze_orders  (streaming table, append-only)
      |
  silver_orders  (streaming table, expectations + dedupe + casts)
      |
  gold_daily_revenue  (materialized view, JOIN + GROUP BY)

Hop	Dataset type	Key operations
Bronze	Streaming table	Auto Loader ingest, raw + metadata
Silver	Streaming table	Expectations, casting, dedupe, STREAM() read
Gold	Materialized view	JOIN, GROUP BY, KPIs

Because it is one managed pipeline, you get unified observability: the event log records lineage, data-quality expectation metrics, and run history across all hops. Triggered mode refreshes the whole chain on a schedule; continuous mode keeps it updating in near real time. The same pattern scales by simply adding more silver or gold datasets that read from existing tables.

Operational Benefits and a Worked Walkthrough

Building the multi-hop flow as one declarative pipeline (rather than three hand-orchestrated jobs) delivers concrete advantages the exam emphasizes:

Single source of truth for orchestration — no external scheduler wiring bronze to silver to gold; the DAG is derived from the SQL.
Built-in data quality — expectations at the silver hop quarantine or reject bad rows and surface metrics in the event log.
Automatic recovery — in production mode the framework retries transient failures and restarts compute.
Lineage and observability — you can trace a gold metric back through silver to the exact bronze file that produced it.

End-to-end flow for an orders pipeline

Files land in /landing/orders as JSON throughout the day.
Bronze (bronze_orders, streaming table) uses read_files to ingest only new files, adding ingest_ts and src_file.
Silver (silver_orders, streaming table) reads STREAM(bronze_orders), drops rows failing order_id IS NOT NULL, casts amount to DECIMAL, and deduplicates by keeping the latest event per order_id.
Gold (gold_daily_revenue, materialized view) joins silver_orders to a customer dimension and aggregates revenue by day and region.

Run the pipeline in triggered mode each morning and it refreshes the entire chain incrementally; switch to continuous for near-real-time dashboards. Adding a second gold mart later requires only one more CREATE OR REFRESH MATERIALIZED VIEW that reads silver — the framework slots it into the DAG with no other changes.

Design Decisions Within a Multi-Hop Pipeline

Several implementation choices recur in exam scenarios, and each maps to a specific layer:

Decision	Where it belongs	Why
Ingest new files only	Bronze	Auto Loader / read_files track processed files
Reject or quarantine bad rows	Silver	Expectations with ON VIOLATION DROP/FAIL
Deduplicate by business key	Silver	ROW_NUMBER + STREAM read before downstream
Type casting and normalization	Silver	Produce conformed, analytics-ready schema
Aggregate to KPIs	Gold	Materialized view with GROUP BY
Join to dimensions	Gold (or silver enrich)	Combine conformed entities for reporting

Streaming-table vs materialized-view placement

Within the same pipeline you mix dataset types deliberately. Bronze and silver are usually streaming tables because they process appends incrementally and cheaply. Gold is usually a materialized view because business metrics must reflect the current full state of silver — including any corrections that flowed through as updates. A common mistake is making a gold aggregation a streaming table; an append-only streaming aggregate cannot easily recompute totals when upstream rows change, whereas a materialized view simply re-derives the result.

Matching dataset type to the layer's refresh semantics is exactly what the multi-hop implementation objective tests.

Test Your Knowledge

In a SQL declarative multi-hop pipeline, which function is used in the BRONZE table to incrementally ingest raw files from cloud storage with Auto Loader?

read_files() with a STREAM read

PIVOT()

MERGE INTO

VACUUM

Test Your Knowledge

Which dataset type and operation are MOST appropriate for the gold table in a multi-hop pipeline?

A streaming table that appends raw events

A materialized view that JOINs and aggregates silver data into business KPIs

A temporary view that rescues malformed columns

A bronze streaming table fed by Auto Loader

Test Your Knowledge

How are the bronze, silver, and gold tables in a single declarative multi-hop pipeline scheduled relative to one another?

The engineer manually triggers each table in sequence

All three always run at exactly the same time

The framework infers dependencies from the queries and runs bronze before silver before gold

Gold runs first to pre-allocate aggregates

Test Your Knowledge

Which operational benefit do you get from building the bronze-silver-gold flow as a single declarative pipeline rather than three separately scheduled jobs?

It eliminates the need to store any data in Delta format

It forces all tables to be materialized views

It prevents the use of data-quality expectations

Orchestration, dependency ordering, retries, and end-to-end lineage are handled automatically from the queries

Up Next

3.7 Slowly Changing Dimensions (SCD) Patterns

Continue learning

Databricks Certified Data Engineer Associate

Databricks Certified Data Engineer Associate

3.6 Implementing a Multi-Hop Pipeline

Key Takeaways

Putting Medallion + Declarative Pipelines Together

Bronze: Raw Ingestion with Auto Loader

Silver: Cleanse, Enforce, Deduplicate

Gold: Business Aggregations

One Declared Unit, Automatically Orchestrated

Operational Benefits and a Worked Walkthrough

End-to-end flow for an orders pipeline

Design Decisions Within a Multi-Hop Pipeline

Streaming-table vs materialized-view placement

Databricks Certified Data Engineer Associate

1Introduction

2Domain 1: Databricks Intelligence Platform (10%)

3Domain 2: Development and Ingestion (30%)

4Domain 3: Data Processing & Transformations (31%)

5Domain 4: Productionizing Data Pipelines (18%)

6Domain 5: Data Governance & Quality (11%)

Databricks Certified Data Engineer Associate

3.6 Implementing a Multi-Hop Pipeline

Key Takeaways

Putting Medallion + Declarative Pipelines Together

Bronze: Raw Ingestion with Auto Loader

Silver: Cleanse, Enforce, Deduplicate

Gold: Business Aggregations

One Declared Unit, Automatically Orchestrated

Operational Benefits and a Worked Walkthrough

End-to-end flow for an orders pipeline

Design Decisions Within a Multi-Hop Pipeline

Streaming-table vs materialized-view placement