3.6 Implementing a Multi-Hop Pipeline

Key Takeaways

  • A multi-hop pipeline implements the medallion architecture as one declarative pipeline of bronze, silver, and gold datasets with automatic dependency resolution.
  • Bronze ingests raw files with Auto Loader via read_files (SQL) or cloudFiles (Python) into an append-only streaming table.
  • Silver applies expectations, type casting, deduplication, and STREAM() reads to produce a cleansed streaming table.
  • Gold builds business aggregations and KPIs as materialized views using JOINs and GROUP BY over silver.
  • The whole bronze-to-gold flow is declared once and executed as a single managed unit, with quality metrics in the event log.
Last updated: June 2026

Putting Medallion + Declarative Pipelines Together

A multi-hop pipeline is the medallion architecture expressed as a single Lakeflow Declarative Pipeline. Instead of three separate jobs, you declare bronze, silver, and gold datasets in one pipeline and let the framework resolve their order and run them as a managed unit. This is the most-tested end-to-end scenario in this domain, so know each hop's responsibilities.

Bronze: Raw Ingestion with Auto Loader

The bronze hop ingests raw files incrementally. In SQL you use read_files (Auto Loader under the hood); in Python you use cloudFiles. Bronze is an append-only streaming table that keeps the raw record plus ingestion metadata.

CREATE OR REFRESH STREAMING TABLE bronze_orders
AS SELECT *, current_timestamp() AS ingest_ts, _metadata.file_path AS src_file
FROM STREAM read_files('/landing/orders', format => 'json');

Because Auto Loader checkpoints which files it has consumed, each run picks up only newly arrived files — true incremental ingestion with no manual file bookkeeping.

Silver: Cleanse, Enforce, Deduplicate

The silver hop reads from bronze using STREAM(bronze_orders) and applies quality and shaping logic: expectations to drop or fail on bad rows, explicit type casts, null handling, and deduplication.

CREATE OR REFRESH STREAMING TABLE silver_orders (
  CONSTRAINT valid_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW,
  CONSTRAINT valid_amt EXPECT (amount > 0)
)
AS SELECT
  order_id,
  CAST(amount AS DECIMAL(10,2)) AS amount,
  to_timestamp(order_ts) AS order_ts,
  customer_id
FROM STREAM(bronze_orders);

Silver is where the data becomes analytics-grade: conformed, typed, validated, and deduplicated. Note the use of STREAM(...) to declare a streaming read from an upstream pipeline table — a frequent syntax point on the exam.

Gold: Business Aggregations

The gold hop produces curated metrics as materialized views, joining and aggregating silver into business KPIs.

CREATE OR REFRESH MATERIALIZED VIEW gold_daily_revenue
AS SELECT
  date(order_ts) AS day,
  customer_region,
  SUM(amount) AS revenue,
  COUNT(*) AS order_count
FROM silver_orders s
JOIN dim_customers c ON s.customer_id = c.id
GROUP BY date(order_ts), customer_region;

One Declared Unit, Automatically Orchestrated

You define all three datasets in the same pipeline and never specify run order. The framework reads the queries, sees silver_orders depends on bronze_orders and gold_daily_revenue depends on silver_orders, and constructs the DAG:

read_files (Auto Loader)
      |
  bronze_orders  (streaming table, append-only)
      |
  silver_orders  (streaming table, expectations + dedupe + casts)
      |
  gold_daily_revenue  (materialized view, JOIN + GROUP BY)
HopDataset typeKey operations
BronzeStreaming tableAuto Loader ingest, raw + metadata
SilverStreaming tableExpectations, casting, dedupe, STREAM() read
GoldMaterialized viewJOIN, GROUP BY, KPIs

Because it is one managed pipeline, you get unified observability: the event log records lineage, data-quality expectation metrics, and run history across all hops. Triggered mode refreshes the whole chain on a schedule; continuous mode keeps it updating in near real time. The same pattern scales by simply adding more silver or gold datasets that read from existing tables.

Operational Benefits and a Worked Walkthrough

Building the multi-hop flow as one declarative pipeline (rather than three hand-orchestrated jobs) delivers concrete advantages the exam emphasizes:

  • Single source of truth for orchestration — no external scheduler wiring bronze to silver to gold; the DAG is derived from the SQL.
  • Built-in data quality — expectations at the silver hop quarantine or reject bad rows and surface metrics in the event log.
  • Automatic recovery — in production mode the framework retries transient failures and restarts compute.
  • Lineage and observability — you can trace a gold metric back through silver to the exact bronze file that produced it.

End-to-end flow for an orders pipeline

  1. Files land in /landing/orders as JSON throughout the day.
  2. Bronze (bronze_orders, streaming table) uses read_files to ingest only new files, adding ingest_ts and src_file.
  3. Silver (silver_orders, streaming table) reads STREAM(bronze_orders), drops rows failing order_id IS NOT NULL, casts amount to DECIMAL, and deduplicates by keeping the latest event per order_id.
  4. Gold (gold_daily_revenue, materialized view) joins silver_orders to a customer dimension and aggregates revenue by day and region.

Run the pipeline in triggered mode each morning and it refreshes the entire chain incrementally; switch to continuous for near-real-time dashboards. Adding a second gold mart later requires only one more CREATE OR REFRESH MATERIALIZED VIEW that reads silver — the framework slots it into the DAG with no other changes.

Design Decisions Within a Multi-Hop Pipeline

Several implementation choices recur in exam scenarios, and each maps to a specific layer:

DecisionWhere it belongsWhy
Ingest new files onlyBronzeAuto Loader / read_files track processed files
Reject or quarantine bad rowsSilverExpectations with ON VIOLATION DROP/FAIL
Deduplicate by business keySilverROW_NUMBER + STREAM read before downstream
Type casting and normalizationSilverProduce conformed, analytics-ready schema
Aggregate to KPIsGoldMaterialized view with GROUP BY
Join to dimensionsGold (or silver enrich)Combine conformed entities for reporting

Streaming-table vs materialized-view placement

Within the same pipeline you mix dataset types deliberately. Bronze and silver are usually streaming tables because they process appends incrementally and cheaply. Gold is usually a materialized view because business metrics must reflect the current full state of silver — including any corrections that flowed through as updates. A common mistake is making a gold aggregation a streaming table; an append-only streaming aggregate cannot easily recompute totals when upstream rows change, whereas a materialized view simply re-derives the result.

Matching dataset type to the layer's refresh semantics is exactly what the multi-hop implementation objective tests.

Test Your Knowledge

In a SQL declarative multi-hop pipeline, which function is used in the BRONZE table to incrementally ingest raw files from cloud storage with Auto Loader?

A
B
C
D
Test Your Knowledge

Which dataset type and operation are MOST appropriate for the gold table in a multi-hop pipeline?

A
B
C
D
Test Your Knowledge

How are the bronze, silver, and gold tables in a single declarative multi-hop pipeline scheduled relative to one another?

A
B
C
D
Test Your Knowledge

Which operational benefit do you get from building the bronze-silver-gold flow as a single declarative pipeline rather than three separately scheduled jobs?

A
B
C
D