3.5 Lakeflow Declarative Pipelines (Delta Live Tables)

Key Takeaways

  • Lakeflow Declarative Pipelines (formerly Delta Live Tables/DLT) is a declarative ETL framework: you declare the desired tables and it manages orchestration, dependencies, retries, and scaling.
  • Streaming tables ingest append-only/incremental data, while materialized views keep their full query result current on each refresh (by full recomputation or, where supported, incremental refresh) and are ideal for aggregated gold tables.
  • Expectations are data-quality constraints that can warn (track), drop invalid rows, or fail the pipeline (ON VIOLATION DROP ROW / FAIL UPDATE).
  • Pipelines auto-resolve dependencies from the queries and run datasets in the correct order, so you never wire up the DAG manually.
  • Triggered mode runs once and stops; continuous mode runs nonstop. Development mode reuses a cluster and skips retries; production mode restarts and retries on failure.
Last updated: June 2026

Declarative ETL

Lakeflow Declarative Pipelines — formerly Delta Live Tables (DLT) and sometimes shown as Lakeflow Spark Declarative Pipelines — is a framework for building reliable ETL. The product was renamed when Databricks unified data engineering under Lakeflow (Lakeflow Connect for ingestion, Lakeflow Declarative Pipelines for transformation, and Lakeflow Jobs, formerly Workflows, for orchestration). You will see the new name on the exam but should recognize "DLT" as the same thing.

The core idea is declarative: you write queries describing the desired end state of each table (the WHAT), and the framework figures out the HOW — execution order, cluster management, error handling, retries, and incrementalization. This contrasts with imperative notebooks where you manually orchestrate each step.

Dataset Types

Dataset           | Behavior                                      | Best for
------------------|-----------------------------------------------|--------------------------------------------
Streaming table   | Append-only, processes new data incrementally | Bronze ingestion, silver from streams
Materialized view | Recomputes its full query result on refresh   | Gold aggregations, dimensions
View              | Temporary, not persisted to storage           | Intermediate logic, reuse within a pipeline

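-- STREAM makes read_files an incremental (streaming) source, which a streaming table requires
CREATE OR REFRESH STREAMING TABLE bronze
AS SELECT * FROM STREAM read_files('/landing', format => 'json');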

CREATE OR REFRESH MATERIALIZED VIEW daily_sales
AS SELECT order_date, SUM(amount) AS total FROM silver GROUP BY order_date;

Expectations: Data Quality

Expectations are declarative data-quality constraints attached to a dataset. Each expectation is a boolean condition; the ON VIOLATION action decides what happens to rows that fail it:

Action       | Behavior
-------------|----------------------------------------------------------------
(none): warn | Records that fail are kept but counted as violations in metrics
DROP ROW     | Invalid records are dropped from the output (but tracked)
FAIL UPDATE  | The pipeline update fails immediately on any violation

CREATE OR REFRESH STREAMING TABLE silver (
  CONSTRAINT valid_id EXPECT (id IS NOT NULL) ON VIOLATION DROP ROW,
  CONSTRAINT valid_amt EXPECT (amount > 0) ON VIOLATION FAIL UPDATE
)
AS SELECT * FROM STREAM(bronze);

Expectation metrics (records passed, dropped, failed) are recorded in the pipeline's event log, giving you observable data-quality tracking without extra code. In Python the equivalent decorators are @dlt.expect, @dlt.expect_or_drop, and @dlt.expect_or_fail.
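To make the three actions concrete, here is a minimal pure-Python sketch of their semantics. It is a toy model, not the framework's implementation: the `Expectation` class, `apply_expectations` function, the `has_amount` rule, and the sample rows are all hypothetical names invented for illustration. Real pipelines record these counts in the event log for you.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


class ExpectationFailed(Exception):
    """Raised when a 'fail' expectation sees an invalid row (models FAIL UPDATE)."""


@dataclass
class Expectation:
    name: str
    predicate: Callable[[Row], bool]
    action: str = "warn"  # "warn", "drop", or "fail" (hypothetical labels)
    passed: int = 0
    failed: int = 0


def apply_expectations(rows: List[Row], expectations: List[Expectation]) -> List[Row]:
    """Return the rows that survive, updating per-expectation counters."""
    kept = []
    for row in rows:
        keep = True
        for exp in expectations:
            if exp.predicate(row):
                exp.passed += 1
                continue
            exp.failed += 1  # every violation is counted, whatever the action
            if exp.action == "fail":
                raise ExpectationFailed(f"{exp.name} violated by {row}")
            if exp.action == "drop":
                keep = False  # row is excluded from the output
        if keep:
            kept.append(row)
    return kept


rows = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5.0},
    {"id": 3, "amount": 7.5},
]
valid_id = Expectation("valid_id", lambda r: r["id"] is not None, action="drop")
has_amount = Expectation("has_amount", lambda r: r["amount"] >= 6, action="warn")

silver = apply_expectations(rows, [valid_id, has_amount])
print([r["id"] for r in silver])            # [1, 3]
print(valid_id.failed, has_amount.failed)   # 1 1
```

Note that the warn-only rule counts the violation without removing anything on its own; the second row disappears only because the drop rule also failed on it.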

Automatic Dependency Resolution

You never declare execution order. The framework reads each query, sees that silver selects from bronze and gold selects from silver, and builds the dependency DAG automatically — running bronze first, then silver, then gold, and parallelizing independent branches. Add a new table that reads an existing one and it is slotted into the graph without rewiring anything.
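The idea can be sketched in a few lines of Python using the standard library's `graphlib`. This is only an illustration under simplifying assumptions: the `build_order` helper and the regex are hypothetical, and the regex finds just the first identifier after each `FROM`, whereas the real framework analyzes the full query plan.

```python
import re
from graphlib import TopologicalSorter

# Dataset name -> defining query, deliberately listed out of order.
queries = {
    "gold": "SELECT order_date, SUM(amount) AS total FROM silver GROUP BY order_date",
    "silver": "SELECT * FROM STREAM(bronze)",
    "bronze": "SELECT * FROM STREAM read_files('/landing', format => 'json')",
}

# Naive reference extraction: the identifier after FROM, optionally wrapped in STREAM(...).
FROM_REF = re.compile(r"\bFROM\s+(?:STREAM\s*\(\s*)?(\w+)", re.IGNORECASE)


def build_order(queries):
    """Infer dependencies from query text and return a valid build order."""
    graph = {}
    for name, sql in queries.items():
        refs = set(FROM_REF.findall(sql))
        # Keep only references to other datasets defined in this pipeline.
        graph[name] = {r for r in refs if r in queries and r != name}
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    return list(TopologicalSorter(graph).static_order())


order = build_order(queries)
print(order)  # ['bronze', 'silver', 'gold']
```

Adding a hypothetical fourth entry such as `"top_days": "SELECT * FROM gold"` places it after `gold` with no other change, which mirrors the "no rewiring" behavior described above.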

Execution Modes and Environments

Two orthogonal mode choices appear on the exam.

Pipeline execution mode (how often it runs):

  • Triggered — the pipeline processes available data once and then stops. Use for scheduled batch refreshes.
  • Continuous — the pipeline runs nonstop, ingesting and updating tables as new data arrives for low-latency streaming.

Development vs Production mode (how it behaves on failure and cluster reuse):

  • Development reuses the cluster between runs (faster iteration) and disables automatic retries so errors surface immediately for debugging.
  • Production restarts the cluster for each run and retries on transient failures for reliability.

Aspect    | Triggered         | Continuous
----------|-------------------|-------------------
Lifecycle | Runs once, stops  | Runs forever
Latency   | Batch             | Near real-time
Cost      | Lower (on-demand) | Higher (always on)

Key exam distinction: triggered vs continuous controls scheduling/latency, while development vs production controls cluster reuse and retry behavior — they are independent settings, and confusing the two is a classic trap.
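One way to internalize the independence is to enumerate all four combinations. The sketch below assumes the two choices are stored as boolean settings named `continuous` and `development`, mirroring the pipeline settings JSON. The pipeline name and the `pipeline_settings` and `describe` helpers are hypothetical.

```python
from itertools import product


def pipeline_settings(continuous: bool, development: bool) -> dict:
    """Build a minimal settings dict; only the two mode flags matter here."""
    return {
        "name": "sales_pipeline",    # hypothetical pipeline name
        "continuous": continuous,    # False = triggered
        "development": development,  # False = production
    }


def describe(settings: dict) -> str:
    """Summarize what each flag controls, using the behavior described above."""
    schedule = "runs nonstop" if settings["continuous"] else "runs once, then stops"
    failure = (
        "reuses cluster, no automatic retries"
        if settings["development"]
        else "fresh cluster per run, retries on failure"
    )
    return f"{schedule}; {failure}"


# All four combinations are valid because the flags are independent.
combos = [pipeline_settings(c, d) for c, d in product([False, True], repeat=2)]
for s in combos:
    print(f"continuous={s['continuous']}, development={s['development']} -> {describe(s)}")
```

A triggered pipeline in development mode and a continuous pipeline in production mode are both legitimate; neither flag implies the other.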

Streaming Tables vs Materialized Views in Depth

Choosing between the two persisted dataset types is one of the most tested decisions:

Aspect                   | Streaming table                 | Materialized view
-------------------------|---------------------------------|----------------------------------------------------
Processing               | Incremental, append-only source | Recomputes (or incrementally refreshes) full result
Source reads             | Uses STREAM() / read_files      | Standard SELECT over tables
Reprocessing data        | Does not reprocess old rows     | Reflects all current upstream data
Typical layer            | Bronze, silver                  | Gold, dimensions
Handles updates upstream | No (append only)                | Yes (recomputed)

A streaming table is the right pick when the source only appends and you want each row processed once. A materialized view is right when results must reflect updates and deletes in the source, such as a current daily-revenue total — because it recomputes, it naturally incorporates changes a streaming table would miss.
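The difference shows up clearly in a small simulation. This is a toy model under stated assumptions, not how the engine works internally: the `StreamingTable` class tracks a simple offset into an in-memory list, `materialized_view` recomputes from scratch, and the dates and amounts are made up.

```python
from collections import defaultdict


class StreamingTable:
    """Toy model: consumes each source row exactly once, in arrival order."""

    def __init__(self):
        self.rows = []
        self._offset = 0  # number of source rows already consumed

    def update(self, source):
        new_rows = source[self._offset:]
        self.rows.extend(dict(r) for r in new_rows)  # copy rows as seen at ingest time
        self._offset = len(source)


def materialized_view(source):
    """Toy model: recompute the whole aggregate from the current source."""
    totals = defaultdict(float)
    for r in source:
        totals[r["order_date"]] += r["amount"]
    return dict(totals)


source = [
    {"order_date": "2026-06-01", "amount": 10.0},
    {"order_date": "2026-06-01", "amount": 5.0},
]
st = StreamingTable()
st.update(source)

# Upstream change: an existing row is corrected and a new row is appended.
source[0]["amount"] = 20.0
source.append({"order_date": "2026-06-02", "amount": 3.0})

st.update(source)
mv = materialized_view(source)

print([r["amount"] for r in st.rows])  # [10.0, 5.0, 3.0]
print(mv)                              # {'2026-06-01': 25.0, '2026-06-02': 3.0}
```

The streaming table picks up the appended row but still holds the stale 10.0, while the recomputed view reflects the corrected amount.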

Other framework features

  • Unity Catalog integration governs pipeline tables with the same permissions and lineage as any UC table.
  • The event log captures lineage, expectation metrics, and run history for auditing and debugging.
  • Full refresh can be triggered to rebuild a table from scratch when logic changes, versus the default incremental update.
  • Pipelines can be authored in SQL or Python, and a single pipeline may mix both languages across its notebooks.
  • Serverless pipelines let Databricks manage compute autoscaling without you sizing clusters, and they enable features like incremental materialized-view refresh.
  • Pipeline parameters (configuration values) let one pipeline definition run against different paths or environments, supporting dev/staging/prod promotion without code changes.
Test Your Knowledge

In a Lakeflow Declarative Pipeline, which dataset type is the best choice for a gold table that fully recomputes a daily aggregation from silver data?

An expectation is defined with ON VIOLATION DROP ROW. What happens to records that fail the constraint?

How does a declarative pipeline determine the order in which its tables are built?

Which statement correctly distinguishes development mode from production mode in a Lakeflow Declarative Pipeline?
