In a Lakeflow Declarative Pipeline, which dataset type is the best choice for a gold table that fully recomputes a daily aggregation from silver data?

A materialized view. It recomputes its complete query result on refresh, which matches a gold aggregation that must reflect the full current state of silver. Streaming tables are append-only and incremental, so they suit ingestion, while temporary views are not persisted.

An expectation is defined with ON VIOLATION DROP ROW. What happens to records that fail the constraint?

The records are dropped from the output but the violations are tracked in the event log. DROP ROW removes the failing records from the dataset's output while still counting them in the pipeline's data-quality metrics. Failing the whole update is the FAIL UPDATE action, and a plain warn-style expectation would keep the rows.

How does a declarative pipeline determine the order in which its tables are built?

It infers a dependency DAG automatically from the queries (e.g., gold reads silver, silver reads bronze). The framework parses each dataset's query to discover which other datasets it references, builds the dependency graph from those references, and executes upstream tables before downstream ones. Developers never hand-order the DAG, and execution is neither alphabetical nor fully simultaneous.

Lakeflow Declarative Pipelines — Free Study Guide 2026

Key Takeaways

Lakeflow Declarative Pipelines (formerly Delta Live Tables/DLT) is a declarative ETL framework: you declare the desired tables and it manages orchestration, dependencies, retries, and scaling.
Streaming tables ingest append-only/incremental data, while materialized views recompute their full result and are ideal for aggregated gold tables.
Expectations are data-quality constraints that can warn (track), drop invalid rows, or fail the pipeline (ON VIOLATION DROP ROW / FAIL UPDATE).
Pipelines auto-resolve dependencies from the queries and run datasets in the correct order, so you never wire up the DAG manually.
Triggered mode runs once and stops; continuous mode runs nonstop. Development mode reuses a cluster and skips retries; production mode restarts the cluster on recoverable errors and retries on failure.

Declarative ETL

Lakeflow Declarative Pipelines — formerly Delta Live Tables (DLT) and sometimes shown as Lakeflow Spark Declarative Pipelines — is a framework for building reliable ETL. The product was renamed when Databricks unified data engineering under Lakeflow (Lakeflow Connect for ingestion, Lakeflow Declarative Pipelines for transformation, and Lakeflow Jobs, formerly Workflows, for orchestration). You will see the new name on the exam but should recognize "DLT" as the same thing.

The core idea is declarative: you write queries describing the desired end state of each table (the WHAT), and the framework figures out the HOW — execution order, cluster management, error handling, retries, and incrementalization. This contrasts with imperative notebooks where you manually orchestrate each step.

Dataset Types

Dataset	Behavior	Best for
Streaming table	Append-only, processes new data incrementally	Bronze ingestion, silver from streams
Materialized view	Recomputes its full query result on refresh	Gold aggregations, dimensions
View	Temporary, not persisted to storage	Intermediate logic, reuse within a pipeline

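-- Streaming tables need a streaming source, so read_files is wrapped in STREAM.
CREATE OR REFRESH STREAMING TABLE bronze
AS SELECT * FROM STREAM read_files('/landing', format => 'json');

-- The materialized view recomputes the aggregate over silver on refresh.
CREATE OR REFRESH MATERIALIZED VIEW daily_sales
AS SELECT order_date, SUM(amount) AS total FROM silver GROUP BY order_date;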

Expectations: Data Quality

Expectations are declarative data-quality constraints attached to a dataset. Each expectation is a boolean condition; the ON VIOLATION action decides what happens to rows that fail it:

Action	Behavior
(none) — warn	Records that fail are kept but counted as violations in metrics
DROP ROW	Invalid records are dropped from the output (but tracked)
FAIL UPDATE	The pipeline update fails immediately on any violation

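CREATE OR REFRESH STREAMING TABLE silver (
  -- Rows with a NULL id are removed from silver and counted as dropped.
  CONSTRAINT valid_id EXPECT (id IS NOT NULL) ON VIOLATION DROP ROW,
  -- A single non-positive amount fails the whole update.
  CONSTRAINT valid_amt EXPECT (amount > 0) ON VIOLATION FAIL UPDATE
)
AS SELECT * FROM STREAM(bronze);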

Expectation metrics (records passed, dropped, failed) are recorded in the pipeline's event log, giving you observable data-quality tracking without extra code. In Python the equivalent decorators are @dlt.expect, @dlt.expect_or_drop, and @dlt.expect_or_fail.
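The three actions can be contrasted with a small plain-Python simulation. This is only a sketch of the semantics described above, not the Databricks API: `apply_expectation`, `ExpectationResult`, and `UpdateFailed` are hypothetical names invented for the illustration, and the sample rows are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


class UpdateFailed(Exception):
    """Raised by this sketch when a FAIL UPDATE expectation is violated."""


@dataclass
class ExpectationResult:
    output: List[Row] = field(default_factory=list)  # rows written to the dataset
    passed: int = 0      # rows that satisfied the condition
    violations: int = 0  # rows that did not


def apply_expectation(
    rows: List[Row], condition: Callable[[Row], bool], action: str = "warn"
) -> ExpectationResult:
    """Apply one boolean condition with a warn, drop, or fail action."""
    if action not in ("warn", "drop", "fail"):
        raise ValueError(f"unknown action: {action}")
    result = ExpectationResult()
    for row in rows:
        if condition(row):
            result.passed += 1
            result.output.append(row)
            continue
        result.violations += 1
        if action == "fail":
            # FAIL UPDATE: stop at the first violation.
            raise UpdateFailed(f"expectation violated by row {row!r}")
        if action == "warn":
            # Default action: keep the row, but it is still counted.
            result.output.append(row)
        # action == "drop": the row is counted but left out of the output.
    return result


def has_id(row: Row) -> bool:
    return row["id"] is not None


# Hypothetical sample data: the middle row violates the condition.
rows = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5.0},
    {"id": 3, "amount": 7.5},
]

warned = apply_expectation(rows, has_id, "warn")
dropped = apply_expectation(rows, has_id, "drop")
print(len(warned.output), warned.violations)    # 3 1
print(len(dropped.output), dropped.violations)  # 2 1
```

In both the warn and drop cases the violation count is the same; only the output differs, which is the distinction the exam questions tend to probe.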

Automatic Dependency Resolution

You never declare execution order. The framework reads each query, sees that silver selects from bronze and gold selects from silver, and builds the dependency DAG automatically — running bronze first, then silver, then gold, and parallelizing independent branches. Add a new table that reads an existing one and it is slotted into the graph without rewiring anything.
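The idea of inferring build order from the queries can be illustrated with a short Python sketch. It assumes a deliberately naive rule (any identifier in a query that matches another dataset's name counts as a dependency) and uses the standard library's `graphlib` for the topological sort. The real framework analyzes the query properly; `referenced_datasets` and `build_order` are hypothetical helper names, and `dim_dates` is a made-up extra dataset.

```python
import re
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def referenced_datasets(query: str, known: Set[str]) -> Set[str]:
    """Return the known dataset names that appear as identifiers in the query."""
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", query))
    return tokens & known


def build_order(queries: Dict[str, str]) -> List[str]:
    """Order datasets so that every dataset comes after the ones it reads."""
    known = set(queries)
    # Map each dataset to its upstream datasets (its predecessors).
    graph = {
        name: referenced_datasets(sql, known) - {name}
        for name, sql in queries.items()
    }
    return list(TopologicalSorter(graph).static_order())


# Deliberately listed out of order: the declaration order does not matter.
QUERIES = {
    "gold": "SELECT order_date, SUM(amount) AS total FROM silver GROUP BY order_date",
    "silver": "SELECT * FROM STREAM(bronze)",
    "bronze": "SELECT * FROM STREAM read_files('/landing', format => 'json')",
}

print(build_order(QUERIES))  # ['bronze', 'silver', 'gold']

# Adding a dataset that reads an existing one slots it into the graph.
QUERIES_WITH_BRANCH = dict(QUERIES, dim_dates="SELECT DISTINCT order_date FROM silver")
branch_order = build_order(QUERIES_WITH_BRANCH)
```

In the second graph, `gold` and `dim_dates` both depend only on `silver`, so their relative order is not fixed; that is the kind of independent branch a pipeline can run in parallel.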

Execution Modes and Environments

Two orthogonal mode choices appear on the exam.

Pipeline execution mode (how often it runs):

Triggered: the pipeline processes the available data once and then stops. Use for scheduled batch refreshes.
Continuous: the pipeline runs nonstop, ingesting and updating tables as new data arrives, for low-latency streaming.

Development vs Production mode (how it behaves on failure and cluster reuse):

Development: reuses the cluster between runs (faster iteration) and disables automatic retries so errors surface immediately for debugging.
Production: restarts the cluster on specific recoverable errors (for example memory leaks or stale credentials) and retries execution on transient failures, for reliability.

	Triggered	Continuous
Lifecycle	Runs once, stops	Runs forever
Latency	Batch	Near real-time
Cost	Lower (on-demand)	Higher (always on)

Key exam distinction: triggered vs continuous controls scheduling/latency, while development vs production controls cluster reuse and retry behavior — they are independent settings, and confusing the two is a classic trap.
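Because the two settings are independent, there are four valid combinations. The sketch below simply enumerates them; `PipelineModes` and its field names are hypothetical and chosen for this illustration, and the behavior strings paraphrase the descriptions above rather than any official output.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PipelineModes:
    continuous: bool   # False = triggered, True = continuous
    development: bool  # False = production, True = development

    def describe(self) -> str:
        """Summarize scheduling and failure handling as two separate facts."""
        schedule = "runs nonstop" if self.continuous else "runs once, then stops"
        failure = (
            "reuses cluster, no automatic retries"
            if self.development
            else "retries on transient failures"
        )
        return f"{schedule}; {failure}"


# Every combination is valid: 2 scheduling choices x 2 failure-handling choices.
ALL_COMBINATIONS = [
    PipelineModes(continuous=c, development=d)
    for c, d in product([False, True], repeat=2)
]

for modes in ALL_COMBINATIONS:
    print(modes, "->", modes.describe())
```

A triggered pipeline in development mode and a continuous pipeline in production mode are both legitimate setups; neither setting implies the other.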

Streaming Tables vs Materialized Views in Depth

Choosing between the two persisted dataset types is one of the most tested decisions:

	Streaming table	Materialized view
Processing	Incremental, append-only source	Recomputes (or incrementally refreshes) full result
Source reads	Uses STREAM() / read_files	Standard SELECT over tables
Reprocessing data	Does not reprocess old rows	Reflects all current upstream data
Typical layer	Bronze, silver	Gold, dimensions
Handles updates upstream	No (append only)	Yes (recomputed)

A streaming table is the right pick when the source only appends and you want each row processed once. A materialized view is right when results must reflect updates and deletes in the source, such as a current daily-revenue total — because it recomputes, it naturally incorporates changes a streaming table would miss.
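A toy simulation makes the difference concrete. It assumes the source is a simple Python list and models the streaming table as a consumer that remembers how many source rows it has already read, while the materialized view recomputes from the current source on every refresh. `StreamingTableSketch` and `materialized_view_refresh` are hypothetical names, the data is made up, and a real pipeline may react differently to an upstream update than by silently ignoring it.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, object]


class StreamingTableSketch:
    """Append-only consumer: each source row is processed exactly once."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self._offset = 0  # number of source rows already consumed

    def update(self, source: List[Row]) -> int:
        new_rows = source[self._offset:]
        # Copy rows so later upstream changes do not reach stored data.
        self.rows.extend(dict(row) for row in new_rows)
        self._offset = len(source)
        return len(new_rows)


def materialized_view_refresh(source: List[Row]) -> Dict[str, float]:
    """Recompute daily totals from the full current state of the source."""
    totals: Dict[str, float] = defaultdict(float)
    for row in source:
        totals[row["order_date"]] += row["amount"]
    return dict(totals)


source = [
    {"order_date": "2026-01-01", "amount": 10.0},
    {"order_date": "2026-01-01", "amount": 5.0},
]
table = StreamingTableSketch()
first_batch = table.update(source)    # consumes both rows

source[0]["amount"] = 20.0            # upstream update to an old row
source.append({"order_date": "2026-01-02", "amount": 7.0})
second_batch = table.update(source)   # consumes only the appended row

view = materialized_view_refresh(source)
print(table.rows[0]["amount"])  # 10.0 (the update was never reprocessed)
print(view)                     # {'2026-01-01': 25.0, '2026-01-02': 7.0}
```

The streaming side still holds the stale amount because it never revisits consumed rows, whereas the recomputed totals reflect the corrected value.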

Other Framework Features

Unity Catalog integration governs pipeline tables with the same permissions and lineage as any UC table.
The event log captures lineage, expectation metrics, and run history for auditing and debugging.
Full refresh can be triggered to rebuild a table from scratch when logic changes, versus the default incremental update.
Pipelines can be authored in SQL or Python, and a single pipeline may mix both languages across its notebooks.
Serverless pipelines let Databricks manage compute autoscaling without you sizing clusters, and they enable features like incremental materialized-view refresh.
Pipeline parameters (configuration values) let one pipeline definition run against different paths or environments, supporting dev/staging/prod promotion without code changes.

Test Your Knowledge

Which statement correctly distinguishes development mode from production mode in a Lakeflow Declarative Pipeline?

A. Development mode reuses the cluster and disables automatic retries, while production mode restarts the cluster and retries on failure.

B. Development mode runs continuously while production mode runs once.

C. Development mode enables expectations and production mode disables them.

D. There is no difference; the modes are aliases.

Databricks Certified Data Engineer Associate


3.5 Lakeflow Declarative Pipelines (Delta Live Tables)


1. Introduction

2. Domain 1: Databricks Intelligence Platform (10%)

3. Domain 2: Development and Ingestion (30%)

4. Domain 3: Data Processing & Transformations (31%)

5. Domain 4: Productionizing Data Pipelines (18%)

6. Domain 5: Data Governance & Quality (11%)
