3.5 Lakeflow Declarative Pipelines (Delta Live Tables)
Key Takeaways
- Lakeflow Declarative Pipelines (formerly Delta Live Tables/DLT) is a framework for building reliable ETL pipelines with automatic dependency management.
- You declare the desired end state of your data (WHAT), and the framework handles orchestration, error handling, and retries (HOW).
- Three dataset types: streaming tables (append-only, incremental), materialized views (full refresh, recomputed), and views (temporary, not persisted).
- Expectations define data quality constraints that can warn, drop invalid records, or fail the pipeline.
- Pipelines automatically resolve dependencies between datasets and determine the optimal execution order.
Lakeflow Declarative Pipelines (Delta Live Tables)
Quick Answer: Lakeflow Declarative Pipelines let you define the desired state of your data transformations (the WHAT), while the framework automatically handles execution order, dependency resolution, error handling, and retries (the HOW). Choose streaming tables for incremental processing, materialized views for complex transformations, and use expectations to enforce data quality.
Declarative vs. Imperative
| Approach | You Define | Framework Handles |
|---|---|---|
| Imperative (traditional) | Step-by-step execution logic | Nothing — you manage everything |
| Declarative (Lakeflow) | The desired end state of each dataset | Execution order, dependencies, retries, error handling |
Dataset Types
Streaming Tables
Best for incremental, append-only workloads:
```sql
-- SQL: Create a streaming table
CREATE OR REFRESH STREAMING TABLE raw_orders
AS SELECT * FROM STREAM read_files('/data/raw/orders/', format => 'json');

-- SQL: Streaming table from another streaming table
CREATE OR REFRESH STREAMING TABLE cleaned_orders (
  CONSTRAINT valid_order_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT
  order_id,
  customer_id,
  CAST(amount AS DECIMAL(10,2)) AS amount,
  CAST(order_date AS DATE) AS order_date
FROM STREAM(LIVE.raw_orders);
```

```python
# Python: Create a streaming table
import dlt

@dlt.table(comment="Raw orders from cloud storage")
def raw_orders():
    return (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/data/raw/orders/")
    )
```
Key properties of streaming tables:
- Each input row is processed exactly once
- New data is appended incrementally
- Ideal for data ingestion from cloud storage, Kafka, etc.
- Uses Structured Streaming under the hood
Materialized Views
Best for complex transformations and aggregations:
```sql
-- SQL: Create a materialized view
CREATE OR REFRESH MATERIALIZED VIEW daily_sales_summary
AS SELECT
  order_date,
  COUNT(DISTINCT order_id) AS total_orders,
  SUM(amount) AS total_revenue,
  AVG(amount) AS avg_order_value
FROM LIVE.cleaned_orders
GROUP BY order_date;
```

```python
# Python: Create a materialized view
from pyspark.sql.functions import avg, countDistinct, sum

@dlt.table(comment="Daily sales aggregation")
def daily_sales_summary():
    return (dlt.read("cleaned_orders")
        .groupBy("order_date")
        .agg(
            countDistinct("order_id").alias("total_orders"),  # matches COUNT(DISTINCT ...) above
            sum("amount").alias("total_revenue"),
            avg("amount").alias("avg_order_value")
        )
    )
```
Key properties of materialized views:
- Results are pre-computed and stored (not computed on each query)
- Refreshed using incremental refresh when possible
- Can use any SQL operation (JOINs, aggregations, window functions)
- Ideal for gold-layer analytics tables
Views (Temporary)
```sql
-- Temporary view: not persisted, used for intermediate transformations
CREATE TEMPORARY STREAMING LIVE VIEW enriched_orders
AS SELECT
  o.*,
  c.customer_name,
  c.segment
FROM STREAM(LIVE.cleaned_orders) o
LEFT JOIN LIVE.customers c ON o.customer_id = c.customer_id;
```
Key properties of views:
- Not persisted — computed on each pipeline run
- Used for intermediate transformations
- Reduce storage costs for transformations that don't need to be materialized
Choosing the Right Dataset Type
| Scenario | Dataset Type |
|---|---|
| Ingest raw files from cloud storage | Streaming Table |
| Append-only event processing | Streaming Table |
| Aggregated daily/weekly summaries | Materialized View |
| JOIN-heavy transformations | Materialized View |
| Intermediate cleanup step (not queried directly) | View |
Expectations (Data Quality)
Expectations define data quality constraints on pipeline datasets:
Expectation Actions
| Action | SQL Syntax | Behavior |
|---|---|---|
| Warn | EXPECT (condition) | Log violation metrics; keep all rows |
| Drop | EXPECT (condition) ON VIOLATION DROP ROW | Remove rows that violate |
| Fail | EXPECT (condition) ON VIOLATION FAIL UPDATE | Stop the pipeline on violation |
Examples
```sql
-- Warn: log violations but keep all rows
CREATE OR REFRESH STREAMING TABLE orders (
  CONSTRAINT valid_amount EXPECT (amount > 0),
  CONSTRAINT valid_date EXPECT (order_date IS NOT NULL)
)
AS SELECT * FROM STREAM(LIVE.raw_orders);

-- Drop: remove invalid rows
CREATE OR REFRESH STREAMING TABLE clean_orders (
  CONSTRAINT valid_order EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW,
  CONSTRAINT positive_amount EXPECT (amount > 0) ON VIOLATION DROP ROW
)
AS SELECT * FROM STREAM(LIVE.raw_orders);

-- Fail: stop the pipeline if any violation occurs
CREATE OR REFRESH STREAMING TABLE critical_orders (
  CONSTRAINT required_fields EXPECT (order_id IS NOT NULL AND customer_id IS NOT NULL) ON VIOLATION FAIL UPDATE
)
AS SELECT * FROM STREAM(LIVE.raw_orders);
```
```python
# Python expectations: decorators stack on a single dataset
@dlt.table
@dlt.expect("valid_order_id", "order_id IS NOT NULL")
@dlt.expect_or_drop("positive_amount", "amount > 0")
@dlt.expect_or_fail("required_customer", "customer_id IS NOT NULL")
def cleaned_orders():
    return dlt.read_stream("raw_orders")
```
Expectation Metrics
The pipeline UI shows, per expectation:
- Number of records that passed each expectation
- Number of records that failed each expectation
- Pass rate percentage for each expectation
- Historical trends across pipeline runs
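The pass rate is a simple ratio of these counts. A quick illustration with made-up numbers (the expectation names echo this section's examples; none of this comes from a real pipeline):

```python
def pass_rate(passed, failed):
    """Pass rate percentage for one expectation; an empty batch counts as 100%."""
    total = passed + failed
    return 100.0 * passed / total if total else 100.0

# Hypothetical per-expectation counts, as the pipeline UI would report them
metrics = {
    "valid_order_id": {"passed": 980, "failed": 20},
    "positive_amount": {"passed": 950, "failed": 50},
}
rates = {name: pass_rate(m["passed"], m["failed"]) for name, m in metrics.items()}
# valid_order_id -> 98.0, positive_amount -> 95.0
```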
Automatic Dependency Resolution
Lakeflow automatically determines the execution order based on dataset dependencies:
```sql
-- Pipeline defines three datasets with implicit dependencies:
-- raw_orders → cleaned_orders → daily_summary
CREATE OR REFRESH STREAMING TABLE raw_orders
AS SELECT * FROM STREAM read_files('/data/raw/orders/');

CREATE OR REFRESH STREAMING TABLE cleaned_orders  -- Depends on raw_orders
AS SELECT * FROM STREAM(LIVE.raw_orders);

CREATE OR REFRESH MATERIALIZED VIEW daily_summary  -- Depends on cleaned_orders
AS SELECT order_date, SUM(amount) AS total_amount
FROM LIVE.cleaned_orders
GROUP BY order_date;
```
The framework:
- Analyzes the LIVE. references to build a dependency graph
- Executes datasets in the correct order (topological sort)
- Parallelizes independent branches
- Handles failures and retries automatically
On the Exam: Know the three dataset types (streaming table, materialized view, view), the three expectation actions (warn, drop, fail), and how the LIVE. keyword creates automatic dependencies.
Review Questions
- A data engineer wants to create a pipeline dataset that performs a GROUP BY aggregation over a silver table. The results should be pre-computed and stored. Which dataset type should they use?
- Which expectation action removes records that violate the data quality constraint without stopping the pipeline?
- How does Lakeflow Declarative Pipelines determine the execution order of datasets?
- What is the key difference between a streaming table and a materialized view in Lakeflow Declarative Pipelines?