3.5 Lakeflow Declarative Pipelines (Delta Live Tables)

Key Takeaways

  • Lakeflow Declarative Pipelines (formerly Delta Live Tables/DLT) is a framework for building reliable ETL pipelines with automatic dependency management.
  • You declare the desired end state of your data (WHAT), and the framework handles orchestration, error handling, and retries (HOW).
  • Three dataset types: streaming tables (append-only, incremental), materialized views (full refresh, recomputed), and views (temporary, not persisted).
  • Expectations define data quality constraints that can warn, drop invalid records, or fail the pipeline.
  • Pipelines automatically resolve dependencies between datasets and determine the optimal execution order.
Last updated: March 2026

Quick Answer: Lakeflow Declarative Pipelines let you define the desired state of your data transformations (the WHAT), while the framework automatically handles execution order, dependency resolution, error handling, and retries (the HOW). Choose streaming tables for incremental processing, materialized views for complex transformations, and expectations for data quality enforcement.

Declarative vs. Imperative

Approach                 | You Define                            | Framework Handles
Imperative (traditional) | Step-by-step execution logic          | Nothing — you manage everything
Declarative (Lakeflow)   | The desired end state of each dataset | Execution order, dependencies, retries, error handling

Dataset Types

Streaming Tables

Best for incremental, append-only workloads:

-- SQL: Create a streaming table
CREATE OR REFRESH STREAMING TABLE raw_orders
AS SELECT * FROM STREAM read_files('/data/raw/orders/', format => 'json');

-- SQL: Streaming table from another streaming table
CREATE OR REFRESH STREAMING TABLE cleaned_orders (
    CONSTRAINT valid_order_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT
    order_id,
    customer_id,
    CAST(amount AS DECIMAL(10,2)) AS amount,
    CAST(order_date AS DATE) AS order_date
FROM STREAM(LIVE.raw_orders);

# Python: Create a streaming table
import dlt

@dlt.table(comment="Raw orders from cloud storage")
def raw_orders():
    return (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/data/raw/orders/")
    )

Key properties of streaming tables:

  • Each input row is processed exactly once
  • New data is appended incrementally
  • Ideal for data ingestion from cloud storage, Kafka, etc.
  • Uses Structured Streaming under the hood
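The exactly-once, append-only behavior can be illustrated with a small plain-Python sketch (hypothetical names, not the DLT runtime): a checkpoint records which inputs have already been consumed, so re-delivered files are skipped and existing rows are never rewritten.

```python
# Illustrative sketch only: DLT/Structured Streaming achieves this via checkpoints;
# `incremental_append` and the file-keyed dicts here are hypothetical.

def incremental_append(target_rows, checkpoint, new_files):
    """Append rows from files not yet recorded in the checkpoint."""
    for path, rows in new_files.items():
        if path in checkpoint:       # already processed once: skip (exactly-once)
            continue
        target_rows.extend(rows)     # append-only: old rows are never rewritten
        checkpoint.add(path)         # record progress for the next run
    return target_rows

orders_table, processed = [], set()
incremental_append(orders_table, processed, {"f1.json": [{"order_id": 1}]})
incremental_append(orders_table, processed, {"f1.json": [{"order_id": 1}],  # re-delivered
                                             "f2.json": [{"order_id": 2}]})
print(orders_table)  # f1 was processed only once; f2 was appended incrementally
```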

Materialized Views

Best for complex transformations and aggregations:

-- SQL: Create a materialized view
CREATE OR REFRESH MATERIALIZED VIEW daily_sales_summary
AS SELECT
    order_date,
    COUNT(DISTINCT order_id) AS total_orders,
    SUM(amount) AS total_revenue,
    AVG(amount) AS avg_order_value
FROM LIVE.cleaned_orders
GROUP BY order_date;

# Python: Create a materialized view
import dlt
from pyspark.sql.functions import avg, count, sum

@dlt.table(comment="Daily sales aggregation")
def daily_sales_summary():
    return (dlt.read("cleaned_orders")
        .groupBy("order_date")
        .agg(
            count("order_id").alias("total_orders"),
            sum("amount").alias("total_revenue"),
            avg("amount").alias("avg_order_value")
        )
    )

Key properties of materialized views:

  • Results are pre-computed and stored (not computed on each query)
  • Refreshed using incremental refresh when possible
  • Can use any SQL operation (JOINs, aggregations, window functions)
  • Ideal for gold-layer analytics tables
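The "pre-computed and stored" property can be modeled in plain Python (hypothetical helper, not Spark): the aggregate is computed once at refresh time, and subsequent queries read the stored result instead of re-running the GROUP BY.

```python
from collections import defaultdict

# Illustrative sketch: `refresh_daily_summary` stands in for a materialized-view
# refresh; a real pipeline computes this with Spark SQL, incrementally when possible.
def refresh_daily_summary(cleaned_orders):
    totals = defaultdict(lambda: {"total_orders": 0, "total_revenue": 0.0})
    for o in cleaned_orders:
        day = totals[o["order_date"]]
        day["total_orders"] += 1
        day["total_revenue"] += o["amount"]
    return dict(totals)

orders = [
    {"order_date": "2026-03-01", "amount": 10.0},
    {"order_date": "2026-03-01", "amount": 15.0},
    {"order_date": "2026-03-02", "amount": 7.5},
]
daily_summary = refresh_daily_summary(orders)   # stored at refresh time
print(daily_summary["2026-03-01"])  # {'total_orders': 2, 'total_revenue': 25.0}
```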

Views (Temporary)

-- Temporary: not persisted, used for intermediate transformations
CREATE TEMPORARY STREAMING LIVE VIEW enriched_orders
AS SELECT
    o.*,
    c.customer_name,
    c.segment
FROM STREAM(LIVE.cleaned_orders) o
LEFT JOIN LIVE.customers c ON o.customer_id = c.customer_id;

Key properties of views:

  • Not persisted — computed on each pipeline run
  • Used for intermediate transformations
  • Reduce storage costs for transformations that don't need to be materialized

Choosing the Right Dataset Type

Scenario                                         | Dataset Type
Ingest raw files from cloud storage              | Streaming Table
Append-only event processing                     | Streaming Table
Aggregated daily/weekly summaries                | Materialized View
JOIN-heavy transformations                       | Materialized View
Intermediate cleanup step (not queried directly) | View

Expectations (Data Quality)

Expectations define data quality constraints on pipeline datasets:

Expectation Actions

Action | SQL Syntax                                  | Behavior
Warn   | EXPECT (condition)                          | Log violation metrics; keep all rows
Drop   | EXPECT (condition) ON VIOLATION DROP ROW    | Remove rows that violate the constraint
Fail   | EXPECT (condition) ON VIOLATION FAIL UPDATE | Stop the pipeline on violation
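The three actions differ only in what happens to the violating rows. A minimal plain-Python model (the `apply_expectation` helper is hypothetical; the real logic lives inside the DLT runtime):

```python
class PipelineFailure(Exception):
    """Stands in for DLT stopping the update on a FAIL UPDATE violation."""

def apply_expectation(rows, predicate, action):
    """Return (kept_rows, violation_count) according to the expectation action."""
    violations = [r for r in rows if not predicate(r)]
    if action == "fail" and violations:
        raise PipelineFailure(f"{len(violations)} rows violated the constraint")
    if action == "drop":
        return [r for r in rows if predicate(r)], len(violations)
    return list(rows), len(violations)   # "warn": keep every row, just count

rows = [{"amount": 5}, {"amount": -1}]
kept, bad = apply_expectation(rows, lambda r: r["amount"] > 0, "warn")
assert (len(kept), bad) == (2, 1)        # warn keeps all rows, logs one violation
kept, bad = apply_expectation(rows, lambda r: r["amount"] > 0, "drop")
assert kept == [{"amount": 5}]           # drop removes only the invalid row
```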

Examples

-- Warn: log violations but keep all rows
CREATE OR REFRESH STREAMING TABLE orders (
    CONSTRAINT valid_amount EXPECT (amount > 0),
    CONSTRAINT valid_date EXPECT (order_date IS NOT NULL)
)
AS SELECT * FROM STREAM(LIVE.raw_orders);

-- Drop: remove invalid rows
CREATE OR REFRESH STREAMING TABLE clean_orders (
    CONSTRAINT valid_order EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW,
    CONSTRAINT positive_amount EXPECT (amount > 0) ON VIOLATION DROP ROW
)
AS SELECT * FROM STREAM(LIVE.raw_orders);

-- Fail: stop pipeline if any violation occurs
CREATE OR REFRESH STREAMING TABLE critical_orders (
    CONSTRAINT required_fields EXPECT (order_id IS NOT NULL AND customer_id IS NOT NULL) ON VIOLATION FAIL UPDATE
)
AS SELECT * FROM STREAM(LIVE.raw_orders);

# Python expectations
import dlt

@dlt.table
@dlt.expect("valid_order_id", "order_id IS NOT NULL")
@dlt.expect_or_drop("positive_amount", "amount > 0")
@dlt.expect_or_fail("required_customer", "customer_id IS NOT NULL")
def cleaned_orders():
    return dlt.read_stream("raw_orders")

Expectation Metrics

Pipeline UI shows:

  • Number of records that passed each expectation
  • Number of records that failed each expectation
  • Pass rate percentage for each expectation
  • Historical trends across pipeline runs
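The pass-rate figure is simple arithmetic over the per-expectation counters; a tiny hypothetical helper shows the calculation:

```python
# Hypothetical helper: the pipeline UI derives the same percentage
# from each expectation's passed/failed record counts.
def pass_rate(passed: int, failed: int) -> float:
    total = passed + failed
    return 100.0 * passed / total if total else 100.0

print(f"valid_amount pass rate: {pass_rate(98, 2):.1f}%")  # 98.0%
```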

Automatic Dependency Resolution

Lakeflow automatically determines the execution order based on dataset dependencies:

-- Pipeline defines three datasets with implicit dependencies:
-- raw_orders → cleaned_orders → daily_summary

CREATE OR REFRESH STREAMING TABLE raw_orders
AS SELECT * FROM STREAM read_files('/data/raw/orders/');

CREATE OR REFRESH STREAMING TABLE cleaned_orders
AS SELECT * FROM STREAM(LIVE.raw_orders);  -- Depends on raw_orders

CREATE OR REFRESH MATERIALIZED VIEW daily_summary
AS SELECT order_date, SUM(amount) AS total_amount FROM LIVE.cleaned_orders  -- Depends on cleaned_orders
GROUP BY order_date;

The framework:

  1. Analyzes the LIVE. references to build a dependency graph
  2. Executes datasets in the correct order (topological sort)
  3. Parallelizes independent branches
  4. Handles failures and retries automatically

On the Exam: Know the three dataset types (streaming table, materialized view, view), the three expectation actions (warn, drop, fail), and how the LIVE. keyword creates automatic dependencies.

[Diagram: Lakeflow Declarative Pipeline Example]
Test Your Knowledge

  1. A data engineer wants to create a pipeline dataset that performs a GROUP BY aggregation over a silver table. The results should be pre-computed and stored. Which dataset type should they use?

  2. Which expectation action removes records that violate the data quality constraint without stopping the pipeline?

  3. How does Lakeflow Declarative Pipelines determine the execution order of datasets?

  4. What is the key difference between a streaming table and a materialized view in Lakeflow Declarative Pipelines?