5.6 Data Quality Best Practices

Key Takeaways

  • Lakeflow Declarative Pipelines (formerly Delta Live Tables) enforce data quality with expectations: a description, a boolean constraint, and an action on violation.
  • The three violation actions are WARN (keep invalid rows, default), DROP (remove invalid rows), and FAIL (stop the pipeline).
  • SQL syntax is CONSTRAINT name EXPECT (condition) [ON VIOLATION action]; Python uses decorators like @dp.expect_or_drop().
  • Delta tables enforce CHECK constraints and NOT NULL automatically, rejecting writes that violate them.
  • Expectation metrics (valid vs. invalid record counts) are recorded in the pipeline event log for monitoring data quality over time.
Last updated: June 2026

Quick Answer: Databricks enforces data quality two ways. Delta table constraints (NOT NULL, CHECK) validate every write and reject bad rows. Lakeflow Declarative Pipelines expectations define a named constraint plus an action on violationWARN (keep invalid rows, the default), DROP (remove them), or FAIL (stop the pipeline) — and record valid/invalid counts to the event log.

Delta Table Constraints

The first line of defense is constraints on the Delta table itself, which are checked automatically on every insert/update:

ConstraintEffect
NOT NULLRejects rows where the column is null
CHECKRejects rows failing a boolean expression (e.g., CHECK (amount > 0))
PRIMARY KEY / FOREIGN KEYInformational/optimizer hints (not enforced at write in the same way)

Example:

ALTER TABLE sales.orders ADD CONSTRAINT pos_amt CHECK (amount > 0);

After this, any write with amount <= 0 fails the transaction. CHECK and NOT NULL are hard enforcement — invalid data never lands.

Expectations in Lakeflow Declarative Pipelines

For pipeline-driven quality, Lakeflow Declarative Pipelines (formerly Delta Live Tables / DLT) provide expectations. An expectation has three parts: a description (name), an invariant (a boolean condition each record must satisfy), and an action if a record violates it.

SQL form:

CREATE OR REFRESH STREAMING TABLE clean_orders (
  CONSTRAINT valid_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW,
  CONSTRAINT valid_amt EXPECT (amount > 0) ON VIOLATION FAIL UPDATE
) AS SELECT * FROM stream(raw_orders);

The three actions are central to the exam:

Action (SQL)Behavior
(none)WARN (default)Invalid rows are kept in the target; counts are logged
ON VIOLATION DROP ROWDROPInvalid rows are removed from the target; counts logged
ON VIOLATION FAIL UPDATEFAILThe pipeline update fails on any violation

Python Decorators and Monitoring

In Python pipelines, expectations are applied with decorators, which mirror the SQL actions:

  • @dp.expect("desc", "cond") — warn (retain invalid rows)
  • @dp.expect_or_drop("desc", "cond") — drop invalid rows
  • @dp.expect_or_fail("desc", "cond") — fail the pipeline
  • @dp.expect_all(...), @dp.expect_all_or_drop(...), @dp.expect_all_or_fail(...) — apply a dictionary of multiple expectations at once

Choosing an action: use DROP to quarantine bad records and keep the pipeline flowing for a Silver table; use FAIL for invariants that must never break (e.g., a null primary key) so the pipeline halts rather than corrupting downstream data; use WARN when you want visibility without dropping data.

Every expectation emits metrics — counts of records that passed and failed — to the pipeline event log, a queryable Delta table. Teams build dashboards on the event log to trend data quality over time and alert when failure rates spike. This observability is why expectations are preferred over ad-hoc filtering: the quality result is measured and retained, not silently discarded.

The Medallion Quarantine Pattern

Data quality is most effective when layered through the medallion architecture. Raw data lands in Bronze with minimal validation; expectations are applied as data moves to Silver and Gold, where business consumers expect clean, conformant records. A widely used pattern is quarantining: rather than FAILing a whole pipeline on bad records, you tag or route invalid rows so good data keeps flowing while bad data is captured for investigation. With expectations you can keep invalid rows (WARN), drop them (DROP), or split them by adding a derived is_valid flag and reading the failing subset into a separate quarantine table.

GoalRecommended action
Block data that must never be wrong (e.g., null PK)FAIL / expect_or_fail
Keep the pipeline flowing, discard bad rowsDROP / expect_or_drop
Observe quality without losing any rowsWARN / expect

Constraints vs. Expectations

Know when to use each mechanism. Delta table constraints (NOT NULL, CHECK) are enforced on every write to the table, whatever wrote it, and reject the offending transaction — use them for hard invariants that must hold for all writers. Pipeline expectations are defined inside a Lakeflow Declarative Pipeline and offer the flexible WARN/DROP/FAIL choice plus logged metrics — use them for quality rules that should observe, quarantine, or conditionally fail a transformation. Constraints are about table integrity; expectations are about pipeline-level data-quality observability.

A robust design uses both: constraints guarantee the floor, and expectations add graduated, measured quality gates with a dashboardable audit trail in the event log.

Operationalizing Data Quality

A mature pipeline treats data quality as a first-class, measured concern rather than a side effect of filtering. The reason expectations are favored over a plain WHERE clause that silently discards bad rows is observability: expectations emit pass/fail counts to the pipeline event log, so teams can chart quality trends, set alerts when failure rates spike, and prove to auditors that records were validated. Pair that with Delta table constraints, which guarantee a hard floor on every write regardless of who writes, and you have layered enforcement.

The exam expects you to map a requirement to the right tool: a never-violate invariant such as a non-null primary key calls for FAIL (or a NOT NULL constraint); a desire to keep clean data flowing while discarding malformed rows calls for DROP; and a need to monitor quality without losing any data calls for the default WARN. Knowing the SQL (CONSTRAINT … EXPECT (…) ON VIOLATION …) and Python decorator forms (expect, expect_or_drop, expect_or_fail, and their _all variants) lets you implement whichever the scenario demands.

Test Your Knowledge

In a Lakeflow Declarative Pipeline, an expectation uses ON VIOLATION DROP ROW. What happens to records that fail the condition?

A
B
C
D
Test Your Knowledge

Which Python decorator drops records that fail a data quality expectation?

A
B
C
D
Test Your Knowledge

A CHECK constraint CHECK (amount > 0) is added to a Delta table. What happens when a transaction tries to insert a row with amount = -5?

A
B
C
D
Test Your Knowledge

Where are an expectation's valid-versus-invalid record counts recorded for monitoring data quality over time?

A
B
C
D