In AUTO CDC INTO, what is the purpose of the SEQUENCE BY clause?

It provides the ordering column so changes are applied in correct logical order even when CDC events arrive out of order. SEQUENCE BY names the column that establishes the logical order of CDC events, letting the engine apply updates correctly despite out-of-order arrival. The matching key is given by KEYS, not SEQUENCE BY, and it has nothing to do with checkpoints or VACUUM.

For an AUTO CDC INTO target table stored as SCD TYPE 2, what must the target schema include?

The __START_AT and __END_AT columns with the same data type as the SEQUENCE BY column. SCD Type 2 versioning relies on effective-date columns, so the target must define __START_AT and __END_AT matching the SEQUENCE BY column's type, which the framework then populates automatically. Deletion vectors and checkpoint locations are unrelated to the SCD schema.

Slowly Changing Dimensions Patterns — Free Study Guide 2026

Key Takeaways

Slowly Changing Dimensions track how reference/dimension data (addresses, prices, statuses) changes over time.
SCD Type 1 overwrites the old value, keeping only the current state and losing history.
SCD Type 2 preserves history by adding a new row per change with __START_AT / __END_AT effective-date columns and a current flag.
AUTO CDC INTO (the renamed APPLY CHANGES INTO) computes SCD Type 1 or Type 2 automatically with STORED AS SCD TYPE 1/2 and SEQUENCE BY.
SEQUENCE BY handles out-of-order CDC events so changes are applied in the correct logical order.

What SCD Solves

Slowly Changing Dimensions (SCD) describe how to handle updates to dimension (reference) data that changes infrequently over time — a customer's address, a product's price, an employee's department. The question each SCD type answers is: when a dimension value changes, do we overwrite history or preserve it?

SCD Type 1: Overwrite

SCD Type 1 simply overwrites the existing value with the new one. There is exactly one row per business key, always showing the current value. History is lost — you cannot tell what the value used to be.

MERGE INTO dim_customer t USING updates s ON t.cust_id = s.cust_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

Type 1 keeps the table small and simple and is appropriate when historical values do not matter (correcting a typo, storing a latest-known phone number).

SCD Type 2: Version History

SCD Type 2 preserves history by inserting a new row each time a tracked value changes. Each version carries effective-date columns and usually a current-flag:

cust_id	city	__START_AT	__END_AT	is_current
42	Reno	2024-01-01	2025-03-01	false
42	Boise	2025-03-01	null	true

The open record has a null __END_AT and is_current = true. Type 2 lets you answer point-in-time questions ("which city did customer 42 live in last March?") at the cost of more rows and bigger tables.

Automating SCD with AUTO CDC INTO

Doing SCD Type 2 by hand with MERGE is error-prone (closing old rows, handling out-of-order events). Databricks automates it in declarative pipelines with AUTO CDC INTO, the renamed and recommended successor to APPLY CHANGES INTO (same syntax; APPLY CHANGES still works but AUTO CDC is preferred).

CREATE OR REFRESH STREAMING TABLE dim_customer;

CREATE FLOW cdc_flow AS AUTO CDC INTO dim_customer
FROM STREAM(cdc_source)
KEYS (cust_id)
APPLY AS DELETE WHEN operation = 'DELETE'
SEQUENCE BY event_ts
STORED AS SCD TYPE 2;

The clauses you must recognize:

KEYS — the business key identifying a record.
SEQUENCE BY — the ordering column; the engine uses it to apply changes in correct logical order even when CDC events arrive out of order.
APPLY AS DELETE WHEN — treats matching rows as deletes (tombstones).
STORED AS SCD TYPE 1 or SCD TYPE 2 — chooses overwrite vs full history; Type 2 auto-populates __START_AT and __END_AT.
TRACK HISTORY ON (optional) — version only when specific columns change.

This collapses what would be 100+ lines of hand-written MERGE logic into a few declarative lines, and it natively understands streaming and late-arriving data without you managing watermarks.

Type 1 vs Type 2: Choosing and Comparing

Aspect	SCD Type 1	SCD Type 2
History	Lost (overwrite)	Preserved (new row per change)
Rows per key	One	Many (one per version)
Extra columns	None	__START_AT, __END_AT, current flag
Use when	Only current value matters; corrections	Point-in-time / audit history needed
AUTO CDC clause	STORED AS SCD TYPE 1	STORED AS SCD TYPE 2

Exam pointers

If a scenario says "we must report what a customer's status was at the time of each order," that requires Type 2 — Type 1 cannot answer it.
If it says "just keep the latest address; we don't care about old ones," Type 1 is correct and cheaper.
For SCD Type 2 target tables, the schema must include the __START_AT and __END_AT columns with the same data type as the SEQUENCE BY column.
SEQUENCE BY is what makes AUTO CDC robust to out-of-order events — a distractor answer often credits this to watermarking instead.

Manual SCD Type 2 with MERGE (for contrast)

Understanding why AUTO CDC is valuable means seeing the manual alternative. Implementing SCD Type 2 by hand with MERGE INTO requires two logical steps in one statement: close the currently-open row when a tracked value changes, and insert the new version. Because a single MERGE cannot both update the old row and insert a new one for the same key cleanly, the common pattern unions the source with a marker that forces a non-match for the insert:

MERGE INTO dim_customer t
USING staged_updates s
ON t.cust_id = s.cust_id AND t.is_current = true
WHEN MATCHED AND t.city <> s.city THEN
  UPDATE SET is_current = false, __END_AT = s.event_ts
WHEN NOT MATCHED THEN
  INSERT (cust_id, city, __START_AT, __END_AT, is_current)
  VALUES (s.cust_id, s.city, s.event_ts, null, true);

This hand-written approach is fragile: it does not natively handle out-of-order events, requires careful staging, and grows long fast. AUTO CDC INTO replaces this entire pattern, which is exactly why Databricks recommends it.

Other SCD types you may see referenced

Type	Behavior
Type 0	Retain original, never change (e.g., date-of-birth)
Type 1	Overwrite, no history
Type 2	New row per change, full history (most common)
Type 3	Add a 'previous value' column (limited history)

The Associate exam focuses overwhelmingly on Type 1 vs Type 2 and on driving them with AUTO CDC INTO; Types 0 and 3 appear mainly as distractors.

Test Your Knowledge

A dimension table must let analysts query what a customer's loyalty tier was on any past date. Which SCD type is required?

SCD Type 1, because it overwrites with the current value

SCD Type 2, because it keeps a versioned row per change with effective-date columns

No SCD is needed; a simple overwrite suffices

SCD Type 1 with deletion vectors

Test Your Knowledge

Which statement about AUTO CDC INTO and APPLY CHANGES INTO is correct?

AUTO CDC INTO is the recommended successor to APPLY CHANGES INTO and shares the same syntax

APPLY CHANGES INTO only supports SCD Type 1 while AUTO CDC supports Type 2

They are unrelated features from different products

AUTO CDC INTO can only be used in Python, never SQL

Databricks Certified Data Engineer Associate

Databricks Certified Data Engineer Associate

3.7 Slowly Changing Dimensions (SCD) Patterns

Key Takeaways

What SCD Solves

SCD Type 1: Overwrite

SCD Type 2: Version History

Automating SCD with AUTO CDC INTO

Type 1 vs Type 2: Choosing and Comparing

Exam pointers

Manual SCD Type 2 with MERGE (for contrast)

Other SCD types you may see referenced

Databricks Certified Data Engineer Associate

1Introduction

2Domain 1: Databricks Intelligence Platform (10%)

3Domain 2: Development and Ingestion (30%)

4Domain 3: Data Processing & Transformations (31%)

5Domain 4: Productionizing Data Pipelines (18%)

6Domain 5: Data Governance & Quality (11%)

Databricks Certified Data Engineer Associate

3.7 Slowly Changing Dimensions (SCD) Patterns

Key Takeaways

What SCD Solves

SCD Type 1: Overwrite

SCD Type 2: Version History

Automating SCD with AUTO CDC INTO

Type 1 vs Type 2: Choosing and Comparing

Exam pointers

Manual SCD Type 2 with MERGE (for contrast)

Other SCD types you may see referenced