3.7 Slowly Changing Dimensions (SCD) Patterns

Key Takeaways

  • Slowly Changing Dimensions (SCD) handle updates to reference/dimension data that changes over time (e.g., customer addresses, product prices).
  • SCD Type 1 overwrites the existing record with the new value, losing historical data but keeping the table simple.
  • SCD Type 2 preserves history by creating a new row for each change, with effective date columns to track when each version was valid.
  • MERGE INTO is the primary SQL command for implementing both SCD Type 1 and SCD Type 2 patterns in Delta Lake.
  • Change Data Feed (CDF) can track row-level changes for downstream SCD processing.
Last updated: March 2026

Slowly Changing Dimensions (SCD) Patterns

Quick Answer: SCD Type 1 overwrites records (no history). SCD Type 2 creates new rows for each change (full history with effective dates). Both are implemented using MERGE INTO in Delta Lake.

SCD Type 1: Overwrite

SCD Type 1 replaces the existing value with the new value. No history is maintained.

-- SCD Type 1: Update existing record with new value
MERGE INTO dim_customers AS target
USING staging_customers AS source
ON target.customer_id = source.customer_id
WHEN MATCHED THEN
    UPDATE SET
        target.name = source.name,
        target.address = source.address,
        target.phone = source.phone,
        target.updated_at = current_timestamp()
WHEN NOT MATCHED THEN
    INSERT (customer_id, name, address, phone, updated_at)
    VALUES (source.customer_id, source.name, source.address, source.phone, current_timestamp());

Use SCD Type 1 when:

  • Historical values are not needed
  • Only the current state matters (e.g., correcting a typo)
  • Simplicity is preferred over history tracking

SCD Type 2: Historical Tracking

SCD Type 2 preserves the full history of changes by creating a new row for each change:

-- SCD Type 2: Close existing record and insert new version
MERGE INTO dim_customers AS target
USING (
    SELECT
        customer_id,
        name,
        address,
        phone,
        current_timestamp() AS effective_start,
        NULL AS effective_end,
        TRUE AS is_current
    FROM staging_customers
) AS source
ON target.customer_id = source.customer_id AND target.is_current = TRUE
WHEN MATCHED AND (
    target.name != source.name OR
    target.address != source.address OR
    target.phone != source.phone
) THEN
    UPDATE SET
        target.effective_end = current_timestamp(),
        target.is_current = FALSE
WHEN NOT MATCHED THEN
    INSERT (customer_id, name, address, phone, effective_start, effective_end, is_current)
    VALUES (source.customer_id, source.name, source.address, source.phone, source.effective_start, source.effective_end, source.is_current);

-- Then insert the new current version for changed records
INSERT INTO dim_customers
SELECT
    s.customer_id, s.name, s.address, s.phone,
    current_timestamp() AS effective_start,
    NULL AS effective_end,
    TRUE AS is_current
FROM staging_customers s
JOIN dim_customers d ON s.customer_id = d.customer_id
WHERE d.is_current = FALSE
AND d.effective_end = (SELECT MAX(effective_end) FROM dim_customers WHERE customer_id = s.customer_id);

SCD Type 2 Table Structure

ColumnTypeDescription
surrogate_keyBIGINTUnique row identifier (auto-generated)
customer_idBIGINTNatural/business key
nameSTRINGCustomer name (versioned)
addressSTRINGCustomer address (versioned)
effective_startTIMESTAMPWhen this version became active
effective_endTIMESTAMPWhen this version was superseded (NULL = current)
is_currentBOOLEANWhether this is the active version

Use SCD Type 2 when:

  • Full change history is required (regulatory compliance, auditing)
  • Analytics need to report based on historical states
  • Data warehouse dimensional modeling requirements

On the Exam: Know the difference between SCD Type 1 (overwrite, no history) and SCD Type 2 (new rows, full history). Both use MERGE INTO. SCD Type 2 requires effective date columns and a current flag.

Test Your Knowledge

A data engineer needs to track the full history of customer address changes for compliance reporting. Which SCD type should they implement?

A
B
C
D
Test Your Knowledge

In a SCD Type 2 table, how do you identify the current (active) version of a customer record?

A
B
C
D
Test Your Knowledge

Which SQL command is used to implement both SCD Type 1 and SCD Type 2 patterns in Delta Lake?

A
B
C
D