3.4 Transform & Model for Analytics

Key Takeaways

  • A star schema separates numeric fact tables (events/measures) from descriptive dimension tables and is the target shape for analytics in Fabric.
  • Surrogate keys are warehouse-generated integer keys for dimensions; business (natural) keys come from the source — facts join to dimensions on surrogate keys.
  • Slowly changing dimension (SCD) handling: Type 1 overwrites the attribute; Type 2 adds a new versioned row with effective dates to preserve history.
  • Denormalize source 3NF tables into wide dimensions so the model has fewer, simpler relationships and faster filter propagation.
  • Build conformed dimensions (e.g., a shared Date dimension) so multiple fact tables can be analyzed consistently across the model.
Last updated: May 2026

From Source Shape to Analytics Shape

Operational sources are usually normalized (third normal form) to avoid update anomalies in transactional apps. Analytics wants the opposite: a star schema that is fast to filter and easy for business users and DAX to reason about. DP-600 tests whether you can do that transformation in a lakehouse or warehouse before semantic modeling.

Star Schema Fundamentals

  • Fact table — the events or measurements (sales line items, sessions, payments). Long and narrow: foreign keys plus numeric measures. Grain (one row = one what?) must be explicit and consistent.
  • Dimension table — descriptive context (Product, Customer, Date, Store). Wide and short: one row per entity, many descriptive attributes used for slicing and grouping.
  • Conformed dimension — a single shared dimension (most commonly Date) reused by multiple fact tables so analysis is consistent across subject areas.

Normalized vs. Star (Denormalization)

AspectNormalized source (3NF)Star schema (analytics)
GoalAvoid update anomaliesFast slicing and aggregation
TablesMany small related tablesFew wide dimensions + fact
Joins per queryManyFew
Relationship pathsLong chains (snowflake)Short, single-hop
Best forOLTP appsBI / semantic models

Denormalization is the deliberate flattening of those snowflaked source tables (e.g., merging Product, Subcategory, and Category into one wide Product dimension) so the model has fewer relationships and faster filter propagation. This is an exam-favored answer when a scenario complains about complex relationships or slow filtering.

Surrogate Keys vs. Business Keys

  • Business (natural) key — the identifier from the source system (e.g., CustomerEmail, SKU). It can change, be reused, or be non-numeric.
  • Surrogate key — a system-generated, typically integer key the warehouse assigns to each dimension row. Fact tables store the surrogate key as the foreign key.

Why the exam cares: surrogate keys decouple the model from source-key changes, give compact integer joins (better compression and performance), and are required to implement Type 2 slowly changing dimensions because the same business entity can have multiple historical rows.

Slowly Changing Dimensions (SCD)

When a dimension attribute changes over time (a customer moves cities), you choose how to handle history:

TypeBehaviorHistoryWhen to use
Type 1Overwrite the old valueNot keptCorrections; history irrelevant
Type 2Add a new row with a new surrogate key + effective/expiry dates + current flagFully preservedNeed point-in-time accuracy
Type 3Add a "previous value" columnLimited (one prior)Track only the last change

Type 2 is the most exam-relevant: it explains why surrogate keys exist and how facts stay tied to the dimension version that was true when the event occurred.

Where the Transformation Happens

  • Dataflow Gen2 — low-code shaping: merge, group, type-fix, derive columns, output to lakehouse/warehouse.
  • Notebook (Spark) — large-scale or programmatic builds, SCD logic, surrogate-key generation at volume.
  • Warehouse T-SQL / stored procedures — set-based dimensional ETL for relational teams.

Match the tool to skill set and scale; the target is always a clean star schema regardless of engine.

Loading diagram...
Target star schema
Test Your Knowledge

A scenario requires that when a customer's region changes, historical sales remain attributed to the region that was in effect at the time of each sale. Which dimension design supports this requirement?

A
B
C
D