6.2 Model Size Reduction and Cardinality Management

Key Takeaways

  • High-cardinality columns (many unique values) compress poorly and increase model size disproportionately.
  • Remove unnecessary columns before loading — every column consumes memory even if unused in reports.
  • Reduce row count by filtering historical data, aggregating to the needed granularity, or using incremental refresh.
  • Integer keys compress better than text keys — create surrogate integer keys when possible.
  • Disable Auto Date/Time to prevent hidden date tables from bloating the model.
Last updated: March 2026

Model Size Reduction and Cardinality Management

Quick Answer: Model size is driven by column count, row count, and cardinality (unique values per column). Remove unused columns, filter unnecessary rows, use integers instead of text for keys, reduce precision of decimals, and disable Auto Date/Time. High-cardinality text columns (GUIDs, free-text descriptions) are the biggest offenders.

Understanding Model Size

Power BI's VertiPaq engine compresses data using dictionary encoding and other techniques. The compression ratio depends on:

Factors Affecting Size

FactorImpact
Number of columnsEach column has overhead regardless of row count
Number of rowsMore rows = more data to store
CardinalityMore unique values = worse compression
Data typeText > Decimal > Integer > Boolean (most to least storage)
Value distributionSkewed distributions compress better than uniform

Understanding Cardinality

Cardinality = the number of distinct values in a column.

Cardinality LevelExampleCompression
Very low (2-10)Gender, Status, BooleanExcellent
Low (10-1,000)Category, Region, DepartmentVery Good
Medium (1,000-100,000)City, Product NameGood
High (100,000+)Customer Email, Transaction IDPoor
Very high (millions)GUID, Free Text, TimestampsVery Poor

Column Optimization

Remove Unused Columns

Every column in the model consumes memory. Remove columns that are not used in:

  • Reports (visuals, slicers, filters)
  • Measures (DAX references)
  • Relationships (key columns)
  • Row-level security (filter expressions)
In Power Query: Right-click column → Remove
Or: Select needed columns → Right-click → Remove Other Columns

Best Practice: "Remove Other Columns" is more future-proof than "Remove Columns" — if the source adds new columns, they are automatically excluded.

Reduce Column Precision

  • Use Date instead of Date/Time when time is not needed
  • Round decimal values to fewer decimal places
  • Truncate text columns to maximum needed length

Replace High-Cardinality Text with Integers

Instead of using text-based keys for relationships:

Before (Text Key)After (Integer Key)
"PROD-ABC-12345"1
"PROD-DEF-67890"2
"PROD-GHI-11111"3

Create a surrogate integer key in Power Query:

Add Column → Index Column → From 1

Row Optimization

Filter Historical Data

If reports only need the last 3 years of data:

Table.SelectRows(Source, each [Date] >= #date(2024, 1, 1))

Aggregate to Needed Granularity

If reports never drill below the daily level but data is per-second:

Group By Date, Product → Sum of Amount, Count of Transactions

This can reduce millions of rows to thousands.

Use Aggregation Tables

For large models, create pre-aggregated summary tables:

  1. Load both detail and summary tables
  2. Configure aggregations in the model
  3. Power BI automatically routes high-level queries to the summary table
  4. Drill-through queries use the detail table

Disable Auto Date/Time

File → Options → Data Load → Time Intelligence
→ Uncheck "Auto Date/Time for new files"

Auto Date/Time creates a hidden date table for every date column in the model. With 10 date columns, that is 10 hidden tables with full date hierarchies — significant overhead.

Instead: Create one explicit Date dimension table and use it for all date-based analysis.

On the Exam

The PL-300 frequently tests:

  • Identifying high-cardinality columns as model size problems
  • Choosing strategies to reduce model size
  • Understanding the impact of data types on compression
  • Knowing when to aggregate data before loading
  • Disabling Auto Date/Time and using an explicit date table
Test Your Knowledge

Which of the following columns would cause the WORST compression and largest model size increase?

A
B
C
D
Test Your Knowledge

Why should you disable the Auto Date/Time feature in Power BI?

A
B
C
D
Test Your Knowledge

A model has a detailed transactions table with 50 million rows at per-second granularity. Reports only show daily summaries. What is the best optimization approach?

A
B
C
D