6.2 Model Size Reduction and Cardinality Management
Key Takeaways
- High-cardinality columns (many unique values) compress poorly and increase model size disproportionately.
- Remove unnecessary columns before loading — every column consumes memory even if unused in reports.
- Reduce row count by filtering historical data, aggregating to the needed granularity, or using incremental refresh.
- Integer keys compress better than text keys — create surrogate integer keys when possible.
- Disable Auto Date/Time to prevent hidden date tables from bloating the model.
Model Size Reduction and Cardinality Management
Quick Answer: Model size is driven by column count, row count, and cardinality (unique values per column). Remove unused columns, filter unnecessary rows, use integers instead of text for keys, reduce precision of decimals, and disable Auto Date/Time. High-cardinality text columns (GUIDs, free-text descriptions) are the biggest offenders.
Understanding Model Size
Power BI's VertiPaq engine compresses data using dictionary encoding, value encoding, and run-length encoding. The compression ratio depends on:
Factors Affecting Size
| Factor | Impact |
|---|---|
| Number of columns | Each column has overhead regardless of row count |
| Number of rows | More rows = more data to store |
| Cardinality | More unique values = worse compression |
| Data type | Text > Decimal > Integer > Boolean (most to least storage) |
| Value distribution | Skewed distributions compress better than uniform |
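To see why cardinality dominates, consider a minimal sketch of dictionary encoding in Python. This is a conceptual illustration only, not VertiPaq's actual implementation: each column stores its distinct values once in a dictionary, plus one small integer index per row, so a low-cardinality column of a million rows needs only a tiny dictionary.

```python
def dictionary_encode(values):
    """Encode a column as (dictionary of unique values, per-row index list)."""
    dictionary = []
    positions = {}          # value -> its index in the dictionary
    indexes = []
    for v in values:
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        indexes.append(positions[v])
    return dictionary, indexes

# Low cardinality: 1,000,000 rows but only 3 distinct values ->
# the dictionary stays tiny and the indexes compress well.
status = ["Open", "Closed", "Pending", "Open"] * 250_000
d, idx = dictionary_encode(status)
print(len(d))    # 3
```

With a high-cardinality column (every value unique, like a GUID), the dictionary grows as large as the column itself and the encoding saves nothing — which is exactly why such columns inflate model size.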
Understanding Cardinality
Cardinality = the number of distinct values in a column.
| Cardinality Level | Example | Compression |
|---|---|---|
| Very low (2-10) | Gender, Status, Boolean | Excellent |
| Low (10-1,000) | Category, Region, Department | Very Good |
| Medium (1,000-100,000) | City, Product Name | Good |
| High (100,000+) | Customer Email, Transaction ID | Poor |
| Very high (millions) | GUID, Free Text, Timestamps | Very Poor |
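Measuring cardinality is a one-liner once the data is loaded. A hypothetical pandas sketch (column names are illustrative only) — the same per-column distinct counts are what tools like DAX Studio's VertiPaq Analyzer report for a Power BI model:

```python
import pandas as pd

# Illustrative sample data with three cardinality levels.
df = pd.DataFrame({
    "Status": ["Open", "Closed"] * 500,              # very low cardinality
    "Region": ["North", "South", "East", "West"] * 250,  # low cardinality
    "TransactionID": range(1000),                    # unique per row: high
})

# Distinct values per column, highest first -- the top entries are the
# candidates for removal, truncation, or integer surrogate keys.
cardinality = df.nunique().sort_values(ascending=False)
print(cardinality)
```

Columns whose distinct count approaches the row count (here `TransactionID`) are the ones to scrutinize first.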
Column Optimization
Remove Unused Columns
Every column in the model consumes memory. Remove columns that are not used in:
- Reports (visuals, slicers, filters)
- Measures (DAX references)
- Relationships (key columns)
- Row-level security (filter expressions)
In Power Query: Right-click column → Remove
Or: Select needed columns → Right-click → Remove Other Columns
Best Practice: "Remove Other Columns" is more future-proof than "Remove Columns" — if the source adds new columns, they are automatically excluded.
Reduce Column Precision
- Use Date instead of Date/Time when time is not needed
- Round decimal values to fewer decimal places
- Truncate text columns to the maximum length actually needed
Replace High-Cardinality Text with Integers
Instead of using text-based keys for relationships:
| Before (Text Key) | After (Integer Key) |
|---|---|
| "PROD-ABC-12345" | 1 |
| "PROD-DEF-67890" | 2 |
| "PROD-GHI-11111" | 3 |
Create a surrogate integer key in Power Query:
Add Column → Index Column → From 1
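The same surrogate-key idea, sketched in pandas for illustration: assign a dense integer per distinct text key, keep the text key only in the small dimension table, and carry just the integer on the large fact table. In Power Query this corresponds to adding the index column to the dimension and merging it back onto the fact table.

```python
import pandas as pd

# Hypothetical fact table carrying a wide text key on every row.
fact = pd.DataFrame({"ProductKey": ["PROD-ABC-12345",
                                    "PROD-DEF-67890",
                                    "PROD-ABC-12345"]})

# factorize assigns one small integer per distinct text value.
codes, uniques = pd.factorize(fact["ProductKey"])
fact["ProductID"] = codes + 1          # start at 1, like "From 1"

# The text key survives only in the (much smaller) dimension table;
# the fact table can now drop ProductKey entirely.
dim_product = pd.DataFrame({"ProductID": range(1, len(uniques) + 1),
                            "ProductKey": uniques})
fact = fact.drop(columns="ProductKey")
```

The relationship now joins on a compact integer instead of a repeated 14-character string.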
Row Optimization
Filter Historical Data
If reports only need the last 3 years of data:
Table.SelectRows(Source, each [Date] >= Date.AddYears(Date.From(DateTime.LocalNow()), -3))
A relative filter like this keeps the window rolling; a hardcoded literal such as #date(2024, 1, 1) works but must be updated manually.
Aggregate to Needed Granularity
If reports never drill below the daily level but data is per-second:
Group By Date, Product → Sum of Amount, Count of Transactions
This can reduce millions of rows to thousands.
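The Group By step above can be sketched in pandas for illustration (the data and column names are hypothetical): a day of per-second transactions collapses to one row per date and product.

```python
import pandas as pd

# Per-second fact rows: 86,400 rows for a single day, two products.
tx = pd.DataFrame({
    "Timestamp": pd.date_range("2024-01-01", periods=86_400, freq="s"),
    "Product": ["A", "B"] * 43_200,
    "Amount": 1.0,
})

# Group By Date, Product -> Sum of Amount, Count of Transactions
daily = (tx.assign(Date=tx["Timestamp"].dt.date)
           .groupby(["Date", "Product"], as_index=False)
           .agg(Amount=("Amount", "sum"),
                Transactions=("Amount", "size")))
print(len(tx), "->", len(daily))   # 86400 -> 2
```

The totals are preserved; only the granularity below the reporting level is discarded.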
Use Aggregation Tables
For large models, create pre-aggregated summary tables:
- Load both detail and summary tables
- Configure aggregations in the model
- Power BI automatically routes high-level queries to the summary table
- Drill-through queries use the detail table
Disable Auto Date/Time
File → Options and settings → Options → Data Load → Time intelligence
→ Uncheck "Auto date/time for new files"
Auto Date/Time creates a hidden date table for every date column in the model. With 10 date columns, that is 10 hidden tables with full date hierarchies — significant overhead.
Instead: Create one explicit Date dimension table and use it for all date-based analysis.
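For illustration, here is the shape of such a date dimension built in pandas (in Power BI you would typically generate the equivalent table with DAX `CALENDAR`/`CALENDARAUTO` or in Power Query; the column names here are just a common convention):

```python
import pandas as pd

# One contiguous row per day covering the model's date range.
dates = pd.date_range("2022-01-01", "2024-12-31", freq="D")
dim_date = pd.DataFrame({
    "Date": dates,
    "Year": dates.year,
    "Month": dates.month,
    "MonthName": dates.strftime("%B"),
    "Quarter": dates.quarter,
})
```

One such table, marked as the model's date table and related to every date column, replaces all the hidden Auto Date/Time tables at a fraction of the cost.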
On the Exam
The PL-300 frequently tests:
- Identifying high-cardinality columns as model size problems
- Choosing strategies to reduce model size
- Understanding the impact of data types on compression
- Knowing when to aggregate data before loading
- Disabling Auto Date/Time and using an explicit date table
Which of the following columns would cause the WORST compression and largest model size increase?
Why should you disable the Auto Date/Time feature in Power BI?
A model has a detailed transactions table with 50 million rows at per-second granularity. Reports only show daily summaries. What is the best optimization approach?