2.1 Data Profiling in Power Query

Key Takeaways

  • Data profiling tools include Column Quality, Column Distribution, and Column Profile in the View tab.
  • Column Quality shows percentages of Valid, Error, and Empty values per column.
  • Column Distribution displays distinct count and unique count, helping identify potential key columns.
  • Column Profile provides detailed statistics (min, max, average, standard deviation) and value distribution for the selected column.
  • By default, profiling is based on the first 1,000 rows — change to 'Column profiling based on entire dataset' for accuracy.
Last updated: March 2026

Data Profiling in Power Query

Quick Answer: Power Query has three data profiling tools in the View tab: Column Quality (valid/error/empty percentages), Column Distribution (distinct/unique counts), and Column Profile (detailed statistics). Always switch to "entire dataset" profiling — the default only scans the first 1,000 rows.

Data profiling is the process of examining data to understand its structure, quality, and content before applying transformations. Power Query provides built-in profiling tools that are essential for making informed data cleaning decisions.

Enabling Data Profiling

Data profiling tools are found in the View tab of Power Query Editor:

View tab → Data Preview section → Check the boxes:
  ☑ Column Quality
  ☑ Column Distribution
  ☑ Column Profile

Critical Setting: By default, profiling is based on the first 1,000 rows. For accurate profiling, change this by clicking the status bar at the bottom of Power Query Editor and selecting "Column profiling based on entire dataset".

Column Quality

Column Quality adds a color-coded bar at the top of each column showing the percentage of:

IndicatorColorMeaning
ValidGreenValues conform to the column's data type
ErrorRedValues that cannot be resolved (type mismatches, division by zero)
EmptyGrayNull or blank values

What to look for:

  • Columns with high error percentages → may need type changes or error handling
  • Columns with high empty percentages → decide whether to filter, replace, or fill
  • 100% valid doesn't mean the data is correct — just that it matches the expected type

Column Distribution

Column Distribution shows two key metrics as a histogram at the top of each column:

MetricDefinitionUse Case
DistinctTotal number of different valuesUnderstanding cardinality
UniqueValues that appear exactly onceIdentifying potential keys

Interpreting the metrics:

  • Distinct = Row Count → every value is different (likely a key column)
  • Distinct = 1 → all values are the same (column may be unnecessary)
  • Unique = Distinct → no duplicate values (good candidate for a primary key)
  • Unique << Distinct → many values repeat (typical for category columns)

Example:

  • A CustomerID column with 10,000 rows, 10,000 distinct values, and 10,000 unique values → perfect primary key candidate
  • A Region column with 10,000 rows but only 5 distinct values → categorical column

Column Profile

Column Profile provides the most detailed analysis for the selected column. It appears in a panel below the data preview.

For Text Columns

StatisticDescription
CountNumber of non-null values
ErrorNumber of error values
EmptyNumber of null/blank values
DistinctNumber of unique values
UniqueValues appearing exactly once
Min LengthShortest text value
Max LengthLongest text value
Value DistributionBar chart of value frequencies

For Numeric Columns

StatisticDescription
CountNumber of non-null values
ErrorNumber of error values
EmptyNumber of null/blank values
MinSmallest value
MaxLargest value
AverageArithmetic mean
Standard DeviationMeasure of data spread
Even/Odd CountDistribution of even and odd values
Zero CountNumber of zero values

For Date Columns

StatisticDescription
MinEarliest date
MaxLatest date
CountNumber of non-null dates
Value DistributionTimeline visualization of date frequencies

Profiling Workflow

A systematic profiling approach:

  1. Enable all three profiling tools and switch to entire dataset
  2. Check Column Quality — flag columns with errors or high empty rates
  3. Review Column Distribution — identify keys, categories, and anomalies
  4. Examine Column Profile — deep-dive into suspicious columns
  5. Document findings — note which columns need cleaning, type changes, or removal
  6. Prioritize actions — fix errors first, then handle nulls, then optimize types

On the Exam

The PL-300 frequently tests:

  • Knowing where to find profiling tools (View tab in Power Query Editor)
  • Interpreting Column Quality percentages (valid, error, empty)
  • Distinguishing between distinct count and unique count
  • Understanding the default 1,000-row profiling limitation
  • Using profiling results to inform data cleaning decisions
Test Your Knowledge

In Power Query, Column Distribution shows that a column has 500 distinct values and 500 unique values. What does this tell you?

A
B
C
D
Test Your Knowledge

By default, Power Query column profiling is based on:

A
B
C
D
Test Your Knowledge

Column Quality shows a column is 85% Valid, 10% Error, and 5% Empty. Which profiling tool shows these percentages?

A
B
C
D