5.3 Micro-Partitions, Pruning, Clustering, and the Query Profile

Key Takeaways

Snowflake stores tables as immutable ~16 MB compressed micro-partitions, each holding per-column min/max metadata used for automatic pruning.
Define a clustering key only on large tables (multi-TB) where a common, selective filter no longer prunes well; reclustering consumes credits automatically.
SYSTEM$CLUSTERING_INFORMATION reports clustering depth and overlap — lower average depth means better-clustered data.
Query Profile flags spilling (undersized warehouse), low 'partitions scanned/total' (good pruning), and the most expensive nodes.

Last updated: June 2026

Micro-Partitions and Automatic Pruning

Snowflake physically stores every table as a set of micro-partitions — immutable, compressed columnar files of roughly 50–500 MB of uncompressed data (about 16 MB compressed), each holding up to a few hundred thousand rows. Crucially, Snowflake records metadata for every micro-partition: the range (min/max) of each column, the number of distinct values, and null counts. This metadata is what enables query pruning.

When a query has a filter such as WHERE order_date = '2026-01-15', the optimizer reads micro-partition metadata and skips every partition whose min/max range cannot contain that value — without scanning a byte of data. This is automatic and requires no configuration; it is Snowflake's substitute for indexes. Pruning is most effective when the filter column correlates with the order in which data was loaded, because related values then sit together in the same micro-partitions. As tables grow and receive many small DML operations, this natural ordering can degrade, and pruning becomes less selective.

Two properties make micro-partitions special: they are immutable (an update writes new partitions rather than editing in place, which is what powers Time Travel), and they are columnar within each partition, so a query reads only the columns and partitions it needs. Pruning therefore operates on both axes — partitions (rows) and columns — which is why SELECT * and unselective filters are doubly wasteful.

Clustering Keys and Reclustering

A clustering key is one or more columns (or expressions) that Snowflake uses to co-locate related rows into the same micro-partitions, restoring good pruning. You define it with ALTER TABLE t CLUSTER BY (order_date). Once defined, Snowflake performs automatic reclustering in the background as data changes — this consumes Snowflake-managed compute credits, billed separately from your warehouse.

Clustering is not free and is not for every table. Define a clustering key only when all of these hold:

The table is large (generally multiple terabytes / hundreds of millions of rows).
Queries frequently filter or join on the chosen column with high selectivity.
Query Profile shows the table no longer prunes well (high partitions-scanned ratio).

Choose a key with the right cardinality: high enough to be selective, low enough that values group into partitions (a unique key like a primary key is a poor choice). On small tables, the reclustering cost outweighs any benefit, and the exam expects you to reject clustering there.

Use SELECT SYSTEM$CLUSTERING_INFORMATION('t', '(order_date)') to inspect clustering quality. It returns the average clustering depth (how many overlapping micro-partitions a typical value spans) and overlap histograms. A lower average depth is better — depth near 1 means each value lives in few partitions, so pruning is excellent.

Reading the Query Profile and EXPLAIN

EXPLAIN returns the logical plan and estimated partitions/bytes without running the query. The Query Profile (in Snowsight, per executed query) is the richer diagnostic. Learn to read these signals:

Profile signal	Meaning	Typical fix
Partitions scanned ≈ Partitions total	Poor pruning; the filter isn't skipping data	Add/repair clustering key, or use search optimization
Bytes spilled to local storage	Warehouse memory exceeded; using SSD	Scale up the warehouse
Bytes spilled to remote storage	Even SSD exceeded; spilling to cloud storage (worst case)	Scale up further; reduce data processed
A single node = most of execution time	That operator (Join, Aggregate, TableScan) is the bottleneck	Optimize that step (join order, filter earlier)
Exploding / cartesian join	Missing or wrong join condition	Fix join keys

The percentage of time each operator consumes and the partitions scanned / partitions total ratio are the two numbers the exam cares about most. A low ratio (e.g., 12 of 4,000) proves pruning is working; a high ratio proves it is not. Spilling — local first, then the much slower remote spilling — is the unambiguous signal to scale up.

What Kills Pruning, and What to Do

Pruning quietly degrades for predictable reasons; recognizing them is exam gold:

Functions or casts on the filtered column. WHERE TO_DATE(ts) = '2026-01-15' can prevent the optimizer from using stored min/max ranges; filtering on the raw column (or a half-open range) preserves pruning.
Leading wildcards. LIKE '%abc' cannot prune on a range; consider the Search Optimization Service for substring search.
Heavy small-DML churn. Many small inserts/updates create poorly ordered micro-partitions, raising clustering depth over time — the case for a clustering key on a large table.
Low correlation between filter column and load order. If data arrives sorted by id but you filter by region, every partition may match — clustering by region fixes it.

Reclustering cost is the trade you must weigh. Snowflake reclusters automatically using serverless credits proportional to how much data must be re-sorted; high-churn tables can recluster expensively. The guidance: cluster a large table on a frequently filtered, selective column, then monitor with SYSTEM$CLUSTERING_INFORMATION and SYSTEM$CLUSTERING_DEPTH. If reclustering credits exceed query savings, drop the key.

Finally, distinguish the Profile from EXPLAIN. EXPLAIN is pre-execution and shows estimates (useful to compare plans cheaply); the Query Profile is post-execution and shows what actually happened — real partitions scanned, real bytes spilled, real per-operator time. When a question asks you to diagnose a query that already ran slowly, the Query Profile is the correct tool; when it asks how to check a plan before running it, EXPLAIN is the answer.

Test Your Knowledge

A 5 TB events table is filtered daily on event_date, but Query Profile shows nearly all micro-partitions are scanned for each date filter. What is the most appropriate action?

Switch the warehouse scaling policy to ECONOMY

Enable the result cache for the table

Reduce the warehouse size to save credits

Define a clustering key on event_date so rows co-locate and pruning improves

Test Your Knowledge

What does a lower average clustering depth reported by SYSTEM$CLUSTERING_INFORMATION indicate?

Each value spans fewer overlapping micro-partitions, so the table is well clustered

The warehouse is too small for the workload

The table is poorly clustered and needs immediate reclustering

The result cache is being used efficiently

Test Your Knowledge

In a Query Profile, you see 'Bytes spilled to remote storage' as a large value. What does this most directly indicate?

The query benefited from the result cache

Pruning eliminated most micro-partitions

The warehouse exhausted memory and local SSD and is spilling to slow cloud storage; scale up

A clustering key must be dropped

Up Next

5.4 Search Optimization, Materialized Views, and Query Best Practices

Continue learning

SnowPro Core Study Guide

SnowPro Core Certification

5.3 Micro-Partitions, Pruning, Clustering, and the Query Profile

Key Takeaways

Micro-Partitions and Automatic Pruning

Clustering Keys and Reclustering

Reading the Query Profile and EXPLAIN

What Kills Pruning, and What to Do

SnowPro Core Study Guide

1Chapter 1: SnowPro Core Orientation and Exam Strategy

2Chapter 2: Snowflake AI Data Cloud Features and Architecture

3Chapter 3: Account Management and Data Governance

4Chapter 4: Data Loading, Unloading, and Connectivity

5Chapter 5: Performance Optimization, Querying, and Transformation

6Chapter 6: Data Collaboration

7Chapter 7: Final Review and Test Day

SnowPro Core Certification

5.3 Micro-Partitions, Pruning, Clustering, and the Query Profile

Key Takeaways

Micro-Partitions and Automatic Pruning

Clustering Keys and Reclustering

Reading the Query Profile and EXPLAIN

What Kills Pruning, and What to Do