5.3 Micro-Partitions, Pruning, Clustering, and the Query Profile
Key Takeaways
- Snowflake stores tables as immutable ~16 MB compressed micro-partitions, each holding per-column min/max metadata used for automatic pruning.
- Define a clustering key only on large tables (multi-TB) where a common, selective filter no longer prunes well; reclustering consumes credits automatically.
- SYSTEM$CLUSTERING_INFORMATION reports clustering depth and overlap — lower average depth means better-clustered data.
- Query Profile flags spilling (undersized warehouse), low 'partitions scanned/total' (good pruning), and the most expensive nodes.
Micro-Partitions and Automatic Pruning
Snowflake physically stores every table as a set of micro-partitions — immutable, compressed columnar files of roughly 50–500 MB of uncompressed data (about 16 MB compressed), each holding up to a few hundred thousand rows. Crucially, Snowflake records metadata for every micro-partition: the range (min/max) of each column, the number of distinct values, and null counts. This metadata is what enables query pruning.
When a query has a filter such as WHERE order_date = '2026-01-15', the optimizer reads micro-partition metadata and skips every partition whose min/max range cannot contain that value — without scanning a byte of data. This is automatic and requires no configuration; it is Snowflake's substitute for indexes. Pruning is most effective when the filter column correlates with the order in which data was loaded, because related values then sit together in the same micro-partitions. As tables grow and receive many small DML operations, this natural ordering can degrade, and pruning becomes less selective.
Two properties make micro-partitions special: they are immutable (an update writes new partitions rather than editing in place, which is what powers Time Travel), and they are columnar within each partition, so a query reads only the columns and partitions it needs. Pruning therefore operates on both axes — partitions (rows) and columns — which is why SELECT * and unselective filters are doubly wasteful.
Clustering Keys and Reclustering
A clustering key is one or more columns (or expressions) that Snowflake uses to co-locate related rows into the same micro-partitions, restoring good pruning. You define it with ALTER TABLE t CLUSTER BY (order_date). Once defined, Snowflake performs automatic reclustering in the background as data changes — this consumes Snowflake-managed compute credits, billed separately from your warehouse.
Clustering is not free and is not for every table. Define a clustering key only when all of these hold:
- The table is large (generally multiple terabytes / hundreds of millions of rows).
- Queries frequently filter or join on the chosen column with high selectivity.
- Query Profile shows the table no longer prunes well (high partitions-scanned ratio).
Choose a key with the right cardinality: high enough to be selective, low enough that values group into partitions (a unique key like a primary key is a poor choice). On small tables, the reclustering cost outweighs any benefit, and the exam expects you to reject clustering there.
Use SELECT SYSTEM$CLUSTERING_INFORMATION('t', '(order_date)') to inspect clustering quality. It returns the average clustering depth (how many overlapping micro-partitions a typical value spans) and overlap histograms. A lower average depth is better — depth near 1 means each value lives in few partitions, so pruning is excellent.
Reading the Query Profile and EXPLAIN
EXPLAIN returns the logical plan and estimated partitions/bytes without running the query. The Query Profile (in Snowsight, per executed query) is the richer diagnostic. Learn to read these signals:
| Profile signal | Meaning | Typical fix |
|---|---|---|
| Partitions scanned ≈ Partitions total | Poor pruning; the filter isn't skipping data | Add/repair clustering key, or use search optimization |
| Bytes spilled to local storage | Warehouse memory exceeded; using SSD | Scale up the warehouse |
| Bytes spilled to remote storage | Even SSD exceeded; spilling to cloud storage (worst case) | Scale up further; reduce data processed |
| A single node = most of execution time | That operator (Join, Aggregate, TableScan) is the bottleneck | Optimize that step (join order, filter earlier) |
| Exploding / cartesian join | Missing or wrong join condition | Fix join keys |
The percentage of time each operator consumes and the partitions scanned / partitions total ratio are the two numbers the exam cares about most. A low ratio (e.g., 12 of 4,000) proves pruning is working; a high ratio proves it is not. Spilling — local first, then the much slower remote spilling — is the unambiguous signal to scale up.
What Kills Pruning, and What to Do
Pruning quietly degrades for predictable reasons; recognizing them is exam gold:
- Functions or casts on the filtered column.
WHERE TO_DATE(ts) = '2026-01-15'can prevent the optimizer from using stored min/max ranges; filtering on the raw column (or a half-open range) preserves pruning. - Leading wildcards.
LIKE '%abc'cannot prune on a range; consider the Search Optimization Service for substring search. - Heavy small-DML churn. Many small inserts/updates create poorly ordered micro-partitions, raising clustering depth over time — the case for a clustering key on a large table.
- Low correlation between filter column and load order. If data arrives sorted by
idbut you filter byregion, every partition may match — clustering byregionfixes it.
Reclustering cost is the trade you must weigh. Snowflake reclusters automatically using serverless credits proportional to how much data must be re-sorted; high-churn tables can recluster expensively. The guidance: cluster a large table on a frequently filtered, selective column, then monitor with SYSTEM$CLUSTERING_INFORMATION and SYSTEM$CLUSTERING_DEPTH. If reclustering credits exceed query savings, drop the key.
Finally, distinguish the Profile from EXPLAIN. EXPLAIN is pre-execution and shows estimates (useful to compare plans cheaply); the Query Profile is post-execution and shows what actually happened — real partitions scanned, real bytes spilled, real per-operator time. When a question asks you to diagnose a query that already ran slowly, the Query Profile is the correct tool; when it asks how to check a plan before running it, EXPLAIN is the answer.
A 5 TB events table is filtered daily on event_date, but Query Profile shows nearly all micro-partitions are scanned for each date filter. What is the most appropriate action?
What does a lower average clustering depth reported by SYSTEM$CLUSTERING_INFORMATION indicate?
In a Query Profile, you see 'Bytes spilled to remote storage' as a large value. What does this most directly indicate?