A streaming load has produced thousands of tiny Parquet files in a Delta table, and analytical scans have become slow. Which maintenance command directly addresses the small-file problem?

OPTIMIZE to bin-compact small files into target-sized files. OPTIMIZE groups many small Parquet files into bins targeting an ideal file size and rewrites them, directly resolving the small-file problem that slows scans. VACUUM removes old unreferenced files (and a 1-day threshold is rejected by default), statistics affect the warehouse optimizer not lakehouse file layout, and disabling V-Order would hurt read performance.

Direct Lake reports on a large Delta table are read-heavy and you want the Parquet layout optimized so reads are as fast as possible. Which optimization is designed for this read-heavy scenario?

V-Order. V-Order applies write-time sorting, encoding, and compression to the Parquet layout specifically to accelerate read-heavy queries such as Power BI Direct Lake. VACUUM retention only governs file cleanup, a copy activity is an ingestion tool, and warehouse statistics do not change lakehouse Parquet layout.

You must identify the slowest and most frequently executed queries that ran against a Fabric Warehouse over the past few weeks. Which feature should you query?

The queryinsights schema views, which retain about 30 days of query history. Query insights provides autogenerated views under the queryinsights schema with roughly 30 days of historical query data, surfacing long-running and frequent queries. The Spark UI is for Spark applications not the SQL Warehouse, Activator is for alerting, and VACUUM relates to Delta file cleanup.

Scenario Practice for Monitor and optimize a | Free Guide 2026

Key Takeaways

OPTIMIZE bin-compacts small Parquet files into target-sized files; Fabric applies bin-compaction, then Z-Order, then V-Order.
V-Order is write-time sorting/encoding that speeds read-heavy queries; Z-Order co-locates values to improve data skipping.
VACUUM removes unreferenced files older than the retention threshold; the default and minimum safe threshold is 7 days.
Fabric Warehouse auto-creates and auto-refreshes histogram, average-column-length, and cardinality statistics for the optimizer.
Query insights (queryinsights schema) retains 30 days of historical query data; DMVs give live session/request status.

Delta table maintenance scenarios

Most lakehouse performance questions reduce to Delta table maintenance. Frequent small writes (streaming, incremental loads, many small files) produce the small-file problem, which slows reads. The fix is OPTIMIZE, which groups small Parquet files into bins targeting an ideal file size and rewrites them. Fabric applies operations in a fixed order: bin-compaction first, then Z-Order, then V-Order.

V-Order is a write-time optimization (sorting, encoding, compression) of the Parquet layout that accelerates read-heavy queries, especially Power BI Direct Lake. It is applied during OPTIMIZE or on write.
Z-Order co-locates related column values in the same files so queries that filter on those columns skip more data. Combine Z-Order with V-Order in one command for filtered analytical workloads.
VACUUM deletes files no longer referenced by the Delta log that are older than the retention threshold. The default is seven days, and Fabric rejects retention intervals under seven days by default to protect time-travel and concurrent readers.

You can run these from a Spark SQL notebook, a Spark job definition, or the Lakehouse table maintenance UI (which exposes OPTIMIZE with optional V-Order and VACUUM together).

Warehouse and query performance scenarios

For the Fabric Warehouse and the SQL analytics endpoint, the engine automatically creates statistics whenever the optimizer needs them and auto-refreshes stale ones. Supported automatic statistics are histogram, average column length, and table cardinality. You can still issue CREATE STATISTICS / UPDATE STATISTICS for single-column histograms when you want manual control — useful when a load reshapes data and you want fresh statistics before a critical query.

To diagnose slow queries, use query insights: autogenerated views under the queryinsights schema retain 30 days of historical query data and surface long-running and frequently-run queries. For live activity — current connections, sessions, and requests — query the dynamic management views (DMVs) such as sys.dm_exec_requests. Warehouse-level tuning also favors avoiding many small inserts (batch loads), choosing appropriate data types, and letting the distributed engine parallelize.

Symptom	Likely remedy
Many tiny Parquet files, slow scans	OPTIMIZE (bin-compaction)
Filtered queries read too much data	Z-Order on the filter columns
Slow Direct Lake / read-heavy reports	V-Order
Storage bloat from old versions	VACUUM (>= 7-day retention)
Stale plan estimates in Warehouse	UPDATE STATISTICS / rely on auto-stats
Find the worst historical query	queryinsights views (30 days)

Spark configuration scenarios

Spark performance questions hinge on compute configuration. Starter pools give 5-10 second session startup with always-on medium nodes. Custom pools let you set node size and autoscale min/max nodes, while dynamic executor allocation lets a running application request more executors within bounds when tasks exceed current capacity. For many notebooks sharing the same lakehouse, environment, and Spark config, enable High Concurrency mode so they share one session and avoid repeated startup overhead.

The Native Execution Engine (NEE) boosts performance by running operations in a vectorized C++ engine; it does not support UDFs and automatically falls back to the JVM for them. Read the stem: 'cold-start latency' points to starter/high-concurrency, 'variable load' points to autoscale plus dynamic allocation, and 'raw throughput on standard operations' points to the Native Execution Engine.

Partitioning and OneLake optimization scenarios

Partitioning is a double-edged tool. Partitioning a Delta table by a low-cardinality column that queries filter on (for example year or region) prunes whole folders and speeds reads. But over-partitioning by a high-cardinality column re-creates the small-file problem — thousands of tiny partitions, each with tiny files. The exam tests this judgment: partition on columns that are frequently filtered and low-cardinality, and otherwise rely on Z-Order for data skipping within larger files.

For OneLake and the lakehouse broadly, the optimization levers are: keep files at a healthy size (run OPTIMIZE after sizable writes), apply V-Order for read-heavy Direct Lake consumption, VACUUM on a sensible schedule to control storage and version sprawl, and use shortcuts to avoid copying data into OneLake unnecessarily (reducing duplication and ingestion CU). A Direct Lake semantic model reads Parquet directly from OneLake, so V-Ordered, well-compacted tables translate straight into faster reports.

Scenario reading method

Approach every performance scenario the same way: identify the layer (lakehouse Delta, Warehouse SQL, or Spark compute), find the bottleneck cue in the stem (small files, missing skipping, stale stats, cold start, executor pressure), and map it to the single matching lever. When two answers look plausible, the better one targets the exact bottleneck the stem describes rather than a generic 'make it bigger' fix.

Scheduling maintenance sensibly

Maintenance itself consumes CU, so timing matters. Run OPTIMIZE after meaningful write volume (for example after a daily batch load or once a streaming table accumulates many small files), not on every micro-batch. Schedule VACUUM periodically with at least the 7-day default retention so time travel and concurrent readers stay safe. For tables consumed by Direct Lake, prioritize V-Order so the most-read tables render reports fastest.

The exam may ask which maintenance to run and when — the best answers compact and V-Order read-heavy tables on a cadence proportional to write activity, while reserving VACUUM for controlled storage cleanup rather than constant execution.

DP-700 Study Guide

Azure DP-700

4.3 Scenario Practice for Monitor and optimize an analytics solution

Key Takeaways

Delta table maintenance scenarios

Warehouse and query performance scenarios

Spark configuration scenarios

Partitioning and OneLake optimization scenarios

Scenario reading method

Scheduling maintenance sensibly

DP-700 Study Guide

1Chapter 1: DP-700 Orientation and Exam Strategy

2Chapter 2: Implement and manage an analytics solution

3Chapter 3: Ingest and transform data

4Chapter 4: Monitor and optimize an analytics solution

5Chapter 5: Final Review and Test Day

Azure DP-700

4.3 Scenario Practice for Monitor and optimize an analytics solution

Key Takeaways

Delta table maintenance scenarios

Warehouse and query performance scenarios

Spark configuration scenarios

Partitioning and OneLake optimization scenarios

Scenario reading method

Scheduling maintenance sensibly