A lakehouse table loaded by frequent micro-batches has become slow to query. Which maintenance operation most directly addresses the small-file problem?

OPTIMIZE to compact small files into fewer larger ones. OPTIMIZE compacts many small Parquet files into fewer larger ones, which is the direct fix for the small-file proliferation that frequent micro-batch loads cause. VACUUM reclaims storage but does not compact, converting the table to CSV gives up Delta benefits, and adding more partitions can worsen the small-file problem.

When eliminating distractors on a DP-700 tool-selection question, which scenario cue most strongly points away from a Spark notebook?

A small recurring incremental copy needing least cost and effort. A small recurring incremental copy with a least-cost, least-effort constraint points to Copy job, not a notebook, because a Spark session reserves compute for its whole duration and is overkill. The other scenarios — large SCD transforms, ML, and complex custom joins — are exactly where notebooks excel.

An incremental load reports missing updated rows even though new inserts appear correctly. What is the most likely root cause to investigate first?

The watermark column can be back-dated, so updates fall below the high-water mark. Inserts appearing while updates are missed is the signature of a flawed watermark — an editable or back-datable column lets updated rows receive values below the stored high-water mark, so they are filtered out. Storage, iteration count, and V-Order would not selectively drop updated rows.
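The failure described above can be reproduced in a few lines. This is a minimal sketch in plain Python, not Fabric pipeline code: the function `incremental_load`, the column name `modified_at`, and the sample rows are all hypothetical, chosen only to show how a back-dated update falls below the stored high-water mark.

```python
from datetime import datetime

def incremental_load(source_rows, watermark):
    """Return the rows newer than the stored watermark, plus the new watermark."""
    # Lookup + filter: keep only rows stamped after the high-water mark.
    changed = [r for r in source_rows if r["modified_at"] > watermark]
    # Watermark write: advance to the newest timestamp seen, or keep the old value.
    new_watermark = max((r["modified_at"] for r in changed), default=watermark)
    return changed, new_watermark

# Hypothetical state: the previous run stored a watermark of 10 January.
watermark = datetime(2026, 1, 10)
source = [
    {"id": 1, "modified_at": datetime(2026, 1, 5), "status": "unchanged"},  # loaded earlier
    {"id": 2, "modified_at": datetime(2026, 1, 12), "status": "inserted"},  # new insert
    {"id": 3, "modified_at": datetime(2026, 1, 8), "status": "updated"},    # updated, but back-dated
]

loaded, watermark_after = incremental_load(source, watermark)
loaded_ids = [r["id"] for r in loaded]
print(loaded_ids)  # the insert arrives; the back-dated update does not
```

Row 3 was changed after the last run, yet its timestamp sits below the watermark, so the filter drops it. A column that the system sets and that users cannot edit avoids this.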

Which combination correctly pairs each medallion layer with a typical Fabric implementation in a lakehouse-plus-warehouse design?

Bronze and Silver in lakehouses, Gold in a warehouse. A common, Microsoft-recommended pattern places Bronze and Silver in lakehouses (Spark-friendly, schema-on-read) and Gold in a warehouse so business users query a familiar T-SQL endpoint. The other pairings mismatch layers with stores that cannot serve those roles.

Practice Drills and Readiness Markers — Free Study Guide 2026

Key Takeaways

You are ready when you can instantly map a scenario to Copy activity, Copy job, Dataflow Gen2, notebook, Eventstream, or Eventhouse based on volume, latency, and effort cues.
Master the medallion flow end to end: shortcut/mirror or Copy into Bronze, Spark/Dataflow into Silver, denormalized dimensional model into Gold.
Be able to build a watermark incremental pipeline by hand (Lookup + Copy + watermark write) and explain when Copy job replaces all of it.
Know the streaming stack cold: Eventstream sources/operations/destinations, derived streams, Eventhouse ingestion modes, KQL update policies, and the four window types.
Practice optimization basics: OPTIMIZE/V-Order compaction and partitioning for lakehouse tables that load slowly or query slowly.

Readiness Self-Check

Before exam day, you should be able to answer each of these in seconds:

Given a source, volume, latency, and "least effort" constraint, name the correct ingestion tool and the transformation language for the destination.
Sketch a medallion flow: how does raw data reach Bronze (shortcut, mirror, Copy, or Eventstream), Silver (notebook or Dataflow), and Gold (dimensional model in a warehouse)?
Build a watermark-based incremental pipeline by hand, and explain why Copy job removes the need for it.
Pick the right window function (tumbling, hopping, sliding, session) for a stated streaming aggregation.
Decide between a native KQL table and a OneLake shortcut for Real-Time Intelligence, and when Query acceleration for a shortcut is worth it.

If any of these is slow, return to the relevant section. The exam gives you many short scenarios, so speed and confident elimination of distractors matter as much as raw knowledge.

A useful mental checklist for any ingestion question is Source → Volume → Latency → Transform → Destination. Identify each in the prompt: where the data comes from, how much there is, how fresh it must be, what reshaping it needs, and where it must land. Those five answers almost always collapse to a single correct tool and language.

For example, operational SQL source + millions of changed rows + near real time + no reshaping + OneLake resolves to mirroring; flat files + terabytes + nightly + none + lakehouse resolves to Copy activity; Kafka + continuous + sub-second + filter + eventhouse resolves to Eventstream into a KQL database.
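To rehearse the checklist, the three worked examples above can be written as explicit rules. The sketch below is a study aid under stated assumptions, not a complete decision tree: the `Scenario` class, the `pick_tool` function, and the string labels are hypothetical, and the rules cover only the three cases named in the text.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    source: str
    volume: str
    latency: str
    transform: str
    destination: str

def pick_tool(s: Scenario) -> str:
    """Toy rule set that encodes only the three worked examples."""
    if s.latency == "sub-second" and s.destination == "eventhouse":
        return "Eventstream into a KQL database"
    if s.source == "operational SQL" and s.latency == "near real time" and s.transform == "none":
        return "mirroring"
    if s.source == "flat files" and s.latency == "nightly" and s.transform == "none":
        return "Copy activity"
    # Anything else needs the fuller elimination reasoning practiced in the drills.
    return "no rule matched"

examples = [
    Scenario("operational SQL", "millions of changed rows", "near real time", "none", "OneLake"),
    Scenario("flat files", "terabytes", "nightly", "none", "lakehouse"),
    Scenario("Kafka", "continuous", "sub-second", "filter", "eventhouse"),
]
picks = [pick_tool(s) for s in examples]
print(picks)
```

Writing the rules out makes it obvious which cues actually decide the answer. In these three examples latency and transform do most of the work, while volume confirms the choice.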

Drill: Tool-Selection Reps

Run these reps and justify each pick out loud:

Scenario	Correct choice
Move 5 TB from blob storage to a lakehouse, no transforms, scheduled nightly	Copy activity in a pipeline
Recurring CDC replication from Azure SQL DB, least effort	Copy job (or mirroring)
Analysts clean and merge CSVs with Power Query before Silver	Dataflow Gen2
SCD Type 2 over a 300M-row dimension	Spark notebook
Capture IoT telemetry and route to KQL + lakehouse	Eventstream
Interactive ad-hoc analytics on billions of log rows	Eventhouse / KQL
Make an existing ADLS Gen2 folder queryable without copying	OneLake shortcut

When you can explain why each alternative is wrong, not just why your pick is right, you are at exam standard. Most DP-700 questions are won by eliminating two plausible-but-wrong tools.

Extend the drill to transformation-language reps. For each store, state the language and the one thing that store does best: lakehouse → PySpark/Spark SQL, best for large distributed transforms and ML; warehouse → T-SQL, best for multi-table transactions and stored-procedure logic; Dataflow → M, best for analyst-friendly Power Query cleaning; eventhouse → KQL, best for interactive analytics on telemetry.

Then practice the streaming reps: name the four window types and one requirement each fits — tumbling for non-overlapping period totals, hopping for moving averages, sliding for event-driven emission, session for grouping bursts by an inactivity gap. Being able to recite these without hesitation removes a whole class of avoidable mistakes.
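Three of the four window types can be made concrete with a small simulation. This is a plain-Python sketch, not Eventstream or KQL syntax: the function names, the `(timestamp_seconds, value)` event shape, and the half-open `[start, start + size)` boundary convention are assumptions for illustration, and real engines may treat window edges differently. Sliding windows are omitted because their output is triggered by events rather than by a fixed schedule.

```python
from collections import defaultdict

def tumbling(events, size):
    """Sum values in fixed, non-overlapping windows [k*size, (k+1)*size)."""
    totals = defaultdict(int)
    for ts, value in events:
        totals[(ts // size) * size] += value
    return dict(sorted(totals.items()))

def hopping(events, size, hop):
    """Sum values in overlapping windows of length `size` that start every `hop` seconds."""
    totals = defaultdict(int)
    for ts, value in events:
        # Begin at the latest window start <= ts, then step back while the window still covers ts.
        start = (ts // hop) * hop
        while start > ts - size and start >= 0:  # windows starting before time 0 are ignored
            totals[start] += value
            start -= hop
    return dict(sorted(totals.items()))

def sessions(events, gap):
    """Group timestamps into sessions that close after more than `gap` seconds of inactivity."""
    result = []
    for ts, _ in sorted(events):
        if result and ts - result[-1][-1] <= gap:
            result[-1].append(ts)
        else:
            result.append([ts])
    return result

events = [(1, 10), (4, 20), (11, 5), (12, 5), (30, 1)]
tumbling_totals = tumbling(events, size=10)
hopping_totals = hopping(events, size=10, hop=5)
session_groups = sessions(events, gap=5)
```

Note that each event lands in exactly one tumbling window but in two hopping windows here, because the hop is half the window size. The session grouping depends only on the gaps between events, not on any clock boundary.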

Drill: Optimization and Reliability

The monitor-and-optimize domain overlaps here, so practice the table-tuning levers that keep ingestion healthy:

OPTIMIZE / table compaction: merges many small Parquet files into fewer large ones; small-file proliferation from frequent micro-batch loads is a common cause of slow lakehouse reads.
V-Order: Fabric's write-time optimization that improves read performance for Power BI and SQL engines on Delta tables.
Partitioning: partition large fact tables on a low-cardinality column that queries routinely filter on (often a date) so queries scan less data; avoid over-partitioning, which creates tiny files.
VACUUM: removes old, unreferenced data files after retention, reclaiming storage; respect the retention window so you do not break time-travel queries that still need older versions.
Pipeline retry and monitoring: configure activity retry counts and intervals, set timeouts, and use the Monitor hub and run history to find failed activities, inspect input/output, and re-run from the point of failure rather than restarting the whole pipeline.
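The compaction lever in the list above can be pictured as a simple grouping exercise. The sketch below only models file counts. It does not rewrite Parquet or call any Fabric or Delta API, and the function `plan_compaction`, the 128 MB target, and the 4 MB file size are hypothetical values chosen for illustration.

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedily group small files into output files of at most `target_mb`."""
    bins, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb):
        # Close the current output file when adding this input would exceed the target.
        if current and current_size + size > target_mb:
            bins.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return bins

# Hypothetical table: 64 micro-batch loads that each wrote one 4 MB file.
small_files = [4] * 64
plan = plan_compaction(small_files, target_mb=128)
files_before, files_after = len(small_files), len(plan)
print(files_before, files_after)
```

The data volume is unchanged, but a reader now opens far fewer files, which is why compaction helps query speed without touching the data itself.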

Finally, rehearse the failure-mode reasoning: slow reads on a frequently loaded table usually mean small files (compact), a skipped-rows incremental usually means a bad watermark, and a streaming gap usually means an under-provisioned eventhouse or wrong ingestion mode. Pattern-matching symptoms to root causes is exactly how the optimization questions are framed.

Close your preparation by linking ingestion to the dimensional model it feeds. Gold tables should be denormalized, carry surrogate keys, and implement the correct SCD type, because the downstream semantic model and Power BI reports assume a clean star schema. If a report shows duplicated dimension members, the root cause is usually a missing dedup or a broken surrogate-key assignment upstream in Silver — not the report itself. Tracing a reporting symptom back through Gold, Silver, and Bronze to the offending ingestion step is the highest-order skill this domain tests, and it is what separates a passing score from a guess.
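The two upstream steps named above, deduplication and surrogate-key assignment, can be sketched in plain Python. This is an illustration under assumptions rather than Spark or T-SQL code: the functions `dedupe_latest` and `assign_surrogate_keys`, the columns `customer_id`, `updated_at`, and `customer_sk`, and the sample rows are all hypothetical.

```python
def dedupe_latest(rows, key="customer_id", order="updated_at"):
    """Keep one row per business key: the one with the highest `order` value."""
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[order] > latest[k][order]:
            latest[k] = row
    return list(latest.values())

def assign_surrogate_keys(rows, key="customer_id", existing=None):
    """Attach a stable integer surrogate key, reusing keys already issued."""
    key_map = dict(existing or {})
    next_sk = max(key_map.values(), default=0) + 1
    out = []
    for row in rows:
        if row[key] not in key_map:
            key_map[row[key]] = next_sk
            next_sk += 1
        out.append({**row, "customer_sk": key_map[row[key]]})
    return out, key_map

# Hypothetical Silver rows: customer C1 arrives twice.
silver = [
    {"customer_id": "C1", "name": "Ann", "updated_at": 1},
    {"customer_id": "C2", "name": "Bob", "updated_at": 1},
    {"customer_id": "C1", "name": "Ann B.", "updated_at": 2},
]
dim, key_map = assign_surrogate_keys(dedupe_latest(silver))
print(dim)
```

Skipping `dedupe_latest` would carry both C1 rows into the dimension, and a report grouping by name would then show the member twice, which is the symptom described above.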

When you can do that reliably across batch and streaming paths, you have met the depth standard for this chapter and are ready for the Ingest-and-transform questions on DP-700.

DP-700 Study Guide

Azure DP-700

3.5 Practice Drills and Readiness Markers
