3.5 Practice Drills and Readiness Markers

Key Takeaways

  • You are ready when you can instantly map a scenario to Copy activity, Copy job, Dataflow Gen2, notebook, Eventstream, or Eventhouse based on volume, latency, and effort cues.
  • Master the medallion flow end to end: shortcut/mirror or Copy into Bronze, Spark/Dataflow into Silver, denormalized dimensional model into Gold.
  • Be able to build a watermark incremental pipeline by hand (Lookup + Copy + watermark write) and explain when Copy job replaces all of it.
  • Know the streaming stack cold: Eventstream sources/operations/destinations, derived streams, Eventhouse ingestion modes, KQL update policies, and the four window types.
  • Practice optimization basics: OPTIMIZE/V-Order compaction and partitioning for lakehouse tables that load slowly or query slowly.
Last updated: June 2026

Readiness Self-Check

Before exam day, you should be able to answer each of these in seconds:

  • Given a source, volume, latency, and "least effort" constraint, name the correct ingestion tool and the transformation language for the destination.
  • Sketch a medallion flow: how does raw data reach Bronze (shortcut, mirror, Copy, or Eventstream), Silver (notebook or Dataflow), and Gold (dimensional model in a warehouse)?
  • Build a watermark-based incremental pipeline by hand, and explain why Copy job removes the need for it.
  • Pick the right window function (tumbling, hopping, sliding, session) for a stated streaming aggregation.
  • Decide between a native KQL table and a OneLake shortcut for Real-Time Intelligence, and when Query acceleration for a shortcut is worth it.

If you are slow on any of these, return to the relevant section. The exam gives you many short scenarios, so speed and confident elimination of distractors matter as much as raw knowledge.

A useful mental checklist for any ingestion question is Source → Volume → Latency → Transform → Destination. Identify each in the prompt: where the data comes from, how much there is, how fresh it must be, what reshaping it needs, and where it must land. Those five answers almost always collapse to a single correct tool and language.

For example:

  • Operational SQL source + millions of changed rows + near real time + no reshaping + OneLake resolves to mirroring.
  • Flat files + terabytes + nightly + no reshaping + lakehouse resolves to Copy activity.
  • Kafka + continuous events + sub-second + filter + eventhouse resolves to Eventstream into a KQL database.
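The checklist can be rehearsed as a small rule table. The sketch below is only an illustration of that reasoning, not an official decision tree: the `Scenario` class, the category labels, and the rule order are all assumptions made for this example, and real questions carry more nuance than five string fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical container for the five checklist answers."""
    source: str       # "operational_sql", "files", or "stream"
    volume: str       # "small" or "large"
    latency: str      # "batch", "near_real_time", or "sub_second"
    transform: str    # "none" or "needed"
    destination: str  # carried for completeness; these simplified rules do not branch on it


def pick_tool(s: Scenario) -> str:
    """Return a tool name using a deliberately simplified rule order."""
    # Continuous event sources or sub-second freshness point to Eventstream.
    if s.source == "stream" or s.latency == "sub_second":
        return "Eventstream"
    # An operational database kept fresh with no reshaping points to mirroring.
    if (s.source == "operational_sql"
            and s.latency == "near_real_time"
            and s.transform == "none"):
        return "Mirroring"
    # Scheduled bulk movement with no transforms points to Copy activity.
    if s.transform == "none":
        return "Copy activity"
    # Reshaping at large volume points to Spark; otherwise Power Query is enough.
    return "Spark notebook" if s.volume == "large" else "Dataflow Gen2"


nightly_files = Scenario("files", "large", "batch", "none", "lakehouse")
print(pick_tool(nightly_files))
```

Writing the rules down in order is the useful part of the exercise: each early `return` corresponds to a cue that eliminates every later tool.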

Drill: Tool-Selection Reps

Run these reps and justify each pick out loud:

Scenario | Correct choice
Move 5 TB from blob storage to a lakehouse, no transforms, scheduled nightly | Copy activity in a pipeline
Recurring CDC replication from Azure SQL DB, least effort | Copy job (or mirroring)
Analysts clean and merge CSVs with Power Query before Silver | Dataflow Gen2
SCD Type 2 over a 300M-row dimension | Spark notebook
Capture IoT telemetry and route to KQL + lakehouse | Eventstream
Interactive ad-hoc analytics on billions of log rows | Eventhouse / KQL
Make an existing ADLS Gen2 folder queryable without copying | OneLake shortcut

When you can explain why each alternative is wrong, not just why your pick is right, you are at exam standard. Most DP-700 questions are won by eliminating two plausible-but-wrong tools.

Extend the drill to transformation-language reps. For each store or tool, state its language and the one thing it does best: lakehouse → PySpark/Spark SQL, best for large distributed transforms and ML; warehouse → T-SQL, best for multi-table transactions and stored-procedure logic; Dataflow → M, best for analyst-friendly Power Query cleaning; eventhouse → KQL, best for interactive analytics on telemetry.

Then practice the streaming reps: name the four window types and one requirement each fits — tumbling for non-overlapping period totals, hopping for moving averages, sliding for event-driven emission, session for grouping bursts by an inactivity gap. Being able to recite these without hesitation removes a whole class of avoidable mistakes.
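To make the four window types concrete, the following plain-Python sketch assigns integer timestamps (seconds) to windows. It is a teaching model only: the function names are hypothetical, windows are assumed to start at time 0, and the sliding version simply looks back `size` seconds from each event, which approximates event-driven emission rather than reproducing any engine's exact semantics.

```python
from typing import List, Tuple


def tumbling_window(ts: int, size: int) -> Tuple[int, int]:
    """Each event belongs to exactly one fixed, non-overlapping window."""
    start = (ts // size) * size
    return (start, start + size)


def hopping_windows(ts: int, size: int, hop: int) -> List[Tuple[int, int]]:
    """Windows of length `size` start every `hop`; an event can fall in several."""
    start = (ts // hop) * hop
    windows = []
    while start >= 0 and start > ts - size:  # window still covers ts
        windows.append((start, start + size))
        start -= hop
    return sorted(windows)


def sliding_counts(timestamps: List[int], size: int) -> List[Tuple[int, int]]:
    """For each event, count the events in the `size` seconds ending at it."""
    ordered = sorted(timestamps)
    return [(t, sum(1 for u in ordered if t - size < u <= t)) for t in ordered]


def session_windows(timestamps: List[int], gap: int) -> List[List[int]]:
    """Group events into bursts; a silence longer than `gap` starts a new session."""
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


events = [1, 2, 3, 20, 21, 50]
print(tumbling_window(12, 10))      # (10, 20)
print(hopping_windows(12, 10, 5))   # [(5, 15), (10, 20)]
print(session_windows(events, 5))   # [[1, 2, 3], [20, 21], [50]]
```

The hopping result shows why that type suits moving averages: the event at second 12 contributes to two overlapping windows, whereas the tumbling version counts it once.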

Drill: Optimization and Reliability

The monitor-and-optimize domain overlaps here, so practice the table-tuning levers that keep ingestion healthy:

  • OPTIMIZE / table compaction — merges many small Parquet files into fewer large ones; small-file proliferation from frequent micro-batch loads is the most common cause of slow lakehouse reads.
  • V-Order — Fabric's write-time optimization that improves read performance for Power BI and SQL engines on Delta tables.
  • Partitioning — partition large fact tables on a low-to-moderate-cardinality column that queries commonly filter on (often a date) so partition pruning lets queries scan less data; avoid over-partitioning on a high-cardinality column, which creates tiny files.
  • VACUUM — removes old, unreferenced data files after retention, reclaiming storage; respect the retention window so you do not break time-travel queries that still need older versions.
  • Pipeline retry and monitoring — configure activity retry counts and intervals, set timeouts, and use the Monitor hub and run history to find failed activities, inspect input/output, and re-run from the point of failure rather than restarting the whole pipeline.
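The small-file and over-partitioning points above are easy to check with back-of-the-envelope arithmetic. The sketch below is a rough model under stated assumptions: it treats data as evenly spread across partitions, and the 128 MB target is an illustrative figure chosen for the example, not a documented Fabric default. Both function names are hypothetical.

```python
import math
from typing import List


def average_file_mb(table_gb: float, partitions: int, files_per_partition: int = 1) -> float:
    """Average file size if the table is spread evenly across partitions."""
    return table_gb * 1024 / (partitions * files_per_partition)


def files_after_compaction(file_sizes_mb: List[float], target_mb: float) -> int:
    """Fewest files needed if compaction packs data into files of target_mb."""
    return math.ceil(sum(file_sizes_mb) / target_mb)


# A 100 GB fact table partitioned by day over one year: files stay large.
print(round(average_file_mb(100, 365), 1))        # 280.5

# The same table partitioned by a column with a million distinct values.
print(average_file_mb(100, 1_000_000))            # 0.1024

# 2,000 micro-batch files of 1 MB each, compacted toward 128 MB files.
print(files_after_compaction([1.0] * 2000, 128))  # 16
```

The contrast between roughly 280 MB and 0.1 MB per file is the whole argument for choosing a partition column by how queries filter rather than by how many distinct values it has.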

Finally, rehearse the failure-mode reasoning: a slow load usually means small files (compact), a skipped-rows incremental usually means a bad watermark, and a streaming gap usually means an under-provisioned eventhouse or wrong ingestion mode. Pattern-matching symptoms to root causes is exactly how the optimization questions are framed.
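The bad-watermark symptom can be reproduced in a few lines. The sketch below simulates the Lookup, Copy, and watermark-write steps with in-memory dictionaries; it is not pipeline code, and the `created_at` and `modified_at` column names are assumptions for illustration. It shows one common way a watermark goes wrong: tracking a column that never moves when a row is updated.

```python
from typing import Dict, List

Row = Dict[str, int]


def incremental_load(source: List[Row], target: Dict[int, Row],
                     watermark: int, column: str) -> int:
    """Copy rows whose `column` is newer than `watermark`; return the new mark."""
    # Lookup: find the current high-water mark in the source.
    new_mark = max([watermark] + [row[column] for row in source])
    # Copy: upsert only rows that changed inside the (watermark, new_mark] range.
    for row in source:
        if watermark < row[column] <= new_mark:
            target[row["id"]] = dict(row)
    # Watermark write: the caller persists this value for the next run.
    return new_mark


# State after the first load: both rows already copied, watermark stored as 2.
source = [
    {"id": 1, "created_at": 1, "modified_at": 5, "value": 11},  # updated at t=5
    {"id": 2, "created_at": 2, "modified_at": 2, "value": 20},
    {"id": 3, "created_at": 4, "modified_at": 4, "value": 30},  # inserted at t=4
]
loaded = {1: {"id": 1, "created_at": 1, "modified_at": 1, "value": 10},
          2: {"id": 2, "created_at": 2, "modified_at": 2, "value": 20}}

by_created = {k: dict(v) for k, v in loaded.items()}
incremental_load(source, by_created, watermark=2, column="created_at")

by_modified = {k: dict(v) for k, v in loaded.items()}
incremental_load(source, by_modified, watermark=2, column="modified_at")

print(by_created[1]["value"], by_modified[1]["value"])  # 10 11
```

Both runs pick up the new row with id 3, but only the run keyed on `modified_at` sees the update to id 1, which is the pattern to look for when inserts arrive and updates do not.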

Close your preparation by linking ingestion to the dimensional model it feeds. Gold tables should be denormalized, carry surrogate keys, and implement the correct SCD type, because the downstream semantic model and Power BI reports assume a clean star schema. If a report shows duplicated dimension members, the root cause is usually a missing dedup or a broken surrogate-key assignment upstream in Silver — not the report itself. Tracing a reporting symptom back through Gold, Silver, and Bronze to the offending ingestion step is the highest-order skill this domain tests, and it is what separates a passing score from a guess.
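The dedup and surrogate-key steps mentioned above can be sketched as a minimal SCD Type 2 merge. This is a plain-Python illustration under assumptions, not the Spark or T-SQL a real Silver-to-Gold load would use: the `customer_id`, `city`, and `sk` column names are hypothetical, and only a single tracked attribute is compared.

```python
from typing import List


def apply_scd2(dim: List[dict], incoming: List[dict], load_date: str) -> List[dict]:
    """Merge a batch into a Type 2 dimension, modifying `dim` in place."""
    # Dedup the batch on the business key, keeping the last occurrence.
    latest = {row["customer_id"]: row for row in incoming}
    # Surrogate keys continue from the highest key already assigned.
    next_key = max((row["sk"] for row in dim), default=0) + 1
    current = {row["customer_id"]: row for row in dim if row["is_current"]}

    for customer_id, row in latest.items():
        existing = current.get(customer_id)
        if existing is not None and existing["city"] == row["city"]:
            continue  # nothing changed, so no new version is needed
        if existing is not None:
            # Close the old version instead of overwriting it.
            existing["valid_to"] = load_date
            existing["is_current"] = False
        dim.append({"sk": next_key, "customer_id": customer_id,
                    "city": row["city"], "valid_from": load_date,
                    "valid_to": None, "is_current": True})
        next_key += 1
    return dim


dim = [{"sk": 1, "customer_id": 100, "city": "Oslo",
        "valid_from": "2025-01-01", "valid_to": None, "is_current": True}]
batch = [{"customer_id": 100, "city": "Bergen"},
         {"customer_id": 100, "city": "Bergen"},   # duplicate in the batch
         {"customer_id": 200, "city": "Lima"}]
apply_scd2(dim, batch, "2025-06-01")
print([(r["sk"], r["customer_id"], r["city"], r["is_current"]) for r in dim])
```

Removing the dedup line makes the duplicate batch row harmless here only by accident of the comparison; in a load that appends without comparing, the same duplicate produces two current rows for one customer, which is exactly the duplicated-member symptom a report would show.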

When you can do that reliably across batch and streaming paths, you have met the depth standard for this chapter and are ready for the Ingest-and-transform questions on DP-700.

Test Your Knowledge

A lakehouse table loaded by frequent micro-batches has become slow to query. Which maintenance operation most directly addresses the small-file problem?


When eliminating distractors on a DP-700 tool-selection question, which scenario cue most strongly points away from a Spark notebook?


An incremental load reports missing updated rows even though new inserts appear correctly. What is the most likely root cause to investigate first?


Which combination correctly pairs each medallion layer with a typical Fabric implementation in a lakehouse-plus-warehouse design?
