Which statement about a lakehouse SQL analytics endpoint is correct and commonly tested as a trap?

It is read-only; you cannot write transformed tables through it. The lakehouse SQL analytics endpoint is read-only — it lets you query lakehouse tables with T-SQL but cannot perform write DML. To write transformed tables you use a Spark notebook or a warehouse, which is why scenarios needing T-SQL writes point to a warehouse.

An incremental load uses a watermark on a date column that users can manually edit. What is the likely failure?

Updated rows back-dated below the high-water mark are silently skipped. Because users can back-date the editable column, an updated row can receive a value below the stored high-water mark, so the next run filters it out and never loads it. The fix is a system-maintained, monotonically increasing column such as RowVersion or ModifiedDate.

A 200-million-row Silver-to-Gold transformation requires complex window functions and joins. Why is a Dataflow Gen2 the wrong choice here?

The single-node Mashup engine will hit memory limits, whereas Spark distributes the work. Dataflow Gen2's Mashup engine runs effectively single-node, so a 200-million-row transform with window functions will exhaust memory or run for hours; a Spark notebook distributes the workload across nodes. Dataflows can write to lakehouses, handle batch data, and use Append.

Common Traps in Ingest and transform data | Free Guide 2026

Key Takeaways

Don't force a 200M-row complex transform through a single-node Dataflow Gen2; Spark distributes the load, while the Mashup engine will hit memory limits.
Don't spin up a Spark notebook for a tiny incremental copy; Copy job is cheaper because Spark reserves compute for the whole session.
A watermark column must be monotonically increasing and never back-dated; using a user-editable date silently misses updated rows.
Shortcuts and mirroring virtualize data in place and do not copy it; Copy job and Copy activity physically move data.
Transformation language is tied to the store: PySpark/Spark SQL in lakehouse notebooks, T-SQL in the warehouse, M in Dataflows, KQL in the eventhouse.

Tool Mismatch Traps

The most expensive mistakes pick a valid tool that is wrong for the scale or cost:

Trap: Dataflow Gen2 for huge transforms. Dataflow's Mashup engine runs effectively single-node, so a 200-million-row join with window functions will exhaust memory or run for hours. The correct answer is a Spark notebook, which distributes work across nodes.
Trap: Spark notebook for a tiny incremental copy. A Spark session reserves roughly 4-8 capacity units per second for its whole lifetime, so it costs far more than a Copy job for a small recurring load. "Least cost / least effort" incremental copy is almost always Copy job.
Trap: Copy activity for transformation. Copy activity is EL movement with basic mapping only; it cannot join, aggregate, or apply business logic. If the scenario needs transformation, you need a Dataflow or notebook, not Copy.

Read every scenario for the quantitative and cost cues — row counts, frequency, and "least effort" — before choosing.

Trap: pipeline when the answer is orchestration of a notebook. A pipeline is not itself a transformation engine; it orchestrates. If the work is a complex transform, the right design is often a pipeline that runs a notebook activity on a schedule, not the pipeline doing the transform alone. Questions that mention scheduling plus heavy transformation usually want both: pipeline for orchestration, notebook for logic.
Trap: ignoring that Copy job and mirroring overlap. Both can keep a destination in sync with a changing source. Mirroring is the lowest-effort choice for continuously replicating a whole operational database into OneLake; Copy job is the choice when you want explicit control over which tables and which incremental strategy. If the scenario says "mirror the entire Azure SQL database in near real time," choose mirroring.

Incremental-Load and Watermark Traps

Incremental loading fails silently when the change-tracking design is weak:

Trap	Why it fails	Fix
Watermark on a user-editable date	Users can back-date rows, so updates land below the high-water mark and are skipped	Use a system-maintained `RowVersion`/`ModifiedDate`
Assuming Copy activity tracks state	Copy activity has no built-in state; the watermark must be stored and read by you	Use a Lookup + write-back, or switch to Copy job
Replace destination on an incremental flow	Replace truncates the table, discarding prior loads	Use Append for incremental, Replace only for full refresh
Ignoring late-arriving data	Late events miss their window and skew aggregates	Use watermarking/late-data handling in structured streaming

Copy job is the safe answer when the question stresses minimal effort because it tracks last-run state and supports both watermark- and CDC-based incremental copy automatically.

A subtler trap is confusing append-only loads with deduplication. Append in a Dataflow or a plain INSERT in Spark will faithfully add duplicate rows if the source re-sends them; only a MERGE/upsert keyed on the business key produces an idempotent load. If a scenario says "re-running the load must not create duplicates," Append alone is wrong — you need MERGE logic, which in turn points to a notebook or warehouse stored procedure rather than a Copy or simple Dataflow append.

Virtualize-vs-Copy and Language-vs-Store Traps

Two conceptual confusions are heavily tested:

Virtualize versus copy. Shortcuts and mirroring make external data appear in OneLake without duplicating storage; Copy job and Copy activity physically move bytes. When a question asks for the "least storage" or "single copy of truth" option over existing ADLS/S3 data, the answer is a shortcut, not a copy.

Transformation language is bound to the store. You cannot mix languages freely:

Lakehouse → PySpark, Scala, or Spark SQL in a notebook (also T-SQL read-only via the SQL analytics endpoint).
Warehouse → T-SQL for full read/write DML and multi-table transactions.
Dataflow Gen2 → M (Power Query).
Eventhouse / KQL DB → KQL.

A distractor that proposes "transform warehouse data with PySpark" or "run KQL against a lakehouse table" is wrong because the language does not match the store. The SQL analytics endpoint over a lakehouse is read-only — you cannot use it to write transformed tables; use a notebook or warehouse for writes.

More language-and-store traps to memorize

"Use a notebook to write to the warehouse." Spark notebooks read and write lakehouse Delta tables; the warehouse is written with T-SQL (or a Copy/Dataflow that targets it). Mixing these is a classic wrong answer.
"Query the eventhouse with T-SQL." Eventhouse/KQL databases use KQL; there is no T-SQL DML against them. If the requirement is interactive analytics over high-volume telemetry, the language is KQL and the store is the eventhouse.
"A shortcut copies the data on a schedule." Shortcuts never copy; they read in place. If the requirement needs a physical, point-in-time copy (for example, to freeze a snapshot), a shortcut is wrong and a Copy job or activity is right.
"Replace the warehouse with a lakehouse for fine-grained T-SQL security." When the scenario stresses multi-table transactions, stored procedures, or rich T-SQL security, the warehouse is the intended store even though both persist Delta in OneLake.

Finally, do not assume Preview features are off-limits — Microsoft states the exam may include commonly used Preview features. But when a GA tool and a Preview tool both fit, the GA tool is the safer pick unless the question explicitly favors the newer capability.

DP-700 Study Guide

Azure DP-700

3.4 Common Traps in Ingest and transform data

Key Takeaways

Tool Mismatch Traps

Incremental-Load and Watermark Traps

Virtualize-vs-Copy and Language-vs-Store Traps

More language-and-store traps to memorize

DP-700 Study Guide

1Chapter 1: DP-700 Orientation and Exam Strategy

2Chapter 2: Implement and manage an analytics solution

3Chapter 3: Ingest and transform data

4Chapter 4: Monitor and optimize an analytics solution

5Chapter 5: Final Review and Test Day

Azure DP-700

3.4 Common Traps in Ingest and transform data

Key Takeaways

Tool Mismatch Traps

Incremental-Load and Watermark Traps

Virtualize-vs-Copy and Language-vs-Store Traps

More language-and-store traps to memorize