3.1 Ingest and transform data
Key Takeaways
- The Ingest and transform data domain accounts for 30-35% of the DP-700 blueprint.
- The domain should be studied as job tasks, not a list of definitions.
- Questions often ask which action, control, data element, or workflow step is most appropriate.
- Use domain weight and practice misses to decide how much review time this area needs.
Overview
Ingest and transform data is the DP-700 blueprint domain covering batch and streaming ingestion, loading patterns, shortcuts, mirroring, transformations with Power Query, PySpark, SQL, and KQL, and dimensional-model preparation.
Official baseline
Use the current official materials before relying on secondary summaries. Primary source: Microsoft Certified: Fabric Data Engineer Associate. Also compare the official content outline, candidate guide, and scheduling resources when policies affect eligibility, fees, timing, or retakes.
Study notes
Ingest and transform data is weighted at 30-35% of the exam. The official description covers batch and streaming ingestion, loading patterns, shortcuts, mirroring, transformations with Power Query, PySpark, SQL, and KQL, and dimensional-model preparation.
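To make that description concrete, here is a minimal PySpark sketch of a batch ingest-and-transform step as it might run in a Fabric notebook. The landing path `Files/landing/sales/` and the table name `sales_clean` are hypothetical examples, and `spark` is the session object a Fabric notebook provides.

```python
# Minimal batch ingest-and-transform sketch for a Fabric notebook.
# The landing path and table name are hypothetical examples;
# `spark` is predefined in Fabric notebooks.
from pyspark.sql import functions as F

# Ingest: read the raw batch from the lakehouse Files area.
raw = spark.read.option("header", True).csv("Files/landing/sales/")

# Transform: cast types and remove duplicate orders.
clean = (
    raw.withColumn("order_date", F.to_date("order_date"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .dropDuplicates(["order_id"])
)

# Load: write the result as a managed Delta table.
clean.write.mode("overwrite").format("delta").saveAsTable("sales_clean")
```

On the exam, the decision is rarely about syntax; it is about whether a notebook, a Dataflow Gen2, or a pipeline Copy activity is the right tool for the stated source, volume, and skill set.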
For test prep, convert the domain into actions. Ask: which ingestion tool, transformation language, loading pattern, or orchestration step would a competent Fabric data engineer choose in the stated scenario?
| High-yield cue | How to use it |
|---|---|
| Loading patterns | Decide between full load, incremental upsert, and append based on how the stem says data should land (see the merge sketch below). |
| Streaming processing | Map real-time requirements to eventstreams and KQL transformations rather than to batch pipelines. |
| Batch ingestion | Match the source and volume to the right tool: pipeline Copy activity, Dataflow Gen2, or COPY INTO. |
| Batch transformation | Choose among Power Query, notebook code (PySpark or Spark SQL), and T-SQL based on the skills and scale the stem describes. |
| Streaming storage | Recognize when an eventhouse or KQL database is the right landing zone for high-velocity event data. |
| Orchestration | Sequence ingestion and transformation with pipelines, schedules, and triggers, accounting for dependencies and retries. |
Do not study this domain only by rereading notes. Build small scenarios and ask what the role should do next. The exam is more likely to test a practical decision than a pure definition.
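For instance, a common scenario behind the loading-patterns cue is an incremental upsert: only changed rows should be applied, without a full reload. A minimal sketch using the Delta Lake MERGE API, assuming a staged table `stg_customer_changes` and an existing Delta table `dim_customer` (both hypothetical names):

```python
# Incremental (upsert) loading pattern with Delta Lake MERGE.
# Table names are hypothetical; `spark` is predefined in Fabric notebooks.
from delta.tables import DeltaTable

updates = spark.read.table("stg_customer_changes")  # staged batch of changes
target = DeltaTable.forName(spark, "dim_customer")  # existing dimension table

(
    target.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()      # apply changes to existing customers
    .whenNotMatchedInsertAll()   # insert customers seen for the first time
    .execute()
)
```

If a stem instead says history must be preserved, the answer shifts from an upsert to a slowly changing dimension pattern, which is why the cue, not the tool, should drive the choice.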
Exam-ready mental model
For this section, reduce the material to a repeatable model: cue, authority, action, evidence, and risk. The cue tells you why the question is being asked. The authority is the rule, policy, standard, configuration behavior, official guideline, or operational constraint. The action is what the professional should do next. The evidence is the data point, document, log, calculation, or system state that supports the answer. The risk is what goes wrong if you choose the shortcut.
When reviewing, force yourself to state that model out loud for missed questions. If you can only remember a definition but cannot connect it to an action, the material is not yet exam-ready. If you can name the action but not the authority, you may choose an answer that sounds operationally convenient but violates the official process. If you can name the rule but not the evidence, you may overapply it to the wrong scenario.
How this appears on the exam
The exam usually tests applied judgment. Read the stem for the role, the setting, the governing rule, and the immediate task. Then choose the answer that is most accurate, policy-aligned, and complete for that task. If an answer sounds familiar but ignores the specific cue in the stem, treat it as a distractor. If two answers seem possible, prefer the one that is more specific to the stated task and leaves the cleanest audit trail.
Error-log rule
After each missed question in this area, write one sentence that starts with "I missed this because". Good categories are misread cue, did not know rule, wrong sequence, calculation error, overgeneralized policy, or chose the faster but less defensible action. Add a second sentence that starts with "Next time I will look for". That second sentence turns the miss into a concrete cue you can recognize later.
Practice questions
- You want Azure SQL data to stay continuously available in OneLake with low-latency replication and without building a custom ETL process. Which Fabric feature should you use? (Answer: mirroring.)
- You need to query external Delta data in Fabric without copying or staging it into a new storage location. Which feature is the best fit? (Answer: a OneLake shortcut.)
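To ground the second question: a OneLake shortcut surfaces external Delta data inside the lakehouse, so a notebook can query it in place with no copy or staging step. A minimal sketch, assuming the shortcut appears as a table named `external_sales` (hypothetical):

```python
# Query external Delta data through a OneLake shortcut: no copy, no staging.
# The table name external_sales is a hypothetical shortcut name;
# `spark` is predefined in Fabric notebooks.
df = spark.sql(
    "SELECT region, SUM(amount) AS total_amount "
    "FROM external_sales GROUP BY region"
)
df.show()
```

The first question points to mirroring instead, because the requirement is continuous, low-latency replication of an operational database into OneLake rather than in-place access to existing Delta files.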