4.2 Core Workflows and Decision Points
Key Takeaways
- Pipeline run detail shows per-activity status, duration, input/output, and rerun-from-failed-activity options.
- Semantic model refreshes can be monitored from refresh history, the Monitoring hub, or a pipeline's Semantic model refresh activity.
- Spark jobs surface the full Spark UI (stages, tasks, DAG, executors) plus driver/executor log downloads in the monitor.
- Fabric Activator alerts are condition-based reflexes: define the event/object, the condition, and the action (email, Teams, Power Automate).
- Failure alerts configured on pipelines and semantic model refreshes notify teams automatically, so nobody has to watch logs manually.
Monitoring each item type
Pipelines. From the Monitoring hub or the pipeline canvas you open a run to see each activity's status, duration, and input/output. You can filter runs by status, time range, and pipeline name. When a run fails you can rerun from the failed activity rather than re-executing the whole pipeline, which saves capacity units (CUs) and time. Each failed activity exposes an error code and message, so you can see exactly which activity raised the error and why.
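The effect of rerunning from the failed activity can be pictured with a small sketch. It assumes a strictly linear pipeline and uses hypothetical names (`ActivityRun`, `activities_to_rerun`) and status strings; it is not a Fabric API, and real pipelines can branch.

```python
from dataclasses import dataclass


@dataclass
class ActivityRun:
    """Hypothetical record of one activity in a pipeline run."""
    name: str
    status: str  # assumed values: "Succeeded", "Failed", "NotRun"


def activities_to_rerun(run: list[ActivityRun]) -> list[str]:
    """Return the activities that execute again when resuming from the first failure."""
    for index, activity in enumerate(run):
        if activity.status == "Failed":
            # Everything before the failure is kept; the rest runs again.
            return [a.name for a in run[index:]]
    return []  # nothing failed, so nothing needs to rerun


# Ten activities, the eighth one failed and the last two never started.
run = [ActivityRun(f"activity_{i}", "Succeeded") for i in range(1, 8)]
run.append(ActivityRun("activity_8", "Failed"))
run += [ActivityRun(f"activity_{i}", "NotRun") for i in (9, 10)]

print(activities_to_rerun(run))
```

In this sketch the seven successful activities are skipped on the rerun, which is where the saving in capacity and time comes from.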
Dataflows Gen2. Refresh activity appears in the Monitoring hub, including refreshes triggered by a pipeline. A failing dataflow usually points to a changed or removed source column, a credential/gateway issue, or a transformation that no longer matches the schema.
Semantic model refreshes. There are three monitoring paths: the model's Refresh history, the Monitoring hub, and a pipeline's Semantic model refresh activity. The refresh activity returns an error if you trigger a refresh while one is already in progress, so schedule and orchestrate refreshes of the same model so that they do not overlap.
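One way to reason about non-overlap is to check for a running refresh before triggering another. The sketch below is illustrative only: the history entries, the `"InProgress"` status value, and both function names are assumptions, not the shape of any Fabric or Power BI response.

```python
def refresh_in_progress(history: list[dict]) -> bool:
    """True if any entry in the (hypothetical) refresh history is still running."""
    return any(entry.get("status") == "InProgress" for entry in history)


def should_trigger_refresh(history: list[dict]) -> bool:
    """Only start a new refresh when no other refresh is running."""
    return not refresh_in_progress(history)


# Illustrative history: one finished refresh and one still running.
history = [
    {"id": 1, "status": "Completed"},
    {"id": 2, "status": "InProgress"},
]

print(should_trigger_refresh(history))
```

A scheduler or pipeline that applies a check like this would skip or delay the second refresh instead of letting the activity fail.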
Spark, notebooks, and eventstreams
Spark jobs and notebooks. Selecting a Spark application in the monitor opens the full Spark UI experience — stages, tasks, the directed acyclic graph (DAG), executors, and SQL plans — plus Logs and Event Logs tabs to download driver and executor logs. For Structured Streaming you watch Input Rate, Process Rate, and Batch Duration to confirm the stream keeps up. A failed notebook is diagnosed from the cell-level error and the executor logs.
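The "keeps up" check on those three metrics can be expressed as a simple rule. This is a minimal sketch under stated assumptions: the `BatchMetrics` fields, the function name, and the sample values are hypothetical, and the trigger interval is taken as the budget each batch must fit into.

```python
from dataclasses import dataclass


@dataclass
class BatchMetrics:
    """Hypothetical sample of the three streaming metrics for one micro-batch."""
    input_rows_per_second: float
    processed_rows_per_second: float
    batch_duration_seconds: float


def stream_keeps_up(batches: list[BatchMetrics], trigger_interval_seconds: float) -> bool:
    """True when every sampled batch processes at least as fast as data arrives
    and finishes within the trigger interval."""
    if not batches:
        raise ValueError("need at least one batch sample")
    return all(
        b.processed_rows_per_second >= b.input_rows_per_second
        and b.batch_duration_seconds <= trigger_interval_seconds
        for b in batches
    )


healthy = [BatchMetrics(1000.0, 1500.0, 4.0), BatchMetrics(1200.0, 1400.0, 6.0)]
lagging = [BatchMetrics(1000.0, 1500.0, 4.0), BatchMetrics(2000.0, 900.0, 14.0)]

print(stream_keeps_up(healthy, trigger_interval_seconds=10.0))
print(stream_keeps_up(lagging, trigger_interval_seconds=10.0))
```

A process rate that stays below the input rate, or batch durations that exceed the trigger interval, both indicate a growing backlog.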
Eventstreams. Eventstreams run continuously, so monitoring focuses on throughput and errors rather than a discrete run. Workspace monitoring auto-creates purpose-built tables in the monitoring Eventhouse capturing eventstream health, performance, and error data, queryable with KQL.
Eventhouses / KQL databases. The eventhouse System overview reports storage usage, compute usage, ingestion rates, query counts, errors, duration, and cache misses per database.
Turning signals into alerts with Fabric Activator
Fabric Activator (formerly Data Activator / Reflex) is the no-code engine that watches data and takes action when a condition is met. A reflex binds three things: the object/event to monitor (a Power BI visual measure, an eventstream event, a Real-Time hub stream, or an item run state), the condition (threshold crossed, value changed, refresh failed), and the action (email, Teams message, or a Power Automate flow).
The most exam-relevant pattern is operational alerting: configure an alert on failure for a pipeline or a semantic-model refresh so the on-call team is notified automatically instead of watching the Monitoring hub. For Power BI visual triggers, evaluation frequency follows the underlying semantic model's refresh schedule, so an alert is only as timely as the data behind it.
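The object, condition, and action binding can be modeled in a few lines. Activator itself is configured without code, so this is only a conceptual sketch: `AlertRule`, the event dictionary shape, the status strings, and the model name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AlertRule:
    """Hypothetical stand-in for a rule: what to watch, when to fire, what to do."""
    monitored_object: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], str]

    def evaluate(self, event: dict) -> Optional[str]:
        # Ignore events that belong to a different object.
        if event.get("object") != self.monitored_object:
            return None
        # Fire the action only when the condition holds.
        return self.action(event) if self.condition(event) else None


# Operational alert: notify when the refresh of one specific model fails.
refresh_failed = AlertRule(
    monitored_object="Sales model",
    condition=lambda event: event.get("status") == "Failed",
    action=lambda event: f"Teams message: refresh of {event['object']} failed",
)

print(refresh_failed.evaluate({"object": "Sales model", "status": "Failed"}))
print(refresh_failed.evaluate({"object": "Sales model", "status": "Completed"}))
```

Keeping the three parts separate mirrors how an alert is defined: changing the action from Teams to email does not touch the object or the condition.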
| Need | Correct workflow |
|---|---|
| Notify team when any pipeline fails | Activator alert (reflex) on pipeline failure / set-on-failure alert |
| See why one pipeline run failed | Open that run, read the failed activity's error |
| Re-run only the failed steps | Rerun from failed activity |
| Inspect a slow Spark stage | Open the Spark UI, examine stages/DAG/executors |
| Confirm a stream keeps up | Watch Input/Process Rate and Batch Duration |
Error resolution workflows by item type
Resolving errors follows a consistent pattern: locate the failure surface, read the specific error, and trace it to a root cause. Each item type exposes a different diagnostic path.
- Pipeline / Copy activity errors. Open the failed run, read the activity's error code and message, and check the source/sink connection, the dataset schema, and any expression that evaluates to null. For controlled failures, a Fail activity can stop the pipeline with a custom error code and message when a validation step detects bad data.
- Dataflow Gen2 errors. Inspect the failing step in the query editor; the most common cause is a removed or renamed source column that breaks a downstream transformation, followed by credential/gateway problems.
- Notebook / Spark errors. Read the cell-level exception, then download driver and executor logs from the Spark monitor; out-of-memory and skew issues appear as failing tasks in specific stages.
- T-SQL / Warehouse errors. Read the SQL error number and message; many trace to data-type mismatches, missing objects, or permissions. Use dynamic management views (DMVs) to inspect requests that are currently running, and query insights to find the offending query in the historical record.
- Eventhouse / KQL errors. Check the eventhouse System overview for ingestion errors and cache misses, and inspect the ingestion result for mapping or format failures.
Validation gating
A frequently tested pattern is defensive pipeline design: run a validation/lookup step, branch on the result, and use a Fail activity to halt with a meaningful error code if the data is invalid — preventing bad data from propagating downstream. This is preferred over letting a later activity fail with an obscure message.
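The validate, branch, and fail sequence can be sketched in plain Python. This only mirrors the control flow; in a pipeline the same roles are played by a lookup or validation step, a conditional branch, and a Fail activity. The exception class, the error code string, and the row-count rule are assumptions chosen for illustration.

```python
class PipelineFailure(Exception):
    """Hypothetical analogue of a Fail activity: carries a custom code and message."""

    def __init__(self, error_code: str, message: str) -> None:
        super().__init__(f"{error_code}: {message}")
        self.error_code = error_code
        self.message = message


def validate_row_count(row_count: int, minimum_rows: int) -> bool:
    """Validation step: the source must contain at least the expected number of rows."""
    return row_count >= minimum_rows


def run_pipeline(row_count: int, minimum_rows: int = 1) -> str:
    # Branch on the validation result before any downstream work starts.
    if not validate_row_count(row_count, minimum_rows):
        raise PipelineFailure(
            "VALIDATION_TOO_FEW_ROWS",
            f"expected at least {minimum_rows} rows, found {row_count}",
        )
    return "downstream activities ran"


print(run_pipeline(row_count=250))

try:
    run_pipeline(row_count=0)
except PipelineFailure as failure:
    print(failure.error_code, "-", failure.message)
```

Because the failure is raised at the gate, the error names the real problem (too few rows) instead of whatever a later step would have reported.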
Reading the failed activity, not guessing
The discipline the exam rewards is reading the actual error before acting. A stem that says a pipeline 'keeps failing' is inviting you to open the run, find the first failed activity, and read its message — not to immediately scale the capacity or rebuild the pipeline. The same applies to dataflows (open the failing step), notebooks (read the cell exception and executor logs), and warehouse queries (read the SQL error number). Distractors frequently offer a plausible but premature remediation; the better answer almost always begins with inspecting the specific error so the fix targets the true cause rather than a symptom.
A pipeline should stop immediately with a custom error code and message when a validation step detects bad data, so downstream activities never run on invalid input. Which activity should you use?
A pipeline with ten activities fails on activity eight. You fix the underlying data issue and want to resume without re-running the seven activities that already succeeded. What is the most appropriate action?
You want on-call engineers to be notified automatically by Teams whenever a specific semantic model's scheduled refresh fails, without anyone watching the Monitoring hub. Which design fits best?