You want a cleanup task that sends an alert and tears down resources whenever an upstream ETL task FAILS, but is skipped when it succeeds. Which Run if condition fits?

At least one failed. The 'At least one failed' condition runs the downstream task only when an upstream dependency fails, which is exactly the trigger you want for an alert/cleanup branch. 'All succeeded' would never fire on failure, and 'All done' would also run on success.

What schedule does the cron expression '0 0 2 * * ?' represent, and what does the UI schedule compile to?

2:00 AM daily; the UI schedule compiles to a Quartz cron expression. In '0 0 2 * * ?' the fields are seconds, minutes, hours, day of month, month, and day of week, so the expression means 2:00 AM every day. Lakeflow Jobs also lets you set the timezone so the schedule respects daylight-saving boundaries.

Databricks Workflows and Lakeflow Jobs — Free Study Guide 2026

Key Takeaways

Lakeflow Jobs orchestrate work using three concepts: jobs (the container), tasks (units of work shown as a DAG), and triggers (what starts the job).
Tasks declare upstream dependencies and use the 'Run if' condition (e.g. All succeeded, At least one succeeded, All done) to control conditional and error-handling branches.
Triggers can be scheduled (cron/quartz), continuous, file-arrival, or table-update; classic jobs default to no task retries while serverless jobs auto-optimize retries.
Repair run reruns only the failed tasks and their downstream tasks in a multi-task job, reusing the output of tasks that already succeeded instead of rerunning the whole job.
Job clusters are created for the run and terminated at the end, making them cheaper and more isolated than shared all-purpose clusters.

Jobs, Tasks, and Triggers

Lakeflow Jobs (the orchestrator formerly branded Databricks Workflows) is the native scheduler for the Databricks Data Intelligence Platform. The exam expects you to know three building blocks:

A job is the top-level container that coordinates, schedules, and runs work. A job can be a single notebook or hundreds of tasks with branching logic.
A task is one unit of work — a notebook, Python script, SQL file, Lakeflow Declarative Pipeline, dbt project, JAR, or another job (Run Job task). Tasks are arranged as a Directed Acyclic Graph (DAG), meaning dependencies flow one direction and cannot cycle back.
A trigger is the mechanism that starts a run. It can be time-based (a schedule) or event-based (file arrival in cloud storage, or a Delta table update).

Because the task graph is a DAG, Databricks can run independent tasks in parallel and holds a task back only until its upstream dependencies finish.
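To make the parallelism concrete, here is a minimal sketch that groups a task graph into "waves" of tasks whose dependencies are already met. The task names are hypothetical, and the wave model is a simplification: a real scheduler starts each task as soon as its own upstream tasks finish rather than waiting for a whole wave.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key maps to the set of tasks it depends on.
depends_on = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "join_tables": {"ingest_orders", "ingest_customers"},
    "publish_report": {"join_tables"},
}


def parallel_waves(graph):
    """Group tasks into waves; every task in a wave has all dependencies met."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks that could start right now
        waves.append(ready)
        sorter.done(*ready)  # mark them finished so dependents become ready
    return waves


waves = parallel_waves(depends_on)
print(waves)
```

The two ingest tasks land in the first wave because neither depends on the other, which is the situation where independent tasks can run side by side.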

Dependencies and the Run-if Condition

Each task lists its upstream tasks in the Depends on field. By default a task runs only when all of its upstream dependencies succeed, but the Run if setting changes this so you can build error-handling and fan-in branches:

Run if value	Task runs when…
All succeeded (default)	every upstream dependency succeeded
At least one succeeded	one or more upstream tasks succeeded
None failed	no upstream failed (skipped is OK)
All done	every upstream finished, regardless of outcome
At least one failed	one or more upstream failed (cleanup/alert branch)
All failed	every upstream dependency failed

The All done and At least one failed values are how you wire a notification or rollback task that should fire even when an earlier task breaks. Lakeflow Jobs also supports If/else condition tasks and For each tasks to loop a task over a parameter list.
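The table can be read as a small decision function. The sketch below is a simplified model that follows the table's wording, not Databricks' implementation. The outcome strings and the function name should_run are assumptions for illustration, and edge cases such as every upstream task being skipped are not modeled.

```python
# Simplified model of the Run if conditions in the table above.
# Upstream outcomes are plain strings: "succeeded", "failed", or "skipped".


def should_run(run_if, upstream):
    """Return True if a task with this Run if value would run."""
    succeeded = sum(state == "succeeded" for state in upstream)
    failed = sum(state == "failed" for state in upstream)
    total = len(upstream)
    rules = {
        "All succeeded": succeeded == total,
        "At least one succeeded": succeeded >= 1,
        "None failed": failed == 0,  # skipped upstream tasks are tolerated
        "All done": True,  # evaluated only once every upstream has finished
        "At least one failed": failed >= 1,
        "All failed": failed == total,
    }
    return rules[run_if]


# A cleanup task that depends on a single ETL task:
print(should_run("At least one failed", ["failed"]))     # cleanup fires
print(should_run("At least one failed", ["succeeded"]))  # cleanup is skipped
print(should_run("All done", ["succeeded"]))             # would also run on success
```

This mirrors the reasoning for an alert or cleanup branch: At least one failed fires only on failure, while All done fires either way.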

Triggers, Schedules, and Cron

A job's trigger type determines when it starts:

Scheduled — a recurring time-based trigger. The UI builds the schedule, but it compiles to a quartz cron expression (for example 0 0 2 * * ? for 2:00 AM daily). You also pick a timezone, which matters for daylight-saving correctness.
Continuous — keeps one run active at all times, restarting automatically; used for always-on streaming. Continuous jobs cannot use a normal retry policy.
File arrival — polls a Unity Catalog external location or volume path and starts a run when new files land, lowering latency versus a fixed schedule.
Table update — triggers when a monitored Delta table changes.
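For the Scheduled trigger, it can help to label the fields of the Quartz expression. The sketch below splits an expression into named fields and builds a schedule block shaped like the Jobs API schedule setting. The helper name label_quartz_fields and the timezone value are assumptions chosen for illustration, and the helper only labels fields; it does not validate them.

```python
# Quartz cron fields, in order. The seventh field (year) is optional.
QUARTZ_FIELDS = [
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
]


def label_quartz_fields(expression):
    """Map each whitespace-separated part of a Quartz expression to its field name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))


fields = label_quartz_fields("0 0 2 * * ?")
print(fields["hours"], fields["minutes"], fields["seconds"])

# Schedule block modeled on the Jobs API; the timezone is an example value.
schedule = {
    "quartz_cron_expression": "0 0 2 * * ?",
    "timezone_id": "America/New_York",
    "pause_status": "UNPAUSED",
}
```

Reading the fields left to right (second 0, minute 0, hour 2, every day of the month, every month, no specific day of the week) gives 2:00 AM daily in the chosen timezone.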

If a scheduled run is still going when the next trigger fires, the concurrent runs limit and queueing settings decide whether the new run waits, is skipped, or runs in parallel.

Clusters, Retries, Notifications, and Repair

Compute choices. A task can run on serverless compute, a dedicated job cluster, or an existing all-purpose cluster. A job cluster spins up for the run and terminates when it ends — cheaper and more isolated, and billed at the lower Jobs Compute DBU rate. An all-purpose cluster stays running and is meant for interactive notebooks; reusing it for scheduled jobs costs more and risks resource contention.

Retries. Per task you set max_retries, a minimum retry interval between attempts (min_retry_interval_millis in the Jobs API), and whether to retry on timeout. On classic compute the default is no retries; serverless jobs auto-optimize retries. A timeout marks a task Timed Out, and if you set both a timeout and retries, the timeout applies to each attempt separately.
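Because the timeout applies to each attempt, the worst-case duration of a task grows with the retry count. The following sketch works through that arithmetic. The settings dictionary uses field names as understood from the Jobs API task settings (verify them against the API reference before relying on them), the numbers are made up, and the calculation assumes retry on timeout is enabled and every attempt runs to its timeout.

```python
def worst_case_seconds(timeout_seconds, max_retries, min_retry_interval_millis):
    """Upper bound on task duration when every attempt hits the timeout."""
    attempts = max_retries + 1  # the first attempt plus each retry
    waiting = max_retries * min_retry_interval_millis / 1000  # pauses between attempts
    return attempts * timeout_seconds + waiting


# Hypothetical retry settings for one task.
retry_settings = {
    "max_retries": 2,
    "min_retry_interval_millis": 60_000,
    "retry_on_timeout": True,
    "timeout_seconds": 600,
}

worst = worst_case_seconds(
    retry_settings["timeout_seconds"],
    retry_settings["max_retries"],
    retry_settings["min_retry_interval_millis"],
)
print(worst)  # 3 attempts * 600 s + 2 waits * 60 s = 1920.0
```

A 10-minute timeout with two retries can therefore hold a task for up to 32 minutes, which is worth checking against the job's schedule interval.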

Notifications send email, Slack, or webhook alerts on start, success, failure, or duration-exceeded.

Repair run is an exam favorite: for a multi-task job, repair reruns only the failed tasks and their downstream tasks, reusing the output of tasks that already succeeded, which is far cheaper than rerunning everything.
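The set of tasks a repair would touch can be sketched as a graph walk: start from the failed tasks and follow dependency edges downstream. This is a simplified model of the behavior described above, not Databricks code. The function name tasks_to_repair and the task names are hypothetical, and every task is assumed to appear as a key in the dependency map.

```python
def tasks_to_repair(depends_on, failed):
    """Return the failed tasks plus everything downstream of them."""
    # Invert the "depends on" edges so we can walk from a task to its dependents.
    downstream = {task: set() for task in depends_on}
    for task, upstreams in depends_on.items():
        for upstream in upstreams:
            downstream[upstream].add(task)

    to_rerun = set()
    stack = list(failed)
    while stack:
        task = stack.pop()
        if task in to_rerun:
            continue  # already visited via another path
        to_rerun.add(task)
        stack.extend(downstream[task])
    return to_rerun


# Hypothetical five-task chain t1 -> t2 -> t3 -> t4 -> t5 where t3 failed.
chain = {f"t{i}": ({f"t{i - 1}"} if i > 1 else set()) for i in range(1, 6)}
rerun = tasks_to_repair(chain, {"t3"})
print(sorted(rerun))  # ['t3', 't4', 't5']
```

Tasks t1 and t2 stay out of the rerun set, which is where the cost saving comes from.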

Parameters, Git, and Task Values

Lakeflow Jobs supports job parameters — key/value pairs defined at the job level and automatically pushed down to every task, so one parameter (for example run_date) drives the whole DAG. Tasks read them with dbutils.widgets.get(...) in a notebook or as named arguments for a Python/SQL task. You can override parameters at trigger time, which is how the same job runs a backfill for an arbitrary date.
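The override behavior can be illustrated with plain dictionaries. In this sketch the list of name/default pairs is modeled on the job-level parameter definitions, and the function name effective_parameters, the parameter names, and the dates are all hypothetical. It shows only the merge logic; inside a notebook task the resolved value would be read with dbutils.widgets.get(...).

```python
def effective_parameters(job_parameters, overrides=None):
    """Start from the job-level defaults, then apply trigger-time overrides."""
    resolved = {param["name"]: param["default"] for param in job_parameters}
    resolved.update(overrides or {})
    return resolved


# Hypothetical job-level parameters with their defaults.
job_parameters = [
    {"name": "run_date", "default": "2026-01-01"},
    {"name": "env", "default": "prod"},
]

scheduled = effective_parameters(job_parameters)
backfill = effective_parameters(job_parameters, {"run_date": "2025-11-15"})
print(scheduled)
print(backfill)
```

The backfill run changes only run_date; every task in the DAG still sees the same env value, which is the point of defining parameters once at the job level.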

Git integration lets a job run notebooks or scripts directly from a remote Git repository (a specific branch, tag, or commit) instead of the workspace, so production runs exactly the reviewed code. This is the bridge between Jobs and the source-controlled, Asset-Bundle workflow.

Tasks can also pass small results downstream with task values (dbutils.jobs.taskValues.set/get), letting a control task compute a value (such as a row count) that a later If/else condition branches on.
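Since dbutils exists only inside a Databricks runtime, the sketch below uses a small in-memory stand-in to show the set/get pattern locally. The class InMemoryTaskValues, the task key count_rows, and the row count are hypothetical; in a real notebook you would call dbutils.jobs.taskValues.set(key=..., value=...) upstream and dbutils.jobs.taskValues.get(taskKey=..., key=..., default=...) downstream.

```python
class InMemoryTaskValues:
    """Local stand-in for dbutils.jobs.taskValues; NOT the Databricks implementation."""

    def __init__(self):
        self._store = {}
        self.current_task = None  # which task is "running" in this simulation

    def set(self, key, value):
        # Values are stored under the task that set them.
        self._store[(self.current_task, key)] = value

    def get(self, taskKey, key, default=None):
        # Downstream tasks name the upstream task they read from.
        return self._store.get((taskKey, key), default)


task_values = InMemoryTaskValues()

# Upstream control task computes a small result and publishes it.
task_values.current_task = "count_rows"
task_values.set(key="row_count", value=1250)

# Downstream logic reads the value and picks a branch, as an If/else condition would.
row_count = task_values.get(taskKey="count_rows", key="row_count", default=0)
branch = "load" if row_count > 0 else "skip_load"
print(branch)
```

Task values are meant for small results like counts or flags; larger data should move between tasks through tables or files.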

Queueing and Concurrency

When runs overlap, two settings govern behavior: max concurrent runs caps how many runs of the same job execute at once, and queueing holds a triggered run until a slot frees instead of skipping it. For most ETL you set concurrent runs to 1 and enable queueing so a slow run does not drop the next scheduled load. Note that a continuous job ignores schedules and queueing because it always keeps one run alive, restarting on completion or failure. Choosing the right trigger, concurrency, and retry combination per job is the difference between a pipeline that quietly recovers and one that pages an engineer at 3 AM.
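The interaction between the two settings reduces to a short decision rule. The sketch below is a simplified model of the behavior described above, with a hypothetical function name on_trigger. It ignores continuous jobs, which always keep one run alive.

```python
def on_trigger(active_runs, max_concurrent_runs, queue_enabled):
    """Decide what happens to a newly triggered run of a job."""
    if active_runs < max_concurrent_runs:
        return "run"  # a slot is free, so the run starts immediately
    # No free slot: queueing holds the run, otherwise it is skipped.
    return "queue" if queue_enabled else "skip"


# Typical ETL setup: one run at a time, queueing enabled.
print(on_trigger(active_runs=1, max_concurrent_runs=1, queue_enabled=True))   # queue
print(on_trigger(active_runs=1, max_concurrent_runs=1, queue_enabled=False))  # skip
print(on_trigger(active_runs=0, max_concurrent_runs=1, queue_enabled=True))   # run
```

With max concurrent runs set to 1 and queueing on, a slow nightly load delays the next run instead of dropping it.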

Test Your Knowledge

A Lakeflow Job has 12 tasks; task 7 fails after tasks 1-6 succeed. You fix the bug and want to resume without recomputing tasks 1-6. What should you use?

Use Repair run, which reruns only the failed task and its downstream tasks

Clone the job and run it from scratch

Switch the job to a continuous trigger

Increase max_retries on task 7 and wait

Test Your Knowledge

Which statement about job clusters versus all-purpose clusters is correct?

A job cluster is created for the run and terminated when it finishes, billed at the lower Jobs Compute rate

An all-purpose cluster is always cheaper for scheduled production jobs

Job clusters cannot use Photon

All-purpose clusters terminate automatically at the end of every job task

Databricks Certified Data Engineer Associate

1. Introduction

2. Domain 1: Databricks Intelligence Platform (10%)

3. Domain 2: Development and Ingestion (30%)

4. Domain 3: Data Processing & Transformations (31%)

5. Domain 4: Productionizing Data Pipelines (18%)

6. Domain 5: Data Governance & Quality (11%)
