4.1 Databricks Workflows and Lakeflow Jobs

Key Takeaways

  • Lakeflow Jobs orchestrate work using three concepts: jobs (the container), tasks (units of work shown as a DAG), and triggers (what starts the job).
  • Tasks declare upstream dependencies and use the 'Run if' condition (e.g. All succeeded, At least one succeeded, All done) to control conditional and error-handling branches.
  • Triggers can be scheduled (a Quartz cron expression), continuous, file-arrival, or table-update; classic jobs default to no task retries while serverless jobs auto-optimize retries.
  • Repair run reruns only the failed and downstream tasks of a multi-task job, reusing successful task output instead of rerunning the whole job.
  • Job clusters are created for the run and terminated at the end, making them cheaper and more isolated than shared all-purpose clusters.

Last updated: June 2026

Jobs, Tasks, and Triggers

Lakeflow Jobs (the orchestrator formerly branded Databricks Workflows) is the native scheduler for the Databricks Data Intelligence Platform. The exam expects you to know three building blocks:

  • A job is the top-level container that coordinates, schedules, and runs work. A job can be a single notebook or hundreds of tasks with branching logic.
  • A task is one unit of work — a notebook, Python script, SQL file, Lakeflow Declarative Pipeline, dbt project, JAR, or another job (Run Job task). Tasks are arranged as a Directed Acyclic Graph (DAG), meaning dependencies flow one direction and cannot cycle back.
  • A trigger is the mechanism that starts a run. It can be time-based (a schedule) or event-based (file arrival in cloud storage, or a Delta table update).

Because the task graph is a DAG, Databricks can run independent tasks in parallel; a task waits only until its own upstream dependencies finish.
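To make the parallelism concrete, here is a minimal sketch that groups a task graph into waves of tasks that could run at the same time. It uses Python's standard graphlib module rather than any Databricks API, and the task names are hypothetical. The real scheduler starts each task as soon as its own upstream tasks finish, so the wave grouping is a simplification.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key maps to the set of tasks it depends on.
dependencies = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "join_tables": {"ingest_orders", "ingest_customers"},
    "publish_report": {"join_tables"},
}

def parallel_waves(deps):
    """Group tasks into waves; every task in a wave can run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose upstream work is done
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = parallel_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

The two ingest tasks share the first wave because neither depends on the other; the join and the report each wait for the wave before them.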

Dependencies and the Run-if Condition

Each task lists its upstream tasks in the Depends on field. By default a task runs only when all of its upstream dependencies succeed, but the Run if setting changes this so you can build error-handling and fan-in branches:

Run if value               Task runs when…
All succeeded (default)    every upstream dependency succeeded
At least one succeeded     one or more upstream tasks succeeded
None failed                no upstream failed (skipped is OK)
All done                   every upstream finished, regardless of outcome
At least one failed        one or more upstream failed (cleanup/alert branch)
All failed                 every upstream dependency failed

The All done and At least one failed values are how you wire a notification or rollback task that should fire even when an earlier task breaks. Lakeflow Jobs also supports If/else condition tasks and For each tasks to loop a task over a parameter list.
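The Run if table can be read as a small decision function. The sketch below is an illustrative model, not Databricks code: it assumes every upstream task has already finished in one of three states (success, failed, skipped) and that the task has at least one upstream dependency. The upper-case keys follow the enum-style names the Jobs API uses for run_if, to the best of my knowledge; treat them as an assumption.

```python
def should_run(run_if, upstream_states):
    """Decide whether a task runs, given the final states of its upstream tasks.

    upstream_states is a non-empty list of 'success', 'failed', or 'skipped'.
    """
    succeeded = [state == "success" for state in upstream_states]
    failed = [state == "failed" for state in upstream_states]
    rules = {
        "ALL_SUCCESS": all(succeeded),
        "AT_LEAST_ONE_SUCCESS": any(succeeded),
        "NONE_FAILED": not any(failed),   # skipped upstream tasks are tolerated
        "ALL_DONE": True,                 # outcome is irrelevant once all finished
        "AT_LEAST_ONE_FAILED": any(failed),
        "ALL_FAILED": all(failed),
    }
    return rules[run_if]

# A cleanup task wired with AT_LEAST_ONE_FAILED fires only when something broke.
print(should_run("AT_LEAST_ONE_FAILED", ["success", "failed"]))
print(should_run("AT_LEAST_ONE_FAILED", ["success", "success"]))
```

A notification task that must always fire would use ALL_DONE instead, since it ignores the upstream outcome entirely.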

Triggers, Schedules, and Cron

A job's trigger type determines when it starts:

  • Scheduled: a recurring time-based trigger. The UI builds the schedule and stores it as a Quartz cron expression (for example, 0 0 2 * * ? for 2:00 AM daily). You also pick a time zone, which matters for daylight-saving correctness.
  • Continuous: keeps one run active at all times, restarting automatically; used for always-on streaming. Continuous jobs cannot use a normal retry policy.
  • File arrival: polls a Unity Catalog external location or volume path and starts a run when new files land, lowering latency versus a fixed schedule.
  • Table update: starts a run when a monitored Delta table changes.
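Quartz cron differs from Unix cron by adding a leading seconds field, which is why the daily 2:00 AM example has six fields. The sketch below only splits an expression into named fields; it does not validate or evaluate it. The schedule dictionary mirrors the field names I understand the Jobs API to use (quartz_cron_expression, timezone_id, pause_status), and the time zone value is purely illustrative.

```python
QUARTZ_FIELDS = (
    "seconds", "minutes", "hours", "day_of_month", "month", "day_of_week", "year",
)

def describe_quartz(expression):
    """Label each field of a Quartz cron expression (the 7th field, year, is optional)."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    # zip stops at the shorter sequence, so 'year' is omitted for 6-field input.
    return dict(zip(QUARTZ_FIELDS, parts))

# Assumed shape of a job's schedule block; the time zone is an example value.
schedule = {
    "quartz_cron_expression": "0 0 2 * * ?",
    "timezone_id": "America/New_York",
    "pause_status": "UNPAUSED",
}

fields = describe_quartz(schedule["quartz_cron_expression"])
print(fields)
```

Reading the fields left to right gives second 0, minute 0, hour 2, every day of every month; the ? in day_of_week means "no specific value", which Quartz requires when day_of_month is already set.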

If a scheduled run is still going when the next trigger fires, the concurrent runs limit and queueing settings decide whether the new run waits, is skipped, or runs in parallel.

Clusters, Retries, Notifications, and Repair

Compute choices. A task can run on serverless compute, a dedicated job cluster, or an existing all-purpose cluster. A job cluster spins up for the run and terminates when it ends — cheaper and more isolated, and billed at the lower Jobs Compute DBU rate. An all-purpose cluster stays running and is meant for interactive notebooks; reusing it for scheduled jobs costs more and risks resource contention.

Retries. Per task you set max_retries, min_retry_interval_millis (the minimum wait between attempts), and retry_on_timeout. On classic compute the default is no retries; serverless jobs auto-optimize retries. A task that exceeds its timeout is marked Timed Out, and if you set both a timeout and retries, the timeout applies to each attempt.
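Because the timeout applies to each attempt, retries multiply the time a task can occupy a run. The following sketch works through that arithmetic for a hypothetical task. The field names follow the Jobs API as I understand it (timeout_seconds, max_retries, min_retry_interval_millis, retry_on_timeout), and the numbers are invented for illustration. The result is a rough bound that ignores cluster start-up and any scheduling delay beyond the minimum interval.

```python
# Hypothetical task settings; values chosen only to make the arithmetic easy.
task_settings = {
    "task_key": "load_orders",
    "timeout_seconds": 600,              # applies to every attempt separately
    "max_retries": 2,                    # so up to 3 attempts in total
    "min_retry_interval_millis": 60_000, # at least 60 s between attempts
    "retry_on_timeout": True,            # a timed-out attempt is retried too
}

def worst_case_seconds(settings):
    """Rough bound: every attempt runs to its timeout, plus the waits between them."""
    attempts = 1 + settings["max_retries"]
    waiting = settings["max_retries"] * settings["min_retry_interval_millis"] / 1000
    return attempts * settings["timeout_seconds"] + waiting

print(worst_case_seconds(task_settings))  # 3 * 600 + 2 * 60 = 1920.0
```

A 10-minute timeout with two retries can therefore hold a run for roughly 32 minutes, which is worth checking against the schedule interval.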

Notifications send email, Slack, or webhook alerts on start, success, failure, or duration-exceeded.

Repair run is an exam favorite: for a multi-task job, repair reruns only the failed tasks and their downstream tasks, reusing the output of tasks that already succeeded, which is far cheaper than rerunning everything.
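The set of tasks a repair touches is the failed tasks plus everything downstream of them. The sketch below computes that set for a hypothetical 12-task linear chain in which task 7 fails; it is a model of the selection rule described above, not the Databricks implementation.

```python
def tasks_to_repair(dependencies, failed):
    """Return the failed tasks plus every task downstream of them.

    dependencies maps each task to the set of tasks it depends on.
    """
    rerun = set(failed)
    changed = True
    while changed:  # keep sweeping until no new downstream task is found
        changed = False
        for task, upstream in dependencies.items():
            if task not in rerun and upstream & rerun:
                rerun.add(task)
                changed = True
    return rerun

# Hypothetical linear chain t1 -> t2 -> ... -> t12.
chain = {f"t{i}": ({f"t{i - 1}"} if i > 1 else set()) for i in range(1, 13)}

rerun = tasks_to_repair(chain, failed={"t7"})
reused = set(chain) - rerun

print(sorted(rerun, key=lambda name: int(name[1:])))   # t7 through t12
print(sorted(reused, key=lambda name: int(name[1:])))  # t1 through t6
```

Tasks t1 through t6 keep their earlier results, so only half of the chain is recomputed.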

Parameters, Git, and Task Values

Lakeflow Jobs supports job parameters — key/value pairs defined at the job level and automatically pushed down to every task, so one parameter (for example run_date) drives the whole DAG. Tasks read them with dbutils.widgets.get(...) in a notebook or as named arguments for a Python/SQL task. You can override parameters at trigger time, which is how the same job runs a backfill for an arbitrary date.
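As a rough illustration of the override behavior, the sketch below merges hypothetical job-level defaults with trigger-time overrides and then reads the result the way a notebook task would, through dbutils.widgets.get. Because dbutils exists only inside Databricks, the example builds a small stand-in object so it can run anywhere; the stand-in, the parameter names, and the dates are all assumptions made for the example.

```python
from types import SimpleNamespace

# Hypothetical job-level defaults; job parameter values are passed as strings.
JOB_DEFAULTS = {"run_date": "2024-01-01", "env": "prod"}

def resolve_parameters(defaults, overrides=None):
    """Trigger-time overrides win over the job-level defaults."""
    merged = dict(defaults)
    merged.update(overrides or {})
    return merged

def make_fake_dbutils(params):
    """Local stand-in exposing only widgets.get, so the sketch runs off-platform."""
    return SimpleNamespace(widgets=SimpleNamespace(get=lambda name: params[name]))

def read_run_date(dbutils):
    # In a notebook task, pass the real dbutils object instead of the stand-in.
    return dbutils.widgets.get("run_date")

# A backfill run overrides run_date and inherits everything else.
backfill_params = resolve_parameters(JOB_DEFAULTS, {"run_date": "2023-06-15"})
run_date = read_run_date(make_fake_dbutils(backfill_params))
print(run_date)
```

Passing dbutils in as an argument is a design choice for testability: the same function works in a notebook and in a local unit test.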

Git integration lets a job run notebooks or scripts directly from a remote Git repository (a specific branch, tag, or commit) instead of the workspace, so production runs exactly the reviewed code. This is the bridge between Jobs and the source-controlled, Asset-Bundle workflow.

Tasks can also pass small results downstream with task values (dbutils.jobs.taskValues.set/get), letting a control task compute a value (such as a row count) that a later If/else condition branches on.
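A sketch of that hand-off follows. Inside Databricks the two functions would receive dbutils.jobs.taskValues; here a small in-memory stand-in (FakeTaskValues, invented for this example) plays that role so the code runs locally. The keyword names key, value, taskKey, default, and debugValue match the dbutils signatures as I understand them, and the task name count_rows is hypothetical.

```python
class FakeTaskValues:
    """In-memory stand-in for dbutils.jobs.taskValues (illustration only)."""

    def __init__(self, store, current_task):
        self._store = store                # shared across tasks in one run
        self._current_task = current_task  # set() records under this task key

    def set(self, key, value):
        self._store[(self._current_task, key)] = value

    def get(self, taskKey, key, default=None, debugValue=None):
        # debugValue is accepted for signature parity and unused in this stand-in.
        return self._store.get((taskKey, key), default)

def count_rows_task(task_values, rows):
    """Control task: publish a small result for downstream tasks."""
    task_values.set(key="row_count", value=len(rows))

def choose_branch(task_values):
    """Downstream logic: branch on the value published by 'count_rows'."""
    row_count = task_values.get(taskKey="count_rows", key="row_count", default=0)
    return "load" if row_count > 0 else "skip"

run_store = {}
count_rows_task(FakeTaskValues(run_store, "count_rows"), rows=[("a",), ("b",)])
branch = choose_branch(FakeTaskValues(run_store, "decide"))
print(branch)
```

Task values are meant for small results such as counts or flags; larger data belongs in a table that the downstream task reads.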

Queueing and Concurrency

When runs overlap, two settings govern behavior: max concurrent runs caps how many runs of the same job execute at once, and queueing holds a triggered run until a slot frees instead of skipping it. For most ETL you set concurrent runs to 1 and enable queueing so a slow run does not drop the next scheduled load. Note that a continuous job ignores schedules and queueing because it always keeps one run alive, restarting on completion or failure. Choosing the right trigger, concurrency, and retry combination per job is the difference between a pipeline that quietly recovers and one that pages an engineer at 3 AM.
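The effect of queueing is easiest to see with a small simulation. The sketch below models a job capped at one concurrent run and compares what happens to overlapping triggers with queueing on and off. It is a toy model with invented timings, not Databricks behavior in every detail; the settings dictionary shows the field names I believe the Jobs API uses (max_concurrent_runs and queue.enabled).

```python
# Assumed shape of the relevant job settings.
job_settings = {"max_concurrent_runs": 1, "queue": {"enabled": True}}

def simulate(trigger_times, duration, queue_enabled):
    """Return (trigger_time, outcome) pairs for a job limited to one run at a time."""
    busy_until = 0
    results = []
    for trigger in trigger_times:
        if trigger >= busy_until:
            start = trigger        # slot is free: start immediately
        elif queue_enabled:
            start = busy_until     # wait for the active run to finish
        else:
            results.append((trigger, "skipped"))
            continue
        busy_until = start + duration
        results.append((trigger, f"started at {start}"))
    return results

# Triggers every 60 minutes, but each run takes 90 minutes.
triggers = [0, 60, 120]
queued = simulate(triggers, duration=90, queue_enabled=job_settings["queue"]["enabled"])
unqueued = simulate(triggers, duration=90, queue_enabled=False)
print(queued)
print(unqueued)
```

With queueing, every load eventually runs, just later than scheduled; without it, the trigger at minute 60 is dropped because the first run is still active.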

Test Your Knowledge

A Lakeflow Job has 12 tasks; task 7 fails after tasks 1-6 succeed. You fix the bug and want to resume without recomputing tasks 1-6. What should you use?

You want a cleanup task that sends an alert and tears down resources whenever an upstream ETL task FAILS, but is skipped when it succeeds. Which Run if condition fits?

Which statement about job clusters versus all-purpose clusters is correct?

What does the schedule '0 0 2 * * ?' represent, and what expression format does the UI schedule compile to?