What does the Databricks Runtime (DBR) provide?

A pre-packaged software environment bundling Apache Spark, Delta Lake, and optimized libraries on cluster nodes. Each runtime version packages a specific Spark build, Delta Lake, language runtimes, and performance libraries, with variants such as LTS for production and ML for machine learning. Governance, by contrast, is the job of Unity Catalog, and the transaction log is a feature of Delta Lake rather than of the runtime as a whole.

A scheduled production pipeline currently runs on an always-on all-purpose cluster. What is the recommended, more cost-effective compute?

Job compute, which the scheduler creates per run and terminates when the job finishes. Job compute is provisioned automatically for each scheduled run and torn down when the job completes, so you pay only for the run rather than for an idle always-on cluster. All-purpose compute is meant for interactive development.

An administrator wants to restrict which instance types and maximum worker counts users may select when creating clusters, and to require cost-tracking tags. What should they use?

Cluster (compute) policies. Cluster policies are admin-defined rule sets that constrain configuration such as allowed instance types, maximum workers, autoscaling requirements, and mandatory tags, enforcing cost control and governance. Features that deal with query history, credentials, or notebook languages do not restrict cluster configuration.

With cluster autoscaling enabled between 2 and 8 workers, what happens during a resource-intensive stage of a job?

Databricks adds workers up to the maximum of 8, then removes them as load drops. Autoscaling adds worker nodes up to the configured maximum when the workload is resource-bound and removes them when load falls back toward the minimum, so you pay for extra capacity only while it is needed. It does not terminate the cluster or change Photon or driver settings.

Databricks Runtime and Compute Configuration | Free Guide 2026

Key Takeaways

The Databricks Runtime (DBR) bundles Apache Spark, Delta Lake, and optimized libraries; the LTS variant gives extended support.
All-purpose compute is long-lived for interactive development; job compute is created per run by the scheduler and terminated after.
Serverless compute is managed by Databricks, starts in seconds, always runs Photon, and autoscales automatically.
Autoscaling adds and removes workers between min and max bounds based on load; auto-termination shuts down idle clusters to save cost.
Cluster policies constrain configuration (instance types, autoscaling, tags) to control cost and governance; DBU is the billing unit.

The Databricks Runtime

The Databricks Runtime (DBR) is the set of software installed on cluster machines. Each version bundles a specific Apache Spark build, Delta Lake, Java/Scala/Python, and a library of performance and connectivity optimizations — so you select a runtime version rather than assembling Spark yourself.

Key runtime variants:

Runtime	Use
Standard DBR	General data engineering with Spark + Delta
DBR LTS	Long-Term Support — extended patches/stability for production
DBR ML	Adds popular ML libraries (scikit-learn, XGBoost, etc.)
Photon-enabled DBR	Runs the Photon native engine for acceleration

Choose an LTS version for production pipelines that need stability over a long window; pick the latest standard release to access newer features.
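The selection rule above can be sketched in a few lines of Python. This is only an illustration: the `SAMPLE_VERSIONS` list, its key strings, and the `pick_runtime` helper are hypothetical and are not taken from the guide; a real workspace lists its own available runtime versions.

```python
from typing import Optional

# Hypothetical sample of runtime listings: a machine key plus a display name.
# The values are illustrative only, not an authoritative list.
SAMPLE_VERSIONS = [
    {"key": "16.2.x-scala2.12", "name": "16.2"},
    {"key": "15.4.x-scala2.12", "name": "15.4 LTS"},
    {"key": "14.3.x-scala2.12", "name": "14.3 LTS"},
    {"key": "15.4.x-cpu-ml-scala2.12", "name": "15.4 LTS ML"},
]


def version_tuple(name: str) -> tuple:
    """Turn a display name such as '15.4 LTS' into (15, 4) for comparison."""
    major, minor = name.split()[0].split(".")
    return int(major), int(minor)


def pick_runtime(versions: list, production: bool, ml: bool = False) -> Optional[str]:
    """Return the newest matching runtime key, restricted to LTS for production."""
    # Keep only ML or only non-ML runtimes, depending on the request.
    candidates = [v for v in versions if ("ML" in v["name"].split()) == ml]
    if production:
        # Production pipelines favor stability, so only LTS releases qualify.
        candidates = [v for v in candidates if "LTS" in v["name"].split()]
    if not candidates:
        return None
    return max(candidates, key=lambda v: version_tuple(v["name"]))["key"]


print(pick_runtime(SAMPLE_VERSIONS, production=True))
print(pick_runtime(SAMPLE_VERSIONS, production=False))
```

With this sample data, the production call settles on the newest LTS entry, while the non-production call picks the newest release overall.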

Compute Types

Databricks compute falls into three categories the exam tests directly:

All-purpose compute — long-lived, interactive clusters for developing in notebooks, ad-hoc analysis, and collaboration. You create, restart, and terminate them manually; multiple users can share one.
Job compute (job clusters) — created automatically by the scheduler when a Lakeflow Job task runs and terminated when the job finishes. It is cheaper for production because it exists only for the run and is isolated per job.
Serverless compute — fully managed by Databricks. It starts in seconds from cached environments, autoscales automatically, and always runs Photon. There is nothing to size or terminate; Databricks handles the infrastructure.

A core exam rule: use all-purpose for interactive development and job compute for scheduled production work — running production jobs on always-on all-purpose clusters wastes money.
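As a rough illustration of that rule, the sketch below builds a job definition whose task carries its own `new_cluster` block, so the cluster exists only for the run. The field names follow the general shape of a Jobs API create request as far as this editor knows it; treat the job name, notebook path, node type, and runtime key as hypothetical placeholders, and check the current API reference before relying on any field.

```python
import json


def build_job_payload(name: str, notebook_path: str, cron: str) -> dict:
    """Build a job definition that runs on per-run job compute."""
    return {
        "name": name,
        # Quartz cron syntax; "0 0 2 * * ?" means 02:00 every day.
        "schedule": {"quartz_cron_expression": cron, "timezone_id": "UTC"},
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": notebook_path},
                # A new_cluster block requests job compute for this task.
                # Pointing at an existing all-purpose cluster instead would
                # keep production tied to an always-on cluster.
                "new_cluster": {
                    "spark_version": "15.4.x-scala2.12",  # assumed LTS runtime key
                    "node_type_id": "i3.xlarge",  # assumed AWS node type
                    "num_workers": 2,
                },
            }
        ],
    }


payload = build_job_payload("nightly_load", "/Pipelines/nightly_load", "0 0 2 * * ?")
print(json.dumps(payload, indent=2))
```

The sketch only constructs and prints the payload; submitting it would require an authenticated client, which is deliberately left out.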

Configuring Clusters: Autoscaling, Termination, Policies

Classic clusters expose configuration that balances performance and cost:

Autoscaling — set a minimum and maximum number of worker nodes; Databricks adds workers when a job is resource-bound and removes them when load drops, so you pay for capacity only when needed.
Auto-termination — a cluster shuts down after a defined idle period (e.g., 30 minutes), preventing forgotten clusters from running up cost.
Cluster (compute) policies — admin-defined rule sets that constrain what users can configure: allowed instance types, max workers, mandatory autoscaling, required tags, and runtime version. Policies enforce cost controls and governance and simplify cluster creation for non-experts.
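To make the three settings concrete, here is a small sketch that checks a cluster specification against a policy. The policy dictionary imitates the style of a policy definition (attribute paths mapped to `allowlist`, `range`, and `fixed` rules), but the `violations` checker is a simplified toy written for this guide, not how Databricks enforces policies. The node types, tag name, and limits are hypothetical.

```python
from typing import Any

# Hypothetical policy: restrict node types, cap workers and idle time, require a tag.
POLICY = {
    "node_type_id": {"type": "allowlist", "values": ["i3.xlarge", "i3.2xlarge"]},
    "autoscale.max_workers": {"type": "range", "maxValue": 8},
    "autotermination_minutes": {"type": "range", "maxValue": 60},
    "custom_tags.cost_center": {"type": "fixed", "value": "data-eng"},
}


def get_path(spec: dict, dotted: str) -> Any:
    """Look up a dotted attribute path such as 'autoscale.max_workers'."""
    current: Any = spec
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def violations(spec: dict, policy: dict) -> list:
    """Return the attribute paths in the spec that break the policy."""
    problems = []
    for path, rule in policy.items():
        value = get_path(spec, path)
        kind = rule["type"]
        if kind == "allowlist" and value not in rule["values"]:
            problems.append(path)
        elif kind == "range" and (value is None or value > rule["maxValue"]):
            problems.append(path)
        # Simplification: this toy treats "fixed" as a required value.
        elif kind == "fixed" and value != rule["value"]:
            problems.append(path)
    return problems


good_spec = {
    "node_type_id": "i3.xlarge",
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "autotermination_minutes": 30,
    "custom_tags": {"cost_center": "data-eng"},
}
bad_spec = {
    "node_type_id": "i3.xlarge",
    "autoscale": {"min_workers": 2, "max_workers": 20},
    "autotermination_minutes": 30,
}

print(violations(good_spec, POLICY))
print(violations(bad_spec, POLICY))
```

The first spec passes every rule; the second exceeds the worker cap and omits the required tag, so both attribute paths are reported.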

Driver and Workers

A cluster has one driver node (runs the Spark driver, coordinates tasks, holds notebook state) and one or more worker nodes (execute tasks in parallel). A single-node cluster runs driver and executor on one machine — fine for small or single-threaded work, not for large distributed jobs.

DBU: The Billing Unit

A Databricks Unit (DBU) is a normalized unit of processing capacity consumed per hour. Your bill is the DBUs consumed multiplied by the per-DBU rate (which varies by product, such as Jobs, All-Purpose, or DBSQL, and by tier and cloud), plus the underlying cloud VM cost for classic compute. Larger nodes or more workers consume more DBUs per hour; Photon-enabled compute has a higher DBU rate but often finishes faster, which can lower total DBUs for a workload.
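The billing arithmetic can be worked through with a short sketch. Every number below (DBUs per hour, the per-DBU rate, the VM price, and the run times) is invented for illustration; real rates depend on product, tier, and cloud.

```python
def estimate_cost(dbu_per_hour: float, hours: float,
                  rate_per_dbu: float, vm_cost_per_hour: float) -> float:
    """Estimate cost as DBUs x rate plus the cloud VM charge for the same hours."""
    dbus_consumed = dbu_per_hour * hours
    return dbus_consumed * rate_per_dbu + vm_cost_per_hour * hours


# Hypothetical workload on non-Photon compute: 10 DBU/hour for 2 hours.
standard = estimate_cost(dbu_per_hour=10, hours=2.0,
                         rate_per_dbu=0.25, vm_cost_per_hour=3.0)

# Same workload with Photon: assumed to burn DBUs twice as fast
# but to finish in 45 minutes.
photon = estimate_cost(dbu_per_hour=20, hours=0.75,
                       rate_per_dbu=0.25, vm_cost_per_hour=3.0)

print(f"standard: {standard:.2f}, photon: {photon:.2f}")
```

Under these made-up figures the Photon run consumes 15 DBUs instead of 20 and also pays for fewer VM hours, so it comes out cheaper overall. With a smaller speedup the comparison could go the other way.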

Cost-Control Checklist

Run production on job compute, not all-purpose.
Enable autoscaling and auto-termination.
Apply cluster policies to cap size and enforce tags.
Consider serverless to remove idle and startup waste.

These levers are exactly what the associate exam expects you to apply when asked to make a workload reliable and cost-efficient.

Access Modes and Cluster Sizing

When you create classic compute you also choose an access mode, which determines isolation and Unity Catalog support. Single-user (dedicated) mode is assigned to one user and supports all languages including Scala; shared mode (called standard access mode in current Databricks documentation) lets multiple users share a cluster with process isolation and full Unity Catalog governance. The legacy no-isolation shared mode lacks Unity Catalog enforcement and is being retired: new accounts created after December 18, 2025 do not get it. For governed multi-user work, shared access mode is the right answer.

Sizing involves two independent choices: the node type (CPU, memory, and whether the instance is memory- or compute-optimized) and the number of workers. Memory-heavy aggregations and joins benefit from memory-optimized nodes, while a small exploratory workload may run fine on a single-node cluster. Autoscaling then flexes worker count within bounds, so you size for the typical case and let scaling absorb spikes.
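The idea of sizing for the typical case and letting autoscaling absorb spikes can be pictured with a toy model. The function below is not Databricks's autoscaling algorithm, which the guide does not describe; it assumes, purely for illustration, that each worker handles a fixed number of pending tasks, and it clamps the result to the configured bounds.

```python
import math


def target_workers(pending_tasks: int, tasks_per_worker: int,
                   min_workers: int, max_workers: int) -> int:
    """Toy model: workers needed for the backlog, clamped to [min, max]."""
    needed = math.ceil(pending_tasks / tasks_per_worker)
    return max(min_workers, min(max_workers, needed))


# Bounds of 2 to 8 workers, assuming each worker handles 4 tasks at a time.
for backlog in (0, 20, 100):
    print(backlog, target_workers(backlog, tasks_per_worker=4,
                                  min_workers=2, max_workers=8))
```

An idle cluster stays at the minimum of 2, a moderate backlog lands between the bounds, and a spike is capped at the maximum of 8 no matter how large it is.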

Matching Compute to the Workload

The associate exam repeatedly poses 'which compute should you use' scenarios. Use this decision guide:

Scenario	Recommended compute
Interactive notebook development, shared exploration	All-purpose cluster
Scheduled production pipeline	Job compute (per-run, auto-terminated)
Fast startup, zero infra management, bursty work	Serverless compute
SQL analytics, dashboards, BI tools	SQL warehouse (serverless for speed)
Long-running, stability-critical production	LTS Databricks Runtime

The through-line is cost-and-reliability fit: never leave production on an always-on all-purpose cluster, always enable auto-termination on interactive clusters, and reach for serverless when you want to eliminate startup latency and sizing effort. Pairing the right runtime (LTS for stability) with the right compute type, governed by cluster policies and billed transparently in DBUs, is the practical competency this domain certifies.

When a scenario stresses both cost and reliability, the strongest answer almost always combines several levers at once — job compute for the run, autoscaling within sensible bounds, auto-termination for any interactive clusters, an LTS runtime for production stability, and a cluster policy to keep every team within those guardrails.
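The decision guide can also be written down as a small lookup function. The `Workload` fields and the order in which the checks are applied are assumptions made for this sketch; the guide itself gives the scenario-to-compute pairs but no precedence among them.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload, used only in this sketch."""
    sql_analytics: bool = False
    scheduled: bool = False
    needs_fast_startup: bool = False


def recommend_compute(w: Workload) -> str:
    """Map a workload to the compute type suggested by the decision guide."""
    if w.sql_analytics:
        return "SQL warehouse"
    if w.needs_fast_startup:
        return "Serverless compute"
    if w.scheduled:
        return "Job compute"
    # Default case: interactive notebook development and shared exploration.
    return "All-purpose cluster"


print(recommend_compute(Workload(scheduled=True)))
print(recommend_compute(Workload()))
```

A scheduled pipeline maps to job compute, and a workload with no special flags falls through to an all-purpose cluster for interactive work.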

Databricks Certified Data Engineer Associate

1.6 Databricks Runtime and Compute Configuration

1 Introduction

2 Domain 1: Databricks Intelligence Platform (10%)

3 Domain 2: Development and Ingestion (30%)

4 Domain 3: Data Processing & Transformations (31%)

5 Domain 4: Productionizing Data Pipelines (18%)

6 Domain 5: Data Governance & Quality (11%)
