1.6 Databricks Runtime and Compute Configuration
Key Takeaways
- The Databricks Runtime (DBR) bundles Apache Spark, Delta Lake, and optimized libraries; the LTS variant gives extended support.
- All-purpose compute is long-lived for interactive development; job compute is created per run by the scheduler and terminated after.
- Serverless compute is managed by Databricks, starts in seconds, and always runs with Photon and autoscaling enabled.
- Autoscaling adds and removes workers between min and max bounds based on load; auto-termination shuts down idle clusters to save cost.
- Cluster policies constrain configuration (instance types, autoscaling, tags) to control cost and governance; DBU is the billing unit.
The Databricks Runtime
The Databricks Runtime (DBR) is the set of software installed on cluster machines. Each version bundles a specific Apache Spark build, Delta Lake, Java/Scala/Python, and a library of performance and connectivity optimizations — so you select a runtime version rather than assembling Spark yourself.
Key runtime variants:
| Runtime | Use |
|---|---|
| Standard DBR | General data engineering with Spark + Delta |
| DBR LTS | Long-Term Support — extended patches/stability for production |
| DBR ML | Adds popular ML libraries (scikit-learn, XGBoost, etc.) |
| Photon-enabled DBR | Runs the Photon native engine for acceleration |
Choose an LTS version for production pipelines that need stability over a long window; pick the latest standard release to access newer features.
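The selection rule above can be sketched as a small filter over a list of runtime versions. This is a minimal illustration, not a Databricks API call: the `SAMPLE_VERSIONS` entries, their `key`/`name` shape, and the `pick_runtime` helper are all assumptions made for the example.

```python
import re

# Illustrative entries only. The key/name shape is an assumption for this
# sketch, not captured output from a real workspace.
SAMPLE_VERSIONS = [
    {"key": "16.1.x-scala2.12", "name": "16.1"},
    {"key": "15.4.x-scala2.12", "name": "15.4 LTS"},
    {"key": "14.3.x-scala2.12", "name": "14.3 LTS"},
    {"key": "15.4.x-cpu-ml-scala2.12", "name": "15.4 LTS ML"},
]


def version_tuple(key):
    """Turn a key such as '15.4.x-scala2.12' into (15, 4) for comparison."""
    match = re.match(r"(\d+)\.(\d+)\.", key)
    if match is None:
        raise ValueError(f"unrecognized runtime key: {key}")
    return int(match.group(1)), int(match.group(2))


def pick_runtime(versions, lts_only, ml=False):
    """Return the newest matching runtime key (hypothetical helper)."""
    candidates = [
        v for v in versions
        if ("LTS" in v["name"] or not lts_only)
        and (("ML" in v["name"].split()) == ml)
    ]
    if not candidates:
        raise ValueError("no runtime matches the requested constraints")
    return max(candidates, key=lambda v: version_tuple(v["key"]))["key"]


production_runtime = pick_runtime(SAMPLE_VERSIONS, lts_only=True)
feature_runtime = pick_runtime(SAMPLE_VERSIONS, lts_only=False)
```

With the sample data, the production choice resolves to the newest LTS entry while the feature choice resolves to the newest release overall.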
Compute Types
Databricks compute falls into three categories the exam tests directly:
- All-purpose compute: long-lived, interactive clusters for developing in notebooks, ad-hoc analysis, and collaboration. You create, restart, and terminate them manually; multiple users can share one.
- Job compute (job clusters): created automatically by the scheduler when a Lakeflow Job task runs and terminated when the job finishes. It is cheaper for production because it exists only for the duration of the run and is billed at the lower Jobs DBU rate; each job also gets its own isolated cluster.
- Serverless compute: fully managed by Databricks. It starts in seconds from cached environments, autoscales automatically, and always runs with Photon enabled. There is nothing to size or terminate; Databricks handles the infrastructure.
A core exam rule: use all-purpose for interactive development and job compute for scheduled production work — running production jobs on always-on all-purpose clusters wastes money.
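The difference between the two choices shows up in how a job task references its compute. The sketch below builds a task definition as a plain dictionary. The field names (`task_key`, `notebook_task`, `new_cluster`, `existing_cluster_id`) follow the general shape of a Jobs API task and should be checked against the current API reference; the notebook path, runtime key, and node type are placeholders.

```python
def build_task(task_key, notebook_path, interactive_cluster_id=None):
    """Build a job task dict (hypothetical helper, not an SDK function).

    With no cluster ID, the task asks for a fresh job cluster that exists
    only for the run. With a cluster ID, it reuses an all-purpose cluster,
    which is the pattern the rule above warns against for production.
    """
    task = {
        "task_key": task_key,
        "notebook_task": {"notebook_path": notebook_path},
    }
    if interactive_cluster_id is not None:
        task["existing_cluster_id"] = interactive_cluster_id
    else:
        task["new_cluster"] = {
            "spark_version": "15.4.x-scala2.12",  # placeholder LTS key
            "node_type_id": "example-node-type",  # placeholder, cloud-specific
            "num_workers": 2,
        }
    return task


production_task = build_task("nightly_load", "/Workspace/etl/nightly_load")
```

Here `production_task` carries a `new_cluster` block and no `existing_cluster_id`, so the compute is created for the run and released afterwards.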
Configuring Clusters: Autoscaling, Termination, Policies
Classic clusters expose configuration that balances performance and cost:
- Autoscaling — set a minimum and maximum number of worker nodes; Databricks adds workers when a job is resource-bound and removes them when load drops, so you pay for capacity only when needed.
- Auto-termination — a cluster shuts down after a defined idle period (e.g., 30 minutes), preventing forgotten clusters from running up cost.
- Cluster (compute) policies — admin-defined rule sets that constrain what users can configure: allowed instance types, max workers, mandatory autoscaling, required tags, and runtime version. Policies enforce cost controls and governance and simplify cluster creation for non-experts.
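To make the three settings concrete, the sketch below checks a cluster spec against a toy rule set. The rule language here (`allowlist`, `range`, `required`) is a simplification invented for this example and is not the Databricks policy definition syntax; the spec field names mirror common cluster settings, and the node type names are placeholders.

```python
# Toy policy: every listed path must be present in the spec, so a cluster
# without an autoscale block or a cost_center tag is rejected.
POLICY = {
    "node_type_id": {"type": "allowlist", "values": ["small-node", "medium-node"]},
    "autoscale.max_workers": {"type": "range", "maxValue": 8},
    "autotermination_minutes": {"type": "range", "minValue": 10, "maxValue": 60},
    "custom_tags.cost_center": {"type": "required"},
}


def lookup(spec, dotted_path):
    """Follow a dotted path such as 'autoscale.max_workers' into nested dicts."""
    node = spec
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def violations(spec, policy):
    """Return a list of human-readable policy violations (empty if compliant)."""
    problems = []
    for path, rule in policy.items():
        value = lookup(spec, path)
        if value is None:
            problems.append(f"{path}: missing")
        elif rule["type"] == "allowlist" and value not in rule["values"]:
            problems.append(f"{path}: {value!r} not allowed")
        elif rule["type"] == "range" and not (
            rule.get("minValue", value) <= value <= rule.get("maxValue", value)
        ):
            problems.append(f"{path}: {value} out of range")
    return problems


compliant_spec = {
    "node_type_id": "small-node",
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "autotermination_minutes": 30,
    "custom_tags": {"cost_center": "1234"},
}
```

Calling `violations(compliant_spec, POLICY)` returns an empty list, while a fixed-size cluster on a disallowed node type with no tags would be flagged on three counts.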
Driver and Workers
A multi-node cluster has one driver node (runs the Spark driver, coordinates tasks, holds notebook state) and one or more worker nodes (execute tasks in parallel). A single-node cluster has no separate workers: the driver and executor run on one machine, which is fine for small or single-threaded work but not for large distributed jobs.
DBU: The Billing Unit
A Databricks Unit (DBU) is a normalized unit of processing capacity, metered per hour of use. Your bill is DBUs consumed × the per-DBU rate (which varies by product, such as Jobs, All-Purpose, or DBSQL, and by cloud) plus the underlying cloud VM cost. Larger or more numerous workers consume more DBUs per hour; Photon-enabled compute consumes DBUs at a higher rate but often finishes faster, which can lower the total DBUs for a workload.
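The billing formula can be worked through with a few lines of arithmetic. All numbers below are hypothetical and chosen only to show the mechanics; real DBU consumption, per-DBU rates, and VM prices depend on product, tier, instance type, and cloud.

```python
def workload_cost(dbu_per_hour, hours, dbu_rate, vm_cost_per_hour):
    """Total cost = DBUs consumed x per-DBU rate + cloud VM cost."""
    dbu_charge = dbu_per_hour * hours * dbu_rate
    vm_charge = vm_cost_per_hour * hours
    return dbu_charge + vm_charge


# Hypothetical workload without Photon: 10 DBU/hour for 4 hours.
standard = workload_cost(dbu_per_hour=10, hours=4, dbu_rate=0.15, vm_cost_per_hour=5.0)

# Same workload with Photon, assumed to double DBU/hour but finish in 1.5 hours.
photon = workload_cost(dbu_per_hour=20, hours=1.5, dbu_rate=0.15, vm_cost_per_hour=5.0)
```

Under these assumed figures the Photon run consumes 30 DBUs instead of 40 and also pays for fewer VM hours, so it costs less overall. If the speedup were smaller than the increase in DBU consumption, the comparison would reverse.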
Cost-Control Checklist
- Run production on job compute, not all-purpose.
- Enable autoscaling and auto-termination.
- Apply cluster policies to cap size and enforce tags.
- Consider serverless to remove idle and startup waste.
These levers are exactly what the associate exam expects you to apply when asked to make a workload reliable and cost-efficient.
Access Modes and Cluster Sizing
When you create classic compute you also choose an access mode, which determines isolation and Unity Catalog support. Single-user (dedicated) mode is assigned to one user and supports all languages including Scala; shared (standard) mode lets multiple users share a cluster with process isolation and full Unity Catalog governance. The legacy no-isolation shared mode lacks Unity Catalog enforcement and is being retired; new accounts created after December 18, 2025 do not get it. For governed multi-user work, shared access mode is the right answer.
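The access-mode comparison can be captured as a small lookup. The property flags below restate only what the paragraph above says; the dictionary layout and the `modes_for` helper are assumptions for illustration and do not correspond to API values.

```python
# Properties taken from the description above; names are informal labels.
ACCESS_MODES = {
    "single-user": {"multi_user": False, "unity_catalog": True, "legacy": False},
    "shared": {"multi_user": True, "unity_catalog": True, "legacy": False},
    "no-isolation shared": {"multi_user": True, "unity_catalog": False, "legacy": True},
}


def modes_for(multi_user, unity_catalog):
    """List non-legacy access modes matching the requirements (hypothetical helper)."""
    return [
        name
        for name, props in ACCESS_MODES.items()
        if props["multi_user"] == multi_user
        and props["unity_catalog"] == unity_catalog
        and not props["legacy"]
    ]


governed_team_modes = modes_for(multi_user=True, unity_catalog=True)
```

For a governed multi-user requirement, the only match is shared mode, which agrees with the guidance above.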
Sizing involves two independent choices: the node type (CPU, memory, and whether the instance is memory- or compute-optimized) and the number of workers. Memory-heavy aggregations and joins benefit from memory-optimized nodes, while a small exploratory workload may run fine on a single-node cluster. Autoscaling then flexes worker count within bounds, so you size for the typical case and let scaling absorb spikes.
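The idea of scaling within bounds reduces to clamping a demand estimate between the configured minimum and maximum. The demand heuristic below (pending tasks divided by tasks per worker) is a stand-in invented for this sketch; the text does not describe how Databricks actually measures load.

```python
import math


def target_workers(pending_tasks, tasks_per_worker, min_workers, max_workers):
    """Clamp a simple demand estimate to the autoscaling bounds."""
    desired = math.ceil(pending_tasks / tasks_per_worker)
    return max(min_workers, min(max_workers, desired))


# Bounds of 2 to 8 workers, assuming each worker handles 8 tasks at a time.
idle = target_workers(pending_tasks=0, tasks_per_worker=8, min_workers=2, max_workers=8)
typical = target_workers(pending_tasks=40, tasks_per_worker=8, min_workers=2, max_workers=8)
spike = target_workers(pending_tasks=100, tasks_per_worker=8, min_workers=2, max_workers=8)
```

An idle cluster stays at the minimum of 2, the typical load settles at 5, and the spike is capped at the maximum of 8 even though the estimate asks for 13.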
Matching Compute to the Workload
The associate exam repeatedly poses 'which compute should you use' scenarios. Use this decision guide:
| Scenario | Recommended compute |
|---|---|
| Interactive notebook development, shared exploration | All-purpose cluster |
| Scheduled production pipeline | Job compute (per-run, auto-terminated) |
| Fast startup, zero infra management, bursty work | Serverless compute |
| SQL analytics, dashboards, BI tools | SQL warehouse (serverless for speed) |
| Long-running, stability-critical production | LTS Databricks Runtime |
The through-line is cost-and-reliability fit: never leave production on an always-on all-purpose cluster, always enable auto-termination on interactive clusters, and reach for serverless when you want to eliminate startup latency and sizing effort. Pairing the right runtime (LTS for stability) with the right compute type, governed by cluster policies and billed transparently in DBUs, is the practical competency this domain certifies.
When a scenario stresses both cost and reliability, the strongest answer almost always combines several levers at once — job compute for the run, autoscaling within sensible bounds, auto-termination for any interactive clusters, an LTS runtime for production stability, and a cluster policy to keep every team within those guardrails.
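The decision guide above can be written as a small function, which makes the priority between overlapping scenarios explicit. The order of the checks is an assumption of this sketch, since the table does not rank scenarios against each other, and the parameter names are hypothetical.

```python
def recommend_compute(interactive=False, scheduled=False,
                      sql_analytics=False, needs_fast_startup=False):
    """Map scenario flags to the compute type suggested by the decision guide."""
    if sql_analytics:
        return "SQL warehouse"
    if needs_fast_startup:
        return "Serverless compute"
    if scheduled:
        return "Job compute"
    if interactive:
        return "All-purpose cluster"
    raise ValueError("no scenario flag set")


nightly_pipeline = recommend_compute(scheduled=True)
notebook_work = recommend_compute(interactive=True)
```

A scheduled pipeline maps to job compute and notebook development maps to an all-purpose cluster, matching the first two rows of the table.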
What does the Databricks Runtime (DBR) provide?
A scheduled production pipeline currently runs on an always-on all-purpose cluster. What is the recommended, more cost-effective compute?
An administrator wants to restrict which instance types and maximum worker counts users may select when creating clusters, and to require cost-tracking tags. What should they use?
With cluster autoscaling enabled between 2 and 8 workers, what happens during a resource-intensive stage of a job?