4.2 Databricks Asset Bundles (Declarative Automation Bundles)

Key Takeaways

  • Databricks Asset Bundles (DABs) define jobs, pipelines, notebooks, and models as code in a required databricks.yml file, enabling source control, code review, and CI/CD.
  • The bundle root databricks.yml holds the bundle name plus resources, variables, and targets; the Databricks CLI runs validate, deploy, and run against a chosen --target.
  • Targets map to environments: mode: development prefixes resources with the developer's username and pauses schedules; mode: production deploys resources under their declared names with schedules active and applies extra validation checks.
  • Non-development targets should set run_as to a service principal so deployments are not tied to an individual user's identity.
  • DABs replace ad-hoc UI configuration so the same definition deploys consistently across dev, staging, and prod from a CI system such as GitHub Actions.
Last updated: June 2026

What Databricks Asset Bundles Solve

Databricks Asset Bundles (DABs) — now also called Declarative Automation Bundles — let you define Databricks resources as code and deploy them reproducibly. Instead of clicking together a job in the UI in each workspace, you describe jobs, Lakeflow Declarative Pipelines, notebooks, ML models, experiments, and their dependencies in YAML, commit it to Git, review it, and deploy it through a CI/CD system.

A bundle brings software-engineering discipline to data work:

  • Source control — the whole project lives in a Git repo.
  • Code review — changes go through pull requests.
  • Testing and CI/CD — automated validation and deployment.
  • Consistency — the same definition deploys to dev, staging, and prod, eliminating drift between environments.

The databricks.yml File and Bundle Structure

Every bundle has a required configuration file named databricks.yml (or databricks.yaml) at the bundle root. It must contain at minimum the top-level bundle mapping with a name. The common top-level mappings are:

| Mapping | Purpose |
|---|---|
| bundle | Bundle name and metadata |
| resources | Jobs, pipelines, models, and experiments to deploy |
| variables | Named values reused and overridden across targets |
| targets | Named deployment environments (dev, staging, prod) |
| include | Pulls in extra YAML files (for example resources/*.yml) |
| sync | Files to include in or exclude from the bundle upload |

A minimal example:

bundle:
  name: sales_etl
resources:
  jobs:
    daily_load:
      name: daily_load
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./src/ingest.py
targets:
  dev:
    mode: development
  prod:
    mode: production

Targets, Deployment Modes, and the CLI Workflow

Targets map a bundle to an environment. Two deployment modes matter for the exam:

  • mode: development — prepends resource names with a [dev your.name] prefix, pauses schedules and triggers, and tags resources as dev. This lets multiple engineers deploy the same bundle into one workspace without colliding.
  • mode: production deploys resources under their declared names (no per-user prefix), leaves schedules and triggers active, and runs extra validation, for example rejecting pipelines marked development: true.
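
As a sketch of how targets are usually filled out, the fragment below extends the minimal example with a workspace host per target and marks dev as the default, so a bare databricks bundle deploy goes to dev. The hostnames are hypothetical placeholders, and the daily schedule is added only to illustrate the mode difference.

```yaml
bundle:
  name: sales_etl

resources:
  jobs:
    daily_load:
      name: daily_load
      # Quartz cron: every day at 06:00 in the given time zone.
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"
        timezone_id: UTC
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./src/ingest.py

targets:
  dev:
    mode: development
    default: true                               # used when no -t/--target is given
    workspace:
      host: https://dev-workspace.example.com   # hypothetical host
  prod:
    mode: production
    workspace:
      host: https://prod-workspace.example.com  # hypothetical host
```

Under -t dev the job should appear with a [dev <username>] prefix and its schedule paused; under -t prod it keeps the name daily_load and the schedule runs as written. A real production target would also set run_as to a service principal, which this sketch omits.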

You operate bundles with the Databricks CLI:

  1. databricks bundle validate — checks the YAML is well-formed.
  2. databricks bundle deploy -t prod — uploads artifacts and creates/updates jobs and pipelines in the target workspace.
  3. databricks bundle run <job_key> -t prod — triggers a deployed resource.
  4. databricks bundle destroy -t dev — removes deployed resources.

Variables hold environment-specific values (catalog names, warehouse IDs) so you override per target instead of duplicating definitions.
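
For IDs such as a SQL warehouse, a variable can also resolve the value by name at deploy time instead of hard-coding the ID. A minimal sketch, assuming warehouses named "Dev Warehouse" and "Prod Warehouse" exist in the respective workspaces (both names are hypothetical):

```yaml
variables:
  warehouse_id:
    description: SQL warehouse ID used by SQL tasks
    # Look up the warehouse ID from its display name when the bundle deploys.
    lookup:
      warehouse: "Dev Warehouse"     # hypothetical warehouse name

targets:
  prod:
    variables:
      warehouse_id:
        lookup:
          warehouse: "Prod Warehouse"  # hypothetical warehouse name
```

Tasks would then reference ${var.warehouse_id}, for example in a sql_task's warehouse_id field, and never carry a workspace-specific ID in the YAML itself.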

CI/CD and Service Principals

Bundles are designed to be driven from a CI/CD pipeline (GitHub Actions, Azure DevOps, GitLab). A typical flow: a pull request runs bundle validate and unit tests; merging to a release branch runs bundle deploy -t staging for integration testing; promotion runs bundle deploy -t prod.
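
The first two stages of that flow can be sketched as a GitHub Actions workflow. This assumes the bundle defines a staging target and that a service principal's OAuth credentials are stored as repository secrets; the file name, branch names, and secret names are hypothetical.

```yaml
# Hypothetical workflow file: .github/workflows/bundle-ci.yml
name: bundle-ci

on:
  pull_request:          # pull requests: validate only
    branches: [main]
  push:                  # merges to the release branch: deploy to staging
    branches: [release]

# Service principal OAuth (M2M) credentials read by the Databricks CLI.
env:
  DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
  DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
  DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET }}

jobs:
  validate:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Installs the Databricks CLI; @main is unpinned, so pin a release for production.
      - uses: databricks/setup-cli@main
      - run: databricks bundle validate -t staging
      # Unit tests (for example pytest) would run here as additional steps.

  deploy_staging:
    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - run: databricks bundle deploy -t staging
```

Promotion to prod would follow the same pattern in a separate job or workflow that runs databricks bundle deploy -t prod, typically gated behind a manual approval.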

Key best practices the exam rewards:

  • Use a service principal for non-development targets. Set run_as to a service principal in staging/prod so deployed jobs run under a stable, non-personal identity — not the individual who happened to deploy.
  • Pin the CLI version in production pipelines for reproducibility.
  • Store secrets in Databricks secret scopes, never in the YAML.
  • Use variables, not copy-paste, for values that differ across dev/staging/prod.
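
The first two practices can be expressed directly in databricks.yml. A sketch, in which the host, the service principal's application ID, and the version range are placeholders; the exact version-constraint syntax should be checked against the bundle configuration reference.

```yaml
bundle:
  name: sales_etl
  # Refuse to deploy with a CLI outside this range (illustrative range).
  databricks_cli_version: ">= 0.218.0, < 1.0.0"

targets:
  prod:
    mode: production
    workspace:
      host: https://prod-workspace.example.com   # hypothetical host
    # Deployed jobs run as this service principal, not as whoever ran deploy.
    run_as:
      service_principal_name: "00000000-0000-0000-0000-000000000000"  # hypothetical application ID
```

Because run_as is set per target, dev deployments can still run as the individual engineer while staging and prod use the stable identity.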

This is the difference between a bundle and clicking around the UI: the bundle is the auditable, repeatable source of truth.

Resources, Includes, and Variable Overrides

The resources mapping is where you declare what gets deployed: jobs, pipelines (Lakeflow Declarative Pipelines), models, experiments, dashboards, and more. Large projects split these across files and pull them in with include, for example include: ["resources/*.yml"], keeping each job definition in its own file.
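
A sketch of that split, using the sales_etl example: the root file keeps bundle-wide settings and pulls in every file under resources/, while each resource file holds only its own resources block. The file name resources/daily_load.yml is hypothetical.

```yaml
# databricks.yml (bundle root)
bundle:
  name: sales_etl

# Merge every YAML file under resources/ into this bundle's configuration.
include:
  - resources/*.yml

targets:
  dev:
    mode: development
    default: true
  prod:
    mode: production

# resources/daily_load.yml would contain only:
#
# resources:
#   jobs:
#     daily_load:
#       name: daily_load
#       tasks:
#         - task_key: ingest
#           notebook_task:
#             notebook_path: ../src/ingest.py
```

Note the ../src path in the included file: relative paths in bundle configuration resolve from the file that declares them, not from the bundle root.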

Variables make one definition serve every environment. You declare a variable once, then override it per target so dev points at a dev catalog and prod at a prod catalog:

variables:
  catalog:
    default: dev_catalog
targets:
  prod:
    variables:
      catalog: prod_catalog

Resources reference ${var.catalog}, so promotion never means editing the job — only the target changes. This eliminates the copy-paste drift that plagues UI-built jobs.
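
A sketch of the full pattern, assuming the ingest notebook reads a widget named catalog (a hypothetical parameter name): the job passes ${var.catalog} as a notebook parameter, and only the prod target overrides the value.

```yaml
variables:
  catalog:
    description: Unity Catalog catalog the job writes to
    default: dev_catalog

resources:
  jobs:
    daily_load:
      name: daily_load
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./src/ingest.py
            # Passed to the notebook as a widget parameter named "catalog".
            base_parameters:
              catalog: ${var.catalog}

targets:
  dev:
    mode: development
    default: true          # uses the default, dev_catalog
  prod:
    mode: production
    variables:
      catalog: prod_catalog
```

Deploying with -t dev wires the task to dev_catalog and -t prod wires it to prod_catalog, with no change to the job definition.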

Scaffolding with bundle init

You do not start from a blank file. databricks bundle init scaffolds a project from a template (such as default-python, dbt-sql, or a custom template), generating a working databricks.yml, sample resources, and, for the Python template, a tests/ folder. From there you edit, validate, and deploy. The generated layout already follows the recommended structure of source modules, tests, and per-target configuration, so a new project starts on the path Databricks recommends for any code that must reach production reliably.

Exam pointers

A bundle is defined by a databricks.yml at the bundle root; targets map to environments; mode: development prefixes resources and pauses schedules while mode: production deploys strictly; and non-dev targets should run as a service principal. Bundles supersede manually configuring jobs in each workspace's UI, eliminating environment drift and making the deployed job identical to the reviewed and tested code in Git.

A useful rule of thumb: anything you would otherwise click together in the Jobs or Pipelines UI for production should instead be declared in the bundle, version-controlled, and deployed through databricks bundle deploy so the workspace state is always reproducible from source.

Asset Bundles quick recap

  • A bundle is defined by a databricks.yml with resources, variables, and targets.
  • mode: development prefixes resource names and pauses schedules; mode: production deploys clean, scheduled resources.
  • The CLI flow is bundle validate → bundle deploy → bundle run (plus bundle destroy for cleanup), making deployments repeatable across dev/staging/prod.
Test Your Knowledge

Which file is required at the root of every Databricks Asset Bundle and what must it contain at minimum?

Two engineers deploy the same bundle into one shared workspace for testing. Which deployment mode keeps their deployments from colliding by prefixing resource names and pausing schedules?

In a CI/CD pipeline deploying to production, why should the prod target set run_as to a service principal?
