4.2 Databricks Asset Bundles (Declarative Automation Bundles)
Key Takeaways
- Databricks Asset Bundles (DAB), now called Declarative Automation Bundles, package project source code, configuration, and resource definitions for CI/CD deployment.
- A bundle is defined by a databricks.yml file at the project root that specifies resources (jobs, pipelines), targets (environments), and configuration.
- The Databricks CLI commands (bundle validate, bundle deploy, bundle run) manage the lifecycle of bundles.
- Targets define environment-specific configurations (dev, staging, prod) within the same bundle definition.
- Bundles replace manual UI-based deployment with version-controlled infrastructure as code.
Databricks Asset Bundles (Declarative Automation Bundles)
Quick Answer: Databricks Asset Bundles (now Declarative Automation Bundles) are infrastructure-as-code definitions for Databricks projects. A databricks.yml file defines jobs, pipelines, and targets. The CLI validates, deploys, and runs bundles across dev, staging, and production environments.
What Are Bundles?
Bundles bring software engineering best practices to Databricks projects:
| Practice | How Bundles Enable It |
|---|---|
| Source control | Bundle YAML and source code stored in Git |
| Code review | Pull request review for infrastructure changes |
| CI/CD | Automated deployment via CLI in CI/CD pipelines |
| Environment management | Targets define dev/staging/prod configurations |
| Reproducibility | Same bundle definition deploys consistently |
| Testing | Validate bundle before deployment |
Bundle Structure
my-project/
├── databricks.yml # Main bundle configuration
├── resources/
│ ├── jobs.yml # Job definitions
│ └── pipelines.yml # Pipeline definitions
├── src/
│ ├── notebooks/
│ │ ├── ingest.py
│ │ ├── transform.sql
│ │ └── aggregate.py
│ └── libraries/
│ └── helpers.py
└── tests/
└── test_transform.py
databricks.yml Configuration
bundle:
name: sales-pipeline
# Include additional configuration files
include:
- resources/*.yml
# Target environments
targets:
dev:
mode: development
default: true
workspace:
host: https://dev-workspace.cloud.databricks.com
staging:
workspace:
host: https://staging-workspace.cloud.databricks.com
prod:
mode: production
workspace:
host: https://prod-workspace.cloud.databricks.com
run_as:
service_principal_name: prod-service-principal
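Targets can also override individual resource settings per environment. A minimal sketch, assuming the daily_etl job defined in resources/jobs.yml; here the dev target keeps the schedule paused so the job only runs when triggered manually:

```yaml
targets:
  dev:
    mode: development
    resources:
      jobs:
        daily_etl:
          schedule:
            pause_status: PAUSED  # don't run on schedule in dev
```

Target-level settings are merged over the top-level resource definition, so only the fields that differ need to be repeated.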
Resource Configuration (resources/jobs.yml)
resources:
jobs:
daily_etl:
name: "Daily ETL Pipeline"
schedule:
quartz_cron_expression: "0 0 2 * * ?"
timezone_id: "UTC"
tasks:
- task_key: ingest
notebook_task:
notebook_path: src/notebooks/ingest.py
new_cluster:
spark_version: "15.4.x-scala2.12"
node_type_id: "i3.xlarge"
num_workers: 2
- task_key: transform
depends_on:
- task_key: ingest
notebook_task:
notebook_path: src/notebooks/transform.sql
- task_key: aggregate
depends_on:
- task_key: transform
notebook_task:
notebook_path: src/notebooks/aggregate.py
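Bundles also support variables, which avoid hard-coding values such as node types that differ between environments. A sketch, assuming a variable named node_type (the name is illustrative, not part of the example above):

```yaml
variables:
  node_type:
    description: Cluster node type for job clusters
    default: i3.xlarge

resources:
  jobs:
    daily_etl:
      tasks:
        - task_key: ingest
          new_cluster:
            node_type_id: ${var.node_type}  # substituted at deploy time
```

A target can then override the default, e.g. by setting variables with node_type: m5.2xlarge under the prod target.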
CLI Commands
| Command | Purpose |
|---|---|
| databricks bundle init | Create a new bundle from a template |
| databricks bundle validate | Check bundle syntax and references |
| databricks bundle deploy | Deploy bundle resources to a workspace |
| databricks bundle run | Run a specific job or pipeline from the bundle |
| databricks bundle destroy | Remove all resources created by the bundle |
Workflow
# 1. Initialize a new project
databricks bundle init default-python
# 2. Validate the configuration
databricks bundle validate -t dev
# 3. Deploy to development
databricks bundle deploy -t dev
# 4. Run the job
databricks bundle run daily_etl -t dev
# 5. Deploy to production (after testing)
databricks bundle deploy -t prod
Targets (Environment Management)
Targets allow the same bundle to be deployed to different environments:
| Target Property | Purpose | Example |
|---|---|---|
| mode | development or production | mode: production |
| workspace.host | Target workspace URL | https://prod.cloud.databricks.com |
| run_as | Service principal for production | run_as: {service_principal_name: sp-prod} |
| default | Default target for CLI commands | default: true |
Development vs. Production Mode
| Feature | Development Mode | Production Mode |
|---|---|---|
| Resource naming | Prefixed with [dev username] | Exact name as defined |
| Permissions | Current user only | Configured via run_as |
| Locking | No deployment lock | Deployment lock to prevent conflicts |
| Cluster policy | Can use any cluster | Should use job clusters |
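Development-mode behaviors such as name prefixing and schedule pausing can be tuned with presets under a target. A sketch (the preset values shown are illustrative):

```yaml
targets:
  dev:
    mode: development
    presets:
      name_prefix: "[dev ${workspace.current_user.short_name}] "  # prefix on deployed resource names
      trigger_pause_status: PAUSED  # deploy schedules in a paused state
```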
CI/CD Integration
# Example: GitHub Actions CI/CD pipeline
name: Deploy Databricks Bundle
on:
push:
branches: [main]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: databricks/setup-cli@main
- run: databricks bundle validate -t prod
- run: databricks bundle deploy -t prod
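In CI, the CLI needs credentials for the target workspace. One common approach, assuming the host and a token are stored as repository secrets (the secret names here are illustrative), is to pass them as environment variables on the deploy step:

```yaml
      - run: databricks bundle deploy -t prod
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
```

For service principals, OAuth client credentials (DATABRICKS_CLIENT_ID / DATABRICKS_CLIENT_SECRET) are generally preferred over long-lived tokens.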
On the Exam: Know the basic bundle structure (databricks.yml, resources, targets), the three main CLI commands (validate, deploy, run), and the difference between development and production modes. You do not need to memorize YAML syntax.
Review Questions
- Which file must exist at the root of every Databricks Asset Bundle project?
- What does the "databricks bundle validate" command do?
- In a Databricks Asset Bundle, what is the purpose of "targets"?