4.2 Databricks Asset Bundles (Declarative Automation Bundles)
Key Takeaways
- Databricks Asset Bundles (DAB), now called Declarative Automation Bundles, package project source code, configuration, and resource definitions for CI/CD deployment.
- A bundle is defined by a databricks.yml file at the project root that specifies resources (jobs, pipelines), targets (environments), and configuration.
- The Databricks CLI commands (bundle validate, bundle deploy, bundle run) manage the lifecycle of bundles.
- Targets define environment-specific configurations (dev, staging, prod) within the same bundle definition.
- Bundles replace manual UI-based deployment with version-controlled infrastructure as code.
Databricks Asset Bundles (Declarative Automation Bundles)
Quick Answer: Databricks Asset Bundles (now Declarative Automation Bundles) are infrastructure-as-code definitions for Databricks projects. A databricks.yml file defines jobs, pipelines, and targets. The CLI validates, deploys, and runs bundles across dev, staging, and production environments.
What Are Bundles?
Bundles bring software engineering best practices to Databricks projects:
| Practice | How Bundles Enable It |
|---|---|
| Source control | Bundle YAML and source code stored in Git |
| Code review | Pull request review for infrastructure changes |
| CI/CD | Automated deployment via CLI in CI/CD pipelines |
| Environment management | Targets define dev/staging/prod configurations |
| Reproducibility | Same bundle definition deploys consistently |
| Testing | Validate bundle before deployment |
Bundle Structure
my-project/
├── databricks.yml # Main bundle configuration
├── resources/
│ ├── jobs.yml # Job definitions
│ └── pipelines.yml # Pipeline definitions
├── src/
│ ├── notebooks/
│ │ ├── ingest.py
│ │ ├── transform.sql
│ │ └── aggregate.py
│ └── libraries/
│ └── helpers.py
└── tests/
└── test_transform.py
databricks.yml Configuration
bundle:
name: sales-pipeline
# Include additional configuration files
include:
- resources/*.yml
# Target environments
targets:
dev:
mode: development
default: true
workspace:
host: https://dev-workspace.cloud.databricks.com
staging:
workspace:
host: https://staging-workspace.cloud.databricks.com
prod:
mode: production
workspace:
host: https://prod-workspace.cloud.databricks.com
run_as:
service_principal_name: prod-service-principal
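Targets can also override individual resource settings per environment. A minimal sketch, assuming the daily_etl job defined in resources/jobs.yml; here the dev target keeps the schedule paused so the job only runs when triggered manually:

```yaml
targets:
  dev:
    mode: development
    resources:
      jobs:
        daily_etl:
          schedule:
            pause_status: PAUSED  # don't run on schedule in dev
```

Target-level settings are merged over the top-level resource definition, so only the fields that differ need to be repeated.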
Resource Configuration (resources/jobs.yml)
resources:
jobs:
daily_etl:
name: "Daily ETL Pipeline"
schedule:
quartz_cron_expression: "0 0 2 * * ?"
timezone_id: "UTC"
tasks:
- task_key: ingest
notebook_task:
notebook_path: src/notebooks/ingest.py
new_cluster:
spark_version: "15.4.x-scala2.12"
node_type_id: "i3.xlarge"
num_workers: 2
- task_key: transform
depends_on:
- task_key: ingest
notebook_task:
notebook_path: src/notebooks/transform.sql
- task_key: aggregate
depends_on:
- task_key: transform
notebook_task:
notebook_path: src/notebooks/aggregate.py
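Bundles also support variables, which avoid hard-coding values such as node types that differ between environments. A sketch, assuming a variable named node_type (the name is illustrative, not part of the example above):

```yaml
variables:
  node_type:
    description: Cluster node type for job clusters
    default: i3.xlarge

resources:
  jobs:
    daily_etl:
      tasks:
        - task_key: ingest
          new_cluster:
            node_type_id: ${var.node_type}  # substituted at deploy time
```

A target can then override the default, e.g. by setting variables with node_type: m5.2xlarge under the prod target.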
CLI Commands
| Command | Purpose |
|---|---|
| databricks bundle init | Create a new bundle from a template |
| databricks bundle validate | Check bundle syntax and references |
| databricks bundle deploy | Deploy bundle resources to a workspace |
| databricks bundle run | Run a specific job or pipeline from the bundle |
| databricks bundle destroy | Remove all resources created by the bundle |
Workflow
# 1. Initialize a new project
databricks bundle init default-python
# 2. Validate the configuration
databricks bundle validate -t dev
# 3. Deploy to development
databricks bundle deploy -t dev
# 4. Run the job
databricks bundle run daily_etl -t dev
# 5. Deploy to production (after testing)
databricks bundle deploy -t prod
Targets (Environment Management)
Targets allow the same bundle to be deployed to different environments:
| Target Property | Purpose | Example |
|---|---|---|
| mode | development or production | mode: production |
| workspace.host | Target workspace URL | https://prod.cloud.databricks.com |
| run_as | Service principal for production | run_as: {service_principal_name: sp-prod} |
| default | Default target for CLI commands | default: true |
Development vs. Production Mode
| Feature | Development Mode | Production Mode |
|---|---|---|
| Resource naming | Prefixed with [dev username] | Exact name as defined |
| Permissions | Current user only | Configured via run_as |
| Locking | No deployment lock | Deployment lock to prevent conflicts |
| Cluster policy | Can use any cluster | Should use job clusters |
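Development-mode behaviors such as name prefixing and schedule pausing can be tuned with presets under a target. A sketch (the preset values shown are illustrative):

```yaml
targets:
  dev:
    mode: development
    presets:
      name_prefix: "[dev ${workspace.current_user.short_name}] "  # prefix on deployed resource names
      trigger_pause_status: PAUSED  # deploy schedules in a paused state
```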
CI/CD Integration
# Example: GitHub Actions CI/CD pipeline
name: Deploy Databricks Bundle
on:
push:
branches: [main]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: databricks/setup-cli@main
- run: databricks bundle validate -t prod
- run: databricks bundle deploy -t prod
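In CI, the CLI needs credentials for the target workspace. One common approach, assuming the host and a token are stored as repository secrets (the secret names here are illustrative), is to pass them as environment variables on the deploy step:

```yaml
      - run: databricks bundle deploy -t prod
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
```

For service principals, OAuth client credentials (DATABRICKS_CLIENT_ID / DATABRICKS_CLIENT_SECRET) are generally preferred over long-lived tokens.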
On the Exam: Know the basic bundle structure (databricks.yml, resources, targets), the three main CLI commands (validate, deploy, run), and the difference between development and production modes. You do not need to memorize YAML syntax.
Review Questions
- Which file must exist at the root of every Databricks Asset Bundle project?
- What does the "databricks bundle validate" command do?
- In a Databricks Asset Bundle, what is the purpose of "targets"?