1.4 Databricks Repos, Secrets, and Development Tools

Key Takeaways

  • Git folders (formerly Databricks Repos) clone a Git repository into the workspace for version control and CI/CD.
  • Git folders support branching, commit/push/pull (including pulls that need conflict resolution), merge, and rebase against GitHub, GitLab, Azure DevOps, and Bitbucket.
  • Secret scopes store credentials securely; dbutils.secrets.get retrieves them and Databricks redacts secret values from output.
  • The Databricks CLI and REST API enable automation and CI/CD deployment of jobs, notebooks, and configuration.
  • Databricks Asset Bundles (DABs) package code, jobs, and pipeline definitions as code for reproducible deployments.
Last updated: June 2026

Git Folders for Version Control

Git folders (formerly Databricks Repos) let you clone a remote Git repository directly into the workspace so notebooks and code are version-controlled. Instead of editing untracked notebooks, you work inside a folder backed by Git and perform standard operations from the UI or API.

Supported providers include GitHub, GitLab, Azure DevOps, and Bitbucket. Each user authenticates with their own Git credentials, either a personal access token (PAT) or an app integration. Within a Git folder you can:

Operation | Purpose
Clone | Copy a remote repo into the workspace
Create branch / checkout | Isolate feature work from main
Commit & push | Save changes back to the remote
Pull | Bring in remote changes, resolving conflicts if needed
Merge / rebase | Integrate branches; resolve conflicts in the UI

This is the backbone of CI/CD: developers work on feature branches, open pull requests in the provider, and an automated pipeline deploys the merged code to staging and production workspaces. A typical flow is dev branch in a workspace Git folder → PR review in GitHub → CI deploys to prod via the CLI or Asset Bundles.
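One way a CI pipeline can reach a workspace is by refreshing a Git folder there to the latest commit of a branch. The sketch below builds, but does not send, a request to the Repos REST API endpoint PATCH /api/2.0/repos/{repo_id} with a branch field, which checks out that branch and pulls its latest commit. The host, token, and repo ID are hypothetical placeholders; a real pipeline would read the token from its own secret store.

```python
import json
import urllib.request

# Placeholder values (hypothetical): use your workspace URL, a token from your
# CI secret store, and the numeric ID of the Git folder to refresh.
HOST = "https://example.cloud.databricks.com"
TOKEN = "dapi-example-token"


def build_repo_update_request(host: str, token: str, repo_id: int, branch: str) -> urllib.request.Request:
    """Build, but do not send, PATCH /api/2.0/repos/{repo_id}.

    Setting "branch" checks out that branch in the Git folder and pulls its
    latest commit.
    """
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/api/2.0/repos/{repo_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_repo_update_request(HOST, TOKEN, repo_id=123, branch="main")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the call easy to inspect and test before it touches a real workspace.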

Secrets and Secret Scopes

Credentials (database passwords, API keys, storage tokens) must never be hard-coded in notebooks. Databricks provides secret scopes — named, access-controlled containers of key/value secrets. You create a scope, add secrets to it, then read them at runtime:

pwd = dbutils.secrets.get(scope="prod", key="db_password")

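Crucially, Databricks redacts secret values that appear in notebook and job output, displaying [REDACTED] in place of the value. Redaction guards against accidental exposure; it is not access control. It matches the literal secret string, so a transformed value (for example, one printed character by character) is not redacted, and any user with READ permission on the scope can still retrieve the secret in code. Scopes can be Databricks-backed or Azure Key Vault-backed, and access is controlled with scope-level ACLs (MANAGE / WRITE / READ).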
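To see why redaction is a safety net rather than a security boundary, here is a toy model of literal-match redaction. It is an illustration written for this guide, not Databricks' implementation: it replaces exact occurrences of known secret values in an output string, which is enough to show how a transformed value slips through.

```python
def redact(output: str, secret_values: list[str]) -> str:
    """Toy model: replace every literal occurrence of a known secret with [REDACTED].

    Illustrative only; this is not how Databricks implements redaction.
    """
    for value in secret_values:
        if value:  # an empty string would match everywhere, so skip it
            output = output.replace(value, "[REDACTED]")
    return output


secrets = ["s3cr3t-pw"]

# Printing the secret directly is caught.
print(redact("password=s3cr3t-pw", secrets))  # prints password=[REDACTED]

# Reversing the value breaks the literal match, so it is not redacted.
print(redact("password=" + "s3cr3t-pw"[::-1], secrets))  # prints password=wp-t3rc3s
```

The practical lesson matches the paragraph above: control who can read a scope with ACLs, and treat [REDACTED] as protection against accidents only.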

Automation: CLI, REST API, and Asset Bundles

Development on Databricks extends well beyond the notebook UI. Three tools enable programmatic control and reproducible deployment:

  • Databricks CLI: a command-line interface that wraps the REST API. It manages workspace objects, jobs, clusters, secrets, and Git folders, and is the workhorse of CI/CD scripts (for example, databricks jobs create or databricks bundle deploy).
  • REST API: programmatic endpoints for nearly every platform object (jobs, runs, clusters, repos, permissions). CI pipelines and external tools call it to create, trigger, and monitor work.
  • Databricks Asset Bundles (DABs): the recommended infrastructure-as-code approach. A bundle is a YAML project (databricks.yml) that declares notebooks, Lakeflow Jobs, pipelines, and their settings, with a separate target for each environment. databricks bundle deploy -t <target> provisions everything for one target at a time, so the same definitions promote cleanly through dev, staging, and prod.
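As a sketch of the "trigger and monitor" pattern, the helper below polls a job run until it finishes. It assumes the run was started with POST /api/2.1/jobs/run-now, which returns a run_id, and that the caller supplies get_run, a hypothetical function wrapping GET /api/2.1/jobs/runs/get and returning the parsed JSON body. Injecting get_run keeps the loop testable without a workspace.

```python
import time
from typing import Callable

# Life-cycle states after which a Jobs API run no longer changes.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_run(get_run: Callable[[int], dict], run_id: int,
                 poll_seconds: float = 30.0, timeout_seconds: float = 3600.0) -> str:
    """Poll until the run reaches a terminal life-cycle state, then return its
    result_state (for example SUCCESS or FAILED).

    get_run is hypothetical: it should call GET /api/2.1/jobs/runs/get?run_id=...
    and return the parsed JSON body.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_run(run_id)["state"]
        lifecycle = state["life_cycle_state"]
        if lifecycle in TERMINAL_STATES:
            # result_state can be missing; fall back to the life-cycle state.
            return state.get("result_state", lifecycle)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} is still {lifecycle}")
        time.sleep(poll_seconds)


# Offline usage with canned responses standing in for the REST API.
responses = iter([
    {"state": {"life_cycle_state": "PENDING"}},
    {"state": {"life_cycle_state": "RUNNING"}},
    {"state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"}},
])
print(wait_for_run(lambda _run_id: next(responses), run_id=42, poll_seconds=0))  # prints SUCCESS
```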

Other Development Tooling

  • Notebook-scoped libraries via %pip install let you install Python packages for a single notebook session without changing the cluster.
  • dbutils.notebook.run() and %run chain notebooks for modular pipelines (%run shares the same context; notebook.run() runs in a separate execution with parameters).
  • Workspace files allow non-notebook code — .py modules, config, small data — to live beside notebooks and be imported.
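The workspace-files bullet can be sketched locally. The snippet below recreates a hypothetical layout (a myproject_utils package with a cleaning.py module next to where notebooks would sit) in a temporary directory and imports the module. In a Git folder the repository root is typically already on sys.path, so the manual sys.path step stands in for what the workspace sets up; treat the layout and names as assumptions.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout standing in for a Git folder:
#   project/
#     (notebooks would live here)
#     myproject_utils/__init__.py
#     myproject_utils/cleaning.py   <- a plain .py workspace file
project = Path(tempfile.mkdtemp()) / "project"
package = project / "myproject_utils"
package.mkdir(parents=True)
(package / "__init__.py").write_text("")
(package / "cleaning.py").write_text(
    "def normalize_email(value: str) -> str:\n"
    "    return value.strip().lower()\n"
)

# In a Git folder the root is typically importable already; here we add it by hand.
sys.path.insert(0, str(project))
importlib.invalidate_caches()  # make sure the import system sees the new files
cleaning = importlib.import_module("myproject_utils.cleaning")
print(cleaning.normalize_email("  Alice@Example.COM "))  # prints alice@example.com
```

Inside a notebook the equivalent would simply be from myproject_utils.cleaning import normalize_email.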

Why This Matters for Exam Scenarios

The associate exam tests whether you can move from ad-hoc notebooks to production-grade, source-controlled, automated engineering. Recognize the right tool: Git folders for version control, secret scopes + dbutils.secrets for credentials, and the CLI / REST API / Asset Bundles for CI/CD deployment of jobs and pipelines.

A Concrete CI/CD Workflow

Tie the tools together with a realistic promotion path. A data engineer creates a feature branch inside a workspace Git folder and develops a Lakeflow pipeline notebook that calls dbutils.secrets.get to read credentials from that environment's secret scope. When the work is ready, the engineer commits and pushes, then opens a pull request in GitHub where a teammate reviews it. Merging to main triggers a CI pipeline that uses the Databricks CLI to run databricks bundle deploy, provisioning the bundle's jobs and pipelines into the staging target.

After validation, the same bundle deploys to the production target with production-scoped secrets. Nothing is hand-edited in production; the YAML bundle is the single source of truth, so every environment is reproducible.
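The CI step in that workflow can be sketched as a small Python script. It assumes the Databricks CLI is installed on the CI runner, is authenticated (for example through the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables), runs from the directory containing databricks.yml, and that the bundle defines staging and prod targets. The command runner is injectable so the logic can be checked without a workspace.

```python
import subprocess
from typing import Callable, List


def deploy_bundle(target: str,
                  run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> List[List[str]]:
    """Validate the bundle, then deploy it to one target (e.g. "staging" or "prod").

    With the default runner, check=True makes subprocess.run raise
    CalledProcessError if a CLI command fails, so an invalid bundle stops the
    pipeline before anything is deployed.
    """
    commands = [
        ["databricks", "bundle", "validate", "-t", target],
        ["databricks", "bundle", "deploy", "-t", target],
    ]
    for cmd in commands:
        run(cmd, check=True)
    return commands
```

A merge to main would call deploy_bundle("staging"); a later approval step would call deploy_bundle("prod") against the same bundle, which is what keeps the environments identical.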

Choosing Between %run and notebook.run

Modular code often spans notebooks, and the exam distinguishes the two chaining mechanisms:

Mechanism | Behavior
%run ./helpers | Inlines the other notebook into the same execution context, sharing variables and functions
dbutils.notebook.run(path, timeout_seconds, arguments) | Runs the target in a separate context, passes parameters as widget values, and returns the string the target passes to dbutils.notebook.exit()
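The dbutils.notebook.run() row implies a simple contract: parameters go in as strings and a single string comes back. A common way to return structured results is to JSON-encode them. The sketch below simulates both notebooks with plain functions; child_notebook, fake_notebook_run, and the ./ingest_child path are hypothetical, and the comments note which real dbutils call each piece stands in for.

```python
import json
from typing import Dict


def child_notebook(arguments: Dict[str, str]) -> str:
    """Hypothetical child notebook.

    Real code would read each parameter with dbutils.widgets.get("name") and
    finish with dbutils.notebook.exit(json.dumps(result)).
    """
    row_limit = int(arguments["row_limit"])  # widget values arrive as strings
    result = {"table": arguments["table"], "rows_processed": row_limit}
    return json.dumps(result)  # the single string the parent receives


def fake_notebook_run(path: str, timeout_seconds: int, arguments: Dict[str, str]) -> str:
    """Simulated dbutils.notebook.run(path, timeout_seconds, arguments).

    The timeout is ignored in this simulation.
    """
    notebooks = {"./ingest_child": child_notebook}  # hypothetical path
    return notebooks[path](arguments)


raw = fake_notebook_run("./ingest_child", 600, {"table": "bronze.orders", "row_limit": "1000"})
summary = json.loads(raw)  # the parent decodes the structured result
print(summary["rows_processed"])  # prints 1000
```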

Use %run to pull shared helper functions and variables into the current notebook; use dbutils.notebook.run() to orchestrate parameterized sub-tasks or build simple workflows in code. For production orchestration with dependencies, retries, and scheduling, prefer a Lakeflow Job over notebook-driven chaining. On the exam, a scenario about reusing shared functions points to %run, one about passing parameters or returning a value points to dbutils.notebook.run(), and one that mentions schedules, retries, or alerts points to a Lakeflow Job rather than any notebook-chaining mechanism at all.

Together these tools move a project from interactive exploration to governed, automated, source-controlled delivery, the maturity the associate exam expects you to demonstrate.

Git folders quick recap

  • Git folders (formerly Repos) connect a workspace to a remote repo (GitHub, GitLab, Azure DevOps, Bitbucket) and version-control notebooks and files.
  • Use feature branches and pull requests for review; CI/CD then deploys with Databricks Asset Bundles, not by editing production notebooks directly.
  • Notebook outputs are stripped on commit by default, keeping diffs clean.
Test Your Knowledge

A team wants their Databricks notebooks under version control with feature branches and pull requests in GitHub. Which Databricks feature do they use?

Test Your Knowledge

A notebook retrieves a database password with dbutils.secrets.get and then accidentally prints it. What does Databricks display?

Test Your Knowledge

Which Databricks capability lets you define jobs, pipelines, and notebooks as code in a YAML project and deploy them consistently across dev, staging, and prod?
