1.4 Databricks Repos, Secrets, and Development Tools
Key Takeaways
- Git folders (formerly Databricks Repos) clone a Git repository into the workspace for version control and CI/CD.
- Git folders support branching, commit/push/pull, merge, rebase, and pull-with-conflict-resolution against GitHub, GitLab, Azure DevOps, and Bitbucket.
- Secret scopes store credentials securely; dbutils.secrets.get retrieves them and Databricks redacts secret values from output.
- The Databricks CLI and REST API enable automation and CI/CD deployment of jobs, notebooks, and configuration.
- Databricks Asset Bundles (DABs) package code, jobs, and pipeline definitions as code for reproducible deployments.
Git Folders for Version Control
Git folders (formerly Databricks Repos) let you clone a remote Git repository directly into the workspace so notebooks and code are version-controlled. Instead of editing untracked notebooks, you work inside a folder backed by Git and perform standard operations from the UI or API.
Supported providers include GitHub, GitLab, Azure DevOps, and Bitbucket, authenticated with a personal access token (PAT) or app integration stored per user. Within a Git folder you can:
| Operation | Purpose |
|---|---|
| Clone | Pull a remote repo into the workspace |
| Create branch / checkout | Isolate feature work from main |
| Commit & push | Save changes back to the remote |
| Pull | Bring in remote changes, resolving conflicts when they occur |
| Merge / rebase | Integrate branches; resolve conflicts in-UI |
This is the backbone of CI/CD: developers work on feature branches, open pull requests in the provider, and an automated pipeline deploys the merged code to staging and production workspaces. A typical flow is dev branch in a workspace Git folder → PR review in GitHub → CI deploys to prod via the CLI or Asset Bundles.
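The clone and checkout operations in the table can also be driven from a pipeline through the Repos REST API. The following is a minimal sketch that only builds the JSON bodies for `POST /api/2.0/repos` and `PATCH /api/2.0/repos/{repo_id}`. The repository URL, workspace path, and branch name are hypothetical placeholders, and nothing is sent to a workspace.

```python
import json


def create_git_folder_payload(repo_url: str, provider: str, workspace_path: str) -> dict:
    """Body for POST /api/2.0/repos, which clones a remote repo into the workspace."""
    return {"url": repo_url, "provider": provider, "path": workspace_path}


def checkout_branch_payload(branch: str) -> dict:
    """Body for PATCH /api/2.0/repos/{repo_id}, which switches the folder to a branch."""
    return {"branch": branch}


# Hypothetical values, for illustration only.
clone_body = create_git_folder_payload(
    "https://github.com/example-org/etl-pipelines.git",
    "gitHub",
    "/Repos/someone@example.com/etl-pipelines",
)
checkout_body = checkout_branch_payload("feature/add-orders-table")

print(json.dumps(clone_body, indent=2))
print(json.dumps(checkout_body))
```

A CI job would typically send the second body to point a staging Git folder at the branch or tag it wants to test.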
Secrets and Secret Scopes
Credentials (database passwords, API keys, storage tokens) must never be hard-coded in notebooks. Databricks provides secret scopes — named, access-controlled containers of key/value secrets. You create a scope, add secrets to it, then read them at runtime:
pwd = dbutils.secrets.get(scope="prod", key="db_password")
Crucially, Databricks redacts any secret value if you try to print it, displaying [REDACTED] so credentials never leak into notebook output, logs, or job results. Scopes can be Databricks-backed or Azure Key Vault-backed, and access is controlled with scope-level ACLs (MANAGE / WRITE / READ).
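As an illustration of keeping credentials out of notebook code, the sketch below assembles JDBC connection options from a secret scope. The key name `db_user`, the host, and the database name are assumptions made for the example. `FakeSecrets` is a local stand-in so the code runs outside Databricks; in a notebook you would pass `dbutils.secrets` instead.

```python
def build_jdbc_options(secrets, scope: str, host: str) -> dict:
    """Read credentials from a secret scope and return JDBC reader options."""
    user = secrets.get(scope=scope, key="db_user")          # hypothetical key name
    password = secrets.get(scope=scope, key="db_password")  # key used in the text
    return {
        "url": f"jdbc:postgresql://{host}:5432/sales",  # hypothetical host and database
        "user": user,
        "password": password,
    }


class FakeSecrets:
    """Local stand-in for dbutils.secrets so the sketch runs anywhere."""

    def __init__(self, store: dict):
        self._store = store

    def get(self, scope: str, key: str) -> str:
        return self._store[(scope, key)]


fake = FakeSecrets({
    ("prod", "db_user"): "svc_etl",
    ("prod", "db_password"): "not-a-real-password",
})
options = build_jdbc_options(fake, "prod", "db.example.internal")

# Print only the non-sensitive fields.
print(options["url"], options["user"])
```

Passing the secrets object as a parameter keeps the function testable and makes the scope name the only thing that changes between environments.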
Automation: CLI, REST API, and Asset Bundles
Development on Databricks extends well beyond the notebook UI. Three tools enable programmatic control and reproducible deployment:
- Databricks CLI: a command-line interface that wraps the REST API. It manages workspaces, jobs, clusters, secrets, and Git folders, and is the workhorse of CI/CD scripts (for example, `databricks jobs create` or `databricks bundle deploy`).
- REST API: programmatic endpoints for every platform object (jobs, runs, clusters, repos, permissions). Pipelines and external tools call it to create, trigger, and monitor work.
- Databricks Asset Bundles (DABs): the recommended infrastructure-as-code approach. A bundle is a YAML project (`databricks.yml`) that declares notebooks, Lakeflow Jobs, pipelines, and their settings. `databricks bundle deploy` provisions everything consistently across dev, staging, and prod targets, so the same definitions promote cleanly through environments.
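To make the REST API item concrete, here is a minimal sketch of how an external tool might prepare a request that triggers an existing job through the Jobs API `run-now` endpoint. The workspace URL, token, and job ID are hypothetical placeholders, and the request is built but deliberately not sent.

```python
import json
import urllib.request


def build_run_now_request(host: str, token: str, job_id: int) -> urllib.request.Request:
    """Prepare a POST to the Jobs API run-now endpoint for an existing job."""
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.1/jobs/run-now",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical workspace URL, token, and job ID.
req = build_run_now_request("https://example.cloud.databricks.com/", "dapi-EXAMPLE", 123)
print(req.get_method(), req.full_url)

# Sending is left out on purpose: urllib.request.urlopen(req) would trigger the run.
```

In a real pipeline the token would come from the CI system's secret store rather than from source code.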
Other Development Tooling
- Notebook-scoped libraries via `%pip install` let you install Python packages for a single notebook session without changing the cluster.
- `dbutils.notebook.run()` and `%run` chain notebooks for modular pipelines (`%run` shares the same context; `notebook.run()` runs in a separate execution with parameters).
- Workspace files allow non-notebook code (`.py` modules, config, small data) to live beside notebooks and be imported.
Why This Matters for Exam Scenarios
The associate exam tests whether you can move from ad-hoc notebooks to production-grade, source-controlled, automated engineering. Recognize the right tool: Git folders for version control, secret scopes + dbutils.secrets for credentials, and the CLI / REST API / Asset Bundles for CI/CD deployment of jobs and pipelines.
A Concrete CI/CD Workflow
Tie the tools together with a realistic promotion path. A data engineer creates a feature branch inside a workspace Git folder and develops a Lakeflow pipeline notebook that reads its credentials with `dbutils.secrets.get` from a secret scope dedicated to that environment. When the work is ready, the engineer commits and pushes, then opens a pull request in GitHub where a teammate reviews it. Merging to main triggers a CI pipeline that runs the Databricks CLI command `databricks bundle deploy` to deploy a Databricks Asset Bundle, provisioning the jobs and pipelines into the staging target.
After validation, the same bundle deploys to the production target with production-scoped secrets. Nothing is hand-edited in production; the YAML bundle is the single source of truth, so every environment is reproducible.
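The promotion step can be sketched as a small CI helper that assembles the CLI commands for each target. This is an illustrative outline only: it assumes the bundle defines targets named `staging` and `prod`, and it prints the commands instead of executing them.

```python
import shlex


def bundle_commands(target: str) -> list[list[str]]:
    """Return the CLI invocations that validate and then deploy one bundle target."""
    return [
        ["databricks", "bundle", "validate", "-t", target],
        ["databricks", "bundle", "deploy", "-t", target],
    ]


# Target names are assumptions; they must match the targets declared in databricks.yml.
for target in ("staging", "prod"):
    for cmd in bundle_commands(target):
        print(shlex.join(cmd))
```

In an actual pipeline each command list would be passed to `subprocess.run(cmd, check=True)` so that a failed validation stops the deployment.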
Choosing Between %run and notebook.run
Modular code often spans notebooks, and the exam distinguishes the two chaining mechanisms:
| Mechanism | Behavior |
|---|---|
| `%run ./helpers` | Inlines the other notebook into the same execution context, sharing variables and functions |
| `dbutils.notebook.run(path, timeout, args)` | Runs the target in a separate context, accepts parameters, and returns a value |
Use `%run` to share helper functions and variables with the calling notebook, and `dbutils.notebook.run()` to orchestrate parameterized sub-tasks or build simple workflows in code. For production orchestration with dependencies, retries, and scheduling, prefer a **Lakeflow Job** over notebook-driven chaining. On the exam, a scenario that mentions shared variables or functions points to `%run`, one that mentions parameters or a return value points to `dbutils.notebook.run()`, and one that mentions schedules, retries, or alerts points to a Lakeflow Job rather than any notebook-chaining mechanism at all. Together these tools move a project from interactive exploration to governed, automated, source-controlled delivery, which is the maturity the associate exam expects you to demonstrate.
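The parameter-passing and return-value behavior of `dbutils.notebook.run` can be sketched as follows. The example assumes the child notebook ends with `dbutils.notebook.exit(...)` and returns a JSON string, which is a convention chosen here and not a platform requirement. `FakeNotebookApi`, the notebook path, and the parameter names are hypothetical; in a notebook you would pass `dbutils.notebook` instead of the stand-in.

```python
import json


def run_child(notebook_api, path: str, params: dict, timeout_seconds: int = 600) -> dict:
    """Run a child notebook with parameters and parse its JSON exit value."""
    raw = notebook_api.run(path, timeout_seconds, params)  # returns a string
    return json.loads(raw)


class FakeNotebookApi:
    """Local stand-in for dbutils.notebook so the sketch runs anywhere."""

    def run(self, path: str, timeout_seconds: int, arguments: dict) -> str:
        # A real child notebook would read its parameters from widgets and
        # finish with dbutils.notebook.exit(json.dumps(...)).
        return json.dumps({"path": path, "run_date": arguments["run_date"]})


result = run_child(FakeNotebookApi(), "./load_orders", {"run_date": "2024-01-01"})
print(result)
```

Because the child runs in a separate context, the returned string is the only channel back to the caller, unlike `%run`, where the caller sees the child's variables directly.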
Git folders quick recap
- Git folders (formerly Repos) connect a workspace to a remote repo (GitHub, GitLab, Azure DevOps, Bitbucket) and version-control notebooks and files.
- Use feature branches and pull requests for review; CI/CD then deploys with Databricks Asset Bundles, not by editing production notebooks directly.
- Notebook outputs are stripped on commit by default, keeping diffs clean.
A team wants their Databricks notebooks under version control with feature branches and pull requests in GitHub. Which Databricks feature do they use?
A notebook retrieves a database password with dbutils.secrets.get and then accidentally prints it. What does Databricks display?
Which Databricks capability lets you define jobs, pipelines, and notebooks as code in a YAML project and deploy them consistently across dev, staging, and prod?