4.4 Testing and Deployment Best Practices
Key Takeaways
- Databricks recommends keeping reusable logic and its unit tests in Python modules (.py files), not inside notebooks, so they can be imported and tested with pytest.
- Unit tests use pytest (Python), testthat (R), or ScalaTest (Scala); run them locally with Databricks Connect or in-notebook with pytest.main().
- Do not run mutating tests against production data; a common pattern is testing against views or disposable fixtures rather than production tables.
- Promote code through dev, staging, and prod environments, deploying with Databricks Asset Bundles and a CI system that runs validate plus tests on every pull request.
- Separate dev/staging/prod into different catalogs or workspaces with their own data and service principals so tests never touch production assets.
Modularize Code Out of Notebooks
The single most important testing practice on Databricks is moving reusable logic out of notebooks and into Python modules (.py files). A notebook is great for exploration but awkward to unit test; a function in a module can be imported, called with controlled inputs, and asserted against expected outputs.
The recommended structure:
- Put transformation functions in src/ modules.
- Put their tests in tests/ files named test_*.py or *_test.py (the patterns pytest auto-discovers).
- Keep notebooks thin: they orchestrate by importing the tested functions.
This separation is what makes pytest usable: pytest's fixtures, readability, and auto-discovery let you assert on small, pure functions instead of an entire notebook. For R use testthat; for Scala use ScalaTest.
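A minimal sketch of this layout, assuming a hypothetical module src/transforms.py and test file tests/test_transforms.py (both names, and the function itself, are illustrative). The function works on plain Python dictionaries so the example runs without Spark; the same shape would apply to a function that takes and returns a DataFrame.

```python
# src/transforms.py (hypothetical module name)
from typing import Iterable


def latest_per_key(rows: Iterable[dict], key: str, ts: str) -> list[dict]:
    """Keep only the most recent row for each key, ordered by key."""
    latest: dict = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[ts] > current[ts]:
            latest[row[key]] = row
    return [latest[k] for k in sorted(latest)]


# tests/test_transforms.py (hypothetical; in a real repo this file would
# begin with `from transforms import latest_per_key`)
def test_latest_per_key_keeps_newest_row():
    rows = [
        {"id": 1, "updated": "2024-01-01", "status": "new"},
        {"id": 1, "updated": "2024-02-01", "status": "shipped"},
        {"id": 2, "updated": "2024-01-15", "status": "new"},
    ]
    result = latest_per_key(rows, key="id", ts="updated")
    assert [r["status"] for r in result] == ["shipped", "new"]


def test_latest_per_key_handles_empty_input():
    assert latest_per_key([], key="id", ts="updated") == []
```

Because the function has no notebook or cluster dependency, pytest can discover and run both tests from the file name alone.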
Running Unit and Integration Tests
There are three common places to run tests:
| Approach | How | When to use |
|---|---|---|
| Local + Databricks Connect | Run pytest on your laptop; Spark executes on a remote cluster | Fast inner loop, IDE debugging |
| In-notebook | Call pytest.main() in a notebook | Need real Spark/runtime config fidelity |
| CI runner | GitHub Actions runs pytest on every PR | Gate merges automatically |
Databricks Connect lets your local IDE send Spark work to a Databricks cluster, so you write and debug PySpark with full editor support while real Spark runs the job. Running pytest.main() inside a notebook gives the highest fidelity because the test sees the exact Spark session and runtime variables of the cluster.
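The in-notebook approach might look like the sketch below. The helper names build_pytest_args and run_tests_in_notebook are hypothetical; only pytest.main() and the two settings shown come from pytest and Python themselves. The sketch assumes pytest is installed on the cluster and that the tests live in or under the notebook's working directory.

```python
import sys


def build_pytest_args(test_dir: str = ".", verbose: bool = True) -> list[str]:
    """Assemble the argument list passed to pytest.main()."""
    # Disable pytest's cache plugin so it does not try to write .pytest_cache
    # into a workspace file system that may be read-only.
    args = [test_dir, "-p", "no:cacheprovider"]
    if verbose:
        args.append("-v")
    return args


def run_tests_in_notebook(test_dir: str = ".") -> None:
    """Run pytest in-process and fail the notebook cell if any test fails."""
    import pytest  # imported lazily so build_pytest_args works without pytest

    # Avoid writing .pyc files next to the test modules.
    sys.dont_write_bytecode = True
    exit_code = pytest.main(build_pytest_args(test_dir))
    assert exit_code == 0, f"pytest exited with code {exit_code}"
```

Calling run_tests_in_notebook() as the last cell of a test notebook makes a failing test fail the cell, which in turn fails any job that runs the notebook.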
Critical rule: do not run unit tests against production data, above all tests of functions that add, remove, or change data. A common safeguard is to test against views or disposable fixture tables instead of production tables.
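One way to enforce this rule in test code is a small guard that refuses protected targets. This is a sketch, not a Databricks feature: the function name assert_safe_test_target is hypothetical, and it assumes production data lives in a catalog named prod and that tables are addressed as catalog.schema.table.

```python
# Assumption: production data lives in a Unity Catalog catalog named "prod".
PROTECTED_CATALOGS = frozenset({"prod"})


def assert_safe_test_target(table_name: str) -> str:
    """Return table_name unchanged, or raise if it points at a protected catalog."""
    parts = table_name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.table, got {table_name!r}")
    if parts[0].lower() in PROTECTED_CATALOGS:
        raise PermissionError(f"refusing to run a test against {table_name!r}")
    return table_name
```

A test fixture could pass every table name through this guard before reading or writing, so a misconfigured target fails fast instead of touching production. Permissions on the service principal remain the real protection; the guard only adds an early, readable error.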
Dev / Staging / Prod Promotion
Production-grade teams maintain three isolated environments and promote code through them:
- Development: engineers build and run unit tests; deployed with the Databricks Asset Bundles (DABs) setting mode: development, which prefixes resources and pauses schedules.
- Staging: a production-like environment for integration tests and data validation on representative data, deployed by CI under a service principal.
- Production: stable, scheduled, deployed with mode: production under a service principal.
Isolation should be real: separate Unity Catalog catalogs (dev, staging, prod) or separate workspaces, each with its own data and its own service principal, so a test can never touch prod. Variables in the bundle (catalog name, warehouse ID) switch automatically per target.
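The idea of per-target variables can be illustrated in plain Python. In a real project this mapping would live in the bundle configuration rather than in code; the TargetConfig class, the TARGETS table, and the service principal names below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetConfig:
    catalog: str
    run_as: str  # hypothetical service principal name


# Assumption: one catalog and one service principal per environment.
TARGETS = {
    "dev": TargetConfig(catalog="dev", run_as="sp-dev"),
    "staging": TargetConfig(catalog="staging", run_as="sp-staging"),
    "prod": TargetConfig(catalog="prod", run_as="sp-prod"),
}


def qualified_table(target: str, schema: str, table: str) -> str:
    """Build catalog.schema.table for the given deployment target."""
    if target not in TARGETS:
        raise ValueError(f"unknown target {target!r}")
    return f"{TARGETS[target].catalog}.{schema}.{table}"
```

Pipeline code that asks for qualified_table(target, "sales", "orders") never hard-codes a catalog, so the same code reads dev data in development and prod data only when deployed to the prod target.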
CI/CD Glue
The automation that ties testing to deployment is a CI/CD pipeline (commonly GitHub Actions). A robust flow:
- On pull request: run databricks bundle validate and pytest. A red test blocks the merge.
- On merge to release branch: databricks bundle deploy -t staging, then run integration tests against staging.
- On promotion: databricks bundle deploy -t prod.
Because the Databricks Asset Bundle is the single source of truth, the job that runs in prod is exactly the one that was reviewed and tested — no manual UI drift. Pin the CLI version in the pipeline for reproducibility and store credentials as CI secrets bound to a service principal. The payoff: every change is version-controlled, peer-reviewed, automatically tested, and deployed identically across environments.
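The gating logic of such a pipeline can be sketched as a small Python driver. This is an illustration only: a real setup would normally express these steps in the CI system's own workflow file. The run_steps helper and the PR_STEPS and RELEASE_STEPS names are hypothetical; the commands themselves are the ones listed above.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Commands taken from the flow described above.
PR_STEPS = [
    ["databricks", "bundle", "validate"],
    ["pytest"],
]
RELEASE_STEPS = [
    ["databricks", "bundle", "deploy", "-t", "staging"],
]


def run_steps(
    steps: Sequence[Sequence[str]],
    run: Optional[Callable[[Sequence[str]], int]] = None,
) -> int:
    """Run each command in order and stop at the first non-zero exit code."""
    if run is None:
        # Default runner: execute the command and return its exit code.
        run = lambda cmd: subprocess.run(list(cmd)).returncode
    for cmd in steps:
        code = run(cmd)
        if code != 0:
            print(f"step failed ({code}): {' '.join(cmd)}")
            return code
    return 0
```

Returning the first failing exit code is what lets the CI system mark the pull request red and block the merge. The run parameter exists so the driver can be exercised with a fake runner, without the Databricks CLI installed.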
The Testing Pyramid for Data Pipelines
Not all tests are equal; layer them like a pyramid — many fast unit tests, fewer slower integration tests, a handful of end-to-end checks:
| Test type | Scope | Where it runs |
|---|---|---|
| Unit | One pure function on tiny in-memory data | Local pytest / Databricks Connect |
| Integration | A task reading/writing real Delta tables | Staging environment |
| Data-quality | Expectations on actual output (row counts, nulls, ranges) | Staging / prod via pipeline expectations |
| End-to-end | The whole job DAG on representative data | Staging before promotion |
Unit tests catch logic bugs in seconds; integration tests catch wiring problems (schema mismatch, missing permission); data-quality tests catch bad data that is technically valid code. Lakeflow Declarative Pipelines expectations (@dlt.expect) act as built-in, continuously evaluated data-quality tests in staging and prod.
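The kinds of checks in the data-quality row (row counts, nulls, ranges) can be sketched in plain Python. This is a stand-in for illustration, not the pipeline expectations API: the check_quality function and its parameters are hypothetical, and it operates on a list of dictionaries rather than a Delta table.

```python
def check_quality(rows, required, ranges, min_rows=1):
    """Return a list of violations; an empty list means the data passed.

    rows:     list of dicts, one per record
    required: column names that must not be None
    ranges:   {column: (low, high)} inclusive bounds for non-null values
    """
    problems = []
    if len(rows) < min_rows:
        problems.append(f"row count {len(rows)} is below minimum {min_rows}")
    for i, row in enumerate(rows):
        for col in required:
            if row.get(col) is None:
                problems.append(f"row {i}: {col} is null")
        for col, (low, high) in ranges.items():
            value = row.get(col)
            if value is not None and not (low <= value <= high):
                problems.append(f"row {i}: {col}={value} outside [{low}, {high}]")
    return problems
```

Note that every row here is valid input to the code; the checks flag data that is wrong for the business, which is exactly the class of defect unit and integration tests miss.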
Common Pitfalls the Exam Probes
- Embedding all logic in a notebook so nothing is unit-testable.
- Pointing tests at production tables (mutating tests can corrupt data).
- Deploying by hand from a laptop or editing jobs in the UI instead of deploying through a CI pipeline, which lets production drift from the reviewed code.
- Sharing one identity across environments instead of per-environment service principals.
The fix for every pitfall is the same discipline: modular code, isolated environments, automated tests, and bundle-driven deployment.
Exam pointers
Run tests in-notebook with pytest.main() or locally through Databricks Connect; never run mutating tests against production tables (use views or fixtures); and promote through dev, staging, and prod with separate catalogs and service principals, deploying via Databricks Asset Bundles from a CI pipeline that gates merges on passing tests. Layer the tests as a pyramid (many fast unit tests, fewer integration tests, and a small set of end-to-end and data-quality checks) so most defects are caught cheaply in seconds before any code reaches staging or production.
Testing quick recap
- Move logic into importable .py modules and test with pytest (or testthat/ScalaTest); run via Databricks Connect or in-notebook.
- Never mutate production data in tests: use fixtures, test views, or a separate catalog.
- Isolate dev/staging/prod with separate catalogs and service principals, and promote with Asset Bundles + CI/CD.
Databricks recommends which structure for testable data-engineering code?
Why should unit tests avoid running directly against production tables?
Which tool lets you write and debug PySpark in a local IDE while Spark actually executes on a remote Databricks cluster?
Which environment-isolation practice best prevents a test from ever touching production data?