4.4 Testing and Deployment Best Practices
Key Takeaways
- Development, staging, and production environments should be isolated using separate catalogs or schemas in Unity Catalog.
- Lakeflow Declarative Pipelines in development mode create tables prefixed with the developer name to avoid conflicts.
- Integration tests validate end-to-end pipeline behavior using small sample datasets in a staging environment.
- Promotion from dev to prod should follow a code review, CI/CD pipeline, and validation workflow.
- Parameterize notebooks and jobs with environment-specific values (catalog names, storage paths) rather than hardcoding.
Last updated: March 2026
Quick Answer: Use separate Unity Catalog catalogs for dev/staging/prod environments. Parameterize code with environment variables. Deploy via Databricks Asset Bundles with CI/CD. Test with integration tests on sample data in staging before promoting to production.
Environment Isolation
Unity Catalog-Based Isolation
Production: prod_catalog.schema.table
Staging: staging_catalog.schema.table
Development: dev_catalog.schema.table
| Environment | Catalog | Purpose | Access |
|---|---|---|---|
| Development | dev_catalog | Individual developer exploration and testing | Developers |
| Staging | staging_catalog | Integration testing with production-like data | CI/CD pipeline |
| Production | prod_catalog | Live data serving BI, ML, and applications | Production service principals |
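This environment-to-catalog mapping can be encoded directly in a Databricks Asset Bundle so that each target deploys against the right catalog. The sketch below assumes a bundle named `my_project` and a bundle variable named `catalog`; both names are illustrative:

```yaml
# databricks.yml (sketch) -- bundle name and the "catalog" variable are illustrative
bundle:
  name: my_project

variables:
  catalog:
    description: Target catalog for this environment

targets:
  dev:
    mode: development
    variables:
      catalog: dev_catalog
  staging:
    variables:
      catalog: staging_catalog
  prod:
    mode: production
    variables:
      catalog: prod_catalog
```

Jobs and pipelines in the bundle then reference `${var.catalog}` instead of a hardcoded catalog name.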
Parameterization
```python
# Use widgets for environment-specific configuration
dbutils.widgets.text("catalog", "dev_catalog", "Target Catalog")
catalog = dbutils.widgets.get("catalog")

# All table references use the parameterized catalog
spark.sql(f"USE CATALOG {catalog}")
orders_df = spark.sql(f"SELECT * FROM {catalog}.silver.orders")
```
```sql
-- SQL with a parameter
USE CATALOG ${catalog};
SELECT * FROM silver.orders;
```
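A common way to keep code environment-agnostic is to centralize name construction in one small helper so no notebook hardcodes a catalog. `qualified_name` below is a hypothetical utility, not a Databricks API:

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a fully qualified Unity Catalog name from environment parameters."""
    return f"{catalog}.{schema}.{table}"

# The catalog value comes from a widget or job parameter, never a literal:
orders_table = qualified_name("dev_catalog", "silver", "orders")
# -> "dev_catalog.silver.orders"
```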
Testing Strategies
Unit Tests
```python
# Test individual transformation functions
def test_classify_amount():
    assert classify_amount(1500) == "high"
    assert classify_amount(500) == "medium"
    assert classify_amount(50) == "low"

# Run with pytest:
#   pytest tests/test_transforms.py
```
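The function under test is not shown above; a minimal implementation consistent with the asserted values might look like the following, where the 100 and 1000 cut-offs are assumptions chosen to satisfy the test cases:

```python
def classify_amount(amount: float) -> str:
    """Bucket an order amount into a tier (thresholds are illustrative)."""
    if amount >= 1000:
        return "high"
    if amount >= 100:
        return "medium"
    return "low"
```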
Integration Tests
```python
# Test the end-to-end pipeline with a small sample dataset
def test_pipeline_end_to_end():
    # 1. Create sample input data
    sample_data = spark.createDataFrame([
        (1, 100, "2026-01-01"),
        (2, 200, "2026-01-02"),
    ], ["order_id", "amount", "order_date"])

    # 2. Write to the staging bronze table
    sample_data.write.mode("overwrite").saveAsTable(
        "staging_catalog.bronze.test_orders"
    )

    # 3. Run the pipeline (or the transformation notebook)
    dbutils.notebook.run("./transform", 300)  # 300-second timeout

    # 4. Verify the output
    result = spark.table("staging_catalog.silver.test_orders")
    assert result.count() == 2
    assert result.filter("amount <= 0").count() == 0
```
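The verification in step 4 can be factored into a plain function over collected rows, so the check logic itself is unit-testable without a cluster. `validate_orders` is a hypothetical helper, and the Spark usage is shown only as a comment:

```python
def validate_orders(rows, expected_count):
    """Check row count and that every amount is positive.

    `rows` is any iterable of dicts with an "amount" key, e.g. produced by
    [r.asDict() for r in df.collect()] on a small test DataFrame.
    """
    rows = list(rows)
    assert len(rows) == expected_count, f"expected {expected_count} rows, got {len(rows)}"
    bad = [r for r in rows if r["amount"] <= 0]
    assert not bad, f"non-positive amounts: {bad}"

# Usage in the integration test (sketch):
# validate_orders((r.asDict() for r in result.collect()), expected_count=2)
```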
Data Quality Tests
```sql
-- Validate data quality after a pipeline run.
-- These can be SQL tasks in a Databricks job.

-- Test: No null order IDs
SELECT COUNT(*) AS null_count
FROM prod_catalog.silver.orders
WHERE order_id IS NULL;
-- Expected: 0

-- Test: All amounts are positive
SELECT COUNT(*) AS negative_count
FROM prod_catalog.silver.orders
WHERE amount <= 0;
-- Expected: 0

-- Test: No duplicate orders
SELECT order_id, COUNT(*) AS cnt
FROM prod_catalog.silver.orders
GROUP BY order_id
HAVING COUNT(*) > 1;
-- Expected: 0 rows
```
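Count-style checks like these can also be driven from a Python task as data plus a small evaluator, which keeps the pass/fail logic testable on its own. The check names and the `failed_checks` helper below are illustrative, and the Spark execution appears only as a comment:

```python
# Each check is (name, SQL returning a single violation count).
QUALITY_CHECKS = [
    ("no_null_order_ids",
     "SELECT COUNT(*) FROM prod_catalog.silver.orders WHERE order_id IS NULL"),
    ("no_nonpositive_amounts",
     "SELECT COUNT(*) FROM prod_catalog.silver.orders WHERE amount <= 0"),
]

def failed_checks(results):
    """Return the names of checks whose violation count is nonzero."""
    return [name for name, count in results.items() if count != 0]

# In a Databricks job task (sketch):
# results = {name: spark.sql(sql).first()[0] for name, sql in QUALITY_CHECKS}
# bad = failed_checks(results)
# if bad:
#     raise RuntimeError(f"Data quality checks failed: {bad}")
```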
Deployment Workflow
Recommended CI/CD Pipeline
1. Developer creates a feature branch
2. Developer tests locally in the dev environment
3. Pull request submitted for code review
4. CI pipeline runs:
   a. Validate: `databricks bundle validate -t staging`
   b. Deploy to staging: `databricks bundle deploy -t staging`
   c. Run integration tests: `databricks bundle run test-job -t staging`
5. Code review approved
6. Merge to the main branch
7. CD pipeline runs:
   a. Validate: `databricks bundle validate -t prod`
   b. Deploy to production: `databricks bundle deploy -t prod`
Deployment Checklist
| Check | Verification |
|---|---|
| Bundle validates | databricks bundle validate succeeds |
| Integration tests pass | All staging tests complete successfully |
| Data quality validated | Expectations pass in staging pipeline run |
| Code reviewed | PR approved by at least one reviewer |
| Performance tested | Pipeline completes within SLA on staging data |
| Rollback plan | Previous version can be restored quickly |
Lakeflow Pipeline Modes
| Mode | Behavior | Use Case |
|---|---|---|
| Development | Tables prefixed with developer name; pipeline does not publish to target | Developer testing and iteration |
| Production | Tables published to target catalog/schema; full execution | Live production workloads |
On the Exam: Understand the concept of environment isolation using catalogs, parameterization for environment-agnostic code, and the deployment workflow from dev to staging to production using Asset Bundles and CI/CD.
Test Your Knowledge
How should a data engineer isolate development, staging, and production environments in a Databricks Lakehouse?
Test Your Knowledge
What behavior does "development mode" enable for Lakeflow Declarative Pipelines?