Study Plan and Strategies
Key Takeaways
- A structured 4-6 week study plan is recommended, focusing proportional time on each domain based on its exam weight.
- Hands-on practice with the Databricks Community Edition is essential — the exam tests practical knowledge, not just theory.
- The medallion architecture (bronze/silver/gold) is a foundational concept that appears across multiple exam domains.
- Focus on Spark SQL syntax over PySpark for the exam — most code-based questions use SQL.
- Review the official Databricks documentation for Lakeflow Declarative Pipelines, Unity Catalog, and Auto Loader as these are heavily tested.
Quick Answer: Plan for 4-6 weeks of dedicated study. Spend 60% of your time on Development & Ingestion and Data Processing (the two largest domains). Use Databricks Community Edition for hands-on practice, focus on Spark SQL syntax, and thoroughly understand Delta Lake, Lakeflow Declarative Pipelines, and Unity Catalog.
Recommended 6-Week Study Plan
| Week | Focus Area | Activities |
|---|---|---|
| Week 1 | Platform Fundamentals | Databricks workspace, clusters, notebooks, Lakehouse architecture |
| Week 2 | Delta Lake Deep Dive | ACID transactions, time travel, OPTIMIZE, VACUUM, Liquid Clustering |
| Week 3 | Data Ingestion | Auto Loader, COPY INTO, Spark SQL reads, schema evolution |
| Week 4 | Data Transformations | SQL transformations, joins, aggregations, higher-order functions, UDFs |
| Week 5 | Production Pipelines | Lakeflow Declarative Pipelines, Databricks Workflows, DABs |
| Week 6 | Governance & Review | Unity Catalog, Delta Sharing, practice exams, weak-area review |
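The Week 2 Delta Lake topics are all things you can try directly in a notebook. A minimal sketch (the table name sales_bronze and version/timestamp values are hypothetical):

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE IF NOT EXISTS sales_bronze (id BIGINT, amount DOUBLE, updated_at TIMESTAMP);

-- Time travel: query an earlier version of the table.
SELECT * FROM sales_bronze VERSION AS OF 3;
SELECT * FROM sales_bronze TIMESTAMP AS OF '2025-01-01';

-- Compact small files; Liquid Clustering supersedes explicit ZORDER on new tables.
OPTIMIZE sales_bronze;
ALTER TABLE sales_bronze CLUSTER BY (id);

-- Remove data files no longer referenced by the table (default retention: 7 days).
VACUUM sales_bronze;
```

Running DESCRIBE HISTORY sales_bronze after each command is a good way to see how every operation becomes a versioned Delta commit.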
Study Strategies That Work
1. Hands-On Practice First
The exam is heavily scenario-based. Reading documentation alone is not sufficient. Use the Databricks Community Edition (free) to:
- Create and manage Delta tables
- Write Spark SQL and PySpark transformations
- Set up Auto Loader jobs
- Build Lakeflow Declarative Pipelines
- Explore Unity Catalog features
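A quick warm-up for the first item on that list, runnable in a Community Edition notebook (the schema and table names are hypothetical):

```sql
-- Create a managed Delta table, write to it, and inspect its transaction log.
CREATE OR REPLACE TABLE demo.events (event_id INT, event_type STRING);

INSERT INTO demo.events VALUES (1, 'click'), (2, 'view');

-- Every write is recorded as a versioned commit in the Delta log.
DESCRIBE HISTORY demo.events;
```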
2. Master SQL Over Python
While the exam covers both Spark SQL and PySpark, most code-based questions use SQL syntax. Prioritize learning:
- CREATE TABLE / CREATE OR REPLACE TABLE
- MERGE INTO for upserts
- COPY INTO for file ingestion
- Window functions (ROW_NUMBER, RANK, LAG, LEAD)
- Higher-order functions (TRANSFORM, FILTER, EXISTS)
- Common Table Expressions (CTEs)
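The window-function and higher-order-function items above can be practiced with short queries like these (table and column names are hypothetical):

```sql
-- CTE + window function: keep only the latest order per customer.
WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_ts DESC) AS rn
  FROM orders
)
SELECT * FROM ranked WHERE rn = 1;

-- Higher-order functions operate element-wise on array columns.
SELECT TRANSFORM(prices, p -> p * 1.1) AS with_tax,   -- map over the array
       FILTER(prices, p -> p > 100)   AS expensive,   -- keep matching elements
       EXISTS(prices, p -> p > 1000)  AS has_outlier  -- boolean test
FROM quotes;
```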
3. Understand the Medallion Architecture
The bronze → silver → gold pattern appears across multiple domains:
- Bronze: Raw data ingestion (Domain 2)
- Silver: Cleansed and enriched data (Domain 3)
- Gold: Aggregated business-ready data (Domain 3)
- Pipeline orchestration across all layers (Domain 4)
- Governance applied at each layer (Domain 5)
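One way to internalize the pattern is to build all three layers as a chain of CTAS statements. A sketch, assuming hypothetical table names and a JSON landing path (read_files is the Databricks SQL file reader; check the docs for your runtime's exact options):

```sql
-- Bronze: raw ingest, schema as-landed.
CREATE OR REPLACE TABLE orders_bronze AS
SELECT * FROM read_files('/Volumes/demo/raw/orders/', format => 'json');

-- Silver: cleansed and deduplicated.
CREATE OR REPLACE TABLE orders_silver AS
SELECT DISTINCT order_id, CAST(amount AS DOUBLE) AS amount, order_ts
FROM orders_bronze
WHERE order_id IS NOT NULL;

-- Gold: aggregated, business-ready.
CREATE OR REPLACE TABLE orders_gold AS
SELECT DATE(order_ts) AS order_date, SUM(amount) AS daily_revenue
FROM orders_silver
GROUP BY DATE(order_ts);
```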
4. Focus on "What" Not "How"
The exam tests whether you know what tool to use for a given scenario, not the exact click-by-click steps. For example:
- "Which feature incrementally processes new files?" → Auto Loader
- "How do you enforce data quality rules?" → Lakeflow expectations
- "What provides centralized data governance?" → Unity Catalog
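The first two scenarios above look roughly like this inside a Lakeflow Declarative Pipeline. A hedged sketch with hypothetical table names and paths (verify the STREAM / read_files syntax against the current Lakeflow SQL reference for your workspace):

```sql
-- Auto Loader: incrementally ingest only new files from a landing path.
CREATE OR REFRESH STREAMING TABLE events_bronze AS
SELECT * FROM STREAM read_files('/Volumes/demo/landing/events/', format => 'json');

-- Lakeflow expectation: enforce a data quality rule, dropping violating rows.
CREATE OR REFRESH STREAMING TABLE events_silver (
  CONSTRAINT valid_id EXPECT (event_id IS NOT NULL) ON VIOLATION DROP ROW
) AS
SELECT * FROM STREAM(events_bronze);
```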
5. Know the Terminology Changes
Databricks has renamed several features. The exam uses the new names:
| Old Name | New Name (Exam Term) |
|---|---|
| Delta Live Tables (DLT) | Lakeflow Declarative Pipelines |
| Databricks Workflows | Lakeflow Jobs |
| Databricks Asset Bundles | Declarative Automation Bundles |
| Databricks Repos | Git folders |
Common Pitfalls to Avoid
- Memorizing syntax without understanding purpose: Know when to use MERGE INTO vs. INSERT INTO vs. COPY INTO
- Ignoring Unity Catalog: It accounts for only 11% of the exam, but many candidates underestimate this section
- Skipping Lakeflow Declarative Pipelines: This is tested across both Data Processing and Productionizing domains
- Not practicing with real data: Abstract understanding breaks down when facing scenario questions
- Rushing through the exam: 90 minutes for 45 questions is 2 minutes per question — use all your time
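To avoid the first pitfall, keep the purpose of each write command straight. A side-by-side sketch (table names and the landing path are hypothetical):

```sql
-- INSERT INTO: append rows you already have in hand.
INSERT INTO sales VALUES (1, 19.99);

-- COPY INTO: idempotently load new files from a path;
-- files that were already loaded are skipped on re-run.
COPY INTO sales
FROM '/Volumes/demo/landing/sales/'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true');

-- MERGE INTO: upsert when incoming rows may update existing keys.
MERGE INTO sales AS t
USING sales_updates AS s
ON t.sale_id = s.sale_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```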
Quick Review
- What is the current Databricks name for Delta Live Tables (DLT)? Lakeflow Declarative Pipelines.
- Which layer of the medallion architecture contains raw, unvalidated data? Bronze.