Databricks Data Engineer Cheat Sheet

Databricks Intelligence Platform

Platform Core, Compute Picker, Delta + UC, Control Plane

Data Ingestion + Loading

Ingestion Methods, Auto Loader, COPY INTO, Lakeflow Connect

Data Transformation + Modeling

Lakeflow Jobs

Job Orchestration, Task Types, Triggers, Run Repair

CI/CD

Git Workflow, DAB Bundle, Targets, CLI Deploy

Troubleshooting + Optimization

Spark UI, Bottlenecks, Liquid Clustering, Predictive Optimization

Governance + Security

Unity Catalog, Privileges, Masking, Delta Sharing

Quick Facts

Exam: Data Engineer Associate
Questions: 45 scored
Time: 90 min
Fee: USD 200
Type: Multiple choice
Delivery: Online/test center
Aides: None
Validity: 2 years
Weights: Not published
Code: SQL when possible

Platform Core

Lakehouse: Warehouse + lake
Delta Lake: ACID table layer
Unity Catalog: Governance plane
Workspace: Collaborative UI
Control plane: Databricks services
Data plane: Customer compute
Photon: Vectorized engine
SQL warehouse: SQL compute

Compute Picker

Serverless: Hands-off compute
SQL warehouse: BI queries
All-purpose: Interactive notebooks
Job compute: Per-run compute
Pool: Fast startup
Policy: Compute guardrails
Databricks Connect: Local IDE
Single node: Small tests

Ingest CAL

COPY, Auto Loader, Lakeflow Connect

COPY: batch
Auto Loader: files
Lakeflow Connect: connectors

COPY INTO vs Auto Loader

COPY INTO

  • Batch files
  • SQL command
  • Safe re-run (loaded files are skipped)

Auto Loader

  • Continuous files
  • cloudFiles source
  • Schema drift

Batch vs stream
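
Both tools rely on remembering which files were already ingested: COPY INTO is idempotent, and Auto Loader keeps processed-file state in its checkpoint. The sketch below models only that idea in plain Python. It is not the Databricks API, and `load_batch`, `state`, and the file names are hypothetical.

```python
# Conceptual model only: plain Python, not Databricks code.
# `load_batch` and `loaded_files` are hypothetical names.

def load_batch(source_files, loaded_files, table):
    """Append rows from files not loaded before; return the files loaded now."""
    newly_loaded = []
    for name in sorted(source_files):
        if name in loaded_files:
            continue  # already ingested on an earlier run, so skip it
        table.extend(source_files[name])
        loaded_files.add(name)
        newly_loaded.append(name)
    return newly_loaded


source = {"a.csv": [1, 2], "b.csv": [3]}
state, table = set(), []

first = load_batch(source, state, table)    # loads both files
second = load_batch(source, state, table)   # re-run: nothing new to load
source["c.csv"] = [4]
third = load_batch(source, state, table)    # only the newly arrived file
```

Re-running the load never duplicates rows, which is the property the cheat sheet calls idempotent batch.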

Ingestion Picker

  1. One-time files -> COPY INTO (Batch)
  2. Continuous files -> Auto Loader (cloudFiles)
  3. Enterprise SaaS -> Lakeflow Connect (Managed)
  4. Relational source -> JDBC/ODBC (Client)
  5. API source -> REST (Notebook)
  6. Nested JSON -> Auto Loader (Rescue)
  7. UC file path -> Volume (Governed)

Ingestion Methods

COPY INTO: Idempotent batch
Auto Loader: Cloud file stream
Lakeflow Connect: Managed connectors
JDBC/ODBC: Database clients
REST: API ingestion
Partner: External connector
Local file: Manual upload
Volume: UC file storage

Lakeflow Connect vs JDBC

Lakeflow Connect

  • Managed connector
  • Enterprise sources
  • UC landing

JDBC/ODBC

  • Client access
  • Custom logic
  • Notebook code

Managed vs custom

Auto Loader

cloudFiles: Streaming source
schemaLocation: Schema tracking
Checkpoint: Processed file state
File events: Notification mode
Directory listing: Listing mode
addNewColumns: Fail, then evolve
rescue: Capture drift
_rescued_data: Unexpected fields
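
To make the rescue idea concrete, here is a minimal sketch in plain Python, assuming a fixed list of expected columns. It models only unknown fields being set aside in `_rescued_data`; the real Auto Loader column holds more than this (for example, values that fail type checks), and `apply_schema` is a hypothetical helper, not a Databricks function.

```python
import json

# Conceptual model only: `apply_schema` is a hypothetical helper.

def apply_schema(record, schema_columns):
    """Split a record into known columns plus a JSON blob of everything else."""
    row = {col: record.get(col) for col in schema_columns}
    extras = {k: v for k, v in record.items() if k not in schema_columns}
    # Unknown fields are kept rather than dropped, so no data is lost.
    row["_rescued_data"] = json.dumps(extras, sort_keys=True) if extras else None
    return row


schema = ["id", "name"]
clean = apply_schema({"id": 1, "name": "a"}, schema)
drifted = apply_schema({"id": 2, "name": "b", "tier": "gold"}, schema)
```

The table schema stays unchanged in this mode, whereas addNewColumns would instead widen the schema to include `tier`.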

Medallion

Bronze -> Silver -> Gold

Bronze: raw
Silver: clean
Gold: BI
Quality rises

View vs Materialized View

View

  • Query definition
  • Computed on read
  • No stored result

Materialized view

  • Stored result
  • Refreshed data
  • Faster reads

Virtual vs stored
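
The difference can be seen with a small sketch using Python's built-in `sqlite3`, which stands in for Databricks SQL here. SQLite has no materialized views, so a snapshot table (`mv_total`, a hypothetical name) plays that role and is refreshed by hand; this is an assumption made for illustration, not how Databricks refreshes materialized views.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?)", [(10,), (20,)])

# A view stores only the query; it is evaluated on every read.
con.execute("CREATE VIEW v_total AS SELECT SUM(amount) AS total FROM sales")
# Stand-in for a materialized view: the result itself is stored.
con.execute("CREATE TABLE mv_total AS SELECT SUM(amount) AS total FROM sales")

con.execute("INSERT INTO sales VALUES (5)")

view_total = con.execute("SELECT total FROM v_total").fetchone()[0]
stale_total = con.execute("SELECT total FROM mv_total").fetchone()[0]

# "Refresh" the stand-in by recomputing the stored result.
con.execute("DELETE FROM mv_total")
con.execute("INSERT INTO mv_total SELECT SUM(amount) FROM sales")
fresh_total = con.execute("SELECT total FROM mv_total").fetchone()[0]
```

The view sees the new row immediately, while the stored result only catches up after a refresh.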

Modeling Picker

  1. Raw landing -> Bronze (Preserve)
  2. Clean entities -> Silver (Validate)
  3. BI metrics -> Gold (Aggregate)
  4. Array column -> explode (Flatten)
  5. Duplicate records -> dropDuplicates (Dedup)
  6. Small dimension -> Broadcast join (Avoid shuffle)

Medallion Layers

Bronze: Raw landing
Silver: Cleaned entities
Gold: Business outputs
Raw columns: Bronze preserve
Dedup: Silver cleanup
Metrics: Gold aggregate
Lineage: Layer trace
Quality: Silver/gold checks

Transform Ops

Inner join: Matched rows
Left join: Keep left
Broadcast join: Small side shipped
Cross join: Cartesian product
Union: Distinct rows (SQL UNION; DataFrame union() keeps duplicates)
Union all: Keep duplicates
explode: Array to rows
dropDuplicates: Remove duplicates
approx_count_distinct: Fast cardinality
summary: Column statistics
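
The UNION versus UNION ALL distinction is standard SQL, so it can be checked with Python's built-in `sqlite3` as a stand-in engine. The tables `a` and `b` are made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a (x INTEGER)")
con.execute("CREATE TABLE b (x INTEGER)")
con.executemany("INSERT INTO a VALUES (?)", [(1,), (2,)])
con.executemany("INSERT INTO b VALUES (?)", [(2,), (3,)])

# UNION removes duplicate rows across both inputs.
union_rows = sorted(
    r[0] for r in con.execute("SELECT x FROM a UNION SELECT x FROM b")
)
# UNION ALL simply concatenates, so the shared value 2 appears twice.
union_all_rows = sorted(
    r[0] for r in con.execute("SELECT x FROM a UNION ALL SELECT x FROM b")
)
```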

Gold Objects

Table: Stored Delta data
View: Stored query
Materialized view: Precomputed query
Streaming table: Incremental table
BI table: Consumption model
Metric view: Governed metrics
Dashboard: Visual consumer
Quality rule: Reliability check

Jobs DAG

Tasks depend, triggers start, repair reruns

DAG: order
Trigger: start
Repair: failed tasks

Scheduled vs Data Trigger

Scheduled

  • Clock based
  • Fixed cadence
  • May run empty

Data trigger

  • Data arrival
  • Table/file change
  • Event aligned

Time vs readiness

Production Picker

  1. Task dependencies -> Lakeflow Jobs (DAG)
  2. Clock cadence -> Scheduled (Time)
  3. New files -> File arrival (Trigger)
  4. Source tables change -> Table update (Trigger)
  5. Failed task -> Repair run (Rerun)
  6. Dev to prod -> DAB (Deploy)

Job Orchestration

Lakeflow Job: Pipeline orchestration
Task: Work unit
DAG: Dependency graph
Retry: Failure rerun
Branching: Conditional path
Looping: Repeated task
Repair run: Failed-task rerun
Run history: Execution trends
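
A rough way to picture a job DAG and a repair run, using only the Python standard library. The task names are hypothetical, and the sketch assumes a repair reruns the failed task together with everything downstream of it, since those tasks could not have completed.

```python
from graphlib import TopologicalSorter

# Hypothetical job: each task maps to the set of tasks it depends on.
deps = {
    "ingest": set(),
    "clean": {"ingest"},
    "aggregate": {"clean"},
    "report": {"aggregate"},
    "audit": {"ingest"},
}

# A valid execution order: every task appears after its dependencies.
run_order = list(TopologicalSorter(deps).static_order())


def repair_set(deps, failed):
    """Return the failed tasks plus every task downstream of them."""
    to_rerun = set(failed)
    changed = True
    while changed:
        changed = False
        for task, upstream in deps.items():
            if task not in to_rerun and upstream & to_rerun:
                to_rerun.add(task)
                changed = True
    return to_rerun


rerun = repair_set(deps, {"clean"})
```

Tasks that already succeeded and do not depend on the failure (`ingest` and `audit` here) are left alone, which is what makes a repair cheaper than a full rerun.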

Triggers

Scheduled: Time based
File arrival: New files
Table update: Table changed
Continuous: Always running
Manual: Run now
External: Outside orchestrator
Dependency: Upstream task
Failure rate: Health signal

Git Folder vs DAB

Git folder

  • Branch code
  • Commit changes
  • Pull request

DAB

  • Package assets
  • Target config
  • Deploy environments

Version vs deploy

DAB Bundle

DAB: Declarative deployment
databricks.yml: Bundle config
Target: Environment config
Variables: Reusable settings
Override: Target-specific value
Validate: Check bundle
Deploy: Promote assets
Git folder: Workspace repo

Spark UI

Skew, shuffle, spill explain slowness

Skew: uneven
Shuffle: exchange
Spill: disk

OPTIMIZE vs VACUUM

OPTIMIZE

  • Compact files
  • Improve reads
  • Layout rewrite

VACUUM

  • Delete old files
  • Reduce storage
  • Limits time travel

Speed vs cleanup
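
Why VACUUM limits time travel can be sketched with a toy model in plain Python. Here version 1 is assumed to be an OPTIMIZE that compacted `f1` and `f2` into `f3`; the file names, the `tombstones` map, and both helpers are hypothetical, and 168 hours mirrors Delta Lake's default 7-day retention.

```python
# Toy model only: not Delta Lake's actual transaction log format.

def vacuum(storage, tombstones, now_hours, retention_hours=168):
    """Delete files that were logically removed more than `retention_hours` ago."""
    expired = {
        name for name, removed_at in tombstones.items()
        if now_hours - removed_at > retention_hours
    }
    return storage - expired


def readable(version_files, storage):
    """A version can be read only if all of its files still exist."""
    return version_files <= storage


versions = {0: {"f1", "f2"}, 1: {"f3"}}   # version 1 compacted f1 + f2 into f3
storage = {"f1", "f2", "f3"}
tombstones = {"f1": 0, "f2": 0}           # removed from the table at hour 0

early = vacuum(storage, tombstones, now_hours=24)    # inside retention
late = vacuum(storage, tombstones, now_hours=200)    # past retention
```

After the late vacuum the current version still reads fine, but version 0 can no longer be reconstructed because its files are gone.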

Spark Tuning

Skew: Uneven partitions
Shuffle: Data exchange
Spill: Disk overflow
Partitions: Parallelism units
autoBroadcastJoinThreshold: Small join threshold
Executor memory: Worker heap
Driver memory: Coordinator heap
Spark UI: Stage diagnostics
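
One simple way to reason about skew is to compare the largest partition with the median one, much as the Spark UI stage summary lets you compare max and median task metrics. The sketch below is plain Python with made-up row counts; `skew_ratio` is a hypothetical helper, and any cutoff you apply to it is a judgment call rather than a Spark setting.

```python
from statistics import median

def skew_ratio(partition_rows):
    """Largest partition size divided by the median partition size."""
    return max(partition_rows) / median(partition_rows)


balanced = [100, 110, 95, 105]
skewed = [100, 110, 95, 5000]   # one hot key dominates a partition

balanced_ratio = skew_ratio(balanced)   # close to 1: work is spread evenly
skewed_ratio = skew_ratio(skewed)       # far above 1: one task does most work
```

A ratio near 1 means tasks finish together; a large ratio means the stage waits on a single straggler.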

UC Access

Grant base, restrict finer

GRANT: allow
DENY: block (legacy Hive metastore only; Unity Catalog uses GRANT/REVOKE)
Mask: values
Filter: rows

Managed vs External Table

Managed

  • UC storage
  • UC lifecycle
  • Drop deletes data

External

  • External location
  • Shared files
  • Drop keeps data

Owned vs referenced

Governance Picker

  1. Central governance -> Unity Catalog (UC)
  2. Read access -> SELECT (Privilege)
  3. Hide rows -> Row filter (RLS)
  4. Hide values -> Column mask (CLS)
  5. Tag-driven rules -> ABAC (Central)
  6. External sharing -> Delta Sharing (Read-only)

Governance Core

Catalog: Top namespace
Schema: Database namespace
Managed table: UC lifecycle
External table: External location
GRANT: Add permission
REVOKE: Remove permission
DENY: Explicit block (legacy Hive metastore only)
Lineage: Data flow
Delta Sharing: Read-only share
Federation: External querying

Row Filter vs Column Mask

Row filter

  • Restrict rows
  • Tenant isolation
  • WHERE-like logic

Column mask

  • Transform values
  • PII redaction
  • Column function

Rows vs values
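
The two controls act on different axes, which a short plain-Python sketch can show. This is a conceptual model, not Unity Catalog syntax: `row_filter`, `mask_email`, `query`, and the user attributes are all hypothetical.

```python
# Conceptual model only: real row filters and column masks are SQL functions
# attached to a table, not Python callables.

def row_filter(row, user):
    """Keep only rows belonging to the caller's tenant (WHERE-like logic)."""
    return row["tenant"] == user["tenant"]


def mask_email(value, user):
    """Return the real value to privileged callers, a redacted one otherwise."""
    return value if user["can_see_pii"] else "***"


def query(rows, user):
    """Apply the row filter first, then mask the sensitive column."""
    return [
        {**row, "email": mask_email(row["email"], user)}
        for row in rows
        if row_filter(row, user)
    ]


rows = [
    {"tenant": "acme", "email": "a@acme.test"},
    {"tenant": "globex", "email": "g@globex.test"},
]
analyst = {"tenant": "acme", "can_see_pii": False}
admin = {"tenant": "acme", "can_see_pii": True}

analyst_view = query(rows, analyst)
admin_view = query(rows, admin)
```

The filter changes how many rows come back; the mask changes what a value looks like in the rows that remain.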

Security Controls

Principal: User/group/SP
USE CATALOG: Catalog access
USE SCHEMA: Schema access
SELECT: Read rows
MODIFY: Write rows
Row filter: Hide rows
Column mask: Transform values
ABAC: Tag-based control
Audit logs: Access evidence
System tables: Operational telemetry
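
The privilege stack (USE CATALOG, then USE SCHEMA, then SELECT) can be modeled as a simple all-or-nothing check. The sketch below is plain Python with hypothetical names, and it deliberately ignores group membership, ownership, and privilege inheritance from parent objects, all of which matter in a real workspace.

```python
# Simplified model: explicit grants only, no inheritance or ownership.

def can_select(grants, principal, catalog, schema, table):
    """True only if the principal holds all three privileges in the chain."""
    needed = [
        ("USE CATALOG", catalog),
        ("USE SCHEMA", f"{catalog}.{schema}"),
        ("SELECT", f"{catalog}.{schema}.{table}"),
    ]
    held = grants.get(principal, set())
    return all(item in held for item in needed)


grants = {
    "analyst": {
        ("USE CATALOG", "main"),
        ("USE SCHEMA", "main.sales"),
        ("SELECT", "main.sales.orders"),
    },
    # Has SELECT on the table but cannot traverse the parents.
    "intern": {("SELECT", "main.sales.orders")},
}

analyst_ok = can_select(grants, "analyst", "main", "sales", "orders")
intern_ok = can_select(grants, "intern", "main", "sales", "orders")
```

Holding SELECT alone is not enough: without USE on the parent catalog and schema, the table cannot be reached.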

ABAC vs Table Controls

ABAC

  • Tag matched
  • Central policy
  • Owner cannot remove

Table controls

  • Per-table setup
  • ALTER TABLE
  • Owner managed

Central vs local

Common Traps

Current weights

May guide omits weights. Avoid legacy percentages.

COPY vs Auto Loader

COPY is batch. Auto Loader streams.

Rescue vs evolve

Rescue captures drift. addNewColumns evolves.

View vs MV

View stores logic. MV stores results.

External drop

Table metadata drops. Files remain.

Privilege stack

Need USE on parents. SELECT reads table.

ABAC access

ABAC restricts only. GRANT still required.

VACUUM risk

Cleans old files. Breaks old time travel.

Triggers

Schedule uses clock. Data trigger waits.

Last Minute

  1. 45 scored; 90 minutes
  2. May guide: no weights
  3. SQL when possible
  4. Auto Loader = cloudFiles
  5. COPY INTO = batch
  6. Bronze raw; Silver clean
  7. Gold = BI outputs
  8. Jobs are DAG tasks
  9. DAB deploys assets
  10. Spark UI finds skew
  11. UC governs data
  12. Masks values; filters rows
  13. ABAC needs base grants
  14. Delta Sharing is read-only