Databricks Intelligence Platform
Data Ingestion + Loading
Data Transformation + Modeling
Lakeflow Jobs
CI/CD
Troubleshooting + Optimization
Governance + Security
Quick Facts
- Exam
- Data Engineer Associate
- Questions
- 45 scored
- Time
- 90 min
- Fee
- USD 200
- Type
- Multiple choice
- Delivery
- Online/test center
- Aides
- None
- Validity
- 2 years
- Weights
- Not published
- Code
- SQL when possible
Platform Core
- Lakehouse
- Warehouse + lake
- Delta Lake
- ACID table layer
- Unity Catalog
- Governance plane
- Workspace
- Collaborative UI
- Control plane
- Databricks services
- Data plane
- Customer compute
- Photon
- Vectorized engine
- SQL warehouse
- SQL compute
Compute Picker
- Serverless
- Hands-off compute
- SQL warehouse
- BI queries
- All-purpose
- Interactive notebooks
- Job compute
- Per-run compute
- Pool
- Fast startup
- Policy
- Compute guardrails
- Databricks Connect
- Local IDE
- Single node
- Small tests
Ingest CAL
COPY, Auto Loader, Lakeflow Connect
COPY INTO vs Auto Loader
COPY INTO
- Batch files
- SQL command
- Simple replay
Auto Loader
- Continuous files
- cloudFiles source
- Schema drift
Batch vs stream
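"Simple replay" above depends on the loader remembering which files it has already taken. The following is a minimal plain-Python sketch of that idea, not Databricks code: `copy_into`, `landing`, and `seen` are hypothetical names, and the file-tracking set stands in for the load history that COPY INTO keeps (Auto Loader keeps comparable state in its checkpoint).

```python
def copy_into(target_rows, loaded_files, source_files):
    """Toy idempotent batch load: append rows only from files not seen before."""
    for name, rows in sorted(source_files.items()):
        if name in loaded_files:
            continue  # already loaded, so a replay adds nothing
        target_rows.extend(rows)
        loaded_files.add(name)
    return target_rows


table, seen = [], set()
landing = {"a.csv": [1, 2], "b.csv": [3]}  # hypothetical landing folder

copy_into(table, seen, landing)   # first run loads both files
copy_into(table, seen, landing)   # replay: no duplicates
landing["c.csv"] = [4]
copy_into(table, seen, landing)   # only the new file is picked up

print(table)  # [1, 2, 3, 4]
```

The difference the card stresses is how the load is driven: COPY INTO is a SQL command you re-run as a batch, while Auto Loader is a streaming source that keeps discovering new files.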
Ingestion Picker
- One-time files → COPY INTO (Batch)
- Continuous files → Auto Loader (cloudFiles)
- Enterprise SaaS → Lakeflow Connect (Managed)
- Relational source → JDBC/ODBC (Client)
- API source → REST (Notebook)
- Nested JSON → Auto Loader (Rescue)
- UC file path → Volume (Governed)
Ingestion Methods
- COPY INTO
- Idempotent batch
- Auto Loader
- Cloud file stream
- Lakeflow Connect
- Managed connectors
- JDBC/ODBC
- Database clients
- REST
- API ingestion
- Partner
- External connector
- Local file
- Manual upload
- Volume
- UC file storage
Lakeflow Connect vs JDBC
Lakeflow Connect
- Managed connector
- Enterprise sources
- UC landing
JDBC/ODBC
- Client access
- Custom logic
- Notebook code
Managed vs custom
Auto Loader
- cloudFiles
- Streaming source
- schemaLocation
- Schema tracking
- Checkpoint
- Processed file state
- File events
- Notification mode
- Directory listing
- Listing mode
- addNewColumns
- Fail then evolve
- rescue
- Capture drift
- _rescued_data
- Unexpected fields
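The rescue idea above can be pictured with a small plain-Python sketch. This is a toy model, not the Auto Loader implementation: `parse_with_rescue` and `schema` are hypothetical names, and it covers only unexpected fields, which is the case the card lists.

```python
import json


def parse_with_rescue(record, expected_columns):
    """Keep expected columns; park anything else in a _rescued_data JSON string."""
    row = {col: record.get(col) for col in expected_columns}
    extras = {k: v for k, v in record.items() if k not in expected_columns}
    row["_rescued_data"] = json.dumps(extras, sort_keys=True) if extras else None
    return row


schema = ["id", "name"]  # hypothetical tracked schema

clean = parse_with_rescue({"id": 1, "name": "a"}, schema)
drifted = parse_with_rescue({"id": 2, "name": "b", "tier": "gold"}, schema)

print(clean["_rescued_data"])    # None
print(drifted["_rescued_data"])  # {"tier": "gold"}
```

In this sketch the table schema never changes and no data is lost, which is the contrast with addNewColumns: there the stream fails on the new column and picks up the evolved schema when it restarts.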
Medallion
Bronze -> Silver -> Gold
View vs Materialized View
View
- Query definition
- Computed on read
- No stored result
Materialized view
- Stored result
- Refreshed data
- Faster reads
Virtual vs stored
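To make "virtual vs stored" concrete, here is a small sketch using Python's built-in `sqlite3`. SQLite has no materialized views, so a plain table filled from a query stands in for one (an assumption made purely for illustration); the table and column names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?)", [(10,), (20,)])

# View: only the query definition is stored; it is recomputed on every read.
con.execute("CREATE VIEW v_total AS SELECT SUM(amount) AS total FROM sales")

# Stand-in for a materialized view: the query result itself is stored.
con.execute("CREATE TABLE mv_total AS SELECT SUM(amount) AS total FROM sales")

# New data arrives after both objects were created.
con.execute("INSERT INTO sales VALUES (5)")

view_total = con.execute("SELECT total FROM v_total").fetchone()[0]
stored_total = con.execute("SELECT total FROM mv_total").fetchone()[0]

# "Refresh" the stored result by recomputing it from the base table.
con.execute("DELETE FROM mv_total")
con.execute("INSERT INTO mv_total SELECT SUM(amount) FROM sales")
refreshed_total = con.execute("SELECT total FROM mv_total").fetchone()[0]

print(view_total, stored_total, refreshed_total)  # 35 30 35
```

The view always reflects the base table, while the stored result is cheap to read but only as current as its last refresh.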
Modeling Picker
- Raw landing → Bronze (Preserve)
- Clean entities → Silver (Validate)
- BI metrics → Gold (Aggregate)
- Array column → explode (Flatten)
- Duplicate records → dropDuplicates (Dedup)
- Small dimension → Broadcast join (Avoid shuffle)
Medallion Layers
- Bronze
- Raw landing
- Silver
- Cleaned entities
- Gold
- Business outputs
- Raw columns
- Bronze preserve
- Dedup
- Silver cleanup
- Metrics
- Gold aggregate
- Lineage
- Layer trace
- Quality
- Silver/gold checks
Transform Ops
- Inner join
- Matched rows
- Left join
- Keep left
- Broadcast join
- Small side shipped
- Cross join
- Cartesian product
- Union
- Distinct rows (SQL UNION)
- Union all
- Keep duplicates
- explode
- Array to rows
- dropDuplicates
- Remove duplicates
- approx_count_distinct
- Fast cardinality
- summary
- Column statistics
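The join and union rows above are standard SQL semantics, so they can be checked with any SQL engine. The sketch below uses Python's built-in `sqlite3` as a stand-in for a SQL warehouse; the `orders` and `customers` tables are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 100), (2, 200), (3, 999);
    INSERT INTO customers VALUES (100, 'Ada'), (200, 'Lin');
""")

# Inner join: only orders with a matching customer survive.
inner_rows = con.execute(
    "SELECT o.order_id, c.name FROM orders o "
    "JOIN customers c ON o.customer_id = c.customer_id ORDER BY o.order_id"
).fetchall()

# Left join: every order is kept; the unmatched one gets NULL for name.
left_rows = con.execute(
    "SELECT o.order_id, c.name FROM orders o "
    "LEFT JOIN customers c ON o.customer_id = c.customer_id ORDER BY o.order_id"
).fetchall()

# UNION removes duplicate rows; UNION ALL keeps them.
union_rows = con.execute(
    "SELECT customer_id FROM orders UNION SELECT customer_id FROM customers"
).fetchall()
union_all_rows = con.execute(
    "SELECT customer_id FROM orders UNION ALL SELECT customer_id FROM customers"
).fetchall()

print(inner_rows)  # [(1, 'Ada'), (2, 'Lin')]
print(left_rows)   # [(1, 'Ada'), (2, 'Lin'), (3, None)]
print(len(union_rows), len(union_all_rows))  # 3 5
```

One related detail worth remembering: the PySpark `DataFrame.union` method behaves like SQL `UNION ALL`, so deduplication there needs an explicit `distinct()` or `dropDuplicates()`.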
Gold Objects
- Table
- Stored Delta data
- View
- Stored query
- Materialized view
- Precomputed query
- Streaming table
- Incremental table
- BI table
- Consumption model
- Metric view
- Governed metrics
- Dashboard
- Visual consumer
- Quality rule
- Reliability check
Jobs DAG
Tasks depend, triggers start, repair reruns
Scheduled vs Data Trigger
Scheduled
- Clock based
- Fixed cadence
- May run empty
Data trigger
- Data arrival
- Table/file change
- Event aligned
Time vs readiness
Production Picker
- Task dependencies → Lakeflow Jobs (DAG)
- Clock cadence → Scheduled (Time)
- New files → File arrival (Trigger)
- Source tables change → Table update (Trigger)
- Failed task → Repair run (Rerun)
- Dev to prod → DAB (Deploy)
Job Orchestration
- Lakeflow Job
- Pipeline orchestration
- Task
- Work unit
- DAG
- Dependency graph
- Retry
- Failure rerun
- Branching
- Conditional path
- Looping
- Repeated task
- Repair run
- Failed-task rerun
- Run history
- Execution trends
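The DAG and repair-run entries above can be sketched in plain Python with the standard-library `graphlib` module. This is a toy scheduler, not the Lakeflow Jobs API: the task names and the `run_job` and `repair` helpers are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical job: each task maps to the set of tasks it depends on.
dag = {
    "ingest": set(),
    "clean": {"ingest"},
    "aggregate": {"clean"},
    "report": {"aggregate"},
}


def run_job(dag, runner, only=None):
    """Run tasks in dependency order; skip anything downstream of a failure."""
    status = {}
    for task in TopologicalSorter(dag).static_order():
        if only is not None and task not in only:
            continue
        if any(status.get(dep) in ("failed", "skipped") for dep in dag[task]):
            status[task] = "skipped"
        else:
            status[task] = "success" if runner(task) else "failed"
    return status


def repair(dag, previous, runner):
    """Rerun only the tasks that did not succeed, keeping earlier successes."""
    to_rerun = {task for task, state in previous.items() if state != "success"}
    return {**previous, **run_job(dag, runner, only=to_rerun)}


# First run: the "aggregate" task fails, so "report" is skipped.
first = run_job(dag, lambda task: task != "aggregate")

rerun_log = []


def always_ok(task):
    rerun_log.append(task)
    return True


repaired = repair(dag, first, always_ok)

print(first["aggregate"], first["report"])  # failed skipped
print(rerun_log)                            # ['aggregate', 'report']
```

The point of the sketch is the saving: the repair pass touches only the failed task and its dependents, so successful upstream work is not repeated.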
Triggers
- Scheduled
- Time based
- File arrival
- New files
- Table update
- Table changed
- Continuous
- Always running
- Manual
- Run now
- External
- Outside orchestrator
- Dependency
- Upstream task
- Failure rate
- Health signal
Git Folder vs DAB
Git folder
- Branch code
- Commit changes
- Pull request
DAB
- Package assets
- Target config
- Deploy environments
Version vs deploy
DAB Bundle
- DAB
- Declarative deployment
- databricks.yml
- Bundle config
- Target
- Environment config
- Variables
- Reusable settings
- Override
- Target-specific value
- Validate
- Check bundle
- Deploy
- Promote assets
- Git folder
- Workspace repo
Spark UI
Skew, shuffle, spill explain slowness
OPTIMIZE vs VACUUM
OPTIMIZE
- Compact files
- Improve reads
- Layout rewrite
VACUUM
- Delete old files
- Reduce storage
- Limits time travel
Speed vs cleanup
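A toy model helps show why compaction is safe for old versions while cleanup is not. The sketch below is plain Python, not Delta Lake: `storage` stands in for data files, `log` for the table versions that reference them, and the `vacuum` helper uses a retention of zero purely for illustration (real VACUUM applies a retention threshold, 7 days by default).

```python
storage = {"f1": [1], "f2": [2], "f3": [3]}  # physical files and their rows
log = [{"f1", "f2", "f3"}]                   # version 0 references three small files


def optimize(storage, log, new_name):
    """Compact: write one larger file and commit a new version that points to it."""
    rows = [r for name in sorted(log[-1]) for r in storage[name]]
    storage[new_name] = rows
    log.append({new_name})


def vacuum(storage, log):
    """Delete files the latest version no longer references (toy: zero retention)."""
    for name in set(storage) - log[-1]:
        del storage[name]


def read(storage, log, version):
    """Read a table version by collecting rows from the files it references."""
    return sorted(r for name in log[version] for r in storage[name])


optimize(storage, log, "compacted")
before_vacuum = read(storage, log, 0)  # old version is still readable

vacuum(storage, log)
latest = read(storage, log, 1)

try:
    read(storage, log, 0)
    old_version_readable = True
except KeyError:
    old_version_readable = False

print(before_vacuum, latest, old_version_readable)  # [1, 2, 3] [1, 2, 3] False
```

OPTIMIZE only adds a better layout, so older versions keep working; VACUUM removes the files those older versions depend on, which is the time-travel limit noted above.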
Spark Tuning
- Skew
- Uneven partitions
- Shuffle
- Data exchange
- Spill
- Disk overflow
- Partitions
- Parallelism units
- autoBroadcast
- Small join threshold
- Executor memory
- Worker heap
- Driver memory
- Coordinator heap
- Spark UI
- Stage diagnostics
UC Access
Grant base, restrict finer
Managed vs External Table
Managed
- UC storage
- UC lifecycle
- Drop deletes data
External
- External location
- Shared files
- Drop keeps data
Owned vs referenced
Governance Picker
- Central governance → Unity Catalog (UC)
- Read access → SELECT (Privilege)
- Hide rows → Row filter (RLS)
- Hide values → Column mask (CLS)
- Tag-driven rules → ABAC (Central)
- External sharing → Delta Sharing (Read-only)
Governance Core
- Catalog
- Top namespace
- Schema
- Database namespace
- Managed table
- UC lifecycle
- External table
- External location
- GRANT
- Add permission
- REVOKE
- Remove permission
- DENY
- Explicit block (legacy Hive metastore; not supported in Unity Catalog)
- Lineage
- Data flow
- Delta Sharing
- Read-only share
- Federation
- External querying
Row Filter vs Column Mask
Row filter
- Restrict rows
- Tenant isolation
- WHERE-like logic
Column mask
- Transform values
- PII redaction
- Column function
Rows vs values
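The "rows vs values" split can be sketched with two small functions. This is plain Python for illustration, not Unity Catalog syntax: `row_filter`, `mask_email`, `query`, and the user and row dictionaries are all hypothetical.

```python
def row_filter(row, user):
    """WHERE-like predicate: decides whether the row is visible at all."""
    return user["is_admin"] or row["tenant"] == user["tenant"]


def mask_email(value, user):
    """Column function: the row stays visible, but the value is transformed."""
    return value if user["is_admin"] else "***@" + value.split("@", 1)[1]


def query(rows, user):
    """Apply the row filter first, then the column mask, as a reader would see it."""
    return [
        {**row, "email": mask_email(row["email"], user)}
        for row in rows
        if row_filter(row, user)
    ]


rows = [
    {"tenant": "acme", "email": "ada@example.com"},
    {"tenant": "globex", "email": "lin@example.org"},
]
analyst = {"tenant": "acme", "is_admin": False}
admin = {"tenant": "acme", "is_admin": True}

print(query(rows, analyst))  # [{'tenant': 'acme', 'email': '***@example.com'}]
print(len(query(rows, admin)))  # 2
```

The filter changes how many rows come back; the mask changes what a column contains in the rows that do.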
Security Controls
- Principal
- User/group/SP
- USE CATALOG
- Catalog access
- USE SCHEMA
- Schema access
- SELECT
- Read rows
- MODIFY
- Write rows
- Row filter
- Hide rows
- Column mask
- Transform values
- ABAC
- Tag-based control
- Audit logs
- Access evidence
- System tables
- Operational telemetry
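The privilege stack above (USE CATALOG, then USE SCHEMA, then SELECT) can be modelled as a simple set check. This is a simplified sketch, not how Unity Catalog evaluates access: it ignores ownership, group membership, and privilege inheritance, and the `can_select` helper and the `grants` dictionary are hypothetical.

```python
def can_select(grants, principal, table):
    """Return True only if the principal holds every privilege in the stack."""
    catalog, schema, _ = table.split(".")
    needed = {
        ("USE CATALOG", catalog),
        ("USE SCHEMA", f"{catalog}.{schema}"),
        ("SELECT", table),
    }
    return needed <= grants.get(principal, set())


grants = {
    # SELECT alone is not enough without USE on the parents.
    "analyst": {("SELECT", "main.sales.orders")},
    "engineer": {
        ("USE CATALOG", "main"),
        ("USE SCHEMA", "main.sales"),
        ("SELECT", "main.sales.orders"),
    },
}

print(can_select(grants, "analyst", "main.sales.orders"))   # False
print(can_select(grants, "engineer", "main.sales.orders"))  # True
```

Row filters, column masks, and ABAC policies then narrow what an already-authorized reader sees; in this model they would run after `can_select` returns True.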
ABAC vs Table Controls
ABAC
- Tag matched
- Central policy
- Owner cannot remove
Table controls
- Per-table setup
- ALTER TABLE
- Owner managed
Central vs local
Common Traps
Current weights
May guide omits weights ≠ Avoid legacy percentages
COPY vs Auto Loader
COPY is batch ≠ Auto Loader streams
Rescue vs evolve
Rescue captures drift ≠ addNewColumns evolves
View vs MV
View stores logic ≠ MV stores results
External drop
Table metadata drops ≠ Files remain
Privilege stack
Need USE parents ≠ SELECT reads table
ABAC access
ABAC restricts only ≠ GRANT still required
VACUUM risk
Cleans old files ≠ Breaks old time travel
Triggers
Schedule uses clock ≠ Data trigger waits
Last Minute
- 1. 45 scored; 90 minutes
- 2. May guide: no weights
- 3. SQL when possible
- 4. Auto Loader = cloudFiles
- 5. COPY INTO = batch
- 6. Bronze raw; Silver clean
- 7. Gold = BI outputs
- 8. Jobs are DAG tasks
- 9. DAB deploys assets
- 10. Spark UI finds skew
- 11. UC governs data
- 12. Masks values; filters rows
- 13. ABAC needs base grants
- 14. Delta Sharing is read-only
