1.2 Databricks Workspace and Components

Key Takeaways

  • A Databricks workspace is the primary environment for accessing all Databricks data, code, and infrastructure assets.
  • Key workspace components include notebooks, clusters, jobs, SQL warehouses, repos (Git folders), and the Unity Catalog explorer.
  • Notebooks support multiple languages (Python, SQL, Scala, R) and allow mixing languages within a single notebook using magic commands (%python, %sql, %scala, %r).
  • The control plane (managed by Databricks) handles the web UI, notebooks, and job scheduling, while the data plane (in your cloud account) runs compute and stores data.
  • Databricks SQL warehouses are optimized compute resources specifically for running SQL queries and powering BI dashboards.
Last updated: March 2026

Databricks Workspace and Components

Quick Answer: A Databricks workspace is a cloud-based environment containing notebooks, clusters, SQL warehouses, jobs, Git folders, and the Unity Catalog explorer. The workspace operates across a control plane (managed by Databricks) and a data plane (in your cloud account).

Workspace Architecture: Control Plane vs. Data Plane

Databricks separates its architecture into two planes:

Control Plane (Managed by Databricks)

  • Web application (UI)
  • Notebook server
  • Job scheduler and orchestration
  • Cluster management and metadata
  • User authentication and workspace settings

Data Plane (In Your Cloud Account)

  • Compute resources (clusters and SQL warehouses)
  • Data storage (cloud object storage: S3, ADLS, GCS)
  • Data processing and query execution
  • Network connectivity to your data sources

On the Exam: Understand that your data never leaves your cloud account. The control plane sends instructions to the data plane, but data processing happens entirely within your infrastructure. This is critical for compliance and data sovereignty.

Key Workspace Components

1. Notebooks

Notebooks are the primary development interface in Databricks:

| Feature | Description |
| --- | --- |
| Multi-language | Python, SQL, Scala, R in a single notebook |
| Magic commands | %python, %sql, %scala, %r to switch languages per cell |
| %run | Execute another notebook as if its code were in the current notebook |
| %fs | Run DBFS (Databricks File System) commands |
| %sh | Execute shell commands on the driver node |
| Widgets | Parameterize notebooks with dropdown, text, and multiselect inputs |
| Revision history | Built-in version history for all notebook changes |
%sql
-- Example: SQL magic command in a Python notebook
-- (the magic command must be the first line of the cell)
SELECT * FROM my_catalog.my_schema.sales_data
WHERE order_date >= '2026-01-01'
LIMIT 10
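The widgets feature in the table above can be sketched in Python. In a real notebook the `dbutils` object is supplied by the runtime; the stub class here is a hypothetical stand-in so the sketch runs anywhere, and the query-building code is illustrative, not a Databricks API.

```python
# Minimal stand-in for the notebook-provided dbutils.widgets API so this
# sketch runs outside Databricks. In a notebook, call dbutils.widgets directly.
class _WidgetsStub:
    def __init__(self):
        self._values = {}

    def text(self, name, default_value, label=None):
        # dbutils.widgets.text(name, defaultValue, label) creates a text input
        self._values.setdefault(name, default_value)

    def get(self, name):
        # dbutils.widgets.get(name) returns the widget's current value
        return self._values[name]

widgets = _WidgetsStub()

# Create a text widget with a default value, then read it back
# to parameterize a query.
widgets.text("min_order_date", "2026-01-01", "Earliest order date")
min_date = widgets.get("min_order_date")

query = (
    "SELECT * FROM my_catalog.my_schema.sales_data "
    f"WHERE order_date >= '{min_date}' LIMIT 10"
)
print(query)
```

In a notebook, changing the widget value in the UI and re-running the cell re-executes the query with the new parameter.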

2. Clusters (Compute Resources)

Clusters are groups of VMs that execute your notebooks and jobs:

| Cluster Type | Use Case | Key Feature |
| --- | --- | --- |
| All-purpose | Interactive development and exploration | Shared; always on or auto-terminated |
| Job clusters | Automated job execution | Created for a job run, terminated after completion |
| SQL warehouses | SQL queries and BI dashboards | Optimized for SQL; supports serverless |

Cluster Configuration Options:

  • Single node vs. multi-node: Single node for small workloads; multi-node for distributed processing
  • Autoscaling: Automatically adds/removes worker nodes based on workload
  • Spot instances: Use discounted cloud instances for cost savings (with potential interruptions)
  • Runtime version: Databricks Runtime includes Apache Spark + optimizations + libraries
  • Photon acceleration: Enable the Photon engine for faster SQL and DataFrame operations
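The options above map onto the JSON payload you send when creating a cluster programmatically. A minimal sketch, assuming illustrative values: the field names follow the Databricks Clusters API, but the runtime version and node type strings are placeholders you would replace with values valid in your workspace.

```python
import json

# Illustrative create-cluster payload. Field names follow the Clusters API;
# the spark_version and node_type_id values are placeholders.
cluster_spec = {
    "cluster_name": "etl-dev",
    "spark_version": "15.4.x-scala2.12",  # Databricks Runtime version (placeholder)
    "node_type_id": "i3.xlarge",          # cloud-specific VM type (placeholder)
    "autoscale": {                        # worker count floats within this range
        "min_workers": 2,
        "max_workers": 8,
    },
    "autotermination_minutes": 30,        # terminate after 30 idle minutes
    "runtime_engine": "PHOTON",           # enable Photon acceleration
}

payload = json.dumps(cluster_spec, indent=2)
print(payload)
```

Omitting `autoscale` and setting a fixed `num_workers` instead gives a static cluster; `"num_workers": 0` with a single-node configuration covers the single-node case.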

3. SQL Warehouses

SQL warehouses are compute resources optimized specifically for SQL workloads:

  • Classic SQL warehouses: VMs in your cloud account
  • Serverless SQL warehouses: Managed by Databricks for instant startup and auto-scaling
  • Pro SQL warehouses: Support for Lakehouse Federation (federated queries to external databases)
  • Used by Databricks SQL, BI tools, and partners for direct Lakehouse queries
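From Python, queries against a SQL warehouse typically go through the `databricks-sql-connector` package. A hedged sketch: the hostname, token, and warehouse ID below are placeholders, and the HTTP path follows the `/sql/1.0/warehouses/<id>` pattern shown in a warehouse's connection details.

```python
def warehouse_http_path(warehouse_id: str) -> str:
    """Build the HTTP path a SQL warehouse lists in its connection details."""
    return f"/sql/1.0/warehouses/{warehouse_id}"

def run_query(hostname: str, warehouse_id: str, token: str, query: str):
    """Sketch of querying a SQL warehouse with databricks-sql-connector
    (pip install databricks-sql-connector). Requires real credentials."""
    from databricks import sql  # lazy import: sketch loads without the package
    with sql.connect(
        server_hostname=hostname,
        http_path=warehouse_http_path(warehouse_id),
        access_token=token,
    ) as conn:
        with conn.cursor() as cursor:
            cursor.execute(query)
            return cursor.fetchall()

print(warehouse_http_path("abc123"))  # → /sql/1.0/warehouses/abc123
```

BI tools such as Power BI or Tableau use the same hostname and HTTP path in their native Databricks connectors.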

4. Git Folders (Repos)

Git folders integrate version control directly into the workspace:

  • Clone repositories from GitHub, GitLab, Bitbucket, Azure DevOps
  • Branch, commit, push, and pull directly from the Databricks UI
  • Sync notebooks and code files with your Git provider
  • Support for .py, .sql, .scala, .r files alongside notebooks

5. Databricks File System (DBFS)

DBFS is an abstraction layer over cloud object storage:

  • Provides a familiar file-system interface (/mnt/, /tmp/, /FileStore/)
  • Backed by your cloud storage account
  • /FileStore/ is for files accessible via the web UI (plots, small uploads)
  • Best Practice: Use Unity Catalog managed locations instead of DBFS mounts for governance
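The /FileStore/ point can be made concrete: per the documented FileStore behavior, files stored under /FileStore/ are served from the workspace at a /files/ URL. The helper below is illustrative, and the workspace URL is a placeholder.

```python
def filestore_url(workspace_url: str, dbfs_path: str) -> str:
    """Map a DBFS /FileStore path to the web URL it is served from.

    Files under /FileStore/ are reachable at <workspace>/files/...;
    other DBFS paths are not browsable via the web UI this way.
    """
    prefix = "/FileStore/"
    if not dbfs_path.startswith(prefix):
        raise ValueError("only /FileStore/ paths are served via the web UI")
    return workspace_url.rstrip("/") + "/files/" + dbfs_path[len(prefix):]

# Hypothetical workspace URL and file path for illustration:
print(filestore_url("https://example.cloud.databricks.com",
                    "/FileStore/plots/sales.png"))
# → https://example.cloud.databricks.com/files/plots/sales.png
```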

6. Catalog Explorer

The Catalog Explorer in the workspace UI lets you:

  • Browse the three-level namespace: catalog → schema → table/view/volume
  • View table schemas, sample data, and column-level metadata
  • Explore data lineage graphs
  • Manage permissions (GRANT/REVOKE)
  • Search for data assets across the organization
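The three-level namespace above can be sketched as a small parser. The helper is illustrative (not a Databricks API), but the catalog → schema → table structure it enforces is exactly what Unity Catalog fully qualified names use.

```python
from typing import NamedTuple

class TableName(NamedTuple):
    catalog: str
    schema: str
    table: str

def parse_table_name(fqn: str) -> TableName:
    """Split a fully qualified Unity Catalog name into its three levels."""
    parts = fqn.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.table, got: {fqn!r}")
    return TableName(*parts)

name = parse_table_name("my_catalog.my_schema.sales_data")
print(name.catalog, name.schema, name.table)
# → my_catalog my_schema sales_data
```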
[Diagram: Databricks Workspace Architecture]
Test Your Knowledge

1. Which plane in the Databricks architecture is responsible for executing compute workloads and processing data?
2. Which magic command would you use to run a shell command on the driver node in a Databricks notebook?
3. What is the key difference between an all-purpose cluster and a job cluster in Databricks?
4. A data engineer wants to connect a BI tool directly to Databricks for SQL analytics. Which compute resource should they use?
D