1.2 Databricks Workspace and Components

Key Takeaways

  • A Databricks workspace is the primary environment for accessing all Databricks data, code, and infrastructure assets.
  • Key workspace components include notebooks, clusters, jobs, SQL warehouses, repos (Git folders), and the Unity Catalog explorer.
  • Notebooks support multiple languages (Python, SQL, Scala, R) and allow mixing languages within a single notebook using magic commands (%python, %sql, %scala, %r).
  • The control plane (managed by Databricks) handles the web UI, notebooks, and job scheduling, while the data plane (in your cloud account) runs compute and stores data.
  • Databricks SQL warehouses are optimized compute resources specifically for running SQL queries and powering BI dashboards.
Last updated: March 2026

Databricks Workspace and Components

Quick Answer: A Databricks workspace is a cloud-based environment containing notebooks, clusters, SQL warehouses, jobs, Git folders, and the Unity Catalog explorer. The workspace operates across a control plane (managed by Databricks) and a data plane (in your cloud account).

Workspace Architecture: Control Plane vs. Data Plane

Databricks separates its architecture into two planes:

Control Plane (Managed by Databricks)

  • Web application (UI)
  • Notebook server
  • Job scheduler and orchestration
  • Cluster management and metadata
  • User authentication and workspace settings

Data Plane (In Your Cloud Account)

  • Compute resources (clusters and SQL warehouses)
  • Data storage (cloud object storage: S3, ADLS, GCS)
  • Data processing and query execution
  • Network connectivity to your data sources

On the Exam: Understand that your data never leaves your cloud account. The control plane sends instructions to the data plane, but data processing happens entirely within your infrastructure. This is critical for compliance and data sovereignty.

Key Workspace Components

1. Notebooks

Notebooks are the primary development interface in Databricks:

| Feature | Description |
| --- | --- |
| Multi-language | Python, SQL, Scala, R in a single notebook |
| Magic commands | %python, %sql, %scala, %r to switch languages per cell |
| %run | Execute another notebook as if its code were in the current notebook |
| %fs | Run DBFS (Databricks File System) commands |
| %sh | Execute shell commands on the driver node |
| Widgets | Parameterize notebooks with dropdown, text, and multiselect inputs |
| Revision history | Built-in version history for all notebook changes |
%sql
-- Example: SQL magic command in a Python notebook
-- (the magic command must be the first line of the cell)
SELECT * FROM my_catalog.my_schema.sales_data
WHERE order_date >= '2026-01-01'
LIMIT 10
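The widgets feature in the table above can be sketched in Python. In a real notebook the `dbutils` object is supplied by the runtime; the stub class here is a hypothetical stand-in so the sketch runs anywhere, and the query-building code is illustrative, not a Databricks API.

```python
# Minimal stand-in for the notebook-provided dbutils.widgets API so this
# sketch runs outside Databricks. In a notebook, call dbutils.widgets directly.
class _WidgetsStub:
    def __init__(self):
        self._values = {}

    def text(self, name, default_value, label=None):
        # dbutils.widgets.text(name, defaultValue, label) creates a text input
        self._values.setdefault(name, default_value)

    def get(self, name):
        # dbutils.widgets.get(name) returns the widget's current value
        return self._values[name]

widgets = _WidgetsStub()

# Create a text widget with a default value, then read it back
# to parameterize a query.
widgets.text("min_order_date", "2026-01-01", "Earliest order date")
min_date = widgets.get("min_order_date")

query = (
    "SELECT * FROM my_catalog.my_schema.sales_data "
    f"WHERE order_date >= '{min_date}' LIMIT 10"
)
print(query)
```

In a notebook, changing the widget value in the UI and re-running the cell re-executes the query with the new parameter.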

2. Clusters (Compute Resources)

Clusters are groups of VMs that execute your notebooks and jobs:

| Cluster Type | Use Case | Key Feature |
| --- | --- | --- |
| All-purpose | Interactive development and exploration | Shared; always on or auto-terminated |
| Job clusters | Automated job execution | Created for a job run, terminated after completion |
| SQL warehouses | SQL queries and BI dashboards | Optimized for SQL; supports serverless |

Cluster Configuration Options:

  • Single node vs. multi-node: Single node for small workloads; multi-node for distributed processing
  • Autoscaling: Automatically adds/removes worker nodes based on workload
  • Spot instances: Use discounted cloud instances for cost savings (with potential interruptions)
  • Runtime version: Databricks Runtime includes Apache Spark + optimizations + libraries
  • Photon acceleration: Enable the Photon engine for faster SQL and DataFrame operations
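The options above map onto the JSON payload you send when creating a cluster programmatically. A minimal sketch, assuming illustrative values: the field names follow the Databricks Clusters API, but the runtime version and node type strings are placeholders you would replace with values valid in your workspace.

```python
import json

# Illustrative create-cluster payload. Field names follow the Clusters API;
# the spark_version and node_type_id values are placeholders.
cluster_spec = {
    "cluster_name": "etl-dev",
    "spark_version": "15.4.x-scala2.12",  # Databricks Runtime version (placeholder)
    "node_type_id": "i3.xlarge",          # cloud-specific VM type (placeholder)
    "autoscale": {                        # worker count floats within this range
        "min_workers": 2,
        "max_workers": 8,
    },
    "autotermination_minutes": 30,        # terminate after 30 idle minutes
    "runtime_engine": "PHOTON",           # enable Photon acceleration
}

payload = json.dumps(cluster_spec, indent=2)
print(payload)
```

Omitting `autoscale` and setting a fixed `num_workers` instead gives a static cluster; `"num_workers": 0` with a single-node configuration covers the single-node case.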

3. SQL Warehouses

SQL warehouses are compute resources optimized specifically for SQL workloads:

  • Classic SQL warehouses: VMs in your cloud account
  • Serverless SQL warehouses: Managed by Databricks for instant startup and auto-scaling
  • Pro SQL warehouses: Support for Lakehouse Federation (federated queries to external databases)
  • Used by Databricks SQL, BI tools, and partners for direct Lakehouse queries
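From Python, queries against a SQL warehouse typically go through the `databricks-sql-connector` package. A hedged sketch: the hostname, token, and warehouse ID below are placeholders, and the HTTP path follows the `/sql/1.0/warehouses/<id>` pattern shown in a warehouse's connection details.

```python
def warehouse_http_path(warehouse_id: str) -> str:
    """Build the HTTP path a SQL warehouse lists in its connection details."""
    return f"/sql/1.0/warehouses/{warehouse_id}"

def run_query(hostname: str, warehouse_id: str, token: str, query: str):
    """Sketch of querying a SQL warehouse with databricks-sql-connector
    (pip install databricks-sql-connector). Requires real credentials."""
    from databricks import sql  # lazy import: sketch loads without the package
    with sql.connect(
        server_hostname=hostname,
        http_path=warehouse_http_path(warehouse_id),
        access_token=token,
    ) as conn:
        with conn.cursor() as cursor:
            cursor.execute(query)
            return cursor.fetchall()

print(warehouse_http_path("abc123"))  # → /sql/1.0/warehouses/abc123
```

BI tools such as Power BI or Tableau use the same hostname and HTTP path in their native Databricks connectors.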

4. Git Folders (Repos)

Git folders integrate version control directly into the workspace:

  • Clone repositories from GitHub, GitLab, Bitbucket, Azure DevOps
  • Branch, commit, push, and pull directly from the Databricks UI
  • Sync notebooks and code files with your Git provider
  • Support for .py, .sql, .scala, .r files alongside notebooks

5. Databricks File System (DBFS)

DBFS is an abstraction layer over cloud object storage:

  • Provides a familiar file-system interface (/mnt/, /tmp/, /FileStore/)
  • Backed by your cloud storage account
  • /FileStore/ is for files accessible via the web UI (plots, small uploads)
  • Best Practice: Use Unity Catalog managed locations instead of DBFS mounts for governance
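The /FileStore/ point can be made concrete: per the documented FileStore behavior, files stored under /FileStore/ are served from the workspace at a /files/ URL. The helper below is illustrative, and the workspace URL is a placeholder.

```python
def filestore_url(workspace_url: str, dbfs_path: str) -> str:
    """Map a DBFS /FileStore path to the web URL it is served from.

    Files under /FileStore/ are reachable at <workspace>/files/...;
    other DBFS paths are not browsable via the web UI this way.
    """
    prefix = "/FileStore/"
    if not dbfs_path.startswith(prefix):
        raise ValueError("only /FileStore/ paths are served via the web UI")
    return workspace_url.rstrip("/") + "/files/" + dbfs_path[len(prefix):]

# Hypothetical workspace URL and file path for illustration:
print(filestore_url("https://example.cloud.databricks.com",
                    "/FileStore/plots/sales.png"))
# → https://example.cloud.databricks.com/files/plots/sales.png
```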

6. Catalog Explorer

The Catalog Explorer in the workspace UI lets you:

  • Browse the three-level namespace: catalog → schema → table/view/volume
  • View table schemas, sample data, and column-level metadata
  • Explore data lineage graphs
  • Manage permissions (GRANT/REVOKE)
  • Search for data assets across the organization
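The three-level namespace above can be sketched as a small parser. The helper is illustrative (not a Databricks API), but the catalog → schema → table structure it enforces is exactly what Unity Catalog fully qualified names use.

```python
from typing import NamedTuple

class TableName(NamedTuple):
    catalog: str
    schema: str
    table: str

def parse_table_name(fqn: str) -> TableName:
    """Split a fully qualified Unity Catalog name into its three levels."""
    parts = fqn.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.table, got: {fqn!r}")
    return TableName(*parts)

name = parse_table_name("my_catalog.my_schema.sales_data")
print(name.catalog, name.schema, name.table)
# → my_catalog my_schema sales_data
```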
[Diagram: Databricks Workspace Architecture]
Test Your Knowledge

1. Which plane in the Databricks architecture is responsible for executing compute workloads and processing data?
2. Which magic command would you use to run a shell command on the driver node in a Databricks notebook?
3. What is the key difference between an all-purpose cluster and a job cluster in Databricks?
4. A data engineer wants to connect a BI tool directly to Databricks for SQL analytics. Which compute resource should they use?
D