1.2 Databricks Workspace and Components

Key Takeaways

  • The workspace is the web UI organizing notebooks, queries, dashboards, Git folders, and access to compute, jobs, and data.
  • Notebooks contain runnable cells. Magic commands switch a cell's language (%python, %sql, %scala, %r) or perform auxiliary actions (%md renders Markdown, %run executes another notebook).
  • Catalog Explorer browses Unity Catalog assets: catalogs, schemas, tables, views, volumes, functions, and registered models.
  • Lakeflow Jobs orchestrate and schedule multi-task workflows; Lakeflow Spark Declarative Pipelines build managed ETL.
  • DBFS is a deprecated legacy pattern; Databricks now recommends Unity Catalog volumes, external locations, and workspace files.
Last updated: June 2026

The Workspace as the Control Plane

The workspace is the browser-based environment where you do all data-engineering work on Databricks. The left sidebar groups everything by function: Workspace (your folders and notebooks), Catalog (data assets), Jobs & Pipelines (orchestration), Compute (clusters and warehouses), and SQL (queries and dashboards). The workspace persists your code and configuration; the actual data lives in cloud object storage, governed by Unity Catalog.

Notebooks

A notebook is a web document of runnable cells. Each cell runs a command and shows its output inline — a result table, a chart, or text. Notebooks are attached to a cluster or warehouse, which supplies the compute. They are polyglot: a default language is set per notebook, but you can switch language per cell with magic commands:

Magic command                   Purpose
%python / %sql / %scala / %r    Run that cell in the named language
%md                             Render Markdown documentation
%run ./other_notebook           Execute another notebook inline (modular code)
%sh                             Run a shell command on the driver node
%fs                             File-system utility shortcut (e.g., %fs ls)
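
As a rough illustration, the sketch below shows how these magic commands appear when a Python-default notebook is exported in Databricks' source-file format, where cells are separated by `# COMMAND ----------` and lines belonging to a magic cell are prefixed with `# MAGIC`. The variable name and the SQL query are made up for this example; `./other_notebook` is the path used in the table above.

```python
# Databricks notebook source
# Default language is Python, so this first cell needs no magic command.
greeting = "hello from the Python cell"
print(greeting)

# COMMAND ----------

# MAGIC %md
# MAGIC ### Notes
# MAGIC This cell renders as Markdown documentation.

# COMMAND ----------

# MAGIC %sql
# MAGIC SELECT current_date() AS today

# COMMAND ----------

# %run sits in a cell of its own and executes the other notebook inline.
# MAGIC %run ./other_notebook
```

Outside Databricks the magic cells are plain comments, so the file still runs as ordinary Python; inside the notebook editor you would simply type `%sql` or `%md` as the first line of a cell.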

Notebooks also expose dbutils — utilities for the file system (dbutils.fs), widgets for parameters (dbutils.widgets), secrets (dbutils.secrets), and notebook chaining (dbutils.notebook.run).
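
A minimal sketch of how these utilities might be combined is shown below. It assumes the `dbutils` object is passed in as an argument (inside a notebook you would pass the built-in `dbutils`), and the widget name, secret scope, and key are hypothetical placeholders rather than names from any real workspace.

```python
def read_run_settings(dbutils, scope="demo-scope", key="api-token"):
    """Collect a notebook parameter and a secret via dbutils.

    The widget name, scope, and key are hypothetical examples.
    """
    # Declare a text widget (name, default value, label), then read its value.
    dbutils.widgets.text("run_date", "2026-01-01", "Run date")
    run_date = dbutils.widgets.get("run_date")

    # Fetch a credential from a secret scope instead of hard-coding it.
    token = dbutils.secrets.get(scope=scope, key=key)

    # Report only whether a token was found; never return the secret itself.
    return {"run_date": run_date, "token_present": bool(token)}


def list_file_names(dbutils, directory):
    """Return the names of the entries that dbutils.fs.ls reports."""
    return [entry.name for entry in dbutils.fs.ls(directory)]
```

Passing `dbutils` explicitly is a design choice for this sketch: it keeps the functions testable with a stand-in object outside a notebook.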

Compute, Jobs, and Pipelines

Compute is where you create and manage clusters (Spark compute for notebooks and jobs) and SQL warehouses (compute for Databricks SQL). Clusters come in two flavors covered later: long-lived all-purpose compute for interactive development, and ephemeral job compute that the scheduler spins up per run and tears down after.

Under Jobs & Pipelines you find Databricks' orchestration tools, now branded Lakeflow:

  • Lakeflow Jobs (formerly Databricks Workflows) define multi-task workflows. A job is a DAG of tasks (notebooks, SQL, Python scripts, JARs, or pipeline runs) with dependencies, schedules (Quartz cron expressions), retries, and alerts.
  • Lakeflow Spark Declarative Pipelines (formerly Delta Live Tables / DLT) provide a declarative framework for building reliable ETL: you declare the target tables and transformations, and the runtime manages dependencies, incremental processing, and data-quality expectations.
  • Lakeflow Connect offers managed connectors for ingesting from external databases and SaaS applications.
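
To make the "DAG of tasks" idea concrete, here is a sketch of a job definition written as a Python dictionary, plus a helper that derives a valid run order from the declared dependencies. The field names follow the general shape of a Jobs API job-creation payload as best understood here and should be checked against the current API reference; the job name, notebook paths, and email address are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical job definition: three notebook tasks chained by dependencies.
job_spec = {
    "name": "nightly_sales_refresh",
    "schedule": {
        # Quartz syntax: seconds minutes hours day-of-month month day-of-week.
        "quartz_cron_expression": "0 0 2 * * ?",  # every day at 02:00
        "timezone_id": "UTC",
    },
    "email_notifications": {"on_failure": ["data-team@example.com"]},
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Workspace/demo/ingest"},
            "max_retries": 2,
        },
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "ingest"}],
            "notebook_task": {"notebook_path": "/Workspace/demo/transform"},
            "max_retries": 1,
        },
        {
            "task_key": "publish",
            "depends_on": [{"task_key": "transform"}],
            "notebook_task": {"notebook_path": "/Workspace/demo/publish"},
        },
    ],
}


def execution_order(spec):
    """Return task keys in an order that respects every depends_on edge."""
    graph = {
        task["task_key"]: {dep["task_key"] for dep in task.get("depends_on", [])}
        for task in spec["tasks"]
    }
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    return list(TopologicalSorter(graph).static_order())


print(execution_order(job_spec))
```

The scheduler does this dependency resolution for you; the helper only illustrates why a task with `depends_on` cannot start before its upstream tasks finish.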

Catalog Explorer and Data Assets

Catalog Explorer is the UI for browsing and managing Unity Catalog assets. It surfaces the full hierarchy — catalogs, schemas (databases), tables, views, volumes (non-tabular files), functions, and registered models — along with owners, permissions, lineage, and sample data. You manage grants and explore relationships here without writing SQL.
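
For comparison with the point-and-click route, the sketch below builds the kind of SQL statement that a table grant corresponds to, using Unity Catalog's three-level `catalog.schema.table` naming. The helper functions and the catalog, schema, table, and group names are hypothetical and exist only for illustration.

```python
def quote_ident(name):
    """Wrap an identifier in backticks, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def table_name(catalog, schema, table):
    """Build a three-level Unity Catalog name: catalog.schema.table."""
    return ".".join(quote_ident(part) for part in (catalog, schema, table))


def grant_select(catalog, schema, table, principal):
    """Return a GRANT statement giving a principal read access to one table."""
    target = table_name(catalog, schema, table)
    return f"GRANT SELECT ON TABLE {target} TO {quote_ident(principal)}"


print(grant_select("main", "sales", "orders", "analysts"))
```

Keep in mind that reading a table in Unity Catalog also depends on usage privileges on the parent catalog and schema, so a single table grant is usually only part of the picture.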

Storage: Volumes Replace DBFS

The Databricks File System (DBFS) root and mounts are a deprecated legacy pattern. Databricks now recommends governed storage:

  • Unity Catalog volumes — governed locations for non-tabular files (images, CSVs, models), addressed as /Volumes/catalog/schema/volume/....
  • External locations — registered cloud paths governed by Unity Catalog.
  • Workspace files — small files stored alongside notebooks in Git folders.

Accounts created after December 18, 2025 do not get DBFS root, mounts, or the Hive Metastore at all — they are Unity Catalog only. On the exam, treat volumes as the modern, governed answer for file storage.
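
The volume path convention can be sketched with a small helper that assembles `/Volumes/catalog/schema/volume/...` paths and flags legacy DBFS-style paths. Both function names and the example catalog, schema, and volume names are hypothetical; the prefix check is a simple heuristic, not an official rule.

```python
import posixpath


def volume_path(catalog, schema, volume, *parts):
    """Build a /Volumes/<catalog>/<schema>/<volume>/... path."""
    for label, value in (("catalog", catalog), ("schema", schema), ("volume", volume)):
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    # Strip stray slashes so a part like "/raw" cannot reset the joined path.
    cleaned = [part.strip("/") for part in parts]
    return posixpath.join("/Volumes", catalog, schema, volume, *cleaned)


def is_legacy_dbfs_path(path):
    """Heuristic: treat dbfs:/ URIs and /dbfs/ local paths as legacy DBFS."""
    return path.startswith(("dbfs:/", "/dbfs/"))


print(volume_path("main", "sales", "landing", "raw", "orders.csv"))
print(is_legacy_dbfs_path("dbfs:/mnt/raw/orders.csv"))
```

A check like `is_legacy_dbfs_path` could be used when reviewing older notebooks to find file references that should move to volumes.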

Control Plane vs Data Plane

Understanding where things run helps you reason about security and cost. Databricks splits its architecture into two planes. The control plane is managed by Databricks and hosts the workspace UI, notebooks, job configurations, query history, and cluster management — the metadata and orchestration of your work. The compute plane (data plane) is where clusters and warehouses actually run and where your data is processed. For classic compute the data plane lives in your own cloud account, so machines and storage stay within your subscription; for serverless compute it runs in Databricks-managed infrastructure.

Either way, your table data never lives in the control plane — it stays in your cloud object storage, governed by Unity Catalog. This separation is why notebooks and job definitions persist even when every cluster is terminated.

Tying the Workspace Together

A day of data engineering threads through these components. You open a notebook in a Git folder, attach it to all-purpose compute, and develop a transformation, reading source files from a Unity Catalog volume and writing Delta tables you inspect in Catalog Explorer. When the logic is ready, you wrap it in a Lakeflow Job that runs on cheaper job compute on a schedule, with retries and email alerts. Analysts then query the resulting Gold tables in Databricks SQL and build dashboards.

Every asset — folders, notebooks, jobs, queries, dashboards, and the data itself — is reachable from the one workspace sidebar, and all data access is mediated by Unity Catalog so permissions stay consistent no matter which surface a user comes through. Recognizing which sidebar area owns which task (Workspace for code, Catalog for data, Jobs & Pipelines for orchestration, Compute for clusters, SQL for analytics) is exactly the kind of orientation the associate exam checks.

A useful rule of thumb: if a question asks where to find or grant access to data, the answer involves Catalog Explorer and Unity Catalog; if it asks how to schedule or automate work, the answer involves Lakeflow Jobs and job compute; and if it asks where credentials or files belong, the answer is a secret scope or a Unity Catalog volume, never a hard-coded value or a DBFS mount.

Test Your Knowledge

A notebook's default language is Python, but a data engineer needs to run one cell as SQL. What is the correct approach?

Which Databricks UI is used to browse Unity Catalog catalogs, schemas, tables, volumes, functions, and registered models along with their owners and lineage?

Databricks now recommends storing non-tabular files (such as images or raw CSVs) in which governed location rather than DBFS root?

What is the role of Lakeflow Jobs in the workspace?
