2.8 Databricks Notebooks and Code Execution

Key Takeaways

  • Notebooks support cell-by-cell execution with multiple languages (Python, SQL, Scala, R) in the same notebook using magic commands.
  • The %run command executes another notebook and makes its variables, functions, and imports available in the calling notebook.
  • dbutils.notebook.run() executes a notebook as a child job and returns a string result, useful for modular orchestration.
  • Spark DataFrames are lazily evaluated — transformations are only executed when an action (count, show, collect, write) is triggered.
  • The display() function in Databricks notebooks renders DataFrames as interactive tables with sorting, filtering, and visualization capabilities.
Last updated: March 2026


Quick Answer: Notebooks support cell-by-cell execution with magic commands for multi-language support. Use %run for importing code, dbutils.notebook.run() for orchestrated execution, and understand lazy evaluation — transformations only execute when an action is called.

Multi-Language Support

Magic Commands

| Command | Purpose | Example |
| --- | --- | --- |
| %python | Run cell as Python | %python print("Hello") |
| %sql | Run cell as SQL | %sql SELECT * FROM orders |
| %scala | Run cell as Scala | %scala println("Hello") |
| %r | Run cell as R | %r print("Hello") |
| %md | Render cell as Markdown | %md # Title |
| %run | Execute another notebook | %run ./helpers |
| %sh | Run shell commands | %sh ls /tmp |
| %fs | Run DBFS commands | %fs ls /data/ |
| %pip | Install Python packages | %pip install pandas==2.0.0 |

Cross-Language Variable Sharing

Variables defined in one language cell are NOT automatically available in cells of another language. To share data:

# Python cell: register the DataFrame as a temp view
df.createOrReplaceTempView("my_data")

%sql
-- SQL cell: query the temp view
SELECT * FROM my_data WHERE amount > 100

Running Other Notebooks

%run (Inline Execution)

# Cell 1 (the %run command must be the only command in its cell):
%run ./helpers

# Cell 2: functions and variables defined in the helpers notebook are available
result = process_data(df)
  • Executes in the same Spark session and process
  • Variables and functions from the child notebook are available in the parent
  • Cannot pass parameters (use widgets instead)
  • Blocking: Waits for the child notebook to complete
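Since %run cannot take arguments, parameters are usually supplied through widgets that the notebook defines and reads. The sketch below mimics the dbutils.widgets calls with a minimal local stand-in, because the real dbutils object is provided by the Databricks runtime and is not importable elsewhere; the widget name and default are illustrative.

```python
# Minimal local stand-in for dbutils.widgets. In Databricks the runtime
# provides dbutils; this stub exists only so the sketch runs anywhere.
class _Widgets:
    def __init__(self):
        self._values = {}

    def text(self, name, default_value, label=None):
        # Create a text widget with a default (no-op if it already exists)
        self._values.setdefault(name, default_value)

    def get(self, name):
        # Return the widget's current value
        return self._values[name]

class _DBUtils:
    widgets = _Widgets()

dbutils = _DBUtils()  # stand-in only; remove in a real notebook

# Inside the notebook: declare a parameter and read it
dbutils.widgets.text("start_date", "2026-01-01", "Start Date")
start_date = dbutils.widgets.get("start_date")
print(start_date)  # the default, until a job run or the notebook UI overrides it
```

On Databricks, a job run or the widget UI supplies the value; the code reading the widget stays the same.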

dbutils.notebook.run() (Child Job Execution)

# Run a notebook as a separate child job
result = dbutils.notebook.run(
    path="./data_processing",
    timeout_seconds=600,
    arguments={"start_date": "2026-01-01", "catalog": "prod"}
)
print(f"Child notebook returned: {result}")
  • Executes in a separate process
  • Returns a string value (set via dbutils.notebook.exit())
  • Can pass parameters as key-value pairs
  • Useful for orchestrating multiple notebooks
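Because the call can only hand back a single string, a common pattern is for the child notebook to serialize a small result as JSON via dbutils.notebook.exit() and for the parent to parse it. In this sketch, a hypothetical stub stands in for dbutils.notebook.run() so the parsing logic runs anywhere; on Databricks you would call the real API with the same arguments.

```python
import json

# Hypothetical stub standing in for dbutils.notebook.run(). On Databricks,
# the child notebook would end with: dbutils.notebook.exit(json.dumps({...}))
def run_child_notebook(path, timeout_seconds, arguments):
    return json.dumps({"status": "ok", "rows_processed": 1250})

result = run_child_notebook(
    "./data_processing", 600, {"start_date": "2026-01-01"}
)

# The return value is always a string; parse it to recover structure
payload = json.loads(result)
if payload["status"] == "ok":
    print(f"Child processed {payload['rows_processed']} rows")
```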
| Feature | %run | dbutils.notebook.run() |
| --- | --- | --- |
| Execution context | Same process | Separate process |
| Variable sharing | Yes (shared scope) | No (only return string) |
| Parameters | Via widgets | Via arguments dictionary |
| Return value | None | String from notebook.exit() |
| Error handling | Fails parent notebook | Catchable exception |

Lazy Evaluation

Spark uses lazy evaluation — transformations are not executed immediately. They are only executed when an action is triggered.

Transformations (Lazy — Build Execution Plan)

# These lines do NOT execute yet — they build a plan
from pyspark.sql.functions import col, sum

df = spark.table("orders")                          # Read plan
filtered = df.filter(col("amount") > 100)           # Filter plan
grouped = filtered.groupBy("customer_id")           # Group plan
result = grouped.agg(sum("amount").alias("total"))  # Aggregation plan

Actions (Eager — Trigger Execution)

# These lines TRIGGER execution of the entire plan
result.show()           # Display results
result.count()          # Return row count
result.collect()        # Return all rows as list
result.write.saveAsTable("summary")  # Write to table
display(result)         # Databricks display function

Why Lazy Evaluation Matters

  1. Optimization: Spark optimizes the entire plan before executing (e.g., predicate pushdown)
  2. Efficiency: Only the data needed for the final result is read and processed
  3. Pipeline composition: You can chain many transformations without intermediate materializations
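Plain Python generators give a runtime-independent feel for the same idea: each step only describes work, and nothing runs until a terminal operation consumes the pipeline. This is an analogy, not Spark itself; Spark's optimizer does far more, such as reordering filters and pruning columns.

```python
calls = {"reads": 0}

def read_rows():
    # "Transformation": describes how rows are produced; reads nothing yet
    for amount in [50, 150, 300]:
        calls["reads"] += 1
        yield amount

rows = read_rows()                      # nothing read yet
big = (a for a in rows if a > 100)      # still nothing read ("filter")
doubled = (a * 2 for a in big)          # still nothing read ("map")
assert calls["reads"] == 0              # no work has happened so far

result = list(doubled)                  # "action": consumes the whole pipeline
print(result)                           # [300, 600]
assert calls["reads"] == 3              # the source was read only now
```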

On the Exam: Understand that transformations are lazy and actions are eager. Know the difference between %run (shared scope, no parameters) and dbutils.notebook.run() (separate process, accepts parameters, returns string).

Test Your Knowledge

  1. A data engineer defines a series of DataFrame transformations (filter, groupBy, agg) but does not see any output. What is the reason?
  2. What is the key difference between %run and dbutils.notebook.run()?
  3. How can data be shared between a Python cell and a SQL cell in the same Databricks notebook?
D