2.8 Databricks Notebooks and Code Execution
Key Takeaways
- Notebooks support cell-by-cell execution with multiple languages (Python, SQL, Scala, R) in the same notebook using magic commands.
- The %run command executes another notebook and makes its variables, functions, and imports available in the calling notebook.
- dbutils.notebook.run() executes a notebook as a child job and returns a string result, useful for modular orchestration.
- Spark DataFrames are lazily evaluated — transformations are only executed when an action (count, show, collect, write) is triggered.
- The display() function in Databricks notebooks renders DataFrames as interactive tables with sorting, filtering, and visualization capabilities.
Databricks Notebooks and Code Execution
Quick Answer: Notebooks support cell-by-cell execution with magic commands for multi-language support. Use %run for importing code, dbutils.notebook.run() for orchestrated execution, and understand lazy evaluation — transformations only execute when an action is called.
Multi-Language Support
Magic Commands
| Command | Purpose | Example |
|---|---|---|
| %python | Run cell as Python | %python\nprint("Hello") |
| %sql | Run cell as SQL | %sql\nSELECT * FROM orders |
| %scala | Run cell as Scala | %scala\nprintln("Hello") |
| %r | Run cell as R | %r\nprint("Hello") |
| %md | Render cell as Markdown | %md\n# Title |
| %run | Execute another notebook | %run ./helpers |
| %sh | Run shell commands | %sh ls /tmp |
| %fs | Run DBFS commands | %fs ls /data/ |
| %pip | Install Python packages | %pip install pandas==2.0.0 |
Cross-Language Variable Sharing
Variables defined in one language cell are NOT automatically available in cells of another language. However, all language cells share the same Spark session, so data can be exchanged through temporary views:
# Python: register DataFrame as a temp view
df.createOrReplaceTempView("my_data")
%sql
-- SQL: access the temp view
SELECT * FROM my_data WHERE amount > 100
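The idea behind temp views can be sketched outside Databricks: the languages never share variables, they share a common SQL catalog. In this toy analogy, sqlite3 stands in for the shared Spark session (the table name `my_data` mirrors the example above; nothing here is a Spark API):

```python
import sqlite3

# Toy analogy: register data into a shared SQL engine (the "Python cell"),
# then query it by name with no access to Python variables (the "SQL cell").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_data (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO my_data VALUES (?, ?)", [(1, 50.0), (2, 150.0)])

# "SQL cell": reaches the data only through the shared catalog
rows = conn.execute("SELECT id FROM my_data WHERE amount > 100").fetchall()
print(rows)  # [(2,)]
```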
Running Other Notebooks
%run (Inline Execution)
# Execute helpers notebook — all its variables/functions become available here
%run ./helpers
# Now you can use functions defined in the helpers notebook
result = process_data(df)
- Executes in the same Spark session and process
- Variables and functions from the child notebook are available in the parent
- Cannot pass parameters directly (the child notebook reads values via widgets instead)
- Blocking: Waits for the child notebook to complete
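A minimal sketch of the widgets pattern that %run relies on. A local dict stands in for dbutils.widgets so the flow runs outside Databricks; the real API calls are shown in comments:

```python
# Stand-in store for widget values (in Databricks this lives in dbutils.widgets)
_widgets = {}

def create_text_widget(name, default):
    # In Databricks: dbutils.widgets.text(name, default)
    _widgets.setdefault(name, default)

def get_widget(name):
    # In Databricks: dbutils.widgets.get(name)
    return _widgets[name]

# Parent notebook sets a widget value before calling %run ./helpers;
# the child notebook then reads the widget instead of receiving a parameter.
create_text_widget("start_date", "2026-01-01")
start_date = get_widget("start_date")
print(start_date)  # 2026-01-01
```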
dbutils.notebook.run() (Child Job Execution)
# Run a notebook as a separate child job
result = dbutils.notebook.run(
path="./data_processing",
timeout_seconds=600,
arguments={"start_date": "2026-01-01", "catalog": "prod"}
)
print(f"Child notebook returned: {result}")
- Executes in a separate process
- Returns a string value (set via dbutils.notebook.exit())
- Can pass parameters as key-value pairs
- Useful for orchestrating multiple notebooks
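Because dbutils.notebook.exit() accepts only a string, structured results are usually serialized as JSON. This sketch shows the round trip directly, since dbutils exists only inside Databricks (the payload fields are illustrative):

```python
import json

# Child side: serialize the result before exiting.
# In Databricks: dbutils.notebook.exit(json.dumps(payload))
payload = {"status": "ok", "rows_written": 1250}
returned = json.dumps(payload)  # the string dbutils.notebook.run() would hand back

# Parent side: parse the string result back into structured data
result = json.loads(returned)
print(result["status"], result["rows_written"])  # ok 1250
```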
| Feature | %run | dbutils.notebook.run() |
|---|---|---|
| Execution context | Same process | Separate process |
| Variable sharing | Yes (shared scope) | No (only return string) |
| Parameters | Via widgets | Via arguments dictionary |
| Return value | None | String from notebook.exit() |
| Error handling | Fails parent notebook | Catchable exception |
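The error-handling row in the table can be sketched as follows. run_child() is a hypothetical stand-in for dbutils.notebook.run(): a failed child surfaces as an exception the parent can catch, whereas a failure inside %run fails the parent directly:

```python
def run_child(succeed):
    # Stand-in for dbutils.notebook.run(): a child failure raises in the parent
    if not succeed:
        raise RuntimeError("child notebook failed")
    return "ok"

# Wrap the call in try/except to keep the parent notebook running
try:
    status = run_child(succeed=False)
except RuntimeError as exc:
    status = f"recovered: {exc}"

print(status)  # recovered: child notebook failed
```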
Lazy Evaluation
Spark uses lazy evaluation: transformations are not executed immediately. Instead, they are recorded into a logical plan, which runs only when an action is triggered.
Transformations (Lazy — Build Execution Plan)
from pyspark.sql.functions import col, sum  # note: sum here shadows Python's builtin

# These lines do NOT execute yet; they only build a plan
df = spark.table("orders")                          # Read plan
filtered = df.filter(col("amount") > 100)           # Filter plan
grouped = filtered.groupBy("customer_id")           # Group plan
result = grouped.agg(sum("amount").alias("total"))  # Aggregation plan
Actions (Eager — Trigger Execution)
# These lines TRIGGER execution of the entire plan
result.show() # Display results
result.count() # Return row count
result.collect() # Return all rows to the driver as a list
result.write.saveAsTable("summary") # Write to table
display(result) # Databricks display function
Why Lazy Evaluation Matters
- Optimization: Spark optimizes the entire plan before executing (e.g., predicate pushdown)
- Efficiency: Only the data needed for the final result is read and processed
- Pipeline composition: You can chain many transformations without intermediate materializations
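The recorded-plan idea can be modeled in a few lines of plain Python. This is a toy model, not Spark: transformations only append steps to a plan, and an action replays the whole plan at once:

```python
class LazyFrame:
    """Toy model of lazy evaluation: transformations record steps,
    actions execute the entire recorded plan."""

    def __init__(self, rows, plan=None):
        self.rows = rows
        self.plan = plan or []  # list of functions, applied in order

    def filter(self, pred):
        # Transformation: nothing runs, a step is appended to the plan
        return LazyFrame(self.rows, self.plan + [lambda rs: [r for r in rs if pred(r)]])

    def collect(self):
        # Action: replay every recorded step against the data
        rs = self.rows
        for step in self.plan:
            rs = step(rs)
        return rs

df = LazyFrame([50, 120, 300])
filtered = df.filter(lambda x: x > 100)  # nothing executed yet
print(filtered.collect())                # [120, 300]
```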
On the Exam: Understand that transformations are lazy and actions are eager. Know the difference between %run (shared scope, no parameters) and dbutils.notebook.run() (separate process, accepts parameters, returns string).
Review Questions
- A data engineer defines a series of DataFrame transformations (filter, groupBy, agg) but does not see any output. What is the reason?
- What is the key difference between %run and dbutils.notebook.run()?
- How can data be shared between a Python cell and a SQL cell in the same Databricks notebook?