2.8 Databricks Notebooks and Code Execution
Key Takeaways
- Notebooks support cell-by-cell execution with multiple languages (Python, SQL, Scala, R) in the same notebook using magic commands.
- The %run command executes another notebook and makes its variables, functions, and imports available in the calling notebook.
- dbutils.notebook.run() executes a notebook as a child job and returns a string result, useful for modular orchestration.
- Spark DataFrames are lazily evaluated — transformations are only executed when an action (count, show, collect, write) is triggered.
- The display() function in Databricks notebooks renders DataFrames as interactive tables with sorting, filtering, and visualization capabilities.
Databricks Notebooks and Code Execution
Quick Answer: Notebooks support cell-by-cell execution with magic commands for multi-language support. Use %run for importing code, dbutils.notebook.run() for orchestrated execution, and understand lazy evaluation — transformations only execute when an action is called.
Multi-Language Support
Magic Commands
| Command | Purpose | Example |
|---|---|---|
| %python | Run cell as Python | %python\nprint("Hello") |
| %sql | Run cell as SQL | %sql\nSELECT * FROM orders |
| %scala | Run cell as Scala | %scala\nprintln("Hello") |
| %r | Run cell as R | %r\nprint("Hello") |
| %md | Render cell as Markdown | %md\n# Title |
| %run | Execute another notebook | %run ./helpers |
| %sh | Run shell commands | %sh ls /tmp |
| %fs | Run DBFS commands | %fs ls /data/ |
| %pip | Install Python packages | %pip install pandas==2.0.0 |
Cross-Language Variable Sharing
Variables defined in one language cell are NOT automatically available in cells of another language. However, all language cells share the same Spark session, so data can be exchanged through temporary views:
# Python: register DataFrame as a temp view
df.createOrReplaceTempView("my_data")
%sql
-- SQL: access the temp view
SELECT * FROM my_data WHERE amount > 100
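The idea behind temp views can be sketched outside Databricks: the languages never share variables, they share a common SQL catalog. In this toy analogy, sqlite3 stands in for the shared Spark session (the table name `my_data` mirrors the example above; nothing here is a Spark API):

```python
import sqlite3

# Toy analogy: register data into a shared SQL engine (the "Python cell"),
# then query it by name with no access to Python variables (the "SQL cell").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_data (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO my_data VALUES (?, ?)", [(1, 50.0), (2, 150.0)])

# "SQL cell": reaches the data only through the shared catalog
rows = conn.execute("SELECT id FROM my_data WHERE amount > 100").fetchall()
print(rows)  # [(2,)]
```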
Running Other Notebooks
%run (Inline Execution)
# Execute helpers notebook — all its variables/functions become available here
%run ./helpers
# Now you can use functions defined in the helpers notebook
result = process_data(df)
- Executes in the same Spark session and process
- Variables and functions from the child notebook are available in the parent
- Cannot pass parameters directly (the child notebook reads values via widgets instead)
- Blocking: Waits for the child notebook to complete
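A minimal sketch of the widgets pattern that %run relies on. A local dict stands in for dbutils.widgets so the flow runs outside Databricks; the real API calls are shown in comments:

```python
# Stand-in store for widget values (in Databricks this lives in dbutils.widgets)
_widgets = {}

def create_text_widget(name, default):
    # In Databricks: dbutils.widgets.text(name, default)
    _widgets.setdefault(name, default)

def get_widget(name):
    # In Databricks: dbutils.widgets.get(name)
    return _widgets[name]

# Parent notebook sets a widget value before calling %run ./helpers;
# the child notebook then reads the widget instead of receiving a parameter.
create_text_widget("start_date", "2026-01-01")
start_date = get_widget("start_date")
print(start_date)  # 2026-01-01
```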
dbutils.notebook.run() (Child Job Execution)
# Run a notebook as a separate child job
result = dbutils.notebook.run(
path="./data_processing",
timeout_seconds=600,
arguments={"start_date": "2026-01-01", "catalog": "prod"}
)
print(f"Child notebook returned: {result}")
- Executes in a separate process
- Returns a string value (set via dbutils.notebook.exit())
- Can pass parameters as key-value pairs
- Useful for orchestrating multiple notebooks
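Because dbutils.notebook.exit() accepts only a string, structured results are usually serialized as JSON. This sketch shows the round trip directly, since dbutils exists only inside Databricks (the payload fields are illustrative):

```python
import json

# Child side: serialize the result before exiting.
# In Databricks: dbutils.notebook.exit(json.dumps(payload))
payload = {"status": "ok", "rows_written": 1250}
returned = json.dumps(payload)  # the string dbutils.notebook.run() would hand back

# Parent side: parse the string result back into structured data
result = json.loads(returned)
print(result["status"], result["rows_written"])  # ok 1250
```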
| Feature | %run | dbutils.notebook.run() |
|---|---|---|
| Execution context | Same process | Separate process |
| Variable sharing | Yes (shared scope) | No (only return string) |
| Parameters | Via widgets | Via arguments dictionary |
| Return value | None | String from notebook.exit() |
| Error handling | Fails parent notebook | Catchable exception |
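The error-handling row in the table can be sketched as follows. run_child() is a hypothetical stand-in for dbutils.notebook.run(): a failed child surfaces as an exception the parent can catch, whereas a failure inside %run fails the parent directly:

```python
def run_child(succeed):
    # Stand-in for dbutils.notebook.run(): a child failure raises in the parent
    if not succeed:
        raise RuntimeError("child notebook failed")
    return "ok"

# Wrap the call in try/except to keep the parent notebook running
try:
    status = run_child(succeed=False)
except RuntimeError as exc:
    status = f"recovered: {exc}"

print(status)  # recovered: child notebook failed
```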
Lazy Evaluation
Spark uses lazy evaluation: transformations are not executed immediately. Instead, they are recorded into a logical plan, which runs only when an action is triggered.
Transformations (Lazy — Build Execution Plan)
from pyspark.sql.functions import col, sum  # note: sum here shadows Python's builtin

# These lines do NOT execute yet; they only build a plan
df = spark.table("orders")                          # Read plan
filtered = df.filter(col("amount") > 100)           # Filter plan
grouped = filtered.groupBy("customer_id")           # Group plan
result = grouped.agg(sum("amount").alias("total"))  # Aggregation plan
Actions (Eager — Trigger Execution)
# These lines TRIGGER execution of the entire plan
result.show() # Display results
result.count() # Return row count
result.collect() # Return all rows to the driver as a list
result.write.saveAsTable("summary") # Write to table
display(result) # Databricks display function
Why Lazy Evaluation Matters
- Optimization: Spark optimizes the entire plan before executing (e.g., predicate pushdown)
- Efficiency: Only the data needed for the final result is read and processed
- Pipeline composition: You can chain many transformations without intermediate materializations
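The recorded-plan idea can be modeled in a few lines of plain Python. This is a toy model, not Spark: transformations only append steps to a plan, and an action replays the whole plan at once:

```python
class LazyFrame:
    """Toy model of lazy evaluation: transformations record steps,
    actions execute the entire recorded plan."""

    def __init__(self, rows, plan=None):
        self.rows = rows
        self.plan = plan or []  # list of functions, applied in order

    def filter(self, pred):
        # Transformation: nothing runs, a step is appended to the plan
        return LazyFrame(self.rows, self.plan + [lambda rs: [r for r in rs if pred(r)]])

    def collect(self):
        # Action: replay every recorded step against the data
        rs = self.rows
        for step in self.plan:
            rs = step(rs)
        return rs

df = LazyFrame([50, 120, 300])
filtered = df.filter(lambda x: x > 100)  # nothing executed yet
print(filtered.collect())                # [120, 300]
```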
On the Exam: Understand that transformations are lazy and actions are eager. Know the difference between %run (shared scope, no parameters) and dbutils.notebook.run() (separate process, accepts parameters, returns string).
Review Questions
- A data engineer defines a series of DataFrame transformations (filter, groupBy, agg) but does not see any output. What is the reason?
- What is the key difference between %run and dbutils.notebook.run()?
- How can data be shared between a Python cell and a SQL cell in the same Databricks notebook?