A data engineer drops a managed Delta table from Unity Catalog. What happens to the underlying data files?

The data files are deleted along with the table metadata. When a managed table is dropped, both the metadata (catalog entry) and the underlying data files are deleted. This is the key difference from external tables, where dropping only removes the metadata while data files remain at the specified LOCATION.

Which SQL syntax correctly reads a JSON file directly in a Spark SQL query?

SELECT * FROM json.`/data/events.json`. Spark SQL uses the backtick notation to read files directly: SELECT * FROM format.`path`. This works for json, csv, parquet, delta, and other supported formats.

How do you access a global temporary view named "shared_data" in a SQL query?

SELECT * FROM global_temp.shared_data. Global temporary views must be referenced with the global_temp schema prefix: SELECT * FROM global_temp.view_name. Regular temporary views are referenced without a schema prefix.

Reading Data with Spark SQL and PySpark

Quick Answer: Spark reads data from files using format-specific readers (spark.read.format()) or SQL commands (SELECT, CREATE TABLE). Data can be read from Parquet, Delta, JSON, CSV, and other formats. Tables persist metadata in the catalog, while views are virtual and computed on each query.

Reading Files with PySpark

Common File Reads

# Read Parquet files
df = spark.read.format("parquet").load("/data/files/sales.parquet")

# Read JSON files
df = spark.read.format("json").load("/data/files/events.json")

# Read CSV files with header and schema inference
df = (spark.read
    .format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .option("sep", ",")
    .load("/data/files/customers.csv")
)

# Read Delta table by path
df = spark.read.format("delta").load("/data/delta/orders/")

# Read Delta table by name (Unity Catalog)
df = spark.table("my_catalog.my_schema.orders")
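
Schema inference on CSV requires an extra pass over the data, so declaring the schema up front is a common alternative to `inferSchema`. The following is a minimal sketch under stated assumptions: the column names in `CUSTOMER_SCHEMA` and the `read_customers` helper are hypothetical, and `spark` is assumed to be an active SparkSession, as it is in a Databricks notebook.

```python
# Hypothetical columns; adjust to the real layout of customers.csv.
CUSTOMER_SCHEMA = "customer_id BIGINT, name STRING, signup_date DATE"


def read_customers(spark, path="/data/files/customers.csv"):
    """Read the CSV with a declared schema instead of inferring it."""
    return (
        spark.read
        .format("csv")
        .option("header", "true")
        .schema(CUSTOMER_SCHEMA)  # DDL-formatted string; no inference pass
        .load(path)
    )
```

In a notebook this would be called as `df = read_customers(spark)`.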

Read Options for Common Formats

Format	Common Options	Example
CSV	header, inferSchema, sep, quote, escape, multiLine	`.option("header", "true")`
JSON	multiLine, primitivesAsString	`.option("multiLine", "true")`
Parquet	mergeSchema	`.option("mergeSchema", "true")`
Delta	versionAsOf, timestampAsOf	`.option("versionAsOf", "5")`
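
The Delta options in the last row select an earlier state of the table (time travel). As a rough sketch, the helper below chooses between `versionAsOf` and `timestampAsOf` based on the argument type. Both function names are hypothetical, `spark` is assumed to be an active SparkSession, and the requested version or timestamp is assumed to still exist in the table history.

```python
def time_travel_option(point):
    """Return the (option, value) pair for a Delta time-travel read.

    An int is treated as a table version, a string as a timestamp.
    """
    if isinstance(point, bool):  # bool is a subclass of int; reject it explicitly
        raise TypeError("expected an int version or a timestamp string")
    if isinstance(point, int):
        return ("versionAsOf", str(point))
    if isinstance(point, str):
        return ("timestampAsOf", point)
    raise TypeError("expected an int version or a timestamp string")


def read_delta_as_of(spark, path, point):
    """Read a Delta table by path as of a version or timestamp."""
    key, value = time_travel_option(point)
    return spark.read.format("delta").option(key, value).load(path)
```

For example, `read_delta_as_of(spark, "/data/delta/orders/", 5)` would read version 5 of the table used in the earlier examples.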

Reading Data with SQL

Direct File Reads in SQL

-- Read JSON files directly in SQL using backtick notation
SELECT * FROM json.`/data/files/events.json`;

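-- Read CSV files (reader options such as header cannot be set in this syntax,
-- so a header row comes back as data and columns are named _c0, _c1, ...)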
SELECT * FROM csv.`/data/files/customers.csv`;

-- Read Parquet files
SELECT * FROM parquet.`/data/files/sales.parquet`;

-- Read Delta files by path
SELECT * FROM delta.`/data/delta/orders/`;
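
The same format.`path` notation can be used from PySpark through `spark.sql`. The sketch below builds the query string and runs it. The helper names are hypothetical, the format set is limited to the formats named in this section, and `spark` is assumed to be an active SparkSession.

```python
# Formats named in this section; Spark supports others as well.
DIRECT_READ_FORMATS = {"json", "csv", "parquet", "delta"}


def direct_read_sql(fmt, path):
    """Build a SELECT that reads files directly with the format.`path` notation."""
    if fmt not in DIRECT_READ_FORMATS:
        raise ValueError(f"unsupported format for this sketch: {fmt!r}")
    if "`" in path:
        raise ValueError("path must not contain a backtick")
    return f"SELECT * FROM {fmt}.`{path}`"


def read_files_with_sql(spark, fmt, path):
    """Run the direct-read query and return the resulting DataFrame."""
    return spark.sql(direct_read_sql(fmt, path))
```

Rejecting backticks in the path keeps the generated identifier well-formed; it is a guard for this sketch rather than a complete input-validation scheme.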

Tables: Managed vs. External

Managed Tables

-- Managed table: Databricks manages both metadata AND data files
CREATE TABLE my_catalog.my_schema.sales (
    order_id BIGINT,
    customer_id BIGINT,
    amount DECIMAL(10,2),
    order_date DATE
);

-- Create table from query results
CREATE TABLE my_catalog.my_schema.sales_2026 AS
SELECT * FROM my_catalog.my_schema.sales
WHERE order_date >= '2026-01-01';

Dropping a managed table deletes both the metadata AND the data files.
Data is stored in the Unity Catalog managed storage location.

External Tables

-- External table: Databricks manages metadata only, data lives at LOCATION
CREATE TABLE my_catalog.my_schema.external_sales
LOCATION 's3://my-bucket/data/sales/'
AS SELECT * FROM raw_data;

Dropping an external table deletes only the metadata. The data files remain.
Data is stored at the specified LOCATION, not in managed storage.

Aspect	Managed Table	External Table
Data location	Unity Catalog managed storage	User-specified LOCATION
DROP TABLE	Deletes metadata + data	Deletes metadata only
Data lifecycle	Tied to table lifecycle	Independent of table
Best for	Most use cases	Data shared with other systems
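
Before dropping a table it can help to confirm which kind it is. The sketch below reads the table type from `DESCRIBE TABLE EXTENDED` and maps it to the DROP behavior in the table above. It assumes the output contains a row whose `col_name` is `Type` and whose `data_type` holds `MANAGED` or `EXTERNAL`; the helper names are hypothetical, and `spark` is assumed to be an active SparkSession.

```python
def table_type_from_rows(rows):
    """Find the table type in DESCRIBE TABLE EXTENDED output.

    Each row is expected to expose "col_name" and "data_type" by key,
    as PySpark Row objects do. Returns None if no "Type" row is present.
    """
    for row in rows:
        if row["col_name"] == "Type":
            return row["data_type"].strip().upper()
    return None


def drop_effect(table_type):
    """Describe what DROP TABLE removes for a given table type."""
    effects = {
        "MANAGED": "metadata and data files",
        "EXTERNAL": "metadata only",
    }
    return effects.get(table_type, "unknown table type")


def describe_drop_effect(spark, table_name):
    """Look up a table's type and report what dropping it would remove."""
    rows = spark.sql(f"DESCRIBE TABLE EXTENDED {table_name}").collect()
    return drop_effect(table_type_from_rows(rows))
```

For example, `describe_drop_effect(spark, "my_catalog.my_schema.sales")` would be expected to report that both metadata and data files are removed, since that table was created without a LOCATION.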

Views

Regular Views

-- A view is a saved query — no data is stored
CREATE VIEW my_catalog.my_schema.high_value_orders AS
SELECT * FROM my_catalog.my_schema.orders
WHERE total_amount > 1000;

Temporary Views

-- Temporary view: exists only for the current Spark session
CREATE TEMP VIEW recent_orders AS
SELECT * FROM my_catalog.my_schema.orders
WHERE order_date >= current_date() - INTERVAL 7 DAYS;

Global Temporary Views

-- Global temp view: accessible across notebooks attached to the same cluster
CREATE GLOBAL TEMP VIEW cross_notebook_data AS
SELECT * FROM my_catalog.my_schema.orders;

-- Must reference with global_temp schema
SELECT * FROM global_temp.cross_notebook_data;

View Type	Scope	Persists?	How to Query
View	Catalog (permanent)	Yes — persists in Unity Catalog	`SELECT * FROM catalog.schema.view_name`
Temp View	Current Spark session	No — gone when session ends	`SELECT * FROM view_name`
Global Temp View	All sessions on the same cluster	No — gone when cluster restarts	`SELECT * FROM global_temp.view_name`
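
Temporary views can also be registered from a PySpark DataFrame with `createOrReplaceTempView` and `createOrReplaceGlobalTempView`. The sketch below registers one DataFrame both ways and returns the name to use in SQL for each scope. The `register_views` helper is hypothetical, and `df` is assumed to be an existing DataFrame.

```python
GLOBAL_TEMP_SCHEMA = "global_temp"


def register_views(df, name):
    """Register df as both a session-scoped and a global temporary view.

    Returns the names to use in SQL for each view.
    """
    df.createOrReplaceTempView(name)        # visible only in this Spark session
    df.createOrReplaceGlobalTempView(name)  # visible to other sessions on the cluster
    return {
        "temp": name,
        "global_temp": f"{GLOBAL_TEMP_SCHEMA}.{name}",
    }
```

With `names = register_views(df, "recent_orders")`, a query against the session-scoped view uses `names["temp"]`, while another notebook on the same cluster would use `names["global_temp"]`.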

On the Exam: Know the difference between managed tables (DROP deletes data), external tables (DROP keeps data), and views (no data stored). Also know that temp views are session-scoped while global temp views are cluster-scoped.

Databricks Certified Data Engineer Associate

2.2 Reading Data with Spark SQL and PySpark

