1.5 Databricks SQL and the Photon Engine

Key Takeaways

  • Databricks SQL provides a SQL-native interface for running queries, building dashboards, and creating alerts directly on Lakehouse data.
  • SQL warehouses come in three types: Classic (VMs in your account), Pro (adds federation and advanced features), and Serverless (Databricks-managed for instant startup).
  • The Photon engine is a vectorized query engine written in C++ that accelerates SQL and DataFrame operations relative to standard Spark execution on the JVM.
  • Databricks SQL dashboards allow visualization of query results with charts, tables, and auto-refresh schedules.
  • Query history and query profiles help debug performance issues by showing execution plans, I/O statistics, and bottlenecks.
Last updated: March 2026

Quick Answer: Databricks SQL is a SQL-native analytics interface for running queries and building dashboards on Lakehouse data. SQL warehouses provide the compute (Classic, Pro, or Serverless). The Photon engine accelerates queries using a vectorized C++ runtime.

Databricks SQL Overview

Databricks SQL is the analytics layer of the Data Intelligence Platform:

Feature       | Description
SQL Editor    | Write and run SQL queries with autocomplete and syntax highlighting
Dashboards    | Visualize query results with charts, tables, and counters
Alerts        | Trigger notifications when query results meet conditions
Query History | View past queries, execution time, and resource usage
Query Profile | Detailed execution plan for performance debugging
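
The Alerts feature listed above boils down to comparing a value from a query result against a threshold. The sketch below illustrates that idea in plain Python. The function name `alert_triggered` and the set of operators are assumptions made for illustration; this is not the Databricks Alerts API.

```python
import operator

# Hypothetical helper for illustration only; not part of any Databricks API.
_OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}

def alert_triggered(value, op, threshold):
    """Return True when `value <op> threshold` holds, e.g. failed_orders > 100."""
    if op not in _OPERATORS:
        raise ValueError(f"unsupported operator: {op!r}")
    return _OPERATORS[op](value, threshold)

if __name__ == "__main__":
    # Pretend a scheduled query returned a single number: 140 failed orders.
    print(alert_triggered(140, ">", 100))
```

In practice the value would come from a column of a scheduled query, and a notification would be sent whenever the condition evaluates to true.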

SQL Warehouse Types

Type       | Compute Location   | Startup Time | Key Feature
Classic    | Your cloud account | Minutes      | Basic SQL queries
Pro        | Your cloud account | Minutes      | Lakehouse Federation, query caching
Serverless | Databricks-managed | Seconds      | Instant startup, auto-scaling

Serverless SQL Warehouses

  • No cluster provisioning delay — queries start in seconds
  • Auto-scales based on query concurrency
  • Cost-efficient — scales to zero when not in use
  • Databricks manages the underlying infrastructure
  • Recommended for most SQL analytics workloads
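
To make the three warehouse types concrete, here is a minimal sketch that builds a request body for creating a SQL warehouse. It assumes the field names of the Databricks SQL Warehouses REST API (`warehouse_type`, `enable_serverless_compute`, `cluster_size`, `auto_stop_mins`) as commonly documented; verify them against the current API reference before relying on them. The helper `build_warehouse_spec` and the warehouse name are hypothetical, and nothing is sent over the network.

```python
import json

def build_warehouse_spec(name, kind, cluster_size="Small", auto_stop_mins=10):
    """Return a request body for a SQL warehouse of the given kind.

    `kind` is one of "classic", "pro", or "serverless". Field names are an
    assumption based on the SQL Warehouses REST API; check the API reference.
    """
    kind = kind.lower()
    if kind not in {"classic", "pro", "serverless"}:
        raise ValueError(f"unknown warehouse kind: {kind!r}")
    return {
        "name": name,
        "cluster_size": cluster_size,
        "auto_stop_mins": auto_stop_mins,
        # Serverless is requested as a PRO warehouse with serverless compute on.
        "warehouse_type": "CLASSIC" if kind == "classic" else "PRO",
        "enable_serverless_compute": kind == "serverless",
    }

if __name__ == "__main__":
    print(json.dumps(build_warehouse_spec("analytics-wh", "serverless"), indent=2))
```

The `auto_stop_mins` setting is what lets an idle warehouse stop and cease incurring compute cost.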

Photon Engine

Photon is a high-performance vectorized query engine written in C++ that replaces parts of Spark's JVM-based execution.

How Photon Improves Performance

  • Vectorized execution: Processes batches of rows instead of one row at a time
  • Native C++ code: Eliminates JVM overhead (garbage collection, serialization)
  • Columnar processing: Optimized for Parquet and Delta Lake column formats
  • Adaptive query execution: Works alongside Spark's adaptive query execution, which re-optimizes query plans at runtime
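
The difference between row-at-a-time and vectorized, columnar execution can be illustrated with a small plain-Python sketch. This is a conceptual model only, not Photon's implementation: both functions compute the same filtered aggregate, but the second one works on slices of whole columns in batches. The function names, sample data, and batch size are all made up for the example.

```python
def total_row_at_a_time(rows, min_qty):
    """Evaluate the filter and aggregate one row (a dict) at a time."""
    total = 0.0
    for row in rows:
        if row["qty"] >= min_qty:
            total += row["price"] * row["qty"]
    return total

def total_batched(price_col, qty_col, min_qty, batch_size=1024):
    """Evaluate the same query over column slices, one batch at a time."""
    total = 0.0
    for start in range(0, len(qty_col), batch_size):
        prices = price_col[start:start + batch_size]
        qtys = qty_col[start:start + batch_size]
        # One tight loop over two contiguous columns; no per-row record lookups.
        total += sum(p * q for p, q in zip(prices, qtys) if q >= min_qty)
    return total

# Row-oriented and column-oriented views of the same three records.
rows = [{"price": 2.0, "qty": 3}, {"price": 5.0, "qty": 1}, {"price": 1.5, "qty": 4}]
price_col = [r["price"] for r in rows]
qty_col = [r["qty"] for r in rows]

if __name__ == "__main__":
    print(total_row_at_a_time(rows, min_qty=2))
    print(total_batched(price_col, qty_col, min_qty=2, batch_size=2))
```

In a native engine the per-batch loop compiles to tight machine code over contiguous memory, which is where most of the speedup comes from; Python itself will not show that gain.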

When to Enable Photon

  • SQL-heavy workloads (aggregations, joins, filters)
  • ETL transformations on Delta tables
  • Dashboard queries requiring low latency
  • Available on SQL warehouses (always on) and compute clusters (configurable)
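
On compute clusters, Photon is a per-cluster setting. The sketch below builds a cluster specification with that setting, assuming the `runtime_engine` field of the Databricks Clusters API (values `PHOTON` and `STANDARD`). The helper name is hypothetical, and the runtime version and node type in the usage example are placeholders that should be replaced with values valid in your workspace.

```python
def build_cluster_spec(name, spark_version, node_type_id, num_workers, use_photon=True):
    """Return a cluster spec dict; field names assume the Clusters REST API."""
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")
    return {
        "cluster_name": name,
        "spark_version": spark_version,   # placeholder; pick a supported runtime
        "node_type_id": node_type_id,     # placeholder; depends on your cloud
        "num_workers": num_workers,
        # Photon is opt-in on clusters, unlike SQL warehouses where it is always on.
        "runtime_engine": "PHOTON" if use_photon else "STANDARD",
    }

if __name__ == "__main__":
    # The version and node type strings are illustrative placeholders.
    print(build_cluster_spec("etl-cluster", "<runtime-version>", "<node-type>", 4))
```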

Query Profile

The query profile shows:

  • Execution plan: DAG of operations (scan, filter, join, aggregate)
  • Data statistics: Rows read, rows output, bytes scanned
  • Time breakdown: How long each operation took
  • Spill metrics: Whether data spilled to disk (indicates memory pressure)
  • Skew indicators: Whether data is unevenly distributed across tasks
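
Reading spill and skew indicators can be sketched as a simple check over per-task metrics. The input structure (`duration_ms`, `spill_bytes`), the function name, and the threshold of 3x the median duration are assumptions chosen for illustration; the query profile UI presents this information in its own format.

```python
from statistics import median

def diagnose_stage(tasks, skew_ratio=3.0):
    """Flag spill and skew from a list of per-task metric dicts (hypothetical)."""
    if not tasks:
        raise ValueError("tasks must not be empty")
    durations = [t["duration_ms"] for t in tasks]
    spilled = sum(t["spill_bytes"] for t in tasks)
    typical = max(median(durations), 1)  # guard against division by zero
    ratio = max(durations) / typical
    return {
        "spilled_bytes": spilled,
        "has_spill": spilled > 0,          # any spill suggests memory pressure
        "skew_ratio": ratio,
        "is_skewed": ratio >= skew_ratio,  # slowest task far above the median
    }

# Made-up metrics: three similar tasks and one straggler that also spilled.
sample_tasks = [
    {"duration_ms": 100, "spill_bytes": 0},
    {"duration_ms": 120, "spill_bytes": 0},
    {"duration_ms": 110, "spill_bytes": 0},
    {"duration_ms": 900, "spill_bytes": 5_000_000},
]

if __name__ == "__main__":
    print(diagnose_stage(sample_tasks))
```

A single task that runs many times longer than the median, as in the sample data, is the typical signature of a skewed join or aggregation key.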

On the Exam: Know the three SQL warehouse types, when to use Serverless (most scenarios), and that Photon improves SQL and DataFrame performance through vectorized C++ execution.

Test Your Knowledge

Which SQL warehouse type provides the fastest startup time and requires no cluster provisioning?


What technology does the Photon engine use to accelerate query performance?


A data engineer notices that a SQL query is running slowly. Which Databricks SQL feature provides a detailed breakdown of execution time, data statistics, and bottlenecks?
