1.5 Databricks SQL and the Photon Engine
Key Takeaways
- Databricks SQL provides a SQL-native interface for running queries, building dashboards, and creating alerts directly on Lakehouse data.
- SQL warehouses come in three types: Classic (compute runs in your cloud account), Pro (also in your account; adds Lakehouse Federation and other advanced features), and Serverless (compute is managed by Databricks and starts in seconds).
- The Photon engine is a vectorized query engine written in C++ that speeds up SQL and DataFrame operations relative to standard Spark execution, without requiring code changes.
- Databricks SQL dashboards allow visualization of query results with charts, tables, and auto-refresh schedules.
- Query history and query profiles help debug performance issues by showing execution plans, I/O statistics, and bottlenecks.
Quick Answer: Databricks SQL is a SQL-native analytics interface for running queries and building dashboards on Lakehouse data. SQL warehouses provide the compute (Classic, Pro, or Serverless). The Photon engine accelerates queries using a vectorized C++ runtime.
Databricks SQL Overview
Databricks SQL is the analytics layer of the Data Intelligence Platform:
| Feature | Description |
|---|---|
| SQL Editor | Write and run SQL queries with autocomplete and syntax highlighting |
| Dashboards | Visualize query results with charts, tables, and counters |
| Alerts | Trigger notifications when query results meet conditions |
| Query History | View past queries, execution time, and resource usage |
| Query Profile | Detailed execution plan for performance debugging |
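To make the Alerts row concrete, here is a minimal sketch of how a threshold condition on a query result could be evaluated. It is not Databricks code: the function `alert_triggered`, the `COMPARATORS` table, and the `failed_jobs` column are hypothetical names, and checking only the first result row is a simplifying assumption.

```python
import operator

# Hypothetical mapping from a condition symbol to a comparison function.
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def alert_triggered(rows, column, op, threshold):
    """Return True when the first result row satisfies the condition.

    `rows` is a list of dicts standing in for a query result.
    An empty result never triggers the alert in this sketch.
    """
    if not rows:
        return False
    return COMPARATORS[op](rows[0][column], threshold)


# Example: a query that counts failed jobs returned 7, and the alert
# is configured to fire when the count exceeds 5.
result = [{"failed_jobs": 7}]
should_notify = alert_triggered(result, "failed_jobs", ">", 5)
```

In the product, the query runs on a schedule and the notification goes to the configured destination; the sketch covers only the comparison step.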
SQL Warehouse Types
| Type | Compute Location | Startup Time | Key Feature |
|---|---|---|---|
| Classic | Your cloud account | Minutes | Basic SQL queries |
| Pro | Your cloud account | Minutes | Lakehouse Federation, Predictive I/O |
| Serverless | Databricks-managed | Seconds | Instant startup, auto-scaling |
Serverless SQL Warehouses
- No cluster provisioning delay: queries start in seconds
- Auto-scales based on query concurrency
- Cost-efficient: stops automatically when idle, so you do not pay for unused compute
- Databricks manages the underlying infrastructure
- Recommended for most SQL analytics workloads
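The points above map onto a handful of warehouse settings. The sketch below builds a request body for creating a serverless warehouse. The field names follow the SQL Warehouses REST API as I understand it (serverless is requested by combining `warehouse_type` `PRO` with `enable_serverless_compute`), so check them against the current API reference before relying on them. The helper name `serverless_warehouse_spec` and the default values are hypothetical.

```python
import json


def serverless_warehouse_spec(name, max_clusters=2, auto_stop_mins=10):
    """Return a request body describing a serverless SQL warehouse.

    Field names are assumptions based on the SQL Warehouses REST API;
    the defaults are illustrative, not recommendations.
    """
    if max_clusters < 1:
        raise ValueError("max_clusters must be at least 1")
    return {
        "name": name,
        "cluster_size": "Small",
        "min_num_clusters": 1,
        "max_num_clusters": max_clusters,  # upper bound for concurrency scaling
        "auto_stop_mins": auto_stop_mins,  # idle minutes before the warehouse stops
        "warehouse_type": "PRO",
        "enable_serverless_compute": True,
    }


spec = serverless_warehouse_spec("analytics-serverless")
payload = json.dumps(spec, indent=2)  # body that would be sent to the API
```

`max_num_clusters` is the setting behind concurrency-based scaling, and `auto_stop_mins` is the one behind stopping when idle.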
Photon Engine
Photon is a high-performance vectorized query engine written in C++. It replaces parts of Spark's JVM-based execution with native code.
How Photon Improves Performance
- Vectorized execution: Processes batches of rows instead of one row at a time
- Native C++ code: Avoids JVM overhead (garbage collection, serialization) for the operators Photon executes
- Columnar processing: Optimized for Parquet and Delta Lake column formats
- Adaptive query execution: Works alongside Spark's adaptive query execution, which re-optimizes query plans at runtime (a Spark feature, not exclusive to Photon)
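The difference between row-at-a-time and vectorized, columnar execution can be illustrated with a toy example. This is not how Photon is implemented, and pure Python will not show the speedup. It only contrasts the two access patterns, with hypothetical data and function names.

```python
from array import array

# Toy data in two layouts: a list of row objects and per-column arrays.
rows = [
    {"price": 10.0, "qty": 2},
    {"price": 5.0, "qty": 0},
    {"price": 2.5, "qty": 4},
]
prices = array("d", [r["price"] for r in rows])  # contiguous doubles
qtys = array("q", [r["qty"] for r in rows])      # contiguous 64-bit ints


def revenue_row_at_a_time(rows):
    """Visit each row object, look up its fields, then filter and add."""
    total = 0.0
    for row in rows:
        if row["qty"] > 0:
            total += row["price"] * row["qty"]
    return total


def revenue_batched(prices, qtys, batch_size=2):
    """Process fixed-size batches of column values instead of single rows."""
    total = 0.0
    for start in range(0, len(prices), batch_size):
        p = prices[start:start + batch_size]
        q = qtys[start:start + batch_size]
        # One tight loop over two same-typed columns per batch.
        total += sum(pi * qi for pi, qi in zip(p, q) if qi > 0)
    return total


row_total = revenue_row_at_a_time(rows)
batch_total = revenue_batched(prices, qtys)
```

Both functions return the same total. In a native engine, the batched form lets the CPU run tight loops over contiguous, same-typed values, which is where the gains from vectorization come from.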
When to Enable Photon
- SQL-heavy workloads (aggregations, joins, filters)
- ETL transformations on Delta tables
- Dashboard queries requiring low latency
- Available on SQL warehouses (always on) and compute clusters (configurable)
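On compute clusters, Photon is a per-cluster setting. The sketch below builds a cluster specification with the `runtime_engine` field of the Clusters REST API set to `PHOTON`. The helper name `cluster_spec` and all argument values are hypothetical placeholders, and the chosen runtime version and node type must themselves support Photon.

```python
def cluster_spec(name, spark_version, node_type_id, num_workers, use_photon=True):
    """Return a cluster definition with Photon switched on or off.

    The runtime version and node type are supplied by the caller;
    the values used below are placeholders, not real identifiers.
    """
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # "PHOTON" enables the engine; "STANDARD" uses regular Spark execution.
        "runtime_engine": "PHOTON" if use_photon else "STANDARD",
    }


etl_cluster = cluster_spec(
    name="nightly-etl",
    spark_version="<runtime-version>",  # placeholder
    node_type_id="<node-type>",         # placeholder
    num_workers=4,
)
```

SQL warehouses need no equivalent setting, because Photon is always on there.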
Query Profile
The query profile shows:
- Execution plan: DAG of operations (scan, filter, join, aggregate)
- Data statistics: Rows read, rows output, bytes scanned
- Time breakdown: How long each operation took
- Spill metrics: Whether data spilled to disk (indicates memory pressure)
- Skew indicators: Whether data is unevenly distributed across tasks
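The spill and skew signals above can be turned into a simple triage rule. The sketch below assumes the relevant numbers have already been copied out of a query profile into a dictionary. The keys `spill_bytes` and `task_durations_ms`, the function `triage`, and the 5x skew threshold are all hypothetical and are not part of any Databricks API.

```python
from statistics import median


def triage(metrics, skew_ratio=5.0):
    """Return a list of likely problems based on profile-style metrics.

    "spill": any bytes were written to disk (memory pressure).
    "skew":  the slowest task took at least `skew_ratio` times the median.
    """
    findings = []
    if metrics.get("spill_bytes", 0) > 0:
        findings.append("spill")
    durations = metrics.get("task_durations_ms", [])
    if durations:
        typical = median(durations)
        if typical > 0 and max(durations) / typical >= skew_ratio:
            findings.append("skew")
    return findings


# One task ran far longer than the rest, and some data spilled to disk.
slow_query = {"spill_bytes": 1024, "task_durations_ms": [100, 110, 90, 1200]}
problems = triage(slow_query)
```

Spill usually points toward a larger warehouse or a less memory-hungry query shape, while skew points toward uneven join or grouping keys.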
On the Exam: Know the three SQL warehouse types, when to use Serverless (most scenarios), and that Photon improves SQL and DataFrame performance through vectorized C++ execution.
Which SQL warehouse type provides the fastest startup time and requires no cluster provisioning?
What technology does the Photon engine use to accelerate query performance?
A data engineer notices that a SQL query is running slowly. Which Databricks SQL feature provides a detailed breakdown of execution time, data statistics, and bottlenecks?