5.2 Azure Synapse Analytics

Key Takeaways

A Synapse workspace bundles dedicated SQL pool, serverless SQL pool, Apache Spark pool, and pipelines, all managed in Synapse Studio.
A dedicated SQL pool (formerly Azure SQL Data Warehouse) is an MPP warehouse billed in DWUs and can be paused to save compute cost.
A serverless SQL pool charges per terabyte of data processed and queries files in ADLS Gen2 directly without loading them.
Spark pools auto-scale and auto-pause, and are the right choice for Python, Scala, or .NET notebooks on big data.
Synapse pipelines are an embedded subset of Azure Data Factory, sharing the same activities, triggers, and integration runtimes.

Last updated: June 2026

Azure Synapse Analytics

Azure Synapse Analytics is Microsoft's unified analytics service that brings data warehousing, big-data processing, data integration, and serverless SQL into a single workspace. On the DP-900, expect questions that ask you to pick the right compute option for a scenario, so focus on what each pool actually does and how it bills.

Quick Answer: A Synapse workspace contains four kinds of compute — dedicated SQL pool (provisioned MPP warehouse), serverless SQL pool (pay-per-query T-SQL over files), Apache Spark pool (notebooks and big-data jobs), and pipelines (the Data Factory ETL engine embedded inside Synapse) — all managed from Synapse Studio.

The Synapse Workspace

A Synapse workspace is the top-level resource. Inside it you get:

A default Azure Data Lake Storage Gen2 account that acts as the workspace's primary storage.
One or more compute pools (you add them as needed).
Synapse Studio, a browser-based IDE for writing T-SQL, authoring Spark notebooks, building pipelines, and managing security — all without leaving the portal.
Integrated Azure Active Directory (Microsoft Entra ID) authentication and Azure RBAC.

Dedicated SQL Pool (formerly SQL Data Warehouse)

The dedicated SQL pool is the modern name for what used to be sold as Azure SQL Data Warehouse. It is a massively parallel processing (MPP) engine: a control node breaks each T-SQL query into smaller pieces and runs them across 60 distributions on multiple compute nodes.

Key facts:

Billed by Data Warehouse Units (DWUs) — a blended measure of CPU, memory, and I/O. You provision a level such as DW100c, DW1000c, DW3000c.
You can pause the pool to stop compute charges (storage continues to bill).
Supports table distribution strategies: hash-distributed for large fact tables, round-robin for staging tables, and replicated for small dimension tables.
Uses columnstore indexes by default for compression and analytic speed.
Best for predictable, high-volume BI workloads up to petabytes.

Serverless SQL Pool

Every workspace ships with a serverless SQL pool ("Built-in") that is always on but charges only when you run a query. It is on-demand: you pay per TB of data processed, with no provisioning to manage.

Use serverless SQL when you want to:

Run ad-hoc T-SQL queries directly over Parquet, CSV, JSON, or Delta files in ADLS Gen2 without copying them into a database.
Create external tables and views so BI tools can query lake files using familiar T-SQL.
Build a logical data warehouse that joins files across multiple storage accounts.

You cannot use serverless SQL pool to ingest or persist data into managed tables — it is read-mostly query compute.

Apache Spark Pool

A Spark pool is a managed Apache Spark cluster you can spin up inside the workspace for Python (PySpark), Scala, .NET, and SQL workloads. Spark pools are the right choice when:

You need to process unstructured or semi-structured data at scale.
Data scientists want notebooks to combine Spark SQL, ML libraries, and visualizations.
You are building data engineering pipelines that run heavy joins, machine learning, or graph processing.

Spark pools auto-pause after a configurable idle period and auto-scale executors within a defined min/max range, so you pay only for what runs.

Synapse Pipelines

Synapse pipelines are the data integration engine inside the workspace. They use the same activities, datasets, triggers, and integration runtimes as Azure Data Factory — in fact, Synapse pipelines are a subset of ADF re-hosted inside Synapse Studio. You can copy data, orchestrate Spark notebooks, run stored procedures on the dedicated SQL pool, and call REST endpoints from a single pipeline.

Synapse Studio

Synapse Studio is the web UI where developers, analysts, and engineers collaborate. Tabs include:

Data — browse linked storage, lake files, and SQL/Spark databases.
Develop — write T-SQL scripts, Spark notebooks, KQL scripts, and Power BI reports.
Integrate — build pipelines and data flows.
Monitor — track pipeline runs, SQL requests, Spark applications.
Manage — configure pools, linked services, security, and Git integration.

Choosing the Right Compute Option

Scenario	Best Synapse compute
24/7 multi-TB enterprise data warehouse with predictable cost	Dedicated SQL pool
One-off T-SQL exploration of CSV/Parquet in the lake	Serverless SQL pool
Python/Scala ML notebooks on huge datasets	Apache Spark pool
Orchestrate copy and transformation jobs across systems	Synapse pipelines
Federated SQL over Cosmos DB analytical store	Serverless SQL pool with Synapse Link

Synapse Link

Azure Synapse Link creates a near-real-time replica of operational data (Cosmos DB, Dataverse, SQL Server 2022) into the Synapse workspace, so you can run analytics on transactional data without ETL and without impacting the source database. The classic exam scenario is "run analytics on Cosmos DB without affecting transactional performance" — the answer is Synapse Link.

Synapse vs Microsoft Fabric

Synapse is still fully supported, but Microsoft positions Microsoft Fabric as the strategic successor that unifies Synapse, Data Factory, and Power BI under one SaaS platform with OneLake at its core. For DP-900 you should know both: Synapse for current Azure-resource customers, Fabric for the unified next-generation platform.

Test Your Knowledge

A data engineer needs to run T-SQL queries on Parquet files in Azure Data Lake Storage Gen2. The queries are infrequent and the team does not want to provision or pay for idle compute. Which Synapse component fits best?

Dedicated SQL pool sized at DW100c

Serverless SQL pool

Apache Spark pool with auto-pause

Synapse pipelines with a copy activity

Test Your Knowledge

Which statement about a dedicated SQL pool in Azure Synapse Analytics is TRUE?

It bills only for data scanned per query, with no provisioning

It is the rebrand of Azure SQL Data Warehouse and uses massively parallel processing across 60 distributions

It is the same product as Azure SQL Database and is sized in DTUs

It cannot be paused; compute always runs once provisioned

Up Next

5.3 Azure Data Factory and Pipelines

Continue learning

Microsoft Azure Data Fundamentals

Azure DP-900

5.2 Azure Synapse Analytics

Key Takeaways

Azure Synapse Analytics

The Synapse Workspace

Dedicated SQL Pool (formerly SQL Data Warehouse)

Serverless SQL Pool

Apache Spark Pool

Synapse Pipelines

Synapse Studio

Choosing the Right Compute Option

Synapse Link

Synapse vs Microsoft Fabric

Microsoft Azure Data Fundamentals

1Chapter 1: Introduction & Exam Overview

2Chapter 2: Core Data Concepts

3Chapter 3: Relational Data on Azure

4Chapter 4: Non-Relational Data on Azure

5Chapter 5: Analytics Workloads on Azure

Azure DP-900

5.2 Azure Synapse Analytics

Key Takeaways

Azure Synapse Analytics

The Synapse Workspace

Dedicated SQL Pool (formerly SQL Data Warehouse)

Serverless SQL Pool

Apache Spark Pool

Synapse Pipelines

Synapse Studio

Choosing the Right Compute Option

Synapse Link

Synapse vs Microsoft Fabric