Which service provides a single workspace that combines an MPP SQL data warehouse with Apache Spark big-data processing?

Azure Synapse Analytics. It is the unified analytics service that brings dedicated and serverless SQL pools together with Apache Spark pools and pipelines in a single Synapse Studio workspace. A pipeline orchestrator (Data Factory), a managed open-source cluster service (HDInsight), and a visualization tool (Power BI) each cover only part of that scope.

A team needs to copy and transform data from 12 different on-premises and cloud sources on a nightly schedule, using a visual designer rather than code. Which service fits?

Azure Data Factory. The cloud ETL/ELT orchestrator offers 90+ connectors, scheduled triggers, and a drag-and-drop pipeline designer, which is exactly what this data-movement-and-transformation job needs. Stream Analytics targets real-time data, and Databricks and Synapse are processing engines rather than connector-driven data-movement tools.

A factory streams sensor readings continuously and needs them analyzed in real time as they arrive. Which Azure service is designed for this?

Azure Stream Analytics. It processes continuous IoT and streaming data in real time using SQL-like queries. The warehouse service (Synapse) targets batch and stored data, the visualization tool (Power BI) only displays results, and the storage layer holds data rather than analyzing streams as they arrive.

What capability does Data Lake Storage Gen2 add on top of standard Blob Storage to make it better for big-data analytics?

A hierarchical namespace with directory semantics and POSIX ACLs. Layering a true hierarchical namespace (real folders with fast atomic operations) plus POSIX-style access control lists onto Blob Storage is what makes Gen2 analytics-friendly. Multi-region writes belong to Cosmos DB, pipeline orchestration to Data Factory, and dashboards to Power BI.

Azure Data Services and Analytics — Free Study Guide 2026

Quick Answer: Synapse Analytics = unified data warehouse + big data. Data Factory = ETL/ELT pipelines. Databricks = Apache Spark for engineering and machine learning. HDInsight = managed open-source big data (Hadoop/Spark/Kafka/Hive). Stream Analytics = real-time/IoT streams. Power BI = dashboards. Data Lake Storage Gen2 = analytics-optimized storage. AZ-900 tests only the one-line purpose of each: match the keyword, do not memorize internals.

How the Pieces Fit Together

A typical analytics pipeline reads like a sentence: raw data lands in Data Lake Storage Gen2, Data Factory moves and transforms it, Synapse or Databricks processes and models it, and Power BI visualizes the result. Knowing that flow makes the "which service" questions easy.

The AZ-900 objective treats these as descriptive knowledge — you are asked to recognize what each tool is for, not how to configure clusters, write Spark code, or tune SQL pools. Expect questions phrased as a one-sentence scenario whose keyword ("pipeline," "warehouse," "real-time," "notebook," "dashboard") maps to exactly one service. The most common trap is the Synapse-vs-Databricks overlap, covered below.

Azure Synapse Analytics

Azure Synapse Analytics (the successor to Azure SQL Data Warehouse) is an integrated analytics service that brings warehousing and big data into one workspace:

Component	Purpose
Dedicated/serverless SQL pools	Massively parallel processing (MPP) queries over structured data
Apache Spark pools	Big-data processing and machine learning
Pipelines	Built-in Data Factory integration for data movement
Synapse Studio	One UI for SQL, Spark, and pipeline development
Power BI link	Direct visualization of warehouse data

Reach for Synapse when a question says "enterprise data warehouse," "unified analytics," or "combine warehousing and big data in one place."

Azure Data Factory

Azure Data Factory (ADF) is the cloud ETL/ELT (Extract-Transform-Load / Extract-Load-Transform) orchestrator. It does not store data itself; it moves and transforms data on a schedule, on demand, or in response to an event.

90+ connectors to cloud and on-premises sources
A visual, drag-and-drop pipeline designer (no code required for many flows)
Built-in scheduling, triggers, monitoring, and alerting

Whenever the keyword is "pipeline," "ETL," "orchestrate data movement," or "copy data from many sources," the answer is Data Factory.
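The difference between the ETL and ELT orderings can be sketched in a few lines of Python. This is a toy illustration rather than Data Factory code: the `extract`, `transform`, `run_etl`, and `run_elt` functions and the in-memory lists are hypothetical stand-ins for source systems, transformation logic, and a destination store.

```python
# Toy illustration of ETL vs. ELT ordering. Nothing here calls Azure;
# every function and variable name is a hypothetical stand-in.

def extract():
    """Pretend to read raw rows from a source system."""
    return [{"sensor": "a", "temp_f": 68.0}, {"sensor": "b", "temp_f": 212.0}]

def transform(rows):
    """Convert Fahrenheit to Celsius, rounding to one decimal place."""
    return [
        {"sensor": r["sensor"], "temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)}
        for r in rows
    ]

def run_etl():
    """ETL: transform in flight, then load only the cleaned rows."""
    warehouse = []
    warehouse.extend(transform(extract()))
    return warehouse

def run_elt():
    """ELT: load raw rows first, then transform inside the destination."""
    raw_zone = []
    raw_zone.extend(extract())     # the raw copy is kept in the destination
    curated = transform(raw_zone)  # transformation runs after loading
    return raw_zone, curated

etl_result = run_etl()
raw_rows, elt_result = run_elt()
print(etl_result == elt_result)  # both orderings yield the same curated rows
print(len(raw_rows))             # ELT additionally retains the raw copy
```

Both orderings produce the same curated rows in this sketch; the practical difference is where the transformation runs and whether the raw data is retained at the destination.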

Azure Databricks

Azure Databricks is a fast, collaborative Apache Spark analytics platform built in partnership with Databricks:

Collaborative notebooks shared by data engineers and data scientists
Auto-scaling Spark clusters that grow and shrink with the job
Delta Lake for ACID transactions on a data lake
MLflow for tracking experiments and managing machine-learning models

The distinguishing keywords are "Spark," "collaborative notebooks," and "machine learning." Note the overlap trap: both Synapse and Databricks can run Spark, but "collaborative data science / ML platform" leans Databricks, while "single workspace that also has an MPP SQL data warehouse" leans Synapse.

HDInsight, Stream Analytics, and Power BI

Azure HDInsight is a fully managed service for open-source big-data frameworks — Hadoop, Spark, Hive, Kafka, HBase. Pick it when a question names a specific open-source project to run as-is.
Azure Stream Analytics processes real-time data streams (often from IoT devices or Event Hubs) with SQL-like queries. Keyword: "real-time" or "streaming."
Power BI is the business-intelligence visualization layer — interactive reports and dashboards. Keyword: "dashboard," "report," "visualize."
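To make the "real-time" keyword for Stream Analytics more concrete, here is a minimal Python sketch of the kind of windowed aggregation a streaming query typically expresses: averaging readings over fixed, non-overlapping time windows. It is plain Python, not the Stream Analytics query language, and the reading tuples, the 10-second window size, and the `tumbling_average` helper are all assumptions made for illustration.

```python
from collections import defaultdict

def tumbling_average(readings, window_seconds):
    """Average values per fixed, non-overlapping time window.

    `readings` is an iterable of (timestamp_seconds, value) pairs.
    Returns {window_start: average} for every window that saw data.
    """
    totals = defaultdict(lambda: [0.0, 0])  # window_start -> [sum, count]
    for ts, value in readings:
        window_start = (ts // window_seconds) * window_seconds
        totals[window_start][0] += value
        totals[window_start][1] += 1
    return {start: s / n for start, (s, n) in sorted(totals.items())}

# Hypothetical sensor readings: (seconds since start, temperature)
stream = [(1, 20.0), (4, 22.0), (11, 30.0), (19, 34.0), (25, 40.0)]
averages = tumbling_average(stream, window_seconds=10)
print(averages)
```

A streaming engine would emit each window's result as the window closes instead of waiting for the whole input; the sketch processes a finished list only to keep the example short.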

Azure Data Lake Storage Gen2

Data Lake Storage Gen2 is not a separate product but a set of capabilities layered onto Blob Storage:

Hierarchical namespace — real directories and subdirectories with fast atomic folder operations (renames, deletes), instead of a flat blob list.
POSIX-compatible ACLs — fine-grained access control on individual files and folders.
Inherits Blob features: access tiers, lifecycle management, and the LRS-through-GZRS redundancy options.
Optimized to feed analytics engines such as Synapse, Databricks, and HDInsight.
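The POSIX-style ACL model can be illustrated with a toy check. Under POSIX semantics, reading a file requires execute permission on every directory along the path plus read permission on the file itself. The sketch below models only that rule; the `acls` dictionary, the principal names, and the `parent_dirs` and `can_read` helpers are hypothetical, and real ACL evaluation also involves owners, groups, masks, and default ACLs that are omitted here.

```python
# Toy model of path-based ACL checks; not an Azure SDK call.
# acls maps a path to {principal: permission string containing r, w, x}.
acls = {
    "/": {"alice": "r-x", "bob": "r-x"},
    "/sales": {"alice": "r-x", "bob": "---"},
    "/sales/2026.csv": {"alice": "r--", "bob": "r--"},
}

def parent_dirs(path):
    """Yield every ancestor directory of `path`, starting at the root."""
    parts = path.strip("/").split("/")[:-1]
    yield "/"
    current = ""
    for part in parts:
        current += "/" + part
        yield current

def can_read(principal, path):
    """True if `principal` can traverse all parents and read the file."""
    for directory in parent_dirs(path):
        if "x" not in acls.get(directory, {}).get(principal, "---"):
            return False
    return "r" in acls.get(path, {}).get(principal, "---")

print(can_read("alice", "/sales/2026.csv"))
print(can_read("bob", "/sales/2026.csv"))
```

In this sketch `bob` holds read permission on the file but still cannot reach it, because he lacks execute permission on the `/sales` directory. That is the kind of folder-level control a flat blob list cannot express.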

Why the hierarchical namespace matters: on a flat blob store, a "folder" is only a shared name prefix, so "renaming a folder" actually means copying and then deleting every object under that prefix, which is slow and expensive at petabyte scale. Gen2's true directory structure makes the rename a single fast metadata operation, which is why analytics engines that constantly read and reorganize partitioned datasets run far more efficiently against it. The exam phrasing to watch for is "big data analytics storage with a hierarchical namespace"; that points squarely at Data Lake Storage Gen2 rather than plain Blob Storage.
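The cost difference in a folder rename can be sketched with two toy stores. This is a simplified model under stated assumptions, not how either storage service is implemented: the hypothetical `FlatStore` treats every object as a full-path key, the hypothetical `HierarchicalStore` keeps a directory as a nested dictionary, and both simply count the operations a rename would need.

```python
class FlatStore:
    """Objects keyed by full path; 'folders' are just shared prefixes."""

    def __init__(self, keys):
        self.objects = {k: b"" for k in keys}

    def rename_prefix(self, old, new):
        """Rename by copying then deleting each object; returns op count."""
        ops = 0
        for key in [k for k in self.objects if k.startswith(old + "/")]:
            self.objects[new + key[len(old):]] = self.objects[key]  # copy
            del self.objects[key]                                    # delete
            ops += 2
        return ops


class HierarchicalStore:
    """Directories are real nodes, so a rename re-links one entry."""

    def __init__(self, folder, names):
        self.root = {folder: {n: b"" for n in names}}

    def rename_dir(self, old, new):
        """Move the directory node under a new name; returns op count."""
        self.root[new] = self.root.pop(old)
        return 1


names = [f"part-{i:04d}.parquet" for i in range(1000)]
flat = FlatStore([f"staging/{n}" for n in names])
tree = HierarchicalStore("staging", names)

flat_ops = flat.rename_prefix("staging", "curated")
tree_ops = tree.rename_dir("staging", "curated")
print(flat_ops, tree_ops)
```

With 1,000 files the flat model needs one copy and one delete per object, while the hierarchical model needs a single re-link regardless of how many files sit in the folder.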

Quick Reference

Service	Category	One-line purpose
Synapse Analytics	Unified analytics	Warehouse + big data in one workspace
Data Factory	ETL/ELT	Pipeline orchestration and data movement
Databricks	Spark analytics	Collaborative data science and ML
HDInsight	Managed open-source	Hadoop, Spark, Kafka, Hive workloads
Stream Analytics	Real-time	IoT and streaming data processing
Power BI	Visualization	Dashboards and BI reports
Data Lake Storage Gen2	Storage	Analytics-optimized big-data storage

On the Exam: You are not asked to configure these services — only to recognize their purpose. If you can finish the sentence "___ is used for ___" for each row above, you are ready for this objective.
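As a study aid, the keyword-to-service matching described in this guide can be written down as a small lookup. The pairs below are taken from the keyword hints in the sections above; the `KEYWORDS` table and `guess_service` helper are hypothetical names, and plain substring matching is an assumption that will not handle every exam phrasing.

```python
# Keyword -> service table drawn from this guide's keyword hints.
# Order matters: more specific phrases are checked first.
KEYWORDS = [
    ("hierarchical namespace", "Data Lake Storage Gen2"),
    ("data warehouse", "Synapse Analytics"),
    ("unified analytics", "Synapse Analytics"),
    ("pipeline", "Data Factory"),
    ("etl", "Data Factory"),
    ("notebook", "Databricks"),
    ("machine learning", "Databricks"),
    ("hadoop", "HDInsight"),
    ("kafka", "HDInsight"),
    ("real-time", "Stream Analytics"),
    ("streaming", "Stream Analytics"),
    ("dashboard", "Power BI"),
    ("visualize", "Power BI"),
]

def guess_service(scenario):
    """Return the first service whose keyword appears in the scenario."""
    text = scenario.lower()
    for keyword, service in KEYWORDS:
        if keyword in text:
            return service
    return None

print(guess_service("Build an ETL pipeline that copies data nightly"))
print(guess_service("Analyze sensor data in real-time as it arrives"))
print(guess_service("Share interactive dashboards with executives"))
```

The lookup returns `None` when no keyword matches, which is a useful prompt to reread the scenario for the Synapse-versus-Databricks distinction instead of guessing.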

Microsoft Azure Fundamentals

Azure AZ-900

2.11 Azure Data Services and Analytics


