2.11 Azure Data Services and Analytics

Key Takeaways

  • Azure Synapse Analytics unifies data warehousing (MPP SQL pools) and big data (Spark pools) with built-in pipelines and Power BI integration.
  • Azure Data Factory is the cloud ETL/ELT pipeline orchestrator with 90+ connectors and a visual designer.
  • Azure Databricks is an Apache Spark platform for collaborative data engineering and machine learning, with Delta Lake and MLflow.
  • Azure HDInsight is managed open-source big data (Hadoop, Spark, Hive, Kafka); Stream Analytics handles real-time/IoT streams.
  • Data Lake Storage Gen2 adds a hierarchical namespace and POSIX ACLs on top of Blob Storage as the foundation for analytics.
Last updated: June 2026

Quick Answer: Synapse Analytics = unified data warehouse + big data. Data Factory = ETL/ELT pipelines. Databricks = Apache Spark for engineering and machine learning. HDInsight = managed open-source big data (Hadoop/Spark/Kafka/Hive). Stream Analytics = real-time/IoT streams. Power BI = dashboards. Data Lake Storage Gen2 = analytics-optimized storage. AZ-900 only tests the one-line purpose of each — match the keyword, do not memorize internals.

How the Pieces Fit Together

A typical analytics pipeline reads like a sentence: raw data lands in Data Lake Storage Gen2, Data Factory moves and transforms it, Synapse or Databricks processes and models it, and Power BI visualizes the result. Knowing that flow makes the "which service" questions easy.

The AZ-900 objective treats these as descriptive knowledge — you are asked to recognize what each tool is for, not how to configure clusters, write Spark code, or tune SQL pools. Expect questions phrased as a one-sentence scenario whose keyword ("pipeline," "warehouse," "real-time," "notebook," "dashboard") maps to exactly one service. The most common trap is the Synapse-vs-Databricks overlap, covered below.
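
The keyword-to-service matching described above can be sketched as a simple lookup. This is only a study aid: the table below is assembled from the keywords this section highlights, and the names `KEYWORD_TO_SERVICE` and `match_service` are hypothetical, not part of any Azure SDK.

```python
# Hypothetical keyword table built from the phrases this section highlights.
# It is an illustrative study aid, not an official or exhaustive mapping.
KEYWORD_TO_SERVICE = {
    "pipeline": "Azure Data Factory",
    "warehouse": "Azure Synapse Analytics",
    "real-time": "Azure Stream Analytics",
    "notebook": "Azure Databricks",
    "dashboard": "Power BI",
}


def match_service(scenario: str) -> list[str]:
    """Return every service whose keyword appears in the scenario text."""
    text = scenario.lower()
    return [svc for kw, svc in KEYWORD_TO_SERVICE.items() if kw in text]


if __name__ == "__main__":
    print(match_service("Build a nightly pipeline that copies data from 12 sources"))
```

A scenario that returns more than one service is a hint to reread it for the dominant keyword, which is the situation the Synapse-vs-Databricks overlap creates.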

Azure Synapse Analytics

Azure Synapse Analytics (the successor to Azure SQL Data Warehouse) is an integrated analytics service that brings data warehousing and big-data processing into one workspace:

Component | Purpose
Dedicated/serverless SQL pools | Massively parallel processing (MPP) queries over structured data
Apache Spark pools | Big-data processing and machine learning
Pipelines | Built-in Data Factory integration for data movement
Synapse Studio | One UI for SQL, Spark, and pipeline development
Power BI link | Direct visualization of warehouse data

Reach for Synapse when a question says "enterprise data warehouse," "unified analytics," or "combine warehousing and big data in one place."

Azure Data Factory

Azure Data Factory (ADF) is the cloud ETL/ELT (Extract-Transform-Load / Extract-Load-Transform) orchestrator. It does not store data itself; it moves and transforms data on a schedule, on demand, or in response to an event.

  • 90+ connectors to cloud and on-premises sources
  • A visual, drag-and-drop pipeline designer (no code required for many flows)
  • Built-in scheduling, triggers, monitoring, and alerting

Whenever the keyword is "pipeline," "ETL," "orchestrate data movement," or "copy data from many sources," the answer is Data Factory.
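
The difference between ETL and ELT is only the order of the last two steps, which a minimal sketch can make concrete. This is plain Python over in-memory lists, not a Data Factory pipeline definition; the function names and the sample transformation are assumptions chosen for illustration.

```python
def extract(source):
    """Read raw rows from a source (here just an in-memory list)."""
    return list(source)


def transform(rows):
    """Illustrative cleanup: trim whitespace, uppercase, drop blank rows."""
    return [r.strip().upper() for r in rows if r.strip()]


def etl(source, destination):
    # Extract -> Transform -> Load: rows are cleaned in flight,
    # so only transformed data ever reaches the destination.
    destination.extend(transform(extract(source)))


def elt(source, destination):
    # Extract -> Load -> Transform: raw rows land first, then the
    # transformation runs where the data now lives.
    destination.extend(extract(source))
    destination[:] = transform(destination)


if __name__ == "__main__":
    raw = [" ada ", "", "grace"]
    warehouse_a, warehouse_b = [], []
    etl(raw, warehouse_a)
    elt(raw, warehouse_b)
    print(warehouse_a, warehouse_b)
```

Both paths end with the same cleaned rows; what changes is where the transformation runs, which is why one orchestrator can support either pattern.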

Azure Databricks

Azure Databricks is a fast, collaborative Apache Spark-based analytics platform, developed jointly by Microsoft and Databricks and offered as a first-party Azure service:

  • Collaborative notebooks shared by data engineers and data scientists
  • Auto-scaling Spark clusters that grow and shrink with the job
  • Delta Lake for ACID transactions on a data lake
  • MLflow for tracking experiments and managing machine-learning models

The distinguishing keywords are "Spark," "collaborative notebooks," and "machine learning." Note the overlap trap: both Synapse and Databricks can run Spark, but "collaborative data science / ML platform" leans Databricks, while "single workspace that also has an MPP SQL data warehouse" leans Synapse.

HDInsight, Stream Analytics, and Power BI

  • Azure HDInsight is a fully managed service for open-source big-data frameworks — Hadoop, Spark, Hive, Kafka, HBase. Pick it when a question names a specific open-source project to run as-is.
  • Azure Stream Analytics processes real-time data streams (often from IoT devices or Event Hubs) with SQL-like queries. Keyword: "real-time" or "streaming."
  • Power BI is the business-intelligence visualization layer — interactive reports and dashboards. Keyword: "dashboard," "report," "visualize."
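
To make "real-time stream processing" less abstract, here is a rough sketch of one common streaming pattern: averaging readings over fixed, non-overlapping time windows. It is plain Python working on a finished list, not the Stream Analytics query language, and the function name and the `(timestamp, value)` event shape are assumptions made for the example.

```python
from collections import defaultdict


def tumbling_average(events, window_seconds):
    """Average readings per fixed, non-overlapping time window.

    events: iterable of (timestamp_seconds, value) pairs.
    Returns a dict {window_start: average}, ordered by window start.
    """
    buckets = defaultdict(list)
    for ts, value in events:
        # Every timestamp falls into exactly one window.
        window_start = (ts // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


if __name__ == "__main__":
    readings = [(0, 20.0), (4, 22.0), (10, 30.0), (12, 34.0), (25, 40.0)]
    print(tumbling_average(readings, 10))
```

A real stream processor would emit each window's result as the window closes rather than after all events have arrived; the sketch only shows the grouping logic.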

Azure Data Lake Storage Gen2

Data Lake Storage Gen2 is not a separate product but a set of capabilities layered onto Blob Storage:

  • Hierarchical namespace — real directories and subdirectories with fast atomic folder operations (renames, deletes), instead of a flat blob list.
  • POSIX-compatible ACLs — fine-grained access control on individual files and folders.
  • Inherits Blob features: access tiers, lifecycle management, and the LRS-through-GZRS redundancy options.
  • Optimized to feed analytics engines such as Synapse, Databricks, and HDInsight.

Why the hierarchical namespace matters: on a flat blob store, folders are only name prefixes, so "renaming a folder" actually means copying every object under that prefix to a new name and deleting the original, which is slow and expensive at petabyte scale. Gen2's true directory structure makes the rename a single atomic metadata operation, which is why analytics engines that constantly read and reorganize partitioned datasets run far more efficiently against it. The exam phrasing to watch for is "big data analytics storage with a hierarchical namespace"; that points squarely at Data Lake Storage Gen2 rather than plain Blob Storage.
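
The cost difference in the paragraph above can be illustrated with a toy model. This is a simplified simulation using Python dictionaries, not the Azure Storage API: the flat store is modeled as a dict keyed by full blob path, the hierarchical store as nested dicts, and both function names are hypothetical.

```python
def rename_flat(blobs: dict, old_prefix: str, new_prefix: str) -> int:
    """Flat store: each object under the prefix is re-keyed one by one.

    Returns the number of per-object operations performed.
    """
    ops = 0
    # Snapshot the matching keys first so the dict is not mutated mid-iteration.
    for key in [k for k in blobs if k.startswith(old_prefix)]:
        blobs[new_prefix + key[len(old_prefix):]] = blobs.pop(key)
        ops += 1
    return ops


def rename_hierarchical(root: dict, old_name: str, new_name: str) -> int:
    """Hierarchical store: the directory entry is relinked exactly once."""
    root[new_name] = root.pop(old_name)
    return 1


if __name__ == "__main__":
    flat = {"raw/2024/a.csv": b"1", "raw/2024/b.csv": b"2", "curated/c.csv": b"3"}
    tree = {"raw": {"2024": {"a.csv": b"1", "b.csv": b"2"}}, "curated": {"c.csv": b"3"}}
    print(rename_flat(flat, "raw/", "staging/"), rename_hierarchical(tree, "raw", "staging"))
```

In the flat model the work grows with the number of objects under the prefix, while the hierarchical rename stays at one operation regardless of how much data the directory holds.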

Quick Reference

Service | Category | One-line purpose
Synapse Analytics | Unified analytics | Warehouse + big data in one workspace
Data Factory | ETL/ELT | Pipeline orchestration and data movement
Databricks | Spark analytics | Collaborative data science and ML
HDInsight | Managed open-source | Hadoop, Spark, Kafka, Hive workloads
Stream Analytics | Real-time | IoT and streaming data processing
Power BI | Visualization | Dashboards and BI reports
Data Lake Storage Gen2 | Storage | Analytics-optimized big-data storage

On the Exam: You are not asked to configure these services — only to recognize their purpose. If you can finish the sentence "___ is used for ___" for each row above, you are ready for this objective.

Test Your Knowledge

Which service provides a single workspace that combines an MPP SQL data warehouse with Apache Spark big-data processing?

A team needs to copy and transform data from 12 different on-premises and cloud sources on a nightly schedule, using a visual designer rather than code. Which service fits?

A factory streams sensor readings continuously and needs them analyzed in real time as they arrive. Which Azure service is designed for this?

What capability does Data Lake Storage Gen2 add on top of standard Blob Storage to make it better for big-data analytics?
