2.11 Azure Data Services and Analytics

Key Takeaways

  • Azure Synapse Analytics is an integrated analytics service combining big data and data warehousing for end-to-end analytics.
  • Azure Data Factory is a cloud-based ETL (Extract, Transform, Load) service for data integration and pipeline orchestration.
  • Azure Databricks provides an Apache Spark-based analytics platform for big data processing and machine learning.
  • Azure HDInsight is a managed open-source analytics service supporting Hadoop, Spark, Hive, Kafka, and more.
  • Azure Data Lake Storage Gen2 combines Blob Storage scalability with a hierarchical file system for big data analytics.
Last updated: March 2026

Quick Answer: Synapse Analytics = unified analytics (data warehouse + big data). Data Factory = ETL pipelines. Databricks = Spark-based analytics. HDInsight = managed Hadoop/Spark. Data Lake = big data storage with hierarchical namespace.

Azure Synapse Analytics

Azure Synapse Analytics (formerly Azure SQL Data Warehouse) is an integrated analytics service that brings together big data and data warehousing. Its main capabilities are:

Capability            Description
SQL pools             Massively parallel processing (MPP) SQL queries on structured data
Spark pools           Apache Spark for big data processing and machine learning
Pipelines             Data integration (built-in Data Factory capabilities)
Studio                Unified workspace for SQL, Spark, and pipeline development
Power BI integration  Direct integration for visualization
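
The MPP idea behind SQL pools can be illustrated with a small, self-contained sketch. It is a toy model, not how Synapse is implemented: the function names, the four distributions, and the sample `sales` rows are all assumptions made for illustration. Rows are spread across distributions by hashing a column, each distribution computes a partial result independently, and the partial results are then combined.

```python
import zlib
from collections import defaultdict


def distribute(rows, key, n_distributions):
    """Assign each row to a distribution by hashing its key column."""
    buckets = defaultdict(list)
    for row in rows:
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % n_distributions
        buckets[bucket].append(row)
    return buckets


def partial_sums(bucket_rows, group_col, value_col):
    """Work done independently on one distribution: a local GROUP BY ... SUM."""
    sums = defaultdict(float)
    for row in bucket_rows:
        sums[row[group_col]] += row[value_col]
    return sums


def mpp_sum(rows, group_col, value_col, n_distributions=4):
    """Combine the per-distribution partial results into the final answer."""
    buckets = distribute(rows, group_col, n_distributions)
    final = defaultdict(float)
    for bucket_rows in buckets.values():
        partial = partial_sums(bucket_rows, group_col, value_col)
        for group, subtotal in partial.items():
            final[group] += subtotal
    return dict(final)


# Hypothetical sample data, for illustration only.
sales = [
    {"region": "east", "amount": 120.0},
    {"region": "west", "amount": 80.0},
    {"region": "east", "amount": 30.0},
    {"region": "north", "amount": 50.0},
]
totals = mpp_sum(sales, "region", "amount")
print(totals)
```

In this sketch the per-distribution work runs in a simple loop. In a real MPP engine those steps would run in parallel on separate compute nodes.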

Azure Data Factory

Azure Data Factory is a cloud-based data integration service for creating ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) pipelines.
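
The difference between ETL and ELT is only the order of the steps, which a short sketch can make concrete. This is plain Python and not Data Factory code: `extract`, `transform`, the dictionary "stores", and the sample rows are hypothetical stand-ins chosen for illustration.

```python
def extract():
    """Stand-in for reading from a source system (hypothetical sample rows)."""
    return [{"name": " Ada ", "score": "91"}, {"name": "Grace", "score": "88"}]


def transform(rows):
    """Clean up whitespace and convert scores to integers."""
    return [{"name": r["name"].strip(), "score": int(r["score"])} for r in rows]


def run_etl(target):
    """ETL: transform inside the pipeline, then load only the cleaned rows."""
    target["curated"] = transform(extract())


def run_elt(target):
    """ELT: load the raw rows first, then transform inside the target store."""
    target["raw"] = extract()
    target["curated"] = transform(target["raw"])


etl_store, elt_store = {}, {}
run_etl(etl_store)
run_elt(elt_store)
```

Both approaches end with the same curated data. The ELT store also keeps the raw rows, which is why ELT is a common fit when the target system has the compute capacity to do the transformation itself.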

Key features:

  • 90+ connectors — Connect to cloud and on-premises data sources
  • Visual pipeline designer — Drag-and-drop interface for building data flows
  • Scheduling — Trigger pipelines on a schedule, on-demand, or based on events
  • Monitoring — Built-in pipeline monitoring and alerting

Azure Databricks

Azure Databricks is a collaborative, Apache Spark-based analytics platform for big data processing and machine learning. Key features:

  • Collaborative notebooks — Data scientists and engineers work together
  • Auto-scaling clusters — Scale Spark clusters automatically
  • MLflow integration — Track experiments and manage ML models
  • Delta Lake — ACID transactions on data lakes
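
To give a feel for what "ACID transactions on data lakes" means, the sketch below models atomic, all-or-nothing writes with an append-only commit log. It is a toy under stated assumptions and not Delta Lake's implementation or API: `ToyTable`, `commit`, `read`, and the validation rule are hypothetical names invented for this example.

```python
class ToyTable:
    """Toy table whose visible state is defined by an append-only commit log."""

    def __init__(self):
        self._log = []  # each entry is one committed batch of rows

    def commit(self, rows, validate):
        """Stage rows, validate all of them, then publish the batch in one step."""
        staged = list(rows)
        for row in staged:
            if not validate(row):
                # Nothing has been appended yet, so readers never see a partial write.
                raise ValueError(f"rejected row: {row!r}")
        self._log.append(staged)
        return len(self._log) - 1  # version number of this commit

    def read(self, version=None):
        """Return the rows as of a given version (latest by default)."""
        end = len(self._log) if version is None else version + 1
        return [row for batch in self._log[:end] for row in batch]


def has_positive_amount(row):
    """Hypothetical validation rule used only for this example."""
    return row.get("amount", 0) > 0


table = ToyTable()
v0 = table.commit([{"id": 1, "amount": 10}], has_positive_amount)
try:
    table.commit(
        [{"id": 2, "amount": 5}, {"id": 3, "amount": -1}],
        has_positive_amount,
    )
except ValueError:
    pass  # the whole batch is discarded, including the valid row with id 2
```

After the failed commit, `table.read()` still returns only the row from the first commit, and reading with `version=0` shows the table as it was at that point.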

Azure Data Lake Storage Gen2

Data Lake Storage Gen2 builds on Azure Blob Storage and adds:

  • Hierarchical namespace — File system semantics with directories and subdirectories
  • POSIX-compatible ACLs — Fine-grained access control on directories and files
  • Blob Storage features — Tiering, lifecycle management, redundancy options
  • Optimized for analytics — Works with Synapse, Databricks, HDInsight, and other analytics engines
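
Why a hierarchical namespace matters for analytics can be shown with a toy comparison of renaming a directory. The sketch uses plain dictionaries rather than any Azure SDK, and the function names and sample paths are assumptions for illustration. In a flat namespace a "directory" is only a shared name prefix, so a rename touches every object under it. With real directories, a rename re-links a single entry.

```python
def rename_prefix_flat(blobs, old_prefix, new_prefix):
    """Flat namespace: rename every object that shares the prefix.

    Returns the number of objects that had to be rewritten.
    """
    touched = 0
    for name in list(blobs):  # copy the keys because the dict is mutated below
        if name.startswith(old_prefix):
            blobs[new_prefix + name[len(old_prefix):]] = blobs.pop(name)
            touched += 1
    return touched


def rename_dir_hierarchical(root, old_name, new_name):
    """Hierarchical namespace: re-link one directory entry.

    The cost does not depend on how many files the directory contains.
    """
    root[new_name] = root.pop(old_name)
    return 1


# Hypothetical sample data: the same three files in both models.
flat = {"logs/2026/a.csv": b"1", "logs/2026/b.csv": b"2", "other/c.csv": b"3"}
tree = {"logs": {"2026": {"a.csv": b"1", "b.csv": b"2"}}, "other": {"c.csv": b"3"}}

flat_ops = rename_prefix_flat(flat, "logs/", "archive/")
tree_ops = rename_dir_hierarchical(tree, "logs", "archive")
```

With only two files the difference is small, but the flat rename grows with the number of objects under the prefix, while the hierarchical rename stays a single operation.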

Data Service Quick Reference

Service                 Type                 Best For
Synapse Analytics       Unified analytics    Enterprise data warehousing + big data
Data Factory            ETL/ELT              Data pipeline orchestration
Databricks              Spark analytics      Collaborative data science and ML
HDInsight               Managed open-source  Hadoop, Spark, Kafka, Hive workloads
Data Lake Storage Gen2  Storage              Big data analytics storage
Stream Analytics        Real-time analytics  IoT and streaming data processing
Power BI                Visualization        Business intelligence dashboards

On the Exam: You do not need deep knowledge of these analytics services. Know the HIGH-LEVEL purpose of each: Synapse = unified analytics, Data Factory = ETL pipelines, Databricks = Spark + ML, Data Lake = big data storage.

Test Your Knowledge

Which Azure service provides a unified analytics workspace combining data warehousing and big data?


Which Azure service is specifically designed for creating ETL/ELT data pipelines?
