2.5 Batch vs Stream Processing

Key Takeaways

Batch processing handles large volumes of data on a schedule and tolerates minutes-to-hours of latency.
Stream processing handles events one at a time or in micro-batches with sub-second to seconds latency.
Micro-batch processing groups events into small batches (often a few seconds wide) to balance throughput and latency.
Stream analytics queries aggregate events over time windows: tumbling (fixed, non-overlapping), hopping (fixed, overlapping), sliding (output only when an event enters or leaves the window), and session (gap-defined).
Azure Stream Analytics, Event Hubs, IoT Hub, and Microsoft Fabric Real-Time Intelligence are the core streaming services; Azure Synapse, Data Factory, and Fabric Data Factory dominate batch.

Last updated: June 2026

Every analytics platform has to decide when to process data: in scheduled bulk runs (batch) or as each event arrives (stream). Microsoft Learn treats this as one of the central DP-900 distinctions because it drives service choice.

Batch Processing

Batch processing collects data into a group and processes it as a single job. The classic example is a nightly ETL that loads yesterday's sales into the warehouse at 2 AM.

Batch Characteristics

Large volume per run: gigabytes to terabytes.
Scheduled or triggered, not continuous.
Latency-tolerant: minutes to hours between when data arrives and when it is processed.
Throughput-optimized: the goal is to move a lot of rows efficiently, not to react to any single row quickly.
Idempotent reruns: a failed job can be rerun on the same input and should produce the same output, without duplicating rows.
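
The rerun property in the last item above can be sketched in a few lines of Python. This is a minimal illustration, not an Azure API: `run_daily_batch`, the `warehouse` dictionary, and the sample sales rows are all hypothetical. The job overwrites the output partition for its run date instead of appending to it, so a rerun after a failure leaves the same result.

```python
from collections import defaultdict

def run_daily_batch(sales_rows, run_date, warehouse):
    """Aggregate one day's sales per store and overwrite that day's partition."""
    totals = defaultdict(float)
    for row in sales_rows:
        if row["date"] == run_date:
            totals[row["store"]] += row["amount"]
    # Overwrite (not append) so a rerun on the same input gives the same state.
    warehouse[run_date] = dict(totals)
    return warehouse[run_date]

# Hypothetical input rows standing in for "yesterday's sales".
sales = [
    {"date": "2026-06-01", "store": "north", "amount": 120.0},
    {"date": "2026-06-01", "store": "south", "amount": 80.0},
    {"date": "2026-06-01", "store": "north", "amount": 30.0},
    {"date": "2026-06-02", "store": "north", "amount": 55.0},
]

warehouse = {}
first = run_daily_batch(sales, "2026-06-01", warehouse)
second = run_daily_batch(sales, "2026-06-01", warehouse)  # simulated rerun
```

Had the job appended rows instead, the second run would have doubled the totals, which is the failure mode idempotent design avoids.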

Azure Services for Batch

Service	Role
Azure Data Factory / Fabric Data Factory	Orchestrates pipelines, copy activities, mapping data flows
Azure Synapse Analytics	Runs T-SQL or Spark batch jobs against warehouse and lake data
Microsoft Fabric (Warehouse, Lakehouse, Notebooks)	Unified storage and compute for batch ELT
Azure Databricks	Spark-based batch processing for large lake transformations
Azure HDInsight	Managed Hadoop/Spark for legacy batch workloads

Stream Processing

Stream processing treats each event as it arrives and produces results continuously. The classic example is a fraud detection system that scores every credit-card swipe in under a second.

Stream Characteristics

Unbounded — data has no defined end.
Event-driven — work is triggered by the arrival of new data, not by a clock.
Low latency — milliseconds to seconds end to end.
Per-event or windowed — operations either apply to single events or to events grouped by a time window.
State management — engines maintain running aggregates across windows.
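
As a rough sketch of the event-driven and state-management points above, the generator below emits a result after every event and keeps a small running state per device. The function name `running_average` and the sample readings are assumptions made for illustration; a real engine would also persist this state so it survives restarts.

```python
def running_average(events):
    """Yield (device, running average) after every event; the input may be unbounded."""
    state = {}  # device -> (count, total): the state the engine must keep
    for device, value in events:
        count, total = state.get(device, (0, 0.0))
        count, total = count + 1, total + value
        state[device] = (count, total)
        yield device, total / count

# A finite list stands in for an unbounded stream here.
readings = [("t1", 10.0), ("t2", 4.0), ("t1", 20.0)]
results = list(running_average(readings))
```

Because the function is a generator, nothing requires the input to end: each result is produced as soon as its event has been seen.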

Azure Services for Stream

Service	Role
Azure Event Hubs	Big-data event ingestion broker; the typical front door for streams
Azure IoT Hub	Bi-directional messaging and telemetry ingestion built specifically for IoT devices
Azure Stream Analytics	SQL-style streaming query engine over Event Hubs / IoT Hub
Microsoft Fabric Real-Time Intelligence (Eventstream / Eventhouse / KQL)	Unified streaming ingest, processing, and analytics inside Fabric
Azure Databricks Structured Streaming	Spark-based micro-batch and continuous streams

Micro-Batch Processing

Micro-batch processing sits between batch and true streaming. The engine collects events for a short interval, often a few seconds, and processes that small batch as a unit. Spark Structured Streaming, including the version in Azure Databricks and Microsoft Fabric, runs as micro-batches by default. This approach gets most of the low latency of streaming while keeping much of the throughput and the simpler exactly-once semantics of batch.
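
The grouping step can be sketched as follows. This is not Spark code: `micro_batches` is a hypothetical helper, and it cuts batches on event timestamps purely for illustration, whereas a real engine typically triggers on a clock.

```python
def micro_batches(events, interval_seconds):
    """Group (timestamp, payload) events, sorted by timestamp, into interval-wide batches."""
    batch, batch_id = [], None
    for ts, payload in events:
        current = int(ts // interval_seconds)
        if batch_id is not None and current != batch_id:
            # The interval has rolled over: hand the finished batch downstream.
            yield batch_id * interval_seconds, batch
            batch = []
        batch_id = current
        batch.append(payload)
    if batch:
        yield batch_id * interval_seconds, batch

# Timestamps are seconds since an arbitrary start; payloads are placeholders.
events = [(0.5, "a"), (1.9, "b"), (2.1, "c"), (6.0, "d")]
batches = list(micro_batches(events, interval_seconds=2))
```

Each yielded pair is the batch start time and the events that fell into that interval, so downstream code processes a small list rather than one event at a time.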

Time Windows in Stream Analytics

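Streams are unbounded, so almost every useful aggregation is computed per window. Azure Stream Analytics and Fabric Real-Time Intelligence support the four window types below, which you should recognize for DP-900. (Azure Stream Analytics also offers a snapshot window, which groups events that share the same timestamp.)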

Window	Shape	Example use
Tumbling	Fixed length, non-overlapping, contiguous	Count purchases per 5-minute bucket
Hopping	Fixed length, overlapping by a hop interval	5-minute average that updates every 1 minute
Sliding	Fixed length, produces output only when an event enters or leaves the window	Trigger only when ≥3 alerts arrive in any 30-second period
Session	Variable length, defined by gaps of inactivity	Group clicks by a user until they are idle for 10 minutes
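
To make the difference between tumbling and hopping windows concrete, the sketch below computes which windows a single event timestamp falls into. The function names are hypothetical, timestamps are whole seconds, and windows are treated as half-open `[start, end)` intervals, which is an assumption: real engines define their own boundary rules.

```python
def tumbling_window(ts, size):
    """Return the single [start, end) window of length `size` that contains ts."""
    start = (ts // size) * size
    return (start, start + size)

def hopping_windows(ts, size, hop):
    """Return every [start, end) window of length `size`, starting on a multiple of `hop`, that contains ts."""
    windows = []
    start = (ts // hop) * hop  # latest window start at or before ts
    # Walk backwards while the window still covers ts; ignore starts before time 0.
    while start > ts - size and start >= 0:
        windows.append((start, start + size))
        start -= hop
    return sorted(windows)

# An event at second 430, with 5-minute windows and a 1-minute hop.
one_bucket = tumbling_window(430, size=300)
overlapping = hopping_windows(430, size=300, hop=60)
```

With these inputs the event belongs to exactly one tumbling window but to five hopping windows, which is why a hopping aggregate can update every minute while still covering five minutes of data.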

Quick Mental Picture

Tumbling is a row of train cars — no gaps, no overlap.
Hopping is overlapping lanes on a highway — every event lands in several windows at once.
Sliding is a tripwire — the window only fires when something crosses it.
Session is a movie theater — it lasts as long as the audience keeps clapping; once they stop for long enough, the window closes.
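
The session and sliding pictures can also be sketched in code. Both helpers below are hypothetical simplifications: `session_windows` has no maximum session duration, and `sliding_alert` evaluates only when an event arrives, whereas a real engine may also re-evaluate when an event leaves the window.

```python
from collections import deque

def session_windows(timestamps, gap):
    """Split sorted timestamps into sessions that close after more than `gap` seconds of inactivity."""
    sessions = []
    for ts in timestamps:
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # still within the gap: extend the session
        else:
            sessions.append([ts])     # idle too long: start a new session
    return sessions

def sliding_alert(timestamps, size, threshold):
    """Return arrival times at which at least `threshold` events fall in the trailing `size` seconds."""
    fired, recent = [], deque()
    for ts in timestamps:
        recent.append(ts)
        while recent[0] <= ts - size:  # drop events that have left the window
            recent.popleft()
        if len(recent) >= threshold:
            fired.append(ts)
    return fired

# Click times in seconds with a 10-minute (600 s) idle gap.
sessions = session_windows([0, 120, 300, 1000, 1100], gap=600)
# Alert times in seconds: fire when 3 or more arrive within 30 seconds.
alerts = sliding_alert([0, 10, 50, 55, 58, 200], size=30, threshold=3)
```

The first three clicks form one session because no gap exceeds 600 seconds, and the alert fires only at second 58, when the third alert lands inside the same 30-second span.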

When to Combine Batch and Stream

Many real systems use both:

Stream path powers live dashboards, alerting, and anomaly detection.
Batch path rebuilds the historical record nightly, often with corrections, late-arriving data, and richer joins.

Microsoft Fabric Real-Time Intelligence is designed for exactly this pattern: events flow through Eventstreams, land in an Eventhouse for low-latency KQL queries, and can be persisted to OneLake for downstream batch and Power BI consumption.

For DP-900, the test is usually simpler: read the scenario, find the latency requirement ("within seconds" vs "by tomorrow morning"), and pick a service from the matching list above, batch or stream.

Latency, Throughput, and the Core Trade-Off

Batch and stream sit at opposite ends of a latency-versus-completeness trade-off. Batch waits until it has a complete set of data, then processes it efficiently in one pass — high throughput, high latency, easy correctness. Streaming acts the instant an event arrives — low latency, but it must cope with out-of-order and late-arriving events because the network does not deliver events in perfect order.

This is why streaming engines use watermarks (a marker that says "I have probably seen all events up to time T") to decide when a time window is safe to close. You will not be asked to configure watermarks, but recognizing that streaming trades completeness for immediacy is squarely on the exam.
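
A watermark can be sketched as "the largest event time seen so far, minus an allowed lateness." The code below is a simplified illustration under that assumption; the function name, the lateness policy, and the sample timestamps are hypothetical, and production engines use more elaborate rules.

```python
def process_with_watermark(event_times, window_size, allowed_lateness):
    """Track tumbling windows by event time; return (closed window starts, dropped late events)."""
    watermark = float("-inf")
    open_windows, closed, dropped = set(), [], []
    for event_time in event_times:  # arrival order, not event-time order
        start = (event_time // window_size) * window_size
        if start + window_size <= watermark:
            dropped.append(event_time)  # its window already closed: too late
            continue
        open_windows.add(start)
        # "I have probably seen all events up to this time."
        watermark = max(watermark, event_time - allowed_lateness)
        for s in sorted(open_windows):
            if s + window_size <= watermark:
                open_windows.remove(s)
                closed.append(s)
    return closed, dropped

# Event times in seconds, listed in the order they arrive.
arrivals = [5, 50, 62, 58, 75, 20, 130]
closed, dropped = process_with_watermark(arrivals, window_size=60, allowed_lateness=10)
```

The event at second 58 arrives out of order but is still counted because its window is open, while the event at second 20 arrives after the watermark has passed its window and is dropped. That is the completeness cost of closing windows promptly.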

Event Time vs Processing Time

A subtle but testable streaming concept: event time is when the event actually happened (stamped at the source), while processing time is when the engine handled it. A sensor reading generated at 12:00:00 might not reach the cloud until 12:00:07 because of a network hiccup. Had the delay pushed its arrival past 12:01:00, a window based on processing time would count the reading in the wrong minute. Windowed aggregations should therefore use event time so a late reading still counts in the correct minute, not the minute it happened to arrive. This distinction explains why windowing functions exist and why late data must be handled.
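
The effect is easy to see with a small sketch that counts the same readings per minute twice, once by each timestamp. The readings below are invented for illustration (the second one is delayed by 7 seconds across a minute boundary), and `count_per_minute` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime

def count_per_minute(readings, time_field):
    """Count readings per minute, bucketed by the chosen timestamp field."""
    counts = Counter()
    for reading in readings:
        counts[reading[time_field].strftime("%H:%M")] += 1
    return dict(counts)

readings = [
    {"event_time": datetime(2026, 6, 1, 12, 0, 10),
     "processing_time": datetime(2026, 6, 1, 12, 0, 12)},
    # Generated at 12:00:58 but delayed, so it is processed at 12:01:05.
    {"event_time": datetime(2026, 6, 1, 12, 0, 58),
     "processing_time": datetime(2026, 6, 1, 12, 1, 5)},
    {"event_time": datetime(2026, 6, 1, 12, 1, 20),
     "processing_time": datetime(2026, 6, 1, 12, 1, 21)},
]

by_event_time = count_per_minute(readings, "event_time")
by_processing_time = count_per_minute(readings, "processing_time")
```

Bucketing by event time attributes two readings to 12:00 and one to 12:01; bucketing by processing time reverses those counts, even though nothing about the sensor's behavior changed.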

Choosing a Service From the Scenario

Scenario clue	Pattern	Azure service
"Every night," "once a day," "scheduled load"	Batch	Data Factory / Fabric Data Factory, Synapse
"Within seconds," "real time," "as it happens"	Stream	Event Hubs + Stream Analytics, Fabric RTI
"Per device," "telemetry," "sensors"	Stream ingest	IoT Hub
"Group events into a few-second batch"	Micro-batch	Spark Structured Streaming (Databricks/Fabric)

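The clue-to-pattern lookup in the table above can be written as a toy keyword matcher. This is only a study aid under obvious assumptions: the `CLUES` list copies the phrases from the table, `classify` is a hypothetical helper, and real exam scenarios need reading comprehension rather than substring matching.

```python
# Phrases copied from the scenario table; each maps to a processing pattern.
CLUES = [
    (("every night", "once a day", "scheduled load"), "batch"),
    (("within seconds", "real time", "as it happens"), "stream"),
    (("per device", "telemetry", "sensors"), "stream ingest"),
    (("few-second batch",), "micro-batch"),
]

def classify(scenario):
    """Return every pattern whose clue phrases appear in the scenario text."""
    text = scenario.lower()
    return [pattern for phrases, pattern in CLUES
            if any(phrase in text for phrase in phrases)]

nightly = classify("Load sales every night into the warehouse")
turbines = classify("Score turbine telemetry within seconds of each reading")
```

A scenario can match more than one row: the turbine example is both a stream-processing and a stream-ingest problem, which points to an ingestion service plus a processing engine.
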
Why Many Systems Use Both (Lambda/Kappa)

The lambda architecture runs a fast speed (hot) layer for immediate, approximate results and a slower batch (cold) layer for complete, corrected history, then serves a merged view. The kappa architecture simplifies this by treating everything as a stream and replaying the event log for reprocessing. For DP-900 the key takeaway is recognizing the dual-path idea: the same Event Hubs ingestion endpoint can feed a real-time dashboard and land raw events in the lake for a nightly, full-fidelity rebuild — which is exactly what Microsoft Fabric Real-Time Intelligence is designed to support.
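
The "merged view" of the lambda architecture can be sketched as combining two lookups. The names and numbers below are hypothetical, and the sketch assumes the speed layer holds only events that arrived after the last batch run, so adding the two does not double count.

```python
def serve_merged_view(batch_view, speed_view):
    """Combine complete batch totals with speed-layer totals for events since the last batch run."""
    merged = dict(batch_view)
    for key, recent in speed_view.items():
        merged[key] = merged.get(key, 0) + recent
    return merged

batch_view = {"turbine-1": 1000, "turbine-2": 750}  # rebuilt nightly (cold path)
speed_view = {"turbine-1": 12, "turbine-3": 4}      # events since the last rebuild (hot path)
merged = serve_merged_view(batch_view, speed_view)
```

After the next nightly rebuild, the batch view absorbs those recent events and the speed view starts again from empty, which is how the corrected history eventually replaces the approximate real-time figures.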

Test Your Knowledge

A logistics company needs to calculate the number of package scans per warehouse in non-overlapping 1-minute buckets, with each scan event belonging to exactly one bucket. Which Azure Stream Analytics window type fits this requirement?

Hopping window

Sliding window

Session window

Tumbling window

Test Your Knowledge

A wind farm needs to score turbine telemetry within two seconds of each reading to detect anomalies, while a separate nightly job rebuilds historical turbine performance tables in a Fabric lakehouse. Which combination of Azure services best fits this scenario?

Azure Data Factory for both the real-time scoring and the nightly rebuild

Azure SQL Database for the real-time scoring and Azure Cache for Redis for the nightly rebuild

Azure Event Hubs with Azure Stream Analytics for real-time scoring, and Microsoft Fabric (Data Factory and Lakehouse) for the nightly batch rebuild

Azure Cosmos DB for the real-time scoring and Azure Cosmos DB for the nightly rebuild

