5.4 Azure Stream Analytics and Real-Time Ingestion

Key Takeaways

  • Event Hubs ingests application and Kafka-protocol events at millions per second; IoT Hub adds per-device identity, MQTT, and cloud-to-device messaging.
  • Azure Stream Analytics is a managed, SQL-based stream processor measured in Streaming Units (SUs) that reads from Event Hubs or IoT Hub and writes to Power BI, Synapse, Cosmos DB, and more.
  • Tumbling windows are non-overlapping, hopping windows overlap by a hop interval, sliding windows emit on event entry or exit, session windows group by inactivity gaps, and snapshot windows group identical timestamps.
  • Microsoft Fabric Real-Time Intelligence uses Eventhouse and KQL Database (queried with KQL) as the modern replacement for the Stream Analytics plus Azure Data Explorer stack.
  • A lambda architecture pairs a hot path (Event Hubs to Stream Analytics to Power BI) with a cold path (Event Hubs to lake to batch processing) so both low-latency alerts and full-fidelity history are available.
Last updated: June 2026


Batch analytics is no longer enough. Fraud detection, IoT telemetry, clickstreams, and operational dashboards all demand insights with sub-second to single-digit-second latency. Azure offers a stack of real-time services, and DP-900 expects you to know which one handles ingestion, which one handles processing, and which one stores the results.

Quick Answer: Event Hubs and IoT Hub ingest the events. Azure Stream Analytics is the classic SQL-like processing engine that filters and aggregates them. Microsoft Fabric Real-Time Intelligence, including Eventhouse and KQL Database, is the modern Fabric-native replacement that stores and queries event data interactively.

The Real-Time Pipeline

Producers --> Ingestion --> Stream Processing --> Storage / Serve --> Visualize
(IoT, apps,    (Event Hubs,   (Stream Analytics,   (ADLS, Cosmos DB,   (Power BI,
 clickstream)   IoT Hub,       Fabric Real-Time     Eventhouse, SQL DB)  Fabric)
                Kafka)         Intelligence)

Ingestion: Event Hubs vs IoT Hub

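Azure Event Hubs is a hyperscale event-ingestion service. It can accept millions of events per second and spreads them across partitions so that many consumers can read in parallel. Each consumer group is an independent view of the whole stream, so several applications can read the same events at their own pace. Event Hubs also exposes a Kafka-compatible endpoint, so existing Kafka producers and consumers can usually connect by changing only their connection configuration.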
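
To make partitions and consumer groups concrete, here is a minimal in-memory Python sketch. Everything in it is hypothetical: `ToyEventHub`, the partition count, and the SHA-256-based `partition_for` are illustrative stand-ins, not the Event Hubs SDK or the service's real partition-assignment hash. What it does show is the idea: events with the same partition key land in the same partition (so their order is preserved), and each consumer group tracks its own read position, so two applications can read the same events independently.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; a real hub's partition count is set when it is created


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key to a partition index.

    Illustrative hash only; Event Hubs uses its own internal algorithm.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


class ToyEventHub:
    """In-memory stand-in for a hub: a list of append-only partitions."""

    def __init__(self, num_partitions: int = NUM_PARTITIONS):
        self.partitions = [[] for _ in range(num_partitions)]
        # offsets[consumer_group][partition] -> next position that group will read
        self.offsets = defaultdict(lambda: [0] * num_partitions)

    def send(self, key: str, body: dict) -> int:
        """Append an event to the partition chosen by its key; return that partition."""
        p = partition_for(key, len(self.partitions))
        self.partitions[p].append(body)
        return p

    def receive(self, consumer_group: str, partition: int) -> list:
        """Return events this consumer group has not yet read from one partition."""
        start = self.offsets[consumer_group][partition]
        events = self.partitions[partition][start:]
        self.offsets[consumer_group][partition] = len(self.partitions[partition])
        return events


if __name__ == "__main__":
    hub = ToyEventHub()
    p = hub.send("device-42", {"temp": 71})
    hub.send("device-42", {"temp": 73})  # same key, same partition
    print(hub.receive("dashboard", p))   # both events, in order
    print(hub.receive("archiver", p))    # an independent group sees them too
    print(hub.receive("dashboard", p))   # nothing new for this group
```

The per-group offsets are the point of the design: adding a new consumer group never disturbs the read position of an existing one.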

Azure IoT Hub shares Event Hubs technology (its built-in endpoint is Event Hubs-compatible) but adds device-centric features:

  • Per-device identity and authentication (X.509 certs, SAS tokens).
  • Bi-directional communication: cloud-to-device messages, direct methods, device twins, file uploads.
  • Zero-touch, at-scale provisioning through the companion Device Provisioning Service (DPS).
  • Support for MQTT, AMQP, and HTTPS device protocols.

Rule of thumb for the exam: if the source is "an application," "a website," or "a connector," it is Event Hubs. If the source is "a sensor," "a device," or anything that needs device twins or cloud-to-device commands, it is IoT Hub.
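
The rule of thumb can be written down as a tiny decision function. This is only a study aid that encodes the heuristic above; the `Source` fields and the `choose_ingestion` name are hypothetical and are not part of any Azure API.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical description of an event source in an exam scenario."""
    is_device: bool                      # a sensor or physical device, not an app/website/connector
    needs_device_identity: bool = False  # per-device auth such as X.509 certs or SAS tokens
    needs_cloud_to_device: bool = False  # commands, direct methods, or device twins


def choose_ingestion(src: Source) -> str:
    """Apply the rule of thumb: any device feature points to IoT Hub, otherwise Event Hubs."""
    if src.is_device or src.needs_device_identity or src.needs_cloud_to_device:
        return "IoT Hub"
    return "Event Hubs"


if __name__ == "__main__":
    clickstream = Source(is_device=False)
    sensors = Source(is_device=True, needs_device_identity=True, needs_cloud_to_device=True)
    print(choose_ingestion(clickstream), choose_ingestion(sensors))
```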

Azure Stream Analytics

Azure Stream Analytics (ASA) is a fully managed, SQL-based stream processing engine. You write a SQL-like query that reads from one or more inputs (Event Hubs, IoT Hub, Blob Storage or ADLS Gen2) and writes to one or more outputs (Power BI, Azure Synapse Analytics, SQL Database, Cosmos DB, ADLS Gen2, Service Bus, Event Hubs, Azure Functions).

Key concepts:

  • Streaming Units (SUs) — the unit of capacity. Scale up or down by changing SU count.
  • Reference data — small slowly-changing tables joined to the stream at query time.
  • Windowing functions — the central feature, because every real-time aggregate must be bounded by time.
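
Reference data is easiest to picture as a dictionary lookup applied to every event as it flows past. The Python sketch below mimics a stream-to-reference-data JOIN; `DEVICE_SITES`, its columns, and the `enrich` helper are hypothetical, and the inner-join behavior (events without a matching reference row are dropped) is an assumption chosen for the example. In a real job you would express this as a JOIN in the query, and ASA loads the reference table for you.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical reference data: a small, slowly changing lookup table keyed by device id.
DEVICE_SITES: Dict[str, dict] = {
    "dev-1": {"site": "Plant A", "max_temp": 90},
    "dev-2": {"site": "Plant B", "max_temp": 80},
}


def enrich(stream: Iterable[dict], reference: Dict[str, dict]) -> Iterator[dict]:
    """Join each streaming event to its reference row by device_id (inner-join semantics)."""
    for event in stream:
        ref = reference.get(event["device_id"])
        if ref is None:
            continue  # no matching reference row: drop the event, as an inner join would
        yield {**event, **ref}


if __name__ == "__main__":
    events = [
        {"device_id": "dev-1", "temp": 95},
        {"device_id": "dev-2", "temp": 70},
        {"device_id": "dev-9", "temp": 50},  # unknown device, filtered out by the join
    ]
    # Compare each reading against its own device's threshold from the reference table.
    for row in enrich(events, DEVICE_SITES):
        if row["temp"] > row["max_temp"]:
            print("over limit:", row["device_id"], row["site"])
```

Keeping per-device thresholds in reference data rather than hard-coding them in the query means they can change without editing the job.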

Windowing Functions

Streams are infinite, so aggregations must define a time window. ASA supports five window types — they appear constantly on the exam.

Window | Behavior | Typical use
Tumbling | Fixed-size, non-overlapping windows placed back to back | "Total orders per 1-minute bucket"
Hopping | Fixed-size windows that overlap by a hop interval | "5-minute moving total, refreshed every minute"
Sliding | Emits a result only when an event enters or leaves the window | "Alert when a sensor exceeds 100 degrees within a 30-second sliding window"
Session | Groups events until a gap of inactivity, then closes | "Web session = events with no more than a 30-second gap"
Snapshot | Groups events that have exactly the same timestamp | "All events that occurred at the same millisecond"

Tumbling and hopping windows are time-based and predictable. Sliding and session windows are event-driven and variable.
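
The event-driven windows are the ones whose output times are hardest to predict, so here is a small Python sketch of sliding and session windows (snapshot windows are just a group-by on the exact timestamp and are omitted). The function names are hypothetical. The sliding window here uses a (T - size, T] boundary and is evaluated only at moments when an event enters (its own timestamp) or leaves (timestamp + size). The session window keeps extending while the gap between consecutive events is at most the timeout; ASA's session window also accepts a maximum duration, which this sketch leaves out.

```python
from typing import List, Tuple

Event = Tuple[int, float]  # (timestamp in seconds, reading)


def sliding_max(events: List[Event], size: int) -> List[Tuple[int, float]]:
    """Max reading in the window (T - size, T], emitted only when the window's contents change.

    Contents change when an event enters (at its timestamp t) or exits (at t + size),
    so those are the only candidate output times. Empty windows produce no row.
    """
    candidate_times = sorted({t for t, _ in events} | {t + size for t, _ in events})
    results = []
    for T in candidate_times:
        inside = [v for t, v in events if T - size < t <= T]
        if inside:
            results.append((T, max(inside)))
    return results


def session_windows(timestamps: List[int], timeout: int) -> List[List[int]]:
    """Group timestamps into sessions; a gap larger than `timeout` closes the current session."""
    sessions: List[List[int]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)  # still within the inactivity gap: extend the session
        else:
            sessions.append([t])    # first event, or gap exceeded: start a new session
    return sessions


if __name__ == "__main__":
    readings = [(0, 90.0), (10, 105.0), (50, 95.0)]
    # Alert whenever the 30-second sliding maximum exceeds 100 degrees.
    for T, peak in sliding_max(readings, size=30):
        if peak > 100:
            print(f"t={T}s alert: peak {peak}")
    print(session_windows([0, 10, 35, 100, 120], timeout=30))
```

Note that neither function produces output on a fixed clock: the readings and timestamps alone decide when a result appears, which is what makes these windows "event-driven."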

Microsoft Fabric Real-Time Intelligence (the 2026 Pattern)

Microsoft has consolidated its modern real-time analytics offering inside Microsoft Fabric as Real-Time Intelligence (RTI). The two storage primitives are:

  • Eventhouse — A Fabric item that holds one or more KQL databases, optimized for time-series and log analytics. Backed by the Kusto engine.
  • KQL Database — Stores streaming and historical event data and is queried with Kusto Query Language (KQL) — the same language used by Azure Data Explorer, Log Analytics, and Microsoft Sentinel.

Other Real-Time Intelligence pieces include:

  • Eventstreams — A no-code experience to route events from Event Hubs, IoT Hub, Kafka, and other sources to destinations such as KQL databases, lakehouses, or Activator.
  • Activator — The trigger engine that watches data in motion and fires actions (Teams alert, Power Automate flow, email) when conditions are met.
  • Real-Time Dashboard — Native low-latency dashboards built directly on KQL queries.

For a DP-900 candidate it is enough to know that Fabric RTI is the modern, unified replacement for the Azure Stream Analytics + Azure Data Explorer + Power BI streaming dataset combo of the older Synapse era.

Hot Path vs Cold Path (Lambda Architecture)

A common architecture pattern keeps two parallel branches:

  • Hot path: Event Hubs -> Stream Analytics (or Fabric Eventstreams) -> Power BI / Activator alerts. Low latency (sub-second to a few seconds), summarized data.
  • Cold path: Event Hubs -> ADLS Gen2 raw files (for example, via Event Hubs Capture) -> Spark/Synapse -> Power BI. Higher latency, full fidelity for historical analysis and ML training.

Both branches feed from the same ingestion endpoint, which is why Event Hubs sits at the front of nearly every Azure real-time architecture.
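
The two branches can be sketched in a few lines of Python: every event from the single ingestion point is appended unchanged to a raw store (cold path) and also checked immediately against an alert rule (hot path), while a batch function later computes full-history aggregates from the raw store. The in-memory lists, the 100-degree threshold, and the function names are hypothetical stand-ins for ADLS Gen2, Stream Analytics or Eventstreams, and a Spark job.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple


def hot_path(event: dict, threshold: float) -> Optional[Tuple[str, float]]:
    """Low-latency branch: decide immediately whether this single event needs an alert."""
    if event["temp"] > threshold:
        return (event["device_id"], event["temp"])
    return None


def cold_path_batch(raw: List[dict]) -> Dict[str, float]:
    """High-latency branch: a later batch job computes averages over the full raw history."""
    by_device: Dict[str, List[float]] = defaultdict(list)
    for event in raw:
        by_device[event["device_id"]].append(event["temp"])
    return {device: mean(temps) for device, temps in by_device.items()}


def ingest(events: List[dict], threshold: float):
    """Feed every event from one ingestion point into both branches."""
    raw_lake: List[dict] = []             # stands in for raw files landed in ADLS Gen2
    alerts: List[Tuple[str, float]] = []  # stands in for Power BI / Activator alerts
    for event in events:
        raw_lake.append(event)            # cold path keeps full fidelity
        alert = hot_path(event, threshold)
        if alert is not None:
            alerts.append(alert)
    return alerts, raw_lake


if __name__ == "__main__":
    stream = [
        {"device_id": "a", "temp": 80.0},
        {"device_id": "a", "temp": 110.0},
        {"device_id": "b", "temp": 60.0},
    ]
    alerts, lake = ingest(stream, threshold=100.0)
    print("hot:", alerts)
    print("cold:", cold_path_batch(lake))
```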

Inputs, Outputs, and the Query in Between

An Azure Stream Analytics job has three parts you should be able to name:

  • Inputs: the data sources. A stream input (Event Hubs, IoT Hub, or Blob/ADLS for replay) plus optional reference data (a small, slowly changing lookup table joined to the stream).
  • Query: the SQL-like statement, which includes the windowing function.
  • Outputs: the sinks. Power BI (for live dashboards), Synapse / SQL Database, Cosmos DB, ADLS Gen2 / Blob, Service Bus, Event Hubs, or Azure Functions.

A frequent exam pattern hands you a sentence and asks which is the input versus output; "powers a live dashboard" means the Power BI output.

Tumbling vs Hopping, Precisely

The difference candidates miss most: a tumbling window of 5 minutes produces one result every 5 minutes, and each event belongs to exactly one window. A hopping window of 5 minutes with a 1-minute hop also covers 5 minutes of data but emits a result every minute, so each event appears in five overlapping windows (window size divided by hop). If the question says "a moving/rolling average updated more often than the window length," it is hopping; if it says "one bucket, no overlap," it is tumbling. A tumbling window is simply a hopping window whose hop equals its size.
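
The counting argument above can be checked with a short Python sketch. It assumes windows are aligned to multiples of the hop starting at time 0 and uses half-open [start, start + size) intervals for simplicity; ASA's exact boundary convention may differ, and the function names are hypothetical. An event at minute 12 falls in one 5-minute tumbling window but in five 5-minute hopping windows with a 1-minute hop, and calling the hopping function with hop equal to size reproduces the tumbling result.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def hopping_window_starts(t: int, size: int, hop: int) -> List[int]:
    """Start times of every hopping window [start, start + size) that contains time t.

    Windows start at multiples of `hop`, so t belongs to every start in (t - size, t].
    """
    starts = []
    start = (t // hop) * hop  # latest window that has already opened at time t
    while start > t - size:   # walk back while the window still covers t
        starts.append(start)
        start -= hop
    return sorted(starts)


def tumbling_window_start(t: int, size: int) -> int:
    """A tumbling window is a hopping window whose hop equals its size: exactly one match."""
    (start,) = hopping_window_starts(t, size, size)
    return start


def tumbling_avg(events: List[Tuple[int, float]], size: int) -> Dict[int, float]:
    """Average reading per non-overlapping window, keyed by window start time."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for t, value in events:
        buckets[tumbling_window_start(t, size)].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


if __name__ == "__main__":
    print(hopping_window_starts(12, size=5, hop=1))  # five overlapping windows
    print(tumbling_window_start(12, size=5))         # one window
    readings = [(0, 10.0), (30, 20.0), (60, 30.0), (90, 50.0)]  # (seconds, temperature)
    print(tumbling_avg(readings, size=60))           # one average per 1-minute bucket
```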

Choosing Between Stream Analytics and Fabric RTI

For an Azure-resource architecture that already uses Event Hubs and Power BI, Azure Stream Analytics is the established SQL-based processor. For a unified, SaaS-style platform, Microsoft Fabric Real-Time Intelligence is the 2026 direction: Eventstreams route data with no code, an Eventhouse/KQL database stores and indexes it for interactive KQL queries, Activator fires alerts, and Real-Time Dashboards visualize it. Both solve the same problem; the exam distinguishes them by whether the scenario is built on discrete Azure resources (ASA) or on Fabric (RTI).

KQL in One Paragraph

You will not write KQL on DP-900, but you should recognize it. Kusto Query Language is the read-optimized language of Azure Data Explorer, Log Analytics, Microsoft Sentinel, and Fabric Eventhouses. It reads top-down with a pipe operator, e.g. Telemetry | where Temperature > 100 | summarize avg(Temperature) by DeviceId. If a question shows pipe-delimited query syntax or names an Eventhouse/KQL database, the language is KQL, not T-SQL or the Stream Analytics SQL dialect.
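
To see what the example KQL pipeline computes, here is the same logic in Python, one function per pipe stage. This is only an analogy for reading KQL's top-down flow; the helper names and sample rows are hypothetical, and real KQL would name the output column `avg_Temperature` and return a table rather than a dictionary.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def where(rows: List[dict], predicate: Callable[[dict], bool]) -> List[dict]:
    """Analogue of `| where ...`: keep only rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def summarize_avg(rows: List[dict], value_col: str, by_col: str) -> Dict[str, float]:
    """Analogue of `| summarize avg(value_col) by by_col`."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        groups[row[by_col]].append(row[value_col])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


if __name__ == "__main__":
    telemetry = [
        {"DeviceId": "d1", "Temperature": 101},
        {"DeviceId": "d1", "Temperature": 99},
        {"DeviceId": "d1", "Temperature": 105},
        {"DeviceId": "d2", "Temperature": 120},
    ]
    # Telemetry | where Temperature > 100 | summarize avg(Temperature) by DeviceId
    hot = where(telemetry, lambda r: r["Temperature"] > 100)
    print(summarize_avg(hot, "Temperature", "DeviceId"))
```

Each stage consumes the previous stage's output, which is exactly how a KQL query reads from top to bottom.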

Reliability Concepts

Streaming engines aim for strong delivery guarantees. Azure Stream Analytics provides exactly-once processing within the job and at-least-once delivery to most outputs, using checkpoints so a restarted job resumes without losing events. Because delivery is at-least-once, an output can occasionally receive the same result twice after a restart; outputs that upsert by a key absorb such repeats. Combined with event-time windowing and late-arrival and out-of-order tolerance settings, this is how a real-time pipeline stays correct despite network jitter: the conceptual reliability story the exam expects you to appreciate even without configuring it.
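
The checkpoint-and-replay idea can be illustrated with a toy job in Python. This is a generic sketch, not how ASA is implemented (ASA manages its checkpoints internally): the job records a checkpoint every few events, a simulated crash after a write but before the next checkpoint forces some events to be written again on restart, and a sink keyed by event position absorbs those duplicate writes. `ToyJob`, `KeyedSink`, and the checkpoint interval are hypothetical.

```python
from typing import Dict, List, Optional


class KeyedSink:
    """Output that upserts by key, so a replayed write overwrites rather than duplicates."""

    def __init__(self) -> None:
        self.rows: Dict[int, str] = {}
        self.writes = 0  # total write calls, including replays

    def write(self, key: int, value: str) -> None:
        self.writes += 1
        self.rows[key] = value


class ToyJob:
    """Processes a list of events and checkpoints its position every `checkpoint_every` events."""

    def __init__(self, events: List[str], checkpoint_every: int) -> None:
        self.events = events
        self.checkpoint_every = checkpoint_every
        self.checkpoint = 0  # index of the next event to process after a restart

    def run(self, sink: KeyedSink, crash_at: Optional[int] = None) -> bool:
        """Resume from the last checkpoint; optionally crash right after writing event `crash_at`."""
        for i in range(self.checkpoint, len(self.events)):
            sink.write(i, self.events[i])
            if crash_at is not None and i == crash_at:
                return False  # crashed: writes since the last checkpoint will be replayed
            if (i + 1) % self.checkpoint_every == 0:
                self.checkpoint = i + 1
        self.checkpoint = len(self.events)
        return True


if __name__ == "__main__":
    job = ToyJob(["e0", "e1", "e2", "e3", "e4"], checkpoint_every=2)
    sink = KeyedSink()
    job.run(sink, crash_at=2)  # writes e0..e2, but the last checkpoint covers only e0..e1
    job.run(sink)              # restart replays e2, then finishes
    print(sink.writes, len(sink.rows))  # one extra write, no lost or duplicated rows
```

Nothing is lost because the restart begins at the checkpoint, and nothing is double-counted in the output because the replayed write lands on the same key.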

Test Your Knowledge

A solution must produce the average temperature for each non-overlapping 1-minute period from an IoT Hub stream, with one result per minute. Which Azure Stream Analytics windowing function should you use?

Test Your Knowledge

An architect is choosing between Azure Event Hubs and Azure IoT Hub for ingesting telemetry. The source is 50,000 industrial sensors that require X.509 certificate authentication and occasional cloud-to-device commands to change sampling rate. Which service fits best?
