An e-commerce team reports that their live checkout web page is responding slowly and occasionally throwing exceptions. Which Azure tool should they use to diagnose response times, failures, and dependency calls?

Application Insights. Application Insights is the Application Performance Management feature of Azure Monitor; it tracks response times, failure rates, exceptions, and dependency calls for live web applications. Service Health and Azure Status report on Microsoft's platform, not your application code, and Resource Health reports the up/down state of one resource without app-level diagnostics.

Which Azure Service Health layer is personalized to only the services and regions that your subscriptions actually use, and tracks planned maintenance affecting you?

Service Health. Service Health is the personalized middle layer that surfaces service issues, planned maintenance, health advisories, and security advisories scoped to your subscriptions. Azure Status is the global public view for all customers, and Resource Health zooms into a single specific resource.

For how long does Azure Monitor retain platform metrics by default before you must export them for longer storage?

93 days. Platform metrics are stored for 93 days by default. To keep numeric data longer, you route it through diagnostic settings to a Log Analytics workspace, a storage account, or an event hub. Logs, not metrics, are what a workspace can retain for up to two years.

Azure Monitor and Service Health — Free Study Guide 2026

Quick Answer: Azure Monitor is the full-stack telemetry platform. Metrics = fast numbers (93-day retention). Logs = detailed records you query with Kusto Query Language (KQL) in Log Analytics. Application Insights = live app performance. Service Health answers "is Azure broken?" at three zoom levels.

Azure Monitor: the telemetry umbrella

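Azure Monitor starts collecting platform metrics and the Activity Log as soon as a resource is created, with no agent to deploy. Resource logs are not stored until you add a diagnostic setting, and data from inside a virtual machine needs the Azure Monitor Agent. Knowing the metrics-versus-logs split is the single most-tested idea in this section.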

Pillar	What it is	Retention	How you read it
Metrics	Lightweight, time-series numbers (CPU %, request count, disk IOPS) sampled at one-minute granularity	93 days by default	Metrics Explorer, charts, autoscale rules
Logs	Verbose, structured event records (traces, errors, sign-ins)	Configurable, 30 days up to 2 years (longer with archive)	KQL in a Log Analytics workspace

A useful mental model: a metric tells you a number crossed a line; a log tells you the story of what happened. Autoscaling and most alerts fire on metrics because they are cheap and near-real-time; forensic troubleshooting uses logs.

Data sources Azure Monitor ingests

Application — your code's performance, via Application Insights
Guest OS — metrics/logs from inside a virtual machine (needs the Azure Monitor Agent)
Azure resource — platform metrics and resource (diagnostic) logs
Azure subscription — Activity Log and Service Health events
Azure tenant — Microsoft Entra ID (formerly Azure AD) audit and sign-in logs

Log Analytics and KQL

Log Analytics is the tool where you write queries against log data using Kusto Query Language (KQL). You will not be asked to write KQL on AZ-900, but a classic distractor pairs the phrase "query logs" with Application Insights. The exam answer for querying log data is always Log Analytics.
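To make "querying logs" concrete, here is a minimal Python sketch of what a filter-then-summarize log query does. It does not call Azure. The records, the `MyAppLogs` table name, and the column names are all made up for illustration, and the KQL in the docstring only shows the general shape of such a query.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical log records; in Azure these rows would live in a workspace table.
NOW = datetime(2026, 1, 15, 12, 0)
LOGS = [
    {"time": NOW - timedelta(hours=1), "service": "checkout", "level": "Error"},
    {"time": NOW - timedelta(hours=2), "service": "checkout", "level": "Error"},
    {"time": NOW - timedelta(hours=3), "service": "catalog", "level": "Info"},
    {"time": NOW - timedelta(days=3), "service": "catalog", "level": "Error"},
]


def errors_by_service(records, now, lookback):
    """Count recent error records per service.

    Roughly the shape of this KQL (table and columns are invented):
        MyAppLogs
        | where time > ago(1d) and level == "Error"
        | summarize count() by service
    """
    cutoff = now - lookback
    recent_errors = (
        r for r in records if r["time"] > cutoff and r["level"] == "Error"
    )
    return dict(Counter(r["service"] for r in recent_errors))


print(errors_by_service(LOGS, NOW, timedelta(days=1)))
```

Widening the lookback to seven days would also pick up the older catalog error, which is the kind of historical question logs answer and metrics do not.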

Application Insights

Application Insights is the Application Performance Management (APM) feature for live web apps and APIs. It surfaces:

Response times, failure rates, and exception stacks
Dependency tracking — calls out to SQL, REST APIs, and queues
Availability tests — synthetic ping tests from worldwide locations
Usage analytics — page views, sessions, and user funnels

Trap: Application Insights monitors your application code. Service Health monitors Microsoft's platform. If a question is about a slow checkout page, the answer is Application Insights, never Service Health.

Alerts: rule + action group

An Azure Monitor alert has two independent halves, the alert rule and the action group, and AZ-900 loves to test that separation. Severity is a setting on the rule.

Component	Answers the question	Example
Alert rule	What condition fires it?	CPU > 90% for 5 minutes
Action group	Who is told and how?	Email the on-call team, SMS the manager, run a Logic App
Severity	How urgent?	Sev 0 (Critical) → Sev 4 (Verbose)

Three alert types map to the data they watch: metric alerts (a number crosses a threshold), log alerts (a KQL query returns a result), and activity log alerts (a management action such as "VM deleted" occurs). Action groups can automate remediation — calling an Azure Function or Logic App — not just send notifications.
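The rule-versus-group split can be sketched in plain Python. This is a teaching model, not the Azure SDK: the class names, fields, and receiver addresses are hypothetical, and the rule assumes the condition is "average of the last N one-minute samples exceeds a threshold".

```python
from dataclasses import dataclass, field


@dataclass
class AlertRule:
    """The condition half: what fires the alert."""
    name: str
    threshold: float      # e.g. 90.0 for "CPU > 90%"
    window_minutes: int   # e.g. 5 for "for 5 minutes"
    severity: int = 2     # 0 (Critical) through 4 (Verbose)

    def is_breached(self, samples):
        """True when the average of the last window_minutes samples exceeds the threshold."""
        recent = samples[-self.window_minutes:]
        if len(recent) < self.window_minutes:
            return False  # not enough data to fill the window yet
        return sum(recent) / len(recent) > self.threshold


@dataclass
class ActionGroup:
    """The response half: who is told and how."""
    name: str
    receivers: list = field(default_factory=list)  # (channel, target) pairs

    def notify(self, message):
        return [f"{channel} -> {target}: {message}" for channel, target in self.receivers]


def evaluate(rule, group, samples):
    """Run the group's actions only if the rule's condition holds."""
    if rule.is_breached(samples):
        return group.notify(f"Sev {rule.severity}: {rule.name} fired")
    return []


rule = AlertRule("High CPU", threshold=90.0, window_minutes=5, severity=1)
group = ActionGroup("on-call", [("email", "oncall@example.com"), ("sms", "manager")])
for line in evaluate(rule, group, [40, 95, 96, 97, 98, 99]):
    print(line)
```

Because the two classes know nothing about each other, the same action group can be reused by many rules, which mirrors why the exam treats them as separate answers.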

Service Health: three levels of zoom

Azure Service Health tells you when Azure itself has a problem. Picture a zoom lens going from the whole planet down to one disk.

Layer	Scope	Personalized?	Where
Azure Status	Every Azure service, every region, all customers	No	status.azure.com (public)
Service Health	Only the services and regions you use	Yes	Azure portal
Resource Health	One specific resource (for example, a single VM)	Yes	The resource's blade

Service Health (the middle layer) tracks four event kinds: service issues (active outages), planned maintenance, health advisories (action-required changes like a deprecation), and security advisories. You can configure Service Health alerts so an outage in your region emails the team automatically.

Resource Health reports four states: Available (healthy), Unavailable (a platform or non-platform event is affecting the resource), Degraded (reduced performance), and Unknown (no signal received for more than 10 minutes).

On the Exam: Memorize the zoom: Azure Status = global / everyone, Service Health = personalized to your subscriptions, Resource Health = one resource. If a question mentions a single named VM, the answer is Resource Health. If it mentions "any maintenance affecting my subscriptions," the answer is Service Health.
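As a memory aid, the zoom rule can be written as a tiny lookup. The function and parameter names are invented for this sketch; it only encodes the three-layer rule stated above.

```python
from enum import Enum


class HealthLayer(Enum):
    AZURE_STATUS = "Azure Status"
    SERVICE_HEALTH = "Service Health"
    RESOURCE_HEALTH = "Resource Health"


def pick_health_layer(single_resource: bool, my_subscriptions: bool) -> HealthLayer:
    """Return the narrowest Service Health layer that covers the question."""
    if single_resource:
        # One named VM, database, or other resource.
        return HealthLayer.RESOURCE_HEALTH
    if my_subscriptions:
        # Outages or maintenance affecting the services and regions you use.
        return HealthLayer.SERVICE_HEALTH
    # Anything about Azure as a whole, for all customers.
    return HealthLayer.AZURE_STATUS


print(pick_health_layer(single_resource=True, my_subscriptions=True).value)
print(pick_health_layer(single_resource=False, my_subscriptions=True).value)
print(pick_health_layer(single_resource=False, my_subscriptions=False).value)
```

The single-resource check comes first on purpose: a question about one VM in your subscription is still a Resource Health question.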

Worked example: choosing the right tool

Walk through a realistic decision tree, because AZ-900 phrases these as "a company wants to..." scenarios:

"We want to auto-scale a virtual machine scale set when average CPU exceeds 70%." — This is numeric and near-real-time, so it uses metrics plus an autoscale rule, not logs.
"We need to search six months of historical errors and correlate them across services." — Six months exceeds the 93-day metric window and needs rich querying, so the answer is logs in Log Analytics queried with KQL.
"Email the on-call engineer when a metric breaches a threshold." — An alert rule detects the breach; an action group sends the email.
"Is the slowdown our code or Microsoft's platform?" — Check Application Insights for the app side and Service Health / Resource Health for the platform side.

Notice that metrics, logs, alerts, Application Insights, and Service Health each own a distinct job; the exam rewards you for picking the narrowest tool that fully answers the scenario.
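The decision tree above can be sketched as one small function. The `concern` labels and parameter names are hypothetical, chosen only to mirror the four scenarios; the 93-day constant comes from the default metric retention covered earlier.

```python
METRIC_RETENTION_DAYS = 93  # default platform-metric retention


def choose_tool(*, concern, lookback_days=0, needs_rich_query=False):
    """Map an exam-style scenario to the narrowest tool that answers it.

    concern: "app" (is it our code?), "platform" (is it Azure?),
             "notify" (tell someone), or "data" (read telemetry).
    """
    if concern == "app":
        return "Application Insights"
    if concern == "platform":
        return "Service Health / Resource Health"
    if concern == "notify":
        return "Alert rule + action group"
    if concern == "data":
        # History beyond the metric window, or rich correlation, needs logs.
        if needs_rich_query or lookback_days > METRIC_RETENTION_DAYS:
            return "Logs in Log Analytics (KQL)"
        return "Metrics"
    raise ValueError(f"unknown concern: {concern!r}")


# The four scenarios from the list above, in order.
print(choose_tool(concern="data"))                      # autoscale on CPU
print(choose_tool(concern="data", lookback_days=180,
                  needs_rich_query=True))               # six months of errors
print(choose_tool(concern="notify"))                    # email on breach
print(choose_tool(concern="app"))                       # slow code path
```

The order of the checks matters less than the idea that each branch returns exactly one tool, which is how the exam expects the scenarios to be read.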

Common AZ-900 traps in this section

"Query logs" always maps to Log Analytics / KQL, never to Application Insights (which generates app telemetry but is not the generic log-query tool).
Service Health is not Azure Monitor. Service Health is about Microsoft's platform outages; Azure Monitor is about your resources and apps.
Metrics ≠ logs. Metrics are numbers kept 93 days; logs are records kept in a workspace for up to two years.
Action group vs alert rule. The rule is the condition; the group is the response. Questions often hand you one and ask for the other.

The AZ-900 exam — roughly 40–60 questions in 45 minutes, a passing score of 700 on a 1000-point scale, US$99, delivered through Pearson VUE for Microsoft — reliably includes at least one Monitor-versus-Service-Health discrimination item, so over-learn this split.

Microsoft Azure Fundamentals

Azure AZ-900

3.6 Azure Monitor and Service Health


