9.2 Log Analytics, KQL, and Operational Queries
Key Takeaways
- KQL queries start with a table, filter early with where, shape columns with project, and aggregate with summarize.
- TimeGenerated is the primary timestamp used to control the analysis window in Log Analytics.
- Operational KQL should answer a specific support question, such as what failed, where it failed, and whether it is still happening.
- Joins, parse, extend, bin, and render are useful for correlation, extraction, time grouping, and visualization.
- AZ-104 usually tests practical query interpretation rather than advanced data science.
KQL mental model
Kusto Query Language, or KQL, is a pipeline language. A query normally starts with a table, then each pipe character sends the current result set to the next operation. For AZ-104, the most important habit is to filter early, summarize intentionally, and project only the columns needed to answer the operational question.
A basic pattern looks like this:
AzureActivity
| where TimeGenerated > ago(24h)
| where ActivityStatusValue == "Failed"
| project TimeGenerated, ResourceGroup, ResourceProviderValue, OperationNameValue, Caller, ActivityStatusValue
| order by TimeGenerated desc
This query asks for failed control plane activity in the last day. It uses TimeGenerated for the time window, where to reduce the data, project to keep useful columns, and order by to put recent events first. If a question asks why a deployment failed or who attempted a change, this is the kind of query you should recognize.
KQL operators differ in case handling: == is case-sensitive, while =~ compares case-insensitively. For text search, has matches whole terms and can use the indexed term store, so it is usually faster and more precise than contains, which scans for substrings anywhere in the value. Exam questions generally avoid obscure syntax, but they may expect you to identify the query that counts failures or filters the correct time range.
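As a quick illustration of those operators (the IP address and values here are placeholders, not taken from a real environment):

```kusto
AzureActivity
| where CallerIpAddress == "203.0.113.10"      // exact, case-sensitive match
| where ActivityStatusValue =~ "failed"        // case-insensitive match
| where OperationNameValue has "deployments"   // whole-term match, usually faster than contains
```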
Common operators
| Operator | Purpose | Example |
|---|---|---|
| where | Filter rows | where TimeGenerated > ago(1h) |
| project | Select or rename columns | project TimeGenerated, Computer, CounterValue |
| extend | Add calculated columns | extend Percent = CounterValue * 100 |
| summarize | Aggregate rows | summarize Count=count() by ResourceGroup |
| bin | Group time into intervals | by bin(TimeGenerated, 5m) |
| order by | Sort results | order by Count desc |
| join | Combine tables | join kind=leftouter (...) on ResourceId |
| render | Visualize result | render timechart |
A useful metric-like log query groups records into time buckets:
AzureActivity
| where TimeGenerated > ago(6h)
| where ActivityStatusValue in ("Failed", "Succeeded")
| summarize Events=count() by ActivityStatusValue, bin(TimeGenerated, 15m)
| render timechart
This helps an administrator see whether failures are isolated or part of a larger deployment issue.
VM operational queries
VM monitoring has multiple sources. Platform metrics can show CPU percentage, but guest-level logs and performance counters require Azure Monitor Agent and data collection rules. Depending on configuration, you may query tables such as Perf, InsightsMetrics, Heartbeat, Event, or Syslog.
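For example, guest-level CPU from the Perf table can be averaged into five-minute buckets. This sketch assumes the agent and a data collection rule are already sending the % Processor Time counter:

```kusto
Perf
| where TimeGenerated > ago(1h)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| where InstanceName == "_Total"
| summarize AvgCPU = avg(CounterValue) by Computer, bin(TimeGenerated, 5m)
| order by TimeGenerated desc
```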
Heartbeat is a simple availability clue:
Heartbeat
| where TimeGenerated > ago(30m)
| summarize LastHeartbeat=max(TimeGenerated) by Computer, _ResourceId
| extend MinutesSinceHeartbeat = datetime_diff("minute", now(), LastHeartbeat)
| where MinutesSinceHeartbeat > 10
| order by MinutesSinceHeartbeat desc
This does not prove the application is healthy. It only shows that the monitoring agent has not reported recently. Use it to separate monitoring-agent silence from platform availability questions.
A Windows event investigation might start like this:
Event
| where TimeGenerated > ago(2h)
| where EventLevelName in ("Error", "Critical")
| project TimeGenerated, Computer, EventLog, Source, EventID, RenderedDescription
| order by TimeGenerated desc
A Linux syslog investigation might use:
Syslog
| where TimeGenerated > ago(2h)
| where SeverityLevel in ("err", "crit", "alert", "emerg")
| project TimeGenerated, Computer, Facility, ProcessName, SyslogMessage
| order by TimeGenerated desc
Storage and network queries
Storage diagnostic logs can support troubleshooting that metrics alone cannot. If users report intermittent authorization failures, summarize by status and caller address:
StorageBlobLogs
| where TimeGenerated > ago(1h)
| where StatusCode in (401, 403)
| summarize Failures=count() by StatusCode, AuthenticationType, CallerIpAddress, bin(TimeGenerated, 5m)
| order by TimeGenerated desc
The exact table depends on the resource and diagnostic configuration. The exam will often give table names in the choices. Choose the query that filters the relevant time range and status code instead of one that counts unrelated successful traffic.
For network security group flow logs or traffic analytics, the table names and schema depend on the enabled feature version. The operational goal is the same: filter by source, destination, port, direction, and action, then summarize to identify denied flows. A troubleshooting query should answer whether the network blocked traffic, whether traffic never arrived, or whether a later layer failed.
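A sketch of that pattern, assuming the Traffic Analytics table AzureNetworkAnalytics_CL and its suffixed column names (verify the schema pane in your own workspace, since field names vary by feature version):

```kusto
AzureNetworkAnalytics_CL
| where TimeGenerated > ago(1h)
| where SubType_s == "FlowLog" and FlowStatus_s == "D"   // "D" = denied by an NSG rule
| summarize DeniedFlows = count() by SrcIP_s, DestIP_s, DestPort_d, NSGRule_s
| order by DeniedFlows desc
```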
Alert query design
A log alert query should return a numeric result that can be evaluated. A support query may return rows for humans. Those are different shapes. For a log alert that fires when more than five failed deployments occur in ten minutes, write a query that counts failures:
AzureActivity
| where TimeGenerated > ago(10m)
| where ActivityStatusValue == "Failed"
| where OperationNameValue has "deployments"
| summarize FailedDeployments=count()
Then configure the alert condition so it fires when FailedDeployments is greater than 5. If a query returns many columns and raw rows, it may be useful for investigation but awkward for a numeric alert rule.
Troubleshooting bad queries
When a query returns no data, confirm the table has data before debugging the filters. Start with the table name and a short time window:
AzureActivity
| take 10
Then add a time filter such as where TimeGenerated > ago(24h), and reintroduce the remaining filters one at a time. Check the time picker in the portal because it can override or interact with query expectations. Confirm column names from the schema pane. If you search text with exact case and get no result, try a case-insensitive operator such as =~ or search a broader field.
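That incremental approach looks like this in practice; each stage is run separately, and the stage where data disappears points to the faulty filter:

```kusto
// Stage 1: does the table have any data at all?
AzureActivity
| take 10

// Stage 2: does data exist in the expected window?
AzureActivity
| where TimeGenerated > ago(24h)
| take 10

// Stage 3: reintroduce one filter at a time
AzureActivity
| where TimeGenerated > ago(24h)
| where ActivityStatusValue =~ "failed"
| take 10
```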
When a query is slow, reduce the time range, filter early, avoid unnecessary contains, project fewer columns, and summarize before joining. For AZ-104, performance tuning is not deep, but the administrator should know that querying seven years of raw logs interactively is not the same design as exporting logs for compliance retention.
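The tuning advice can be shown as a before-and-after sketch, reusing the AzureActivity examples from earlier in this section:

```kusto
// Slower: wide window, substring scan, every column retained
AzureActivity
| where OperationNameValue contains "deploy"
| where TimeGenerated > ago(30d)

// Faster: narrow the time window first, use a term match, keep few columns
AzureActivity
| where TimeGenerated > ago(1h)
| where OperationNameValue has "deployments"
| project TimeGenerated, ResourceGroup, ActivityStatusValue
```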
Scenario recognition
If a question says "show a chart of average CPU by VM every five minutes," a metric chart or metric query may be enough. If it says "show failed administrative operations by caller," use AzureActivity. If it says "find VMs whose monitoring agent stopped reporting," query Heartbeat. If it says "count 403 blob operations by IP," use the storage resource log table after diagnostic settings are configured. If it says "build an alert from a complex count of log records," shape the KQL query to return a numeric result.
Which KQL clause is most appropriate for grouping failed activity events into five-minute intervals?
A log alert must fire when more than ten 403 storage requests occur in fifteen minutes. What should the KQL query return?
Which table is commonly used to identify machines whose monitoring agent has stopped reporting recently?