6.1 KQL & Threat Hunting
Key Takeaways
- Kusto Query Language (KQL) is read-only; the SC-200 expects you to read, reason about, and lightly modify queries rather than write them from scratch under exam pressure.
- The core hunting pipeline is filter then shape then aggregate: where to filter, project/extend to shape columns, summarize to aggregate, and join to correlate tables.
- Defender XDR Advanced Hunting and Microsoft Sentinel both use KQL, but the table schemas differ — Device*/Email*/Identity* tables live in Defender XDR, while SecurityAlert/SecurityIncident/SigninLogs/Syslog live in Sentinel.
- Sentinel hunting queries are MITRE ATT&CK-tagged saved KQL that do not generate alerts; Livestream re-runs a query on a short interval so analysts can watch an active intrusion.
- Always bound a hunting query with a time filter (ago() or a between() window) first — unbounded scans are slow, costly, and a common exam wrong-answer trap.
Why Threat Hunting Matters on SC-200
The Perform threat hunting domain is worth 20-25% of the SC-200 exam. It is the smallest of the three domains, but it is also where candidates with weak Kusto Query Language (KQL) skills lose the most points. Microsoft does not ask you to author production queries from a blank screen — instead, it shows you a query and asks what it returns, which operator to add, or which table to start from. That means reading fluency beats memorizing syntax.
Threat hunting is proactive: you form a hypothesis about adversary behavior, query telemetry to confirm or reject it, and then either close the loop or escalate to detection engineering. This is different from the response domain, where an alert already fired.
KQL Fundamentals
KQL is a read-only query language. A query starts with a table name (the data source) and flows left to right through a pipeline of operators separated by the pipe (|) character. Each operator transforms the rows from the previous step.
| Operator | Purpose | Mental model |
|---|---|---|
where | Filter rows by a condition | Keep only rows that match |
project | Select / rename specific columns | Choose the columns you want |
project-away | Drop specific columns | Remove noise columns |
extend | Add a calculated column | Compute a new field |
summarize | Aggregate rows into groups | GROUP BY with counts/aggregates |
join | Correlate two tables on a key | Match events across tables |
union | Stack rows from multiple tables | Combine same-shaped tables |
top / take | Limit rows | Sample or rank results |
sort by / order by | Order results | Newest or largest first |
The Canonical Hunting Pipeline
The pattern Microsoft tests repeatedly is filter, then shape, then aggregate: start narrow with a time bound and a where, reduce columns with project or add context with extend, then summarize to find outliers.
DeviceProcessEvents
| where Timestamp > ago(7d)
| where FileName in~ ("powershell.exe", "cmd.exe")
| where ProcessCommandLine has_any ("-enc", "DownloadString", "Invoke-Expression")
| extend ParentChild = strcat(InitiatingProcessFileName, " -> ", FileName)
| summarize Hits = count(), Devices = dcount(DeviceId) by ParentChild
| sort by Hits desc
Key reading points the exam rewards:
ago(7d)bounds the scan to the last seven days. A time filter should come first for performance and cost — unbounded queries are a classic distractor.in~andhas_anyare case-insensitive;==andhas(single token) are stricter. Knowing the difference betweenhas(term match),contains(substring), and==(exact) is frequently tested.dcount()returns a distinct count;count()returns total rows. Mixing these up is a common wrong-answer pattern.summarize ... by Xgroups results byX; the columns you can keep aftersummarizeare only the aggregates and thebykeys.
join Kinds
join correlates two result sets on a shared key. The join kind changes which rows survive, and SC-200 likes to test this:
| Kind | Returns |
|---|---|
inner (default is innerunique) | Rows with a key match in both sides |
leftouter | All left rows, plus matches from the right |
leftsemi | Left rows that have a match (left columns only) |
leftanti | Left rows that have no match on the right |
leftanti is the "find what is missing" pattern — for example, devices that logged on but never reported an antivirus scan. If a question asks for hosts that did something but are absent from a baseline table, the answer is almost always leftanti.
Defender XDR Advanced Hunting Schema
Advanced Hunting in Microsoft Defender XDR (portal: security.microsoft.com) queries a unified schema across endpoint, email, identity, and cloud-app telemetry. You should recognize what each table holds and which prefix maps to which workload.
| Table | Workload | Typical hunt |
|---|---|---|
DeviceProcessEvents | Defender for Endpoint | Suspicious process trees, LOLBins |
DeviceNetworkEvents | Defender for Endpoint | C2 beaconing, rare destinations |
DeviceLogonEvents | Defender for Endpoint | Local logon anomalies |
EmailEvents | Defender for Office 365 | Phishing delivery, sender patterns |
EmailUrlInfo / EmailAttachmentInfo | Defender for Office 365 | Malicious URLs and attachments |
IdentityLogonEvents | Defender for Identity | On-prem AD logon hunting |
IdentityDirectoryEvents | Defender for Identity | Directory changes, DCSync-style activity |
CloudAppEvents | Defender for Cloud Apps | SaaS activity, OAuth grants |
AlertInfo / AlertEvidence | Defender XDR | Pivot from alerts to raw evidence |
Note the distinction the exam loves: IdentityLogonEvents (Defender for Identity, on-premises Active Directory) is not the same as Microsoft Entra ID sign-in data. Entra interactive sign-ins live in SigninLogs (Sentinel) or AADSignInEventsBeta (Defender XDR). Picking the wrong identity table is a frequent trap.
Microsoft Sentinel Hunting Queries and Livestream
In Microsoft Sentinel the Hunting blade holds saved KQL queries that are mapped to MITRE ATT&CK tactics and techniques. A hunting query is different from an analytics rule: a hunting query does not create alerts or incidents on its own. It is a hypothesis you run on demand or bookmark for later.
- Hunting queries are tagged with ATT&CK tactics (for example, Persistence, Lateral Movement) so you can hunt by adversary objective rather than by data source.
- Livestream takes a query and re-runs it on a short, near-real-time cadence so an analyst can watch an attack as it unfolds during active response. Livestream is investigative, not a persistent detection.
- A query that proves valuable should be promoted into a scheduled analytics rule or a Defender XDR custom detection so it generates alerts going forward (covered in 6.2).
Sentinel-side tables differ from Defender XDR. Common ones: SecurityAlert (alerts from connected products), SecurityIncident, SigninLogs and AADNonInteractiveUserSignInLogs (Entra ID), AuditLogs, Syslog, CommonSecurityLog (CEF), and OfficeActivity.
MITRE ATT&CK-Driven Hunting
The MITRE ATT&CK framework is a knowledge base of adversary tactics (the why) and techniques (the how). SC-200 expects you to drive hunts from a hypothesis tied to ATT&CK rather than browsing data aimlessly.
A structured hunt looks like this:
- Hypothesis — "An attacker is using WMI for lateral movement (technique T1047)."
- Data source — Pick the table that records that behavior (
DeviceProcessEventsforwmic.exe/WmiPrvSE.exe). - Query — Filter for the behavior, then aggregate to surface outliers.
- Triage — Separate benign administration from suspicious patterns.
- Outcome — Close as benign, escalate to an incident, or promote to a detection rule.
DeviceProcessEvents
| where Timestamp > ago(14d)
| where InitiatingProcessFileName =~ "WmiPrvSE.exe"
| where FileName in~ ("powershell.exe", "cmd.exe", "rundll32.exe")
| project Timestamp, DeviceName, AccountName, FileName, ProcessCommandLine
| sort by Timestamp desc
In Defender XDR, Threat Analytics reports give you adversary and campaign context (affected assets, recommended actions) that can seed a hypothesis. The exam may ask which feature provides analyst-ready threat intelligence and impact mapping — that is Threat Analytics, not Advanced Hunting itself.
A hunter needs every device that recorded an interactive logon in the last 7 days but is absent from a separate table of devices that ran a compliant antivirus scan. Which KQL join kind correctly returns only the logon devices with no matching scan record?
During an active intrusion, an SC-200 analyst wants to watch a specific KQL query continuously refresh in near real time in Microsoft Sentinel so they can observe attacker actions as they happen, without creating alerts. Which capability should they use?
An exam scenario shows a hunt for malicious on-premises Active Directory authentication using Defender for Identity telemetry in Defender XDR Advanced Hunting. Which table should the query start from?
Review this query: DeviceNetworkEvents | summarize Conns = count() by RemoteUrl, DeviceId | where Conns > 1000 | summarize dcount(DeviceId) by RemoteUrl. What does the final result represent?