SCStelz/security-investigator
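Investigates Conditional Access policy changes, related sign-in failures (such as error code 53000), and suspected bypass behavior. Correlates policy modifications with the sign-in timeline to distinguish legitimate troubleshooting from security control evasion, and provides forensic analysis.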
Install All Skills
npx skills add SCStelz/security-investigator --all -g -y
More Options
List skills in collection
npx skills add SCStelz/security-investigator --list
Skills in Collection (27)
.github/skills/ca-policy-investigation/SKILL.md
npx skills add SCStelz/security-investigator --skill ca-policy-investigation -g -y
SKILL.md
Frontmatter
{
"name": "ca-policy-investigation",
"description": "Use this skill when asked to investigate Conditional Access policy changes, sign-in failures related to CA policies (error codes 53000, 50074, 530032), or suspected policy bypass\/manipulation. Triggers on keywords like \"Conditional Access\", \"CA policy\", \"device compliance\", \"policy bypass\", \"53000\", \"50074\", or when investigating why a user was blocked then suddenly unblocked. This skill provides forensic analysis of CA policy modifications correlated with sign-in failures.",
"drill_down_prompt": "Investigate Conditional Access policy changes — sign-in correlation, bypass detection",
"threat_pulse_domains": [
"identity"
]
}
Conditional Access Policy Investigation - Instructions
Purpose
This skill investigates Conditional Access (CA) policy changes in correlation with sign-in failures to detect:
- Legitimate troubleshooting (authorized policy changes to resolve access issues)
- Security control bypass (unauthorized policy modifications to circumvent blocks)
- Privilege abuse (users with admin rights weakening security controls)
The key distinction is whether policy changes were authorized and necessary vs self-service bypass of security controls.
📑 TABLE OF CONTENTS
- Critical Investigation Rules - Mandatory workflow steps
- Common Error Codes - Sign-in failure reference
- CA Policy States - Understanding policy modes
- 5-Step Investigation Workflow - KQL queries and analysis
- Real-World Example - Complete walkthrough
- Critical Mistakes - What NOT to do
- Security Recommendations - Remediation guidance
Critical Investigation Rules
When investigating sign-in failures (error codes 53000, 50074) with CA policy correlation:
⚠️ MANDATORY STEPS - DO NOT SKIP:
- Query ALL CA policy changes in chronological order (±2 days from failure time)
- Parse policy state transitions from the JSON (enabled → disabled → report-only)
- Compare failure timeline with policy change timeline
- Verify logical consistency: Ask "does this make sense?"
Key Questions to Answer:
- Was the user blocked BEFORE the policy change?
- Did the policy change resolve the block?
- Who initiated the policy change? (same user = suspicious)
- What was the business justification?
Common Error Codes
| Error Code | Description | Typical Cause |
|---|---|---|
| 53000 | Device not compliant | Device not enrolled in Intune or failing compliance checks |
| 50074 | Strong authentication required | MFA not satisfied |
| 50079 | User must enroll in MFA | MFA not configured for user |
| 530032 | Blocked by CA policy | Generic CA policy block |
| 65001 | User consent required | Application consent needed |
| 53003 | Access blocked by CA policy | Explicit block condition met |
| 70044 | Session expired | User needs to re-authenticate |
Error Code Investigation Priority
| Priority | Error Codes | Investigation Focus |
|---|---|---|
| HIGH | 53000, 530032, 53003 | Device compliance, CA policy blocks - check for policy manipulation |
| MEDIUM | 50074 | MFA requirements - check if MFA was bypassed |
| LOW | 65001, 70044 | Consent/session issues - usually not security-related |
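The priority table above can be turned into a small lookup for triage scripts. This is a minimal sketch, not part of the skill: the names `PRIORITY_BY_CODE`, `triage_priority`, and `highest_priority` are illustrative, and the "UNKNOWN" bucket for unlisted codes is an assumption.

```python
# Illustrative triage lookup built from the priority table above.
# All names here are hypothetical helpers, not part of the skill.
PRIORITY_BY_CODE = {
    "53000": "HIGH",
    "530032": "HIGH",
    "53003": "HIGH",
    "50074": "MEDIUM",
    "65001": "LOW",
    "70044": "LOW",
}


def triage_priority(result_type) -> str:
    """Return the investigation priority for a sign-in ResultType."""
    # ResultType may arrive as an int or a string; normalise to a string key.
    return PRIORITY_BY_CODE.get(str(result_type).strip(), "UNKNOWN")


def highest_priority(result_types) -> str:
    """Return the most urgent priority seen across several failures."""
    seen = {triage_priority(code) for code in result_types}
    for level in ("HIGH", "MEDIUM", "LOW"):
        if level in seen:
            return level
    return "UNKNOWN"
```

Calling `highest_priority` on the ResultType values returned by Step 1 gives a quick indication of whether the policy-manipulation checks should run first.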
CA Policy State Meanings
| State | What It Means | Security Impact |
|---|---|---|
| enabled | Policy actively enforcing | Blocks non-compliant access (intended behavior) |
| disabled | Policy not enforcing | Security control bypassed - all access allowed |
| enabledForReportingButNotEnforced | Report-only mode | Logs violations but doesn't block - defeats purpose |
State Transition Risk Assessment
| Transition | Risk Level | Interpretation |
|---|---|---|
| enabled → disabled | HIGH | Complete security bypass |
| enabled → enabledForReportingButNotEnforced | MEDIUM-HIGH | Partial bypass (monitoring only) |
| disabled → enabled | LOW | Security restored (good) |
| enabledForReportingButNotEnforced → enabled | LOW | Security strengthened (good) |
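The transition table can be encoded directly as a dictionary keyed on (old state, new state). The sketch below is illustrative only: `TRANSITION_RISK` and `transition_risk` are hypothetical names, and the "NONE" and "REVIEW" return values for cases the table does not list are assumptions.

```python
# Hypothetical helper mirroring the State Transition Risk Assessment table.
TRANSITION_RISK = {
    ("enabled", "disabled"): "HIGH",
    ("enabled", "enabledForReportingButNotEnforced"): "MEDIUM-HIGH",
    ("disabled", "enabled"): "LOW",
    ("enabledForReportingButNotEnforced", "enabled"): "LOW",
}


def transition_risk(old_state: str, new_state: str) -> str:
    """Look up the risk level for a single policy state change."""
    if old_state == new_state:
        # No state change, e.g. only conditions or grant controls were edited.
        return "NONE"
    # Transitions the table does not list are left for manual review.
    return TRANSITION_RISK.get((old_state, new_state), "REVIEW")
```

Returning "REVIEW" rather than guessing keeps unlisted transitions (for example disabled → report-only) in front of a human analyst.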
Investigation Workflow Pattern
Step 1: Identify Sign-In Failures
Query sign-in failures with CA context:
// Get failures with CA context
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated between (datetime(<START>) .. datetime(<END>))
| where UserPrincipalName =~ '<UPN>'
| where ResultType != '0'
| where AppDisplayName has '<APPLICATION>' // e.g., "Visual Studio Code"
| project TimeGenerated, IPAddress, Location, ResultType, ResultDescription,
ConditionalAccessStatus, UserAgent
| order by TimeGenerated asc
What to Look For:
- ResultType values: 53000, 50074, 530032, 53003
- ConditionalAccessStatus: "failure", "notApplied"
- Pattern of repeated failures followed by success
Step 2: Query ALL CA Policy Changes in Timeframe
CRITICAL: Query ±2 days from the first failure time
let failure_time = datetime(<FIRST_FAILURE_TIME>);
let start = failure_time - 2d;
let end = failure_time + 2d;
AuditLogs
| where TimeGenerated between (start .. end)
| where OperationName has_any ("Conditional Access", "policy")
| where Identity =~ '<UPN>' or tostring(InitiatedBy) has '<UPN>'
| extend InitiatorUPN = tostring(parse_json(InitiatedBy).user.userPrincipalName)
| extend InitiatorIPAddress = tostring(parse_json(InitiatedBy).user.ipAddress)
| extend TargetName = tostring(parse_json(TargetResources)[0].displayName)
| project TimeGenerated, OperationName, Result, InitiatorUPN, InitiatorIPAddress,
TargetName, CorrelationId
| order by TimeGenerated asc // CRITICAL: Chronological order
Critical Analysis Points:
- InitiatorUPN: Who made the change? Same user as blocked = suspicious
- TargetName: Which policy was modified?
- TimeGenerated: Did change occur AFTER sign-in failures?
- Order: Always chronological (oldest first) to see cause/effect
Step 3: Parse Policy State Changes
For each CorrelationId from Step 2, get detailed changes:
// Get detailed property changes for a specific policy modification
AuditLogs
| where CorrelationId == "<CORRELATION_ID>"
| extend ModifiedProperties = parse_json(TargetResources)[0].modifiedProperties
| mv-expand ModifiedProperties
| extend PropertyName = tostring(ModifiedProperties.displayName)
| extend OldValue = tostring(ModifiedProperties.oldValue)
| extend NewValue = tostring(ModifiedProperties.newValue)
| project TimeGenerated, PropertyName, OldValue, NewValue
Key Properties to Extract:
- Look for the "state" property in the JSON
- Parse OldValue and NewValue for state transitions
- Document: enabled → disabled → enabledForReportingButNotEnforced
Step 4: Extract Policy State from JSON
Manual JSON Parsing:
The OldValue and NewValue fields contain JSON. Look for the "state" field:
{
"state": "enabled",
"conditions": { ... },
"grantControls": { ... }
}
Build the Timeline:
- Extract "state" from each OldValue and NewValue
- Create chronological list: enabled → disabled → enabledForReportingButNotEnforced
- Correlate with sign-in failure timeline
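The manual parsing described in Step 4 can be sketched in Python once the Step 3 rows have been exported. This is an illustration under assumptions: `extract_state` and `build_state_timeline` are hypothetical names, rows are assumed to be dictionaries with `TimeGenerated`, `OldValue`, and `NewValue` keys, timestamps are assumed to be ISO-8601 strings in one consistent format (so they sort lexicographically), and values are assumed to be possibly double-encoded JSON.

```python
import json


def extract_state(raw_value):
    """Return the "state" field from an OldValue/NewValue string, or None."""
    if not raw_value:
        return None
    try:
        parsed = json.loads(raw_value)
        # Assumption: the value may be a JSON string that itself holds JSON.
        if isinstance(parsed, str):
            parsed = json.loads(parsed)
    except (TypeError, ValueError):
        return None
    return parsed.get("state") if isinstance(parsed, dict) else None


def build_state_timeline(rows):
    """Return (time, old_state, new_state) tuples, oldest first."""
    timeline = []
    # Assumption: TimeGenerated values share one ISO-8601 format.
    for row in sorted(rows, key=lambda r: r["TimeGenerated"]):
        old = extract_state(row.get("OldValue"))
        new = extract_state(row.get("NewValue"))
        # Keep only rows where the policy state actually changed.
        if old and new and old != new:
            timeline.append((row["TimeGenerated"], old, new))
    return timeline
```

Sorting inside the function enforces the oldest-first rule from Step 2, so the resulting list can be laid next to the sign-in failures from Step 1 without reordering.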
Step 5: Security Assessment
Compare timelines and assess intent:
| Pattern | Interpretation | Risk Level |
|---|---|---|
| Failures → Policy Disabled | User bypassed security control to unblock self | HIGH - Privilege abuse |
| Failures → Policy Changed to Report-Only | User weakened security control | MEDIUM-HIGH - Partial bypass |
| Policy Disabled → Failures Continue | Cached tokens (5-15 min propagation delay) | INFO - Expected behavior |
| Policy Changed → No More Failures | Policy change resolved issue | Context-dependent - May be legitimate troubleshooting |
| Different user made change | Admin assisted with access issue | LOW - Likely legitimate (verify authorization) |
Risk Escalation Criteria:
| Criteria | Risk Level |
|---|---|
| Same user blocked AND made policy change | HIGH |
| Policy disabled within 30 minutes of first failure | HIGH |
| Multiple policies modified | HIGH |
| Change made outside business hours | MEDIUM-HIGH |
| No change request ticket/approval | MEDIUM-HIGH |
| Admin made change for blocked user (with ticket) | LOW |
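The escalation criteria above can be applied mechanically once the timelines are known. The following is a rough sketch rather than the skill's own logic: `escalate_risk` and its parameters are hypothetical, the 08:00 to 18:00 business-hours window is an assumption that should be replaced with the organisation's real hours, and "policy disabled" is read literally as a new state of `disabled`.

```python
from datetime import datetime, timedelta


def escalate_risk(blocked_upn, initiator_upn, first_failure, change_time,
                  new_state, policies_modified=1, has_ticket=False,
                  business_hours=(8, 18)):
    """Return (risk_level, reasons) using the Risk Escalation Criteria table."""
    high, medium_high = [], []
    if blocked_upn.lower() == initiator_upn.lower():
        high.append("same user was blocked and made the policy change")
    delta = change_time - first_failure
    if new_state == "disabled" and timedelta(0) <= delta <= timedelta(minutes=30):
        high.append("policy disabled within 30 minutes of first failure")
    if policies_modified > 1:
        high.append("multiple policies modified")
    start, end = business_hours  # assumed working hours, not defined by the skill
    if not (start <= change_time.hour < end):
        medium_high.append("change made outside business hours")
    if not has_ticket:
        medium_high.append("no change request ticket/approval")
    if high:
        return "HIGH", high + medium_high
    if medium_high:
        return "MEDIUM-HIGH", medium_high
    return "LOW", ["different admin made the change with a ticket"]
```

The returned reasons list is meant to be copied into the written assessment so that the risk level is always accompanied by its justification.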
Real-World Example Analysis
Scenario: User blocked by device compliance policy, then modifies policy
Timeline
| Time | Event | Details |
|---|---|---|
| 19:05 | Sign-in failure | Error 53000: device not compliant |
| 19:06 | Sign-in failure | Error 53000: device not compliant |
| 19:07 | Sign-in failure | Error 53000: device not compliant |
| 19:09 | Policy change | enabled → disabled |
| 19:09 | Policy change | disabled → enabledForReportingButNotEnforced |
| 19:12 | Sign-in failure | Error 53000 (cached token) |
| 19:14 | Sign-in success | Access granted |
Analysis
- ✅ Policy was correctly blocking non-compliant device
  - Device compliance policy was enforcing as intended
  - User's device failed compliance checks (not enrolled or failing policy)
- 🚨 User disabled security control to bypass block
  - Same user who was blocked made the policy change
  - Change occurred within 4 minutes of repeated failures
  - No approval or change request documented
- ⚠️ User partially reversed by enabling report-only
  - Shows some awareness that disabling was too aggressive
  - But report-only still defeats the purpose (doesn't block)
- ❌ Report-only mode is NOT a valid security posture
  - Logs violations but allows non-compliant access
  - Creates false sense of security (policy "exists" but doesn't protect)
Assessment
| Field | Value |
|---|---|
| Risk Level | MEDIUM-HIGH |
| Finding | Self-service security bypass using privileged role |
| Root Cause | User's device is non-compliant (not enrolled/failing compliance) |
| Policy Impact | Device compliance checks now ineffective for all users |
Recommendations
- Immediate Actions:
  - Restore policy to enabled state
  - Verify user's device compliance status
  - Document incident for security review
- User-Specific:
  - Enroll user's device in Intune
  - Verify device meets compliance requirements
  - Review if user needs Security Administrator role
- Process Improvements:
  - Implement approval workflow for CA policy changes
  - Create alert for policy state changes (enabled → disabled/report-only)
  - Review all users with permission to modify CA policies
  - Consider PIM for Security Administrator role
Critical Mistakes to Avoid
❌ DON'T:
| Mistake | Why It's Wrong |
|---|---|
| Query only ONE policy change event | You'll miss the sequence of changes |
| Read policy changes in reverse chronological order | Confuses cause/effect relationship |
| Assume policy was already disabled | Must check starting state from OldValue |
| Skip verifying "does this make logical sense?" | Disabled policies can't block users |
| Ignore the initiator identity | Same user = suspicious, different admin = verify authorization |
| Focus only on final state | The transition sequence reveals intent |
✅ DO:
| Best Practice | Why It Matters |
|---|---|
| Query ALL policy changes in the timeframe | Complete picture of modifications |
| Order chronologically (oldest first) | See cause/effect sequence |
| Parse the full JSON for state transitions | Extract exact policy states |
| Cross-check: blocked user → policy must be enabled | Logical consistency verification |
| Ask: "Why would user disable this policy?" | Usually to bypass a legitimate block |
| Check if initiator had authorization | Ticket, approval, documented reason |
Security Recommendations
When CA Policy Changes Are Detected
1. Determine Legitimacy
- Was the policy change authorized?
- Was there a valid business reason?
- Did the user have approval to make this change?
- Is there a change request ticket?
2. Assess Impact
- How many users affected by policy change?
- What applications/resources are now unprotected?
- How long was the policy disabled/weakened?
- Are there compliance implications (regulatory requirements)?
3. Remediation Actions
| Action | Priority |
|---|---|
| Restore policy to enabled state if unauthorized | IMMEDIATE |
| Investigate root cause (why was user blocked?) | HIGH |
| Fix underlying issue (device compliance, MFA enrollment) | HIGH |
| Review who has permission to modify CA policies | MEDIUM |
| Implement approval workflows for policy changes | MEDIUM |
| Create alerts for future CA policy modifications | MEDIUM |
4. Long-Term Improvements
| Improvement | Benefit |
|---|---|
| Use PIM for Security Administrator role | Requires approval for elevated access |
| Implement CA policy change alerts | Real-time notification of modifications |
| Require multi-admin approval for state changes | Prevents single-person bypass |
| Document approved procedures | Clear guidance for legitimate troubleshooting |
| Regular access reviews | Ensure only necessary users have CA admin rights |
Prerequisites
Required MCP Servers
This skill requires:
- Microsoft Sentinel MCP - For KQL queries against SigninLogs and AuditLogs
  - mcp_sentinel-data_query_lake: Execute KQL queries
  - mcp_sentinel-data_search_tables: Discover table schemas
Required Data Sources
- SigninLogs - Interactive sign-in events with CA status
- AADNonInteractiveUserSignInLogs - Non-interactive sign-in events
- AuditLogs - CA policy modification events
Required Permissions
To view CA policy changes in AuditLogs, ensure:
- Sentinel workspace has AuditLogs ingestion enabled
- User has appropriate RBAC to query the workspace
Integration with Other Skills
CA Policy Investigation often follows a user-investigation:
- Run user-investigation skill → Identifies sign-in failures
- Notice CA-related error codes → 53000, 50074, 530032
- Run ca-policy-investigation skill → Correlate failures with policy changes
- Document findings → Security assessment with remediation recommendations
Key Integration Points:
- Sign-in failure data comes from user investigation
- CA policy changes are NEW queries specific to this skill
- Assessment combines timeline correlation with policy state analysis
.github/skills/context-memory-review/SKILL.md
npx skills add SCStelz/security-investigator --skill context-memory-review -g -y
SKILL.md
Frontmatter
{
"name": "context-memory-review",
"description": "Weekly review of an investigation tenant-context memory file against the most recent SOC scan reports (e.g. Threat Pulse). Surfaces candidate ADD \/ MODIFY \/ FLAG changes to the context file as a propose-only review document for human approval — it NEVER edits the context file, commits, or opens a PR. Trigger on 'review my context file', 'review tenant context', 'propose context updates', 'what should I add to my context memory'."
}
Context Memory Review — Instructions
Purpose
Investigation workflows in this project lean on a tenant-context memory file — a local, gitignored living document that records environment-specific ground truth (known automation/orchestration fingerprints, known-good IPs, account classifications, honeypot/field-device inventory, validated personnel, and documented false-positive rules). Scan automations (e.g. the daily Threat Pulse) read that file to render accurate verdicts.
Over a week of scans, drill-down investigations validate new ground truth — new IPs, new personas, new FP classes, new device classes — that is not yet captured in the context file. This skill reads the last N days of scan reports, compares them against the current context file, and produces a propose-only review document: a list of discrete, human-reviewable candidate changes (ADD / MODIFY / FLAG) with section anchors, proposed text, supporting evidence, recurrence counts, and confidence.
This skill is the first half of a deliberate two-phase, human-in-the-loop workflow:
| Phase | Who | Action |
|---|---|---|
| 1. Propose (this skill) | Automation / interactive | Read reports + context file → emit review doc. No edits. |
| 2. Apply (separate, manual) | Human-directed interactive session | Operator reviews the doc, says "apply items X, Y, Z" → surgical edits to the context file. |
🔴 CRITICAL RULES — READ FIRST
- PROPOSE-ONLY. NEVER edit the context file in this skill. Do not write, append to, or modify the context memory file. Do not git commit, push, or open a PR. The only file this skill writes is the review document in the output directory.
- Read-only against the tenant. If any live queries are needed to corroborate a candidate change, they MUST be read-only (per the Remediation Output Policy). Prefer evidence already present in the reports; only query the tenant to disambiguate a contradiction.
- ⛔ Feedback-loop guard (the single most important rule). Scan reports are partly downstream of the context file: a scan verdict may simply echo an existing context entry rather than independently confirm it. You MUST distinguish:
  - First-party validation: a drill-down in the report actually ran a query/enrichment and confirmed the fact (e.g. "enriched IP 203.0.113.10 → datacenter ASN, 0 abuse reports, recurred on 3 days"). This CAN drive a High-confidence proposal.
  - Context-derived echo: the report verdict only restated something the context file already said ("🟢 known orchestration IP per tenant context"). This must NOT be promoted into a new or strengthened entry. Promoting echoes entrenches errors. When unsure, classify as echo.
- Never propose weakening or removing a documented FP/safety guardrail based solely on its absence from the week's reports. Absence of a finding ≠ obsolescence of a guardrail. Staleness candidates are FLAG-only, Low confidence, for human judgment; never auto-REMOVE.
- Evidence-based only. Every proposed change cites the specific report file(s), date(s), and finding it derives from. Never invent entities, counts, IPs, UPNs, or dates. If the reports don't support a change, don't propose it.
- PII stays local. The review document will contain live tenant entities (IPs, UPNs, device names). Write it ONLY to the gitignored output directory. Never commit it, never include it in a PR, never paste tenant PII into any git artifact.
Inputs (supplied by the invoking prompt / workflow)
The invoking workflow or user supplies these. If invoked interactively without them, ask once, then proceed with the defaults shown.
| Input | Meaning | Default |
|---|---|---|
| context_file | Absolute path to the tenant-context memory file to review | (must be provided) |
| reports_dir | Directory (or glob) holding the scan reports to review | (must be provided) |
| reports_glob | Filename pattern for the reports of interest | *.md |
| lookback_days | How far back to include reports (by filename date or mtime) | 7 |
| output_dir | Where to write the review document (must be gitignored) | reports/context-reviews |
Execution Workflow
Phase 0 — Load inputs and current state
- Read the context file in full (context_file). Build an internal index of its structure: every section heading (the anchor targets for proposals), and within sections the discrete entries: table rows (e.g. IP tables), bullet points, labelled sub-notes (e.g. "A.2", "Section C"), device entries. Note any validated YYYY-MM-DD provenance stamps.
- Enumerate the reports in window. List files in reports_dir matching reports_glob, select those whose date (from filename YYYYMMDD if present, else file mtime) falls within lookback_days. Sort oldest→newest. If zero reports are in window, STOP and report "no reports in window, nothing to review" (a normal quiet-week outcome, not a failure).
- Read each in-window report. For large reports, read in ranges. Extract structured signal:
  - Concrete entities that appeared with a verdict: IPs, UPNs/accounts, device/host names, OAuth apps, incident IDs, CVEs.
  - For each: was the verdict reached by a first-party drill-down (a query/enrichment was executed in the report) or an echo of existing context? Capture the distinction, because it gates confidence.
  - New FP classes / tuning notes the report's drill-downs articulated.
  - Any contradiction: a drill-down that concluded the opposite of an existing context entry.
- Note in each report whether the context file was successfully loaded/applied during that scan (the reports state this). Echoes only count as echoes if context was actually applied.
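The report-enumeration step of Phase 0 can be sketched as follows. This is an illustration under assumptions: `report_date` and `reports_in_window` are hypothetical helpers, the window is treated as inclusive of both the cutoff day and today, and the first standalone eight-digit run in a filename is taken to be the YYYYMMDD date.

```python
import re
from datetime import date, datetime, timedelta
from pathlib import Path

# A standalone run of exactly eight digits, read as YYYYMMDD.
DATE_IN_NAME = re.compile(r"(?<!\d)(\d{8})(?!\d)")


def report_date(path: Path) -> date:
    """Date from a YYYYMMDD token in the filename, else the file's mtime."""
    match = DATE_IN_NAME.search(path.name)
    if match:
        try:
            return datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            pass  # eight digits that are not a real date: fall back to mtime
    return date.fromtimestamp(path.stat().st_mtime)


def reports_in_window(reports_dir, reports_glob="*.md", lookback_days=7, today=None):
    """Return in-window report paths sorted oldest to newest."""
    today = today or date.today()
    cutoff = today - timedelta(days=lookback_days)  # assumption: inclusive bound
    dated = [(report_date(p), p)
             for p in Path(reports_dir).glob(reports_glob) if p.is_file()]
    in_window = [(d, p) for d, p in dated if cutoff <= d <= today]
    return [p for _, p in sorted(in_window, key=lambda item: (item[0], item[1].name))]
```

An empty return value corresponds to the "no reports in window" outcome, which the workflow treats as a normal stop rather than an error.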
Phase 1 — Correlate across the week
Aggregate signal across all in-window reports:
- Recurrence — For each candidate entity/pattern, count how many distinct report-days it appeared on with a consistent first-party classification. Recurrence is the backbone of confidence.
- Match against the context file index — For each candidate, determine whether it is:
- Absent from the context file → ADD candidate.
- Present but refined by the reports (role/volume/scope changed, new regional sibling, expanded persona list) → MODIFY candidate.
- Present and merely echoed (no new first-party info) → NOT a candidate (drop it; feedback-loop guard). It may at most justify refreshing a validated date if a first-party drill-down re-confirmed it, and that is a Low/Medium MODIFY, clearly labelled "provenance refresh only".
- Contradicted by a first-party drill-down → FLAG candidate (never auto-resolve).
- Staleness sweep (FLAG-only) — Identify context entries that were NOT referenced by ANY in-window report. These are candidates for human review, not removal. Low confidence. Exclude documented safety/FP guardrails from staleness flags entirely (their value is in preventing future errors, not in weekly hit-rate).
Phase 2 — Score and assemble proposals
Assign each candidate a type and confidence:
| Type | When |
|---|---|
| ADD | New, first-party-validated fact absent from the context file. |
| MODIFY | Existing entry that a first-party drill-down refined/expanded, or a provenance-refresh. |
| FLAG | A contradiction needing human judgment, or a staleness candidate. Never an auto-edit. |
| Confidence | Criteria |
|---|---|
| High | First-party validated AND recurred on ≥3 report-days (or a single explicit, thorough validated drill-down with enrichment/queries). Consistent classification, no contradicting evidence. |
| Medium | First-party validated on 2 report-days, OR 1 strong drill-down without recurrence. |
| Low | Single weak signal, provenance-refresh only, or any FLAG/staleness candidate. |
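One possible reading of the confidence criteria is sketched below. It is not the skill's own scoring code: `score_confidence` and its flags are hypothetical, and the split between a "thorough" drill-down (High) and a merely "strong" one (Medium) is an interpretation of the table that a human reviewer may draw differently.

```python
def score_confidence(proposal_type, first_party_days, thorough_drilldown=False,
                     strong_drilldown=False, contradicted=False,
                     provenance_refresh_only=False):
    """Map the confidence criteria table onto 'High', 'Medium', or 'Low'."""
    # FLAG items (contradictions, staleness) and provenance refreshes stay Low.
    if proposal_type == "FLAG" or provenance_refresh_only:
        return "Low"
    # Echo-only signals have no first-party days and must not be proposed.
    if first_party_days < 1:
        raise ValueError("ADD/MODIFY requires first-party evidence")
    # Contradicting evidence rules out High/Medium; such items belong in a FLAG.
    if contradicted:
        return "Low"
    if first_party_days >= 3 or thorough_drilldown:
        return "High"
    if first_party_days == 2 or strong_drilldown:
        return "Medium"
    return "Low"
```

Raising on zero first-party days makes the feedback-loop guard explicit: an echo cannot be scored, so it cannot slip into the proposal list.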
For every proposal, produce:
- ID: sequential (P1, P2, …).
- Type + Confidence.
- Target section: the exact heading/anchor in the context file where it belongs (for ADD), or the exact existing entry text being changed (for MODIFY/FLAG).
- Proposed text: for ADD/MODIFY, the literal line/table-row/bullet to insert or the before→after change, written in the context file's existing style and including a (validated <today's date>) stamp where the file uses that convention.
- Rationale: one or two sentences.
- Evidence: the report file name(s) + date(s) + the specific finding, and an explicit note of whether it was first-party or echo (only first-party drives ADD/MODIFY).
- Recurrence: "appeared on N of M report-days".
- Apply instruction: precise enough for a later interactive session to make a surgical edit (which section, insert-after-which-line, exact text). For FLAG items, the question the human must answer.
Phase 3 — Write the review document
Write the document to output_dir (create the folder if needed) as:
<output_dir>/context-review_<YYYYMMDD>_<HHMMSS>.md
Use this structure:
# Context Memory Review — <today's date>
**Context file reviewed:** <context_file>
**Reports reviewed:** <N> file(s) over <lookback_days>d (<earliest date> → <latest date>)
**Proposed changes:** <A> ADD · <M> MODIFY · <F> FLAG
**Confidence mix:** <High count> High · <Medium count> Medium · <Low count> Low
> ⚠️ PROPOSE-ONLY. No changes have been made to the context file. To apply, open an interactive
> session and say e.g. "apply items P1, P3, P7" — those edits will be made surgically with a
> validated-date stamp. Review each item's evidence before approving.
## Reports in this review window
| Date | File | Context applied during scan? |
|------|------|------------------------------|
| ... | ... | yes / no |
## Proposed changes
### P1 — [ADD · High] <short title>
- **Target section:** <heading/anchor>
- **Proposed text:**
> <literal text to add, in file style, with (validated <date>)>
- **Rationale:** ...
- **Evidence:** <report file(s) + date(s) + finding>; first-party drill-down.
- **Recurrence:** appeared on N of M report-days.
- **Apply instruction:** Insert under "<section>" after "<anchor line>".
### P2 — [MODIFY · Medium] ...
...
### P3 — [FLAG · Low] <contradiction or staleness> ...
- **Question for human:** ...
## Items considered but NOT proposed (feedback-loop guard)
Brief list of candidate signals that were only context-echoes (already in the file, no new first-party
evidence) and were therefore intentionally dropped — so the reviewer can confirm nothing was missed.
## Summary
One paragraph: the week's theme, the highest-value proposed addition, any contradiction needing
attention, and the count of staleness flags.
Phase 4 — Report to chat
End your response with a concise summary: context file + report window reviewed, counts of ADD/MODIFY/FLAG by confidence, the single highest-value proposed change, any contradictions surfaced, the output document path, and a reminder that nothing was applied and how to apply (interactive "apply items …").
Quality Checklist
Before finishing, verify:
- The context file was not modified; no commit/PR/push occurred.
- The review document was written only to the gitignored output directory.
- Every ADD/MODIFY proposal is backed by first-party evidence (not a context echo).
- No proposal weakens/removes a safety or FP guardrail on the basis of absence alone.
- Every proposal cites specific report file(s) + date(s) and a recurrence count.
- Contradictions are FLAG (human decides), never auto-resolved.
- Staleness candidates are FLAG · Low, and exclude documented guardrails.
- Proposed text matches the context file's existing style and includes a validated-date stamp where the file uses that convention.
- The "considered but not proposed" section documents the dropped echoes.
Notes
- This skill is environment-agnostic. All tenant-specific values (which context file, which reports, output location) are supplied by the invoking workflow or user — keep this file free of any tenant-specific identifiers, hostnames, UPNs, or environment names.
- Apply is intentionally out of scope here. Keeping propose and apply as separate phases — with apply driven by an explicit human instruction — is the safety boundary that prevents an unattended run from silently rewriting the ground-truth the scans depend on.
.github/skills/detection-authoring/SKILL.md
npx skills add SCStelz/security-investigator --skill detection-authoring -g -y
SKILL.md
Frontmatter
{
"name": "detection-authoring",
"description": "Create, deploy, update, and manage custom detection rules in Microsoft Defender XDR via the Graph API (\/beta\/security\/rules\/detectionRules). Covers query adaptation from Sentinel KQL to custom detection format, deployment via PowerShell (Invoke-MgGraphRequest), manifest-driven batch deployment, and lifecycle management (list, enable\/disable, delete). Companion script: Deploy-CustomDetections.ps1."
}
Custom Detection Authoring — Instructions
Purpose
This skill deploys custom detection rules to Microsoft Defender XDR via the Microsoft Graph API (/beta/security/rules/detectionRules). It handles:
- Query adaptation — Converting Sentinel KQL queries into custom detection format
- Single-rule deployment — Creating one rule via Graph API
- Batch deployment — Deploying multiple rules from a JSON manifest
- Lifecycle management — Listing, updating, enabling/disabling, and deleting rules
- Validation — Dry-run queries in Advanced Hunting before deployment
Entity Type: Custom detection rules (Defender XDR)
Writing new detection queries from scratch? This skill focuses on deploying and managing detection rules, not query creation. If you need to write detection KQL from scratch (schema validation, community examples, performance optimization), use the kql-query-authoring skill first with CD intent markers (say "create custom detection queries for [scenario]"). It will produce Sentinel-format queries with cd-metadata blocks ready for this skill to adapt and deploy.
📑 TABLE OF CONTENTS
- Prerequisites — Auth, scopes, PowerShell modules
- Critical Rules — Mandatory constraints (includes query adaptation checklist)
- Naming Convention: Standardized displayName format (no prefixes, no MITRE IDs, colon separators)
- API Reference: Graph API schema and field values
- Frequency & Lookback — Schedule periods, lookback windows, NRT constraints
- Deployment Workflow — Step-by-step process
- Batch Deployment — Manifest-driven multi-rule deployment
- Lifecycle Management — CRUD operations
- Existing Rule Discovery — Search Analytic Rules & Custom Detections by table, EventID, or keyword
- Known Pitfalls — Lessons learned (18 pitfalls documented)
- CD Metadata Contract — Schema for query file ↔ detection skill coordination
Prerequisites
Required PowerShell Module
# Microsoft.Graph.Authentication — provides Invoke-MgGraphRequest
Install-Module Microsoft.Graph.Authentication -Scope CurrentUser
Required Graph API Scopes
| Operation | Scope | Type |
|---|---|---|
| List / Get rules | CustomDetection.Read.All | Delegated |
| Create / Update / Delete | CustomDetection.ReadWrite.All | Delegated |
Authentication
# Read-only
Connect-MgGraph -Scopes "CustomDetection.Read.All" -NoWelcome
# Full CRUD
Connect-MgGraph -Scopes "CustomDetection.ReadWrite.All" -NoWelcome
Why Invoke-MgGraphRequest? The Graph MCP server and az rest both return 403 for custom detection endpoints because they lack the CustomDetection.* scopes. Invoke-MgGraphRequest uses interactive delegated auth with consent, which works.
Companion Script
Deploy-CustomDetections.ps1 — PowerShell script for manifest-driven batch deployment. See Batch Deployment.
⚠️ CRITICAL RULES — READ FIRST ⚠️
Mandatory Query Requirements
Custom detection queries have strict requirements that differ from Sentinel analytic rules:
| Requirement | Detail |
|---|---|
| 🔴 Author-only by default | The default behavior is to author, validate, and write the manifest only — do NOT call the Graph API to deploy rules unless the user explicitly says "deploy", "create the rule", "push to Defender", or similar deployment-intent language. If deployment intent is ambiguous, ask before calling the API. |
| Timestamp column must be projected as-is | The query MUST project the timestamp column exactly as it appears in the source table — TimeGenerated for Sentinel/LA tables, Timestamp for XDR-native tables. Do not alias one to the other (e.g., Timestamp = TimeGenerated causes 400 Bad Request). See Pitfall 1. |
| Event-unique columns (per table type) | Required columns that uniquely identify the event differ by table family. A bare summarize count() or make_set() loses these columns and fails. summarize with arg_max IS allowed — see Pitfall 3. See table below for per-type requirements. |
| Impacted asset identifier column | The query must project at least one column whose name matches a valid impactedAssets identifier (e.g., AccountUpn, DeviceName, DeviceId). See Impacted Asset Types and Pitfall 9. Queries without project or summarize typically return these columns automatically. |
| impactedAssets must be non-empty | The impactedAssets array must contain at least 1 element. An empty array ([]) is rejected with 400 BadRequest: "The field ImpactedAssets must be a string or array type with a minimum length of '1'." Every detection must declare which entity it impacts. See Pitfall 13. |
| No let statements (NRT) | NRT rules (schedule: "0") reject let entirely; the API returns a generic 400 Bad Request. This is not documented by Microsoft (empirically discovered Feb 2026) but consistently reproducible. Inline all dynamic arrays/lists directly in where clauses. Non-NRT rules (1H+) tolerate let. |
| Unique displayName AND title | Both the rule displayName and the alert title must be unique across all custom detections. Duplicate displayName returns 409 Conflict. Duplicate title returns 400 Bad Request. |
| 🔴 Naming convention for displayName | Follow the standardized naming convention documented in Naming Convention below. No schedule prefixes, no MITRE IDs, no tactic labels; the portal columns already display these. Use clean, descriptive title-case names with colon (:) as the only sub-separator. |
| 150 alerts per run | Each rule generates a maximum of 150 alerts per execution. Tune the query to avoid alerting on normal day-to-day activity. |
| 🔴 No response actions | All rules deployed by this skill MUST use "responseActions": []. Automated response actions (isolate device, disable user, block file, etc.) are PROHIBITED — they must only be configured manually by a human operator in the Defender portal after the rule is validated. Never populate responseActions in manifests or API calls. |
| First run = 30-day backfill | When a new rule is saved, it immediately runs against the past 30 days of data. Expect a burst of initial alerts if the query has broad coverage. |
Required event-unique columns by table type (MS Learn source):
| Table Family | Required Columns (besides timestamp) |
|---|---|
| MDE tables (Device*) | DeviceId AND ReportId |
| Alert* tables | None (just timestamp) |
| Observation* tables | ObservationId |
| All other XDR tables | ReportId |
| Sentinel/LA tables (AuditLogs, SigninLogs, SecurityEvent, OfficeActivity, etc.) | ReportId recommended (use proxy: CorrelationId, OfficeObjectId, CallerProcessId) but not strictly mandated by the docs |
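The table above can be turned into a small lookup for pre-flight checks. This is a minimal sketch: `Get-RequiredEventColumns` and its `-IsSentinelTable` switch are hypothetical names, and the switch exists only because a Sentinel/LA table cannot be told apart from an XDR table by name alone.

```powershell
# Hypothetical helper: maps a table name to the event-unique columns listed above.
function Get-RequiredEventColumns {
    param(
        [Parameter(Mandatory)][string]$TableName,
        [switch]$IsSentinelTable
    )
    if ($IsSentinelTable)                { return 'ReportId' }   # recommended proxy, not mandated
    if ($TableName -like 'Device*')      { return 'DeviceId', 'ReportId' }
    if ($TableName -like 'Alert*')       { return @() }          # timestamp only
    if ($TableName -like 'Observation*') { return 'ObservationId' }
    return 'ReportId'                                            # all other XDR tables
}

# Wrap calls in @() so a single column or an empty result is still an array.
$columns = @(Get-RequiredEventColumns -TableName 'DeviceProcessEvents')
```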
Query Adaptation Checklist
When converting a Sentinel query to custom detection format:
- ✅ Remove bare `summarize`: project raw rows instead. Exception: `summarize` with `arg_max` is allowed for threshold-based detections (see Pitfall 3)
- ✅ Project the timestamp column as-is: `TimeGenerated = TimeGenerated` for Sentinel/LA tables, `Timestamp` for XDR tables. Never alias one to the other.
- ✅ Project the impacted asset identifier column. The column name must match a valid identifier from Impacted Asset Types. Examples: `DeviceName = Computer` for device-focused detections, `AccountUpn = UserId` for user-focused. See Pitfall 9.
- ✅ Project event-unique columns per table type: `DeviceId` + `ReportId` for MDE tables; `ReportId` for other XDR tables; recommended proxy `ReportId` for Sentinel tables (e.g., `ReportId = CorrelationId`). Caveat: proxy columns may contain empty strings for some events. This is acceptable but means those rows won't be individually identifiable in alert details.
- ✅ Add a time filter as the first `where` clause. Prefer `ingestion_time() > ago(1h)` over `Timestamp > ago(1h)` (see tip below). NRT exception: For NRT rules (`schedule: "0"`), omit all time filters; `ingestion_time()` causes `400 Bad Request` in NRT mode (see Pitfall 17). `Timestamp > ago(...)` is accepted but unnecessary.
- ✅ Remove `let` variables for NRT rules. NRT rejects `let` entirely (generic 400 error, undocumented). Inline all dynamic arrays directly in `where` clauses. Non-NRT rules tolerate `let`.
- ✅ Validate via Advanced Hunting dry-run before deployment
- ✅ For NRT rules: avoid `tostring()` on dynamic columns; use native string columns instead (e.g., `Properties` instead of `tostring(Properties_d)`). See Pitfall 11.
- ✅ For NRT rules: verify the table's ingestion lag justifies NRT. See Pitfall 12.
- ✅ Count unique `{{Column}}` references across `title` AND `description` combined: max 3 unique columns total (shared across both fields, not per-field). Exceeding this returns `400 Bad Request: "Dynamic properties in alertTitle and alertDescription must not exceed 3 fields"`. See Pitfall 14.

Performance tip (from MS Learn): "Avoid filtering custom detections by using the `Timestamp` column. The data used for custom detections is prefiltered based on the detection frequency." Use `ingestion_time()` instead; it aligns with the platform's pre-filtering for better performance. For scheduled rules, match the time filter to the run frequency (`ingestion_time() > ago(1h)` for 1H rules). For NRT rules, no time filter is needed.

⚠️ PowerShell note: When building `queryText` containing backslashes (file paths, regex), always use single-quoted here-strings (`@'...'@`) to avoid escape sequence mangling. See Pitfall 15.
Example Adaptation
Before (Sentinel KQL — uses summarize):
let _Lookback = 7d;
SecurityEvent
| where TimeGenerated > ago(_Lookback)
| where EventID == 4799
| where TargetSid == "S-1-5-32-544"
| where SubjectUserSid != "S-1-5-18"
| where AccountType != "Machine"
| where not(SubjectUserSid endswith "-500")
| project TimeGenerated, Computer, Actor = SubjectUserName, ...
| summarize EnumerationCount = count(), Processes = make_set(CallerProcess)
by Actor, ActorDomain, ActorSID
After (Custom Detection — row-level, mandatory columns):
SecurityEvent
| where TimeGenerated > ago(1h)
| where EventID == 4799
| where TargetSid == "S-1-5-32-544"
| where SubjectUserSid != "S-1-5-18"
| where AccountType != "Machine"
| where not(SubjectUserSid endswith "-500")
| project
TimeGenerated = TimeGenerated,
DeviceName = Computer,
AccountName = SubjectUserName,
AccountDomain = SubjectDomainName,
AccountSid = SubjectUserSid,
CallerProcess = CallerProcessName,
ReportId = CallerProcessId
Key changes:
- Removed `let _Lookback` → hardcoded `ago(1h)`
- Removed `summarize` → raw `project`
- Added `TimeGenerated = TimeGenerated` (identity projection, mandatory)
- Added `DeviceName = Computer` (impacted asset identifier for a device-focused detection)
- Added `ReportId = CallerProcessId` (proxy ReportId as the event-unique identifier)
Naming Convention
The displayName should be a clean, title-case description of what the detection finds. The portal columns already show Scheduling Type, Tactics, and Techniques — don't repeat them in the name.
| Rule | Example |
|---|---|
| Use colon (:) for sub-detail | Event Log Clearing: Security or System Log Wiped |
| Threat actor/family in parentheses at end | Credential Dumping Tool Execution (Storm-2885) |
| TI rules: Threat Intelligence: {IoC} Match on {Table} | Threat Intelligence: IP Match on CloudAppEvents |
| No schedule prefixes (NRT —, 1H —) | Portal Scheduling Type column covers this |
| No MITRE IDs (T1036 —) | Portal Techniques column covers this |
| No tactic labels ((Collection), (Exfiltration)) | Portal Tactics column covers this |
| No em dash (—) separator | Use colon (:) instead |
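The naming rules above lend themselves to a quick lint before a manifest is written. The sketch below is illustrative only: `Test-DetectionDisplayName` is a hypothetical helper, the regex patterns are assumptions about what a prefix or MITRE ID looks like, and the tactic list is a small subset chosen for the example.

```powershell
# Hypothetical lint: returns the naming-convention violations found in a displayName.
function Test-DetectionDisplayName {
    param([Parameter(Mandatory)][string]$DisplayName)
    $emDash  = [string][char]0x2014
    $tactics = 'Collection', 'Exfiltration', 'Discovery', 'Persistence', 'Execution', 'Impact'
    $issues  = @()
    if ($DisplayName.Contains($emDash))                          { $issues += 'em dash separator' }
    if ($DisplayName -match '^(NRT|\d+H)\b')                     { $issues += 'schedule prefix' }
    if ($DisplayName -match '\bT\d{4}(\.\d{3})?\b')              { $issues += 'MITRE technique ID' }
    if ($DisplayName -match ('\((' + ($tactics -join '|') + ')\)')) { $issues += 'tactic label' }
    return ,$issues   # leading comma keeps an empty result as an array
}

$issues = Test-DetectionDisplayName -DisplayName 'Event Log Clearing: Security or System Log Wiped'
```

An empty result means the name passed these checks; it does not prove the name is unique, which still requires a GET against the API.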
API Reference
Endpoint
POST /beta/security/rules/detectionRules — Create
GET /beta/security/rules/detectionRules — List all
GET /beta/security/rules/detectionRules/{id} — Get by ID
PATCH /beta/security/rules/detectionRules/{id} — Update
DELETE /beta/security/rules/detectionRules/{id} — Delete
Schedule Periods
| Value | Meaning | Notes |
|---|---|---|
| "0" | NRT (Near Real-Time / Continuous) | Runs continuously. See NRT Constraints. |
| "1H" | Every 1 hour | Most common for custom detections |
| "3H" | Every 3 hours | |
| "12H" | Every 12 hours | |
| "24H" | Every 24 hours | Daily |
Alert Severity Values
| Value | Use Case |
|---|---|
| "informational" | Baseline queries, low-noise canaries |
| "low" | Suspicious but may be benign |
| "medium" | Likely malicious, needs investigation |
| "high" | High-confidence detection, immediate response |
Alert Category Values
`category` is a case-sensitive, single-value, server-validated enum accepting two groups:
- MITRE tactics (title case): `InitialAccess`, `Execution`, `Persistence`, `PrivilegeEscalation`, `DefenseEvasion`, `CredentialAccess`, `Discovery`, `LateralMovement`, `Collection`, `Exfiltration`, `CommandAndControl`, `Impact`, `Reconnaissance`, `ResourceDevelopment`
- Non-tactic threat categories: `Malware`, `Ransomware`, `SuspiciousActivity`, `UnwantedSoftware`. These hide the MITRE techniques field in the portal (MS Learn); prefer a tactic when you want techniques to display.

Portal label note: The portal labels this control "Tactic" (UX rename, 2026), but the API field stays `category` (single-value, automation unaffected). Validation details in Pitfall 18.
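Because the enum is case-sensitive, a client-side check needs a case-sensitive comparison. A minimal sketch, assuming the value list above is complete; `Test-AlertCategory` is a hypothetical helper name.

```powershell
# Hypothetical helper: case-sensitive check of a category value against the lists above.
function Test-AlertCategory {
    param([Parameter(Mandatory)][string]$Category)
    $valid = @(
        'InitialAccess', 'Execution', 'Persistence', 'PrivilegeEscalation', 'DefenseEvasion',
        'CredentialAccess', 'Discovery', 'LateralMovement', 'Collection', 'Exfiltration',
        'CommandAndControl', 'Impact', 'Reconnaissance', 'ResourceDevelopment',
        'Malware', 'Ransomware', 'SuspiciousActivity', 'UnwantedSoftware'
    )
    # -ccontains is the case-sensitive form of -contains.
    return $valid -ccontains $Category
}

$isValid = Test-AlertCategory -Category 'Discovery'
```

PowerShell's default `-contains` ignores case, so it would accept `discovery` even though the section says the server would not.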
Impacted Asset Types
Device asset:
{
"@odata.type": "#microsoft.graph.security.impactedDeviceAsset",
"identifier": "<identifier>"
}
Valid device identifiers: deviceId, deviceName, remoteDeviceName, targetDeviceName, destinationDeviceName
User asset:
{
"@odata.type": "#microsoft.graph.security.impactedUserAsset",
"identifier": "<identifier>"
}
Valid user identifiers: accountObjectId, accountSid, accountUpn, accountName, accountDomain, accountId, requestAccountSid, requestAccountName, requestAccountDomain, recipientObjectId, processAccountObjectId, initiatingAccountSid, initiatingProcessAccountUpn, initiatingAccountName, initiatingAccountDomain, servicePrincipalId, servicePrincipalName, targetAccountUpn
Mailbox asset:
{
"@odata.type": "#microsoft.graph.security.impactedMailboxAsset",
"identifier": "<identifier>"
}
Valid mailbox identifiers: accountUpn, fileOwnerUpn, initiatingProcessAccountUpn, lastModifyingAccountUpn, targetAccountUpn, senderFromAddress, senderDisplayName, recipientEmailAddress, senderMailFromAddress
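The three asset shapes differ only in the `@odata.type` suffix and the allowed identifiers, so they can be built from a short type name such as the `device` / `user` / `mailbox` values used in the batch manifest. This is a sketch under that assumption; `New-ImpactedAsset` is a hypothetical helper, not part of the companion script.

```powershell
# Hypothetical helper: builds one impactedAssets entry and rejects unknown identifiers.
function New-ImpactedAsset {
    param(
        [Parameter(Mandatory)][ValidateSet('device', 'user', 'mailbox')][string]$Type,
        [Parameter(Mandatory)][string]$Identifier
    )
    $valid = @{
        device  = @('deviceId', 'deviceName', 'remoteDeviceName', 'targetDeviceName', 'destinationDeviceName')
        user    = @('accountObjectId', 'accountSid', 'accountUpn', 'accountName', 'accountDomain', 'accountId',
                    'requestAccountSid', 'requestAccountName', 'requestAccountDomain', 'recipientObjectId',
                    'processAccountObjectId', 'initiatingAccountSid', 'initiatingProcessAccountUpn',
                    'initiatingAccountName', 'initiatingAccountDomain', 'servicePrincipalId',
                    'servicePrincipalName', 'targetAccountUpn')
        mailbox = @('accountUpn', 'fileOwnerUpn', 'initiatingProcessAccountUpn', 'lastModifyingAccountUpn',
                    'targetAccountUpn', 'senderFromAddress', 'senderDisplayName', 'recipientEmailAddress',
                    'senderMailFromAddress')
    }
    # Case-sensitive comparison: identifiers must be camelCase exactly as listed.
    if ($valid[$Type] -cnotcontains $Identifier) {
        throw "Identifier '$Identifier' is not valid for asset type '$Type'."
    }
    $suffix = $Type.Substring(0, 1).ToUpper() + $Type.Substring(1).ToLower()
    return [ordered]@{
        '@odata.type' = "#microsoft.graph.security.impacted${suffix}Asset"
        identifier    = $Identifier
    }
}

$asset = New-ImpactedAsset -Type device -Identifier deviceName
```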
Minimal Valid POST Body
{
"displayName": "Rule Name",
"isEnabled": true,
"queryCondition": {
"queryText": "SecurityEvent\r\n| where TimeGenerated > ago(1h)\r\n| ..."
},
"schedule": {
"period": "1H"
},
"detectionAction": {
"alertTemplate": {
"title": "Alert Title",
"description": "Alert description text.",
"severity": "medium",
"category": "Discovery",
"recommendedActions": null,
"mitreTechniques": ["T1069.001"],
"impactedAssets": [
{
"@odata.type": "#microsoft.graph.security.impactedDeviceAsset",
"identifier": "deviceName"
}
]
},
"responseActions": []
}
}
`impactedAssets`: Must contain at least 1 element; an empty array causes `400 BadRequest`. Every detection must map to at least one impacted entity (device, user, or mailbox). See Pitfall 13.
`recommendedActions`: Can be `null` or a string. The portal sets it to `null` by default.
`responseActions`: Must always be `[]`; response actions are prohibited in LLM-authored detections (see Critical Rules). Must be `[]`, not `null`; sending `null` causes `400 Bad Request`. See Pitfall 10.
`organizationalScope`: Omit this field entirely for tenant-wide rules (the API default). Including `"organizationalScope": null` explicitly may cause `400 Bad Request` in some API versions.

Custom details (not shown above): The API also supports a `customDetails` array of key-value pairs surfaced in the alert side panel. Each rule supports up to 20 KVPs with a combined 4KB size limit. Keys are display labels; values are query column names. See MS Learn.

Related evidence (not shown above): Beyond `impactedAssets`, the entity mapping also supports linking related evidence entities (Process, File, Registry value, IP, OAuth application, DNS, Security group, URL, Mail cluster, Mail message). These provide correlation context but are not impacted assets. See MS Learn.
Dynamic Alert Titles and Descriptions
Alert titles and descriptions can reference query result columns using {{ColumnName}} syntax, making alerts self-descriptive:
{
"title": "Admin Group Enumeration by {{AccountName}} on {{DeviceName}}",
"description": "User {{AccountName}} enumerated group {{TargetGroupName}} on the device."
}
| Constraint | Limit |
|---|---|
| Max unique dynamic columns | 3 unique {{Column}} references TOTAL across title AND description combined, NOT per field. E.g., the example above uses AccountName + DeviceName in title and AccountName + TargetGroupName in description = 3 unique columns (AccountName is reused). Exceeding this returns 400 Bad Request with "Dynamic properties in alertTitle and alertDescription must not exceed 3 fields". |
| Format | {{ExactColumnName}}; must match a column in query output |
| Markup | Plain text only; HTML, Markdown, and code are sanitized |
| URLs | Must use percent-encoding format |

⚠️ Discrepancy with MS Learn docs: The official documentation states "The number of columns you can reference in each field is limited to three" (i.e., 3 per field). However, the Graph API empirically enforces 3 unique columns total across both fields combined (confirmed Feb 2026). The portal UI may enforce the per-field limit differently than the API. Use 3 unique total as the safe limit for Graph API deployments.
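Counting the unique references by hand is error-prone once a column is reused in both fields. The sketch below extracts the distinct `{{Column}}` names across a title and description; `Get-DynamicColumnNames` is a hypothetical helper, and the regex assumes column names consist of word characters only.

```powershell
# Hypothetical helper: returns the distinct {{Column}} names used across title and description.
function Get-DynamicColumnNames {
    param([string]$Title, [string]$Description)
    $set = [System.Collections.Generic.HashSet[string]]::new()
    foreach ($m in [regex]::Matches("$Title`n$Description", '\{\{\s*(\w+)\s*\}\}')) {
        [void]$set.Add($m.Groups[1].Value)
    }
    return ,@($set | Sort-Object)
}

$names = Get-DynamicColumnNames `
    -Title 'Admin Group Enumeration by {{AccountName}} on {{DeviceName}}' `
    -Description 'User {{AccountName}} enumerated group {{TargetGroupName}} on the device.'
if ($names.Count -gt 3) { Write-Warning "Too many dynamic columns: $($names -join ', ')" }
```

For the example title and description from this section, the helper finds AccountName, DeviceName, and TargetGroupName, which sits exactly at the limit.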
Frequency & Lookback
Lookback Windows by Frequency
Each frequency has a built-in lookback window. Results outside this window are ignored even if the query requests them:
| Frequency | Lookback Period | Query Filter Recommendation |
|---|---|---|
| NRT (Continuous) | Streaming | No time filter needed — events processed as collected |
| Every 1 hour | Past 4 hours | ago(4h) or ago(1h) |
| Every 3 hours | Past 12 hours | ago(12h) or ago(3h) |
| Every 12 hours | Past 48 hours | ago(48h) or ago(12h) |
| Every 24 hours | Past 30 days | ago(30d) or ago(24h) |
| Custom (Sentinel only) | 4× frequency (<daily) or 30d (≥daily) | Match lookback |
Tip: Match the query time filter to the run frequency (`ago(1h)` for 1H rules), not the full lookback window. The lookback ensures late-arriving data is caught, but your filter should target the detection window.
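The frequency table and the tip can be combined into one lookup that returns both the built-in lookback and a filter matched to the run frequency. A minimal sketch: `Get-DetectionLookback` is a hypothetical helper, and the suggested filter uses `ingestion_time()` on the assumption that the performance tip in the Query Adaptation Checklist applies.

```powershell
# Hypothetical helper: built-in lookback and a frequency-matched filter per schedule period.
function Get-DetectionLookback {
    param([Parameter(Mandatory)][ValidateSet('0', '1H', '3H', '12H', '24H')][string]$Period)
    $map = @{
        '0'   = @{ Lookback = 'streaming'; Filter = $null }   # NRT: no time filter
        '1H'  = @{ Lookback = '4h';  Filter = 'ingestion_time() > ago(1h)' }
        '3H'  = @{ Lookback = '12h'; Filter = 'ingestion_time() > ago(3h)' }
        '12H' = @{ Lookback = '48h'; Filter = 'ingestion_time() > ago(12h)' }
        '24H' = @{ Lookback = '30d'; Filter = 'ingestion_time() > ago(24h)' }
    }
    return $map[$Period]
}

$hourly = Get-DetectionLookback -Period '1H'
```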
NRT Constraints
NRT (Continuous, period: "0") rules have stricter requirements than scheduled rules:
| Constraint | Detail |
|---|---|
| Single table only | Query must reference exactly one table — no joins or unions |
| No let statements | let variables are silently rejected: the API returns a generic 400 Bad Request with no useful error message. Always inline dynamic arrays/lists directly in where clauses. This constraint is not listed in the official NRT docs (which list only 4 constraints) but is consistently reproducible via Graph API (empirically confirmed Feb 2026). |
| No externaldata | Cannot use the externaldata operator |
| No comments | Query text must not contain any comment lines (//) |
| Supported operators only | Limited to supported KQL features. tostring() on dynamic columns is rejected — use native string columns instead (e.g., Properties instead of tostring(Properties_d)). See Pitfall 11. |
| No time filter needed | NRT processes events as they stream in. The platform pre-filters automatically. Timestamp > ago(1h) is unnecessary but harmless. However, ingestion_time() is rejected — the API returns 400 Bad Request. See Pitfall 17. |
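Since the API reports most of these violations as a generic 400, a local lint can catch them first. The sketch below is a rough text scan, not a KQL parser: `Test-NrtQuery` is a hypothetical helper, the patterns are assumptions, and the `tostring()` check flags every call for review even though only dynamic columns are reported as rejected.

```powershell
# Hypothetical lint: flags NRT constraint violations with simple pattern matching.
function Test-NrtQuery {
    param([Parameter(Mandatory)][string]$QueryText)
    $rules = [ordered]@{
        'let statement'           = '(?m)^\s*let\s+\w+\s*='
        'comment line'            = '(?m)^\s*//'
        'externaldata operator'   = '\bexternaldata\b'
        'join or union'           = '\|\s*(join|union)\b'
        'ingestion_time() filter' = '\bingestion_time\s*\('
        'tostring() call'         = '\btostring\s*\('
    }
    $issues = foreach ($name in $rules.Keys) {
        if ($QueryText -match $rules[$name]) { $name }
    }
    return ,@($issues)
}

$query  = "SecurityEvent`r`n| where EventID == 4799`r`n| project TimeGenerated, Computer"
$issues = Test-NrtQuery -QueryText $query
```

A clean result only means none of these patterns matched; the Advanced Hunting dry-run remains the real validation step.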
NRT-Supported Tables
Not all tables support NRT frequency. Use NRT only with these tables:
Defender XDR tables:
AlertEvidence, CloudAppEvents, DeviceEvents, DeviceFileCertificateInfo, DeviceFileEvents, DeviceImageLoadEvents, DeviceLogonEvents, DeviceNetworkEvents, DeviceNetworkInfo, DeviceInfo, DeviceProcessEvents, DeviceRegistryEvents, EmailAttachmentInfo, EmailEvents*, EmailPostDeliveryEvents, EmailUrlInfo, IdentityDirectoryEvents, IdentityLogonEvents, IdentityQueryEvents, UrlClickEvents
* EmailEvents: LatestDeliveryLocation and LatestDeliveryAction columns are excluded from NRT.
Sentinel tables (Preview):
ABAPAuditLog_CL, ABAPChangeDocsLog_CL, AuditLogs, AWSCloudTrail, AWSGuardDuty, AzureActivity, CommonSecurityLog, GCPAuditLogs, MicrosoftGraphActivityLogs, OfficeActivity, Okta_CL, OktaV2_CL, ProofpointPOD, ProofPointTAPClicksPermitted_CL, ProofPointTAPMessagesDelivered_CL, SecurityAlert, SecurityEvent, SigninLogs
Important: `SecurityEvent` and `SigninLogs` support NRT, so our Event ID 4799/4702 queries can run as NRT if they meet the single-table/no-joins constraint.
Ingestion Lag Consideration — NRT Suitability
A table being NRT-supported means the API accepts NRT rules — not that NRT is the right choice. If a table's ingestion lag exceeds the detection frequency benefit, NRT adds overhead with no detection speed improvement. See Pitfall 12 for a per-table ingestion lag assessment and recommendation matrix. Rule of thumb: if ingestion lag > 30 min, use 1H scheduled instead.
Custom Frequency (Sentinel Data Only)
For rules based entirely on Sentinel-ingested data, a custom frequency is available (Preview):
- Range: 5 minutes to 14 days
- Lookback: Automatically calculated — 4× frequency for sub-daily, 30 days for daily or longer
- Requirement: Data must be available in Microsoft Sentinel (not XDR-only tables)
Deployment Workflow
🔴 DEPLOYMENT GATE: Only proceed to Steps 2-3 (API calls) when the user has explicitly requested deployment. Trigger phrases: "deploy", "create the rule", "push", "POST it", "make it live". If the user asked to "author", "write", "create a manifest", "prepare", or "draft" a detection — stop after validation (Step 1) and manifest generation. Present the manifest JSON for review and wait for explicit deployment confirmation.
Single Rule Deployment
Step 1: Validate the query in Advanced Hunting
Run the adapted query with a 1h lookback to validate schema:
Use RunAdvancedHuntingQuery with the adapted KQL query.
Confirm: 0 or more results, correct column schema (TimeGenerated, DeviceName, AccountName, etc.)
Then run with 30d lookback to confirm it returns real data:
Change ago(1h) to ago(30d) for the validation run.
Verify results contain expected columns and realistic data.
Step 2: Check for duplicates, then build and POST the rule
Connect-MgGraph -Scopes "CustomDetection.ReadWrite.All" -NoWelcome
# Pre-flight: check if rule name already exists
$ruleName = "Rule Name"
$existing = (Invoke-MgGraphRequest -Method GET `
-Uri "/beta/security/rules/detectionRules" -OutputType PSObject).value `
| Where-Object { $_.displayName -eq $ruleName }
if ($existing) {
Write-Host "Rule '$ruleName' already exists (ID: $($existing.id)). Skipping POST."
return
}
$body = @{
displayName = $ruleName
isEnabled = $true
queryCondition = @{
queryText = "SecurityEvent`r`n| where TimeGenerated > ago(1h)`r`n| ..."
}
schedule = @{ period = "1H" }
detectionAction = @{
alertTemplate = @{
title = "Alert Title"
description = "Description"
severity = "medium"
category = "Discovery"
recommendedActions = $null
mitreTechniques = @("T1069.001")
impactedAssets = @(
@{
"@odata.type" = "#microsoft.graph.security.impactedDeviceAsset"
identifier = "deviceName"
}
)
}
responseActions = @()
}
} | ConvertTo-Json -Depth 10
$result = Invoke-MgGraphRequest -Method POST `
-Uri "/beta/security/rules/detectionRules" `
-Body $body -ContentType "application/json" -OutputType PSObject
Step 3: Verify creation
$rules = Invoke-MgGraphRequest -Method GET `
-Uri "/beta/security/rules/detectionRules" -OutputType PSObject
$rules.value | Select-Object id, displayName, isEnabled,
@{N='Schedule';E={$_.schedule.period}},
@{N='Status';E={$_.lastRunDetails.status}} | Format-Table -AutoSize
Batch Deployment
Use the companion script Deploy-CustomDetections.ps1 for manifest-driven batch deployment.
Manifest storage: Save manifest JSON files in the `temp/` folder (gitignored). Manifests are deployment artifacts, not versioned query definitions.
Manifest Format
See example-manifest.json for a complete 2-rule reference covering NRT and scheduled (with summarize/arg_max) patterns.
The script reads a JSON file containing an array of rule definitions:
[
{
"displayName": "Admin Group Enumeration by Non-Admin User",
"title": "Admin Group Enumeration by {{AccountName}} on {{DeviceName}}",
"queryText": "SecurityEvent\r\n| where TimeGenerated > ago(1h)\r\n| ...",
"schedule": "0",
"severity": "medium",
"category": "Discovery",
"mitreTechniques": ["T1069.001", "T1087.001"],
"description": "User {{AccountName}} enumerated the local Administrators group.",
"recommendedActions": "Verify whether the user has a legitimate reason to enumerate admin group membership.",
"impactedAssets": [
{ "type": "device", "identifier": "deviceName" },
{ "type": "user", "identifier": "accountSid" }
],
"responseActions": []
}
]
Usage
# Dry-run — validate all queries in Advanced Hunting without creating rules
.\Deploy-CustomDetections.ps1 -ManifestPath .\temp\4799_4702.json -DryRun
# Deploy all rules from manifest (skips existing rules by default)
.\Deploy-CustomDetections.ps1 -ManifestPath .\temp\4799_4702.json
# Deploy and overwrite — attempt POST even if rule name exists (may cause 409)
.\Deploy-CustomDetections.ps1 -ManifestPath .\temp\4799_4702.json -Force
Lifecycle Management
List All Rules
$rules = Invoke-MgGraphRequest -Method GET `
-Uri "/beta/security/rules/detectionRules" -OutputType PSObject
$rules.value | Select-Object id, displayName, isEnabled,
@{N='Schedule';E={$_.schedule.period}},
@{N='LastRun';E={$_.lastRunDetails.status}},
@{N='Created';E={$_.createdDateTime}} | Format-Table -AutoSize
Get Rule by ID
$rule = Invoke-MgGraphRequest -Method GET `
-Uri "/beta/security/rules/detectionRules/5632" -OutputType PSObject
$rule | ConvertTo-Json -Depth 10
Update Rule (PATCH)
PATCH /beta/security/rules/detectionRules/{id} — send only the fields you want to change. All fields are optional.
Updatable fields:
| Field Path | Type | Notes |
|---|---|---|
| displayName | String | Rule name; follow Naming Convention |
| isEnabled | Boolean | Enable/disable without deleting |
| queryCondition.queryText | String | KQL query; validates before saving |
| schedule.period | String | 0 (NRT), 1H, 3H, 12H, 24H |
| detectionAction.alertTemplate.title | String | Alert title (supports {{Column}} variables) |
| detectionAction.alertTemplate.description | String | Alert description (supports {{Column}} variables) |
| detectionAction.alertTemplate.severity | String | informational, low, medium, high |
| detectionAction.alertTemplate.category | String | ATT&CK tactic (e.g., CredentialAccess) |
| detectionAction.alertTemplate.recommendedActions | String | null to clear |
| detectionAction.alertTemplate.impactedAssets | Array | null to clear |
| detectionAction.responseActions | Array | Always []; see critical rules |
Examples:
# Rename a rule
$body = @{ displayName = 'Cloud Password Spray: Multi-Account Failed Auth from Single IP' } | ConvertTo-Json
Invoke-MgGraphRequest -Method PATCH `
-Uri "/beta/security/rules/detectionRules/6044" `
-Body $body -ContentType "application/json"
# Change schedule and severity
$body = @{
schedule = @{ period = "24H" }
detectionAction = @{
alertTemplate = @{ severity = "high" }
}
} | ConvertTo-Json -Depth 10
Invoke-MgGraphRequest -Method PATCH `
-Uri "/beta/security/rules/detectionRules/5632" `
-Body $body -ContentType "application/json"
# Batch rename (loop pattern)
$renames = @{
'Old Rule Name' = 'New Rule Name'
'Another Old Name' = 'Another New Name'
}
$rules = (Invoke-MgGraphRequest -Method GET `
-Uri "/beta/security/rules/detectionRules" -OutputType PSObject).value
foreach ($old in $renames.Keys) {
$rule = $rules | Where-Object { $_.displayName -eq $old }
if (-not $rule) { continue }
$body = @{ displayName = $renames[$old] } | ConvertTo-Json
Invoke-MgGraphRequest -Method PATCH `
-Uri "/beta/security/rules/detectionRules/$($rule.id)" `
-Body $body -ContentType "application/json"
}
Delete Rule
Invoke-MgGraphRequest -Method DELETE `
-Uri "/beta/security/rules/detectionRules/5632"
⚠️ Deletion propagation delay: After deleting a rule, the name remains reserved for ~30-60 seconds. Creating a rule with the same `displayName` during this window returns `409 Conflict`, but the rule may still be created despite the error. Always verify with a GET after creation.
Enable/Disable Without Deleting
# Disable
Invoke-MgGraphRequest -Method PATCH `
-Uri "/beta/security/rules/detectionRules/5632" `
-Body '{"isEnabled": false}' -ContentType "application/json"
# Enable
Invoke-MgGraphRequest -Method PATCH `
-Uri "/beta/security/rules/detectionRules/5632" `
-Body '{"isEnabled": true}' -ContentType "application/json"
Existing Rule Discovery
Before authoring new custom detections, check what Analytic Rules (Sentinel) and Custom Detection rules (Defender XDR) already exist for the same table, EventID, or keyword. This avoids duplicating coverage and helps identify gaps.
Step 0: Construct the Analytic Rules URL (once per session)
$cfg = Get-Content config.json | ConvertFrom-Json
$sub = $cfg.subscription_id
$rg = $cfg.azure_mcp.resource_group
$ws = $cfg.azure_mcp.workspace_name
$arUrl = "https://management.azure.com/subscriptions/$sub/resourceGroups/$rg/providers/Microsoft.OperationalInsights/workspaces/$ws/providers/Microsoft.SecurityInsights/alertRules?api-version=2024-09-01"
# Verify (should return rule count)
az rest --method get --url $arUrl --query "length(value)" -o tsv 2>$null
All patterns below reuse `$arUrl`. The Sentinel REST API returns the full KQL query text for every rule. There is no server-side content filtering, so we pull all rules in one call and filter client-side with JMESPath `contains()`.
Search Analytic Rules by Table Name or Keyword
# Which rules reference a specific table? (e.g., SecurityEvent)
az rest --method get --url $arUrl `
--query "value[?properties.query && contains(properties.query, 'SecurityEvent')].{name: properties.displayName, severity: properties.severity, enabled: properties.enabled}" `
-o table 2>$null
Search Analytic Rules by EventID
# Which rules reference a specific EventID?
az rest --method get --url $arUrl `
--query "value[?properties.query && contains(properties.query, '<EventID>')].{name: properties.displayName, severity: properties.severity, enabled: properties.enabled}" `
-o table 2>$null
To see the surrounding KQL context of a match:
az rest --method get --url $arUrl `
--query "value[?properties.query && contains(properties.query, '<EventID>')].properties.query" `
-o tsv 2>$null | Select-String -Pattern '<EventID>' -Context 1,1
Search Analytic Rules for ASIM Parser Dependencies
$rules = az rest --method get --url $arUrl `
--query "value[?properties.enabled==``true`` && properties.query].{displayName: properties.displayName, query: properties.query}" `
-o json 2>$null | ConvertFrom-Json
$asimRules = $rules | Where-Object { $_.query -match '_Im_|_ASim_' }
$asimRules | ForEach-Object {
$schemas = [regex]::Matches($_.query, '_Im_(\w+)') | ForEach-Object { $_.Groups[1].Value } | Sort-Object -Unique
Write-Host "$($_.displayName): $($schemas -join ', ')"
}
Dump All Enabled Rule Queries for Local Search
az rest --method get --url $arUrl `
--query "value[?properties.enabled==``true`` && properties.query].{name: properties.displayName, query: properties.query}" `
-o json > temp/analytic_rule_queries.json
# Then search locally for any pattern
Get-Content temp/analytic_rule_queries.json | Select-String -Pattern 'EventID\s*(==|in\s*\(|has|contains)' -AllMatches
Search Custom Detection Rules (Graph API)
⚠️ Important: The Graph MCP server returns 403 for the Custom Detection endpoint. Always use Invoke-MgGraphRequest via the terminal.
Import-Module Microsoft.Graph.Authentication -ErrorAction Stop
$ctx = Get-MgContext
if (-not $ctx -or $ctx.Scopes -notcontains 'CustomDetection.Read.All') {
Connect-MgGraph -Scopes 'CustomDetection.Read.All' -NoWelcome
}
$response = Invoke-MgGraphRequest -Method GET `
-Uri '/beta/security/rules/detectionRules?$select=id,displayName,isEnabled,queryCondition,schedule,lastRunDetails,createdDateTime,lastModifiedDateTime' `
-OutputType PSObject
Then filter by table name or keyword:
# Which CD rules reference SecurityEvent?
$response.value | Where-Object { $_.queryCondition.queryText -match 'SecurityEvent' } |
Select-Object displayName, isEnabled, @{N='Query';E={$_.queryCondition.queryText}}
# Which CD rules reference a specific EventID?
$response.value | Where-Object { $_.queryCondition.queryText -match '4688|ProcessCreate' } |
Select-Object displayName, isEnabled, @{N='Query';E={$_.queryCondition.queryText}}
Identify stale rules (no run in 90 days):
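# Convert to UTC first: the trailing 'Z' in the format string is a literal, not a conversion
$cutoff = (Get-Date).ToUniversalTime().AddDays(-90).ToString('yyyy-MM-ddTHH:mm:ssZ')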
$response.value | Where-Object {
$_.lastRunDetails.lastRunDateTime -and $_.lastRunDetails.lastRunDateTime -lt $cutoff
} | Select-Object displayName, isEnabled,
@{N='LastRun';E={$_.lastRunDetails.lastRunDateTime}},
@{N='Status';E={$_.lastRunDetails.status}}
Key API Fields for Rule Discovery
| Source | Field Path | Content |
|---|---|---|
| Analytic Rules (REST) | properties.displayName | Rule name |
| | properties.query | Full KQL query text |
| | properties.severity | High / Medium / Low / Informational |
| | properties.enabled | true / false |
| Custom Detections (Graph) | displayName | Rule name |
| | queryCondition.queryText | Full KQL query (AH syntax) |
| | schedule.period | PT1H, PT24H, PT0S (continuous) |
| | lastRunDetails.lastRunDateTime | Last execution timestamp |
| | lastRunDetails.status | completed, failed, running |
| | isEnabled | true / false |
Tip: JMESPath `contains()` (used in `az rest --query`) is case-sensitive. For case-insensitive search, dump to JSON and use PowerShell `-match` instead.
Known Pitfalls
Pitfall 1: Timestamp vs TimeGenerated — Project As-Is
The query must project the timestamp column exactly as it appears in the source table. Do NOT alias one to the other.
| Source Table Type | Correct | Wrong |
|---|---|---|
| Sentinel/LA tables (SecurityEvent, SigninLogs, AuditLogs, etc.) | TimeGenerated = TimeGenerated | Timestamp = TimeGenerated |
| XDR-native tables (DeviceEvents, DeviceProcessEvents, etc.) | Timestamp (native) | TimeGenerated = Timestamp |
The MS Learn docs confirm: "Timestamp or TimeGenerated — This column sets the timestamp for generated alerts. The query shouldn't manipulate this column and should return it exactly as it appears in the raw event." Aliasing across types causes 400 Bad Request.
Pitfall 2: Silent Rule Creation on Error Responses (400 AND 409)
The API can silently create a rule even when it returns an error. This applies to both 400 Bad Request and 409 Conflict responses.
Cause A — 400 with partial validation: A POST may pass structural validation (creating the rule) but fail a secondary check (e.g., let variable in NRT query, >3 dynamic fields). The API returns 400 Bad Request — but the rule was already created. A subsequent retry with a fixed query then hits 409 Conflict because the rule exists.
Cause B — Deletion propagation delay: Deleting a rule leaves a name reservation for ~30-60 seconds. POSTing a rule with the same displayName in this window returns 409 Conflict — but the API may still create the rule.
Cause C — Silent success + accidental retry: When running Invoke-MgGraphRequest in a terminal, the POST may succeed but the output buffer splits across calls, making it look like nothing happened. Re-running the same POST produces a 409 because the rule was already created seconds earlier.
Prevention:
- Always run a GET before POST to check if the rule name already exists (see Step 2)
- Always verify with GET after ANY error response (400 or 409) — the rule may have been created despite the error
- Never re-run a POST without first checking via GET whether the previous attempt succeeded
- If a rule was silently created with a bad query, use PATCH to update the `queryCondition.queryText` rather than deleting and re-creating
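The prevention steps above amount to a check, create, re-check sequence. The sketch below expresses that flow with two caller-supplied script blocks so the logic can be exercised without touching the API; `Invoke-CreateWithVerification`, `-Find`, and `-Create` are hypothetical names, and it assumes the create call throws on a 400 or 409 response.

```powershell
# Hypothetical wrapper: GET before POST, and GET again after any error.
function Invoke-CreateWithVerification {
    param(
        [Parameter(Mandatory)][scriptblock]$Find,    # outputs the rule if it exists, nothing otherwise
        [Parameter(Mandatory)][scriptblock]$Create   # performs the POST; assumed to throw on 400/409
    )
    if (& $Find) { return 'AlreadyExists' }
    try {
        & $Create | Out-Null
    } catch {
        # The rule may have been created despite the error response, so look again.
        if (& $Find) { return 'CreatedDespiteError' }
        return 'Failed'
    }
    if (& $Find) { return 'Created' }
    return 'Unverified'
}
```

In real use, `-Find` would wrap the GET-and-filter-by-displayName call from Step 2 and `-Create` would wrap the POST. A `CreatedDespiteError` result is the cue to repair the rule with PATCH rather than retrying the POST.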
Pitfall 3: summarize — Allowed Only With Row-Level Output
Custom detection queries must return row-level results with required columns (TimeGenerated, DeviceName, ReportId). A bare summarize count() or make_set() as the final operator fails validation because the output lacks these columns.
However, summarize with arg_max IS allowed when used to return the required columns alongside aggregation:
// ✅ ALLOWED — uses arg_max to preserve row-level columns
DeviceEvents
| where ingestion_time() > ago(1d)
| where ActionType == "AntivirusDetection"
| summarize (Timestamp, ReportId)=arg_max(Timestamp, ReportId), count() by DeviceId
| where count_ > 5
This pattern counts by entity but still returns Timestamp, ReportId, and DeviceId per row — satisfying the requirement. Use this for threshold-based detections ("alert when count > N").
Pitfall 4: Graph MCP and az rest Cannot Access This API
Both the Graph MCP server and az rest lack the CustomDetection.ReadWrite.All scope. Only Invoke-MgGraphRequest with interactive delegated auth works.
Pitfall 5: recommendedActions Type
The recommendedActions field is a String (not an array). Set to null if not needed. The portal always sets it to null.
Pitfall 6: Query Newlines in JSON
The queryText JSON field requires \r\n (CRLF) line breaks on the wire. When using ConvertTo-Json on a PowerShell hashtable (the recommended approach), this is handled automatically — multiline here-string content in the hashtable value is serialized with correct CRLF encoding. No manual newline insertion is needed.
If manually constructing a raw JSON string body (not recommended), use PowerShell backtick escapes `r`n to produce CRLF in the output.
Pitfall 7: Duplicate Name AND Title Check
The API enforces unique displayName AND unique title (alert title) across all custom detections. Duplicate displayName returns 409 Conflict. Duplicate title returns 400 Bad Request. The batch deployment script checks for displayName duplicates by default — use -Force to override. The MS Learn docs state both should be unique: "Detection name... make it unique" and "Alert title... make it unique".
Pitfall 8: Alert Deduplication
Custom detections automatically deduplicate alerts. If a detection fires twice on events with the same entities, custom details, and dynamic details, only one alert is created. This can happen when the lookback period is longer than the run frequency (e.g., 1H frequency with 4H lookback means 3 hours of overlap). Different events on the same entity produce separate alert entries under the same alert.
Pitfall 9: impactedAssets Identifier Must Be a Predefined API Value
The identifier field in impactedAssets must use one of the predefined values from the Impacted Asset Types section — NOT arbitrary query column names. Using a custom column name (e.g., "identifier": "TargetComputer" or "identifier": "Actor") causes a silent 400 InvalidInput with an empty error message.
This aligns with the MS Learn docs which list specific "strong identifier" columns for impacted assets. The portal wizard enforces this via a dropdown; the Graph API rejects non-matching values silently.
Identifier values must use camelCase as listed in the Impacted Asset Types section (e.g., recipientEmailAddress, not RecipientEmailAddress). The API treats identifier values as case-sensitive when matching to the predefined list.
Additionally, the query MUST project a column whose name matches the chosen identifier. If you use "identifier": "accountUpn", the query must project an AccountUpn column (alias if needed: AccountUpn = UserId). The column name match is case-insensitive — AccountUpn in the query matches accountUpn in the identifier.
| Wrong | Correct |
|---|---|
| "identifier": "UserId" | "identifier": "accountUpn" + project AccountUpn = UserId |
| "identifier": "Actor" | "identifier": "accountUpn" + rename Actor → AccountUpn |
| "identifier": "TargetComputer" | "identifier": "deviceName" + project DeviceName = Computer |
| "identifier": "TargetUPN" | "identifier": "accountUpn" + rename TargetUPN → AccountUpn |
⚠️ InitiatingProcess* column trap (Apr 2026): Device* tables project many InitiatingProcess* columns (e.g., InitiatingProcessAccountName, InitiatingProcessAccountSid, InitiatingProcessAccountUpn, InitiatingProcessAccountObjectId). Only three of these are valid user identifiers: initiatingProcessAccountUpn, and the initiatingAccount* variants (initiatingAccountSid, initiatingAccountName, initiatingAccountDomain). Notably, initiatingProcessAccountName is NOT valid: it looks correct because the column exists, but the API enum uses accountName instead. The API rejects invalid identifiers with a silent 400 InvalidInput (empty error message), making this very hard to debug. Always alias the column: AccountName = InitiatingProcessAccountName.
DeviceId requirement: For XDR-native tables (Device*, Email*, CloudAppEvents) with a device-type impactedAsset, the query must project DeviceId (not just DeviceName). Sentinel/LA tables (SecurityEvent, AuditLogs) do not require DeviceId.
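The two rules in this pitfall (a predefined, case-sensitive identifier plus a projected column whose name matches it case-insensitively) can be checked before calling the API. The following is a minimal Python sketch under stated assumptions: `check_identifier` is a hypothetical helper, and `KNOWN_IDENTIFIERS` is a deliberately partial allowlist built only from identifiers named in this document. The authoritative list is in the Impacted Asset Types section.

```python
# Partial allowlist: only identifiers mentioned in this document (assumption).
# The full predefined list lives in the Impacted Asset Types section.
KNOWN_IDENTIFIERS = {
    "accountUpn", "accountObjectId", "accountName",
    "deviceId", "deviceName",
    "recipientEmailAddress", "senderFromAddress",
    "initiatingProcessAccountUpn",
}


def check_identifier(identifier, projected_columns):
    """Return a list of problems; an empty list means the pair looks valid."""
    problems = []
    # The identifier itself is matched case-sensitively against the allowlist.
    if identifier not in KNOWN_IDENTIFIERS:
        problems.append(f"'{identifier}' is not a known predefined identifier")
    # The projected column only needs to match case-insensitively.
    lowered = {column.lower() for column in projected_columns}
    if identifier.lower() not in lowered:
        problems.append(f"query does not project a column named '{identifier}'")
    return problems


print(check_identifier("accountUpn", ["Timestamp", "ReportId", "AccountUpn"]))
print(check_identifier("Actor", ["Timestamp", "ReportId"]))
```

Running a check like this at manifest load time turns the silent 400 InvalidInput into a readable message before any request is sent.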
Pitfall 10: PowerShell Empty Array Swallowing & organizationalScope
Root cause (Feb 2026): When using PowerShell if/else expressions to assign empty arrays, PowerShell swallows @() and produces $null instead:
# ❌ BUG — $x becomes $null, NOT an empty array
$x = if ($false) { @($items) } else { @() }
# Result: $null
# ✅ CORRECT — assign first, then overwrite conditionally
$x = @()
if ($condition) { $x = @($items) }
# Result: empty Object[] (serializes to [])
This caused array fields like responseActions and mitreTechniques to serialize as null instead of [], which the API rejects with 400 Bad Request.
Combined with organizationalScope: null — including this field explicitly (even as null) was also rejected. The fix: omit organizationalScope entirely and use direct assignment for array fields.
Symptoms: All rules in a batch return 400 Bad Request, but some may be silently created (see Pitfall 2). Manual deployment of the same rule body (without the null fields) succeeds.
Fixed in: Deploy-CustomDetections.ps1 — array fields now use direct assignment, organizationalScope removed from body.
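The same two guards (array fields never serialized as null, organizationalScope left out entirely) apply to any client that builds the request body, not only PowerShell. A minimal Python sketch, assuming a hypothetical `sanitize` helper and a flat dictionary of fields for illustration; the real request body nests these fields, so the layout here is not the API schema.

```python
import json

# Array fields named in this pitfall; the flat layout is an illustrative assumption.
ARRAY_FIELDS = ("responseActions", "mitreTechniques")


def sanitize(fields):
    """Return a copy that is safe to serialize.

    - organizationalScope is dropped, even when it is only None
    - array fields that are None become empty lists
    """
    clean = {k: v for k, v in fields.items() if k != "organizationalScope"}
    for name in ARRAY_FIELDS:
        if name in clean and clean[name] is None:
            clean[name] = []
    return clean


draft = {
    "responseActions": None,
    "mitreTechniques": ["T1053.005"],
    "organizationalScope": None,
}
print(json.dumps(sanitize(draft)))
```

The function returns a copy rather than mutating the input, so the original draft stays available for logging when a deployment fails.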
Pitfall 11: tostring() on Dynamic Columns Rejected in NRT Mode
Root cause (Feb 2026): NRT rules (schedule: "0") reject tostring() wrapping dynamic-typed columns. The API returns a generic 400 Bad Request with no useful error message — similar to the let rejection described in NRT Constraints. The same query deploys successfully as a scheduled rule (1H+).
Example — AzureActivity table:
// ❌ FAILS in NRT mode — tostring() on dynamic column
AzureActivity
| where OperationNameValue =~ "MICROSOFT.SECURITY/PRICINGS/WRITE"
| where tostring(Properties_d.pricings_pricingTier) == "Free"
// ✅ WORKS — use the native string column instead
AzureActivity
| where OperationNameValue =~ "MICROSOFT.SECURITY/PRICINGS/WRITE"
| where Properties has '"pricingTier":"Free"'
Workarounds:
- Prefer native string columns: many Sentinel tables have both a dynamic column (e.g., Properties_d) and a string column (e.g., Properties). Use the string column with has or contains for NRT.
- Switch to 1H schedule: if tostring() is required for precise extraction, use a scheduled rule where it works reliably.
Ingestion lag consideration: Even when a table is NRT-supported, check whether ingestion lag makes NRT impractical — see Ingestion Lag Consideration.
Pitfall 12: NRT-Supported ≠ NRT-Practical — Check Ingestion Lag
A table appearing in the NRT-Supported Tables list means the API accepts NRT rules for that table — it does NOT mean NRT adds value. Tables with significant ingestion lag negate the benefit of continuous detection.
| Table | Typical Ingestion Lag | NRT Practical? | Recommendation |
|---|---|---|---|
| DeviceEvents, DeviceProcessEvents | < 5 min | ✅ Yes | NRT is effective |
| SigninLogs, AuditLogs | 5-15 min | ⚠️ Marginal | 1H is usually sufficient |
| AzureActivity | 3-20 min (docs) | ⚠️ Marginal | Evaluate per use case |
| SecurityEvent | < 5 min | ✅ Yes | NRT is effective |
| OfficeActivity | 15-60 min | ⚠️ Marginal | Evaluate per use case |
Rule of thumb: If the table's ingestion lag exceeds 30 minutes, use a 1H scheduled rule instead of NRT. The detection latency is dominated by ingestion lag, not rule frequency.
Pitfall 13: impactedAssets Must Be Non-Empty
Root cause (Feb 2026): The Graph API requires impactedAssets to contain at least 1 element. Sending an empty array ("impactedAssets": []) returns 400 BadRequest with InvalidInput code and the message: "The field ImpactedAssets must be a string or array type with a minimum length of '1'."
This error is particularly difficult to diagnose because:
- The error message only appears in some response formats: when using Invoke-MgGraphRequest with raw JSON strings, the "message" field is often empty ("")
- The actual error text only surfaced when using ConvertTo-Json on a PowerShell hashtable body
- All other fields in the payload may be valid, making it seem like a server-side issue
Every custom detection must declare at least one impacted entity. Choose the most relevant asset type for the detection:
| Detection Focus | Asset Type | Example Identifier |
|---|---|---|
| Email-based threats | impactedMailboxAsset | recipientEmailAddress, senderFromAddress |
| User activity | impactedUserAsset | accountUpn, accountObjectId |
| Endpoint/device | impactedDeviceAsset | deviceId, deviceName |
Prevention:
- Always include at least one impactedAssets entry in manifests and API payloads
- The companion script Deploy-CustomDetections.ps1 validates this at manifest load time and rejects rules with empty impactedAssets before calling the API
- Review the Impacted Asset Types section for the full list of valid identifiers per asset type
Pitfall 14: Max 3 Unique Dynamic Columns Across Title + Description
The Graph API enforces a maximum of 3 unique {{Column}} references across title and description combined (not per field). Exceeding this returns 400 Bad Request, often with an empty error message via Invoke-MgGraphRequest.
⚠️ MS Learn discrepancy: Docs say 3 per field; the API empirically enforces 3 unique total across both fields (confirmed Mar 2026).
| Scenario | Unique Columns | Result |
|---|---|---|
| title: {{A}} {{B}}, description: {{A}} {{C}} | A, B, C = 3 | ✅ Accepted |
| title: {{A}} {{B}}, description: {{C}} {{D}} | A, B, C, D = 4 | ❌ 400 Bad Request |
Counting: Reuse across fields is free ({{A}} in both = 1). Count distinct names, not occurrences.
Workaround: Replace excess {{Column}} refs with static text, or use customDetails (up to 20 KVPs) to surface extra columns in the alert side panel. Deploy-CustomDetections.ps1 validates this at manifest load time.
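The counting rule (distinct names across both fields, reuse is free) is easy to get wrong by eye. A minimal Python sketch of the check, assuming hypothetical helper names and assuming that placeholder names are compared case-sensitively and contain only word characters; neither assumption is confirmed by the API documentation.

```python
import re

# Matches {{ColumnName}} placeholders; word characters only (assumption).
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")
MAX_UNIQUE = 3  # limit described in this pitfall


def unique_dynamic_columns(title, description):
    """Distinct {{Column}} names across title and description combined."""
    return set(PLACEHOLDER.findall(title)) | set(PLACEHOLDER.findall(description))


def within_limit(title, description):
    """True when the combined number of distinct placeholders is allowed."""
    return len(unique_dynamic_columns(title, description)) <= MAX_UNIQUE


print(within_limit("{{A}} {{B}}", "{{A}} {{C}}"))
print(within_limit("{{A}} {{B}}", "{{C}} {{D}}"))
```

The two calls mirror the two rows of the scenario table: three distinct names pass, four do not.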
Pitfall 15: PowerShell Double-Quoted Here-Strings — Variable Interpolation & Escaping Traps
When building queryText in PowerShell, always use single-quoted here-strings (@'...'@), NEVER double-quoted (@"..."@). Two distinct failure modes make double-quoted here-strings unreliable for KQL:
Risk 1 — $variable interpolation: PowerShell double-quoted strings interpolate $var references. KQL uses $left and $right in join syntax and $ as a dynamic property prefix. Inside @"..."@, PowerShell replaces these with empty strings (undefined variables → $null → empty), silently producing broken KQL with no compile-time warning.
Risk 2 — LLM/human escaping confusion (confirmed Mar 2026): When writing KQL inside a double-quoted context, an LLM (or human) instinctively adapts backslash escaping — writing \skills (single backslash) instead of \\skills (double backslash), because most languages interpret \\ → \ in double-quoted strings. PowerShell does NOT do this (backtick ` is the escape character, not backslash), so the single \ passes through literally to the KQL parser, which rejects \s as an invalid escape sequence → 400 Bad Request: "syntax errors".
Byte-level proof (Mar 2026): When identical \\skills content is deliberately placed in both here-string types, PowerShell produces identical bytes — confirming PowerShell itself does not mangle backslashes. The difference arises from what gets written into the string (by the LLM or human), not from PowerShell processing it. This makes the bug extremely hard to diagnose: the query looks correct in terminal output, and the root cause is an invisible content difference between attempts.
| Here-String Type | $left / $right | Backslash Content | Practical Safety |
|---|---|---|---|
| @'...'@ (single-quoted) | Literal $left ✅ | What you write is what you get | ✅ Safe: no interpretation |
| @"..."@ (double-quoted) | Interpolated → empty ❌ | What you write is what you get, but LLMs write different content | ❌ Fragile: two failure modes |
Rule: For ANY queryText, always use @'...'@. This eliminates both $ interpolation bugs and escaping confusion. Applies to inline PowerShell, the batch deployment script, and any LLM-generated deployment commands.
Additional validated finding: ingestion_time() IS accepted by the CD API for scheduled (non-NRT) rules (tested and confirmed Mar 2026). However, NRT rules reject ingestion_time() with 400 Bad Request — empirically confirmed Apr 2026, reproducible on retry. See Pitfall 17.
Pitfall 16: StrictMode .Count on Pipeline Scalars
PowerShell pipelines returning exactly 1 result unwrap to a scalar. Under Set-StrictMode -Version Latest, .Count on a scalar throws a terminating error. Always wrap in @() when .Count will be accessed — applies to pipelines, ConvertFrom-Json (single-element JSON arrays), and Get-* cmdlets.
# ❌ $x is scalar string when 1 result → .Count fails
$x = ... | Sort-Object -Unique
# ✅ $x is always Object[]
$x = @(... | Sort-Object -Unique)
Fixed in Deploy-CustomDetections.ps1 (Mar 2026): dynamic column validation, manifest load, and existing rule fetch all wrapped in @().
Pitfall 17: ingestion_time() Rejected in NRT Rules
NRT rules (schedule: "0") reject ingestion_time() with 400 Bad Request. The NRT Constraints table notes that Timestamp > ago(...) is "unnecessary but harmless" — however, ingestion_time() is NOT harmless in NRT mode. It is a function call (not a column filter), and the NRT streaming pipeline rejects it outright.
Empirically confirmed (Apr 2026): Four-attempt A/B test on the same query (DeviceProcessEvents with vssadmin/bcdedit destructive command detection):
| Attempt | ingestion_time() present | Result |
|---|---|---|
| 1st deploy | Yes | ❌ 400 Bad Request |
| 2nd deploy | Removed | ✅ Created |
| Delete + redeploy | Restored | ❌ 400 Bad Request |
| Delete + redeploy | Removed | ✅ Created |
Root cause hypothesis: NRT rules process events via streaming ingestion — ingestion_time() likely depends on a materialized ingestion timestamp that isn't available (or isn't filterable) in the NRT streaming pipeline.
Fix: For NRT rules, omit ingestion_time() entirely. If you need a time filter, use Timestamp > ago(...) instead (accepted but unnecessary since NRT pre-filters automatically).
| Rule type | ingestion_time() | Timestamp > ago(...) |
|---|---|---|
| Scheduled (1H+) | ✅ Accepted (preferred) | ✅ Accepted |
| NRT ("0") | ❌ 400 Bad Request | ✅ Accepted (unnecessary) |
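Because the API answers these NRT rejections with a generic 400, a text-level pre-check can flag the risky constructs before deployment. The sketch below is a coarse heuristic in Python, not a KQL parser: `nrt_findings` and `NRT_REJECTED` are hypothetical names, and the tostring() pattern cannot tell whether the wrapped column is dynamic, so a hit only means "review this line".

```python
import re

# Constructs this document reports as rejected for NRT rules (schedule "0").
# Plain text patterns only; they do not understand KQL types or comments.
NRT_REJECTED = {
    "ingestion_time()": re.compile(r"\bingestion_time\s*\("),
    "tostring()": re.compile(r"\btostring\s*\("),
    "let statement": re.compile(r"^\s*let\s+\w+\s*=", re.MULTILINE),
}


def nrt_findings(query_text, schedule):
    """Return the names of risky constructs found; always empty for scheduled rules."""
    if schedule != "0":
        return []
    return [name for name, pattern in NRT_REJECTED.items() if pattern.search(query_text)]


query = (
    "DeviceProcessEvents\n"
    "| where ingestion_time() > ago(1h)\n"
    "| where FileName =~ 'vssadmin.exe'"
)
print(nrt_findings(query, "0"))
print(nrt_findings(query, "1H"))
```

A non-empty result for an NRT rule is a prompt to either rewrite the query or switch the rule to a 1H schedule.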
Pitfall 18: category and mitreTechniques Are Server-Side Validated
Both fields are validated against fixed allowlists — invalid values return 400 Bad Request (with descriptive messages, unlike Pitfall 13), not silently accepted.
- category: title-case and case-sensitive (defenseevasion → "Invalid alert category."), single-value (arrays rejected). Full accepted set in Alert Category Values.
- mitreTechniques: invalid IDs → "Mitre techniques (...) are invalid." The accepted set tracks the tenant's ATT&CK version: legacy IDs (e.g. T1003.001) always work; newer IDs (e.g. T1556.009, T1659) work only on refreshed tenants (expanded set rolling out via preview as of June 2026).
Fallback rule: If a newer ID returns "Mitre techniques ... are invalid", fall back to the parent technique (T1556.009 → T1556) or a legacy ID. Don't assume the newest ATT&CK values are available everywhere.
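The fallback rule can be expressed as an ordered list of candidates to try. A minimal Python sketch, assuming a hypothetical helper `fallback_chain`; it only derives the parent ID from the string and does not know which IDs a given tenant accepts.

```python
def fallback_chain(technique_id):
    """Candidate IDs to try in order: the ID itself, then its parent technique."""
    technique_id = technique_id.strip().upper()
    # "T1556.009" splits into parent "T1556" and sub-technique "009".
    parent, dot, _sub = technique_id.partition(".")
    return [technique_id, parent] if dot else [technique_id]


print(fallback_chain("T1556.009"))
print(fallback_chain("T1659"))
```

A deployment script would walk the returned list and stop at the first ID the API accepts.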
CD Metadata Contract
Query files in queries/ can include per-query cd-metadata blocks that provide structured data for the detection authoring skill. This is the producer/consumer contract between the KQL Query Authoring skill (producer) and the Detection Authoring skill (consumer).
When cd-metadata is present
When a query in queries/ includes a cd-metadata block, the detection authoring skill uses it to:
- Pre-populate manifest fields (schedule, severity, category, title, impactedAssets, etc.)
- Skip manual CD-readiness assessment: the block declares readiness explicitly
- Generate the adapted CD query by applying the Query Adaptation Checklist to the Sentinel query in the same section
Schema
The cd-metadata block is an HTML comment with YAML content, placed immediately after the per-query metadata fields (Severity, MITRE, Tuning Notes) and before the KQL code block:
### Query N: [Title]
**Purpose:** ...
**Severity:** High
**MITRE:** T1053.005, T1059.001
<!-- cd-metadata
cd_ready: true
schedule: "1H"
category: "Persistence"
title: "Encoded PowerShell in Scheduled Task on {{DeviceName}}"
impactedAssets:
- type: device
identifier: DeviceName
recommendedActions: "Investigate the scheduled task XML. Decode the base64 payload and check for malicious content."
adaptation_notes: "Straightforward — already row-level, add mandatory columns"
-->
```kql
// Query code...
```
### Field Reference
| Field | Required | Type | Description |
|-------|----------|------|-------------|
| `cd_ready` | Yes | `true` / `false` | Whether this query can be adapted for custom detection deployment |
| `schedule` | If cd_ready | `"0"` / `"1H"` / `"3H"` / `"12H"` / `"24H"` | Detection frequency. `"0"` = NRT (single-table, no joins/unions) |
| `category` | If cd_ready | string | Alert category (see [API Reference](#api-reference) for valid values) |
| `title` | No | string | Dynamic alert title with `{{ColumnName}}` placeholders. Falls back to query heading if omitted. **Limit: max 3 unique `{{Column}}` references across `title` AND `description` combined** (see [Pitfall 14](#pitfall-14-max-3-unique-dynamic-columns-across-title--description)) |
| `impactedAssets` | If cd_ready | array | Asset entities to extract. Each entry: `type` (`device`/`user`/`mailbox`) + `identifier` (predefined API value, e.g., `accountUpn`, `deviceName` — see [Impacted Asset Types](#impacted-asset-types)) |
| `recommendedActions` | No | string | Triage guidance shown in the alert. Omit if not needed |
| `responseActions` | No | array | **PROHIBITED** — must always be omitted or empty `[]`. Response actions must only be configured manually in the Defender portal |
| `adaptation_notes` | No | string | Human-readable notes on what adaptation is needed (for the summary table) |
### Queries NOT suitable for CD
For queries that cannot be adapted (baseline queries, statistical aggregations), use:
```markdown
<!-- cd-metadata
cd_ready: false
adaptation_notes: "Statistical baseline query — requires summarize with dcount, not suitable for CD"
-->
```
This explicitly documents the assessment so the detection skill doesn't re-evaluate it each time.
How the detection skill consumes cd-metadata
- User says "deploy query 8 as a custom detection" → Skill reads the query file, finds the cd-metadata block for Query 8
- Pre-populates manifest entry from cd-metadata fields (schedule, category, severity, title, impactedAssets)
- Applies Query Adaptation Checklist to the Sentinel KQL query in that section
- Writes manifest JSON to temp/ for review. Only deploys via Graph API if the user explicitly requested deployment (see Deployment Gate)
If a query file has no cd-metadata blocks, the skill assesses CD-readiness manually based on the query structure and the Query Adaptation Checklist.
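The first consumption step, locating the cd-metadata comment and reading its top-level fields, can be sketched in Python. This is a minimal illustration, not the skill's actual parser: `read_cd_metadata` is a hypothetical helper, it reads only unindented `key: value` lines, and it assumes nested entries such as impactedAssets are indented as in normal YAML. A YAML library would be the robust choice for real use.

```python
import re

# Captures everything between "<!-- cd-metadata" and the closing "-->".
CD_BLOCK = re.compile(r"<!--\s*cd-metadata\s*\n(.*?)-->", re.DOTALL)


def read_cd_metadata(section_text):
    """Return top-level scalar fields of the first cd-metadata block, or None."""
    match = CD_BLOCK.search(section_text)
    if match is None:
        return None
    fields = {}
    for line in match.group(1).splitlines():
        # Skip blank lines, indented (nested) lines, and list items.
        if not line or line[0].isspace() or line.startswith("-") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip().strip('"')
        if value:  # keys with nested content (e.g. impactedAssets:) are skipped
            fields[key.strip()] = value
    return fields


sample = (
    "### Query 8: Example\n"
    "<!-- cd-metadata\n"
    "cd_ready: true\n"
    'schedule: "1H"\n'
    "impactedAssets:\n"
    "  - type: device\n"
    "    identifier: deviceName\n"
    "-->\n"
)
print(read_cd_metadata(sample))
```

Values come back as strings, so a caller would compare `fields["cd_ready"] == "true"` before pre-populating a manifest entry.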
.github/skills/geomap-visualization/SKILL.md
npx skills add SCStelz/security-investigator --skill geomap-visualization -g -y
SKILL.md
Frontmatter
{
"name": "geomap-visualization",
"description": "Use this skill when asked to create geographic maps, visualize attack origins on a world map, show location-based data, or display IP geolocation. Triggers on keywords like \"geomap\", \"world map\", \"geographic\", \"attack map\", \"show on map\", \"visualize locations\", \"attack origins\", or when analyzing data with latitude\/longitude coordinates."
}
Geomap Visualization Skill
Purpose
Generate interactive world map visualizations from Microsoft Sentinel data using the Sentinel Geomap MCP App. Geomaps display markers on a world map with coordinates, ideal for visualizing attack origins, geographic distribution of threats, or location-based security data.
📑 TABLE OF CONTENTS
- Quick Start - Minimal example to get started
- MCP Tool Reference - Parameters and schemas
- Data Sources - Tables with native vs enriched geolocation
- KQL Query Patterns - Ready-to-use queries by scenario
- Enrichment Integration - Adding threat intel drill-down
- Examples - End-to-end workflows
- Follow-Up Investigation Queries - Queries for selected IPs
- Interactive Selection Feature - Multi-select and chat integration
Quick Start
Minimal Geomap (2 Steps)
# 1. Query Sentinel for data with coordinates
mcp_sentinel-data_query_lake({
"query": "W3CIISLog | where TimeGenerated > ago(7d) | where scStatus == '401' | summarize value = count(), lat = take_any(RemoteIPLatitude), lon = take_any(RemoteIPLongitude) by ip = cIP | where lat != 0 | project ip, lat, lon, value"
})
# 2. Display geomap
mcp_sentinel-geom_show-attack-map({
"data": [<query results>],
"title": "Attack Origins (Last 7 Days)",
"valueLabel": "Failed Logins",
"colorScale": "blue-red"
})
MCP Tool Reference
Tool: mcp_sentinel-geom_show-attack-map
| Parameter | Required | Type | Description |
|---|---|---|---|
| data | ✅ | array | Array of {ip, lat, lon, value} objects |
| title | ❌ | string | Title displayed above map (default: "Attack Origin Map") |
| valueLabel | ❌ | string | Label for values (default: "Attacks") |
| colorScale | ❌ | string | blue-red (threats), green-red, or blue-yellow |
| enrichment | ❌ | array | IP enrichment data for click-to-expand panels |
Data Schema
{
"data": [
{"ip": "101.36.107.228", "lat": 22.25, "lon": 114.15, "value": 44},
{"ip": "193.142.147.209", "lat": 52.35, "lon": 4.92, "value": 13},
{"ip": "170.64.158.196", "lat": -33.90, "lon": 151.19, "value": 9}
]
}
Enrichment Schema
{
"enrichment": [
{
"ip": "101.36.107.228",
"city": "Hong Kong",
"country": "HK",
"org": "AS135377 UCLOUD INFORMATION TECHNOLOGY",
"is_vpn": true,
"is_proxy": false,
"is_tor": false,
"abuse_confidence_score": 100,
"total_reports": 4612,
"last_reported": "2026-01-29",
"threat_categories": ["SSH", "Brute-Force", "Web App Attack"]
}
]
}
⚠️ CRITICAL: Complete Enrichment Requirement
When providing enrichment data, ALWAYS include ALL IPs - never a subset.
Rule: 100% Enrichment Coverage
| Scenario | Correct Action |
|---|---|
| Queried 50 IPs from Sentinel | Include enrichment for ALL 50 IPs |
| Enriched 25 IPs | Include ALL 25 in enrichment array |
| Some IPs failed enrichment | Include them with empty fields, or filter from both data AND enrichment |
Why This Matters
- Users click markers expecting threat intel panels
- Missing enrichment = empty panels = broken UX
- Partial enrichment misleads security analysts
Workflow to Ensure Complete Enrichment
- Query Sentinel → Get N IPs with coordinates
- Batch enrich IPs → python enrich_ips.py <all_ips> or python enrich_ips.py --file <ips.json>
- Parse enrichment JSON → Extract ALL enriched entries
- Build enrichment array → One entry per IP, matching data array exactly
- Call geomap → Both data and enrichment arrays must have same IPs
Example: Building Complete Enrichment
import json
# Load enrichment from batch operation
with open('temp/ip_enrichment_<timestamp>.json', 'r') as f:
    raw_enrichment = json.load(f)
# Build geomap enrichment array - INCLUDE ALL
enrichment = []
for e in raw_enrichment:
    # Collect categories from the five most recent comments
    threat_cats = []
    for c in e.get('recent_comments', [])[:5]:
        threat_cats.extend(c.get('categories', []))
    enrichment.append({
        'ip': e['ip'],
        'city': e.get('city', 'Unknown'),
        'country': e.get('country', '??'),
        'org': e.get('org', 'Unknown'),
        'is_vpn': e.get('is_vpn') or e.get('vpnapi_security_vpn', False),
        'is_proxy': e.get('is_proxy') or e.get('vpnapi_security_proxy', False),
        'is_tor': e.get('is_tor') or e.get('vpnapi_security_tor', False),
        'abuse_confidence_score': e.get('abuse_confidence_score', 0),
        'total_reports': e.get('total_reports', 0),
        'last_reported': e.get('recent_comments', [{}])[0].get('date', '')[:10] if e.get('recent_comments') else '',
        'threat_categories': list(set(threat_cats))[:5]
    })
# Verify coverage
print(f"Enrichment entries: {len(enrichment)}")  # Must match data array length
❌ NEVER Do This
# BAD: Only including first 25 IPs
enrichment = enrichment[:25] # WRONG
# BAD: Skipping IPs without abuse scores
enrichment = [e for e in enrichment if e['abuse_confidence_score'] > 0] # WRONG
✅ ALWAYS Do This
# GOOD: Include all IPs, even if some fields are empty
enrichment = [transform(e) for e in raw_enrichment] # All entries
# GOOD: If filtering, filter BOTH data and enrichment consistently
valid_ips = set(e['ip'] for e in enrichment if e.get('city'))
data = [d for d in data if d['ip'] in valid_ips]  # Filter data
enrichment = [e for e in enrichment if e['ip'] in valid_ips]  # Filter enrichment to match
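The 100% coverage rule can be verified mechanically just before the geomap call. A minimal Python sketch, assuming a hypothetical helper `coverage_gaps` and the `{ip, ...}` object shape from the Data Schema and Enrichment Schema above.

```python
def coverage_gaps(data, enrichment):
    """Compare IP sets; both returned sets are empty when coverage is complete."""
    data_ips = {d["ip"] for d in data}
    enriched_ips = {e["ip"] for e in enrichment}
    missing_enrichment = data_ips - enriched_ips  # markers with empty panels
    orphan_enrichment = enriched_ips - data_ips   # panels with no marker
    return missing_enrichment, orphan_enrichment


data = [
    {"ip": "203.0.113.42", "lat": 22.25, "lon": 114.15, "value": 44},
    {"ip": "198.51.100.10", "lat": 52.35, "lon": 4.92, "value": 13},
]
enrichment = [{"ip": "203.0.113.42", "city": "Unknown"}]

missing, orphans = coverage_gaps(data, enrichment)
print(sorted(missing), sorted(orphans))
```

If either set is non-empty, fix the arrays (enrich the missing IPs or filter both lists) before rendering the map.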
Data Sources
Tables with Native Geolocation
Some Sentinel tables include lat/lon directly from Microsoft's GeoIP enrichment:
| Table | Latitude Column | Longitude Column | Country Column |
|---|---|---|---|
| W3CIISLog | RemoteIPLatitude | RemoteIPLongitude | RemoteIPCountry |
| CommonSecurityLog | DeviceGeoLatitude | DeviceGeoLongitude | DeviceGeoCountry |
| AzureDiagnostics | varies by source | varies by source | varies by source |
| AzureNetworkAnalytics | SrcGeoLatitude | SrcGeoLongitude | SrcGeoCountry |
Use these when available - no enrichment needed for coordinates.
Tables Requiring IP Enrichment
These tables have IP addresses but no coordinates:
| Table | IP Column | Enrichment Required |
|---|---|---|
| SigninLogs | IPAddress | Yes - use enrich_ips.py |
| SecurityEvent | IpAddress | Yes - use enrich_ips.py |
| Syslog | extract from message | Yes - use enrich_ips.py |
| DeviceNetworkEvents | RemoteIP | Yes - use enrich_ips.py |
| OfficeActivity | ClientIP | Yes - use enrich_ips.py |
Enrichment script now captures latitude and longitude from ipinfo.io.
KQL Query Patterns
Pattern 1: Native Geolocation (W3CIISLog)
W3CIISLog
| where TimeGenerated between (datetime(<start>) .. datetime(<end>))
| where <filter_condition>
| summarize
value = count(),
lat = take_any(RemoteIPLatitude),
lon = take_any(RemoteIPLongitude),
country = take_any(RemoteIPCountry)
by ip = cIP
| where lat != 0 and lon != 0 // Filter unknown locations
| project ip, lat, lon, value
| order by value desc
Pattern 2: Native Geolocation (CommonSecurityLog)
CommonSecurityLog
| where TimeGenerated between (datetime(<start>) .. datetime(<end>))
| where <filter_condition>
| summarize
value = count(),
lat = take_any(DeviceGeoLatitude),
lon = take_any(DeviceGeoLongitude)
by ip = SourceIP
| where lat != 0 and lon != 0
| project ip, lat, lon, value
| order by value desc
Pattern 3: Enrichment Required (Extract IPs Only)
<Table>
| where TimeGenerated between (datetime(<start>) .. datetime(<end>))
| where <filter_condition>
| summarize value = count() by ip = <IP_column>
| order by value desc
| take 100
Then run enrich_ips.py to get lat/lon.
Scenario-Specific KQL Queries
Scenario: W3CIISLog - Failed Logins (Native Geo)
W3CIISLog
| where TimeGenerated > ago(90d)
| where Computer startswith "<honeypot_name>"
| where scStatus == "401" // Failed auth
| where cIP != "127.0.0.1"
| summarize
value = count(),
lat = take_any(RemoteIPLatitude),
lon = take_any(RemoteIPLongitude),
country = take_any(RemoteIPCountry)
by ip = cIP
| where lat != 0 and lon != 0
| project ip, lat, lon, value
| order by value desc
Scenario: W3CIISLog - Web Attacks (Native Geo)
W3CIISLog
| where TimeGenerated > ago(30d)
| where tolong(scStatus) >= 400
| where csUriStem has_any ("'", "union", "select", "script", "../", "cmd.exe")
| where cIP != "127.0.0.1"
| summarize
value = count(),
lat = take_any(RemoteIPLatitude),
lon = take_any(RemoteIPLongitude)
by ip = cIP
| where lat != 0
| project ip, lat, lon, value
| order by value desc
| take 100
Scenario: CommonSecurityLog - Firewall Blocks (Native Geo)
CommonSecurityLog
| where TimeGenerated > ago(7d)
| where DeviceAction == "Deny" or Activity has "blocked"
| summarize
value = count(),
lat = take_any(DeviceGeoLatitude),
lon = take_any(DeviceGeoLongitude)
by ip = SourceIP
| where lat != 0 and lon != 0
| project ip, lat, lon, value
| order by value desc
| take 100
Scenario: SigninLogs - Failed Sign-ins (Requires Enrichment)
Step 1: Query IPs and values
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType != 0 // Failed
| summarize value = count() by ip = IPAddress
| order by value desc
| take 50
Step 2: Enrich IPs
python enrich_ips.py <ip1> <ip2> <ip3> ...
Step 3: Build map data from enrichment JSON (includes lat/lon)
Scenario: SecurityEvent - RDP Brute Force (Requires Enrichment)
SecurityEvent
| where TimeGenerated > ago(7d)
| where EventID == 4625
| where LogonType == 10 // RDP
| where IpAddress != "-" and IpAddress != "127.0.0.1"
| summarize value = count() by ip = IpAddress
| order by value desc
| take 50
Then enrich to get coordinates.
Scenario: DeviceNetworkEvents - Inbound Attacks (Requires Enrichment)
DeviceNetworkEvents
| where TimeGenerated > ago(7d)
| where DeviceName =~ "<device_name>"
| where ActionType in ("ConnectionSuccess", "InboundConnectionAccepted")
| where LocalPort in (3389, 22, 445, 80, 443)
| where RemoteIP !startswith "192.168." and RemoteIP !startswith "10."
| summarize value = count() by ip = RemoteIP
| order by value desc
| take 50
Enrichment Integration
When Coordinates Are Not in Sentinel
For tables without native geo fields, use the enrichment script:
Step 1: Run your KQL query to get IPs and values
Step 2: Enrich IPs:
python enrich_ips.py 203.0.113.42 198.51.100.10 192.0.2.1
# Or from file:
python enrich_ips.py --file temp/attack_ips.json
Step 3: Load enrichment JSON and build map data:
import json
# Load enrichment (now includes latitude/longitude from ipinfo.io)
with open('temp/ip_enrichment_<timestamp>.json', 'r') as f:
    enrichment = json.load(f)
# Build map data
map_data = []
enrichment_out = []
for e in enrichment:
    ip = e['ip']
    lat = e.get('latitude')
    lon = e.get('longitude')
    if lat is None or lon is None:
        continue  # Skip IPs without coordinates (dropped from BOTH arrays)
    # Get value from your KQL results (attack_counts is a lookup dict: ip -> count)
    value = attack_counts.get(ip, 1)
    map_data.append({
        'ip': ip,
        'lat': lat,
        'lon': lon,
        'value': value
    })
    # Build enrichment for drill-down
    threat_cats = []
    for c in e.get('recent_comments', [])[:5]:
        threat_cats.extend(c.get('categories', []))
    enrichment_out.append({
        'ip': ip,
        'city': e.get('city', 'Unknown'),
        'country': e.get('country', '??'),
        'org': e.get('org', 'Unknown'),
        'is_vpn': e.get('is_vpn') or e.get('vpnapi_security_vpn', False),
        'is_proxy': e.get('is_proxy') or e.get('vpnapi_security_proxy', False),
        'is_tor': e.get('is_tor') or e.get('vpnapi_security_tor', False),
        'abuse_confidence_score': e.get('abuse_confidence_score', 0),
        'total_reports': e.get('total_reports', 0),
        'last_reported': e.get('recent_comments', [{}])[0].get('date', '')[:10] if e.get('recent_comments') else '',
        'threat_categories': list(set(threat_cats))[:5]
    })
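The script above reads `attack_counts` without defining it. One way to build that lookup from the Step 1 query results is sketched below in Python; `build_attack_counts` is a hypothetical helper, and it assumes the results arrive as a list of dictionaries with `ip` and `value` keys, matching the columns projected by the queries in this skill.

```python
def build_attack_counts(rows):
    """Map ip -> total value from KQL result rows shaped like {"ip": ..., "value": ...}."""
    counts = {}
    for row in rows:
        # Sum rather than overwrite in case the same IP appears in several rows.
        counts[row["ip"]] = counts.get(row["ip"], 0) + int(row["value"])
    return counts


kql_rows = [
    {"ip": "203.0.113.42", "value": 44},
    {"ip": "198.51.100.10", "value": 13},
    {"ip": "203.0.113.42", "value": 6},
]
attack_counts = build_attack_counts(kql_rows)
print(attack_counts)
```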
Interactive Features with Enrichment
When enrichment is provided:
- Click any marker → Opens threat intel panel showing:
- 📍 Location (city, country)
- 🏢 Organization/ISP
- 🏷️ VPN/Proxy/Tor badges
- 📊 AbuseIPDB confidence meter
- 📈 Total reports count
- 🔴 Threat category tags
Color Scale Guide
| Scale | Low Value | High Value | Best For |
|---|---|---|---|
| blue-red | Blue | Red | Threats (attacks, failures) - DEFAULT |
| green-red | Teal | Green | Positive activity (benign traffic) |
| blue-yellow | Blue | Yellow | Neutral data distributions |
For threat/attack maps, always use blue-red.
Complete Examples
Example 1: 90-Day Honeypot Attack Map (Native Geo)
# 1. Query with native lat/lon from W3CIISLog
mcp_sentinel-data_query_lake({
"query": "W3CIISLog | where TimeGenerated > ago(90d) | where Computer startswith '<HONEYPOT_SERVER>' | where scStatus == '401' | summarize value = count(), lat = take_any(RemoteIPLatitude), lon = take_any(RemoteIPLongitude), country = take_any(RemoteIPCountry) by ip = cIP | where lat != 0 and lon != 0 | project ip, lat, lon, value | order by value desc"
})
# 2. Enrich top IPs for threat intel drill-down
python enrich_ips.py 101.36.107.228 193.142.147.209 80.190.82.185
# 3. Display geomap
mcp_sentinel-geom_show-attack-map({
"data": [
{"ip": "101.36.107.228", "lat": 22.25, "lon": 114.15, "value": 44},
{"ip": "80.190.82.185", "lat": 50.97, "lon": 6.83, "value": 44},
{"ip": "193.142.147.209", "lat": 52.35, "lon": 4.92, "value": 13},
{"ip": "170.64.158.196", "lat": -33.9, "lon": 151.19, "value": 9}
],
"title": "Honeypot Attack Origins - 90 Day Analysis",
"valueLabel": "Failed Logins",
"colorScale": "blue-red",
"enrichment": [
{"ip": "101.36.107.228", "city": "Hong Kong", "country": "HK", "org": "AS135377 UCLOUD", "is_vpn": true, "abuse_confidence_score": 100, "total_reports": 4612, "threat_categories": ["SSH", "Brute-Force"]},
{"ip": "193.142.147.209", "city": "Amsterdam", "country": "NL", "org": "AS213438 ColocaTel", "is_vpn": true, "abuse_confidence_score": 100, "total_reports": 30973, "threat_categories": ["Web App Attack", "Hacking"]}
]
})
Example 2: SigninLogs Attack Map (Enrichment Required)
# 1. Query IPs with failed sign-ins
mcp_sentinel-data_query_lake({
"query": "SigninLogs | where TimeGenerated > ago(7d) | where ResultType != 0 | summarize value = count() by ip = IPAddress | order by value desc | take 50"
})
# 2. Enrich all IPs (script now captures lat/lon)
python enrich_ips.py <ip1> <ip2> ...
# 3. Load enrichment JSON and build map data
# (See Python code in Enrichment Integration section)
# 4. Display geomap
mcp_sentinel-geom_show-attack-map({
"data": [<map_data from enrichment>],
"title": "Failed Sign-In Origins (Last 7 Days)",
"valueLabel": "Failed Attempts",
"colorScale": "blue-red",
"enrichment": [<enrichment_out>]
})
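Step 3 of Example 2 (building map data from the enrichment output) could look roughly like the sketch below. It assumes the query results are dictionaries with `ip` and `value` keys and that each enrichment record exposes `lat` and `lon` keys; those key names and the `build_map_data` helper are illustrative assumptions, not confirmed output of enrich_ips.py.

```python
def build_map_data(query_rows, enrichment_data):
    """Join per-IP counts with enrichment coordinates into geomap data points."""
    # Index enrichment records by IP for constant-time lookup.
    by_ip = {e["ip"]: e for e in enrichment_data}
    map_data = []
    for row in query_rows:
        record = by_ip.get(row["ip"])
        if record is None:
            continue  # IP was not enriched, so it has no coordinates
        lat, lon = record.get("lat"), record.get("lon")  # assumed key names
        if lat is None or lon is None:
            continue  # enrichment could not geolocate this IP
        map_data.append({
            "ip": row["ip"],
            "lat": float(lat),
            "lon": float(lon),
            "value": int(row["value"]),
        })
    # Highest-volume sources first, matching the "order by value desc" queries.
    map_data.sort(key=lambda point: point["value"], reverse=True)
    return map_data
```

IPs without coordinates are dropped rather than plotted at a default location, so a missing lookup never shows up as a misleading marker.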
Example 3: Firewall Blocks (Native Geo)
# 1. Query blocked traffic with geo
mcp_sentinel-data_query_lake({
"query": "CommonSecurityLog | where TimeGenerated > ago(24h) | where DeviceAction == 'Deny' | summarize value = count(), lat = take_any(DeviceGeoLatitude), lon = take_any(DeviceGeoLongitude) by ip = SourceIP | where lat != 0 | project ip, lat, lon, value | order by value desc | take 100"
})
# 2. Display geomap
mcp_sentinel-geom_show-attack-map({
"data": [<query results>],
"title": "Blocked Traffic Origins (Last 24h)",
"valueLabel": "Blocked Connections",
"colorScale": "blue-red"
})
Follow-Up Investigation Queries
When users select IPs from the geomap and click "🔍 Investigate in Chat", run these queries to provide comprehensive threat analysis. Execute queries in parallel where possible.
Multi-IP Filter Pattern
All queries use this dynamic IP filter:
let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>", ...]);
Replace with the actual IPs selected from the geomap.
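As a minimal sketch, the filter line can be generated from the selected IPs instead of being edited by hand. The `build_target_ips_let` helper below is hypothetical; it only uses Python's standard `ipaddress` and `json` modules to validate the input and emit the `dynamic([...])` literal.

```python
import ipaddress
import json


def build_target_ips_let(ips):
    """Return a KQL `let target_ips = dynamic([...]);` line for the given IPs."""
    cleaned = []
    for raw in ips:
        # ip_address() raises ValueError for anything that is not a valid
        # IPv4/IPv6 literal, which keeps stray text out of the query.
        ip = str(ipaddress.ip_address(raw.strip()))
        if ip not in cleaned:  # de-duplicate while keeping selection order
            cleaned.append(ip)
    if not cleaned:
        raise ValueError("at least one IP address is required")
    # json.dumps produces a double-quoted, comma-separated list.
    return f"let target_ips = dynamic({json.dumps(cleaned)});"


print(build_target_ips_let(["203.0.113.10", "198.51.100.20"]))
```

Validating before substitution also guards against query injection when the selection arrives as free text from chat.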
Query 1: DeviceNetworkEvents (Network Activity)
Purpose: Show all network connections from selected IPs to any device in the environment.
let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>"]);
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
DeviceNetworkEvents
| where TimeGenerated between (start .. end)
| where RemoteIP in (target_ips)
| summarize
ConnectionCount = count(),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated),
TargetDevices = make_set(DeviceName, 10),
TargetPorts = make_set(LocalPort, 20),
Actions = make_set(ActionType, 5)
by RemoteIP
| extend Duration = LastSeen - FirstSeen
| order by ConnectionCount desc
Columns returned:
- RemoteIP: Attacker IP
- ConnectionCount: Total connections
- FirstSeen/LastSeen: Activity time range
- TargetDevices: Devices contacted
- TargetPorts: Ports targeted (LocalPort = service ports on your devices)
- Actions: Connection types (Success, Blocked, etc.)
Query 2: SecurityEvent (Windows Authentication)
Purpose: Show Windows authentication attempts from selected IPs.
let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>"]);
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
SecurityEvent
| where TimeGenerated between (start .. end)
| where IpAddress in (target_ips)
| where EventID in (4624, 4625, 4648, 4771, 4776)
| summarize
EventCount = count(),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated),
TargetComputers = make_set(Computer, 10),
TargetAccounts = make_set(Account, 20),
LogonTypes = make_set(LogonType, 5)
by IpAddress, EventID
| extend EventType = case(
EventID == 4624, "Successful Logon",
EventID == 4625, "Failed Logon",
EventID == 4648, "Explicit Credentials",
EventID == 4771, "Kerberos Pre-Auth Failed",
EventID == 4776, "NTLM Auth Attempt",
"Other")
| project IpAddress, EventType, EventCount, TargetComputers, TargetAccounts, LogonTypes, FirstSeen, LastSeen
| order by EventCount desc
Key Event IDs:
- 4624: Successful logon (ALERT: attacker got in!)
- 4625: Failed logon (brute force indicator)
- 4648: Explicit credentials used (lateral movement)
- 4771: Kerberos pre-auth failed
- 4776: NTLM credential validation
Query 3: W3CIISLog (Web Attacks)
Purpose: Show HTTP requests from selected IPs including attack patterns.
let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>"]);
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
W3CIISLog
| where TimeGenerated between (start .. end)
| where cIP in (target_ips)
| summarize
RequestCount = count(),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated),
TargetServers = make_set(Computer, 10),
URIs = make_set(csUriStem, 20),
StatusCodes = make_set(tolong(scStatus), 10),
Methods = make_set(csMethod, 5),
UserAgents = make_set(csUserAgent, 5)
by cIP
| extend AttackPatterns = case(
URIs has_any ("'", "union", "select"), "SQL Injection",
URIs has "script", "XSS",
URIs has_any ("../", "..\\"), "Path Traversal",
URIs has_any ("cmd.exe", "powershell"), "Command Injection",
"Reconnaissance")
| project IP = cIP, RequestCount, AttackPatterns, TargetServers, StatusCodes, Methods, URIs, FirstSeen, LastSeen
| order by RequestCount desc
Query 4: SigninLogs (Azure AD Activity)
Purpose: Show Azure AD sign-in attempts from selected IPs.
let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>"]);
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
SigninLogs
| where TimeGenerated between (start .. end)
| where IPAddress in (target_ips)
| summarize
SignInCount = count(),
SuccessCount = countif(ResultType == 0),
FailureCount = countif(ResultType != 0),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated),
TargetUsers = make_set(UserPrincipalName, 20),
TargetApps = make_set(AppDisplayName, 10),
ErrorCodes = make_set(ResultType, 10),
ClientApps = make_set(ClientAppUsed, 5)
by IPAddress
| extend SuccessRate = round(100.0 * SuccessCount / SignInCount, 1)
| project IPAddress, SignInCount, SuccessCount, FailureCount, SuccessRate, TargetUsers, TargetApps, ErrorCodes, FirstSeen, LastSeen
| order by SignInCount desc
CRITICAL: Check SuccessCount > 0 - This indicates the attacker successfully authenticated!
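The SuccessCount check can be automated once the Query 4 results are available. This is a sketch that assumes each result row is a dictionary keyed by the projected column names; the `ips_with_successful_signins` helper is hypothetical.

```python
def ips_with_successful_signins(rows):
    """Return Query 4 rows with at least one successful sign-in, highest count first."""
    hits = [r for r in rows if int(r.get("SuccessCount") or 0) > 0]
    return sorted(hits, key=lambda r: int(r["SuccessCount"]), reverse=True)
```

Any row returned here should be escalated ahead of the failure-only IPs, since it represents an authenticated session from a suspected attacker address.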
Query 5: ThreatIntelIndicators (Known Threats)
Purpose: Check if selected IPs match threat intelligence databases.
let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>"]);
ThreatIntelIndicators
| extend IndicatorType = replace_string(replace_string(replace_string(tostring(split(ObservableKey, ":", 0)), "[", ""), "]", ""), "\"", "")
| where IndicatorType in ("ipv4-addr", "ipv6-addr", "network-traffic")
| extend NetworkSourceIP = toupper(ObservableValue)
| where NetworkSourceIP in (target_ips)
| where IsActive and (ValidUntil > now() or isempty(ValidUntil))
| extend Description = tostring(parse_json(Data).description)
| where Description !contains_cs "State: inactive;" and Description !contains_cs "State: falsepos;"
| extend TrafficLightProtocolLevel = tostring(parse_json(AdditionalFields).TLPLevel)
| extend ActivityGroupNames = extract(@"ActivityGroup:(\S+)", 1, tostring(parse_json(Data).labels))
| summarize arg_max(TimeGenerated, *) by NetworkSourceIP
| project
IPAddress = NetworkSourceIP,
ThreatDescription = Description,
ActivityGroupNames,
Confidence,
ValidUntil,
TrafficLightProtocolLevel,
IsActive,
TimeGenerated
| order by Confidence desc
Key Fields:
- Confidence: 0-100 threat confidence score
- ActivityGroupNames: APT/threat actor attribution (e.g., "PHOSPHORUS", "NOBELIUM")
- ThreatDescription: Details about the threat
Query 6: SecurityAlert with Incident Status
Purpose: Find security alerts that reference selected IPs, with the actual status from SecurityIncident (not the immutable alert status).
⚠️ IMPORTANT: SecurityAlert.Status is immutable ("New" at creation time). The actual status is on the SecurityIncident table. This query joins to get the real incident status.
let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>"]);
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
// Step 1: Find alerts containing target IPs as entities
let matched_alerts = SecurityAlert
| where TimeGenerated between (start .. end)
| extend EntitiesParsed = parse_json(Entities)
| mv-expand Entity = EntitiesParsed
| where Entity.["Type"] == "ip"
| extend EntityIP = tostring(Entity.Address)
| where EntityIP in (target_ips)
| summarize MatchedIPs = make_set(EntityIP) by SystemAlertId;
// Step 2: Get latest incident status for these alerts (keep AlertIds)
let incident_status = SecurityIncident
| where TimeGenerated between (start .. end)
| summarize arg_max(TimeGenerated, Status, Classification, IncidentNumber, AlertIds) by IncidentName
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| project AlertId, IncidentStatus = Status, Classification, IncidentNumber;
// Step 3: Join alerts with matched IPs and incident status
SecurityAlert
| where TimeGenerated between (start .. end)
| where SystemAlertId in (matched_alerts)
| join kind=leftouter matched_alerts on $left.SystemAlertId == $right.SystemAlertId
| join kind=leftouter incident_status on $left.SystemAlertId == $right.AlertId
| summarize arg_max(TimeGenerated, AlertName, AlertSeverity, Status, ProviderName, Tactics, Description, MatchedIPs, IncidentStatus, Classification, IncidentNumber) by SystemAlertId
| extend FinalStatus = coalesce(IncidentStatus, Status) // Use incident status if available
| project
TimeGenerated,
AlertName,
AlertSeverity,
Status = FinalStatus,
Classification,
IncidentNumber,
ProviderName,
Tactics,
MatchedIPs,
Description
| order by TimeGenerated desc
| take 25
Why This Matters:
- SecurityAlert.Status = "New" is the creation status (immutable)
- SecurityIncident.Status shows the current status (New/Active/Closed)
- SecurityIncident.Classification shows the closure reason (TruePositive/FalsePositive/BenignPositive)
- Alerts without incidents keep their original "New" status
Entities JSON Structure Example:
[
{"$id":"3","HostName":"contoso-server","Type":"host"},
{"$id":"4","Address":"203.0.113.10","Type":"ip"},
{"$id":"5","Address":"198.51.100.20","Type":"ip"}
]
Query 7: DeviceProcessEvents (Process Execution Post-Compromise)
Purpose: If attacker IPs had successful connections, check for suspicious process execution.
let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>"]);
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
// First, find devices that had connections from target IPs
let compromised_devices = DeviceNetworkEvents
| where TimeGenerated between (start .. end)
| where RemoteIP in (target_ips)
| where ActionType in ("ConnectionSuccess", "InboundConnectionAccepted")
| distinct DeviceName;
// Then check for suspicious processes on those devices
DeviceProcessEvents
| where TimeGenerated between (start .. end)
| where DeviceName in (compromised_devices)
| where FileName in~ ("powershell.exe", "cmd.exe", "wscript.exe", "cscript.exe", "mshta.exe", "certutil.exe", "bitsadmin.exe", "regsvr32.exe", "rundll32.exe")
or ProcessCommandLine has_any ("Invoke-", "IEX", "DownloadString", "WebClient", "-enc", "-encoded", "bypass", "hidden")
| project TimeGenerated, DeviceName, FileName, ProcessCommandLine, AccountName, InitiatingProcessFileName
| order by TimeGenerated desc
| take 50
Query 8: DeviceFileEvents (Malware Drops)
Purpose: Check for file creation/modification on devices contacted by attacker IPs.
let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>"]);
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
// Find devices that had connections from target IPs
let compromised_devices = DeviceNetworkEvents
| where TimeGenerated between (start .. end)
| where RemoteIP in (target_ips)
| where ActionType in ("ConnectionSuccess", "InboundConnectionAccepted")
| distinct DeviceName;
// Check for suspicious file activity
DeviceFileEvents
| where TimeGenerated between (start .. end)
| where DeviceName in (compromised_devices)
| where ActionType in ("FileCreated", "FileModified")
| where FileName endswith_cs ".exe" or FileName endswith_cs ".dll" or FileName endswith_cs ".ps1"
or FileName endswith_cs ".bat" or FileName endswith_cs ".vbs" or FileName endswith_cs ".js"
| where FolderPath has_any ("\\Temp\\", "\\AppData\\", "\\Downloads\\", "\\ProgramData\\", "\\Users\\Public\\")
| project TimeGenerated, DeviceName, FileName, FolderPath, ActionType, InitiatingProcessFileName, SHA256
| order by TimeGenerated desc
| take 50
Recommended Execution Order
When user selects IPs and clicks "Investigate in Chat":
Phase 1 (Parallel):
- Query 1: DeviceNetworkEvents
- Query 2: SecurityEvent
- Query 3: W3CIISLog
- Query 4: SigninLogs
- Query 5: ThreatIntelIndicators
- Query 6: SecurityAlert
Phase 2 (If connections found):
- Query 7: DeviceProcessEvents (post-compromise activity)
- Query 8: DeviceFileEvents (malware indicators)
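The Phase 2 condition can be sketched as a small gate over the Query 1 results. It assumes each row is a dictionary whose `Actions` value is the list produced by `make_set(ActionType, 5)`, and it reuses the two action types that Queries 7 and 8 filter on; the `needs_phase_2` helper itself is hypothetical.

```python
# Action types that Queries 7 and 8 treat as an established connection.
ACCEPTED_ACTIONS = {"ConnectionSuccess", "InboundConnectionAccepted"}


def needs_phase_2(network_rows):
    """Return True if any Query 1 row reports an accepted connection."""
    for row in network_rows:
        actions = set(row.get("Actions") or [])
        if actions & ACCEPTED_ACTIONS:
            return True
    return False
```

Skipping Phase 2 when this returns False avoids two queries that would return nothing, because both start from the same accepted-connection filter.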
Response Format:
Summarize findings with:
- Threat Level Assessment (Critical/High/Medium/Low)
- Attack Summary - What the IPs did, which devices/users were targeted
- Successful Access - ALERT if any successful logins (4624) or Azure AD success (ResultType=0)
- Threat Intel Matches - Known APT groups, malware campaigns
- Recommendations - Block IPs, investigate users, isolate devices
Interactive Selection Feature
The geomap supports multi-select mode for follow-up investigations:
How to Use
- Click "☑ Select" button (top of map) to enter selection mode
- Click markers to add/remove IPs from selection (green checkmark ✓)
- Review selection panel showing selected IPs with enrichment summary
- Click "🔍 Investigate in Chat" to send selected IPs for investigation
What Happens
When you click "Investigate in Chat":
- All selected IPs are formatted with enrichment context
- Message is sent to chat as a user message
- LLM runs the follow-up queries above automatically
- Results are summarized with threat assessment
Selection Panel Shows
For each selected IP:
- IP address
- City, Country
- Abuse confidence score (color-coded badge)
- Attack value from the map
Technical Notes
- Projection: Robinson projection for accurate world map display
- Map Source: SimpleMaps.com world SVG (MIT license)
- Bundle Size: ~650 KB (includes embedded world map)
- CSP Compliance: No external resources - all assets embedded inline
- Coordinate System: Standard WGS84 (latitude: -90 to 90, longitude: -180 to 180)
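Given the WGS84 ranges above, data points can be checked before they are sent to the map. The sketch below uses a hypothetical `is_plottable` helper; treating (0, 0) as missing geo data is an assumption borrowed from the `lat != 0 and lon != 0` filter in the example queries.

```python
def is_plottable(point):
    """Return True if a data point has usable WGS84 coordinates."""
    lat, lon = point.get("lat"), point.get("lon")
    if not isinstance(lat, (int, float)) or not isinstance(lon, (int, float)):
        return False  # missing or non-numeric coordinates
    if lat == 0 and lon == 0:
        return False  # assumed placeholder for "no geo data"
    return -90 <= lat <= 90 and -180 <= lon <= 180
```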
When to Use Geomaps
✅ Good Use Cases:
- Attack origin visualization (honeypots, firewalls)
- Geographic threat distribution
- Anomalous sign-in locations
- VPN/anonymization analysis across regions
- Executive briefings on global threats
❌ Skip Geomaps When:
- Fewer than 3 unique locations (too sparse)
- All IPs from same region (use heatmap instead)
- Time-based patterns needed (use heatmap)
- No geographic data available and enrichment not feasible
Last Updated: January 29, 2026
.github/skills/heatmap-visualization/SKILL.md
npx skills add SCStelz/security-investigator --skill heatmap-visualization -g -y
SKILL.md
Frontmatter
{
"name": "heatmap-visualization",
"description": "Use this skill when asked to create heatmaps, visualize patterns over time, show activity grids, or display aggregated data in a matrix format. Triggers on keywords like \"heatmap\", \"show heatmap\", \"visualize patterns\", \"activity grid\", \"time-based visualization\", or when analyzing attack patterns, sign-in activity, or event distributions by time period."
}
Heatmap Visualization Skill
Purpose
Generate interactive heatmap visualizations from Microsoft Sentinel data using the Sentinel Heatmap MCP App. Heatmaps display aggregated data in a row/column grid with color-coded intensity, ideal for identifying patterns across time periods, comparing entities, or spotting anomalies.
📑 TABLE OF CONTENTS
- Quick Start - Minimal example to get started
- MCP Tool Reference - Parameters and schemas
- KQL Query Patterns - Ready-to-use queries by scenario
- Enrichment Integration - Adding threat intel drill-down
- Color Scale Guide - Choosing the right colors
- Examples - End-to-end workflows
Quick Start
Minimal Heatmap (3 Steps)
# 1. Query Sentinel for aggregated data
mcp_sentinel-data_query_lake({
"query": "SigninLogs | where TimeGenerated > ago(24h) | summarize value = count() by row = AppDisplayName, column = format_datetime(bin(TimeGenerated, 1h), 'HH:mm') | project row, column, value"
})
# 2. Display heatmap
mcp_sentinel-heat_show-signin-heatmap({
"data": [<query results>],
"title": "Sign-Ins by Application (Last 24h)",
"rowLabel": "Application",
"colLabel": "Hour (UTC)",
"valueLabel": "Sign-ins",
"colorScale": "green-red"
})
MCP Tool Reference
Tool: mcp_sentinel-heat_show-signin-heatmap
| Parameter | Required | Type | Description |
|---|---|---|---|
| data | ✅ | array | Array of {row, column, value} objects |
| title | ❌ | string | Title displayed above heatmap |
| rowLabel | ❌ | string | Label for row axis (e.g., "IP Address") |
| colLabel | ❌ | string | Label for column axis (e.g., "Hour") |
| valueLabel | ❌ | string | Label for cell values (e.g., "Events") |
| colorScale | ❌ | string | green-red, blue-red, or blue-yellow |
| enrichment | ❌ | array | IP enrichment data for click-to-expand panels |
Data Schema
{
"data": [
{"row": "192.168.1.1", "column": "10:00", "value": 45},
{"row": "192.168.1.1", "column": "11:00", "value": 62},
{"row": "10.0.0.5", "column": "10:00", "value": 128}
]
}
Enrichment Schema (Optional)
{
"enrichment": [
{
"ip": "80.94.95.83",
"city": "Timișoara",
"country": "RO",
"org": "AS204428 SS-Net",
"is_vpn": false,
"abuse_confidence_score": 100,
"total_reports": 975,
"last_reported": "2026-01-29",
"threat_categories": ["RDP Brute-Force", "Hacking", "Port Scan"]
}
]
}
KQL Query Patterns
All queries must return row, column, value columns.
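One way to enforce this requirement before calling the heatmap tool is a small validation pass over the query results. This is a sketch under the assumption that results arrive as a list of dictionaries; `to_heatmap_data` is a hypothetical helper, not part of the MCP app.

```python
def to_heatmap_data(rows):
    """Validate query results and normalise them to {row, column, value} objects."""
    required = ("row", "column", "value")
    data = []
    for index, result in enumerate(rows):
        missing = [key for key in required if key not in result]
        if missing:
            raise ValueError(f"result {index} is missing columns: {missing}")
        value = result["value"]
        if not isinstance(value, (int, float)):
            value = float(value)  # counts sometimes arrive as strings
        data.append({
            "row": str(result["row"]),
            "column": str(result["column"]),
            "value": value,
        })
    return data
```

Failing fast on a missing column is easier to diagnose than a heatmap that renders empty.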
Pattern 1: Activity by Entity and Hour
<Table>
| where TimeGenerated between (datetime(<start>) .. datetime(<end>))
| summarize value = count()
by row = <entity_field>,
column = format_datetime(bin(TimeGenerated, 1h), "HH:mm")
| project row, column, value
| order by column asc
Pattern 2: Activity by Entity and Day
<Table>
| where TimeGenerated > ago(30d)
| summarize value = count()
by row = <entity_field>,
column = format_datetime(bin(TimeGenerated, 1d), "yyyy-MM-dd")
| project row, column, value
| order by column asc
Pattern 3: Cross-Tabulation (Two Dimensions)
<Table>
| where TimeGenerated > ago(7d)
| summarize value = count()
by row = <dimension1>,
column = <dimension2>
| project row, column, value
| order by value desc
Scenario-Specific KQL Queries
Scenario: Sign-In Activity by Application and Hour
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType == 0 // Successful sign-ins
| summarize value = count()
by row = AppDisplayName,
column = format_datetime(bin(TimeGenerated, 1h), "HH:mm")
| project row, column, value
| order by column asc
Recommended: colorScale: "green-red" (activity = good)
Scenario: Failed Sign-Ins by IP and Hour
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType != 0 // Failed sign-ins
| summarize value = count()
by row = IPAddress,
column = format_datetime(bin(TimeGenerated, 1h), "HH:mm")
| project row, column, value
| order by column asc, value desc
| take 500 // Limit to top patterns
Recommended: colorScale: "blue-red" (failures = threat)
Scenario: Honeypot Attack Patterns (SecurityEvent)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let honeypot = '<HONEYPOT_NAME>';
SecurityEvent
| where TimeGenerated between (start .. end)
| where Computer contains honeypot
| where EventID in (4625, 4771, 4776) // Failed auth events
| where isnotempty(IpAddress) and IpAddress != "-" and IpAddress != "127.0.0.1"
| summarize value = count()
by row = IpAddress,
column = format_datetime(bin(TimeGenerated, 1h), "HH:mm")
| project row, column, value
| order by column asc, value desc
Recommended: colorScale: "blue-red" (attacks = threat)
Scenario: Web Attack Patterns (W3CIISLog)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
W3CIISLog
| where TimeGenerated between (start .. end)
| where tolong(scStatus) >= 400 // HTTP errors
| where cIP != "127.0.0.1"
| summarize value = count()
by row = cIP,
column = format_datetime(bin(TimeGenerated, 1h), "HH:mm")
| project row, column, value
| order by column asc, value desc
| take 300
Recommended: colorScale: "blue-red"
Scenario: Defender Alerts by Severity and Day
SecurityAlert
| where TimeGenerated > ago(30d)
| summarize value = count()
by row = AlertSeverity,
column = format_datetime(bin(TimeGenerated, 1d), "yyyy-MM-dd")
| project row, column, value
| order by column asc
Recommended: colorScale: "blue-yellow" (neutral overview)
Scenario: User Activity by Application
SigninLogs
| where TimeGenerated > ago(7d)
| where UserPrincipalName =~ '<UPN>'
| summarize value = count()
by row = AppDisplayName,
column = format_datetime(bin(TimeGenerated, 1d), "MM-dd")
| project row, column, value
| order by column asc
Recommended: colorScale: "green-red"
Scenario: Multi-Source Combined Heatmap
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let honeypot = '<HONEYPOT_NAME>';
union
(SecurityEvent
| where TimeGenerated between (start .. end)
| where Computer contains honeypot
| where EventID in (4625, 4771, 4776)
| where isnotempty(IpAddress) and IpAddress != "-"
| extend Source = "RDP/SMB", IP = IpAddress),
(W3CIISLog
| where TimeGenerated between (start .. end)
| where Computer contains honeypot
| where tolong(scStatus) >= 400
| extend Source = "IIS", IP = cIP),
(DeviceNetworkEvents
| where TimeGenerated between (start .. end)
| where DeviceName contains honeypot
| where ActionType in ("ConnectionSuccess", "InboundConnectionAccepted")
| extend Source = "Network", IP = RemoteIP)
| where IP != "127.0.0.1" and IP != "::1"
| summarize value = count()
by row = strcat(IP, " (", Source, ")"),
column = format_datetime(bin(TimeGenerated, 1h), "HH:mm")
| project row, column, value
| order by column asc, value desc
Enrichment Integration
Adding Threat Intel Drill-Down
When displaying IP-based heatmaps, add enrichment data for click-to-expand threat panels:
Step 1: Extract unique IPs from your query results
Step 2: Enrich IPs using the enrichment script:
python enrich_ips.py 80.94.95.83 193.142.147.209 101.36.107.228
Step 3: Transform enrichment output to heatmap format:
enrichment_out = []
for e in enrichment_data:
    # Collect threat categories from the five most recent comments
    comments = e.get('recent_comments') or []
    threat_cats = []
    for c in comments[:5]:
        threat_cats.extend(c.get('categories', []))
    enrichment_out.append({
        'ip': e['ip'],
        'city': e.get('city', 'Unknown'),
        'country': e.get('country', '??'),
        'org': e.get('org', 'Unknown'),
        'is_vpn': e.get('is_vpn') or e.get('vpnapi_security_vpn', False),
        'abuse_confidence_score': e.get('abuse_confidence_score', 0),
        'total_reports': e.get('total_reports', 0),
        # Guard against an empty comment list, which would raise IndexError on [0]
        'last_reported': comments[0].get('date', '')[:10] if comments else '',
        'threat_categories': list(set(threat_cats))[:5]
    })
Step 4: Include in heatmap call:
mcp_sentinel-heat_show-signin-heatmap({
"data": [...],
"enrichment": [<enrichment_out>],
...
})
Interactive Features with Enrichment
When enrichment is provided:
- Click any IP row → Opens threat intel panel showing:
- 📍 Location (city, country)
- 🏢 Organization/ISP
- 🏷️ VPN/Proxy/Tor badges
- 📊 AbuseIPDB confidence meter (0-100)
- 📈 Total reports count
- 🔴 Threat category tags
- Hover any cell → Tooltip with row, column, exact value
Color Scale Guide
| Scale | Low Value | High Value | Best For |
|---|---|---|---|
| green-red | Teal/Blue | Green | Positive activity (sign-ins, successful ops) |
| blue-red | Blue | Red | Threats/failures (attacks, errors, risks) |
| blue-yellow | Blue | Yellow | Neutral data (general distributions) |
Decision Tree
Is the data about threats/failures/attacks?
→ YES: Use "blue-red" (red = danger)
→ NO: Is high volume a positive indicator?
→ YES: Use "green-red" (green = success)
→ NO: Use "blue-yellow" (neutral)
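The decision tree maps directly onto a two-question function. The sketch below uses a hypothetical `choose_color_scale` helper; only the three scale names come from the tool reference.

```python
def choose_color_scale(is_threat_data, high_volume_is_positive=False):
    """Pick a colorScale value by following the decision tree."""
    if is_threat_data:
        return "blue-red"  # red = danger
    if high_volume_is_positive:
        return "green-red"  # green = success
    return "blue-yellow"  # neutral distribution
```

For example, failed sign-ins by IP would call `choose_color_scale(True)`, while successful sign-ins by application would call `choose_color_scale(False, high_volume_is_positive=True)`.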
Complete Examples
Example 1: Honeypot Attack Heatmap with Enrichment
# Query attack data
mcp_sentinel-data_query_lake({
"query": "SecurityEvent | where TimeGenerated between (datetime(<START_DATE>) .. datetime(<END_DATE>)) | where Computer contains '<HONEYPOT_SERVER>' | where EventID == 4625 | where IpAddress != '127.0.0.1' | summarize value = count() by row = IpAddress, column = format_datetime(bin(TimeGenerated, 1h), 'HH:mm') | project row, column, value | order by column asc, value desc | take 200"
})
# Enrich top IPs
python enrich_ips.py 80.94.95.83 193.142.147.209 101.36.107.228
# Display heatmap
mcp_sentinel-heat_show-signin-heatmap({
"data": [
{"row": "80.94.95.83", "column": "19:00", "value": 636},
{"row": "193.142.147.209", "column": "20:00", "value": 245},
...
],
"title": "Honeypot Attack Analysis - Click IP for Threat Intel",
"rowLabel": "Attacker IP",
"colLabel": "Hour (UTC)",
"valueLabel": "Failed Auth Attempts",
"colorScale": "blue-red",
"enrichment": [
{"ip": "80.94.95.83", "city": "Timișoara", "country": "RO", "org": "AS204428 SS-Net", "is_vpn": false, "abuse_confidence_score": 100, "total_reports": 975, "threat_categories": ["RDP Brute-Force", "Hacking"]},
{"ip": "193.142.147.209", "city": "Amsterdam", "country": "NL", "org": "AS213438 ColocaTel Inc.", "is_vpn": true, "abuse_confidence_score": 100, "total_reports": 30972, "threat_categories": ["SSH Brute-Force", "Port Scan"]}
]
})
Example 2: Sign-In Activity Overview
# Query sign-in data
mcp_sentinel-data_query_lake({
"query": "SigninLogs | where TimeGenerated > ago(24h) | where ResultType == 0 | summarize value = count() by row = AppDisplayName, column = format_datetime(bin(TimeGenerated, 1h), 'HH:mm') | project row, column, value | order by column asc"
})
# Display heatmap (no enrichment needed - not IP-based)
mcp_sentinel-heat_show-signin-heatmap({
"data": [
{"row": "Microsoft Teams", "column": "09:00", "value": 145},
{"row": "Outlook", "column": "09:00", "value": 312},
...
],
"title": "Sign-In Activity by Application (Last 24h)",
"rowLabel": "Application",
"colLabel": "Hour (UTC)",
"valueLabel": "Sign-ins",
"colorScale": "green-red"
})
Known Pitfalls
Column Sorting Is Lexicographic
Problem: The heatmap MCP app sorts columns alphabetically. Labels like Nov 10, Dec 01, Jan 05, Feb 02 will render as Dec → Feb → Jan → Nov — completely out of chronological order.
Solution: Always use ISO date format (YYYY-MM-DD) for time-based column labels. 2025-11-10, 2025-12-01, 2026-01-05 sorts correctly both alphabetically and chronologically.
// ✅ CORRECT — sortable column labels
| summarize value = count() by row = ..., column = format_datetime(bin(TimeGenerated, 7d), "yyyy-MM-dd")
// ❌ WRONG — alphabetic sort breaks chronological order
| summarize value = count() by row = ..., column = format_datetime(bin(TimeGenerated, 7d), "MMM dd")
For hourly heatmaps within a single day, HH:mm is fine (00:00–23:00 sorts correctly). The issue only affects multi-day/week/month labels.
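The sorting problem is easy to reproduce outside the app. The sketch below applies plain string sorting to the two label styles; the ISO year for the "Feb 02" label is an assumption made to complete the sequence.

```python
# The same four weeks, listed in chronological order in both label styles.
labels_mmm = ["Nov 10", "Dec 01", "Jan 05", "Feb 02"]
labels_iso = ["2025-11-10", "2025-12-01", "2026-01-05", "2026-02-02"]

# Lexicographic sorting, as the heatmap applies to column labels.
sorted_mmm = sorted(labels_mmm)
sorted_iso = sorted(labels_iso)

print(sorted_mmm)                # ['Dec 01', 'Feb 02', 'Jan 05', 'Nov 10']
print(sorted_iso == labels_iso)  # True: ISO labels keep chronological order
```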
When to Use Heatmaps
✅ Good Use Cases:
- Attack patterns over time (by hour/day)
- Comparing activity across entities (IPs, apps, users)
- Identifying peak activity periods
- Spotting anomalies in regular patterns
- Executive-friendly threat visualization
❌ Skip Heatmaps When:
- Fewer than 5 unique rows or columns (too sparse)
- Single-dimension data (use bar chart instead)
- Geographic data (use geomap skill instead)
- Real-time streaming data (heatmaps are for aggregated snapshots)
Last Updated: January 29, 2026
.github/skills/honeypot-investigation/SKILL.md
npx skills add SCStelz/security-investigator --skill honeypot-investigation -g -y
SKILL.md
Frontmatter
{
"name": "honeypot-investigation",
"description": "Use this skill when asked to analyze, investigate, or report on honeypot server security. Triggers on keywords like \"honeypot investigation\", \"analyze honeypot\", \"honeypot security\", \"honeypot report\", or when a server name is mentioned with honeypot analysis context. This skill provides comprehensive security analysis including attack patterns, threat intelligence correlation, IP enrichment, vulnerability assessment, and executive report generation.",
"drill_down_prompt": "Investigate honeypot {entity} — attack patterns, threat intel, vulnerability assessment",
"threat_pulse_domains": [
"endpoint",
"exposure"
]
}
Honeypot Investigation Agent - Instructions
Purpose
This agent performs comprehensive security analysis on honeypot servers to assess attack patterns, threat intelligence, vulnerabilities, and defensive effectiveness. Honeypots are decoy systems designed to attract attackers and provide early warning of emerging threats.
📑 TABLE OF CONTENTS
- Critical Workflow Rules - Start here!
- Investigation Parameters - Input requirements
- Execution Workflow - Complete process with time tracking
- KQL Query Library - Validated query patterns
- Report Template - Executive markdown structure
- Error Handling - Troubleshooting guide
- Visualization Options - Heatmap and Geomap skills
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
Before starting ANY honeypot investigation:
- ALWAYS calculate date ranges correctly (use current date from context)
- ALWAYS track and report time after each major step (mandatory per main instructions)
- ALWAYS run independent queries in parallel (drastically faster execution)
- ALWAYS save intermediate results to temp/ (enables debugging and auditing)
- ALWAYS use create_file for reports (NEVER use PowerShell terminal commands)
Date Range Rules (from main copilot-instructions):
- Real-time/recent searches: Add +2 days to current date for end range
- Example: Current date = Dec 12, 2025; Last 48 hours = datetime(2025-12-10) to datetime(2025-12-14)
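The date range rule can be expressed as a short calculation. This is a sketch with a hypothetical `investigation_window` helper; it works in whole days, so a 48-hour lookback is passed as 2 days.

```python
from datetime import date, timedelta


def investigation_window(current, lookback_days, end_padding_days=2):
    """Return (start, end) KQL datetime literals for an investigation."""
    start = current - timedelta(days=lookback_days)
    # The end of the range is padded by +2 days per the date range rules.
    end = current + timedelta(days=end_padding_days)
    return f"datetime({start.isoformat()})", f"datetime({end.isoformat()})"


print(investigation_window(date(2025, 12, 12), lookback_days=2))
# ('datetime(2025-12-10)', 'datetime(2025-12-14)')
```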
Investigation Parameters
Required Inputs
| Parameter | Description | Example |
|---|---|---|
| Honeypot Name | Server/device name | honeypot-server |
| Time Range | Investigation period | last 48 hours, last 7 days |
Automatic Derivations
- Start Date: Current date - time range
- End Date: Current date + 2 days (per date range rules)
- Output File: reports/honeypot/Honeypot_Report_<hostname>_<timestamp>.md
- Temp Files: temp/honeypot_ips_<timestamp>.json, temp/honeypot_data_<timestamp>.json
Execution Workflow
🚨 MANDATORY: Time Tracking Pattern
YOU MUST TRACK AND REPORT TIME AFTER EVERY MAJOR STEP:
[MM:SS] ✓ Step description (XX seconds)
Required Reporting Points:
- After Phase 1 (failed connection queries)
- After Phase 2 (IP enrichment + threat intel)
- After Phase 3 (incident filtering)
- After Phase 4 (vulnerability scan)
- After Phase 5 (report generation)
- Final: Total elapsed time
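A status line in this pattern can be produced with a small formatter. The sketch below assumes that [MM:SS] is the cumulative elapsed time since the investigation started and that the parenthesised figure is the duration of the step just finished; the source does not spell this out, and `format_step` is a hypothetical helper.

```python
def format_step(total_elapsed_s, step_s, description):
    """Format a progress line as '[MM:SS] ✓ description (XX seconds)'."""
    minutes, seconds = divmod(int(total_elapsed_s), 60)
    return f"[{minutes:02d}:{seconds:02d}] ✓ {description} ({int(step_s)} seconds)"


print(format_step(95, 42, "Failed connection queries completed"))
# [01:35] ✓ Failed connection queries completed (42 seconds)
```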
Phase 1: Query Failed Connections (PARALLEL)
Execute ALL THREE queries in parallel using mcp_sentinel-data_query_lake:
Query 1A: SecurityEvent (Windows Security Logs)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let honeypot = '<HONEYPOT_NAME>';
SecurityEvent
| where TimeGenerated between (start .. end)
| where Computer contains honeypot // Use 'contains' for flexible hostname matching
| where EventID in (4625, 4771, 4776) // Failed logon attempts
| where isnotempty(IpAddress) and IpAddress != "-" // IpAddress is built-in field
| where IpAddress != "127.0.0.1" // Exclude localhost (internal honeypot traffic)
| summarize
FailedAttempts=count(),
FirstSeen=min(TimeGenerated),
LastSeen=max(TimeGenerated),
TargetAccounts=make_set(Account, 10)
by IpAddress, EventID
| extend EventType = case(
EventID == 4625, "Failed Logon",
EventID == 4771, "Kerberos Pre-Auth Failed",
EventID == 4776, "NTLM Auth Failed",
"Unknown")
| order by FailedAttempts desc
| take 50
Query 1B: W3CIISLog (IIS Web Server Logs)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let honeypot = '<HONEYPOT_NAME>';
W3CIISLog
| where TimeGenerated between (start .. end)
| where Computer =~ honeypot
| where tolong(scStatus) >= 400 // HTTP errors (4xx/5xx) - scStatus is string type
| where cIP != "127.0.0.1" and cIP != "::1" // Exclude localhost (internal honeypot traffic)
| summarize
RequestCount=count(),
FirstSeen=min(TimeGenerated),
LastSeen=max(TimeGenerated),
TargetedURIs=make_set(csUriStem, 10),
StatusCodes=make_set(tolong(scStatus), 5) // Convert to long for proper aggregation
by IpAddress = cIP
| order by RequestCount desc
| take 50
Query 1C: DeviceNetworkEvents (Defender Network Traffic - INBOUND ONLY)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let honeypot = '<HONEYPOT_NAME>';
DeviceNetworkEvents
| where TimeGenerated between (start .. end)
| where DeviceName =~ honeypot
| where ActionType in ("ConnectionSuccess", "InboundConnectionAccepted", "ConnectionFound") // Successful inbound TCP connections
| where LocalPort in (3389, 80, 443, 445, 22, 21, 23, 8080, 8443) // Filter by attacked services (LocalPort = honeypot's listening port)
| where RemoteIP != "127.0.0.1" and RemoteIP != "::1" and RemoteIP != "::ffff:127.0.0.1" // Exclude localhost
| where not(coalesce(ipv4_is_private(RemoteIP), false)) // Exclude RFC1918 private IPs (10/8, 172.16/12, 192.168/16); non-IPv4 values fall through to the next filter
| where RemoteIP !startswith "fe80:" and RemoteIP !startswith "fc" and RemoteIP !startswith "fd" // Exclude IPv6 link-local and ULA (fc00::/7 covers both fc.. and fd.. prefixes)
| where RemoteIP !startswith "::ffff:" // Filter out IPv6-mapped IPv4 addresses (reduces duplicate noise)
| summarize
ConnectionCount=count(),
FirstSeen=min(TimeGenerated),
LastSeen=max(TimeGenerated),
TargetedPorts=make_set(LocalPort, 10), // LocalPort = attacked services on honeypot
Actions=make_set(ActionType, 5)
by RemoteIP // RemoteIP = attacker source
| order by ConnectionCount desc
| take 50
IMPORTANT: This query shows TCP connection establishment (network layer), NOT successful authentication. Attackers who appear here may still fail at the authentication layer (SecurityEvent 4625). For honeypots, all inbound connections should be treated as reconnaissance/attack attempts.
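The localhost and private-range exclusions in Query 1C can be sanity-checked offline before trusting the result set. The following is a minimal sketch using Python's standard ipaddress module; the function name is_external_attacker_ip is hypothetical, and is_private is somewhat broader than RFC1918 (it also covers ranges such as documentation and link-local blocks), so treat it as an approximation of the KQL filters rather than an exact mirror.

```python
import ipaddress


def is_external_attacker_ip(value):
    """Return True only for addresses the Query 1C exclusions are meant to keep."""
    try:
        ip = ipaddress.ip_address(value)
    except ValueError:
        return False  # not an IP literal at all
    if ip.version == 6 and ip.ipv4_mapped is not None:
        return False  # ::ffff:a.b.c.d rows duplicate the plain IPv4 rows
    # Loopback, private (including all of 172.16.0.0/12) and link-local are internal noise
    return not (ip.is_loopback or ip.is_private or ip.is_link_local)
```

Note that 172.16.0.0/12 spans 172.16.x.x through 172.31.x.x, so a prefix test on "172.16." alone would let addresses such as 172.20.5.9 through.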
After Phase 1 completes:
- Merge all three result sets
- Rank IPs by attack volume (prioritize SecurityEvent FailedAttempts, then W3CIISLog RequestCount, then DeviceNetworkEvents ConnectionCount)
- Select top 10-15 IPs for enrichment (focus on high-volume attackers, not one-off scanners)
- Extract unique IP addresses into array
- Save prioritized IPs only to temp/honeypot_ips_<timestamp>.json in format: {"ips": ["1.2.3.4", "5.6.7.8", ...]}
- Document total unique attacker count separately for report statistics
- Report elapsed time: [MM:SS] ✓ Failed connection queries completed (XX seconds) - [total_count] unique IPs identified, top [enrichment_count] prioritized for enrichment
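The merge-and-rank steps above can be sketched as follows. This is an illustration only: the function names prioritize_ips and save_prioritized_ips are hypothetical, and the input rows are assumed to be lists of dicts keyed by the column names the three Phase 1 queries return (IpAddress/FailedAttempts, IpAddress/RequestCount, RemoteIP/ConnectionCount).

```python
import json
from pathlib import Path


def prioritize_ips(security_events, iis_logs, network_events, top_n=15):
    """Merge per-source rows and rank IPs by the Phase 1 priority order."""
    volumes = {}  # ip -> [failed_attempts, request_count, connection_count]

    def add(rows, ip_field, count_field, slot):
        for row in rows:
            counts = volumes.setdefault(row[ip_field], [0, 0, 0])
            counts[slot] += row[count_field]  # Query 1A yields one row per IP and EventID

    add(security_events, "IpAddress", "FailedAttempts", 0)
    add(iis_logs, "IpAddress", "RequestCount", 1)
    add(network_events, "RemoteIP", "ConnectionCount", 2)

    # Tuple comparison ranks by FailedAttempts first, then RequestCount, then ConnectionCount
    ranked = sorted(volumes, key=lambda ip: tuple(volumes[ip]), reverse=True)
    return ranked[:top_n], len(volumes)


def save_prioritized_ips(ips, path):
    """Write the prioritized list in the {"ips": [...]} format the enrichment step reads."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"ips": ips}, indent=2))
```

The second return value of prioritize_ips is the total unique attacker count, which is kept separate from the prioritized list for report statistics.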
Phase 2: IP Enrichment & Threat Intelligence (PARALLEL)
Execute IP enrichment script AND Sentinel threat intel query in parallel:
2A: Run IP Enrichment Script
# Read prioritized IPs from JSON file (top 10-15 by attack volume)
# This reduces token consumption by ~80% while maintaining critical intelligence
$env:PYTHONPATH = "<WORKSPACE_ROOT>"
cd "<WORKSPACE_ROOT>"
.\.venv\Scripts\python.exe enrich_ips.py --file temp/honeypot_ips_<timestamp>.json
Enrichment provides (for prioritized IPs only):
- Geolocation (city, region, country)
- ISP/Organization (ASN, org name)
- VPN/Proxy/Tor detection (is_vpn, is_proxy, is_tor)
- Abuse reputation (abuse_confidence_score, total_reports)
- Shodan intelligence: open ports, CVEs, tags (e.g., eol-os, self-signed, c2), CPEs, hostnames
- Risk level assessment (HIGH/MEDIUM/LOW)
Note: Enrichment script provides aggregated statistics for all IPs - use these summary stats in report narrative instead of listing every IP
2B: Query Sentinel Threat Intelligence
let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>", ...]); // From Phase 1 prioritized list (top 10-15 IPs)
ThreatIntelIndicators
| extend IndicatorType = replace_string(replace_string(replace_string(tostring(split(ObservableKey, ":", 0)), "[", ""), "]", ""), "\"", "")
| where IndicatorType in ("ipv4-addr", "ipv6-addr", "network-traffic")
| extend NetworkSourceIP = toupper(ObservableValue)
| where NetworkSourceIP in (target_ips)
| where IsActive and (ValidUntil > now() or isempty(ValidUntil))
| extend Description = tostring(parse_json(Data).description)
| where Description !contains_cs "State: inactive;" and Description !contains_cs "State: falsepos;"
| extend TrafficLightProtocolLevel = tostring(parse_json(AdditionalFields).TLPLevel)
| extend ActivityGroupNames = extract(@"ActivityGroup:(\S+)", 1, tostring(parse_json(Data).labels))
| summarize arg_max(TimeGenerated, *) by NetworkSourceIP
| project
TimeGenerated,
IPAddress = NetworkSourceIP,
ThreatDescription = Description,
ActivityGroupNames,
Confidence,
ValidUntil,
TrafficLightProtocolLevel,
IsActive
| order by Confidence desc, TimeGenerated desc
After Phase 2 completes:
- Merge IP enrichment JSON with Sentinel threat intel results
- Save combined data to temp/honeypot_data_<timestamp>.json
- Report elapsed time: [MM:SS] ✓ IP enrichment completed (XX seconds)
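The merge step might look like the sketch below. The shape of the enrichment output is an assumption (a list of dicts with an "ip" key; the real enrich_ips.py output may differ), and the function name is hypothetical. The threat intel rows are assumed to carry the IPAddress column projected by the Phase 2B query.

```python
def merge_enrichment_with_threat_intel(enrichment, threat_intel_rows):
    """Attach the matching Sentinel threat intel row (or None) to each enriched IP."""
    # The 2B query upper-cases ObservableValue, so compare on upper-cased keys
    ti_by_ip = {row["IPAddress"].upper(): row for row in threat_intel_rows}
    merged = []
    for record in enrichment:
        combined = dict(record)  # copy so the caller's data is left untouched
        combined["threat_intel"] = ti_by_ip.get(record["ip"].upper())  # None = no match
        merged.append(combined)
    return merged
```

Keeping unmatched IPs with threat_intel set to None preserves the full prioritized list, so the report can state how many enriched IPs had no threat intel hit.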
Phase 3: Query Security Incidents (Sentinel KQL)
Step 3A: Get Device ID from Sentinel
let honeypot = '<HONEYPOT_NAME>';
DeviceInfo
| where TimeGenerated > ago(30d)
| where DeviceName =~ honeypot or DeviceName contains honeypot
| summarize arg_max(TimeGenerated, *)
| project DeviceId, DeviceName, OSPlatform, OSVersion, PublicIP
Extract DeviceId (GUID) from result - returns single most recent device record.
Step 3B: Query Security Incidents
let targetDevice = "<HONEYPOT_NAME>";
let targetDeviceId = "<DEVICE_ID>"; // REQUIRED: Get from DeviceInfo query (Step 3A)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let relevantAlerts = SecurityAlert
| where TimeGenerated between (start .. end)
| where Entities has targetDevice or Entities has targetDeviceId
| summarize arg_max(TimeGenerated, *) by SystemAlertId
| project SystemAlertId, AlertName, AlertSeverity, ProviderName, Tactics;
SecurityIncident
| where CreatedTime between (start .. end) // Filter on CreatedTime for incidents created in range
| summarize arg_max(TimeGenerated, *) by ProviderIncidentId // Get most recent state per ProviderIncidentId
| project ProviderIncidentId, Title, Severity, Status, Classification, CreatedTime, LastModifiedTime, Owner, AdditionalData, AlertIds, Labels
| where not(tostring(Labels) has "Redirected") // Exclude merged incidents
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| join kind=inner relevantAlerts on $left.AlertId == $right.SystemAlertId
| extend ProviderIncidentUrl = tostring(AdditionalData.providerIncidentUrl)
| extend OwnerUPN = tostring(Owner.userPrincipalName)
| extend LastModifiedTime = todatetime(LastModifiedTime)
| summarize
Title = any(Title),
Severity = any(Severity),
Status = any(Status),
Classification = any(Classification),
CreatedTime = any(CreatedTime),
LastModifiedTime = any(LastModifiedTime),
OwnerUPN = any(OwnerUPN),
ProviderIncidentUrl = any(ProviderIncidentUrl),
AlertCount = count(),
MitreTactics = make_set(Tactics)
by ProviderIncidentId
| order by LastModifiedTime desc
| take 10
IMPORTANT:
- This query joins SecurityIncident with SecurityAlert to provide full incident context
- Deduplication: The final summarize statement collapses multiple alerts per incident into a single row (groups by ProviderIncidentId)
- Filter on CreatedTime to find incidents created in the investigation period
- Use arg_max(TimeGenerated, *) by ProviderIncidentId to get the most recent update for each incident (includes status changes, comments, etc.)
- Returns up to 10 unique incidents (grouped by ProviderIncidentId to ensure one row per external incident ID)
- ⚠️ CHECK STATUS FIELD: Only report incidents with Status="New" or "Active" as threats. Status="Closed" + Classification="BenignPositive" = expected honeypot activity (do not flag as threat)
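The status check can be expressed as a small triage helper. This is a sketch: the function name and return labels are hypothetical, and the "review" bucket for closed incidents with any other classification is an assumption, since the rule above only defines the New/Active and Closed + BenignPositive cases.

```python
def classify_incident(status, classification=None):
    """Map an incident's Status/Classification to a reporting bucket."""
    if status in ("New", "Active"):
        return "threat"  # open incidents are reported as threats
    if status == "Closed" and classification == "BenignPositive":
        return "expected-honeypot-activity"  # do not flag as a threat
    return "review"  # assumption: anything else is left to an analyst's judgment
```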
After Phase 3 completes:
- Report elapsed time: [MM:SS] ✓ Security incidents query completed (XX seconds)
Phase 4: Vulnerability Assessment
⚠️ CRITICAL: TVM tables are snapshot tables — NO time filtering!
- DeviceTvmSoftwareVulnerabilities has NO Timestamp or TimeGenerated column
- Do NOT add where Timestamp between (...): it will fail with a schema error
- Do NOT use Sentinel Data Lake (query_lake): TVM tables are only available via Advanced Hunting
- Use RunAdvancedHuntingQuery MCP tool only
Step 4A: Query Vulnerabilities via Advanced Hunting KQL
let deviceName = '<HONEYPOT_NAME>';
DeviceTvmSoftwareVulnerabilities
| where DeviceName startswith deviceName
| project
CveId,
VulnerabilitySeverityLevel,
SoftwareVendor,
SoftwareName,
SoftwareVersion,
RecommendedSecurityUpdate,
RecommendedSecurityUpdateId
| summarize by CveId, VulnerabilitySeverityLevel, SoftwareVendor, SoftwareName, SoftwareVersion, RecommendedSecurityUpdate, RecommendedSecurityUpdateId
| order by case(VulnerabilitySeverityLevel == "Critical", 1, VulnerabilitySeverityLevel == "High", 2, VulnerabilitySeverityLevel == "Medium", 3, 4) asc
| take 30
Key columns returned:
- CveId: CVE identifier (e.g., CVE-2025-15467)
- VulnerabilitySeverityLevel: string (Critical / High / Medium / Low)
- SoftwareVendor, SoftwareName, SoftwareVersion: affected software details
- RecommendedSecurityUpdate: patch info (may be empty)
🔴 PROHIBITED:
- ❌ Adding Timestamp or TimeGenerated filters (column does not exist)
- ❌ Projecting CvssScore (column does not exist; use VulnerabilitySeverityLevel instead)
- ❌ Using Sentinel Data Lake MCP (query_lake) for TVM tables
- ❌ Using GetDefenderMachineVulnerabilities API (requires separate machine ID lookup, less reliable)
After Phase 4 completes:
- Report elapsed time: [MM:SS] ✓ Vulnerability scan completed (XX seconds)
Phase 5: Generate Executive Report
Use the Report Template (see section below) to create markdown report.
Critical Report Sections:
- Executive Summary - High-level findings (2-3 paragraphs)
- Attack Surface Analysis - Failed connections by IP, service, pattern
- Threat Intelligence Correlation - Known malicious IPs, APT groups, VPNs
- Security Incidents - Incidents triggered by honeypot activity
- Attack Pattern Analysis - Targeted services, credential attacks, web exploits
- Vulnerability Status - Current CVEs and exploitation risk
- Key Detection Insights - TTPs, MITRE ATT&CK mapping, novel indicators
- Honeypot Effectiveness - Metrics and recommendations
- Conclusion - Summary and next steps
Report Generation:
- Populate template with data from Phases 1-4
- Use create_file to save: reports/honeypot/Honeypot_Report_<hostname>_<timestamp>.md
- Return absolute path to user
After Phase 5 completes:
- Report elapsed time: [MM:SS] ✓ Report generated (XX seconds)
- Provide comprehensive timeline breakdown with total elapsed time
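The per-phase status lines share one shape, which a small formatter can produce. This sketch assumes [MM:SS] is the cumulative elapsed time since the investigation started and (XX seconds) is the duration of the phase itself; the source does not state this explicitly, and the function name is hypothetical.

```python
def format_phase_line(label, phase_seconds, total_seconds, detail=""):
    """Build a '[MM:SS] ✓ <label> (XX seconds)' status line."""
    minutes, seconds = divmod(int(total_seconds), 60)  # assumed: cumulative elapsed time
    line = f"[{minutes:02d}:{seconds:02d}] ✓ {label} ({int(phase_seconds)} seconds)"
    return f"{line} - {detail}" if detail else line
```

Collecting the returned lines in a list as each phase finishes gives the timeline breakdown for the end of the report.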
KQL Query Library
Additional Useful Queries
Query: Top Targeted User Accounts (Credential Attacks)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let honeypot = '<HONEYPOT_NAME>';
SecurityEvent
| where TimeGenerated between (start .. end)
| where Computer =~ honeypot
| where EventID == 4625 // Failed logon
| summarize FailedAttempts = count() by Account
| order by FailedAttempts desc
| take 20
Query: Web Exploitation Patterns (SQL Injection, XSS, Path Traversal)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let honeypot = '<HONEYPOT_NAME>';
W3CIISLog
| where TimeGenerated between (start .. end)
| where Computer =~ honeypot
| where csUriStem has_any ("'", "union", "select", "script", "../", "..\\", "cmd.exe", "powershell")
| summarize
AttemptCount = count(),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated),
UniqueIPs = dcount(cIP)
by ExploitPattern = case(
csUriStem has_any ("'", "union", "select"), "SQL Injection",
csUriStem has "script", "XSS",
csUriStem has_any ("../", "..\\"), "Path Traversal",
csUriStem has_any ("cmd.exe", "powershell"), "Command Injection",
"Other")
| order by AttemptCount desc
Query: Port Scanning Detection
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let honeypot = '<HONEYPOT_NAME>';
DeviceNetworkEvents
| where TimeGenerated between (start .. end)
| where DeviceName =~ honeypot
| summarize
DistinctPorts = dcount(LocalPort), // LocalPort = honeypot's listening port; RemotePort is the attacker's ephemeral source port
PortsScanned = make_set(LocalPort),
EventCount = count()
by RemoteIP
| where DistinctPorts >= 5 // Threshold: 5+ ports = scan
| order by DistinctPorts desc
| take 20
Query: Brute Force Detection (High Volume from Single IP)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let honeypot = '<HONEYPOT_NAME>';
let threshold = 50; // 50+ failed attempts = brute force
SecurityEvent
| where TimeGenerated between (start .. end)
| where Computer =~ honeypot
| where EventID == 4625
| where isnotempty(IpAddress) and IpAddress != "-" // IpAddress is a built-in SecurityEvent column; no EventData parsing needed
| summarize FailedAttempts = count() by IpAddress
| where FailedAttempts >= threshold
| order by FailedAttempts desc
Report Template
Use this structure for executive reports:
# Honeypot Security Analysis - <HONEYPOT_NAME>
**Analysis Period:** <START_DATE> to <END_DATE> (<HOURS> hours)
**Report Generated:** <TIMESTAMP>
**Classification:** CONFIDENTIAL
---
## Executive Summary
[3 comprehensive paragraphs covering attack overview, threat landscape, and value delivered]
**Key Metrics:**
- **Total Attack Attempts:** [count]
- **Unique Attacking IPs:** [count]
- **Security Incidents Triggered:** [count]
- **Known Malicious IPs (Threat Intel):** [count] ([percentage]%)
- **Current Vulnerabilities:** [count] HIGH, [count] MEDIUM
---
## 1. Attack Surface Analysis
[Failed connections by source IP, geographic distribution, VPN/anonymization summary]
## 2. Threat Intelligence Correlation
[IPs matched in threat intel, highest confidence threats, MSTIC indicators]
## 3. Security Incidents
[Incidents involving honeypot with severity, status, classification, MITRE tactics]
## 4. Attack Pattern Analysis
[Targeted services, credential attacks, web exploitation, port scanning]
## 5. Honeypot Vulnerability Status
[CVE inventory, exploitation risk assessment, cross-reference with attacks]
## 6. Key Detection Insights
[MITRE ATT&CK mapping, novel indicators, threat actor attribution]
## 7. Honeypot Effectiveness
[Detection metrics, recommendations for optimization]
## 8. Conclusion
[Summary, key takeaways, immediate/short-term/long-term actions]
---
**Investigation Timeline:**
[Phase timing breakdown]
**Total Investigation Time:** [duration]
Error Handling
Common Issues and Solutions
| Issue | Solution |
|---|---|
| Missing honeypot in DeviceInfo table | Verify device name; check if device reports to Defender; try Computer field instead |
| No SecurityEvent logs | Device may not be sending Windows Security logs; verify log forwarding configuration |
| W3CIISLog table not found | IIS logging may not be enabled; query WebAccessLog or HTTP logs instead |
| IP enrichment script fails | Check ipinfo.io token in config.json; verify internet connectivity; check temp file exists |
| Date range returns no results | Verify date calculation (current date from context + proper offset); expand time range |
| KQL timeout | Reduce take limit; narrow time range; remove complex aggregations |
Validation Checklist
Before delivering report, verify:
- ✅ All Phase timestamps reported to user
- ✅ Total elapsed time calculated and displayed
- ✅ IP enrichment data merged with attack logs
- ✅ Incident filtering correctly applied (only honeypot-related incidents)
- ✅ Vulnerability data retrieved (or documented as unavailable)
- ✅ Report saved to correct path: reports/honeypot/Honeypot_Report_<hostname>_<timestamp>.md
- ✅ Absolute path returned to user
Integration with Main Copilot Instructions
This skill follows all patterns from the main copilot-instructions.md:
- Date range handling: Uses +2 day rule for real-time searches
- Parallel execution: Runs independent queries simultaneously
- Time tracking: Mandatory reporting after each phase
- Token management: Uses create_file for all output
- KQL best practices: Follows Sample KQL Query patterns
- IP enrichment: Uses documented enrich_ips.py utility
Example invocations:
- "Investigate the honeypot HONEYPOT-01 over the last 48 hours"
- "Run honeypot security analysis for honeypot-server-01 from Dec 10-12"
- "Generate honeypot report for [hostname] last 7 days"
Visualization Options
After completing the investigation, offer to visualize the attack data using the dedicated visualization skills:
Heatmap Visualization
Use the heatmap-visualization skill (.github/skills/heatmap-visualization/SKILL.md) to show attack patterns over time with threat intel drill-down.
When to offer:
- ✅ After completing honeypot investigation phases
- ✅ When user asks "show me the attack patterns" or "visualize the attacks"
- ✅ For comparing attack volumes across time periods
- ❌ Skip if investigation found minimal activity (<5 unique IPs)
Geomap Visualization
Use the geomap-visualization skill (.github/skills/geomap-visualization/SKILL.md) to show attack origins on a world map.
When to offer:
- ✅ After completing honeypot investigation phases
- ✅ When user asks "where are the attacks coming from?" or "show on a map"
- ✅ For geographic threat distribution analysis
- ❌ Skip if all IPs are from the same region
Note: W3CIISLog includes native RemoteIPLatitude and RemoteIPLongitude fields - use these directly for geomap visualization without additional enrichment.
Last Updated: January 29, 2026
.github/skills/kql-query-authoring/SKILL.md
npx skills add SCStelz/security-investigator --skill kql-query-authoring -g -y
SKILL.md
Frontmatter
{
"name": "kql-query-authoring",
"description": "Use this skill when asked to write, create, or help with KQL (Kusto Query Language) queries for Microsoft Sentinel, Defender XDR, or Azure Data Explorer. Triggers on keywords like \"write KQL\", \"create KQL query\", \"help with KQL\", \"query [table]\", \"KQL for [scenario]\", or when a user requests queries for specific data analysis scenarios. This skill uses schema validation, Microsoft Learn documentation, and community examples to generate production-ready KQL queries."
}
KQL Query Authoring - Instructions
Purpose
Generate validated, production-ready KQL queries by combining schema validation (331+ indexed tables), Microsoft Learn documentation, community examples, and performance best practices.
Prerequisites
Required MCP Servers:
- KQL Search MCP Server: Schema validation, query examples, table discovery
  - Install: npm install -g kql-search-mcp (npm)
- Microsoft Docs MCP Server: Official Microsoft Learn documentation and code samples
  - GitHub: MicrosoftDocs/mcp
Verification: Tools should be available as mcp_kql-search_* and mcp_microsoft-lea_*.
⚠️ Known Issues
search_favorite_repos Bug (v1.0.5)
❌ Broken — ERROR_TYPE_QUERY_PARSING_FATAL. Use mcp_kql-search_search_github_examples_fallback instead.
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
1. Validate table schema FIRST: Use mcp_kql-search_get_table_schema to verify that the table exists and to confirm column names and data types.
2. Check platform schema: Sentinel uses TimeGenerated; Defender XDR uses Timestamp. Microsoft Learn examples default to XDR syntax, so always convert before testing on Sentinel.
3. Check local query library FIRST: Use the discovery manifest (.github/manifests/discovery-manifest.yaml) for domain/MITRE lookups and grep_search for table-name/keyword lookups. See the KQL Pre-Flight Checklist in copilot-instructions.md for the full priority order.
4. Query file structure: NO placeholder TOC. When creating a new query file, do NOT add a ## Quick Reference — Query Index heading or placeholder. scripts/generate_tocs.py creates the heading and table itself. Pre-creating it confuses the strip-and-reinsert logic and produces duplicated content. See ## Creating Query Files below for full file structure rules.
5. Use multiple sources: Schema (authoritative column names) + Microsoft Learn (official patterns) + community queries (real-world examples).
6. Test using the correct execution tool: Follow the Tool Selection Rule in copilot-instructions.md:
   - Sentinel-native tables → Data Lake or AH
   - XDR tables ≤ 30d → Advanced Hunting (free); > 30d → Data Lake
   - XDR-only tables (DeviceTvm*, Exposure*) → Advanced Hunting only
   - Adapt timestamp column when switching tools
7. Test queries before presenting to user: Run with | take 5 via live execution. Use mcp_kql-search_validate_kql_query as fallback if live testing is unavailable.
8. Provide context: Explain what the query does, expected results, and any limitations.
9. Read the complete workflow below before starting.
📋 Inherited rules: This skill inherits the KQL Pre-Flight Checklist, Tool Selection Rule (Data Lake vs Advanced Hunting), and Known Table Pitfalls from copilot-instructions.md. Those rules are authoritative; do not contradict them here.
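The tool-selection bullets under the execution-tool rule reduce to a short decision function. This is a sketch of that summary only; the authoritative rule lives in copilot-instructions.md, and the function name and the is_xdr_table flag are hypothetical (the caller is assumed to know whether a table is XDR-native).

```python
from fnmatch import fnmatchcase

# XDR-only table families named in the rule summary
XDR_ONLY_PATTERNS = ("DeviceTvm*", "Exposure*")


def select_execution_tool(table, lookback_days, is_xdr_table):
    """Pick the execution tool for a table and lookback window."""
    if any(fnmatchcase(table, pattern) for pattern in XDR_ONLY_PATTERNS):
        return "Advanced Hunting"  # no Data Lake copy exists for these tables
    if is_xdr_table:
        return "Advanced Hunting" if lookback_days <= 30 else "Data Lake"
    return "Data Lake or Advanced Hunting"  # Sentinel-native tables work in either
```

Whichever tool is returned, the timestamp column still has to be adapted when a query moves between tools.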
Query Authoring Workflow
Step 1: Understand User Requirements
Extract key information:
- Table(s) needed: Which data source? (e.g., EntraIdSignInEvents, EmailEvents, SecurityAlert)
- Time range: How far back? (e.g., last 7 days, specific date range)
- Filters: What specific conditions? (e.g., user, IP, threat type)
- Output: Statistics, detailed records, time series, aggregations?
- Platform: Sentinel or Defender XDR? (affects column names)
- Deployment target: Custom detection rule? (see below)
Custom Detection Intent Detection:
If the user mentions "custom detection", "detection rule", "deploy as detection", "CD rule", "author detections for", or "deploy to Defender":
- Read the detection-authoring skill (.github/skills/detection-authoring/SKILL.md): Critical Rules and CD Metadata Contract sections
- Design queries with CD constraints: row-level output, mandatory columns (TimeGenerated, DeviceName, ReportId), no bare summarize
- Include cd-metadata blocks in the output file (see Step 8)
- Still write queries in Sentinel format (with let variables, 7d lookback); adaptation to CD format happens at deployment time via the detection-authoring skill
Step 2: Check Local Query Library
Search for existing verified queries before writing from scratch. Use two complementary methods:
- Manifest lookup (domain/MITRE): Read .github/manifests/discovery-manifest.yaml and match by domain tag (e.g., identity, endpoint, email) or MITRE technique ID (e.g., T1078, T1566). Best when you know the security domain or ATT&CK technique.
- Targeted grep_search (table/keyword): grep_search for the specific table name (e.g., CloudAppEvents, OfficeActivity) or operation keyword (e.g., New-InboxRule, SecretGet) scoped to queries/** and .github/skills/**. The manifest lacks table-name and keyword fields; grep fills this gap.
- Check the Ad-Hoc Query Examples appendix in copilot-instructions.md
When to use which: Domain/technique known → manifest first. Table name/operation known → grep first. Both can be used together — manifest for breadth, grep for precision.
If a suitable query is found, adapt it and skip to Step 6. These queries encode known pitfalls and schema quirks.
Step 3: Get Table Schema (MANDATORY)
mcp_kql-search_get_table_schema("<table_name>")
Returns: category, description, all columns with data types, and example queries. Use this to verify column names and understand data types.
Step 4: Get Official Code Samples
mcp_microsoft-lea_microsoft_code_sample_search(
query: "<table_name> <scenario description>",
language: "kusto"
)
Include table name + scenario in the query (e.g., "EmailEvents phishing detection").
Step 5: Get Community Examples
mcp_kql-search_search_github_examples_fallback(
table_name: "<table_name>",
description: "<goal description>"
)
Also available: mcp_kql-search_search_kql_repositories to find KQL-focused repos.
Step 6: Generate Query
Combine insights: schema for column names, Learn for patterns, community for techniques.
Standalone queries rule: When generating MULTIPLE separate queries, each must start directly with the table name — never use shared let variables across separate queries (they run independently). Use let variables only within a single complex query.
Step 7: Validate and Test (MANDATORY)
Test queries against live data before presenting to the user.
- Convert Timestamp → TimeGenerated if adapting MS Learn examples for Sentinel
- Test via mcp_sentinel-data_query_lake or RunAdvancedHuntingQuery with | take 5
- Verify results are sensible: check for empty results (wrong table/time/filters)
- Fix schema mismatches or syntax errors, re-test
- Remove test limits, present to user
Common errors:
| Error | Fix |
|---|---|
| Failed to resolve column 'Timestamp' | Use TimeGenerated (Sentinel) |
| Failed to resolve column 'TimeGenerated' | Use Timestamp (XDR AH) |
| Table not found | Verify with get_table_schema; try the other execution tool |
| expected string expression | Add tostring() after mv-expand or parse_json |
| Query timeout / too many results | Add datetime filter + take or summarize |
Fallback validation: mcp_kql-search_validate_kql_query("<query>") — syntax/schema check only, no live data.
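The timestamp-column conversion from the first step is mechanical enough to script. The sketch below uses a whole-word regex swap; both function names are hypothetical. It is deliberately naive: it also rewrites matches inside string literals and comments, and it should not be applied to Sentinel tables queried through Advanced Hunting, which keep TimeGenerated.

```python
import re


def to_sentinel_time_column(query):
    """Rewrite the XDR time column to the Sentinel one (whole-word matches only)."""
    return re.sub(r"\bTimestamp\b", "TimeGenerated", query)


def to_xdr_time_column(query):
    """Rewrite the Sentinel time column to the XDR one (whole-word matches only)."""
    return re.sub(r"\bTimeGenerated\b", "Timestamp", query)
```

The word boundaries keep column names that merely contain the word, such as a hypothetical EventTimestamp, from being rewritten.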
Step 8: Format and Deliver Output
Single query: Provide directly in chat with brief explanation and expected results.
Multiple queries (3+): Create a markdown file in queries/<subfolder>/ with the standardized metadata header. This header is mandatory — build_manifest.py parses it to index the file for discovery by threat-pulse and other skills.
File naming: queries/<subfolder>/<topic>.md — e.g., queries/email/email_threat_detection.md
Required metadata header template (first 10 lines of every query file):
# <Descriptive Title>
**Created:** YYYY-MM-DD
**Platform:** Microsoft Sentinel | Microsoft Defender XDR | Both
**Tables:** <comma-separated exact KQL table names>
**Keywords:** <comma-separated searchable terms — attack techniques, scenarios, field names>
**MITRE:** <comma-separated technique IDs, e.g., T1098.001, T1136.003, TA0008>
**Domains:** <comma-separated domain tags from the valid set below>
**Timeframe:** Last N days (configurable)
Valid domain tags: incidents, identity, spn, endpoint, email, admin, cloud, exposure
| Field | Purpose | Parsed By |
|---|---|---|
| Tables: | Exact KQL table names for grep_search discovery | build_manifest.py (full manifest) |
| Keywords: | Searchable terms for attack scenarios, operations, field names | build_manifest.py (full manifest) |
| MITRE: | ATT&CK technique/tactic IDs for cross-referencing | build_manifest.py (slim + full) |
| Domains: | Domain tags for threat-pulse cross-referencing | build_manifest.py (slim + full); missing = validation error |
After creating a new query file: Run python .github/manifests/build_manifest.py to regenerate the discovery manifest, then run python scripts/generate_tocs.py to auto-generate the Quick Reference TOC. The validator will flag any missing required fields.
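Before running the manifest build, a new query file's header can be pre-checked locally. The sketch below is not build_manifest.py and does not reproduce its parsing; it is a hypothetical stand-in that assumes every field in the header template is required (the source only confirms that a missing Domains field is a validation error) and that fields appear as **Name:** value lines within the first 10 lines.

```python
import re

# Assumption: all template fields are treated as required
REQUIRED_FIELDS = ("Created", "Platform", "Tables", "Keywords", "MITRE", "Domains", "Timeframe")
VALID_DOMAINS = {"incidents", "identity", "spn", "endpoint", "email", "admin", "cloud", "exposure"}


def check_query_header(text):
    """Return a list of problems found in the first 10 lines of a query file."""
    head = text.splitlines()[:10]
    problems = []
    if not head or not head[0].startswith("# "):
        problems.append("first line must be a '# <Title>' heading")
    fields = {}
    for line in head:
        match = re.match(r"\*\*(\w+):\*\*\s*(.*)", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    for name in REQUIRED_FIELDS:
        if not fields.get(name):
            problems.append(f"missing field: {name}")
    for tag in (t.strip() for t in fields.get("Domains", "").split(",")):
        if tag and tag not in VALID_DOMAINS:
            problems.append(f"invalid domain tag: {tag}")
    return problems
```

An empty list means the header passed these local checks; the real validator remains the final word.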
Subfolder selection: Place files in the subfolder matching the primary data source: identity/, endpoint/, email/, network/, cloud/.
Include per-query documentation with Purpose, Thresholds, Expected Results, and Tuning guidance.
Heading format for TOC compatibility: The generate_tocs.py script auto-generates a Quick Reference TOC by scanning ### and ## Query headings that have a KQL code block within 40 lines. To ensure clean TOC output:
- ✅ DO use ### Query N: <Title> or ## Query N: <Title> for query headings; the number prefix ensures proper TOC ordering
- ✅ DO add a ## heading (e.g., ## Queries, ## Part A:, ## Hunts) immediately before the first ### Query N: if the file has preamble content (Overview, Table Selection, etc.). The TOC generator uses a --- → ## heading pair as its insertion anchor; without it, the script inserts the TOC at the bottom of the file.
- ✅ DO start non-query section headings with a non-query keyword (e.g., ### Deployment, ### Tuning, ### References); these are automatically filtered out by the TOC generator
- ❌ DO NOT add a ## Quick Reference — Query Index heading or placeholder yourself; the script creates the heading and table. Pre-existing placeholders cause duplicated content and a broken file structure. (This is also enforced as Critical Rule #4 above.)
- ❌ DO NOT use ### headings for non-query content that contains a KQL code block within 40 lines; the TOC generator uses KQL proximity to detect query headings and will incorrectly include them
Investigation shortcuts (optional): Query files can include an **Investigation shortcuts:** bulleted list between the ## Quick Reference heading and the TOC table. These document recommended query combos for common investigation scenarios (e.g., "Delivered phishing drill-down: Q2.4 + Q7.6 + Q3.3"). Shortcuts are preserved by generate_tocs.py across re-runs. Don't add them to new files — they're a refinement added after real investigations reveal which query combos work best together.
CD-Aware Output
When CD intent is detected (Step 1), each query MUST include a <!-- cd-metadata --> HTML comment block. The full schema is in .github/skills/detection-authoring/SKILL.md under CD Metadata Contract.
Valid cd-metadata fields (exhaustive list):
| Field | Required | Notes |
|---|---|---|
| cd_ready | Always | true or false |
| schedule | If cd_ready | "0" (NRT), "1H", "3H", "12H", "24H" |
| category | If cd_ready | MITRE tactic (e.g., Persistence, CredentialAccess) |
| title | Optional | Dynamic title with {{Column}} placeholders (max 3 unique columns across title + description) |
| impactedAssets | If cd_ready | Array of type + identifier pairs |
| recommendedActions | Optional | Triage and response guidance string |
| adaptation_notes | Optional | What needs to change for CD format |
⛔ responseActions is NOT a valid cd-metadata field. It shares a name with the Graph API field that is explicitly prohibited in LLM-authored detections ("responseActions": [] is mandatory). Do not include it. Put incident response guidance in recommendedActions instead.
<!-- cd-metadata
cd_ready: true
schedule: "1H"
category: "Persistence"
title: "Suspicious Scheduled Task on {{DeviceName}}"
impactedAssets:
  - type: device
    identifier: DeviceName
recommendedActions: "Investigate the task XML and decode any encoded payloads."
adaptation_notes: "Remove let blocks, add mandatory columns"
-->
For queries not suitable for CD (baseline/statistical):
<!-- cd-metadata
cd_ready: false
adaptation_notes: "Statistical baseline — requires bare summarize, not CD-compatible"
-->
Summary table: Include a CD column in the Implementation Priority table: ✅ 1H / ❌.
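A cd-metadata block can be linted against the field table above before a query file is committed. The following is a minimal sketch, not the detection-authoring skill's own validator: the function name is hypothetical, it inspects only top-level (unindented) keys with a regex rather than a real YAML parser, and it does not check the three-column limit on title placeholders.

```python
import re

ALLOWED_FIELDS = {"cd_ready", "schedule", "category", "title",
                  "impactedAssets", "recommendedActions", "adaptation_notes"}
REQUIRED_IF_READY = {"schedule", "category", "impactedAssets"}
VALID_SCHEDULES = {"0", "1H", "3H", "12H", "24H"}


def check_cd_metadata(markdown):
    """Return a list of problems in the first cd-metadata comment block."""
    match = re.search(r"<!--\s*cd-metadata\s*\n(.*?)-->", markdown, re.DOTALL)
    if match is None:
        return ["no cd-metadata block found"]
    fields = {}
    for line in match.group(1).splitlines():
        key_value = re.match(r"([A-Za-z_]+):\s*(.*)", line)  # indented lines are skipped
        if key_value:
            fields[key_value.group(1)] = key_value.group(2).strip().strip('"')
    problems = [f"unknown field: {key}" for key in fields if key not in ALLOWED_FIELDS]
    if fields.get("cd_ready") not in ("true", "false"):
        problems.append("cd_ready must be true or false")
    if fields.get("cd_ready") == "true":
        missing = sorted(REQUIRED_IF_READY - set(fields))
        problems += [f"missing field: {key}" for key in missing]
        if "schedule" in fields and fields["schedule"] not in VALID_SCHEDULES:
            problems.append(f"invalid schedule: {fields['schedule']}")
    return problems
```

Because responseActions is not in the allowed set, a block that includes it is reported as an unknown field.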
Tool Quick Reference
| Tool | Purpose |
|---|---|
| mcp_kql-search_get_table_schema | Get table columns, types, example queries (Step 3) |
| mcp_microsoft-lea_microsoft_code_sample_search | Official MS Learn KQL samples; use language: "kusto" (Step 4) |
| mcp_kql-search_search_github_examples_fallback | Community KQL examples by table name (Step 5) |
| mcp_kql-search_search_kql_repositories | Find GitHub repos with KQL collections |
| mcp_kql-search_validate_kql_query | Syntax/schema validation (fallback for Step 7) |
| mcp_kql-search_find_column | Find which tables contain a specific column |
| mcp_kql-search_generate_kql_query | Auto-generate schema-validated query from natural language |
| mcp_sentinel-data_query_lake | Execute KQL against live Sentinel (primary validation) |
| mcp_sentinel-data_search_tables | Discover tables using natural language |
Schema Differences
| Platform | Timestamp Column | Notes |
|---|---|---|
| Sentinel / Log Analytics | TimeGenerated | All ingested logs |
| Defender XDR (Advanced Hunting) | Timestamp | XDR-native tables only; Sentinel tables in AH still use TimeGenerated |
Other common differences: Identity/UserPrincipalName (Sentinel) vs AccountUpn/AccountName (XDR); IPAddress (Sentinel) vs RemoteIP/LocalIP (XDR). Always verify with get_table_schema.
Sign-In Table Selection (High-Frequency Queries)
Sign-in queries are the most common query type. Use this decision rule:
| Scenario | Table | Key Differences |
|---|---|---|
| AH query, ≤30d | EntraIdSignInEvents (single table) | Covers both interactive + non-interactive. ErrorCode (int), AccountUpn, Country/City (direct strings), LogonType (JSON array; use has), Timestamp |
| Data Lake / >30d | SigninLogs + AADNonInteractiveUserSignInLogs (union) | ResultType (string), UserPrincipalName, parse_json(LocationDetails) needed for geo, IsInteractive (bool), TimeGenerated |
Common mistakes:
- Using union SigninLogs, AADNonInteractiveUserSignInLogs in AH queries: unnecessary, EntraIdSignInEvents covers both
- Using LogonType == "nonInteractiveUser": values are JSON arrays (["nonInteractiveUser"]), use has
- Using ResultType on EntraIdSignInEvents: column is ErrorCode (int), not string

Full details: See copilot-instructions.md → Known Table Pitfalls → EntraIdSignInEvents (AH table preference rule) for complete column mapping and additional pitfalls.

Full table pitfalls (dynamic field parsing, immutable fields, table casing, deprecated tables) are documented in copilot-instructions.md under Known Table Pitfalls. Refer there for SecurityAlert.Status, AuditLogs.InitiatedBy, SigninLogs.DeviceDetail, and 20+ other table-specific gotchas.
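The sign-in table decision rule can be captured as a lookup that also returns the column names that differ between the two options. This is a sketch: the function name, the tool labels ("advanced_hunting", "data_lake"), and the dictionary keys are hypothetical, while the table and column names come from the decision table in this section.

```python
def select_signin_tables(tool, lookback_days):
    """Return the sign-in table(s) and key column names for a tool and lookback."""
    if tool == "advanced_hunting" and lookback_days <= 30:
        return {
            "tables": ["EntraIdSignInEvents"],  # covers interactive + non-interactive
            "time_column": "Timestamp",
            "result_column": "ErrorCode",  # int
            "upn_column": "AccountUpn",
        }
    return {
        "tables": ["SigninLogs", "AADNonInteractiveUserSignInLogs"],  # union both
        "time_column": "TimeGenerated",
        "result_column": "ResultType",  # string
        "upn_column": "UserPrincipalName",
    }
```

For example, select_signin_tables("advanced_hunting", 7)["result_column"] gives ErrorCode, which is a reminder to compare against an integer rather than a string.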
Best Practices
Performance Optimization
Reference: KQL Best Practices — Microsoft Learn
1. Filter on datetime columns first
The most important optimization. Datetime predicates use efficient index-based shard elimination, skipping entire data partitions without scanning.
// ✅ Correct — datetime first, then selective string filters
SigninLogs
| where TimeGenerated > ago(7d)
| where UserPrincipalName =~ "user@domain.com"
// ❌ Wrong — string filter before datetime
SigninLogs
| where UserPrincipalName =~ "user@domain.com"
| where TimeGenerated > ago(7d)
2. Use has over contains for token matching
has uses the term index for full-token lookup. contains scans every character — dramatically slower on large tables.
// ✅ Faster — term-level index lookup
| where UserPrincipalName has "admin"
// ❌ Slower — full substring scan
| where UserPrincipalName contains "admin"
Use contains only when you genuinely need substring matching (e.g., fragments inside URL paths).
3. Prefer case-sensitive operators
Case-sensitive comparisons (==, in, has_cs) are faster than case-insensitive (=~, in~, has). Use case-insensitive only when casing is unpredictable.
// ✅ Faster — ActionType, Operation, OfficeWorkload have consistent casing
| where ActionType == "LogonFailed"
| where Operation in ("New-InboxRule", "Set-InboxRule")
| where OfficeWorkload == "Exchange"
// 🔵 Use =~ only when casing varies (e.g., user-entered UPNs)
| where UserPrincipalName =~ "user@domain.com"
Common fields with consistent casing (always use == / in): ActionType, Operation, OfficeWorkload, EventID, ResultType, DeliveryAction, EmailDirection, LogonType, Severity, Status, Classification.
4. Filter tables BEFORE joins
Pre-filter both sides of a join to reduce data volume. Move where clauses into subqueries.
// ✅ Correct — filter KB table before joining
DeviceTvmSoftwareVulnerabilities
| join kind=inner (
DeviceTvmSoftwareVulnerabilitiesKB
| where IsExploitAvailable == true
| where CvssScore >= 8.0
) on CveId
// ❌ Wrong — joins full tables, filters after
DeviceTvmSoftwareVulnerabilities
| join kind=inner DeviceTvmSoftwareVulnerabilitiesKB on CveId
| where IsExploitAvailable == true
Join sizing rules:
- Smaller table on the left (or `hint.strategy=broadcast` when left is small)
- `in` instead of `left semi join` for single-column filtering
- `lookup` instead of `join` when right side is small (<50 MB)
- `hint.shufflekey=<key>` when both sides are large with high-cardinality join key
5. Use materialize() for multi-referenced let statements
Without materialize(), the engine may recompute the let expression each time it's referenced.
// ✅ Computed once, reused twice
let SprayFailures = materialize(
EntraIdSignInEvents
| where Timestamp > ago(7d)
| where ErrorCode in (50126, 50053, 50057)
| summarize FailedAttempts = count(), TargetUsers = dcount(AccountUpn)
by SourceIP = IPAddress
| where TargetUsers >= 5);
6. Narrow arg_max to only needed columns
arg_max(TimeGenerated, *) materializes every column. Specify only what you use.
// ✅ Only 5 columns materialized
SecurityAlert
| where TimeGenerated > ago(30d)
| summarize arg_max(TimeGenerated, Entities, Tactics, Techniques, AlertName, AlertSeverity) by SystemAlertId
// ❌ Materializes all 30+ columns
SecurityAlert
| summarize arg_max(TimeGenerated, *) by SystemAlertId
7. Pre-filter before JSON parsing
For rare key/value lookups in dynamic columns, use has to eliminate rows before expensive parse_json().
// ✅ Term filter first, JSON parse on survivors
AuditLogs
| where tostring(TargetResources) has "MyApp"
| extend Target = tostring(parse_json(tostring(TargetResources[0])).displayName)
| where Target == "MyApp"
8. Filter on table columns, not calculated columns
Filtering on native columns enables index usage; calculated columns force full scans.
// ✅ Filter on native column
SecurityEvent | where EventID == 4625
// ❌ Filter on calculated column
SecurityEvent | extend Cat = case(EventID == 4625, "Fail", ...) | where Cat == "Fail"
9. Project only needed columns early
Drop unnecessary columns before expensive operators (join, summarize, mv-expand) to reduce memory and shuffling.
10. Use take or summarize to limit results
Unbounded queries on large tables consume excessive resources.
11. Platform-specific dynamic column access
In AH, AuditLogs.InitiatedBy and TargetResources are native dynamic — use direct dot-notation. In Data Lake, they may be string-typed requiring parse_json().
// ✅ Advanced Hunting — direct access
| extend Actor = tostring(InitiatedBy.user.userPrincipalName)
// ✅ Data Lake — parse_json wrapper
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)
// 🔵 Safe in both — stringify full field
| where tostring(InitiatedBy) has "user@domain.com"
Security and Privacy
- Limit sensitive data exposure: redact PII with `strcat(substring(UPN, 0, 3), "***")` when appropriate
- Filter early: reduce dataset before projecting sensitive columns
Code Quality
- Comments: explain what the query does and why key filters are applied
- Meaningful variable names: `let SuspiciousIPs = ...` not `let x = ...`
- Standalone queries: when providing multiple separate queries, each MUST start with the table name directly. Never share `let` variables across queries the user will run independently
Dynamic Type Casting
Common "expected string expression" error: After mv-expand, parse_json, or split, values are dynamic — string functions fail. Always convert first:
// After mv-expand
| mv-expand AuthDetails
| extend AuthMethod = tostring(AuthDetails.authenticationMethod)
// After split
| extend Parts = split(UPN, "@")
| extend Domain = tostring(Parts[1])
Rule of thumb: If you get "expected string expression", add tostring().
.github/skills/threat-pulse/SKILL.md
npx skills add SCStelz/security-investigator --skill threat-pulse -g -y
SKILL.md
Frontmatter
{
"name": "threat-pulse",
"description": "Recommended starting point for new users and daily SOC operations. 15-minute broad security scan across 7 domains (incidents, identity, NHI, endpoint, email, admin\/cloud, exposure) producing a Threat Pulse Dashboard with drill-down recommendations to specialized skills. Trigger on getting-started questions like \"where do I start\", \"what can you do\", \"help me investigate\"."
}
Threat Pulse — Instructions
Purpose
The Threat Pulse skill is a rapid, broad-spectrum security scan designed for the "if you only had 15 minutes" scenario. It executes 12 queries across 7 security domains in parallel, producing a prioritized dashboard of findings with drill-down recommendations to specialized investigation skills.
What this skill covers:
| Domain | Key Questions Answered |
|---|---|
| 🔴 Incidents | What incidents are open and unresolved? Prioritizes High/Critical, backfills with Medium/Low in smaller environments. How old are they? Who owns them? What was recently resolved — TP rate, MITRE tactics, severity distribution? |
| 🔐 Identity (Human) | Which users have the highest Defender XDR Risk Score (0-100)? Which are flagged by Identity Protection (RiskLevel/RiskStatus)? What risk events are driving the signals? Are there password spray / brute-force patterns? |
| 🤖 Identity (NonHuman) | Which service principals expanded their resource/IP/location footprint? |
| 💻 Endpoint | Which endpoints deviated most from their process behavioral baseline? What singleton process chains exist? |
| 📧 Email Threats | What's the phishing/spam/malware breakdown? Were any phishing emails delivered? |
| 🔑 Admin & Cloud Ops | What mailbox rules, OAuth consents, transport rules, or mailbox permission changes occurred? Is there programmatic mailbox access via API? Any MCAS-flagged compromised sign-ins? Human-initiated CA policy changes? Who performed high-impact admin operations — role assignments, MFA registration, app registration, ownership grants? |
| 🛡️ Exposure | Are any critical assets internet-facing with RCE vulnerabilities? What exploitable CVEs (CVSS ≥ 8) are present across the fleet? |
Data sources: SecurityIncident, SecurityAlert, IdentityInfo, AADUserRiskEvents, EntraIdSignInEvents, DeviceProcessEvents, DeviceLogonEvents, ExposureGraphNodes, AADServicePrincipalSignInLogs, EmailEvents, CloudAppEvents, AuditLogs, DeviceTvmSoftwareVulnerabilities, DeviceTvmSoftwareVulnerabilitiesKB
Portal URL patterns are defined in the Defender XDR Portal Links table in the Take Action section. Append tid=<tenant_id> (from config.json) to ALL security.microsoft.com URLs — use ?tid= or &tid= depending on existing query params. Omit if tenant_id is not configured.
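The `tid` rule can be sketched as a small helper. This is a minimal illustration, not code from the repository: `append_tenant_id` is a hypothetical name, and the tenant ID is assumed to have been read from config.json already.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def append_tenant_id(url: str, tenant_id: Optional[str]) -> str:
    """Append tid=<tenant_id> to security.microsoft.com URLs (hypothetical helper)."""
    if not tenant_id:
        return url  # tenant_id not configured: omit the parameter
    parts = urlsplit(url)
    if parts.hostname != "security.microsoft.com":
        return url  # the rule only applies to Defender portal links
    # "?tid=" when there is no query string yet, "&tid=" otherwise
    query = f"{parts.query}&tid={tenant_id}" if parts.query else f"tid={tenant_id}"
    return urlunsplit(parts._replace(query=query))
```

Rebuilding the URL from its parts, rather than concatenating strings, keeps any fragment after the query string in the right place.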
📑 TABLE OF CONTENTS
- Critical Workflow Rules
- Execution Workflow — Phase 0–3
- Phase 4: Interactive Follow-Up Loop
- Take Action — Portal links, AH queries, defanging
- Sample KQL Queries — 12 queries
- Post-Processing — Drift scores, cross-query correlation
- Query File Recommendations
- Report Template — Inline + full markdown file structure
- Known Pitfalls
- Quality Checklist
- SVG Dashboard Generation
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
1. Workspace selection: Follow the SENTINEL WORKSPACE SELECTION rule from `copilot-instructions.md`. Call `list_sentinel_workspaces()` before first query.
2. Read `config.json`: Load workspace ID, tenant, subscription, and Azure MCP parameters before execution.
3. Output defaults: Default to inline chat with 7d lookback. Only ask the user for output preferences if they explicitly mention a different mode (e.g., "save to file", "markdown report", "30 day lookback"). If the user just says "threat pulse", "run a scan", or similar, proceed immediately with defaults; do not prompt.
4. ⛔ MANDATORY: Evidence-based analysis only. Every finding must cite query results. Every "clear" verdict must cite 0 results. Follow the Evidence-Based Analysis rule from `copilot-instructions.md`.
5. Parallel execution: Run the Data Lake query (Q5) and all Advanced Hunting queries (Q1, Q2, Q3, Q4, Q6, Q7, Q8, Q9, Q10, Q11, Q12) simultaneously.
6. Cross-query correlation: After all queries complete, check for correlated findings per the Cross-Query Correlation table in Post-Processing. Escalate priority when patterns match.
7. SecurityIncident output rule: Every incident MUST include a clickable Defender XDR portal URL: `https://security.microsoft.com/incidents/{ProviderIncidentId}?tid=<tenant_id>`. See Tenant ID in Portal URLs.
8. ⛔ MANDATORY: Query File Recommendations (tiered). After rendering the main report body (Dashboard Summary through Recommended Actions), append the Query File Recommendations section. This runs AFTER the report is visible to the user, not as a blocking gate. Skip only when ALL verdicts are ✅.
9. ⛔ MANDATORY: 30d drill-down lookback. ALL Phase 4 drill-down queries use 30d (AH) or 90d (Data Lake) lookback, regardless of the Threat Pulse scan window. Entity-scoped queries (filtered by UPN/IP/device) have negligible performance difference between 7d and 30d, and attacks routinely predate the pulse window. AH caps at 30d anyway. Substitute `ago(7d)` → `ago(30d)` in all query file and skill queries during drill-downs.
| Highest Verdict | Query Files | Proactive Skills | Report Section |
|---|---|---|---|
| 🔴 or 🟠 | Top 3–5, entity-specific prompts | All matching skills | 📂 Recommended Query Files |
| 🟡 (no 🔴/🟠) | Top 1–2, broader prompts | Up to 3 posture skills | 📂 Proactive Hunting Suggestions |
| All ✅ | Skip | Skip | Omit entirely |
10. ⛔ MANDATORY: The follow-up loop is stateful, memory-backed, and self-sustaining. Three non-negotiable invariants that hold for the ENTIRE session (re-read this rule before any follow-up interaction):
    - (a) Memory is the source of truth, not the conversation. The prompt pool lives ONLY in `/memories/session/threat-pulse-drilldowns.md`. It MUST be created the first time the pool is built (Phase 4 step 1) and is a hard precondition for rendering any selection list. If you are about to present follow-up options and this file does not exist, STOP and create it first. NEVER reconstruct the pool from conversation history; always `memory view` immediately before each `vscode_askQuestions` call.
    - (b) The loop re-presents itself automatically. After EVERY completed drill-down, you MUST return to Phase 4 step 2 and call `vscode_askQuestions` again with the updated pool, without waiting for the user to ask for the menu. The only exits are the user selecting `Skip`, or an empty pool. "Bring the menu back up" should never be something the user has to request.
    - (c) The Quick Pick Call Contract is mechanical, not advisory. Run the Pre-Flight Checklist and print the Pool Receipt line before every call. In particular: ZERO `recommended` keys, `multiSelect: true`, correct icon taxonomy (🔍 📄 🎯 💾 🆕 🔄 📋), and the `💾 / 🔄 / Skip` tail every iteration. Do not substitute an ad-hoc "Done" option for the contracted tail.
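The verdict tier table in the rules above amounts to a three-way branch on the highest verdict. A minimal Python sketch follows; `recommendation_tier` is a hypothetical name, and the strings are copied from the table.

```python
def recommendation_tier(verdicts):
    """Map per-domain verdict emoji to the recommendation tier (hypothetical helper).

    Returns None when every verdict is ✅, meaning the section is omitted entirely.
    """
    found = set(verdicts)
    if found & {"🔴", "🟠"}:
        return {
            "query_files": "Top 3–5, entity-specific prompts",
            "skills": "All matching skills",
            "section": "📂 Recommended Query Files",
        }
    if "🟡" in found:
        return {
            "query_files": "Top 1–2, broader prompts",
            "skills": "Up to 3 posture skills",
            "section": "📂 Proactive Hunting Suggestions",
        }
    return None  # all ✅: skip query files and skills
```

For example, `recommendation_tier(["✅", "🟡", "✅"])` selects the Proactive Hunting Suggestions tier, because no 🔴 or 🟠 verdict is present.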
Execution Workflow
Phase 0: Prerequisites
1. Read `config.json` for workspace ID and Azure MCP parameters
2. Call `list_sentinel_workspaces()` to enumerate available workspaces
3. Use defaults (inline chat, 7d) unless user specified otherwise
4. ⛔ MANDATORY: Display scan summary. Before executing any queries, output the following brief to the user as plain markdown text (NOT inside a fenced code block, NOT as inline code). Use the exact heading, line breaks, and emoji-prefixed bullet items shown below. Substitute `<WorkspaceName>`, `<WorkspaceId>`, lookback, and output format. Never skip this step: it sets analyst expectations for what's about to run.

   🔍 Threat Pulse — Scan Plan

   Workspace: <WorkspaceName> (<WorkspaceId>)
   Lookback: <N>d
   Output: <Inline / Markdown file / Both>

   Executing 12 queries across 7 domains:

   - 🔴 Incidents — Open incidents (severity-ranked) + 7d closed summary (Q1, Q2)
   - 🔐 Identity — Identity risk posture, risk event enrichment, auth spray (Q3, Q4)
   - 🤖 NonHuman ID — Service principal behavioral drift (Q5)
   - 💻 Endpoint — Device process drift, rare process chains (Q6, Q7)
   - 📧 Email — Inbound threat snapshot (Q8)
   - 🔑 Admin & Cloud — Cloud app ops, privileged operations (Q9, Q10)
   - 🛡️ Exposure — Critical assets, exploitable CVEs (Q11, Q12)

   Data Lake: 1 query | Advanced Hunting: 11 queries in parallel
   Estimated time: ~2–4 minutes
Phase 1: Data Lake Query (Q5)
Why only 1 query on Data Lake? Q5 requires a 97-day lookback for SPN baseline computation — AH Graph API caps at 30 days. All other queries use ≤30d lookback on Analytics-tier tables accessible via AH.
| Query | Domain | Purpose | Tool |
|---|---|---|---|
| Q5 | 🤖 Identity (NonHuman) | Service principal behavioral drift (90d vs 7d) | query_lake |
Phase 2: Advanced Hunting Queries (Q1, Q2, Q3, Q4, Q6, Q7, Q8, Q9, Q10, Q11, Q12)
Run all 11 in parallel — no dependencies between queries.
Design rationale: The connected LA workspace makes all Sentinel tables (SecurityIncident, IdentityInfo, AADUserRiskEvents, AuditLogs, etc.) queryable via AH. AH is preferred: it's free for Analytics-tier tables and avoids per-query Data Lake billing.
| Query | Domain | Purpose | Tool |
|---|---|---|---|
| Q1 | 🔴 Incidents | Open incidents (severity-ranked backfill) with MITRE tactics | RunAdvancedHuntingQuery |
| Q2 | 🔴 Incidents | 7-day closed incident summary (classification, MITRE, severity) | RunAdvancedHuntingQuery |
| Q3 | 🔐 Identity (Human) | Identity risk posture (IdentityInfo) + risk event enrichment (AADUserRiskEvents) | RunAdvancedHuntingQuery |
| Q4 | 🔐 Identity (Human) | Password spray / brute-force across Entra ID + RDP/SSH | RunAdvancedHuntingQuery |
| Q6 | 💻 Endpoint | Fleet device process drift (7d baseline vs 1d) | RunAdvancedHuntingQuery |
| Q7 | 💻 Endpoint | Rare process chain singletons (30d) | RunAdvancedHuntingQuery |
| Q8 | 📧 Email Threats | Inbound email threat snapshot | RunAdvancedHuntingQuery |
| Q9 | 🔑 Admin & Cloud Ops | Cloud app suspicious activity (CloudAppEvents) | RunAdvancedHuntingQuery |
| Q10 | 🔑 Admin & Cloud Ops | High-impact admin operations (AuditLogs) | RunAdvancedHuntingQuery |
| Q11 | 🛡️ Exposure | Internet-facing critical assets | RunAdvancedHuntingQuery |
| Q12 | 🛡️ Exposure | Exploitable CVEs (CVSS ≥ 8) across fleet | RunAdvancedHuntingQuery |
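Since the queries have no dependencies on one another, the dispatch in Phases 1 and 2 can be sketched with a thread pool. This is an illustration only: `run_query` is a hypothetical stand-in for the real tool calls, and `run_pulse` is an assumed wrapper name; the query IDs and tool names come from the two tables above.

```python
from concurrent.futures import ThreadPoolExecutor

# Query-to-tool routing taken from the Phase 1 and Phase 2 tables.
QUERY_TOOLS = {f"Q{n}": "RunAdvancedHuntingQuery" for n in range(1, 13)}
QUERY_TOOLS["Q5"] = "query_lake"  # 97-day lookback exceeds the 30-day AH cap


def run_query(query_id: str, tool: str) -> dict:
    """Hypothetical placeholder for the real tool invocation."""
    return {"query": query_id, "tool": tool, "rows": []}


def run_pulse(runner=run_query) -> dict:
    """Submit all 12 queries at once and collect results keyed by query ID."""
    with ThreadPoolExecutor(max_workers=len(QUERY_TOOLS)) as pool:
        futures = {
            query_id: pool.submit(runner, query_id, tool)
            for query_id, tool in QUERY_TOOLS.items()
        }
        # result() blocks until each query finishes; order of completion does not matter
        return {query_id: future.result() for query_id, future in futures.items()}
```

Collecting results by query ID rather than by completion order keeps post-processing deterministic, which matters for the cross-query correlation step.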
Phase 3: Post-Processing & Report
1. Interpret device drift scores from Q6 results (see Post-Processing)
2. Run cross-query correlation checks (see rule 6 above)
3. Assign verdicts to each domain (🔴 Escalate / 🟠 Investigate / 🟡 Monitor / ✅ Clear)
4. Generate prioritized recommendations with drill-down skill references
5. Render the report immediately: output the Dashboard Summary, Detailed Findings, Cross-Query Correlations, and 🎯 Recommended Actions. Do NOT block on the manifest or prompt pool building.
6. After the report is rendered, run the Query File Recommendations procedure and append the `📂 Recommended Query Files` section. This happens while the user is already reading the report, so there is no perceived delay. Skip entirely when all verdicts are ✅.
Performance note: The Recommendation Gate was previously a blocking step (Phase 3.5) that loaded the ~500-line manifest YAML and ranked entries before the report could render. By moving it after the report output, the user sees findings immediately while recommendations load in the background. The Phase 4 prompt pool building also benefits — it reuses the recommendations already computed in step 6 rather than re-scanning all 12 query results independently.
Phase 4: Interactive Follow-Up Loop
After rendering the report, present the user with a selectable list of follow-up actions — skill investigations, query file hunts, and IOC lookups. Runs when at least one 🔴, 🟠, or 🟡 verdict exists (skip only when ALL verdicts are ✅).
This is a loop, not a one-shot. After each action completes, re-present the selection list with the prompt pool updated. Tier depth (🔴/🟠 vs 🟡-only vs all ✅) follows Rule 8.
⛔ Loop invariant: verify before EVERY iteration (per Rule 10): (a) `/memories/session/threat-pulse-drilldowns.md` exists and was just re-read via `memory view`; if not, create/read it first; (b) you are re-presenting the menu automatically after the prior drill-down, not because the user asked; (c) the Pre-Flight Checklist passed and the Pool Receipt was printed. If any of the three is false, fix it before calling `vscode_askQuestions`. The loop only ends on `Skip` or an empty pool.
Prompt types (three categories, one unified list):
| Type | Icon | Source | Example |
|---|---|---|---|
| Skill investigation | 🔍 | Per-query Drill-down: skill + entities from findings | 🔍 Investigate user jsmith@contoso.com → user-investigation |
| Query file hunt | 📄 | Manifest domain + MITRE matching → query file | 📄 Hunt for RDP lateral movement from 10.0.0.50 → queries/endpoint/rdp_threat_detection.md |
| IOC lookup | 🎯 | Suspicious IPs, domains, hashes surfaced in findings | 🎯 Enrich and investigate IP 203.0.113.42 → ioc-investigation |
Skill matching rules — derive from findings:
| Query | Trigger | Skill | Prompt |
|---|---|---|---|
| Q1 | Incident surfaced | incident-investigation | Investigate incident <ProviderIncidentId> |
| Q1 | Incident with Exfiltration tactic or DLP/Insider Risk in AlertNames | data-security-analysis | Analyze data security events for <entity> |
| Q2 | TruePositive > 0 with non-empty Techniques array | mitre-coverage-report | Run MITRE coverage report |
| Q3–Q4 | Username/UPN in findings | user-investigation | Investigate <UPN> |
| Q3 | 3+ risky users, or any ConfirmedCompromised | identity-posture | Run identity posture report |
| Q3 | User with anonymizedIPAddress, impossibleTravel, or anomalousToken in TopRiskEventTypes | authentication-tracing | Trace authentication chain for <UPN> |
| Q3 | User with unfamiliarFeatures or suspiciousAPITraffic in TopRiskEventTypes | scope-drift-detection/user | Analyze user behavioral drift for <UPN> |
| Q3+Q4 | 🟡-only identity verdicts (no 🔴/🟠) | identity-posture | Run identity posture report |
| Q4 | Spray source IP | ioc-investigation | Investigate IP <address> |
| Q4 | Spray targeting 5+ users | identity-posture | Run identity posture report |
| Q5 | SPN with drift | scope-drift-detection/spn | Analyze drift for <SPN> |
| Q6 | Device with DriftScore > 130 | scope-drift-detection/device | Analyze device process drift for <hostname> |
| Q6–Q7 | Device in findings | computer-investigation | Investigate device <hostname> |
| Q8 | Phishing delivered or malware detected | email-threat-posture | Run email threat posture report |
| Q8+Q3 | Phishing recipient appears in Q3 risky users | authentication-tracing | Trace authentication chain for <UPN> |
| Q9 | Compromised Sign-In user surfaced | user-investigation + authentication-tracing | Investigate <UPN> / Trace authentication chain for <UPN> |
| Q9 | Conditional Access Change by human actor | ca-policy-investigation | Investigate CA policy changes by <UPN> |
| Q9 | Exchange Admin/Rule Change actors | user-investigation | Investigate <UPN> |
| Q10 | MFA-Registration user | user-investigation | Investigate <UPN> |
| Q10 | AppRegistration or Ownership operations | app-registration-posture | Run app registration posture report |
| Q10 | AppRegistration targets containing AI/Agent/Copilot keywords | ai-agent-posture | Run AI agent security audit |
| Q10 | RoleManagement Global/Security Admin OR bulk Password resets from single actor | identity-posture | Run identity posture report |
| Q10 | 3+ categories with same actor in TopActors | user-investigation | Investigate <UPN> |
| Q11 | Any IsVerifiedExposed == true asset | exposure-investigation | Run exposure report for <hostname> |
| Q11–Q12 | Device in findings | computer-investigation | Investigate device <hostname> |
| Q12 | CVE with fleet impact | exposure-investigation | Run vulnerability report for <CVE> |
Drill-down lookback: Per Rule 9, substitute `ago(7d)` → `ago(30d)` (AH) or `ago(90d)` (Data Lake) in all drill-down queries.
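The lookback substitution is a plain text rewrite of the query before it is run. A minimal sketch, assuming the query is held as a string; `widen_lookback` and the platform keys `"AH"` and `"DataLake"` are hypothetical names chosen for this example.

```python
import re

# Drill-down windows from Rule 9; the dictionary keys are illustrative labels.
DRILLDOWN_LOOKBACK = {"AH": "30d", "DataLake": "90d"}


def widen_lookback(kql: str, platform: str = "AH") -> str:
    """Replace every ago(7d) in a query with the drill-down window for the platform."""
    target = DRILLDOWN_LOOKBACK[platform]
    # Tolerate optional whitespace inside the parentheses; other windows are left alone
    return re.sub(r"ago\(\s*7d\s*\)", f"ago({target})", kql)
```

Only the literal 7d window is rewritten, so queries that deliberately use a different window, such as a 1d comparison period, keep their original value.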
Procedure:
1. Build the initial prompt pool by combining:
   - Skill prompts: one per unique entity + matching skill from the table above. If the same entity appears in multiple queries (e.g., Q3 and Q9), create ONE skill prompt for that entity; the correlation context goes in the Description, not in the Label.
   - Query file prompts: from Phase 3 step 5 keyword extraction. Each query file is its OWN separate prompt; never merge a query file prompt with a skill prompt.
   - IOC prompts: any suspicious IPs/domains from non-✅ findings not already covered by a skill prompt
   - Deduplicate: if a skill prompt and IOC prompt target the same entity, keep only the skill prompt
   - 🔴 NEVER merge a skill prompt (🔍) with a query file prompt (📄) into a single option. They are different action types with different execution paths.
   - ⛔ Persist the pool. Write the final pool to `/memories/session/threat-pulse-drilldowns.md` using the exact template below. The format banner is mandatory: it makes the ` — ` delimiter contract visible to every iteration and every LLM that edits the file. This memory block is the single source of truth; conversation history is not.

   Memory File Template (write on first pool creation)

       # Threat Pulse Session — <YYYY-MM-DD>
       **Workspace:** <name> (<id>)
       **Lookback:** <7d|30d|90d>
       **Scan Start:** <YYYY-MM-DD HH:MM UTC>

       ## Active Prompt Pool
       <!-- FORMAT: `- <ICON> <action> <entity> — Q<N>: <finding> → <skill-or-query-file>` -->
       <!-- ` — ` (space-emdash-space) is the REQUIRED label/description split delimiter. -->
       <!-- One icon per line. Order = file position (no numbering). Do not edit this comment block. -->
       - 🔍 Investigate incident #<IncidentId> — Q1: <brief finding>, <N> alerts, <MITRE-ID> → incident-investigation
       - 🎯 Enrich and investigate IP <IP> — Q4: <N> spray attempts / <N> users → ioc-investigation
       ...

       ## Pulse Key Findings (quick reference)
       ...

       ## Completed Drill-Downs
       _(none yet)_

2. Call `vscode_askQuestions` using the Quick Pick Call Contract below. Apply identically on every iteration.

   Quick Pick Call Contract
   - `header`: `Follow-Up Investigation`
   - `question`: `Select one or more actions to launch (or skip):`
   - `options`: entity prompts (from memory), then `📋` (if truncated), then `💾 / 🔄 / Skip` as the final three, in that order, every iteration. 🆕 prompts prepend to the entity portion only.
     - `💾 Save full investigation report`: Save the complete Threat Pulse session (scan + all drill-downs) as a markdown file
     - `🔄 Refresh prompt pool`: Rebuild the follow-up prompt list from existing pulse + drill-down findings (does NOT re-run the 12 pulse queries)
     - `Skip`: No follow-up, investigation complete
   - Allowed Label icons: `🔍 📄 🎯 💾 🆕 🔄 📋`. Verdict emoji (🔴🟠🟡🟢✅) are banned from Labels (render as `��` in VS Code quick picks) but fine in Descriptions. Drop 💾 after report is saved; 🔄 and Skip always remain.

   🔴 Pre-Flight Checklist: run mechanically before EVERY `vscode_askQuestions` call

       □ 1. memory view → read `## Active Prompt Pool` just now (not earlier)
       □ 2. Count entity prompts (exclude 💾/🔄/📋/Skip) = N
       □ 3. Format integrity: every entity line starts with `- ` followed by exactly ONE icon. Any legacy `<N>.` prefix → migrate to `- ` first, re-read, then continue.
       □ 4. If N > 12: render top 12 (🆕 first, then memory order) + append `📋 Show full prompt pool (N items)`
       □ 5. For each rendered option: split memory line at FIRST ` — ` → label = text after `- ` up to delimiter, description = right, BYTE-FOR-BYTE (no paraphrasing; if something is missing, edit memory first then re-read)
       □ 6. Atomic check: each option Label has exactly ONE icon; Description has at most ONE `→ target`
       □ 7. `multiSelect: true` in call args
       □ 8. ZERO `recommended` keys anywhere in options[]
       □ 9. Tail = 💾 / 🔄 / Skip (or 📋 / 💾 / 🔄 / Skip if truncated)
       □ 10. Print the Pool Receipt line to chat BEFORE invoking the tool

   Pool Receipt (box 10): one-liner printed to chat so contract violations are user-visible:

       📊 Pool: <N> total / rendering <R> (🆕×<a>, 🔍×<b>, 📄×<c>, 🎯×<d>) / truncated <✔|—> | multiSelect=true ✔ | recommended=0 ✔

   If user selects `📋`: re-invoke with all entity prompts (drop `📋`, keep 💾/🔄/Skip tail).

3. If user selects Skip (alone) or pool is empty: end skill execution. Ignore any freeform text if Skip is selected.

4. Freeform input routing: If user types freeform text instead of (or alongside) selecting options, route by matching intent to validated sources. Do NOT write ad-hoc KQL; find the right skill or query file first. Classified actions feed into step 7 alongside any selected options.
   1. Skill match: Check the request against copilot-instructions.md Available Skills trigger keywords. "Check vulnerabilities on that device" → `exposure-investigation` or `computer-investigation`. Route as 🔍; the `read_file` gate in step 7 applies.
   2. Query file match: `grep_search` the request's key terms (table names, operations, attack types) against `queries/**`. "Check forwarding rules" → `queries/email/email_threat_detection.md`. Route as 📄.
   3. Contextual question: If answerable from data already in context (e.g., "is that IP in other alerts?"), answer directly. If a query is needed, loop back to sub-steps 1–2 to find the right source.
   4. No match: If no skill or query file covers the request, follow the KQL Pre-Flight Checklist from copilot-instructions.md (schema validation, table pitfalls, existing query search) before writing any KQL. Never skip the pre-flight for freeform requests.

5. 💾 Save full investigation report selected:
   - Read `/memories/session/threat-pulse-drilldowns.md` (critical after context compaction) and compile pulse dashboard + all drill-down findings into a single markdown file using the Report Template (file mode). Weave drill-down insights into the main report; do NOT just append raw output.
   - If no drill-downs were executed yet, omit the `Drill-Down Investigation Results` and `Cross-Investigation Correlation` sections with note: "No drill-down investigations were performed in this session."
   - Save to `reports/threat-pulse/Threat_Pulse_YYYYMMDD_HHMMSS.md`. Drop 💾 from subsequent pool iterations.

6. 🔄 Refresh prompt pool selected: prompt list ONLY, no KQL execution. Refresh rebuilds the follow-up list; it does NOT re-run Q1–Q12 and does NOT re-run drill-downs. Discard the current pool, rebuild by re-applying Query File Recommendations and the skill matching table against pulse findings + all drill-down findings in memory. Deduplicate against completed prompts. If selected alongside other actions, refresh FIRST, then present the new pool before executing the others.

7. One or more actions selected: execute sequentially. Build a todo list (one item per action). For each:
   - 🔍 Skill prompt: ⛔ `read_file` the child SKILL.md BEFORE writing ANY query → find Investigation shortcut → match TP Q# trigger → execute with entity substitution. Writing KQL without the prior `read_file` = schema hallucination. See 🔍 Skill Drill-Down Execution Rule.
   - 📄 Query file prompt: read the file and execute its queries verbatim with entity substitution. See 📄 Query File Execution Rule.
   - 🎯 IOC prompt: load `ioc-investigation` skill with the target indicator.

   After each drill-down, append a session-state entry to `/memories/session/threat-pulse-drilldowns.md` under `## Completed Drill-Downs`:

       ### <N>. <Prompt Label> (<skill-name>, <YYYY-MM-DD HH:MM>)
       - **Entity:** <target entity>
       - **Trigger:** Q<N> — <original finding>
       - **Key Findings:** <1–8 bullets, evidence-cited>
       - **Risk Assessment:** <emoji> <level> — <1-line justification>
       - **Cross-References:** <overlaps with other drill-downs or pulse queries>
       - **Recommendations:** <top 1–3 actions>

   This survives context compaction and feeds the `💾 Save` report.

   Before returning to step 2 (MANDATORY, in order):
   1. New Evidence Scan: review drill-down results for entities/TTPs not present in prior findings. Add 🆕 prompts only for meaningful leads (new attacker IP with high abuse, new critical CVE on exposed device, etc.). If nothing warrants follow-up, note: "No actionable new evidence."
   2. Manifest check: for each 🆕 item, consult `.github/manifests/discovery-manifest.yaml` (match by `domains`, `mitre`, or `title`). Only fall back to ad-hoc KQL if nothing matches.
   3. Reload → mutate → write back: `memory view` `## Active Prompt Pool` → delete the completed bullet line(s) → prepend 🆕 prompts as new bullet lines (`- <ICON> ...`) → `memory str_replace`. Every entity line is a bullet with no ordinals, so adding/removing items never requires renumbering. Never reconstruct the pool from conversation history.
   4. Return to step 2. Never render the pool as a markdown table/list instead of calling `vscode_askQuestions`.
Atomic options — ONE action per option. Each option maps to ONE skill + ONE entity, OR ONE query file. When correlations link findings (e.g., Q3+Q9 same user), generate separate options, put the correlation in the Description. Bundling multiple actions/arrows in a single option is the #1 follow-up mistake.
✅ Correct: 🔍 Investigate user cameron@contoso.com / desc Q3+Q9: identity risk + inbox rule manipulation → user-investigation
❌ Wrong: 🔍 Investigate cameron ... → user-investigation, 📄 Hunt phishing → queries/email/...
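The label/description split, the atomic check, and the contracted tail can be sketched together in Python. This is a rough illustration under stated assumptions: `parse_pool_line`, `is_atomic`, and `build_options` are hypothetical helpers, the option dictionaries only approximate what `vscode_askQuestions` accepts, and 🆕 is treated as the single icon of a new prompt.

```python
DELIMITER = " — "  # space, em dash, space: the required label/description split
ENTITY_ICONS = ("🔍", "📄", "🎯", "🆕")
MAX_RENDERED = 12


def parse_pool_line(line: str) -> dict:
    """Split one memory bullet into label and description at the FIRST delimiter."""
    if not line.startswith("- "):
        raise ValueError("entity lines must start with '- '")
    label, sep, description = line[2:].partition(DELIMITER)
    if not sep:
        raise ValueError("missing label/description delimiter")
    return {"label": label, "description": description}


def is_atomic(option: dict) -> bool:
    """One icon in the label and at most one '→ target' in the description."""
    icons = sum(option["label"].count(icon) for icon in ENTITY_ICONS)
    return icons == 1 and option["description"].count("→") <= 1


def build_options(pool_lines, report_saved: bool = False) -> list:
    """Build the quick-pick option list: entities, optional 📋, then the fixed tail."""
    entities = [parse_pool_line(line) for line in pool_lines]
    bundled = [o["label"] for o in entities if not is_atomic(o)]
    if bundled:
        raise ValueError(f"non-atomic options: {bundled}")
    # 🆕 prompts first, then memory order (sorted() is stable)
    ordered = sorted(entities, key=lambda o: not o["label"].startswith("🆕"))
    options = ordered[:MAX_RENDERED]
    if len(entities) > MAX_RENDERED:
        label = f"📋 Show full prompt pool ({len(entities)} items)"
        options.append({"label": label, "description": ""})
    if not report_saved:  # 💾 is dropped once the report has been saved
        options.append({"label": "💾 Save full investigation report",
                        "description": "Save the complete Threat Pulse session as a markdown file"})
    options.append({"label": "🔄 Refresh prompt pool",
                    "description": "Rebuild the follow-up prompt list from existing findings"})
    options.append({"label": "Skip", "description": "No follow-up"})
    return options
```

Rejecting a bundled line outright, instead of trying to split it, follows the checklist's instruction to edit memory first and then re-read.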
📄 Query File Execution Rule
⛔ MANDATORY — applies to ALL 📄 query file prompt executions in Phase 4.
When executing a 📄 prompt, use the queries from the file verbatim with entity substitution. Do NOT rewrite queries against different tables than the file specifies.
- Read the query file and check its Investigation shortcuts section at the top; match the `(TP Q#)` annotation to the triggering Threat Pulse query to identify the recommended query chain. Follow that chain for the hunt
- Substitute entity values (hostnames, IPs, UPNs) and adjust `ago(Nd)` lookback if context-aware expansion applies
- ⚠️ Hostname-safe substitution: Device names vary across tables (short hostname vs FQDN vs uppercase). NEVER use `==` for device/computer filters; use `startswith` (default, case-insensitive, matches both short name and FQDN), or `in~` (multi-device). Override `==` in query file entity substitution notes with `startswith`.
- Execute using the file's exact tables, columns, and filters
- If supplementing with additional tables, execute the file's queries first, then add your own — clearly label which are from the file vs. supplementary
| Action | Status |
|---|---|
| Reading a query file then writing queries against a different table | ❌ PROHIBITED |
| Using the query file as "inspiration" and rewriting from scratch | ❌ PROHIBITED |
| Executing the file's queries verbatim with entity substitution | ✅ REQUIRED |
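The hostname-safe substitution rule above can be sketched as a small helper that builds the filter clause before it is pasted into a query. This is a minimal illustration, not part of the skill: the function name `device_filter`, the default column name, and the hostname character check are assumptions.

```python
import re

# Conservative hostname pattern (assumption): letters, digits, dot, dash, underscore.
_HOSTNAME = re.compile(r"^[A-Za-z0-9._-]+$")


def device_filter(names: list[str], column: str = "DeviceName") -> str:
    """Build a KQL device filter that never uses `==`.

    One name   -> `startswith` (matches short hostname and FQDN, case-insensitive).
    Many names -> `in~` (case-insensitive set membership).
    """
    if not names:
        raise ValueError("at least one device name is required")
    for name in names:
        if not _HOSTNAME.match(name):
            # Refuse anything that could break out of the quoted KQL string.
            raise ValueError(f"unexpected characters in device name: {name!r}")
    if len(names) == 1:
        return f'{column} startswith "{names[0]}"'
    quoted = ", ".join(f'"{name}"' for name in names)
    return f"{column} in~ ({quoted})"
```

Note that `in~` compares whole values, so for multi-device filters the names still need to be in the form the target table stores (short name or FQDN).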
🔍 Skill Drill-Down Execution Rule
⛔ MANDATORY — applies to ALL 🔍 skill drill-down executions in Phase 4.
When executing a skill drill-down, load the child skill's SKILL.md and use its validated queries. Do NOT write ad-hoc queries from memory — schema hallucination (wrong column names, wrong table) is the #1 drill-down failure mode.
- Load the child skill's SKILL.md
- Match the trigger context (TP Q number) against the skill's Investigation shortcuts section to identify the relevant query chain
- Execute the shortcut query chain: substitute only entity placeholders and date ranges. Do NOT add columns, change `project`/`summarize by`, or restructure. Column names vary across Device* tables; the SKILL.md queries already use the correct ones.
- For quick triage: run only the shortcut chain. For deep investigation: run the full skill workflow
| Action | Status |
|---|---|
| Writing ad-hoc KQL without loading the child SKILL.md | ❌ PROHIBITED |
| Loading SKILL.md then modifying its queries (adding/changing columns, restructuring) | ❌ PROHIBITED |
| Using SKILL.md queries verbatim with entity substitution | ✅ REQUIRED |
🎬 Take Action — Portal-Ready Remediation Blocks
⚠️ AI-generated content may be incorrect. Always review Take Action queries and portal links for accuracy before executing remediation actions.
After every non-✅ drill-down that surfaces actionable entities, append a 🎬 Take Action section with direct portal links (single entities) or Advanced Hunting queries (bulk entities). Ref: Take action on AH results
Every 🎬 Take Action heading in the output — this one and every subsequent one — MUST be immediately followed by the AI-content warning blockquote above.
Skip when: verdict is ✅/🔵, or the action was already taken (e.g., ZAP purged emails).
Single Entity vs Bulk Entity Decision Rule
The remediation format depends on how many entities need action.
| Scenario | Format |
|---|---|
| 1 entity (user, device, IP, domain, hash) | Direct Defender XDR portal link (see Portal Links table for URL patterns) |
| 2+ emails | AH query with NetworkMessageId in (...) → Take actions |
| 2+ devices | AH query with DeviceName in~ (...) → Take actions |
| 2+ IPs/domains/hashes | AH query → click value in results → Add Indicator (allow/warn/block) |
⛔ PROHIBITED: Generating an AH query for a single entity when a direct portal link would suffice. AH Take Action is for bulk remediation — for a single entity, link directly to the portal page where the analyst can act.
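The decision rule above can be sketched as a lookup on entity kind and count. This is an illustrative sketch only: the function name, the kind labels, and the return labels are hypothetical. Two behaviours are assumptions rather than statements from the table: users always get per-user portal links (identity remediation has no AH Take Action), and emails always use an AH query because no email portal link pattern is listed.

```python
from typing import Literal

# Entity kinds named in the decision table; the string labels are illustrative.
EntityKind = Literal["user", "device", "email", "ip", "domain", "hash"]


def remediation_format(kind: EntityKind, count: int) -> str:
    """Return which remediation format to emit for `count` entities of one kind."""
    if count < 1:
        raise ValueError("at least one entity is required")
    if kind == "user":
        # Assumption: identities are always linked one by one, never via AH.
        return "portal_link"
    if kind == "email":
        # Assumption: no single-email portal pattern exists, so always use AH.
        return "ah_take_actions"
    if count == 1:
        return "portal_link"
    if kind == "device":
        return "ah_take_actions"
    # 2+ IPs, domains, or hashes: AH query, then click a value to Add Indicator.
    return "ah_add_indicator"
```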
ID sources (agent retrieves silently — never ask the user):
- User OID: Graph `/v1.0/users/<UPN>?$select=id` or `IdentityInfo.AccountObjectId`
- MDE DeviceId: `DeviceInfo.DeviceId` or `GetDefenderMachine` API
- SHA / NetworkMessageId / etc.: from the originating AH table
⛔ Never emit prompts like "Retrieve the DeviceId" — run the lookup and emit the finished link in the same turn.
Required Columns per Entity Type
Missing a required column silently disables the action menu. Always include these:
| Entity | Required Columns | Actions | Notes |
|---|---|---|---|
| 📧 Email | `NetworkMessageId`, `RecipientEmailAddress` | Soft/hard delete, move to folder, submit to Microsoft, initiate investigation | Do NOT use `project`: Submit to Microsoft and Initiate Automated Investigation require undocumented columns that `project` strips, silently greying out those options. The portal's Show empty columns toggle only works when columns exist in the result schema. Return all columns; use `where` to scope results. |
| 💻 Device | `DeviceId` | Isolate, collect investigation package, AV scan, initiate investigation, restrict app execution | Use `summarize arg_max(Timestamp, *) by DeviceId` for latest state |
| 📁 File | `SHA1` or `SHA256` + `DeviceId` | Quarantine file | Both hash and device required |
| 🔗 Indicator | IP, URL/domain, or SHA hash column | Add indicator: allow, warn, or block | An AH query is still required to surface the values as clickable; there is no Take actions dropdown button. Instead, click any IP/URL/hash value directly in the AH results → Add indicator to create a Defender for Endpoint custom indicator |
| 🔐 Identity | (No AH Take Action) | Confirm compromised, revoke sessions, suspend in app | Single user: Direct Defender XDR Identity page link. Never generate an AH query for identity remediation |
Template Queries
📧 Email — by NetworkMessageId: (no project — see Email row above)
EmailEvents
| where Timestamp > ago(7d)
| where NetworkMessageId in ("<id1>", "<id2>")
→ Take actions → Move to mailbox folder, Delete email (soft/hard), Submit to Microsoft, Initiate automated investigation
📧 Email — by compromised sender domain:
EmailEvents
| where Timestamp > ago(30d)
| where SenderFromDomain =~ "<domain>" and ThreatTypes has "Phish" and DeliveryAction == "Delivered"
| take 500
→ Take actions → Move to mailbox folder, Delete email (soft/hard), Submit to Microsoft, Initiate automated investigation
💻 Single Device — direct portal link:
Link to the Defender XDR machine page. If DeviceId isn't in context, look it up yourself:
DeviceInfo | where DeviceName startswith '<name>' | summarize arg_max(Timestamp, *) by DeviceId | project DeviceId
Then emit: [<DeviceName>](https://security.microsoft.com/machines/v2/<DeviceId>?tid=<tenant_id>). Never fabricate URLs with ?DeviceName=, /machines?, or bare hostnames.
→ Machine page → Response actions → Isolate device, Collect investigation package, Run antivirus scan, Initiate investigation, Restrict app execution
💻 Bulk Devices (2+) — AH query:
DeviceInfo
| where Timestamp > ago(1d)
| where DeviceName in~ ("<device1>", "<device2>")
| summarize arg_max(Timestamp, *) by DeviceId
| project DeviceId, DeviceName, OSPlatform, MachineGroup
→ Take actions → Isolate device, Collect investigation package, Run antivirus scan, Initiate investigation, Restrict app execution
📁 File — by hash:
Source-aware table selection. SHA hashes appear across many tables (`DeviceProcessEvents`, `DeviceImageLoadEvents`, `DeviceFileEvents`, `AlertEvidence`). Use `DeviceFileEvents` as the default: it captures file writes and has the columns needed for Quarantine. If the hash was only observed via process execution (no separate file write event), substitute or union with `DeviceProcessEvents`. The Quarantine action requires `DeviceId` + `SHA1`/`SHA256` regardless of source table.
File write events (default — DeviceFileEvents):
DeviceFileEvents
| where Timestamp > ago(7d)
| where SHA1 == "<hash>" or SHA256 == "<hash>"
| project DeviceId, DeviceName, SHA1, SHA256, FileName, FolderPath
Process execution events (when file write not captured — DeviceProcessEvents):
DeviceProcessEvents
| where Timestamp > ago(7d)
| where SHA1 == "<hash>" or SHA256 == "<hash>"
| project DeviceId, DeviceName, SHA1, SHA256, FileName, FolderPath, ProcessCommandLine
→ Take actions → Quarantine file
🔗 Bulk Indicators (2+ IPs/domains/hashes) — AH query for Add Indicator:
When blocking multiple IPs, domains, or hashes, provide an AH query that surfaces the values as clickable columns. There is no Take actions dropdown — the analyst clicks each value directly in results → Add indicator.
Source-aware table selection. The table MUST match where the IPs were originally discovered. `DeviceNetworkEvents` is the default for network-layer IPs (endpoint connections, firewall events). However, IPs from authentication-layer sources (`AADUserRiskEvents`, `EntraIdSignInEvents`, `SigninLogs`, `AADServicePrincipalSignInLogs`) may never appear in endpoint network events; querying `DeviceNetworkEvents` for those returns 0 results. Use the originating table so the analyst sees the IPs in context and can click to add indicators.
Network-layer IPs (from DeviceNetworkEvents, DeviceLogonEvents, firewall logs):
// Surface attacker IPs as clickable values for Add Indicator
DeviceNetworkEvents
| where Timestamp > ago(7d)
| where RemoteIP in ("<ip1>", "<ip2>", "<ip3>")
| summarize Connections = count(), Ports = make_set(LocalPort) by RemoteIP
| order by Connections desc
Auth-layer IPs (from AADUserRiskEvents, EntraIdSignInEvents, SigninLogs):
// Surface attacker IPs from sign-in/risk events for Add Indicator
EntraIdSignInEvents
| where Timestamp > ago(30d)
| where IPAddress in ("<ip1>", "<ip2>", "<ip3>")
| summarize SignIns = count(), Users = dcount(AccountUpn), Countries = make_set(Country, 5) by IPAddress
| order by SignIns desc
→ Click any IP value in results → Add indicator → Block and remediate
Variant — domains/URLs:
DeviceNetworkEvents
| where Timestamp > ago(7d)
| where RemoteUrl has_any ("<domain1>", "<domain2>")
| summarize Connections = count() by RemoteUrl
| order by Connections desc
→ Click any RemoteUrl value → Add indicator → Block and remediate
Defender XDR Portal Links — All Entity Types
🔴 Every entity (user, domain, URL, IP, file hash) in action/recommendation tables MUST be a clickable Defender XDR portal link — the entity name IS the link. Do NOT add a separate "Portal" column or leave entities as plain text. VS Code renders bare UPNs as mailto: and bare URLs/IPs as broken links.
| Entity | URL Pattern | Example |
|---|---|---|
| User | `https://security.microsoft.com/user?aad=<OID>&upn=<UPN>&tab=overview&tid=<tenant_id>` | `[user@contoso.com](https://security.microsoft.com/user?aad=<OID>&upn=user@contoso.com&tab=overview&tid=<tenant_id>)` |
| Domain | `https://security.microsoft.com/domains/overview?urlDomain=<domain>&tid=<tenant_id>` | `[contoso.com](https://security.microsoft.com/domains/overview?urlDomain=contoso.com&tid=<tenant_id>)` |
| URL | `https://security.microsoft.com/url/overview?url=<url-encoded-URL>&tid=<tenant_id>` | `[example.com/path](https://security.microsoft.com/url/overview?url=http%3A%2F%2Fexample.com%2Fpath&tid=<tenant_id>)` |
| IP | `https://security.microsoft.com/ip/<IP>/overview?tid=<tenant_id>` | `[<IP>](https://security.microsoft.com/ip/<IP>/overview?tid=<tenant_id>)` |
| File Hash | `https://security.microsoft.com/file/<SHA1-or-SHA256>/?tid=<tenant_id>` | `[da5e459...b1bb1e](https://security.microsoft.com/file/da5e45915354850261cf0e87dc7af19597b1bb1e/?tid=<tenant_id>)` |
| Device | `https://security.microsoft.com/machines/v2/<MDE_DeviceId>?tid=<tenant_id>` | `[<DeviceName>](https://security.microsoft.com/machines/v2/<MDE_DeviceId>?tid=<tenant_id>)` |
| SPN / Non-Human Identity | `https://security.microsoft.com/identity-inventory?tab=NonHumanIdentities&tid=<tenant_id>` | `[Non-Human Identities Inventory](https://security.microsoft.com/identity-inventory?tab=NonHumanIdentities&tid=<tenant_id>)` |
User fallbacks: ?upn=<UPN> when ObjectId is unavailable; ?sid=<SID>&accountName=<Name>&accountDomain=<Domain> for on-prem AD.
Device ID source: DeviceId from the DeviceInfo AH table or the id field from GetDefenderMachine API. This is the MDE machine identifier — NOT the Entra Device Object ID (which is different). The computer-investigation skill retrieves this in Step 1b.
🔴 Portal URL Allowlist — No Invented Paths. The 7 patterns above plus /v2/advanced-hunting?tid=<tenant_id> are the ONLY security.microsoft.com URLs you may emit. For any other action (Custom Indicators, Safe Links policy, Email Explorer, CA policy editor, Secure Score, etc.), write a textual breadcrumb — e.g., "Defender XDR → Settings → Endpoints → Indicators → URLs/Domains → Add item". Never guess a path from memory.
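One way to keep output inside the allowlist is to build every link from a single function that knows only the approved patterns and refuses anything else. The sketch below is an assumption-laden illustration: the function names, the kind labels, and the `oid` parameter are hypothetical, and the user fallback appends `&tab=overview` in the same way the Query 3 `PortalUrl` expression does.

```python
from urllib.parse import quote

BASE = "https://security.microsoft.com"


def portal_url(kind: str, value: str, tenant_id: str, oid: str = "") -> str:
    """Return a Defender XDR portal URL for one of the approved entity patterns."""
    tid = f"tid={tenant_id}"
    if kind == "user":
        # Fall back to the UPN-only form when no Entra ObjectId is available.
        ident = f"aad={oid}&upn={value}" if oid else f"upn={value}"
        return f"{BASE}/user?{ident}&tab=overview&{tid}"
    if kind == "domain":
        return f"{BASE}/domains/overview?urlDomain={value}&{tid}"
    if kind == "url":
        # The URL pattern expects the full URL percent-encoded, slashes included.
        return f"{BASE}/url/overview?url={quote(value, safe='')}&{tid}"
    if kind == "ip":
        return f"{BASE}/ip/{value}/overview?{tid}"
    if kind == "file":
        return f"{BASE}/file/{value}/?{tid}"
    if kind == "device":
        return f"{BASE}/machines/v2/{value}?{tid}"
    if kind == "spn":
        # Inventory page only; `value` is not part of this pattern.
        return f"{BASE}/identity-inventory?tab=NonHumanIdentities&{tid}"
    # Anything else must be written as a textual breadcrumb, not a URL.
    raise ValueError(f"no approved portal pattern for entity kind {kind!r}")


def portal_link(label: str, kind: str, value: str, tenant_id: str, oid: str = "") -> str:
    """Wrap the entity name in a markdown link so the name itself is the link."""
    return f"[{label}]({portal_url(kind, value, tenant_id, oid)})"
```

For example, `portal_link("evil.com", "domain", "evil.com", "<tenant_id>")` yields the action-table form shown in the Entity Display table.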
Entity Display — Portal Link vs Defang (Mutually Exclusive)
| Context | Treatment | Example |
|---|---|---|
| Action / Take Action / recommendation tables | Wrap entity name in portal link (from table above). Never defang. | [evil.com](https://security.microsoft.com/domains/overview?urlDomain=evil.com&tid=<tenant_id>) |
| Data / results tables (raw query output) | Defang entity as plain text. Never portal-link. | hxxps://evil[.]com/path |
Defang rules: `http://` → `hxxp://`, `https://` → `hxxps://`, `.` in domain → `[.]`. VS Code auto-linkifies anything URL-shaped, which is why defanging is required in data tables. Conversely, a portal-linked entity has the portal URL as the link target, so linkification is safe; defanging would just break the link.
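The defang rules can be sketched as a short function for data-table cells. This is a minimal sketch under stated assumptions: the name `defang` is hypothetical, only the host part has its dots bracketed (the path is left intact, matching the `hxxps://evil[.]com/path` example), and bare domains, IPs, and UPNs are treated the same way as hosts.

```python
import re

_URL = re.compile(r"^(https?)://([^/]*)(.*)$", re.IGNORECASE | re.DOTALL)


def defang(entity: str) -> str:
    """Defang a URL, domain, or IP for display in a data/results table."""
    match = _URL.match(entity)
    if match:
        scheme, host, rest = match.groups()
        safe_scheme = "hxxps" if scheme.lower() == "https" else "hxxp"
        return f"{safe_scheme}://{host.replace('.', '[.]')}{rest}"
    # No scheme: treat everything before the first slash as the host part.
    host, sep, rest = entity.partition("/")
    return host.replace(".", "[.]") + sep + rest
```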
Rules Summary
| Rule | Status |
|---|---|
| Every 🎬 Take Action heading immediately followed by the AI-content warning blockquote | ✅ REQUIRED |
| Single entity → direct portal link (never an AH query) | ✅ REQUIRED |
| 2+ entities → AH query with Take Actions, all required columns present, no `project` on emails | ✅ REQUIRED |
| Every AH query includes BOTH a fenced `kql` code block AND a plain `[Run in Advanced Hunting](https://security.microsoft.com/v2/advanced-hunting?tid=<tenant_id>)` link below it | ✅ REQUIRED |
| Action tables: entity = clickable portal link (from the 7 approved patterns). No separate "Portal" column, no defanging. | ✅ REQUIRED |
| Data tables: entity = defanged plain text. No portal linking. | ✅ REQUIRED |
| Textual breadcrumb ("Defender XDR → …") when no approved portal URL pattern covers the action | ✅ REQUIRED |
| Emitting any security.microsoft.com URL outside the 7 approved patterns + `/v2/advanced-hunting` | ❌ PROHIBITED |
| Generating gzip/base64-encoded AH deep links via `kql_to_ah_url.py` for output | ❌ PROHIBITED |
| Non-✅ drill-down surfaces actionable entities but no Take Action block | ❌ PROHIBITED |
Sample KQL Queries
All queries below are verified against live Sentinel/Defender XDR schemas. Use them exactly as written. Lookback periods use `ago(Nd)`; substitute the user's preferred lookback where noted.
Query 1: Open Incidents with Severity-Ranked Backfill & MITRE Techniques
🔴 Incident hygiene — Surfaces unresolved incidents prioritized by severity (Critical → High → Medium → Low), with age, owner, alert count, MITRE tactics, MITRE technique IDs, and extracted entity names (accounts + devices) for cross-query correlation. In large environments, all 10 slots fill with High/Critical. In smaller environments, Medium/Low backfill remaining slots automatically.
Tool: RunAdvancedHuntingQuery
let OpenIncidents = SecurityIncident
| where TimeGenerated > ago(30d)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| where Status in ("New", "Active");
let TotalHighCritical = toscalar(OpenIncidents | where Severity in ("High", "Critical") | count);
let TotalAll = toscalar(OpenIncidents | count);
OpenIncidents
| extend SevRank = case(Severity == "Critical", 0, Severity == "High", 1, Severity == "Medium", 2, Severity == "Low", 3, 4)
| extend ParsedLabels = parse_json(Labels)
| mv-apply Label = ParsedLabels on (
summarize Tags = make_set(tostring(Label.labelName), 5)
)
| extend Tags = set_difference(Tags, dynamic([""]))
| mv-expand AlertId = AlertIds | extend AlertId = tostring(AlertId)
| join kind=leftouter (
SecurityAlert
| where TimeGenerated > ago(30d)
| summarize arg_max(TimeGenerated, Entities, Tactics, Techniques, AlertName, AlertSeverity) by SystemAlertId
| extend ParsedEntities = parse_json(Entities)
| mv-expand Entity = ParsedEntities
| extend EntityType = tostring(Entity.Type),
AccountUPN = case(
tostring(Entity.Type) == "account" and isnotempty(tostring(Entity.UPNSuffix)),
tolower(strcat(tostring(Entity.Name), "@", tostring(Entity.UPNSuffix))),
tostring(Entity.Type) == "account" and isnotempty(tostring(Entity.AadUserId)),
tostring(Entity.AadUserId),
""),
HostName = iff(tostring(Entity.Type) == "host", tolower(tostring(Entity.HostName)), "")
| project SystemAlertId, Tactics, Techniques, AlertName, AlertSeverity, AccountUPN, HostName
) on $left.AlertId == $right.SystemAlertId
| mv-expand Technique = parse_json(Techniques)
| extend Technique = tostring(Technique)
| extend TacticsSplit = split(Tactics, ", ")
| mv-expand Tactic = TacticsSplit
| extend Tactic = tostring(Tactic)
| summarize
Tactics = make_set(Tactic),
Techniques = make_set(Technique),
AlertNames = make_set(AlertName, 5),
AlertCount = dcount(AlertId),
Accounts = make_set(AccountUPN, 5),
Devices = make_set(HostName, 5),
Tags = take_any(Tags)
by ProviderIncidentId, Title, Severity, SevRank, Status, CreatedTime,
OwnerUPN = tostring(Owner.userPrincipalName)
| extend Techniques = set_difference(Techniques, dynamic([""]))
| extend Tactics = set_difference(Tactics, dynamic([""]))
| extend Accounts = set_difference(Accounts, dynamic([""]))
| extend Devices = set_difference(Devices, dynamic([""]))
| extend AgeDisplay = case(
datetime_diff('minute', now(), CreatedTime) < 60, strcat(datetime_diff('minute', now(), CreatedTime), "m ago"),
datetime_diff('hour', now(), CreatedTime) < 24, strcat(datetime_diff('hour', now(), CreatedTime), "h ago"),
strcat(datetime_diff('day', now(), CreatedTime), "d ago"))
| extend PortalUrl = strcat("https://security.microsoft.com/incidents/", ProviderIncidentId, "?tid=<TENANT_ID>")
| extend TotalHighCritical = TotalHighCritical, TotalAll = TotalAll
| project TotalHighCritical, TotalAll, ProviderIncidentId, Title, Severity, SevRank, AgeDisplay, AlertCount,
OwnerUPN, Tactics, Techniques, Accounts, Devices, Tags, PortalUrl, AlertNames, CreatedTime
// --- Deduplicate by Title: keep one representative incident per title for variety ---
| as AllOpenIncidents
| join kind=leftouter (
AllOpenIncidents | summarize TitleDupCount = count() by Title
) on Title
| project-away Title1
| order by Title asc, SevRank asc, bin(CreatedTime, 1d) desc, AlertCount desc
| extend _rn = row_number(1, prev(Title) != Title)
| where _rn == 1
| project-away _rn
| order by SevRank asc, bin(CreatedTime, 1d) desc, AlertCount desc
| take 10
Purpose: Top 10 open incidents with severity-ranked backfill (Critical→High→Medium→Low). In large envs, all slots fill with High/Critical; small envs backfill with Medium/Low. TotalHighCritical and TotalAll drive the adaptive report header ("Showing 10 of {TotalAll} open incidents ({TotalHighCritical} High/Critical)") and are computed across all open incidents pre-dedup, so header counts stay accurate. The list is deduplicated by Title so the top 10 shows distinct incident types rather than near-identical rows — in noisy envs a single recurring title (password-spray, DLP rule) can otherwise monopolize all 10 slots; the single highest-priority representative per title is kept (severity → newest day → alert count) and TitleDupCount preserves the volume signal. Joins SecurityAlert for MITRE tactics/techniques and extracts Accounts (UPN or AAD ObjectId, lowercased), Devices (hostname, lowercased), and Tags (from Labels — both AutoAssigned ML classifications and User-applied SOC tags) — each capped at 5 per incident — for cross-query correlation with Q3/Q4/Q6/Q7/Q12. Flags unassigned incidents (empty OwnerUPN).
Sort: SevRank asc, bin(CreatedTime, 1d) desc, AlertCount desc — severity tier first, then calendar day (newest first), then complexity within each day.
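The dedup-then-rank behaviour described above can be mirrored outside KQL to sanity-check results. The following is a rough sketch, not the query itself: the `Incident` type and `top_incidents` function are hypothetical, and `CreatedTime` is assumed to be a naive UTC `datetime` so that `.date()` stands in for `bin(CreatedTime, 1d)`.

```python
from collections import Counter
from datetime import datetime
from typing import TypedDict


class Incident(TypedDict):
    Title: str
    SevRank: int          # 0 = Critical ... 3 = Low, as in the query
    CreatedTime: datetime  # assumed naive UTC
    AlertCount: int


def _priority(inc: Incident) -> tuple[int, int, int]:
    # Severity tier first, then newest calendar day, then highest alert count.
    day = inc["CreatedTime"].date().toordinal()
    return (inc["SevRank"], -day, -inc["AlertCount"])


def top_incidents(incidents: list[Incident], limit: int = 10) -> list[dict]:
    """Keep one representative per Title, then rank and cap the list."""
    best: dict[str, Incident] = {}
    for inc in incidents:
        current = best.get(inc["Title"])
        if current is None or _priority(inc) < _priority(current):
            best[inc["Title"]] = inc
    # TitleDupCount preserves how many incidents shared each title.
    dup_count = Counter(inc["Title"] for inc in incidents)
    rows = [dict(inc, TitleDupCount=dup_count[inc["Title"]]) for inc in best.values()]
    return sorted(rows, key=_priority)[:limit]
```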
Verdict logic:
- 🔴 Escalate: 5+ new High/Critical in 24h, OR any incident with `AlertCount > 50`, OR any unassigned High/Critical with CredentialAccess/LateralMovement tactics
- 🟠 Investigate: Any unassigned High/Critical, OR `AlertCount > 10`, OR multiple High/Critical in <6h
- 🟡 Monitor: Only Medium/Low incidents exist (no High/Critical), OR High/Critical assigned with low alert count
- ✅ Clear: 0 open incidents of any severity (Q2 closed summary still renders as context)
Query 2: Closed Incident Summary (7-Day Lookback)
🔴 Threat landscape context — Even when all incidents are resolved, the classification breakdown, MITRE tactic distribution, and severity mix from recent closures provide actionable signals for cross-correlation and query file recommendations.
Tool: RunAdvancedHuntingQuery
Always runs in parallel with Q1 — not conditional on Q1 results.
SecurityIncident
| where CreatedTime > ago(7d)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| where Status == "Closed"
| where array_length(AlertIds) > 0
| mv-expand AlertId = AlertIds | extend AlertId = tostring(AlertId)
| join kind=leftouter (
SecurityAlert
| where TimeGenerated > ago(30d)
| summarize arg_max(TimeGenerated, Tactics, Techniques) by SystemAlertId
| project SystemAlertId, Tactics, Techniques
) on $left.AlertId == $right.SystemAlertId
| mv-expand Technique = parse_json(Techniques)
| extend Technique = tostring(Technique)
| extend TacticsSplit = split(Tactics, ", ")
| mv-expand Tactic = TacticsSplit
| extend Tactic = tostring(Tactic)
| summarize
Total = dcount(IncidentNumber),
TruePositive = dcountif(IncidentNumber, Classification == "TruePositive"),
BenignPositive = dcountif(IncidentNumber, Classification == "BenignPositive"),
FalsePositive = dcountif(IncidentNumber, Classification == "FalsePositive"),
Undetermined = dcountif(IncidentNumber, Classification == "Undetermined"),
HighCritical = dcountif(IncidentNumber, Severity in ("High", "Critical")),
MediumLow = dcountif(IncidentNumber, Severity in ("Medium", "Low")),
Tactics = make_set(Tactic),
Techniques = make_set(Technique)
| extend Techniques = set_difference(Techniques, dynamic([""]))
| extend Tactics = set_difference(Tactics, dynamic([""]))
Purpose: Provides a 7-day closed incident summary with classification breakdown (TP/BP/FP/Undetermined), severity distribution, aggregated MITRE tactics, and aggregated MITRE technique IDs. Uses CreatedTime (not TimeGenerated) to match portal "created in last 7 days" semantics — TimeGenerated captures any incident updated in the window, inflating counts with old incidents. Filters array_length(AlertIds) > 0 to exclude phantom incidents — the SecurityIncident table contains hundreds of records synced from XDR with empty AlertIds that never surface in the Defender XDR portal queue (see copilot-instructions.md Known Table Pitfalls). This data feeds three downstream uses:
- TP rate signal: High TruePositive ratio indicates an active threat environment
- MITRE tactic context: Tactics from closed TPs identify the current threat landscape for cross-correlation with Q3/Q7/Q8 findings
- Manifest MITRE matching: The `Techniques` array contains ATT&CK technique IDs (e.g., `T1566`, `T1078`, `T1059`) directly matchable against manifest entry `mitre` fields. No tactic→technique mapping needed; the technique IDs are the primary matching key for query file recommendations
Verdict logic:
- 🟠 Investigate: `TruePositive / Total > 0.5` (majority of closures are real threats, i.e. an active threat environment)
- 🟡 Monitor: Any TruePositive closures exist, or `Undetermined > 0` (some incidents lack classification)
- ✅ Clear: 0 TruePositive closures; all closures are BenignPositive or FalsePositive
- 🔵 Informational: 0 closed incidents in 7d
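The Q2 verdict tiers can be expressed as a small function over the summary row. This is a sketch for illustration; the function name and argument names are assumptions, and the inputs correspond to the `Total`, `TruePositive`, and `Undetermined` columns the query returns.

```python
def closed_incident_verdict(total: int, true_positive: int, undetermined: int) -> str:
    """Map the Q2 summary counts to a verdict label."""
    if total == 0:
        return "🔵 Informational"
    if true_positive / total > 0.5:
        # Strictly more than half of closures were real threats.
        return "🟠 Investigate"
    if true_positive > 0 or undetermined > 0:
        return "🟡 Monitor"
    return "✅ Clear"
```

Checking `total == 0` first avoids a division by zero and matches the Informational tier.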
Rendering rules:
- Always render Q2 results in the report, regardless of Q1 verdict
- In the Dashboard Summary, Q2 gets its own row. In Detailed Findings, render Q2 immediately after Q1 as a compact summary block
- Flatten the `Tactics` and `Techniques` arrays and report distinct values from TruePositive incidents
- The `Techniques` array feeds directly into the Query File Recommendations manifest MITRE matching (no tactic→technique translation needed)
- If 0 closed incidents in 7d, display: "No incidents closed in the last 7 days"
Query 3: Identity Risk Posture & Risk Event Enrichment
🔐 Identity risk posture — Hybrid two-signal query: IdentityInfo.RiskScore (Defender XDR composite, 0-100) captures alert-chain and MITRE-stage risk, while RiskLevel/RiskStatus (Identity Protection) captures sign-in anomalies and AI-driven signals. Uses both because they are independent engines — a user can have RiskScore=93 with Remediated IdP status, or RiskScore=0 with High/AtRisk IdP status. AADUserRiskEvents enriches with the specific detections explaining why they're flagged.
Tool: RunAdvancedHuntingQuery
let lookback = 7d;
// Layer 1: IdentityInfo — hybrid filter (Defender RiskScore + IdP RiskLevel/Status + Criticality)
let IdentityPosture = IdentityInfo
| where Timestamp > ago(lookback)
| summarize arg_max(Timestamp, *) by AccountUpn
| where RiskScore >= 71
or RiskLevel in ("High", "Medium")
or RiskStatus in ("AtRisk", "ConfirmedCompromised")
or CriticalityLevel >= 3;
// Layer 2: AADUserRiskEvents — enrichment (the why)
let UserRiskEvents = AADUserRiskEvents
| where TimeGenerated > ago(lookback)
| extend Country = tostring(parse_json(Location).countryOrRegion)
| summarize
RiskDetections = count(),
HighCount = countif(RiskLevel == "high"),
TopRiskEventTypes = make_set(RiskEventType, 8),
TopCountries = make_set(Country, 5),
LatestDetection = max(TimeGenerated)
by UserPrincipalName;
// IdentityInfo drives, AADUserRiskEvents enriches
IdentityPosture
| join hint.strategy=broadcast kind=leftouter (UserRiskEvents)
on $left.AccountUpn == $right.UserPrincipalName
| extend
DisplayName = coalesce(AccountDisplayName, AccountName, AccountUpn),
PortalUrl = strcat("https://security.microsoft.com/user?",
case(
isnotempty(AccountObjectId), strcat("aad=", AccountObjectId, "&upn=", AccountUpn),
isnotempty(OnPremSid), strcat("sid=", OnPremSid, "&accountName=", AccountName,
"&accountDomain=", AccountDomain),
isnotempty(AccountUpn), strcat("upn=", AccountUpn),
""),
"&tab=overview&tid=<TENANT_ID>")
| project DisplayName, PortalUrl, RiskScore, RiskLevel, RiskStatus, CriticalityLevel,
RiskDetections = coalesce(RiskDetections, long(0)),
HighCount = coalesce(HighCount, long(0)),
TopRiskEventTypes, TopCountries, LatestDetection
| order by RiskScore desc, HighCount desc, RiskDetections desc, CriticalityLevel desc
| take 15
Purpose: RiskScore (int, 0-100) is the Defender XDR composite score on IdentityInfo — factors include alert chains, MITRE stage progression, and asset criticality. Portal thresholds: 71-90 = High, 91-100 = Critical. RiskLevel/RiskStatus are Identity Protection signals (sign-in anomalies, leaked creds, AI signals) — a separate engine that doesn't always agree with RiskScore. The hybrid OR filter ensures users flagged by either engine surface. Users with both signals firing are highest priority (corroborated).
Output columns: DisplayName (linked to Defender XDR Identity page via PortalUrl), RiskScore (0-100, primary sort), RiskLevel, RiskStatus, CriticalityLevel, RiskDetections (count), HighCount, TopRiskEventTypes, TopCountries, LatestDetection.
Portal URL resolution: Three-tier fallback for identity environment coverage:
- Cloud/Hybrid (has Entra ObjectId): `aad=<ObjectId>&upn=<UPN>`
- On-prem AD (SID only, no Entra sync): `sid=<SID>&accountName=<Name>&accountDomain=<Domain>`
- External IdP (UPN only, e.g., CyberArk/Okta): `upn=<UPN>`
Report rendering: Show top 10 users in the dashboard table. Use DisplayName as clickable link text with PortalUrl as the target. If >10 results, note "+N more — drill down with user-investigation skill". For each user, render RiskScore and TopRiskEventTypes as the key risk indicators.
Verdict logic:
- 🔴 Escalate: Any user with `RiskScore >= 91`, or `ConfirmedCompromised` status, or `HighCount > 3`, or multiple users with `HighCount > 0`
- 🟠 Investigate: `RiskScore >= 71`, or `HighCount > 0` for any user, or any user `AtRisk` with risk events indicating `aiCompoundAccountRisk`, `impossibleTravel`, or `maliciousIPAddress`
- 🟡 Monitor: Only `Medium` risk users with low-severity risk event types (e.g., `unfamiliarFeatures`)
- ✅ Clear: 0 users matching the hybrid filter
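The Q3 verdict tiers can be sketched as a function over the result rows. Treat this as one possible reading, not the skill's definitive logic: the function name is hypothetical, "multiple users" is taken to mean two or more, null `RiskScore`/`HighCount` values are treated as 0, and any non-empty result that reaches neither of the first two tiers falls through to Monitor.

```python
from typing import Iterable, Mapping

# Risk event types the Investigate tier calls out for AtRisk users.
STRONG_EVENT_TYPES = {"aiCompoundAccountRisk", "impossibleTravel", "maliciousIPAddress"}


def identity_verdict(users: Iterable[Mapping]) -> str:
    """Map Q3 result rows (dicts keyed by output column name) to a verdict label."""
    rows = list(users)
    if not rows:
        return "✅ Clear"

    def score(u: Mapping) -> int:
        return u.get("RiskScore") or 0

    def high(u: Mapping) -> int:
        return u.get("HighCount") or 0

    users_with_high = sum(1 for u in rows if high(u) > 0)
    if users_with_high >= 2 or any(
        score(u) >= 91 or u.get("RiskStatus") == "ConfirmedCompromised" or high(u) > 3
        for u in rows
    ):
        return "🔴 Escalate"
    if any(
        score(u) >= 71
        or high(u) > 0
        or (
            u.get("RiskStatus") == "AtRisk"
            and STRONG_EVENT_TYPES & set(u.get("TopRiskEventTypes") or [])
        )
        for u in rows
    ):
        return "🟠 Investigate"
    # Assumption: everything else that matched the hybrid filter is Monitor.
    return "🟡 Monitor"
```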
⚠️ Risk Event Type Routing Guard (Phase 4 drill-down):
- `suspiciousAuthAppApproval` → T1621 MFA Fatigue (suspicious Authenticator push approval patterns), NOT OAuth app consent. Route to `user-investigation` or `authentication-tracing`. NEVER recommend `app-registration-posture` based on this risk event alone
- `mcasSuspiciousInboxManipulationRules` → T1114.003 email exfiltration via inbox rules. Route to `user-investigation` with OfficeActivity drill-down
Query 4: Password Spray / Brute-Force Detection
🔐 Auth spray detection (T1110.003 / T1110.001) — Identifies IPs targeting multiple users with failed auth across Entra ID cloud sign-ins AND RDP/SSH/network logons on endpoints.
Tool: RunAdvancedHuntingQuery
// Step 1: Count spray-specific failures per IP (materialized — referenced twice)
let SprayFailures = materialize(EntraIdSignInEvents
| where Timestamp > ago(7d)
| where ErrorCode in (50126, 50053, 50057)
| summarize
FailedAttempts = count(),
TargetUsers = dcount(AccountUpn),
SampleTargets = make_set(AccountUpn, 5),
FailedApps = make_set(Application, 3),
Countries = make_set(Country, 3)
by SourceIP = IPAddress
| where TargetUsers >= 5);
// Step 2: Get full traffic profile for flagged IPs (success context)
let IPTrafficProfile = EntraIdSignInEvents
| where Timestamp > ago(7d)
| where IPAddress in ((SprayFailures | project SourceIP))
| summarize
TotalSignIns = count(),
Successes = countif(ErrorCode == 0),
TotalDistinctUsers = dcount(AccountUpn),
TotalDistinctApps = dcount(Application)
by SourceIP = IPAddress;
// Step 3: Join and filter — eliminate shared infrastructure false positives
let EntraResults = SprayFailures
| join kind=inner IPTrafficProfile on SourceIP
| extend
SprayRatio = round(FailedAttempts * 100.0 / max_of(TotalSignIns, 1), 1),
SuccessRate = round(Successes * 100.0 / max_of(TotalSignIns, 1), 1)
| where SprayRatio >= 1.0 and TotalDistinctApps < 50
| extend Surface = "Entra ID"
| project SourceIP, FailedAttempts, TargetUsers, SampleTargets,
Protocols = FailedApps, Countries, Surface,
TotalSignIns, Successes, SprayRatio, SuccessRate, TotalDistinctApps;
// Endpoint brute-force — Surface label by LogonType
let EndpointBrute = DeviceLogonEvents
| where Timestamp > ago(7d)
| where ActionType == "LogonFailed"
| where LogonType in ("RemoteInteractive", "Network")
| where isnotempty(RemoteIP)
| summarize
FailedAttempts = count(),
TargetUsers = dcount(AccountName),
SampleTargets = make_set(AccountName, 5),
Protocols = make_set(strcat(LogonType, " → ", DeviceName), 3),
Countries = dynamic(["—"]),
LogonTypes = make_set(LogonType)
by SourceIP = RemoteIP
| where FailedAttempts >= 10
| extend Surface = iff(array_length(LogonTypes) == 1 and LogonTypes[0] == "RemoteInteractive", "Endpoint (RDP)", "Endpoint (Network Logon)"),
TotalSignIns = FailedAttempts, Successes = long(0),
SprayRatio = 100.0, SuccessRate = 0.0, TotalDistinctApps = long(0)
| project-away LogonTypes;
union EntraResults, EndpointBrute
| order by SprayRatio desc, TargetUsers desc, FailedAttempts desc
| take 15
Purpose: Detects password spray (1 IP → many users, MITRE T1110.003) and brute-force (1 IP → high failure count, T1110.001) across two surfaces, with shared infrastructure false-positive filtering:
- Entra ID: Uses `EntraIdSignInEvents` (Advanced Hunting), which merges interactive + non-interactive sign-ins into a single table. Error codes: 50126=bad password, 50053=locked account, 50057=disabled account. The query enriches failure data with the IP's full traffic profile to compute `SprayRatio` (spray failures ÷ total sign-ins) and `TotalDistinctApps`. Two filters eliminate corporate proxies, VPN concentrators, and Azure gateways:
  - `SprayRatio >= 1.0`: spray failures must be ≥1% of the IP's total sign-in volume. A proxy with 500K sign-ins and 77 spray errors → ≈0.015% → filtered. A pure attacker with 77 failures and 0 successes → 100% → kept.
  - `TotalDistinctApps < 50`: IPs serving 50+ distinct applications are shared infrastructure. Real spray targets 1–3 apps.
- Endpoint: RDP (`RemoteInteractive`) and Network Logon (`Network`) failed logons on MDE-enrolled devices. Surface labels: `Endpoint (RDP)` for pure RemoteInteractive, `Endpoint (Network Logon)` for anything involving Network logon type. NLA caveat: RDP with Network Level Authentication generates `LogonType == "Network"` (not `RemoteInteractive`), so `Endpoint (Network Logon)` may be RDP-via-NLA or SMB; the manifest surfaces both `rdp_threat_detection.md` and `smb_threat_detection.md` for drill-down. Threshold of ≥10 failures. No success context available in DeviceLogonEvents for filtering.
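The two Entra ID filters can be checked by hand with the same arithmetic the query uses. The sketch below is illustrative only; the function name is hypothetical, and Python's `round` stands in for KQL's `round(..., 1)`.

```python
def spray_metrics(failed_attempts: int, total_sign_ins: int, distinct_apps: int) -> tuple[float, bool]:
    """Return (SprayRatio, kept) for one source IP.

    SprayRatio is the percentage of the IP's sign-ins that were spray-type
    failures; the IP is kept only if the ratio is at least 1% and the IP
    serves fewer than 50 distinct applications.
    """
    denominator = max(total_sign_ins, 1)  # same divide-by-zero guard as max_of(..., 1)
    spray_ratio = round(failed_attempts * 100.0 / denominator, 1)
    kept = spray_ratio >= 1.0 and distinct_apps < 50
    return spray_ratio, kept
```

With the figures from the text, a proxy with 77 failures in 500,000 sign-ins rounds to a ratio of 0.0 and is filtered, while an attacker IP with 77 failures in 77 sign-ins scores 100.0 and is kept.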
Output columns: SourceIP, FailedAttempts, TargetUsers, SampleTargets, Protocols, Countries, Surface, TotalSignIns, Successes, SprayRatio, SuccessRate, TotalDistinctApps. The SprayRatio and TotalDistinctApps columns provide immediate false-positive triage context.
Verdict logic:
- 🔴 Escalate: Any IP targeting >25 Entra users OR >100 endpoint failures from a single IP
- 🟠 Investigate: Any spray/brute-force pattern detected (meets thresholds)
- 🟡 Monitor: Spray activity detected but below thresholds (e.g., single IP with 3–4 target users, or <10 endpoint failures)
- ✅ Clear: 0 results — no spray/brute-force patterns detected
Drill-down: Use user-investigation skill for targeted users, ioc-investigation for source IPs.
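The two shared-infrastructure filters can be sketched outside KQL as follows. This is a minimal illustration that assumes SprayRatio is a percentage, as the ≥1% description implies; the function names and the proxy's app count of 120 are hypothetical and not part of the skill.

```python
def spray_ratio(spray_failures: int, total_sign_ins: int) -> float:
    """Spray failures as a percentage of the IP's total sign-in volume."""
    if total_sign_ins <= 0:
        return 0.0
    return spray_failures * 100.0 / total_sign_ins


def keep_as_spray_candidate(spray_failures: int, total_sign_ins: int,
                            distinct_apps: int) -> bool:
    """Apply both filters: SprayRatio >= 1.0 and TotalDistinctApps < 50."""
    ratio_ok = spray_ratio(spray_failures, total_sign_ins) >= 1.0
    return ratio_ok and distinct_apps < 50


# Busy corporate proxy: 77 spray errors out of 500K sign-ins (app count assumed).
proxy = keep_as_spray_candidate(77, 500_000, 120)
# Pure attacker IP: 77 failures, no successes, two targeted apps.
attacker = keep_as_spray_candidate(77, 77, 2)
print(proxy, attacker)
```

An IP must pass both checks to stay in the result set, so a proxy is dropped on ratio alone even before its app count is considered.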
Query 5: SPN Behavioral Drift (90d Baseline vs 7d Recent)
🤖 Automation monitoring — Composite drift score across 5 dimensions for service principals, with IPv6 subnet normalization and IPDrift cap.
Tool: mcp_sentinel-data_query_lake (needs >30d lookback)
let BL_Start = ago(97d); let BL_End = ago(7d);
let RC_Start = ago(7d); let RC_End = now();
let BL = AADServicePrincipalSignInLogs
| where TimeGenerated between (BL_Start .. BL_End)
| extend NormalizedIP = case(
IPAddress has ":", strcat_array(array_slice(split(IPAddress, ":"), 0, 3), ":"),
IPAddress)
| summarize
BL_Vol = count(),
BL_Res = dcount(ResourceDisplayName),
BL_IPs = dcount(NormalizedIP),
BL_Loc = dcount(Location),
BL_Fail = dcountif(ResultType, ResultType != "0" and ResultType != 0)
by ServicePrincipalId, ServicePrincipalName;
let RC = AADServicePrincipalSignInLogs
| where TimeGenerated between (RC_Start .. RC_End)
| extend NormalizedIP = case(
IPAddress has ":", strcat_array(array_slice(split(IPAddress, ":"), 0, 3), ":"),
IPAddress)
| summarize
RC_Vol = count(),
RC_Res = dcount(ResourceDisplayName),
RC_IPs = dcount(NormalizedIP),
RC_Loc = dcount(Location),
RC_Fail = dcountif(ResultType, ResultType != "0" and ResultType != 0)
by ServicePrincipalId, ServicePrincipalName;
RC | join kind=inner BL on ServicePrincipalId
| extend
VolDrift = round(RC_Vol * 100.0 / max_of(BL_Vol, 10), 0),
ResDrift = round(RC_Res * 100.0 / max_of(BL_Res, 3), 0),
IPDriftRaw = round(RC_IPs * 100.0 / max_of(BL_IPs, 3), 0),
IPDrift = min_of(round(RC_IPs * 100.0 / max_of(BL_IPs, 3), 0), 300),
LocDrift = round(RC_Loc * 100.0 / max_of(BL_Loc, 2), 0),
FailDrift = round(RC_Fail * 100.0 / max_of(BL_Fail, 5), 0)
| extend DriftScore = round((VolDrift*0.20 + ResDrift*0.25 + IPDrift*0.25 + LocDrift*0.15 + FailDrift*0.15), 0)
| where DriftScore > 120
| order by DriftScore desc
| take 10
Purpose: Identifies service principals with significant behavioral changes from their 90-day baseline.
Tuning notes:
- IPv6 /64 normalization: IPv6 addresses are collapsed to their /64 prefix before counting. Azure PaaS services (Copilot Studio, Playbook Automation) rotate through dozens of fd00: ULA pod addresses within the same cluster; without normalization, each pod IP inflates IPDrift by hundreds of percent.
- IPDrift cap (300%): IPDriftRaw shows the true ratio; IPDrift is capped to prevent IP-only spikes from dominating. Transparent when IPv4-only SPNs have genuine expansion.
- Weights: Volume 20%, Resources 25%, IPs 25%, Locations 15%, Failure Rate 15%.
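To see why the /64 collapse matters for the distinct-IP count, here is a small sketch. It is not the query's implementation: the KQL splits the address string on ":" and keeps the leading hextets, whereas this version uses Python's ipaddress module. The sample addresses are made up.

```python
import ipaddress


def normalize_ip(ip: str) -> str:
    """Collapse IPv6 addresses to their /64 network; leave IPv4 unchanged."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 6:
        network = ipaddress.ip_network(f"{addr}/64", strict=False)
        return str(network.network_address)
    return str(addr)


# Two pod addresses in the same /64 plus one IPv4 address (hypothetical values).
pod_ips = ["fd00:0:0:1:aaaa:bbbb:cccc:1", "fd00:0:0:1:aaaa:bbbb:cccc:2", "10.0.0.5"]
distinct = {normalize_ip(ip) for ip in pod_ips}
print(len(distinct))
```

Without normalization the three inputs would count as three distinct IPs; after it, the two pod addresses count once.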
Verdict logic:
- 🔴 Escalate: Any SPN with DriftScore > 250 or IPDriftRaw > 400%
- 🟠 Investigate: DriftScore > 150
- 🟡 Monitor: DriftScore 120–150
- ✅ Clear: No SPNs above threshold
Drill-down: Use scope-drift-detection/spn skill for full investigation of flagged SPNs.
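As a worked example of the Q5 arithmetic, the sketch below mirrors the query's floors, the IPDrift cap, the weights, and the verdict thresholds. The dictionary keys and the sample counts are hypothetical, and Python's round() may differ from KQL's on exact .5 values.

```python
def ratio_pct(recent: float, baseline: float, floor: float) -> int:
    """Recent value as a percentage of the baseline, with a minimum denominator."""
    return round(recent * 100.0 / max(baseline, floor))


def spn_drift(bl: dict, rc: dict) -> tuple:
    vol = ratio_pct(rc["vol"], bl["vol"], 10)
    res = ratio_pct(rc["res"], bl["res"], 3)
    ip_raw = ratio_pct(rc["ips"], bl["ips"], 3)
    ip = min(ip_raw, 300)  # cap so an IP-only spike cannot dominate
    loc = ratio_pct(rc["loc"], bl["loc"], 2)
    fail = ratio_pct(rc["fail"], bl["fail"], 5)
    score = round(vol * 0.20 + res * 0.25 + ip * 0.25 + loc * 0.15 + fail * 0.15)
    return score, ip_raw


def spn_verdict(score: int, ip_raw: int) -> str:
    if score > 250 or ip_raw > 400:
        return "🔴 Escalate"
    if score > 150:
        return "🟠 Investigate"
    if score > 120:
        return "🟡 Monitor"
    return "✅ Clear"


baseline = {"vol": 1000, "res": 4, "ips": 2, "loc": 1, "fail": 0}
recent = {"vol": 900, "res": 8, "ips": 30, "loc": 2, "fail": 2}
score, ip_raw = spn_drift(baseline, recent)
print(score, ip_raw, spn_verdict(score, ip_raw))
```

In this sample the capped IPDrift contributes only 75 points to the score, but the uncapped IPDriftRaw still triggers the escalate condition on its own.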
Query 6: Fleet-Wide Device Process Drift
💻 Endpoint behavioral baseline: Per-device drift scores computed in-query over a 7d window (6d baseline vs 1d recent), with infrastructure noise filtering and a VolDrift cap to prevent automation-driven false positives.
Tool: RunAdvancedHuntingQuery
let uptime = DeviceInfo
| where Timestamp > ago(7d)
| extend IsRecent = Timestamp >= ago(1d)
| summarize
BaselineHours = dcountif(bin(Timestamp, 1h), not(IsRecent)),
RecentHours = dcountif(bin(Timestamp, 1h), IsRecent)
by DeviceName;
DeviceProcessEvents
| where Timestamp > ago(7d)
| where not(
InitiatingProcessFileName in ("gc_worker", "gc_linux_service", "dsc_host")
or (InitiatingProcessFileName == "dash" and InitiatingProcessParentFileName in ("gc_worker", "gc_linux_service"))
)
| extend IsRecent = Timestamp >= ago(1d), DayBucket = bin(Timestamp, 1d)
| summarize
BL_Events = countif(not(IsRecent)),
RC_Events = countif(IsRecent),
BL_Procs = dcountif(FileName, not(IsRecent)),
RC_Procs = dcountif(FileName, IsRecent),
BL_Accts = dcountif(AccountName, not(IsRecent)),
RC_Accts = dcountif(AccountName, IsRecent),
BL_Chains = dcountif(strcat(InitiatingProcessFileName, "→", FileName), not(IsRecent)),
RC_Chains = dcountif(strcat(InitiatingProcessFileName, "→", FileName), IsRecent),
BL_Comps = dcountif(ProcessVersionInfoCompanyName, not(IsRecent)),
RC_Comps = dcountif(ProcessVersionInfoCompanyName, IsRecent),
BaselineDays = dcountif(DayBucket, not(IsRecent))
by DeviceName
| where RC_Events > 0 and BL_Events > 0 and BaselineDays >= 4
| join kind=inner uptime on DeviceName
| where BaselineHours >= 48 and RecentHours >= 4
| extend
VolDriftRaw = round(RC_Events * 600.0 / max_of(BL_Events, 1), 0),
VolDrift = min_of(round(RC_Events * 600.0 / max_of(BL_Events, 1), 0), 300),
ProcDrift = round(RC_Procs * 100.0 / max_of(BL_Procs, 1), 0),
AcctDrift = round(RC_Accts * 100.0 / max_of(BL_Accts, 1), 0),
ChainDrift = round(RC_Chains * 100.0 / max_of(BL_Chains, 1), 0),
CompDrift = round(RC_Comps * 100.0 / max_of(BL_Comps, 1), 0)
| extend DriftScore = round(VolDrift * 0.30 + ProcDrift * 0.25 + ChainDrift * 0.20 + AcctDrift * 0.15 + CompDrift * 0.10, 0)
| order by DriftScore desc
| take 10
| project DeviceName, DriftScore, BaselineDays, BaselineHours, RecentHours, VolDriftRaw, VolDrift, ProcDrift, AcctDrift, ChainDrift, CompDrift
Purpose: Returns the top 10 devices ranked by composite drift score, pre-computed in KQL. No LLM-side math required — just interpret the returned scores.
Tuning notes:
- GC filter: Excludes Azure Guest Configuration noise (Linux only; <1% impact on Windows).
- Uptime + baseline-days gates: Filter intermittent endpoints whose offline baseline inflates VolDrift. Drop the DeviceInfo join if heartbeats aren't ingested.
- VolDrift cap (300%): VolDriftRaw preserves the true ratio. High VolDriftRaw with ~100 diversity metrics = infrastructure noise; both elevated = high-confidence anomaly.
- Weights: Volume 30%, Processes 25%, Chains 20%, Accounts 15%, Companies 10%.
Verdict logic: See Device Drift Score Interpretation in Post-Processing for the full scale, VolDrift cap context, and fleet-uniformity rule.
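The following sketch only illustrates how the in-query formula behaves; it is not a replacement for the DriftScore column the query returns. The ×600 factor is 100 × 6 baseline days, which puts six days of baseline events on the same footing as one recent day. Key names and sample counts are hypothetical.

```python
def pct(recent: float, baseline: float) -> int:
    """Direct ratio of recent to baseline distinct counts, as a percentage."""
    return round(recent * 100.0 / max(baseline, 1))


def device_drift(bl: dict, rc: dict) -> dict:
    # Volume: one recent day against the daily average of six baseline days.
    vol_raw = round(rc["events"] * 600.0 / max(bl["events"], 1))
    vol = min(vol_raw, 300)  # cap at 300%
    proc = pct(rc["procs"], bl["procs"])
    chain = pct(rc["chains"], bl["chains"])
    acct = pct(rc["accts"], bl["accts"])
    comp = pct(rc["comps"], bl["comps"])
    score = round(vol * 0.30 + proc * 0.25 + chain * 0.20 + acct * 0.15 + comp * 0.10)
    return {"DriftScore": score, "VolDriftRaw": vol_raw, "VolDrift": vol}


baseline = {"events": 60000, "procs": 100, "chains": 200, "accts": 5, "comps": 20}
recent = {"events": 50000, "procs": 105, "chains": 204, "accts": 5, "comps": 20}
result = device_drift(baseline, recent)
print(result)
```

Here the raw volume ratio is 500% while the diversity metrics sit near 100, which is the pattern the tuning notes describe as infrastructure noise.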
Query 7: Rare Process Chain Singletons
💻 Threat hunting — Parent→child process combinations appearing fewer than 3 times in 30 days.
Tool: RunAdvancedHuntingQuery
DeviceProcessEvents
| where Timestamp > ago(30d)
| summarize
Count = count(),
UniqueDevices = dcount(DeviceName),
SampleDevice = take_any(DeviceName),
SampleUser = take_any(strcat(AccountDomain, "\\", AccountName)),
SampleChildCmd = take_any(ProcessCommandLine),
GrandparentProcess = take_any(InitiatingProcessParentFileName),
LastSeen = max(Timestamp)
by ParentProcess = InitiatingProcessFileName, ChildProcess = FileName
| where Count < 3
| order by Count asc, UniqueDevices asc
| take 20
Purpose: Surfaces the 20 rarest process chains — singletons and near-singletons within the 30-day AH window. Effective for spotting LOLBin abuse, malware execution, or novel attack tooling. Review SampleChildCmd for suspicious command-line patterns.
Verdict logic:
- 🟠 Investigate: Any singleton with suspicious parent (cmd.exe, powershell.exe, wscript.exe, mshta.exe, rundll32.exe) or child running from temp/user profile directories
- 🟡 Monitor: Rare chains from system/update processes (version-stamped binaries, Azure VM agents)
- ✅ Clear: All rare chains are explainable infrastructure artifacts
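A first-pass triage over Q7 rows might look like the sketch below. The suspicious parent list comes from the verdict logic above; the path markers used to recognize temp and user-profile directories are an assumption chosen for illustration, and anything not flagged still needs analyst review before it can be called clear.

```python
SUSPICIOUS_PARENTS = {"cmd.exe", "powershell.exe", "wscript.exe", "mshta.exe", "rundll32.exe"}
# Hypothetical markers for temp / user-profile locations on Windows.
USER_WRITABLE_MARKERS = ("\\temp\\", "\\appdata\\", "\\users\\")


def triage_chain(parent: str, child_cmd: str, count: int) -> str:
    """Flag singletons with a suspicious parent or a child run from a user-writable path."""
    risky_parent = parent.lower() in SUSPICIOUS_PARENTS
    risky_path = any(marker in child_cmd.lower() for marker in USER_WRITABLE_MARKERS)
    if count == 1 and (risky_parent or risky_path):
        return "🟠 Investigate"
    return "🟡 Monitor"


print(triage_chain("powershell.exe", "whoami /all", 1))
print(triage_chain("services.exe", r"C:\Users\bob\AppData\Local\Temp\upd.exe /s", 1))
print(triage_chain("MsMpEng.exe", r"C:\Windows\System32\mpcmdrun.exe -scan", 2))
```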
Query 8: Inbound Email Threat Snapshot
📧 Email posture — Single-row summary of inbound email volume, threat breakdown, and delivered threats.
Tool: RunAdvancedHuntingQuery
EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound"
| summarize
TotalInbound = count(),
Clean = countif(isempty(ThreatTypes)),
Phish = countif(ThreatTypes has "Phish"),
Malware = countif(ThreatTypes has "Malware"),
Spam = countif(ThreatTypes has "Spam"),
HighConfPhish = countif(ConfidenceLevel has "High" and ThreatTypes has "Phish"),
Blocked = countif(DeliveryAction == "Blocked"),
Delivered = countif(DeliveryAction == "Delivered"),
PhishDelivered = countif(ThreatTypes has "Phish" and DeliveryAction == "Delivered"),
DistinctSenders = dcount(SenderFromAddress),
DistinctRecipients = dcount(RecipientEmailAddress)
Purpose: Instant C-level email posture briefing. The key escalation metric is PhishDelivered — phishing emails that bypassed all protections and reached mailboxes.
Verdict logic:
- 🔴 Escalate: PhishDelivered > 5 or Malware > 0 delivered
- 🟠 Investigate: PhishDelivered > 0 (any phishing reached mailboxes)
- 🟡 Monitor: Phishing detected but 100% blocked/junked
- ✅ Clear: 0 phishing, 0 malware
Drill-down: Use email-threat-posture skill for full email security analysis including ZAP, Safe Links, and authentication breakdown.
Query 9: Cloud App Suspicious Activity
🔑 Cloud ops monitoring — Detects mailbox rule manipulation, transport rule changes, mailbox delegation, MCAS-flagged compromised sign-ins, and human-initiated Conditional Access policy changes via CloudAppEvents. Focuses on rule/permission/CA mutations — the lower-confidence signals not duplicated by Q1's incident roll-up.
Tool: RunAdvancedHuntingQuery
// Allow-list of Microsoft platform service principals that perform automated mailbox/CA lifecycle ops.
// These appear with empty AccountDisplayName; the real actor name lives in RawEventData.UserId.
// Pattern: any RawEventData.UserId starting with "NT SERVICE\" is Microsoft datacenter automation
// (e.g., MSExchangeAdminApiNetCore for tenant-onboarding/permission hygiene). Exclude from analyst
// view to avoid false-positive "empty actor" alarms.
let PlatformServicePrefix = @"NT SERVICE\";
CloudAppEvents
| where Timestamp > ago(7d)
| where ActionType in (
// Exchange — Mail flow manipulation
"New-InboxRule", "Set-InboxRule", "Set-Mailbox",
"Add-MailboxPermission", "New-TransportRule", "Set-TransportRule", "New-Mailbox",
// Exchange — Anti-forensic
"Remove-MailboxPermission", "Remove-InboxRule",
// Conditional Access manipulation (human-initiated only)
"Set-ConditionalAccessPolicy", "New-ConditionalAccessPolicy",
// Compromise signals
"CompromisedSignIn"
)
// Resolve effective actor: AccountDisplayName when present, else RawEventData.UserId
| extend RawUserId = tostring(parse_json(tostring(RawEventData)).UserId)
| extend EffectiveActor = iff(isnotempty(AccountDisplayName), AccountDisplayName, RawUserId)
// Exclude Microsoft platform service principals (datacenter automation noise)
| where not(EffectiveActor startswith PlatformServicePrefix)
// Filter out system/automation-driven CA changes (CA agent, backup policies)
| where not(ActionType in ("Set-ConditionalAccessPolicy", "New-ConditionalAccessPolicy")
and isempty(EffectiveActor))
| extend Category = case(
ActionType in ("New-InboxRule", "Set-InboxRule", "Remove-InboxRule",
"Set-Mailbox", "Add-MailboxPermission", "Remove-MailboxPermission",
"New-TransportRule", "Set-TransportRule", "New-Mailbox"),
"Exchange Admin/Rule Change",
ActionType in ("Set-ConditionalAccessPolicy", "New-ConditionalAccessPolicy"),
"Conditional Access Change",
ActionType == "CompromisedSignIn",
"Compromised Sign-In",
"Other")
| summarize
Count = count(),
UniqueActors = dcount(EffectiveActor),
TopActors = make_set(EffectiveActor, 5),
Actions = make_set(ActionType, 5),
LatestTime = max(Timestamp)
by Category
| order by Count desc
Purpose: Three-category view of cloud app activity invisible to Q10 (AuditLogs). CompromisedSignIn is an MCAS signal independent from Q3's Identity Protection risk events — dual-source corroboration when both fire. CA changes with empty AccountDisplayName are system/agent-driven and filtered out. Inbox rule, transport rule, and mailbox permission changes are the primary BEC persistence/exfil mechanisms — even when no rule has a forwarding payload, rule creation by a previously-flagged user is a strong follow-up signal.
Actor resolution: AccountDisplayName is often empty for non-interactive ops; the query falls back to RawEventData.UserId. Actors prefixed NT SERVICE\ are Microsoft datacenter automation (e.g., MSExchangeAdminApiNetCore) and are excluded.
Verdict logic:
- 🔴 Escalate: Compromised Sign-In with 5+ users, OR Conditional Access Change by any human actor, OR Exchange Admin/Rule Change with forwarding-related rules (New-InboxRule, Set-InboxRule, New-TransportRule)
- 🟠 Investigate: Compromised Sign-In (any count), OR Remove-InboxRule / Remove-MailboxPermission (anti-forensic cleanup signals)
- 🟡 Monitor: Low-count Set-Mailbox from system actors
- ✅ Clear: 0 results across all categories
Drill-down: Use user-investigation for actors in Compromised Sign-In category. Use ca-policy-investigation for Conditional Access Change. For any Exchange-related Q9 finding, also query OfficeActivity | where OfficeWorkload == "Exchange" — CloudAppEvents only surfaces ActionType summaries; OfficeActivity carries the full Parameters JSON (ForwardTo / RedirectTo / ForwardingSmtpAddress), per-operation ClientIP, and ops like MoveToDeletedItems / SoftDelete / HardDelete / MailboxLogin that reveal post-compromise forensics. See queries/email/email_threat_detection.md and the CloudAppEvents / OfficeActivity entries in copilot-instructions.md Known Table Pitfalls.
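The actor-resolution and exclusion steps in Q9 can be sketched as follows, for readers who want to trace which events survive the filters. Function names are hypothetical, and the sample actors are invented; only the prefix, the fallback order, and the CA action names come from the query.

```python
import json

PLATFORM_SERVICE_PREFIX = "nt service\\"  # compared case-insensitively, like KQL startswith
CA_ACTIONS = {"Set-ConditionalAccessPolicy", "New-ConditionalAccessPolicy"}


def effective_actor(account_display_name: str, raw_event_data: str) -> str:
    """Prefer AccountDisplayName; otherwise fall back to RawEventData.UserId."""
    if account_display_name:
        return account_display_name
    return str(json.loads(raw_event_data or "{}").get("UserId") or "")


def keep_event(action_type: str, account_display_name: str, raw_event_data: str) -> bool:
    actor = effective_actor(account_display_name, raw_event_data)
    if actor.lower().startswith(PLATFORM_SERVICE_PREFIX):
        return False  # Microsoft datacenter automation
    if action_type in CA_ACTIONS and not actor:
        return False  # system/agent-driven CA change with no resolvable actor
    return True


automation = json.dumps({"UserId": "NT SERVICE\\MSExchangeAdminApiNetCore"})
print(keep_event("Set-Mailbox", "", automation))
print(keep_event("New-InboxRule", "Alex Admin", "{}"))
```

Note that a non-CA event with an empty actor is still kept; only CA changes are dropped when no actor resolves.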
Query 10: High-Impact Privileged Operations
🔑 Admin activity monitoring — Category-aggregated view of privileged operations: role assignments, PIM activations, credential lifecycle, consent grants, CA policy changes, password management, MFA registration, app registration, and ownership grants.
Tool: RunAdvancedHuntingQuery
let PrivOps = AuditLogs
| where TimeGenerated > ago(7d)
| where OperationName has_any (
"role", "credential", "consent", "Conditional Access", "password", "certificate",
"security info", "owner", "application"
)
| where Result == "success"
| extend Actor = tostring(InitiatedBy.user.userPrincipalName)
// Exclude system-driven CA policy additions (empty actor = CA agent)
| where not(OperationName has "conditional access" and isempty(Actor))
| extend Target = tostring(TargetResources[0].displayName)
| extend Category = case(
OperationName has "security info", "MFA-Registration",
OperationName has "owner", "Ownership",
OperationName has "application", "AppRegistration",
OperationName has "role", "RoleManagement",
OperationName has "credential" or OperationName has "certificate", "Credentials",
OperationName has "consent", "Consent",
OperationName has "Conditional Access", "ConditionalAccess",
OperationName has "password", "Password",
"Other");
PrivOps
| summarize
Count = count(),
UniqueActors = dcount(Actor),
TopActors = make_set(Actor, 5),
Operations = make_set(OperationName, 5),
Targets = make_set(Target, 5),
LatestTime = max(TimeGenerated)
by Category
| order by Count desc
Purpose: Category-level aggregation ensures all 8 privilege domains surface regardless of volume distribution (previous per-actor aggregation was truncated at 15 rows, hiding MFA-Registration, Ownership, and AppRegistration). Key non-obvious details: MFA-Registration deletion + re-registration by same user = credential takeover (T1556.006). Ownership grants to external accounts = persistence (T1098). System-driven CA additions (empty Actor) are filtered out. Password category is high-volume by nature — flag single-actor bulk resets, not self-service.
Verdict logic:
- 🔴 Escalate: MFA-Registration deletions + registrations for same user (method swap attack), OR Consent grants from unexpected actors, OR Ownership grants to external accounts, OR ConditionalAccess changes by non-admin actors, OR AppRegistration with secrets management from external domains
- 🟠 Investigate: MFA-Registration from CTF/external accounts, OR RoleManagement targeting Global Admin / Security Admin roles, OR AppRegistration consent operations, OR Password with bulk admin resets (single actor, 10+ targets)
- 🟡 Monitor: Normal PIM activations and expirations, self-service password resets, credential lifecycle (WHfB/passkey registration)
- ✅ Clear: 0 results or only system-driven operations with expected volume
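Because the Category case() returns the first matching branch, the order of the branches decides where an operation lands. The sketch below reproduces that ordering with case-insensitive substring checks, which is a simplification of KQL's term-based has operator; the operation names are illustrative samples.

```python
# Same order as the case() branches in the query.
CATEGORY_RULES = [
    ("security info", "MFA-Registration"),
    ("owner", "Ownership"),
    ("application", "AppRegistration"),
    ("role", "RoleManagement"),
    ("credential", "Credentials"),
    ("certificate", "Credentials"),
    ("consent", "Consent"),
    ("conditional access", "ConditionalAccess"),
    ("password", "Password"),
]


def categorize(operation_name: str) -> str:
    """Return the category of the first rule whose keyword appears in the name."""
    name = operation_name.lower()
    for keyword, category in CATEGORY_RULES:
        if keyword in name:
            return category
    return "Other"


for op in ("Add owner to application", "Consent to application", "Add member to role"):
    print(op, "->", categorize(op))
```

Under this ordering an operation named "Consent to application" is filed under AppRegistration rather than Consent, which is consistent with the verdict logic listing consent operations under AppRegistration.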
Query 11: Critical Assets with Verified Internet Exposure
🛡️ Attack surface — Combines ExposureGraph critical asset inventory with MDE's authoritative DeviceInfo.IsInternetFacing classification to identify verified internet-exposed critical assets.
Tool: RunAdvancedHuntingQuery
let InternetFacing = DeviceInfo
| where Timestamp > ago(7d)
| where IsInternetFacing == true
| summarize arg_max(Timestamp, *) by DeviceId
| project DeviceName,
Reason = extractjson("$.InternetFacingReason", AdditionalFields, typeof(string)),
PublicIP = extractjson("$.InternetFacingPublicScannedIp", AdditionalFields, typeof(string)),
ExposedPort = extractjson("$.InternetFacingLocalPort", AdditionalFields, typeof(int));
let CriticalAssets = ExposureGraphNodes
| where set_has_element(Categories, "device")
| where isnotnull(NodeProperties.rawData.criticalityLevel)
| extend critLevel = toint(NodeProperties.rawData.criticalityLevel.criticalityLevel)
| where critLevel < 4
| project DeviceName = NodeName, CriticalityLevel = critLevel,
ExposureScore = tostring(NodeProperties.rawData.exposureScore);
CriticalAssets
| join kind=leftouter InternetFacing on DeviceName
| extend IsVerifiedExposed = isnotempty(PublicIP) or isnotempty(Reason)
| project DeviceName, CriticalityLevel, IsVerifiedExposed,
Reason, PublicIP, ExposedPort, ExposureScore
| order by IsVerifiedExposed desc, CriticalityLevel asc
| take 25
Purpose: Returns the critical asset inventory (criticality 0–3) enriched with MDE's authoritative internet-facing classification. DeviceInfo.IsInternetFacing is confirmed via Microsoft external scans or observed inbound connections and auto-expires after 48h — far more reliable than ExposureGraph properties like isCustomerFacing (business flag) or rawData.IsInternetFacing (not populated in many environments). See MS Docs and queries/network/internet_exposure_analysis.md Query 1 for the canonical reference.
IsVerifiedExposed logic: Checks BOTH PublicIP (populated for PublicScan — Microsoft external scanner) AND Reason (populated for ExternalNetworkConnection — observed inbound traffic). The original isnotempty(PublicIP) missed ExternalNetworkConnection exposures where MDE confirms inbound connections but doesn't populate the scanned public IP field.
Verdict logic:
- 🔴 Escalate: Any IsVerifiedExposed == true with CriticalityLevel == 0 (internet-facing domain controller/CA)
- 🟠 Investigate: Any IsVerifiedExposed == true (internet-facing critical asset)
- 🟡 Monitor: Critical assets exist but none verified internet-facing
- ✅ Clear: All critical assets properly segmented, no internet exposure
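The IsVerifiedExposed check and the first two verdict tiers can be expressed per asset as in this sketch. Applying the Monitor tier per asset is an assumption made here for illustration (the verdict logic states it at fleet level), and the sample IP is a documentation address.

```python
from typing import Optional


def is_verified_exposed(public_ip: Optional[str], reason: Optional[str]) -> bool:
    """True when either the scanned public IP or the internet-facing reason is populated."""
    return bool(public_ip) or bool(reason)


def exposure_verdict(criticality_level: int,
                     public_ip: Optional[str] = None,
                     reason: Optional[str] = None) -> str:
    if not is_verified_exposed(public_ip, reason):
        return "🟡 Monitor"
    return "🔴 Escalate" if criticality_level == 0 else "🟠 Investigate"


# ExternalNetworkConnection rows carry a Reason but no scanned public IP.
print(exposure_verdict(0, reason="ExternalNetworkConnection"))
print(exposure_verdict(2, public_ip="203.0.113.10", reason="PublicScan"))
print(exposure_verdict(1))
```

Checking only the public IP would have missed the first case, which is the gap the IsVerifiedExposed logic closes.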
Query 12: Exploitable CVEs (CVSS ≥ 8.0) Across Fleet
🛡️ Vulnerability patch priority — Top exploitable critical CVEs with affected device count.
Tool: RunAdvancedHuntingQuery
DeviceTvmSoftwareVulnerabilities
| join kind=inner (
DeviceTvmSoftwareVulnerabilitiesKB
| where IsExploitAvailable == true
| where CvssScore >= 8.0
) on CveId
| summarize
AffectedDevices = dcount(DeviceName),
SampleDevices = make_set(DeviceName, 3),
Software = make_set(SoftwareName, 3)
by CveId, VulnerabilitySeverityLevel, CvssScore
| order by AffectedDevices desc, CvssScore desc
| take 15
Purpose: Instant "what should we patch today" list. Ranks exploitable CVEs by fleet impact: affected device count first, then CVSS score. Focus on CVEs with public exploits affecting the most devices.
Verdict logic:
- 🔴 Escalate: Any CVE with CvssScore >= 9.0 AND AffectedDevices > 10
- 🟠 Investigate: CVE with CvssScore >= 8.0 AND AffectedDevices > 5
- 🟡 Monitor: Exploitable CVEs exist but affect ≤ 5 devices
- ✅ Clear: No exploitable CVEs with CVSS ≥ 8.0 (unlikely but possible in small environments)
Drill-down: Use exposure-investigation skill for full vulnerability posture assessment.
Post-Processing
Device Drift Score Interpretation (Q6)
Q6 returns pre-computed drift scores directly from KQL — no LLM-side math is needed. Simply present the returned table and apply verdicts using this scale:
| DriftScore | Interpretation | Verdict |
|---|---|---|
| < 80 | Contracting activity (device may be idle/decommissioned) | 🔵 Informational |
| 80–110 | Stable steady-state servers (fleet floor with uptime gate — was 80–120 pre-uptime-filter) | ✅ Clear |
| 110–130 | Minor behavioral expansion | 🟡 Monitor |
| 130–180 | Significant deviation — includes genuine intermittent-workstation drift now that uptime FPs are filtered | 🟠 Investigate |
| 180+ | Major anomaly — multi-dimensional with confirmed uptime baseline | 🔴 Escalate |
VolDrift cap context: VolDriftRaw is projected alongside the capped VolDrift. When interpreting results:
- If VolDriftRaw ≫ 300 but ProcDrift/ChainDrift/AcctDrift are near 100: infrastructure volume spike (GC, patching, agent restart); low concern despite high raw volume.
- If VolDriftRaw > 300 AND ProcDrift/ChainDrift/AcctDrift are also elevated: genuine multi-dimensional anomaly; high-confidence finding.
- If VolDriftRaw ≤ 300: cap was not triggered; score reflects true proportions.
Fleet-uniformity rule: If ALL top-10 devices cluster within 20 points of each other, the fleet is behaving uniformly and the verdict should be downgraded one level. Drift is most meaningful when individual devices diverge from the fleet cluster.
⛔ DO NOT manually recompute drift scores. The KQL query handles Volume normalization (÷6 baseline days), VolDrift capping (at 300%), GC infrastructure filtering, and dcount comparison (direct ratio). Trust the returned DriftScore column.
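Interpreting the returned scores (not recomputing them) can be sketched like this. The table's ranges share their boundary values, so treating each lower bound as inclusive is an assumption, as is leaving ✅ and 🔵 verdicts unchanged when the fleet-uniformity downgrade applies.

```python
LEVELS = ["✅ Clear", "🟡 Monitor", "🟠 Investigate", "🔴 Escalate"]


def drift_verdict(score: float) -> str:
    """Map a returned DriftScore onto the interpretation scale."""
    if score < 80:
        return "🔵 Informational"
    if score < 110:
        return "✅ Clear"
    if score < 130:
        return "🟡 Monitor"
    if score < 180:
        return "🟠 Investigate"
    return "🔴 Escalate"


def apply_fleet_uniformity(scores: list) -> list:
    """Downgrade one level when all scores cluster within 20 points of each other."""
    verdicts = [drift_verdict(s) for s in scores]
    uniform = len(scores) > 1 and max(scores) - min(scores) <= 20
    if not uniform:
        return verdicts
    return [LEVELS[max(LEVELS.index(v) - 1, 0)] if v in LEVELS else v for v in verdicts]


print(apply_fleet_uniformity([135, 140, 150]))  # tight cluster
print(apply_fleet_uniformity([100, 190]))       # one device diverges
```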
Cross-Query Correlation
After all queries complete, check these correlation patterns and escalate priority when found:
| Pattern | Queries | Implication | Action |
|---|---|---|---|
| Incident account matches risky identity | Q1 Accounts ∩ Q3 AccountUpn | Incident involves user already flagged AtRisk/Compromised (corroborated signal) | Escalate to 🔴 |
| Incident device matches drifting endpoint | Q1 Devices ∩ Q6 DeviceName | Incident target has behavioral anomalies on endpoint | Escalate to 🔴 |
| Incident device has exploitable CVE | Q1 Devices ∩ Q12 DeviceName | Incident device is vulnerable to active exploitation | Escalate to 🔴 |
| Spray target already in incident | Q4 targets ∩ Q1 Accounts | Spray target is already involved in an active incident | Escalate to 🔴 |
| SPN drift AND unusual credential/consent activity | Q5 + Q10 | App credential abuse / persistence | Escalate to 🔴 |
| Device with rare process chain AND exploitable CVE | Q7 + Q12 | Potential active exploitation | Escalate to 🔴 |
| Spray IP target already flagged as risky | Q4 + Q3 | Spray target has active Identity Protection risk | Escalate to 🔴 |
| Closed TP tactics match active findings | Q2 + Q3/Q7/Q8 | Same attack pattern recurring despite recent closures | Escalate to 🟠, note recurrence |
| Mailbox rule manipulation AND email threats | Q9 + Q8 | Potential email exfiltration setup following phishing | Escalate to 🔴 |
| Compromised Sign-In user matches risky identity | Q9 Compromised Sign-In ∩ Q3 AccountUpn | MCAS compromise + Identity Protection risk: dual-signal corroboration | Escalate to 🔴 |
| Compromised Sign-In user has Mailbox Read (API) | Q9 Compromised Sign-In ∩ Q9 Mailbox Read (API) | Compromised account actively exfiltrating email via API: BEC kill chain | Escalate to 🔴 |
| Compromised Sign-In user in open incident | Q9 Compromised Sign-In ∩ Q1 Accounts | MCAS compromise detection overlaps active incident entities | Escalate to 🔴 |
| MFA registration from spray target | Q10 MFA-Registration ∩ Q4 spray targets | Attacker completing MFA enrollment after successful spray (T1556.006) | Escalate to 🔴 |
| MFA registration from risky user | Q10 MFA-Registration ∩ Q3 AccountUpn | Risky user registering new auth methods: potential credential takeover | Escalate to 🔴 |
| App registration + SPN drift | Q10 AppRegistration ∩ Q5 SPN drift | New app + expanding SPN footprint = T1098.001 app-based persistence | Escalate to 🔴 |
| CA policy change + spray/compromise activity | Q9 Conditional Access Change + Q4 or Q9 Compromised Sign-In | Defense weakened during active attack | Escalate to 🔴 |
| Mailbox Read (API) user has inbox rule changes | Q9 Mailbox Read (API) ∩ Q9 Exchange Admin/Rule Change | Programmatic read + forwarding rule = full exfiltration chain (T1114.003) | Escalate to 🔴 |
| Phishing recipient is risky user | Q8 delivered phishing ∩ Q3 AccountUpn | Credential harvesting targeting already-compromised or at-risk user: AiTM chain indicator | Escalate to 🔴 |
| DLP/exfiltration incident + API mailbox access | Q1 Exfiltration tactic ∩ Q9 Mailbox Read (API) | Incident-level exfiltration alert + active API data access: data loss in progress | Escalate to 🔴 |
| Role management + SPN drift by same actor | Q10 RoleManagement same actor ∩ Q5 SPN drift | Role escalation + expanding app footprint = app-based persistence (T1098) | Escalate to 🔴 |
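Most rows in the table above are set intersections between entity lists from two queries. A minimal sketch of that check, assuming the entities have already been extracted into plain lists of strings (the sample UPNs and the pattern dictionary layout are hypothetical):

```python
def overlap(left, right) -> set:
    """Case-insensitive intersection of two entity collections."""
    return {v.lower() for v in left} & {v.lower() for v in right}


def correlate(patterns: dict) -> list:
    """Return (pattern name, matching entities) for every non-empty intersection."""
    hits = []
    for name, (left, right) in patterns.items():
        matched = overlap(left, right)
        if matched:
            hits.append((name, sorted(matched)))
    return hits


q1_accounts = ["Alice@contoso.com", "bob@contoso.com"]
q3_risky_upns = ["alice@contoso.com"]
q4_spray_targets = ["carol@contoso.com"]

hits = correlate({
    "Incident account matches risky identity": (q1_accounts, q3_risky_upns),
    "Spray target already in incident": (q4_spray_targets, q1_accounts),
})
print(hits)
```

Lower-casing before comparison avoids missing a match when two tables record the same UPN with different casing.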
Query File Recommendations
Use .github/manifests/discovery-manifest.yaml (auto-generated by python .github/manifests/build_manifest.py) to match findings to downstream query files and skills. Contains title, path, domains, mitre, prompt.
Skip entirely when all verdicts are ✅. Tier depth follows the Rule 8 table.
Query-to-Domain Map
| Query Group | Domain Tag(s) |
|---|---|
| Q1, Q2 (Incidents) | incidents |
| Q3, Q4 (Identity) | identity |
| Q5 (SPN Drift) | spn |
| Q6, Q7 (Endpoint) | endpoint |
| Q8 (Email) | email |
| Q9, Q10 (Admin & Cloud) | admin, cloud |
| Q11, Q12 (Exposure) | exposure |
Valid tags: incidents, identity, spn, endpoint, email, admin, cloud, exposure.
Procedure
For each non-✅ verdict, collect its domain tag(s), then:
- Query files: filter manifest.queries where domains contains ANY active tag. Rank by (a) number of matching tags, (b) MITRE technique overlap with Q1/Q2 Techniques (exact string match on mitre field), (c) keyword overlap (entities, process names, CVE IDs, ActionTypes) against title/path. Select top 3–5 files for 🔴/🟠, 1–2 for 🟡-only.
- Skills: filter manifest.skills where domains matches. Substitute actual entity values into the prompt template's {entity} placeholder. 🔴/🟠: include all matches as drill-down options; 🟡-only: limit to 3. Skills without domains (tooling/visualization) are never auto-suggested.
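The query-file ranking in the procedure above can be sketched as a filter plus a three-part sort key. The entry fields (title, path, domains, mitre) follow the manifest description, but treating domains and mitre as lists is an assumption, and the domain tags and techniques assigned to the sample entries are invented.

```python
def rank_query_files(queries, active_tags, techniques, keywords, limit):
    """Filter by domain tag, then rank by tag hits, MITRE overlap, keyword overlap."""
    candidates = [q for q in queries if set(q["domains"]) & active_tags]

    def sort_key(q):
        tag_hits = len(set(q["domains"]) & active_tags)
        mitre_hits = len(set(q.get("mitre", [])) & techniques)  # exact string match
        text = (q["title"] + " " + q["path"]).lower()
        keyword_hits = sum(1 for k in keywords if k.lower() in text)
        return (tag_hits, mitre_hits, keyword_hits)

    return sorted(candidates, key=sort_key, reverse=True)[:limit]


manifest_queries = [
    {"title": "Email Threat Detection", "path": "queries/email/email_threat_detection.md",
     "domains": ["email"], "mitre": ["T1114.003"]},
    {"title": "Rare Process Chains", "path": "queries/endpoint/rare_process_chains.md",
     "domains": ["endpoint"], "mitre": []},
    {"title": "Internet Exposure Analysis", "path": "queries/network/internet_exposure_analysis.md",
     "domains": ["exposure"], "mitre": []},
]

ranked = rank_query_files(manifest_queries, {"email", "exposure"}, {"T1114.003"},
                          ["New-InboxRule"], limit=5)
print([q["path"] for q in ranked])
```

The limit argument carries the tier depth: 3 to 5 for 🔴/🟠 findings and 1 to 2 when only 🟡 verdicts are active.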
Report Output
Insert 📂 Recommended Query Files after 🎯 Recommended Actions. Include a 🔧 Suggested Skill Drill-Downs sub-section with manifest skill prompts (entity-substituted).
⛔ Numbered list, NOT table — links inside table cells don't render clickable in VS Code chat.
Format: 1. **[<Title>](queries/<subfolder>/<file>.md)** — Q<N>: <finding> — 💡 *"<entity-specific prompt>"*
- Link text = manifest title, target = manifest path (forward slashes).
- Prompts MUST reference specific entities/IOCs/TTPs from findings; no generic placeholders.
- When no matching files: suggest authoring new queries.
Adding New Query Files or Skills
- Query files: add **Domains:** <tag1>, <tag2> to metadata header (after **MITRE:**).
- Skills: add threat_pulse_domains: [<tag>] and drill_down_prompt: '<prompt>' to YAML frontmatter.
- Run python .github/manifests/build_manifest.py; the validator flags missing fields.
Report Template
Output modes:
- Inline chat (default) — render in chat. Truncate data tables to 10 rows; omit Drill-Down, Cross-Investigation, and Investigation Timeline sections when no drill-downs have executed.
- Markdown file: triggered by 💾 Save full investigation report in Phase 4. Full data tables, no row limits. Path: reports/threat-pulse/Threat_Pulse_YYYYMMDD_HHMMSS.md. Source data: pulse results from context + /memories/session/threat-pulse-drilldowns.md (authoritative after context compaction).
Verdicts: 🔴 Escalate | 🟠 Investigate | 🟡 Monitor | ✅ Clear | 🔵 Informational | ❓ No Data
- ❓ No Data — query returned table resolution error or timeout. Report the error and table. Treat as monitoring gap.
- 🔵 Informational — neutral context (e.g., Q2 with 0 closures, Q6 with DriftScore < 80). No action needed.
- Zero results format: ✅ No <type> detected in the last <N>d. Checked: <table> (0 matches)
Structure
# 🔍 Threat Pulse — <Workspace> | <Date>
**Workspace:** <name> (`<id>`)
**Scan Date:** <YYYY-MM-DD HH:MM UTC>
**Scan Duration:** <N>min | **Queries:** 12 | **Drill-Downs:** <N> (file mode only)
## Executive Summary
<2–4 sentences synthesizing pulse + drill-down findings. State final risk posture incorporating all evidence.>
## Dashboard Summary
<12-row table (Q1, Q2, Q3, Q4–Q12) — columns: #, Domain, Status (verdict emoji), Key Finding (1-line).>
## Detailed Findings
<One section per query — EVERY query gets a section (no skipping). Q2 closed summary always renders after Q1 even when Q1 is ✅.>
## Cross-Query Correlations
<Table per Post-Processing rules, or `✅ No correlations detected`.>
## 🎯 Recommended Actions
<Prioritized table: action, trigger query, drill-down skill.>
## 📂 Recommended Query Files
<Per Report Output Block procedure. For 🟡-only verdicts use "📂 Proactive Hunting Suggestions" header. Omit when all ✅.>
## Drill-Down Investigation Results (file mode, when drill-downs executed)
### 1. <Title> — <Skill Name>
**Triggered by:** Q<N> — <finding>
**Entity:** <target> | **Lookback:** <timerange> | **Risk:** <emoji> <level>
**Key Findings:** <max 8 evidence-cited bullets>
**Evidence Summary:** <1–2 paragraph narrative with specific numbers/identifiers. Back-reference pulse queries.>
**Recommendations:** <numbered actions>
### 2. <Next Title> — <Skill Name>
...
## Cross-Investigation Correlation (file mode, when drill-downs executed)
| Connection | Evidence | Drill-Downs | Implication |
|-----------|----------|-------------|-------------|
<Patterns only visible across multiple investigations. If none: `✅ No cross-investigation correlations identified — each finding is independent.`>
## Consolidated Recommendations (file mode)
| Priority | Recommendation | Source | Risk |
<Deduplicated across pulse + drill-downs. If same action appears in both, cite both sources on one row.>
## Appendix: Investigation Timeline (file mode)
| Time | Action | Key Result |
Column / Format Rules
- Q1: | Incident | Sev | Title | Age | Alerts | Owner | Tactics | Accounts | Devices | Tags |. Sev = incident severity, Unassigned → ⚠️ Unassigned, Age uses relative AgeDisplay, entity/tag columns render max 5 comma-separated.
  - When TotalAll > 10: prepend **Showing 10 of {TotalAll} open incidents ({TotalHighCritical} High/Critical)** (sorted by severity, then newest, most complex first)
  - The list is deduplicated by Title (one representative per title). When an incident's TitleDupCount > 1, append (+{TitleDupCount-1} more) to its Title cell so recurring/noisy incident types remain visible without monopolizing the table.
  - When TotalHighCritical == 0: prepend **No High/Critical incidents — showing top Medium/Low from {TotalAll} open**
- Q1 incidents must include [#<id>](https://security.microsoft.com/incidents/<ProviderIncidentId>?tid=<tenant_id>) links.
- Q2: Classification breakdown + severity + MITRE tactics/techniques from TP closures. Always render even when Q1 is ✅.
Rules
| Rule | Status |
|---|---|
| Executive Summary synthesizes across pulse AND drill-downs (when present) | ✅ REQUIRED |
| Every query has a verdict row — no omissions, no skipped "clear" sections | ✅ REQUIRED |
| Drill-down subsections are structured summaries, not raw dumps, with Triggered by: Q<N> | ✅ REQUIRED |
| Cross-Investigation Correlation explicitly states "none found" if no connections exist | ✅ REQUIRED |
| Consolidated Recommendations are deduplicated (same action + multiple sources → one row) | ✅ REQUIRED |
| Fabricated data | ❌ PROHIBITED |
Known Pitfalls
| Pitfall | Mitigation |
|---|---|
| Q5 takes ~35s (97d lookback) | Acceptable — runs in parallel. Only query needing Data Lake |
| Q7 capped at ago(30d) | AH Graph API limit. Use queries/endpoint/rare_process_chains.md via Data Lake for 90d |
| Q6 drift scores | Computed in-query — do NOT recompute LLM-side |
| Q9 drill-down: CloudAppEvents identity filtering | AccountId and AccountObjectId are Entra ObjectId GUIDs, NOT UPNs. Filtering by UPN returns 0 results silently. Use AccountDisplayName for display-name matching, or resolve UPN→ObjectId via Graph API first. NEVER use tostring(RawEventData) has "UPN" — it causes query cancellation on this high-volume table |
| Q9: `RESTSystem` false positives | Exchange Online first-party backend services use `Client=RESTSystem` in `ClientInfoString` and appear as AppId GUIDs in `AccountDisplayName`. These are NOT user/app API access; they are system-level mail flow, compliance scanning, or connector ingestion. Q9 filters these out. If Q9 drill-down results still show GUID actors with `RESTSystem`, treat them as benign Microsoft internal operations |
| Drill-down query error → silent skip | ⛔ NEVER skip. On SemanticError/Failed to resolve: diagnose → fix → re-execute → present corrected results. Partial results with silently omitted failures are PROHIBITED |
Schema pitfalls (column names, dynamic fields, `parse_json` patterns) are covered in `copilot-instructions.md` Known Table Pitfalls. Refer there for `SecurityAlert.Status`, `ExposureGraphNodes.NodeProperties`, timestamp columns, and `AuditLogs.InitiatedBy`.
Quality Checklist
- All 12 queries executed
- Every query has a verdict row — no omissions, no skipped "clear" sections
- ✅ verdicts cite table + "0 results"; 🔴/🟠 cite specific evidence
- All incidents have clickable XDR portal URLs
- Cross-query correlations checked
- Every non-✅ drill-down has a `🎬 Take Action` block with portal-ready KQL (correct required columns per entity type)
- Every `🎬 Take Action` block includes the `⚠️ AI-generated content` warning immediately below the heading
- `📂 Recommended Query Files` section present when any non-✅ verdict exists (clickable links, not tables)
- No fabricated data
SVG Dashboard Generation
After completing the Threat Pulse report, the user may request an SVG visualization. Use the svg-dashboard skill in manifest mode — the widget manifest is at .github/skills/threat-pulse/svg-widgets.yaml.
Execution
- Read `svg-widgets.yaml` (widget manifest)
- Read the `svg-dashboard` SKILL.md for component rendering rules
- Map manifest `field` values to the Threat Pulse report data already in context (or read the saved report file)
- Render SVG → save to `temp/threat_pulse_{date}_dashboard.svg`
.github/skills/ai-agent-posture/SKILL.md
npx skills add SCStelz/security-investigator --skill ai-agent-posture -g -y
SKILL.md
Frontmatter
{
"name": "ai-agent-posture",
"description": "Use this skill when asked to audit, assess, or report on AI agent security posture across Copilot Studio, Microsoft 365 Copilot, Microsoft Foundry, and third-party agents. Triggers on keywords like \"AI agent posture\", \"agent security audit\", \"Copilot Studio agents\", \"agent inventory\", \"agent access\", \"broadly accessible agents\", \"agent tools\", \"MCP tools on agents\", \"agent knowledge sources\", \"XPIA risk\", \"agent sprawl\", \"AI agent risk\", \"agent governance\", or when investigating AI agent configurations, access posture, tool permissions, or credential exposure. This skill queries the AgentsInfo table in Advanced Hunting to produce a comprehensive security posture assessment covering agent inventory, access posture, broadly-accessible agent exposure, MCP tool proliferation, knowledge source exposure, XPIA email exfiltration risk, hard-coded credential detection, external endpoint risks, creator governance, and agent sprawl analysis. Supports inline chat and markdown file output.",
"drill_down_prompt": "Run AI agent security audit — agent inventory, authentication gaps, tool permissions",
"threat_pulse_domains": [
"admin",
"cloud"
]
}
AI Agent Security Posture — Instructions
Purpose
This skill audits the security posture of AI agents (Copilot Studio, Microsoft 365 Copilot / Agent Builder, Microsoft Foundry, and third-party platforms) across your organization using the AgentsInfo table in Microsoft Defender XDR Advanced Hunting.
🔄 Table migration (AIAgentsInfo → AgentsInfo): This skill was migrated from the deprecated `AIAgentsInfo` table to the unified multi-platform `AgentsInfo` table. `AIAgentsInfo` remains queryable until July 1, 2026, but it is Copilot Studio-only and uses a different schema. All queries in this skill target `AgentsInfo`. The new table is a different data model, not a rename; see Table Schema Reference and Known Pitfalls for the differences that shaped these queries.
AI agents are autonomous or semi-autonomous applications that can access organizational data, send emails, call external APIs, and use MCP tools. Misconfigured agents — missing authentication, overly broad access, AI-controlled email sending, hard-coded credentials — represent a growing attack surface. This skill systematically evaluates that surface.
What this skill covers:
| Domain | Key Questions Answered |
|---|---|
| 🔍 Agent Inventory | How many agents exist? What's their status, platform, environment? |
| 🔐 Access Posture | Which agents are broadly accessible (allowForAllUsers)? How are agents shared (appType: lob/shared)? |
| 🛠️ Tools & MCP | Which agents have MCP tools? What operations can they perform? |
| 📚 Knowledge Sources | What data sources are agents connected to? |
| 📧 XPIA Email Risk | Which agents can send email (data exfil precondition)? |
| 🔑 Credential Exposure | Are credentials hard-coded in agent instructions or connector metadata? |
| 🌐 External Endpoint Risk | What external hosts do agent connectors reach? Any insecure schemes or non-standard ports? |
| 👥 Creator Governance | Who creates agents? Is there naming hygiene? Abandoned agents? |
Data source: AgentsInfo table (Advanced Hunting) — currently in Preview.
References:
- Microsoft Docs — AgentsInfo table
- From runtime risk to real-time defense: Securing AI agents — Microsoft Defender Security Research blog detailing three attack scenarios this skill detects
- Microsoft Agent 365: The control plane for AI agents — Enterprise governance platform for agent lifecycle management (Registry, Access Control, Visualization, Interoperability, Security)
- Securing Copilot Studio agents with Microsoft Defender
- Real-time agent protection during runtime (Preview)
🔴 URL Registry — Canonical Links for Report Generation
MANDATORY: When generating reports, copy URLs verbatim from this registry. NEVER construct, guess, or paraphrase a URL. If a URL is not in this registry, omit the hyperlink entirely and use plain text.
| Label | Canonical URL |
|---|---|
| `BLOG_RUNTIME_RISK` | https://www.microsoft.com/en-us/security/blog/2026/01/23/runtime-risk-realtime-defense-securing-ai-agents/ |
| `BLOG_AGENT_365` | https://www.microsoft.com/en-us/microsoft-365/blog/2025/11/18/microsoft-agent-365-the-control-plane-for-ai-agents/ |
| `DOCS_AGENTSINFO` | https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-agentsinfo-table |
| `DOCS_AGENT_PROTECTION` | https://learn.microsoft.com/en-us/defender-cloud-apps/ai-agent-protection |
| `DOCS_RUNTIME_PROTECTION` | https://learn.microsoft.com/en-us/defender-cloud-apps/real-time-agent-protection-during-runtime |
Usage in reports: When referencing attack scenarios, link to BLOG_RUNTIME_RISK. When referencing Agent 365 governance, link to BLOG_AGENT_365. When referencing runtime protection, link to DOCS_RUNTIME_PROTECTION.
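The copy-verbatim rule can be pictured as a plain lookup. The following is a minimal Python sketch, not part of the skill itself: the labels and URLs are copied from the registry table above, while the helper name `render_reference` is hypothetical. An unregistered label falls back to plain text instead of a guessed URL.

```python
# Labels and URLs copied verbatim from the URL Registry table.
URL_REGISTRY = {
    "BLOG_RUNTIME_RISK": "https://www.microsoft.com/en-us/security/blog/2026/01/23/runtime-risk-realtime-defense-securing-ai-agents/",
    "BLOG_AGENT_365": "https://www.microsoft.com/en-us/microsoft-365/blog/2025/11/18/microsoft-agent-365-the-control-plane-for-ai-agents/",
    "DOCS_AGENTSINFO": "https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-agentsinfo-table",
    "DOCS_AGENT_PROTECTION": "https://learn.microsoft.com/en-us/defender-cloud-apps/ai-agent-protection",
    "DOCS_RUNTIME_PROTECTION": "https://learn.microsoft.com/en-us/defender-cloud-apps/real-time-agent-protection-during-runtime",
}


def render_reference(text: str, label: str) -> str:
    """Return a markdown link for a registered label, else the plain text.

    Hypothetical helper: the name and signature are illustrative only.
    """
    url = URL_REGISTRY.get(label)
    if url is None:
        return text  # unknown label: omit the hyperlink, never construct a URL
    return f"[{text}]({url})"


print(render_reference("Defender Runtime Protection", "DOCS_RUNTIME_PROTECTION"))
print(render_reference("some other article", "NOT_IN_REGISTRY"))
```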
Threat Landscape: Why AI Agent Posture Matters
Microsoft Defender Security Research has identified that AI agents represent a fundamentally new attack surface where the agent's capabilities are effectively equivalent to code execution. When a tool is invoked, it can read/write data, send emails, update records, or trigger workflows — and an attacker who can influence the agent's plan can indirectly cause the execution of unintended operations within the agent's capability sandbox.
The core risk: the agent's orchestrator depends on natural language input to determine which tools to use and how to use them. This creates exposure to prompt injection and reprogramming failures, where malicious prompts, embedded instructions, or crafted documents can manipulate the decision-making process.
This skill's queries map directly to three attack scenarios documented by Microsoft:
Attack Scenario 1: Malicious Instruction Injection via Event-Triggered Workflow
| Element | Detail |
|---|---|
| Vector | Crafted email sent to an agent-monitored mailbox (event trigger) |
| Mechanism | Email contains hidden instructions telling the agent to search knowledge base for sensitive data and exfiltrate via email to attacker |
| Preconditions | Agent can send email (email connector) + has an event/email trigger + a knowledge source |
| Detection | Q5 (XPIA Email Risk) detects email-capable agents via connector operations; Q7 (Knowledge Sources) identifies data exposure |
| Skill Signal | Agents with an email-send operation (e.g., Office 365 Outlook Send an email (V2)) + knowledge sources, especially if broadly accessible (allowForAllUsers == "true") = highest risk |
Attack Scenario 2: Prompt Injection via Shared Document → Email Exfiltration (XPIA)
| Element | Detail |
|---|---|
| Vector | Malicious insider edits a SharePoint document with crafted instructions |
| Mechanism | Agent processing the document is tricked into reading a sensitive file on a different SharePoint site (that the agent has access to but the attacker doesn't) and emailing contents to attacker-controlled domain |
| Preconditions | Agent has a knowledge/data source + an email-send connector operation |
| Detection | Q5 (XPIA) + Q7 (Knowledge Sources) identifies the attack surface |
| Skill Signal | A declared data source + an email-send operation (e.g., Send an email (V2)) on the same agent = classic XPIA vector |
Attack Scenario 3: Capability Reconnaissance on Unauthenticated Agent
| Element | Detail |
|---|---|
| Vector | Attacker interacts with publicly accessible chatbot (no authentication required) |
| Mechanism | Series of crafted prompts to probe and enumerate the agent's tools and knowledge sources, then exploit them to extract sensitive data |
| Preconditions | Agent is broadly accessible (allowForAllUsers == "true", e.g., shared tenant-wide or website embed) |
| Detection | Q4 (Broadly-Accessible Agents) identifies exposed agents; cross-reference with Q7 (knowledge sources with customer data) |
| Skill Signal | allowForAllUsers == "true" + knowledge sources containing sensitive data = reconnaissance target |
⚠️ Authentication-type telemetry gap: The deprecated `AIAgentsInfo` table exposed `UserAuthenticationType` (None/Integrated/Custom), which let this skill directly flag unauthenticated agents. The new `AgentsInfo` table has no populated authentication-type column in current telemetry (`ToolsAuthenticationType` is empty). The closest available exposure signal is `RawAgentInfo.allowForAllUsers == "true"` (broadly accessible to all tenant users). This is a proxy, not an equivalent: it measures broad reach, not absence of authentication. Treat broadly-accessible agents as the highest-exposure cohort and recommend Entra-based access policies (Agent 365) to close the gap.
Mitigation: Defender Runtime Protection
Microsoft Defender provides webhook-based runtime inspection for Copilot Studio agents. Before every tool, topic, or knowledge action is executed, the generative orchestrator sends a webhook to Defender containing the planned invocation context. Defender analyzes intent and destination in real time and can allow or block the action before execution.
This is the primary runtime defense against all three scenarios above. When reviewing posture findings from this skill, always recommend enabling Defender Runtime Protection for agents flagged as high-risk. See Real-time agent protection during runtime.
Governance Framework: Microsoft Agent 365
Microsoft Agent 365 is the enterprise control plane for AI agents — the platform-level answer to the governance gaps this skill detects. It provides five capabilities that directly map to this skill's risk dimensions:
| Agent 365 Capability | What It Does | Skill Dimensions Addressed |
|---|---|---|
| 1. Registry | Single source of truth for all agents (Entra agent ID). IT can quarantine unsanctioned agents and detect shadow agents. Agent Store for governed discovery. | Agent Inventory (Q1), Creator Governance (Q10), Agent Sprawl (Q11) |
| 2. Access Control | Unique agent IDs via Entra. Agent Policy Templates enforce security from day one. Adaptive, risk-based access policies. Least-privilege enforcement. | Broadly-Accessible Agents (Q4), Access Posture (Q3) |
| 3. Visualization | Unified dashboard mapping agents ↔ users ↔ resources. Role-based reporting. Compliance logging, e-discovery, and audit trail. | MCP Tool Exposure (Q6), Knowledge Sources (Q7), Creator Governance (Q10) |
| 4. Interoperability | Agents access Work IQ (org data, relationships, context). Works across Copilot Studio, Microsoft Foundry, Agent Framework, Agent 365 SDK, and partner platforms. | Knowledge Source Risk (Q7), Tools Inventory (Q12) |
| 5. Security | Defense-in-depth via Microsoft Defender (posture + threat detection + runtime protection), Entra (real-time blocking), and Purview (data exposure risk, sensitive data leak prevention, compliance). | XPIA Email Risk (Q5), Credential Hygiene (Q8), External Endpoint Risk (Q9) |
How to reference Agent 365 in reports: When this skill identifies governance gaps (sprawl, missing authentication, uncontrolled tool access), recommend Agent 365 as the strategic platform to address them. Specific mappings:
- Agent sprawl / no naming conventions → Agent 365 Registry + quarantine for unsanctioned agents
- Missing access controls / broadly-accessible agents → Agent 365 Access Control + Entra agent IDs + Policy Templates
- No visibility into agent-resource connections → Agent 365 Visualization dashboard
- Uncontrolled MCP/tool proliferation → Agent 365 Security + Defender posture management
- XPIA / data exfiltration risk → Agent 365 Security + Purview for real-time data leak prevention
📑 TABLE OF CONTENTS
- Critical Workflow Rules — Mandatory rules
- Table Schema Reference — AgentsInfo columns and data types
- Agent Security Score Formula — Composite risk scoring
- Execution Workflow — Phase-by-phase query plan
- Sample KQL Queries — All queries (Q1–Q12)
- Output Modes — Inline vs Markdown report
- Inline Report Template — Chat-rendered format
- Markdown File Report Template — Disk-saved format
- Known Pitfalls — Schema quirks and edge cases
- Quality Checklist — Pre-delivery validation
- SVG Dashboard Generation — Visual dashboard from report
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
- ALWAYS use `RunAdvancedHuntingQuery`. The `AgentsInfo` table is an Advanced Hunting table. It is NOT available in Sentinel Data Lake (`query_lake`). All queries in this skill MUST use `RunAdvancedHuntingQuery`.
- ALWAYS deduplicate agents with `arg_max`. The table contains multiple records per agent (state snapshots over time). Every query that analyzes current agent state MUST use `| summarize arg_max(Timestamp, *) by AgentId` to get the latest record per agent. Note that `AgentId` is a guid.
- ALWAYS exclude deleted agents (unless specifically auditing deletions). Add `| where LifecycleStatus != "Deleted"` after deduplication. `LifecycleStatus` is blank for active agents and only set to `Deleted` for removed ones, so this filter keeps active agents.
- ASK the user for output format before generating the report:
  - Inline chat summary (quick review in chat)
  - Markdown file report (detailed, archived to `reports/ai-agent-posture/`)
  - Both (markdown + inline summary)
- ⛔ MANDATORY: Evidence-based analysis only. Report ONLY what query results show. Use the explicit absence pattern (`✅ No [finding] detected`) when queries return 0 results. Never guess or assume.
- 🔴 The rich agent detail lives in `RawAgentInfo` (dynamic), not in flat columns. Governance signals (`creatorId`, `allowForAllUsers`, `appType`, `scope`) and deep tool/connector detail (`declarativeCopilotMetadata`) are nested inside the `RawAgentInfo` dynamic column. The normalized columns (`DeclaredTools`, `McpServers`, `DeclaredDataSources`) are sparse and flat. Parse `RawAgentInfo` with `mv-expand`/dot-notation; never assume a flat column holds the value. See Known Pitfalls.
- Run queries in parallel batches where possible. Phase 1 queries (Q1–Q3) are independent and can run in parallel. Phase 2 queries (Q4–Q9) are independent and can run in parallel. Phase 3 queries (Q10–Q14) can run in parallel.
- Time tracking: Report elapsed time after each phase completion.
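Why the deleted-agent filter must come after deduplication is easiest to see on a toy data set. The Python sketch below mimics the two KQL steps on hypothetical snapshot rows (the IDs, dates, and helper names are invented for illustration; real `AgentId` values are guids). Filtering first would let an older, pre-deletion snapshot of a removed agent survive.

```python
from datetime import datetime

# Hypothetical snapshot rows; only the three fields used here are modeled.
snapshots = [
    {"AgentId": "a1", "Timestamp": datetime(2026, 1, 1), "LifecycleStatus": ""},
    {"AgentId": "a1", "Timestamp": datetime(2026, 2, 1), "LifecycleStatus": "Deleted"},
    {"AgentId": "b2", "Timestamp": datetime(2026, 1, 5), "LifecycleStatus": ""},
    {"AgentId": "b2", "Timestamp": datetime(2026, 1, 9), "LifecycleStatus": ""},
]


def latest_per_agent(rows):
    """Keep the newest snapshot per AgentId (the arg_max step)."""
    latest = {}
    for row in rows:
        current = latest.get(row["AgentId"])
        if current is None or row["Timestamp"] > current["Timestamp"]:
            latest[row["AgentId"]] = row
    return list(latest.values())


def not_deleted(rows):
    """Drop rows whose LifecycleStatus is Deleted (blank means active)."""
    return [r for r in rows if r["LifecycleStatus"] != "Deleted"]


# Correct order: deduplicate first, then filter.
correct = sorted(r["AgentId"] for r in not_deleted(latest_per_agent(snapshots)))

# Wrong order: filtering first lets a1's older, pre-deletion snapshot survive.
wrong = sorted(r["AgentId"] for r in latest_per_agent(not_deleted(snapshots)))

print(correct)  # ['b2']
print(wrong)    # ['a1', 'b2']
```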
Table Schema Reference
The AgentsInfo table (Preview) contains configuration snapshots of AI agents across Copilot Studio, Microsoft 365 Copilot (Agent Builder), Microsoft Foundry, and third-party platforms. The schema below reflects the live table (which differs from the published docs in several places — column casing, types, and which columns are actually populated).
Top-level columns
| Column | Type | Description |
|---|---|---|
| `Timestamp` | datetime | Last recorded date/time for this agent snapshot |
| `AgentId` | guid | Unique agent identifier (dedup key) |
| `Name` | string | Display name of the agent |
| `Description` | string | Agent description |
| `Platform` | string | Copilot Studio, Agent Builder in Microsoft 365 Copilot, Microsoft Foundry, Other, SharePoint, Amazon Bedrock, LocalAgents |
| `Version` | string | Agent version |
| `PublishedStatus` | string | Published, Draft |
| `LifecycleStatus` | string | Blank for active agents; Deleted for removed agents |
| `CreatedDateTime` | datetime | When the agent was created |
| `LastUpdatedDateTime` | datetime | When last updated |
| `LastPublishedDateTime` | datetime | When last published |
| `Owners` | dynamic | Owner identities (sparse) |
| `SharedWith` | dynamic | Sharing targets (sparse) |
| `InstanceCount` | int | Blueprint instance count |
| `Instructions` | string | System prompt / agent instructions (well populated) |
| `Model` | string | Backing LLM model (sparse) |
| `Capabilities` | dynamic | Declared capabilities (sparse) |
| `DeclaredDataSources` | dynamic | Knowledge/data sources: array of filename/source strings (sparse) |
| `DeclaredTools` | dynamic | Declared tools: array of {type, name} (sparse, flat) |
| `McpServers` | dynamic | MCP servers: array of {name, description} (sparse) |
| `Skills`, `ConnectedAgents`, `Memory`, `Guardrails` | dynamic | Additional declared config (sparse) |
| `EntraAgentID` / `EntraBlueprintID` / `ObservabilityID` | string | Entra + observability linkage (note capital ID) |
| `RawAgentInfo` | dynamic | Primary detail source: full governance + connector manifest (populated for ~all agents). See nested keys below |
| `TenantId`, `Type`, `SourceSystem` | string | Standard envelope columns |
⚠️ Columns that are EMPTY / unreliable in current telemetry
These columns exist but are not populated in observed data — do NOT build detections on them without first confirming population:
ToolsAuthenticationType (auth-type gap — see below), Availability, Endpoints, Triggers, Permissions, Model (mostly), and most of Owners/SharedWith.
🔴 Authentication-type gap: The deprecated `AIAgentsInfo.UserAuthenticationType` (None/Integrated/Custom) has no populated equivalent in `AgentsInfo`. There is no reliable way to flag "unauthenticated" agents from this table. Use `RawAgentInfo.allowForAllUsers == "true"` as a broad-exposure proxy (Q4) and document the gap.
RawAgentInfo nested keys (the rich data)
For Copilot Studio agents, RawAgentInfo is a marketplace/governance manifest. Key fields the queries below rely on:
| Path | Meaning |
|---|---|
| `RawAgentInfo.creatorId` | Creator GUID (resolve to UPN via IdentityInfo join). Replaces `CreatorAccountUpn`. Sparse |
| `RawAgentInfo.allowForAllUsers` | `"true"` = broadly accessible to all tenant users (exposure signal). Replaces `AccessControlPolicy == "Any"` |
| `RawAgentInfo.appType` | `lob` (line-of-business, owner-scoped), `shared`, `thirdParty`, `firstParty` |
| `RawAgentInfo.scope` | Sharing scope (e.g., `tenant`) |
| `RawAgentInfo.declarativeCopilotMetadata` | Deep connector/tool detail (DCM). Present only for the connector-sourced subset (~10% of Copilot Studio agents) |
DCM nesting (recovers deep tool, operation, and endpoint detail):
RawAgentInfo.declarativeCopilotMetadata[]
  .actions[]
    .apis[]              // .type = OpenApi | RemoteMCPServer | api_action
      .serverUrls[]      // populated for OpenApi + RemoteMCPServer (external hosts)
      .operations[]
        .operationId     // e.g., "Office 365 Outlook Send an email (V2)"
DCM siblings also carry instructions, llmModels (model), and sourceIds (incl. EnvironmentId, SourceAgentId).
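To make the nesting concrete, here is a minimal Python sketch that walks a manifest shaped like the tree above and flattens it to one row per declared operation. The sample manifest is invented, the function name `walk_dcm` is hypothetical, and the sketch assumes that `serverUrls` and `operations` both sit under each `apis[]` entry. The KQL queries do the same traversal with chained `mv-expand` steps.

```python
# Hypothetical manifest shaped like the nesting above; all values are made up.
raw_agent_info = {
    "declarativeCopilotMetadata": [
        {
            "actions": [
                {
                    "apis": [
                        {
                            "type": "OpenApi",
                            "serverUrls": ["https://api.example.com"],
                            "operations": [
                                {"operationId": "Office 365 Outlook Send an email (V2)"}
                            ],
                        },
                        {
                            "type": "api_action",
                            "operations": [{"operationId": "GetItems"}],
                        },
                    ]
                }
            ]
        }
    ]
}


def walk_dcm(raw):
    """Yield (api_type, operation_id, server_urls) for every declared operation.

    Missing or empty levels are treated as empty lists, so agents without
    DCM simply produce no rows.
    """
    for dcm in raw.get("declarativeCopilotMetadata") or []:
        for action in dcm.get("actions") or []:
            for api in action.get("apis") or []:
                urls = api.get("serverUrls") or []
                for op in api.get("operations") or []:
                    yield api.get("type"), op.get("operationId"), urls


rows = list(walk_dcm(raw_agent_info))
email_ops = [op for _, op, _ in rows if op and "Send an email" in op]
print(rows)
print(email_ops)
```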
Agent Security Score Formula
The Agent Security Score is a composite risk indicator that summarizes the security posture of an organization's AI agent fleet. Higher scores indicate greater risk.
Scoring Dimensions
$$ \text{AgentSecurityScore} = \sum_{i} \text{DimensionScore}_i $$
Each dimension contributes 0–20 points to a maximum of 100:
| Dimension | Max | 🟢 Low (0–5) | 🟡 Medium (6–12) | 🔴 High (13–20) |
|---|---|---|---|---|
| Broadly-Accessible Agents | 20 | 0 agents with `allowForAllUsers == "true"` | 1–2 broadly-accessible agents | ≥3 broadly-accessible agents, especially if Published with knowledge sources or email capability |
| XPIA Email Risk | 20 | 0 email-capable agents | 1–2 email-capable agents (scoped access) | ≥1 email-capable agent that is also broadly accessible or has knowledge sources |
| Tool & Endpoint Exposure | 20 | 0–2 MCP agents, known creators, no external endpoints | 3–10 MCP agents, external endpoints all HTTPS/standard-port | >10 MCP agents, OR MCP/endpoint agents that are broadly accessible, OR any insecure-scheme / non-standard-port external endpoint (Q9 escalators) |
| Knowledge Source Risk | 20 | 0 agents with data sources + broad access | 1–3 agents with data sources + scoped access | Agents with data sources + allowForAllUsers == "true". Compounding rule: When agents have data sources + an email-send operation + broad access (the full XPIA chain from Q5 + Q7), score at maximum (20) for this dimension AND score XPIA Email Risk at maximum (20) — the combination is the documented attack pattern |
| Credential Hygiene | 20 | 0 credential patterns detected | Patterns found but agent is Draft (unpublished) | Patterns found in Published agents |
Interpretation Scale
| Score | Rating | Action |
|---|---|---|
| 0–20 | ✅ Healthy | Normal posture, no immediate concerns |
| 21–45 | 🟡 Elevated | Review — minor misconfigurations detected |
| 46–70 | 🟠 Concerning | Investigate — multiple risk signals present |
| 71–100 | 🔴 Critical | Immediate remediation — significant agent security risk |
The Tool & Endpoint Exposure dimension folds external-endpoint risk (Q9) into the MCP exposure signal: an insecure scheme, a non-standard port, or an external endpoint on a broadly-accessible agent each escalates this dimension to its High tier regardless of MCP count.
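As a worked example of the formula and the interpretation scale, the Python sketch below sums five per-dimension scores and maps the total to a rating. The dimension names and thresholds come from the tables above; the sample scores are invented, and the function names are hypothetical. The sample applies the compounding rule by scoring both XPIA Email Risk and Knowledge Source Risk at 20.

```python
# Dimension names come from the scoring table; the sample scores are invented.
DIMENSIONS = (
    "Broadly-Accessible Agents",
    "XPIA Email Risk",
    "Tool & Endpoint Exposure",
    "Knowledge Source Risk",
    "Credential Hygiene",
)


def agent_security_score(scores: dict) -> int:
    """Sum the five dimension scores, each of which must be within 0-20."""
    total = 0
    for name in DIMENSIONS:
        value = scores[name]
        if not 0 <= value <= 20:
            raise ValueError(f"{name} out of range: {value}")
        total += value
    return total


def rating(total: int) -> str:
    """Map a composite score onto the interpretation scale."""
    if total <= 20:
        return "Healthy"
    if total <= 45:
        return "Elevated"
    if total <= 70:
        return "Concerning"
    return "Critical"


sample = {
    "Broadly-Accessible Agents": 14,
    "XPIA Email Risk": 20,        # compounding rule: full XPIA chain present
    "Tool & Endpoint Exposure": 8,
    "Knowledge Source Risk": 20,  # compounding rule: full XPIA chain present
    "Credential Hygiene": 0,
}
total = agent_security_score(sample)
print(total, rating(total))  # 62 Concerning
```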
Supplementary Indicators (not summed into the /100 score)
Two indicators are reported alongside the composite score for added context. They are intentionally not added to the /100 total — they enrich interpretation and feed the dimensions above as evidence.
| Indicator | Source | What it tells you |
|---|---|---|
| Capability Privilege Index | Q13 | Count of agents holding ≥1 sensitive operation (mail-send, directory-write, data-write, messaging). Split by broad access. A high count of broadly-accessible + sensitive-op agents is the strongest privilege-abuse signal and should justify maxing the Broadly-Accessible Agents and/or XPIA Email Risk dimensions. |
| Deep-Manifest Coverage | Q14 | Percentage of the fleet carrying declarativeCopilotMetadata (DCM). Because the XPIA, endpoint, and capability queries depend on DCM, this is the fraction of the estate that was fully inspectable. Every report MUST surface this so the analyst knows what was not inspected. |
Execution Workflow
Phase 0: Prerequisites
- Confirm `RunAdvancedHuntingQuery` is available (AgentsInfo is AH-only)
- Ask user for output format (inline / markdown / both)
Phase 1: Inventory & Overview (Q1–Q3)
Run in parallel — no dependencies between queries.
| Query | Purpose |
|---|---|
| Q1 | Global inventory summary (counts, date range, platforms, creators) |
| Q2 | Status and platform breakdown |
| Q3 | Access posture distribution (appType / allowForAllUsers) |
Phase 2: Security Risk Analysis (Q4–Q9)
Run in parallel — no dependencies between queries.
| Query | Purpose |
|---|---|
| Q4 | Broadly-accessible agents (allowForAllUsers == "true" detail) |
| Q5 | XPIA email exfiltration risk (email-send connector operations) |
| Q6 | MCP tool inventory across agents |
| Q7 | Knowledge / data source audit |
| Q8 | Hard-coded credential scan |
| Q9 | External endpoint & HTTP risk (connector serverUrls) |
Phase 3: Governance & Trends (Q10–Q14)
Run in parallel — no dependencies between queries.
| Query | Purpose |
|---|---|
| Q10 | Top creators and naming hygiene |
| Q11 | Agent creation trend over time |
| Q12 | Capability / tools inventory (all operation types) |
| Q13 | Operation-level privilege mapping (sensitive-operation matrix → Capability Privilege Index) |
| Q14 | Deep-manifest coverage (% of fleet with DCM → report coverage banner) |
Phase 4: Score Computation & Report Generation
- Compute per-dimension scores from Phase 1–3 data
- Sum dimension scores for composite Agent Security Score
- Generate report in requested output mode
- Report total elapsed time
Phase 5: Runtime Correlation (Optional)
AgentsInfo describes how agents are configured; it does not show whether they are actually used or what they do at runtime. To close that gap, correlate the flagged configuration set against the CopilotActivity table (all-surface AI activity log, available in Advanced Hunting).
When to run: After Phase 4, when the user wants to know which flagged agents are actually active, which are dormant, or whether a high-risk agent shows runtime behavior.
🔴 Use a SCOPED lookup, never a fleet-wide join. A leftouter/inner join of the full AgentsInfo fleet (~15k agents, heavy RawAgentInfo dynamic) against CopilotActivity (100k+ rows) times out the Advanced Hunting endpoint. Instead:
- Phase 4 produces a small flagged-agent NAME list (broadly-accessible from Q4 + sensitive-op agents from Q13 — typically <50 names).
- Filter `CopilotActivity` to that name set with `where AgentName in (FlaggedNames)` (light, no join). See Query 15.
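One way to picture the scoped lookup is a small helper that turns the flagged-name list into a name-filtered query string. This Python sketch is illustrative only and is not the skill's Query 15: the helper names are hypothetical, the quoting rule (backslash-escaping `\` and `"` inside a double-quoted KQL literal) is an assumption to confirm against the Kusto documentation, and the `summarize` line is just a placeholder aggregation.

```python
def kql_string(value: str) -> str:
    """Quote a value as a double-quoted KQL string literal.

    Assumption: backslash and double quote are escaped with a backslash.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def scoped_activity_query(flagged_names):
    """Build a name-scoped CopilotActivity filter with no join."""
    names = sorted({n for n in flagged_names if n})  # de-duplicate, drop blanks
    if not names:
        raise ValueError("no flagged agent names to look up")
    name_list = ", ".join(kql_string(n) for n in names)
    return (
        "CopilotActivity\n"
        f"| where AgentName in ({name_list})\n"
        "| summarize Interactions = count() by AgentName"
    )


# Hypothetical flagged names (Q4 + Q13 output would supply the real list).
query = scoped_activity_query(["HR Helper", 'Sales "Bot"', "HR Helper"])
print(query)
```

Keeping the name list small (the flagged set, not the whole fleet) is what keeps the filter light enough for the Advanced Hunting endpoint.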
Join-key pitfall: CopilotActivity.AgentId is a composite/prefixed string (e.g., T_<tenant>.<guid>, CopilotStudio.Declarative.T_….gpt.<guid>, or literals like AgentBuilder) — it does not equal the clean AgentsInfo.AgentId GUID, so ID-based joins return 0 matches. AgentName is the reliable correlation key. Also note most CopilotActivity rows have an empty AgentId/AgentName (general M365 Copilot usage, not declarative-agent-attributed), so runtime attribution is inherently low-coverage — absence from CopilotActivity does NOT prove an agent is dormant.
Two high-value correlations:
- Active-and-dangerous: a flagged agent (broadly accessible / XPIA-exposed / sensitive ops) that ALSO appears in `CopilotActivity` with real interactions → highest remediation priority (Query 15).
- Configured-but-dormant: a flagged agent absent from `CopilotActivity` over the window → lower urgency, candidate for decommissioning (caveat: attribution gaps above).
For deeper runtime reconstruction (data accessed, tools invoked, jailbreak detections), hand off to the dedicated query library queries/cloud/copilot_activity_investigation.md rather than duplicating queries here.
Keep this phase thin and scoped: the posture skill owns configuration assessment; `copilot_activity_investigation.md` owns runtime reconstruction. Reference, don't duplicate.
Sample KQL Queries
All queries below are validated against the live `AgentsInfo` table. Use them exactly as written, substituting only where noted. Because the rich agent detail lives in the `RawAgentInfo` dynamic column, several queries parse `RawAgentInfo.declarativeCopilotMetadata` (DCM). DCM is present only for the connector-sourced subset of agents, so queries that depend on it carry a coverage caveat.
Query 1: Global Inventory Summary
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| extend CreatorId = tostring(RawAgentInfo.creatorId)
| summarize
UniqueAgents = dcount(AgentId),
EarliestRecord = min(Timestamp),
LatestRecord = max(Timestamp),
Published = countif(PublishedStatus == "Published"),
Draft = countif(PublishedStatus == "Draft"),
Deleted = countif(LifecycleStatus == "Deleted"),
UniquePlatforms = dcount(Platform),
UniqueCreators = dcountif(CreatorId, isnotempty(CreatorId)) // skip blank creatorId so the empty value is not counted as a creator
Note: `UniqueCreators` counts only agents with a populated `RawAgentInfo.creatorId` (the connector-sourced subset). It under-counts true creators; treat it as a lower bound.
Query 2: Status & Platform Breakdown
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| summarize AgentCount = count() by Platform, PublishedStatus
| order by AgentCount desc
⚠️ Authentication-type gap: The deprecated `AIAgentsInfo` table broke this down by `UserAuthenticationType`. `AgentsInfo` has no populated authentication-type column, so this query reports status by platform instead. For exposure, use Q3 (access posture) and Q4 (broadly-accessible agents).
Query 3: Access Posture Distribution
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| extend AppType = tostring(RawAgentInfo.appType),
AllowAllUsers = tostring(RawAgentInfo.allowForAllUsers)
| summarize AgentCount = count() by Platform, AppType, AllowAllUsers
| order by AgentCount desc
Interpretation: appType == "lob" (line-of-business) agents are owner-scoped; appType == "shared" are shared more widely. allowForAllUsers == "true" (any platform) is the broad-exposure signal — these reach every tenant user. This replaces the old AccessControlPolicy distribution.
Query 4: Broadly-Accessible Agents
🔴 Security-critical query — agents with allowForAllUsers == "true" are accessible to all tenant users. This is the closest available proxy for the old "unauthenticated / Any access" exposure signal (see the authentication-type gap).
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| extend AllowAllUsers = tostring(RawAgentInfo.allowForAllUsers),
AppType = tostring(RawAgentInfo.appType),
CreatorId = tostring(RawAgentInfo.creatorId)
| where AllowAllUsers == "true"
| project Name, Platform, PublishedStatus, AppType, CreatorId, AgentId, CreatedDateTime, Description
| order by PublishedStatus asc, CreatedDateTime desc
Post-processing: For each broadly-accessible agent, note:
- Is it Published (active) or Draft?
- Cross-reference with Q5 (email-capable) and Q7 (knowledge sources) for compounding XPIA / reconnaissance risk.
🔴 Capability Reconnaissance Risk (Attack Scenario 3): Broadly-accessible agents are prime targets for adversarial probing. Published agents with knowledge sources containing customer/internal data are the highest-priority findings.
Query 5: XPIA Email Exfiltration Risk (Email-Capable Agents)
🔴 Security-critical query — agents that can send email via a connector operation. A successful prompt-injection (XPIA) attack could direct the agent to exfiltrate data to arbitrary recipients.
Coverage caveat: Detects email-send operations declared in `RawAgentInfo.declarativeCopilotMetadata` (DCM). DCM is present only for the connector-sourced agent subset. The old `IsGenerativeOrchestrationEnabled` flag and action-level `inputs` (AI-controlled vs hardcoded recipient) are not available in `AgentsInfo`, so this query identifies capability, not orchestration mode.
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where isnotempty(tostring(RawAgentInfo.declarativeCopilotMetadata))
| mv-expand DCM = RawAgentInfo.declarativeCopilotMetadata
| mv-expand Action = DCM.actions
| mv-expand Api = Action.apis
| mv-expand Op = Api.operations
| extend OperationId = tostring(Op.operationId)
| where OperationId has "Send an email" or OperationId has "SendEmail"
| extend AllowAllUsers = tostring(RawAgentInfo.allowForAllUsers),
CreatorId = tostring(RawAgentInfo.creatorId)
| summarize EmailOperations = make_set(OperationId)
by AgentId, Name, Platform, PublishedStatus, AllowAllUsers, CreatorId
| order by AllowAllUsers desc, PublishedStatus asc
Post-processing:
- `AllowAllUsers == "true"` → email-capable and broadly accessible = highest XPIA risk (any tenant user can trigger the chain).
- Cross-reference with Q7: an email-capable agent that also has knowledge/data sources is the documented XPIA exfiltration pattern (Attack Scenario 2). Prioritize these for Defender Runtime Protection.
🔴 Attack Scenario Mapping: This query detects the agent-configuration precondition (email-send capability) for two documented scenarios — Malicious Instruction Injection via Event Trigger and Prompt Injection via Shared Document. Broadly-accessible email-capable agents (no access restriction + email) are the most dangerous.
Query 6: MCP Tool Inventory Across Agents
🟠 Governance query — MCP servers give agents access to external systems, Graph API, Sentinel data, and more. Uncontrolled MCP proliferation increases the attack surface. AgentsInfo exposes a dedicated McpServers column (cleaner than the old tool-detail parse).
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where array_length(McpServers) > 0
| mv-expand Mcp = McpServers
| extend McpName = tostring(Mcp.name)
| extend CreatorId = tostring(RawAgentInfo.creatorId),
AllowAllUsers = tostring(RawAgentInfo.allowForAllUsers)
| summarize McpServerList = make_set(McpName), McpToolCount = dcount(McpName)
by AgentId, Name, Platform, CreatorId, AllowAllUsers
| order by McpToolCount desc
MCP server distribution (which servers appear on the most agents):
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where array_length(McpServers) > 0
| mv-expand Mcp = McpServers
| summarize AgentCount = dcount(AgentId) by McpServer = tostring(Mcp.name)
| order by AgentCount desc
Note: McpServers is flat ({name, description} only), with no server URLs or credential config. For external MCP endpoint detail (host/scheme/port), use Q9, which parses RemoteMCPServer serverUrls from DCM.
Query 7: Knowledge / Data Source Audit
🟡 Data exposure query — identifies what data sources agents declare. In AgentsInfo, declared sources appear in the DeclaredDataSources column as an array of source/filename strings.
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where array_length(DeclaredDataSources) > 0
| mv-expand DS = DeclaredDataSources
| extend DataSource = tostring(DS)
| extend AllowAllUsers = tostring(RawAgentInfo.allowForAllUsers),
CreatorId = tostring(RawAgentInfo.creatorId)
| summarize DataSources = make_set(DataSource), SourceCount = dcount(DataSource)
by AgentId, Name, Platform, AllowAllUsers, CreatorId
| order by SourceCount desc
Post-processing — flag high-risk combinations:
- Data sources + allowForAllUsers == "true" → internal data potentially exposed broadly.
- Any data source on an agent that is also email-capable (Q5) → XPIA exfiltration chain.
Coverage caveat: DeclaredDataSources is sparse and stores source names/filenames (e.g., Priority-Banking-Policy.docx), not the richer $kind/site structure the old KnowledgeDetails column held. Source type classification (SharePoint vs public site vs federated) is not reliably available, so report the declared source names and flag broadly-accessible agents that carry any.
🔴 Document Injection Risk (Attack Scenario 2): Data sources are the primary vector for indirect prompt injection (XPIA). Cross-reference with Q5: agents that combine declared data sources with an email-send operation are the textbook XPIA exfiltration pattern — flag these as highest priority in the Knowledge Source Risk dimension.
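The Q5 and Q7 cross-reference can be done offline once both result sets are exported. The following is a minimal Python sketch, assuming each result set is a list of dicts keyed by the projected column names (AgentId, Name, AllowAllUsers). The function name xpia_chain_candidates and the row shape are illustrative assumptions, not part of the skill.

```python
def xpia_chain_candidates(q5_rows, q7_rows):
    """Return agents present in both Q5 (email-capable) and Q7 (data sources).

    Rows are dicts with at least AgentId, Name, AllowAllUsers (assumed shape).
    Broadly-accessible agents sort first, then by name.
    """
    email_ids = {row["AgentId"] for row in q5_rows}
    hits = [row for row in q7_rows if row["AgentId"] in email_ids]
    # AllowAllUsers is the string "true"/"false", matching the KQL projection.
    return sorted(hits, key=lambda r: (r["AllowAllUsers"] != "true", r["Name"]))
```

Agents returned here carry both halves of the exfiltration chain within the DCM-bearing subset; an empty list only means no overlap was observed in that subset.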
Query 8: Hard-Coded Credential Scan
🔴 Security-critical query — scans agent Instructions and the connector metadata in RawAgentInfo for patterns matching API keys, JWTs, Basic auth headers, and embedded credentials.
let suspicious_patterns = @"(AKIA[0-9A-Z]{16})|(AIza[0-9A-Za-z_\-]{35})|(xox[baprs]-[0-9a-zA-Z]{10,48})|(ghp_[A-Za-z0-9]{36,59})|(sk_(live|test)_[A-Za-z0-9]{24})|(SG\.[A-Za-z0-9]{22}\.[A-Za-z0-9]{43})|(eyJ[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+)|(Authorization\s*:\s*Basic\s+[A-Za-z0-9=:+]+)|([A-Za-z]+:\/\/[^\/\s]+:[^\/\s]+@[^\/\s]+)";
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| extend Haystack = strcat(tostring(Instructions), " ", tostring(RawAgentInfo.declarativeCopilotMetadata))
| where Haystack matches regex suspicious_patterns
| project Name, Platform, PublishedStatus,
CreatorId = tostring(RawAgentInfo.creatorId), AgentId
Post-processing:
- Published agents with credential matches = immediate remediation required.
- Recommend Azure Key Vault + environment variables instead of hard-coded secrets.
- The JWT (eyJ...) and url://user:pass@host patterns can false-positive on example payloads, so manually review each match.
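Because Q8 returns only the agents that matched, not which pattern fired, manual review is easier if each hit is labelled by pattern family. The sketch below is a hedged Python illustration that re-tests exported Instructions text against a subset of the Q8 patterns. The labels, the FALSE_POSITIVE_PRONE set, and the function name are assumptions introduced for this example.

```python
import re

# Subset of the Q8 patterns, split out so each match can be labelled.
PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github_token": re.compile(r"ghp_[A-Za-z0-9]{36,59}"),
    "jwt": re.compile(r"eyJ[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+"),
    "basic_auth_header": re.compile(r"Authorization\s*:\s*Basic\s+[A-Za-z0-9=:+]+"),
    "url_credentials": re.compile(r"[A-Za-z]+://[^/\s]+:[^/\s]+@[^/\s]+"),
}

# Families the post-processing notes call out as matching example payloads.
FALSE_POSITIVE_PRONE = {"jwt", "url_credentials"}


def label_matches(text):
    """Return (label, false_positive_prone) for every pattern found in text."""
    return [
        (name, name in FALSE_POSITIVE_PRONE)
        for name, rx in PATTERNS.items()
        if rx.search(text)
    ]
```

The flag only orders the review queue; every match still needs a human look before remediation.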
Query 9: External Endpoint & HTTP Risk
🟠 Network risk query — inventories the external hosts that agent connectors reach, and flags insecure schemes or non-standard ports. External endpoints are declared in DCM apis[].serverUrls for OpenApi and RemoteMCPServer connector types (these are populated; api_action Power Platform connectors abstract the URL and are not covered).
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where isnotempty(tostring(RawAgentInfo.declarativeCopilotMetadata))
| mv-expand DCM = RawAgentInfo.declarativeCopilotMetadata
| mv-expand Action = DCM.actions
| mv-expand Api = Action.apis
| extend ApiType = tostring(Api.type)
| where ApiType in ("OpenApi", "RemoteMCPServer")
| mv-expand Url = Api.serverUrls
| extend Url = tostring(Url)
| where isnotempty(Url)
| extend Host = tostring(parse_url(Url).Host),
Port = tostring(parse_url(Url).Port),
Scheme = tostring(parse_url(Url).Scheme)
| extend NonStandardPort = isnotempty(Port) and Port !in ("443", "80", ""),
InsecureScheme = Scheme != "https"
| project Name, Platform, ApiType, Scheme, Host, Port, Url,
NonStandardPort, InsecureScheme,
AllowAllUsers = tostring(RawAgentInfo.allowForAllUsers)
| order by NonStandardPort desc, InsecureScheme desc, Host asc
Post-processing:
- InsecureScheme == true (non-HTTPS) or NonStandardPort == true → review the connector; data may transit insecurely.
- Unfamiliar external hosts on broadly-accessible agents (AllowAllUsers == "true") → highest priority.
Coverage caveat: Only OpenApi + RemoteMCPServer connectors declare serverUrls. Power Platform api_action connectors (the majority) do not expose a URL here, so their destinations are not inventoried by this query. The old topic-level HttpRequestAction parsing is not applicable to AgentsInfo.
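The flag logic in Q9 can be reproduced outside Advanced Hunting, for example to re-check a URL list pasted from a report. This is a minimal Python sketch using the standard library urllib.parse module; the function name classify_endpoint and the returned dict shape are assumptions chosen to mirror the Q9 column names.

```python
from urllib.parse import urlsplit


def classify_endpoint(url):
    """Derive the Q9 risk flags for a single server URL."""
    parts = urlsplit(url)
    port = parts.port  # None when the URL carries no explicit port
    return {
        "Scheme": parts.scheme,
        "Host": parts.hostname or "",
        "Port": "" if port is None else str(port),
        # Same rule as Q9: an explicit port other than 443 or 80.
        "NonStandardPort": port is not None and port not in (443, 80),
        "InsecureScheme": parts.scheme != "https",
    }
```

As in the query, a URL with no explicit port is never flagged as non-standard, and anything other than https counts as insecure.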
Query 10: Top Creators & Naming Hygiene
👥 Governance query — identifies prolific agent creators and names lacking descriptiveness. Creator is a GUID in RawAgentInfo.creatorId; resolve to UPN via an IdentityInfo join.
let IdMap = materialize(IdentityInfo
| where isnotempty(AccountObjectId) and isnotempty(AccountUpn)
| distinct AccountObjectId, AccountUpn);
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| extend CreatorId = tostring(RawAgentInfo.creatorId)
| where isnotempty(CreatorId)
| join kind=leftouter IdMap on $left.CreatorId == $right.AccountObjectId
| extend CreatorUpn = coalesce(AccountUpn, CreatorId)
| summarize
AgentCount = count(),
PublishedCount = countif(PublishedStatus == "Published"),
GenericNameCount = countif(Name in~ ("Agent", "Test", "New Agent")), // in~ is case-insensitive
NoDescriptionCount = countif(isempty(Description)),
AgentNames = make_set(Name, 10)
by CreatorUpn
| order by AgentCount desc
| take 20
Coverage caveat: Only agents with a populated RawAgentInfo.creatorId are attributed. Creators whose GUID does not resolve in IdentityInfo fall back to the raw GUID. A single creator with a very high AgentCount is a sprawl signal worth investigating.
Query 11: Agent Creation Trend
📈 Trend query — shows agent creation velocity over time to detect sprawl acceleration.
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where isnotempty(CreatedDateTime)
| summarize AgentsCreated = count() by bin(CreatedDateTime, 7d)
| order by CreatedDateTime asc
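The weekly counts from Q11 are often easier to read as a text chart. The following Python sketch renders one bar per week; it assumes the results were exported as (week_start, count) pairs, and the function name ascii_bars and the width parameter are hypothetical.

```python
def ascii_bars(weekly_counts, width=30):
    """Render (week_start, count) pairs as one text bar per week."""
    peak = max((count for _, count in weekly_counts), default=0)
    lines = []
    for week, count in weekly_counts:
        # Integer scaling keeps the longest bar at exactly `width` characters.
        bar = "#" * (count * width // peak) if peak else ""
        lines.append(f"{week} | {bar} {count}")
    return lines
```

A run of steadily lengthening bars toward the most recent weeks is the sprawl-acceleration signal this query looks for.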
Query 12: Full Capability / Tools Inventory
🛠️ Tools governance query — catalogs the operations agents can invoke across all connector types, to understand the full capability surface. Parses DCM operations (operationId + API type).
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where isnotempty(tostring(RawAgentInfo.declarativeCopilotMetadata))
| mv-expand DCM = RawAgentInfo.declarativeCopilotMetadata
| mv-expand Action = DCM.actions
| mv-expand Api = Action.apis
| mv-expand Op = Api.operations
| extend OperationId = tostring(Op.operationId), ApiType = tostring(Api.type)
| where isnotempty(OperationId)
| summarize AgentCount = dcount(AgentId), Agents = make_set(Name, 5) by OperationId, ApiType
| order by AgentCount desc
Alternative for non-DCM agents — the flat DeclaredTools column ({type, name}) covers agents without DCM:
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where array_length(DeclaredTools) > 0
| mv-expand Tool = DeclaredTools
| summarize AgentCount = dcount(AgentId)
by ToolType = tostring(Tool.type), ToolName = tostring(Tool.name)
| order by AgentCount desc
Coverage caveat: The DCM query yields deep operation-level detail but only for the connector-sourced subset. The DeclaredTools fallback is broader but flatter (tool name/type only, no operation IDs). Run both for the fullest picture.
Query 13: Operation-Level Privilege Mapping
🔐 Privilege query — buckets every declared operation into a sensitivity category (mail-send, directory-write, data-write, messaging, security-tooling, read/other) to surface where write/exfiltration capability concentrates. Feeds the Capability Privilege Index supplementary indicator.
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where isnotempty(tostring(RawAgentInfo.declarativeCopilotMetadata))
| mv-expand DCM = RawAgentInfo.declarativeCopilotMetadata
| mv-expand Action = DCM.actions
| mv-expand Api = Action.apis
| mv-expand Op = Api.operations
| extend OperationId = tostring(Op.operationId)
| where isnotempty(OperationId)
| extend PrivilegeCategory = case(
OperationId has_any ("Send an email", "SendEmail", "Send email"), "Mail-Send",
OperationId has_any ("AddUserToGroup", "RemoveMember", "UpdatePerson", "UpdateOrganisation", "Create user", "Delete user", "Update user", "Assign"), "Directory-Write",
OperationId has_any ("unbound action", "Create a row", "Update a row", "Delete a row", "Create record", "Update record"), "Data-Write",
OperationId has_any ("Post message", "Post a message", "Send message", "Create chat", "post in a chat"), "Messaging",
OperationId has_any ("Security Copilot", "Sentinel"), "Security-Tooling",
"Other/Read")
| summarize AgentCount = dcount(AgentId) by PrivilegeCategory
| order by AgentCount desc
Capability Privilege Index — distinct agents holding ≥1 sensitive (write/send) operation, split by broad access:
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where isnotempty(tostring(RawAgentInfo.declarativeCopilotMetadata))
| extend AllowAllUsers = tostring(RawAgentInfo.allowForAllUsers)
| mv-expand DCM = RawAgentInfo.declarativeCopilotMetadata
| mv-expand Action = DCM.actions
| mv-expand Api = Action.apis
| mv-expand Op = Api.operations
| extend OperationId = tostring(Op.operationId)
| where OperationId has_any ("Send an email", "SendEmail", "AddUserToGroup", "RemoveMember", "UpdatePerson", "UpdateOrganisation", "unbound action", "Create a row", "Update a row", "Delete a row", "Post message", "post in a chat")
| summarize SensitiveAgents = dcount(AgentId),
BroadAndSensitive = dcountif(AgentId, AllowAllUsers == "true")
Interpretation: BroadAndSensitive > 0 is a direct privilege-abuse signal: a broadly-accessible agent that can write to the directory, write data, or send mail. These agents justify maxing the Broad Access and/or XPIA dimensions. Tune the operation keyword lists to your tenant's connector set.
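When tuning the keyword lists, it can help to test candidate operation IDs against the bucketing rules before editing the query. The Python sketch below mirrors the Q13 case() ordering (first match wins). It uses case-insensitive substring matching, which is only an approximation of KQL has_any (has_any matches whole terms), so treat results as indicative. The names CATEGORIES and privilege_category are assumptions.

```python
# Ordered (category, keywords) pairs; the first matching category wins,
# mirroring the top-to-bottom evaluation of the KQL case() expression.
CATEGORIES = [
    ("Mail-Send", ["Send an email", "SendEmail", "Send email"]),
    ("Directory-Write", ["AddUserToGroup", "RemoveMember", "UpdatePerson",
                         "UpdateOrganisation", "Create user", "Delete user",
                         "Update user", "Assign"]),
    ("Data-Write", ["unbound action", "Create a row", "Update a row",
                    "Delete a row", "Create record", "Update record"]),
    ("Messaging", ["Post message", "Post a message", "Send message",
                   "Create chat", "post in a chat"]),
    ("Security-Tooling", ["Security Copilot", "Sentinel"]),
]


def privilege_category(operation_id):
    """Bucket one operationId; substring match approximates has_any."""
    lowered = operation_id.lower()
    for category, keywords in CATEGORIES:
        if any(keyword.lower() in lowered for keyword in keywords):
            return category
    return "Other/Read"
```

Because order decides ties, an operation ID containing both a mail keyword and a messaging keyword lands in Mail-Send, the same as in the query.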
Query 14: Deep-Manifest Coverage (Report Banner)
📊 Coverage query — reports what fraction of the fleet carries the deep declarativeCopilotMetadata (DCM) that the XPIA, endpoint, and capability queries depend on. Run this every report and surface the result as a banner so the analyst knows what was not fully inspected.
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| summarize Total = count(),
WithDCM = countif(isnotempty(tostring(RawAgentInfo.declarativeCopilotMetadata))),
WithInstructions = countif(isnotempty(Instructions)),
WithObservabilityID = countif(isnotempty(ObservabilityID)),
WithEntraAgentID = countif(isnotempty(EntraAgentID))
| extend DcmCoveragePct = round(100.0 * WithDCM / Total, 1),
InstrCoveragePct = round(100.0 * WithInstructions / Total, 1),
ObsIdPct = round(100.0 * WithObservabilityID / Total, 1),
EntraIdPct = round(100.0 * WithEntraAgentID / Total, 1)
Why both ID columns: ObservabilityID is near-universally populated (~100%) and is the natural runtime-correlation handle; EntraAgentID is sparse (only agents provisioned with an Entra Agent ID). Report both so the analyst knows which runtime/identity correlations are feasible.
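Turning the single Q14 result row into the report banner is simple arithmetic. The sketch below shows one way to do it in Python, assuming the counts are passed in as integers. The function names are hypothetical, and the zero-total guard is an addition for the offline case (the query itself does not guard against an empty fleet).

```python
def coverage_pct(part, total):
    """Percentage rounded to one decimal; 0.0 when the fleet is empty."""
    return round(100.0 * part / total, 1) if total else 0.0


def coverage_banner(total, with_dcm, with_obs_id, with_entra_id):
    """Format the Deep-Manifest Coverage banner from Q14 counts."""
    return (
        f"Deep-Manifest Coverage (Q14): {with_dcm}/{total} agents "
        f"({coverage_pct(with_dcm, total)}%) carry declarativeCopilotMetadata; "
        f"ObservabilityID {coverage_pct(with_obs_id, total)}%, "
        f"EntraAgentID {coverage_pct(with_entra_id, total)}%. "
        f"{total - with_dcm} agents were inventoried but not deeply inspected."
    )
```

The last sentence of the banner is the part analysts tend to skip, so it is generated rather than left to be typed by hand.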
Query 15: Runtime Correlation — Active-and-Dangerous (Scoped)
🎯 Runtime query (Phase 5) — confirms which flagged agents are actually active. Scoped by name list — no fleet-wide join (see Phase 5 for why a full join times out). Populate FlaggedNames from the Q4 broadly-accessible and Q13 sensitive-op results.
let FlaggedNames = dynamic(["<broadly-accessible or sensitive-op agent names from Q4/Q13>"]);
CopilotActivity
| where TimeGenerated > ago(7d)
| where AgentName in (FlaggedNames)
| summarize Interactions = count(),
DistinctUsers = dcount(ActorUserId),
LastSeen = max(TimeGenerated),
SrcIPs = dcount(SrcIpAddr) by AgentName
| order by Interactions desc
Interpretation: A flagged agent appearing here with real Interactions is active-and-dangerous; prioritize it for remediation over dormant flagged agents. The join key is AgentName (CopilotActivity.AgentId is a composite prefixed string that does NOT equal AgentsInfo.AgentId). Absence here does not prove dormancy, because most CopilotActivity rows are unattributed (empty AgentName). AIModelName is sparse in this table; do not rely on it for model inventory.
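The distinction between "confirmed active" and "not seen" is easy to blur when writing up results. The Python sketch below keeps the two groups separate; it assumes the Q15 output is a list of dicts with AgentName and Interactions keys, and the function name is hypothetical.

```python
def split_by_runtime_evidence(flagged_names, q15_rows):
    """Partition flagged agents into confirmed-active rows and unconfirmed names."""
    seen = {row["AgentName"]: row for row in q15_rows if row["Interactions"] > 0}
    active = [seen[name] for name in flagged_names if name in seen]
    # Deliberately "unconfirmed", not "dormant": most CopilotActivity rows
    # carry no AgentName, so a missing agent may still be in use.
    unconfirmed = [name for name in flagged_names if name not in seen]
    active.sort(key=lambda row: row["Interactions"], reverse=True)
    return active, unconfirmed
```

Only the first list supports a remediation-priority claim; the second should be reported with the attribution caveat.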
Output Modes
Mode 1: Inline Chat Summary
Render the full analysis directly in the chat response. Best for quick review.
Mode 2: Markdown File Report
Save a comprehensive report to disk at:
reports/ai-agent-posture/AI_Agent_Posture_Report_YYYYMMDD_HHMMSS.md
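For illustration, the timestamped path can be built as follows. This is a small Python sketch that assumes the filename timestamp is taken in UTC (the report header uses UTC, but the skill does not state this for the filename); the function name report_path is hypothetical.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def report_path(now=None):
    """Build the Mode 2 report path for the given (or current UTC) time."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return PurePosixPath("reports/ai-agent-posture") / f"AI_Agent_Posture_Report_{stamp}.md"
```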
Mode 3: Both
Generate the markdown file AND provide an inline summary in chat.
Always ask the user which mode before generating output.
Inline Report Template
Render the following sections in order. Omit sections only if explicitly noted as conditional.
🔴 URL Rule: All hyperlinks in the report MUST be copied verbatim from the URL Registry above. Do NOT generate, recall from memory, or paraphrase any URL. If a needed URL is not in the registry, use plain text (no hyperlink).
# 🤖 AI Agent Security Posture Report
**Generated:** YYYY-MM-DD HH:MM UTC
**Data Source:** AgentsInfo (Advanced Hunting)
**Analysis Period:** <EarliestRecord> → <LatestRecord>
**Platforms:** <list discovered Platform values>
---
> 📊 **Deep-Manifest Coverage (Q14):** `<WithDCM>/<Total>` agents (**<DcmCoveragePct>%**) carry `declarativeCopilotMetadata` — the XPIA, external-endpoint, and capability findings below cover **only this subset**. Instructions present on **<InstrCoveragePct>%**, ObservabilityID on **<ObsIdPct>%** (runtime-correlation handle), EntraAgentID on **<EntraIdPct>%**. The remaining `<Total - WithDCM>` agents were inventoried but not deeply inspected.
---
## Executive Summary
<2-3 sentences: total agents, key risk findings, overall score>
**Overall Risk Rating:** 🔴/🟠/🟡/✅ <RATING> (<Score>/100)
---
## Key Metrics
| Metric | Value |
|--------|-------|
| Total Agents (non-deleted) | <N> |
| Published Agents | <N> |
| Draft Agents | <N> |
| Platforms Represented | <N> |
| Resolved Creators (lower bound) | <N> |
| Broadly-Accessible Agents (allowForAllUsers) | <N> |
| Agents with MCP Servers | <N> |
| Agents with Declared Data Sources | <N> |
| Email-Capable Agents (XPIA Risk) | <N> |
> ℹ️ **Coverage note:** Creator and capability metrics are derived from `RawAgentInfo` and `declarativeCopilotMetadata`, which are sparsely populated. Counts marked "lower bound" reflect only agents with the relevant field present — see per-section caveats.
---
## 🔓 Access Posture
> **Authentication-type gap:** `AgentsInfo` has no equivalent to the old `UserAuthenticationType` (None/Microsoft/Custom). The `ToolsAuthenticationType` column is effectively empty in practice. Access exposure is assessed via the `RawAgentInfo.allowForAllUsers` governance signal instead — a **proxy for broad exposure, not an authentication state**.
### Access Distribution (Q3)
| App Type | Allow-All-Users | Count |
|----------|-----------------|-------|
| <appType> | <true/false> | <N> |
### 🔴 Broadly-Accessible Agents (Q4)
<If Q4 returns results:>
| Agent Name | Platform | App Type | Published | Created |
|------------|----------|----------|-----------|---------|
| <name> | <platform> | <appType> | <status> | <date> |
<If Q4 returns 0:>
✅ No broadly-accessible agents (`allowForAllUsers == "true"`) detected.
---
## 📧 XPIA Email Exfiltration Risk
<If Q5 returns results:>
| Agent Name | Platform | Email Operation | Broadly Accessible |
|------------|----------|-----------------|--------------------|
| <name> | <platform> | <operationId> | 🔴 Yes / 🟢 No |
**Risk Assessment:**
- 🔴 Email-capable agents can be exploited via XPIA to exfiltrate data, especially when combined with declared data sources (Q7).
- ⚠️ Recommendation: Review recipient controls; apply Power Platform DLP and Defender Runtime Protection.
> **Coverage caveat:** Email capability is detected from DCM `operations[].operationId` (e.g., "Send an email", "SendEmail"). There is no longer a GenAI-orchestration flag or an `inputs` field, so AI-controlled-vs-hardcoded recipient distinction is **not available** — treat all email-capable agents as candidates. Only the DCM-bearing subset is covered.
<If Q5 returns 0:>
✅ No email-capable agents detected in the DCM-bearing subset.
---
## 🛠️ MCP Server Exposure
<If Q6 returns results:>
| Agent Name | Platform | MCP Servers | Broadly Accessible |
|------------|----------|-------------|--------------------|
| <name> | <platform> | <server list> | <yes/no> |
**MCP Server Distribution:**
| MCP Server | Agent Count |
|------------|-------------|
| <server> | <N> |
<If Q6 returns 0:>
✅ No agents with MCP servers detected.
> **Coverage caveat:** The `McpServers` column is flat (`{name, description}` only) — no server URLs, credential config, or transport detail. Non-HTTPS/hardcoded-cred MCP detection from the old schema is not possible here.
> **Dimension note:** MCP exposure and the External Endpoint findings (below) both feed the single **Tool & Endpoint Exposure** score dimension. Any insecure scheme, non-standard port, or external endpoint on a broadly-accessible agent escalates that dimension to High regardless of MCP count.
---
## 📚 Declared Data Source Exposure
<If Q7 returns results:>
| Agent Name | Platform | Data Sources | Broadly Accessible |
|------------|----------|--------------|--------------------|
| <name> | <platform> | <source names> | <yes/no> |
**⚠️ High-Risk Combinations:**
<List agents with declared data sources + allowForAllUsers == "true", and agents combining data sources with email capability (Q5)>
<If Q7 returns 0:>
✅ No declared data sources found on any agents.
> **Coverage caveat:** `DeclaredDataSources` stores source **names/filenames** only — source *type* classification (SharePoint vs public site vs federated) is not available.
---
## 🔑 Credential Hygiene
<If Q8 returns results:>
🔴 **Hard-coded credential patterns detected in <N> agent(s):**
| Agent Name | Platform | Status | Creator |
|------------|----------|--------|---------|
| <name> | <platform> | <status> | <creatorId/upn> |
⚠️ **Recommendation:** Move secrets to Azure Key Vault; use environment variables at runtime.
<If Q8 returns 0:>
✅ No hard-coded credential patterns detected in agent instructions or connector metadata.
---
## 🌐 External Endpoint & HTTP Risk
<If Q9 returns results:>
| Agent | API Type | Scheme | Host | Port | Insecure | Non-Standard Port |
|-------|----------|--------|------|------|----------|-------------------|
| <name> | <OpenApi/RemoteMCPServer> | <scheme> | <host> | <port> | 🔴/🟢 | 🔴/🟢 |
<If Q9 returns 0:>
✅ No external endpoints with insecure schemes or non-standard ports detected.
> **Coverage caveat:** Only `OpenApi` + `RemoteMCPServer` connectors declare `serverUrls`. Power Platform `api_action` connectors do not expose destination URLs.
---
## 👥 Creator Governance
### Top Creators
| Creator | Agents | Published | Generic Names | No Description |
|---------|--------|-----------|---------------|----------------|
| <upn/creatorId> | <N> | <N> | <N> | <N> |
### Naming Hygiene
- Agents with generic names ("Agent", "Test"): <N>
- Agents with no description: <N>
> **Coverage caveat:** Only agents with a populated `RawAgentInfo.creatorId` are attributed; GUIDs unresolved in `IdentityInfo` fall back to the raw GUID.
---
## 📈 Agent Creation Trend
<ASCII bar chart or summary table of Q11 results — weekly agent creation counts>
---
## 🛠️ Full Capability / Tools Inventory
| Operation / Tool | API / Tool Type | Agent Count | Example Agents |
|------------------|-----------------|-------------|----------------|
| <operationId/name> | <type> | <N> | <agent names> |
---
## 🔐 Capability Privilege Index (Supplementary — not summed into score)
**Operation sensitivity distribution (Q13):**
| Privilege Category | Agent Count |
|--------------------|-------------|
| Mail-Send | <N> |
| Directory-Write | <N> |
| Data-Write | <N> |
| Messaging | <N> |
| Security-Tooling | <N> |
| Other/Read | <N> |
**Index:** <SensitiveAgents> agent(s) hold ≥1 sensitive (write/send) operation; **<BroadAndSensitive>** of those are also broadly accessible (`allowForAllUsers == "true"`).
<If BroadAndSensitive > 0:>
🔴 **<BroadAndSensitive> broadly-accessible agent(s) with sensitive write/send capability** — direct privilege-abuse exposure. These justify maxing the Broad Access and/or XPIA dimensions.
<If BroadAndSensitive == 0:>
✅ No broadly-accessible agents hold sensitive write/send operations (within the DCM-bearing subset).
> Supplementary indicator — provides privilege context but is **not** added to the /100 composite. Coverage limited to the DCM-bearing subset (see banner).
---
## 🎯 Runtime Correlation — Active-and-Dangerous (Q15, Optional)
<If Phase 5 was run — flagged agents correlated against CopilotActivity:>
| Agent Name | Interactions | Distinct Users | Source IPs | Last Seen |
|------------|--------------|----------------|------------|-----------|
| <name> | <N> | <N> | <N> | <date> |
🔴 **Active-and-dangerous:** Flagged agents (broadly accessible / sensitive ops) confirmed active at runtime — prioritize for remediation over dormant flagged agents.
<If no flagged agents appear in CopilotActivity:>
✅ No flagged agents showed runtime activity in the window. *(Caveat: most `CopilotActivity` rows are unattributed — absence does not prove dormancy.)*
> Scoped name-based lookup (`AgentName` key). Runtime attribution is inherently low-coverage; this section confirms presence, not absence.
---
## Agent Security Score Card
```
┌──────────────────────────────────────────────────────┐
│ AGENT SECURITY SCORE: <NN>/100 │
│ Rating: <EMOJI> <RATING> │
├──────────────────────────────────────────────────────┤
│ Broad Access [<bar>] <N>/20 (<detail>) │
│ XPIA Email Risk [<bar>] <N>/20 (<detail>) │
│ Tool & Endpt Expo[<bar>] <N>/20 (<detail>) │
│ Data Source Risk [<bar>] <N>/20 (<detail>) │
│ Credential Hygn [<bar>] <N>/20 (<detail>) │
├──────────────────────────────────────────────────────┤
│ Supplementary (not scored): │
│ Capability Privilege Index: <S> sensitive / <B> broad│
│ Deep-Manifest Coverage: <DcmCoveragePct>% │
└──────────────────────────────────────────────────────┘
```
---
## Security Assessment
| Factor | Finding |
|--------|---------|
| <emoji> **<Factor>** | <Evidence-based finding> |
---
## Recommendations
> **Key mitigation — Runtime:** For all high-risk agents, recommend enabling **Microsoft Defender Runtime Protection** — webhook-based real-time inspection that can block malicious tool invocations before execution. See [Real-time agent protection during runtime](https://learn.microsoft.com/en-us/defender-cloud-apps/real-time-agent-protection-during-runtime).
> **Key mitigation — Governance:** For fleet-wide governance gaps (sprawl, missing auth, uncontrolled tools), recommend adopting **[Microsoft Agent 365](https://www.microsoft.com/en-us/microsoft-365/blog/2025/11/18/microsoft-agent-365-the-control-plane-for-ai-agents/)** as the enterprise control plane — providing centralized Registry (inventory + quarantine), Access Control (Entra agent IDs + Policy Templates), Visualization (agent ↔ resource mapping), and Security (Defender + Purview integration).
1. <emoji> **<Priority action>** — <evidence and rationale>
2. ...
---
## Appendix: Query Execution Summary
| Query | Description | Records | Time |
|-------|-------------|---------|------|
| Q1 | Global Inventory | <N> | <time> |
| Q2 | Status & Auth Breakdown | <N> | <time> |
| ... | ... | ... | ... |
| Q13 | Operation-Level Privilege Mapping | <N> | <time> |
| Q14 | Deep-Manifest Coverage | <N> | <time> |
| Q15 | Runtime Correlation (scoped, optional) | <N> | <time> |
Markdown File Report Template
When outputting to markdown file, use the same structure as the Inline Report Template above, saved to:
reports/ai-agent-posture/AI_Agent_Posture_Report_YYYYMMDD_HHMMSS.md
Include the following additional sections in the file report that are omitted from inline:
- Full agent detail table (all non-deleted agents with key fields)
- Per-platform breakdown (agent counts and creators by Platform)
- Complete data source listing (every declared source name, not just examples)
- Complete MCP agent listing (every MCP agent with full server list)
- Raw query references — note that full query definitions are in this SKILL.md file
File Report Header
# AI Agent Security Posture Report
**Generated:** YYYY-MM-DD HH:MM UTC
**Data Source:** AgentsInfo (Advanced Hunting)
**Analysis Period:** <EarliestRecord> → <LatestRecord> (<N> days)
**Platforms:** <list discovered Platform values>
**Total Agents:** <N> (Published: <N>, Draft: <N>)
---
> 📊 **Deep-Manifest Coverage (Q14):** `<WithDCM>/<Total>` agents (**<DcmCoveragePct>%**) carry `declarativeCopilotMetadata`; XPIA/endpoint/capability findings cover only this subset. ObservabilityID **<ObsIdPct>%**, EntraAgentID **<EntraIdPct>%**.
---
Include the Capability Privilege Index and (if Phase 5 ran) Runtime Correlation sections from the inline template in the file report as well.
Known Pitfalls
1. AgentsInfo Is Advanced Hunting Only
Problem: The AgentsInfo table does NOT exist in Sentinel Data Lake. Querying via mcp_sentinel-data_query_lake returns SemanticError: Failed to resolve table.
Solution: Always use RunAdvancedHuntingQuery. The table has 30-day retention in AH.
2. Multiple Records Per Agent (State Snapshots)
Problem: The table logs configuration snapshots over time. Querying without deduplication returns inflated counts and duplicate agent entries.
Solution: Always use | summarize arg_max(Timestamp, *) by AgentId to get the latest state per agent before any analysis. Note that AgentId is a GUID and the name and description columns are Name and Description (not AgentName/AgentDescription as some docs state).
3. RawAgentInfo Is the Real Detail Source
Problem: The normalized columns (DeclaredTools, McpServers, DeclaredDataSources, Owners, Capabilities) are sparsely populated and flat. The rich governance/configuration detail lives in the RawAgentInfo dynamic column (populated for ~all agents) and, for the connector-sourced subset, in RawAgentInfo.declarativeCopilotMetadata (DCM).
Solution: For creator (RawAgentInfo.creatorId), broad access (RawAgentInfo.allowForAllUsers), app type (RawAgentInfo.appType), and deep capability/endpoint detail, parse RawAgentInfo. RawAgentInfo is dynamic — no double-parse needed; access nested keys directly with tostring(RawAgentInfo.key).
4. declarativeCopilotMetadata (DCM) Covers Only a Subset
Problem: Deep capability queries (Q5 email, Q9 endpoints, Q12 operations) depend on RawAgentInfo.declarativeCopilotMetadata, which is present for only ~10% of Copilot Studio agents (the connector-sourced subset). The majority have only a shallow manifest.
Solution: Always state the coverage caveat in reports. DCM path: declarativeCopilotMetadata[].actions[].apis[] with .type (OpenApi/RemoteMCPServer/api_action), .serverUrls[], and .operations[].operationId. Results from these queries are a floor, not a complete inventory.
5. Authentication-Type Detection Has No Equivalent
Problem: The old UserAuthenticationType (None/Microsoft/Custom) is gone. The ToolsAuthenticationType column exists in schema but is effectively empty (~100% blank). There is no way to classify agents as "unauthenticated" the way the old skill did.
Solution: Use RawAgentInfo.allowForAllUsers == "true" as a broad-exposure proxy (documented as a proxy, NOT an authentication state). Never claim an agent is "unauthenticated" — say "broadly accessible".
6. Many Schema Columns Are Empty in Practice
Problem: ToolsAuthenticationType, Availability, Endpoints, Triggers, Permissions, and Model are present in the schema but empty/null in practice. Queries built on them silently return 0 rows.
Solution: Do not build core logic on these columns. Validate population with a quick summarize countif(isnotempty(<col>)) before relying on a column. LifecycleStatus is blank for active agents (only Deleted is populated) — LifecycleStatus != "Deleted" correctly passes blanks.
7. creatorId Is a GUID — Join IdentityInfo for UPN
Problem: RawAgentInfo.creatorId is an Entra object GUID, not a UPN. There is no CreatorAccountUpn, LastModifiedByUpn, or LastPublishedByUpn equivalent.
Solution: Resolve via leftouter join to IdentityInfo on AccountObjectId, then coalesce(AccountUpn, CreatorId). Creator attribution is a lower bound — creatorId is sparse.
8. serverUrls Only Populated for OpenApi & RemoteMCPServer
Problem: External endpoint URLs in DCM apis[].serverUrls are populated for OpenApi and RemoteMCPServer connector types, but not for api_action (Power Platform connectors, the majority). Filtering all API types yields mostly empty URLs.
Solution: Filter ApiType in ("OpenApi", "RemoteMCPServer") before expanding serverUrls. State that api_action destinations are not inventoried.
9. McpServers Is Flat (Name/Description Only)
Problem: The dedicated McpServers column contains only {name, description} — no server URLs, credential configuration, or transport detail. Non-HTTPS MCP detection and hardcoded-cred-in-MCP detection from the old design are not possible.
Solution: Use McpServers for inventory/exposure counts only. For MCP server endpoints, fall back to the DCM RemoteMCPServer API type (Q9).
10. AH Booleans Are Textual True/False (Feb 25, 2026)
Problem: Since Feb 25, 2026, Advanced Hunting boolean results render as textual True/False, not 1/0. Governance flags from RawAgentInfo (e.g., allowForAllUsers) are JSON strings ("true"/"false").
Solution: Compare against the string form: tostring(RawAgentInfo.allowForAllUsers) == "true". Avoid == 1 / == true numeric/bool comparisons on parsed JSON values.
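The same string-versus-boolean trap reappears when query results are exported and post-processed. The Python sketch below normalizes a governance flag regardless of which form it arrives in; the function name flag_is_true is an assumption for this example.

```python
def flag_is_true(value):
    """Interpret a governance flag exported from Advanced Hunting.

    Accepts the JSON-string form ("true"), the textual AH form ("True"),
    and a native bool; anything else (None, "", "false", 1) is False.
    """
    if isinstance(value, bool):
        return value
    return isinstance(value, str) and value.strip().lower() == "true"
```

Treating the numeric 1 as False is deliberate: it keeps the helper aligned with the rule above to avoid numeric comparisons on parsed JSON values.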
11. CopilotActivity Correlation — Composite AgentId & Fleet-Join Timeouts
Problem: Phase 5 runtime correlation against CopilotActivity has three traps: (1) CopilotActivity.AgentId is a composite/prefixed string (e.g., T_<tenant>.<guid>, CopilotStudio.Declarative.T_….gpt.<guid>, or literals like AgentBuilder) that does not equal the clean AgentsInfo.AgentId GUID — ID joins return 0 matches. (2) A fleet-wide AgentsInfo ↔ CopilotActivity join (~15k agents × 100k+ rows, heavy RawAgentInfo) times out the AH endpoint. (3) Most CopilotActivity rows have an empty AgentName/AgentId (general M365 Copilot usage), so runtime attribution is low-coverage.
Solution: Use a scoped name-based lookup (Query 15): build a small flagged-name list from Q4/Q13, then CopilotActivity | where AgentName in (FlaggedNames) — no join. AgentName is the reliable cross-table key. Never join the full fleet. Treat absence from CopilotActivity as unconfirmed, not proof of dormancy. AIModelName is sparse here — do not use it for model inventory.
Quality Checklist
Before delivering the report, verify:
- All queries used arg_max(Timestamp, *) by AgentId for deduplication
- All queries filtered LifecycleStatus != "Deleted" (unless auditing deletions)
- All queries ran via RunAdvancedHuntingQuery (not Data Lake)
- Zero-result queries are reported with explicit absence confirmation (✅ pattern)
- The Agent Security Score calculation is transparent with per-dimension evidence
- Broadly-accessible agents are described as a proxy (NOT "unauthenticated"); the auth-type gap is stated
- DCM-dependent sections (XPIA email, external endpoints, capability inventory) include the coverage caveat
- Deep-Manifest Coverage banner (Q14) is present at the top of the report (DCM %, ObservabilityID %, EntraAgentID %)
- Capability Privilege Index (Q13) is reported as a supplementary indicator, explicitly noted as NOT summed into the /100 score
- If Phase 5 ran, runtime correlation is SCOPED by AgentName (Query 15), never a fleet-wide join; absence is described as unconfirmed, not dormant
- Score card uses the Tool & Endpoint Exposure dimension label (not "MCP Server Expo") and shows the two supplementary indicators
- MCP server inventory includes server names, not just counts
- Declared data sources note that source-type classification is unavailable
- Creator governance resolves creatorId GUIDs via IdentityInfo and notes the lower-bound caveat
- Recommendations are prioritized and evidence-based
- All hyperlinks in the report are copied verbatim from the URL Registry — no fabricated or recalled-from-memory URLs
- No PII from live environments in the SKILL.md file itself
SVG Dashboard Generation
📊 Optional post-report step. After an AI Agent Security Posture report is generated, the user can request a visual SVG dashboard.
Trigger phrases: "generate SVG dashboard", "create a visual dashboard", "visualize this report", "SVG from the report"
How to Request a Dashboard
- Same chat: "Generate an SVG dashboard from the report" — data is already in context.
- New chat: Attach or reference the report file, e.g. #file:reports/ai-agent-posture/AI_Agent_Posture_Report_<org>_<date>.md
- Customization: Edit svg-widgets.yaml before requesting; the renderer reads it at generation time.
Execution
Step 1: Read svg-widgets.yaml (this skill's widget manifest)
Step 2: Read .github/skills/svg-dashboard/SKILL.md (rendering rules — Manifest Mode)
Step 3: Read the completed report file (data source)
Step 4: Render SVG → save to reports/ai-agent-posture/{report_name}_dashboard.svg
The YAML manifest is the single source of truth for layout, widgets, field mappings, colors, and data source documentation. All customization happens there.
.github/skills/app-registration-posture/SKILL.md
npx skills add SCStelz/security-investigator --skill app-registration-posture -g -y
SKILL.md
Frontmatter
{
"name": "app-registration-posture",
"description": "Audit Entra ID app registration and service principal security posture. Triggers on keywords like \"app registration posture\", \"service principal permissions\", \"dangerous app permissions\", \"app ownership\", \"app credential abuse\", \"SPN lateral movement\", \"app consent grant\", \"overprivileged apps\", \"cross-tenant SPN\", \"app registration kill chain\", \"app persistence\", \"credential add chain\", \"Graph API permissions audit\". Combines Graph API current-state inventory (dangerous permissions, ownership, credential hygiene) with KQL chain detection (AuditLogs, AADServicePrincipalSignInLogs, AADUserRiskEvents, MicrosoftGraphActivityLogs) for posture assessment covering permission concentration, owner risk, credential hygiene, cross-tenant exposure, and active abuse signals. Includes 5-dimension App Permission Risk Score. Inline chat or markdown output.",
"drill_down_prompt": "Run app registration posture audit — dangerous permissions, credential hygiene, abuse chains",
"threat_pulse_domains": [
"spn",
"admin"
]
}
App Registration Security Posture — Instructions
Purpose
This skill audits the security posture of Entra ID App Registrations and Service Principals across your organization, combining Graph API current-state inventory with KQL attack chain detection to create a comprehensive assessment.
App Registrations are a growing persistence and lateral movement vector. Attackers who compromise a user with app ownership can add credentials (secrets/certificates), disconnect from the user session, and authenticate as the service principal — inheriting all the app's permissions. This is the exact pattern documented in the Guardz research and used in the SolarWinds/Solorigate attack.
What this skill covers:
| Domain | Key Questions Answered | Data Source |
|---|---|---|
| 🔐 Permission Inventory | Which apps have dangerous Graph API permissions? How concentrated are critical permissions? | Graph API |
| 👤 Owner Risk | Which app owners are non-admin users (phishing targets)? Are owners currently risky? Ownerless apps? | Graph API + Q1 |
| 🔑 Credential Hygiene | Stale secrets, multi-credential apps, long-lived credentials, cert+secret anomalies | Graph API |
| 🌐 Cross-Tenant Exposure | Foreign SPNs authenticating into your tenant with dangerous permissions | Q4 |
| ⚡ Active Abuse Chains | Risky user → app ops, credential add → SPN activation, ownership → credential chains, Graph API lateral movement, permission escalation, multi-app ownership spread, App Governance & OAuth incident cross-reference | Q1–Q8 |
How this differs from existing capabilities:
| Existing Resource | Coverage | Gap This Skill Fills |
|---|---|---|
| app_credential_management.md | Individual credential/ownership/consent events | No cross-table chain correlation |
| service_principal_scope_drift.md | SPN behavioral baseline drift | No link to preceding compromise signals |
| App Governance (Microsoft) | Anomalous app behavior, overprivileged apps | No correlation with user risk signals or multi-step chains |
| This skill | Graph API posture + KQL chain detection | End-to-end: current state → historical abuse → risk scoring |
Data sources:
| Source | Type | What It Provides |
|---|---|---|
| AuditLogs (ApplicationManagement) | KQL | Credential adds, ownership changes, consent grants, permission assignments |
| AADServicePrincipalSignInLogs | KQL | SPN authentication patterns, cross-tenant sign-ins, credential types |
| AADUserRiskEvents | KQL | Identity Protection risk detections for app owners |
| MicrosoftGraphActivityLogs | KQL | Graph API calls by SPNs post-credential-add |
| AlertInfo + AlertEvidence | KQL | App Governance alerts, OAuth incidents, Attack Disruption events (Q8) |
| Graph API (/servicePrincipals, /applications) | REST | Current-state permission grants, app ownership, credential inventory |
References:
- Guardz: Abusing Entra ID App Registrations for Long-Term Persistence
- Microsoft: Solorigate Coordinated Defense
- Microsoft: App Governance in Defender for Cloud Apps
- MITRE ATT&CK T1098.001 — Additional Cloud Credentials
- MITRE ATT&CK T1550.001 — Application Access Token
- Microsoft: Verify First-Party Apps in Sign-In Reports
🔴 URL Registry — Canonical Links for Report Generation
MANDATORY: When generating reports, copy URLs verbatim from this registry. NEVER construct, guess, or paraphrase a URL. If a URL is not in this registry, omit the hyperlink entirely and use plain text.
| Label | Canonical URL |
|---|---|
| BLOG_GUARDZ | https://guardz.com/blog/abusing-entra-id-app-registrations-for-long-term-persistence/ |
| BLOG_SOLORIGATE | https://www.microsoft.com/en-us/security/blog/2020/12/28/using-microsoft-365-defender-to-coordinate-protection-against-solorigate/ |
| DOCS_APP_GOVERNANCE | https://learn.microsoft.com/en-us/defender-cloud-apps/app-governance-manage-app-governance |
| DOCS_GRAPH_PERMS | https://learn.microsoft.com/en-us/graph/permissions-reference |
| DOCS_FIRST_PARTY_APPS | https://learn.microsoft.com/en-us/troubleshoot/entra/entra-id/governance/verify-first-party-apps-sign-in |
| MITRE_T1098_001 | https://attack.mitre.org/techniques/T1098/001/ |
| MITRE_T1550_001 | https://attack.mitre.org/techniques/T1550/001/ |
Threat Landscape: Why App Registration Posture Matters
The attack pattern is well-documented and increasingly exploited:
User compromised → discovers app ownership → adds credential (secret/cert) →
disconnects from user session → authenticates AS the app (SPN) →
uses app permissions for lateral movement / data exfiltration / privilege escalation
Why app registrations are attractive to attackers:
| Factor | Risk |
|---|---|
| Persistence beyond user compromise | Revoking the user's password doesn't revoke the app credential — the SPN continues to operate |
| Non-admin users as owners | Standard users can own apps with Application.ReadWrite.All — if phished, the attacker inherits those permissions |
| Permissions outlive their creators | App permissions persist even after the admin who granted them leaves the org |
| Cross-tenant trust | Multi-tenant apps create implicit trust relationships that survive account remediation |
| Low visibility | SPN sign-ins are in a separate log table (AADServicePrincipalSignInLogs) that many SOCs don't monitor |
MITRE ATT&CK Mapping:
| Technique | ID | Kill Chain Stage | Detection Query |
|---|---|---|---|
| Additional Cloud Credentials | T1098.001 | Persistence | Q2, Q3 |
| Additional Cloud Roles | T1098.003 | Privilege Escalation | Q6 |
| Cloud Accounts | T1078.004 | Initial Access / Persistence | Q1 |
| Application Access Token | T1550.001 | Lateral Movement | Q2, Q5 |
| SAML/OAuth Tokens | T1606.002 | Credential Access | Q4 |
| Impersonation | T1656 | Defense Evasion | Q4 |
Q8 note: Q8 (App Governance & OAuth Incident Cross-Reference) is a detection validation query, not a technique-specific detector. It cross-references existing Defender detections spanning multiple techniques above against Phase 1 findings.
📑 TABLE OF CONTENTS
- Critical Workflow Rules — Mandatory rules
- Schema Pitfalls — AuditLogs and Graph API pitfalls
- Dangerous Permissions Reference — Application-level Graph API grants
- App Permission Risk Score Formula — Composite risk scoring
- Execution Workflow — Phase-by-phase plan
- Phase 1: Graph API Posture Inventory — Steps P1–P7
- Phase 2: KQL Chain Detection Queries — Queries Q1–Q8
- Output Modes — Inline vs Markdown report
- Inline Report Template — Chat-rendered format
- Markdown File Report Template — Disk-saved format
- Known Pitfalls — Schema quirks and edge cases
- Quality Checklist — Pre-delivery validation
- SVG Dashboard Generation — Visual dashboard from report
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
- Dual data source skill: This skill uses BOTH Graph API (via Graph MCP) for current-state posture AND KQL (via RunAdvancedHuntingQuery) for historical chain detection. Both phases are required for a complete assessment.
- Graph API before KQL: Run Phase 1 (Graph API posture) first; it identifies the dangerous apps. Phase 2 (KQL chains) then checks whether those apps show historical abuse signals.
- Use RunAdvancedHuntingQuery for all KQL queries. All tables used (AuditLogs, AADServicePrincipalSignInLogs, AADUserRiskEvents, MicrosoftGraphActivityLogs, AlertInfo, AlertEvidence) are available in Advanced Hunting. AH is free for Analytics-tier tables. Data Lake fallback only if AH fails or lookback > 30 days (note: AlertInfo/AlertEvidence are AH-only).
- ASK the user for output format before generating the report:
  - Inline chat summary (quick review in chat)
  - Markdown file report (detailed, archived to reports/app-registration-posture/)
  - Both (markdown + inline summary)
- ⛔ MANDATORY: Evidence-based analysis only. Report ONLY what query results show. Use the explicit absence pattern (✅ No [finding] detected) when queries return 0 results. Never guess or assume.
- AuditLogs dynamic fields require special handling: always extract with tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName). See Schema Pitfalls.
- Graph API: query from the permission side, not the app side. Don't enumerate all app registrations (could be 1000+). Query appRoleAssignedTo on the Microsoft Graph service principal to get all dangerous grants in ~3 API calls. See Phase 1 Scaling Strategy.
- Run KQL queries in parallel batches where possible: Q1–Q8 are all independent and can run in parallel.
- Time tracking: Report elapsed time after each phase completion.
⛔ PROHIBITED ACTIONS
| Action | Status |
|---|---|
| Enumerating all app registrations individually via Graph API | ❌ PROHIBITED: use appRoleAssignedTo approach |
| Querying requiredResourceAccess for granted permissions | ❌ PROHIBITED: shows requested, not granted perms |
| Querying ServicePrincipal for ownership (/servicePrincipals/{id}?$expand=owners) | ❌ PROHIBITED: ownership is on Application object |
| Joining AuditLog operations on TargetResources[0].id across operation types | ❌ PROHIBITED: AppId ≠ SPNId for same app |
| Reporting 0 KQL results without sanity-checking the query logic | ❌ PROHIBITED |
| Fabricating URLs not in the URL Registry | ❌ PROHIBITED |
Schema Pitfalls
Read these before modifying any query in this skill.
| Pitfall | Details | Workaround |
|---|---|---|
| Application ObjectId ≠ ServicePrincipal ObjectId | The same app has different GUIDs in TargetResources[0].id depending on operation type. Credential operations → Application ObjectId; permission/consent operations → ServicePrincipal ObjectId | Join on displayName or Actor when correlating across operation types (see Q6) |
| Ownership target name in modifiedProperties | For "Add owner to application", TargetResources[0] is the new owner (User type). The app name is in TargetResources[0].modifiedProperties[1].newValue (field Application.DisplayName) | Extract with tostring(parse_json(tostring(ModProps[1].newValue))) |
| OperationName trailing spaces | Credential operations have trailing spaces: "Update application – Certificates and secrets management " | Preserve trailing spaces in filters or use has instead of == |
| InitiatedBy is dynamic | Always extract with tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName) | Never use dot-notation directly |
| Consent targets structure | "Consent to application": Target[0] = the app receiving consent. "Add delegated permission grant": Target[0] = the resource API (e.g., Microsoft Graph), Target[1] = the app | Check OperationName before assuming Target[0] is the app |
| Cross-tenant SPNs have no local app object | GET /v1.0/applications?$filter=displayName eq 'X' returns empty for SPNs owned by foreign tenants | Identify via AADServicePrincipalSignInLogs where AppOwnerTenantId != AADTenantId (Q4). These can only be managed by the owning tenant |
| SP owners ≠ Application owners | /servicePrincipals/{id}?$expand=owners often returns empty even when the Application has owners | Always query the Application object for ownership |
| requiredResourceAccess ≠ granted permissions | The Application object's requiredResourceAccess shows what the app requests, not what's been granted | Use appRoleAssignedTo for granted permissions; this is the authoritative source |
| Red team apps may have owners stripped | Attack simulation tools often remove ownership post-creation | Fall back to AuditLogs "Add application" operation to find the original creator |
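The InitiatedBy pitfall also applies when audit rows are post-processed outside KQL, for example from an exported result set. The sketch below mirrors the tostring(parse_json(tostring(...))) pattern in Python. It assumes the exported field may arrive as a dict, a JSON string, or a double-encoded JSON string; the function names are hypothetical and not part of this skill.

```python
import json
from typing import Any, Optional


def _as_dict(value: Any) -> dict:
    """Decode a value that may be a dict, a JSON string, or double-encoded JSON."""
    # Two passes at most: one for a JSON string, one more if it was double-encoded.
    for _ in range(2):
        if isinstance(value, dict):
            return value
        if not isinstance(value, str):
            break
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            break
    return value if isinstance(value, dict) else {}


def initiated_by_upn(initiated_by: Any) -> Optional[str]:
    """Return the actor's userPrincipalName, or None for app-initiated events."""
    user = _as_dict(_as_dict(initiated_by).get("user"))
    upn = user.get("userPrincipalName")
    return upn if isinstance(upn, str) and upn else None


if __name__ == "__main__":
    row = '{"user": {"userPrincipalName": "alice@example.com"}}'
    print(initiated_by_upn(row))
```

Returning None rather than raising keeps app-initiated operations (which carry an app object instead of a user object) visible as a separate case instead of failing the whole batch.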
Dangerous Permissions Reference
Application-level Graph API grants that this skill flags:
| Permission | Risk | Attack Use |
|---|---|---|
| Application.ReadWrite.All | 🔴 Critical | Create/modify any app registration (further persistence) |
| AppRoleAssignment.ReadWrite.All | 🔴 Critical | Grant itself or any app any permission (golden ticket) |
| RoleManagement.ReadWrite.Directory | 🔴 Critical | Assign any directory role to any principal |
| Directory.ReadWrite.All | 🔴 Critical | Read/write all directory objects |
| Policy.ReadWrite.ConditionalAccess | 🔴 Critical | Disable CA policies (defense evasion) |
| Mail.ReadWrite | 🟠 High | Read any user's mailbox (data exfiltration) |
| Mail.Send | 🟠 High | Send email as any user (phishing, BEC) |
| Mail.Read | 🟠 High | Read any user's mail (reconnaissance) |
| MailboxSettings.ReadWrite | 🟠 High | Create forwarding rules (silent exfiltration) |
| User.ReadWrite.All | 🟠 High | Modify any user account (credential reset) |
| Group.ReadWrite.All | 🟠 High | Modify group membership (privilege escalation) |
| Files.ReadWrite.All | 🟠 High | Access all SharePoint/OneDrive files |
| Sites.ReadWrite.All | 🟠 High | Full SharePoint site access |
| SecurityEvents.ReadWrite.All | 🟡 Medium | Read/modify security alerts (cover tracks) |
| User.Export.All | 🟡 Medium | Export all user data (bulk exfiltration) |
| Exchange.ManageAsApp | 🟡 Medium | Full Exchange management (mailbox access) |
Permission risk classification for scoring:
- Critical (🔴): Permissions that enable self-elevation or directory-wide control — 5 permissions listed above
- High (🟠): Permissions that enable data access or account manipulation — 8 permissions listed above
- Medium (🟡): Permissions that enable reconnaissance or secondary access — 3 permissions listed above
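The tiering above can be expressed as a small lookup, which is handy when scoring needs the highest tier held by each app. This is a minimal sketch: the permission names are copied from the table, while the set and function names are illustrative assumptions.

```python
from typing import Iterable, Optional

# Tier membership copied from the Dangerous Permissions Reference table.
CRITICAL = {
    "Application.ReadWrite.All",
    "AppRoleAssignment.ReadWrite.All",
    "RoleManagement.ReadWrite.Directory",
    "Directory.ReadWrite.All",
    "Policy.ReadWrite.ConditionalAccess",
}
HIGH = {
    "Mail.ReadWrite", "Mail.Send", "Mail.Read", "MailboxSettings.ReadWrite",
    "User.ReadWrite.All", "Group.ReadWrite.All",
    "Files.ReadWrite.All", "Sites.ReadWrite.All",
}
MEDIUM = {"SecurityEvents.ReadWrite.All", "User.Export.All", "Exchange.ManageAsApp"}

# Ordered from most to least severe so the first match wins.
_TIERS = (("Critical", CRITICAL), ("High", HIGH), ("Medium", MEDIUM))


def classify_permission(name: str) -> Optional[str]:
    """Return the risk tier for a permission, or None if it is not flagged."""
    for tier, members in _TIERS:
        if name in members:
            return tier
    return None


def highest_tier(permissions: Iterable[str]) -> Optional[str]:
    """Return the most severe tier among an app's permissions, if any."""
    held = {classify_permission(p) for p in permissions}
    for tier, _ in _TIERS:
        if tier in held:
            return tier
    return None


if __name__ == "__main__":
    print(highest_tier(["Mail.Read", "User.Export.All"]))
```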
🔴 Delegated vs Application Permissions — Risk Model
This skill focuses on application permissions (appRoleAssignments) because they represent unattended, user-independent privilege. Delegated permissions (oauth2PermissionGrants) are a fundamentally different risk category. Do not conflate the two.
Why This Distinction Matters
| Factor | Application Permissions (appRoleAssignments) | Delegated Permissions (oauth2PermissionGrants) |
|---|---|---|
| Identity | App acts as its own identity; no user context required | App acts on behalf of a signed-in user |
| Effective permissions | The full granted scope: the app CAN do everything the permission allows | Intersection of app's delegated scope AND the user's own Entra roles: the app can only do what the user could already do |
| Unattended access | ✅ Yes: runs 24/7 via client credentials or managed identity | ❌ No: requires a user session (interactive or refresh token) |
| Blast radius | The permission itself IS the blast radius: Directory.ReadWrite.All means full directory write for the app, regardless of who triggered it | Bounded by the user's roles: a standard user with Directory.ReadWrite.All delegated consent still can't write to the directory because they lack the Entra role |
| Token theft impact | Stolen app credential = full permission scope, no MFA challenge | Stolen user token = only the user's own effective permissions, bounded by their roles |
| Risk priority | 🔴 Primary concern — this skill's focus | 🟡 Secondary concern — relevant only for privileged admin accounts |
What AllPrincipals Delegated Consent Actually Does
An AllPrincipals (admin consent) delegated grant removes the per-user consent prompt — it does NOT grant users abilities beyond their existing Entra roles. The practical impact:
- Standard users: Effectively no additional risk. The app can request tokens with broad scopes, but the effective permissions are still limited by the user's role assignments. A user without Exchange Admin role cannot manage mailboxes even if Mail.ReadWrite is consented.
- Privileged admins: Marginal incremental risk. The consent prompt is removed as a speed bump, so a stolen admin session can silently acquire tokens with the consented scopes, but the admin could have granted that consent themselves in one click anyway.
- Token theft for admins: The real scenario where delegated consent matters. An attacker with a stolen Global Admin refresh token can silently use any AllPrincipals-consented scope without triggering a consent dialog. However, the admin already had the ability to do everything those scopes enable.
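The intersection model described in this section can be illustrated with plain set arithmetic. This is a deliberately simplified sketch: representing "what the user's Entra roles already allow" as a set of scope names is an assumption made for illustration only, and both function names are hypothetical.

```python
from typing import Set


def effective_delegated(consented_scopes: Set[str], user_allowed: Set[str]) -> Set[str]:
    """Delegated access: bounded by BOTH the consent and the user's own rights."""
    return consented_scopes & user_allowed


def effective_application(granted_roles: Set[str]) -> Set[str]:
    """Application access: the granted roles apply in full, with no user in the loop."""
    return set(granted_roles)


if __name__ == "__main__":
    consented = {"Directory.ReadWrite.All", "User.Read"}
    standard_user = {"User.Read"}  # assumed capability set for a non-admin
    print(effective_delegated(consented, standard_user))
    print(effective_application({"Directory.ReadWrite.All"}))
```

In the example, the standard user ends up with only User.Read under delegated consent, while the same Directory.ReadWrite.All scope granted as an application permission applies in full.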
How This Affects Skill Analysis
- Phase 1 (P2) queries appRoleAssignedTo; these are application permissions. This is correct and intentional. The Dangerous Permissions Reference table above applies to application-level grants only.
- Chain detection queries (Q1, Q3, Q6) detect "Consent to application" and "Add delegated permission grant" in AuditLogs. These detect the act of granting consent, which is a valid abuse signal regardless of permission type (a compromised user granting broad consent is suspicious). The risk assessment should focus on what the user then DOES with the consented access, not on the scope list itself.
- When assessing consent grants in chain detection output:
  - A compromised user adding application permissions (Add app role assignment to service principal) = 🔴 Critical: the app gains independent, unattended access
  - A compromised user granting delegated consent (Consent to application, Add delegated permission grant) = 🟠 High if the user is a privileged admin, 🟡 Medium for standard users: the effective permissions are bounded by the user's roles
- Do NOT overstate delegated AllPrincipals consent risk. Reporting 100+ delegated scopes as "dangerous" without explaining the intersection model misleads stakeholders into believing any user can exploit those scopes. Always qualify: "Effective delegated permissions are limited to what each user's Entra roles already allow."
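The severity rules for consent grants in chain detection output can be sketched as a small decision function. The operation names come from this section; the function name and the boolean flag for privileged admins are assumptions, and how an actor is determined to be privileged is left to the caller.

```python
from typing import Optional

APP_PERMISSION_OP = "Add app role assignment to service principal"
DELEGATED_OPS = {"Consent to application", "Add delegated permission grant"}


def consent_severity(operation_name: str, actor_is_privileged_admin: bool) -> Optional[str]:
    """Map an AuditLogs operation by a compromised user to a severity label."""
    # Strip whitespace defensively; some OperationName values carry trailing spaces.
    op = operation_name.strip()
    if op == APP_PERMISSION_OP:
        # Application permission: the app gains independent, unattended access.
        return "Critical"
    if op in DELEGATED_OPS:
        # Delegated consent: bounded by the granting user's own roles.
        return "High" if actor_is_privileged_admin else "Medium"
    return None  # not a consent or permission-grant operation


if __name__ == "__main__":
    print(consent_severity("Consent to application", actor_is_privileged_admin=False))
```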
When Delegated Permissions ARE Concerning
Despite the lower baseline risk, flag delegated consents when:
| Scenario | Why It Matters |
|---|---|
| AllPrincipals consent on a 3rd-party (non-Microsoft) app with broad scopes | The app vendor could be compromised, and the consent enables data access for any admin session |
| Delegated consent combined with Q1 chain (risky admin → consent grant) | A compromised admin granting broad delegated consent may be preparing for token-based lateral movement |
| App has BOTH application permissions AND broad delegated consent | Dual permission model = dual attack surface |
| AllPrincipals consent for offline_access + sensitive scopes on a public client app | Enables refresh token persistence without re-authentication |
⛔ PROHIBITED Actions
| Action | Status |
|---|---|
| Stating that AllPrincipals delegated consent gives "any user" access to the scoped resources | ❌ PROHIBITED — effective permissions = intersection with user's roles |
| Rating delegated consent scopes at the same severity as identical application permission scopes | ❌ PROHIBITED — application permissions are unattended and user-independent |
| Omitting the delegated-vs-application distinction when presenting permission findings | ❌ PROHIBITED — always clarify which permission type is being discussed |
| Ignoring delegated consent entirely | ❌ PROHIBITED — it is a secondary risk that matters for privileged accounts |
App Permission Risk Score Formula
The App Permission Risk Score is a composite risk indicator summarizing the security posture of your organization's app registration and service principal fleet. Higher scores indicate greater risk.
Scoring Dimensions
$$ \text{AppPermissionRiskScore} = \sum_{i} \text{DimensionScore}_i $$
Each dimension contributes 0–20 points to a maximum of 100:
| Dimension | Max | 🟢 Low (0–5) | 🟡 Medium (6–12) | 🔴 High (13–20) |
|---|---|---|---|---|
| Permission Concentration | 20 | 0–2 apps with dangerous perms; 0 critical-tier perms | 3–5 apps with dangerous perms; ≤1 app with ≥3 critical-tier perms | >5 apps with dangerous perms OR ≥2 apps with ≥3 critical-tier perms OR any app with AppRoleAssignment.ReadWrite.All (golden ticket → auto 16+) |
| Owner Risk | 20 | All flagged apps have admin owners; 0 ownerless dangerous apps | 1–2 ownerless dangerous apps; OR non-admin owner on 🟠-level app | ≥3 ownerless apps with dangerous perms OR non-admin owner on 🔴-level app OR any app owner with active Identity Protection risk (atRisk/confirmedCompromised) |
| Credential Hygiene | 20 | All apps ≤1 active credential; all secrets <180 days old; 0 dormant privileged apps | Any app with 2 active secrets; OR any secret 180d–730d old; OR 1 dormant privileged app | Any app with ≥3 active secrets + critical perms; OR any secret >730d old (2yr); OR cert+secret on same critical app |
| Cross-Tenant Exposure | 20 | 0 foreign SPNs with dangerous perms | 1–2 foreign SPNs with 🟠-level perms; all from known/identified partner tenants | Any foreign SPN with 🔴 critical perms (AppRoleAssignment.ReadWrite.All, Directory.ReadWrite.All, RoleManagement.ReadWrite.Directory, Policy.ReadWrite.ConditionalAccess) OR foreign SPN from unidentified tenant |
| Active Abuse Signals | 20 | Q1–Q8 all return 0 non-pipeline results | Q1–Q7 return only 🟡-priority results (after pipeline collapse); OR only suspiciousAuthAppApproval self-referencing chains; OR Q8 returns only App Governance “Unused”/“Expiring” alerts with no XDR/MCAS overlap | Q1 returns any chain with adminConfirmedUserCompromised or confirmedCompromised (→ auto 15+); OR Q6 returns 🔴-priority cred→consent chain from a user with active Identity Protection risk; OR Q8 returns apps with DetectionBreadth ≥2 (multi-source detections) or any Attack Disruption incident |
Scoring Anchors (Deterministic Rules)
Apply these anchors BEFORE adjusting within bands. They set a floor for the dimension score:
| Condition | Dimension | Minimum Score |
|---|---|---|
| AppRoleAssignment.ReadWrite.All granted to ANY app | Permission Concentration | 16 |
| Any app owner has adminConfirmedUserCompromised | Owner Risk | 15 |
| Any secret >730 days old on an app with critical perms | Credential Hygiene | 14 |
| Foreign SPN with AppRoleAssignment.ReadWrite.All | Cross-Tenant Exposure | 17 |
| Q1 chain with adminConfirmedUserCompromised → app consent | Active Abuse Signals | 15 |
Active Abuse Signals | 15 |
| Q8 returns any Attack Disruption incident for an app in Phase 1 | Active Abuse Signals | 16 |
| Q8 returns app with DetectionBreadth ≥3 AND in Phase 1 flagged list | Active Abuse Signals | 14 |
| All Q1–Q8 non-pipeline results = 0 | Active Abuse Signals | ≤5 (cap) |
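One way to read the anchor rules: take the analyst's in-band score, raise it to the highest applicable floor, then apply the cap if one is triggered. The sketch below follows that reading. The function name and signature are assumptions, and the order of operations (floors first, then cap) is an interpretation, since the table does not state it explicitly.

```python
from typing import Iterable, Optional

MAX_DIMENSION_SCORE = 20


def apply_anchors(band_score: int,
                  floors: Iterable[int] = (),
                  cap: Optional[int] = None) -> int:
    """Apply anchor floors (minimum scores) and an optional cap to one dimension."""
    # Floors set a minimum: the highest triggered anchor wins.
    score = max([band_score, *floors])
    # A cap (e.g. all Q1-Q8 results empty -> at most 5) bounds the score from above.
    if cap is not None:
        score = min(score, cap)
    # Keep the result inside the 0-20 range every dimension uses.
    return max(0, min(score, MAX_DIMENSION_SCORE))


if __name__ == "__main__":
    # Permission Concentration judged 9, but AppRoleAssignment.ReadWrite.All is present.
    print(apply_anchors(9, floors=[16]))
    # Active Abuse Signals with every query returning 0 non-pipeline results.
    print(apply_anchors(7, cap=5))
```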
Interpretation Scale
| Score | Rating | Action |
|---|---|---|
| 0–20 | ✅ Healthy | Normal posture, routine monitoring |
| 21–45 | 🟡 Elevated | Review — minor permission sprawl or credential age detected |
| 46–70 | 🟠 Concerning | Investigate — multiple risk signals across dimensions |
| 71–100 | 🔴 Critical | Immediate remediation — active abuse chains or critical permission concentration |
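Summing the five dimension scores and mapping the total onto the interpretation scale can be sketched as follows. The dimension names and band boundaries are taken from the tables above; the function names and the input validation are illustrative assumptions.

```python
from typing import Dict

DIMENSIONS = (
    "Permission Concentration",
    "Owner Risk",
    "Credential Hygiene",
    "Cross-Tenant Exposure",
    "Active Abuse Signals",
)


def composite_score(dimension_scores: Dict[str, int]) -> int:
    """Sum the five 0-20 dimension scores into the 0-100 composite."""
    missing = [d for d in DIMENSIONS if d not in dimension_scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for name in DIMENSIONS:
        if not 0 <= dimension_scores[name] <= 20:
            raise ValueError(f"{name} must be between 0 and 20")
    return sum(dimension_scores[name] for name in DIMENSIONS)


def rating(score: int) -> str:
    """Map a composite score to the rating from the interpretation scale."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 20:
        return "Healthy"
    if score <= 45:
        return "Elevated"
    if score <= 70:
        return "Concerning"
    return "Critical"


if __name__ == "__main__":
    scores = dict(zip(DIMENSIONS, (16, 8, 12, 0, 5)))
    total = composite_score(scores)
    print(total, rating(total))
```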
Execution Workflow
Phase 0: Prerequisites
- Confirm Graph MCP (mcp_graph-mcp-ser) is available for posture queries
- Confirm RunAdvancedHuntingQuery is available for chain detection
- Ask user for output format (inline / markdown / both)
- Ask user for lookback period (default: 30 days for KQL queries)
Phase 1: Graph API Posture Inventory (Steps P1–P7)
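Mostly sequential: each step depends on the previous one, except that P2 and P3 can run in parallel once P1 has returned the Graph service principal ID.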
| Step | Purpose | API Call(s) |
|---|---|---|
| P1 | Find Microsoft Graph service principal ID in tenant | 1 call |
| P2 | List ALL application permission grants to Microsoft Graph | 1 call (paginated) — save to temp/p2_grants.json |
| P3 | Resolve permission GUIDs to human-readable names | 1 call — run in parallel with P2 — save to temp/p3_approles.json |
| P4 | Filter to dangerous permissions (PowerShell script) | 0 API calls — joins P2+P3 JSON, outputs flagged apps |
| P5 | Resolve owners for flagged apps | N calls (only flagged apps) |
| P6 | Assess owner risk (directory roles) | M calls (only flagged owners) |
| P7 | Credential hygiene check (from P5 response) | 0 calls |
Total: 3 + N + M calls (typically < 20 for most tenants)
Phase 2: KQL Chain Detection (Q1–Q8)
Run in parallel — no dependencies between queries. Q8 uses a 90-day lookback (incident data is sparser); Q1–Q7 use 30 days.
| Query | Purpose | Tables | Kill Chain Stage |
|---|---|---|---|
| Q1 | Risky User → App Operations Chain | AADUserRiskEvents + AuditLogs | Compromise → App Abuse |
| Q2 | Credential Add → SPN Activation | AuditLogs + AADServicePrincipalSignInLogs | Persistence → SPN Impersonation |
| Q3 | Ownership Add → Credential Modification Chain | AuditLogs (self-join) | Privilege Escalation → Persistence |
| Q4 | Cross-Tenant SPN Sign-Ins | AADServicePrincipalSignInLogs | Lateral Movement (cross-tenant) |
| Q5 | Credential Add → SPN Graph API Lateral Movement | AuditLogs + MicrosoftGraphActivityLogs | Lateral Movement / Data Exfiltration |
| Q6 | Credential Add → Permission Escalation Chain | AuditLogs (self-join) | Persistence → Privilege Escalation |
| Q7 | Multi-App Ownership Spread | AuditLogs | Persistence (breadth) |
| Q8 | App Governance & OAuth Incident Cross-Reference | AlertInfo + AlertEvidence | Detection Validation |
Phase 3: Score Computation & Report Generation
- Compute per-dimension scores from Phase 1 and Phase 2 data
- Cross-reference: Map Phase 1 flagged apps to Phase 2 chain detections
- Sum dimension scores for composite App Permission Risk Score
- Generate report in requested output mode
- Report total elapsed time
Phase 1: Graph API Posture Inventory
Scaling Strategy: Don't enumerate all app registrations (could be 1000+). Query from the permission grant side — find what's been granted dangerous permissions, then resolve owners only for those flagged apps.
Step P1: Find the Microsoft Graph Service Principal ID
The Microsoft Graph resource service principal is the target of all application permission grants. Its well-known AppId is 00000003-0000-0000-c000-000000000000, but its ObjectId varies per tenant.
GET /v1.0/servicePrincipals?$filter=appId eq '00000003-0000-0000-c000-000000000000'&$select=id,displayName
Save the returned id — you'll need it for Steps P2 and P3.
Step P2: List ALL Application Permission Grants to Microsoft Graph
This single call returns every app in the tenant that has been granted application-level permissions (not delegated) to Microsoft Graph.
GET /v1.0/servicePrincipals/{graph-sp-id}/appRoleAssignedTo
?$select=principalDisplayName,principalId,principalType,appRoleId,createdDateTime
&$top=999
Returns: One row per permission grant. Each row contains:
- principalDisplayName: app name
- principalId: ServicePrincipal ObjectId
- appRoleId: permission GUID
- createdDateTime: when the permission was granted
Post-processing: Group by principalDisplayName to get the per-app permission list.
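Because P2 is paginated, the grouping step needs every page before it runs. Microsoft Graph collection responses carry results in a value array and, when more pages exist, a continuation URL in @odata.nextLink. The sketch below follows those links and then groups by app. The fetch callable is a hypothetical stand-in for whatever client issues the request (Graph MCP in this skill), and the function names are assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List


def iter_grants(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every row across all pages by following @odata.nextLink."""
    url = first_url
    while url:
        page = fetch(url)  # hypothetical: returns the parsed JSON response body
        yield from page.get("value", [])
        url = page.get("@odata.nextLink")  # absent on the last page


def group_by_app(grants: Iterable[dict]) -> Dict[str, List[str]]:
    """Group permission GUIDs (appRoleId) by principalDisplayName."""
    per_app: Dict[str, List[str]] = defaultdict(list)
    for grant in grants:
        per_app[grant["principalDisplayName"]].append(grant["appRoleId"])
    return dict(per_app)


if __name__ == "__main__":
    # Two fake pages standing in for real Graph responses.
    pages = {
        "page1": {"value": [{"principalDisplayName": "App A", "appRoleId": "guid-1"}],
                  "@odata.nextLink": "page2"},
        "page2": {"value": [{"principalDisplayName": "App A", "appRoleId": "guid-2"}]},
    }
    print(group_by_app(iter_grants("page1", pages.__getitem__)))
```

Grouping on principalDisplayName matches the post-processing step above; if two apps can share a display name in a tenant, grouping on principalId instead avoids merging them.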
⚠️ Large Response Handling: P2 can return hundreds of rows (one per permission grant across all apps). When the response is large:
- Save P2 and P3 responses to temp/ as JSON files before processing; this prevents data loss if context gets truncated
- Run P2 and P3 in parallel; they are independent (P3 only needs the Graph SP ID from P1, same as P2)
- Use PowerShell for the GUID→name join and dangerous-permission filter; do NOT attempt to parse large JSON in-context. Write a script that:
  - Loads P2 grants + P3 appRoles from the saved JSON files
  - Builds the appRoleId→value lookup map
  - Filters to dangerous permissions
  - Groups by app name
  - Outputs the flagged-app summary (app name, dangerous perms, grant dates, principalId)
- Only bring the filtered summary back into context; the full P2/P3 data stays in temp files for reference
# Save MCP responses to temp files first, then:
$grants = Get-Content "temp/p2_grants.json" -Raw | ConvertFrom-Json
$roles = Get-Content "temp/p3_approles.json" -Raw | ConvertFrom-Json
# Build GUID→name map
$roleMap = @{}
foreach ($r in $roles) { $roleMap[$r.id] = $r.value }
# Dangerous permissions list
$dangerousPerms = @(
"Directory.ReadWrite.All", "Application.ReadWrite.All",
"AppRoleAssignment.ReadWrite.All", "RoleManagement.ReadWrite.Directory",
"Mail.ReadWrite", "Mail.Send", "Mail.Read",
"Files.ReadWrite.All", "User.ReadWrite.All", "Group.ReadWrite.All",
"Sites.ReadWrite.All", "MailboxSettings.ReadWrite", "User.Export.All",
"Exchange.ManageAsApp", "full_access_as_app",
"Policy.ReadWrite.ConditionalAccess", "SecurityEvents.ReadWrite.All"
)
# Enrich grants with permission names and filter
$enriched = $grants | ForEach-Object {
$permName = $roleMap[$_.appRoleId]
[PSCustomObject]@{
App = $_.principalDisplayName
PrincipalId = $_.principalId
Permission = $permName
Dangerous = $permName -in $dangerousPerms
GrantDate = $_.createdDateTime
}
}
# Summary: apps with dangerous permissions
$flagged = $enriched | Where-Object Dangerous | Group-Object App | ForEach-Object {
[PSCustomObject]@{
App = $_.Name
DangerousPerms = ($_.Group.Permission | Sort-Object -Unique) -join ", "
Count = $_.Count
LatestGrant = ($_.Group.GrantDate | Sort-Object -Descending | Select-Object -First 1)
PrincipalId = $_.Group[0].PrincipalId
}
} | Sort-Object Count -Descending
# Display summary (wrap in @() so .Count is reliable when only one item is returned)
$totalApps = @($enriched | Select-Object -Unique App).Count
Write-Host "Total apps with Graph permissions: $totalApps"
Write-Host "Apps with dangerous permissions: $(@($flagged).Count)"
Write-Host "Total dangerous grants: $(@($enriched | Where-Object Dangerous).Count)"
$flagged | Format-Table -AutoSize
This script replaces the manual P3/P4 steps — it does the GUID resolution AND dangerous-permission filtering in one pass.
Step P3: Resolve Permission GUIDs to Names
Run in parallel with P2 — both only need the Graph SP ID from P1.
GET /v1.0/servicePrincipals/{graph-sp-id}/appRoles
Returns: Complete list of Microsoft Graph permission definitions with id (GUID), value (e.g., Mail.ReadWrite), and displayName.
Save the response to temp/p3_approles.json. The PowerShell script from P2 loads this file to build the GUID→name lookup.
Step P4: Filter to Dangerous Permissions
Handled by the PowerShell script in P2. The script performs GUID→name join, dangerous-permission filter, and per-app grouping in one pass. No additional API calls needed.
Output: A table of flagged apps with their dangerous permission list, permission risk level, and grant dates.
Step P5: Resolve Owners for Flagged Apps
Only for apps flagged in P4, retrieve owners from the Application object (NOT the ServicePrincipal):
GET /v1.0/applications?$filter=displayName eq '{flagged-app-name}'
&$select=id,appId,displayName,passwordCredentials,keyCredentials
&$expand=owners($select=id,displayName,userPrincipalName)
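Because the display name is embedded in an OData string literal, a name containing a single quote would break the filter. The sketch below shows one way to build the request URL per flagged app. The helper name `build_owner_lookup_url` and the `GRAPH_BASE` constant are illustrative assumptions, not part of the skill.

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com"  # assumption: public-cloud Graph endpoint


def build_owner_lookup_url(app_display_name: str) -> str:
    """Build the P5 request URL for one flagged app (hypothetical helper)."""
    # OData string literals escape a single quote by doubling it.
    escaped = app_display_name.replace("'", "''")
    filter_expr = f"displayName eq '{escaped}'"
    return (
        f"{GRAPH_BASE}/v1.0/applications"
        f"?$filter={quote(filter_expr)}"  # percent-encode spaces and quotes
        "&$select=id,appId,displayName,passwordCredentials,keyCredentials"
        "&$expand=owners($select=id,displayName,userPrincipalName)"
    )
```

Calling the helper once per flagged app name yields one request URL each.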
Repeat for each flagged app. Important:
- Cross-tenant SPNs return empty results (no local Application object)
- Red team apps may have owners stripped post-creation
- For ownerless apps, fall back to the AuditLogs "Add application" operation to find the original creator
Step P6: Assess Owner Risk
For each owner found in P5:
- Check directory roles: is the owner a privileged admin or a standard user?
GET /v1.0/roleManagement/directory/roleAssignments
?$filter=principalId eq '{owner-id}'
&$expand=roleDefinition($select=displayName)
Non-admin owners of apps with critical permissions = the Guardz attack vector.
- Check Identity Protection risk: feed owner.userPrincipalName into Q1 to detect active risk events. An owner currently flagged by Identity Protection who owns a dangerous app is the highest-priority finding.
Step P7: Credential Hygiene Check
The P5 response includes passwordCredentials and keyCredentials. Assess:
| Check | Field | Risk |
|---|---|---|
| Multiple active secrets | passwordCredentials[] where endDateTime > now | 🟠 Multiple access methods, harder to revoke |
| Long-lived secrets | endDateTime > 2 years from startDateTime | 🟠 Stale credential risk, may leak without detection |
| No credentials at all | Empty passwordCredentials + keyCredentials | 🟢 App can't be used for SPN auth (lower risk) |
| Certificate + Secret both active | Both arrays non-empty | 🟡 Review: cert is expected, secret alongside is unusual |
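The checks in the table can be sketched as a small function over one application object from the P5 response. This is a minimal illustration, not the skill's implementation: the function name, the finding labels, the 730-day approximation of "2 years", and the choice to require both credential types to be unexpired for the last check are all assumptions.

```python
from datetime import datetime, timedelta, timezone

LONG_LIVED = timedelta(days=730)  # assumption: "2 years" approximated as 730 days


def _parse(ts: str) -> datetime:
    # Assumes UTC timestamps like "2025-01-01T00:00:00Z"; sub-second digits are ignored.
    return datetime.strptime(ts[:19], "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def _active(creds: list, now: datetime) -> list:
    return [c for c in creds if c.get("endDateTime") and _parse(c["endDateTime"]) > now]


def assess_credentials(app: dict, now: datetime) -> list[str]:
    """Return hygiene findings for one application object (hypothetical labels)."""
    secrets = app.get("passwordCredentials") or []
    certs = app.get("keyCredentials") or []
    active_secrets = _active(secrets, now)
    active_certs = _active(certs, now)

    findings = []
    if not secrets and not certs:
        findings.append("no_credentials")
    if len(active_secrets) > 1:
        findings.append("multiple_active_secrets")
    if any(
        _parse(c["endDateTime"]) - _parse(c["startDateTime"]) > LONG_LIVED
        for c in secrets
        if c.get("startDateTime") and c.get("endDateTime")
    ):
        findings.append("long_lived_secret")
    if active_secrets and active_certs:
        findings.append("cert_and_secret_active")
    return findings
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to re-run against a saved P5 response.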
Phase 2: KQL Chain Detection Queries
All queries below are verified against live data. Use them exactly as written, substituting only the lookback period and chain windows where noted.
Tool: Use RunAdvancedHuntingQuery for all queries. All tables are Analytics-tier, so AH queries are free. Fall back to mcp_sentinel-data_query_lake only for lookback > 30 days.
Query 1: Risky User → App Operations Chain (HIGHEST SIGNAL)
Purpose: Detect users with active Identity Protection risk detections who then perform app credential, ownership, or consent operations.
Kill Chain Stage: Compromise → App Abuse
Tables: AADUserRiskEvents + AuditLogs
Why high signal: A user flagged by Identity Protection performing app credential operations within days is strong evidence of the exact attack pattern described in the Guardz research.
// Chain Detection: Users with active risk → app credential/ownership operations
let lookback = 30d;
let chainWindow = 7d; // Risk event → app operation within 7 days
// Step 1: Users with unresolved or confirmed risk
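let RiskyUsers = AADUserRiskEvents
| where TimeGenerated > ago(lookback)
| where RiskState in ("atRisk", "confirmedCompromised")
// RiskLevel is a string; rank it numerically so max() does not sort "medium" above "high"
| extend RiskRank = case(RiskLevel =~ "high", 3, RiskLevel =~ "medium", 2, RiskLevel =~ "low", 1, 0)
| summarize
RiskEvents = count(),
RiskTypes = make_set(RiskEventType, 5),
MaxRiskRank = max(RiskRank),
EarliestRisk = min(TimeGenerated),
LatestRisk = max(TimeGenerated)
by UserPrincipalName
| extend MaxRiskLevel = case(MaxRiskRank == 3, "high", MaxRiskRank == 2, "medium", MaxRiskRank == 1, "low", "other");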
// Step 2: App credential/ownership/consent operations by those users
AuditLogs
| where TimeGenerated > ago(lookback)
| where Category == "ApplicationManagement"
| where OperationName has_any ("credential", "secret", "certificate", "owner", "consent", "permission")
| extend InitiatedByUser = tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName)
| where isnotempty(InitiatedByUser)
| extend Target = parse_json(tostring(TargetResources))[0]
| extend TargetAppName = coalesce(
tostring(Target.displayName),
tostring(parse_json(tostring(parse_json(tostring(Target.modifiedProperties))[1].newValue))))
| join kind=inner RiskyUsers on $left.InitiatedByUser == $right.UserPrincipalName
| where TimeGenerated between (EarliestRisk .. (LatestRisk + chainWindow))
| project
RiskDetectedAt = EarliestRisk,
AppOperationAt = TimeGenerated,
TimeDeltaHours = datetime_diff('hour', TimeGenerated, EarliestRisk),
User = InitiatedByUser,
RiskTypes,
MaxRiskLevel,
RiskEvents,
OperationName,
TargetApp = TargetAppName,
CorrelationId
| order by RiskDetectedAt desc
Triage Priority:
- 🔴 Critical: MaxRiskLevel = high + credential add operation → likely active compromise
- 🟠 High: MaxRiskLevel = medium + ownership add → attacker positioning for persistence
- 🟡 Medium: MaxRiskLevel = low + consent grant → may be suspiciousAuthAppApproval self-referencing
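The triage list above can be applied to exported Q1 rows with a small classifier. This is a sketch under stated assumptions: the function name, the keyword matching on OperationName, and the "review" fallback for combinations the list does not cover are all illustrative.

```python
def triage_q1(max_risk_level: str, operation_name: str) -> str:
    """Map one Q1 result row to a triage bucket (hypothetical helper)."""
    level = max_risk_level.lower()
    op = operation_name.lower()
    is_credential_op = any(k in op for k in ("credential", "secret", "certificate"))
    is_ownership_op = "owner" in op
    is_consent_op = "consent" in op

    if level == "high" and is_credential_op:
        return "critical"  # likely active compromise
    if level == "medium" and is_ownership_op:
        return "high"  # attacker positioning for persistence
    if level == "low" and is_consent_op:
        return "medium"  # may be self-referencing
    return "review"  # combination not covered by the triage list
```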
Tuning:
- Tighten chainWindow to 1d for higher precision
- Add | where RiskTypes !has "suspiciousAuthAppApproval" to exclude consent-flagging-consent loops
Query 2: Credential Add → SPN Activation from New Origin
Purpose: After a credential is added to an app, detect when the SPN authenticates from a new IP within 72 hours. This is the SolarWinds "backdoor credential → authenticate as the app" pattern.
Kill Chain Stage: Persistence → SPN Impersonation
Tables: AuditLogs + AADServicePrincipalSignInLogs
// Chain Detection: Credential added → SPN signs in within 72h
let lookback = 30d;
let activationWindow = 72h;
// Step 1: Credential additions with actor and target
let CredentialAdds = AuditLogs
| where TimeGenerated > ago(lookback)
| where Category == "ApplicationManagement"
| where OperationName in~ (
"Update application – Certificates and secrets management ",
"Add service principal credentials"
)
| extend InitiatedByUser = tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName)
| extend InitiatedByApp = tostring(parse_json(tostring(InitiatedBy)).app.displayName)
| extend Actor = iff(isnotempty(InitiatedByUser), InitiatedByUser, InitiatedByApp)
| extend Target = parse_json(tostring(TargetResources))[0]
| extend TargetAppName = tostring(Target.displayName)
| extend TargetAppId = tostring(Target.id)
| extend ModifiedProps = parse_json(tostring(Target.modifiedProperties))
| extend KeyDescription = tostring(ModifiedProps[0].newValue)
| extend CredentialType = case(
KeyDescription has "AsymmetricX509Cert", "Certificate",
KeyDescription has "Password", "Client Secret",
"Unknown")
| project CredAddTime = TimeGenerated, Actor, TargetAppName, TargetAppId, CredentialType, CorrelationId;
// Step 2: SPN sign-ins after credential add
CredentialAdds
| join kind=inner (
AADServicePrincipalSignInLogs
| where TimeGenerated > ago(lookback)
| where ResultType == "0" // successful only
| project SPNSignInTime = TimeGenerated, AppId, ServicePrincipalName, IPAddress,
Location, ResourceDisplayName, ClientCredentialType,
ServicePrincipalCredentialKeyId
) on $left.TargetAppId == $right.AppId
| where SPNSignInTime between (CredAddTime .. (CredAddTime + activationWindow))
| summarize
SPNSignIns = count(),
DistinctIPs = dcount(IPAddress),
IPs = make_set(IPAddress, 10),
Resources = make_set(ResourceDisplayName, 5),
CredTypes = make_set(ClientCredentialType, 5),
FirstSignIn = min(SPNSignInTime),
LastSignIn = max(SPNSignInTime)
by CredAddTime, Actor, TargetAppName, TargetAppId, CredentialType, CorrelationId
| extend HoursToActivation = datetime_diff('hour', FirstSignIn, CredAddTime)
| order by CredAddTime desc
Triage Priority:
- 🔴 Critical: HoursToActivation < 1 + new IP not in SPN's historical baseline
- 🟠 High: HoursToActivation < 24 + accessing sensitive resources (Graph, Key Vault)
- 🟡 Medium: Normal activation window but from multiple IPs
Enhancement: Run the SPN scope drift skill (.github/skills/scope-drift-detection/spn/SKILL.md) on any flagged SPN for baseline comparison.
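Query 2 returns the IPs seen after the credential add but does not itself compare them to a baseline. A minimal sketch of that comparison follows, assuming the historical IP set for the SPN has been collected separately (for example by the scope drift skill). The function names, the `sensitive_resource` flag, and the "review" fallback are assumptions.

```python
def new_origin_ips(activation_ips, baseline_ips):
    """IPs seen after the credential add that never appear in the baseline."""
    return sorted(set(activation_ips) - set(baseline_ips))


def triage_q2(hours_to_activation, activation_ips, baseline_ips, sensitive_resource):
    """Apply the Query 2 triage list to one result row (hypothetical helper)."""
    if hours_to_activation < 1 and new_origin_ips(activation_ips, baseline_ips):
        return "critical"
    if hours_to_activation < 24 and sensitive_resource:
        return "high"
    if len(set(activation_ips)) > 1:
        return "medium"
    return "review"  # not covered by the triage list
```

An empty baseline treats every IP as new, which is the conservative choice for an SPN that has never signed in before.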
Query 3: Ownership Add → Credential Modification Chain
Purpose: Detect the exact Guardz attack sequence — user is added as app owner, then credentials are modified on that app within 7 days. The SameActorAsNewOwner flag is key: if the newly added owner immediately creates a credential, that's the attacker using ownership to establish persistence.
Kill Chain Stage: Privilege Escalation → Persistence
Tables: AuditLogs (self-join)
// Chain Detection: Owner added to app → credential/permission op on same app within 7d
let lookback = 30d;
let chainWindow = 7d;
// Step 1: Ownership additions — extract new owner and target app
let OwnershipAdds = AuditLogs
| where TimeGenerated > ago(lookback)
| where Category == "ApplicationManagement"
| where OperationName in~ ("Add owner to application", "Add owner to service principal")
| extend Target0 = parse_json(tostring(TargetResources))[0]
| extend NewOwnerUPN = tostring(Target0.userPrincipalName)
| extend NewOwnerId = tostring(Target0.id)
| extend ModProps = parse_json(tostring(Target0.modifiedProperties))
| extend TargetAppName = tostring(parse_json(tostring(ModProps[1].newValue)))
| extend TargetAppId = tostring(parse_json(tostring(ModProps[0].newValue)))
| extend InitiatedByUser = tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName)
| extend Actor = iff(isnotempty(InitiatedByUser), InitiatedByUser, tostring(parse_json(tostring(InitiatedBy)).app.displayName))
| project OwnerAddTime = TimeGenerated, Actor, NewOwnerUPN, TargetAppName, TargetAppId, OperationName;
// Step 2: Credential or permission operations on the same app
AuditLogs
| where TimeGenerated > ago(lookback)
| where Category == "ApplicationManagement"
| where OperationName in~ (
"Update application – Certificates and secrets management ",
"Add service principal credentials",
"Add delegated permission grant",
"Consent to application",
"Add app role assignment to service principal"
)
| extend Target = parse_json(tostring(TargetResources))[0]
| extend CredTargetId = tostring(Target.id)
| extend CredActor = tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName)
| join kind=inner OwnershipAdds on $left.CredTargetId == $right.TargetAppId
| where TimeGenerated between (OwnerAddTime .. (OwnerAddTime + chainWindow))
| project
OwnerAddTime,
CredOpTime = TimeGenerated,
HoursGap = datetime_diff('hour', TimeGenerated, OwnerAddTime),
NewOwnerUPN,
CredActor,
SameActorAsNewOwner = (CredActor =~ NewOwnerUPN),
OwnershipOp = OperationName1,
CredentialOp = OperationName,
TargetAppName,
TargetAppId
| order by OwnerAddTime desc
Triage Priority:
- 🔴 Critical: SameActorAsNewOwner = true + HoursGap < 1 → scripted attack
- 🟠 High: SameActorAsNewOwner = true + HoursGap < 24 → manual attacker
- 🟡 Medium: Different actors (admin added owner, owner later legitimately rotated creds)
Query 4: SPN Cross-Tenant Sign-Ins
Purpose: Detect service principals owned by external tenants authenticating into your tenant. Multi-tenant app abuse was the core SolarWinds persistence mechanism.
Kill Chain Stage: Lateral Movement (cross-tenant)
Tables: AADServicePrincipalSignInLogs
// Detect cross-tenant SPN authentication — foreign SPNs accessing local resources
AADServicePrincipalSignInLogs
| where TimeGenerated > ago(30d)
| where ResultType == "0"
| where isnotempty(AppOwnerTenantId)
| where AppOwnerTenantId != AADTenantId
| summarize
SignIns = count(),
DistinctIPs = dcount(IPAddress),
IPs = make_set(IPAddress, 5),
Resources = make_set(ResourceDisplayName, 10),
CredTypes = make_set(ClientCredentialType, 5),
Locations = make_set(Location, 5),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated)
by ServicePrincipalName, AppId, AppOwnerTenantId, AADTenantId
| order by SignIns desc
Triage Priority:
- 🔴 Critical: Unknown foreign tenant SPN accessing sensitive resources (Graph, Key Vault, ARM)
- 🟠 High: Known partner/vendor SPN with new access patterns
- 🟡 Low: Microsoft first-party service SPNs (verify against first-party app list)
Enhancement — New Cross-Tenant SPNs (first seen in last 7d vs 30d baseline):
let recent = 7d;
let baseline = 30d;
let RecentCrossTenant = AADServicePrincipalSignInLogs
| where TimeGenerated > ago(recent)
| where ResultType == "0"
| where isnotempty(AppOwnerTenantId) // skip rows with no owner tenant, as in the main query
| where AppOwnerTenantId != AADTenantId
| distinct AppId, ServicePrincipalName, AppOwnerTenantId;
let BaselineCrossTenant = AADServicePrincipalSignInLogs
| where TimeGenerated between (ago(baseline) .. ago(recent))
| where ResultType == "0"
| where isnotempty(AppOwnerTenantId)
| where AppOwnerTenantId != AADTenantId
| distinct AppId;
RecentCrossTenant
| join kind=leftanti BaselineCrossTenant on AppId
| project ServicePrincipalName, AppId, AppOwnerTenantId
Query 5: Credential Add → SPN Graph API Lateral Movement
Purpose: After a credential is added, track what Graph API calls the SPN makes. Categorizes API endpoints into sensitive categories to identify lateral movement and data exfiltration.
Kill Chain Stage: Lateral Movement / Data Exfiltration
Tables: AuditLogs + MicrosoftGraphActivityLogs
Prerequisite: MicrosoftGraphActivityLogs must be ingested (requires Entra ID P1/P2 + diagnostic settings enabled).
// Chain Detection: Credential added → SPN Graph API calls within 72h
let lookback = 30d;
let monitorWindow = 72h;
// Step 1: Apps that had credentials added
let CredentialAdds = AuditLogs
| where TimeGenerated > ago(lookback)
| where Category == "ApplicationManagement"
| where OperationName in~ (
"Update application – Certificates and secrets management ",
"Add service principal credentials"
)
| extend Target = parse_json(tostring(TargetResources))[0]
| extend TargetAppId = tostring(Target.id)
| extend TargetAppName = tostring(Target.displayName)
| extend Actor = tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName)
| project CredAddTime = TimeGenerated, Actor, TargetAppName, TargetAppId;
// Step 2: Graph API calls by those apps after credential add
CredentialAdds
| join kind=inner (
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(lookback)
| where isnotempty(ServicePrincipalId)
| project GraphCallTime = TimeGenerated, AppId, RequestMethod, RequestUri,
ResponseStatusCode, ServicePrincipalId
) on $left.TargetAppId == $right.AppId
| where GraphCallTime between (CredAddTime .. (CredAddTime + monitorWindow))
| extend EndpointCategory = case(
RequestUri has "/roleManagement/", "Role Management",
RequestUri has_any ("/applications/", "/servicePrincipals/"), "App/SPN Management",
// case() returns the first match: test mail and file paths before the generic /users/ and /groups/
// checks, otherwise /users/{id}/messages is labelled "User Enumeration"
RequestUri has_any ("/mail/", "/messages", "/mailFolders"), "Email Access",
RequestUri has_any ("/drives/", "/sites/"), "File Access",
RequestUri has "/users/", "User Enumeration",
RequestUri has "/groups/", "Group Enumeration",
RequestUri has "/identity/conditionalAccess/", "CA Policy Access",
RequestUri has "/policies/", "Policy Management",
RequestUri has "/security/", "Security Data",
RequestUri has "/auditLogs/", "Audit Log Access",
"Other")
| where EndpointCategory != "Other"
| summarize
GraphCalls = count(),
Methods = make_set(RequestMethod, 5),
SampleUris = make_set(RequestUri, 3),
SuccessRate = round(100.0 * countif(ResponseStatusCode >= 200 and ResponseStatusCode < 300) / count(), 1)
by CredAddTime, Actor, TargetAppName, TargetAppId, EndpointCategory
| order by CredAddTime desc, GraphCalls desc
Triage Priority:
- 🔴 Critical: Role Management or App/SPN Management → privilege escalation / further persistence
- 🔴 Critical: Email Access → data exfiltration (SolarWinds primary objective)
- 🟠 High: CA Policy Access or Policy Management → defense evasion
- 🟡 Medium: File Access → potential data staging
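Endpoint categorization is first-match-wins, so rule order decides how a URI such as /users/{id}/messages is labelled. The sketch below reproduces the categories in Python for offline review of exported URIs. It is an approximation: it uses case-insensitive substring matching rather than KQL's term-based `has`, and the rule order (mail and file patterns ahead of the generic /users/ pattern) is an assumption of this sketch.

```python
# Ordered rules: the first category whose pattern appears in the URI wins.
CATEGORY_RULES = [
    ("Role Management", ("/roleManagement/",)),
    ("App/SPN Management", ("/applications/", "/servicePrincipals/")),
    ("Email Access", ("/mail/", "/messages", "/mailFolders")),
    ("File Access", ("/drives/", "/sites/")),
    ("User Enumeration", ("/users/",)),
    ("Group Enumeration", ("/groups/",)),
    ("CA Policy Access", ("/identity/conditionalAccess/",)),
    ("Policy Management", ("/policies/",)),
    ("Security Data", ("/security/",)),
    ("Audit Log Access", ("/auditLogs/",)),
]


def categorize(request_uri: str) -> str:
    """Return the endpoint category for one Graph request URI."""
    uri = request_uri.lower()
    for category, patterns in CATEGORY_RULES:
        if any(p.lower() in uri for p in patterns):
            return category
    return "Other"
```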
Query 6: Credential Add → Permission Escalation Chain
Purpose: After adding a credential (persistence), detect the attacker granting additional permissions or consenting to broader API access on the same app.
Kill Chain Stage: Persistence → Privilege Escalation
Tables: AuditLogs (self-join)
Schema Note: Credential operations and consent operations use different ID spaces for the same app (Application ObjectId vs ServicePrincipal ObjectId). This query joins on Actor + TargetAppName to bridge the gap.
// Chain Detection: Credential added → permission/consent on same app within 7d
let lookback = 30d;
let escalationWindow = 7d;
// Step 1: Credential additions
let CredentialAdds = AuditLogs
| where TimeGenerated > ago(lookback)
| where Category == "ApplicationManagement"
| where OperationName in~ (
"Update application – Certificates and secrets management ",
"Add service principal credentials"
)
| extend Target = parse_json(tostring(TargetResources))[0]
| extend TargetAppName = tostring(Target.displayName)
| where isnotempty(TargetAppName)
| extend CredActor = tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName)
| where isnotempty(CredActor)
| project CredAddTime = TimeGenerated, CredActor, TargetAppName;
// Step 2: Permission grants by same actor on same-named app
let PermissionGrants = AuditLogs
| where TimeGenerated > ago(lookback)
| where Category == "ApplicationManagement"
| where OperationName in~ (
"Add delegated permission grant",
"Consent to application",
"Add app role assignment to service principal"
)
| extend EscActor = tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName)
| where isnotempty(EscActor)
| extend Target0 = parse_json(tostring(TargetResources))[0]
| extend PermAppName = tostring(Target0.displayName)
| project PermOpTime = TimeGenerated, EscActor, PermAppName, EscalationOp = OperationName;
// Join: same actor + same app + credential first then permission
CredentialAdds
| join kind=inner PermissionGrants on $left.CredActor == $right.EscActor, $left.TargetAppName == $right.PermAppName
| where PermOpTime between (CredAddTime .. (CredAddTime + escalationWindow))
| project
CredAddTime,
PermissionOpTime = PermOpTime,
HoursGap = datetime_diff('hour', PermOpTime, CredAddTime),
Actor = CredActor,
TargetAppName,
EscalationOp
| order by CredAddTime desc
Triage Priority:
- 🔴 Critical: HoursGap = 0 + consent grant → automated attack tool
- 🟠 High: Consent to powerful API scopes
- 🟡 Medium: Add app role assignment with larger gap → possibly legitimate
Query 7: Multi-App Ownership Spread
Purpose: Detect a single user being added as owner to multiple applications within a rolling window. Attackers spread ownership across apps to maximize blast radius.
Kill Chain Stage: Persistence (breadth)
Tables: AuditLogs
// Detect lateral ownership expansion — one user becoming owner of many apps
let lookback = 30d;
AuditLogs
| where TimeGenerated > ago(lookback)
| where Category == "ApplicationManagement"
| where OperationName in~ ("Add owner to application", "Add owner to service principal")
| extend Target0 = parse_json(tostring(TargetResources))[0]
| extend NewOwnerUPN = tostring(Target0.userPrincipalName)
| extend ModProps = parse_json(tostring(Target0.modifiedProperties))
| extend TargetAppName = tostring(parse_json(tostring(ModProps[1].newValue)))
| extend TargetAppId = tostring(parse_json(tostring(ModProps[0].newValue)))
| extend InitiatedByUser = tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName)
| extend Actor = iff(isnotempty(InitiatedByUser), InitiatedByUser, tostring(parse_json(tostring(InitiatedBy)).app.displayName))
| where isnotempty(NewOwnerUPN)
| summarize
AppsOwned = dcount(TargetAppId),
AppNames = make_set(TargetAppName, 10),
OwnershipOps = count(),
FirstAdd = min(TimeGenerated),
LastAdd = max(TimeGenerated),
AddedBy = make_set(Actor, 5)
by NewOwnerUPN
| extend SpreadWindowHours = datetime_diff('hour', LastAdd, FirstAdd)
| where AppsOwned >= 3
| order by AppsOwned desc
Triage Priority:
- 🔴 Critical: AppsOwned >= 5 + SpreadWindowHours < 24 → bulk automated ownership grab
- 🟠 High: Non-admin user (AddedBy = themselves) with AppsOwned >= 3
- 🟡 Medium: Automation account adding ownership as part of deployment
Enhancement: Feed NewOwnerUPN values into Q1 to check for active identity risk events.
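For ownership-add events that have already been exported (for example when re-checking a saved result set), the same spread calculation can be sketched in Python. The input shape of `(new_owner_upn, target_app_id, timestamp)` tuples and the function name are assumptions. The hour figure is floored elapsed time, which can differ slightly from KQL's `datetime_diff('hour', ...)`.

```python
from collections import defaultdict
from datetime import datetime


def ownership_spread(events, min_apps=3):
    """Summarize how many distinct apps each new owner was added to.

    events: iterable of (new_owner_upn, target_app_id, timestamp) tuples.
    """
    per_owner = defaultdict(list)
    for owner, app_id, ts in events:
        if owner:  # mirrors the isnotempty(NewOwnerUPN) filter
            per_owner[owner].append((app_id, ts))

    rows = []
    for owner, items in per_owner.items():
        apps = {app_id for app_id, _ in items}
        if len(apps) < min_apps:
            continue
        times = [ts for _, ts in items]
        spread = max(times) - min(times)
        rows.append({
            "owner": owner,
            "apps_owned": len(apps),
            "ownership_ops": len(items),
            "spread_window_hours": int(spread.total_seconds() // 3600),
        })
    return sorted(rows, key=lambda r: r["apps_owned"], reverse=True)
```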
Query 8: App Governance & OAuth Incident Cross-Reference
Purpose: Surface existing Defender detections (App Governance, MCAS, Defender XDR attack disruptions) for apps in our posture assessment. Creates a cross-reference between our Graph API + KQL findings and what Microsoft's own detection products already flagged — confirming known threats and highlighting gaps.
Kill Chain Stage: Detection Validation (cross-reference)
Tables: AlertInfo + AlertEvidence
Why this matters:
- Apps flagged by BOTH our skill AND App Governance/XDR → confirmed threat, urgent remediation
- Apps flagged ONLY by our skill → unique detection value (the skill caught what App Governance missed)
- Apps flagged ONLY by App Governance → coverage gap in our assessment (e.g., apps without dangerous Graph perms but with suspicious behavior)
Key field mappings (discovered via live testing):
| Field | Table | Values |
|---|---|---|
| ServiceSource | AlertInfo | "App Governance", "Microsoft Defender for Cloud Apps", "Microsoft Defender XDR", "Microsoft Defender for Identity" |
| DetectionSource | AlertInfo | "App Governance Policy", "Microsoft 365 Defender", "Security Copilot", "Custom detection" |
| EntityType | AlertEvidence | "OAuthApplication" (app entities), "CloudApplication" (resource targets) |
| AdditionalFields.OAuthAppId | AlertEvidence | Application (client) ID; join key to Graph API flagged apps |
| AdditionalFields.Name | AlertEvidence | App display name |
App Governance alert types:
- Custom policy, App Creation Policy: admin-defined rules
- Overprivileged app, New highly privileged app: permission-based detections
- Expiring credentials, Unused credentials, Unused app: hygiene alerts
Defender XDR OAuth alert types:
- Malicious OAuth application registration by a compromised user: attack disruption
- Suspicious OAuth consent and privilege escalation activity: Security Copilot detection
- Suspicious OAuth app registration: MCAS detection
- Anomalous OAuth device code authentication activity: MDI detection
// Q8: App Governance + OAuth Incident Cross-Reference
let lookback = 90d;
// Part 1: App Governance alerts
let AppGovAlerts = AlertInfo
| where Timestamp > ago(lookback)
| where ServiceSource == "App Governance"
| project AlertId, AlertTitle = Title, ServiceSource, DetectionSource, Severity, Timestamp;
// Part 2: OAuth-related alerts from all sources
let OAuthAlerts = AlertInfo
| where Timestamp > ago(lookback)
| where Title has "OAuth"
or (ServiceSource == "Microsoft Defender for Cloud Apps" and Title has_any ("app registration", "OAuth"))
| project AlertId, AlertTitle = Title, ServiceSource, DetectionSource, Severity, Timestamp;
// Part 3: Attack Disruption incidents targeting OAuth/compromised-user app abuse
let AttackDisruption = AlertInfo
| where Timestamp > ago(lookback)
| where Title has "attack disruption" and Title has_any ("OAuth", "malicious", "compromised")
| project AlertId, AlertTitle = Title, ServiceSource, DetectionSource, Severity, Timestamp;
// Combine all alert sources (deduplicate)
let AllAppAlerts = union AppGovAlerts, OAuthAlerts, AttackDisruption
| summarize arg_max(Timestamp, *) by AlertId;
// Join with AlertEvidence to get OAuthApplication entities
AllAppAlerts
| join kind=leftouter (
AlertEvidence
| where Timestamp > ago(lookback)
| where EntityType == "OAuthApplication"
| extend OAuthAppId = tostring(parse_json(AdditionalFields).OAuthAppId)
| extend OAuthAppName = tostring(parse_json(AdditionalFields).Name)
| project AlertId, OAuthAppId, OAuthAppName, EntityType
) on AlertId
| summarize
AlertCount = count(),
AlertTitles = make_set(AlertTitle, 10),
Severities = make_set(Severity, 5),
ServiceSources = make_set(ServiceSource, 5),
DetectionSources = make_set(DetectionSource, 5),
LatestAlert = max(Timestamp),
EarliestAlert = min(Timestamp)
by OAuthAppName, OAuthAppId
| extend OAuthAppName = iff(isempty(OAuthAppName), "⚠️ No app entity extracted", OAuthAppName)
| extend HasDefenderXDR = ServiceSources has "Microsoft Defender XDR"
| extend HasAppGov = ServiceSources has "App Governance"
| extend HasMCAS = ServiceSources has "Microsoft Defender for Cloud Apps"
| extend DetectionBreadth = toint(HasDefenderXDR) + toint(HasAppGov) + toint(HasMCAS)
| order by DetectionBreadth desc, AlertCount desc
Post-processing — Cross-reference with Phase 1 flagged apps:
After Q8 returns, compare the OAuthAppName values against the apps flagged in Phase 1 (P4):
| Scenario | Meaning | Report Action |
|---|---|---|
| App in BOTH Phase 1 (dangerous perms) AND Q8 (existing detections) | Confirmed threat — multiple detection layers agree | 🔴 Highlight in report: "Corroborated by N existing Defender detections" |
| App in Phase 1 ONLY (dangerous perms, no Q8 hits) | Skill-unique detection — App Governance hasn't flagged it | 🟠 Highlight: "Not yet detected by App Governance — unique skill finding" |
| App in Q8 ONLY (existing detections, not in Phase 1) | App may not have dangerous Graph perms but has suspicious behavior | 🔵 Include in appendix: "Additional apps flagged by App Governance (not in dangerous-perms scope)" |
| App with DetectionBreadth ≥ 2 | Multiple Defender products independently detected the app | 🔴 Highest confidence finding |
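The first three scenarios in the table reduce to set operations over app names. A minimal sketch follows, assuming both inputs are plain sets of display names that match exactly. The function and bucket names are illustrative.

```python
def cross_reference(phase1_apps: set[str], q8_apps: set[str]) -> dict[str, list[str]]:
    """Bucket apps by which detection layer flagged them (hypothetical helper)."""
    return {
        # In both: corroborated by existing Defender detections
        "corroborated": sorted(phase1_apps & q8_apps),
        # Phase 1 only: dangerous permissions, no existing alert
        "skill_only": sorted(phase1_apps - q8_apps),
        # Q8 only: alerted, but outside the dangerous-permissions scope
        "defender_only": sorted(q8_apps - phase1_apps),
    }
```

Display names are not guaranteed unique, so where an OAuthAppId is available it is likely a more reliable key than the name.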
Triage Priority:
- 🔴 Critical: DetectionBreadth ≥ 2 AND app also in Phase 1 flagged list → multi-source confirmed threat
- 🔴 Critical: Any alert titled "Malicious OAuth application registration by a compromised user" (attack disruption) → Defender XDR auto-disrupted the attack
- 🟠 High: App Governance Overprivileged app or New highly privileged app alerts on Phase 1 flagged apps
- 🟡 Medium: App Governance hygiene alerts (Expiring credentials, Unused app) on any app
Output Modes
Mode 1: Inline Chat Summary
Render the full analysis directly in the chat response. Best for quick review.
Mode 2: Markdown File Report
Save a comprehensive report to disk at:
reports/app-registration-posture/App_Registration_Posture_Report_{tenant}_YYYYMMDD_HHMMSS.md
Where {tenant} is a short identifier for the tenant (derive from config.json or ask the user).
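The file name pattern can be produced with a short helper. This is a sketch: the function name is hypothetical, and replacing characters outside letters, digits, underscore and hyphen in the tenant identifier is an added assumption to keep the path safe.

```python
import re
from datetime import datetime


def report_path(tenant: str, now: datetime) -> str:
    """Build the Mode 2 report path for a tenant identifier and timestamp."""
    # Assumption: collapse anything that is not filename-safe into a hyphen.
    safe_tenant = re.sub(r"[^A-Za-z0-9_-]+", "-", tenant).strip("-")
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return (
        "reports/app-registration-posture/"
        f"App_Registration_Posture_Report_{safe_tenant}_{stamp}.md"
    )
```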
Mode 3: Both
Generate the markdown file AND provide an inline summary in chat.
Always ask the user which mode before generating output.
Inline Report Template
Render the following sections in order. Omit sections only if explicitly noted as conditional.
🔴 URL Rule: All hyperlinks in the report MUST be copied verbatim from the URL Registry above. Do NOT generate, recall from memory, or paraphrase any URL. If a needed URL is not in the registry, use plain text (no hyperlink).
# 🔐 App Registration Security Posture Report
**Generated:** YYYY-MM-DD HH:MM UTC
**Data Sources:** Graph API + Advanced Hunting (AuditLogs, AADServicePrincipalSignInLogs, AADUserRiskEvents, MicrosoftGraphActivityLogs, AlertInfo, AlertEvidence)
**KQL Lookback:** <N> days (Q1–Q7); 90 days (Q8)
**Tenant:** <tenant name> (<tenant ID>)
---
## Executive Summary
<2-3 sentences: total apps with Graph permissions, apps with dangerous permissions, key chain detection findings, overall score>
**Overall Risk Rating:** 🔴/🟠/🟡/✅ <RATING> (<Score>/100)
---
## Key Metrics
| Metric | Value |
|--------|-------|
| Apps with Graph API Permissions | <N> |
| Apps with Dangerous Permissions | <N> |
| Critical Permission Grants (🔴) | <N> |
| High Permission Grants (🟠) | <N> |
| Medium Permission Grants (🟡) | <N> |
| Ownerless Apps with Dangerous Perms | <N> |
| Apps with No Local Application Object | <N> |
| Cross-Tenant SPNs | <N> |
| Active Abuse Chain Detections (Q1–Q8) | <N total hits> |
---
## 🔐 Permission Inventory (Graph API)
### Apps with Dangerous Permissions
| App Name | Dangerous Permissions | Risk Level | Grant Dates |
|----------|----------------------|------------|-------------|
| <app> | <perm1>, <perm2>, ... | 🔴/🟠/🟡 | <dates> |
### Permission Concentration
| Permission | Apps Granted | Risk |
|------------|-------------|------|
| <perm> | <N> (<app names>) | 🔴/🟠/🟡 |
**Assessment:**
- <emoji> <evidence-based finding about permission concentration>
- <emoji> <finding about golden ticket permissions (AppRoleAssignment.ReadWrite.All)>
---
## 👤 Owner Risk Assessment
### Flagged App Owners
> **Non-optional columns:** The `Identity Protection Risk` column MUST always be present. For each owner, check Q1 results or query AADUserRiskEvents for active risk state. If no risk events exist, show "✅ None". Never drop this column.
| App Name | Owner | Owner Roles | Identity Protection Risk | Owner Risk |
|----------|-------|-------------|--------------------------|------------|
| <app> | <upn> | <roles or "None (standard user)"> | <risk state + risk types, or "✅ None"> | 🔴/🟠/🟡/🟢 |
### Ownerless Apps with Dangerous Permissions
| App Name | Dangerous Permissions | Creator (from AuditLogs) |
|----------|----------------------|--------------------------|
| <app> | <perms> | <creator UPN or "Unknown"> |
**Assessment:**
- <emoji> <finding about non-admin owners on critical-permission apps>
- <emoji> <finding about ownerless apps>
---
## 🔑 Credential Hygiene
| App Name | Active Secrets | Active Certs | Oldest Secret Age | Longest Expiry | Risk |
|----------|---------------|-------------|-------------------|----------------|------|
| <app> | <N> | <N> | <days> | <date> | 🔴/🟠/🟡/🟢 |
**Assessment:**
- <emoji> <finding about multi-credential apps>
- <emoji> <finding about long-lived secrets>
- 🟡 **Dormant privileged apps:** List any apps with dangerous permissions but NO active credentials (0 secrets, 0 valid certs). These are one `Add service principal credentials` operation away from active abuse — rate as 🟡 at assessment level (not 🟢). Example: "Contoso employee onboarding has `User.ReadWrite.All` but no credentials — dormant risk."
---
## 🌐 Cross-Tenant SPN Exposure (Q4)
<If Q4 returns results:>
| SPN Name | Owner Tenant | Sign-Ins (30d) | Distinct IPs | Resources Accessed | Auth Methods | Locations | First Seen | Last Seen |
|----------|-------------|----------------|-------------|-------------------|-------------|-----------|------------|-----------|
| <name> | <tenant ID> | <N> | <N> | <resources> | <methods> | <locations> | <date> | <date> |
> **Auth method note:** `clientAssertion` (certificate-based) indicates higher attacker sophistication than `clientSecret`. Both present on a single SPN may indicate migration or redundant credential paths.
<If Q4 enhancement returns new SPNs:>
⚠️ **New Cross-Tenant SPNs (first seen in last 7 days):**
| SPN Name | Owner Tenant |
|----------|-------------|
| <name> | <tenant ID> |
<If Q4 returns 0:>
✅ No cross-tenant SPN sign-ins detected in the last <N> days.
**Assessment:**
- <emoji> <finding about foreign-tenant SPNs with golden ticket or CA policy write permissions>
- <emoji> <finding about sign-in volume and resource breadth>
- 🔵 Filter out known [first-party Microsoft service SPNs](https://learn.microsoft.com/en-us/troubleshoot/entra/entra-id/governance/verify-first-party-apps-sign-in) — normal behavior.
---
## ⚡ Active Abuse Chain Detection (Q1–Q3, Q5–Q8)
> **Note:** Q4 (Cross-Tenant SPNs) is presented in its own section above since it doubles as both a chain detection and a posture finding.
> **Bulk-pattern collapse rule:** When any chain query (Q1–Q8) returns >10 chains where >80% share the same actor AND the same pattern (uniform resource, timing, app naming convention), collapse into a single **"Automated Pipeline"** summary row with the total count and a governance-review flag. Only table the outliers individually. This prevents automation noise from burying genuine attack chains.
### Q1: Risky User → App Operations
<If Q1 returns results, always start with a rollup summary table:>
**Summary:**
| Priority | Chains | Users | Key Finding |
|----------|--------|-------|-------------|
| 🔴 Critical | <N> | <users> | <top finding — e.g., adminConfirmedUserCompromised → app consent> |
| 🟠 High | <N> | <users> | <summary> |
| 🟡 Low | <N> | <users> | <summary or "consent-flagging-consent loops"> |
<Then detail tables for 🔴 Critical and 🟠 High chains only. Collapse 🟡 Low into the summary.>
| Risk Detected | App Operation | Hours Gap | User | Risk Types | Risk Level | Target App |
|--------------|---------------|-----------|------|------------|------------|------------|
| <date> | <date> | <N> | <upn> | <types> | <level> | <app> |
> ⚠️ **Self-referencing note:** If Q1 results are dominated by `suspiciousAuthAppApproval` risk types, these may be self-referencing — Identity Protection flags consent operations as risky, which then correlates back to the same consent. Report both the raw count and a filtered count (`| where RiskTypes !has "suspiciousAuthAppApproval"`) to distinguish genuine compromise signals from circular detections.
<If Q1 returns 0:>
✅ No risky-user → app-operations chains detected.
### Q2: Credential Add → SPN Activation
<If Q2 returns results:>
| Cred Added | First SPN Sign-In | Hours to Activation | Actor | App | Distinct IPs | Resources |
|------------|-------------------|---------------------|-------|-----|-------------|-----------|
| <date> | <date> | <N> | <upn> | <app> | <N> | <resources> |
<If Q2 returns 0:>
✅ No credential-add → SPN-activation chains detected.
### Q3: Ownership → Credential Chain
<If Q3 returns results:>
| Owner Added | Cred Operation | Hours Gap | New Owner | Same Actor? | App |
|-------------|---------------|-----------|-----------|-------------|-----|
| <date> | <date> | <N> | <upn> | <yes/no> | <app> |
<If Q3 returns 0:>
✅ No ownership → credential modification chains detected.
### Q5: Credential Add → Graph API Lateral Movement
<If Q5 returns results:>
| Cred Added | Actor | App | Endpoint Category | Graph Calls | Methods | Success Rate |
|------------|-------|-----|-------------------|-------------|---------|-------------|
| <date> | <upn> | <app> | <category> | <N> | <methods> | <pct>% |
<If Q5 returns 0:>
✅ No credential-add → Graph API lateral movement chains detected.
> **Note:** MicrosoftGraphActivityLogs requires Entra ID P1/P2 + diagnostic settings. If table not found, report as: `❓ MicrosoftGraphActivityLogs not available — cannot assess Graph API lateral movement.`
### Q6: Credential Add → Permission Escalation
<If Q6 returns results:>
| Cred Added | Perm Escalation | Hours Gap | Actor | App | Escalation Operation |
|------------|----------------|-----------|-------|-----|---------------------|
| <date> | <date> | <N> | <upn> | <app> | <operation> |
<If Q6 returns 0:>
✅ No credential-add → permission-escalation chains detected.
### Q7: Multi-App Ownership Spread
<If Q7 returns results:>
| User | Apps Owned | Spread Window (hrs) | App Names | Added By |
|------|-----------|---------------------|-----------|----------|
| <upn> | <N> | <N> | <names> | <actors> |
<If Q7 returns 0:>
✅ No multi-app ownership spread detected (threshold: ≥3 apps).
### Q8: App Governance & OAuth Incident Cross-Reference
> **Purpose:** Cross-reference Phase 1 flagged apps with existing Microsoft detections (App Governance alerts, Defender XDR OAuth alerts, Attack Disruption incidents). This validates skill findings against Microsoft's own detection coverage and surfaces apps with multi-source detections.
<If Q8 returns results:>
**Detection Summary:**
| App Name | App ID | Alert Count | Detection Sources | Detection Breadth | Highest Severity | Has Attack Disruption |
|----------|--------|-------------|-------------------|-------------------|------------------|-----------------------|
| <name> | <id> | <N> | <sources> | <N> | <severity> | ✅/❌ |
**Cross-Reference with Phase 1:**
- 🔴 **Both skill and Microsoft flagged:** <list apps found in BOTH Phase 1 dangerous-permission inventory AND Q8 detections — these are confirmed high-priority>
- 🟠 **Skill-only (no Microsoft detection):** <list apps from Phase 1 that Q8 did NOT detect — skill's unique value-add, may indicate detection gap in App Governance>
- 🔵 **Microsoft-only (not in skill scope):** <list apps from Q8 that are NOT in Phase 1 — may not have dangerous permissions but triggered behavioral alerts>
<If Q8 returns 0:>
✅ No App Governance, OAuth, or Attack Disruption alerts detected for any apps in the last 90 days.
---
## App Permission Risk Score Card
```
┌──────────────────────────────────────────────────────────────┐
│ APP PERMISSION RISK SCORE: <NN>/100 │
│ Rating: <EMOJI> <RATING> │
├──────────────────────────────────────────────────────────────┤
│ Perm Concentration [<bar>] <N>/20 (<detail>) │
│ Owner Risk [<bar>] <N>/20 (<detail>) │
│ Credential Hygiene [<bar>] <N>/20 (<detail>) │
│ Cross-Tenant Exp. [<bar>] <N>/20 (<detail>) │
│ Active Abuse Sigs [<bar>] <N>/20 (<detail>) │
└──────────────────────────────────────────────────────────────┘
```
### Dimension Details
| Dimension | Score | Evidence |
|-----------|-------|----------|
| **Permission Concentration** | 🔴/🟠/🟡 <N>/20 | <N> apps with dangerous perms; list golden ticket / critical perms found |
| **Owner Risk** | 🔴/🟠/🟡 <N>/20 | <N> ownerless apps; non-admin owners on critical apps; Identity Protection signals |
| **Credential Hygiene** | 🔴/🟠/🟡 <N>/20 | Multi-secret apps; stale credentials; dormant privileged apps |
| **Cross-Tenant Exposure** | 🔴/🟠/🟡 <N>/20 | Foreign SPNs with critical perms; unknown tenant IDs; resource breadth |
| **Active Abuse Signals** | 🔴/🟠/🟡 <N>/20 | Which chain queries (Q1–Q8) returned critical results; key actors; Q8 detection breadth |
---
## Security Assessment
| Factor | Finding |
|--------|---------|
| <emoji> **<Factor>** | <Evidence-based finding> |
---
## Recommendations
> **Key context:** This skill detects signals that [Microsoft App Governance](https://learn.microsoft.com/en-us/defender-cloud-apps/app-governance-manage-app-governance) does NOT — specifically the cross-table correlation between user compromise signals and app abuse chains. Recommendations should complement App Governance, not duplicate it.
**Minimum recommendation checklist** — include ALL applicable items (skip only if the finding doesn't exist in the data). Order by severity (🔴 first):
| # | Must-Include Topic | When Applicable |
|---|-------------------|------------------|
| a | **Golden ticket / critical cross-tenant SPN remediation** | Any foreign SPN with `AppRoleAssignment.ReadWrite.All` or `Directory.ReadWrite.All` |
| b | **Compromised-user consent investigation** | Q1 returns `adminConfirmedUserCompromised` or `confirmedCompromised` chains |
| c | **Owner assignment for ownerless dangerous apps** | Any ownerless app with dangerous perms |
| d | **Stale credential rotation** | Any secret >365 days old on an app with dangerous perms |
| e | **Multi-credential reduction** | Any app with ≥3 active secrets |
| f | **Non-admin owner risk mitigation** | Non-admin user owns app with 🔴-level perms |
| g | **Single-user blast radius reduction** | Any user owns ≥20 apps (pipeline or otherwise) |
| h | **Dormant privileged app disposition** | App with dangerous perms but no credentials |
| i | **Expired-credential permission cleanup** | App with expired creds that still retains dangerous permission grants |
| j | **App Governance enablement** | Always include if not already deployed (standard closing recommendation) |
1. <emoji> **<Priority action>** — <evidence and rationale>
2. ...
---
## Related Workspace Resources
| Resource | Relationship |
|----------|-------------|
| `queries/identity/app_credential_management.md` | Individual event queries — complements chain detections |
| `queries/identity/service_principal_scope_drift.md` | SPN behavioral baseline — use for post-detection deep dive |
| `.github/skills/scope-drift-detection/spn/SKILL.md` | Full SPN investigation workflow — run on SPNs flagged by Q2 |
| `queries/cloud/behavior_entities.md` Q6 | MCAS `UnusualAdditionOfCredentialsToAnOauthApp` detection |
---
## Appendix: Query Execution Summary
| Phase | Query | Description | Records |
|-------|-------|-------------|--------|
| 1 | P1 | Find Graph SP ID | 1 |
| 1 | P2 | List permission grants | <N> |
| 1 | P3 | Resolve permission names | <N> |
| 1 | P4 | Filter dangerous perms | <N> |
| 1 | P5 | Resolve owners | <N> apps |
| 1 | P6 | Assess owner risk | <N> owners |
| 1 | P7 | Credential hygiene | <N> apps |
| 2 | Q1 | Risky User → App Ops | <N> |
| 2 | Q2 | Cred → SPN Activation | <N> |
| 2 | Q3 | Ownership → Credential | <N> |
| 2 | Q4 | Cross-Tenant SPNs | <N> |
| 2 | Q5 | Cred → Graph API | <N> |
| 2 | Q6 | Cred → Permission Esc. | <N> |
| 2 | Q7 | Ownership Spread | <N> |
| 2 | Q8 | App Gov & OAuth Cross-Ref | <N> |
Markdown File Report Template
When outputting to markdown file, use the same structure as the Inline Report Template above, saved to:
reports/app-registration-posture/App_Registration_Posture_Report_{tenant}_YYYYMMDD_HHMMSS.md
Include the following additional sections in the file report that are omitted from inline:
- Full permission grant table (all apps with Graph permissions, not just dangerous ones)
- Complete owner listing (all owners for all flagged apps, including creator fallback from AuditLogs)
- Credential detail table (full `passwordCredentials` and `keyCredentials` with expiry dates)
- Cross-tenant SPN detail (full resource access breakdown per foreign SPN)
- Raw Q1–Q8 results (full chain detection output, not summarized)
- MITRE ATT&CK mapping table (techniques detected vs not detected)
File Report Header
# App Registration Security Posture Report
**Generated:** YYYY-MM-DD HH:MM UTC
**Data Sources:** Graph API + Advanced Hunting (AuditLogs, AADServicePrincipalSignInLogs, AADUserRiskEvents, MicrosoftGraphActivityLogs, AlertInfo, AlertEvidence)
**KQL Lookback:** <N> days (Q1–Q7); 90 days (Q8)
**Tenant:** <tenant name> (<tenant ID>)
**Apps with Graph Permissions:** <N>
**Apps with Dangerous Permissions:** <N>
**Cross-Tenant SPNs:** <N>
**Chain Detections (Q1–Q8):** <N total hits>
---
File Report Differences from Inline
The file report uses the same inline template structure with these additions:
- Q1–Q8 chain sections: Include ALL result rows (inline collapses 🟡 Low into the summary)
- Cross-Tenant SPN Exposure table: Add `Auth Methods` and `Locations` columns (inline may abbreviate)
- Credential Hygiene table: Add `Application Object` column (✅ Exists / ❌ No local object)
- Dimension Details table: Always included (inline may omit if score is low)
- Dormant privileged apps callout: Include in credential hygiene section even for 🟢 apps
Known Pitfalls
1. Application ObjectId ≠ ServicePrincipal ObjectId
Problem: The same app has different GUIDs in TargetResources[0].id depending on the AuditLog operation type. Credential operations reference the Application ObjectId; permission/consent operations reference the ServicePrincipal ObjectId.
Impact: Joining credential events to permission events on TargetResources[0].id returns zero results even when both operations target the same app.
Solution: Q6 joins on Actor + TargetAppName (display name match) instead of ObjectId. This works reliably for same-actor chains.
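The join choice can be illustrated outside KQL with a minimal Python sketch. The event dictionaries and their keys (`actor`, `app_name`, `object_id`) are hypothetical stand-ins for parsed AuditLogs rows, not the actual schema or Q6's implementation.

```python
from collections import defaultdict


def join_on_actor_and_app(cred_events, perm_events):
    """Pair credential and permission events sharing the same actor and app display name."""
    perms_by_key = defaultdict(list)
    for event in perm_events:
        perms_by_key[(event["actor"].lower(), event["app_name"].lower())].append(event)

    pairs = []
    for cred in cred_events:
        key = (cred["actor"].lower(), cred["app_name"].lower())
        for perm in perms_by_key.get(key, []):
            pairs.append((cred, perm))
    return pairs


# Fabricated sample: same app, but the two operation types carry different GUIDs.
cred_events = [
    {"actor": "admin@contoso.com", "app_name": "HR Sync", "object_id": "application-object-guid"}
]
perm_events = [
    {"actor": "admin@contoso.com", "app_name": "HR Sync", "object_id": "service-principal-object-guid"}
]

# Joining on the object ID finds nothing; joining on actor + display name finds the chain.
pairs_by_object_id = [
    (c, p) for c in cred_events for p in perm_events if c["object_id"] == p["object_id"]
]
pairs_by_name = join_on_actor_and_app(cred_events, perm_events)
```

Display names are not unique, so a name-based join can over-match when two apps share a name; treat the result as a lead to confirm, not as proof.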
2. Ownership Operations — Target Name in modifiedProperties
Problem: For "Add owner to application", TargetResources[0] is the new owner (User type), not the app. The app name is buried in TargetResources[0].modifiedProperties[1].newValue.
Solution: Extract with tostring(parse_json(tostring(ModProps[1].newValue))). Field name is Application.DisplayName.
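For post-processing exported audit rows outside KQL, the same extraction might look like the sketch below. It assumes `newValue` arrives as a JSON-encoded string (which is why the KQL wraps it in `parse_json`); the sample record, including the entry at index 0, is fabricated for illustration.

```python
import json


def app_name_from_owner_event(target_resource):
    """Return the app display name from an 'Add owner to application' target resource."""
    raw = target_resource["modifiedProperties"][1]["newValue"]
    try:
        # newValue is assumed to be a JSON-encoded string such as "\"Contoso HR Sync\"".
        return str(json.loads(raw))
    except (TypeError, ValueError):
        # Fall back to the raw value if it is not JSON-encoded.
        return str(raw)


# Fabricated sample: TargetResources[0] is the new owner (User type), not the app.
sample_target = {
    "type": "User",
    "modifiedProperties": [
        {"displayName": "Application.ObjectID", "newValue": "\"00000000-0000-0000-0000-000000000000\""},
        {"displayName": "Application.DisplayName", "newValue": "\"Contoso HR Sync\""},
    ],
}

app_name = app_name_from_owner_event(sample_target)
```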
3. OperationName Trailing Spaces
Problem: "Update application – Certificates and secrets management " has a trailing space. String equality (==) fails without it.
Solution: Use in~() with the exact string (including trailing space) or use has for substring matching.
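The failure mode is easy to reproduce in any language. The Python sketch below shows why an exact comparison against a trimmed string misses the event, and how a substring check (loosely analogous to KQL `has`) sidesteps it; the helper name is hypothetical.

```python
# The operation name as logged, including its trailing space.
OPERATION = "Update application – Certificates and secrets management "


def is_credential_update(operation_name):
    """Substring match that tolerates leading or trailing whitespace."""
    return "Certificates and secrets management" in operation_name


# Exact equality against a trimmed copy fails.
matches_trimmed = OPERATION == OPERATION.strip()
# The substring check succeeds either way.
matches_substring = is_credential_update(OPERATION) and is_credential_update(OPERATION.strip())
```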
4. Cross-Tenant SPNs Have No Local Application Object
Problem: Graph API calls to /v1.0/applications?$filter=displayName eq 'X' return empty for SPNs owned by foreign tenants — they only have a ServicePrincipal object in your tenant, not an Application object.
Impact: Cannot retrieve ownership or credential details for cross-tenant SPNs via local Graph API.
Solution: Identify cross-tenant SPNs via Q4 (AppOwnerTenantId != AADTenantId). Report them separately with a note that ownership is managed by the foreign tenant.
5. Graph API requiredResourceAccess ≠ Granted Permissions
Problem: The Application object's requiredResourceAccess shows what the app requests (manifest), not what's been admin-consented/granted.
Solution: Always use appRoleAssignedTo on the resource service principal (Step P2) for the authoritative granted permissions list.
6. Red Team Apps May Have Owners Stripped
Problem: Attack simulation tools often remove app ownership post-creation to evade detection. Graph API returns no owners.
Solution: Fall back to AuditLogs "Add application" OperationName to find the original creator. AuditLogs keep the InitiatedBy actor even after ownership is stripped, but only while the creation event is still inside the log retention window.
7. MicrosoftGraphActivityLogs May Not Be Available
Problem: Q5 requires MicrosoftGraphActivityLogs, which needs Entra ID P1/P2 and diagnostic settings to be enabled. Not all tenants have this.
Impact: If the table doesn't exist, Q5 returns an error.
Solution: If Q5 fails with "table not found", report as ❓ MicrosoftGraphActivityLogs not available and skip — do not fail the entire assessment. The other 7 chain queries and Graph API posture still provide substantial coverage.
8. suspiciousAuthAppApproval Self-Referencing in Q1
Problem: When a consent grant occurs, Identity Protection may flag the same event as a suspiciousAuthAppApproval risk detection. Q1 then correlates the risk event WITH the consent operation, creating a circular detection.
Solution: If Q1 results are dominated by suspiciousAuthAppApproval risk types, note in the report that these may be self-referencing. The user can filter with | where RiskTypes !has "suspiciousAuthAppApproval" for higher-confidence chains.
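Reporting both counts can be sketched in Python as follows. The chain records and the `risk_types` key are hypothetical; in the real workflow the filter is the KQL `!has` clause quoted above.

```python
def split_self_referencing(chains, noisy_type="suspiciousAuthAppApproval"):
    """Return (raw_count, filtered_count, filtered_chains) for Q1-style results."""
    filtered = [chain for chain in chains if noisy_type not in chain["risk_types"]]
    return len(chains), len(filtered), filtered


# Fabricated sample: two circular detections and one genuine compromise signal.
q1_chains = [
    {"user": "a@contoso.com", "risk_types": ["suspiciousAuthAppApproval"]},
    {"user": "b@contoso.com", "risk_types": ["suspiciousAuthAppApproval"]},
    {"user": "c@contoso.com", "risk_types": ["adminConfirmedUserCompromised"]},
]

raw_count, filtered_count, high_confidence = split_self_referencing(q1_chains)
```

Note that this drops a chain whenever the noisy type appears at all, even alongside other risk types, which mirrors the `!has` filter; a stricter variant could keep chains that also carry an independent risk type.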
9. Conflating Delegated AllPrincipals Consent with Application Permission Risk
Problem: When auditing tenant permissions (e.g., Get-MgOauth2PermissionGrant -Filter "consentType eq 'AllPrincipals'"), the returned delegated scopes can look alarming — 100+ scopes on a single app. It is tempting to rate these at the same severity as application permissions.
Why this is wrong: Delegated permissions operate as the intersection of the app's consented scopes and the signed-in user's Entra roles. A standard user cannot exploit broad delegated consent beyond their own role boundaries. The consent only removes the per-user prompt — it does not elevate privilege.
Solution: See Delegated vs Application Permissions — Risk Model. When this skill's analysis overlaps with a separate delegated consent audit, always clarify which permission type is being discussed. Application permissions (from P2/appRoleAssignedTo) are the primary risk. Delegated AllPrincipals consents are a secondary concern relevant mainly to privileged admin account compromise scenarios.
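The intersection model can be made concrete with a small Python sketch. It simplifies heavily by treating the user's role-derived access as a flat set of scope names; that representation and the function name are assumptions for illustration only.

```python
def effective_delegated_access(consented_scopes, user_allowed_scopes):
    """Delegated access is bounded by both the app's consent and the user's own rights."""
    return set(consented_scopes) & set(user_allowed_scopes)


# Fabricated sample: a broad AllPrincipals consent on one app.
app_consent = {"User.Read", "Mail.Read", "Directory.ReadWrite.All", "Group.ReadWrite.All"}

standard_user = {"User.Read", "Mail.Read"}
privileged_admin = {"User.Read", "Mail.Read", "Directory.ReadWrite.All", "Group.ReadWrite.All"}

standard_effective = effective_delegated_access(app_consent, standard_user)
admin_effective = effective_delegated_access(app_consent, privileged_admin)
```

The same consent yields a narrow result for the standard user and the full scope set for the admin, which is why broad delegated consent matters mainly when a privileged account is compromised.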
Quality Checklist
Before delivering the report, verify:
- Phase 1 (Graph API) completed: P1–P7 steps executed
- Phase 2 (KQL) completed: Q1–Q8 all executed via `RunAdvancedHuntingQuery`
- Zero-result queries are reported with explicit absence confirmation (✅ pattern)
- Graph API used `appRoleAssignedTo` (NOT `requiredResourceAccess`) for permission inventory
- App ownership queried from Application object (NOT ServicePrincipal)
- Cross-tenant SPNs reported separately with foreign-tenant note
- The App Permission Risk Score calculation is transparent with per-dimension evidence
- Permission inventory includes human-readable names (not just GUIDs)
- Owner risk assessment includes directory role check + Identity Protection status
- Credential hygiene includes expiry dates, not just counts
- Chain detection results include triage priority (🔴/🟠/🟡) for each finding
- Chain detection consent findings distinguish application permission grants (🔴) from delegated consent grants (🟠/🟡) — see Delegated vs Application Permissions — Risk Model
- Q8 cross-reference includes three-way breakdown (both flagged, skill-only, Microsoft-only)
- Recommendations complement (not duplicate) App Governance capabilities
- All hyperlinks copied verbatim from URL Registry — no fabricated URLs
- No PII from live environments in the SKILL.md file itself
- Total elapsed time reported
SVG Dashboard Generation
📊 Optional post-report step. After an App Registration Posture report is generated, the user can request a visual SVG dashboard.
Trigger phrases: "generate SVG dashboard", "create a visual dashboard", "visualize this report", "SVG from the report"
How to Request a Dashboard
- Same chat: "Generate an SVG dashboard from the report" — data is already in context.
- New chat: Attach or reference the report file, e.g. `#file:reports/app-registration-posture/App_Registration_Posture_Report_<tenant>_<date>.md`
- Customization: Create an `svg-widgets.yaml` in this skill folder before requesting; the renderer reads it at generation time.
Execution
Step 1: Read svg-widgets.yaml (this skill's widget manifest, if it exists)
Step 2: Read .github/skills/svg-dashboard/SKILL.md (rendering rules — Manifest Mode if yaml exists, Freeform Mode otherwise)
Step 3: Read the completed report file (data source)
Step 4: Render SVG → save to reports/app-registration-posture/{report_name}_dashboard.svg
.github/skills/authentication-tracing/SKILL.md
npx skills add SCStelz/security-investigator --skill authentication-tracing -g -y
SKILL.md
Frontmatter
{
"name": "authentication-tracing",
"description": "Use this skill when asked to trace authentication flows, analyze SessionId chains, investigate token reuse vs interactive MFA, or assess geographic anomalies in sign-ins. Triggers on keywords like \"trace authentication\", \"trace back to interactive MFA\", \"SessionId analysis\", \"token reuse\", \"geographic anomaly\", \"impossible travel\", or when investigating suspicious sign-in locations. This skill provides forensic analysis of Entra ID authentication chains to distinguish legitimate activity from credential\/token theft.",
"drill_down_prompt": "Trace authentication chain for {entity} — SessionId analysis, token reuse, geographic anomalies",
"threat_pulse_domains": [
"identity"
]
}
Authentication Tracing - Instructions
Purpose
This skill performs forensic analysis of Entra ID authentication flows to determine whether anomalous sign-ins represent:
- Legitimate activity (VPN usage, user travel, mobile carrier routing)
- Token theft/credential compromise (stolen refresh tokens, session hijacking)
The key distinction is whether the user actively performed MFA at a suspicious location or if the authentication used a refresh token from a prior session.
📑 TABLE OF CONTENTS
- Critical Workflow Rules - Start here!
- Key Forensic Indicators - Understanding authentication signals
- IP Enrichment Data - JSON structure reference
- 6-Step Forensic Workflow - SessionId-based investigation
- Real-World Example - Complete walkthrough
- Authentication Methods Reference - Method patterns
- Risk Escalation Criteria - High/Medium/Low classification
- Best Practices - Summary checklist
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
🚨 MANDATORY CHECKPOINT: Before providing ANY risk assessment for authentication anomalies:
- STOP - Do not improvise or use general security knowledge
- READ the complete risk assessment framework in this document
- QUOTE specific instruction sections in your analysis
- VERIFY your conclusions match documented guidance before responding to user
Before executing ANY authentication tracing queries, you MUST:
- Read the SessionId-based workflow (Steps 1-6 below) in full
- Search the investigation JSON for IP enrichment data (`ip_enrichment` array) - PRIMARY DATA SOURCE
- Follow the documented steps in order (SessionId → Authentication chain → Interactive MFA → Risk assessment)
- Use IP enrichment context in your final risk assessment (VPN status, abuse scores, threat intel, auth patterns)
Skipping these steps will result in incomplete or incorrect analysis.
Key Forensic Indicators
When investigating anomalous sign-ins (e.g., from new countries, IPs, or devices), it's critical to determine whether the user actively performed MFA at that location or if the authentication used a refresh token from a prior session.
RequestSequence Field
| Value | Meaning | Implication |
|---|---|---|
| `RequestSequence: 1` or higher | Interactive authentication | User was challenged and responded |
| `RequestSequence: 0` | Token-based authentication | No user interaction required |
AuthenticationDetails Array Patterns
Interactive Pattern:
- Array contains authentication method (e.g., "Passkey (device-bound)") with `RequestSequence > 0`
- Followed by "Previously satisfied" entry
Token Reuse Pattern:
- Array contains ONLY "Previously satisfied" entries
- Shows "MFA requirement satisfied by claim in the token"
authenticationStepDateTime Correlation
- If `authenticationStepDateTime` references a time when NO interactive auth occurred, it indicates token reuse
- Cross-reference timestamps with events that have `RequestSequence > 0` to trace token origin
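The two array patterns above can be sketched as a small Python classifier. It assumes each `AuthenticationDetails` entry has been parsed into a dictionary with `authenticationMethod` and `RequestSequence` keys; the function name and return labels are hypothetical.

```python
def classify_auth_details(auth_details):
    """Return 'interactive', 'token_reuse', or 'unknown' for one sign-in event."""
    if not auth_details:
        return "unknown"
    # Any step with RequestSequence > 0 means the user was challenged and responded.
    if any(int(step.get("RequestSequence") or 0) > 0 for step in auth_details):
        return "interactive"
    # Only "Previously satisfied" entries means the MFA claim came from an existing token.
    if all(step.get("authenticationMethod") == "Previously satisfied" for step in auth_details):
        return "token_reuse"
    return "unknown"


# Fabricated samples matching the two documented patterns.
interactive_event = [
    {"authenticationMethod": "Passkey (device-bound)", "RequestSequence": 1},
    {"authenticationMethod": "Previously satisfied", "RequestSequence": 0},
]
token_event = [
    {"authenticationMethod": "Previously satisfied", "RequestSequence": 0},
]

interactive_label = classify_auth_details(interactive_event)
token_label = classify_auth_details(token_event)
```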
IP Enrichment Data Structure (PRIMARY EVIDENCE SOURCE)
CRITICAL: The investigation JSON contains a comprehensive ip_enrichment array with authoritative detection flags.
Always reference this data FIRST before making VPN/proxy/Tor determinations.
Example IP Enrichment Entry (Actual JSON Structure)
{
"ip": "203.0.113.42", // ← KEY: Use "ip" field, not "ip_address"
"city": "Singapore",
"region": "Singapore",
"country": "SG",
"org": "AS12345 Example Hosting Ltd",
"asn": "AS12345",
"timezone": "Asia/Singapore",
"risk_level": "HIGH", // ← Overall risk assessment (LOW/MEDIUM/HIGH)
"assessment": "⚠️ Threat Intelligence Match: Commercial VPN Service Detected",
"is_vpn": true, // ← PRIMARY VPN DETECTION FLAG (ipinfo.io detection)
"is_proxy": false, // ← PRIMARY PROXY DETECTION FLAG
"is_tor": false, // ← PRIMARY TOR DETECTION FLAG
"abuse_confidence_score": 0, // ← AbuseIPDB score 0-100 (0=clean, 75+=high risk)
"total_reports": 2, // ← Number of abuse reports in AbuseIPDB
"is_whitelisted": false,
"threat_description": "Commercial VPN Service: Known Anonymization Infrastructure",
"anomaly_type": "NewInteractiveIP",
"first_seen": "2025-10-16", // ← First sign-in from this IP (date string)
"last_seen": "2025-10-16", // ← Last sign-in from this IP (date string)
"hit_count": 5, // ← Number of anomaly detections
"signin_count": 8, // ← Total sign-ins from this IP
"success_count": 7, // ← Successful authentications
"failure_count": 1, // ← Failed authentications
"last_auth_result_detail": "MFA requirement satisfied by claim in the token",
"threat_detected": false, // ← Legacy field (use threat_description instead)
"threat_confidence": 0,
"threat_tlp_level": "",
"threat_activity_groups": ""
}
CRITICAL: Always use ip_enrichment[].ip to match IPs, NOT ip_address!
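A minimal Python sketch of the lookup, assuming the investigation JSON has already been loaded into a dictionary. The helper name is hypothetical; the `ip_enrichment` and `ip` keys come from the structure shown above.

```python
def find_enrichment(investigation, ip):
    """Return the ip_enrichment entry for an IP, matching on the 'ip' key."""
    for entry in investigation.get("ip_enrichment", []):
        if entry.get("ip") == ip:
            return entry
    return None


# Fabricated sample trimmed to a few of the documented fields.
investigation = {
    "ip_enrichment": [
        {"ip": "203.0.113.42", "country": "SG", "is_vpn": True, "abuse_confidence_score": 0},
    ]
}

match = find_enrichment(investigation, "203.0.113.42")
missing = find_enrichment(investigation, "198.51.100.7")
```

Returning `None` for an unenriched IP keeps "no enrichment available" distinct from "enriched and clean".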
Key Fields for Analysis
| Field | Purpose | Usage Example |
|---|---|---|
| is_vpn | Definitive VPN detection | is_vpn: true → Confirmed VPN endpoint (don't infer, use this flag) |
| is_proxy | Definitive proxy detection | is_proxy: true → Confirmed proxy (anonymized traffic) |
| is_tor | Definitive Tor detection | is_tor: true → Confirmed Tor exit node (high anonymity risk) |
| abuse_confidence_score | AbuseIPDB reputation (0-100) | >= 75 = High risk, >= 25 = Medium risk, 0 = Clean |
| threat_detected | Threat intel match flag | true → IP matches ThreatIntelIndicators table |
| threat_description | Threat intel details | "Surfshark VPN", "Malicious activity detected", etc. |
| org / asn | Network ownership | AS9009 = M247 Europe (VPN infrastructure provider) |
| signin_count | Total sign-ins from IP | High count (>100) = established pattern vs transient |
| last_auth_result_detail | Authentication method | "MFA satisfied by token" vs "Correct password" = token reuse vs interactive |
| first_seen / last_seen | Temporal pattern | Single day = transient, multi-day = established behavior |
Analysis Priority Hierarchy
- IP enrichment flags (`is_vpn`, `is_proxy`, `is_tor`) - Most authoritative source
- Abuse reputation (`abuse_confidence_score`, `total_reports`) - Community-validated risk data
- Threat intelligence (`threat_detected`, `threat_description`) - IOC matches from Sentinel
- Network ownership (`org`, `asn`, `company_type`) - Infrastructure context (hosting, ISP, etc.)
- Authentication patterns (`last_auth_result_detail`, `signin_count`) - Behavioral context
- Identity Protection (risk detections) - Microsoft ML-based risk signals
⚠️ NEVER say "likely VPN" or "probably proxy" if enrichment data has explicit boolean flags!
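One way to keep wording tied to the explicit flags is sketched below in Python. The thresholds come from the Key Fields table; the table does not define scores between 1 and 24, so the `LOW` tier for that range is an assumption, as are the function names.

```python
def anonymizer_labels(entry):
    """Build definitive labels only from explicit boolean flags, never from inference."""
    labels = []
    for flag, label in (("is_vpn", "VPN"), ("is_proxy", "proxy"), ("is_tor", "Tor exit node")):
        if entry.get(flag) is True:
            labels.append(f"confirmed {label}")
    return labels


def abuse_tier(score):
    """Map an AbuseIPDB confidence score (0-100) to a tier."""
    if score >= 75:
        return "HIGH"
    if score >= 25:
        return "MEDIUM"
    if score == 0:
        return "CLEAN"
    return "LOW"  # Assumption: scores 1-24 are not defined in the source table.


# Fabricated sample using the documented field names.
entry = {"ip": "203.0.113.42", "is_vpn": True, "is_proxy": False, "is_tor": False,
         "abuse_confidence_score": 0}

labels = anonymizer_labels(entry)
tier = abuse_tier(entry["abuse_confidence_score"])
```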
Forensic Workflow: Tracing Authentication Chains
Scenario: Anomalous sign-ins detected from new IP/location. Determine if user performed fresh MFA or reused token.
Tool Selection: AH-First, Data Lake Fallback
| Lookback | Tool | Table | Why |
|---|---|---|---|
| ≤ 30 days (default) | `RunAdvancedHuntingQuery` | `EntraIdSignInEvents` | Single table covers interactive + non-interactive. No union needed. Direct columns for Country, City, Browser, UserAgent. Free on Analytics tier. |
| > 30 days | `mcp_sentinel-data_query_lake` | `union SigninLogs, AADNonInteractiveUserSignInLogs` | AH Graph API caps at 30d. Data Lake retains 90d+. See Data Lake Fallback Queries below. |
Column mapping — EntraIdSignInEvents vs SigninLogs:
| EntraIdSignInEvents (AH) | SigninLogs (Data Lake) | Notes |
|---|---|---|
| `Timestamp` | `TimeGenerated` | |
| `AccountUpn` | `UserPrincipalName` | |
| `Application` | `AppDisplayName` | |
| `ErrorCode` (int) | `ResultType` (string) | AH: `ErrorCode == 0`, DL: `ResultType == "0"` |
| `Country`, `City` (direct strings) | `Location` or `parse_json(LocationDetails)` | No parsing needed in AH |
| `LogonType` (JSON array) | Separate tables (SigninLogs vs AADNonInteractive) | AH: `has "interactiveUser"`, DL: check which table |
| `AuthenticationRequirement` | `AuthenticationRequirement` | Same values: `singleFactorAuthentication`, `multiFactorAuthentication` |
| `UserAgent`, `Browser`, `OSPlatform` | `parse_json(DeviceDetail)` | Direct columns in AH |
| `UniqueTokenId` | (not available) | AH-only: token-level forensics |
| `SessionId` | `SessionId` | Same |
| (not available) | `AuthenticationDetails` (JSON array) | DL-only: per-step RequestSequence + authenticationMethod |
Key trade-off: `AuthenticationDetails` (Data Lake only) provides per-step `RequestSequence` and `authenticationMethod` ("Password", "Previously satisfied", "Mobile app notification"). `EntraIdSignInEvents` replaces this with row-level `LogonType` (interactive vs non-interactive) + `AuthenticationRequirement` (singleFactor/multiFactor) + `UniqueTokenId`. Both achieve the same forensic goal (determining interactive MFA vs token reuse) through different signals.
⚠️ Table name casing: Capital I in SignIn — EntraIdSignInEvents, NOT EntraIdSigninEvents.
⚠️ LogonType is a JSON array string (e.g., ["interactiveUser"]). Use has for filtering, NOT ==.
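When query results are post-processed outside KQL, the same caveat applies: the column is a string holding a JSON array, so it needs parsing before membership checks. A minimal Python sketch, with hypothetical helper names:

```python
import json


def logon_types(raw):
    """Parse a LogonType value such as '["interactiveUser"]' into a list of strings."""
    try:
        parsed = json.loads(raw) if raw else []
    except (TypeError, ValueError):
        return []
    if isinstance(parsed, str):
        return [parsed]
    return [str(item) for item in parsed] if isinstance(parsed, list) else []


def is_interactive(raw):
    """True only when the parsed array contains the exact 'interactiveUser' entry."""
    return "interactiveUser" in logon_types(raw)


raw_value = '["interactiveUser"]'
# Comparing the raw string to the bare value fails, just as == does in KQL.
equals_bare_value = raw_value == "interactiveUser"
interactive = is_interactive(raw_value)
non_interactive = is_interactive('["nonInteractiveUser"]')
```

Matching on parsed list entries also avoids accidental hits that a case-insensitive substring search for "interactiveUser" would produce on "nonInteractiveUser".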
CRITICAL: START WITH SessionId - This is Your Primary and Most Efficient Investigation Pattern:
- Query suspicious IP(s) to get SessionId (single query for all suspicious IPs)
- Query SessionId for complete chain — interactive vs non-interactive classification, geographic progression, token tracking
- Find interactive sign-ins to determine where the user (or attacker) authenticated interactively
- Expand date range progressively if needed: investigation window → 7 days → 30 days (AH limit)
- If > 30d lookback required: Switch to Data Lake fallback queries (90d retention)
AVOID chronological searching without SessionId - it requires multiple queries and is less efficient.
Step 1: Get SessionId from Suspicious Authentication (ALWAYS START HERE)
Tool: RunAdvancedHuntingQuery
This single query gives you SessionId AND enough context to determine next steps:
let suspicious_ips = dynamic(["<IP_1>", "<IP_2>"]); // All suspicious IPs
EntraIdSignInEvents
| where Timestamp > ago(30d)
| where AccountUpn =~ '<UPN>'
| where IPAddress in (suspicious_ips)
| project Timestamp, IPAddress, Country, City, Application,
SessionId, LogonType, AuthenticationRequirement,
UserAgent, Browser, OSPlatform, ErrorCode, UniqueTokenId
| order by Timestamp asc
| take 50
What This Returns:
- SessionId(s) for suspicious authentications (your primary key for Step 2)
- Device fingerprint (UserAgent) to check for device consistency
- Application context
- Initial timeline
Critical Decision Point:
- All suspicious IPs share same SessionId? → Session continuity detected → Investigate further (could be legitimate user OR stolen token)
- Different SessionIds across IPs? → Different authentication flows → Investigate device and authentication patterns
- IMPORTANT: SessionId alone does NOT determine legitimacy - must correlate with UserAgent, geography, and behavior patterns
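The decision point can be checked mechanically over the Step 1 rows. The Python sketch below assumes the query output has been loaded as a list of dictionaries keyed by the projected column names; the function names are hypothetical.

```python
from collections import defaultdict


def sessions_by_ip(rows):
    """Map each suspicious IP to the set of SessionIds seen from it."""
    by_ip = defaultdict(set)
    for row in rows:
        by_ip[row["IPAddress"]].add(row["SessionId"])
    return dict(by_ip)


def share_single_session(rows):
    """True when every row belongs to one SessionId (session continuity)."""
    return len({row["SessionId"] for row in rows}) == 1


# Fabricated Step 1 output: two IPs, one shared session.
step1_rows = [
    {"IPAddress": "203.0.113.42", "SessionId": "session-1", "UserAgent": "UA-1"},
    {"IPAddress": "198.51.100.7", "SessionId": "session-1", "UserAgent": "UA-1"},
]

ip_sessions = sessions_by_ip(step1_rows)
continuity = share_single_session(step1_rows)
```

A `True` result only establishes continuity; per the note above, it still has to be correlated with UserAgent, geography, and behavior before drawing any conclusion about legitimacy.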
Step 2: Trace Complete Authentication Chain by SessionId (DEFINITIVE PROOF)
Tool: RunAdvancedHuntingQuery
Once you have SessionId from Step 1, query ALL authentications in that session:
let target_session_id = "<SESSION_ID_FROM_STEP_1>";
EntraIdSignInEvents
| where Timestamp > ago(30d)
| where AccountUpn =~ '<UPN>'
| where SessionId == target_session_id
| project Timestamp, IPAddress, Country, City, Application,
LogonType, AuthenticationRequirement, ErrorCode,
UserAgent, Browser, OSPlatform, UniqueTokenId
| order by Timestamp asc
This Single Query Reveals:
- Complete geographic progression (all IPs/locations in chronological order)
- Where interactive authentication occurred (`LogonType has "interactiveUser"` + `AuthenticationRequirement == "multiFactorAuthentication"`)
- Token reuse pattern (`LogonType has "nonInteractiveUser"`: all subsequent events using cached tokens)
- Device consistency (Browser + UserAgent + OSPlatform should match across session)
- Time gaps between locations (assess physical possibility of travel)
- Token-level tracking (`UniqueTokenId`: same token reused across IPs = session continuity)
Critical Evidence - What SessionId Indicates:
- SessionId is a browser session identifier that tracks authentication flows
- Same SessionId across IPs = Session continuity (could be legitimate user OR stolen token replay)
- SessionId does NOT prove device identity - stolen refresh tokens maintain session continuity
- Same SessionId + Same UserAgent + Geographic impossibility = Possible token theft
- Token theft attacks maintain the original SessionId - attacker inherits session from stolen token
- CRITICAL: Same SessionId does NOT rule out credential/token theft
Analysis Pattern:
- Look at FIRST authentication in session (earliest Timestamp)
- Check if `LogonType has "interactiveUser"` → User performed interactive authentication at that IP/location
- Check `AuthenticationRequirement` → `multiFactorAuthentication` = MFA was required and satisfied; `singleFactorAuthentication` = password-only
- Subsequent events with `LogonType has "nonInteractiveUser"` = token reuse (expected OAuth flow)
- Verify device consistency (Browser + UserAgent should match across session; different = possible token theft)
- Assess geographic progression (impossible travel = high risk; reasonable = needs user confirmation)
- Track `UniqueTokenId`: same token ID across geographically distant IPs = session continuity (could be VPN OR stolen token)
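The analysis pattern above can be sketched in Python once the Step 2 rows have been exported. This is a minimal sketch, not part of the skill: `summarize_session` is a hypothetical helper, and it assumes each row is a dict keyed by the projected column names with ISO 8601 UTC `Timestamp` strings.

```python
def summarize_session(events):
    """Summarize one SessionId trace exported from the Step 2 query.

    Assumption: each event is a dict keyed by the projected column names, and
    Timestamp values are ISO 8601 UTC strings, so string order is time order.
    """
    ordered = sorted(events, key=lambda e: e["Timestamp"])
    first = ordered[0]

    # Map each token to the set of IPs it was seen on.
    ips_by_token = {}
    for event in ordered:
        ips_by_token.setdefault(event["UniqueTokenId"], set()).add(event["IPAddress"])

    user_agents = {event["UserAgent"] for event in ordered}
    return {
        "first_ip": first["IPAddress"],
        # Case-sensitive on purpose: "nonInteractiveUser" has a capital "I",
        # so it does not contain the substring "interactiveUser".
        "first_is_interactive": "interactiveUser" in first["LogonType"],
        "first_required_mfa": first["AuthenticationRequirement"] == "multiFactorAuthentication",
        "device_consistent": len(user_agents) == 1,
        "tokens_on_multiple_ips": sorted(
            token for token, ips in ips_by_token.items() if len(ips) > 1
        ),
    }
```

A token listed in `tokens_on_multiple_ips` only shows session continuity; as noted above, that is consistent with both VPN use and stolen-token replay.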
Step 3: Find Interactive Sign-Ins with Progressive Date Range Expansion
Tool: RunAdvancedHuntingQuery (≤30d) or Data Lake fallback (>30d)
Use this when Step 2 shows only nonInteractiveUser logon types (no interactive auth in the session)
Query Pattern:
EntraIdSignInEvents
| where Timestamp > ago(30d)
| where AccountUpn =~ '<UPN>'
| where LogonType has "interactiveUser"
| where ErrorCode == 0
| summarize
SignInCount = count(),
Apps = make_set(Application, 5),
Countries = make_set(Country, 3),
AuthReqs = make_set(AuthenticationRequirement),
TokenIds = dcount(UniqueTokenId),
FirstSeen = min(Timestamp),
LastSeen = max(Timestamp)
by IPAddress, SessionId
| order by LastSeen desc
| take 20
What This Returns:
- All IPs where the user performed interactive sign-ins, grouped by session
- `AuthenticationRequirement` per IP: reveals whether MFA was required or bypassed
- `TokenIds` count: how many distinct tokens were issued from each IP/session pair
- Match SessionIds against Step 1 results: if the suspicious SessionId also has interactive sign-ins from a VPS IP, the attacker has the password
Progressive expansion (if AH 30d window is insufficient):
- If no interactive sign-ins found in 30d → Switch to Data Lake fallback queries for 90d lookback
- Tokens can be valid for up to 90 days depending on tenant policy
Data Lake Fallback Queries (>30d)
Use these when the AH 30d window is insufficient — e.g., tracing token origins older than 30 days.
Tool: mcp_sentinel-data_query_lake with workspaceId
Step 1 (Data Lake):
let suspicious_ips = dynamic(["<IP_1>", "<IP_2>"]);
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(90d)
| where UserPrincipalName =~ '<UPN>'
| where IPAddress in (suspicious_ips)
| project TimeGenerated, IPAddress, Location, AppDisplayName,
SessionId = tostring(SessionId), UserAgent, ResultType, CorrelationId
| order by TimeGenerated asc
| take 50
Step 2 (Data Lake) — with per-step auth detail:
let target_session_id = "<SESSION_ID>";
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(90d)
| where UserPrincipalName =~ '<UPN>'
| where SessionId == target_session_id
| extend AuthDetails = parse_json(tostring(AuthenticationDetails))
| mv-expand AuthDetails
| extend AuthMethod = tostring(AuthDetails.authenticationMethod)
| extend AuthStepDateTime = todatetime(AuthDetails.authenticationStepDateTime)
| extend RequestSeq = toint(AuthDetails.RequestSequence)
| project TimeGenerated, IPAddress, Location, AppDisplayName,
AuthMethod, AuthStepDateTime, RequestSeq, UserAgent, ResultType
| order by TimeGenerated asc
Data Lake advantage:
`AuthenticationDetails` provides granular per-step `RequestSequence` and `authenticationMethod` ("Password", "Previously satisfied", "Mobile app notification") not available in `EntraIdSignInEvents`. Use this for forensic-grade MFA step tracing when AH's `LogonType` + `AuthenticationRequirement` columns are insufficient.
Step 3 (Data Lake) — interactive MFA search:
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(90d)
| where UserPrincipalName =~ '<UPN>'
| extend AuthDetails = parse_json(tostring(AuthenticationDetails))
| mv-expand AuthDetails
| extend AuthMethod = tostring(AuthDetails.authenticationMethod)
| extend RequestSeq = toint(AuthDetails.RequestSequence)
| where AuthMethod != "Previously satisfied"
| where RequestSeq > 0
| project TimeGenerated, IPAddress, Location, AppDisplayName, AuthMethod,
RequestSeq, SessionId = tostring(SessionId), UserAgent, ResultType
| order by TimeGenerated desc
| take 30
Step 4: Collect All IPs from Authentication Chain
CRITICAL: After completing the SessionId trace, extract ALL unique IP addresses discovered:
- From Interactive MFA session (Step 3 results)
- From Suspicious session (Step 1 results)
- From Complete SessionId chain (Step 2 results)
Build comprehensive IP list for enrichment analysis.
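The collection step can be sketched as a small de-duplication helper. This is an illustrative sketch only: `collect_chain_ips` is a hypothetical name, and it assumes each result set is a list of dicts carrying an `IPAddress` key, as projected by the queries in Steps 1-3.

```python
def collect_chain_ips(*result_sets):
    """Return the unique IPs across all result sets, in first-seen order."""
    seen = {}  # dict keeps insertion order, so the chain order is preserved
    for rows in result_sets:
        for row in rows:
            ip = row.get("IPAddress")
            if ip:
                seen.setdefault(ip, None)
    return list(seen)
```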
Step 5: Analyze IP Enrichment Data for ALL Discovered IPs
MANDATORY: Search investigation JSON ip_enrichment array for EVERY IP in the authentication chain:
For each IP address discovered in Steps 1-3:
- Locate IP in `ip_enrichment` array (search by `"ip": "<IP_ADDRESS>"` field)
- Extract key risk indicators:
  - `is_vpn`, `is_proxy`, `is_tor` (anonymization detection)
  - `abuse_confidence_score`, `total_reports` (reputation)
  - `threat_description`, `threat_detected` (threat intel matches)
  - `org`, `asn` (network ownership - hosting vs ISP)
  - `last_auth_result_detail` (authentication pattern)
  - `signin_count`, `success_count`, `failure_count` (frequency/behavior)
  - `first_seen`, `last_seen` (temporal pattern - transient vs established)
- Document findings for EACH IP in the chain:
  - Geographic location + ISP/VPN status
  - Risk level + threat intelligence status
  - Authentication pattern (interactive vs token reuse)
  - Behavioral context (frequency, success rate, temporal pattern)
This creates a complete evidence picture showing the full authentication journey with enrichment context.
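The lookup described in this step can be sketched as follows. It is a minimal sketch under assumptions: `enrichment_for_chain` is a hypothetical helper, and the investigation JSON is assumed to be already loaded into a dict whose `ip_enrichment` entries each carry an `ip` field.

```python
def enrichment_for_chain(investigation, chain_ips):
    """Split chain IPs into those with an ip_enrichment entry and those without."""
    by_ip = {entry["ip"]: entry for entry in investigation.get("ip_enrichment", [])}
    found = {ip: by_ip[ip] for ip in chain_ips if ip in by_ip}
    # IPs with no enrichment entry must still be reported, not silently dropped.
    missing = [ip for ip in chain_ips if ip not in by_ip]
    return found, missing
```

Returning the `missing` list separately makes it harder to skip an IP, which matters because this step is mandatory for every IP in the chain.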
Step 6: Document Risk Assessment
⚠️ MANDATORY CHECKPOINT - Before writing risk assessment:
- READ the "When to Escalate Authentication Anomalies" section below
- IDENTIFY which risk classification criteria apply to your case
- QUOTE the specific criteria in your analysis
- DO NOT improvise - follow documented classification exactly
Present findings in clear evidence trail:
- Interactive Session: IP, Location, Timestamp, AuthMethod, SessionId
- Subsequent Session: IP, Location, Timestamp, AuthMethod (token-based), SessionId
- IP Enrichment Analysis for ALL IPs: Present enrichment data for EVERY IP discovered in trace (VPN status, abuse scores, threat intel, auth patterns, frequency, temporal context)
- Connection Proof: SessionId match + time gap + geographic distance + comprehensive enrichment context from all IPs
- Risk Assessment: Evaluate based on context - MUST quote specific instruction criteria
Risk Assessment Framework - SessionId Interpretation:
- SessionId does NOT prove device identity - token theft maintains session continuity
- Same SessionId across geographically distant IPs = Requires investigation (VPN/travel OR stolen token)
- Different SessionIds = Different authentication flows (not necessarily more suspicious)
- Must correlate multiple signals: SessionId + UserAgent + Geography + Behavior + Time patterns + IP enrichment data
Real-World Example: Geographic Anomaly Analysis
Scenario: User sign-ins detected from two geographically distant locations within 18 hours.
Step 1: Interactive MFA Analysis
Location A Analysis:
- Query 1: Found 2 events with `SMS verification` and `RequestSeq: 1`
- Result: User performed fresh interactive SMS authentication at Location A
- Evidence: `authenticationStepDateTime: 2025-10-15T14:23:05Z` with `RequestSequence: 1`
Location B Analysis:
- Query 1: Zero results (no non-"Previously satisfied" methods)
- Result: Location B authentications used only token reuse - NO interactive MFA
- Evidence: All events show `"MFA requirement satisfied by claim in the token"`
Step 2: SessionId Verification (SMOKING GUN)
Query to compare sessions across both IPs:
let suspicious_ips = dynamic(["<IP_ADDRESS_1>", "<IP_ADDRESS_2>"]);
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated between (datetime(<START_DATE>) .. datetime(<END_DATE>))
| where UserPrincipalName =~ '<UPN>'
| where IPAddress in (suspicious_ips)
| project TimeGenerated, IPAddress, Location, SessionId, UserAgent
| order by TimeGenerated asc
CRITICAL FINDING:
- SessionId: `<SESSION_ID_EXAMPLE>`
- ALL Location A authentications: Same SessionId (over time period 1)
- ALL Location B authentications: Same SessionId (over time period 2)
- Time gap: Varies (analyze based on context)
- Geographic distance: Varies (analyze based on context)
Initial Appearance: Potential geographic anomaly requiring investigation
Further Analysis Required: Correlate SessionId with UserAgent, behavior patterns, and user confirmation
Step 3: Evidence Summary and Interpretation
| Evidence Type | Finding | Observation |
|---|---|---|
| Interactive MFA | Location A only | User performed SMS authentication |
| Location B Auth Methods | "Previously satisfied" only | Token reuse (normal OAuth flow) |
| SessionId | Same across both locations | Session continuity maintained |
| Time Gap | 18 hours | Within typical refresh token lifetime (24-90 days) |
| User Agent | Same | Consistent device fingerprint |
| Applications | Consistent across locations | Consistent workflow pattern |
Critical Analysis - SessionId Does NOT Prove Legitimacy
The same SessionId requires careful analysis because:
- SessionId is a browser session identifier that tracks authentication flows
- Same SessionId = Session continuity (could be legitimate user OR stolen token)
- Stolen refresh tokens maintain the original SessionId - attacker inherits session state
- Same SessionId does NOT rule out token theft or credential compromise
Possible Scenarios Requiring Investigation:
| Scenario | Description | Action Required |
|---|---|---|
| Legitimate VPN Connection | User switched VPN exit nodes (same device, different apparent location) | Requires user confirmation |
| Legitimate User Travel | User traveled between locations with sufficient time gap (tokens remained valid) | Requires user confirmation |
| Multi-Device User | User has laptop + phone active simultaneously (different IPs, concurrent activity) | Check UserAgent for mobile vs desktop - Requires user confirmation |
| Stolen Token Replay | Attacker obtained refresh token (SessionId stays same, may show different UserAgent) | Cannot be ruled out by SessionId alone |
| Mobile Carrier Routing | Carrier routes traffic through regional gateways (device in one location, exits another) | Check IP enrichment for ISP org |
Additional Investigation Checklist
- ✅ Check UserAgent consistency across all sessions
- ✅ Distinguish mobile vs desktop UserAgents - Concurrent activity from different device types (e.g., Android Chrome + Windows Edge) may indicate legitimate multi-device usage, not token theft
- ✅ Verify geographic progression is physically possible
- ✅ Review applications accessed (any unusual admin tools?)
- ✅ Check for failed authentication attempts before success
- ✅ Look for account modifications or privilege changes
- ✅ Check IP enrichment data in investigation JSON - Use `ip_enrichment` array to verify:
  - VPN/proxy/Tor status (`is_vpn`, `is_proxy`, `is_tor`)
  - Abuse reputation (`abuse_confidence_score`, `total_reports`)
  - Threat intelligence matches (`threat_detected`, `threat_description`)
  - Authentication patterns (`last_auth_result_detail`, `signin_count`, `success_count`, `failure_count`)
  - Temporal context (`first_seen`, `last_seen` - transient vs established pattern)
- ✅ Most important: Confirm with user directly
Recommendation: User Confirmation Questions
Use IP enrichment data from investigation JSON to strengthen your analysis, then confirm with user:
- "Were you using a VPN on [date] around [time]?" (if `is_vpn: true`)
- "Did you travel between [Location A] and [Location B] during this timeframe?"
- "Were you using multiple devices (e.g., laptop and phone) at the same time?" (if concurrent activity with different UserAgents detected)
- "Do you recognize [applications] activity during this timeframe?"
- "Have you noticed any unusual device or account behavior recently?"
Only after user confirmation can you conclude VPN usage or travel is legitimate. Same SessionId + IP enrichment data together provide strong evidence, but user confirmation is still required.
Common Authentication Methods and RequestSequence Patterns
| Authentication Method | RequestSeq > 0 Meaning | RequestSeq = 0 Meaning |
|---|---|---|
| Passkey (device-bound) | User physically approved with biometric/PIN | Passkey used in prior session, token reused |
| Phone sign-in | User approved notification on phone | Phone approval in prior session, token reused |
| SMS verification | User entered SMS code | SMS verification in prior session, token reused |
| Microsoft Authenticator app | User approved push notification | Authenticator used in prior session, token reused |
| Previously satisfied | N/A - never has RequestSeq > 0 | Always indicates token/claim reuse |
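The table reduces to one rule, which is also the filter used in the Data Lake Step 3 query. A minimal sketch, assuming the method and sequence values have already been extracted from `AuthenticationDetails`; `is_fresh_interactive_step` is a hypothetical helper name:

```python
def is_fresh_interactive_step(auth_method, request_seq):
    """True when the step is a fresh user action rather than token/claim reuse."""
    # "Previously satisfied" always means reuse, whatever the sequence number.
    return auth_method != "Previously satisfied" and request_seq > 0
```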
When to Escalate Authentication Anomalies
CRITICAL: Always check IP enrichment data before making risk determination!
High Risk (Escalate Immediately)
- Token reuse from geographically impossible locations (regardless of SessionId)
- Token reuse after user reports device loss/theft
- Concurrent sessions from multiple countries simultaneously with same UserAgent (same device can't be in two places)
- Note: Concurrent activity with different UserAgents (mobile vs desktop) may indicate legitimate multi-device usage - verify before escalating
- Token reuse from IPs matching ThreatIntelIndicators OR `threat_detected: true` in IP enrichment
- Unusual application access (admin portals, sensitive resources not in user's normal pattern)
- Failed authentication attempts followed by successful token reuse
- Account modifications or privilege escalations during suspicious sessions
- Geographic anomaly + Same SessionId + Different UserAgent = Likely token theft
- Impossible travel time between authentications (regardless of SessionId)
- IP enrichment shows: `abuse_confidence_score >= 75`, `is_tor: true`, or malicious `threat_description`
Medium Risk (Investigate Further - Confirm with User)
- Same SessionId + Geographically distant locations = Could be VPN/travel OR token theft - VERIFY with IP enrichment
- Concurrent activity from different IPs with different UserAgents = Could be multi-device (laptop + phone) OR token theft - ASK user about device usage
- Token reuse from unexpected country without prior user notification
- Token reuse spanning >30 days (excessive token lifetime - increases theft window)
- Pattern of token-only authentications without any interactive MFA in 30+ days
- Sign-ins during unusual hours for user's timezone
- Access to sensitive data repositories during suspicious sessions
- Same SessionId + Same UserAgent + Unusual geographic pattern = Needs user confirmation
- IP enrichment shows: `abuse_confidence_score >= 25`, `is_vpn: true` without user confirmation, or `total_reports > 0`
Low Risk / Likely Legitimate (Monitor Only)
- Token reuse from nearby IPs in same city (mobile carrier IP rotation)
- Token reuse following confirmed interactive MFA from expected location
- Token reuse from known corporate VPN IP ranges
- Applications and access patterns consistent with user's role
- User confirms VPN usage or travel when questioned
- No unusual data access or configuration changes
- Consistent UserAgent + Reasonable geographic progression + User confirmation
- IP enrichment shows: `abuse_confidence_score: 0`, residential ISP org (TELUS, Comcast, etc.), `is_vpn: false`, high `signin_count` with consistent success rate
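The IP enrichment thresholds listed under each risk level can be sketched as a first-pass triage function. This is only a sketch of the enrichment signals: it does not cover the other criteria (geography, UserAgent, behavior), it cannot judge whether a `threat_description` is malicious, and `enrichment_risk_tier` and `vpn_confirmed_by_user` are hypothetical names.

```python
def enrichment_risk_tier(entry, vpn_confirmed_by_user=False):
    """Map one ip_enrichment entry to 'high', 'medium', or 'low'.

    Uses only the thresholds documented above; the result is a starting
    point for the analyst, not a final classification.
    """
    score = entry.get("abuse_confidence_score") or 0
    if score >= 75 or entry.get("is_tor") or entry.get("threat_detected"):
        return "high"
    if (
        score >= 25
        or (entry.get("is_vpn") and not vpn_confirmed_by_user)
        or (entry.get("total_reports") or 0) > 0
    ):
        return "medium"
    return "low"
```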
Best Practices for Authentication Tracing
- START WITH SessionId - Query suspicious IPs to get SessionId first (most efficient approach)
- Use SessionId to trace complete chain - Single query shows entire authentication progression
- Check IP enrichment data - Use investigation JSON `ip_enrichment` array for VPN, abuse scores, threat intel
- Verify device consistency - Same SessionId + Same UserAgent + Geographic reasonableness = Likely legitimate
- Check for multi-device scenarios - Different UserAgents (mobile vs desktop) with concurrent activity often indicates legitimate multi-device usage, not token theft. Users commonly work on laptop while checking email on phone.
- Concurrent activity ≠ Automatic compromise - Before concluding token theft from concurrent sessions, verify UserAgent differences and ask user about device usage patterns
- SessionId alone is NOT conclusive - Must correlate with UserAgent, geography, behavior, and user confirmation
- Check first authentication in session - RequestSeq > 0 shows where user performed interactive MFA
- Assess geographic progression - Evaluate if travel is physically possible or if VPN is likely
- Widen time ranges if needed - Tokens can be valid for 24-90 days depending on policy
- Always confirm with user - Geographic anomalies require user verification regardless of SessionId
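To make "physically possible" concrete, the implied travel speed between two sign-ins can be estimated with the haversine formula. This is a sketch under stated assumptions: the queries above return only Country/City, so latitude and longitude would have to come from a separate geolocation source, and `implied_speed_kmh` is a hypothetical helper.

```python
import math
from datetime import datetime


def implied_speed_kmh(lat1, lon1, t1, lat2, lon2, t2):
    """Great-circle speed (km/h) needed to get from point 1 to point 2.

    Assumption: coordinates are decimal degrees from an external geolocation
    lookup, and t1/t2 are datetime objects in the same timezone.
    """
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    distance_km = 2 * earth_radius_km * math.asin(math.sqrt(a))
    hours = abs((t2 - t1).total_seconds()) / 3600
    if hours == 0:
        # Simultaneous sign-ins: any distance at all is impossible travel.
        return math.inf if distance_km > 0 else 0.0
    return distance_km / hours
```

What speed counts as impossible is an analyst judgment; a value far above commercial flight speeds points toward VPN use or token replay rather than travel, and still needs user confirmation.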
Prerequisites
Required MCP Servers
This skill requires:
- Sentinel Triage MCP (primary, ≤30d): `RunAdvancedHuntingQuery` for `EntraIdSignInEvents`
- Microsoft Sentinel Data Lake MCP (fallback, >30d): `mcp_sentinel-data_query_lake` for `SigninLogs` + `AADNonInteractiveUserSignInLogs` union
Required Data Sources
- EntraIdSignInEvents (primary): Unified interactive + non-interactive sign-in events. Advanced Hunting only, 30d retention via Graph API
- SigninLogs + AADNonInteractiveUserSignInLogs (fallback): Sentinel Data Lake, 90d+ retention. Required when tracing token origins older than 30 days or when `AuthenticationDetails` per-step granularity is needed
- Investigation JSON - Pre-generated investigation file with `ip_enrichment` array (from user-investigation skill)
How to Find Investigation JSON
- Pattern: `temp/investigation_<upn_prefix>_<timestamp>.json`
- Most recent file for user is usually the one to analyze
- Use `file_search` or `list_dir` to locate existing investigations
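Outside the agent tooling, the same lookup can be sketched with the standard library. This is an illustrative sketch: `latest_investigation` is a hypothetical helper, and picking the newest file by modification time is an assumption (the timestamp in the filename could be used instead).

```python
from pathlib import Path


def latest_investigation(upn_prefix, temp_dir="temp"):
    """Return the most recently modified investigation JSON for a user, or None."""
    matches = sorted(
        Path(temp_dir).glob(f"investigation_{upn_prefix}_*.json"),
        key=lambda path: path.stat().st_mtime,
    )
    return matches[-1] if matches else None
```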
Integration with User Investigation Skill
Authentication tracing is typically performed as a follow-up analysis after running a user investigation:
- Run user-investigation skill → Generates investigation JSON with `ip_enrichment` array
- Review anomalies → Identify suspicious IPs/locations requiring deeper analysis
- Run authentication-tracing skill → Trace SessionId chains, correlate with IP enrichment
- Document findings → Provide risk assessment with evidence trail
Key Integration Points:
- IP enrichment data comes from investigation JSON (already queried by user-investigation)
- SessionId queries are NEW queries specific to authentication tracing
- Risk assessment combines both data sources
.github/skills/computer-investigation/SKILL.md
npx skills add SCStelz/security-investigator --skill computer-investigation -g -y
SKILL.md
Frontmatter
{
"name": "computer-investigation",
"description": "Use this skill when asked to investigate a computer, device, endpoint, or machine for security issues, suspicious activity, malware, or compliance review. Triggers on keywords like \"investigate computer\", \"investigate device\", \"investigate endpoint\", \"check machine\", \"device security\", \"endpoint investigation\", or when a device name\/hostname is mentioned with investigation context. This skill provides comprehensive device security analysis including Defender alerts, sign-in patterns, logged-on users, vulnerabilities, software inventory, compliance status, network activity, and automated investigation tracking for Entra Joined, Hybrid Joined, and Entra Registered devices.",
"drill_down_prompt": "Investigate device {entity} — Defender alerts, process activity, vulnerabilities, compliance",
"threat_pulse_domains": [
"endpoint"
]
}
Computer Security Investigation - Instructions
Purpose
This skill performs comprehensive security investigations on Windows, macOS, and Linux devices registered in Microsoft Entra ID and/or managed by Microsoft Defender for Endpoint. It analyzes Defender alerts, device compliance, sign-in patterns, logged-on users, installed software, vulnerabilities, network connections, and automated investigation results for:
- Entra Joined Devices: Cloud-only devices joined directly to Microsoft Entra ID
- Hybrid Joined Devices: Devices joined to both on-premises Active Directory and Microsoft Entra ID
- Entra Registered Devices: Personal devices (BYOD) registered with Microsoft Entra ID
📑 TABLE OF CONTENTS
- Critical Workflow Rules - Start here!
- Investigation Types - Standard/Quick/Comprehensive
- Output Modes - Inline / Markdown file / JSON export
- Quick Start - 5-step investigation pattern
- Execution Workflow - Complete process
- Sample KQL Queries - Validated query patterns
- Microsoft Graph Queries - Entra ID device data
- Defender for Endpoint Queries - MDE API integration
- Markdown Report Template - Full markdown report structure
- JSON Export Structure - Required fields
- Error Handling - Troubleshooting guide
- SVG Dashboard Generation - Visual dashboard from report data
Investigation shortcuts:
- Device with behavioral drift (TP Q6): Q3 (suspicious processes) → Q11 (logon events) → Q7 (incidents) → Q8 (device info)
- Internet-facing critical asset (TP Q11): Q8 (device info + internet-facing) → Q4 (outbound connections) → Q10 (vulnerabilities) → Q11 (logon events)
- Device in active incident (TP Q1): Q2 (security alerts) → Q3 (process execution) → Q5 (file events) → Q6 (registry persistence) → Q7 (incidents)
- Brute-forced endpoint (TP Q4): Q11 (logon events) → Q4 (outbound connections) → Q12 (TI IP matches)
- Vulnerability assessment (TP Q12): Q9 (software inventory) → Q10 (CVEs on device) → Q8 (exposure score)
⛔ Shortcut Default Rule: When a matching shortcut exists for the investigation context, use it — don't run the full workflow. Only run the full query set when the user explicitly requests "full investigation", "comprehensive", or "deep dive". Shortcuts render only the report sections relevant to their query chain (plus Executive Summary and Recommendations, always).
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
Before starting ANY computer investigation:
- ALWAYS get Device ID FIRST (required for Defender API and Graph queries - multiple IDs exist!)
- ALWAYS determine device type (Entra Joined, Hybrid Joined, or Entra Registered)
- ALWAYS calculate date ranges correctly (use current date from context - see Date Range section)
- ALWAYS ask the user for output mode if not specified: inline chat summary, markdown file report, or JSON export (see Output Modes)
- ALWAYS track and report time after each major step (mandatory)
- ALWAYS run independent queries in parallel (drastically faster execution)
- ALWAYS use `create_file` for JSON export and markdown reports (NEVER use PowerShell terminal commands)
- ⛔ ALWAYS enforce Sentinel workspace selection (see Workspace Selection section below)
⛔ MANDATORY: Sentinel Workspace Selection
This skill requires a Sentinel workspace to execute queries. Follow these rules STRICTLY:
When invoked from a parent skill (incident-investigation, threat-pulse, etc.):
- Inherit the workspace selection from the parent investigation context
- If no workspace was selected in parent context: STOP and ask user to select
- Use the `SELECTED_WORKSPACE_IDS` passed from the parent skill
- Skip output mode prompts: default to inline chat (the parent skill controls the final output format)
When invoked standalone (direct user request):
- ALWAYS call `list_sentinel_workspaces` MCP tool FIRST
- If 1 workspace exists: Auto-select, display to user, proceed
- If multiple workspaces exist:
  - Display all workspaces with Name and ID
  - ASK: "Which Sentinel workspace should I use for this investigation?"
  - ⛔ STOP AND WAIT for user response
  - ⛔ DO NOT proceed until user explicitly selects
- If a query fails on the selected workspace:
  - ⛔ DO NOT automatically try another workspace
  - STOP and report the error
  - Display available workspaces
  - ASK user to select a different workspace
  - WAIT for user response
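The standalone selection rules can be sketched as a small decision function. This is a hypothetical illustration, not the skill's implementation: `workspace_decision` is an invented name, and stopping when no workspace is returned is an assumption, since the rules above only cover one or multiple workspaces.

```python
def workspace_decision(workspaces):
    """Return (action, workspace) for the list returned by the workspace listing."""
    if not workspaces:
        # Assumption: with nothing to select, the only safe action is to stop.
        return ("stop", None)
    if len(workspaces) == 1:
        return ("auto_select", workspaces[0])
    # Multiple workspaces: never pick one on the user's behalf.
    return ("ask_user", None)
```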
Workspace Failure Handling
IF query returns "Failed to resolve table" or similar error:
- STOP IMMEDIATELY
- Report: "⚠️ Query failed on workspace [NAME] ([ID]). Error: [ERROR_MESSAGE]"
- Display: "Available workspaces: [LIST_ALL_WORKSPACES]"
- ASK: "Which workspace should I use instead?"
- WAIT for explicit user response
- DO NOT retry with a different workspace automatically
🔴 PROHIBITED ACTIONS:
- ❌ Selecting a workspace without user consent when multiple exist
- ❌ Switching to another workspace after a failure without asking
- ❌ Proceeding with investigation if workspace selection is ambiguous
- ❌ Assuming a workspace based on previous sessions
Device ID Types:
- Entra Device ID (Azure AD Object ID): Used for Graph API queries - GUID format
- Defender Device ID: Used for MDE API queries - GUID format (different from Entra ID!)
- Device Name/Hostname: Human-readable name, use for initial search
- Intune Device ID: Used for Intune management queries
Date Range Rules:
- Real-time/recent searches: Add +2 days to current date for end range
- Historical ranges: Add +1 day to user's specified end date
- Example: Current date = Jan 23; "Last 7 days" → `datetime(2026-01-16)` to `datetime(2026-01-25)`
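The two date rules can be sketched as plain date arithmetic. This is a minimal sketch with hypothetical helper names (`realtime_range`, `historical_range`); the current date is passed in explicitly because the skill takes it from context.

```python
from datetime import date, timedelta


def realtime_range(today, lookback_days):
    """Real-time/recent search: start is today minus the lookback, end is today + 2 days."""
    return today - timedelta(days=lookback_days), today + timedelta(days=2)


def historical_range(start, end):
    """Historical search: keep the start, extend the user's end date by 1 day."""
    return start, end + timedelta(days=1)
```

With a current date of 2026-01-23 and a 7-day lookback, `realtime_range` reproduces the example above (2026-01-16 to 2026-01-25).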
Device Types Reference
Entra Joined Devices
- trustType: `AzureAd`
- Characteristics: Cloud-only, no on-premises AD connection
- Identity: Uses Entra ID for authentication
- Common scenarios: Cloud-native organizations, Windows Autopilot deployments
Hybrid Joined Devices
- trustType: `ServerAd` (indicates hybrid join with on-premises AD)
- Characteristics: Joined to both on-premises AD and Entra ID
- Identity: Uses both on-premises AD and Entra ID
- Common scenarios: Traditional enterprise environments migrating to cloud
Entra Registered Devices
- trustType: `Workplace`
- Characteristics: Personal/BYOD devices, user adds work account
- Identity: User authenticates with Entra ID, device not fully managed
- Common scenarios: BYOD policies, personal device access to corporate resources
Available Investigation Types
Standard Investigation (7 days)
When to use: General security reviews, routine investigations
Example prompts:
- "Investigate device WORKSTATION-001 for the last 7 days"
- "Run security investigation for computer LAP-JSMITH from 2026-01-16 to 2026-01-23"
- "Check endpoint security for DESKTOP-ABC123"
Quick Investigation (1 day)
When to use: Urgent cases, active malware alerts, recent suspicious activity
Example prompts:
- "Quick investigate infected device SRV-SQL01"
- "Run quick security check on machine WKS-FINANCE02"
- "Urgent: check device LAPTOP-EXEC-01 for compromise"
Comprehensive Investigation (30 days)
When to use: Deep-dive analysis, lateral movement detection, thorough forensics
Example prompts:
- "Full investigation for potentially compromised device SRV-DC01"
- "Do a deep dive investigation on endpoint WORKSTATION-IT03 last 30 days"
- "Comprehensive security analysis for hybrid joined device DESKTOP-HR01"
All types include: Defender alerts, device compliance, sign-in patterns from device, logged-on users, software inventory, vulnerabilities, network connections, file activities, automated investigation status, and security recommendations.
Output Modes
This skill supports three output modes. ASK the user which they prefer if not explicitly specified. Multiple modes may be selected simultaneously.
Mode 1: Inline Chat Summary (Default)
- Render the full investigation analysis directly in the chat response
- Includes device profile, risk assessment, alerts, vulnerabilities, logged-on users, and recommendations
- Best for quick review and interactive follow-up questions
- No file output — results stay in the chat context
Mode 2: Markdown File Report
- Save a comprehensive investigation report to `reports/computer-investigations/computer_investigation_<device_name>_<YYYYMMDD_HHMMSS>.md`
- All sections from inline mode plus additional detail (full vulnerability tables, process event samples, network connection details, query appendix)
- Uses the Markdown Report Template defined below
- Use `create_file` tool - NEVER use terminal commands for file output
- Filename pattern: `computer_investigation_<device_name>_YYYYMMDD_HHMMSS.md` (lowercase device name, replace spaces/special chars with underscores)
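The filename rule can be sketched as below. `report_path` is a hypothetical helper, and treating every character other than lowercase letters, digits, and hyphens as "special" is an assumption, since the rule above does not enumerate the characters.

```python
import re
from datetime import datetime


def report_path(device_name, when):
    """Build the Mode 2 report path for a device and a timestamp."""
    # Assumption: hyphens are kept; any other non-alphanumeric run becomes "_".
    safe_name = re.sub(r"[^a-z0-9-]+", "_", device_name.lower())
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return f"reports/computer-investigations/computer_investigation_{safe_name}_{stamp}.md"
```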
Mode 3: JSON Export (Legacy)
- Export investigation data to JSON for downstream processing or archival
- Uses the JSON Export Structure defined below
- Best for programmatic consumption or integration with other tools
Markdown Rendering Notes
- ✅ ASCII tables, box-drawing characters, and bar charts render perfectly in markdown code blocks
- ✅ Unicode block characters (`█` full block, `─` box-drawing horizontal) display correctly in monospaced fonts
- ✅ Emoji indicators (🔴🟢🟡⚠️✅) render natively in GitHub-flavored markdown
- ✅ Standard markdown tables (`| col |`) render as formatted tables
- Tip: Wrap all ASCII art in triple-backtick code fences for consistent rendering
Quick Start (TL;DR)
When a user requests a computer security investigation:
- Get Device IDs:
      # First, find the device and get both Entra ID and Defender ID
      mcp_microsoft_mcp_microsoft_graph_get("/v1.0/devices?$filter=displayName eq '<DEVICE_NAME>'&$select=id,deviceId,displayName,operatingSystem,trustType,isCompliant,isManaged")
      # Then get Defender device ID from MDE
      Use Defender `ListDefenderMachines` or Advanced Hunting to find by device name
- Run Parallel Queries:
  - Batch 1: 8 Sentinel/Advanced Hunting queries (device sign-ins, alerts, process events, network, files, incidents)
  - Batch 2: 5 Defender API queries (machine details, logged-on users, alerts, vulnerabilities, recommendations)
  - Batch 3: 3 Graph queries (device details, compliance, BitLocker keys if needed)
- Export & Report (Mode-Dependent):
  - Mode 1 (Inline): Render analysis directly in chat using the Markdown Report Template as a guide
  - Mode 2 (Markdown): Build full report using the Markdown Report Template, save to `reports/computer-investigations/`
  - Mode 3 (JSON): Export to `temp/investigation_device_<device_name>_<timestamp>.json`
- Generate Summary Report: Provide investigation summary with key findings, risk assessment, and recommendations.
- Track time after each major step and report to user
Execution Workflow
🚨 MANDATORY: Time Tracking Pattern
YOU MUST TRACK AND REPORT TIME AFTER EVERY MAJOR STEP:
[MM:SS] ✓ Step description (XX seconds)
Required Reporting Points:
- After Device ID retrieval
- After parallel data collection
- After JSON file creation
- After summary generation
- Final: Total elapsed time
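The reporting line can be sketched as a formatter. `format_step` is a hypothetical helper, and reading `[MM:SS]` as cumulative elapsed time with `(XX seconds)` as the duration of the step just finished is an assumption about the pattern above.

```python
def format_step(elapsed_seconds, description, step_seconds):
    """Format one time-tracking line: [MM:SS] ✓ description (XX seconds)."""
    minutes, seconds = divmod(int(elapsed_seconds), 60)
    return f"[{minutes:02d}:{seconds:02d}] ✓ {description} ({int(step_seconds)} seconds)"
```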
Phase 1: Get Device IDs (REQUIRED FIRST)
Step 1a: Get Entra Device ID from Microsoft Graph
/v1.0/devices?$filter=displayName eq '<DEVICE_NAME>'&$select=id,deviceId,displayName,operatingSystem,operatingSystemVersion,trustType,isCompliant,isManaged,registrationDateTime,approximateLastSignInDateTime,mdmAppId,profileType
Step 1b: Get Defender Device ID
Use Advanced Hunting or Defender API to find the MDE device ID:
DeviceInfo
| where DeviceName startswith '<DEVICE_NAME>' // Use startswith to match both hostname and FQDN
| summarize arg_max(TimeGenerated, *) by DeviceName
| project DeviceId, DeviceName, OSPlatform, OSVersion, MachineGroup, OnboardingStatus, ExposureLevel, SensorHealthState, DeviceManualTags, DeviceDynamicTags, RegistryDeviceTag
Note: RiskScore is NOT in DeviceInfo - use GetDefenderMachine API to get riskScore and exposureLevel.
Why BOTH IDs are required:
- Entra Device ID: Used for Graph API (compliance, registration, BitLocker, Intune)
- Defender Device ID: Used for MDE API (alerts, vulnerabilities, logged-on users, investigations)
- IDs are DIFFERENT: The same device has different GUIDs in Entra ID vs Defender for Endpoint
Device Type Determination:
- Check `trustType` field from Graph API response:
  - `AzureAd` = Entra Joined
  - `ServerAd` = Hybrid Joined
  - `Workplace` = Entra Registered
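The device type determination is a direct lookup on `trustType`. A minimal sketch; `device_type` is a hypothetical helper, and returning "Unknown" for any other value is an assumption for values the mapping above does not cover.

```python
TRUST_TYPE_TO_DEVICE_TYPE = {
    "AzureAd": "Entra Joined",
    "ServerAd": "Hybrid Joined",
    "Workplace": "Entra Registered",
}


def device_type(trust_type):
    """Translate a Graph trustType value into the device type used in this skill."""
    return TRUST_TYPE_TO_DEVICE_TYPE.get(trust_type, "Unknown")
```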
Phase 2: Parallel Data Collection
CRITICAL: Use create_file tool to create JSON - NEVER use PowerShell terminal commands!
Batch 1: Sentinel/Advanced Hunting Queries (Run ALL in parallel)
- Device sign-in events (Query 1) - Who signed into this device
- Device alerts (Query 2) - SecurityAlert filtered by device
- Process execution events (Query 3) - Suspicious process activity
- Network connection events (Query 4) - Outbound connections
- File events (Query 5) - File creation/modification/deletion
- Registry events (Query 6) - Registry modifications
- Security incidents (Query 7) - Incidents containing this device
- Device inventory changes (Query 8) - Configuration changes
Batch 2: Defender for Endpoint API (Run ALL in parallel)
- Machine details (GetDefenderMachine) - Device info from MDE
- Logged-on users (GetDefenderMachineLoggedOnUsers) - Recent users
- Device alerts (GetDefenderMachineAlerts) - MDE alerts
- Device vulnerabilities (Advanced Hunting) - CVEs on device
- Installed software (Advanced Hunting) - Software inventory
Batch 3: Graph API Queries (Run ALL in parallel)
- Device details (Graph) - Full device properties
- Compliance policies (Graph) - Applied compliance policies
- Intune device status (if MDM enrolled) - Intune management data
Phase 3: Export & Generate Report (Mode-Dependent)
Mode 1 — Inline Chat Summary
- No file export needed
- Render the full investigation analysis directly in chat using the section structure from the Markdown Report Template as a guide
- Include: Device Profile, Alert Summary, Logged-On Users, Vulnerability Overview, Process Activity, Network Connections, Risk Assessment, Recommendations
- Use emoji-coded tables for risk factors and mitigating factors
Mode 2 — Markdown File Report
- Assess IP enrichment needs:
  - Extract public IPs from network connection events and sign-in data
  - Run python enrich_ips.py <ip1> <ip2> ... for threat intelligence enrichment
  - Parse the output to populate IP Intelligence tables in the report
- Build the markdown report using the Markdown Report Template below
  - Populate ALL sections with actual query data
  - For sections with no data: use the explicit absence confirmation pattern (e.g., "✅ No alerts detected...")
  - Calculate risk score and assessment dynamically
- Save the report:
  create_file("reports/computer-investigations/computer_investigation_<device_name>_YYYYMMDD_HHMMSS.md", markdown_content)
  - Use create_file tool - NEVER use terminal commands for file output
  - Lowercase device name, replace spaces/special chars with underscores
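The IP extraction step in Mode 2 can be sketched as follows. This is an illustration only: `public_ips` and `enrich_command` are hypothetical helper names, and only the positional-argument form of `enrich_ips.py` shown above is assumed. It uses Python's standard `ipaddress` module to drop private, loopback, and malformed values before enrichment.

```python
import ipaddress


def public_ips(candidates):
    """Return unique, globally routable IPs in first-seen order."""
    result = []
    for raw in candidates:
        try:
            ip = ipaddress.ip_address(raw.strip())
        except ValueError:
            # Skip empty strings, hostnames, and other non-IP values.
            continue
        if ip.is_global and str(ip) not in result:
            result.append(str(ip))
    return result


def enrich_command(ips):
    """Build the argument list for the enrichment script (not executed here)."""
    return ["python", "enrich_ips.py", *ips]


if __name__ == "__main__":
    sample = ["10.0.0.5", "8.8.8.8", "8.8.8.8", "not-an-ip", "192.168.1.1"]
    print(enrich_command(public_ips(sample)))
```

Deduplicating before enrichment keeps the number of lookups proportional to distinct external IPs rather than to event volume.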
Mode 3 — JSON Export (Legacy)
- Export to JSON:
  create_file("temp/investigation_device_<device_name>_<timestamp>.json", json_content)
- Merge all results into one dict structure (see JSON Export Structure section below)
Required Field Specifications
Device Query (Graph API)
/v1.0/devices?$filter=displayName eq '<DEVICE_NAME>'&$select=id,deviceId,displayName,operatingSystem,operatingSystemVersion,trustType,isCompliant,isManaged,registrationDateTime,approximateLastSignInDateTime,mdmAppId,profileType,manufacturer,model,enrollmentType,deviceOwnership
- All fields REQUIRED for investigation
- trustType determines device join type
- isCompliant and isManaged indicate MDM status
Defender Machine Details
Use the Defender GetDefenderMachine MCP tool with Defender Device ID:
- Returns: healthStatus, riskScore, exposureLevel, onboardingStatus, lastSeen, osPlatform, osVersion
Sample KQL Queries
Use these exact patterns with the appropriate MCP tool. Replace <DEVICE_NAME>, <DEVICE_ID>, <StartDate>, <EndDate>.
⚠️ CRITICAL: START WITH THESE EXACT QUERY PATTERNS These queries have been tested and validated. Use them as your PRIMARY reference.
🔧 MCP Tool Invocation Reference
CRITICAL: Use the correct parameter names for each tool!
Sentinel Data Lake MCP (query_lake tool)
- Tool: Use the Sentinel Data Lake MCP's query_lake tool
- Parameter name: query
- Time column: TimeGenerated
- Use for: Lookbacks >30 days on any table (AH Graph API is capped at 30d), or when AH is blocked by the safety filter
Example invocation:
query_lake(
query="DeviceInfo | where DeviceName startswith 'DEVICENAME' | summarize arg_max(TimeGenerated, *) by DeviceId",
workspaceId="<WORKSPACE_ID>"
)
Defender XDR Advanced Hunting (RunAdvancedHuntingQuery tool)
- Tool: Use the Sentinel Triage MCP's RunAdvancedHuntingQuery tool
- Parameter name: kqlQuery (NOT query!)
- Time column: Timestamp for XDR-native tables (Device*, Email*, etc.); TimeGenerated for LA/Sentinel tables (SigninLogs, SecurityAlert, etc.), even in AH
- Use for: Default choice for all ≤30d queries (free for Analytics-tier tables). Required for TVM tables (DeviceTvmSoftwareInventory, DeviceTvmSoftwareVulnerabilities), which don't exist in Data Lake.
Example invocation:
RunAdvancedHuntingQuery(
kqlQuery="DeviceTvmSoftwareVulnerabilities | where DeviceName startswith 'DEVICENAME' | take 30"
)
Tool Selection Guide
Follow the global Tool Selection Rule in .github/copilot-instructions.md (Data Lake vs Advanced Hunting). This skill does NOT override the global default — use Advanced Hunting first for ≤30d lookbacks (free for Analytics-tier tables), and fall back to Data Lake only for >30d windows or when AH is blocked by the safety filter.
| Table | Tool (lookback ≤30d) | Tool (lookback >30d) | Time Column |
|---|---|---|---|
| Device* (DeviceInfo, DeviceProcessEvents, DeviceNetworkEvents, DeviceFileEvents, DeviceLogonEvents, DeviceRegistryEvents) | Advanced Hunting (free) | Data Lake | AH: Timestamp / DL: TimeGenerated |
| SecurityAlert, SecurityIncident | Advanced Hunting | Data Lake | TimeGenerated (both tools) |
| SigninLogs, AuditLogs, AADNonInteractiveUserSignInLogs | Advanced Hunting | Data Lake | TimeGenerated (both tools) |
| DeviceTvmSoftwareInventory, DeviceTvmSoftwareVulnerabilities | Advanced Hunting only | Advanced Hunting only | Timestamp (snapshot, no time filter needed) |
When adapting the sample queries below: they are written with TimeGenerated for Data Lake compatibility. For Advanced Hunting on Device* tables, swap TimeGenerated → Timestamp. For SecurityAlert/SecurityIncident/SigninLogs in AH, keep TimeGenerated (LA/Sentinel tables retain their column name in AH).
Schema differences: Some MDE columns (e.g., SentBytes, ReceivedBytes in DeviceNetworkEvents) may not be available in Data Lake. If a column fails in one tool, try the other.
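The tool and time-column rules above can be summarized in a short sketch. The function names are hypothetical, and `adapt_query` does a blanket rename, so it only suits queries that read a single table family. A query that mixes Device* tables with LA/Sentinel tables (for example, the threat intelligence match) still needs a manual edit.

```python
import re

# Tables that exist only in Advanced Hunting, per the guide above.
TVM_TABLES = {"DeviceTvmSoftwareInventory", "DeviceTvmSoftwareVulnerabilities"}


def select_tool(table: str, lookback_days: int) -> str:
    """Pick the query tool for a table and lookback window."""
    if table in TVM_TABLES:
        return "Advanced Hunting"
    return "Advanced Hunting" if lookback_days <= 30 else "Data Lake"


def time_column(table: str, tool: str) -> str:
    """Device* tables use Timestamp in Advanced Hunting; others keep TimeGenerated."""
    if tool == "Advanced Hunting" and table.startswith("Device"):
        return "Timestamp"
    return "TimeGenerated"


def adapt_query(query: str, table: str, tool: str) -> str:
    """Rename TimeGenerated to the column the chosen tool expects."""
    return re.sub(r"\bTimeGenerated\b", time_column(table, tool), query)


if __name__ == "__main__":
    q = "DeviceInfo | summarize arg_max(TimeGenerated, *) by DeviceId"
    tool = select_tool("DeviceInfo", lookback_days=7)
    print(tool, "->", adapt_query(q, "DeviceInfo", tool))
```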
📅 Date Range Quick Reference
🔴 STEP 0: GET CURRENT DATE FIRST (MANDATORY) 🔴
- ALWAYS check the current date from the context header BEFORE calculating date ranges
- NEVER use hardcoded years - the year changes and you WILL query the wrong timeframe
RULE 1: Real-Time/Recent Searches (Current Activity)
- Add +2 days to current date for end range
- Why +2? +1 for the timezone offset (PST is behind UTC) and +1 for an inclusive end-of-day
- Pattern: Today is Jan 23 (PST) → Use datetime(2026-01-25) as end date
RULE 2: Historical Searches (User-Specified Dates)
- Add +1 day to user's specified end date
- Why +1? To include all 24 hours of the final day
Examples Table (Assuming Current Date = January 23, 2026):
| User Request | <StartDate> | <EndDate> | Rule Applied |
|---|---|---|---|
| "Last 7 days" | 2026-01-16 | 2026-01-25 | Rule 1 (+2) |
| "Last 30 days" | 2025-12-24 | 2026-01-25 | Rule 1 (+2) |
| "Jan 15 to Jan 20" | 2026-01-15 | 2026-01-21 | Rule 2 (+1) |
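The two date rules can be checked with a small sketch. The function names `recent_range` and `historical_range` are illustrative assumptions; the arithmetic follows Rule 1 and Rule 2 as stated above, and the current date must still be read from the context header rather than hardcoded.

```python
from datetime import date, timedelta


def recent_range(today: date, lookback_days: int) -> tuple[str, str]:
    """Rule 1: start = today - lookback, end = today + 2 days."""
    start = today - timedelta(days=lookback_days)
    end = today + timedelta(days=2)
    return start.isoformat(), end.isoformat()


def historical_range(start: date, end: date) -> tuple[str, str]:
    """Rule 2: add 1 day to the user-specified end date."""
    return start.isoformat(), (end + timedelta(days=1)).isoformat()


if __name__ == "__main__":
    today = date(2026, 1, 23)  # example value only, matching the table above
    print(recent_range(today, 7))
    print(historical_range(date(2026, 1, 15), date(2026, 1, 20)))
```

The returned strings drop straight into `datetime(<StartDate>)` and `datetime(<EndDate>)` in the sample queries.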
1. Device Sign-In Events (Who authenticated on this device)
Note: DeviceDetail is dynamic in SigninLogs but string in AADNonInteractiveUserSignInLogs. Query SigninLogs only for device context (interactive sign-ins contain device info). Do NOT use union with DeviceDetail filtering - causes schema conflicts in Sentinel Data Lake.
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let deviceName = '<DEVICE_NAME>';
SigninLogs
| where TimeGenerated between (start .. end)
| extend DeviceDetailStr = tostring(DeviceDetail)
| where DeviceDetailStr has deviceName
| extend ParsedDevice = parse_json(DeviceDetailStr)
| extend DeviceName = tostring(ParsedDevice.displayName)
| extend DeviceId = tostring(ParsedDevice.deviceId)
| extend DeviceOS = tostring(ParsedDevice.operatingSystem)
| extend DeviceTrustType = tostring(ParsedDevice.trustType)
| extend DeviceCompliant = tostring(ParsedDevice.isCompliant)
| summarize
SignInCount = count(),
SuccessCount = countif(ResultType == '0'),
FailureCount = countif(ResultType != '0'),
UniqueUsers = dcount(UserPrincipalName),
Users = make_set(UserPrincipalName, 10),
Applications = make_set(AppDisplayName, 10),
IPAddresses = make_set(IPAddress, 10),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated)
by DeviceName, DeviceOS, DeviceTrustType, DeviceCompliant
| order by SignInCount desc
2. Device Security Alerts (SecurityAlert table)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let deviceName = '<DEVICE_NAME>';
SecurityAlert
| where TimeGenerated between (start .. end)
| where Entities has deviceName or CompromisedEntity has deviceName
| summarize arg_max(TimeGenerated, *) by SystemAlertId
| project
TimeGenerated,
AlertName,
AlertSeverity,
Status,
Description,
ProviderName,
Tactics,
Techniques,
CompromisedEntity,
RemediationSteps
| order by TimeGenerated desc
| take 20
3. Process Execution Events (Suspicious processes)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let deviceName = '<DEVICE_NAME>';
DeviceProcessEvents
| where TimeGenerated between (start .. end)
| where DeviceName startswith deviceName // Use startswith to match both hostname and FQDN
| where ActionType in ("ProcessCreated", "ProcessCreatedUsingWmiQuery")
| extend CommandLineLength = strlen(ProcessCommandLine)
| extend IsSuspicious = case(
ProcessCommandLine has_any ("powershell", "cmd", "wscript", "cscript") and ProcessCommandLine has_any ("-enc", "-e ", "bypass", "hidden", "downloadstring", "invoke-expression", "iex"), true,
ProcessCommandLine has_any ("certutil", "bitsadmin") and ProcessCommandLine has_any ("download", "transfer", "urlcache"), true,
ProcessCommandLine has_any ("reg", "registry") and ProcessCommandLine has_any ("add", "delete") and ProcessCommandLine has_any ("run", "runonce"), true,
FileName in~ ("mimikatz.exe", "procdump.exe", "psexec.exe", "cobaltstrike", "beacon.exe"), true,
CommandLineLength > 500, true,
false)
| summarize
ProcessCount = count(),
SuspiciousCount = countif(IsSuspicious),
UniqueProcesses = dcount(FileName),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated),
SampleCommands = make_set(ProcessCommandLine, 5)
by FileName, FolderPath, AccountName, AccountDomain
| where SuspiciousCount > 0 or ProcessCount > 50
| order by SuspiciousCount desc, ProcessCount desc
| take 20
4. Network Connection Events (Outbound connections)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let deviceName = '<DEVICE_NAME>';
DeviceNetworkEvents
| where TimeGenerated between (start .. end)
| where DeviceName startswith deviceName // Use startswith to match both hostname and FQDN
| where ActionType == "ConnectionSuccess"
| where RemoteIPType != "Private" // Focus on public IPs
| summarize
ConnectionCount = count(),
UniqueRemoteIPs = dcount(RemoteIP),
UniqueRemotePorts = dcount(RemotePort),
Protocols = make_set(Protocol, 5),
InitiatingProcesses = make_set(InitiatingProcessFileName, 10),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated)
by RemoteIP, RemotePort, RemoteUrl
| order by ConnectionCount desc
| take 30
5. File Events (File creation/modification/deletion)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let deviceName = '<DEVICE_NAME>';
DeviceFileEvents
| where TimeGenerated between (start .. end)
| where DeviceName startswith deviceName // Use startswith to match both hostname and FQDN
| where ActionType in ("FileCreated", "FileModified", "FileDeleted", "FileRenamed")
| extend FileExtension = tostring(split(FileName, ".")[-1])
| extend IsSuspicious = case(
FileExtension in~ ("exe", "dll", "bat", "cmd", "ps1", "vbs", "js", "hta", "scr", "pif"), true,
FolderPath has_any ("\\temp\\", "\\tmp\\", "\\appdata\\local\\temp", "\\programdata\\", "\\users\\public\\"), true,
false)
| summarize
FileEventCount = count(),
SuspiciousCount = countif(IsSuspicious),
CreatedCount = countif(ActionType == "FileCreated"),
ModifiedCount = countif(ActionType == "FileModified"),
DeletedCount = countif(ActionType == "FileDeleted"),
UniqueFiles = dcount(FileName),
FileExtensions = make_set(FileExtension, 10),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated)
by FolderPath, InitiatingProcessFileName
| where SuspiciousCount > 0 or FileEventCount > 100
| order by SuspiciousCount desc, FileEventCount desc
| take 20
6. Registry Events (Registry modifications)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let deviceName = '<DEVICE_NAME>';
DeviceRegistryEvents
| where TimeGenerated between (start .. end)
| where DeviceName startswith deviceName // Use startswith to match both hostname and FQDN
| where ActionType in ("RegistryValueSet", "RegistryKeyCreated")
| extend IsPersistence = case(
RegistryKey has_any ("\\CurrentVersion\\Run", "\\CurrentVersion\\RunOnce", "\\CurrentVersion\\RunServices"), true,
RegistryKey has_any ("\\Policies\\Explorer\\Run", "\\Active Setup\\Installed Components"), true,
RegistryKey has_any ("\\Image File Execution Options\\", "\\Winlogon\\", "\\BootExecute"), true,
RegistryKey has_any ("\\Services\\", "\\Drivers\\"), true,
false)
| summarize
RegistryEventCount = count(),
PersistenceCount = countif(IsPersistence),
UniqueKeys = dcount(RegistryKey),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated)
by RegistryKey, RegistryValueName, InitiatingProcessFileName
| where PersistenceCount > 0
| order by PersistenceCount desc, RegistryEventCount desc
| take 20
7. Security Incidents Containing Device
let deviceName = '<DEVICE_NAME>';
let deviceId = '<DEVICE_ID>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let relevantAlerts = SecurityAlert
| where TimeGenerated between (start .. end)
| where Entities has deviceName or Entities has deviceId or CompromisedEntity has deviceName
| summarize arg_max(TimeGenerated, *) by SystemAlertId
| project SystemAlertId, AlertName, AlertSeverity, ProviderName, Tactics;
SecurityIncident
| where CreatedTime between (start .. end)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| where not(tostring(Labels) has "Redirected")
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| join kind=inner relevantAlerts on $left.AlertId == $right.SystemAlertId
| extend ProviderIncidentUrl = tostring(AdditionalData.providerIncidentUrl)
| extend OwnerUPN = tostring(Owner.userPrincipalName)
| summarize
Title = any(Title),
Severity = any(Severity),
Status = any(Status),
Classification = any(Classification),
CreatedTime = any(CreatedTime),
LastModifiedTime = any(LastModifiedTime),
OwnerUPN = any(OwnerUPN),
ProviderIncidentUrl = any(ProviderIncidentUrl),
AlertCount = count(),
Tactics = make_set(Tactics)
by ProviderIncidentId
| order by LastModifiedTime desc
| take 10
8. Device Inventory and Configuration Changes
Note: RiskScore is NOT in DeviceInfo - use GetDefenderMachine API for risk/exposure scores.
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let deviceName = '<DEVICE_NAME>';
DeviceInfo
| where TimeGenerated between (start .. end)
| where DeviceName startswith deviceName // Use startswith to match both hostname and FQDN
| summarize arg_max(TimeGenerated, *) by DeviceId
| project
TimeGenerated,
DeviceId,
DeviceName,
OSPlatform,
OSVersion,
OSBuild,
OSArchitecture,
LoggedOnUsers,
MachineGroup,
DeviceCategory,
OnboardingStatus,
SensorHealthState,
ExposureLevel,
IsAzureADJoined,
IsInternetFacing,
JoinType,
PublicIP,
DeviceManualTags,
DeviceDynamicTags,
RegistryDeviceTag
9. Software Inventory on Device
⚠️ DO NOT use Sentinel Data Lake MCP (query_lake) for this query. The DeviceTvmSoftwareInventory table is NOT available in the Sentinel Data Lake. Use Advanced Hunting MCP (RunAdvancedHuntingQuery) only. TVM tables use snapshot ingestion with no TimeGenerated filtering.
let deviceName = '<DEVICE_NAME>';
DeviceTvmSoftwareInventory
| where DeviceName startswith deviceName // Use startswith to match both hostname and FQDN
| project
DeviceName,
SoftwareVendor,
SoftwareName,
SoftwareVersion,
EndOfSupportStatus,
EndOfSupportDate
| summarize by SoftwareVendor, SoftwareName, SoftwareVersion, EndOfSupportStatus, EndOfSupportDate
| order by SoftwareVendor asc, SoftwareName asc // Sort on columns that remain after the summarize
| take 30
10. Vulnerabilities on Device
⚠️ DO NOT use Sentinel Data Lake MCP (query_lake) for this query. The DeviceTvmSoftwareVulnerabilities table is NOT available in the Sentinel Data Lake. Use Advanced Hunting MCP (RunAdvancedHuntingQuery) only. TVM tables use snapshot ingestion with no TimeGenerated filtering.
let deviceName = '<DEVICE_NAME>';
DeviceTvmSoftwareVulnerabilities
| where DeviceName startswith deviceName // Use startswith to match both hostname and FQDN
| project
CveId,
VulnerabilitySeverityLevel,
SoftwareVendor,
SoftwareName,
SoftwareVersion,
RecommendedSecurityUpdate,
RecommendedSecurityUpdateId
| summarize by CveId, VulnerabilitySeverityLevel, SoftwareVendor, SoftwareName, SoftwareVersion, RecommendedSecurityUpdate, RecommendedSecurityUpdateId
| order by case(VulnerabilitySeverityLevel == "Critical", 1, VulnerabilitySeverityLevel == "High", 2, VulnerabilitySeverityLevel == "Medium", 3, 4) asc
| take 30
11. Logon Events on Device
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let deviceName = '<DEVICE_NAME>';
DeviceLogonEvents
| where TimeGenerated between (start .. end)
| where DeviceName startswith deviceName // Use startswith to match both hostname and FQDN
| summarize
LogonCount = count(),
SuccessCount = countif(ActionType == "LogonSuccess"),
FailureCount = countif(ActionType == "LogonFailed"),
UniqueAccounts = dcount(AccountName),
LogonTypes = make_set(LogonType, 5),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated),
RemoteIPs = make_set(RemoteIP, 10)
by AccountName, AccountDomain, LogonType
| order by LogonCount desc
| take 20
12. Threat Intelligence IP Matches (Device Network Traffic)
Performance notes: ThreatIntelIndicators can be large (100K+ rows). Filter IsActive/ValidUntil before string transformations per KQL best practices — reduce data first, transform later. The triple replace_string was replaced with direct array indexing split(...)[0] which returns a clean string.
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let deviceName = '<DEVICE_NAME>';
let device_ips = DeviceNetworkEvents
| where TimeGenerated between (start .. end)
| where DeviceName startswith deviceName // Use startswith to match both hostname and FQDN
| where RemoteIPType != "Private"
| distinct RemoteIP;
ThreatIntelIndicators
| where IsActive and (ValidUntil > now() or isempty(ValidUntil))
| where tostring(split(ObservableKey, ":")[0]) in ("ipv4-addr", "ipv6-addr", "network-traffic")
| where ObservableValue in (device_ips)
| extend Description = tostring(parse_json(Data).description)
| where Description !contains_cs "State: inactive;" and Description !contains_cs "State: falsepos;"
| summarize arg_max(TimeGenerated, *) by ObservableValue
| project
TimeGenerated,
IPAddress = ObservableValue,
ThreatDescription = Description,
Confidence,
ValidUntil,
IsActive
| order by Confidence desc
| take 20
Microsoft Graph Device Queries
Use these Graph API queries in Phase 2 (Batch 3) of investigation workflow
Step 1: Find Device by Name
mcp_microsoft_mcp_microsoft_graph_get("/v1.0/devices?$filter=displayName eq '<DEVICE_NAME>'&$select=id,deviceId,displayName,operatingSystem,operatingSystemVersion,trustType,isCompliant,isManaged,registrationDateTime,approximateLastSignInDateTime,mdmAppId,profileType,manufacturer,model,enrollmentType,deviceOwnership")
Step 2: Get Device Owners
mcp_microsoft_mcp_microsoft_graph_get("/v1.0/devices/<DEVICE_OBJECT_ID>/registeredOwners?$select=id,displayName,userPrincipalName")
Step 3: Get Device Users
mcp_microsoft_mcp_microsoft_graph_get("/v1.0/devices/<DEVICE_OBJECT_ID>/registeredUsers?$select=id,displayName,userPrincipalName")
Step 4: Get BitLocker Recovery Keys (if needed)
mcp_microsoft_mcp_microsoft_graph_get("/v1.0/informationProtection/bitlocker/recoveryKeys?$filter=deviceId eq '<DEVICE_ID>'")
NOTE: Requires BitLockerKey.Read.All permission
Step 5: Get Intune Device Details (if MDM enrolled)
mcp_microsoft_mcp_microsoft_graph_get("/v1.0/deviceManagement/managedDevices?$filter=deviceName eq '<DEVICE_NAME>'&$select=id,deviceName,managedDeviceOwnerType,complianceState,managementAgent,lastSyncDateTime,osVersion,azureADRegistered,azureADDeviceId,deviceEnrollmentType,deviceCategoryDisplayName,serialNumber,userPrincipalName")
Defender for Endpoint Queries
Use these MDE API queries in Phase 2 (Batch 2) of investigation workflow
Get Machine Details
GetDefenderMachine(id="<DEFENDER_DEVICE_ID>")
Returns: id, computerDnsName, osPlatform, osVersion, healthStatus, onboardingStatus, riskScore, exposureLevel, lastSeen, lastIpAddress, lastExternalIpAddress, rbacGroupName, machineTags (API field — maps to DeviceManualTags in AH)
Get Logged-On Users
GetDefenderMachineLoggedOnUsers(id="<DEFENDER_DEVICE_ID>")
Returns: Array of users with accountName, accountDomain, firstSeen, lastSeen, logonTypes
Get Machine Alerts (via API)
Use the ListAlerts MCP tool filtered by device:
ListAlerts with machineId filter
Get Automated Investigations
ListDefenderInvestigations
Filter results by machineId to find investigations related to the device
Get Remediation Activities
ListDefenderRemediationActivities
Filter results by machineId to find remediation tasks for the device
Markdown Report Template
When outputting to markdown file (Mode 2), use this template. Populate ALL sections with actual query data. For sections with no data, use the explicit absence confirmation pattern.
Filename pattern: reports/computer-investigations/computer_investigation_<device_name>_YYYYMMDD_HHMMSS.md
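A minimal sketch of building this filename is shown below. The helper name `report_path` is hypothetical. Treating every character outside a-z and 0-9 as "special", collapsing runs into one underscore, and using UTC for the timestamp are assumptions, since the skill does not define them precisely. The resulting path is meant to be passed to create_file, not written from a terminal.

```python
import re
from datetime import datetime, timezone


def report_path(device_name, now=None):
    """Build the Mode 2 report path for a device."""
    # Assumption: UTC timestamp when the caller does not supply one.
    now = now or datetime.now(timezone.utc)
    # Lowercase, then replace runs of non-alphanumeric characters with "_".
    safe = re.sub(r"[^a-z0-9]+", "_", device_name.lower()).strip("_")
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return (
        "reports/computer-investigations/"
        f"computer_investigation_{safe}_{stamp}.md"
    )


if __name__ == "__main__":
    print(report_path("WS-Finance 01", datetime(2026, 1, 23, 14, 5, 9)))
```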
# Computer Security Investigation Report
**Generated:** YYYY-MM-DD HH:MM UTC
**Workspace:** <workspace_name>
**Device:** `<DEVICE_NAME>`
**OS:** <operating_system> <os_version>
**Trust Type:** <Entra Joined / Hybrid Joined / Entra Registered> (`<trustType>`)
**Compliance:** <Compliant/Non-Compliant> | **Managed:** <Yes/No> | **MDM:** <Intune/None>
**Investigation Period:** <start_date> → <end_date> (<N> days)
**Investigation Type:** <Standard (7d) / Quick (1d) / Comprehensive (30d)>
**Data Sources:** DeviceInfo, DeviceProcessEvents, DeviceNetworkEvents, DeviceFileEvents, DeviceRegistryEvents, DeviceLogonEvents, SigninLogs, SecurityAlert, SecurityIncident, DeviceTvmSoftwareVulnerabilities, DeviceTvmSoftwareInventory, ThreatIntelIndicators, Microsoft Graph API, Defender for Endpoint API
---
## Executive Summary
<2-4 sentence summary: overall device risk level, key findings, most significant alerts or vulnerabilities, and primary recommendation. Ground every claim in evidence from query results.>
**Overall Risk Level:** 🔴 CRITICAL / 🔴 HIGH / 🟠 MEDIUM / 🟡 LOW / 🟢 INFORMATIONAL
---
## Device Profile
| Property | Value |
|----------|-------|
| **Device Name** | `<device_name>` |
| **OS** | <os_platform> <os_version> (<os_build>) |
| **Architecture** | <os_architecture> |
| **Trust Type** | <Entra Joined / Hybrid Joined / Entra Registered> |
| **Compliant** | 🟢 Yes / 🔴 No |
| **Managed** | 🟢 Yes / 🔴 No |
| **Manufacturer** | <manufacturer> |
| **Model** | <model> |
| **Registration Date** | <datetime> |
| **Last Sign-in** | <datetime> |
| **Internet Facing** | 🔴 Yes / 🟢 No |
### Defender for Endpoint Status
| Property | Value |
|----------|-------|
| **Onboarding Status** | 🟢 Onboarded / 🔴 Not Onboarded |
| **Sensor Health** | 🟢 Active / 🟠 Inactive / 🔴 Misconfigured |
| **Health Status** | <health_status> |
| **Risk Score** | 🔴/🟠/🟡/🟢 <None/Low/Medium/High> |
| **Exposure Level** | 🔴/🟠/🟡/🟢 <None/Low/Medium/High> |
| **Last Seen** | <datetime> |
| **Last Internal IP** | <ip_address> |
| **Last External IP** | <ip_address> |
| **Machine Group** | <group_name> |
| **Device Tags** | <comma-separated list from DeviceManualTags + DeviceDynamicTags, or "None"> |
### Device Owners & Registered Users
<If owners/users found:>
| User | UPN | Role |
|------|-----|------|
| <display_name> | <upn> | Owner / Registered User |
<If no owners/users:>
✅ No registered owners or users found for this device.
---
## Key Metrics
| Metric | Value |
|--------|-------|
| **Security Alerts** | <count> (Critical: <n>, High: <n>, Medium: <n>, Low: <n>) |
| **Security Incidents** | <count> (Open: <n>, Closed: <n>) |
| **Logged-On Users** | <count> unique users |
| **Sign-ins from Device** | <count> (Success: <n>, Failed: <n>) |
| **Vulnerabilities** | <count> (Critical: <n>, High: <n>, Medium: <n>) |
| **Suspicious Processes** | <count> flagged |
| **Network Connections** | <count> external IPs |
| **TI Matches** | <count> threat intel hits |
| **End-of-Support Software** | <count> |
---
## Security Alerts
<If alerts found:>
| Time | Alert Name | Severity | Status | Provider | Tactics | Compromised Entity |
|------|-----------|----------|--------|----------|---------|---------------------|
| <datetime> | <alert_name> | 🔴/🟠/🟡 <severity> | <status> | <provider> | <tactics> | <entity> |
**Alert Summary:**
- <X> total alerts (<breakdown by severity>)
- <Brief description of most critical alert(s)>
- Remediation steps: <summary of recommended actions from alert data>
<If no alerts:>
✅ No security alerts detected for this device in the investigation period.
- Checked: SecurityAlert filtered by device name and device ID (0 matches)
---
## Security Incidents
<If incidents found:>
| ID | Title | Severity | Status | Classification | Created | Owner | Alerts | Link |
|----|-------|----------|--------|----------------|---------|-------|--------|------|
| <provider_incident_id> | <title> | 🔴/🟠/🟡 <severity> | <New/Active/Closed> | <TP/FP/BP/—> | <date> | <owner_upn> | <count> | [View](<url>) |
**Incident Summary:**
- <X> total incidents (<Y> open, <Z> closed)
- Highest severity: <level>
- <Brief description of most critical incident>
<If no incidents:>
✅ No security incidents involving this device in the investigation period.
- Checked: SecurityAlert → SecurityIncident join on device name and device ID (0 matches)
---
## Logged-On Users
<If users found:>
| Account | Domain | Logon Type | Logon Count | Success | Failed | First Seen | Last Seen |
|---------|--------|------------|:-----------:|:-------:|:------:|------------|-----------|
| <account_name> | <domain> | <Interactive/RemoteInteractive/Network/etc.> | <count> | <count> | <count> | <date> | <date> |
**User Analysis:**
- <X> unique accounts authenticated on this device
- <Summary of logon patterns — expected vs unexpected accounts, after-hours logons, remote IPs>
<If no logon data:>
✅ No logon events detected for this device in the investigation period.
### Defender Logged-On Users (API)
<If MDE logged-on users found:>
| Account | Domain | First Seen | Last Seen | Logon Types |
|---------|--------|------------|-----------|-------------|
| <account_name> | <domain> | <date> | <date> | <types> |
<If no MDE data:>
✅ No logged-on user data returned from Defender for Endpoint API.
---
## Sign-in Activity (From Device)
<If sign-in events found:>
| Device Name | OS | Trust Type | Compliant | Users | Applications | IPs | Sign-ins | Success | Failed | First Seen | Last Seen |
|-------------|-----|------------|-----------|:-----:|:------------:|:---:|:--------:|:-------:|:------:|------------|-----------|
| <name> | <os> | <trust> | 🟢/🔴 | <count> | <count> | <count> | <count> | <count> | <count> | <date> | <date> |
**Top Users:** <list of UPNs>
**Top Applications:** <list of apps>
**Top IPs:** <list of IPs>
<If no sign-in events:>
✅ No sign-in events found for this device in the investigation period.
---
## Process Activity
<If suspicious processes found:>
| Process | Path | Account | Process Count | Suspicious | Sample Command Lines |
|---------|------|---------|:------------:|:----------:|----------------------|
| <filename> | <folder_path> | <account_name> | <count> | 🔴 <count> | <truncated_command> |
**Process Analysis:**
- <X> suspicious process executions detected
- <Summary of suspicious patterns — encoded commands, LOLBins, credential dumping tools, long command lines>
<If no suspicious processes:>
✅ No suspicious process activity detected on this device in the investigation period.
- Checked: DeviceProcessEvents filtered for suspicious indicators (0 flagged)
---
## Network Connections
<If external connections found:>
| Remote IP | Remote Port | URL | Connections | Unique Ports | Protocols | Initiating Processes | First Seen | Last Seen |
|-----------|:-----------:|-----|:-----------:|:------------:|-----------|----------------------|------------|-----------|
| <ip> | <port> | <url> | <count> | <count> | <protocols> | <process_list> | <date> | <date> |
**Network Summary:**
- <X> unique external IPs contacted
- <Y> unique remote ports
- <Top initiating processes>
<If no external connections:>
✅ No external network connections detected for this device in the investigation period.
### Threat Intelligence Matches
<If TI matches found:>
| IP Address | Threat Description | Confidence | Valid Until | Active |
|------------|-------------------|:----------:|------------|:------:|
| <ip> | <description> | <score> | <date> | ✅/❌ |
<If no TI matches:>
✅ No threat intelligence matches found for device network traffic.
- Checked: ThreatIntelIndicators joined with device external IPs (0 matches)
---
## File Activity
<If suspicious file events found:>
| Folder Path | Initiating Process | Total Events | Suspicious | Created | Modified | Deleted | Extensions | First Seen | Last Seen |
|-------------|-------------------|:------------:|:----------:|:-------:|:--------:|:-------:|------------|------------|-----------|
| <path> | <process> | <count> | 🔴 <count> | <count> | <count> | <count> | <ext_list> | <date> | <date> |
**File Activity Analysis:**
- <X> suspicious file operations detected
- <Summary — executable drops in temp folders, script creation, mass file modifications>
<If no suspicious file events:>
✅ No suspicious file activity detected on this device in the investigation period.
- Checked: DeviceFileEvents for suspicious extensions and temp folder activity (0 flagged)
---
## Registry Modifications
<If persistence-related registry events found:>
| Registry Key | Value Name | Initiating Process | Total Events | Persistence | First Seen | Last Seen |
|-------------|------------|-------------------|:------------:|:-----------:|------------|-----------|
| <key> | <value_name> | <process> | <count> | 🔴 <count> | <date> | <date> |
**Registry Analysis:**
- <X> persistence-related registry modifications detected
- <Summary — Run keys, services, Winlogon, IFEO modifications>
<If no persistence registry events:>
✅ No persistence-related registry modifications detected on this device in the investigation period.
- Checked: DeviceRegistryEvents for Run/RunOnce/Services/Winlogon/IFEO keys (0 flagged)
---
## Vulnerabilities
<If vulnerabilities found:>
| CVE ID | Severity | Vendor | Software | Version | Security Update |
|--------|----------|--------|----------|---------|-----------------|
| <cve_id> | 🔴/🟠/🟡 <severity> | <vendor> | <software> | <version> | <update_id> |
**Vulnerability Summary:**
- <X> total vulnerabilities (Critical: <n>, High: <n>, Medium: <n>, Low: <n>)
- <Most critical CVEs and their remediation status>
<If no vulnerabilities:>
✅ No known vulnerabilities detected on this device.
- Checked: DeviceTvmSoftwareVulnerabilities (0 records)
---
## Software Inventory
<If notable software found:>
| Vendor | Software | Version | End of Support | EOS Date |
|--------|----------|---------|:--------------:|----------|
| <vendor> | <software> | <version> | 🔴 Yes / 🟢 No | <date> |
**Software Summary:**
- <X> total software packages installed
- <Y> end-of-support software detected
- <Notable findings — outdated browsers, deprecated runtimes, risky applications>
<If no software data:>
✅ No software inventory data available for this device.
- Checked: DeviceTvmSoftwareInventory (0 records)
---
## Device Configuration
<If configuration data available:>
| Property | Value |
|----------|-------|
| **Public IP** | <ip> |
| **Machine Group** | <group> |
| **Device Category** | <category> |
| **Onboarding Status** | <status> |
| **Sensor Health** | <health> |
| **Exposure Level** | <level> |
| **Azure AD Joined** | <Yes/No> |
| **Internet Facing** | <Yes/No> |
| **Join Type** | <type> |
---
## IP Intelligence
<Table of external IPs from network connections and sign-in data. Run `enrich_ips.py` for top IPs.>
| IP Address | Source | Location | ISP/Org | VPN | Abuse Score | Reports | Risk |
|------------|--------|----------|---------|-----|-------------|---------|------|
| <ip> | 🔵 Network / 🔵 Sign-in / 🔴 TI Match | <city, country> | <org> | 🟢 No / 🔴 Yes | <score>% | <count> | HIGH/MED/LOW |
---
## Risk Assessment
### Risk Score: <XX>/100 — 🔴 CRITICAL / 🔴 HIGH / 🟠 MEDIUM / 🟡 LOW / 🟢 INFORMATIONAL
### Risk Factors
| Factor | Finding |
|--------|---------|
| 🔴/🟠/🟡 **<Factor Name>** | <Evidence-grounded finding with specific numbers> |
### Mitigating Factors
| Factor | Finding |
|--------|---------|
| 🟢 **<Factor Name>** | <Evidence-grounded finding with specific numbers> |
---
## Recommendations
### Critical Actions
<Numbered list of critical actions with evidence. Only include if critical findings exist.>
### High Priority Actions
<Numbered list of high-priority actions with evidence.>
### Monitoring Actions (14-Day Follow-Up)
<Bulleted list of ongoing monitoring recommendations.>
---
## Appendix: Query Details
| # | Query | Table(s) | Tool | Records | Execution |
|---|-------|----------|------|--------:|----------:|
| 1 | Device Sign-In Events | SigninLogs | Data Lake | <count> | <time> |
| 2 | Security Alerts | SecurityAlert | Data Lake | <count> | <time> |
| 3 | Process Events | DeviceProcessEvents | Data Lake | <count> | <time> |
| 4 | Network Connections | DeviceNetworkEvents | Data Lake | <count> | <time> |
| 5 | File Events | DeviceFileEvents | Data Lake | <count> | <time> |
| 6 | Registry Events | DeviceRegistryEvents | Data Lake | <count> | <time> |
| 7 | Security Incidents | SecurityAlert, SecurityIncident | Data Lake | <count> | <time> |
| 8 | Device Inventory | DeviceInfo | Data Lake | <count> | <time> |
| 9 | Software Inventory | DeviceTvmSoftwareInventory | Advanced Hunting | <count> | <time> |
| 10 | Vulnerabilities | DeviceTvmSoftwareVulnerabilities | Advanced Hunting | <count> | <time> |
| 11 | Logon Events | DeviceLogonEvents | Data Lake | <count> | <time> |
| 12 | Threat Intelligence | ThreatIntelIndicators, DeviceNetworkEvents | Data Lake | <count> | <time> |
| — | Device Profile | Microsoft Graph API | Graph | 1 | <time> |
| — | Device Owners/Users | Microsoft Graph API | Graph | <count> | <time> |
| — | Machine Details | Defender for Endpoint API | MDE | 1 | <time> |
| — | Logged-On Users | Defender for Endpoint API | MDE | <count> | <time> |
*Query definitions: see the Sample KQL Queries section in this SKILL.md file.*
**Do NOT include full KQL text in the appendix** — the canonical queries are already documented in this SKILL.md file. The appendix serves as an audit trail only.
---
**Investigation Timeline:**
- [MM:SS] ✓ Phase 1: Device ID retrieval (<X>s)
- [MM:SS] ✓ Phase 2: Parallel data collection (<X>s)
- [MM:SS] ✓ IP Enrichment (<X>s)
- [MM:SS] ✓ Phase 3: Report generation (<X>s)
- **Total Investigation Time:** <duration>
Markdown Report Authoring Guidelines
- Populate every section — even if data is empty. Use the `✅ No <X> detected...` pattern for empty sections.
- Never invent data — follow the Evidence-Based Analysis global rule strictly. Every number in the report must come from a query result.
- Risk assessment is dynamic — calculate risk score using the weighted framework in the Risk Assessment Framework section (Defender Risk Score 25%, Active Alerts 25%, Vulnerabilities 20%, Compliance Status 15%, Sign-in Anomalies 15%).
- IP enrichment — run `enrich_ips.py` for external IPs from network connections and sign-in data. If `enrich_ips.py` is unavailable, use Sentinel ThreatIntelIndicators data as fallback.
- PII-Free — the report file is saved to `reports/` which is gitignored. However, exercise caution with any files that may be shared externally.
- Emoji consistency — follow the Emoji Formatting table from `copilot-instructions.md` for all risk/status indicators.
- Query appendix — include record counts and execution times but NOT full KQL text. Reference the SKILL.md query numbers.
- Trust type context — always reference the device trust type in the Executive Summary and Risk Assessment, as it affects the security implications.
JSON Export Structure
Export MCP query results to a single JSON file with these required keys:
{
"device_name": "WORKSTATION-001",
"device_id_entra": "<ENTRA_DEVICE_OBJECT_ID>",
"device_id_defender": "<DEFENDER_DEVICE_ID>",
"device_type": "HybridJoined",
"investigation_date": "2026-01-23",
"start_date": "2026-01-16",
"end_date": "2026-01-25",
"timestamp": "20260123_143200",
"device_profile": {
"displayName": "WORKSTATION-001",
"operatingSystem": "Windows",
"operatingSystemVersion": "10.0.22621.3007",
"trustType": "ServerAd",
"isCompliant": true,
"isManaged": true,
"registrationDateTime": "2025-06-15T10:30:00Z",
"approximateLastSignInDateTime": "2026-01-23T14:00:00Z",
"manufacturer": "Dell Inc.",
"model": "Latitude 5520"
},
"defender_profile": {
"healthStatus": "Active",
"riskScore": "Medium",
"exposureLevel": "Low",
"onboardingStatus": "Onboarded",
"sensorHealthState": "Active",
"lastSeen": "2026-01-23T14:30:00Z",
"lastIpAddress": "10.0.1.50",
"lastExternalIpAddress": "203.0.113.42"
},
"device_owners": [...],
"device_users": [...],
"signin_events": [...],
"security_alerts": [...],
"process_events": [...],
"network_events": [...],
"file_events": [...],
"registry_events": [...],
"incidents": [...],
"logged_on_users": [...],
"software_inventory": [...],
"vulnerabilities": [...],
"automated_investigations": [...],
"remediation_activities": [...],
"threat_intel_matches": [...],
"summary": {
"total_alerts": 5,
"critical_alerts": 1,
"high_alerts": 2,
"medium_alerts": 2,
"low_alerts": 0,
"total_vulnerabilities": 15,
"critical_vulnerabilities": 2,
"unique_logged_on_users": 3,
"suspicious_processes": 4,
"threat_intel_hits": 1
}
}
Error Handling
Common Issues and Solutions
| Issue | Solution |
|---|---|
| Device not found in Graph API | Try searching by deviceId instead of displayName, check case sensitivity |
| Defender Device ID not matching | Use Advanced Hunting to find correct Defender ID by device name |
| DeviceName query returns empty | Use startswith instead of =~ - DeviceName often contains FQDN (e.g., hostname.domain.com) |
| SigninLogs DeviceDetail fails with union | DeviceDetail is dynamic in SigninLogs but string in AADNonInteractiveUserSignInLogs - query tables separately, don't use union isfuzzy=true with DeviceDetail filtering |
| RiskScore column not found | RiskScore is NOT in DeviceInfo table - use GetDefenderMachine API for riskScore |
| Missing compliance data | Device may not be MDM enrolled - check isManaged field |
| No process events | Device may not be onboarded to Defender for Endpoint |
| Trust type is null | Device may be partially registered - check registrationDateTime |
| Query timeout on DeviceEvents | Reduce date range or add more specific filters |
| BitLocker query fails | Verify permissions and that BitLocker is enabled on device |
Required Field Defaults
{
"trustType": "Workplace",
"isCompliant": false,
"isManaged": false,
"approximateLastSignInDateTime": "1970-01-01T00:00:00Z",
"riskScore": "Unknown",
"exposureLevel": "Unknown",
"healthStatus": "Unknown"
}
Empty Result Handling
{
"signin_events": [],
"security_alerts": [],
"process_events": [],
"network_events": [],
"file_events": [],
"registry_events": [],
"incidents": [],
"logged_on_users": [],
"software_inventory": [],
"vulnerabilities": [],
"automated_investigations": [],
"remediation_activities": [],
"threat_intel_matches": []
}
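The two blocks above can be applied mechanically before export. The following is a minimal Python sketch, not the skill's actual implementation: the helper names `fill_field_defaults` and `fill_empty_collections` are hypothetical, and because the source does not say whether the flat defaults are applied to `device_profile`, `defender_profile`, or a merged view, the first helper simply works on whichever flat dict it is given.

```python
import copy

# Defaults copied from the "Required Field Defaults" block.
FIELD_DEFAULTS = {
    "trustType": "Workplace",
    "isCompliant": False,
    "isManaged": False,
    "approximateLastSignInDateTime": "1970-01-01T00:00:00Z",
    "riskScore": "Unknown",
    "exposureLevel": "Unknown",
    "healthStatus": "Unknown",
}

# Keys copied from the "Empty Result Handling" block.
COLLECTION_KEYS = [
    "signin_events", "security_alerts", "process_events", "network_events",
    "file_events", "registry_events", "incidents", "logged_on_users",
    "software_inventory", "vulnerabilities", "automated_investigations",
    "remediation_activities", "threat_intel_matches",
]


def fill_field_defaults(fields: dict) -> dict:
    """Hypothetical helper: replace missing or null fields with the defaults."""
    result = copy.deepcopy(fields)
    for key, default in FIELD_DEFAULTS.items():
        if result.get(key) is None:
            result[key] = default
    return result


def fill_empty_collections(export: dict) -> dict:
    """Hypothetical helper: make sure every collection key holds a list."""
    result = copy.deepcopy(export)
    for key in COLLECTION_KEYS:
        if result.get(key) is None:
            result[key] = []
    return result
```

Both helpers copy their input, so the raw query results stay untouched for the audit trail.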
Device Trust Type Analysis
Security Implications by Trust Type
Entra Joined (trustType: AzureAd)
- Pros: Full cloud management, Conditional Access enforcement, BitLocker key escrow
- Cons: No access to on-premises resources without VPN/Azure AD Application Proxy
- Investigation Focus: Cloud sign-in patterns, Intune compliance, Conditional Access logs
Hybrid Joined (trustType: ServerAd)
- Pros: Access to both cloud and on-premises resources, GPO support
- Cons: Complex identity, dual token handling, potential for on-prem compromise to affect cloud
- Investigation Focus: BOTH cloud and on-premises sign-ins, AD replication, Kerberos tickets
Entra Registered (trustType: Workplace)
- Pros: BYOD support, minimal device management overhead
- Cons: Limited compliance enforcement, device not fully controlled
- Investigation Focus: User activity on device, data access patterns, potential data exfiltration
Risk Assessment Framework
Device Risk Scoring
| Factor | Weight | High Risk Indicators |
|---|---|---|
| Defender Risk Score | 25% | "High" or "Critical" |
| Active Alerts | 25% | Any Critical/High severity alerts |
| Vulnerabilities | 20% | Critical CVEs, end-of-support software |
| Compliance Status | 15% | Non-compliant, not managed |
| Sign-in Anomalies | 15% | Multiple users, unusual hours, new IPs |
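As a rough illustration of how the weights in the table combine, here is a small Python sketch. The weights come from the table; everything else is an assumption: the source does not define how each factor is scored, so the sketch assumes each factor has already been reduced to a 0-100 sub-score, and the function name `weighted_risk_score` is hypothetical.

```python
# Weights (in percent) copied from the Device Risk Scoring table.
WEIGHTS = {
    "defender_risk_score": 25,
    "active_alerts": 25,
    "vulnerabilities": 20,
    "compliance_status": 15,
    "signin_anomalies": 15,
}


def weighted_risk_score(factor_scores: dict) -> int:
    """Combine per-factor sub-scores (assumed 0-100 each) into one 0-100 score."""
    total = 0
    for factor, weight in WEIGHTS.items():
        score = factor_scores.get(factor, 0)  # a missing factor contributes nothing
        if not 0 <= score <= 100:
            raise ValueError(f"{factor} sub-score must be between 0 and 100")
        total += weight * score
    return round(total / 100)
```

Under these assumptions, a device whose only finding is a maximal Active Alerts sub-score lands at 25, which shows why the rule-based levels listed next still matter: a single critical alert should raise the level regardless of the blended number.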
Risk Level Determination
- Critical: Active critical alert OR critical vulnerability being exploited
- High: High severity alerts OR critical unpatched vulnerabilities OR compromised user logged on
- Medium: Medium alerts OR high vulnerabilities OR non-compliance
- Low: Minor alerts OR low vulnerabilities, device is compliant and healthy
- Informational: No alerts, compliant, healthy sensor
Integration with Main Copilot Instructions
This skill follows all patterns from the main copilot-instructions.md:
- Date range handling: Uses +2 day rule for real-time searches
- Parallel execution: Runs independent queries simultaneously
- Time tracking: Mandatory reporting after each phase
- Token management: Uses `create_file` for all output
- Follow-up analysis: Reference `copilot-instructions.md` for cross-entity correlation
Example invocations:
- "Investigate device WORKSTATION-001 for the last 7 days"
- "Quick security check on computer LAP-JSMITH01"
- "Full investigation for potentially compromised endpoint SRV-DC01 last 30 days"
- "Check hybrid joined device DESKTOP-HR01 for malware"
- "Analyze BYOD device iPad-John for suspicious activity"
SVG Dashboard Generation
After generating a computer investigation report (markdown file output), an SVG dashboard can be created using the shared SVG rendering skill.
Trigger: User asks "generate an SVG dashboard from the report" or "visualize this report"
Workflow:
- Read this skill's `svg-widgets.yaml` (widget manifest — defines layout, colors, field mapping)
- Read `.github/skills/svg-dashboard/SKILL.md` (rendering rules — component library, quality standards)
- Extract data from the completed report using `data_sources.field_mapping_notes`
- Render SVG → save as `{report_basename}_dashboard.svg` in the same directory
Layout: 5 rows — title banner, risk score card + KPI cards (alerts/incidents/vulnerabilities/users/EOS software), alerts by MITRE tactic bar chart + vulnerabilities by severity bar chart, incidents table + risk/mitigating factors table, assessment banner + recommendations.
Last Updated: March 24, 2026
.github/skills/data-security-analysis/SKILL.md
npx skills add SCStelz/security-investigator --skill data-security-analysis -g -y
SKILL.md
Frontmatter
{
"name": "data-security-analysis",
"description": "Analyze data security events, sensitive information type (SIT) access, sensitivity label access, DLP matches, or Purview insider risk activity. Triggers on keywords like \"data security\", \"sensitive information type\", \"SIT access\", \"DLP events\", \"DataSecurityEvents\", \"EDM access\", \"credit card access\", \"insider risk activity\", \"Purview data security\", \"sensitivity label\", \"label downgrade\", \"label change\", \"Copilot label exposure\". Queries DataSecurityEvents in Advanced Hunting to produce SIT and label access analysis: volume breakdowns, user drill-downs, file inventories, action type distribution, DLP correlation, label change tracking, Copilot label exposure, temporal patterns, and risk-ranked user summaries. Inline chat or markdown output. Designed for large environments (100k+ users) with tiered drill-down.",
"drill_down_prompt": "Analyze data security events — SIT access patterns, label changes, DLP policy matches",
"threat_pulse_domains": [
"admin",
"cloud"
]
}
Data Security Events Analysis — Instructions
Purpose
This skill analyzes DataSecurityEvents (Microsoft Purview Insider Risk Management / DLP telemetry) to answer questions about who accessed documents containing sensitive information types (SITs) and/or sensitivity labels — including EDM (Exact Data Match), built-in SITs (credit cards, SSNs, etc.), trainable classifiers, and Microsoft Purview sensitivity labels (Confidential, Highly Confidential, custom labels, etc.).
Primary Table: DataSecurityEvents (Defender XDR Advanced Hunting)
| Use Case | Example Question |
|---|---|
| SIT access audit | "Who accessed files with credit card numbers in the last 30 days?" |
| EDM monitoring | "Show me all access to documents matching our EDM SIT" |
| DLP event analysis | "What DLP policy matches occurred this week?" |
| Insider risk triage | "Which users have the most sensitive data interactions?" |
| SIT landscape overview | "What sensitive information types exist in our environment?" |
| Sensitivity label audit | "Who accessed Highly Confidential labeled documents?" |
| Label change tracking | "Show me all label downgrades in the last 30 days" |
| Copilot label exposure | "What labeled documents did Copilot access in risky interactions?" |
📑 TABLE OF CONTENTS
- Critical Workflow Rules - Start here!
- SIT GUID Mapping Strategy - How SIT GUIDs are resolved to names
- Label GUID Mapping Strategy - How sensitivity label GUIDs are resolved to names
- Output Modes - Inline chat vs. Markdown file
- Quick Start - 8-step execution pattern
- Execution Workflow - 6-phase analysis process
- Sample KQL Queries - Validated query patterns (Queries 1-16d)
- Report Template - Rendering rules (15 rules) + output format specification
- Known Pitfalls - Table quirks and edge cases (27 entries)
- Error Handling - Troubleshooting guide
- SVG Dashboard Generation - Visual dashboard from report data
Investigation shortcuts:
- DLP/exfiltration incident entities (TP Q1): Q3 (top users by SIT volume) → Q6 (DLP policy matches) → Q9 (single-user SIT profile for each incident entity) → Q10b (file-based spikes)
- High-volume mailbox API access (TP Q9): Q9 (single-user SIT profile for API actors) → Q4 (top files accessed) → Q10b (file-based spikes) → Q6 (DLP policy matches)
- Risky identity with data access (TP Q3): Q9 (single-user SIT profile) → Q4 (top files) → Q13 (label downgrade/changes by user)
- Copilot sensitive data exposure (TP Q1 Copilot incidents, or TP Q10 AppRegistration with AI keywords): Q16a (Copilot SIT landscape + agent/human split) → Q16b (top human users, high-priority SITs) → Q16d (prompt-only risk signal)
- Label compliance / downgrade alert (TP Q1 label-related incidents): Q13 (label changes) → Q15 (label-only events) → Q14 (Copilot label exposure)
- Tenant-wide data security posture (standalone, no TP trigger): Full Phase 1–5 workflow
⛔ Shortcut Default Rule: When a matching shortcut exists for the investigation context, use it — don't run the full workflow. Only run the full Phase 1-5 sequence when the user explicitly requests "full analysis", "comprehensive", or "tenant-wide overview". Shortcuts render only the report sections relevant to their query chain (plus Executive Summary and Recommendations, always).
When invoked from a parent skill (threat-pulse, incident-investigation, user-investigation):
- Inherit the workspace selection from the parent investigation context
- Skip output mode prompts — default to inline chat (the parent skill controls the final output format)
- Match the TP Q# trigger to the shortcuts above and execute that chain with entity substitution
- Use 30d lookback (AH default) unless the parent specifies otherwise
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
Before starting ANY data security analysis:
- ALWAYS use `RunAdvancedHuntingQuery` — DataSecurityEvents is an Advanced Hunting table, NOT available in Sentinel Data Lake
- ALWAYS run Query 1 (SIT Discovery) first — establishes which SITs are active and builds the GUID-to-Name mapping
- ALWAYS use `summarize` aggressively — this table can have 600k+ rows in 30 days even in mid-size tenants. NEVER retrieve raw rows except for targeted samples
- ALWAYS pre-filter with `has` before `mv-expand` on `SensitiveInfoTypeInfo` — the `has "<GUID>"` filter avoids expensive expansion on non-matching rows
- ALWAYS use `tostring()` + double `parse_json()` for SensitiveInfoTypeInfo — it's `Collection(String)`, not native dynamic
- NEVER report SIT GUIDs without attempting name resolution — use the mapping strategy below
- ALWAYS ask for output mode if not specified: inline chat or markdown file
- Prerequisite: DataSecurityEvents requires Insider Risk Management opt-in to share data with Defender XDR. If the table returns 0 rows or "table not found", inform the user of this requirement
- ALWAYS run the Label Coverage Assessment (Query 11 quick stats variant) during Phase 1 to determine if this environment has significant label usage. Adapt the report accordingly (see Rule 11)
- NEVER report sensitivity label GUIDs without attempting name resolution — use the label mapping strategy below
- ALWAYS use `split()` on `SensitivityLabelId` — this column can contain comma-separated GUIDs (one per sub-entity), not a single GUID
⛔ PROHIBITED ACTIONS
| Action | Status |
|---|---|
| Querying DataSecurityEvents via `mcp_sentinel-data_query_lake` | ❌ PROHIBITED — AH-only table |
| Retrieving raw rows without `summarize` or `take` limit | ❌ PROHIBITED — table is massive |
| Reporting SIT GUIDs without name resolution attempt | ❌ PROHIBITED |
| Reporting sensitivity label GUIDs without name resolution attempt | ❌ PROHIBITED |
| Running `mv-expand` on `SensitiveInfoTypeInfo` without pre-filtering with `has` | ❌ PROHIBITED — performance killer at scale |
| Assuming `SensitiveInfoTypeInfo` is native `dynamic` | ❌ PROHIBITED — it's `Collection(String)`, requires double-parse |
SIT GUID Mapping Strategy
The Problem
DataSecurityEvents.SensitiveInfoTypeInfo contains SIT GUIDs, not human-readable names. SIT GUIDs fall into three categories:
| Category | Resolvable via KQL? | Example |
|---|---|---|
| Built-in Microsoft SITs | ✅ Yes — use embedded mapping | 50842eb7-...-b085 → "Credit Card Number" |
| Custom/EDM SITs | ❌ No — org-specific GUIDs | b28fcea1-...-9291 → "Project Obsidian" (custom) |
| Trainable Classifiers (ML) | ❌ No — `ClassifierType: "MLModel"` | 77a140be-...-7560 → unknown ML classifier |
Resolution Strategy (3 tiers, in order)
Tier 1: Embedded Well-Known SIT Mapping (instant, no auth)
The query library below includes a datatable of the most common Microsoft SIT GUIDs encountered in production environments. This covers ~90% of detections in typical tenants.
Tier 2: User-Provided Custom SIT Mapping (config-driven)
If the user has custom/EDM SITs, they can provide a mapping in config.json under a sit_mapping key:
{
"sit_mapping": {
"<custom-sit-guid-1>": "Your Custom SIT Name",
"<custom-sit-guid-2>": "Your EDM SIT Name"
}
}
At skill startup: Check if config.json has a sit_mapping section. If yes, merge it into the KQL datatable for name resolution.
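The merge step can be pictured as rendering each `sit_mapping` entry as one extra row of the KQL `datatable`. The sketch below is illustrative only: the function name `sit_mapping_rows` is hypothetical, and it assumes that a JSON-style double-quoted literal (as produced by `json.dumps`) is an acceptable KQL string literal for GUIDs and display names.

```python
import json


def sit_mapping_rows(config_text: str) -> list[str]:
    """Render config.json sit_mapping entries as KQL datatable rows."""
    config = json.loads(config_text)
    mapping = config.get("sit_mapping", {})  # the section is optional
    rows = []
    for guid, name in mapping.items():
        # json.dumps adds the surrounding quotes and escapes embedded quotes.
        guid_literal = json.dumps(guid)
        name_literal = json.dumps(name, ensure_ascii=False)
        rows.append(f"    {guid_literal}, {name_literal},")
    return rows


example = '{"sit_mapping": {"<custom-sit-guid-1>": "Your Custom SIT Name"}}'
print("\n".join(sit_mapping_rows(example)))
```

The generated rows would be pasted into the datatable ahead of its closing row, so the built-in entries and the custom ones resolve through the same lookup.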
Tier 3: PowerShell Resolution (optional, on-demand)
If unresolved GUIDs remain after Tier 1+2, offer to resolve them via PowerShell:
"I found N SIT GUIDs that aren't in the built-in mapping. Would you like me to resolve them via `Get-DlpSensitiveInformationType`? This requires an active Security & Compliance PowerShell session (`Connect-IPPSSession`)."
If the user agrees:
# Requires: Install-Module ExchangeOnlineManagement
# Requires: Connect-IPPSSession -UserPrincipalName <UPN>
Get-DlpSensitiveInformationType -Identity "<GUID>" | Select-Object Name, Id, Publisher
After resolution: Offer to save the mapping to config.json for future runs.
Post-Resolution Persistence (MANDATORY)
After Tier 3 PowerShell resolution completes, always offer to persist the resolved GUIDs:
"I resolved N SIT GUIDs via PowerShell. Would you like me to save these to `config.json` under `sit_mapping` so future runs resolve them automatically via Tier 2?"
If the user agrees, read the current config.json, add/merge a sit_mapping object with the resolved GUIDs, and write it back. Format:
{
"sit_mapping": {
"<guid>": "<resolved-name>",
"<guid>": "<resolved-name>"
}
}
Why this matters: Without persistence, every new session re-encounters the same unresolved GUIDs. The first report in a workspace should resolve and persist; subsequent runs benefit automatically.
Trainable Classifiers
GUIDs with ClassifierType: "MLModel" are trainable classifiers and may not resolve via Get-DlpSensitiveInformationType. Display them as:
- `[ML Classifier] <GUID>` if unresolved
- Check if the GUID appears in the well-known mapping (some trainable classifiers have known GUIDs)
Label GUID Mapping Strategy
The Problem
DataSecurityEvents has 4 label-related columns, all containing sensitivity label GUIDs (not names):
| Column | Type | Content |
|---|---|---|
| `SensitivityLabelId` | string | Label on the document at event time. Can contain comma-separated GUIDs (one per sub-entity) |
| `PreviousSensitivityLabelId` | string | Previous label — only populated on label-change events (downgrade, removal) |
| `SharepointSiteSensitivityLabelId` | string | Label on the SharePoint site (not the document) |
| `RiskyAIUsageSensitivityLabelsInfo` | Collection(String) | Labels on resources Copilot accessed in risky AI events — JSON array of objects with SubEntityId, SubEntityName, SensitivityLabelId |
Resolution Strategy (3 tiers, in order)
Tier 1: Embedded Well-Known Label Mapping (instant, no auth)
The query library includes a datatable of Microsoft default sensitivity labels (the defa4170-* GUID family). All 12 default labels — including parent labels — use the deterministic pattern defa4170-0d19-0005-XXXX-bc88714345d2, confirmed across multiple tenants.
⚠️ Important: Microsoft does not publish default label GUIDs in official documentation. The GUID pattern is confirmed via `Get-Label` on default-configuration tenants. Older tenants may have renamed default labels (e.g., "Non-business" instead of "Personal", "Internal exception" instead of "Anyone (unrestricted)") or replaced default parent label GUIDs with random tenant-specific ones. Always validate with `Get-Label` (Tier 3) when accuracy matters.
Default Label GUID Pattern: defa4170-0d19-0005-XXXX-bc88714345d2 — complete mapping (12 labels, priority-ordered):
| GUID suffix | Priority | Default Name | Parent |
|---|---|---|---|
| `0000` | 0 | Personal | (top-level) |
| `0001` | 1 | Public | (top-level) |
| `0002` | 2 | General | (top-level) |
| `0003` | 3 | Anyone (unrestricted) | General |
| `0004` | 4 | All Employees (unrestricted) | General |
| `0005` | 5 | Confidential | (top-level, parent) |
| `0006` | 6 | Anyone (unrestricted) | Confidential |
| `0007` | 7 | All Employees | Confidential |
| `0008` | 8 | Trusted People | Confidential |
| `0009` | 9 | Highly Confidential | (top-level, parent) |
| `000a` | 10 | All Employees | Highly Confidential |
| `000b` | 11 | Specified People | Highly Confidential |
Older/customized tenants: Admins may have renamed default labels or deleted and recreated parent labels with random GUIDs. If `Get-Label` returns a different GUID for "Confidential" or "Highly Confidential" (not matching `defa4170-*`), the tenant has custom parent labels — add them via `config.json` (Tier 2).
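Because the GUID suffix is just the label priority written as four hex digits, the Tier 1 mapping can be generated rather than typed out. The Python sketch below illustrates that pattern using the names from the table above. The helper name `default_label_guid` is hypothetical, and the result holds only for tenants that still use the default GUIDs.

```python
# (name, parent) pairs in priority order, copied from the default label table.
DEFAULT_LABELS = [
    ("Personal", None),
    ("Public", None),
    ("General", None),
    ("Anyone (unrestricted)", "General"),
    ("All Employees (unrestricted)", "General"),
    ("Confidential", None),
    ("Anyone (unrestricted)", "Confidential"),
    ("All Employees", "Confidential"),
    ("Trusted People", "Confidential"),
    ("Highly Confidential", None),
    ("All Employees", "Highly Confidential"),
    ("Specified People", "Highly Confidential"),
]


def default_label_guid(priority: int) -> str:
    """Build the default label GUID for a priority (0-11) from the documented pattern."""
    return f"defa4170-0d19-0005-{priority:04x}-bc88714345d2"


# GUID -> (label name, parent name or None)
TIER1_LABELS = {
    default_label_guid(priority): label
    for priority, label in enumerate(DEFAULT_LABELS)
}
```

Treat the generated mapping as a starting point and let Tier 2 or Tier 3 results override it wherever the tenant differs.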
Tier 2: User-Provided Custom Label Mapping (config-driven)
Custom labels (org-created) have random GUIDs. Users can provide a mapping in config.json:
{
"label_mapping": {
"<custom-label-guid-1>": "Your Custom Label",
"<sub-label-guid>": "Sub-Label Name|Parent Label Name",
"<parent-label-guid>": "Confidential"
}
}
Value format: "LabelName" for top-level labels, "LabelName|ParentName" (pipe-delimited) for sub-labels. When building the KQL datatable, split on | to populate LabelName and LabelParent columns.
Renamed defaults: If a tenant has renamed default labels (e.g., defa4170...0000 → "Non-business" instead of "Personal"), include the renamed GUID in label_mapping — Tier 2 entries override Tier 1 defaults.
At skill startup: Check if config.json has a label_mapping section. If yes, merge it into the KQL datatable for name resolution. Tier 2 entries take precedence over Tier 1 defaults for the same GUID.
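The value format and the precedence rule can be sketched in a few lines of Python. This is an illustration under assumptions, not the skill's implementation: the function names are hypothetical, both tiers are assumed to be plain GUID-to-string dictionaries using the `"LabelName|ParentName"` convention, and GUID keys are compared exactly as written.

```python
def parse_label_value(value: str) -> tuple[str, str]:
    """Split "LabelName|ParentName" into (name, parent); parent is "" for top-level labels."""
    name, _, parent = value.partition("|")
    return name.strip(), parent.strip()


def merge_label_mappings(tier1: dict[str, str], tier2: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Merge both tiers; Tier 2 entries replace Tier 1 entries for the same GUID."""
    merged = {guid: parse_label_value(value) for guid, value in tier1.items()}
    merged.update({guid: parse_label_value(value) for guid, value in tier2.items()})
    return merged


tier1 = {"defa4170-0d19-0005-0000-bc88714345d2": "Personal"}
tier2 = {
    "defa4170-0d19-0005-0000-bc88714345d2": "Non-business",  # renamed default
    "<sub-label-guid>": "Sub-Label Name|Parent Label Name",
}
merged = merge_label_mappings(tier1, tier2)
```

Each merged tuple maps directly onto the `LabelName` and `LabelParent` columns mentioned above.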
Tier 3: PowerShell Resolution (optional, on-demand)
If unresolved label GUIDs remain after Tier 1+2, offer to resolve them via PowerShell:
"I found N label GUIDs that aren't in the built-in mapping. Would you like me to resolve them via `Get-Label`? This requires an active Security & Compliance PowerShell session (`Connect-IPPSSession`)."
If the user agrees:
# Requires: Install-Module ExchangeOnlineManagement
# Requires: Connect-IPPSSession -UserPrincipalName <UPN>
Get-Label | Select-Object DisplayName, @{N='LabelGuid';E={$_.Guid.ToString()}}, ParentLabelDisplayName | Format-List
After resolution: Offer to save the mapping to config.json under label_mapping for future runs (same persistence pattern as SIT mapping).
Key difference from SIT resolution: Labels use `Get-Label` (not `Get-DlpSensitiveInformationType`). The cmdlet returns ALL labels at once — no need to query by individual GUID.
Output Modes
ASK the user which they prefer if not explicitly specified. Both may be selected.
Mode 1: Inline Chat Summary (Default)
- Render analysis directly in chat
- Includes summary tables, top-N breakdowns, risk-ranked user list
- Best for quick review and follow-up questions
Mode 2: Markdown File Report
- Save to `reports/data-security/DataSecurity_Analysis_<scope>_<timestamp>.md`
- Full detail including all phases, temporal charts, file inventories
- Use `create_file` tool — NEVER use terminal commands for file output
- Filename pattern: `DataSecurity_Analysis_<scope>_YYYYMMDD_HHMMSS.md`, where `<scope>` = `tenant_wide`, `sit_<SITname>`, `user_<username>`, etc.
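For illustration, the filename pattern can be produced as follows. This is a minimal Python sketch; the helper name `report_path` is hypothetical, and the caller is assumed to pass an already-sanitized scope string such as `tenant_wide`.

```python
from datetime import datetime


def report_path(scope: str, now: datetime) -> str:
    """Build the Mode 2 report path from a scope and a timestamp."""
    stamp = now.strftime("%Y%m%d_%H%M%S")  # YYYYMMDD_HHMMSS
    return f"reports/data-security/DataSecurity_Analysis_{scope}_{stamp}.md"


print(report_path("tenant_wide", datetime(2026, 1, 23, 14, 32, 0)))
```

The sketch only computes the path; writing the file still goes through the `create_file` tool as required above.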
Quick Start (TL;DR)
- Determine scope → Tenant-wide overview? Specific SIT? Specific user? Specific label? Time range?
- Check config.json → Look for `sit_mapping` and `label_mapping` sections for custom name resolution
- Run Phase 1 → Query 1 (SIT Discovery) + Label Coverage Assessment (quick stats variant) to determine environment maturity
- Run Phase 2 → Queries 2-5 (breakdowns by action type, user, file, time)
- Run Phase 2.5 → Queries 16a-16d (Copilot SIT exposure analysis) — conditional on Copilot volume (see Phase 2.5 trigger)
- Run Phase 3 → Queries 6-8 (DLP correlation, workload, SIT drill-down), Query 10b (file-based spikes)
- Run Phase 4 → Queries 11-15 (label landscape, label-based user ranking, label downgrades, Copilot label exposure, label-only events) — depth depends on label coverage (see Rule 11)
- Output Results → Render in selected mode(s), offer PowerShell resolution for unknowns
Execution Workflow
Phase 1: Discovery & Mapping (always run first)
Goal: Establish what SITs and labels exist in the data, their volume, and resolve GUIDs to names.
- Run Query 1 (SIT Discovery) — returns top SIT GUIDs with hit counts
- Run Label Coverage Assessment (quick stats from Query 11 comment block) — returns label vs SIT coverage percentages
- Apply Tier 1 mapping (embedded `datatable`) to resolve known SIT and label GUIDs
- Check `config.json` for Tier 2 mapping (`sit_mapping` + `label_mapping`) to resolve custom GUIDs
- Flag any remaining unresolved GUIDs for optional Tier 3 (PowerShell)
- Present the SIT + label landscape to the user before proceeding
- Determine label analysis depth based on coverage (see Rule 11)
Phase 2: Breakdown Analysis
Goal: Decompose SIT access patterns by multiple dimensions.
Run these queries in parallel where possible:
| Query | Dimension | Purpose |
|---|---|---|
| Query 2 | Action Type | What operations triggered SIT detections (file read, download, copy, Copilot response, etc.) |
| Query 3 | User Ranking | Top users by SIT interaction volume — risk-ranked |
| Query 4 | File Inventory | Top files/documents containing the most SIT detections |
| Query 5 | Temporal Pattern | Daily/hourly volume trend to spot spikes |
Phase 2.5: Copilot SIT Exposure Analysis (conditional on Copilot volume)
Trigger: Run this phase when Copilot/AI events exceed 30% of total volume (determined from Query 2 Action Type breakdown or Query 7 Workload breakdown).
Goal: Decompose Copilot SIT interactions by priority tier, identify users prompting high-value SITs into Copilot, separate service account noise from human risk signals, and estimate real interaction counts (correcting for row multiplication).
Key insight: Each Copilot interaction generates ~2-3 DSE rows on average (up to 35 for complex exchanges) because Purview creates separate rows for prompt SIT matches, response SIT matches, and compound agent interactions. Raw event counts must be corrected for this multiplier when reporting interaction volumes.
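A back-of-the-envelope version of that correction, as a Python sketch: it assumes the only information available is the raw row count plus the average multiplier range quoted above (2 to 3 rows per interaction), so it returns a range rather than a single figure. The function name `estimate_interactions` is hypothetical.

```python
def estimate_interactions(raw_rows: int, low_multiplier: float = 2.0,
                          high_multiplier: float = 3.0) -> tuple[int, int]:
    """Return (min, max) estimated Copilot interactions for a raw DSE row count."""
    if raw_rows < 0 or low_multiplier <= 0 or high_multiplier < low_multiplier:
        raise ValueError("invalid row count or multiplier range")
    # More rows per interaction means fewer interactions, so the high
    # multiplier yields the lower bound.
    return round(raw_rows / high_multiplier), round(raw_rows / low_multiplier)


low, high = estimate_interactions(600)
print(f"600 raw rows is roughly {low} to {high} interactions")
```

When reporting, state the range and the multiplier used so readers do not mistake the estimate for a measured count.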
| Query | Purpose |
|---|---|
| Query 16a | Copilot SIT Landscape — Which SITs fire in Copilot interactions, classified by priority tier (High/Medium/Low) |
| Query 16b | Top Users by High-Priority SIT in Copilot — Risk-ranked users excluding service accounts |
| Query 16c | Daily Temporal Trend by SIT Category — Spot adoption vs risk pattern changes over time |
| Query 16d | Prompt-Only Human Users — Users typing sensitive data INTO Copilot (primary risk signal), excluding service accounts and responses |
Service Account Filtering: Automated service accounts (e.g., Security Copilot agents, Purview agents) can generate 50-70% of all Copilot events. These accounts typically follow patterns like securitycopilotagentuser-*, svc-*, or system-generated UPN prefixes with GUIDs. Query 16a quantifies the agent vs human split; Queries 16b-16d exclude agents to surface human risk.
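The naming patterns mentioned above could be checked with a small filter like the Python sketch below. The two literal prefixes come from the text; the GUID-shaped prefix rule is one possible reading of "system-generated UPN prefixes with GUIDs" and is an assumption, as is the function name `is_service_account`. Real tenants will need their own list.

```python
import re

AGENT_PATTERNS = [
    re.compile(r"^securitycopilotagentuser-", re.IGNORECASE),
    re.compile(r"^svc-", re.IGNORECASE),
    # Assumed interpretation: the part before "@" is a bare GUID.
    re.compile(
        r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}@",
        re.IGNORECASE,
    ),
]


def is_service_account(upn: str) -> bool:
    """Return True when the UPN matches one of the assumed agent naming patterns."""
    return any(pattern.search(upn) for pattern in AGENT_PATTERNS)


users = ["svc-backup@contoso.com", "jsmith@contoso.com"]
human_users = [u for u in users if not is_service_account(u)]
```

A pattern list keeps the exclusion logic in one place, which makes it easier to mirror in the KQL filters that separate agent traffic from human activity.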
Priority SIT Classification: SITs are classified into tiers for Copilot risk assessment:
| Tier | SIT Categories | Risk Rationale |
|---|---|---|
| 🔴 High | Credit Card Numbers, SSNs, Azure/Cloud Credentials, Employee HR Data (custom) | Direct financial, identity, or infrastructure exposure |
| 🟡 Medium | Project code names (custom), Employee IDs (custom) | Business-sensitive but not directly exploitable |
| 🔵 Low | All Full Names, IP Addresses, Physical Addresses, Medical Terms | High-volume, low-specificity — noise in Copilot context |
Custom SIT classification: Organizations should classify their custom/EDM SITs into these tiers. If `config.json` has a `sit_priority` section (mapping GUID → tier), use it. Otherwise, classify custom SITs as 🔴 High by default (conservative).
Phase 3: Deep Dive (conditional on scope)
| Scenario | Run These |
|---|---|
| Tenant-wide overview | Query 6 (DLP policy matches), Query 7 (Workload breakdown) |
| Specific SIT investigation | Query 8 (Single-SIT deep dive with full user/file/action breakdown) |
| Specific user investigation | Query 9 (Single-user SIT access profile) |
| Anomaly detection | Query 10b (file-based spikes — PRIMARY), Query 10 (overall spikes — secondary, includes Copilot) |
Phase 4: Sensitivity Label Analysis (conditional on coverage)
Goal: Analyze sensitivity label access patterns, label changes, and Copilot label exposure.
Run the Label Coverage Assessment first (Phase 1, step 2). Then adapt depth per Rule 11:
| Label Coverage | Analysis Depth | Run These |
|---|---|---|
| ≥5% of events have labels (label-mature environment) | Full label analysis — dedicated report sections | Query 11 (label landscape), Query 12 (label-based user ranking), Query 13 (label changes), Query 14 (Copilot label exposure), Query 15 (label-only events) |
| 1-5% of events have labels (emerging label environment) | Summary label section — condensed into one section | Query 11 (label landscape), Query 13 (label changes) |
| <1% of events have labels (SIT-dominant environment) | Brief note only — mention label presence in Scope & Limitations | Label coverage stats from Phase 1 assessment only |
| User asks specifically about labels | Full label analysis regardless of coverage percentage | All label queries (11-15) |
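The depth decision in the table reduces to a threshold check. The following Python sketch shows one way to express it. The function name and the returned strings are hypothetical, and the boundary handling is an assumption (exactly 5% counts as full analysis, exactly 1% counts as the summary tier).

```python
def label_analysis_depth(labeled_events: int, total_events: int,
                         user_asked_about_labels: bool = False) -> str:
    """Pick "full", "summary", or "brief" label analysis from label coverage."""
    if user_asked_about_labels:
        return "full"  # an explicit request overrides the coverage percentage
    if total_events <= 0:
        return "brief"  # no events means no coverage to measure
    coverage_pct = 100 * labeled_events / total_events
    if coverage_pct >= 5:
        return "full"      # Queries 11-15
    if coverage_pct >= 1:
        return "summary"   # Queries 11 and 13
    return "brief"         # coverage stats only


print(label_analysis_depth(labeled_events=12_000, total_events=600_000))
```

In the printed example 12,000 labeled events out of 600,000 is 2% coverage, which falls in the summary tier.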
Phase 5: Report Generation
Render findings using the Report Template below.
Sample KQL Queries
Well-Known SIT GUID Mapping (datatable)
Use this let block as a prefix for any query that needs name resolution. It covers the most common Microsoft SITs plus placeholders for custom SITs from config.json.
// Well-known SIT GUID mapping — covers ~90% of typical detections
// Add custom/EDM SIT GUIDs from config.json sit_mapping section
let SITMapping = datatable(SITId: string, SITName: string) [
// ── Financial ──
"50842eb7-edc8-4019-85dd-5a5c1f2bb085", "Credit Card Number",
"cb353f78-2b72-4c3c-8827-92ebe4f69fdf", "ABA Routing Number",
"78e09124-f2c3-4656-b32a-c1a132cd2711", "Brazil CPF Number",
// ── Identity / PII ──
"a44669fe-0d48-453d-a9b1-2cc83f2cba77", "U.S. Social Security Number (SSN)",
"a7dd5e5f-e7f9-4626-a2c6-86a8cb6830d2", "IP Address v4",
"1daa4ad5-e2dd-4ca4-a788-54722c09efb2", "IP Address",
"50b8b56b-4ef8-44c2-a924-03374f5831ce", "All Full Names",
"8548332d-6d71-41f8-97db-cc3b5fa544e6", "All Physical Addresses",
"44aa44f2-63d1-41df-af0d-970283ac41e2", "U.S. Physical Addresses",
"d1d18c85-1203-46f5-b32f-2d6309de4e5b", "Australia Physical Addresses",
"6fa57f91-314a-4561-8248-7ab921957448", "Philippines Passport Number",
"d0001c83-e72f-4360-98d3-f5a41dc5a380", "Indonesia Passport Number",
// ── Healthcare ──
"065bdd91-ef07-40d3-b8a4-0aea722eaa49", "All Medical Terms And Conditions",
"17066377-466d-43ff-997f-c9240414021c", "Diseases",
"f6dc2d17-3549-41e2-af29-ae1846ae9542", "Types Of Medication",
"ee05bb9c-7b87-42e1-9987-446b243245d5", "Lab Test Terms",
// ── Azure / Cloud secrets ──
"0f587d92-eb28-44a9-bd1c-90f2892b47aa", "Azure DocumentDB Auth Key",
"ce1a126d-186f-4700-8c0c-486157b953fd", "Azure SQL Connection String",
"0b34bec3-d5d6-4974-b7b0-dcdb5c90c29d", "Azure IoT Connection String",
"c7bc98e8-551a-4c35-a92d-d2c8cda714a7", "Azure Storage Account Key",
"095a7e6c-efd8-46d5-af7b-5298d53a49fc", "Azure Redis Cache Connection String",
// ─── ADD CUSTOM / EDM SITs FROM config.json sit_mapping HERE ───
// Example: "<your-edm-guid>", "Your EDM SIT Name",
"END_MARKER", "END_MARKER"
];
Instructions: When building queries, read config.json for sit_mapping entries and insert them into the datatable above, replacing the END_MARKER row. If no custom mapping exists, remove the END_MARKER row.
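One way to automate the substitution is to render the custom rows as text and swap them in for the END_MARKER row. The sketch below is an assumption-heavy illustration: it treats sit_mapping as a flat GUID → name object (the real schema is not shown in the source), and the helper names are hypothetical. It only handles the non-empty case; with an empty mapping the template is returned unchanged and the manual instruction above still applies.

```python
END_MARKER_ROW = '"END_MARKER", "END_MARKER"'


def render_sit_rows(sit_mapping):
    """Render {guid: name} as datatable rows; the last row has no trailing comma."""
    rows = []
    for guid, name in sorted(sit_mapping.items()):
        # Escape backslashes and double quotes so the name is a valid string literal.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        rows.append(f'"{guid}", "{escaped}",')
    return "\n".join(rows).rstrip(",")


def fill_datatable(template, sit_mapping):
    """Replace the END_MARKER row with custom rows when a mapping is provided."""
    if not sit_mapping:
        return template  # empty mapping: left untouched in this sketch
    return template.replace(END_MARKER_ROW, render_sit_rows(sit_mapping))
```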
Query 1: SIT Discovery — Active SIT Landscape
Purpose: Find all active SIT GUIDs, their volume, and classify them.
// Query 1: SIT Discovery — What SITs are active in this environment?
// Adjust timespan as needed (default: 30d)
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitiveInfoTypeInfo)
| mv-expand SIT = parse_json(tostring(SensitiveInfoTypeInfo))
| extend SITJson = parse_json(tostring(SIT))
| extend SITId = tostring(SITJson.SensitiveInfoTypeId)
| extend ClassifierType = tostring(SITJson.ClassifierType)
| extend SITConfidence = toint(SITJson.Confidence)
| extend SITCount = toint(SITJson.Count)
| summarize
TotalEvents = count(),
DistinctUsers = dcount(AccountUpn),
DistinctFiles = dcount(ObjectId),
AvgConfidence = avg(SITConfidence),
MaxConfidence = max(SITConfidence),
ClassifierTypes = make_set(ClassifierType)
by SITId
| order by TotalEvents desc
| take 50
Post-processing: Join results with the SITMapping datatable to resolve names. Flag any GUIDs not in the mapping as "Unknown — custom/EDM SIT" or "[ML Classifier]" based on ClassifierTypes.
Query 2: Action Type Breakdown
Purpose: Break down SIT detections by what operation triggered them.
// Query 2: Action Type Breakdown — What operations trigger SIT detections?
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitiveInfoTypeInfo)
| summarize
EventCount = count(),
DistinctUsers = dcount(AccountUpn),
DistinctFiles = dcount(ObjectId)
by ActionType
| order by EventCount desc
Query 3: Top Users by SIT Interaction Volume
Purpose: Risk-rank users by sensitive data interaction volume. Designed for 100k+ user environments.
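// Query 3: Top 50 Users by SIT access volume
// NOTE: DistinctSITs inspects only the first SIT entry ([0]) of each event, so it can undercount when an event carries several SITs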
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitiveInfoTypeInfo)
| summarize
TotalEvents = count(),
DistinctSITs = dcount(tostring(parse_json(tostring(parse_json(tostring(SensitiveInfoTypeInfo))[0])).SensitiveInfoTypeId)),
DistinctFiles = dcount(ObjectId),
ActionTypes = make_set(ActionType),
Workloads = make_set(Workload),
FirstSeen = min(Timestamp),
LastSeen = max(Timestamp)
by AccountUpn
| order by TotalEvents desc
| take 50
Query 4: Top Files by SIT Detection Count
Purpose: Identify the most sensitive documents — files with the most SIT detections across access events.
// Query 4: Top 30 Files by SIT detection frequency
// Excludes system/operational files (DLPCache, EBWebView) that are Defender operational reads, not user-initiated
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitiveInfoTypeInfo)
| where isnotempty(ObjectId)
| where ObjectId !has "DLPCache" and ObjectId !has "EBWebView" and ObjectId !has "\\ProgramData\\Microsoft\\Windows Defender\\"
| summarize
AccessCount = count(),
DistinctUsers = dcount(AccountUpn),
ActionTypes = make_set(ActionType),
LastAccessed = max(Timestamp)
by ObjectId
| order by AccessCount desc
| take 30
Query 5: Temporal Pattern — Daily SIT Event Volume
Purpose: Detect volume spikes or anomalies in SIT-related activity over time.
// Query 5: Daily SIT event volume trend — includes file-based column for spike attribution (Rule 10)
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitiveInfoTypeInfo)
| summarize
DailyEvents = count(),
FileEvents = countif(Workload !in ("Copilot", "ConnectedAIApp")),
DistinctUsers = dcount(AccountUpn)
by Day = bin(Timestamp, 1d)
| order by Day asc
Query 6: DLP Policy Match Correlation
Purpose: Show DLP policy matches alongside SIT detections — which policies fired and how often.
// Query 6: DLP Policy Match breakdown
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(DlpPolicyMatchInfo)
| extend DlpInfo = parse_json(DlpPolicyMatchInfo)
| mv-expand DlpPolicy = DlpInfo
| extend PolicyName = tostring(DlpPolicy.PolicyName)
| extend PolicyId = tostring(DlpPolicy.PolicyId)
| summarize
MatchCount = count(),
DistinctUsers = dcount(AccountUpn),
DistinctFiles = dcount(ObjectId),
ActionTypes = make_set(ActionType)
by PolicyName
| order by MatchCount desc
Query 7: Workload Breakdown
Purpose: Where is sensitive data being accessed — SharePoint, OneDrive, Exchange, Teams, Endpoints, Copilot?
// Query 7: Workload distribution of SIT events
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitiveInfoTypeInfo)
| summarize
EventCount = count(),
DistinctUsers = dcount(AccountUpn),
DistinctFiles = dcount(ObjectId)
by Workload
| order by EventCount desc
Query 8: Single-SIT Deep Dive
Purpose: Full breakdown for a specific SIT GUID — who accessed it, which files, what operations, over what time period.
Usage: Replace <TARGET_SIT_GUID> with the specific SIT GUID to investigate (e.g., an EDM SIT GUID).
// Query 8: Single-SIT deep dive — replace GUID
let targetSIT = "<TARGET_SIT_GUID>";
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitiveInfoTypeInfo)
| where SensitiveInfoTypeInfo has targetSIT
| mv-expand SIT = parse_json(tostring(SensitiveInfoTypeInfo))
| extend SITJson = parse_json(tostring(SIT))
| extend SITId = tostring(SITJson.SensitiveInfoTypeId)
| where SITId == targetSIT
| extend SITConfidence = toint(SITJson.Confidence)
| extend SITCount = toint(SITJson.Count)
| summarize
AccessCount = count(),
AvgConfidence = avg(SITConfidence),
TotalSITInstances = sum(SITCount),
ActionTypes = make_set(ActionType),
Workloads = make_set(Workload),
FirstSeen = min(Timestamp),
LastSeen = max(Timestamp)
by AccountUpn, ObjectId
| order by AccessCount desc
| take 100
Query 9: Single-User SIT Access Profile
Purpose: Complete SIT interaction profile for a specific user — what SITs they accessed, which files, operations, and when.
Usage: Replace <TARGET_UPN> with the user's UPN.
// Query 9: Single-user SIT access profile
let targetUser = "<TARGET_UPN>";
DataSecurityEvents
| where Timestamp > ago(30d)
| where AccountUpn =~ targetUser
| where isnotempty(SensitiveInfoTypeInfo)
| mv-expand SIT = parse_json(tostring(SensitiveInfoTypeInfo))
| extend SITJson = parse_json(tostring(SIT))
| extend SITId = tostring(SITJson.SensitiveInfoTypeId)
| extend SITConfidence = toint(SITJson.Confidence)
| extend SITCount = toint(SITJson.Count)
| summarize
AccessCount = count(),
DistinctFiles = dcount(ObjectId),
AvgConfidence = avg(SITConfidence),
ActionTypes = make_set(ActionType),
FirstSeen = min(Timestamp),
LastSeen = max(Timestamp)
by SITId
| order by AccessCount desc
Query 10: Anomaly Detection — Users with SIT Access Spikes
Purpose: Compare each user's recent 7-day SIT activity against their daily average over the preceding 23 days (days 8-30 of the 30-day window) to detect sudden spikes. Designed for 100k+ user environments.
// Query 10: SIT access spike detection (7d recent vs 23d baseline) — ALL events
// NOTE: This includes Copilot events. For file-based-only spikes, use Query 10b below.
let baseline = DataSecurityEvents
| where Timestamp between (ago(30d) .. ago(7d))
| where isnotempty(SensitiveInfoTypeInfo)
| summarize BaselineTotal = count() by AccountUpn
| extend BaselineDailyAvg = round(BaselineTotal / 23.0, 1); // 23 days in baseline window
let recent = DataSecurityEvents
| where Timestamp > ago(7d)
| where isnotempty(SensitiveInfoTypeInfo)
| summarize RecentTotal = count() by AccountUpn
| extend RecentDailyAvg = round(RecentTotal / 7.0, 1);
recent
| join kind=inner baseline on AccountUpn
| extend SpikeRatio = round(RecentDailyAvg / BaselineDailyAvg, 2)
| where SpikeRatio > 2.0 and RecentTotal > 20 and BaselineTotal >= 10
| project AccountUpn, BaselineDailyAvg, RecentDailyAvg, SpikeRatio, BaselineTotal, RecentTotal
| order by SpikeRatio desc
| take 30
Query 10b: File-Based-Only Spike Detection (Excludes Copilot)
Purpose: Same as Query 10 but excludes Copilot and ConnectedAIApp events to surface actual file access spikes. This is the primary risk signal — Copilot spikes often just reflect adoption changes.
// Query 10b: File-based SIT access spike detection (excludes Copilot/AI events)
let CopilotActionTypes = dynamic(["Risky prompt entered in Copilot", "Sensitive response received in Copilot",
"Risky prompt entered in connected AI apps", "Sensitive response received in connected AI apps"]);
let baseline = DataSecurityEvents
| where Timestamp between (ago(30d) .. ago(7d))
| where isnotempty(SensitiveInfoTypeInfo)
| where not(ActionType has_any (CopilotActionTypes))
| where Workload !in ("Copilot", "ConnectedAIApp")
| summarize BaselineTotal = count() by AccountUpn
| extend BaselineDailyAvg = round(BaselineTotal / 23.0, 1);
let recent = DataSecurityEvents
| where Timestamp > ago(7d)
| where isnotempty(SensitiveInfoTypeInfo)
| where not(ActionType has_any (CopilotActionTypes))
| where Workload !in ("Copilot", "ConnectedAIApp")
| summarize RecentTotal = count() by AccountUpn
| extend RecentDailyAvg = round(RecentTotal / 7.0, 1);
recent
| join kind=inner baseline on AccountUpn
| extend SpikeRatio = round(RecentDailyAvg / BaselineDailyAvg, 2)
| where SpikeRatio > 2.0 and RecentTotal > 10 and BaselineTotal >= 10
| project AccountUpn, BaselineDailyAvg, RecentDailyAvg, SpikeRatio, BaselineTotal, RecentTotal
| order by SpikeRatio desc
| take 30
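To make the spike arithmetic in Queries 10 and 10b concrete, here is a small Python sketch of the same calculation. The function names are illustrative, and Python's `round` may differ from KQL's at exact ties, so treat this as a way to reason about thresholds rather than a byte-for-byte reproduction.

```python
def spike_ratio(baseline_total, recent_total, baseline_days=23, recent_days=7):
    """Recent daily average divided by baseline daily average, as in the queries."""
    baseline_avg = round(baseline_total / baseline_days, 1)
    recent_avg = round(recent_total / recent_days, 1)
    if baseline_avg == 0:
        return None  # no usable baseline
    return round(recent_avg / baseline_avg, 2)


def is_spike(baseline_total, recent_total, min_recent=10):
    """Apply the filter: ratio > 2.0, RecentTotal > min_recent, BaselineTotal >= 10.

    min_recent is 20 in Query 10 and 10 in Query 10b.
    """
    ratio = spike_ratio(baseline_total, recent_total)
    return (
        ratio is not None
        and ratio > 2.0
        and recent_total > min_recent
        and baseline_total >= 10
    )
```

A user with 46 baseline events (2.0 per day) and 35 recent events (5.0 per day) has a ratio of 2.5 and passes both variants of the filter. A user with only 5 baseline events is dropped regardless of ratio because the baseline is too thin.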
Well-Known Label GUID Mapping (datatable)
Use this let block as a prefix for label queries that need name resolution. It covers the Microsoft default sensitivity labels plus placeholder slots for custom labels from config.json.
// Microsoft default sensitivity label GUID mapping — all 12 labels
// Confirmed via Get-Label on default-configuration tenants
// Older tenants may have renamed labels or replaced parent GUIDs — validate with Get-Label if needed
// Add custom labels from config.json label_mapping section
let LabelMapping = datatable(LabelId: string, LabelName: string, LabelParent: string) [
// ── Microsoft Default Labels (defa4170-0d19-0005-XXXX-bc88714345d2 family) ──
// Priority 0-11, sequential GUID suffixes
"defa4170-0d19-0005-0000-bc88714345d2", "Personal", "",
"defa4170-0d19-0005-0001-bc88714345d2", "Public", "",
"defa4170-0d19-0005-0002-bc88714345d2", "General", "",
"defa4170-0d19-0005-0003-bc88714345d2", "Anyone (unrestricted)", "General",
"defa4170-0d19-0005-0004-bc88714345d2", "All Employees (unrestricted)", "General",
"defa4170-0d19-0005-0005-bc88714345d2", "Confidential", "",
"defa4170-0d19-0005-0006-bc88714345d2", "Anyone (unrestricted)", "Confidential",
"defa4170-0d19-0005-0007-bc88714345d2", "All Employees", "Confidential",
"defa4170-0d19-0005-0008-bc88714345d2", "Trusted People", "Confidential",
"defa4170-0d19-0005-0009-bc88714345d2", "Highly Confidential", "",
"defa4170-0d19-0005-000a-bc88714345d2", "All Employees", "Highly Confidential",
"defa4170-0d19-0005-000b-bc88714345d2", "Specified People", "Highly Confidential",
// ─── ADD CUSTOM LABELS FROM config.json label_mapping HERE ───
// Example: "<your-custom-label-guid>", "Your Custom Label", "Parent Label",
"END_MARKER", "END_MARKER", "END_MARKER"
];
Instructions: When building queries, read config.json for label_mapping entries and insert them into the datatable above, replacing the END_MARKER row.
Older/customized tenants: If Get-Label shows parent labels (Confidential, Highly Confidential) with random GUIDs instead of defa4170-*, the admin has recreated them; add the tenant-specific GUIDs from config.json label_mapping or Get-Label. If resolved sub-label names differ from the datatable (e.g., "Non-business" vs "Personal"), prefer the Get-Label name for that tenant.
Query 11: Label Coverage Overview — Sensitivity Label Landscape
Purpose: Discover which sensitivity labels appear in the data, their volume, and resolve GUIDs to names. Also includes a quick stats variant for Phase 1 coverage assessment.
Quick Stats Variant (run first in Phase 1):
// Label Coverage Assessment — run in Phase 1 to determine label analysis depth
DataSecurityEvents
| where Timestamp > ago(30d)
| summarize
TotalEvents = count(),
WithSIT = countif(isnotempty(SensitiveInfoTypeInfo) and SensitiveInfoTypeInfo != "[]"),
WithLabel = countif(isnotempty(SensitivityLabelId)),
WithPrevLabel = countif(isnotempty(PreviousSensitivityLabelId)),
LabelOnly_NoSIT = countif(isnotempty(SensitivityLabelId) and (isempty(SensitiveInfoTypeInfo) or SensitiveInfoTypeInfo == "[]")),
SIT_WithLabel = countif(isnotempty(SensitivityLabelId) and isnotempty(SensitiveInfoTypeInfo) and SensitiveInfoTypeInfo != "[]")
Full Label Landscape Query:
// Query 11: Label Landscape — which sensitivity labels appear and how often
// Prefix with LabelMapping datatable from above
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitivityLabelId)
| extend LabelIds = split(SensitivityLabelId, ",")
| mv-expand LabelIdRaw = LabelIds
| extend LabelId = tostring(trim(" ", tostring(LabelIdRaw)))
| where isnotempty(LabelId)
| lookup kind=leftouter LabelMapping on LabelId
| extend LabelDisplay = iff(isempty(LabelName) or LabelName == "END_MARKER",
strcat("[Unknown] ", LabelId),
iff(isempty(LabelParent), LabelName, strcat(LabelParent, " / ", LabelName)))
| summarize
EventCount = count(),
DistinctUsers = dcount(AccountUpn),
DistinctFiles = dcount(ObjectId),
ActionTypes = make_set(ActionType)
by LabelDisplay, LabelId
| order by EventCount desc
Post-processing: Flag any [Unknown] GUIDs for Tier 2/3 resolution. The LabelDisplay column renders as "Parent / Child" for sub-labels (e.g., "Highly Confidential / Project Obsidian") and just the label name for top-level labels.
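The LabelDisplay logic is easy to mirror outside KQL when post-processing results. The sketch below uses two rows copied from the LabelMapping datatable; the dictionary layout and function name are illustrative assumptions.

```python
# Subset of the LabelMapping datatable: {LabelId: (LabelName, LabelParent)}
LABEL_MAPPING = {
    "defa4170-0d19-0005-0009-bc88714345d2": ("Highly Confidential", ""),
    "defa4170-0d19-0005-000b-bc88714345d2": ("Specified People", "Highly Confidential"),
}


def label_display(label_id, mapping=LABEL_MAPPING):
    """Render 'Parent / Child' for sub-labels, the bare name for top-level labels,
    and '[Unknown] <guid>' for GUIDs that are missing from the mapping."""
    entry = mapping.get(label_id)
    if entry is None or entry[0] == "END_MARKER":
        return f"[Unknown] {label_id}"
    name, parent = entry
    return f"{parent} / {name}" if parent else name
```

Any GUID that comes back as `[Unknown] ...` is a candidate for the Tier 2/3 resolution mentioned above.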
Query 12: Top Users by Labeled Document Access (File-Based)
Purpose: Risk-rank users by labeled document access volume, excluding Copilot/AI events. This is the label-dimension equivalent of Query 3.
// Query 12: Top users by labeled document access (file-based only)
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitivityLabelId)
| where ActionType !has "Copilot" and Workload !in ("Copilot", "ConnectedAIApp")
| summarize
EventCount = count(),
DistinctLabels = dcount(SensitivityLabelId),
DistinctFiles = dcount(ObjectId),
ActionTypes = make_set(ActionType),
Workloads = make_set(Workload),
FirstSeen = min(Timestamp),
LastSeen = max(Timestamp)
by AccountUpn
| order by EventCount desc
| take 30
Query 13: Label Downgrade & Change Tracking
Purpose: Find all events where a sensitivity label was downgraded, removed, or changed. Critical for detecting policy circumvention or insider risk.
// Query 13: Label downgrade/removal events — detect label circumvention
// Prefix with LabelMapping datatable to resolve both current and previous label GUIDs
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(PreviousSensitivityLabelId)
| extend CurrentLabelId = SensitivityLabelId
| extend PrevLabelId = PreviousSensitivityLabelId
| lookup kind=leftouter (LabelMapping | project LabelId, CurrentLabelName=LabelName, CurrentParent=LabelParent) on $left.CurrentLabelId == $right.LabelId
| lookup kind=leftouter (LabelMapping | project LabelId, PrevLabelName=LabelName, PrevParent=LabelParent) on $left.PrevLabelId == $right.LabelId
| extend CurrentDisplay = iff(isempty(CurrentLabelName), iff(isempty(CurrentLabelId), "[Removed]", strcat("[Unknown] ", CurrentLabelId)),
iff(isempty(CurrentParent), CurrentLabelName, strcat(CurrentParent, " / ", CurrentLabelName)))
| extend PrevDisplay = iff(isempty(PrevLabelName), strcat("[Unknown] ", PrevLabelId),
iff(isempty(PrevParent), PrevLabelName, strcat(PrevParent, " / ", PrevLabelName)))
| project Timestamp, ActionType, AccountUpn, ObjectId,
PreviousLabel = PrevDisplay, CurrentLabel = CurrentDisplay, Workload
| order by Timestamp desc
Key ActionTypes in label change events:
- Label downgraded on a file: label lowered (e.g., HC → Confidential)
- Label removed from a file: label stripped entirely
- Label on file downgraded or removed, then file accessed by Copilot: label reduced AND Copilot accessed the now-less-protected file
Query 14: Copilot Label Exposure — Labeled Resources Accessed by Copilot
Purpose: Identify which sensitivity-labeled documents Copilot accessed during risky AI interactions. This surfaces data exposure risk where Copilot may be surfacing Highly Confidential content.
// Query 14: Copilot label exposure — what labeled docs did Copilot access in risky interactions?
// Prefix with LabelMapping datatable
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(RiskyAIUsageSensitivityLabelsInfo)
| where tostring(RiskyAIUsageSensitivityLabelsInfo) !has "null"
or (tostring(RiskyAIUsageSensitivityLabelsInfo) has "null" and tostring(RiskyAIUsageSensitivityLabelsInfo) has "SensitivityLabelId")
| mv-expand LabelEntry = parse_json(tostring(RiskyAIUsageSensitivityLabelsInfo))
| extend LabelJson = parse_json(tostring(LabelEntry))
| extend SubEntityName = tostring(LabelJson.SubEntityName)
| extend LabelId = tostring(LabelJson.SensitivityLabelId)
| where isnotempty(LabelId)
| lookup kind=leftouter LabelMapping on LabelId
| extend LabelDisplay = iff(isempty(LabelName) or LabelName == "END_MARKER",
strcat("[Unknown] ", LabelId),
iff(isempty(LabelParent), LabelName, strcat(LabelParent, " / ", LabelName)))
| summarize
EventCount = count(),
DistinctUsers = dcount(AccountUpn),
SubEntities = make_set(SubEntityName),
ActionTypes = make_set(ActionType)
by LabelDisplay, LabelId
| order by EventCount desc
SubEntityName values:
- ResponseAccessedResource: a labeled document that Copilot cited in its response
- Response: the Copilot response itself that was flagged
Query 15: Label-Only Events — Events Triggered Purely by Label (No SIT Content Match)
Purpose: Find events where the trigger was the sensitivity label alone, not SIT content detection. These represent label-aware DLP/IRM policy matches.
// Query 15: Label-only events — triggered by label, not SIT content
// Prefix with LabelMapping datatable
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitivityLabelId)
| where isempty(SensitiveInfoTypeInfo) or SensitiveInfoTypeInfo == "[]"
| extend LabelIds = split(SensitivityLabelId, ",")
| mv-expand LabelIdRaw = LabelIds
| extend LabelId = tostring(trim(" ", tostring(LabelIdRaw)))
| where isnotempty(LabelId)
| lookup kind=leftouter LabelMapping on LabelId
| extend LabelDisplay = iff(isempty(LabelName) or LabelName == "END_MARKER",
strcat("[Unknown] ", LabelId),
iff(isempty(LabelParent), LabelName, strcat(LabelParent, " / ", LabelName)))
| summarize
EventCount = count(),
DistinctUsers = dcount(AccountUpn),
DistinctFiles = dcount(ObjectId),
ActionTypes = make_set(ActionType),
Workloads = make_set(Workload)
by LabelDisplay, LabelId
| order by EventCount desc
Why this matters: In label-mature environments, this query can surface significant activity that the SIT-only queries completely miss. If a document has a "Highly Confidential" label but no SIT content (e.g., manually labeled strategic document), it only appears here.
Query 16a: Copilot SIT Landscape with Priority Tiers
Purpose: Break down which SITs fire in Copilot interactions, classify by priority tier, and quantify service account vs human split. This is the entry point for Phase 2.5.
Prerequisite: Merge the well-known SIT GUID mapping datatable (above) with any config.json sit_mapping entries before running.
// Query 16a: Copilot SIT Landscape — priority-tiered breakdown with agent/human split
// Prefix with SITMapping datatable
// ── SIT Priority Classification (canonical definition — Queries 16c/16d reference this) ──
let HighPrioritySITs = dynamic([
"50842eb7-edc8-4019-85dd-5a5c1f2bb085", // Credit Card Number
"a44669fe-0d48-453d-a9b1-2cc83f2cba77", // U.S. SSN
"0f587d92-eb28-44a9-bd1c-90f2892b47aa", // Azure DocumentDB Auth Key
"ce1a126d-186f-4700-8c0c-486157b953fd", // Azure SQL Connection String
"0b34bec3-d5d6-4974-b7b0-dcdb5c90c29d", // Azure IoT Connection String
"c7bc98e8-551a-4c35-a92d-d2c8cda714a7", // Azure Storage Account Key
"095a7e6c-efd8-46d5-af7b-5298d53a49fc" // Azure Redis Cache Connection String
// ── ADD credential/HR SIT GUIDs from config.json sit_mapping HERE ──
]);
let MediumPrioritySITs = dynamic([
// ── ADD project/employee ID SIT GUIDs from config.json sit_mapping HERE ──
]);
// ── Service account regex (update with org-specific patterns) ──
let ServiceAccountPattern = @"^(securitycopilotagentuser-|svc-)";
DataSecurityEvents
| where Timestamp > ago(30d)
| where ActionType has "Copilot"
| where isnotempty(SensitiveInfoTypeInfo)
| mv-expand SIT = parse_json(tostring(SensitiveInfoTypeInfo))
| extend SITJson = parse_json(tostring(SIT))
| extend SITId = tostring(SITJson.SensitiveInfoTypeId)
| extend IsAgent = AccountUpn matches regex ServiceAccountPattern
| summarize
TotalEvents = count(),
AgentEvents = countif(IsAgent),
HumanEvents = countif(not(IsAgent) and isnotempty(AccountUpn)),
HumanUsers = dcountif(AccountUpn, not(IsAgent) and isnotempty(AccountUpn)),
PromptEvents = countif(ActionType has "prompt"),
ResponseEvents = countif(ActionType has "response")
by SITId
| lookup kind=leftouter SITMapping on $left.SITId == $right.SITId
| extend SITDisplay = iff(isempty(SITName) or SITName == "END_MARKER", strcat("[Unknown] ", SITId), SITName)
| extend PriorityTier = case(
SITId in (HighPrioritySITs), "🔴 High",
SITId in (MediumPrioritySITs), "🟡 Medium",
"🔵 Low")
| project SITDisplay, PriorityTier, TotalEvents, AgentEvents, HumanEvents, HumanUsers, PromptEvents, ResponseEvents, SITId
| order by TotalEvents desc
Post-processing:
- Populate the HighPrioritySITs and MediumPrioritySITs arrays with credential, HR, and custom SIT GUIDs from config.json sit_mapping. Any SIT not in either array defaults to Low in the query output.
- If config.json has a sit_priority section (GUID → tier mapping), use it to override the default classification.
- Calculate Agent % of total. If > 50%, flag prominently in report ("⚠️ N% of Copilot SIT events are from automated service accounts").
- Unknown SITs ([Unknown]) should be reclassified as 🔴 High by default during post-processing, because the query itself assigns them Low (conservative: unknown custom SITs may be high-value EDM/exact data match).
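The agent-share check can be sketched as follows. It assumes the Query 16a result has been loaded as a list of dictionaries keyed by the projected column names; note that those counts are taken after mv-expand, so an event with several SITs contributes to several rows. The function name is hypothetical.

```python
def agent_share_note(rows):
    """Return the warning line when service accounts exceed 50% of Copilot SIT events,
    otherwise None. rows: list of dicts with TotalEvents and AgentEvents."""
    total = sum(r["TotalEvents"] for r in rows)
    agent = sum(r["AgentEvents"] for r in rows)
    if total == 0:
        return None
    pct = 100.0 * agent / total
    if pct > 50.0:
        return f"⚠️ {pct:.0f}% of Copilot SIT events are from automated service accounts"
    return None
```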
Query 16b: Top Users by High-Priority SIT in Copilot (Excluding Service Accounts)
Purpose: Risk-rank human users whose Copilot interactions triggered high-priority SIT detections. Excludes automated service accounts.
// Query 16b: Top 20 human users by high-priority SIT in Copilot interactions
// Prefix with SITMapping datatable and HighPrioritySITs + ServiceAccountPattern from Query 16a
DataSecurityEvents
| where Timestamp > ago(30d)
| where ActionType has "Copilot"
| where isnotempty(SensitiveInfoTypeInfo)
| where not(AccountUpn matches regex ServiceAccountPattern)
| where isnotempty(AccountUpn)
| mv-expand SIT = parse_json(tostring(SensitiveInfoTypeInfo))
| extend SITJson = parse_json(tostring(SIT))
| extend SITId = tostring(SITJson.SensitiveInfoTypeId)
| where SITId in (HighPrioritySITs)
| lookup kind=leftouter SITMapping on $left.SITId == $right.SITId
| extend SITDisplay = iff(isempty(SITName) or SITName == "END_MARKER", strcat("[Unknown] ", SITId), SITName)
| summarize
Events = count(),
DistinctHighSITs = dcount(SITId),
SITNames = make_set(SITDisplay),
PromptEvents = countif(ActionType has "prompt"),
ResponseEvents = countif(ActionType has "response"),
FirstSeen = min(Timestamp),
LastSeen = max(Timestamp)
by AccountUpn
| order by Events desc
| take 20
Query 16c: Daily Copilot SIT Trend by Priority Category
Purpose: Track daily Copilot SIT detection volume by priority tier to distinguish adoption changes from risk spikes.
// Query 16c: Daily Copilot SIT trend by priority category (human users only)
// ── Copy HighPrioritySITs from Query 16a (canonical list: CCN, SSN, Azure credentials + config.json custom) ──
// ── Copy ServiceAccountPattern from Query 16a ──
DataSecurityEvents
| where Timestamp > ago(30d)
| where ActionType has "Copilot"
| where isnotempty(SensitiveInfoTypeInfo)
| where not(AccountUpn matches regex ServiceAccountPattern)
| where isnotempty(AccountUpn)
| mv-expand SIT = parse_json(tostring(SensitiveInfoTypeInfo))
| extend SITJson = parse_json(tostring(SIT))
| extend SITId = tostring(SITJson.SensitiveInfoTypeId)
| extend PriorityTier = iff(SITId in (HighPrioritySITs), "High", "Low")
| summarize Events = count() by Day = bin(Timestamp, 1d), PriorityTier
| order by Day asc, PriorityTier asc
Post-processing: Render as a dual-line chart or table with High vs Low columns per day. Spikes in the High tier warrant investigation; spikes in Low tier alone are typically noise from broad SITs (All Full Names, IP Addresses) and can be noted but not escalated.
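If the table form is preferred, the Query 16c rows can be pivoted into one line per day. This is a minimal sketch that assumes each result row is a dictionary with Day (an ISO date string), PriorityTier, and Events; the function name is illustrative.

```python
from collections import defaultdict


def pivot_daily(rows):
    """Turn (Day, PriorityTier, Events) rows into sorted (day, high, low) tuples."""
    table = defaultdict(lambda: {"High": 0, "Low": 0})
    for r in rows:
        table[r["Day"]][r["PriorityTier"]] += r["Events"]
    return [(day, counts["High"], counts["Low"]) for day, counts in sorted(table.items())]
```

Days with no High-tier events still appear with a zero in the High column, which keeps the two series aligned for charting.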
Query 16d: Prompt-Only Analysis — Human Users Typing Sensitive Data INTO Copilot
Purpose: The primary risk signal — users who typed sensitive data into Copilot prompts (not just receiving sensitive responses). Excludes service accounts and response-only events.
// Query 16d: Prompt-only human users with high-priority SIT detections
// This is the key risk signal: sensitive data entered BY the user INTO Copilot
// ── Copy HighPrioritySITs and ServiceAccountPattern from Query 16a ──
DataSecurityEvents
| where Timestamp > ago(30d)
| where ActionType has "prompt" // Prompts only — user-initiated risk
| where isnotempty(SensitiveInfoTypeInfo)
| where not(AccountUpn matches regex ServiceAccountPattern)
| where isnotempty(AccountUpn)
| mv-expand SIT = parse_json(tostring(SensitiveInfoTypeInfo))
| extend SITJson = parse_json(tostring(SIT))
| extend SITId = tostring(SITJson.SensitiveInfoTypeId)
| where SITId in (HighPrioritySITs)
| lookup kind=leftouter SITMapping on $left.SITId == $right.SITId
| extend SITDisplay = iff(isempty(SITName) or SITName == "END_MARKER", strcat("[Unknown] ", SITId), SITName)
| summarize
PromptEvents = count(),
DistinctHighSITs = dcount(SITId),
SITNames = make_set(SITDisplay),
FirstSeen = min(Timestamp),
LastSeen = max(Timestamp)
by AccountUpn
| order by PromptEvents desc
| take 20
Why prompts matter more than responses:
- "Risky prompt entered in Copilot" = User typed/pasted sensitive data (SSN, credit card, credentials) into a Copilot prompt. This is a behavioral risk — the user chose to share sensitive data with AI.
- "Sensitive response received in Copilot" = Copilot retrieved and surfaced sensitive data from grounding sources (SharePoint, OneDrive, email). This is an access/configuration risk — overpermissioned data exposure.
- Prompt events are the stronger signal for insider risk and user coaching interventions.
Report Template
Report Rendering Rules
These rules are MANDATORY for all report output (inline chat and markdown file). Follow strictly.
Rule 1: Risk Rating Scale
When assigning risk levels to users in the file-based user ranking, use this hierarchy:
| Risk Level | Evidence Required |
|---|---|
| Critical | "Files collected and exfiltrated" ActionType present — confirmed insider risk exfiltration signal OR mass exfiltration pattern: ≥1,000 distinct files to removable media + file deletions within a ≤48-hour window (volume-based escalation even without the IRM-labeled ActionType) |
| High | Exfiltration signals below Critical thresholds (e.g., USB copies < 1,000 files without deletion pattern) OR sustained high DLP alert volume (top 2-3 by events/files) |
| Medium | Broad SIT diversity (10+ SIT types) OR cross-workload activity (3+ workloads) OR external domain WITHOUT explicit exfiltration signal |
| Low | Single-workload, moderate volume, no exfiltration or anomaly signals |
⛔ PROHIBITED: Rating a user with "Files collected and exfiltrated" as Medium or Low. This ActionType is always High or Critical.
⛔ PROHIBITED: Rating a user with ≥1,000 files USB-copied + deletion in ≤48h as anything below Critical.
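The hierarchy in Rule 1 can be read as an ordered series of checks, highest level first. The sketch below is one possible encoding: the `UserEvidence` fields are hypothetical summaries an analyst would derive from the query results, and judgment calls such as "sustained high DLP alert volume" are reduced to a boolean.

```python
from dataclasses import dataclass


@dataclass
class UserEvidence:
    has_exfiltrated_action: bool = False  # "Files collected and exfiltrated" ActionType seen
    usb_files_48h: int = 0                # distinct files copied to removable media within <=48h
    deletions_in_window: bool = False     # file deletions in the same window
    other_exfil_signal: bool = False      # any other exfiltration signal
    top_dlp_volume: bool = False          # among the top 2-3 users by DLP events/files
    sit_types: int = 0                    # distinct SIT types touched
    workloads: int = 0                    # distinct workloads
    external_domain: bool = False         # external domain involved


def risk_level(e):
    """Evaluate the Rule 1 tiers from most to least severe."""
    if e.has_exfiltrated_action:
        return "Critical"
    if e.usb_files_48h >= 1000 and e.deletions_in_window:
        return "Critical"  # mass exfiltration pattern
    if e.usb_files_48h > 0 or e.other_exfil_signal or e.top_dlp_volume:
        return "High"
    if e.sit_types >= 10 or e.workloads >= 3 or e.external_domain:
        return "Medium"
    return "Low"
```

Because the Critical checks run first, a user with the exfiltration ActionType can never be rated Medium or Low, which is what the prohibitions above require.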
Rule 2: Executive Summary Uses File-Based Metrics Only
The Executive Summary MUST cite file-based (non-Copilot) event counts and file counts for user risk descriptions. Never cite overall metrics that include Copilot volume — this inflates perceived risk.
| Context | Cite | Example |
|---|---|---|
| ✅ File-based risk summary | Non-Copilot events, non-Copilot files | "u3087 generated 211 file-based events across 32 files" |
| ❌ Inflated overall metrics | Total events including Copilot |
Rule 3: Top Users Overall Section — Copilot Compression
When Copilot events exceed 80% of total volume:
- Do NOT render a standalone "Top Users Overall" section dominated by Copilot service accounts
- Instead, include a brief note: "Top overall users are dominated by Copilot service accounts/heavy Copilot users — see file-based user ranking below for actual data access risk."
- If any users in the top-10 overall have multi-workload activity (Copilot + file operations), call them out in a single sentence rather than a full table
Rule 4: Copilot Count Reconciliation
When reporting Copilot vs file-based splits, ensure the counts reconcile across sections:
- Action Type breakdown Copilot total = Workload breakdown Copilot total
- If they differ (e.g., Connected AI App events counted differently), annotate the delta
- Show the reconciliation in the Action Type section: "Copilot interactions: N events (Action Types: Risky prompt X + Sensitive response Y + Combined Z = N)"
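A small helper can build the reconciliation line and surface any delta against the Workload breakdown. This is an illustrative sketch; the function name and the delta annotation format are assumptions, while the main sentence follows the template quoted above.

```python
def reconcile_copilot(action_counts, workload_copilot_total):
    """Build the reconciliation line from per-ActionType counts and compare the
    sum against the Copilot total from the Workload breakdown."""
    n = sum(action_counts.values())
    parts = " + ".join(f"{name} {count}" for name, count in action_counts.items())
    line = f"Copilot interactions: {n} events (Action Types: {parts} = {n})"
    delta = n - workload_copilot_total
    if delta != 0:
        # Annotate the mismatch so the reader can see the two sections disagree.
        line += f" [delta vs Workload breakdown: {delta:+d}]"
    return line
```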
Rule 5: Scope & Limitations Section (Required for Markdown Reports)
Markdown file reports MUST include a Scope & Limitations section immediately after the Executive Summary. Include:
## Scope & Limitations
| Consideration | Detail |
|--------------|--------|
| **Data Source** | DataSecurityEvents (Defender XDR Advanced Hunting) — requires Insider Risk Management opt-in to share data with Defender XDR |
| **Coverage** | SIT detections + sensitivity label events — files with neither SIT content match nor sensitivity label do NOT appear in this data |
| **Label Coverage** | X% of events have sensitivity labels — label analysis depth adjusted accordingly (see Rule 11) |
| **Retention** | 30-day Advanced Hunting retention |
| **ML Classifiers** | N trainable classifier GUIDs could not be resolved — see Unresolved SIT GUIDs section |
| **Copilot Volume** | Copilot events represent X% of total volume and are separated from file-based analysis throughout this report |
Fill in the actual values for N and X% from the query results.
Rule 6: SIT Landscape Table Integrity
- Each GUID must appear exactly once in the SIT Landscape table — one row per GUID, no exceptions
- NEVER group multiple GUIDs into a single row with slash-separated values (e.g., `1e883268/d2cdc387/bf6e0b84...`). Even "copy" variants of the same SIT that share identical metrics MUST be separate rows
- After all GUID resolution tiers complete, deduplicate by GUID — if conflicts exist, prefer the most specific resolution (Tier 3 PowerShell > Tier 2 config > Tier 1 embedded)
- Group the table by category: Custom/Organization SITs, Built-in Microsoft SITs, ML Classifiers (Unresolvable)
- Do NOT include a GUID under two different names
- The total distinct SIT count cited in the Executive Summary must equal the number of rows in the SIT Landscape tables (sum of all category sub-tables)
- Post-render verification (MANDATORY): After building all SIT Landscape sub-tables, count the total rows. If the exec summary cites a different number, update the exec summary to match. Format: "N active SIT types" where N = sum of rows across Custom, Built-in, and Unresolved sub-tables
⛔ PROHIBITED: Bundling GUIDs like "Credit Card Number copy (x3)" with 6,966 ea. — each of the 3 GUIDs must be its own row with its own exact counts
⛔ PROHIBITED: Exec summary citing a SIT count that doesn't match the actual row count in the SIT Landscape tables
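As a rough sketch of the Rule 6 deduplication and post-render count check, assuming resolutions arrive as `(guid, name, tier)` tuples (a hypothetical shape chosen for illustration):

```python
def dedupe_sit_rows(resolutions):
    """Keep one name per GUID, preferring the highest tier (3 > 2 > 1).

    resolutions: iterable of (guid, name, tier) tuples (assumed shape).
    Returns dict of lowercase guid -> (name, tier).
    """
    best = {}
    for guid, name, tier in resolutions:
        key = guid.lower()  # treat GUIDs case-insensitively
        if key not in best or tier > best[key][1]:
            best[key] = (name, tier)
    return best


def sit_count_for_exec_summary(category_tables):
    """Sum rows across the Custom, Built-in, and Unresolved sub-tables."""
    return sum(len(rows) for rows in category_tables.values())
```

The number returned by `sit_count_for_exec_summary` is the N to cite as "N active SIT types" in the Executive Summary.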
Rule 7: Spike Detection — File-Based Primary, Overall Secondary
When rendering spike alerts:
- Primary: Always run and display Query 10b (file-based-only spikes) as the main spike alert section. This surfaces actual sensitive data access spikes.
- Secondary: Run Query 10 (all events) only if the user requests overall spikes or if there are interesting patterns worth noting. Include a clear note that these spikes are predominantly Copilot-driven.
- If only running one spike query, always prefer Query 10b.
- In the report, label sections clearly: "File-Based SIT Access Spikes" vs "Overall SIT Access Spikes (incl. Copilot)".
Rule 8: Top Files — Exclude System/Operational Files
The Top Files section must exclude Defender for Endpoint operational file reads:
- `C:\ProgramData\Microsoft\Windows Defender\DLPCache\*` — DLP label metadata cache reads
- `*\EBWebView\*` — Edge WebView browser cache
- Any path matching `\ProgramData\Microsoft\` that is clearly a system/cache path
If system files appear in results despite the query filter, separate them into a "System/Operational Files" subsection below the main "User-Accessed Files" list.
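If system paths slip past the query filter, the split into two subsections could look like the sketch below. It covers only the two explicit patterns from Rule 8; the broader "clearly a system/cache path" case needs human judgment and is left out. The function names are illustrative.

```python
def is_system_file(path):
    """True for the two explicit operational path patterns in Rule 8."""
    p = path.lower()
    return (
        "\\programdata\\microsoft\\windows defender\\dlpcache\\" in p
        or "\\ebwebview\\" in p
    )


def split_top_files(paths):
    """Return (user_accessed, system_operational) lists, preserving order."""
    user_accessed, system_operational = [], []
    for p in paths:
        (system_operational if is_system_file(p) else user_accessed).append(p)
    return user_accessed, system_operational
```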
Rule 9: Risk Rating Consistency — Exec Summary Must Match User Table
Every user mentioned in the Executive Summary MUST use the same risk rating as in the File-Based Top Users table. If the table says 🔴 Critical, the exec summary must say Critical (and vice versa).
- After building the File-Based Top Users table (the source of truth), cross-check every user mention in the exec summary
- If there is a conflict, the User Table rating wins — update the exec summary to match
- Never rate a user differently in two sections of the same report
⛔ PROHIBITED: Exec summary says "High" while user table says "Critical" for the same user (or any other mismatch).
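The cross-check in Rule 9 amounts to treating the user table as the source of truth. A minimal sketch, assuming both sections are available as `user -> rating` dictionaries (hypothetical structures, not something the skill defines):

```python
def reconcile_ratings(table_ratings, summary_ratings):
    """Overwrite exec-summary ratings with the File-Based Top Users table ratings.

    Returns (fixed_summary, conflicts), where conflicts lists
    (user, summary_rating, table_rating) for each mismatch found.
    """
    fixed = dict(summary_ratings)
    conflicts = []
    for user, rating in summary_ratings.items():
        truth = table_ratings.get(user)
        if truth is not None and truth != rating:
            conflicts.append((user, rating, truth))
            fixed[user] = truth  # the User Table rating wins
    return fixed, conflicts
```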
Rule 10: Temporal Pattern — Include File-Based Event Column
The Temporal Pattern (daily volume) section MUST include a File Events column alongside the total. Without this, daily spikes appear alarming when they may be entirely Copilot-driven.
| Column | Required | Source |
|---|---|---|
| Date | ✅ | bin(Timestamp, 1d) |
| Daily Events | ✅ | Total count() |
| File Events | ✅ | countif(Workload !in ("Copilot", "ConnectedAIApp")) |
| Distinct Users | ✅ | dcount(AccountUpn) |
| Notable | ✅ | Annotation for spikes |
When annotating spikes (🔴), clarify whether the spike is Copilot-driven or file-driven:
- "🔴 Major spike — file-driven" (if File Events also spike)
- "🟡 Copilot adoption spike — file activity normal" (if only total spikes but File Events are stable)
Use Query 5 (updated) which returns both columns.
Rule 11: Adaptive Label Analysis Depth
The depth of sensitivity label analysis MUST be determined dynamically based on the Phase 1 Label Coverage Assessment. This prevents wasting queries in SIT-dominant environments while ensuring full coverage in label-mature environments.
| Label Coverage | Report Behavior |
|---|---|
| ≥5% labeled events | Full dedicated label sections: Label Landscape table, Label-Based Top Users, Label Changes, Copilot Label Exposure, Label-Only Events. These render as peer sections alongside SIT analysis |
| 1-5% labeled events | Condensed "Sensitivity Labels" section: Label Landscape table + Label Changes only. Other label dimensions mentioned in summary notes |
| <1% labeled events | Brief note in Scope & Limitations: "Sensitivity label data is sparse (<1% of events) — environment is SIT-dominant. N events had labels; see label coverage stats below." No dedicated label sections unless user asks |
| User explicitly asks about labels | Full label sections regardless of coverage percentage |
The Label Coverage Assessment also determines the overall report framing:
- SIT-dominant (<5% labels): Report title/framing stays "SIT Access Analysis" with label addendum
- Dual signal (5-50% labels): Report framing becomes "Data Security Analysis (SIT + Labels)"
- Label-dominant (>50% labels): Report framing becomes "Data Security Analysis" with labels as primary signal and SIT as secondary
⛔ PROHIBITED: Running all 5 label queries (11-15) when coverage is <1% and user didn't ask about labels — this wastes API calls and clutters the report.
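The two decision tables in Rule 11 can be expressed as a pair of small functions. This is a sketch under one assumption worth flagging: exactly 5% appears in both the "≥5%" and "1-5%" rows, and the code below treats it as full depth.

```python
def label_report_mode(label_pct, user_asked_about_labels=False):
    """Map label coverage (percent of events with labels) to analysis depth."""
    if user_asked_about_labels or label_pct >= 5:
        return "full"          # dedicated label sections
    if label_pct >= 1:
        return "condensed"     # Label Landscape + Label Changes only
    return "brief_note"        # note in Scope & Limitations, no label queries


def report_framing(label_pct):
    """Map label coverage to the overall report title/framing."""
    if label_pct > 50:
        return "Data Security Analysis"                  # label-dominant
    if label_pct >= 5:
        return "Data Security Analysis (SIT + Labels)"   # dual signal
    return "SIT Access Analysis"                         # SIT-dominant
```

Only the `"full"` mode justifies running all five label queries (11-15).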
Rule 12: Label Display Format — Always Show Parent/Child Hierarchy
When rendering sensitivity label names, ALWAYS show the parent-child hierarchy using "/" notation:
| Raw Label | Correct Display | Incorrect Display |
|---|---|---|
| Sub-label under Highly Confidential | "Highly Confidential / All Employees" | "All Employees" |
| Sub-label under Confidential | "Confidential / All employees" | "All employees" |
| Top-level label | "General" | "General" (correct as-is) |
| Unknown label | "[Unknown] abc12345..." | blank or omitted |
This prevents ambiguity — "All Employees" exists under BOTH Confidential and Highly Confidential as different labels with different GUIDs.
⛔ PROHIBITED: Displaying sub-label names without their parent (e.g., just "Specific people" — which parent?).
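A display helper following the Rule 12 table might look like this. The function name and arguments are hypothetical, and the 8-character GUID prefix simply mirrors the `[Unknown] abc12345...` example above.

```python
def display_label(name, parent=None, guid=None):
    """Render a sensitivity label with its parent, per the Rule 12 table."""
    if not name:
        # Unknown labels are never blank or omitted
        return f"[Unknown] {guid[:8]}..." if guid else "[Unknown]"
    return f"{parent} / {name}" if parent else name
```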
Rule 13: Service Account Separation in Copilot Analysis
All Copilot SIT analysis MUST separate automated service accounts from human users. Service accounts (Security Copilot agents, Purview agents, etc.) can generate 50-70% of Copilot event volume and must not inflate human risk metrics.
| Requirement | Detail |
|---|---|
| Identify service accounts | Filter: AccountUpn matches regex ServiceAccountPattern (defined in Query 16a: @"^(securitycopilotagentuser-|svc-)"). Check Query 16a results for additional org-specific patterns (any account with >10K events and a GUID-like UPN prefix is likely automated) |
| Report agent volume separately | Include a summary line: "⚠️ N automated service accounts generated X events (Y% of Copilot volume) — excluded from human risk analysis below" |
| Never mix in user rankings | Queries 16b-16d exclude agents by default. If rendering an overall Copilot user table, agents go in a separate subsection |
| Power-user flagging | After agent exclusion, if any single human user accounts for >20% of remaining Copilot events, flag them: "⚠️ Power user — may indicate automated workflow via Copilot" |
⛔ PROHIBITED: Including automated service accounts in human user risk rankings for Copilot SIT analysis.
⛔ PROHIBITED: Reporting raw Copilot event counts as "user interactions" without noting the ~2-3x row multiplication factor.
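To make the separation in Rule 13 concrete, here is a minimal sketch that applies the same regex quoted above to per-account event counts and then flags power users. The input dictionary and function names are assumptions; org-specific patterns from Query 16a would need to be added to the regex.

```python
import re

# Pattern quoted from Rule 13; matching is case-sensitive here
SERVICE_ACCOUNT_PATTERN = re.compile(r"^(securitycopilotagentuser-|svc-)")


def split_accounts(events_by_upn):
    """Split {upn: event_count} into (agents, humans)."""
    agents = {u: n for u, n in events_by_upn.items() if SERVICE_ACCOUNT_PATTERN.match(u)}
    humans = {u: n for u, n in events_by_upn.items() if u not in agents}
    return agents, humans


def power_users(humans, threshold=0.20):
    """Humans holding more than `threshold` of the remaining Copilot events."""
    total = sum(humans.values())
    if total == 0:
        return []
    return [u for u, n in humans.items() if n / total > threshold]
```

The agent total divided by the overall total gives the Y% for the required summary line.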
Rule 14: Copilot Event Row Multiplication Awareness
Each Copilot interaction generates multiple DSE rows (average ~2-3x, up to 35x for complex exchanges). Purview creates separate rows for:
- The prompt (if it contains a SIT match) — "Risky prompt entered in Copilot"
- The response (if grounding data contains SIT matches) — "Sensitive response received in Copilot"
- Compound events — agent interactions, SharePoint access, multiple simultaneous conditions
| Reporting Requirement | Format |
|---|---|
| Raw event counts | "N Copilot SIT events (raw DSE rows)" |
| Estimated interactions | "~N/2.5 estimated real interactions" (use the ratio from the environment if calculated) |
| User daily rates | If citing per-user daily rates, note whether raw or estimated: "~X interactions/day (estimated from Y raw events)" |
This rule prevents stakeholders from seeing "270K Copilot events" and panicking when the actual interaction count is ~95K.
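The estimate in Rule 14 is a single division. The sketch below uses the 2.5 default from the table; the parameter name `rows_per_interaction` is illustrative, and an environment-specific ratio should replace the default when one has been calculated.

```python
def estimated_interactions(raw_rows, rows_per_interaction=2.5):
    """Estimate real Copilot interactions from raw DSE row counts."""
    return round(raw_rows / rows_per_interaction)


def format_copilot_volume(raw_rows, rows_per_interaction=2.5):
    """Report line that always carries the multiplier context."""
    est = estimated_interactions(raw_rows, rows_per_interaction)
    return f"{raw_rows:,} Copilot SIT events (raw DSE rows), ~{est:,} estimated real interactions"
```

With the default ratio, 270,000 raw rows come out at roughly 108,000 interactions; a figure near 95K, as in the example above, corresponds to a measured ratio closer to 2.8.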
Rule 15: Copilot SIT Report Section (Phase 2.5 Output)
When Phase 2.5 is triggered (Copilot >30% of events), render a dedicated "Copilot SIT Exposure" section with:
- Service Account vs Human Split — "X events from N service accounts (Y%), Z events from N human users (Y%)" with the agent exclusion note
- SIT Priority Landscape — Table from Query 16a showing SITs by priority tier (High/Medium/Low) with human-only metrics
- High-Priority SIT Users — Top 10 human users from Query 16b with prompt vs response breakdown
- Prompt Risk Signal — Top 10 users from Query 16d who typed high-priority SITs into Copilot prompts — this is the primary actionable finding
- Temporal Trend — Daily trend from Query 16c (optional — include if spikes are interesting)
If Copilot events are <30% of total, skip Phase 2.5 entirely and note in the Copilot section: "Copilot events represent X% of total volume — below threshold for dedicated analysis. See Action Type breakdown for Copilot summary."
Inline Chat Format
## 📊 Data Security Events Analysis
**Scope:** <Tenant-wide / SIT: <name> / User: <UPN>>
**Time Range:** <start> to <end>
**Total Events:** <N> | **Distinct Users:** <N> | **Distinct Files:** <N>
### SIT Landscape
| # | SIT Name | GUID (short) | Events | Users | Files | Classifier |
|---|----------|-------------|--------|-------|-------|------------|
| 1 | Credit Card Number | 50842eb7... | 7,255 | 46 | 346 | Content |
| 2 | All Full Names | 50b8b56b... | 128,957 | 1,475 | 119 | Content |
| ... | | | | | | |
### Action Type Breakdown
| Action Type | Events | Users | Files |
|-------------|--------|-------|-------|
| Sensitive response received in Copilot | 228,564 | ... | ... |
| Risky prompt entered in Copilot | 390,905 | ... | ... |
| ... | | | |
### 🔴 Top Users by SIT Volume (Risk-Ranked)
| # | User | Total Events | Distinct SITs | Distinct Files | Last Active |
|---|------|-------------|---------------|----------------|-------------|
| 1 | user@domain.com | 12,345 | 8 | 42 | 2026-03-16 |
| ... | | | | | |
### 🏷️ Sensitivity Label Landscape (if ≥1% coverage)
| # | Label | GUID (short) | Events | Users | Files |
|---|-------|-------------|--------|-------|-------|
| 1 | Highly Confidential / All Employees | defa4170...000a | 62 | 9 | 31 |
| 2 | General | defa4170...0002 | 55 | 12 | 32 |
| ... | | | | | |
### ⚠️ Label Changes (if any PreviousSensitivityLabelId events exist)
| Timestamp | User | File | Previous Label | Current Label | Action |
|-----------|------|------|---------------|--------------|--------|
| 2026-03-11 | user@domain.com | doc.docx | HC / Project X | Confidential / All employees | Label downgraded |
| ... | | | | | |
### 🤖 Copilot SIT Exposure (if Copilot >30% of events)
**Service Accounts:** N agents generated X events (Y%) — excluded from analysis below
**Row Multiplication:** ~2-3 DSE rows per real interaction (est. ~Z real interactions)
| # | SIT Name | Priority | Human Events | Human Users | Prompts | Responses |
|---|----------|----------|-------------|-------------|---------|----------|
| 1 | Credit Card Number | 🔴 High | 1,234 | 89 | 456 | 778 |
| 2 | All Full Names | 🔵 Low | 45,678 | 1,200 | 12,345 | 33,333 |
| ... | | | | | | |
**🔴 Prompt Risk — Users Typing High-Priority SITs Into Copilot:**
| # | User | Prompt Events | SIT Types | Last Active |
|---|------|-------------|-----------|-------------|
| 1 | user@domain.com | 142 | SSN, Credit Card | 2026-03-16 |
| ... | | | | |
### ⚠️ SIT Access Spike Alerts
| User | Baseline (daily avg) | Recent (daily avg) | Spike Ratio | Status |
|------|---------------------|-------------------|-------------|--------|
| user@domain.com | 5.2 | 47.1 | 9.06x | 🔴 Spike |
| ... | | | | |
### Unresolved SIT GUIDs
<List of GUIDs not in mapping — offer PowerShell resolution>
Markdown File Format
Same structure as inline, wrapped in proper markdown with:
- Report metadata header (generated timestamp, scope, tool versions)
- Scope & Limitations section immediately after Executive Summary (see Rule 5 above)
- Each section as H2/H3
- File-based user ranking as the primary risk section (NOT the Copilot-dominated overall ranking)
- DLP Policy Match section with DefaultRule explanation for empty PolicyName entries
- Sensitivity Label sections (if coverage ≥1% or user requested): Label Landscape, Label-Based Top Users, Label Changes, Copilot Label Exposure
- Code fences for any raw data samples
- Save to: `reports/data-security/DataSecurity_Analysis_<scope>_YYYYMMDD_HHMMSS.md`
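For illustration, the save path could be built as follows. The sanitizing of `<scope>` into a filename-safe token is an assumption of this sketch; the skill itself only specifies the pattern.

```python
import re
from datetime import datetime


def report_path(scope, now=None):
    """Build reports/data-security/DataSecurity_Analysis_<scope>_YYYYMMDD_HHMMSS.md."""
    now = now or datetime.now()
    # Assumption: collapse anything that is not filename-safe into underscores
    safe_scope = re.sub(r"[^A-Za-z0-9_-]+", "_", scope).strip("_")
    return f"reports/data-security/DataSecurity_Analysis_{safe_scope}_{now:%Y%m%d_%H%M%S}.md"
```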
Known Pitfalls
| Pitfall | Detail | Mitigation |
|---|---|---|
| `SensitiveInfoTypeInfo` is `Collection(String)`, not `dynamic` | Each element is a JSON string requiring double-parse: `parse_json(tostring(SensitiveInfoTypeInfo))` to expand array, then `parse_json(tostring(element))` to access fields | Always double-parse. Direct dot-notation fails silently |
| Massive row counts | 600k+ rows in 30 days for mid-size tenants; millions for 100k+ user orgs | ALWAYS summarize first. NEVER retrieve raw rows without take limit |
| `mv-expand` is expensive | Expanding `SensitiveInfoTypeInfo` across 600k rows without pre-filtering is extremely slow | Pre-filter with `where SensitiveInfoTypeInfo has "<GUID>"` before `mv-expand` |
| Copilot dominates event volume | 90%+ of events can be Copilot-related. See Rules 2-3 for report handling and Phase 2.5 for dedicated Copilot analysis | Filter to specific ActionType values for file access. Use Workload !in ("Copilot", "ConnectedAIApp") for file-only analysis |
| Trainable classifiers (MLModel) don't resolve | GUIDs with `ClassifierType: "MLModel"` may not exist in `Get-DlpSensitiveInformationType` | Display as `[ML Classifier] <GUID>` and note in report |
| SIT GUID is per-SIT, not per-detection | Multiple documents can match the same SIT GUID — the GUID identifies the SIT type, not the specific match | Use Count and Confidence fields from SITJson for detection-level detail |
| `ObjectId` can be empty | Copilot interaction events may not have an `ObjectId` (no specific file) | Filter `isnotempty(ObjectId)` for file-specific analysis |
| IRM opt-in required | DataSecurityEvents is populated by Insider Risk Management. No opt-in = empty table | Check for 0 results and explain the prerequisite |
| Table schema evolves | DataSecurityEvents is in Preview — column names and availability may change | Run getschema if queries fail with column resolution errors |
| `DlpPolicyMatchInfo` is sparse | Only ~0.5% of rows have DLP policy match data (the rest are IRM-only detections) | Don't assume all SIT events have DLP data; they're independent signals |
| Duplicate GUID in SIT mapping | One GUID can only resolve to one SIT name. If a GUID appears in both the embedded datatable and a Tier 2/3 resolution with a different name, the result will have duplicate rows with conflicting names. This can happen when a built-in SIT GUID overlaps with a custom SIT copy, or when PowerShell returns a different display name than the embedded mapping | After resolving all GUIDs, deduplicate by GUID before rendering the SIT Landscape table. If a GUID maps to multiple names, prefer the Tier 3 (PowerShell) name over Tier 1 (embedded). Never show the same GUID on two rows with different names |
| Empty `PolicyName` = DefaultRule ("Always audit") | DLP alerts with empty/null `PolicyName` are typically generated by the built-in DefaultRule that fires when "Always audit file activity for devices" is enabled (ON by default). These are NOT orphaned or misconfigured policies — they are the expected result of the default audit setting | In the DLP Policy Match section, explain: "Events with empty PolicyName are generated by the built-in DefaultRule, which audits all monitored file types (Office, PDF, CSV) on onboarded devices when 'Always audit file activity for devices' is enabled (default: ON). No explicit DLP policy is required for these events." |
| Compound ActionType values | Some events have ActionType values that combine multiple labels (e.g., "Risky prompt entered in Copilot, Sensitive response received in Copilot" or "Sensitive info shared on Teams, DLP Rule Matched"). These are literal string values from the data, not aggregation artifacts | Display compound ActionTypes exactly as they appear in the data. Do NOT split or merge them — they represent events where multiple conditions were met simultaneously |
| System/operational files dominate Top Files | Files under `C:\ProgramData\Microsoft\Windows Defender\DLPCache\RMSLabels\` and `*\EBWebView\*` are Defender for Endpoint reading sensitivity label metadata — NOT user-initiated file access. These can dominate 90%+ of the Top Files list | Query 4 filters these paths. If they still appear, separate into a "System/Operational Files" subsection. Never present DLPCache reads as user data access risk |
| Localized SIT names in CloudAppEvents | CloudAppEvents DLPRuleMatch events include SIT names, but names appear in the user's locale (e.g., "የዱቤ ካርድ ቁጥር" instead of "Credit Card Number" for Amharic users). Same GUID can map to different name strings depending on locale | Always aggregate by SIT GUID, never by SIT name. Use the GUID-to-name mapping (Tier 1/2/3) for canonical English names. This applies when cross-referencing CloudAppEvents with DataSecurityEvents |
| Browsing events are not files — separate from Top Files | ActionTypes like "Generative AI websites browsed" and "Gambling websites browsed" reference URLs, not files. They have no ObjectId file path — only a URL domain. Including them under "Top Files" is misleading | Render browsing/URL events in a separate subsection (e.g., "External / AI Service Access") below Top Files. Never mix URL-based events into the file ranking tables. Title the Top Files section accurately ("Top Files" not "Top Files & URLs") |
| Temporal spike annotations must reference known users | When annotating daily spikes in the Temporal Pattern table (e.g., "🔴 Major spike — u625 SharePoint batch"), the attributed user MUST appear elsewhere in the report — in the File-Based Top Users table, a drill-down section, or at minimum a footnote. Referencing a user that exists nowhere else in the report violates the evidence-based analysis rule and creates an unverifiable claim | Before annotating a spike with a user attribution, verify the user appears in the Top Users ranking. If they don't make the top-10 but are the spike driver, either: (a) add them to the user table with a note "included due to temporal spike attribution", or (b) use a generic annotation ("🔴 Major spike — file-driven") without naming the user |
| `SensitivityLabelId` can contain comma-separated GUIDs | When an event involves multiple sub-entities (e.g., Copilot accessing multiple labeled resources), the `SensitivityLabelId` column contains comma-separated GUIDs like `aaaa-...,aaaa-...,bbbb-...`. This is NOT a single GUID | Always `split(SensitivityLabelId, ",")` then `mv-expand` to handle multi-GUID values. Querying with `== "<GUID>"` will miss events; use `has "<GUID>"` for filtering or `split()` + `mv-expand` for enumeration |
| `RiskyAIUsageSensitivityLabelsInfo` is mostly `[null]` | The column is populated on 90%+ of Copilot events, but almost all contain `[null]` (a JSON array with a literal null element). Only events where Copilot actually accessed labeled resources have real data | Filter with `tostring(RiskyAIUsageSensitivityLabelsInfo) !has "null"` to exclude the null-dominated rows. The `isnotempty()` check alone is insufficient — `[null]` passes it |
| Label GUIDs have no `ClassifierType` field | Unlike SIT GUIDs which have `ClassifierType: "Content"` or `"MLModel"`, label GUIDs have no built-in type indicator. Resolution requires external lookup (datatable, config, or PowerShell `Get-Label`) | Use the Label GUID Mapping Strategy (3 tiers). Unknown labels display as `[Unknown] <GUID>` |
| Default label GUIDs are deterministic but not officially documented | All 12 default labels (including parents) use `defa4170-0d19-0005-XXXX-bc88714345d2` with sequential suffixes `0000`–`000b` (confirmed via `Get-Label` on default-configuration tenants). However, Microsoft does not publish these GUIDs in official docs. Older/customized tenants may have: (a) renamed default labels (e.g., "Non-business" instead of "Personal"), (b) replaced parent label GUIDs with random tenant-specific ones, or (c) added custom sub-labels that break the sequential pattern | The embedded datatable includes all 12 confirmed default GUIDs. For tenants with custom parent GUIDs, add them via `config.json` `label_mapping`. If `Get-Label` returns different names for a `defa4170` GUID, prefer the `Get-Label` name for that tenant |
| `PreviousSensitivityLabelId` can equal `SensitivityLabelId` | On "Label removed from a file" events, both fields may contain the same GUID (the label that was removed). The current label after removal may be empty | Check ActionType to distinguish: "Label downgraded" = actual label change; "Label removed" = label stripped (current may be empty); "Label on file downgraded or removed, then file accessed by Copilot" = compound event with Copilot exposure |
| Label-only events require label-aware DLP/IRM policies | Events with labels but no SIT content (Query 15) only appear if a DLP or IRM policy explicitly targets sensitivity labels. Environments with no label-based policies will see zero label-only events even if documents are labeled | If Query 15 returns 0 results AND labels exist (Query 11 > 0), this indicates no label-based DLP/IRM policies are configured — mention this as a gap in the report |
| Copilot service accounts inflate event counts | Automated service accounts can generate 50-70% of all Copilot events. See Rule 13 for full requirements | Filter: where not(AccountUpn matches regex ServiceAccountPattern) (defined in Query 16a). Run Query 16a to quantify agent vs human split |
| Copilot row multiplication (~2-3x per interaction) | Each interaction generates ~2-3 DSE rows (prompt + response + compound events), up to 35 for complex exchanges. See Rule 14 for reporting requirements | Estimate real interactions as raw_events / 2.5. Always include multiplier context when reporting Copilot volumes |
| Correct field name is `SensitiveInfoTypeId` (not `SensitiveType`) | Inside the `SensitiveInfoTypeInfo` JSON, the SIT GUID field is named `SensitiveInfoTypeId`. LLMs frequently generate `SensitiveType` or `SensitiveTypeId` — both are wrong and return null/empty when accessed | Always use `tostring(SITJson.SensitiveInfoTypeId)`. Other valid fields: `Confidence` (int), `ClassifierType` (string), `Count` (int), `SubEntityName` (string — "Prompt" or "Response" in Copilot events) |
| `SubEntityName` distinguishes Prompt vs Response in Copilot SIT events | Inside `SensitiveInfoTypeInfo` JSON, `SubEntityName` contains "Prompt" or "Response" — indicating whether the SIT was detected in the user's prompt or Copilot's response. This is more reliable than relying on ActionType text matching for prompt/response classification within individual SIT matches | Use `tostring(SITJson.SubEntityName)` when doing per-SIT prompt/response breakdowns. ActionType classifies the event-level action; `SubEntityName` classifies the per-SIT-match context |
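Several of the pitfalls above are about data shape rather than KQL syntax, so they apply equally to exported rows handled in a script. The sketch below is a Python analogue, not the skill's queries: it assumes each exported cell arrives as JSON text, and the helper names are hypothetical.

```python
import json


def parse_sit_info(raw):
    """Double-parse SensitiveInfoTypeInfo: an array whose elements are JSON strings."""
    outer = json.loads(raw)  # first parse: array of strings
    # second parse: each element into a dict (tolerate already-parsed objects)
    return [json.loads(e) if isinstance(e, str) else e for e in outer]


def split_label_ids(value):
    """SensitivityLabelId may hold several comma-separated GUIDs."""
    return [g.strip() for g in value.split(",") if g.strip()]


def has_real_label_info(raw):
    """False for empty cells and for the common "[null]" placeholder."""
    parsed = json.loads(raw) if raw else []
    return any(item is not None for item in parsed)
```

After the second parse, the SIT GUID is read from `SensitiveInfoTypeId` and the prompt/response context from `SubEntityName`, matching the field names listed in the table.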
Error Handling
| Error | Cause | Resolution |
|---|---|---|
| `Failed to resolve table 'DataSecurityEvents'` | Table not available — IRM not opted in, or not connected to Defender XDR | Inform user: "DataSecurityEvents requires Microsoft Purview Insider Risk Management opt-in to share data with Defender XDR." |
| 0 results for `SensitiveInfoTypeInfo` queries | No SIT detections in timeframe, or SIT detection not enabled in IRM policies | Widen time range; check if IRM policies include SIT detection |
| `Failed to resolve column 'ObjectName'` | Schema changed or column renamed | Use `ObjectId` instead (confirmed available). Run `getschema` to verify current schema |
| PowerShell `Get-DlpSensitiveInformationType` fails | Not connected to IPPS session | Run `Connect-IPPSSession -UserPrincipalName <UPN>` first |
| `The term 'Get-DlpSensitiveInformationType' is not recognized` | Module not installed or IPPS session in different terminal | `Install-Module ExchangeOnlineManagement` then `Connect-IPPSSession` in the same terminal session |
| `The term 'Get-Label' is not recognized` | Not connected to IPPS session or module not loaded | Same as above — `Connect-IPPSSession` provides both `Get-DlpSensitiveInformationType` and `Get-Label` |
| Label GUIDs all show as `[Unknown]` | Default label datatable doesn't match target tenant (parent GUIDs vary) | Resolve via `Get-Label` and persist to `config.json` `label_mapping` |
File Access Action Types Reference
When the user specifically asks about who opened/accessed/downloaded documents, filter to these ActionTypes:
| ActionType | Meaning |
|---|---|
| `Sensitive File read` | File opened on endpoint (Defender for Endpoint) |
| `File accessed on SPO` | File opened in SharePoint Online / OneDrive |
| `File downloaded from SharePoint` | File downloaded from SPO/OneDrive |
| `File copied to Removable media` | File copied to USB/removable storage |
| `File upload to cloud` | File uploaded to cloud storage |
| `Sensitive file created` | New file created with sensitive content |
| `File Archived` | File moved to archive |
| `Text copied to clipboard from sensitive file` | Clipboard copy from sensitive doc |
Copilot-related ActionTypes (separate category — AI interaction, not direct file access):
| ActionType | Meaning |
|---|---|
| `Sensitive response received in Copilot` | Copilot surfaced content matching a SIT |
| `Risky prompt entered in Copilot` | User prompt triggered risk detection |
| `Sensitive response received in Copilot;Agent generating sensitive responses` | Copilot agent generated response containing SIT matches |
| `Risky prompt entered in Copilot;Sensitive response received in Copilot` | Both prompt and response contained SIT matches |
| `Risky prompt entered in Copilot;Sensitive response received in Copilot;Exposing agent to risky prompts;Agent generating sensitive responses` | Compound agent event — prompt + response + agent risk |
| `Risky prompt entered in Copilot;Exposing agent to risky prompts` | User exposed a Copilot agent to a risky prompt |
Label-related ActionTypes (sensitivity label events):
| ActionType | Meaning |
|---|---|
| `Label downgraded on a file` | Sensitivity label lowered (e.g., HC → Confidential) |
| `Label removed from a file` | Sensitivity label stripped entirely |
| `Label on file downgraded or removed, then file accessed by Copilot` | Label reduced AND Copilot subsequently accessed the file |
DLP ActionTypes:
| ActionType | Meaning |
|---|---|
| `Generated High severity DLP alerts` | DLP policy triggered a high-severity alert |
| `DLP Rule Matched` | DLP rule matched (may be combined with other types) |
SVG Dashboard Generation
After generating a Data Security Events analysis report (markdown file output), an SVG dashboard can be created using the shared SVG rendering skill.
Trigger: User asks "generate an SVG dashboard from the report" or "visualize this report"
Workflow:
- Read this skill's `svg-widgets.yaml` (widget manifest — defines layout, colors, field mapping)
- Read `.github/skills/svg-dashboard/SKILL.md` (rendering rules — component library, quality standards)
- Extract data from the completed report using `data_sources.field_mapping_notes`
- Render SVG → save as `{report_basename}_dashboard.svg` in the same directory
Layout: 5 rows — title banner, KPI cards (events/users/files/file%/copilot%/label%), top SITs bar chart + workload donut, risk-ranked users table + file action bars, assessment banner + recommendations.
.github/skills/email-threat-posture/SKILL.md
npx skills add SCStelz/security-investigator --skill email-threat-posture -g -y
SKILL.md
Frontmatter
{
"name": "email-threat-posture",
"description": "Generate email threat protection reports and assess email security posture. Triggers on keywords like \"email threat report\", \"email security posture\", \"phishing report\", \"MDO report\", \"Defender for Office 365 report\", \"ZAP effectiveness\", \"Safe Links report\", \"DMARC report\", \"spam report\", \"email volume report\". Queries EmailEvents, EmailPostDeliveryEvents, UrlClickEvents, and EmailAttachmentInfo in Advanced Hunting for a posture assessment covering inbound mail flow, threat composition, phishing detection, email authentication (DMARC\/DKIM\/SPF), post-delivery remediation (ZAP), Safe Links click protection, attachment analysis, detection method effectiveness, and delivery disposition. Supports inline chat, markdown file, and SVG dashboard output.",
"drill_down_prompt": "Run email threat posture report — phishing trends, delivery gaps, protection effectiveness",
"threat_pulse_domains": [
"email"
]
}
Email Threat Protection Posture — Instructions
Purpose
This skill generates an Email Threat Protection Posture Report using Microsoft Defender for Office 365 (MDO) telemetry available through Advanced Hunting. It provides C-level visibility into how effectively the organization's email security stack is detecting, blocking, and remediating email-based threats.
What this skill covers:
| Domain | Key Questions Answered |
|---|---|
| 📬 Mail Flow Overview | How many inbound emails? What's the daily trend? Who are the top senders? |
| 🛡️ Threat Composition | How many phishing, spam, and malware threats were detected? |
| 🎯 Phishing Protection | How many phishing emails were blocked vs delivered? Who are the most targeted users? |
| 🔐 Email Authentication | What are the DMARC/DKIM/SPF/CompAuth pass rates? Which domains fail authentication? |
| 🧹 Post-Delivery Remediation | How effective is ZAP? How many remediations succeeded vs failed? |
| 🔗 Safe Links Protection | How many URL clicks were scanned? Were any phishing clicks allowed through? |
| 📎 Attachment Analysis | What attachment types are flowing through email? Were any malicious? |
| 📊 Detection Methods | What detection technologies are catching threats (URL detonation, fingerprinting, etc.)? |
| 📦 Delivery Disposition | Where do emails end up — inbox, junk, quarantine, blocked? |
| 🚨 MDO Incidents | How many security incidents were generated by Defender for Office? What severity, status, and types? |
Data sources: EmailEvents, EmailPostDeliveryEvents, UrlClickEvents, EmailAttachmentInfo, SecurityAlert, SecurityIncident (Advanced Hunting)
References:
- Microsoft Docs — EmailEvents table
- Microsoft Docs — EmailPostDeliveryEvents table
- Microsoft Docs — UrlClickEvents table
- Microsoft Docs — EmailAttachmentInfo table
- MDO Efficacy Query — Microsoft Learn
🔴 URL Registry — Canonical Links for Report Generation
MANDATORY: When generating reports, copy URLs verbatim from this registry. NEVER construct, guess, or paraphrase a URL.
| Label | Canonical URL |
|---|---|
| `DOCS_EMAILEVENTS` | https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-emailevents-table |
| `DOCS_EMAILPOSTDELIVERY` | https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-emailpostdeliveryevents-table |
| `DOCS_URLCLICKEVENTS` | https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-urlclickevents-table |
| `DOCS_EMAILATTACHMENTINFO` | https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-emailattachmentinfo-table |
| `DOCS_MDO_EFFICACY` | https://learn.microsoft.com/en-us/defender-office-365/reports-mdo-email-collaboration-dashboard#appendix-advanced-hunting-efficacy-query-in-defender-for-office-365-plan-2 |
| `DOCS_MDO_OVERVIEW` | https://learn.microsoft.com/en-us/defender-office-365/mdo-about |
| `DOCS_SECURITY_ALERT` | https://learn.microsoft.com/en-us/azure/sentinel/data-connectors/microsoft-sentinel-security-alert |
| `DOCS_ZAP` | https://learn.microsoft.com/en-us/defender-office-365/zero-hour-auto-purge |
| `DOCS_SAFE_LINKS` | https://learn.microsoft.com/en-us/defender-office-365/safe-links-about |
📑 TABLE OF CONTENTS
- Critical Workflow Rules — Mandatory rules
- Email Protection Score Formula — Composite posture scoring
- Execution Workflow — Phase-by-phase query plan
- Sample KQL Queries — All queries (Q1–Q14)
- Output Modes — Inline vs Markdown report
- Inline Report Template — Chat-rendered format
- Markdown File Report Template — Disk-saved format
- Known Pitfalls — Schema quirks and edge cases
- Quality Checklist — Pre-delivery validation
- SVG Dashboard Generation — Visual dashboard from report
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
- Use RunAdvancedHuntingQuery by default: EmailEvents and related tables are XDR-native tables available in Advanced Hunting. Use Timestamp as the datetime column. If a query fails in AH, fall back to Sentinel Data Lake (query_lake) using TimeGenerated.
- Default lookback: 7 days, unless the user specifies a different period. This provides a meaningful weekly snapshot for executive reporting while staying within AH's 30-day retention.
- ASK the user for output format before generating the report:
  - Inline chat summary (quick review in chat)
  - Markdown file report (detailed, archived to reports/email-threat-posture/)
  - Both (markdown + inline summary)
- ⛔ MANDATORY: Evidence-based analysis only. Report ONLY what query results show. Use the explicit absence pattern (✅ No [finding] detected) when queries return 0 results. Never fabricate data.
- Run queries in parallel batches where possible: Phase 1 queries (Q1–Q4) are independent. Phase 2 queries (Q5–Q8) are independent. Phase 3 queries (Q9–Q12) are independent. Phase 4 queries (Q13–Q14) are independent.
- PII handling: Do NOT include recipient email addresses in inline reports or markdown files. Aggregate by domain or use anonymized references (e.g., "2 users in the contoso.com domain"). Top sender domains from external sources are acceptable.
- Percentages must be grounded: Always show both the percentage AND the raw count (e.g., "99.8% clean (5,851 of 5,864)").
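The "percentages must be grounded" rule can be enforced with a small formatting helper. The following is a minimal sketch, not part of the skill itself; the function name format_rate and its label parameter are hypothetical.

```python
def format_rate(part: int, total: int, label: str = "") -> str:
    """Return a percentage together with its raw counts.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if total == 0:
        # No data: report the absence explicitly instead of dividing by zero.
        return "n/a (0 of 0)"
    pct = 100 * part / total
    label_part = f" {label}" if label else ""
    # Thousands separators make the raw counts easier to read in a report.
    return f"{pct:.1f}%{label_part} ({part:,} of {total:,})"


print(format_rate(5851, 5864, "clean"))
```

Routing every percentage in the report through one helper like this makes it hard to emit a bare percentage by accident.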
Email Protection Score Formula
The Email Protection Score is a composite posture indicator summarizing the effectiveness of email security controls. Higher scores indicate stronger protection (inverse of a risk score).
Scoring Dimensions
$$ \text{EmailProtectionScore} = \sum_{i} \text{DimensionScore}_i $$
Each dimension contributes 0–20 points to a maximum of 100:
| Dimension | Max | 🟢 High (16–20) | 🟡 Medium (8–15) | 🔴 Low (0–7) |
|---|---|---|---|---|
| Threat Block Rate | 20 | ≥95% of threats not in inbox (post-ZAP final state) | 80–94% remediated | <80% remediated (threats remain in inbox) |
| Email Authentication | 20 | SPF+DMARC+DKIM all ≥95% | Any one 80–94% | Any one <80% |
| ZAP Effectiveness | 20 | ≥95% ZAP success rate + 0 failed ZAPs | 80–94% success OR 1–2 failures | <80% success OR ≥3 failures |
| Safe Links Protection | 20 | 0 phishing click-throughs AND active scanning | 1–2 phishing click-throughs | ≥3 phishing click-throughs OR no scanning |
| Phishing Delivery Rate | 20 | 0 phishing emails delivered (post-ZAP) | 1–5 phishing delivered (post-ZAP) | >5 phishing still in mailboxes (post-ZAP) |
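As an illustration of how one row of the table could be evaluated, here is a sketch for the ZAP Effectiveness dimension. It returns only the band, because the table does not say how points are assigned within a band. Checking the Low conditions first is an assumption about precedence, and the function name zap_band is hypothetical.

```python
def zap_band(success_rate_pct: float, failed_zaps: int) -> str:
    """Classify ZAP Effectiveness into the High / Medium / Low band.

    Assumption: Low conditions take precedence over the others.
    """
    if success_rate_pct < 80 or failed_zaps >= 3:
        return "Low"      # 0-7 points
    if success_rate_pct >= 95 and failed_zaps == 0:
        return "High"     # 16-20 points
    return "Medium"       # 8-15 points


print(zap_band(99.0, 0))  # High band
print(zap_band(99.0, 1))  # a single failure drops the dimension to Medium
```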
Interpretation Scale
| Score | Rating | Action |
|---|---|---|
| 85–100 | ✅ Strong | Excellent posture — maintain current configurations |
| 65–84 | 🟡 Good | Minor gaps — review flagged dimensions |
| 45–64 | 🟠 Needs Improvement | Multiple weaknesses — prioritize remediation |
| 0–44 | 🔴 Critical | Significant exposure — immediate action required |
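The composite score and its rating follow directly from the two tables above. The sketch below assumes the five per-dimension scores have already been determined; the dictionary keys and function names are hypothetical.

```python
# Hypothetical keys for the five scoring dimensions.
DIMENSIONS = (
    "threat_block_rate",
    "email_authentication",
    "zap_effectiveness",
    "safe_links_protection",
    "phishing_delivery_rate",
)


def composite_score(dimension_scores: dict) -> int:
    """Sum the five dimension scores, each of which must be in 0..20."""
    for name in DIMENSIONS:
        if name not in dimension_scores:
            raise ValueError(f"missing dimension: {name}")
        if not 0 <= dimension_scores[name] <= 20:
            raise ValueError(f"{name} must be between 0 and 20")
    return sum(dimension_scores[name] for name in DIMENSIONS)


def rating(score: int) -> str:
    """Map a composite score to the rating in the interpretation scale."""
    if score >= 85:
        return "Strong"
    if score >= 65:
        return "Good"
    if score >= 45:
        return "Needs Improvement"
    return "Critical"


scores = {
    "threat_block_rate": 18,
    "email_authentication": 20,
    "zap_effectiveness": 12,
    "safe_links_protection": 20,
    "phishing_delivery_rate": 16,
}
total = composite_score(scores)
print(total, rating(total))
```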
Execution Workflow
Phase 0: Prerequisites
- Confirm RunAdvancedHuntingQuery is available (EmailEvents tables are AH-native)
- Ask user for output format (inline / markdown / both)
- Confirm lookback period (default: 7 days)
Phase 1: Mail Flow & Threat Overview (Q1–Q4)
Run in parallel — no dependencies between queries.
| Query | Purpose |
|---|---|
| Q1 | Inbound email summary with threat breakdown |
| Q2 | Email volume trend by day |
| Q3 | Delivery action and location breakdown |
| Q4 | Detection methods breakdown |
Phase 2: Protection Effectiveness (Q5–Q8)
Run in parallel — no dependencies between queries.
| Query | Purpose |
|---|---|
| Q5 | Email authentication pass rates (DMARC/DKIM/SPF/CompAuth) |
| Q6 | ZAP and post-delivery remediation summary |
| Q7 | Safe Links click activity summary |
| Q8 | Phishing emails delivered (not blocked) |
Phase 3: Deep Dives & Governance (Q9–Q12)
Run in parallel — no dependencies between queries.
| Query | Purpose |
|---|---|
| Q9 | Top phishing sender domains |
| Q10 | Most targeted recipients (aggregated) |
| Q11 | Attachment type distribution |
| Q12 | Post-ZAP threat state (latest delivery location) |
Phase 4: MDO Security Incidents (Q13–Q14)
Run in parallel — no dependencies between queries.
| Query | Purpose |
|---|---|
| Q13 | MDO incident summary by severity and status |
| Q14 | MDO incident type breakdown (top alert-driven incidents) |
⚠️ SecurityAlert.Status is IMMUTABLE — always "New" regardless of actual state. These queries use the canonical SecurityAlert→SecurityIncident join to get real Status and Classification from the SecurityIncident table. See copilot-instructions.md Known Table Pitfalls.
Phase 5: Score Computation & Report Generation
- Compute per-dimension scores from Phase 1–4 data
- Sum dimension scores for composite Email Protection Score
- Generate report in requested output mode
- Offer SVG dashboard if not already requested
Sample KQL Queries
All queries below are verified against the EmailEvents family of tables. Use them exactly as written, substituting only the lookback period where noted. These queries use Timestamp for Advanced Hunting. If falling back to Data Lake, replace Timestamp with TimeGenerated.
Query 1: Inbound Email Summary with Threat Breakdown
EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound"
| summarize
TotalInbound = count(),
Clean = countif(isempty(ThreatTypes)),
Phish = countif(ThreatTypes has "Phish"),
Malware = countif(ThreatTypes has "Malware"),
Spam = countif(ThreatTypes has "Spam"),
HighConfPhish = countif(ConfidenceLevel has "High" and ThreatTypes has "Phish"),
Blocked = countif(DeliveryAction == "Blocked"),
Delivered = countif(DeliveryAction == "Delivered"),
Junked = countif(DeliveryAction == "Junked"),
DistinctSenders = dcount(SenderFromAddress),
DistinctRecipients = dcount(RecipientEmailAddress)
| project TotalInbound, Clean, Phish, Malware, Spam, HighConfPhish,
Blocked, Delivered, Junked, DistinctSenders, DistinctRecipients
Query 2: Email Volume Trend by Day
EmailEvents
| where Timestamp > ago(7d)
| summarize
Inbound = countif(EmailDirection == "Inbound"),
Outbound = countif(EmailDirection == "Outbound"),
IntraOrg = countif(EmailDirection == "Intra-org")
by Day = bin(Timestamp, 1d)
| order by Day asc
Query 3: Delivery Action & Location Breakdown
EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound"
| summarize Count = count() by DeliveryAction, DeliveryLocation
| order by Count desc
Query 4: Detection Methods Breakdown
EmailEvents
| where Timestamp > ago(7d)
| where isnotempty(DetectionMethods) and DetectionMethods != "{}"
| extend DetMethods = parse_json(DetectionMethods)
| extend FirstDetection = tostring(bag_keys(DetMethods)[0])
| extend FirstSubcategory = iif(
FirstDetection != "" and array_length(DetMethods[FirstDetection]) > 0,
strcat(FirstDetection, ": ", tostring(DetMethods[FirstDetection][0])),
FirstDetection)
| summarize Count = count() by FirstSubcategory
| order by Count desc
Query 5: Email Authentication Pass Rates
EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound"
| extend AuthDetails = parse_json(AuthenticationDetails)
| extend
DMARC = tostring(AuthDetails.DMARC),
DKIM = tostring(AuthDetails.DKIM),
SPF = tostring(AuthDetails.SPF),
CompAuth = tostring(AuthDetails.CompAuth)
| summarize
TotalEmails = count(),
DMARCPass = countif(DMARC == "pass"),
DMARCFail = countif(DMARC == "fail"),
DKIMPass = countif(DKIM == "pass"),
DKIMFail = countif(DKIM == "fail"),
SPFPass = countif(SPF == "pass"),
SPFFail = countif(SPF == "fail"),
CompAuthPass = countif(CompAuth has "pass"),
CompAuthFail = countif(CompAuth == "fail")
Query 6: ZAP & Post-Delivery Remediation Summary
EmailPostDeliveryEvents
| where Timestamp > ago(7d)
| summarize
TotalActions = count(),
PhishZAP = countif(ActionType == "Phish ZAP"),
MalwareZAP = countif(ActionType == "Malware ZAP"),
SpamZAP = countif(ActionType == "Spam ZAP"),
ThreatZAPTotal = countif(ActionType in ("Phish ZAP", "Malware ZAP", "Spam ZAP")),
ManualRemediation = countif(ActionType has "Admin"),
SuccessCount = countif(ActionResult == "Success"),
ErrorCount = countif(ActionResult == "Error")
| project TotalActions, PhishZAP, MalwareZAP, SpamZAP, ThreatZAPTotal, ManualRemediation, SuccessCount, ErrorCount
Query 7: Safe Links Click Activity Summary
UrlClickEvents
| where Timestamp > ago(7d)
| summarize
TotalClicks = count(),
BlockedClicks = countif(ActionType == "ClickBlocked"),
AllowedClicks = countif(ActionType == "ClickAllowed"),
ClickedThrough = countif(IsClickedThrough == true),
PhishClicks = countif(ThreatTypes has "Phish"),
DistinctUrls = dcount(Url),
DistinctUsers = dcount(AccountUpn)
Query 8: Phishing Emails Delivered (Not Blocked)
EmailEvents
| where Timestamp > ago(7d)
| where ThreatTypes has "Phish"
| where DeliveryAction == "Delivered" or LatestDeliveryAction == "Delivered"
| summarize
DeliveredPhish = count(),
DistinctRecipients = dcount(RecipientEmailAddress),
DistinctSenders = dcount(SenderFromAddress),
Subjects = make_set(Subject, 5)
Query 9: Top Phishing Sender Domains
EmailEvents
| where Timestamp > ago(7d)
| where ThreatTypes has "Phish"
| summarize
Count = count(),
DistinctRecipients = dcount(RecipientEmailAddress),
DeliveredCount = countif(DeliveryAction == "Delivered" or LatestDeliveryAction == "Delivered")
by SenderFromDomain
| top 10 by Count
Query 10: Most Targeted Recipients (Aggregated by Domain)
EmailEvents
| where Timestamp > ago(7d)
| where isnotempty(ThreatTypes) and EmailDirection == "Inbound"
| extend RecipientDomain = tostring(split(RecipientEmailAddress, "@")[1])
| summarize
ThreatCount = count(),
PhishCount = countif(ThreatTypes has "Phish"),
SpamCount = countif(ThreatTypes has "Spam"),
MalwareCount = countif(ThreatTypes has "Malware"),
DistinctRecipients = dcount(RecipientEmailAddress)
by RecipientDomain
| order by ThreatCount desc
Query 11: Attachment Type Distribution
EmailAttachmentInfo
| where Timestamp > ago(7d)
| summarize
Count = count(),
DistinctFiles = dcount(FileName),
ThreatCount = countif(isnotempty(ThreatTypes))
by FileType
| order by Count desc
| take 15
Query 12: Post-ZAP Threat State (Latest Delivery Location)
EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound"
| where isnotempty(ThreatTypes)
| summarize Count = count() by LatestDeliveryAction, LatestDeliveryLocation, ThreatTypes
| order by Count desc
Query 13: MDO Incident Summary by Severity and Status
Uses the canonical SecurityAlert→SecurityIncident join. Filters to ProductName == "Office 365 Advanced Threat Protection" and excludes Communication Compliance alerts (CC_ prefix).
let MDOAlerts = SecurityAlert
| where TimeGenerated > ago(7d)
| where ProductName == "Office 365 Advanced Threat Protection"
| where AlertName !startswith "CC_"
| summarize arg_max(TimeGenerated, *) by SystemAlertId
| project SystemAlertId;
SecurityIncident
| where CreatedTime > ago(7d)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| join kind=inner MDOAlerts on $left.AlertId == $right.SystemAlertId
| summarize IncidentCount = dcount(IncidentNumber) by Severity, Status, Classification
| order by Severity asc, IncidentCount desc
Query 14: MDO Incident Type Breakdown (Top Alert-Driven Incidents)
Groups incidents by title and alert composition to show the most common MDO-generated incident types.
let MDOAlerts = SecurityAlert
| where TimeGenerated > ago(7d)
| where ProductName == "Office 365 Advanced Threat Protection"
| where AlertName !startswith "CC_"
| summarize arg_max(TimeGenerated, *) by SystemAlertId
| project SystemAlertId, AlertName, AlertSeverity, ProductName;
SecurityIncident
| where CreatedTime > ago(7d)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| join kind=inner MDOAlerts on $left.AlertId == $right.SystemAlertId
| summarize
IncidentCount = dcount(IncidentNumber),
AlertCount = count(),
OpenCount = dcountif(IncidentNumber, Status == "New" or Status == "Active"),
ClosedCount = dcountif(IncidentNumber, Status == "Closed"),
TruePositives = dcountif(IncidentNumber, Classification == "TruePositive")
by AlertName, Severity
| order by IncidentCount desc
| take 10
Output Modes
Mode 1: Inline Chat Summary
Render the full analysis directly in the chat response. Best for quick review and C-level briefings.
Mode 2: Markdown File Report
Save a comprehensive report to disk at:
reports/email-threat-posture/Email_Threat_Protection_Report_YYYYMMDD_HHMMSS.md
Mode 3: Both
Generate the markdown file AND provide an inline summary in chat.
Always ask the user which mode before generating output.
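For Mode 2 and Mode 3, the timestamped file name can be built as sketched below. The naming pattern comes from the path above. Using UTC for the timestamp is an assumption (the report header uses UTC, but the file name's time zone is not specified), and the function name report_path is hypothetical.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

REPORT_DIR = PurePosixPath("reports/email-threat-posture")


def report_path(generated_at: datetime) -> PurePosixPath:
    """Build the report path using the YYYYMMDD_HHMMSS naming pattern."""
    stamp = generated_at.strftime("%Y%m%d_%H%M%S")
    return REPORT_DIR / f"Email_Threat_Protection_Report_{stamp}.md"


# Assumption: UTC is used for the timestamp in the file name.
print(report_path(datetime.now(timezone.utc)))
```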
Inline Report Template
Render the following sections in order. Omit sections only if explicitly noted as conditional.
🔴 URL Rule: All hyperlinks in the report MUST be copied verbatim from the URL Registry above. Do NOT generate, recall from memory, or paraphrase any URL. If a needed URL is not in the registry, use plain text (no hyperlink).
# 📧 Email Threat Protection Report
**Generated:** YYYY-MM-DD HH:MM UTC
**Data Source:** Microsoft Defender for Office 365 (Advanced Hunting)
**Analysis Period:** <StartDate> → <EndDate> (<N> days)
**Protected Mailboxes:** <DistinctRecipients>
**Total Inbound Emails:** <N>
**Email Protection Score:** <Score>/100 — <RATING>
---
## Executive Summary
<2-3 sentences: total inbound volume, threat detection rate, key findings, overall posture rating>
**Email Protection Score:** 🟢/🟡/🟠/🔴 <RATING> (<Score>/100)
---
## Key Metrics
| Metric | Value |
|--------|-------|
| Total Inbound Emails | <N> |
| Clean Email Rate | <N>% (<clean> of <total>) |
| Threats Detected | <N> (Phish: <N>, Spam: <N>, Malware: <N>) |
| Threats Blocked Pre-Delivery | <N> |
| Phishing Delivered (Now Remediated) | <N> |
| Threat ZAP Actions | <N> (Phish: <N>, Malware: <N>, Spam: <N>) |
| Total Post-Delivery Actions | <N> (includes system events) |
| ZAP Success Rate | <N>% (Failed: <N>) |
| Threats Still in Mailboxes (Post-ZAP) | <N> (Phish: <N>, Spam: <N>) |
| Safe Links Clicks Scanned | <N> |
| Phishing Click-Throughs | <N> |
| Distinct Senders | <N> |
| Protected Mailboxes | <N> |
---
## 📬 Mail Flow Overview
### Daily Volume Trend
<Table or sparkline showing inbound/outbound/intra-org by day>
| Day | Inbound | Outbound | Intra-org |
|-----|---------|----------|-----------|
| <date> | <N> | <N> | <N> |
**Observations:** <Note any spikes, trends, or anomalies>
---
## 🛡️ Threat Composition
### Threat Categories
| Category | Count | % of Threats |
|----------|-------|-------------|
| Phishing | <N> | <N>% |
| Spam | <N> | <N>% |
| Malware | <N> | <N>% |
| High-Confidence Phishing | <N> | — |
### Detection Methods
| Method | Count |
|--------|-------|
| <method> | <N> |
### Top Phishing Sender Domains
| Domain | Phish Count | Delivered | Recipients Hit |
|--------|-------------|-----------|----------------|
| <domain> | <N> | <N> | <N> |
<If Q9 returns 0 phishing domains:>
✅ No phishing sender domains detected.
---
## 📦 Delivery Disposition
### Initial Delivery Action
| Action | Location | Count |
|--------|----------|-------|
| Delivered | Inbox/folder | <N> |
| Blocked | Dropped | <N> |
| Blocked | Quarantine | <N> |
| Junked | Junk folder | <N> |
### Post-ZAP Threat State
<Shows where threats currently reside after ZAP remediation>
| Latest Action | Location | Threat Type | Count |
|---------------|----------|-------------|-------|
| <action> | <location> | <type> | <N> |
**Summary of current threat locations (post-ZAP):**
| Current Location | Threat Count | % of Threats |
|-----------------|-------------|-------------|
| 🟢 Quarantine | <N> | <N>% |
| 🟢 Junk folder | <N> | <N>% |
| 🟢 Blocked/Dropped/Failed | <N> | <N>% |
| 🟢 Deleted items | <N> | <N>% |
| 🔴 **Still in Inbox** | **<N>** | **<N>%** |
| **Total** | **<N>** | **100%** |
> Show the phishing vs spam breakdown for "Still in Inbox": e.g., "<N> phishing (<N> total threats including spam)"
---
## 🔐 Email Authentication
| Protocol | Pass Rate | Pass Count | Fail Count | Other/None |
|----------|-----------|------------|------------|------------|
| SPF | <N>% | <N> | <N> | <N> |
| DMARC | <N>% | <N> | <N> | <N> |
| DKIM | <N>% | <N> | <N> | <N> |
| CompAuth | <N>% | <N> | <N> | <N> |
> **Note:** "Other/None" = emails with no result for that protocol (e.g., no DKIM signature). A low DKIM pass rate with 0 failures means unsigned senders, not spoofing. Compare against DMARC and CompAuth for the complete authentication picture.
**Assessment:**
- <emoji> <finding for each protocol>
---
## 🧹 Post-Delivery Remediation (ZAP)
| Metric | Value |
|--------|-------|
| Threat ZAP Actions | <N> (Phish: <N>, Malware: <N>, Spam: <N>) |
| Total Post-Delivery Actions | <N> (includes system events, admin actions) |
| ZAP Success Rate | <N>% (<success> of <total>) |
| Failed Remediations | <N> |
> **Reporting guidance:** The Key Metrics "Threat ZAP Actions" row should show **only** the Phish + Malware + Spam ZAP count — NOT the TotalActions, which includes system-initiated post-delivery events (message trace updates, delivery location changes). TotalActions is shown separately with a clarifying note.
<If ErrorCount > 0:>
⚠️ **<N> ZAP remediation(s) failed** — manual follow-up recommended. Threats may remain in user mailboxes.
<If ErrorCount == 0:>
✅ All post-delivery remediations completed successfully.
---
## 🔗 Safe Links Protection
| Metric | Value |
|--------|-------|
| Total Clicks Scanned | <N> |
| Clicks Blocked | <N> |
| Clicks Allowed | <N> |
| Phishing Clicks | <N> |
| Click-Through Overrides | <N> |
| Distinct URLs Scanned | <N> |
| Users Protected | <N> |
<If PhishClicks > 0:>
🔴 **<N> phishing URL click(s) detected** — investigate affected users for credential compromise.
<If PhishClicks == 0:>
✅ No phishing URL click-throughs detected.
---
## 📎 Attachment Analysis
### Top Attachment Types
| File Type | Count | Distinct Files | Threats Detected |
|-----------|-------|----------------|------------------|
| <type> | <N> | <N> | <N> |
<If any ThreatCount > 0:>
⚠️ **Malicious attachments detected in <N> file type(s)** — verify delivery status and endpoint execution.
<If all ThreatCount == 0:>
✅ No malicious attachments detected in email flow.
---
## 🎯 Targeted Recipients
| Recipient Domain | Threat Count | Phish | Spam | Malware | Recipients |
|-----------------|-------------|-------|------|---------|------------|
| <domain> | <N> | <N> | <N> | <N> | <N> |
---
## Email Protection Score Card
```
┌──────────────────────────────────────────────────────┐
│ EMAIL PROTECTION SCORE: <NN>/100 │
│ Rating: <EMOJI> <RATING> │
├──────────────────────────────────────────────────────┤
│ Threat Block Rate [<bar>] <N>/20 (<detail>) │
│ Email Authentication [<bar>] <N>/20 (<detail>) │
│ ZAP Effectiveness [<bar>] <N>/20 (<detail>) │
│ Safe Links Protection[<bar>] <N>/20 (<detail>) │
│ Phishing Delivery [<bar>] <N>/20 (<detail>) │
└──────────────────────────────────────────────────────┘
```
---
## 🚨 MDO Security Incidents
### Incident Summary (Last <N> Days)
| Severity | Open | Closed | True Positive | Total |
|----------|------|--------|---------------|-------|
| 🔴 High | <N> | <N> | <N> | <N> |
| 🟠 Medium | <N> | <N> | <N> | <N> |
| 🟡 Low | <N> | <N> | <N> | <N> |
| 🔵 Informational | <N> | <N> | <N> | <N> |
| **Total** | **<N>** | **<N>** | **<N>** | **<N>** |
### Top MDO Incident Types
| Alert Name | Severity | Incidents | Open | Closed | True Positives |
|------------|----------|-----------|------|--------|----------------|
| <name> | <sev> | <N> | <N> | <N> | <N> |
<If Q13 returns 0 incidents:>
✅ No MDO-generated security incidents in the analysis period.
---
## Security Assessment
| Factor | Finding |
|--------|---------|
| <emoji> **<Factor>** | <Evidence-based finding> |
---
## Recommendations
1. <emoji> **<Priority action>** — <evidence and rationale>
2. ...
---
## Appendix: Query Execution Summary
| Query | Description | Records | Time |
|-------|-------------|---------|------|
| Q1 | Inbound Email Summary | <N> | <time> |
| Q2 | Daily Volume Trend | <N> | <time> |
| ... | ... | ... | ... |
| Q13 | MDO Incident Summary | <N> | <time> |
| Q14 | MDO Incident Types | <N> | <time> |
Markdown File Report Template
When outputting to markdown file, use the same structure as the Inline Report Template above, saved to:
reports/email-threat-posture/Email_Threat_Protection_Report_YYYYMMDD_HHMMSS.md
Include the following additional sections in the file report that are omitted from inline:
- Top sender domains table (full top 10 by volume with phish/spam breakdown)
- Authentication failure breakdown by domain (domains failing DMARC/DKIM/SPF)
- Overridden threats (emails detected as threats but allowed by policy)
- Complete detection methods table (all detection categories, not just top)
- First-contact phishing attempts (emails from never-before-seen senders flagged as phish)
- MDO security incidents — Full severity × status breakdown + top incident types from Q13/Q14
- Raw query references — note that full query definitions are in this SKILL.md file
Markdown Section Ordering
Follow this exact section order in markdown file reports:
| Order | Section | Source |
|---|---|---|
| 1 | Header (with Total Inbound + Score) | Template header |
| 2 | Executive Summary | Template |
| 3 | Key Metrics | Template |
| 4 | Mail Flow Overview (daily trend) | Q2 |
| 5 | Threat Composition (categories + detection methods + top phish senders) | Q1, Q4, Q9 |
| 6 | Delivery Disposition (initial + post-ZAP threat state) | Q3, Q12 |
| 7 | Email Authentication (with auth failures by domain) | Q5, QM2 |
| 8 | Post-Delivery Remediation (ZAP) | Q6 |
| 9 | Safe Links Protection | Q7 |
| 10 | Attachment Analysis | Q11 |
| 11 | Targeted Recipients | Q10 |
| 12 | — Deep-dive sections start here — | |
| 13 | Overridden Threats | QM3 |
| 14 | First-Contact Phishing | QM4 |
| 15 | MDO Security Incidents | Q13, Q14 |
| 16 | Top Sender Domains by Volume | QM1 |
| 17 | — Score and assessment — | |
| 18 | Email Protection Score Card | Computed |
| 19 | Security Assessment | Synthesized |
| 20 | Recommendations | Synthesized |
| 21 | Appendix: Query Execution Summary | All queries |
| 22 | References | URL Registry |
Key rule: Score Card → Assessment → Recommendations always come AFTER all data sections (including deep dives). This ensures the reader sees all evidence before the overall assessment.
Additional Queries for Markdown File Deep Dives
These queries provide enrichment data for the markdown file report only. Skip for inline mode.
QM1: Top Sender Domains by Volume
EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound"
| summarize
EmailCount = count(),
PhishCount = countif(ThreatTypes has "Phish"),
SpamCount = countif(ThreatTypes has "Spam"),
DistinctSenders = dcount(SenderFromAddress)
by SenderFromDomain
| order by EmailCount desc
| take 10
QM2: Authentication Failures by Domain
EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound"
| extend AuthDetails = parse_json(AuthenticationDetails)
| extend
DMARC = tostring(AuthDetails.DMARC),
DKIM = tostring(AuthDetails.DKIM),
SPF = tostring(AuthDetails.SPF)
| summarize
TotalEmails = count(),
DMARCFail = countif(DMARC == "fail"),
DKIMFail = countif(DKIM == "fail"),
SPFFail = countif(SPF == "fail")
by SenderFromDomain
| where DMARCFail > 0 or DKIMFail > 0 or SPFFail > 0
| order by TotalEmails desc
| take 15
QM3: Overridden Threats (Allow Policies)
EmailEvents
| where Timestamp > ago(7d)
| where OrgLevelAction == "Allow" and isnotempty(ThreatTypes)
| summarize Count = count() by ThreatTypes, OrgLevelPolicy, DetectionMethods
| order by Count desc
QM4: First-Contact Phishing Attempts
EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound"
| where IsFirstContact == true
| where ThreatTypes has "Phish" or UrlCount > 3
| summarize
FirstContactCount = count(),
PhishCount = countif(ThreatTypes has "Phish"),
HighUrlCount = countif(UrlCount > 3),
DistinctSenders = dcount(SenderFromAddress)
File Report Header
# Email Threat Protection Report
**Generated:** YYYY-MM-DD HH:MM UTC
**Data Source:** Microsoft Defender for Office 365 (Advanced Hunting)
**Analysis Period:** <StartDate> → <EndDate> (<N> days)
**Protected Mailboxes:** <DistinctRecipients>
**Total Inbound Emails:** <N>
**Email Protection Score:** <Score>/100 — <RATING>
---
Known Pitfalls
1. DetectionMethods Is a JSON String
Problem: DetectionMethods looks like it should be dynamic but is a string column containing JSON. Direct property access fails.
Solution: Always parse_json(DetectionMethods) before accessing sub-keys:
| extend DetMethods = parse_json(DetectionMethods)
| extend FirstDetection = tostring(bag_keys(DetMethods)[0])
2. AuthenticationDetails Is a JSON String
Problem: Same as DetectionMethods — AuthenticationDetails is a string column, not dynamic.
Solution: Always parse_json(AuthenticationDetails):
| extend AuthDetails = parse_json(AuthenticationDetails)
| extend DMARC = tostring(AuthDetails.DMARC)
3. ThreatTypes Is Pipe-Delimited
Problem: ThreatTypes can contain multiple values pipe-delimited (e.g., "Phish|Spam"). Using == will miss multi-category threats.
Solution: Always use has operator:
| where ThreatTypes has "Phish" // ✅ Correct
| where ThreatTypes == "Phish" // ❌ Misses "Phish|Spam"
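The difference between the two operators can be reproduced outside KQL. The Python sketch below only approximates the whole-term, case-insensitive matching of has and is not the Kusto implementation; the function name has_term is hypothetical. Because it splits on any non-alphanumeric character, it behaves the same whichever delimiter separates the values.

```python
import re


def has_term(value: str, term: str) -> bool:
    """Approximate KQL `has`: case-insensitive match on a whole term."""
    # Split on runs of non-alphanumeric characters to get the terms.
    tokens = [t.lower() for t in re.split(r"[^0-9A-Za-z]+", value) if t]
    return term.lower() in tokens


threat_types = "Phish|Spam"
print(has_term(threat_types, "Phish"))   # term match finds the multi-category row
print(threat_types == "Phish")           # equality misses it
```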
4. Timestamp vs TimeGenerated
Problem: Advanced Hunting uses Timestamp for XDR-native tables. Sentinel Data Lake uses TimeGenerated.
Solution: Default queries use Timestamp (AH). If falling back to Data Lake, replace Timestamp with TimeGenerated throughout.
5. IsFirstContact May Be Null
Problem: IsFirstContact can be null for outbound or intra-org emails. Filtering on it without scoping to inbound emails may miss records.
Solution: Always filter EmailDirection == "Inbound" before using IsFirstContact.
6. LatestDeliveryAction vs DeliveryAction
Problem: DeliveryAction is the initial delivery disposition. LatestDeliveryAction reflects the current state after ZAP or manual remediation. Reporting only DeliveryAction overstates the number of threats in mailboxes.
Solution: When assessing current threat exposure, use LatestDeliveryAction and LatestDeliveryLocation. When assessing initial filter effectiveness, use DeliveryAction.
7. DKIM Pass Rate May Be Lower Than Expected
Problem: DKIM pass rate can appear low because many legitimate emails (especially bulk/marketing) don't sign with DKIM at all. An email with no DKIM signature isn't a DKIM "fail" — it simply has no result. The DKIM field from AuthenticationDetails may be empty or "none" rather than "fail".
Solution: When computing DKIM pass rate, note the denominator: emails with a DKIM result vs total emails. A lower DKIM rate is expected and doesn't necessarily indicate spoofing. Compare against DMARC and CompAuth for a better authentication picture.
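The effect of the denominator is easy to see with the counts that Q5 returns. This is a minimal sketch; it assumes that any result other than "pass" or "fail" (empty, "none", and so on) counts as "no result", and the function name dkim_pass_rates is hypothetical.

```python
def dkim_pass_rates(pass_count: int, fail_count: int, total_emails: int) -> dict:
    """Compute the DKIM pass rate against both candidate denominators."""
    evaluated = pass_count + fail_count      # emails with a pass/fail result
    no_result = total_emails - evaluated     # unsigned or otherwise unevaluated
    return {
        "no_result": no_result,
        "pass_rate_of_total": 100 * pass_count / total_emails if total_emails else None,
        "pass_rate_of_evaluated": 100 * pass_count / evaluated if evaluated else None,
    }


# 600 passes, 0 failures, 1,000 emails: low rate over the total, perfect rate over evaluated mail.
print(dkim_pass_rates(600, 0, 1000))
```

Reporting both figures, along with the no-result count, shows whether a low rate comes from unsigned senders or from actual failures.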
8. ZAP ErrorCount May Include Non-Threat Emails
Problem: ZAP errors can occur for legitimate reasons: shared mailboxes, retention policies preventing purge, user-moved emails. A ZAP error doesn't always mean a threat is still active.
Solution: When reporting ZAP failures, note that manual investigation may confirm the threat was already handled. Don't over-alarm on ZAP errors without context.
9. ZAP TotalActions ≠ Threat ZAP Count
Problem: EmailPostDeliveryEvents includes all post-delivery events — not just ZAP threat remediations. The TotalActions count from Q6 includes system-initiated events (message trace updates, delivery location changes, admin investigation submissions). Reporting TotalActions as "ZAP Remediations" in Key Metrics massively overstates the threat remediation picture (e.g., 7,790 total when only 674 are actual threat ZAPs).
Solution: Always use ThreatZAPTotal (PhishZAP + MalwareZAP + SpamZAP) for headline ZAP metrics. Show TotalActions separately with a clarifying note: "includes system events". In Key Metrics, use "Threat ZAP Actions: 674" not "ZAP Remediations: 7,790".
10. Threat Block Rate — Post-ZAP vs Pre-Delivery
Problem: The scoring dimension "Threat Block Rate" can be interpreted two ways: (a) pre-delivery block rate (threats blocked before reaching inbox), or (b) final disposition rate (threats not in inbox after ZAP). These give different numbers — e.g., 72.5% pre-delivery vs 81.2% post-ZAP.
Solution: The dimension measures final threat disposition (post-ZAP) — the percentage of detected threats that are NOT currently in user inboxes. This is the operationally relevant metric because it reflects actual user exposure. The dimension description explicitly says "not in inbox (post-ZAP final state)".
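A sketch of the post-ZAP calculation from Q12 output follows. It assumes each row is a dictionary with the LatestDeliveryLocation and Count columns, and that the inbox appears as "Inbox/folder" (the value shown in the report template). Verify that string against actual query results. The function name post_zap_block_rate is hypothetical.

```python
INBOX_LOCATION = "Inbox/folder"  # assumed value; confirm against real Q12 output


def post_zap_block_rate(q12_rows: list) -> tuple:
    """Return (not_in_inbox, total, rate_pct) for detected threats."""
    total = sum(row["Count"] for row in q12_rows)
    in_inbox = sum(
        row["Count"]
        for row in q12_rows
        if row["LatestDeliveryLocation"] == INBOX_LOCATION
    )
    not_in_inbox = total - in_inbox
    # With no detected threats the rate is undefined, so return None.
    rate = 100 * not_in_inbox / total if total else None
    return not_in_inbox, total, rate


rows = [
    {"LatestDeliveryLocation": "Quarantine", "Count": 70},
    {"LatestDeliveryLocation": "Junk folder", "Count": 10},
    {"LatestDeliveryLocation": "Inbox/folder", "Count": 20},
]
print(post_zap_block_rate(rows))
```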
Quality Checklist
Before delivering the report, verify:
- All percentage values show both percentage AND raw count
- All queries used Timestamp (AH) or TimeGenerated (Data Lake) consistently
- Zero-result queries are reported with explicit absence confirmation (✅ pattern)
- The Email Protection Score calculation is transparent with per-dimension evidence
- Detection methods show the full breakdown, not just "threats detected"
- ZAP effectiveness distinguishes threat ZAP count vs TotalActions (no inflated headline metric)
- Key Metrics ZAP row shows ThreatZAPTotal (Phish+Malware+Spam), NOT TotalActions
- Safe Links section distinguishes blocked vs allowed vs click-through
- Email authentication covers all four protocols with Other/None column
- Threats-in-mailbox summary breaks down phishing vs spam (not just total)
- Markdown report follows section ordering guidance (data → deep dives → score → assessment)
- Post-ZAP state (Q12) shows where threats currently reside, not just initial delivery
- MDO incidents section shows severity × status breakdown with open/closed/TP counts
- Incident queries use canonical SecurityAlert→SecurityIncident join (NOT SecurityAlert.Status)
- Recommendations are prioritized and evidence-based
- All hyperlinks copied verbatim from the URL Registry — no fabricated URLs
- No recipient PII (email addresses) in the report — aggregate by domain only
- Daily volume trend includes at least a note on peak/anomaly days
SVG Dashboard Generation
📊 Optional post-report step. After an Email Threat Protection report is generated, the user can request a visual SVG dashboard.
Trigger phrases: "generate SVG dashboard", "create a visual dashboard", "visualize this report", "SVG from the report"
How to Request a Dashboard
- Same chat: "Generate an SVG dashboard from the report" — data is already in context.
- New chat: Attach or reference the report file, e.g. #file:reports/email-threat-posture/Email_Threat_Protection_Report_<date>.md
- Customization: Edit svg-widgets.yaml before requesting; the renderer reads it at generation time.
Execution
Step 1: Read svg-widgets.yaml (this skill's widget manifest)
Step 2: Read .github/skills/svg-dashboard/SKILL.md (rendering rules — Manifest Mode)
Step 3: Read the completed report file (data source)
Step 4: Render SVG → save to reports/email-threat-posture/{report_name}_dashboard.svg
The YAML manifest is the single source of truth for layout, widgets, field mappings, colors, and data source documentation. All customization happens there.
.github/skills/exposure-investigation/SKILL.md
npx skills add SCStelz/security-investigator --skill exposure-investigation -g -y
SKILL.md
Frontmatter
{
"name": "exposure-investigation",
"description": "Use this skill when asked to generate a vulnerability and exposure management report, assess security posture, or review CVEs, security configurations, and attack paths. Triggers on keywords like \"vulnerability report\", \"exposure report\", \"CVE assessment\", \"security posture\", \"vulnerability assessment\", \"exposure management\", \"patch status\", \"end of support\", \"security recommendations\", \"attack paths\", \"critical assets\", \"configuration compliance\", \"Defender device health\", \"security score\", \"TVM\", \"threat and vulnerability management\", or when asking about overall organizational vulnerability\/exposure state. This skill queries DeviceTvm* tables and ExposureGraphNodes\/Edges to produce a comprehensive posture report covering CVEs, exploitable vulnerabilities, security configuration compliance, end-of-support software, critical asset inventory, attack paths, Defender device health, and certificate status. Supports org-wide and per-device scoping with inline chat and markdown file output.",
"drill_down_prompt": "Run vulnerability and exposure report — CVEs, attack paths, critical assets, configuration compliance",
"threat_pulse_domains": [
"exposure"
]
}
Vulnerability & Exposure Management Report — Instructions
Purpose
This skill generates a comprehensive Vulnerability & Exposure Management Report covering the full security posture of the organization (or a specific device). It goes beyond CVEs to include security configuration compliance, end-of-support software, Exposure Management critical assets, attack paths, and certificate status.
Entity Type: Organization-wide (default) or single device
| Scope | Primary Tables | Use Case |
|---|---|---|
| Org-wide (default) | DeviceTvmSoftwareVulnerabilities, ExposureGraphNodes, ExposureGraphEdges | Full organizational posture assessment |
| Per-device | DeviceTvmSoftwareVulnerabilities, DeviceTvmSecureConfigurationAssessment | Focused device vulnerability review |
What this skill covers:
| Section | Data Source | Coverage |
|---|---|---|
| CVE Vulnerabilities | DeviceTvmSoftwareVulnerabilities + DeviceTvmSoftwareVulnerabilitiesKB | Severity distribution, exploitable CVEs, CVSS scores |
| Security Configuration | DeviceTvmSecureConfigurationAssessment + ...KB | OS, Network, Security Controls, Accounts, Application compliance |
| End-of-Support Software | DeviceTvmSoftwareInventory | EoS/EoL software with dates and affected devices |
| Critical Assets | ExposureGraphNodes | Criticality levels, internet-facing, RCE/privesc flags |
| Attack Paths | ExposureGraphEdges + ExposureGraphNodes | Multi-hop paths from vulnerable to critical assets |
| Defender Device Health | DeviceTvmSecureConfigurationAssessment + DeviceInfo | AV mode, signatures, RTP, tamper protection, cloud protection compliance by active/inactive status |
| Certificate Status | DeviceTvmCertificateInfo | Expired and expiring certificates |
| Software Evidence (drill-down) | DeviceTvmSoftwareEvidenceBeta | File paths and registry paths linking vulnerable software to on-disk locations; used for targeted remediation |
📑 TABLE OF CONTENTS
- Critical Workflow Rules - Start here!
- Output Modes - Inline chat vs. Markdown file
- Quick Start - 9-step execution pattern
- Execution Workflow - Complete phased process
- Sample KQL Queries - Validated query patterns (Queries 1-11, 13-16)
- Drill-Down Reference Queries - Targeted file-level evidence for remediation (Queries 17-19)
- Report Template - Output structure and formatting
- Per-Device Mode - Single device scoping
- Known Pitfalls - Edge cases
- Error Handling - Troubleshooting guide
Investigation shortcuts:
- Specific CVE assessment (TP Q12): Q2 (exploitable CVEs + KB details) → Q5 (per-device vuln counts, scoped) → Q14 (top vulnerable software) → Q17/Q18 (file evidence drill-down)
- Internet-facing critical asset exposure (TP Q11): Q7 (critical asset inventory) → Q15 (internet-facing + vulns) → Q10a (vulnerable device summary) → Q10b (blast radius edges) → Q16 (multi-hop attack paths, optional)
- Per-device vulnerability review (TP Q12, TP Q1): Q5 (per-device vuln counts) → Q6 (per-device compliance) → Q8 (high-impact misconfigs) → Q13 (certificates)
- Fleet-wide posture report (standalone): Q1 (severity dist) → Q2 (exploitable) → Q3 (config compliance) → Q4 (EoS software) → Q7 (critical assets) → Q9 (Defender health)
- Defender health audit (TP Q11, standalone): Q9 (fleet summary by control×OS) → Q11 (non-compliant exceptions, active only) → Q6 (per-device compliance scorecard)
- Attack path analysis (TP Q11+Q12): Q10a (vulnerable device exposure) → Q10b (1-hop blast radius) → Q15 (internet-facing critical + vulns) → Q16 (multi-hop paths, optional)
- Software version sprawl (after Q14 or Q2): Q14 (top vulnerable software) → Q17 (version sprawl by source) → Q18 (CVE to file mapping) → Q19 (stale extension folders)
⛔ Shortcut Default Rule: When a matching shortcut exists for the investigation context, use it — don't run the full workflow. Only run the full query set when the user explicitly requests "full investigation", "comprehensive", or "deep dive". Shortcuts render only the report sections relevant to their query chain (plus Executive Summary and Recommendations, always).
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
Before starting ANY vulnerability/exposure report:
- ALL queries in this skill use `RunAdvancedHuntingQuery`: DeviceTvm* and ExposureGraph* tables are Advanced Hunting only (NOT in Sentinel Data Lake)
- No Sentinel workspace selection is required: this skill does NOT query Sentinel Data Lake tables
- ALWAYS ask the user for output mode if not specified: inline chat summary, markdown file report, or both (default: both)
- ALWAYS ask the user for scope if ambiguous: org-wide (default) or specific device name
- ALWAYS run independent queries in parallel for performance
- ALWAYS use `create_file` for markdown reports (NEVER use PowerShell terminal commands)
- ALWAYS sanitize PII from saved reports: use generic placeholders for real hostnames, tenant names, and UPNs in committed files (reports/ files for user's own use may contain real values)
- ExposureGraph tables are snapshot data: no `Timestamp` or `TimeGenerated` filter needed
- DeviceTvm assessment tables: use `summarize arg_max(Timestamp, *) by DeviceId, ConfigurationId` to get the latest assessment per device×config. Do NOT use `Timestamp > ago(1d)` as a pre-filter, because lab/weekend environments may have stale data and return 0 results
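The "latest assessment per device×config" rule above can be illustrated outside KQL. The following is a minimal Python sketch, assuming rows have already been exported as dictionaries; the sample rows and the helper name `latest_per_key` are hypothetical and only mirror what `arg_max(Timestamp, *) by DeviceId, ConfigurationId` does.

```python
from datetime import datetime

def latest_per_key(rows):
    """Keep only the newest row for each (DeviceId, ConfigurationId) pair."""
    latest = {}
    for row in rows:
        key = (row["DeviceId"], row["ConfigurationId"])
        if key not in latest or row["Timestamp"] > latest[key]["Timestamp"]:
            latest[key] = row
    return list(latest.values())

# Hypothetical sample rows; values are illustrative only.
rows = [
    {"DeviceId": "d1", "ConfigurationId": "scid-2010",
     "Timestamp": datetime(2024, 1, 1), "IsCompliant": False},
    {"DeviceId": "d1", "ConfigurationId": "scid-2010",
     "Timestamp": datetime(2024, 1, 5), "IsCompliant": True},
    # Stale row: a "last 1 day" pre-filter would drop this device entirely.
    {"DeviceId": "d2", "ConfigurationId": "scid-2010",
     "Timestamp": datetime(2023, 11, 20), "IsCompliant": False},
]

result = latest_per_key(rows)
```

Note that the stale `d2` row survives, which is the reason the rule prefers "latest per key" over a fixed time window.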
Tool Selection
| Table Pattern | Tool | Notes |
|---|---|---|
| `DeviceTvm*` | `RunAdvancedHuntingQuery` | AH-only tables |
| `ExposureGraphNodes` | `RunAdvancedHuntingQuery` | AH-only, snapshot data, no timestamp filter |
| `ExposureGraphEdges` | `RunAdvancedHuntingQuery` | AH-only, snapshot data, no timestamp filter |
🔴 PROHIBITED:
- ❌ Using `mcp_sentinel-data_query_lake` for any table in this skill
- ❌ Adding `TimeGenerated` filters to ExposureGraph queries
- ❌ Reporting findings without actual query evidence
- ❌ Fabricating CVE IDs, CVSS scores, or device names
When invoked from a parent skill (threat-pulse, incident-investigation, etc.)
- Skip output mode and scope prompts — the parent skill controls output format
- Use the investigation shortcut that matches the parent trigger (see shortcuts above the TOC)
- For quick triage: run only the shortcut query chain
- For deep investigation: run the full phased workflow
Output Modes
Mode 1: Inline Chat Summary (default for quick requests)
Compact executive summary rendered directly in chat.
Mode 2: Markdown File Report
Full detailed report saved to reports/exposure/vulnerability_exposure_report_<YYYYMMDD_HHMMSS>.md.
Mode 3: Both (default when user says "report" or "generate report")
Inline chat executive summary + full markdown file.
Ask user if not specified:
"How would you like the report? I can provide:
- Inline chat summary — executive overview in chat
- Markdown file — detailed report saved to reports/exposure/
- Both (recommended) — summary in chat + full report file"
Quick Start (TL;DR)
9-step execution pattern for org-wide report:
Step 1: Determine scope (org-wide or specific device) and output mode
Step 2: Run Phase 1 queries in parallel — CVE distribution, exploitable CVEs, config compliance
Step 3: Run Phase 2 queries in parallel — EoS software, per-device vulns, per-device compliance
Step 4: Run Phase 3 queries in parallel — ExposureGraph critical assets, high-impact misconfigs, Defender health fleet summary
Step 5: Run Phase 4 queries in parallel — Attack paths, Defender health exceptions, certificates
Step 6: Run Phase 5 (optional) — Top vulnerable software, internet-facing critical assets
Step 7: Compute summary metrics and risk assessment
Step 8: Render inline chat executive summary
Step 9: Generate markdown file report (if requested)
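The "run each phase's queries in parallel" pattern in the steps above could be orchestrated roughly as follows. This is only a sketch: `run_query` is a hypothetical stand-in for a `RunAdvancedHuntingQuery` call and returns empty rows, and running the phases one after another is an assumption rather than something the workflow mandates.

```python
import asyncio

async def run_query(name):
    """Hypothetical stand-in for a RunAdvancedHuntingQuery call."""
    await asyncio.sleep(0)  # placeholder for the real network round trip
    return {"query": name, "rows": []}

# Query IDs grouped by phase, as listed in the Execution Workflow.
PHASES = [
    ["Q1", "Q2", "Q3"],
    ["Q4", "Q5", "Q6"],
    ["Q7", "Q8", "Q9"],
    ["Q10a", "Q10b", "Q11", "Q13"],
]

async def run_report():
    results = {}
    for phase in PHASES:
        # Queries inside one phase are independent, so issue them together.
        phase_results = await asyncio.gather(*(run_query(q) for q in phase))
        for item in phase_results:
            results[item["query"]] = item["rows"]
    return results

results = asyncio.run(run_report())
```

`asyncio.gather` preserves the input order, so results can be matched back to their query IDs without extra bookkeeping.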
Execution Workflow
Phase 1: Core Vulnerability & Compliance (3 parallel queries)
Run these simultaneously:
| Query | Description | Reference |
|---|---|---|
| Q1 | CVE severity distribution | Query 1 |
| Q2 | Exploitable CVEs (with known exploits) | Query 2 |
| Q3 | Security config compliance by category | Query 3 |
Phase 2: Software & Per-Device Detail (3 parallel queries)
| Query | Description | Reference |
|---|---|---|
| Q4 | End-of-support software inventory | Query 4 |
| Q5 | Per-device vulnerability counts | Query 5 |
| Q6 | Per-device compliance scorecard | Query 6 |
Phase 3: Exposure Management & Defender Health (3 parallel queries)
| Query | Description | Reference |
|---|---|---|
| Q7 | Critical asset inventory | Query 7 |
| Q8 | High-impact misconfigurations with remediation | Query 8 |
| Q9 | Defender health fleet summary | Query 9 |
Phase 4: Attack Paths & Supplementary (4 parallel queries)
| Query | Description | Reference |
|---|---|---|
| Q10a | Vulnerable device exposure summary (fast) | Query 10a |
| Q10b | Edge connectivity from vulnerable devices (fast) | Query 10b |
| Q11 | Defender health non-compliant exceptions | Query 11 |
| Q13 | Certificate expiration status | Query 13 |
Phase 5: Supplementary Detail (optional, 3 parallel queries)
Run only if Phase 1-4 reveal high-risk items:
| Query | Description | Reference |
|---|---|---|
| Q14 | Top vulnerable software by CVE count | Query 14 |
| Q15 | Internet-facing critical assets with vulnerabilities | Query 15 |
| Q16 | Multi-hop attack path enumeration (slow — graph-match) | Query 16 |
Phase 6: Render Output
- Compute summary metrics from all query results
- Assign overall risk rating (see Risk Assessment)
- Render inline chat executive summary
- Generate markdown file (if requested)
Sample KQL Queries
All queries use `RunAdvancedHuntingQuery` via the Sentinel Triage MCP server.
Query 1: CVE Severity Distribution
DeviceTvmSoftwareVulnerabilities
| summarize
DeviceCount = dcount(DeviceId),
VulnCount = count()
by VulnerabilitySeverityLevel
| order by VulnCount desc
Purpose: Top-level severity breakdown for executive summary.
Query 2: Exploitable CVEs
DeviceTvmSoftwareVulnerabilities
| join kind=inner DeviceTvmSoftwareVulnerabilitiesKB on CveId
| where IsExploitAvailable == true
| summarize
AffectedDevices = dcount(DeviceName),
DeviceList = make_set(DeviceName)
by CveId, VulnerabilitySeverityLevel, CvssScore, VulnerabilityDescription
| order by CvssScore desc, AffectedDevices desc
| take 20
Purpose: Highest-risk CVEs — known exploits mean active threat. These are always Priority 1.
Query 3: Security Config Compliance by Category
DeviceTvmSecureConfigurationAssessment
| summarize arg_max(Timestamp, *) by DeviceId, ConfigurationId
| summarize
TotalAssessments = count(),
CompliantCount = countif(IsCompliant == true),
NonCompliantCount = countif(IsCompliant == false)
by ConfigurationCategory
| extend ComplianceRate = round(100.0 * CompliantCount / TotalAssessments, 1)
| order by NonCompliantCount desc
Purpose: Compliance posture across OS, Network, Security Controls, Accounts, Application categories.
Query 4: End-of-Support Software
DeviceTvmSoftwareInventory
| where EndOfSupportStatus != ""
| summarize
AffectedDevices = dcount(DeviceId),
DeviceList = make_set(DeviceName)
by SoftwareVendor, SoftwareName, SoftwareVersion, EndOfSupportStatus, EndOfSupportDate
| order by AffectedDevices desc
Purpose: Identify unsupported software — no patches available, high risk.
EndOfSupportStatus values:
- `EOS Software`: Entire product line end-of-support
- `EOS Version`: Specific version end-of-support
- `Upcoming EOS Version`: EoS within next 6 months
Query 5: Per-Device Vulnerability Counts
DeviceTvmSoftwareVulnerabilities
| summarize
Critical = countif(VulnerabilitySeverityLevel == "Critical"),
High = countif(VulnerabilitySeverityLevel == "High"),
Medium = countif(VulnerabilitySeverityLevel == "Medium"),
Low = countif(VulnerabilitySeverityLevel == "Low"),
Total = count()
by DeviceName, OSPlatform
| order by Critical desc, High desc, Total desc
Purpose: Per-device vulnerability heatmap — identifies most vulnerable endpoints.
Query 6: Per-Device Compliance Scorecard
DeviceTvmSecureConfigurationAssessment
| summarize arg_max(Timestamp, *) by DeviceId, ConfigurationId
| summarize
TotalChecks = count(),
Compliant = countif(IsCompliant == true),
NonCompliant = countif(IsCompliant == false),
NotApplicable = countif(IsApplicable == false)
by DeviceName
| extend ComplianceRate = round(100.0 * Compliant / (Compliant + NonCompliant), 1)
| order by ComplianceRate asc
Purpose: Rank devices by compliance rate — worst-first for remediation priority.
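The compliance-rate arithmetic used in Query 3 and Query 6 can be checked by hand with a small Python sketch. The function name and the `None` return for a device with no applicable checks are assumptions added for illustration; the queries themselves do not guard against a zero denominator.

```python
def compliance_rate(compliant, non_compliant):
    """Percentage of assessed checks that passed, rounded to one decimal.

    Returns None when nothing was assessed, to avoid dividing by zero.
    """
    assessed = compliant + non_compliant
    if assessed == 0:
        return None
    return round(100.0 * compliant / assessed, 1)

# Hypothetical per-device counts: (Compliant, NonCompliant).
devices = {"host-a": (45, 5), "host-b": (1, 2), "host-c": (0, 0)}

# Worst-first ordering, with unassessed devices placed last.
ranked = sorted(
    devices,
    key=lambda d: (compliance_rate(*devices[d]) is None,
                   compliance_rate(*devices[d]) or 0.0),
)
```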
Query 7: Critical Asset Inventory
🔴 MCP Property Access: `NodeProperties` is stored as a JSON string. Direct dot-notation (`NodeProperties.rawData.criticalityLevel`) returns null through MCP serialization. MUST use double `parse_json(tostring())` extraction (see Known Pitfalls).
ExposureGraphNodes
| where set_has_element(Categories, "device")
| extend rawData = parse_json(tostring(parse_json(tostring(NodeProperties)).rawData))
| extend critLevel = rawData.criticalityLevel
| extend critValue = toint(critLevel.criticalityLevel)
| extend ruleBasedCrit = toint(critLevel.ruleBasedCriticalityLevel)
| extend ruleNames = tostring(critLevel.ruleNames)
| where isnotnull(critLevel) and critValue < 4
| extend InternetFacing = iff(isnotnull(rawData.IsInternetFacing), "Yes", "No")
| extend VulnerableToRCE = iff(isnotnull(rawData.vulnerableToRCE), "Yes", "No")
| extend VulnerableToPrivEsc = iff(isnotnull(rawData.VulnerableToPrivilegeEscalation), "Yes", "No")
| extend ExposureScore = tostring(rawData.exposureScore)
| project
DeviceName = NodeName,
CriticalityLevel = critValue,
RuleBasedCriticality = ruleBasedCrit,
RuleNames = ruleNames,
InternetFacing,
VulnerableToRCE,
VulnerableToPrivEsc,
ExposureScore,
NodeLabel
| order by CriticalityLevel asc
Purpose: Inventory critical assets with exposure flags — feeds into prioritization.
Criticality Levels:
- 0-1: Most critical (domain controllers, high-value servers)
- 2-3: High priority
- 4+: Standard (excluded from this query)
Note on zero results: If this query returns 0 results, it means no devices have criticality classifications. Check the raw `NodeProperties` with `ExposureGraphNodes | where set_has_element(Categories, "device") | extend rawData = parse_json(tostring(parse_json(tostring(NodeProperties)).rawData)) | project NodeName, rawData | take 5` to verify property structure. Criticality is auto-assigned for domain controllers (Level 0) and can be manually assigned in the Exposure Management portal.
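The double `parse_json(tostring())` extraction in Query 7 corresponds to decoding JSON twice. A minimal Python sketch follows, assuming `NodeProperties` arrives as a JSON string whose `rawData` field is itself a JSON-encoded string; that exact shape and the sample values are assumptions made for illustration.

```python
import json

def extract_raw_data(node_properties):
    """Decode NodeProperties, then decode rawData again if it is still a string."""
    outer = json.loads(node_properties)
    raw = outer.get("rawData")
    if isinstance(raw, str):
        raw = json.loads(raw)  # second decode: rawData was a nested JSON string
    return raw or {}

# Hypothetical payload built the same way: a JSON string nested in a JSON string.
inner = json.dumps({"criticalityLevel": {"criticalityLevel": 0}})
node_properties = json.dumps({"rawData": inner})

raw_data = extract_raw_data(node_properties)
crit = raw_data.get("criticalityLevel", {}).get("criticalityLevel")
```

A single decode would leave `rawData` as plain text, which is the analogue of dot-notation returning null.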
Query 8: High-Impact Misconfigurations
DeviceTvmSecureConfigurationAssessment
| summarize arg_max(Timestamp, *) by DeviceId, ConfigurationId
| where IsCompliant == false and IsApplicable == true
| summarize AffectedDevices = dcount(DeviceId), DeviceList = make_set(DeviceName) by ConfigurationId
| join kind=inner DeviceTvmSecureConfigurationAssessmentKB on ConfigurationId
| project
ConfigurationId,
ConfigurationName,
ConfigurationCategory,
ConfigurationSubcategory,
ConfigurationImpact,
RiskDescription,
RemediationOptions,
AffectedDevices,
DeviceList
| order by ConfigurationImpact desc, AffectedDevices desc
| take 20
Purpose: Top misconfigurations ranked by impact score with actionable remediation steps from the KB.
ConfigurationImpact scores:
- 9-10: Critical — must remediate immediately
- 7-8: High — remediate in short term
- 4-6: Medium — plan remediation
- 1-3: Low — monitor
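When post-processing Query 8 results, the impact bands above can be applied with a small helper. This is a sketch: the function name is hypothetical, and using threshold comparisons (so that a fractional score falls into the band below the next threshold) is an assumption, since the list only names integer ranges.

```python
def impact_band(score):
    """Map a ConfigurationImpact score to the remediation band listed above."""
    if score >= 9:
        return "Critical"   # must remediate immediately
    if score >= 7:
        return "High"       # remediate in short term
    if score >= 4:
        return "Medium"     # plan remediation
    return "Low"            # monitor

bands = [impact_band(s) for s in (10, 8, 5, 2)]
```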
Query 9: Defender Health Fleet Summary
// Defender Health Fleet Summary — compliance by control × OS × active/inactive status
// Active = DeviceInfo last seen within 7 days; Inactive = last seen > 7 days ago
// SCID Mapping:
// Windows: scid-2010 (AVMode), scid-2011 (AVSignatures), scid-2012 (RTP),
// scid-2013 (PUA), scid-2016 (CloudProtection), scid-2003 (TamperProtection),
// scid-91 (BehaviourMonitoring), scid-2030 (CoreComponentsUpdate)
// macOS: scid-5090 (RTP), scid-5091 (PUA), scid-5094 (Cloud), scid-5095 (AVSigs)
// Linux: scid-6090 (RTP), scid-6091 (PUA), scid-6094 (Cloud), scid-6095 (AVSigs)
let defenderSCIDs = dynamic([
"scid-2010", "scid-2011", "scid-2012", "scid-2013", "scid-2016",
"scid-2003", "scid-91", "scid-2030",
"scid-5090", "scid-5091", "scid-5094", "scid-5095",
"scid-6090", "scid-6091", "scid-6094", "scid-6095"
]);
let deviceStatus = DeviceInfo
| summarize arg_max(Timestamp, DeviceName, OSPlatform) by DeviceId
| extend DeviceActivity = iff(Timestamp > ago(7d), "Active", "Inactive");
DeviceTvmSecureConfigurationAssessment
| where ConfigurationId in~ (defenderSCIDs)
| where IsApplicable == 1
| summarize arg_max(Timestamp, *) by DeviceId, ConfigurationId
| extend Control = case(
ConfigurationId =~ "scid-2010", "AVMode",
ConfigurationId =~ "scid-2011", "AVSignatures",
ConfigurationId =~ "scid-2012", "RealtimeProtection",
ConfigurationId =~ "scid-2013", "PUAProtection",
ConfigurationId =~ "scid-2016", "CloudProtection",
ConfigurationId =~ "scid-2003", "TamperProtection",
ConfigurationId =~ "scid-91", "BehaviourMonitoring",
ConfigurationId =~ "scid-2030", "CoreComponentsUpdate",
ConfigurationId =~ "scid-5090", "RealtimeProtection",
ConfigurationId =~ "scid-5091", "PUAProtection",
ConfigurationId =~ "scid-5094", "CloudProtection",
ConfigurationId =~ "scid-5095", "AVSignatures",
ConfigurationId =~ "scid-6090", "RealtimeProtection",
ConfigurationId =~ "scid-6091", "PUAProtection",
ConfigurationId =~ "scid-6094", "CloudProtection",
ConfigurationId =~ "scid-6095", "AVSignatures",
ConfigurationId)
| join kind=leftouter deviceStatus on DeviceId
| extend DeviceActivity = coalesce(DeviceActivity, "Unknown")
| summarize
Compliant = countif(IsCompliant == 1),
NonCompliant = countif(IsCompliant == 0),
TotalDevices = dcount(DeviceId)
by Control, OSPlatform, DeviceActivity
| extend ComplianceRate = round(100.0 * Compliant / (Compliant + NonCompliant), 1)
| order by DeviceActivity asc, Control asc, OSPlatform asc
Purpose: Fleet-scale Defender for Endpoint health dashboard. Shows compliance rates for each security control by OS platform, split by active/inactive device status. Designed for environments with 1000+ devices — does NOT list individual devices.
Defender Controls Assessed:
| Control | Description | Critical? |
|---|---|---|
| AVMode | Antivirus running in Active mode (vs Passive/EDR Blocked) | 🔴 Yes |
| AVSignatures | Antivirus signature definitions are current | 🟠 High |
| RealtimeProtection | Real-time file scanning enabled | 🔴 Yes |
| PUAProtection | Potentially Unwanted Application blocking enabled | 🟡 Medium |
| CloudProtection | Cloud-delivered protection (MAPS) enabled | 🟠 High |
| TamperProtection | Tamper Protection prevents disabling security settings | 🔴 Yes |
| BehaviourMonitoring | Behavioral analysis and monitoring enabled | 🟠 High |
| CoreComponentsUpdate | MDE unified agent / core components current | 🟡 Medium |
Active vs Inactive Classification:
- Active: Device last seen in `DeviceInfo` within 7 days; these are operational endpoints
- Inactive: Device last seen > 7 days ago; stale signature data is expected and should NOT be flagged as a security gap
Interpretation guidance: Focus on active devices with non-compliant critical controls (AVMode, RTP, TamperProtection). Inactive devices with stale AVSignatures are expected — report as "X inactive devices not reporting" rather than "X devices with outdated signatures."
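The active/inactive split and the interpretation guidance can be expressed as two small functions. This Python sketch mirrors the `Timestamp > ago(7d)` test and the `"Unknown"` fallback from Query 9; the function names and the `is_actionable` rule (non-compliant and active) are illustrative assumptions modeled on Query 11's filter.

```python
from datetime import datetime, timedelta, timezone

def device_activity(last_seen, now, window_days=7):
    """Classify a device by its most recent DeviceInfo timestamp."""
    if last_seen is None:
        return "Unknown"  # no DeviceInfo row matched in the left outer join
    return "Active" if last_seen > now - timedelta(days=window_days) else "Inactive"

def is_actionable(is_compliant, activity):
    """Only non-compliant controls on active devices count as findings."""
    return (not is_compliant) and activity == "Active"

now = datetime(2024, 6, 15, tzinfo=timezone.utc)
recent = device_activity(now - timedelta(days=2), now)
stale = device_activity(now - timedelta(days=30), now)
```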
SCID reference: Based on Jeffrey Appel's Defender health guide and Azure/Azure-Sentinel MDE_DeviceHealth.YAML.
Query 10a: Vulnerable Device Exposure Summary
ExposureGraphNodes
| where set_has_element(Categories, "device")
| extend rawData = parse_json(tostring(parse_json(tostring(NodeProperties)).rawData))
| extend HasHighCritVulns = isnotnull(rawData.highRiskVulnerabilityInsights)
and tostring(parse_json(tostring(rawData.highRiskVulnerabilityInsights)).hasHighOrCritical) == "true"
| extend VulnerableToRCE = isnotnull(rawData.vulnerableToRCE)
| extend VulnerableToPrivEsc = isnotnull(rawData.VulnerableToPrivilegeEscalation)
| extend InternetFacing = isnotnull(rawData.IsInternetFacing)
| extend critLevel = rawData.criticalityLevel
| extend IsCritical = isnotnull(critLevel) and toint(critLevel.criticalityLevel) < 4
| summarize
TotalDevices = count(),
HighCritVulnDevices = countif(HasHighCritVulns),
RCEVulnDevices = countif(VulnerableToRCE),
PrivEscVulnDevices = countif(VulnerableToPrivEsc),
InternetFacingDevices = countif(InternetFacing),
InternetFacingWithHighCritVulns = countif(InternetFacing and HasHighCritVulns),
CriticalDevices = countif(IsCritical),
CriticalWithHighCritVulns = countif(IsCritical and HasHighCritVulns)
Purpose: Fast single-table scan that produces executive-level exposure headlines:
- "X of Y devices have high/critical vulnerabilities"
- "Z internet-facing devices are vulnerable"
- "N critical assets have exploitable weaknesses"
Performance: ⚡ Fast — single ExposureGraphNodes scan, no graph-match. Always runs in <5 seconds.
Key property: `highRiskVulnerabilityInsights.hasHighOrCritical` is the reliable vulnerability flag on device nodes. The property is a nested JSON string requiring `parse_json(tostring(...))` to extract. See `queries/cloud/exposure_graph_attack_paths.md` Node Property Reference for full details.
Query 10b: Edge Connectivity from Vulnerable Devices
let VulnDevices = ExposureGraphNodes
| where set_has_element(Categories, "device")
| extend rawData = parse_json(tostring(parse_json(tostring(NodeProperties)).rawData))
| where isnotnull(rawData.highRiskVulnerabilityInsights)
| extend HasHighCritVulns = tostring(parse_json(tostring(rawData.highRiskVulnerabilityInsights)).hasHighOrCritical) == "true"
| where HasHighCritVulns
| project NodeId;
let TargetNodes = ExposureGraphNodes
| project NodeId, TargetName = NodeName, TargetCategories = Categories, TargetLabel = NodeLabel;
ExposureGraphEdges
| join kind=inner VulnDevices on $left.SourceNodeId == $right.NodeId
| join kind=inner TargetNodes on $left.TargetNodeId == $right.NodeId
| extend TargetType = case(
set_has_element(TargetCategories, "identity"), "Identity",
set_has_element(TargetCategories, "compute"), "Compute",
set_has_element(TargetCategories, "data"), "Data Store",
set_has_element(TargetCategories, "ip_address"), "IP Address",
tostring(TargetCategories))
| summarize
PathCount = count(),
UniqueTargets = dcount(TargetNodeId),
SampleTargets = make_set(TargetName, 5)
by EdgeLabel, TargetType
| order by PathCount desc
Purpose: Shows the 1-hop blast radius shape from vulnerable devices WITHOUT expensive graph-match. Reveals:
- How many identities can be reached (lateral movement risk)
- How many Azure resources are reachable (data exfiltration risk)
- Which edge types dominate (authentication vs permissions vs network)
Performance: ⚡ Fast: join-based aggregation, no `make-graph` or `graph-match`. Runs in <10 seconds even on large graphs.
Interpretation: High counts on "can authenticate as" edges to identities indicate lateral movement risk. High counts on "has permissions to" edges to data stores indicate data exfiltration risk. Feed the most concerning edge types into Q16 (optional deep-dive) if needed.
Portal deep-dive: For interactive multi-hop attack path exploration, use the Exposure Management Attack Paths portal.
Query 11: Defender Health Non-Compliant Exceptions
// Defender Health Non-Compliant Exceptions — exception-based, active devices only
// Groups non-compliant controls per device for fleet-scale readability
// Inactive devices excluded — stale signatures on offline devices are expected
let defenderSCIDs = dynamic([
"scid-2010", "scid-2011", "scid-2012", "scid-2013", "scid-2016",
"scid-2003", "scid-91", "scid-2030",
"scid-5090", "scid-5091", "scid-5094", "scid-5095",
"scid-6090", "scid-6091", "scid-6094", "scid-6095"
]);
let deviceStatus = DeviceInfo
| summarize arg_max(Timestamp, DeviceName, OSPlatform) by DeviceId
| extend DeviceActivity = iff(Timestamp > ago(7d), "Active", "Inactive"),
LastSeen = Timestamp;
DeviceTvmSecureConfigurationAssessment
| where ConfigurationId in~ (defenderSCIDs)
| summarize arg_max(Timestamp, *) by DeviceId, ConfigurationId
| where IsApplicable == 1
| where IsCompliant == 0
| extend Control = case(
ConfigurationId =~ "scid-2010", "AVMode",
ConfigurationId =~ "scid-2011", "AVSignatures",
ConfigurationId =~ "scid-2012", "RealtimeProtection",
ConfigurationId =~ "scid-2013", "PUAProtection",
ConfigurationId =~ "scid-2016", "CloudProtection",
ConfigurationId =~ "scid-2003", "TamperProtection",
ConfigurationId =~ "scid-91", "BehaviourMonitoring",
ConfigurationId =~ "scid-2030", "CoreComponentsUpdate",
ConfigurationId =~ "scid-5090", "RealtimeProtection",
ConfigurationId =~ "scid-5091", "PUAProtection",
ConfigurationId =~ "scid-5094", "CloudProtection",
ConfigurationId =~ "scid-5095", "AVSignatures",
ConfigurationId =~ "scid-6090", "RealtimeProtection",
ConfigurationId =~ "scid-6091", "PUAProtection",
ConfigurationId =~ "scid-6094", "CloudProtection",
ConfigurationId =~ "scid-6095", "AVSignatures",
ConfigurationId)
| join kind=inner deviceStatus on DeviceId
| where DeviceActivity == "Active"
| summarize
NonCompliantControls = make_set(Control),
FailedCount = dcount(Control),
HighestImpact = max(toreal(ConfigurationImpact))
by DeviceName, OSPlatform, LastSeen
| order by FailedCount desc, HighestImpact desc
| take 100
Purpose: Exception-based reporting — only surfaces active devices failing Defender health controls. Groups all non-compliant controls per device for fleet-scale readability (one row per problem device, not one row per failed check).
Design for scale:
- Inner join with `DeviceInfo` → only active devices (seen within 7 days)
- Summarize by device → one row per device listing all failed controls as an array
- `take 100` → practical limit for very large environments; increase if needed
- Inactive devices excluded → stale signatures on offline devices are expected, not actionable
Note: If this query returns 0 results, that's a positive finding — report as "✅ All active devices pass all Defender health controls." If the fleet summary (Q9) shows non-compliant devices but all are Inactive, report as: "⚠️ X inactive devices have stale Defender configurations — verify if devices should be decommissioned or reconnected."
Query 13: Certificate Expiration Status
🔴 CRITICAL: `DeviceTvmCertificateInfo` does NOT have a `DeviceName` column. You MUST join with `DeviceInfo` to resolve device names. Using `DeviceName` directly will fail with `SemanticError: Failed to resolve scalar expression named 'DeviceName'`. The query below already includes the required join. If the table returns empty or error, skip gracefully, because it requires Defender Vulnerability Management add-on licensing.
DeviceTvmCertificateInfo
| extend Status = case(
ExpirationDate < now(), "Expired",
ExpirationDate < datetime_add('day', 30, now()), "Expiring within 30 days",
"Valid"
)
| where Status != "Valid"
| summarize CertCount = count() by Status, DeviceId
| join kind=inner (
DeviceInfo | summarize arg_max(Timestamp, DeviceName) by DeviceId
) on DeviceId
| project DeviceName, Status, CertCount
| order by Status asc, CertCount desc
Purpose: Identify expired and soon-expiring certificates that can cause service outages or security gaps.
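The `case()` expression in Query 13 can be sanity-checked with a short Python sketch. The function name and the `warn_days` parameter are hypothetical; the 30-day default and the status labels follow the query.

```python
from datetime import datetime, timedelta, timezone

def certificate_status(expiration, now, warn_days=30):
    """Mirror the Query 13 case(): Expired, Expiring within N days, or Valid."""
    if expiration < now:
        return "Expired"
    if expiration < now + timedelta(days=warn_days):
        return f"Expiring within {warn_days} days"
    return "Valid"

now = datetime(2024, 6, 15, tzinfo=timezone.utc)
statuses = [
    certificate_status(now - timedelta(days=1), now),
    certificate_status(now + timedelta(days=10), now),
    certificate_status(now + timedelta(days=60), now),
]
```

The order of the branches matters: an already-expired certificate also satisfies the "within 30 days" comparison, so the expired check has to come first, as it does in the query.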
Query 14: Top Vulnerable Software
DeviceTvmSoftwareVulnerabilities
| summarize
CriticalCVEs = countif(VulnerabilitySeverityLevel == "Critical"),
HighCVEs = countif(VulnerabilitySeverityLevel == "High"),
TotalCVEs = count(),
AffectedDevices = dcount(DeviceId)
by SoftwareVendor, SoftwareName
| order by CriticalCVEs desc, HighCVEs desc, TotalCVEs desc
| take 15
Purpose: Identify which software products contribute the most vulnerabilities — useful for upgrade/removal decisions.
Query 15: Internet-Facing Critical Assets with Vulnerabilities
ExposureGraphNodes
| where set_has_element(Categories, "device")
| extend rawData = parse_json(tostring(parse_json(tostring(NodeProperties)).rawData))
| extend critLevel = rawData.criticalityLevel
| where isnotnull(critLevel) and toint(critLevel.criticalityLevel) < 4
| where isnotnull(rawData.IsInternetFacing)
| extend VulnerableToRCE = isnotnull(rawData.vulnerableToRCE)
| extend VulnerableToPrivEsc = isnotnull(rawData.VulnerableToPrivilegeEscalation)
| project
DeviceName = NodeName,
CriticalityLevel = toint(critLevel.criticalityLevel),
VulnerableToRCE,
VulnerableToPrivEsc,
NodeLabel
| order by CriticalityLevel asc
Purpose: Highest-risk combination: critical + internet-facing + vulnerable. Always Priority 1 remediation.
Query 16: Multi-Hop Attack Path Enumeration
⚠️ Optional — slow query. Only run when Q10a/Q10b reveal high exposure (e.g., many vulnerable devices with identity edges) and the user explicitly requests attack path enumeration. Skip by default in standard reports.
let IdentitiesAndCriticalDevices = ExposureGraphNodes
| extend rawData = parse_json(tostring(parse_json(tostring(NodeProperties)).rawData))
| extend HasRCEVuln = isnotnull(rawData.vulnerableToRCE)
| extend CritLevel = toint(rawData.criticalityLevel.criticalityLevel)
| extend HasCritLevel = isnotnull(rawData.criticalityLevel)
| where
(set_has_element(Categories, "device") and
(
(HasCritLevel and CritLevel < 4)
or
HasRCEVuln
)
)
or
set_has_element(Categories, "identity");
ExposureGraphEdges
| where EdgeLabel in~ ("can authenticate as", "CanRemoteInteractiveLogonTo")
| make-graph SourceNodeId --> TargetNodeId with IdentitiesAndCriticalDevices on NodeId
| graph-match (DeviceWithRCE)-[CanConnectAs]->(Identity)-[CanRemoteLogin]->(CriticalDevice)
where
CanConnectAs.EdgeLabel =~ "can authenticate as" and
CanRemoteLogin.EdgeLabel =~ "CanRemoteInteractiveLogonTo" and
set_has_element(Identity.Categories, "identity") and
set_has_element(DeviceWithRCE.Categories, "device") and DeviceWithRCE.HasRCEVuln and
set_has_element(CriticalDevice.Categories, "device") and CriticalDevice.HasCritLevel
project
RCEDeviceName = DeviceWithRCE.NodeName,
IdentityName = Identity.NodeName,
CriticalDeviceName = CriticalDevice.NodeName,
CriticalityLevel = tostring(CriticalDevice.CritLevel)
| order by CriticalityLevel asc
Purpose: Discover multi-hop attack chains: RCE-vulnerable device → user identity → critical server. This is the heavy graph-match query — use Q10a/Q10b for fast summary stats, and only run this when deep enumeration is needed.
Note: This query may return 0 results if no RCE→identity→critical-device paths exist. That's a positive finding — report as "✅ No multi-hop attack paths from RCE-vulnerable devices to critical servers detected."
Performance: ⚠️ Slow: uses `make-graph` + `graph-match`. Can take 30-60+ seconds on large environments. Filter nodes tightly BEFORE `make-graph` to reduce graph size.
Additional patterns: See `queries/cloud/exposure_graph_attack_paths.md` for 30+ query patterns covering cookie chains, permission analysis, choke point detection, and Azure Resource Graph integration.
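The two-hop pattern that Query 16 matches (RCE-vulnerable device → identity → critical device) can be illustrated on a plain edge list. This Python sketch is not a replacement for `graph-match`; the function name, the tuple layout of `edges`, and the sample node names are all hypothetical.

```python
def two_hop_paths(edges, rce_devices, identities, critical_devices):
    """Find device -> identity -> critical device chains.

    edges: iterable of (source, edge_label, target) tuples.
    """
    logon_targets = {}
    for source, label, target in edges:
        if label.lower() == "canremoteinteractivelogonto":
            logon_targets.setdefault(source, []).append(target)

    paths = []
    for source, label, target in edges:
        if label.lower() != "can authenticate as":
            continue
        if source in rce_devices and target in identities:
            for critical in logon_targets.get(target, []):
                if critical in critical_devices:
                    paths.append((source, target, critical))
    return paths

# Hypothetical sample graph.
edges = [
    ("ws-01", "can authenticate as", "alice"),
    ("alice", "CanRemoteInteractiveLogonTo", "dc-01"),
    ("ws-02", "can authenticate as", "bob"),
    ("bob", "CanRemoteInteractiveLogonTo", "print-01"),
]
paths = two_hop_paths(edges, {"ws-01", "ws-02"}, {"alice", "bob"}, {"dc-01"})
```

An empty result here corresponds to the positive "no multi-hop attack paths" finding described in the note for Query 16.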
Drill-Down Reference Queries
⚠️ These queries are NOT part of the standard report workflow. They use `DeviceTvmSoftwareEvidenceBeta` to map vulnerable software to actual file paths on disk. Use them for targeted drill-downs when the user asks to investigate a specific software's vulnerabilities, identify cleanup targets, or understand why a software has so many CVE versions.
Do NOT run these fleet-wide in large environments: the evidence table can be very large. Always scope to a specific `SoftwareName` and optionally a `DeviceId`.
When to Use
| Scenario | Query | Trigger |
|---|---|---|
| User asks "why does software X have so many versions?" | Q17 | After Q14 reveals high version sprawl |
| User asks "what files are causing these CVEs?" | Q18 | After Q2 identifies exploitable CVEs for a software |
| User asks "what can I safely clean up?" | Q19 | After Q17/Q18 reveal old extension/app version folders |
| Standard vulnerability report | None | These queries are NOT used in standard reports |
DeviceTvmSoftwareEvidenceBeta — Table Reference
Beta table: Schema and table name may change in future Defender releases. The canonical table name is `DeviceTvmSoftwareEvidenceBeta`, NOT `DeviceTvmSoftwareEvidences` or `DeviceTvmSoftwareEvidence`.
| Column | Type | Description |
|---|---|---|
| `DeviceId` | string | Device identifier (join with DeviceInfo for DeviceName) |
| `SoftwareVendor` | string | Software vendor name |
| `SoftwareName` | string | Software product name (matches DeviceTvmSoftwareVulnerabilities.SoftwareName) |
| `SoftwareVersion` | string | Detected version (matches DeviceTvmSoftwareVulnerabilities.SoftwareVersion) |
| `DiskPaths` | dynamic | JSON array of file paths where the software was detected on disk |
| `RegistryPaths` | dynamic | JSON array of registry keys evidencing the software installation |
| `LastSeenTime` | string | Last time evidence was observed |
Query 17: Version Sprawl by Source — Per-Software Summary
// Drill-down: For a specific software, show all versions with file locations
// categorized by source (Azure extension, application, standalone install, etc.)
// Scope: Single software — ALWAYS filter by SoftwareName
DeviceTvmSoftwareEvidenceBeta
| where SoftwareName =~ '<SOFTWARE_NAME>'
| extend Paths = parse_json(DiskPaths)
| mv-expand Path = Paths
| extend FilePath = tostring(Path)
| extend Source = case(
FilePath has "Packages\\Plugins", "Azure Extension",
FilePath has "Program Files\\Microsoft OneDrive", "OneDrive",
FilePath has "WindowsApps", "Store App",
FilePath has "Program Files\\dotnet", ".NET Runtime",
FilePath has "Python", "Python",
FilePath has "Windows\\System32", "System",
FilePath has "Program Files\\", "Installed Software",
FilePath has "dpkg-query", "Linux Package",
"Other")
| join kind=inner (
DeviceInfo | summarize arg_max(Timestamp, DeviceName) by DeviceId
) on DeviceId
| summarize
Versions = make_set(SoftwareVersion),
FileCount = dcount(FilePath),
Devices = make_set(DeviceName)
by Source
| extend VersionCount = array_length(Versions), DeviceCount = array_length(Devices)
| order by FileCount desc
Purpose: High-level summary showing WHERE a software's vulnerable files come from — Azure extensions leaving old versions behind, OneDrive version-per-folder sprawl, Store apps, standalone installs, etc. Useful for identifying the root cause of version sprawl and choosing the right remediation approach.
Substitute: Replace <SOFTWARE_NAME> with the software from Q14 results (e.g., openssl, curl, zlib).
When to include in reports: This query produces a compact summary table suitable for including in reports when a specific software dominates the CVE count. Present it under Section 2c (Top Vulnerable Software) as a "Source Breakdown" sub-table for the worst offender.
Query 18: Vulnerable File Paths — CVE to File Mapping
// Drill-down: Map specific software versions to their on-disk file paths
// and correlate with CVE count per version
// Scope: Single software — ALWAYS filter by SoftwareName
let vulnVersions = DeviceTvmSoftwareVulnerabilities
| where SoftwareName =~ '<SOFTWARE_NAME>'
| summarize CVEs = make_set(CveId) by SoftwareVersion
| extend CVECount = array_length(CVEs);
DeviceTvmSoftwareEvidenceBeta
| where SoftwareName =~ '<SOFTWARE_NAME>'
| extend Paths = parse_json(DiskPaths)
| mv-expand Path = Paths
| extend FilePath = tostring(Path)
| join kind=inner (
DeviceInfo | summarize arg_max(Timestamp, DeviceName) by DeviceId
) on DeviceId
| join kind=leftouter vulnVersions on SoftwareVersion
| summarize
Devices = make_set(DeviceName),
DeviceCount = dcount(DeviceName)
by FilePath, SoftwareVersion, CVECount
| order by CVECount desc, DeviceCount desc
Purpose: Maps every vulnerable file path to its version and CVE count. Shows exactly which files on which devices are contributing to CVE exposure. Key for building targeted cleanup scripts.
Substitute: Replace <SOFTWARE_NAME> with the target software name.
Common patterns revealed:
- Azure extensions (`C:\Packages\Plugins\<ExtensionName>\<OldVersion>\...\libcrypto-3-x64.dll`): old extension versions left behind after upgrades, each bundling its own OpenSSL/curl/zlib
- OneDrive (`C:\Program Files\Microsoft OneDrive\<version>\`): every OneDrive update creates a new version folder with bundled libraries
- Store apps (`C:\Program Files\WindowsApps\<AppName_Version>\`): managed by Microsoft Store, stale versions auto-cleaned eventually
- Standalone installs (`C:\Program Files\<product>\`): requires manual update or reinstall
Query 19: Stale Extension Folder Detection
// Drill-down: Find OLD Azure extension version folders still on disk
// by comparing evidence paths against the latest installed version
// Scope: All Azure extension evidence — safe to run fleet-wide (small result set)
//
// ⚠️ PITFALL: Version comparison uses string max() which is LEXICOGRAPHIC.
// "1.29.98" > "1.29.104" because '9' > '1' at the 6th character (index 5).
// Review results manually — a "stale" folder with a higher numeric version
// than "latest" means the comparison inverted. This is a known KQL limitation
// for dotted version strings with variable-width segments.
DeviceTvmSoftwareEvidenceBeta
| extend Paths = parse_json(DiskPaths)
| mv-expand Path = Paths
| extend FilePath = tostring(Path)
| where FilePath has "packages" and FilePath has "plugins"
| extend ExtensionName = extract(@"plugins\\([^\\]+)", 1, FilePath)
| extend ExtensionVersion = extract(@"plugins\\[^\\]+\\([^\\]+)", 1, FilePath)
| where isnotempty(ExtensionName) and isnotempty(ExtensionVersion)
| join kind=inner (
DeviceInfo | summarize arg_max(Timestamp, DeviceName) by DeviceId
) on DeviceId
| summarize
SoftwareVersions = make_set(SoftwareVersion),
FileCount = dcount(FilePath),
Devices = make_set(DeviceName)
by ExtensionName, ExtensionVersion
| as hint.materialized=true AllExtVersions
| join kind=inner (
AllExtVersions
| summarize LatestVersion = max(ExtensionVersion) by ExtensionName
) on ExtensionName
| where ExtensionVersion != LatestVersion
| project ExtensionName, StaleVersion = ExtensionVersion, LatestVersion,
BundledSoftwareVersions = SoftwareVersions, FileCount, Devices
| order by ExtensionName asc, StaleVersion asc
Purpose: Identifies old Azure extension version folders still present on disk after upgrades. These are the primary source of "phantom" CVEs from bundled libraries (OpenSSL, curl, zlib, etc.) that inflate vulnerability counts. Safe to run fleet-wide because it only returns stale folders (small result set).
Known limitation: `max(ExtensionVersion)` uses lexicographic string comparison, which breaks for version segments with different digit counts (e.g., `1.29.98` vs `1.29.104`). Always review results: if a "stale" version number looks higher than "latest," the comparison inverted. Q19 as written compares version strings, not numeric segments. KQL's `parse_version()` may offer a numeric alternative where it is supported, but it is not used here.
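When reviewing Q19 output by hand, the inversion can be checked offline. The sketch below is a minimal illustration, assuming version strings made of dot-separated numeric segments; the helper names `version_key` and `split_stale` are hypothetical and not part of the skill. It contrasts string `max()` with a segment-wise numeric comparison.

```python
import re


def version_key(version: str) -> tuple:
    """Turn '1.29.104' into (1, 29, 104) so segments compare as integers.

    Assumption: only the numeric runs matter; suffixes such as '-beta' are ignored.
    """
    return tuple(int(part) for part in re.findall(r"\d+", version))


def split_stale(versions):
    """Return (latest, stale_versions) using numeric segment comparison."""
    latest = max(versions, key=version_key)
    stale = sorted((v for v in versions if v != latest), key=version_key)
    return latest, stale


observed = ["1.29.98", "1.29.104"]

# Plain string comparison reproduces the inversion described above.
lexicographic_latest = max(observed)

# Segment-wise comparison picks the numerically highest version.
numeric_latest, stale_versions = split_stale(observed)
```

If `lexicographic_latest` and `numeric_latest` disagree for an extension, the "stale" and "latest" columns in the Q19 row for that extension should be swapped before any cleanup.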
Regex note: `extract()` in KQL is case-sensitive. The evidence table stores paths in lowercase (`c:\packages\plugins\...`), so the regex uses lowercase `plugins`. The `has` operator used for filtering is case-insensitive.
Remediation pattern: For each stale extension version folder, the entire folder tree can be safely deleted:
Remove-Item -Recurse -Force "C:\Packages\Plugins\<ExtensionName>\<StaleVersion>"
After cleanup, TVM will reflect the reduced vulnerability count within 4-24 hours.
Common culprits: Azure Monitor Agent (`AzureMonitorWindowsAgent`), Guest Configuration Agent (`ConfigurationforWindows`), Azure Security Center (`MicrosoftMonitoringAgent`), and other Azure Arc extensions that bundle OpenSSL, curl, or zlib.
Risk Assessment
Compute an overall risk rating based on query results:
| Rating | Criteria |
|---|---|
| 🔴 Critical | Any: exploitable Critical CVEs on internet-facing assets, OR compliance rate < 40%, OR internet-facing devices with high/critical vulnerabilities (Q10a), OR high blast radius from vulnerable devices to identities/data stores (Q10b) |
| 🟠 High | Any: exploitable High CVEs > 5, OR EoS software on critical assets, OR compliance rate < 60%, OR active devices with RTP/TamperProtection/AVMode non-compliant |
| 🟡 Medium | Any: total High CVEs > 50, OR EoS software present, OR compliance rate < 75%, OR expired certificates > 10 |
| 🟢 Low | None of the above criteria met |
Cite specific evidence when assigning risk level (per copilot-instructions.md Evidence-Based Analysis rule).
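The criteria table above can be read as an ordered set of checks where the first tier with any match wins. The following Python sketch shows one way to encode that, returning the rating together with the evidence that triggered it. The `ExposureMetrics` field names are hypothetical, and "high blast radius" is modeled as a boolean because the table gives no numeric threshold for it.

```python
from dataclasses import dataclass


@dataclass
class ExposureMetrics:
    # All field names are illustrative; populate them from the relevant query results.
    exploitable_critical_internet_facing: int = 0
    internet_facing_high_critical_devices: int = 0
    high_blast_radius: bool = False  # assumption: analyst judgment from Q10b
    exploitable_high_cves: int = 0
    eos_on_critical_assets: int = 0
    active_devices_failing_core_controls: int = 0  # RTP / TamperProtection / AVMode
    total_high_cves: int = 0
    eos_software: int = 0
    expired_certificates: int = 0
    compliance_rate: float = 100.0  # percent


def assess_risk(m: ExposureMetrics):
    """Return (rating, evidence) for the first tier with at least one matching criterion."""
    tiers = [
        ("Critical", [
            (m.exploitable_critical_internet_facing > 0, "Exploitable Critical CVEs on internet-facing assets"),
            (m.compliance_rate < 40, f"Compliance rate {m.compliance_rate}% is below 40%"),
            (m.internet_facing_high_critical_devices > 0, "Internet-facing devices with high/critical vulnerabilities"),
            (m.high_blast_radius, "High blast radius from vulnerable devices"),
        ]),
        ("High", [
            (m.exploitable_high_cves > 5, "More than 5 exploitable High CVEs"),
            (m.eos_on_critical_assets > 0, "End-of-support software on critical assets"),
            (m.compliance_rate < 60, f"Compliance rate {m.compliance_rate}% is below 60%"),
            (m.active_devices_failing_core_controls > 0, "Active devices non-compliant for RTP/TamperProtection/AVMode"),
        ]),
        ("Medium", [
            (m.total_high_cves > 50, "More than 50 High CVEs in total"),
            (m.eos_software > 0, "End-of-support software present"),
            (m.compliance_rate < 75, f"Compliance rate {m.compliance_rate}% is below 75%"),
            (m.expired_certificates > 10, "More than 10 expired certificates"),
        ]),
    ]
    for rating, checks in tiers:
        evidence = [reason for met, reason in checks if met]
        if evidence:
            return rating, evidence
    return "Low", ["None of the Critical, High, or Medium criteria met"]


rating, evidence = assess_risk(ExposureMetrics(compliance_rate=55.0, eos_software=2))
```

Returning the evidence list alongside the rating keeps the justification tied to the specific criteria that fired, which matches the requirement to cite evidence when assigning a risk level.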
Report Template
Inline Chat Executive Summary
📊 VULNERABILITY & EXPOSURE REPORT — <DATE>
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
**Overall Risk:** 🔴 / 🟠 / 🟡 / 🟢 <RATING> — <1-sentence justification with evidence>
### Vulnerability Overview
| Severity | CVE Count | Devices Affected |
|----------|-----------|------------------|
| 🔴 Critical | X | Y |
| 🟠 High | X | Y |
| 🟡 Medium | X | Y |
| 🔵 Low | X | Y |
⚠️ **X CVEs with known exploits** — see full report for details
### Configuration Compliance
| Category | Compliant % | Non-Compliant |
|----------|-------------|---------------|
| OS | X% | Y |
| Network | X% | Y |
| Security Controls | X% | Y |
| Accounts | X% | Y |
| Application | X% | Y |
### Attack Path Exposure
| Metric | Count |
|--------|-------|
| Devices with high/critical vulnerabilities | X of Y |
| Internet-facing vulnerable devices | Z |
| Critical assets with vulnerabilities | N |
| Lateral movement paths (identity edges) | X → Y targets |
| Data access paths (permission edges) | X → Y targets |
🔗 **Full interactive attack paths:** [Exposure Management Portal](https://security.microsoft.com/exposure-management/attack-paths)
### Defender Device Health
**Active Devices:** X/Y controls fully compliant across Z active devices
**Inactive Devices:** N devices not reporting (excluded — stale signatures expected)
⚠️ / ✅ **Non-compliant active devices:** <count and failed control names, or "None">
### Key Findings
- 🔴 <Critical finding 1>
- 🟠 <High finding 2>
- ⚠️ <Notable finding 3>
- ✅ <Positive finding>
### 🎯 TOP 3 PRIORITY ACTIONS
1. 🔴 <Action 1 — e.g., Patch X exploitable CVEs on internet-facing assets>
2. 🟠 <Action 2 — e.g., Remediate Y Impact-9 security misconfigurations>
3. ⚠️ <Action 3 — e.g., Upgrade Z end-of-support software>
📄 Full report: reports/exposure/vulnerability_exposure_report_<YYYYMMDD_HHMMSS>.md
Markdown File Structure
The full markdown report file MUST follow this structure:
# Vulnerability & Exposure Management Report
**Generated:** <DATE>
**Scope:** <Org-Wide / Device: HOSTNAME>
**Overall Risk Rating:** 🔴/🟠/🟡/🟢 <RATING>
---
## 1. Executive Summary
- Overall risk rating with evidence
- Key metrics dashboard
- Top 3 priority remediation actions
## 2. CVE Vulnerability Assessment
🔗 **Browse all CVEs in Defender portal:** [Weaknesses](https://security.microsoft.com/vulnerabilities) | [Software Inventory](https://security.microsoft.com/software-inventory)
### 2a. Severity Distribution
<Table: Severity × CVE Count × Device Count>
### 2b. Exploitable Vulnerabilities
<Table: CVE ID, CVSS, Description, Affected Devices — sorted by CVSS desc>
### 2c. Top Vulnerable Software
<Table: Vendor, Software, Critical/High/Total CVEs, Affected Devices>
### 2d. Per-Device Vulnerability Matrix
<Table: Device, OS, Critical/High/Med/Low/Total>
## 3. Security Configuration Compliance
🔗 **Detailed recommendations in Defender portal:** [Security Recommendations](https://security.microsoft.com/exposure-recommendations) | [Vulnerability Management Dashboard](https://security.microsoft.com/vulnerability-management/dashboard)
### 3a. Compliance by Category
<Table: Category, Total, Compliant %, Non-Compliant>
### 3b. Per-Device Compliance Scorecard
<Table: Device, Compliance %, Compliant/NonCompliant/NA counts>
### 3c. High-Impact Misconfigurations (Impact ≥ 8)
For each misconfiguration:
- **Configuration:** <Name>
- **Category:** <Category> > <Subcategory>
- **Impact Score:** <Score>/10
- **Risk:** <RiskDescription>
- **Affected Devices:** <count> (<device list>)
- **Remediation:** <Summary of RemediationOptions — strip HTML tags>
## 4. End-of-Support Software
<Table: Vendor, Software, Version, EoS Status, EoS Date, Affected Devices>
## 5. Exposure Management
### 5a. Critical Asset Inventory
<Table: Device, Criticality Level, Internet-Facing, RCE Vuln, PrivEsc Vuln>
### 5b. Attack Path & Exposure Analysis
**Vulnerable Device Exposure (Q10a):**
| Metric | Count |
|--------|-------|
| Total devices | X |
| Devices with high/critical vulnerabilities | Y |
| Internet-facing vulnerable devices | Z |
| RCE-vulnerable devices | N |
| Critical assets with vulnerabilities | N |
**Blast Radius from Vulnerable Devices — 1-Hop Connectivity (Q10b):**
| Edge Type | Target Type | Path Count | Unique Targets | Sample Targets |
|-----------|-------------|------------|----------------|----------------|
| can authenticate as | Identity | X | Y | ... |
| has permissions to | Data Store | X | Y | ... |
| ... | ... | ... | ... | ... |
**Interpretation:** <Narrative summarizing lateral movement risk, data access risk, and key choke points>
🔗 **Full interactive attack path analysis:** [Exposure Management Portal](https://security.microsoft.com/exposure-management/attack-paths)
> If Q16 was run (optional deep-dive):
> **Multi-Hop Attack Chains (Q16):** <Table: Entry Device → Identity → Target Device / Criticality>
> Or: "✅ No multi-hop attack paths from RCE-vulnerable devices to critical servers detected."
## 6. Endpoint Health
### 6a. Defender Device Health
**Fleet Summary (Active Devices):** <Table: Control × OS Platform × Compliant / NonCompliant / ComplianceRate — active devices only>
**Inactive Device Summary:** <Count of inactive devices by OS — signature staleness is expected, flag for decommissioning review>
**Non-Compliant Exceptions (Active Only):** <Table: Device, OS, Failed Controls, Count — only active devices failing Defender controls>
If no non-compliant active devices: "✅ All active devices pass all Defender health controls"
If non-compliant only on inactive: "⚠️ X inactive devices have stale Defender configurations — verify if devices should be decommissioned or reconnected"
### 6b. Certificate Status
<Table: Device, Expired/Expiring count>
## 7. Prioritized Remediation Plan
🔗 **Track remediation in Defender portal:** [Remediation Activities](https://security.microsoft.com/vulnerability-management/remediation) | [Security Recommendations](https://security.microsoft.com/exposure-recommendations)
| Priority | Category | Action | Impact |
|----------|----------|--------|--------|
| 🔴 Immediate | ... | ... | ... |
| 🟠 Short-term | ... | ... | ... |
| 🟡 Medium-term | ... | ... | ... |
| 🟢 Ongoing | ... | ... | ... |
## 8. Appendix
- Query reference (all KQL queries used)
- Data freshness notes
- Methodology
Per-Device Mode
When user specifies a device name, scope all DeviceTvm queries to that device:
Add filter to Queries 1-6, 8, 9, 11, 13, 14:
| where DeviceName startswith '<DEVICE_NAME>' // Use startswith — DeviceName is often FQDN (e.g., hostname.domain.com)
ExposureGraph queries (7, 15): Filter by NodeName:
| where NodeName has '<DEVICE_NAME>' // Use has — NodeName may be FQDN, short name, or contain domain suffix
Per-device report differences:
- Section 5b (Attack paths) — filter to paths involving the specific device
- Title changes to: `Vulnerability & Exposure Report — <DEVICE_NAME>`
Known Pitfalls
| Pitfall | Impact | Mitigation |
|---|---|---|
| `DeviceName` in TVM tables is stored as FQDN (e.g., hostname.domain.com) | `DeviceName =~ 'hostname'` returns 0 results because an exact match fails on the FQDN | MUST use `DeviceName startswith '<short_name>'` for per-device filtering. startswith matches both short names and FQDNs. Same applies to ExposureGraphNodes.NodeName: use has instead of =~ |
| `DeviceTvmCertificateInfo` requires the Defender Vulnerability Management add-on | Query returns empty or error | Skip gracefully, note in report: "Certificate data requires Defender Vulnerability Management add-on" |
| `DeviceTvmBrowserExtensions` may be empty | No browser extension data | Skip section, note as "No browser extension data available" |
| DeviceTvmSoftwareVulnerabilitiesKB has a specific schema | Ad-hoc project using non-existent columns (CveDescription, ExploitTypes, ExploitVerified, IsExploitVerified, RecommendedSecurityUpdate, RecommendedSecurityUpdateId) returns Failed to resolve scalar expression | Verified columns (via getschema): CveId, CvssScore, CvssVector, CveSupportability, IsExploitAvailable (bool), VulnerabilitySeverityLevel, LastModifiedTime, PublishedDate, VulnerabilityDescription, AffectedSoftware (dynamic), EpssScore (real). There are NO columns named ExploitTypes, ExploitVerified, RecommendedSecurityUpdate, or RecommendedSecurityUpdateId. Those exist on DeviceTvmSoftwareVulnerabilities (the main table), not the KB. Use getschema before adding ad-hoc columns. Stick to skill queries — do NOT improvise projections |
| RemediationOptions in KB tables contains HTML | Raw HTML in output | Strip HTML tags when rendering in markdown: remove <br/>, <ol>, <li>, <a> tags, convert to plain text bullet points |
| NodeProperties is a JSON string, NOT a parsed dynamic object | Direct dot-notation like NodeProperties.rawData.criticalityLevel returns null through MCP JSON serialization — queries silently return 0 results | MUST use double parse_json(tostring()) extraction: parse_json(tostring(parse_json(tostring(NodeProperties)).rawData)) then access sub-properties. This is the ONLY reliable pattern for NodeProperties access. See Q7, Q10a, Q10b, Q15, Q16 for canonical examples |
| ConfigurationBenchmarks in KB contains benchmark mappings | Can enrich report | Optional: extract CIS/NIST benchmark references for compliance mapping |
| DeviceTvm assessments refresh periodically | Data may be 12-24h old | Note data freshness in report appendix |
| DeviceTvmSecureConfigurationAssessment with Timestamp > ago(1d) returns 0 results | Lab, weekend, and low-activity environments may not have assessments in the last 24h. The ago(1d) filter silently drops all data — the #1 cause of empty Q3/Q6/Q8 results | NEVER use Timestamp > ago(1d) as a pre-filter. Use summarize arg_max(Timestamp, *) by DeviceId, ConfigurationId to dedup to the latest assessment per device×config without a time floor. Q9 and Q11 already use this pattern correctly |
| graph-match queries can be slow on large graphs | Timeout possible | Filter nodes BEFORE make-graph to reduce graph size |
| parse_json() and graph-match project produce dynamic-typed columns | order by fails with "key can't be of dynamic type" error | Always wrap in explicit type casts (toint(), tostring(), tolong()) before using in order by, summarize, or comparisons. Applies to ALL parse_json() output — not just graph-match. Example: | extend critValue = toint(rawData.criticalityLevel.criticalityLevel) then | order by critValue asc |
| DeviceTvmInfoGathering table exists but is NOT used by this skill | Agent may attempt to query it for Defender health data, causing errors due to unfamiliar schema | Defender sensor health is covered by Q9 (SCIDs in DeviceTvmSecureConfigurationAssessment). Do NOT improvise queries against DeviceTvmInfoGathering — its schema differs from other DeviceTvm* tables and is not documented here |
| DeviceTvmCertificateInfo has NO DeviceName column | Failed to resolve scalar expression named 'DeviceName' | Join with DeviceInfo \| summarize arg_max(Timestamp, DeviceName) by DeviceId to resolve device names |
| Context in DeviceTvmSecureConfigurationAssessment is double-nested JSON | First parse_json(Context) returns an array of JSON strings; items need a second parse_json() to extract values | Use parse_json(tostring(parse_json(Context)[0]))[N] — e.g., [0] for AV mode code, [2] for signature date |
| SCID numbers are OS-specific — same control has different IDs per platform | Querying Windows SCIDs on macOS/Linux returns IsApplicable=0 | Use the SCID mapping: Windows 2010-2030, macOS 5090-5095, Linux 6090-6095. Q9/Q11 normalize OS-specific SCIDs to unified control names |
| Inactive devices have naturally stale AV signatures | Non-compliant AVSignatures on devices offline >7 days is expected, not a security gap | Always join DeviceInfo to separate active (seen <7d) from inactive devices; report inactive signature staleness as informational only |
| DeviceTvmSoftwareEvidenceBeta is a Beta table | Table name and schema may change in future Defender releases | Use exact name DeviceTvmSoftwareEvidenceBeta — NOT DeviceTvmSoftwareEvidences or DeviceTvmSoftwareEvidence. If the table returns SemanticError, it may have been renamed or graduated to GA — check FetchAdvancedHuntingTablesOverview for the current name |
| DeviceTvmSoftwareEvidenceBeta has no DeviceName column | Cannot display device names directly | Join with DeviceInfo \| summarize arg_max(Timestamp, DeviceName) by DeviceId — same pattern as DeviceTvmCertificateInfo |
| DiskPaths and RegistryPaths are dynamic arrays | Need parse_json() + mv-expand to flatten into individual paths | Pattern: \| extend Paths = parse_json(DiskPaths) \| mv-expand Path = Paths \| extend FilePath = tostring(Path) |
| Evidence queries can be expensive fleet-wide | Large environments have millions of file evidence rows | ALWAYS scope to a specific SoftwareName. Never run DeviceTvmSoftwareEvidenceBeta without a filter |
| max() on version strings is lexicographic | "1.29.98" > "1.29.104" because '9' > '1' at the 6th character, which inverts the comparison when segments have different digit counts | Q19 results must be manually reviewed, because Q19 compares version strings rather than numeric segments |
| extract() regex is case-sensitive | Evidence table paths are lowercase (c:\packages\plugins\...), but regex patterns with uppercase (e.g., Plugins) won't match | Always use lowercase in extract() patterns for file paths. Use case-insensitive has for filtering |
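The RemediationOptions pitfall in the table above calls for stripping HTML before rendering. A minimal sketch of that cleanup in Python follows, assuming the markup is limited to simple tags such as `<br/>`, `<ol>`, `<li>`, and `<a>`; the function name `remediation_to_markdown` and the sample string are hypothetical.

```python
import html
import re


def remediation_to_markdown(raw: str) -> str:
    """Convert simple remediation HTML into plain-text lines with '-' bullets."""
    text = re.sub(r"(?i)<li\b[^>]*>", "\n- ", raw)   # each list item starts a bullet
    text = re.sub(r"(?i)<br\s*/?>", "\n", text)      # line breaks become newlines
    text = re.sub(r"<[^>]+>", "", text)              # drop every remaining tag
    text = html.unescape(text)                       # decode entities such as &amp;
    lines = [line.strip() for line in text.splitlines()]
    # Discard blank lines and bullets left empty after tag removal.
    return "\n".join(line for line in lines if line and line != "-")


# Invented sample input for illustration only.
sample = (
    'Enable the setting:<br/><ol><li>Open <a href="https://example.com">Settings</a></li>'
    "<li>Turn on &amp; save</li></ol>"
)
cleaned = remediation_to_markdown(sample)
```

A regex-based approach is enough for flat markup like this; nested or malformed HTML would be better handled with a real parser such as the standard library's `html.parser`.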
Error Handling
| Error | Cause | Resolution |
|---|---|---|
| `SemanticError: Failed to resolve table 'DeviceTvm...'` | Table not available in AH | Verify Defender for Endpoint is onboarded; some DeviceTvm* tables require premium licensing |
| `SemanticError: Failed to resolve table 'ExposureGraphNodes'` | Exposure Management not enabled | Report as: "⚠️ Microsoft Security Exposure Management is not enabled in this tenant. ExposureGraph sections skipped." |
| Query timeout on graph-match | Graph too large | Reduce node set with tighter filters; try simpler edge queries first |
| Empty results from DeviceTvmSoftwareVulnerabilities | No onboarded devices or no vulns detected | Verify at least one device is MDE-onboarded: `DeviceInfo |
| `DeviceTvmCertificateInfo` not found | Requires Defender Vulnerability Management add-on | Skip section, note in report |
Graceful Degradation
If a table or query fails, do not abort the entire report. Skip the affected section and note it:
### 6b. Certificate Status
❓ Certificate data not available — `DeviceTvmCertificateInfo` table not found.
This may require the Defender Vulnerability Management add-on license.
Continue with all remaining sections. The report should always produce output for at least:
- CVE Vulnerability Assessment (Sections 2a-2d)
- Security Configuration Compliance (Sections 3a-3c)
These are available in all Defender for Endpoint tenants.
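The skip-and-continue behavior described above can be sketched as a loop that isolates each section's failure. This is an illustrative Python outline only: `build_report`, `fake_run_query`, and the sample queries are hypothetical stand-ins, not the skill's actual tooling.

```python
def build_report(sections, run_query):
    """Render every section, replacing failed ones with a note instead of aborting."""
    rendered, skipped = [], []
    for title, query in sections:
        try:
            rows = run_query(query)
        except Exception as error:  # broad on purpose: any failure skips only this section
            rendered.append(f"### {title}\n❓ Data not available: {error}")
            skipped.append(title)
            continue
        rendered.append(f"### {title}\n{len(rows)} rows returned")
    return "\n\n".join(rendered), skipped


def fake_run_query(query):
    """Stand-in for the real query tool; simulates a missing table."""
    if "DeviceTvmCertificateInfo" in query:
        raise RuntimeError("Failed to resolve table 'DeviceTvmCertificateInfo'")
    return [{"placeholder": 1}]


sections = [
    ("2a. Severity Distribution", "DeviceTvmSoftwareVulnerabilities | take 1"),
    ("6b. Certificate Status", "DeviceTvmCertificateInfo | take 1"),
    ("3a. Compliance by Category", "DeviceTvmSecureConfigurationAssessment | take 1"),
]
report, skipped = build_report(sections, fake_run_query)
```

The failing certificate section sits between two working ones to show that sections after the failure are still produced.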
Additional References
- Query the Enterprise Exposure Graph
- DeviceTvmSoftwareVulnerabilities schema
- DeviceTvmSecureConfigurationAssessment schema
- Microsoft Security Exposure Management overview
- Existing query library: `queries/cloud/exposure_graph_attack_paths.md`
.github/skills/identity-posture/SKILL.md
npx skills add SCStelz/security-investigator --skill identity-posture -g -y
SKILL.md
Frontmatter
{
"name": "identity-posture",
"description": "Audit identity security posture across the organization. Triggers on keywords like \"identity posture\", \"identity security report\", \"account hygiene\", \"stale accounts\", \"privileged accounts\", \"password posture\", \"identity providers\", \"multi-provider identity\", \"identity sprawl\", \"service accounts\", \"deleted accounts with roles\", \"cross-IdP\", \"honeytoken\", \"sensitive accounts\". Queries IdentityAccountInfo in Advanced Hunting (enriched with IdentityInfo and IdentityLogonEvents) for a posture assessment covering account inventory by provider, privileged account audit, stale\/deleted account hygiene, password posture, risk distribution, multi-provider identity linking, MDI tag analysis, and department-level insights. Inline chat or markdown output.",
"drill_down_prompt": "Run identity posture report — account hygiene, privilege distribution, stale accounts",
"threat_pulse_domains": [
"identity"
]
}
Identity Security Posture — Instructions
Purpose
This skill audits the identity security posture across your organization using the IdentityAccountInfo table in Microsoft Defender XDR Advanced Hunting, enriched with IdentityInfo and IdentityLogonEvents for password policy and logon activity context.
Modern organizations use multiple identity providers (Entra ID, Active Directory, Okta, SailPoint, CyberArk, Ping, etc.). IdentityAccountInfo is the only table that provides a unified identity graph across these providers, linking accounts to a single IdentityId. This skill systematically evaluates the security posture of that identity fabric.
What this skill covers:
| Domain | Key Questions Answered |
|---|---|
| 🔍 Identity Inventory | How many accounts exist? Across which providers? What types and statuses? |
| 👑 Privileged Account Audit | Who holds high-privilege roles? Across which providers? Are they permanent? |
| 🗑️ Stale & Deleted Account Hygiene | Which enabled accounts have no logon activity? Do deleted accounts retain permissions? |
| 🔑 Password Posture | Password age distribution, PasswordNeverExpires/PasswordNotRequired flags (AD accounts via IdentityInfo join) |
| 🟠 Risk Distribution | How are identity risk levels distributed? Which high-risk accounts are still active? |
| 🔗 Multi-Provider Identity Linking | Which identities span multiple IdPs? Are there status mismatches across providers? |
| 🏷️ Sensitive & Honeytoken Accounts | Which accounts are MDI-tagged? Are sensitive accounts properly protected? |
| 🏢 Organizational Context | Account distribution by department, service account inventory |
Primary data source: IdentityAccountInfo table (Advanced Hunting) — currently in Preview.
Enrichment tables:
- `IdentityInfo`: Adds `UserAccountControl` (PasswordNeverExpires, PasswordNotRequired), `DistinguishedName`, `RiskLevel`, `BlastRadius`, `PrivilegedEntraPimRoles` (Preview)
- `IdentityLogonEvents`: Last logon timestamps across AD, Entra, Okta, SailPoint, M365 apps
- `SigninLogs`: Last Entra ID sign-in for stale account detection (via Data Lake for 90d+ lookback)
References:
- Microsoft Docs — IdentityAccountInfo table
- Microsoft Docs — IdentityInfo table
- MDI Accounts Security Posture Assessments
- MDI Hybrid Security Posture Assessments
- Alex Verboon — AD Password Security Posture Assessment
🔴 URL Registry — Canonical Links for Report Generation
MANDATORY: When generating reports, copy URLs verbatim from this registry. NEVER construct, guess, or paraphrase a URL. If a URL is not in this registry, omit the hyperlink entirely and use plain text.
| Label | Canonical URL |
|---|---|
| `DOCS_IDENTITYACCOUNTINFO` | https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-identityaccountinfo-table |
| `DOCS_IDENTITYINFO` | https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-identityinfo-table |
| `DOCS_MDI_ACCOUNTS` | https://learn.microsoft.com/en-us/defender-for-identity/security-posture-assessments/accounts |
| `DOCS_MDI_HYBRID` | https://learn.microsoft.com/en-us/defender-for-identity/security-posture-assessments/hybrid-security |
| `DOCS_MDI_INFRA` | https://learn.microsoft.com/en-us/defender-for-identity/security-posture-assessments/identity-infrastructure |
| `GITHUB_VERBOON_PWD` | https://github.com/alexverboon/Hunting-Queries-Detection-Rules/blob/main/Defender%20For%20Identity/MDI-Identity-Password%20Security%20Posture%20Assessment.md |
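The "copy verbatim or omit the hyperlink" rule can be enforced mechanically with a lookup that never builds URLs. The sketch below is one possible approach in Python; `render_link` is a hypothetical helper, and only two registry entries are reproduced here for brevity.

```python
# URLs copied verbatim from the registry table; extend with the remaining labels as needed.
URL_REGISTRY = {
    "DOCS_IDENTITYACCOUNTINFO": "https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-identityaccountinfo-table",
    "DOCS_IDENTITYINFO": "https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-identityinfo-table",
}


def render_link(text: str, label: str) -> str:
    """Return a markdown link for a registered label, or plain text when the label is unknown."""
    url = URL_REGISTRY.get(label)
    return f"[{text}]({url})" if url else text


linked = render_link("IdentityAccountInfo table", "DOCS_IDENTITYACCOUNTINFO")
plain = render_link("Some other page", "NOT_IN_REGISTRY")
```

Because the function only reads from the dictionary, an unregistered label falls back to plain text rather than a guessed URL.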
Why Identity Posture Matters
Identity is the new perimeter. Attackers consistently target credentials, stale accounts, and over-privileged identities as the path of least resistance into enterprise environments. Key risks this skill detects:
| Risk | Impact | Skill Detection |
|---|---|---|
| Stale accounts | Dormant accounts with active permissions are prime targets for credential stuffing and lateral movement | Q5 (Stale Account Detection) |
| Deleted accounts with residual permissions | Accounts that are deleted but retain group memberships and role assignments create orphan access | Q6 (Deleted Account Hygiene) |
| Permanent privileged roles | Standing Global Admin / Security Admin roles violate least-privilege and increase blast radius | Q4 (Privileged Account Audit) |
| Password policy gaps | PasswordNeverExpires and PasswordNotRequired on AD accounts undermine credential rotation | Q7 (Password Posture) |
| Multi-provider identity sprawl | Same person with accounts across AAD + AD + Okta + CyberArk with inconsistent status/permissions | Q8 (Multi-Provider Linking) |
| High-risk active accounts | Accounts flagged High risk by Identity Protection that remain active and privileged | Q9 (Risk Distribution) |
| Unprotected sensitive accounts | MDI-tagged Sensitive/Honeytoken accounts without appropriate monitoring | Q10 (MDI Tags) |
This skill maps directly to the following MDI Security Posture Assessments (see Accounts assessments):
- Remove stale Active Directory accounts
- Entra ID privileged users also privileged in AD
- Identify service accounts in privileged groups
- Locate accounts in built-in Operator Groups
- Accounts with passwords older than 180 days
📑 TABLE OF CONTENTS
- Critical Workflow Rules — Mandatory rules
- Table Schema Reference — IdentityAccountInfo columns
- Identity Posture Score Formula — Composite risk scoring
- Execution Workflow — Phase-by-phase query plan
- Sample KQL Queries — All queries (Q1–Q12)
- Output Modes — Inline vs Markdown report
- Inline Report Template — Chat-rendered format
- Markdown File Report Template — Disk-saved format
- SVG Dashboard Generation — Visual dashboard from report
- Known Pitfalls — Schema quirks and edge cases
- Quality Checklist — Pre-delivery validation
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
- ALWAYS use `RunAdvancedHuntingQuery`: The `IdentityAccountInfo` table is an Advanced Hunting table. All queries in this skill MUST use `RunAdvancedHuntingQuery`. Exception: Q5b (stale account enrichment via SigninLogs) may use Data Lake for 90d+ lookback.
- ALWAYS deduplicate accounts with `arg_max`: The table contains multiple snapshots per account (state changes + 24h refresh). Every query that analyzes current account state MUST use `| summarize arg_max(Timestamp, *) by AccountId` to get the latest record per account.
- ASK the user for output format before generating the report:
  - Inline chat summary (quick review in chat)
  - Markdown file report (detailed, archived to `reports/identity-posture/`)
  - Both (markdown + inline summary)
- ⛔ MANDATORY: Evidence-based analysis only. Report ONLY what query results show. Use the explicit absence pattern (`✅ No [finding] detected`) when queries return 0 results. Never guess or assume.
- Dynamic fields require `parse_json()` + `tostring()`: `AssignedRoles`, `EligibleRoles`, `GroupMembership`, and `Tags` are dynamic arrays. Always use `parse_json()` for `mv-expand` and `tostring()` for string comparisons.
- Run queries in parallel batches where possible: Phase 1 queries (Q1–Q3) are independent. Phase 2 queries (Q4–Q8) are independent. Phase 3 queries (Q9–Q12) are independent.
- Time tracking: Report elapsed time after each phase.
- Table is in Preview: Some fields documented in the schema may not be populated yet (`EnrolledMfas`, `TenantMembershipType`, `AuthenticationMethod`, `CriticalityLevel`, `DefenderRiskLevel`). Handle gracefully: check for empty/null and report as "Not yet populated (Preview)" rather than "No data".
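The evidence-based analysis rule and the Preview-field rule can be sketched together as one small formatting helper. This is a minimal illustration in Python, not part of the skill: the function name `describe_result` and its parameters are hypothetical, and the result-line wording for non-empty results is an assumption.

```python
# Hypothetical helper: the name and signature are illustrative only.
PREVIEW_FIELDS = {
    "EnrolledMfas", "TenantMembershipType", "AuthenticationMethod",
    "CriticalityLevel", "DefenderRiskLevel",
}

def describe_result(label, rows, field=None):
    """Return a report line for a query result without guessing."""
    if rows:
        # Report only what the query returned.
        return f"{label}: {len(rows)} result(s)"
    if field in PREVIEW_FIELDS:
        # An empty Preview column is a data-availability gap, not a clean finding.
        return f"{label}: Not yet populated (Preview)"
    # Explicit absence pattern for queries that return 0 rows.
    return f"✅ No {label} detected"
```

For example, `describe_result("MFA enrollment", [], field="EnrolledMfas")` reports the Preview gap, while `describe_result("stale accounts", [])` uses the explicit absence pattern.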
⛔ PROHIBITED ACTIONS
| Action | Status |
|---|---|
| Querying `IdentityAccountInfo` via `mcp_sentinel-data_query_lake` | ❌ PROHIBITED: AH-only table |
| Querying without `arg_max(Timestamp, *) by AccountId` deduplication | ❌ PROHIBITED: inflates counts |
| Reporting empty Preview fields as "No data found" | ❌ PROHIBITED: report as "Not yet populated (Preview)" |
| Filtering `AssignedRoles` or `Tags` with direct string comparison without `parse_json()` | ❌ PROHIBITED: dynamic fields |
| Assuming `SourceProviderRiskLevel` or `Tags` are populated for all providers | ❌ PROHIBITED: availability varies by IdP |
Table Schema Reference
IdentityAccountInfo (Primary)
| Column | Type | Description | Population |
|---|---|---|---|
| `Timestamp` | datetime | Snapshot timestamp (state change or 24h refresh) | ✅ All |
| `AccountId` | string | Internal account identifier (unique per provider account) | ✅ All |
| `IdentityId` | string | Unified identity; links accounts across providers | ✅ All |
| `AccountUpn` | string | User principal name | ✅ All |
| `DisplayName` | string | Display name | ✅ All |
| `SourceProvider` | string | Identity provider (AzureActiveDirectory, ActiveDirectory, Okta, SailPoint, CyberArkIdentity, Ping) | ✅ All |
| `AccountStatus` | string | Status (Enabled, Disabled, Deleted, ACTIVE, STAGED, DEPROVISIONED, etc.) | ✅ All |
| `Type` | string | Account type (User, ServiceAccount) | ✅ All |
| `AssignedRoles` | dynamic | Role assignments (AAD roles, CyberArk roles, etc.) | ✅ ~60% |
| `EligibleRoles` | dynamic | PIM-eligible roles | ❌ Empty (Preview) |
| `GroupMembership` | dynamic | Group IDs | ✅ ~72% |
| `Tags` | dynamic | MDI tags (Sensitive, Honeytoken, Privileged Account) | ✅ ~1% (tagged accounts only) |
| `SourceProviderRiskLevel` | dynamic | Risk level from source provider (Low/Medium/High/None) | ✅ ~18% (AAD + AD) |
| `LastPasswordChangeTime` | datetime | Last password change | 🟡 ~1% (sparse, mostly non-AAD) |
| `CreatedDateTime` | datetime | Account creation date | ✅ ~99% |
| `Department` | string | Department name | ✅ ~60% |
| `Manager` | string | Manager name | 🟡 ~1% |
| `City` / `Country` | string | Location | 🟡 <1% |
| `Sid` | string | Security Identifier (cloud SID for AAD, on-prem SID for AD) | ✅ ~89% |
| `IsPrimary` | bool | Whether this is the primary account for the linked identity | ✅ All |
| `IdentityLinkType` | string | Linkage type (Manual, StrongId) | ✅ All |
| `EnrolledMfas` | dynamic | MFA enrollment details | ❌ Empty (Preview) |
| `TenantMembershipType` | string | Guest/Member | ❌ Empty (Preview) |
| `AuthenticationMethod` | string | Credentials/Federated/Hybrid | ❌ Empty (Preview) |
| `CriticalityLevel` | int | Criticality score | ❌ Empty (Preview) |
IdentityInfo (Enrichment — Join on IdentityId or AccountUpn)
Key columns used for enrichment:
| Column | Type | What It Adds |
|---|---|---|
| `UserAccountControl` | dynamic | AD flags: PasswordNeverExpires, PasswordNotRequired, etc. |
| `DistinguishedName` | string | AD OU path |
| `RiskLevel` | string | Entra ID risk level (Low/Medium/High) |
| `BlastRadius` | string | UEBA blast radius (Low/Medium/High); requires Sentinel UEBA |
| `PrivilegedEntraPimRoles` | dynamic | PIM role schedules (Preview; requires MDI) |
| `IsAccountEnabled` | boolean | Account enabled status |
| `RiskStatus` | string | None, AtRisk, Remediated, Dismissed, ConfirmedCompromised |
IdentityLogonEvents (Enrichment — Join on AccountUpn)
Used for stale account detection (last logon across AD, Entra, third-party IdPs).
Identity Posture Score Formula
The Identity Posture Score is a composite risk indicator summarizing the security posture of an organization's identity fabric. Higher scores indicate greater risk.
Scoring Dimensions
$$ \text{IdentityPostureScore} = \sum_{i} \text{DimensionScore}_i $$
Each of the five dimensions contributes 0–20 points, for a maximum composite score of 100:
| Dimension | Max | 🟢 Low (0–5) | 🟡 Medium (6–12) | 🔴 High (13–20) |
|---|---|---|---|---|
| Stale/Deleted Account Risk | 20 | <5% enabled accounts stale; 0 deleted with roles | 5–15% stale; <50 deleted with roles | >15% stale; >50 deleted accounts retaining roles |
| Privileged Account Exposure | 20 | <5 permanent high-priv accounts; all use PIM | 5–15 permanent high-priv; some PIM gaps | >15 permanent high-priv across multiple providers; no PIM |
| Password Posture | 20 | <10% PasswordNeverExpires; avg age <180d | 10–40% PwdNeverExpires; avg age 180–365d | >40% PwdNeverExpires; avg age >365d; PasswordNotRequired present |
| Risk Distribution | 20 | <5% accounts at High risk; all remediated/dismissed | 5–10% High risk; some unresolved | >10% High risk accounts active; unresolved AtRisk state |
| Identity Sprawl | 20 | <5% identities span >1 provider; consistent status | 5–15% multi-provider; some status mismatches | >15% multi-provider; status mismatches (enabled in one, disabled in another) |
Interpretation Scale
| Score | Rating | Action |
|---|---|---|
| 0–20 | ✅ Healthy | Normal posture, routine monitoring |
| 21–45 | 🟡 Elevated | Review — minor hygiene gaps detected |
| 46–70 | 🟠 Concerning | Investigate — multiple risk signals present |
| 71–100 | 🔴 Critical | Immediate remediation — significant identity security risk |
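The summation and the interpretation scale can be expressed in a few lines of Python. This is a sketch under stated assumptions: the function names and the dictionary keys are hypothetical, and how a dimension's exact point value is chosen within its Low/Medium/High band is left to the analyst because the tables above only define the bands.

```python
# Hypothetical names; only the 0-20 range, the sum, and the rating bands come from the tables above.
DIMENSIONS = ("Stale/Deleted", "Privileged", "Password", "Risk Distribution", "Identity Sprawl")

def posture_score(dimension_scores):
    """Sum the five per-dimension scores after checking each is within 0-20."""
    for name in DIMENSIONS:
        score = dimension_scores[name]
        if not 0 <= score <= 20:
            raise ValueError(f"{name} score {score} is outside 0-20")
    return sum(dimension_scores[name] for name in DIMENSIONS)

def rating(total):
    """Map a composite score to the interpretation scale."""
    if total <= 20:
        return "Healthy"
    if total <= 45:
        return "Elevated"
    if total <= 70:
        return "Concerning"
    return "Critical"
```

With illustrative dimension scores of 4, 15, 9, 3, and 7, the composite is 38, which falls in the Elevated band.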
Execution Workflow
Phase 0: Prerequisites
- Confirm `RunAdvancedHuntingQuery` is available (IdentityAccountInfo is AH-only)
- Ask user for output format (inline / markdown / both)
Phase 1: Inventory & Overview (Q1–Q3)
Run in parallel — no dependencies between queries.
| Query | Purpose | Table |
|---|---|---|
| Q1 | Global inventory summary (accounts, identities, providers, date range) | IdentityAccountInfo |
| Q2 | Account status distribution by provider | IdentityAccountInfo |
| Q3 | Account type and department distribution | IdentityAccountInfo |
Phase 2: Security Risk Analysis (Q4–Q8)
Run in parallel — no dependencies between queries.
| Query | Purpose | Tables |
|---|---|---|
| Q4 | Privileged account audit — high-value roles across providers | IdentityAccountInfo |
| Q5 | Stale account detection — enabled with no logon in 90d | IdentityAccountInfo + IdentityLogonEvents |
| Q6 | Deleted account hygiene — deleted accounts retaining permissions | IdentityAccountInfo |
| Q7 | Password posture — age distribution + AD policy flags | IdentityAccountInfo + IdentityInfo |
| Q7c | Built-in & infrastructure account password audit | IdentityAccountInfo + IdentityInfo |
| Q8 | Multi-provider identity linking — cross-IdP sprawl and mismatches | IdentityAccountInfo |
Phase 3: Risk & Governance (Q9–Q12)
Run in parallel — no dependencies between queries.
| Query | Purpose | Tables |
|---|---|---|
| Q9 | Risk level distribution | IdentityAccountInfo |
| Q10 | MDI tags analysis (Sensitive, Honeytoken) | IdentityAccountInfo |
| Q11 | Service account inventory | IdentityAccountInfo |
| Q12 | Account creation trend | IdentityAccountInfo |
Phase 4: Score Computation & Report Generation
- Compute per-dimension scores from Phase 1–3 data
- Sum dimension scores for composite Identity Posture Score
- Generate report in requested output mode
- Report total elapsed time
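The phase structure above (independent queries inside a phase, phases run in order, elapsed time reported per phase) could be orchestrated roughly as follows. This is only a sketch: `run_query` is a hypothetical caller-supplied coroutine that takes a query ID and returns rows, and is not an API of the skill or of Advanced Hunting.

```python
import asyncio
import time

# Query IDs per phase, as listed in the workflow tables.
PHASES = {
    "Phase 1": ["Q1", "Q2", "Q3"],
    "Phase 2": ["Q4", "Q5", "Q6", "Q7", "Q7c", "Q8"],
    "Phase 3": ["Q9", "Q10", "Q11", "Q12"],
}

async def run_phases(run_query):
    """Run each phase's queries concurrently and record elapsed seconds per phase."""
    results, timings = {}, {}
    for phase, query_ids in PHASES.items():
        start = time.perf_counter()
        # Queries inside a phase have no dependencies, so they can run together.
        rows = await asyncio.gather(*(run_query(q) for q in query_ids))
        results.update(zip(query_ids, rows))
        timings[phase] = time.perf_counter() - start
    return results, timings
```

Phases still run sequentially, so Phase 4 scoring can start only once all three timing entries exist.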
Sample KQL Queries
All queries below are verified against the IdentityAccountInfo table schema (2026-03-24). Use them exactly as written, substituting only where noted.
Query 1: Global Inventory Summary
IdentityAccountInfo
| summarize
TotalRows = count(),
UniqueAccounts = dcount(AccountId),
UniqueIdentities = dcount(IdentityId),
UniqueUPNs = dcount(AccountUpn),
MinTimestamp = min(Timestamp),
MaxTimestamp = max(Timestamp),
SourceProviders = make_set(SourceProvider),
AccountTypes = make_set(Type),
AccountStatuses = make_set(AccountStatus)
Query 2: Account Status Distribution by Provider
IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| summarize Count = count() by SourceProvider, AccountStatus, Type
| order by Count desc
Query 3: Department Distribution
IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where isnotempty(Department)
| summarize Count = dcount(AccountId) by Department
| order by Count desc
| take 20
Query 4: Privileged Account Audit
🔴 Security-critical query — identifies accounts with high-privilege roles across all identity providers.
let highPrivRoles = dynamic([
"Global Administrator", "Security Administrator", "Exchange Administrator",
"SharePoint Administrator", "Application Administrator",
"Cloud App Security Administrator", "Privileged Role Administrator",
"Intune Administrator", "Compliance Administrator",
"Privileged Authentication Administrator", "User Administrator",
"Azure AD Joined Device Local Administrator",
"SYSTEM_ADMINISTRATOR", "PRIVILEGE_CLOUD_ADMINISTRATORS",
"PRIVILEGE_CLOUD_ADMINISTRATORS_LITE",
"TDR_ADMINISTRATOR", "RISK_MANAGEMENT_ADMIN"
]);
IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where AccountStatus in ("Enabled", "ACTIVE")
| where isnotempty(tostring(AssignedRoles)) and tostring(AssignedRoles) != "[]"
| mv-expand Role = parse_json(AssignedRoles)
| extend RoleName = tostring(Role)
| where RoleName in (highPrivRoles)
| summarize
HighPrivRoles = make_set(RoleName),
RoleCount = dcount(RoleName)
by AccountUpn, DisplayName, SourceProvider, AccountStatus
| order by RoleCount desc
Post-processing:
- Flag accounts with >2 high-privilege roles as excessive
- Cross-reference with Q8 (multi-provider) — accounts with high-priv roles in both AAD and CyberArk/AD represent dual-privilege risk
- Check if roles are permanent (currently `EligibleRoles` is empty in Preview, so all discovered roles appear permanent)
- Reference MDI Assessment: Entra ID privileged users also privileged in AD
- Pagination check: If Q4 returns exactly 10,000 rows (AH limit), re-run with `| take 500` on the final output and note "Results may be truncated" in the report
- Global Administrator callout: After the high-priv table, always add a dedicated GA callout listing all accounts with the Global Administrator role. GA is the highest-risk role and should be immediately scannable
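The Q4 post-processing steps can be sketched in Python as below. The function name `review_privileged` and the assumption that rows arrive as dictionaries keyed by the Q4 output columns (`AccountUpn`, `HighPrivRoles`, `RoleCount`) are illustrative, not part of the skill.

```python
def review_privileged(rows):
    """Apply the Q4 post-processing checks to a list of result rows (dicts)."""
    # More than 2 high-privilege roles on one account is flagged as excessive.
    excessive = [r["AccountUpn"] for r in rows if r["RoleCount"] > 2]
    # Dedicated Global Administrator callout list.
    global_admins = sorted(
        r["AccountUpn"] for r in rows if "Global Administrator" in r["HighPrivRoles"]
    )
    # Exactly 10,000 rows suggests the Advanced Hunting row limit was hit.
    truncated = len(rows) == 10_000
    return {"excessive": excessive, "global_admins": global_admins, "truncated": truncated}
```

The dual-privilege cross-reference with Q8 is not covered here because it needs the multi-provider results as a second input.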
Query 4b: Full Role Distribution
IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where isnotempty(tostring(AssignedRoles)) and tostring(AssignedRoles) != "[]"
| mv-expand Role = parse_json(AssignedRoles)
| summarize AccountCount = dcount(AccountId) by tostring(Role)
| order by AccountCount desc
| take 25
Query 5: Stale Account Detection
🔴 Security-critical query — identifies enabled accounts with no logon activity in 90 days.
let lastLogon = IdentityLogonEvents
| where Timestamp > ago(90d)
| summarize LastLogon = max(Timestamp) by AccountUpn;
IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where AccountStatus in ("Enabled", "ACTIVE")
| join kind=leftouter (lastLogon) on AccountUpn
| where isnull(LastLogon) or LastLogon < ago(90d)
| summarize
StaleEnabledAccounts = count(),
WithRoles = countif(isnotempty(tostring(AssignedRoles)) and tostring(AssignedRoles) != "[]"),
WithGroups = countif(isnotempty(tostring(GroupMembership)) and tostring(GroupMembership) != "[]"),
Providers = make_set(SourceProvider)
by Type
| order by StaleEnabledAccounts desc
Post-processing:
- Stale accounts with active roles = highest priority for deprovisioning
- Reference MDI Assessment: Remove stale Active Directory accounts
- Note: IdentityLogonEvents has 30d retention in AH, so the `ago(90d)` filter in Q5 effectively covers only the last 30 days. For accurate 90d stale detection, use SigninLogs via Data Lake. The 30d window still catches accounts with zero recent activity
Query 5b: Stale Account Provider Breakdown
let lastLogon = IdentityLogonEvents
| where Timestamp > ago(30d)
| summarize LastLogon = max(Timestamp) by AccountUpn;
IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where AccountStatus in ("Enabled", "ACTIVE")
| join kind=leftouter (lastLogon) on AccountUpn
| where isnull(LastLogon)
| summarize StaleCount = count() by SourceProvider
| order by StaleCount desc
Query 6: Deleted Account Hygiene
🟠 Governance query — identifies deleted accounts that still retain role assignments and group memberships.
IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where AccountStatus == "Deleted"
| extend HasRoles = isnotempty(tostring(AssignedRoles)) and tostring(AssignedRoles) != "[]"
| extend HasGroups = isnotempty(tostring(GroupMembership)) and tostring(GroupMembership) != "[]"
| summarize
TotalDeleted = count(),
DeletedWithRoles = countif(HasRoles),
DeletedWithGroups = countif(HasGroups),
DeletedWithBoth = countif(HasRoles and HasGroups),
Providers = make_set(SourceProvider)
Post-processing:
- Deleted accounts with roles = orphan permission risk
- Note: in some providers, "Deleted" status may lag actual deletion. Cross-reference with `DeletedDateTime` if populated
- Large numbers indicate lifecycle management gaps
Query 7: Password Posture (IdentityAccountInfo + IdentityInfo Join)
🟠 Security query — combines password age from IdentityAccountInfo with AD policy flags from IdentityInfo. Adapted from Alex Verboon's MDI Password Security Posture Assessment with critical fixes for join direction, null UAC handling, and epoch date filtering.
Key design decisions:
- IdentityAccountInfo as primary (left) table: using IdentityInfo as primary inflates row counts because IdentityInfo has multiple snapshots per identity. IdentityAccountInfo deduplicated by `IdentityId` gives the true enabled-account baseline.
- Join on `IdentityId` (not `AccountUpn`): `IdentityId` is the stable cross-table key. UPN-based joins can produce 1:many inflation when multiple IdentityInfo records share a UPN.
- `isnotnull(UserAccountControl)` guard on IdentityInfo: see Pitfall #8 below. Without this, `array_index_of(null, "value")` returns `null`, and `null != -1` evaluates to `true` in KQL, making ALL null-UAC accounts appear to have PasswordNeverExpires.
- `datetime(2000-01-01)` date guard: some records contain placeholder dates (e.g., `0001-01-01`) producing 700,000+ day password ages.
let accountinfo = IdentityAccountInfo
| where Timestamp > ago(30d)
| summarize arg_max(Timestamp, *) by IdentityId
| where AccountStatus !in ("Disabled", "Deleted", "DEPROVISIONED", "SUSPENDED")
| where Type != "ServiceAccount"
| extend DaysSinceLastPasswordChange =
iff(isnull(LastPasswordChangeTime) or LastPasswordChangeTime < datetime(2000-01-01), int(null),
datetime_diff('day', now(), LastPasswordChangeTime))
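// Guard null Tags for the same reason as the UAC guard: array_index_of(null, ...) returns null and null != -1 is true
| extend Sensitive = isnotnull(Tags) and array_index_of(Tags, "Sensitive") != -1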
| project IdentityId, AccountUpn, AccountStatus, SourceProvider,
LastPasswordChangeTime, DaysSinceLastPasswordChange, Sensitive;
let IdInfo = IdentityInfo
| where Timestamp > ago(30d)
| summarize arg_max(Timestamp, *) by IdentityId
| where isnotnull(UserAccountControl)
| extend PasswordNeverExpires = array_index_of(UserAccountControl, "PasswordNeverExpires") != -1,
PasswordNotRequired = array_index_of(UserAccountControl, "PasswordNotRequired") != -1
| project IdentityId, PasswordNeverExpires, PasswordNotRequired;
accountinfo
| join kind=leftouter (IdInfo) on IdentityId
| summarize
TotalEnabled = count(),
WithPasswordData = countif(isnotnull(DaysSinceLastPasswordChange)),
AvgPasswordAgeDays = avgif(DaysSinceLastPasswordChange, isnotnull(DaysSinceLastPasswordChange)),
MaxPasswordAgeDays = maxif(DaysSinceLastPasswordChange, isnotnull(DaysSinceLastPasswordChange)),
PwdOver365d = countif(DaysSinceLastPasswordChange > 365),
WithUACData = countif(isnotnull(PasswordNeverExpires)),
PwdNeverExpires = countif(PasswordNeverExpires == true),
PwdNotRequired = countif(PasswordNotRequired == true),
SensitiveAccounts = countif(Sensitive)
Post-processing:
- `WithUACData` shows how many accounts had AD UAC flags to check; only on-prem AD accounts monitored by MDI will have this data
- `PwdNeverExpires` and `PwdNotRequired` are now accurate counts (not directional) thanks to the `isnotnull(UserAccountControl)` guard
- Report password data coverage: `WithPasswordData / TotalEnabled`. If < 5%, use condensed template
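The coverage check that selects between the full and condensed Password Posture templates is a single ratio. A minimal Python sketch, assuming the two Q7 output values are passed in as integers (the function name `password_template` is hypothetical):

```python
def password_template(with_password_data, total_enabled):
    """Return (template, coverage) where coverage = WithPasswordData / TotalEnabled."""
    if total_enabled == 0:
        # No enabled accounts means no coverage to report.
        return "condensed", 0.0
    coverage = with_password_data / total_enabled
    # Below 5% coverage, the condensed template is used.
    return ("condensed" if coverage < 0.05 else "full"), coverage
```

For instance, 40 accounts with password data out of 1,000 enabled accounts gives 4% coverage and selects the condensed template.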
Query 7b: Password Age Distribution Buckets (with PwdNeverExpires Cross-Reference)
let accountinfo = IdentityAccountInfo
| where Timestamp > ago(30d)
| summarize arg_max(Timestamp, *) by IdentityId
| where isnotnull(LastPasswordChangeTime)
| where LastPasswordChangeTime > datetime(2000-01-01)
| where AccountStatus !in ("Disabled", "Deleted", "DEPROVISIONED", "SUSPENDED")
| where Type != "ServiceAccount"
| extend DaysSinceLastPasswordChange = datetime_diff('day', now(), LastPasswordChangeTime)
| project IdentityId, DaysSinceLastPasswordChange;
let IdInfo = IdentityInfo
| where Timestamp > ago(30d)
| summarize arg_max(Timestamp, *) by IdentityId
| where isnotnull(UserAccountControl)
| extend PasswordNeverExpires = array_index_of(UserAccountControl, "PasswordNeverExpires") != -1
| project IdentityId, PasswordNeverExpires;
accountinfo
| join kind=leftouter (IdInfo) on IdentityId
| extend PasswordAgeBucket = case(
DaysSinceLastPasswordChange <= 30, "0-30 days",
DaysSinceLastPasswordChange <= 90, "31-90 days",
DaysSinceLastPasswordChange <= 180, "91-180 days",
DaysSinceLastPasswordChange <= 365, "181-365 days",
"365+ days")
| summarize Accounts = count(), PwdNeverExpires = countif(PasswordNeverExpires == true) by PasswordAgeBucket
| order by Accounts desc
Post-processing:
- The `PwdNeverExpires` column per bucket reveals the root cause of stale passwords: if most 365+ day accounts have PwdNeverExpires, the issue is AD password policy, not user neglect
- Highlight correlation: "X of Y accounts with passwords >365 days old have PasswordNeverExpires set"
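The correlation highlight can be built directly from the Q7b rows. The sketch below assumes each row is a dictionary with the Q7b output columns (`PasswordAgeBucket`, `Accounts`, `PwdNeverExpires`); the function name is hypothetical.

```python
def never_expires_callout(buckets):
    """Build the 'X of Y' highlight from the 365+ days bucket, or None if absent."""
    for row in buckets:
        if row["PasswordAgeBucket"] == "365+ days":
            return (
                f'{row["PwdNeverExpires"]} of {row["Accounts"]} accounts with passwords '
                ">365 days old have PasswordNeverExpires set"
            )
    # No 365+ bucket in the results: nothing to highlight.
    return None
```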
Query 7c: Built-In & Infrastructure Account Password Check
🔴 Security query: audits password posture of built-in and infrastructure accounts (krbtgt, Administrator, Guest, MSOL_*, AAD_*, ADSync*). These accounts are high-value targets; krbtgt password age directly affects Golden Ticket attack risk.
let accountinfo = IdentityAccountInfo
| where Timestamp > ago(30d)
| summarize arg_max(Timestamp, *) by IdentityId
| extend DaysSinceLastPasswordChange =
iff(isnull(LastPasswordChangeTime) or LastPasswordChangeTime < datetime(2000-01-01), int(null),
datetime_diff('day', now(), LastPasswordChangeTime))
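// Guard null Tags: array_index_of(null, ...) returns null and null != -1 is true
| extend Sensitive = isnotnull(Tags) and array_index_of(Tags, "Sensitive") != -1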
| project IdentityId, AccountUpn, AccountStatus, SourceProvider,
LastPasswordChangeTime, DaysSinceLastPasswordChange, Sensitive;
let IdInfo = IdentityInfo
| where Timestamp > ago(30d)
| summarize arg_max(Timestamp, *) by IdentityId
| where isnotempty(AccountName)
| extend PasswordNeverExpires = iff(isnotnull(UserAccountControl), array_index_of(UserAccountControl, "PasswordNeverExpires") != -1, bool(null)),
PasswordNotRequired = iff(isnotnull(UserAccountControl), array_index_of(UserAccountControl, "PasswordNotRequired") != -1, bool(null))
| extend OUPath = extract(@"CN=[^,]+,(.*)", 1, DistinguishedName)
| project IdentityId, AccountName, AccountDomain, AccountDisplayName,
PasswordNeverExpires, PasswordNotRequired, OUPath;
IdInfo
| join kind=leftouter (accountinfo) on IdentityId
| where tolower(AccountName) in ("krbtgt", "administrator", "guest", "admin")
or tolower(AccountName) startswith "msol_"
or tolower(AccountName) startswith "aad_"
or tolower(AccountName) startswith "adsync"
| project AccountName, AccountDomain, AccountDisplayName, AccountStatus,
SourceProvider, LastPasswordChangeTime, DaysSinceLastPasswordChange,
PasswordNeverExpires, PasswordNotRequired, Sensitive, OUPath
| order by DaysSinceLastPasswordChange desc
Post-processing:
- krbtgt: Microsoft recommends rotation every 180 days. Flag any krbtgt account with password >180d as 🔴 High Risk (Golden Ticket attack window). >365d is critical
- MSOL_/AAD_/ADSync: Azure AD Connect service accounts. If `AccountStatus == "Enabled"` but the sync is decommissioned, flag as 🟠 stale privileged account. PwdNeverExpires is common but should be monitored
- Guest: PwdNotRequired is standard Windows behavior for Guest accounts. Flag only if Guest is Enabled (should always be Disabled)
- Administrator: Check if renamed (may not appear). Flag if password >365d
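The Q7c flagging rules can be sketched as a small classifier. This is an illustration only: the function name and the returned labels (`"critical"`, `"high"`, `"review"`, `"flag"`) are hypothetical. Because the query cannot tell whether Azure AD Connect is decommissioned, enabled sync accounts are marked for human review rather than flagged outright.

```python
def flag_builtin(account_name, status, password_age_days):
    """Classify one Q7c row; returns a label or None when nothing needs flagging."""
    name = account_name.lower()
    enabled = status == "Enabled"
    age = password_age_days  # None when LastPasswordChangeTime is missing
    if name == "krbtgt" and age is not None:
        if age > 365:
            return "critical"
        if age > 180:
            return "high"  # beyond the 180-day rotation guidance
    elif name.startswith(("msol_", "aad_", "adsync")) and enabled:
        return "review"  # stale only if sync is decommissioned; needs confirmation
    elif name == "guest" and enabled:
        return "flag"  # Guest should always be Disabled
    elif name == "administrator" and age is not None and age > 365:
        return "flag"
    return None
```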
Query 8: Multi-Provider Identity Linking
🟡 Governance query — identifies identities that span multiple identity providers, including status mismatches.
IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| summarize
Providers = make_set(SourceProvider),
ProviderCount = dcount(SourceProvider),
Statuses = make_set(AccountStatus),
StatusCount = dcount(AccountStatus),
UPNs = make_set(AccountUpn),
RolesSummary = make_set(tostring(AssignedRoles))
by IdentityId
| where ProviderCount > 1
| extend HasStatusMismatch = StatusCount > 1
| summarize
MultiProviderIdentities = count(),
WithStatusMismatch = countif(HasStatusMismatch),
MaxProviders = max(ProviderCount),
ProviderCombos = make_set(strcat_array(Providers, " + "))
Query 8b: Multi-Provider Identity Detail (Top 15)
IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| summarize
Providers = make_set(SourceProvider),
ProviderCount = dcount(SourceProvider),
Statuses = make_set(AccountStatus),
UPNs = make_set(AccountUpn),
Roles = make_set(tostring(AssignedRoles))
by IdentityId, DisplayName
| where ProviderCount > 1
| order by ProviderCount desc
| take 15
Query 9: Risk Level Distribution
IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where isnotempty(tostring(SourceProviderRiskLevel))
| summarize
Count = dcount(AccountId),
EnabledCount = dcountif(AccountId, AccountStatus in ("Enabled", "ACTIVE")),
WithHighPrivRoles = dcountif(AccountId, isnotempty(tostring(AssignedRoles)) and tostring(AssignedRoles) != "[]")
by tostring(SourceProviderRiskLevel), SourceProvider
| order by Count desc
Post-processing:
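- High-risk accounts that are Enabled + have high-priv roles = critical finding. Note that Q9's `WithHighPrivRoles` column counts accounts with any assigned role, so confirm high-privilege membership against Q4
- Cross-reference with IdentityInfo `RiskStatus` for Entra accounts to check if risk has been remediated/dismissed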
Query 10: MDI Tags Analysis
🏷️ Governance query — analyzes Defender for Identity tags (Sensitive, Honeytoken, custom tags).
IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where isnotempty(tostring(Tags)) and tostring(Tags) != "[]"
| mv-expand Tag = parse_json(Tags)
| extend TagName = tostring(Tag)
| summarize
AccountCount = dcount(AccountId),
Accounts = make_set(AccountUpn, 10)
by TagName, SourceProvider
| order by AccountCount desc
Post-processing:
- Sensitive-tagged accounts should be cross-referenced with Q4 (privileged) and Q9 (risk) for comprehensive posture view
- Honeytoken accounts — verify monitoring is active (any logon should generate an alert)
Query 11: Service Account Inventory
IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where Type == "ServiceAccount"
| summarize
Count = count(),
Providers = make_set(SourceProvider),
Statuses = make_set(AccountStatus),
EnabledCount = countif(AccountStatus in ("Enabled", "ACTIVE")),
WithRoles = countif(isnotempty(tostring(AssignedRoles)) and tostring(AssignedRoles) != "[]")
Query 12: Account Creation Trend
📈 Trend query — shows account creation velocity over time.
IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where isnotempty(CreatedDateTime)
| summarize AccountsCreated = count() by bin(CreatedDateTime, 7d), SourceProvider
| order by CreatedDateTime asc
Output Modes
Mode 1: Inline Chat Summary
Render the full analysis directly in the chat response. Best for quick review.
Mode 2: Markdown File Report
Save a comprehensive report to disk at:
reports/identity-posture/Identity_Posture_Report_{tenant}_YYYYMMDD_HHMMSS.md
Where {tenant} is a short identifier for the tenant (e.g., contoso, zava). Derive from the tenant domain in config.json or ask the user. If unknown, omit the tenant tag.
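A report path following this naming pattern could be built as shown below. This is a sketch: the function name `report_path` is hypothetical, and using UTC for the filename timestamp is an assumption (the source only specifies UTC for the report's Generated line).

```python
from datetime import datetime, timezone

def report_path(tenant=None, now=None):
    """Build the report path; the tenant tag is omitted when the tenant is unknown."""
    now = now or datetime.now(timezone.utc)  # assumption: UTC timestamps
    stamp = now.strftime("%Y%m%d_%H%M%S")
    tag = f"{tenant}_" if tenant else ""
    return f"reports/identity-posture/Identity_Posture_Report_{tag}{stamp}.md"
```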
Mode 3: Both
Generate the markdown file AND provide an inline summary in chat.
Always ask the user which mode before generating output.
Inline Report Template
Render the following sections in order. Omit sections only if explicitly noted as conditional.
🔴 URL Rule: All hyperlinks in the report MUST be copied verbatim from the URL Registry above. Do NOT generate, recall from memory, or paraphrase any URL. If a needed URL is not in the registry, use plain text (no hyperlink).
# 🔐 Identity Security Posture Report
**Generated:** YYYY-MM-DD HH:MM UTC
**Data Source:** IdentityAccountInfo (Advanced Hunting — Preview)
**Analysis Period:** <EarliestRecord> → <LatestRecord>
**Identity Providers:** <comma-separated provider list>
---
## Executive Summary
<2-3 sentences: total accounts/identities, key risk findings, overall score>
**Overall Risk Rating:** 🔴/🟠/🟡/✅ <RATING> (<Score>/100)
---
## Key Metrics
| Metric | Value |
|--------|-------|
| Total Accounts (deduplicated) | <N> |
| Unique Identities | <N> |
| Identity Providers | <N> (<list>) |
| Enabled Accounts | <N> |
| Disabled Accounts | <N> |
| Deleted Accounts | <N> |
| Service Accounts | <N> |
| Accounts with High-Privilege Roles | <N> |
| Stale Accounts (no logon 30d*) | <N> |
| Multi-Provider Identities | <N> |
| MDI Sensitive-Tagged Accounts | <N> |
> \* IdentityLogonEvents has 30-day retention in Advanced Hunting. The true 90-day stale count is likely lower. See Stale & Deleted Account Hygiene section for details.
---
## 🔍 Identity Inventory
### Accounts by Provider
| Provider | Accounts | Enabled | Disabled | Deleted | Other | Service Accounts |
|----------|----------|---------|----------|---------|-------|------------------|
| <provider> | <N> | <N> | <N> | <N> | <N> | <N> |
| **Total** | **<N>** | **<N>** | **<N>** | **<N>** | **<N>** | **<N>** |
> **Account count note:** The provider breakdown may sum to slightly more than the deduplicated "Total Accounts" in Key Metrics because `arg_max(Timestamp, *) by AccountId` resolves each account to a single snapshot, while a small number of AccountIds may share provider rows. Always use the deduplicated count from Q1 as the authoritative total.
### Account Status Vocabulary by Provider
| Status | Meaning | Providers |
|--------|---------|----------|
| Enabled / ACTIVE | Active account | AAD, AD, SailPoint, CyberArk, Okta, Ping |
| Disabled | Administratively disabled | AAD, AD |
| Deleted | Soft-deleted (AAD recycle bin) | AAD |
| NONE | No status (SailPoint) | SailPoint |
| INACTIVE | Deactivated | SailPoint |
| STAGED | Provisioned but not activated | Okta |
| DEPROVISIONED | Fully deactivated | Okta |
| PROVISIONED | Created but pending activation | Okta |
| INVITED | Pending acceptance | CyberArk |
| CREATED | Newly created | CyberArk |
| SUSPENDED | Temporarily suspended | CyberArk |
> Include this table in every report. Values are discovered dynamically from Q2 output — add any new statuses observed.
### Department Distribution (Top 15)
| Department | Accounts |
|------------|----------|
| <dept> | <N> |
> **Department aggregation rule:** When case-inconsistent values exist (e.g., "Internal" vs "internal"), collapse them into a single row with combined count and note the inconsistency: `> ⚠️ Department values have case inconsistency: "Internal" (N) and "internal" (N) appear as separate values. Recommend standardizing.`
---
## 👑 Privileged Account Audit
### High-Privilege Role Holders
| Account | Provider | Roles | Status |
|---------|----------|-------|--------|
| <upn> | <provider> | <role list> | <status> |
> 🔴 **Global Administrators (<N>):** <comma-separated list of GA account UPNs> — Best practice: max 2 permanent GA accounts (break glass only). Convert user-facing GA accounts to PIM-eligible.
### Role Distribution (Top 15)
| Role | Account Count |
|------|---------------|
| <role> | <N> |
**Assessment:**
- <emoji> <evidence-based finding about privilege distribution>
- <emoji> <PIM/permanent role finding>
- <emoji> <cross-provider privilege finding>
---
## 🗑️ Stale & Deleted Account Hygiene
### Stale Accounts (Enabled, No Logon in 30d)
| Metric | Value |
|--------|-------|
| Total Stale Enabled | <N> |
| Stale with Active Roles | <N> |
| Stale with Group Memberships | <N> |
| Stale by Provider | <breakdown> |
> ⚠️ **Important caveat:** IdentityLogonEvents has **30-day retention** in Advanced Hunting. Accounts that last logged in 31–90 days ago appear "stale" in this analysis. The true 90-day stale count is likely lower. For accurate 90-day stale detection, cross-reference with SigninLogs via Data Lake (90d+ retention).
### Deleted Accounts with Residual Permissions
| Metric | Value |
|--------|-------|
| Total Deleted | <N> |
| Deleted with Roles | <N> |
| Deleted with Groups | <N> |
| Deleted with Both | <N> |
**Assessment:**
- <emoji> <evidence-based finding about stale account risk>
- <emoji> <deleted account orphan risk finding>
---
## 🔑 Password Posture
<If LastPasswordChangeTime coverage ≥ 5% of enabled accounts — render full section:>
| Metric | Value |
|--------|-------|
| Accounts with Password Data | <WithPasswordData>/<TotalEnabled> (<pct>%) |
| Accounts with UAC Data | <WithUACData> |
| PasswordNeverExpires | <N> of <WithUACData> with UAC data |
| PasswordNotRequired | <N> of <WithUACData> with UAC data |
| Sensitive Accounts | <N> |
| Avg Password Age (days) | <N> |
| Max Password Age (days) | <N> |
| Passwords > 365 days | <PwdOver365d> |
### Password Age Distribution
| Bucket | Accounts | PwdNeverExpires | % |
|--------|----------|-----------------|---|
| 0-30 days | <N> | <N> | <pct>% |
| 31-90 days | <N> | <N> | <pct>% |
| 91-180 days | <N> | <N> | <pct>% |
| 181-365 days | <N> | <N> | <pct>% |
| 365+ days | <N> | <N> | <pct>% |
<Highlight if PwdNeverExpires correlates with 365+ bucket:>
> 🔴 **X of Y accounts with passwords >365 days old have PasswordNeverExpires set** — these passwords will never rotate without manual intervention.
<If LastPasswordChangeTime coverage < 5% of enabled accounts — render condensed format instead:>
⚠️ **Limited data availability:** `LastPasswordChangeTime` populated for <N>/<TotalEnabled> enabled accounts (<pct>%).
Among accounts with data: <N> have passwords >365d old, <N> changed within 30d.
For comprehensive assessment, use Graph API (`/users?$select=passwordPolicies,lastPasswordChangeDateTime`).
### AD Password Policy Flags (via IdentityInfo UAC enrichment)
| Flag | Accounts | Scope |
|------|----------|-------|
| PasswordNeverExpires | <N> | <WithUACData> accounts with UAC data (on-prem AD with MDI only) |
| PasswordNotRequired | <N> | <WithUACData> accounts with UAC data |
> **Data quality note:** UAC flags are only available for on-prem AD accounts monitored by MDI (~<WithUACData>/<TotalEnabled> accounts in this environment). The `isnotnull(UserAccountControl)` filter ensures accurate counts — no inflation from null-UAC accounts.
### Built-In & Infrastructure Account Password Audit
<Render from Q7c results. Always include this section — built-in accounts exist in every AD environment.>
| Account | Domain | Status | Password Age | PwdNeverExpires | PwdNotRequired | Sensitive |
|---------|--------|--------|-------------|----------------|----------------|----------|
| <AccountName> | <AccountDomain> | <Status> | <DaysSinceLastPasswordChange>d | <Yes/No> | <Yes/No> | <Yes/No> |
<Flag critical findings:>
- 🔴 **krbtgt** accounts with password >180 days — Golden Ticket attack window (Microsoft recommends 180-day rotation)
- 🟠 **MSOL_/AAD_/ADSync** accounts still Enabled with PwdNeverExpires — review if Azure AD Connect is still in use
- 🟡 **Guest** accounts with PwdNotRequired — standard Windows behavior, flag only if Enabled
---
## 🟠 Risk Distribution
| Risk Level | Provider | Total | Enabled | With High-Priv Roles |
|------------|----------|-------|---------|----------------------|
| 🔴 High | <provider> | <N> | <N> | <N> |
| 🟠 Medium | <provider> | <N> | <N> | <N> |
| 🟡 Low | <provider> | <N> | <N> | <N> |
| ⚪ None | <provider> | <N> | <N> | <N> |
**Assessment:**
- <emoji> <evidence-based finding about active high-risk accounts>
---
## 🔗 Multi-Provider Identity Linking
| Metric | Value |
|--------|-------|
| Identities Spanning Multiple Providers | <N> |
| Max Providers per Identity | <N> |
| Identities with Status Mismatches | <N> |
| Provider Combinations | <list> |
<If status mismatches found:>
⚠️ **Status Mismatches Detected:** <N> identities have inconsistent status across providers (e.g., Enabled in AAD but DEPROVISIONED in Okta). This indicates lifecycle management gaps.
<Top 5 multi-provider identities table>
---
## 🏷️ Sensitive & Honeytoken Accounts
| Tag | Count | Provider | Sample Accounts |
|-----|-------|----------|----------------|
| <tag> | <N> | <provider> | <upn list> |
**Assessment:**
- <emoji> <honeytoken monitoring confirmation>
- <emoji> <sensitive account protection finding>
---
## Identity Posture Score Card
```
┌─────────────────────────────────────────────────────────────┐
│ IDENTITY POSTURE SCORE: <NN>/100 │
│ Rating: <EMOJI> <RATING> │
├─────────────────────────────────────────────────────────────┤
│ Stale/Deleted [<bar>] <N>/20 (<short detail>) │
│ Privileged [<bar>] <N>/20 (<short detail>) │
│ Password [<bar>] <N>/20 (<short detail>) │
│ Risk Distrib. [<bar>] <N>/20 (<short detail>) │
│ Identity Sprawl[<bar>] <N>/20 (<short detail>) │
└─────────────────────────────────────────────────────────────┘
```
> **Score card detail rule:** Keep `(<short detail>)` to ~30 characters max so text fits within the box. Use abbreviated phrasing, e.g., `885 deleted w/roles; high stale %` not `885 deleted accounts with active role assignments`.
---
## Security Assessment
| Factor | Finding |
|--------|---------|
| <emoji> **<Factor>** | <Evidence-based finding> |
---
## Recommendations
1. <emoji> **<Priority action>** — <evidence and rationale>
2. ...
---
## Next Steps
<1-2 sentences anchoring the immediate follow-up action based on the highest-priority recommendation. Reference the specific recommendation number.>
Example:
> Begin with Recommendation #1 (High-Risk account remediation) by exporting the 560 affected accounts to the security operations team. Schedule a follow-up identity posture review after remediation to verify score improvement.
---
## Appendix: Query Execution Summary
| Query | Description | Records | Time |
|-------|-------------|---------|------|
| Q1 | Global Inventory | <N> | <time> |
| Q2 | Status by Provider | <N> | <time> |
| ... | ... | ... | ... |
Markdown File Report Template
When outputting to markdown file, use the same structure as the Inline Report Template above, saved to:
reports/identity-posture/Identity_Posture_Report_{tenant}_YYYYMMDD_HHMMSS.md
Where {tenant} matches the Mode 2 filename convention above.
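As a minimal sketch, the file path could be assembled as follows. The helper name `report_path` is hypothetical, and the `tenant` argument is assumed to already follow the Mode 2 filename convention.

```python
from datetime import datetime


def report_path(tenant: str, now: datetime) -> str:
    """Build the markdown report path using the YYYYMMDD_HHMMSS stamp."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return f"reports/identity-posture/Identity_Posture_Report_{tenant}_{stamp}.md"
```

For example, `report_path("contoso", datetime(2026, 1, 24, 13, 5, 9))` gives `reports/identity-posture/Identity_Posture_Report_contoso_20260124_130509.md`.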
Include the following additional sections in the file report that are omitted from inline:
- Full privileged account detail table (all high-priv accounts, not just top N)
- Complete multi-provider identity listing (all multi-IdP identities with UPN mapping)
- Per-provider account detail (full status/type breakdown per provider)
- Stale account detail (top stale accounts with last logon dates)
- Preview field coverage summary (which documented fields are/aren't populated)
File Report Header
# Identity Security Posture Report
**Generated:** YYYY-MM-DD HH:MM UTC
**Data Source:** IdentityAccountInfo (Advanced Hunting — Preview)
**Enrichment:** IdentityInfo, IdentityLogonEvents
**Analysis Period:** <EarliestRecord> → <LatestRecord> (<N> days)
**Identity Providers:** <N> (<list with account counts>)
**Total Accounts:** <N> (Enabled/Active: ~<N> | Disabled: ~<N> | Deleted: <N> | Other: ~<N>)
**Unique Identities:** <N>
---
Account count convention: Use the deduplicated count from Q1 (dcount(AccountId)) as the authoritative "Total Accounts". Provider breakdowns from Q2 may sum slightly higher due to snapshot resolution. Present status sub-counts with a ~ prefix when derived from Q2 provider rows to signal that they are approximate breakdowns.
SVG Dashboard Generation
📊 Optional post-report step. After an Identity Security Posture report is generated, the user can request a visual SVG dashboard.
Trigger phrases: "generate SVG dashboard", "create a visual dashboard", "visualize this report", "SVG from the report"
How to Request a Dashboard
- Same chat: "Generate an SVG dashboard from the report" — data is already in context.
- New chat: Attach or reference the report file, e.g. #file:reports/identity-posture/Identity_Posture_Report_<tenant>_<date>.md
- Customization: Edit svg-widgets.yaml before requesting; the renderer reads it at generation time.
Execution
Step 1: Read svg-widgets.yaml (this skill's widget manifest)
Step 2: Read .github/skills/svg-dashboard/SKILL.md (rendering rules — Manifest Mode)
Step 3: Read the completed report file (data source)
Step 4: Render SVG → save to reports/identity-posture/{report_name}_dashboard.svg
The YAML manifest is the single source of truth for layout, widgets, field mappings, colors, and data source documentation. All customization happens there.
Known Pitfalls
1. IdentityAccountInfo Is Advanced Hunting Only
Problem: The table does NOT exist in Sentinel Data Lake. Querying via mcp_sentinel-data_query_lake returns SemanticError: Failed to resolve table.
Solution: Always use RunAdvancedHuntingQuery. The table has 30-day retention in AH.
2. Multiple Records Per Account (State Snapshots)
Problem: The table logs configuration snapshots over time (state changes + 24h refresh). Querying without deduplication inflates counts.
Solution: Always use | summarize arg_max(Timestamp, *) by AccountId for current state analysis. Use by IdentityId when you want the latest per unified identity.
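To illustrate what the arg_max deduplication does, here is a Python analogue over in-memory rows. It is a sketch only: the function name `latest_per_key` and the dict-based row shape are assumptions, and real analysis should stay in KQL.

```python
def latest_per_key(rows, key):
    """Keep only the newest snapshot per key, like arg_max(Timestamp, *) by <key>.

    Timestamps are assumed to be ISO 8601 strings in one consistent format,
    so plain string comparison orders them chronologically.
    """
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row["Timestamp"] > latest[k]["Timestamp"]:
            latest[k] = row
    return list(latest.values())
```

Calling it with `key="AccountId"` yields one row per account, while `key="IdentityId"` yields one row per unified identity, so the same snapshots can produce two different counts.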
3. AccountStatus Values Are Provider-Specific
Problem: Each identity provider uses its own status vocabulary:
- AAD: Enabled, Disabled, Deleted
- SailPoint: ACTIVE, NONE, INACTIVE
- Okta: STAGED, ACTIVE, DEPROVISIONED, PROVISIONED
- CyberArk: ACTIVE, INVITED, SUSPENDED, CREATED
Solution: When filtering for "active/enabled" accounts, use AccountStatus in ("Enabled", "ACTIVE") to catch both AAD and third-party providers. For "disabled" filtering, include provider-specific disabled states.
4. AssignedRoles Contains Mixed Role Vocabularies
Problem: AssignedRoles contains role names from different providers in the same column — AAD roles ("Global Administrator"), CyberArk roles ("SYSTEM_ADMINISTRATOR"), Okta roles, etc. They are NOT normalized.
Solution: When searching for high-privilege roles, include role names from all providers in the highPrivRoles list. See Q4 for the canonical list.
5. EligibleRoles Is Empty (Preview)
Problem: The EligibleRoles column (for PIM-eligible roles) is documented but currently returns empty for all accounts.
Impact: Cannot distinguish permanent vs PIM-eligible roles from this table alone. All discovered roles in AssignedRoles should be treated as potentially permanent. For accurate PIM data, use Graph API (/roleManagement/directory/roleEligibilityScheduleInstances).
6. EnrolledMfas/TenantMembershipType/AuthenticationMethod Are Empty
Problem: These fields are documented but not yet populated in any provider. This is expected for a Preview table.
Solution: Report as "Not yet populated (Preview)" — not as absence of MFA or guest accounts. For MFA data, use SigninLogs (AuthenticationDetails) or Graph API. For Guest/Member, use IdentityInfo (TenantMembershipType — same issue) or Graph API.
7. LastPasswordChangeTime Is Sparse for AAD
Problem: Only ~1% of accounts have LastPasswordChangeTime populated, mostly non-AAD providers (CyberArk, Okta). AAD accounts typically show null. Some records contain placeholder dates (e.g., 0001-01-01T00:00:00Z) that produce nonsensical password age values (700,000+ days).
Solution: For AD-specific password posture, join with IdentityInfo which has UserAccountControl flags (PasswordNeverExpires, PasswordNotRequired). For cloud-only AAD, password age data may need Graph API enrichment. Always filter where LastPasswordChangeTime > datetime(2000-01-01) to exclude placeholder dates before computing avg/max.
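The placeholder-date filter can be illustrated with a short Python sketch. The function name and the list-of-datetimes input are assumptions made for the example; the cutoff mirrors the datetime(2000-01-01) filter described above.

```python
from datetime import datetime, timezone

# Same cutoff as the KQL filter: anything at or before 2000-01-01 is a placeholder.
PLACEHOLDER_CUTOFF = datetime(2000, 1, 1, tzinfo=timezone.utc)


def password_ages_days(change_times, now):
    """Return password ages in days, skipping nulls and placeholder dates."""
    valid = [t for t in change_times if t is not None and t > PLACEHOLDER_CUTOFF]
    return [(now - t).days for t in valid]
```

Without the filter, a 0001-01-01 value would contribute an age of more than 700,000 days and distort any average or maximum computed from the list.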
8. array_index_of(null) Returns Null — Not -1
Problem: When UserAccountControl is null (which it is for ~99% of identities in IdentityInfo — only on-prem AD accounts with MDI have it), array_index_of(null, "PasswordNeverExpires") returns null — NOT -1. In KQL, null != -1 evaluates to true. This means Verboon's original pattern array_index_of(UserAccountControl, "PasswordNeverExpires") != -1 incorrectly returns true for ALL accounts with null UserAccountControl, massively inflating PwdNeverExpires counts (e.g., 16,197 false positives out of 16,297 identities).
Solution: In the IdentityInfo let block, add | where isnotnull(UserAccountControl) BEFORE computing the boolean flags. This limits the UAC analysis to accounts that actually have UAC data (~100 out of 16,000+ in a typical environment). The Q7 query uses leftouter join, so accounts without UAC data get null for the flag columns, and countif(PasswordNeverExpires == true) correctly excludes nulls. Counts from this pattern are now accurate, not directional.
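The inflation mechanism can be reproduced with a Python analogy, where `None != -1` is also true. This is not KQL: `index_of` is a hypothetical stand-in that copies the behavior described above (null input returns null, a missing value returns -1).

```python
def index_of(arr, value):
    """Stand-in for array_index_of: null in gives null out, not found gives -1."""
    if arr is None:
        return None
    return arr.index(value) if value in arr else -1


def naive_count(uac_values, flag):
    # Buggy pattern: None != -1 is True, so every null-UAC account is counted.
    return sum(1 for uac in uac_values if index_of(uac, flag) != -1)


def filtered_count(uac_values, flag):
    # Fixed pattern: drop null UAC first, the analogue of isnotnull(UserAccountControl).
    return sum(1 for uac in uac_values
               if uac is not None and index_of(uac, flag) != -1)
```

With two null-UAC accounts and one genuine PasswordNeverExpires account, the naive count reports three while the filtered count reports one.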
8b. Q7 IdentityInfo Join — Use IdentityId, Not AccountUpn
Problem: Joining on AccountUpn can produce 1:many inflation when multiple IdentityInfo records share the same UPN. Additionally, using IdentityInfo as the primary (left) table inflates the row count because IdentityInfo contains multiple snapshot records per identity.
Solution: Use IdentityAccountInfo as the primary table (deduplicated by IdentityId). Join IdentityInfo on IdentityId (the stable cross-table identity key). Deduplicate IdentityInfo by IdentityId as well. This ensures 1:1 matching and the correct enabled-account baseline.
9. Tags Only Available on Accounts with MDI Coverage
Problem: Tags (Sensitive, Honeytoken, etc.) are populated only by Defender for Identity. Accounts from providers without MDI integration won't have tags.
Solution: Don't interpret "no tags" as "not sensitive." Report the count of tagged accounts and note that only MDI-monitored accounts can be tagged.
10. IdentityLogonEvents Has 30-Day Retention in AH
Problem: When using IdentityLogonEvents for stale account detection (Q5), AH only retains 30 days. Accounts that last logged in 31–90 days ago will appear "stale" if only checking IdentityLogonEvents.
Solution: For accurate 90-day stale detection, consider enriching with SigninLogs via Data Lake (90d+ retention). The 30d IdentityLogonEvents window is still useful for identifying accounts with zero recent activity.
11. Deduplication Key: AccountId vs IdentityId
Problem: AccountId is unique per provider-account pair. IdentityId is the unified identity (one person may have multiple AccountIds). Using the wrong key inflates or deflates counts.
Solution:
- Use by AccountId when counting individual accounts/provider-specific analysis
- Use by IdentityId when counting people/unified identity analysis
- Q7 (password posture) uses by IdentityId because it joins with IdentityInfo per person
- Q8 (multi-provider) groups by IdentityId to detect cross-provider linking
12. SourceProviderRiskLevel vs IdentityInfo.RiskLevel
Problem: Both tables have risk level fields but they may differ:
- IdentityAccountInfo.SourceProviderRiskLevel: Risk from the source provider (AAD Identity Protection, AD MDI)
- IdentityInfo.RiskLevel: Entra ID risk level + RiskStatus for remediation state
Solution: For a complete risk picture, check both. SourceProviderRiskLevel covers more providers; IdentityInfo.RiskLevel + RiskStatus gives Entra-specific remediation context.
13. Provider Count Varies by Tenant
Problem: Not all tenants have 6 providers connected. The provider list depends on which identity sources are integrated with Defender XDR / MDI.
Solution: Always report the actual providers found rather than assuming a fixed set. The inventory query (Q1) discovers this dynamically.
Quality Checklist
Before delivering the report, verify:
- All queries used arg_max(Timestamp, *) by AccountId (or by IdentityId where noted)
- All queries ran via RunAdvancedHuntingQuery (not Data Lake, except Q5b enrichment)
- Zero-result queries reported with explicit absence confirmation (✅ pattern)
- Identity Posture Score computation is transparent with per-dimension evidence
- AccountStatus filtering handles provider-specific vocabularies
- Privileged account audit includes roles from all providers (AAD + CyberArk + Okta)
- Empty Preview fields reported as "Not yet populated (Preview)" not "No data"
- Password posture correctly notes LastPasswordChangeTime sparsity
- Multi-provider identity analysis includes status mismatch detection
- Recommendations are prioritized and evidence-based
- All hyperlinks copied verbatim from URL Registry
- No PII from live environments in the SKILL.md file itself
.github/skills/incident-investigation/SKILL.md
npx skills add SCStelz/security-investigator --skill incident-investigation -g -y
SKILL.md
Frontmatter
{
"name": "incident-investigation",
"description": "Use this skill when asked to investigate a security incident by ID from Microsoft Defender XDR or Microsoft Sentinel. Triggers on keywords like \"investigate incident\", \"incident ID\", \"incident investigation\", \"analyze incident\", \"triage incident\", or when an incident number\/ID is mentioned with investigation context. This skill provides comprehensive incident analysis including metadata retrieval, alert listing, asset enumeration, evidence filtering, and deep entity investigation using Sentinel MCP tools and specialized skills.",
"drill_down_prompt": "Investigate incident {entity} — alert details, entity extraction, timeline reconstruction",
"threat_pulse_domains": [
"incidents"
]
}
Incident Investigation - Instructions
Purpose
This skill performs comprehensive security investigations on incidents from Microsoft Defender XDR and Microsoft Sentinel. It retrieves incident details, lists alerts, enumerates assets and evidences, and then performs deep investigation on user-selected entities using appropriate tools and specialized skills.
Investigation Flow:
- Phase 1: Incident Description - Retrieve metadata, alerts, assets, and evidences
- Phase 2: Incident Investigation Menu - Ask the user to select the incident assets and entities that should be investigated.
- Phase 2-A: User Investigation - Follow user-investigation skill workflow
- Phase 2-B: Device Investigation - Follow computer-investigation skill workflow
- Phase 2-C: IoC Investigation - Follow ioc-investigation skill workflow for IPs, URLs, Files, Domains, Hashes
- Phase 3: Looping to Phase 2 - Ask the user to select further assets and entities to investigate.
📑 TABLE OF CONTENTS
- Critical Workflow Rules - Start here!
- Phase 1: Incident Description - Metadata, Alerts, Assets, Evidences
- Phase 2: Incident Investigation Menu - Presenting the options
- Phase 2-A: User Investigation - Using user-investigation skill
- Phase 2-B: Device Investigation - Using computer-investigation skill
- Phase 2-C: IoC Investigation - Using ioc-investigation skill (IPs, URLs, Files, Domains, Hashes)
- Phase 3: Post Incident Investigation - Looping to phase 2
- JSON Export Structure - Required fields
- Error Handling - Troubleshooting guide
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
Before starting ANY incident investigation:
- ALWAYS complete Phase 1 first - Retrieve full incident description before any deep investigation
- ALWAYS list Sentinel workspaces at the START of Phase 2 - Call list_sentinel_workspaces MCP tool BEFORE presenting the investigation menu
- ⛔ ALWAYS complete workspace selection BEFORE any investigation - This is a MANDATORY CHECKPOINT:
- If 1 workspace: auto-select and display to user
- If multiple workspaces: ASK USER to select and WAIT for response
- DO NOT proceed to any entity investigation without a workspace selected
- ALWAYS present extracted entities to user - After workspace selection, ask user which entities to investigate
- ALWAYS wait for user confirmation - Do not proceed with deep investigation until user selects entities
- ALWAYS use the correct tools for each entity type:
- Users → Follow .github/skills/user-investigation/SKILL.md
- Devices → Follow .github/skills/computer-investigation/SKILL.md
- IPs/URLs/Files/Domains/Hashes → Follow .github/skills/ioc-investigation/SKILL.md
- ALWAYS track and report time after each major step
- ALWAYS filter evidences - Remove internal IPs (RFC1918) and tenant domains from investigation scope. Also remove all public IPs from the devices listed as assets involved in the incident.
- ALWAYS defang malicious/suspicious URLs and IPs - NEVER return them as clickable links. Use defang format: hxxps://evil[.]com, 203[.]0[.]113[.]42
- ⛔ NEVER auto-select a Sentinel workspace when multiple exist - Workspace selection is MANDATORY:
- ❌ DO NOT select a workspace on behalf of the user when multiple exist
- ❌ DO NOT switch to another workspace if a query fails
- ❌ DO NOT proceed with investigation without explicit user selection
- ✅ If query fails: STOP, report error, ask user to select different workspace
- ✅ If multiple workspaces: STOP, list all, WAIT for user selection
- ✅ Only auto-select if exactly ONE workspace exists
Incident ID Patterns:
| Pattern | Source | Tool to Use |
|---|---|---|
| Numeric (e.g., 12345, 98765) | Defender XDR / Sentinel | GetIncidentById |
| GUID format | Sentinel (internal) | Sentinel query_lake MCP tool |
| INxx-xxxxx format | Defender XDR | GetIncidentById |
⚠️ Sentinel → Defender XDR ID Mapping (Critical):
When an incident is discovered via Sentinel KQL (e.g., SecurityIncident or SecurityAlert tables), its IDs are Sentinel-local and will NOT work with the Triage MCP:
| Sentinel Field | Triage MCP Accepts? | Correct Field to Use |
|---|---|---|
| SecurityIncident.IncidentNumber | ❌ Returns "not found" | Use SecurityIncident.ProviderIncidentId |
| SecurityAlert.SystemAlertId | ❌ Returns "not found" | Extract parse_json(ExtendedProperties).IncidentId |
| SecurityIncident.ProviderIncidentId | ✅ | Pass directly to GetIncidentById |
Rule: When querying SecurityIncident for later Triage MCP drill-down, always project ProviderIncidentId alongside IncidentNumber. Use ProviderIncidentId for all GetIncidentById calls.
Date Range Rules:
- Default analysis window: 7 days before current date to current date (Standard)
- Investigation depth options:
- Comprehensive: 30 days window (for thorough analysis)
- Standard: 7 days window (default)
- Quick: 1 day window (for rapid triage)
- Format: ISO 8601 (e.g., 2026-01-17T00:00:00Z to 2026-01-24T00:00:00Z)
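The date range rules can be sketched as a helper that turns an investigation depth into an ISO 8601 window. The function and dictionary names are illustrative assumptions; the day counts come from the depth options listed above.

```python
from datetime import datetime, timedelta, timezone

DEPTH_DAYS = {"Quick": 1, "Standard": 7, "Comprehensive": 30}


def analysis_window(now, depth="Standard"):
    """Return (start, end) as ISO 8601 UTC strings for the chosen depth.

    `now` is assumed to be a timezone-aware datetime.
    """
    end = now.astimezone(timezone.utc).replace(microsecond=0)
    start = end - timedelta(days=DEPTH_DAYS[depth])
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)
```

With `now` set to 2026-01-24 00:00 UTC and the default depth, this reproduces the example window 2026-01-17T00:00:00Z to 2026-01-24T00:00:00Z.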
Phase 1: Incident Description
This phase retrieves and presents all incident information. Follow the exact structure below.
1.1 Incident Metadata
Retrieve and list the incident's metadata using GetIncidentById:
| Field | Description |
|---|---|
| Title | Incident display name |
| Description | Detailed incident description |
| Status | Active, Resolved, Redirected |
| Severity | High, Medium, Low, Informational |
| Priority assessment | If available from incident data |
| Classification | TruePositive, FalsePositive, BenignPositive, etc. |
| Determination | Malware, Phishing, etc. |
| Created Date | When incident was created |
| First Activity Date | First malicious activity timestamp |
| Last Updated Date | Most recent modification |
| Assigned To | Analyst assigned to incident |
| MITRE Categories | Tactics and techniques involved |
| Tags | Labels applied to incident |
1.2 Incident Alerts
🔴 Tool Selection for Alert Retrieval
Use GetIncidentById with includeAlertsData=true to retrieve incident-specific alerts. This returns only alerts correlated to the incident.
⛔ DO NOT use ListAlerts to retrieve alerts for a specific incident. ListAlerts has NO incidentId parameter — it can only filter by createdAfter, createdBefore, severity, status. Calling it returns all tenant alerts (up to page size 10,000), not incident-specific ones. Any unsupported parameter (e.g., incidentId) is silently ignored.
If GetIncidentById(includeAlertsData=true) returns a truncated or excessively large response (e.g., incident has hundreds of correlated alerts from noise sources like Purview IRM or DLP), use RunAdvancedHuntingQuery as the fallback:
// Get alerts linked to the incident's primary user/entity
AlertInfo
| where Timestamp > datetime(<incident_created_minus_7d>)
| join kind=inner (
AlertEvidence
| where Timestamp > datetime(<incident_created_minus_7d>)
| where EntityType == "User"
| where AccountUpn =~ "<primary_user_upn>" or AccountObjectId == "<user_object_id>"
| distinct AlertId
) on AlertId
| project Timestamp, AlertId, Title, Severity, Category, AttackTechniques, DetectionSource, ServiceSource
| order by Timestamp asc
This approach bypasses the Triage MCP's alert cap and gives full control over date range and entity filtering.
Alert Fields to Retrieve
For each alert, retrieve:
- Alert name
- Tags
- Severity
- Investigation state
- Status
- Impacted assets
- Correlation reason
- Detection source
- First activity
- Last activity
Presentation Rules:
- Return as a table (exclude Alert ID column from display)
- Order by last activity date descending
- Add row numbers starting from 1
- If more than 30 alerts exist, note this after the table and provide a Defender portal link
- NEVER calculate and write the total number of alerts
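A rough sketch of the presentation rules, assuming each alert is a dict with hypothetical `alert_id` and `last_activity` keys (the real field names depend on the tool response):

```python
def alert_table_rows(alerts, limit=30):
    """Order alerts by last activity (newest first), number them from 1,
    drop the alert ID column, and report whether more than `limit` exist."""
    ordered = sorted(alerts, key=lambda a: a["last_activity"], reverse=True)
    rows = []
    for number, alert in enumerate(ordered[:limit], start=1):
        row = {"#": number}
        row.update({k: v for k, v in alert.items() if k != "alert_id"})
        rows.append(row)
    more_than_limit = len(ordered) > limit
    return rows, more_than_limit
```

The helper returns only a boolean for the overflow case rather than a count, in keeping with the rule that the total number of alerts is never written out.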
1.3 Incident Assets
Retrieve and list ALL assets involved in the incident by type:
Device Assets:
| Field | Description |
|---|---|
| Name | Device hostname |
| Domain | AD domain |
| Risk Level | Device risk assessment |
| Exposure Level | Vulnerability exposure |
| OS Platform | Operating system |
User Assets:
| Field | Description |
|---|---|
| Display Name | User's full name |
| UPN | User Principal Name |
| User Status | Account status |
| Domain | User's domain |
| Department | Organizational department |
App Assets:
| Field | Description |
|---|---|
| App Name | Application name |
| App Client ID | OAuth client ID |
| Risk | Application risk level |
| Publisher | App publisher |
Cloud Resource Assets:
| Field | Description |
|---|---|
| Resource Name | Cloud resource identifier |
| Status | Resource status |
| Cloud Environment | Azure, AWS, GCP, etc. |
| Type | Resource type |
Count assets by type ONLY after retrieving complete lists.
1.4 Incident Evidences
Retrieve evidences classified as malicious or suspicious only:
Processes (Top 10):
- Get ALL malicious/suspicious processes
- Return only the 10 most probable signs of malicious activity (use judgment)
Files (Top 10):
- Get ALL malicious/suspicious files
- Return only the 10 most probable signs of malicious activity (use judgment)
IP Addresses (Top 10, Filtered):
- Get ALL malicious/suspicious IPs
- Filter out RFC1918 internal IPs: 10.x.x.x, 172.16-31.x.x, 192.168.x.x
- Filter out public IPs associated with the devices listed as assets involved in the incident
- Return only the first 10 from filtered list
- DEFANG ALL IPs: When presenting IPs and domains to the user, ALWAYS use defanged format: 203[.]0[.]113[.]42, evil[.]com. NEVER output clickable malicious indicators.
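The IP filtering and defanging steps can be sketched with Python's standard `ipaddress` module. The function names and the `device_public_ips` parameter are assumptions for illustration; the three networks are the RFC1918 ranges named above.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def filter_ips(ips, device_public_ips=(), limit=10):
    """Drop RFC1918 addresses and device-owned public IPs, then keep the first `limit`."""
    excluded = set(device_public_ips)
    kept = []
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        if addr.version == 4 and any(addr in net for net in RFC1918):
            continue  # internal address
        if ip in excluded:
            continue  # public IP belonging to an incident device
        kept.append(ip)
    return kept[:limit]


def defang_ip(ip):
    """Render an IPv4 address in non-clickable form, e.g. 203[.]0[.]113[.]42."""
    return ip.replace(".", "[.]")
```

Note that 172.32.0.1 falls outside 172.16.0.0/12 and is therefore kept, which a naive "starts with 172." check would get wrong.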
URLs and DNS Domains (Top 10, Filtered):
- Get ALL malicious/suspicious URLs and DNS Domains
- Filter out tenant domain URLs (DNS domains associated with the organization)
- Return only the first 10 from filtered list
- DEFANG ALL URLs AND DNS DOMAINS: When presenting URLs to the user, ALWAYS use defanged format: hxxps://evil[.]com/path, hxxp://malware[.]net. NEVER output clickable malicious URLs.
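A minimal defanging sketch for URLs and bare domains follows. The helper name `defang_url` is hypothetical, and the choice to bracket only the dots in the host part (leaving the path untouched) is an assumption based on the examples above.

```python
def defang_url(url: str) -> str:
    """Rewrite http/https schemes to hxxp/hxxps and bracket the dots in the host."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        # No scheme present: treat the input as a bare domain.
        return url.replace(".", "[.]")
    host, slash, path = rest.partition("/")
    scheme = {"http": "hxxp", "https": "hxxps"}.get(scheme.lower(), scheme)
    return f"{scheme}://{host.replace('.', '[.]')}{slash}{path}"
```

For example, `defang_url("https://evil.com/path")` returns `hxxps://evil[.]com/path` and `defang_url("evil.com")` returns `evil[.]com`.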
AD Domains:
- Return ALL malicious/suspicious AD domains (no limit)
For each evidence type: If more than 10 exist, note this after the table and provide Defender portal link.
Phase 2: Incident Investigation Menu
⛔ MANDATORY CHECKPOINT: Workspace Selection
This checkpoint MUST be completed before ANY entity investigation can proceed.
Step 2.1: List Sentinel Workspaces
ALWAYS execute this step first, regardless of any other considerations:
list_sentinel_workspaces (MCP tool)
Store the result. This determines the workflow for Step 2.3.
Step 2.2: Present Entity Summary
Show a summary of the incident entities and assets from Phase 1:
- Users (with UPN and display name)
- Devices (with hostname and risk level)
- URLs (defanged)
- IPs (defanged, filtered)
- File hashes
- Domains (defanged)
🔴 DEFANG ALL URLs AND DOMAINS: When presenting URLs and DNS Domains to the user, ALWAYS use defanged format: hxxps://evil[.]com/path, hxxp://malware[.]net, evil[.]com. NEVER output clickable malicious URLs.
🔴 DEFANG ALL IPs: When presenting IPs to the user, ALWAYS use defanged format: 203[.]0[.]113[.]42. NEVER output clickable malicious indicators.
Step 2.3: Workspace Selection Gate
IF workspace_count == 1:
- Auto-select the single workspace
- Display: "Using Sentinel workspace: [NAME] ([ID])"
- Set SESSION_WORKSPACE_SELECTED = true
ELSE IF workspace_count > 1 AND SESSION_WORKSPACE_SELECTED == false:
- Display all workspaces with Name and ID
- ASK USER: "Which Sentinel workspace should I run my searches in? Select one or more, or choose 'all'."
- WAIT for user response
- Set SESSION_WORKSPACE_SELECTED = true after selection
ELSE IF workspace_count > 1 AND SESSION_WORKSPACE_SELECTED == true:
- Display: "Continuing with previously selected workspace: [NAME] ([ID])"
- DO NOT ask again
⛔ DO NOT PROCEED PAST THIS POINT WITHOUT A WORKSPACE SELECTED
If SESSION_WORKSPACE_SELECTED == false after Step 2.3, STOP and ask the user to select a workspace.
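The gate logic in Step 2.3 can be sketched as a small state machine. Everything here is illustrative: the `Session` class, the dict shape of a workspace (`name`, `id`), and the `ask_user` callback (which is assumed to block until the user answers) are not defined by the skill itself.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    workspace_selected: bool = False
    selected_workspace_ids: list = field(default_factory=list)


def workspace_gate(session, workspaces, ask_user):
    """Apply the workspace selection gate and return the message to display."""
    if len(workspaces) == 1:
        ws = workspaces[0]
        session.selected_workspace_ids = [ws["id"]]
        session.workspace_selected = True
        return f"Using Sentinel workspace: {ws['name']} ({ws['id']})"
    if len(workspaces) > 1 and not session.workspace_selected:
        chosen = ask_user(workspaces)  # must wait for an explicit user choice
        if not chosen:
            raise RuntimeError("No workspace selected: stop and ask the user")
        session.selected_workspace_ids = [ws["id"] for ws in chosen]
        session.workspace_selected = True
        return "Selected workspace(s): " + ", ".join(session.selected_workspace_ids)
    if len(workspaces) > 1:
        ids = ", ".join(session.selected_workspace_ids)
        return f"Continuing with previously selected workspace: {ids}"
    raise RuntimeError("No workspaces available: stop and report the error")
```

On a second pass with multiple workspaces the callback is not invoked again, which mirrors the "DO NOT ask again" branch.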
Step 2.4: Ask User to Select Entities
Ask the user:
"Which assets and entities involved in the incident should be investigated in depth? Please select them by providing their numbers or names, or simply ask to analyze all of them. The more entities you select, the longer the analysis will take."
🔴 DO NOT OFFER OTHER OPTIONS: Only ask the user whether they want to investigate one or more of the incident entities and assets listed above in more depth.
Read the response.
- If they do not want to proceed with the proposed investigations, ask them what they want to do.
- If they want to proceed with one or more of the proposed investigations, continue with Step 2.5.
Step 2.5: Start Investigations
Pre-flight check: Confirm SESSION_WORKSPACE_SELECTED == true before proceeding.
Proceed in accordance with the instructions described below for Phase 2-A, Phase 2-B, and Phase 2-C. When multiple investigation types are selected (users, devices, IoCs) run them in parallel as much as possible.
Phase 2-A: User Investigation
Pre-requisites (MANDATORY)
⛔ VERIFY BEFORE PROCEEDING:
- ✅ SESSION_WORKSPACE_SELECTED == true (workspace explicitly selected by user)
- ✅ SELECTED_WORKSPACE_IDS array is populated with user's selection
- ✅ User has explicitly selected which user(s) to investigate
If any pre-requisite is FALSE: STOP and return to Step 2.3 (Workspace Selection Gate).
User Investigation Workflow
⚡ PARALLEL EXECUTION: When multiple users are selected, execute user investigations in parallel as much as possible.
📦 WORKSPACE CONTEXT: Pass the selected workspace(s) to all child skill invocations:
- Use SELECTED_WORKSPACE_IDS from Phase 2.3 for all Sentinel queries
- If a query fails with table/workspace error: STOP, report error, ask user to select different workspace
- ⛔ DO NOT automatically retry with a different workspace
For EACH user selected by the user:
🔴 REFERENCE THE SKILL FILE: Read and follow the complete workflow defined in:
.github/skills/user-investigation/SKILL.md
Key Steps (summary - see skill file for full details):
- Get User Object ID from Microsoft Graph
- Calculate date ranges based on investigation type (Standard/Quick/Comprehensive)
- Run parallel data collection:
- Sign-in anomalies (Signinlogs_Anomalies_KQL_CL — note lowercase 'l' in "logs")
- Sign-in statistics (apps, locations, IPs)
- Audit log events
- Office 365 activity
- Security incidents involving user
- Identity Protection risk detections
- MFA and authentication methods
- Device compliance status
- IP enrichment for flagged addresses
- Compile and present findings
- Generate HTML report (if requested)
DO NOT copy the full workflow here - always read the skill file for the most current instructions.
Phase 2-B: Device Investigation
Device Investigation Workflow
⚡ PARALLEL EXECUTION: When multiple devices are selected, execute device data collection queries in parallel for ALL devices simultaneously. Run Defender alerts, compliance, logged-on users, vulnerabilities, network/process/file events queries concurrently.
For EACH device selected by the user:
🔴 REFERENCE THE SKILL FILE: Read and follow the complete workflow defined in:
.github/skills/computer-investigation/SKILL.md
Key Steps (summary - see skill file for full details):
- Get Device IDs (Entra Device ID + Defender Device ID)
- Determine device type (Entra Joined, Hybrid Joined, Entra Registered)
- Run parallel data collection:
- Defender alerts for device
- Device compliance status
- Logged-on users
- Software vulnerabilities
- Network connections
- Process events
- File events
- Automated investigations
- Compile and present findings
DO NOT copy the full workflow here - always read the skill file for the most current instructions.
Phase 2-C: IoC Investigation
IoC Investigation Workflow
⚡ PARALLEL EXECUTION: When multiple IoCs are selected, execute ALL IoC investigation queries in parallel. Run threat intel lookups, Sentinel queries, and organizational exposure queries concurrently for all IoCs.
For EACH IoC selected by the user:
🔴 REFERENCE THE SKILL FILE: Read and follow the complete workflow defined in:
.github/skills/ioc-investigation/SKILL.md
Supported IoC Types:
| IoC Type | Detection Pattern | Key Investigation Points |
|---|---|---|
| URL | https?:// or domain pattern | Malicious indicators, phishing, threat intel, organizational exposure |
| IPv4 Address | \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} | Threat intel, network connections, geographic analysis |
| IPv6 Address | Contains multiple colons | Same as IPv4 |
| Domain | [a-zA-Z0-9][-a-zA-Z0-9]*\.[a-zA-Z]{2,} | DNS queries, email threats, reputation |
| MD5 Hash | 32 hex characters | File prevalence, malware analysis |
| SHA1 Hash | 40 hex characters | File prevalence, malware analysis |
| SHA256 Hash | 64 hex characters | File prevalence, malware analysis |
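The detection patterns in the table can be combined into a simple classifier. This is a sketch under assumptions: the check order, the function name `classify_ioc`, and the extension of the domain pattern to accept subdomains are choices made for the example, not rules stated by the skill.

```python
import re

HASH_BY_LENGTH = {32: "MD5", 40: "SHA1", 64: "SHA256"}
# Table pattern extended (assumption) so multi-label names such as sub.evil.com match.
DOMAIN_RE = r"[a-zA-Z0-9][-a-zA-Z0-9]*(\.[a-zA-Z0-9][-a-zA-Z0-9]*)*\.[a-zA-Z]{2,}"


def classify_ioc(value: str) -> str:
    """Classify an indicator using the detection patterns from the table."""
    v = value.strip()
    if re.match(r"https?://", v, re.IGNORECASE):
        return "URL"
    if re.fullmatch(r"[0-9a-fA-F]+", v) and len(v) in HASH_BY_LENGTH:
        return HASH_BY_LENGTH[len(v)]
    if re.fullmatch(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", v):
        return "IPv4"
    if v.count(":") >= 2:
        return "IPv6"
    if re.fullmatch(DOMAIN_RE, v):
        return "Domain"
    return "Unknown"
```

URLs are tested first because they contain both dots and a colon; the IPv4 pattern validates shape only, not octet ranges, so a value such as 999.1.1.1 would still be classified as IPv4.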
Key Steps (summary - see skill file for full details):
- Identify IoC type and normalize
- Query Defender Threat Intelligence
- Check Sentinel ThreatIntelIndicators table
- Analyze organizational exposure (devices, connections)
- Correlate with CVEs if applicable
- Present findings with risk assessment
DO NOT copy the full workflow here - always read the skill file for the most current instructions.
Phase 3: Post-Investigation Loop (MANDATORY)
⛔ CRITICAL: DO NOT END THE RESPONSE WITHOUT COMPLETING THIS PHASE
After completing ALL selected entity investigations in Phase 2, you MUST:
- List remaining uninvestigated entities - Show all entities from Phase 1 that were NOT yet investigated
- Ask the user to select additional entities - Prompt user to continue or conclude
- Wait for user response - Do not assume the investigation is complete
Phase 3 Checklist (Execute After Every Phase 2 Completion)
☐ Step 3.1: Compile list of UNINVESTIGATED entities (exclude already-investigated items)
☐ Step 3.2: Present remaining entities to user with numbered list
☐ Step 3.3: Ask: "Would you like to investigate any of the remaining entities? Select by number/name, or say 'done' to conclude."
☐ Step 3.4: Wait for user response before concluding
Required Prompt Format
After presenting investigation findings, ALWAYS end with:
📋 Remaining Uninvestigated Entities:
| # | Type | Entity | Notes |
|---|---|---|---|
| 1 | Device | [DEVICE_NAME] | [Risk level or relevant context] |
| 2 | File | [FILENAME] | [Hash or detection status] |
| 3 | URL | [DEFANGED_URL] | [Threat assessment] |
| ... | ... | ... | ... |
Would you like to investigate any of these remaining entities? Select by number/name, type "all" to investigate everything, or say "done" to conclude the investigation.
Rules
- DO NOT include entities that were already investigated in the list
- DO NOT ask the user to select Sentinel workspaces again (use previously selected workspace)
- DO NOT provide a final summary or recommendations until the user explicitly says "done" or declines further investigation
- DO NOT assume the investigation is complete just because selected entities were analyzed
Loop Behavior
IF user selects additional entities:
→ Return to Phase 2 (2-A, 2-B, or 2-C based on entity type)
→ After completion, return to Phase 3 again
ELSE IF user says "done" or declines:
→ Proceed to Final Summary
→ Provide recommendations
→ Offer to generate consolidated report
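The Phase 3 bookkeeping above can be sketched in Python. This is only an illustration, not part of the skill: the `Entity` type and the `remaining_entities` and `format_prompt` helpers are hypothetical names, and matching entities by case-insensitive name is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    kind: str        # e.g. "Device", "File", "URL"
    name: str        # entity identifier as shown to the user
    notes: str = ""  # risk level, hash, or threat assessment


def remaining_entities(all_entities, investigated_names):
    """Return Phase 1 entities that were not investigated yet, in original order."""
    done = {name.lower() for name in investigated_names}
    return [e for e in all_entities if e.name.lower() not in done]


def format_prompt(entities):
    """Render the numbered table followed by the closing question."""
    lines = [
        "Remaining Uninvestigated Entities:",
        "| # | Type | Entity | Notes |",
        "|---|---|---|---|",
    ]
    for number, entity in enumerate(entities, start=1):
        lines.append(f"| {number} | {entity.kind} | {entity.name} | {entity.notes} |")
    lines.append(
        "Would you like to investigate any of these remaining entities? "
        'Select by number/name, type "all", or say "done".'
    )
    return "\n".join(lines)
```

Calling `remaining_entities` after every Phase 2 pass keeps already-investigated items out of the list, which is the first rule in the section above.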
Sentinel MCP Tools Reference
analyze_user_entity
Purpose: Starts asynchronous security analysis of a user entity.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| userId | string | Yes | User's Azure AD Object ID (GUID) |
| startTime | string | Yes | ISO 8601 format start time |
| endTime | string | Yes | ISO 8601 format end time |
| workspaceId | string | No | Sentinel workspace GUID (optional if only one workspace) |
Time Window Options: 30 days (Comprehensive), 7 days (Standard), 1 day (Quick)
Returns: 202 Accepted with analysisId
get_entity_analysis
Purpose: Retrieves results of an asynchronous entity analysis.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| analysisId | string | Yes | Analysis ID returned from analyze_*_entity |
Returns: 200 OK with analysis results when complete, or status if still processing
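Because the analysis is asynchronous, the caller has to poll for the result. The following Python sketch shows one way to do that under stated assumptions: `fetch` is a hypothetical callable wrapping get_entity_analysis, and the `"status": "complete"` response shape is assumed for illustration, since the actual payload is not documented here. The 5 second interval and 2 minute ceiling follow the guidance in the Error Handling table.

```python
import time


def poll_analysis(fetch, analysis_id, interval_s=5.0, timeout_s=120.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll fetch(analysis_id) until it reports completion or the timeout elapses.

    Assumption: fetch returns a dict whose "status" value is "complete" when done.
    """
    deadline = clock() + timeout_s
    while True:
        result = fetch(analysis_id)
        if result.get("status") == "complete":
            return result
        # Stop before sleeping past the deadline.
        if clock() + interval_s > deadline:
            raise TimeoutError(
                f"analysis {analysis_id} still processing after {timeout_s} seconds"
            )
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.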
Quick Start (TL;DR)
When a user requests an incident investigation:
- Phase 1 - Incident Description:
  - Retrieve incident metadata using GetIncidentById
  - List top 30 alerts as a table
  - Enumerate all assets by type (devices, users, apps, cloud resources)
  - List filtered evidences (processes, files, IPs, URLs, domains)
- ⛔ Phase 2 - Mandatory Workspace Selection:
  - Call list_sentinel_workspaces MCP tool FIRST
  - Present entity summary from Phase 1
  - If 1 workspace: auto-select and display
  - If multiple workspaces: ASK USER to select before proceeding
  - DO NOT proceed to investigations without a workspace selected
- Phase 2-A - User Investigation:
  - For each selected user: Follow .github/skills/user-investigation/SKILL.md
  - Present findings
- Phase 2-B - Device Investigation:
  - For each selected device: Follow .github/skills/computer-investigation/SKILL.md
  - Present findings
- Phase 2-C - IoC Investigation:
  - For each selected IoC (IPs, URLs, Files, Domains, Hashes): Follow .github/skills/ioc-investigation/SKILL.md
  - Present findings
- Export & Summary:
  - Create consolidated JSON file
  - Present investigation summary with recommendations
JSON Export Structure
Required Fields
| Field | Type | Description |
|---|---|---|
| investigation_metadata | object | Incident ID, timestamp, investigation phases completed |
| incident_details | object | Metadata, alerts, assets, evidences from Phase 1 |
| user_investigations | array | Results from Phase 2-A (user-investigation skill) |
| device_investigations | array | Results from Phase 2-B (computer-investigation skill) |
| ioc_investigations | array | Results from Phase 2-C (ioc-investigation skill - includes IPs, URLs, Files, Domains, Hashes) |
| summary | object | Key findings, risk assessment, recommendations |
Example JSON Structure
{
"investigation_metadata": {
"incident_id": "<INCIDENT_ID>",
"investigation_timestamp": "<ISO_TIMESTAMP>",
"phases_completed": ["incident_description", "user_investigation", "device_investigation", "ioc_investigation"],
"total_elapsed_time_seconds": 300
},
"incident_details": {
"metadata": {
"title": "<INCIDENT_TITLE>",
"description": "<DESCRIPTION>",
"severity": "<SEVERITY>",
"status": "<STATUS>",
"classification": "<CLASSIFICATION>",
"determination": "<DETERMINATION>",
"created_date": "<TIMESTAMP>",
"first_activity_date": "<TIMESTAMP>",
"last_updated_date": "<TIMESTAMP>",
"assigned_to": "<ANALYST>",
"mitre_categories": ["<TACTIC1>", "<TACTIC2>"],
"tags": ["<TAG1>", "<TAG2>"]
},
"alerts": [
{
"name": "<ALERT_NAME>",
"severity": "<SEVERITY>",
"status": "<STATUS>",
"first_activity": "<TIMESTAMP>",
"last_activity": "<TIMESTAMP>"
}
],
"assets": {
"devices": [...],
"users": [...],
"apps": [...],
"cloud_resources": [...]
},
"evidences": {
"processes": [...],
"files": [...],
"ip_addresses": [...],
"urls": [...],
"ad_domains": [...]
}
},
"user_investigations": [
{
"upn": "user@domain.com",
"user_id": "<GUID>",
"analysis_id": "<ANALYSIS_ID>",
"time_window": {
"start": "<ISO_TIMESTAMP>",
"end": "<ISO_TIMESTAMP>"
},
"findings": {...},
"risk_level": "High"
}
],
"device_investigations": [
{
"hostname": "<DEVICE_NAME>",
"device_id": "<GUID>",
"findings": {...}
}
],
"ioc_investigations": [
{
"ioc_type": "IP",
"value": "203.0.113.42",
"findings": {...}
},
{
"ioc_type": "URL",
"value": "https://example.com",
"findings": {...},
"threat_assessment": "Malicious"
}
],
"summary": {
"risk_assessment": "High",
"key_findings": [...],
"recommendations": [...]
}
}
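Before writing the export, it can help to check that the required top-level fields are present with the expected JSON types. The sketch below is a minimal illustration in Python; `validate_export` is a hypothetical helper name and the check covers only the six required fields listed in the table, not the nested structure.

```python
# Required top-level fields and their expected Python types
# (JSON object -> dict, JSON array -> list).
REQUIRED_FIELDS = {
    "investigation_metadata": dict,
    "incident_details": dict,
    "user_investigations": list,
    "device_investigations": list,
    "ioc_investigations": list,
    "summary": dict,
}


def validate_export(doc):
    """Return a list of problems; an empty list means all required fields look right."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems
```

An empty result means the document has the required shape at the top level; any nested validation would need to be added separately.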
Error Handling
Common Issues and Solutions
| Issue | Solution |
|---|---|
| Incident not found | Verify incident ID format; try Sentinel query if Defender fails |
| User Object ID not found | Verify UPN is correct; check if user exists in Entra ID |
| analyze_user_entity returns error | Check userId is GUID format; verify time window ≤ 30 days |
| get_entity_analysis still processing | Poll again after 5-10 seconds; max 2 minutes |
| No workspace found | Use list_sentinel_workspaces MCP tool to get workspace ID |
| Device investigation fails | Verify device exists in Defender; check device ID type |
| IoC investigation timeout | Reduce date range; check IoC format |
Workspace ID Retrieval
If workspace ID is unknown, retrieve it first:
list_sentinel_workspaces (MCP tool)
Returns: List of workspace name/ID pairs
Workspace ID Selection
If list_sentinel_workspaces returns more than one Sentinel workspace, present the list (workspace names and IDs) so the user can select which workspace to use for the investigation.
Also offer the option to use all available workspaces.
If the user selects a single workspace, pass that workspace's workspaceId when calling investigation tools.
If the user selects several workspaces, call the investigation tools once per workspace, passing each workspaceId in turn.
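The selection rules can be expressed as a small decision function. This Python sketch is illustrative only: `resolve_workspace_ids` is a hypothetical helper, and the `(name, workspace_id)` pair format is an assumption about what list_sentinel_workspaces returns.

```python
def resolve_workspace_ids(workspaces, selection=None):
    """Decide which workspace IDs to pass to the investigation tools.

    workspaces: list of (name, workspace_id) pairs (assumed shape).
    selection:  None if the user has not been asked yet, "all", or a list of names.
    Returns a list of IDs, or None when the user must be asked first.
    """
    if not workspaces:
        raise ValueError("no Sentinel workspace found")
    if len(workspaces) == 1:
        return [workspaces[0][1]]      # single workspace: auto-select
    if selection is None:
        return None                    # several workspaces: ask the user
    if selection == "all":
        return [workspace_id for _, workspace_id in workspaces]
    chosen = {name.lower() for name in selection}
    ids = [wid for name, wid in workspaces if name.lower() in chosen]
    if not ids:
        raise ValueError("selection does not match any workspace")
    return ids
```

A `None` result signals that the investigation must pause for user input rather than guess a workspace.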
Time Window Limits
| Tool | Time Window Options |
|---|---|
| User Investigation | 30 days (Comprehensive), 7 days (Standard), 1 day (Quick) |
| Computer Investigation | 30 days (Comprehensive), 7 days (Standard), 1 day (Quick) |
| IoC Investigation | 30 days (Comprehensive), 7 days (Standard), 1 day (Quick) |
Example Investigation Workflow
User Request: "Investigate incident 12345"
Phase 1: Incident Description
[00:00] Starting incident investigation for ID: 12345
### Incident Metadata
- **Title:** Multi-stage attack with credential theft
- **Severity:** High
- **Status:** Active
- **Classification:** TruePositive
- **Created:** 2026-01-20T10:30:00Z
- **MITRE Categories:** Initial Access, Credential Access, Lateral Movement
### Incident Alerts
| # | Alert Name | Severity | Status | Last Activity |
|---|------------|----------|--------|---------------|
| 1 | Suspicious sign-in from unusual location | High | New | 2026-01-23 |
| 2 | Credential theft attempt detected | High | InProgress | 2026-01-22 |
| ... | ... | ... | ... | ... |
### Incident Assets
**Devices:**
| Name | Domain | Risk Level | OS |
|------|--------|------------|-----|
| WORKSTATION-01 | contoso.com | High | Windows 11 |
| LAPTOP-EXEC | contoso.com | Medium | Windows 11 |
| SERVER-DC01 | contoso.com | Low | Windows Server 2022 |
**Users:**
| Display Name | UPN | Status | Department |
|--------------|-----|--------|------------|
| John Smith | jsmith@contoso.com | Active | Finance |
| Admin Account | admin@contoso.com | Active | IT |
| Jane Doe | jdoe@contoso.com | Active | HR |
| Service Account | svc-backup@contoso.com | Active | IT |
### Incident Evidences
**IPs (after filtering - excluded private IPs):**
- `203[.]0[.]113[.]42` (Malicious - C2 communication)
- `198[.]51[.]100[.]10` (Suspicious - Data exfiltration)
- `192[.]0[.]2[.]50` (Suspicious - Unusual connection)
...
**URLs (after filtering - excluded managed domains):**
- `hxxps://evil-site[.]com/payload[.]exe` (Malicious)
- `hxxps://phishing[.]example[.]com/login` (Suspicious)
...
[01:30] Phase 1 completed (90 seconds)
Phase 2-A: User Investigation
Which users from the incident assets should be investigated deeply?
Available users:
1. jsmith@contoso.com (Finance)
2. admin@contoso.com (IT)
3. jdoe@contoso.com (HR)
4. svc-backup@contoso.com (IT)
User selects: "1, 2"
[01:35] Starting parallel user analysis for 2 users...
- Getting user Object IDs from Graph API (parallel)
- Starting analyze_user_entity for jsmith@contoso.com (Analysis ID: abc123-def456)
- Starting analyze_user_entity for admin@contoso.com (Analysis ID: xyz789-ghi012)
- Polling for results (parallel)...
[02:15] All analyses complete
### User Analysis: jsmith@contoso.com
**Risk Level:** High
**Key Findings:**
1. Sign-in from unusual location (IP: `203[.]0[.]113[.]42`, Country: Russia)
2. Multiple failed MFA attempts followed by success
3. Unusual file access pattern detected
...
### User Analysis: admin@contoso.com
**Risk Level:** Medium
**Key Findings:**
1. Service account usage from new device
...
[02:20] Phase 2-A completed (45 seconds - parallel execution)
Phase 2-B: Device Investigation
Which devices from the incident assets should be investigated deeply?
Available devices:
1. WORKSTATION-01 (High risk)
2. LAPTOP-EXEC (Medium risk)
3. SERVER-DC01 (Low risk)
User selects: "1"
[03:10] Starting device investigation for WORKSTATION-01...
- Following computer-investigation skill workflow
- Getting device IDs (Entra + Defender)
- Running parallel queries...
[04:30] Device investigation complete
### Device Analysis: WORKSTATION-01
**Key Findings:**
1. Malware execution detected (sha256: abc123...)
2. Outbound C2 communication to `203[.]0[.]113[.]42`
3. Credential dumping tool found
...
[04:35] Phase 2-B completed (85 seconds)
Phase 2-C: IoC Investigation
Which IPs, URLs, Files, Domains, or Hashes should be investigated deeply?
Available IoCs:
1. 203[.]0[.]113[.]42 (IP - C2 communication)
2. 198[.]51[.]100[.]10 (IP - Data exfiltration)
3. hxxps://evil-site[.]com/payload[.]exe (URL - Malicious)
4. hxxps://phishing[.]example[.]com/login (URL - Suspicious)
5. abc123def456... (Hash - Malware)
User selects: "1, 3, 4, 5"
[04:40] Starting parallel IoC investigation for 4 IoCs...
- Following ioc-investigation skill workflow
- Running threat intel, Sentinel, and exposure queries in parallel for all IoCs
[05:30] All IoC analyses complete
### IP Analysis: 203[.]0[.]113[.]42
**Threat Assessment:** Malicious
**Key Findings:**
1. Known C2 infrastructure
2. Associated with threat actor APT-XYZ
...
### URL Analysis: hxxps://evil-site[.]com/payload[.]exe
**Threat Assessment:** Malicious
**Key Findings:**
1. Known malware distribution domain
2. 3 devices in organization accessed this URL
...
### URL Analysis: hxxps://phishing[.]example[.]com/login
**Threat Assessment:** Suspicious
**Key Findings:**
1. Phishing page mimicking corporate login
...
### Hash Analysis: abc123def456...
**Threat Assessment:** Malicious
**Key Findings:**
1. Known malware sample
...
[05:35] Phase 2-C completed (55 seconds - parallel execution)
[05:45] Investigation Summary
=========================
**Incident:** 12345 - Multi-stage attack with credential theft
**Total Investigation Time:** 5 minutes 45 seconds (optimized with parallel execution)
**Key Findings:**
1. Compromised user account (jsmith@contoso.com) used for initial access
2. Malware deployed on WORKSTATION-01 establishing C2 channel
3. Credential theft attempt targeting admin account
4. Data exfiltration attempts detected
**Recommendations:**
1. 🔴 CRITICAL: Isolate WORKSTATION-01 immediately
2. 🔴 CRITICAL: Reset credentials for jsmith@contoso.com and admin@contoso.com
3. 🟠 HIGH: Block IP `203[.]0[.]113[.]42` at firewall
4. 🟠 HIGH: Block domain `evil-site[.]com`
5. 🟡 MEDIUM: Review all sign-ins for affected users in past 30 days
**Export:** temp/incident_investigation_12345_20260124.json
Integration with Skill Files
This skill orchestrates investigations by referencing specialized skills:
| Investigation Phase | Skill/Tool | Location/Reference |
|---|---|---|
| Phase 1: Incident Description | Built-in workflow | This file (see Phase 1 section) |
| Phase 2-A: User Investigation | user-investigation skill | .github/skills/user-investigation/SKILL.md |
| Phase 2-B: Device Investigation | computer-investigation skill | .github/skills/computer-investigation/SKILL.md |
| Phase 2-C: IoC Investigation | ioc-investigation skill | .github/skills/ioc-investigation/SKILL.md (IPs, URLs, Files, Domains, Hashes) |
🔴 ALWAYS read the referenced skill file before executing that phase to ensure proper workflow execution.
.github/skills/ioc-investigation/SKILL.md
npx skills add SCStelz/security-investigator --skill ioc-investigation -g -y
SKILL.md
Frontmatter
{
"name": "ioc-investigation",
"description": "Use this skill when asked to investigate an Indicator of Compromise (IoC) such as an IP address, DNS domain, URL, or file hash. Triggers on keywords like \"investigate IP\", \"check domain\", \"IoC investigation\", \"threat intel\", \"is this malicious\", \"suspicious URL\", or when an IP\/domain\/URL\/hash is mentioned with investigation context. This skill provides comprehensive IoC analysis using Microsoft Defender Threat Intelligence, Sentinel Threat Intel tables, Advanced Hunting, organizational exposure assessment, CVE correlation, and affected device enumeration.",
"drill_down_prompt": "Investigate IoC {entity} — threat intel, organizational exposure, affected devices",
"threat_pulse_domains": [
"identity",
"endpoint",
"email",
"exposure"
]
}
IoC (Indicator of Compromise) Investigation - Instructions
Purpose
This skill performs comprehensive security investigations on Indicators of Compromise (IoCs) including:
- IP Addresses: Network connections, threat intel matches, geographic analysis, organizational exposure
- DNS Domains: Domain reputation, connection events, email-based threats, URL analysis
- URLs: URL reputation, phishing detection, email delivery, browser activity
- File Hashes: Malware analysis, file prevalence, related alerts, affected devices
The investigation correlates IoCs with Microsoft Defender Threat Intelligence, identifies associated CVEs, and enumerates organizational assets affected by those vulnerabilities.
📑 TABLE OF CONTENTS
- Critical Workflow Rules - Start here!
- Investigation Types - By IoC type
- Quick Start - 5-step investigation pattern
- Execution Workflow - Complete process
- Sample KQL Queries - Validated query patterns
- Defender API Queries - Threat Intel & Vulnerability Management
- JSON Export Structure - Required fields
- Error Handling - Troubleshooting guide
Investigation shortcuts:
- Suspicious IP from spray/brute-force (TP Q4): Q2 (network connections) → Q11 (sign-in analysis) → Q8 (alert evidence) → Q1 (TI match)
- IP from user risk event (TP Q3): Q11 (sign-in analysis) → Q2 (device connections) → Q9 (security alerts) → `enrich_ips.py`
- Phishing domain/URL (TP Q8): Q4 (DNS/HTTP connections) → Q6 (email delivery) → Q8 (alert evidence) → Q1 (TI match)
- File hash from incident (TP Q1): Q7 (file events across all tables) → Q9 (security alerts) → Q10 (custom indicator check) → Q12 (CVE extraction)
- IoC organizational exposure (TP Q1+Q11): Q2/Q4 (affected devices) → Q9 (alert correlation) → Q12 (CVEs from alerts)
⛔ Shortcut Default Rule: When a matching shortcut exists for the investigation context, use it — don't run the full workflow. Only run the full query set when the user explicitly requests "full investigation", "comprehensive", or "deep dive". Shortcuts render only the report sections relevant to their query chain (plus Executive Summary and Recommendations, always).
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
Before starting ANY IoC investigation:
- ALWAYS identify the IoC type FIRST (IP, Domain, URL, or File Hash)
- ALWAYS normalize the IoC (lowercase domains, validate IP format, extract domain from URL)
- ALWAYS calculate date ranges correctly (use current date from context - see Date Range section)
- ALWAYS track and report time after each major step (mandatory)
- ALWAYS run independent queries in parallel (drastically faster execution)
- ALWAYS use `create_file` for JSON export (NEVER use PowerShell terminal commands)
- ⛔ ALWAYS enforce Sentinel workspace selection (see Workspace Selection section below)
⛔ MANDATORY: Sentinel Workspace Selection
This skill requires a Sentinel workspace to execute queries. Follow these rules STRICTLY:
When invoked from a parent skill (incident-investigation, threat-pulse, etc.):
- Inherit the workspace selection from the parent investigation context
- If no workspace was selected in parent context: STOP and ask user to select
- Use the `SELECTED_WORKSPACE_IDS` passed from the parent skill
- Skip output mode prompts and default to inline chat (the parent skill controls the final output format)
When invoked standalone (direct user request):
- ALWAYS call `list_sentinel_workspaces` MCP tool FIRST
- If 1 workspace exists: Auto-select, display to user, proceed
- If multiple workspaces exist:
- Display all workspaces with Name and ID
- ASK: "Which Sentinel workspace should I use for this investigation?"
- ⛔ STOP AND WAIT for user response
- ⛔ DO NOT proceed until user explicitly selects
- If a query fails on the selected workspace:
- ⛔ DO NOT automatically try another workspace
- STOP and report the error
- Display available workspaces
- ASK user to select a different workspace
- WAIT for user response
Workspace Failure Handling
IF query returns "Failed to resolve table" or similar error:
- STOP IMMEDIATELY
- Report: "⚠️ Query failed on workspace [NAME] ([ID]). Error: [ERROR_MESSAGE]"
- Display: "Available workspaces: [LIST_ALL_WORKSPACES]"
- ASK: "Which workspace should I use instead?"
- WAIT for explicit user response
- DO NOT retry with a different workspace automatically
🔴 PROHIBITED ACTIONS:
- ❌ Selecting a workspace without user consent when multiple exist
- ❌ Switching to another workspace after a failure without asking
- ❌ Proceeding with investigation if workspace selection is ambiguous
- ❌ Assuming a workspace based on previous sessions
IoC Type Detection Rules:
| Pattern | IoC Type | Normalization |
|---|---|---|
| `\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}` | IPv4 Address | Validate octets ≤255 |
| `[a-fA-F0-9:]+` (with multiple colons) | IPv6 Address | Lowercase, expand if needed |
| `[a-zA-Z0-9][-a-zA-Z0-9]*\.[a-zA-Z]{2,}` | Domain | Lowercase, remove trailing dot |
| `https?://.*` or starts with `www.` | URL | Extract domain for separate analysis |
| 32 hex chars | MD5 Hash | Lowercase |
| 40 hex chars | SHA1 Hash | Lowercase |
| 64 hex chars | SHA256 Hash | Lowercase |
Date Range Rules:
- Real-time/recent searches: Add +2 days to current date for end range
- Historical ranges: Add +1 day to user's specified end date
- Example: Current date = Jan 23; "Last 7 days" → `datetime(2026-01-16)` to `datetime(2026-01-25)`
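The two date range rules reduce to simple date arithmetic. The Python sketch below is an illustration, with `recent_range`, `historical_range`, and `kql_datetime` as hypothetical helper names; it assumes the lookback is counted back from the current date, which matches the "Last 7 days" example.

```python
from datetime import date, timedelta


def recent_range(today, lookback_days):
    """Real-time/recent search: start = today - lookback, end = today + 2 days."""
    return today - timedelta(days=lookback_days), today + timedelta(days=2)


def historical_range(start, end):
    """Historical search: extend the user's end date by one day."""
    return start, end + timedelta(days=1)


def kql_datetime(d):
    """Format a date as a KQL datetime literal."""
    return f"datetime({d.isoformat()})"


start, end = recent_range(date(2026, 1, 23), 7)
print(kql_datetime(start), "to", kql_datetime(end))
```

Passing the current date in explicitly, rather than hardcoding it, avoids querying the wrong year.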
Available Investigation Types
IP Address Investigation
When to use: Suspicious inbound/outbound connections, firewall alerts, sign-in anomalies
Example prompts:
- "Investigate IP 203.0.113.42"
- "Is 198.51.100.10 malicious?"
- "Check threat intel for 192.0.2.1"
Data sources:
- Defender Threat Intelligence (IP alerts, statistics)
- DeviceNetworkEvents (connection history)
- ThreatIntelIndicators (Sentinel TI table)
- SigninLogs (if used for authentication)
- Defender IOC list (custom indicators)
- `enrich_ips.py` (3rd-party enrichment: ipinfo.io geo/ISP, vpnapi.io VPN/proxy/Tor, AbuseIPDB abuse score & reports, Shodan ports/services/CVEs/tags)
Domain Investigation
When to use: Suspicious DNS queries, phishing domains, C2 communication
Example prompts:
- "Investigate domain malware-c2.example.com"
- "Is evil.com in our threat intel?"
- "Check if any devices connected to suspicious.net"
Data sources:
- DeviceNetworkEvents (DNS queries, HTTP connections)
- EmailUrlInfo (email-delivered URLs)
- ThreatIntelIndicators (domain indicators)
- Defender IOC list (blocked domains)
- UrlClickEvents (user clicks on domain)
URL Investigation
When to use: Phishing links, malicious downloads, suspicious redirects
Example prompts:
- "Investigate URL https://phishing.example.com/login"
- "Was this URL clicked by anyone?"
- "Check threat intel for http://malware.site/payload.exe"
Data sources:
- EmailUrlInfo (URLs in emails)
- UrlClickEvents (click tracking)
- DeviceNetworkEvents (HTTP/HTTPS connections)
- DeviceFileEvents (downloads from URL)
- ThreatIntelIndicators (URL patterns)
File Hash Investigation
When to use: Malware analysis, suspicious executables, file reputation
Example prompts:
- "Investigate hash a1b2c3d4e5f6..."
- "Is this SHA256 known malware?"
- "Which devices have this file?"
Data sources:
- Defender File Info & Statistics
- Defender File Alerts
- Defender File Related Machines
- DeviceFileEvents (file creation/modification)
- ThreatIntelIndicators (file hash indicators)
Quick Start (TL;DR)
When a user requests an IoC investigation:
- Identify & Normalize IoC:
  - Detect IoC type (IP/Domain/URL/Hash)
  - Normalize format (lowercase, validate)
  - Extract embedded IoCs (domain from URL)
- Run Parallel Queries (Batch 1 - Threat Intel):
  - Sentinel ThreatIntelIndicators query
  - Defender Indicators lookup (ListDefenderIndicators)
  - Defender IP/File alerts (GetDefenderIpAlerts or GetDefenderFileAlerts)
  - Defender IP/File statistics
- Run 3rd-Party IP Enrichment (IP IoCs only): `python enrich_ips.py <IP_ADDRESS>`
  - ipinfo.io: Geolocation, ISP/ASN, hosting provider
  - vpnapi.io: VPN, proxy, Tor exit node detection
  - AbuseIPDB: Abuse confidence score, recent attack reports
  - Shodan: Open ports, services/banners, CVEs, tags (e.g., `c2`, `eol-os`, `self-signed`)
- Run Parallel Queries (Batch 2 - Activity):
  - DeviceNetworkEvents (connections involving IoC)
  - AlertEvidence (alerts with IoC as evidence)
  - SecurityAlert (alerts mentioning IoC)
  - EmailUrlInfo (if domain/URL)
- CVE & Vulnerability Correlation:
  - Extract CVE IDs from threat intel results AND Shodan enrichment
  - For each CVE: ListDefenderMachinesByVulnerability
  - Aggregate affected devices
- Export to JSON & Generate Summary: `temp/ioc_investigation_{ioc_normalized}_{timestamp}.json`
Execution Workflow
🚨 MANDATORY: Time Tracking Pattern
YOU MUST TRACK AND REPORT TIME AFTER EVERY MAJOR STEP:
[MM:SS] ✓ Step description (XX seconds)
Required Reporting Points:
- After IoC normalization and type detection
- After 3rd-party IP enrichment (IP IoCs)
- After Defender/Sentinel threat intelligence lookup
- After activity/connection analysis
- After CVE correlation and device enumeration
- After JSON file creation
- Final: Total elapsed time
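A small helper can produce the `[MM:SS] ✓ Step description (XX seconds)` line at each reporting point. This Python sketch is illustrative; `StepTimer` is a hypothetical name, and truncating to whole seconds is an assumption.

```python
import time


class StepTimer:
    """Emit '[MM:SS] ✓ Step description (XX seconds)' after each major step."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = self._last = clock()

    def mark(self, description):
        now = self._clock()
        elapsed = int(now - self._start)   # time since the investigation began
        step = int(now - self._last)       # time spent on this step only
        self._last = now
        minutes, seconds = divmod(elapsed, 60)
        line = f"[{minutes:02d}:{seconds:02d}] ✓ {description} ({step} seconds)"
        print(line)
        return line
```

The bracketed stamp is cumulative while the parenthesized value covers only the step just finished, which mirrors the timeline format used in the example workflow.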
Phase 1: IoC Identification and Normalization (REQUIRED FIRST)
Step 1.1: Detect IoC Type
# Regex patterns for IoC detection
IPv4: r'^(\d{1,3}\.){3}\d{1,3}$'
IPv6: r'^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$'
Domain: r'^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$'
URL: r'^https?://'
MD5: r'^[a-fA-F0-9]{32}$'
SHA1: r'^[a-fA-F0-9]{40}$'
SHA256: r'^[a-fA-F0-9]{64}$'
Step 1.2: Normalize IoC
- IP Address: Validate octets, detect IPv4 vs IPv6
- Domain: Lowercase, remove trailing dots, extract from URL if needed
- URL: Keep full URL, also extract domain for parallel investigation
- Hash: Lowercase
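Steps 1.1 and 1.2 can be combined into a single classifier. The sketch below is one possible Python rendering, not the skill's own implementation: `classify_ioc` is a hypothetical name, and it swaps the IPv4/IPv6 regexes for the standard `ipaddress` module, which validates octets and also accepts compressed IPv6 forms that the full-form IPv6 pattern above would miss. IPv6 values come back in lowercase compressed form; `ip.exploded` is available if the expanded form is needed.

```python
import ipaddress
import re
from urllib.parse import urlsplit

DOMAIN_RE = re.compile(r"^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$")
HASH_TYPES = {32: "md5", 40: "sha1", 64: "sha256"}


def classify_ioc(raw):
    """Return (ioc_type, normalized_value, extracted_domain_or_None)."""
    value = raw.strip()
    # URLs first: keep the full URL and extract the host for separate analysis.
    if re.match(r"^(https?://|www\.)", value, re.IGNORECASE):
        parsed = urlsplit(value if "://" in value else "//" + value)
        host = (parsed.hostname or "").rstrip(".")
        return "url", value, host or None
    # IP addresses: ipaddress rejects octets above 255 and malformed IPv6.
    try:
        ip = ipaddress.ip_address(value)
        return f"ipv{ip.version}", str(ip), None
    except ValueError:
        pass
    # Hashes are told apart purely by their hex length.
    if re.fullmatch(r"[a-fA-F0-9]+", value) and len(value) in HASH_TYPES:
        return HASH_TYPES[len(value)], value.lower(), None
    # Domains: lowercase and drop any trailing dot before matching.
    domain = value.lower().rstrip(".")
    if DOMAIN_RE.match(domain):
        return "domain", domain, None
    raise ValueError(f"unrecognized IoC: {raw!r}")
```

Checking URLs before domains matters because a URL always contains a host that would otherwise be misread as a bare domain.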
Step 1.3: Create Investigation Context
{
"ioc_type": "ip|domain|url|hash",
"ioc_value": "<normalized_value>",
"ioc_original": "<user_provided_value>",
"extracted_domain": "<if_url>",
"investigation_start": "<timestamp>",
"date_range_start": "<StartDate>",
"date_range_end": "<EndDate>"
}
Phase 2: 3rd-Party IP Enrichment (IP Address IoCs)
MANDATORY for all IP address investigations. Run enrich_ips.py to get external threat intelligence context that is NOT available from Defender/Sentinel native tools.
python enrich_ips.py <IP_ADDRESS_1> <IP_ADDRESS_2> ...
What it provides:
| Source | Intelligence |
|---|---|
| ipinfo.io | Geolocation (city, country, coordinates), ISP/ASN, organization, hosting provider detection |
| vpnapi.io | VPN, proxy, Tor exit node, relay detection |
| AbuseIPDB | Abuse confidence score (0-100), total reports, last reported date, recent reporter comments with attack categories |
| Shodan | Open ports, service/banner details, OS detection, known CVEs, tags (e.g., c2, eol-os, self-signed, honeypot), CPEs, hostnames |
Output: Per-IP detailed results printed to terminal + JSON export saved to temp/.
Integration with investigation:
- AbuseIPDB score ≥ 75: 🔴 Strong indicator of malicious activity — flag as high risk
- VPN/Proxy/Tor detected: 🟠 Potential evasion — note in risk assessment
- Shodan tags contain `c2`: 🔴 Known C2 infrastructure; escalate immediately
- Shodan CVEs found: Cross-reference with Phase 4 CVE correlation for organizational exposure
- Hosting provider (not residential ISP): 🟡 May indicate attacker infrastructure
Note: For domain and URL IoCs, extract the resolved IP(s) from DeviceNetworkEvents results and run enrichment on those IPs as a follow-up step.
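The integration rules can be turned into a flagging function over one enrichment record. This Python sketch is hypothetical: the output schema of `enrich_ips.py` is not shown here, so every key except `shodan_vulns` (`abuse_confidence_score`, `is_vpn`, `is_proxy`, `is_tor`, `shodan_tags`, `is_hosting`) is an assumed name, and the colour labels simply mirror the markers used in the list above.

```python
def enrichment_flags(record):
    """Map one enrichment record to a list of (colour, message) flags.

    Key names are assumptions for illustration; adapt them to the real output.
    """
    flags = []
    if record.get("abuse_confidence_score", 0) >= 75:
        flags.append(("red", "AbuseIPDB score >= 75: strong indicator of malicious activity"))
    if any(record.get(key) for key in ("is_vpn", "is_proxy", "is_tor")):
        flags.append(("orange", "VPN/proxy/Tor detected: potential evasion"))
    if "c2" in record.get("shodan_tags", []):
        flags.append(("red", "Shodan tag c2: known C2 infrastructure, escalate immediately"))
    if record.get("shodan_vulns"):
        flags.append(("note", "Shodan CVEs found: cross-reference with CVE correlation"))
    if record.get("is_hosting"):
        flags.append(("yellow", "Hosting provider: may indicate attacker infrastructure"))
    return flags
```

Returning every matching flag, rather than only the most severe one, keeps the supporting evidence available for the risk assessment.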
Phase 3: Parallel Threat Intelligence Collection (Defender & Sentinel)
CRITICAL: Run ALL threat intel queries in parallel for speed!
Batch 1: Threat Intelligence APIs (Run ALL in parallel)
| Query | Tool/API | IoC Types |
|---|---|---|
| Defender IOC List | ListDefenderIndicators ⚠️ | IP, Domain, URL |
| Defender IP Alerts | GetDefenderIpAlerts | IP |
| Defender IP Statistics | GetDefenderIpStatistics | IP |
| Defender File Alerts | GetDefenderFileAlerts | Hash |
| Defender File Info | GetDefenderFileInfo | Hash |
| Defender File Statistics | GetDefenderFileStatistics | Hash |
| Defender File Machines | GetDefenderFileRelatedMachines | Hash |
⚠️ ListDefenderIndicators Note: If result is written to file (>50KB), you MUST read and filter the file manually. See Custom IOC Management for required processing steps.
Batch 2: Sentinel KQL Queries (Run ALL in parallel)
| Query | Table | IoC Types |
|---|---|---|
| TI Indicators Match | ThreatIntelIndicators | All |
| Network Connections | DeviceNetworkEvents | IP, Domain, URL |
| Alert Evidence | AlertEvidence | All |
| Security Alerts | SecurityAlert | All |
| Email URLs | EmailUrlInfo | Domain, URL |
Phase 4: CVE Correlation and Vulnerability Management
Step 4.1: Extract CVE IDs from Threat Intel AND Enrichment
- Parse threat intel results for CVE references (pattern: `CVE-\d{4}-\d{4,}`)
- Extract from: alert descriptions, threat family info, MITRE techniques
- Extract from Shodan enrichment (`shodan_vulns` field from `enrich_ips.py` output)
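CVE extraction is a regex scan over free-text fields plus a merge with the Shodan list. A minimal Python sketch, assuming `shodan_vulns` is a list of CVE ID strings; `extract_cves` is a hypothetical helper name.

```python
import re

CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def extract_cves(*texts, shodan_vulns=()):
    """Collect unique, upper-cased CVE IDs from text fields and a Shodan vuln list."""
    found = set()
    for text in texts:
        # None or empty fields are tolerated.
        found.update(match.upper() for match in CVE_RE.findall(text or ""))
    found.update(v.upper() for v in shodan_vulns if CVE_RE.fullmatch(v))
    return sorted(found)
```

Deduplicating before Step 4.2 avoids querying affected devices twice for the same CVE.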
Step 4.2: Query Affected Devices per CVE
For each CVE_ID found:
→ ListDefenderMachinesByVulnerability(cveId: CVE_ID)
→ Collect: deviceId, deviceName, osPlatform, exposureLevel
Step 4.3: Aggregate Device Exposure
{
"cve_correlation": {
"cve_ids_found": ["CVE-2024-1234", "CVE-2024-5678"],
"affected_devices_by_cve": {
"CVE-2024-1234": [
{"deviceId": "...", "deviceName": "...", "osPlatform": "..."}
]
},
"total_unique_affected_devices": 15,
"critical_cves": 2,
"high_cves": 3
}
}
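The aggregation step mainly needs to count each device once even when it is exposed to several CVEs. The Python sketch below is an illustration; `aggregate_exposure` is a hypothetical name, and it omits `critical_cves` and `high_cves` because the source of CVE severity is not specified here.

```python
def aggregate_exposure(devices_by_cve):
    """Build a cve_correlation summary from {cve_id: [device dicts]}."""
    unique_device_ids = {
        device["deviceId"]
        for devices in devices_by_cve.values()
        for device in devices
    }
    return {
        "cve_ids_found": sorted(devices_by_cve),
        "affected_devices_by_cve": devices_by_cve,
        "total_unique_affected_devices": len(unique_device_ids),
    }
```

Counting by `deviceId` rather than summing per-CVE list lengths prevents the total from being inflated by devices that appear under more than one CVE.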
Phase 5: Activity and Connection Analysis
For IP Address IoCs:
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let IPAddress = '<IP_ADDRESS>';
DeviceNetworkEvents
| where Timestamp between (start .. end)
| where RemoteIP == IPAddress or LocalIP == IPAddress
| summarize
ConnectionCount = count(),
UniqueDevices = dcount(DeviceId),
FirstSeen = min(Timestamp),
LastSeen = max(Timestamp),
Ports = make_set(RemotePort),
Protocols = make_set(Protocol)
by ActionType
| order by ConnectionCount desc
For Domain IoCs:
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let Domain = '<DOMAIN>';
DeviceNetworkEvents
| where Timestamp between (start .. end)
| where RemoteUrl has Domain
| summarize
ConnectionCount = count(),
UniqueDevices = dcount(DeviceId),
FirstSeen = min(Timestamp),
LastSeen = max(Timestamp),
UniqueURLs = make_set(RemoteUrl, 10)
by DeviceName
| order by ConnectionCount desc
For File Hash IoCs:
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let Hash = '<HASH>';
union withsource=SourceTable DeviceProcessEvents, DeviceNetworkEvents, DeviceFileEvents, DeviceRegistryEvents, DeviceLogonEvents, DeviceImageLoadEvents, DeviceEvents
| where Timestamp between (start .. end)
| where SHA1 =~ Hash or SHA256 =~ Hash or MD5 =~ Hash or InitiatingProcessSHA256 =~ Hash
| summarize
EventCount = count(),
UniqueDevices = dcount(DeviceId),
FirstSeen = min(Timestamp),
LastSeen = max(Timestamp),
FileNames = make_set(FileName),
FolderPaths = make_set(FolderPath, 5)
by ActionType
| order by EventCount desc
Phase 6: Export to JSON
Create single JSON file: temp/ioc_investigation_{ioc_type}_{ioc_normalized}_{timestamp}.json
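The file name can be built from the investigation context. This Python sketch rests on two assumptions that the skill does not state: characters that are unsafe in file names (such as `:` and `/` in URLs or IPv6 addresses) are replaced with `_`, and the timestamp uses a `YYYYMMDD_HHMMSS` layout. `export_path` is a hypothetical helper name.

```python
import re
from datetime import datetime, timezone


def export_path(ioc_type, ioc_normalized, now=None):
    """Build temp/ioc_investigation_{ioc_type}_{ioc_normalized}_{timestamp}.json."""
    now = now or datetime.now(timezone.utc)
    # Assumption: anything outside letters, digits, '.', and '-' becomes '_'.
    safe = re.sub(r"[^A-Za-z0-9.-]+", "_", ioc_normalized).strip("_")
    return f"temp/ioc_investigation_{ioc_type}_{safe}_{now:%Y%m%d_%H%M%S}.json"
```

The resulting path is what would be handed to `create_file`, together with the serialized JSON content.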
Sample KQL Queries
Use these exact patterns with appropriate MCP tools. Replace <IOC_VALUE>, <StartDate>, <EndDate>.
⚠️ CRITICAL: START WITH THESE EXACT QUERY PATTERNS These queries have been tested and validated. Use them as your PRIMARY reference.
📅 Date Range Quick Reference
🔴 STEP 0: GET CURRENT DATE FIRST (MANDATORY) 🔴
- ALWAYS check the current date from the context header BEFORE calculating date ranges
- NEVER use hardcoded years - the year changes and you WILL query the wrong timeframe
RULE 1: Real-Time/Recent Searches (Current Activity)
- Add +2 days to current date for end range
- Why +2? One day covers the timezone offset and one more day makes the end of the current day inclusive
- Pattern: Today is Jan 23 → Use `datetime(2026-01-25)` as end date
RULE 2: Historical Searches (User-Specified Dates)
- Add +1 day to user's specified end date
- Why +1? To include all 24 hours of the final day
1. Threat Intelligence Indicator Match (Sentinel - limited to first 20 IoCs)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let ioc_value = '<IOC_VALUE>';
ThreatIntelIndicators
| where TimeGenerated between (start .. end)
| where IsActive == true and IsDeleted == false
| summarize arg_max(TimeGenerated, *) by Id
| where ObservableValue =~ ioc_value
or Pattern has ioc_value
| project
TimeGenerated,
Id,
ObservableKey,
ObservableValue,
Pattern,
Confidence,
ValidFrom,
ValidUntil,
Tags,
Data
| order by TimeGenerated desc
| take 20
2. IP Address - Network Connection Activity (Advanced Hunting)
let target_ip = '<IP_ADDRESS>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
DeviceNetworkEvents
| where Timestamp between (start .. end)
| where RemoteIP == target_ip or LocalIP == target_ip
| extend Direction = iff(RemoteIP == target_ip, "Outbound", "Inbound")
| summarize
TotalConnections = count(),
UniqueDevices = dcount(DeviceId),
UniquePorts = dcount(RemotePort),
FirstSeen = min(Timestamp),
LastSeen = max(Timestamp),
Devices = make_set(DeviceName, 10),
Ports = make_set(RemotePort, 20),
Protocols = make_set(Protocol),
ActionTypes = make_set(ActionType),
InitiatingProcesses = make_set(InitiatingProcessFileName, 10),
Direction = make_set(Direction,2)
3. IP Address - Detailed Connection Timeline (limited to first 20 events)
let target_ip = '<IP_ADDRESS>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
DeviceNetworkEvents
| where Timestamp between (start .. end)
| where RemoteIP == target_ip or LocalIP == target_ip
| project
Timestamp,
DeviceName,
DeviceId,
ActionType,
RemoteIP,
RemotePort,
RemoteUrl,
LocalIP,
LocalPort,
Protocol,
InitiatingProcessFileName,
InitiatingProcessCommandLine,
InitiatingProcessAccountName
| order by Timestamp desc
| take 20
4. Domain - DNS and HTTP Connection Activity
let target_domain = '<DOMAIN>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
DeviceNetworkEvents
| where Timestamp between (start .. end)
| where RemoteUrl has target_domain
| summarize
TotalConnections = count(),
UniqueDevices = dcount(DeviceId),
UniqueUsers = dcount(InitiatingProcessAccountName),
FirstSeen = min(Timestamp),
LastSeen = max(Timestamp),
Devices = make_set(DeviceName, 10),
URLs = make_set(RemoteUrl, 20),
Ports = make_set(RemotePort),
InitiatingProcesses = make_set(InitiatingProcessFileName, 10)
5. Domain - Detailed Connection Timeline (limited to first 20 events)
let target_domain = '<DOMAIN>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
DeviceNetworkEvents
| where Timestamp between (start .. end)
| where RemoteUrl has target_domain
| project
Timestamp,
DeviceName,
InitiatingProcessAccountName,
ActionType,
RemoteUrl,
RemoteIP,
RemotePort,
Protocol,
InitiatingProcessFileName,
InitiatingProcessCommandLine
| order by Timestamp desc
| take 20
6. URL - Email Delivery Analysis
let target_url = '<URL>';
let target_domain = '<DOMAIN>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
EmailUrlInfo
| where TimeGenerated between (start .. end)
| where Url == target_url or Url has target_domain or UrlDomain =~ target_domain
| summarize
EmailCount = dcount(NetworkMessageId),
UniqueURLs = make_set(Url, 10),
UrlLocations = make_set(UrlLocation),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated)
by UrlDomain
| order by EmailCount desc
7. File Hash - Device File Events
let target_hash = '<HASH>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
union withsource=SourceTable DeviceProcessEvents, DeviceNetworkEvents, DeviceFileEvents, DeviceRegistryEvents, DeviceLogonEvents, DeviceImageLoadEvents, DeviceEvents
| where Timestamp between (start .. end)
| where SHA1 =~ target_hash or SHA256 =~ target_hash or MD5 =~ target_hash or InitiatingProcessSHA256 =~ target_hash
| summarize
EventCount = count(),
UniqueDevices = dcount(DeviceId),
FirstSeen = min(Timestamp),
LastSeen = max(Timestamp),
Devices = make_set(DeviceName, 10),
FileNames = make_set(FileName, 10),
FolderPaths = make_set(FolderPath, 10),
ActionTypes = make_set(ActionType)
| extend HashType = case(
isnotempty(target_hash) and strlen(target_hash) == 32, "MD5",
isnotempty(target_hash) and strlen(target_hash) == 40, "SHA1",
isnotempty(target_hash) and strlen(target_hash) == 64, "SHA256",
"Unknown")
8. Alert Evidence - IoC in Alerts (limited to first 20 alerts)
let ioc_value = '<IOC_VALUE>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
AlertEvidence
| where TimeGenerated between (start .. end)
| where RemoteIP == ioc_value
or RemoteUrl has ioc_value
or SHA1 =~ ioc_value
or SHA256 =~ ioc_value
or FileName has ioc_value
or Title has ioc_value
or Categories has ioc_value
| project
TimeGenerated,
AlertId,
Title,
Severity,
Categories,
ServiceSource,
EntityType,
EvidenceRole,
RemoteIP,
RemoteUrl,
FileName,
SHA1,
SHA256,
DeviceName,
AccountName
| order by TimeGenerated desc
| take 20
9. Security Alerts Mentioning IoC
let ioc_value = '<IOC_VALUE>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
AlertEvidence
| where TimeGenerated between (start .. end)
| where RemoteIP == ioc_value
or RemoteUrl has ioc_value
or SHA1 =~ ioc_value
or SHA256 =~ ioc_value
or FileName has ioc_value
or Title has ioc_value
or Categories has ioc_value
| join AlertInfo on AlertId
| extend HostFullName = strcat(parse_json(parse_json(AdditionalFields).Host).HostName,".", parse_json(parse_json(AdditionalFields).Host).DnsDomain)
| extend OS = strcat(parse_json(parse_json(AdditionalFields).Host).OSFamily," ", parse_json(parse_json(AdditionalFields).Host).OSVersion)
| extend IsDomainJoined = parse_json(parse_json(AdditionalFields).Host).IsDomainJoined
| extend AffectedDevice = strcat(HostFullName,",", OS, ",IsDomainJoined: ", IsDomainJoined)
| summarize
AlertCount = dcount(AlertId),
Alerts = make_set(Title, 10),
Severities = make_set(Severity),
Categories = make_set(Category),
AttackTechniques = make_set(AttackTechniques),
AffectedDevices = make_set(AffectedDevice, 10)
10. Defender Custom IOC List Match
// Use Defender API: ListDefenderIndicators with filters
// indicatorType: "IpAddress" | "DomainName" | "Url" | "FileSha1" | "FileSha256" | "FileMd5"
// indicatorValue: "<IOC_VALUE>"
11. IP Address - Sign-in Analysis (Azure AD)
let target_ip = '<IP_ADDRESS>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated between (start .. end)
| where IPAddress == target_ip
| summarize
SignInCount = count(),
UniqueUsers = dcount(UserPrincipalName),
SuccessCount = countif(ResultType == '0'),
FailureCount = countif(ResultType != '0'),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated),
Users = make_set(UserPrincipalName, 10),
Apps = make_set(AppDisplayName, 10),
ResultTypes = make_set(ResultType)
| extend SuccessRate = round(100.0 * SuccessCount / SignInCount, 2)
12. CVE Extraction from Alerts
let ioc_value = '<IOC_VALUE>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
AlertEvidence
| where TimeGenerated between (start .. end)
| where RemoteIP == ioc_value
or RemoteUrl has ioc_value
or SHA1 =~ ioc_value
or SHA256 =~ ioc_value
or FileName has ioc_value
or Title has ioc_value
or Categories has ioc_value
| extend CVEs = extract_all(@"(CVE-\d{4}-\d{4,})", tostring(AttackTechniques))
| mv-expand CVE = CVEs
| where isnotempty(CVE)
| summarize
CVECount = dcount(tostring(CVE)),
CVEs = make_set(tostring(CVE)),
AlertCount = dcount(AlertId),
Alerts = make_set(Title, 5)
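The same CVE pattern used by `extract_all` above can be applied outside KQL, for example to Shodan enrichment text. This is a sketch under that assumption; `extract_cves` is a hypothetical helper name.

```python
import re

# Same pattern as the KQL query: "CVE-", a 4-digit year, then 4 or more digits.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")


def extract_cves(text: str) -> list[str]:
    """Return the unique CVE IDs found in text, in first-seen order."""
    return list(dict.fromkeys(CVE_PATTERN.findall(text)))
```

Like the KQL version, the match is case-sensitive, so lowercase `cve-...` strings are ignored unless the input is upper-cased first.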
Defender API Queries
IP Address Investigation
Get Alerts for IP:
Tool: GetDefenderIpAlerts (MCP)
Parameter: ipAddress = "<IP_ADDRESS>"
Returns: All security alerts associated with the IP
Get IP Statistics:
Tool: activate_file_and_ip_statistics_tools → GetDefenderIpStatistics
Parameter: ipAddress = "<IP_ADDRESS>"
Returns: Organization prevalence, device count, communication stats
Find Devices by IP:
Tool: FindDefenderMachinesByIp (MCP)
Parameters:
ipAddress = "<IP_ADDRESS>"
timestamp = "<DATETIME>" (ISO 8601 format)
Returns: Devices that communicated with the IP within ±15 minutes of the timestamp
File Hash Investigation
Get File Info:
Tool: activate_file_and_ip_statistics_tools → GetDefenderFileInfo
Parameter: fileHash = "<SHA1_OR_SHA256>"
Returns: File details, global prevalence, threat determination
Get File Statistics:
Tool: activate_file_and_ip_statistics_tools → GetDefenderFileStatistics
Parameter: fileHash = "<SHA1_OR_SHA256>"
Returns: Organization statistics, device count, global stats
Get File Alerts:
Tool: GetDefenderFileAlerts (MCP)
Parameter: fileHash = "<SHA1_OR_SHA256>"
Returns: All alerts associated with the file
Get Devices with File:
Tool: GetDefenderFileRelatedMachines (MCP)
Parameter: fileHash = "<SHA1_OR_SHA256>"
Returns: All devices where file was observed
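The file tools above take a SHA1 or SHA256 value, while the KQL file hash query also accepts MD5 and labels the hash by length (32, 40, or 64 characters). A small sketch of that length rule, useful for deciding which tools apply before calling them. The helper names are hypothetical, and the hex-character check is an added assumption that the KQL query itself does not perform.

```python
import string

# Length-to-type mapping taken from the HashType case() in the file hash query.
_HASH_LENGTHS = {32: "MD5", 40: "SHA1", 64: "SHA256"}


def classify_hash(value: str) -> str:
    """Classify a hash string as MD5, SHA1, SHA256, or Unknown."""
    candidate = value.strip().lower()
    if candidate and all(ch in string.hexdigits for ch in candidate):
        return _HASH_LENGTHS.get(len(candidate), "Unknown")
    return "Unknown"


def usable_with_defender_file_tools(value: str) -> bool:
    """True when the hash is a SHA1 or SHA256, as the file tools expect."""
    return classify_hash(value) in {"SHA1", "SHA256"}
```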
Vulnerability Management
List Devices Affected by CVE:
Tool: ListDefenderMachinesByVulnerability (MCP)
Parameter: cveId = "CVE-YYYY-NNNNN"
Returns: All devices vulnerable to the CVE with exposure details
Get Device Vulnerabilities:
Tool: GetDefenderMachineVulnerabilities (MCP)
Parameter: id = "<DEVICE_ID>"
Returns: All CVEs affecting the specific device
Custom IOC Management
Search Existing IOCs:
Tool: ListDefenderIndicators (MCP)
Parameters (all optional):
indicatorType = "IpAddress" | "DomainName" | "Url" | "FileSha1" | "FileSha256" | "FileMd5"
indicatorValue = "<IOC_VALUE>"
action = "Alert" | "Block" | "Allow"
severity = "Informational" | "Low" | "Medium" | "High"
Returns: Matching custom indicators in tenant
⚠️ CRITICAL: Processing Large ListDefenderIndicators Results
The ListDefenderIndicators API may return ALL custom indicators in the tenant regardless of filter parameters. When results are large (>50KB), they are written to a temporary file instead of returned inline.
MANDATORY Processing Steps:
- If result says "Large tool result written to file":
  - Use the read_file tool to read the content file path provided
  - Parse the JSON response to extract the value array
  - Manually filter for the target IoC using case-insensitive matching:
    # Filter logic for IP address
    matches = [ind for ind in indicators["value"] if ind.get("indicatorValue", "").lower() == target_ioc.lower()]
  - Report: "Found X custom indicator(s) matching [IOC]" or "No custom indicators match [IOC]"
- If result is inline JSON with empty value array:
  - Report: "No custom indicators found for [IOC]"
🔴 PROHIBITED:
- ❌ Assuming "large result = no match" without reading and filtering the file
- ❌ Reporting "Not in IOC list" without verifying the actual content
- ❌ Skipping file processing due to result size
Example - Correct Processing:
1. Call: ListDefenderIndicators(indicatorType: "IpAddress", indicatorValue: "203.0.113.42")
2. Result: "Large tool result (69KB) written to file: /path/to/content.json"
3. Action: read_file(/path/to/content.json)
4. Parse: Extract value array from JSON
5. Filter: Search for indicatorValue == "203.0.113.42" (case-insensitive)
6. Report: "No custom indicators match 203.0.113.42" OR "Found 1 custom indicator: [details]"
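The processing steps above can be sketched as follows, assuming the content file holds a JSON object with a `value` array whose entries carry an `indicatorValue` field, as the filter logic in this section implies. The function names are hypothetical.

```python
import json
from pathlib import Path


def load_indicator_file(content_path: str) -> dict:
    """Read the content file written for a large tool result and parse it."""
    return json.loads(Path(content_path).read_text(encoding="utf-8"))


def filter_indicators(payload: dict, target_ioc: str) -> list[dict]:
    """Return indicators whose indicatorValue matches target_ioc, ignoring case."""
    wanted = target_ioc.strip().lower()
    return [
        ind
        for ind in payload.get("value", [])
        if str(ind.get("indicatorValue", "")).lower() == wanted
    ]


def report(matches: list[dict], target_ioc: str) -> str:
    """Build the report line only after the content has actually been filtered."""
    if matches:
        return f"Found {len(matches)} custom indicator(s) matching {target_ioc}"
    return f"No custom indicators match {target_ioc}"
```

Keeping the loader separate from the filter means the same filter also works on inline JSON results that never touch a file.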
JSON Export Structure
Create file: temp/ioc_investigation_{ioc_type}_{ioc_normalized}_{timestamp}.json
{
"investigation_metadata": {
"ioc_type": "ip|domain|url|hash",
"ioc_value": "<normalized_value>",
"ioc_original": "<user_input>",
"investigation_timestamp": "<ISO8601>",
"date_range_start": "<StartDate>",
"date_range_end": "<EndDate>",
"elapsed_time_seconds": 45
},
"threat_intelligence": {
"sentinel_ti_matches": [],
"defender_ioc_matches": [],
"defender_alerts": [],
"threat_families": [],
"confidence_score": 0-100,
"verdict": "Malicious|Suspicious|Clean|Unknown"
},
"ip_enrichment": {
"geo": { "city": "", "country": "", "org": "", "isp": "" },
"vpn_proxy_tor": { "is_vpn": false, "is_proxy": false, "is_tor": false },
"abuseipdb": { "abuse_confidence_score": 0, "total_reports": 0, "last_reported": "", "recent_categories": [] },
"shodan": { "ports": [], "services": [], "vulns": [], "tags": [], "os": "", "hostnames": [], "cpes": [] }
},
"activity_analysis": {
"network_connections": {
"total_connections": 0,
"unique_devices": 0,
"unique_users": 0,
"first_seen": "<datetime>",
"last_seen": "<datetime>",
"top_devices": [],
"top_ports": [],
"top_processes": []
},
"email_delivery": {
"email_count": 0,
"unique_urls": [],
"delivery_locations": []
},
"file_activity": {
"event_count": 0,
"unique_devices": 0,
"file_names": [],
"folder_paths": [],
"action_types": []
},
"signin_activity": {
"signin_count": 0,
"unique_users": 0,
"success_rate": 0,
"affected_users": []
}
},
"alert_correlation": {
"total_alerts": 0,
"severity_breakdown": {
"high": 0,
"medium": 0,
"low": 0,
"informational": 0
},
"alert_titles": [],
"attack_techniques": [],
"affected_entities": []
},
"cve_correlation": {
"cve_ids_found": [],
"affected_devices_by_cve": {},
"total_unique_affected_devices": 0,
"cve_severity_breakdown": {
"critical": 0,
"high": 0,
"medium": 0,
"low": 0
}
},
"organizational_exposure": {
"total_affected_devices": 0,
"affected_device_list": [],
"exposure_level": "High|Medium|Low|None",
"recommended_actions": []
},
"risk_assessment": {
"overall_risk": "Critical|High|Medium|Low|Informational",
"risk_factors": [],
"mitigating_factors": [],
"confidence": "High|Medium|Low"
}
}
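A sketch of how the export file name could be built from the template `temp/ioc_investigation_{ioc_type}_{ioc_normalized}_{timestamp}.json`. The source does not specify how the IoC value is made filename-safe or which timestamp format to use; both choices below (underscores for non-alphanumeric characters, UTC `YYYYMMDD_HHMMSS`) are assumptions, and `export_path` is a hypothetical helper.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def export_path(ioc_type: str, ioc_value: str, now: Optional[datetime] = None) -> str:
    """Build the JSON export path for one investigation."""
    moment = now or datetime.now(timezone.utc)
    # Assumption: collapse anything that is not a letter or digit into "_".
    normalized = re.sub(r"[^A-Za-z0-9]+", "_", ioc_value.strip().lower()).strip("_")
    # Assumption: compact UTC timestamp; the source only says {timestamp}.
    stamp = moment.strftime("%Y%m%d_%H%M%S")
    return f"temp/ioc_investigation_{ioc_type}_{normalized}_{stamp}.json"
```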
Error Handling
Common Issues and Solutions
| Issue | Solution |
|---|---|
| No TI matches found | IoC may be unknown; proceed with activity analysis |
| Defender API returns 404 | IoC not in organization's scope; check Sentinel data |
| Empty DeviceNetworkEvents | Expand date range or check if MDE is deployed |
| CVE not found in vulnerability DB | CVE may be too new or not applicable to org assets |
| Multiple IoC types detected | Investigate each separately, correlate results |
| Rate limiting on API calls | Add delays between API calls, batch where possible |
| ListDefenderIndicators returns large file | Read file with read_file, parse JSON, manually filter for target IoC value |
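For the rate limiting row, one way to add delays between API calls is an exponential backoff wrapper. This is only a sketch: `RateLimitError` is a hypothetical stand-in for whatever throttling error the actual client raises, and the retry count and delays are arbitrary defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical placeholder for the client's throttling error."""


def call_with_backoff(
    call: Callable[[], T],
    retries: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), waiting base_delay * 2**attempt seconds after each throttle."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == retries:
                raise  # Retries exhausted: surface the error to the caller.
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable: the loop always returns or raises")
```

The `sleep` parameter is injectable so the wrapper can be exercised without real waiting.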
Required Field Defaults
If queries return no results, use these defaults:
{
"threat_intelligence": {
"sentinel_ti_matches": [],
"defender_alerts": [],
"verdict": "Unknown",
"confidence_score": 0
},
"activity_analysis": {
"network_connections": {
"total_connections": 0,
"unique_devices": 0
}
},
"cve_correlation": {
"cve_ids_found": [],
"affected_devices_by_cve": {},
"total_unique_affected_devices": 0
}
}
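A minimal sketch of applying these defaults to a partial result, assuming results are plain nested dictionaries shaped like the JSON export structure. `apply_defaults` is a hypothetical helper; values found by queries always win over defaults.

```python
import copy

REQUIRED_DEFAULTS = {
    "threat_intelligence": {
        "sentinel_ti_matches": [],
        "defender_alerts": [],
        "verdict": "Unknown",
        "confidence_score": 0,
    },
    "activity_analysis": {
        "network_connections": {"total_connections": 0, "unique_devices": 0},
    },
    "cve_correlation": {
        "cve_ids_found": [],
        "affected_devices_by_cve": {},
        "total_unique_affected_devices": 0,
    },
}


def apply_defaults(result: dict, defaults: dict = REQUIRED_DEFAULTS) -> dict:
    """Return a copy of result with any missing required fields filled in."""
    merged = copy.deepcopy(defaults)
    for key, value in result.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_defaults(value, merged[key])  # Merge nested sections.
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```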
Example Workflows
Example 1: IP Address Investigation
User says: "Investigate IP 203.0.113.42 for the last 7 days"
Workflow:
- Identify IoC: IPv4 Address, normalized: 203.0.113.42
- 3rd-Party Enrichment: python enrich_ips.py 203.0.113.42 → Get geo, ISP, VPN/proxy/Tor flags, AbuseIPDB score, Shodan ports/CVEs/tags
- Phase 1 - Threat Intel (parallel):
  - GetDefenderIpAlerts(ipAddress: "203.0.113.42")
  - Sentinel ThreatIntelIndicators query
  - ListDefenderIndicators(indicatorType: "IpAddress", indicatorValue: "203.0.113.42")
- Phase 2 - Activity Analysis (parallel):
  - DeviceNetworkEvents query for IP
  - SigninLogs query for IP
  - AlertEvidence query for IP
- Phase 3 - CVE Correlation:
  - Extract CVEs from alerts AND Shodan enrichment
  - For each CVE: ListDefenderMachinesByVulnerability
- Export JSON and summarize findings (include enrichment data in JSON export)
Example 2: Domain Investigation
User says: "Is evil-malware.com in our environment?"
Workflow:
- Identify IoC: Domain, normalized: evil-malware.com
- Phase 1 - Threat Intel (parallel):
  - Sentinel ThreatIntelIndicators query
  - ListDefenderIndicators(indicatorType: "DomainName", indicatorValue: "evil-malware.com")
- Phase 2 - Activity Analysis (parallel):
  - DeviceNetworkEvents query for domain
  - EmailUrlInfo query for domain
  - AlertEvidence query for domain
- Phase 3 - Exposure Assessment:
  - List all devices that connected
  - Identify affected users
- Export JSON and summarize findings
Example 3: File Hash Investigation with CVE Correlation
User says: "Investigate SHA256 a1b2c3... and check which devices are vulnerable"
Workflow:
- Identify IoC: SHA256 Hash, normalized: a1b2c3...
- Phase 1 - Threat Intel (parallel):
  - GetDefenderFileInfo(fileHash: "a1b2c3...")
  - GetDefenderFileAlerts(fileHash: "a1b2c3...")
  - GetDefenderFileStatistics(fileHash: "a1b2c3...")
- Phase 2 - Device Exposure:
  - GetDefenderFileRelatedMachines(fileHash: "a1b2c3...")
  - DeviceFileEvents query
- Phase 3 - CVE Correlation:
  - Extract CVEs from file threat family info
  - For each CVE: ListDefenderMachinesByVulnerability
  - Cross-reference with devices that have the file
- Export JSON and summarize with remediation priorities
Security Notes
- All investigations are logged for audit purposes
- IoC values may be sensitive - handle with care
- Follow organizational data classification policies
- Consider threat actor attribution implications
- Document investigation actions for incident timeline
Integration with Other Skills
This skill can be combined with:
- user-investigation: When IoC is found in user's sign-in logs
- computer-investigation: When IoC is found on specific device
- authentication-tracing: When IoC IP appears in auth anomalies
- ca-policy-investigation: When IoC triggers conditional access events
Cross-skill pivot example: "Investigate IP 203.0.113.42" → Found in user sign-ins → "Investigate user@domain.com" using user-investigation skill
.github/skills/mcp-usage-monitoring/SKILL.md
npx skills add SCStelz/security-investigator --skill mcp-usage-monitoring -g -y
SKILL.md
Frontmatter
{
"name": "mcp-usage-monitoring",
"description": "Use this skill when asked to monitor, audit, or analyze MCP (Model Context Protocol) server usage in the environment. Triggers on keywords like \"MCP usage\", \"MCP server monitoring\", \"MCP activity\", \"Graph MCP\", \"Sentinel MCP\", \"Azure MCP\", \"MCP audit\", \"tool usage monitoring\", \"MCP breakdown\", \"who is using MCP\", or when investigating MCP user activity, Graph API calls from MCP servers, or workspace query governance. This skill provides comprehensive MCP server telemetry analysis across Graph MCP, Sentinel MCP, and Azure MCP servers including usage trends, endpoint access patterns, user attribution, cross-server user analysis, sensitive API detection, workspace query governance, and security risk assessment with inline and markdown file reporting.",
"drill_down_prompt": "Run MCP usage monitoring report — Graph\/Sentinel\/Azure MCP activity, user attribution",
"threat_pulse_domains": [
"admin"
]
}
MCP Server Usage Monitoring — Instructions
Purpose
This skill monitors and audits Model Context Protocol (MCP) server usage across your Microsoft Sentinel and Defender XDR environment. MCP servers are AI-powered tools that enable language models to interact with Microsoft security services — and like any privileged access channel, they require monitoring.
What this skill tracks:
| MCP Server | Telemetry Source | Key Identifier |
|---|---|---|
| Microsoft Graph MCP Server | MicrosoftGraphActivityLogs | AppId = e8c77dc2-69b3-43f4-bc51-3213c9d915b4 |
| Sentinel Data Lake MCP | CloudAppEvents | RecordType 403, Interface = IMcpToolTemplate |
| Sentinel Triage MCP | MicrosoftGraphActivityLogs + SigninLogs | AppId = 7b7b3966-1961-47b5-b080-43ca5482e21c ("Microsoft Defender Mcp"); dedicated AppId with full user attribution via delegated cert auth |
| Azure MCP Server | AzureActivity | No dedicated AppId; uses DefaultAzureCredential |
| Sentinel Data Lake (Direct KQL) | CloudAppEvents | RecordType 379, Operation = KQLQueryCompleted |
| Workspace Query Sources (Analytics Tier) | LAQueryLogs | All clients querying Log Analytics workspace |
What this skill detects:
- Graph API call volume, trends, and endpoint diversity via MCP
- Sensitive/high-risk Graph endpoint access (PIM, credentials, Identity Protection)
- Sentinel workspace query patterns by client application
- User vs. Service Principal attribution across all MCP channels
- Cross-server user analysis — identifies users with broadest MCP footprint (multiple server types, highest call volume)
- Azure ARM operations potentially originating from Azure MCP Server
- Non-MCP platform query sources for governance context (Sentinel Engine, Logic Apps)
- Sentinel Data Lake MCP tool usage: tool call breakdown (query_lake, list_sentinel_workspaces, search_tables, etc.), success/failure rates, execution duration, tables accessed via CloudAppEvents (Purview unified audit)
- MCP-driven vs Direct KQL delineation: distinguishes Data Lake queries initiated via MCP tools (RecordType 403, Interface IMcpToolTemplate) from direct KQL queries (RecordType 379) and Analytics tier queries (LAQueryLogs)
- Anomalous access patterns: new users, new endpoints, volume spikes, error surges
- MCP server usage as a proportion of total workspace activity
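The MCP-driven vs Direct KQL delineation listed above reduces to two checks on a CloudAppEvents record. A sketch, assuming the record type and interface have already been extracted from the event as an integer and a string; `classify_data_lake_event` and its return labels are hypothetical.

```python
def classify_data_lake_event(record_type: int, interface: str = "") -> str:
    """Label a Data Lake audit record by how the query was initiated."""
    if record_type == 403 and interface == "IMcpToolTemplate":
        return "mcp_tool_call"  # Initiated through a Sentinel Data Lake MCP tool.
    if record_type == 379:
        return "direct_kql"  # Direct KQL query (Operation = KQLQueryCompleted).
    return "other"
```

Analytics tier queries are not covered here because they are recorded in `LAQueryLogs`, not `CloudAppEvents`.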
Extended landscape awareness: Beyond these four actively monitored MCP servers, Microsoft's MCP ecosystem includes 30+ additional servers (Copilot Studio built-in catalog, Power BI, Fabric RTI, Playwright, Security Copilot Agent Creation, and more). See Extended Microsoft MCP Server Landscape for the full catalog, telemetry surfaces, and monitoring expansion priorities.
📑 TABLE OF CONTENTS
- Critical Workflow Rules - Start here!
- Extended MCP Server Landscape - Full Microsoft MCP ecosystem catalog
- Output Modes - Inline chat vs. Markdown file
- Scalability & Token Management - Guidance for large environments
- Quick Start - 10-step investigation pattern
- MCP Usage Score Formula - Composite health & risk scoring
- Execution Workflow - Complete 7-phase process
- Sample KQL Queries - Validated query patterns
- Report Template - Output format specification
- Proactive Alerting — KQL Data Lake Jobs - Scheduled anomaly detection
- Known Pitfalls - Edge cases and false positives
- Error Handling - Troubleshooting guide
- SVG Dashboard Generation - Visual dashboard from completed report
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
Before starting ANY MCP usage monitoring analysis:
- ALWAYS enforce Sentinel workspace selection (see Workspace Selection section below)
- ALWAYS ask the user for output mode if not specified: inline chat summary or markdown file report (or both)
- ALWAYS ask the user for time range if not specified: default to 30 days, configurable
- ALWAYS query all MCP telemetry surfaces — do not skip any MCP server type
- ALWAYS include non-MCP workspace context (Sentinel Engine, Logic Apps) for governance proportion analysis
- ALWAYS run independent queries in parallel for performance
- ALWAYS attribute activity to specific users — never present anonymous aggregates
- NEVER conflate non-MCP platform activity with MCP activity — clearly label categories
- ALWAYS execute pre-authored queries from Sample KQL Queries EXACTLY as written; substitute only the time range parameter (e.g., ago(30d) → ago(90d)). These queries encode mitigations for schema pitfalls documented in Known Pitfalls. Writing equivalent queries from scratch is ❌ PROHIBITED
Known AppIds Reference
MCP Servers & AI Agents
| AppId | Service | Telemetry Table | Notes |
|---|---|---|---|
| e8c77dc2-69b3-43f4-bc51-3213c9d915b4 | Microsoft Graph MCP Server for Enterprise | MicrosoftGraphActivityLogs | Read-only Graph API proxy |
| 7b7b3966-1961-47b5-b080-43ca5482e21c | Sentinel Triage MCP ("Microsoft Defender Mcp") | MicrosoftGraphActivityLogs, SigninLogs, AADNonInteractiveUserSignInLogs | Microsoft first-party AppId, same across all tenants. Dedicated AppId, visible in MicrosoftGraphActivityLogs (API calls to /security/* endpoints) and SigninLogs/AADNonInteractiveUserSignInLogs (AppDisplayName = "Microsoft Defender Mcp"). Delegated auth with certificate (ClientAuthMethod=2), full user attribution. Scopes: SecurityAlert.Read.All, SecurityIncident.Read.All, ThreatHunting.Read.All. Target resources: Microsoft Graph, WindowsDefenderATP. No local SPN; display name only visible in SigninLogs. 🔴 Confirmed Feb 2026: Empirical telemetry investigation identified 7b7b3966 as the Triage MCP AppId via MicrosoftGraphActivityLogs + SigninLogs correlation. |
| 253895df-6bd8-4eaf-b101-1381ec4306eb | Sentinel Platform Services App Reg | SigninLogs | Sentinel-hosted MCP platform |
| 04b07795-8ddb-461a-bbee-02f9e1bf7b46 | Azure MCP Server (local stdio via DefaultAzureCredential → Azure CLI) | SigninLogs, AADNonInteractiveUserSignInLogs, LAQueryLogs | Shared AppId with Azure CLI. In LAQueryLogs, RequestClientApp is empty (not a unique fingerprint). Azure MCP appends \n\| limit N to query text, which is the only query-level differentiator. Read-only ARM ops don't appear in AzureActivity. 🔄 Updated Feb 2026: Previously documented as AppId 1950a258 (AzurePowerShellCredential) with csharpsdk,LogAnalyticsPSClient. That fingerprint is obsolete; only 1 occurrence found in 30-day lookback. |
| (none; uses DefaultAzureCredential) | Azure MCP Server (local stdio) | AzureActivity | ARM write operations only; read ops not logged. Claims.appid = 04b07795. Inherits cred from Azure CLI/VS Code |
| (no AppId; Purview unified audit) | Sentinel Data Lake MCP | CloudAppEvents | RecordType 403; Interface IMcpToolTemplate; tools: query_lake, list_sentinel_workspaces, search_tables |
Sentinel MCP Collection Endpoints
| Endpoint URL | Collection | Monitored |
|---|---|---|
| https://sentinel.microsoft.com/mcp/data-exploration | Data Exploration (Data Lake MCP) | ✅ Phase 3 |
| https://sentinel.microsoft.com/mcp/triage | Triage (Triage MCP) | ✅ Phase 2 |
| https://sentinel.microsoft.com/mcp/security-copilot-agent-creation | Security Copilot Agent Creation | ❌ See Landscape |
Client Applications
| AppId | Service | Telemetry Table | Notes |
|---|---|---|---|
| aebc6443-996d-45c2-90f0-388ff96faa56 | Visual Studio Code | SigninLogs | VS Code as MCP client → Sentinel |
| 9ba5f2e4-6bbf-4df2-b19b-7f1bcb926818 | PowerPlatform-sentinelmcp-Connector | SigninLogs | Copilot Studio → Sentinel MCP |
| 04b07795-8ddb-461a-bbee-02f9e1bf7b46 | Azure CLI (DefaultAzureCredential) | SigninLogs, AADNonInteractiveUserSignInLogs, LAQueryLogs | Primary Azure MCP Server credential path (field-tested Feb 2026). RequestClientApp is empty in LAQueryLogs. Azure MCP appends \n\| limit N to query text. Shared AppId with manual az CLI; disambiguate via query text pattern or session correlation. 🔄 Previously documented as 1950a258 (AzurePowerShellCredential); that path is obsolete |
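Because the Azure CLI AppId is shared, the appended `\n| limit N` suffix is the main query-level hint described above. A heuristic sketch for checking it on an `LAQueryLogs` row, assuming the query text and `RequestClientApp` value have been pulled out as strings. The function name is hypothetical, and a manual query that happens to end with its own `| limit N` line would also match, so treat a hit as a lead for session correlation, not as proof.

```python
import re

# A newline, a pipe, "limit", and a number at the very end of the query text.
_LIMIT_SUFFIX = re.compile(r"\n\|\s*limit\s+\d+\s*$")


def looks_like_azure_mcp_query(query_text: str, request_client_app: str = "") -> bool:
    """Heuristic: empty RequestClientApp plus a trailing "| limit N" line."""
    return request_client_app == "" and bool(_LIMIT_SUFFIX.search(query_text))
```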
Portal & Platform Applications (Non-MCP — for context)
| AppId | Service | Telemetry Table | Notes |
|---|---|---|---|
| 80ccca67-54bd-44ab-8625-4b79c4dc7775 | M365 Security & Compliance Center (Sentinel Portal) | LAQueryLogs | ASI_Portal, ASI_Portal_Connectors: Sentinel Portal backend, NOT an MCP server |
| 95a5d94c-a1a0-40eb-ac6d-48c5bdee96d5 | Azure Portal (AppInsightsPortalExtension) | LAQueryLogs | Azure Portal blade for Log Analytics Usage dashboards/workbooks. RequestClientApp = AppInsightsPortalExtension. Executes billing/usage queries (e.g., Usage \| where IsBillable). NOT MCP, NOT VS Code; runs when user opens Workspace Usage Dashboard in browser. No SPN or app registration in tenant (platform-level first-party app). Not in merill/microsoft-info known apps list. |
| de8c33bb-995b-4d4a-9d04-8d8af5d59601 | PowerPlatform-AzureMonitorLogs-Connector | AADNonInteractiveUserSignInLogs, LAQueryLogs | Logic Apps → Log Analytics (NOT MCP) |
| fc780465-2017-40d4-a0c5-307022471b92 | Sentinel Engine (analytics rules, UEBA, Advanced Hunting backend) | LAQueryLogs | Built-in scheduled query engine (NOT MCP). Also serves as the execution backend for Advanced Hunting: RequestClientApp = "M365D_AdvancedHunting" indicates AH queries from Triage MCP, Defender portal, or Security Copilot that hit connected LA tables (see Query 7). Separate from analytics rules (RequestClientApp empty or other values). |
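When post-processing query results, the AppId reference tables above can be turned into a lookup so that non-MCP platform activity is never conflated with MCP activity. This is a sketch: the AppIds and service names come from the tables, while the category labels (`mcp`, `client`, `non-mcp`, `ambiguous`) and the function name are assumptions made for illustration.

```python
# AppIds and names from the reference tables; category labels are illustrative.
KNOWN_APP_IDS = {
    "e8c77dc2-69b3-43f4-bc51-3213c9d915b4": ("Microsoft Graph MCP Server for Enterprise", "mcp"),
    "7b7b3966-1961-47b5-b080-43ca5482e21c": ("Sentinel Triage MCP", "mcp"),
    "253895df-6bd8-4eaf-b101-1381ec4306eb": ("Sentinel Platform Services App Reg", "mcp"),
    # Shared by the Azure CLI and the Azure MCP Server, so it cannot be labeled on AppId alone.
    "04b07795-8ddb-461a-bbee-02f9e1bf7b46": ("Azure CLI / Azure MCP Server", "ambiguous"),
    "aebc6443-996d-45c2-90f0-388ff96faa56": ("Visual Studio Code", "client"),
    "9ba5f2e4-6bbf-4df2-b19b-7f1bcb926818": ("PowerPlatform-sentinelmcp-Connector", "client"),
    "80ccca67-54bd-44ab-8625-4b79c4dc7775": ("M365 Security & Compliance Center", "non-mcp"),
    "95a5d94c-a1a0-40eb-ac6d-48c5bdee96d5": ("Azure Portal AppInsightsPortalExtension", "non-mcp"),
    "de8c33bb-995b-4d4a-9d04-8d8af5d59601": ("PowerPlatform-AzureMonitorLogs-Connector", "non-mcp"),
    "fc780465-2017-40d4-a0c5-307022471b92": ("Sentinel Engine", "non-mcp"),
}


def classify_app_id(app_id: str) -> tuple[str, str]:
    """Return (service name, category) for an AppId, or an unknown marker."""
    return KNOWN_APP_IDS.get(app_id.strip().lower(), ("Unknown application", "unknown"))
```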
Extended Microsoft MCP Server Landscape (Reference)
Beyond the four MCP servers actively monitored by this skill, Microsoft's MCP ecosystem includes many additional servers. This section catalogs them for awareness, threat modeling, and future monitoring expansion.
Sentinel MCP Collections (Microsoft-Hosted)
Microsoft Sentinel exposes three official MCP collections, each at a distinct endpoint:
| Collection | Endpoint URL | Purpose | Monitored by This Skill |
|---|---|---|---|
| Data Exploration | https://sentinel.microsoft.com/mcp/data-exploration | query_lake, search_tables, list_sentinel_workspaces, entity analyzer | ✅ Phase 3 (CloudAppEvents) |
| Triage | https://sentinel.microsoft.com/mcp/triage | Incident triage, Advanced Hunting, entity investigation | ✅ Phase 2 (MicrosoftGraphActivityLogs + SigninLogs, AppId 7b7b3966) |
| Security Copilot Agent Creation | https://sentinel.microsoft.com/mcp/security-copilot-agent-creation | Create Microsoft Security Copilot agents for complex workflows | ❌ Not yet monitored |
Sentinel Custom MCP Tools: Organizations can create their own MCP tools by exposing saved KQL queries from Advanced Hunting as MCP tools. These execute through the same Sentinel MCP infrastructure and are audited in CloudAppEvents (RecordType 403) alongside built-in tools. See Create custom Sentinel MCP tools.
🔵 Monitoring note: Custom MCP tools appear in CloudAppEvents with the same RecordType 403 and IMcpToolTemplate interface as built-in tools. The ToolName field will show the custom tool name, making them visible in Query 13 without modification.
Power BI MCP Servers
| Server | Type | Endpoint / Repo | Purpose | Telemetry Surface |
|---|---|---|---|---|
| Power BI Remote MCP | Microsoft-hosted | https://api.fabric.microsoft.com/v1/mcp/powerbi | Query Power BI datasets, reports, and workspaces remotely via SSE transport | 🟡 PowerBIActivity table (if ingested into Sentinel), Fabric audit logs |
| Power BI Modeling MCP | Local (stdio) | microsoft/powerbi-modeling-mcp | Local Power BI model operations (DAX queries, schema exploration) | ❌ Local only — no Azure telemetry |
⚠️ Data exfiltration risk: Power BI Remote MCP provides API-based access to organizational datasets. If an AI agent connects to this endpoint, it can query sensitive business data. Monitor PowerBIActivity for unusual access patterns if this table is available in your Sentinel workspace.
Fabric & Azure Data Explorer MCP Servers
| Server | Type | Endpoint / Repo | Purpose | Telemetry Surface |
|---|---|---|---|---|
| Fabric RTI MCP Server | Local (stdio) | microsoft/fabric-rti-mcp | Query Azure Data Explorer clusters and Fabric Real-Time Intelligence Eventhouses via KQL | 🟡 ADX audit logs, Fabric audit events |
| Azure MCP Server (Kusto namespace) | Local (stdio) | Part of Azure MCP Server (azmcp --namespace kusto) | Manage ADX clusters, databases, tables, and queries via ARM | ✅ Already covered (Azure ARM operations, Phase 4) |
| Kusto Query MCP | Copilot Studio built-in | Copilot Studio catalog | KQL query execution from Copilot Studio agents | 🟡 CloudAppEvents (Copilot Studio workload) |
🔵 Note: The Fabric RTI MCP Server is open-source and runs locally. It authenticates to ADX/Eventhouse using the user's credentials. If your org uses ADX, queries from this MCP would appear in ADX audit logs (.show queries / diagnostic logs), NOT in Sentinel LAQueryLogs.
Developer & Productivity MCP Servers
| Server | Type | Repo | Purpose | Telemetry Surface |
|---|---|---|---|---|
| Playwright MCP | Local (stdio) | microsoft/playwright-mcp (26.9k ⭐) | Browser automation via accessibility tree — enables LLMs to interact with web pages | ❌ Local only — no Azure telemetry |
| GitHub MCP Server | Local (stdio) | github/github-mcp-server | GitHub repo operations (issues, PRs, code search) via PAT | ❌ GitHub audit logs only, not in Sentinel |
| Microsoft Learn Docs MCP | Cloud-hosted | Certified Copilot Studio connector | Search and fetch official Microsoft Learn documentation | ❌ Public docs, no security data |
Copilot Studio Built-in MCP Servers (19+ servers)
Microsoft Copilot Studio provides a catalog of built-in MCP servers for agent development. These are Microsoft-managed, cloud-hosted servers that agents can connect to.
Source: Built-in MCP servers catalog
| Category | MCP Servers | Security Relevance |
|---|---|---|
| Microsoft 365 | Outlook Mail, Outlook Calendar, 365 User Profile, Teams, Word, 365 Copilot (Search) | 🔴 High — email, calendar, user profile access |
| SharePoint & OneDrive | SharePoint and OneDrive, SharePoint Lists | 🟠 Medium — file and data access |
| Administration | 365 Admin Center | 🔴 High — administrative control plane |
| Dataverse | Dataverse MCP | 🟠 Medium — business data access |
| Dynamics 365 | Sales, Finance, Supply Chain, Service, ERP, Contact Center (6 sub-variants) | 🟡 Low-Medium — business application data |
| Fabric | Fabric MCP | 🟠 Medium — analytics data access |
| Office 365 Outlook | Contact Management, Email Management, Meeting Management | 🔴 High — email and contact data |
| Meta-Server | MCP Management MCP | 🟠 Medium — manages other MCP servers via Dataverse/Graph |
⚠️ Telemetry gap: Copilot Studio built-in MCP servers are NOT directly visible in LAQueryLogs or MicrosoftGraphActivityLogs. Their activity may appear in:
- CloudAppEvents: under Copilot Studio workload (if Purview unified audit is configured)
- M365 unified audit log: as Copilot Studio agent actions
- AuditLogs: service principal lifecycle events (creation, modification)
- AADServicePrincipalSignInLogs: SPN sign-ins to Bot Framework from Azure internal IPs (fd00:*)
To monitor Copilot Studio agent activity, use the ai-agent-posture skill for comprehensive agent security auditing.
Azure MCP Server — Full Tool Surface
The Azure MCP Server (already tracked in Phase 4) has a much broader tool surface than just ARM operations. The complete namespace catalog:
| Category | Namespaces | Security-Relevant Tools |
|---|---|---|
| AI & ML | foundry, search, speech | AI Foundry model access, Search index queries |
| Identity | role | ⚠️ RBAC role assignments (view and manage) |
| Security | keyvault, appconfig, confidentialledger | 🔴 Key Vault secrets/keys/certs, App Configuration |
| Databases | cosmos, mysql, postgres, redis, sql | Database access and management |
| Storage | storage, fileshares, storagesync, managedlustre | Blob, file, and storage account access |
| Compute | appservice, functionapp, aks | App Service, Functions, Kubernetes |
| Networking | eventhubs, servicebus, eventgrid, communication, signalr | Messaging and event services |
| DevOps | bicepschema, deploy, monitor, workbooks, grafana | Infrastructure deployment, monitoring |
| Governance | policy, quota, resourcehealth, cloudarchitect | Policy management, resource health |
| Other | marketplace, virtualdesktop, loadtesting, acr | VDI, container registry, load testing |
🔵 Key Vault access via MCP is particularly security-sensitive. The Azure MCP Server implements elicitation (user confirmation prompts) before returning secrets. However, this can be bypassed with the --insecure-disable-user-confirmation flag. Monitor AzureActivity for Key Vault operations correlated with MCP usage patterns.
Monitoring Expansion Priorities
If expanding this skill's coverage, prioritize based on data access risk:
| Priority | Server | Why | How to Monitor |
|---|---|---|---|
| 🔴 P1 | Copilot Studio built-in M365 MCPs | Email, Teams, admin center access | ai-agent-posture skill + CloudAppEvents |
| 🔴 P1 | Security Copilot Agent Creation | Creates autonomous security agents | CloudAppEvents for agent creation events |
| 🟠 P2 | Power BI Remote MCP | Dataset query access via API | PowerBIActivity table if available |
| 🟠 P2 | Sentinel Custom MCP Tools | User-defined tools, same audit surface | Already visible in Phase 3 CloudAppEvents |
| 🟡 P3 | Fabric RTI MCP | ADX/Eventhouse data access | ADX diagnostic logs |
| 🟡 P3 | Kusto Query MCP (Copilot Studio) | KQL from Copilot Studio agents | CloudAppEvents (Copilot Studio workload) |
| ⚪ P4 | Playwright, GitHub, Learn Docs MCPs | Local/public, minimal telemetry | Not monitorable from Sentinel |
Note: This catalog reflects the Microsoft MCP ecosystem as of February 2026. The Copilot Studio MCP catalog notes: "This list isn't exhaustive. New MCP connectors are added regularly."
⛔ MANDATORY: Sentinel Workspace Selection
This skill requires a Sentinel workspace to execute queries. Follow these rules STRICTLY:
When invoked from another skill (e.g., incident-investigation):
- Inherit the workspace selection from the parent investigation context
- If no workspace was selected in parent context: STOP and ask user to select
When invoked standalone (direct user request):
- ALWAYS call the list_sentinel_workspaces MCP tool FIRST
- If 1 workspace exists: Auto-select, display to user, proceed
- If multiple workspaces exist:
- Display all workspaces with Name and ID
- ASK: "Which Sentinel workspace should I use for this analysis?"
- ⛔ STOP AND WAIT for user response
- ⛔ DO NOT proceed until user explicitly selects
- If a query fails on the selected workspace:
- ⛔ DO NOT automatically try another workspace
- STOP and report the error, display available workspaces, ASK user to select
🔴 PROHIBITED ACTIONS:
- ❌ Selecting a workspace without user consent when multiple exist
- ❌ Switching to another workspace after a failure without asking
- ❌ Proceeding with analysis if workspace selection is ambiguous
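The selection rules above amount to a small decision procedure. The sketch below is illustrative only: `choose_workspace`, its parameters, and the `"use"` / `"ask_user"` return values are hypothetical names, not part of any MCP tool API, and treating an empty workspace list the same as an ambiguous one is an assumption the rules do not spell out.

```python
from typing import Optional, Sequence, Tuple


def choose_workspace(
    workspaces: Sequence[str],
    invoked_from_skill: bool = False,
    inherited: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return ("use", workspace) or ("ask_user", None). Hypothetical helper."""
    if invoked_from_skill:
        # Inherit from the parent context; never guess when nothing was selected.
        return ("use", inherited) if inherited else ("ask_user", None)
    if len(workspaces) == 1:
        # Exactly one workspace: auto-select it (and display it to the user).
        return ("use", workspaces[0])
    # Multiple workspaces (or, by assumption, none): stop and wait for the user.
    return ("ask_user", None)
```

The same function would be called again after a failed query, with the result forced to `"ask_user"` by the caller, since the rules forbid switching workspaces automatically.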
Output Modes
This skill supports two output modes. ASK the user which they prefer if not explicitly specified. Both may be selected.
Mode 1: Inline Chat Summary (Default)
- Render the full MCP usage analysis directly in the chat response
- Includes ASCII tables, trend charts, endpoint breakdowns, and security assessment
- Best for quick review and interactive follow-up questions
Mode 2: Markdown File Report
- Save a comprehensive report to reports/mcp-usage/MCP_Usage_Report_<timestamp>.md
- All ASCII visualizations render correctly inside triple-backtick markdown code fences
- Includes all data from inline mode plus additional detail sections
- Use the create_file tool; NEVER use terminal commands for file output
- Filename pattern: reports/mcp-usage/MCP_Usage_Report_YYYYMMDD_HHMMSS.md
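As a minimal sketch of the filename pattern above, the helper below formats a timestamp into the report path. The function name `report_path` is hypothetical, and defaulting to UTC is an assumption, since the pattern does not say which timezone the timestamp uses.

```python
from datetime import datetime, timezone
from typing import Optional


def report_path(now: Optional[datetime] = None) -> str:
    """Build reports/mcp-usage/MCP_Usage_Report_YYYYMMDD_HHMMSS.md."""
    # Assumption: use the current UTC time when no timestamp is supplied.
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return f"reports/mcp-usage/MCP_Usage_Report_{stamp}.md"
```

The resulting string is what would be passed as the path argument to create_file.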
Markdown Rendering Notes
- ✅ ASCII tables, box-drawing characters, and bar charts render perfectly in markdown code blocks
- ✅ Unicode block characters (▓░█) display correctly in monospaced fonts
- ✅ Emoji indicators (🔴🟢🟡⚠️✅) render natively in GitHub-flavored markdown
- ✅ Standard markdown tables (| col |) render as formatted tables
- Tip: Wrap all ASCII art in triple-backtick code fences for consistent rendering
Scalability & Token Management
This skill was developed in a small lab environment (1–2 users, single workspace). In larger tenants with many users, MCP servers, and higher query volumes, the query complexity is not a concern — all queries use summarize, dcount, make_set(..., N), and take operators, so result sets remain bounded regardless of raw table size. Execution time will increase but output shape stays the same.
The primary risk in large environments is LLM token exhaustion during report generation. All query results accumulate in conversation context before the report is written, and this skill file itself consumes significant context. In a large tenant, richer result sets (more users, endpoints, error categories, AppIds) can push past token limits before the report is complete.
Guardrails for Large Environments
1. Tighten result set limits in queries:
| Parameter | Small Env (default) | Large Env |
|---|---|---|
| make_set(..., N) for users | 10 | 5 |
| make_set(..., N) for endpoints | 20–30 | 10 |
| make_set(..., N) for errors | 5 | 3 |
| take on governance tables | 25 | 15 |
| take on endpoint rankings | 25 | 15 |
| take on error analysis | 50 | 20 |
2. Incremental file writes (markdown mode):
Instead of composing the entire report in memory and writing it in one create_file call:
- Write the report header and executive summary first with create_file
- Append each section (Graph MCP, Sentinel Triage, Data Lake, etc.) using replace_string_in_file to insert content at the end of the file
- This allows earlier query results to fall out of active context after being written
3. Two-pass approach for very large tenants:
- Pass 1 (Summary): Run all queries with aggressive limits (take 10, make_set(..., 3)). Generate a summary report with top-level numbers only.
- Pass 2 (Drill-down): If the user wants detail on a specific section (e.g., "show me the full Data Lake error breakdown"), run targeted queries for that section only.
4. Parallel query batching:
Phases 1–5 contain independent queries, so run them in parallel, but not all ~16 at once: batch them into 2–3 groups of 5–6 queries. This balances throughput against context accumulation.
5. Omit raw query appendix for large reports:
The "Appendix: Query Details" section listing every KQL query used can be omitted in large environments to save tokens. The queries are documented in this skill file and don't need to be repeated in the report.
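The batching guardrail (item 4 above) can be sketched as follows. This is an illustration under stated assumptions: `batch_queries`, `run_in_batches`, and the caller-supplied `run_query` callable are hypothetical names, and a thread pool merely stands in for whatever parallel tool-call mechanism the agent actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def batch_queries(query_ids: Sequence[int], batch_size: int = 6) -> List[List[int]]:
    """Split query IDs into consecutive groups of at most batch_size."""
    return [list(query_ids[i:i + batch_size]) for i in range(0, len(query_ids), batch_size)]


def run_in_batches(
    query_ids: Sequence[int],
    run_query: Callable[[int], object],
    batch_size: int = 6,
) -> Dict[int, object]:
    """Run each batch in parallel, finishing one batch before starting the next."""
    results: Dict[int, object] = {}
    for batch in batch_queries(query_ids, batch_size):
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            # pool.map preserves input order, so zip pairs IDs with their results.
            for query_id, result in zip(batch, pool.map(run_query, batch)):
                results[query_id] = result
    return results
```

With 16 queries and a batch size of 6, this yields three groups of 6, 6, and 4 queries, which sits inside the 2–3 group range suggested above.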
Indicators You're Hitting Token Limits
- Report generation starts but cuts off mid-section
- The agent switches to a new conversation turn unexpectedly during report writing
- Sections become progressively less detailed toward the end of the report
- The agent summarizes findings in chat instead of writing the full markdown file
If any of these occur, ask the agent to: "Continue writing the report from where you left off" — the incremental file write approach ensures partial progress is saved.
Quick Start (TL;DR)
When a user requests MCP usage monitoring:
- Select Workspace → list_sentinel_workspaces, auto-select or ask
- Determine Output Mode → Ask if not specified: inline, markdown file, or both
- Determine Time Range → Ask if not specified; default 30 days
- Run Phase 1 (Graph MCP) → Daily usage summary, top endpoints, sensitive API access
- Run Phase 2 (Sentinel Triage MCP) → API calls via AppId 7b7b3966, auth events, AH downstream queries
- Run Phase 3 (Sentinel Data Lake MCP) → CloudAppEvents tool usage, error analysis, MCP vs Direct KQL
- Run Phase 4 (Azure MCP & ARM) → ARM operations, resource provider breakdown
- Run Phase 5 (Workspace Governance) → All query sources (Analytics + Data Lake tiers), MCP proportion
- Run Phase 6 (Cross-Server User Analysis) → Top MCP users by server breadth, power user identification
- Run Phase 7 (Assessment) → Compute MCP Usage Score, security assessment, render report
Parallel execution: Phases 1-5 contain independent queries — run all of them in parallel for performance. Phases 6-7 depend on results from 1-5.
MCP Usage Score Formula
The MCP Usage Score is a composite health and risk indicator that summarizes MCP server activity. Unlike the Drift Score (which is a ratio), this is an absolute assessment based on multiple dimensions.
Scoring Dimensions
$$ \text{MCPUsageScore} = \sum_{i} \text{DimensionScore}_i $$
Each dimension contributes 0–20 points to a maximum of 100:
| Dimension | Max Points | Green (0-5) | Yellow (6-12) | Red (13-20) |
|---|---|---|---|---|
| User Diversity | 20 | 1-2 known users | 3-5 users or 1 unknown | >5 users or unknown users |
| Endpoint Sensitivity | 20 | 0% sensitive endpoints | 1-30% sensitive | >30% calls to sensitive APIs |
| Error Rate | 20 | <1% errors | 1-5% errors | >5% errors |
| Volume Anomaly | 20 | Within ±50% of daily avg | 50-200% spike | >200% spike vs avg |
| Off-Hours Activity | 20 | <5% off-hours | 5-20% off-hours | >20% calls outside business hours |
Interpretation Scale
| Score | Meaning | Action |
|---|---|---|
| 0–25 | Healthy | ✅ Normal MCP usage, no concerns |
| 26–50 | Elevated | 🟡 Review — minor anomalies detected |
| 51–75 | Concerning | 🟠 Investigate — multiple risk signals present |
| 76–100 | Critical | 🔴 Immediate review — significant security risk |
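A minimal sketch of the composite score and the interpretation scale above. It assumes the five per-dimension point values have already been assigned (the tables give bands, not an exact metric-to-points mapping), and the names `mcp_usage_score`, `DIMENSIONS`, and `BANDS` are hypothetical.

```python
from typing import Dict, Tuple

DIMENSIONS = (
    "User Diversity",
    "Endpoint Sensitivity",
    "Error Rate",
    "Volume Anomaly",
    "Off-Hours Activity",
)

# Upper bound of each interpretation band, in ascending order.
BANDS = ((25, "Healthy"), (50, "Elevated"), (75, "Concerning"), (100, "Critical"))


def mcp_usage_score(dimension_scores: Dict[str, int]) -> Tuple[int, str]:
    """Sum the five 0-20 dimension scores and map the total to its band."""
    missing = set(DIMENSIONS) - set(dimension_scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in DIMENSIONS:
        if not 0 <= dimension_scores[name] <= 20:
            raise ValueError(f"{name} must be between 0 and 20")
    total = sum(dimension_scores[name] for name in DIMENSIONS)
    label = next(label for upper, label in BANDS if total <= upper)
    return total, label
```

For example, dimension scores of 3, 8, 2, 15, and 4 sum to 32, which falls in the 26–50 "Elevated" band even though one dimension (Volume Anomaly at 15) is individually in the red range, so the per-dimension values are worth reporting alongside the total.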
Sensitivity Classification
Sensitive Graph API endpoints — flag any MCP calls to these patterns:
roleManagement, roleAssignments, roleEligibility,
authentication/methods, identityProtection, riskyUsers,
riskDetections, conditionalAccess, servicePrincipals,
appRoleAssignments, oauth2PermissionGrants,
auditLogs, directoryRoles, privilegedAccess,
security/alerts, security/incidents
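To illustrate how the pattern list feeds the Endpoint Sensitivity dimension, the sketch below flags request URIs and computes the sensitive share. It is an approximation: `is_sensitive` and `sensitive_pct` are hypothetical helpers, and a case-insensitive substring test is used here, which is looser than the term-based matching a KQL has_any performs.

```python
from typing import Iterable

SENSITIVE_PATTERNS = (
    "roleManagement", "roleAssignments", "roleEligibility",
    "authentication/methods", "identityProtection", "riskyUsers",
    "riskDetections", "conditionalAccess", "servicePrincipals",
    "appRoleAssignments", "oauth2PermissionGrants",
    "auditLogs", "directoryRoles", "privilegedAccess",
    "security/alerts", "security/incidents",
)


def is_sensitive(request_uri: str) -> bool:
    """True when the URI contains any sensitive pattern (case-insensitive substring)."""
    uri = request_uri.lower()
    return any(pattern.lower() in uri for pattern in SENSITIVE_PATTERNS)


def sensitive_pct(request_uris: Iterable[str]) -> float:
    """Percentage of calls that hit a sensitive endpoint, rounded to one decimal."""
    uris = list(request_uris)
    if not uris:
        return 0.0
    return round(100.0 * sum(is_sensitive(u) for u in uris) / len(uris), 1)
```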
Off-Hours Definition
Business hours: 08:00–18:00 local time (derive from user's primary sign-in timezone, or use UTC if unknown). Weekends count as off-hours for all 24 hours.
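The off-hours definition can be expressed as a small predicate. This is a sketch assuming the timestamp has already been converted to the user's local time (or left in UTC when the timezone is unknown); `is_off_hours` is a hypothetical name.

```python
from datetime import datetime


def is_off_hours(local_time: datetime) -> bool:
    """True outside 08:00-18:00 on weekdays, and for all of Saturday and Sunday."""
    # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    if local_time.weekday() >= 5:
        return True
    # 08:00:00 through 17:59:59 counts as business hours; 18:00 onward is off-hours.
    return not (8 <= local_time.hour < 18)
```

Note that Python's weekday numbering differs from KQL's dayofweek(), which counts from Sunday as 0, so the weekend test is not portable between the two as written.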
Execution Workflow
Phase 1: Graph MCP Server Analysis
Data source: MicrosoftGraphActivityLogs
Filter: AppId == "e8c77dc2-69b3-43f4-bc51-3213c9d915b4"
Collect:
- Execute Query 1 (Unified Daily MCP Activity Trend) via mcp_sentinel-data_query_lake, the tool named in the Query 1 section (the SigninLogs + AADNonInteractiveUserSignInLogs union fails in AH when AADNonInteractiveUserSignInLogs is on the Data Lake tier). Returns daily Server | Day | Calls | Errors | ErrorRate for ALL 4 MCP servers in one pass. Run this ONCE here; do NOT re-run in Phases 2–4. Feeds the SVG dashboard Row 5 line chart and volume anomaly detection.
- Execute Query 2 (Endpoint & Activity Summary) via RunAdvancedHuntingQuery. Returns per-endpoint rows with call counts, sensitivity flag, off-hours metrics, error rates, and user sets. Replaces former Q2 + Q3 + Q11. Derive: top endpoints (order by CallCount), sensitive APIs (where IsSensitive), off-hours % (sum(OffHoursCalls)/sum(CallCount)).
Phase 2: Sentinel Triage MCP Analysis
Data sources: MicrosoftGraphActivityLogs, SigninLogs, AADNonInteractiveUserSignInLogs
Filter: AppId = 7b7b3966-1961-47b5-b080-43ca5482e21c ("Microsoft Defender Mcp")
Detection Method (Confirmed Feb 2026):
The Sentinel Triage MCP has a dedicated AppId (7b7b3966-1961-47b5-b080-43ca5482e21c) that appears in both MicrosoftGraphActivityLogs and SigninLogs/AADNonInteractiveUserSignInLogs. This enables definitive attribution of Triage MCP calls — no heuristics or shared-surface estimation needed.
Key characteristics:
- AppDisplayName: "Microsoft Defender Mcp" (visible in SigninLogs)
- Auth type: Delegated + certificate (ClientAuthMethod=2) — user identity always available
- Scopes: SecurityAlert.Read.All, SecurityIncident.Read.All, ThreatHunting.Read.All
- Target resources: Microsoft Graph, WindowsDefenderATP
- API endpoints: POST /v1.0/security/runHuntingQuery/, GET /security/incidents/, GET /security/alerts_v2/
- No local SPN: Microsoft first-party app, so the display name is only visible in SigninLogs, not in Graph API SPN lookup
🔵 MicrosoftGraphActivityLogs retention varies by environment (depends on Log Analytics workspace configuration and diagnostic settings). Do not assume a fixed retention period; check with a baseline row count query first.
Collect:
- Execute Query 3 to get authentication events by client app (VS Code, Copilot Studio, browser) with user, IP, OS, country
- Execute Query 4 to get client app usage breakdown with distinct user counts and last-seen timestamps
- Execute Query 5 to get Triage MCP API usage from MicrosoftGraphActivityLogs. Filter by AppId 7b7b3966 for exact Triage MCP calls with endpoint/method/user breakdown
- Execute Query 6 to get Triage MCP authentication events from SigninLogs/AADNonInteractiveUserSignInLogs: sign-in frequency, user attribution, IP, OS, country
- Execute Query 7 to get LAQueryLogs for Advanced Hunting downstream queries via fc780465/M365D_AdvancedHunting. Captures queries from any RunAdvancedHuntingQuery consumer (Triage MCP, Defender portal, Security Copilot) that hit connected LA tables. XDR-native tables (DeviceEvents, EmailEvents) don't appear here.
Phase 3: Sentinel Data Lake MCP Analysis
Data source: CloudAppEvents (Purview unified audit log)
Execution tool: RunAdvancedHuntingQuery preferred (30-day lookback, free for Analytics-tier tables). CloudAppEvents uses Timestamp in AH (not TimeGenerated). Fall back to mcp_sentinel-data_query_lake (uses TimeGenerated, 90d retention) only if lookback > 30 days or AH returns errors.
Filter: ActionType contains "Sentinel" or ActionType contains "KQL". RecordType is inside RawEventData (not a top-level column) — extract with parse_json(tostring(RawEventData)).RecordType. RecordType 403 = MCP tools, 379 = Direct KQL.
⚠️ MANDATORY: Execute Query 10 against query_lake before reporting any gap. If the query returns 0 results or table-not-found, THEN report the gap. Do NOT skip this phase based on assumptions about E5 licensing or Purview configuration — the table may be populated even without explicit Purview setup.
Audit Path: Sentinel Data Lake MCP tools are NOT audited via LAQueryLogs — they are tracked through Purview unified audit log, surfaced in the CloudAppEvents table. RecordType 403 (inside RawEventData) = Sentinel AI Tool activities, RecordType 379 = KQL activities.
MCP vs Direct KQL Delineation:
| Access Pattern | RecordType | Interface | Operation | What It Represents |
|---|---|---|---|---|
| MCP Server-driven | 403 | IMcpToolTemplate | SentinelAIToolRunStarted, SentinelAIToolRunCompleted | Tool calls via Sentinel Data Lake MCP (e.g., query_lake, list_sentinel_workspaces, search_tables) |
| Direct KQL | 379 | Microsoft.SentinelGraph.AIPrimitives.Core.Services.KqsService | KQLQueryCompleted | KQL queries executed directly via Sentinel Graph / Data Lake Explorer (no MCP intermediary) |
⚠️ Known Limitation (Discovered Mar 2026): RecordType 403 (SentinelAIToolRunCompleted / IMcpToolTemplate) may not be emitted by the Data Lake MCP server. In verified testing, all Data Lake MCP tool calls (query_lake, search_tables) appeared as RecordType 379 with Interface = "InterfaceNotProvided" — NOT as RecordType 403. When RecordType 403 returns 0 results:
- Do NOT report "0 MCP activity" — the audit pipeline has a gap, not the usage.
- Fallback: Use Interface breakdown within RecordType 379. InterfaceNotProvided contains MCP-driven queries. Cross-reference users in InterfaceNotProvided with known Sentinel MCP users from Q4/Q6 (SigninLogs). Known portal interfaces: msglakeexplorer@msec-msg (Portal Data Lake Explorer), msgjobmanagement@msec-msg (scheduled jobs), ipykernel_launcher.py (Jupyter), PowerBIConnector (Power BI), Microsoft.Medeina.Server (Security Copilot).
- Report as "Probable MCP": clearly note the attribution is based on proxy signal (user overlap), not definitive RecordType 403 classification.
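The fallback attribution described above can be sketched as a classifier over already-parsed RawEventData fields. This is an illustration, not the skill's implementation: `classify_event` is a hypothetical helper, and the "Unattributed" and "Other" labels are assumptions added for cases the text does not name.

```python
from typing import Optional, Set

# Interfaces listed above as known non-MCP callers within RecordType 379.
KNOWN_PORTAL_INTERFACES = {
    "msglakeexplorer@msec-msg",
    "msgjobmanagement@msec-msg",
    "ipykernel_launcher.py",
    "PowerBIConnector",
    "Microsoft.Medeina.Server",
}


def classify_event(
    record_type: int,
    interface: Optional[str],
    user: str,
    known_mcp_users: Set[str],
) -> str:
    """Label one CloudAppEvents row as MCP, Probable MCP, Direct KQL, or a fallback label."""
    if record_type == 403:
        return "MCP"  # definitive: Sentinel AI Tool activity
    if record_type != 379:
        return "Other"  # assumption: not a Sentinel MCP/KQL record
    if interface in KNOWN_PORTAL_INTERFACES:
        return "Direct KQL"
    if not interface or interface == "InterfaceNotProvided":
        # Proxy signal only: the user also appears in Sentinel MCP sign-ins.
        return "Probable MCP" if user in known_mcp_users else "Unattributed"
    return "Direct KQL"  # e.g. the KqsService interface
```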
Key RawEventData Fields:
| Field | Description | Example |
|---|---|---|
| ToolName | MCP tool invoked | query_lake, list_sentinel_workspaces, search_tables, analyze_url_entity |
| Interface | Execution interface (distinguishes MCP from direct) | IMcpToolTemplate (MCP) vs KqsService (direct) |
| ExecutionDuration | Duration in seconds (as string) | "2.4731712" |
| FailureReason | Error message if failed | "SemanticError: 'DeviceDetail' column does not exist" |
| TablesRead | Tables accessed by the query | "SigninLogs" |
| DatabasesRead | Log Analytics workspace name | "la-yourworkspace" |
| TotalRows | Rows returned | 100 |
| InputParameters | Full tool input including KQL query text and workspaceId | JSON string with query and workspaceId keys |
Collect:
- Execute Query 10 to get Data Lake MCP access pattern summary (tool/table/workspace inventory with MCP vs Direct KQL delineation)
- Execute Query 11 to get tool-level breakdown with call counts and avg execution duration
- Execute Query 12 to get error analysis for failed Data Lake MCP tool calls
Phase 4: Azure MCP Server Authentication & Queries
Data sources: SigninLogs, AADNonInteractiveUserSignInLogs, LAQueryLogs
Filter: AppId = 04b07795-8ddb-461a-bbee-02f9e1bf7b46 (sign-in logs, LAQueryLogs)
Collect:
- Execute Query 13 to get Azure MCP Server authentication events from SigninLogs/AADNonInteractiveUserSignInLogs. Filter by AppId 04b07795 (Azure CLI credential, field-tested Feb 2026). 🔄 Previously documented as AppId 1950a258 (AzurePowerShellCredential); that path is obsolete.
- Execute Query 14 to get Azure MCP Server workspace queries from LAQueryLogs. Filter by AADClientId 04b07795. RequestClientApp is empty (not a unique fingerprint). Azure MCP appends \n| limit N to query text; use the query text pattern as the differentiator.
Detection Method (🔄 Updated Feb 2026):
The Azure MCP Server runs as a local .NET process (stdio mode) and authenticates via DefaultAzureCredential. Field-tested Feb 2026: The credential chain now resolves to Azure CLI credential (04b07795-8ddb-461a-bbee-02f9e1bf7b46), NOT AzurePowerShellCredential (1950a258) as previously documented.
Previous fingerprint (OBSOLETE): AppId 1950a258 + RequestClientApp = csharpsdk,LogAnalyticsPSClient. Only 1 occurrence found in 30-day lookback. The Azure MCP Server SDK path has changed.
Current fingerprint (field-tested Feb 2026):
| Signal | Azure MCP Server (Current) | Azure CLI (Manual) | Notes |
|---|---|---|---|
| AppId (SigninLogs) | 04b07795 | 04b07795 | Shared, not a unique differentiator |
| AADClientId (LAQueryLogs) | 04b07795 | 04b07795 | Shared |
| RequestClientApp (LAQueryLogs) | Empty ("") | Empty ("") | Shared, not a unique differentiator. Empty RequestClientApp is also used by 4+ other AADClientIds |
| Query text pattern (LAQueryLogs) | Appends \n\| limit N to all queries | No standard suffix | ✅ Best differentiator: Azure MCP monitor_workspace_log_query always appends a limit operator |
| AzureActivity (Claims.appid) | 04b07795 (write ops only) | 04b07795 | Shared; read ops not logged. Use Q14 HasLimitSuffix for query-level differentiation |
🚨 Key change from previous documentation:
- ❌ RequestClientApp = "csharpsdk,LogAnalyticsPSClient": OBSOLETE, no longer produced by Azure MCP Server
- ❌ AppId 1950a258 (AzurePowerShellCredential): OBSOLETE credential path
- ✅ AppId 04b07795 (Azure CLI): current credential path
- ✅ RequestClientApp is empty: shared with Azure CLI and other tools
- ✅ Query text containing \n| limit: most reliable query-level differentiator
Disambiguation challenges:
- Azure MCP Server queries are difficult to isolate from manual Azure CLI queries in LAQueryLogs because both share the same AppId AND empty RequestClientApp
- The \n| limit N suffix appended by monitor_workspace_log_query is the best heuristic but is not guaranteed to be unique
- In SigninLogs, UserAgent containing azsdk-net-Identity with OS Microsoft Windows may still help if the credential chain includes Azure Identity SDK components
- Consider correlating query timing with known MCP session activity for attribution
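The limit-suffix heuristic can be checked offline against exported query text. The sketch below is an assumption about the suffix shape (a final line of the form `| limit N`); the regex and the `has_limit_suffix` name are illustrative and are not taken from Query 14.

```python
import re

# Assumed shape: the query text ends with a newline followed by "| limit <digits>".
LIMIT_SUFFIX = re.compile(r"\n\|\s*limit\s+\d+\s*$", re.IGNORECASE)


def has_limit_suffix(query_text: str) -> bool:
    """True when the query ends with a trailing '| limit N' line."""
    return bool(LIMIT_SUFFIX.search(query_text))
```

A match is a hint rather than proof, since a person typing a query in Azure CLI could end it the same way.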
Authentication Sequence Observed (Current):
- Azure MCP Server acquires token via Azure CLI cached credential
- Token is reused for subsequent operations within its lifetime
- If MFA claim is missing → interactive browser prompt (rare with CLI credential)
- Subsequent calls reuse the cached token until expiry
🔴 Token Caching Behavior (Field-Tested Feb 2026):
- Sign-in events appear at token acquisition time, NOT at each individual API call time
- Once a token is cached, subsequent Azure MCP calls (list resources, get configs, etc.) do NOT generate new sign-in events
- You will see 1-3 sign-in events per token lifecycle, not one per API call
- To count actual API calls, correlate with AzureActivity (write ops) or LAQueryLogs (monitor_workspace_log_query calls)
- The ~1hr token lifetime means at most ~24 sign-in event clusters per day of continuous use
AzureActivity visibility: Only ARM write/action/delete operations appear in AzureActivity (Administrative category). Azure MCP Server read-only operations (list subscriptions, list resource groups, list clusters) do NOT appear. Claims.appid = 04b07795 when write operations do occur.
Note: Azure MCP Server is difficult to isolate from manual Azure CLI usage because they share the same AppId and both produce empty RequestClientApp. The \n| limit N query text suffix is the best heuristic for LAQueryLogs. In SigninLogs, the shared AppId means Azure MCP authenticated as Azure CLI — there is no unique sign-in fingerprint. Present findings as "Azure MCP Server / Azure CLI (shared AppId 04b07795)" in reports.
Phase 5: Workspace Query Governance
Data source: LAQueryLogs (Analytics tier), CloudAppEvents (Data Lake tier)
Filter: All AADClientIds (LAQueryLogs), All Sentinel operations (CloudAppEvents)
Collect:
- Execute Query 8 to get all clients querying the Analytics tier workspace with query counts, user counts, CPU usage
- Data Lake tier query volume from Phase 3 results (Queries 10-12)
- MCP proportion calculation: combined MCP query volume (Analytics + Data Lake tiers) / total query volume
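The MCP proportion in the last bullet is a simple ratio. A minimal sketch, assuming "total query volume" means the sum across both tiers and that the four counts come from Query 8 and Queries 10-12; `mcp_proportion` and its parameter names are hypothetical.

```python
def mcp_proportion(
    mcp_analytics: int,
    mcp_data_lake: int,
    total_analytics: int,
    total_data_lake: int,
) -> float:
    """Percentage of all workspace queries attributed to MCP, across both tiers."""
    total = total_analytics + total_data_lake
    if total == 0:
        return 0.0  # no query activity at all in the window
    return round(100.0 * (mcp_analytics + mcp_data_lake) / total, 1)
```

For instance, 120 MCP queries on the Analytics tier and 30 on the Data Lake tier, out of 400 and 100 total queries respectively, gives 150 / 500, or 30.0%.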
Phase 6: Cross-Server User Analysis
Data sources: MicrosoftGraphActivityLogs, CloudAppEvents, SigninLogs, AADNonInteractiveUserSignInLogs
Collect:
- Execute Query 9 to get Graph MCP caller attribution — User vs SPN breakdown
- Execute Query 15 to get top MCP users ranked by cross-server breadth — identifies which users span the most MCP servers and their total call volume
Note: Query 15 joins user activity across all 4 MCP channels (Graph MCP, Triage MCP, Data Lake MCP, Azure CLI/MCP) and resolves UserIds to UPNs via SigninLogs. Data Lake MCP attribution uses InterfaceNotProvided proxy signal when RecordType 403 is unavailable.
Phase 7: Score Computation & Report Generation
- Compute per-dimension scores from Phase 1-6 data:
  - User Diversity: Count distinct users across all MCP channels (use Query 15 cross-server results)
  - Endpoint Sensitivity: % of Graph MCP calls to sensitive patterns (Phase 1 Query 2 IsSensitive column)
  - Error Rate: % of non-2xx responses across all MCP channels
  - Volume Anomaly: Compare most recent day vs rolling average (Phase 1 Query 1 daily data)
  - Off-Hours Activity: % of MCP calls outside 08:00-18:00 (Phase 1 Query 2 OffHoursCalls column)
- Sum dimension scores for composite MCP Usage Score
- Include Top MCP Users table in report (Phase 6 — Query 15 cross-server results)
- Generate security assessment with emoji-coded findings
- Render output in the user's selected mode
- Validate report completeness — after composing the report, run the Report Completeness Checklist below. Cross-check every required section against the template before saving/presenting. Fix any missing sections before finalizing.
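As one worked instance of the per-dimension computation, the Volume Anomaly input can be derived from the Query 1 daily counts as sketched below. Several choices here are assumptions: the rolling average excludes the most recent day, a large drop is banded by magnitude the same way as a spike, and the function names are hypothetical.

```python
from typing import Sequence


def volume_deviation_pct(daily_calls: Sequence[int]) -> float:
    """Percent deviation of the most recent day from the average of the prior days."""
    *history, latest = daily_calls
    if not history:
        raise ValueError("need at least two days of data")
    baseline = sum(history) / len(history)
    if baseline == 0:
        raise ValueError("baseline average is zero; deviation is undefined")
    return round(100.0 * (latest - baseline) / baseline, 1)


def volume_band(deviation_pct: float) -> str:
    """Map a deviation to the Green/Yellow/Red bands of the Volume Anomaly row."""
    magnitude = abs(deviation_pct)  # assumption: drops are banded like spikes
    if magnitude <= 50:
        return "Green"
    if magnitude <= 200:
        return "Yellow"
    return "Red"
```

With daily counts of 100, 100, 100, and then 250, the baseline is 100 and the latest day is 150% above it, which lands in the Yellow band.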
Sample KQL Queries
🔴 MANDATORY: Execute these queries EXACTLY as written. Substitute only the time range parameter (e.g., ago(30d) → ago(90d)) and entity-specific values where indicated. These queries are schema-verified and encode mitigations for pitfalls documented in Known Pitfalls. Rewriting, paraphrasing, or constructing "equivalent" queries from scratch risks hitting the exact schema issues these queries were designed to avoid.
| Action | Status |
|---|---|
| Rewriting a pre-authored query from scratch | ❌ PROHIBITED |
| Removing parse_json() / tostring() wrappers from queries | ❌ PROHIBITED |
| Substituting column names without schema verification | ❌ PROHIBITED |
| Using has instead of contains for CamelCase fields | ❌ PROHIBITED |
| Executing a query not from this section without completing the Pre-Flight Checklist | ❌ PROHIBITED |
Query 1: Unified Daily MCP Activity Trend
Note: Consolidates former Q1 (Graph MCP daily), Q7d (Triage MCP daily), Q23 (Data Lake MCP daily), Q25a (Azure MCP daily) into a single union query.
Feeds: SVG dashboard Row 5 line chart (daily_mcp_trend) — all 4 series in one query.
Tool: mcp_sentinel-data_query_lake (union of SigninLogs + AADNonInteractiveUserSignInLogs fails in AH when AADNonInteractiveUserSignInLogs is on Data Lake tier — common in customer environments).
⚠️ Timestamp: All tables use TimeGenerated in Data Lake (unlike AH where CloudAppEvents uses Timestamp).
// Unified Daily MCP Activity Trend — all 4 MCP servers in one pass
// Configurable: replace 30d with desired lookback (max 30d for AH)
let lookback = 30d;
// --- Graph MCP (AppId e8c77dc2) ---
let graph_mcp = MicrosoftGraphActivityLogs
| where TimeGenerated >= ago(lookback)
| where AppId == "e8c77dc2-69b3-43f4-bc51-3213c9d915b4"
| summarize Calls = count(),
Errors = countif(ResponseStatusCode >= 400)
by Day = bin(TimeGenerated, 1d)
| extend Server = "Graph MCP";
// --- Triage MCP (AppId 7b7b3966) ---
let triage_mcp = MicrosoftGraphActivityLogs
| where TimeGenerated >= ago(lookback)
| where AppId == "7b7b3966-1961-47b5-b080-43ca5482e21c"
| summarize Calls = count(),
Errors = countif(ResponseStatusCode >= 400)
by Day = bin(TimeGenerated, 1d)
| extend Server = "Triage MCP";
// --- Data Lake MCP (CloudAppEvents RecordType 379 + InterfaceNotProvided) ---
let data_lake_mcp = CloudAppEvents
| where TimeGenerated >= ago(lookback)
| where ActionType contains "Sentinel" or ActionType contains "KQL"
| extend RawData = parse_json(tostring(RawEventData))
| extend RecordType = toint(RawData.RecordType),
Interface = tostring(RawData.Interface),
FailureReason = tostring(RawData.FailureReason)
| where RecordType == 379 and (Interface == "InterfaceNotProvided" or isempty(Interface))
| summarize Calls = count(),
Errors = countif(isnotempty(FailureReason) and FailureReason != "")
by Day = bin(TimeGenerated, 1d)
| extend Server = "Data Lake MCP";
// --- Azure MCP/CLI (AppId 04b07795 — shared with Azure CLI) ---
let azure_interactive = SigninLogs
| where TimeGenerated >= ago(lookback)
| where AppId == "04b07795-8ddb-461a-bbee-02f9e1bf7b46"
| project TimeGenerated, ResultType;
let azure_noninteractive = AADNonInteractiveUserSignInLogs
| where TimeGenerated >= ago(lookback)
| where AppId == "04b07795-8ddb-461a-bbee-02f9e1bf7b46"
| project TimeGenerated, ResultType;
let azure_mcp = union azure_interactive, azure_noninteractive
| summarize Calls = count(),
Errors = countif(ResultType != "0" and ResultType != "")
by Day = bin(TimeGenerated, 1d)
| extend Server = "Azure MCP/CLI";
// --- Union all servers ---
union graph_mcp, triage_mcp, data_lake_mcp, azure_mcp
| extend ErrorRate = iff(Calls > 0, round(100.0 * Errors / Calls, 1), 0.0)
| project Server, Day, Calls, Errors, ErrorRate
| order by Day asc, Server asc
Query 2: Graph MCP — Endpoint & Activity Summary
Replaces: former Q2 (Top Endpoints), Q3 (Sensitive API Access), Q11 (Off-Hours Activity).
Tool: RunAdvancedHuntingQuery
Report derivation: Top endpoints = all rows by CallCount desc. Sensitive endpoints = where IsSensitive. Off-hours % = sum(OffHoursCalls) / sum(CallCount) across all rows.
// Graph MCP — single-pass endpoint analysis with sensitivity + off-hours enrichment
let sensitive_patterns = dynamic([
"roleManagement", "roleAssignments", "roleEligibility",
"authentication/methods", "identityProtection", "riskyUsers",
"riskDetections", "conditionalAccess", "servicePrincipals",
"appRoleAssignments", "oauth2PermissionGrants",
"auditLogs", "directoryRoles", "privilegedAccess",
"security/alerts", "security/incidents"
]);
MicrosoftGraphActivityLogs
| where TimeGenerated >= ago(30d)
| where AppId == "e8c77dc2-69b3-43f4-bc51-3213c9d915b4"
| extend Endpoint = tostring(split(RequestUri, "?")[0])
| extend HourOfDay = datetime_part("hour", TimeGenerated)
// dayofweek() counts from Sunday (0) to Saturday (6), so the weekend is 0 or 6
| extend DayOfWeek = dayofweek(TimeGenerated) / 1d
| extend IsOffHours = HourOfDay < 8 or HourOfDay >= 18 or DayOfWeek == 0 or DayOfWeek == 6
| extend IsSensitive = RequestUri has_any (sensitive_patterns)
| summarize
CallCount = count(),
DistinctUsers = dcount(UserId),
ErrorCount = countif(ResponseStatusCode >= 400),
AvgDurationMs = round(avg(DurationMs), 0),
OffHoursCalls = countif(IsOffHours),
Methods = make_set(RequestMethod, 5),
Users = make_set(UserId, 10),
LastUsed = max(TimeGenerated)
by Endpoint, IsSensitive
| extend
ErrorRate = round(100.0 * ErrorCount / CallCount, 1),
OffHoursPct = round(100.0 * OffHoursCalls / CallCount, 1)
| order by CallCount desc
| take 50
Query 3: Sentinel MCP — Authentication Events
Tool: RunAdvancedHuntingQuery (30-day lookback, free for Analytics-tier tables). Fall back to mcp_sentinel-data_query_lake only if lookback > 30 days.
⚠️ Pitfall-aware: Uses parse_json(Status) and parse_json(DeviceDetail) wrappers — required for Data Lake (string columns) and safe in AH. Uses = syntax (not as) in project — see project as Keyword Fails in Advanced Hunting.
// Who is authenticating to Sentinel MCP (via VS Code, Copilot Studio, browser)
SigninLogs
| where TimeGenerated >= ago(30d)
| where ResourceDisplayName =~ "Sentinel Platform Services"
| project TimeGenerated, UserPrincipalName, AppDisplayName, AppId,
ResourceDisplayName, IPAddress,
ErrorCode = tostring(parse_json(Status).errorCode),
ConditionalAccessStatus, AuthenticationRequirement, ClientAppUsed,
OS = tostring(parse_json(DeviceDetail).operatingSystem),
Country = tostring(parse_json(LocationDetails).countryOrRegion)
| order by TimeGenerated desc
Query 4: Sentinel MCP — Client App Breakdown
Tool: RunAdvancedHuntingQuery (30-day lookback, free for Analytics-tier tables).
// Which client apps (VS Code, Copilot Studio, browser) are accessing Sentinel MCP
SigninLogs
| where TimeGenerated >= ago(30d)
| where ResourceDisplayName =~ "Sentinel Platform Services"
| summarize
SignInCount = count(),
DistinctUsers = dcount(UserPrincipalName),
Users = make_set(UserPrincipalName, 10),
LastSeen = max(TimeGenerated)
by AppDisplayName, AppId, ClientAppUsed
| order by SignInCount desc
Query 5: Sentinel Triage MCP — API Call Activity (Dedicated AppId)
// Measure Sentinel Triage MCP API calls via its dedicated AppId in MicrosoftGraphActivityLogs.
// AppId 7b7b3966 = "Microsoft Defender Mcp" — the Triage MCP server's own identity.
// This gives DEFINITIVE attribution of Triage MCP calls — no shared-surface estimation needed.
//
// Confirmed Feb 2026: AppId 7b7b3966 appears in MicrosoftGraphActivityLogs with delegated
// auth (certificate), full UserId attribution, and scopes SecurityAlert.Read.All,
// SecurityIncident.Read.All, ThreatHunting.Read.All.
//
// Known API endpoints:
// - POST /v1.0/security/runHuntingQuery/ (Advanced Hunting)
// - GET /security/incidents/ (ListIncidents, GetIncidentById)
// - GET /security/alerts_v2/ (ListAlerts, GetAlertById)
let triage_mcp_appid = "7b7b3966-1961-47b5-b080-43ca5482e21c";
MicrosoftGraphActivityLogs
| where TimeGenerated >= ago(30d)
| where AppId == triage_mcp_appid
| extend Endpoint = extract(@"/v\d\.\d/(.+?)(\?|$)", 1, RequestUri)
| summarize
Calls = count(),
DistinctUsers = dcount(UserId),
Users = make_set(UserId, 10),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated)
by RequestMethod, Endpoint
| order by Calls desc
| take 25
Query 6: Sentinel Triage MCP — Authentication Events (SigninLogs)
Tool: mcp_sentinel-data_query_lake (union of SigninLogs + AADNonInteractiveUserSignInLogs fails in AH when AADNonInteractiveUserSignInLogs is on Data Lake tier — common in customer environments).
⚠️ Pitfall-aware: Uses parse_json() wrappers on DeviceDetail/LocationDetails — required for Data Lake (string columns). Uses = syntax (not as) in project.
// Triage MCP authentication events from SigninLogs + AADNonInteractiveUserSignInLogs.
// AppId 7b7b3966 = "Microsoft Defender Mcp" — delegated auth with certificate.
// Uses parse_json() wrappers for DeviceDetail/LocationDetails (safe in both AH and Data Lake).
let triage_mcp_appid = "7b7b3966-1961-47b5-b080-43ca5482e21c";
let signinlogs_interactive = SigninLogs
| where TimeGenerated >= ago(30d)
| where AppId == triage_mcp_appid
| extend SignInType = "Interactive"
| project TimeGenerated, UserPrincipalName, AppDisplayName, AppId,
ResourceDisplayName, IPAddress,
ResultType = tostring(ResultType),
ResultDescription = tostring(ResultDescription),
SignInType,
OS = tostring(parse_json(DeviceDetail).operatingSystem),
Browser = tostring(parse_json(DeviceDetail).browser),
Country = tostring(parse_json(LocationDetails).countryOrRegion),
City = tostring(parse_json(LocationDetails).city);
let signinlogs_noninteractive = AADNonInteractiveUserSignInLogs
| where TimeGenerated >= ago(30d)
| where AppId == triage_mcp_appid
| extend SignInType = "NonInteractive"
| project TimeGenerated, UserPrincipalName, AppDisplayName, AppId,
ResourceDisplayName, IPAddress,
ResultType = tostring(ResultType),
ResultDescription = tostring(ResultDescription),
SignInType,
OS = tostring(parse_json(DeviceDetail).operatingSystem),
Browser = tostring(parse_json(DeviceDetail).browser),
Country = tostring(parse_json(LocationDetails).countryOrRegion),
City = tostring(parse_json(LocationDetails).city);
union signinlogs_interactive, signinlogs_noninteractive
| summarize
SignIns = count(),
DistinctUsers = dcount(UserPrincipalName),
Users = make_set(UserPrincipalName, 10),
IPs = make_set(IPAddress, 10),
Countries = make_set(Country, 10),
LastSeen = max(TimeGenerated)
by AppDisplayName, SignInType, ResourceDisplayName
| order by SignIns desc
Query 7: LAQueryLogs — Advanced Hunting Downstream Queries (Supplementary Signal)
// SUPPLEMENTARY detection: Advanced Hunting queries (from Triage MCP, Defender portal,
// Security Copilot, or any RunAdvancedHuntingQuery consumer) that hit connected
// Log Analytics workspace tables.
//
// AH downstream queries appear under fc780465 (Sentinel Engine) with
// RequestClientApp "M365D_AdvancedHunting" — full user attribution (AADEmail populated).
//
// This is a DOWNSTREAM signal — it only fires when RunAdvancedHuntingQuery targets
// Sentinel-connected LA tables (SigninLogs, AuditLogs, SecurityAlert, etc.).
// Queries hitting XDR-native tables (DeviceEvents, EmailEvents, etc.) stay in the
// Defender XDR backend and never appear here.
//
// Use alongside Query 5 (MicrosoftGraphActivityLogs) for complete Triage MCP coverage:
// - Query 5 = PRIMARY: Triage MCP API calls filtered by dedicated AppId 7b7b3966
// - Query 7 = SUPPLEMENTARY: downstream query execution when AH hits LA tables
//
// ATTRIBUTION LIMITATION: Cannot distinguish Triage MCP AH queries from Defender portal
// AH queries or Security Copilot AH queries — all appear as M365D_AdvancedHunting.
LAQueryLogs
| where TimeGenerated >= ago(30d)
| where AADClientId == "fc780465-2017-40d4-a0c5-307022471b92" and RequestClientApp == "M365D_AdvancedHunting"
| summarize
QueryCount = count(),
DistinctUsers = dcount(AADEmail),
Users = make_set(AADEmail, 10),
AvgCPUMs = avg(StatsCPUTimeMs),
TotalRowsReturned = sum(ResponseRowCount),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated)
by AADClientId, RequestClientApp
| order by QueryCount desc
Query 8: All Workspace Query Sources — Complete Governance View
// Every client querying the workspace — MCP and non-MCP combined
LAQueryLogs
| where TimeGenerated >= ago(30d)
| summarize
QueryCount = count(),
DistinctUsers = dcount(AADEmail),
AvgCPUMs = avg(StatsCPUTimeMs),
TotalRowsReturned = sum(ResponseRowCount)
by AADClientId
| order by QueryCount desc
Query 9: Graph MCP — Caller Attribution (User vs SPN)
// Attribute Graph MCP calls to User, Service Principal, or SPN subtype
// Key: UserId populated = delegated (user), ServicePrincipalId populated = app-only (SPN)
// ClientAuthMethod: 0 = public client (user), 1 = client secret (SPN), 2 = certificate (SPN)
MicrosoftGraphActivityLogs
| where TimeGenerated >= ago(30d)
| where AppId == "e8c77dc2-69b3-43f4-bc51-3213c9d915b4"
| extend CallerType = case(
isnotempty(ServicePrincipalId) and isempty(UserId), "ServicePrincipal/Agent (App-Only)",
isnotempty(UserId) and isnotempty(ServicePrincipalId), "Delegated (User+SPN/Agent OBO)",
isnotempty(UserId) and isempty(ServicePrincipalId), "User (Delegated)",
"Unknown")
| extend AuthMethod = case(
ClientAuthMethod == 0, "Public Client",
ClientAuthMethod == 1, "Client Secret",
ClientAuthMethod == 2, "Client Certificate",
"Unknown")
| summarize
CallCount = count(),
DistinctEndpoints = dcount(tostring(split(RequestUri, "?")[0])),
SuccessRate = round(100.0 * countif(ResponseStatusCode >= 200 and ResponseStatusCode < 300) / count(), 1),
SampleEndpoints = make_set(tostring(split(RequestUri, "?")[0]), 5),
IPs = make_set(IPAddress, 5)
by CallerType, AuthMethod, UserId, ServicePrincipalId
| order by CallCount desc
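The two case() expressions in Query 9 can be mirrored outside KQL when rows are post-processed in a script. The following is a rough Python sketch of that decision logic; the function names are hypothetical, and the labels and ClientAuthMethod values are taken from the query above.

```python
# Labels for ClientAuthMethod as documented in the Query 9 comments.
AUTH_METHODS = {0: "Public Client", 1: "Client Secret", 2: "Client Certificate"}


def caller_type(user_id: str, service_principal_id: str) -> str:
    """Mirror the CallerType case() expression from Query 9."""
    has_user = bool(user_id)
    has_spn = bool(service_principal_id)
    if has_spn and not has_user:
        return "ServicePrincipal/Agent (App-Only)"
    if has_user and has_spn:
        return "Delegated (User+SPN/Agent OBO)"
    if has_user:
        return "User (Delegated)"
    return "Unknown"


def auth_method(client_auth_method) -> str:
    """Mirror the AuthMethod case() expression from Query 9."""
    return AUTH_METHODS.get(client_auth_method, "Unknown")
```

Checking the app-only branch first matters: a row with only ServicePrincipalId populated must not fall through to a delegated label.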
Post-processing: For any rows where CallerType = "ServicePrincipal/Agent (App-Only)", cross-reference the ServicePrincipalId with Entra via Graph API:
- Primary method (most reliable): Query /beta/servicePrincipals/{id}?$select=id,appId,displayName,servicePrincipalType,tags and check the tags array for agentic indicators:
  - AgenticApp: confirms this is an agent application
  - AIAgentBuilder: agent was created by an AI agent builder platform
  - AgentCreatedBy:CopilotStudio: specifically created by Copilot Studio
  - AgenticInstance: runtime instance of an agent
  - power-virtual-agents-*: Copilot Studio internal tracking tag
- Fallback: Check servicePrincipalType. If it equals "Agent", it is a registered Agent Identity. Note: as of Feb 2026, Copilot Studio agents still show "Application" here despite being true agents.
- Name-based filtering is UNRELIABLE: SPNs with "Agent" in display name may be standard app registrations (e.g., "Contoso Agent Tools" = GitCreatedApp).
Use microsoft_graph_suggest_queries → microsoft_graph_get for the Graph API calls. Query multiple SPNs in one call: /beta/servicePrincipals?$count=true&$filter=id in ('id1','id2')&$select=id,appId,displayName,servicePrincipalType,tags.
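The tag check and the servicePrincipalType fallback described above could be sketched in Python as follows. This assumes each service principal arrives as a dict shaped like the Graph API response for the $select fields listed above; the function name and the sample records are hypothetical.

```python
# Exact-match tags listed in the post-processing notes for Query 9.
AGENT_TAGS = {
    "AgenticApp",
    "AIAgentBuilder",
    "AgentCreatedBy:CopilotStudio",
    "AgenticInstance",
}
# Prefix for the Copilot Studio internal tracking tags (power-virtual-agents-*).
AGENT_TAG_PREFIX = "power-virtual-agents-"


def is_agent_identity(sp: dict) -> bool:
    """Primary check on tags, then fallback on servicePrincipalType."""
    tags = sp.get("tags") or []
    if any(t in AGENT_TAGS or t.startswith(AGENT_TAG_PREFIX) for t in tags):
        return True
    return sp.get("servicePrincipalType") == "Agent"


# Hypothetical records: the display name alone is deliberately ignored.
samples = [
    {"displayName": "Contoso Agent Tools", "servicePrincipalType": "Application", "tags": ["GitCreatedApp"]},
    {"displayName": "Helpdesk Bot", "servicePrincipalType": "Application", "tags": ["AgenticApp"]},
]
for sp in samples:
    print(sp["displayName"], is_agent_identity(sp))
```

The display name never enters the decision, which reflects the note that name-based filtering is unreliable.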
Query 10: Data Lake MCP — Access Pattern Summary
Note: Consolidates former Q20 (Tool Usage Summary) + Q24 (MCP vs Direct KQL Delineation) into a single query.
Tool: RunAdvancedHuntingQuery (uses Timestamp for CloudAppEvents).
⚠️ Pitfall-aware: Uses contains (not has) for ActionType/Operation — see CloudAppEvents CamelCase Matching. Uses parse_json(tostring(RawEventData)) — see CloudAppEvents RawEventData Parsing. Filters on SentinelAIToolRunCompleted only — see CloudAppEvents Double-Counting Prevention.
// Data Lake MCP — single-pass access pattern delineation + tool/table/workspace inventory
// Combines former Q20 (summary) and Q24 (delineation) into one query
CloudAppEvents
| where Timestamp >= ago(30d)
| where ActionType contains "Sentinel" or ActionType contains "KQL"
| extend RawData = parse_json(tostring(RawEventData))
| extend
Operation = tostring(RawData.Operation),
RecordType = toint(RawData.RecordType),
ToolName = tostring(RawData.ToolName),
Interface = tostring(RawData.Interface),
ExecutionDuration = todouble(RawData.ExecutionDuration),
FailureReason = tostring(RawData.FailureReason),
TablesRead = tostring(RawData.TablesRead),
DatabasesRead = tostring(RawData.DatabasesRead),
TotalRows = toint(RawData.TotalRows),
UserId_raw = tostring(RawData.UserId),
InputParams = tostring(RawData.InputParameters)
| extend
AccessPattern = case(
RecordType == 403 and Interface == "IMcpToolTemplate", "MCP Server-Driven",
RecordType == 379 and (Interface == "InterfaceNotProvided" or isempty(Interface)), "MCP-Driven (Probable)",
RecordType == 379 and Interface has "msglakeexplorer", "Portal (Data Lake Explorer)",
RecordType == 379 and Interface has "msgjobmanagement", "Scheduled Jobs",
RecordType == 379, "Other Direct KQL",
"Other"),
IsSuccess = isempty(FailureReason) or FailureReason == "",
HasKQLQuery = InputParams has "query"
| where Operation contains "Completed" or RecordType == 379 // 'contains' not 'has' — CamelCase
| summarize
TotalCalls = count(),
SuccessCount = countif(IsSuccess),
FailureCount = countif(not(IsSuccess)),
DistinctTools = dcount(ToolName),
Tools = make_set(ToolName, 20),
DistinctTables = dcount(TablesRead),
Tables = make_set(TablesRead, 30),
Workspaces = make_set(DatabasesRead, 5),
AvgDurationSec = round(avg(ExecutionDuration), 2),
TotalRowsReturned = sum(TotalRows),
DistinctUsers = dcount(UserId_raw),
Users = make_set(UserId_raw, 10),
KQLQueryCount = countif(HasKQLQuery),
FirstSeen = min(Timestamp),
LastSeen = max(Timestamp)
by AccessPattern
| extend ErrorRate = round(100.0 * FailureCount / TotalCalls, 1)
| order by TotalCalls desc
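For readers who post-process raw CloudAppEvents rows in a script, the AccessPattern case() expression from Query 10 can be approximated in Python. This is a sketch only: the function name is hypothetical, and the KQL has operator (case-insensitive term match) is approximated here with a case-insensitive substring test.

```python
def access_pattern(record_type: int, interface: str) -> str:
    """Approximate the AccessPattern case() expression from Query 10."""
    iface = interface or ""
    if record_type == 403 and iface == "IMcpToolTemplate":
        return "MCP Server-Driven"
    if record_type == 379:
        if iface in ("", "InterfaceNotProvided"):
            return "MCP-Driven (Probable)"
        lowered = iface.lower()
        if "msglakeexplorer" in lowered:
            return "Portal (Data Lake Explorer)"
        if "msgjobmanagement" in lowered:
            return "Scheduled Jobs"
        return "Other Direct KQL"
    return "Other"
```

Branch order follows the query: the specific interfaces are tested before the generic RecordType 379 catch-all, otherwise every direct KQL event would be labeled "Other Direct KQL".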
Post-processing for Query 10:
- If MCP Server-Driven (RecordType 403) has results → use it directly as the definitive MCP count.
- If MCP Server-Driven returns 0 rows but MCP-Driven (Probable) has results → report the probable count with the audit gap caveat. Cross-reference users with Q4/Q6 SigninLogs to validate.
- Portal (Data Lake Explorer) = msglakeexplorer@msec-msg interface, Scheduled Jobs = msgjobmanagement@msec-msg.
- Combine with Query 8 (Analytics tier LAQueryLogs, all workspace sources) for a complete two-tier governance view:
| Tier | Data Source | MCP Sources | Non-MCP Sources |
|---|---|---|---|
| Analytics Tier | LAQueryLogs | AH backend fc780465 / M365D_AdvancedHunting (captures AH queries from Triage MCP, Defender portal, Security Copilot that hit connected LA tables; shared surface, see Query 7) | Sentinel Portal (80ccca67), Sentinel Engine analytics (fc780465, non-AH), Logic Apps (de8c33bb) |
| Data Lake Tier | CloudAppEvents | Data Lake MCP (RecordType 403, IMcpToolTemplate) | Direct KQL (RecordType 379, KqsService) |
| Graph API | MicrosoftGraphActivityLogs | Graph MCP (e8c77dc2) | — |
| Azure MCP | SigninLogs, AADNonInteractiveUserSignInLogs, LAQueryLogs | Azure MCP Server (04b07795, empty RequestClientApp, query text `\n\| limit N` suffix) | |
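The first two post-processing rules for Query 10 (prefer the definitive RecordType 403 count, otherwise fall back to the probable count with a caveat) might be applied to the query output like this. The sketch assumes the result rows are available as dicts keyed by the Query 10 column names; the function name and caveat wording are hypothetical.

```python
def mcp_call_count(rows):
    """Pick the Data Lake MCP call count and say how trustworthy it is.

    rows: list of dicts with the Query 10 columns AccessPattern and TotalCalls.
    Returns a (count, confidence) tuple.
    """
    totals = {r["AccessPattern"]: r["TotalCalls"] for r in rows}
    definitive = totals.get("MCP Server-Driven", 0)
    if definitive > 0:
        return definitive, "definitive"
    probable = totals.get("MCP-Driven (Probable)", 0)
    if probable > 0:
        # Audit gap: validate these users against the Q4/Q6 sign-in results.
        return probable, "probable"
    return 0, "none"


# Hypothetical Query 10 output where RecordType 403 events are missing.
print(mcp_call_count([
    {"AccessPattern": "MCP-Driven (Probable)", "TotalCalls": 120},
    {"AccessPattern": "Portal (Data Lake Explorer)", "TotalCalls": 34},
]))
```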
Query 11: Data Lake MCP — Interface Breakdown
Tool: RunAdvancedHuntingQuery (uses Timestamp for CloudAppEvents).
⚠️ Pitfall-aware: Uses contains/parse_json(tostring()) pattern — see Query 10 pitfall notes. Uses todouble(ExecutionDuration) — see Data Lake MCP ExecutionDuration Format. When RecordType 403 is present, groups by ToolName; when absent, falls back to Interface field.
// Breakdown of Data Lake access by Interface — identifies MCP vs Portal vs Jobs
// PRIMARY: Uses RecordType 403 / ToolName when available (MCP audit events)
// FALLBACK: When RecordType 403 absent, groups by Interface field from RecordType 379
// - InterfaceNotProvided = probable MCP-driven (cross-ref with Q4/Q6 SigninLogs)
// - msglakeexplorer@msec-msg = Sentinel Portal Data Lake Explorer
// - msgjobmanagement@msec-msg = Scheduled/job-based queries
// - ipykernel_launcher.py = Jupyter Notebook
// - PowerBIConnector = Power BI
// - Microsoft.Medeina.Server = Security Copilot
CloudAppEvents
| where Timestamp >= ago(30d)
| where ActionType contains "Sentinel" or ActionType contains "KQL"
| extend RawData = parse_json(tostring(RawEventData))
| extend
Operation = tostring(RawData.Operation),
RecordType = toint(RawData.RecordType),
ToolName = tostring(RawData.ToolName),
Interface = tostring(RawData.Interface),
ExecutionDuration = todouble(RawData.ExecutionDuration),
FailureReason = tostring(RawData.FailureReason),
TablesRead = tostring(RawData.TablesRead),
UserId_raw = tostring(RawData.UserId)
| where Operation contains "Completed" or RecordType == 379
| extend
// When RecordType 403 exists, ToolName is the grouping key; otherwise use Interface
GroupKey = iff(RecordType == 403, coalesce(ToolName, "unknown_tool"), coalesce(Interface, "InterfaceNotProvided")),
IsSuccess = isempty(FailureReason) or FailureReason == "",
Source = iff(RecordType == 403, "MCP Tool (RecordType 403)", "Interface (RecordType 379)")
| summarize
CallCount = count(),
SuccessCount = countif(IsSuccess),
FailureCount = countif(not(IsSuccess)),
AvgDurationSec = round(avg(ExecutionDuration), 2),
MaxDurationSec = round(max(ExecutionDuration), 2),
TablesAccessed = make_set(TablesRead, 20),
DistinctUsers = dcount(UserId_raw),
Users = make_set(UserId_raw, 10),
FirstSeen = min(Timestamp),
LastSeen = max(Timestamp)
by GroupKey, Source
| extend ErrorRate = round(100.0 * FailureCount / CallCount, 1)
| order by CallCount desc
Query 12: Data Lake MCP — Error Analysis
Tool: RunAdvancedHuntingQuery (uses Timestamp for CloudAppEvents).
⚠️ Pitfall-aware: Uses contains/parse_json(tostring()) pattern — see Query 10 pitfall notes. Now groups errors by both AccessPattern (MCP vs Portal vs Jobs) and ErrorCategory for richer diagnostics.
// Analyze failed Data Lake queries — identify schema errors, permission issues, etc.
// PRIMARY: Filters on ActionType contains "SentinelAITool" (RecordType 403) when available
// FALLBACK: When RecordType 403 absent, analyzes all failed RecordType 379 events grouped by Interface
CloudAppEvents
| where Timestamp >= ago(30d)
| where ActionType contains "Sentinel" or ActionType contains "KQL"
| extend RawData = parse_json(tostring(RawEventData))
| extend
Operation = tostring(RawData.Operation),
RecordType = toint(RawData.RecordType),
ToolName = tostring(RawData.ToolName),
Interface = tostring(RawData.Interface),
FailureReason = tostring(RawData.FailureReason),
TablesRead = tostring(RawData.TablesRead),
UserId_raw = tostring(RawData.UserId)
| where Operation contains "Completed" or RecordType == 379
| where isnotempty(FailureReason) and FailureReason != ""
| extend
AccessPattern = case(
RecordType == 403 and Interface == "IMcpToolTemplate", "MCP Server-Driven",
RecordType == 379 and (Interface == "InterfaceNotProvided" or isempty(Interface)), "MCP-Driven (Probable)",
RecordType == 379 and Interface has "msglakeexplorer", "Portal (Data Lake Explorer)",
RecordType == 379 and Interface has "msgjobmanagement", "Scheduled Jobs",
RecordType == 379, "Other Direct KQL",
"Other"),
ErrorCategory = case(
FailureReason has "SemanticError", "Schema/Semantic Error",
FailureReason has "SyntaxError", "KQL Syntax Error",
FailureReason has "Unauthorized" or FailureReason has "403", "Permission Denied",
FailureReason has "Timeout", "Query Timeout",
FailureReason has "NotFound", "Table/Resource Not Found",
"Other Error")
| summarize
ErrorCount = count(),
Tools = make_set(ToolName, 10),
Tables = make_set(TablesRead, 10),
Users = make_set(UserId_raw, 10),
SampleErrors = make_set(substring(FailureReason, 0, 150), 5),
FirstSeen = min(Timestamp),
LastSeen = max(Timestamp)
by AccessPattern, ErrorCategory
| order by AccessPattern asc, ErrorCount desc
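If failure reasons need to be bucketed outside KQL (for example when merging exported results), the ErrorCategory mapping from Query 12 can be approximated in Python. This is a sketch under one stated assumption: KQL has matches whole terms, while the substring test used here is slightly broader. The function name is hypothetical.

```python
# (needles, label) pairs in the same order as the case() expression in Query 12.
ERROR_RULES = [
    (("semanticerror",), "Schema/Semantic Error"),
    (("syntaxerror",), "KQL Syntax Error"),
    (("unauthorized", "403"), "Permission Denied"),
    (("timeout",), "Query Timeout"),
    (("notfound",), "Table/Resource Not Found"),
]


def error_category(failure_reason: str) -> str:
    """Return the first matching category, or "Other Error"."""
    text = (failure_reason or "").lower()
    for needles, label in ERROR_RULES:
        if any(needle in text for needle in needles):
            return label
    return "Other Error"
```

As in the query, the first matching rule wins, so a message containing both a semantic error and a timeout is reported as a schema problem.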
Query 13: Azure MCP Server — Authentication Events (SigninLogs)
Tool: mcp_sentinel-data_query_lake (90d lookback exceeds AH 30d limit).
⚠️ Pitfall-aware: Uses parse_json(Status)/parse_json(DeviceDetail) wrappers — see SigninLogs Status Field Needs parse_json(). Uses extend SignInType to avoid Type pseudo-column — see Type Column Unavailable in Data Lake Union Contexts.
// Detect Azure MCP Server authentication events via Azure CLI AppId.
//
// 🔄 UPDATED Feb 2026: Azure MCP Server now uses Azure CLI credential (04b07795),
// NOT AzurePowerShellCredential (1950a258) as previously documented.
// The old AppId 1950a258 + UserAgent 'azsdk-net-Identity' fingerprint is OBSOLETE.
//
// ⚠️ SHARED APPID: 04b07795 is the Azure CLI AppId — shared with manual 'az' CLI usage.
// There is NO unique sign-in fingerprint for Azure MCP Server vs manual Azure CLI.
// This query returns ALL Azure CLI sign-ins. Correlate with LAQueryLogs (Query 14)
// for query-level attribution via the '\n| limit N' text pattern.
//
// NOTE: Sign-in events represent TOKEN ACQUISITIONS, not individual API calls.
// A cached token serves many Azure MCP calls with no additional sign-in events.
// FIX (Feb 2026): Explicit tostring() casts on ResultType, ResultDescription,
// ConditionalAccessStatus, AuthenticationRequirement to prevent union type mismatches
// between SigninLogs and AADNonInteractiveUserSignInLogs. Removed ResourceId (inconsistent
// across tables). Use parse_json() wrapper on DeviceDetail and LocationDetails — these
// columns may be stored as string (not dynamic) in Data Lake workspaces, causing
// SemanticError on dot-notation access without parse_json().
let azure_mcp_appid = "04b07795-8ddb-461a-bbee-02f9e1bf7b46";
let signinlogs_interactive = SigninLogs
| where TimeGenerated >= ago(90d)
| where AppId == azure_mcp_appid
| extend SignInType = "Interactive"
| project TimeGenerated, UserPrincipalName, AppDisplayName, AppId,
ResourceDisplayName, IPAddress,
ResultType = tostring(ResultType),
ResultDescription = tostring(ResultDescription),
UserAgent, SignInType,
ConditionalAccessStatus = tostring(ConditionalAccessStatus),
AuthenticationRequirement = tostring(AuthenticationRequirement),
OS = tostring(parse_json(DeviceDetail).operatingSystem),
Country = tostring(parse_json(LocationDetails).countryOrRegion);
let signinlogs_noninteractive = AADNonInteractiveUserSignInLogs
| where TimeGenerated >= ago(90d)
| where AppId == azure_mcp_appid
| extend SignInType = "Non-Interactive"
| project TimeGenerated, UserPrincipalName, AppDisplayName, AppId,
ResourceDisplayName, IPAddress,
ResultType = tostring(ResultType),
ResultDescription = tostring(ResultDescription),
UserAgent, SignInType,
ConditionalAccessStatus = tostring(ConditionalAccessStatus),
AuthenticationRequirement = tostring(AuthenticationRequirement),
OS = tostring(parse_json(DeviceDetail).operatingSystem),
Country = tostring(parse_json(LocationDetails).countryOrRegion);
union signinlogs_interactive, signinlogs_noninteractive
| order by TimeGenerated desc
Query 14: Azure MCP Server — Workspace Queries (LAQueryLogs)
Tool: mcp_sentinel-data_query_lake (90d lookback exceeds AH 30d limit).
// Detect Azure MCP Server workspace queries via LAQueryLogs.
//
// 🔄 UPDATED Feb 2026: Azure MCP Server now uses Azure CLI credential (04b07795).
// RequestClientApp is EMPTY (not 'csharpsdk,LogAnalyticsPSClient' as previously documented).
//
// ⚠️ SHARED FINGERPRINT: Empty RequestClientApp + AppId 04b07795 is shared with manual
// Azure CLI and 4+ other AADClientIds. This query returns ALL queries from AppId 04b07795
// with empty RequestClientApp. To isolate Azure MCP Server queries, look for the
// '\n| limit N' suffix that monitor_workspace_log_query always appends to query text.
//
// 30-day pattern analysis (Feb 2026) showed 11 distinct RequestClientApp values:
// - Empty ("") = 417 queries across 5 AADClientIds (Azure MCP, Sentinel DL MCP, Portal, etc.)
// - "csharpsdk,LogAnalyticsPSClient" = only 1 query ever (obsolete fingerprint)
// - "M365D_AdvancedHunting" = Advanced Hunting backend
// - "ASI_Portal" / "ASI_Portal_Connectors" = Sentinel Portal
// - Others: AppInsightsPortalExtension, LogicApps, PSClient, etc.
let azure_cli_appid = "04b07795-8ddb-461a-bbee-02f9e1bf7b46";
LAQueryLogs
| where TimeGenerated >= ago(90d)
| where AADClientId == azure_cli_appid
| extend HasLimitSuffix = QueryText has "\n| limit" or QueryText has "\r\n| limit"
| project TimeGenerated, AADEmail, AADClientId,
RequestClientApp,
QueryTextTruncated = substring(QueryText, 0, 300),
ResponseCode, ResponseRowCount,
StatsCPUTimeMs,
RequestTarget,
HasLimitSuffix
| order by TimeGenerated desc
Post-processing: Rows with HasLimitSuffix = true are highly likely Azure MCP Server queries (the monitor_workspace_log_query command always appends | limit N). Rows without the suffix may be manual Azure CLI or other tools using the same credential.
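When query text is reviewed offline, the limit-suffix fingerprint can be checked with a regular expression. The Python sketch below is stricter than the has test in Query 14: it assumes the suffix is the last line of the query and that N is an integer. Both are assumptions, not documented behavior, and the function name is hypothetical.

```python
import re

# A newline, then "| limit <integer>" as the final line of the query text.
LIMIT_SUFFIX_RE = re.compile(r"\r?\n\|\s*limit\s+\d+\s*$", re.IGNORECASE)


def has_limit_suffix(query_text: str) -> bool:
    """True when the query ends with an appended '| limit N' line."""
    return bool(LIMIT_SUFFIX_RE.search(query_text or ""))


# Hypothetical query texts for illustration.
print(has_limit_suffix("SigninLogs | take 5\n| limit 20"))
print(has_limit_suffix("SigninLogs | limit 20"))
```

Anchoring at the end of the text reduces false positives from hand-written queries that happen to use limit mid-pipeline on its own line.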
Query 15: Top MCP Users — Cross-Server Breadth
Tool: RunAdvancedHuntingQuery (7-day lookback default, all tables on Analytics tier).
Purpose: Identifies users with the broadest MCP footprint — ranking by how many distinct MCP server types they use and their total call volume across all channels. Feeds the Top MCP Users report section and SVG dashboard widget.
let lookback = 7d;
let graph_mcp = MicrosoftGraphActivityLogs
| where TimeGenerated > ago(lookback)
| where AppId == "e8c77dc2-69b3-43f4-bc51-3213c9d915b4"
| where isnotempty(UserId)
| summarize Calls = count() by UserId
| project UserId, Server = "Graph MCP", Calls;
let triage_mcp = MicrosoftGraphActivityLogs
| where TimeGenerated > ago(lookback)
| where AppId == "7b7b3966-1961-47b5-b080-43ca5482e21c"
| where isnotempty(UserId)
| summarize Calls = count() by UserId
| project UserId, Server = "Triage MCP", Calls;
let datalake_mcp = CloudAppEvents
| where Timestamp > ago(lookback)
| where ActionType contains "Sentinel" or ActionType contains "KQL"
| extend RawData = parse_json(tostring(RawEventData))
| where tostring(RawData.Interface) == "InterfaceNotProvided" or isempty(tostring(RawData.Interface))
| where isnotempty(AccountObjectId)
| summarize Calls = count() by UserId = AccountObjectId
| project UserId, Server = "Data Lake MCP", Calls;
let azure_mcp = union SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(lookback)
| where AppId == "04b07795-8ddb-461a-bbee-02f9e1bf7b46"
| where isnotempty(UserId)
| summarize Calls = count() by UserId
| project UserId, Server = "Azure CLI/MCP", Calls;
let upn_map = union SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(lookback)
| where isnotempty(UserPrincipalName)
| summarize arg_max(TimeGenerated, UserPrincipalName) by UserId
| project UserId, UPN = UserPrincipalName;
union graph_mcp, triage_mcp, datalake_mcp, azure_mcp
| summarize Servers = make_set(Server), ServerCount = dcount(Server), TotalCalls = sum(Calls) by UserId
| join kind=leftouter upn_map on UserId
| project UPN = coalesce(UPN, UserId), ServerCount, Servers, TotalCalls
| sort by ServerCount desc, TotalCalls desc
| take 25
⚠️ Pitfall-aware:
- Data Lake MCP leg: Uses ActionType contains (not has) per the CamelCase pitfall. Parses RawEventData once and filters on Interface field for the InterfaceNotProvided proxy signal when RecordType 403 is unavailable (see Phase 3 Known Limitation).
- Azure CLI/MCP leg: Uses shared AppId 04b07795, which includes both Azure MCP Server and manual az CLI sign-ins. Cannot distinguish at this level.
- UPN resolution: Joins with SigninLogs to resolve UserId GUIDs to human-readable UPNs. Users with no recent sign-ins will show their GUID instead.
- CloudAppEvents timestamp: Uses Timestamp (not TimeGenerated) since this runs via Advanced Hunting.
- AADNonInteractiveUserSignInLogs tier: If this table is on Data Lake/Basic tier, the union SigninLogs, AADNonInteractiveUserSignInLogs legs may fail in AH. Fall back to mcp_sentinel-data_query_lake if needed (switch Timestamp → TimeGenerated for the CloudAppEvents leg).
Post-processing:
- Render as a ranked table in the report: | Rank | User (UPN) | Servers Used | MCP Servers | Total Calls |
- Users spanning 3+ servers represent the broadest MCP adoption; highlight them.
- Cross-reference top users with the sensitive endpoint data from Q2 to flag users with both breadth AND sensitive access.
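A possible way to turn the Query 15 output into the ranked table, sketched in Python. It assumes rows are dicts keyed by the Query 15 output columns (UPN, ServerCount, Servers, TotalCalls). Bold text as the highlight for users on 3+ servers is an illustrative choice, and the function name is hypothetical.

```python
def render_top_users(rows):
    """Render Query 15 rows as a ranked markdown table."""
    ranked = sorted(rows, key=lambda r: (-r["ServerCount"], -r["TotalCalls"]))
    lines = [
        "| Rank | User (UPN) | Servers Used | MCP Servers | Total Calls |",
        "|---|---|---|---|---|",
    ]
    for rank, row in enumerate(ranked, start=1):
        # Highlight users spanning 3+ servers (broadest MCP adoption).
        upn = f"**{row['UPN']}**" if row["ServerCount"] >= 3 else row["UPN"]
        servers = ", ".join(row["Servers"])
        lines.append(
            f"| {rank} | {upn} | {row['ServerCount']} | {servers} | {row['TotalCalls']} |"
        )
    return "\n".join(lines)


# Hypothetical rows for illustration.
print(render_top_users([
    {"UPN": "a@contoso.com", "ServerCount": 1, "Servers": ["Graph MCP"], "TotalCalls": 900},
    {"UPN": "b@contoso.com", "ServerCount": 3,
     "Servers": ["Graph MCP", "Triage MCP", "Data Lake MCP"], "TotalCalls": 120},
]))
```

Sorting by server count before call volume keeps the ranking aligned with the sort order used at the end of Query 15.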
Report Template
Inline Chat Report Structure
The inline report MUST include these sections in order:
1. Header: Workspace, analysis period, data sources checked, MCP servers detected
2. Executive Summary: 2-3 sentence overview of MCP usage posture
3. MCP Footprint Summary (SVG-critical: provides consolidated KPIs for dashboard Row 2 + Row 3)
   - Server Landscape table: one row per MCP server with: Server, API Calls, Auth Events, Distinct Users, Error Rate, Status. This table feeds the SVG server_landscape widget directly.
   - Consolidated KPI block: aggregate totals across all servers:
       Total MCP API Calls: <sum of API calls across Graph + Triage + Data Lake + Azure>
       Total Auth Events: <sum of auth events across Triage + Azure + Platform Services>
       Distinct MCP Users: <deduplicated count or max across channels>
       Active MCP Servers: <count of server types with >0 activity>
       Combined MCP Query Share: <MCP queries / total workspace queries %>
       Sensitive API Rate: <sensitive / total Graph MCP calls %>
   - These values are derived from Phase 1-5 query results and MUST be rendered as a single block for SVG extraction. Do not scatter them across per-server sections only.
4. Graph MCP Server Analysis
   - Daily usage trend (ASCII bar chart showing requests/day, from Query 1 unified trend, Graph MCP series)
   - Top endpoints table (endpoint, call count, % of total, last used)
   - Sensitive API access summary with user attribution
   - Caller attribution (User vs SPN vs Agent, from Query 9)
5. Sentinel Triage MCP Analysis
   - Triage MCP API calls from MicrosoftGraphActivityLogs, filtered by dedicated AppId 7b7b3966 ("Microsoft Defender Mcp")
   - Daily usage trend (ASCII bar chart showing calls/day, from Query 1 unified trend, Triage MCP series)
   - Triage MCP authentication events from SigninLogs/AADNonInteractiveUserSignInLogs: sign-in frequency, user attribution, IP, country
   - User attribution table with sign-in type breakdown
6. Sentinel Data Lake MCP Analysis
   - MCP tool usage summary (success/failure, avg duration)
   - Tool breakdown table (query_lake, list_sentinel_workspaces, search_tables, etc.)
   - Error analysis with error categories and sample failure reasons
   - Daily activity trend (ASCII bar chart, from Query 1 unified trend, Data Lake MCP series)
   - MCP vs Direct KQL delineation table
7. Azure MCP & ARM Analysis
   - Azure MCP Server authentication events (detected via AppId 04b07795: Azure CLI credential, shared AppId)
   - Daily auth trend (ASCII bar chart showing events/day, from Query 1 unified trend, Azure MCP/CLI series)
   - Azure MCP Server workspace queries from LAQueryLogs (detected via AADClientId 04b07795 + empty RequestClientApp + \n| limit N query text suffix)
   - ARM operation volume and resource providers accessed. If no ARM write ops detected, explicitly state: "✅ No ARM write operations detected for AppId 04b07795 in the analysis period."
   - Source attribution via Claims.appid (Azure Portal, AI Studio, Power Platform connectors, etc.)
8. Workspace Query Governance (Two-Tier)
   - Analytics Tier (LAQueryLogs): All query sources table with MCP vs Portal vs Platform breakdown
   - Data Lake Tier (CloudAppEvents): MCP-driven vs Direct KQL breakdown
   - Combined MCP proportion across both tiers
   - Pareto analysis of query sources
9. Top MCP Users (Cross-Server Breadth)
   - Ranked table of users by number of MCP servers used and total call volume
   - Cross-server correlation (Graph MCP, Triage MCP, Data Lake MCP, Azure CLI/MCP)
   - UPN resolution from UserIds
10. MCP Usage Score: Per-dimension breakdown with scoring rationale
11. Security Assessment: Emoji-coded findings table with evidence citations
12. Recommendations: Prioritized action items based on findings
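The Consolidated KPI block in the MCP Footprint Summary is a handful of sums and ratios over the per-server results. A minimal Python sketch follows, assuming the per-server figures have already been collected into dicts; the input shape, key names, and function names are hypothetical, and users are deduplicated by set union (the "deduplicated count" option).

```python
def pct(part, whole):
    """Percentage rounded to one decimal; 0.0 when the denominator is zero."""
    return round(100.0 * part / whole, 1) if whole else 0.0


def consolidated_kpis(servers, mcp_queries, workspace_queries,
                      sensitive_calls, graph_calls):
    """Aggregate per-server figures into the six consolidated KPIs.

    servers: {name: {"api_calls": int, "auth_events": int, "users": set}}
    """
    users = set().union(*(s["users"] for s in servers.values()))
    active = sum(
        1 for s in servers.values() if s["api_calls"] + s["auth_events"] > 0
    )
    return {
        "Total MCP API Calls": sum(s["api_calls"] for s in servers.values()),
        "Total Auth Events": sum(s["auth_events"] for s in servers.values()),
        "Distinct MCP Users": len(users),
        "Active MCP Servers": active,
        "Combined MCP Query Share": pct(mcp_queries, workspace_queries),
        "Sensitive API Rate": pct(sensitive_calls, graph_calls),
    }
```

Returning one dict keeps the values together, which matches the requirement to render them as a single block rather than scattering them across per-server sections.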
Report Completeness Checklist
🔴 MANDATORY — Run before finalizing any report. After composing the full report, verify each row below. Every server section (4-7) must include its Daily Trend chart derived from Query 1. Query 1 returns all 4 server series in a single union — filter by Server column to extract each.
| # | Section | Required Sub-Section | Data Source | Check |
|---|---|---|---|---|
| 4 | Graph MCP Server | Daily Usage Trend (ASCII bar chart) | Q1 → Server = "Graph MCP" | ☐ |
| 4 | Graph MCP Server | Top Endpoints table | Q2 | ☐ |
| 4 | Graph MCP Server | Sensitive API access summary | Q2 IsSensitive rows | ☐ |
| 4 | Graph MCP Server | Caller attribution | Q9 | ☐ |
| 5 | Sentinel Triage MCP | Daily Usage Trend (ASCII bar chart) | Q1 → Server = "Triage MCP" | ☐ |
| 5 | Sentinel Triage MCP | API calls table | Q5 | ☐ |
| 5 | Sentinel Triage MCP | Authentication events | Q6 | ☐ |
| 6 | Data Lake MCP | Daily Activity Trend (ASCII bar chart) | Q1 → Server = "Data Lake MCP" | ☐ |
| 6 | Data Lake MCP | MCP vs Direct KQL delineation | Q10 | ☐ |
| 6 | Data Lake MCP | Tool breakdown table | Q11 | ☐ |
| 6 | Data Lake MCP | Error analysis | Q12 | ☐ |
| 7 | Azure MCP Server | Daily Auth Trend (ASCII bar chart) | Q1 → Server = "Azure MCP/CLI" | ☐ |
| 7 | Azure MCP Server | Authentication events | Q13 | ☐ |
| 7 | Azure MCP Server | Workspace queries (LAQueryLogs) | Q14 | ☐ |
| 7 | Azure MCP Server | AzureActivity write operations | (ad-hoc or explicit "none found") | ☐ |
| 9 | Top MCP Users | Cross-server user breadth table | Q15 | ☐ |
If any checkbox cannot be checked, either the data was missing (state why — e.g., "Q1 returned 0 rows for this server") or the section was accidentally omitted. Do not finalize the report with unchecked boxes unless the data genuinely does not exist.
Report Visualization Patterns
Daily Usage Trend (ASCII)
Graph MCP Usage — Last 30 Days
Day Calls Trend
─────────────────────────────────────
2026-02-07 │ 23 ████████████
2026-02-06 │ 0
2026-02-05 │ 45 ██████████████████████
2026-02-04 │ 12 ██████
...
─────────────────────────────────────
Avg: 15.2/day Peak: 45 Total: 152
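A chart in this style can be produced from the per-day counts returned by the trend query. The Python sketch below scales bars proportionally to the peak day; the function name, the bar width, and the proportional scaling rule are assumptions, so bar lengths may differ slightly from the sample above.

```python
def ascii_trend(daily_counts, width=22):
    """Render {day: calls} as rows of bars, newest day first, plus a footer."""
    peak = max(daily_counts.values(), default=0)
    lines = []
    for day in sorted(daily_counts, reverse=True):
        calls = daily_counts[day]
        # Scale each bar relative to the busiest day.
        bar = "█" * round(width * calls / peak) if peak else ""
        lines.append(f"{day} │ {calls:>4} {bar}".rstrip())
    total = sum(daily_counts.values())
    avg = total / len(daily_counts) if daily_counts else 0.0
    lines.append(f"Avg: {avg:.1f}/day Peak: {peak} Total: {total}")
    return "\n".join(lines)


# Hypothetical counts for illustration.
print(ascii_trend({
    "2026-02-07": 23,
    "2026-02-06": 0,
    "2026-02-05": 45,
    "2026-02-04": 12,
}))
```

Days with zero calls are kept as empty rows so gaps in activity stay visible in the report.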
Workspace Query Proportion (ASCII)
Analytics Tier Query Sources — Last 30d (LAQueryLogs)
──────────────────────────────────────────
Sentinel Engine ████████████████████████████████████ 88.4% (10,354)
Logic Apps ████ 7.0% (821)
Triage MCP █ 4.1% (481)
Sentinel Portal 0.4% (48)
──────────────────────────────────────────
MCP Servers: 4.1% │ Portal: 0.4% │ Platform: 95.4%
Data Lake Tier Query Sources — Last 30d (CloudAppEvents)
──────────────────────────────────────────
Data Lake MCP ████████████████████████████████████ 96.8% (1,028)
Direct KQL 3.2% (34)
──────────────────────────────────────────
MCP Server-Driven: 96.8% │ Direct KQL: 3.2%
Endpoint Access Distribution (ASCII)
Top Graph MCP Endpoints — 30d
─────────────────────────────────────────────────────
conditionalAccess/policies ████████████ 27 (17.8%)
users ██████████ 22 (14.5%)
roleManagement/directory ████████ 18 (11.8%)
servicePrincipals ██████ 14 (9.2%)
groups █████ 11 (7.2%)
...
─────────────────────────────────────────────────────
🔴 Sensitive: 82/152 (53.9%) │ ✅ Standard: 70/152 (46.1%)
MCP Usage Score Card (ASCII)
┌──────────────────────────────────────────────────────┐
│ MCP USAGE SCORE: 22/100 │
│ Rating: ✅ HEALTHY │
├──────────────────────────────────────────────────────┤
│ User Diversity [██░░░░░░░░] 3/20 (1-2 users) │
│ Endpoint Sensitiv [████████░░] 14/20 (54% sensitive)│
│ Error Rate [░░░░░░░░░░] 0/20 (<1% errors) │
│ Volume Anomaly [██░░░░░░░░] 3/20 (within norm) │
│ Off-Hours Activity [█░░░░░░░░░] 2/20 (<5% off-hrs) │
└──────────────────────────────────────────────────────┘
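The score card combines five dimensions of up to 20 points each into a total out of 100. A small Python sketch of the total and the per-dimension gauge follows; the function names are hypothetical, and the proportional fill rule is an assumption that may not match the rounding used in the sample card above. Rating thresholds are not defined in this section, so none are modeled.

```python
def score_bar(score, max_score=20, width=10):
    """Render a gauge such as [███████░░░] with a proportional fill."""
    filled = round(width * score / max_score)
    return "[" + "█" * filled + "░" * (width - filled) + "]"


def total_score(dimensions):
    """Sum the per-dimension scores into the overall MCP usage score."""
    return sum(dimensions.values())


# Dimension scores taken from the sample score card.
dimensions = {
    "User Diversity": 3,
    "Endpoint Sensitivity": 14,
    "Error Rate": 0,
    "Volume Anomaly": 3,
    "Off-Hours Activity": 2,
}
for name, score in dimensions.items():
    print(f"{name:<22} {score_bar(score)} {score}/20")
print(f"MCP USAGE SCORE: {total_score(dimensions)}/100")
```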
Markdown File Report Structure
When outputting to markdown file, include everything from the inline format PLUS:
# MCP Server Usage Monitoring Report
**Generated:** YYYY-MM-DD HH:MM UTC
**Workspace:** <workspace_name>
**Analysis Period:** <start> → <end> (<N> days)
**Data Sources:** MicrosoftGraphActivityLogs, SigninLogs, LAQueryLogs, CloudAppEvents, AzureActivity, SentinelAudit
---
## Executive Summary
<2-3 sentence summary: MCP servers detected, total usage volume, risk level, key findings>
---
## MCP Footprint Summary
### Server Landscape
| MCP Server | API Calls | Auth Events | Distinct Users | Error Rate | Status |
|------------|----------:|------------:|---------------:|-----------:|--------|
| Graph MCP | ... | — | ... | ...% | ✅/🟡/🟠/🔴 |
| Triage MCP | ... | ... | ... | ...% | ✅/🟡/🟠/🔴 |
| Data Lake MCP | ... | — | ... | ...% | ✅/🟡/🟠/🔴 |
| Azure MCP/CLI | — | ... | ... | ...% | ✅/🟡/🟠/🔴 |
### Consolidated KPIs
| Metric | Value |
|--------|------:|
| Total MCP API Calls | X,XXX |
| Total Auth Events | X,XXX |
| Distinct MCP Users | XXX |
| Active MCP Servers | N of 4 |
| Combined MCP Query Share | X.X% |
| Sensitive API Rate | X.X% |
> **SVG Note:** These KPIs map directly to Row 2 KPI cards and the Server Landscape maps to Row 3 table widget. Render this section before per-server deep dives to enable incremental SVG generation.
---
## Graph MCP Server
### Daily Usage Trend
<ASCII bar chart — requests per day>
### Top Endpoints
| Rank | Endpoint | Calls | % Total | Users | Last Used |
|------|----------|-------|---------|-------|-----------|
| 1 | ... | ... | ... | ... | ... |
### Sensitive API Access
| Endpoint | Calls | Users | Methods | Risk |
|----------|-------|-------|---------|------|
| roleManagement/... | 18 | 1 | GET | 🟠 Read access to PIM |
| ... | ... | ... | ... | ... |
**Summary:** X of Y calls (Z%) targeted sensitive endpoints. <Risk assessment>.
### Caller Attribution (Query 9)
| Caller Type | Auth Method | Users | Calls | Success Rate |
|-------------|-------------|------:|------:|-------------:|
| 👤 User (Delegated) | ... | ... | ... | ...% |
| 🤖 Service Principal | ... | ... | ... | ...% |
---
## Sentinel Triage MCP
### Triage MCP API Calls (MicrosoftGraphActivityLogs — AppId `7b7b3966`)
| Endpoint | Method | Calls | Users | First Seen | Last Seen |
|----------|--------|-------|-------|------------|----------|
| ... | ... | ... | ... | ... | ... |
### Triage MCP Authentication Events (SigninLogs — "Microsoft Defender Mcp")
| Sign-In Type | Sign-Ins | Users | IPs | Countries | Resource | Last Seen |
|-------------|----------|-------|-----|-----------|----------|----------|
| ... | ... | ... | ... | ... | ... | ... |
---
## Sentinel Data Lake MCP
> **Audit Source:** `CloudAppEvents` (Purview unified audit log)
> **Classification:** RecordType 403 + Interface `IMcpToolTemplate` = MCP-driven | RecordType 379 = Direct KQL
### MCP vs Direct KQL Delineation
| Access Pattern | Total Calls | Success | Failures | Error Rate | Avg Duration | Users |
|---------------|-------------|---------|----------|------------|-------------|-------|
| 🤖 MCP Server-Driven | ... | ... | ... | ...% | ...s | ... |
| 👤 Direct KQL | ... | ... | ... | ...% | ...s | ... |
### MCP Tool Breakdown
| Tool Name | Calls | Success | Failures | Error Rate | Avg Duration | Last Seen |
|-----------|-------|---------|----------|------------|-------------|-----------|
| `query_lake` | ... | ... | ... | ...% | ...s | ... |
| `list_sentinel_workspaces` | ... | ... | ... | ...% | ...s | ... |
| `search_tables` | ... | ... | ... | ...% | ...s | ... |
| ... | ... | ... | ... | ... | ... | ... |
### Error Analysis
| Error Category | Count | % of Failures | Sample Error | Affected Tools |
|---------------|-------|---------------|--------------|----------------|
| Schema/Semantic Error | ... | ...% | `column 'X' does not exist` | ... |
| ... | ... | ... | ... | ... |
### Daily Activity Trend
<ASCII bar chart — MCP + Direct KQL calls per day>
---
## Azure MCP Server
> **Detection Method:** Azure CLI credential (AppId `04b07795`, shared with manual `az` CLI). `RequestClientApp` is empty in LAQueryLogs. Best differentiator: Azure MCP appends `\\n| limit N` to query text via `monitor_workspace_log_query`. 🔄 Previously documented as AppId `1950a258` + `csharpsdk,LogAnalyticsPSClient` — that fingerprint is obsolete.
### Authentication Timeline
| Timestamp | Resource | Result | Auth Type | UserAgent | Notes |
|-----------|----------|--------|-----------|-----------|-------|
| ... | ... | ... | ... | ... | ... |
### Workspace Queries (LAQueryLogs)
| Timestamp | Query (truncated) | Response | CPU (ms) | Source App |
|-----------|-------------------|----------|----------|------------|
| ... | ... | ... | ... | ... |
### AzureActivity Write Operations
| Timestamp | Operation | Resource Provider | Status | Claims.appid |
|-----------|-----------|-------------------|--------|-------------|
| ... | ... | ... | ... | `04b07795` |
> If no ARM write operations found, state: "✅ No ARM write operations detected for AppId `04b07795` in the analysis period. ARM read operations are not logged in AzureActivity."
---
## Azure ARM Operations (All Sources)
> **Source Attribution:** ARM operations attributed via `Claims.appid` in AzureActivity.
> Azure MCP Server read-only operations NOT logged in AzureActivity.
### ARM Source Attribution
| AppId | App Name | Calls | Operations |
|-------|----------|-------|------------|
| ... | ... | ... | ... |
### Operations by Resource Provider
| Resource Provider | Calls | Top Operations | Distinct Resources |
|-------------------|-------|----------------|-------------------|
| ... | ... | ... | ... |
---
## Workspace Query Governance (Two-Tier)
### Analytics Tier (LAQueryLogs)
| Rank | AppId | Source | Category | Queries | % Total | Users |
|------|-------|--------|----------|---------|---------|-------|
| 1 | ... | Sentinel Engine | Platform | ... | ... | ... |
| 2 | ... | Sentinel Triage MCP | MCP Server | ... | ... | ... |
| 3 | ... | Sentinel Portal | Portal | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... |
### Data Lake Tier (CloudAppEvents)
| Access Pattern | Calls | % Total | Users | Tables Accessed |
|---------------|-------|---------|-------|-----------------|
| 🤖 MCP Server-Driven | ... | ...% | ... | ... |
| 👤 Direct KQL | ... | ...% | ... | ... |
### Combined MCP Proportion
<ASCII proportion bar — Analytics + Data Lake tiers combined>
MCP queries represent X% of combined query volume:
- Analytics tier: X of Y queries via Sentinel Triage MCP (Z%)
- Data Lake tier: X of Y queries via Data Lake MCP (Z%)
- Graph API: X calls via Graph MCP
---
## Top MCP Users (Cross-Server Breadth)
### User Ranking by MCP Server Breadth (Query 15)
| Rank | User (UPN) | Servers Used | MCP Servers | Total Calls |
|------|-----------|:------------:|-------------|------------:|
| 1 | ... | N | Graph MCP, Triage MCP, ... | X,XXX |
| 2 | ... | N | ... | X,XXX |
| ... | ... | ... | ... | ... |
> **Interpretation:** Users spanning 3+ MCP servers represent the broadest AI tool adoption. Cross-reference with sensitive endpoint data (§4) to identify users combining breadth with privileged access.
---
## MCP Usage Score
<ASCII score card>
### Dimension Breakdown
| Dimension | Score | Evidence |
|-----------|-------|----------|
| User Diversity | X/20 | N distinct users across M MCP channels |
| Endpoint Sensitivity | X/20 | N% of Graph MCP calls to sensitive endpoints |
| Error Rate | X/20 | N% error rate across all channels |
| Volume Anomaly | X/20 | Peak day was N% of rolling average |
| Off-Hours Activity | X/20 | N% of calls outside 08:00-18:00 UTC |
---
## Security Assessment
| Factor | Finding |
|--------|---------|
| 🔴/🟢/🟡/🟠 **Factor** | Evidence-based finding |
---
## Recommendations
1. ⚠️/🟢 <Prioritized action item with evidence>
2. ...
---
## Appendix: Query Details
Render a single markdown table summarizing all queries executed. **Do NOT include full KQL text** — the canonical queries are already documented in this SKILL.md file. The appendix serves as an audit trail only.
| Query | Table(s) | Records Scanned | Results | Execution |
|-------|----------|----------------:|--------:|----------:|
| Q1 — Unified Daily MCP Activity Trend | MicrosoftGraphActivityLogs, CloudAppEvents, SigninLogs, AADNonInteractive, LAQueryLogs | X,XXX | N rows | X.XXs |
| Q2 — Graph MCP Endpoint & Activity Summary | MicrosoftGraphActivityLogs | X,XXX | N rows | X.XXs |
| ... | ... | ... | ... | ... |
*Query definitions: see the Sample KQL Queries section in this SKILL.md file.*
Proactive Alerting — KQL Data Lake Jobs
This skill provides on-demand visibility (Phases 1-7 above). For continuous, scheduled anomaly detection that feeds Sentinel analytics rules, use the companion KQL Data Lake Jobs defined in:
📄 queries/identity/mcp_anomaly_detection_kql_jobs.md
Maturity Model
| Tier | Capability | Implementation |
|---|---|---|
| 1. Visibility (current skill) | On-demand MCP usage reports via Copilot chat | This SKILL.md — Phases 1-7, Queries 1-15 |
| 2. Baselining | 14-day behavioral baselines per user per MCP server | KQL Jobs 1-8 build baselines automatically |
| 3. Alerting | Automated anomaly detection → Sentinel incidents | KQL Jobs promote to _KQL_CL tables → Analytics Rules fire |
| 4. Enforcement | Real-time guardrails, scope limits (future) | Not yet available — requires MCP protocol-level controls |
KQL Job Inventory
| Job | Anomaly Type | Source Table(s) | Destination Table | Schedule |
|---|---|---|---|---|
| 1 | New sensitive Graph endpoint | MicrosoftGraphActivityLogs | `MCPGraphAnomalies_KQL_CL` | Daily |
| 2 | Graph MCP volume spike (3x baseline) | MicrosoftGraphActivityLogs | `MCPGraphAnomalies_KQL_CL` | Daily |
| 3 | Off-hours Graph MCP activity | MicrosoftGraphActivityLogs | `MCPGraphAnomalies_KQL_CL` | Daily |
| 4 | Graph MCP error rate anomaly | MicrosoftGraphActivityLogs | `MCPGraphAnomalies_KQL_CL` | Daily |
| 5 | New Azure MCP Server user | AADNonInteractiveUserSignInLogs | `MCPAzureAnomalies_KQL_CL` | Daily |
| 6 | New Azure MCP resource target | AADNonInteractiveUserSignInLogs | `MCPAzureAnomalies_KQL_CL` | Daily |
| 7 | Sentinel workspace query anomalies | LAQueryLogs | `MCPSentinelAnomalies_KQL_CL` | Daily |
| 8 | Cross-MCP activity chains | Multiple (join) | `MCPCrossMCPCorrelation_KQL_CL` | Daily |
Why KQL Jobs (Not Summary Rules)
KQL jobs support multi-table joins — critical for Job 7 (LAQueryLogs + baseline) and Job 8 (Graph + Azure + Sentinel cross-correlation). Summary rules are limited to single-table with lookup() joins to analytics-tier tables only.
Architecture
Data Lake ──[KQL Jobs (daily)]──► _KQL_CL tables (analytics tier) ──[Analytics Rules]──► Incidents
Key design constraints:
- 15-minute delay: All queries use `now() - 15m` to account for Data Lake ingestion latency
- Anomaly-only promotion: Only flagged records are written to analytics tier (cost optimization)
- Separate timestamp: `DetectedTime` preserves original event time; `TimeGenerated` reflects job execution time
- 3 concurrent job limit: Per tenant — prioritize Jobs 1, 7, 8 for highest-value detections
For full query definitions, deployment checklist, and companion analytics rule templates, see queries/identity/mcp_anomaly_detection_kql_jobs.md.
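To illustrate the kind of check Job 2 in the inventory performs (a volume spike at 3x baseline), here is a minimal Python sketch. It assumes the baseline is a simple mean of the previous 14 daily counts; the real job definitions live in the referenced file and may compute the baseline differently. The function name `is_volume_spike` is hypothetical.

```python
from statistics import mean

BASELINE_DAYS = 14      # from the maturity model: 14-day behavioral baselines
SPIKE_MULTIPLIER = 3    # from Job 2: "3x baseline"


def is_volume_spike(daily_counts: list[int], today: int) -> bool:
    """Return True when today's call count exceeds 3x the 14-day mean.

    Assumption: the baseline is a simple mean of the most recent 14 days.
    """
    window = daily_counts[-BASELINE_DAYS:]
    if not window:
        return False  # no history yet, so nothing to compare against
    baseline = mean(window)
    return today > SPIKE_MULTIPLIER * baseline


print(is_volume_spike([10] * 14, 31))
```

Only days for which this returns `True` would be candidates for promotion, which matches the anomaly-only promotion constraint above.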
Known Pitfalls
project ... as Keyword Fails in Advanced Hunting
Problem: The as keyword for column aliasing inside project (e.g., tostring(parse_json(Status).errorCode) as ErrorCode) fails in Advanced Hunting with Query could not be parsed at 'as'. While as is valid KQL in Log Analytics / Data Lake, the AH parser rejects it inside project statements.
Solution: Always use = assignment syntax instead: ErrorCode = tostring(parse_json(Status).errorCode). This works in both AH and Data Lake. All queries in this skill have been updated to use = syntax. When writing new queries, never use as for column aliasing in project — reserve as for tabular expression naming (let T = ... | as T).
Azure MCP Server Detection (🔄 Updated Feb 2026)
Problem: Azure MCP Server uses DefaultAzureCredential and the credential chain now resolves to Azure CLI (AppId 04b07795-8ddb-461a-bbee-02f9e1bf7b46), NOT AzurePowerShellCredential (1950a258) as previously documented. In LAQueryLogs, RequestClientApp is empty (not csharpsdk,LogAnalyticsPSClient). The previously documented fingerprint (1950a258 + csharpsdk,LogAnalyticsPSClient) appeared only once in 30-day lookback and is obsolete. ARM read operations (the majority of MCP calls) do not appear in AzureActivity.
Previous fingerprint (OBSOLETE):
- ❌ AppId `1950a258-227b-4e31-a9cf-717495945fc2` (AzurePowerShellCredential)
- ❌ `RequestClientApp = "csharpsdk,LogAnalyticsPSClient"` in LAQueryLogs
- ❌ UserAgent `azsdk-net-Identity` as primary differentiator (shared by many Azure SDK services)
Current fingerprint (field-tested Feb 2026):
- ✅ AppId `04b07795-8ddb-461a-bbee-02f9e1bf7b46` (Azure CLI)
- ✅ `RequestClientApp` is empty (shared with Azure CLI and 4+ other AADClientIds — not a unique fingerprint)
- ✅ Azure MCP `monitor_workspace_log_query` appends `\n| limit N` to query text — best query-level differentiator
- ✅ Token caching: sign-in events represent access sessions, not individual API calls
Solution: Azure MCP Server queries can be identified in LAQueryLogs with moderate confidence by filtering for AADClientId 04b07795 + query text containing \n| limit (the suffix added by monitor_workspace_log_query). In SigninLogs, the shared AppId means Azure MCP is indistinguishable from manual Azure CLI usage — present as "Azure MCP Server / Azure CLI (shared AppId 04b07795)" in reports. The empty RequestClientApp bucket contains queries from 5+ different tools, so this field cannot be used for attribution.
Limitations:
- ARM read operations produce sign-in events but NOT AzureActivity records
- If the user also runs `az` CLI manually, sign-in events from both are indistinguishable
- The `\n| limit N` query text suffix is the only reliable query-level differentiator but is heuristic
- The credential chain may change with Azure MCP Server updates — monitor for AppId shifts
- AzureActivity ingestion lag is typically 3-20 min (MS docs); SigninLogs ~1-2h; LAQueryLogs/AADNonInteractiveUserSignInLogs ~5-15 min
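The detection heuristic described above can be sketched offline in Python against exported LAQueryLogs rows. This is an illustration under stated assumptions, not the skill's query: rows are assumed to be dicts with `AADClientId` and `QueryText` keys, the function name `classify_query` is hypothetical, and because it is unclear whether the suffix is logged as a real newline or as a literal backslash-n, the sketch accepts both.

```python
AZURE_CLI_APP_ID_PREFIX = "04b07795"  # Azure CLI AppId, shared with Azure MCP Server

# Assumption: the suffix may appear with a real newline or a literal "\n".
LIMIT_MARKERS = ("\n| limit", "\\n| limit")


def classify_query(row: dict) -> str:
    """Label one LAQueryLogs-style row using the heuristic described above."""
    client_id = str(row.get("AADClientId") or "").lower()
    query_text = str(row.get("QueryText") or "")
    if not client_id.startswith(AZURE_CLI_APP_ID_PREFIX):
        return "Other"
    if any(marker in query_text for marker in LIMIT_MARKERS):
        # The limit suffix is a heuristic signal, so confidence stays moderate.
        return "Azure MCP Server (moderate confidence)"
    return "Azure MCP Server / Azure CLI (shared AppId 04b07795)"


sample = {
    "AADClientId": "04b07795-8ddb-461a-bbee-02f9e1bf7b46",
    "QueryText": "SigninLogs\n| limit 10",
}
print(classify_query(sample))
```

A hand-written query that happens to end with a limit clause would be labeled the same way, which is why the result is reported with moderate confidence only.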
MicrosoftGraphActivityLogs Availability
Problem: Graph activity logs are NOT enabled by default. If the table is empty or doesn't exist, Graph MCP analysis cannot proceed.
Solution: If MicrosoftGraphActivityLogs returns 0 results or table-not-found error, report: "⚠️ Microsoft Graph activity logs are not enabled in this tenant. Enable them at: https://learn.microsoft.com/en-us/graph/microsoft-graph-activity-logs-overview". Skip Graph MCP analysis gracefully and proceed with other MCP channels.
LAQueryLogs Diagnostic Settings
Problem: LAQueryLogs requires diagnostic settings to be configured on the Log Analytics workspace. Without it, workspace query governance analysis is impossible.
Solution: If LAQueryLogs returns empty, report: "⚠️ LAQueryLogs not available — enable Log Analytics workspace diagnostic settings to monitor query activity." Skip workspace governance analysis and note the gap.
AppId Misclassification History (Field-Tested Feb 2026)
80ccca67 — Previously assumed to be a Graph MCP variant. Actually the M365 Security & Compliance Center (Sentinel Portal backend, RequestClientApp = ASI_Portal). Categorize as "Sentinel Portal (Non-MCP)". Graph MCP has only ONE AppId: e8c77dc2.
95a5d94c — Previously assumed to be "VS Code Copilot" (MCP Client). Actually the Azure Portal — AppInsightsPortalExtension blade, executing Usage dashboard/workbook queries in the browser. No SPN or app registration in tenant; not in merill/microsoft-info known apps list. Categorize as "Portal/Platform (Non-MCP)".
📘 Takeaway: When encountering an unknown AppId in `LAQueryLogs`, check the `RequestClientApp` field first — it reliably reveals the actual source (e.g., `AppInsightsPortalExtension`, `ASI_Portal`). Do not assume an AppId is MCP-related without verifying via Graph API SPN lookup, sign-in logs, and query content analysis.
CloudAppEvents CamelCase Matching (ActionType AND Operation)
Problem: Both ActionType and RawEventData.Operation values in CloudAppEvents for Sentinel operations use CamelCase without word boundaries (e.g., SentinelAIToolRunCompleted, KQLQueryCompleted). The has operator requires word boundaries and will NOT match these values. Field-tested Feb 2026: has "Completed" returns false for ALL Operation values including KQLQueryCompleted — the has operator fails on substrings within CamelCase tokens.
Solution: Always use contains (not has) when filtering ActionType or Operation for Sentinel/KQL operations:
// ✅ CORRECT — 'contains' works with CamelCase
| where ActionType contains "Sentinel" or ActionType contains "KQL"
| where Operation contains "Completed"
// ❌ WRONG — 'has' requires word boundaries, fails on CamelCase
| where ActionType has "Sentinel" or ActionType has "KQL"
| where Operation has "Completed" // Returns 0 rows — silently drops ALL MCP events!
Impact if missed: Query 12 (MCP vs Direct KQL delineation) will show 0 MCP events and ONLY Direct KQL — because MCP events (RecordType 403) are filtered out by Operation has "Completed", while Direct KQL events (RecordType 379) survive via the OR RecordType == 379 fallback. This creates a false impression that no MCP-driven queries exist.
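The difference between the two operators can be modeled in a few lines of Python. This is a simplified model for intuition only: it assumes `has` matches whole alphanumeric terms and `contains` matches any substring, and it ignores the additional term-indexing rules of the real KQL engine. The helper names `kql_has` and `kql_contains` are hypothetical.

```python
import re


def kql_contains(value: str, needle: str) -> bool:
    """Approximation of KQL `contains`: case-insensitive substring match."""
    return needle.lower() in value.lower()


def kql_has(value: str, needle: str) -> bool:
    """Simplified model of KQL `has`: case-insensitive whole-term match.

    Assumption: terms are runs of alphanumeric characters; real KQL term
    indexing has additional rules that this sketch ignores.
    """
    terms = re.split(r"[^0-9A-Za-z]+", value.lower())
    return needle.lower() in terms


operation = "SentinelAIToolRunCompleted"
print(kql_has(operation, "Completed"), kql_contains(operation, "Completed"))
```

Because the CamelCase value is a single term, the whole-term match fails while the substring match succeeds, which mirrors the silent drop described above.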
CloudAppEvents RawEventData Parsing
Problem: RawEventData in CloudAppEvents is a dynamic column but often contains nested JSON that requires double-parsing. Direct property access (e.g., RawEventData.ToolName) may return empty.
Solution: Always parse explicitly with parse_json(tostring(RawEventData)):
| extend RawData = parse_json(tostring(RawEventData))
| extend ToolName = tostring(RawData.ToolName)
Data Lake MCP Has No AppId
Problem: Unlike Graph MCP (e8c77dc2) and Sentinel Triage MCP (7b7b3966), the Sentinel Data Lake MCP has no dedicated AppId in any telemetry table. It is not visible in LAQueryLogs, SigninLogs, or MicrosoftGraphActivityLogs.
Solution: Data Lake MCP activity is audited exclusively via CloudAppEvents (Purview unified audit log). Filter by ActionType contains "SentinelAITool" (preferred — top-level column) or extract RecordType from RawEventData with toint(parse_json(tostring(RawEventData)).RecordType) == 403 and Interface == "IMcpToolTemplate". Note: RecordType is NOT a top-level column in CloudAppEvents — it is nested inside RawEventData and must be extracted via parse_json().
Table availability (field-tested Feb 2026): CloudAppEvents was confirmed available on both Data Lake (TimeGenerated, 90d retention) and Advanced Hunting (Timestamp, 30d retention) in a standard Sentinel workspace without explicit Purview/E5 configuration. Always attempt the query first — only report a gap if the table returns 0 results or a table-not-found error. Do not skip Phase 3 based on licensing assumptions.
CloudAppEvents Double-Counting Prevention
Problem: Each Data Lake MCP tool call generates TWO events: SentinelAIToolRunStarted (RecordType 403) and SentinelAIToolRunCompleted (RecordType 403). Counting both will double the actual call count.
Solution: Always filter on Operation == "SentinelAIToolRunCompleted" for call counts, duration analysis, and error analysis. Use SentinelAIToolRunStarted only when investigating specific timing sequences or queue behavior.
Data Lake MCP ExecutionDuration Format
Problem: The ExecutionDuration field in RawEventData is stored as a string (e.g., "2.4731712"), not a numeric type. Aggregation functions (avg, max) will fail without conversion.
Solution: Use todouble(RawData.ExecutionDuration) to convert before aggregation.
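The two rules above (count only Completed events, and convert the string duration before aggregating) can be sketched together in Python. The input shape is an assumption: `raw_events` is taken to be a list of RawEventData values serialized as JSON strings, and `summarize_tool_calls` is a hypothetical helper, not part of the skill.

```python
import json
from statistics import mean


def summarize_tool_calls(raw_events: list[str]) -> dict:
    """Count completed tool calls and average their duration in seconds."""
    durations = []
    for raw in raw_events:
        data = json.loads(raw)
        if data.get("Operation") != "SentinelAIToolRunCompleted":
            continue  # skip Started events so each call is counted once
        # ExecutionDuration arrives as a string such as "2.4731712".
        durations.append(float(data["ExecutionDuration"]))
    return {
        "calls": len(durations),
        "avg_duration_s": mean(durations) if durations else None,
    }


events = [
    json.dumps({"Operation": "SentinelAIToolRunStarted"}),
    json.dumps({"Operation": "SentinelAIToolRunCompleted", "ExecutionDuration": "2.0"}),
    json.dumps({"Operation": "SentinelAIToolRunStarted"}),
    json.dumps({"Operation": "SentinelAIToolRunCompleted", "ExecutionDuration": "4.0"}),
]
print(summarize_tool_calls(events))
```

Counting all four events would report four calls; filtering on the Completed operation reports the two calls that actually ran.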
Sentinel Engine False Association
Problem: The Sentinel analytics engine (fc780465-2017-40d4-a0c5-307022471b92) generates the highest query volume in most workspaces but is NOT an MCP server. Including it in MCP totals would massively inflate the numbers.
Solution: ALWAYS label Sentinel Engine and Logic Apps Connector as "Platform (Non-MCP)" in reports. The MCP proportion calculation MUST exclude these from the MCP numerator.
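A minimal Python sketch of the proportion rule, assuming each row carries a category label like the Category column of the Analytics tier table. The label `"MCP Server"`, the tuple layout, and the name `mcp_share` are illustrative assumptions.

```python
# Hypothetical label, following the Category column of the Analytics tier table.
MCP_CATEGORY = "MCP Server"


def mcp_share(query_counts: list[tuple[str, str, int]]) -> float:
    """Return MCP queries as a percentage of all queries.

    Each item is (source, category, queries). Platform and portal rows stay in
    the denominator but never enter the MCP numerator.
    """
    total = sum(count for _, _, count in query_counts)
    if total == 0:
        return 0.0
    mcp = sum(count for _, category, count in query_counts if category == MCP_CATEGORY)
    return round(100 * mcp / total, 1)


rows = [
    ("Sentinel Engine", "Platform", 900),
    ("Sentinel Triage MCP", "MCP Server", 50),
    ("Sentinel Portal", "Portal", 50),
]
print(mcp_share(rows))
```

With these made-up counts, mislabeling Sentinel Engine as MCP would move the share from 5% to 95%, which is the inflation the rule guards against.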
SigninLogs Status Field Needs parse_json() in Data Lake
Problem: The Status column in SigninLogs / AADNonInteractiveUserSignInLogs is a dynamic field containing {errorCode, failureReason, additionalDetails}, but Data Lake workspaces may store it as a string. Using dot-notation (Status.errorCode) without parse_json() causes parser errors (Expected: ;) or SemanticErrors.
Solution: Always use tostring(parse_json(Status).errorCode) — same pattern as DeviceDetail, LocationDetails, and ConditionalAccessPolicies. This works regardless of whether the column is stored as dynamic or string. Query 3 was fixed for this in Feb 2026.
Type Column Unavailable in Data Lake Union Contexts
Problem: The Type pseudo-column (table name) is NOT resolvable in union queries executed via Sentinel Data Lake. Using summarize by Type in a union SigninLogs, AADNonInteractiveUserSignInLogs query fails with SemanticError: Failed to resolve scalar expression named 'Type'.
Solution: When you need to distinguish source tables in a union, add | extend TableName = "SigninLogs" (or "AADNonInteractive") within each union leg before the union operator. Then summarize by TableName. This is already handled in Query 13 via the SignInType field pattern (extend SignInType = "Interactive" / "Non-Interactive"), but ad-hoc summary variants must use the extend approach — never Type.
Non-Interactive Sign-In Noise
Problem: AADNonInteractiveUserSignInLogs may contain Logic Apps connector activity (de8c33bb) that looks like user activity but is automated.
Solution: When reporting Sentinel MCP auth events from SigninLogs, distinguish interactive (user-initiated) from non-interactive (automated) sources. The LogicApps connector is NOT MCP — exclude it from MCP auth counts.
AADNonInteractiveUserSignInLogs Commonly on Data Lake Tier
Problem: Many customers place AADNonInteractiveUserSignInLogs on Data Lake (or Basic) tier. When this table is NOT on Analytics tier, any Advanced Hunting query that unions SigninLogs + AADNonInteractiveUserSignInLogs fails with MPC -32600: The query should contain a single Basic or Auxiliary table or silently returns incomplete/unsorted data. This affects Query 1 (daily trend) and Query 6 (Triage MCP auth) in this skill.
Solution: All queries that union SigninLogs + AADNonInteractiveUserSignInLogs in this skill MUST use mcp_sentinel-data_query_lake instead of RunAdvancedHuntingQuery. Data Lake handles cross-table unions natively and works regardless of which tier each table is on. When running via Data Lake, CloudAppEvents uses TimeGenerated (not Timestamp as in AH). Queries 1, 6, and 15 are already configured for Data Lake.
Off-Hours Timezone Uncertainty
Problem: TimeGenerated is always UTC, but "off-hours" has different meaning depending on the user's timezone. A UTC 06:00 call might be 22:00 local or 14:00 local.
Solution: Default to UTC for off-hours calculation. If the user's timezone is known from sign-in data (LocationDetails), adjust. Always state the timezone assumption in the report.
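The timezone effect can be made concrete with a short Python sketch. It uses the 08:00-18:00 window from the Off-Hours Activity dimension and a fixed UTC offset as the timezone assumption; `is_off_hours` is a hypothetical helper, and weekends and daylight saving time are not handled.

```python
from datetime import datetime, timedelta, timezone

BUSINESS_START_HOUR = 8   # 08:00, from the Off-Hours Activity dimension
BUSINESS_END_HOUR = 18    # 18:00


def is_off_hours(event_utc: datetime, utc_offset_hours: float = 0) -> bool:
    """Return True when the event falls outside 08:00-18:00 in the assumed timezone.

    `event_utc` must be timezone-aware. `utc_offset_hours` defaults to 0 (UTC),
    matching the documented default.
    """
    local = event_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return not (BUSINESS_START_HOUR <= local.hour < BUSINESS_END_HOUR)


call = datetime(2026, 2, 10, 6, 0, tzinfo=timezone.utc)
print(is_off_hours(call), is_off_hours(call, -8), is_off_hours(call, 8))
```

The same 06:00 UTC call is off-hours under the UTC default and at UTC-8 (22:00 local), but falls inside business hours at UTC+8 (14:00 local), so the report should always state which offset was assumed.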
Multi-Tenant Token Confusion
Problem: Azure MCP Server uses DefaultAzureCredential and may authenticate against the wrong tenant if multiple credentials are cached, causing queries to fail or return data from an unexpected tenant.
Solution: Read config.json for the azure_mcp.tenant parameter. When making Azure MCP Server calls, always pass the tenant parameter explicitly. Note this risk in the report.
Rate Limiting Not Visible in Logs
Problem: Graph MCP Server is capped at 100 calls/min/user. If throttled, calls may not appear in logs (no log entry = no visibility).
Solution: If daily call counts show sudden drops to 0 after a high-volume period, note possible throttling. Check for 429 Too Many Requests response codes in Query 1 raw data.
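One way to surface the drop-to-zero pattern from a daily trend is sketched below in Python. The `high_volume` threshold is a hypothetical value chosen by the analyst, not something the skill defines, and a flagged day is only a hint to look for 429 responses, not proof of throttling.

```python
def possible_throttling_days(
    daily_counts: list[tuple[str, int]], high_volume: int
) -> list[str]:
    """Return dates whose count is 0 immediately after a high-volume day.

    `daily_counts` is a chronologically ordered list of (date, calls) pairs.
    """
    flagged = []
    for (_, prev_count), (day, count) in zip(daily_counts, daily_counts[1:]):
        if count == 0 and prev_count >= high_volume:
            flagged.append(day)
    return flagged


trend = [
    ("2026-02-01", 1200),
    ("2026-02-02", 0),
    ("2026-02-03", 40),
    ("2026-02-04", 0),
]
print(possible_throttling_days(trend, high_volume=1000))
```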
SentinelAudit Table Availability
Problem: SentinelAudit requires Sentinel auditing and health monitoring to be enabled. It may not exist in all workspaces.
Solution: If SentinelAudit returns table-not-found, skip gracefully. Report: "⚠️ Sentinel auditing not enabled — cannot check configuration changes."
Error Handling
Common Issues
| Issue | Solution |
|---|---|
| `project ... as ErrorCode` fails in AH | Advanced Hunting rejects as keyword in project. Use = syntax: ErrorCode = tostring(...). See Known Pitfalls. |
| `MPC -32600` error from Triage MCP | Transient error — retry once. If persistent, fall back to mcp_sentinel-data_query_lake. |
| `MicrosoftGraphActivityLogs` table not found | Graph activity logs not enabled. Report gap, skip Graph MCP analysis, provide enablement link. |
| `LAQueryLogs` table not found | Diagnostic settings not configured on LA workspace. Report gap, skip governance analysis. |
| `SentinelAudit` table not found | Sentinel health monitoring not enabled. Report gap, skip config change analysis. |
| `AzureActivity` returns 0 results | No ARM operations in the time range, or no administrative actions by the specified user. |
| SigninLogs returns 0 for Sentinel Platform Services | No one authenticated to Sentinel MCP in the time range. Report as "✅ No Sentinel MCP authentication events detected." |
| `CloudAppEvents` table not found | Purview unified audit not available (requires E5 license). Report gap: "⚠️ CloudAppEvents not available — cannot monitor Data Lake MCP usage. Requires Microsoft 365 E5 or Purview audit." Skip Phase 3 (Data Lake MCP). |
| CloudAppEvents returns 0 for Sentinel operations | No Data Lake MCP or Direct KQL activity in the time range. Report as "✅ No Sentinel Data Lake activity detected in CloudAppEvents." |
| `ActionType has "Sentinel"` returns 0 but data exists | CamelCase bug — use contains instead of has for ActionType matching. See Known Pitfalls. |
| `Operation has "Completed"` drops MCP events silently | Same CamelCase bug — has "Completed" returns false for ALL CamelCase operations (SentinelAIToolRunCompleted, KQLQueryCompleted). MCP events (RecordType 403) are silently dropped; Direct KQL survives only via OR RecordType == 379 fallback. Use contains "Completed". See Known Pitfalls. |
| `RawEventData.ToolName` returns empty | Double-parse required: use parse_json(tostring(RawEventData)) then extract fields. See Known Pitfalls. |
| Query timeout | Reduce lookback from 30d to 7d, or add ` |
| Unknown AppId in LAQueryLogs | Cross-reference with Entra ID > App Registrations. May be a custom MCP server or third-party tool. |
| Multiple workspaces available | Follow workspace selection rules — STOP, list all, ASK user, WAIT. |
| Azure MCP calls indistinguishable from CLI | Partially resolved: AppId 04b07795 is shared with Azure CLI. Use `\n |
Validation Checklist
Before presenting results, verify:
- All MCP telemetry surfaces were queried (Graph, Sentinel Triage, Sentinel Data Lake, Azure ARM, LAQueryLogs, CloudAppEvents)
- Tables that don't exist are reported as gaps, not silent omissions
- Non-MCP sources (Sentinel Engine, Logic Apps, Sentinel Portal) are clearly labeled as "Platform/Portal (Non-MCP)"
- `80ccca67` is classified as "M365 Security & Compliance Center (Sentinel Portal)" — NOT as an MCP server
- `95a5d94c` is classified as "Azure Portal — AppInsightsPortalExtension" — NOT as MCP Client or VS Code Copilot. Verify via `RequestClientApp` field.
- MCP proportion calculation excludes non-MCP platform sources from the MCP numerator
- Two-tier governance view included: Analytics tier (LAQueryLogs) + Data Lake tier (CloudAppEvents)
- Data Lake MCP vs Direct KQL delineation is clearly presented (RecordType 403 vs 379)
- CloudAppEvents queries use `contains` (not `has`) for ActionType matching
- CloudAppEvents queries use `contains` (not `has`) for `Operation` field matching (same CamelCase issue)
- CloudAppEvents RawEventData is parsed with `parse_json(tostring(RawEventData))` pattern
- Data Lake MCP tool call counts use `SentinelAIToolRunCompleted` only (not Started) to avoid double-counting
- All user attribution is based on actual query results, not assumptions
- Azure MCP Server detection uses AppId `04b07795` (Azure CLI) with empty `RequestClientApp` and query text `\n| limit N` suffix as differentiator. Present as "Azure MCP Server / Azure CLI (shared AppId)" in reports
- Graph MCP sensitive endpoint percentage is calculated from actual data
- Off-hours analysis states the timezone assumption (default: UTC)
- Empty results are explicitly reported with ✅ (not silently omitted)
- AppId cross-reference table is included for any unknown AppIds discovered
- The MCP Usage Score calculation is transparent with per-dimension evidence
- All ASCII visualizations are wrapped in code fences for markdown compatibility
- Top MCP Users table (Q15) included in report with cross-server breadth ranking
- If Agent Identity analysis is needed: refer the user to the `ai-agent-posture` skill for a comprehensive agent audit
Prerequisites
For complete MCP server monitoring, ensure these data sources are enabled:
| Data Source | Enabling Documentation | Required For |
|---|---|---|
| Microsoft Graph activity logs | Enable Graph activity logs | Graph MCP analysis (Queries 1-2, 5, 9) |
| CloudAppEvents (Purview unified audit) | Requires M365 E5 license; enable Sentinel Data Lake auditing | Data Lake MCP analysis (Queries 10-12) |
| Sentinel auditing and health monitoring | Enable Sentinel monitoring | Config change detection (ad-hoc SentinelAudit queries) |
| LAQueryLogs (diagnostic settings) | Configure diagnostic settings on LA workspace | Workspace governance (Queries 7, 8, 14) |
| AzureActivity | Enabled by default for ARM operations | Azure MCP analysis (ad-hoc ARM queries) |
| SigninLogs | Entra ID diagnostic settings | Sentinel MCP auth events (Queries 3-4, 6, 13) |
| Purview audit logs | Included with E5 license | CloudAppEvents ingestion — required for Data Lake MCP monitoring (Queries 10-12). RecordType 403 (AI Tool) and 379 (KQL) |
If any prerequisite is not met, the skill will report the gap and skip the affected analysis sections.
Cross-References
- KQL Jobs for proactive alerting: `queries/identity/mcp_anomaly_detection_kql_jobs.md` — Scheduled Data Lake jobs that promote MCP anomalies to analytics tier for automated Sentinel alerting
- Main skill registry: `.github/copilot-instructions.md` — Skill detection and global rules
- Scope drift analysis: `.github/skills/scope-drift-detection/SKILL.md` — Can be run on MCP-related service principals for behavioral drift detection
- Sentinel Data Lake auditing: Auditing lake activities — Official docs on RecordType 403/379 audit events in CloudAppEvents
- Sentinel MCP tool collections: Tool collection overview — Data Exploration, Triage, and Security Copilot Agent Creation collections
- Sentinel MCP custom tools: Create custom MCP tools — Expose saved KQL queries as MCP tools
- Copilot Studio MCP catalog: Built-in MCP servers — 19+ Microsoft-managed MCP servers for agent development
- Azure MCP Server tools: Available tools — Full Azure MCP Server tool catalog (40+ namespaces)
- Power BI MCP: Remote endpoint at `https://api.fabric.microsoft.com/v1/mcp/powerbi`, Modeling at microsoft/powerbi-modeling-mcp
- Fabric RTI MCP: Fabric RTI MCP overview | GitHub
- Playwright MCP: GitHub — Browser automation MCP (26.9k ⭐, local only)
- AI Agent Posture: `.github/skills/ai-agent-posture/SKILL.md` — Comprehensive Copilot Studio agent security audit (for Agent Identity analysis, use this skill instead)
SVG Dashboard Generation
📊 Optional post-report step. After an MCP Usage report is generated, the user can request a visual SVG dashboard.
Trigger phrases: "generate SVG dashboard", "create a visual dashboard", "visualize this report", "SVG from the report"
How to Request a Dashboard
- Same chat: "Generate an SVG dashboard from the report" — data is already in context.
- New chat: Attach or reference the report file, e.g. `#file:reports/mcp-usage/MCP_Usage_Report_<workspace>_<date>.md`
- Customization: Edit svg-widgets.yaml before requesting — the renderer reads it at generation time.
Execution
Step 1: Read svg-widgets.yaml (this skill's widget manifest)
Step 2: Read .github/skills/svg-dashboard/SKILL.md (rendering rules — Manifest Mode)
Step 3: Read the completed report file (data source)
Step 4: Render SVG → save to reports/mcp-usage/{report_name}_dashboard.svg
The YAML manifest is the single source of truth for layout, widgets, field mappings, colors, and data source documentation. All customization happens there.
.github/skills/mitre-coverage-report/SKILL.md
npx skills add SCStelz/security-investigator --skill mitre-coverage-report -g -y
SKILL.md
Frontmatter
{
"name": "mitre-coverage-report",
"description": "MITRE ATT&CK Coverage Report — YAML-driven PowerShell pipeline gathers analytic rule MITRE tags, custom detection techniques, SOC Optimization recommendations, and alert\/incident operational data via az rest\/az monitor\/Graph API, writes a deterministic scratchpad, LLM renders the report. Covers tactic-level coverage matrix, technique-level drill-down with rule mapping, coverage gap identification, SOC Optimization threat scenario alignment, untagged rule remediation, ICS\/OT technique tracking, and MITRE Coverage Score (5 weighted dimensions). Inline chat and markdown file output.",
"drill_down_prompt": "Run MITRE ATT&CK coverage report — tactic\/technique coverage, gaps, SOC optimization",
"threat_pulse_domains": [
"incidents"
]
}
MITRE ATT&CK Coverage Report — Instructions
Purpose
This skill generates a comprehensive MITRE ATT&CK Coverage Report analyzing detection coverage across the ATT&CK Enterprise framework. It inventories all analytic rules and custom detections, maps them to MITRE tactics and techniques, identifies coverage gaps, and provides prioritized recommendations for improving detection posture.
Entity Type: Sentinel workspace (from config.json)
| Scope | Data Sources | Use Case |
|---|---|---|
| Workspace-wide (default) | Analytic Rules (REST), Custom Detections (Graph), SOC Optimization (REST), SecurityAlert/SecurityIncident (KQL) | Full MITRE coverage analysis |
| Operational correlation | SecurityAlert, SecurityIncident | Which MITRE-tagged rules actually produce alerts and incidents |
What this report covers: Tactic-level coverage matrix with per-tactic technique counts and percentages, technique-level drill-down with rule-to-technique mapping, coverage gap identification against the full ATT&CK Enterprise framework, SOC Optimization threat scenario alignment (AiTM, ransomware, BEC, etc.), untagged rule remediation with AI-suggested MITRE tags, ICS/OT technique tracking, operational MITRE correlation (which rules actually fire), and a composite MITRE Coverage Score.
Complementary to: This skill pairs with the sentinel-ingestion-report skill — ingestion report covers data volume, tier optimization, and cost; MITRE coverage report covers detection posture against the ATT&CK framework. Run both for a complete workspace assessment.
Architecture
┌──────────────────────────────────────────────────────────────────┐
│ YAML query files PowerShell script LLM render │
│ queries/phase1-3/ ──→ Invoke-MitreScan.ps1 ──→ Phase 4 │
│ (6 .yaml files) (~1030 lines) (SKILL- │
│ • az rest (Sentinel API) report.md) │
│ • Invoke-MgGraphRequest │
│ • az monitor (KQL) │
│ • mitre-attck-enterprise.json │
│ • m365-platform-coverage.json (CTID) │
│ ↓ │
│ temp/mitre_scratch_<ts>.md │
│ (~35 KB, 18+ sections) │
└──────────────────────────────────────────────────────────────────┘
Execution model:
- Phases 1-3 (data gathering): Fully automated by Invoke-MitreScan.ps1. Phase 1 uses az rest (Sentinel REST API) and optionally Invoke-MgGraphRequest (Graph API). Phase 2 uses az rest (SOC Optimization API). Phase 3 uses az monitor log-analytics query (KQL).
- Phase 4 (rendering): LLM reads the scratchpad + SKILL-report.md and renders the report. This is the only phase requiring LLM involvement.
Static reference: mitre-attck-enterprise.json contains ATT&CK Enterprise v16.1 with 14 tactics, 216 techniques, and 475 sub-techniques. The PS1 loads this at startup to compute coverage gaps against the full framework. This file is version-controlled and should be updated when MITRE publishes new ATT&CK releases.
Platform coverage reference: m365-platform-coverage.json is a compact CTID (Center for Threat-Informed Defense) mapping of M365 Defender product capabilities to ATT&CK techniques. Contains detect/protect/respond coverage for 81 detect techniques across 38 capabilities (7 SecurityAlert product groups). Used for the 3-tier platform coverage classification:
- Tier 1 (Alert-Proven): SecurityAlert from M6 query has MITRE technique attribution — highest confidence
- Tier 2 (Deployed Capability): Product is active (has alerts) and CTID claims detect coverage for the technique — medium confidence
- Tier 3 (Catalog Capability): CTID maps coverage but no alert evidence for the product in this workspace — lowest confidence
To rebuild from upstream: download the CTID M365 mapping JSON, transform with PowerShell (group by parent technique, map capabilities to SecurityAlert ProductName). See temp/ctid_raw.json for the raw source.
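The three-tier rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the PS1's implementation: the function name `classify_tier`, the input shapes, and the product names are assumptions made for the example.

```python
def classify_tier(technique, alert_techniques, active_products, ctid_detect):
    """Return 1, 2, 3, or None for a single ATT&CK technique ID.

    alert_techniques: technique IDs attributed by platform alerts (M6)
    active_products:  products that produced at least one SecurityAlert
    ctid_detect:      technique ID -> set of products with CTID detect coverage
    (All three shapes are hypothetical; the real scratchpad layout may differ.)
    """
    if technique in alert_techniques:
        return 1  # Tier 1: alert-proven
    products = ctid_detect.get(technique, set())
    if not products:
        return None  # CTID claims no detect coverage at all
    if products & active_products:
        return 2  # Tier 2: a covering product is active in this workspace
    return 3  # Tier 3: catalog capability only


# Hypothetical inputs for illustration
alert_techniques = {"T1078"}
active_products = {"Product A"}
ctid_detect = {
    "T1078": {"Product A"},
    "T1110": {"Product A"},
    "T1566": {"Product B"},
}

tiers = {
    t: classify_tier(t, alert_techniques, active_products, ctid_detect)
    for t in ["T1078", "T1110", "T1566", "T1499"]
}
print(tiers)
```

Checking Tier 1 first means alert evidence always wins over catalog claims, which matches the confidence ordering described above.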
Companion Files — When to Load
| File | Purpose | When to Load |
|---|---|---|
| SKILL.md (this file) | Architecture, workflow, rendering rules, score methodology, domain reference | Always — primary entry point |
| SKILL-report.md | Report templates (§1-§6), section-to-scratchpad mapping, formatting rules | Phase 4 rendering only |
| Invoke-MitreScan.ps1 | PowerShell data-gathering pipeline (Phases 1-3) | Execution only — no need to read unless debugging |
| mitre-attck-enterprise.json | ATT&CK Enterprise v16.1 static reference | Referenced by PS1 at runtime — no manual loading |
| m365-platform-coverage.json | CTID M365 platform coverage reference (detect/protect/respond) | Referenced by PS1 at runtime — no manual loading |
📑 TABLE OF CONTENTS
- Quick Start - 3-step execution pattern
- Critical Workflow Rules - Prerequisites and prohibitions
- Execution Workflow - Phases 0-4
- Query File Reference - All YAML query files (M1-M9)
- Output Modes - Inline chat vs. Markdown file
- Deterministic Rendering Rules - Rules A-E (mandatory for Phase 4)
- MITRE Coverage Score - 5-dimension scoring methodology
- Domain Reference - ATT&CK interpretation, tactic priorities, Sentinel-specific mappings
- SVG Dashboard Generation - Visual dashboard from completed report
Quick Start (TL;DR)
3-step execution pattern:
Step 1: Run Invoke-MitreScan.ps1 (Phases 1-3 — data gathering)
Step 2: Read scratchpad + SKILL-report.md (Phase 4 prep)
Step 3: Render report incrementally (§1 via create_file, then §2–§6 appended via replace_string_in_file)
Step 1: Run Data Gathering
# From workspace root — run all phases (default: 30 days alert/incident lookback):
& ".github/skills/mitre-coverage-report/Invoke-MitreScan.ps1"
# Specify a custom alert/incident lookback:
& ".github/skills/mitre-coverage-report/Invoke-MitreScan.ps1" -Days 7
# Run a specific phase (for re-runs / debugging):
& ".github/skills/mitre-coverage-report/Invoke-MitreScan.ps1" -Phase 1
Output: Scratchpad file at temp/mitre_scratch_<timestamp>.md (~28 KB, 12 sections).
Timing: Full run takes ~60-90 seconds (varying with REST API response times and KQL auth state).
Step 2: Load Rendering Context
- Read the scratchpad file (path printed by PS1 at completion)
- Read SKILL-report.md for rendering templates
Step 3: Render Report (Incremental Writes)
Render the report across multiple tool calls — one section per call — to avoid single-call output token limits that truncate large reports:
- create_file → header + disclaimer + §1 (Executive Summary, Score, Inventory, Top 3 Recs)
- replace_string_in_file → append §2 (Tactic Coverage Matrix)
- replace_string_in_file → append §3 (Technique Deep Dive — largest section)
- replace_string_in_file → append §4 (Coverage Gap Analysis)
- replace_string_in_file → append §5 (Operational MITRE Correlation)
- replace_string_in_file → append §6 + Appendix
Apply SKILL-report.md templates to scratchpad data, following Rules A–E. See SKILL-report.md for full section templates and the anchor pattern for each append.
🔴 Verbatim table sections — use the deterministic slicer, never hand-copy. Several report tables (§3 TechniqueTables, §5.1 CombinedTacticCoverage, §5.2 AlertFiring, §5.3 ActiveVsTagged, §5.4 IncidentsByTactic, §5.5 DataReadiness, §5.6 ConnectorHealth) are pre-rendered by the PS1 under ## PRERENDERED in the scratchpad. Copy them with the read-only helper instead of transcribing by hand:
python .github/skills/mitre-coverage-report/slice_scratch.py --scratch temp/mitre_scratch_<ts>.md --list
python .github/skills/mitre-coverage-report/slice_scratch.py --scratch temp/mitre_scratch_<ts>.md --section AlertFiring
The slicer prefers the ## PRERENDERED copy when a section name also exists as a raw data block, strips pipeline scaffolding (<!-- … --> comments, SectionTitle: markers) wherever it appears, preserves #### sub-headings, and collapses blank runs — so the output drops straight into the report as a valid markdown table. Do NOT paste the raw Key | Value | … data blocks (the ones with a <!-- header --> comment and no |---| separator row) — they render as plain text, not tables, and pasting the whole scratchpad tail into one section corrupts the report.
⛔ Do NOT render §1–§6 in a single create_file call. The output will truncate silently. The scratchpad is ~60 KB; the rendered report exceeds the single-call output budget.
🔴 ALL 6 WRITES ARE MANDATORY (1 create_file + 5 appends). Do NOT stop after §5 — §6 (Recommendations) and the Appendix (Score Methodology, Limitations) are critical and must be appended. After the final append, run grep_search for ## 6. Recommendations and ## Appendix on the report file to verify both exist. If either is missing, append the missing content immediately.
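The post-render check amounts to confirming that both headings are present in the saved file. A minimal Python sketch of that check follows; the skill itself uses grep_search, so the helper name `missing_sections` and the sample report text are assumptions for illustration only.

```python
def missing_sections(report_text,
                     required=("## 6. Recommendations", "## Appendix")):
    """Return the required headings that do not start any line of the report."""
    lines = [line.strip() for line in report_text.splitlines()]
    return [h for h in required
            if not any(line.startswith(h) for line in lines)]


# Hypothetical truncated report: §6 was written, the Appendix was not
sample = "# MITRE Coverage Report\n## 5. Operational\n## 6. Recommendations\n"
print(missing_sections(sample))
```

An empty list means both sections exist; any returned heading still needs to be appended.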
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
Before starting ANY MITRE coverage report:
- Run Invoke-MitreScan.ps1 — this single script handles ALL data gathering (Phases 1-3). The LLM does NOT run queries, transcribe output, or write scratchpad sections
- Read config.json for workspace ID, tenant, subscription, and Azure MCP parameters
- ALWAYS ask the user for output mode if not specified: inline chat summary, markdown file report, or both (default: both)
- ALWAYS ask the user for timeframe if not specified: the -Days parameter controls the alert/incident KQL lookback (Phase 3). Default: 30 days. Phases 1-2 (REST API) are not time-bounded
- ALWAYS use create_file for markdown reports (never use terminal commands)
- ALWAYS sanitize PII from saved reports — use generic placeholders for real rule names, workspace names, and tenant GUIDs in committed files
- Read scratchpad + SKILL-report.md before rendering — the scratchpad is the sole data source
- Custom Detections may be SKIPPED — the Graph API requires CustomDetection.Read.All which needs admin consent. If skipped, the report notes this and shows AR-only analysis. Do NOT treat SKIPPED as an error — it's a graceful degradation
Prerequisites
| Dependency | Required By | Setup |
|---|---|---|
| Azure CLI (az) | All phases (REST + KQL) | Install: aka.ms/installazurecli. Authenticate: az login --tenant <tenant_id> then az account set --subscription <subscription_id> |
| Azure RBAC | Phase 1-2 (REST API) | Microsoft Sentinel Reader on the workspace (analytic rule inventory + SOC Optimization) |
| KQL auth | Phase 3 (az monitor) | az login with https://api.loganalytics.io/.default scope (CA policy may enforce re-auth) |
| Microsoft.Graph PowerShell | Phase 1 M2 (Custom Detections) | Install-Module Microsoft.Graph.Authentication -Scope CurrentUser. Required scope: CustomDetection.Read.All. PS1 skips gracefully if unavailable |
| PowerShell 7.0+ | Script execution | #Requires -Version 7.0 |
🔴 PROHIBITED
- ❌ Running REST/KQL queries via MCP tools during data gathering — PS1 handles all queries
- ❌ Writing or modifying scratchpad sections manually — PS1 is the sole writer
- ❌ Fabricating technique counts, rule names, or coverage percentages
- ❌ Inventing ATT&CK technique IDs or names not in the reference JSON
- ❌ Overriding MITRE Coverage Score dimensions — the PS1 computes these deterministically
- ❌ Rendering the report without first reading the scratchpad file
- ❌ Reporting "100% coverage" for any tactic unless the data actually shows every technique covered
Execution Workflow
Phase 0: Initialization
- Read config.json for sentinel_workspace_id, subscription_id, Azure MCP parameters
- Confirm output mode and timeframe with user (pass -Days to PS1; default 30)
- Verify prerequisites: az login session active, correct subscription set
Phases 1-3: Data Gathering (automated by PS1)
Run Invoke-MitreScan.ps1 — it handles all 3 phases automatically:
| Phase | Queries | Description | Execution Type |
|---|---|---|---|
| 1 | M1, M2 | Rule inventory — Analytic rules with MITRE tactics/techniques (REST), Custom Detection rules with mitreTechniques (Graph, graceful skip) | REST + Graph |
| 2 | M3 | SOC Optimization — Coverage recommendations with threat scenario context, MITRE tagging suggestions for untagged rules | REST |
| 3 | M4, M5, M6, M7, M8, M9 | Operational correlation — SecurityAlert firing counts per rule with MITRE cross-reference, SecurityIncident volume by tactic, platform-native alert MITRE coverage, table ingestion volume for data readiness validation, data connector health from SentinelHealth, Log Analytics table tier metadata | KQL + CLI |
Post-processing (automated by PS1):
| Task | Phase | Description |
|---|---|---|
| Tactic coverage matrix | 1 | For each ATT&CK tactic, count enabled rules and covered techniques against the framework reference |
| Technique drill-down | 3 | Map every framework technique to its covering rules AND pre-compute tier/product annotations from CTID cross-reference |
| Untagged rule identification | 1 | Find rules with no MITRE tactics AND no techniques |
| ICS technique extraction | 1 | Separate T0xxx (ICS/OT) technique mappings |
| Threat scenario parsing | 2 | Extract active/recommended detection counts and per-tactic breakdowns from SOC Optimization |
| AI MITRE tagging suggestions | 2 | Extract suggested tactics/techniques for untagged rules. Cross-reference against Phase 1 actual rule tags to verify if suggestions were applied (emits VerifyStatus: Applied/Partial/NotApplied/NotFound per rule, plus summary counts AR_TagsApplied/AR_TagsPartial/AR_TagsNotApplied/AR_TagsNotFound) |
| Alert-to-MITRE correlation | 3 | Cross-reference firing alerts with Phase 1 MITRE tags |
| Active tactic coverage | 3 | Compute which tactics have rules that actually fire alerts |
| Platform alert MITRE extraction | 3 | Extract MITRE techniques attributed by platform-native product alerts (M6) |
| Product presence detection | 3 | Derive active M365 Defender products from SecurityAlert ProductName |
| CTID tier classification | 3 | Cross-reference active products with CTID mapping to classify techniques as Tier 1/2/3 |
| Combined tactic coverage | 3 | Merge custom rule and platform Tier 1/2 coverage per tactic |
| Data readiness cross-reference | 3 | Extract KQL table dependencies from rule queries, cross-reference with M7 ingestion volumes, classify rules as Ready/Partial/NoData |
| Connector health enrichment | 3 | Cross-reference M8 SentinelHealth connector status with Data Readiness — flag "Ready" rules whose feeding connector is degraded or failing |
| Table tier classification | 3 | Cross-reference M9 table tier metadata with rule KQL table dependencies — flag rules targeting Basic/Data Lake tier tables as "TierBlocked" (phantom coverage: rule structurally cannot fire regardless of data volume) |
| Coverage Score computation | All | Weighted composite score from 5 dimensions |
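The data readiness and table tier rows above can be illustrated with a small Python sketch. The exact thresholds the PS1 applies are not stated here, so this sketch assumes: a table "has data" when its ingestion volume is above zero, Ready means every dependency has data, Partial means some do, NoData means none do, and any dependency on a non-Analytics tier makes the rule TierBlocked. The function name and input shapes are hypothetical.

```python
def rule_readiness(rule_tables, ingestion_mb, table_tiers):
    """Classify one rule as TierBlocked, Ready, Partial, or NoData.

    rule_tables:  table names the rule's KQL depends on
    ingestion_mb: table name -> average daily ingestion (M7-style data)
    table_tiers:  table name -> tier string (M9-style data);
                  tables missing from the map are assumed to be Analytics
    """
    if any(table_tiers.get(t, "Analytics") != "Analytics" for t in rule_tables):
        return "TierBlocked"  # rule cannot fire regardless of data volume
    with_data = [t for t in rule_tables if ingestion_mb.get(t, 0) > 0]
    if rule_tables and len(with_data) == len(rule_tables):
        return "Ready"
    if with_data:
        return "Partial"
    return "NoData"


# Hypothetical workspace data for illustration
ingestion = {"SigninLogs": 120.5, "AuditLogs": 14.2, "CustomApp_CL": 0}
tiers = {"SigninLogs": "Analytics", "AuditLogs": "Analytics", "Archive_CL": "Basic"}

print(rule_readiness(["SigninLogs", "AuditLogs"], ingestion, tiers))
print(rule_readiness(["SigninLogs", "CustomApp_CL"], ingestion, tiers))
print(rule_readiness(["Archive_CL"], ingestion, tiers))
```

The tier check runs first because a tier-blocked rule is phantom coverage even when its tables are ingesting data.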
Scratchpad output: PS1 writes all results to temp/mitre_scratch_<timestamp>.md (~28 KB, ~12 named sections). See SKILL-report.md for the Section-to-Scratchpad Mapping.
Phase 4: Render Output (LLM)
🔴 MANDATORY — Load scratchpad + report template before rendering:
- Read the scratchpad file (path printed by PS1). This single file contains ALL data from Phases 1-3.
- Read SKILL-report.md for the complete rendering templates and formatting rules.
Pre-render validation:
- Verify scratchpad has all 3 phase sections (PHASE_1 through PHASE_3)
- Check SCORE section has all 5 dimensions
- If Phase 3 shows FAILED for M4/M5 (token expiry), note this in the report — the Operational dimension defaults to 0
Render — Section-by-Section:
| Section | Data Source (scratchpad keys) | Required |
|---|---|---|
| §1 Executive Summary | All phases + SCORE | ✅ Coverage Score, Workspace at a Glance, Top 3 |
| §2 Tactic Coverage | PHASE_1.TacticCoverage | ✅ 14-tactic matrix with coverage % |
| §3 Technique Deep Dive | PHASE_3.TechniqueDetail (enriched with Tier/TierProducts) | ✅ Per-tactic technique tables with pre-computed tier badges |
| §4 Coverage Gap Analysis | PHASE_1.TacticCoverage + PHASE_3.TechniqueDetail + PHASE_2.ThreatScenarios | ✅ Gaps, priorities, threat scenario alignment |
| §5 Operational MITRE Correlation | PHASE_3.AlertFiring + IncidentsByTactic + ActiveTacticCoverage + PlatformAlertCoverage + PlatformTechniquesByTier + PlatformTacticCoverage + DataReadiness + DataReadiness_Summary + MissingTables + TierBlockedTables + ConnectorHealth + ConnectorHealth_Summary | ✅ Which rules fire, platform coverage, combined tactic view, data readiness, tier-blocked phantom coverage, connector health |
| §6 Recommendations | All phases | ✅ Untagged rule remediation, Content Hub suggestions, coverage priorities |
Query File Reference
All queries are defined as YAML files in queries/phase1-3/.
YAML Format
id: mitre-m1 # Unique identifier
name: Analytic Rule MITRE Extraction # Human-readable name
description: Fetch rules with tactics/techniques # What it does
phase: 1 # Which phase (1-3)
type: rest # rest | graph | kql
url: https://management.azure.com/... # REST API URL with placeholders
jmespath: value[].{...} # JMESPath projection (REST)
Complete Query Inventory
| Phase | File | ID | Type | Description |
|---|---|---|---|---|
| 1 | M1-AnalyticRuleMitre.yaml | mitre-m1 | rest | Scheduled + NRT analytic rules with MITRE tactics, techniques, severity, query text |
| 1 | M2-CustomDetectionMitre.yaml | mitre-m2 | graph | Custom Detection rules with mitreTechniques (graceful skip if auth unavailable) |
| 2 | M3-SocOptCoverage.yaml | mitre-m3 | rest | SOC Optimization coverage recommendations with threat scenarios and MITRE tagging suggestions |
| 3 | M4-AlertFiringByMitre.yaml | mitre-m4 | kql | SecurityAlert firing counts per rule with severity breakdown (30d lookback by default; set with -Days) |
| 3 | M5-IncidentsByTactic.yaml | mitre-m5 | kql | SecurityIncident volume by tactic with classification breakdown |
| 3 | M6-PlatformAlertCoverage.yaml | mitre-m6 | kql | Platform-native SecurityAlert detections with MITRE technique attribution (excludes custom rules) |
| 3 | M7-TableIngestionVolume.yaml | mitre-m7 | kql | 7-day average daily ingestion volume per table from Usage table for data readiness validation |
| 3 | M8-ConnectorHealth.yaml | mitre-m8 | kql | SentinelHealth data connector fetch status — latest state, success/failure counts, health % per connector (supplements M7 with early-warning connector failure detection) |
| 3 | M9-TableTierClassification.yaml | mitre-m9 | cli | Log Analytics table tier metadata (Analytics/Basic/Data Lake) via az monitor log-analytics workspace table list — identifies tables that analytics rules cannot query |
Output Modes
Mode 1: Inline Chat Summary (default for quick requests)
Compact executive summary rendered directly in chat with MITRE Coverage Score and top coverage gaps.
Mode 2: Markdown File Report
Full detailed report saved to reports/sentinel/mitre_coverage_report_<YYYYMMDD_HHMMSS>.md.
Mode 3: Both (default when user says "report" or "generate report")
Inline chat executive summary + full markdown file.
Ask user if not specified:
"How would you like the MITRE coverage report? I can provide:
- Inline chat summary — MITRE Score + top gaps in chat
- Markdown file — detailed report saved to reports/sentinel/
- Both (recommended) — summary in chat + full report file"
Deterministic Rendering Rules
These rules eliminate LLM interpretation variance. Apply them EXACTLY during Phase 4 rendering.
Rule A: Coverage Level Classification
Assign emoji badges to each tactic row in the coverage matrix based on the percentage of techniques covered:
| Coverage % | Badge | Level |
|---|---|---|
| 0% | 🔴 | No coverage |
| 1-15% | 🟠 | Critical gap |
| 16-30% | 🟡 | Partial |
| 31-50% | 🔵 | Moderate |
| 51-75% | 🟢 | Good |
| >75% | ✅ | Strong |
⛔ PROHIBITED: Assigning badges based on "importance" or "this tactic is more relevant." The badge MUST match the percentage threshold table above.
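Rule A is a pure threshold lookup, which a short Python sketch makes explicit. The function name `coverage_badge` is hypothetical, and the handling of fractional percentages (for example 15.5% falling into the next band up) is an assumption, since the table lists whole-number ranges only.

```python
def coverage_badge(pct):
    """Map a tactic coverage percentage to the (badge, level) pair from Rule A."""
    if pct <= 0:
        return ("🔴", "No coverage")
    if pct <= 15:
        return ("🟠", "Critical gap")
    if pct <= 30:
        return ("🟡", "Partial")
    if pct <= 50:
        return ("🔵", "Moderate")
    if pct <= 75:
        return ("🟢", "Good")
    return ("✅", "Strong")


for pct in (0, 15, 16, 50, 76):
    print(pct, *coverage_badge(pct))
```

Because the function takes only the percentage, there is no input through which perceived tactic importance could influence the badge.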
Rule B: Threat Scenario Priority
When rendering SOC Optimization threat scenarios, order by coverage gap (recommended minus active) descending, but assign badges based on completion rate (proportional to scenario size):
| Completion Rate | Badge / Priority | Meaning |
|---|---|---|
| <15% | 🔴 High | Very early stage — most recommendations unaddressed |
| 15–35% | 🟠 Medium | Work in progress — significant room for improvement |
| 35–60% | 🟡 Low | Approaching healthy coverage for typical environments |
| ≥60% | ✅ Met | Strong coverage — well above realistic implementation targets |
Why rate-based? Recommendation counts reflect the full Content Hub template catalogue including templates for vendor products not deployed in the environment (e.g., all firewall vendors). A 609-rule scenario will be permanently 🔴 under absolute-gap thresholds even at 80% coverage. Rate-based badges give proportional, meaningful progress signals.
CompletedBySystem note: CompletedBySystem is a SOC Optimization state, not a rate indicator. Some CompletedBySystem entries have low rates (recommended >> active). Always use the completion rate for badge assignment. The State column is displayed for context but does NOT override the rate-based badge.
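Rule B separates two concerns: row order comes from the absolute gap, while the badge comes from the completion rate. The Python sketch below shows both. The names and sample scenarios are hypothetical, and because the table lists 35% in two bands, the sketch assumes half-open bands (15 up to but not including 35, 35 up to but not including 60).

```python
def scenario_badge(rate_pct):
    """Map a completion rate (percent) to the Rule B badge and priority."""
    if rate_pct < 15:
        return "🔴 High"
    if rate_pct < 35:
        return "🟠 Medium"
    if rate_pct < 60:
        return "🟡 Low"
    return "✅ Met"


def order_scenarios(scenarios):
    """Sort by coverage gap (recommended minus active), largest gap first."""
    return sorted(scenarios,
                  key=lambda s: s["recommended"] - s["active"],
                  reverse=True)


# Hypothetical scenarios; "rate" stands in for the pre-computed scratchpad value
scenarios = [
    {"name": "Scenario A", "active": 30, "recommended": 46, "rate": 65.2},
    {"name": "Scenario B", "active": 60, "recommended": 609, "rate": 9.9},
]

ordered = order_scenarios(scenarios)
for s in ordered:
    print(s["name"], scenario_badge(s["rate"]))
```

The state field is deliberately absent from `scenario_badge`, mirroring the note that CompletedBySystem never overrides the rate-based badge.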
Rule C: "Paper Tiger" Detection
When Phase 3 data is available, identify paper tiger rules — rules with MITRE tags that have NEVER produced an alert in the lookback period. These rules are tagged but non-operational, and their coverage is theoretical, not proven.
| Condition | Classification | Display |
|---|---|---|
| Rule tagged with MITRE + 0 alerts in lookback | ⚠️ Paper tiger | Note in technique drill-down |
| Rule tagged with MITRE + ≥1 alert | ✅ Operationally validated | Normal display |
| Phase 3 data unavailable (FAILED/SKIPPED) | — | Skip paper-tiger analysis, note data gap |
⛔ PROHIBITED: Reporting coverage percentages as "validated" when Phase 3 data is missing. If M4/M5 failed, state: "Coverage percentages reflect rule tagging only — operational validation unavailable (Phase 3 KQL queries failed)."
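A minimal Python sketch of the paper-tiger check, assuming a mapping from MITRE-tagged rule name to alert count in the lookback window. The function name, the input shape, and the rule names are hypothetical.

```python
def paper_tigers(tagged_rule_alerts, phase3_available):
    """Return MITRE-tagged rules with zero alerts, or None when Phase 3 data is missing.

    tagged_rule_alerts: rule name -> alert count in the lookback period
    phase3_available:   False when the KQL queries FAILED or were SKIPPED
    """
    if not phase3_available:
        return None  # skip the analysis and note the data gap instead
    return sorted(name for name, count in tagged_rule_alerts.items() if count == 0)


# Hypothetical rule inventory for illustration
alerts = {"Rule one": 0, "Rule two": 12, "Rule three": 0}

print(paper_tigers(alerts, phase3_available=True))
print(paper_tigers(alerts, phase3_available=False))
```

Returning None rather than an empty list keeps "no paper tigers found" distinct from "analysis not possible", which is the distinction the prohibition above depends on.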
Rule D: Recommendation Ranking
Rank recommendations by impact using this priority order:
| Priority | Category | Criteria |
|---|---|---|
| 1 | 🔴 Low-rate threat scenarios | SOC Optimization scenarios with <15% completion rate. Exclude CompletedByUser scenarios with ≥50% completion rate (Rule E — Reviewed & Addressed). Only include ⚠️ Premature CompletedByUser (<50% rate) |
| 2 | 🔴 Zero-coverage detectable tactics | Tactics with 0% coverage AND ✅ Detectable classification (see tactic table). Exclude ⬜ Inherent blind spot tactics (Reconnaissance, Resource Development) — report these as acknowledged limitations, not actionable gaps |
| 3 | 🟠 Untagged rule remediation | Rules with AI-suggested MITRE tags from SOC Optimization |
| 4 | 🟠 Paper tiger rules | MITRE-tagged rules that never fire (if Phase 3 available) |
| 5 | 🟡 Low-coverage tactics | Tactics with 1-15% coverage |
| 6 | 🟡 Content Hub suggestions | Template-based rules available for uncovered techniques |
| 7 | ⬜ Inherent blind spot tactics | Zero-coverage tactics classified as ⬜ Inherent blind spot. Acknowledge the limitation; suggest compensating controls (threat intel feeds, brand monitoring) only if relevant to the organization |
Rule E: CompletedByUser Completion-Rate Gate
When a SOC Optimization threat scenario has State == CompletedByUser, the user has manually marked it as reviewed. However, marking a scenario "complete" after enabling 2/500 recommendations is fundamentally different from enabling 28/46. Use the completion rate (ActiveDetections / RecommendedDetections × 100) to determine rendering treatment:
| CompletedByUser + Completion Rate | Treatment | Rationale |
|---|---|---|
| ≥ 50% | 🟢 Reviewed & Addressed — render in a separate muted "Reviewed Scenarios" summary below the active gaps table. Exclude from §6 recommendations and Coverage Priority Matrix | User has genuinely triaged the scenario; remaining gap is likely non-applicable templates or platform-only coverage |
| < 50% | ⚠️ Premature Completion — render in the main active gaps table with full gap badge + ⚠️ flag in the State column. Include in §6 recommendations | Gap is too large relative to recommendations to be a deliberate triage decision |
Threshold: 50% is the default. This balances trust in the user's judgment against protection from rubber-stamped completions.
Scratchpad column: CompletionRate is pre-computed by the PS1 and included in the ThreatScenarios row. The LLM reads this value directly — do not recompute it.
Interaction with Rule B (rate-based badges): Rule B still applies for badge assignment on all scenarios. Rule E only controls where CompletedByUser scenarios are rendered (active table vs reviewed summary) and whether they appear in §6 recommendations.
CompletedBySystem scenarios are not affected — they continue to use rate-based badges (Rule B) without the completion-rate gate, since the system assessment is independent of user action.
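The gate in Rule E reduces to one comparison against the threshold. The Python sketch below works through the two examples from the text (2/500 and 28/46). It is illustrative only: during rendering the CompletionRate value is read from the scratchpad rather than recomputed, and the function names and return labels are hypothetical.

```python
def completion_rate(active, recommended):
    """ActiveDetections / RecommendedDetections x 100 (0.0 when nothing is recommended)."""
    if recommended == 0:
        return 0.0  # assumption: treat an empty scenario as 0% rather than dividing by zero
    return active / recommended * 100


def completed_by_user_treatment(state, rate_pct, threshold=50.0):
    """Decide where a scenario is rendered under Rule E."""
    if state != "CompletedByUser":
        return "not gated"  # Rule B badges still apply
    if rate_pct >= threshold:
        return "reviewed and addressed"  # muted summary, excluded from §6
    return "premature completion"  # main gaps table, flagged, included in §6


low = completion_rate(2, 500)    # 0.4
high = completion_rate(28, 46)   # roughly 60.9

print(completed_by_user_treatment("CompletedByUser", low))
print(completed_by_user_treatment("CompletedByUser", high))
print(completed_by_user_treatment("CompletedBySystem", low))
```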
MITRE Coverage Score
The MITRE Coverage Score is a composite metric (0-100) computed by the PS1 from 5 weighted dimensions. Each dimension scores 0-100 independently, then the weighted sum produces the final score.
Dimensions
| # | Dimension | Weight | Formula | What It Measures |
|---|---|---|---|---|
| 1 | Breadth | 25% | (Σ per-technique readiness credit / total ATT&CK techniques) × 100 blended 60/40 with combined platform coverage | Readiness-weighted technique coverage. Each technique gets fractional credit based on the best rule covering it: Fired=1.0, Ready=0.75, Partial=0.50, NoData=0.25, TierBlocked=0.0. AR and CD rules follow the same readiness constraints. One firing rule gives full credit even if other rules covering the same technique are NoData |
| 2 | Balance | 10% | (tactics with ≥1 rule / 14 tactics) × 100 | Whether coverage spans all kill chain phases or clusters in a few |
| 3 | Operational | 30% | (MITRE-tagged rules that fired alerts / total MITRE-tagged enabled rules) × 100 | Whether tagged rules actually produce detections (not paper tigers). Highest weight: directly rewards purple teaming and operationally validated detections |
| 4 | Tagging | 15% | (rules with MITRE tags / total rules) × 100 | Completeness of MITRE classification across the rule inventory |
| 5 | SOC Alignment | 20% | (completed SOC recommendations / total SOC coverage recommendations) × 100 | Alignment with Microsoft's threat-scenario-driven coverage model |
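The "best rule wins" credit in the Breadth dimension can be illustrated with a short Python sketch. It computes only the rule-based readiness term; the 60/40 blend with platform coverage is left out because the table does not say which side carries which weight. The function name and sample data are hypothetical, and the PS1 remains the only source of the real value.

```python
# Readiness credit per rule state, as listed in the Breadth row
CREDIT = {"Fired": 1.0, "Ready": 0.75, "Partial": 0.50, "NoData": 0.25, "TierBlocked": 0.0}


def rule_based_breadth(technique_rule_states, total_techniques):
    """Sum the best credit per technique and express it as a percentage.

    technique_rule_states: technique ID -> readiness states of the rules covering it
    total_techniques:      size of the framework reference (denominator)
    """
    total_credit = sum(
        max((CREDIT[state] for state in states), default=0.0)
        for states in technique_rule_states.values()
    )
    return total_credit / total_techniques * 100


# Hypothetical coverage: a 4-technique framework keeps the arithmetic easy to follow
coverage = {
    "T1078": ["NoData", "Fired"],   # one firing rule gives full credit
    "T1110": ["Ready"],
    "T1566": ["TierBlocked"],
}

print(rule_based_breadth(coverage, total_techniques=4))
```

With credits of 1.0, 0.75, and 0.0 over four techniques, the rule-based term is 1.75 / 4, or 43.75%.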
Score Interpretation
| Score Range | Assessment | Typical Profile |
|---|---|---|
| 80-100 | 🟢 Strong | Broad coverage, balanced tactics, operationally validated, well-tagged, SOC-aligned |
| 60-79 | 🔵 Good | Solid coverage with some gaps; may have clustering or unvalidated rules |
| 40-59 | 🟡 Moderate | Significant gaps in breadth or operational validation; improvement opportunities |
| 20-39 | 🟠 Developing | Limited coverage across the framework; many uncovered tactics |
| 0-19 | 🔴 Critical | Minimal detection coverage; urgent investment needed |
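To make the weighting concrete, the sketch below combines five hypothetical dimension scores into the composite and maps the result to an assessment band. It is a worked illustration in Python, not the PS1's code; the dimension values are invented, and treating fractional scores such as 79.5 as belonging to the lower band is an assumption.

```python
# Weights in percent, taken from the Dimensions table (they sum to 100)
WEIGHTS = {"Breadth": 25, "Balance": 10, "Operational": 30, "Tagging": 15, "SOC Alignment": 20}


def coverage_score(dimensions):
    """Weighted sum of the five 0-100 dimension scores."""
    return sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS) / 100


def assessment(score):
    """Map a composite score to the Score Interpretation band."""
    if score >= 80:
        return "🟢 Strong"
    if score >= 60:
        return "🔵 Good"
    if score >= 40:
        return "🟡 Moderate"
    if score >= 20:
        return "🟠 Developing"
    return "🔴 Critical"


# Hypothetical dimension scores for illustration
dims = {"Breadth": 20, "Balance": 50, "Operational": 40, "Tagging": 80, "SOC Alignment": 50}

score = coverage_score(dims)   # 5 + 5 + 12 + 12 + 10
print(score, assessment(score))
```

In this example a low Breadth of 20 contributes only 5 points, while Operational and Tagging contribute 12 each.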
Score Context Notes
- Operational = 0 when Phase 3 KQL queries fail (token expiry). Report this: "Operational score 0 reflects data unavailability, not necessarily poor operational coverage."
- SOC Alignment = 50 (default) when no SOC Optimization recommendations exist. This is a neutral baseline, not a penalty.
- Breadth score is naturally low because the ATT&CK framework contains 216+ techniques, many of which are endpoint-specific or pre-compromise with limited Sentinel visibility. Do NOT present this as a crisis — contextualize it: "Prioritize coverage by threat scenario relevance rather than pursuing raw percentage."
- Custom Detections SKIPPED affects Breadth and Tagging dimensions (rules not counted). Note the impact in the report.
- Platform Coverage is reported as a supplementary metric alongside the MITRE Score (not folded into the 5 dimensions). The scratchpad includes Platform_Tier1/2/3, Platform_ActiveProducts, and RuleBasedPlusPlatform_Coverage. Render this in §1 and §5 per SKILL-report.md templates. The CTID tier classification requires m365-platform-coverage.json — if the file is missing, platform tiers default to empty and the report notes the limitation.
Domain Reference
ATT&CK Enterprise Tactic Kill Chain Order
The 14 ATT&CK Enterprise tactics in kill chain order (PS1 uses this ordering for all output):
| # | Tactic (Sentinel API name) | Display Name | Cloud/Identity Relevance | Detectability |
|---|---|---|---|---|
| 1 | Reconnaissance | Reconnaissance | 🟡 Low — mostly pre-compromise; limited Sentinel visibility | ⬜ Inherent blind spot |
| 2 | ResourceDevelopment | Resource Development | 🟡 Low — attacker infrastructure; limited Sentinel visibility | ⬜ Inherent blind spot |
| 3 | InitialAccess | Initial Access | 🔴 High — phishing, valid accounts, external services | ✅ Detectable |
| 4 | Execution | Execution | 🟠 Medium — scripting, cloud admin commands | ✅ Detectable |
| 5 | Persistence | Persistence | 🔴 High — account manipulation, app registrations, inbox rules | ✅ Detectable |
| 6 | PrivilegeEscalation | Privilege Escalation | 🔴 High — tenant policy modification, valid accounts | ✅ Detectable |
| 7 | DefenseEvasion | Defense Evasion | 🟠 Medium — many techniques are endpoint-focused | ✅ Detectable |
| 8 | CredentialAccess | Credential Access | 🔴 High — brute force, token theft, AiTM | ✅ Detectable |
| 9 | Discovery | Discovery | 🟡 Medium — account/cloud service discovery | ✅ Detectable |
| 10 | LateralMovement | Lateral Movement | 🟠 Medium — remote services, internal spearphishing | ✅ Detectable |
| 11 | Collection | Collection | 🟡 Medium — email collection, data from cloud storage | ✅ Detectable |
| 12 | CommandAndControl | Command and Control | 🟠 Medium — application layer protocol, web service | ✅ Detectable |
| 13 | Exfiltration | Exfiltration | 🟠 Medium — exfiltration over C2 channel, cloud account | ✅ Detectable |
| 14 | Impact | Impact | 🟠 Medium — resource hijacking (crypto mining), account removal | ✅ Detectable |
Detectability classification:
- ✅ Detectable: Techniques in this tactic generate observable events in Sentinel data sources (sign-in logs, audit logs, endpoint telemetry, email events, etc.). KQL detection rules can be written and deployed.
- ⬜ Inherent blind spot: Techniques in this tactic describe attacker activity that occurs outside the monitored environment (e.g., attacker creating fake accounts on external services, acquiring infrastructure). CTID mappings for these tactics are typically protect/respond capabilities (Conditional Access blocking, PAM restrictions), not detect. No KQL detection rules exist or can realistically be created. Do not recommend deploying rules for inherent blind spot tactics — acknowledge the limitation and recommend compensating controls (e.g., brand monitoring services, threat intelligence feeds) if relevant.
Sentinel-Specific MITRE Mapping Notes
- Sentinel uses PascalCase for tactic names in the REST API: InitialAccess, CommandAndControl, CredentialAccess. The ATT&CK STIX data uses kebab-case (initial-access). The reference JSON maps between these.
- Sub-techniques (T1xxx.xxx) are tracked by Sentinel but the REST API properties.techniques field may contain both parent techniques (T1078) and sub-techniques (T1078.004). The PS1 counts at the parent technique level for coverage matrix purposes.
- ICS/OT techniques (T0xxx) use a separate numbering scheme from ATT&CK for ICS. These are extracted and reported separately since they don't map to the Enterprise framework.
- Custom Detection mitreTechniques uses the same technique ID format but may specify sub-techniques that analytic rules don't. The PS1 aggregates both sources.
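The naming and roll-up conventions above can be sketched in Python. The skill relies on the reference JSON for the tactic name mapping, so the regex conversion here is only an illustration of the two naming styles, and all function names are hypothetical.

```python
import re


def sentinel_to_stix_tactic(name):
    """Convert a PascalCase Sentinel tactic name to ATT&CK kebab-case."""
    return re.sub(r"(?<!^)(?=[A-Z])", "-", name).lower()


def parent_technique(technique_id):
    """Roll a sub-technique (T1078.004) up to its parent (T1078)."""
    return technique_id.split(".", 1)[0]


def is_ics(technique_id):
    """ICS/OT techniques use the T0xxx numbering scheme."""
    return technique_id.startswith("T0")


def covered_enterprise_parents(technique_ids):
    """Unique Enterprise parent techniques, with ICS/OT IDs set aside."""
    return sorted({parent_technique(t) for t in technique_ids if not is_ics(t)})


# Hypothetical properties.techniques values gathered from several rules
ids = ["T1078", "T1078.004", "T0859", "T1110.003"]

print(sentinel_to_stix_tactic("CommandAndControl"))
print(covered_enterprise_parents(ids))
```

Counting at the parent level means T1078 and T1078.004 contribute a single covered technique to the matrix.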
Tactic-Specific Detection Guidance
When rendering recommendations (§6), use these cloud/identity-relevant technique priorities:
| Tactic | Key Sentinel-Detectable Techniques | Priority |
|---|---|---|
| InitialAccess | T1078 (Valid Accounts), T1566 (Phishing), T1133 (External Remote Services) | 🔴 Must-have |
| Persistence | T1098 (Account Manipulation), T1136 (Create Account), T1078 (Valid Accounts) | 🔴 Must-have |
| CredentialAccess | T1110 (Brute Force), T1528 (Steal App Access Token), T1621 (MFA Request Gen) | 🔴 Must-have |
| PrivilegeEscalation | T1484 (Domain/Tenant Policy Mod), T1078 (Valid Accounts), T1098 (Account Manipulation) | 🔴 Must-have |
| DefenseEvasion | T1078 (Valid Accounts), T1484 (Domain/Tenant Policy Mod), T1562 (Impair Defenses) | 🟠 Important |
| Exfiltration | T1567 (Exfil Over Web Service), T1537 (Transfer to Cloud Account) | 🟠 Important |
| Collection | T1114 (Email Collection), T1213 (Data from Info Repos) | 🟠 Important |
SOC Optimization Threat Scenario Reference
SOC Optimization recommendations map to named threat scenarios. When rendering §4, interpret these:
| Scenario | Key Attack Pattern | Priority Tactics |
|---|---|---|
| AiTM (Adversary in the Middle) | Session token theft, AiTM phishing | InitialAccess, CredentialAccess |
| BEC (Financial Fraud) | Email account takeover for wire fraud | InitialAccess, CredentialAccess, Persistence |
| BEC (Mass Credential Harvest) | Large-scale phishing campaigns | InitialAccess, CredentialAccess, DefenseEvasion |
| Human Operated Ransomware | Post-compromise hands-on keyboard | LateralMovement, CredentialAccess, DefenseEvasion, Impact |
| Credential Exploitation | Credential stuffing, password spray | InitialAccess, CredentialAccess, Discovery |
| IaaS Resource Theft | Cloud compute hijacking (crypto mining) | CredentialAccess, Persistence, Impact |
| Network Infiltration | Traditional network-based attacks | Discovery, LateralMovement, C2 |
| X-Cloud Attacks | Cross-cloud lateral movement | CredentialAccess, PrivilegeEscalation, Persistence |
| ERP (SAP) | SAP financial process manipulation | InitialAccess, DefenseEvasion |
SOC Optimization Recommendation States
| State | Meaning | Report Treatment |
|---|---|---|
| Active | Recommendation is open and actionable | Show as gap — count toward coverage deficit |
| InProgress | User has started addressing the recommendation | Show as in-progress — partial credit |
| CompletedBySystem | Microsoft's automated assessment found coverage adequate | Use rate-based badge (may still show 🔴/🟠/🟡 if completion rate is low). State displayed in table for context |
| Completed | User manually marked as complete | Show as met — ✅ |
SVG Dashboard Generation
After the report is generated, the user may request an SVG dashboard visualization.
Trigger: "generate SVG dashboard", "visualize this report", "SVG from the MITRE report"
✅ DEFAULT: run the deterministic renderer (render_dashboard.py)
Do this first — do NOT hand-author the SVG. render_dashboard.py produces the manifest-driven 5-row dashboard non-interactively, parsing every value from the scratchpad + report + svg-widgets.yaml (no hardcoded run data). It is faster, deterministic, and produces a known-good layout. Run it:
python .github/skills/mitre-coverage-report/render_dashboard.py \
--scratch temp/mitre_scratch_<ts>.md \
--manifest .github/skills/mitre-coverage-report/svg-widgets.yaml \
--report reports/sentinel/mitre_coverage_report_<label>_<ts>.md \
--out reports/sentinel/mitre_coverage_report_<label>_<ts>_dashboard.svg
It reads the donut center, score dimensions, tactic bars, threat-scenario table, and KPI values from the scratchpad, and the Top-3 recommendation cards from the report's ### 🎯 Top 3 Recommendations table (falling back to top threat-scenario gaps if absent). Output is self-contained SVG with explicit fill on every <text>.
| Action | Status |
|---|---|
| Running render_dashboard.py when the user asks to visualize/generate a dashboard | ✅ REQUIRED (default path) |
| Hand-authoring the SVG via the svg-dashboard skill instead of running the script | ❌ PROHIBITED unless the user explicitly asks for a bespoke/custom layout the renderer can't produce |
Fallback — bespoke/interactive dashboards (svg-dashboard skill)
Only use this path when the user explicitly wants a custom layout, different widgets, or styling the deterministic renderer doesn't support. Edit svg-widgets.yaml first if the change is layout/field-level — the renderer reads it at generation time, so many "customizations" don't require hand-authoring.
- Load the svg-dashboard skill
- Use the rendered report + scratchpad data to build visualization widgets
- Recommended widget types for MITRE coverage:
- Score card — MITRE Coverage Score with 5 dimension breakdown
- Bar chart — Per-tactic coverage percentages (14 bars)
- Donut chart — Rule inventory breakdown (AR enabled/disabled, CD enabled/disabled, untagged)
- Table — Top 5 coverage gaps (tactic + gap %)
- KPI cards — Total techniques covered, SOC scenarios met, untagged rules
Troubleshooting
| Issue | Solution |
|---|---|
| Phase 3 KQL queries fail (token expired) | Re-authenticate: az login --tenant <tenant_id> --scope https://api.loganalytics.io/.default |
| Custom Detections SKIPPED | Normal if Graph API admin consent not granted. Report proceeds with AR-only analysis |
| SOC Optimization returns 0 recs | Workspace may not have SOC Optimization enabled, or all recommendations are already completed |
| Breadth score seems low (10-20%) | This is typical — 216+ techniques means even well-covered workspaces have low percentages. Focus on threat-scenario-aligned priorities, not raw percentage |
| ICS techniques appear in output | Normal if Defender for IoT rules are deployed. They're reported separately from Enterprise ATT&CK |
| az rest returns 403 | Check RBAC: user needs Microsoft Sentinel Reader on the workspace |
.github/skills/scope-drift-detection/device/SKILL.md
npx skills add SCStelz/security-investigator --skill scope-drift-detection-device -g -y
SKILL.md
Frontmatter
{
"name": "scope-drift-detection-device",
"description": "Use this skill when asked to detect scope drift, behavioral expansion, or process baseline deviation on devices or endpoints. Triggers on keywords like \"device drift\", \"device process drift\", \"endpoint drift\", \"process baseline\", \"device behavioral change\", or when investigating whether a device has gradually expanded its process execution beyond an established baseline. This skill builds a configurable-window behavioral baseline using DeviceProcessEvents, compares baseline with recent activity, computes a weighted Drift Score across 5 dimensions (Volume, Processes, Accounts, Process Chains, Signing Companies), and correlates with SecurityAlert, DeviceInfo (for uptime corroboration via MDE sensor health), and command-line pattern analysis. Supports fleet-wide and single-device modes.",
"drill_down_prompt": "Analyze device process drift for {entity} — behavioral baseline vs recent activity",
"threat_pulse_domains": [
"endpoint"
]
}
Device Scope Drift Detection — Instructions
Purpose
This skill detects scope drift — the gradual, often imperceptible expansion of process execution behavior beyond an established baseline — in endpoints and devices. Unlike sudden compromise (which triggers alerts), scope drift is a slow-burn pattern that evades threshold-based detections.
Entity Type: Device
| Identifier | Primary Table(s) | Use Case |
|---|---|---|
| DeviceName (hostname) | DeviceProcessEvents | Endpoints, servers, workstations — fleet-wide or single-device process baseline analysis |
What this skill detects:
- Volume spikes in process execution relative to historical baseline
- New processes or process chains not seen in the baseline period
- New service accounts or user contexts executing processes
- Unsigned or unusually-signed binaries executing on endpoints
- Reconnaissance, lateral movement, persistence, and exfiltration command patterns
- Security alerts involving the drifting devices
Two operating modes:
| Mode | When to Use | Scope |
|---|---|---|
| Fleet-wide | "Check all devices for process drift", "device drift across the fleet" | Computes per-device drift scores, ranks all devices, flags those > 150% |
| Single-device | "Investigate process drift on DEVICE-01", specific hostname provided | Deep dive on one device with full process inventory and command-line analysis |
Related skills:
- SPN Scope Drift — for service principals
- User Scope Drift — for user accounts (UPNs)
📑 TABLE OF CONTENTS
- Critical Workflow Rules - Start here!
- Output Modes - Inline chat vs. Markdown file
- Quick Start - 10-step investigation pattern
- Drift Score Formula - Weighted composite scoring (5 dimensions)
- Execution Workflow - Complete 5-phase process
- Sample KQL Queries - Validated query patterns (Queries 14-22)
- Report Template - Output format specification
- Known Pitfalls - Edge cases and false positives
- Error Handling - Troubleshooting guide
- SVG Dashboard Generation - Visual dashboard from report
Investigation shortcuts:
- Device with behavioral drift (TP Q6): Q15 (per-device drift scores + dimension ratios) → Q16 (first-seen processes — new in recent window) → Q18 (alert/incident correlation) → Q21 (uptime context)
- Suspicious process chains (TP Q7): Q17 (rare parent→child chains in recent window) → Q20 (command-line pattern detection — recon, lateral movement, persistence) → Q18 (alert correlation)
- Fleet uniformity assessment (TP Q6, all devices clustered): Q14 (fleet-wide daily trend) → Q15 (per-device breakdown) → Q22 (per-session volume — confirms burst vs sustained activity)
- Unsigned binary investigation (standalone): Q19 (unsigned/unusual signing companies in recent window) → Q16 (first-seen process overlap) → Q20 (command-line patterns for flagged binaries)
⛔ Shortcut Default Rule: When a matching shortcut exists for the investigation context, use it — don't run the full workflow. Only run the full query set when the user explicitly requests "full investigation", "comprehensive", or "deep dive". Shortcuts render only the report sections relevant to their query chain (plus Executive Summary and Recommendations, always).
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
Before starting ANY device scope drift analysis:
- ALWAYS enforce Sentinel workspace selection (see Workspace Selection section below)
- ALWAYS ask the user for output mode if not specified: inline chat summary or markdown file report (or both)
- ALWAYS determine mode — fleet-wide or single-device
- ALWAYS determine time windows — baseline period and recent period (configurable, defaults: 6-day baseline, 1-day recent within 7-day lookback)
- ALWAYS build baseline FIRST before comparing recent activity
- ALWAYS apply the low-volume denominator floor to prevent false-positive drift scores on sparse baselines
- ALWAYS correlate across all required data sources (DeviceProcessEvents, SecurityAlert, DeviceInfo)
- ALWAYS run independent queries in parallel for performance
- NEVER report a drift flag without corroborating evidence from at least one secondary data source
Data Sources
| Data Source | Role | Purpose |
|---|---|---|
| DeviceProcessEvents | ✅ Primary | Device process execution baseline |
| SecurityAlert | ✅ Corroboration | Corroborating alert evidence |
| SecurityIncident | ✅ Corroboration | Real alert status/classification |
| DeviceInfo | ✅ Corroboration | Device uptime/power-on pattern via MDE sensor health (primary — covers all MDE-onboarded devices) |
| Heartbeat | ⚡ Fallback | Device uptime for non-MDE devices with Log Analytics agent (AMA/MMA) only |
⛔ MANDATORY: Sentinel Workspace Selection
This skill requires a Sentinel workspace to execute queries. Follow these rules STRICTLY:
When invoked from incident-investigation skill:
- Inherit the workspace selection from the parent investigation context
- If no workspace was selected in parent context: STOP and ask user to select
When invoked standalone (direct user request):
- ALWAYS call list_sentinel_workspaces MCP tool FIRST
- If 1 workspace exists: Auto-select, display to user, proceed
- If multiple workspaces exist:
- Display all workspaces with Name and ID
- ASK: "Which Sentinel workspace should I use for this investigation?"
- ⛔ STOP AND WAIT for user response
- ⛔ DO NOT proceed until user explicitly selects
- If a query fails on the selected workspace:
- ⛔ DO NOT automatically try another workspace
- STOP and report the error, display available workspaces, ASK user to select
🔴 PROHIBITED ACTIONS:
- ❌ Selecting a workspace without user consent when multiple exist
- ❌ Switching to another workspace after a failure without asking
- ❌ Proceeding with investigation if workspace selection is ambiguous
Output Modes
This skill supports two output modes. ASK the user which they prefer if not explicitly specified. Both may be selected.
Mode 1: Inline Chat Summary (Default)
- Render the full drift analysis directly in the chat response
- Includes ASCII tables, drift dimension bars, and security assessment
- Best for quick review and interactive follow-up questions
Mode 2: Markdown File Report
- Save a comprehensive report to reports/scope-drift/device/Scope_Drift_Report_<entity>_<timestamp>.md
- All ASCII visualizations render correctly inside markdown code fences (```)
- Includes all data from inline mode plus additional detail sections
- Use create_file tool — NEVER use terminal commands for file output
- Filename patterns:
  - Fleet-wide: Scope_Drift_Report_fleet_devices_YYYYMMDD_HHMMSS.md
  - Single-device: Scope_Drift_Report_<device_name>_YYYYMMDD_HHMMSS.md (lowercase, sanitized)
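The filename patterns can be sketched as a small Python helper. The source only says device names are "lowercase, sanitized", so the exact sanitization rule below (anything outside a-z, 0-9, dot, underscore, and hyphen becomes an underscore) is an assumption, and `drift_report_path` is a hypothetical name. The helper only builds the path; writing the file still goes through the create_file tool.

```python
import re
from datetime import datetime


def drift_report_path(device_name=None, now=None):
    """Build the report path for fleet-wide (device_name=None) or single-device mode."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    if device_name is None:
        entity = "fleet_devices"
    else:
        # Assumed sanitization rule: lowercase, then replace unsafe runs with "_"
        entity = re.sub(r"[^a-z0-9._-]+", "_", device_name.lower()).strip("_")
    return f"reports/scope-drift/device/Scope_Drift_Report_{entity}_{stamp}.md"


if __name__ == "__main__":
    print(drift_report_path())
    print(drift_report_path("DEVICE-01.Contoso.com"))
```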
Markdown Rendering Notes
- ✅ ASCII tables, box-drawing characters, and bar charts render perfectly in markdown code blocks
- ✅ Unicode block characters (█ full block, ─ box-drawing horizontal) display correctly in monospaced fonts
- ✅ Emoji indicators (🔴🟢🟡⚠️✅) render natively in GitHub-flavored markdown
- ✅ Standard markdown tables (| col |) render as formatted tables
- Tip: Wrap all ASCII art in triple-backtick code fences for consistent rendering
Quick Start (TL;DR)
When a user requests device scope drift detection:
- Select Workspace → list_sentinel_workspaces, auto-select or ask
- Determine Mode → Fleet-wide or single-device? Determine time windows.
- Determine Output Mode → Ask if not specified: inline, markdown file, or both
- Run Phase 1 → Query 14 (daily summary) + Query 15 (per-device breakdown)
- Apply Fleet Scaling → Compute drift scores, rank devices, apply tiered depth limits (see Fleet Scaling)
- Run Phase 2 → Query 16 (first-seen processes) + Query 17 (rare process chains) — scoped to Tier 1 + Tier 2 devices only
- Run Phase 3 → Query 18 (SecurityAlert + SecurityIncident) + Query 19 (unsigned/unusual) + Query 20 (notable command-line patterns) — scoped to Tier 1 devices only
- Run Phase 4 (corroboration) → Query 21 (DeviceInfo uptime) + Query 22 (per-session volume) for flagged/intermittent devices in Tier 1
- Compute Final Assessment → Combine drift scores with corroborating evidence
- Output Results → Render in selected mode(s) with tiered depth
Baseline and Recent Windows
Device process drift supports configurable time windows, unlike sign-in drift (which uses a fixed 90d/7d split). The user may specify:
| User Request | Baseline Window | Recent Window |
|---|---|---|
| "24 hours over the last 7 days" | Days 1–6 | Day 7 (last 24h) |
| "last 48 hours vs previous week" | Days 3–9 | Days 1–2 |
| "process drift last 30 days" | Days 8–30 | Days 1–7 |
| No time specified | Last 6 days | Last 24 hours |
Note: Follow the global Tool Selection Rule in .github/copilot-instructions.md. For lookbacks ≤ 30 days, use RunAdvancedHuntingQuery (free on Analytics-tier DeviceProcessEvents; swap TimeGenerated → Timestamp). For lookbacks > 30 days (AH Graph API cap), use mcp_sentinel-data_query_lake with TimeGenerated. Sample queries below are written with TimeGenerated; adapt the column name when running in Advanced Hunting.
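The tool-selection note can be expressed as a short Python sketch. The tool names and the 30-day cap come from the note above; `plan_query` is a hypothetical helper, and the blanket column rename is an assumption that only suits queries over DeviceProcessEvents.

```python
import re

AH_MAX_LOOKBACK_DAYS = 30  # Advanced Hunting cap stated in the note above


def plan_query(kql: str, lookback_days: int):
    """Return (tool_name, query_text) for a sample query written with TimeGenerated."""
    if lookback_days <= AH_MAX_LOOKBACK_DAYS:
        # Advanced Hunting exposes the event time as Timestamp
        swapped = re.sub(r"\bTimeGenerated\b", "Timestamp", kql)
        return "RunAdvancedHuntingQuery", swapped
    # Longer lookbacks go to the data lake, which keeps TimeGenerated
    return "mcp_sentinel-data_query_lake", kql


if __name__ == "__main__":
    sample = "DeviceProcessEvents | where TimeGenerated > ago(7d)"
    print(plan_query(sample, 7))
    print(plan_query(sample, 90))
```

The word-boundary match leaves derived columns such as TimeGenerated1 untouched. Queries that touch other tables would need to be checked individually before any rename.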
Fleet Scaling (Large Environments)
Problem: In small environments (< 50 devices), every device gets a full deep dive. In environments with hundreds or thousands of devices, running Queries 16–22 for every flagged device is prohibitively expensive (query timeouts, massive result sets, unreadable reports).
Solution: After Phase 1 computes drift scores for all devices, apply tiered depth based on fleet size and drift severity.
Fleet Size Detection
After Query 15, count distinct devices in the result set:
| Fleet Size | Tier | Deep Dive Limit | Behavior |
|---|---|---|---|
| ≤ 50 devices | Small | All flagged | Full deep dive for every device > 150%. No limiting needed. |
| 51–200 devices | Medium | Top 10 | Full deep dive for top 10 by DriftScore. Summary row for remaining flagged devices. |
| 201–1000 devices | Large | Top 10 | Full deep dive for top 10. Tier 2 summary (next 20) with first-seen processes only. Remaining flagged devices listed in ranking table with scores but no deep dive. |
| > 1000 devices | Very Large | Top 10 | Same as Large, plus: filter Query 15 to BL_TotalEvents > 10 to exclude near-silent devices from scoring. |
Tiered Depth Model
After computing drift scores and ranking all devices, assign tiers:
| Tier | Devices | Queries Run | Report Depth |
|---|---|---|---|
| Tier 1 (Full) | Top N by DriftScore (N = deep dive limit from table above) | All: Q16, Q17, Q18, Q19, Q20, Q21, Q22 | Full deep dive: ASCII chart, dimension table, first-seen processes, process chains, command-line patterns, alerts, DeviceInfo uptime |
| Tier 2 (Summary) | Next 20 flagged devices (or remaining if < 20) | Q16 only (first-seen processes) | One-line summary per device: score, top 3 new processes, flag status |
| Tier 3 (Score only) | All remaining flagged devices | None beyond Phase 1 | Row in ranking table: device name, drift score, dimension ratios, flag emoji |
| Stable | Devices ≤ 150% | None beyond Phase 1 | Omitted from deep dives. Included in fleet summary statistics only. |
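A minimal Python sketch of the tier assignment follows. It assumes the generic Tier 1/2/3 split applies to every fleet above 50 devices and leaves out the BL_TotalEvents > 10 pre-filter for very large fleets; `assign_tiers` and the constant names are hypothetical.

```python
FLAG_THRESHOLD = 150.0  # devices above this DriftScore are flagged
TIER2_SIZE = 20         # "next 20 flagged devices"


def deep_dive_limit(fleet_size: int):
    """Tier 1 size from the fleet-size table; None means every flagged device."""
    return None if fleet_size <= 50 else 10


def assign_tiers(scores: dict) -> dict:
    """scores maps device name -> DriftScore for every device returned by Query 15."""
    flagged = sorted(
        (d for d, s in scores.items() if s > FLAG_THRESHOLD),
        key=lambda d: scores[d],
        reverse=True,
    )
    stable = [d for d, s in scores.items() if s <= FLAG_THRESHOLD]
    limit = deep_dive_limit(len(scores))
    if limit is None:
        return {"tier1": flagged, "tier2": [], "tier3": [], "stable": stable}
    return {
        "tier1": flagged[:limit],
        "tier2": flagged[limit:limit + TIER2_SIZE],
        "tier3": flagged[limit + TIER2_SIZE:],
        "stable": stable,
    }


if __name__ == "__main__":
    demo = {f"dev{i}": 100 + i for i in range(100)}
    tiers = assign_tiers(demo)
    print({name: len(devices) for name, devices in tiers.items()})
```

Devices carrying the placeholder score of 999 for a missing baseline would rank first under this sketch, so a real implementation would probably separate them before ranking.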
KQL Scoping for Large Fleets
When running Phase 2–4 queries for large fleets, scope them to the relevant device tier using a let block:
// Scope Phase 2–3 queries to Tier 1 devices only
let lookback = 7d; // match the lookback used in Phase 1
let tier1Devices = dynamic(["device-a", "device-b", "device-c"]);
DeviceProcessEvents
| where TimeGenerated > ago(lookback)
| where DeviceName in~ (tier1Devices)
// ... rest of query
User Override
If the user explicitly asks for "all devices" or "full report", honor the request but warn:
⚠️ Fleet has <N> devices with <X> flagged above 150%. Running full deep dives for all flagged devices may be slow and produce a very long report. Proceed? (Default: top 10 deep dives + summary for others)
Report Disclosure
When tiered depth is applied, always disclose in the report header:
**Fleet Size:** <N> devices (Large fleet — tiered analysis applied)
**Deep Dives:** Top <X> by DriftScore (Tier 1: full analysis)
**Summaries:** <Y> additional flagged devices (Tier 2: first-seen processes only)
**Score Only:** <Z> additional flagged devices (Tier 3: ranking table only)
**Stable:** <W> devices ≤ 150% (omitted from deep dives)
Drift Score Formula
The Drift Score is a weighted composite of behavioral dimensions, normalized so that 100 = identical to baseline.
Device Formula (5 Dimensions)
$$ \text{DriftScore}_{Device} = 0.30V + 0.25P + 0.15A + 0.20C + 0.10S $$
| Dimension | Weight | Metric | Why |
|---|---|---|---|
| Volume | 30% | Daily avg process events (recent / baseline) | Sudden activity surges indicate new software, lateral movement, or compromise |
| Processes | 25% | Distinct process filenames executed | New processes = new software deployment, malware, or living-off-the-land tools |
| Accounts | 15% | Distinct account identities executing processes | New accounts = lateral movement, privilege escalation, or unauthorized access |
| Process Chains | 20% | Distinct parent→child process relationships | New chains = novel execution patterns, potentially malicious process trees |
| Signing Companies | 10% | Distinct file signing entities | New unsigned or unusually-signed binaries = potential malware or unauthorized tools |
Interpretation Scale
| Score | Meaning | Action |
|---|---|---|
| < 80 | Contracting scope | ✅ Normal — entity is doing less than usual |
| 80–120 | Stable / normal variance | ✅ No action required |
| 120–150 | Moderate deviation | 🟡 Monitor — check for legitimate reasons |
| > 150 | Significant drift | 🔴 FLAG — investigate with corroborating evidence |
| > 250 | Extreme drift | 🔴 CRITICAL — immediate investigation required |
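The weighted formula and the interpretation scale can be sketched in Python as follows. The weights are taken from the dimension table; the handling of the exact boundary values (120 counts as stable, 150 as moderate) is an assumption, because the scale does not say which band owns them.

```python
WEIGHTS = {
    "volume": 0.30,
    "processes": 0.25,
    "accounts": 0.15,
    "chains": 0.20,
    "companies": 0.10,
}


def drift_score(ratios: dict) -> float:
    """ratios maps each dimension to recent/baseline in percent (100 = same as baseline)."""
    return sum(WEIGHTS[dim] * ratios[dim] for dim in WEIGHTS)


def interpret(score: float) -> str:
    """Map a DriftScore to the interpretation scale (boundary ownership is assumed)."""
    if score > 250:
        return "Extreme drift"
    if score > 150:
        return "Significant drift"
    if score > 120:
        return "Moderate deviation"
    if score >= 80:
        return "Stable / normal variance"
    return "Contracting scope"


if __name__ == "__main__":
    example = {"volume": 300, "processes": 150, "accounts": 100, "chains": 200, "companies": 100}
    score = drift_score(example)
    print(round(score, 1), interpret(score))
```

In the example, tripled volume and doubled process chains push the composite to roughly 192.5 even though accounts and signing companies are unchanged, which lands in the flagged band.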
Low-Volume Denominator Floor
CRITICAL: For devices with sparse baselines (< 10 daily process events), the volume ratio is artificially inflated. Apply a floor:
IF BL_DailyAvg < 10:
    AdjustedVolumeRatio = RC_DailyAvg / max(BL_DailyAvg, 10) * 100
    Flag the score with: "⚠️ Low-volume baseline — ratio may be inflated"
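As a runnable illustration of the floor, the Python sketch below returns both the adjusted ratio and the low-volume flag. The function and constant names are hypothetical; the floor value of 10 events per day comes from the rule above.

```python
VOLUME_FLOOR = 10  # daily process events, from the rule above


def volume_ratio(bl_daily_avg: float, rc_daily_avg: float):
    """Return (ratio_percent, low_volume_flag) with the denominator floor applied."""
    low_volume = bl_daily_avg < VOLUME_FLOOR
    ratio = rc_daily_avg / max(bl_daily_avg, VOLUME_FLOOR) * 100
    return ratio, low_volume


if __name__ == "__main__":
    # Sparse baseline: 2 events/day rising to 12 events/day
    print(volume_ratio(2, 12))
    # Healthy baseline: the floor has no effect
    print(volume_ratio(50, 100))
```

Without the floor, the sparse device in the example would score 600% on volume from a change of only ten events per day; with it, the ratio drops to about 120% and the device carries the low-volume warning.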
Execution Workflow
Phase 1: Behavioral Baseline vs. Recent Comparison
Default windows: Baseline = days 1-6 ago, Recent = last 24h (within 7-day lookback). Configurable by user.
These are the primary queries: Query 14 gives the fleet-wide daily trend, and Query 15 computes the per-device behavioral profiles and drift metrics.
| Data Source | Query | Notes |
|---|---|---|
| DeviceProcessEvents | Query 14 | Fleet-wide daily summary |
| DeviceProcessEvents | Query 15 | Per-device daily breakdown with drift score computation |
Fleet-wide produces ONE drift score per device. Devices are ranked by DriftScore; those exceeding 150% are assigned to tiers based on fleet size (see Fleet Scaling). Tier 1 devices get full deep dives; Tier 2 get summary analysis; Tier 3 appear in the ranking table only.
Phase 2: Process Drift Pattern Analysis
- First-seen processes (Query 16): Processes appearing only in the recent window with no baseline history. These are the strongest drift signal — new software, tools, or malware.
- Rare process chains (Query 17): Parent→child execution relationships seen only in the recent window. New chains may indicate novel attack patterns, lateral movement tools, or changed automation.
Phase 3: Corroborating Signal Collection (Run in Parallel)
- SecurityAlert + SecurityIncident (Query 18): Alerts referencing any of the analyzed devices, joined with SecurityIncident for real status. Never read SecurityAlert.Status directly — it's always "New".
- Unsigned/unusual processes (Query 19): Processes with signing companies not seen in the baseline, or unsigned binaries. Legitimate software deployments will show known signing companies; malware or tools may be unsigned or signed by unusual entities.
- Notable command-line patterns (Query 20): Search for reconnaissance commands (whoami, net user, ipconfig, nltest, systeminfo), lateral movement (psexec, wmic), persistence mechanisms (schtasks, reg add), and exfiltration indicators (curl, wget, certutil).
- Account landscape analysis: Review which accounts executed processes — flag any new service accounts, admin accounts, or unexpected user contexts in the recent window.
Phase 4: Uptime Corroboration (For Flagged/Intermittent Devices)
- DeviceInfo uptime pattern (Query 21): For any device with a drift score near or above the 150% threshold, or any device known/suspected to be intermittently powered on, query the DeviceInfo table to determine actual uptime days via MDE sensor health state. This is the primary corroboration source and covers all MDE-onboarded devices. For non-MDE devices with only Log Analytics agent (AMA/MMA), fall back to the Heartbeat table using the same query pattern (substitute DeviceInfo→Heartbeat, DeviceName→Computer, SensorHealthState→OSType).
- Per-session process volume (Query 22): Query DeviceProcessEvents per-day to show per-session event concentration. This context is critical for interpreting volume-based drift — a device that was online only 5 days out of 90 will have a diluted baseline daily average, making any recent power-on session appear as a massive volume spike.
- Run Queries 21+22 for flagged devices and include the uptime context in the deep dive section.
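The dilution effect described for intermittently powered devices can be shown with a small worked example in Python. The event counts are made up for illustration, and dividing by active days is only a way to show why uptime context matters; the skill itself treats low uptime as a mitigating factor rather than recomputing the score.

```python
def volume_ratio_views(bl_total_events, window_days, active_days, rc_daily_avg):
    """Return (calendar_ratio, uptime_ratio) in percent for the same device."""
    if bl_total_events <= 0 or window_days <= 0:
        raise ValueError("baseline must contain events and a positive window")
    calendar_avg = bl_total_events / window_days        # diluted by powered-off days
    uptime_avg = bl_total_events / max(active_days, 1)  # per day actually online
    return rc_daily_avg / calendar_avg * 100, rc_daily_avg / uptime_avg * 100


if __name__ == "__main__":
    # Hypothetical device: 5,000 baseline events, online 5 of 90 days, 1,000 events today
    calendar_ratio, uptime_ratio = volume_ratio_views(5000, 90, 5, 1000)
    print(round(calendar_ratio), round(uptime_ratio))
```

Measured against calendar days the device appears to run at about 1800% of baseline, while measured against its days online it is at about 100%, which is why the DeviceInfo uptime check belongs in the deep dive for flagged devices.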
Phase 5: Score Computation & Report Generation
- Compute DriftScore per device using the 5-dimension formula
- Apply the low-volume denominator floor
- Flag any device exceeding 150% threshold
- Handle special cases:
- Newly onboarded devices (no baseline = DriftScore 999) should be flagged as "New Device" rather than drift
- Data Lake ingestion boundaries may cause zero recent-window activity — verify before reporting contraction
- For devices with elevated Volume ratio (>200%) or near-threshold DriftScore (>130%): Run Queries 21+22 (DeviceInfo uptime + per-session volume) to determine if the volume spike is explained by intermittent power-on usage. If the device was only online for a small fraction of the baseline window, note as mitigating factor.
- Generate risk assessment with emoji-coded findings
- Render output in the user's selected mode
Sample KQL Queries
Query 14: Device Process Events — Daily Summary (Fleet-Wide)
// Daily summary of process events across all devices
// Configurable: adjust 'lookback' for total analysis window
let lookback = 7d;
DeviceProcessEvents
| where TimeGenerated > ago(lookback)
| summarize
TotalEvents = count(),
DistinctDevices = dcount(DeviceName),
DistinctProcesses = dcount(FileName),
DistinctAccounts = dcount(AccountName),
DistinctChains = dcount(strcat(InitiatingProcessFileName, "→", FileName)),
DistinctCompanies = dcount(ProcessVersionInfoCompanyName)
by Day = bin(TimeGenerated, 1d)
| order by Day asc
Purpose: Provides the fleet-wide daily trend to identify volume anomalies and determine optimal baseline/recent window split. Use this to verify data availability before running the per-device breakdown.
Query 15: Per-Device Daily Breakdown & Drift Score Computation
// Per-device per-day behavioral profile with drift score computation
// Configurable time windows:
// baselineDays = number of days in baseline period
// recentDays = number of days in recent period
// lookback = baselineDays + recentDays
let lookback = 7d;
let recentDays = 1; // Last N days as "recent" window
let baselineDays = 6; // Remaining days as "baseline"
let recentStart = ago(1d * recentDays);
DeviceProcessEvents
| where TimeGenerated > ago(lookback)
| extend IsRecent = TimeGenerated >= recentStart
| summarize
TotalEvents = count(),
DistinctProcesses = dcount(FileName),
DistinctAccounts = dcount(AccountName),
DistinctChains = dcount(strcat(InitiatingProcessFileName, "→", FileName)),
DistinctCompanies = dcount(ProcessVersionInfoCompanyName)
by DeviceName, IsRecent
| extend Period = iff(IsRecent, "Recent", "Baseline")
| order by DeviceName, Period asc
Post-Processing: After retrieving results, compute per-device drift scores:
- For each device, extract Baseline and Recent rows
- Compute daily averages: BL_DailyAvg = BL_TotalEvents / baselineDays, RC_DailyAvg = RC_TotalEvents / recentDays
- Compute dimension ratios: VolumeRatio = RC_DailyAvg / max(BL_DailyAvg, 10) * 100
- Apply the Device formula: DriftScore = 0.30×Volume + 0.25×Processes + 0.15×Accounts + 0.20×Chains + 0.10×Companies
- Handle edge cases:
  - Device in baseline only (no recent data): Check if data ingestion boundary or genuine silence
  - Device in recent only (no baseline): Set DriftScore = 999, flag as "New Device — no baseline"
  - Apply denominator floor (max(BL_value, 10)) for low-volume devices
Single-Device Mode: Add | where DeviceName =~ '<DEVICE_NAME>' as the second filter to scope to one device.
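The post-processing steps for Query 15 can be sketched in Python as below. The input is assumed to be a list of dicts with the column names from the query output. One labeled assumption: the floor of 10 is applied to the volume ratio only, and the distinct-count dimensions merely guard against a zero denominator, since the source does not spell out how the floor applies to them. `pivot_query15` is a hypothetical name.

```python
from collections import defaultdict

DIMENSIONS = ["DistinctProcesses", "DistinctAccounts", "DistinctChains", "DistinctCompanies"]


def pivot_query15(rows, baseline_days=6, recent_days=1):
    """Pair Baseline/Recent rows per device and compute dimension ratios in percent."""
    by_device = defaultdict(dict)
    for row in rows:
        by_device[row["DeviceName"]][row["Period"]] = row

    results = {}
    for device, periods in by_device.items():
        bl, rc = periods.get("Baseline"), periods.get("Recent")
        if bl is None:
            # Recent only: placeholder score, reported as a new device
            results[device] = {"status": "New Device (no baseline)", "drift_score": 999}
            continue
        if rc is None:
            # Baseline only: ingestion boundary or genuine silence, needs a manual check
            results[device] = {"status": "No recent data"}
            continue
        bl_avg = bl["TotalEvents"] / baseline_days
        rc_avg = rc["TotalEvents"] / recent_days
        ratios = {"Volume": rc_avg / max(bl_avg, 10) * 100}
        for dim in DIMENSIONS:
            # Assumption: zero guard only for distinct-count dimensions
            ratios[dim] = rc[dim] / max(bl[dim], 1) * 100
        results[device] = {"status": "ok", "ratios": ratios}
    return results
```

The resulting ratios feed directly into the weighted Device formula; devices without a usable pair of rows are reported by status instead of being scored.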
Query 16: First-Seen Processes (New in Recent Window)
// Processes appearing only in the recent window — not seen in baseline
// This is the strongest drift signal for devices
let lookback = 7d;
let recentDays = 1;
let recentStart = ago(1d * recentDays);
let baselineProcesses = DeviceProcessEvents
| where TimeGenerated between (ago(lookback) .. recentStart)
| distinct FileName;
DeviceProcessEvents
| where TimeGenerated >= recentStart
| distinct DeviceName, FileName, ProcessVersionInfoCompanyName
| join kind=leftanti baselineProcesses on FileName
| summarize
NewProcessCount = dcount(FileName),
NewProcesses = make_set(FileName, 50),
Companies = make_set(ProcessVersionInfoCompanyName, 50)
by DeviceName
| where NewProcessCount > 0
| order by NewProcessCount desc
Interpretation:
- New processes from recognized vendors (Microsoft, Google, etc.) → likely software updates or deployments
- Version-stamped update binaries (AM_Delta_Patch_*.exe, MicrosoftEdge_X64_*.exe, odt*.tmp.exe) → expected noise, always appear as "new" (see pitfall: Version-Stamped Process Name False Positives)
- New unsigned processes or processes from unknown companies → investigate immediately
- Large number of new processes on a single device → may indicate software deployment, but also possible malware dropper
Single-Device Mode: Add | where DeviceName =~ '<DEVICE_NAME>' to both the baseline and recent subqueries. Then expand to show full process details including ProcessCommandLine and FolderPath.
Fleet-Wide vs. Per-Device First-Seen Behavior: This query identifies processes that are globally novel — not seen on any device during the baseline. If a process ran on DeviceA during baseline but appears on DeviceB for the first time in the recent window, it will NOT be flagged because the baseline distinct FileName covers all devices. This design choice reduces noise (known-good processes aren't re-flagged per device) but may miss per-device novelty. For per-device first-seen analysis, scope the baseline distinct by DeviceName — note this is significantly more expensive on large fleets.
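The difference between global and per-device novelty can be made concrete with Python sets. This is an illustration of the two semantics only, not a translation of Query 16; the device and file names are made up.

```python
def first_seen_global(baseline, recent):
    """Flag (device, filename) pairs whose filename ran on no device during baseline."""
    known_names = {name for _, name in baseline}
    return {(dev, name) for dev, name in recent if name not in known_names}


def first_seen_per_device(baseline, recent):
    """Flag (device, filename) pairs not seen on that same device during baseline."""
    known_pairs = set(baseline)
    return {(dev, name) for dev, name in recent if (dev, name) not in known_pairs}


if __name__ == "__main__":
    baseline = [("DeviceA", "psexec.exe"), ("DeviceB", "notepad.exe")]
    recent = [("DeviceB", "psexec.exe"), ("DeviceB", "newtool.exe")]
    print(first_seen_global(baseline, recent))
    print(first_seen_per_device(baseline, recent))
```

Here psexec.exe on DeviceB is invisible to the global check because DeviceA ran it during the baseline, while the per-device check flags it. The per-device variant keeps one entry per device and filename, which is where the extra cost on large fleets comes from.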
Query 17: Rare Process Chains (Parent→Child Relationships)
// Process chains (parent→child) seen only in recent window
let lookback = 7d;
let recentDays = 1;
let recentStart = ago(1d * recentDays);
let baselineChains = DeviceProcessEvents
| where TimeGenerated between (ago(lookback) .. recentStart)
| extend Chain = strcat(InitiatingProcessFileName, "→", FileName)
| distinct Chain;
DeviceProcessEvents
| where TimeGenerated >= recentStart
| extend Chain = strcat(InitiatingProcessFileName, "→", FileName)
| join kind=leftanti baselineChains on Chain
| summarize
Occurrences = count(),
Devices = make_set(DeviceName, 20),
DeviceCount = dcount(DeviceName),
Accounts = make_set(AccountName, 10),
SampleCommandLine = take_any(ProcessCommandLine)
by Chain
| order by Occurrences desc
| take 30
Interpretation:
- Common chains like explorer.exe→notepad.exe appearing as "new" → baseline window too short or intermittent usage
- Update chains like wuauclt.exe→AM_Delta_Patch_*.exe or microsoftedgeupdate.exe→MicrosoftEdge_X64_*.exe → expected noise from automatic updates, always appear as "new" due to version-stamped child process names
- Suspicious chains like cmd.exe→powershell.exe→certutil.exe → investigate for LOLBin abuse
- Chains appearing on a single device vs. fleet-wide → single device may indicate targeted activity
Query 18: Device SecurityAlert + SecurityIncident Correlation
// Security alerts referencing analyzed devices, joined with SecurityIncident for real status
// IMPORTANT: SecurityAlert.Status is immutable (always "New") — MUST join SecurityIncident
// Substitute <DEVICE_NAMES> with comma-separated device names from Query 15
let lookback = 7d;
let relevantAlerts = SecurityAlert
| where TimeGenerated > ago(lookback)
| where Entities has_any (<DEVICE_NAMES>) or CompromisedEntity has_any (<DEVICE_NAMES>)
| summarize arg_max(TimeGenerated, *) by SystemAlertId
| project SystemAlertId, AlertName, AlertSeverity, ProductName, ProductComponentName,
Tactics, Techniques, CompromisedEntity, TimeGenerated;
SecurityIncident
| where CreatedTime > ago(lookback)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| join kind=inner relevantAlerts on $left.AlertId == $right.SystemAlertId
| project IncidentNumber, Title, Severity, Status, Classification,
AlertName, AlertSeverity, ProductName, Tactics, Techniques,
CompromisedEntity, AlertTime = TimeGenerated1
| order by AlertTime desc
Interpreting Incident Status in Drift Context:
| Incident Status | Classification | Impact on Drift Assessment |
|---|---|---|
| Closed | TruePositive | 🔴 Confirmed threat — significantly increases drift risk |
| Closed | FalsePositive | 🟢 False alarm — discount from drift risk, note as noise |
| Closed | BenignPositive | 🟡 Expected behavior — note but don't escalate |
| Active/New | Any | 🟠 Unresolved — flag for attention, may indicate ongoing threat |
Product Name Mapping (Legacy → Current Branding):
| SecurityAlert.ProductName (raw) | Report Display Name |
|---|---|
| Microsoft Defender Advanced Threat Protection | Microsoft Defender for Endpoint |
| Microsoft Cloud App Security | Microsoft Defender for Cloud Apps |
| Microsoft Data Loss Prevention | Microsoft Purview Data Loss Prevention |
| Azure Sentinel | Microsoft Sentinel |
| Microsoft 365 Defender | Microsoft Defender XDR |
| Office 365 Advanced Threat Protection | Microsoft Defender for Office 365 |
| Azure Advanced Threat Protection | Microsoft Defender for Identity |
Report Rendering: Group by incident, show severity/status/classification. Translate ProductName to current branding. Link back to device drift scores — a device with both high drift score AND correlated security alerts is highest priority for investigation.
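The branding translation can be sketched as a dictionary lookup over the mapping table. The helper name `display_product_name` is hypothetical, and passing unmapped names through unchanged is an assumption rather than a rule stated in the skill.

```python
# Legacy SecurityAlert.ProductName values mapped to current branding (from the table above).
LEGACY_TO_CURRENT = {
    "Microsoft Defender Advanced Threat Protection": "Microsoft Defender for Endpoint",
    "Microsoft Cloud App Security": "Microsoft Defender for Cloud Apps",
    "Microsoft Data Loss Prevention": "Microsoft Purview Data Loss Prevention",
    "Azure Sentinel": "Microsoft Sentinel",
    "Microsoft 365 Defender": "Microsoft Defender XDR",
    "Office 365 Advanced Threat Protection": "Microsoft Defender for Office 365",
    "Azure Advanced Threat Protection": "Microsoft Defender for Identity",
}


def display_product_name(raw: str) -> str:
    """Return the report display name; unmapped names pass through (assumption)."""
    cleaned = raw.strip()
    return LEGACY_TO_CURRENT.get(cleaned, cleaned)
```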
Query 19: Unsigned/Unusual Signing Companies in Recent Window
// Signing companies appearing only in the recent window
// Unsigned or unusually-signed binaries may indicate unauthorized software or malware
let lookback = 7d;
let recentDays = 1;
let recentStart = ago(1d * recentDays);
let baselineCompanies = DeviceProcessEvents
| where TimeGenerated between (ago(lookback) .. recentStart)
| where isnotempty(ProcessVersionInfoCompanyName)
| distinct ProcessVersionInfoCompanyName;
DeviceProcessEvents
| where TimeGenerated >= recentStart
| summarize
EventCount = count(),
Devices = make_set(DeviceName, 20),
Processes = make_set(FileName, 20)
by ProcessVersionInfoCompanyName
| join kind=leftanti baselineCompanies on ProcessVersionInfoCompanyName
| where isnotempty(ProcessVersionInfoCompanyName)
| order by EventCount desc
For unsigned processes (empty company field), referred to elsewhere in this skill as Query 19b:
// Find unsigned processes in the recent window
// NOTE: Linux devices will dominate results — Linux binaries lack ProcessVersionInfoCompanyName by design.
// Consider filtering to Windows devices: | where DeviceName !has "linux"
let lookback = 7d;
let recentDays = 1;
let recentStart = ago(1d * recentDays);
DeviceProcessEvents
| where TimeGenerated >= recentStart
| where isempty(ProcessVersionInfoCompanyName)
| summarize
EventCount = count(),
Devices = make_set(DeviceName, 20),
SampleCommandLine = take_any(ProcessCommandLine)
by FileName, FolderPath
| order by EventCount desc
| take 20
Query 20: Notable Command-Line Pattern Detection
// Search for reconnaissance, lateral movement, persistence, and exfiltration command patterns
// Run against the recent window to identify suspicious activity
let lookback = 7d;
let recentDays = 1;
let recentStart = ago(1d * recentDays);
DeviceProcessEvents
| where TimeGenerated >= recentStart
| where ProcessCommandLine has_any (
// Reconnaissance
"whoami", "net user", "net group", "net localgroup", "nltest", "systeminfo",
"ipconfig /all", "nslookup", "query user", "qwinsta",
// Lateral movement
"psexec", "wmic", "invoke-command", "enter-pssession", "new-pssession",
// Persistence
"schtasks /create", "reg add", "sc create", "New-Service",
// Credential access
"mimikatz", "sekurlsa", "lsass", "procdump", "comsvcs.dll",
// Exfiltration / download
"certutil -urlcache", "bitsadmin /transfer", "curl ", "wget ",
"Invoke-WebRequest", "downloadstring", "downloadfile"
)
| project TimeGenerated, DeviceName, AccountName, FileName,
InitiatingProcessFileName, ProcessCommandLine
| order by TimeGenerated desc
| take 50
Interpretation:
- Commands executed by expected service accounts (e.g., MDI sensor running `ipconfig /flushdns`) → benign
- Linux health checks (`curl` to MCR, `wget` for MOTD) executed by root → expected operational noise
- Reconnaissance commands from user accounts or unexpected contexts → investigate
- Multiple categories of suspicious commands on the same device → high confidence indicator of compromise
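The last point (several categories on one device) can be checked offline on exported Query 20 rows. This is a rough sketch, assuming rows are available as `(device, command_line)` pairs; the function names are hypothetical. It uses case-insensitive substring matching, which is looser than KQL's term-based `has_any` and may over-match.

```python
# Pattern categories copied from Query 20.
PATTERNS = {
    "reconnaissance": ["whoami", "net user", "net group", "net localgroup", "nltest",
                       "systeminfo", "ipconfig /all", "nslookup", "query user", "qwinsta"],
    "lateral_movement": ["psexec", "wmic", "invoke-command", "enter-pssession", "new-pssession"],
    "persistence": ["schtasks /create", "reg add", "sc create", "New-Service"],
    "credential_access": ["mimikatz", "sekurlsa", "lsass", "procdump", "comsvcs.dll"],
    "exfiltration": ["certutil -urlcache", "bitsadmin /transfer", "curl ", "wget ",
                     "Invoke-WebRequest", "downloadstring", "downloadfile"],
}


def matched_categories(command_line: str) -> set:
    """Return the pattern categories whose keywords occur in the command line."""
    lowered = command_line.lower()
    return {
        category
        for category, keywords in PATTERNS.items()
        if any(keyword.lower() in lowered for keyword in keywords)
    }


def devices_with_multiple_categories(events, minimum: int = 2) -> dict:
    """Group matches by device and keep devices hitting at least `minimum` categories."""
    per_device = {}
    for device, command_line in events:
        per_device.setdefault(device, set()).update(matched_categories(command_line))
    return {device: cats for device, cats in per_device.items() if len(cats) >= minimum}
```

The `minimum=2` default is an illustrative choice; the skill only says that multiple categories on the same device raise confidence.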
Query 21: DeviceInfo Uptime Pattern (Device Corroboration)
// Corroboration query: Determine actual device uptime days from DeviceInfo table (MDE sensor)
// DeviceInfo records entity snapshots ~hourly for MDE-onboarded devices
// Run for the full analysis window (baseline + recent) to see power-on cadence
// Substitute <DEVICE_NAME> with the target device hostname
let totalDays = 97; // Intentionally wider than the drift analysis window (default 7d) to capture the device's long-term power-on cadence across 90+ days
DeviceInfo
| where TimeGenerated > ago(1d * totalDays)
| where DeviceName has "<DEVICE_NAME>"
| summarize
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated),
RecordCount = count(),
SensorHealth = take_any(SensorHealthState),
OnboardingStatus = take_any(OnboardingStatus)
by Day = bin(TimeGenerated, 1d)
| order by Day asc
Heartbeat fallback (for non-MDE devices with Log Analytics agent only):
// Fallback: Use Heartbeat table when DeviceInfo returns 0 results (device not MDE-onboarded)
let totalDays = 97;
Heartbeat
| where TimeGenerated > ago(1d * totalDays)
| where Computer has "<DEVICE_NAME>"
| summarize
FirstHeartbeat = min(TimeGenerated),
LastHeartbeat = max(TimeGenerated),
HeartbeatCount = count()
by Day = bin(TimeGenerated, 1d)
| order by Day asc
Interpretation:
- Gaps between days = device was powered off (or MDE sensor was inactive). Count the rows to determine total days online vs. the full analysis window.
- SensorHealthState values: `Active` (sensor reporting normally), `Inactive` (sensor not communicating), `Misconfigured` (partial telemetry). Use to assess data quality.
- Intermittent devices (online <30% of baseline window) will produce artificially diluted baseline daily averages. A single power-on session will appear as a large volume spike. This is a mathematical artifact, not genuine drift.
- Consistent daily presence confirms the baseline daily average is representative — volume spikes are more meaningful.
- Use case: When a device shows elevated Volume ratio (>200%) but low Process/Account/Chain diversity ratios, check DeviceInfo first. If the device was only online 5 days out of 90, the 312% volume ratio is expected.
- Example: A device with 4,243 baseline events spread across only 4 power-on sessions (~40 hrs total) has a "true" daily average of ~1,060 events/session-day, not the diluted ~47 events/calendar-day. A recent session producing 1,031 events is exactly normal.
- Why DeviceInfo over Heartbeat: DeviceInfo is generated by the MDE sensor (~hourly entity snapshots) and covers all Defender-onboarded devices. Heartbeat requires a Log Analytics agent (AMA/MMA) which many MDE-only devices don't have. In testing, DeviceInfo showed 28 days of coverage where Heartbeat showed only 3 days for the same device.
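The dilution arithmetic in the example above can be reproduced in a few lines. This is a sketch only: the function name is hypothetical, and the 90-day window is inferred from the ~47 events/calendar-day figure rather than stated next to the example.

```python
def daily_averages(total_events: int, calendar_days: int, days_online: int) -> tuple:
    """Return (diluted calendar-day average, per-online-day average)."""
    diluted = total_events / calendar_days        # spreads events over every calendar day
    per_online_day = total_events / days_online   # spreads events over days with telemetry only
    return diluted, per_online_day


# Figures from the example: 4,243 baseline events across 4 power-on days.
diluted, per_online_day = daily_averages(4243, 90, 4)

# A recent session of 1,031 events compared against each average.
recent_events = 1031
ratio_vs_diluted = recent_events / diluted
ratio_vs_online = recent_events / per_online_day
```

Against the per-online-day average the recent session lands just under 1.0, while against the diluted average it looks like a many-fold spike, which is the artifact the interpretation notes describe.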
Query 22: Per-Session Process Volume (Device Corroboration)
// Corroboration query: Show event volume and diversity per power-on session
// Confirms events are concentrated in short bursts, not spread evenly
// Substitute <DEVICE_NAME> with the target device hostname
let totalDays = 97; // Intentionally wider than the drift analysis window (default 7d) to capture per-session behavior across the device's full power-on history
DeviceProcessEvents
| where TimeGenerated > ago(1d * totalDays)
| where DeviceName has "<DEVICE_NAME>"
| summarize
Events = count(),
UniqueProcesses = dcount(FileName),
UniqueAccounts = dcount(AccountName),
FirstEvent = min(TimeGenerated),
LastEvent = max(TimeGenerated)
by Day = bin(TimeGenerated, 1d)
| extend SessionDuration = LastEvent - FirstEvent
| order by Day asc
Interpretation:
- Per-session event volumes should be compared across sessions. If each power-on session produces roughly similar event counts (600–1,500), the behavior is consistent regardless of how infrequently the device is used.
- SessionDuration shows how long the device was active per day. Cross-reference with DeviceInfo FirstSeen/LastSeen for validation.
- Process diversity per session (UniqueProcesses) should be similar across sessions. If the most recent session shows 90+ unique processes and baseline sessions also show 70–90+, the diversity is normal — the same software runs each time the device boots.
- Use in report: Include a power-on session table in the Flagged Device Deep Dive to contextualize why the volume ratio is elevated. Note: "Volume-driven score inflation due to intermittent usage pattern — per-session behavior is consistent with baseline sessions."
Report Template
Inline Chat Report Structure (Fleet-Wide)
The inline report MUST include these sections in order:
1. Header — Workspace, analysis period (baseline/recent windows), drift threshold, device count, total events
2. Fleet Daily Trend Table — Day-by-day event counts, distinct processes, accounts, chains, companies
3. Per-Device Drift Score Ranking — All devices sorted by DriftScore descending, with per-dimension ratios and flag status
4. Flagged Device Deep Dive (for each Tier 1 device > 150% or DriftScore=999) — Baseline vs. recent comparison, dimension bar chart, new processes, process chains, account context. For new devices (999): identify as "newly onboarded" and list all processes observed. For devices with elevated volume ratio: include DeviceInfo uptime pattern (Query 21) and per-session volume table (Query 22) showing power-on cadence and per-session event consistency. Flag intermittent devices with: "⚠️ Intermittent device — online N of M baseline days. Volume ratio reflects power-on burst, not behavioral expansion."
5. Tier 2 Device Summaries (if fleet scaling applied) — One-line summary per Tier 2 device: drift score, top 3 first-seen processes, flag status. No full deep dive.
6. First-Seen Process Summary — Processes appearing only in recent window, grouped by device (Tier 1 + Tier 2 devices)
7. Correlated Security Alerts — SecurityAlert+SecurityIncident correlation for all analyzed devices
8. Uptime Context (if applicable) — For flagged or near-threshold devices, include DeviceInfo-derived power-on session table showing each session's duration, event count, and process diversity. This section contextualizes volume-driven drift scores.
9. Account Landscape — Summary of which accounts executed processes, flagging any unexpected contexts
10. Notable Command-Line Patterns — Reconnaissance/lateral movement/persistence command matches
11. Security Assessment — Emoji-coded findings table with evidence citations
12. Verdict Box — Overall fleet risk level, per-device verdicts, recommendations
Inline Chat Report Structure (Single-Device)
Same as fleet-wide sections 1, 3-11, but for one device only. Add:
- Full process inventory (baseline vs recent)
- Complete command-line analysis for suspicious processes
- Process chain tree visualization
Markdown File Report Structure
When outputting to markdown file, include everything from the inline format PLUS:
Filename patterns:
- Fleet-wide: `reports/scope-drift/device/Scope_Drift_Report_fleet_devices_YYYYMMDD_HHMMSS.md`
- Single-device: `reports/scope-drift/device/Scope_Drift_Report_<device_name>_YYYYMMDD_HHMMSS.md`
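As an illustration of the filename patterns, the sketch below builds both variants. The helper name `report_filename`, the use of UTC for the timestamp, and the sanitizing of unusual characters in the device name are assumptions, not rules stated by the skill.

```python
import re
from datetime import datetime, timezone


def report_filename(device_name=None, now=None) -> str:
    """Build the report path; device_name=None selects the fleet-wide pattern."""
    now = now or datetime.now(timezone.utc)  # UTC is an assumption
    stamp = now.strftime("%Y%m%d_%H%M%S")
    if device_name is None:
        target = "fleet_devices"
    else:
        # Replace characters that are awkward in file names (assumption).
        target = re.sub(r"[^A-Za-z0-9._-]", "_", device_name)
    return f"reports/scope-drift/device/Scope_Drift_Report_{target}_{stamp}.md"
```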
# Device Process Scope Drift Report
**Generated:** YYYY-MM-DD HH:MM UTC
**Workspace:** <workspace_name>
**Baseline Period:** <start> → <end> (<N> days)
**Recent Period:** <start> → <end> (<N> days)
**Drift Threshold:** 150%
**Data Sources:** DeviceProcessEvents, SecurityAlert, SecurityIncident, DeviceInfo
**Mode:** Fleet-Wide | Single-Device (<device_name>)
**Devices Analyzed:** <count>
**Total Events:** <count>
---
## Executive Summary
<1-3 sentence summary: how many devices analyzed, how many flagged, overall risk level>
---
## Fleet Daily Trend
<ASCII table: Day | Events | Devices | Processes | Accounts | Chains | Companies>
<!-- Wrap in code fence for consistent rendering -->
---
## Per-Device Drift Score Ranking
<Table with all devices, per-dimension ratios, DriftScore, flag status>
<Devices with DriftScore=999 flagged as "New Device">
---
## Flagged Device Deep Dive
### <Device Name> — Drift Score <score>
**ASCII Drift Dimension Chart (REQUIRED):**
Render a box-drawn chart inside a code fence. **Inner width: 58 chars** (every line between `│` markers = exactly 58 visual characters). No emoji inside boxes — use text labels.
**Alignment:** Name (9 chars padded) + weight (5) + gap (2) + bars (20 `█─`) + gap (2) + pct (6, right-aligned: `XXX.X%` or ` XX.X%`) + gap (2) + direction (10 total: `^`/`v`/`=` + 9 trailing spaces). Status labels (centered): `STABLE`, `STABLE (Low-Volume)`, `NEAR THRESHOLD`, `ABOVE THRESHOLD`, `CRITICAL`. Direction: `^` (up), `v` (down), `=` (stable).
**Bar characters:** Use `█` (U+2588 full block) for filled portions and `─` (U+2500 box-drawing horizontal) for the unfilled track.
**Uptime-adjusted Volume:** When the Volume dimension has been adjusted for intermittent uptime (see Pitfalls → Intermittent-Use Device Volume Inflation), display the **effective (adjusted) percentage** in the chart and move the raw value into the description column. This keeps the percentage column fixed-width and avoids breaking bar alignment. Example: `XXX.X% ^ (raw: YYY.Y%)`.
┌──────────────────────────────────────────────────────────┐
│                 DEVICE DRIFT SCORE: XX.X                 │
│                          STABLE                          │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  Volume   (30%)  ██████──────────────  XXX.X%  ^         │
│  Processes(25%)  ███─────────────────   XX.X%  v         │
│  Accounts (15%)  ██████──────────────  XXX.X%  =         │
│  Chains   (20%)  ██──────────────────   XX.X%  v         │
│  Companies(10%)  ██████──────────────  XXX.X%  =         │
│                                                          │
│  ────────────────────────── 100% baseline ──┤            │
│                        150% drift threshold ▲            │
└──────────────────────────────────────────────────────────┘
**Bar fill:** 20 chars wide. Filled = round(ratio/100 × 20), capped at 20. Title and status: center within 58 chars. Use `█` for filled, `─` for unfilled.
**Then** render the standard markdown dimension table:
| Dimension | Weight | Baseline | Recent | Ratio | Weighted | Status |
|-----------|--------|----------|--------|-------|----------|--------|
<Baseline vs recent comparison table>
<New processes list with signing companies>
<New process chains>
<Account context>
#### Uptime Context (if intermittent device)
<If Volume ratio >200% or device known to be intermittent, include DeviceInfo-derived power-on session table>
| Session | Power On | Power Off | Duration | Events | Processes |
|---------|----------|-----------|----------|--------|-----------|
| 1 | <date/time> | <date/time> | ~N hrs | <count> | <count> |
| ... | ... | ... | ... | ... | ... |
⚠️ Intermittent device — online N of M baseline days. Volume ratio reflects power-on burst, not behavioral expansion. Per-session behavior is consistent with baseline sessions.
---
## First-Seen Processes
<Processes appearing only in recent window, by device>
---
## Correlated Security Alerts
<SecurityAlert + SecurityIncident correlation>
<Group by incident, show severity/status/classification>
---
## Notable Command-Line Patterns
<Reconnaissance/lateral movement/persistence/exfiltration matches>
<Context: which account, which device, benign vs suspicious>
---
## Security Assessment
| Factor | Finding |
|--------|---------|
| 🔴/🟢/🟡 **Factor** | Evidence-based finding |
---
## Verdict
**ASCII Verdict Box (REQUIRED):**
Render a box-drawn verdict summary inside a code fence. **Inner width: 66 chars.** No emoji inside boxes. Pad every line to exactly 66 chars between `│` markers.
For fleet-wide reports:
┌──────────────────────────────────────────────────────────────────┐
│  OVERALL FLEET RISK: <LEVEL> -- <One-line summary>               │
│  Flagged Devices: X of Y (Threshold: 150%)                       │
│  Root Cause: <Brief root cause explanation>                      │
└──────────────────────────────────────────────────────────────────┘
For single-device reports:
┌──────────────────────────────────────────────────────────────────┐
│  OVERALL RISK: <LEVEL> -- <One-line summary>                     │
│  Drift Score: XX.X (Interpretation)                              │
│  Root Cause: <Brief root cause explanation>                      │
└──────────────────────────────────────────────────────────────────┘
**Then** render the full verdict with:
- Per-device verdicts (for fleet-wide)
- Root Cause Analysis paragraph
- Key Findings (numbered list)
- Recommendations (emoji-prefixed list)
---
## Appendix: Query Details
Render a single markdown table summarizing all queries executed. **Do NOT include full KQL text** — the canonical queries are already documented in this SKILL.md file. The appendix serves as an audit trail only.
| Query | Table(s) | Records Scanned | Results | Execution |
|-------|----------|----------------:|--------:|----------:|
| Q15 — Device Process Baseline vs. Recent | DeviceProcessEvents | X,XXX | N rows | X.XXs |
| ... | ... | ... | ... | ... |
*Query definitions: see the Sample KQL Queries section in this SKILL.md file.*
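The fixed-width rules in the template (58-char drift chart, 66-char verdict box, 20-char bars) are easy to get wrong by hand. The sketch below renders single lines to those widths. All function names are hypothetical. Because Python's built-in `round()` rounds halves to even, the sketch uses explicit half-up rounding, which is an assumption about what the spec intends.

```python
import math

BAR_WIDTH = 20      # bar track width from the template
CHART_INNER = 58    # inner width of the drift chart box
VERDICT_INNER = 66  # inner width of the verdict box


def bar(ratio_pct: float) -> str:
    """Filled cells = round(ratio/100 x 20), capped at 20 (half-up rounding assumed)."""
    filled = math.floor(ratio_pct / 100 * BAR_WIDTH + 0.5)
    filled = max(0, min(BAR_WIDTH, filled))
    return "█" * filled + "─" * (BAR_WIDTH - filled)


def chart_row(name: str, weight_pct: int, ratio_pct: float, direction: str) -> str:
    """One dimension row: name(9) + weight(5) + bar(20) + pct(6) + direction(10), gaps of 2."""
    weight = f"({weight_pct}%)"
    pct = f"{ratio_pct:.1f}%"
    inner = f"  {name:<9}{weight:<5}  {bar(ratio_pct)}  {pct:>6}  {direction:<10}"
    return f"│{inner}│"


def centered_line(text: str, inner_width: int = CHART_INNER) -> str:
    """Center a title or status label between the box borders."""
    return f"│{text:^{inner_width}}│"


def padded_line(text: str, inner_width: int = VERDICT_INNER) -> str:
    """Left-align text and pad it to the inner width of the box."""
    return f"│{text:<{inner_width}}│"
```

The row layout only holds while the percentage fits in six characters, so a ratio of 1000% or more would widen the line and needs separate handling.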
Known Pitfalls
SecurityAlert.Status Is Immutable — Always Join SecurityIncident
Problem: The Status field on SecurityAlert is set to "New" at creation time and never changes. It does NOT reflect whether the alert has been investigated, closed, or classified.
Solution: MUST join with SecurityIncident to get real Status (New/Active/Closed) and Classification (TruePositive/FalsePositive/BenignPositive). See Query 18 which implements this join.
Low-Volume Statistical Inflation
Problem: Entities with very low baseline activity will show extreme volume ratios even with minor changes.
Solution: Apply the denominator floor (minimum 10 events/day for volume ratio calculation). Always flag low-volume baselines in the report.
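The denominator floor can be sketched as below. The floor of 10 events/day comes from the text; the function names and the percentage convention are illustrative assumptions.

```python
VOLUME_FLOOR = 10  # minimum events/day used as the denominator (from the pitfall text)


def volume_ratio_pct(baseline_daily_avg: float, recent_daily_avg: float,
                     floor: float = VOLUME_FLOOR) -> float:
    """Recent vs. baseline daily volume as a percentage, with a floored denominator."""
    denominator = max(baseline_daily_avg, floor)
    return recent_daily_avg / denominator * 100


def is_low_volume_baseline(baseline_daily_avg: float, floor: float = VOLUME_FLOOR) -> bool:
    """True when the baseline sits under the floor and should be flagged in the report."""
    return baseline_daily_avg < floor
```

For a device averaging 2 events/day in the baseline and 8 in the recent window, the unfloored ratio would be 400%, while the floored ratio is 80%.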
Seasonal/Cyclical Baselines
Problem: Some devices have weekly patterns (lower on weekends) or monthly cycles (patch Tuesday).
Solution: Note if the recent window falls on an atypical portion of the cycle. The baseline smooths most cyclical patterns, but edge cases exist.
Newly Onboarded Devices (DriftScore = 999)
Problem: Devices that appear only in the recent window (no baseline data) will have all dimension ratios default to 999, producing an extreme drift score. This does NOT indicate malicious drift — it indicates a newly discovered or recently onboarded device.
Solution: Flag these devices as "🔵 New Device — No Baseline" rather than "🔴 Critical Drift". Review the process inventory to confirm the device is running expected management software (MDM agents, AV, etc.). Recommend monitoring for an additional baseline period before assessing drift.
Data Lake Ingestion Boundary
Problem: DeviceProcessEvents in Sentinel Data Lake may have an ingestion lag or retention boundary that causes the most recent hours of data to be absent. This can make devices appear to have zero recent-window activity when data simply hasn't been ingested yet.
Solution: In the fleet daily trend (Query 14), verify that the most recent day has comparable event counts to previous days. If the last day shows significantly fewer events across ALL devices, note: "⚠️ Data Lake ingestion boundary detected — recent window may be incomplete." Adjust the recent window start time if needed.
Advanced Hunting Fallback
Problem: DeviceProcessEvents may fail in one of the two execution tools due to query complexity, timeout, or API limitations. This table is available in both Advanced Hunting and Sentinel Data Lake.
Solution: Follow the global Tool Selection Rule in .github/copilot-instructions.md: use Advanced Hunting (RunAdvancedHuntingQuery with Timestamp) for lookbacks ≤ 30 days, and Sentinel Data Lake (query_lake with TimeGenerated) for lookbacks > 30 days (e.g., 90-day baselines). If the preferred tool fails, try the other — same table, same data. If both fail, check that the Defender XDR connector is connected to the workspace.
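The selection rule and its fallback can be expressed as an ordered list of attempts. The tool names and time columns come from the text above; the function name `tool_order` and the dictionary shape are hypothetical.

```python
def tool_order(lookback_days: int) -> list:
    """Return the query tools to try, preferred tool first, with their time columns."""
    advanced_hunting = {"tool": "RunAdvancedHuntingQuery", "time_column": "Timestamp"}
    data_lake = {"tool": "query_lake", "time_column": "TimeGenerated"}

    # Lookbacks of 30 days or less prefer Advanced Hunting; longer ones prefer the Data Lake.
    if lookback_days <= 30:
        return [advanced_hunting, data_lake]
    return [data_lake, advanced_hunting]
```

A caller would try each entry in order and, if both fail, check the Defender XDR connector as the pitfall describes.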
System/Service Accounts Dominating Volume
Problem: The majority of process events on servers come from system accounts (SYSTEM, LOCAL SERVICE, NETWORK SERVICE, root). These accounts are expected and will dominate volume, process, and chain dimensions.
Solution: When analyzing drift, distinguish between system-level processes (expected) and user-driven processes (more significant for drift). In the account landscape, flag any human user accounts (non-system) executing unusual processes. System accounts executing new processes are still worth noting but at lower priority.
Short Baseline Windows and False Positives
Problem: Unlike SPN/user drift which uses a 90-day baseline, device process drift often uses shorter windows (e.g., 6 days baseline, 1 day recent). Short baselines miss infrequent but legitimate processes (weekly maintenance scripts, monthly update cycles, etc.).
Solution: Note the baseline length in the report. If many "first-seen" processes are common system utilities (Task Scheduler, Windows Update, antivirus scans), acknowledge that a longer baseline would likely include them. Recommend extending to 14-30 days for production use.
DeviceProcessEvents Volume Limits
Problem: DeviceProcessEvents can generate massive volumes — tens of thousands of events per device per day on busy servers. KQL queries with dcount() and make_set() can be expensive.
Solution: Always apply TimeGenerated filter as the FIRST filter. Use take or summarize to limit intermediate results. For fleet-wide analysis across many devices, consider processing in batches if total events exceed 500K.
Intermittent-Use Device Volume Inflation
Problem: Devices that are only powered on occasionally (e.g., once per month for maintenance, lab servers, training VMs) will have their baseline daily average diluted across the full analysis window — even though telemetry only exists for a handful of days. When one of these devices powers on during the recent window, the volume ratio can spike to 300%+ even though per-session behavior is identical to baseline sessions. This creates near-threshold or above-threshold DriftScores driven entirely by the volume dimension, with no meaningful behavioral change.
Solution: For any device with Volume ratio >200% but Process/Account/Chain/Company ratios below 100%, run Query 21 (DeviceInfo uptime) to determine actual days online. If the device was online for <30% of the baseline window (i.e., fewer than ~27 out of 90 days), flag as "⚠️ Intermittent device — volume-driven score inflation" and include a per-session comparison (Query 22). Consider reporting both the raw DriftScore and an "adjusted" assessment that contextualizes the volume dimension against actual uptime days rather than calendar days. The diversity dimensions (Processes, Accounts, Chains, Companies) are not affected by intermittent usage and remain reliable drift indicators.
Chart formatting for adjusted Volume: In the ASCII drift chart, display only the effective (adjusted) percentage in the percentage column, and append the raw value in the description text after the bar. This avoids variable-width bracket content that breaks bar alignment. Example:
Volume [ 85.1%] ████████────── ↓ Adjusted from 288.3% raw (intermittent uptime)
Process [ 79.5%] ████████────── ↓ Contracting (97/122 unique)
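The two triggers in this pitfall (when to run Query 21, and when to flag a device as intermittent) can be sketched as simple predicates. The thresholds come from the text; the function names are hypothetical.

```python
def needs_uptime_check(volume_pct: float, diversity_ratios_pct: list) -> bool:
    """True when Volume exceeds 200% while every diversity ratio stays below 100%."""
    return volume_pct > 200 and all(ratio < 100 for ratio in diversity_ratios_pct)


def is_intermittent(days_online: int, baseline_days: int, threshold: float = 0.30) -> bool:
    """True when the device was online for less than 30% of the baseline window."""
    return days_online / baseline_days < threshold
```

With a 90-day baseline, a device seen on 5 days is flagged, while one seen on 27 days sits exactly at the 30% boundary and is not.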
Version-Stamped Process Name False Positives
Problem: Automatic software updates produce binaries with version numbers embedded in the filename (e.g., AM_Delta_Patch_1.443.XXX.0.exe, MicrosoftEdge_X64_134.0.XXXX.XX_*.exe, odt*.tmp.exe). These appear as "first-seen" in Query 16 and "new chains" in Query 17 regardless of baseline length, because each update generates a unique filename.
Solution: When interpreting first-seen processes, check ProcessVersionInfoCompanyName — if the signing company is well-known (Microsoft Corporation, Google LLC, etc.), these are expected update artifacts. In the report, group these under "📦 Expected Update Artifacts" rather than flagging as suspicious drift. For automated scoring, consider excluding filenames matching patterns like AM_Delta_Patch_*, MicrosoftEdge_X64_*, and *.tmp.exe from the drift score calculation, or weighting them lower.
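A possible pre-filter for the "Expected Update Artifacts" group is sketched below. The filename patterns and the two example vendors come from the text; requiring both a pattern match and a known vendor, and the helper name, are assumptions. The vendor set would need extending for a real fleet.

```python
from fnmatch import fnmatchcase

# Filename patterns named in the pitfall, lowercased for case-insensitive matching.
UPDATE_ARTIFACT_PATTERNS = ("am_delta_patch_*", "microsoftedge_x64_*", "*.tmp.exe")

# Example well-known signing companies from the text (not an exhaustive list).
KNOWN_VENDORS = {"microsoft corporation", "google llc"}


def is_expected_update_artifact(file_name: str, company_name: str) -> bool:
    """True when a first-seen binary looks like a version-stamped update from a known vendor."""
    name = file_name.lower()
    vendor_ok = company_name.strip().lower() in KNOWN_VENDORS
    pattern_ok = any(fnmatchcase(name, pattern) for pattern in UPDATE_ARTIFACT_PATTERNS)
    return vendor_ok and pattern_ok
```

Requiring the vendor check alongside the pattern keeps an unsigned binary that merely imitates an update filename out of the expected group.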
Linux Processes Dominate Unsigned Query
Problem: Linux binaries do not populate ProcessVersionInfoCompanyName (a Windows PE metadata field). Query 19b (unsigned processes) will be flooded with legitimate Linux utilities (gawk, bash, grep, sed, curl, apt-get, etc.) on any fleet containing Linux devices.
Solution: When running Query 19b on a mixed fleet, filter to Windows devices only (| where DeviceName !has "linux") or annotate Linux results separately. For Linux devices, focus on unusual binary paths (e.g., processes running from /tmp/, /dev/shm/, or user home directories) rather than signing status.
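For Linux results, the path-based focus can be sketched as a prefix check on the process path. `/tmp/` and `/dev/shm/` come from the text; treating `/home/` and `/root/` as the "user home directories" is an assumption, as is the helper name.

```python
# Locations the pitfall calls out as unusual places to run binaries from.
UNUSUAL_PREFIXES = ("/tmp/", "/dev/shm/", "/home/", "/root/")


def is_unusual_linux_path(folder_path: str) -> bool:
    """True when the path is one of, or sits under, the unusual locations."""
    # Add a trailing slash so "/dev/shm" matches but "/tmpfoo" does not.
    path = folder_path if folder_path.endswith("/") else folder_path + "/"
    return path.startswith(UNUSUAL_PREFIXES)
```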
Error Handling
Common Issues
| Issue | Solution |
|---|---|
| `DeviceProcessEvents` table not found | Table may not be connected via Defender XDR connector. Check with `search_tables`. Verify Defender for Endpoint is onboarded. |
| `DeviceProcessEvents` query timeout | Reduce lookback window or add intermediate `summarize`. Split fleet-wide into batches by device if >20 devices. |
| Advanced Hunting fails for DeviceProcessEvents | Default to Sentinel Data Lake (query_lake). Adapt Timestamp → TimeGenerated. See Advanced Hunting Fallback pitfall. |
| Device appears only in recent window | New device onboarding — set DriftScore=999, flag as "New Device", not malicious drift. |
| All devices show zero recent events | Data Lake ingestion boundary — verify with fleet daily trend (Query 14). Adjust recent window if needed. |
| Query timeout | Reduce the lookback window, or add `\| take 100` to intermediate results. |
Validation Checklist
Before presenting results, verify:
- All applicable data sources were queried (even if some returned 0 results)
- Low-volume denominator floor was applied to any device with BL_DailyAvg < 10
- Corroborating evidence was checked for every flagged device
- Empty results are explicitly reported with ✅ (not silently omitted)
- The report includes the drift score formula and threshold for transparency
- SecurityAlert was joined with SecurityIncident for real Status/Classification (never read SecurityAlert.Status directly)
- Incident classifications (TP/FP/BP) were factored into risk assessment — FalsePositive alerts discounted, TruePositive alerts escalated
- Fleet daily trend was verified for data completeness (no ingestion boundary issues)
- Devices present in only one window were correctly identified (recent-only = no baseline, i.e., newly onboarded; baseline-only = no recent activity)
- DriftScore=999 entities were flagged as "New Device" not "Critical Drift"
- System/service account processes were distinguished from user-driven processes
- First-seen processes were checked for legitimate software deployment vs suspicious binaries
- Version-stamped update binaries (`AM_Delta_Patch_*`, `MicrosoftEdge_X64_*`, `odt*.tmp.exe`) were classified as expected noise
- Unsigned/unusually-signed binaries were identified (Linux devices flagged separately from Windows)
- Notable command-line patterns were searched (reconnaissance, lateral movement, persistence, exfiltration)
- SecurityAlert correlation was performed for all analyzed devices
- Baseline window length was noted and its limitations acknowledged
- For devices with Volume ratio >200% or DriftScore >130%: DeviceInfo uptime (Query 21) was checked to identify intermittent-use devices
- Intermittent-use devices were annotated with uptime context and per-session comparison (Query 22)
- Volume-driven drift scores on intermittent devices were contextualized as mathematical artifacts (not behavioral expansion)
SVG Dashboard Generation
📊 Optional post-report step. After a Device scope drift report is generated, the user can request a visual SVG dashboard.
Trigger phrases: "generate SVG dashboard", "create a visual dashboard", "visualize this report", "SVG from the report"
How to Request a Dashboard
- Same chat: "Generate an SVG dashboard from the report" — data is already in context.
- New chat: Attach or reference the report file, e.g. `#file:reports/scope-drift/device/Scope_Drift_Report_<entity>_<date>.md`
- Customization: Edit svg-widgets.yaml before requesting; the renderer reads it at generation time.
Execution
Step 1: Read svg-widgets.yaml (this skill's widget manifest)
Step 2: Read .github/skills/svg-dashboard/SKILL.md (rendering rules — Manifest Mode)
Step 3: Read the completed report file (data source)
Step 4: Render SVG → save to reports/scope-drift/device/{report_name}_dashboard.svg
The YAML manifest is the single source of truth for layout, widgets, field mappings, colors, and data source documentation. All customization happens there.
.github/skills/scope-drift-detection/spn/SKILL.md
npx skills add SCStelz/security-investigator --skill scope-drift-detection-spn -g -y
SKILL.md
Frontmatter
{
"name": "scope-drift-detection-spn",
"description": "Use this skill when asked to detect scope drift, behavioral expansion, or gradual privilege\/access creep in service principals or automation accounts. Triggers on keywords like \"scope drift\", \"service principal drift\", \"SPN behavioral change\", \"automation account drift\", \"baseline deviation\", \"access expansion\", or when investigating whether a service principal has gradually expanded beyond its intended purpose. This skill builds a 90-day behavioral baseline per SPN, compares it with 7-day recent activity, computes a weighted Drift Score across 5 dimensions, and correlates with SecurityAlert and AuditLogs for corroborating evidence.",
"drill_down_prompt": "Analyze service principal drift for {entity} — resource\/IP\/location expansion",
"threat_pulse_domains": [
"spn"
]
}
Service Principal Scope Drift Detection — Instructions
Purpose
Credit: The scope drift detection concept for service principals was inspired by Iftekhar Hussain's article The Agentic SOC Era: How Sentinel MCP Enables Autonomous Security Reasoning (Feb 2026), which demonstrated multi-source correlation across AADServicePrincipalSignInLogs, AuditLogs, and SecurityAlert to build 90-day behavioral baselines and surface drift via weighted scoring.
This skill detects scope drift — the gradual, often imperceptible expansion of access or behavior beyond an established baseline — in Entra ID service principals. Unlike sudden compromise (which triggers alerts), scope drift is a slow-burn pattern that evades threshold-based detections.
Entity Type: Service Principal
| Identifier | Primary Table(s) | Use Case |
|---|---|---|
| ServicePrincipalName / ServicePrincipalId | `AADServicePrincipalSignInLogs` | App registrations, automation accounts, managed identities |
What this skill detects:
- Volume spikes in sign-in activity relative to historical baseline
- New target resources (APIs, services) not previously accessed
- New source IP addresses or geographic locations
- Increased failure rates indicating probing or misconfiguration
- Credential/permission changes correlated with behavioral shifts
- Security alerts involving the drifting entities
Related skills:
- User Scope Drift — for user accounts (UPNs)
- Device Scope Drift — for endpoints/devices
📑 TABLE OF CONTENTS
- Critical Workflow Rules - Start here!
- Output Modes - Inline chat vs. Markdown file
- Quick Start - 7-step investigation pattern
- Drift Score Formula - Weighted composite scoring (5 dimensions)
- Execution Workflow - Complete 4-phase process
- Sample KQL Queries - Validated query patterns (Queries 1-4)
- Report Template - Output format specification
- Known Pitfalls - Edge cases and false positives
- Error Handling - Troubleshooting guide
- SVG Dashboard Generation - Visual dashboard from report
Investigation shortcuts:
- SPN drift triage (TP Q5): Q1 (baseline vs recent — drift scores + dimension ratios) → Q4 (alert/incident correlation) → Tier 1 deep dives for flagged SPNs
- Compromised SPN forensics (TP Q5 + incident context): Q1 (behavioral profile) → Q3 (detailed AuditLog changes — credential adds, consent grants, timestamps, actors) → Q4 (incident status/classification check)
- Permission escalation investigation (TP Q10, standalone): Q2 (AuditLog summary — operation counts baseline vs recent) → Q3 (detailed per-operation rows with initiator/target/modified properties) → Graph API: app permission audit
- IP infrastructure expansion (TP Q5, high IPDrift): Q1 (new IPs list from `NewIPs` array) → anti-join baseline IPs to identify novel sources → IP enrichment (`enrich_ips.py` or `ioc-investigation`) for non-Azure IPs
⛔ Shortcut Default Rule: When a matching shortcut exists for the investigation context, use it — don't run the full workflow. Only run the full query set when the user explicitly requests "full investigation", "comprehensive", or "deep dive". Shortcuts render only the report sections relevant to their query chain (plus Executive Summary and Recommendations, always).
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
Before starting ANY SPN scope drift analysis:
- ALWAYS enforce Sentinel workspace selection (see Workspace Selection section below)
- ALWAYS ask the user for output mode if not specified: inline chat summary or markdown file report (or both)
- ALWAYS build baseline FIRST before comparing recent activity
- ALWAYS apply the low-volume denominator floor to prevent false-positive drift scores on sparse baselines
- ALWAYS correlate across all required data sources (AADServicePrincipalSignInLogs, AuditLogs, SecurityAlert)
- ALWAYS run independent queries in parallel for performance
- NEVER report a drift flag without corroborating evidence from at least one secondary data source
Data Sources
| Data Source | Role | Purpose |
|---|---|---|
| AADServicePrincipalSignInLogs | ✅ Primary | SPN sign-in behavioral baseline |
| AuditLogs | ✅ Corroboration | Permission/credential/role changes |
| SecurityAlert | ✅ Corroboration | Corroborating alert evidence |
| SecurityIncident | ✅ Corroboration | Real alert status/classification |
⛔ MANDATORY: Sentinel Workspace Selection
This skill requires a Sentinel workspace to execute queries. Follow these rules STRICTLY:
When invoked from incident-investigation skill:
- Inherit the workspace selection from the parent investigation context
- If no workspace was selected in parent context: STOP and ask user to select
When invoked standalone (direct user request):
- ALWAYS call the list_sentinel_workspaces MCP tool FIRST
- If 1 workspace exists: Auto-select, display to user, proceed
- If multiple workspaces exist:
- Display all workspaces with Name and ID
- ASK: "Which Sentinel workspace should I use for this investigation?"
- ⛔ STOP AND WAIT for user response
- ⛔ DO NOT proceed until user explicitly selects
- If a query fails on the selected workspace:
- ⛔ DO NOT automatically try another workspace
- STOP and report the error, display available workspaces, ASK user to select
🔴 PROHIBITED ACTIONS:
- ❌ Selecting a workspace without user consent when multiple exist
- ❌ Switching to another workspace after a failure without asking
- ❌ Proceeding with investigation if workspace selection is ambiguous
Output Modes
This skill supports two output modes. ASK the user which they prefer if not explicitly specified. Both may be selected.
Mode 1: Inline Chat Summary (Default)
- Render the full drift analysis directly in the chat response
- Includes ASCII tables, Pareto chart, drift dimension bars, and security assessment
- Best for quick review and interactive follow-up questions
Mode 2: Markdown File Report
- Save a comprehensive report to reports/scope-drift/spn/Scope_Drift_Report_<entity>_<timestamp>.md
- All ASCII visualizations render correctly inside markdown code fences (triple backticks)
- Includes all data from inline mode plus additional detail sections
- Use the create_file tool. NEVER use terminal commands for file output
- Filename patterns:
  - Single SPN: Scope_Drift_Report_<spn_short_name>_YYYYMMDD_HHMMSS.md (use display name, sanitized: lowercase, spaces/special chars replaced with hyphens)
  - All SPNs: Scope_Drift_Report_all_spns_YYYYMMDD_HHMMSS.md (tenant-wide scan of all service principals)
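The single-SPN filename rule can be sketched in Python as follows. This is only an illustration: `report_filename` is a hypothetical helper, and collapsing runs of special characters into a single hyphen and trimming leading or trailing hyphens are assumptions that the rule itself does not spell out.

```python
import re
from datetime import datetime


def report_filename(display_name: str, now: datetime) -> str:
    """Build a single-SPN report filename from an SPN display name."""
    # Lowercase, then replace every run of non-alphanumerics with one hyphen
    # (assumption: runs are collapsed and edge hyphens are trimmed).
    slug = re.sub(r"[^a-z0-9]+", "-", display_name.lower()).strip("-")
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return f"Scope_Drift_Report_{slug}_{stamp}.md"


if __name__ == "__main__":
    print(report_filename("Contoso Backup (Prod)", datetime(2025, 1, 2, 3, 4, 5)))
```

The timestamp is passed in rather than read from the clock so the same name can be reused for the report and its dashboard.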
Markdown Rendering Notes
- ✅ ASCII tables, box-drawing characters, and bar charts render perfectly in markdown code blocks
- ✅ Unicode block characters (█ full block, ─ box-drawing horizontal) display correctly in monospaced fonts
- ✅ Emoji indicators (🔴🟢🟡⚠️✅) render natively in GitHub-flavored markdown
- ✅ Standard markdown tables (| col |) render as formatted tables
- Tip: Wrap all ASCII art in triple-backtick code fences for consistent rendering
Quick Start (TL;DR)
When a user requests SPN scope drift detection:
- Select Workspace → list_sentinel_workspaces, auto-select or ask
- Determine Output Mode → Ask if not specified: inline, markdown file, or both
- Run Phase 1 → Query 1 (AADServicePrincipalSignInLogs baseline vs recent)
- Apply Entity Scaling → Compute drift scores, rank SPNs, apply tiered depth limits (see Entity Scaling)
- Run Phases 2-3 → Queries 2-4 (AuditLogs + SecurityAlert) — scoped per tier
- Compute Final Assessment → Combine drift scores with corroborating evidence
- Output Results → Render in selected mode(s) with tiered depth
Entity Scaling (Large Environments)
Problem: In small tenants, running Queries 2–4 for every SPN is fine. In enterprise environments with hundreds or thousands of service principals, running deep-dive queries for every flagged entity is prohibitively expensive and produces unreadable reports.
Solution: After Phase 1 computes drift scores for all SPNs, apply tiered depth based on entity count and drift severity.
Entity Count Detection
After Query 1, count distinct SPNs in the result set:
| Entity Count | Tier | Deep Dive Limit | Behavior |
|---|---|---|---|
| ≤ 30 SPNs | Small | All flagged | Full deep dive for every SPN > 150%. No limiting needed. |
| 31–100 SPNs | Medium | Top 10 | Full deep dive for top 10 by DriftScore. Summary row for remaining flagged SPNs. |
| 101–500 SPNs | Large | Top 10 | Full deep dive for top 10. Tier 2 summary (next 15) with new resources/IPs only. Remaining flagged SPNs listed in ranking table with scores but no deep dive. |
| > 500 SPNs | Very Large | Top 10 | Same as Large, plus: filter Phase 1 results to BL_TotalSignIns > 10 to exclude near-silent SPNs from scoring. |
Tiered Depth Model
After computing drift scores and ranking all SPNs, assign tiers:
| Tier | Entities | Queries Run | Report Depth |
|---|---|---|---|
| Tier 1 (Full) | Top N by DriftScore | All: Q2, Q3, Q4 | Full deep dive: ASCII chart, dimension table, new resources/IPs/locations, AuditLog changes, alerts |
| Tier 2 (Summary) | Next 15 flagged SPNs (or remaining if < 15) | Q4 only (SecurityAlert correlation) | One-line summary per SPN: score, top 3 new resources, new IPs, flag status |
| Tier 3 (Score only) | All remaining flagged SPNs | None beyond Phase 1 | Row in ranking table: SPN name, drift score, dimension ratios, flag emoji |
| Stable | SPNs ≤ 150% | None beyond Phase 1 | Omitted from deep dives. Included in summary statistics only. |
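The tier assignment can be sketched in Python as below. It is a simplified reading of the two tables above, not the skill's implementation: `assign_tiers` is a hypothetical name, the Tier 1 limit is taken as "all flagged" for 30 or fewer SPNs and 10 otherwise, and the extra BL_TotalSignIns > 10 pre-filter for very large tenants is left out.

```python
def assign_tiers(drift_scores, threshold=150.0, tier2_size=15):
    """Map each SPN name to "Tier 1", "Tier 2", "Tier 3" or "Stable"."""
    entity_count = len(drift_scores)
    # Flagged SPNs, highest drift score first.
    flagged = sorted(
        (name for name, score in drift_scores.items() if score > threshold),
        key=lambda name: drift_scores[name],
        reverse=True,
    )
    # Small tenants (30 SPNs or fewer) deep-dive every flagged SPN.
    tier1_limit = len(flagged) if entity_count <= 30 else 10

    tiers = {name: "Stable" for name in drift_scores}
    for rank, name in enumerate(flagged):
        if rank < tier1_limit:
            tiers[name] = "Tier 1"
        elif rank < tier1_limit + tier2_size:
            tiers[name] = "Tier 2"
        else:
            tiers[name] = "Tier 3"
    return tiers


if __name__ == "__main__":
    print(assign_tiers({"app-a": 210.0, "app-b": 98.5}))
```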
User Override
If the user explicitly asks for "all SPNs detailed" or "full report", honor the request but warn:
⚠️ Tenant has <N> service principals with <X> flagged above 150%. Running full deep dives for all flagged SPNs may be slow and produce a very long report. Proceed? (Default: top 10 deep dives + summary for others)
Report Disclosure
When tiered depth is applied, always disclose in the report header:
**Entity Count:** <N> service principals (Large tenant — tiered analysis applied)
**Deep Dives:** Top <X> by DriftScore (Tier 1: full analysis)
**Summaries:** <Y> additional flagged SPNs (Tier 2: alert correlation only)
**Score Only:** <Z> additional flagged SPNs (Tier 3: ranking table only)
**Stable:** <W> SPNs ≤ 150% (omitted from deep dives)
Drift Score Formula
The Drift Score is a weighted composite of behavioral dimensions, normalized so that 100 = identical to baseline.
Service Principal Formula (5 Dimensions)
$$ \text{DriftScore}_{SPN} = 0.30V + 0.25R + 0.20IP + 0.15L + 0.10F $$
| Dimension | Weight | Metric | Why |
|---|---|---|---|
| Volume | 30% | Daily avg sign-ins (recent / baseline) | Sudden activity surges indicate misuse or compromise |
| Resources | 25% | Distinct target resources accessed | New resource targets = lateral expansion |
| IPs | 20% | Distinct source IP addresses | New IPs = infrastructure changes, credential theft |
| Locations | 15% | Distinct geographic locations | New geos = impossible travel or proxy rotation |
| Failure Rate | 10% | Failure rate delta (recent − baseline) | Rising failures = probing or brute-force |
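As a worked example of the weighted formula, the following Python sketch computes the composite from five per-dimension ratios that are already on the 100 = baseline scale. The function and constant names are illustrative, not part of the skill.

```python
# Weights from the dimension table: Volume, Resources, IPs, Locations, Failure Rate.
WEIGHTS = (0.30, 0.25, 0.20, 0.15, 0.10)


def drift_score(volume, resources, ips, locations, fail_rate):
    """Weighted composite of five ratios, each expressed as 100 = baseline."""
    ratios = (volume, resources, ips, locations, fail_rate)
    return round(sum(w * r for w, r in zip(WEIGHTS, ratios)), 1)


if __name__ == "__main__":
    # Sign-in volume triples while every other dimension stays at baseline.
    print(drift_score(300.0, 100.0, 100.0, 100.0, 100.0))
```

With every dimension at 100 the score is 100. Tripling volume alone adds 0.30 × 200 = 60 points, which is enough to cross the 150 threshold.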
Interpretation Scale
| Score | Meaning | Action |
|---|---|---|
| < 80 | Contracting scope | ✅ Normal — entity is doing less than usual |
| 80–120 | Stable / normal variance | ✅ No action required |
| 120–150 | Moderate deviation | 🟡 Monitor — check for legitimate reasons |
| > 150 | Significant drift | 🔴 FLAG — investigate with corroborating evidence |
| > 250 | Extreme drift | 🔴 CRITICAL — immediate investigation required |
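The scale can be expressed as a small lookup. In this Python sketch the handling of exact boundary values is an assumption, because the table does not say which band owns 120 or 150: here 120 counts as stable and 150 as moderate, in line with the "> 150" flag rule. The name `interpret` is hypothetical.

```python
def interpret(score):
    """Return the meaning label for a drift score."""
    if score > 250:
        return "Extreme drift"
    if score > 150:
        return "Significant drift"
    if score > 120:
        return "Moderate deviation"
    if score >= 80:
        return "Stable / normal variance"
    return "Contracting scope"


if __name__ == "__main__":
    for value in (75, 100, 130, 151, 300):
        print(value, interpret(value))
```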
Low-Volume Denominator Floor
CRITICAL: For entities with sparse baselines (< 10 daily sign-ins), the volume ratio is artificially inflated. Apply a floor:
IF BL_DailyAvg < 10:
AdjustedVolumeRatio = RC_DailyAvg / max(BL_DailyAvg, 10) * 100
Flag the score with: "⚠️ Low-volume baseline — ratio may be inflated"
This prevents an entity averaging 1 sign-in/day from triggering at 6 sign-ins/day (600% ratio but trivial absolute volume).
Failure Rate Dimension — Delta-to-Ratio Conversion
CRITICAL: The FailRate dimension is a percentage-point delta, not a multiplicative ratio like the other dimensions. Convert it to the same 0–200+ scale using this formula:
FailRateDelta = RecentFailRate - BaselineFailRate (percentage points)
FailRateRatio = 100 + (FailRateDelta × 10) (scaled: each +1pp = +10 on the ratio scale)
| Baseline FailRate | Recent FailRate | Delta | Ratio | Interpretation |
|---|---|---|---|---|
| 5.00% | 5.00% | 0.00 | 100.0 | No change |
| 5.00% | 8.00% | +3.00 | 130.0 | Moderate increase |
| 5.00% | 12.00% | +7.00 | 170.0 | 🔴 Above threshold |
| 5.00% | 2.00% | -3.00 | 70.0 | Improving (contracting) |
| 0.00% | 0.00% | 0.00 | 100.0 | No change (both clean) |
| 0.00% | 5.00% | +5.00 | 150.0 | 🟡 At threshold — new failures appearing |
Edge case: Baseline = 0% avoids division-by-zero because delta is additive, not multiplicative. The scaling factor (×10) means each percentage point of failure rate increase maps to 10 points on the drift scale. This keeps FailRate on the same magnitude as the other dimensions.
In the ASCII chart: Show the ratio as the bar fill percentage and append the raw delta as direction indicator: ^+X.XX (increasing) or v-X.XX (decreasing).
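The delta-to-ratio conversion and the chart's direction indicator can be sketched in Python as follows. The function names are illustrative, and returning "=" for a zero delta is an assumption based on the chart's general direction symbols.

```python
def fail_rate_ratio(baseline_pct, recent_pct):
    """Convert a percentage-point delta to the 100 = baseline ratio scale."""
    delta = recent_pct - baseline_pct
    return 100.0 + delta * 10.0


def direction_label(delta):
    """Format the raw delta as ^+X.XX (increasing) or v-X.XX (decreasing)."""
    if delta > 0:
        return f"^+{delta:.2f}"
    if delta < 0:
        return f"v{delta:.2f}"  # the minus sign comes from the number itself
    return "="  # assumption: no change uses the plain stable marker


if __name__ == "__main__":
    print(fail_rate_ratio(5.0, 12.0), direction_label(12.0 - 5.0))
```

Because the conversion is additive, a baseline of 0% needs no special case.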
Execution Workflow
Phase 1: Behavioral Baseline vs. Recent Comparison
Baseline window: 90 days (days 8–97 ago)
Recent window: 7 days (last 7 days)
This is the primary query that computes per-SPN behavioral profiles and drift metrics.
| Data Source | Query | Notes |
|---|---|---|
| AADServicePrincipalSignInLogs | Query 1 | Single query, 5 dimensions |
Phase 2: Permission & Configuration Change Audit
Data source: AuditLogs
Correlation: Same 97-day window, filtered to SPNs from Phase 1
Operations to Look For:
- Add/Remove service principal credentials
- Update application – Certificates and secrets management
- Consent to application
- Add delegated permission grant
- Add app role assignment to service principal
- Add application
- Add service principal
- Any operation containing: "permission", "role", "consent", "oauth", "credential", "certificate", "secret"
Phase 3: Security Alert Correlation
Run Query 4 in parallel with Phase 2 queries for performance.
- SecurityAlert + SecurityIncident (Query 4): Check for alerts referencing SPN IDs or names, joined with SecurityIncident for real status/classification. Never read SecurityAlert.Status directly — it's always "New".
Phase 4: Score Computation & Report Generation
- Compute DriftScore per SPN using the 5-dimension formula
- Apply the low-volume denominator floor
- Flag any entity exceeding 150% threshold
- For flagged entities: assess corroborating evidence (permission changes, alerts)
- Generate risk assessment with emoji-coded findings
- Render output in the user's selected mode
Sample KQL Queries
Query 1: Baseline vs. Recent Behavioral Comparison
// Build 90-day baseline (days 8-97 ago) vs recent 7 days per service principal
let baselineStart = ago(97d);
let baselineEnd = ago(7d);
let recentStart = ago(7d);
// Baseline period: per-SPN behavioral profile
let baseline = AADServicePrincipalSignInLogs
| where TimeGenerated between (baselineStart .. baselineEnd)
| summarize
BL_TotalSignIns = count(),
BL_Days = dcount(bin(TimeGenerated, 1d)),
BL_DistinctResources = dcount(ResourceDisplayName),
BL_DistinctIPs = dcount(IPAddress),
BL_DistinctLocations = dcount(Location),
BL_FailRate = round(1.0 * countif(ResultType != "0" and ResultType != 0) / count() * 100, 2),
BL_Resources = make_set(ResourceDisplayName, 50),
BL_IPs = make_set(IPAddress, 50),
BL_Locations = make_set(Location, 50)
by ServicePrincipalName, ServicePrincipalId;
// Recent period: last 7 days
let recent = AADServicePrincipalSignInLogs
| where TimeGenerated >= recentStart
| summarize
RC_TotalSignIns = count(),
RC_Days = dcount(bin(TimeGenerated, 1d)),
RC_DistinctResources = dcount(ResourceDisplayName),
RC_DistinctIPs = dcount(IPAddress),
RC_DistinctLocations = dcount(Location),
RC_FailRate = round(1.0 * countif(ResultType != "0" and ResultType != 0) / count() * 100, 2),
RC_Resources = make_set(ResourceDisplayName, 50),
RC_IPs = make_set(IPAddress, 50),
RC_Locations = make_set(Location, 50)
by ServicePrincipalName, ServicePrincipalId;
// Join and compute drift metrics
baseline
| join kind=inner recent on ServicePrincipalId
| extend
BL_DailyAvg = round(1.0 * BL_TotalSignIns / BL_Days, 1),
RC_DailyAvg = round(1.0 * RC_TotalSignIns / RC_Days, 1)
| extend
VolumeRatio = iff(BL_DailyAvg > 0, round(RC_DailyAvg / BL_DailyAvg * 100, 1), 999.0),
ResourceRatio = iff(BL_DistinctResources > 0, round(1.0 * RC_DistinctResources / BL_DistinctResources * 100, 1), 999.0),
IPRatio = iff(BL_DistinctIPs > 0, round(1.0 * RC_DistinctIPs / BL_DistinctIPs * 100, 1), 999.0),
LocationRatio = iff(BL_DistinctLocations > 0, round(1.0 * RC_DistinctLocations / BL_DistinctLocations * 100, 1), 999.0),
FailRateDelta = RC_FailRate - BL_FailRate,
NewResources = set_difference(RC_Resources, BL_Resources),
NewIPs = set_difference(RC_IPs, BL_IPs),
NewLocations = set_difference(RC_Locations, BL_Locations)
| extend
NewResourceCount = array_length(NewResources),
NewIPCount = array_length(NewIPs),
NewLocationCount = array_length(NewLocations)
| extend
// Composite Drift Score (weighted)
// FailRate uses additive delta→ratio conversion: 100 + delta×10
// Negative deltas (improvement) produce values < 100 (contracting)
FailRateRatio = 100.0 + FailRateDelta * 10
| extend
DriftScore = round(
(VolumeRatio * 0.30) +
(ResourceRatio * 0.25) +
(IPRatio * 0.20) +
(LocationRatio * 0.15) +
(FailRateRatio * 0.10)
, 1)
| project ServicePrincipalName, ServicePrincipalId,
BL_Days, BL_TotalSignIns, BL_DailyAvg, BL_DistinctResources, BL_DistinctIPs, BL_DistinctLocations, BL_FailRate,
RC_Days, RC_TotalSignIns, RC_DailyAvg, RC_DistinctResources, RC_DistinctIPs, RC_DistinctLocations, RC_FailRate,
VolumeRatio, ResourceRatio, IPRatio, LocationRatio, FailRateDelta, DriftScore,
NewResourceCount, NewIPCount, NewLocationCount,
NewResources, NewIPs, NewLocations,
BL_Resources, RC_Resources
| order by DriftScore desc
Post-processing note: The low-volume denominator floor (max(BL_DailyAvg, 10)) is NOT applied in the KQL above — it must be applied during post-processing when computing the final assessment. If BL_DailyAvg < 10, recalculate VolumeRatio using the floor value and recompute DriftScore. Flag affected SPNs with: "⚠️ Low-volume baseline — ratio may be inflated."
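A minimal Python sketch of this post-processing step is shown below. It assumes each Query 1 result row has been loaded into a dict keyed by the projected column names; `apply_low_volume_floor` and the two added keys (`LowVolumeBaseline`, `AdjustedDriftScore`) are hypothetical names chosen for illustration.

```python
FLOOR = 10.0  # minimum baseline daily average used as the denominator


def apply_low_volume_floor(row):
    """Recompute VolumeRatio and the drift score for sparse baselines."""
    out = dict(row)
    out["LowVolumeBaseline"] = row["BL_DailyAvg"] < FLOOR
    if out["LowVolumeBaseline"]:
        out["VolumeRatio"] = round(
            row["RC_DailyAvg"] / max(row["BL_DailyAvg"], FLOOR) * 100, 1
        )
    # Same additive conversion as in the query: 100 + delta x 10.
    fail_rate_ratio = 100.0 + row["FailRateDelta"] * 10
    out["AdjustedDriftScore"] = round(
        out["VolumeRatio"] * 0.30
        + row["ResourceRatio"] * 0.25
        + row["IPRatio"] * 0.20
        + row["LocationRatio"] * 0.15
        + fail_rate_ratio * 0.10,
        1,
    )
    return out


if __name__ == "__main__":
    sparse = {
        "BL_DailyAvg": 1.0, "RC_DailyAvg": 6.0, "VolumeRatio": 600.0,
        "ResourceRatio": 100.0, "IPRatio": 100.0, "LocationRatio": 100.0,
        "FailRateDelta": 0.0, "DriftScore": 250.0,
    }
    print(apply_low_volume_floor(sparse)["AdjustedDriftScore"])
```

Keeping the original DriftScore next to the adjusted value lets the report show both, as in the "(adj ...)" title format used by the chart.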
Query 2: AuditLog Permission & Credential Changes
// Permission/credential/role changes for service principals
// Substitute <SPN_IDS> with comma-separated SPN IDs from Query 1
// Substitute <SPN_NAMES> with SPN display names from Query 1
AuditLogs
| where TimeGenerated > ago(97d)
| where OperationName has_any ("service principal", "application", "credential", "certificate",
"secret", "permission", "role", "consent", "oauth")
| where tostring(TargetResources) has_any (<SPN_IDS>)
or tostring(InitiatedBy) has_any (<SPN_IDS>)
| extend InBaseline = TimeGenerated < ago(7d)
| summarize
BaselineOps = countif(InBaseline),
RecentOps = countif(not(InBaseline))
by OperationName
| order by RecentOps desc
Query 3: Detailed Recent AuditLog Changes
// Detailed drill-down for the recent 7-day window
// Substitute <SPN_IDS> with SPN IDs from Query 1
AuditLogs
| where TimeGenerated > ago(7d)
| where OperationName has_any ("service principal", "application", "credential", "certificate",
"secret", "permission", "role", "consent", "oauth", "update")
| where tostring(TargetResources) has_any (<SPN_IDS>)
| project TimeGenerated, OperationName, Result,
InitiatedBy = tostring(parse_json(tostring(InitiatedBy)).app.displayName),
TargetName = tostring(parse_json(tostring(parse_json(tostring(TargetResources))[0])).displayName),
TargetId = tostring(parse_json(tostring(parse_json(tostring(TargetResources))[0])).id),
ModifiedProperties = tostring(parse_json(tostring(parse_json(tostring(TargetResources))[0])).modifiedProperties)
| order by TimeGenerated desc
Query 4: SecurityAlert + SecurityIncident Correlation
// Security alerts referencing any of the service principals, joined with SecurityIncident for real status
// IMPORTANT: SecurityAlert.Status is immutable (always "New") — MUST join SecurityIncident for real Status/Classification
// Substitute <SPN_IDS> and <SPN_NAMES> with values from Query 1
let relevantAlerts = SecurityAlert
| where TimeGenerated > ago(97d)
| where Entities has_any (<SPN_IDS>) or Entities has_any (<SPN_NAMES>)
or CompromisedEntity has_any (<SPN_NAMES>)
| summarize arg_max(TimeGenerated, *) by SystemAlertId
| project SystemAlertId, AlertName, AlertSeverity, ProductName, ProductComponentName, Tactics, Techniques, TimeGenerated;
SecurityIncident
| where CreatedTime > ago(97d)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| join kind=inner relevantAlerts on $left.AlertId == $right.SystemAlertId
| extend Period = iff(TimeGenerated1 < ago(7d), "Baseline", "Recent")
| summarize
BaselineAlerts = countif(Period == "Baseline"),
RecentAlerts = countif(Period == "Recent"),
TotalAlerts = count(),
Severities = make_set(AlertSeverity, 5),
IncidentStatuses = make_set(Status, 5),
Classifications = make_set(Classification, 5),
BaselineIncidents = dcountif(IncidentNumber, Period == "Baseline"),
RecentIncidents = dcountif(IncidentNumber, Period == "Recent")
by ProductName
| order by TotalAlerts desc
Interpreting Incident Status in Drift Context:
| Incident Status | Classification | Impact on Drift Assessment |
|---|---|---|
| Closed | TruePositive | 🔴 Confirmed threat — significantly increases drift risk |
| Closed | FalsePositive | 🟢 False alarm — discount from drift risk, note as noise |
| Closed | BenignPositive | 🟡 Expected behavior — note but don't escalate |
| Active/New | Any | 🟠 Unresolved — flag for attention, may indicate ongoing threat |
Product Name Mapping (Legacy → Current Branding):
The ProductName field in SecurityAlert contains the detection product. When rendering reports, translate to current Microsoft branding:
| SecurityAlert.ProductName (raw) | Report Display Name |
|---|---|
| Microsoft Defender Advanced Threat Protection | Microsoft Defender for Endpoint |
| Microsoft Cloud App Security | Microsoft Defender for Cloud Apps |
| Microsoft Data Loss Prevention | Microsoft Purview Data Loss Prevention |
| Azure Sentinel | Microsoft Sentinel |
| Microsoft 365 Defender | Microsoft Defender XDR |
| Office 365 Advanced Threat Protection | Microsoft Defender for Office 365 |
| Azure Advanced Threat Protection | Microsoft Defender for Identity |
Note: ProviderName (e.g., ASI Scheduled Alerts, MDATP, MCAS) is the internal provider identifier. ProductName (e.g., Azure Sentinel, Microsoft Defender Advanced Threat Protection) is the user-facing product name. Always use ProductName for grouping and display; ProviderName is unreliable for product identification (e.g., all alerts show as Microsoft XDR at the incident level).
Report Rendering: Group alerts by product using the current branded name. Show Baseline Alerts vs Recent Alerts and Baseline Incidents vs Recent Incidents columns per product row, plus Severity and Classification. Include a Total row. Add a brief 1-2 sentence summary comparing alert volume between periods. Do NOT list individual alert names — keep the table concise at the product level.
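The legacy-to-current branding table above translates directly into a lookup. The Python sketch below uses only the seven pairs listed in that table; passing unknown product names through unchanged is an assumption, and `display_name` is a hypothetical helper.

```python
PRODUCT_DISPLAY_NAMES = {
    "Microsoft Defender Advanced Threat Protection": "Microsoft Defender for Endpoint",
    "Microsoft Cloud App Security": "Microsoft Defender for Cloud Apps",
    "Microsoft Data Loss Prevention": "Microsoft Purview Data Loss Prevention",
    "Azure Sentinel": "Microsoft Sentinel",
    "Microsoft 365 Defender": "Microsoft Defender XDR",
    "Office 365 Advanced Threat Protection": "Microsoft Defender for Office 365",
    "Azure Advanced Threat Protection": "Microsoft Defender for Identity",
}


def display_name(raw_product_name):
    """Translate a raw SecurityAlert.ProductName to its report display name."""
    # Assumption: names not in the table are shown as-is.
    return PRODUCT_DISPLAY_NAMES.get(raw_product_name, raw_product_name)


if __name__ == "__main__":
    print(display_name("Azure Sentinel"))
```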
Report Template
Inline Chat Report Structure
The inline report MUST include these sections in order:
- Header — Workspace, analysis period, drift threshold, data sources
- Ranked Drift Score Table — All SPNs sorted by DriftScore descending, with per-dimension ratios
- Flagged Entity Deep Dive (for each Tier 1 SPN > 150%) — Baseline vs. recent comparison, dimension bar chart, new IPs/resources, corroborating evidence
- Tier 2 Entity Summaries (if entity scaling applied) — One-line summary per Tier 2 SPN: score, top 3 new resources, new IPs, alert count
- Correlated Signal Summary — AuditLogs and SecurityAlert/Incident findings in a single table
- Behavioral Baseline Chart — ASCII bar chart showing all SPNs' daily avg vs. baseline
- Security Assessment — Emoji-coded findings table with evidence citations
- Verdict Box — Overall risk level, root cause analysis, recommendations
Markdown File Report Structure
When outputting to markdown file, include everything from the inline format PLUS:
Filename patterns:
- Single SPN: reports/scope-drift/spn/Scope_Drift_Report_<spn_short_name>_YYYYMMDD_HHMMSS.md
- All SPNs: reports/scope-drift/spn/Scope_Drift_Report_all_spns_YYYYMMDD_HHMMSS.md
# Service Principal Scope Drift Report
**Generated:** YYYY-MM-DD HH:MM UTC
**Workspace:** <workspace_name>
**Baseline Period:** <start> → <end> (90 days)
**Recent Period:** <start> → <end> (7 days)
**Drift Threshold:** 150%
**Data Sources:** AADServicePrincipalSignInLogs, AuditLogs, SecurityAlert
---
## Executive Summary
<1-3 sentence summary: how many SPNs analyzed, how many flagged, overall risk level>
---
## Drift Score Ranking
<ASCII table with all SPNs, per-dimension ratios, flag status>
<!-- Wrap in code fence for consistent rendering -->
---
## Flagged Entities
### <SPN Name> — Drift Score <score>
**ASCII Drift Dimension Chart (REQUIRED):**
Render a box-drawn chart inside a code fence. **Inner width: 58 chars** (every line between `│` markers = exactly 58 visual characters). No emoji inside boxes — use text labels.
**Alignment:** Name (9 chars padded) + weight (5) + gap (2) + bars (20 `█─`) + gap (2) + pct (6, right-aligned: `XXX.X%` or ` XX.X%`) + gap (2) + direction (10 total: `^`/`v`/`=` + 9 trailing spaces, or FailRate: delta like `v-X.XX` + 4 trailing spaces). Status labels (centered): `STABLE`, `STABLE (Low-Volume)`, `NEAR THRESHOLD`, `ABOVE THRESHOLD`, `CRITICAL`. Direction: `^` (up), `v` (down), `=` (stable).
**Bar characters:** Use `█` (U+2588 full block) for filled portions and `─` (U+2500 box-drawing horizontal) for the unfilled track.
┌──────────────────────────────────────────────────────────┐
│                  SPN DRIFT SCORE: XX.X                   │
│                          STABLE                          │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  Volume   (30%)  ██████──────────────  XXX.X%  ^         │
│  Resources(25%)  ███─────────────────   XX.X%  v         │
│  IPs      (20%)  ██████──────────────  XXX.X%  =         │
│  Locations(15%)  ██──────────────────   XX.X%  v         │
│  FailRate (10%)  ██████──────────────  XXX.X%  v-X.XX    │
│                                                          │
│  ────────────────────────── 100% baseline ──┤            │
│                      150% drift threshold ▲              │
└──────────────────────────────────────────────────────────┘
**Bar fill:** 20 chars wide. Filled = round(ratio/100 × 20), capped at 20. Title and status: center within 58 chars (include adjusted score if applicable, e.g., "SPN DRIFT SCORE: 107.5 (adj 80.5)"). Use `█` for filled, `─` for unfilled.
**Then** render the standard markdown dimension table:
| Dimension | Weight | Baseline (90d) | Recent (7d) | Ratio | Weighted | Status |
|-----------|--------|----------------|-------------|-------|----------|--------|
<New resources, new IPs, new locations enumeration>
<Corroborating evidence from AuditLogs, SecurityAlert>
---
## Pareto Analysis
<ASCII Pareto chart of drift dimensions or categories>
<80/20 analysis text>
---
## Correlated Signals
| Data Source | Finding | Incident Status |
|-------------|---------|-----------------|
| AADServicePrincipalSignInLogs | ... | N/A |
| AuditLogs | ... | N/A |
| SecurityAlert / SecurityIncident | <Group by ProductName, translate to current branding> | <Status: New/Active/Closed, Classification: TP/FP/BP> |
---
## Security Assessment
| Factor | Finding |
|--------|---------|
| 🔴/🟢/🟡 **Factor** | Evidence-based finding |
---
## Verdict
**ASCII Verdict Box (REQUIRED):**
Render a box-drawn verdict summary inside a code fence. **Inner width: 66 chars.** No emoji inside boxes. Pad every line to exactly 66 chars between `│` markers.
┌──────────────────────────────────────────────────────────────────┐
│  OVERALL RISK: <LEVEL> -- <One-line summary>                     │
│  Flagged SPNs: X of Y (Threshold: 150%)                          │
│  Root Cause: <Brief root cause explanation>                      │
└──────────────────────────────────────────────────────────────────┘
**Then** render the full verdict with:
- Root Cause Analysis paragraph
- Key Findings (numbered list)
- Recommendations (emoji-prefixed list)
---
## Appendix: Query Details
Render a single markdown table summarizing all queries executed. **Do NOT include full KQL text** — the canonical queries are already documented in this SKILL.md file. The appendix serves as an audit trail only.
| Query | Table(s) | Records Scanned | Results | Execution |
|-------|----------|----------------:|--------:|----------:|
| Q1 — SPN Baseline vs. Recent | AADServicePrincipalSignInLogs | X,XXX | N rows | X.XXs |
| ... | ... | ... | ... | ... |
*Query definitions: see the Sample KQL Queries section in this SKILL.md file.*
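The bar-fill and alignment rules for the drift dimension chart in the template above can be sketched in Python as follows. `render_bar` and `render_row` are hypothetical helpers, and the two leading spaces inside the box are an assumption chosen so the listed column widths add up to 58. Python's `round` resolves exact ties to the nearest even integer, which may differ by one cell from other rounding conventions.

```python
def render_bar(ratio, width=20):
    """Filled cells = round(ratio / 100 * width), capped at the bar width."""
    filled = min(width, max(0, round(ratio / 100 * width)))
    return "█" * filled + "─" * (width - filled)


def render_row(name, weight_pct, ratio, direction):
    """One chart row: name(9) weight(5) gap bars(20) gap pct(6) gap dir(10)."""
    label = name.ljust(9)[:9] + f"({weight_pct}%)"  # assumes a two-digit weight
    pct = f"{ratio:5.1f}%"  # right-aligned; assumes ratio < 1000
    return "  " + label + "  " + render_bar(ratio) + "  " + pct + "  " + direction.ljust(10)


if __name__ == "__main__":
    print("│" + render_row("Volume", 30, 130.0, "^") + "│")
    print("│" + render_row("FailRate", 10, 70.0, "v-3.00") + "│")
```

Checking that every rendered row is exactly 58 characters before wrapping it in the `│` markers is a cheap way to catch alignment drift.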
Known Pitfalls
SecurityAlert.Status Is Immutable — Always Join SecurityIncident
Problem: The Status field on SecurityAlert is set to "New" at creation time and never changes. It does NOT reflect whether the alert has been investigated, closed, or classified. Reading SecurityAlert.Status as current investigation status will always show "New" regardless of actual state.
Solution: MUST join with SecurityIncident to get real Status (New/Active/Closed) and Classification (TruePositive/FalsePositive/BenignPositive). See Query 4 which implements this join. When assessing drift risk from alerts, differentiate: Closed-FalsePositive alerts are noise (discount), Closed-TruePositive alerts are confirmed threats (escalate), Active/New incidents need attention (flag).
Low-Volume Statistical Inflation
Problem: Entities with very low baseline activity (e.g., 1 sign-in/day) will show extreme volume ratios even with minor changes.
Solution: Apply the denominator floor (minimum 10 sign-ins/day for volume ratio calculation). Always flag low-volume baselines in the report.
Seasonal/Cyclical Baselines
Problem: Some entities have weekly patterns (lower on weekends) or monthly cycles (month-end batch jobs).
Solution: Note if the 7-day recent window falls on an atypical portion of the cycle. The 90-day baseline smooths most cyclical patterns, but edge cases exist.
IPv6 Fabric Address Churn
Problem: Microsoft first-party SPNs (MCAS, Defender, etc.) rotate through fd00: internal fabric IPv6 addresses automatically. This inflates the IP ratio without representing actual infrastructure changes.
Solution: When all new IPs share the same fd00: prefix, note this as "Microsoft internal fabric rotation" and downgrade the IP dimension's contribution to the drift score assessment. Do NOT flag IPv6 churn from Microsoft fabric addresses as suspicious.
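A possible way to test the "all new IPs share the fd00: prefix" condition is sketched below in Python. It interprets the rule literally as a first hextet of fd00, which is an assumption, and `all_fd00_prefixed` is a hypothetical helper. A match only shows the prefix; attributing it to Microsoft fabric still relies on the SPN being a Microsoft first-party one.

```python
import ipaddress


def all_fd00_prefixed(new_ips):
    """True when the list is non-empty and every address starts with fd00:."""
    if not new_ips:
        return False
    for raw in new_ips:
        try:
            address = ipaddress.ip_address(raw)
        except ValueError:
            return False  # unparseable entries are never treated as fabric
        # exploded gives the full lowercase form, e.g. fd00:0001:0000:...
        if not address.exploded.startswith("fd00:"):
            return False
    return True


if __name__ == "__main__":
    print(all_fd00_prefixed(["fd00:1::1", "FD00:abcd::5"]))
    print(all_fd00_prefixed(["fd00::1", "203.0.113.7"]))
```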
Credential Rotation False Positives
Problem: Automated certificate/secret rotation creates regular Add/Remove service principal credentials audit entries.
Solution: Check if credential operations follow a regular cadence (weekly/monthly). If rotation is periodic and consistent with baseline, classify as operational — not drift.
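One way to check for a regular cadence is to compare the gaps between consecutive credential operations against their median. The Python sketch below is a heuristic, not part of the skill: `is_regular_cadence`, the 20% tolerance, and the three-event minimum are all assumptions.

```python
from datetime import datetime, timedelta
from statistics import median


def is_regular_cadence(timestamps, tolerance=0.2):
    """True when every gap between events is within tolerance of the median gap."""
    ordered = sorted(timestamps)
    if len(ordered) < 3:
        return False  # too few events to establish a cadence
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]
    typical = median(gaps)
    if typical <= 0:
        return False
    return all(abs(gap - typical) <= tolerance * typical for gap in gaps)


if __name__ == "__main__":
    start = datetime(2025, 1, 1)
    monthly = [start + timedelta(days=30 * i) for i in range(4)]
    print(is_regular_cadence(monthly))
```

A credential add that breaks an otherwise regular series is the case worth a closer look in Query 3.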
SPNs Without Baseline Data
Problem: Newly provisioned SPNs have no baseline to compare against.
Solution: These are excluded from the join kind=inner and will not appear in results. If the user asks about a specific SPN with no baseline, report: "No baseline data available — SPN was provisioned within the recent window or has no sign-in history in the 90-day baseline period."
Sentinel IDs vs Defender XDR IDs for Triage MCP Drill-Down
Problem: Query 4 returns IncidentNumber (Sentinel) and SystemAlertId (Sentinel), but the Triage MCP tools (GetIncidentById, GetAlertById) expect Defender XDR IDs. Passing Sentinel IDs returns "not found" errors.
Solution: When following up on correlated alerts/incidents via Triage MCP:
- Incidents: Always project ProviderIncidentId from SecurityIncident and pass that to GetIncidentById; never use IncidentNumber
- Alerts: Extract the Defender ID from SecurityAlert: tostring(parse_json(ExtendedProperties).IncidentId); never use SystemAlertId with the Triage MCP
- See the global Sentinel ↔ Defender XDR ID Mapping rule in copilot-instructions.md
Error Handling
Common Issues
| Issue | Solution |
|---|---|
| AADServicePrincipalSignInLogs table not found | This table may not exist in all workspaces. Check if it's available with search_tables. Try Advanced Hunting as fallback. |
| Zero entities in results | Verify the workspace has sign-in data for the entity type. Check if logging is enabled. |
| Query timeout | Reduce the baseline window from 90 to 60 days, or add \| take 100 to intermediate results. |
| AuditLogs has_any not matching | Ensure IDs are quoted strings in the dynamic() array. Use tostring() on dynamic fields. |
| Very large number of SPNs | Add \| where BL_TotalSignIns > 10 to filter out extremely low-activity SPNs that add noise. |
Validation Checklist
Before presenting results, verify:
- All applicable data sources were queried (even if some returned 0 results)
- Low-volume denominator floor was applied to any entity with BL_DailyAvg < 10
- Corroborating evidence was checked for every flagged entity
- Empty results are explicitly reported with ✅ (not silently omitted)
- The report includes the drift score formula and threshold for transparency
- SecurityAlert was joined with SecurityIncident for real Status/Classification (never read SecurityAlert.Status directly)
- Incident classifications (TP/FP/BP) were factored into risk assessment — FalsePositive alerts discounted, TruePositive alerts escalated
- IPv6 fd00: addresses were identified as Microsoft fabric (not adversary infrastructure)
- Credential rotation cadence was assessed for AuditLog findings
- When drilling into incidents/alerts via Triage MCP, ProviderIncidentId was used (never IncidentNumber or SystemAlertId)
SVG Dashboard Generation
📊 Optional post-report step. After an SPN scope drift report is generated, the user can request a visual SVG dashboard.
Trigger phrases: "generate SVG dashboard", "create a visual dashboard", "visualize this report", "SVG from the report"
How to Request a Dashboard
- Same chat: "Generate an SVG dashboard from the report" — data is already in context.
- New chat: Attach or reference the report file, e.g. #file:reports/scope-drift/spn/Scope_Drift_Report_<entity>_<date>.md
- Customization: Edit svg-widgets.yaml before requesting; the renderer reads it at generation time.
Execution
Step 1: Read svg-widgets.yaml (this skill's widget manifest)
Step 2: Read .github/skills/svg-dashboard/SKILL.md (rendering rules — Manifest Mode)
Step 3: Read the completed report file (data source)
Step 4: Render SVG → save to reports/scope-drift/spn/{report_name}_dashboard.svg
The YAML manifest is the single source of truth for layout, widgets, field mappings, colors, and data source documentation. All customization happens there.
.github/skills/scope-drift-detection/user/SKILL.md
npx skills add SCStelz/security-investigator --skill scope-drift-detection-user -g -y
SKILL.md
Frontmatter
{
"name": "scope-drift-detection-user",
"description": "Use this skill when asked to detect scope drift, behavioral expansion, or gradual privilege\/access creep in user accounts. Triggers on keywords like \"user drift\", \"user behavioral change\", \"user scope drift\", \"user baseline deviation\", \"user access expansion\", or when investigating whether a user account has gradually expanded beyond its established behavioral baseline. This skill builds a 90-day behavioral baseline for both interactive and non-interactive sign-ins, compares with 7-day recent activity, computes weighted Drift Scores (7 dimensions for interactive, 6 for non-interactive), and correlates with SecurityAlert, AuditLogs, Identity Protection, custom anomaly tables, CloudAppEvents (cloud app activity drift), and EmailEvents (email pattern drift).",
"drill_down_prompt": "Analyze user behavioral drift for {entity} — sign-in pattern changes, app usage shifts",
"threat_pulse_domains": [
"identity"
]
}
User Account Scope Drift Detection — Instructions
Purpose
This skill detects scope drift — the gradual, often imperceptible expansion of access or behavior beyond an established baseline — in Entra ID user accounts. Unlike sudden compromise (which triggers alerts), scope drift is a slow-burn pattern that evades threshold-based detections.
Entity Type: User Account
| Identifier | Primary Table(s) | Use Case |
|---|---|---|
| UserPrincipalName (UPN) | SigninLogs + AADNonInteractiveUserSignInLogs | Human users, admin accounts, shared mailboxes |
What this skill detects:
- Volume spikes in sign-in activity relative to historical baseline
- New applications accessed (potential unauthorized access or shadow IT)
- New target resources (APIs, services) not previously accessed
- New device/OS/browser combinations
- New source IP addresses or geographic locations
- Increased failure rates indicating probing or misconfiguration
- Account configuration changes correlated with behavioral shifts
- Security alerts involving the user
- Identity Protection risk events
- Pre-computed sign-in anomalies (custom table)
- Cloud app activity drift — new action types, admin operations, impersonation, external user activity (CloudAppEvents)
- Email pattern drift — volume/direction changes, new sender domains, threat email trends (EmailEvents)
Related skills:
- SPN Scope Drift — for service principals
- Device Scope Drift — for endpoints/devices
📑 TABLE OF CONTENTS
- Critical Workflow Rules - Start here!
- Output Modes - Inline chat vs. Markdown file
- Quick Start - 7-step investigation pattern
- Drift Score Formula - Weighted composite scoring (Interactive: 7 dimensions, Non-Interactive: 6 dimensions)
- Execution Workflow - Complete 4-phase process
- Sample KQL Queries - Validated query patterns (Queries 6-13)
- Report Template - Output format specification
- Known Pitfalls - Edge cases and false positives
- Error Handling - Troubleshooting guide
- SVG Dashboard Generation - Visual dashboard from report
Investigation shortcuts:
- User drift triage (TP Q3): Q6 + Q7 (baseline vs recent — both drift scores + dimension ratios) → Q11 (alert/incident correlation) → Tier 1 deep dives for flagged users
- Compromised user forensics (TP Q3 + incident context): Q6 + Q7 (behavioral profile) → Q8 (AuditLog changes — password/MFA/role changes, timestamps, actors) → Q10 (Identity Protection risk events) → Q11 (incident status/classification)
- Sign-in anomaly investigation (TP Q3, high anomaly count): Q6 + Q7 (drift scores) → Q9 (custom anomaly table — new IPs, device combos, geo novelty) → Q10 (Identity Protection cross-reference)
- Cloud app activity expansion (standalone or TP Q9): Q6 (interactive baseline context) → Q12 (CloudAppEvents — new action types, admin ops, impersonation) → Q11 (alert correlation)
⛔ Shortcut Default Rule: When a matching shortcut exists for the investigation context, use it — don't run the full workflow. Only run the full query set when the user explicitly requests "full investigation", "comprehensive", or "deep dive". Shortcuts render only the report sections relevant to their query chain (plus Executive Summary and Recommendations, always).
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
Before starting ANY user scope drift analysis:
- ALWAYS enforce Sentinel workspace selection (see Workspace Selection section below)
- ALWAYS ask the user for output mode if not specified: inline chat summary or markdown file report (or both)
- ALWAYS build baseline FIRST before comparing recent activity
- ALWAYS compute BOTH interactive AND non-interactive drift scores — user accounts produce two drift scores
- ALWAYS apply the low-volume denominator floor to prevent false-positive drift scores on sparse baselines
- ALWAYS correlate across all required data sources (SigninLogs, AADNonInteractiveUserSignInLogs, AuditLogs, SecurityAlert, Anomaly table, Identity Protection, CloudAppEvents, EmailEvents)
- ALWAYS run independent queries in parallel for performance
- NEVER report a drift flag without corroborating evidence from at least one secondary data source
Data Sources
| Data Source | Role | Purpose |
|---|---|---|
| SigninLogs | ✅ Primary | User interactive sign-in baseline |
| AADNonInteractiveUserSignInLogs | ✅ Primary | User non-interactive (token refresh) baseline |
| AuditLogs | ✅ Corroboration | Password/MFA/role/group changes |
| SecurityAlert | ✅ Corroboration | Corroborating alert evidence |
| SecurityIncident | ✅ Corroboration | Real alert status/classification |
| Signinlogs_Anomalies_KQL_CL | ✅ Corroboration | Pre-computed anomaly detection (custom table) |
| SigninLogs (risk fields) | ✅ Corroboration | Identity Protection risk events |
| CloudAppEvents | ✅ Corroboration | Cloud app activity drift: action types, admin operations, apps, IPs, impersonation |
| EmailEvents | ✅ Corroboration | Email pattern drift: volume/direction, sender domains, threat emails |
⛔ MANDATORY: Sentinel Workspace Selection
This skill requires a Sentinel workspace to execute queries. Follow these rules STRICTLY:
When invoked from incident-investigation skill:
- Inherit the workspace selection from the parent investigation context
- If no workspace was selected in parent context: STOP and ask user to select
When invoked standalone (direct user request):
- ALWAYS call the list_sentinel_workspaces MCP tool FIRST
- If 1 workspace exists: Auto-select, display to user, proceed
- If multiple workspaces exist:
- Display all workspaces with Name and ID
- ASK: "Which Sentinel workspace should I use for this investigation?"
- ⛔ STOP AND WAIT for user response
- ⛔ DO NOT proceed until user explicitly selects
- If a query fails on the selected workspace:
- ⛔ DO NOT automatically try another workspace
- STOP and report the error, display available workspaces, ASK user to select
🔴 PROHIBITED ACTIONS:
- ❌ Selecting a workspace without user consent when multiple exist
- ❌ Switching to another workspace after a failure without asking
- ❌ Proceeding with investigation if workspace selection is ambiguous
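The selection rules above can be sketched as a small decision function. This is an illustrative Python sketch only: the `Workspace` type, the `select_workspace` name, and the convention of returning `None` to mean "stop and ask the user" are assumptions made for the example, not part of the skill or of the MCP tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Workspace:
    """Hypothetical record for one entry returned by list_sentinel_workspaces."""
    name: str
    workspace_id: str


def select_workspace(workspaces: list[Workspace],
                     inherited: Optional[Workspace] = None,
                     user_choice: Optional[str] = None) -> Optional[Workspace]:
    """Return the workspace to use, or None when the skill must stop and ask."""
    if inherited is not None:
        # Invoked from a parent investigation: reuse its selection.
        return inherited
    if len(workspaces) == 1:
        # Exactly one workspace: auto-select it.
        return workspaces[0]
    if user_choice is not None:
        # Several workspaces: only an explicit user choice is accepted.
        for ws in workspaces:
            if user_choice in (ws.name, ws.workspace_id):
                return ws
    # Ambiguous or unknown choice: do not guess, wait for the user.
    return None
```

A failed query should not trigger another call with a different workspace; under these rules the caller reports the error and asks again.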
Output Modes
This skill supports two output modes. ASK the user which they prefer if not explicitly specified. Both may be selected.
Mode 1: Inline Chat Summary (Default)
- Render the full drift analysis directly in the chat response
- Includes ASCII tables, Pareto chart, drift dimension bars, and security assessment
- Best for quick review and interactive follow-up questions
Mode 2: Markdown File Report
- Save a comprehensive report to reports/scope-drift/user/Scope_Drift_Report_<username>_<timestamp>.md
- All ASCII visualizations render correctly inside markdown code fences (triple backticks)
- Includes all data from inline mode plus additional detail sections
- Use the create_file tool; NEVER use terminal commands for file output
- Filename pattern: Scope_Drift_Report_<username>_YYYYMMDD_HHMMSS.md (extract username from UPN, e.g., jdoe from jdoe@contoso.com)
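As a minimal sketch of the filename pattern, the helper below derives the username from the UPN and formats the timestamp. The function name `report_path` is hypothetical, and passing a UTC datetime is an assumption; the pattern itself does not state a time zone.

```python
from datetime import datetime, timezone


def report_path(upn: str, generated_at: datetime) -> str:
    """Build the markdown report path for a user scope drift report."""
    # Username is the part of the UPN before the first '@'.
    username = upn.split("@", 1)[0]
    # YYYYMMDD_HHMMSS, as in the documented filename pattern.
    stamp = generated_at.strftime("%Y%m%d_%H%M%S")
    return f"reports/scope-drift/user/Scope_Drift_Report_{username}_{stamp}.md"


if __name__ == "__main__":
    print(report_path("jdoe@contoso.com", datetime.now(timezone.utc)))
```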
Markdown Rendering Notes
- ✅ ASCII tables, box-drawing characters, and bar charts render perfectly in markdown code blocks
- ✅ Unicode block characters (█ full block, ─ box-drawing horizontal) display correctly in monospaced fonts
- ✅ Emoji indicators (🔴🟢🟡⚠️✅) render natively in GitHub-flavored markdown
- ✅ Standard markdown tables (| col |) render as formatted tables
- Tip: Wrap all ASCII art in triple-backtick code fences for consistent rendering
Quick Start (TL;DR)
When a user requests user scope drift detection:
- Select Workspace → list_sentinel_workspaces, auto-select or ask
- Determine Output Mode → Ask if not specified: inline, markdown file, or both
- Run Phase 1 → Query 6 (SigninLogs interactive) + Query 7 (AADNonInteractiveUserSignInLogs)
- Apply Entity Scaling (multi-user only) → If analyzing multiple users, compute drift scores, rank, apply tiered depth limits (see Entity Scaling)
- Run Phases 2-3 → Queries 8-13 (AuditLogs + SecurityAlert + Anomaly table + Identity Protection + CloudAppEvents + EmailEvents) — scoped per tier if multi-user
- Compute Drift Scores → Apply 7-dimension interactive formula + 6-dimension non-interactive formula, flag if >150%, assess with corroborating evidence
- Output Results → Render in selected mode(s)
Entity Scaling (Multi-User Analysis)
Problem: This skill is typically used for single-user investigations, but users may request tenant-wide or group-based analysis ("drift for all users", "drift for finance department"). Running Queries 8–13 for every user in a large tenant is prohibitively expensive and produces unreadable reports.
Solution: For multi-user analysis, after Phase 1 computes drift scores for all target users, apply tiered depth based on user count and drift severity.
Single-user mode: When investigating one specific user (the common case), skip this section entirely — always run all queries at full depth.
User Count Detection
After Queries 6+7, count distinct users in the result set:
| User Count | Tier | Deep Dive Limit | Behavior |
|---|---|---|---|
| 1 user | Single | Full | All queries at full depth. This section does not apply. |
| 2–30 users | Small | All flagged | Full deep dive for every user > 150%. No limiting needed. |
| 31–100 users | Medium | Top 10 | Full deep dive for top 10 by max(Interactive, Non-Interactive) DriftScore. Summary row for remaining flagged users. |
| 101–500 users | Large | Top 10 | Full deep dive for top 10. Tier 2 summary (next 15) with Identity Protection + alerts only. Remaining flagged users listed in ranking table. |
| > 500 users | Very Large | Top 10 | Same as Large, plus: filter Phase 1 results to BL_TotalSignIns > 10 to exclude near-silent accounts from scoring. |
Tiered Depth Model (Multi-User)
| Tier | Users | Queries Run | Report Depth |
|---|---|---|---|
| Tier 1 (Full) | Top N by DriftScore | All: Q8–Q13 | Full deep dive: both ASCII charts, dimension tables, AuditLog changes, alerts, anomalies, Identity Protection, CloudAppEvents, EmailEvents |
| Tier 2 (Summary) | Next 15 flagged users (or remaining if < 15) | Q10 + Q11 only (Identity Protection + SecurityAlert) | One-line summary: both scores, risk state, alert count, flag status |
| Tier 3 (Score only) | All remaining flagged users | None beyond Phase 1 | Row in ranking table: UPN, interactive score, non-interactive score, flag emoji |
| Stable | Users ≤ 150% | None beyond Phase 1 | Omitted from deep dives. Included in summary statistics only. |
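The two tables above can be read as a ranking step followed by a cut-off. The sketch below is one possible reading in Python, not a reference implementation: the function names are hypothetical, and applying the Tier 1/2/3 split uniformly to Medium cohorts (where the table only mentions a "summary row") is an assumption.

```python
def cohort_tier(user_count: int) -> str:
    """Map the distinct-user count from Queries 6+7 to a cohort tier."""
    if user_count <= 1:
        return "Single"
    if user_count <= 30:
        return "Small"
    if user_count <= 100:
        return "Medium"
    if user_count <= 500:
        return "Large"
    return "Very Large"


def assign_depth(scores: dict[str, tuple[float, float]],
                 threshold: float = 150.0,
                 top_n: int = 10,
                 tier2_size: int = 15) -> dict[str, str]:
    """scores maps UPN -> (interactive, non_interactive) DriftScore."""
    # Rank by the larger of the two drift scores, highest first.
    ranked = sorted(scores, key=lambda upn: max(scores[upn]), reverse=True)
    flagged = [upn for upn in ranked if max(scores[upn]) > threshold]
    # Single and Small cohorts deep-dive every flagged user.
    if cohort_tier(len(scores)) in ("Single", "Small"):
        top_n = len(flagged)
    depth = {upn: "Stable" for upn in ranked}
    for rank, upn in enumerate(flagged):
        if rank < top_n:
            depth[upn] = "Tier 1"
        elif rank < top_n + tier2_size:
            depth[upn] = "Tier 2"
        else:
            depth[upn] = "Tier 3"
    return depth
```

The Very Large pre-filter (BL_TotalSignIns > 10) is left to the caller here, since it applies before scoring.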
User Override
If the user explicitly asks for "all users detailed" or "full report", honor the request but warn:
⚠️ Analysis covers <N> users with <X> flagged above 150%. Running full deep dives for all flagged users may be slow and produce a very long report. Proceed? (Default: top 10 deep dives + summary for others)
Report Disclosure (Multi-User)
When tiered depth is applied, always disclose in the report header:
**User Count:** <N> users (Large cohort — tiered analysis applied)
**Deep Dives:** Top <X> by DriftScore (Tier 1: full analysis)
**Summaries:** <Y> additional flagged users (Tier 2: risk + alerts only)
**Score Only:** <Z> additional flagged users (Tier 3: ranking table only)
**Stable:** <W> users ≤ 150% (omitted from deep dives)
Drift Score Formula
The Drift Score is a weighted composite of behavioral dimensions, normalized so that 100 = identical to baseline.
User accounts produce TWO drift scores (interactive + non-interactive). Both must be computed and reported.
User Account Formula — Interactive (7 Dimensions)
$$ \text{DriftScore}_{Interactive} = 0.25V + 0.20A + 0.10R + 0.15IP + 0.10L + 0.10D + 0.10F $$
| Dimension | Weight | Metric | Why |
|---|---|---|---|
| Volume | 25% | Daily avg interactive sign-ins | Reduced weight vs SPN — user volume is naturally more variable |
| Applications | 20% | Distinct apps accessed | New apps = potential unauthorized access or shadow IT |
| Resources | 10% | Distinct target resources accessed | Reduced weight — apps are a better user-level signal |
| IPs | 15% | Distinct source IP addresses | New IPs = different network, VPN, or credential theft |
| Locations | 10% | Distinct geographic locations | New geos = travel or impossible travel |
| Devices | 10% | Distinct device types (OS + browser) | New devices = potential unauthorized device |
| Failure Rate | 10% | Failure rate delta | Rising failures = password spray target or lockout |
User Account Formula — Non-Interactive (6 Dimensions)
$$ \text{DriftScore}_{NonInteractive} = 0.30V + 0.20A + 0.15R + 0.15IP + 0.10L + 0.10F $$
| Dimension | Weight | Metric | Why |
|---|---|---|---|
| Volume | 30% | Daily avg non-interactive sign-ins | Higher weight — non-interactive volume is more predictable |
| Applications | 20% | Distinct apps with token refreshes | New apps = potential token theft or rogue app consent |
| Resources | 15% | Distinct resources targeted | New resources = lateral expansion via token reuse |
| IPs | 15% | Distinct source IPs | New IPs = session hijack or AiTM proxy |
| Locations | 10% | Distinct geographic locations | Geographic shifts in token usage |
| Failure Rate | 10% | Failure rate delta | Rising failures = expired/revoked token churn |
Note: Devices dimension is excluded from non-interactive because token refreshes don't generate reliable device telemetry.
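For illustration, both formulas reduce to a weighted sum over per-dimension ratios (each ratio on the scale where 100 means identical to baseline). The Python sketch below uses the weights from the two tables; the dictionary keys and the `drift_score` name are assumptions chosen to match the symbols in the formulas.

```python
# Weights copied from the interactive (7-dimension) and
# non-interactive (6-dimension) tables; each set sums to 1.0.
INTERACTIVE_WEIGHTS = {
    "V": 0.25, "A": 0.20, "R": 0.10, "IP": 0.15, "L": 0.10, "D": 0.10, "F": 0.10,
}
NON_INTERACTIVE_WEIGHTS = {
    "V": 0.30, "A": 0.20, "R": 0.15, "IP": 0.15, "L": 0.10, "F": 0.10,
}


def drift_score(ratios: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted composite of per-dimension ratios (100 = identical to baseline)."""
    missing = set(weights) - set(ratios)
    if missing:
        raise ValueError(f"missing dimension ratios: {sorted(missing)}")
    # Dimensions not present in the weight table (e.g. D for non-interactive)
    # are ignored rather than treated as an error.
    return sum(weights[dim] * ratios[dim] for dim in weights)
```

With every ratio at 100 the score is 100; tripling only the IP ratio to 300 moves the interactive score to about 130, which shows how a single dimension can leave the composite below the 150 flag.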
Interpretation Scale
| Score | Meaning | Action |
|---|---|---|
| < 80 | Contracting scope | ✅ Normal — entity is doing less than usual |
| 80–120 | Stable / normal variance | ✅ No action required |
| 120–150 | Moderate deviation | 🟡 Monitor — check for legitimate reasons |
| > 150 | Significant drift | 🔴 FLAG — investigate with corroborating evidence |
| > 250 | Extreme drift | 🔴 CRITICAL — immediate investigation required |
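The scale can be applied with a simple threshold ladder. The table leaves the exact boundary values open (120 and 150 each appear in two rows), so the sketch below assumes that 120 counts as moderate and that only scores strictly above 150 are flagged; the `interpret` name is hypothetical.

```python
def interpret(score: float) -> str:
    """Map a DriftScore to the meaning column of the interpretation scale."""
    if score > 250:
        return "Extreme drift"
    if score > 150:
        return "Significant drift"
    if score >= 120:
        # Assumption: the shared boundary 120 is treated as moderate.
        return "Moderate deviation"
    if score >= 80:
        return "Stable / normal variance"
    return "Contracting scope"
```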
Low-Volume Denominator Floor
CRITICAL: For entities with sparse baselines (< 10 daily sign-ins), the volume ratio is artificially inflated. Apply a floor:
IF BL_DailyAvg < 10:
AdjustedVolumeRatio = RC_DailyAvg / max(BL_DailyAvg, 10) * 100
Flag the score with: "⚠️ Low-volume baseline — ratio may be inflated"
This prevents an entity averaging 1 sign-in/day from triggering at 6 sign-ins/day (600% ratio but trivial absolute volume).
User-specific note: Non-interactive sign-ins often have very high volume (thousands/day) from background token refreshes. The floor is less likely to trigger for non-interactive, but always check interactive separately.
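A minimal Python sketch of the floor, assuming the same variable meanings as the pseudocode (`BL_DailyAvg`, `RC_DailyAvg`); the function name and the tuple return shape are illustrative choices.

```python
def volume_ratio(bl_daily_avg: float, rc_daily_avg: float,
                 floor: float = 10.0) -> tuple[float, bool]:
    """Return (volume ratio on the 100-scale, low-volume flag)."""
    low_volume = bl_daily_avg < floor
    # The denominator never drops below the floor, so sparse baselines
    # cannot inflate the ratio.
    ratio = rc_daily_avg / max(bl_daily_avg, floor) * 100
    return ratio, low_volume
```

For the example in the text (1 sign-in/day baseline, 6 sign-ins/day recent), the unfloored ratio would be 600, while the floored ratio is about 60 and carries the low-volume flag.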
Failure Rate Dimension — Delta-to-Ratio Conversion
CRITICAL: The FailRate dimension is a percentage-point delta, not a multiplicative ratio like the other dimensions. Convert it to the same 0–200+ scale using this formula:
FailRateDelta = RecentFailRate - BaselineFailRate (percentage points)
FailRateRatio = 100 + (FailRateDelta × 10) (scaled: each +1pp = +10 on the ratio scale)
| Baseline FailRate | Recent FailRate | Delta | Ratio | Interpretation |
|---|---|---|---|---|
| 5.00% | 5.00% | 0.00 | 100.0 | No change |
| 5.00% | 8.00% | +3.00 | 130.0 | Moderate increase |
| 5.00% | 12.00% | +7.00 | 170.0 | 🔴 Above threshold |
| 5.00% | 2.00% | -3.00 | 70.0 | Improving (contracting) |
| 0.00% | 0.00% | 0.00 | 100.0 | No change (both clean) |
| 0.00% | 5.00% | +5.00 | 150.0 | 🟡 At threshold — new failures appearing |
Edge case: Baseline = 0% avoids division-by-zero because delta is additive, not multiplicative. The scaling factor (×10) means each percentage point of failure rate increase maps to 10 points on the drift scale. This keeps FailRate on the same magnitude as the other dimensions.
In the ASCII chart: Show the ratio as the bar fill percentage and append the raw delta as direction indicator: ^+X.XX (increasing) or v-X.XX (decreasing).
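The conversion and the chart indicator can be sketched in Python as follows. The function names are hypothetical, and rendering a zero delta as `=` is an assumption borrowed from the stable-direction marker used for the other dimensions.

```python
def fail_rate_ratio(baseline_pct: float, recent_pct: float) -> tuple[float, float]:
    """Return (ratio on the 100-scale, delta in percentage points)."""
    delta = recent_pct - baseline_pct
    # Each +1 percentage point maps to +10 on the ratio scale.
    return 100 + delta * 10, delta


def direction_label(delta: float) -> str:
    """Format the raw delta as the chart's direction indicator."""
    if delta > 0:
        return f"^+{delta:.2f}"
    if delta < 0:
        return f"v-{abs(delta):.2f}"
    return "="  # assumption: no change is shown as stable
```

Because the delta is additive, a 0% baseline needs no special handling.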
Execution Workflow
Phase 1: Behavioral Baseline vs. Recent Comparison
Baseline window: 90 days (days 8–97 ago)
Recent window: 7 days (last 7 days)
This is the primary query that computes per-user behavioral profiles and drift metrics.
| Data Source | Query | Notes |
|---|---|---|
| SigninLogs | Query 6 | Interactive, 7 dimensions (adds Apps, Devices) |
| AADNonInteractiveUserSignInLogs | Query 7 | Non-interactive, 6 dimensions (adds Apps, no Devices) |
User accounts produce TWO drift scores (interactive + non-interactive). Both must be computed and reported.
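For report headers it can help to state the Phase 1 windows as concrete timestamps. The sketch below mirrors the 97-day and 7-day offsets described for this phase; the `analysis_windows` name and the dictionary shape are assumptions for the example.

```python
from datetime import datetime, timedelta, timezone


def analysis_windows(now: datetime) -> dict[str, tuple[datetime, datetime]]:
    """Return (start, end) pairs for the baseline and recent windows."""
    recent_start = now - timedelta(days=7)
    baseline_start = now - timedelta(days=97)
    return {
        # 90 days, ending where the recent window begins.
        "baseline": (baseline_start, recent_start),
        # The last 7 days up to the analysis time.
        "recent": (recent_start, now),
    }


if __name__ == "__main__":
    for name, (start, end) in analysis_windows(datetime.now(timezone.utc)).items():
        print(f"{name}: {start:%Y-%m-%d} -> {end:%Y-%m-%d}")
```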
Phase 2: Account Configuration Change Audit
Data source: AuditLogs
Correlation: Same 97-day window, filtered to the user from Phase 1
Operations to Look For:
- Reset user password
- Change user password
- Update user
- Add member to group
- Add member to role
- Register security info
- Delete security info
- Update StsRefreshTokenValidFrom
- Any operation containing: "password", "MFA", "role", "group", "conditional", "auth"
Phase 3: Corroborating Signal Collection (Run in Parallel)
- SecurityAlert + SecurityIncident (Query 11): Check for alerts referencing user UPN, joined with SecurityIncident for real status/classification. Never read SecurityAlert.Status directly — it's always "New".
- Signinlogs_Anomalies_KQL_CL (Query 9): Pre-computed anomaly detection (new IPs, new device combos, geographic novelty). Custom table — may not exist in all workspaces.
- Identity Protection risk fields (Query 10): RiskLevelDuringSignIn, RiskState, RiskEventTypes_V2 from SigninLogs.
- CloudAppEvents (Query 12): Cloud app activity drift: baseline vs. recent comparison of action types, applications, IPs, countries, admin/external/impersonated operations. Requires user's AccountObjectId (Entra Object ID); resolve from UPN via Graph API before querying. May not exist if XDR connector is not streaming to Data Lake.
- EmailEvents (Query 13): Email pattern drift: baseline vs. recent comparison of volume, send/receive ratio, email direction, sender domains, threat email prevalence. Uses UPN for both sender and recipient matching. May not exist if XDR connector is not streaming to Data Lake.
Phase 4: Score Computation & Report Generation
- Compute DriftScore for BOTH interactive and non-interactive using entity-specific formulas
- Apply the low-volume denominator floor
- Flag if either score exceeds 150% threshold
- For flagged users: assess corroborating evidence (account changes, alerts, anomaly table, Identity Protection, cloud app activity drift, email pattern drift)
- Generate risk assessment with emoji-coded findings
- Render output in the user's selected mode
Sample KQL Queries
Query 6: User Interactive Sign-In Baseline vs. Recent
// Build 90-day baseline vs 7-day recent for user interactive sign-ins
// Substitute <UPN> with user's UPN
let baselineStart = ago(97d);
let baselineEnd = ago(7d);
SigninLogs
| where UserPrincipalName =~ '<UPN>'
| where TimeGenerated >= baselineStart
| extend Period = iff(TimeGenerated < baselineEnd, "Baseline", "Recent")
| summarize
TotalSignIns = count(),
Days = dcount(bin(TimeGenerated, 1d)),
DistinctApps = dcount(AppDisplayName),
DistinctResources = dcount(ResourceDisplayName),
DistinctIPs = dcount(IPAddress),
DistinctLocations = dcount(Location),
DistinctDevices = dcount(strcat(tostring(parse_json(DeviceDetail).operatingSystem), "|", tostring(parse_json(DeviceDetail).browser))),
FailRate = round(1.0 * countif(ResultType != "0" and ResultType != 0) / count() * 100, 2),
Apps = make_set(AppDisplayName, 50),
Resources = make_set(ResourceDisplayName, 50),
IPs = make_set(IPAddress, 50),
Locations = make_set(Location, 50),
Devices = make_set(strcat(tostring(parse_json(DeviceDetail).operatingSystem), "|", tostring(parse_json(DeviceDetail).browser)), 50)
by Period
| order by Period asc
Post-processing: Compare Baseline vs Recent rows. Compute ratios per dimension. Calculate set_difference() equivalents in the assessment to identify new apps, IPs, locations, and devices appearing only in the Recent period.
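The post-processing step can be sketched in Python once the Baseline and Recent rows have been loaded as dictionaries keyed by the query's column names. The helper names are hypothetical, and returning `None` for a zero baseline count is an assumption; the source does not say how that case should be scored.

```python
from typing import Optional


def new_in_recent(baseline_row: dict, recent_row: dict,
                  fields: tuple[str, ...] = ("Apps", "IPs", "Locations", "Devices")
                  ) -> dict[str, list[str]]:
    """Values present in the Recent set columns but absent from Baseline."""
    return {
        field: sorted(set(recent_row.get(field, [])) - set(baseline_row.get(field, [])))
        for field in fields
    }


def count_ratio(baseline_count: float, recent_count: float) -> Optional[float]:
    """Recent/baseline ratio on the 100-scale; None when the baseline is zero."""
    if baseline_count == 0:
        return None  # assumption: leave undefined rather than divide by zero
    return recent_count / baseline_count * 100
```

Note that the set columns are capped by `make_set(..., 50)` in the query, so a very active account can hide items beyond the cap.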
Query 7: User Non-Interactive Sign-In Baseline vs. Recent
// Build 90-day baseline vs 7-day recent for user non-interactive sign-ins
// Substitute <UPN> with user's UPN
let baselineStart = ago(97d);
let baselineEnd = ago(7d);
AADNonInteractiveUserSignInLogs
| where UserPrincipalName =~ '<UPN>'
| where TimeGenerated >= baselineStart
| extend Period = iff(TimeGenerated < baselineEnd, "Baseline", "Recent")
| summarize
TotalSignIns = count(),
Days = dcount(bin(TimeGenerated, 1d)),
DistinctApps = dcount(AppDisplayName),
DistinctResources = dcount(ResourceDisplayName),
DistinctIPs = dcount(IPAddress),
DistinctLocations = dcount(Location),
FailRate = round(1.0 * countif(ResultType != "0" and ResultType != 0) / count() * 100, 2),
Apps = make_set(AppDisplayName, 50),
Resources = make_set(ResourceDisplayName, 50),
IPs = make_set(IPAddress, 50),
Locations = make_set(Location, 50)
by Period
| order by Period asc
Note: Devices dimension is excluded from non-interactive queries — token refreshes don't generate reliable device telemetry.
KQL Pattern Note: Uses the single-pass extend Period = iff(...) pattern instead of separate baseline/recent subqueries joined with join kind=inner on 1==1. KQL's join operator does not accept a constant condition such as 1==1, so that cross-join pattern fails; always use the Period flag approach for user queries.
Query 8: User AuditLog Configuration Changes
// User account configuration changes (password, MFA, roles, groups)
// Substitute <UPN> with user's UPN
AuditLogs
| where TimeGenerated > ago(97d)
| where OperationName has_any ("password", "MFA", "role", "group", "conditional", "auth",
"user", "member", "security info")
| where tostring(TargetResources) has '<UPN>'
or tostring(InitiatedBy) has '<UPN>'
or Identity =~ '<UPN>'
| extend InBaseline = TimeGenerated < ago(7d)
| summarize
BaselineOps = countif(InBaseline),
RecentOps = countif(not(InBaseline)),
Operations = make_set(OperationName, 30)
by OperationName
| order by RecentOps desc
Query 9: SigninLogs Anomaly Table (Custom)
🔴 CRITICAL (CASE-SENSITIVE TABLE NAME): The table is Signinlogs_Anomalies_KQL_CL (lowercase 'l' in "logs"). Do NOT use SigninLogs_Anomalies_KQL_CL; that will fail with SemanticError: Failed to resolve table. KQL custom _CL tables are case-sensitive. Copy the name exactly as written below.
// Pre-computed anomalies from Signinlogs_Anomalies_KQL_CL
// Substitute <UPN> with user's UPN
// ⚠️ CASE-SENSITIVE: Table name is "Signinlogs" (lowercase 'l'), NOT "SigninLogs"
// Note: This table may not exist in all workspaces — handle gracefully
Signinlogs_Anomalies_KQL_CL
| where TimeGenerated > ago(14d)
| where UserPrincipalName =~ '<UPN>'
| extend Severity = case(
BaselineSize < 3, "Informational",
CountryNovelty and CityNovelty and ArtifactHits >= 20, "High",
ArtifactHits >= 10 or CountryNovelty or CityNovelty or StateNovelty, "Medium",
ArtifactHits >= 5, "Low",
"Informational")
| where Severity in ("High", "Medium", "Low")
| project DetectedDateTime, AnomalyType, Value, Severity, Country, City,
ArtifactHits, CountryNovelty, CityNovelty, OS, BrowserFamily
| order by DetectedDateTime desc
| take 20
Query 10: Identity Protection Risk Events
// Identity Protection risk signals from SigninLogs
// Substitute <UPN> with user's UPN
SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName =~ '<UPN>'
| where RiskLevelDuringSignIn != "none" and RiskLevelDuringSignIn != ""
| project TimeGenerated, RiskLevelDuringSignIn, RiskState, RiskEventTypes_V2,
IPAddress, Location, AppDisplayName,
DeviceOS = tostring(parse_json(DeviceDetail).operatingSystem),
Browser = tostring(parse_json(DeviceDetail).browser),
ConditionalAccessStatus
| order by TimeGenerated desc
| take 20
Note: Identity Protection events supplement the drift analysis. Any atRisk or confirmedCompromised risk states in the recent window should be flagged prominently, regardless of drift score.
Query 11: User SecurityAlert + SecurityIncident Correlation
// Security alerts and incidents referencing the user
// IMPORTANT: SecurityAlert.Status is immutable (always "New") — MUST join SecurityIncident for real Status/Classification
// Substitute <UPN> with user's UPN
let relevantAlerts = SecurityAlert
| where TimeGenerated > ago(97d)
| where Entities has '<UPN>' or CompromisedEntity has '<UPN>'
| summarize arg_max(TimeGenerated, *) by SystemAlertId
| project SystemAlertId, AlertName, AlertSeverity, ProductName, ProductComponentName, Tactics, Techniques, TimeGenerated;
SecurityIncident
| where CreatedTime > ago(97d)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| join kind=inner relevantAlerts on $left.AlertId == $right.SystemAlertId
| extend Period = iff(TimeGenerated1 < ago(7d), "Baseline", "Recent")
| summarize
BaselineAlerts = countif(Period == "Baseline"),
RecentAlerts = countif(Period == "Recent"),
TotalAlerts = count(),
Severities = make_set(AlertSeverity, 5),
IncidentStatuses = make_set(Status, 5),
Classifications = make_set(Classification, 5),
BaselineIncidents = dcountif(IncidentNumber, Period == "Baseline"),
RecentIncidents = dcountif(IncidentNumber, Period == "Recent")
by ProductName
| order by TotalAlerts desc
Interpreting Incident Status in Drift Context:
| Incident Status | Classification | Impact on Drift Assessment |
|---|---|---|
| Closed | TruePositive | 🔴 Confirmed threat — significantly increases drift risk |
| Closed | FalsePositive | 🟢 False alarm — discount from drift risk, note as noise |
| Closed | BenignPositive | 🟡 Expected behavior — note but don't escalate |
| Active/New | Any | 🟠 Unresolved — flag for attention, may indicate ongoing threat |
Product Name Mapping (Legacy → Current Branding):
The ProductName field in SecurityAlert contains the detection product. When rendering reports, translate to current Microsoft branding:
| SecurityAlert.ProductName (raw) | Report Display Name |
|---|---|
| Microsoft Defender Advanced Threat Protection | Microsoft Defender for Endpoint |
| Microsoft Cloud App Security | Microsoft Defender for Cloud Apps |
| Microsoft Data Loss Prevention | Microsoft Purview Data Loss Prevention |
| Azure Sentinel | Microsoft Sentinel |
| Microsoft 365 Defender | Microsoft Defender XDR |
| Office 365 Advanced Threat Protection | Microsoft Defender for Office 365 |
| Azure Advanced Threat Protection | Microsoft Defender for Identity |
Report Rendering: Same rules as SPN — show Baseline vs Recent alert/incident counts per product, with a Total row and brief summary. Do NOT list individual alert names.
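When rendering reports, the legacy-to-current translation is a plain lookup. The sketch below copies the mapping table into a Python dictionary; the `display_name` helper and its pass-through behavior for unknown products are assumptions.

```python
# Raw SecurityAlert.ProductName -> report display name (from the table above).
PRODUCT_DISPLAY_NAMES = {
    "Microsoft Defender Advanced Threat Protection": "Microsoft Defender for Endpoint",
    "Microsoft Cloud App Security": "Microsoft Defender for Cloud Apps",
    "Microsoft Data Loss Prevention": "Microsoft Purview Data Loss Prevention",
    "Azure Sentinel": "Microsoft Sentinel",
    "Microsoft 365 Defender": "Microsoft Defender XDR",
    "Office 365 Advanced Threat Protection": "Microsoft Defender for Office 365",
    "Azure Advanced Threat Protection": "Microsoft Defender for Identity",
}


def display_name(raw_product_name: str) -> str:
    """Translate a raw product name; unknown names pass through unchanged."""
    return PRODUCT_DISPLAY_NAMES.get(raw_product_name, raw_product_name)
```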
Query 12: CloudAppEvents — Cloud App Activity Drift
// Cloud app activity drift — baseline vs recent comparison
// Tracks action type diversity, application usage, IP/geo distribution,
// admin operations, external user activity, and impersonation
// Substitute <ACCOUNT_OBJECT_ID> with user's Entra Object ID (resolve from UPN via Graph API)
// NOTE: This table requires XDR connector streaming to Data Lake
let baselineStart = ago(97d);
let baselineEnd = ago(7d);
CloudAppEvents
| where TimeGenerated >= baselineStart
| where AccountObjectId == '<ACCOUNT_OBJECT_ID>'
| extend Period = iff(TimeGenerated < baselineEnd, "Baseline", "Recent")
| summarize
TotalEvents = count(),
Days = dcount(bin(TimeGenerated, 1d)),
DistinctActions = dcount(ActionType),
DistinctApps = dcount(Application),
DistinctObjects = dcount(ObjectName),
DistinctIPs = dcount(IPAddress),
DistinctCountries = dcount(CountryCode),
AdminOps = countif(IsAdminOperation),
ExternalUserOps = countif(IsExternalUser),
ImpersonatedOps = countif(IsImpersonated),
Actions = make_set(ActionType, 100),
Apps = make_set(Application, 50),
IPs = make_set(IPAddress, 50),
Countries = make_set(CountryCode, 20)
by Period
| order by Period asc
How to resolve AccountObjectId from UPN:
Use Microsoft Graph API: GET /v1.0/users/<UPN>?$select=id → use the id field as <ACCOUNT_OBJECT_ID>.
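As a small sketch of building that request URL in Python: the helper name is hypothetical, the `graph.microsoft.com` host is an assumption that fits the global cloud only, and authentication is omitted. The encoding step matters for guest accounts, whose UPNs contain `#EXT#` and would otherwise be cut off at the `#`.

```python
from urllib.parse import quote


def user_id_lookup_url(upn: str) -> str:
    """Build the Graph request URL that returns only the user's object ID."""
    # Percent-encode the UPN but keep '@' readable; '#' becomes %23.
    encoded_upn = quote(upn, safe="@")
    return f"https://graph.microsoft.com/v1.0/users/{encoded_upn}?$select=id"


if __name__ == "__main__":
    print(user_id_lookup_url("jdoe@contoso.com"))
```

The `id` field of the JSON response is the value to substitute for `<ACCOUNT_OBJECT_ID>` in Query 12.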
Drift Interpretation for CloudAppEvents (Corroboration — not scored):
CloudAppEvents provides qualitative corroboration, not a scored drift dimension. Focus on these signals:
| Signal | Baseline → Recent Change | Risk Implication |
|---|---|---|
| DistinctActions ↑↑ | New action types appearing | Expanded permissions or new tooling usage |
| AdminOps ↑↑ | New admin-level operations | Privilege escalation or new admin role assignment |
| ExternalUserOps > 0 (new) | External user activity appearing | Potential guest account abuse or B2B compromise |
| ImpersonatedOps > 0 (new) | Impersonation activity appearing | Delegated access abuse or admin impersonation |
| New applications | Apps in Recent not in Baseline | Shadow IT, rogue app consent, or lateral movement |
| New countries | Countries in Recent not in Baseline | Geographic anomaly — correlate with SigninLogs locations |
| DistinctIPs ↑↑ | Significant new IPs | VPN rotation, proxy usage, or credential sharing |
Corroboration with other drift signals:
- New admin operations in CloudAppEvents + role assignment in AuditLogs = strong privilege escalation signal
- New applications in CloudAppEvents + new apps in SigninLogs = confirmed shadow IT adoption
- New countries in CloudAppEvents + geographic anomalies in anomaly table = travel or compromise
Query 13: EmailEvents — Email Pattern Drift
// Email pattern drift — baseline vs recent comparison
// Tracks volume, send/receive ratio, direction distribution,
// sender diversity, domain diversity, and threat email prevalence
// Substitute <UPN> with user's UPN (matches both sender and recipient)
// NOTE: This table requires XDR connector streaming to Data Lake
let baselineStart = ago(97d);
let baselineEnd = ago(7d);
EmailEvents
| where TimeGenerated >= baselineStart
| where RecipientEmailAddress =~ '<UPN>' or SenderMailFromAddress =~ '<UPN>'
| extend Period = iff(TimeGenerated < baselineEnd, "Baseline", "Recent")
| summarize
TotalEmails = count(),
Days = dcount(bin(TimeGenerated, 1d)),
SentCount = countif(SenderMailFromAddress =~ '<UPN>'),
ReceivedCount = countif(RecipientEmailAddress =~ '<UPN>'),
InboundCount = countif(EmailDirection == "Inbound"),
OutboundCount = countif(EmailDirection == "Outbound"),
IntraOrgCount = countif(EmailDirection == "Intra-org"),
DistinctSenders = dcount(SenderMailFromAddress),
DistinctRecipients = dcountif(RecipientEmailAddress, SenderMailFromAddress =~ '<UPN>'),
DistinctSenderDomains = dcount(SenderMailFromDomain),
ThreatEmails = countif(ThreatTypes != ""),
DistinctSubjects = dcount(Subject),
SenderDomains = make_set(SenderMailFromDomain, 50),
DeliveryActions = make_set(DeliveryAction, 10)
by Period
| order by Period asc
Drift Interpretation for EmailEvents (Corroboration — not scored):
EmailEvents provides qualitative corroboration, not a scored drift dimension. Focus on these signals:
| Signal | Baseline → Recent Change | Risk Implication |
|---|---|---|
| SentCount ↑↑↑ | Sudden spike in outbound email | Potential spam/phishing campaign from compromised account |
| SentCount drops to 0 | User stopped sending email | Account takeover with mail forwarding rule (check OfficeActivity) |
| ThreatEmails ↑ | Increase in threat-flagged inbound | Targeted phishing campaign against user |
| New SenderDomains (inbound) | Domains in Recent not in Baseline | New communication partners or phishing domains |
| IntraOrgCount → 0 (was > 0) | Lost intra-org email patterns | User isolated or moved to different tenant |
| DeliveryAction changes | More "Junked" or "Blocked" in Recent | Email security policies catching more threats |
| DistinctSubjects ↓↓ (with volume ↑) | Many emails with few subjects | Automated/bulk email — potential spam or notification storm |
| OutboundCount ↑ + new recipients | Sudden outbound expansion | Data exfiltration or mass-mailing from compromised mailbox |
Corroboration with other drift signals:
- Outbound email spike + new forwarding rule in OfficeActivity/AuditLogs = email exfiltration (T1114.003)
- ThreatEmails ↑ + Identity Protection risk events + new IPs in SigninLogs = active phishing campaign with partial success
- SentCount → 0 + non-interactive IP drift = account takeover with inbox rule redirect
Report Template
Inline Chat Report Structure
The inline report MUST include these sections in order:
- Header — Workspace, analysis period, drift threshold, data sources
- Interactive Drift Score — 7-dimension breakdown with ratios
- Non-Interactive Drift Score — 6-dimension breakdown with ratios
- Flagged Dimension Deep Dive (for any dimension > 150%) — Baseline vs. recent comparison, new IPs/apps/devices, dimension bar chart
- Correlated Signal Summary — AuditLogs, SecurityAlert/Incident, and anomaly table findings in a single table
- Identity Protection Summary — Risk events, risk states, risk levels
- Cloud App Activity Drift — CloudAppEvents baseline vs. recent: action types, apps, admin ops, impersonation, new countries/IPs
- Email Pattern Drift — EmailEvents baseline vs. recent: volume, direction, sender domains, threat emails
- Security Assessment — Emoji-coded findings table with evidence citations
- Verdict Box — Overall risk level, root cause analysis, recommendations
Markdown File Report Structure
When outputting to markdown file, include everything from the inline format PLUS:
Filename pattern: reports/scope-drift/user/Scope_Drift_Report_<username>_YYYYMMDD_HHMMSS.md
# User Account Scope Drift Report
**Generated:** YYYY-MM-DD HH:MM UTC
**Workspace:** <workspace_name>
**User:** <UPN>
**Baseline Period:** <start> → <end> (90 days)
**Recent Period:** <start> → <end> (7 days)
**Drift Threshold:** 150%
**Data Sources:** SigninLogs, AADNonInteractiveUserSignInLogs, AuditLogs, SecurityAlert, Signinlogs_Anomalies_KQL_CL, Identity Protection, CloudAppEvents, EmailEvents
---
## Executive Summary
<1-3 sentence summary: interactive drift score, non-interactive drift score, overall risk level>
---
## Interactive Sign-In Drift
**Drift Score: XX.X%** — <status emoji> <Contracting/Stable/Expanding>
<LaTeX formula block>
**ASCII Drift Dimension Chart (REQUIRED):**
Render a box-drawn chart inside a code fence. **Inner width: 58 chars** (every line between `│` markers = exactly 58 visual characters). No emoji inside boxes — use text labels.
**Alignment:** Name (9 chars padded) + weight (5) + gap (2) + bars (20 `█─`) + gap (2) + pct (6, right-aligned: `XXX.X%` or ` XX.X%`) + gap (2) + direction (10 total: `^`/`v`/`=` + 9 trailing spaces, or FailRate: delta like `v-X.XX` + 4 trailing spaces). Status labels (centered): `STABLE`, `STABLE (Low-Volume)`, `NEAR THRESHOLD`, `ABOVE THRESHOLD`, `CRITICAL`. Direction: `^` (up), `v` (down), `=` (stable).
**Bar characters:** Use `█` (U+2588 full block) for filled portions and `─` (U+2500 box-drawing horizontal) for the unfilled track.
┌──────────────────────────────────────────────────────────┐
│              INTERACTIVE DRIFT SCORE: XX.X               │
│                          STABLE                          │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  Volume   (25%)  ██████──────────────  XXX.X%  ^         │
│  Apps     (20%)  ███─────────────────   XX.X%  v         │
│  Resources(10%)  ██████──────────────  XXX.X%  =         │
│  IPs      (15%)  █───────────────────   XX.X%  v         │
│  Locations(10%)  ███─────────────────   XX.X%  =         │
│  Devices  (10%)  ██──────────────────   XX.X%  v         │
│  FailRate (10%)  ██████──────────────  XXX.X%  v-X.XX    │
│                                                          │
│  ────────────────────────── 100% baseline ──┤            │
│                        150% drift threshold ▲            │
└──────────────────────────────────────────────────────────┘
**Bar fill:** 20 chars wide. Filled = round(ratio/100 × 20), capped at 20. Title and status: center within 58 chars. Use `█` for filled, `─` for unfilled.
**Then** render the standard markdown dimension table:
| Dimension | Weight | Baseline (90d) | Recent (7d) | Ratio | Weighted | Status |
|-----------|--------|----------------|-------------|-------|----------|--------|
<New apps, IPs, locations, devices appearing only in recent period>
---
## Non-Interactive Sign-In Drift
**Drift Score: XX.X%** — <status emoji> <Contracting/Stable/Expanding>
<LaTeX formula block>
**ASCII Drift Dimension Chart (REQUIRED):**
Same box-drawn format as Interactive. **Inner width: 58 chars.** 6 dimensions (no Devices):
┌──────────────────────────────────────────────────────────┐
│            NON-INTERACTIVE DRIFT SCORE: XX.X             │
│                          STABLE                          │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  Volume   (30%)  ███████─────────────  XXX.X%  ^         │
│  Apps     (20%)  ████────────────────   XX.X%  v         │
│  Resources(15%)  █████───────────────  XXX.X%  =         │
│  IPs      (15%)  ██──────────────────   XX.X%  v         │
│  Locations(10%)  ███─────────────────   XX.X%  =         │
│  FailRate (10%)  ███████████─────────  XXX.X%  ^+X.XX    │
│                                                          │
│  ────────────────────────── 100% baseline ──┤            │
│                        150% drift threshold ▲            │
└──────────────────────────────────────────────────────────┘
**Then** render the standard markdown dimension table:
| Dimension | Weight | Baseline (90d) | Recent (7d) | Ratio | Weighted | Status |
|-----------|--------|----------------|-------------|-------|----------|--------|
<New apps, IPs, locations appearing only in recent period>
---
## Account Configuration Changes
<AuditLogs findings: password changes, MFA changes, role assignments, group memberships>
---
## Pre-Computed Anomalies
<Signinlogs_Anomalies_KQL_CL findings or gap note if table unavailable>
---
## Identity Protection
<Risk events, risk states, risk levels from SigninLogs>
---
## Cloud App Activity Drift
<CloudAppEvents baseline vs. recent comparison — action types, apps, IPs, countries, admin/external/impersonated operations>
<New actions, new apps, new countries appearing only in recent period>
<Corroboration notes — cross-reference with AuditLogs, SigninLogs>
<If table unavailable: "⚠️ CloudAppEvents table not available in this workspace — XDR connector may not be streaming to Data Lake.">
---
## Email Pattern Drift
<EmailEvents baseline vs. recent comparison — volume, sent/received, direction, sender domains, threat emails>
<Notable changes — outbound spikes, new sender domains, threat email trends>
<Corroboration notes — cross-reference with OfficeActivity for forwarding rules, Identity Protection for phishing>
<If table unavailable: "⚠️ EmailEvents table not available in this workspace — XDR connector may not be streaming to Data Lake.">
---
## Correlated Security Alerts
| Data Source | Finding | Incident Status |
|-------------|---------|-----------------|
| SigninLogs | ... | N/A |
| AADNonInteractiveUserSignInLogs | ... | N/A |
| AuditLogs | ... | N/A |
| Signinlogs_Anomalies_KQL_CL | ... | N/A |
| CloudAppEvents | ... | N/A |
| EmailEvents | ... | N/A |
| SecurityAlert / SecurityIncident | <Group by ProductName, translate to current branding> | <Status: New/Active/Closed, Classification: TP/FP/BP> |
---
## Security Assessment
| Factor | Finding |
|--------|---------|
| 🔴/🟢/🟡 **Factor** | Evidence-based finding |
---
## Verdict
**ASCII Verdict Box (REQUIRED):**
Render a box-drawn verdict summary inside a code fence. **Inner width: 66 chars.** No emoji inside boxes. Pad every line to exactly 66 chars between `│` markers.
┌──────────────────────────────────────────────────────────────────┐
│  OVERALL RISK: <LEVEL> -- <One-line summary>                     │
│  Interactive Score:     XX.X (< 80 = Contracting)                │
│  Non-Interactive Score: XX.X (< 80 = Contracting)                │
│  Root Cause: <Brief root cause explanation>                      │
└──────────────────────────────────────────────────────────────────┘
**Then** render the full verdict with:
- Root Cause Analysis paragraph
- Key Findings (numbered list)
- Recommendations (emoji-prefixed list)
---
## Appendix: Query Details
Render a single markdown table summarizing all queries executed. **Do NOT include full KQL text** — the canonical queries are already documented in this SKILL.md file (Queries 6–13). The appendix serves as an audit trail only.
| Query | Table(s) | Records Scanned | Results | Execution |
|-------|----------|----------------:|--------:|----------:|
| Q6 — Interactive Baseline vs. Recent | SigninLogs | X,XXX | N rows | X.XXs |
| Q7 — Non-Interactive Baseline vs. Recent | AADNonInteractiveUserSignInLogs | XX,XXX | N rows | X.XXs |
| ... | ... | ... | ... | ... |
*Query definitions: see Queries 6–13 in this SKILL.md file.*
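The box-drawing rules in the template above (20-character bars, fixed inner widths of 58 and 66 characters) are easy to get wrong by hand. The following is a minimal Python sketch of those rules, not part of the skill itself: the function names are hypothetical, the two-space left margin is an assumption (the documented columns add up to 56 of the 58 inner characters), and the numbers in the demo are made-up placeholders.

```python
FULL = "\u2588"   # filled bar segment (full block)
TRACK = "\u2500"  # unfilled track (box-drawing horizontal)


def render_bar(ratio_pct: float, width: int = 20) -> str:
    """Return a fixed-width bar: filled = round(ratio/100 * width), capped at width."""
    # int(x + 0.5) rounds halves up; the built-in round() rounds halves to even.
    filled = int(ratio_pct / 100 * width + 0.5)
    filled = max(0, min(filled, width))
    return FULL * filled + TRACK * (width - filled)


def render_row(name: str, weight_pct: int, ratio_pct: float, direction: str) -> str:
    """Build one row: name(9) weight(5) gap(2) bar(20) gap(2) pct(6) gap(2) direction(10)."""
    weight = f"({weight_pct}%)"
    pct = f"{ratio_pct:.1f}%"
    return f"{name:<9}{weight:>5}  {render_bar(ratio_pct)}  {pct:>6}  {direction:<10}"


def render_box(lines: list[str], inner_width: int = 58) -> str:
    """Frame lines in a box, padding each one to exactly inner_width characters."""
    top = "\u250c" + "\u2500" * inner_width + "\u2510"
    bottom = "\u2514" + "\u2500" * inner_width + "\u2518"
    body = []
    for line in lines:
        if len(line) > inner_width:
            raise ValueError(f"line exceeds inner width {inner_width}: {line!r}")
        body.append("\u2502" + line.ljust(inner_width) + "\u2502")
    return "\n".join([top, *body, bottom])


if __name__ == "__main__":
    # Placeholder values for illustration only.
    rows = [
        "INTERACTIVE DRIFT SCORE: 87.5".center(58),
        "STABLE".center(58),
        "",
        "  " + render_row("Volume", 25, 112.4, "^"),
        "  " + render_row("Apps", 20, 64.0, "v"),
    ]
    print(render_box(rows))
```

Note that `len()` counts code points rather than display columns, which is one reason the template forbids emoji inside boxes. Calling `render_box(lines, inner_width=66)` gives the verdict-box width.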
Known Pitfalls
SecurityAlert.Status Is Immutable — Always Join SecurityIncident
Problem: The Status field on SecurityAlert is set to "New" at creation time and never changes. It does NOT reflect whether the alert has been investigated, closed, or classified.
Solution: MUST join with SecurityIncident to get real Status (New/Active/Closed) and Classification (TruePositive/FalsePositive/BenignPositive). See Query 11 which implements this join.
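To make the join logic concrete, here is a small Python sketch of the same idea on in-memory records. It is an illustration under assumptions, not a transcription of Query 11: it assumes each SecurityIncident record carries an `AlertIds` list that references `SystemAlertId` values, that the incident table holds one record per update (so only the newest record per `IncidentNumber` counts), and the `"NoIncident"` label is a hypothetical placeholder.

```python
def resolve_alert_status(alerts, incidents):
    """Map each alert ID to the (Status, Classification) of its latest incident record."""
    # Keep only the most recent record for every incident.
    latest = {}
    for inc in incidents:
        key = inc["IncidentNumber"]
        if key not in latest or inc["TimeGenerated"] > latest[key]["TimeGenerated"]:
            latest[key] = inc

    # Index incident state by the alert IDs each incident contains.
    by_alert = {}
    for inc in latest.values():
        for alert_id in inc["AlertIds"]:
            by_alert[alert_id] = (inc["Status"], inc["Classification"])

    # Alerts without an incident get an explicit placeholder instead of "New".
    return {
        a["SystemAlertId"]: by_alert.get(a["SystemAlertId"], ("NoIncident", ""))
        for a in alerts
    }


alerts = [{"SystemAlertId": "a1", "Status": "New"}, {"SystemAlertId": "a2", "Status": "New"}]
incidents = [
    {"IncidentNumber": 7, "TimeGenerated": "2024-01-01T10:00:00Z",
     "Status": "New", "Classification": "", "AlertIds": ["a1"]},
    {"IncidentNumber": 7, "TimeGenerated": "2024-01-02T09:00:00Z",
     "Status": "Closed", "Classification": "FalsePositive", "AlertIds": ["a1"]},
]
print(resolve_alert_status(alerts, incidents))
```

Both alerts report `Status` "New" on their own records; only the incident lookup shows that `a1` was closed as a false positive.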
Low-Volume Statistical Inflation
Problem: Entities with very low baseline activity (e.g., 1 sign-in/day) will show extreme volume ratios even with minor changes.
Solution: Apply the denominator floor (minimum 10 sign-ins/day for volume ratio calculation). Always flag low-volume baselines in the report.
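A short Python sketch of the denominator floor, assuming the volume ratio is the recent daily average divided by the baseline daily average, expressed as a percentage. The function name and return shape are hypothetical.

```python
def volume_ratio(baseline_daily_avg: float, recent_daily_avg: float, floor: float = 10.0):
    """Return (ratio_pct, low_volume_flag) with the baseline clamped to the floor."""
    denominator = max(baseline_daily_avg, floor)  # never divide by less than the floor
    ratio_pct = recent_daily_avg / denominator * 100
    low_volume = baseline_daily_avg < floor       # flag for the report
    return ratio_pct, low_volume


# Baseline of 1 sign-in/day, recent 5/day: 500% without the floor, 50% with it.
print(volume_ratio(1, 5))
# A normal-volume account is unaffected by the floor.
print(volume_ratio(40, 60))
```

Returning the flag alongside the ratio keeps the "always flag low-volume baselines" requirement next to the calculation that triggers it.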
Seasonal/Cyclical Baselines
Problem: Some entities have weekly patterns (lower on weekends) or monthly cycles (month-end batch jobs).
Solution: Note if the 7-day recent window falls on an atypical portion of the cycle. The 90-day baseline smooths most cyclical patterns, but edge cases exist.
90-Day IP/App Contraction
Problem: The 90-day baseline captures ISP address rotations, travel IPs, and occasional app usage that won't naturally recur in a 7-day window. This makes user accounts appear to be "contracting" (score < 80) when they are actually stable.
Solution: For user accounts showing contraction, check if the absolute numbers are reasonable. If the user had 30 IPs over 90 days but only 2 in 7 days, this is expected; note it as "natural IP diversity compression" rather than genuine scope reduction.
Non-Interactive Volume Inflation
Problem: Non-interactive sign-ins (token refreshes, background app activity) can number in the thousands per day. A brief outage or token cache flush can cause dramatic volume swings.
Solution: Weight non-interactive drift scores lower in the overall assessment unless corroborated by new apps or IPs. Volume-only drift in non-interactive is rarely meaningful without other signals.
Cross-Join KQL Error
Problem: join kind=inner on 1==1 (cross-join) is NOT supported in KQL Sentinel Data Lake. The SPN query uses separate subqueries joined on ServicePrincipalId, but user queries target a single UPN and cannot use this pattern.
Solution: User queries MUST use the single-pass extend Period = iff(TimeGenerated < baselineEnd, "Baseline", "Recent") pattern with summarize ... by Period. See Queries 6 and 7.
Identity Protection Risk States Lingering
Problem: Risk events (e.g., unfamiliarFeatures, anonymizedIPAddress) may show RiskState == "atRisk" for days/weeks after the triggering event if no admin action is taken.
Solution: Check RiskState carefully. "atRisk" doesn't mean ongoing compromise — it means the risk was never remediated or dismissed. Flag these for admin review but don't automatically escalate drift score.
Device Telemetry Gaps
Problem: DeviceDetail in SigninLogs may be empty or {} for some sign-in types (SSO, mobile apps, headless clients).
Solution: If DistinctDevices is very low (0-1) despite many sign-ins, note the gap rather than treating low device count as meaningful.
🔴 Custom Anomaly Table — CASE-SENSITIVE NAME
Problem: Signinlogs_Anomalies_KQL_CL is a custom table that may not exist in all workspaces. 🔴 CRITICAL: The table name uses lowercase 'l' in "logs" — Signinlogs NOT SigninLogs. KQL custom _CL table names are case-sensitive. LLMs tend to auto-correct this to match the standard SigninLogs table — this WILL cause a SemanticError: Failed to resolve table error. Always copy the exact table name from Query 9.
Solution: If the table returns a SemanticError, first verify you used the correct casing (Signinlogs_Anomalies_KQL_CL). If it still fails after verifying casing, then the table genuinely doesn't exist — skip Query 9 gracefully and note: "⚠️ Custom anomaly table not available in this workspace — skipping pre-computed anomaly check." Do not fail the entire analysis.
CloudAppEvents Uses AccountObjectId, Not UPN
Problem: CloudAppEvents identifies users via AccountObjectId (Entra Object ID GUID), not UserPrincipalName. Querying by UPN will return 0 results.
Solution: Before executing Query 12, resolve the user's Entra Object ID from their UPN using Microsoft Graph API: GET /v1.0/users/<UPN>?$select=id. Use the returned id value as <ACCOUNT_OBJECT_ID> in the query. If Graph API is unavailable, fall back to AccountDisplayName with has operator (less precise — display names are not unique).
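A minimal Python sketch of building the lookup URL for that Graph call. Token acquisition and the HTTP request itself are deliberately omitted, and the function name is hypothetical. The one detail worth showing is URL-encoding: guest UPNs contain `#EXT#`, and an unencoded `#` would be treated as a URL fragment.

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com"


def object_id_lookup_url(upn: str) -> str:
    """Build the Graph URL that returns only the user's object ID."""
    # Keep '@' readable but percent-encode everything else that is unsafe (e.g. '#').
    return f"{GRAPH_BASE}/v1.0/users/{quote(upn, safe='@')}?$select=id"


print(object_id_lookup_url("user@contoso.com"))
print(object_id_lookup_url("alice_contoso.com#EXT#@fabrikam.onmicrosoft.com"))
```

The `id` field of the JSON response is the value to substitute for `<ACCOUNT_OBJECT_ID>`.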
CloudAppEvents/EmailEvents Table Availability
Problem: Both CloudAppEvents and EmailEvents are XDR-native tables that require the Defender XDR connector to stream data into the Sentinel Data Lake. They may not exist in all workspaces.
Solution: If either table is not found, skip the corresponding query gracefully and note: "⚠️ [Table] not available in this workspace — XDR connector may not be streaming to Data Lake." Do not fail the entire analysis. These are corroboration signals, not primary drift dimensions.
CloudAppEvents Empty CountryCode and IPAddress
Problem: Some CloudAppEvents entries (particularly system-initiated or API-driven operations) have empty CountryCode and/or IPAddress fields. These inflate DistinctCountries and DistinctIPs counts with empty string entries.
Solution: The query uses dcount() which counts empty strings as a distinct value. When interpreting results, note that one "country" or "IP" may be an empty string representing internal/system events. In the drift interpretation, focus on named countries and non-empty IPs.
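The effect is easy to reproduce outside KQL. This Python sketch (hypothetical helper name, made-up sample values) compares a raw distinct count with one that ignores empty strings, which is the adjustment to apply mentally when reading DistinctCountries or DistinctIPs.

```python
def distinct_counts(values):
    """Return (raw_distinct, named_distinct); named_distinct ignores empty strings."""
    raw = len(set(values))
    named = len({v for v in values if v})
    return raw, named


# Two system-initiated events with no CountryCode inflate the raw count by one.
country_codes = ["US", "", "DE", "US", ""]
print(distinct_counts(country_codes))
```

If the raw count exceeds the named count, one of the reported "countries" or "IPs" is the empty value.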
EmailEvents ThreatTypes Empty String vs Null
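Problem: The ThreatTypes field in EmailEvents uses an empty string "" for clean emails, not null. A null check such as isnotnull() would miss this distinction, because string values in KQL are never null and the check returns true for empty strings.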
Solution: Query 13 uses ThreatTypes != "" which correctly filters for threat-flagged emails only. When ThreatEmails count is 0 in Recent but > 0 in Baseline, this is a positive signal (fewer threats reaching the user) rather than a drift concern.
EmailEvents Dual-Direction Matching
Problem: Query 13 matches on both RecipientEmailAddress and SenderMailFromAddress, so a single email where the user is both sender and recipient (e.g., sending to self) could be double-counted.
Solution: This edge case is negligible in practice. The SentCount and ReceivedCount breakdowns use explicit directional filters, so the subtotals are accurate even if TotalEmails has minor inflation from self-sent emails.
Error Handling
Common Issues
| Issue | Solution |
|---|---|
| `SigninLogs` table not found | Rare but possible in workspaces without Entra ID P1/P2 logging enabled. Report as blocker. |
| `AADNonInteractiveUserSignInLogs` table not found | Check workspace configuration. Non-interactive logs require diagnostic settings. Skip non-interactive analysis and note the gap. |
| `Signinlogs_Anomalies_KQL_CL` table not found | First check casing: the table name is `Signinlogs` (lowercase 'l'), NOT `SigninLogs`. LLMs frequently auto-correct this. If casing is correct and it still fails, the custom table may not exist in this workspace. Skip Query 9 gracefully with a note; do not fail the analysis. |
| `CloudAppEvents` table not found | XDR connector may not be streaming to Data Lake. Skip Query 12 gracefully with note; do not fail the analysis. These are corroboration signals. |
| `EmailEvents` table not found | XDR connector may not be streaming to Data Lake. Skip Query 13 gracefully with note; do not fail the analysis. These are corroboration signals. |
| `CloudAppEvents` returns 0 results for valid user | Verify `AccountObjectId`: this field uses Entra Object ID (GUID), not UPN. Resolve via Graph API: `GET /v1.0/users/<UPN>?$select=id`. |
| Zero entities in results | Verify the workspace has sign-in data for the user. Check if logging is enabled. Verify UPN spelling. |
| Query timeout | Reduce the baseline window from 90 to 60 days, or add `\| take 100` to intermediate results. |
| `AuditLogs` `has_any` not matching | Ensure IDs are quoted strings in the `dynamic()` array. Use `tostring()` on dynamic fields. |
| `join kind=inner on 1==1` error | Cross-join not supported in KQL. Use single-pass `extend Period = iff(...)` pattern instead. See Queries 6-7. |
| Identity Protection fields empty | `RiskLevelDuringSignIn` may be "none" for all records if Identity Protection is not licensed. Note the gap; don't treat as "no risk." |
Validation Checklist
Before presenting results, verify:
- All applicable data sources were queried (even if some returned 0 results)
- Low-volume denominator floor was applied to any entity with BL_DailyAvg < 10
- Corroborating evidence was checked for every flagged entity
- Empty results are explicitly reported with ✅ (not silently omitted)
- The report includes the drift score formula and threshold for transparency
- SecurityAlert was joined with SecurityIncident for real Status/Classification (never read SecurityAlert.Status directly)
- Incident classifications (TP/FP/BP) were factored into risk assessment — FalsePositive alerts discounted, TruePositive alerts escalated
- Both interactive AND non-interactive drift scores were computed
- IP/app contraction was contextualized (90-day diversity vs 7-day window)
- Identity Protection risk states were checked and reported
- Custom anomaly table was queried (or gap noted if unavailable)
- CloudAppEvents was queried for cloud app activity drift (or gap noted if table unavailable)
- EmailEvents was queried for email pattern drift (or gap noted if table unavailable)
- CloudAppEvents AccountObjectId was resolved from UPN via Graph API (not queried by UPN)
- Device telemetry gaps were noted if DeviceDetail was sparse
SVG Dashboard Generation
📊 Optional post-report step. After a User scope drift report is generated, the user can request a visual SVG dashboard.
Trigger phrases: "generate SVG dashboard", "create a visual dashboard", "visualize this report", "SVG from the report"
How to Request a Dashboard
- Same chat: "Generate an SVG dashboard from the report" — data is already in context.
- New chat: Attach or reference the report file, e.g. `#file:reports/scope-drift/user/Scope_Drift_Report_<user>_<date>.md`
- Customization: Edit svg-widgets.yaml before requesting; the renderer reads it at generation time.
Execution
Step 1: Read svg-widgets.yaml (this skill's widget manifest)
Step 2: Read .github/skills/svg-dashboard/SKILL.md (rendering rules — Manifest Mode)
Step 3: Read the completed report file (data source)
Step 4: Render SVG → save to reports/scope-drift/user/{report_name}_dashboard.svg
The YAML manifest is the single source of truth for layout, widgets, field mappings, colors, and data source documentation. All customization happens there.
.github/skills/sentinel-ingestion-report/SKILL.md
npx skills add SCStelz/security-investigator --skill sentinel-ingestion-report -g -y
SKILL.md
Frontmatter
{
"name": "sentinel-ingestion-report",
"description": "Sentinel Ingestion Report — YAML-driven PowerShell pipeline gathers all data via az monitor\/az rest\/Graph API, writes a deterministic scratchpad, LLM renders the report. Covers table-level volume breakdown, tier classification (Analytics\/Basic\/Data Lake), SecurityEvent\/Syslog\/CommonSecurityLog deep dives, ingestion anomaly detection (24h and WoW), analytic rule inventory via REST API, rule health via SentinelHealth, detection coverage cross-reference, tier migration candidates with DL-eligibility lookup, license benefit analysis (DfS P2 500MB\/server\/day, M365 E5 data grant). Inline chat and markdown file output."
}
Sentinel Ingestion Analysis Report — Instructions
Purpose
This skill generates a comprehensive Sentinel Ingestion Analysis Report covering workspace data volume, table-level breakdown, tier classification, ingestion anomalies, detection coverage, and optimization opportunities.
Entity Type: Sentinel workspace (from config.json)
| Scope | Primary Tables | Use Case |
|---|---|---|
| Workspace-wide (default) | `Usage`, `SentinelHealth`, `SentinelAudit` | Full ingestion and cost analysis |
| Per-table deep dive | `SecurityEvent`, `Syslog`, `CommonSecurityLog` + any table | Granular breakdown of high-volume tables |
What this report covers: Table-level volume breakdown with tier classification (Analytics/Basic/Data Lake), SecurityEvent/Syslog/CommonSecurityLog deep dives, ingestion anomaly detection (24h and week-over-week), analytic rule inventory with detection coverage cross-reference, rule health monitoring, tier migration candidates with DL-eligibility assessment, and license benefit analysis (DfS P2 and M365 E5).
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ YAML query files PowerShell script LLM render │
│ queries/phase1-5/ ──→ Invoke-IngestionScan ──→ Phase 6 │
│ (23 .yaml files) .ps1 (~2600 lines) (SKILL- │
│ • az monitor (KQL) report.md) │
│ • az rest (REST API) │
│ • az monitor table list │
│ • Invoke-MgGraphRequest │
│ ↓ │
│ temp/ingest_scratch_<ts>.md │
│ (~50 KB, 64 sections) │
└─────────────────────────────────────────────────────────────────┘
Execution model:
- Phases 1-5 (data gathering): Fully automated by `Invoke-IngestionScan.ps1`. KQL queries run via `az monitor log-analytics query`. Non-KQL data (analytic rules, tier classifications, custom detections) is gathered via REST API, Azure CLI, and Microsoft Graph.
- Phase 6 (rendering): LLM reads the scratchpad + `SKILL-report.md` and renders the report. This is the only phase requiring LLM involvement.
Design decision — TopRecommendations: The Top 3 Recommendations are computed by the LLM at render time (Phase 6), not pre-computed by PS1. Three of the seven Rule E categories (Data loss, DCR filter, Split ingestion) require cross-section reasoning that spans multiple scratchpad sections — this is precisely what the LLM excels at. The PS1 provides all the raw data; the LLM applies Rule E scoring across it.
Companion Files — When to Load
This skill spans 6 files. Load only the file(s) needed for the current phase:
| File | Purpose | When to Load |
|---|---|---|
| SKILL.md (this file) | Architecture, workflow, rendering rules, domain reference | Always (primary entry point) |
| SKILL-report.md | Report templates (§1-§8), section-to-scratchpad mapping, formatting rules | Phase 6 rendering only |
| SKILL-drilldown.md | Post-report drill-down: rule cross-referencing (AR + CD via Graph API), ASIM parser verification, known pitfalls, error handling | After report is generated, when user asks follow-up questions (see §13 summary) |
| Invoke-IngestionScan.ps1 | PowerShell data-gathering pipeline (Phases 1-5) | Execution only; no need to read unless debugging |
| slice_scratch.py | Read-only verbatim block slicer: extracts `## PRERENDERED` tables/skeleton byte-for-byte so they aren't mangled by hand-copy | Phase 6 rendering (optional but recommended) |
| render_dashboard.py | Deterministic SVG dashboard renderer: parses scratchpad + report + svg-widgets.yaml into the 7-row dashboard (no hardcoded run data) | SVG Dashboard Generation (default; run this when asked to visualize) |
📑 TABLE OF CONTENTS
- Quick Start - 3-step execution pattern
- Critical Workflow Rules - Prerequisites and prohibitions
- Execution Workflow - Phases 0-6
- Query File Reference - All 23 YAML files
- Output Modes - Inline chat vs. Markdown file
- Deterministic Rendering Rules - Rules A-G (mandatory for Phase 6)
- Domain Reference - SecurityEvent, Syslog, CommonSecurityLog interpretation
- Tier Classification - Analytics vs Basic vs Data Lake background
- Migration Classification - Zero-rule table categorization for §7a
- Reference: Data Lake Migration - DL-eligible tables, decision matrix, trade-off analysis
- Reference: License Benefits - DfS P2 / E5 pool calculations
- Report Template - JIT pointer → SKILL-report.md
- Post-Report Drill-Down Reference - Rule cross-referencing, Custom Detection API, ASIM verification, error handling
- SVG Dashboard Generation - Visual dashboard from completed report
Quick Start (TL;DR)
3-step execution pattern:
Step 1: Run Invoke-IngestionScan.ps1 (Phases 1-5 — data gathering)
Step 2: Read scratchpad + SKILL-report.md (Phase 6 prep)
Step 3: Render full report (§1-§8) → create_file
Step 1: Run Data Gathering
# From workspace root — run all phases (default: 30 days):
& ".github/skills/sentinel-ingestion-report/Invoke-IngestionScan.ps1"
# Specify a custom window (1, 7, 30, 60, or 90 days):
& ".github/skills/sentinel-ingestion-report/Invoke-IngestionScan.ps1" -Days 7
# Or run a specific phase (for re-runs / debugging):
& ".github/skills/sentinel-ingestion-report/Invoke-IngestionScan.ps1" -Phase 3
# Synthetic mode — use pre-built test data (no Azure auth required):
& ".github/skills/sentinel-ingestion-report/Invoke-IngestionScan.ps1" -SyntheticDataDir ".github/skills/sentinel-ingestion-report/test-data/enterprise"
Synthetic mode: When the user asks to generate a report using "synthetic data" or "test data", use -SyntheticDataDir pointing to the enterprise test data directory. This bypasses all Azure/Sentinel queries and loads pre-built JSON files instead. Useful for testing report rendering without live workspace access.
Output: Scratchpad file at temp/ingest_scratch_<timestamp>.md (~50 KB, 64 sections).
Timing: Full run (Phase 0 = all phases) takes ~20-25 seconds. Individual phases: 3-8 seconds each.
Step 2: Load Rendering Context
- Read the scratchpad file (path printed by PS1 at completion)
- Read SKILL-report.md for rendering templates
Step 3: Render Report (Single Write)
Render the complete report (§1-§8) in a single create_file call. Apply SKILL-report.md templates to scratchpad data, following Rules A-G. Render all 8 sections (Executive Summary, Ingestion Overview, Deep Dives, Anomaly Detection, Detection Coverage, License Benefit Analysis, Optimization Recommendations, Appendix) and write to the report file.
⛔ Single-write requirement: The entire report MUST be rendered in one create_file call. Do NOT split rendering across multiple tool calls — splitting causes the LLM to lose template context for later sections (§5-§8), resulting in heading drift, column mutations, and invented content. The complete SKILL-report.md template must be active throughout the entire generation.
🔴 Verbatim table/skeleton blocks — use the deterministic slicer, never hand-copy. The PS1 pre-renders every table, the ASCII cost-waterfall, and the §-heading skeleton under ## PRERENDERED in the scratchpad (Headings, CostWaterfall, DailyChart, TopTables, DetectionPosture, AnomalyTable, CrossReference, SE_Computer, SE_EventID, SyslogHost/Facility/FacSev/Process, CSL_Vendor/Activity, Migration, HealthAlerts, BenefitSummary, DfSP2Detail, E5Tables, QueryTable, Footer). Copy them with the read-only helper instead of transcribing by hand:
python .github/skills/sentinel-ingestion-report/slice_scratch.py --scratch temp/ingest_scratch_<ts>.md --list
python .github/skills/sentinel-ingestion-report/slice_scratch.py --scratch temp/ingest_scratch_<ts>.md --section AnomalyTable
The slicer prefers the ## PRERENDERED copy when a section name also exists as a raw data block, folds the nested ### lines of the Headings skeleton into one block (so --section Headings returns the full §-heading lock list), strips pipeline scaffolding (<!-- … --> comments, SectionTitle: markers), preserves #### sub-headings, and collapses blank runs — so the output drops straight into the report as a valid markdown table or fenced block. Do NOT paste the raw Key | Value | … data blocks (the early raw sections with a <!-- header --> comment and no |---| separator row) — they render as plain text, not tables, and dumping the whole scratchpad tail into one section corrupts the report.
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
Before starting ANY ingestion report:
1. Run `Invoke-IngestionScan.ps1`: this single script handles ALL data gathering (Phases 1-5). The LLM does NOT run queries, transcribe output, or write scratchpad sections
2. Read `config.json` for workspace ID, tenant, subscription, and Azure MCP parameters
3. ALWAYS ask the user for output mode if not specified: inline chat summary, markdown file report, or both (default: both)
4. ALWAYS ask the user for timeframe if not specified: supported values are 1, 7, 30 (default), 60, or 90 days. The `-Days` parameter controls the primary window; deep-dive and comparison windows are derived automatically
Date Window Model
The -Days parameter drives three time windows used across all queries:
| Window | Token | Derivation | Purpose |
|---|---|---|---|
| Primary | `{days}` | = `-Days` value | Usage overview (Q1-Q3), alert firing (Q12), license benefits (Q17/Q17b), tier summary (Q10b) |
| Deep-dive | `{deepDiveDays}` | ≤7→Days, ≤30→7, ≤60→14, ≤90→30 | Table breakdowns (Q4-Q8), rule health (Q11/Q11d), cross-ref (Q13), migration candidates (Q16), WoW "this period" (Q15) |
| Comparison | `{wowTotalDays}` | = deepDiveDays × 2 | Period-over-period total lookback (Q15) |
Example: -Days 60 → primary=60d, deep-dive=14d, comparison=28d
Dynamic period labels: Report column headers adapt automatically ("This Week"/"Last Week" for 7d deep-dive, "This Month"/"Last Month" for 30d, "This Period"/"Last Period" for 14d).
Exception: Q14 (24h anomaly detection) is unaffected by -Days — it uses fixed algorithmic constants (P30D lookback, 29-day weekday baseline).
5. ALWAYS use create_file for markdown reports (NEVER use PowerShell terminal commands)
6. ALWAYS sanitize PII from saved reports — use generic placeholders for real hostnames, workspace names, and tenant GUIDs in committed files
7. Read scratchpad + SKILL-report.md before rendering — the scratchpad is the sole data source for the report
8. Tier display convention — Azure CLI reports Data Lake tier tables as plan Auxiliary internally, but always refer to this tier as "Data Lake" in output — never use "Auxiliary"
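The Date Window Model described under rule 4 reduces to a small lookup. The sketch below restates it in Python for illustration only; the real derivation lives in the PowerShell script, and the function name and dictionary shape here are hypothetical.

```python
SUPPORTED_DAYS = (1, 7, 30, 60, 90)


def derive_windows(days: int) -> dict:
    """Derive the deep-dive and comparison windows from the -Days value."""
    if days not in SUPPORTED_DAYS:
        raise ValueError(f"-Days must be one of {SUPPORTED_DAYS}, got {days}")
    if days <= 7:
        deep_dive = days      # <=7 -> same as the primary window
    elif days <= 30:
        deep_dive = 7
    elif days <= 60:
        deep_dive = 14
    else:
        deep_dive = 30        # <=90
    return {"days": days, "deepDiveDays": deep_dive, "wowTotalDays": deep_dive * 2}


for d in SUPPORTED_DAYS:
    print(derive_windows(d))
```

For `-Days 60` this yields primary=60, deep-dive=14, comparison=28, matching the example in the Date Window Model. Q14 is not covered because it ignores `-Days`.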
Prerequisites
| Dependency | Required By | Setup |
|---|---|---|
| Azure CLI (`az`) | All KQL queries (`az monitor log-analytics query`), analytic rule inventory (`az rest`), tier classification (`az monitor log-analytics workspace table list`) | Install: aka.ms/installazurecli. Authenticate: `az login --tenant <tenant_id>` then `az account set --subscription <subscription_id>` |
| `log-analytics` extension | `az monitor log-analytics query` (all KQL queries in Phases 1-5) | Install: `az extension add --name log-analytics`. Verify: `az extension list --query "[?name=='log-analytics']"` |
| Azure RBAC | Azure CLI calls above | Log Analytics Reader on the workspace (KQL queries + table list). Microsoft Sentinel Reader on the workspace (analytic rule inventory via `az rest`) |
| Microsoft.Graph PowerShell | Q9b (Custom Detection rules via `Invoke-MgGraphRequest`) | `Install-Module Microsoft.Graph.Authentication -Scope CurrentUser`. Required Graph scope: `CustomDetection.Read.All` (interactive consent on first run). PS1 skips gracefully if module not installed or auth fails |
| PowerShell 7.0+ | Parallel query execution | `ForEach-Object -Parallel` requires PS7+ |
🔴 PROHIBITED
- ❌ Running KQL queries via MCP tools during data gathering — PS1 handles all queries
- ❌ Writing or modifying scratchpad sections manually — PS1 is the sole writer
- ❌ Reporting cost in dollar amounts — always use GB savings (e.g., "~78.7 GB/month savings")
- ❌ Fabricating ingestion volumes, device names, or anomaly percentages
- ❌ Overriding DL eligibility classification from PS1 output based on LLM knowledge
- ❌ Rendering the report without first reading the scratchpad file
Execution Workflow
Phase 0: Initialization
- Read `config.json` for `sentinel_workspace_id`, `subscription_id`, Azure MCP parameters
- Confirm output mode and timeframe with user (pass `-Days` to PS1; default 30)
- Verify prerequisites: `az login` session active, correct subscription set
Phases 1-5: Data Gathering (automated by PS1)
Run Invoke-IngestionScan.ps1 — it handles all 5 phases automatically:
| Phase | Queries | Description | Execution Type |
|---|---|---|---|
| 1 | Q1, Q2, Q3 | Core ingestion overview — Usage by DataType, daily trend, workspace summary | KQL (parallel) |
| 2 | Q4, Q5, Q6a, Q6b, Q6c, Q7, Q8 | Table deep dives — SecurityEvent, Syslog, CommonSecurityLog breakdowns | KQL (parallel) |
| 3 | Q9, Q9b, Q10, Q10b | External data — analytic rule inventory (REST), custom detections (Graph), tier classification (CLI), tier summary (KQL) | REST + Graph + CLI + KQL (sequential, with depends_on) |
| 4 | Q11, Q11d, Q12, Q13 | Detection coverage — rule health (SentinelHealth), alert firing (SecurityAlert), cross-reference (all tables with data vs. rule inventory) | KQL (parallel) + post-processing |
| 5 | Q14, Q15, Q16, Q17, Q17b | Anomaly detection + cost analysis — 24h anomaly, WoW comparison, migration candidates, license benefits, E5 per-table | KQL (parallel) + post-processing |
Post-processing (automated by PS1, Phases 4-5):
| Task | Phase | Description |
|---|---|---|
| Table cross-reference | 4 | For each table with data (Q13), regex-search all enabled rule queries for that table name |
| ASIM parser detection | 4 | Search all rule queries for ASIM function patterns (_Im_, _ASim_, imDns, etc.) |
| Value-level rule verification | 4 | For each EventID/Facility/ProcessName/Activity/Vendor from deep dives, check if any rules reference it |
| Detection gap detection | 4 | Identify tables on DL/Basic tier that have enabled rules (🔴 critical finding) |
| Anomaly severity classification | 5 | Apply Rule A thresholds to Q14/Q15 results |
| DL eligibility classification | 5 | Classify all tables using hardcoded $dlYes/$dlNo reference arrays |
| Migration table assembly | 5 | Cross-reference volume × rule count × tier × DL eligibility → category assignment |
| License benefit computation | 5 | Compute DfS P2 pool, E5 grant breakdown |
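The table cross-reference task in the list above can be pictured with a short sketch. This is illustrative Python, not the PS1 implementation; the function name cross_reference, the sample rules, and the choice of a word-boundary regex are all assumptions made for the example.

```python
import re

def cross_reference(tables, rules):
    """Map each table name to the names of enabled rules whose query mentions it."""
    result = {}
    for table in tables:
        # \b keeps "Syslog" from matching inside a longer identifier such as "Syslog_CL"
        pattern = re.compile(r"\b" + re.escape(table) + r"\b")
        result[table] = [
            rule["name"]
            for rule in rules
            if rule["enabled"] and pattern.search(rule["query"])
        ]
    return result

# Hypothetical rule inventory, for illustration only
rules = [
    {"name": "SSH brute force", "enabled": True, "query": "Syslog | where ProcessName == 'sshd'"},
    {"name": "Disabled rule", "enabled": False, "query": "SecurityEvent | where EventID == 4625"},
    {"name": "Custom feed", "enabled": True, "query": "Syslog_CL | take 10"},
]
hits = cross_reference(["Syslog", "SecurityEvent"], rules)
```

A table that maps to an empty list is a zero-rule table, which is the input the migration classification later relies on.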
Scratchpad output: PS1 writes all results to temp/ingest_scratch_<timestamp>.md (~50 KB, ~64 named sections including PHASE_*, PRERENDERED.*, and META blocks). See SKILL-report.md for the Section-to-Scratchpad Mapping.
Phase 6: Render Output (LLM)
🔴 MANDATORY — Load scratchpad + report template before rendering:
- Read the scratchpad file (path printed by PS1). This single file contains ALL data from Phases 1-5.
- Read SKILL-report.md for the complete rendering templates and formatting rules.
Pre-render validation:
- Verify scratchpad has all 5 phase sections (PHASE_1 through PHASE_5)
- Check that PHASE_5.DL_Script_Output is populated (proof of DL classification execution)
- Cross-validate: Q11 TotalRulesInHealth against Q9 AR_Enabled; if the gap exceeds 10%, note it
Render — Section-by-Section Checklist:
Render the report section by section per SKILL-report.md templates. Do NOT skip any section. If a section's data returned 0 results, render the section header with a "✅ No anomalies/items found" note.
| Section | Data Source (scratchpad keys) | Required |
|---|---|---|
| §1 | All phases | ✅ Workspace at a Glance, Cost Waterfall, Detection Posture, Top 3 |
| §2 | PHASE_1.Tables, PHASE_3.TierSummary | ✅ Table breakdown + tier summary |
| §3 | PRERENDERED.SE_*, PRERENDERED.Syslog*, PRERENDERED.CSL_* | ✅ Deep dives (skip sub-section only if table not in top 20) |
| §4 | PHASE_5.Anomaly24h/AnomalyWoW, PHASE_1.DailyTrend | ✅ Anomaly table + daily trend chart |
| §5 | PHASE_3.RuleInventory, PHASE_4.* | ✅ Rule inventory + cross-ref + health |
| §6 | PHASE_5.LicenseBenefits/E5_Tables | ✅ DfS P2 + E5 analysis |
| §7 | PHASE_5.Migration, PHASE_4.CrossRef | ✅ Migration candidates + recommendations |
| §8 | All | ✅ Appendix (query reference, methodology) |
Compute Top 3 Recommendations using Rule E: scan all scratchpad sections, score each candidate, select the top 3 by score.
Post-render:
- Render inline chat executive summary (if requested)
- Confirm markdown file path to user
Query File Reference
All queries are defined as YAML files in queries/phase1-5/. PS1 discovers, parses, and executes them automatically.
YAML Format
id: ingestion-q1 # Unique identifier
name: Usage by DataType with Billing Breakdown # Human-readable name
description: Top 20 tables ranked by volume # What it does
phase: 1 # Which phase (1-5)
type: kql # kql | rest | cli | graph
timespan: P{days}D # Placeholder — PS1 substitutes at runtime
query: | # KQL query (multiline block scalar)
  Usage
  | where TimeGenerated > ago({days}d)
  ...
Non-KQL types have additional fields:
| Type | Additional Fields | Description |
|---|---|---|
| rest | url, method, jmespath | Sentinel REST API via az rest |
| cli | command | Azure CLI command (e.g., az monitor log-analytics workspace table list) |
| graph | uri, method | Microsoft Graph API via Invoke-MgGraphRequest |
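The {days} placeholder in the YAML format is substituted at runtime. A minimal sketch of that substitution follows, assuming a plain literal replacement; the helper name substitute_days is hypothetical and the real PS1 logic may differ. A literal replace is used rather than a format call so that any other braces in a KQL query are left alone.

```python
def substitute_days(text: str, days: int) -> str:
    """Replace the literal {days} placeholder; any other braces are left untouched."""
    return text.replace("{days}", str(days))

# Values mirror the YAML example: timespan and the ago() filter share one placeholder
timespan = substitute_days("P{days}D", 30)
query = substitute_days("Usage\n| where TimeGenerated > ago({days}d)", 30)
```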
Complete Query Inventory
| Phase | File | ID | Type | Description |
|---|---|---|---|---|
| 1 | Q1-UsageByDataType.yaml | ingestion-q1 | kql | Top 20 tables by billable volume with solution mapping |
| 1 | Q2-DailyIngestionTrend.yaml | ingestion-q2 | kql | Daily ingestion trend |
| 1 | Q3-WorkspaceSummary.yaml | ingestion-q3 | kql | Executive summary: table count, billable totals, daily average |
| 2 | Q4-SecurityEventByComputer.yaml | ingestion-q4 | kql | SecurityEvent by Computer (top 25) |
| 2 | Q5-SecurityEventByEventID.yaml | ingestion-q5 | kql | SecurityEvent by EventID (top 20) |
| 2 | Q6a-SyslogByHost.yaml | ingestion-q6a | kql | Syslog by source host (top 25) |
| 2 | Q6b-SyslogByFacilitySeverity.yaml | ingestion-q6b | kql | Syslog by Facility × SeverityLevel (top 30) |
| 2 | Q6c-SyslogByProcess.yaml | ingestion-q6c | kql | Syslog top ProcessName by Facility (top 30) |
| 2 | Q7-CSLByVendor.yaml | ingestion-q7 | kql | CommonSecurityLog by DeviceVendor/DeviceProduct (top 20) |
| 2 | Q8-CSLByActivity.yaml | ingestion-q8 | kql | CommonSecurityLog by Activity/LogSeverity/DeviceAction (top 30) |
| 3 | Q9-AnalyticRuleInventory.yaml | ingestion-q9 | rest | Analytic rules (Scheduled + NRT) via Sentinel REST API |
| 3 | Q9b-CustomDetectionRules.yaml | ingestion-q9b | graph | Custom Detection rules via Microsoft Graph SDK |
| 3 | Q10-TableTierClassification.yaml | ingestion-q10 | cli | Table tier classification via Azure CLI |
| 3 | Q10b-TierSummary.yaml | ingestion-q10b | kql | Per-tier volume summary (depends_on: Q10) |
| 4 | Q11-RuleHealthSummary.yaml | ingestion-q11 | kql | SentinelHealth — rule execution health summary |
| 4 | Q11d-FailingRuleDetail.yaml | ingestion-q11d | kql | SentinelHealth — top 20 failing rules detail |
| 4 | Q12-SecurityAlertFiring.yaml | ingestion-q12 | kql | SecurityAlert — top 30 alert-producing rules |
| 4 | Q13-AllTablesWithData.yaml | ingestion-q13 | kql | All tables with data in deep-dive window (for cross-reference) |
| 5 | Q14-IngestionAnomaly24h.yaml | ingestion-q14 | kql | 24h vs same-weekday avg anomaly detection (29d lookback, fallback to flat 7d, >50%, ≥0.01 GB) |
| 5 | Q15-WeekOverWeek.yaml | ingestion-q15 | kql | Period-over-period volume comparison |
| 5 | Q16-MigrationCandidates.yaml | ingestion-q16 | kql | Billable tables with deep-dive volume (for migration analysis) |
| 5 | Q17-LicenseBenefitAnalysis.yaml | ingestion-q17 | kql | DfS P2 + E5 daily ingestion breakdown |
| 5 | Q17b-E5PerTableBreakdown.yaml | ingestion-q17b | kql | E5-eligible per-table volume |
Output Modes
Mode 1: Inline Chat Summary (default for quick requests)
Compact executive summary rendered directly in chat.
Mode 2: Markdown File Report
Full detailed report saved to reports/sentinel/sentinel_ingestion_report_<YYYYMMDD_HHMMSS>.md.
Mode 3: Both (default when user says "report" or "generate report")
Inline chat executive summary + full markdown file.
Ask user if not specified:
"How would you like the report? I can provide:
- Inline chat summary — executive overview in chat
- Markdown file — detailed report saved to reports/sentinel/
- Both (recommended) — summary in chat + full report file"
Deterministic Rendering Rules
These rules eliminate LLM interpretation variance. Apply them EXACTLY during report rendering (Phase 6). No discretion allowed — the thresholds and formulas below are the sole authority.
Rule A: Anomaly Severity Classification
⚙️ Pre-computed by PS1 → PRERENDERED.AnomalyTable. Thresholds below retained for §8 methodology reference and manual verification.
Assign severity to each anomaly row deterministically based on absolute deviation AND volume.
| Condition (both must be true) | Severity | Emoji |
|---|---|---|
| abs(Deviation%) ≥ 200 AND max(Last24hGB, Avg7dGB) ≥ 0.05 GB | High | 🟠 |
| abs(Deviation%) ≥ 100 AND max(Last24hGB, Avg7dGB) ≥ 0.01 GB | Medium | 🟡 |
| abs(Deviation%) ≥ 50 AND max(Last24hGB, Avg7dGB) ≥ 0.01 GB | Low | ⚪ |
| Below thresholds OR both periods < 0.01 GB volume | Excluded | — |
Volume floor: The 0.01 GB minimum is enforced by the KQL queries. Tables below this floor are noise and MUST NOT appear in the anomaly table regardless of deviation percentage.
Override 1 — Rule-count: ANY table with ≥5 enabled rules AND an absolute change ≥40% (in either 24h or WoW) is automatically 🟠 regardless of base thresholds — a significant drop on a table feeding multiple rules signals potential connector or TI feed health issues that affect detection coverage. The 24h override catches same-day connector outages; the WoW override catches gradual multi-day degradation.
Override 2 — Near-zero: ANY table with deviation ≤ −95% AND max(volume) ≥ 0.05 GB is automatically 🟠 regardless of rule count — a near-complete signal loss on a significant table is an operational emergency (e.g., connector failure, API key expiry) even if no rules reference it directly.
⛔ PROHIBITED: Assigning severity based on "judgment", "context", or "this table is important" UNLESS Override 1 (rule-count) or Override 2 (near-zero) above applies. Outside those two overrides, the emoji MUST match the threshold table above; no discretionary overrides.
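For manual verification, the Rule A thresholds and the two overrides can be written as one function. This is a sketch, not the PS1 code: the name classify_anomaly and its parameters are hypothetical, and it assumes the 0.01 GB volume floor is checked before either override, which the text implies but does not state outright.

```python
def classify_anomaly(deviation_pct, last24h_gb, baseline_gb, enabled_rules=0):
    """Return 'High', 'Medium', 'Low', or None (excluded) following the Rule A table."""
    volume = max(last24h_gb, baseline_gb)
    magnitude = abs(deviation_pct)
    if volume < 0.01:
        return None  # volume floor: never shown, whatever the deviation (assumed to run first)
    if enabled_rules >= 5 and magnitude >= 40:
        return "High"  # Override 1: rule-count
    if deviation_pct <= -95 and volume >= 0.05:
        return "High"  # Override 2: near-zero
    if magnitude >= 200 and volume >= 0.05:
        return "High"
    if magnitude >= 100:
        return "Medium"
    if magnitude >= 50:
        return "Low"
    return None
```

Note that a 250% deviation on a table peaking at 0.03 GB falls through to Medium, because the High row requires at least 0.05 GB.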
Rule B: Risk Rating Definition
In the Top 3 Recommendations table (§1), the "Risk" column means:
Risk = the security or operational impact of NOT acting on this recommendation.
| Risk Level | Definition | Examples |
|---|---|---|
| High | Active detection gap or data loss if not addressed | Rules silently failing on DL tier; connector dropping data; 0% detection coverage on critical table |
| Medium | Missed optimization with measurable cost/posture impact | Zero-rule high-volume table on Analytics tier; noisy EventID with no detection value |
| Low | Minor improvement, no immediate security or cost impact | Small-volume table tier change; informational tuning |
⛔ PROHIBITED: Interpreting "Risk" as implementation difficulty, effort, or change management complexity. Those concerns belong in prose recommendations (§7b-d), NOT the Risk column.
Rule C: Weekday Average Computation
⚙️ Pre-computed by PS1 → PRERENDERED.DailyChart. Logic below retained for §8 methodology reference.
When computing per-weekday averages for the §4b daily trend chart:
1. Exclude the report-generation day: If the last day in PHASE_1.DailyTrend matches META.Generated date, always exclude it from weekday averages. The report was generated mid-day, so this is a partial day regardless of its volume. This prevents the partial day from non-deterministically dragging down whichever weekday it falls on.
2. Exclude ingestion gaps: Any remaining day with total ingestion < 0.1 GB is also excluded. These are ingestion reporting gaps, not representative of normal patterns.
3. Formula: Weekday Avg = sum(GB for that weekday, excluding days per rules 1–2) / count(qualifying days for that weekday)
4. Round to 2 decimal places.
⛔ PROHIBITED: Including the report-generation partial day or days with < 0.1 GB in averages — they drag down specific weekdays non-deterministically.
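The Rule C computation can be sketched as follows. The function name weekday_averages and the input shape (a mapping of date to total GB) are assumptions for illustration; PS1 pre-computes the real values.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

def weekday_averages(daily_gb, generated):
    """daily_gb: {date: total GB}; generated: the report-generation date.
    Returns {weekday name: average GB rounded to 2 decimals}."""
    if not daily_gb:
        return {}
    buckets = defaultdict(list)
    last_day = max(daily_gb)
    for day, gb in daily_gb.items():
        if day == last_day and day == generated:
            continue  # partial report-generation day is always excluded
        if gb < 0.1:
            continue  # ingestion gap
        buckets[WEEKDAYS[day.weekday()]].append(gb)
    return {name: round(sum(values) / len(values), 2) for name, values in buckets.items()}

# Hypothetical trend data
trend = {
    date(2026, 2, 2): 4.0,    # Monday
    date(2026, 2, 9): 5.0,    # Monday
    date(2026, 2, 10): 0.05,  # Tuesday, ingestion gap
    date(2026, 2, 16): 1.0,   # Monday, report-generation day (partial)
}
averages = weekday_averages(trend, generated=date(2026, 2, 16))
```

In this sample only the two full Mondays qualify, so Tuesday has no entry at all rather than an average of zero.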
Rule D: Cross-Validation Denominator
In §5b cross-validation (Q11 vs Q9), always use AR-only enabled count from Q9 as the denominator:
Gap% = (Q9_AR_Enabled - Q11_DistinctRules) / Q9_AR_Enabled × 100
Do NOT use Combined_Enabled (AR+CD) as the denominator. SentinelHealth only tracks AR executions (Scheduled + NRT), not Custom Detection executions. Comparing Q11 against combined AR+CD inflates the gap percentage.
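As a worked instance of the Rule D formula, the sketch below computes the gap with the AR-only denominator. The function name and the sample counts are hypothetical; the multiplication is done before the division only to keep the floating-point result tidy.

```python
def health_gap_pct(q9_ar_enabled, q11_distinct_rules):
    """Gap% = (Q9_AR_Enabled - Q11_DistinctRules) / Q9_AR_Enabled x 100."""
    if q9_ar_enabled == 0:
        return None  # no enabled analytic rules, so the gap is undefined
    return (q9_ar_enabled - q11_distinct_rules) * 100 / q9_ar_enabled

# Hypothetical counts: 120 enabled ARs in Q9, 102 distinct rules seen in SentinelHealth
gap = health_gap_pct(120, 102)
needs_note = gap is not None and gap > 10
```

With these sample counts the gap is 15%, which exceeds the 10% threshold used in pre-render validation.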
Rule E: Top 3 Recommendation Ranking
Rank ALL candidate recommendations using this scoring formula. The top 3 by score become the Top 3 in §1. Computed by the LLM at render time by cross-referencing all scratchpad sections.
| Category | SeverityWeight | ImpactValue | Scratchpad Source |
|---|---|---|---|
| 🔴 Detection gap (rules on wrong tier) | 10 | Number of affected rules | PHASE_4.DetectionGaps — PS1 emits Detection gap (XDR) or Detection gap (non-XDR) in §7a Category column |
| 🔴 Data loss / connector failure | 10 | Affected volume in GB/day | PHASE_5.Anomaly24h (large negative deviations) |
| 🟠 DL-eligible migration (zero rules) | 5 | BillableGB from §2a (or deep-dive GB / deepDiveDays × Days if only in §7a) | PHASE_5.Migration (Strong DL-eligible rows) |
| 🟠 DL + KQL Job promotion | 4 | BillableGB (primary window) | High-volume 🟣/🟢 table — can complement split ingestion or stand alone; present both options and note they are combinable |
| 🟠 License benefit activation | 4 | Eligible unclaimed GB/day | PRERENDERED.BenefitSummary + PRERENDERED.E5Tables + PRERENDERED.DfSP2Detail (volume eligible but benefit not yet activated) |
| 🟠 DCR filter / EventID pruning | 4 | Estimated saveable GB (deep dive % × table BillableGB) | PHASE_2.SE_EventID + PHASE_4.ValueRef_EventID |
| 🟠 Health fix (failing rules) | 4 | Number of failing rules | PHASE_4.FailingRules |
| 🟡 Volume spike / cost anomaly | 3 | Spike GB on zero-rule tables | PHASE_5.Anomaly24h (large positive deviations on zero-rule tables — cost spike with no detection value) |
| 🟡 Duplicate ingestion | 3 | Duplicate GB | Cross-ref PRERENDERED.SyslogFacility × PRERENDERED.CSL_Vendor (same-appliance overlap emitting both Syslog and CEF/ASA = double billing) |
| 🟡 Split ingestion | 3 | BillableGB × estimated non-detection fraction | PHASE_2 deep dives + PHASE_4.ValueRef_* (zero-rule values) |
| 🟡 Tier review / unknown eligibility | 2 | BillableGB | PHASE_5.Migration (Unknown rows) |
Score = SeverityWeight × ImpactValue
Sorting: severity-first, then score. 🔴 items always rank above 🟠 items, which always rank above 🟡 items, regardless of score. Within the same severity tier, rank by descending score. This ensures detection gaps and data loss signals are never buried below cost optimizations.
Tie-breaking within same severity: higher score wins. If scores are equal, higher SeverityWeight wins. If still tied, higher ImpactValue wins.
⛔ PROHIBITED: Selecting Top 3 recommendations based on narrative variety, "one from each category", or subjective importance. The formula determines ranking — the LLM renders, it does not curate.
License benefit activation: Surfaces when PRERENDERED.BenefitSummary or PRERENDERED.E5Tables show E5-eligible or DfS-P2-eligible volume that is not yet being claimed (benefit shows 0 or is absent while eligible tables are ingesting). ImpactValue = the eligible GB/day that could be offset.
Volume spike / cost anomaly: Surfaces when PHASE_5.Anomaly24h shows a large positive deviation (>50% above baseline) on a table with zero detection rules (per PHASE_4.CrossRef). A spiking table with no rules has cost impact but no detection value, which is a strong signal to investigate and potentially filter or move to DL.
Duplicate ingestion: Surfaces when the same network appliance sends data via both Syslog and CommonSecurityLog (CEF/ASA). Compare appliance names/IPs in PRERENDERED.SyslogFacility/PRERENDERED.SyslogHost against PRERENDERED.CSL_Vendor; overlapping sources indicate double billing for the same data. ImpactValue = the smaller of the two streams (the duplicate portion).
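The Rule E ordering (severity first, then score, then the stated tie-breakers) can be expressed as a single sort key. This is a sketch under assumed input shapes; the candidate dictionaries and the function name top3 are hypothetical.

```python
SEVERITY_ORDER = {"🔴": 0, "🟠": 1, "🟡": 2}

def top3(candidates):
    """candidates: dicts with 'name', 'severity' (emoji), 'weight', 'impact'."""
    def sort_key(candidate):
        score = candidate["weight"] * candidate["impact"]  # Score = SeverityWeight x ImpactValue
        return (
            SEVERITY_ORDER[candidate["severity"]],  # severity tier always dominates
            -score,                                 # then descending score
            -candidate["weight"],                   # then higher SeverityWeight
            -candidate["impact"],                   # then higher ImpactValue
        )
    return sorted(candidates, key=sort_key)[:3]

# Hypothetical candidates
candidates = [
    {"name": "Volume spike", "severity": "🟡", "weight": 3, "impact": 100},
    {"name": "Health fix", "severity": "🟠", "weight": 4, "impact": 3},
    {"name": "DL migration", "severity": "🟠", "weight": 5, "impact": 40},
    {"name": "Detection gap", "severity": "🔴", "weight": 10, "impact": 2},
]
ranked = [c["name"] for c in top3(candidates)]
```

The volume spike has the highest raw score (300) yet is left out, because every 🔴 and 🟠 item ranks above any 🟡 item.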
Domain Reference
This section provides the domain knowledge needed during Phase 6 rendering. When writing deep dive sections (§3), anomaly analysis (§4), and recommendations (§7), consult these reference tables for interpretation guidance.
SecurityEvent — EventID Optimization
Which EventIDs generate the most volume and their detection vs. cost tradeoff:
| EventID | Description | Optimization Potential |
|---|---|---|
| 4663 | Object access (file auditing) | 🔴 High — often excessive. Consider DCR drop filter or scoping SACL |
| 4624 | Successful logon | 🟡 Medium — valuable for hunting/forensics but rarely in analytic rules. Strong split ingestion candidate: send to Data Lake for retention, keep off Analytics tier |
| 4688 | Process creation | 🟡 Medium — consider moving to MDE DeviceProcessEvents. If no rules reference it, split to Data Lake |
| 4799 | Security group membership enumeration | 🟡 Medium — often noisy on domain controllers |
| 4672 | Special privileges assigned | 🟡 Medium — high volume on DCs |
| 4625 | Failed logon | 🟢 Low — usually valuable for security detection |
🟣 Split ingestion tip: For any deep-dive table classified as 🟢 Keep Analytics (active detection rules), individual high-volume values with zero rule references (verified via Phase 4 value-level check) are strong candidates for sub-table split ingestion. Route those values to Data Lake via DCR transformation — they remain available for hunting while the detection-relevant values stay on Analytics tier. KQL jobs can also run against this split-routed DL data to surface aggregated insights back to Analytics if needed.
Syslog — Facility Reference
Optimization potential by Syslog facility:
| Facility | Description | Optimization Potential |
|---|---|---|
| auth | Authentication events (login, su, getty) | 🟢 Low — always security-relevant. Keep in Analytics tier |
| authpriv | Private authentication (PAM, sudo, sshd) | 🟢 Low — critical for security detection. Always keep |
| kern | Kernel messages (hardware, driver, critical system) | 🟡 Medium — security-relevant but can be noisy. Consider Error+ only for high-volume servers |
| cron | Scheduled task notifications | 🔴 High — rarely security-relevant at Info/Notice. Keep Warning+ only |
| daemon | System daemon messages (systemd, sshd, named, httpd) | 🔴 High — typically largest Syslog contributor (50-80% of volume). Contains both security-critical processes (sshd) and noisy infrastructure (systemd). Drill down with Q6c to identify filterable processes |
| syslog | Internal syslog daemon messages | 🟡 Medium — mostly operational. Keep Warning+ in Analytics |
| user | User-space application messages | 🟡 Medium — varies by application. Check ProcessName |
| mail | Mail subsystem (postfix, sendmail, dovecot) | 🟡 Medium - relevant if mail is in scope; otherwise DL candidate |
| local0–local7 | Custom application logs | 🔴 High — most common cost optimization targets. Custom apps often log at Debug/Info verbosity |
| ftp | FTP daemon messages | 🟢 Low volume; keep for auditing if FTP in use |
| lpr | Print subsystem | 🔴 High — almost never security-relevant. Set to None in DCR |
| news | Network news (NNTP) | 🔴 High — almost never security-relevant. Set to None in DCR |
| uucp | UUCP subsystem | 🔴 High — almost never security-relevant. Set to None in DCR |
| mark | Internal timestamp marker | 🔴 High — operational only. Set to None in DCR |
Syslog — DCR Severity-per-Facility Recommendations
The Data Collection Rule allows setting a minimum severity level per facility — the single most impactful cost control for Syslog:
| Facility | Recommended Minimum | Rationale |
|---|---|---|
| auth, authpriv | Debug (collect all) | Security-critical — never filter |
| kern | Notice | Kernel module loads (T1547.006) and promiscuous mode (T1040) are kern.notice. Volume impact is minimal |
| daemon | Warning or Error | Major volume reduction. Note: sshd auth events go to auth/authpriv, not daemon. Trade-off: loses systemd service stop events at Info (security service tampering) — acceptable if EDR covers this |
| cron | Warning | Trade-off: cron job execution events are cron.info (T1053.003 persistence). Acceptable if auditd or MDE covers cron file monitoring |
| syslog | Warning | Internal operational messages are low-value at Info |
| user | Warning | Unless specific apps produce security telemetry |
| mail | Warning | Info-level mail relay logs are very verbose |
| local0–local7 | Assess per-app | No safe default — network appliances, security tools, and databases commonly use local facilities. Review Q6c (Process by Facility) before setting severity filters |
| lpr, news, uucp, mark | None | Disable collection entirely |
Syslog — SeverityLevel Values
| SeverityLevel (string) | Numeric | Meaning | Retention Priority |
|---|---|---|---|
| emerg | 0 | System unusable | 🔴 Always keep |
| alert | 1 | Immediate action required | 🔴 Always keep |
| crit | 2 | Critical condition | 🔴 Always keep |
| err | 3 | Error condition | 🟡 Keep for most facilities |
| warning | 4 | Warning condition | 🟡 Keep for security-relevant facilities |
| notice | 5 | Normal but significant | 🟡 Keep for auth/authpriv and kern |
| info | 6 | Informational | 🟢 Filter for high-volume facilities |
| debug | 7 | Debug-level detail | 🟢 Filter everywhere except auth/authpriv |
Syslog — ProcessName Security Relevance
| ProcessName | Typical Facility | Security Relevance | Optimization |
|---|---|---|---|
| systemd | daemon | 🟡 Low-Medium — unit start/stop events | 🔴 Often 30-50% of daemon volume. Filter Info/Notice at DCR |
| systemd-logind | daemon | 🟡 Medium — session/seat tracking | Keep Warning+ |
| sshd | auth, authpriv, daemon | 🟢 High — SSH login detection (brute force, lateral movement) | 🟢 Always keep |
| sudo | authpriv | 🟢 High — privilege escalation tracking | 🟢 Always keep |
| su | auth, authpriv | 🟢 High — user switching | 🟢 Always keep |
| CRON / crond | cron | 🟡 Low-Medium — scheduled tasks | Keep Warning+ unless monitoring for T1053 |
| named / bind | daemon | 🟡 Medium — DNS. Relevant for DNS tunneling | Keep if DNS rules exist; otherwise Warning+ |
| httpd / nginx | daemon | 🟡 Medium — web server logs | Assess overlap with WAF/CSL data |
| postfix / sendmail | mail | 🟡 Low-Medium - mail relay | Keep Warning+ |
| dhclient / NetworkManager | daemon | 🟡 Low — DHCP/network changes | Filter Info/Notice |
| kernel | kern | 🟢 Medium-High — kernel events, module loads | Keep Warning+ |
| auditd | daemon, user | 🟢 High — Linux Audit Framework | 🟢 Always keep |
| polkitd | authpriv | 🟡 Medium — PolicyKit authorization | Keep Warning+ |
| dbus-daemon | daemon | 🟡 Low — IPC. Rarely security-relevant | Filter all or keep Error+ |
| rsyslogd / syslog-ng | syslog | 🟡 Low — internal syslog ops | Keep Warning+ |
🟣 Split ingestion tip: If daemon facility accounts for >50% of Syslog and Q6c reveals systemd + systemd-logind + dbus-daemon dominate, consider a DCR transformation routing those processes to Data Lake while keeping sshd, auditd, and other security-critical processes in Analytics. KQL jobs can complement this by querying the DL-routed portion on schedule.
Syslog — Log Forwarding Architecture Note
In environments using centralized rsyslog/syslog-ng forwarders:
- Computer = the log forwarder hostname (many servers collapse to 1-2 forwarders)
- HostName = the actual originating device (from syslog header)
- HostIP = the originating device's IP address
The Q6a query uses SourceHost = iff(isnotempty(HostName) and HostName != Computer, HostName, Computer) to prefer the original source. If Q6a shows only 1-2 hosts despite expecting 100+ servers, the environment uses forwarding.
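The SourceHost expression reads more easily as ordinary branching. The Python below mirrors the iff() logic quoted above for illustration; the function name source_host and the sample hostnames are hypothetical.

```python
def source_host(computer, host_name):
    """Prefer HostName when it is non-empty and differs from Computer, else use Computer."""
    if host_name and host_name != computer:
        return host_name
    return computer

# A forwarded record keeps the originating device; a direct record falls back to Computer
forwarded = source_host("fwd01", "web17")
direct = source_host("web17", "")
```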
CommonSecurityLog — Vendor Reference
| DeviceVendor | DeviceProduct | Optimization Potential |
|---|---|---|
| Palo Alto Networks | PAN-OS | 🔴 High — filter TRAFFIC activity, keep THREAT in Analytics |
| Check Point | Firewall / VPN-1 & FireWall-1 | 🔴 High — filter routine Accept actions |
| Fortinet | Fortigate | 🔴 High — filter traffic subtype, keep utm and event |
| Cisco | ASA | 🟡 Medium — filter by message ID ranges |
| Zscaler | NSSWeblog | 🟡 Medium — web proxy logs can be high volume |
| F5 | BIG-IP ASM / LTM | 🟡 Medium — WAF logs can spike during attacks |
| Trend Micro | Deep Security | 🟢 Low — typically moderate volume |
Firewall traffic/session logs often account for 60-80% of CSL volume. These are primarily TRAFFIC or Accept events with low detection value. Consider DCR transformation, Data Lake tier, split ingestion, or DL + KQL job promotion (these last two can be combined).
CommonSecurityLog — LogSeverity Values
| Value (string) | Value (int) | Meaning | Retention Priority |
|---|---|---|---|
| Very-High | 9-10 | Critical security event | 🔴 Always keep in Analytics |
| High | 7-8 | Significant security event | 🔴 Keep in Analytics |
| Medium | 4-6 | Notable event | 🟡 Review — may be filterable |
| Low | 0-3 | Informational event | 🟢 Candidate for DL or DCR filter |
| (empty/Unknown) | — | Unmapped severity | ⚠️ Check vendor documentation |
DeviceAction optimization: If >70% of events have DeviceAction = "Allow" or "Accept", the table is dominated by permitted traffic. Filter at DCR level or move to Data Lake, keeping only denied/blocked/threat events in Analytics.
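The 70% check can be sketched as a share calculation over per-action event counts. The function name, the input shape, and the sample counts are assumptions; the match is exact on "Allow" and "Accept" as written above, so vendor-specific spellings would need to be added by the reader.

```python
PERMITTED_ACTIONS = ("Allow", "Accept")

def permitted_share(action_counts):
    """action_counts: {DeviceAction value: event count}. Returns the permitted fraction."""
    total = sum(action_counts.values())
    if total == 0:
        return 0.0
    permitted = sum(count for action, count in action_counts.items() if action in PERMITTED_ACTIONS)
    return permitted / total

# Hypothetical counts for one firewall vendor
share = permitted_share({"Allow": 800, "Deny": 150, "Drop": 50})
dominated_by_permitted = share > 0.70
```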
Anomaly Interpretation (Q14/Q15)
24h anomalies (Q14): Flags tables where last-24h ingestion deviates >50% from the same-weekday daily average AND at least one period has ≥0.01 GB volume. Q14 uses a fixed 29-day lookback (algorithmic constant, not affected by -Days).
- Positive spikes: May indicate attacks, misconfigured connectors, or bulk imports
- Negative drops: May indicate connector failures, agent issues, or collection gaps
Period-over-period (Q15): Compares total volume per table between current and prior period (period length = deep-dive window).
- New tables (100% change) → appeared only this period (new connector?)
- Growing tables → expanding collection scope or increased activity
- Shrinking tables → connector removal, collection changes, or seasonal patterns
- Stable high-volume tables → included via ThisWeekMB > 100 filter for visibility
Tier Classification
Background
The Sentinel Usage table does NOT contain a TablePlan or Tier column. There is no KQL-native way to determine whether a table is on Analytics, Basic, or Data Lake tier.
PS1 handles this automatically: Q10 (CLI type) runs az monitor log-analytics workspace table list to fetch table plans, then Q10b (KQL, depends_on: Q10) computes per-tier volume summaries using the CLI output. The results are written to PHASE_3.Tiers and PHASE_3.TierSummary in the scratchpad.
Tier Display Convention
Azure CLI reports Data Lake tier tables as plan Auxiliary internally. Always refer to this tier as "Data Lake" in all output — never use "Auxiliary". The _CL suffix denotes a custom log table, not a copy — describe these as "Custom Data Lake table" (not "Auxiliary copy").
Q10b Cross-Reference Query
PS1 automatically populates the DataLakeTables and BasicTables arrays from CLI output and executes the tier summary KQL query. This computes per-tier TotalGB, BillableGB, TableCount, and PercentOfTotal using the full Usage table (not limited to Q1 top-20). These values are the authoritative source for PHASE_3.TierSummary and §2b rendering.
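A rough picture of the per-tier summary, including the Auxiliary to Data Lake display mapping, is sketched below. It is not the Q10b query: the function name, the input shapes, and the fallback of treating a table missing from the CLI output as Analytics are all assumptions for the example.

```python
from collections import defaultdict

# CLI plan name -> display name ("Auxiliary" is always shown as "Data Lake")
PLAN_DISPLAY = {"Analytics": "Analytics", "Basic": "Basic", "Auxiliary": "Data Lake"}

def tier_summary(usage_rows, table_plans):
    """usage_rows: iterable of (table, total_gb, billable_gb).
    table_plans: {table: plan string as reported by the CLI}."""
    tiers = defaultdict(lambda: {"TotalGB": 0.0, "BillableGB": 0.0, "TableCount": 0})
    grand_total = 0.0
    for table, total_gb, billable_gb in usage_rows:
        plan = table_plans.get(table, "Analytics")  # assumption: unlisted tables count as Analytics
        tier = tiers[PLAN_DISPLAY.get(plan, plan)]
        tier["TotalGB"] += total_gb
        tier["BillableGB"] += billable_gb
        tier["TableCount"] += 1
        grand_total += total_gb
    for tier in tiers.values():
        tier["PercentOfTotal"] = round(tier["TotalGB"] * 100 / grand_total, 1) if grand_total else 0.0
    return dict(tiers)

# Hypothetical usage and plan data
summary = tier_summary(
    [("SecurityEvent", 60.0, 60.0), ("Syslog", 30.0, 30.0), ("Firewall_CL", 10.0, 10.0)],
    {"SecurityEvent": "Analytics", "Syslog": "Auxiliary", "Firewall_CL": "Auxiliary"},
)
```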
Migration Classification
Used when rendering §7a (Tier Migration Candidates). PS1 computes the Category column using these criteria; the LLM uses this reference for rendering interpretation and recommendation prose.
| Category | Criteria | Action |
|---|---|---|
| 🔵 KQL Job output | Table name ends with _KQL_CL | NEVER migrate: promoted data from Data Lake, essential for detection pipeline |
| 🔵 Already on Data Lake | Q10 tier = Data Lake AND zero rules | Already migrated — no action needed |
| 🟢 Keep Analytics | ≥1 enabled analytic rule AND healthy executions | Active detection coverage justifies Analytics cost |
| 🟣 Split ingestion candidate | 1-2 enabled rules AND high-volume (≥5 GB/week) AND DL-eligible | Few rules need only a subset of events. Route detection-relevant subset to Analytics via DCR, rest to Data Lake |
| ❗ Detection gap (non-XDR) | ≥1 enabled rule AND table is on Data Lake tier AND table is NOT an XDR table | Critical: Analytic rules cannot execute against DL tables — rules silently failing. Custom Detections also do NOT work because non-XDR tables are not available in Advanced Hunting on Data Lake. Remediation: (1) move table back to Analytics, OR (2) remove/disable the analytic rules referencing the table (accept DL tier). ⛔ PROHIBITED: Recommending "convert ARs to Custom Detections" for non-XDR tables — CDs run against Advanced Hunting which only retains Defender XDR tables for 30 days. Non-XDR tables on Data Lake are invisible to Advanced Hunting. |
| ❗ Detection gap (XDR) | ≥1 enabled rule AND table is on Data Lake tier AND table IS an XDR table | Partial gap: Sentinel Analytic Rules (AR) cannot execute against DL tables — ARs silently failing. However, XDR-native tables (Device*, Email*, CloudAppEvents, UrlClickEvents) are ALWAYS available in Advanced Hunting for 30 days regardless of Sentinel tier. Custom Detection rules run against Advanced Hunting, so CD rules continue to work. Only ARs are broken. Remediation: (1) move table back to Analytics, (2) convert affected ARs to Custom Detections, OR (3) remove/disable the ARs. See Advanced Hunting data retention |
| 🔴 Strong candidate (DL-eligible) | 0 rules AND DL classification = Yes | Evaluate DCR filtering to reduce unnecessary volume, then migrate remainder to Data Lake |
| 🟠 Not DL-eligible / unknown | 0 rules AND DL classification = No or Unknown | Optimize via DCR filtering or add analytic rules. Check MS docs for current eligibility |
LLM overlay checks (not separate emojis; flag as callout notes in §7b/7c prose):
- Execution issues: If a 🟢 table's rules appear in PHASE_4.FailingRules with 0 executions or failures, add a ⚠️ note: "Rules targeting [table] have execution issues, see §5b. Fix rules before relying on this coverage."
- ASIM dependency: If a 🔴 zero-rule table appears in PHASE_4.ASIM as consumed by ASIM parsers, add a ⚠️ note: "[table] is consumed by ASIM parsers ([parser names]); migrating to Data Lake breaks these detections. Verify ASIM dependency before migrating."
Reference: Data Lake Migration
This section contains lookup tables and background guidance for DL migration classification. Consult when rendering §7a recommendations and explaining Data Lake trade-offs.
Known DL-Eligible Tables
PS1 uses these lists as hardcoded $dlYes/$dlNo arrays. Keep this reference in sync with the script.
| Category | DL-Eligible Tables | Notes |
|---|---|---|
| Defender XDR | CloudAppEvents, DeviceEvents, DeviceFileCertificateInfo, DeviceFileEvents, DeviceImageLoadEvents, DeviceInfo, DeviceLogonEvents, DeviceNetworkEvents, DeviceNetworkInfo, DeviceProcessEvents, DeviceRegistryEvents, EmailAttachmentInfo, EmailEvents, EmailPostDeliveryEvents, EmailUrlInfo, UrlClickEvents | GA Feb 2025 |
| Verified LA tables | AADManagedIdentitySignInLogs, AADNonInteractiveUserSignInLogs, AADProvisioningLogs, AADServicePrincipalSignInLogs, AADUserRiskEvents, AuditLogs, AWSCloudTrail, AzureDiagnostics, CommonSecurityLog, Event, GCPAuditLogs, LAQueryLogs, McasShadowItReporting, MicrosoftGraphActivityLogs, OfficeActivity, Perf, SecurityAlert, SecurityEvent, SecurityIncident, SentinelHealth, SigninLogs, StorageBlobLogs, Syslog, W3CIISLog, WindowsEvent, WindowsFirewall | ⚠️ Only these LA tables are verified DL-eligible. Unlisted → Unknown |
| Custom tables | Any table ending in _CL (except _KQL_CL) | Custom log tables are workspace-managed → DL-eligible |
Known DL-Ineligible Tables (as of Feb 2026)
| Category | Ineligible Tables | Notes |
|---|---|---|
| XDR — not yet supported | DeviceTvmSoftwareInventory, DeviceTvmSoftwareVulnerabilities, AlertEvidence, AlertInfo, IdentityDirectoryEvents, IdentityLogonEvents, IdentityQueryEvents | MDI tables announced for future DL support |
| Entra ID ❌ | MicrosoftServicePrincipalSignInLogs, MicrosoftNonInteractiveUserSignInLogs, MicrosoftManagedIdentitySignInLogs | Not yet DL-eligible |
| Threat Intelligence ❌ | ThreatIntelIndicators, ThreatIntelligenceIndicator | Required on Analytics for TI matching rules. Never recommend migration |
| Log Analytics ❌ | AppDependencies, AppMetrics, AppPerformanceCounters, AppTraces, AzureActivity, AzureMetrics, ConfigurationChange, Heartbeat, SecurityRecommendation | Not yet DL-eligible |
Fallback rule: If a table is not in either list, the script classifies it as Unknown. Render as ❓ Unknown with note: "Verify at Manage data tiers before migrating."
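The lookup order described by these tables and the fallback rule can be sketched as below. The two sets are deliberately abbreviated samples of the lists above, not the full $dlYes/$dlNo arrays, and the function name is hypothetical. In this sketch a _KQL_CL table falls through to Unknown; the migration classification treats such tables separately as KQL Job output that is never migrated.

```python
# Abbreviated samples of the reference lists above (not the complete arrays)
DL_YES = {"SecurityEvent", "Syslog", "CommonSecurityLog", "DeviceProcessEvents"}
DL_NO = {"ThreatIntelligenceIndicator", "AzureActivity", "Heartbeat"}

def dl_eligibility(table):
    """Return 'Yes', 'No', or 'Unknown' for a table name."""
    if table in DL_YES:
        return "Yes"
    if table in DL_NO:
        return "No"
    if table.endswith("_CL") and not table.endswith("_KQL_CL"):
        return "Yes"  # custom log tables are workspace-managed
    return "Unknown"  # fallback rule: verify before migrating
```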
Decision Matrix
| Enabled Rules | Executions (Health) | Alerts (Q12) | DL-Eligible? | Volume | Recommendation |
|---|---|---|---|---|---|
| 0 | N/A | 0 | ✅ Yes | > 1 GB/week | 🔴 Evaluate DCR filtering to reduce volume, then migrate remainder to Data Lake (confirm no ASIM dependency) |
| 0 | N/A | 0 | ✅ Yes | < 1 GB/week | 🔴 Migrate to Data Lake — minimal savings but cleaner tier alignment. DCR filtering optional at this volume |
| 0 | N/A | 0 | ❌ No / ❓ Unknown | Any | 🟠 Not eligible or unknown — review ingestion necessity, apply DCR filtering |
| 0 | N/A (on DL) | 0 | N/A — already DL | Any | 🔵 Already on Data Lake — no action needed |
| 0 (ASIM-dependent) | N/A | 0 | Any | Any | 🔴 Migrate, but LLM adds ⚠️ ASIM dependency callout in §7b |
| ≥1 | 0 or failures | 0 | Any | Any | 🟢 Keep — but LLM adds ⚠️ execution issues callout in §7b |
| ≥1 | 0 (on DL) | Any | N/A — on DL | Any | 🔴 Detection gap — ARs cannot execute against DL. PS1 emits Detection gap (XDR) or Detection gap (non-XDR). If XDR table: CDs still work via Advanced Hunting; recommend converting ARs→CDs or moving back to Analytics. If non-XDR table: move back to Analytics OR remove/disable rules. ⛔ NEVER recommend CD conversion for non-XDR tables |
| 1-2 | > 0, healthy | Any | ✅ Yes | ≥ 5 GB/week | 🟣 Split ingestion candidate |
| ≥1 | > 0, healthy | 0 | Any | Any | 🟢 Keep Analytics — rules executing, no matches (normal for TI rules) |
| ≥1 | > 0, healthy | > 0 | Any | Any | 🟢 Keep Analytics — active detections generating alerts |
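The matrix can be read as an ordered series of checks. The sketch below is one hypothetical Python encoding of that reading, not the PS1 script's actual logic. The function name, parameter names, return labels, and the treatment of exactly 1 GB/week as low volume are all assumptions, and the ASIM-dependent row is left out because it only adds a callout.

```python
from typing import Optional


def recommend(enabled_rules: int, healthy_executions: bool, alerts: int,
              dl_eligible: Optional[bool], on_data_lake: bool,
              weekly_gb: float) -> str:
    """Return a recommendation label for one table (illustrative only).

    dl_eligible is True, False, or None for Unknown.
    """
    if on_data_lake:
        # Analytics rules cannot execute against Data Lake tables.
        return "Detection gap" if enabled_rules >= 1 else "Already on Data Lake"
    if enabled_rules == 0:
        if dl_eligible is not True:
            return "Not eligible or unknown"
        # Assumption: exactly 1 GB/week is treated as low volume.
        if weekly_gb > 1:
            return "DCR filter, then migrate"
        return "Migrate to Data Lake"
    if not healthy_executions:
        return "Keep (execution issues)"
    if enabled_rules <= 2 and dl_eligible is True and weekly_gb >= 5:
        return "Split ingestion candidate"
    if alerts > 0:
        return "Keep Analytics (active detections)"
    return "Keep Analytics (no matches)"
```

The check order mirrors the row order of the matrix, so a 1-2 rule table at 5 GB/week or more is flagged as a split candidate before the generic Keep rows apply.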
Data Lake Trade-Off
| Capability | Analytics Tier | Data Lake Tier |
|---|---|---|
| Analytics rules, alerting, hunting | ✅ Full support | ❌ Not available (but see XDR exception below) |
| Custom Detection rules (Advanced Hunting) | ✅ Full support | ⚠️ XDR tables only: Still available — AH retains 30 days regardless of Sentinel tier. Non-XDR tables: ❌ |
| Workbooks, playbooks, parsers, watchlists | ✅ Full support | ❌ Not available |
| KQL query performance | ✅ High-performance | ⚠️ Slower |
| Query cost | ✅ Included in ingestion price | ❌ Billed per query (data scanned) |
| KQL Jobs / Summary Rules / Search Jobs | ✅ | ✅ |
| Ingestion cost | Standard | Minimal |
| Default retention | 90 days (Sentinel) / 30 days (XDR) | Matches analytics, extendable to 12 years |
Primary vs secondary security data: Primary security data (EDR alerts, auth logs, audit trails) belongs on Analytics. Secondary data (NetFlow, storage access logs, firewall traffic, IoT logs) is ideal for Data Lake.
Filter before you migrate: For high-volume zero-rule tables, DL migration and DCR filtering are complementary — not mutually exclusive. Evaluate whether all ingested data serves a hunting, forensic, or compliance purpose. If a portion is noise (e.g., verbose diagnostics, routine health checks, debug-level telemetry), apply DCR transformations to drop or reduce that portion first, then migrate the meaningful remainder to Data Lake. This avoids simply shifting cost from Analytics to Data Lake query charges on data nobody uses.
Even when a table has zero rules, consider whether it serves hunting/forensic purposes. Tables like SigninLogs or AuditLogs should generally remain on Analytics regardless.
Data Lake Promotion via KQL Jobs
For high-volume tables on Data Lake — whether fully migrated or partially routed via split ingestion — that still need detection coverage:
- Ingest raw logs into Data Lake tier (cheap)
- Create KQL jobs to query Data Lake on schedule, writing aggregated results to Analytics-tier `_KQL_CL` tables
- Point analytics rules at the `_KQL_CL` output table
KQL Job key facts: Full KQL (joins, unions, CTEs). Schedules: by-minute through monthly. Lookback up to 12 years. Limits: 3 concurrent / 100 enabled per tenant, 1hr query timeout. Data Lake has ~15-min ingestion latency — jobs should use now(-15m) as upper bound. TimeGenerated is overwritten if >2 days old — preserve source timestamps in a custom column.
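To illustrate the now(-15m) upper bound, here is a minimal Python sketch that computes the time window one scheduled run would cover. The function name and the idea of sizing the window to the schedule interval are assumptions made for illustration; the real bounds belong in the KQL job query itself.

```python
from datetime import datetime, timedelta, timezone

INGESTION_LAG = timedelta(minutes=15)  # approximate Data Lake ingestion latency


def job_window(now: datetime, interval: timedelta) -> tuple[datetime, datetime]:
    """Return (start, end) for one run, ending 15 minutes before now."""
    end = now - INGESTION_LAG
    return end - interval, end


now = datetime(2026, 2, 1, 12, 0, tzinfo=timezone.utc)
start, end = job_window(now, timedelta(hours=1))
print(start.isoformat(), end.isoformat())
```

Consecutive runs on the same interval then cover adjacent windows without reading rows that may not have landed yet.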
Split Ingestion and/or DL + KQL Job Promotion
PS1 auto-classifies 🟣 Split candidates (1-2 rules, ≥5 GB/week, DL-eligible). For these tables (and high-volume 🟢 Keep tables), the report should present both optimization paths so the operator can choose — or combine them — based on their knowledge of the rule queries:
| | Split Ingestion (DCR) | DL + KQL Job |
|---|---|---|
| How it works | DCR routes a detection-relevant subset to Analytics, bulk to DL | Any data on DL (full table or split-routed portion); KQL job promotes aggregated results to _KQL_CL on Analytics |
| Detection latency | Real-time (subset stays on Analytics) | 15+ min (DL ingestion lag + job schedule) |
| Rule rewrite needed | No — rules keep targeting original table | Yes — rules must target _KQL_CL output |
| Volume savings | Moderate (bulk to DL, subset stays) | Depends on scope — maximum if entire table goes to DL, incremental if applied to split-routed portion |
| Best when | Rules filter on specific raw events (EventIDs, facilities) | Rules use aggregation and tolerate latency |
These approaches are complementary, not mutually exclusive. Split ingestion routes bulk data to DL while keeping detection-relevant events on Analytics. KQL jobs can then run against that DL portion to surface additional insights (e.g., aggregated anomalies) back to Analytics via `_KQL_CL` tables, giving you both real-time detection on the split subset AND scheduled analytics on the DL bulk.
Rendering guidance: The LLM does NOT have visibility into rule query text (aggregation vs raw filters), so it cannot definitively recommend one over the other. For 🟣 tables and high-volume 🟢 tables, present the comparison and note which approach fits which rule pattern. Do NOT change PS1's Category emoji in §7a — express as prose in §7b/7c.
Reference: License Benefits
Defender for Servers P2 — 500MB/Server/Day Benefit
- Each server protected by DfS P2 contributes 500 MB/day to a pooled daily allowance
- Pool = (number of protected servers) × 500 MB — aggregate across subscription, not per-machine
- Applies to security data types: SecurityAlert, SecurityBaseline, SecurityBaselineSummary, SecurityDetection, SecurityEvent, WindowsFirewall, MaliciousIPCommunication, SysmonEvent, ProtectionStatus, Update, UpdateSummary
- Applied automatically at workspace level — shows as zero cost
Pool calculation from Q4:
Potential DfS P2 Pool = (Distinct servers from Q4) × 500 MB/day
Example: Q4 shows 12 servers → pool = 6 GB/day. If DFSP2-eligible avg is 4.2 GB/day → fully covered.
| Scenario | Condition | Recommendation |
|---|---|---|
| Pool far exceeds usage | DfSP2_DailyGB < 50% of PoolGB | Highlight the unused headroom and recommend increasing SecurityEvent logging levels (e.g., "All Events" instead of "Common") to broaden detection coverage at no additional ingestion cost. Note that increased data volume may affect retention storage costs |
| Pool covers usage | DfSP2_DailyGB ≥ 50% and ≤ 100% of PoolGB | Pool covers current need — monitor growth and reference §3a if approaching ceiling |
| Usage exceeds pool | DfSP2_DailyGB > PoolGB | Overage is billed at standard rates — review §3a EventID breakdown for reduction opportunities, or consider onboarding more servers to DfS P2 to expand the pool |
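The pool arithmetic and the three scenarios can be sketched as follows. The function name and return labels are hypothetical, and the conversion assumes 1 GB = 1000 MB, which matches the 12-server example (6 GB/day).

```python
def dfs_p2_scenario(servers: int, daily_gb: float) -> tuple[float, str]:
    """Return (pool_gb, scenario) using 500 MB per protected server per day."""
    pool_gb = servers * 500 / 1000  # assumption: 1 GB = 1000 MB
    if daily_gb > pool_gb:
        return pool_gb, "Usage exceeds pool"
    if daily_gb < 0.5 * pool_gb:
        return pool_gb, "Pool far exceeds usage"
    return pool_gb, "Pool covers usage"


print(dfs_p2_scenario(12, 4.2))
```

With 12 servers and a 4.2 GB/day average, usage sits at 70% of the pool, so the middle scenario applies.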
M365 E5 / Defender XDR Ingestion Benefit
- M365 E5 (or E5 Security, A5, F5, G5) provides 5 MB per user per day pooled data grant (offer page)
- Grant = (number of E5 licenses) × 5 MB/day
- Covers: Entra ID sign-in/audit logs, MCAS shadow IT, Purview info protection, M365 advanced hunting data (29 tables in Q17/Q17b)
- Applied automatically: `Free Benefit - M365 Defender Data Ingestion`
- Always-free (all Sentinel users): Azure Activity, Office 365 Audit Logs, Defender alerts
⚠️ Ask user for E5 license count — not discoverable from Sentinel telemetry.
Example: 500 E5 licenses → grant = 2.5 GB/day. If E5-eligible avg exceeds grant, overage billed at standard rates.
References:
- DfS P2 data ingestion benefit
- Sentinel free data sources
- View data allocation benefits
- M365 E5 offer details
Report Template
📄 Just-in-time loading: Read SKILL-report.md at the start of Phase 6 rendering. It contains:
- Inline Chat Executive Summary template — Workspace at a Glance, Cost Waterfall, Detection Posture, Overall Assessment, Top 3 Recommendations
- Markdown File Structure — Complete §1–§8 rendering rules, mandatory format requirements, column specifications, validation checks
- Section-to-Scratchpad Mapping — Which scratchpad keys feed each report section
Load ONLY when entering Phase 6 — NOT during Phases 1–5. Combine with scratchpad data for rendering.
Post-Report Drill-Down Reference
📄 Just-in-time loading: Read SKILL-drilldown.md for full instructions when any of these are needed.
Available Drill-Down Patterns
Use these when the user asks follow-up questions after a report is generated (e.g., "which rules use EventID 8002?", "look up custom detection rules", "do any ASIM parsers depend on this table?").
| Pattern | Purpose | Tool / Method | Trigger Phrases |
|---|---|---|---|
| 1. EventID cross-ref | Which analytic rules reference a specific EventID? | `az rest` (Sentinel REST API) + JMESPath `contains()` | "which rules use EventID X", "does any rule need this EventID" |
| 2. Syslog facility/process | Which rules reference a Syslog facility, source, or process? | `az rest` + JMESPath | "which rules use sshd", "any rules for authpriv" |
| 3. CSL vendor/activity | Which rules reference a CEF vendor, product, or activity? | `az rest` + JMESPath | "rules for Palo Alto TRAFFIC", "which rules use CommonSecurityLog" |
| 4. Full rule query dump | Export all enabled rule queries for manual analysis | `az rest` → JSON file | "export all rule queries", "build EventID dependency map" |
| 5. ASIM parser verification | Which ASIM parsers consume a table slated for migration? | `az rest` + regex match for `_Im_`/`_ASim_` patterns | "ASIM dependency", "do parsers use this table" |
| 6. Custom Detection rules | Inventory CD rules via Graph API (query text, schedule, last run) | PowerShell `Invoke-MgGraphRequest` (NOT Graph MCP; scope `CustomDetection.Read.All` unavailable via MCP) | "custom detection rules", "CD rules", "lookup custom detections" |
⚠️ Graph MCP limitation: The Graph MCP server returns 403 for the Custom Detection endpoint (`/beta/security/rules/detectionRules`). Always use `Invoke-MgGraphRequest` via PowerShell terminal. See SKILL-drilldown.md and Q9b-CustomDetectionRules.yaml for the exact endpoint and select fields.
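As a rough illustration of the EventID cross-reference (pattern 1), the sketch below filters an already-exported rule list in Python instead of JMESPath. The payload shape (`value[].properties.displayName`, `query`, `enabled`) and the sample rules are assumptions for illustration; check them against the JSON that `az rest` actually returns.

```python
import re

# Hypothetical sample shaped like a rule-list response; not real tenant data.
rules = {"value": [
    {"properties": {"displayName": "Rule A", "enabled": True,
                    "query": "SecurityEvent | where EventID == 4625"}},
    {"properties": {"displayName": "Rule B", "enabled": True,
                    "query": "SecurityEvent | where EventID in (8002, 8004)"}},
    {"properties": {"displayName": "Rule C", "enabled": False,
                    "query": "SecurityEvent | where EventID == 8002"}},
]}


def rules_referencing(event_id: int, payload: dict) -> list[str]:
    """Names of enabled rules whose query text contains event_id as a whole number."""
    pattern = re.compile(rf"\b{event_id}\b")
    return [r["properties"]["displayName"]
            for r in payload.get("value", [])
            if r["properties"].get("enabled")
            and pattern.search(r["properties"].get("query") or "")]


print(rules_referencing(8002, rules))
```

This is a coarse text match: it uses word boundaries so that 800 does not match 8002, but it cannot tell an EventID from any other number in the query, so hits still need manual review.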
Also in SKILL-drilldown.md
| Section | Contents |
|---|---|
| Known Pitfalls | Usage table batching, _SPLT_CL naming, case-sensitive custom tables, LogSeverity types, value-level vs table-level coverage confusion |
| Error Handling | Common errors from az rest, Graph API, az monitor; graceful degradation for missing tables; re-running individual PS1 phases |
| CloudAppEvents Appendix | Custom Detection management audit trail (EditCustomDetection events) — distinct from execution telemetry |
| Additional References | Microsoft Learn links for cost optimization, DCR configuration, data tiers, ASIM parsers |
SVG Dashboard Generation
After a report is generated, the user can request a visual SVG dashboard.
Trigger phrases: "generate SVG dashboard", "create a visual dashboard", "visualize this report", "SVG from the report"
✅ DEFAULT: run the deterministic renderer (render_dashboard.py)
Do this first — do NOT hand-author the SVG. render_dashboard.py produces the manifest-driven 7-row dashboard non-interactively, parsing every value from the scratchpad + report + svg-widgets.yaml (no hardcoded run data). It is faster, deterministic, and produces a known-good layout. Run it:
python .github/skills/sentinel-ingestion-report/render_dashboard.py \
--scratch temp/ingest_scratch_<ts>.md \
--manifest .github/skills/sentinel-ingestion-report/svg-widgets.yaml \
--report reports/sentinel/sentinel_ingestion_report_<label>_<ts>.md \
--out reports/sentinel/sentinel_ingestion_report_<label>_<ts>_dashboard.svg
It reads the posture gauge, ingestion KPI cards, daily-volume line chart, cost waterfall, tier donut, top-tables / detection-coverage tables, and WoW anomaly + alert-producing-rule tables from the scratchpad, and the header metadata + Overall Assessment + ### 🎯 Top 3 Recommendations cards from the report (--report is optional — the assessment banner and recommendation cards degrade gracefully if absent). The alert-rule subheader lookback suffix ((7d)/(30d)) is matched by prefix, so any reporting window parses. Output is self-contained SVG with explicit fill on every <text>.
| Action | Status |
|---|---|
| Running render_dashboard.py when the user asks to visualize/generate a dashboard | ✅ REQUIRED (default path) |
| Hand-authoring the SVG via the svg-dashboard skill instead of running the script | ❌ PROHIBITED unless the user explicitly asks for a bespoke/custom layout the renderer can't produce |
Fallback — bespoke/interactive dashboards (svg-dashboard skill)
Only use this path when the user explicitly wants a custom layout, different widgets, or styling the deterministic renderer doesn't support. Edit svg-widgets.yaml first if the change is layout/field-level — the renderer reads it at generation time, so many "customizations" don't require hand-authoring. The YAML manifest is the single source of truth for layout, widgets, field mappings, colors, and data source documentation.
Step 1: Read svg-widgets.yaml (this skill's widget manifest)
Step 2: Read .github/skills/svg-dashboard/SKILL.md (rendering rules — Manifest Mode)
Step 3: Read the completed report file (data source)
Step 4: Render SVG → save to reports/sentinel/{report_name}_dashboard.svg
.github/skills/svg-dashboard/SKILL.md
npx skills add SCStelz/security-investigator --skill svg-dashboard -g -y
SKILL.md
Frontmatter
{
"name": "svg-dashboard",
"description": "Use this skill when asked to generate SVG data visualization dashboards from investigation data or skill reports. Triggers on keywords like \"generate SVG dashboard\", \"create a visual dashboard\", \"visualize this report\", \"SVG from the report\", \"visualize results\", \"create SVG chart\", \"SVG from this data\". Supports two modes: manifest-driven structured dashboards (from skill reports with svg-widgets.yaml) and freeform adaptive visualizations from ad-hoc investigation data. Component library includes KPI cards, score cards, bar charts, line charts, donut charts, waterfall charts, tables, recommendation cards, assessment banners. SharePoint Dark Theme default palette."
}
SVG Dashboard Generator
Renders SVG data visualization dashboards, either from a skill's `svg-widgets.yaml` manifest (structured dashboards) or freeform from ad-hoc investigation data in context.
Mode Detection
Before rendering, determine which mode applies:
| Condition | Mode | Behavior |
|---|---|---|
| User asks for a dashboard after a skill report AND the calling skill has an `svg-widgets.yaml` | Manifest Mode | Read the YAML manifest → follow its layout exactly → deterministic dashboard |
| User asks to "visualize", "chart", or "create an SVG" from ad-hoc data in context (query results, investigation findings, inline tables) | Freeform Mode | Select widget types from the Component Library below based on data shape → creative layout |
| No `svg-widgets.yaml` exists for the current workflow | Freeform Mode | Same as above |
Decision flow:
1. Is there an svg-widgets.yaml for the current skill?
→ YES + user said "dashboard" or "SVG from the report" → Manifest Mode
→ NO → Freeform Mode
2. Does the user have structured data in context (query results, tables, metrics)?
→ YES → Freeform Mode (use data shape to pick widgets)
→ NO → Ask user what data to visualize
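One way to read the decision flow is as a three-way function. The names below are hypothetical, and the sketch assumes that Freeform Mode is only chosen when there is structured data to draw from; otherwise the user is asked.

```python
def detect_mode(has_manifest: bool, asked_for_dashboard: bool,
                has_structured_data: bool) -> str:
    """Return "manifest", "freeform", or "ask_user" (one reading of the flow)."""
    if has_manifest and asked_for_dashboard:
        return "manifest"
    if has_structured_data:
        return "freeform"
    return "ask_user"


print(detect_mode(True, True, False))
print(detect_mode(False, True, True))
print(detect_mode(False, False, False))
```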
Manifest Mode — Structured Dashboard
Used when a skill provides an svg-widgets.yaml manifest (e.g., mcp-usage-monitoring, sentinel-ingestion-report).
Execution
Step 1: Read the calling skill's svg-widgets.yaml (widget manifest)
Step 2: Read this file's Rendering Rules below (component library + quality standards)
Step 3: Read the completed report file (data source)
— If same chat: report data is already in context
— If new chat: read the file path provided by user or find latest in the skill's reports/ subfolder
Step 4: Map manifest fields → report data using data_sources.field_mapping_notes
Step 5: Render SVG → save to the same directory as the report: {report_basename}_dashboard.svg
Data Extraction (Manifest Mode)
- Read the report markdown or scratchpad JSON.
- Match fields from the manifest's `data_sources.field_mapping_notes` to locate values.
- For arrays (top_tables, anomalies, etc.), extract the full dataset and render up to `max_items`.
- For single values (KPIs), extract the number and apply the specified `unit`.
- If a field is not found in the report data, render the widget with "N/A" in muted text. Never omit the widget.
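The extraction rules above can be sketched as a single lookup helper. The helper name and the flat dictionary shape for parsed report data are assumptions; a real manifest may map fields differently.

```python
from typing import Any, Optional


def extract_widget_value(report: dict, field: str,
                         max_items: Optional[int] = None, unit: str = "") -> Any:
    """Resolve one manifest field against parsed report data (hypothetical shape)."""
    value = report.get(field)
    if value is None:
        return "N/A"  # the widget is still rendered, with muted placeholder text
    if isinstance(value, list):
        return value if max_items is None else value[:max_items]
    return f"{value}{unit}"


report = {"total_gb": 42, "top_tables": ["A", "B", "C", "D"]}
print(extract_widget_value(report, "total_gb", unit=" GB"))
print(extract_widget_value(report, "top_tables", max_items=3))
print(extract_widget_value(report, "missing_field"))
```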
Freeform Mode — Adaptive Visualization
Used when no manifest exists or the user wants an ad-hoc visualization from investigation data already in context.
Execution
Step 1: Identify the data in context (query results, investigation findings, report sections, inline tables)
Step 2: Analyze data shape — what dimensions, metrics, categories, and time series are present?
Step 3: Read this file's Rendering Rules below (component library + quality standards)
Step 4: Select appropriate widget types from the Component Library (see Data Shape Guide below)
Step 5: Design a layout: title banner → KPI summary → detail charts/tables → optional assessment
Step 6: Render SVG → save to temp/{descriptive_name}_dashboard.svg or user-specified path
Data Shape → Widget Selection Guide
| Data Shape | Best Widget | Example |
|---|---|---|
| Single metrics / counts | `kpi-card` | Total failed logins: 47, Unique IPs: 12 |
| Metric with period-over-period change | `delta-kpi-card` | Incidents: 47 (↑23% vs last period) |
| Scored assessment (0-100) | `score-card` | Risk Score: 73/100 |
| Categorical counts (top-N) | `horizontal-bar-chart` | Top 10 source IPs by attempt count |
| Composition within categories | `stacked-bar-chart` | Alert severity breakdown per week |
| Time series (values over dates) | `line-chart` | Daily sign-in volume over 30 days |
| Proportional breakdown | `donut-chart` | Auth methods: 60% password, 30% MFA, 10% token |
| Additive/subtractive flow | `waterfall-chart` | Ingestion costs with license benefits |
| Completion / target tracking | `progress-bar` | 72% of critical CVEs patched |
| Inline trend in KPI or table cell | `sparkline` | 7-day mini trend beneath a KPI value |
| Tabular detail rows | `table-widget` | IP enrichment results, alert details |
| Prioritized action items | `recommendation-cards` | High/Medium/Low priority findings |
| Executive summary | `assessment-banner` | Overall risk assessment with key risks/strengths |
| 2D framework coverage (categories × items) | `coverage-matrix` | MITRE ATT&CK tactic × technique map, permission grids |
| Report header | `title-banner` | Investigation title, date, scope |
Layout Heuristics (Freeform)
- Row 1: Always start with a `title-banner` (data source, date range, scope)
- Row 2: KPI cards for key metrics (3-6 cards, one row)
- Rows 3+: Charts and tables arranged by importance — most critical findings first
- Final row: Assessment banner or recommendation cards if actionable findings exist
- Canvas size: Default 1400×900, increase height proportionally for more rows (~100-200px per row)
- Use the default SharePoint Dark palette (defined below) unless the data context suggests otherwise
Token Budget & Data Limits (Freeform Mode)
Why this matters: SVG is verbose. Every `<rect>`, `<text>`, and `<path>` consumes output tokens. Without limits, freeform dashboards with rich investigation data routinely exceed the model's output token budget, producing truncated/broken SVGs. Manifest-mode dashboards avoid this because the YAML `max_items` and fixed row count act as natural constraints.
Hard Limits — Always Enforced:
| Constraint | Limit | Rationale |
|---|---|---|
| Max rows | 6 (including title banner) | Each row adds ~100-200 SVG elements |
| Max widgets total | 12 | Beyond this, SVG size balloons past safe output limits |
| Max KPI cards per row | 5 | More than 5 become unreadable at standard canvas width |
| Max canvas height | 1200px | Forces prioritization; prevents unbounded vertical growth |
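A planned layout can be checked against these hard limits before any SVG is written. This is a minimal sketch; the function name and the choice to count `delta-kpi-card` as a KPI card are assumptions.

```python
MAX_ROWS, MAX_WIDGETS, MAX_KPI_PER_ROW, MAX_HEIGHT = 6, 12, 5, 1200
KPI_TYPES = ("kpi-card", "delta-kpi-card")  # assumption: both count as KPI cards


def check_layout(rows: list[list[str]], height: int) -> list[str]:
    """Return the violated limits; rows holds the widget type names per row."""
    problems = []
    if len(rows) > MAX_ROWS:
        problems.append("too many rows")
    if sum(len(r) for r in rows) > MAX_WIDGETS:
        problems.append("too many widgets")
    if any(sum(w in KPI_TYPES for w in r) > MAX_KPI_PER_ROW for r in rows):
        problems.append("too many KPI cards in a row")
    if height > MAX_HEIGHT:
        problems.append("canvas too tall")
    return problems


plan = [["title-banner"], ["kpi-card"] * 6, ["line-chart", "donut-chart"]]
print(check_layout(plan, 900))
```

An empty list means the plan fits; otherwise the lowest-priority widgets are the ones to cut.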
Per-Widget Data Limits:
| Widget Type | Max Data Points | What to Do with Excess |
|---|---|---|
| `horizontal-bar-chart` | 10 bars | Show top 10, add a summary "Other (N remaining)" bar |
| `stacked-bar-chart` | 8 bars × 6 segments | Aggregate smaller segments into "Other" |
| `line-chart` | 30 data points | Resample to weekly if daily exceeds 30; show date range in subtitle |
| `donut-chart` | 7 segments | Merge smallest into "Other" |
| `waterfall-chart` | 8 segments | Combine minor items |
| `table-widget` | 8 rows | Show top 8, add footer "Showing 8 of N" |
| `recommendation-cards` | 4 cards | Prioritize highest-impact recommendations |
| `sparkline` | 14 data points | Resample to fit (e.g., daily → every-other-day) |
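The top-10 rule for `horizontal-bar-chart` might look like the sketch below. The function name is hypothetical, and summing the remaining values into the "Other" bar is an assumption; the table only specifies the label.

```python
def top_n_with_other(items: dict[str, float], n: int = 10) -> list[tuple[str, float]]:
    """Keep the n largest entries and fold the rest into one summary bar."""
    ranked = sorted(items.items(), key=lambda kv: kv[1], reverse=True)
    head, tail = ranked[:n], ranked[n:]
    if tail:
        # Assumption: the summary bar carries the sum of the folded values.
        head.append((f"Other ({len(tail)} remaining)", sum(v for _, v in tail)))
    return head


attempts = {f"ip{i}": i for i in range(1, 13)}  # 12 made-up sources
bars = top_n_with_other(attempts)
print(bars[0], bars[-1])
```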
Data Triage Strategy:
When the data in context exceeds these limits, apply this priority filter:
- Summarize first — Extract the 3-5 most important KPIs before plotting details
- Top-N everything — For ranked data, show top 10 max; group the rest as "Other"
- Aggregate time series — If >30 daily points, resample to weekly; if >30 weekly, resample to monthly
- One chart per insight — Don't render the same data as both a bar chart AND a table; pick the one that communicates better
- Cut, don't shrink — Rather than making unreadable tiny widgets, remove the lowest-priority widget entirely
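The time-series aggregation step can be sketched as a simple bucketed resample. Averaging each 7-point bucket is an assumption here; for count-style metrics a sum per bucket may be the better choice.

```python
def resample(points: list[float], max_points: int = 30, bucket: int = 7) -> list[float]:
    """Average consecutive buckets when a series exceeds max_points."""
    if len(points) <= max_points:
        return points
    chunks = (points[i:i + bucket] for i in range(0, len(points), bucket))
    return [sum(chunk) / len(chunk) for chunk in chunks]


daily = [float(v) for v in range(35)]  # 35 made-up daily values
print(resample(daily))
```

The same function can be applied again with a larger bucket if the weekly series is still longer than 30 points.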
If the data is too rich for 6 rows / 12 widgets: Tell the user what was included vs omitted, and suggest they request a second dashboard for the remaining data or provide an svg-widgets.yaml manifest for full control.
Creative Freedom (Freeform)
In freeform mode, you have latitude to:
- Decide which widget types best represent the data
- Choose how many rows and how to arrange widgets (within the limits above)
- Add contextual annotations on charts (peak markers, threshold lines)
- Combine multiple data points into composite widgets
- Adjust canvas dimensions to fit the content (up to 1400×1200 max)
You are still bound by the Quality Standards and Color & Typography rules below — these ensure visual consistency regardless of mode.
Rendering Rules (Both Modes)
Canvas & Layout
- Output a single `<svg>` element with `xmlns="http://www.w3.org/2000/svg"` and the `width`/`height` from the manifest (or chosen dimensions in freeform mode).
- Fill the background with `canvas.background` (manifest) or `#1b1a19` (freeform default).
- Apply `canvas.padding` (manifest) or `40px` (freeform default) on all sides. Usable width = `width - 2 * padding`.
- Render rows top-to-bottom with `canvas.row_gap` (manifest) or `24px` (freeform default) spacing between rows.
- Within each row, widgets are laid out left-to-right. If a widget specifies `width_pct`, it gets that percentage of usable width. Otherwise, widgets share remaining space equally.
- Use `canvas.col_gap` (manifest) or `20px` (freeform default) for spacing between widgets in the same row.
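The horizontal layout rules can be sketched as follows, using the freeform defaults. The function name is hypothetical, and subtracting the column gaps before sharing out widths is an assumption the rules above do not spell out.

```python
def layout_row(widgets: list[dict], width: int = 1400,
               padding: int = 40, col_gap: int = 20) -> list[tuple[float, float]]:
    """Return (x, width) for each widget in one row."""
    usable = width - 2 * padding
    # Assumption: gaps are removed first, then percentages apply to what is left.
    available = usable - col_gap * (len(widgets) - 1)
    pcts = [w.get("width_pct") for w in widgets]
    fixed_total = sum(p for p in pcts if p is not None)
    flexible = sum(1 for p in pcts if p is None)
    share = (100 - fixed_total) / flexible if flexible else 0
    x, boxes = float(padding), []
    for pct in pcts:
        w = available * (pct if pct is not None else share) / 100
        boxes.append((round(x, 2), round(w, 2)))
        x += w + col_gap
    return boxes


print(layout_row([{"width_pct": 50}, {}, {}]))
```

With these defaults the last widget ends at x = 1360, exactly one padding short of the right edge.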
Color & Typography
- Use `palette.*` values from the manifest. In freeform mode, use the default palette below.
- 🔴 GLOBAL TEXT FILL RULE: SVG defaults `fill` to black, which is invisible on dark backgrounds. Every `<text>` element MUST have an explicit `fill` attribute. Set `fill="{palette.text_primary}"` on the root `<svg>` or a top-level `<g>` so all text inherits white by default. Never rely on SVG's implicit black fill.
- All text uses `canvas.font_family` (manifest) or `Segoe UI, sans-serif` (freeform default).
- KPI values: bold, 28-36px, colored with `palette.primary` or widget's `highlight_color`.
- KPI labels: 11-12px, `palette.text_secondary`.
- Widget titles: bold, 14-16px, `palette.text_primary`.
- Axis labels and table headers: 10-12px, `palette.text_secondary`.
- Data labels and value labels: 10-11px, `palette.text_primary`. Never place value labels inside bars; always position them after/outside the bar.
- The default palette uses a cool dark theme consistent across all skill manifests. Skills may override with their own `palette` in `svg-widgets.yaml`.
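The global text fill rule lends itself to an automated check on the finished SVG. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name is hypothetical, and it only looks for a literal `fill` attribute, so a fill set through `style="..."` would be reported as missing.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def texts_missing_fill(svg_source: str) -> list[str]:
    """Return the text content of every <text> element lacking a fill attribute."""
    root = ET.fromstring(svg_source)
    return [(el.text or "").strip()
            for el in root.iter(f"{SVG_NS}text")
            if "fill" not in el.attrib]


sample = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="200" height="60">'
    '<text x="10" y="20" fill="#e6edf3">Total</text>'
    '<text x="10" y="40">47</text>'
    '</svg>'
)
print(texts_missing_fill(sample))
```

An empty result means every `<text>` carries its own fill, which is stricter than relying on inheritance from a parent `<g>`.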
Default Palette (Freeform Mode)
palette:
  background: "#0d1117"
  card_bg: "#161b22"
  primary: "#409AE1"        # Blue: KPI highlights
  secondary: "#b4a0ff"      # Purple: secondary charts
  success: "#40C5AF"        # Teal-green: healthy metrics
  warning: "#ff8c00"        # Orange: moderate risk
  danger: "#EF6950"         # Red: critical findings
  text_primary: "#e6edf3"
  text_secondary: "#b2b2b2"
  accent: "#FFC83D"         # Yellow: warnings, anomalies
  grid_line: "#30363d"
Widget Type Reference — Component Library
title-banner
Full-width banner. Render the title large and centered horizontally on the canvas, subtitle fields centered below on the same line separated by " · ". Optional accent underline. Use text-anchor="middle" with x at canvas midpoint. If the manifest specifies title_align: left, left-align instead — but the default is always center.
kpi-card
Rounded rectangle (rx="12"). Show the value large and centered, label below in small text, optional unit suffix. Color the value with highlight_color if specified, otherwise palette.primary. No actual icon rendering needed — use a colored dot or small indicator instead.
delta-kpi-card
Extends kpi-card with a period-over-period change indicator. Render the primary value the same as kpi-card. Below (or beside) the value, show a delta line: an arrow (▲ or ▼) followed by the percentage or absolute change. Color the delta with palette.success for favorable changes and palette.danger for unfavorable changes. If invert_color is true, reverse the color logic (e.g., for metrics where "down" is good, like error rate). Show the comparison period label in palette.text_secondary at 10px (e.g., "vs prior 7d"). If no delta data is available, render as a standard kpi-card with no delta line.
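The delta line and its color logic can be sketched as below. The function name is hypothetical, the change is expressed as a percentage, and a non-zero previous value is assumed.

```python
def delta_indicator(current: float, previous: float,
                    invert_color: bool = False) -> tuple[str, str]:
    """Return (label, palette key) for the delta line; assumes previous != 0."""
    change = (current - previous) / previous * 100
    arrow = "▲" if change >= 0 else "▼"
    # An increase is favorable unless invert_color flips the meaning.
    favorable = (change >= 0) != invert_color
    return f"{arrow} {abs(change):.0f}%", "success" if favorable else "danger"


print(delta_indicator(123, 100))
print(delta_indicator(123, 100, invert_color=True))
```

For a metric such as incident count, where a rise is bad, `invert_color=True` turns the same ▲ 23% from `palette.success` to `palette.danger`.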
score-card
Rounded rectangle card (rx="12") with card_bg background. Render the numeric score value large and centered (bold, 42-48px), colored by whichever range it falls into (from the widget's ranges array). Below the number, show the rating label (e.g., "CONCERNING") in 14px bold, same color as the number. Above both, render the widget title in 14-16px bold, palette.text_primary. Add a subtle /100 suffix after the score in smaller muted text (18px, palette.text_secondary). Keep it visually clean — no gauge arcs, needles, or scale markers.
stacked-bar-chart
Vertical or horizontal bars where each bar is subdivided into colored segments representing categories (e.g., severity levels, sources, status). Include a legend mapping segment colors to category names. If orientation: horizontal, render left-to-right stacked rows with labels on the left. If orientation: vertical (default), render bottom-to-top stacked columns with labels on the x-axis. Show segment values on hover via <title> elements. If show_totals is true, display the total above each bar. Use segment_colors from the manifest or assign from palette automatically.
horizontal-bar-chart
Horizontal bars sorted by value descending. Layout per row (left to right): label → optional inline badges → bar (proportional to max value) → value label → optional extra column (rightmost). Value labels MUST be positioned outside (after) the bar, never inside it — use fill="{palette.text_primary}" (white on dark themes). Append value_suffix if specified. If show_rule_count: right, render the rule count as the rightmost column, right-aligned. If a value is 0, render it in palette.danger. If show_tier_badge is true, render a small colored badge after each label using colors from the YAML segments or badge_colors definitions. If bar_color_by: severity is set, color bars by severity level. If show_error_overlay is true, render a red overlay segment proportional to failure count. If highlight_sensitive is true, mark flagged items with a warning indicator.
line-chart
SVG <polyline> or <path> for the trend line with optional area fill (fill_opacity). X-axis = dates, Y-axis = values. Render annotations as labeled markers: peak (triangle up), low (triangle down), average (dashed horizontal line). Grid lines at sensible intervals. If show_weekday_pattern is true, add subtle mini-bars along the bottom showing day-of-week averages.
donut-chart
Render using SVG <circle> elements with stroke-dasharray/stroke-dashoffset. Use this exact formula — do not iterate or try alternative approaches:
circumference = 2 * π * radius (e.g., radius=70 → C ≈ 439.82)
For each segment i (ordered by value descending):
arc_len_i = (value_i / total) * circumference
start_i = sum of all previous arc_lens (0 for first segment)
dasharray = "arc_len_i, (circumference - arc_len_i)"
dashoffset = circumference - start_i
transform = "rotate(-90, cx, cy)" ← starts at 12 o'clock
Each segment is a <circle cx cy r> with fill="none", stroke="{segment_color}", stroke-width="20". Stack all circles at the same position — the dasharray/dashoffset combination makes each one draw only its arc portion. Add <title> tooltips.
Legend to the right or below. If show_center_total is true, display the total count in the donut center. If compact is true, reduce the donut radius and legend font size to fit alongside a stacked widget below.
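The dasharray/dashoffset formula translates directly into code. This is a minimal sketch with a hypothetical function name; it assumes a non-empty list with a positive total and formats values to two decimals.

```python
import math


def donut_segments(values: list[float], radius: float = 70) -> list[tuple[str, str]]:
    """Return (dasharray, dashoffset) per segment, ordered by value descending."""
    circumference = 2 * math.pi * radius
    total = sum(values)
    segments, start = [], 0.0
    for value in sorted(values, reverse=True):
        arc = value / total * circumference
        dasharray = f"{arc:.2f}, {circumference - arc:.2f}"
        dashoffset = f"{circumference - start:.2f}"
        segments.append((dasharray, dashoffset))
        start += arc  # the next segment begins where this one ends
    return segments


for dasharray, dashoffset in donut_segments([60, 30, 10]):
    print(f'stroke-dasharray="{dasharray}" stroke-dashoffset="{dashoffset}"')
```

Each pair goes on its own `<circle>` together with the `rotate(-90, cx, cy)` transform, so the first segment starts at 12 o'clock and the arcs tile the full ring.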
waterfall-chart
Stacked/cascading vertical bars: each segment starts where the previous ended. Negative segments (benefits) flow downward. Show values on each bar. Final bar shows net total.
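The cascading positions can be computed with a running total. The function name, the "Net total" label, and the sample amounts are illustrative assumptions.

```python
def waterfall_bars(segments: list[tuple[str, float]]) -> list[tuple[str, float, float]]:
    """Return (label, start, end) per segment plus a final net-total bar.

    Negative amounts (benefits) end below where they start.
    """
    bars, running = [], 0.0
    for label, amount in segments:
        bars.append((label, running, running + amount))
        running += amount
    bars.append(("Net total", 0.0, running))
    return bars


# Made-up amounts for illustration only.
costs = [("Gross ingestion", 100.0), ("DfS P2 benefit", -30.0), ("E5 benefit", -20.0)]
for bar in waterfall_bars(costs):
    print(bar)
```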
progress-bar
Horizontal bar showing completion percentage against a target. Render a rounded track (rx="6") in palette.grid_line (or card_bg), filled proportionally with palette.primary (or bar_color if specified). Show the percentage value (bold, 18-22px) to the right of the bar or centered inside the filled portion. Label text above or to the left in palette.text_primary at 12-14px. If target_label is provided, show it at the 100% mark in palette.text_secondary. If thresholds are defined (e.g., [{"at": 90, "color": "success"}, {"at": 50, "color": "warning"}, {"at": 0, "color": "danger"}]), color the fill bar according to which threshold the value meets. If show_remaining is true, display the remaining percentage in muted text after the bar.
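Threshold-based coloring can be sketched as picking the highest threshold the value reaches. Reading "meets" as greater than or equal is an assumption, as are the function name and the fallback to `primary`.

```python
def threshold_color(value: float, thresholds: list[dict], default: str = "primary") -> str:
    """Return the color of the highest threshold that value meets (>=)."""
    for t in sorted(thresholds, key=lambda t: t["at"], reverse=True):
        if value >= t["at"]:
            return t["color"]
    return default


thresholds = [{"at": 90, "color": "success"},
              {"at": 50, "color": "warning"},
              {"at": 0, "color": "danger"}]
print(threshold_color(72, thresholds))
```

Sorting inside the function means the manifest can list thresholds in any order.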
sparkline
Miniature trend line — a compact <polyline> rendered inline within a kpi-card, delta-kpi-card, or table-widget cell. Dimensions: typically 60-100px wide × 16-24px tall. No axes, labels, or grid lines — just the trend shape. Stroke width 1.5-2px in palette.primary (or line_color if specified). Optional: fill the area below with the same color at 10-15% opacity. If show_endpoints is true, render small circles (r=2) at the first and last data points. If the last value is higher than the first, color the line palette.success; if lower, palette.danger; if auto_color: false, use the specified line_color instead.
table-widget
Rows of data with alternating row backgrounds (card_bg and slightly lighter). Column headers in text_secondary. If color_scale is true for a column, color positive values red and negative green (or vice versa for cost savings). If badge is true, render small severity badges. If highlight_zero is true for a column, render zero values in palette.danger color. If summary_row is specified, add a totals/summary row at the bottom with a top border separator. If stack_below is specified, this widget shares the same column as the named widget above it — render it directly below that widget rather than side-by-side.
recommendation-cards
Side-by-side rounded cards. Left border colored by priority (card_colors). Title bold, description in text_secondary. If show_impact_estimate, add a small impact line.
assessment-banner
Large panel with a colored left border. Title + main assessment text. Sub-fields rendered as bullet lists (key_risks in palette.danger, strengths in palette.success).
coverage-matrix
Compact grid visualization for displaying coverage status across a two-dimensional framework (e.g., MITRE ATT&CK tactics × techniques, permission matrices, data readiness grids). Renders as a grid of small colored <rect> cells organized into columns, where each column represents a category (e.g., tactic) and each cell represents an item (e.g., technique) within that category.
Layout: Columns are arranged left-to-right. Each column has a rotated header label at the top (45° angle, 10-11px text) and a vertical stack of cells below. Columns are variable-height — each has as many cells as items in that category. A legend bar is rendered below the grid mapping colors to status labels.
Cell rendering: Each cell is a small <rect> (default cell_size: 12 × 12px, cell_gap: 2px between cells). Cells are colored according to their status field using the status_colors map from the manifest. Cells within each column are sorted by status priority (covered items at top, uncovered at bottom) to create a visible "waterline" effect. Each cell has a <title> element containing the item name and status for hover tooltips — this is essential since cell text is not rendered at this scale.
Column rendering: Each column is cell_size + cell_gap wide. Columns are separated by col_gap (default 6px). Column header text is right-rotated and positioned above the first cell. An optional column footer shows the count or percentage (e.g., "5/11" or "45%") in 9px text below the last cell.
Legend: Horizontal bar below the grid with colored squares and labels for each status. Rendered in a single row, 10px text, using the status_colors map.
Manifest fields:
| Field | Required | Description |
|---|---|---|
| field | ✅ | Data source: array of {column, items: [{name, status}]} objects |
| status_colors | ✅ | Map of status label → hex color (e.g., custom_rule: "#409AE1", tier_1: "#40C5AF", uncovered: "#21262d") |
| cell_size | ❌ | Cell width and height in px (default: 12) |
| cell_gap | ❌ | Gap between cells in px (default: 2) |
| col_gap | ❌ | Gap between columns in px (default: 6) |
| show_col_footer | ❌ | Show count/percentage below each column (default: true) |
| sort_order | ❌ | Array of status labels defining top-to-bottom cell sort order (covered statuses first) |
| max_rows | ❌ | Cap the tallest column at this many cells; excess items are collapsed into a single "+" cell with count in tooltip |
Token budget: This widget is compact by design — 250 cells ≈ 250 <rect> elements (~15KB SVG). No text per cell keeps it efficient. The primary token cost is the <title> tooltip content. For grids exceeding 300 items, set max_rows to cap column height and keep SVG size manageable.
Example use cases: MITRE ATT&CK tactic × technique coverage map, data source × table readiness grid, permission scope × application access matrix, compliance framework × control status.
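The cell and column geometry described for the coverage-matrix can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function name and the `top` offset reserved for rotated headers are hypothetical, and headers, footers, the legend, and `max_rows` collapsing are omitted.

```python
from xml.sax.saxutils import escape

def render_coverage_matrix(columns, status_colors, sort_order,
                           cell_size=12, cell_gap=2, col_gap=6, top=60):
    """Return a list of SVG <rect> strings, one per item (hypothetical helper)."""
    # Lower rank sorts toward the top of the column; unknown statuses sort last.
    rank = {status: i for i, status in enumerate(sort_order)}
    rects = []
    x = 0
    for col in columns:
        items = sorted(col["items"], key=lambda it: rank.get(it["status"], len(rank)))
        for row, item in enumerate(items):
            y = top + row * (cell_size + cell_gap)
            # The tooltip carries the item name and status, since no cell text is drawn.
            tip = escape(f'{item["name"]}: {item["status"]}')
            rects.append(
                f'<rect x="{x}" y="{y}" width="{cell_size}" height="{cell_size}" '
                f'fill="{status_colors[item["status"]]}"><title>{tip}</title></rect>'
            )
        # A column is cell_size + cell_gap wide, followed by col_gap.
        x += cell_size + cell_gap + col_gap
    return rects
```

A status missing from `status_colors` raises `KeyError` here, which is a deliberate choice in this sketch so that manifest typos surface early.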
Quality Standards
- All text must be legible: minimum 10px font size.
- Maintain consistent rounded corners (rx="8" to rx="12") on all cards and panels.
- Use <title> elements on interactive-looking elements for accessibility.
- Encode any special characters in text (&, <, etc.).
- The SVG must be fully self-contained: no external stylesheets, fonts, or images.
- Add a <!-- Generated by Copilot SVG Dashboard Generator --> comment at the top.
Output
- Manifest mode: Save to the same directory as the report, with filename pattern: {report_basename}_dashboard.svg
- Freeform mode: Save to temp/{descriptive_name}_dashboard.svg or a user-specified path
.github/skills/threat-intel-campaign/SKILL.md
npx skills add SCStelz/security-investigator --skill threat-intel-campaign -g -y
SKILL.md
Frontmatter
{
"name": "threat-intel-campaign",
"description": "Turn a published threat-intelligence article into a tested threat-hunting campaign. Reads a platform-agnostic RSS\/Atom feed (feed_url is a parameter — nothing vendor-specific is hardcoded), triages articles from a recent window, applies a huntability relevance gate to decide whether an article warrants a campaign, then writes\/tests\/tunes KQL hunts and publishes them as a campaign file under queries\/threat-intelligence\/YYYY-MM\/. Also supports a single-article mode (pass an article URL directly). Side-effect-free: it writes campaign files and regenerates the manifest\/TOCs but performs NO git commits or PRs — branch\/PR orchestration belongs to the calling automation. Trigger keywords: \"threat intel campaign\", \"ingest threat intelligence\", \"TI feed\", \"write hunts from this article\", \"threat intelligence blog\", \"build a hunting campaign\"."
}
Threat Intelligence Campaign Authoring — Instructions
Purpose
This skill converts published threat-intelligence reporting into tested, tuned, publish-ready threat-hunting campaigns that land in queries/threat-intelligence/YYYY-MM/. It exists to be driven either:
- Interactively — a human gives one article URL ("read this article and write/test/tune hunts"), or
- Unattended — a scheduled automation passes a feed URL and the skill triages everything published in a recent window.
It does the authoring (parse → triage → relevance gate → write → test → tune → publish files → regenerate manifest/TOCs). It deliberately does NOT create branches, commits, or pull requests. That orchestration — and the per-article PR isolation — belongs to the calling workflow. This keeps the skill reusable and free of git side effects when a human runs it.
What this skill produces:
| Output | Description |
|---|---|
| Campaign file(s) | queries/threat-intelligence/YYYY-MM/<slug>.md in the standard campaign format |
| Regenerated artifacts | .github/manifests/discovery-manifest.yaml + per-file Quick Reference TOCs |
| Structured result | A JSON array (one entry per article) the calling automation consumes to drive per-article PRs |
| Human summary | A readable per-article decision log |
| In-chat hunt findings summary | A per-article report of what the test runs actually surfaced — real hits, false positives to tune, and follow-up actions. Emitted to chat/run output only; never written to a tracked file. This is where concrete findings live, keeping the committed campaign file PII-free. |
📑 TABLE OF CONTENTS
- Critical Workflow Rules
- Prerequisites
- Inputs / Parameters
- Invocation Modes
- Execution Workflow — Phase 0–6
- The Relevance Gate (Huntability Rubric)
- Writing / Testing / Tuning Queries
- Campaign File Format
- Structured Output Contract
- In-Chat Hunt Findings Summary
- Known Pitfalls
- Quality Checklist
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
- No git side effects. This skill NEVER runs git commit, git push, gh pr create, or any branch operation. It writes files and regenerates the manifest/TOCs only. Publishing (branch + commit + PR per article) is the calling automation's job. If a human is running this interactively, leave the files in the working tree for them to review.
- feed_url is a parameter; nothing vendor-specific is hardcoded. The skill handles any RSS 2.0 or Atom feed. The Microsoft Threat Intelligence feed is just one value a caller may pass; do not assume it.
- Advanced Hunting (≤30d) is the primary test/tune engine. Write and validate every query against RunAdvancedHuntingQuery within a 30-day window. Fall back to the Sentinel Data Lake (query_lake, >30d) only when you need additional supporting evidence that AH's 30-day cap cannot provide (e.g., confirming a rare IOC's longer-term absence/presence). Follow the Tool Selection Rule and timestamp-adaptation guidance in copilot-instructions.md.
- ⛔ Evidence-based "tested" claims only. A query may be described as tested in the campaign file only if it was actually executed. If a query could not be run (table absent, AH safety filter, telemetry gap), say so explicitly and mark cd_ready: false with honest adaptation_notes. Never imply validation that did not happen. Follow the Evidence-Based Analysis rule in copilot-instructions.md.
- ⛔ Committed output is PII-free, but published IOCs are NOT PII. Test/tune runs against the live tenant, but campaign files are version-controlled. NEVER paste real tenant entities (your UPNs, hostnames, IPs, workspace/tenant GUIDs, app names) into a campaign file. The article's published IOCs (hashes, domains, URLs, certs, filenames) are the opposite: they are public, shareable, and MUST be included verbatim in the IOC Reference table and the IOC-sweep queries (this is how the committed companion files do it). Do not placeholder or omit a published IOC. Concrete tenant findings from test runs belong in the In-Chat Hunt Findings Summary, not the file. Perform a PII sanity-check before finalizing each file.
- ⛔ Every IOC must trace to the article. Never invent one. Copy IOCs from the article's "Indicators of compromise" table (and any inline-cited indicators) exactly. Before finalizing, re-open the article's IOC section and confirm each hash/domain/URL/filename in your file appears there character-for-character. Hallucinated or mis-transcribed IOCs are a critical evidence-integrity failure: they produce false detections and erode trust. If an indicator only appears in narrative prose (not the IOC table), label it as such.
- Reference, don't reinvent. Use the kql-query-authoring skill's discipline for query construction (schema validation via kql-search MCP, table pitfalls, TimeGenerated vs Timestamp), and the detection-authoring skill's CD Metadata Contract for the <!-- cd-metadata --> block on every query. Read those SKILL.md files when authoring.
- Workspace selection. Follow the SENTINEL WORKSPACE SELECTION rule in copilot-instructions.md. In unattended runs the caller will specify the workspace; if exactly one exists, auto-select and state it.
- Read config.json for workspace ID, tenant, and Azure MCP parameters before querying.
- Quiet runs are a success, not a failure. If nothing in the window qualifies, that is a valid, expected outcome. Emit an empty/"skipped"-only result set and stop; do not lower the bar to manufacture a campaign.
Prerequisites
| Dependency | Used for |
|---|---|
| kql-search MCP (GITHUB_TOKEN set) | Schema validation, table discovery, community query examples |
| Sentinel Triage MCP (RunAdvancedHuntingQuery) | Primary query testing/tuning (≤30d) |
| Sentinel Data Lake MCP (query_lake) | Supporting evidence only (>30d) |
| Microsoft Learn MCP | Grounding TTP/technique/error-code explanations |
| Python 3 (stdlib xml.etree, urllib) | RSS/Atom parsing, no external dependency required |
| web_fetch / web_search tools | Fetching article bodies and the feed |
| .github/manifests/build_manifest.py, scripts/generate_tocs.py | Post-processing |

Feed parsing uses Python stdlib (xml.etree.ElementTree) so it works unattended without pip install. feedparser may be used if already installed, but never assume it.
Inputs / Parameters
| Parameter | Default | Description |
|---|---|---|
| feed_url | (required in feed mode) | Any RSS 2.0 / Atom feed URL |
| article_url | (none) | A single article to process directly (single-article mode) |
| lookback_hours | 24 | How far back to consider feed entries (by published date) |
| max_campaigns | 3 | Cap on campaigns produced per run (bounds tenant query load + review burden) |
| workspace_id | from config.json | Sentinel workspace for testing |
| min_queries / max_queries | 4 / 9 | Soft bounds on queries per campaign |
Invocation Modes
A. Single-article mode — caller passes article_url (the classic "read this article and write hunts" prompt).
→ Skip Phase 1 (feed) and the time filter. Still run Phase 2 (dedup) and Phase 3 (relevance gate) unless the human explicitly says "build it regardless". Then Phases 4–6 for that one article.
B. Feed mode — caller passes feed_url (+ optional lookback_hours).
→ Full Phase 0–6 across all qualifying entries in the window, capped at max_campaigns.
Execution Workflow
Phase 0 — Setup
- Read config.json (workspace ID, tenant, subscription, Azure MCP params).
- Resolve workspace per the selection rule. State which workspace is in use.
- Confirm kql-search + Triage MCP are available (needed for testing).
Phase 1 — Fetch & parse the feed (feed mode only)
Fetch feed_url and parse entries with Python stdlib so it works for both RSS and Atom:
import sys, urllib.request, datetime as dt
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

feed_url = sys.argv[1]
lookback_hours = int(sys.argv[2]) if len(sys.argv) > 2 else 24
cutoff = dt.datetime.now(dt.timezone.utc) - dt.timedelta(hours=lookback_hours)

# Send an explicit User-Agent; some feeds reject the default urllib one.
req = urllib.request.Request(feed_url, headers={"User-Agent": "ti-campaign/1.0"})
raw = urllib.request.urlopen(req, timeout=30).read()
root = ET.fromstring(raw)
ATOM = "{http://www.w3.org/2005/Atom}"

def text(el, *tags):
    # First non-empty text among the tags, trying the bare (RSS) name, then the Atom-namespaced one.
    for t in tags:
        for tag in (t, ATOM + t):
            f = el.find(tag)
            if f is not None and f.text:
                return f.text.strip()
    return None

def parse_date(s):
    if not s:
        return None
    try:
        d = parsedate_to_datetime(s)  # RSS pubDate
    except Exception:
        try:
            d = dt.datetime.fromisoformat(s.replace("Z", "+00:00"))  # Atom ISO
        except Exception:
            return None
    # Treat timezone-less dates as UTC so the comparison with cutoff cannot raise TypeError.
    return d if d.tzinfo else d.replace(tzinfo=dt.timezone.utc)

entries = list(root.iter("item")) or list(root.iter(ATOM + "entry"))  # RSS first, else Atom
for e in entries:
    title = text(e, "title")
    link = text(e, "link")
    if not link:
        # Atom carries the URL in the href attribute of <link>, not in its text.
        a = e.find(ATOM + "link")
        link = a.get("href") if a is not None else None
    pub = parse_date(text(e, "pubDate", "published", "updated"))
    if pub and pub >= cutoff:
        print(f"{pub.isoformat()}\t{title}\t{link}")
Run it with the powershell tool (python script.py <feed_url> <lookback_hours>). Collect (published, title, link) for entries inside the window. If the feed only exposes summaries, you'll fetch full bodies in Phase 3.
Phase 2 — Dedup against existing campaigns
For each candidate URL, check whether it's already been turned into a campaign:
- grep for the article URL (and a normalized form without trailing slash / query string) across queries/threat-intelligence/**.
- Also grep the proposed slug. If a match exists → mark decision: "skipped", reason: "already published", and drop it from the work list.
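The dedup check above could look roughly like this in Python. It is a sketch, not part of the skill: `normalize_url` and `already_published` are hypothetical helper names, and a plain substring search stands in for grep.

```python
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url: str) -> str:
    """Drop the query string, fragment, and trailing slash from a URL."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme, parts.netloc, parts.path.rstrip("/"), "", ""))

def already_published(article_url: str, slug: str,
                      root: str = "queries/threat-intelligence") -> bool:
    """True if any campaign file under root mentions the URL (raw or normalized) or the slug."""
    needles = {article_url, normalize_url(article_url), slug}
    needles.discard("")
    base = Path(root)
    if not base.is_dir():
        return False
    for md in base.rglob("*.md"):
        body = md.read_text(encoding="utf-8", errors="ignore")
        if md.stem == slug or any(n in body for n in needles):
            return True
    return False
```

A hit maps to decision "skipped" with reason "already published". Because the slug is matched as a substring, very short slugs can produce false matches, so treat a slug-only hit as a prompt to look at the file.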
Phase 3 — Relevance gate
For each remaining candidate, fetch the full article body (web_fetch) and apply the Huntability Rubric. Produce a decision (campaign / skipped) with a one-line reason. Rank campaign candidates by huntability confidence and keep the top max_campaigns.
Phase 4 — Write / test / tune (per qualifying article)
See Writing / Testing / Tuning Queries. Output: a set of validated queries, each with an honest cd-metadata block and tuning notes.
Phase 5 — Publish the campaign file
Write queries/threat-intelligence/YYYY-MM/<slug>.md in the exact Campaign File Format. YYYY-MM = the article's publication month. <slug> = short, kebab/underscore, descriptive (e.g., soho_router_dns_hijacking). Do NOT hand-write the Quick Reference TOC — generate_tocs.py creates it.
Phase 6 — Regenerate artifacts + emit results
- python .github/manifests/build_manifest.py (regenerate + validate; fix any error-level warnings on your new file).
- python scripts/generate_tocs.py (insert the Quick Reference TOC).
- Emit the Structured Output Contract JSON + a human summary.
- Emit the In-Chat Hunt Findings Summary — the per-article report of what the test runs actually found (hits, false positives to tune, follow-up actions). Chat/run output only; never write it to a tracked file.
- Stop. No git.
The Relevance Gate — Huntability Rubric
This is the judgment step: does this article warrant a hunting campaign? Decide with explicit gates, not vibes. Cite the evidence from the article for each gate.
Hard gates (BOTH must PASS to build)
| Gate | PASS criteria | FAIL examples |
|---|---|---|
| G1 — Huntable behavior | Article describes specific, observable attacker behaviors mappable to ≥1 ATT&CK technique (process exec, persistence mechanism, C2 pattern, auth abuse, mailbox manipulation, registry/file artifacts, etc.) | "Threat actor targeted sector X" with no technique detail; pure attribution/geopolitics |
| G2 — Telemetry coverage | ≥1 behavior or IOC maps to a table we ingest (Device*, Email*, Identity*, Signin*/EntraId*, Cloud*, Audit*, OfficeActivity, network/DNS) | Behaviors only observable in telemetry we don't collect (e.g., physical, OT-only with no connector, third-party logs not onboarded) |
Confidence signals (raise/lower priority among passing candidates)
| Signal | Effect |
|---|---|
| Concrete IOCs (hashes, domains, IPs, filenames, command lines, registry keys, user-agents) | ↑↑ strong — enables direct-match hunts |
| Multiple distinct mappable TTPs (richer attack chain) | ↑ |
| Named ATT&CK technique IDs in the article | ↑ |
| Novel TTP not already covered by an existing campaign/query | ↑ |
| Overlaps heavily with an existing campaign | ↓ (consider extending the existing file instead of a new one) |
Auto-skip categories (do not build)
- Product/feature announcements, GA/preview notices, roadmap posts
- Analyst-recognition / "named a Leader" / awards
- Strategy, opinion, policy, or business-update posts
- Event/webinar recaps and partner marketing
- Pure data-breach news with no attacker TTPs/IOCs
Decision rule
BUILD if G1 PASS and G2 PASS and (concrete IOCs present OR ≥2 distinct mappable TTPs). Otherwise SKIP with a specific reason (which gate failed / which auto-skip category).
Record the rubric outcome in the structured result reason field (e.g., "BUILD: 4 IOCs + 6 mappable TTPs (endpoint, identity)" or "SKIP: product announcement, G1 fail").
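The decision rule can be written down as a small function so the outcome and the reason string are always produced together. This is a sketch: `RubricInput` and `decide` are hypothetical names, and checking auto-skip categories before the gates is an assumption about ordering.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class RubricInput:
    g1_huntable_behavior: bool
    g2_telemetry_coverage: bool
    concrete_iocs: int = 0
    mappable_ttps: int = 0
    auto_skip_category: Optional[str] = None  # e.g. "product announcement"

def decide(r: RubricInput) -> Tuple[str, str]:
    """Return (decision, reason) for one article."""
    if r.auto_skip_category:
        return "skipped", f"SKIP: {r.auto_skip_category}"
    if not r.g1_huntable_behavior:
        return "skipped", "SKIP: G1 fail"
    if not r.g2_telemetry_coverage:
        return "skipped", "SKIP: G2 fail"
    # Both hard gates passed: build only with concrete IOCs or at least two mappable TTPs.
    if r.concrete_iocs > 0 or r.mappable_ttps >= 2:
        return "campaign", f"BUILD: {r.concrete_iocs} IOCs + {r.mappable_ttps} mappable TTPs"
    return "skipped", "SKIP: no concrete IOCs and fewer than 2 mappable TTPs"
```

The gate inputs still come from reading the article; the function only makes the final combination explicit and repeatable.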
Writing / Testing / Tuning Queries
For each qualifying article:
- Extract the TTPs and IOCs. Map each TTP to ATT&CK technique IDs (use Microsoft Learn MCP to confirm technique semantics). Build the IOC table (hashes, domains, IPs, filenames, etc.).
- Pick detection surfaces. Map TTPs → tables. Prefer XDR-native tables for AH testing. Check the discovery manifest + grep queries/** first; if an existing query file already covers a TTP, reuse/adapt its pattern and cite it as a companion rather than duplicating.
- Author each query following the kql-query-authoring discipline:
  - Validate the table/columns via kql-search MCP (get_table_schema) before writing.
  - Respect the table pitfalls in copilot-instructions.md (e.g., TimeGenerated vs Timestamp, dynamic-field parse_json, IpAddress casing).
  - Datetime filter first; project a useful, PII-light column set; order by / summarize to bound output.
- Test in Advanced Hunting (≤30d). Run every query via RunAdvancedHuntingQuery. Apply the Step-5 zero-result sanity check from copilot-instructions.md: a 0-row result must be verified correct (e.g., a direct-IOC sweep returning 0 in a clean environment is the desired outcome; a 0 from a broken filter is not). As you test, record the findings for each query (row count, whether hits look like true/false positives, and any notable entities) so you can build the In-Chat Hunt Findings Summary in Phase 6. These raw findings stay in chat; they do NOT go into the campaign file.
- Tune. If a query is noisy, add targeted exclusions (trusted publishers, known service accounts, expected automation) and document them in Tuning Notes, generically and never with live tenant identifiers. Re-run after tuning.
- Supporting evidence via Data Lake (>30d), only if needed. If 30 days is insufficient to characterize prevalence/absence of a rare IOC, run a scoped query_lake (adapt Timestamp → TimeGenerated for Sentinel/LA tables). Use this for evidence, not as the primary engine.
- CD metadata. Attach a <!-- cd-metadata --> block to every query per the detection-authoring CD Metadata Contract. Set cd_ready: true only for high-fidelity, low-noise queries that actually validated cleanly; otherwise cd_ready: false with adaptation_notes explaining what's needed.
- IOC freshness note. IOC-match queries (hashes/domains) rot. Note that operators rotate IOCs and recommend periodic refresh from current MS TI / VirusTotal / a TI indicator table.
Campaign File Format
Match the existing files in queries/threat-intelligence/YYYY-MM/ exactly. Structure:
# <Threat / Actor / Campaign> — Threat Hunts
**Created:** YYYY-MM-DD
**Platform:** Microsoft Defender XDR | Microsoft Sentinel | Both
**Tables:** <exact KQL table names, comma-separated>
**Keywords:** <attack techniques, actor names, tooling, artifacts, field names>
**MITRE:** <technique/tactic IDs, comma-separated>
**Domains:** <threat-pulse domain tags: incidents|identity|spn|endpoint|email|admin|cloud|exposure>
**Timeframe:** Last N days (configurable)
**Source:** [<Article title> (<date>)](<article_url>)
---
## Threat Overview
<2–4 sentence synopsis grounded in the article. Include actor attribution if stated.>
### TTP Summary
| Capability | TTP |
|---|---|
| ... | ... |
### ⚠️ Hunt Pitfalls
| Pitfall | Mitigation |
|---|---|
| ... | ... |
---
## IOC Reference
<Table of published IOCs (hashes/domains/IPs/filenames). Note they rot; recommend refresh.>
---
## Query 1: <Title>
**Purpose:** <what it detects, and what a clean result looks like>
**Severity:** <Low|Medium|High>
**MITRE:** <technique IDs>
<!-- cd-metadata
cd_ready: true|false
cd_table: <PrimaryTable>
cd_frequency: NRT|Hourly|...
cd_severity: <Low|Medium|High>
cd_mitre: ["T...."]
cd_entities: ["device","file","account",...]
cd_adaptation_notes: "<honest notes>"
-->
` ` `kql
<tested query>
` ` `
**Expected results:** <what to expect; 0-row interpretation if a direct IOC sweep>
---
## Query 2: ...
...
---
## General Tuning Notes
1. IOC refresh ...
2. Telemetry gaps ...
3. CD-readiness summary ...
---
## References
- Microsoft Threat Intelligence — [<title>](<url>)
- MITRE ATT&CK — [<technique/actor>](<attack url>)
- Companion files: [`queries/<domain>/<file>.md`](...)
Header field requirements (enforced by build_manifest.py): Tables, Keywords, MITRE, and Domains are mandatory. Domains values must come from the valid set (incidents, identity, spn, endpoint, email, admin, cloud, exposure). A missing Domains is an error-level manifest warning.
Do NOT pre-write a ## Quick Reference — Query Index section — generate_tocs.py inserts it. Pre-creating it breaks the strip-and-reinsert logic.
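Before running the manifest build, the header requirements above can be pre-checked locally. The sketch below is illustrative only and does not reproduce build_manifest.py's logic: `check_campaign_header` is a hypothetical helper, and accepting both commas and pipes as Domains separators is an assumption.

```python
import re

VALID_DOMAINS = {"incidents", "identity", "spn", "endpoint",
                 "email", "admin", "cloud", "exposure"}
REQUIRED_FIELDS = ("Tables", "Keywords", "MITRE", "Domains")
# Matches header lines of the form **Name:** value
HEADER_RE = re.compile(r"^\*\*(?P<name>[A-Za-z]+):\*\*\s*(?P<value>.*)$", re.MULTILINE)

def check_campaign_header(markdown: str) -> list:
    """Return a list of problems found in a campaign file's header (empty if none)."""
    fields = {m["name"]: m["value"].strip() for m in HEADER_RE.finditer(markdown)}
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not fields.get(name)]
    domains = [d.strip() for d in re.split(r"[,|]", fields.get("Domains", "")) if d.strip()]
    problems += [f"invalid domain: {d}" for d in domains if d not in VALID_DOMAINS]
    # The TOC section must be generated, not hand-written.
    if re.search(r"^## Quick Reference", markdown, re.MULTILINE):
        problems.append("pre-written Quick Reference section")
    return problems
```

An empty list only means the header looks plausible; build_manifest.py remains the authority on what counts as an error-level warning.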
Structured Output Contract
At the end of every run, emit a JSON array (one object per article considered) so the calling automation can isolate per-article PRs. Print it in a fenced ```json block:
[
{
"article_title": "SOHO router compromise leads to DNS hijacking...",
"article_url": "https://www.microsoft.com/en-us/security/blog/2026/04/07/...",
"published": "2026-04-07T00:00:00Z",
"decision": "campaign",
"reason": "BUILD: 6 mappable TTPs + IOCs (endpoint, identity)",
"file_path": "queries/threat-intelligence/2026-04/dns_hijacking_soho_compromise.md",
"queries_written": 9,
"queries_tested": 9,
"queries_cd_ready": 4,
"domains": ["endpoint", "identity"]
},
{
"article_title": "Microsoft named a Leader in ...",
"article_url": "https://www.microsoft.com/en-us/security/blog/2026/04/05/...",
"published": "2026-04-05T00:00:00Z",
"decision": "skipped",
"reason": "SKIP: analyst-recognition post, G1 fail",
"file_path": null
}
]
decision ∈ campaign | skipped. For skipped, file_path is null. Follow the JSON block with a short human-readable summary (counts, what was built, what was skipped and why).
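A calling automation may want to sanity-check the array before acting on it. The following Python sketch checks the constraints stated above; `validate_result` is a hypothetical name, and the requirements that a campaign entry carries a file_path and that the four descriptive keys are always present are assumptions drawn from the example rather than stated rules.

```python
import json

def validate_result(payload: str) -> list:
    """Return a list of contract violations found in the JSON array (empty if none)."""
    problems = []
    for i, entry in enumerate(json.loads(payload)):
        decision = entry.get("decision")
        if decision not in ("campaign", "skipped"):
            problems.append(f"entry {i}: unknown decision {decision!r}")
        elif decision == "skipped" and entry.get("file_path") is not None:
            problems.append(f"entry {i}: skipped entries must have file_path null")
        elif decision == "campaign" and not entry.get("file_path"):
            problems.append(f"entry {i}: campaign entries need a file_path")
        # Keys that appear on every entry in the example (assumed mandatory).
        for key in ("article_title", "article_url", "published", "reason"):
            if not entry.get(key):
                problems.append(f"entry {i}: missing {key}")
    return problems
```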
In-Chat Hunt Findings Summary
After the structured JSON, emit a per-article findings summary that reports what the test runs actually surfaced in the tenant. This is the counterpart to the PII-free campaign file: the file is the reusable, sanitized hunt definition; this summary is the investigation result of running those hunts right now.
Where it goes: chat / run output only. Never write it to a tracked file (not the campaign file, not any queries/** or docs/** file). For unattended runs, the calling automation decides where to route it (e.g., PR description, notification, ticket) per its own data-handling policy — the skill just emits it.
PII posture: Unlike the committed campaign file, this summary may include the concrete entities an analyst needs to act (device names, UPNs, IPs, file hashes, sender addresses, message IDs) — it is investigation output to an operator who already has tenant access, the same as any other investigation skill's chat output. Do not redact what's needed for triage; do not persist it to the repo.
Skip when nothing actionable: If every query returned a verified-clean 0 (e.g., all IOC sweeps clean in a tenant where the IOCs predate the AH window), say so in one line per query rather than padding. The value is in the hits and the FPs, not in restating "0 rows" decoratively.
Format
## 🔎 Hunt Findings — <Article Title> (<run date>)
**Workspace:** <name> **Lookback:** <window> **Queries run:** <n>
| # | Query | Rows | Assessment | Action |
|---|-------|------|------------|--------|
| 1 | AI-brand display-name spoof | 0 | ✅ Clean (tuned; no spoofed-domain phish in window) | None |
| 4 | Fake-AI installer download | 3 | 🟠 2 likely FP (sanctioned vendor), 1 to review | Verify host on DEVICE-X; see below |
| 7 | Endpoint IOC sweep | 0 | ✅ Clean — IOCs predate 30d AH window | Re-run in Data Lake (90d) for retrospective |
### 🔴 True / suspected positives
- **Q4 — DEVICE-X / user@contoso.com:** downloaded `seedance_setup_x64.exe` from `hxxp://…` at <time>. Not a sanctioned vendor host. **Follow-up:** isolate/triage device, pivot Q5 (execution) + Q7 (C2).
### 🟠 False positives to tune
- **Q4 — 2 rows:** `<vendor>` installer from `downloads.<vendor>.com` — legitimate. **Tuning:** add `downloads.<vendor>.com` to the trusted-host list (reflected generically in the file's Tuning Notes, not as a literal tenant value).
### ⚠️ Follow-up actions
- [ ] Re-run Q3/Q7 IOC sweeps in Sentinel Data Lake (>30d) for retrospective coverage.
- [ ] Confirm DEVICE-X download disposition with endpoint team.
- [ ] If positives confirmed, consider promoting Q1/Q4/Q5 to custom detections (see detection-authoring skill).
Closing the loop with the campaign file: when a finding reveals a tuning need (e.g., a legitimate host triggering FPs), capture the generic fix in the campaign file's Tuning Notes / adaptation_notes (e.g., "exclude sanctioned vendor download hosts") — never the literal tenant value. The findings summary names the specific host; the file describes the class of exclusion.
Known Pitfalls
| Pitfall | Mitigation |
|---|---|
| Feed exposes only summaries, not full TTPs | Always web_fetch the full article body before the relevance gate and query authoring |
| Atom vs RSS schema differences (<entry>/<published> vs <item>/<pubDate>) | The Phase-1 parser handles both; never hardcode one shape |
| Treating a marketing/recognition post as huntable | Apply the auto-skip categories; G1 must find real behavior |
| Claiming a query is "tested" when it errored or hit the AH safety filter | Only mark tested if it ran and returned a sane result; otherwise cd_ready: false + notes |
| Pasting live tenant entities (from test runs) into the committed file | Campaign files are PII-free; test data informs tuning notes only |
| Placeholdering or omitting an article's published IOCs | Published IOCs are public, not PII — transcribe them verbatim into the IOC Reference table AND the IOC-sweep queries. Never ship <HASH1> placeholders or "see table" stand-ins. |
| Inventing / mis-transcribing an IOC | Every hash/domain/URL/filename must appear character-for-character in the article's IOC table. Re-verify against the source before finalizing; a hallucinated IOC is a critical evidence-integrity failure. |
| Using Data Lake as the primary engine | AH ≤30d is primary; Data Lake >30d for supporting evidence only |
| Hand-writing the Quick Reference TOC | Let generate_tocs.py generate it |
| Forgetting to regenerate the manifest | Always run build_manifest.py after writing files; resolve error-level warnings |
| Duplicating an existing campaign/query | Grep first; extend or cite companions instead of duplicating |
| Performing git operations | Never — publishing is the automation's responsibility |
| Putting real findings/entities in the committed file, OR omitting them from chat | Two separate outputs: campaign file = PII-free reusable hunt; In-Chat Hunt Findings Summary = real hits/FPs/follow-ups (chat only). Don't merge them. |
Quality Checklist
Before emitting results, confirm:
- Every candidate has an explicit decision + evidence-based reason
- Each campaign file matches the standard format (header fields complete, Domains valid)
- Every query has a cd-metadata block with an honest cd_ready value
- Every query was tested in Advanced Hunting (or its non-execution is documented)
- Zero-result queries were sanity-checked (desired vs broken)
- No live tenant PII anywhere in the committed file
- Every published IOC from the article is present verbatim (no placeholders/omissions) and every IOC in the file traces back to the article's IOC table (no invented/mis-transcribed indicators)
- build_manifest.py runs clean (no error-level warnings on the new file)
- generate_tocs.py has inserted the Quick Reference TOC
- Structured JSON result emitted + human summary
- In-Chat Hunt Findings Summary emitted (hits, FPs to tune, follow-ups); chat/run output only, not a tracked file
- No git operations performed
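The IOC item in the checklist above is a two-way comparison, which can be sketched in Python as follows. `ioc_audit` is a hypothetical helper; it assumes the IOC lists have already been extracted and that the article body is available as plain text.

```python
def ioc_audit(article_iocs, campaign_iocs, article_text):
    """Compare IOCs character-for-character in both directions."""
    return {
        # Present in the campaign file but not found verbatim in the article:
        # a possible invented or mis-transcribed indicator.
        "not_in_article": [i for i in campaign_iocs if i not in article_text],
        # Published in the article but absent from the campaign file:
        # a possible omission or placeholder.
        "missing_from_campaign": [i for i in article_iocs if i not in campaign_iocs],
    }
```

The comparison is deliberately case-sensitive and does no refanging, so a defanged indicator matches only if it is written identically on both sides. Both lists should be empty before the file is finalized.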
.github/skills/user-investigation/SKILL.md
npx skills add SCStelz/security-investigator --skill user-investigation -g -y
SKILL.md
Frontmatter
{
"name": "user-investigation",
"description": "Use this skill when asked to investigate a user account for security issues, suspicious activity, or compliance review. Triggers on keywords like \"investigate user\", \"security investigation\", \"user investigation\", \"check user activity\", \"analyze sign-ins\", or when a UPN\/email is mentioned with investigation context. This skill provides comprehensive Entra ID user security analysis including sign-in anomalies, MFA status, device compliance, audit logs, security incidents, Identity Protection risk, and automated reports (HTML, markdown file, or inline chat).",
"drill_down_prompt": "Investigate user {entity} — sign-in anomalies, MFA, audit trail, Identity Protection risk",
"threat_pulse_domains": [
"identity"
]
}
User Security Investigation - Instructions
Purpose
This skill performs comprehensive security investigations on Entra ID user accounts, analyzing sign-in patterns, anomalies, MFA status, device compliance, audit logs, Office 365 activity, security incidents, and Identity Protection risk signals.
📑 TABLE OF CONTENTS
- Critical Workflow Rules - Start here!
- Investigation Types - Standard/Quick/Comprehensive
- Output Modes - Inline / Markdown file / HTML report
- Quick Start - 6-step investigation pattern
- Execution Workflow - Complete process
- Sample KQL Queries - Validated query patterns
- Microsoft Graph Queries - Identity Protection integration
- Markdown Report Template - Full markdown report structure
- JSON Export Structure - Required fields (HTML report)
- Error Handling - Troubleshooting guide
- SVG Dashboard Generation - Visual dashboard from report data
Investigation shortcuts:
- Risky user quick triage (TP Q3): Q6 (security incidents) → Q2 (anomalies) → Q12 (UEBA anomalies) → Q3d (sign-ins by IP) → Graph: MFA methods
- Compromised user forensics (TP Q3+Q9): Q3 (sign-in summary) → Q5 (OfficeActivity) → Q3d (IP breakdown) → Q1 (priority IPs for enrichment)
- Password spray target (TP Q4): Q3c (sign-in failures) → Q3d (IPs hitting this user) → Q6 (related incidents)
- Post-incident user timeline (TP Q1, incident follow-up): Q4 (audit logs) → Q5 (O365 activity) → Q10 (DLP events) → Q6 (all incidents)
- IP enrichment for user (TP Q3+Q4): Q1 (priority IP extraction) → Q11 (TI matches) → enrich_ips.py
- UEBA behavioral context (TP Q3, portal UEBA anomalies): Q12 (Anomalies table) → Q6 (related incidents) → Q4 (audit trail)
⛔ Shortcut Default Rule: When a matching shortcut exists for the investigation context, use it — don't run the full workflow. Only run full Batch 1 + Batch 2 when the user explicitly requests "full investigation", "comprehensive", or "deep dive". Shortcuts render only the report sections relevant to their query chain (plus Executive Summary and Recommendations, always).
⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️
Before starting ANY user investigation:
- ALWAYS get User Object ID FIRST (required for SecurityIncident and Identity Protection queries)
- ALWAYS calculate date ranges correctly (use current date from context - see Date Range section)
- ALWAYS ask the user for output mode if not specified: inline chat summary, markdown file report, HTML report, or any combination (see Output Modes)
- ALWAYS track and report time after each major step (mandatory)
- ALWAYS run independent queries in parallel (drastically faster execution)
- ALWAYS use create_file for JSON export and markdown reports (NEVER use PowerShell terminal commands)
- ⛔ ALWAYS enforce Sentinel workspace selection (see Workspace Selection section below)
⛔ MANDATORY: Sentinel Workspace Selection
This skill requires a Sentinel workspace to execute queries. Follow these rules STRICTLY:
When invoked from a parent skill (incident-investigation, threat-pulse, etc.):
- Inherit the workspace selection from the parent investigation context
- If no workspace was selected in parent context: STOP and ask user to select
- Use the `SELECTED_WORKSPACE_IDS` passed from the parent skill
- Skip output mode prompts and default to inline chat (the parent skill controls the final output format)
When invoked standalone (direct user request):
- ALWAYS call the `list_sentinel_workspaces` MCP tool FIRST
- If 1 workspace exists: Auto-select, display to user, proceed
- If multiple workspaces exist:
- Display all workspaces with Name and ID
- ASK: "Which Sentinel workspace should I use for this investigation?"
- ⛔ STOP AND WAIT for user response
- ⛔ DO NOT proceed until user explicitly selects
- If a query fails on the selected workspace:
- ⛔ DO NOT automatically try another workspace
- STOP and report the error
- Display available workspaces
- ASK user to select a different workspace
- WAIT for user response
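The standalone selection rules above can be sketched as a small decision function. This is a minimal illustration, not part of the skill: `Workspace`, `resolve_workspace`, and the `user_choice` parameter are hypothetical names, and the empty-list branch is an assumption because the rules do not cover it.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Workspace:
    name: str
    workspace_id: str


def resolve_workspace(
    workspaces: Sequence[Workspace],
    user_choice: Optional[str] = None,
) -> Tuple[Optional[Workspace], Optional[str]]:
    """Return (selected workspace, prompt). A non-None prompt means STOP and wait."""
    if not workspaces:
        # Assumption: this case is not covered by the rules above.
        return None, "No Sentinel workspaces were returned."
    if len(workspaces) == 1:
        # Single workspace: auto-select (the caller still displays it to the user).
        return workspaces[0], None
    if user_choice is not None:
        # Accept an explicit choice by name or by ID only; never guess.
        for ws in workspaces:
            if user_choice in (ws.name, ws.workspace_id):
                return ws, None
    listing = ", ".join(f"{ws.name} ({ws.workspace_id})" for ws in workspaces)
    prompt = (
        f"Available workspaces: {listing}. "
        "Which Sentinel workspace should I use for this investigation?"
    )
    return None, prompt
```

The function never falls back to a default workspace when several exist, which mirrors the prohibition on selecting without user consent.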
Workspace Failure Handling
IF query returns "Failed to resolve table" or similar error:
- STOP IMMEDIATELY
- Report: "⚠️ Query failed on workspace [NAME] ([ID]). Error: [ERROR_MESSAGE]"
- Display: "Available workspaces: [LIST_ALL_WORKSPACES]"
- ASK: "Which workspace should I use instead?"
- WAIT for explicit user response
- DO NOT retry with a different workspace automatically
🔴 PROHIBITED ACTIONS:
- ❌ Selecting a workspace without user consent when multiple exist
- ❌ Switching to another workspace after a failure without asking
- ❌ Proceeding with investigation if workspace selection is ambiguous
- ❌ Assuming a workspace based on previous sessions
Date Range Rules:
- Real-time/recent searches: Add +2 days to current date for end range
- Historical ranges: Add +1 day to user's specified end date
- Example: Current date = Nov 25; "Last 7 days" → `datetime(2025-11-18)` to `datetime(2025-11-27)`
Available Investigation Types
Standard Investigation (7 days)
When to use: General security reviews, routine investigations
Example prompts:
- "Investigate user@contoso.com for the last 7 days"
- "Run security investigation for user@domain.com from 2025-11-14 to 2025-11-21"
Quick Investigation (1 day)
When to use: Urgent cases, recent suspicious activity
Example prompts:
- "Quick investigate suspicious.user@domain.com"
- "Run quick security check on admin@company.com"
Comprehensive Investigation (30 days)
When to use: Deep-dive analysis, compliance reviews, thorough forensics
Example prompts:
- "Full investigation for compromised.user@domain.com"
- "Do a deep dive investigation on external.user@partner.com"
All types include: Anomaly detection, sign-in analysis, IP enrichment, Graph identity data, device compliance, audit logs, Office 365 activity, security alerts, threat intelligence, risk assessment, and automated recommendations.
Output Modes
This skill supports three output modes. ASK the user which they prefer if not explicitly specified. Multiple modes may be selected simultaneously.
Mode 1: Inline Chat Summary (Default)
- Render the full investigation analysis directly in the chat response
- Includes key metrics, risk assessment, anomalies, IP intelligence, sign-in patterns, and recommendations
- Best for quick review and interactive follow-up questions
- No file output — results stay in the chat context
Mode 2: Markdown File Report
- Save a comprehensive investigation report to `reports/user-investigations/user_investigation_<username>_<YYYYMMDD_HHMMSS>.md`
- All sections from inline mode plus additional detail (full IP tables, query appendix, complete audit trail)
- Uses the Markdown Report Template defined below
- Use the `create_file` tool; NEVER use terminal commands for file output
- Filename pattern: `user_investigation_<username>_YYYYMMDD_HHMMSS.md` (extract username from UPN, e.g., `jdoe` from `jdoe@contoso.com`)
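As an illustration of the filename pattern, the sketch below builds the report path from a UPN and a timestamp. The function name `report_path` is hypothetical, and the choice of which clock or timezone supplies `now` is left to the caller because the skill does not specify it.

```python
from datetime import datetime


def report_path(upn: str, now: datetime) -> str:
    """Build the Mode 2 report path from a UPN and a timestamp."""
    # The username is everything before the first "@" in the UPN.
    username = upn.split("@", 1)[0]
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return f"reports/user-investigations/user_investigation_{username}_{stamp}.md"
```

For `jdoe@contoso.com` at 14:30:05 on 25 November 2025, this yields `reports/user-investigations/user_investigation_jdoe_20251125_143005.md`.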
Mode 3: HTML Report (Legacy)
- Export investigation data to JSON, then generate a styled HTML report via `generate_report_from_json.py`
- Interactive IP cards, paginated tables, copy-KQL buttons, and risk-colored visualizations
- Best for sharing with stakeholders who prefer a polished visual report
- Requires the Python report generator pipeline (JSON export → IP enrichment → HTML generation)
Markdown Rendering Notes
- ✅ ASCII tables, box-drawing characters, and bar charts render perfectly in markdown code blocks
- ✅ Unicode block characters (`█` full block, `─` box-drawing horizontal) display correctly in monospaced fonts
- ✅ Emoji indicators (🔴🟢🟡⚠️✅) render natively in GitHub-flavored markdown
- ✅ Standard markdown tables (`| col |`) render as formatted tables
- Tip: Wrap all ASCII art in triple-backtick code fences for consistent rendering
Mode Selection Examples
| User Request | Mode(s) |
|---|---|
| "Investigate user@domain.com" (no mode specified) | ASK user to choose |
| "Investigate user@domain.com — markdown report" | Mode 2 only |
| "Investigate user@domain.com — full report" | Mode 2 + Mode 3 (both) |
| "Quick investigate user@domain.com" | Mode 1 (inline) |
| "Investigate user@domain.com — HTML report" | Mode 3 only |
| "Investigate user@domain.com — inline and markdown" | Mode 1 + Mode 2 |
Quick Start (TL;DR)
When a user requests a security investigation:
- Get User ID:
  - `mcp_microsoft_mcp_microsoft_graph_suggest_queries("get user by email")`
  - `mcp_microsoft_mcp_microsoft_graph_get("/v1.0/users/<UPN>?$select=id,onPremisesSecurityIdentifier")`
- Determine Output Mode:
  - If user specified: use that mode (inline / markdown / HTML / combination)
  - If not specified: ASK user: "Which output format? Inline chat summary, markdown file report, HTML report, or a combination?"
- Run Parallel Queries:
  - Batch 1: 10 Sentinel queries (anomalies, IP extraction, sign-ins, IP counts, audit logs, incidents, etc.)
  - Batch 2: 6 Graph queries (profile, MFA, devices, Identity Protection)
  - Batch 3: Threat intel enrichment (after extracting IPs from batch 1)
- Generate Output (based on selected mode):
  - Mode 1 (Inline): Render analysis directly in chat (no file output)
  - Mode 2 (Markdown file): `create_file("reports/user-investigations/user_investigation_<username>_<timestamp>.md", markdown_content)`
  - Mode 3 (HTML report): `create_file("temp/investigation_<upn_prefix>_<timestamp>.json", json_content)`, then run:
    - `$env:PYTHONPATH = "<WORKSPACE_ROOT>"`
    - `.venv\Scripts\python.exe scripts/generate_report_from_json.py temp/investigation_<upn_prefix>_<timestamp>.json`
- IP Enrichment (Modes 2 & 3):
  - Mode 2 (Markdown): Run `python enrich_ips.py <ip1> <ip2> ...` for top IPs extracted from queries, then include enrichment results in the markdown report
  - Mode 3 (HTML): IP enrichment is handled automatically by `generate_report_from_json.py`
- Track time after each major step and report to user
Execution Workflow
🚨 MANDATORY: Time Tracking Pattern
YOU MUST TRACK AND REPORT TIME AFTER EVERY MAJOR STEP:
[MM:SS] ✓ Step description (XX seconds)
Required Reporting Points:
- After User ID retrieval
- After parallel data collection
- After JSON file creation
- After report generation
- Final: Total elapsed time
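The reporting pattern above can be produced by a small timer helper. The sketch below is illustrative only: `StepTimer` and its injectable `clock` argument are hypothetical names, and it assumes that `MM:SS` is elapsed time since the investigation started while the parenthesized value is the duration of the step just completed.

```python
import time
from typing import Callable


class StepTimer:
    """Formats '[MM:SS] ✓ Step description (XX seconds)' progress lines."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start = clock()
        self._last = self._start

    def mark(self, description: str) -> str:
        now = self._clock()
        total = int(now - self._start)  # elapsed since the investigation began
        step = int(now - self._last)    # duration of the step just finished
        self._last = now
        minutes, seconds = divmod(total, 60)
        return f"[{minutes:02d}:{seconds:02d}] ✓ {description} ({step} seconds)"
```

Calling `mark()` once at each required reporting point gives a consistent running log; the final call reports the total elapsed time in its `MM:SS` prefix.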
Phase 1: Get User ID and SID (REQUIRED FIRST)
- Get user Object ID (Entra ID) and onPremisesSecurityIdentifier (Windows SID) from Microsoft Graph
- Query: /v1.0/users/<UPN>?$select=id,onPremisesSecurityIdentifier
Why this is required:
- User ID needed for SecurityIncident queries (alerts use User ID, not UPN)
- User ID needed for Identity Protection queries
- Windows SID needed for on-premises incident matching
- Missing User ID = missed incidents (e.g., "Device Code Authentication Flow Detected")
Phase 2: Parallel Data Collection
CRITICAL: Use create_file tool to create JSON - NEVER use PowerShell terminal commands!
Batch 1: Sentinel Queries (Run ALL in parallel)
- IP selection query (Query 1) - Returns up to 15 prioritized IPs
- Anomalies query (Query 2)
- UEBA anomaly summary (Query 12) - Sentinel Anomalies table: scored behavioral detections
- Sign-in by application (Query 3)
- Sign-in by location (Query 3b)
- Sign-in failures (Query 3c)
- Audit logs (Query 4)
- Office 365 activity (Query 5)
- DLP events (Query 10)
- Security incidents (Query 6)
After Batch 1 completes: Extract IP Array from Query 1 Results
- Extract IPAddress column into array: `["ip1", "ip2", "ip3", ...]`
- Build dynamic array for next batch: `let target_ips = dynamic(["ip1", "ip2", "ip3", ...]);`
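A minimal sketch of this extraction step follows, assuming the Query 1 result arrives as a list of row dictionaries with an `IPAddress` key (the actual tool response shape is not specified here). The helper name `build_target_ips_let` is hypothetical. Validating each value as an IP address before embedding it in KQL is an added precaution, not something the skill mandates.

```python
import ipaddress
import json
from typing import Iterable, List, Mapping


def build_target_ips_let(rows: Iterable[Mapping[str, str]]) -> str:
    """Turn Query 1 rows into the KQL 'let target_ips = dynamic([...]);' line."""
    ips: List[str] = []
    for row in rows:
        ip = row["IPAddress"].strip()
        # Raises ValueError for anything that is not an IPv4/IPv6 literal,
        # so stray text can never be spliced into the query.
        ipaddress.ip_address(ip)
        if ip not in ips:  # keep Query 1 priority order, drop duplicates
            ips.append(ip)
    return f"let target_ips = dynamic({json.dumps(ips)});"
```

The original string form of each IP is kept rather than a normalized one, so the values match what the sign-in tables logged.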
Batch 2: IP Enrichment + Graph Queries (Run ALL in parallel)
- Threat Intel query (Query 11) - Uses IPs from Query 1
- IP frequency query (Query 3d) - Uses IPs from Query 1
- User profile (Graph)
- MFA methods (Graph)
- Registered devices (Graph)
- User risk profile (Graph)
- Risk detections (Graph)
- Risky sign-ins (Graph)
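The "run ALL in parallel" requirement for each batch can be sketched with `asyncio.gather`. This is an illustration under assumptions: `run_query` stands in for whatever async wrapper calls the Sentinel or Graph tool, and both it and `run_batch` are hypothetical names rather than part of the skill.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Mapping


async def run_batch(
    queries: Mapping[str, str],
    run_query: Callable[[str], Awaitable[Any]],
) -> Dict[str, Any]:
    """Run every query in one batch concurrently and key results by query name."""
    names = list(queries)
    results = await asyncio.gather(
        *(run_query(queries[name]) for name in names),
        return_exceptions=True,  # a failed query comes back as an exception object
    )
    return dict(zip(names, results))
```

Returning exceptions instead of raising lets the caller see exactly which query failed, then stop and report the error as the workspace failure rules require, without losing the results of the other queries.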
IP Selection Strategy (Query 1 - Deterministic KQL with Risky IPs):
- Priority 1: Anomaly IPs (from Signinlogs_Anomalies_KQL_CL where AnomalyType endswith "IP") - 8 slots
- Priority 2: Risky IPs (from AADUserRiskEvents - Identity Protection flagged IPs) - 4 slots
- Priority 3: Frequent IPs (top sign-in count for baseline context) - 3 slots
- Deduplication: anomaly IPs are excluded from the risky slot, and anomaly and risky IPs are excluded from the frequent slot (no duplicates)
- Result: Up to 15 unique IPs (8 anomaly + 4 risky-only + 3 frequent-only)
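To make the slot logic concrete, here is a plain-Python mirror of the selection. It is a sketch for reasoning about the rules, not a replacement for Query 1. It assumes each input list is already ordered best-first, and it follows the KQL detail that the frequent slot excludes the whole risky pool (top 10), not only the four risky IPs that were selected. The function name is hypothetical.

```python
from typing import List, Sequence


def select_priority_ips(
    anomaly: Sequence[str],
    risky: Sequence[str],
    frequent: Sequence[str],
) -> List[str]:
    """Return up to 15 unique IPs: 8 anomaly + 4 risky-only + 3 frequent-only."""
    anomaly_slot = list(dict.fromkeys(anomaly))[:8]
    risky_pool = list(dict.fromkeys(risky))[:10]
    frequent_pool = list(dict.fromkeys(frequent))[:10]

    # Risky slot: anti-join against anomaly IPs, then take 4.
    risky_slot = [ip for ip in risky_pool if ip not in anomaly_slot][:4]

    # Frequent slot: anti-join against anomaly IPs and the full risky pool, then take 3.
    excluded = set(anomaly_slot) | set(risky_pool)
    frequent_slot = [ip for ip in frequent_pool if ip not in excluded][:3]

    return anomaly_slot + risky_slot + frequent_slot
```

For example, with anomaly IPs `a1, a2`, risky IPs `a1, r1..r5`, and frequent IPs `r5, a2, f1..f4`, the result is `a1, a2, r1, r2, r3, r4, f1, f2, f3`.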
Phase 3: Export & Generate Report (Mode-Dependent)
Mode 1 — Inline Chat Summary
- No file export needed
- Render the full investigation analysis directly in chat using the section structure from the Markdown Report Template as a guide
- Include: Executive Summary, Key Metrics, Anomalies, IP Intelligence summary, Sign-in Patterns, Risk Assessment, Recommendations
- Use emoji-coded tables for risk factors and mitigating factors
Mode 2 — Markdown File Report
- Assess IP enrichment needs:
  - Extract the top priority IPs from Query 1 results
  - Run `python enrich_ips.py <ip1> <ip2> ...` for threat intelligence enrichment
  - Parse the output to populate IP Intelligence tables in the report
- Build the markdown report using the Markdown Report Template below
  - Populate ALL sections with actual query data
  - For sections with no data: use the explicit absence confirmation pattern (e.g., "✅ No anomalies detected...")
  - Calculate risk score and assessment dynamically (same logic as HTML report; see `generate_report_from_json.py`)
- Save the report: `create_file("reports/user-investigations/user_investigation_<username>_YYYYMMDD_HHMMSS.md", markdown_content)`
  - Use the `create_file` tool; NEVER use terminal commands for file output
  - Extract username from UPN (e.g., `jdoe` from `jdoe@contoso.com`)
Mode 3 — HTML Report (Legacy)
- Export to JSON: Create a single JSON file, `temp/investigation_{upn_prefix}_{timestamp}.json`. Merge all results into one dict structure (see JSON Export Structure section below).
- Generate HTML report:
  - `$env:PYTHONPATH = "<WORKSPACE_ROOT>"`
  - `cd "<WORKSPACE_ROOT>"`
  - `.\.venv\Scripts\python.exe scripts/generate_report_from_json.py temp/investigation_<upn_prefix>_<timestamp>.json`
The HTML report generator handles:
- Dataclass transformation logic
- IP enrichment (prioritized: anomaly IPs first, then frequent sign-in IPs, cap at 10)
- Dynamic risk assessment (NO hardcoded text - all metrics calculated from data)
- KQL query template population
- Result counts calculation
- HTML report generation with modern, streamlined design
Combining Modes
When multiple modes are selected (e.g., "markdown and HTML"):
- Run the data collection once (Phase 2)
- Generate each output format in sequence
- For Mode 2 + Mode 3: the JSON export from Mode 3 can reuse the same data; generate markdown first, then JSON + HTML
Required Field Specifications
User Profile Query
/v1.0/users/<UPN>?$select=id,displayName,userPrincipalName,mail,userType,jobTitle,department,officeLocation,accountEnabled,onPremisesSecurityIdentifier
- All fields REQUIRED for report generation
- Default null values: `department="Unknown"`, `officeLocation="Unknown"`
- `onPremisesSecurityIdentifier` returns Windows SID (format: `S-1-5-21-...`) - REQUIRED for on-premises incident matching
Device Query
/v1.0/users/<USER_ID>/ownedDevices?$select=id,deviceId,displayName,operatingSystem,operatingSystemVersion,registrationDateTime,isCompliant,isManaged,trustType,approximateLastSignInDateTime&$orderby=approximateLastSignInDateTime desc&$top=5&$count=true
- All fields REQUIRED for report generation
- Default null values: `trustType="Workplace"`, `approximateLastSignInDateTime="2025-01-01T00:00:00Z"`
MFA Methods Query
/v1.0/users/<USER_ID>/authentication/methods?$top=5
Sample KQL Queries
Replace <UPN>, <StartDate>, <EndDate> in these patterns.
⚠️ CRITICAL: START WITH THESE EXACT QUERY PATTERNS These queries have been tested and validated. Use them as your PRIMARY reference.
Tool Selection for This Skill
Follow the global tool selection rule from copilot-instructions.md:
| Investigation Lookback | Tool | Reason |
|---|---|---|
| ≤ 30 days (Quick, Standard, Comprehensive) | `RunAdvancedHuntingQuery` | Free for Analytics-tier tables; covers all connected workspace tables |
| > 30 days (custom range) | `mcp_sentinel-data_query_lake` | AH only retains 30 days |
| AH query blocked by safety filter | `mcp_sentinel-data_query_lake` | Fallback |
| AH returns "table not found" | `mcp_sentinel-data_query_lake` | Fallback |
Default: Use RunAdvancedHuntingQuery for all standard investigations. All three investigation types (1d, 7d, 30d) fit within AH's 30-day retention window. Only fall back to Data Lake when the lookback exceeds 30 days or AH fails.
Timestamp column: All tables used in this skill (SigninLogs, AuditLogs, SecurityAlert, SecurityIncident, OfficeActivity, CloudAppEvents, AADUserRiskEvents, Signinlogs_Anomalies_KQL_CL, ThreatIntelIndicators) use TimeGenerated in both tools — no adaptation needed when switching.
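The tool selection rule reduces to a two-branch decision, sketched below. The function name and the `ah_failed` flag are hypothetical; `ah_failed` stands for either fallback condition in the table (safety filter block or "table not found").

```python
def choose_query_tool(lookback_days: int, ah_failed: bool = False) -> str:
    """Pick the query tool for a given lookback, per the table above."""
    # Advanced Hunting retains 30 days, so longer ranges need the Data Lake.
    if lookback_days > 30 or ah_failed:
        return "mcp_sentinel-data_query_lake"
    return "RunAdvancedHuntingQuery"
```

All three built-in investigation types (1, 7, and 30 days) resolve to `RunAdvancedHuntingQuery` unless a failure forces the fallback.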
📅 Date Range Quick Reference
🔴 STEP 0: GET CURRENT DATE FIRST (MANDATORY) 🔴
- ALWAYS check the current date from the context header BEFORE calculating date ranges
- NEVER use hardcoded years - the year changes and you WILL query the wrong timeframe
RULE 1: Real-Time/Recent Searches (Current Activity)
- Add +2 days to current date for end range
- Why +2? +1 for the timezone offset (PST is behind UTC) and +1 for an inclusive end-of-day
- Pattern: Today is Nov 25 (PST) → Use `datetime(2025-11-27)` as end date
RULE 2: Historical Searches (User-Specified Dates)
- Add +1 day to user's specified end date
- Why +1? To include all 24 hours of the final day
Examples Table (Assuming Current Date = November 27, 2025):
| User Request | `<StartDate>` | `<EndDate>` | Rule Applied |
|---|---|---|---|
| "Last 7 days" | `2025-11-20` | `2025-11-29` | Rule 1 (+2) |
| "Last 30 days" | `2025-10-28` | `2025-11-29` | Rule 1 (+2) |
| "Nov 21 to Nov 23" | `2025-11-21` | `2025-11-24` | Rule 2 (+1) |
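The two rules can be checked against the examples table with a few lines of date arithmetic. The function names below are hypothetical, and the sketch assumes `today` is taken from the context header as Step 0 requires.

```python
from datetime import date, timedelta
from typing import Tuple


def realtime_range(today: date, lookback_days: int) -> Tuple[date, date]:
    """Rule 1: start = today - lookback, end = today + 2 days."""
    return today - timedelta(days=lookback_days), today + timedelta(days=2)


def historical_range(start: date, end: date) -> Tuple[date, date]:
    """Rule 2: keep the start, add 1 day to the user's end date."""
    return start, end + timedelta(days=1)


def kql_datetime(d: date) -> str:
    """Render a date as a KQL datetime literal, e.g. datetime(2025-11-29)."""
    return f"datetime({d.isoformat()})"
```

With `today = date(2025, 11, 27)`, `realtime_range(today, 7)` gives 2025-11-20 to 2025-11-29, which matches the first row of the table.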
🚨 CRITICAL - SIGN-IN QUERIES REQUIREMENT 🚨
You MUST run ALL THREE sign-in queries (3, 3b, 3c) to populate the signin_events dict!
1. Extract Top Priority IPs (Deterministic IP Selection with Risky IPs)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let upn = '<UPN>';
// Priority 1: Anomaly IPs (top 8 by anomaly count)
let anomaly_ips =
Signinlogs_Anomalies_KQL_CL
| where DetectedDateTime between (start .. end)
| where UserPrincipalName =~ upn
| where AnomalyType endswith "IP"
| summarize AnomalyCount = count(), FirstSeen = min(DetectedDateTime) by IPAddress = Value
| order by AnomalyCount desc, FirstSeen asc
| take 8
| extend Priority = 1, Source = "Anomaly";
// Priority 2: Risky IPs from Identity Protection (top 10 for selection pool)
let risky_ips_pool =
AADUserRiskEvents
| where ActivityDateTime between (start .. end)
| where UserPrincipalName =~ upn
| where isnotempty(IpAddress)
| summarize RiskCount = count(), FirstSeen = min(ActivityDateTime) by IPAddress = IpAddress
| order by RiskCount desc, FirstSeen asc
| take 10
| extend Priority = 2, Source = "RiskyIP";
// Priority 3: Frequent Sign-in IPs (top 10 for selection pool)
let frequent_ips_pool =
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated between (start .. end)
| where UserPrincipalName =~ upn
| summarize SignInCount = count(), FirstSeen = min(TimeGenerated) by IPAddress
| order by SignInCount desc, FirstSeen asc
| take 10
| extend Priority = 3, Source = "Frequent";
// Get anomaly IP list for exclusion from risky slot
let anomaly_ip_list = anomaly_ips | project IPAddress;
// Get anomaly + risky IP list for exclusion from frequent slot
let priority_ip_list =
union anomaly_ips, risky_ips_pool
| project IPAddress;
// Reserve slots with deduplication: 8 anomaly + 4 risky + 3 frequent
let anomaly_slot = anomaly_ips | extend Count = AnomalyCount;
let risky_slot = risky_ips_pool
| join kind=anti anomaly_ip_list on IPAddress
| order by RiskCount desc, FirstSeen asc
| take 4
| extend Count = RiskCount;
let frequent_slot = frequent_ips_pool
| join kind=anti priority_ip_list on IPAddress
| order by SignInCount desc, FirstSeen asc
| take 3
| extend Count = SignInCount;
union anomaly_slot, risky_slot, frequent_slot
| project IPAddress, Priority, Count, Source
| order by Priority asc, Count desc
| project IPAddress
2. Anomalies (Signinlogs_Anomalies_KQL_CL)
Signinlogs_Anomalies_KQL_CL
| where DetectedDateTime between (datetime(<StartDate>) .. datetime(<EndDate>))
| where UserPrincipalName =~ '<UPN>'
| extend Severity = case(
BaselineSize < 3, "Informational",
CountryNovelty and CityNovelty and ArtifactHits >= 20, "High",
ArtifactHits >= 10, "Medium",
(CountryNovelty or CityNovelty or StateNovelty), "Medium",
ArtifactHits >= 5, "Low",
"Informational")
| extend SeverityOrder = case(Severity == 'High', 1, Severity == 'Medium', 2, Severity == 'Low', 3, 4)
| project
DetectedDateTime,
UserPrincipalName,
AnomalyType,
Value,
Severity,
SeverityOrder,
Country,
City,
State,
CountryNovelty,
CityNovelty,
StateNovelty,
ArtifactHits,
FirstSeenRecent,
BaselineSize,
OS,
BrowserFamily,
RawBrowser
| order by SeverityOrder asc, DetectedDateTime desc
| take 10
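The `case()` expression in Query 2 is order-sensitive: the first matching branch wins. The Python mirror below is only an aid for reading or unit-testing that ordering; the function name is hypothetical and the thresholds are copied from the query.

```python
def anomaly_severity(
    baseline_size: int,
    country_novelty: bool,
    city_novelty: bool,
    state_novelty: bool,
    artifact_hits: int,
) -> str:
    """Mirror of the Query 2 severity case(); branches are checked in order."""
    if baseline_size < 3:
        return "Informational"  # too little baseline to trust the signal
    if country_novelty and city_novelty and artifact_hits >= 20:
        return "High"
    if artifact_hits >= 10:
        return "Medium"
    if country_novelty or city_novelty or state_novelty:
        return "Medium"
    if artifact_hits >= 5:
        return "Low"
    return "Informational"
```

Note that a small baseline overrides everything else: a new country with 50 hits is still Informational when `BaselineSize` is below 3.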
3. Interactive & Non-Interactive Sign-ins (Summary by Application)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated between (start .. end)
| where UserPrincipalName =~ '<UPN>'
| summarize
SignInCount=count(),
SuccessCount=countif(ResultType == '0'),
FailureCount=countif(ResultType != '0'),
FirstSeen=min(TimeGenerated),
LastSeen=max(TimeGenerated),
IPAddresses=make_set(IPAddress),
UniqueLocations=dcount(Location)
by AppDisplayName
| order by SignInCount desc
| take 5
3b. Sign-ins Summary by Location
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated between (start .. end)
| where UserPrincipalName =~ '<UPN>'
| where isnotempty(Location)
| summarize
SignInCount=count(),
SuccessCount=countif(ResultType == '0'),
FailureCount=countif(ResultType != '0'),
FirstSeen=min(TimeGenerated),
LastSeen=max(TimeGenerated),
IPAddresses=make_set(IPAddress),
Applications=make_set(AppDisplayName, 5)
by Location
| order by SignInCount desc
| take 5
3c. Sign-in Failures (Detailed)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated between (start .. end)
| where UserPrincipalName =~ '<UPN>'
| where ResultType != '0'
| summarize
FailureCount=count(),
FirstSeen=min(TimeGenerated),
LastSeen=max(TimeGenerated),
Applications=make_set(AppDisplayName, 3),
Locations=make_set(Location, 3)
by ResultType, ResultDescription
| order by FailureCount desc
| take 5
3d. Sign-in Counts by IP Address
let target_ips = dynamic(["<IP_1>", "<IP_2>", "<IP_3>", ...]);
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let most_recent_signins = union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated between (start .. end)
| where UserPrincipalName =~ '<UPN>'
| where IPAddress in (target_ips)
| summarize arg_max(TimeGenerated, *) by IPAddress;
most_recent_signins
| extend AuthDetails = parse_json(AuthenticationDetails)
| extend HasAuthDetails = array_length(AuthDetails) > 0
| extend AuthDetailsToExpand = iif(HasAuthDetails, AuthDetails, dynamic([{"authenticationStepResultDetail": ""}]))
| mv-expand AuthDetailsToExpand
| extend AuthStepResultDetail = tostring(AuthDetailsToExpand.authenticationStepResultDetail)
| extend AuthPriority = case(
AuthStepResultDetail has "MFA requirement satisfied", 1,
AuthStepResultDetail has "Correct password", 2,
AuthStepResultDetail has "Passkey", 2,
AuthStepResultDetail has "Phone sign-in", 2,
AuthStepResultDetail has "SMS verification", 2,
AuthStepResultDetail has "First factor requirement satisfied", 3,
AuthStepResultDetail has "MFA required", 4,
999)
| summarize
MostRecentTime = any(TimeGenerated),
MostRecentResultType = any(ResultType),
HasAuthDetails = any(HasAuthDetails),
MinPriority = min(AuthPriority),
AllAuthDetails = make_set(AuthStepResultDetail)
by IPAddress
| extend LastAuthResultDetail = case(
MostRecentResultType != "0", "Authentication failed",
not(HasAuthDetails) and MostRecentResultType == "0", "Token",
MinPriority == 1 and AllAuthDetails has "MFA requirement satisfied", "MFA requirement satisfied by claim in the token",
MinPriority == 2 and AllAuthDetails has "Correct password", "Correct password",
MinPriority == 2 and AllAuthDetails has "Passkey (device-bound)", "Passkey (device-bound)",
MinPriority == 3 and AllAuthDetails has "First factor requirement satisfied by claim in the token", "First factor requirement satisfied by claim in the token",
MinPriority == 4 and AllAuthDetails has "MFA required in Entra ID", "MFA required in Entra ID",
tostring(AllAuthDetails[0]))
| join kind=inner (
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated between (start .. end)
| where UserPrincipalName =~ '<UPN>'
| where IPAddress in (target_ips)
| summarize
SignInCount = count(),
SuccessCount = countif(ResultType == '0'),
FailureCount = countif(ResultType != '0'),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated)
by IPAddress
) on IPAddress
| project IPAddress, SignInCount, SuccessCount, FailureCount, FirstSeen, LastSeen, LastAuthResultDetail
| order by SignInCount desc
4. Entra ID Audit Log Activity (Aggregated Summary)
Tool: RunAdvancedHuntingQuery (≤30d) | mcp_sentinel-data_query_lake (>30d fallback)
AH parsing note: InitiatedBy is dynamic in AH — use tostring(InitiatedBy.user.userPrincipalName) for direct field access. For TargetResources, use tostring(TargetResources[0].displayName). Do NOT double-wrap with parse_json(tostring(parse_json(tostring(...)))) — that Data Lake pattern can cause errors in AH.
AuditLogs
| where TimeGenerated between (datetime(<StartDate>) .. datetime(<EndDate>))
| where Identity =~ '<UPN>' or tostring(InitiatedBy) has '<UPN>'
| summarize
Count=count(),
FirstSeen=min(TimeGenerated),
LastSeen=max(TimeGenerated),
Operations=make_set(OperationName, 10)
by Category, Result
| order by Count desc
| take 10
Ad-hoc drill-down pattern (AH-safe): When you need detailed audit entries beyond the summary above:
AuditLogs
| where TimeGenerated between (datetime(<StartDate>) .. datetime(<EndDate>))
| where Identity =~ '<UPN>' or tostring(InitiatedBy) has '<UPN>'
| extend Actor = tostring(InitiatedBy.user.userPrincipalName)
| extend Target = tostring(TargetResources[0].displayName)
| project TimeGenerated, OperationName, Actor, Target, Result, Category
| order by TimeGenerated desc
| take 30
5. Office 365 (Email / Teams / SharePoint) Activity Distribution
OfficeActivity
| where TimeGenerated between (datetime(<StartDate>) .. datetime(<EndDate>))
| where UserId =~ '<UPN>'
| summarize ActivityCount = count() by RecordType, Operation
| order by ActivityCount desc
| take 5
6. Security Incidents with Alerts Correlated to User
let targetUPN = "<UPN>";
let targetUserId = "<USER_OBJECT_ID>"; // REQUIRED: Get from Microsoft Graph API
let targetSid = "<WINDOWS_SID>"; // REQUIRED: Get from Microsoft Graph API
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let relevantAlerts = SecurityAlert
| where TimeGenerated between (start .. end)
| where Entities has targetUPN or Entities has targetUserId or Entities has targetSid
| summarize arg_max(TimeGenerated, *) by SystemAlertId
| project SystemAlertId, AlertName, AlertSeverity, ProviderName, Tactics;
SecurityIncident
| where CreatedTime between (start .. end)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| where not(tostring(Labels) has "Redirected")
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| join kind=inner relevantAlerts on $left.AlertId == $right.SystemAlertId
| extend ProviderIncidentUrl = tostring(AdditionalData.providerIncidentUrl)
| extend OwnerUPN = tostring(Owner.userPrincipalName)
| extend LastModifiedTime = todatetime(LastModifiedTime)
| summarize
Title = any(Title),
Severity = any(Severity),
Status = any(Status),
Classification = any(Classification),
CreatedTime = any(CreatedTime),
LastModifiedTime = any(LastModifiedTime),
OwnerUPN = any(OwnerUPN),
ProviderIncidentUrl = any(ProviderIncidentUrl),
AlertCount = count()
by ProviderIncidentId
| order by LastModifiedTime desc
| take 10
CRITICAL: ALL THREE identifiers are REQUIRED (targetUPN, targetUserId, targetSid) - different alert types use different entity formats.
10. DLP Events (Data Loss Prevention)
let upn = '<UPN>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
CloudAppEvents
| where TimeGenerated between (start .. end)
| where ActionType in ("FileCopiedToRemovableMedia", "FileUploadedToCloud", "FileCopiedToNetworkShare")
| extend ParsedData = parse_json(RawEventData)
| extend DlpAudit = ParsedData["DlpAuditEventMetadata"]
| extend File = ParsedData["ObjectId"]
| extend UserId = ParsedData["UserId"]
| extend DeviceName = ParsedData["DeviceName"]
| extend ClientIP = ParsedData["ClientIP"]
| extend RuleName = ParsedData["PolicyMatchInfo"]["RuleName"]
| extend Operation = ParsedData["Operation"]
| extend TargetDomain = ParsedData["TargetDomain"]
| extend TargetFilePath = ParsedData["TargetFilePath"]
| where isnotnull(DlpAudit)
| where tostring(UserId) =~ upn
| summarize by TimeGenerated, tostring(UserId), tostring(DeviceName), tostring(ClientIP), tostring(RuleName), tostring(File), tostring(Operation), tostring(TargetDomain), tostring(TargetFilePath)
| order by TimeGenerated desc
| take 5
11. Threat Intelligence IP Enrichment (Bulk IP Query)
Performance notes: Filter IsActive/ValidUntil before transformations per KQL best practices. The triple replace_string was replaced with direct array indexing split(...)[0].
let target_ips = dynamic(["<IP_1>", "<IP_2>", "<IP_3>"]);
ThreatIntelIndicators
| where IsActive and (ValidUntil > now() or isempty(ValidUntil))
| where tostring(split(ObservableKey, ":")[0]) in ("ipv4-addr", "ipv6-addr", "network-traffic")
| where ObservableValue in (target_ips)
| extend Description = tostring(parse_json(Data).description)
| where Description !contains_cs "State: inactive;" and Description !contains_cs "State: falsepos;"
| extend TrafficLightProtocolLevel = tostring(parse_json(AdditionalFields).TLPLevel)
| extend ActivityGroupNames = extract(@"ActivityGroup:(\S+)", 1, tostring(parse_json(Data).labels))
| summarize arg_max(TimeGenerated, *) by ObservableValue
| project
TimeGenerated,
IPAddress = ObservableValue,
ThreatDescription = Description,
ActivityGroupNames,
Confidence,
ValidUntil,
TrafficLightProtocolLevel,
IsActive
| order by Confidence desc, TimeGenerated desc
12. UEBA Anomaly Summary (Sentinel Anomalies Table)
Purpose: Retrieves scored behavioral anomaly detections from Sentinel's built-in UEBA anomaly rules. Aggregates by anomaly type — collapses high-volume rows (e.g., 50 "Anomalous Role Assignment" events) into a single summary row per template. Extracts only the anomalous flags (IsAnomalous == true) and flattens MITRE arrays. Score range: 0.0–1.0 (≥0.7 = High, 0.3–0.7 = Medium, <0.3 = Low).
Data source: The Anomalies table is the KQL source behind the portal's "UEBA anomalies" section. It is distinct from BehaviorInfo (MCAS, AH-only) and BehaviorAnalytics (raw UEBA events, Data Lake-only). Available in both Advanced Hunting and Data Lake.
Tool: RunAdvancedHuntingQuery (default) or mcp_sentinel-data_query_lake (>30d fallback)
⚠️ TI False Positive: DeviceInsights.ThreatIntelIndicatorType frequently shows BruteForce on corporate/Azure egress IPs (TITAN dynamic reputation). Weight the Score and AnomalyFlags over the TI match — a 0.2-score anomaly with a BruteForce TI hit on a known corporate IP is noise.
let targetUPN = '<UPN>';
let lookback = 30d;
Anomalies
| where TimeGenerated > ago(lookback)
| where UserPrincipalName =~ targetUPN
| extend TI_Type = tostring(DeviceInsights.ThreatIntelIndicatorType)
| mv-apply reason = AnomalyReasons on (
where tobool(reason.IsAnomalous) == true
| project FlagName = tostring(reason.Name))
| summarize
Occurrences = dcount(Id),
MaxScore = max(Score),
AvgScore = round(avg(Score), 2),
Tactics = make_set(parse_json(Tactics)),
Techniques = make_set(parse_json(Techniques)),
SourceIPs = make_set(SourceIpAddress, 5),
AnomalyFlags = make_set(FlagName),
TI_Flags = make_set_if(TI_Type, isnotempty(TI_Type)),
FirstSeen = min(StartTime),
LastSeen = max(EndTime),
SampleDescription = take_any(Description)
by AnomalyTemplateName
| mv-apply t = Tactics to typeof(string) on (summarize Tactics = make_set(t))
| mv-apply t = Techniques to typeof(string) on (summarize Techniques = make_set(t))
| extend Tactics = set_difference(Tactics, dynamic([""]))
| extend Techniques = set_difference(Techniques, dynamic([""]))
| order by MaxScore desc, Occurrences desc
Output columns: AnomalyTemplateName, Occurrences (unique anomaly IDs), MaxScore, AvgScore, Tactics, Techniques, SourceIPs, AnomalyFlags (flat set of anomalous reasons), TI_Flags, FirstSeen, LastSeen, SampleDescription (one example description for context).
Verdict guidance:
- 🔴 Escalate: MaxScore ≥ 0.7 with multiple occurrences, or anomaly type involves credential access / account manipulation
- 🟠 Investigate: MaxScore ≥ 0.3, or flags include `CountryUncommonlyConnectedFromByUser` combined with `ActionUncommonlyPerformedByUser`
- 🟡 Monitor: Low scores (<0.3) with explainable flags (e.g., first-time admin operations, CTF/lab accounts in target entities)
- ✅ Clear: 0 results — no UEBA anomalies detected
Zero results note: Unlike Q2 (custom Signinlogs_Anomalies_KQL_CL), Q12 queries the built-in Sentinel UEBA Anomalies table. Zero results means no built-in anomaly rules fired — not that UEBA is disabled. If UEBA is not enabled in the workspace, the table may not exist (handle gracefully).
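The score bands and the score-driven part of the verdict guidance can be sketched as follows. This is a deliberate simplification: it covers only the numeric thresholds and the zero-results case, and omits the anomaly-type condition (credential access / account manipulation) and the flag-combination condition, both of which need analyst judgment. The function names and the row shape (dicts with `MaxScore` and `Occurrences`) are assumptions.

```python
from typing import Mapping, Sequence


def score_band(score: float) -> str:
    """Map a UEBA score (0.0 to 1.0) to the band used by Query 12."""
    if score >= 0.7:
        return "High"
    if score >= 0.3:
        return "Medium"
    return "Low"


def score_verdict(rows: Sequence[Mapping[str, float]]) -> str:
    """Score-only triage of Query 12 summary rows."""
    if not rows:
        return "Clear"  # zero results: no built-in anomaly rules fired
    if any(r["MaxScore"] >= 0.7 and r["Occurrences"] > 1 for r in rows):
        return "Escalate"
    if any(r["MaxScore"] >= 0.3 for r in rows):
        return "Investigate"
    return "Monitor"
```

A single high-score occurrence lands in Investigate here because the Escalate rule asks for multiple occurrences; the omitted type-based rule may still raise it to Escalate.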
Microsoft Graph Identity Protection Queries
CRITICAL: Always query Identity Protection data in Phase 2 (Batch 2) of investigation workflow
Step 1: Get User Object ID and Windows SID
mcp_microsoft_mcp_microsoft_graph_get("/v1.0/users/<UPN>?$select=id,displayName,userPrincipalName,onPremisesSecurityIdentifier")
Step 2: Get User Risk Profile
mcp_microsoft_mcp_microsoft_graph_get("/v1.0/identityProtection/riskyUsers/<USER_ID>")
Returns: riskLevel (low/medium/high/none), riskState (atRisk/confirmedCompromised/dismissed/remediated)
Step 3: Get Risk Detections
mcp_microsoft_mcp_microsoft_graph_get("/v1.0/identityProtection/riskDetections?$filter=userId eq '<USER_ID>'&$select=id,detectedDateTime,riskEventType,riskLevel,riskState,riskDetail,ipAddress,location,activity,activityDateTime&$orderby=detectedDateTime desc&$top=10")
Returns: Array of risk events with riskEventType (unlikelyTravel, unfamiliarFeatures, anonymizedIPAddress, etc.)
Step 4: Get Risky Sign-ins
mcp_microsoft_mcp_microsoft_graph_get("/beta/auditLogs/signIns?$filter=userId eq '<USER_ID>' and (riskState eq 'atRisk' or riskState eq 'confirmedCompromised')&$select=id,createdDateTime,userPrincipalName,appDisplayName,ipAddress,location,riskState,riskLevelDuringSignIn,riskEventTypes_v2,riskDetail,status&$orderby=createdDateTime desc&$top=5")
NOTE: Risky sign-ins are ONLY available in /beta endpoint, not /v1.0
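When these paths are assembled programmatically, the user ID should be validated and the query values percent-encoded. The sketch below builds the Step 3 path; the helper name `risk_detections_path` is hypothetical, and whether the MCP Graph tool needs pre-encoded paths is an assumption, since the examples above pass spaces unencoded.

```python
import uuid
from urllib.parse import quote

# Field list copied from the Step 3 query above.
RISK_DETECTION_FIELDS = (
    "id,detectedDateTime,riskEventType,riskLevel,riskState,riskDetail,"
    "ipAddress,location,activity,activityDateTime"
)


def risk_detections_path(user_id: str, top: int = 10) -> str:
    """Build the Step 3 request path for a user object ID (GUID)."""
    # uuid.UUID raises ValueError for non-GUID input, which keeps stray
    # quotes or operators out of the OData $filter expression.
    canonical_id = str(uuid.UUID(user_id))
    filter_expr = quote(f"userId eq '{canonical_id}'", safe="'")
    order_expr = quote("detectedDateTime desc")
    return (
        "/v1.0/identityProtection/riskDetections"
        f"?$filter={filter_expr}"
        f"&$select={RISK_DETECTION_FIELDS}"
        f"&$orderby={order_expr}"
        f"&$top={top}"
    )
```

The same pattern applies to the Step 2 and Step 4 paths, with the endpoint and field list swapped.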
Common Risk Event Types
- unlikelyTravel: Sign-ins from locations too far apart for the user to have traveled between them in the elapsed time
- unfamiliarFeatures: Sign-in from unfamiliar location/device/IP
- anonymizedIPAddress: Sign-in from Tor, VPN, or proxy
- maliciousIPAddress: Sign-in from known malicious IP
- leakedCredentials: User credentials found in leak databases
Markdown Report Template
When outputting to markdown file (Mode 2), use this template. Populate ALL sections with actual query data. For sections with no data, use the explicit absence confirmation pattern.
Filename pattern: reports/user-investigations/user_investigation_<username>_YYYYMMDD_HHMMSS.md
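The filename pattern can be produced with a small helper. This is a sketch under two assumptions that the pattern itself does not state: `<username>` is taken to be the local part of the UPN, and the timestamp is taken in UTC. The function name `report_path` is hypothetical.

```python
import re
from datetime import datetime, timezone


def report_path(upn, now=None):
    """Return the report path for a UPN, following the filename pattern."""
    now = now or datetime.now(timezone.utc)
    username = upn.split("@", 1)[0]
    # Replace anything outside a conservative character set so the
    # result is a safe filename on common filesystems.
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", username)
    return (
        "reports/user-investigations/"
        f"user_investigation_{safe_name}_{now:%Y%m%d_%H%M%S}.md"
    )
```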
# User Security Investigation Report
**Generated:** YYYY-MM-DD HH:MM UTC
**Workspace:** <workspace_name>
**User:** <display_name> (`<UPN>`)
**Department:** <department> | **Title:** <job_title> | **Location:** <office_location>
**Account Status:** <Enabled/Disabled> | **User Type:** <Member/Guest>
**Investigation Period:** <start_date> → <end_date> (<N> days)
**Investigation Type:** <Standard (7d) / Quick (1d) / Comprehensive (30d)>
**Data Sources:** SigninLogs, AADNonInteractiveUserSignInLogs, AuditLogs, SecurityAlert, SecurityIncident, OfficeActivity, CloudAppEvents, AADUserRiskEvents, Signinlogs_Anomalies_KQL_CL, Identity Protection (Graph API), ThreatIntelIndicators
---
## Executive Summary
<2-4 sentence summary: overall risk level, key findings, most significant anomalies or concerns, and primary recommendation. Ground every claim in evidence from query results.>
**Overall Risk Level:** 🔴 HIGH / 🟠 MEDIUM / 🟡 LOW / 🟢 INFORMATIONAL
---
## Key Metrics
| Metric | Value |
|--------|-------|
| **Total Sign-ins** | <count> |
| **Successful** | <count> (<percentage>%) |
| **Failed** | <count> (<percentage>%) |
| **Unique IPs** | <count> |
| **Unique Locations** | <count> |
| **Anomalies Detected** | <count> (High: <n>, Medium: <n>, Low: <n>) |
| **Security Incidents** | <count> (Open: <n>, Closed: <n>) |
| **Risk Detections** | <count> (atRisk: <n>, remediated: <n>) |
| **DLP Events** | <count> |
| **MFA Methods** | <count> methods |
---
## MFA & Authentication Status
| Factor | Status |
|--------|--------|
| **MFA Enabled** | 🟢 Yes / 🔴 No |
| **Methods** | <list of methods: Authenticator, FIDO2, Phone, etc.> |
| **FIDO2/Passkey** | 🟢 Enrolled / 🟡 Not enrolled |
| **Authenticator App** | 🟢 Enrolled / 🟡 Not enrolled |
| **Phishing-Resistant** | 🟢 Yes (passkey/FIDO2) / 🟡 No |
---
## Identity Protection
### User Risk Profile
| Field | Value |
|-------|-------|
| **Risk Level** | 🔴/🟠/🟡/🟢 <high/medium/low/none> |
| **Risk State** | <atRisk / confirmedCompromised / remediated / dismissed / none> |
| **Risk Detail** | <detail text> |
| **Last Updated** | <datetime> |
### Risk Detections
<If risk detections found:>
| Detected | Risk Type | Level | State | IP Address | Location | Activity |
|----------|-----------|-------|-------|------------|----------|----------|
| <datetime> | <riskEventType> | <level> | <state> | <ip> | <city, country> | <signin/user> |
<If no risk detections:>
✅ No Identity Protection risk detections for this user in the investigation period.
### Risky Sign-ins
<If risky sign-ins found:>
| Time | Application | IP Address | Location | Risk Level | Risk State | Detail |
|------|-------------|------------|----------|------------|------------|--------|
| <datetime> | <app> | <ip> | <city, country> | <level> | <state> | <detail> |
<If no risky sign-ins:>
✅ No risky sign-ins detected for this user in the investigation period.
---
## Anomalies (Signinlogs_Anomalies_KQL_CL)
<If anomalies found:>
| Detected | Type | Value | Severity | Location | Hits | Geo Novelty |
|----------|------|-------|----------|----------|------|-------------|
| <datetime> | <NewInteractiveIP / NewInteractiveDeviceCombo / etc.> | <IP or OS\|Browser> | 🔴/🟠/🟡 <severity> | <country, city> | <count> | <Country: Y/N, City: Y/N> |
**Anomaly Summary:**
- <X> new IP addresses detected (Y with geographic novelty)
- <X> new device combinations detected
- Highest severity: <level> — <brief description of most critical anomaly>
<If no anomalies:>
✅ No sign-in anomalies detected in the investigation period.
- Checked: Signinlogs_Anomalies_KQL_CL (0 records)
---
## IP Intelligence
<Table of up to 15 prioritized IPs with enrichment data. Run `enrich_ips.py` for top IPs.>
| IP Address | Source | Location | ISP/Org | VPN | Abuse Score | Reports | Risk | Sign-ins | Auth Method |
|------------|--------|----------|---------|-----|-------------|---------|------|----------|-------------|
| <ip> | 🔴 Anomaly / 🟠 Risky / 🔵 Frequent | <city, country> | <org> | 🟢 No / 🔴 Yes | <score>% | <count> | HIGH/MED/LOW | <count> (✓<success>/✗<fail>) | <MFA/Password/Token/Passkey> |
### Threat Intelligence Matches
<If TI matches found:>
| IP Address | Threat Description | Confidence | Activity Groups | Valid Until |
|------------|-------------------|------------|-----------------|------------|
| <ip> | <description> | <score> | <groups> | <date> |
<If no TI matches:>
✅ No threat intelligence matches found for investigated IPs.
---
## Sign-in Activity
### Top Applications
| Application | Sign-ins | Success | Failures | Unique Locations | IP Addresses | First Seen | Last Seen |
|-------------|----------|---------|----------|------------------|--------------|------------|-----------|
| <app> | <count> | <count> | <count> | <count> | <ip_list> | <date> | <date> |
### Top Locations
| Location | Sign-ins | Success | Failures | IP Addresses | Applications | First Seen | Last Seen |
|----------|----------|---------|----------|--------------|--------------|------------|-----------|
| <location> | <count> | <count> | <count> | <ip_list> | <app_list> | <date> | <date> |
### Sign-in Failures
<If failures found:>
| Error Code | Description | Count | Applications | Locations | First Seen | Last Seen |
|------------|-------------|-------|--------------|-----------|------------|-----------|
| <code> | <description> | <count> | <app_list> | <loc_list> | <date> | <date> |
**Failure Analysis:**
- <Brief analysis of failure patterns — device compliance (53000), MFA required (50074), blocked by CA (530032), etc.>
<If no failures:>
✅ No sign-in failures detected in the investigation period.
---
## Registered Devices
<If devices found:>
| Device Name | OS | Trust Type | Compliant | Managed | Last Sign-in |
|-------------|-----|------------|-----------|---------|--------------|
| <name> | <os> <version> | <AzureAd/Hybrid/Workplace> | 🟢 Yes / 🔴 No | 🟢 Yes / 🔴 No | <date> |
<If no devices:>
✅ No registered devices found for this user.
---
## Audit Log Activity
<If audit events found:>
| Category | Result | Count | Operations | First Seen | Last Seen |
|----------|--------|-------|------------|------------|-----------|
| <category> | <Success/Failure> | <count> | <operation_list> | <date> | <date> |
**Notable Operations:**
- <Brief summary of significant audit events — password changes, role assignments, MFA modifications, app consent, etc.>
<If no audit events:>
✅ No audit log activity detected for this user in the investigation period.
---
## Office 365 Activity
<If O365 events found:>
| Record Type | Operation | Count |
|-------------|-----------|-------|
| <type> | <operation> | <count> |
<If no O365 events:>
✅ No Office 365 activity detected for this user in the investigation period.
---
## DLP Events
<If DLP events found:>
| Time | Device | Operation | File | Target | Rule |
|------|--------|-----------|------|--------|------|
| <datetime> | <device> | <operation> | <filename> | <domain/path> | <rule_name> |
**DLP Summary:**
- ⚠️ <X> sensitive file operations detected
- Operations: <network share copy, cloud upload, removable media, etc.>
- Rules triggered: <list of DLP rule names>
<If no DLP events:>
✅ No DLP events detected for this user in the investigation period.
---
## Security Incidents
<If incidents found:>
| ID | Title | Severity | Status | Classification | Created | Owner | Alerts | Link |
|----|-------|----------|--------|----------------|---------|-------|--------|------|
| <id> | <title> | 🔴/🟠/🟡 <severity> | <New/Active/Closed> | <TP/FP/BP/—> | <date> | <owner_upn> | <count> | [View](<url>) |
**Incident Summary:**
- <X> total incidents (<Y> open, <Z> closed)
- Highest severity: <level>
- <Brief description of most critical incident>
<If no incidents:>
✅ No security incidents involving this user in the investigation period.
- Checked: SecurityAlert → SecurityIncident join on UPN, User Object ID, and Windows SID (0 matches)
---
## Risk Assessment
### Risk Score: <XX>/100 — 🔴 HIGH / 🟠 MEDIUM / 🟡 LOW / 🟢 INFORMATIONAL
### Risk Factors
| Factor | Finding |
|--------|---------|
| 🔴/🟠/🟡 **<Factor Name>** | <Evidence-grounded finding with specific numbers> |
### Mitigating Factors
| Factor | Finding |
|--------|---------|
| 🟢 **<Factor Name>** | <Evidence-grounded finding with specific numbers> |
---
## Recommendations
### Critical Actions
<Numbered list of critical actions with evidence. Only include if critical findings exist.>
### High Priority Actions
<Numbered list of high-priority actions with evidence.>
### Monitoring Actions (14-Day Follow-Up)
<Bulleted list of ongoing monitoring recommendations.>
---
## Appendix: Query Details
| # | Query | Table(s) | Records | Execution |
|---|-------|----------|--------:|----------:|
| 1 | IP Selection (Priority IPs) | Signinlogs_Anomalies_KQL_CL, AADUserRiskEvents, SigninLogs | <count> | <time> |
| 2 | Anomaly Detection | Signinlogs_Anomalies_KQL_CL | <count> | <time> |
| 3 | Sign-ins by Application | SigninLogs, AADNonInteractiveUserSignInLogs | <count> | <time> |
| 3b | Sign-ins by Location | SigninLogs, AADNonInteractiveUserSignInLogs | <count> | <time> |
| 3c | Sign-in Failures | SigninLogs, AADNonInteractiveUserSignInLogs | <count> | <time> |
| 3d | IP Sign-in Counts | SigninLogs, AADNonInteractiveUserSignInLogs | <count> | <time> |
| 4 | Audit Log Activity | AuditLogs | <count> | <time> |
| 5 | Office 365 Activity | OfficeActivity | <count> | <time> |
| 6 | Security Incidents | SecurityAlert, SecurityIncident | <count> | <time> |
| 10 | DLP Events | CloudAppEvents | <count> | <time> |
| 11 | Threat Intelligence | ThreatIntelIndicators | <count> | <time> |
| — | User Profile | Microsoft Graph API | 1 | <time> |
| — | MFA Methods | Microsoft Graph API | <count> | <time> |
| — | Registered Devices | Microsoft Graph API | <count> | <time> |
| — | Risk Profile | Microsoft Graph API | 1 | <time> |
| — | Risk Detections | Microsoft Graph API | <count> | <time> |
| — | Risky Sign-ins | Microsoft Graph API (beta) | <count> | <time> |
*Query definitions: see the Sample KQL Queries section in this SKILL.md file.*
**Do NOT include full KQL text in the appendix** — the canonical queries are already documented in this SKILL.md file. The appendix serves as an audit trail only.
---
**Investigation Timeline:**
- [MM:SS] ✓ Phase 1: User ID retrieval (<X>s)
- [MM:SS] ✓ Phase 2: Parallel data collection (<X>s)
- [MM:SS] ✓ IP Enrichment (<X>s)
- [MM:SS] ✓ Phase 3: Report generation (<X>s)
- **Total Investigation Time:** <duration>
Markdown Report Authoring Guidelines
- Populate every section: even if data is empty, use the `✅ No <X> detected...` pattern for empty sections.
- Never invent data: follow the Evidence-Based Analysis global rule strictly. Every number in the report must come from a query result.
- Risk assessment is dynamic: calculate the risk score using the same weighted logic as `generate_report_from_json.py` (risk factors × 10 − mitigating factors × 5 + baseline 30, capped 0–100).
- IP enrichment: run `enrich_ips.py` for IP intelligence data. If `enrich_ips.py` is unavailable, use Sentinel ThreatIntelIndicators and Signinlogs_Anomalies_KQL_CL data as fallback.
- PII-Free: the report file is saved to `reports/`, which is gitignored. However, exercise caution with any files that may be shared externally.
- Emoji consistency: follow the Emoji Formatting table from `copilot-instructions.md` for all risk/status indicators.
- Query appendix: include record counts and execution times but NOT full KQL text. Reference the SKILL.md query numbers.
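The risk score formula in the guidelines reduces to a few lines of arithmetic. The sketch below implements only the stated formula (baseline 30, plus 10 per risk factor, minus 5 per mitigating factor, clamped to 0–100). The function name is hypothetical, and the actual `generate_report_from_json.py` may apply weights this sketch does not capture.

```python
def risk_score(risk_factors: int, mitigating_factors: int, baseline: int = 30) -> int:
    """Apply the guideline formula and clamp the result to 0-100."""
    raw = baseline + 10 * risk_factors - 5 * mitigating_factors
    return max(0, min(100, raw))
```

For example, three risk factors and two mitigating factors give 30 + 30 − 10 = 50.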
JSON Export Structure (Mode 3 — HTML Report)
Export MCP query results to a single JSON file with these required keys:
{
"upn": "user@domain.com",
"user_id": "<USER_OBJECT_ID>",
"user_sid": "<WINDOWS_SID>",
"investigation_date": "2025-11-23",
"start_date": "2025-11-15",
"end_date": "2025-11-24",
"timestamp": "20251123_164532",
"anomalies": [...],
"signin_apps": [...],
"signin_locations": [...],
"signin_failures": [...],
"signin_ip_counts": [...],
"audit_events": [...],
"office_events": [...],
"dlp_events": [...],
"incidents": [...],
"user_profile": {
"id": "...",
"displayName": "...",
"userPrincipalName": "...",
"mail": "...",
"userType": "...",
"jobTitle": "...",
"department": "...",
"officeLocation": "...",
"accountEnabled": true
},
"mfa_methods": {...},
"devices": [...],
"risk_profile": {...},
"risk_detections": [...],
"risky_signins": [...],
"threat_intel_ips": [...]
}
Error Handling
Common Issues and Solutions
| Issue | Solution |
|---|---|
| Missing `department` or `officeLocation` | Use "Unknown" as default value |
| No anomalies found | Export empty array: "anomalies": [] |
| Graph API returns 404 for user | Verify UPN is correct |
| Sentinel query timeout | Reduce date range or add ` |
| Missing `trustType` in device query | Use default: "Workplace" |
| No results from SecurityIncident query | Ensure using ALL THREE identifiers (UPN, UserID, SID) |
| Risky sign-ins query fails | Must use /beta endpoint |
Required Field Defaults
{
"department": "Unknown",
"officeLocation": "Unknown",
"trustType": "Workplace",
"approximateLastSignInDateTime": "2025-01-01T00:00:00Z"
}
Empty Result Handling
{
"anomalies": [],
"signin_apps": [],
"signin_locations": [],
"signin_failures": [],
"audit_events": [],
"office_events": [],
"dlp_events": [],
"incidents": [],
"risk_detections": [],
"risky_signins": [],
"threat_intel_ips": []
}
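The defaults and empty-array rules above can be applied in one pass before writing the export file. This is a minimal sketch: the function name `normalize_export` is hypothetical, and it assumes that `department` and `officeLocation` belong to `user_profile` while `trustType` and `approximateLastSignInDateTime` belong to each entry in `devices`.

```python
# Keys copied from the Empty Result Handling block above.
LIST_KEYS = (
    "anomalies", "signin_apps", "signin_locations", "signin_failures",
    "audit_events", "office_events", "dlp_events", "incidents",
    "risk_detections", "risky_signins", "threat_intel_ips",
)
PROFILE_DEFAULTS = {"department": "Unknown", "officeLocation": "Unknown"}
DEVICE_DEFAULTS = {
    "trustType": "Workplace",
    "approximateLastSignInDateTime": "2025-01-01T00:00:00Z",
}


def normalize_export(data: dict) -> dict:
    """Return a copy of the export with empty arrays and field defaults filled in."""
    out = dict(data)
    for key in LIST_KEYS:
        if out.get(key) is None:
            out[key] = []  # missing or null result set becomes an empty array
    profile = dict(out.get("user_profile") or {})
    for key, default in PROFILE_DEFAULTS.items():
        if not profile.get(key):
            profile[key] = default
    out["user_profile"] = profile
    out["devices"] = [
        # Drop null values first so they do not override the defaults.
        {**DEVICE_DEFAULTS, **{k: v for k, v in device.items() if v is not None}}
        for device in (out.get("devices") or [])
    ]
    return out
```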
Integration with Main Copilot Instructions
This skill follows all patterns from the main copilot-instructions.md:
- Date range handling: Uses +2 day rule for real-time searches
- Parallel execution: Runs independent queries simultaneously
- Time tracking: Mandatory reporting after each phase
- Token management: Uses `create_file` for all output
- Follow-up analysis: Reference `copilot-instructions.md` for authentication tracing workflows
Example invocations:
- "Investigate user@domain.com for the last 7 days" → asks for output mode
- "Quick security check on admin@company.com" → inline (Mode 1)
- "Full investigation for compromised.user@domain.com last 30 days" → asks for output mode
- "Investigate user@domain.com — markdown report" → markdown file (Mode 2)
- "Investigate user@domain.com — HTML report" → HTML report (Mode 3)
- "Investigate user@domain.com — markdown and HTML" → both Mode 2 + Mode 3
SVG Dashboard Generation
After generating a user investigation report (markdown file output), an SVG dashboard can be created using the shared SVG rendering skill.
Trigger: User asks "generate an SVG dashboard from the report" or "visualize this report"
Workflow:
- Read this skill's `svg-widgets.yaml` (widget manifest: defines layout, colors, field mapping)
- Read `.github/skills/svg-dashboard/SKILL.md` (rendering rules: component library, quality standards)
- Extract data from the completed report using `data_sources.field_mapping_notes`
- Render SVG → save as `{report_basename}_dashboard.svg` in the same directory
Layout: 5 rows — title banner, risk score card + KPI cards (sign-ins/success rate/IPs/incidents/anomalies), top apps bar chart + failure codes bar chart, incidents table + risk/mitigating factors table, assessment banner + recommendations.
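The output location for the dashboard can be derived from the report path. The sketch below assumes `{report_basename}` means the report filename without its `.md` extension; the helper name `dashboard_path` is hypothetical.

```python
from pathlib import PurePosixPath


def dashboard_path(report_path: str) -> str:
    """Return the SVG dashboard path that sits next to a markdown report."""
    report = PurePosixPath(report_path)
    # with_name keeps the parent directory and swaps only the filename.
    return str(report.with_name(f"{report.stem}_dashboard.svg"))
```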
Last Updated: March 24, 2026


