SCStelz/security-investigator

Install All Skills

npx skills add SCStelz/security-investigator --all -g -y

List skills in collection

npx skills add SCStelz/security-investigator --list

Skills in Collection (27)

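Investigates Conditional Access policy changes, related sign-in failures (such as error code 53000), and suspected bypass behavior. Correlates policy modifications with the sign-in timeline to distinguish legitimate troubleshooting from evasion of security controls, and provides forensic analysis.
Conditional Access, CA policy, device compliance, policy bypass, 53000, 50074, sign-in failure troubleshooting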
.github/skills/ca-policy-investigation/SKILL.md
npx skills add SCStelz/security-investigator --skill ca-policy-investigation -g -y
SKILL.md
Frontmatter
{
    "name": "ca-policy-investigation",
    "description": "Use this skill when asked to investigate Conditional Access policy changes, sign-in failures related to CA policies (error codes 53000, 50074, 530032), or suspected policy bypass\/manipulation. Triggers on keywords like \"Conditional Access\", \"CA policy\", \"device compliance\", \"policy bypass\", \"53000\", \"50074\", or when investigating why a user was blocked then suddenly unblocked. This skill provides forensic analysis of CA policy modifications correlated with sign-in failures.",
    "drill_down_prompt": "Investigate Conditional Access policy changes — sign-in correlation, bypass detection",
    "threat_pulse_domains": [
        "identity"
    ]
}

Conditional Access Policy Investigation - Instructions

Purpose

This skill investigates Conditional Access (CA) policy changes in correlation with sign-in failures to detect:

  • Legitimate troubleshooting (authorized policy changes to resolve access issues)
  • Security control bypass (unauthorized policy modifications to circumvent blocks)
  • Privilege abuse (users with admin rights weakening security controls)

The key distinction is whether policy changes were authorized and necessary vs self-service bypass of security controls.


📑 TABLE OF CONTENTS

  1. Critical Investigation Rules - Mandatory workflow steps
  2. Common Error Codes - Sign-in failure reference
  3. CA Policy States - Understanding policy modes
  4. 5-Step Investigation Workflow - KQL queries and analysis
  5. Real-World Example - Complete walkthrough
  6. Critical Mistakes - What NOT to do
  7. Security Recommendations - Remediation guidance

Critical Investigation Rules

When investigating sign-in failures (error codes 53000, 50074) with CA policy correlation:

⚠️ MANDATORY STEPS - DO NOT SKIP:

  1. Query ALL CA policy changes in chronological order (±2 days from failure time)
  2. Parse policy state transitions from the JSON (enabled → disabled → report-only)
  3. Compare failure timeline with policy change timeline
  4. Verify logical consistency: Ask "does this make sense?"

Key Questions to Answer:

  • Was the user blocked BEFORE the policy change?
  • Did the policy change resolve the block?
  • Who initiated the policy change? (same user = suspicious)
  • What was the business justification?

Common Error Codes

| Error Code | Description | Typical Cause |
|------------|-------------|---------------|
| 53000 | Device not compliant | Device not enrolled in Intune or failing compliance checks |
| 50074 | Strong authentication required | MFA not satisfied |
| 50079 | User must enroll in MFA | MFA not configured for user |
| 530032 | Blocked by CA policy | Generic CA policy block |
| 65001 | User consent required | Application consent needed |
| 53003 | Access blocked by CA policy | Explicit block condition met |
| 70044 | Session expired | User needs to re-authenticate |

Error Code Investigation Priority

| Priority | Error Codes | Investigation Focus |
|----------|-------------|---------------------|
| HIGH | 53000, 530032, 53003 | Device compliance, CA policy blocks - check for policy manipulation |
| MEDIUM | 50074 | MFA requirements - check if MFA was bypassed |
| LOW | 65001, 70044 | Consent/session issues - usually not security-related |

CA Policy State Meanings

| State | What It Means | Security Impact |
|-------|---------------|-----------------|
| enabled | Policy actively enforcing | Blocks non-compliant access (intended behavior) |
| disabled | Policy not enforcing | Security control bypassed - all access allowed |
| enabledForReportingButNotEnforced | Report-only mode | Logs violations but doesn't block - defeats purpose |

State Transition Risk Assessment

| Transition | Risk Level | Interpretation |
|------------|------------|----------------|
| enabled → disabled | HIGH | Complete security bypass |
| enabled → enabledForReportingButNotEnforced | MEDIUM-HIGH | Partial bypass (monitoring only) |
| disabled → enabled | LOW | Security restored (good) |
| enabledForReportingButNotEnforced → enabled | LOW | Security strengthened (good) |
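
The transition table above can be encoded as a small lookup so that each parsed state change gets a consistent risk label. This is only a sketch: the labels "NONE" and "UNRATED" are hypothetical additions for cases the table does not cover (no state change, or a transition such as disabled to report-only).

```python
# Risk levels for CA policy state transitions, copied from the table above.
TRANSITION_RISK = {
    ("enabled", "disabled"): "HIGH",
    ("enabled", "enabledForReportingButNotEnforced"): "MEDIUM-HIGH",
    ("disabled", "enabled"): "LOW",
    ("enabledForReportingButNotEnforced", "enabled"): "LOW",
}


def transition_risk(old_state, new_state):
    """Return the table's risk level for a state change.

    "NONE" (state unchanged) and "UNRATED" (transition not listed in the
    table) are hypothetical labels introduced for this sketch.
    """
    if old_state == new_state:
        return "NONE"
    return TRANSITION_RISK.get((old_state, new_state), "UNRATED")
```

Unrated transitions are returned explicitly rather than defaulted to LOW, so an analyst still reviews them by hand.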

Investigation Workflow Pattern

Step 1: Identify Sign-In Failures

Query sign-in failures with CA context:

// Get failures with CA context
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated between (datetime(<START>) .. datetime(<END>))
| where UserPrincipalName =~ '<UPN>'
| where ResultType != '0'
| where AppDisplayName has '<APPLICATION>'  // e.g., "Visual Studio Code"
| project TimeGenerated, IPAddress, Location, ResultType, ResultDescription, 
    ConditionalAccessStatus, UserAgent
| order by TimeGenerated asc

What to Look For:

  • ResultType values: 53000, 50074, 530032, 53003
  • ConditionalAccessStatus: "failure", "notApplied"
  • Pattern of repeated failures followed by success
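
The "repeated failures followed by success" pattern can be checked mechanically once the Step 1 results are exported. The sketch below assumes the rows are already sorted oldest first and that ResultType arrives as a string, as in the query above; the function name is hypothetical.

```python
# CA-related ResultType values from the list above.
CA_ERROR_CODES = {"53000", "50074", "530032", "53003"}


def failures_before_first_success(rows):
    """rows: (timestamp, result_type) pairs sorted oldest first.

    Returns the CA-related failures that precede the first successful
    sign-in ("0"). Returns [] when no success follows the failures.
    """
    failures = []
    for timestamp, result_type in rows:
        if result_type == "0":
            return failures
        if result_type in CA_ERROR_CODES:
            failures.append((timestamp, result_type))
    return []
```

A non-empty result marks the window in which to look for policy changes in Step 2.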

Step 2: Query ALL CA Policy Changes in Timeframe

CRITICAL: Query ±2 days from the first failure time

let failure_time = datetime(<FIRST_FAILURE_TIME>);
let start = failure_time - 2d;
let end = failure_time + 2d;
AuditLogs
| where TimeGenerated between (start .. end)
| where OperationName has_any ("Conditional Access", "policy")
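// Scoped to changes initiated by the affected user; remove this filter to also see changes made by other admins
| where Identity =~ '<UPN>' or tostring(InitiatedBy) has '<UPN>'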
| extend InitiatorUPN = tostring(parse_json(InitiatedBy).user.userPrincipalName)
| extend InitiatorIPAddress = tostring(parse_json(InitiatedBy).user.ipAddress)
| extend TargetName = tostring(parse_json(TargetResources)[0].displayName)
| project TimeGenerated, OperationName, Result, InitiatorUPN, InitiatorIPAddress, 
    TargetName, CorrelationId
| order by TimeGenerated asc  // CRITICAL: Chronological order

Critical Analysis Points:

  • InitiatorUPN: Who made the change? Same user as blocked = suspicious
  • TargetName: Which policy was modified?
  • TimeGenerated: Did change occur AFTER sign-in failures?
  • Order: Always chronological (oldest first) to see cause/effect

Step 3: Parse Policy State Changes

For each CorrelationId from Step 2, get detailed changes:

// Get detailed property changes for a specific policy modification
AuditLogs
| where CorrelationId == "<CORRELATION_ID>"
| extend ModifiedProperties = parse_json(TargetResources)[0].modifiedProperties
| mv-expand ModifiedProperties
| extend PropertyName = tostring(ModifiedProperties.displayName)
| extend OldValue = tostring(ModifiedProperties.oldValue)
| extend NewValue = tostring(ModifiedProperties.newValue)
| project TimeGenerated, PropertyName, OldValue, NewValue

Key Properties to Extract:

  • Look for "state" property in the JSON
  • Parse OldValue and NewValue for state transitions
  • Document: enabled → disabled → enabledForReportingButNotEnforced

Step 4: Extract Policy State from JSON

Manual JSON Parsing:

The OldValue and NewValue fields contain JSON. Look for the "state" field:

{
  "state": "enabled",
  "conditions": { ... },
  "grantControls": { ... }
}

Build the Timeline:

  1. Extract "state" from each OldValue and NewValue
  2. Create chronological list: enabled → disabled → enabledForReportingButNotEnforced
  3. Correlate with sign-in failure timeline
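
When the Step 3 results are exported, the manual parsing above can be scripted. The following is a minimal sketch, not the skill's own tooling. It assumes OldValue and NewValue hold a JSON object, possibly wrapped in one extra layer of string encoding, and that timestamps sort correctly as given (for example ISO 8601 strings). Function names are hypothetical.

```python
import json


def extract_state(raw_value):
    """Return the "state" field of an OldValue/NewValue string, or None.

    Assumption: the value is a JSON object, possibly double-encoded as a
    JSON string. Empty or unparseable values yield None.
    """
    if not raw_value:
        return None
    try:
        parsed = json.loads(raw_value)
        if isinstance(parsed, str):  # unwrap a double-encoded payload
            parsed = json.loads(parsed)
    except json.JSONDecodeError:
        return None
    return parsed.get("state") if isinstance(parsed, dict) else None


def build_state_timeline(changes):
    """changes: (timestamp, old_value, new_value) tuples in any order.

    Returns [(timestamp, old_state, new_state)] sorted oldest first,
    skipping edits that left the state unchanged.
    """
    timeline = []
    for timestamp, old_value, new_value in sorted(changes, key=lambda c: c[0]):
        old_state = extract_state(old_value)
        new_state = extract_state(new_value)
        if old_state != new_state:
            timeline.append((timestamp, old_state, new_state))
    return timeline
```

Sorting inside the function enforces the oldest-first rule even if the export was ordered differently.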

Step 5: Security Assessment

Compare timelines and assess intent:

| Pattern | Interpretation | Risk Level |
|---------|----------------|------------|
| Failures → Policy Disabled | User bypassed security control to unblock self | HIGH - Privilege abuse |
| Failures → Policy Changed to Report-Only | User weakened security control | MEDIUM-HIGH - Partial bypass |
| Policy Disabled → Failures Continue | Cached tokens (5-15 min propagation delay) | INFO - Expected behavior |
| Policy Changed → No More Failures | Policy change resolved issue | Context-dependent - May be legitimate troubleshooting |
| Different user made change | Admin assisted with access issue | LOW - Likely legitimate (verify authorization) |

Risk Escalation Criteria:

| Criteria | Risk Level |
|----------|------------|
| Same user blocked AND made policy change | HIGH |
| Policy disabled within 30 minutes of first failure | HIGH |
| Multiple policies modified | HIGH |
| Change made outside business hours | MEDIUM-HIGH |
| No change request ticket/approval | MEDIUM-HIGH |
| Admin made change for blocked user (with ticket) | LOW |
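
Some of the escalation criteria above lend themselves to a mechanical first pass. The sketch below covers three of them (same initiator, disabled within 30 minutes, missing ticket) and leaves the rest to the analyst. The function name, parameter names, and reason strings are hypothetical.

```python
from datetime import datetime, timedelta

RISK_ORDER = ["LOW", "MEDIUM-HIGH", "HIGH"]  # ascending severity


def assess_change(first_failure, change_time, blocked_upn, initiator_upn,
                  new_state, has_ticket):
    """Apply a subset of the escalation criteria to one policy change.

    Returns (risk_level, reasons). The "multiple policies" and
    "outside business hours" criteria are not evaluated here.
    """
    findings = []
    if initiator_upn.lower() == blocked_upn.lower():
        findings.append(("HIGH", "blocked user changed the policy"))
    delay = change_time - first_failure
    if new_state == "disabled" and timedelta(0) <= delay <= timedelta(minutes=30):
        findings.append(("HIGH", "policy disabled within 30 minutes of first failure"))
    if not has_ticket:
        findings.append(("MEDIUM-HIGH", "no change request ticket/approval"))
    if not findings:
        return "LOW", ["different initiator with a ticket"]
    top = max(findings, key=lambda f: RISK_ORDER.index(f[0]))[0]
    return top, [text for _, text in findings]
```

The result is a starting point for the assessment, not a verdict; authorization still has to be verified with the people involved.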

Real-World Example Analysis

Scenario: User blocked by device compliance policy, then modifies policy

Timeline

| Time | Event | Details |
|------|-------|---------|
| 19:05 | Sign-in failure | Error 53000: device not compliant |
| 19:06 | Sign-in failure | Error 53000: device not compliant |
| 19:07 | Sign-in failure | Error 53000: device not compliant |
| 19:09 | Policy change | enabled → disabled |
| 19:09 | Policy change | disabled → enabledForReportingButNotEnforced |
| 19:12 | Sign-in failure | Error 53000 (cached token) |
| 19:14 | Sign-in success | Access granted |

Analysis

  1. Policy was correctly blocking non-compliant device

    • Device compliance policy was enforcing as intended
    • User's device failed compliance checks (not enrolled or failing policy)
  2. 🚨 User disabled security control to bypass block

    • Same user who was blocked made the policy change
    • Change occurred within 4 minutes of repeated failures
    • No approval or change request documented
  3. ⚠️ User partially reversed by enabling report-only

    • Shows some awareness that disabling was too aggressive
    • But report-only still defeats the purpose (doesn't block)
  4. Report-only mode is NOT a valid security posture

    • Logs violations but allows non-compliant access
    • Creates false sense of security (policy "exists" but doesn't protect)

Assessment

| Field | Value |
|-------|-------|
| Risk Level | MEDIUM-HIGH |
| Finding | Self-service security bypass using privileged role |
| Root Cause | User's device is non-compliant (not enrolled/failing compliance) |
| Policy Impact | Device compliance checks now ineffective for all users |

Recommendations

  1. Immediate Actions:

    • Restore policy to enabled state
    • Verify user's device compliance status
    • Document incident for security review
  2. User-Specific:

    • Enroll user's device in Intune
    • Verify device meets compliance requirements
    • Review if user needs Security Administrator role
  3. Process Improvements:

    • Implement approval workflow for CA policy changes
    • Create alert for policy state changes (enabled → disabled/report-only)
    • Review all users with permission to modify CA policies
    • Consider PIM for Security Administrator role

Critical Mistakes to Avoid

❌ DON'T:

| Mistake | Why It's Wrong |
|---------|----------------|
| Query only ONE policy change event | You'll miss the sequence of changes |
| Read policy changes in reverse chronological order | Confuses cause/effect relationship |
| Assume policy was already disabled | Must check starting state from OldValue |
| Skip verifying "does this make logical sense?" | Disabled policies can't block users |
| Ignore the initiator identity | Same user = suspicious, different admin = verify authorization |
| Focus only on final state | The transition sequence reveals intent |

✅ DO:

| Best Practice | Why It Matters |
|---------------|----------------|
| Query ALL policy changes in the timeframe | Complete picture of modifications |
| Order chronologically (oldest first) | See cause/effect sequence |
| Parse the full JSON for state transitions | Extract exact policy states |
| Cross-check: blocked user → policy must be enabled | Logical consistency verification |
| Ask: "Why would user disable this policy?" | Usually to bypass a legitimate block |
| Check if initiator had authorization | Ticket, approval, documented reason |

Security Recommendations

When CA Policy Changes Are Detected

1. Determine Legitimacy

  • Was the policy change authorized?
  • Was there a valid business reason?
  • Did the user have approval to make this change?
  • Is there a change request ticket?

2. Assess Impact

  • How many users affected by policy change?
  • What applications/resources are now unprotected?
  • How long was the policy disabled/weakened?
  • Are there compliance implications (regulatory requirements)?

3. Remediation Actions

| Action | Priority |
|--------|----------|
| Restore policy to enabled state if unauthorized | IMMEDIATE |
| Investigate root cause (why was user blocked?) | HIGH |
| Fix underlying issue (device compliance, MFA enrollment) | HIGH |
| Review who has permission to modify CA policies | MEDIUM |
| Implement approval workflows for policy changes | MEDIUM |
| Create alerts for future CA policy modifications | MEDIUM |

4. Long-Term Improvements

| Improvement | Benefit |
|-------------|---------|
| Use PIM for Security Administrator role | Requires approval for elevated access |
| Implement CA policy change alerts | Real-time notification of modifications |
| Require multi-admin approval for state changes | Prevents single-person bypass |
| Document approved procedures | Clear guidance for legitimate troubleshooting |
| Regular access reviews | Ensure only necessary users have CA admin rights |

Prerequisites

Required MCP Servers

This skill requires:

  1. Microsoft Sentinel MCP - For KQL queries against SigninLogs and AuditLogs
    • mcp_sentinel-data_query_lake: Execute KQL queries
    • mcp_sentinel-data_search_tables: Discover table schemas

Required Data Sources

  • SigninLogs - Interactive sign-in events with CA status
  • AADNonInteractiveUserSignInLogs - Non-interactive sign-in events
  • AuditLogs - CA policy modification events

Required Permissions

To view CA policy changes in AuditLogs, ensure:

  • Sentinel workspace has AuditLogs ingestion enabled
  • User has appropriate RBAC to query the workspace

Integration with Other Skills

CA Policy Investigation often follows a user-investigation:

  1. Run user-investigation skill → Identifies sign-in failures
  2. Notice CA-related error codes → 53000, 50074, 530032
  3. Run ca-policy-investigation skill → Correlate failures with policy changes
  4. Document findings → Security assessment with remediation recommendations

Key Integration Points:

  • Sign-in failure data comes from user investigation
  • CA policy changes are NEW queries specific to this skill
  • Assessment combines timeline correlation with policy state analysis
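
Weekly review of the investigation tenant-context memory file against the latest SOC scan reports, producing a propose-only change document (ADD/MODIFY/FLAG). It never edits the file directly or opens a PR; changes are applied only after human review and approval.
review my context file, review tenant context, propose context updates, what should I add to my context memory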
.github/skills/context-memory-review/SKILL.md
npx skills add SCStelz/security-investigator --skill context-memory-review -g -y
SKILL.md
Frontmatter
{
    "name": "context-memory-review",
    "description": "Weekly review of an investigation tenant-context memory file against the most recent SOC scan reports (e.g. Threat Pulse). Surfaces candidate ADD \/ MODIFY \/ FLAG changes to the context file as a propose-only review document for human approval — it NEVER edits the context file, commits, or opens a PR. Trigger on 'review my context file', 'review tenant context', 'propose context updates', 'what should I add to my context memory'."
}

Context Memory Review — Instructions

Purpose

Investigation workflows in this project lean on a tenant-context memory file — a local, gitignored living document that records environment-specific ground truth (known automation/orchestration fingerprints, known-good IPs, account classifications, honeypot/field-device inventory, validated personnel, and documented false-positive rules). Scan automations (e.g. the daily Threat Pulse) read that file to render accurate verdicts.

Over a week of scans, drill-down investigations validate new ground truth — new IPs, new personas, new FP classes, new device classes — that is not yet captured in the context file. This skill reads the last N days of scan reports, compares them against the current context file, and produces a propose-only review document: a list of discrete, human-reviewable candidate changes (ADD / MODIFY / FLAG) with section anchors, proposed text, supporting evidence, recurrence counts, and confidence.

This skill is the first half of a deliberate two-phase, human-in-the-loop workflow:

| Phase | Who | Action |
|-------|-----|--------|
| 1. Propose (this skill) | Automation / interactive | Read reports + context file → emit review doc. No edits. |
| 2. Apply (separate, manual) | Human-directed interactive session | Operator reviews the doc, says "apply items X, Y, Z" → surgical edits to the context file. |

🔴 CRITICAL RULES — READ FIRST

  1. PROPOSE-ONLY. NEVER edit the context file in this skill. Do not write, append to, or modify the context memory file. Do not git commit, push, or open a PR. The only file this skill writes is the review document in the output directory.

  2. Read-only against the tenant. If any live queries are needed to corroborate a candidate change, they MUST be read-only (per the Remediation Output Policy). Prefer evidence already present in the reports — only query the tenant to disambiguate a contradiction.

  3. ⛔ Feedback-loop guard (the single most important rule). Scan reports are partly downstream of the context file: a scan verdict may simply echo an existing context entry rather than independently confirm it. You MUST distinguish:

    • First-party validation — a drill-down in the report actually ran a query/enrichment and confirmed the fact (e.g. "enriched IP 203.0.113.10 → datacenter ASN, 0 abuse reports, recurred on 3 days"). This CAN drive a High-confidence proposal.
    • Context-derived echo — the report verdict only restated something the context file already said ("🟢 known orchestration IP per tenant context"). This must NOT be promoted into a new or strengthened entry. Promoting echoes entrenches errors. When unsure, classify as echo.
  4. Never propose weakening or removing a documented FP/safety guardrail based solely on its absence from the week's reports. Absence of a finding ≠ obsolescence of a guardrail. Staleness candidates are FLAG-only, Low confidence, for human judgment — never auto-REMOVE.

  5. Evidence-based only. Every proposed change cites the specific report file(s), date(s), and finding it derives from. Never invent entities, counts, IPs, UPNs, or dates. If the reports don't support a change, don't propose it.

  6. PII stays local. The review document will contain live tenant entities (IPs, UPNs, device names). Write it ONLY to the gitignored output directory. Never commit it, never include it in a PR, never paste tenant PII into any git artifact.


Inputs (supplied by the invoking prompt / workflow)

The invoking workflow or user supplies these. If invoked interactively without them, ask once, then proceed with the defaults shown.

| Input | Meaning | Default |
|-------|---------|---------|
| context_file | Absolute path to the tenant-context memory file to review | (must be provided) |
| reports_dir | Directory (or glob) holding the scan reports to review | (must be provided) |
| reports_glob | Filename pattern for the reports of interest | *.md |
| lookback_days | How far back to include reports (by filename date or mtime) | 7 |
| output_dir | Where to write the review document (must be gitignored) | reports/context-reviews |

Execution Workflow

Phase 0 — Load inputs and current state

  1. Read the context file in full (context_file). Build an internal index of its structure: every section heading (the anchor targets for proposals), and within sections the discrete entries — table rows (e.g. IP tables), bullet points, labelled sub-notes (e.g. "A.2", "Section C"), device entries. Note any validated YYYY-MM-DD provenance stamps.
  2. Enumerate the reports in window. List files in reports_dir matching reports_glob, select those whose date (from filename YYYYMMDD if present, else file mtime) falls within lookback_days. Sort oldest→newest. If zero reports are in window, STOP and report "no reports in window — nothing to review" (a normal quiet-week outcome, not a failure).
  3. Read each in-window report. For large reports, read in ranges. Extract structured signal:
    • Concrete entities that appeared with a verdict: IPs, UPNs/accounts, device/host names, OAuth apps, incident IDs, CVEs.
    • For each: was the verdict reached by a first-party drill-down (a query/enrichment was executed in the report) or an echo of existing context? Capture the distinction — it gates confidence.
    • New FP classes / tuning notes the report's drill-downs articulated.
    • Any contradiction: a drill-down that concluded the opposite of an existing context entry.
    • Note in each report whether the context file was successfully loaded/applied during that scan (the reports state this) — echoes only count as echoes if context was actually applied.
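
The report-window selection in step 2 above can be sketched as follows. This is an illustration under stated assumptions, not the skill's implementation: the window is treated as inclusive of both the cutoff day and today, and all function names are hypothetical.

```python
import re
from datetime import date, datetime, timedelta
from pathlib import Path

DATE_TOKEN = re.compile(r"(?<!\d)(\d{8})(?!\d)")  # a standalone YYYYMMDD run


def report_date(filename, mtime):
    """Prefer a valid YYYYMMDD token in the filename, else the mtime date."""
    match = DATE_TOKEN.search(filename)
    if match:
        try:
            return datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            pass  # eight digits that are not a calendar date
    return mtime


def select_reports(entries, lookback_days=7, today=None):
    """entries: (filename, mtime_date) pairs. Returns in-window names, oldest first."""
    today = today or date.today()
    cutoff = today - timedelta(days=lookback_days)  # inclusive bound (assumption)
    dated = [(report_date(name, mtime), name) for name, mtime in entries]
    return [name for day, name in sorted(dated) if cutoff <= day <= today]


def list_reports(reports_dir, reports_glob="*.md", lookback_days=7):
    """Filesystem wrapper: gather (name, mtime date) pairs, then filter."""
    entries = [
        (p.name, datetime.fromtimestamp(p.stat().st_mtime).date())
        for p in Path(reports_dir).glob(reports_glob)
        if p.is_file()
    ]
    return select_reports(entries, lookback_days)
```

An empty result corresponds to the "no reports in window" stop condition, which is a normal outcome rather than an error.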

Phase 1 — Correlate across the week

Aggregate signal across all in-window reports:

  1. Recurrence — For each candidate entity/pattern, count how many distinct report-days it appeared on with a consistent first-party classification. Recurrence is the backbone of confidence.
  2. Match against the context file index — For each candidate, determine whether it is:
    • Absent from the context file → ADD candidate.
    • Present but refined by the reports (role/volume/scope changed, new regional sibling, expanded persona list) → MODIFY candidate.
    • Present and merely echoed (no new first-party info) → NOT a candidate (drop it; feedback-loop guard). It may at most justify refreshing a validated date if a first-party drill-down re-confirmed it — and that is a Low/Medium MODIFY, clearly labelled "provenance refresh only".
    • Contradicted by a first-party drill-down → FLAG candidate (never auto-resolve).
  3. Staleness sweep (FLAG-only) — Identify context entries that were NOT referenced by ANY in-window report. These are candidates for human review, not removal. Low confidence. Exclude documented safety/FP guardrails from staleness flags entirely (their value is in preventing future errors, not in weekly hit-rate).
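
The matching rules in step 2 above reduce to a short decision function. The sketch below is one possible reading of them; the Candidate fields and the "DROP" label are hypothetical names, and the staleness sweep from step 3 is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    in_context: bool                 # entity already has an entry in the context file
    first_party: bool                # a drill-down actually ran a query/enrichment
    refines_entry: bool = False      # reports add role/scope/volume detail
    contradicts_entry: bool = False  # drill-down concluded the opposite of the entry


def classify(candidate):
    """Map a candidate to ADD / MODIFY / FLAG, or DROP for context echoes."""
    if not candidate.first_party:
        return "DROP"  # feedback-loop guard: echoes never become proposals
    if candidate.in_context and candidate.contradicts_entry:
        return "FLAG"  # a human decides; never auto-resolved
    if not candidate.in_context:
        return "ADD"
    if candidate.refines_entry:
        return "MODIFY"
    return "MODIFY (provenance refresh only)"
```

Checking first_party before anything else mirrors the rule that, when unsure, a signal is treated as an echo.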

Phase 2 — Score and assemble proposals

Assign each candidate a type and confidence:

| Type | When |
|------|------|
| ADD | New, first-party-validated fact absent from the context file. |
| MODIFY | Existing entry that a first-party drill-down refined/expanded, or a provenance-refresh. |
| FLAG | A contradiction needing human judgment, or a staleness candidate. Never an auto-edit. |

| Confidence | Criteria |
|------------|----------|
| High | First-party validated AND recurred on ≥3 report-days (or a single explicit, thorough validated drill-down with enrichment/queries). Consistent classification, no contradicting evidence. |
| Medium | First-party validated on 2 report-days, OR 1 strong drill-down without recurrence. |
| Low | Single weak signal, provenance-refresh only, or any FLAG/staleness candidate. |
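
The confidence criteria above can be approximated in code. In this sketch the drill-down quality is passed in as "weak", "strong", or "thorough"; those three labels are a hypothetical encoding of the table's wording, and judging which one applies remains a human (or model) call.

```python
def score_confidence(proposal_type, first_party_days, drilldown="weak",
                     provenance_refresh=False):
    """Return "High", "Medium", or "Low" following the criteria table.

    first_party_days: distinct report-days with a consistent first-party
    classification. drilldown: "weak", "strong", or "thorough"
    (hypothetical labels for this sketch).
    """
    if proposal_type == "FLAG" or provenance_refresh or first_party_days < 1:
        return "Low"
    if first_party_days >= 3 or drilldown == "thorough":
        return "High"
    if first_party_days == 2 or drilldown == "strong":
        return "Medium"
    return "Low"
```

FLAG items and provenance refreshes are capped at Low regardless of recurrence, matching the last row of the table.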

For every proposal, produce:

  • ID — sequential (P1, P2, …).
  • Type + Confidence.
  • Target section — the exact heading/anchor in the context file where it belongs (for ADD), or the exact existing entry text being changed (for MODIFY/FLAG).
  • Proposed text — for ADD/MODIFY, the literal line/table-row/bullet to insert or the before→after change, written in the context file's existing style and including a (validated <today's date>) stamp where the file uses that convention.
  • Rationale — one or two sentences.
  • Evidence — the report file name(s) + date(s) + the specific finding, and an explicit note of whether it was first-party or echo (only first-party drives ADD/MODIFY).
  • Recurrence — "appeared on N of M report-days".
  • Apply instruction — precise enough for a later interactive session to make a surgical edit (which section, insert-after-which-line, exact text). For FLAG items, the question the human must answer.

Phase 3 — Write the review document

Write the document to output_dir (create the folder if needed) as: <output_dir>/context-review_<YYYYMMDD>_<HHMMSS>.md

Use this structure:

# Context Memory Review — <today's date>

**Context file reviewed:** <context_file>
**Reports reviewed:** <N> file(s) over <lookback_days>d (<earliest date> → <latest date>)
**Proposed changes:** <A> ADD · <M> MODIFY · <F> FLAG
**Confidence mix:** <High count> High · <Medium count> Medium · <Low count> Low

> ⚠️ PROPOSE-ONLY. No changes have been made to the context file. To apply, open an interactive
> session and say e.g. "apply items P1, P3, P7" — those edits will be made surgically with a
> validated-date stamp. Review each item's evidence before approving.

## Reports in this review window
| Date | File | Context applied during scan? |
|------|------|------------------------------|
| ... | ... | yes / no |

## Proposed changes

### P1 — [ADD · High] <short title>
- **Target section:** <heading/anchor>
- **Proposed text:**
  > <literal text to add, in file style, with (validated <date>)>
- **Rationale:** ...
- **Evidence:** <report file(s) + date(s) + finding>; first-party drill-down.
- **Recurrence:** appeared on N of M report-days.
- **Apply instruction:** Insert under "<section>" after "<anchor line>".

### P2 — [MODIFY · Medium] ...
...

### P3 — [FLAG · Low] <contradiction or staleness> ...
- **Question for human:** ...

## Items considered but NOT proposed (feedback-loop guard)
Brief list of candidate signals that were only context-echoes (already in the file, no new first-party
evidence) and were therefore intentionally dropped — so the reviewer can confirm nothing was missed.

## Summary
One paragraph: the week's theme, the highest-value proposed addition, any contradiction needing
attention, and the count of staleness flags.

Phase 4 — Report to chat

End your response with a concise summary: context file + report window reviewed, counts of ADD/MODIFY/FLAG by confidence, the single highest-value proposed change, any contradictions surfaced, the output document path, and a reminder that nothing was applied and how to apply (interactive "apply items …").


Quality Checklist

Before finishing, verify:

  • The context file was not modified; no commit/PR/push occurred.
  • The review document was written only to the gitignored output directory.
  • Every ADD/MODIFY proposal is backed by first-party evidence (not a context echo).
  • No proposal weakens/removes a safety or FP guardrail on the basis of absence alone.
  • Every proposal cites specific report file(s) + date(s) and a recurrence count.
  • Contradictions are FLAG (human decides), never auto-resolved.
  • Staleness candidates are FLAG · Low, and exclude documented guardrails.
  • Proposed text matches the context file's existing style and includes a validated-date stamp where the file uses that convention.
  • The "considered but not proposed" section documents the dropped echoes.

Notes

  • This skill is environment-agnostic. All tenant-specific values (which context file, which reports, output location) are supplied by the invoking workflow or user — keep this file free of any tenant-specific identifiers, hostnames, UPNs, or environment names.
  • Apply is intentionally out of scope here. Keeping propose and apply as separate phases — with apply driven by an explicit human instruction — is the safety boundary that prevents an unattended run from silently rewriting the ground-truth the scans depend on.
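
Create, deploy, and manage custom detection rules in Microsoft Defender XDR via the Graph API. Supports KQL adaptation, single-rule and batch deployment, lifecycle management, and validation; requires PowerShell and specific permissions.
deploy custom detection rules, convert Sentinel KQL to Defender format, manage the Defender XDR detection rule lifecycle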
.github/skills/detection-authoring/SKILL.md
npx skills add SCStelz/security-investigator --skill detection-authoring -g -y
SKILL.md
Frontmatter
{
    "name": "detection-authoring",
    "description": "Create, deploy, update, and manage custom detection rules in Microsoft Defender XDR via the Graph API (\/beta\/security\/rules\/detectionRules). Covers query adaptation from Sentinel KQL to custom detection format, deployment via PowerShell (Invoke-MgGraphRequest), manifest-driven batch deployment, and lifecycle management (list, enable\/disable, delete). Companion script: Deploy-CustomDetections.ps1."
}

Custom Detection Authoring — Instructions

Purpose

This skill deploys custom detection rules to Microsoft Defender XDR via the Microsoft Graph API (/beta/security/rules/detectionRules). It handles:

  • Query adaptation — Converting Sentinel KQL queries into custom detection format
  • Single-rule deployment — Creating one rule via Graph API
  • Batch deployment — Deploying multiple rules from a JSON manifest
  • Lifecycle management — Listing, updating, enabling/disabling, and deleting rules
  • Validation — Dry-run queries in Advanced Hunting before deployment

Entity Type: Custom detection rules (Defender XDR)

Writing new detection queries from scratch? This skill focuses on deploying and managing detection rules — not query creation. If you need to write detection KQL from scratch (schema validation, community examples, performance optimization), use the kql-query-authoring skill first with CD intent markers (say "create custom detection queries for [scenario]"). It will produce Sentinel-format queries with cd-metadata blocks ready for this skill to adapt and deploy.


📑 TABLE OF CONTENTS

  1. Prerequisites — Auth, scopes, PowerShell modules
  2. Critical Rules — Mandatory constraints (includes query adaptation checklist)
  3. Naming Convention — Standardized displayName format (no prefixes, no MITRE IDs, colon separators)
  4. API Reference — Graph API schema and field values
  5. Frequency & Lookback — Schedule periods, lookback windows, NRT constraints
  6. Deployment Workflow — Step-by-step process
  7. Batch Deployment — Manifest-driven multi-rule deployment
  8. Lifecycle Management — CRUD operations
  9. Existing Rule Discovery — Search Analytic Rules & Custom Detections by table, EventID, or keyword
  10. Known Pitfalls — Lessons learned (18 pitfalls documented)
  11. CD Metadata Contract — Schema for query file ↔ detection skill coordination

Prerequisites

Required PowerShell Module

# Microsoft.Graph.Authentication — provides Invoke-MgGraphRequest
Install-Module Microsoft.Graph.Authentication -Scope CurrentUser

Required Graph API Scopes

| Operation | Scope | Type |
|-----------|-------|------|
| List / Get rules | CustomDetection.Read.All | Delegated |
| Create / Update / Delete | CustomDetection.ReadWrite.All | Delegated |

Authentication

# Read-only
Connect-MgGraph -Scopes "CustomDetection.Read.All" -NoWelcome

# Full CRUD
Connect-MgGraph -Scopes "CustomDetection.ReadWrite.All" -NoWelcome

Why Invoke-MgGraphRequest? The Graph MCP server and az rest both return 403 for custom detection endpoints — they lack the CustomDetection.* scopes. Invoke-MgGraphRequest uses interactive delegated auth with consent, which works.

Companion Script

Deploy-CustomDetections.ps1 — PowerShell script for manifest-driven batch deployment. See Batch Deployment.


⚠️ CRITICAL RULES — READ FIRST ⚠️

Mandatory Query Requirements

Custom detection queries have strict requirements that differ from Sentinel analytic rules:

| Requirement | Detail |
|-------------|--------|
| 🔴 Author-only by default | The default behavior is to author, validate, and write the manifest only; do NOT call the Graph API to deploy rules unless the user explicitly says "deploy", "create the rule", "push to Defender", or similar deployment-intent language. If deployment intent is ambiguous, ask before calling the API. |
| Timestamp column must be projected as-is | The query MUST project the timestamp column exactly as it appears in the source table: TimeGenerated for Sentinel/LA tables, Timestamp for XDR-native tables. Do not alias one to the other (e.g., Timestamp = TimeGenerated causes 400 Bad Request). See Pitfall 1. |
| Event-unique columns (per table type) | Required columns that uniquely identify the event differ by table family. A bare summarize count() or make_set() loses these columns and fails. summarize with arg_max IS allowed (see Pitfall 3). See table below for per-type requirements. |
| Impacted asset identifier column | The query must project at least one column whose name matches a valid impactedAssets identifier (e.g., AccountUpn, DeviceName, DeviceId). See Impacted Asset Types and Pitfall 9. Queries without project or summarize typically return these columns automatically. |
| impactedAssets must be non-empty | The impactedAssets array must contain at least 1 element. An empty array ([]) is rejected with 400 BadRequest: "The field ImpactedAssets must be a string or array type with a minimum length of '1'." Every detection must declare which entity it impacts. See Pitfall 13. |
| No let statements (NRT) | NRT rules (schedule: "0") reject let entirely; the API returns a generic 400 Bad Request. This is not documented by Microsoft (empirically discovered Feb 2026) but consistently reproducible. Inline all dynamic arrays/lists directly in where clauses. Non-NRT rules (1H+) tolerate let. |
| Unique displayName AND title | Both the rule displayName and the alert title must be unique across all custom detections. Duplicate displayName returns 409 Conflict. Duplicate title returns 400 Bad Request. |
| 🔴 Naming convention for displayName | Follow the standardized naming convention documented in Naming Convention below. No schedule prefixes, no MITRE IDs, no tactic labels; the portal columns already display these. Use clean, descriptive title-case names with colon (:) as the only sub-separator. |
| 150 alerts per run | Each rule generates a maximum of 150 alerts per execution. Tune the query to avoid alerting on normal day-to-day activity. |
| 🔴 No response actions | All rules deployed by this skill MUST use "responseActions": []. Automated response actions (isolate device, disable user, block file, etc.) are PROHIBITED; they must only be configured manually by a human operator in the Defender portal after the rule is validated. Never populate responseActions in manifests or API calls. |
| First run = 30-day backfill | When a new rule is saved, it immediately runs against the past 30 days of data. Expect a burst of initial alerts if the query has broad coverage. |

Required event-unique columns by table type (MS Learn source):

| Table Family | Required Columns (besides timestamp) |
|---|---|
| MDE tables (Device*) | DeviceId AND ReportId |
| Alert* tables | None (just timestamp) |
| Observation* tables | ObservationId |
| All other XDR tables | ReportId |
| Sentinel/LA tables (AuditLogs, SigninLogs, SecurityEvent, OfficeActivity, etc.) | ReportId recommended (use proxy: CorrelationId, OfficeObjectId, CallerProcessId) but not strictly mandated by the docs |
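
The table above can be turned into a pre-flight check. The following is a minimal sketch, not part of the companion script: the function names Get-RequiredEventColumn and Get-MissingEventColumn and the -SentinelTable switch are hypothetical, and the caller is assumed to know which columns the query projects.

```powershell
# Hypothetical helpers: map a table name to the columns the table above requires.
function Get-RequiredEventColumn {
    param(
        [Parameter(Mandatory)] [string] $Table,
        # Set for Sentinel/LA tables, where only the timestamp is strictly required.
        [switch] $SentinelTable
    )
    if ($SentinelTable)              { return @('TimeGenerated') }
    if ($Table -like 'Alert*')       { return @('Timestamp') }
    if ($Table -like 'Observation*') { return @('Timestamp', 'ObservationId') }
    if ($Table -like 'Device*')      { return @('Timestamp', 'DeviceId', 'ReportId') }
    return @('Timestamp', 'ReportId')   # all other XDR tables
}

# Returns the required columns that are absent from the projected column list.
# KQL column names are case-sensitive, so the comparison is case-sensitive too.
function Get-MissingEventColumn {
    param(
        [Parameter(Mandatory)] [string] $Table,
        [Parameter(Mandatory)] [string[]] $ProjectedColumn,
        [switch] $SentinelTable
    )
    Get-RequiredEventColumn -Table $Table -SentinelTable:$SentinelTable |
        Where-Object { $ProjectedColumn -cnotcontains $_ }
}

# Example: a DeviceEvents query that forgot to project ReportId
$missing = @(Get-MissingEventColumn -Table 'DeviceEvents' -ProjectedColumn 'Timestamp', 'DeviceId')
if ($missing.Count -gt 0) { Write-Warning "Missing required columns: $($missing -join ', ')" }
```

For Sentinel/LA tables the sketch checks only TimeGenerated, because the proxy ReportId is a recommendation rather than a requirement.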

Query Adaptation Checklist

When converting a Sentinel query to custom detection format:

  1. ✅ Remove bare summarize — project raw rows instead. Exception: summarize with arg_max is allowed for threshold-based detections (see Pitfall 3)
  2. ✅ Project the timestamp column as-is: TimeGenerated = TimeGenerated for Sentinel/LA tables, Timestamp for XDR tables. Never alias one to the other.
  3. ✅ Project the impacted asset identifier column — the column name must match a valid identifier from Impacted Asset Types. Examples: DeviceName = Computer for device-focused detections, AccountUpn = UserId for user-focused. See Pitfall 9.
  4. ✅ Project event-unique columns per table type — DeviceId + ReportId for MDE tables; ReportId for other XDR tables; recommended proxy ReportId for Sentinel tables (e.g., ReportId = CorrelationId). Caveat: proxy columns may contain empty strings for some events — acceptable but means those rows won't be individually identifiable in alert details.
  5. ✅ Add a time filter as the first where clause — prefer ingestion_time() > ago(1h) over Timestamp > ago(1h) (see tip below). NRT exception: For NRT rules (schedule: "0"), omit all time filters — ingestion_time() causes 400 Bad Request in NRT mode (see Pitfall 17). Timestamp > ago(...) is accepted but unnecessary.
  6. ✅ Remove let variables for NRT rules — NRT rejects let entirely (generic 400 error, undocumented). Inline all dynamic arrays directly in where clauses. Non-NRT rules tolerate let.
  7. ✅ Validate via Advanced Hunting dry-run before deployment
  8. ✅ For NRT rules: avoid tostring() on dynamic columns — use native string columns instead (e.g., Properties instead of tostring(Properties_d)). See Pitfall 11.
  9. ✅ For NRT rules: verify the table's ingestion lag justifies NRT. See Pitfall 12.
  10. ✅ Count unique {{Column}} references across title AND description combined — max 3 unique columns total (shared across both fields, not per-field). Exceeding this returns 400 Bad Request: "Dynamic properties in alertTitle and alertDescription must not exceed 3 fields". See Pitfall 14.

Performance tip (from MS Learn): "Avoid filtering custom detections by using the Timestamp column. The data used for custom detections is prefiltered based on the detection frequency." Use ingestion_time() instead, which aligns with the platform's pre-filtering for better performance. For scheduled rules, match the time filter to the run frequency (ingestion_time() > ago(1h) for 1H rules). For NRT rules, no time filter is needed.

⚠️ PowerShell note: When building queryText containing backslashes (file paths, regex), always use single-quoted here-strings (@'...'@) to avoid escape sequence mangling (see Pitfall 15).

Example Adaptation

Before (Sentinel KQL — uses summarize):

let _Lookback = 7d;
SecurityEvent
| where TimeGenerated > ago(_Lookback)
| where EventID == 4799
| where TargetSid == "S-1-5-32-544"
| where SubjectUserSid != "S-1-5-18"
| where AccountType != "Machine"
| where not(SubjectUserSid endswith "-500")
| project TimeGenerated, Computer, Actor = SubjectUserName, ...
| summarize EnumerationCount = count(), Processes = make_set(CallerProcess)
    by Actor, ActorDomain, ActorSID

After (Custom Detection — row-level, mandatory columns):

SecurityEvent
| where TimeGenerated > ago(1h)
| where EventID == 4799
| where TargetSid == "S-1-5-32-544"
| where SubjectUserSid != "S-1-5-18"
| where AccountType != "Machine"
| where not(SubjectUserSid endswith "-500")
| project
    TimeGenerated = TimeGenerated,
    DeviceName = Computer,
    AccountName = SubjectUserName,
    AccountDomain = SubjectDomainName,
    AccountSid = SubjectUserSid,
    CallerProcess = CallerProcessName,
    ReportId = CallerProcessId

Key changes:

  • Removed let _Lookback → hardcoded ago(1h)
  • Removed summarize → raw project
  • Added TimeGenerated = TimeGenerated (identity projection — mandatory)
  • Added DeviceName = Computer (impacted asset identifier — device-focused detection)
  • Added ReportId = CallerProcessId (proxy ReportId — event-unique identifier)

Naming Convention

The displayName should be a clean, title-case description of what the detection finds. The portal columns already show Scheduling Type, Tactics, and Techniques — don't repeat them in the name.

| Rule | Example |
|---|---|
| Use colon (:) for sub-detail | Event Log Clearing: Security or System Log Wiped |
| Threat actor/family in parentheses at end | Credential Dumping Tool Execution (Storm-2885) |
| TI rules: Threat Intelligence: {IoC} Match on {Table} | Threat Intelligence: IP Match on CloudAppEvents |
| No schedule prefixes (NRT —, 1H —) | Portal Scheduling Type column covers this |
| No MITRE IDs (T1036 —) | Portal Techniques column covers this |
| No tactic labels ((Collection), (Exfiltration)) | Portal Tactics column covers this |
| No em dash (—) separator | Use colon (:) instead |

API Reference

Endpoint

POST   /beta/security/rules/detectionRules          — Create
GET    /beta/security/rules/detectionRules           — List all
GET    /beta/security/rules/detectionRules/{id}      — Get by ID
PATCH  /beta/security/rules/detectionRules/{id}      — Update
DELETE /beta/security/rules/detectionRules/{id}      — Delete

Schedule Periods

| Value | Meaning | Notes |
|---|---|---|
| "0" | NRT (Near Real-Time / Continuous) | Runs continuously. See NRT Constraints. |
| "1H" | Every 1 hour | Most common for custom detections |
| "3H" | Every 3 hours | |
| "12H" | Every 12 hours | |
| "24H" | Every 24 hours | Daily |

Alert Severity Values

| Value | Use Case |
|---|---|
| "informational" | Baseline queries, low-noise canaries |
| "low" | Suspicious but may be benign |
| "medium" | Likely malicious, needs investigation |
| "high" | High-confidence detection, immediate response |

Alert Category Values

category is a case-sensitive, single-value, server-validated enum accepting two groups:

  • MITRE tactics (PascalCase, no spaces): InitialAccess, Execution, Persistence, PrivilegeEscalation, DefenseEvasion, CredentialAccess, Discovery, LateralMovement, Collection, Exfiltration, CommandAndControl, Impact, Reconnaissance, ResourceDevelopment
  • Non-tactic threat categories: Malware, Ransomware, SuspiciousActivity, UnwantedSoftware — these hide the MITRE techniques field in the portal (MS Learn); prefer a tactic when you want techniques to display.

Portal label note: The portal labels this control "Tactic" (UX rename, 2026), but the API field stays category — single-value, automation unaffected. Validation details in Pitfall 18.

Impacted Asset Types

Device asset:

{
    "@odata.type": "#microsoft.graph.security.impactedDeviceAsset",
    "identifier": "<identifier>"
}

Valid device identifiers: deviceId, deviceName, remoteDeviceName, targetDeviceName, destinationDeviceName

User asset:

{
    "@odata.type": "#microsoft.graph.security.impactedUserAsset",
    "identifier": "<identifier>"
}

Valid user identifiers: accountObjectId, accountSid, accountUpn, accountName, accountDomain, accountId, requestAccountSid, requestAccountName, requestAccountDomain, recipientObjectId, processAccountObjectId, initiatingAccountSid, initiatingProcessAccountUpn, initiatingAccountName, initiatingAccountDomain, servicePrincipalId, servicePrincipalName, targetAccountUpn

Mailbox asset:

{
    "@odata.type": "#microsoft.graph.security.impactedMailboxAsset",
    "identifier": "<identifier>"
}

Valid mailbox identifiers: accountUpn, fileOwnerUpn, initiatingProcessAccountUpn, lastModifyingAccountUpn, targetAccountUpn, senderFromAddress, senderDisplayName, recipientEmailAddress, senderMailFromAddress

Minimal Valid POST Body

{
    "displayName": "Rule Name",
    "isEnabled": true,
    "queryCondition": {
        "queryText": "SecurityEvent\r\n| where TimeGenerated > ago(1h)\r\n| ..."
    },
    "schedule": {
        "period": "1H"
    },
    "detectionAction": {
        "alertTemplate": {
            "title": "Alert Title",
            "description": "Alert description text.",
            "severity": "medium",
            "category": "Discovery",
            "recommendedActions": null,
            "mitreTechniques": ["T1069.001"],
            "impactedAssets": [
                {
                    "@odata.type": "#microsoft.graph.security.impactedDeviceAsset",
                    "identifier": "deviceName"
                }
            ]
        },
        "responseActions": []
    }
}

impactedAssets: Must contain at least 1 element — an empty array causes 400 BadRequest. Every detection must map to at least one impacted entity (device, user, or mailbox). See Pitfall 13.

recommendedActions: Can be null or a string. The portal sets it to null by default.

responseActions: Must always be [] — response actions are prohibited in LLM-authored detections (see Critical Rules). Must be [], not null — sending null causes 400 Bad Request. See Pitfall 10.

organizationalScope: Omit this field entirely for tenant-wide rules (the API default). Including "organizationalScope": null explicitly may cause 400 Bad Request in some API versions.

Custom details (not shown above): The API also supports a customDetails array of key-value pairs surfaced in the alert side panel. Each rule supports up to 20 KVPs with a combined 4KB size limit. Keys are display labels; values are query column names. See MS Learn.

Related evidence (not shown above): Beyond impactedAssets, the entity mapping also supports linking related evidence entities (Process, File, Registry value, IP, OAuth application, DNS, Security group, URL, Mail cluster, Mail message). These provide correlation context but are not impacted assets. See MS Learn.

Dynamic Alert Titles and Descriptions

Alert titles and descriptions can reference query result columns using {{ColumnName}} syntax, making alerts self-descriptive:

{
    "title": "Admin Group Enumeration by {{AccountName}} on {{DeviceName}}",
    "description": "User {{AccountName}} enumerated group {{TargetGroupName}} on the device."
}

| Constraint | Limit |
|---|---|
| Max unique dynamic columns | 3 unique {{Column}} references TOTAL across title AND description combined, NOT per field. E.g., the example above uses AccountName + DeviceName in title and AccountName + TargetGroupName in description = 3 unique columns (AccountName is reused). Exceeding this returns 400 Bad Request with "Dynamic properties in alertTitle and alertDescription must not exceed 3 fields". |
| Format | {{ExactColumnName}}, which must match a column in query output |
| Markup | Plain text only. HTML, Markdown, and code are sanitized |
| URLs | Must use percent-encoding format |

⚠️ Discrepancy with MS Learn docs: The official documentation states "The number of columns you can reference in each field is limited to three" (i.e., 3 per field). However, the Graph API empirically enforces 3 unique columns total across both fields combined (confirmed Feb 2026). The portal UI may enforce the per-field limit differently than the API. Use 3 unique total as the safe limit for Graph API deployments.
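
The 3-unique-columns limit is easy to miscount by eye, so it can help to check it before building the request body. A minimal sketch, assuming placeholders always take the form {{ColumnName}} with no inner whitespace; Get-DynamicColumnName is a hypothetical helper name, not part of the companion script.

```powershell
# Hypothetical helper: list the unique {{Column}} references used across both fields.
function Get-DynamicColumnName {
    param(
        [string] $Title,
        [string] $Description
    )
    # HashSet[string] compares case-sensitively by default, matching KQL column names.
    $set = [System.Collections.Generic.HashSet[string]]::new()
    foreach ($m in [regex]::Matches("$Title $Description", '\{\{(\w+)\}\}')) {
        [void]$set.Add($m.Groups[1].Value)
    }
    return @($set)
}

# Example using the title and description shown above
$columns = @(Get-DynamicColumnName `
    -Title 'Admin Group Enumeration by {{AccountName}} on {{DeviceName}}' `
    -Description 'User {{AccountName}} enumerated group {{TargetGroupName}} on the device.')

if ($columns.Count -gt 3) {
    Write-Warning "Too many dynamic columns ($($columns.Count)): $($columns -join ', ')"
}
```

Here AccountName appears in both fields but is counted once, so the example stays at the limit of 3.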

Frequency & Lookback

Lookback Windows by Frequency

Each frequency has a built-in lookback window. Results outside this window are ignored even if the query requests them:

| Frequency | Lookback Period | Query Filter Recommendation |
|---|---|---|
| NRT (Continuous) | Streaming | No time filter needed (events processed as collected) |
| Every 1 hour | Past 4 hours | ago(4h) or ago(1h) |
| Every 3 hours | Past 12 hours | ago(12h) or ago(3h) |
| Every 12 hours | Past 48 hours | ago(48h) or ago(12h) |
| Every 24 hours | Past 30 days | ago(30d) or ago(24h) |
| Custom (Sentinel only) | 4× frequency (<daily) or 30d (≥daily) | Match lookback |

Tip: Match the query time filter to the run frequency (ago(1h) for 1H rules), not the full lookback window. The lookback ensures late-arriving data is caught, but your filter should target the detection window.
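
The tip above, combined with the ingestion_time() recommendation, reduces to a small mapping from schedule period to filter clause. A sketch under those assumptions; Get-DetectionTimeFilter is a hypothetical helper name, and custom Sentinel frequencies are not covered.

```powershell
# Hypothetical helper: return the first "where" clause for a given schedule period.
function Get-DetectionTimeFilter {
    param(
        [Parameter(Mandatory)]
        [ValidateSet('0', '1H', '3H', '12H', '24H')]
        [string] $Period
    )
    # NRT rules ("0") take no time filter; ingestion_time() is rejected in NRT mode.
    if ($Period -eq '0') { return $null }
    # Scheduled rules: filter on ingestion time, matched to the run frequency.
    return "| where ingestion_time() > ago($($Period.ToLower()))"
}

# Example: assemble a scheduled query from its parts
$filter = Get-DetectionTimeFilter -Period '1H'
$query  = @('SecurityEvent', $filter, '| where EventID == 4799') -join "`r`n"
```

Returning $null for NRT lets the caller skip the clause entirely rather than emit an empty line.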

NRT Constraints

NRT (Continuous, period: "0") rules have stricter requirements than scheduled rules:

Constraint Detail
Single table only Query must reference exactly one table — no joins or unions
No let statements let variables are rejected with no useful error message: the API returns a generic 400 Bad Request. Always inline dynamic arrays/lists directly in where clauses. This constraint is not listed in the official NRT docs (which list only 4 constraints) but is consistently reproducible via Graph API (empirically confirmed Feb 2026).
No externaldata Cannot use the externaldata operator
No comments Query text must not contain any comment lines (//)
Supported operators only Limited to supported KQL features. tostring() on dynamic columns is rejected — use native string columns instead (e.g., Properties instead of tostring(Properties_d)). See Pitfall 11.
No time filter needed NRT processes events as they stream in. The platform pre-filters automatically. Timestamp > ago(1h) is unnecessary but harmless. However, ingestion_time() is rejected — the API returns 400 Bad Request. See Pitfall 17.
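
Because most NRT violations surface only as a generic 400, a text-level lint before submission can save a round trip. The following is a rough sketch, not a KQL parser: Test-NrtQueryText is a hypothetical name, the patterns are heuristics, and the comment check will also flag a // that appears inside a string literal such as a URL.

```powershell
# Hypothetical lint: returns the names of NRT constraints the query text appears to violate.
function Test-NrtQueryText {
    param([Parameter(Mandatory)] [string] $QueryText)

    $checks = [ordered]@{
        'let statement'    = '(?m)^\s*let\s+\w+\s*='
        'comment (//)'     = '//'
        'ingestion_time()' = '\bingestion_time\s*\('
        'join or union'    = '(?m)(^|\|)\s*(join|union)\b'
        'externaldata'     = '\bexternaldata\b'
        # tostring() is only rejected on dynamic columns, so treat this as "review", not "fail".
        'tostring() call'  = '\btostring\s*\('
    }
    foreach ($name in $checks.Keys) {
        if ($QueryText -match $checks[$name]) { $name }
    }
}

# Example: a Sentinel-style query that would be rejected as NRT
$candidate = @'
let _Admins = dynamic(["S-1-5-32-544"]);
SecurityEvent
| where ingestion_time() > ago(1h)
| where TargetSid in (_Admins)
'@
$violations = @(Test-NrtQueryText -QueryText $candidate)
if ($violations.Count -gt 0) { Write-Warning "NRT lint: $($violations -join '; ')" }
```

An empty result does not guarantee acceptance; it only means none of the known rejection patterns were found.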

NRT-Supported Tables

Not all tables support NRT frequency. Use NRT only with these tables:

Defender XDR tables: AlertEvidence, CloudAppEvents, DeviceEvents, DeviceFileCertificateInfo, DeviceFileEvents, DeviceImageLoadEvents, DeviceLogonEvents, DeviceNetworkEvents, DeviceNetworkInfo, DeviceInfo, DeviceProcessEvents, DeviceRegistryEvents, EmailAttachmentInfo, EmailEvents*, EmailPostDeliveryEvents, EmailUrlInfo, IdentityDirectoryEvents, IdentityLogonEvents, IdentityQueryEvents, UrlClickEvents

* EmailEvents: LatestDeliveryLocation and LatestDeliveryAction columns are excluded from NRT.

Sentinel tables (Preview): ABAPAuditLog_CL, ABAPChangeDocsLog_CL, AuditLogs, AWSCloudTrail, AWSGuardDuty, AzureActivity, CommonSecurityLog, GCPAuditLogs, MicrosoftGraphActivityLogs, OfficeActivity, Okta_CL, OktaV2_CL, ProofpointPOD, ProofPointTAPClicksPermitted_CL, ProofPointTAPMessagesDelivered_CL, SecurityAlert, SecurityEvent, SigninLogs

Important: SecurityEvent and SigninLogs support NRT — our Event ID 4799/4702 queries can run as NRT if they meet the single-table/no-joins constraint.

Ingestion Lag Consideration — NRT Suitability

A table being NRT-supported means the API accepts NRT rules — not that NRT is the right choice. If a table's ingestion lag exceeds the detection frequency benefit, NRT adds overhead with no detection speed improvement. See Pitfall 12 for a per-table ingestion lag assessment and recommendation matrix. Rule of thumb: if ingestion lag > 30 min, use 1H scheduled instead.

Custom Frequency (Sentinel Data Only)

For rules based entirely on Sentinel-ingested data, a custom frequency is available (Preview):

  • Range: 5 minutes to 14 days
  • Lookback: Automatically calculated — 4× frequency for sub-daily, 30 days for daily or longer
  • Requirement: Data must be available in Microsoft Sentinel (not XDR-only tables)

Deployment Workflow

🔴 DEPLOYMENT GATE: Only proceed to Steps 2-3 (API calls) when the user has explicitly requested deployment. Trigger phrases: "deploy", "create the rule", "push", "POST it", "make it live". If the user asked to "author", "write", "create a manifest", "prepare", or "draft" a detection — stop after validation (Step 1) and manifest generation. Present the manifest JSON for review and wait for explicit deployment confirmation.

Single Rule Deployment

Step 1: Validate the query in Advanced Hunting

Run the adapted query with a 1h lookback to validate schema:

Use RunAdvancedHuntingQuery with the adapted KQL query.
Confirm: 0 or more results, correct column schema (TimeGenerated, DeviceName, AccountName, etc.)

Then run with 30d lookback to confirm it returns real data:

Change ago(1h) to ago(30d) for the validation run.
Verify results contain expected columns and realistic data.

Step 2: Check for duplicates, then build and POST the rule

Connect-MgGraph -Scopes "CustomDetection.ReadWrite.All" -NoWelcome

# Pre-flight: check if rule name already exists
$ruleName = "Rule Name"
$existing = (Invoke-MgGraphRequest -Method GET `
    -Uri "/beta/security/rules/detectionRules" -OutputType PSObject).value `
    | Where-Object { $_.displayName -eq $ruleName }
if ($existing) {
    Write-Host "Rule '$ruleName' already exists (ID: $($existing.id)). Skipping POST."
    return
}

$body = @{
    displayName = $ruleName
    isEnabled = $true
    queryCondition = @{
        queryText = "SecurityEvent`r`n| where TimeGenerated > ago(1h)`r`n| ..."
    }
    schedule = @{ period = "1H" }
    detectionAction = @{
        alertTemplate = @{
            title = "Alert Title"
            description = "Description"
            severity = "medium"
            category = "Discovery"
            recommendedActions = $null
            mitreTechniques = @("T1069.001")
            impactedAssets = @(
                @{
                    "@odata.type" = "#microsoft.graph.security.impactedDeviceAsset"
                    identifier = "deviceName"
                }
            )
        }
        responseActions = @()
    }
} | ConvertTo-Json -Depth 10

$result = Invoke-MgGraphRequest -Method POST `
    -Uri "/beta/security/rules/detectionRules" `
    -Body $body -ContentType "application/json" -OutputType PSObject

Step 3: Verify creation

$rules = Invoke-MgGraphRequest -Method GET `
    -Uri "/beta/security/rules/detectionRules" -OutputType PSObject
$rules.value | Select-Object id, displayName, isEnabled,
    @{N='Schedule';E={$_.schedule.period}},
    @{N='Status';E={$_.lastRunDetails.status}} | Format-Table -AutoSize

Batch Deployment

Use the companion script Deploy-CustomDetections.ps1 for manifest-driven batch deployment.

Manifest storage: Save manifest JSON files in the temp/ folder (gitignored). Manifests are deployment artifacts, not versioned query definitions.

Manifest Format

See example-manifest.json for a complete 2-rule reference covering NRT and scheduled (with summarize/arg_max) patterns.

The script reads a JSON file containing an array of rule definitions:

[
    {
        "displayName": "Admin Group Enumeration by Non-Admin User",
        "title": "Admin Group Enumeration by {{AccountName}} on {{DeviceName}}",
        "queryText": "SecurityEvent\r\n| where TimeGenerated > ago(1h)\r\n| ...",
        "schedule": "0",
        "severity": "medium",
        "category": "Discovery",
        "mitreTechniques": ["T1069.001", "T1087.001"],
        "description": "User {{AccountName}} enumerated the local Administrators group.",
        "recommendedActions": "Verify whether the user has a legitimate reason to enumerate admin group membership.",
        "impactedAssets": [
            { "type": "device", "identifier": "deviceName" },
            { "type": "user", "identifier": "accountSid" }
        ],
        "responseActions": []
    }
]
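
The manifest uses a short form for impactedAssets ("type" plus "identifier"), while the Graph API expects an @odata.type on each entry. The mapping below is a sketch of how that translation could be done; it is not taken from Deploy-CustomDetections.ps1. ConvertTo-GraphImpactedAsset is a hypothetical name, and the "mailbox" short form is an assumption, since the manifest example only shows "device" and "user".

```powershell
# Assumed short-form names mapped to the Graph types listed under Impacted Asset Types.
$assetTypeMap = @{
    device  = '#microsoft.graph.security.impactedDeviceAsset'
    user    = '#microsoft.graph.security.impactedUserAsset'
    mailbox = '#microsoft.graph.security.impactedMailboxAsset'   # assumption: not shown in the manifest example
}

# Hypothetical helper: convert manifest-style assets into Graph-style hashtables.
function ConvertTo-GraphImpactedAsset {
    param(
        # Mandatory array parameters reject empty arrays, which mirrors the
        # "impactedAssets must be non-empty" rule.
        [Parameter(Mandatory)] [object[]] $Asset
    )
    foreach ($a in $Asset) {
        $odataType = $assetTypeMap[[string]$a.type]
        if (-not $odataType) { throw "Unknown impacted asset type '$($a.type)'" }
        @{ '@odata.type' = $odataType; identifier = $a.identifier }
    }
}

# Example using the impactedAssets array from the manifest above
$manifestAssets = '[{ "type": "device", "identifier": "deviceName" }, { "type": "user", "identifier": "accountSid" }]' |
    ConvertFrom-Json
$graphAssets = @(ConvertTo-GraphImpactedAsset -Asset $manifestAssets)
```

Wrapping the call in @() keeps the result an array even when the manifest lists a single asset, so ConvertTo-Json still emits a JSON array.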

Usage

# Dry-run — validate all queries in Advanced Hunting without creating rules
.\Deploy-CustomDetections.ps1 -ManifestPath .\temp\4799_4702.json -DryRun

# Deploy all rules from manifest (skips existing rules by default)
.\Deploy-CustomDetections.ps1 -ManifestPath .\temp\4799_4702.json

# Force: skip the duplicate-name check and attempt the POST anyway (may cause 409 Conflict)
.\Deploy-CustomDetections.ps1 -ManifestPath .\temp\4799_4702.json -Force

Lifecycle Management

List All Rules

$rules = Invoke-MgGraphRequest -Method GET `
    -Uri "/beta/security/rules/detectionRules" -OutputType PSObject
$rules.value | Select-Object id, displayName, isEnabled,
    @{N='Schedule';E={$_.schedule.period}},
    @{N='LastRun';E={$_.lastRunDetails.status}},
    @{N='Created';E={$_.createdDateTime}} | Format-Table -AutoSize

Get Rule by ID

$rule = Invoke-MgGraphRequest -Method GET `
    -Uri "/beta/security/rules/detectionRules/5632" -OutputType PSObject
$rule | ConvertTo-Json -Depth 10

Update Rule (PATCH)

PATCH /beta/security/rules/detectionRules/{id} — send only the fields you want to change. All fields are optional.

Updatable fields:

| Field Path | Type | Notes |
|---|---|---|
| displayName | String | Rule name (follow Naming Convention) |
| isEnabled | Boolean | Enable/disable without deleting |
| queryCondition.queryText | String | KQL query, validated before saving |
| schedule.period | String | 0 (NRT), 1H, 3H, 12H, 24H |
| detectionAction.alertTemplate.title | String | Alert title (supports {{Column}} variables) |
| detectionAction.alertTemplate.description | String | Alert description (supports {{Column}} variables) |
| detectionAction.alertTemplate.severity | String | informational, low, medium, high |
| detectionAction.alertTemplate.category | String | ATT&CK tactic (e.g., CredentialAccess) |
| detectionAction.alertTemplate.recommendedActions | String | null to clear |
| detectionAction.alertTemplate.impactedAssets | Array | null to clear |
| detectionAction.responseActions | Array | Always [] (see Critical Rules) |

Examples:

# Rename a rule
$body = @{ displayName = 'Cloud Password Spray: Multi-Account Failed Auth from Single IP' } | ConvertTo-Json
Invoke-MgGraphRequest -Method PATCH `
    -Uri "/beta/security/rules/detectionRules/6044" `
    -Body $body -ContentType "application/json"

# Change schedule and severity
$body = @{
    schedule = @{ period = "24H" }
    detectionAction = @{
        alertTemplate = @{ severity = "high" }
    }
} | ConvertTo-Json -Depth 10
Invoke-MgGraphRequest -Method PATCH `
    -Uri "/beta/security/rules/detectionRules/5632" `
    -Body $body -ContentType "application/json"

# Batch rename (loop pattern)
$renames = @{
    'Old Rule Name' = 'New Rule Name'
    'Another Old Name' = 'Another New Name'
}
$rules = (Invoke-MgGraphRequest -Method GET `
    -Uri "/beta/security/rules/detectionRules" -OutputType PSObject).value
foreach ($old in $renames.Keys) {
    $rule = $rules | Where-Object { $_.displayName -eq $old }
    if (-not $rule) { continue }
    $body = @{ displayName = $renames[$old] } | ConvertTo-Json
    Invoke-MgGraphRequest -Method PATCH `
        -Uri "/beta/security/rules/detectionRules/$($rule.id)" `
        -Body $body -ContentType "application/json"
}

Delete Rule

Invoke-MgGraphRequest -Method DELETE `
    -Uri "/beta/security/rules/detectionRules/5632"

⚠️ Deletion propagation delay: After deleting a rule, the name remains reserved for ~30-60 seconds. Creating a rule with the same displayName during this window returns 409 Conflict — but the rule may still be created despite the error. Always verify with a GET after creation.

Enable/Disable Without Deleting

# Disable
Invoke-MgGraphRequest -Method PATCH `
    -Uri "/beta/security/rules/detectionRules/5632" `
    -Body '{"isEnabled": false}' -ContentType "application/json"

# Enable
Invoke-MgGraphRequest -Method PATCH `
    -Uri "/beta/security/rules/detectionRules/5632" `
    -Body '{"isEnabled": true}' -ContentType "application/json"

Existing Rule Discovery

Before authoring new custom detections, check what Analytic Rules (Sentinel) and Custom Detection rules (Defender XDR) already exist for the same table, EventID, or keyword. This avoids duplicating coverage and helps identify gaps.

Step 0: Construct the Analytic Rules URL (once per session)

$cfg = Get-Content config.json | ConvertFrom-Json
$sub = $cfg.subscription_id
$rg  = $cfg.azure_mcp.resource_group
$ws  = $cfg.azure_mcp.workspace_name
$arUrl = "https://management.azure.com/subscriptions/$sub/resourceGroups/$rg/providers/Microsoft.OperationalInsights/workspaces/$ws/providers/Microsoft.SecurityInsights/alertRules?api-version=2024-09-01"

# Verify (should return rule count)
az rest --method get --url $arUrl --query "length(value)" -o tsv 2>$null

All patterns below reuse $arUrl. The Sentinel REST API returns the full KQL query text for every rule — there is no server-side content filtering, so we pull all rules in one call and filter client-side with JMESPath contains().

Search Analytic Rules by Table Name or Keyword

# Which rules reference a specific table? (e.g., SecurityEvent)
az rest --method get --url $arUrl `
  --query "value[?properties.query && contains(properties.query, 'SecurityEvent')].{name: properties.displayName, severity: properties.severity, enabled: properties.enabled}" `
  -o table 2>$null

Search Analytic Rules by EventID

# Which rules reference a specific EventID?
az rest --method get --url $arUrl `
  --query "value[?properties.query && contains(properties.query, '<EventID>')].{name: properties.displayName, severity: properties.severity, enabled: properties.enabled}" `
  -o table 2>$null

To see the surrounding KQL context of a match:

az rest --method get --url $arUrl `
  --query "value[?properties.query && contains(properties.query, '<EventID>')].properties.query" `
  -o tsv 2>$null | Select-String -Pattern '<EventID>' -Context 1,1

Search Analytic Rules for ASIM Parser Dependencies

$rules = az rest --method get --url $arUrl `
  --query "value[?properties.enabled==``true`` && properties.query].{displayName: properties.displayName, query: properties.query}" `
  -o json 2>$null | ConvertFrom-Json

# Capture schema names from both _Im_ and _ASim_ parser references
$asimRules = $rules | Where-Object { $_.query -match '_Im_|_ASim_' }
$asimRules | ForEach-Object {
    $schemas = [regex]::Matches($_.query, '_(?:Im|ASim)_(\w+)') | ForEach-Object { $_.Groups[1].Value } | Sort-Object -Unique
    Write-Host "$($_.displayName): $($schemas -join ', ')"
}

Dump All Enabled Rule Queries for Local Search

az rest --method get --url $arUrl `
  --query "value[?properties.enabled==``true`` && properties.query].{name: properties.displayName, query: properties.query}" `
  -o json > temp/analytic_rule_queries.json

# Then search locally for any pattern
Get-Content temp/analytic_rule_queries.json | Select-String -Pattern 'EventID\s*(==|in\s*\(|has|contains)' -AllMatches

Search Custom Detection Rules (Graph API)

⚠️ Important: The Graph MCP server returns 403 for the Custom Detection endpoint. Always use Invoke-MgGraphRequest via the terminal.

Import-Module Microsoft.Graph.Authentication -ErrorAction Stop

$ctx = Get-MgContext
if (-not $ctx -or $ctx.Scopes -notcontains 'CustomDetection.Read.All') {
    Connect-MgGraph -Scopes 'CustomDetection.Read.All' -NoWelcome
}

$response = Invoke-MgGraphRequest -Method GET `
    -Uri '/beta/security/rules/detectionRules?$select=id,displayName,isEnabled,queryCondition,schedule,lastRunDetails,createdDateTime,lastModifiedDateTime' `
    -OutputType PSObject

Then filter by table name or keyword:

# Which CD rules reference SecurityEvent?
$response.value | Where-Object { $_.queryCondition.queryText -match 'SecurityEvent' } |
    Select-Object displayName, isEnabled, @{N='Query';E={$_.queryCondition.queryText}}

# Which CD rules reference a specific EventID?
$response.value | Where-Object { $_.queryCondition.queryText -match '4688|ProcessCreate' } |
    Select-Object displayName, isEnabled, @{N='Query';E={$_.queryCondition.queryText}}

Identify stale rules (no run in 90 days):

# Compare as DateTimeOffset values so the local time zone and string formatting cannot skew the result
$cutoff = [datetimeoffset]::UtcNow.AddDays(-90)
$response.value | Where-Object {
    $_.lastRunDetails.lastRunDateTime -and
    ([datetimeoffset]$_.lastRunDetails.lastRunDateTime) -lt $cutoff
} | Select-Object displayName, isEnabled,
    @{N='LastRun';E={$_.lastRunDetails.lastRunDateTime}},
    @{N='Status';E={$_.lastRunDetails.status}}

Key API Fields for Rule Discovery

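| Source | Field Path | Content |
|---|---|---|
| Analytic Rules (REST) | properties.displayName | Rule name |
| Analytic Rules (REST) | properties.query | Full KQL query text |
| Analytic Rules (REST) | properties.severity | High / Medium / Low / Informational |
| Analytic Rules (REST) | properties.enabled | true / false |
| Custom Detections (Graph) | displayName | Rule name |
| Custom Detections (Graph) | queryCondition.queryText | Full KQL query (AH syntax) |
| Custom Detections (Graph) | schedule.period | 0 (NRT/continuous), 1H, 3H, 12H, 24H |
| Custom Detections (Graph) | lastRunDetails.lastRunDateTime | Last execution timestamp |
| Custom Detections (Graph) | lastRunDetails.status | completed, failed, running |
| Custom Detections (Graph) | isEnabled | true / false |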

Tip: JMESPath contains() (used in az rest --query) is case-sensitive. For case-insensitive search, dump to JSON and use PowerShell -match instead.


Known Pitfalls

Pitfall 1: Timestamp vs TimeGenerated — Project As-Is

The query must project the timestamp column exactly as it appears in the source table. Do NOT alias one to the other.

| Source Table Type | Correct | Wrong |
|---|---|---|
| Sentinel/LA tables (SecurityEvent, SigninLogs, AuditLogs, etc.) | TimeGenerated = TimeGenerated | Timestamp = TimeGenerated |
| XDR-native tables (DeviceEvents, DeviceProcessEvents, etc.) | Timestamp (native) | TimeGenerated = Timestamp |

The MS Learn docs confirm: "Timestamp or TimeGenerated — This column sets the timestamp for generated alerts. The query shouldn't manipulate this column and should return it exactly as it appears in the raw event." Aliasing across types causes 400 Bad Request.

Pitfall 2: Silent Rule Creation on Error Responses (400 AND 409)

The API can silently create a rule even when it returns an error. This applies to both 400 Bad Request and 409 Conflict responses.

Cause A — 400 with partial validation: A POST may pass structural validation (creating the rule) but fail a secondary check (e.g., let variable in NRT query, >3 dynamic fields). The API returns 400 Bad Request — but the rule was already created. A subsequent retry with a fixed query then hits 409 Conflict because the rule exists.

Cause B — Deletion propagation delay: Deleting a rule leaves a name reservation for ~30-60 seconds. POSTing a rule with the same displayName in this window returns 409 Conflict — but the API may still create the rule.

Cause C — Silent success + accidental retry: When running Invoke-MgGraphRequest in a terminal, the POST may succeed but the output buffer splits across calls, making it look like nothing happened. Re-running the same POST produces a 409 because the rule was already created seconds earlier.

Prevention:

  1. Always run a GET before POST to check if the rule name already exists (see Step 2)
  2. Always verify with GET after ANY error response (400 or 409) — the rule may have been created despite the error
  3. Never re-run a POST without first checking via GET whether the previous attempt succeeded
  4. If a rule was silently created with a bad query, use PATCH to update the queryCondition.queryText rather than deleting and re-creating
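
The prevention steps above can be wrapped into a single "check, POST, verify" routine. This is a minimal sketch rather than the companion script's logic: New-DetectionRuleSafely and its -Request parameter are hypothetical, and the parameter exists only so the flow can be exercised with a stub instead of a live tenant.

```powershell
# Hypothetical wrapper: GET before POST, and GET again after any error response.
function New-DetectionRuleSafely {
    param(
        [Parameter(Mandatory)] [hashtable] $Rule,
        # Performs the HTTP call; defaults to the real Graph request.
        [scriptblock] $Request = {
            param($Method, $Body)
            $p = @{
                Method      = $Method
                Uri         = '/beta/security/rules/detectionRules'
                OutputType  = 'PSObject'
                ErrorAction = 'Stop'
            }
            if ($Body) { $p.Body = $Body; $p.ContentType = 'application/json' }
            Invoke-MgGraphRequest @p
        }
    )

    # Returns every existing rule whose displayName matches the rule being deployed.
    $findByName = {
        (& $Request 'GET' $null).value |
            Where-Object { $_.displayName -eq $Rule.displayName }
    }

    $existing = @(& $findByName)
    if ($existing.Count -gt 0) {
        return [pscustomobject]@{ Status = 'AlreadyExists'; Id = $existing[0].id }
    }

    try {
        $created = & $Request 'POST' ($Rule | ConvertTo-Json -Depth 10)
        return [pscustomobject]@{ Status = 'Created'; Id = $created.id }
    }
    catch {
        # A 400 or 409 does not prove the rule is absent, so look again before reporting failure.
        $after = @(& $findByName)
        if ($after.Count -gt 0) {
            return [pscustomobject]@{ Status = 'CreatedDespiteError'; Id = $after[0].id }
        }
        throw
    }
}
```

A CreatedDespiteError result is the cue to PATCH the stored queryText rather than delete and re-create the rule.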

Pitfall 3: summarize — Allowed Only With Row-Level Output

Custom detection queries must return row-level results with required columns (TimeGenerated, DeviceName, ReportId). A bare summarize count() or make_set() as the final operator fails validation because the output lacks these columns.

However, summarize with arg_max IS allowed when used to return the required columns alongside aggregation:

// ✅ ALLOWED — uses arg_max to preserve row-level columns
DeviceEvents
| where ingestion_time() > ago(1d)
| where ActionType == "AntivirusDetection"
| summarize (Timestamp, ReportId)=arg_max(Timestamp, ReportId), count() by DeviceId
| where count_ > 5

This pattern counts by entity but still returns Timestamp, ReportId, and DeviceId per row — satisfying the requirement. Use this for threshold-based detections ("alert when count > N").

Pitfall 4: Graph MCP and az rest Cannot Access This API

Both the Graph MCP server and az rest lack the CustomDetection.ReadWrite.All scope. Only Invoke-MgGraphRequest with interactive delegated auth works.

Pitfall 5: recommendedActions Type

The recommendedActions field is a String (not an array). Set to null if not needed. The portal always sets it to null.

Pitfall 6: Query Newlines in JSON

The queryText JSON field requires \r\n (CRLF) line breaks on the wire. When using ConvertTo-Json on a PowerShell hashtable (the recommended approach), this is handled automatically — multiline here-string content in the hashtable value is serialized with correct CRLF encoding. No manual newline insertion is needed.

If manually constructing a raw JSON string body (not recommended), use PowerShell backtick escapes `r`n to produce CRLF in the output.
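
To illustrate what "CRLF on the wire" means, the Python sketch below serializes a multi-line query and shows the escaped `\r\n` sequences in the resulting JSON. It is not part of the PowerShell deployment tooling described here; the sample query text is illustrative, and only the `queryCondition.queryText` field name comes from this document.

```python
import json

# Illustrative two-line KQL query, joined with CRLF as the API expects.
query_lines = [
    "DeviceEvents",
    '| where ActionType == "AntivirusDetection"',
]
query_text = "\r\n".join(query_lines)

# json.dumps escapes the CR and LF characters as the two-character
# sequences \r and \n, which is what appears in the request body.
body = json.dumps({"queryCondition": {"queryText": query_text}})
print(body)
```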

Pitfall 7: Duplicate Name AND Title Check

The API enforces unique displayName AND unique title (alert title) across all custom detections. Duplicate displayName returns 409 Conflict. Duplicate title returns 400 Bad Request. The batch deployment script checks for displayName duplicates by default — use -Force to override. The MS Learn docs state both should be unique: "Detection name... make it unique" and "Alert title... make it unique".

Pitfall 8: Alert Deduplication

Custom detections automatically deduplicate alerts. If a detection fires twice on events with the same entities, custom details, and dynamic details, only one alert is created. This can happen when the lookback period is longer than the run frequency (e.g., 1H frequency with 4H lookback means 3 hours of overlap). Different events on the same entity produce separate alert entries under the same alert.

Pitfall 9: impactedAssets Identifier Must Be a Predefined API Value

The identifier field in impactedAssets must use one of the predefined values from the Impacted Asset Types section — NOT arbitrary query column names. Using a custom column name (e.g., "identifier": "TargetComputer" or "identifier": "Actor") causes a silent 400 InvalidInput with an empty error message.

This aligns with the MS Learn docs which list specific "strong identifier" columns for impacted assets. The portal wizard enforces this via a dropdown; the Graph API rejects non-matching values silently.

Identifier values must use camelCase as listed in the Impacted Asset Types section (e.g., recipientEmailAddress, not RecipientEmailAddress). The API treats identifier values as case-sensitive when matching to the predefined list.

Additionally, the query MUST project a column whose name matches the chosen identifier. If you use "identifier": "accountUpn", the query must project an AccountUpn column (alias if needed: AccountUpn = UserId). The column name match is case-insensitive — AccountUpn in the query matches accountUpn in the identifier.

| Wrong | Correct |
|-------|---------|
| "identifier": "UserId" | "identifier": "accountUpn" + project AccountUpn = UserId |
| "identifier": "Actor" | "identifier": "accountUpn" + rename Actor → AccountUpn |
| "identifier": "TargetComputer" | "identifier": "deviceName" + project DeviceName = Computer |
| "identifier": "TargetUPN" | "identifier": "accountUpn" + rename TargetUPN → AccountUpn |

⚠️ InitiatingProcess* column trap (Apr 2026): Device* tables project many InitiatingProcess* columns (e.g., InitiatingProcessAccountName, InitiatingProcessAccountSid, InitiatingProcessAccountUpn, InitiatingProcessAccountObjectId). Only a few related values are valid user identifiers: initiatingProcessAccountUpn and the initiatingAccount* variants (initiatingAccountSid, initiatingAccountName, initiatingAccountDomain). Notably, initiatingProcessAccountName is NOT valid. It looks correct because the column exists, but the API enum uses accountName instead. The API rejects invalid identifiers with a silent 400 InvalidInput (empty error message), making this very hard to debug. Always alias the column: AccountName = InitiatingProcessAccountName.

DeviceId requirement: For XDR-native tables (Device*, Email*, CloudAppEvents) with a device-type impactedAsset, the query must project DeviceId (not just DeviceName). Sentinel/LA tables (SecurityEvent, AuditLogs) do not require DeviceId.
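
Because the API fails silently on these mistakes, a local pre-flight check can help. The sketch below is a hedged illustration: `KNOWN_IDENTIFIERS` is a partial list containing only identifiers named in this document (the full set is in the Impacted Asset Types section), and `check_identifier` is a hypothetical helper, not part of Deploy-CustomDetections.ps1.

```python
# Partial list: only identifiers named in this document. Load the full set
# from the Impacted Asset Types section before relying on this check.
KNOWN_IDENTIFIERS = {
    "accountUpn", "accountObjectId", "accountName",
    "deviceId", "deviceName",
    "recipientEmailAddress", "senderFromAddress",
    "initiatingProcessAccountUpn",
    "initiatingAccountSid", "initiatingAccountName", "initiatingAccountDomain",
}


def check_identifier(identifier: str, projected_columns: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the pair looks valid."""
    problems = []
    # The API matches identifier values case-sensitively against its predefined list.
    if identifier not in KNOWN_IDENTIFIERS:
        problems.append(f"'{identifier}' is not a predefined identifier")
    # The query column match is case-insensitive (AccountUpn satisfies accountUpn).
    if identifier.lower() not in {c.lower() for c in projected_columns}:
        problems.append(f"query does not project a column named '{identifier}'")
    return problems
```

For example, `check_identifier("accountUpn", ["Timestamp", "AccountUpn"])` returns an empty list, while `"UserId"` or `"initiatingProcessAccountName"` is flagged as not predefined.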

Pitfall 10: PowerShell Empty Array Swallowing & organizationalScope

Root cause (Feb 2026): When using PowerShell if/else expressions to assign empty arrays, PowerShell swallows @() and produces $null instead:

# ❌ BUG — $x becomes $null, NOT an empty array
$x = if ($false) { @($items) } else { @() }
# Result: $null

# ✅ CORRECT — assign first, then overwrite conditionally
$x = @()
if ($condition) { $x = @($items) }
# Result: empty Object[] (serializes to [])

This caused array fields like responseActions and mitreTechniques to serialize as null instead of [], which the API rejects with 400 Bad Request.

Combined with organizationalScope: null — including this field explicitly (even as null) was also rejected. The fix: omit organizationalScope entirely and use direct assignment for array fields.

Symptoms: All rules in a batch return 400 Bad Request, but some may be silently created (see Pitfall 2). Manual deployment of the same rule body (without the null fields) succeeds.

Fixed in: Deploy-CustomDetections.ps1 — array fields now use direct assignment, organizationalScope removed from body.

Pitfall 11: tostring() on Dynamic Columns Rejected in NRT Mode

Root cause (Feb 2026): NRT rules (schedule: "0") reject tostring() wrapping dynamic-typed columns. The API returns a generic 400 Bad Request with no useful error message — similar to the let rejection described in NRT Constraints. The same query deploys successfully as a scheduled rule (1H+).

Example — AzureActivity table:

// ❌ FAILS in NRT mode — tostring() on dynamic column
AzureActivity
| where OperationNameValue =~ "MICROSOFT.SECURITY/PRICINGS/WRITE"
| where tostring(Properties_d.pricings_pricingTier) == "Free"

// ✅ WORKS — use the native string column instead
AzureActivity
| where OperationNameValue =~ "MICROSOFT.SECURITY/PRICINGS/WRITE"
| where Properties has '"pricingTier":"Free"'

Workarounds:

  1. Prefer native string columns — many Sentinel tables have both a dynamic column (e.g., Properties_d) and a string column (e.g., Properties). Use the string column with has or contains for NRT.
  2. Switch to 1H schedule — if tostring() is required for precise extraction, use a scheduled rule where it works reliably.

Ingestion lag consideration: Even when a table is NRT-supported, check whether ingestion lag makes NRT impractical — see Ingestion Lag Consideration.

Pitfall 12: NRT-Supported ≠ NRT-Practical — Check Ingestion Lag

A table appearing in the NRT-Supported Tables list means the API accepts NRT rules for that table — it does NOT mean NRT adds value. Tables with significant ingestion lag negate the benefit of continuous detection.

| Table | Typical Ingestion Lag | NRT Practical? | Recommendation |
|-------|-----------------------|----------------|----------------|
| DeviceEvents, DeviceProcessEvents | < 5 min | ✅ Yes | NRT is effective |
| SigninLogs, AuditLogs | 5-15 min | ⚠️ Marginal | 1H is usually sufficient |
| AzureActivity | 3-20 min (docs) | ⚠️ Marginal | Evaluate per use case |
| SecurityEvent | < 5 min | ✅ Yes | NRT is effective |
| OfficeActivity | 15-60 min | ⚠️ Marginal | Evaluate per use case |

Rule of thumb: If the table's ingestion lag exceeds 30 minutes, use a 1H scheduled rule instead of NRT. The detection latency is dominated by ingestion lag, not rule frequency.

Pitfall 13: impactedAssets Must Be Non-Empty

Root cause (Feb 2026): The Graph API requires impactedAssets to contain at least 1 element. Sending an empty array ("impactedAssets": []) returns 400 BadRequest with InvalidInput code and the message: "The field ImpactedAssets must be a string or array type with a minimum length of '1'."

This error is particularly difficult to diagnose because:

  • The error message only appears in some response formats — when using Invoke-MgGraphRequest with raw JSON strings, the "message" field is often empty ("")
  • The actual error text only surfaced when using ConvertTo-Json on a PowerShell hashtable body
  • All other fields in the payload may be valid, making it seem like a server-side issue

Every custom detection must declare at least one impacted entity. Choose the most relevant asset type for the detection:

| Detection Focus | Asset Type | Example Identifier |
|-----------------|------------|--------------------|
| Email-based threats | impactedMailboxAsset | recipientEmailAddress, senderFromAddress |
| User activity | impactedUserAsset | accountUpn, accountObjectId |
| Endpoint/device | impactedDeviceAsset | deviceId, deviceName |

Prevention:

  • Always include at least one impactedAssets entry in manifests and API payloads
  • The companion script Deploy-CustomDetections.ps1 validates this at manifest load time and rejects rules with empty impactedAssets before calling the API
  • Review the Impacted Asset Types section for the full list of valid identifiers per asset type
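
A load-time check of this kind can be sketched in a few lines of Python. This is an illustration of the idea only, not the logic of Deploy-CustomDetections.ps1; `impacted_assets_errors` is a hypothetical helper, and it assumes the rule is a dictionary with an `impactedAssets` key as in the API payload.

```python
def impacted_assets_errors(rule: dict) -> list[str]:
    """Return problems with a rule's impactedAssets; an empty list means OK."""
    assets = rule.get("impactedAssets")
    if assets is None:
        return ["impactedAssets is missing"]
    if not isinstance(assets, list):
        return ["impactedAssets must be an array"]
    if len(assets) == 0:
        # The API would answer with 400 InvalidInput, often with an empty message.
        return ["impactedAssets must contain at least 1 element"]
    return []
```

Running this before any API call turns a hard-to-read 400 into an explicit local error.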

Pitfall 14: Max 3 Unique Dynamic Columns Across Title + Description

The Graph API enforces 3 unique {{Column}} references across title and description combined (not per field). Exceeding this returns 400 Bad Request — often with an empty error message via Invoke-MgGraphRequest.

⚠️ MS Learn discrepancy: Docs say 3 per field; the API empirically enforces 3 unique total across both fields (confirmed Mar 2026).

| Scenario | Unique Columns | Result |
|----------|----------------|--------|
| title: {{A}} {{B}}, description: {{A}} {{C}} | A, B, C = 3 | ✅ Accepted |
| title: {{A}} {{B}}, description: {{C}} {{D}} | A, B, C, D = 4 | ❌ 400 Bad Request |

Counting: Reuse across fields is free ({{A}} in both = 1). Count distinct names, not occurrences.

Workaround: Replace excess {{Column}} refs with static text, or use customDetails (up to 20 KVPs) to surface extra columns in the alert side panel. Deploy-CustomDetections.ps1 validates this at manifest load time.
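
The counting rule can be sketched as follows. This is an illustrative Python version, not the validation code in Deploy-CustomDetections.ps1, and it assumes column names are compared exactly as written (case-sensitively), which this document does not confirm.

```python
import re

# Matches {{ColumnName}} placeholders and captures the column name.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def unique_dynamic_columns(title: str, description: str) -> set[str]:
    """Distinct column names referenced across title and description combined."""
    return set(_PLACEHOLDER.findall(title)) | set(_PLACEHOLDER.findall(description))


def within_dynamic_column_limit(title: str, description: str, limit: int = 3) -> bool:
    """True when the combined number of distinct placeholders is within the limit."""
    return len(unique_dynamic_columns(title, description)) <= limit
```

Applied to the two scenarios in the table, the first pair passes (A, B, C) and the second fails (A, B, C, D).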

Pitfall 15: PowerShell Double-Quoted Here-Strings — Variable Interpolation & Escaping Traps

When building queryText in PowerShell, always use single-quoted here-strings (@'...'@), NEVER double-quoted (@"..."@). Two distinct failure modes make double-quoted here-strings unreliable for KQL:

Risk 1 — $variable interpolation: PowerShell double-quoted strings interpolate $var references. KQL uses $left and $right in join syntax and $ as a dynamic property prefix. Inside @"..."@, PowerShell replaces these with empty strings (undefined variables → $null → empty), silently producing broken KQL with no compile-time warning.

Risk 2 — LLM/human escaping confusion (confirmed Mar 2026): When writing KQL inside a double-quoted context, an LLM (or human) instinctively adapts backslash escaping — writing \skills (single backslash) instead of \\skills (double backslash), because most languages interpret \\ as \ in double-quoted strings. PowerShell does NOT do this (backtick ` is the escape character, not backslash), so the single \ passes through literally to the KQL parser, which rejects \s as an invalid escape sequence → 400 Bad Request: "syntax errors".

Byte-level proof (Mar 2026): When identical \\skills content is deliberately placed in both here-string types, PowerShell produces identical bytes — confirming PowerShell itself does not mangle backslashes. The difference arises from what gets written into the string (by the LLM or human), not from PowerShell processing it. This makes the bug extremely hard to diagnose: the query looks correct in terminal output, and the root cause is an invisible content difference between attempts.

Here-String Type $left / $right Backslash Content Practical Safety
@'...'@ (single-quoted) Literal $left What you write is what you get ✅ Safe — no interpretation
@"..."@ (double-quoted) Interpolated → empty ❌ What you write is what you get — but LLMs write different content ❌ Fragile — two failure modes

Rule: For ANY queryText, always use @'...'@. This eliminates both $ interpolation bugs and escaping confusion. Applies to inline PowerShell, the batch deployment script, and any LLM-generated deployment commands.

Additional validated finding: ingestion_time() IS accepted by the CD API for scheduled (non-NRT) rules (tested and confirmed Mar 2026). However, NRT rules reject ingestion_time() with 400 Bad Request — empirically confirmed Apr 2026, reproducible on retry. See Pitfall 17.

Pitfall 16: StrictMode .Count on Pipeline Scalars

PowerShell pipelines returning exactly 1 result unwrap to a scalar. Under Set-StrictMode -Version Latest, .Count on a scalar throws a terminating error. Always wrap in @() when .Count will be accessed — applies to pipelines, ConvertFrom-Json (single-element JSON arrays), and Get-* cmdlets.

# ❌  $x is scalar string when 1 result → .Count fails
$x = ... | Sort-Object -Unique
# ✅  $x is always Object[]
$x = @(... | Sort-Object -Unique)

Fixed in Deploy-CustomDetections.ps1 (Mar 2026): dynamic column validation, manifest load, and existing rule fetch all wrapped in @().

Pitfall 17: ingestion_time() Rejected in NRT Rules

NRT rules (schedule: "0") reject ingestion_time() with 400 Bad Request. The NRT Constraints table notes that Timestamp > ago(...) is "unnecessary but harmless" — however, ingestion_time() is NOT harmless in NRT mode. It is a function call (not a column filter), and the NRT streaming pipeline rejects it outright.

Empirically confirmed (Apr 2026): Four-attempt A/B test on the same query (DeviceProcessEvents with vssadmin/bcdedit destructive command detection):

| Attempt | ingestion_time() present | Result |
|---------|--------------------------|--------|
| 1st deploy | Yes | 400 Bad Request |
| 2nd deploy | Removed | ✅ Created |
| Delete + redeploy | Restored | 400 Bad Request |
| Delete + redeploy | Removed | ✅ Created |

Root cause hypothesis: NRT rules process events via streaming ingestion — ingestion_time() likely depends on a materialized ingestion timestamp that isn't available (or isn't filterable) in the NRT streaming pipeline.

Fix: For NRT rules, omit ingestion_time() entirely. If you need a time filter, use Timestamp > ago(...) instead (accepted but unnecessary since NRT pre-filters automatically).

| Rule type | ingestion_time() | Timestamp > ago(...) |
|-----------|------------------|----------------------|
| Scheduled (1H+) | ✅ Accepted (preferred) | ✅ Accepted |
| NRT ("0") | 400 Bad Request | ✅ Accepted (unnecessary) |

Pitfall 18: category and mitreTechniques Are Server-Side Validated

Both fields are validated against fixed allowlists — invalid values return 400 Bad Request (with descriptive messages, unlike Pitfall 13), not silently accepted.

  • category — title-case and case-sensitive (defenseevasion → "Invalid alert category."), single-value (arrays rejected). Full accepted set in Alert Category Values.
  • mitreTechniques — invalid IDs → "Mitre techniques (...) are invalid." The accepted set tracks the tenant's ATT&CK version: legacy IDs (e.g. T1003.001) always work; newer sub-techniques (e.g. T1556.009, T1659) work only on refreshed tenants (expanded set rolling out via preview as of June 2026).

Fallback rule: If a newer sub-technique returns "Mitre techniques ... are invalid", fall back to the parent technique (T1556.009 → T1556) or a legacy ID. Don't assume the newest ATT&CK values are available everywhere.
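
The parent-technique fallback is mechanical enough to sketch. The Python below assumes the standard ATT&CK ID shape (`T` plus four digits, optionally followed by a dot and three digits); `parent_technique` and `apply_fallback` are hypothetical helper names, and the set of rejected IDs is assumed to come from the API's error message.

```python
import re

# ATT&CK technique IDs look like T1556 or T1556.009 (sub-technique).
_TECHNIQUE_ID = re.compile(r"^(T\d{4})(\.\d{3})?$")


def parent_technique(technique_id: str) -> str:
    """Strip the sub-technique suffix, e.g. T1556.009 becomes T1556."""
    match = _TECHNIQUE_ID.match(technique_id)
    if match is None:
        raise ValueError(f"not a technique ID: {technique_id!r}")
    return match.group(1)


def apply_fallback(techniques: list[str], rejected: set[str]) -> list[str]:
    """Swap each rejected ID for its parent, keeping order and dropping duplicates."""
    result: list[str] = []
    for technique_id in techniques:
        candidate = parent_technique(technique_id) if technique_id in rejected else technique_id
        if candidate not in result:
            result.append(candidate)
    return result
```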


CD Metadata Contract

Query files in queries/ can include per-query cd-metadata blocks that provide structured data for the detection authoring skill. This is the producer/consumer contract between the KQL Query Authoring skill (producer) and the Detection Authoring skill (consumer).

When cd-metadata is present

When a query in queries/ includes a cd-metadata block, the detection authoring skill uses it to:

  • Pre-populate manifest fields (schedule, severity, category, title, impactedAssets, etc.)
  • Skip manual CD-readiness assessment — the block declares readiness explicitly
  • Generate the adapted CD query by applying the Query Adaptation Checklist to the Sentinel query in the same section

Schema

The cd-metadata block is an HTML comment with YAML content, placed immediately after the per-query metadata fields (Severity, MITRE, Tuning Notes) and before the KQL code block:

### Query N: [Title]

**Purpose:** ...
**Severity:** High
**MITRE:** T1053.005, T1059.001

<!-- cd-metadata
cd_ready: true
schedule: "1H"
category: "Persistence"
title: "Encoded PowerShell in Scheduled Task on {{DeviceName}}"
impactedAssets:
  - type: device
    identifier: DeviceName
recommendedActions: "Investigate the scheduled task XML. Decode the base64 payload and check for malicious content."
adaptation_notes: "Straightforward — already row-level, add mandatory columns"
-->

```kql
// Query code...
```

Field Reference

| Field | Required | Type | Description |
|-------|----------|------|-------------|
| `cd_ready` | Yes | `true` / `false` | Whether this query can be adapted for custom detection deployment |
| `schedule` | If cd_ready | `"0"` / `"1H"` / `"3H"` / `"12H"` / `"24H"` | Detection frequency. `"0"` = NRT (single-table, no joins/unions) |
| `category` | If cd_ready | string | Alert category (see [API Reference](#api-reference) for valid values) |
| `title` | No | string | Dynamic alert title with `{{ColumnName}}` placeholders. Falls back to query heading if omitted. **Limit: max 3 unique `{{Column}}` references across `title` AND `description` combined** (see [Pitfall 14](#pitfall-14-max-3-unique-dynamic-columns-across-title--description)) |
| `impactedAssets` | If cd_ready | array | Asset entities to extract. Each entry: `type` (`device`/`user`/`mailbox`) + `identifier` (predefined API value, e.g., `accountUpn`, `deviceName` — see [Impacted Asset Types](#impacted-asset-types)) |
| `recommendedActions` | No | string | Triage guidance shown in the alert. Omit if not needed |
| `responseActions` | No | array | **PROHIBITED** — must always be omitted or empty `[]`. Response actions must only be configured manually in the Defender portal |
| `adaptation_notes` | No | string | Human-readable notes on what adaptation is needed (for the summary table) |

Queries NOT suitable for CD

For queries that cannot be adapted (baseline queries, statistical aggregations), use:

```markdown
<!-- cd-metadata
cd_ready: false
adaptation_notes: "Statistical baseline query — requires summarize with dcount, not suitable for CD"
-->
```

This explicitly documents the assessment so the detection skill doesn't re-evaluate it each time.

How the detection skill consumes cd-metadata

  1. User says "deploy query 8 as a custom detection" → Skill reads the query file, finds the cd-metadata block for Query 8
  2. Pre-populates manifest entry from cd-metadata fields (schedule, category, severity, title, impactedAssets)
  3. Applies Query Adaptation Checklist to the Sentinel KQL query in that section
  4. Writes manifest JSON to temp/ for review. Only deploys via Graph API if the user explicitly requested deployment (see Deployment Gate)

If a query file has no cd-metadata blocks, the skill assesses CD-readiness manually based on the query structure and the Query Adaptation Checklist.
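
As a rough illustration of the consumer side of this contract, the sketch below locates cd-metadata comments in a query file and reads the `cd_ready` flag using only the standard library. It is an assumption-laden simplification, not the skill's actual parser: `extract_cd_metadata` and `is_cd_ready` are hypothetical names, and a real consumer would hand the block to a YAML parser (for example PyYAML's `yaml.safe_load`) to read the remaining fields.

```python
import re

# Captures the body of each <!-- cd-metadata ... --> HTML comment.
_CD_BLOCK = re.compile(r"<!--\s*cd-metadata\s*\n(.*?)-->", re.DOTALL)
_CD_READY = re.compile(r"^cd_ready:\s*(true|false)\s*$", re.MULTILINE)


def extract_cd_metadata(markdown: str) -> list[str]:
    """Return the raw YAML text of every cd-metadata block, in file order."""
    return [body.strip() for body in _CD_BLOCK.findall(markdown)]


def is_cd_ready(block: str) -> bool:
    """True only when the block explicitly declares cd_ready: true."""
    match = _CD_READY.search(block)
    return match is not None and match.group(1) == "true"
```

A file with no matches yields an empty list, which corresponds to the manual-assessment path described above.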

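Generates interactive world maps from Microsoft Sentinel data to visualize attack origins, the geographic distribution of threats, and IP geolocation. Retrieves latitude/longitude coordinates via KQL queries and uses the Sentinel Geomap MCP tool to display markers, color scales, and IP intelligence drill-down analysis.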
geomap world map geographic attack map show on map visualize locations attack origins latitude/longitude
.github/skills/geomap-visualization/SKILL.md
npx skills add SCStelz/security-investigator --skill geomap-visualization -g -y
SKILL.md
Frontmatter
{
    "name": "geomap-visualization",
    "description": "Use this skill when asked to create geographic maps, visualize attack origins on a world map, show location-based data, or display IP geolocation. Triggers on keywords like \"geomap\", \"world map\", \"geographic\", \"attack map\", \"show on map\", \"visualize locations\", \"attack origins\", or when analyzing data with latitude\/longitude coordinates."
}

Geomap Visualization Skill

Purpose

Generate interactive world map visualizations from Microsoft Sentinel data using the Sentinel Geomap MCP App. Geomaps display markers on a world map with coordinates, ideal for visualizing attack origins, geographic distribution of threats, or location-based security data.


📑 TABLE OF CONTENTS

  1. Quick Start - Minimal example to get started
  2. MCP Tool Reference - Parameters and schemas
  3. Data Sources - Tables with native vs enriched geolocation
  4. KQL Query Patterns - Ready-to-use queries by scenario
  5. Enrichment Integration - Adding threat intel drill-down
  6. Examples - End-to-end workflows
  7. Follow-Up Investigation Queries - Queries for selected IPs
  8. Interactive Selection Feature - Multi-select and chat integration

Quick Start

Minimal Geomap (2 Steps)

# 1. Query Sentinel for data with coordinates
mcp_sentinel-data_query_lake({
  "query": "W3CIISLog | where TimeGenerated > ago(7d) | where scStatus == '401' | summarize value = count(), lat = take_any(RemoteIPLatitude), lon = take_any(RemoteIPLongitude) by ip = cIP | where lat != 0 | project ip, lat, lon, value"
})

# 2. Display geomap
mcp_sentinel-geom_show-attack-map({
  "data": [<query results>],
  "title": "Attack Origins (Last 7 Days)",
  "valueLabel": "Failed Logins",
  "colorScale": "blue-red"
})

MCP Tool Reference

Tool: mcp_sentinel-geom_show-attack-map

| Parameter | Required | Type | Description |
|-----------|----------|------|-------------|
| data | Yes | array | Array of {ip, lat, lon, value} objects |
| title | No | string | Title displayed above map (default: "Attack Origin Map") |
| valueLabel | No | string | Label for values (default: "Attacks") |
| colorScale | No | string | blue-red (threats), green-red, or blue-yellow |
| enrichment | No | array | IP enrichment data for click-to-expand panels |

Data Schema

{
  "data": [
    {"ip": "101.36.107.228", "lat": 22.25, "lon": 114.15, "value": 44},
    {"ip": "193.142.147.209", "lat": 52.35, "lon": 4.92, "value": 13},
    {"ip": "170.64.158.196", "lat": -33.90, "lon": 151.19, "value": 9}
  ]
}
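
Before passing rows to the tool, it can help to drop entries that cannot be plotted. The following Python sketch is one possible pre-filter, not part of the MCP tool itself. `keep_mappable` is a hypothetical helper; the zero-coordinate rule mirrors the `where lat != 0 and lon != 0` filter used in the KQL patterns in this skill, and the default `value` of 1 for rows without a count is an assumption.

```python
def keep_mappable(rows: list[dict]) -> list[dict]:
    """Keep only rows that have an IP and plausible, non-placeholder coordinates."""
    kept = []
    for row in rows:
        lat, lon = row.get("lat"), row.get("lon")
        if not row.get("ip"):
            continue
        if not isinstance(lat, (int, float)) or not isinstance(lon, (int, float)):
            continue
        if lat == 0 or lon == 0:  # placeholder for unknown locations
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue
        kept.append({"ip": row["ip"], "lat": lat, "lon": lon, "value": row.get("value", 1)})
    return kept
```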

Enrichment Schema

{
  "enrichment": [
    {
      "ip": "101.36.107.228",
      "city": "Hong Kong",
      "country": "HK",
      "org": "AS135377 UCLOUD INFORMATION TECHNOLOGY",
      "is_vpn": true,
      "is_proxy": false,
      "is_tor": false,
      "abuse_confidence_score": 100,
      "total_reports": 4612,
      "last_reported": "2026-01-29",
      "threat_categories": ["SSH", "Brute-Force", "Web App Attack"]
    }
  ]
}

⚠️ CRITICAL: Complete Enrichment Requirement

When providing enrichment data, ALWAYS include ALL IPs - never a subset.

Rule: 100% Enrichment Coverage

| Scenario | Correct Action |
|----------|----------------|
| Queried 50 IPs from Sentinel | Include enrichment for ALL 50 IPs |
| Enriched 25 IPs | Include ALL 25 in enrichment array |
| Some IPs failed enrichment | Include them with empty fields, or filter from both data AND enrichment |

Why This Matters

  • Users click markers expecting threat intel panels
  • Missing enrichment = empty panels = broken UX
  • Partial enrichment misleads security analysts

Workflow to Ensure Complete Enrichment

  1. Query Sentinel → Get N IPs with coordinates
  2. Batch enrich IPs → python enrich_ips.py <all_ips> or python enrich_ips.py --file <ips.json>
  3. Parse enrichment JSON → Extract ALL enriched entries
  4. Build enrichment array → One entry per IP, matching data array exactly
  5. Call geomap → Both data and enrichment arrays must have same IPs

Example: Building Complete Enrichment

import json

# Load enrichment from batch operation
with open('temp/ip_enrichment_<timestamp>.json', 'r') as f:
    raw_enrichment = json.load(f)

# Build geomap enrichment array - INCLUDE ALL
enrichment = []
for e in raw_enrichment:
    threat_cats = []
    for c in e.get('recent_comments', [])[:5]:
        threat_cats.extend(c.get('categories', []))
    
    enrichment.append({
        'ip': e['ip'],
        'city': e.get('city', 'Unknown'),
        'country': e.get('country', '??'),
        'org': e.get('org', 'Unknown'),
        'is_vpn': e.get('is_vpn') or e.get('vpnapi_security_vpn', False),
        'is_proxy': e.get('is_proxy') or e.get('vpnapi_security_proxy', False),
        'is_tor': e.get('is_tor') or e.get('vpnapi_security_tor', False),
        'abuse_confidence_score': e.get('abuse_confidence_score', 0),
        'total_reports': e.get('total_reports', 0),
        'last_reported': e.get('recent_comments', [{}])[0].get('date', '')[:10] if e.get('recent_comments') else '',
        'threat_categories': list(set(threat_cats))[:5]
    })

# Verify coverage
print(f"Enrichment entries: {len(enrichment)}")  # Must match data array length

❌ NEVER Do This

# BAD: Only including first 25 IPs
enrichment = enrichment[:25]  # WRONG

# BAD: Skipping IPs without abuse scores
enrichment = [e for e in enrichment if e['abuse_confidence_score'] > 0]  # WRONG

✅ ALWAYS Do This

# GOOD: Include all IPs, even if some fields are empty
enrichment = [transform(e) for e in raw_enrichment]  # All entries

# GOOD: If filtering, filter BOTH data and enrichment consistently
valid_ips = set(e['ip'] for e in enrichment if e.get('city'))
data = [d for d in data if d['ip'] in valid_ips]  # Filter both
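
The coverage rule can also be checked mechanically just before calling the geomap tool. The helper below is a hypothetical sketch (`enrichment_gaps` is not part of enrich_ips.py); it assumes both arrays hold dictionaries with an `ip` key, as in the schemas above.

```python
def enrichment_gaps(data: list[dict], enrichment: list[dict]) -> dict[str, list[str]]:
    """Report IPs present in one array but not the other; both lists empty means full coverage."""
    data_ips = {d["ip"] for d in data}
    enriched_ips = {e["ip"] for e in enrichment}
    return {
        "missing_enrichment": sorted(data_ips - enriched_ips),  # markers with empty panels
        "orphan_enrichment": sorted(enriched_ips - data_ips),   # enrichment with no marker
    }
```

If either list is non-empty, enrich the missing IPs or filter both arrays consistently before displaying the map.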

Data Sources

Tables with Native Geolocation

Some Sentinel tables include lat/lon directly from Microsoft's GeoIP enrichment:

| Table | Latitude Column | Longitude Column | Country Column |
|-------|-----------------|------------------|----------------|
| W3CIISLog | RemoteIPLatitude | RemoteIPLongitude | RemoteIPCountry |
| CommonSecurityLog | DeviceGeoLatitude | DeviceGeoLongitude | DeviceGeoCountry |
| AzureDiagnostics | varies by source | varies by source | varies by source |
| AzureNetworkAnalytics | SrcGeoLatitude | SrcGeoLongitude | SrcGeoCountry |

Use these when available - no enrichment needed for coordinates.

Tables Requiring IP Enrichment

These tables have IP addresses but no coordinates:

| Table | IP Column | Enrichment Required |
|-------|-----------|---------------------|
| SigninLogs | IPAddress | Yes - use enrich_ips.py |
| SecurityEvent | IpAddress | Yes - use enrich_ips.py |
| Syslog | extract from message | Yes - use enrich_ips.py |
| DeviceNetworkEvents | RemoteIP | Yes - use enrich_ips.py |
| OfficeActivity | ClientIP | Yes - use enrich_ips.py |

Enrichment script now captures latitude and longitude from ipinfo.io.


KQL Query Patterns

Pattern 1: Native Geolocation (W3CIISLog)

W3CIISLog
| where TimeGenerated between (datetime(<start>) .. datetime(<end>))
| where <filter_condition>
| summarize 
    value = count(),
    lat = take_any(RemoteIPLatitude),
    lon = take_any(RemoteIPLongitude),
    country = take_any(RemoteIPCountry)
    by ip = cIP
| where lat != 0 and lon != 0  // Filter unknown locations
| project ip, lat, lon, value
| order by value desc

Pattern 2: Native Geolocation (CommonSecurityLog)

CommonSecurityLog
| where TimeGenerated between (datetime(<start>) .. datetime(<end>))
| where <filter_condition>
| summarize 
    value = count(),
    lat = take_any(DeviceGeoLatitude),
    lon = take_any(DeviceGeoLongitude)
    by ip = SourceIP
| where lat != 0 and lon != 0
| project ip, lat, lon, value
| order by value desc

Pattern 3: Enrichment Required (Extract IPs Only)

<Table>
| where TimeGenerated between (datetime(<start>) .. datetime(<end>))
| where <filter_condition>
| summarize value = count() by ip = <IP_column>
| order by value desc
| take 100

Then run enrich_ips.py to get lat/lon.


Scenario-Specific KQL Queries

Scenario: W3CIISLog - Failed Logins (Native Geo)

W3CIISLog
| where TimeGenerated > ago(90d)
| where Computer startswith "<honeypot_name>"
| where scStatus == "401"  // Failed auth
| where cIP != "127.0.0.1"
| summarize 
    value = count(),
    lat = take_any(RemoteIPLatitude),
    lon = take_any(RemoteIPLongitude),
    country = take_any(RemoteIPCountry)
    by ip = cIP
| where lat != 0 and lon != 0
| project ip, lat, lon, value
| order by value desc

Scenario: W3CIISLog - Web Attacks (Native Geo)

W3CIISLog
| where TimeGenerated > ago(30d)
| where tolong(scStatus) >= 400
| where csUriStem has_any ("'", "union", "select", "script", "../", "cmd.exe")
| where cIP != "127.0.0.1"
| summarize 
    value = count(),
    lat = take_any(RemoteIPLatitude),
    lon = take_any(RemoteIPLongitude)
    by ip = cIP
| where lat != 0
| project ip, lat, lon, value
| order by value desc
| take 100

Scenario: CommonSecurityLog - Firewall Blocks (Native Geo)

CommonSecurityLog
| where TimeGenerated > ago(7d)
| where DeviceAction == "Deny" or Activity has "blocked"
| summarize 
    value = count(),
    lat = take_any(DeviceGeoLatitude),
    lon = take_any(DeviceGeoLongitude)
    by ip = SourceIP
| where lat != 0 and lon != 0
| project ip, lat, lon, value
| order by value desc
| take 100

Scenario: SigninLogs - Failed Sign-ins (Requires Enrichment)

Step 1: Query IPs and values

SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType != "0"  // Failed (ResultType is a string column in SigninLogs)
| summarize value = count() by ip = IPAddress
| order by value desc
| take 50

Step 2: Enrich IPs

python enrich_ips.py <ip1> <ip2> <ip3> ...

Step 3: Build map data from enrichment JSON (includes lat/lon)

Scenario: SecurityEvent - RDP Brute Force (Requires Enrichment)

SecurityEvent
| where TimeGenerated > ago(7d)
| where EventID == 4625
| where LogonType == 10  // RDP
| where IpAddress != "-" and IpAddress != "127.0.0.1"
| summarize value = count() by ip = IpAddress
| order by value desc
| take 50

Then enrich to get coordinates.

Scenario: DeviceNetworkEvents - Inbound Attacks (Requires Enrichment)

DeviceNetworkEvents
| where TimeGenerated > ago(7d)
| where DeviceName =~ "<device_name>"
| where ActionType in ("ConnectionSuccess", "InboundConnectionAccepted")
| where LocalPort in (3389, 22, 445, 80, 443)
| where RemoteIP !startswith "192.168." and RemoteIP !startswith "10."
| summarize value = count() by ip = RemoteIP
| order by value desc
| take 50

Enrichment Integration

When Coordinates Are Not in Sentinel

For tables without native geo fields, use the enrichment script:

Step 1: Run your KQL query to get IPs and values

Step 2: Enrich IPs:

python enrich_ips.py 203.0.113.42 198.51.100.10 192.0.2.1
# Or from file:
python enrich_ips.py --file temp/attack_ips.json

Step 3: Load enrichment JSON and build map data:

import json

# Load enrichment (now includes latitude/longitude from ipinfo.io)
with open('temp/ip_enrichment_<timestamp>.json', 'r') as f:
    enrichment = json.load(f)

# Build map data
map_data = []
enrichment_out = []

for e in enrichment:
    ip = e['ip']
    lat = e.get('latitude')
    lon = e.get('longitude')
    
    if lat is None or lon is None:
        continue  # Skip IPs without coordinates
    
    # Get value from your KQL results (create a lookup dict)
    value = attack_counts.get(ip, 1)
    
    map_data.append({
        'ip': ip,
        'lat': lat,
        'lon': lon,
        'value': value
    })
    
    # Build enrichment for drill-down
    threat_cats = []
    for c in e.get('recent_comments', [])[:5]:
        threat_cats.extend(c.get('categories', []))
    
    enrichment_out.append({
        'ip': ip,
        'city': e.get('city', 'Unknown'),
        'country': e.get('country', '??'),
        'org': e.get('org', 'Unknown'),
        'is_vpn': e.get('is_vpn') or e.get('vpnapi_security_vpn', False),
        'abuse_confidence_score': e.get('abuse_confidence_score', 0),
        'total_reports': e.get('total_reports', 0),
        'last_reported': e.get('recent_comments', [{}])[0].get('date', '')[:10] if e.get('recent_comments') else '',
        'threat_categories': list(set(threat_cats))[:5]
    })
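
The loop above reads from an `attack_counts` lookup that is not defined in the snippet. One way to build it, assuming the KQL results are available as a list of dictionaries with the `ip` and `value` columns projected by the queries in this skill (the exact shape returned by the MCP query tool is an assumption), is sketched here with a hypothetical `build_attack_counts` helper:

```python
def build_attack_counts(kql_rows: list[dict]) -> dict[str, int]:
    """Map each IP to its total event count from the KQL result rows."""
    counts: dict[str, int] = {}
    for row in kql_rows:
        # Sum rather than overwrite, in case the same IP appears in more than one row.
        counts[row["ip"]] = counts.get(row["ip"], 0) + int(row["value"])
    return counts


# Illustrative rows using documentation-range IP addresses.
attack_counts = build_attack_counts([
    {"ip": "203.0.113.42", "value": 12},
    {"ip": "198.51.100.10", "value": 5},
])
```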

Interactive Features with Enrichment

When enrichment is provided:

  • Click any marker → Opens threat intel panel showing:
    • 📍 Location (city, country)
    • 🏢 Organization/ISP
    • 🏷️ VPN/Proxy/Tor badges
    • 📊 AbuseIPDB confidence meter
    • 📈 Total reports count
    • 🔴 Threat category tags

Color Scale Guide

Scale Low Value High Value Best For
blue-red Blue Red Threats (attacks, failures) - DEFAULT
green-red Teal Green Positive activity (benign traffic)
blue-yellow Blue Yellow Neutral data distributions

For threat/attack maps, always use blue-red.


Complete Examples

Example 1: 90-Day Honeypot Attack Map (Native Geo)

# 1. Query with native lat/lon from W3CIISLog
mcp_sentinel-data_query_lake({
  "query": "W3CIISLog | where TimeGenerated > ago(90d) | where Computer startswith '<HONEYPOT_SERVER>' | where scStatus == '401' | summarize value = count(), lat = take_any(RemoteIPLatitude), lon = take_any(RemoteIPLongitude), country = take_any(RemoteIPCountry) by ip = cIP | where lat != 0 and lon != 0 | project ip, lat, lon, value | order by value desc"
})

# 2. Enrich top IPs for threat intel drill-down
python enrich_ips.py 101.36.107.228 193.142.147.209 80.190.82.185

# 3. Display geomap
mcp_sentinel-geom_show-attack-map({
  "data": [
    {"ip": "101.36.107.228", "lat": 22.25, "lon": 114.15, "value": 44},
    {"ip": "80.190.82.185", "lat": 50.97, "lon": 6.83, "value": 44},
    {"ip": "193.142.147.209", "lat": 52.35, "lon": 4.92, "value": 13},
    {"ip": "170.64.158.196", "lat": -33.9, "lon": 151.19, "value": 9}
  ],
  "title": "Honeypot Attack Origins - 90 Day Analysis",
  "valueLabel": "Failed Logins",
  "colorScale": "blue-red",
  "enrichment": [
    {"ip": "101.36.107.228", "city": "Hong Kong", "country": "HK", "org": "AS135377 UCLOUD", "is_vpn": true, "abuse_confidence_score": 100, "total_reports": 4612, "threat_categories": ["SSH", "Brute-Force"]},
    {"ip": "193.142.147.209", "city": "Amsterdam", "country": "NL", "org": "AS213438 ColocaTel", "is_vpn": true, "abuse_confidence_score": 100, "total_reports": 30973, "threat_categories": ["Web App Attack", "Hacking"]}
  ]
})

Example 2: SigninLogs Attack Map (Enrichment Required)

# 1. Query IPs with failed sign-ins
mcp_sentinel-data_query_lake({
  "query": "SigninLogs | where TimeGenerated > ago(7d) | where ResultType != 0 | summarize value = count() by ip = IPAddress | order by value desc | take 50"
})

# 2. Enrich all IPs (script now captures lat/lon)
python enrich_ips.py <ip1> <ip2> ...

# 3. Load enrichment JSON and build map data
# (See Python code in Enrichment Integration section)

# 4. Display geomap
mcp_sentinel-geom_show-attack-map({
  "data": [<map_data from enrichment>],
  "title": "Failed Sign-In Origins (Last 7 Days)",
  "valueLabel": "Failed Attempts",
  "colorScale": "blue-red",
  "enrichment": [<enrichment_out>]
})

Example 3: Firewall Blocks (Native Geo)

# 1. Query blocked traffic with geo
mcp_sentinel-data_query_lake({
  "query": "CommonSecurityLog | where TimeGenerated > ago(24h) | where DeviceAction == 'Deny' | summarize value = count(), lat = take_any(DeviceGeoLatitude), lon = take_any(DeviceGeoLongitude) by ip = SourceIP | where lat != 0 | project ip, lat, lon, value | order by value desc | take 100"
})

# 2. Display geomap
mcp_sentinel-geom_show-attack-map({
  "data": [<query results>],
  "title": "Blocked Traffic Origins (Last 24h)",
  "valueLabel": "Blocked Connections",
  "colorScale": "blue-red"
})

Follow-Up Investigation Queries

When users select IPs from the geomap and click "🔍 Investigate in Chat", run these queries to provide comprehensive threat analysis. Execute queries in parallel where possible.

Multi-IP Filter Pattern

All queries use this dynamic IP filter:

let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>", ...]);

Replace with the actual IPs selected from the geomap.
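
The filter line can be generated rather than edited by hand. The following is a minimal sketch, assuming the selected IPs arrive as a plain list of strings; `build_ip_filter` is a hypothetical helper name and not part of the skill. It validates each address with Python's standard `ipaddress` module so malformed input never reaches the query text.

```python
import ipaddress
import json


def build_ip_filter(ips):
    """Return the KQL `let target_ips = dynamic([...]);` line for the given IPs.

    Hypothetical helper: validates and de-duplicates input, preserving order.
    """
    seen = []
    for raw in ips:
        ip = str(ipaddress.ip_address(raw.strip()))  # raises ValueError if invalid
        if ip not in seen:
            seen.append(ip)
    if not seen:
        raise ValueError("at least one IP address is required")
    # json.dumps produces a double-quoted, comma-separated list literal
    return f"let target_ips = dynamic({json.dumps(seen)});"


print(build_ip_filter(["203.0.113.10", "198.51.100.20", "203.0.113.10"]))
```

Rejecting invalid addresses up front also keeps stray quotes or KQL fragments out of the generated query.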


Query 1: DeviceNetworkEvents (Network Activity)

Purpose: Show all network connections from selected IPs to any device in the environment.

let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>"]);
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
DeviceNetworkEvents
| where TimeGenerated between (start .. end)
| where RemoteIP in (target_ips)
| summarize 
    ConnectionCount = count(),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated),
    TargetDevices = make_set(DeviceName, 10),
    TargetPorts = make_set(LocalPort, 20),
    Actions = make_set(ActionType, 5)
    by RemoteIP
| extend Duration = LastSeen - FirstSeen
| order by ConnectionCount desc

Columns returned:

  • RemoteIP: Attacker IP
  • ConnectionCount: Total connections
  • FirstSeen/LastSeen: Activity time range
  • TargetDevices: Devices contacted
  • TargetPorts: Ports targeted (LocalPort = service ports on your devices)
  • Actions: Connection types (Success, Blocked, etc.)

Query 2: SecurityEvent (Windows Authentication)

Purpose: Show Windows authentication attempts from selected IPs.

let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>"]);
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
SecurityEvent
| where TimeGenerated between (start .. end)
| where IpAddress in (target_ips)
| where EventID in (4624, 4625, 4648, 4771, 4776)
| summarize 
    EventCount = count(),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated),
    TargetComputers = make_set(Computer, 10),
    TargetAccounts = make_set(Account, 20),
    LogonTypes = make_set(LogonType, 5)
    by IpAddress, EventID
| extend EventType = case(
    EventID == 4624, "Successful Logon",
    EventID == 4625, "Failed Logon",
    EventID == 4648, "Explicit Credentials",
    EventID == 4771, "Kerberos Pre-Auth Failed",
    EventID == 4776, "NTLM Auth Attempt",
    "Other")
| project IpAddress, EventType, EventCount, TargetComputers, TargetAccounts, LogonTypes, FirstSeen, LastSeen
| order by EventCount desc

Key Event IDs:

  • 4624: Successful logon (ALERT: attacker got in!)
  • 4625: Failed logon (brute force indicator)
  • 4648: Explicit credentials used (lateral movement)
  • 4771: Kerberos pre-auth failed
  • 4776: NTLM credential validation

Query 3: W3CIISLog (Web Attacks)

Purpose: Show HTTP requests from selected IPs including attack patterns.

let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>"]);
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
W3CIISLog
| where TimeGenerated between (start .. end)
| where cIP in (target_ips)
| summarize 
    RequestCount = count(),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated),
    TargetServers = make_set(Computer, 10),
    URIs = make_set(csUriStem, 20),
    StatusCodes = make_set(tolong(scStatus), 10),
    Methods = make_set(csMethod, 5),
    UserAgents = make_set(csUserAgent, 5)
    by cIP
| extend AttackPatterns = case(
    URIs has_any ("'", "union", "select"), "SQL Injection",
    URIs has "script", "XSS",
    URIs has_any ("../", "..\\"), "Path Traversal",
    URIs has_any ("cmd.exe", "powershell"), "Command Injection",
    "Reconnaissance")
| project IP = cIP, RequestCount, AttackPatterns, TargetServers, StatusCodes, Methods, URIs, FirstSeen, LastSeen
| order by RequestCount desc

Query 4: SigninLogs (Azure AD Activity)

Purpose: Show Azure AD sign-in attempts from selected IPs.

let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>"]);
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
SigninLogs
| where TimeGenerated between (start .. end)
| where IPAddress in (target_ips)
| summarize 
    SignInCount = count(),
    SuccessCount = countif(ResultType == 0),
    FailureCount = countif(ResultType != 0),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated),
    TargetUsers = make_set(UserPrincipalName, 20),
    TargetApps = make_set(AppDisplayName, 10),
    ErrorCodes = make_set(ResultType, 10),
    ClientApps = make_set(ClientAppUsed, 5)
    by IPAddress
| extend SuccessRate = round(100.0 * SuccessCount / SignInCount, 1)
| project IPAddress, SignInCount, SuccessCount, FailureCount, SuccessRate, TargetUsers, TargetApps, ErrorCodes, FirstSeen, LastSeen
| order by SignInCount desc

CRITICAL: Check SuccessCount > 0 - This indicates the attacker successfully authenticated!
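
This check can be applied mechanically to the query output. The sketch below assumes the results are available as a list of dicts keyed by the projected column names; that shape and the function name are assumptions, not something the tool is documented to guarantee.

```python
def ips_with_successful_signins(rows):
    """Return IPs from Query 4 results that have at least one successful sign-in."""
    return [r["IPAddress"] for r in rows if int(r.get("SuccessCount", 0)) > 0]


# Illustrative rows only; real values come from the query results
rows = [
    {"IPAddress": "203.0.113.10", "SignInCount": 40, "SuccessCount": 0},
    {"IPAddress": "198.51.100.20", "SignInCount": 12, "SuccessCount": 2},
]
print(ips_with_successful_signins(rows))
```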


Query 5: ThreatIntelIndicators (Known Threats)

Purpose: Check if selected IPs match threat intelligence databases.

let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>"]);
ThreatIntelIndicators
| extend IndicatorType = replace_string(replace_string(replace_string(tostring(split(ObservableKey, ":", 0)), "[", ""), "]", ""), "\"", "")
| where IndicatorType in ("ipv4-addr", "ipv6-addr", "network-traffic")
| extend NetworkSourceIP = toupper(ObservableValue)
| where NetworkSourceIP in (target_ips)
| where IsActive and (ValidUntil > now() or isempty(ValidUntil))
| extend Description = tostring(parse_json(Data).description)
| where Description !contains_cs "State: inactive;" and Description !contains_cs "State: falsepos;"
| extend TrafficLightProtocolLevel = tostring(parse_json(AdditionalFields).TLPLevel)
| extend ActivityGroupNames = extract(@"ActivityGroup:(\S+)", 1, tostring(parse_json(Data).labels))
| summarize arg_max(TimeGenerated, *) by NetworkSourceIP
| project 
    IPAddress = NetworkSourceIP,
    ThreatDescription = Description,
    ActivityGroupNames,
    Confidence,
    ValidUntil,
    TrafficLightProtocolLevel,
    IsActive,
    TimeGenerated
| order by Confidence desc

Key Fields:

  • Confidence: 0-100 threat confidence score
  • ActivityGroupNames: APT/threat actor attribution (e.g., "PHOSPHORUS", "NOBELIUM")
  • ThreatDescription: Details about the threat

Query 6: SecurityAlert with Incident Status

Purpose: Find security alerts that reference selected IPs, with the actual status from SecurityIncident (not the immutable alert status).

⚠️ IMPORTANT: SecurityAlert.Status is immutable ("New" at creation time). The actual status is on the SecurityIncident table. This query joins to get the real incident status.

let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>"]);
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
// Step 1: Find alerts containing target IPs as entities
let matched_alerts = SecurityAlert
| where TimeGenerated between (start .. end)
| extend EntitiesParsed = parse_json(Entities)
| mv-expand Entity = EntitiesParsed
| where Entity.["Type"] == "ip"
| extend EntityIP = tostring(Entity.Address)
| where EntityIP in (target_ips)
| summarize MatchedIPs = make_set(EntityIP) by SystemAlertId;
// Step 2: Get latest incident status for these alerts (keep AlertIds)
let incident_status = SecurityIncident
| where TimeGenerated between (start .. end)
| summarize arg_max(TimeGenerated, Status, Classification, IncidentNumber, AlertIds) by IncidentName
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| project AlertId, IncidentStatus = Status, Classification, IncidentNumber;
// Step 3: Join alerts with matched IPs and incident status
SecurityAlert
| where TimeGenerated between (start .. end)
| where SystemAlertId in (matched_alerts)
| join kind=leftouter matched_alerts on $left.SystemAlertId == $right.SystemAlertId
| join kind=leftouter incident_status on $left.SystemAlertId == $right.AlertId
| summarize arg_max(TimeGenerated, AlertName, AlertSeverity, Status, ProviderName, Tactics, Description, MatchedIPs, IncidentStatus, Classification, IncidentNumber) by SystemAlertId
| extend FinalStatus = coalesce(IncidentStatus, Status)  // Use incident status if available
| project 
    TimeGenerated,
    AlertName,
    AlertSeverity,
    Status = FinalStatus,
    Classification,
    IncidentNumber,
    ProviderName,
    Tactics,
    MatchedIPs,
    Description
| order by TimeGenerated desc
| take 25

Why This Matters:

  • SecurityAlert.Status = "New" is the creation status (immutable)
  • SecurityIncident.Status shows the current status (New/Active/Closed)
  • SecurityIncident.Classification shows the closure reason (TruePositive/FalsePositive/BenignPositive)
  • Alerts without incidents keep their original "New" status

Entities JSON Structure Example:

[
  {"$id":"3","HostName":"contoso-server","Type":"host"},
  {"$id":"4","Address":"203.0.113.10","Type":"ip"},
  {"$id":"5","Address":"198.51.100.20","Type":"ip"}
]
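
For reference, the same IP extraction that Step 1 of the query performs can be sketched outside KQL. This assumes the Entities value is a JSON string shaped like the example above; `extract_ip_entities` is a hypothetical name used only for illustration.

```python
import json


def extract_ip_entities(entities_json):
    """Return the Address of every entity whose Type is "ip" (case-insensitive)."""
    entities = json.loads(entities_json) if entities_json else []
    return [
        e["Address"]
        for e in entities
        if str(e.get("Type", "")).lower() == "ip" and e.get("Address")
    ]


sample = """[
  {"$id":"3","HostName":"contoso-server","Type":"host"},
  {"$id":"4","Address":"203.0.113.10","Type":"ip"},
  {"$id":"5","Address":"198.51.100.20","Type":"ip"}
]"""
print(extract_ip_entities(sample))
```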

Query 7: DeviceProcessEvents (Process Execution Post-Compromise)

Purpose: If attacker IPs had successful connections, check for suspicious process execution.

let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>"]);
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
// First, find devices that had connections from target IPs
let compromised_devices = DeviceNetworkEvents
| where TimeGenerated between (start .. end)
| where RemoteIP in (target_ips)
| where ActionType in ("ConnectionSuccess", "InboundConnectionAccepted")
| distinct DeviceName;
// Then check for suspicious processes on those devices
DeviceProcessEvents
| where TimeGenerated between (start .. end)
| where DeviceName in (compromised_devices)
| where FileName in~ ("powershell.exe", "cmd.exe", "wscript.exe", "cscript.exe", "mshta.exe", "certutil.exe", "bitsadmin.exe", "regsvr32.exe", "rundll32.exe")
    or ProcessCommandLine has_any ("Invoke-", "IEX", "DownloadString", "WebClient", "-enc", "-encoded", "bypass", "hidden")
| project TimeGenerated, DeviceName, FileName, ProcessCommandLine, AccountName, InitiatingProcessFileName
| order by TimeGenerated desc
| take 50

Query 8: DeviceFileEvents (Malware Drops)

Purpose: Check for file creation/modification on devices contacted by attacker IPs.

let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>"]);
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
// Find devices that had connections from target IPs
let compromised_devices = DeviceNetworkEvents
| where TimeGenerated between (start .. end)
| where RemoteIP in (target_ips)
| where ActionType in ("ConnectionSuccess", "InboundConnectionAccepted")
| distinct DeviceName;
// Check for suspicious file activity
DeviceFileEvents
| where TimeGenerated between (start .. end)
| where DeviceName in (compromised_devices)
| where ActionType in ("FileCreated", "FileModified")
| where FileName endswith_cs ".exe" or FileName endswith_cs ".dll" or FileName endswith_cs ".ps1" 
    or FileName endswith_cs ".bat" or FileName endswith_cs ".vbs" or FileName endswith_cs ".js"
| where FolderPath has_any ("\\Temp\\", "\\AppData\\", "\\Downloads\\", "\\ProgramData\\", "\\Users\\Public\\")
| project TimeGenerated, DeviceName, FileName, FolderPath, ActionType, InitiatingProcessFileName, SHA256
| order by TimeGenerated desc
| take 50

Recommended Execution Order

When user selects IPs and clicks "Investigate in Chat":

Phase 1 (Parallel):

  • Query 1: DeviceNetworkEvents
  • Query 2: SecurityEvent
  • Query 3: W3CIISLog
  • Query 4: SigninLogs
  • Query 5: ThreatIntelIndicators
  • Query 6: SecurityAlert

Phase 2 (If connections found):

  • Query 7: DeviceProcessEvents (post-compromise activity)
  • Query 8: DeviceFileEvents (malware indicators)
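
The two-phase order can be sketched as follows. This is an illustration under stated assumptions: `run_query` is a hypothetical callable that takes KQL text and returns a list of rows, and "connections found" is interpreted as Query 1 returning at least one row.

```python
from concurrent.futures import ThreadPoolExecutor


def run_investigation(run_query, phase1, phase2, gate="Query 1"):
    """Run Phase 1 queries in parallel, then Phase 2 only if the gate query returned rows.

    run_query: hypothetical callable taking KQL text and returning a list of rows.
    phase1 / phase2: dicts mapping a query label to its KQL text.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(phase1))) as pool:
        futures = {name: pool.submit(run_query, kql) for name, kql in phase1.items()}
        results = {name: fut.result() for name, fut in futures.items()}

    # Phase 2 only runs when the gate (network) query found connections
    if results.get(gate):
        for name, kql in phase2.items():
            results[name] = run_query(kql)
    return results


# Stub standing in for the real query tool, for illustration only
def fake_run_query(kql):
    return [{"RemoteIP": "203.0.113.10"}] if "DeviceNetworkEvents" in kql else []


results = run_investigation(
    fake_run_query,
    phase1={"Query 1": "DeviceNetworkEvents | ...", "Query 4": "SigninLogs | ..."},
    phase2={"Query 7": "DeviceProcessEvents | ..."},
)
print(sorted(results))
```

Gating Phase 2 on the network query avoids running the heavier process and file queries when no device was actually contacted.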

Response Format:

Summarize findings with:

  1. Threat Level Assessment (Critical/High/Medium/Low)
  2. Attack Summary - What the IPs did, which devices/users were targeted
  3. Successful Access - ALERT if any successful logins (4624) or Azure AD success (ResultType=0)
  4. Threat Intel Matches - Known APT groups, malware campaigns
  5. Recommendations - Block IPs, investigate users, isolate devices

Interactive Selection Feature

The geomap supports multi-select mode for follow-up investigations:

How to Use

  1. Click "☑ Select" button (top of map) to enter selection mode
  2. Click markers to add/remove IPs from selection (green checkmark ✓)
  3. Review selection panel showing selected IPs with enrichment summary
  4. Click "🔍 Investigate in Chat" to send selected IPs for investigation

What Happens

When you click "Investigate in Chat":

  1. All selected IPs are formatted with enrichment context
  2. Message is sent to chat as a user message
  3. LLM runs the follow-up queries above automatically
  4. Results are summarized with threat assessment

Selection Panel Shows

For each selected IP:

  • IP address
  • City, Country
  • Abuse confidence score (color-coded badge)
  • Attack value from the map

Technical Notes

  • Projection: Robinson projection for accurate world map display
  • Map Source: SimpleMaps.com world SVG (MIT license)
  • Bundle Size: ~650 KB (includes embedded world map)
  • CSP Compliance: No external resources - all assets embedded inline
  • Coordinate System: Standard WGS84 (latitude: -90 to 90, longitude: -180 to 180)

When to Use Geomaps

Good Use Cases:

  • Attack origin visualization (honeypots, firewalls)
  • Geographic threat distribution
  • Anomalous sign-in locations
  • VPN/anonymization analysis across regions
  • Executive briefings on global threats

Skip Geomaps When:

  • Fewer than 3 unique locations (too sparse)
  • All IPs from same region (use heatmap instead)
  • Time-based patterns needed (use heatmap)
  • No geographic data available and enrichment not feasible

Last Updated: January 29, 2026

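Generates interactive heatmaps from Microsoft Sentinel data to show activity patterns over time, aggregated data matrices, and anomaly detection. Supports rendering sign-in logs and other data through MCP tools, with a rich set of KQL query patterns and color configuration options.
create heatmap, visualize time patterns, show activity grid, analyze attack patterns, check sign-in activity distribution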
.github/skills/heatmap-visualization/SKILL.md
npx skills add SCStelz/security-investigator --skill heatmap-visualization -g -y
SKILL.md
Frontmatter
{
    "name": "heatmap-visualization",
    "description": "Use this skill when asked to create heatmaps, visualize patterns over time, show activity grids, or display aggregated data in a matrix format. Triggers on keywords like \"heatmap\", \"show heatmap\", \"visualize patterns\", \"activity grid\", \"time-based visualization\", or when analyzing attack patterns, sign-in activity, or event distributions by time period."
}

Heatmap Visualization Skill

Purpose

Generate interactive heatmap visualizations from Microsoft Sentinel data using the Sentinel Heatmap MCP App. Heatmaps display aggregated data in a row/column grid with color-coded intensity, ideal for identifying patterns across time periods, comparing entities, or spotting anomalies.


📑 TABLE OF CONTENTS

  1. Quick Start - Minimal example to get started
  2. MCP Tool Reference - Parameters and schemas
  3. KQL Query Patterns - Ready-to-use queries by scenario
  4. Enrichment Integration - Adding threat intel drill-down
  5. Color Scale Guide - Choosing the right colors
  6. Examples - End-to-end workflows

Quick Start

Minimal Heatmap (3 Steps)

# 1. Query Sentinel for aggregated data
mcp_sentinel-data_query_lake({
  "query": "SigninLogs | where TimeGenerated > ago(24h) | summarize value = count() by row = AppDisplayName, column = format_datetime(bin(TimeGenerated, 1h), 'HH:mm') | project row, column, value"
})

# 2. Display heatmap
mcp_sentinel-heat_show-signin-heatmap({
  "data": [<query results>],
  "title": "Sign-Ins by Application (Last 24h)",
  "rowLabel": "Application",
  "colLabel": "Hour (UTC)",
  "valueLabel": "Sign-ins",
  "colorScale": "green-red"
})

MCP Tool Reference

Tool: mcp_sentinel-heat_show-signin-heatmap

| Parameter | Required | Type | Description |
|---|---|---|---|
| data | | array | Array of {row, column, value} objects |
| title | | string | Title displayed above heatmap |
| rowLabel | | string | Label for row axis (e.g., "IP Address") |
| colLabel | | string | Label for column axis (e.g., "Hour") |
| valueLabel | | string | Label for cell values (e.g., "Events") |
| colorScale | | string | green-red, blue-red, or blue-yellow |
| enrichment | | array | IP enrichment data for click-to-expand panels |

Data Schema

{
  "data": [
    {"row": "192.168.1.1", "column": "10:00", "value": 45},
    {"row": "192.168.1.1", "column": "11:00", "value": 62},
    {"row": "10.0.0.5", "column": "10:00", "value": 128}
  ]
}

Enrichment Schema (Optional)

{
  "enrichment": [
    {
      "ip": "80.94.95.83",
      "city": "Timișoara",
      "country": "RO",
      "org": "AS204428 SS-Net",
      "is_vpn": false,
      "abuse_confidence_score": 100,
      "total_reports": 975,
      "last_reported": "2026-01-29",
      "threat_categories": ["RDP Brute-Force", "Hacking", "Port Scan"]
    }
  ]
}

KQL Query Patterns

All queries must return row, column, value columns.
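
A small validation step before calling the heatmap tool can catch queries that miss this contract. The sketch below assumes query results arrive as a list of dicts; `to_heatmap_data` is a hypothetical helper, and coercing `value` to an integer is an assumption based on the integer counts shown in the Data Schema.

```python
def to_heatmap_data(rows):
    """Validate query rows and return a list of {row, column, value} dicts."""
    required = {"row", "column", "value"}
    data = []
    for i, r in enumerate(rows):
        missing = required - set(r)
        if missing:
            raise ValueError(f"row {i} is missing {sorted(missing)}")
        data.append({
            "row": str(r["row"]),
            "column": str(r["column"]),
            "value": int(r["value"]),
        })
    return data


print(to_heatmap_data([{"row": "10.0.0.5", "column": "10:00", "value": "128"}]))
```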

Pattern 1: Activity by Entity and Hour

<Table>
| where TimeGenerated between (datetime(<start>) .. datetime(<end>))
| summarize value = count() 
    by row = <entity_field>, 
       column = format_datetime(bin(TimeGenerated, 1h), "HH:mm")
| project row, column, value
| order by column asc

Pattern 2: Activity by Entity and Day

<Table>
| where TimeGenerated > ago(30d)
| summarize value = count() 
    by row = <entity_field>, 
       column = format_datetime(bin(TimeGenerated, 1d), "yyyy-MM-dd")
| project row, column, value
| order by column asc

Pattern 3: Cross-Tabulation (Two Dimensions)

<Table>
| where TimeGenerated > ago(7d)
| summarize value = count() 
    by row = <dimension1>, 
       column = <dimension2>
| project row, column, value
| order by value desc

Scenario-Specific KQL Queries

Scenario: Sign-In Activity by Application and Hour

SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType == 0  // Successful sign-ins
| summarize value = count() 
    by row = AppDisplayName, 
       column = format_datetime(bin(TimeGenerated, 1h), "HH:mm")
| project row, column, value
| order by column asc

Recommended: colorScale: "green-red" (activity = good)

Scenario: Failed Sign-Ins by IP and Hour

SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType != 0  // Failed sign-ins
| summarize value = count() 
    by row = IPAddress, 
       column = format_datetime(bin(TimeGenerated, 1h), "HH:mm")
| project row, column, value
| order by column asc, value desc
| take 500  // Limit to top patterns

Recommended: colorScale: "blue-red" (failures = threat)

Scenario: Honeypot Attack Patterns (SecurityEvent)

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let honeypot = '<HONEYPOT_NAME>';
SecurityEvent
| where TimeGenerated between (start .. end)
| where Computer contains honeypot
| where EventID in (4625, 4771, 4776)  // Failed auth events
| where isnotempty(IpAddress) and IpAddress != "-" and IpAddress != "127.0.0.1"
| summarize value = count() 
    by row = IpAddress, 
       column = format_datetime(bin(TimeGenerated, 1h), "HH:mm")
| project row, column, value
| order by column asc, value desc

Recommended: colorScale: "blue-red" (attacks = threat)

Scenario: Web Attack Patterns (W3CIISLog)

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
W3CIISLog
| where TimeGenerated between (start .. end)
| where tolong(scStatus) >= 400  // HTTP errors
| where cIP != "127.0.0.1"
| summarize value = count() 
    by row = cIP, 
       column = format_datetime(bin(TimeGenerated, 1h), "HH:mm")
| project row, column, value
| order by column asc, value desc
| take 300

Recommended: colorScale: "blue-red"

Scenario: Defender Alerts by Severity and Day

SecurityAlert
| where TimeGenerated > ago(30d)
| summarize value = count() 
    by row = AlertSeverity, 
       column = format_datetime(bin(TimeGenerated, 1d), "yyyy-MM-dd")
| project row, column, value
| order by column asc

Recommended: colorScale: "blue-yellow" (neutral overview)

Scenario: User Activity by Application

SigninLogs
| where TimeGenerated > ago(7d)
| where UserPrincipalName =~ '<UPN>'
| summarize value = count() 
    by row = AppDisplayName, 
       column = format_datetime(bin(TimeGenerated, 1d), "yyyy-MM-dd")
| project row, column, value
| order by column asc

Recommended: colorScale: "green-red"

Scenario: Multi-Source Combined Heatmap

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let honeypot = '<HONEYPOT_NAME>';
union
  (SecurityEvent
   | where TimeGenerated between (start .. end)
   | where Computer contains honeypot
   | where EventID in (4625, 4771, 4776)
   | where isnotempty(IpAddress) and IpAddress != "-"
   | extend Source = "RDP/SMB", IP = IpAddress),
  (W3CIISLog
   | where TimeGenerated between (start .. end)
   | where Computer contains honeypot
   | where tolong(scStatus) >= 400
   | extend Source = "IIS", IP = cIP),
  (DeviceNetworkEvents
   | where TimeGenerated between (start .. end)
   | where DeviceName contains honeypot
   | where ActionType in ("ConnectionSuccess", "InboundConnectionAccepted")
   | extend Source = "Network", IP = RemoteIP)
| where IP != "127.0.0.1" and IP != "::1"
| summarize value = count()
    by row = strcat(IP, " (", Source, ")"), 
       column = format_datetime(bin(TimeGenerated, 1h), "HH:mm")
| project row, column, value
| order by column asc, value desc

Enrichment Integration

Adding Threat Intel Drill-Down

When displaying IP-based heatmaps, add enrichment data for click-to-expand threat panels:

Step 1: Extract unique IPs from your query results
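
One way to do this is sketched below, assuming the heatmap rows are the IP addresses themselves and the data is a list of {row, column, value} dicts. `top_ips` and its `limit` parameter are hypothetical, added here to keep the number of enrichment lookups small.

```python
from collections import Counter


def top_ips(data, limit=10):
    """Return the unique row labels (IPs) ordered by total value, highest first."""
    totals = Counter()
    for cell in data:
        totals[cell["row"]] += cell["value"]
    return [ip for ip, _ in totals.most_common(limit)]


# Illustrative cells only
data = [
    {"row": "80.94.95.83", "column": "19:00", "value": 636},
    {"row": "193.142.147.209", "column": "20:00", "value": 245},
    {"row": "80.94.95.83", "column": "20:00", "value": 12},
]
print(" ".join(top_ips(data)))
```

The space-separated output can be passed directly as arguments to the enrichment script.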

Step 2: Enrich IPs using the enrichment script:

python enrich_ips.py 80.94.95.83 193.142.147.209 101.36.107.228

Step 3: Transform enrichment output to heatmap format:

enrichment_out = []
for e in enrichment_data:
    threat_cats = []
    for c in e.get('recent_comments', [])[:5]:
        threat_cats.extend(c.get('categories', []))
    
    enrichment_out.append({
        'ip': e['ip'],
        'city': e.get('city', 'Unknown'),
        'country': e.get('country', '??'),
        'org': e.get('org', 'Unknown'),
        'is_vpn': e.get('is_vpn') or e.get('vpnapi_security_vpn', False),
        'abuse_confidence_score': e.get('abuse_confidence_score', 0),
        'total_reports': e.get('total_reports', 0),
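        # Guard against an empty recent_comments list, which would raise IndexError on [0]
        'last_reported': e['recent_comments'][0].get('date', '')[:10] if e.get('recent_comments') else '',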
        'threat_categories': list(set(threat_cats))[:5]
    })

Step 4: Include in heatmap call:

mcp_sentinel-heat_show-signin-heatmap({
  "data": [...],
  "enrichment": [<enrichment_out>],
  ...
})

Interactive Features with Enrichment

When enrichment is provided:

  • Click any IP row → Opens threat intel panel showing:
    • 📍 Location (city, country)
    • 🏢 Organization/ISP
    • 🏷️ VPN/Proxy/Tor badges
    • 📊 AbuseIPDB confidence meter (0-100)
    • 📈 Total reports count
    • 🔴 Threat category tags
  • Hover any cell → Tooltip with row, column, exact value

Color Scale Guide

| Scale | Low Value | High Value | Best For |
|---|---|---|---|
| green-red | Teal/Blue | Green | Positive activity (sign-ins, successful ops) |
| blue-red | Blue | Red | Threats/failures (attacks, errors, risks) |
| blue-yellow | Blue | Yellow | Neutral data (general distributions) |

Decision Tree

Is the data about threats/failures/attacks?
  → YES: Use "blue-red" (red = danger)
  → NO: Is high volume a positive indicator?
    → YES: Use "green-red" (green = success)
    → NO: Use "blue-yellow" (neutral)
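
The decision tree translates directly into a small function. This is a sketch only; the function and parameter names are hypothetical.

```python
def choose_color_scale(is_threat_data, high_volume_is_positive=False):
    """Pick a colorScale value following the decision tree above."""
    if is_threat_data:
        return "blue-red"      # red = danger
    if high_volume_is_positive:
        return "green-red"     # green = success
    return "blue-yellow"       # neutral


print(choose_color_scale(is_threat_data=True))
print(choose_color_scale(is_threat_data=False, high_volume_is_positive=True))
print(choose_color_scale(is_threat_data=False))
```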

Complete Examples

Example 1: Honeypot Attack Heatmap with Enrichment

# Query attack data
mcp_sentinel-data_query_lake({
  "query": "SecurityEvent | where TimeGenerated between (datetime(<START_DATE>) .. datetime(<END_DATE>)) | where Computer contains '<HONEYPOT_SERVER>' | where EventID == 4625 | where IpAddress != '127.0.0.1' | summarize value = count() by row = IpAddress, column = format_datetime(bin(TimeGenerated, 1h), 'HH:mm') | project row, column, value | order by column asc, value desc | take 200"
})

# Enrich top IPs
python enrich_ips.py 80.94.95.83 193.142.147.209 101.36.107.228

# Display heatmap
mcp_sentinel-heat_show-signin-heatmap({
  "data": [
    {"row": "80.94.95.83", "column": "19:00", "value": 636},
    {"row": "193.142.147.209", "column": "20:00", "value": 245},
    ...
  ],
  "title": "Honeypot Attack Analysis - Click IP for Threat Intel",
  "rowLabel": "Attacker IP",
  "colLabel": "Hour (UTC)",
  "valueLabel": "Failed Auth Attempts",
  "colorScale": "blue-red",
  "enrichment": [
    {"ip": "80.94.95.83", "city": "Timișoara", "country": "RO", "org": "AS204428 SS-Net", "is_vpn": false, "abuse_confidence_score": 100, "total_reports": 975, "threat_categories": ["RDP Brute-Force", "Hacking"]},
    {"ip": "193.142.147.209", "city": "Amsterdam", "country": "NL", "org": "AS213438 ColocaTel Inc.", "is_vpn": true, "abuse_confidence_score": 100, "total_reports": 30972, "threat_categories": ["SSH Brute-Force", "Port Scan"]}
  ]
})

Example 2: Sign-In Activity Overview

# Query sign-in data
mcp_sentinel-data_query_lake({
  "query": "SigninLogs | where TimeGenerated > ago(24h) | where ResultType == 0 | summarize value = count() by row = AppDisplayName, column = format_datetime(bin(TimeGenerated, 1h), 'HH:mm') | project row, column, value | order by column asc"
})

# Display heatmap (no enrichment needed - not IP-based)
mcp_sentinel-heat_show-signin-heatmap({
  "data": [
    {"row": "Microsoft Teams", "column": "09:00", "value": 145},
    {"row": "Outlook", "column": "09:00", "value": 312},
    ...
  ],
  "title": "Sign-In Activity by Application (Last 24h)",
  "rowLabel": "Application",
  "colLabel": "Hour (UTC)",
  "valueLabel": "Sign-ins",
  "colorScale": "green-red"
})

Known Pitfalls

Column Sorting Is Lexicographic

Problem: The heatmap MCP app sorts columns alphabetically. Labels like Nov 10, Dec 01, Jan 05, Feb 02 will render as Dec → Feb → Jan → Nov — completely out of chronological order.
Solution: Always use ISO date format (YYYY-MM-DD) for time-based column labels. 2025-11-10, 2025-12-01, 2026-01-05 sorts correctly both alphabetically and chronologically.

// ✅ CORRECT — sortable column labels
| summarize value = count() by row = ..., column = format_datetime(bin(TimeGenerated, 7d), "yyyy-MM-dd")

// ❌ WRONG — alphabetic sort breaks chronological order
| summarize value = count() by row = ..., column = format_datetime(bin(TimeGenerated, 7d), "MMM dd")

For hourly heatmaps within a single day, HH:mm is fine (00:00–23:00 sorts correctly). The issue only affects multi-day/week/month labels.
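
The effect is easy to reproduce with a plain string sort. The sketch below assumes the app orders column labels by ordinary string comparison, as described above.

```python
# Labels listed in true chronological order
chronological_mmm_dd = ["Nov 10", "Dec 01", "Jan 05", "Feb 02"]
chronological_iso = ["2025-11-10", "2025-12-01", "2026-01-05", "2026-02-02"]

# Alphabetic sort scrambles month-name labels
print(sorted(chronological_mmm_dd))

# ISO labels keep their chronological order under the same sort
print(sorted(chronological_iso) == chronological_iso)
```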


When to Use Heatmaps

Good Use Cases:

  • Attack patterns over time (by hour/day)
  • Comparing activity across entities (IPs, apps, users)
  • Identifying peak activity periods
  • Spotting anomalies in regular patterns
  • Executive-friendly threat visualization

Skip Heatmaps When:

  • Fewer than 5 unique rows or columns (too sparse)
  • Single-dimension data (use bar chart instead)
  • Geographic data (use geomap skill instead)
  • Real-time streaming data (heatmaps are for aggregated snapshots)

Last Updated: January 29, 2026

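Performs comprehensive security analysis of honeypot servers, assessing attack patterns, threat intelligence, and vulnerabilities. Supports parallel queries, IP enrichment, vulnerability assessment, and executive report generation, with an emphasis on time tracking and a standardized workflow.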
honeypot investigation analyze honeypot honeypot security honeypot report
.github/skills/honeypot-investigation/SKILL.md
npx skills add SCStelz/security-investigator --skill honeypot-investigation -g -y
SKILL.md
Frontmatter
{
    "name": "honeypot-investigation",
    "description": "Use this skill when asked to analyze, investigate, or report on honeypot server security. Triggers on keywords like \"honeypot investigation\", \"analyze honeypot\", \"honeypot security\", \"honeypot report\", or when a server name is mentioned with honeypot analysis context. This skill provides comprehensive security analysis including attack patterns, threat intelligence correlation, IP enrichment, vulnerability assessment, and executive report generation.",
    "drill_down_prompt": "Investigate honeypot {entity} — attack patterns, threat intel, vulnerability assessment",
    "threat_pulse_domains": [
        "endpoint",
        "exposure"
    ]
}

Honeypot Investigation Agent - Instructions

Purpose

This agent performs comprehensive security analysis on honeypot servers to assess attack patterns, threat intelligence, vulnerabilities, and defensive effectiveness. Honeypots are decoy systems designed to attract attackers and provide early warning of emerging threats.


📑 TABLE OF CONTENTS

  1. Critical Workflow Rules - Start here!
  2. Investigation Parameters - Input requirements
  3. Execution Workflow - Complete process with time tracking
  4. KQL Query Library - Validated query patterns
  5. Report Template - Executive markdown structure
  6. Error Handling - Troubleshooting guide
  7. Visualization Options - Heatmap and Geomap skills

⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

Before starting ANY honeypot investigation:

  1. ALWAYS calculate date ranges correctly (use current date from context)
  2. ALWAYS track and report time after each major step (mandatory per main instructions)
  3. ALWAYS run independent queries in parallel (drastically faster execution)
  4. ALWAYS save intermediate results to temp/ (enables debugging and auditing)
  5. ALWAYS use create_file for reports (NEVER use PowerShell terminal commands)

Date Range Rules (from main copilot-instructions):

  • Real-time/recent searches: Add +2 days to current date for end range
  • Example: Current date = Dec 12, 2025; Last 48 hours = datetime(2025-12-10) to datetime(2025-12-14)
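
The date arithmetic can be sketched as below. `investigation_window` is a hypothetical helper; it reproduces the example above by subtracting the lookback from the current date and padding the end by two days.

```python
from datetime import date, timedelta


def investigation_window(today, lookback, end_padding_days=2):
    """Return (start, end) KQL datetime literals for a lookback ending today."""
    start = today - lookback
    end = today + timedelta(days=end_padding_days)
    return f"datetime({start.isoformat()})", f"datetime({end.isoformat()})"


print(investigation_window(date(2025, 12, 12), timedelta(hours=48)))
```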

Investigation Parameters

Required Inputs

| Parameter | Description | Example |
|---|---|---|
| Honeypot Name | Server/device name | honeypot-server |
| Time Range | Investigation period | last 48 hours, last 7 days |

Automatic Derivations

  • Start Date: Current date - time range
  • End Date: Current date + 2 days (per date range rules)
  • Output File: reports/honeypot/Honeypot_Report_<hostname>_<timestamp>.md
  • Temp Files: temp/honeypot_ips_<timestamp>.json, temp/honeypot_data_<timestamp>.json

Execution Workflow

🚨 MANDATORY: Time Tracking Pattern

YOU MUST TRACK AND REPORT TIME AFTER EVERY MAJOR STEP:

[MM:SS] ✓ Step description (XX seconds)

Required Reporting Points:

  1. After Phase 1 (failed connection queries)
  2. After Phase 2 (IP enrichment + threat intel)
  3. After Phase 3 (incident filtering)
  4. After Phase 4 (vulnerability scan)
  5. After Phase 5 (report generation)
  6. Final: Total elapsed time
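
A small helper can keep the progress lines consistent with the pattern above. This sketch assumes that `[MM:SS]` is the cumulative elapsed time since the investigation started and that `(XX seconds)` is the duration of the step itself; the skill does not spell this out, and `format_step` is a hypothetical name.

```python
def format_step(elapsed_seconds: int, description: str, step_seconds: int) -> str:
    """Format one progress line as '[MM:SS] ✓ description (XX seconds)'."""
    minutes, seconds = divmod(elapsed_seconds, 60)
    return f"[{minutes:02d}:{seconds:02d}] ✓ {description} ({step_seconds} seconds)"

print(format_step(95, "Failed connection queries completed", 42))
# [01:35] ✓ Failed connection queries completed (42 seconds)
```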

Phase 1: Query Failed Connections (PARALLEL)

Execute ALL THREE queries in parallel using mcp_sentinel-data_query_lake:

Query 1A: SecurityEvent (Windows Security Logs)

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let honeypot = '<HONEYPOT_NAME>';
SecurityEvent
| where TimeGenerated between (start .. end)
| where Computer contains honeypot  // Use 'contains' for flexible hostname matching
| where EventID in (4625, 4771, 4776)  // Failed logon attempts
| where isnotempty(IpAddress) and IpAddress != "-"  // IpAddress is built-in field
| where IpAddress != "127.0.0.1"  // Exclude localhost (internal honeypot traffic)
| summarize 
    FailedAttempts=count(), 
    FirstSeen=min(TimeGenerated), 
    LastSeen=max(TimeGenerated),
    TargetAccounts=make_set(Account, 10)
    by IpAddress, EventID
| extend EventType = case(
    EventID == 4625, "Failed Logon",
    EventID == 4771, "Kerberos Pre-Auth Failed",
    EventID == 4776, "NTLM Auth Failed",
    "Unknown")
| order by FailedAttempts desc
| take 50

Query 1B: W3CIISLog (IIS Web Server Logs)

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let honeypot = '<HONEYPOT_NAME>';
W3CIISLog
| where TimeGenerated between (start .. end)
| where Computer =~ honeypot
| where tolong(scStatus) >= 400  // HTTP errors (4xx/5xx) - scStatus is string type
| where cIP != "127.0.0.1" and cIP != "::1"  // Exclude localhost (internal honeypot traffic)
| summarize 
    RequestCount=count(), 
    FirstSeen=min(TimeGenerated), 
    LastSeen=max(TimeGenerated),
    TargetedURIs=make_set(csUriStem, 10),
    StatusCodes=make_set(tolong(scStatus), 5)  // Convert to long for proper aggregation
    by IpAddress = cIP
| order by RequestCount desc
| take 50

Query 1C: DeviceNetworkEvents (Defender Network Traffic - INBOUND ONLY)

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let honeypot = '<HONEYPOT_NAME>';
DeviceNetworkEvents
| where TimeGenerated between (start .. end)
| where DeviceName =~ honeypot
| where ActionType in ("ConnectionSuccess", "InboundConnectionAccepted", "ConnectionFound")  // Successful inbound TCP connections
| where LocalPort in (3389, 80, 443, 445, 22, 21, 23, 8080, 8443)  // Filter by attacked services (LocalPort = honeypot's listening port)
| where RemoteIP != "127.0.0.1" and RemoteIP != "::1" and RemoteIP != "::ffff:127.0.0.1"  // Exclude localhost
| where RemoteIP !startswith "192.168." and RemoteIP !startswith "10." and RemoteIP !startswith "172.16."  // Exclude private IPs by prefix; note "172.16." covers only 172.16.0.0/16, not the rest of 172.16.0.0/12 (172.17.x to 172.31.x)
| where RemoteIP !startswith "fe80:" and RemoteIP !startswith "fc00:" and RemoteIP !startswith "fd00:"  // Exclude IPv6 link-local and ULA
| where RemoteIP !startswith "::ffff:"  // Filter out IPv6-mapped IPv4 addresses (reduces duplicate noise)
| summarize 
    ConnectionCount=count(), 
    FirstSeen=min(TimeGenerated), 
    LastSeen=max(TimeGenerated),
    TargetedPorts=make_set(LocalPort, 10),  // LocalPort = attacked services on honeypot
    Actions=make_set(ActionType, 5)
    by RemoteIP  // RemoteIP = attacker source
| order by ConnectionCount desc
| take 50

IMPORTANT: This query shows TCP connection establishment (network layer), NOT successful authentication. Attackers who appear here may still fail at the authentication layer (SecurityEvent 4625). For honeypots, all inbound connections should be treated as reconnaissance/attack attempts.
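
The prefix filters in Query 1C are plain string matches, so `172.16.` excludes only 172.16.0.0/16 while the RFC 1918 block is 172.16.0.0/12. If the query results are post-processed in Python, the standard `ipaddress` module can apply the full ranges. The sketch below is one possible post-filter, not part of the skill; `is_external` is a hypothetical helper name.

```python
import ipaddress

def is_external(ip: str) -> bool:
    """True if the address is not private, loopback, or link-local."""
    addr = ipaddress.ip_address(ip)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so the IPv4 rules apply.
    mapped = getattr(addr, "ipv4_mapped", None)
    if mapped is not None:
        addr = mapped
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)

rows = [{"RemoteIP": "8.8.8.8"}, {"RemoteIP": "172.20.5.9"}, {"RemoteIP": "fe80::1"}]
external = [r for r in rows if is_external(r["RemoteIP"])]
print([r["RemoteIP"] for r in external])  # ['8.8.8.8']
```

Note that Python's `is_private` also covers documentation and other reserved ranges, so this filter is slightly stricter than the KQL prefixes.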

After Phase 1 completes:

  • Merge all three result sets
  • Rank IPs by attack volume (prioritize SecurityEvent FailedAttempts, then W3CIISLog RequestCount, then DeviceNetworkEvents ConnectionCount)
  • Select top 10-15 IPs for enrichment (focus on high-volume attackers, not one-off scanners)
  • Extract unique IP addresses into array
  • Save prioritized IPs only to temp/honeypot_ips_<timestamp>.json in format: {"ips": ["1.2.3.4", "5.6.7.8", ...]}
  • Document total unique attacker count separately for report statistics
  • Report elapsed time: [MM:SS] ✓ Failed connection queries completed (XX seconds) - [total_count] unique IPs identified, top [enrichment_count] prioritized for enrichment
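
The merge-and-rank step can be sketched as follows. The input rows mirror the columns returned by Queries 1A to 1C; the function name `prioritize_ips` and the tie-breaking rule (compare FailedAttempts first, then RequestCount, then ConnectionCount) are assumptions about how "prioritize" should be read.

```python
import json

def prioritize_ips(security_events, iis_logs, network_events, top_n=15):
    """Merge the three Phase 1 result sets and return the top attackers."""
    scores = {}  # ip -> [failed_attempts, request_count, connection_count]
    for row in security_events:  # Query 1A has one row per (IpAddress, EventID)
        scores.setdefault(row["IpAddress"], [0, 0, 0])[0] += row["FailedAttempts"]
    for row in iis_logs:
        scores.setdefault(row["IpAddress"], [0, 0, 0])[1] += row["RequestCount"]
    for row in network_events:
        scores.setdefault(row["RemoteIP"], [0, 0, 0])[2] += row["ConnectionCount"]
    # Tuples compare element by element, which gives the source priority order.
    ranked = sorted(scores, key=lambda ip: tuple(scores[ip]), reverse=True)
    return {"total_unique": len(scores), "ips": ranked[:top_n]}

result = prioritize_ips(
    [{"IpAddress": "1.2.3.4", "FailedAttempts": 120},
     {"IpAddress": "1.2.3.4", "FailedAttempts": 30},
     {"IpAddress": "5.6.7.8", "FailedAttempts": 10}],
    [{"IpAddress": "9.9.9.9", "RequestCount": 500}],
    [{"RemoteIP": "5.6.7.8", "ConnectionCount": 7}],
    top_n=2,
)
print(json.dumps({"ips": result["ips"]}))  # {"ips": ["1.2.3.4", "5.6.7.8"]}
print(result["total_unique"])              # 3
```

Only the `{"ips": [...]}` object goes into the temp file; `total_unique` is kept separately for the report statistics.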

Phase 2: IP Enrichment & Threat Intelligence (PARALLEL)

Execute IP enrichment script AND Sentinel threat intel query in parallel:

2A: Run IP Enrichment Script

# Read prioritized IPs from JSON file (top 10-15 by attack volume)
# This reduces token consumption by ~80% while maintaining critical intelligence
$env:PYTHONPATH = "<WORKSPACE_ROOT>"
cd "<WORKSPACE_ROOT>"
.\.venv\Scripts\python.exe enrich_ips.py --file temp/honeypot_ips_<timestamp>.json

Enrichment provides (for prioritized IPs only):

  • Geolocation (city, region, country)
  • ISP/Organization (ASN, org name)
  • VPN/Proxy/Tor detection (is_vpn, is_proxy, is_tor)
  • Abuse reputation (abuse_confidence_score, total_reports)
  • Shodan intelligence: open ports, CVEs, tags (e.g., eol-os, self-signed, c2), CPEs, hostnames
  • Risk level assessment (HIGH/MEDIUM/LOW)

Note: Enrichment script provides aggregated statistics for all IPs - use these summary stats in report narrative instead of listing every IP

2B: Query Sentinel Threat Intelligence

let target_ips = dynamic(["<IP1>", "<IP2>", "<IP3>", ...]);  // From Phase 1 prioritized list (top 10-15 IPs)
ThreatIntelIndicators
| extend IndicatorType = replace_string(replace_string(replace_string(tostring(split(ObservableKey, ":", 0)), "[", ""), "]", ""), "\"", "")
| where IndicatorType in ("ipv4-addr", "ipv6-addr", "network-traffic")
| extend NetworkSourceIP = toupper(ObservableValue)
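| where NetworkSourceIP in~ (target_ips)  // Case-insensitive match: toupper() above would otherwise miss lowercase IPv6 entries in target_ips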
| where IsActive and (ValidUntil > now() or isempty(ValidUntil))
| extend Description = tostring(parse_json(Data).description)
| where Description !contains_cs "State: inactive;" and Description !contains_cs "State: falsepos;"
| extend TrafficLightProtocolLevel = tostring(parse_json(AdditionalFields).TLPLevel)
| extend ActivityGroupNames = extract(@"ActivityGroup:(\S+)", 1, tostring(parse_json(Data).labels))
| summarize arg_max(TimeGenerated, *) by NetworkSourceIP
| project 
    TimeGenerated,
    IPAddress = NetworkSourceIP,
    ThreatDescription = Description,
    ActivityGroupNames,
    Confidence,
    ValidUntil,
    TrafficLightProtocolLevel,
    IsActive
| order by Confidence desc, TimeGenerated desc

After Phase 2 completes:

  • Merge IP enrichment JSON with Sentinel threat intel results
  • Save combined data to temp/honeypot_data_<timestamp>.json
  • Report elapsed time: [MM:SS] ✓ IP enrichment completed (XX seconds)
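
One way to merge the two Phase 2 outputs is to key the threat intel rows by IP and attach them to each enrichment record. The enrichment record shape used here (an `ip` key plus arbitrary fields) is an assumption, since the output format of enrich_ips.py is not shown; `IPAddress` is the column projected by the threat intel query above.

```python
def merge_intel(enrichment: list[dict], threat_intel: list[dict]) -> list[dict]:
    """Attach the matching threat intel row (or None) to each enriched IP."""
    # Lowercase both sides so IPv6 casing differences do not break the match.
    ti_by_ip = {row["IPAddress"].lower(): row for row in threat_intel}
    merged = []
    for record in enrichment:
        match = ti_by_ip.get(record["ip"].lower())
        merged.append({**record, "threat_intel": match})
    return merged

combined = merge_intel(
    [{"ip": "1.2.3.4", "risk": "HIGH"}, {"ip": "5.6.7.8", "risk": "LOW"}],
    [{"IPAddress": "1.2.3.4", "Confidence": 90}],
)
print(combined[0]["threat_intel"])  # {'IPAddress': '1.2.3.4', 'Confidence': 90}
print(combined[1]["threat_intel"])  # None
```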

Phase 3: Query Security Incidents (Sentinel KQL)

Step 3A: Get Device ID from Sentinel

let honeypot = '<HONEYPOT_NAME>';
DeviceInfo
| where TimeGenerated > ago(30d)
| where DeviceName =~ honeypot or DeviceName contains honeypot
| summarize arg_max(TimeGenerated, *)
| project DeviceId, DeviceName, OSPlatform, OSVersion, PublicIP

Extract DeviceId (GUID) from result - returns single most recent device record.

Step 3B: Query Security Incidents

let targetDevice = "<HONEYPOT_NAME>";
let targetDeviceId = "<DEVICE_ID>";  // REQUIRED: Get from DeviceInfo query (Step 3A)
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let relevantAlerts = SecurityAlert
| where TimeGenerated between (start .. end)
| where Entities has targetDevice or Entities has targetDeviceId
| summarize arg_max(TimeGenerated, *) by SystemAlertId
| project SystemAlertId, AlertName, AlertSeverity, ProviderName, Tactics;
SecurityIncident
| where CreatedTime between (start .. end)  // Filter on CreatedTime for incidents created in range
| summarize arg_max(TimeGenerated, *) by ProviderIncidentId  // Get most recent state per ProviderIncidentId
| project ProviderIncidentId, Title, Severity, Status, Classification, CreatedTime, LastModifiedTime, Owner, AdditionalData, AlertIds, Labels
| where not(tostring(Labels) has "Redirected")  // Exclude merged incidents
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| join kind=inner relevantAlerts on $left.AlertId == $right.SystemAlertId
| extend ProviderIncidentUrl = tostring(AdditionalData.providerIncidentUrl)
| extend OwnerUPN = tostring(Owner.userPrincipalName)
| extend LastModifiedTime = todatetime(LastModifiedTime)
| summarize 
    Title = any(Title),
    Severity = any(Severity),
    Status = any(Status),
    Classification = any(Classification),
    CreatedTime = any(CreatedTime),
    LastModifiedTime = any(LastModifiedTime),
    OwnerUPN = any(OwnerUPN),
    ProviderIncidentUrl = any(ProviderIncidentUrl),
    AlertCount = count(),
    MitreTactics = make_set(Tactics)
    by ProviderIncidentId
| order by LastModifiedTime desc
| take 10

IMPORTANT:

  • This query joins SecurityIncident with SecurityAlert to provide full incident context

  • Deduplication: The final summarize statement collapses multiple alerts per incident into a single row (groups by ProviderIncidentId)

  • Filter on CreatedTime to find incidents created in the investigation period

  • Use arg_max(TimeGenerated, *) by ProviderIncidentId to get the most recent update for each incident (includes status changes, comments, etc.)

  • Returns up to 10 unique incidents (grouped by ProviderIncidentId to ensure one row per external incident ID)

  • ⚠️ CHECK STATUS FIELD: Only report incidents with Status="New" or "Active" as threats. Status="Closed" + Classification="BenignPositive" = expected honeypot activity (do not flag as threat)
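
The status check above can be expressed as a small classifier. This is a sketch: the source only defines the "threat" and "expected" cases, so the `"other"` bucket for any remaining combination is an assumption, as is the function name.

```python
def classify_incident(incident: dict) -> str:
    """Classify one incident row from Step 3B by Status and Classification."""
    status = incident.get("Status")
    if status in ("New", "Active"):
        return "threat"    # report as a threat
    if status == "Closed" and incident.get("Classification") == "BenignPositive":
        return "expected"  # expected honeypot activity, do not flag
    return "other"         # not covered by the rule; needs analyst judgment

print(classify_incident({"Status": "Active", "Classification": ""}))                # threat
print(classify_incident({"Status": "Closed", "Classification": "BenignPositive"}))  # expected
```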

After Phase 3 completes:

  • Report elapsed time: [MM:SS] ✓ Security incidents query completed (XX seconds)

Phase 4: Vulnerability Assessment

⚠️ CRITICAL: TVM tables are snapshot tables — NO time filtering!

  • DeviceTvmSoftwareVulnerabilities has NO Timestamp or TimeGenerated column
  • Do NOT add where Timestamp between (...) — it will fail with a schema error
  • Do NOT use Sentinel Data Lake (query_lake) — TVM tables are only available via Advanced Hunting
  • Use RunAdvancedHuntingQuery MCP tool only

Step 4A: Query Vulnerabilities via Advanced Hunting KQL

let deviceName = '<HONEYPOT_NAME>';
DeviceTvmSoftwareVulnerabilities
| where DeviceName startswith deviceName
| project
    CveId,
    VulnerabilitySeverityLevel,
    SoftwareVendor,
    SoftwareName,
    SoftwareVersion,
    RecommendedSecurityUpdate,
    RecommendedSecurityUpdateId
| summarize by CveId, VulnerabilitySeverityLevel, SoftwareVendor, SoftwareName, SoftwareVersion, RecommendedSecurityUpdate, RecommendedSecurityUpdateId
| order by case(VulnerabilitySeverityLevel == "Critical", 1, VulnerabilitySeverityLevel == "High", 2, VulnerabilitySeverityLevel == "Medium", 3, 4) asc
| take 30

Key columns returned:

  • CveId — CVE identifier (e.g., CVE-2025-15467)
  • VulnerabilitySeverityLevel — String: Critical / High / Medium / Low
  • SoftwareVendor, SoftwareName, SoftwareVersion — Affected software details
  • RecommendedSecurityUpdate — Patch info (may be empty)

🔴 PROHIBITED:

  • ❌ Adding Timestamp or TimeGenerated filters (column does not exist)
  • ❌ Projecting CvssScore (column does not exist — use VulnerabilitySeverityLevel instead)
  • ❌ Using Sentinel Data Lake MCP (query_lake) for TVM tables
  • ❌ Using GetDefenderMachineVulnerabilities API (requires separate machine ID lookup, less reliable)

After Phase 4 completes:

  • Report elapsed time: [MM:SS] ✓ Vulnerability scan completed (XX seconds)

Phase 5: Generate Executive Report

Use the Report Template (see section below) to create markdown report.

Critical Report Sections:

  1. Executive Summary - High-level findings (2-3 paragraphs)
  2. Attack Surface Analysis - Failed connections by IP, service, pattern
  3. Threat Intelligence Correlation - Known malicious IPs, APT groups, VPNs
  4. Security Incidents - Incidents triggered by honeypot activity
  5. Attack Pattern Analysis - Targeted services, credential attacks, web exploits
  6. Vulnerability Status - Current CVEs and exploitation risk
  7. Key Detection Insights - TTPs, MITRE ATT&CK mapping, novel indicators
  8. Honeypot Effectiveness - Metrics and recommendations
  9. Conclusion - Summary and next steps

Report Generation:

  1. Populate template with data from Phases 1-4
  2. Use create_file to save: reports/honeypot/Honeypot_Report_<hostname>_<timestamp>.md
  3. Return absolute path to user
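
The report file name can be built from the hostname and a timestamp. The skill does not specify the `<timestamp>` format, so the `YYYYMMDD_HHMMSS` layout below is an assumption, and `report_path` is a hypothetical helper; the returned path is relative to the workspace and would still need to be resolved to an absolute path before it is shown to the user.

```python
from datetime import datetime
from pathlib import PurePosixPath

def report_path(hostname: str, now: datetime) -> str:
    """Build reports/honeypot/Honeypot_Report_<hostname>_<timestamp>.md."""
    stamp = now.strftime("%Y%m%d_%H%M%S")  # assumed timestamp format
    return str(PurePosixPath("reports/honeypot") / f"Honeypot_Report_{hostname}_{stamp}.md")

print(report_path("honeypot-server", datetime(2025, 12, 12, 9, 30, 0)))
# reports/honeypot/Honeypot_Report_honeypot-server_20251212_093000.md
```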

After Phase 5 completes:

  • Report elapsed time: [MM:SS] ✓ Report generated (XX seconds)
  • Provide comprehensive timeline breakdown with total elapsed time

KQL Query Library

Additional Useful Queries

Query: Top Targeted User Accounts (Credential Attacks)

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let honeypot = '<HONEYPOT_NAME>';
SecurityEvent
| where TimeGenerated between (start .. end)
| where Computer =~ honeypot
| where EventID == 4625  // Failed logon
| summarize FailedAttempts = count() by Account
| order by FailedAttempts desc
| take 20

Query: Web Exploitation Patterns (SQL Injection, XSS, Path Traversal)

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let honeypot = '<HONEYPOT_NAME>';
W3CIISLog
| where TimeGenerated between (start .. end)
| where Computer =~ honeypot
| where csUriStem has_any ("'", "union", "select", "script", "../", "..\\", "cmd.exe", "powershell")
| summarize 
    AttemptCount = count(),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated),
    UniqueIPs = dcount(cIP)
    by ExploitPattern = case(
        csUriStem has_any ("'", "union", "select"), "SQL Injection",
        csUriStem has "script", "XSS",
        csUriStem has_any ("../", "..\\"), "Path Traversal",
        csUriStem has_any ("cmd.exe", "powershell"), "Command Injection",
        "Other")
| order by AttemptCount desc

Query: Port Scanning Detection

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let honeypot = '<HONEYPOT_NAME>';
DeviceNetworkEvents
| where TimeGenerated between (start .. end)
| where DeviceName =~ honeypot
| summarize 
    DistinctPorts = dcount(LocalPort),  // LocalPort = honeypot port being probed (RemotePort is the attacker's source port)
    PortsScanned = make_set(LocalPort),
    EventCount = count()
    by RemoteIP
| where DistinctPorts >= 5  // Threshold: 5+ ports = scan
| order by DistinctPorts desc
| take 20

Query: Brute Force Detection (High Volume from Single IP)

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let honeypot = '<HONEYPOT_NAME>';
let threshold = 50;  // 50+ failed attempts = brute force
SecurityEvent
| where TimeGenerated between (start .. end)
| where Computer =~ honeypot
| where EventID == 4625
| where isnotempty(IpAddress) and IpAddress != "-"  // Use the built-in IpAddress column, as in Query 1A
| summarize FailedAttempts = count() by IpAddress
| where FailedAttempts >= threshold
| order by FailedAttempts desc

Report Template

Use this structure for executive reports:

# Honeypot Security Analysis - <HONEYPOT_NAME>
**Analysis Period:** <START_DATE> to <END_DATE> (<HOURS> hours)  
**Report Generated:** <TIMESTAMP>  
**Classification:** CONFIDENTIAL

---

## Executive Summary

[3 comprehensive paragraphs covering attack overview, threat landscape, and value delivered]

**Key Metrics:**
- **Total Attack Attempts:** [count]
- **Unique Attacking IPs:** [count]
- **Security Incidents Triggered:** [count]
- **Known Malicious IPs (Threat Intel):** [count] ([percentage]%)
- **Current Vulnerabilities:** [count] HIGH, [count] MEDIUM

---

## 1. Attack Surface Analysis
[Failed connections by source IP, geographic distribution, VPN/anonymization summary]

## 2. Threat Intelligence Correlation
[IPs matched in threat intel, highest confidence threats, MSTIC indicators]

## 3. Security Incidents
[Incidents involving honeypot with severity, status, classification, MITRE tactics]

## 4. Attack Pattern Analysis
[Targeted services, credential attacks, web exploitation, port scanning]

## 5. Honeypot Vulnerability Status
[CVE inventory, exploitation risk assessment, cross-reference with attacks]

## 6. Key Detection Insights
[MITRE ATT&CK mapping, novel indicators, threat actor attribution]

## 7. Honeypot Effectiveness
[Detection metrics, recommendations for optimization]

## 8. Conclusion
[Summary, key takeaways, immediate/short-term/long-term actions]

---

**Investigation Timeline:**
[Phase timing breakdown]

**Total Investigation Time:** [duration]

Error Handling

Common Issues and Solutions

| Issue | Solution |
|---|---|
| Missing honeypot in DeviceInfo table | Verify device name; check if device reports to Defender; try Computer field instead |
| No SecurityEvent logs | Device may not be sending Windows Security logs; verify log forwarding configuration |
| W3CIISLog table not found | IIS logging may not be enabled; query WebAccessLog or HTTP logs instead |
| IP enrichment script fails | Check ipinfo.io token in config.json; verify internet connectivity; check temp file exists |
| Date range returns no results | Verify date calculation (current date from context + proper offset); expand time range |
| KQL timeout | Reduce take limit; narrow time range; remove complex aggregations |

Validation Checklist

Before delivering report, verify:

  • ✅ All Phase timestamps reported to user
  • ✅ Total elapsed time calculated and displayed
  • ✅ IP enrichment data merged with attack logs
  • ✅ Incident filtering correctly applied (only honeypot-related incidents)
  • ✅ Vulnerability data retrieved (or documented as unavailable)
  • ✅ Report saved to correct path: reports/honeypot/Honeypot_Report_<hostname>_<timestamp>.md
  • ✅ Absolute path returned to user

Integration with Main Copilot Instructions

This skill follows all patterns from the main copilot-instructions.md:

  • Date range handling: Uses +2 day rule for real-time searches
  • Parallel execution: Runs independent queries simultaneously
  • Time tracking: Mandatory reporting after each phase
  • Token management: Uses create_file for all output
  • KQL best practices: Follows Sample KQL Query patterns
  • IP enrichment: Uses documented enrich_ips.py utility

Example invocations:

  • "Investigate the honeypot HONEYPOT-01 over the last 48 hours"
  • "Run honeypot security analysis for honeypot-server-01 from Dec 10-12"
  • "Generate honeypot report for [hostname] last 7 days"

Visualization Options

After completing the investigation, offer to visualize the attack data using the dedicated visualization skills:

Heatmap Visualization

Use the heatmap-visualization skill (.github/skills/heatmap-visualization/SKILL.md) to show attack patterns over time with threat intel drill-down.

When to offer:

  • ✅ After completing honeypot investigation phases
  • ✅ When user asks "show me the attack patterns" or "visualize the attacks"
  • ✅ For comparing attack volumes across time periods
  • ❌ Skip if investigation found minimal activity (<5 unique IPs)

Geomap Visualization

Use the geomap-visualization skill (.github/skills/geomap-visualization/SKILL.md) to show attack origins on a world map.

When to offer:

  • ✅ After completing honeypot investigation phases
  • ✅ When user asks "where are the attacks coming from?" or "show on a map"
  • ✅ For geographic threat distribution analysis
  • ❌ Skip if all IPs are from the same region
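
The two skip conditions can be checked before offering either visualization. A minimal sketch, assuming each attacker record carries hypothetical `ip` and `region` keys taken from the enrichment data:

```python
def visualizations_to_offer(attackers: list[dict]) -> list[str]:
    """Decide which visualization skills are worth offering."""
    unique_ips = {a["ip"] for a in attackers}
    regions = {a["region"] for a in attackers}
    offers = []
    if len(unique_ips) >= 5:  # skip heatmap on minimal activity (<5 unique IPs)
        offers.append("heatmap")
    if len(regions) > 1:      # skip geomap if every IP is in the same region
        offers.append("geomap")
    return offers

sample = [{"ip": f"198.51.100.{i}", "region": "EU" if i % 2 else "APAC"} for i in range(6)]
print(visualizations_to_offer(sample))  # ['heatmap', 'geomap']
```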

Note: W3CIISLog includes native RemoteIPLatitude and RemoteIPLongitude fields - use these directly for geomap visualization without additional enrichment.


Last Updated: January 29, 2026

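Generates production-grade KQL queries for Microsoft Sentinel, Defender XDR, and Azure Data Explorer. Validates table schemas and draws on official documentation and community examples to keep queries accurate and efficient, and follows strict rules for testing and for providing context.
write KQL, create KQL query, help with KQL, query [table], KQL for [scenario]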
.github/skills/kql-query-authoring/SKILL.md
npx skills add SCStelz/security-investigator --skill kql-query-authoring -g -y
SKILL.md
Frontmatter
{
    "name": "kql-query-authoring",
    "description": "Use this skill when asked to write, create, or help with KQL (Kusto Query Language) queries for Microsoft Sentinel, Defender XDR, or Azure Data Explorer. Triggers on keywords like \"write KQL\", \"create KQL query\", \"help with KQL\", \"query [table]\", \"KQL for [scenario]\", or when a user requests queries for specific data analysis scenarios. This skill uses schema validation, Microsoft Learn documentation, and community examples to generate production-ready KQL queries."
}

KQL Query Authoring - Instructions

Purpose

Generate validated, production-ready KQL queries by combining schema validation (331+ indexed tables), Microsoft Learn documentation, community examples, and performance best practices.


Prerequisites

Required MCP Servers:

  1. KQL Search MCP Server — Schema validation, query examples, table discovery

    • Install: npm install -g kql-search-mcp (npm)
  2. Microsoft Docs MCP Server — Official Microsoft Learn documentation and code samples

Verification: Tools should be available as mcp_kql-search_* and mcp_microsoft-lea_*.


⚠️ Known Issues

search_favorite_repos Bug (v1.0.5)

❌ Broken — ERROR_TYPE_QUERY_PARSING_FATAL. Use mcp_kql-search_search_github_examples_fallback instead.


⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

  1. Validate table schema FIRST: use mcp_kql-search_get_table_schema to verify that the table exists and to confirm column names and data types.

  2. Check platform schema — Sentinel uses TimeGenerated; Defender XDR uses Timestamp. Microsoft Learn examples default to XDR syntax — always convert before testing on Sentinel.

  3. Check local query library FIRST — Use the discovery manifest (.github/manifests/discovery-manifest.yaml) for domain/MITRE lookups and grep_search for table-name/keyword lookups. See the KQL Pre-Flight Checklist in copilot-instructions.md for the full priority order.

  4. Query file structure: NO placeholder TOC — When creating a new query file, do NOT add a ## Quick Reference — Query Index heading or placeholder. scripts/generate_tocs.py creates the heading and table itself. Pre-creating it confuses the strip-and-reinsert logic and produces duplicated content. See ## Creating Query Files below for full file structure rules.

  5. Use multiple sources — Schema (authoritative column names) + Microsoft Learn (official patterns) + community queries (real-world examples).

  6. Test using the correct execution tool — Follow the Tool Selection Rule in copilot-instructions.md:

    • Sentinel-native tables → Data Lake or AH
    • XDR tables ≤ 30d → Advanced Hunting (free); > 30d → Data Lake
    • XDR-only tables (DeviceTvm*, Exposure*) → Advanced Hunting only
    • Adapt timestamp column when switching tools
  7. Test queries before presenting to user — Run with | take 5 via live execution. Use mcp_kql-search_validate_kql_query as fallback if live testing unavailable.

  8. Provide context — Explain what the query does, expected results, and any limitations.

  9. Read the complete workflow below before starting.

📋 Inherited rules: This skill inherits the KQL Pre-Flight Checklist, Tool Selection Rule (Data Lake vs Advanced Hunting), and Known Table Pitfalls from copilot-instructions.md. Those rules are authoritative — do not contradict them here.
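
The tool selection bullets in rule 6 reduce to a short decision function. This is an illustrative sketch only: the authoritative rule lives in copilot-instructions.md, the function and parameter names are hypothetical, and returning "Data Lake" for Sentinel-native tables is an arbitrary pick between the two tools the rule allows.

```python
from fnmatch import fnmatchcase

XDR_ONLY_PATTERNS = ("DeviceTvm*", "Exposure*")

def select_tool(table: str, lookback_days: int, is_xdr_table: bool) -> str:
    """Pick the execution tool for a test run of a query."""
    if any(fnmatchcase(table, pattern) for pattern in XDR_ONLY_PATTERNS):
        return "Advanced Hunting"  # XDR-only tables: Advanced Hunting only
    if is_xdr_table:
        # XDR tables: Advanced Hunting up to 30 days, Data Lake beyond that
        return "Advanced Hunting" if lookback_days <= 30 else "Data Lake"
    return "Data Lake"             # Sentinel-native: Data Lake or AH (assumed default)

print(select_tool("DeviceTvmSoftwareVulnerabilities", 7, True))  # Advanced Hunting
print(select_tool("DeviceNetworkEvents", 90, True))              # Data Lake
```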


Query Authoring Workflow

Step 1: Understand User Requirements

Extract key information:

  • Table(s) needed: Which data source? (e.g., EntraIdSignInEvents, EmailEvents, SecurityAlert)
  • Time range: How far back? (e.g., last 7 days, specific date range)
  • Filters: What specific conditions? (e.g., user, IP, threat type)
  • Output: Statistics, detailed records, time series, aggregations?
  • Platform: Sentinel or Defender XDR? (affects column names)
  • Deployment target: Custom detection rule? (see below)

Custom Detection Intent Detection:

If the user mentions "custom detection", "detection rule", "deploy as detection", "CD rule", "author detections for", or "deploy to Defender":

  1. Read the detection-authoring skill (.github/skills/detection-authoring/SKILL.md) — Critical Rules and CD Metadata Contract sections
  2. Design queries with CD constraints — row-level output, mandatory columns (TimeGenerated, DeviceName, ReportId), no bare summarize
  3. Include cd-metadata blocks in the output file (see Step 8)
  4. Still write queries in Sentinel format (with let variables, 7d lookback) — adaptation to CD format happens at deployment time via the detection-authoring skill

Step 2: Check Local Query Library

Search for existing verified queries before writing from scratch. Use two complementary methods:

  1. Manifest lookup (domain/MITRE): Read .github/manifests/discovery-manifest.yaml and match by domain tag (e.g., identity, endpoint, email) or MITRE technique ID (e.g., T1078, T1566). Best when you know the security domain or ATT&CK technique.
  2. Targeted grep_search (table/keyword): grep_search for the specific table name (e.g., CloudAppEvents, OfficeActivity) or operation keyword (e.g., New-InboxRule, SecretGet) scoped to queries/** and .github/skills/**. The manifest lacks table-name and keyword fields — grep fills this gap.
  3. Check the Ad-Hoc Query Examples appendix in copilot-instructions.md

When to use which: Domain/technique known → manifest first. Table name/operation known → grep first. Both can be used together — manifest for breadth, grep for precision.

If a suitable query is found, adapt it and skip to Step 6. These queries encode known pitfalls and schema quirks.

Step 3: Get Table Schema (MANDATORY)

mcp_kql-search_get_table_schema("<table_name>")

Returns: category, description, all columns with data types, and example queries. Use this to verify column names and understand data types.

Step 4: Get Official Code Samples

mcp_microsoft-lea_microsoft_code_sample_search(
  query: "<table_name> <scenario description>",
  language: "kusto"
)

Include table name + scenario in the query (e.g., "EmailEvents phishing detection").

Step 5: Get Community Examples

mcp_kql-search_search_github_examples_fallback(
  table_name: "<table_name>",
  description: "<goal description>"
)

Also available: mcp_kql-search_search_kql_repositories to find KQL-focused repos.

Step 6: Generate Query

Combine insights: schema for column names, Learn for patterns, community for techniques.

Standalone queries rule: When generating MULTIPLE separate queries, each must start directly with the table name — never use shared let variables across separate queries (they run independently). Use let variables only within a single complex query.

Step 7: Validate and Test (MANDATORY)

Test queries against live data before presenting to the user.

  1. Convert Timestamp to TimeGenerated if adapting MS Learn examples for Sentinel
  2. Test via mcp_sentinel-data_query_lake or RunAdvancedHuntingQuery with | take 5
  3. Verify results are sensible — check for empty results (wrong table/time/filters)
  4. Fix schema mismatches or syntax errors, re-test
  5. Remove test limits, present to user
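
The column conversion in step 1 can be done mechanically with a word-boundary replace. This is a rough sketch with hypothetical helper names: it also rewrites the word inside string literals and comments, so the converted query should still be reviewed and tested as described above.

```python
import re

def to_sentinel(query: str) -> str:
    """Rewrite the XDR time column (Timestamp) to the Sentinel one (TimeGenerated)."""
    return re.sub(r"\bTimestamp\b", "TimeGenerated", query)

def to_xdr(query: str) -> str:
    """Rewrite the Sentinel time column (TimeGenerated) to the XDR one (Timestamp)."""
    return re.sub(r"\bTimeGenerated\b", "Timestamp", query)

xdr_query = "EmailEvents | where Timestamp > ago(7d) | project Timestamp, NetworkMessageId"
print(to_sentinel(xdr_query))
# EmailEvents | where TimeGenerated > ago(7d) | project TimeGenerated, NetworkMessageId
```

The `\b` anchors keep column names that merely contain the word (for example a hypothetical `LastSeenTimestamp`) untouched.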

Common errors:

| Error | Fix |
|---|---|
| Failed to resolve column 'Timestamp' | Use TimeGenerated (Sentinel) |
| Failed to resolve column 'TimeGenerated' | Use Timestamp (XDR AH) |
| Table not found | Verify with get_table_schema; try the other execution tool |
| expected string expression | Add tostring() after mv-expand or parse_json |
| Query timeout / too many results | Add datetime filter + take or summarize |

Fallback validation: mcp_kql-search_validate_kql_query("<query>") — syntax/schema check only, no live data.

Step 8: Format and Deliver Output

Single query: Provide directly in chat with brief explanation and expected results.

Multiple queries (3+): Create a markdown file in queries/<subfolder>/ with the standardized metadata header. This header is mandatory: build_manifest.py parses it to index the file for discovery by threat-pulse and other skills.

File naming: queries/<subfolder>/<topic>.md — e.g., queries/email/email_threat_detection.md

Required metadata header template (first 10 lines of every query file):

# <Descriptive Title>

**Created:** YYYY-MM-DD  
**Platform:** Microsoft Sentinel | Microsoft Defender XDR | Both  
**Tables:** <comma-separated exact KQL table names>  
**Keywords:** <comma-separated searchable terms — attack techniques, scenarios, field names>  
**MITRE:** <comma-separated technique IDs, e.g., T1098.001, T1136.003, TA0008>  
**Domains:** <comma-separated domain tags from the valid set below>  
**Timeframe:** Last N days (configurable)  

Valid domain tags: incidents, identity, spn, endpoint, email, admin, cloud, exposure

| Field | Purpose | Parsed By |
|---|---|---|
| Tables: | Exact KQL table names for grep_search discovery | build_manifest.py (full manifest) |
| Keywords: | Searchable terms for attack scenarios, operations, field names | build_manifest.py (full manifest) |
| MITRE: | ATT&CK technique/tactic IDs for cross-referencing | build_manifest.py (slim + full) |
| Domains: | Domain tags for threat-pulse cross-referencing | build_manifest.py (slim + full); missing = validation error |

After creating a new query file: Run python .github/manifests/build_manifest.py to regenerate the discovery manifest, then run python scripts/generate_tocs.py to auto-generate the Quick Reference TOC. The validator will flag any missing required fields.
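
Before running the real scripts, a quick local check of the header can catch obvious mistakes. The sketch below is not build_manifest.py and does not reproduce its logic; `check_header` is a hypothetical helper, and treating all seven fields as required is an assumption (the source only states that a missing Domains field is a validation error).

```python
import re

VALID_DOMAINS = {"incidents", "identity", "spn", "endpoint", "email", "admin", "cloud", "exposure"}
REQUIRED_FIELDS = ("Created", "Platform", "Tables", "Keywords", "MITRE", "Domains", "Timeframe")
FIELD_RE = re.compile(r"^\*\*(\w+):\*\*\s*(.*?)\s*$")  # matches '**Field:** value'

def check_header(text: str) -> list[str]:
    """Return a list of problems found in the first 10 lines of a query file."""
    lines = text.splitlines()[:10]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("first line must be a '# <title>' heading")
    fields = {}
    for line in lines:
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    for name in REQUIRED_FIELDS:
        if not fields.get(name):
            problems.append(f"missing field: {name}")
    for tag in (t.strip() for t in fields.get("Domains", "").split(",")):
        if tag and tag not in VALID_DOMAINS:
            problems.append(f"unknown domain tag: {tag}")
    return problems

header = "\n".join([
    "# Email Threat Detection", "",
    "**Created:** 2026-01-29  ", "**Platform:** Both  ", "**Tables:** EmailEvents  ",
    "**Keywords:** phishing  ", "**MITRE:** T1566  ", "**Domains:** email, network  ",
    "**Timeframe:** Last 7 days (configurable)  ",
])
print(check_header(header))  # ['unknown domain tag: network']
```

The example fails on purpose: `network` is a valid subfolder name but is not in the list of valid domain tags.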

Subfolder selection: Place files in the subfolder matching the primary data source: identity/, endpoint/, email/, network/, cloud/.

Include per-query documentation with Purpose, Thresholds, Expected Results, and Tuning guidance.

Heading format for TOC compatibility: The generate_tocs.py script auto-generates a Quick Reference TOC by scanning ### and ## Query headings that have a KQL code block within 40 lines. To ensure clean TOC output:

  • DO use ### Query N: <Title> or ## Query N: <Title> for query headings — the number prefix ensures proper TOC ordering
  • DO add a ## heading (e.g., ## Queries, ## Part A:, ## Hunts) immediately before the first ### Query N: if the file has preamble content (Overview, Table Selection, etc.). The TOC generator uses a --- separator followed by a ## heading as its insertion anchor. Without that pair, the script inserts the TOC at the bottom of the file.
  • DO start non-query section headings with a non-query keyword (e.g., ### Deployment, ### Tuning, ### References) — these are automatically filtered out by the TOC generator
  • DO NOT add a ## Quick Reference — Query Index heading or placeholder yourself — the script creates the heading and table. Pre-existing placeholders cause duplicated content and a broken file structure. (This is also enforced as Critical Rule #4 above.)
  • DO NOT use ### headings for non-query content that contains a KQL code block within 40 lines — the TOC generator uses KQL proximity to detect query headings and will incorrectly include them

Investigation shortcuts (optional): Query files can include an **Investigation shortcuts:** bulleted list between the ## Quick Reference heading and the TOC table. These document recommended query combos for common investigation scenarios (e.g., "Delivered phishing drill-down: Q2.4 + Q7.6 + Q3.3"). Shortcuts are preserved by generate_tocs.py across re-runs. Don't add them to new files — they're a refinement added after real investigations reveal which query combos work best together.

CD-Aware Output

When CD intent is detected (Step 1), each query MUST include a <!-- cd-metadata --> HTML comment block. The full schema is in .github/skills/detection-authoring/SKILL.md under CD Metadata Contract.

Valid cd-metadata fields (exhaustive list):

| Field | Required | Notes |
|---|---|---|
| cd_ready | Always | true or false |
| schedule | If cd_ready | "0" (NRT), "1H", "3H", "12H", "24H" |
| category | If cd_ready | MITRE tactic (e.g., Persistence, CredentialAccess) |
| title | Optional | Dynamic title with {{Column}} placeholders (max 3 unique columns across title + description) |
| impactedAssets | If cd_ready | Array of type + identifier pairs |
| recommendedActions | Optional | Triage and response guidance string |
| adaptation_notes | Optional | What needs to change for CD format |

responseActions is NOT a valid cd-metadata field. It shares a name with the Graph API field that is explicitly prohibited in LLM-authored detections ("responseActions": [] is mandatory). Do not include it. Put incident response guidance in recommendedActions instead.

<!-- cd-metadata
cd_ready: true
schedule: "1H"
category: "Persistence"
title: "Suspicious Scheduled Task on {{DeviceName}}"
impactedAssets:
  - type: device
    identifier: DeviceName
recommendedActions: "Investigate the task XML and decode any encoded payloads."
adaptation_notes: "Remove let blocks, add mandatory columns"
-->

For queries not suitable for CD (baseline/statistical):

<!-- cd-metadata
cd_ready: false
adaptation_notes: "Statistical baseline — requires bare summarize, not CD-compatible"
-->
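
The field rules above lend themselves to a mechanical check. The following is a minimal sketch that assumes the comment block has already been parsed into a Python dict. The function name `validate_cd_metadata` and the error strings are hypothetical; the allowed fields, required fields, and schedule values come from the field list above.

```python
ALLOWED_FIELDS = {
    "cd_ready", "schedule", "category", "title",
    "impactedAssets", "recommendedActions", "adaptation_notes",
}
REQUIRED_IF_READY = {"schedule", "category", "impactedAssets"}
VALID_SCHEDULES = {"0", "1H", "3H", "12H", "24H"}


def validate_cd_metadata(meta):
    """Return a list of problems found in a parsed cd-metadata block (empty if none)."""
    errors = []
    # Anything outside the exhaustive field list is rejected (e.g., responseActions).
    for field in sorted(set(meta) - ALLOWED_FIELDS):
        errors.append(f"unknown field: {field}")
    if not isinstance(meta.get("cd_ready"), bool):
        errors.append("cd_ready must be true or false")
        return errors
    if meta["cd_ready"]:
        for field in sorted(REQUIRED_IF_READY - set(meta)):
            errors.append(f"missing field: {field}")
        if "schedule" in meta and meta["schedule"] not in VALID_SCHEDULES:
            errors.append(f"invalid schedule: {meta['schedule']}")
    return errors
```

For example, a block that adds `responseActions` would come back with `unknown field: responseActions`, while the two sample blocks above would pass.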

Summary table: Include a CD column in the Implementation Priority table: ✅ 1H / .


Tool Quick Reference

| Tool | Purpose |
|---|---|
| mcp_kql-search_get_table_schema | Get table columns, types, example queries (Step 3) |
| mcp_microsoft-lea_microsoft_code_sample_search | Official MS Learn KQL samples — use language: "kusto" (Step 4) |
| mcp_kql-search_search_github_examples_fallback | Community KQL examples by table name (Step 5) |
| mcp_kql-search_search_kql_repositories | Find GitHub repos with KQL collections |
| mcp_kql-search_validate_kql_query | Syntax/schema validation (fallback for Step 7) |
| mcp_kql-search_find_column | Find which tables contain a specific column |
| mcp_kql-search_generate_kql_query | Auto-generate schema-validated query from natural language |
| mcp_sentinel-data_query_lake | Execute KQL against live Sentinel (primary validation) |
| mcp_sentinel-data_search_tables | Discover tables using natural language |

Schema Differences

| Platform | Timestamp Column | Notes |
|---|---|---|
| Sentinel / Log Analytics | TimeGenerated | All ingested logs |
| Defender XDR (Advanced Hunting) | Timestamp | XDR-native tables only; Sentinel tables in AH still use TimeGenerated |

Other common differences: Identity/UserPrincipalName (Sentinel) vs AccountUpn/AccountName (XDR); IPAddress (Sentinel) vs RemoteIP/LocalIP (XDR). Always verify with get_table_schema.

Sign-In Table Selection (High-Frequency Queries)

Sign-in queries are the most common query type. Use this decision rule:

| Scenario | Table | Key Differences |
|---|---|---|
| AH query, ≤30d | EntraIdSignInEvents (single table) | Covers both interactive + non-interactive. ErrorCode (int), AccountUpn, Country/City (direct strings), LogonType (JSON array — use has), Timestamp |
| Data Lake / >30d | SigninLogs + AADNonInteractiveUserSignInLogs (union) | ResultType (string), UserPrincipalName, parse_json(LocationDetails) needed for geo, IsInteractive (bool), TimeGenerated |

Common mistakes:

  • Using union SigninLogs, AADNonInteractiveUserSignInLogs in AH queries — unnecessary, EntraIdSignInEvents covers both
  • Using LogonType == "nonInteractiveUser" — values are JSON arrays (["nonInteractiveUser"]), use has
  • Using ResultType on EntraIdSignInEvents — column is ErrorCode (int), not string

Full details: See copilot-instructions.md → Known Table Pitfalls → EntraIdSignInEvents (AH table preference rule) for complete column mapping and additional pitfalls.
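
The decision rule can be written down as a small lookup. This is a sketch only. The function name `select_signin_tables` and the platform labels `"ah"` and `"lake"` are hypothetical, while the table and column names are the ones listed in the decision table above.

```python
def select_signin_tables(platform, lookback_days):
    """Pick sign-in tables and key column names for a query.

    platform: "ah" (Advanced Hunting) or "lake" (Data Lake); both labels are illustrative.
    """
    if platform == "ah" and lookback_days <= 30:
        # A single table covers interactive and non-interactive sign-ins.
        return {
            "tables": ["EntraIdSignInEvents"],
            "time_column": "Timestamp",
            "result_column": "ErrorCode",       # int
            "upn_column": "AccountUpn",
        }
    # Data Lake, or any lookback beyond 30 days: union the two Sentinel tables.
    return {
        "tables": ["SigninLogs", "AADNonInteractiveUserSignInLogs"],
        "time_column": "TimeGenerated",
        "result_column": "ResultType",          # string
        "upn_column": "UserPrincipalName",
    }
```

Keeping the column names next to the table choice is a deliberate design choice here, because the common mistakes listed above all come from mixing one table's columns with the other's.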

Full table pitfalls (dynamic field parsing, immutable fields, table casing, deprecated tables) are documented in copilot-instructions.md under Known Table Pitfalls. Refer there for SecurityAlert.Status, AuditLogs.InitiatedBy, SigninLogs.DeviceDetail, and 20+ other table-specific gotchas.


Best Practices

Performance Optimization

Reference: KQL Best Practices — Microsoft Learn

1. Filter on datetime columns first

The most important optimization. Datetime predicates use efficient index-based shard elimination, skipping entire data partitions without scanning.

// ✅ Correct — datetime first, then selective string filters
SigninLogs
| where TimeGenerated > ago(7d)
| where UserPrincipalName =~ "user@domain.com"

// ❌ Wrong — string filter before datetime
SigninLogs
| where UserPrincipalName =~ "user@domain.com"
| where TimeGenerated > ago(7d)

2. Use has over contains for token matching

has uses the term index for full-token lookup. contains scans every character — dramatically slower on large tables.

// ✅ Faster — term-level index lookup
| where UserPrincipalName has "admin"

// ❌ Slower — full substring scan
| where UserPrincipalName contains "admin"

Use contains only when you genuinely need substring matching (e.g., fragments inside URL paths).

3. Prefer case-sensitive operators

Case-sensitive comparisons (==, in, has_cs) are faster than case-insensitive (=~, in~, has). Use case-insensitive only when casing is unpredictable.

// ✅ Faster — ActionType, Operation, OfficeWorkload have consistent casing
| where ActionType == "LogonFailed"
| where Operation in ("New-InboxRule", "Set-InboxRule")
| where OfficeWorkload == "Exchange"

// 🔵 Use =~ only when casing varies (e.g., user-entered UPNs)
| where UserPrincipalName =~ "user@domain.com"

Common fields with consistent casing (always use == / in): ActionType, Operation, OfficeWorkload, EventID, ResultType, DeliveryAction, EmailDirection, LogonType, Severity, Status, Classification.

4. Filter tables BEFORE joins

Pre-filter both sides of a join to reduce data volume. Move where clauses into subqueries.

// ✅ Correct — filter KB table before joining
DeviceTvmSoftwareVulnerabilities
| join kind=inner (
    DeviceTvmSoftwareVulnerabilitiesKB
    | where IsExploitAvailable == true
    | where CvssScore >= 8.0
) on CveId

// ❌ Wrong — joins full tables, filters after
DeviceTvmSoftwareVulnerabilities
| join kind=inner DeviceTvmSoftwareVulnerabilitiesKB on CveId
| where IsExploitAvailable == true

Join sizing rules:

  • Smaller table on the left (or hint.strategy=broadcast when left is small)
  • in instead of left semi join for single-column filtering
  • lookup instead of join when right side is small (<50 MB)
  • hint.shufflekey=<key> when both sides are large with high-cardinality join key

5. Use materialize() for multi-referenced let statements

Without materialize(), the engine may recompute the let expression each time it's referenced.

// ✅ Computed once, reused twice
let SprayFailures = materialize(
    EntraIdSignInEvents
    | where Timestamp > ago(7d)
    | where ErrorCode in (50126, 50053, 50057)
    | summarize FailedAttempts = count(), TargetUsers = dcount(AccountUpn)
        by SourceIP = IPAddress
    | where TargetUsers >= 5);

6. Narrow arg_max to only needed columns

arg_max(TimeGenerated, *) materializes every column. Specify only what you use.

// ✅ Only 5 columns materialized
SecurityAlert
| where TimeGenerated > ago(30d)
| summarize arg_max(TimeGenerated, Entities, Tactics, Techniques, AlertName, AlertSeverity) by SystemAlertId

// ❌ Materializes all 30+ columns
SecurityAlert
| summarize arg_max(TimeGenerated, *) by SystemAlertId

7. Pre-filter before JSON parsing

For rare key/value lookups in dynamic columns, use has to eliminate rows before expensive parse_json().

// ✅ Term filter first, JSON parse on survivors
AuditLogs
| where tostring(TargetResources) has "MyApp"
| extend Target = tostring(parse_json(tostring(TargetResources[0])).displayName)
| where Target == "MyApp"

8. Filter on table columns, not calculated columns

Filtering on native columns enables index usage; calculated columns force full scans.

// ✅ Filter on native column
SecurityEvent | where EventID == 4625

// ❌ Filter on calculated column
SecurityEvent | extend Cat = case(EventID == 4625, "Fail", ...) | where Cat == "Fail"

9. Project only needed columns early

Drop unnecessary columns before expensive operators (join, summarize, mv-expand) to reduce memory and shuffling.

10. Use take or summarize to limit results

Unbounded queries on large tables consume excessive resources.

11. Platform-specific dynamic column access

In AH, AuditLogs.InitiatedBy and TargetResources are native dynamic — use direct dot-notation. In Data Lake, they may be string-typed requiring parse_json().

// ✅ Advanced Hunting — direct access
| extend Actor = tostring(InitiatedBy.user.userPrincipalName)

// ✅ Data Lake — parse_json wrapper
| extend Actor = tostring(parse_json(tostring(InitiatedBy.user)).userPrincipalName)

// 🔵 Safe in both — stringify full field
| where tostring(InitiatedBy) has "user@domain.com"

Security and Privacy

  • Limit sensitive data exposure — redact PII with strcat(substring(UPN, 0, 3), "***") when appropriate
  • Filter early — reduce dataset before projecting sensitive columns

Code Quality

  • Comments — explain what the query does and why key filters are applied
  • Meaningful variable names — let SuspiciousIPs = ... not let x = ...
  • Standalone queries — when providing multiple separate queries, each MUST start with the table name directly. Never share let variables across queries the user will run independently

Dynamic Type Casting

Common "expected string expression" error: After mv-expand, parse_json, or split, values are dynamic — string functions fail. Always convert first:

// After mv-expand
| mv-expand AuthDetails
| extend AuthMethod = tostring(AuthDetails.authenticationMethod)

// After split
| extend Parts = split(UPN, "@")
| extend Domain = tostring(Parts[1])

Rule of thumb: If you get "expected string expression", add tostring().

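Threat Pulse is a rapid security scan skill for daily SOC operations and new users. Within 15 minutes it scans 7 key security domains in parallel and produces a Threat Pulse dashboard with prioritized findings and drill-down recommendations, helping users locate high-risk issues quickly and move on to specialized investigations.
User asks how to get started or wants a beginner's guide; requests a daily security overview or quick risk assessment; needs a combined cross-domain (e.g., identity, endpoint, email) security status check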
.github/skills/threat-pulse/SKILL.md
npx skills add SCStelz/security-investigator --skill threat-pulse -g -y
SKILL.md
Frontmatter
{
    "name": "threat-pulse",
    "description": "Recommended starting point for new users and daily SOC operations. 15-minute broad security scan across 7 domains (incidents, identity, NHI, endpoint, email, admin\/cloud, exposure) producing a Threat Pulse Dashboard with drill-down recommendations to specialized skills. Trigger on getting-started questions like \"where do I start\", \"what can you do\", \"help me investigate\"."
}

Threat Pulse — Instructions

Purpose

The Threat Pulse skill is a rapid, broad-spectrum security scan designed for the "if you only had 15 minutes" scenario. It executes 12 queries across 7 security domains in parallel, producing a prioritized dashboard of findings with drill-down recommendations to specialized investigation skills.

What this skill covers:

| Domain | Key Questions Answered |
|---|---|
| 🔴 Incidents | What incidents are open and unresolved? Prioritizes High/Critical, backfills with Medium/Low in smaller environments. How old are they? Who owns them? What was recently resolved — TP rate, MITRE tactics, severity distribution? |
| 🔐 Identity (Human) | Which users have the highest Defender XDR Risk Score (0-100)? Which are flagged by Identity Protection (RiskLevel/RiskStatus)? What risk events are driving the signals? Are there password spray / brute-force patterns? |
| 🤖 Identity (NonHuman) | Which service principals expanded their resource/IP/location footprint? |
| 💻 Endpoint | Which endpoints deviated most from their process behavioral baseline? What singleton process chains exist? |
| 📧 Email Threats | What's the phishing/spam/malware breakdown? Were any phishing emails delivered? |
| 🔑 Admin & Cloud Ops | What mailbox rules, OAuth consents, transport rules, or mailbox permission changes occurred? Is there programmatic mailbox access via API? Any MCAS-flagged compromised sign-ins? Human-initiated CA policy changes? Who performed high-impact admin operations — role assignments, MFA registration, app registration, ownership grants? |
| 🛡️ Exposure | Are any critical assets internet-facing with RCE vulnerabilities? What exploitable CVEs (CVSS ≥ 8) are present across the fleet? |

Data sources: SecurityIncident, SecurityAlert, IdentityInfo, AADUserRiskEvents, EntraIdSignInEvents, DeviceProcessEvents, DeviceLogonEvents, ExposureGraphNodes, AADServicePrincipalSignInLogs, EmailEvents, CloudAppEvents, AuditLogs, DeviceTvmSoftwareVulnerabilities, DeviceTvmSoftwareVulnerabilitiesKB

Portal URL patterns are defined in the Defender XDR Portal Links table in the Take Action section. Append tid=<tenant_id> (from config.json) to ALL security.microsoft.com URLs — use ?tid= or &tid= depending on existing query params. Omit if tenant_id is not configured.
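
The tenant ID rule can be sketched as a small helper. The function name `with_tenant` is hypothetical, and the sketch assumes the tenant ID is a plain GUID-like string that needs no URL encoding.

```python
from urllib.parse import urlsplit, urlunsplit


def with_tenant(url, tenant_id):
    """Append tid=<tenant_id> to security.microsoft.com URLs; leave others untouched."""
    if not tenant_id:
        # tenant_id is not configured: omit the parameter entirely.
        return url
    parts = urlsplit(url)
    if parts.hostname != "security.microsoft.com":
        return url
    # Use "&tid=" when a query string already exists, "?tid=" otherwise.
    query = f"{parts.query}&tid={tenant_id}" if parts.query else f"tid={tenant_id}"
    return urlunsplit(parts._replace(query=query))
```

Rebuilding the URL from its parts, rather than concatenating strings, keeps the parameter in front of any `#fragment` the portal link may carry.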


📑 TABLE OF CONTENTS

  1. Critical Workflow Rules
  2. Execution Workflow — Phase 0–3
  3. Phase 4: Interactive Follow-Up Loop
  4. Take Action — Portal links, AH queries, defanging
  5. Sample KQL Queries — 12 queries
  6. Post-Processing — Drift scores, cross-query correlation
  7. Query File Recommendations
  8. Report Template — Inline + full markdown file structure
  9. Known Pitfalls
  10. Quality Checklist
  11. SVG Dashboard Generation

⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

  1. Workspace selection — Follow the SENTINEL WORKSPACE SELECTION rule from copilot-instructions.md. Call list_sentinel_workspaces() before first query.

  2. Read config.json — Load workspace ID, tenant, subscription, and Azure MCP parameters before execution.

  3. Output defaults — Default to inline chat with 7d lookback. Only ask the user for output preferences if they explicitly mention a different mode (e.g., "save to file", "markdown report", "30 day lookback"). If the user just says "threat pulse", "run a scan", or similar — proceed immediately with defaults, do not prompt.

  4. ⛔ MANDATORY: Evidence-based analysis only — Every finding must cite query results. Every "clear" verdict must cite 0 results. Follow the Evidence-Based Analysis rule from copilot-instructions.md.

  5. Parallel execution — Run the Data Lake query (Q5) and all Advanced Hunting queries (Q1, Q2, Q3, Q4, Q6, Q7, Q8, Q9, Q10, Q11, Q12) simultaneously.

  6. Cross-query correlation — After all queries complete, check for correlated findings per the Cross-Query Correlation table in Post-Processing. Escalate priority when patterns match.

  7. SecurityIncident output rule — Every incident MUST include a clickable Defender XDR portal URL: https://security.microsoft.com/incidents/{ProviderIncidentId}?tid=<tenant_id>. See Tenant ID in Portal URLs.

  8. ⛔ MANDATORY: Query File Recommendations (tiered) — After rendering the main report body (Dashboard Summary through Recommended Actions), append the Query File Recommendations section. This runs AFTER the report is visible to the user — not as a blocking gate. Skip only when ALL verdicts are ✅.

  9. ⛔ MANDATORY: 30d drill-down lookback — ALL Phase 4 drill-down queries use 30d (AH) or 90d (Data Lake) lookback, regardless of the Threat Pulse scan window. Entity-scoped queries (filtered by UPN/IP/device) have negligible performance difference between 7d and 30d, and attacks routinely predate the pulse window. AH caps at 30d anyway. Substitute ago(7d) → ago(30d) in all query file and skill queries during drill-downs.

Recommendation tiers for Rule 8:

| Highest Verdict | Query Files | Proactive Skills | Report Section |
|---|---|---|---|
| 🔴 or 🟠 | Top 3–5, entity-specific prompts | All matching skills | 📂 Recommended Query Files |
| 🟡 (no 🔴/🟠) | Top 1–2, broader prompts | Up to 3 posture skills | 📂 Proactive Hunting Suggestions |
| All ✅ | Skip | Skip | Omit entirely |

  10. ⛔ MANDATORY: The follow-up loop is stateful, memory-backed, and self-sustaining. Three non-negotiable invariants that hold for the ENTIRE session (re-read this rule before any follow-up interaction):
  • (a) Memory is the source of truth, not the conversation. The prompt pool lives ONLY in /memories/session/threat-pulse-drilldowns.md. It MUST be created the first time the pool is built (Phase 4 step 1) and is a hard precondition for rendering any selection list. If you are about to present follow-up options and this file does not exist, STOP and create it first. NEVER reconstruct the pool from conversation history — always memory view immediately before each vscode_askQuestions call.
  • (b) The loop re-presents itself automatically. After EVERY completed drill-down, you MUST return to Phase 4 step 2 and call vscode_askQuestions again with the updated pool — without waiting for the user to ask for the menu. The only exits are the user selecting Skip, or an empty pool. "Bring the menu back up" should never be something the user has to request.
  • (c) The Quick Pick Call Contract is mechanical, not advisory. Run the Pre-Flight Checklist and print the Pool Receipt line before every call. In particular: ZERO recommended keys, multiSelect: true, correct icon taxonomy (🔍 📄 🎯 💾 🆕 🔄 📋), and the 💾 / 🔄 / Skip tail every iteration. Do not substitute an ad-hoc "Done" option for the contracted tail.
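
The tiering that Rule 8 attaches to the highest verdict can be read as a small decision function. The sketch below assumes verdicts are passed as the words `escalate`, `investigate`, `monitor`, and `clear` (standing in for 🔴, 🟠, 🟡, and ✅). The function name and the returned dict shape are hypothetical.

```python
def recommendation_tier(verdicts):
    """Map the set of domain verdicts to the recommendation tier from Rule 8."""
    levels = set(verdicts)
    if levels & {"escalate", "investigate"}:
        return {
            "query_files": "Top 3-5, entity-specific prompts",
            "skills": "All matching skills",
            "section": "Recommended Query Files",
        }
    if "monitor" in levels:
        return {
            "query_files": "Top 1-2, broader prompts",
            "skills": "Up to 3 posture skills",
            "section": "Proactive Hunting Suggestions",
        }
    # Every verdict is clear: skip the recommendations and omit the section.
    return None
```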

Execution Workflow

Phase 0: Prerequisites

  1. Read config.json for workspace ID and Azure MCP parameters

  2. Call list_sentinel_workspaces() to enumerate available workspaces

  3. Use defaults (inline chat, 7d) unless user specified otherwise

  4. ⛔ MANDATORY: Display scan summary — Before executing any queries, output the following brief to the user as plain markdown text (NOT inside a fenced code block, NOT as inline code). Use the exact heading, line breaks, and emoji-prefixed bullet items shown below. Substitute <WorkspaceName>, <WorkspaceId>, lookback, and output format. Never skip this step — it sets analyst expectations for what's about to run.

    🔍 Threat Pulse — Scan Plan

    Workspace: <WorkspaceName> (<WorkspaceId>)
    Lookback: <N>d
    Output: <Inline / Markdown file / Both>

    Executing 12 queries across 7 domains:

    • 🔴 Incidents — Open incidents (severity-ranked) + 7d closed summary (Q1, Q2)
    • 🔐 Identity — Identity risk posture, risk event enrichment, auth spray (Q3, Q4)
    • 🤖 NonHuman ID — Service principal behavioral drift (Q5)
    • 💻 Endpoint — Device process drift, rare process chains (Q6, Q7)
    • 📧 Email — Inbound threat snapshot (Q8)
    • 🔑 Admin & Cloud — Cloud app ops, privileged operations (Q9, Q10)
    • 🛡️ Exposure — Critical assets, exploitable CVEs (Q11, Q12)

    Data Lake: 1 query | Advanced Hunting: 11 queries in parallel
    Estimated time: ~2–4 minutes

Phase 1: Data Lake Query (Q5)

Why only 1 query on Data Lake? Q5 requires a 97-day lookback for SPN baseline computation — AH Graph API caps at 30 days. All other queries use ≤30d lookback on Analytics-tier tables accessible via AH.

| Query | Domain | Purpose | Tool |
|---|---|---|---|
| Q5 | 🤖 Identity (NonHuman) | Service principal behavioral drift (90d vs 7d) | query_lake |

Phase 2: Advanced Hunting Queries (Q1, Q2, Q3, Q4, Q6, Q7, Q8, Q9, Q10, Q11, Q12)

Run all 11 in parallel — no dependencies between queries.

Design rationale: The connected LA workspace makes all Sentinel tables (SecurityIncident, IdentityInfo, AADUserRiskEvents, AuditLogs, etc.) queryable via AH. AH is preferred: it's free for Analytics-tier tables and avoids per-query Data Lake billing.

| Query | Domain | Purpose | Tool |
|---|---|---|---|
| Q1 | 🔴 Incidents | Open incidents (severity-ranked backfill) with MITRE tactics | RunAdvancedHuntingQuery |
| Q2 | 🔴 Incidents | 7-day closed incident summary (classification, MITRE, severity) | RunAdvancedHuntingQuery |
| Q3 | 🔐 Identity (Human) | Identity risk posture (IdentityInfo) + risk event enrichment (AADUserRiskEvents) | RunAdvancedHuntingQuery |
| Q4 | 🔐 Identity (Human) | Password spray / brute-force across Entra ID + RDP/SSH | RunAdvancedHuntingQuery |
| Q6 | 💻 Endpoint | Fleet device process drift (7d baseline vs 1d) | RunAdvancedHuntingQuery |
| Q7 | 💻 Endpoint | Rare process chain singletons (30d) | RunAdvancedHuntingQuery |
| Q8 | 📧 Email | Inbound email threat snapshot | RunAdvancedHuntingQuery |
| Q9 | 🔑 Admin & Cloud Ops | Cloud app suspicious activity (CloudAppEvents) | RunAdvancedHuntingQuery |
| Q10 | 🔑 Admin & Cloud Ops | High-impact admin operations (AuditLogs) | RunAdvancedHuntingQuery |
| Q11 | 🛡️ Exposure | Internet-facing critical assets | RunAdvancedHuntingQuery |
| Q12 | 🛡️ Exposure | Exploitable CVEs (CVSS ≥ 8) across fleet | RunAdvancedHuntingQuery |
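
Because the 12 queries have no dependencies on each other, the Phase 1 and Phase 2 fan-out can be pictured as one concurrent batch. The sketch below is illustrative only: `run_query` is a stand-in stub, not the real `query_lake` or `RunAdvancedHuntingQuery` tool call, and the result shape is an assumption.

```python
import asyncio

LAKE_QUERIES = ["Q5"]
AH_QUERIES = ["Q1", "Q2", "Q3", "Q4", "Q6", "Q7", "Q8", "Q9", "Q10", "Q11", "Q12"]


async def run_query(tool, query_id):
    """Stub for a tool call; a real implementation would execute the KQL here."""
    await asyncio.sleep(0)
    return {"query": query_id, "tool": tool, "rows": []}


async def run_pulse():
    """Launch the Data Lake query and all Advanced Hunting queries together."""
    calls = [run_query("query_lake", q) for q in LAKE_QUERIES]
    calls += [run_query("RunAdvancedHuntingQuery", q) for q in AH_QUERIES]
    # return_exceptions=True keeps one failed query from discarding the other results.
    results = await asyncio.gather(*calls, return_exceptions=True)
    return dict(zip(LAKE_QUERIES + AH_QUERIES, results))
```

A caller would run it with `asyncio.run(run_pulse())` and then check each entry for an exception before post-processing.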

Phase 3: Post-Processing & Report

  1. Interpret device drift scores from Q6 results (see Post-Processing)
  2. Run cross-query correlation checks (see rule 6 above)
  3. Assign verdicts to each domain (🔴 Escalate / 🟠 Investigate / 🟡 Monitor / ✅ Clear)
  4. Generate prioritized recommendations with drill-down skill references
  5. Render the report immediately — output the Dashboard Summary, Detailed Findings, Cross-Query Correlations, and 🎯 Recommended Actions. Do NOT block on the manifest or prompt pool building.
  6. After the report is rendered, run the Query File Recommendations procedure and append the 📂 Recommended Query Files section. This happens while the user is already reading the report — no perceived delay. Skip entirely when all verdicts are ✅.

Performance note: The Recommendation Gate was previously a blocking step (Phase 3.5) that loaded the ~500-line manifest YAML and ranked entries before the report could render. By moving it after the report output, the user sees findings immediately while recommendations load in the background. The Phase 4 prompt pool building also benefits — it reuses the recommendations already computed in step 6 rather than re-scanning all 12 query results independently.

Phase 4: Interactive Follow-Up Loop

After rendering the report, present the user with a selectable list of follow-up actions — skill investigations, query file hunts, and IOC lookups. Runs when at least one 🔴, 🟠, or 🟡 verdict exists (skip only when ALL verdicts are ✅).

This is a loop, not a one-shot. After each action completes, re-present the selection list with the prompt pool updated. Tier depth (🔴/🟠 vs 🟡-only vs all ✅) follows Rule 8.

⛔ Loop invariant — verify before EVERY iteration (per Rule 10): (a) /memories/session/threat-pulse-drilldowns.md exists and was just re-read via memory view — if not, create/read it first; (b) you are re-presenting the menu automatically after the prior drill-down, not because the user asked; (c) the Pre-Flight Checklist passed and the Pool Receipt was printed. If any of the three is false, fix it before calling vscode_askQuestions. The loop only ends on Skip or an empty pool.
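
Check (c) relies on the mechanical parts of the Pre-Flight Checklist: split each pool line at the first space, em dash, space delimiter, render at most 12 entity prompts with 🆕 items first, and append the fixed tail. A minimal sketch of those steps follows. The function names are hypothetical, tail descriptions are left empty for brevity, and reading the memory file is out of scope.

```python
DELIMITER = " \u2014 "  # space, em dash, space: the required label/description split
MAX_RENDERED = 12
NEW_ICON = "\U0001F195"  # the 🆕 icon


def parse_pool_line(line):
    """Split one '- <ICON> label <delimiter> description' line at the FIRST delimiter."""
    if not line.startswith("- "):
        raise ValueError("pool lines must start with '- '")
    label, sep, description = line[2:].partition(DELIMITER)
    if not sep:
        raise ValueError("pool line is missing the delimiter")
    return label, description


def build_options(pool_lines, report_saved=False):
    """Return (label, description) pairs in the order the quick pick should show them."""
    entity = [parse_pool_line(line) for line in pool_lines]
    # Stable sort: 🆕 prompts move to the front, memory order is otherwise preserved.
    entity.sort(key=lambda option: not option[0].startswith(NEW_ICON))
    options = entity[:MAX_RENDERED]
    if len(entity) > MAX_RENDERED:
        options.append((f"📋 Show full prompt pool ({len(entity)} items)", ""))
    if not report_saved:
        options.append(("💾 Save full investigation report", ""))
    options.append(("🔄 Refresh prompt pool", ""))
    options.append(("Skip", ""))
    return options
```

Labels and descriptions are taken from the line verbatim, which matches the byte-for-byte requirement: the sketch never rewrites the text on either side of the delimiter.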

Prompt types (three categories, one unified list):

| Type | Icon | Source | Example |
|---|---|---|---|
| Skill investigation | 🔍 | Per-query Drill-down: skill + entities from findings | 🔍 Investigate user jsmith@contoso.com → user-investigation |
| Query file hunt | 📄 | Manifest domain + MITRE matching → query file | 📄 Hunt for RDP lateral movement from 10.0.0.50 → queries/endpoint/rdp_threat_detection.md |
| IOC lookup | 🎯 | Suspicious IPs, domains, hashes surfaced in findings | 🎯 Enrich and investigate IP 203.0.113.42 → ioc-investigation |

Skill matching rules — derive from findings:

| Query | Trigger | Skill | Prompt |
|---|---|---|---|
| Q1 | Incident surfaced | incident-investigation | Investigate incident <ProviderIncidentId> |
| Q1 | Incident with Exfiltration tactic or DLP/Insider Risk in AlertNames | data-security-analysis | Analyze data security events for <entity> |
| Q2 | TruePositive > 0 with non-empty Techniques array | mitre-coverage-report | Run MITRE coverage report |
| Q3–Q4 | Username/UPN in findings | user-investigation | Investigate <UPN> |
| Q3 | 3+ risky users, or any ConfirmedCompromised | identity-posture | Run identity posture report |
| Q3 | User with anonymizedIPAddress, impossibleTravel, or anomalousToken in TopRiskEventTypes | authentication-tracing | Trace authentication chain for <UPN> |
| Q3 | User with unfamiliarFeatures or suspiciousAPITraffic in TopRiskEventTypes | scope-drift-detection/user | Analyze user behavioral drift for <UPN> |
| Q3+Q4 | 🟡-only identity verdicts (no 🔴/🟠) | identity-posture | Run identity posture report |
| Q4 | Spray source IP | ioc-investigation | Investigate IP <address> |
| Q4 | Spray targeting 5+ users | identity-posture | Run identity posture report |
| Q5 | SPN with drift | scope-drift-detection/spn | Analyze drift for <SPN> |
| Q6 | Device with DriftScore > 130 | scope-drift-detection/device | Analyze device process drift for <hostname> |
| Q6–Q7 | Device in findings | computer-investigation | Investigate device <hostname> |
| Q8 | Phishing delivered or malware detected | email-threat-posture | Run email threat posture report |
| Q8+Q3 | Phishing recipient appears in Q3 risky users | authentication-tracing | Trace authentication chain for <UPN> |
| Q9 | Compromised Sign-In user surfaced | user-investigation + authentication-tracing | Investigate <UPN> / Trace authentication chain for <UPN> |
| Q9 | Conditional Access Change by human actor | ca-policy-investigation | Investigate CA policy changes by <UPN> |
| Q9 | Exchange Admin/Rule Change actors | user-investigation | Investigate <UPN> |
| Q10 | MFA-Registration user | user-investigation | Investigate <UPN> |
| Q10 | AppRegistration or Ownership operations | app-registration-posture | Run app registration posture report |
| Q10 | AppRegistration targets containing AI/Agent/Copilot keywords | ai-agent-posture | Run AI agent security audit |
| Q10 | RoleManagement Global/Security Admin OR bulk Password resets from single actor | identity-posture | Run identity posture report |
| Q10 | 3+ categories with same actor in TopActors | user-investigation | Investigate <UPN> |
| Q11 | Any IsVerifiedExposed == true asset | exposure-investigation | Run exposure report for <hostname> |
| Q11–Q12 | Device in findings | computer-investigation | Investigate device <hostname> |
| Q12 | CVE with fleet impact | exposure-investigation | Run vulnerability report for <CVE> |

Drill-down lookback — Per Rule 9, substitute ago(7d) → ago(30d) (AH) or ago(90d) (Data Lake) in all drill-down queries.
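
The lookback substitution is a plain text rewrite of the query before it runs. A minimal sketch, assuming queries are handled as strings and the platform is labelled `"ah"` or `"lake"` (both labels, and the function name `widen_lookback`, are hypothetical):

```python
import re

# Matches ago(7d), allowing optional spaces inside the parentheses.
AGO_7D = re.compile(r"\bago\(\s*7d\s*\)")


def widen_lookback(kql, platform="ah"):
    """Replace every ago(7d) with the drill-down lookback for the platform."""
    target = "ago(90d)" if platform == "lake" else "ago(30d)"
    return AGO_7D.sub(target, kql)
```

Other windows such as `ago(1d)` or `ago(17d)` are left untouched, so baseline comparisons that use a different period keep their original meaning.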

Procedure:

  1. Build the initial prompt pool by combining:

    • Skill prompts: one per unique entity + matching skill from the table above. If the same entity appears in multiple queries (e.g., Q3 and Q9), create ONE skill prompt for that entity — the correlation context goes in the Description, not in the Label.
    • Query file prompts: from Phase 3 step 5 keyword extraction. Each query file is its OWN separate prompt — never merge a query file prompt with a skill prompt.
    • IOC prompts: any suspicious IPs/domains from non-✅ findings not already covered by a skill prompt
    • Deduplicate: if a skill prompt and IOC prompt target the same entity, keep only the skill prompt
    • 🔴 NEVER merge a skill prompt (🔍) with a query file prompt (📄) into a single option. They are different action types with different execution paths.
    • ⛔ Persist the pool. Write the final pool to /memories/session/threat-pulse-drilldowns.md using the exact template below. The format banner is mandatory — it makes the delimiter contract visible to every iteration and every LLM that edits the file. This memory block is the single source of truth; conversation history is not.

    Memory File Template (write on first pool creation)

    # Threat Pulse Session — <YYYY-MM-DD>
    
    **Workspace:** <name> (<id>)
    **Lookback:** <7d|30d|90d>
    **Scan Start:** <YYYY-MM-DD HH:MM UTC>
    
    ## Active Prompt Pool
    
    <!-- FORMAT: `- <ICON> <action> <entity> — Q<N>: <finding> → <skill-or-query-file>` -->
    <!-- ` — ` (space-emdash-space) is the REQUIRED label/description split delimiter. -->
    <!-- One icon per line. Order = file position (no numbering). Do not edit this comment block. -->
    
    - 🔍 Investigate incident #<IncidentId> — Q1: <brief finding>, <N> alerts, <MITRE-ID> → incident-investigation
    - 🎯 Enrich and investigate IP <IP> — Q4: <N> spray attempts / <N> users → ioc-investigation
    ...
    
    ## Pulse Key Findings (quick reference)
    
    ...
    
    ## Completed Drill-Downs
    
    _(none yet)_
    
  2. Call vscode_askQuestions using the Quick Pick Call Contract below. Apply identically on every iteration.

    Quick Pick Call Contract

    • header: Follow-Up Investigation
    • question: Select one or more actions to launch (or skip):
    • options: entity prompts (from memory), then 📋 (if truncated), then 💾 / 🔄 / Skip as the final three — in that order, every iteration. 🆕 prompts prepend to the entity portion only.
      1. 💾 Save full investigation report — Save the complete Threat Pulse session (scan + all drill-downs) as a markdown file
      2. 🔄 Refresh prompt pool — Rebuild the follow-up prompt list from existing pulse + drill-down findings (does NOT re-run the 12 pulse queries)
      3. Skip — No follow-up — investigation complete
    • Allowed Label icons: 🔍 📄 🎯 💾 🆕 🔄 📋. Verdict emoji (🔴🟠🟡🟢✅) are banned from Labels (render as �� in VS Code quick picks) but fine in Descriptions. Drop 💾 after report is saved; 🔄 and Skip always remain.

    🔴 Pre-Flight Checklist — run mechanically before EVERY vscode_askQuestions call

    □ 1. memory view → read `## Active Prompt Pool` just now (not earlier)
    □ 2. Count entity prompts (exclude 💾/🔄/📋/Skip) = N
    □ 3. Format integrity: every entity line starts with `- ` followed by exactly ONE icon. Any legacy `<N>.` prefix → migrate to `- ` first, re-read, then continue.
    □ 4. If N > 12: render top 12 (🆕 first, then memory order) + append `📋 Show full prompt pool (N items)`
    □ 5. For each rendered option: split memory line at FIRST ` — ` → label = text after `- ` up to delimiter, description = right, BYTE-FOR-BYTE (no paraphrasing; if something is missing, edit memory first then re-read)
    □ 6. Atomic check: each option Label has exactly ONE icon; Description has at most ONE `→ target`
    □ 7. `multiSelect: true` in call args
    □ 8. ZERO `recommended` keys anywhere in options[]
    □ 9. Tail = 💾 / 🔄 / Skip (or 📋 / 💾 / 🔄 / Skip if truncated)
    □ 10. Print the Pool Receipt line to chat BEFORE invoking the tool
    

    Pool Receipt (box 10) — one-liner printed to chat so contract violations are user-visible:

    📊 Pool: <N> total / rendering <R> (🆕×<a>, 🔍×<b>, 📄×<c>, 🎯×<d>) / truncated <✔|—> | multiSelect=true ✔ | recommended=0 ✔
    

    If user selects 📋: re-invoke with all entity prompts (drop 📋, keep 💾/🔄/Skip tail).

  3. If user selects Skip (alone) or pool is empty: end skill execution. Ignore any freeform text if Skip is selected.

  4. Freeform input routing — If user types freeform text instead of (or alongside) selecting options, route by matching intent to validated sources. Do NOT write ad-hoc KQL — find the right skill or query file first. Classified actions feed into step 7 alongside any selected options.

    1. Skill match — Check the request against copilot-instructions.md Available Skills trigger keywords. "Check vulnerabilities on that device" → exposure-investigation or computer-investigation. Route as 🔍 — the read_file gate in step 7 applies.
    2. Query file match — grep_search the request's key terms (table names, operations, attack types) against queries/**. "Check forwarding rules" → queries/email/email_threat_detection.md. Route as 📄.
    3. Contextual question — If answerable from data already in context (e.g., "is that IP in other alerts?"), answer directly. If a query is needed, loop back to sub-steps 1–2 to find the right source.
    4. No match — If no skill or query file covers the request, follow the KQL Pre-Flight Checklist from copilot-instructions.md (schema validation, table pitfalls, existing query search) before writing any KQL. Never skip the pre-flight for freeform requests.
  5. 💾 Save full investigation report selected:

    • Read /memories/session/threat-pulse-drilldowns.md (critical after context compaction) and compile pulse dashboard + all drill-down findings into a single markdown file using the Report Template (file mode). Weave drill-down insights into the main report — do NOT just append raw output.
    • If no drill-downs were executed yet, omit the Drill-Down Investigation Results and Cross-Investigation Correlation sections with note: "No drill-down investigations were performed in this session."
    • Save to reports/threat-pulse/Threat_Pulse_YYYYMMDD_HHMMSS.md. Drop 💾 from subsequent pool iterations.
  6. 🔄 Refresh prompt pool selected — prompt list ONLY, no KQL execution. Refresh rebuilds the follow-up list; it does NOT re-run Q1–Q12 and does NOT re-run drill-downs. Discard the current pool, rebuild by re-applying Query File Recommendations and the skill matching table against pulse findings + all drill-down findings in memory. Deduplicate against completed prompts. If selected alongside other actions, refresh FIRST, then present the new pool before executing the others.

  7. One or more actions selected — execute sequentially. Build a todo list (one item per action). For each:

    • 🔍 Skill prompt: read_file the child SKILL.md BEFORE writing ANY query → find Investigation shortcut → match TP Q# trigger → execute with entity substitution. Writing KQL without the prior read_file = schema hallucination. See 🔍 Skill Drill-Down Execution Rule.
    • 📄 Query file prompt: read the file and execute its queries verbatim with entity substitution. See 📄 Query File Execution Rule.
    • 🎯 IOC prompt: load ioc-investigation skill with the target indicator.

    After each drill-down, append a session-state entry to /memories/session/threat-pulse-drilldowns.md under ## Completed Drill-Downs:

    ### <N>. <Prompt Label> (<skill-name>, <YYYY-MM-DD HH:MM>)
    - **Entity:** <target entity>
    - **Trigger:** Q<N> — <original finding>
    - **Key Findings:** <1–8 bullets, evidence-cited>
    - **Risk Assessment:** <emoji> <level> — <1-line justification>
    - **Cross-References:** <overlaps with other drill-downs or pulse queries>
    - **Recommendations:** <top 1–3 actions>
    

    This survives context compaction and feeds the 💾 Save report.

    Before returning to step 2 — MANDATORY, in order:

    1. New Evidence Scan — review drill-down results for entities/TTPs not present in prior findings. Add 🆕 prompts only for meaningful leads (new attacker IP with high abuse, new critical CVE on exposed device, etc.). If nothing warrants follow-up, note: "No actionable new evidence."
    2. Manifest check — for each 🆕 item, consult .github/manifests/discovery-manifest.yaml (match by domains, mitre, or title). Only fall back to ad-hoc KQL if nothing matches.
    3. Reload → mutate → write back — memory view ## Active Prompt Pool → delete the completed bullet line(s) → prepend 🆕 prompts as new bullet lines (- <ICON> ...) → memory str_replace. Every entity line is a bullet — no ordinals, so adding/removing items never requires renumbering. Never reconstruct the pool from conversation history.
    4. Return to step 2. Never render the pool as a markdown table/list instead of calling vscode_askQuestions.
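The reload → mutate → write back sub-step is a plain list transformation, sketched below in Python. This is an illustration only: the real workflow goes through the memory tool (memory view, memory str_replace), and `mutate_pool` and its parameters are hypothetical names that do not exist in the skill.

```python
def mutate_pool(pool_lines, completed, new_prompts):
    """Return the updated Active Prompt Pool as a list of bullet lines.

    pool_lines  -- current bullet lines, e.g. "- 🔍 Investigate user ..."
    completed   -- bullet lines (exact text) that were just executed
    new_prompts -- 🆕 prompt texts, without the leading "- ", to prepend
    """
    done = {line.strip() for line in completed}
    # Delete the completed bullet line(s); everything else keeps its order.
    remaining = [line for line in pool_lines if line.strip() not in done]
    # Prepend new prompts as bullets, skipping any that are already pooled.
    fresh = [f"- {p}" for p in new_prompts if f"- {p}" not in remaining]
    return fresh + remaining
```

Because every entry is a bullet with no ordinal, removing or prepending items never forces the other lines to change.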

Atomic options — ONE action per option. Each option maps to ONE skill + ONE entity, OR ONE query file. When correlations link findings (e.g., Q3+Q9 same user), generate separate options and put the correlation in the Description. Bundling multiple actions/arrows in a single option is the #1 follow-up mistake.

✅ Correct: 🔍 Investigate user cameron@contoso.com / desc Q3+Q9: identity risk + inbox rule manipulation → user-investigation

❌ Wrong: 🔍 Investigate cameron ... → user-investigation, 📄 Hunt phishing → queries/email/...

📄 Query File Execution Rule

⛔ MANDATORY — applies to ALL 📄 query file prompt executions in Phase 4.

When executing a 📄 prompt, use the queries from the file verbatim with entity substitution. Do NOT rewrite queries against different tables than the file specifies.

  1. Read the query file and check its Investigation shortcuts section at the top — match the (TP Q#) annotation to the triggering Threat Pulse query to identify the recommended query chain. Follow that chain for the hunt.
  2. Substitute entity values (hostnames, IPs, UPNs) and adjust ago(Nd) lookback if context-aware expansion applies
  3. ⚠️ Hostname-safe substitution: Device names vary across tables (short hostname vs FQDN vs uppercase). NEVER use == for device/computer filters — use startswith (default, case-insensitive, matches both short name and FQDN), or in~ (multi-device). Override == in query file entity substitution notes with startswith.
  4. Execute using the file's exact tables, columns, and filters
  5. If supplementing with additional tables, execute the file's queries first, then add your own — clearly label which are from the file vs. supplementary
| Action | Status |
|---|---|
| Reading a query file then writing queries against a different table | PROHIBITED |
| Using the query file as "inspiration" and rewriting from scratch | PROHIBITED |
| Executing the file's queries verbatim with entity substitution | REQUIRED |
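As an illustration of the hostname-safe substitution rule, the Python sketch below builds a KQL device predicate that never uses ==. The helper name `device_filter` is hypothetical, and the sketch assumes device names contain no double quotes (no escaping is attempted).

```python
def device_filter(names, column="DeviceName"):
    """Build a KQL predicate for one or more device names without using ==."""
    cleaned = [n.strip() for n in names if n.strip()]
    if not cleaned:
        raise ValueError("at least one device name is required")
    if len(cleaned) == 1:
        # startswith is case-insensitive in KQL and matches short name and FQDN.
        return f'{column} startswith "{cleaned[0]}"'
    # in~ is the case-insensitive membership operator for multi-device filters.
    quoted = ", ".join(f'"{n}"' for n in cleaned)
    return f"{column} in~ ({quoted})"
```

For example, `device_filter(["ws-042"])` yields a startswith predicate, while two or more names yield an in~ list.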

🔍 Skill Drill-Down Execution Rule

⛔ MANDATORY — applies to ALL 🔍 skill drill-down executions in Phase 4.

When executing a skill drill-down, load the child skill's SKILL.md and use its validated queries. Do NOT write ad-hoc queries from memory — schema hallucination (wrong column names, wrong table) is the #1 drill-down failure mode.

  1. Load the child skill's SKILL.md
  2. Match the trigger context (TP Q number) against the skill's Investigation shortcuts section to identify the relevant query chain
  3. Execute the shortcut query chain — substitute only entity placeholders and date ranges. Do NOT add columns, change project/summarize by, or restructure. Column names vary across Device* tables; the SKILL.md queries already use the correct ones.
  4. For quick triage: run only the shortcut chain. For deep investigation: run the full skill workflow
| Action | Status |
|---|---|
| Writing ad-hoc KQL without loading the child SKILL.md | PROHIBITED |
| Loading SKILL.md then modifying its queries (adding/changing columns, restructuring) | PROHIBITED |
| Using SKILL.md queries verbatim with entity substitution | REQUIRED |

🎬 Take Action — Portal-Ready Remediation Blocks

⚠️ AI-generated content may be incorrect. Always review Take Action queries and portal links for accuracy before executing remediation actions.

After every non-✅ drill-down that surfaces actionable entities, append a 🎬 Take Action section with direct portal links (single entities) or Advanced Hunting queries (bulk entities). Ref: Take action on AH results

Every 🎬 Take Action heading in the output — this one and every subsequent one — MUST be immediately followed by the AI-content warning blockquote above.

Skip when: verdict is ✅/🔵, or the action was already taken (e.g., ZAP purged emails).

Single Entity vs Bulk Entity Decision Rule

The remediation format depends on how many entities need action.

| Scenario | Format |
|---|---|
| 1 entity (user, device, IP, domain, hash) | Direct Defender XDR portal link (see Portal Links table for URL patterns) |
| 2+ emails | AH query with NetworkMessageId in (...) → Take actions |
| 2+ devices | AH query with DeviceName in~ (...) → Take actions |
| 2+ IPs/domains/hashes | AH query → click value in results → Add Indicator (allow/warn/block) |

⛔ PROHIBITED: Generating an AH query for a single entity when a direct portal link would suffice. AH Take Action is for bulk remediation — for a single entity, link directly to the portal page where the analyst can act.
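The decision rule reduces to a small lookup, sketched here in Python. The function name, the entity-type keys, and the returned strings are illustrative assumptions; only the mapping itself comes from the table above.

```python
BULK_FORMATS = {
    "email": "AH query with NetworkMessageId in (...) -> Take actions",
    "device": "AH query with DeviceName in~ (...) -> Take actions",
    "ip": "AH query -> click value in results -> Add Indicator",
    "domain": "AH query -> click value in results -> Add Indicator",
    "hash": "AH query -> click value in results -> Add Indicator",
}


def remediation_format(entity_type, count):
    """Pick the remediation format for `count` entities of one type."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        # A single entity always gets a direct portal link, never an AH query.
        return "direct portal link"
    if entity_type not in BULK_FORMATS:
        raise ValueError(f"no bulk format defined for {entity_type!r}")
    return BULK_FORMATS[entity_type]
```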

ID sources (agent retrieves silently — never ask the user):

  • User OID: Graph /v1.0/users/<UPN>?$select=id or IdentityInfo.AccountObjectId
  • MDE DeviceId: DeviceInfo.DeviceId or GetDefenderMachine API
  • SHA / NetworkMessageId / etc.: from the originating AH table

⛔ Never emit prompts like "Retrieve the DeviceId" — run the lookup and emit the finished link in the same turn.

Required Columns per Entity Type

Missing a required column silently disables the action menu. Always include these:

| Entity | Required Columns | Actions | Notes |
|---|---|---|---|
| 📧 Email | NetworkMessageId, RecipientEmailAddress | Soft/hard delete, move to folder, submit to Microsoft, initiate investigation | Do NOT use project: Submit to Microsoft and Initiate Automated Investigation require undocumented columns that project strips, silently greying out those options. The portal's Show empty columns toggle only works when columns exist in the result schema. Return all columns; use where to scope results. |
| 💻 Device | DeviceId | Isolate, collect investigation package, AV scan, initiate investigation, restrict app execution | Use summarize arg_max(Timestamp, *) by DeviceId for latest state |
| 📁 File | SHA1 or SHA256 + DeviceId | Quarantine file | Both hash and device required |
| 🔗 Indicator | IP, URL/domain, or SHA hash column | Add indicator: allow, warn, or block | An AH query is still required to surface the values as clickable — there is no Take actions dropdown button. Instead, click any IP/URL/hash value directly in the AH results → Add indicator to create a Defender for Endpoint custom indicator |
| 🔐 Identity | (No AH Take Action) | Confirm compromised, revoke sessions, suspend in app | Single user: Direct Defender XDR Identity page link. Never generate an AH query for identity remediation |
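A pre-flight check for the required columns could look like the Python sketch below. `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names; each inner set is a group of alternatives where at least one column must appear in the result schema. For emails the check is necessary but not sufficient, since the table above also calls for returning all columns.

```python
REQUIRED_COLUMNS = {
    "email": [{"NetworkMessageId"}, {"RecipientEmailAddress"}],
    "device": [{"DeviceId"}],
    # Quarantine needs a hash (SHA1 or SHA256) plus the device.
    "file": [{"SHA1", "SHA256"}, {"DeviceId"}],
}


def missing_columns(entity_type, result_columns):
    """Return the unmet column groups, each as a sorted list of alternatives."""
    present = set(result_columns)
    return [
        sorted(group)
        for group in REQUIRED_COLUMNS[entity_type]
        if not group & present
    ]
```

An empty list means the Take actions menu should not be disabled by a missing column.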

Template Queries

📧 Email — by NetworkMessageId: (no project — see Email row above)

EmailEvents
| where Timestamp > ago(7d)
| where NetworkMessageId in ("<id1>", "<id2>")

Take actions → Move to mailbox folder, Delete email (soft/hard), Submit to Microsoft, Initiate automated investigation

📧 Email — by compromised sender domain:

EmailEvents
| where Timestamp > ago(30d)
| where SenderFromDomain =~ "<domain>" and ThreatTypes has "Phish" and DeliveryAction == "Delivered"
| take 500

Take actions → Move to mailbox folder, Delete email (soft/hard), Submit to Microsoft, Initiate automated investigation

💻 Single Device — direct portal link: Link to the Defender XDR machine page. If DeviceId isn't in context, look it up yourself:

DeviceInfo | where DeviceName startswith '<name>' | summarize arg_max(Timestamp, *) by DeviceId | project DeviceId

Then emit: [<DeviceName>](https://security.microsoft.com/machines/v2/<DeviceId>?tid=<tenant_id>). Never fabricate URLs with ?DeviceName=, /machines?, or bare hostnames.

→ Machine page → Response actions → Isolate device, Collect investigation package, Run antivirus scan, Initiate investigation, Restrict app execution

💻 Bulk Devices (2+) — AH query:

DeviceInfo
| where Timestamp > ago(1d)
| where DeviceName in~ ("<device1>", "<device2>")
| summarize arg_max(Timestamp, *) by DeviceId
| project DeviceId, DeviceName, OSPlatform, MachineGroup

Take actions → Isolate device, Collect investigation package, Run antivirus scan, Initiate investigation, Restrict app execution

📁 File — by hash:

Source-aware table selection. SHA hashes appear across many tables (DeviceProcessEvents, DeviceImageLoadEvents, DeviceFileEvents, AlertEvidence). Use DeviceFileEvents as the default — it captures file writes and has the columns needed for Quarantine. If the hash was only observed via process execution (no separate file write event), substitute or union with DeviceProcessEvents. The Quarantine action requires DeviceId + SHA1/SHA256 regardless of source table.

File write events (default — DeviceFileEvents):

DeviceFileEvents
| where Timestamp > ago(7d)
| where SHA1 == "<hash>" or SHA256 == "<hash>"
| project DeviceId, DeviceName, SHA1, SHA256, FileName, FolderPath

Process execution events (when file write not captured — DeviceProcessEvents):

DeviceProcessEvents
| where Timestamp > ago(7d)
| where SHA1 == "<hash>" or SHA256 == "<hash>"
| project DeviceId, DeviceName, SHA1, SHA256, FileName, FolderPath, ProcessCommandLine

Take actions → Quarantine file

🔗 Bulk Indicators (2+ IPs/domains/hashes) — AH query for Add Indicator:

When blocking multiple IPs, domains, or hashes, provide an AH query that surfaces the values as clickable columns. There is no Take actions dropdown — the analyst clicks each value directly in results → Add indicator.

Source-aware table selection. The table MUST match where the IPs were originally discovered. DeviceNetworkEvents is the default for network-layer IPs (endpoint connections, firewall events). However, IPs from authentication-layer sources (AADUserRiskEvents, EntraIdSignInEvents, SigninLogs, AADServicePrincipalSignInLogs) may never appear in endpoint network events — querying DeviceNetworkEvents for those returns 0 results. Use the originating table so the analyst sees the IPs in context and can click to add indicators.

Network-layer IPs (from DeviceNetworkEvents, DeviceLogonEvents, firewall logs):

// Surface attacker IPs as clickable values for Add Indicator
DeviceNetworkEvents
| where Timestamp > ago(7d)
| where RemoteIP in ("<ip1>", "<ip2>", "<ip3>")
| summarize Connections = count(), Ports = make_set(LocalPort) by RemoteIP
| order by Connections desc

Auth-layer IPs (from AADUserRiskEvents, EntraIdSignInEvents, SigninLogs):

// Surface attacker IPs from sign-in/risk events for Add Indicator
EntraIdSignInEvents
| where Timestamp > ago(30d)
| where IPAddress in ("<ip1>", "<ip2>", "<ip3>")
| summarize SignIns = count(), Users = dcount(AccountUpn), Countries = make_set(Country, 5) by IPAddress
| order by SignIns desc

→ Click any IP value in results → Add indicator → Block and remediate

Variant — domains/URLs:

DeviceNetworkEvents
| where Timestamp > ago(7d)
| where RemoteUrl has_any ("<domain1>", "<domain2>")
| summarize Connections = count() by RemoteUrl
| order by Connections desc

→ Click any RemoteUrl value → Add indicator → Block and remediate

Defender XDR Portal Links — All Entity Types

🔴 Every entity (user, domain, URL, IP, file hash) in action/recommendation tables MUST be a clickable Defender XDR portal link — the entity name IS the link. Do NOT add a separate "Portal" column or leave entities as plain text. VS Code renders bare UPNs as mailto: and bare URLs/IPs as broken links.

| Entity | URL Pattern | Example |
|---|---|---|
| User | https://security.microsoft.com/user?aad=<OID>&upn=<UPN>&tab=overview&tid=<tenant_id> | [user@contoso.com](https://security.microsoft.com/user?aad=<OID>&upn=user@contoso.com&tab=overview&tid=<tenant_id>) |
| Domain | https://security.microsoft.com/domains/overview?urlDomain=<domain>&tid=<tenant_id> | [contoso.com](https://security.microsoft.com/domains/overview?urlDomain=contoso.com&tid=<tenant_id>) |
| URL | https://security.microsoft.com/url/overview?url=<url-encoded-URL>&tid=<tenant_id> | [example.com/path](https://security.microsoft.com/url/overview?url=http%3A%2F%2Fexample.com%2Fpath&tid=<tenant_id>) |
| IP | https://security.microsoft.com/ip/<IP>/overview?tid=<tenant_id> | [<IP>](https://security.microsoft.com/ip/<IP>/overview?tid=<tenant_id>) |
| File Hash | https://security.microsoft.com/file/<SHA1-or-SHA256>/?tid=<tenant_id> | [da5e459...b1bb1e](https://security.microsoft.com/file/da5e45915354850261cf0e87dc7af19597b1bb1e/?tid=<tenant_id>) |
| Device | https://security.microsoft.com/machines/v2/<MDE_DeviceId>?tid=<tenant_id> | [<DeviceName>](https://security.microsoft.com/machines/v2/<MDE_DeviceId>?tid=<tenant_id>) |
| SPN / Non-Human Identity | https://security.microsoft.com/identity-inventory?tab=NonHumanIdentities&tid=<tenant_id> | [Non-Human Identities Inventory](https://security.microsoft.com/identity-inventory?tab=NonHumanIdentities&tid=<tenant_id>) |

User fallbacks: ?upn=<UPN> when ObjectId is unavailable; ?sid=<SID>&accountName=<Name>&accountDomain=<Domain> for on-prem AD.

Device ID source: DeviceId from the DeviceInfo AH table or the id field from GetDefenderMachine API. This is the MDE machine identifier — NOT the Entra Device Object ID (which is different). The computer-investigation skill retrieves this in Step 1b.
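The URL patterns in the table can be assembled mechanically. The Python sketch below covers the domain, URL, IP, file hash, and device rows; the user row is left out because it has its own fallback rules. `portal_link` and its arguments are hypothetical names, and the tenant ID is assumed to be known to the caller.

```python
from urllib.parse import quote

BASE = "https://security.microsoft.com"


def portal_link(entity_type, value, tenant_id, label=None):
    """Return a markdown link whose text is the entity and whose target is the portal page."""
    if entity_type == "domain":
        url = f"{BASE}/domains/overview?urlDomain={value}&tid={tenant_id}"
    elif entity_type == "url":
        # The URL pattern expects the full URL percent-encoded, including ':' and '/'.
        url = f"{BASE}/url/overview?url={quote(value, safe='')}&tid={tenant_id}"
    elif entity_type == "ip":
        url = f"{BASE}/ip/{value}/overview?tid={tenant_id}"
    elif entity_type == "file":
        url = f"{BASE}/file/{value}/?tid={tenant_id}"
    elif entity_type == "device":
        # value must be the MDE DeviceId, not the Entra Device Object ID.
        url = f"{BASE}/machines/v2/{value}?tid={tenant_id}"
    else:
        raise ValueError(f"unsupported entity type: {entity_type!r}")
    return f"[{label or value}]({url})"
```

For a device, pass the hostname as `label` and the MDE DeviceId as `value`, so that the entity name is the link text.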

🔴 Portal URL Allowlist — No Invented Paths. The 7 patterns above plus /v2/advanced-hunting?tid=<tenant_id> are the ONLY security.microsoft.com URLs you may emit. For any other action (Custom Indicators, Safe Links policy, Email Explorer, CA policy editor, Secure Score, etc.), write a textual breadcrumb — e.g., "Defender XDR → Settings → Endpoints → Indicators → URLs/Domains → Add item". Never guess a path from memory.
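One way to enforce the allowlist before emitting output is a path check like the Python sketch below. The regular expressions are an assumed encoding of the approved patterns plus /v2/advanced-hunting, and `is_allowed_portal_url` is a hypothetical helper, not part of the skill.

```python
import re
from urllib.parse import urlsplit

# One expression per approved path; query strings are not inspected.
_ALLOWED_PATHS = [
    re.compile(p)
    for p in (
        r"/user",
        r"/domains/overview",
        r"/url/overview",
        r"/ip/[^/]+/overview",
        r"/file/[0-9A-Fa-f]+/?",
        r"/machines/v2/[^/]+",
        r"/identity-inventory",
        r"/v2/advanced-hunting",
    )
]


def is_allowed_portal_url(url):
    """True only for https://security.microsoft.com URLs on an approved path."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.netloc != "security.microsoft.com":
        return False
    return any(p.fullmatch(parts.path) for p in _ALLOWED_PATHS)
```

A URL that fails the check would be replaced with a textual breadcrumb rather than corrected by guessing.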

Entity Display — Portal Link vs Defang (Mutually Exclusive)

| Context | Treatment | Example |
|---|---|---|
| Action / Take Action / recommendation tables | Wrap entity name in portal link (from table above). Never defang. | [evil.com](https://security.microsoft.com/domains/overview?urlDomain=evil.com&tid=<tenant_id>) |
| Data / results tables (raw query output) | Defang entity as plain text. Never portal-link. | hxxps://evil[.]com/path |

Defang rules: http:// → hxxp://, https:// → hxxps://, . in domain → [.]. VS Code auto-linkifies anything URL-shaped, which is why defanging is required in data tables. Conversely, a portal-linked entity has the portal URL as the link target, so linkification is safe; defanging would just break the link.
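A minimal Python sketch of the defang rules follows. The function name `defang` is hypothetical, and treating bare IP addresses like domains (every dot bracketed) is an assumption beyond the stated rules. Only the host part is rewritten; dots in the path are left alone.

```python
def defang(value):
    """Defang a URL, domain, or IP for display in data tables."""
    scheme, sep, rest = value.partition("://")
    if sep:
        # http -> hxxp, https -> hxxps; other schemes are kept as-is.
        scheme = {"http": "hxxp", "https": "hxxps"}.get(scheme.lower(), scheme)
        prefix = f"{scheme}://"
    else:
        prefix, rest = "", value
    host, slash, tail = rest.partition("/")
    return f"{prefix}{host.replace('.', '[.]')}{slash}{tail}"
```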

Rules Summary

| Rule | Status |
|---|---|
| Every 🎬 Take Action heading immediately followed by the AI-content warning blockquote | REQUIRED |
| Single entity → direct portal link (never an AH query) | REQUIRED |
| 2+ entities → AH query with Take Actions, all required columns present, no project on emails | REQUIRED |
| Every AH query includes BOTH a ```kql code block AND a plain [Run in Advanced Hunting](https://security.microsoft.com/v2/advanced-hunting?tid=<tenant_id>) link below it | REQUIRED |
| Action tables: entity = clickable portal link (from the 7 approved patterns). No separate "Portal" column, no defanging. | REQUIRED |
| Data tables: entity = defanged plain text. No portal linking. | REQUIRED |
| Textual breadcrumb ("Defender XDR → …") when no approved portal URL pattern covers the action | REQUIRED |
| Emitting any security.microsoft.com URL outside the 7 approved patterns + /v2/advanced-hunting | PROHIBITED |
| Generating gzip/base64-encoded AH deep links via kql_to_ah_url.py for output | PROHIBITED |
| Non-✅ drill-down surfaces actionable entities but no Take Action block | PROHIBITED |

Sample KQL Queries

All queries below are verified against live Sentinel/Defender XDR schemas. Use them exactly as written. Lookback periods use ago(Nd) — substitute the user's preferred lookback where noted.

Query 1: Open Incidents with Severity-Ranked Backfill & MITRE Techniques

🔴 Incident hygiene — Surfaces unresolved incidents prioritized by severity (Critical → High → Medium → Low), with age, owner, alert count, MITRE tactics, MITRE technique IDs, and extracted entity names (accounts + devices) for cross-query correlation. In large environments, all 10 slots fill with High/Critical. In smaller environments, Medium/Low backfill remaining slots automatically.

Tool: RunAdvancedHuntingQuery

let OpenIncidents = SecurityIncident
| where TimeGenerated > ago(30d)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| where Status in ("New", "Active");
let TotalHighCritical = toscalar(OpenIncidents | where Severity in ("High", "Critical") | count);
let TotalAll = toscalar(OpenIncidents | count);
OpenIncidents
| extend SevRank = case(Severity == "Critical", 0, Severity == "High", 1, Severity == "Medium", 2, Severity == "Low", 3, 4)
| extend ParsedLabels = parse_json(Labels)
| mv-apply Label = ParsedLabels on (
    summarize Tags = make_set(tostring(Label.labelName), 5)
)
| extend Tags = set_difference(Tags, dynamic([""]))
| mv-expand AlertId = AlertIds | extend AlertId = tostring(AlertId)
| join kind=leftouter (
    SecurityAlert
    | where TimeGenerated > ago(30d)
    | summarize arg_max(TimeGenerated, Entities, Tactics, Techniques, AlertName, AlertSeverity) by SystemAlertId
    | extend ParsedEntities = parse_json(Entities)
    | mv-expand Entity = ParsedEntities
    | extend EntityType = tostring(Entity.Type),
        AccountUPN = case(
            tostring(Entity.Type) == "account" and isnotempty(tostring(Entity.UPNSuffix)),
            tolower(strcat(tostring(Entity.Name), "@", tostring(Entity.UPNSuffix))),
            tostring(Entity.Type) == "account" and isnotempty(tostring(Entity.AadUserId)),
            tostring(Entity.AadUserId),
            ""),
        HostName = iff(tostring(Entity.Type) == "host", tolower(tostring(Entity.HostName)), "")
    | project SystemAlertId, Tactics, Techniques, AlertName, AlertSeverity, AccountUPN, HostName
) on $left.AlertId == $right.SystemAlertId
| mv-expand Technique = parse_json(Techniques)
| extend Technique = tostring(Technique)
| extend TacticsSplit = split(Tactics, ", ")
| mv-expand Tactic = TacticsSplit
| extend Tactic = tostring(Tactic)
| summarize 
    Tactics = make_set(Tactic),
    Techniques = make_set(Technique),
    AlertNames = make_set(AlertName, 5),
    AlertCount = dcount(AlertId),
    Accounts = make_set(AccountUPN, 5),
    Devices = make_set(HostName, 5),
    Tags = take_any(Tags)
    by ProviderIncidentId, Title, Severity, SevRank, Status, CreatedTime,
       OwnerUPN = tostring(Owner.userPrincipalName)
| extend Techniques = set_difference(Techniques, dynamic([""]))
| extend Tactics = set_difference(Tactics, dynamic([""]))
| extend Accounts = set_difference(Accounts, dynamic([""]))
| extend Devices = set_difference(Devices, dynamic([""]))
| extend AgeDisplay = case(
    datetime_diff('minute', now(), CreatedTime) < 60, strcat(datetime_diff('minute', now(), CreatedTime), "m ago"),
    datetime_diff('hour', now(), CreatedTime) < 24, strcat(datetime_diff('hour', now(), CreatedTime), "h ago"),
    strcat(datetime_diff('day', now(), CreatedTime), "d ago"))
| extend PortalUrl = strcat("https://security.microsoft.com/incidents/", ProviderIncidentId, "?tid=<TENANT_ID>")
| extend TotalHighCritical = TotalHighCritical, TotalAll = TotalAll
| project TotalHighCritical, TotalAll, ProviderIncidentId, Title, Severity, SevRank, AgeDisplay, AlertCount, 
    OwnerUPN, Tactics, Techniques, Accounts, Devices, Tags, PortalUrl, AlertNames, CreatedTime
// --- Deduplicate by Title: keep one representative incident per title for variety ---
| as AllOpenIncidents
| join kind=leftouter (
    AllOpenIncidents | summarize TitleDupCount = count() by Title
) on Title
| project-away Title1
| order by Title asc, SevRank asc, bin(CreatedTime, 1d) desc, AlertCount desc
| extend _rn = row_number(1, prev(Title) != Title)
| where _rn == 1
| project-away _rn
| order by SevRank asc, bin(CreatedTime, 1d) desc, AlertCount desc
| take 10

Purpose: Top 10 open incidents with severity-ranked backfill (Critical→High→Medium→Low). In large envs, all slots fill with High/Critical; small envs backfill with Medium/Low. TotalHighCritical and TotalAll drive the adaptive report header ("Showing 10 of {TotalAll} open incidents ({TotalHighCritical} High/Critical)") and are computed across all open incidents pre-dedup, so header counts stay accurate. The list is deduplicated by Title so the top 10 shows distinct incident types rather than near-identical rows — in noisy envs a single recurring title (password-spray, DLP rule) can otherwise monopolize all 10 slots; the single highest-priority representative per title is kept (severity → newest day → alert count) and TitleDupCount preserves the volume signal. Joins SecurityAlert for MITRE tactics/techniques and extracts Accounts (UPN or AAD ObjectId, lowercased), Devices (hostname, lowercased), and Tags (from Labels — both AutoAssigned ML classifications and User-applied SOC tags) — each capped at 5 per incident — for cross-query correlation with Q3/Q4/Q6/Q7/Q12. Flags unassigned incidents (empty OwnerUPN).

Sort: SevRank asc, bin(CreatedTime, 1d) desc, AlertCount desc — severity tier first, then calendar day (newest first), then complexity within each day.
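The title deduplication and final sort in Query 1 can be mirrored in a few lines of Python, which may help when reasoning about which incident survives per title. This is a sketch over plain dictionaries with assumed keys (Title, SevRank, CreatedTime, AlertCount) and naive UTC datetimes; it is not a replacement for the KQL, and tie-breaking between equal rows may differ.

```python
from datetime import datetime


def top_incidents(incidents, limit=10):
    """Keep the best incident per Title, then rank by severity, day, and alert count."""

    def priority(inc):
        day = inc["CreatedTime"].date()  # stands in for bin(CreatedTime, 1d)
        return (inc["SevRank"], -day.toordinal(), -inc["AlertCount"])

    counts = {}
    for inc in incidents:
        counts[inc["Title"]] = counts.get(inc["Title"], 0) + 1

    best = {}
    for inc in sorted(incidents, key=priority):
        # The first row seen per title is its highest-priority representative.
        best.setdefault(inc["Title"], inc)

    # dict order follows the sorted order, so values are already ranked.
    ranked = [dict(inc, TitleDupCount=counts[inc["Title"]]) for inc in best.values()]
    return ranked[:limit]
```

TitleDupCount carries the volume signal for titles that were collapsed to a single row.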

Verdict logic:

  • 🔴 Escalate: 5+ new High/Critical in 24h, OR any incident with AlertCount > 50, OR any unassigned High/Critical with CredentialAccess/LateralMovement tactics
  • 🟠 Investigate: Any unassigned High/Critical, OR AlertCount > 10, OR multiple High/Critical in <6h
  • 🟡 Monitor: Only Medium/Low incidents exist (no High/Critical), OR High/Critical assigned with low alert count
  • ✅ Clear: 0 open incidents of any severity (Q2 closed summary still renders as context)

Query 2: Closed Incident Summary (7-Day Lookback)

🔴 Threat landscape context — Even when all incidents are resolved, the classification breakdown, MITRE tactic distribution, and severity mix from recent closures provide actionable signals for cross-correlation and query file recommendations.

Tool: RunAdvancedHuntingQuery

Always runs in parallel with Q1 — not conditional on Q1 results.

SecurityIncident
| where CreatedTime > ago(7d)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| where Status == "Closed"
| where array_length(AlertIds) > 0
| mv-expand AlertId = AlertIds | extend AlertId = tostring(AlertId)
| join kind=leftouter (
    SecurityAlert
    | where TimeGenerated > ago(30d)
    | summarize arg_max(TimeGenerated, Tactics, Techniques) by SystemAlertId
    | project SystemAlertId, Tactics, Techniques
) on $left.AlertId == $right.SystemAlertId
| mv-expand Technique = parse_json(Techniques)
| extend Technique = tostring(Technique)
| extend TacticsSplit = split(Tactics, ", ")
| mv-expand Tactic = TacticsSplit
| extend Tactic = tostring(Tactic)
| summarize
    Total = dcount(IncidentNumber),
    TruePositive = dcountif(IncidentNumber, Classification == "TruePositive"),
    BenignPositive = dcountif(IncidentNumber, Classification == "BenignPositive"),
    FalsePositive = dcountif(IncidentNumber, Classification == "FalsePositive"),
    Undetermined = dcountif(IncidentNumber, Classification == "Undetermined"),
    HighCritical = dcountif(IncidentNumber, Severity in ("High", "Critical")),
    MediumLow = dcountif(IncidentNumber, Severity in ("Medium", "Low")),
    Tactics = make_set(Tactic),
    Techniques = make_set(Technique)
| extend Techniques = set_difference(Techniques, dynamic([""]))
| extend Tactics = set_difference(Tactics, dynamic([""]))

Purpose: Provides a 7-day closed incident summary with classification breakdown (TP/BP/FP/Undetermined), severity distribution, aggregated MITRE tactics, and aggregated MITRE technique IDs. Uses CreatedTime (not TimeGenerated) to match portal "created in last 7 days" semantics — TimeGenerated captures any incident updated in the window, inflating counts with old incidents. Filters array_length(AlertIds) > 0 to exclude phantom incidents — the SecurityIncident table contains hundreds of records synced from XDR with empty AlertIds that never surface in the Defender XDR portal queue (see copilot-instructions.md Known Table Pitfalls). This data feeds three downstream uses:

  1. TP rate signal — High TruePositive ratio indicates an active threat environment
  2. MITRE tactic context — Tactics from closed TPs identify the current threat landscape for cross-correlation with Q3/Q7/Q8 findings
  3. Manifest MITRE matching — The Techniques array contains ATT&CK technique IDs (e.g., T1566, T1078, T1059) directly matchable against manifest entry mitre fields. No tactic→technique mapping needed — the technique IDs are the primary matching key for query file recommendations

Verdict logic:

  • 🟠 Investigate: TruePositive / Total > 0.5 (majority of closures are real threats — active threat environment)
  • 🟡 Monitor: Any TruePositive closures exist, or Undetermined > 0 (some incidents lack classification)
  • ✅ Clear: 0 TruePositive closures; all closures are BenignPositive or FalsePositive
  • 🔵 Informational: 0 closed incidents in 7d
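The Query 2 verdict rules can be expressed as a short function. The Python sketch below assumes the rules are evaluated in the order shown, which the list above does not state explicitly, and `q2_verdict` is a hypothetical name; the arguments correspond to the Total, TruePositive, and Undetermined columns.

```python
def q2_verdict(total, true_positive, undetermined):
    """Map the Query 2 summary row to a verdict label."""
    if total == 0:
        return "🔵 Informational"
    if true_positive / total > 0.5:
        # Majority of closures were real threats.
        return "🟠 Investigate"
    if true_positive > 0 or undetermined > 0:
        return "🟡 Monitor"
    return "✅ Clear"
```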

Rendering rules:

  • Always render Q2 results in the report, regardless of Q1 verdict
  • In the Dashboard Summary, Q2 gets its own row. In Detailed Findings, render Q2 immediately after Q1 as a compact summary block
  • Flatten the Tactics and Techniques arrays and report distinct values from TruePositive incidents
  • The Techniques array feeds directly into the Query File Recommendations manifest MITRE matching (no tactic→technique translation needed)
  • If 0 closed incidents in 7d, display: "No incidents closed in the last 7 days"

Query 3: Identity Risk Posture & Risk Event Enrichment

🔐 Identity risk posture — Hybrid two-signal query: IdentityInfo.RiskScore (Defender XDR composite, 0-100) captures alert-chain and MITRE-stage risk, while RiskLevel/RiskStatus (Identity Protection) captures sign-in anomalies and AI-driven signals. Uses both because they are independent engines — a user can have RiskScore=93 with Remediated IdP status, or RiskScore=0 with High/AtRisk IdP status. AADUserRiskEvents enriches with the specific detections explaining why they're flagged.

Tool: RunAdvancedHuntingQuery

let lookback = 7d;
// Layer 1: IdentityInfo — hybrid filter (Defender RiskScore + IdP RiskLevel/Status + Criticality)
let IdentityPosture = IdentityInfo
| where Timestamp > ago(lookback)
| summarize arg_max(Timestamp, *) by AccountUpn
| where RiskScore >= 71
    or RiskLevel in ("High", "Medium")
    or RiskStatus in ("AtRisk", "ConfirmedCompromised")
    or CriticalityLevel >= 3;
// Layer 2: AADUserRiskEvents — enrichment (the why)
let UserRiskEvents = AADUserRiskEvents
| where TimeGenerated > ago(lookback)
| extend Country = tostring(parse_json(Location).countryOrRegion)
| summarize
    RiskDetections = count(),
    HighCount = countif(RiskLevel == "high"),
    TopRiskEventTypes = make_set(RiskEventType, 8),
    TopCountries = make_set(Country, 5),
    LatestDetection = max(TimeGenerated)
    by UserPrincipalName;
// IdentityInfo drives, AADUserRiskEvents enriches
IdentityPosture
| join hint.strategy=broadcast kind=leftouter (UserRiskEvents) 
    on $left.AccountUpn == $right.UserPrincipalName
| extend 
    DisplayName = coalesce(AccountDisplayName, AccountName, AccountUpn),
    PortalUrl = strcat("https://security.microsoft.com/user?",
        case(
            isnotempty(AccountObjectId), strcat("aad=", AccountObjectId, "&upn=", AccountUpn),
            isnotempty(OnPremSid), strcat("sid=", OnPremSid, "&accountName=", AccountName,
                                         "&accountDomain=", AccountDomain),
            isnotempty(AccountUpn), strcat("upn=", AccountUpn),
            ""),
        "&tab=overview&tid=<TENANT_ID>")
| project DisplayName, PortalUrl, RiskScore, RiskLevel, RiskStatus, CriticalityLevel,
    RiskDetections = coalesce(RiskDetections, long(0)),
    HighCount = coalesce(HighCount, long(0)),
    TopRiskEventTypes, TopCountries, LatestDetection
| order by RiskScore desc, HighCount desc, RiskDetections desc, CriticalityLevel desc
| take 15

Purpose: RiskScore (int, 0-100) is the Defender XDR composite score on IdentityInfo — factors include alert chains, MITRE stage progression, and asset criticality. Portal thresholds: 71-90 = High, 91-100 = Critical. RiskLevel/RiskStatus are Identity Protection signals (sign-in anomalies, leaked creds, AI signals) — a separate engine that doesn't always agree with RiskScore. The hybrid OR filter ensures users flagged by either engine surface. Users with both signals firing are highest priority (corroborated).

Output columns: DisplayName (linked to Defender XDR Identity page via PortalUrl), RiskScore (0-100, primary sort), RiskLevel, RiskStatus, CriticalityLevel, RiskDetections (count), HighCount, TopRiskEventTypes, TopCountries, LatestDetection.

Portal URL resolution: Three-tier fallback for identity environment coverage:

  • Cloud/Hybrid (has Entra ObjectId): aad=<ObjectId>&upn=<UPN>
  • On-prem AD (SID only, no Entra sync): sid=<SID>&accountName=<Name>&accountDomain=<Domain>
  • External IdP (UPN only, e.g., CyberArk/Okta): upn=<UPN>
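The three-tier fallback mirrors the case() expression in the query. A Python sketch of the same resolution order is shown below; `user_portal_url` and its parameter names are hypothetical. Unlike the KQL, which still emits a URL when no identifier is present, this sketch returns None in that case.

```python
def user_portal_url(tenant_id, upn="", object_id="", sid="",
                    account_name="", account_domain=""):
    """Resolve the Defender XDR user page URL using the three-tier fallback."""
    if object_id:
        # Cloud/Hybrid: Entra ObjectId available.
        query = f"aad={object_id}&upn={upn}"
    elif sid:
        # On-prem AD: SID only, no Entra sync.
        query = f"sid={sid}&accountName={account_name}&accountDomain={account_domain}"
    elif upn:
        # External IdP: UPN only.
        query = f"upn={upn}"
    else:
        return None
    return f"https://security.microsoft.com/user?{query}&tab=overview&tid={tenant_id}"
```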

Report rendering: Show top 10 users in the dashboard table. Use DisplayName as clickable link text with PortalUrl as the target. If >10 results, note "+N more — drill down with user-investigation skill". For each user, render RiskScore and TopRiskEventTypes as the key risk indicators.

Verdict logic:

  • 🔴 Escalate: Any user with RiskScore >= 91, or ConfirmedCompromised status, or HighCount > 3, or multiple users with HighCount > 0
  • 🟠 Investigate: RiskScore >= 71, or HighCount > 0 for any user, or any user AtRisk with risk events indicating aiCompoundAccountRisk, impossibleTravel, or maliciousIPAddress
  • 🟡 Monitor: Only Medium risk users with low-severity risk event types (e.g., unfamiliarFeatures)
  • ✅ Clear: 0 users matching the hybrid filter
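The Query 3 verdict rules can be sketched in Python as below. Several points are assumptions rather than documented behavior: rows are dictionaries keyed by the query's output columns, missing or null values count as zero or empty, and any non-empty result that meets neither the Escalate nor the Investigate rules falls through to Monitor. `q3_verdict` is a hypothetical name.

```python
INVESTIGATE_EVENTS = {"aiCompoundAccountRisk", "impossibleTravel", "maliciousIPAddress"}


def q3_verdict(users):
    """Map the Query 3 result rows to a verdict label."""
    if not users:
        return "✅ Clear"

    def score(u):
        return u.get("RiskScore") or 0

    def high(u):
        return u.get("HighCount") or 0

    def events(u):
        return set(u.get("TopRiskEventTypes") or [])

    users_with_high = sum(1 for u in users if high(u) > 0)
    if users_with_high > 1 or any(
        score(u) >= 91 or u.get("RiskStatus") == "ConfirmedCompromised" or high(u) > 3
        for u in users
    ):
        return "🔴 Escalate"
    if any(
        score(u) >= 71
        or high(u) > 0
        or (u.get("RiskStatus") == "AtRisk" and events(u) & INVESTIGATE_EVENTS)
        for u in users
    ):
        return "🟠 Investigate"
    return "🟡 Monitor"
```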

⚠️ Risk Event Type Routing Guard (Phase 4 drill-down):

  • suspiciousAuthAppApproval → T1621 MFA Fatigue (suspicious Authenticator push approval patterns), NOT OAuth app consent. Route to user-investigation or authentication-tracing. NEVER recommend app-registration-posture based on this risk event alone
  • mcasSuspiciousInboxManipulationRules → T1114.003 email exfiltration via inbox rules. Route to user-investigation with OfficeActivity drill-down

Query 4: Password Spray / Brute-Force Detection

🔐 Auth spray detection (T1110.003 / T1110.001) — Identifies IPs targeting multiple users with failed auth across Entra ID cloud sign-ins AND RDP/SSH/network logons on endpoints.

Tool: RunAdvancedHuntingQuery

// Step 1: Count spray-specific failures per IP (materialized — referenced twice)
let SprayFailures = materialize(EntraIdSignInEvents
| where Timestamp > ago(7d)
| where ErrorCode in (50126, 50053, 50057)
| summarize
    FailedAttempts = count(),
    TargetUsers = dcount(AccountUpn),
    SampleTargets = make_set(AccountUpn, 5),
    FailedApps = make_set(Application, 3),
    Countries = make_set(Country, 3)
    by SourceIP = IPAddress
| where TargetUsers >= 5);
// Step 2: Get full traffic profile for flagged IPs (success context)
let IPTrafficProfile = EntraIdSignInEvents
| where Timestamp > ago(7d)
| where IPAddress in ((SprayFailures | project SourceIP))
| summarize
    TotalSignIns = count(),
    Successes = countif(ErrorCode == 0),
    TotalDistinctUsers = dcount(AccountUpn),
    TotalDistinctApps = dcount(Application)
    by SourceIP = IPAddress;
// Step 3: Join and filter — eliminate shared infrastructure false positives
let EntraResults = SprayFailures
| join kind=inner IPTrafficProfile on SourceIP
| extend 
    SprayRatio = round(FailedAttempts * 100.0 / max_of(TotalSignIns, 1), 1),
    SuccessRate = round(Successes * 100.0 / max_of(TotalSignIns, 1), 1)
| where SprayRatio >= 1.0 and TotalDistinctApps < 50
| extend Surface = "Entra ID"
| project SourceIP, FailedAttempts, TargetUsers, SampleTargets, 
    Protocols = FailedApps, Countries, Surface,
    TotalSignIns, Successes, SprayRatio, SuccessRate, TotalDistinctApps;
// Endpoint brute-force — Surface label by LogonType
let EndpointBrute = DeviceLogonEvents
| where Timestamp > ago(7d)
| where ActionType == "LogonFailed"
| where LogonType in ("RemoteInteractive", "Network")
| where isnotempty(RemoteIP)
| summarize
    FailedAttempts = count(),
    TargetUsers = dcount(AccountName),
    SampleTargets = make_set(AccountName, 5),
    Protocols = make_set(strcat(LogonType, " → ", DeviceName), 3),
    Countries = dynamic(["—"]),
    LogonTypes = make_set(LogonType)
    by SourceIP = RemoteIP
| where FailedAttempts >= 10
| extend Surface = iff(array_length(LogonTypes) == 1 and LogonTypes[0] == "RemoteInteractive", "Endpoint (RDP)", "Endpoint (Network Logon)"),
    TotalSignIns = FailedAttempts, Successes = long(0), 
    SprayRatio = 100.0, SuccessRate = 0.0, TotalDistinctApps = long(0)
| project-away LogonTypes;
union EntraResults, EndpointBrute
| order by SprayRatio desc, TargetUsers desc, FailedAttempts desc
| take 15

Purpose: Detects password spray (1 IP → many users, MITRE T1110.003) and brute-force (1 IP → high failure count, T1110.001) across two surfaces, with shared infrastructure false-positive filtering:

  • Entra ID: Uses EntraIdSignInEvents (Advanced Hunting) which merges interactive + non-interactive sign-ins into a single table. Error codes: 50126=bad password, 50053=locked account, 50057=disabled account. The query enriches failure data with the IP's full traffic profile to compute SprayRatio (spray failures ÷ total sign-ins) and TotalDistinctApps. Two filters eliminate corporate proxies, VPN concentrators, and Azure gateways:
    • SprayRatio >= 1.0 — spray failures must be ≥1% of the IP's total sign-in volume. A proxy with 500K sign-ins and 77 spray errors → ≈0.015% (rounds to 0.0) → filtered. A pure attacker with 77 failures and 0 successes → 100% → kept.
    • TotalDistinctApps < 50 — IPs serving 50+ distinct applications are shared infrastructure. Real spray targets 1–3 apps.
  • Endpoint: RDP (RemoteInteractive) and Network Logon (Network) failed logons on MDE-enrolled devices. Surface labels: Endpoint (RDP) for pure RemoteInteractive, Endpoint (Network Logon) for anything involving Network logon type. NLA caveat: RDP with Network Level Authentication generates LogonType == "Network" (not RemoteInteractive), so Endpoint (Network Logon) may be RDP-via-NLA or SMB — the manifest surfaces both rdp_threat_detection.md and smb_threat_detection.md for drill-down. Threshold of ≥10 failures. No success context available in DeviceLogonEvents for filtering.
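
The two Entra ID filters described above can be checked by hand. The sketch below mirrors the `SprayRatio` expression and the `TotalDistinctApps < 50` test from the query; the function name and the app counts in the examples are made up for illustration.

```python
def keep_as_spray(failed_attempts, total_sign_ins, total_distinct_apps):
    """Mirror the Entra ID filter: SprayRatio >= 1.0 and TotalDistinctApps < 50."""
    spray_ratio = round(failed_attempts * 100.0 / max(total_sign_ins, 1), 1)
    return spray_ratio, spray_ratio >= 1.0 and total_distinct_apps < 50


print(keep_as_spray(77, 500_000, 120))  # busy proxy: (0.0, False)
print(keep_as_spray(77, 77, 2))         # failures only: (100.0, True)
```

Either filter alone would drop the proxy in this example; the ratio test handles high-volume infrastructure, and the app-count test catches shared IPs whose failure share happens to be high.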

Output columns: SourceIP, FailedAttempts, TargetUsers, SampleTargets, Protocols, Countries, Surface, TotalSignIns, Successes, SprayRatio, SuccessRate, TotalDistinctApps. The SprayRatio and TotalDistinctApps columns provide immediate false-positive triage context.

Verdict logic:

  • 🔴 Escalate: Any IP targeting >25 Entra users OR >100 endpoint failures from a single IP
  • 🟠 Investigate: Any spray/brute-force pattern detected (meets thresholds)
  • 🟡 Monitor: Spray activity detected but below thresholds (e.g., single IP with 3–4 target users, or <10 endpoint failures)
  • ✅ Clear: 0 results — no spray/brute-force patterns detected

Drill-down: Use user-investigation skill for targeted users, ioc-investigation for source IPs.


Query 5: SPN Behavioral Drift (90d Baseline vs 7d Recent)

🤖 Automation monitoring — Composite drift score across 5 dimensions for service principals, with IPv6 subnet normalization and IPDrift cap.

Tool: mcp_sentinel-data_query_lake (needs >30d lookback)

let BL_Start = ago(97d); let BL_End = ago(7d);
let RC_Start = ago(7d); let RC_End = now();
let BL = AADServicePrincipalSignInLogs
| where TimeGenerated between (BL_Start .. BL_End)
| extend NormalizedIP = case(
    IPAddress has ":", strcat_array(array_slice(split(IPAddress, ":"), 0, 3), ":"),
    IPAddress)
| summarize 
    BL_Vol = count(),
    BL_Res = dcount(ResourceDisplayName),
    BL_IPs = dcount(NormalizedIP),
    BL_Loc = dcount(Location),
    BL_Fail = dcountif(ResultType, ResultType != "0" and ResultType != 0)
    by ServicePrincipalId, ServicePrincipalName;
let RC = AADServicePrincipalSignInLogs
| where TimeGenerated between (RC_Start .. RC_End)
| extend NormalizedIP = case(
    IPAddress has ":", strcat_array(array_slice(split(IPAddress, ":"), 0, 3), ":"),
    IPAddress)
| summarize 
    RC_Vol = count(),
    RC_Res = dcount(ResourceDisplayName),
    RC_IPs = dcount(NormalizedIP),
    RC_Loc = dcount(Location),
    RC_Fail = dcountif(ResultType, ResultType != "0" and ResultType != 0)
    by ServicePrincipalId, ServicePrincipalName;
RC | join kind=inner BL on ServicePrincipalId
| extend 
    VolDrift = round(RC_Vol * 100.0 / max_of(BL_Vol, 10), 0),
    ResDrift = round(RC_Res * 100.0 / max_of(BL_Res, 3), 0),
    IPDriftRaw = round(RC_IPs * 100.0 / max_of(BL_IPs, 3), 0),
    IPDrift = min_of(round(RC_IPs * 100.0 / max_of(BL_IPs, 3), 0), 300),
    LocDrift = round(RC_Loc * 100.0 / max_of(BL_Loc, 2), 0),
    FailDrift = round(RC_Fail * 100.0 / max_of(BL_Fail, 5), 0)
| extend DriftScore = round((VolDrift*0.20 + ResDrift*0.25 + IPDrift*0.25 + LocDrift*0.15 + FailDrift*0.15), 0)
| where DriftScore > 120
| order by DriftScore desc
| take 10

Purpose: Identifies service principals with significant behavioral changes from their 90-day baseline.

Tuning notes:

  • IPv6 /64 normalization: IPv6 addresses are collapsed to their /64 prefix before counting. Azure PaaS services (Copilot Studio, Playbook Automation) rotate through dozens of fd00: ULA pod addresses within the same cluster — without normalization, each pod IP inflates IPDrift by hundreds of percent.
  • IPDrift cap (300%): IPDriftRaw shows the true ratio; IPDrift is capped at 300 so that IP-only spikes cannot dominate the composite score. Genuine expansion on IPv4-only SPNs stays visible through IPDriftRaw.
  • Weights: Volume 20%, Resources 25%, IPs 25%, Locations 15%, Failures 15% (FailDrift compares the count of distinct non-zero ResultType values, not a failure rate).
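
To see why the /64 normalization matters, here is a small text-based sketch of the same operation the query performs with `split` and `array_slice` (whose end index is inclusive, so indexes 0 to 3 keep four hextets). The function name and the sample pod addresses are illustrative only.

```python
def normalize_ip(ip):
    """Collapse an IPv6 address to its first four hextets; leave IPv4 unchanged."""
    if ":" in ip:
        # KQL array_slice(..., 0, 3) is end-inclusive, so it keeps 4 groups.
        return ":".join(ip.split(":")[:4])
    return ip


pods = ["fd00:0:0:1:aaaa:0:0:1", "fd00:0:0:1:bbbb:0:0:2", "fd00:0:0:1:cccc:0:0:3"]
print(len(set(pods)), len({normalize_ip(p) for p in pods}))  # 3 1
```

Like the query, this works on the textual form, so an address written with `::` compression is not expanded before slicing.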

Verdict logic:

  • 🔴 Escalate: Any SPN with DriftScore > 250 or IPDriftRaw > 400%
  • 🟠 Investigate: DriftScore > 150
  • 🟡 Monitor: DriftScore 120–150
  • ✅ Clear: No SPNs above threshold
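
A worked example may help show how the IPDrift cap and the IPDriftRaw escalation rule interact. The sketch below re-derives the composite score from the formula in the query purely for illustration (the query already returns these columns); the input counts are invented, and Python's `round` can differ from KQL's on exact .5 values.

```python
def ratio(recent, baseline, floor):
    """Recent count as a percentage of baseline, with a floor on the divisor."""
    return round(recent * 100.0 / max(baseline, floor))


def spn_drift(bl, rc):
    """bl / rc: dicts with Vol, Res, IPs, Loc, Fail counts for one service principal."""
    vol = ratio(rc["Vol"], bl["Vol"], 10)
    res = ratio(rc["Res"], bl["Res"], 3)
    ip_raw = ratio(rc["IPs"], bl["IPs"], 3)
    ip = min(ip_raw, 300)  # IPDrift cap
    loc = ratio(rc["Loc"], bl["Loc"], 2)
    fail = ratio(rc["Fail"], bl["Fail"], 5)
    score = round((vol * 20 + res * 25 + ip * 25 + loc * 15 + fail * 15) / 100)
    return score, ip_raw


def q5_verdict(score, ip_raw):
    if score > 250 or ip_raw > 400:
        return "Escalate"
    if score > 150:
        return "Investigate"
    if score > 120:
        return "Monitor"
    return "Clear"


baseline = {"Vol": 1000, "Res": 4, "IPs": 3, "Loc": 2, "Fail": 5}
recent = {"Vol": 1000, "Res": 4, "IPs": 30, "Loc": 2, "Fail": 5}
score, ip_raw = spn_drift(baseline, recent)
print(score, ip_raw, q5_verdict(score, ip_raw))  # 150 1000 Escalate
```

In this example the cap holds the composite score at 150, which alone would only reach Monitor, while the uncapped IPDriftRaw of 1000 still triggers Escalate.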

Drill-down: Use scope-drift-detection/spn skill for full investigation of flagged SPNs.


Query 6: Fleet-Wide Device Process Drift

💻 Endpoint behavioral baseline — Per-device drift scores computed in-query over a 7d window (6d baseline vs 1d recent), with infrastructure noise filtering and VolDrift cap to prevent automation-driven false positives.

Tool: RunAdvancedHuntingQuery

let uptime = DeviceInfo
| where Timestamp > ago(7d)
| extend IsRecent = Timestamp >= ago(1d)
| summarize
    BaselineHours = dcountif(bin(Timestamp, 1h), not(IsRecent)),
    RecentHours   = dcountif(bin(Timestamp, 1h), IsRecent)
    by DeviceName;
DeviceProcessEvents
| where Timestamp > ago(7d)
| where not(
    InitiatingProcessFileName in ("gc_worker", "gc_linux_service", "dsc_host")
    or (InitiatingProcessFileName == "dash" and InitiatingProcessParentFileName in ("gc_worker", "gc_linux_service"))
  )
| extend IsRecent = Timestamp >= ago(1d), DayBucket = bin(Timestamp, 1d)
| summarize
    BL_Events = countif(not(IsRecent)),
    RC_Events = countif(IsRecent),
    BL_Procs = dcountif(FileName, not(IsRecent)),
    RC_Procs = dcountif(FileName, IsRecent),
    BL_Accts = dcountif(AccountName, not(IsRecent)),
    RC_Accts = dcountif(AccountName, IsRecent),
    BL_Chains = dcountif(strcat(InitiatingProcessFileName, "→", FileName), not(IsRecent)),
    RC_Chains = dcountif(strcat(InitiatingProcessFileName, "→", FileName), IsRecent),
    BL_Comps = dcountif(ProcessVersionInfoCompanyName, not(IsRecent)),
    RC_Comps = dcountif(ProcessVersionInfoCompanyName, IsRecent),
    BaselineDays = dcountif(DayBucket, not(IsRecent))
    by DeviceName
| where RC_Events > 0 and BL_Events > 0 and BaselineDays >= 4
| join kind=inner uptime on DeviceName
| where BaselineHours >= 48 and RecentHours >= 4
| extend
    VolDriftRaw = round(RC_Events * 600.0 / max_of(BL_Events, 1), 0),
    VolDrift = min_of(round(RC_Events * 600.0 / max_of(BL_Events, 1), 0), 300),
    ProcDrift = round(RC_Procs * 100.0 / max_of(BL_Procs, 1), 0),
    AcctDrift = round(RC_Accts * 100.0 / max_of(BL_Accts, 1), 0),
    ChainDrift = round(RC_Chains * 100.0 / max_of(BL_Chains, 1), 0),
    CompDrift = round(RC_Comps * 100.0 / max_of(BL_Comps, 1), 0)
| extend DriftScore = round(VolDrift * 0.30 + ProcDrift * 0.25 + ChainDrift * 0.20 + AcctDrift * 0.15 + CompDrift * 0.10, 0)
| order by DriftScore desc
| take 10
| project DeviceName, DriftScore, BaselineDays, BaselineHours, RecentHours, VolDriftRaw, VolDrift, ProcDrift, AcctDrift, ChainDrift, CompDrift

Purpose: Returns the top 10 devices ranked by composite drift score, pre-computed in KQL. No LLM-side math required — just interpret the returned scores.

Tuning notes:

  • GC filter: Excludes Azure Guest Configuration noise (Linux only; <1% impact on Windows).
  • Uptime + baseline-days gates: Filter intermittent endpoints whose offline baseline inflates VolDrift. Drop the DeviceInfo join if heartbeats aren't ingested.
  • VolDrift cap (300%): VolDriftRaw preserves the true ratio. High VolDriftRaw with ~100 diversity metrics = infrastructure noise; both elevated = high-confidence anomaly.
  • Weights: Volume 30%, Processes 25%, Chains 20%, Accounts 15%, Companies 10%.

Verdict logic: See Device Drift Score Interpretation in Post-Processing for the full scale, VolDrift cap context, and fleet-uniformity rule.


Query 7: Rare Process Chain Singletons

💻 Threat hunting — Parent→child process combinations appearing fewer than 3 times in 30 days.

Tool: RunAdvancedHuntingQuery

DeviceProcessEvents
| where Timestamp > ago(30d)
| summarize 
    Count = count(),
    UniqueDevices = dcount(DeviceName),
    SampleDevice = take_any(DeviceName),
    SampleUser = strcat(take_any(AccountDomain), "\\", take_any(AccountName)),
    SampleChildCmd = take_any(ProcessCommandLine),
    GrandparentProcess = take_any(InitiatingProcessParentFileName),
    LastSeen = max(Timestamp)
    by ParentProcess = InitiatingProcessFileName, ChildProcess = FileName
| where Count < 3
| order by Count asc, UniqueDevices asc
| take 20

Purpose: Surfaces the 20 rarest process chains — singletons and near-singletons within the 30-day AH window. Effective for spotting LOLBin abuse, malware execution, or novel attack tooling. Review SampleChildCmd for suspicious command-line patterns.

Verdict logic:

  • 🟠 Investigate: Any singleton with suspicious parent (cmd.exe, powershell.exe, wscript.exe, mshta.exe, rundll32.exe) or child running from temp/user profile directories
  • 🟡 Monitor: Rare chains from system/update processes (version-stamped binaries, Azure VM agents)
  • ✅ Clear: All rare chains are explainable infrastructure artifacts
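
One way to pre-sort Q7 rows before manual review is sketched below. The parent list comes from the Investigate rule above; the path fragments used to approximate "temp/user profile directories" are an assumption, as is matching them against SampleChildCmd (the query does not return a folder path). Rows that are not flagged still need an analyst to decide between Monitor and Clear.

```python
SUSPICIOUS_PARENTS = {"cmd.exe", "powershell.exe", "wscript.exe", "mshta.exe", "rundll32.exe"}
# Hypothetical path fragments standing in for "temp/user profile directories".
USER_WRITABLE_HINTS = ("\\temp\\", "\\appdata\\", "\\users\\")


def q7_row_flag(row):
    """row: dict with Count, ParentProcess and SampleChildCmd from a Q7 result row."""
    parent = row["ParentProcess"].lower()
    cmd = (row.get("SampleChildCmd") or "").lower()
    suspicious_parent = row["Count"] == 1 and parent in SUSPICIOUS_PARENTS
    user_writable_child = any(hint in cmd for hint in USER_WRITABLE_HINTS)
    return "Investigate" if suspicious_parent or user_writable_child else "Review"
```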

Query 8: Inbound Email Threat Snapshot

📧 Email posture — Single-row summary of inbound email volume, threat breakdown, and delivered threats.

Tool: RunAdvancedHuntingQuery

EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound"
| summarize
    TotalInbound = count(),
    Clean = countif(isempty(ThreatTypes)),
    Phish = countif(ThreatTypes has "Phish"),
    Malware = countif(ThreatTypes has "Malware"),
    Spam = countif(ThreatTypes has "Spam"),
    HighConfPhish = countif(ConfidenceLevel has "High" and ThreatTypes has "Phish"),
    Blocked = countif(DeliveryAction == "Blocked"),
    Delivered = countif(DeliveryAction == "Delivered"),
    PhishDelivered = countif(ThreatTypes has "Phish" and DeliveryAction == "Delivered"),
    DistinctSenders = dcount(SenderFromAddress),
    DistinctRecipients = dcount(RecipientEmailAddress)

Purpose: Instant C-level email posture briefing. The key escalation metric is PhishDelivered — phishing emails that bypassed all protections and reached mailboxes.

Verdict logic:

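  • 🔴 Escalate: PhishDelivered > 5, or any malware delivered (the Malware column counts all detections regardless of DeliveryAction, so confirm delivery before escalating on it)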
  • 🟠 Investigate: PhishDelivered > 0 (any phishing reached mailboxes)
  • 🟡 Monitor: Phishing detected but 100% blocked/junked
  • ✅ Clear: 0 phishing, 0 malware
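
The verdict rules above reduce to a few comparisons on the single summary row. In this sketch the row is assumed to be a dict keyed by the query's column names, and `malware_delivered` is a hypothetical extra input, since the query as written does not split malware detections by delivery action.

```python
def q8_verdict(row, malware_delivered=0):
    """row: the single Q8 summary row as a dict.

    malware_delivered is a hypothetical extra input: the query counts all
    malware detections in Malware but does not split them by delivery.
    """
    if row["PhishDelivered"] > 5 or malware_delivered > 0:
        return "Escalate"
    if row["PhishDelivered"] > 0:
        return "Investigate"
    if row["Phish"] > 0 or row["Malware"] > 0:
        return "Monitor"  # threats seen, none known to be delivered
    return "Clear"
```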

Drill-down: Use email-threat-posture skill for full email security analysis including ZAP, Safe Links, and authentication breakdown.


Query 9: Cloud App Suspicious Activity

🔑 Cloud ops monitoring — Detects mailbox rule manipulation, transport rule changes, mailbox delegation, MCAS-flagged compromised sign-ins, and human-initiated Conditional Access policy changes via CloudAppEvents. Focuses on rule/permission/CA mutations — the lower-confidence signals not duplicated by Q1's incident roll-up.

Tool: RunAdvancedHuntingQuery

// Allow-list of Microsoft platform service principals that perform automated mailbox/CA lifecycle ops.
// These appear with empty AccountDisplayName; the real actor name lives in RawEventData.UserId.
// Pattern: any RawEventData.UserId starting with "NT SERVICE\" is Microsoft datacenter automation
// (e.g., MSExchangeAdminApiNetCore for tenant-onboarding/permission hygiene). Exclude from analyst
// view to avoid false-positive "empty actor" alarms.
let PlatformServicePrefix = @"NT SERVICE\";
CloudAppEvents
| where Timestamp > ago(7d)
| where ActionType in (
    // Exchange — Mail flow manipulation
    "New-InboxRule", "Set-InboxRule", "Set-Mailbox",
    "Add-MailboxPermission", "New-TransportRule", "Set-TransportRule", "New-Mailbox",
    // Exchange — Anti-forensic
    "Remove-MailboxPermission", "Remove-InboxRule",
    // Conditional Access manipulation (human-initiated only)
    "Set-ConditionalAccessPolicy", "New-ConditionalAccessPolicy",
    // Compromise signals
    "CompromisedSignIn"
)
// Resolve effective actor: AccountDisplayName when present, else RawEventData.UserId
| extend RawUserId = tostring(parse_json(tostring(RawEventData)).UserId)
| extend EffectiveActor = iff(isnotempty(AccountDisplayName), AccountDisplayName, RawUserId)
// Exclude Microsoft platform service principals (datacenter automation noise)
| where not(EffectiveActor startswith PlatformServicePrefix)
// Filter out system/automation-driven CA changes (CA agent, backup policies)
| where not(ActionType in ("Set-ConditionalAccessPolicy", "New-ConditionalAccessPolicy")
            and isempty(EffectiveActor))
| extend Category = case(
    ActionType in ("New-InboxRule", "Set-InboxRule", "Remove-InboxRule",
                   "Set-Mailbox", "Add-MailboxPermission", "Remove-MailboxPermission",
                   "New-TransportRule", "Set-TransportRule", "New-Mailbox"),
    "Exchange Admin/Rule Change",
    ActionType in ("Set-ConditionalAccessPolicy", "New-ConditionalAccessPolicy"),
    "Conditional Access Change",
    ActionType == "CompromisedSignIn",
    "Compromised Sign-In",
    "Other")
| summarize
    Count = count(),
    UniqueActors = dcount(EffectiveActor),
    TopActors = make_set(EffectiveActor, 5),
    Actions = make_set(ActionType, 5),
    LatestTime = max(Timestamp)
    by Category
| order by Count desc

Purpose: Three-category view of cloud app activity invisible to Q10 (AuditLogs). CompromisedSignIn is an MCAS signal independent from Q3's Identity Protection risk events — dual-source corroboration when both fire. CA changes with empty AccountDisplayName are system/agent-driven and filtered out. Inbox rule, transport rule, and mailbox permission changes are the primary BEC persistence/exfil mechanisms — even when no rule has a forwarding payload, rule creation by a previously-flagged user is a strong follow-up signal.

Actor resolution: AccountDisplayName is often empty for non-interactive ops; the query falls back to RawEventData.UserId. Actors prefixed NT SERVICE\ are Microsoft datacenter automation (e.g., MSExchangeAdminApiNetCore) and are excluded.

Verdict logic:

  • 🔴 Escalate: Compromised Sign-In with 5+ users, OR Conditional Access Change by any human actor, OR Exchange Admin/Rule Change with forwarding-related rules (New-InboxRule, Set-InboxRule, New-TransportRule)
  • 🟠 Investigate: Compromised Sign-In (any count), OR Remove-InboxRule / Remove-MailboxPermission (anti-forensic cleanup signals)
  • 🟡 Monitor: Low-count Set-Mailbox from system actors
  • ✅ Clear: 0 results across all categories

Drill-down: Use user-investigation for actors in Compromised Sign-In category. Use ca-policy-investigation for Conditional Access Change. For any Exchange-related Q9 finding, also query OfficeActivity | where OfficeWorkload == "Exchange" — CloudAppEvents only surfaces ActionType summaries; OfficeActivity carries the full Parameters JSON (ForwardTo / RedirectTo / ForwardingSmtpAddress), per-operation ClientIP, and ops like MoveToDeletedItems / SoftDelete / HardDelete / MailboxLogin that reveal post-compromise forensics. See queries/email/email_threat_detection.md and the CloudAppEvents / OfficeActivity entries in copilot-instructions.md Known Table Pitfalls.


Query 10: High-Impact Privileged Operations

🔑 Admin activity monitoring — Category-aggregated view of privileged operations: role assignments, PIM activations, credential lifecycle, consent grants, CA policy changes, password management, MFA registration, app registration, and ownership grants.

Tool: RunAdvancedHuntingQuery

let PrivOps = AuditLogs
| where TimeGenerated > ago(7d)
| where OperationName has_any (
    "role", "credential", "consent", "Conditional Access", "password", "certificate",
    "security info", "owner", "application"
)
| where Result == "success"
| extend Actor = tostring(InitiatedBy.user.userPrincipalName)
// Exclude system-driven CA policy additions (empty actor = CA agent)
| where not(OperationName has "conditional access" and isempty(Actor))
| extend Target = tostring(TargetResources[0].displayName)
| extend Category = case(
    OperationName has "security info", "MFA-Registration",
    OperationName has "owner", "Ownership",
    OperationName has "application", "AppRegistration",
    OperationName has "role", "RoleManagement",
    OperationName has "credential" or OperationName has "certificate", "Credentials",
    OperationName has "consent", "Consent",
    OperationName has "Conditional Access", "ConditionalAccess",
    OperationName has "password", "Password",
    "Other");
PrivOps
| summarize 
    Count = count(),
    UniqueActors = dcount(Actor),
    TopActors = make_set(Actor, 5),
    Operations = make_set(OperationName, 5),
    Targets = make_set(Target, 5),
    LatestTime = max(TimeGenerated)
    by Category
| order by Count desc

Purpose: Category-level aggregation ensures all 8 privilege domains surface regardless of volume distribution (previous per-actor aggregation was truncated at 15 rows, hiding MFA-Registration, Ownership, and AppRegistration). Key non-obvious details: MFA-Registration deletion + re-registration by same user = credential takeover (T1556.006). Ownership grants to external accounts = persistence (T1098). System-driven CA additions (empty Actor) are filtered out. Password category is high-volume by nature — flag single-actor bulk resets, not self-service.

Verdict logic:

  • 🔴 Escalate: MFA-Registration deletions + registrations for same user (method swap attack), OR Consent grants from unexpected actors, OR Ownership grants to external accounts, OR ConditionalAccess changes by non-admin actors, OR AppRegistration with secrets management from external domains
  • 🟠 Investigate: MFA-Registration from CTF/external accounts, OR RoleManagement targeting Global Admin / Security Admin roles, OR AppRegistration consent operations, OR Password with bulk admin resets (single actor, 10+ targets)
  • 🟡 Monitor: Normal PIM activations and expirations, self-service password resets, credential lifecycle (WHfB/passkey registration)
  • ✅ Clear: 0 results or only system-driven operations with expected volume

Query 11: Critical Assets with Verified Internet Exposure

🛡️ Attack surface — Combines ExposureGraph critical asset inventory with MDE's authoritative DeviceInfo.IsInternetFacing classification to identify verified internet-exposed critical assets.

Tool: RunAdvancedHuntingQuery

let InternetFacing = DeviceInfo
    | where Timestamp > ago(7d)
    | where IsInternetFacing == true
    | summarize arg_max(Timestamp, *) by DeviceId
    | project DeviceName,
        Reason = extractjson("$.InternetFacingReason", AdditionalFields, typeof(string)),
        PublicIP = extractjson("$.InternetFacingPublicScannedIp", AdditionalFields, typeof(string)),
        ExposedPort = extractjson("$.InternetFacingLocalPort", AdditionalFields, typeof(int));
let CriticalAssets = ExposureGraphNodes
    | where set_has_element(Categories, "device")
    | where isnotnull(NodeProperties.rawData.criticalityLevel)
    | extend critLevel = toint(NodeProperties.rawData.criticalityLevel.criticalityLevel)
    | where critLevel < 4
    | project DeviceName = NodeName, CriticalityLevel = critLevel,
        ExposureScore = tostring(NodeProperties.rawData.exposureScore);
CriticalAssets
| join kind=leftouter InternetFacing on DeviceName
| extend IsVerifiedExposed = isnotempty(PublicIP) or isnotempty(Reason)
| project DeviceName, CriticalityLevel, IsVerifiedExposed,
    Reason, PublicIP, ExposedPort, ExposureScore
| order by IsVerifiedExposed desc, CriticalityLevel asc
| take 25

Purpose: Returns the critical asset inventory (criticality 0–3) enriched with MDE's authoritative internet-facing classification. DeviceInfo.IsInternetFacing is confirmed via Microsoft external scans or observed inbound connections and auto-expires after 48h — far more reliable than ExposureGraph properties like isCustomerFacing (business flag) or rawData.IsInternetFacing (not populated in many environments). See MS Docs and queries/network/internet_exposure_analysis.md Query 1 for the canonical reference.

IsVerifiedExposed logic: Checks BOTH PublicIP (populated for PublicScan — Microsoft external scanner) AND Reason (populated for ExternalNetworkConnection — observed inbound traffic). The original isnotempty(PublicIP) missed ExternalNetworkConnection exposures where MDE confirms inbound connections but doesn't populate the scanned public IP field.

Verdict logic:

  • 🔴 Escalate: Any IsVerifiedExposed == true with CriticalityLevel == 0 (internet-facing domain controller/CA)
  • 🟠 Investigate: Any IsVerifiedExposed == true (internet-facing critical asset)
  • 🟡 Monitor: Critical assets exist but none verified internet-facing
  • ✅ Clear: All critical assets properly segmented, no internet exposure

Query 12: Exploitable CVEs (CVSS ≥ 8.0) Across Fleet

🛡️ Vulnerability patch priority — Top exploitable critical CVEs with affected device count.

Tool: RunAdvancedHuntingQuery

DeviceTvmSoftwareVulnerabilities
| join kind=inner (
    DeviceTvmSoftwareVulnerabilitiesKB
    | where IsExploitAvailable == true
    | where CvssScore >= 8.0
) on CveId
| summarize 
    AffectedDevices = dcount(DeviceName),
    SampleDevices = make_set(DeviceName, 3),
    Software = make_set(SoftwareName, 3)
    by CveId, VulnerabilitySeverityLevel, CvssScore
| order by AffectedDevices desc, CvssScore desc
| take 15

Purpose: Instant "what should we patch today" list. Ranks exploitable CVEs by number of affected devices, then by CVSS score. Focus on CVEs with public exploits affecting the most devices.

Verdict logic:

  • 🔴 Escalate: Any CVE with CvssScore >= 9.0 AND AffectedDevices > 10
  • 🟠 Investigate: CVE with CvssScore >= 8.0 AND AffectedDevices > 5
  • 🟡 Monitor: Exploitable CVEs exist but affect ≤ 5 devices
  • ✅ Clear: No exploitable CVEs with CVSS ≥ 8.0 (unlikely but possible in small environments)
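
A minimal sketch of the thresholds above, assuming the Q12 rows are available as dicts keyed by column name. Any row that meets neither threshold falls through to Monitor.

```python
def q12_verdict(rows):
    """rows: list of dicts with CvssScore and AffectedDevices from Q12."""
    if not rows:
        return "Clear"
    if any(r["CvssScore"] >= 9.0 and r["AffectedDevices"] > 10 for r in rows):
        return "Escalate"
    if any(r["CvssScore"] >= 8.0 and r["AffectedDevices"] > 5 for r in rows):
        return "Investigate"
    return "Monitor"
```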

Drill-down: Use exposure-investigation skill for full vulnerability posture assessment.


Post-Processing

Device Drift Score Interpretation (Q6)

Q6 returns pre-computed drift scores directly from KQL — no LLM-side math is needed. Simply present the returned table and apply verdicts using this scale:

| DriftScore | Interpretation | Verdict |
|---|---|---|
| < 80 | Contracting activity (device may be idle/decommissioned) | 🔵 Informational |
| 80–110 | Stable steady-state servers (fleet floor with uptime gate — was 80–120 pre-uptime-filter) | ✅ Clear |
| 110–130 | Minor behavioral expansion | 🟡 Monitor |
| 130–180 | Significant deviation — includes genuine intermittent-workstation drift now that uptime FPs are filtered | 🟠 Investigate |
| 180+ | Major anomaly — multi-dimensional with confirmed uptime baseline | 🔴 Escalate |

VolDrift cap context: VolDriftRaw is projected alongside the capped VolDrift. When interpreting results:

  • If VolDriftRaw ≫ 300 but ProcDrift/ChainDrift/AcctDrift are near 100: infrastructure volume spike (GC, patching, agent restart) — low concern despite high raw volume.
  • If VolDriftRaw > 300 AND ProcDrift/ChainDrift/AcctDrift are also elevated: genuine multi-dimensional anomaly — high confidence finding.
  • If VolDriftRaw ≤ 300: cap was not triggered — score reflects true proportions.

Fleet-uniformity rule: If ALL top-10 devices cluster within 20 points of each other, the fleet is behaving uniformly and the verdict should be downgraded one level. Drift is most meaningful when individual devices diverge from the fleet cluster.

⛔ DO NOT manually recompute drift scores. The KQL query handles Volume normalization (÷6 baseline days), VolDrift capping (at 300%), GC infrastructure filtering, and dcount comparison (direct ratio). Trust the returned DriftScore column.
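
Interpreting the returned scores (without recomputing them) could look like the sketch below. Several choices here are assumptions: range boundaries are treated as lower-bound inclusive, the overall Q6 verdict is taken from the highest returned DriftScore, "within 20 points" is read as max minus min ≤ 20, and an empty result is treated as Clear.

```python
LEVELS = ["Clear", "Monitor", "Investigate", "Escalate"]


def score_verdict(drift_score):
    """Map a returned DriftScore to the scale above (lower bound inclusive: an assumption)."""
    if drift_score < 80:
        return "Informational"
    if drift_score < 110:
        return "Clear"
    if drift_score < 130:
        return "Monitor"
    if drift_score < 180:
        return "Investigate"
    return "Escalate"


def q6_verdict(drift_scores):
    """drift_scores: the DriftScore column of the Q6 result, used as returned."""
    if not drift_scores:
        return "Clear"  # assumption: no qualifying devices means nothing to flag
    verdict = score_verdict(max(drift_scores))
    uniform = max(drift_scores) - min(drift_scores) <= 20
    if uniform and verdict in LEVELS[1:]:
        verdict = LEVELS[LEVELS.index(verdict) - 1]  # downgrade one level
    return verdict


print(q6_verdict([140, 135, 132, 128, 125]))  # tight cluster: Monitor
print(q6_verdict([190, 120, 105, 100, 98]))   # one outlier: Escalate
```

The first fleet would be Investigate on its top score alone, but the fleet-uniformity rule downgrades it because every device moved together.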

Cross-Query Correlation

After all queries complete, check these correlation patterns and escalate priority when found:

| Pattern | Queries | Implication | Action |
|---|---|---|---|
| Incident account matches risky identity | Q1 Accounts ∩ Q3 AccountUpn | Incident involves user already flagged AtRisk/Compromised — corroborated signal | Escalate to 🔴 |
| Incident device matches drifting endpoint | Q1 Devices ∩ Q6 DeviceName | Incident target has behavioral anomalies on endpoint | Escalate to 🔴 |
| Incident device has exploitable CVE | Q1 Devices ∩ Q12 DeviceName | Incident device is vulnerable to active exploitation | Escalate to 🔴 |
| Spray target already in incident | Q4 targets ∩ Q1 Accounts | Spray target is already involved in an active incident | Escalate to 🔴 |
| SPN drift AND unusual credential/consent activity | Q5 + Q10 | App credential abuse / persistence | Escalate to 🔴 |
| Device with rare process chain AND exploitable CVE | Q7 + Q12 | Potential active exploitation | Escalate to 🔴 |
| Spray IP target already flagged as risky | Q4 + Q3 | Spray target has active Identity Protection risk | Escalate to 🔴 |
| Closed TP tactics match active findings | Q2 + Q3/Q7/Q8 | Same attack pattern recurring despite recent closures | Escalate to 🟠, note recurrence |
| Mailbox rule manipulation AND email threats | Q9 + Q8 | Potential email exfiltration setup following phishing | Escalate to 🔴 |
| Compromised Sign-In user matches risky identity | Q9 Compromised Sign-In ∩ Q3 AccountUpn | MCAS compromise + Identity Protection risk — dual-signal corroboration | Escalate to 🔴 |
| Compromised Sign-In user has Mailbox Read (API) | Q9 Compromised Sign-In ∩ Q9 Mailbox Read (API) | Compromised account actively exfiltrating email via API — BEC kill chain | Escalate to 🔴 |
| Compromised Sign-In user in open incident | Q9 Compromised Sign-In ∩ Q1 Accounts | MCAS compromise detection overlaps active incident entities | Escalate to 🔴 |
| MFA registration from spray target | Q10 MFA-Registration ∩ Q4 spray targets | Attacker completing MFA enrollment after successful spray — T1556.006 | Escalate to 🔴 |
| MFA registration from risky user | Q10 MFA-Registration ∩ Q3 AccountUpn | Risky user registering new auth methods — potential credential takeover | Escalate to 🔴 |
| App registration + SPN drift | Q10 AppRegistration ∩ Q5 SPN drift | New app + expanding SPN footprint = T1098.001 app-based persistence | Escalate to 🔴 |
| CA policy change + spray/compromise activity | Q9 Conditional Access Change + Q4 or Q9 Compromised Sign-In | Defense weakened during active attack | Escalate to 🔴 |
| Mailbox Read (API) user has inbox rule changes | Q9 Mailbox Read (API) ∩ Q9 Exchange Admin/Rule Change | Programmatic read + forwarding rule = full exfiltration chain (T1114.003) | Escalate to 🔴 |
| Phishing recipient is risky user | Q8 delivered phishing ∩ Q3 AccountUpn | Credential harvesting targeting already-compromised or at-risk user — AiTM chain indicator | Escalate to 🔴 |
| DLP/exfiltration incident + API mailbox access | Q1 Exfiltration tactic ∩ Q9 Mailbox Read (API) | Incident-level exfiltration alert + active API data access — data loss in progress | Escalate to 🔴 |
| Role management + SPN drift by same actor | Q10 RoleManagement same actor ∩ Q5 SPN drift | Role escalation + expanding app footprint = app-based persistence (T1098) | Escalate to 🔴 |
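
Most of the ∩ patterns above come down to intersecting two entity lists. A minimal sketch, assuming entities are compared case-insensitively after trimming whitespace (a normalization choice not stated in the rules) and using invented sample values:

```python
def overlap(left, right):
    """Case-insensitive intersection of two entity lists (UPNs, device names)."""
    right_keys = {v.strip().lower() for v in right if v}
    return sorted({v.strip().lower() for v in left if v} & right_keys)


q1_accounts = ["Alice@contoso.com", "bob@contoso.com"]   # hypothetical Q1 entities
q3_risky = ["alice@contoso.com", "carol@contoso.com"]    # hypothetical Q3 AccountUpn
hits = overlap(q1_accounts, q3_risky)
if hits:
    print("Escalate:", hits)  # Escalate: ['alice@contoso.com']
```

Normalizing case first avoids missing a match when one table stores the UPN with different capitalization than the other.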

Query File Recommendations

Use .github/manifests/discovery-manifest.yaml (auto-generated by python .github/manifests/build_manifest.py) to match findings to downstream query files and skills. Contains title, path, domains, mitre, prompt.

Skip entirely when all verdicts are ✅. Tier depth follows the Rule 8 table.

Query-to-Domain Map

| Query Group | Domain Tag(s) |
|---|---|
| Q1, Q2 (Incidents) | incidents |
| Q3, Q4 (Identity) | identity |
| Q5 (SPN Drift) | spn |
| Q6, Q7 (Endpoint) | endpoint |
| Q8 (Email) | email |
| Q9, Q10 (Admin & Cloud) | admin, cloud |
| Q11, Q12 (Exposure) | exposure |

Valid tags: incidents, identity, spn, endpoint, email, admin, cloud, exposure.

Procedure

For each non-✅ verdict, collect its domain tag(s), then:

  1. Query files — filter manifest.queries where domains contains ANY active tag. Rank by (a) number of matching tags, (b) MITRE technique overlap with Q1/Q2 Techniques (exact string match on mitre field), (c) keyword overlap (entities, process names, CVE IDs, ActionTypes) against title/path. Select top 3–5 files for 🔴/🟠, 1–2 for 🟡-only.
  2. Skills — filter manifest.skills where domains matches. Substitute actual entity values into the prompt template's {entity} placeholder. 🔴/🟠: include all matches as drill-down options; 🟡-only: limit to 3. Skills without domains (tooling/visualization) are never auto-suggested.
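
The query-file ranking in step 1 can be sketched as a three-part sort key. This assumes the manifest has already been parsed into a list of dicts, with `domains` and `mitre` as lists of strings; the function name, the `limit` parameter, and the sample entry titles are hypothetical.

```python
def rank_query_files(queries, active_tags, incident_techniques, keywords, limit):
    """Rank manifest query entries by tag matches, MITRE overlap, then keyword overlap."""
    active = set(active_tags)
    techniques = set(incident_techniques)
    ranked = []
    for q in queries:
        tag_hits = len(active & set(q.get("domains", [])))
        if tag_hits == 0:
            continue  # must match at least one active domain tag
        mitre_hits = len(techniques & set(q.get("mitre", [])))  # exact string match
        haystack = (q["title"] + " " + q["path"]).lower()
        keyword_hits = sum(1 for k in keywords if k.lower() in haystack)
        ranked.append(((tag_hits, mitre_hits, keyword_hits), q))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return [q for _, q in ranked[:limit]]


manifest_queries = [  # hypothetical entries; titles are illustrative
    {"title": "Email Threat Detection", "path": "queries/email/email_threat_detection.md",
     "domains": ["email"], "mitre": ["T1114.003"]},
    {"title": "Internet Exposure Analysis", "path": "queries/network/internet_exposure_analysis.md",
     "domains": ["exposure"], "mitre": []},
]
top = rank_query_files(manifest_queries, ["email"], ["T1114.003"], ["inbox"], limit=5)
print([q["path"] for q in top])
```

Using a tuple as the sort key applies the three criteria in priority order, so keyword overlap only breaks ties left by tag and MITRE matches. The `limit` would be 3 to 5 for 🔴/🟠 verdicts and 1 to 2 for 🟡-only, per the step above.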

Report Output

Insert 📂 Recommended Query Files after 🎯 Recommended Actions. Include a 🔧 Suggested Skill Drill-Downs sub-section with manifest skill prompts (entity-substituted).

⛔ Numbered list, NOT table — links inside table cells don't render clickable in VS Code chat.

Format: 1. **[<Title>](queries/<subfolder>/<file>.md)** — Q<N>: <finding> — 💡 *"<entity-specific prompt>"*

  • Link text = manifest title, target = manifest path (forward slashes).
  • Prompts MUST reference specific entities/IOCs/TTPs from findings — no generic placeholders.
  • When no matching files: suggest authoring new queries.

Adding New Query Files or Skills

  1. Query files: add **Domains:** <tag1>, <tag2> to metadata header (after **MITRE:**).
  2. Skills: add threat_pulse_domains: [<tag>] and drill_down_prompt: '<prompt>' to YAML frontmatter.
  3. Run python .github/manifests/build_manifest.py — validator flags missing fields.

Report Template

Output modes:

  • Inline chat (default) — render in chat. Truncate data tables to 10 rows; omit Drill-Down, Cross-Investigation, and Investigation Timeline sections when no drill-downs have executed.
  • Markdown file — triggered by 💾 Save full investigation report in Phase 4. Full data tables, no row limits. Path: reports/threat-pulse/Threat_Pulse_YYYYMMDD_HHMMSS.md. Source data: pulse results from context + /memories/session/threat-pulse-drilldowns.md (authoritative after context compaction).
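As a small illustration of the two output modes, the sketch below builds the report path and applies the inline row limit. It assumes the filename timestamp is taken in UTC (the template only states UTC for the Scan Date), and both helper names are hypothetical.

```python
from datetime import datetime, timezone

def report_path(now=None):
    """Build reports/threat-pulse/Threat_Pulse_YYYYMMDD_HHMMSS.md (UTC is an assumption)."""
    now = now or datetime.now(timezone.utc)
    return f"reports/threat-pulse/Threat_Pulse_{now.strftime('%Y%m%d_%H%M%S')}.md"

def truncate_rows(rows, file_mode, inline_limit=10):
    """Inline chat truncates data tables to 10 rows; file mode keeps every row."""
    rows = list(rows)
    return rows if file_mode else rows[:inline_limit]
```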

Verdicts: 🔴 Escalate | 🟠 Investigate | 🟡 Monitor | ✅ Clear | 🔵 Informational | ❓ No Data

  • ❓ No Data — query returned table resolution error or timeout. Report the error and table. Treat as monitoring gap.
  • 🔵 Informational — neutral context (e.g., Q2 with 0 closures, Q6 with DriftScore < 80). No action needed.
  • Zero results format: ✅ No <type> detected in the last <N>d. Checked: <table> (0 matches)

Structure

# 🔍 Threat Pulse — <Workspace> | <Date>
**Workspace:** <name> (`<id>`)  
**Scan Date:** <YYYY-MM-DD HH:MM UTC>  
**Scan Duration:** <N>min | **Queries:** 12 | **Drill-Downs:** <N>  (file mode only)

## Executive Summary
<2–4 sentences synthesizing pulse + drill-down findings. State final risk posture incorporating all evidence.>

## Dashboard Summary
<12-row table (Q1, Q2, Q3, Q4–Q12) — columns: #, Domain, Status (verdict emoji), Key Finding (1-line).>

## Detailed Findings
<One section per query — EVERY query gets a section (no skipping). Q2 closed summary always renders after Q1 even when Q1 is ✅.>

## Cross-Query Correlations
<Table per Post-Processing rules, or `✅ No correlations detected`.>

## 🎯 Recommended Actions
<Prioritized table: action, trigger query, drill-down skill.>

## 📂 Recommended Query Files
<Per Report Output Block procedure. For 🟡-only verdicts use "📂 Proactive Hunting Suggestions" header. Omit when all ✅.>

## Drill-Down Investigation Results       (file mode, when drill-downs executed)
### 1. <Title> — <Skill Name>
**Triggered by:** Q<N> — <finding>  
**Entity:** <target> | **Lookback:** <timerange> | **Risk:** <emoji> <level>

**Key Findings:** <max 8 evidence-cited bullets>

**Evidence Summary:** <1–2 paragraph narrative with specific numbers/identifiers. Back-reference pulse queries.>

**Recommendations:** <numbered actions>

### 2. <Next Title> — <Skill Name>
...

## Cross-Investigation Correlation        (file mode, when drill-downs executed)
| Connection | Evidence | Drill-Downs | Implication |
|-----------|----------|-------------|-------------|
<Patterns only visible across multiple investigations. If none: `✅ No cross-investigation correlations identified — each finding is independent.`>

## Consolidated Recommendations           (file mode)
| Priority | Recommendation | Source | Risk |
<Deduplicated across pulse + drill-downs. If same action appears in both, cite both sources on one row.>

## Appendix: Investigation Timeline       (file mode)
| Time | Action | Key Result |

Column / Format Rules

  • Q1: | Incident | Sev | Title | Age | Alerts | Owner | Tactics | Accounts | Devices | Tags |. Sev = incident severity; an unassigned owner renders as ⚠️ Unassigned; Age uses the relative AgeDisplay value; entity and tag columns render at most 5 comma-separated values.
    • When TotalAll > 10: prepend **Showing 10 of {TotalAll} open incidents ({TotalHighCritical} High/Critical)** (sorted by severity, then newest, most complex first)
    • The list is deduplicated by Title (one representative per title). When an incident's TitleDupCount > 1, append (+{TitleDupCount-1} more) to its Title cell so recurring/noisy incident types remain visible without monopolizing the table.
    • When TotalHighCritical == 0: prepend **No High/Critical incidents — showing top Medium/Low from {TotalAll} open**
  • Q1 incidents must include [#<id>](https://security.microsoft.com/incidents/<ProviderIncidentId>?tid=<tenant_id>) links.
  • Q2: Classification breakdown + severity + MITRE tactics/techniques from TP closures. Always render even when Q1 is ✅.
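The Q1 cell rules above are mechanical enough to sketch in Python. The helper names are hypothetical, and the sketch covers only the rules whose behavior is fully specified (duplicate suffix, entity cap, portal link, and the "Showing 10 of" banner).

```python
def title_cell(title, title_dup_count):
    """Append (+N more) when the deduplicated title stands in for several incidents."""
    if title_dup_count > 1:
        return f"{title} (+{title_dup_count - 1} more)"
    return title

def entity_cell(values, limit=5):
    """Entity and tag columns render at most 5 comma-separated values."""
    return ", ".join(values[:limit])

def incident_link(incident_id, provider_incident_id, tenant_id):
    """Clickable XDR portal link in the documented format."""
    return f"[#{incident_id}](https://security.microsoft.com/incidents/{provider_incident_id}?tid={tenant_id})"

def showing_banner(total_all, total_high_critical):
    """Banner to prepend when more than 10 incidents are open, else None."""
    if total_all > 10:
        return f"**Showing 10 of {total_all} open incidents ({total_high_critical} High/Critical)**"
    return None
```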

Rules

| Rule | Status |
|------|--------|
| Executive Summary synthesizes across pulse AND drill-downs (when present) | REQUIRED |
| Every query has a verdict row: no omissions, no skipped "clear" sections | REQUIRED |
| Drill-down subsections are structured summaries, not raw dumps, with Triggered by: Q<N> | REQUIRED |
| Cross-Investigation Correlation explicitly states "none found" if no connections exist | REQUIRED |
| Consolidated Recommendations are deduplicated (same action + multiple sources → one row) | REQUIRED |
| Fabricated data | PROHIBITED |

Known Pitfalls

| Pitfall | Mitigation |
|---------|------------|
| Q5 takes ~35s (97d lookback) | Acceptable, because it runs in parallel. Only query needing Data Lake |
| Q7 capped at ago(30d) | AH Graph API limit. Use queries/endpoint/rare_process_chains.md via Data Lake for 90d |
| Q6 drift scores | Computed in-query. Do NOT recompute LLM-side |
| Q9 drill-down: CloudAppEvents identity filtering | AccountId and AccountObjectId are Entra ObjectId GUIDs, NOT UPNs. Filtering by UPN returns 0 results silently. Use AccountDisplayName for display-name matching, or resolve UPN→ObjectId via Graph API first. NEVER use tostring(RawEventData) has "UPN": it causes query cancellation on this high-volume table |
| Q9: RESTSystem false positives | Exchange Online first-party backend services use Client=RESTSystem in ClientInfoString and appear as AppId GUIDs in AccountDisplayName. These are NOT user/app API access; they are system-level mail flow, compliance scanning, or connector ingestion. Q9 filters these out. If GUID actors with RESTSystem still appear while you investigate Q9 results, treat them as benign Microsoft internal operations |
| Drill-down query error → silent skip | ⛔ NEVER skip. On SemanticError/Failed to resolve: diagnose → fix → re-execute → present corrected results. Partial results with silently omitted failures are PROHIBITED |

Schema pitfalls (column names, dynamic fields, parse_json patterns) are covered in copilot-instructions.md Known Table Pitfalls. Refer there for SecurityAlert.Status, ExposureGraphNodes.NodeProperties, timestamp columns, and AuditLogs.InitiatedBy.


Quality Checklist

  • All 12 queries executed
  • Every query has a verdict row — no omissions, no skipped "clear" sections
  • ✅ verdicts cite table + "0 results"; 🔴/🟠 cite specific evidence
  • All incidents have clickable XDR portal URLs
  • Cross-query correlations checked
  • Every non-✅ drill-down has a 🎬 Take Action block with portal-ready KQL (correct required columns per entity type)
  • Every 🎬 Take Action block includes the ⚠️ AI-generated content warning immediately below the heading
  • 📂 Recommended Query Files section present when any non-✅ verdict exists (clickable links, not tables)
  • No fabricated data

SVG Dashboard Generation

After completing the Threat Pulse report, the user may request an SVG visualization. Use the svg-dashboard skill in manifest mode — the widget manifest is at .github/skills/threat-pulse/svg-widgets.yaml.

Execution

  1. Read svg-widgets.yaml (widget manifest)
  2. Read the svg-dashboard SKILL.md for component rendering rules
  3. Map manifest field values to the Threat Pulse report data already in context (or read the saved report file)
  4. Render SVG → save to temp/threat_pulse_{date}_dashboard.svg
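Audits the security posture of AI agents (Copilot Studio, M365 Copilot, and others). Using the AgentsInfo table, it assesses agent inventory, access permissions, MCP tools, knowledge source exposure, XPIA email risk, and credential leakage to identify agent sprawl and governance risks.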
AI agent posture agent security audit Copilot Studio agents agent inventory agent access broadly accessible agents agent tools MCP tools on agents agent knowledge sources XPIA risk agent sprawl AI agent risk agent governance
.github/skills/ai-agent-posture/SKILL.md
npx skills add SCStelz/security-investigator --skill ai-agent-posture -g -y
SKILL.md
Frontmatter
{
    "name": "ai-agent-posture",
    "description": "Use this skill when asked to audit, assess, or report on AI agent security posture across Copilot Studio, Microsoft 365 Copilot, Microsoft Foundry, and third-party agents. Triggers on keywords like \"AI agent posture\", \"agent security audit\", \"Copilot Studio agents\", \"agent inventory\", \"agent access\", \"broadly accessible agents\", \"agent tools\", \"MCP tools on agents\", \"agent knowledge sources\", \"XPIA risk\", \"agent sprawl\", \"AI agent risk\", \"agent governance\", or when investigating AI agent configurations, access posture, tool permissions, or credential exposure. This skill queries the AgentsInfo table in Advanced Hunting to produce a comprehensive security posture assessment covering agent inventory, access posture, broadly-accessible agent exposure, MCP tool proliferation, knowledge source exposure, XPIA email exfiltration risk, hard-coded credential detection, external endpoint risks, creator governance, and agent sprawl analysis. Supports inline chat and markdown file output.",
    "drill_down_prompt": "Run AI agent security audit — agent inventory, authentication gaps, tool permissions",
    "threat_pulse_domains": [
        "admin",
        "cloud"
    ]
}

AI Agent Security Posture — Instructions

Purpose

This skill audits the security posture of AI agents (Copilot Studio, Microsoft 365 Copilot / Agent Builder, Microsoft Foundry, and third-party platforms) across your organization using the AgentsInfo table in Microsoft Defender XDR Advanced Hunting.

🔄 Table migration (AIAgentsInfo → AgentsInfo): This skill was migrated from the deprecated AIAgentsInfo table to the unified multi-platform AgentsInfo table. AIAgentsInfo remains queryable until July 1, 2026, but it is Copilot Studio-only and uses a different schema. All queries in this skill target AgentsInfo. The new table is a different data model, not a rename — see Table Schema Reference and Known Pitfalls for the differences that shaped these queries.

AI agents are autonomous or semi-autonomous applications that can access organizational data, send emails, call external APIs, and use MCP tools. Misconfigured agents — missing authentication, overly broad access, AI-controlled email sending, hard-coded credentials — represent a growing attack surface. This skill systematically evaluates that surface.

What this skill covers:

Domain Key Questions Answered
🔍 Agent Inventory How many agents exist? What's their status, platform, environment?
🔐 Access Posture Which agents are broadly accessible (allowForAllUsers)? How are agents shared (appType: lob/shared)?
🛠️ Tools & MCP Which agents have MCP tools? What operations can they perform?
📚 Knowledge Sources What data sources are agents connected to?
📧 XPIA Email Risk Which agents can send email (data exfil precondition)?
🔑 Credential Exposure Are credentials hard-coded in agent instructions or connector metadata?
🌐 External Endpoint Risk What external hosts do agent connectors reach? Any insecure schemes or non-standard ports?
👥 Creator Governance Who creates agents? Is there naming hygiene? Abandoned agents?

Data source: AgentsInfo table (Advanced Hunting) — currently in Preview.

References:

🔴 URL Registry — Canonical Links for Report Generation

MANDATORY: When generating reports, copy URLs verbatim from this registry. NEVER construct, guess, or paraphrase a URL. If a URL is not in this registry, omit the hyperlink entirely and use plain text.

Label Canonical URL
BLOG_RUNTIME_RISK https://www.microsoft.com/en-us/security/blog/2026/01/23/runtime-risk-realtime-defense-securing-ai-agents/
BLOG_AGENT_365 https://www.microsoft.com/en-us/microsoft-365/blog/2025/11/18/microsoft-agent-365-the-control-plane-for-ai-agents/
DOCS_AGENTSINFO https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-agentsinfo-table
DOCS_AGENT_PROTECTION https://learn.microsoft.com/en-us/defender-cloud-apps/ai-agent-protection
DOCS_RUNTIME_PROTECTION https://learn.microsoft.com/en-us/defender-cloud-apps/real-time-agent-protection-during-runtime

Usage in reports: When referencing attack scenarios, link to BLOG_RUNTIME_RISK. When referencing Agent 365 governance, link to BLOG_AGENT_365. When referencing runtime protection, link to DOCS_RUNTIME_PROTECTION.
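One way to enforce the copy-verbatim rule is to resolve links only through a lookup table. The sketch below is illustrative: the dictionary mirrors the registry above, while the helper name `registry_link` is hypothetical.

```python
URL_REGISTRY = {
    "BLOG_RUNTIME_RISK": "https://www.microsoft.com/en-us/security/blog/2026/01/23/runtime-risk-realtime-defense-securing-ai-agents/",
    "BLOG_AGENT_365": "https://www.microsoft.com/en-us/microsoft-365/blog/2025/11/18/microsoft-agent-365-the-control-plane-for-ai-agents/",
    "DOCS_AGENTSINFO": "https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-agentsinfo-table",
    "DOCS_AGENT_PROTECTION": "https://learn.microsoft.com/en-us/defender-cloud-apps/ai-agent-protection",
    "DOCS_RUNTIME_PROTECTION": "https://learn.microsoft.com/en-us/defender-cloud-apps/real-time-agent-protection-during-runtime",
}

def registry_link(label, text):
    """Return a markdown link for a registered label, or plain text if the label is unknown."""
    url = URL_REGISTRY.get(label)
    # Never construct or guess a URL: an unregistered label falls back to plain text
    return f"[{text}]({url})" if url else text
```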


Threat Landscape: Why AI Agent Posture Matters

Microsoft Defender Security Research has identified that AI agents represent a fundamentally new attack surface where the agent's capabilities are effectively equivalent to code execution. When a tool is invoked, it can read/write data, send emails, update records, or trigger workflows — and an attacker who can influence the agent's plan can indirectly cause the execution of unintended operations within the agent's capability sandbox.

The core risk: the agent's orchestrator depends on natural language input to determine which tools to use and how to use them. This creates exposure to prompt injection and reprogramming failures, where malicious prompts, embedded instructions, or crafted documents can manipulate the decision-making process.

This skill's queries map directly to three attack scenarios documented by Microsoft:

Attack Scenario 1: Malicious Instruction Injection via Event-Triggered Workflow

Element Detail
Vector Crafted email sent to an agent-monitored mailbox (event trigger)
Mechanism Email contains hidden instructions telling the agent to search knowledge base for sensitive data and exfiltrate via email to attacker
Preconditions Agent can send email (email connector) + has an event/email trigger + a knowledge source
Detection Q5 (XPIA Email Risk) detects email-capable agents via connector operations; Q7 (Knowledge Sources) identifies data exposure
Skill Signal Agents with an email-send operation (e.g., Office 365 Outlook Send an email (V2)) + knowledge sources, especially if broadly accessible (allowForAllUsers == "true") = highest risk

Attack Scenario 2: Prompt Injection via Shared Document → Email Exfiltration (XPIA)

Element Detail
Vector Malicious insider edits a SharePoint document with crafted instructions
Mechanism Agent processing the document is tricked into reading a sensitive file on a different SharePoint site (that the agent has access to but the attacker doesn't) and emailing contents to attacker-controlled domain
Preconditions Agent has a knowledge/data source + an email-send connector operation
Detection Q5 (XPIA) + Q7 (Knowledge Sources) identifies the attack surface
Skill Signal A declared data source + an email-send operation (e.g., Send an email (V2)) on the same agent = classic XPIA vector

Attack Scenario 3: Capability Reconnaissance on Unauthenticated Agent

Element Detail
Vector Attacker interacts with publicly accessible chatbot (no authentication required)
Mechanism Series of crafted prompts to probe and enumerate the agent's tools and knowledge sources, then exploit them to extract sensitive data
Preconditions Agent is broadly accessible (allowForAllUsers == "true", e.g., shared tenant-wide or website embed)
Detection Q4 (Broadly-Accessible Agents) identifies exposed agents; cross-reference with Q7 (knowledge sources with customer data)
Skill Signal allowForAllUsers == "true" + knowledge sources containing sensitive data = reconnaissance target

⚠️ Authentication-type telemetry gap: The deprecated AIAgentsInfo table exposed UserAuthenticationType (None/Integrated/Custom), which let this skill directly flag unauthenticated agents. The new AgentsInfo table has no populated authentication-type column in current telemetry (ToolsAuthenticationType is empty). The closest available exposure signal is RawAgentInfo.allowForAllUsers == "true" (broadly accessible to all tenant users). This is a proxy, not an equivalent — it measures broad reach, not absence of authentication. Treat broadly-accessible agents as the highest-exposure cohort and recommend Entra-based access policies (Agent 365) to close the gap.
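The skill signals of the three scenarios reduce to combinations of three per-agent booleans. The following sketch shows that mapping; the function name and the label strings are hypothetical, and the event-trigger precondition of Scenario 1 is deliberately not checked because this skill's signal for that scenario relies on email capability and knowledge sources only.

```python
def scenario_signals(broadly_accessible, can_send_email, has_knowledge_source):
    """Map per-agent capability flags to the scenario signals described above."""
    signals = []
    if can_send_email and has_knowledge_source:
        # Data source + email-send operation on the same agent
        signals.append("XPIA vector (Scenarios 1 and 2)")
    if broadly_accessible and has_knowledge_source:
        # allowForAllUsers == "true" is a reach proxy, not proof of missing authentication
        signals.append("reconnaissance target (Scenario 3)")
    if broadly_accessible and can_send_email and has_knowledge_source:
        signals.append("highest risk: full chain on a broadly accessible agent")
    return signals
```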

Mitigation: Defender Runtime Protection

Microsoft Defender provides webhook-based runtime inspection for Copilot Studio agents. Before every tool, topic, or knowledge action is executed, the generative orchestrator sends a webhook to Defender containing the planned invocation context. Defender analyzes intent and destination in real time and can allow or block the action before execution.

This is the primary runtime defense against all three scenarios above. When reviewing posture findings from this skill, always recommend enabling Defender Runtime Protection for agents flagged as high-risk. See Real-time agent protection during runtime.

Governance Framework: Microsoft Agent 365

Microsoft Agent 365 is the enterprise control plane for AI agents — the platform-level answer to the governance gaps this skill detects. It provides five capabilities that directly map to this skill's risk dimensions:

Agent 365 Capability What It Does Skill Dimensions Addressed
1. Registry Single source of truth for all agents (Entra agent ID). IT can quarantine unsanctioned agents and detect shadow agents. Agent Store for governed discovery. Agent Inventory (Q1), Creator Governance (Q10), Agent Sprawl (Q11)
2. Access Control Unique agent IDs via Entra. Agent Policy Templates enforce security from day one. Adaptive, risk-based access policies. Least-privilege enforcement. Broadly-Accessible Agents (Q4), Access Posture (Q3)
3. Visualization Unified dashboard mapping agents ↔ users ↔ resources. Role-based reporting. Compliance logging, e-discovery, and audit trail. MCP Tool Exposure (Q6), Knowledge Sources (Q7), Creator Governance (Q10)
4. Interoperability Agents access Work IQ (org data, relationships, context). Works across Copilot Studio, Microsoft Foundry, Agent Framework, Agent 365 SDK, and partner platforms. Knowledge Source Risk (Q7), Tools Inventory (Q12)
5. Security Defense-in-depth via Microsoft Defender (posture + threat detection + runtime protection), Entra (real-time blocking), and Purview (data exposure risk, sensitive data leak prevention, compliance). XPIA Email Risk (Q5), Credential Hygiene (Q8), External Endpoint Risk (Q9)

How to reference Agent 365 in reports: When this skill identifies governance gaps (sprawl, missing authentication, uncontrolled tool access), recommend Agent 365 as the strategic platform to address them. Specific mappings:

  • Agent sprawl / no naming conventions → Agent 365 Registry + quarantine for unsanctioned agents
  • Missing access controls / broadly-accessible agents → Agent 365 Access Control + Entra agent IDs + Policy Templates
  • No visibility into agent-resource connections → Agent 365 Visualization dashboard
  • Uncontrolled MCP/tool proliferation → Agent 365 Security + Defender posture management
  • XPIA / data exfiltration risk → Agent 365 Security + Purview for real-time data leak prevention

📑 TABLE OF CONTENTS

  1. Critical Workflow Rules — Mandatory rules
  2. Table Schema Reference — AgentsInfo columns and data types
  3. Agent Security Score Formula — Composite risk scoring
  4. Execution Workflow — Phase-by-phase query plan
  5. Sample KQL Queries — All queries (Q1–Q12)
  6. Output Modes — Inline vs Markdown report
  7. Inline Report Template — Chat-rendered format
  8. Markdown File Report Template — Disk-saved format
  9. Known Pitfalls — Schema quirks and edge cases
  10. Quality Checklist — Pre-delivery validation
  11. SVG Dashboard Generation — Visual dashboard from report

⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

  1. ALWAYS use RunAdvancedHuntingQuery — The AgentsInfo table is an Advanced Hunting table. It is NOT available in Sentinel Data Lake (query_lake). All queries in this skill MUST use RunAdvancedHuntingQuery.

  2. ALWAYS deduplicate agents with arg_max — The table contains multiple records per agent (state snapshots over time). Every query that analyzes current agent state MUST use | summarize arg_max(Timestamp, *) by AgentId to get the latest record per agent. Note AgentId is a guid.

  3. ALWAYS exclude deleted agents (unless specifically auditing deletions) — Add | where LifecycleStatus != "Deleted" after deduplication. LifecycleStatus is blank for active agents and only set to Deleted for removed ones, so this filter keeps active agents.

  4. ASK the user for output format before generating the report:

    • Inline chat summary (quick review in chat)
    • Markdown file report (detailed, archived to reports/ai-agent-posture/)
    • Both (markdown + inline summary)
  5. ⛔ MANDATORY: Evidence-based analysis only — Report ONLY what query results show. Use the explicit absence pattern (✅ No [finding] detected) when queries return 0 results. Never guess or assume.

  6. 🔴 The rich agent detail lives in RawAgentInfo (dynamic), not in flat columns — Governance signals (creatorId, allowForAllUsers, appType, scope) and deep tool/connector detail (declarativeCopilotMetadata) are nested inside the RawAgentInfo dynamic column. The normalized columns (DeclaredTools, McpServers, DeclaredDataSources) are sparse and flat. Parse RawAgentInfo with mv-expand/dot-notation — never assume a flat column holds the value. See Known Pitfalls.

  7. Run queries in parallel batches where possible: the queries within Phase 1 (Q1–Q3), Phase 2 (Q4–Q9), and Phase 3 (Q10–Q14) are independent of each other, so each phase can run as one parallel batch.

  8. Time tracking — Report elapsed time after each phase completion.
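Rules 2 and 3 depend on ordering: deduplicate first, then drop deleted agents. The Python sketch below mimics the semantics of `summarize arg_max(Timestamp, *) by AgentId` followed by the `LifecycleStatus` filter on a list of dict rows. It is an illustration of the logic only, not a replacement for the KQL, and it assumes timestamps are ISO 8601 strings in one timezone so that string comparison orders them correctly.

```python
def latest_active_agents(snapshots):
    """Keep the newest snapshot per AgentId, then exclude agents whose newest state is Deleted."""
    latest = {}
    for row in snapshots:
        current = latest.get(row["AgentId"])
        if current is None or row["Timestamp"] > current["Timestamp"]:
            latest[row["AgentId"]] = row
    # Filtering AFTER deduplication matters: filtering first would let an older,
    # pre-deletion snapshot of a deleted agent survive as its "latest" record.
    return [r for r in latest.values() if r.get("LifecycleStatus") != "Deleted"]
```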


Table Schema Reference

The AgentsInfo table (Preview) contains configuration snapshots of AI agents across Copilot Studio, Microsoft 365 Copilot (Agent Builder), Microsoft Foundry, and third-party platforms. The schema below reflects the live table (which differs from the published docs in several places — column casing, types, and which columns are actually populated).

Top-level columns

Column Type Description
Timestamp datetime Last recorded date/time for this agent snapshot
AgentId guid Unique agent identifier (dedup key)
Name string Display name of the agent
Description string Agent description
Platform string Copilot Studio, Agent Builder in Microsoft 365 Copilot, Microsoft Foundry, Other, SharePoint, Amazon Bedrock, LocalAgents
Version string Agent version
PublishedStatus string Published, Draft
LifecycleStatus string Blank for active agents; Deleted for removed agents
CreatedDateTime datetime When the agent was created
LastUpdatedDateTime datetime When last updated
LastPublishedDateTime datetime When last published
Owners dynamic Owner identities (sparse)
SharedWith dynamic Sharing targets (sparse)
InstanceCount int Blueprint instance count
Instructions string System prompt / agent instructions (well populated)
Model string Backing LLM model (sparse)
Capabilities dynamic Declared capabilities (sparse)
DeclaredDataSources dynamic Knowledge/data sources — array of filename/source strings (sparse)
DeclaredTools dynamic Declared tools — array of {type, name} (sparse, flat)
McpServers dynamic MCP servers — array of {name, description} (sparse)
Skills, ConnectedAgents, Memory, Guardrails dynamic Additional declared config (sparse)
EntraAgentID / EntraBlueprintID / ObservabilityID string Entra + observability linkage (note capital ID)
RawAgentInfo dynamic Primary detail source — full governance + connector manifest (populated for ~all agents). See nested keys below
TenantId, Type, SourceSystem string Standard envelope columns

⚠️ Columns that are EMPTY / unreliable in current telemetry

These columns exist but are not populated in observed data — do NOT build detections on them without first confirming population:

ToolsAuthenticationType (auth-type gap — see below), Availability, Endpoints, Triggers, Permissions, Model (mostly), and most of Owners/SharedWith.

🔴 Authentication-type gap: The deprecated AIAgentsInfo.UserAuthenticationType (None/Integrated/Custom) has no populated equivalent in AgentsInfo. There is no reliable way to flag "unauthenticated" agents from this table. Use RawAgentInfo.allowForAllUsers == "true" as a broad-exposure proxy (Q4) and document the gap.

RawAgentInfo nested keys (the rich data)

For Copilot Studio agents, RawAgentInfo is a marketplace/governance manifest. Key fields the queries below rely on:

Path Meaning
RawAgentInfo.creatorId Creator GUID (resolve to UPN via IdentityInfo join). Replaces CreatorAccountUpn. Sparse
RawAgentInfo.allowForAllUsers "true" = broadly accessible to all tenant users (exposure signal). Replaces AccessControlPolicy == "Any"
RawAgentInfo.appType lob (line-of-business, owner-scoped), shared, thirdParty, firstParty
RawAgentInfo.scope Sharing scope (e.g., tenant)
RawAgentInfo.declarativeCopilotMetadata Deep connector/tool detail (DCM). Present only for the connector-sourced subset (~10% of Copilot Studio agents)

DCM nesting (recovers deep tool, operation, and endpoint detail):

RawAgentInfo.declarativeCopilotMetadata[]
  .actions[]
    .apis[]            // .type = OpenApi | RemoteMCPServer | api_action
      .serverUrls[]    // populated for OpenApi + RemoteMCPServer (external hosts)
      .operations[]
        .operationId   // e.g., "Office 365 Outlook Send an email (V2)"

DCM siblings also carry instructions, llmModels (model), and sourceIds (incl. EnvironmentId, SourceAgentId).
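To make the nesting concrete, the sketch below walks the same path over an already-parsed `RawAgentInfo` value (a Python dict). It assumes the structure shown above and treats `serverUrls` entries as opaque values, since their element shape is not specified here; both function names are hypothetical.

```python
def iter_dcm_apis(raw_agent_info):
    """Yield every api object under declarativeCopilotMetadata[].actions[].apis[]."""
    for dcm in raw_agent_info.get("declarativeCopilotMetadata") or []:
        for action in dcm.get("actions") or []:
            for api in action.get("apis") or []:
                yield api

def operations_and_hosts(raw_agent_info):
    """Collect operationId values and serverUrls entries for one agent."""
    operation_ids, server_urls = [], []
    for api in iter_dcm_apis(raw_agent_info):
        server_urls.extend(api.get("serverUrls") or [])
        operation_ids.extend(
            op["operationId"] for op in (api.get("operations") or []) if op.get("operationId")
        )
    return operation_ids, server_urls
```

Agents without DCM simply yield two empty lists, which is why any finding built on this path needs a coverage caveat.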


Agent Security Score Formula

The Agent Security Score is a composite risk indicator that summarizes the security posture of an organization's AI agent fleet. Higher scores indicate greater risk.

Scoring Dimensions

$$ \text{AgentSecurityScore} = \sum_{i} \text{DimensionScore}_i $$

Each of the five dimensions contributes 0–20 points, for a maximum composite score of 100:

| Dimension | Max | 🟢 Low (0–5) | 🟡 Medium (6–12) | 🔴 High (13–20) |
|-----------|-----|--------------|------------------|-----------------|
| Broadly-Accessible Agents | 20 | 0 agents with allowForAllUsers == "true" | 1–2 broadly-accessible agents | ≥3 broadly-accessible agents, especially if Published with knowledge sources or email capability |
| XPIA Email Risk | 20 | 0 email-capable agents | 1–2 email-capable agents (scoped access) | ≥1 email-capable agent that is also broadly accessible or has knowledge sources |
| Tool & Endpoint Exposure | 20 | 0–2 MCP agents, known creators, no external endpoints | 3–10 MCP agents, external endpoints all HTTPS/standard-port | >10 MCP agents, OR MCP/endpoint agents that are broadly accessible, OR any insecure-scheme / non-standard-port external endpoint (Q9 escalators) |
| Knowledge Source Risk | 20 | 0 agents with data sources + broad access | 1–3 agents with data sources + scoped access | Agents with data sources + allowForAllUsers == "true". Compounding rule: when agents have data sources + an email-send operation + broad access (the full XPIA chain from Q5 + Q7), score at maximum (20) for this dimension AND score XPIA Email Risk at maximum (20), because the combination is the documented attack pattern |
| Credential Hygiene | 20 | 0 credential patterns detected | Patterns found but agent is Draft (unpublished) | Patterns found in Published agents |

Interpretation Scale

Score Rating Action
0–20 ✅ Healthy Normal posture, no immediate concerns
21–45 🟡 Elevated Review — minor misconfigurations detected
46–70 🟠 Concerning Investigate — multiple risk signals present
71–100 🔴 Critical Immediate remediation — significant agent security risk

The Tool & Endpoint Exposure dimension folds external-endpoint risk (Q9) into the MCP exposure signal: an insecure scheme, a non-standard port, or an external endpoint on a broadly-accessible agent each escalates this dimension to its High tier regardless of MCP count.
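The arithmetic behind the composite score and the interpretation scale can be sketched as follows. The dimension names and thresholds come from the tables above; the function names are hypothetical, and the per-dimension scores are assumed to have been assigned by the analyst from the tier descriptions.

```python
DIMENSIONS = (
    "Broadly-Accessible Agents",
    "XPIA Email Risk",
    "Tool & Endpoint Exposure",
    "Knowledge Source Risk",
    "Credential Hygiene",
)

def apply_compounding(scores, full_xpia_chain):
    """Compounding rule: data sources + email-send + broad access maxes two dimensions."""
    adjusted = dict(scores)
    if full_xpia_chain:
        adjusted["Knowledge Source Risk"] = 20
        adjusted["XPIA Email Risk"] = 20
    return adjusted

def composite_score(scores):
    """Sum the five dimension scores, each of which must lie in 0..20."""
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= 20:
            raise ValueError(f"{name} must be between 0 and 20")
    return sum(scores[name] for name in DIMENSIONS)

def rating(total):
    """Map a composite score to the interpretation scale."""
    if total <= 20:
        return "✅ Healthy"
    if total <= 45:
        return "🟡 Elevated"
    if total <= 70:
        return "🟠 Concerning"
    return "🔴 Critical"
```

For example, five dimensions scored at 5 each total 25 (🟡 Elevated); if the full XPIA chain is present, the compounding rule lifts the total to 55 (🟠 Concerning).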

Supplementary Indicators (not summed into the /100 score)

Two indicators are reported alongside the composite score for added context. They are intentionally not added to the /100 total — they enrich interpretation and feed the dimensions above as evidence.

Indicator Source What it tells you
Capability Privilege Index Q13 Count of agents holding ≥1 sensitive operation (mail-send, directory-write, data-write, messaging), split by broad access. A high count of agents that are both broadly accessible and hold sensitive operations is the strongest privilege-abuse signal and should justify maxing the Broadly-Accessible Agents and/or XPIA Email Risk dimensions.
Deep-Manifest Coverage Q14 Percentage of the fleet carrying declarativeCopilotMetadata (DCM). Because the XPIA, endpoint, and capability queries depend on DCM, this is the fraction of the estate that was fully inspectable. Every report MUST surface this so the analyst knows what was not inspected.

Execution Workflow

Phase 0: Prerequisites

  1. Confirm RunAdvancedHuntingQuery is available (AgentsInfo is AH-only)
  2. Ask user for output format (inline / markdown / both)

Phase 1: Inventory & Overview (Q1–Q3)

Run in parallel — no dependencies between queries.

Query Purpose
Q1 Global inventory summary (counts, date range, platforms, creators)
Q2 Status and platform breakdown
Q3 Access posture distribution (appType / allowForAllUsers)

Phase 2: Security Risk Analysis (Q4–Q9)

Run in parallel — no dependencies between queries.

Query Purpose
Q4 Broadly-accessible agents (allowForAllUsers == "true" detail)
Q5 XPIA email exfiltration risk (email-send connector operations)
Q6 MCP tool inventory across agents
Q7 Knowledge / data source audit
Q8 Hard-coded credential scan
Q9 External endpoint & HTTP risk (connector serverUrls)

Phase 3: Governance & Trends (Q10–Q14)

Run in parallel — no dependencies between queries.

Query Purpose
Q10 Top creators and naming hygiene
Q11 Agent creation trend over time
Q12 Capability / tools inventory (all operation types)
Q13 Operation-level privilege mapping (sensitive-operation matrix → Capability Privilege Index)
Q14 Deep-manifest coverage (% of fleet with DCM → report coverage banner)

Phase 4: Score Computation & Report Generation

  1. Compute per-dimension scores from Phase 1–3 data
  2. Sum dimension scores for composite Agent Security Score
  3. Generate report in requested output mode
  4. Report total elapsed time
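The phase structure above (parallel queries inside a phase, sequential phases, elapsed time per phase) could be orchestrated roughly as shown below. This is a sketch under assumptions: `run_query` is a hypothetical callable standing in for whatever executes a query through RunAdvancedHuntingQuery, and the phase-to-query mapping follows the phase tables above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

PHASES = {
    "Phase 1": ["Q1", "Q2", "Q3"],
    "Phase 2": ["Q4", "Q5", "Q6", "Q7", "Q8", "Q9"],
    "Phase 3": ["Q10", "Q11", "Q12", "Q13", "Q14"],
}

def run_phases(run_query, phases=PHASES):
    """Run each phase's queries in parallel and record elapsed seconds per phase."""
    results, timings = {}, {}
    for phase, query_ids in phases.items():
        start = time.monotonic()
        with ThreadPoolExecutor(max_workers=len(query_ids)) as pool:
            # pool.map preserves input order, so results pair up with query IDs
            for query_id, rows in zip(query_ids, pool.map(run_query, query_ids)):
                results[query_id] = rows
        timings[phase] = time.monotonic() - start
    return results, timings
```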

Phase 5: Runtime Correlation (Optional)

AgentsInfo describes how agents are configured; it does not show whether they are actually used or what they do at runtime. To close that gap, correlate the flagged configuration set against the CopilotActivity table (all-surface AI activity log, available in Advanced Hunting).

When to run: After Phase 4, when the user wants to know which flagged agents are actually active, which are dormant, or whether a high-risk agent shows runtime behavior.

🔴 Use a SCOPED lookup, never a fleet-wide join. A leftouter/inner join of the full AgentsInfo fleet (~15k agents, heavy RawAgentInfo dynamic) against CopilotActivity (100k+ rows) times out the Advanced Hunting endpoint. Instead:

  1. Phase 4 produces a small flagged-agent NAME list (broadly-accessible from Q4 + sensitive-op agents from Q13 — typically <50 names).
  2. Filter CopilotActivity to that name set with where AgentName in (FlaggedNames) — light, no join. See Query 15.

Join-key pitfall: CopilotActivity.AgentId is a composite/prefixed string (e.g., T_<tenant>.<guid>, CopilotStudio.Declarative.T_….gpt.<guid>, or literals like AgentBuilder) — it does not equal the clean AgentsInfo.AgentId GUID, so ID-based joins return 0 matches. AgentName is the reliable correlation key. Also note most CopilotActivity rows have an empty AgentId/AgentName (general M365 Copilot usage, not declarative-agent-attributed), so runtime attribution is inherently low-coverage — absence from CopilotActivity does NOT prove an agent is dormant.

Two high-value correlations:

  • Active-and-dangerous — a flagged agent (broadly accessible / XPIA-exposed / sensitive ops) that ALSO appears in CopilotActivity with real interactions → highest remediation priority (Query 15).
  • Configured-but-dormant — a flagged agent absent from CopilotActivity over the window → lower urgency, candidate for decommissioning (caveat: attribution gaps above).
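The scoped lookup and the two correlations can be illustrated in Python over rows that have already been retrieved. The function name is hypothetical and the rows are assumed to be dicts carrying an `AgentName` key; the point is that membership in a small name set replaces a fleet-wide join.

```python
def classify_flagged(flagged_names, activity_rows):
    """Split flagged agents into those observed in activity rows and those not observed."""
    flagged = set(flagged_names)
    interaction_counts = {}
    for row in activity_rows:
        name = row.get("AgentName")
        # Scoped filter on the name set; rows with empty AgentName never match
        if name in flagged:
            interaction_counts[name] = interaction_counts.get(name, 0) + 1
    # "Not observed" is weaker than "dormant": attribution gaps can hide real usage
    not_observed = sorted(flagged - set(interaction_counts))
    return interaction_counts, not_observed
```

Agents in the first result are the active-and-dangerous candidates; agents in the second are only candidates for the configured-but-dormant bucket, subject to the attribution caveat.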

For deeper runtime reconstruction (data accessed, tools invoked, jailbreak detections), hand off to the dedicated query library queries/cloud/copilot_activity_investigation.md rather than duplicating queries here.

Keep this phase thin and scoped: the posture skill owns configuration assessment; copilot_activity_investigation.md owns runtime reconstruction. Reference, don't duplicate.


Sample KQL Queries

All queries below are validated against the live AgentsInfo table. Use them exactly as written, substituting only where noted. Because the rich agent detail lives in the RawAgentInfo dynamic column, several queries parse RawAgentInfo.declarativeCopilotMetadata (DCM). DCM is present only for the connector-sourced subset of agents — queries that depend on it carry a coverage caveat.

Query 1: Global Inventory Summary

AgentsInfo
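// FirstSeen keeps each agent's oldest snapshot so EarliestRecord reflects the start of the data window
| summarize arg_max(Timestamp, *), FirstSeen = min(Timestamp) by AgentId
| extend CreatorId = tostring(RawAgentInfo.creatorId)
| summarize
    UniqueAgents = dcount(AgentId),
    EarliestRecord = min(FirstSeen),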
    LatestRecord = max(Timestamp),
    Published = countif(PublishedStatus == "Published"),
    Draft = countif(PublishedStatus == "Draft"),
    Deleted = countif(LifecycleStatus == "Deleted"),
    UniquePlatforms = dcount(Platform),
    UniqueCreators = dcount(CreatorId)

Note: UniqueCreators counts only agents with a populated RawAgentInfo.creatorId (the connector-sourced subset). It under-counts true creators; treat it as a lower bound.

Query 2: Status & Platform Breakdown

AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| summarize AgentCount = count() by Platform, PublishedStatus
| order by AgentCount desc

⚠️ Authentication-type gap: The deprecated AIAgentsInfo table broke this down by UserAuthenticationType. AgentsInfo has no populated authentication-type column, so this query reports status by platform instead. For exposure, use Q3 (access posture) and Q4 (broadly-accessible agents).

Query 3: Access Posture Distribution

AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| extend AppType = tostring(RawAgentInfo.appType),
         AllowAllUsers = tostring(RawAgentInfo.allowForAllUsers)
| summarize AgentCount = count() by Platform, AppType, AllowAllUsers
| order by AgentCount desc

Interpretation: appType == "lob" (line-of-business) agents are owner-scoped; appType == "shared" are shared more widely. allowForAllUsers == "true" (any platform) is the broad-exposure signal — these reach every tenant user. This replaces the old AccessControlPolicy distribution.

Query 4: Broadly-Accessible Agents

🔴 Security-critical query — agents with allowForAllUsers == "true" are accessible to all tenant users. This is the closest available proxy for the old "unauthenticated / Any access" exposure signal (see the authentication-type gap).

AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| extend AllowAllUsers = tostring(RawAgentInfo.allowForAllUsers),
         AppType = tostring(RawAgentInfo.appType),
         CreatorId = tostring(RawAgentInfo.creatorId)
| where AllowAllUsers == "true"
| project Name, Platform, PublishedStatus, AppType, CreatorId, AgentId, CreatedDateTime, Description
| order by PublishedStatus asc, CreatedDateTime desc

Post-processing: For each broadly-accessible agent, note:

  • Is it Published (active) or Draft?
  • Cross-reference with Q5 (email-capable) and Q7 (knowledge sources) for compounding XPIA / reconnaissance risk.

🔴 Capability Reconnaissance Risk (Attack Scenario 3): Broadly-accessible agents are prime targets for adversarial probing. Published agents with knowledge sources containing customer/internal data are the highest-priority findings.

Query 5: XPIA Email Exfiltration Risk (Email-Capable Agents)

🔴 Security-critical query — agents that can send email via a connector operation. A successful prompt-injection (XPIA) attack could direct the agent to exfiltrate data to arbitrary recipients.

Coverage caveat: Detects email-send operations declared in RawAgentInfo.declarativeCopilotMetadata (DCM). DCM is present only for the connector-sourced agent subset. The old IsGenerativeOrchestrationEnabled flag and action-level inputs (AI-controlled vs hardcoded recipient) are not available in AgentsInfo — this query identifies capability, not orchestration mode.

AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where isnotempty(tostring(RawAgentInfo.declarativeCopilotMetadata))
| mv-expand DCM = RawAgentInfo.declarativeCopilotMetadata
| mv-expand Action = DCM.actions
| mv-expand Api = Action.apis
| mv-expand Op = Api.operations
| extend OperationId = tostring(Op.operationId)
| where OperationId has "Send an email" or OperationId has "SendEmail"
| extend AllowAllUsers = tostring(RawAgentInfo.allowForAllUsers),
         CreatorId = tostring(RawAgentInfo.creatorId)
| summarize EmailOperations = make_set(OperationId)
    by AgentId, Name, Platform, PublishedStatus, AllowAllUsers, CreatorId
| order by AllowAllUsers desc, PublishedStatus asc

Post-processing:

  • AllowAllUsers == "true" → email-capable and broadly accessible = highest XPIA risk (any tenant user can trigger the chain).
  • Cross-reference with Q7: an email-capable agent that also has knowledge/data sources is the documented XPIA exfiltration pattern (Attack Scenario 2). Prioritize these for Defender Runtime Protection.

🔴 Attack Scenario Mapping: This query detects the agent-configuration precondition (email-send capability) for two documented scenarios — Malicious Instruction Injection via Event Trigger and Prompt Injection via Shared Document. Broadly-accessible email-capable agents (no access restriction + email) are the most dangerous.

Query 6: MCP Tool Inventory Across Agents

🟠 Governance query — MCP servers give agents access to external systems, Graph API, Sentinel data, and more. Uncontrolled MCP proliferation increases the attack surface. AgentsInfo exposes a dedicated McpServers column (cleaner than the old tool-detail parse).

AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where array_length(McpServers) > 0
| mv-expand Mcp = McpServers
| extend McpName = tostring(Mcp.name)
| extend CreatorId = tostring(RawAgentInfo.creatorId),
         AllowAllUsers = tostring(RawAgentInfo.allowForAllUsers)
| summarize McpServerList = make_set(McpName), McpToolCount = dcount(McpName)
    by AgentId, Name, Platform, CreatorId, AllowAllUsers
| order by McpToolCount desc

MCP server distribution (which servers appear on the most agents):

AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where array_length(McpServers) > 0
| mv-expand Mcp = McpServers
| summarize AgentCount = dcount(AgentId) by McpServer = tostring(Mcp.name)
| order by AgentCount desc

Note: McpServers is flat ({name, description} only) — no server URLs or credential config. For external MCP endpoint detail (host/scheme/port), use Q9, which parses RemoteMCPServer serverUrls from DCM.

Query 7: Knowledge / Data Source Audit

🟡 Data exposure query — identifies what data sources agents declare. In AgentsInfo, declared sources appear in the DeclaredDataSources column as an array of source/filename strings.

AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where array_length(DeclaredDataSources) > 0
| mv-expand DS = DeclaredDataSources
| extend DataSource = tostring(DS)
| extend AllowAllUsers = tostring(RawAgentInfo.allowForAllUsers),
         CreatorId = tostring(RawAgentInfo.creatorId)
| summarize DataSources = make_set(DataSource), SourceCount = dcount(DataSource)
    by AgentId, Name, Platform, AllowAllUsers, CreatorId
| order by SourceCount desc

Post-processing — flag high-risk combinations:

  • Data sources + allowForAllUsers == "true" → internal data potentially exposed broadly.
  • Any data source on an agent that is also email-capable (Q5) → XPIA exfiltration chain.

Coverage caveat: DeclaredDataSources is sparse and stores source names/filenames (e.g., Priority-Banking-Policy.docx), not the richer $kind/site structure the old KnowledgeDetails column held. Source type classification (SharePoint vs public site vs federated) is not reliably available — report the declared source names and flag broadly-accessible agents that carry any.

🔴 Document Injection Risk (Attack Scenario 2): Data sources are the primary vector for indirect prompt injection (XPIA). Cross-reference with Q5: agents that combine declared data sources with an email-send operation are the textbook XPIA exfiltration pattern — flag these as highest priority in the Knowledge Source Risk dimension.
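The cross-references between Q4, Q5, and Q7 described above are set intersections on `AgentId`, which all three queries return. A minimal Python sketch, assuming the `AgentId` values from each result have been exported into sets; the function name and sample IDs are hypothetical:

```python
def compounding_risk(broad, email, data):
    """Return the AgentIds that fall into each compounding-risk combination."""
    return {
        # Email-capable agents that also declare data sources (Q5 and Q7)
        "xpia_chain": sorted(email & data),
        # Broadly-accessible agents that declare data sources (Q4 and Q7)
        "broad_with_data": sorted(broad & data),
        # All three signals on one agent (Q4, Q5 and Q7)
        "broad_xpia_chain": sorted(broad & email & data),
    }


# Hypothetical AgentId values standing in for GUIDs
broad = {"a1", "a2"}
email = {"a2", "a3"}
data = {"a2", "a3", "a4"}
risk = compounding_risk(broad, email, data)
```

Because Q5 only covers the DCM-bearing subset, an empty `xpia_chain` here is a floor, not proof that no such agent exists.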

Query 8: Hard-Coded Credential Scan

🔴 Security-critical query — scans agent Instructions and the connector metadata in RawAgentInfo for patterns matching API keys, JWTs, Basic auth headers, and embedded credentials.

let suspicious_patterns = @"(AKIA[0-9A-Z]{16})|(AIza[0-9A-Za-z_\-]{35})|(xox[baprs]-[0-9a-zA-Z]{10,48})|(ghp_[A-Za-z0-9]{36,59})|(sk_(live|test)_[A-Za-z0-9]{24})|(SG\.[A-Za-z0-9]{22}\.[A-Za-z0-9]{43})|(eyJ[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+)|(Authorization\s*:\s*Basic\s+[A-Za-z0-9=:+]+)|([A-Za-z]+:\/\/[^\/\s]+:[^\/\s]+@[^\/\s]+)";
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| extend Haystack = strcat(tostring(Instructions), " ", tostring(RawAgentInfo.declarativeCopilotMetadata))
| where Haystack matches regex suspicious_patterns
| project Name, Platform, PublishedStatus,
          CreatorId = tostring(RawAgentInfo.creatorId), AgentId

Post-processing:

  • Published agents with credential matches = immediate remediation required.
  • Recommend Azure Key Vault + environment variables instead of hard-coded secrets.
  • The JWT (eyJ...) and url://user:pass@host patterns can false-positive on example payloads — manually review each match.
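For the manual review step, it helps to know which sub-pattern fired. The sketch below splits the Q8 alternation into its individual sub-patterns using Python's `re` module. The pattern text is copied from Q8; the labels (for example `aws_access_key_id`) are annotations added here based on the well-known key prefixes and are not part of the skill. It assumes the matched agent's Instructions or metadata text has been exported as a string.

```python
import re

# Sub-patterns copied from the Q8 alternation. Labels are illustrative annotations.
PATTERNS = {
    "aws_access_key_id": r"AKIA[0-9A-Z]{16}",
    "google_api_key": r"AIza[0-9A-Za-z_\-]{35}",
    "slack_token": r"xox[baprs]-[0-9a-zA-Z]{10,48}",
    "github_pat": r"ghp_[A-Za-z0-9]{36,59}",
    "stripe_key": r"sk_(live|test)_[A-Za-z0-9]{24}",
    "sendgrid_key": r"SG\.[A-Za-z0-9]{22}\.[A-Za-z0-9]{43}",
    "jwt": r"eyJ[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+",
    "basic_auth_header": r"Authorization\s*:\s*Basic\s+[A-Za-z0-9=:+]+",
    "url_credentials": r"[A-Za-z]+:\/\/[^\/\s]+:[^\/\s]+@[^\/\s]+",
}
COMPILED = {label: re.compile(pattern) for label, pattern in PATTERNS.items()}


def matched_labels(text):
    """Return the labels of every sub-pattern found anywhere in the text."""
    return [label for label, rx in COMPILED.items() if rx.search(text)]


# Synthetic strings built for the example; none is a real credential
fake_key = "key=" + "AKIA" + "A" * 16
fake_url = "postgres://svc:hunter2@db.example.com/app"
clean = "Summarize the weekly report"
```

A match labelled `jwt` or `url_credentials` deserves the closest look, since those are the two patterns called out above as prone to false positives.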

Query 9: External Endpoint & HTTP Risk

🟠 Network risk query — inventories the external hosts that agent connectors reach, and flags insecure schemes or non-standard ports. External endpoints are declared in DCM apis[].serverUrls for OpenApi and RemoteMCPServer connector types (these are populated; api_action Power Platform connectors abstract the URL and are not covered).

AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where isnotempty(tostring(RawAgentInfo.declarativeCopilotMetadata))
| mv-expand DCM = RawAgentInfo.declarativeCopilotMetadata
| mv-expand Action = DCM.actions
| mv-expand Api = Action.apis
| extend ApiType = tostring(Api.type)
| where ApiType in ("OpenApi", "RemoteMCPServer")
| mv-expand Url = Api.serverUrls
| extend Url = tostring(Url)
| where isnotempty(Url)
| extend Host = tostring(parse_url(Url).Host),
         Port = tostring(parse_url(Url).Port),
         Scheme = tostring(parse_url(Url).Scheme)
| extend NonStandardPort = isnotempty(Port) and Port !in ("443", "80", ""),
         InsecureScheme = Scheme != "https"
| project Name, Platform, ApiType, Scheme, Host, Port, Url,
          NonStandardPort, InsecureScheme,
          AllowAllUsers = tostring(RawAgentInfo.allowForAllUsers)
| order by NonStandardPort desc, InsecureScheme desc, Host asc

Post-processing:

  • InsecureScheme == true (non-HTTPS) or NonStandardPort == true → review the connector; data may transit insecurely.
  • Unfamiliar external hosts on broadly-accessible agents (AllowAllUsers == "true") → highest priority.

Coverage caveat: Only OpenApi + RemoteMCPServer connectors declare serverUrls. Power Platform api_action connectors (the majority) do not expose a URL here, so their destinations are not inventoried by this query. The old topic-level HttpRequestAction parsing is not applicable to AgentsInfo.
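To spot-check individual URLs outside the query, the same two flags can be reproduced with the Python standard library. This is a sketch that mirrors the Q9 logic, not a replacement for it; `endpoint_flags` is a hypothetical helper name. One difference to keep in mind: `urlsplit` lowercases the scheme, whereas Q9 compares the scheme string as returned by `parse_url`.

```python
from urllib.parse import urlsplit


def endpoint_flags(url):
    """Compute Q9-style risk flags for a single server URL."""
    parts = urlsplit(url)
    port = parts.port  # None when the URL carries no explicit port
    return {
        "host": parts.hostname,
        "scheme": parts.scheme,
        "port": port,
        "insecure_scheme": parts.scheme != "https",
        "non_standard_port": port is not None and port not in (80, 443),
    }


# Hypothetical endpoints for illustration
plain_http = endpoint_flags("http://api.example.com:8080/v1")
tls_default = endpoint_flags("https://mcp.example.com/sse")
```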

Query 10: Top Creators & Naming Hygiene

👥 Governance query — identifies prolific agent creators and names lacking descriptiveness. Creator is a GUID in RawAgentInfo.creatorId; resolve to UPN via an IdentityInfo join.

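let IdMap = materialize(IdentityInfo
    | where isnotempty(AccountObjectId) and isnotempty(AccountUpn)
    // Keep one UPN per object ID so the join below cannot duplicate agent rows
    | summarize AccountUpn = take_any(AccountUpn) by AccountObjectId);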
AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| extend CreatorId = tostring(RawAgentInfo.creatorId)
| where isnotempty(CreatorId)
| join kind=leftouter IdMap on $left.CreatorId == $right.AccountObjectId
| extend CreatorUpn = coalesce(AccountUpn, CreatorId)
| summarize
    AgentCount = count(),
    PublishedCount = countif(PublishedStatus == "Published"),
    GenericNameCount = countif(Name in~ ("Agent", "Test", "New Agent")),
    NoDescriptionCount = countif(isempty(Description)),
    AgentNames = make_set(Name, 10)
    by CreatorUpn
| order by AgentCount desc
| take 20

Coverage caveat: Only agents with a populated RawAgentInfo.creatorId are attributed. Creators whose GUID does not resolve in IdentityInfo fall back to the raw GUID. A single creator with a very high AgentCount is a sprawl signal worth investigating.

Query 11: Agent Creation Trend

📈 Trend query — shows agent creation velocity over time to detect sprawl acceleration.

AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where isnotempty(CreatedDateTime)
| summarize AgentsCreated = count() by bin(CreatedDateTime, 7d)
| order by CreatedDateTime asc

Query 12: Full Capability / Tools Inventory

🛠️ Tools governance query — catalogs the operations agents can invoke across all connector types, to understand the full capability surface. Parses DCM operations (operationId + API type).

AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where isnotempty(tostring(RawAgentInfo.declarativeCopilotMetadata))
| mv-expand DCM = RawAgentInfo.declarativeCopilotMetadata
| mv-expand Action = DCM.actions
| mv-expand Api = Action.apis
| mv-expand Op = Api.operations
| extend OperationId = tostring(Op.operationId), ApiType = tostring(Api.type)
| where isnotempty(OperationId)
| summarize AgentCount = dcount(AgentId), Agents = make_set(Name, 5) by OperationId, ApiType
| order by AgentCount desc

Alternative for non-DCM agents — the flat DeclaredTools column ({type, name}) covers agents without DCM:

AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where array_length(DeclaredTools) > 0
| mv-expand Tool = DeclaredTools
| summarize AgentCount = dcount(AgentId)
    by ToolType = tostring(Tool.type), ToolName = tostring(Tool.name)
| order by AgentCount desc

Coverage caveat: The DCM query yields deep operation-level detail but only for the connector-sourced subset. The DeclaredTools fallback is broader but flatter (tool name/type only, no operation IDs). Run both for the fullest picture.

Query 13: Operation-Level Privilege Mapping

🔐 Privilege query — buckets every declared operation into a sensitivity category (mail-send, directory-write, data-write, messaging, security-tooling, read/other) to surface where write/exfiltration capability concentrates. Feeds the Capability Privilege Index supplementary indicator.

AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where isnotempty(tostring(RawAgentInfo.declarativeCopilotMetadata))
| mv-expand DCM = RawAgentInfo.declarativeCopilotMetadata
| mv-expand Action = DCM.actions
| mv-expand Api = Action.apis
| mv-expand Op = Api.operations
| extend OperationId = tostring(Op.operationId)
| where isnotempty(OperationId)
| extend PrivilegeCategory = case(
    OperationId has_any ("Send an email", "SendEmail", "Send email"), "Mail-Send",
    OperationId has_any ("AddUserToGroup", "RemoveMember", "UpdatePerson", "UpdateOrganisation", "Create user", "Delete user", "Update user", "Assign"), "Directory-Write",
    OperationId has_any ("unbound action", "Create a row", "Update a row", "Delete a row", "Create record", "Update record"), "Data-Write",
    OperationId has_any ("Post message", "Post a message", "Send message", "Create chat", "post in a chat"), "Messaging",
    OperationId has_any ("Security Copilot", "Sentinel"), "Security-Tooling",
    "Other/Read")
| summarize AgentCount = dcount(AgentId) by PrivilegeCategory
| order by AgentCount desc

Capability Privilege Index — distinct agents holding ≥1 sensitive (write/send) operation, split by broad access:

AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| where isnotempty(tostring(RawAgentInfo.declarativeCopilotMetadata))
| extend AllowAllUsers = tostring(RawAgentInfo.allowForAllUsers)
| mv-expand DCM = RawAgentInfo.declarativeCopilotMetadata
| mv-expand Action = DCM.actions
| mv-expand Api = Action.apis
| mv-expand Op = Api.operations
| extend OperationId = tostring(Op.operationId)
| where OperationId has_any ("Send an email", "SendEmail", "AddUserToGroup", "RemoveMember", "UpdatePerson", "UpdateOrganisation", "unbound action", "Create a row", "Update a row", "Delete a row", "Post message", "post in a chat")
| summarize SensitiveAgents = dcount(AgentId),
            BroadAndSensitive = dcountif(AgentId, AllowAllUsers == "true")

Interpretation: BroadAndSensitive > 0 is a direct privilege-abuse signal — a broadly-accessible agent that can write to the directory, write data, or send mail. These agents justify maxing the Broad Access and/or XPIA dimensions. Tune the operation keyword lists to your tenant's connector set.
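When tuning the keyword lists, it can be easier to iterate on the bucketing offline. The sketch below mirrors the first-match-wins order of the `case()` expression in Python and also groups categories per agent, which the Q13 summaries (counts only) do not return. It assumes rows of `Name` and `OperationId` exported before the final `summarize`; all function names are hypothetical. Matching here is a case-insensitive substring test, which is looser than KQL `has_any` term matching (for example, "Assign" also matches "Assignments").

```python
from collections import defaultdict

# Keyword lists copied from Query 13, in the same evaluation order
CATEGORIES = [
    ("Mail-Send", ["Send an email", "SendEmail", "Send email"]),
    ("Directory-Write", ["AddUserToGroup", "RemoveMember", "UpdatePerson",
                         "UpdateOrganisation", "Create user", "Delete user",
                         "Update user", "Assign"]),
    ("Data-Write", ["unbound action", "Create a row", "Update a row",
                    "Delete a row", "Create record", "Update record"]),
    ("Messaging", ["Post message", "Post a message", "Send message",
                   "Create chat", "post in a chat"]),
    ("Security-Tooling", ["Security Copilot", "Sentinel"]),
]
# Assumption: these four buckets are treated as the write/send categories
SENSITIVE = {"Mail-Send", "Directory-Write", "Data-Write", "Messaging"}


def categorize(operation_id):
    """Return the first category whose keyword occurs in the operation ID."""
    op = operation_id.lower()
    for category, keywords in CATEGORIES:
        if any(keyword.lower() in op for keyword in keywords):
            return category
    return "Other/Read"


def sensitive_agents(rows):
    """Map agent name to its sorted list of sensitive categories."""
    by_agent = defaultdict(set)
    for row in rows:
        category = categorize(row["OperationId"])
        if category in SENSITIVE:
            by_agent[row["Name"]].add(category)
    return {name: sorted(cats) for name, cats in sorted(by_agent.items())}


# Hypothetical exported rows
rows = [
    {"Name": "Mail Relay", "OperationId": "Send an email (V2)"},
    {"Name": "Mail Relay", "OperationId": "Get items"},
    {"Name": "CRM Bot", "OperationId": "Update a row"},
    {"Name": "Reader", "OperationId": "List rows"},
]
flagged = sensitive_agents(rows)
```

The keys of `flagged` are one possible source for the sensitive-op half of the name list used in the runtime correlation phase.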

Query 14: Deep-Manifest Coverage (Report Banner)

📊 Coverage query — reports what fraction of the fleet carries the deep declarativeCopilotMetadata (DCM) that the XPIA, endpoint, and capability queries depend on. Run this every report and surface the result as a banner so the analyst knows what was not fully inspected.

AgentsInfo
| summarize arg_max(Timestamp, *) by AgentId
| where LifecycleStatus != "Deleted"
| summarize Total = count(),
            WithDCM = countif(isnotempty(tostring(RawAgentInfo.declarativeCopilotMetadata))),
            WithInstructions = countif(isnotempty(Instructions)),
            WithObservabilityID = countif(isnotempty(ObservabilityID)),
            WithEntraAgentID = countif(isnotempty(EntraAgentID))
| extend DcmCoveragePct = round(100.0 * WithDCM / Total, 1),
         InstrCoveragePct = round(100.0 * WithInstructions / Total, 1),
         ObsIdPct = round(100.0 * WithObservabilityID / Total, 1),
         EntraIdPct = round(100.0 * WithEntraAgentID / Total, 1)

Why both ID columns: ObservabilityID is near-universally populated (~100%) and is the natural runtime-correlation handle; EntraAgentID is sparse (only agents provisioned with an Entra Agent ID). Report both so the analyst knows which runtime/identity correlations are feasible.
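The percentage arithmetic and the banner text can be sketched as follows. This assumes the single Q14 result row has been exported as a dictionary; `pct` and `coverage_banner` are hypothetical helper names, and the zero-total guard is an addition that the KQL does not include.

```python
def pct(part, total):
    """Percentage rounded to one decimal place; 0.0 when the fleet is empty."""
    return round(100.0 * part / total, 1) if total else 0.0


def coverage_banner(row):
    """Render a one-line coverage banner from a Q14-style result row."""
    total = row["Total"]
    with_dcm = row["WithDCM"]
    return (
        f"Deep-Manifest Coverage (Q14): {with_dcm}/{total} agents "
        f"({pct(with_dcm, total)}%) carry declarativeCopilotMetadata; "
        f"{total - with_dcm} were inventoried but not deeply inspected."
    )


# Hypothetical counts for illustration
banner = coverage_banner({"Total": 240, "WithDCM": 24})
```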

Query 15: Runtime Correlation — Active-and-Dangerous (Scoped)

🎯 Runtime query (Phase 5) — confirms which flagged agents are actually active. Scoped by name list — no fleet-wide join (see Phase 5 for why a full join times out). Populate FlaggedNames from the Q4 broadly-accessible and Q13 sensitive-op results.

let FlaggedNames = dynamic(["<broadly-accessible or sensitive-op agent names from Q4/Q13>"]);
CopilotActivity
| where TimeGenerated > ago(7d)
| where AgentName in (FlaggedNames)
| summarize Interactions = count(),
            DistinctUsers = dcount(ActorUserId),
            LastSeen = max(TimeGenerated),
            SrcIPs = dcount(SrcIpAddr) by AgentName
| order by Interactions desc

Interpretation: A flagged agent appearing here with real Interactions is active-and-dangerous — prioritize for remediation over dormant flagged agents. Join key is AgentName (CopilotActivity.AgentId is a composite prefixed string that does NOT equal AgentsInfo.AgentId). Absence here does not prove dormancy — most CopilotActivity rows are unattributed (empty AgentName). AIModelName is sparse in this table; do not rely on it for model inventory.
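Building the `FlaggedNames` literal by hand is error-prone when names contain quotes. The sketch below generates the `let` line from a list of names. It assumes the names were collected from Q4 and from an adapted Q13 run that keeps `Name` (Q13 as written returns counts only), and that JSON-style string escaping is acceptable inside `dynamic()`. The function name is hypothetical.

```python
import json


def flagged_names_let(names):
    """Render the Query 15 `let FlaggedNames = dynamic([...]);` line."""
    # De-duplicate and drop empty entries; names are otherwise left untouched
    # because Query 15 matches AgentName exactly.
    unique = sorted({name for name in names if name})
    return f"let FlaggedNames = dynamic({json.dumps(unique, ensure_ascii=False)});"


# Hypothetical agent names
let_line = flagged_names_let(["Mail Relay", "HR Helper", "Mail Relay", ""])
```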


Output Modes

Mode 1: Inline Chat Summary

Render the full analysis directly in the chat response. Best for quick review.

Mode 2: Markdown File Report

Save a comprehensive report to disk at:

reports/ai-agent-posture/AI_Agent_Posture_Report_YYYYMMDD_HHMMSS.md

Mode 3: Both

Generate the markdown file AND provide an inline summary in chat.

Always ask the user which mode before generating output.


Inline Report Template

Render the following sections in order. Omit sections only if explicitly noted as conditional.

🔴 URL Rule: All hyperlinks in the report MUST be copied verbatim from the URL Registry above. Do NOT generate, recall from memory, or paraphrase any URL. If a needed URL is not in the registry, use plain text (no hyperlink).

# 🤖 AI Agent Security Posture Report

**Generated:** YYYY-MM-DD HH:MM UTC
**Data Source:** AgentsInfo (Advanced Hunting)
**Analysis Period:** <EarliestRecord> → <LatestRecord>
**Platforms:** <list discovered Platform values>

---

> 📊 **Deep-Manifest Coverage (Q14):** `<WithDCM>/<Total>` agents (**<DcmCoveragePct>%**) carry `declarativeCopilotMetadata` — the XPIA, external-endpoint, and capability findings below cover **only this subset**. Instructions present on **<InstrCoveragePct>%**, ObservabilityID on **<ObsIdPct>%** (runtime-correlation handle), EntraAgentID on **<EntraIdPct>%**. The remaining `<Total - WithDCM>` agents were inventoried but not deeply inspected.

---

## Executive Summary

<2-3 sentences: total agents, key risk findings, overall score>

**Overall Risk Rating:** 🔴/🟠/🟡/✅ <RATING> (<Score>/100)

---

## Key Metrics

| Metric | Value |
|--------|-------|
| Total Agents (non-deleted) | <N> |
| Published Agents | <N> |
| Draft Agents | <N> |
| Platforms Represented | <N> |
| Resolved Creators (lower bound) | <N> |
| Broadly-Accessible Agents (allowForAllUsers) | <N> |
| Agents with MCP Servers | <N> |
| Agents with Declared Data Sources | <N> |
| Email-Capable Agents (XPIA Risk) | <N> |

> ℹ️ **Coverage note:** Creator and capability metrics are derived from `RawAgentInfo.creatorId` and `declarativeCopilotMetadata`, which are sparsely populated. Counts marked "lower bound" reflect only agents with the relevant field present; see the per-section caveats.

---

## 🔓 Access Posture

> **Authentication-type gap:** `AgentsInfo` has no equivalent to the old `UserAuthenticationType` (None/Microsoft/Custom). The `ToolsAuthenticationType` column is effectively empty in practice. Access exposure is assessed via the `RawAgentInfo.allowForAllUsers` governance signal instead — a **proxy for broad exposure, not an authentication state**.

### Access Distribution (Q3)
| App Type | Allow-All-Users | Count |
|----------|-----------------|-------|
| <appType> | <true/false> | <N> |

### 🔴 Broadly-Accessible Agents (Q4)

<If Q4 returns results:>
| Agent Name | Platform | App Type | Published | Created |
|------------|----------|----------|-----------|---------|
| <name> | <platform> | <appType> | <status> | <date> |

<If Q4 returns 0:>
✅ No broadly-accessible agents (`allowForAllUsers == "true"`) detected.

---

## 📧 XPIA Email Exfiltration Risk

<If Q5 returns results:>
| Agent Name | Platform | Email Operation | Broadly Accessible |
|------------|----------|-----------------|--------------------|
| <name> | <platform> | <operationId> | 🔴 Yes / 🟢 No |

**Risk Assessment:**
- 🔴 Email-capable agents can be exploited via XPIA to exfiltrate data, especially when combined with declared data sources (Q7).
- ⚠️ Recommendation: Review recipient controls; apply Power Platform DLP and Defender Runtime Protection.

> **Coverage caveat:** Email capability is detected from DCM `operations[].operationId` (e.g., "Send an email", "SendEmail"). There is no longer a GenAI-orchestration flag or an `inputs` field, so AI-controlled-vs-hardcoded recipient distinction is **not available** — treat all email-capable agents as candidates. Only the DCM-bearing subset is covered.

<If Q5 returns 0:>
✅ No email-capable agents detected in the DCM-bearing subset.

---

## 🛠️ MCP Server Exposure

<If Q6 returns results:>
| Agent Name | Platform | MCP Servers | Broadly Accessible |
|------------|----------|-------------|--------------------|
| <name> | <platform> | <server list> | <yes/no> |

**MCP Server Distribution:**
| MCP Server | Agent Count |
|------------|-------------|
| <server> | <N> |

<If Q6 returns 0:>
✅ No agents with MCP servers detected.

> **Coverage caveat:** The `McpServers` column is flat (`{name, description}` only) — no server URLs, credential config, or transport detail. Non-HTTPS/hardcoded-cred MCP detection from the old schema is not possible here.

> **Dimension note:** MCP exposure and the External Endpoint findings (below) both feed the single **Tool & Endpoint Exposure** score dimension. Any insecure scheme, non-standard port, or external endpoint on a broadly-accessible agent escalates that dimension to High regardless of MCP count.

---

## 📚 Declared Data Source Exposure

<If Q7 returns results:>
| Agent Name | Platform | Data Sources | Broadly Accessible |
|------------|----------|--------------|--------------------|
| <name> | <platform> | <source names> | <yes/no> |

**⚠️ High-Risk Combinations:**
<List agents with declared data sources + allowForAllUsers == "true", and agents combining data sources with email capability (Q5)>

<If Q7 returns 0:>
✅ No declared data sources found on any agents.

> **Coverage caveat:** `DeclaredDataSources` stores source **names/filenames** only — source *type* classification (SharePoint vs public site vs federated) is not available.

---

## 🔑 Credential Hygiene

<If Q8 returns results:>
🔴 **Hard-coded credential patterns detected in <N> agent(s):**
| Agent Name | Platform | Status | Creator |
|------------|----------|--------|---------|
| <name> | <platform> | <status> | <creatorId/upn> |

⚠️ **Recommendation:** Move secrets to Azure Key Vault; use environment variables at runtime.

<If Q8 returns 0:>
✅ No hard-coded credential patterns detected in agent instructions or connector metadata.

---

## 🌐 External Endpoint & HTTP Risk

<If Q9 returns results:>
| Agent | API Type | Scheme | Host | Port | Insecure | Non-Standard Port |
|-------|----------|--------|------|------|----------|-------------------|
| <name> | <OpenApi/RemoteMCPServer> | <scheme> | <host> | <port> | 🔴/🟢 | 🔴/🟢 |

<If Q9 returns 0:>
✅ No external endpoints declared by `OpenApi` or `RemoteMCPServer` connectors in the DCM-bearing subset.

> **Coverage caveat:** Only `OpenApi` + `RemoteMCPServer` connectors declare `serverUrls`. Power Platform `api_action` connectors do not expose destination URLs.

---

## 👥 Creator Governance

### Top Creators
| Creator | Agents | Published | Generic Names | No Description |
|---------|--------|-----------|---------------|----------------|
| <upn/creatorId> | <N> | <N> | <N> | <N> |

### Naming Hygiene
- Agents with generic names ("Agent", "Test"): <N>
- Agents with no description: <N>

> **Coverage caveat:** Only agents with a populated `RawAgentInfo.creatorId` are attributed; GUIDs unresolved in `IdentityInfo` fall back to the raw GUID.

---

## 📈 Agent Creation Trend

<ASCII bar chart or summary table of Q11 results — weekly agent creation counts>

---

## 🛠️ Full Capability / Tools Inventory

| Operation / Tool | API / Tool Type | Agent Count | Example Agents |
|------------------|-----------------|-------------|----------------|
| <operationId/name> | <type> | <N> | <agent names> |

---

## 🔐 Capability Privilege Index (Supplementary — not summed into score)

**Operation sensitivity distribution (Q13):**
| Privilege Category | Agent Count |
|--------------------|-------------|
| Mail-Send | <N> |
| Directory-Write | <N> |
| Data-Write | <N> |
| Messaging | <N> |
| Security-Tooling | <N> |
| Other/Read | <N> |

**Index:** <SensitiveAgents> agent(s) hold ≥1 sensitive (write/send) operation; **<BroadAndSensitive>** of those are also broadly accessible (`allowForAllUsers == "true"`).

<If BroadAndSensitive > 0:>
🔴 **<BroadAndSensitive> broadly-accessible agent(s) with sensitive write/send capability** — direct privilege-abuse exposure. These justify maxing the Broad Access and/or XPIA dimensions.

<If BroadAndSensitive == 0:>
✅ No broadly-accessible agents hold sensitive write/send operations (within the DCM-bearing subset).

> Supplementary indicator — provides privilege context but is **not** added to the /100 composite. Coverage limited to the DCM-bearing subset (see banner).

---

## 🎯 Runtime Correlation — Active-and-Dangerous (Q15, Optional)

<If Phase 5 was run — flagged agents correlated against CopilotActivity:>
| Agent Name | Interactions | Distinct Users | Source IPs | Last Seen |
|------------|--------------|----------------|------------|-----------|
| <name> | <N> | <N> | <N> | <date> |

🔴 **Active-and-dangerous:** Flagged agents (broadly accessible / sensitive ops) confirmed active at runtime — prioritize for remediation over dormant flagged agents.

<If no flagged agents appear in CopilotActivity:>
✅ No flagged agents showed runtime activity in the window. *(Caveat: most `CopilotActivity` rows are unattributed — absence does not prove dormancy.)*

> Scoped name-based lookup (`AgentName` key). Runtime attribution is inherently low-coverage; this section confirms presence, not absence.

---

## Agent Security Score Card

```
┌──────────────────────────────────────────────────────┐
│          AGENT SECURITY SCORE: <NN>/100              │
│              Rating: <EMOJI> <RATING>                │
├──────────────────────────────────────────────────────┤
│ Broad Access     [<bar>] <N>/20  (<detail>)          │
│ XPIA Email Risk  [<bar>] <N>/20  (<detail>)          │
│ Tool & Endpt Expo[<bar>] <N>/20  (<detail>)          │
│ Data Source Risk [<bar>] <N>/20  (<detail>)          │
│ Credential Hygn  [<bar>] <N>/20  (<detail>)          │
├──────────────────────────────────────────────────────┤
│ Supplementary (not scored):                          │
│ Capability Privilege Index: <S> sensitive / <B> broad│
│ Deep-Manifest Coverage:     <DcmCoveragePct>%        │
└──────────────────────────────────────────────────────┘
```

---

## Security Assessment

| Factor | Finding |
|--------|---------|
| <emoji> **<Factor>** | <Evidence-based finding> |

---

## Recommendations

> **Key mitigation — Runtime:** For all high-risk agents, recommend enabling **Microsoft Defender Runtime Protection** — webhook-based real-time inspection that can block malicious tool invocations before execution. See [Real-time agent protection during runtime](https://learn.microsoft.com/en-us/defender-cloud-apps/real-time-agent-protection-during-runtime).

> **Key mitigation — Governance:** For fleet-wide governance gaps (sprawl, missing auth, uncontrolled tools), recommend adopting **[Microsoft Agent 365](https://www.microsoft.com/en-us/microsoft-365/blog/2025/11/18/microsoft-agent-365-the-control-plane-for-ai-agents/)** as the enterprise control plane — providing centralized Registry (inventory + quarantine), Access Control (Entra agent IDs + Policy Templates), Visualization (agent ↔ resource mapping), and Security (Defender + Purview integration).

1. <emoji> **<Priority action>** — <evidence and rationale>
2. ...

---

## Appendix: Query Execution Summary

| Query | Description | Records | Time |
|-------|-------------|---------|------|
| Q1 | Global Inventory | <N> | <time> |
| Q2 | Status & Platform Breakdown | <N> | <time> |
| ... | ... | ... | ... |
| Q13 | Operation-Level Privilege Mapping | <N> | <time> |
| Q14 | Deep-Manifest Coverage | <N> | <time> |
| Q15 | Runtime Correlation (scoped, optional) | <N> | <time> |

Markdown File Report Template

When outputting to markdown file, use the same structure as the Inline Report Template above, saved to:

reports/ai-agent-posture/AI_Agent_Posture_Report_YYYYMMDD_HHMMSS.md

Include the following additional sections in the file report; they are omitted from the inline report:

  1. Full agent detail table (all non-deleted agents with key fields)
  2. Per-platform breakdown (agent counts and creators by Platform)
  3. Complete data source listing (every declared source name, not just examples)
  4. Complete MCP agent listing (every MCP agent with full server list)
  5. Raw query references — note that full query definitions are in this SKILL.md file

File Report Header

# AI Agent Security Posture Report

**Generated:** YYYY-MM-DD HH:MM UTC
**Data Source:** AgentsInfo (Advanced Hunting)
**Analysis Period:** <EarliestRecord> → <LatestRecord> (<N> days)
**Platforms:** <list discovered Platform values>
**Total Agents:** <N> (Published: <N>, Draft: <N>)

---

> 📊 **Deep-Manifest Coverage (Q14):** `<WithDCM>/<Total>` agents (**<DcmCoveragePct>%**) carry `declarativeCopilotMetadata`; XPIA/endpoint/capability findings cover only this subset. ObservabilityID **<ObsIdPct>%**, EntraAgentID **<EntraIdPct>%**.

---

Include the Capability Privilege Index and (if Phase 5 ran) Runtime Correlation sections from the inline template in the file report as well.


Known Pitfalls

1. AgentsInfo Is Advanced Hunting Only

Problem: The AgentsInfo table does NOT exist in Sentinel Data Lake. Querying via mcp_sentinel-data_query_lake returns SemanticError: Failed to resolve table.

Solution: Always use RunAdvancedHuntingQuery. The table has 30-day retention in AH.

2. Multiple Records Per Agent (State Snapshots)

Problem: The table logs configuration snapshots over time. Querying without deduplication returns inflated counts and duplicate agent entries.

Solution: Always use | summarize arg_max(Timestamp, *) by AgentId to get the latest state per agent before any analysis. Note that AgentId is a GUID and the display columns are Name and Description (not AgentName/AgentDescription, as some docs state).
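
To make the deduplication rule concrete, here is a minimal Python sketch of what `summarize arg_max(Timestamp, *) by AgentId` does to a set of rows: it keeps only the newest snapshot per agent. The row shape (dicts with `AgentId`, `Timestamp`, `Name`), the sample values, and the ISO-8601 timestamp format are assumptions for illustration. In practice the deduplication should stay in the KQL query itself.

```python
from datetime import datetime


def latest_per_agent(rows):
    """Keep the newest snapshot per AgentId (mirrors arg_max(Timestamp, *) by AgentId)."""
    latest = {}
    for row in rows:
        ts = datetime.fromisoformat(row["Timestamp"])
        current = latest.get(row["AgentId"])
        if current is None or ts > datetime.fromisoformat(current["Timestamp"]):
            latest[row["AgentId"]] = row
    return list(latest.values())


# Hypothetical snapshots: agent "a1" appears twice, so a naive row count overstates the fleet.
snapshots = [
    {"AgentId": "a1", "Timestamp": "2026-01-10T08:00:00", "Name": "HR Bot (draft)"},
    {"AgentId": "a1", "Timestamp": "2026-01-12T09:30:00", "Name": "HR Bot"},
    {"AgentId": "a2", "Timestamp": "2026-01-11T07:15:00", "Name": "Finance Bot"},
]
deduped = latest_per_agent(snapshots)
print(len(snapshots), "rows ->", len(deduped), "agents")
```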

3. RawAgentInfo Is the Real Detail Source

Problem: The normalized columns (DeclaredTools, McpServers, DeclaredDataSources, Owners, Capabilities) are sparsely populated and flat. The rich governance/configuration detail lives in the RawAgentInfo dynamic column (populated for ~all agents) and, for the connector-sourced subset, in RawAgentInfo.declarativeCopilotMetadata (DCM).

Solution: For creator (RawAgentInfo.creatorId), broad access (RawAgentInfo.allowForAllUsers), app type (RawAgentInfo.appType), and deep capability/endpoint detail, parse RawAgentInfo. RawAgentInfo is dynamic — no double-parse needed; access nested keys directly with tostring(RawAgentInfo.key).

4. declarativeCopilotMetadata (DCM) Covers Only a Subset

Problem: Deep capability queries (Q5 email, Q9 endpoints, Q12 operations) depend on RawAgentInfo.declarativeCopilotMetadata, which is present for only ~10% of Copilot Studio agents (the connector-sourced subset). The majority have only a shallow manifest.

Solution: Always state the coverage caveat in reports. DCM path: declarativeCopilotMetadata[].actions[].apis[] with .type (OpenApi/RemoteMCPServer/api_action), .serverUrls[], and .operations[].operationId. Results from these queries are a floor, not a complete inventory.

5. Authentication-Type Detection Has No Equivalent

Problem: The old UserAuthenticationType (None/Microsoft/Custom) is gone. The ToolsAuthenticationType column exists in schema but is effectively empty (~100% blank). There is no way to classify agents as "unauthenticated" the way the old skill did.

Solution: Use RawAgentInfo.allowForAllUsers == "true" as a broad-exposure proxy (documented as a proxy, NOT an authentication state). Never claim an agent is "unauthenticated" — say "broadly accessible".

6. Many Schema Columns Are Empty in Practice

Problem: ToolsAuthenticationType, Availability, Endpoints, Triggers, Permissions, and Model are present in the schema but empty/null in practice. Queries built on them silently return 0 rows.

Solution: Do not build core logic on these columns. Validate population with a quick summarize countif(isnotempty(<col>)) before relying on a column. LifecycleStatus is blank for active agents (only Deleted is populated) — LifecycleStatus != "Deleted" correctly passes blanks.

7. creatorId Is a GUID — Join IdentityInfo for UPN

Problem: RawAgentInfo.creatorId is an Entra object GUID, not a UPN. There is no CreatorAccountUpn, LastModifiedByUpn, or LastPublishedByUpn equivalent.

Solution: Resolve via leftouter join to IdentityInfo on AccountObjectId, then coalesce(AccountUpn, CreatorId). Creator attribution is a lower bound — creatorId is sparse.

8. serverUrls Only Populated for OpenApi & RemoteMCPServer

Problem: External endpoint URLs in DCM apis[].serverUrls are populated for OpenApi and RemoteMCPServer connector types, but not for api_action (Power Platform connectors, the majority). Filtering all API types yields mostly empty URLs.

Solution: Filter ApiType in ("OpenApi", "RemoteMCPServer") before expanding serverUrls. State that api_action destinations are not inventoried.

9. McpServers Is Flat (Name/Description Only)

Problem: The dedicated McpServers column contains only {name, description} — no server URLs, credential configuration, or transport detail. Non-HTTPS MCP detection and hardcoded-cred-in-MCP detection from the old design are not possible.

Solution: Use McpServers for inventory/exposure counts only. For MCP server endpoints, fall back to the DCM RemoteMCPServer API type (Q9).

10. AH Booleans Are Textual True/False (Feb 25, 2026)

Problem: Since Feb 25, 2026, Advanced Hunting boolean results render as textual True/False, not 1/0. Governance flags from RawAgentInfo (e.g., allowForAllUsers) are JSON strings ("true"/"false").

Solution: Compare against the string form: tostring(RawAgentInfo.allowForAllUsers) == "true". Avoid == 1 / == true numeric/bool comparisons on parsed JSON values.

11. CopilotActivity Correlation — Composite AgentId & Fleet-Join Timeouts

Problem: Phase 5 runtime correlation against CopilotActivity has three traps: (1) CopilotActivity.AgentId is a composite/prefixed string (e.g., T_<tenant>.<guid>, CopilotStudio.Declarative.T_….gpt.<guid>, or literals like AgentBuilder) that does not equal the clean AgentsInfo.AgentId GUID, so ID joins return 0 matches. (2) A fleet-wide join of AgentsInfo to CopilotActivity (~15k agents × 100k+ rows, heavy RawAgentInfo) times out the AH endpoint. (3) Most CopilotActivity rows have an empty AgentName/AgentId (general M365 Copilot usage), so runtime attribution is low-coverage.

Solution: Use a scoped name-based lookup (Query 15): build a small flagged-name list from Q4/Q13, then CopilotActivity | where AgentName in (FlaggedNames) — no join. AgentName is the reliable cross-table key. Never join the full fleet. Treat absence from CopilotActivity as unconfirmed, not proof of dormancy. AIModelName is sparse here — do not use it for model inventory.


Quality Checklist

Before delivering the report, verify:

  • All queries used arg_max(Timestamp, *) by AgentId for deduplication
  • All queries filtered LifecycleStatus != "Deleted" (unless auditing deletions)
  • All queries ran via RunAdvancedHuntingQuery (not Data Lake)
  • Zero-result queries are reported with explicit absence confirmation (✅ pattern)
  • The Agent Security Score calculation is transparent with per-dimension evidence
  • Broadly-accessible agents are described as a proxy (NOT "unauthenticated"); the auth-type gap is stated
  • DCM-dependent sections (XPIA email, external endpoints, capability inventory) include the coverage caveat
  • Deep-Manifest Coverage banner (Q14) is present at the top of the report (DCM %, ObservabilityID %, EntraAgentID %)
  • Capability Privilege Index (Q13) is reported as a supplementary indicator, explicitly noted as NOT summed into the /100 score
  • If Phase 5 ran, runtime correlation is SCOPED by AgentName (Query 15) — never a fleet-wide join; absence is described as unconfirmed, not dormant
  • Score card uses the Tool & Endpoint Exposure dimension label (not "MCP Server Expo") and shows the two supplementary indicators
  • MCP server inventory includes server names, not just counts
  • Declared data sources note that source-type classification is unavailable
  • Creator governance resolves creatorId GUIDs via IdentityInfo and notes the lower-bound caveat
  • Recommendations are prioritized and evidence-based
  • All hyperlinks in the report are copied verbatim from the URL Registry — no fabricated or recalled-from-memory URLs
  • No PII from live environments in the SKILL.md file itself

SVG Dashboard Generation

📊 Optional post-report step. After an AI Agent Security Posture report is generated, the user can request a visual SVG dashboard.

Trigger phrases: "generate SVG dashboard", "create a visual dashboard", "visualize this report", "SVG from the report"

How to Request a Dashboard

  • Same chat: "Generate an SVG dashboard from the report" — data is already in context.
  • New chat: Attach or reference the report file, e.g. #file:reports/ai-agent-posture/AI_Agent_Posture_Report_<org>_<date>.md
  • Customization: Edit svg-widgets.yaml before requesting — the renderer reads it at generation time.

Execution

Step 1:  Read svg-widgets.yaml (this skill's widget manifest)
Step 2:  Read .github/skills/svg-dashboard/SKILL.md (rendering rules — Manifest Mode)
Step 3:  Read the completed report file (data source)
Step 4:  Render SVG → save to reports/ai-agent-posture/{report_name}_dashboard.svg

The YAML manifest is the single source of truth for layout, widgets, field mappings, colors, and data source documentation. All customization happens there.

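Audits the security posture of Entra ID app registrations and service principals, combining Graph API inventory with KQL attack-chain detection to assess permissions, owner risk, credential hygiene, and abuse signals, and provides a 5-dimension risk score.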
app registration posture service principal permissions dangerous app permissions app ownership app credential abuse SPN lateral movement app consent grant overprivileged apps cross-tenant SPN app registration kill chain app persistence credential add chain Graph API permissions audit
.github/skills/app-registration-posture/SKILL.md
npx skills add SCStelz/security-investigator --skill app-registration-posture -g -y
SKILL.md
Frontmatter
{
    "name": "app-registration-posture",
    "description": "Audit Entra ID app registration and service principal security posture. Triggers on keywords like \"app registration posture\", \"service principal permissions\", \"dangerous app permissions\", \"app ownership\", \"app credential abuse\", \"SPN lateral movement\", \"app consent grant\", \"overprivileged apps\", \"cross-tenant SPN\", \"app registration kill chain\", \"app persistence\", \"credential add chain\", \"Graph API permissions audit\". Combines Graph API current-state inventory (dangerous permissions, ownership, credential hygiene) with KQL chain detection (AuditLogs, AADServicePrincipalSignInLogs, AADUserRiskEvents, MicrosoftGraphActivityLogs) for posture assessment covering permission concentration, owner risk, credential hygiene, cross-tenant exposure, and active abuse signals. Includes 5-dimension App Permission Risk Score. Inline chat or markdown output.",
    "drill_down_prompt": "Run app registration posture audit — dangerous permissions, credential hygiene, abuse chains",
    "threat_pulse_domains": [
        "spn",
        "admin"
    ]
}

App Registration Security Posture — Instructions

Purpose

This skill audits the security posture of Entra ID App Registrations and Service Principals across your organization, combining Graph API current-state inventory with KQL attack chain detection to create a comprehensive assessment.

App Registrations are a growing persistence and lateral movement vector. Attackers who compromise a user with app ownership can add credentials (secrets/certificates), disconnect from the user session, and authenticate as the service principal — inheriting all the app's permissions. This is the exact pattern documented in the Guardz research and used in the SolarWinds/Solorigate attack.

What this skill covers:

| Domain | Key Questions Answered | Data Source |
|--------|------------------------|-------------|
| 🔐 Permission Inventory | Which apps have dangerous Graph API permissions? How concentrated are critical permissions? | Graph API |
| 👤 Owner Risk | Which app owners are non-admin users (phishing targets)? Are owners currently risky? Ownerless apps? | Graph API + Q1 |
| 🔑 Credential Hygiene | Stale secrets, multi-credential apps, long-lived credentials, cert+secret anomalies | Graph API |
| 🌐 Cross-Tenant Exposure | Foreign SPNs authenticating into your tenant with dangerous permissions | Q4 |
| Active Abuse Chains | Risky user → app ops, credential add → SPN activation, ownership → credential chains, Graph API lateral movement, permission escalation, multi-app ownership spread, App Governance & OAuth incident cross-reference | Q1–Q8 |

How this differs from existing capabilities:

| Existing Resource | Coverage | Gap This Skill Fills |
|-------------------|----------|----------------------|
| app_credential_management.md | Individual credential/ownership/consent events | No cross-table chain correlation |
| service_principal_scope_drift.md | SPN behavioral baseline drift | No link to preceding compromise signals |
| App Governance (Microsoft) | Anomalous app behavior, overprivileged apps | No correlation with user risk signals or multi-step chains |
| This skill | Graph API posture + KQL chain detection | End-to-end: current state → historical abuse → risk scoring |

Data sources:

| Source | Type | What It Provides |
|--------|------|------------------|
| AuditLogs (ApplicationManagement) | KQL | Credential adds, ownership changes, consent grants, permission assignments |
| AADServicePrincipalSignInLogs | KQL | SPN authentication patterns, cross-tenant sign-ins, credential types |
| AADUserRiskEvents | KQL | Identity Protection risk detections for app owners |
| MicrosoftGraphActivityLogs | KQL | Graph API calls by SPNs post-credential-add |
| AlertInfo + AlertEvidence | KQL | App Governance alerts, OAuth incidents, Attack Disruption events (Q8) |
| Graph API (/servicePrincipals, /applications) | REST | Current-state permission grants, app ownership, credential inventory |

References:

🔴 URL Registry — Canonical Links for Report Generation

MANDATORY: When generating reports, copy URLs verbatim from this registry. NEVER construct, guess, or paraphrase a URL. If a URL is not in this registry, omit the hyperlink entirely and use plain text.

| Label | Canonical URL |
|-------|---------------|
| BLOG_GUARDZ | https://guardz.com/blog/abusing-entra-id-app-registrations-for-long-term-persistence/ |
| BLOG_SOLORIGATE | https://www.microsoft.com/en-us/security/blog/2020/12/28/using-microsoft-365-defender-to-coordinate-protection-against-solorigate/ |
| DOCS_APP_GOVERNANCE | https://learn.microsoft.com/en-us/defender-cloud-apps/app-governance-manage-app-governance |
| DOCS_GRAPH_PERMS | https://learn.microsoft.com/en-us/graph/permissions-reference |
| DOCS_FIRST_PARTY_APPS | https://learn.microsoft.com/en-us/troubleshoot/entra/entra-id/governance/verify-first-party-apps-sign-in |
| MITRE_T1098_001 | https://attack.mitre.org/techniques/T1098/001/ |
| MITRE_T1550_001 | https://attack.mitre.org/techniques/T1550/001/ |

Threat Landscape: Why App Registration Posture Matters

The attack pattern is well-documented and increasingly exploited:

User compromised → discovers app ownership → adds credential (secret/cert) →
disconnects from user session → authenticates AS the app (SPN) →
uses app permissions for lateral movement / data exfiltration / privilege escalation

Why app registrations are attractive to attackers:

| Factor | Risk |
|--------|------|
| Persistence beyond user compromise | Revoking the user's password doesn't revoke the app credential — the SPN continues to operate |
| Non-admin users as owners | Standard users can own apps with Application.ReadWrite.All — if phished, the attacker inherits those permissions |
| Permissions outlive their creators | App permissions persist even after the admin who granted them leaves the org |
| Cross-tenant trust | Multi-tenant apps create implicit trust relationships that survive account remediation |
| Low visibility | SPN sign-ins are in a separate log table (AADServicePrincipalSignInLogs) that many SOCs don't monitor |

MITRE ATT&CK Mapping:

| Technique | ID | Kill Chain Stage | Detection Query |
|-----------|----|------------------|-----------------|
| Additional Cloud Credentials | T1098.001 | Persistence | Q2, Q3 |
| Additional Cloud Roles | T1098.003 | Privilege Escalation | Q6 |
| Cloud Accounts | T1078.004 | Initial Access / Persistence | Q1 |
| Application Access Token | T1550.001 | Lateral Movement | Q2, Q5 |
| SAML/OAuth Tokens | T1606.002 | Credential Access | Q4 |
| Impersonation | T1656 | Defense Evasion | Q4 |

Q8 note: Q8 (App Governance & OAuth Incident Cross-Reference) is a detection validation query, not a technique-specific detector. It cross-references existing Defender detections spanning multiple techniques above against Phase 1 findings.


📑 TABLE OF CONTENTS

  1. Critical Workflow Rules — Mandatory rules
  2. Schema Pitfalls — AuditLogs and Graph API pitfalls
  3. Dangerous Permissions Reference — Application-level Graph API grants
  4. App Permission Risk Score Formula — Composite risk scoring
  5. Execution Workflow — Phase-by-phase plan
  6. Phase 1: Graph API Posture Inventory — Steps P1–P7
  7. Phase 2: KQL Chain Detection Queries — Queries Q1–Q8
  8. Output Modes — Inline vs Markdown report
  9. Inline Report Template — Chat-rendered format
  10. Markdown File Report Template — Disk-saved format
  11. Known Pitfalls — Schema quirks and edge cases
  12. Quality Checklist — Pre-delivery validation
  13. SVG Dashboard Generation — Visual dashboard from report

⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

  1. Dual data source skill: This skill uses BOTH Graph API (via Graph MCP) for current-state posture AND KQL (via RunAdvancedHuntingQuery) for historical chain detection. Both phases are required for a complete assessment.

  2. Graph API before KQL: Run Phase 1 (Graph API posture) first — it identifies the dangerous apps. Phase 2 (KQL chains) then checks whether those apps show historical abuse signals.

  3. Use RunAdvancedHuntingQuery for all KQL queries. All tables used (AuditLogs, AADServicePrincipalSignInLogs, AADUserRiskEvents, MicrosoftGraphActivityLogs, AlertInfo, AlertEvidence) are available in Advanced Hunting. AH is free for Analytics-tier tables. Data Lake fallback only if AH fails or lookback > 30 days (note: AlertInfo/AlertEvidence are AH-only).

  4. ASK the user for output format before generating the report:

    • Inline chat summary (quick review in chat)
    • Markdown file report (detailed, archived to reports/app-registration-posture/)
    • Both (markdown + inline summary)
  5. ⛔ MANDATORY: Evidence-based analysis only — Report ONLY what query results show. Use the explicit absence pattern (✅ No [finding] detected) when queries return 0 results. Never guess or assume.

  6. AuditLogs dynamic fields require special handling — Always extract with tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName). See Schema Pitfalls.

  7. Graph API: query from the permission side, not the app side — Don't enumerate all app registrations (could be 1000+). Query appRoleAssignedTo on the Microsoft Graph service principal to get all dangerous grants in ~3 API calls. See Phase 1 Scaling Strategy.

  8. Run KQL queries in parallel batches where possible — Q1–Q8 are all independent and can run in parallel.

  9. Time tracking — Report elapsed time after each phase completion.

⛔ PROHIBITED ACTIONS

| Action | Status |
|--------|--------|
| Enumerating all app registrations individually via Graph API | PROHIBITED — use appRoleAssignedTo approach |
| Querying requiredResourceAccess for granted permissions | PROHIBITED — shows requested, not granted perms |
| Querying ServicePrincipal for ownership (/servicePrincipals/{id}?$expand=owners) | PROHIBITED — ownership is on Application object |
| Joining AuditLog operations on TargetResources[0].id across operation types | PROHIBITED — AppId ≠ SPNId for same app |
| Reporting 0 KQL results without sanity-checking the query logic | PROHIBITED |
| Fabricating URLs not in the URL Registry | PROHIBITED |

Schema Pitfalls

Read these before modifying any query in this skill.

| Pitfall | Details | Workaround |
|---------|---------|------------|
| Application ObjectId ≠ ServicePrincipal ObjectId | The same app has different GUIDs in TargetResources[0].id depending on operation type. Credential operations → Application ObjectId; permission/consent operations → ServicePrincipal ObjectId | Join on displayName or Actor when correlating across operation types (see Q6) |
| Ownership target name in modifiedProperties | For "Add owner to application", TargetResources[0] is the new owner (User type). The app name is in TargetResources[0].modifiedProperties[1].newValue (field Application.DisplayName) | Extract with tostring(parse_json(tostring(ModProps[1].newValue))) |
| OperationName trailing spaces | Credential operations have trailing spaces: "Update application – Certificates and secrets management " | Preserve trailing spaces in filters or use has instead of == |
| InitiatedBy is dynamic | Always extract with tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName) | Never use dot-notation directly |
| Consent targets structure | "Consent to application": Target[0] = the app receiving consent. "Add delegated permission grant": Target[0] = the resource API (e.g., Microsoft Graph), Target[1] = the app | Check OperationName before assuming Target[0] is the app |
| Cross-tenant SPNs have no local app object | GET /v1.0/applications?$filter=displayName eq 'X' returns empty for SPNs owned by foreign tenants | Identify via AADServicePrincipalSignInLogs where AppOwnerTenantId != AADTenantId (Q4). These can only be managed by the owning tenant |
| SP owners ≠ Application owners | /servicePrincipals/{id}?$expand=owners often returns empty even when the Application has owners | Always query the Application object for ownership |
| requiredResourceAccess ≠ granted permissions | The Application object's requiredResourceAccess shows what the app requests, not what's been granted | Use appRoleAssignedTo for granted permissions — this is the authoritative source |
| Red team apps may have owners stripped | Attack simulation tools often remove ownership post-creation | Fall back to AuditLogs "Add application" operation to find the original creator |

Dangerous Permissions Reference

Application-level Graph API grants that this skill flags:

| Permission | Risk | Attack Use |
|------------|------|------------|
| Application.ReadWrite.All | 🔴 Critical | Create/modify any app registration — further persistence |
| AppRoleAssignment.ReadWrite.All | 🔴 Critical | Grant itself or any app any permission — golden ticket |
| RoleManagement.ReadWrite.Directory | 🔴 Critical | Assign any directory role to any principal |
| Directory.ReadWrite.All | 🔴 Critical | Read/write all directory objects |
| Policy.ReadWrite.ConditionalAccess | 🔴 Critical | Disable CA policies — defense evasion |
| Mail.ReadWrite | 🟠 High | Read any user's mailbox — data exfiltration |
| Mail.Send | 🟠 High | Send email as any user — phishing, BEC |
| Mail.Read | 🟠 High | Read any user's mail — reconnaissance |
| MailboxSettings.ReadWrite | 🟠 High | Create forwarding rules — silent exfiltration |
| User.ReadWrite.All | 🟠 High | Modify any user account — credential reset |
| Group.ReadWrite.All | 🟠 High | Modify group membership — privilege escalation |
| Files.ReadWrite.All | 🟠 High | Access all SharePoint/OneDrive files |
| Sites.ReadWrite.All | 🟠 High | Full SharePoint site access |
| SecurityEvents.ReadWrite.All | 🟡 Medium | Read/modify security alerts — cover tracks |
| User.Export.All | 🟡 Medium | Export all user data — bulk exfiltration |
| Exchange.ManageAsApp | 🟡 Medium | Full Exchange management — mailbox access |

Permission risk classification for scoring:

  • Critical (🔴): Permissions that enable self-elevation or directory-wide control — 5 permissions listed above
  • High (🟠): Permissions that enable data access or account manipulation — 8 permissions listed above
  • Medium (🟡): Permissions that enable reconnaissance or secondary access — 3 permissions listed above
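
As an illustration of the three tiers, the reference table can be held as a lookup and used to summarize an app's granted permissions. This is a sketch under assumptions: the names `RISK_TIERS`, `classify`, and `tier_counts` are hypothetical and not part of the skill; only the permission names and their tiers come from the table above.

```python
from typing import Dict, Iterable, Optional

# Tier membership copied from the Dangerous Permissions Reference table.
RISK_TIERS = {
    "critical": {
        "Application.ReadWrite.All",
        "AppRoleAssignment.ReadWrite.All",
        "RoleManagement.ReadWrite.Directory",
        "Directory.ReadWrite.All",
        "Policy.ReadWrite.ConditionalAccess",
    },
    "high": {
        "Mail.ReadWrite",
        "Mail.Send",
        "Mail.Read",
        "MailboxSettings.ReadWrite",
        "User.ReadWrite.All",
        "Group.ReadWrite.All",
        "Files.ReadWrite.All",
        "Sites.ReadWrite.All",
    },
    "medium": {
        "SecurityEvents.ReadWrite.All",
        "User.Export.All",
        "Exchange.ManageAsApp",
    },
}


def classify(permission: str) -> Optional[str]:
    """Return the risk tier for a permission, or None if the skill does not flag it."""
    for tier, permissions in RISK_TIERS.items():
        if permission in permissions:
            return tier
    return None


def tier_counts(granted: Iterable[str]) -> Dict[str, int]:
    """Count how many of an app's granted permissions fall into each tier."""
    counts = {tier: 0 for tier in RISK_TIERS}
    for permission in granted:
        tier = classify(permission)
        if tier is not None:
            counts[tier] += 1
    return counts
```

For example, an app granted `Mail.Send` and `Directory.ReadWrite.All` would count one high and one critical permission, while an unlisted permission such as `User.Read.All` is ignored.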

🔴 Delegated vs Application Permissions — Risk Model

This skill focuses on application permissions (appRoleAssignments) because they represent unattended, user-independent privilege. Delegated permissions (oauth2PermissionGrants) are a fundamentally different risk category. Do not conflate the two.

Why This Distinction Matters

| Factor | Application Permissions (appRoleAssignments) | Delegated Permissions (oauth2PermissionGrants) |
|--------|----------------------------------------------|------------------------------------------------|
| Identity | App acts as its own identity — no user context required | App acts on behalf of a signed-in user |
| Effective permissions | The full granted scope — the app CAN do everything the permission allows | Intersection of app's delegated scope AND the user's own Entra roles — the app can only do what the user could already do |
| Unattended access | ✅ Yes — runs 24/7 via client credentials or managed identity | ❌ No — requires a user session (interactive or refresh token) |
| Blast radius | The permission itself IS the blast radius — Directory.ReadWrite.All means full directory write for the app, regardless of who triggered it | Bounded by the user's roles — a standard user with Directory.ReadWrite.All delegated consent still can't write to the directory because they lack the Entra role |
| Token theft impact | Stolen app credential = full permission scope, no MFA challenge | Stolen user token = only the user's own effective permissions, bounded by their roles |
| Risk priority | 🔴 Primary concern — this skill's focus | 🟡 Secondary concern — relevant only for privileged admin accounts |

What AllPrincipals Delegated Consent Actually Does

An AllPrincipals (admin consent) delegated grant removes the per-user consent prompt — it does NOT grant users abilities beyond their existing Entra roles. The practical impact:

  • Standard users: Effectively no additional risk. The app can request tokens with broad scopes, but the effective permissions are still limited by the user's role assignments. A user without Exchange Admin role cannot manage mailboxes even if Mail.ReadWrite is consented.
  • Privileged admins: Marginal incremental risk. The consent prompt is removed as a speed bump, so a stolen admin session can silently acquire tokens with the consented scopes — but the admin could have granted that consent themselves in one click anyway.
  • Token theft for admins: The real scenario where delegated consent matters. An attacker with a stolen Global Admin refresh token can silently use any AllPrincipals-consented scope without triggering a consent dialog. However, the admin already had the ability to do everything those scopes enable.

How This Affects Skill Analysis

  1. Phase 1 (P2) queries appRoleAssignedTo — these are application permissions. This is correct and intentional. The Dangerous Permissions Reference table above applies to application-level grants only.

  2. Chain detection queries (Q1, Q3, Q6) detect "Consent to application" and "Add delegated permission grant" in AuditLogs — these detect the act of granting consent, which is a valid abuse signal regardless of permission type (a compromised user granting broad consent is suspicious). The risk assessment should focus on what the user then DOES with the consented access, not on the scope list itself.

  3. When assessing consent grants in chain detection output:

    • A compromised user adding application permissions (Add app role assignment to service principal) = 🔴 Critical — the app gains independent, unattended access
    • A compromised user granting delegated consent (Consent to application, Add delegated permission grant) = 🟠 High if the user is a privileged admin, 🟡 Medium for standard users — the effective permissions are bounded by the user's roles
  4. Do NOT overstate delegated AllPrincipals consent risk. Reporting 100+ delegated scopes as "dangerous" without explaining the intersection model misleads stakeholders into believing any user can exploit those scopes. Always qualify: "Effective delegated permissions are limited to what each user's Entra roles already allow."
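
The consent-severity rules in item 3 above can be sketched as a small lookup. The operation names and severities come from the text; the function name, the `actor_is_privileged_admin` flag, and the `"unrated"` fallback are illustrative assumptions rather than part of the skill.

```python
# Operation names as they appear in AuditLogs, per the rules above.
APPLICATION_PERMISSION_OPS = {"Add app role assignment to service principal"}
DELEGATED_CONSENT_OPS = {"Consent to application", "Add delegated permission grant"}


def consent_severity(operation_name: str, actor_is_privileged_admin: bool) -> str:
    """Rate a consent/permission event performed by a compromised user."""
    # Strip whitespace defensively: some AuditLogs operation names carry trailing spaces.
    op = operation_name.strip()
    if op in APPLICATION_PERMISSION_OPS:
        # The app gains independent, unattended access regardless of who granted it.
        return "critical"
    if op in DELEGATED_CONSENT_OPS:
        # Effective delegated permissions are bounded by the user's own Entra roles.
        return "high" if actor_is_privileged_admin else "medium"
    return "unrated"  # assumption: anything else is outside these rules


print(consent_severity("Consent to application", actor_is_privileged_admin=False))
```

Keeping the admin flag as an explicit input reflects the intersection model: the same delegated scope list is rated differently depending on who consented.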

When Delegated Permissions ARE Concerning

Despite the lower baseline risk, flag delegated consents when:

| Scenario | Why It Matters |
|----------|----------------|
| AllPrincipals consent on a 3rd-party (non-Microsoft) app with broad scopes | The app vendor could be compromised, and the consent enables data access for any admin session |
| Delegated consent combined with Q1 chain (risky admin → consent grant) | A compromised admin granting broad delegated consent may be preparing for token-based lateral movement |
| App has BOTH application permissions AND broad delegated consent | Dual permission model = dual attack surface |
| AllPrincipals consent for offline_access + sensitive scopes on a public client app | Enables refresh token persistence without re-authentication |

⛔ PROHIBITED Actions

| Action | Status |
|--------|--------|
| Stating that AllPrincipals delegated consent gives "any user" access to the scoped resources | PROHIBITED — effective permissions = intersection with user's roles |
| Rating delegated consent scopes at the same severity as identical application permission scopes | PROHIBITED — application permissions are unattended and user-independent |
| Omitting the delegated-vs-application distinction when presenting permission findings | PROHIBITED — always clarify which permission type is being discussed |
| Ignoring delegated consent entirely | PROHIBITED — it is a secondary risk that matters for privileged accounts |

App Permission Risk Score Formula

The App Permission Risk Score is a composite risk indicator summarizing the security posture of your organization's app registration and service principal fleet. Higher scores indicate greater risk.

Scoring Dimensions

$$ \text{AppPermissionRiskScore} = \sum_{i} \text{DimensionScore}_i $$

Each dimension contributes 0–20 points to a maximum of 100:

| Dimension | Max | 🟢 Low (0–5) | 🟡 Medium (6–12) | 🔴 High (13–20) |
|-----------|-----|--------------|------------------|-----------------|
| Permission Concentration | 20 | 0–2 apps with dangerous perms; 0 critical-tier perms | 3–5 apps with dangerous perms; ≤1 app with ≥3 critical-tier perms | >5 apps with dangerous perms OR ≥2 apps with ≥3 critical-tier perms OR any app with AppRoleAssignment.ReadWrite.All (golden ticket → auto 16+) |
| Owner Risk | 20 | All flagged apps have admin owners; 0 ownerless dangerous apps | 1–2 ownerless dangerous apps; OR non-admin owner on 🟠-level app | ≥3 ownerless apps with dangerous perms OR non-admin owner on 🔴-level app OR any app owner with active Identity Protection risk (atRisk/confirmedCompromised) |
| Credential Hygiene | 20 | All apps ≤1 active credential; all secrets <180 days old; 0 dormant privileged apps | Any app with 2 active secrets; OR any secret 180d–730d old; OR 1 dormant privileged app | Any app with ≥3 active secrets + critical perms; OR any secret >730d old (2yr); OR cert+secret on same critical app |
| Cross-Tenant Exposure | 20 | 0 foreign SPNs with dangerous perms | 1–2 foreign SPNs with 🟠-level perms; all from known/identified partner tenants | Any foreign SPN with 🔴 critical perms (AppRoleAssignment.ReadWrite.All, Directory.ReadWrite.All, RoleManagement.ReadWrite.Directory, Policy.ReadWrite.ConditionalAccess) OR foreign SPN from unidentified tenant |
| Active Abuse Signals | 20 | Q1–Q8 all return 0 non-pipeline results | Q1–Q7 return only 🟡-priority results (after pipeline collapse); OR only suspiciousAuthAppApproval self-referencing chains; OR Q8 returns only App Governance “Unused”/“Expiring” alerts with no XDR/MCAS overlap | Q1 returns any chain with adminConfirmedUserCompromised or confirmedCompromised (→ auto 15+); OR Q6 returns 🔴-priority cred→consent chain from a user with active Identity Protection risk; OR Q8 returns apps with DetectionBreadth ≥2 (multi-source detections) or any Attack Disruption incident |

Scoring Anchors (Deterministic Rules)

Apply these anchors BEFORE adjusting within bands. They set a floor for the dimension score:

| Condition | Dimension | Minimum Score |
|-----------|-----------|---------------|
| AppRoleAssignment.ReadWrite.All granted to ANY app | Permission Concentration | 16 |
| Any app owner has adminConfirmedUserCompromised | Owner Risk | 15 |
| Any secret >730 days old on an app with critical perms | Credential Hygiene | 14 |
| Foreign SPN with AppRoleAssignment.ReadWrite.All | Cross-Tenant Exposure | 17 |
| Q1 chain with adminConfirmedUserCompromised → app consent | Active Abuse Signals | 15 |
| Q8 returns any Attack Disruption incident for an app in Phase 1 | Active Abuse Signals | 16 |
| Q8 returns app with DetectionBreadth ≥3 AND in Phase 1 flagged list | Active Abuse Signals | 14 |
| All Q1–Q8 non-pipeline results = 0 | Active Abuse Signals | ≤5 (cap) |
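
A minimal sketch of how the anchors might be applied to one dimension: take the analyst's in-band score, raise it to the highest triggered floor, then apply the cap if the zero-results condition holds. The function name and signature are hypothetical, and applying floors before the cap is an assumption (the table does not define an order; in practice the cap and the Active Abuse floors cannot trigger together).

```python
from typing import Iterable, Optional

MAX_DIMENSION_SCORE = 20  # each dimension contributes 0-20 points


def apply_anchors(band_score: int,
                  floors: Iterable[int] = (),
                  cap: Optional[int] = None) -> int:
    """Apply anchor floors (and an optional cap) to a single dimension score."""
    # Floors only ever raise the score; the highest triggered floor wins.
    score = max([band_score, *floors])
    if cap is not None:
        score = min(score, cap)
    # Keep the result inside the 0-20 range for a dimension.
    return max(0, min(score, MAX_DIMENSION_SCORE))


# Permission Concentration judged at 10, but AppRoleAssignment.ReadWrite.All is granted.
print(apply_anchors(10, floors=[16]))
# Active Abuse Signals with all Q1-Q8 non-pipeline results equal to zero.
print(apply_anchors(8, cap=5))
```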

Interpretation Scale

| Score | Rating | Action |
|-------|--------|--------|
| 0–20 | ✅ Healthy | Normal posture, routine monitoring |
| 21–45 | 🟡 Elevated | Review — minor permission sprawl or credential age detected |
| 46–70 | 🟠 Concerning | Investigate — multiple risk signals across dimensions |
| 71–100 | 🔴 Critical | Immediate remediation — active abuse chains or critical permission concentration |
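
Putting the formula and the scale together, the composite score is the sum of the five dimension scores, mapped to a rating band. The sketch below uses hypothetical names (`composite_score`, `rating`) and made-up example scores; only the dimension names, the 0–20 range, and the band boundaries come from this section.

```python
from typing import Dict

DIMENSIONS = (
    "Permission Concentration",
    "Owner Risk",
    "Credential Hygiene",
    "Cross-Tenant Exposure",
    "Active Abuse Signals",
)


def composite_score(scores: Dict[str, int]) -> int:
    """Sum the five dimension scores after checking each is present and within 0-20."""
    total = 0
    for name in DIMENSIONS:
        value = scores[name]  # raises KeyError if a dimension is missing
        if not 0 <= value <= 20:
            raise ValueError(f"{name} must be between 0 and 20, got {value}")
        total += value
    return total


def rating(score: int) -> str:
    """Map a composite score (0-100) to the interpretation scale."""
    if score <= 20:
        return "Healthy"
    if score <= 45:
        return "Elevated"
    if score <= 70:
        return "Concerning"
    return "Critical"


# Made-up example scores for illustration only.
example = {
    "Permission Concentration": 16,
    "Owner Risk": 6,
    "Credential Hygiene": 14,
    "Cross-Tenant Exposure": 0,
    "Active Abuse Signals": 5,
}
total = composite_score(example)
print(total, rating(total))
```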

Execution Workflow

Phase 0: Prerequisites

  1. Confirm Graph MCP (mcp_graph-mcp-ser) is available for posture queries
  2. Confirm RunAdvancedHuntingQuery is available for chain detection
  3. Ask user for output format (inline / markdown / both)
  4. Ask user for lookback period (default: 30 days for KQL queries)

Phase 1: Graph API Posture Inventory (Steps P1–P7)

Mostly sequential: each step depends on the previous one, except P3, which can run in parallel with P2.

| Step | Purpose | API Call(s) |
|------|---------|-------------|
| P1 | Find Microsoft Graph service principal ID in tenant | 1 call |
| P2 | List ALL application permission grants to Microsoft Graph | 1 call (paginated) — save to temp/p2_grants.json |
| P3 | Resolve permission GUIDs to human-readable names | 1 call — run in parallel with P2 — save to temp/p3_approles.json |
| P4 | Filter to dangerous permissions (PowerShell script) | 0 API calls — joins P2+P3 JSON, outputs flagged apps |
| P5 | Resolve owners for flagged apps | N calls (only flagged apps) |
| P6 | Assess owner risk (directory roles) | M calls (only flagged owners) |
| P7 | Credential hygiene check (from P5 response) | 0 calls |

Total: 3 + N + M calls (typically < 20 for most tenants)

Phase 2: KQL Chain Detection (Q1–Q8)

Run in parallel — no dependencies between queries. Q8 uses a 90-day lookback (incident data is sparser); Q1–Q7 use 30 days.

| Query | Purpose | Tables | Kill Chain Stage |
|-------|---------|--------|------------------|
| Q1 | Risky User → App Operations Chain | AADUserRiskEvents + AuditLogs | Compromise → App Abuse |
| Q2 | Credential Add → SPN Activation | AuditLogs + AADServicePrincipalSignInLogs | Persistence → SPN Impersonation |
| Q3 | Ownership Add → Credential Modification Chain | AuditLogs (self-join) | Privilege Escalation → Persistence |
| Q4 | Cross-Tenant SPN Sign-Ins | AADServicePrincipalSignInLogs | Lateral Movement (cross-tenant) |
| Q5 | Credential Add → SPN Graph API Lateral Movement | AuditLogs + MicrosoftGraphActivityLogs | Lateral Movement / Data Exfiltration |
| Q6 | Credential Add → Permission Escalation Chain | AuditLogs (self-join) | Persistence → Privilege Escalation |
| Q7 | Multi-App Ownership Spread | AuditLogs | Persistence (breadth) |
| Q8 | App Governance & OAuth Incident Cross-Reference | AlertInfo + AlertEvidence | Detection Validation |

Phase 3: Score Computation & Report Generation

  1. Compute per-dimension scores from Phase 1 and Phase 2 data
  2. Cross-reference: Map Phase 1 flagged apps to Phase 2 chain detections
  3. Sum dimension scores for composite App Permission Risk Score
  4. Generate report in requested output mode
  5. Report total elapsed time

Phase 1: Graph API Posture Inventory

Scaling Strategy: Don't enumerate all app registrations (could be 1000+). Query from the permission grant side — find what's been granted dangerous permissions, then resolve owners only for those flagged apps.

Step P1: Find the Microsoft Graph Service Principal ID

The Microsoft Graph resource service principal is the target of all application permission grants. Its well-known AppId is 00000003-0000-0000-c000-000000000000, but its ObjectId varies per tenant.

GET /v1.0/servicePrincipals?$filter=appId eq '00000003-0000-0000-c000-000000000000'&$select=id,displayName

Save the returned id — you'll need it for Steps P2 and P3.

Step P2: List ALL Application Permission Grants to Microsoft Graph

This single call returns every app in the tenant that has been granted application-level permissions (not delegated) to Microsoft Graph.

GET /v1.0/servicePrincipals/{graph-sp-id}/appRoleAssignedTo
    ?$select=principalDisplayName,principalId,principalType,appRoleId,createdDateTime
    &$top=999

Returns: One row per permission grant. Each row contains:

  • principalDisplayName — app name
  • principalId — ServicePrincipal ObjectId
  • appRoleId — permission GUID
  • createdDateTime — when the permission was granted

Post-processing: Group by principalDisplayName to get the per-app permission list.
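
Because P2 is paginated, a client has to keep following the `@odata.nextLink` value that Microsoft Graph returns until no further link is present, and only then group the rows. The sketch below shows that loop together with the per-app grouping. The `fetch_json` callable is a hypothetical stand-in for whatever authenticated HTTP or MCP call is actually used, and the canned pages are invented for the demonstration.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def iter_pages(first_url: str, fetch_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item across all pages by following @odata.nextLink."""
    url = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("value", [])
        url = page.get("@odata.nextLink")  # absent on the last page


def group_grants(grants: Iterable[dict]) -> dict[str, list[str]]:
    """Group permission GUIDs by app display name."""
    per_app = defaultdict(list)
    for grant in grants:
        per_app[grant["principalDisplayName"]].append(grant["appRoleId"])
    return dict(per_app)


# Canned two-page response (illustrative data only).
fake_pages = {
    "page1": {
        "value": [{"principalDisplayName": "App A", "appRoleId": "guid-1"}],
        "@odata.nextLink": "page2",
    },
    "page2": {
        "value": [
            {"principalDisplayName": "App A", "appRoleId": "guid-2"},
            {"principalDisplayName": "App B", "appRoleId": "guid-1"},
        ],
    },
}
grouped = group_grants(iter_pages("page1", fake_pages.__getitem__))
print(grouped)
```

Injecting `fetch_json` keeps the paging logic independent of the transport, so the same loop can be exercised against saved responses.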

⚠️ Large Response Handling: P2 can return hundreds of rows (one per permission grant across all apps). When the response is large:

  1. Save P2 and P3 responses to temp/ as JSON files before processing — this prevents data loss if context gets truncated
  2. Run P2 and P3 in parallel — they are independent (P3 only needs the Graph SP ID from P1, same as P2)
  3. Use PowerShell for the GUID→name join and dangerous-permission filter — do NOT attempt to parse large JSON in-context. Write a script that:
    • Loads P2 grants + P3 appRoles from the saved JSON files
    • Builds the appRoleId → value lookup map
    • Filters to dangerous permissions
    • Groups by app name
    • Outputs the flagged-app summary (app name, dangerous perms, grant dates, principalId)
  4. Only bring the filtered summary back into context — the full P2/P3 data stays in temp files for reference
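# Save MCP responses to temp files first, then:
$grants = Get-Content "temp/p2_grants.json" -Raw | ConvertFrom-Json
$roles = Get-Content "temp/p3_approles.json" -Raw | ConvertFrom-Json

# If the raw Graph response was saved, unwrap its { "value": [...] } envelope
if ($grants -isnot [array] -and $null -ne $grants.value) { $grants = @($grants.value) }
if ($roles -isnot [array] -and $null -ne $roles.value) { $roles = @($roles.value) }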

# Build GUID→name map
$roleMap = @{}
foreach ($r in $roles) { $roleMap[$r.id] = $r.value }

# Dangerous permissions list
$dangerousPerms = @(
    "Directory.ReadWrite.All", "Application.ReadWrite.All",
    "AppRoleAssignment.ReadWrite.All", "RoleManagement.ReadWrite.Directory",
    "Mail.ReadWrite", "Mail.Send", "Mail.Read",
    "Files.ReadWrite.All", "User.ReadWrite.All", "Group.ReadWrite.All",
    "Sites.ReadWrite.All", "MailboxSettings.ReadWrite", "User.Export.All",
    "Exchange.ManageAsApp", "full_access_as_app",
    "Policy.ReadWrite.ConditionalAccess", "SecurityEvents.ReadWrite.All"
)

# Enrich grants with permission names and filter
$enriched = $grants | ForEach-Object {
    $permName = $roleMap[$_.appRoleId]
    [PSCustomObject]@{
        App = $_.principalDisplayName
        PrincipalId = $_.principalId
        Permission = $permName
        Dangerous = $permName -in $dangerousPerms
        GrantDate = $_.createdDateTime
    }
}

# Summary: apps with dangerous permissions
$flagged = $enriched | Where-Object Dangerous | Group-Object App | ForEach-Object {
    [PSCustomObject]@{
        App = $_.Name
        DangerousPerms = ($_.Group.Permission | Sort-Object -Unique) -join ", "
        Count = $_.Count
        LatestGrant = ($_.Group.GrantDate | Sort-Object -Descending | Select-Object -First 1)
        PrincipalId = $_.Group[0].PrincipalId
    }
} | Sort-Object Count -Descending

# Display summary (@() keeps .Count reliable when only one item is returned)
$totalApps = @($enriched | Select-Object -Unique App).Count
Write-Host "Total apps with Graph permissions: $totalApps"
Write-Host "Apps with dangerous permissions: $(@($flagged).Count)"
Write-Host "Total dangerous grants: $(@($enriched | Where-Object Dangerous).Count)"
$flagged | Format-Table -AutoSize

This script performs the manual GUID lookup and the P4 dangerous-permission filter in one pass. The P3 API call is still required, because the script loads temp/p3_approles.json to build the lookup.

Step P3: Resolve Permission GUIDs to Names

Run in parallel with P2 — both only need the Graph SP ID from P1.

GET /v1.0/servicePrincipals/{graph-sp-id}/appRoles

Returns: Complete list of Microsoft Graph permission definitions with id (GUID), value (e.g., Mail.ReadWrite), and displayName.

Save the response to temp/p3_approles.json. The PowerShell script from P2 loads this file to build the GUID→name lookup.

Step P4: Filter to Dangerous Permissions

Handled by the PowerShell script in P2. The script performs GUID→name join, dangerous-permission filter, and per-app grouping in one pass. No additional API calls needed.

Output: A table of flagged apps with their dangerous permission list, permission risk level, and grant dates.

Step P5: Resolve Owners for Flagged Apps

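Only for apps flagged in P4, retrieve owners from the Application object (NOT the ServicePrincipal). Display names are not unique, so if the filter returns more than one application, confirm which one backs the flagged service principal before attributing owners: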

GET /v1.0/applications?$filter=displayName eq '{flagged-app-name}'
    &$select=id,appId,displayName,passwordCredentials,keyCredentials
    &$expand=owners($select=id,displayName,userPrincipalName)

Repeat for each flagged app. Important:

  • Cross-tenant SPNs return empty results (no local Application object)
  • Red team apps may have owners stripped post-creation
  • For ownerless apps, fall back to AuditLogs "Add application" to find original creator

Step P6: Assess Owner Risk

For each owner found in P5:

  1. Check directory roles — is the owner a privileged admin or a standard user?

    GET /v1.0/roleManagement/directory/roleAssignments
        ?$filter=principalId eq '{owner-id}'
        &$expand=roleDefinition($select=displayName)
    

    Non-admin owners of apps with critical permissions = the Guardz attack vector.

  2. Check Identity Protection risk — feed owner.userPrincipalName into Q1 to detect active risk events. An owner currently flagged by Identity Protection who owns a dangerous app is the highest-priority finding.

Step P7: Credential Hygiene Check

The P5 response includes passwordCredentials and keyCredentials. Assess:

| Check | Field | Risk |
|-------|-------|------|
| Multiple active secrets | passwordCredentials[] where endDateTime > now | 🟠 Multiple access methods; harder to revoke |
| Long-lived secrets | endDateTime > 2 years from startDateTime | 🟠 Stale credential risk; may leak without detection |
| No credentials at all | Empty passwordCredentials + keyCredentials | 🟢 App can't be used for SPN auth (lower risk) |
| Certificate + Secret both active | Both arrays non-empty | 🟡 Review: cert is expected, secret alongside is unusual |
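
The checks above can be sketched as a function over one application object from the P5 response. This is an illustration under stated assumptions: `assess_credentials` and the finding labels are hypothetical names, "2 years" is approximated as 730 days, every credential is assumed to carry both `startDateTime` and `endDateTime`, and fractional seconds are stripped before parsing so the code also runs on older Python versions.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2024-01-01T00:00:00Z."""
    without_fraction = re.sub(r"\.\d+", "", value)
    return datetime.fromisoformat(without_fraction.replace("Z", "+00:00"))


def assess_credentials(app: dict, now: datetime) -> list[str]:
    """Return credential hygiene findings for one application object."""
    secrets = app.get("passwordCredentials") or []
    certs = app.get("keyCredentials") or []
    if not secrets and not certs:
        return ["no-credentials"]

    findings = []
    active_secrets = [s for s in secrets if parse_ts(s["endDateTime"]) > now]
    if len(active_secrets) > 1:
        findings.append("multiple-active-secrets")
    if any(
        parse_ts(s["endDateTime"]) - parse_ts(s["startDateTime"]) > timedelta(days=730)
        for s in secrets
    ):
        findings.append("long-lived-secret")
    if secrets and certs:
        findings.append("certificate-and-secret")
    return findings


# Illustrative application object (invented values).
sample_app = {
    "passwordCredentials": [
        {"startDateTime": "2024-01-01T00:00:00Z", "endDateTime": "2027-06-01T00:00:00Z"},
        {"startDateTime": "2024-06-01T00:00:00.1234567Z", "endDateTime": "2025-06-01T00:00:00Z"},
    ],
    "keyCredentials": [
        {"startDateTime": "2024-01-01T00:00:00Z", "endDateTime": "2026-01-01T00:00:00Z"},
    ],
}
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(assess_credentials(sample_app, now))
```

Passing `now` explicitly keeps the check reproducible when a report is regenerated from saved P5 data.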

Phase 2: KQL Chain Detection Queries

All queries below are verified against live data. Use them exactly as written, substituting only the lookback period and chain windows where noted.

Tool: Use RunAdvancedHuntingQuery for all queries. All tables are Analytics-tier — AH queries are free. Fall back to mcp_sentinel-data_query_lake only for lookback > 30 days.

Query 1: Risky User → App Operations Chain (HIGHEST SIGNAL)

Purpose: Detect users with active Identity Protection risk detections who then perform app credential, ownership, or consent operations.

Kill Chain Stage: Compromise → App Abuse

Tables: AADUserRiskEvents + AuditLogs

Why high signal: A user flagged by Identity Protection performing app credential operations within days is strong evidence of the exact attack pattern described in the Guardz research.

// Chain Detection: Users with active risk → app credential/ownership operations
let lookback = 30d;
let chainWindow = 7d; // Risk event → app operation within 7 days
// Step 1: Users with unresolved or confirmed risk
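let RiskyUsers = AADUserRiskEvents
| where TimeGenerated > ago(lookback)
| where RiskState in ("atRisk", "confirmedCompromised")
// RiskLevel is a string, so max(RiskLevel) would sort alphabetically ("medium" > "low" > "high"); rank it first
| extend RiskRank = case(RiskLevel =~ "high", 3, RiskLevel =~ "medium", 2, RiskLevel =~ "low", 1, 0)
| summarize 
    RiskEvents = count(),
    RiskTypes = make_set(RiskEventType, 5),
    MaxRiskRank = max(RiskRank),
    EarliestRisk = min(TimeGenerated),
    LatestRisk = max(TimeGenerated)
    by UserPrincipalName
| extend MaxRiskLevel = case(MaxRiskRank == 3, "high", MaxRiskRank == 2, "medium", MaxRiskRank == 1, "low", "none")
| project-away MaxRiskRank;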
// Step 2: App credential/ownership/consent operations by those users
AuditLogs
| where TimeGenerated > ago(lookback)
| where Category == "ApplicationManagement"
| where OperationName has_any ("credential", "secret", "certificate", "owner", "consent", "permission")
| extend InitiatedByUser = tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName)
| where isnotempty(InitiatedByUser)
| extend Target = parse_json(tostring(TargetResources))[0]
| extend TargetAppName = coalesce(
    tostring(Target.displayName),
    tostring(parse_json(tostring(parse_json(tostring(Target.modifiedProperties))[1].newValue))))
| join kind=inner RiskyUsers on $left.InitiatedByUser == $right.UserPrincipalName
| where TimeGenerated between (EarliestRisk .. (LatestRisk + chainWindow))
| project 
    RiskDetectedAt = EarliestRisk,
    AppOperationAt = TimeGenerated,
    TimeDeltaHours = datetime_diff('hour', TimeGenerated, EarliestRisk),
    User = InitiatedByUser,
    RiskTypes,
    MaxRiskLevel,
    RiskEvents,
    OperationName,
    TargetApp = TargetAppName,
    CorrelationId
| order by RiskDetectedAt desc

Triage Priority:

  • 🔴 Critical: MaxRiskLevel = high + credential add operation → likely active compromise
  • 🟠 High: MaxRiskLevel = medium + ownership add → attacker positioning for persistence
  • 🟡 Medium: MaxRiskLevel = low + consent grant → may be suspiciousAuthAppApproval self-referencing

Tuning:

  • Tighten chainWindow to 1d for higher precision
  • Add | where RiskTypes !has "suspiciousAuthAppApproval" to exclude consent-flagging-consent loops

Query 2: Credential Add → SPN Activation from New Origin

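Purpose: After a credential is added to an app, detect when the SPN authenticates within 72 hours and list the source IPs used. The query does not decide whether an IP is new; compare the returned IPs against the SPN's historical baseline during triage. This is the SolarWinds "backdoor credential → authenticate as the app" pattern.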

Kill Chain Stage: Persistence → SPN Impersonation

Tables: AuditLogs + AADServicePrincipalSignInLogs

// Chain Detection: Credential added → SPN signs in within 72h
let lookback = 30d;
let activationWindow = 72h;
// Step 1: Credential additions with actor and target
let CredentialAdds = AuditLogs
| where TimeGenerated > ago(lookback)
| where Category == "ApplicationManagement"
| where OperationName in~ (
    "Update application – Certificates and secrets management ",
    "Add service principal credentials"
  )
| extend InitiatedByUser = tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName)
| extend InitiatedByApp = tostring(parse_json(tostring(InitiatedBy)).app.displayName)
| extend Actor = iff(isnotempty(InitiatedByUser), InitiatedByUser, InitiatedByApp)
| extend Target = parse_json(tostring(TargetResources))[0]
| extend TargetAppName = tostring(Target.displayName)
| extend TargetAppId = tostring(Target.id)
| extend ModifiedProps = parse_json(tostring(Target.modifiedProperties))
| extend KeyDescription = tostring(ModifiedProps[0].newValue)
| extend CredentialType = case(
    KeyDescription has "AsymmetricX509Cert", "Certificate",
    KeyDescription has "Password", "Client Secret",
    "Unknown")
| project CredAddTime = TimeGenerated, Actor, TargetAppName, TargetAppId, CredentialType, CorrelationId;
// Step 2: SPN sign-ins after credential add
CredentialAdds
| join kind=inner (
    AADServicePrincipalSignInLogs
    | where TimeGenerated > ago(lookback)
    | where ResultType == "0" // successful only
    | project SPNSignInTime = TimeGenerated, AppId, ServicePrincipalName, IPAddress, 
        Location, ResourceDisplayName, ClientCredentialType,
        ServicePrincipalCredentialKeyId
) on $left.TargetAppId == $right.AppId
| where SPNSignInTime between (CredAddTime .. (CredAddTime + activationWindow))
| summarize
    SPNSignIns = count(),
    DistinctIPs = dcount(IPAddress),
    IPs = make_set(IPAddress, 10),
    Resources = make_set(ResourceDisplayName, 5),
    CredTypes = make_set(ClientCredentialType, 5),
    FirstSignIn = min(SPNSignInTime),
    LastSignIn = max(SPNSignInTime)
    by CredAddTime, Actor, TargetAppName, TargetAppId, CredentialType, CorrelationId
| extend HoursToActivation = datetime_diff('hour', FirstSignIn, CredAddTime)
| order by CredAddTime desc

Triage Priority:

  • 🔴 Critical: HoursToActivation < 1 + new IP not in SPN's historical baseline
  • 🟠 High: HoursToActivation < 24 + accessing sensitive resources (Graph, Key Vault)
  • 🟡 Medium: Normal activation window but from multiple IPs
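
The triage rules above depend on a baseline that Q2 itself does not return, so the comparison has to happen after the query. The sketch below applies the three rules to one Q2 result row. The names `triage_q2` and `baseline_ips` are hypothetical, the baseline is assumed to have been collected separately (for example from older sign-in logs), and the `SENSITIVE_RESOURCES` display names are assumptions that may need adjusting per tenant.

```python
# Assumed resource display names; verify against ResourceDisplayName in your tenant.
SENSITIVE_RESOURCES = {"Microsoft Graph", "Azure Key Vault"}


def triage_q2(row: dict, baseline_ips: set[str]) -> str:
    """Classify one Q2 result row as critical, high, medium, or none."""
    new_ips = set(row["IPs"]) - baseline_ips
    if row["HoursToActivation"] < 1 and new_ips:
        return "critical"
    if row["HoursToActivation"] < 24 and SENSITIVE_RESOURCES & set(row["Resources"]):
        return "high"
    if row["DistinctIPs"] > 1:
        return "medium"
    return "none"


baseline = {"198.51.100.4", "198.51.100.5"}
fast_new_ip = {
    "HoursToActivation": 0,
    "IPs": ["203.0.113.7"],
    "DistinctIPs": 1,
    "Resources": ["Microsoft Graph"],
}
print(triage_q2(fast_new_ip, baseline))
```

Because `datetime_diff('hour', ...)` returns whole hours, the `< 1` rule matches rows where the query reports 0.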

Enhancement: Run the SPN scope drift skill (.github/skills/scope-drift-detection/spn/SKILL.md) on any flagged SPN for baseline comparison.

Query 3: Ownership Add → Credential Modification Chain

Purpose: Detect the exact Guardz attack sequence — user is added as app owner, then credentials are modified on that app within 7 days. The SameActorAsNewOwner flag is key: if the newly added owner immediately creates a credential, that's the attacker using ownership to establish persistence.

Kill Chain Stage: Privilege Escalation → Persistence

Tables: AuditLogs (self-join)

// Chain Detection: Owner added to app → credential/permission op on same app within 7d
let lookback = 30d;
let chainWindow = 7d;
// Step 1: Ownership additions — extract new owner and target app
let OwnershipAdds = AuditLogs
| where TimeGenerated > ago(lookback)
| where Category == "ApplicationManagement"
| where OperationName in~ ("Add owner to application", "Add owner to service principal")
| extend Target0 = parse_json(tostring(TargetResources))[0]
| extend NewOwnerUPN = tostring(Target0.userPrincipalName)
| extend NewOwnerId = tostring(Target0.id)
| extend ModProps = parse_json(tostring(Target0.modifiedProperties))
| extend TargetAppName = tostring(parse_json(tostring(ModProps[1].newValue)))
| extend TargetAppId = tostring(parse_json(tostring(ModProps[0].newValue)))
| extend InitiatedByUser = tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName)
| extend Actor = iff(isnotempty(InitiatedByUser), InitiatedByUser, tostring(parse_json(tostring(InitiatedBy)).app.displayName))
| project OwnerAddTime = TimeGenerated, Actor, NewOwnerUPN, TargetAppName, TargetAppId, OperationName;
// Step 2: Credential or permission operations on the same app
AuditLogs
| where TimeGenerated > ago(lookback)
| where Category == "ApplicationManagement"
| where OperationName in~ (
    "Update application – Certificates and secrets management ",
    "Add service principal credentials",
    "Add delegated permission grant",
    "Consent to application",
    "Add app role assignment to service principal"
  )
| extend Target = parse_json(tostring(TargetResources))[0]
| extend CredTargetId = tostring(Target.id)
| extend CredActor = tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName)
| join kind=inner OwnershipAdds on $left.CredTargetId == $right.TargetAppId
| where TimeGenerated between (OwnerAddTime .. (OwnerAddTime + chainWindow))
| project
    OwnerAddTime,
    CredOpTime = TimeGenerated,
    HoursGap = datetime_diff('hour', TimeGenerated, OwnerAddTime),
    NewOwnerUPN,
    CredActor,
    SameActorAsNewOwner = (CredActor =~ NewOwnerUPN),
    OwnershipOp = OperationName1,
    CredentialOp = OperationName,
    TargetAppName,
    TargetAppId
| order by OwnerAddTime desc

Triage Priority:

  • 🔴 Critical: SameActorAsNewOwner = true + HoursGap < 1 → scripted attack
  • 🟠 High: SameActorAsNewOwner = true + HoursGap < 24 → manual attacker
  • 🟡 Medium: Different actors (admin added owner, owner later legitimately rotated creds)
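
As a compact restatement of the rules above, this sketch classifies one Q3 row from its `SameActorAsNewOwner` and `HoursGap` columns. The function name is hypothetical, and the `unclassified` result is an assumption: the rules do not say how to treat the same actor acting 24 hours or more after the ownership add.

```python
def triage_q3(same_actor_as_new_owner: bool, hours_gap: int) -> str:
    """Classify one Q3 result row using the documented triage rules."""
    if same_actor_as_new_owner and hours_gap < 1:
        return "critical"  # scripted attack
    if same_actor_as_new_owner and hours_gap < 24:
        return "high"      # manual attacker
    if not same_actor_as_new_owner:
        return "medium"    # different actors, possibly legitimate rotation
    # Same actor, gap of 24h or more: not covered by the rules above.
    return "unclassified"


print(triage_q3(True, 0))
```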

Query 4: SPN Cross-Tenant Sign-Ins

Purpose: Detect service principals owned by external tenants authenticating into your tenant. Multi-tenant app abuse was the core SolarWinds persistence mechanism.

Kill Chain Stage: Lateral Movement (cross-tenant)

Tables: AADServicePrincipalSignInLogs

// Detect cross-tenant SPN authentication — foreign SPNs accessing local resources
AADServicePrincipalSignInLogs
| where TimeGenerated > ago(30d)
| where ResultType == "0"
| where isnotempty(AppOwnerTenantId)
| where AppOwnerTenantId != AADTenantId
| summarize 
    SignIns = count(),
    DistinctIPs = dcount(IPAddress),
    IPs = make_set(IPAddress, 5),
    Resources = make_set(ResourceDisplayName, 10),
    CredTypes = make_set(ClientCredentialType, 5),
    Locations = make_set(Location, 5),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated)
    by ServicePrincipalName, AppId, AppOwnerTenantId, AADTenantId
| order by SignIns desc

Triage Priority:

  • 🔴 Critical: Unknown foreign tenant SPN accessing sensitive resources (Graph, Key Vault, ARM)
  • 🟠 High: Known partner/vendor SPN with new access patterns
  • 🟡 Low: Microsoft first-party service SPNs (verify against first-party app list)

Enhancement — New Cross-Tenant SPNs (first seen in last 7d vs 30d baseline):

let recent = 7d;
let baseline = 30d;
let RecentCrossTenant = AADServicePrincipalSignInLogs
| where TimeGenerated > ago(recent)
| where ResultType == "0"
| where isnotempty(AppOwnerTenantId) // skip rows with no owner tenant, as in the main query
| where AppOwnerTenantId != AADTenantId
| distinct AppId, ServicePrincipalName, AppOwnerTenantId;
let BaselineCrossTenant = AADServicePrincipalSignInLogs
| where TimeGenerated between (ago(baseline) .. ago(recent))
| where ResultType == "0"
| where isnotempty(AppOwnerTenantId)
| where AppOwnerTenantId != AADTenantId
| distinct AppId;
RecentCrossTenant
| join kind=leftanti BaselineCrossTenant on AppId
| project ServicePrincipalName, AppId, AppOwnerTenantId

Query 5: Credential Add → SPN Graph API Lateral Movement

Purpose: After a credential is added, track what Graph API calls the SPN makes. Categorizes API endpoints into sensitive categories to identify lateral movement and data exfiltration.

Kill Chain Stage: Lateral Movement / Data Exfiltration

Tables: AuditLogs + MicrosoftGraphActivityLogs

Prerequisite: MicrosoftGraphActivityLogs must be ingested (requires Entra ID P1/P2 + diagnostic settings enabled).

// Chain Detection: Credential added → SPN Graph API calls within 72h
let lookback = 30d;
let monitorWindow = 72h;
// Step 1: Apps that had credentials added
let CredentialAdds = AuditLogs
| where TimeGenerated > ago(lookback)
| where Category == "ApplicationManagement"
| where OperationName in~ (
    "Update application – Certificates and secrets management ",
    "Add service principal credentials"
  )
| extend Target = parse_json(tostring(TargetResources))[0]
| extend TargetAppId = tostring(Target.id)
| extend TargetAppName = tostring(Target.displayName)
| extend Actor = tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName)
| project CredAddTime = TimeGenerated, Actor, TargetAppName, TargetAppId;
// Step 2: Graph API calls by those apps after credential add
CredentialAdds
| join kind=inner (
    MicrosoftGraphActivityLogs
    | where TimeGenerated > ago(lookback)
    | where isnotempty(ServicePrincipalId)
    | project GraphCallTime = TimeGenerated, AppId, RequestMethod, RequestUri, 
        ResponseStatusCode, ServicePrincipalId
) on $left.TargetAppId == $right.AppId
| where GraphCallTime between (CredAddTime .. (CredAddTime + monitorWindow))
| extend EndpointCategory = case(
    RequestUri has "/roleManagement/", "Role Management",
    RequestUri has_any ("/applications/", "/servicePrincipals/"), "App/SPN Management",
    RequestUri has "/users/", "User Enumeration",
    RequestUri has "/groups/", "Group Enumeration",
    RequestUri has "/identity/conditionalAccess/", "CA Policy Access",
    RequestUri has "/policies/", "Policy Management",
    RequestUri has "/security/", "Security Data",
    RequestUri has_any ("/mail/", "/messages", "/mailFolders"), "Email Access",
    RequestUri has_any ("/drives/", "/sites/"), "File Access",
    RequestUri has "/auditLogs/", "Audit Log Access",
    "Other")
| where EndpointCategory != "Other"
| summarize 
    GraphCalls = count(),
    Methods = make_set(RequestMethod, 5),
    SampleUris = make_set(RequestUri, 3),
    SuccessRate = round(100.0 * countif(ResponseStatusCode >= 200 and ResponseStatusCode < 300) / count(), 1)
    by CredAddTime, Actor, TargetAppName, TargetAppId, EndpointCategory
| order by CredAddTime desc, GraphCalls desc

Triage Priority:

  • 🔴 Critical: Role Management or App/SPN Management → privilege escalation / further persistence
  • 🔴 Critical: Email Access → data exfiltration (SolarWinds primary objective)
  • 🟠 High: CA Policy Access or Policy Management → defense evasion
  • 🟡 Medium: File Access → potential data staging

Query 6: Credential Add → Permission Escalation Chain

Purpose: After adding a credential (persistence), detect the attacker granting additional permissions or consenting to broader API access on the same app.

Kill Chain Stage: Persistence → Privilege Escalation

Tables: AuditLogs (self-join)

Schema Note: Credential operations and consent operations use different ID spaces for the same app (Application ObjectId vs ServicePrincipal ObjectId). This query joins on Actor + TargetAppName to bridge the gap.

// Chain Detection: Credential added → permission/consent on same app within 7d
let lookback = 30d;
let escalationWindow = 7d;
// Step 1: Credential additions
let CredentialAdds = AuditLogs
| where TimeGenerated > ago(lookback)
| where Category == "ApplicationManagement"
| where OperationName in~ (
    "Update application – Certificates and secrets management ",
    "Add service principal credentials"
  )
| extend Target = parse_json(tostring(TargetResources))[0]
| extend TargetAppName = tostring(Target.displayName)
| where isnotempty(TargetAppName)
| extend CredActor = tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName)
| where isnotempty(CredActor)
| project CredAddTime = TimeGenerated, CredActor, TargetAppName;
// Step 2: Permission grants by same actor on same-named app
let PermissionGrants = AuditLogs
| where TimeGenerated > ago(lookback)
| where Category == "ApplicationManagement"
| where OperationName in~ (
    "Add delegated permission grant",
    "Consent to application",
    "Add app role assignment to service principal"
  )
| extend EscActor = tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName)
| where isnotempty(EscActor)
| extend Target0 = parse_json(tostring(TargetResources))[0]
| extend PermAppName = tostring(Target0.displayName)
| project PermOpTime = TimeGenerated, EscActor, PermAppName, EscalationOp = OperationName;
// Join: same actor + same app + credential first then permission
CredentialAdds
| join kind=inner PermissionGrants on $left.CredActor == $right.EscActor, $left.TargetAppName == $right.PermAppName
| where PermOpTime between (CredAddTime .. (CredAddTime + escalationWindow))
| project
    CredAddTime,
    PermissionOpTime = PermOpTime,
    HoursGap = datetime_diff('hour', PermOpTime, CredAddTime),
    Actor = CredActor,
    TargetAppName,
    EscalationOp
| order by CredAddTime desc

Triage Priority:

  • 🔴 Critical: HoursGap = 0 + consent grant → automated attack tool
  • 🟠 High: Consent to powerful API scopes
  • 🟡 Medium: Add app role assignment with larger gap → possibly legitimate

Query 7: Multi-App Ownership Spread

Purpose: Detect a single user being added as owner to multiple applications within a rolling window. Attackers spread ownership across apps to maximize blast radius.

Kill Chain Stage: Persistence (breadth)

Tables: AuditLogs

// Detect lateral ownership expansion — one user becoming owner of many apps
let lookback = 30d;
AuditLogs
| where TimeGenerated > ago(lookback)
| where Category == "ApplicationManagement"
| where OperationName in~ ("Add owner to application", "Add owner to service principal")
| extend Target0 = parse_json(tostring(TargetResources))[0]
| extend NewOwnerUPN = tostring(Target0.userPrincipalName)
| extend ModProps = parse_json(tostring(Target0.modifiedProperties))
| extend TargetAppName = tostring(parse_json(tostring(ModProps[1].newValue)))
| extend TargetAppId = tostring(parse_json(tostring(ModProps[0].newValue)))
| extend InitiatedByUser = tostring(parse_json(tostring(InitiatedBy)).user.userPrincipalName)
| extend Actor = iff(isnotempty(InitiatedByUser), InitiatedByUser, tostring(parse_json(tostring(InitiatedBy)).app.displayName))
| where isnotempty(NewOwnerUPN)
| summarize
    AppsOwned = dcount(TargetAppId),
    AppNames = make_set(TargetAppName, 10),
    OwnershipOps = count(),
    FirstAdd = min(TimeGenerated),
    LastAdd = max(TimeGenerated),
    AddedBy = make_set(Actor, 5)
    by NewOwnerUPN
| extend SpreadWindowHours = datetime_diff('hour', LastAdd, FirstAdd)
| where AppsOwned >= 3
| order by AppsOwned desc

Triage Priority:

  • 🔴 Critical: AppsOwned >= 5 + SpreadWindowHours < 24 → bulk automated ownership grab
  • 🟠 High: Non-admin user (AddedBy = themselves) with AppsOwned >= 3
  • 🟡 Medium: Automation account adding ownership as part of deployment

Enhancement: Feed NewOwnerUPN values into Q1 to check for active identity risk events.

Query 8: App Governance & OAuth Incident Cross-Reference

Purpose: Surface existing Defender detections (App Governance, MCAS, Defender XDR attack disruptions) for apps in our posture assessment. Creates a cross-reference between our Graph API + KQL findings and what Microsoft's own detection products already flagged — confirming known threats and highlighting gaps.

Kill Chain Stage: Detection Validation (cross-reference)

Tables: AlertInfo + AlertEvidence

Why this matters:

  • Apps flagged by BOTH our skill AND App Governance/XDR → confirmed threat, urgent remediation
  • Apps flagged ONLY by our skill → unique detection value (the skill caught what App Governance missed)
  • Apps flagged ONLY by App Governance → coverage gap in our assessment (e.g., apps without dangerous Graph perms but with suspicious behavior)

Key field mappings (discovered via live testing):

| Field | Table | Values |
|-------|-------|--------|
| ServiceSource | AlertInfo | "App Governance", "Microsoft Defender for Cloud Apps", "Microsoft Defender XDR", "Microsoft Defender for Identity" |
| DetectionSource | AlertInfo | "App Governance Policy", "Microsoft 365 Defender", "Security Copilot", "Custom detection" |
| EntityType | AlertEvidence | "OAuthApplication" (app entities), "CloudApplication" (resource targets) |
| AdditionalFields.OAuthAppId | AlertEvidence | Application (client) ID; join key to Graph API flagged apps |
| AdditionalFields.Name | AlertEvidence | App display name |

App Governance alert types:

  • Custom policy, App Creation Policy — admin-defined rules
  • Overprivileged app, New highly privileged app — permission-based detections
  • Expiring credentials, Unused credentials, Unused app — hygiene alerts

Defender XDR OAuth alert types:

  • Malicious OAuth application registration by a compromised user — attack disruption
  • Suspicious OAuth consent and privilege escalation activity — Security Copilot detection
  • Suspicious OAuth app registration — MCAS detection
  • Anomalous OAuth device code authentication activity — MDI detection
// Q8: App Governance + OAuth Incident Cross-Reference
let lookback = 90d;
// Part 1: App Governance alerts
let AppGovAlerts = AlertInfo
| where Timestamp > ago(lookback)
| where ServiceSource == "App Governance"
| project AlertId, AlertTitle = Title, ServiceSource, DetectionSource, Severity, Timestamp;
// Part 2: OAuth-related alerts from all sources
let OAuthAlerts = AlertInfo
| where Timestamp > ago(lookback)
| where Title has "OAuth"
    or (ServiceSource == "Microsoft Defender for Cloud Apps" and Title has_any ("app registration", "OAuth"))
| project AlertId, AlertTitle = Title, ServiceSource, DetectionSource, Severity, Timestamp;
// Part 3: Attack Disruption incidents targeting OAuth/compromised-user app abuse
let AttackDisruption = AlertInfo
| where Timestamp > ago(lookback)
| where Title has "attack disruption" and Title has_any ("OAuth", "malicious", "compromised")
| project AlertId, AlertTitle = Title, ServiceSource, DetectionSource, Severity, Timestamp;
// Combine all alert sources (deduplicate)
let AllAppAlerts = union AppGovAlerts, OAuthAlerts, AttackDisruption
| summarize arg_max(Timestamp, *) by AlertId;
// Join with AlertEvidence to get OAuthApplication entities
AllAppAlerts
| join kind=leftouter (
    AlertEvidence
    | where Timestamp > ago(lookback)
    | where EntityType == "OAuthApplication"
    | extend OAuthAppId = tostring(parse_json(AdditionalFields).OAuthAppId)
    | extend OAuthAppName = tostring(parse_json(AdditionalFields).Name)
    | project AlertId, OAuthAppId, OAuthAppName, EntityType
) on AlertId
| summarize
    AlertCount = count(),
    AlertTitles = make_set(AlertTitle, 10),
    Severities = make_set(Severity, 5),
    ServiceSources = make_set(ServiceSource, 5),
    DetectionSources = make_set(DetectionSource, 5),
    LatestAlert = max(Timestamp),
    EarliestAlert = min(Timestamp)
    by OAuthAppName, OAuthAppId
| extend OAuthAppName = iff(isempty(OAuthAppName), "⚠️ No app entity extracted", OAuthAppName)
| extend HasDefenderXDR = ServiceSources has "Microsoft Defender XDR"
| extend HasAppGov = ServiceSources has "App Governance"
| extend HasMCAS = ServiceSources has "Microsoft Defender for Cloud Apps"
| extend DetectionBreadth = toint(HasDefenderXDR) + toint(HasAppGov) + toint(HasMCAS)
| order by DetectionBreadth desc, AlertCount desc

Post-processing — Cross-reference with Phase 1 flagged apps:

After Q8 returns, compare the OAuthAppId values (or OAuthAppName when no ID was extracted) against the apps flagged in Phase 1 (P4):

| Scenario | Meaning | Report Action |
|----------|---------|---------------|
| App in BOTH Phase 1 (dangerous perms) AND Q8 (existing detections) | Confirmed threat; multiple detection layers agree | 🔴 Highlight in report: "Corroborated by N existing Defender detections" |
| App in Phase 1 ONLY (dangerous perms, no Q8 hits) | Skill-unique detection; App Governance hasn't flagged it | 🟠 Highlight: "Not yet detected by App Governance — unique skill finding" |
| App in Q8 ONLY (existing detections, not in Phase 1) | App may not have dangerous Graph perms but has suspicious behavior | 🔵 Include in appendix: "Additional apps flagged by App Governance (not in dangerous-perms scope)" |
| App with DetectionBreadth ≥ 2 | Multiple Defender products independently detected the app | 🔴 Highest confidence finding |
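
The cross-reference above is essentially a set comparison between Phase 1 and Q8. The sketch below classifies each app into one of the three scenarios and flags multi-source detections separately. It assumes Phase 1 apps are identified by their application (client) ID from P5, which is what `OAuthAppId` holds; the function name and scenario labels are hypothetical.

```python
def cross_reference(phase1_app_ids: set[str], q8_rows: list[dict]) -> dict[str, dict]:
    """Classify every app seen in Phase 1 or Q8 into one report scenario."""
    # Rows without an extracted app entity cannot be matched and are skipped here.
    q8_by_id = {row["OAuthAppId"]: row for row in q8_rows if row.get("OAuthAppId")}
    findings = {}
    for app_id in sorted(phase1_app_ids | set(q8_by_id)):
        row = q8_by_id.get(app_id)
        if app_id in phase1_app_ids and row:
            scenario = "corroborated"   # Phase 1 AND Q8
        elif app_id in phase1_app_ids:
            scenario = "skill-unique"   # Phase 1 only
        else:
            scenario = "appendix"       # Q8 only
        findings[app_id] = {
            "scenario": scenario,
            "multi_source": bool(row) and row["DetectionBreadth"] >= 2,
        }
    return findings


# Illustrative inputs (invented IDs).
phase1 = {"app-1", "app-2"}
q8 = [
    {"OAuthAppId": "app-1", "DetectionBreadth": 2},
    {"OAuthAppId": "app-3", "DetectionBreadth": 1},
    {"OAuthAppId": "", "DetectionBreadth": 1},  # no app entity extracted
]
result = cross_reference(phase1, q8)
print(result)
```

Matching on the ID rather than the display name avoids false matches between apps that share a name.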

Triage Priority:

  • 🔴 Critical: DetectionBreadth ≥ 2 AND app also in Phase 1 flagged list → multi-source confirmed threat
  • 🔴 Critical: Any alert titled "Malicious OAuth application registration by a compromised user" (attack disruption) → Defender XDR auto-disrupted the attack
  • 🟠 High: App Governance Overprivileged app or New highly privileged app alerts on Phase 1 flagged apps
  • 🟡 Medium: App Governance hygiene alerts (Expiring credentials, Unused app) on any app
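The priority rules above can be read as an ordered set of checks. Below is a rough sketch under stated assumptions: alert titles are matched by substring, the function name `triage` and the `"unrated"` fallback are hypothetical, and the rule order simply follows the list.

```python
ATTACK_DISRUPTION_TITLE = "Malicious OAuth application registration by a compromised user"
HIGH_TITLES = ("Overprivileged app", "New highly privileged app")
HYGIENE_TITLES = ("Expiring credentials", "Unused app")


def triage(alert_titles, detection_breadth, in_phase1):
    """Return a priority label for one app, checking the rules top to bottom."""
    if detection_breadth >= 2 and in_phase1:
        return "critical"  # multi-source confirmed threat
    if ATTACK_DISRUPTION_TITLE in alert_titles:
        return "critical"  # attack disruption alert
    if in_phase1 and any(h in t for t in alert_titles for h in HIGH_TITLES):
        return "high"
    if any(h in t for t in alert_titles for h in HYGIENE_TITLES):
        return "medium"
    return "unrated"  # assumption: the source defines no lower tier


print(triage(["Overprivileged app"], 1, True))
```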

Output Modes

Mode 1: Inline Chat Summary

Render the full analysis directly in the chat response. Best for quick review.

Mode 2: Markdown File Report

Save a comprehensive report to disk at:

reports/app-registration-posture/App_Registration_Posture_Report_{tenant}_YYYYMMDD_HHMMSS.md

Where {tenant} is a short identifier for the tenant (derive from config.json or ask the user).
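As an illustration of the naming pattern, here is a small sketch that fills in the placeholders. It assumes the timestamp is taken in UTC (the source does not say which timezone the file name uses), and `report_path` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def report_path(tenant, now=None):
    """Build the report path from a short tenant identifier and a timestamp."""
    now = now or datetime.now(timezone.utc)  # assumption: UTC timestamps
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return (
        "reports/app-registration-posture/"
        f"App_Registration_Posture_Report_{tenant}_{stamp}.md"
    )


print(report_path("contoso"))
```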

Mode 3: Both

Generate the markdown file AND provide an inline summary in chat.

Always ask the user which mode before generating output.


Inline Report Template

Render the following sections in order. Omit sections only if explicitly noted as conditional.

🔴 URL Rule: All hyperlinks in the report MUST be copied verbatim from the URL Registry above. Do NOT generate, recall from memory, or paraphrase any URL. If a needed URL is not in the registry, use plain text (no hyperlink).

# 🔐 App Registration Security Posture Report

**Generated:** YYYY-MM-DD HH:MM UTC  
**Data Sources:** Graph API + Advanced Hunting (AuditLogs, AADServicePrincipalSignInLogs, AADUserRiskEvents, MicrosoftGraphActivityLogs, AlertInfo, AlertEvidence)  
**KQL Lookback:** <N> days (Q1–Q7); 90 days (Q8)  
**Tenant:** <tenant name> (<tenant ID>)

---

## Executive Summary

<2-3 sentences: total apps with Graph permissions, apps with dangerous permissions, key chain detection findings, overall score>

**Overall Risk Rating:** 🔴/🟠/🟡/✅ <RATING> (<Score>/100)

---

## Key Metrics

| Metric | Value |
|--------|-------|
| Apps with Graph API Permissions | <N> |
| Apps with Dangerous Permissions | <N> |
| Critical Permission Grants (🔴) | <N> |
| High Permission Grants (🟠) | <N> |
| Medium Permission Grants (🟡) | <N> |
| Ownerless Apps with Dangerous Perms | <N> |
| Apps with No Local Application Object | <N> |
| Cross-Tenant SPNs | <N> |
| Active Abuse Chain Detections (Q1–Q8) | <N total hits> |

---

## 🔐 Permission Inventory (Graph API)

### Apps with Dangerous Permissions

| App Name | Dangerous Permissions | Risk Level | Grant Dates |
|----------|----------------------|------------|-------------|
| <app> | <perm1>, <perm2>, ... | 🔴/🟠/🟡 | <dates> |

### Permission Concentration

| Permission | Apps Granted | Risk |
|------------|-------------|------|
| <perm> | <N> (<app names>) | 🔴/🟠/🟡 |

**Assessment:**
- <emoji> <evidence-based finding about permission concentration>
- <emoji> <finding about golden ticket permissions (AppRoleAssignment.ReadWrite.All)>

---

## 👤 Owner Risk Assessment

### Flagged App Owners

> **Non-optional columns:** The `Identity Protection Risk` column MUST always be present. For each owner, check Q1 results or query AADUserRiskEvents for active risk state. If no risk events exist, show "✅ None". Never drop this column.

| App Name | Owner | Owner Roles | Identity Protection Risk | Owner Risk |
|----------|-------|-------------|--------------------------|------------|
| <app> | <upn> | <roles or "None (standard user)"> | <risk state + risk types, or "✅ None"> | 🔴/🟠/🟡/🟢 |

### Ownerless Apps with Dangerous Permissions

| App Name | Dangerous Permissions | Creator (from AuditLogs) |
|----------|----------------------|--------------------------|
| <app> | <perms> | <creator UPN or "Unknown"> |

**Assessment:**
- <emoji> <finding about non-admin owners on critical-permission apps>
- <emoji> <finding about ownerless apps>

---

## 🔑 Credential Hygiene

| App Name | Active Secrets | Active Certs | Oldest Secret Age | Longest Expiry | Risk |
|----------|---------------|-------------|-------------------|----------------|------|
| <app> | <N> | <N> | <days> | <date> | 🔴/🟠/🟡/🟢 |

**Assessment:**
- <emoji> <finding about multi-credential apps>
- <emoji> <finding about long-lived secrets>
- 🟡 **Dormant privileged apps:** List any apps with dangerous permissions but NO active credentials (0 secrets, 0 valid certs). These are one `Add service principal credentials` operation away from active abuse — rate as 🟡 at assessment level (not 🟢). Example: "Contoso employee onboarding has `User.ReadWrite.All` but no credentials — dormant risk."

---

## 🌐 Cross-Tenant SPN Exposure (Q4)

<If Q4 returns results:>
| SPN Name | Owner Tenant | Sign-Ins (30d) | Distinct IPs | Resources Accessed | Auth Methods | Locations | First Seen | Last Seen |
|----------|-------------|----------------|-------------|-------------------|-------------|-----------|------------|-----------|
| <name> | <tenant ID> | <N> | <N> | <resources> | <methods> | <locations> | <date> | <date> |

> **Auth method note:** `clientAssertion` (certificate-based) indicates higher attacker sophistication than `clientSecret`. Both present on a single SPN may indicate migration or redundant credential paths.

<If Q4 enhancement returns new SPNs:>
⚠️ **New Cross-Tenant SPNs (first seen in last 7 days):**
| SPN Name | Owner Tenant |
|----------|-------------|
| <name> | <tenant ID> |

<If Q4 returns 0:>
✅ No cross-tenant SPN sign-ins detected in the last <N> days.

**Assessment:**
- <emoji> <finding about foreign-tenant SPNs with golden ticket or CA policy write permissions>
- <emoji> <finding about sign-in volume and resource breadth>
- 🔵 Filter out known [first-party Microsoft service SPNs](https://learn.microsoft.com/en-us/troubleshoot/entra/entra-id/governance/verify-first-party-apps-sign-in) — normal behavior.

---

## ⚡ Active Abuse Chain Detection (Q1–Q3, Q5–Q8)

> **Note:** Q4 (Cross-Tenant SPNs) is presented in its own section above since it doubles as both a chain detection and a posture finding.

> **Bulk-pattern collapse rule:** When any chain query (Q1–Q8) returns >10 chains where >80% share the same actor AND the same pattern (uniform resource, timing, app naming convention), collapse into a single **"Automated Pipeline"** summary row with the total count and a governance-review flag. Only table the outliers individually. This prevents automation noise from burying genuine attack chains.

### Q1: Risky User → App Operations

<If Q1 returns results, always start with a rollup summary table:>

**Summary:**
| Priority | Chains | Users | Key Finding |
|----------|--------|-------|-------------|
| 🔴 Critical | <N> | <users> | <top finding — e.g., adminConfirmedUserCompromised → app consent> |
| 🟠 High | <N> | <users> | <summary> |
| 🟡 Low | <N> | <users> | <summary or "consent-flagging-consent loops"> |

<Then detail tables for 🔴 Critical and 🟠 High chains only. Collapse 🟡 Low into the summary.>

| Risk Detected | App Operation | Hours Gap | User | Risk Types | Risk Level | Target App |
|--------------|---------------|-----------|------|------------|------------|------------|
| <date> | <date> | <N> | <upn> | <types> | <level> | <app> |

> ⚠️ **Self-referencing note:** If Q1 results are dominated by `suspiciousAuthAppApproval` risk types, these may be self-referencing — Identity Protection flags consent operations as risky, which then correlates back to the same consent. Report both the raw count and a filtered count (`| where RiskTypes !has "suspiciousAuthAppApproval"`) to distinguish genuine compromise signals from circular detections.

<If Q1 returns 0:>
✅ No risky-user → app-operations chains detected.

### Q2: Credential Add → SPN Activation

<If Q2 returns results:>
| Cred Added | First SPN Sign-In | Hours to Activation | Actor | App | Distinct IPs | Resources |
|------------|-------------------|---------------------|-------|-----|-------------|-----------|
| <date> | <date> | <N> | <upn> | <app> | <N> | <resources> |

<If Q2 returns 0:>
✅ No credential-add → SPN-activation chains detected.

### Q3: Ownership → Credential Chain

<If Q3 returns results:>
| Owner Added | Cred Operation | Hours Gap | New Owner | Same Actor? | App |
|-------------|---------------|-----------|-----------|-------------|-----|
| <date> | <date> | <N> | <upn> | <yes/no> | <app> |

<If Q3 returns 0:>
✅ No ownership → credential modification chains detected.

### Q5: Credential Add → Graph API Lateral Movement

<If Q5 returns results:>
| Cred Added | Actor | App | Endpoint Category | Graph Calls | Methods | Success Rate |
|------------|-------|-----|-------------------|-------------|---------|-------------|
| <date> | <upn> | <app> | <category> | <N> | <methods> | <pct>% |

<If Q5 returns 0:>
✅ No credential-add → Graph API lateral movement chains detected.

> **Note:** MicrosoftGraphActivityLogs requires Entra ID P1/P2 + diagnostic settings. If table not found, report as: `❓ MicrosoftGraphActivityLogs not available — cannot assess Graph API lateral movement.`

### Q6: Credential Add → Permission Escalation

<If Q6 returns results:>
| Cred Added | Perm Escalation | Hours Gap | Actor | App | Escalation Operation |
|------------|----------------|-----------|-------|-----|---------------------|
| <date> | <date> | <N> | <upn> | <app> | <operation> |

<If Q6 returns 0:>
✅ No credential-add → permission-escalation chains detected.

### Q7: Multi-App Ownership Spread

<If Q7 returns results:>
| User | Apps Owned | Spread Window (hrs) | App Names | Added By |
|------|-----------|---------------------|-----------|----------|
| <upn> | <N> | <N> | <names> | <actors> |

<If Q7 returns 0:>
✅ No multi-app ownership spread detected (threshold: ≥3 apps).

### Q8: App Governance & OAuth Incident Cross-Reference

> **Purpose:** Cross-reference Phase 1 flagged apps with existing Microsoft detections (App Governance alerts, Defender XDR OAuth alerts, Attack Disruption incidents). This validates skill findings against Microsoft's own detection coverage and surfaces apps with multi-source detections.

<If Q8 returns results:>

**Detection Summary:**
| App Name | App ID | Alert Count | Detection Sources | Detection Breadth | Highest Severity | Has Attack Disruption |
|----------|--------|-------------|-------------------|-------------------|------------------|-----------------------|
| <name> | <id> | <N> | <sources> | <N> | <severity> | ✅/❌ |

**Cross-Reference with Phase 1:**
- 🔴 **Both skill and Microsoft flagged:** <list apps found in BOTH Phase 1 dangerous-permission inventory AND Q8 detections — these are confirmed high-priority>
- 🟠 **Skill-only (no Microsoft detection):** <list apps from Phase 1 that Q8 did NOT detect — skill's unique value-add, may indicate detection gap in App Governance>
- 🔵 **Microsoft-only (not in skill scope):** <list apps from Q8 that are NOT in Phase 1 — may not have dangerous permissions but triggered behavioral alerts>

<If Q8 returns 0:>
✅ No App Governance, OAuth, or Attack Disruption alerts detected for any apps in the last 90 days.

---

## App Permission Risk Score Card

```
┌──────────────────────────────────────────────────────────────┐
│       APP PERMISSION RISK SCORE: <NN>/100                    │
│                Rating: <EMOJI> <RATING>                      │
├──────────────────────────────────────────────────────────────┤
│ Perm Concentration [<bar>] <N>/20  (<detail>)                │
│ Owner Risk         [<bar>] <N>/20  (<detail>)                │
│ Credential Hygiene [<bar>] <N>/20  (<detail>)                │
│ Cross-Tenant Exp.  [<bar>] <N>/20  (<detail>)                │
│ Active Abuse Sigs  [<bar>] <N>/20  (<detail>)                │
└──────────────────────────────────────────────────────────────┘
```

### Dimension Details

| Dimension | Score | Evidence |
|-----------|-------|----------|
| **Permission Concentration** | 🔴/🟠/🟡 <N>/20 | <N> apps with dangerous perms; list golden ticket / critical perms found |
| **Owner Risk** | 🔴/🟠/🟡 <N>/20 | <N> ownerless apps; non-admin owners on critical apps; Identity Protection signals |
| **Credential Hygiene** | 🔴/🟠/🟡 <N>/20 | Multi-secret apps; stale credentials; dormant privileged apps |
| **Cross-Tenant Exposure** | 🔴/🟠/🟡 <N>/20 | Foreign SPNs with critical perms; unknown tenant IDs; resource breadth |
| **Active Abuse Signals** | 🔴/🟠/🟡 <N>/20 | Which chain queries (Q1–Q8) returned critical results; key actors; Q8 detection breadth |

---

## Security Assessment

| Factor | Finding |
|--------|---------|
| <emoji> **<Factor>** | <Evidence-based finding> |

---

## Recommendations

> **Key context:** This skill detects signals that [Microsoft App Governance](https://learn.microsoft.com/en-us/defender-cloud-apps/app-governance-manage-app-governance) does NOT — specifically the cross-table correlation between user compromise signals and app abuse chains. Recommendations should complement App Governance, not duplicate it.

**Minimum recommendation checklist** — include ALL applicable items (skip only if the finding doesn't exist in the data). Order by severity (🔴 first):

| # | Must-Include Topic | When Applicable |
|---|-------------------|------------------|
| a | **Golden ticket / critical cross-tenant SPN remediation** | Any foreign SPN with `AppRoleAssignment.ReadWrite.All` or `Directory.ReadWrite.All` |
| b | **Compromised-user consent investigation** | Q1 returns `adminConfirmedUserCompromised` or `confirmedCompromised` chains |
| c | **Owner assignment for ownerless dangerous apps** | Any ownerless app with dangerous perms |
| d | **Stale credential rotation** | Any secret >365 days old on an app with dangerous perms |
| e | **Multi-credential reduction** | Any app with ≥3 active secrets |
| f | **Non-admin owner risk mitigation** | Non-admin user owns app with 🔴-level perms |
| g | **Single-user blast radius reduction** | Any user owns ≥20 apps (pipeline or otherwise) |
| h | **Dormant privileged app disposition** | App with dangerous perms but no credentials |
| i | **Expired-credential permission cleanup** | App with expired creds that still retains dangerous permission grants |
| j | **App Governance enablement** | Always include if not already deployed (standard closing recommendation) |

1. <emoji> **<Priority action>** — <evidence and rationale>
2. ...

---

## Related Workspace Resources

| Resource | Relationship |
|----------|-------------|
| `queries/identity/app_credential_management.md` | Individual event queries — complements chain detections |
| `queries/identity/service_principal_scope_drift.md` | SPN behavioral baseline — use for post-detection deep dive |
| `.github/skills/scope-drift-detection/spn/SKILL.md` | Full SPN investigation workflow — run on SPNs flagged by Q2 |
| `queries/cloud/behavior_entities.md` Q6 | MCAS `UnusualAdditionOfCredentialsToAnOauthApp` detection |

---

## Appendix: Query Execution Summary

| Phase | Query | Description | Records |
|-------|-------|-------------|--------|
| 1 | P1 | Find Graph SP ID | 1 |
| 1 | P2 | List permission grants | <N> |
| 1 | P3 | Resolve permission names | <N> |
| 1 | P4 | Filter dangerous perms | <N> |
| 1 | P5 | Resolve owners | <N> apps |
| 1 | P6 | Assess owner risk | <N> owners |
| 1 | P7 | Credential hygiene | <N> apps |
| 2 | Q1 | Risky User → App Ops | <N> |
| 2 | Q2 | Cred → SPN Activation | <N> |
| 2 | Q3 | Ownership → Credential | <N> |
| 2 | Q4 | Cross-Tenant SPNs | <N> |
| 2 | Q5 | Cred → Graph API | <N> |
| 2 | Q6 | Cred → Permission Esc. | <N> |
| 2 | Q7 | Ownership Spread | <N> |
| 2 | Q8 | App Gov & OAuth Cross-Ref | <N> |

Markdown File Report Template

When outputting to markdown file, use the same structure as the Inline Report Template above, saved to:

reports/app-registration-posture/App_Registration_Posture_Report_{tenant}_YYYYMMDD_HHMMSS.md

Include the following additional sections in the file report that are omitted from inline:

  1. Full permission grant table (all apps with Graph permissions, not just dangerous ones)
  2. Complete owner listing (all owners for all flagged apps, including creator fallback from AuditLogs)
  3. Credential detail table (full passwordCredentials and keyCredentials with expiry dates)
  4. Cross-tenant SPN detail (full resource access breakdown per foreign SPN)
  5. Raw Q1–Q8 results (full chain detection output, not summarized)
  6. MITRE ATT&CK mapping table (techniques detected vs not detected)

File Report Header

# App Registration Security Posture Report

**Generated:** YYYY-MM-DD HH:MM UTC  
**Data Sources:** Graph API + Advanced Hunting (AuditLogs, AADServicePrincipalSignInLogs, AADUserRiskEvents, MicrosoftGraphActivityLogs, AlertInfo, AlertEvidence)  
**KQL Lookback:** <N> days (Q1–Q7); 90 days (Q8)  
**Tenant:** <tenant name> (<tenant ID>)  
**Apps with Graph Permissions:** <N>  
**Apps with Dangerous Permissions:** <N>  
**Cross-Tenant SPNs:** <N>  
**Chain Detections (Q1–Q8):** <N total hits>

---

File Report Differences from Inline

The file report uses the same inline template structure with these additions:

  • Q1–Q8 chain sections: Include ALL result rows (inline collapses 🟡 Low into the summary)
  • Cross-Tenant SPN Exposure table: Add Auth Methods and Locations columns (inline may abbreviate)
  • Credential Hygiene table: Add Application Object column (✅ Exists / ❌ No local object)
  • Dimension Details table: Always included (inline may omit if score is low)
  • Dormant privileged apps callout: Include in credential hygiene section even for 🟢 apps

Known Pitfalls

1. Application ObjectId ≠ ServicePrincipal ObjectId

Problem: The same app has different GUIDs in TargetResources[0].id depending on the AuditLog operation type. Credential operations reference the Application ObjectId; permission/consent operations reference the ServicePrincipal ObjectId.

Impact: Joining credential events to permission events on TargetResources[0].id returns zero results even when both operations target the same app.

Solution: Q6 joins on Actor + TargetAppName (display name match) instead of ObjectId. This works reliably for same-actor chains.
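To make the mismatch concrete, the sketch below pairs two hypothetical events for the same app. The dictionary keys (`actor`, `app_name`, `target_id`) and the ID values are illustrative stand-ins, not AuditLogs column names.

```python
def join_on_actor_and_app(cred_events, perm_events):
    """Pair credential and permission events that share actor and app display name."""
    by_key = {}
    for ev in perm_events:
        by_key.setdefault((ev["actor"], ev["app_name"]), []).append(ev)
    chains = []
    for cred in cred_events:
        for perm in by_key.get((cred["actor"], cred["app_name"]), []):
            chains.append((cred, perm))
    return chains


# Same app, but each operation type reports a different object ID.
cred_events = [{"actor": "admin@contoso.com", "app_name": "Payroll Sync",
                "target_id": "application-object-id"}]
perm_events = [{"actor": "admin@contoso.com", "app_name": "Payroll Sync",
                "target_id": "service-principal-object-id"}]

id_matches = [(c, p) for c in cred_events for p in perm_events
              if c["target_id"] == p["target_id"]]
name_matches = join_on_actor_and_app(cred_events, perm_events)
print(len(id_matches), len(name_matches))
```

The trade-off is that a display-name join can pair unrelated apps that share a name, and it misses chains where a different actor performs the second step.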

2. Ownership Operations — Target Name in modifiedProperties

Problem: For "Add owner to application", TargetResources[0] is the new owner (User type), not the app. The app name is buried in TargetResources[0].modifiedProperties[1].newValue.

Solution: Extract with tostring(parse_json(tostring(ModProps[1].newValue))). Field name is Application.DisplayName.

3. OperationName Trailing Spaces

Problem: "Update application – Certificates and secrets management " has a trailing space. String equality (==) fails without it.

Solution: Use in~() with the exact string (including trailing space) or use has for substring matching.
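The effect is easy to reproduce outside KQL. This sketch uses plain Python string operations as a loose analogy: `==` behaves the same way, while Python's `in` is a simple substring test and only approximates KQL's term-based `has`.

```python
# The operation name as it appears in the log, including the trailing space.
OPERATION = "Update application – Certificates and secrets management "


def exact_match(name):
    """Equality against the string typed without the trailing space."""
    return name == "Update application – Certificates and secrets management"


def substring_match(name):
    """Substring test that is unaffected by the trailing space."""
    return "Certificates and secrets management" in name


print(exact_match(OPERATION), substring_match(OPERATION))
```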

4. Cross-Tenant SPNs Have No Local Application Object

Problem: Graph API calls to /v1.0/applications?$filter=displayName eq 'X' return empty for SPNs owned by foreign tenants — they only have a ServicePrincipal object in your tenant, not an Application object.

Impact: Cannot retrieve ownership or credential details for cross-tenant SPNs via local Graph API.

Solution: Identify cross-tenant SPNs via Q4 (AppOwnerTenantId != AADTenantId). Report them separately with a note that ownership is managed by the foreign tenant.

5. Graph API requiredResourceAccess ≠ Granted Permissions

Problem: The Application object's requiredResourceAccess shows what the app requests (manifest), not what's been admin-consented/granted.

Solution: Always use appRoleAssignedTo on the resource service principal (Step P2) for the authoritative granted permissions list.

6. Red Team Apps May Have Owners Stripped

Problem: Attack simulation tools often remove app ownership post-creation to evade detection. Graph API returns no owners.

Solution: Fall back to AuditLogs "Add application" OperationName to find the original creator. AuditLogs keep the InitiatedBy actor for as long as the log data is retained, even after ownership has been stripped from the app.

7. MicrosoftGraphActivityLogs May Not Be Available

Problem: Q5 requires MicrosoftGraphActivityLogs, which needs Entra ID P1/P2 and diagnostic settings to be enabled. Not all tenants have this.

Impact: If the table doesn't exist, Q5 returns an error.

Solution: If Q5 fails with "table not found", report as ❓ MicrosoftGraphActivityLogs not available and skip — do not fail the entire assessment. The other 7 chain queries and Graph API posture still provide substantial coverage.

8. suspiciousAuthAppApproval Self-Referencing in Q1

Problem: When a consent grant occurs, Identity Protection may flag the same event as a suspiciousAuthAppApproval risk detection. Q1 then correlates the risk event WITH the consent operation, creating a circular detection.

Solution: If Q1 results are dominated by suspiciousAuthAppApproval risk types, note in the report that these may be self-referencing. The user can filter with | where RiskTypes !has "suspiciousAuthAppApproval" for higher-confidence chains.
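Reporting both counts can be sketched as follows. The chain records and the `risk_types` key are hypothetical; the real Q1 output shape may differ.

```python
SELF_REFERENCING = "suspiciousAuthAppApproval"


def count_chains(chains):
    """Return (raw_count, filtered_count), dropping chains with the self-referencing type."""
    filtered = [c for c in chains if SELF_REFERENCING not in c["risk_types"]]
    return len(chains), len(filtered)


# Hypothetical Q1 rows for illustration only.
chains = [
    {"user": "a@contoso.com", "risk_types": ["suspiciousAuthAppApproval"]},
    {"user": "b@contoso.com", "risk_types": ["unfamiliarFeatures"]},
    {"user": "c@contoso.com", "risk_types": ["suspiciousAuthAppApproval", "unfamiliarFeatures"]},
]
raw, filtered = count_chains(chains)
print(f"{raw} chains total, {filtered} after filtering")
```

Note that this drops any chain that includes the self-referencing type, even when other risk types are present, which mirrors the `!has` filter in the text.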

9. Conflating Delegated AllPrincipals Consent with Application Permission Risk

Problem: When auditing tenant permissions (e.g., Get-MgOauth2PermissionGrant -Filter "consentType eq 'AllPrincipals'"), the returned delegated scopes can look alarming — 100+ scopes on a single app. It is tempting to rate these at the same severity as application permissions.

Why this is wrong: Delegated permissions operate as the intersection of the app's consented scopes and the signed-in user's Entra roles. A standard user cannot exploit broad delegated consent beyond their own role boundaries. The consent only removes the per-user prompt — it does not elevate privilege.

Solution: See Delegated vs Application Permissions — Risk Model. When this skill's analysis overlaps with a separate delegated consent audit, always clarify which permission type is being discussed. Application permissions (from P2/appRoleAssignedTo) are the primary risk. Delegated AllPrincipals consents are a secondary concern relevant mainly to privileged admin account compromise scenarios.
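The intersection model can be illustrated with sets. This is a deliberate simplification: it assumes that what a user's roles permit can be expressed as a set of scope names, which is not how Entra evaluates access in practice. The name `effective_delegated` and the sample scopes are illustrative.

```python
def effective_delegated(consented_scopes, user_allowed):
    """Effective delegated access = consented scopes limited by what the user may do."""
    return set(consented_scopes) & set(user_allowed)


# Broad tenant-wide consent on one app.
consented = {"User.Read", "Mail.Read", "Directory.ReadWrite.All"}

# Hypothetical capability sets for two kinds of signed-in user.
standard_user = {"User.Read", "Mail.Read"}
privileged_admin = {"User.Read", "Mail.Read", "Directory.ReadWrite.All"}

print(effective_delegated(consented, standard_user))
print(effective_delegated(consented, privileged_admin))
```

The second call shows why broad delegated consent matters mainly when a privileged admin account is compromised.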


Quality Checklist

Before delivering the report, verify:

  • Phase 1 (Graph API) completed: P1–P7 steps executed
  • Phase 2 (KQL) completed: Q1–Q8 all executed via RunAdvancedHuntingQuery
  • Zero-result queries are reported with explicit absence confirmation (✅ pattern)
  • Graph API used appRoleAssignedTo (NOT requiredResourceAccess) for permission inventory
  • App ownership queried from Application object (NOT ServicePrincipal)
  • Cross-tenant SPNs reported separately with foreign-tenant note
  • The App Permission Risk Score calculation is transparent with per-dimension evidence
  • Permission inventory includes human-readable names (not just GUIDs)
  • Owner risk assessment includes directory role check + Identity Protection status
  • Credential hygiene includes expiry dates, not just counts
  • Chain detection results include triage priority (🔴/🟠/🟡) for each finding
  • Chain detection consent findings distinguish application permission grants (🔴) from delegated consent grants (🟠/🟡) — see Delegated vs Application Permissions — Risk Model
  • Q8 cross-reference includes three-way breakdown (both flagged, skill-only, Microsoft-only)
  • Recommendations complement (not duplicate) App Governance capabilities
  • All hyperlinks copied verbatim from URL Registry — no fabricated URLs
  • No PII from live environments in the SKILL.md file itself
  • Total elapsed time reported

SVG Dashboard Generation

📊 Optional post-report step. After an App Registration Posture report is generated, the user can request a visual SVG dashboard.

Trigger phrases: "generate SVG dashboard", "create a visual dashboard", "visualize this report", "SVG from the report"

How to Request a Dashboard

  • Same chat: "Generate an SVG dashboard from the report" — data is already in context.
  • New chat: Attach or reference the report file, e.g. #file:reports/app-registration-posture/App_Registration_Posture_Report_<tenant>_<date>.md
  • Customization: Create an svg-widgets.yaml in this skill folder before requesting — the renderer reads it at generation time.

Execution

Step 1:  Read svg-widgets.yaml (this skill's widget manifest, if it exists)
Step 2:  Read .github/skills/svg-dashboard/SKILL.md (rendering rules — Manifest Mode if yaml exists, Freeform Mode otherwise)
Step 3:  Read the completed report file (data source)
Step 4:  Render SVG → save to reports/app-registration-posture/{report_name}_dashboard.svg
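Forensic analysis of Entra ID authentication flows to distinguish legitimate activity from credential/token theft. By tracing SessionId chains and checking RequestSequence and IP enrichment data, it determines whether a suspicious sign-in involved interactive MFA or refresh token reuse, and assesses geographic anomaly risk.
trace authentication, trace back to interactive MFA, SessionId analysis, token reuse, geographic anomaly, impossible travel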
.github/skills/authentication-tracing/SKILL.md
npx skills add SCStelz/security-investigator --skill authentication-tracing -g -y
SKILL.md
Frontmatter
{
    "name": "authentication-tracing",
    "description": "Use this skill when asked to trace authentication flows, analyze SessionId chains, investigate token reuse vs interactive MFA, or assess geographic anomalies in sign-ins. Triggers on keywords like \"trace authentication\", \"trace back to interactive MFA\", \"SessionId analysis\", \"token reuse\", \"geographic anomaly\", \"impossible travel\", or when investigating suspicious sign-in locations. This skill provides forensic analysis of Entra ID authentication chains to distinguish legitimate activity from credential\/token theft.",
    "drill_down_prompt": "Trace authentication chain for {entity} — SessionId analysis, token reuse, geographic anomalies",
    "threat_pulse_domains": [
        "identity"
    ]
}

Authentication Tracing - Instructions

Purpose

This skill performs forensic analysis of Entra ID authentication flows to determine whether anomalous sign-ins represent:

  • Legitimate activity (VPN usage, user travel, mobile carrier routing)
  • Token theft/credential compromise (stolen refresh tokens, session hijacking)

The key distinction is whether the user actively performed MFA at a suspicious location or if the authentication used a refresh token from a prior session.


📑 TABLE OF CONTENTS

  1. Critical Workflow Rules - Start here!
  2. Key Forensic Indicators - Understanding authentication signals
  3. IP Enrichment Data - JSON structure reference
  4. 6-Step Forensic Workflow - SessionId-based investigation
  5. Real-World Example - Complete walkthrough
  6. Authentication Methods Reference - Method patterns
  7. Risk Escalation Criteria - High/Medium/Low classification
  8. Best Practices - Summary checklist

⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

🚨 MANDATORY CHECKPOINT: Before providing ANY risk assessment for authentication anomalies:

  1. STOP - Do not improvise or use general security knowledge
  2. READ the complete risk assessment framework in this document
  3. QUOTE specific instruction sections in your analysis
  4. VERIFY your conclusions match documented guidance before responding to user

Before executing ANY authentication tracing queries, you MUST:

  1. Read the SessionId-based workflow (Steps 1-6 below) in full
  2. Search the investigation JSON for IP enrichment data (ip_enrichment array) - PRIMARY DATA SOURCE
  3. Follow the documented steps in order (SessionId → Authentication chain → Interactive MFA → Risk assessment)
  4. Use IP enrichment context in your final risk assessment (VPN status, abuse scores, threat intel, auth patterns)

Skipping these steps will result in incomplete or incorrect analysis.


Key Forensic Indicators

When investigating anomalous sign-ins (e.g., from new countries, IPs, or devices), it's critical to determine whether the user actively performed MFA at that location or if the authentication used a refresh token from a prior session.

RequestSequence Field

| Value | Meaning | Implication |
|-------|---------|-------------|
| RequestSequence: 1 or higher | Interactive authentication | User was challenged and responded |
| RequestSequence: 0 | Token-based authentication | No user interaction required |

AuthenticationDetails Array Patterns

Interactive Pattern:

  • Array contains authentication method (e.g., "Passkey (device-bound)") with RequestSequence > 0
  • Followed by "Previously satisfied" entry

Token Reuse Pattern:

  • Array contains ONLY "Previously satisfied" entries
  • Shows "MFA requirement satisfied by claim in the token"
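The two patterns can be told apart with a short check over the parsed array. This is a minimal sketch, assuming each entry carries `authenticationMethod` and `RequestSequence` keys as named in this document; the `"unknown"` fallback is an added assumption for arrays that fit neither pattern.

```python
import json


def classify(authentication_details_json):
    """Classify a sign-in as 'interactive', 'token_reuse', or 'unknown'."""
    steps = json.loads(authentication_details_json)
    if not steps:
        return "unknown"
    interactive = any(
        s.get("authenticationMethod") != "Previously satisfied"
        and int(s.get("RequestSequence", 0)) > 0
        for s in steps
    )
    if interactive:
        return "interactive"
    if all(s.get("authenticationMethod") == "Previously satisfied" for s in steps):
        return "token_reuse"
    return "unknown"


interactive_sample = json.dumps([
    {"authenticationMethod": "Passkey (device-bound)", "RequestSequence": 1},
    {"authenticationMethod": "Previously satisfied", "RequestSequence": 0},
])
token_sample = json.dumps([
    {"authenticationMethod": "Previously satisfied", "RequestSequence": 0},
])
print(classify(interactive_sample), classify(token_sample))
```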

authenticationStepDateTime Correlation

  • If authenticationStepDateTime references a time when NO interactive auth occurred, it indicates token reuse
  • Cross-reference timestamps with events that have RequestSequence > 0 to trace token origin

IP Enrichment Data Structure (PRIMARY EVIDENCE SOURCE)

CRITICAL: The investigation JSON contains a comprehensive ip_enrichment array with authoritative detection flags.

Always reference this data FIRST before making VPN/proxy/Tor determinations.

Example IP Enrichment Entry (Actual JSON Structure)

{
  "ip": "203.0.113.42",           // ← KEY: Use "ip" field, not "ip_address"
  "city": "Singapore",
  "region": "Singapore",
  "country": "SG",
  "org": "AS12345 Example Hosting Ltd",
  "asn": "AS12345",
  "timezone": "Asia/Singapore",
  "risk_level": "HIGH",           // ← Overall risk assessment (LOW/MEDIUM/HIGH)
  "assessment": "⚠️ Threat Intelligence Match: Commercial VPN Service Detected",
  "is_vpn": true,                 // ← PRIMARY VPN DETECTION FLAG (ipinfo.io detection)
  "is_proxy": false,              // ← PRIMARY PROXY DETECTION FLAG
  "is_tor": false,                // ← PRIMARY TOR DETECTION FLAG
  "abuse_confidence_score": 0,    // ← AbuseIPDB score 0-100 (0=clean, 75+=high risk)
  "total_reports": 2,             // ← Number of abuse reports in AbuseIPDB
  "is_whitelisted": false,
  "threat_description": "Commercial VPN Service: Known Anonymization Infrastructure",
  "anomaly_type": "NewInteractiveIP",
  "first_seen": "2025-10-16",     // ← First sign-in from this IP (date string)
  "last_seen": "2025-10-16",      // ← Last sign-in from this IP (date string)
  "hit_count": 5,                  // ← Number of anomaly detections
  "signin_count": 8,               // ← Total sign-ins from this IP
  "success_count": 7,              // ← Successful authentications
  "failure_count": 1,              // ← Failed authentications
  "last_auth_result_detail": "MFA requirement satisfied by claim in the token",
  "threat_detected": false,        // ← Legacy field (use threat_description instead)
  "threat_confidence": 0,
  "threat_tlp_level": "",
  "threat_activity_groups": ""
}

CRITICAL: Always use ip_enrichment[].ip to match IPs, NOT ip_address!
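A lookup keyed on the `ip` field might look like the sketch below. The `investigation` dictionary is a trimmed, hypothetical stand-in for the investigation JSON, and `index_by_ip` is an illustrative helper name.

```python
def index_by_ip(investigation):
    """Index ip_enrichment entries by their "ip" field (not "ip_address")."""
    return {entry["ip"]: entry for entry in investigation.get("ip_enrichment", [])}


# Trimmed sample using fields from the example entry above.
investigation = {
    "ip_enrichment": [
        {"ip": "203.0.113.42", "is_vpn": True, "abuse_confidence_score": 0},
    ]
}

by_ip = index_by_ip(investigation)
entry = by_ip.get("203.0.113.42")
print(entry["is_vpn"] if entry else "no enrichment for this IP")
```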

Key Fields for Analysis

| Field | Purpose | Usage Example |
|-------|---------|---------------|
| is_vpn | Definitive VPN detection | is_vpn: true → Confirmed VPN endpoint (don't infer, use this flag) |
| is_proxy | Definitive proxy detection | is_proxy: true → Confirmed proxy (anonymized traffic) |
| is_tor | Definitive Tor detection | is_tor: true → Confirmed Tor exit node (high anonymity risk) |
| abuse_confidence_score | AbuseIPDB reputation (0-100) | >= 75 = High risk, >= 25 = Medium risk, 0 = Clean |
| threat_detected | Threat intel match flag | true → IP matches ThreatIntelIndicators table |
| threat_description | Threat intel details | "Surfshark VPN", "Malicious activity detected", etc. |
| org / asn | Network ownership | AS9009 = M247 Europe (VPN infrastructure provider) |
| signin_count | Total sign-ins from IP | High count (>100) = established pattern vs transient |
| last_auth_result_detail | Authentication method | "Correct password" = interactive; "MFA requirement satisfied by claim in the token" = token reuse |
| first_seen / last_seen | Temporal pattern | Single day = transient, multi-day = established behavior |
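The thresholds and boolean flags in the table translate into two small helpers. This is a sketch only: the source does not classify scores from 1 to 24, so the `"low"` label for that range is an assumption, and both function names are hypothetical.

```python
def abuse_risk(score):
    """Bucket an AbuseIPDB confidence score using the thresholds in the table."""
    if score >= 75:
        return "high"
    if score >= 25:
        return "medium"
    if score == 0:
        return "clean"
    return "low"  # assumption: 1-24 is not classified in the source


def anonymizer_flags(entry):
    """List the explicit detection flags that are set, rather than inferring them."""
    return [name for name in ("is_vpn", "is_proxy", "is_tor") if entry.get(name) is True]


sample = {"is_vpn": True, "is_proxy": False, "is_tor": False, "abuse_confidence_score": 0}
print(abuse_risk(sample["abuse_confidence_score"]), anonymizer_flags(sample))
```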

Analysis Priority Hierarchy

  1. IP enrichment flags (is_vpn, is_proxy, is_tor) - Most authoritative source
  2. Abuse reputation (abuse_confidence_score, total_reports) - Community-validated risk data
  3. Threat intelligence (threat_detected, threat_description) - IOC matches from Sentinel
  4. Network ownership (org, asn, company_type) - Infrastructure context (hosting, ISP, etc.)
  5. Authentication patterns (last_auth_result_detail, signin_count) - Behavioral context
  6. Identity Protection (risk detections) - Microsoft ML-based risk signals

⚠️ NEVER say "likely VPN" or "probably proxy" if enrichment data has explicit boolean flags!


Forensic Workflow: Tracing Authentication Chains

Scenario: Anomalous sign-ins detected from new IP/location. Determine if user performed fresh MFA or reused token.

Tool Selection: AH-First, Data Lake Fallback

| Lookback | Tool | Table | Why |
| --- | --- | --- | --- |
| ≤ 30 days (default) | RunAdvancedHuntingQuery | EntraIdSignInEvents | Single table covers interactive + non-interactive. No union needed. Direct columns for Country, City, Browser, UserAgent. Free on Analytics tier. |
| > 30 days | mcp_sentinel-data_query_lake | union SigninLogs, AADNonInteractiveUserSignInLogs | AH Graph API caps at 30d. Data Lake retains 90d+. See Data Lake Fallback Queries below. |

Column mapping — EntraIdSignInEvents vs SigninLogs:

| EntraIdSignInEvents (AH) | SigninLogs (Data Lake) | Notes |
| --- | --- | --- |
| Timestamp | TimeGenerated | |
| AccountUpn | UserPrincipalName | |
| Application | AppDisplayName | |
| ErrorCode (int) | ResultType (string) | AH: ErrorCode == 0, DL: ResultType == "0" |
| Country, City (direct strings) | Location or parse_json(LocationDetails) | No parsing needed in AH |
| LogonType (JSON array) | Separate tables (SigninLogs vs AADNonInteractive) | AH: has "interactiveUser", DL: check which table |
| AuthenticationRequirement | AuthenticationRequirement | Same values: singleFactorAuthentication, multiFactorAuthentication |
| UserAgent, Browser, OSPlatform | parse_json(DeviceDetail) | Direct columns in AH |
| UniqueTokenId | (not available) | AH-only - token-level forensics |
| SessionId | SessionId | Same |
| (not available) | AuthenticationDetails (JSON array) | DL-only - per-step RequestSequence + authenticationMethod |

Key trade-off: AuthenticationDetails (Data Lake only) provides per-step RequestSequence and authenticationMethod ("Password", "Previously satisfied", "Mobile app notification"). EntraIdSignInEvents replaces this with row-level LogonType (interactive vs non-interactive) + AuthenticationRequirement (singleFactor/multiFactor) + UniqueTokenId. Both achieve the same forensic goal — determining interactive MFA vs token reuse — through different signals.

⚠️ Table name casing: Capital I in SignIn → EntraIdSignInEvents, NOT EntraIdSigninEvents.

⚠️ LogonType is a JSON array string (e.g., ["interactiveUser"]). Use has for filtering, NOT ==.
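
When query results are post-processed outside KQL, the same caveat applies: the LogonType value is a string holding a JSON array, so a plain equality check never matches. The sketch below assumes rows have been exported with LogonType as raw text; the helper name is hypothetical.

```python
import json


def logon_types(raw: str) -> list[str]:
    """Parse the LogonType column, which arrives as a JSON array string."""
    return json.loads(raw) if raw else []


if __name__ == "__main__":
    raw = '["interactiveUser"]'
    print(raw == "interactiveUser")               # False: the string includes brackets and quotes
    print("interactiveUser" in logon_types(raw))  # True: membership test on the parsed array
```

Parsing the array before testing membership avoids a second pitfall: a case-insensitive substring search for "interactiveUser" would also match "nonInteractiveUser".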

CRITICAL: START WITH SessionId - This is Your Primary and Most Efficient Investigation Pattern:

  1. Query suspicious IP(s) to get SessionId (single query for all suspicious IPs)
  2. Query SessionId for complete chain — interactive vs non-interactive classification, geographic progression, token tracking
  3. Find interactive sign-ins to determine where the user (or attacker) authenticated interactively
    • Expand date range progressively if needed: investigation window → 7 days → 30 days (AH limit)
    • If > 30d lookback required: Switch to Data Lake fallback queries (90d retention)

AVOID chronological searching without SessionId - it requires multiple queries and is less efficient.


Step 1: Get SessionId from Suspicious Authentication (ALWAYS START HERE)

Tool: RunAdvancedHuntingQuery

This single query gives you SessionId AND enough context to determine next steps:

let suspicious_ips = dynamic(["<IP_1>", "<IP_2>"]);  // All suspicious IPs
EntraIdSignInEvents
| where Timestamp > ago(30d)
| where AccountUpn =~ '<UPN>'
| where IPAddress in (suspicious_ips)
| project Timestamp, IPAddress, Country, City, Application,
    SessionId, LogonType, AuthenticationRequirement,
    UserAgent, Browser, OSPlatform, ErrorCode, UniqueTokenId
| order by Timestamp asc
| take 50

What This Returns:

  • SessionId(s) for suspicious authentications (your primary key for Step 2)
  • Device fingerprint (UserAgent) to check for device consistency
  • Application context
  • Initial timeline

Critical Decision Point:

  • All suspicious IPs share same SessionId? → Session continuity detected → Investigate further (could be legitimate user OR stolen token)
  • Different SessionIds across IPs? → Different authentication flows → Investigate device and authentication patterns
  • IMPORTANT: SessionId alone does NOT determine legitimacy - must correlate with UserAgent, geography, and behavior patterns
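
The decision point above amounts to grouping the Step 1 rows by SessionId and checking whether one session covers every suspicious IP. A minimal sketch, assuming the query results are available as a list of dicts keyed by the projected column names; the function names are hypothetical.

```python
from collections import defaultdict


def ips_by_session(rows: list[dict]) -> dict[str, set[str]]:
    """Group the IP addresses seen in Step 1 by SessionId."""
    sessions: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        sessions[row["SessionId"]].add(row["IPAddress"])
    return dict(sessions)


def shared_session(rows: list[dict], suspicious_ips: list[str]) -> bool:
    """True when a single SessionId covers every suspicious IP."""
    wanted = set(suspicious_ips)
    return any(wanted <= ips for ips in ips_by_session(rows).values())


if __name__ == "__main__":
    rows = [
        {"SessionId": "s-1", "IPAddress": "203.0.113.10"},
        {"SessionId": "s-1", "IPAddress": "198.51.100.7"},
    ]
    print(shared_session(rows, ["203.0.113.10", "198.51.100.7"]))
```

A True result only establishes session continuity. As noted above, it does not establish legitimacy and still has to be correlated with UserAgent, geography, and behavior.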

Step 2: Trace Complete Authentication Chain by SessionId (DEFINITIVE PROOF)

Tool: RunAdvancedHuntingQuery

Once you have SessionId from Step 1, query ALL authentications in that session:

let target_session_id = "<SESSION_ID_FROM_STEP_1>";
EntraIdSignInEvents
| where Timestamp > ago(30d)
| where AccountUpn =~ '<UPN>'
| where SessionId == target_session_id
| project Timestamp, IPAddress, Country, City, Application,
    LogonType, AuthenticationRequirement, ErrorCode,
    UserAgent, Browser, OSPlatform, UniqueTokenId
| order by Timestamp asc

This Single Query Reveals:

  • Complete geographic progression (all IPs/locations in chronological order)
  • Where interactive authentication occurred (LogonType has "interactiveUser" + AuthenticationRequirement == "multiFactorAuthentication")
  • Token reuse pattern (LogonType has "nonInteractiveUser" — all subsequent events using cached tokens)
  • Device consistency (Browser + UserAgent + OSPlatform should match across session)
  • Time gaps between locations (assess physical possibility of travel)
  • Token-level tracking (UniqueTokenId — same token reused across IPs = session continuity)
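
To assess whether the time gap between two locations is physically plausible, one option is to compute the speed a traveler would have needed. The sketch below uses the haversine great-circle formula. It assumes latitude and longitude are available for each location from a separate geolocation lookup, since the query above projects only Country and City. The 900 km/h comparison value is a hypothetical threshold (roughly airliner cruise speed), not a figure from this skill.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 900.0  # hypothetical threshold, not defined by the skill


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def required_speed_kmh(first: tuple, second: tuple) -> float:
    """Speed needed to move between two (timestamp, lat, lon) observations."""
    (t1, lat1, lon1), (t2, lat2, lon2) = first, second
    distance = haversine_km(lat1, lon1, lat2, lon2)
    hours = abs((t2 - t1).total_seconds()) / 3600
    if hours == 0:
        return math.inf if distance > 0 else 0.0
    return distance / hours


if __name__ == "__main__":
    london = (datetime(2025, 10, 15, 14, 0), 51.5074, -0.1278)
    new_york = (datetime(2025, 10, 15, 16, 0), 40.7128, -74.0060)
    speed = required_speed_kmh(london, new_york)
    print(f"{speed:.0f} km/h, plausible: {speed <= MAX_PLAUSIBLE_KMH}")
```

A speed above the threshold is a prompt for further analysis rather than proof of compromise, because VPN exit nodes and carrier routing also produce apparent jumps.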

Critical Evidence - What SessionId Indicates:

  • SessionId is a browser session identifier that tracks authentication flows
  • Same SessionId across IPs = Session continuity (could be legitimate user OR stolen token replay)
  • SessionId does NOT prove device identity - stolen refresh tokens maintain session continuity
  • Same SessionId + Same UserAgent + Geographic impossibility = Possible token theft
  • Token theft attacks maintain the original SessionId - attacker inherits session from stolen token
  • CRITICAL: Same SessionId does NOT rule out credential/token theft

Analysis Pattern:

  1. Look at FIRST authentication in session (earliest Timestamp)
  2. Check if LogonType has "interactiveUser" → User performed interactive authentication at that IP/location
  3. Check AuthenticationRequirement → multiFactorAuthentication = MFA was required and satisfied; singleFactorAuthentication = password-only
  4. Subsequent events with LogonType has "nonInteractiveUser" = token reuse (expected OAuth flow)
  5. Verify device consistency (Browser + UserAgent should match across session; different = possible token theft)
  6. Assess geographic progression (impossible travel = high risk; reasonable = needs user confirmation)
  7. Track UniqueTokenId — same token ID across geographically distant IPs = session continuity (could be VPN OR stolen token)
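
The analysis pattern above can be sketched as a single pass over the Step 2 rows. This assumes the rows are dicts keyed by the projected column names, with Timestamp as ISO 8601 strings in a uniform format (so string sorting matches chronological order) and LogonType as the raw JSON array string. The function name and output keys are hypothetical.

```python
import json
from collections import defaultdict


def summarize_session(events: list[dict]) -> dict:
    """Summarize one SessionId chain following the analysis pattern."""
    if not events:
        raise ValueError("no events for this session")
    ordered = sorted(events, key=lambda e: e["Timestamp"])
    first = ordered[0]

    # Track which IPs each token was seen on (same token on several IPs = continuity).
    token_ips: dict[str, set[str]] = defaultdict(set)
    for event in ordered:
        token_ips[event["UniqueTokenId"]].add(event["IPAddress"])

    agents = {event["UserAgent"] for event in ordered}
    return {
        "first_ip": first["IPAddress"],
        "first_is_interactive": "interactiveUser" in json.loads(first["LogonType"]),
        "first_required_mfa": first["AuthenticationRequirement"] == "multiFactorAuthentication",
        "device_consistent": len(agents) == 1,
        "tokens_seen_on_multiple_ips": sorted(
            token for token, ips in token_ips.items() if len(ips) > 1
        ),
    }
```

The summary covers items 1 to 5 and 7 of the pattern. Geographic progression (item 6) still needs a separate distance and time assessment.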

Step 3: Find Interactive Sign-Ins with Progressive Date Range Expansion

Tool: RunAdvancedHuntingQuery (≤30d) or Data Lake fallback (>30d)

Use this when Step 2 shows only nonInteractiveUser logon types (no interactive auth in the session)

Query Pattern:

EntraIdSignInEvents
| where Timestamp > ago(30d)
| where AccountUpn =~ '<UPN>'
| where LogonType has "interactiveUser"
| where ErrorCode == 0
| summarize 
    SignInCount = count(),
    Apps = make_set(Application, 5),
    Countries = make_set(Country, 3),
    AuthReqs = make_set(AuthenticationRequirement),
    TokenIds = dcount(UniqueTokenId),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp)
    by IPAddress, SessionId
| order by LastSeen desc
| take 20

What This Returns:

  • All IPs where the user performed interactive sign-ins, grouped by session
  • AuthenticationRequirement per IP — reveals whether MFA was required or bypassed
  • TokenIds count — how many distinct tokens were issued from each IP/session pair
  • Match SessionIds against Step 1 results — if the suspicious SessionId also has interactive sign-ins from a VPS IP, this strongly suggests the attacker has the password

Progressive expansion (if AH 30d window is insufficient):

  • If no interactive sign-ins found in 30d → Switch to Data Lake fallback queries for 90d lookback
  • Tokens can be valid for up to 90 days depending on tenant policy

Data Lake Fallback Queries (>30d)

Use these when the AH 30d window is insufficient — e.g., tracing token origins older than 30 days.

Tool: mcp_sentinel-data_query_lake with workspaceId

Step 1 (Data Lake):

let suspicious_ips = dynamic(["<IP_1>", "<IP_2>"]);
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(90d)
| where UserPrincipalName =~ '<UPN>'
| where IPAddress in (suspicious_ips)
| project TimeGenerated, IPAddress, Location, AppDisplayName, 
    SessionId = tostring(SessionId), UserAgent, ResultType, CorrelationId
| order by TimeGenerated asc
| take 50

Step 2 (Data Lake) — with per-step auth detail:

let target_session_id = "<SESSION_ID>";
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(90d)
| where UserPrincipalName =~ '<UPN>'
| where SessionId == target_session_id
| extend AuthDetails = parse_json(tostring(AuthenticationDetails))
| mv-expand AuthDetails
| extend AuthMethod = tostring(AuthDetails.authenticationMethod)
| extend AuthStepDateTime = todatetime(AuthDetails.authenticationStepDateTime)
| extend RequestSeq = toint(AuthDetails.RequestSequence)
| project TimeGenerated, IPAddress, Location, AppDisplayName, 
    AuthMethod, AuthStepDateTime, RequestSeq, UserAgent, ResultType
| order by TimeGenerated asc

Data Lake advantage: AuthenticationDetails provides granular per-step RequestSequence and authenticationMethod ("Password", "Previously satisfied", "Mobile app notification") not available in EntraIdSignInEvents. Use this for forensic-grade MFA step tracing when AH's LogonType + AuthenticationRequirement columns are insufficient.

Step 3 (Data Lake) — interactive MFA search:

union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(90d)
| where UserPrincipalName =~ '<UPN>'
| extend AuthDetails = parse_json(tostring(AuthenticationDetails))
| mv-expand AuthDetails
| extend AuthMethod = tostring(AuthDetails.authenticationMethod)
| extend RequestSeq = toint(AuthDetails.RequestSequence)
| where AuthMethod != "Previously satisfied"
| where RequestSeq > 0
| project TimeGenerated, IPAddress, Location, AppDisplayName, AuthMethod,
    RequestSeq, SessionId = tostring(SessionId), UserAgent, ResultType
| order by TimeGenerated desc
| take 30

Step 4: Collect All IPs from Authentication Chain

CRITICAL: After completing the SessionId trace, extract ALL unique IP addresses discovered:

  1. From Interactive MFA session (Step 3 results)
  2. From Suspicious session (Step 1 results)
  3. From Complete SessionId chain (Step 2 results)

Build comprehensive IP list for enrichment analysis.
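
A minimal sketch of the collection step, assuming each step's results are lists of dicts with an IPAddress key. The function name is hypothetical. First-seen order is preserved so the list can be read alongside the timeline.

```python
def collect_chain_ips(*result_sets: list[dict]) -> list[str]:
    """Return every unique IP from the given result sets, in first-seen order."""
    seen: dict[str, None] = {}
    for rows in result_sets:
        for row in rows:
            ip = row.get("IPAddress")
            if ip:
                seen.setdefault(ip, None)
    return list(seen)


if __name__ == "__main__":
    step3 = [{"IPAddress": "203.0.113.10"}]
    step1 = [{"IPAddress": "198.51.100.7"}, {"IPAddress": "203.0.113.10"}]
    step2 = [{"IPAddress": "192.0.2.44"}]
    print(collect_chain_ips(step3, step1, step2))
```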


Step 5: Analyze IP Enrichment Data for ALL Discovered IPs

MANDATORY: Search investigation JSON ip_enrichment array for EVERY IP in the authentication chain:

For each IP address discovered in Steps 1-3:

  1. Locate IP in ip_enrichment array (search by "ip": "<IP_ADDRESS>" field)

  2. Extract key risk indicators:

    • is_vpn, is_proxy, is_tor (anonymization detection)
    • abuse_confidence_score, total_reports (reputation)
    • threat_description, threat_detected (threat intel matches)
    • org, asn (network ownership - hosting vs ISP)
    • last_auth_result_detail (authentication pattern)
    • signin_count, success_count, failure_count (frequency/behavior)
    • first_seen, last_seen (temporal pattern - transient vs established)
  3. Document findings for EACH IP in the chain:

    • Geographic location + ISP/VPN status
    • Risk level + threat intelligence status
    • Authentication pattern (interactive vs token reuse)
    • Behavioral context (frequency, success rate, temporal pattern)

This creates a complete evidence picture showing the full authentication journey with enrichment context.
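
The lookup in Step 5 can be sketched as follows, assuming the investigation JSON has already been loaded into a dict with an ip_enrichment list whose entries carry the "ip" field described earlier. The function name is hypothetical. IPs with no enrichment entry are returned separately so they are reported rather than silently skipped.

```python
def enrichment_for_chain(investigation: dict, chain_ips: list[str]) -> tuple[dict, list[str]]:
    """Match chain IPs against ip_enrichment entries by their "ip" field."""
    by_ip = {entry["ip"]: entry for entry in investigation.get("ip_enrichment", [])}
    found = {ip: by_ip[ip] for ip in chain_ips if ip in by_ip}
    missing = [ip for ip in chain_ips if ip not in by_ip]
    return found, missing


if __name__ == "__main__":
    investigation = {
        "ip_enrichment": [
            {"ip": "203.0.113.10", "is_vpn": False, "abuse_confidence_score": 0},
        ]
    }
    found, missing = enrichment_for_chain(investigation, ["203.0.113.10", "198.51.100.7"])
    print(sorted(found), missing)
```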


Step 6: Document Risk Assessment

⚠️ MANDATORY CHECKPOINT - Before writing risk assessment:

  • READ the "When to Escalate Authentication Anomalies" section below
  • IDENTIFY which risk classification criteria applies to your case
  • QUOTE the specific criteria in your analysis
  • DO NOT improvise - follow documented classification exactly

Present findings in clear evidence trail:

  1. Interactive Session: IP, Location, Timestamp, AuthMethod, SessionId
  2. Subsequent Session: IP, Location, Timestamp, AuthMethod (token-based), SessionId
  3. IP Enrichment Analysis for ALL IPs: Present enrichment data for EVERY IP discovered in trace (VPN status, abuse scores, threat intel, auth patterns, frequency, temporal context)
  4. Connection Proof: SessionId match + time gap + geographic distance + comprehensive enrichment context from all IPs
  5. Risk Assessment: Evaluate based on context - MUST quote specific instruction criteria

Risk Assessment Framework - SessionId Interpretation:

  • SessionId does NOT prove device identity - token theft maintains session continuity
  • Same SessionId across geographically distant IPs = Requires investigation (VPN/travel OR stolen token)
  • Different SessionIds = Different authentication flows (not necessarily more suspicious)
  • Must correlate multiple signals: SessionId + UserAgent + Geography + Behavior + Time patterns + IP enrichment data

Real-World Example: Geographic Anomaly Analysis

Scenario: User sign-ins detected from two geographically distant locations within 18 hours.

Step 1: Interactive MFA Analysis

Location A Analysis:

  1. Query 1: Found 2 events with SMS verification and RequestSeq: 1
  2. Result: User performed fresh interactive SMS authentication at Location A
  3. Evidence: authenticationStepDateTime: 2025-10-15T14:23:05Z with RequestSequence: 1

Location B Analysis:

  1. Query 1: Zero results (no non-"Previously satisfied" methods)
  2. Result: Location B authentications used only token reuse - NO interactive MFA
  3. Evidence: All events show "MFA requirement satisfied by claim in the token"

Step 2: SessionId Verification (SMOKING GUN)

Query to compare sessions across both IPs:

let suspicious_ips = dynamic(["<IP_ADDRESS_1>", "<IP_ADDRESS_2>"]);
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated between (datetime(<START_DATE>) .. datetime(<END_DATE>))
| where UserPrincipalName =~ '<UPN>'
| where IPAddress in (suspicious_ips)
| project TimeGenerated, IPAddress, Location, SessionId, UserAgent
| order by TimeGenerated asc

CRITICAL FINDING:

  • SessionId: <SESSION_ID_EXAMPLE>
  • ALL Location A authentications: Same SessionId (over time period 1)
  • ALL Location B authentications: Same SessionId (over time period 2)
  • Time gap: Varies (analyze based on context)
  • Geographic distance: Varies (analyze based on context)

Initial Appearance: Potential geographic anomaly requiring investigation

Further Analysis Required: Correlate SessionId with UserAgent, behavior patterns, and user confirmation

Step 3: Evidence Summary and Interpretation

| Evidence Type | Finding | Observation |
| --- | --- | --- |
| Interactive MFA | Location A only | User performed SMS authentication |
| Location B Auth Methods | "Previously satisfied" only | Token reuse (normal OAuth flow) |
| SessionId | Same across both locations | Session continuity maintained |
| Time Gap | 18 hours | Within typical refresh token lifetime (24-90 days) |
| User Agent | Same | Consistent device fingerprint |
| Applications | Consistent across locations | Consistent workflow pattern |

Critical Analysis - SessionId Does NOT Prove Legitimacy

The same SessionId requires careful analysis because:

  • SessionId is a browser session identifier that tracks authentication flows
  • Same SessionId = Session continuity (could be legitimate user OR stolen token)
  • Stolen refresh tokens maintain the original SessionId - attacker inherits session state
  • Same SessionId does NOT rule out token theft or credential compromise

Possible Scenarios Requiring Investigation:

| Scenario | Description | Action Required |
| --- | --- | --- |
| Legitimate VPN Connection | User switched VPN exit nodes (same device, different apparent location) | Requires user confirmation |
| Legitimate User Travel | User traveled between locations with sufficient time gap (tokens remained valid) | Requires user confirmation |
| Multi-Device User | User has laptop + phone active simultaneously (different IPs, concurrent activity) | Check UserAgent for mobile vs desktop - Requires user confirmation |
| Stolen Token Replay | Attacker obtained refresh token (SessionId stays same, may show different UserAgent) | Cannot be ruled out by SessionId alone |
| Mobile Carrier Routing | Carrier routes traffic through regional gateways (device in one location, exits another) | Check IP enrichment for ISP org |

Additional Investigation Checklist

  • ✅ Check UserAgent consistency across all sessions
  • Distinguish mobile vs desktop UserAgents - Concurrent activity from different device types (e.g., Android Chrome + Windows Edge) may indicate legitimate multi-device usage, not token theft
  • ✅ Verify geographic progression is physically possible
  • ✅ Review applications accessed (any unusual admin tools?)
  • ✅ Check for failed authentication attempts before success
  • ✅ Look for account modifications or privilege changes
  • Check IP enrichment data in investigation JSON - Use ip_enrichment array to verify:
    • VPN/proxy/Tor status (is_vpn, is_proxy, is_tor)
    • Abuse reputation (abuse_confidence_score, total_reports)
    • Threat intelligence matches (threat_detected, threat_description)
    • Authentication patterns (last_auth_result_detail, signin_count, success_count, failure_count)
    • Temporal context (first_seen, last_seen - transient vs established pattern)
  • Most important: Confirm with user directly

Recommendation: User Confirmation Questions

Use IP enrichment data from investigation JSON to strengthen your analysis, then confirm with user:

  1. "Were you using a VPN on [date] around [time]?" (if is_vpn: true)
  2. "Did you travel between [Location A] and [Location B] during this timeframe?"
  3. "Were you using multiple devices (e.g., laptop and phone) at the same time?" (if concurrent activity with different UserAgents detected)
  4. "Do you recognize [applications] activity during this timeframe?"
  5. "Have you noticed any unusual device or account behavior recently?"

Only after user confirmation can you conclude VPN usage or travel is legitimate. Same SessionId + IP enrichment data together provide strong evidence, but user confirmation is still required.


Common Authentication Methods and RequestSequence Patterns

| Authentication Method | RequestSeq > 0 Meaning | RequestSeq = 0 Meaning |
| --- | --- | --- |
| Passkey (device-bound) | User physically approved with biometric/PIN | Passkey used in prior session, token reused |
| Phone sign-in | User approved notification on phone | Phone approval in prior session, token reused |
| SMS verification | User entered SMS code | SMS verification in prior session, token reused |
| Microsoft Authenticator app | User approved push notification | Authenticator used in prior session, token reused |
| Previously satisfied | N/A - never has RequestSeq > 0 | Always indicates token/claim reuse |
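
The table reduces to one test per authentication step, which mirrors the filter used in the Data Lake Step 3 query (AuthMethod != "Previously satisfied" and RequestSeq > 0). A minimal sketch with a hypothetical function name:

```python
def is_fresh_interactive_step(auth_method: str, request_seq: int) -> bool:
    """True when a step reflects the user authenticating now, not token reuse."""
    return auth_method != "Previously satisfied" and request_seq > 0


if __name__ == "__main__":
    print(is_fresh_interactive_step("SMS verification", 1))      # True
    print(is_fresh_interactive_step("SMS verification", 0))      # False
    print(is_fresh_interactive_step("Previously satisfied", 0))  # False
```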

When to Escalate Authentication Anomalies

CRITICAL: Always check IP enrichment data before making risk determination!

High Risk (Escalate Immediately)

  • Token reuse from geographically impossible locations (regardless of SessionId)
  • Token reuse after user reports device loss/theft
  • Concurrent sessions from multiple countries simultaneously with same UserAgent (same device can't be in two places)
  • Note: Concurrent activity with different UserAgents (mobile vs desktop) may indicate legitimate multi-device usage - verify before escalating
  • Token reuse from IPs matching ThreatIntelIndicators OR threat_detected: true in IP enrichment
  • Unusual application access (admin portals, sensitive resources not in user's normal pattern)
  • Failed authentication attempts followed by successful token reuse
  • Account modifications or privilege escalations during suspicious sessions
  • Geographic anomaly + Same SessionId + Different UserAgent = Likely token theft
  • Impossible travel time between authentications (regardless of SessionId)
  • IP enrichment shows: abuse_confidence_score >= 75, is_tor: true, or malicious threat_description

Medium Risk (Investigate Further - Confirm with User)

  • Same SessionId + Geographically distant locations = Could be VPN/travel OR token theft - VERIFY with IP enrichment
  • Concurrent activity from different IPs with different UserAgents = Could be multi-device (laptop + phone) OR token theft - ASK user about device usage
  • Token reuse from unexpected country without prior user notification
  • Token reuse spanning >30 days (excessive token lifetime - increases theft window)
  • Pattern of token-only authentications without any interactive MFA in 30+ days
  • Sign-ins during unusual hours for user's timezone
  • Access to sensitive data repositories during suspicious sessions
  • Same SessionId + Same UserAgent + Unusual geographic pattern = Needs user confirmation
  • IP enrichment shows: abuse_confidence_score >= 25, is_vpn: true without user confirmation, or total_reports > 0

Low Risk / Likely Legitimate (Monitor Only)

  • Token reuse from nearby IPs in same city (mobile carrier IP rotation)
  • Token reuse following confirmed interactive MFA from expected location
  • Token reuse from known corporate VPN IP ranges
  • Applications and access patterns consistent with user's role
  • User confirms VPN usage or travel when questioned
  • No unusual data access or configuration changes
  • Consistent UserAgent + Reasonable geographic progression + User confirmation
  • IP enrichment shows: abuse_confidence_score: 0, residential ISP org (TELUS, Comcast, etc.), is_vpn: false, high signin_count with consistent success rate
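
The IP enrichment bullet in each tier can be expressed as a small pre-check. This sketch covers only those enrichment criteria. The behavioral criteria (impossible travel, UserAgent changes, unusual application access, user confirmation) still have to be applied by the analyst, and the final classification must quote the documented criteria. The function name is hypothetical, the "Unclassified" label is an assumption for records that match no enrichment bullet, and the Low check is simplified: it does not inspect org or signin_count.

```python
from typing import Any


def enrichment_risk(record: dict[str, Any], vpn_confirmed_by_user: bool = False) -> str:
    """Tier an IP using only the IP-enrichment criteria from the escalation lists."""
    score = record.get("abuse_confidence_score") or 0
    reports = record.get("total_reports") or 0

    # High: abuse score >= 75, Tor exit node, or threat intel match.
    if score >= 75 or record.get("is_tor") is True or record.get("threat_detected") is True:
        return "High"

    # Medium: abuse score >= 25, unconfirmed VPN, or any abuse reports.
    unconfirmed_vpn = record.get("is_vpn") is True and not vpn_confirmed_by_user
    if score >= 25 or unconfirmed_vpn or reports > 0:
        return "Medium"

    # Low (simplified): clean abuse score and no higher-tier trigger.
    if score == 0:
        return "Low"
    return "Unclassified"  # assumption: scores 1-24 with no reports are not covered
```

For example, a record with is_vpn: true and a score of 0 returns "Medium" until the user confirms VPN usage, at which point the same record returns "Low".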

Best Practices for Authentication Tracing

  1. START WITH SessionId - Query suspicious IPs to get SessionId first (most efficient approach)
  2. Use SessionId to trace complete chain - Single query shows entire authentication progression
  3. Check IP enrichment data - Use investigation JSON ip_enrichment array for VPN, abuse scores, threat intel
  4. Verify device consistency - Same SessionId + Same UserAgent + Geographic reasonableness = Likely legitimate
  5. Check for multi-device scenarios - Different UserAgents (mobile vs desktop) with concurrent activity often indicates legitimate multi-device usage, not token theft. Users commonly work on laptop while checking email on phone.
  6. Concurrent activity ≠ Automatic compromise - Before concluding token theft from concurrent sessions, verify UserAgent differences and ask user about device usage patterns
  7. SessionId alone is NOT conclusive - Must correlate with UserAgent, geography, behavior, and user confirmation
  8. Check first authentication in session - RequestSeq > 0 shows where user performed interactive MFA
  9. Assess geographic progression - Evaluate if travel is physically possible or if VPN is likely
  10. Widen time ranges if needed - Tokens can be valid for 24-90 days depending on policy
  11. Always confirm with user - Geographic anomalies require user verification regardless of SessionId

Prerequisites

Required MCP Servers

This skill requires:

  1. Sentinel Triage MCP (primary, ≤30d) — RunAdvancedHuntingQuery for EntraIdSignInEvents
  2. Microsoft Sentinel Data Lake MCP (fallback, >30d) — mcp_sentinel-data_query_lake for SigninLogs + AADNonInteractiveUserSignInLogs union

Required Data Sources

  • EntraIdSignInEvents (primary) — Unified interactive + non-interactive sign-in events. Advanced Hunting only, 30d retention via Graph API
  • SigninLogs + AADNonInteractiveUserSignInLogs (fallback) — Sentinel Data Lake, 90d+ retention. Required when tracing token origins older than 30 days or when AuthenticationDetails per-step granularity is needed
  • Investigation JSON - Pre-generated investigation file with ip_enrichment array (from user-investigation skill)

How to Find Investigation JSON

  • Pattern: temp/investigation_<upn_prefix>_<timestamp>.json
  • Most recent file for user is usually the one to analyze
  • Use file_search or list_dir to locate existing investigations
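
Outside the agent's own file tools, the same lookup can be sketched with a glob on the documented pattern. The timestamp format in the filename is not specified here, so this sketch assumes file modification time is an acceptable proxy for "most recent". The function name is hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_investigation(upn_prefix: str, root: str = "temp") -> Optional[Path]:
    """Return the most recently modified investigation file for a user, if any."""
    pattern = f"investigation_{upn_prefix}_*.json"
    matches = sorted(Path(root).glob(pattern), key=lambda p: p.stat().st_mtime)
    return matches[-1] if matches else None
```

A None result means no prior investigation exists for that user, in which case the user-investigation skill has to run first to produce the ip_enrichment array.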

Integration with User Investigation Skill

Authentication tracing is typically performed as a follow-up analysis after running a user investigation:

  1. Run user-investigation skill → Generates investigation JSON with ip_enrichment array
  2. Review anomalies → Identify suspicious IPs/locations requiring deeper analysis
  3. Run authentication-tracing skill → Trace SessionId chains, correlate with IP enrichment
  4. Document findings → Provide risk assessment with evidence trail

Key Integration Points:

  • IP enrichment data comes from investigation JSON (already queried by user-investigation)
  • SessionId queries are NEW queries specific to authentication tracing
  • Risk assessment combines both data sources
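Performs comprehensive security investigations of Windows/macOS/Linux devices, covering malware, compliance, and anomalous-activity analysis. Supports Entra ID-managed devices, pulling alerts, vulnerabilities, and network data via KQL and the Graph API to generate detailed reports or visual dashboards.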
investigate computer investigate device investigate endpoint check machine device security endpoint investigation
.github/skills/computer-investigation/SKILL.md
npx skills add SCStelz/security-investigator --skill computer-investigation -g -y
SKILL.md
Frontmatter
{
    "name": "computer-investigation",
    "description": "Use this skill when asked to investigate a computer, device, endpoint, or machine for security issues, suspicious activity, malware, or compliance review. Triggers on keywords like \"investigate computer\", \"investigate device\", \"investigate endpoint\", \"check machine\", \"device security\", \"endpoint investigation\", or when a device name\/hostname is mentioned with investigation context. This skill provides comprehensive device security analysis including Defender alerts, sign-in patterns, logged-on users, vulnerabilities, software inventory, compliance status, network activity, and automated investigation tracking for Entra Joined, Hybrid Joined, and Entra Registered devices.",
    "drill_down_prompt": "Investigate device {entity} — Defender alerts, process activity, vulnerabilities, compliance",
    "threat_pulse_domains": [
        "endpoint"
    ]
}

Computer Security Investigation - Instructions

Purpose

This skill performs comprehensive security investigations on Windows, macOS, and Linux devices registered in Microsoft Entra ID and/or managed by Microsoft Defender for Endpoint. It analyzes Defender alerts, device compliance, sign-in patterns, logged-on users, installed software, vulnerabilities, network connections, and automated investigation results for:

  • Entra Joined Devices: Cloud-only devices joined directly to Microsoft Entra ID
  • Hybrid Joined Devices: Devices joined to both on-premises Active Directory and Microsoft Entra ID
  • Entra Registered Devices: Personal devices (BYOD) registered with Microsoft Entra ID

📑 TABLE OF CONTENTS

  1. Critical Workflow Rules - Start here!
  2. Investigation Types - Standard/Quick/Comprehensive
  3. Output Modes - Inline / Markdown file / JSON export
  4. Quick Start - 5-step investigation pattern
  5. Execution Workflow - Complete process
  6. Sample KQL Queries - Validated query patterns
  7. Microsoft Graph Queries - Entra ID device data
  8. Defender for Endpoint Queries - MDE API integration
  9. Markdown Report Template - Full markdown report structure
  10. JSON Export Structure - Required fields
  11. Error Handling - Troubleshooting guide
  12. SVG Dashboard Generation - Visual dashboard from report data

Investigation shortcuts:

  • Device with behavioral drift (TP Q6): Q3 (suspicious processes) → Q11 (logon events) → Q7 (incidents) → Q8 (device info)
  • Internet-facing critical asset (TP Q11): Q8 (device info + internet-facing) → Q4 (outbound connections) → Q10 (vulnerabilities) → Q11 (logon events)
  • Device in active incident (TP Q1): Q2 (security alerts) → Q3 (process execution) → Q5 (file events) → Q6 (registry persistence) → Q7 (incidents)
  • Brute-forced endpoint (TP Q4): Q11 (logon events) → Q4 (outbound connections) → Q12 (TI IP matches)
  • Vulnerability assessment (TP Q12): Q9 (software inventory) → Q10 (CVEs on device) → Q8 (exposure score)

⛔ Shortcut Default Rule: When a matching shortcut exists for the investigation context, use it — don't run the full workflow. Only run the full query set when the user explicitly requests "full investigation", "comprehensive", or "deep dive". Shortcuts render only the report sections relevant to their query chain (plus Executive Summary and Recommendations, always).


⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

Before starting ANY computer investigation:

  1. ALWAYS get Device ID FIRST (required for Defender API and Graph queries - multiple IDs exist!)
  2. ALWAYS determine device type (Entra Joined, Hybrid Joined, or Entra Registered)
  3. ALWAYS calculate date ranges correctly (use current date from context - see Date Range section)
  4. ALWAYS ask the user for output mode if not specified: inline chat summary, markdown file report, or JSON export (see Output Modes)
  5. ALWAYS track and report time after each major step (mandatory)
  6. ALWAYS run independent queries in parallel (drastically faster execution)
  7. ALWAYS use create_file for JSON export and markdown reports (NEVER use PowerShell terminal commands)
  8. ⛔ ALWAYS enforce Sentinel workspace selection (see Workspace Selection section below)

⛔ MANDATORY: Sentinel Workspace Selection

This skill requires a Sentinel workspace to execute queries. Follow these rules STRICTLY:

When invoked from a parent skill (incident-investigation, threat-pulse, etc.):

  • Inherit the workspace selection from the parent investigation context
  • If no workspace was selected in parent context: STOP and ask user to select
  • Use the SELECTED_WORKSPACE_IDS passed from the parent skill
  • Skip output mode prompts — default to inline chat (the parent skill controls the final output format)

When invoked standalone (direct user request):

  1. ALWAYS call list_sentinel_workspaces MCP tool FIRST
  2. If 1 workspace exists: Auto-select, display to user, proceed
  3. If multiple workspaces exist:
    • Display all workspaces with Name and ID
    • ASK: "Which Sentinel workspace should I use for this investigation?"
    • ⛔ STOP AND WAIT for user response
    • ⛔ DO NOT proceed until user explicitly selects
  4. If a query fails on the selected workspace:
    • ⛔ DO NOT automatically try another workspace
    • STOP and report the error
    • Display available workspaces
    • ASK user to select a different workspace
    • WAIT for user response

Workspace Failure Handling

IF query returns "Failed to resolve table" or similar error:
    - STOP IMMEDIATELY
    - Report: "⚠️ Query failed on workspace [NAME] ([ID]). Error: [ERROR_MESSAGE]"
    - Display: "Available workspaces: [LIST_ALL_WORKSPACES]"
    - ASK: "Which workspace should I use instead?"
    - WAIT for explicit user response
    - DO NOT retry with a different workspace automatically

🔴 PROHIBITED ACTIONS:

  • ❌ Selecting a workspace without user consent when multiple exist
  • ❌ Switching to another workspace after a failure without asking
  • ❌ Proceeding with investigation if workspace selection is ambiguous
  • ❌ Assuming a workspace based on previous sessions
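
The standalone selection rules can be sketched as a function that either returns a workspace or signals that the user must choose. The Workspace type, the exception, and the function name are hypothetical; in practice the workspace list comes from the list_sentinel_workspaces MCP tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Workspace:
    name: str
    workspace_id: str


class NeedsUserChoice(Exception):
    """Raised when the investigation must stop and wait for the user."""


def choose_workspace(workspaces: list[Workspace], user_choice: Optional[str] = None) -> Workspace:
    """Auto-select only when exactly one workspace exists; otherwise require a choice."""
    if not workspaces:
        raise NeedsUserChoice("No Sentinel workspaces available.")
    if len(workspaces) == 1:
        return workspaces[0]
    if user_choice is None:
        listing = ", ".join(f"{w.name} ({w.workspace_id})" for w in workspaces)
        raise NeedsUserChoice(f"Which Sentinel workspace should I use? Available: {listing}")
    for workspace in workspaces:
        if workspace.workspace_id == user_choice:
            return workspace
    raise NeedsUserChoice(f"Workspace {user_choice} is not in the available list.")
```

The function never falls back to a different workspace on its own, which matches the prohibited actions above: every ambiguous or failed case ends in a prompt to the user.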

Device ID Types:

  • Entra Device ID (Azure AD Object ID): Used for Graph API queries - GUID format
  • Defender Device ID: Used for MDE API queries - GUID format (different from Entra ID!)
  • Device Name/Hostname: Human-readable name, use for initial search
  • Intune Device ID: Used for Intune management queries

Date Range Rules:

  • Real-time/recent searches: Add +2 days to current date for end range
  • Historical ranges: Add +1 day to user's specified end date
  • Example: Current date = Jan 23, 2026; "Last 7 days" → datetime(2026-01-16) to datetime(2026-01-25) (start = current date minus 7 days, end = current date plus 2 days)
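
The two date rules can be sketched as follows and checked against the example above. The function names are hypothetical.

```python
from datetime import date, timedelta


def realtime_range(today: date, lookback_days: int) -> tuple[date, date]:
    """Real-time/recent search: start at today minus lookback, end at today plus 2 days."""
    return today - timedelta(days=lookback_days), today + timedelta(days=2)


def historical_range(start: date, end: date) -> tuple[date, date]:
    """Historical search: keep the start, extend the user's end date by 1 day."""
    return start, end + timedelta(days=1)


if __name__ == "__main__":
    start, end = realtime_range(date(2026, 1, 23), 7)
    print(f"datetime({start}) to datetime({end})")  # datetime(2026-01-16) to datetime(2026-01-25)
```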

Device Types Reference

Entra Joined Devices

  • trustType: AzureAd
  • Characteristics: Cloud-only, no on-premises AD connection
  • Identity: Uses Entra ID for authentication
  • Common scenarios: Cloud-native organizations, Windows Autopilot deployments

Hybrid Joined Devices

  • trustType: ServerAd (indicates hybrid join with on-premises AD)
  • Characteristics: Joined to both on-premises AD and Entra ID
  • Identity: Uses both on-premises AD and Entra ID
  • Common scenarios: Traditional enterprise environments migrating to cloud

Entra Registered Devices

  • trustType: Workplace
  • Characteristics: Personal/BYOD devices, user adds work account
  • Identity: User authenticates with Entra ID, device not fully managed
  • Common scenarios: BYOD policies, personal device access to corporate resources

Available Investigation Types

Standard Investigation (7 days)

When to use: General security reviews, routine investigations

Example prompts:

  • "Investigate device WORKSTATION-001 for the last 7 days"
  • "Run security investigation for computer LAP-JSMITH from 2026-01-16 to 2026-01-23"
  • "Check endpoint security for DESKTOP-ABC123"

Quick Investigation (1 day)

When to use: Urgent cases, active malware alerts, recent suspicious activity

Example prompts:

  • "Quick investigate infected device SRV-SQL01"
  • "Run quick security check on machine WKS-FINANCE02"
  • "Urgent: check device LAPTOP-EXEC-01 for compromise"

Comprehensive Investigation (30 days)

When to use: Deep-dive analysis, lateral movement detection, thorough forensics

Example prompts:

  • "Full investigation for potentially compromised device SRV-DC01"
  • "Do a deep dive investigation on endpoint WORKSTATION-IT03 last 30 days"
  • "Comprehensive security analysis for hybrid joined device DESKTOP-HR01"

All types include: Defender alerts, device compliance, sign-in patterns from device, logged-on users, software inventory, vulnerabilities, network connections, file activities, automated investigation status, and security recommendations.


Output Modes

This skill supports three output modes. ASK the user which they prefer if not explicitly specified. Multiple modes may be selected simultaneously.

Mode 1: Inline Chat Summary (Default)

  • Render the full investigation analysis directly in the chat response
  • Includes device profile, risk assessment, alerts, vulnerabilities, logged-on users, and recommendations
  • Best for quick review and interactive follow-up questions
  • No file output — results stay in the chat context

Mode 2: Markdown File Report

  • Save a comprehensive investigation report to reports/computer-investigations/computer_investigation_<device_name>_<YYYYMMDD_HHMMSS>.md
  • All sections from inline mode plus additional detail (full vulnerability tables, process event samples, network connection details, query appendix)
  • Uses the Markdown Report Template defined below
  • Use create_file tool — NEVER use terminal commands for file output
  • Filename pattern: computer_investigation_<device_name>_YYYYMMDD_HHMMSS.md (lowercase device name, replace spaces/special chars with underscores)
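
As a minimal sketch of the Mode 2 filename rule, the helper below lowercases the device name and builds the full report path. The function name `build_report_path` is hypothetical, and the reading of "special chars" (anything other than a-z, 0-9, hyphen, and underscore) is an assumption, since the source does not enumerate them.

```python
import re
from datetime import datetime


def build_report_path(device_name: str, generated_at: datetime) -> str:
    """Build the Mode 2 report path for a device.

    Assumption: "special chars" means anything other than a-z, 0-9,
    hyphen, and underscore; each run of such characters becomes one "_".
    """
    safe_name = re.sub(r"[^a-z0-9_-]+", "_", device_name.lower())
    stamp = generated_at.strftime("%Y%m%d_%H%M%S")
    return (
        "reports/computer-investigations/"
        f"computer_investigation_{safe_name}_{stamp}.md"
    )
```

The resulting string is what would be passed to the create_file tool as the target path.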

Mode 3: JSON Export (Legacy)

  • Export investigation data to JSON for downstream processing or archival
  • Uses the JSON Export Structure defined below
  • Best for programmatic consumption or integration with other tools

Markdown Rendering Notes

  • ✅ ASCII tables, box-drawing characters, and bar charts render perfectly in markdown code blocks
  • ✅ Unicode block characters (█ full block, ─ box-drawing horizontal) display correctly in monospaced fonts
  • ✅ Emoji indicators (🔴🟢🟡⚠️✅) render natively in GitHub-flavored markdown
  • ✅ Standard markdown tables (| col |) render as formatted tables
  • Tip: Wrap all ASCII art in triple-backtick code fences for consistent rendering

Quick Start (TL;DR)

When a user requests a computer security investigation:

  1. Get Device IDs:

    # First, find the device and get both Entra ID and Defender ID
    mcp_microsoft_mcp_microsoft_graph_get("/v1.0/devices?$filter=displayName eq '<DEVICE_NAME>'&$select=id,deviceId,displayName,operatingSystem,trustType,isCompliant,isManaged")
    # Then get the Defender device ID from MDE:
    # use Defender `ListDefenderMachines` or Advanced Hunting to find it by device name
    
  2. Run Parallel Queries:

    • Batch 1: 8 Sentinel/Advanced Hunting queries (device sign-ins, alerts, process events, network, files, registry, incidents, inventory changes)
    • Batch 2: 5 Defender API queries (machine details, logged-on users, alerts, vulnerabilities, recommendations)
    • Batch 3: 3 Graph queries (device details, compliance, BitLocker keys if needed)
  3. Export & Report (Mode-Dependent):

    • Mode 1 (Inline): Render analysis directly in chat using the Markdown Report Template as a guide
    • Mode 2 (Markdown): Build full report using the Markdown Report Template, save to reports/computer-investigations/
    • Mode 3 (JSON): Export to temp/investigation_device_<device_name>_<timestamp>.json
  4. Generate Summary Report: Provide investigation summary with key findings, risk assessment, and recommendations.

  5. Track time after each major step and report to user


Execution Workflow

🚨 MANDATORY: Time Tracking Pattern

YOU MUST TRACK AND REPORT TIME AFTER EVERY MAJOR STEP:

[MM:SS] ✓ Step description (XX seconds)

Required Reporting Points:

  1. After Device ID retrieval
  2. After parallel data collection
  3. After report or JSON file creation (Modes 2 and 3)
  4. After summary generation
  5. Final: Total elapsed time
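
The `[MM:SS] ✓ Step description (XX seconds)` pattern above can be produced with a small formatter. This is a sketch; the function name `format_step` and its argument names are assumptions, with `elapsed_seconds` standing for total time since the investigation started and `step_seconds` for the duration of the step just completed.

```python
def format_step(elapsed_seconds: int, step_seconds: int, description: str) -> str:
    """Format one time-tracking line in the [MM:SS] pattern.

    elapsed_seconds: total seconds since the investigation started.
    step_seconds: how long the step that just finished took.
    """
    minutes, seconds = divmod(elapsed_seconds, 60)
    return f"[{minutes:02d}:{seconds:02d}] ✓ {description} ({step_seconds} seconds)"
```

One line of this form would be emitted at each of the required reporting points.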

Phase 1: Get Device IDs (REQUIRED FIRST)

Step 1a: Get Entra Device ID from Microsoft Graph

/v1.0/devices?$filter=displayName eq '<DEVICE_NAME>'&$select=id,deviceId,displayName,operatingSystem,operatingSystemVersion,trustType,isCompliant,isManaged,registrationDateTime,approximateLastSignInDateTime,mdmAppId,profileType

Step 1b: Get Defender Device ID

Use Advanced Hunting or the Defender API to find the MDE device ID:

DeviceInfo
| where DeviceName startswith '<DEVICE_NAME>'  // Use startswith to match both hostname and FQDN
| summarize arg_max(TimeGenerated, *) by DeviceName
| project DeviceId, DeviceName, OSPlatform, OSVersion, MachineGroup, OnboardingStatus, ExposureLevel, SensorHealthState, DeviceManualTags, DeviceDynamicTags, RegistryDeviceTag

Note: RiskScore is NOT in DeviceInfo - use GetDefenderMachine API to get riskScore and exposureLevel.

Why BOTH IDs are required:

  • Entra Device ID: Used for Graph API (compliance, registration, BitLocker, Intune)
  • Defender Device ID: Used for MDE API (alerts, vulnerabilities, logged-on users, investigations)
  • IDs are DIFFERENT: The same device has different identifiers in Entra ID and Defender for Endpoint

Device Type Determination:

  • Check trustType field from Graph API response:
    • AzureAd = Entra Joined
    • ServerAd = Hybrid Joined
    • Workplace = Entra Registered
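
The trustType mapping above is a plain lookup. A minimal sketch follows; the names `TRUST_TYPE_LABELS` and `classify_join_type` are hypothetical, and returning "Unknown" for missing or unexpected values is an assumption, since the source does not say how to label them.

```python
from typing import Optional

# trustType values and labels as listed in the Device Type Determination rules.
TRUST_TYPE_LABELS = {
    "AzureAd": "Entra Joined",
    "ServerAd": "Hybrid Joined",
    "Workplace": "Entra Registered",
}


def classify_join_type(trust_type: Optional[str]) -> str:
    """Map a Graph `trustType` value to the join type used in this skill.

    Assumption: anything not in the table (including None) is "Unknown".
    """
    if trust_type is None:
        return "Unknown"
    return TRUST_TYPE_LABELS.get(trust_type, "Unknown")
```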

Phase 2: Parallel Data Collection

CRITICAL: Use create_file tool to create JSON - NEVER use PowerShell terminal commands!

Batch 1: Sentinel/Advanced Hunting Queries (Run ALL in parallel)

  • Device sign-in events (Query 1) - Who signed into this device
  • Device alerts (Query 2) - SecurityAlert filtered by device
  • Process execution events (Query 3) - Suspicious process activity
  • Network connection events (Query 4) - Outbound connections
  • File events (Query 5) - File creation/modification/deletion
  • Registry events (Query 6) - Registry modifications
  • Security incidents (Query 7) - Incidents containing this device
  • Device inventory changes (Query 8) - Configuration changes

Batch 2: Defender for Endpoint API (Run ALL in parallel)

  • Machine details (GetDefenderMachine) - Device info from MDE
  • Logged-on users (GetDefenderMachineLoggedOnUsers) - Recent users
  • Device alerts (GetDefenderMachineAlerts) - MDE alerts
  • Device vulnerabilities (Advanced Hunting) - CVEs on device
  • Installed software (Advanced Hunting) - Software inventory

Batch 3: Graph API Queries (Run ALL in parallel)

  • Device details (Graph) - Full device properties
  • Compliance policies (Graph) - Applied compliance policies
  • Intune device status (if MDM enrolled) - Intune management data

Phase 3: Export & Generate Report (Mode-Dependent)

Mode 1 — Inline Chat Summary

  • No file export needed
  • Render the full investigation analysis directly in chat using the section structure from the Markdown Report Template as a guide
  • Include: Device Profile, Alert Summary, Logged-On Users, Vulnerability Overview, Process Activity, Network Connections, Risk Assessment, Recommendations
  • Use emoji-coded tables for risk factors and mitigating factors

Mode 2 — Markdown File Report

  1. Assess IP enrichment needs:

    • Extract public IPs from network connection events and sign-in data
    • Run python enrich_ips.py <ip1> <ip2> ... for threat intelligence enrichment
    • Parse the output to populate IP Intelligence tables in the report
  2. Build the markdown report using the Markdown Report Template below

    • Populate ALL sections with actual query data
    • For sections with no data: use the explicit absence confirmation pattern (e.g., "✅ No alerts detected...")
    • Calculate risk score and assessment dynamically
  3. Save the report:

    create_file("reports/computer-investigations/computer_investigation_<device_name>_YYYYMMDD_HHMMSS.md", markdown_content)
    
    • Use create_file tool — NEVER use terminal commands for file output
    • Lowercase device name, replace spaces/special chars with underscores
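
For the IP enrichment step in Mode 2, public addresses first have to be separated from private, loopback, and malformed values. The sketch below does this with the standard-library `ipaddress` module and builds the argument list for the `python enrich_ips.py <ip1> <ip2> ...` call. The function names are hypothetical, and treating "public" as `is_global` is an assumption; how `enrich_ips.py` formats its output is not covered here.

```python
import ipaddress
from typing import Iterable, List


def public_ips(candidates: Iterable[str]) -> List[str]:
    """Return unique, globally routable IPs in first-seen order."""
    seen = set()
    result = []
    for raw in candidates:
        try:
            ip = ipaddress.ip_address(raw.strip())
        except ValueError:
            continue  # skip hostnames, empty strings, malformed values
        text = str(ip)
        if ip.is_global and text not in seen:
            seen.add(text)
            result.append(text)
    return result


def enrichment_command(ips: List[str]) -> List[str]:
    """Argument list for the `python enrich_ips.py <ip1> <ip2> ...` call."""
    return ["python", "enrich_ips.py", *ips]
```

Passing the IPs as a list of arguments, rather than a single shell string, avoids quoting problems if the command is later run through a process API.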

Mode 3 — JSON Export (Legacy)

  1. Export to JSON:

    create_file("temp/investigation_device_<device_name>_<timestamp>.json", json_content)
    
  2. Merge all results into one dict structure (see JSON Export Structure section below)


Required Field Specifications

Device Query (Graph API)

/v1.0/devices?$filter=displayName eq '<DEVICE_NAME>'&$select=id,deviceId,displayName,operatingSystem,operatingSystemVersion,trustType,isCompliant,isManaged,registrationDateTime,approximateLastSignInDateTime,mdmAppId,profileType,manufacturer,model,enrollmentType,deviceOwnership
  • All fields REQUIRED for investigation
  • trustType determines device join type
  • isCompliant and isManaged indicate MDM status

Defender Machine Details

Use the Defender GetDefenderMachine MCP tool with Defender Device ID:

  • Returns: healthStatus, riskScore, exposureLevel, onboardingStatus, lastSeen, osPlatform, osVersion

Sample KQL Queries

Use these exact patterns with the appropriate MCP tool. Replace <DEVICE_NAME>, <DEVICE_ID>, <StartDate>, <EndDate>.

⚠️ CRITICAL: START WITH THESE EXACT QUERY PATTERNS

These queries have been tested and validated. Use them as your PRIMARY reference.
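
Placeholder replacement can be sketched as a small substitution helper. The name `fill_query` is hypothetical. Rejecting values that contain quotes or line breaks, rather than escaping them, is a conservative assumption so that a device name cannot break out of a KQL string literal.

```python
import re

# The four placeholders used by the sample queries.
PLACEHOLDER = re.compile(r"<(DEVICE_NAME|DEVICE_ID|StartDate|EndDate)>")


def fill_query(template: str, values: dict) -> str:
    """Substitute <DEVICE_NAME>, <DEVICE_ID>, <StartDate>, <EndDate>.

    Raises KeyError if a placeholder present in the template has no value,
    and ValueError if a value contains a quote or line break.
    """
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"missing value for <{name}>")
        value = str(values[name])
        if any(ch in value for ch in "'\"\n\r"):
            raise ValueError(f"unsafe character in value for <{name}>")
        return value

    return PLACEHOLDER.sub(substitute, template)
```

The filled string is then passed as `query` (Data Lake) or `kqlQuery` (Advanced Hunting), as described in the tool reference.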


🔧 MCP Tool Invocation Reference

CRITICAL: Use the correct parameter names for each tool!

Sentinel Data Lake MCP (query_lake tool)

  • Tool: Use the Sentinel Data Lake MCP's query_lake tool
  • Parameter name: query
  • Time column: TimeGenerated
  • Use for: Lookbacks >30 days on any table (AH Graph API is capped at 30d), or when AH is blocked by the safety filter

Example invocation:

query_lake(
    query="DeviceInfo | where DeviceName startswith 'DEVICENAME' | summarize arg_max(TimeGenerated, *) by DeviceId",
    workspaceId="<WORKSPACE_ID>"
)

Defender XDR Advanced Hunting (RunAdvancedHuntingQuery tool)

  • Tool: Use the Sentinel Triage MCP's RunAdvancedHuntingQuery tool
  • Parameter name: kqlQuery (NOT query!)
  • Time column: Timestamp for XDR-native tables (Device*, Email*, etc.); TimeGenerated for LA/Sentinel tables (SigninLogs, SecurityAlert, etc.) — even in AH
  • Use for: Default choice for all ≤30d queries (free for Analytics-tier tables). Required for TVM tables (DeviceTvmSoftwareInventory, DeviceTvmSoftwareVulnerabilities) which don't exist in Data Lake.

Example invocation:

RunAdvancedHuntingQuery(
    kqlQuery="DeviceTvmSoftwareVulnerabilities | where DeviceName startswith 'DEVICENAME' | take 30"
)

Tool Selection Guide

Follow the global Tool Selection Rule in .github/copilot-instructions.md (Data Lake vs Advanced Hunting). This skill does NOT override the global default — use Advanced Hunting first for ≤30d lookbacks (free for Analytics-tier tables), and fall back to Data Lake only for >30d windows or when AH is blocked by the safety filter.

| Table | Tool (lookback ≤30d) | Tool (lookback >30d) | Time Column |
|-------|----------------------|----------------------|-------------|
| Device* (DeviceInfo, DeviceProcessEvents, DeviceNetworkEvents, DeviceFileEvents, DeviceLogonEvents, DeviceRegistryEvents) | Advanced Hunting (free) | Data Lake | AH: Timestamp / DL: TimeGenerated |
| SecurityAlert, SecurityIncident | Advanced Hunting | Data Lake | TimeGenerated (both tools) |
| SigninLogs, AuditLogs, AADNonInteractiveUserSignInLogs | Advanced Hunting | Data Lake | TimeGenerated (both tools) |
| DeviceTvmSoftwareInventory, DeviceTvmSoftwareVulnerabilities | Advanced Hunting only | Advanced Hunting only | Timestamp (snapshot, no time filter needed) |

When adapting the sample queries below: they are written with TimeGenerated for Data Lake compatibility. For Advanced Hunting on Device* tables, swap TimeGenerated → Timestamp. For SecurityAlert/SecurityIncident/SigninLogs in AH, keep TimeGenerated (LA/Sentinel tables retain their column name in AH).

Schema differences: Some MDE columns (e.g., SentBytes, ReceivedBytes in DeviceNetworkEvents) may not be available in Data Lake. If a column fails in one tool, try the other.
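
The selection rules in this section can be condensed into one lookup. The sketch below is an illustration only: `select_tool` and `TVM_TABLES` are hypothetical names, recognising Device* tables by their name prefix is an assumption, and the fallback for queries blocked by the safety filter is not modelled.

```python
TVM_TABLES = {"DeviceTvmSoftwareInventory", "DeviceTvmSoftwareVulnerabilities"}


def select_tool(table: str, lookback_days: int):
    """Return (tool, time_column) for a table and lookback window.

    TVM tables are Advanced Hunting only. Other tables use Advanced
    Hunting up to 30 days and Data Lake beyond that.
    """
    if table in TVM_TABLES:
        return ("Advanced Hunting", "Timestamp")
    if lookback_days > 30:
        return ("Data Lake", "TimeGenerated")
    if table.startswith("Device"):
        # XDR-native Device* tables use Timestamp in Advanced Hunting.
        return ("Advanced Hunting", "Timestamp")
    # LA/Sentinel tables keep TimeGenerated even in Advanced Hunting.
    return ("Advanced Hunting", "TimeGenerated")
```

The returned time column tells you whether the sample queries need their TimeGenerated references swapped before running.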


📅 Date Range Quick Reference

🔴 STEP 0: GET CURRENT DATE FIRST (MANDATORY) 🔴

  • ALWAYS check the current date from the context header BEFORE calculating date ranges
  • NEVER use hardcoded years - the year changes and you WILL query the wrong timeframe

RULE 1: Real-Time/Recent Searches (Current Activity)

  • Add +2 days to current date for end range
  • Why +2? +1 for the timezone offset (PST is behind UTC) and +1 to include the full final day
  • Pattern: Today is Jan 23 (PST) → Use datetime(2026-01-25) as end date

RULE 2: Historical Searches (User-Specified Dates)

  • Add +1 day to user's specified end date
  • Why +1? To include all 24 hours of the final day
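
Rules 1 and 2 can be sketched as two small date helpers. The function names are hypothetical, and `today` is passed in explicitly so that the current date always comes from the context header rather than a hardcoded year.

```python
from datetime import date, timedelta


def recent_range(today: date, lookback_days: int):
    """Rule 1: start = today - lookback, end = today + 2 days."""
    start = today - timedelta(days=lookback_days)
    end = today + timedelta(days=2)
    return start.isoformat(), end.isoformat()


def historical_range(start: date, end: date):
    """Rule 2: keep the start, push the end out by 1 day."""
    return start.isoformat(), (end + timedelta(days=1)).isoformat()
```

The returned strings are in the YYYY-MM-DD form expected inside `datetime(...)` in the sample queries.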

Examples Table (Assuming Current Date = January 23, 2026):

| User Request | <StartDate> | <EndDate> | Rule Applied |
|--------------|-------------|-----------|--------------|
| "Last 7 days" | 2026-01-16 | 2026-01-25 | Rule 1 (+2) |
| "Last 30 days" | 2025-12-24 | 2026-01-25 | Rule 1 (+2) |
| "Jan 15 to Jan 20" | 2026-01-15 | 2026-01-21 | Rule 2 (+1) |

1. Device Sign-In Events (Who authenticated on this device)

Note: DeviceDetail is dynamic in SigninLogs but string in AADNonInteractiveUserSignInLogs. Query SigninLogs only for device context (interactive sign-ins contain device info). Do NOT use union with DeviceDetail filtering - causes schema conflicts in Sentinel Data Lake.

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let deviceName = '<DEVICE_NAME>';
SigninLogs
| where TimeGenerated between (start .. end)
| extend DeviceDetailStr = tostring(DeviceDetail)
| where DeviceDetailStr has deviceName
| extend ParsedDevice = parse_json(DeviceDetailStr)
| extend DeviceName = tostring(ParsedDevice.displayName)
| extend DeviceId = tostring(ParsedDevice.deviceId)
| extend DeviceOS = tostring(ParsedDevice.operatingSystem)
| extend DeviceTrustType = tostring(ParsedDevice.trustType)
| extend DeviceCompliant = tostring(ParsedDevice.isCompliant)
| summarize 
    SignInCount = count(),
    SuccessCount = countif(ResultType == '0'),
    FailureCount = countif(ResultType != '0'),
    UniqueUsers = dcount(UserPrincipalName),
    Users = make_set(UserPrincipalName, 10),
    Applications = make_set(AppDisplayName, 10),
    IPAddresses = make_set(IPAddress, 10),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated)
    by DeviceName, DeviceOS, DeviceTrustType, DeviceCompliant
| order by SignInCount desc

2. Device Security Alerts (SecurityAlert table)

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let deviceName = '<DEVICE_NAME>';
SecurityAlert
| where TimeGenerated between (start .. end)
| where Entities has deviceName or CompromisedEntity has deviceName
| summarize arg_max(TimeGenerated, *) by SystemAlertId
| project 
    TimeGenerated,
    AlertName,
    AlertSeverity,
    Status,
    Description,
    ProviderName,
    Tactics,
    Techniques,
    CompromisedEntity,
    RemediationSteps
| order by TimeGenerated desc
| take 20

3. Process Execution Events (Suspicious processes)

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let deviceName = '<DEVICE_NAME>';
DeviceProcessEvents
| where TimeGenerated between (start .. end)
| where DeviceName startswith deviceName  // Use startswith to match both hostname and FQDN
| where ActionType in ("ProcessCreated", "ProcessCreatedUsingWmiQuery")
| extend CommandLineLength = strlen(ProcessCommandLine)
| extend IsSuspicious = case(
    ProcessCommandLine has_any ("powershell", "cmd", "wscript", "cscript") and ProcessCommandLine has_any ("-enc", "-e ", "bypass", "hidden", "downloadstring", "invoke-expression", "iex"), true,
    ProcessCommandLine has_any ("certutil", "bitsadmin") and ProcessCommandLine has_any ("download", "transfer", "urlcache"), true,
    ProcessCommandLine has_any ("reg", "registry") and ProcessCommandLine has_any ("add", "delete") and ProcessCommandLine has_any ("run", "runonce"), true,
    FileName in~ ("mimikatz.exe", "procdump.exe", "psexec.exe", "cobaltstrike", "beacon.exe"), true,
    CommandLineLength > 500, true,
    false)
| summarize 
    ProcessCount = count(),
    SuspiciousCount = countif(IsSuspicious),
    UniqueProcesses = dcount(FileName),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated),
    SampleCommands = make_set(ProcessCommandLine, 5)
    by FileName, FolderPath, AccountName, AccountDomain
| where SuspiciousCount > 0 or ProcessCount > 50
| order by SuspiciousCount desc, ProcessCount desc
| take 20

4. Network Connection Events (Outbound connections)

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let deviceName = '<DEVICE_NAME>';
DeviceNetworkEvents
| where TimeGenerated between (start .. end)
| where DeviceName startswith deviceName  // Use startswith to match both hostname and FQDN
| where ActionType == "ConnectionSuccess"
| where RemoteIPType != "Private" // Focus on public IPs
| summarize 
    ConnectionCount = count(),
    UniqueRemoteIPs = dcount(RemoteIP),
    UniqueRemotePorts = dcount(RemotePort),
    Protocols = make_set(Protocol, 5),
    InitiatingProcesses = make_set(InitiatingProcessFileName, 10),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated)
    by RemoteIP, RemotePort, RemoteUrl
| order by ConnectionCount desc
| take 30

5. File Events (File creation/modification/deletion)

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let deviceName = '<DEVICE_NAME>';
DeviceFileEvents
| where TimeGenerated between (start .. end)
| where DeviceName startswith deviceName  // Use startswith to match both hostname and FQDN
| where ActionType in ("FileCreated", "FileModified", "FileDeleted", "FileRenamed")
| extend FileExtension = tostring(split(FileName, ".")[-1])
| extend IsSuspicious = case(
    FileExtension in~ ("exe", "dll", "bat", "cmd", "ps1", "vbs", "js", "hta", "scr", "pif"), true,
    FolderPath has_any ("\\temp\\", "\\tmp\\", "\\appdata\\local\\temp", "\\programdata\\", "\\users\\public\\"), true,
    false)
| summarize 
    FileEventCount = count(),
    SuspiciousCount = countif(IsSuspicious),
    CreatedCount = countif(ActionType == "FileCreated"),
    ModifiedCount = countif(ActionType == "FileModified"),
    DeletedCount = countif(ActionType == "FileDeleted"),
    UniqueFiles = dcount(FileName),
    FileExtensions = make_set(FileExtension, 10),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated)
    by FolderPath, InitiatingProcessFileName
| where SuspiciousCount > 0 or FileEventCount > 100
| order by SuspiciousCount desc, FileEventCount desc
| take 20

6. Registry Events (Registry modifications)

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let deviceName = '<DEVICE_NAME>';
DeviceRegistryEvents
| where TimeGenerated between (start .. end)
| where DeviceName startswith deviceName  // Use startswith to match both hostname and FQDN
| where ActionType in ("RegistryValueSet", "RegistryKeyCreated")
| extend IsPersistence = case(
    RegistryKey has_any ("\\CurrentVersion\\Run", "\\CurrentVersion\\RunOnce", "\\CurrentVersion\\RunServices"), true,
    RegistryKey has_any ("\\Policies\\Explorer\\Run", "\\Active Setup\\Installed Components"), true,
    RegistryKey has_any ("\\Image File Execution Options\\", "\\Winlogon\\", "\\BootExecute"), true,
    RegistryKey has_any ("\\Services\\", "\\Drivers\\"), true,
    false)
| summarize 
    RegistryEventCount = count(),
    PersistenceCount = countif(IsPersistence),
    UniqueKeys = dcount(RegistryKey),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated)
    by RegistryKey, RegistryValueName, InitiatingProcessFileName
| where PersistenceCount > 0
| order by PersistenceCount desc, RegistryEventCount desc
| take 20

7. Security Incidents Containing Device

let deviceName = '<DEVICE_NAME>';
let deviceId = '<DEVICE_ID>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let relevantAlerts = SecurityAlert
| where TimeGenerated between (start .. end)
| where Entities has deviceName or Entities has deviceId or CompromisedEntity has deviceName
| summarize arg_max(TimeGenerated, *) by SystemAlertId
| project SystemAlertId, AlertName, AlertSeverity, ProviderName, Tactics;
SecurityIncident
| where CreatedTime between (start .. end)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| where not(tostring(Labels) has "Redirected")
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| join kind=inner relevantAlerts on $left.AlertId == $right.SystemAlertId
| extend ProviderIncidentUrl = tostring(AdditionalData.providerIncidentUrl)
| extend OwnerUPN = tostring(Owner.userPrincipalName)
| summarize 
    Title = any(Title),
    Severity = any(Severity),
    Status = any(Status),
    Classification = any(Classification),
    CreatedTime = any(CreatedTime),
    LastModifiedTime = any(LastModifiedTime),
    OwnerUPN = any(OwnerUPN),
    ProviderIncidentUrl = any(ProviderIncidentUrl),
    AlertCount = count(),
    Tactics = make_set(Tactics)
    by ProviderIncidentId
| order by LastModifiedTime desc
| take 10

8. Device Inventory and Configuration Changes

Note: RiskScore is NOT in DeviceInfo - use GetDefenderMachine API for risk/exposure scores.

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let deviceName = '<DEVICE_NAME>';
DeviceInfo
| where TimeGenerated between (start .. end)
| where DeviceName startswith deviceName  // Use startswith to match both hostname and FQDN
| summarize arg_max(TimeGenerated, *) by DeviceId
| project 
    TimeGenerated,
    DeviceId,
    DeviceName,
    OSPlatform,
    OSVersion,
    OSBuild,
    OSArchitecture,
    LoggedOnUsers,
    MachineGroup,
    DeviceCategory,
    OnboardingStatus,
    SensorHealthState,
    ExposureLevel,
    IsAzureADJoined,
    IsInternetFacing,
    JoinType,
    PublicIP,
    DeviceManualTags,
    DeviceDynamicTags,
    RegistryDeviceTag

9. Software Inventory on Device

⚠️ DO NOT use Sentinel Data Lake MCP (query_lake) for this query. The DeviceTvmSoftwareInventory table is NOT available in the Sentinel Data Lake. Use Advanced Hunting MCP (RunAdvancedHuntingQuery) only. TVM tables use snapshot ingestion with no TimeGenerated filtering.

let deviceName = '<DEVICE_NAME>';
DeviceTvmSoftwareInventory
| where DeviceName startswith deviceName  // Use startswith to match both hostname and FQDN
| project 
    DeviceName,
    SoftwareVendor,
    SoftwareName,
    SoftwareVersion,
    EndOfSupportStatus,
    EndOfSupportDate
| summarize by SoftwareVendor, SoftwareName, SoftwareVersion, EndOfSupportStatus, EndOfSupportDate
| order by SoftwareVendor asc, SoftwareName asc
| take 30

10. Vulnerabilities on Device

⚠️ DO NOT use Sentinel Data Lake MCP (query_lake) for this query. The DeviceTvmSoftwareVulnerabilities table is NOT available in the Sentinel Data Lake. Use Advanced Hunting MCP (RunAdvancedHuntingQuery) only. TVM tables use snapshot ingestion with no TimeGenerated filtering.

let deviceName = '<DEVICE_NAME>';
DeviceTvmSoftwareVulnerabilities
| where DeviceName startswith deviceName  // Use startswith to match both hostname and FQDN
| project
    CveId,
    VulnerabilitySeverityLevel,
    SoftwareVendor,
    SoftwareName,
    SoftwareVersion,
    RecommendedSecurityUpdate,
    RecommendedSecurityUpdateId
| summarize by CveId, VulnerabilitySeverityLevel, SoftwareVendor, SoftwareName, SoftwareVersion, RecommendedSecurityUpdate, RecommendedSecurityUpdateId
| order by case(VulnerabilitySeverityLevel == "Critical", 1, VulnerabilitySeverityLevel == "High", 2, VulnerabilitySeverityLevel == "Medium", 3, 4) asc
| take 30

11. Logon Events on Device

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let deviceName = '<DEVICE_NAME>';
DeviceLogonEvents
| where TimeGenerated between (start .. end)
| where DeviceName startswith deviceName  // Use startswith to match both hostname and FQDN
| summarize 
    LogonCount = count(),
    SuccessCount = countif(ActionType == "LogonSuccess"),
    FailureCount = countif(ActionType == "LogonFailed"),
    UniqueAccounts = dcount(AccountName),
    LogonTypes = make_set(LogonType, 5),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated),
    RemoteIPs = make_set(RemoteIP, 10)
    by AccountName, AccountDomain, LogonType
| order by LogonCount desc
| take 20

12. Threat Intelligence IP Matches (Device Network Traffic)

Performance notes: ThreatIntelIndicators can be large (100K+ rows). Filter IsActive/ValidUntil before string transformations per KQL best practices — reduce data first, transform later. The triple replace_string was replaced with direct array indexing split(...)[0] which returns a clean string.

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let deviceName = '<DEVICE_NAME>';
let device_ips = DeviceNetworkEvents
| where TimeGenerated between (start .. end)
| where DeviceName startswith deviceName  // Use startswith to match both hostname and FQDN
| where RemoteIPType != "Private"
| distinct RemoteIP;
ThreatIntelIndicators
| where IsActive and (ValidUntil > now() or isempty(ValidUntil))
| where tostring(split(ObservableKey, ":")[0]) in ("ipv4-addr", "ipv6-addr", "network-traffic")
| where ObservableValue in (device_ips)
| extend Description = tostring(parse_json(Data).description)
| where Description !contains_cs "State: inactive;" and Description !contains_cs "State: falsepos;"
| summarize arg_max(TimeGenerated, *) by ObservableValue
| project 
    TimeGenerated,
    IPAddress = ObservableValue,
    ThreatDescription = Description,
    Confidence,
    ValidUntil,
    IsActive
| order by Confidence desc
| take 20

Microsoft Graph Device Queries

Use these Graph API queries in Phase 2 (Batch 3) of the investigation workflow.

Step 1: Find Device by Name

mcp_microsoft_mcp_microsoft_graph_get("/v1.0/devices?$filter=displayName eq '<DEVICE_NAME>'&$select=id,deviceId,displayName,operatingSystem,operatingSystemVersion,trustType,isCompliant,isManaged,registrationDateTime,approximateLastSignInDateTime,mdmAppId,profileType,manufacturer,model,enrollmentType,deviceOwnership")

Step 2: Get Device Owners

mcp_microsoft_mcp_microsoft_graph_get("/v1.0/devices/<DEVICE_OBJECT_ID>/registeredOwners?$select=id,displayName,userPrincipalName")

Step 3: Get Device Users

mcp_microsoft_mcp_microsoft_graph_get("/v1.0/devices/<DEVICE_OBJECT_ID>/registeredUsers?$select=id,displayName,userPrincipalName")

Step 4: Get BitLocker Recovery Keys (if needed)

mcp_microsoft_mcp_microsoft_graph_get("/v1.0/informationProtection/bitlocker/recoveryKeys?$filter=deviceId eq '<DEVICE_ID>'")

NOTE: Requires the BitlockerKey.Read.All permission

Step 5: Get Intune Device Details (if MDM enrolled)

mcp_microsoft_mcp_microsoft_graph_get("/v1.0/deviceManagement/managedDevices?$filter=deviceName eq '<DEVICE_NAME>'&$select=id,deviceName,managedDeviceOwnerType,complianceState,managementAgent,lastSyncDateTime,osVersion,azureADRegistered,azureADDeviceId,deviceEnrollmentType,deviceCategoryDisplayName,serialNumber,userPrincipalName")

Defender for Endpoint Queries

Use these MDE API queries in Phase 2 (Batch 2) of the investigation workflow.

Get Machine Details

GetDefenderMachine(id="<DEFENDER_DEVICE_ID>")

Returns: id, computerDnsName, osPlatform, osVersion, healthStatus, onboardingStatus, riskScore, exposureLevel, lastSeen, lastIpAddress, lastExternalIpAddress, rbacGroupName, machineTags (API field — maps to DeviceManualTags in AH)

Get Logged-On Users

GetDefenderMachineLoggedOnUsers(id="<DEFENDER_DEVICE_ID>")

Returns: Array of users with accountName, accountDomain, firstSeen, lastSeen, logonTypes

Get Machine Alerts (via API)

Use the ListAlerts MCP tool filtered by device:

ListAlerts with machineId filter

Get Automated Investigations

ListDefenderInvestigations

Filter results by machineId to find investigations related to the device

Get Remediation Activities

ListDefenderRemediationActivities

Filter results by machineId to find remediation tasks for the device


Markdown Report Template

When outputting to markdown file (Mode 2), use this template. Populate ALL sections with actual query data. For sections with no data, use the explicit absence confirmation pattern.

Filename pattern: reports/computer-investigations/computer_investigation_<device_name>_YYYYMMDD_HHMMSS.md

# Computer Security Investigation Report

**Generated:** YYYY-MM-DD HH:MM UTC
**Workspace:** <workspace_name>
**Device:** `<DEVICE_NAME>`
**OS:** <operating_system> <os_version>
**Trust Type:** <Entra Joined / Hybrid Joined / Entra Registered> (`<trustType>`)
**Compliance:** <Compliant/Non-Compliant> | **Managed:** <Yes/No> | **MDM:** <Intune/None>
**Investigation Period:** <start_date> → <end_date> (<N> days)
**Investigation Type:** <Standard (7d) / Quick (1d) / Comprehensive (30d)>
**Data Sources:** DeviceInfo, DeviceProcessEvents, DeviceNetworkEvents, DeviceFileEvents, DeviceRegistryEvents, DeviceLogonEvents, SigninLogs, SecurityAlert, SecurityIncident, DeviceTvmSoftwareVulnerabilities, DeviceTvmSoftwareInventory, ThreatIntelIndicators, Microsoft Graph API, Defender for Endpoint API

---

## Executive Summary

<2-4 sentence summary: overall device risk level, key findings, most significant alerts or vulnerabilities, and primary recommendation. Ground every claim in evidence from query results.>

**Overall Risk Level:** 🔴 CRITICAL / 🔴 HIGH / 🟠 MEDIUM / 🟡 LOW / 🟢 INFORMATIONAL

---

## Device Profile

| Property | Value |
|----------|-------|
| **Device Name** | `<device_name>` |
| **OS** | <os_platform> <os_version> (<os_build>) |
| **Architecture** | <os_architecture> |
| **Trust Type** | <Entra Joined / Hybrid Joined / Entra Registered> |
| **Compliant** | 🟢 Yes / 🔴 No |
| **Managed** | 🟢 Yes / 🔴 No |
| **Manufacturer** | <manufacturer> |
| **Model** | <model> |
| **Registration Date** | <datetime> |
| **Last Sign-in** | <datetime> |
| **Internet Facing** | 🔴 Yes / 🟢 No |

### Defender for Endpoint Status

| Property | Value |
|----------|-------|
| **Onboarding Status** | 🟢 Onboarded / 🔴 Not Onboarded |
| **Sensor Health** | 🟢 Active / 🟠 Inactive / 🔴 Misconfigured |
| **Health Status** | <health_status> |
| **Risk Score** | 🔴/🟠/🟡/🟢 <None/Low/Medium/High> |
| **Exposure Level** | 🔴/🟠/🟡/🟢 <None/Low/Medium/High> |
| **Last Seen** | <datetime> |
| **Last Internal IP** | <ip_address> |
| **Last External IP** | <ip_address> |
| **Machine Group** | <group_name> |
| **Device Tags** | <comma-separated list from DeviceManualTags + DeviceDynamicTags, or "None"> |

### Device Owners & Registered Users

<If owners/users found:>

| User | UPN | Role |
|------|-----|------|
| <display_name> | <upn> | Owner / Registered User |

<If no owners/users:>
✅ No registered owners or users found for this device.

---

## Key Metrics

| Metric | Value |
|--------|-------|
| **Security Alerts** | <count> (Critical: <n>, High: <n>, Medium: <n>, Low: <n>) |
| **Security Incidents** | <count> (Open: <n>, Closed: <n>) |
| **Logged-On Users** | <count> unique users |
| **Sign-ins from Device** | <count> (Success: <n>, Failed: <n>) |
| **Vulnerabilities** | <count> (Critical: <n>, High: <n>, Medium: <n>) |
| **Suspicious Processes** | <count> flagged |
| **Network Connections** | <count> external IPs |
| **TI Matches** | <count> threat intel hits |
| **End-of-Support Software** | <count> |

---

## Security Alerts

<If alerts found:>

| Time | Alert Name | Severity | Status | Provider | Tactics | Compromised Entity |
|------|-----------|----------|--------|----------|---------|---------------------|
| <datetime> | <alert_name> | 🔴/🟠/🟡 <severity> | <status> | <provider> | <tactics> | <entity> |

**Alert Summary:**
- <X> total alerts (<breakdown by severity>)
- <Brief description of most critical alert(s)>
- Remediation steps: <summary of recommended actions from alert data>

<If no alerts:>
✅ No security alerts detected for this device in the investigation period.
- Checked: SecurityAlert filtered by device name and device ID (0 matches)

---

## Security Incidents

<If incidents found:>

| ID | Title | Severity | Status | Classification | Created | Owner | Alerts | Link |
|----|-------|----------|--------|----------------|---------|-------|--------|------|
| <provider_incident_id> | <title> | 🔴/🟠/🟡 <severity> | <New/Active/Closed> | <TP/FP/BP/—> | <date> | <owner_upn> | <count> | [View](<url>) |

**Incident Summary:**
- <X> total incidents (<Y> open, <Z> closed)
- Highest severity: <level>
- <Brief description of most critical incident>

<If no incidents:>
✅ No security incidents involving this device in the investigation period.
- Checked: SecurityAlert → SecurityIncident join on device name and device ID (0 matches)

---

## Logged-On Users

<If users found:>

| Account | Domain | Logon Type | Logon Count | Success | Failed | First Seen | Last Seen |
|---------|--------|------------|:-----------:|:-------:|:------:|------------|-----------|
| <account_name> | <domain> | <Interactive/RemoteInteractive/Network/etc.> | <count> | <count> | <count> | <date> | <date> |

**User Analysis:**
- <X> unique accounts authenticated on this device
- <Summary of logon patterns — expected vs unexpected accounts, after-hours logons, remote IPs>

<If no logon data:>
✅ No logon events detected for this device in the investigation period.

### Defender Logged-On Users (API)

<If MDE logged-on users found:>

| Account | Domain | First Seen | Last Seen | Logon Types |
|---------|--------|------------|-----------|-------------|
| <account_name> | <domain> | <date> | <date> | <types> |

<If no MDE data:>
✅ No logged-on user data returned from Defender for Endpoint API.

---

## Sign-in Activity (From Device)

<If sign-in events found:>

| Device Name | OS | Trust Type | Compliant | Users | Applications | IPs | Sign-ins | Success | Failed | First Seen | Last Seen |
|-------------|-----|------------|-----------|:-----:|:------------:|:---:|:--------:|:-------:|:------:|------------|-----------|
| <name> | <os> | <trust> | 🟢/🔴 | <count> | <count> | <count> | <count> | <count> | <count> | <date> | <date> |

**Top Users:** <list of UPNs>
**Top Applications:** <list of apps>
**Top IPs:** <list of IPs>

<If no sign-in events:>
✅ No sign-in events found for this device in the investigation period.

---

## Process Activity

<If suspicious processes found:>

| Process | Path | Account | Process Count | Suspicious | Sample Command Lines |
|---------|------|---------|:------------:|:----------:|----------------------|
| <filename> | <folder_path> | <account_name> | <count> | 🔴 <count> | <truncated_command> |

**Process Analysis:**
- <X> suspicious process executions detected
- <Summary of suspicious patterns — encoded commands, LOLBins, credential dumping tools, long command lines>

<If no suspicious processes:>
✅ No suspicious process activity detected on this device in the investigation period.
- Checked: DeviceProcessEvents filtered for suspicious indicators (0 flagged)

---

## Network Connections

<If external connections found:>

| Remote IP | Remote Port | URL | Connections | Unique Ports | Protocols | Initiating Processes | First Seen | Last Seen |
|-----------|:-----------:|-----|:-----------:|:------------:|-----------|----------------------|------------|-----------|
| <ip> | <port> | <url> | <count> | <count> | <protocols> | <process_list> | <date> | <date> |

**Network Summary:**
- <X> unique external IPs contacted
- <Y> unique remote ports
- <Top initiating processes>

<If no external connections:>
✅ No external network connections detected for this device in the investigation period.

### Threat Intelligence Matches

<If TI matches found:>

| IP Address | Threat Description | Confidence | Valid Until | Active |
|------------|-------------------|:----------:|------------|:------:|
| <ip> | <description> | <score> | <date> | ✅/❌ |

<If no TI matches:>
✅ No threat intelligence matches found for device network traffic.
- Checked: ThreatIntelIndicators joined with device external IPs (0 matches)

---

## File Activity

<If suspicious file events found:>

| Folder Path | Initiating Process | Total Events | Suspicious | Created | Modified | Deleted | Extensions | First Seen | Last Seen |
|-------------|-------------------|:------------:|:----------:|:-------:|:--------:|:-------:|------------|------------|-----------|
| <path> | <process> | <count> | 🔴 <count> | <count> | <count> | <count> | <ext_list> | <date> | <date> |

**File Activity Analysis:**
- <X> suspicious file operations detected
- <Summary — executable drops in temp folders, script creation, mass file modifications>

<If no suspicious file events:>
✅ No suspicious file activity detected on this device in the investigation period.
- Checked: DeviceFileEvents for suspicious extensions and temp folder activity (0 flagged)

---

## Registry Modifications

<If persistence-related registry events found:>

| Registry Key | Value Name | Initiating Process | Total Events | Persistence | First Seen | Last Seen |
|-------------|------------|-------------------|:------------:|:-----------:|------------|-----------|
| <key> | <value_name> | <process> | <count> | 🔴 <count> | <date> | <date> |

**Registry Analysis:**
- <X> persistence-related registry modifications detected
- <Summary — Run keys, services, Winlogon, IFEO modifications>

<If no persistence registry events:>
✅ No persistence-related registry modifications detected on this device in the investigation period.
- Checked: DeviceRegistryEvents for Run/RunOnce/Services/Winlogon/IFEO keys (0 flagged)

---

## Vulnerabilities

<If vulnerabilities found:>

| CVE ID | Severity | Vendor | Software | Version | Security Update |
|--------|----------|--------|----------|---------|-----------------|
| <cve_id> | 🔴/🟠/🟡 <severity> | <vendor> | <software> | <version> | <update_id> |

**Vulnerability Summary:**
- <X> total vulnerabilities (Critical: <n>, High: <n>, Medium: <n>, Low: <n>)
- <Most critical CVEs and their remediation status>

<If no vulnerabilities:>
✅ No known vulnerabilities detected on this device.
- Checked: DeviceTvmSoftwareVulnerabilities (0 records)

---

## Software Inventory

<If notable software found:>

| Vendor | Software | Version | End of Support | EOS Date |
|--------|----------|---------|:--------------:|----------|
| <vendor> | <software> | <version> | 🔴 Yes / 🟢 No | <date> |

**Software Summary:**
- <X> total software packages installed
- <Y> end-of-support software detected
- <Notable findings — outdated browsers, deprecated runtimes, risky applications>

<If no software data:>
✅ No software inventory data available for this device.
- Checked: DeviceTvmSoftwareInventory (0 records)

---

## Device Configuration

<If configuration data available:>

| Property | Value |
|----------|-------|
| **Public IP** | <ip> |
| **Machine Group** | <group> |
| **Device Category** | <category> |
| **Onboarding Status** | <status> |
| **Sensor Health** | <health> |
| **Exposure Level** | <level> |
| **Azure AD Joined** | <Yes/No> |
| **Internet Facing** | <Yes/No> |
| **Join Type** | <type> |

---

## IP Intelligence

<Table of external IPs from network connections and sign-in data. Run `enrich_ips.py` for top IPs.>

| IP Address | Source | Location | ISP/Org | VPN | Abuse Score | Reports | Risk |
|------------|--------|----------|---------|-----|-------------|---------|------|
| <ip> | 🔵 Network / 🔵 Sign-in / 🔴 TI Match | <city, country> | <org> | 🟢 No / 🔴 Yes | <score>% | <count> | HIGH/MED/LOW |

---

## Risk Assessment

### Risk Score: <XX>/100 — 🔴 CRITICAL / 🔴 HIGH / 🟠 MEDIUM / 🟡 LOW / 🟢 INFORMATIONAL

### Risk Factors

| Factor | Finding |
|--------|---------|
| 🔴/🟠/🟡 **<Factor Name>** | <Evidence-grounded finding with specific numbers> |

### Mitigating Factors

| Factor | Finding |
|--------|---------|
| 🟢 **<Factor Name>** | <Evidence-grounded finding with specific numbers> |

---

## Recommendations

### Critical Actions
<Numbered list of critical actions with evidence. Only include if critical findings exist.>

### High Priority Actions
<Numbered list of high-priority actions with evidence.>

### Monitoring Actions (14-Day Follow-Up)
<Bulleted list of ongoing monitoring recommendations.>

---

## Appendix: Query Details

| # | Query | Table(s) | Tool | Records | Execution |
|---|-------|----------|------|--------:|----------:|
| 1 | Device Sign-In Events | SigninLogs | Data Lake | <count> | <time> |
| 2 | Security Alerts | SecurityAlert | Data Lake | <count> | <time> |
| 3 | Process Events | DeviceProcessEvents | Data Lake | <count> | <time> |
| 4 | Network Connections | DeviceNetworkEvents | Data Lake | <count> | <time> |
| 5 | File Events | DeviceFileEvents | Data Lake | <count> | <time> |
| 6 | Registry Events | DeviceRegistryEvents | Data Lake | <count> | <time> |
| 7 | Security Incidents | SecurityAlert, SecurityIncident | Data Lake | <count> | <time> |
| 8 | Device Inventory | DeviceInfo | Data Lake | <count> | <time> |
| 9 | Software Inventory | DeviceTvmSoftwareInventory | Advanced Hunting | <count> | <time> |
| 10 | Vulnerabilities | DeviceTvmSoftwareVulnerabilities | Advanced Hunting | <count> | <time> |
| 11 | Logon Events | DeviceLogonEvents | Data Lake | <count> | <time> |
| 12 | Threat Intelligence | ThreatIntelIndicators, DeviceNetworkEvents | Data Lake | <count> | <time> |
| — | Device Profile | Microsoft Graph API | Graph | 1 | <time> |
| — | Device Owners/Users | Microsoft Graph API | Graph | <count> | <time> |
| — | Machine Details | Defender for Endpoint API | MDE | 1 | <time> |
| — | Logged-On Users | Defender for Endpoint API | MDE | <count> | <time> |

*Query definitions: see the Sample KQL Queries section in this SKILL.md file.*

**Do NOT include full KQL text in the appendix** — the canonical queries are already documented in this SKILL.md file. The appendix serves as an audit trail only.

---

**Investigation Timeline:**
- [MM:SS] ✓ Phase 1: Device ID retrieval (<X>s)
- [MM:SS] ✓ Phase 2: Parallel data collection (<X>s)
- [MM:SS] ✓ IP Enrichment (<X>s)
- [MM:SS] ✓ Phase 3: Report generation (<X>s)
- **Total Investigation Time:** <duration>

Markdown Report Authoring Guidelines

  1. Populate every section — even if data is empty. Use the ✅ No <X> detected... pattern for empty sections.
  2. Never invent data — follow the Evidence-Based Analysis global rule strictly. Every number in the report must come from a query result.
  3. Risk assessment is dynamic — calculate risk score using the weighted framework in the Risk Assessment Framework section (Defender Risk Score 25%, Active Alerts 25%, Vulnerabilities 20%, Compliance Status 15%, Sign-in Anomalies 15%).
  4. IP enrichment — run enrich_ips.py for external IPs from network connections and sign-in data. If enrich_ips.py is unavailable, use Sentinel ThreatIntelIndicators data as fallback.
  5. PII-Free — the report file is saved to reports/ which is gitignored. However, exercise caution with any files that may be shared externally.
  6. Emoji consistency — follow the Emoji Formatting table from copilot-instructions.md for all risk/status indicators.
  7. Query appendix — include record counts and execution times but NOT full KQL text. Reference the SKILL.md query numbers.
  8. Trust type context — always reference the device trust type in the Executive Summary and Risk Assessment, as it affects the security implications.

JSON Export Structure

Export MCP query results to a single JSON file with these required keys:

{
  "device_name": "WORKSTATION-001",
  "device_id_entra": "<ENTRA_DEVICE_OBJECT_ID>",
  "device_id_defender": "<DEFENDER_DEVICE_ID>",
  "device_type": "HybridJoined",
  "investigation_date": "2026-01-23",
  "start_date": "2026-01-16",
  "end_date": "2026-01-25",
  "timestamp": "20260123_143200",
  
  "device_profile": {
    "displayName": "WORKSTATION-001",
    "operatingSystem": "Windows",
    "operatingSystemVersion": "10.0.22621.3007",
    "trustType": "ServerAd",
    "isCompliant": true,
    "isManaged": true,
    "registrationDateTime": "2025-06-15T10:30:00Z",
    "approximateLastSignInDateTime": "2026-01-23T14:00:00Z",
    "manufacturer": "Dell Inc.",
    "model": "Latitude 5520"
  },
  
  "defender_profile": {
    "healthStatus": "Active",
    "riskScore": "Medium",
    "exposureLevel": "Low",
    "onboardingStatus": "Onboarded",
    "sensorHealthState": "Active",
    "lastSeen": "2026-01-23T14:30:00Z",
    "lastIpAddress": "10.0.1.50",
    "lastExternalIpAddress": "203.0.113.42"
  },
  
  "device_owners": [...],
  "device_users": [...],
  "signin_events": [...],
  "security_alerts": [...],
  "process_events": [...],
  "network_events": [...],
  "file_events": [...],
  "registry_events": [...],
  "incidents": [...],
  "logged_on_users": [...],
  "software_inventory": [...],
  "vulnerabilities": [...],
  "automated_investigations": [...],
  "remediation_activities": [...],
  "threat_intel_matches": [...],
  
  "summary": {
    "total_alerts": 5,
    "critical_alerts": 1,
    "high_alerts": 2,
    "medium_alerts": 2,
    "low_alerts": 0,
    "total_vulnerabilities": 15,
    "critical_vulnerabilities": 2,
    "unique_logged_on_users": 3,
    "suspicious_processes": 4,
    "threat_intel_hits": 1
  }
}
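
The `summary` block can be derived from the collected arrays instead of being typed by hand, which keeps its counts tied to query results. The following is a minimal Python sketch under stated assumptions: the helper name `build_summary` is hypothetical, each alert and vulnerability record is assumed to carry a `severity` string, each logged-on user record an `account` string, and `process_events` is assumed to hold only the flagged (suspicious) rows. The real record shape depends on the query projections.

```python
from collections import Counter


def _severity_counts(records):
    """Count records per lower-cased severity; missing severities count as ''."""
    return Counter(str(r.get("severity") or "").lower() for r in records)


def build_summary(export):
    """Derive the summary counts from the arrays in an export dict.

    Assumption: field names "severity" and "account" are illustrative only.
    """
    alerts = export.get("security_alerts", [])
    vulns = export.get("vulnerabilities", [])
    alert_sev = _severity_counts(alerts)
    vuln_sev = _severity_counts(vulns)
    accounts = {
        u.get("account") for u in export.get("logged_on_users", []) if u.get("account")
    }
    return {
        "total_alerts": len(alerts),
        "critical_alerts": alert_sev["critical"],
        "high_alerts": alert_sev["high"],
        "medium_alerts": alert_sev["medium"],
        "low_alerts": alert_sev["low"],
        "total_vulnerabilities": len(vulns),
        "critical_vulnerabilities": vuln_sev["critical"],
        "unique_logged_on_users": len(accounts),
        # Assumes process_events already contains only flagged processes.
        "suspicious_processes": len(export.get("process_events", [])),
        "threat_intel_hits": len(export.get("threat_intel_matches", [])),
    }
```

Missing arrays are treated as empty, so the function also works on a partially populated export.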

Error Handling

Common Issues and Solutions

| Issue | Solution |
|-------|----------|
| Device not found in Graph API | Try searching by deviceId instead of displayName; check case sensitivity |
| Defender Device ID not matching | Use Advanced Hunting to find the correct Defender ID by device name |
| DeviceName query returns empty | Use startswith instead of =~; DeviceName often contains the FQDN (e.g., hostname.domain.com) |
| SigninLogs DeviceDetail fails with union | DeviceDetail is dynamic in SigninLogs but string in AADNonInteractiveUserSignInLogs. Query the tables separately; don't use union isfuzzy=true with DeviceDetail filtering |
| RiskScore column not found | RiskScore is NOT in the DeviceInfo table; use the GetDefenderMachine API for riskScore |
| Missing compliance data | Device may not be MDM enrolled; check the isManaged field |
| No process events | Device may not be onboarded to Defender for Endpoint |
| Trust type is null | Device may be partially registered; check registrationDateTime |
| Query timeout on DeviceEvents | Reduce the date range or add more specific filters |
| BitLocker query fails | Verify permissions and that BitLocker is enabled on the device |

Required Field Defaults

{
  "trustType": "Workplace",
  "isCompliant": false,
  "isManaged": false,
  "approximateLastSignInDateTime": "1970-01-01T00:00:00Z",
  "riskScore": "Unknown",
  "exposureLevel": "Unknown",
  "healthStatus": "Unknown"
}

Empty Result Handling

{
  "signin_events": [],
  "security_alerts": [],
  "process_events": [],
  "network_events": [],
  "file_events": [],
  "registry_events": [],
  "incidents": [],
  "logged_on_users": [],
  "software_inventory": [],
  "vulnerabilities": [],
  "automated_investigations": [],
  "remediation_activities": [],
  "threat_intel_matches": []
}
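
The two fallback blocks above can be applied mechanically before the export is written. This is a small Python sketch, not the skill's actual implementation: the helper names are hypothetical, and the split of defaults between `device_profile` and `defender_profile` is inferred from where those fields appear in the JSON Export Structure.

```python
DEVICE_PROFILE_DEFAULTS = {
    "trustType": "Workplace",
    "isCompliant": False,
    "isManaged": False,
    "approximateLastSignInDateTime": "1970-01-01T00:00:00Z",
}

DEFENDER_PROFILE_DEFAULTS = {
    "riskScore": "Unknown",
    "exposureLevel": "Unknown",
    "healthStatus": "Unknown",
}

ARRAY_KEYS = [
    "signin_events", "security_alerts", "process_events", "network_events",
    "file_events", "registry_events", "incidents", "logged_on_users",
    "software_inventory", "vulnerabilities", "automated_investigations",
    "remediation_activities", "threat_intel_matches",
]


def fill_missing(record, defaults):
    """Return a copy of record with absent or null fields set to their default."""
    filled = dict(record or {})
    for key, value in defaults.items():
        if filled.get(key) is None:  # a real False/0 value is kept as-is
            filled[key] = value
    return filled


def normalize_export(export):
    """Apply profile defaults and make sure every result array exists."""
    out = dict(export)
    out["device_profile"] = fill_missing(out.get("device_profile"), DEVICE_PROFILE_DEFAULTS)
    out["defender_profile"] = fill_missing(out.get("defender_profile"), DEFENDER_PROFILE_DEFAULTS)
    for key in ARRAY_KEYS:
        if out.get(key) is None:
            out[key] = []
    return out
```

Only missing or null values are replaced, so a device that is genuinely compliant keeps `isCompliant: true`.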

Device Trust Type Analysis

Security Implications by Trust Type

Entra Joined (trustType: AzureAd)

  • Pros: Full cloud management, Conditional Access enforcement, BitLocker key escrow
  • Cons: No access to on-premises resources without VPN/Azure AD Application Proxy
  • Investigation Focus: Cloud sign-in patterns, Intune compliance, Conditional Access logs

Hybrid Joined (trustType: ServerAd)

  • Pros: Access to both cloud and on-premises resources, GPO support
  • Cons: Complex identity, dual token handling, potential for on-prem compromise to affect cloud
  • Investigation Focus: BOTH cloud and on-premises sign-ins, AD replication, Kerberos tickets

Entra Registered (trustType: Workplace)

  • Pros: BYOD support, minimal device management overhead
  • Cons: Limited compliance enforcement, device not fully controlled
  • Investigation Focus: User activity on device, data access patterns, potential data exfiltration

Risk Assessment Framework

Device Risk Scoring

| Factor | Weight | High Risk Indicators |
|--------|--------|----------------------|
| Defender Risk Score | 25% | "High" or "Critical" |
| Active Alerts | 25% | Any Critical/High severity alerts |
| Vulnerabilities | 20% | Critical CVEs, end-of-support software |
| Compliance Status | 15% | Non-compliant, not managed |
| Sign-in Anomalies | 15% | Multiple users, unusual hours, new IPs |
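
One way to turn the weights above into the 0-100 score used in the report is a plain weighted sum. The sketch below is illustrative only: the source does not specify how each factor is normalized, so the per-factor inputs (floats from 0.0 for no risk to 1.0 for the high-risk indicators being fully met) and the function name are assumptions.

```python
# Weights taken from the Device Risk Scoring table.
RISK_WEIGHTS = {
    "defender_risk_score": 0.25,
    "active_alerts": 0.25,
    "vulnerabilities": 0.20,
    "compliance_status": 0.15,
    "signin_anomalies": 0.15,
}


def weighted_risk_score(factor_scores):
    """Combine per-factor scores (assumed 0.0-1.0) into a 0-100 risk score.

    Factors missing from factor_scores contribute 0; out-of-range inputs
    are clamped to the 0.0-1.0 interval.
    """
    total = 0.0
    for factor, weight in RISK_WEIGHTS.items():
        score = float(factor_scores.get(factor, 0.0))
        score = min(max(score, 0.0), 1.0)
        total += weight * score
    return round(total * 100)
```

For example, a device with fully triggered alert indicators and half-triggered vulnerability indicators would score `weighted_risk_score({"active_alerts": 1.0, "vulnerabilities": 0.5})`, which combines 25 points from alerts with 10 from vulnerabilities.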

Risk Level Determination

  • Critical: Active critical alert OR critical vulnerability being exploited
  • High: High severity alerts OR critical unpatched vulnerabilities OR compromised user logged on
  • Medium: Medium alerts OR high vulnerabilities OR non-compliance
  • Low: Minor alerts OR low vulnerabilities, device is compliant and healthy
  • Informational: No alerts, compliant, healthy sensor

Integration with Main Copilot Instructions

This skill follows all patterns from the main copilot-instructions.md:

  • Date range handling: Uses +2 day rule for real-time searches
  • Parallel execution: Runs independent queries simultaneously
  • Time tracking: Mandatory reporting after each phase
  • Token management: Uses create_file for all output
  • Follow-up analysis: Reference copilot-instructions.md for cross-entity correlation
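
The date range handling can be sketched in Python as follows. This assumes the "+2 day rule" means padding the end date two days past the investigation date, which matches the dates in the JSON Export Structure example (investigation on 2026-01-23, end date 2026-01-25); the authoritative definition lives in copilot-instructions.md, and the function name is hypothetical.

```python
from datetime import date, timedelta


def investigation_window(investigation_date, lookback_days=7, real_time=True):
    """Return (start_date, end_date) for an investigation.

    Assumption: for real-time searches the end date is padded by 2 days;
    otherwise the window ends on the investigation date itself.
    """
    start = investigation_date - timedelta(days=lookback_days)
    end = investigation_date + timedelta(days=2 if real_time else 0)
    return start, end


start, end = investigation_window(date(2026, 1, 23))
print(start.isoformat(), end.isoformat())
```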

Example invocations:

  • "Investigate device WORKSTATION-001 for the last 7 days"
  • "Quick security check on computer LAP-JSMITH01"
  • "Full investigation for potentially compromised endpoint SRV-DC01 last 30 days"
  • "Check hybrid joined device DESKTOP-HR01 for malware"
  • "Analyze BYOD device iPad-John for suspicious activity"

SVG Dashboard Generation

After generating a computer investigation report (markdown file output), an SVG dashboard can be created using the shared SVG rendering skill.

Trigger: User asks "generate an SVG dashboard from the report" or "visualize this report"

Workflow:

  1. Read this skill's svg-widgets.yaml (widget manifest — defines layout, colors, field mapping)
  2. Read .github/skills/svg-dashboard/SKILL.md (rendering rules — component library, quality standards)
  3. Extract data from the completed report using data_sources.field_mapping_notes
  4. Render SVG → save as {report_basename}_dashboard.svg in the same directory
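
The output naming in step 4 could be computed as in this short Python sketch; the function name and the example report path are hypothetical.

```python
from pathlib import Path


def dashboard_path(report_path):
    """Return {report_basename}_dashboard.svg in the report's own directory."""
    report = Path(report_path)
    return report.with_name(f"{report.stem}_dashboard.svg")


print(dashboard_path("reports/example_device_report.md"))
```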

Layout: 5 rows — title banner, risk score card + KPI cards (alerts/incidents/vulnerabilities/users/EOS software), alerts by MITRE tactic bar chart + vulnerabilities by severity bar chart, incidents table + risk/mitigating factors table, assessment banner + recommendations.


Last Updated: March 24, 2026

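Analyzes Microsoft Purview DataSecurityEvents, covering SIT access, DLP matches, sensitivity label changes, and Copilot risk exposure. Supports user drill-downs, file inventories, and risk ranking in large environments, and provides KQL query templates and visual reports.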
data security sensitive information type SIT access DLP events DataSecurityEvents EDM access credit card access insider risk activity Purview data security sensitivity label label downgrade label change Copilot label exposure
.github/skills/data-security-analysis/SKILL.md
npx skills add SCStelz/security-investigator --skill data-security-analysis -g -y
SKILL.md
Frontmatter
{
    "name": "data-security-analysis",
    "description": "Analyze data security events, sensitive information type (SIT) access, sensitivity label access, DLP matches, or Purview insider risk activity. Triggers on keywords like \"data security\", \"sensitive information type\", \"SIT access\", \"DLP events\", \"DataSecurityEvents\", \"EDM access\", \"credit card access\", \"insider risk activity\", \"Purview data security\", \"sensitivity label\", \"label downgrade\", \"label change\", \"Copilot label exposure\". Queries DataSecurityEvents in Advanced Hunting to produce SIT and label access analysis: volume breakdowns, user drill-downs, file inventories, action type distribution, DLP correlation, label change tracking, Copilot label exposure, temporal patterns, and risk-ranked user summaries. Inline chat or markdown output. Designed for large environments (100k+ users) with tiered drill-down.",
    "drill_down_prompt": "Analyze data security events — SIT access patterns, label changes, DLP policy matches",
    "threat_pulse_domains": [
        "admin",
        "cloud"
    ]
}

Data Security Events Analysis — Instructions

Purpose

This skill analyzes DataSecurityEvents (Microsoft Purview Insider Risk Management / DLP telemetry) to answer questions about who accessed documents containing sensitive information types (SITs) and/or sensitivity labels — including EDM (Exact Data Match), built-in SITs (credit cards, SSNs, etc.), trainable classifiers, and Microsoft Purview sensitivity labels (Confidential, Highly Confidential, custom labels, etc.).

Primary Table: DataSecurityEvents (Defender XDR Advanced Hunting)

| Use Case | Example Question |
|----------|------------------|
| SIT access audit | "Who accessed files with credit card numbers in the last 30 days?" |
| EDM monitoring | "Show me all access to documents matching our EDM SIT" |
| DLP event analysis | "What DLP policy matches occurred this week?" |
| Insider risk triage | "Which users have the most sensitive data interactions?" |
| SIT landscape overview | "What sensitive information types exist in our environment?" |
| Sensitivity label audit | "Who accessed Highly Confidential labeled documents?" |
| Label change tracking | "Show me all label downgrades in the last 30 days" |
| Copilot label exposure | "What labeled documents did Copilot access in risky interactions?" |

📑 TABLE OF CONTENTS

  1. Critical Workflow Rules - Start here!
  2. SIT GUID Mapping Strategy - How SIT GUIDs are resolved to names
  3. Label GUID Mapping Strategy - How sensitivity label GUIDs are resolved to names
  4. Output Modes - Inline chat vs. Markdown file
  5. Quick Start - 8-step execution pattern
  6. Execution Workflow - 6-phase analysis process
  7. Sample KQL Queries - Validated query patterns (Queries 1-16d)
  8. Report Template - Rendering rules (15 rules) + output format specification
  9. Known Pitfalls - Table quirks and edge cases (27 entries)
  10. Error Handling - Troubleshooting guide
  11. SVG Dashboard Generation - Visual dashboard from report data

Investigation shortcuts:

  • DLP/exfiltration incident entities (TP Q1): Q3 (top users by SIT volume) → Q6 (DLP policy matches) → Q9 (single-user SIT profile for each incident entity) → Q10b (file-based spikes)
  • High-volume mailbox API access (TP Q9): Q9 (single-user SIT profile for API actors) → Q4 (top files accessed) → Q10b (file-based spikes) → Q6 (DLP policy matches)
  • Risky identity with data access (TP Q3): Q9 (single-user SIT profile) → Q4 (top files) → Q13 (label downgrade/changes by user)
  • Copilot sensitive data exposure (TP Q1 Copilot incidents, or TP Q10 AppRegistration with AI keywords): Q16a (Copilot SIT landscape + agent/human split) → Q16b (top human users, high-priority SITs) → Q16d (prompt-only risk signal)
  • Label compliance / downgrade alert (TP Q1 label-related incidents): Q13 (label changes) → Q15 (label-only events) → Q14 (Copilot label exposure)
  • Tenant-wide data security posture (standalone, no TP trigger): Full Phase 1–5 workflow

⛔ Shortcut Default Rule: When a matching shortcut exists for the investigation context, use it — don't run the full workflow. Only run the full Phase 1-5 sequence when the user explicitly requests "full analysis", "comprehensive", or "tenant-wide overview". Shortcuts render only the report sections relevant to their query chain (plus Executive Summary and Recommendations, always).

When invoked from a parent skill (threat-pulse, incident-investigation, user-investigation):

  • Inherit the workspace selection from the parent investigation context
  • Skip output mode prompts — default to inline chat (the parent skill controls the final output format)
  • Match the TP Q# trigger to the shortcuts above and execute that chain with entity substitution
  • Use 30d lookback (AH default) unless the parent specifies otherwise

⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

Before starting ANY data security analysis:

  1. ALWAYS use RunAdvancedHuntingQuery — DataSecurityEvents is an Advanced Hunting table, NOT available in Sentinel Data Lake
  2. ALWAYS run Query 1 (SIT Discovery) first — establishes which SITs are active and builds the GUID-to-Name mapping
  3. ALWAYS use summarize aggressively — this table can have 600k+ rows in 30 days even in mid-size tenants. NEVER retrieve raw rows except for targeted samples
  4. ALWAYS pre-filter with has before mv-expand on SensitiveInfoTypeInfo — the has "<GUID>" filter avoids expensive expansion on non-matching rows
  5. ALWAYS use tostring() + double parse_json() for SensitiveInfoTypeInfo — it's Collection(String), not native dynamic
  6. NEVER report SIT GUIDs without attempting name resolution — use the mapping strategy below
  7. ALWAYS ask for output mode if not specified: inline chat or markdown file
  8. Prerequisite: DataSecurityEvents requires Insider Risk Management opt-in to share data with Defender XDR. If the table returns 0 rows or "table not found", inform the user of this requirement
  9. ALWAYS run the Label Coverage Assessment (Query 11 quick stats variant) during Phase 1 to determine if this environment has significant label usage. Adapt the report accordingly (see Rule 11)
  10. NEVER report sensitivity label GUIDs without attempting name resolution — use the label mapping strategy below
  11. ALWAYS use split() on SensitivityLabelId — this column can contain comma-separated GUIDs (one per sub-entity), not a single GUID
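
Rules 5 and 11 describe two data-shape quirks that are easy to get wrong. The Python sketch below is an analogy for the shapes involved, not the KQL itself: it assumes SensitiveInfoTypeInfo arrives as a JSON array whose elements are themselves JSON-encoded strings (hence the double parse), and that SensitivityLabelId is a comma-separated string. The field name `SensitiveInfoTypeId` and both function names are illustrative assumptions.

```python
import json


def parse_sit_info(raw):
    """Decode a Collection(String)-style value with two parse passes.

    Pass 1 turns the outer text into a list; pass 2 turns each element,
    which is still a JSON string, into a dict.
    """
    outer = json.loads(raw)
    return [json.loads(item) if isinstance(item, str) else item for item in outer]


def split_label_ids(value):
    """Split a comma-separated SensitivityLabelId value into clean GUIDs."""
    return [guid.strip() for guid in (value or "").split(",") if guid.strip()]


# Hypothetical sample built in code so the nesting is explicit.
sample = json.dumps([json.dumps({"SensitiveInfoTypeId": "<sit-guid>", "ClassifierType": "MLModel"})])
for entry in parse_sit_info(sample):
    print(entry["ClassifierType"])
```

A single parse pass would yield a list of strings, so any field access on the elements would fail; that is the failure mode Rule 5 guards against.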

⛔ PROHIBITED ACTIONS

| Action | Status |
|--------|--------|
| Querying DataSecurityEvents via mcp_sentinel-data_query_lake | PROHIBITED (AH-only table) |
| Retrieving raw rows without summarize or take limit | PROHIBITED (table is massive) |
| Reporting SIT GUIDs without name resolution attempt | PROHIBITED |
| Reporting sensitivity label GUIDs without name resolution attempt | PROHIBITED |
| Running mv-expand on SensitiveInfoTypeInfo without pre-filtering with has | PROHIBITED (performance killer at scale) |
| Assuming SensitiveInfoTypeInfo is native dynamic | PROHIBITED (it's Collection(String), requires double-parse) |

SIT GUID Mapping Strategy

The Problem

DataSecurityEvents.SensitiveInfoTypeInfo contains SIT GUIDs, not human-readable names. SIT GUIDs fall into three categories:

| Category | Resolvable via KQL? | Example |
|----------|---------------------|---------|
| Built-in Microsoft SITs | ✅ Yes (use embedded mapping) | 50842eb7-...-b085 → "Credit Card Number" |
| Custom/EDM SITs | ❌ No (org-specific GUIDs) | b28fcea1-...-9291 → "Project Obsidian" (custom) |
| Trainable Classifiers (ML) | ❌ No (ClassifierType: "MLModel") | 77a140be-...-7560 → unknown ML classifier |

Resolution Strategy (3 tiers, in order)

Tier 1: Embedded Well-Known SIT Mapping (instant, no auth)

The query library below includes a datatable of the most common Microsoft SIT GUIDs encountered in production environments. This covers ~90% of detections in typical tenants.

Tier 2: User-Provided Custom SIT Mapping (config-driven)

If the user has custom/EDM SITs, they can provide a mapping in config.json under a sit_mapping key:

{
  "sit_mapping": {
    "<custom-sit-guid-1>": "Your Custom SIT Name",
    "<custom-sit-guid-2>": "Your EDM SIT Name"
  }
}

At skill startup: Check if config.json has a sit_mapping section. If yes, merge it into the KQL datatable for name resolution.

Tier 3: PowerShell Resolution (optional, on-demand)

If unresolved GUIDs remain after Tier 1+2, offer to resolve them via PowerShell:

"I found N SIT GUIDs that aren't in the built-in mapping. Would you like me to resolve them via Get-DlpSensitiveInformationType? This requires an active Security & Compliance PowerShell session (Connect-IPPSSession)."

If the user agrees:

# Requires: Install-Module ExchangeOnlineManagement
# Requires: Connect-IPPSSession -UserPrincipalName <UPN>
Get-DlpSensitiveInformationType -Identity "<GUID>" | Select-Object Name, Id, Publisher

After resolution: Offer to save the mapping to config.json for future runs.

Post-Resolution Persistence (MANDATORY)

After Tier 3 PowerShell resolution completes, always offer to persist the resolved GUIDs:

"I resolved N SIT GUIDs via PowerShell. Would you like me to save these to config.json under sit_mapping so future runs resolve them automatically via Tier 2?"

If the user agrees, read the current config.json, add/merge a sit_mapping object with the resolved GUIDs, and write it back. Format:

{
  "sit_mapping": {
    "<guid>": "<resolved-name>",
    "<guid>": "<resolved-name>"
  }
}

Why this matters: Without persistence, every new session re-encounters the same unresolved GUIDs. The first report in a workspace should resolve and persist; subsequent runs benefit automatically.
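
The read, merge, and write-back step can be sketched in Python as below. The function name is hypothetical, and the sketch assumes config.json is a single JSON object whose other keys must be preserved untouched.

```python
import json
from pathlib import Path


def persist_sit_mapping(config_path, resolved):
    """Merge resolved GUID-to-name pairs into config.json under sit_mapping.

    Existing keys in the file are kept; resolved entries overwrite any
    previous name stored for the same GUID.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("sit_mapping", {}).update(resolved)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Because the whole file is rewritten, this should only run after the user has agreed to the persistence offer.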

Trainable Classifiers

GUIDs with ClassifierType: "MLModel" are trainable classifiers and may not resolve via Get-DlpSensitiveInformationType. Display them as:

  • [ML Classifier] <GUID> if unresolved
  • Check if the GUID appears in the well-known mapping (some trainable classifiers have known GUIDs)

Label GUID Mapping Strategy

The Problem

DataSecurityEvents has 4 label-related columns, all containing sensitivity label GUIDs (not names):

| Column | Type | Content |
|--------|------|---------|
| SensitivityLabelId | string | Label on the document at event time. Can contain comma-separated GUIDs (one per sub-entity) |
| PreviousSensitivityLabelId | string | Previous label; only populated on label-change events (downgrade, removal) |
| SharepointSiteSensitivityLabelId | string | Label on the SharePoint site (not the document) |
| RiskyAIUsageSensitivityLabelsInfo | Collection(String) | Labels on resources Copilot accessed in risky AI events, as a JSON array of objects with SubEntityId, SubEntityName, SensitivityLabelId |

Resolution Strategy (3 tiers, in order)

Tier 1: Embedded Well-Known Label Mapping (instant, no auth)

The query library includes a datatable of Microsoft default sensitivity labels (the defa4170-* GUID family). All 12 default labels — including parent labels — use the deterministic pattern defa4170-0d19-0005-XXXX-bc88714345d2, confirmed across multiple tenants.

⚠️ Important: Microsoft does not publish default label GUIDs in official documentation. The GUID pattern is confirmed via Get-Label on default-configuration tenants. Older tenants may have renamed default labels (e.g., "Non-business" instead of "Personal", "Internal exception" instead of "Anyone (unrestricted)") or replaced default parent label GUIDs with random tenant-specific ones. Always validate with Get-Label (Tier 3) when accuracy matters.

Default Label GUID Pattern: defa4170-0d19-0005-XXXX-bc88714345d2 — complete mapping (12 labels, priority-ordered):

| GUID suffix | Priority | Default Name | Parent |
|-------------|----------|--------------|--------|
| 0000 | 0 | Personal | (top-level) |
| 0001 | 1 | Public | (top-level) |
| 0002 | 2 | General | (top-level) |
| 0003 | 3 | Anyone (unrestricted) | General |
| 0004 | 4 | All Employees (unrestricted) | General |
| 0005 | 5 | Confidential | (top-level, parent) |
| 0006 | 6 | Anyone (unrestricted) | Confidential |
| 0007 | 7 | All Employees | Confidential |
| 0008 | 8 | Trusted People | Confidential |
| 0009 | 9 | Highly Confidential | (top-level, parent) |
| 000a | 10 | All Employees | Highly Confidential |
| 000b | 11 | Specified People | Highly Confidential |

Older/customized tenants: Admins may have renamed default labels or deleted and recreated parent labels with random GUIDs. If Get-Label returns a different GUID for "Confidential" or "Highly Confidential" (not matching defa4170-*), the tenant has custom parent labels — add them via config.json (Tier 2).
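
Because the default labels follow a fixed GUID pattern, Tier 1 lookup reduces to extracting the four-character suffix. The Python sketch below encodes the mapping table above; the function name is hypothetical, and per the warning above the names only hold for tenants that kept the default configuration.

```python
import re

# Suffix -> (default name, parent name or None for top-level labels).
DEFAULT_LABELS = {
    "0000": ("Personal", None),
    "0001": ("Public", None),
    "0002": ("General", None),
    "0003": ("Anyone (unrestricted)", "General"),
    "0004": ("All Employees (unrestricted)", "General"),
    "0005": ("Confidential", None),
    "0006": ("Anyone (unrestricted)", "Confidential"),
    "0007": ("All Employees", "Confidential"),
    "0008": ("Trusted People", "Confidential"),
    "0009": ("Highly Confidential", None),
    "000a": ("All Employees", "Highly Confidential"),
    "000b": ("Specified People", "Highly Confidential"),
}

_DEFAULT_GUID = re.compile(r"^defa4170-0d19-0005-([0-9a-f]{4})-bc88714345d2$")


def resolve_default_label(guid):
    """Return (name, parent) for a default-pattern label GUID, else None."""
    match = _DEFAULT_GUID.match(guid.strip().lower())
    if not match:
        return None  # custom label: fall through to Tier 2 / Tier 3
    return DEFAULT_LABELS.get(match.group(1))
```

A `None` result means the GUID is either a custom label or an unknown suffix, and should be passed on to the config-driven or PowerShell tiers.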

Tier 2: User-Provided Custom Label Mapping (config-driven)

Custom labels (org-created) have random GUIDs. Users can provide a mapping in config.json:

{
  "label_mapping": {
    "<custom-label-guid-1>": "Your Custom Label",
    "<sub-label-guid>": "Sub-Label Name|Parent Label Name",
    "<parent-label-guid>": "Confidential"
  }
}

Value format: "LabelName" for top-level labels, "LabelName|ParentName" (pipe-delimited) for sub-labels. When building the KQL datatable, split on | to populate LabelName and LabelParent columns.

Renamed defaults: If a tenant has renamed default labels (e.g., defa4170...0000 → "Non-business" instead of "Personal"), include the renamed GUID in label_mapping — Tier 2 entries override Tier 1 defaults.

At skill startup: Check if config.json has a label_mapping section. If yes, merge it into the KQL datatable for name resolution. Tier 2 entries take precedence over Tier 1 defaults for the same GUID.
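The merge and the pipe-delimited value format can be sketched in Python as follows. This is an illustration under stated assumptions: TIER1_LABELS carries only two of the twelve default rows, the function names are hypothetical, and label names are assumed not to contain double quotes.

```python
import json

# Two Tier 1 rows copied from the default table; the real datatable carries all 12.
TIER1_LABELS = {
    "defa4170-0d19-0005-0000-bc88714345d2": ("Personal", ""),
    "defa4170-0d19-0005-0005-bc88714345d2": ("Confidential", ""),
}


def parse_label_value(value: str):
    """Split 'LabelName|ParentName' into (LabelName, LabelParent); top-level labels get ''."""
    name, _, parent = value.partition("|")
    return name.strip(), parent.strip()


def merge_label_mapping(config_text: str, tier1=TIER1_LABELS):
    """Return {guid: (name, parent)}; Tier 2 entries override Tier 1 for the same GUID."""
    config = json.loads(config_text)
    merged = dict(tier1)
    for guid, value in config.get("label_mapping", {}).items():
        merged[guid.lower()] = parse_label_value(value)
    return merged


def to_datatable_rows(mapping):
    """Render the merged mapping as rows for the LabelMapping datatable."""
    return [f'    "{guid}", "{name}", "{parent}",' for guid, (name, parent) in mapping.items()]
```

Because the Tier 2 entries are written into the dictionary after the Tier 1 defaults, a renamed default such as "Non-business" replaces "Personal" for the same GUID without any extra precedence logic.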

Tier 3: PowerShell Resolution (optional, on-demand)

If unresolved label GUIDs remain after Tier 1+2, offer to resolve them via PowerShell:

"I found N label GUIDs that aren't in the built-in mapping. Would you like me to resolve them via Get-Label? This requires an active Security & Compliance PowerShell session (Connect-IPPSSession)."

If the user agrees:

# Requires: Install-Module ExchangeOnlineManagement
# Requires: Connect-IPPSSession -UserPrincipalName <UPN>
Get-Label | Select-Object DisplayName, @{N='LabelGuid';E={$_.Guid.ToString()}}, ParentLabelDisplayName | Format-List

After resolution: Offer to save the mapping to config.json under label_mapping for future runs (same persistence pattern as SIT mapping).

Key difference from SIT resolution: Labels use Get-Label (not Get-DlpSensitiveInformationType). The cmdlet returns ALL labels at once — no need to query by individual GUID.


Output Modes

ASK the user which mode they prefer if they have not specified one. Both modes may be selected together.

Mode 1: Inline Chat Summary (Default)

  • Render analysis directly in chat
  • Includes summary tables, top-N breakdowns, risk-ranked user list
  • Best for quick review and follow-up questions

Mode 2: Markdown File Report

  • Save to reports/data-security/DataSecurity_Analysis_<scope>_<timestamp>.md
  • Full detail including all phases, temporal charts, file inventories
  • Use create_file tool — NEVER use terminal commands for file output
  • Filename pattern: DataSecurity_Analysis_<scope>_YYYYMMDD_HHMMSS.md
    • <scope> = tenant_wide, sit_<SITname>, user_<username>, etc.
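The filename pattern for Mode 2 can be produced as in the sketch below. The sanitization step (replacing characters outside letters, digits, underscore, dot, and hyphen) is an assumption added for illustration, and report_path is a hypothetical helper; the file itself should still be written with the create_file tool.

```python
import re
from datetime import datetime


def report_path(scope: str, now: datetime) -> str:
    """Build reports/data-security/DataSecurity_Analysis_<scope>_YYYYMMDD_HHMMSS.md."""
    # Assumption: collapse anything that is not filename-safe into a single underscore.
    safe_scope = re.sub(r"[^A-Za-z0-9_.-]+", "_", scope).strip("_")
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return f"reports/data-security/DataSecurity_Analysis_{safe_scope}_{stamp}.md"
```

For example, a scope of tenant_wide at 2025-01-02 03:04:05 yields DataSecurity_Analysis_tenant_wide_20250102_030405.md under reports/data-security/.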

Quick Start (TL;DR)

  1. Determine scope → Tenant-wide overview? Specific SIT? Specific user? Specific label? Time range?
  2. Check config.json → Look for sit_mapping and label_mapping sections for custom name resolution
  3. Run Phase 1 → Query 1 (SIT Discovery) + Label Coverage Assessment (quick stats variant) to determine environment maturity
  4. Run Phase 2 → Queries 2-5 (breakdowns by action type, user, file, time)
  5. Run Phase 2.5 → Queries 16a-16d (Copilot SIT exposure analysis) — conditional on Copilot volume (see Phase 2.5 trigger)
  6. Run Phase 3 → Queries 6-8 (DLP correlation, workload, SIT drill-down), Query 10b (file-based spikes)
  7. Run Phase 4 → Queries 11-15 (label landscape, label-based user ranking, label downgrades, Copilot label exposure, label-only events) — depth depends on label coverage (see Rule 11)
  8. Output Results → Render in selected mode(s), offer PowerShell resolution for unknowns

Execution Workflow

Phase 1: Discovery & Mapping (always run first)

Goal: Establish what SITs and labels exist in the data, their volume, and resolve GUIDs to names.

  1. Run Query 1 (SIT Discovery) — returns top SIT GUIDs with hit counts
  2. Run Label Coverage Assessment (quick stats from Query 11 comment block) — returns label vs SIT coverage percentages
  3. Apply Tier 1 mapping (embedded datatable) to resolve known SIT and label GUIDs
  4. Check config.json for Tier 2 mapping (sit_mapping + label_mapping) to resolve custom GUIDs
  5. Flag any remaining unresolved GUIDs for optional Tier 3 (PowerShell)
  6. Present the SIT + label landscape to the user before proceeding
  7. Determine label analysis depth based on coverage (see Rule 11)

Phase 2: Breakdown Analysis

Goal: Decompose SIT access patterns by multiple dimensions.

Run these queries in parallel where possible:

| Query | Dimension | Purpose |
| --- | --- | --- |
| Query 2 | Action Type | What operations triggered SIT detections (file read, download, copy, Copilot response, etc.) |
| Query 3 | User Ranking | Top users by SIT interaction volume, risk-ranked |
| Query 4 | File Inventory | Top files/documents containing the most SIT detections |
| Query 5 | Temporal Pattern | Daily/hourly volume trend to spot spikes |

Phase 2.5: Copilot SIT Exposure Analysis (conditional on Copilot volume)

Trigger: Run this phase when Copilot/AI events exceed 30% of total volume (determined from Query 2 Action Type breakdown or Query 7 Workload breakdown).

Goal: Decompose Copilot SIT interactions by priority tier, identify users prompting high-value SITs into Copilot, separate service account noise from human risk signals, and estimate real interaction counts (correcting for row multiplication).

Key insight: Each Copilot interaction generates ~2-3 DataSecurityEvents (DSE) rows on average (up to 35 for complex exchanges) because Purview creates separate rows for prompt SIT matches, response SIT matches, and compound agent interactions. Divide raw event counts by this multiplier before reporting interaction volumes.
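As a rough worked example of the correction, the sketch below turns a raw row count into an estimated interaction range. The default multipliers of 2.0 and 3.0 come from the "~2-3 rows on average" figure above; the function name is hypothetical, and complex exchanges (up to 35 rows) are not modelled.

```python
def estimate_interactions(raw_rows: int, low_multiplier: float = 2.0,
                          high_multiplier: float = 3.0):
    """Return (min_estimate, max_estimate) of real Copilot interactions for a raw row count."""
    if raw_rows < 0:
        raise ValueError("raw_rows must be non-negative")
    if not 0 < low_multiplier <= high_multiplier:
        raise ValueError("multipliers must satisfy 0 < low <= high")
    # More rows per interaction means fewer interactions, so the high multiplier gives the minimum.
    return round(raw_rows / high_multiplier), round(raw_rows / low_multiplier)
```

With 6,000 raw rows this gives a range of roughly 2,000 to 3,000 interactions, which is the figure to report alongside the raw count.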

| Query | Purpose |
| --- | --- |
| Query 16a | Copilot SIT Landscape: which SITs fire in Copilot interactions, classified by priority tier (High/Medium/Low) |
| Query 16b | Top Users by High-Priority SIT in Copilot: risk-ranked users excluding service accounts |
| Query 16c | Daily Temporal Trend by SIT Category: spot adoption vs risk pattern changes over time |
| Query 16d | Prompt-Only Human Users: users typing sensitive data INTO Copilot (primary risk signal), excluding service accounts and responses |

Service Account Filtering: Automated service accounts (e.g., Security Copilot agents, Purview agents) can generate 50-70% of all Copilot events. These accounts typically follow patterns like securitycopilotagentuser-*, svc-*, or system-generated UPN prefixes with GUIDs. Query 16a quantifies the agent vs human split; Queries 16b-16d exclude agents to surface human risk.
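The agent versus human split can be illustrated outside KQL with a short Python sketch. It assumes the two prefixes named above (securitycopilotagentuser- and svc-); matching case-insensitively is an added assumption, and the sample UPNs are invented for the example.

```python
import re

# Prefixes taken from the text above; extend with org-specific patterns.
SERVICE_ACCOUNT_RE = re.compile(r"^(securitycopilotagentuser-|svc-)", re.IGNORECASE)


def is_service_account(upn: str) -> bool:
    """True when the UPN starts with one of the known service-account prefixes."""
    return bool(SERVICE_ACCOUNT_RE.match(upn or ""))


def agent_share(event_counts: dict) -> float:
    """Percentage of events generated by service accounts, given {upn: event_count}."""
    total = sum(event_counts.values())
    if total == 0:
        return 0.0
    agent = sum(count for upn, count in event_counts.items() if is_service_account(upn))
    return round(100.0 * agent / total, 1)
```

GUID-style system UPNs are not covered by this pattern and would need their own expression once the tenant's naming convention is known.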

Priority SIT Classification: SITs are classified into tiers for Copilot risk assessment:

| Tier | SIT Categories | Risk Rationale |
| --- | --- | --- |
| 🔴 High | Credit Card Numbers, SSNs, Azure/Cloud Credentials, Employee HR Data (custom) | Direct financial, identity, or infrastructure exposure |
| 🟡 Medium | Project code names (custom), Employee IDs (custom) | Business-sensitive but not directly exploitable |
| 🔵 Low | All Full Names, IP Addresses, Physical Addresses, Medical Terms | High-volume, low-specificity; noise in Copilot context |

Custom SIT classification: Organizations should classify their custom/EDM SITs into these tiers. If config.json has a sit_priority section (mapping GUID → tier), use it. Otherwise, classify custom SITs as 🔴 High by default (conservative).
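The classification order described above can be sketched as a small Python function. The shape of sit_priority (GUID mapped to the strings "High", "Medium", or "Low") is an assumption, since the section does not show that part of config.json, and classify_sit is a hypothetical helper.

```python
HIGH, MEDIUM, LOW = "High", "Medium", "Low"


def classify_sit(sit_id, known_ids, high_ids, medium_ids, sit_priority=None):
    """Return the priority tier for one SIT GUID.

    known_ids:    GUIDs that resolve to a name (built-in or sit_mapping)
    high_ids:     GUIDs listed as high priority
    medium_ids:   GUIDs listed as medium priority
    sit_priority: optional {guid: tier} override from config.json (assumed format)
    """
    sit_id = sit_id.lower()
    overrides = {guid.lower(): tier for guid, tier in (sit_priority or {}).items()}
    if sit_id in overrides:
        return overrides[sit_id]
    if sit_id in high_ids:
        return HIGH
    if sit_id in medium_ids:
        return MEDIUM
    if sit_id not in known_ids:
        # Conservative default: an unresolved custom/EDM SIT may be high-value.
        return HIGH
    return LOW
```

An explicit sit_priority entry always wins, so an organization can demote a noisy custom SIT to Low without editing the built-in lists.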


Phase 3: Deep Dive (conditional on scope)

| Scenario | Run These |
| --- | --- |
| Tenant-wide overview | Query 6 (DLP policy matches), Query 7 (Workload breakdown) |
| Specific SIT investigation | Query 8 (Single-SIT deep dive with full user/file/action breakdown) |
| Specific user investigation | Query 9 (Single-user SIT access profile) |
| Anomaly detection | Query 10b (file-based spikes; PRIMARY), Query 10 (overall spikes; secondary, includes Copilot) |

Phase 4: Sensitivity Label Analysis (conditional on coverage)

Goal: Analyze sensitivity label access patterns, label changes, and Copilot label exposure.

Run the Label Coverage Assessment first (Phase 1, step 2). Then adapt depth per Rule 11:

| Label Coverage | Analysis Depth | Run These |
| --- | --- | --- |
| ≥5% of events have labels (label-mature environment) | Full label analysis: dedicated report sections | Query 11 (label landscape), Query 12 (label-based user ranking), Query 13 (label changes), Query 14 (Copilot label exposure), Query 15 (label-only events) |
| 1% to <5% of events have labels (emerging label environment) | Summary label section: condensed into one section | Query 11 (label landscape), Query 13 (label changes) |
| <1% of events have labels (SIT-dominant environment) | Brief note only: mention label presence in Scope & Limitations | Label coverage stats from Phase 1 assessment only |
| User asks specifically about labels | Full label analysis regardless of coverage percentage | All label queries (11-15) |
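The depth decision in the table can be expressed as a short Python sketch. The inputs correspond to TotalEvents and WithLabel from the Label Coverage Assessment; the function name and the "full" / "summary" / "brief" return values are hypothetical labels chosen for illustration.

```python
def label_analysis_depth(total_events: int, events_with_label: int,
                         user_asked_about_labels: bool = False):
    """Return (depth, label_queries) following the coverage thresholds in the table above."""
    coverage_pct = 100.0 * events_with_label / total_events if total_events else 0.0
    if user_asked_about_labels or coverage_pct >= 5.0:
        return "full", [11, 12, 13, 14, 15]
    if coverage_pct >= 1.0:
        return "summary", [11, 13]
    # Below 1%: report only the coverage stats from the Phase 1 assessment.
    return "brief", []
```

Exactly 5% coverage is treated as label-mature here, matching the "≥5%" row.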

Phase 5: Report Generation

Render findings using the Report Template below.


Sample KQL Queries

Well-Known SIT GUID Mapping (datatable)

Use this let block as a prefix for any query that needs name resolution. It covers the most common Microsoft SITs plus placeholders for custom SITs from config.json.

// Well-known SIT GUID mapping — covers ~90% of typical detections
// Add custom/EDM SIT GUIDs from config.json sit_mapping section
let SITMapping = datatable(SITId: string, SITName: string) [
    // ── Financial ──
    "50842eb7-edc8-4019-85dd-5a5c1f2bb085", "Credit Card Number",
    "cb353f78-2b72-4c3c-8827-92ebe4f69fdf", "ABA Routing Number",
    "78e09124-f2c3-4656-b32a-c1a132cd2711", "Brazil CPF Number",
    // ── Identity / PII ──
    "a44669fe-0d48-453d-a9b1-2cc83f2cba77", "U.S. Social Security Number (SSN)",
    "a7dd5e5f-e7f9-4626-a2c6-86a8cb6830d2", "IP Address v4",
    "1daa4ad5-e2dd-4ca4-a788-54722c09efb2", "IP Address",
    "50b8b56b-4ef8-44c2-a924-03374f5831ce", "All Full Names",
    "8548332d-6d71-41f8-97db-cc3b5fa544e6", "All Physical Addresses",
    "44aa44f2-63d1-41df-af0d-970283ac41e2", "U.S. Physical Addresses",
    "d1d18c85-1203-46f5-b32f-2d6309de4e5b", "Australia Physical Addresses",
    "6fa57f91-314a-4561-8248-7ab921957448", "Philippines Passport Number",
    "d0001c83-e72f-4360-98d3-f5a41dc5a380", "Indonesia Passport Number",
    // ── Healthcare ──
    "065bdd91-ef07-40d3-b8a4-0aea722eaa49", "All Medical Terms And Conditions",
    "17066377-466d-43ff-997f-c9240414021c", "Diseases",
    "f6dc2d17-3549-41e2-af29-ae1846ae9542", "Types Of Medication",
    "ee05bb9c-7b87-42e1-9987-446b243245d5", "Lab Test Terms",
    // ── Azure / Cloud secrets ──
    "0f587d92-eb28-44a9-bd1c-90f2892b47aa", "Azure DocumentDB Auth Key",
    "ce1a126d-186f-4700-8c0c-486157b953fd", "Azure SQL Connection String",
    "0b34bec3-d5d6-4974-b7b0-dcdb5c90c29d", "Azure IoT Connection String",
    "c7bc98e8-551a-4c35-a92d-d2c8cda714a7", "Azure Storage Account Key",
    "095a7e6c-efd8-46d5-af7b-5298d53a49fc", "Azure Redis Cache Connection String",
    // ─── ADD CUSTOM / EDM SITs FROM config.json sit_mapping HERE ───
    // Example: "<your-edm-guid>", "Your EDM SIT Name",
    "END_MARKER", "END_MARKER"
];

Instructions: When building queries, read config.json for sit_mapping entries and insert them into the datatable above, replacing the END_MARKER row. If no custom mapping exists, remove the END_MARKER row.
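One way to generate the let block without the END_MARKER placeholder is sketched below in Python. It assumes sit_mapping is a flat GUID to name dictionary (the section does not show its exact shape); kql_quote and build_sit_datatable are hypothetical helpers.

```python
def kql_quote(value: str) -> str:
    """Quote a string as a KQL double-quoted literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_sit_datatable(builtin: dict, custom: dict) -> str:
    """Render the SITMapping let block; custom entries override built-ins for the same GUID."""
    merged = dict(builtin)
    merged.update({guid.lower(): name for guid, name in custom.items()})
    rows = [f"    {kql_quote(guid)}, {kql_quote(name)}" for guid, name in merged.items()]
    # Rows are joined with commas, so no placeholder row or trailing comma is needed.
    return (
        "let SITMapping = datatable(SITId: string, SITName: string) [\n"
        + ",\n".join(rows)
        + "\n];"
    )
```

Generating the rows programmatically sidesteps the END_MARKER bookkeeping: the last row simply has no trailing comma, whether or not custom entries exist.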


Query 1: SIT Discovery — Active SIT Landscape

Purpose: Find all active SIT GUIDs, their volume, and classify them.

// Query 1: SIT Discovery — What SITs are active in this environment?
// Adjust timespan as needed (default: 30d)
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitiveInfoTypeInfo)
| mv-expand SIT = parse_json(tostring(SensitiveInfoTypeInfo))
| extend SITJson = parse_json(tostring(SIT))
| extend SITId = tostring(SITJson.SensitiveInfoTypeId)
| extend ClassifierType = tostring(SITJson.ClassifierType)
| extend SITConfidence = toint(SITJson.Confidence)
| extend SITCount = toint(SITJson.Count)
| summarize 
    TotalEvents = count(),
    DistinctUsers = dcount(AccountUpn),
    DistinctFiles = dcount(ObjectId),
    AvgConfidence = avg(SITConfidence),
    MaxConfidence = max(SITConfidence),
    ClassifierTypes = make_set(ClassifierType)
    by SITId
| order by TotalEvents desc
| take 50

Post-processing: Join results with the SITMapping datatable to resolve names. Flag any GUIDs not in the mapping as "Unknown — custom/EDM SIT" or "[ML Classifier]" based on ClassifierTypes.


Query 2: Action Type Breakdown

Purpose: Break down SIT detections by what operation triggered them.

// Query 2: Action Type Breakdown — What operations trigger SIT detections?
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitiveInfoTypeInfo)
| summarize 
    EventCount = count(),
    DistinctUsers = dcount(AccountUpn),
    DistinctFiles = dcount(ObjectId)
    by ActionType
| order by EventCount desc

Query 3: Top Users by SIT Interaction Volume

Purpose: Risk-rank users by sensitive data interaction volume. Designed for 100k+ user environments.

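// Query 3: Top 50 Users by SIT access volume
// NOTE: DistinctSITs inspects only the first SIT entry ([0]) of each event,
// so it is a lower bound when events carry several SITs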
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitiveInfoTypeInfo)
| summarize 
    TotalEvents = count(),
    DistinctSITs = dcount(tostring(parse_json(tostring(parse_json(tostring(SensitiveInfoTypeInfo))[0])).SensitiveInfoTypeId)),
    DistinctFiles = dcount(ObjectId),
    ActionTypes = make_set(ActionType),
    Workloads = make_set(Workload),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp)
    by AccountUpn
| order by TotalEvents desc
| take 50

Query 4: Top Files by SIT Detection Count

Purpose: Identify the most sensitive documents — files with the most SIT detections across access events.

// Query 4: Top 30 Files by SIT detection frequency
// Excludes system/operational files (DLPCache, EBWebView) that are Defender operational reads, not user-initiated
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitiveInfoTypeInfo)
| where isnotempty(ObjectId)
| where ObjectId !has "DLPCache" and ObjectId !has "EBWebView" and ObjectId !has "\\ProgramData\\Microsoft\\Windows Defender\\"
| summarize 
    AccessCount = count(),
    DistinctUsers = dcount(AccountUpn),
    ActionTypes = make_set(ActionType),
    LastAccessed = max(Timestamp)
    by ObjectId
| order by AccessCount desc
| take 30

Query 5: Temporal Pattern — Daily SIT Event Volume

Purpose: Detect volume spikes or anomalies in SIT-related activity over time.

// Query 5: Daily SIT event volume trend — includes file-based column for spike attribution (Rule 10)
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitiveInfoTypeInfo)
| summarize 
    DailyEvents = count(), 
    FileEvents = countif(Workload !in ("Copilot", "ConnectedAIApp")),
    DistinctUsers = dcount(AccountUpn) 
    by Day = bin(Timestamp, 1d)
| order by Day asc

Query 6: DLP Policy Match Correlation

Purpose: Show DLP policy matches alongside SIT detections — which policies fired and how often.

// Query 6: DLP Policy Match breakdown
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(DlpPolicyMatchInfo)
| extend DlpInfo = parse_json(DlpPolicyMatchInfo)
| mv-expand DlpPolicy = DlpInfo
| extend PolicyName = tostring(DlpPolicy.PolicyName)
| extend PolicyId = tostring(DlpPolicy.PolicyId)
| summarize 
    MatchCount = count(),
    DistinctUsers = dcount(AccountUpn),
    DistinctFiles = dcount(ObjectId),
    ActionTypes = make_set(ActionType)
    by PolicyName, PolicyId
| order by MatchCount desc

Query 7: Workload Breakdown

Purpose: Where is sensitive data being accessed — SharePoint, OneDrive, Exchange, Teams, Endpoints, Copilot?

// Query 7: Workload distribution of SIT events
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitiveInfoTypeInfo)
| summarize 
    EventCount = count(),
    DistinctUsers = dcount(AccountUpn),
    DistinctFiles = dcount(ObjectId)
    by Workload
| order by EventCount desc

Query 8: Single-SIT Deep Dive

Purpose: Full breakdown for a specific SIT GUID — who accessed it, which files, what operations, over what time period.

Usage: Replace <TARGET_SIT_GUID> with the specific SIT GUID to investigate (e.g., an EDM SIT GUID).

// Query 8: Single-SIT deep dive — replace GUID
let targetSIT = "<TARGET_SIT_GUID>";
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitiveInfoTypeInfo)
| where SensitiveInfoTypeInfo has targetSIT
| mv-expand SIT = parse_json(tostring(SensitiveInfoTypeInfo))
| extend SITJson = parse_json(tostring(SIT))
| extend SITId = tostring(SITJson.SensitiveInfoTypeId)
| where SITId == targetSIT
| extend SITConfidence = toint(SITJson.Confidence)
| extend SITCount = toint(SITJson.Count)
| summarize 
    AccessCount = count(),
    AvgConfidence = avg(SITConfidence),
    TotalSITInstances = sum(SITCount),
    ActionTypes = make_set(ActionType),
    Workloads = make_set(Workload),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp)
    by AccountUpn, ObjectId
| order by AccessCount desc
| take 100

Query 9: Single-User SIT Access Profile

Purpose: Complete SIT interaction profile for a specific user — what SITs they accessed, which files, operations, and when.

Usage: Replace <TARGET_UPN> with the user's UPN.

// Query 9: Single-user SIT access profile
let targetUser = "<TARGET_UPN>";
DataSecurityEvents
| where Timestamp > ago(30d)
| where AccountUpn =~ targetUser
| where isnotempty(SensitiveInfoTypeInfo)
| mv-expand SIT = parse_json(tostring(SensitiveInfoTypeInfo))
| extend SITJson = parse_json(tostring(SIT))
| extend SITId = tostring(SITJson.SensitiveInfoTypeId)
| extend SITConfidence = toint(SITJson.Confidence)
| extend SITCount = toint(SITJson.Count)
| summarize 
    AccessCount = count(),
    DistinctFiles = dcount(ObjectId),
    AvgConfidence = avg(SITConfidence),
    ActionTypes = make_set(ActionType),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp)
    by SITId
| order by AccessCount desc

Query 10: Anomaly Detection — Users with SIT Access Spikes

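Purpose: Compare each user's daily average SIT activity over the most recent 7 days against their daily average over the preceding 23 days (days 8-30) to detect sudden spikes. Users with no baseline activity are dropped by the inner join. Designed for 100k+ user environments.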

// Query 10: SIT access spike detection (7d recent vs 23d baseline) — ALL events
// NOTE: This includes Copilot events. For file-based-only spikes, use Query 10b below.
let baseline = DataSecurityEvents
| where Timestamp between (ago(30d) .. ago(7d))
| where isnotempty(SensitiveInfoTypeInfo)
| summarize BaselineTotal = count() by AccountUpn
| extend BaselineDailyAvg = round(BaselineTotal / 23.0, 1); // 23 days in baseline window
let recent = DataSecurityEvents
| where Timestamp > ago(7d)
| where isnotempty(SensitiveInfoTypeInfo)
| summarize RecentTotal = count() by AccountUpn
| extend RecentDailyAvg = round(RecentTotal / 7.0, 1);
recent
| join kind=inner baseline on AccountUpn
| extend SpikeRatio = round(RecentDailyAvg / BaselineDailyAvg, 2)
| where SpikeRatio > 2.0 and RecentTotal > 20 and BaselineTotal >= 10
| project AccountUpn, BaselineDailyAvg, RecentDailyAvg, SpikeRatio, BaselineTotal, RecentTotal
| order by SpikeRatio desc
| take 30
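The arithmetic behind Query 10 can be traced with a small Python sketch. It mirrors the query's rounding and thresholds (ratio above 2.0, more than 20 recent events, at least 10 baseline events); the function names are hypothetical, and Python's round may differ from KQL's on exact ties.

```python
def spike_ratio(baseline_total: int, recent_total: int,
                baseline_days: int = 23, recent_days: int = 7):
    """Recent daily average divided by baseline daily average, rounded as in Query 10."""
    baseline_avg = round(baseline_total / baseline_days, 1)
    recent_avg = round(recent_total / recent_days, 1)
    if baseline_avg == 0:
        return None  # no usable baseline
    return round(recent_avg / baseline_avg, 2)


def is_spike(baseline_total: int, recent_total: int) -> bool:
    """Apply the Query 10 filter: SpikeRatio > 2.0, RecentTotal > 20, BaselineTotal >= 10."""
    ratio = spike_ratio(baseline_total, recent_total)
    return ratio is not None and ratio > 2.0 and recent_total > 20 and baseline_total >= 10
```

A user with 46 baseline events (2.0 per day) and 70 recent events (10.0 per day) has a ratio of 5.0 and is flagged; the same baseline with 21 recent events (3.0 per day) has a ratio of 1.5 and is not.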

Query 10b: File-Based-Only Spike Detection (Excludes Copilot)

Purpose: Same as Query 10 but excludes Copilot and ConnectedAIApp events to surface actual file access spikes. This is the primary risk signal — Copilot spikes often just reflect adoption changes.

// Query 10b: File-based SIT access spike detection (excludes Copilot/AI events)
let CopilotActionTypes = dynamic(["Risky prompt entered in Copilot", "Sensitive response received in Copilot",
    "Risky prompt entered in connected AI apps", "Sensitive response received in connected AI apps"]);
let baseline = DataSecurityEvents
| where Timestamp between (ago(30d) .. ago(7d))
| where isnotempty(SensitiveInfoTypeInfo)
| where not(ActionType has_any (CopilotActionTypes))
| where Workload !in ("Copilot", "ConnectedAIApp")
| summarize BaselineTotal = count() by AccountUpn
| extend BaselineDailyAvg = round(BaselineTotal / 23.0, 1);
let recent = DataSecurityEvents
| where Timestamp > ago(7d)
| where isnotempty(SensitiveInfoTypeInfo)
| where not(ActionType has_any (CopilotActionTypes))
| where Workload !in ("Copilot", "ConnectedAIApp")
| summarize RecentTotal = count() by AccountUpn
| extend RecentDailyAvg = round(RecentTotal / 7.0, 1);
recent
| join kind=inner baseline on AccountUpn
| extend SpikeRatio = round(RecentDailyAvg / BaselineDailyAvg, 2)
| where SpikeRatio > 2.0 and RecentTotal > 10 and BaselineTotal >= 10
| project AccountUpn, BaselineDailyAvg, RecentDailyAvg, SpikeRatio, BaselineTotal, RecentTotal
| order by SpikeRatio desc
| take 30

Well-Known Label GUID Mapping (datatable)

Use this let block as a prefix for label queries that need name resolution. It covers the Microsoft default sensitivity labels plus placeholder slots for custom labels from config.json.

// Microsoft default sensitivity label GUID mapping — all 12 labels
// Confirmed via Get-Label on default-configuration tenants
// Older tenants may have renamed labels or replaced parent GUIDs — validate with Get-Label if needed
// Add custom labels from config.json label_mapping section
let LabelMapping = datatable(LabelId: string, LabelName: string, LabelParent: string) [
    // ── Microsoft Default Labels (defa4170-0d19-0005-XXXX-bc88714345d2 family) ──
    // Priority 0-11, sequential GUID suffixes
    "defa4170-0d19-0005-0000-bc88714345d2", "Personal", "",
    "defa4170-0d19-0005-0001-bc88714345d2", "Public", "",
    "defa4170-0d19-0005-0002-bc88714345d2", "General", "",
    "defa4170-0d19-0005-0003-bc88714345d2", "Anyone (unrestricted)", "General",
    "defa4170-0d19-0005-0004-bc88714345d2", "All Employees (unrestricted)", "General",
    "defa4170-0d19-0005-0005-bc88714345d2", "Confidential", "",
    "defa4170-0d19-0005-0006-bc88714345d2", "Anyone (unrestricted)", "Confidential",
    "defa4170-0d19-0005-0007-bc88714345d2", "All Employees", "Confidential",
    "defa4170-0d19-0005-0008-bc88714345d2", "Trusted People", "Confidential",
    "defa4170-0d19-0005-0009-bc88714345d2", "Highly Confidential", "",
    "defa4170-0d19-0005-000a-bc88714345d2", "All Employees", "Highly Confidential",
    "defa4170-0d19-0005-000b-bc88714345d2", "Specified People", "Highly Confidential",
    // ─── ADD CUSTOM LABELS FROM config.json label_mapping HERE ───
    // Example: "<your-custom-label-guid>", "Your Custom Label", "Parent Label",
    "END_MARKER", "END_MARKER", "END_MARKER"
];

Instructions: When building queries, read config.json for label_mapping entries and insert them into the datatable above, replacing the END_MARKER row.

Older/customized tenants: If Get-Label shows parent labels (Confidential, Highly Confidential) with random GUIDs instead of defa4170-*, the admin has recreated them; add the tenant-specific GUIDs from config.json label_mapping or Get-Label. If a resolved label name differs from the datatable (e.g., "Non-business" vs "Personal"), prefer the Get-Label name for that tenant.


Query 11: Label Coverage Overview — Sensitivity Label Landscape

Purpose: Discover which sensitivity labels appear in the data, their volume, and resolve GUIDs to names. Also includes a quick stats variant for Phase 1 coverage assessment.

Quick Stats Variant (run first in Phase 1):

// Label Coverage Assessment — run in Phase 1 to determine label analysis depth
DataSecurityEvents
| where Timestamp > ago(30d)
| summarize
    TotalEvents = count(),
    WithSIT = countif(isnotempty(SensitiveInfoTypeInfo) and SensitiveInfoTypeInfo != "[]"),
    WithLabel = countif(isnotempty(SensitivityLabelId)),
    WithPrevLabel = countif(isnotempty(PreviousSensitivityLabelId)),
    LabelOnly_NoSIT = countif(isnotempty(SensitivityLabelId) and (isempty(SensitiveInfoTypeInfo) or SensitiveInfoTypeInfo == "[]")),
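    SIT_WithLabel = countif(isnotempty(SensitivityLabelId) and isnotempty(SensitiveInfoTypeInfo) and SensitiveInfoTypeInfo != "[]")
// Coverage percentages used to pick the label analysis depth (Rule 11)
| extend
    LabelCoveragePct = round(100.0 * WithLabel / TotalEvents, 1),
    SITCoveragePct = round(100.0 * WithSIT / TotalEvents, 1)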

Full Label Landscape Query:

// Query 11: Label Landscape — which sensitivity labels appear and how often
// Prefix with LabelMapping datatable from above
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitivityLabelId)
| extend LabelIds = split(SensitivityLabelId, ",")
| mv-expand LabelIdRaw = LabelIds
| extend LabelId = tostring(trim(" ", tostring(LabelIdRaw)))
| where isnotempty(LabelId)
| lookup kind=leftouter LabelMapping on LabelId
| extend LabelDisplay = iff(isempty(LabelName) or LabelName == "END_MARKER",
    strcat("[Unknown] ", LabelId),
    iff(isempty(LabelParent), LabelName, strcat(LabelParent, " / ", LabelName)))
| summarize
    EventCount = count(),
    DistinctUsers = dcount(AccountUpn),
    DistinctFiles = dcount(ObjectId),
    ActionTypes = make_set(ActionType)
    by LabelDisplay, LabelId
| order by EventCount desc

Post-processing: Flag any [Unknown] GUIDs for Tier 2/3 resolution. The LabelDisplay column renders as "Parent / Child" for sub-labels (e.g., "Highly Confidential / Project Obsidian") and just the label name for top-level labels.


Query 12: Top Users by Labeled Document Access (File-Based)

Purpose: Risk-rank users by labeled document access volume, excluding Copilot/AI events. This is the label-dimension equivalent of Query 3.

// Query 12: Top users by labeled document access (file-based only)
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitivityLabelId)
| where ActionType !has "Copilot" and Workload !in ("Copilot", "ConnectedAIApp")
| summarize
    EventCount = count(),
    DistinctLabels = dcount(SensitivityLabelId),
    DistinctFiles = dcount(ObjectId),
    ActionTypes = make_set(ActionType),
    Workloads = make_set(Workload),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp)
    by AccountUpn
| order by EventCount desc
| take 30

Query 13: Label Downgrade & Change Tracking

Purpose: Find all events where a sensitivity label was downgraded, removed, or changed. Critical for detecting policy circumvention or insider risk.

// Query 13: Label downgrade/removal events — detect label circumvention
// Prefix with LabelMapping datatable to resolve both current and previous label GUIDs
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(PreviousSensitivityLabelId)
| extend CurrentLabelId = SensitivityLabelId
| extend PrevLabelId = PreviousSensitivityLabelId
| lookup kind=leftouter (LabelMapping | project LabelId, CurrentLabelName=LabelName, CurrentParent=LabelParent) on $left.CurrentLabelId == $right.LabelId
| lookup kind=leftouter (LabelMapping | project LabelId, PrevLabelName=LabelName, PrevParent=LabelParent) on $left.PrevLabelId == $right.LabelId
| extend CurrentDisplay = iff(isempty(CurrentLabelName), iff(isempty(CurrentLabelId), "[Removed]", strcat("[Unknown] ", CurrentLabelId)),
    iff(isempty(CurrentParent), CurrentLabelName, strcat(CurrentParent, " / ", CurrentLabelName)))
| extend PrevDisplay = iff(isempty(PrevLabelName), strcat("[Unknown] ", PrevLabelId),
    iff(isempty(PrevParent), PrevLabelName, strcat(PrevParent, " / ", PrevLabelName)))
| project Timestamp, ActionType, AccountUpn, ObjectId,
    PreviousLabel = PrevDisplay, CurrentLabel = CurrentDisplay, Workload
| order by Timestamp desc

Key ActionTypes in label change events:

  • Label downgraded on a file: label lowered (e.g., Highly Confidential → Confidential)
  • Label removed from a file: label stripped entirely
  • Label on file downgraded or removed, then file accessed by Copilot: label reduced AND Copilot accessed the now-less-protected file

Query 14: Copilot Label Exposure — Labeled Resources Accessed by Copilot

Purpose: Identify which sensitivity-labeled documents Copilot accessed during risky AI interactions. This surfaces data exposure risk where Copilot may be surfacing Highly Confidential content.

// Query 14: Copilot label exposure — what labeled docs did Copilot access in risky interactions?
// Prefix with LabelMapping datatable
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(RiskyAIUsageSensitivityLabelsInfo)
// Keep rows with no "null" token, or rows that still carry at least one SensitivityLabelId
| where tostring(RiskyAIUsageSensitivityLabelsInfo) !has "null"
    or tostring(RiskyAIUsageSensitivityLabelsInfo) has "SensitivityLabelId"
| mv-expand LabelEntry = parse_json(tostring(RiskyAIUsageSensitivityLabelsInfo))
| extend LabelJson = parse_json(tostring(LabelEntry))
| extend SubEntityName = tostring(LabelJson.SubEntityName)
| extend LabelId = tostring(LabelJson.SensitivityLabelId)
| where isnotempty(LabelId)
| lookup kind=leftouter LabelMapping on LabelId
| extend LabelDisplay = iff(isempty(LabelName) or LabelName == "END_MARKER",
    strcat("[Unknown] ", LabelId),
    iff(isempty(LabelParent), LabelName, strcat(LabelParent, " / ", LabelName)))
| summarize
    EventCount = count(),
    DistinctUsers = dcount(AccountUpn),
    SubEntities = make_set(SubEntityName),
    ActionTypes = make_set(ActionType)
    by LabelDisplay, LabelId
| order by EventCount desc

SubEntityName values:

  • ResponseAccessedResource — a labeled document that Copilot cited in its response
  • Response — the Copilot response itself that was flagged

Query 15: Label-Only Events — Events Triggered Purely by Label (No SIT Content Match)

Purpose: Find events where the trigger was the sensitivity label alone, not SIT content detection. These represent label-aware DLP/IRM policy matches.

// Query 15: Label-only events — triggered by label, not SIT content
// Prefix with LabelMapping datatable
DataSecurityEvents
| where Timestamp > ago(30d)
| where isnotempty(SensitivityLabelId)
| where isempty(SensitiveInfoTypeInfo) or SensitiveInfoTypeInfo == "[]"
| extend LabelIds = split(SensitivityLabelId, ",")
| mv-expand LabelIdRaw = LabelIds
| extend LabelId = tostring(trim(" ", tostring(LabelIdRaw)))
| where isnotempty(LabelId)
| lookup kind=leftouter LabelMapping on LabelId
| extend LabelDisplay = iff(isempty(LabelName) or LabelName == "END_MARKER",
    strcat("[Unknown] ", LabelId),
    iff(isempty(LabelParent), LabelName, strcat(LabelParent, " / ", LabelName)))
| summarize
    EventCount = count(),
    DistinctUsers = dcount(AccountUpn),
    DistinctFiles = dcount(ObjectId),
    ActionTypes = make_set(ActionType),
    Workloads = make_set(Workload)
    by LabelDisplay, LabelId
| order by EventCount desc

Why this matters: In label-mature environments, this query can surface significant activity that the SIT-only queries miss entirely. If a document has a "Highly Confidential" label but no SIT content (e.g., a manually labeled strategic document), it appears only here.


Query 16a: Copilot SIT Landscape with Priority Tiers

Purpose: Break down which SITs fire in Copilot interactions, classify by priority tier, and quantify service account vs human split. This is the entry point for Phase 2.5.

Prerequisite: Merge the well-known SIT GUID mapping datatable (above) with any config.json sit_mapping entries before running.

// Query 16a: Copilot SIT Landscape — priority-tiered breakdown with agent/human split
// Prefix with SITMapping datatable
// ── SIT Priority Classification (canonical definition — Queries 16c/16d reference this) ──
let HighPrioritySITs = dynamic([
    "50842eb7-edc8-4019-85dd-5a5c1f2bb085",  // Credit Card Number
    "a44669fe-0d48-453d-a9b1-2cc83f2cba77",  // U.S. SSN
    "0f587d92-eb28-44a9-bd1c-90f2892b47aa",  // Azure DocumentDB Auth Key
    "ce1a126d-186f-4700-8c0c-486157b953fd",  // Azure SQL Connection String
    "0b34bec3-d5d6-4974-b7b0-dcdb5c90c29d",  // Azure IoT Connection String
    "c7bc98e8-551a-4c35-a92d-d2c8cda714a7",  // Azure Storage Account Key
    "095a7e6c-efd8-46d5-af7b-5298d53a49fc"   // Azure Redis Cache Connection String
    // ── ADD credential/HR SIT GUIDs from config.json sit_mapping HERE ──
]);
let MediumPrioritySITs = dynamic([
    // ── ADD project/employee ID SIT GUIDs from config.json sit_mapping HERE ──
]);
// ── Service account regex (update with org-specific patterns) ──
let ServiceAccountPattern = @"^(securitycopilotagentuser-|svc-)";
DataSecurityEvents
| where Timestamp > ago(30d)
| where ActionType has "Copilot"
| where isnotempty(SensitiveInfoTypeInfo)
| mv-expand SIT = parse_json(tostring(SensitiveInfoTypeInfo))
| extend SITJson = parse_json(tostring(SIT))
| extend SITId = tostring(SITJson.SensitiveInfoTypeId)
| extend IsAgent = AccountUpn matches regex ServiceAccountPattern
| summarize
    TotalEvents = count(),
    AgentEvents = countif(IsAgent),
    HumanEvents = countif(not(IsAgent) and isnotempty(AccountUpn)),
    HumanUsers = dcountif(AccountUpn, not(IsAgent) and isnotempty(AccountUpn)),
    PromptEvents = countif(ActionType has "prompt"),
    ResponseEvents = countif(ActionType has "response")
    by SITId
| lookup kind=leftouter SITMapping on $left.SITId == $right.SITId
| extend SITDisplay = iff(isempty(SITName) or SITName == "END_MARKER", strcat("[Unknown] ", SITId), SITName)
| extend PriorityTier = case(
    SITId in (HighPrioritySITs), "🔴 High",
    SITId in (MediumPrioritySITs), "🟡 Medium",
    "🔵 Low")
| project SITDisplay, PriorityTier, TotalEvents, AgentEvents, HumanEvents, HumanUsers, PromptEvents, ResponseEvents, SITId
| order by TotalEvents desc

Post-processing:

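  • Populate the HighPrioritySITs and MediumPrioritySITs arrays with credential, HR, and custom SIT GUIDs from config.json sit_mapping. Any resolved SIT in neither array defaults to Low; unresolved ([Unknown]) GUIDs are the exception and are handled by the last bullet in this list.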
  • If config.json has a sit_priority section (GUID → tier mapping), use it to override the default classification.
  • Calculate Agent % of total — if > 50%, flag prominently in report ("⚠️ N% of Copilot SIT events are from automated service accounts").
  • Unknown SITs ([Unknown]) should be classified as 🔴 High by default (conservative — unknown custom SITs may be high-value EDM/exact data match).
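The post-processing steps above can be sketched in Python, assuming the Query 16a result has been exported as a list of dictionaries. The function name, the shape of the `sit_priority` argument (GUID to tier string), and the reuse of the query's tier labels are assumptions for illustration; the 50% threshold, the unknown-SIT default, and the warning text come from the bullets.

```python
from typing import Dict, List, Optional


def apply_sit_postprocessing(rows: List[Dict], sit_priority: Dict[str, str]) -> Dict:
    """Apply tier overrides and compute the agent share for Query 16a rows.

    rows: dicts with SITId, SITDisplay, PriorityTier, TotalEvents, AgentEvents.
    sit_priority: hypothetical GUID -> tier mapping taken from config.json.
    """
    for row in rows:
        if row["SITId"] in sit_priority:
            # An explicit config.json override wins over the query default.
            row["PriorityTier"] = sit_priority[row["SITId"]]
        elif row["SITDisplay"].startswith("[Unknown]"):
            # Conservative default: unresolved GUIDs may be high-value custom SITs.
            row["PriorityTier"] = "🔴 High"

    # Counts are per SIT match (post mv-expand), so an event that matched
    # several SITs contributes once per SIT on both sides of the ratio.
    total = sum(r["TotalEvents"] for r in rows)
    agent = sum(r["AgentEvents"] for r in rows)
    agent_pct = round(100 * agent / total, 1) if total else 0.0

    warning: Optional[str] = None
    if agent_pct > 50:
        warning = f"⚠️ {agent_pct}% of Copilot SIT events are from automated service accounts"
    return {"rows": rows, "agent_pct": agent_pct, "warning": warning}
```

The override check runs before the unknown-SIT default so that a GUID explicitly tiered in config.json keeps its configured tier even if its name could not be resolved.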

Query 16b: Top Users by High-Priority SIT in Copilot (Excluding Service Accounts)

Purpose: Risk-rank human users whose Copilot interactions triggered high-priority SIT detections. Excludes automated service accounts.

// Query 16b: Top 20 human users by high-priority SIT in Copilot interactions
// Prefix with SITMapping datatable and HighPrioritySITs + ServiceAccountPattern from Query 16a
DataSecurityEvents
| where Timestamp > ago(30d)
| where ActionType has "Copilot"
| where isnotempty(SensitiveInfoTypeInfo)
| where not(AccountUpn matches regex ServiceAccountPattern)
| where isnotempty(AccountUpn)
| mv-expand SIT = parse_json(tostring(SensitiveInfoTypeInfo))
| extend SITJson = parse_json(tostring(SIT))
| extend SITId = tostring(SITJson.SensitiveInfoTypeId)
| where SITId in (HighPrioritySITs)
| lookup kind=leftouter SITMapping on $left.SITId == $right.SITId
| extend SITDisplay = iff(isempty(SITName) or SITName == "END_MARKER", strcat("[Unknown] ", SITId), SITName)
| summarize
    Events = count(),
    DistinctHighSITs = dcount(SITId),
    SITNames = make_set(SITDisplay),
    PromptEvents = countif(ActionType has "prompt"),
    ResponseEvents = countif(ActionType has "response"),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp)
    by AccountUpn
| order by Events desc
| take 20

Query 16c: Daily Copilot SIT Trend by Priority Category

Purpose: Track daily Copilot SIT detection volume by priority tier to distinguish adoption changes from risk spikes.

// Query 16c: Daily Copilot SIT trend by priority category (human users only)
// ── Copy HighPrioritySITs from Query 16a (canonical list: CCN, SSN, Azure credentials + config.json custom) ──
// ── Copy ServiceAccountPattern from Query 16a ──
DataSecurityEvents
| where Timestamp > ago(30d)
| where ActionType has "Copilot"
| where isnotempty(SensitiveInfoTypeInfo)
| where not(AccountUpn matches regex ServiceAccountPattern)
| where isnotempty(AccountUpn)
| mv-expand SIT = parse_json(tostring(SensitiveInfoTypeInfo))
| extend SITJson = parse_json(tostring(SIT))
| extend SITId = tostring(SITJson.SensitiveInfoTypeId)
| extend PriorityTier = iff(SITId in (HighPrioritySITs), "High", "Low")
| summarize Events = count() by Day = bin(Timestamp, 1d), PriorityTier
| order by Day asc, PriorityTier asc

Post-processing: Render as a dual-line chart or table with High vs Low columns per day. Spikes in the High tier warrant investigation; spikes in Low tier alone are typically noise from broad SITs (All Full Names, IP Addresses) and can be noted but not escalated.
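One way to build the High vs Low table is to pivot the Query 16c rows by day. This is a minimal sketch that assumes the rows arrive as `(day, tier, events)` tuples with ISO date strings; the function name and input shape are illustrative.

```python
from typing import Dict, List, Tuple


def pivot_daily_trend(rows: List[Tuple[str, str, int]]) -> List[Dict]:
    """Turn (day, tier, events) rows into one row per day with High/Low columns."""
    table: Dict[str, Dict[str, int]] = {}
    for day, tier, events in rows:
        # Days with activity in only one tier still get a 0 in the other column.
        table.setdefault(day, {"High": 0, "Low": 0})[tier] += events
    return [{"Day": day, **table[day]} for day in sorted(table)]
```

Filling the missing tier with 0 keeps the two series aligned, which matters when a day has Low-tier noise but no High-tier detections.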


Query 16d: Prompt-Only Analysis — Human Users Typing Sensitive Data INTO Copilot

Purpose: The primary risk signal — users who typed sensitive data into Copilot prompts (not just receiving sensitive responses). Excludes service accounts and response-only events.

// Query 16d: Prompt-only human users with high-priority SIT detections
// This is the key risk signal: sensitive data entered BY the user INTO Copilot
// ── Copy HighPrioritySITs and ServiceAccountPattern from Query 16a ──
DataSecurityEvents
| where Timestamp > ago(30d)
| where ActionType has "prompt"  // Prompts only — user-initiated risk
| where isnotempty(SensitiveInfoTypeInfo)
| where not(AccountUpn matches regex ServiceAccountPattern)
| where isnotempty(AccountUpn)
| mv-expand SIT = parse_json(tostring(SensitiveInfoTypeInfo))
| extend SITJson = parse_json(tostring(SIT))
| extend SITId = tostring(SITJson.SensitiveInfoTypeId)
| where SITId in (HighPrioritySITs)
| lookup kind=leftouter SITMapping on $left.SITId == $right.SITId
| extend SITDisplay = iff(isempty(SITName) or SITName == "END_MARKER", strcat("[Unknown] ", SITId), SITName)
| summarize
    PromptEvents = count(),
    DistinctHighSITs = dcount(SITId),
    SITNames = make_set(SITDisplay),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp)
    by AccountUpn
| order by PromptEvents desc
| take 20

Why prompts matter more than responses:

  • "Risky prompt entered in Copilot" = User typed/pasted sensitive data (SSN, credit card, credentials) into a Copilot prompt. This is a behavioral risk — the user chose to share sensitive data with AI.
  • "Sensitive response received in Copilot" = Copilot retrieved and surfaced sensitive data from grounding sources (SharePoint, OneDrive, email). This is an access/configuration risk — overpermissioned data exposure.
  • Prompt events are the stronger signal for insider risk and user coaching interventions.

Report Template

Report Rendering Rules

These rules are MANDATORY for all report output (inline chat and markdown file). Follow strictly.

Rule 1: Risk Rating Scale

When assigning risk levels to users in the file-based user ranking, use this hierarchy:

| Risk Level | Evidence Required |
|------------|-------------------|
| Critical | "Files collected and exfiltrated" ActionType present — confirmed insider risk exfiltration signal OR mass exfiltration pattern: ≥1,000 distinct files to removable media + file deletions within a ≤48-hour window (volume-based escalation even without the IRM-labeled ActionType) |
| High | Exfiltration signals below Critical thresholds (e.g., USB copies < 1,000 files without deletion pattern) OR sustained high DLP alert volume (top 2-3 by events/files) |
| Medium | Broad SIT diversity (10+ SIT types) OR cross-workload activity (3+ workloads) OR external domain WITHOUT explicit exfiltration signal |
| Low | Single-workload, moderate volume, no exfiltration or anomaly signals |

⛔ PROHIBITED: Rating a user with "Files collected and exfiltrated" as Medium or Low. This ActionType is always High or Critical.

⛔ PROHIBITED: Rating a user with ≥1,000 files USB-copied + deletion in ≤48h as anything below Critical.
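The rating hierarchy can be read as a top-down decision ladder. The sketch below is one possible encoding: the `UserEvidence` fields and the `rate_user` name are hypothetical, and it assumes the evidence flags have already been derived from query results. It follows the table in treating the "Files collected and exfiltrated" ActionType as Critical.

```python
from dataclasses import dataclass


@dataclass
class UserEvidence:
    # All field names are illustrative; populate them from the user's query results.
    has_collected_and_exfiltrated: bool = False  # ActionType present for this user
    removable_media_files_48h: int = 0           # distinct files copied within the worst 48h window
    deletions_in_same_window: bool = False       # file deletions inside that same window
    has_exfiltration_signal: bool = False        # any exfiltration signal below Critical thresholds
    top_dlp_alert_volume: bool = False           # among the top 2-3 by sustained DLP alert volume
    distinct_sit_types: int = 0
    workloads: int = 0
    external_domain: bool = False


def rate_user(e: UserEvidence) -> str:
    """Return the first (highest) risk level whose evidence requirement is met."""
    if e.has_collected_and_exfiltrated:
        return "Critical"
    if e.removable_media_files_48h >= 1000 and e.deletions_in_same_window:
        return "Critical"
    if e.has_exfiltration_signal or e.top_dlp_alert_volume:
        return "High"
    if e.distinct_sit_types >= 10 or e.workloads >= 3 or e.external_domain:
        return "Medium"
    return "Low"
```

Checking levels from Critical downward means a user who meets several criteria always receives the highest applicable rating, which is what the two prohibitions require.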

Rule 2: Executive Summary Uses File-Based Metrics Only

The Executive Summary MUST cite file-based (non-Copilot) event counts and file counts for user risk descriptions. Never cite overall metrics that include Copilot volume — this inflates perceived risk.

| Context | Cite | Example |
|---------|------|---------|
| ✅ File-based risk summary | Non-Copilot events, non-Copilot files | "u3087 generated 211 file-based events across 32 files" |
| ❌ Inflated overall metrics | Total events including Copilot | "u1812 — 294 total events including 185 files" |

Rule 3: Top Users Overall Section — Copilot Compression

When Copilot events exceed 80% of total volume:

  • Do NOT render a standalone "Top Users Overall" section dominated by Copilot service accounts
  • Instead, include a brief note: "Top overall users are dominated by Copilot service accounts/heavy Copilot users — see file-based user ranking below for actual data access risk."
  • If any users in the top-10 overall have multi-workload activity (Copilot + file operations), call them out in a single sentence rather than a full table

Rule 4: Copilot Count Reconciliation

When reporting Copilot vs file-based splits, ensure the counts reconcile across sections:

  • Action Type breakdown Copilot total = Workload breakdown Copilot total
  • If they differ (e.g., Connected AI App events counted differently), annotate the delta
  • Show the reconciliation in the Action Type section: "Copilot interactions: N events (Action Types: Risky prompt X + Sensitive response Y + Combined Z = N)"
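The reconciliation check amounts to summing the Copilot Action Type counts and comparing the sum with the Workload breakdown total. A minimal sketch follows; the function name, the input shapes, and the wording of the delta annotation are assumptions, while the line format is the one given in the last bullet.

```python
from typing import Dict, Tuple


def reconcile_copilot_counts(action_type_counts: Dict[str, int],
                             workload_copilot_total: int) -> Tuple[str, int]:
    """Build the reconciliation line and return it with the delta to the Workload total."""
    total = sum(action_type_counts.values())
    parts = " + ".join(f"{name} {count:,}" for name, count in action_type_counts.items())
    line = f"Copilot interactions: {total:,} events (Action Types: {parts} = {total:,})"
    delta = workload_copilot_total - total
    if delta != 0:
        # Annotate rather than hide the mismatch (e.g. Connected AI App events).
        line += f" [delta vs Workload breakdown: {delta:+,}]"
    return line, delta
```

A non-zero delta is reported rather than corrected, since the rule asks for the difference to be annotated, not reconciled away.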

Rule 5: Scope & Limitations Section (Required for Markdown Reports)

Markdown file reports MUST include a Scope & Limitations section immediately after the Executive Summary. Include:

## Scope & Limitations

| Consideration | Detail |
|--------------|--------|
| **Data Source** | DataSecurityEvents (Defender XDR Advanced Hunting) — requires Insider Risk Management opt-in to share data with Defender XDR |
| **Coverage** | SIT detections + sensitivity label events — files with neither SIT content match nor sensitivity label do NOT appear in this data |
| **Label Coverage** | X% of events have sensitivity labels — label analysis depth adjusted accordingly (see Rule 11) |
| **Retention** | 30-day Advanced Hunting retention |
| **ML Classifiers** | N trainable classifier GUIDs could not be resolved — see Unresolved SIT GUIDs section |
| **Copilot Volume** | Copilot events represent X% of total volume and are separated from file-based analysis throughout this report |

Fill in the actual values for N and X% from the query results.

Rule 6: SIT Landscape Table Integrity

  • Each GUID must appear exactly once in the SIT Landscape table — one row per GUID, no exceptions
  • NEVER group multiple GUIDs into a single row with slash-separated values (e.g., 1e883268/d2cdc387/bf6e0b84...). Even "copy" variants of the same SIT that share identical metrics MUST be separate rows
  • After all GUID resolution tiers complete, deduplicate by GUID — if conflicts exist, prefer the most specific resolution (Tier 3 PowerShell > Tier 2 config > Tier 1 embedded)
  • Group the table by category: Custom/Organization SITs, Built-in Microsoft SITs, ML Classifiers (Unresolvable)
  • Do NOT include a GUID under two different names
  • The total distinct SIT count cited in the Executive Summary must equal the number of rows in the SIT Landscape tables (sum of all category sub-tables)
  • Post-render verification (MANDATORY): After building all SIT Landscape sub-tables, count the total rows. If the exec summary cites a different number, update the exec summary to match. Format: "N active SIT types" where N = sum of rows across Custom, Built-in, and Unresolved sub-tables

⛔ PROHIBITED: Bundling GUIDs like "Credit Card Number copy (x3)" with 6,966 ea. — each of the 3 GUIDs must be its own row with its own exact counts.

⛔ PROHIBITED: Exec summary citing a SIT count that doesn't match the actual row count in the SIT Landscape tables.
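The post-render verification can be expressed as a small check over the rendered sub-tables. This sketch assumes each sub-table is a list of `(guid, name)` pairs keyed by category; the function name and return shape are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple


def verify_sit_landscape(sub_tables: Dict[str, List[Tuple[str, str]]],
                         exec_summary_count: int) -> Dict:
    """Count rows across all sub-tables and flag GUIDs that appear more than once."""
    # GUIDs are compared case-insensitively so "AAA" and "aaa" count as one SIT.
    guids = [guid.lower() for rows in sub_tables.values() for guid, _ in rows]
    duplicates = sorted(g for g, n in Counter(guids).items() if n > 1)
    row_count = len(guids)
    return {
        "row_count": row_count,
        "duplicate_guids": duplicates,
        "exec_summary_matches": exec_summary_count == row_count,
        "exec_summary_text": f"{row_count} active SIT types",
    }
```

If `duplicate_guids` is non-empty, resolve the conflict first (Tier 3 over Tier 2 over Tier 1) and re-run the check before updating the executive summary.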

Rule 7: Spike Detection — File-Based Primary, Overall Secondary

When rendering spike alerts:

  • Primary: Always run and display Query 10b (file-based-only spikes) as the main spike alert section. This surfaces actual sensitive data access spikes.
  • Secondary: Run Query 10 (all events) only if the user requests overall spikes or if there are interesting patterns worth noting. Include a clear note that these spikes are predominantly Copilot-driven.
  • If only running one spike query, always prefer Query 10b.
  • In the report, label sections clearly: "File-Based SIT Access Spikes" vs "Overall SIT Access Spikes (incl. Copilot)".

Rule 8: Top Files — Exclude System/Operational Files

The Top Files section must exclude Defender for Endpoint operational file reads:

  • C:\ProgramData\Microsoft\Windows Defender\DLPCache\* — DLP label metadata cache reads
  • *\EBWebView\* — Edge WebView browser cache
  • Any path matching \ProgramData\Microsoft\ that is clearly a system/cache path

If system files appear in results despite the query filter, separate them into a "System/Operational Files" subsection below the main "User-Accessed Files" list.
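Separating system files from user-accessed files can be sketched as a path filter. Only the two explicit patterns from the list above are encoded here; the broader "clearly a system/cache path" judgment for other `\ProgramData\Microsoft\` paths is left to the analyst. The names and the case-insensitive matching are assumptions.

```python
import re
from typing import List, Tuple

# Only the two explicitly listed patterns; extend after manual review.
SYSTEM_PATH_PATTERNS = [
    re.compile(r"\\ProgramData\\Microsoft\\Windows Defender\\DLPCache\\", re.IGNORECASE),
    re.compile(r"\\EBWebView\\", re.IGNORECASE),
]


def split_top_files(paths: List[str]) -> Tuple[List[str], List[str]]:
    """Return (user_accessed, system_operational), preserving input order."""
    user_files, system_files = [], []
    for path in paths:
        is_system = any(rx.search(path) for rx in SYSTEM_PATH_PATTERNS)
        (system_files if is_system else user_files).append(path)
    return user_files, system_files
```

The second list feeds the "System/Operational Files" subsection, so nothing is silently dropped from the report.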

Rule 9: Risk Rating Consistency — Exec Summary Must Match User Table

Every user mentioned in the Executive Summary MUST use the same risk rating as in the File-Based Top Users table. If the table says 🔴 Critical, the exec summary must say Critical (and vice versa).

  • After building the File-Based Top Users table (the source of truth), cross-check every user mention in the exec summary
  • If there is a conflict, the User Table rating wins — update the exec summary to match
  • Never rate a user differently in two sections of the same report

⛔ PROHIBITED: Exec summary says "High" while user table says "Critical" for the same user (or any other mismatch).

Rule 10: Temporal Pattern — Include File-Based Event Column

The Temporal Pattern (daily volume) section MUST include a File Events column alongside the total. Without this, daily spikes appear alarming when they may be entirely Copilot-driven.

Column Required Source
Date bin(Timestamp, 1d)
Daily Events Total count()
File Events countif(Workload !in ("Copilot", "ConnectedAIApp"))
Distinct Users dcount(AccountUpn)
Notable Annotation for spikes

When annotating spikes (🔴), clarify whether the spike is Copilot-driven or file-driven:

  • "🔴 Major spike — file-driven" (if File Events also spike)
  • "🟡 Copilot adoption spike — file activity normal" (if only total spikes but File Events are stable)

Use Query 5 (updated), which returns both columns.

Rule 11: Adaptive Label Analysis Depth

The depth of sensitivity label analysis MUST be determined dynamically based on the Phase 1 Label Coverage Assessment. This prevents wasting queries in SIT-dominant environments while ensuring full coverage in label-mature environments.

| Label Coverage | Report Behavior |
|----------------|-----------------|
| ≥5% labeled events | Full dedicated label sections: Label Landscape table, Label-Based Top Users, Label Changes, Copilot Label Exposure, Label-Only Events. These render as peer sections alongside SIT analysis |
| 1-5% labeled events | Condensed "Sensitivity Labels" section: Label Landscape table + Label Changes only. Other label dimensions mentioned in summary notes |
| <1% labeled events | Brief note in Scope & Limitations: "Sensitivity label data is sparse (<1% of events) — environment is SIT-dominant. N events had labels; see label coverage stats below." No dedicated label sections unless user asks |
| User explicitly asks about labels | Full label sections regardless of coverage percentage |

The Label Coverage Assessment also determines the overall report framing:

  • SIT-dominant (<5% labels): Report title/framing stays "SIT Access Analysis" with label addendum
  • Dual signal (5-50% labels): Report framing becomes "Data Security Analysis (SIT + Labels)"
  • Label-dominant (>50% labels): Report framing becomes "Data Security Analysis" with labels as primary signal and SIT as secondary

⛔ PROHIBITED: Running all 5 label queries (11-15) when coverage is <1% and user didn't ask about labels — this wastes API calls and clutters the report.
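The coverage thresholds map onto a small decision function. In this sketch the function name, the depth labels ("full", "condensed", "note-only"), and the choice to treat exactly 5% as the full tier are assumptions; the percentages and framing titles come from Rule 11.

```python
from typing import Dict


def label_report_plan(labeled_events: int, total_events: int,
                      user_asked_about_labels: bool = False) -> Dict:
    """Pick label analysis depth and report framing from label coverage."""
    pct = 100 * labeled_events / total_events if total_events else 0.0

    # Depth: an explicit user request overrides the coverage tiers.
    if user_asked_about_labels or pct >= 5:
        depth = "full"
    elif pct >= 1:
        depth = "condensed"
    else:
        depth = "note-only"

    # Framing depends on coverage only, not on the user's request.
    if pct > 50:
        framing = "Data Security Analysis"
    elif pct >= 5:
        framing = "Data Security Analysis (SIT + Labels)"
    else:
        framing = "SIT Access Analysis"
    return {"coverage_pct": round(pct, 2), "depth": depth, "framing": framing}
```

Only the "full" depth calls for all five label queries (11-15); at "note-only" none of them should run unless the user asked.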

Rule 12: Label Display Format — Always Show Parent/Child Hierarchy

When rendering sensitivity label names, ALWAYS show the parent-child hierarchy using "/" notation:

| Raw Label | Correct Display | Incorrect Display |
|-----------|-----------------|-------------------|
| Sub-label under Highly Confidential | "Highly Confidential / All Employees" | "All Employees" |
| Sub-label under Confidential | "Confidential / All employees" | "All employees" |
| Top-level label | "General" | "General" (correct as-is) |
| Unknown label | "[Unknown] abc12345..." | blank or omitted |

This prevents ambiguity — "All Employees" exists under BOTH Confidential and Highly Confidential as different labels with different GUIDs.

⛔ PROHIBITED: Displaying sub-label names without their parent (e.g., just "Specific people" — which parent?).
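A display helper makes the parent/child rule mechanical. This sketch assumes label metadata is available as a GUID-keyed dictionary with `name` and optional `parent` entries; that structure, the function name, and the placeholder GUIDs in any usage are hypothetical.

```python
from typing import Dict, Optional


def label_display(guid: str, labels: Dict[str, Dict[str, Optional[str]]]) -> str:
    """Render a label as "Parent / Child", a bare top-level name, or an [Unknown] marker."""
    entry = labels.get(guid)
    if entry is None:
        # Never blank or omit an unresolved label; show a shortened GUID instead.
        return f"[Unknown] {guid[:8]}..."
    parent_id = entry.get("parent")
    parent = labels.get(parent_id) if parent_id else None
    if parent:
        return f"{parent['name']} / {entry['name']}"
    return str(entry["name"])
```

Because the lookup is keyed by GUID, two sub-labels that share a name under different parents resolve to different display strings.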

Rule 13: Service Account Separation in Copilot Analysis

All Copilot SIT analysis MUST separate automated service accounts from human users. Service accounts (Security Copilot agents, Purview agents, etc.) can generate 50-70% of Copilot event volume and must not inflate human risk metrics.

| Requirement | Detail |
|-------------|--------|
| Identify service accounts | Filter: AccountUpn matches regex ServiceAccountPattern (defined in Query 16a: @"^(securitycopilotagentuser-\|svc-)"). Check Query 16a results for additional org-specific patterns (any account with >10K events and a GUID-like UPN prefix is likely automated) |
| Report agent volume separately | Include a summary line: "⚠️ N automated service accounts generated X events (Y% of Copilot volume) — excluded from human risk analysis below" |
| Never mix in user rankings | Queries 16b-16d exclude agents by default. If rendering an overall Copilot user table, agents go in a separate subsection |
| Power-user flagging | After agent exclusion, if any single human user accounts for >20% of remaining Copilot events, flag them: "⚠️ Power user — may indicate automated workflow via Copilot" |

⛔ PROHIBITED: Including automated service accounts in human user risk rankings for Copilot SIT analysis.

⛔ PROHIBITED: Reporting raw Copilot event counts as "user interactions" without noting the ~2-3x row multiplication factor.
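The separation and power-user checks can be sketched over per-account event counts. The function name and return shape are assumptions; the pattern mirrors the ServiceAccountPattern regex from Query 16a and is case-sensitive, as written there.

```python
import re
from typing import Dict

# Mirrors ServiceAccountPattern from Query 16a; extend with org-specific prefixes.
SERVICE_ACCOUNT_PATTERN = re.compile(r"^(securitycopilotagentuser-|svc-)")


def split_agents(events_by_upn: Dict[str, int]) -> Dict:
    """Separate service accounts from humans and flag human power users (>20%)."""
    agents = {u: n for u, n in events_by_upn.items() if SERVICE_ACCOUNT_PATTERN.search(u)}
    humans = {u: n for u, n in events_by_upn.items() if u not in agents}
    total = sum(events_by_upn.values())
    agent_events = sum(agents.values())
    human_events = sum(humans.values())
    return {
        "agent_accounts": len(agents),
        "agent_events": agent_events,
        "agent_pct": round(100 * agent_events / total, 1) if total else 0.0,
        "humans": humans,
        # The share is computed against human events only, i.e. after agent exclusion.
        "power_users": sorted(u for u, n in humans.items()
                              if human_events and n / human_events > 0.20),
    }
```

The returned counts fill the N, X, and Y placeholders of the agent summary line, and `power_users` lists the accounts that need the power-user flag.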

Rule 14: Copilot Event Row Multiplication Awareness

Each Copilot interaction generates multiple DSE rows (average ~2-3x, up to 35x for complex exchanges). Purview creates separate rows for:

  • The prompt (if it contains a SIT match) — "Risky prompt entered in Copilot"
  • The response (if grounding data contains SIT matches) — "Sensitive response received in Copilot"
  • Compound events — agent interactions, SharePoint access, multiple simultaneous conditions

| Reporting Requirement | Format |
|-----------------------|--------|
| Raw event counts | "N Copilot SIT events (raw DSE rows)" |
| Estimated interactions | "~N/2.5 estimated real interactions" (use the ratio from the environment if calculated) |
| User daily rates | If citing per-user daily rates, note whether raw or estimated: "~X interactions/day (estimated from Y raw events)" |

This rule prevents stakeholders from seeing "270K Copilot events" and panicking when the actual interaction count is ~95K.
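The estimate is a single division, sketched here so the raw and estimated figures are always produced together. The function name and the 2.5 default (the midpoint of the ~2-3x range) are illustrative; pass the environment's measured ratio when one has been calculated.

```python
from typing import Dict


def estimate_interactions(raw_events: int, rows_per_interaction: float = 2.5) -> Dict[str, str]:
    """Format raw DSE row counts alongside the estimated number of real interactions."""
    estimated = round(raw_events / rows_per_interaction)
    return {
        "raw": f"{raw_events:,} Copilot SIT events (raw DSE rows)",
        "estimated": f"~{estimated:,} estimated real interactions",
    }
```

With the default ratio, 270,000 raw rows give about 108,000 interactions. The ~95K figure quoted above corresponds to a ratio near 2.84 (270,000 / 95,000), which is still inside the ~2-3x range.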

Rule 15: Copilot SIT Report Section (Phase 2.5 Output)

When Phase 2.5 is triggered (Copilot >30% of events), render a dedicated "Copilot SIT Exposure" section with:

  1. Service Account vs Human Split — "X events from N service accounts (Y%), Z events from N human users (Y%)" with the agent exclusion note
  2. SIT Priority Landscape — Table from Query 16a showing SITs by priority tier (High/Medium/Low) with human-only metrics
  3. High-Priority SIT Users — Top 10 human users from Query 16b with prompt vs response breakdown
  4. Prompt Risk Signal — Top 10 users from Query 16d who typed high-priority SITs into Copilot prompts — this is the primary actionable finding
  5. Temporal Trend — Daily trend from Query 16c (optional — include if spikes are interesting)

If Copilot events are <30% of total, skip Phase 2.5 entirely and note in the Copilot section: "Copilot events represent X% of total volume — below threshold for dedicated analysis. See Action Type breakdown for Copilot summary."


Inline Chat Format

## 📊 Data Security Events Analysis
**Scope:** <Tenant-wide / SIT: <name> / User: <UPN>>
**Time Range:** <start> to <end>
**Total Events:** <N> | **Distinct Users:** <N> | **Distinct Files:** <N>

### SIT Landscape
| # | SIT Name | GUID (short) | Events | Users | Files | Classifier |
|---|----------|-------------|--------|-------|-------|------------|
| 1 | Credit Card Number | 50842eb7... | 7,255 | 46 | 346 | Content |
| 2 | All Full Names | 50b8b56b... | 128,957 | 1,475 | 119 | Content |
| ... | | | | | | |

### Action Type Breakdown
| Action Type | Events | Users | Files |
|-------------|--------|-------|-------|
| Sensitive response received in Copilot | 228,564 | ... | ... |
| Risky prompt entered in Copilot | 390,905 | ... | ... |
| ... | | | |

### 🔴 Top Users by SIT Volume (Risk-Ranked)
| # | User | Total Events | Distinct SITs | Distinct Files | Last Active |
|---|------|-------------|---------------|----------------|-------------|
| 1 | user@domain.com | 12,345 | 8 | 42 | 2026-03-16 |
| ... | | | | | |

### 🏷️ Sensitivity Label Landscape (if ≥1% coverage)
| # | Label | GUID (short) | Events | Users | Files |
|---|-------|-------------|--------|-------|-------|
| 1 | Highly Confidential / All Employees | defa4170...000a | 62 | 9 | 31 |
| 2 | General | defa4170...0002 | 55 | 12 | 32 |
| ... | | | | | |

### ⚠️ Label Changes (if any PreviousSensitivityLabelId events exist)
| Timestamp | User | File | Previous Label | Current Label | Action |
|-----------|------|------|---------------|--------------|--------|
| 2026-03-11 | user@domain.com | doc.docx | HC / Project X | Confidential / All employees | Label downgraded |
| ... | | | | | |

### 🤖 Copilot SIT Exposure (if Copilot >30% of events)
**Service Accounts:** N agents generated X events (Y%) — excluded from analysis below
**Row Multiplication:** ~2-3 DSE rows per real interaction (est. ~Z real interactions)

| # | SIT Name | Priority | Human Events | Human Users | Prompts | Responses |
|---|----------|----------|-------------|-------------|---------|----------|
| 1 | Credit Card Number | 🔴 High | 1,234 | 89 | 456 | 778 |
| 2 | All Full Names | 🔵 Low | 45,678 | 1,200 | 12,345 | 33,333 |
| ... | | | | | | |

**🔴 Prompt Risk — Users Typing High-Priority SITs Into Copilot:**
| # | User | Prompt Events | SIT Types | Last Active |
|---|------|-------------|-----------|-------------|
| 1 | user@domain.com | 142 | SSN, Credit Card | 2026-03-16 |
| ... | | | | |

### ⚠️ SIT Access Spike Alerts
| User | Baseline (daily avg) | Recent (daily avg) | Spike Ratio | Status |
|------|---------------------|-------------------|-------------|--------|
| user@domain.com | 5.2 | 47.1 | 9.06x | 🔴 Spike |
| ... | | | | |

### Unresolved SIT GUIDs
<List of GUIDs not in mapping — offer PowerShell resolution>

Markdown File Format

Same structure as inline, wrapped in proper markdown with:

  • Report metadata header (generated timestamp, scope, tool versions)
  • Scope & Limitations section immediately after Executive Summary (see Rule 5 above)
  • Each section as H2/H3
  • File-based user ranking as the primary risk section (NOT the Copilot-dominated overall ranking)
  • DLP Policy Match section with DefaultRule explanation for empty PolicyName entries
  • Sensitivity Label sections (if coverage ≥1% or user requested): Label Landscape, Label-Based Top Users, Label Changes, Copilot Label Exposure
  • Code fences for any raw data samples
  • Save to: reports/data-security/DataSecurity_Analysis_<scope>_YYYYMMDD_HHMMSS.md
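The save path can be built from the scope and a timestamp. In this sketch the function name and the scope sanitization (replacing characters outside letters, digits, underscore, and hyphen) are assumptions; the directory and file name pattern are the ones listed above.

```python
import re
from datetime import datetime


def report_path(scope: str, now: datetime) -> str:
    """Build reports/data-security/DataSecurity_Analysis_<scope>_YYYYMMDD_HHMMSS.md."""
    # Assumed sanitization: keep the scope filesystem-safe (e.g. UPNs contain "@").
    safe_scope = re.sub(r"[^A-Za-z0-9_-]+", "_", scope).strip("_")
    return f"reports/data-security/DataSecurity_Analysis_{safe_scope}_{now:%Y%m%d_%H%M%S}.md"
```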

Known Pitfalls

Pitfall Detail Mitigation
SensitiveInfoTypeInfo is Collection(String), not dynamic Each element is a JSON string requiring double-parse: parse_json(tostring(SensitiveInfoTypeInfo)) to expand array, then parse_json(tostring(element)) to access fields Always double-parse. Direct dot-notation fails silently
Massive row counts 600k+ rows in 30 days for mid-size tenants; millions for 100k+ user orgs ALWAYS summarize first. NEVER retrieve raw rows without take limit
mv-expand is expensive Expanding SensitiveInfoTypeInfo across 600k rows without pre-filtering is extremely slow Pre-filter with where SensitiveInfoTypeInfo has "<GUID>" before mv-expand
Copilot dominates event volume 90%+ of events can be Copilot-related. See Rules 2-3 for report handling and Phase 2.5 for dedicated Copilot analysis Filter to specific ActionType values for file access. Use Workload !in ("Copilot", "ConnectedAIApp") for file-only analysis
Trainable classifiers (MLModel) don't resolve GUIDs with ClassifierType: "MLModel" may not exist in Get-DlpSensitiveInformationType Display as [ML Classifier] <GUID> and note in report
SIT GUID is per-SIT, not per-detection Multiple documents can match the same SIT GUID — the GUID identifies the SIT type, not the specific match Use Count and Confidence fields from SITJson for detection-level detail
ObjectId can be empty Copilot interaction events may not have an ObjectId (no specific file) Filter isnotempty(ObjectId) for file-specific analysis
IRM opt-in required DataSecurityEvents is populated by Insider Risk Management. No opt-in = empty table Check for 0 results and explain the prerequisite
Table schema evolves DataSecurityEvents is in Preview — column names and availability may change Run getschema if queries fail with column resolution errors
DlpPolicyMatchInfo is sparse Only ~0.5% of rows have DLP policy match data (the rest are IRM-only detections) Don't assume all SIT events have DLP data; they're independent signals
Duplicate GUID in SIT mapping One GUID can only resolve to one SIT name. If a GUID appears in both the embedded datatable and a Tier 2/3 resolution with a different name, the result will have duplicate rows with conflicting names. This can happen when a built-in SIT GUID overlaps with a custom SIT copy, or when PowerShell returns a different display name than the embedded mapping After resolving all GUIDs, deduplicate by GUID before rendering the SIT Landscape table. If a GUID maps to multiple names, prefer the Tier 3 (PowerShell) name over Tier 1 (embedded). Never show the same GUID on two rows with different names
Empty PolicyName = DefaultRule ("Always audit") DLP alerts with empty/null PolicyName are typically generated by the built-in DefaultRule that fires when "Always audit file activity for devices" is enabled (ON by default). These are NOT orphaned or misconfigured policies — they are the expected result of the default audit setting In the DLP Policy Match section, explain: "Events with empty PolicyName are generated by the built-in DefaultRule, which audits all monitored file types (Office, PDF, CSV) on onboarded devices when 'Always audit file activity for devices' is enabled (default: ON). No explicit DLP policy is required for these events."
Compound ActionType values Some events have ActionType values that combine multiple labels (e.g., "Risky prompt entered in Copilot, Sensitive response received in Copilot" or "Sensitive info shared on Teams, DLP Rule Matched"). These are literal string values from the data, not aggregation artifacts Display compound ActionTypes exactly as they appear in the data. Do NOT split or merge them — they represent events where multiple conditions were met simultaneously
System/operational files dominate Top Files Files under C:\ProgramData\Microsoft\Windows Defender\DLPCache\RMSLabels\ and *\EBWebView\* are Defender for Endpoint reading sensitivity label metadata — NOT user-initiated file access. These can dominate 90%+ of the Top Files list Query 4 filters these paths. If they still appear, separate into a "System/Operational Files" subsection. Never present DLPCache reads as user data access risk
Localized SIT names in CloudAppEvents CloudAppEvents DLPRuleMatch events include SIT names, but names appear in the user's locale (e.g., "የዱቤ ካርድ ቁጥር" instead of "Credit Card Number" for Amharic users). Same GUID can map to different name strings depending on locale Always aggregate by SIT GUID, never by SIT name. Use the GUID-to-name mapping (Tier 1/2/3) for canonical English names. This applies when cross-referencing CloudAppEvents with DataSecurityEvents
Browsing events are not files — separate from Top Files ActionTypes like "Generative AI websites browsed" and "Gambling websites browsed" reference URLs, not files. They have no ObjectId file path — only a URL domain. Including them under "Top Files" is misleading Render browsing/URL events in a separate subsection (e.g., "External / AI Service Access") below Top Files. Never mix URL-based events into the file ranking tables. Title the Top Files section accurately ("Top Files" not "Top Files & URLs")
Temporal spike annotations must reference known users When annotating daily spikes in the Temporal Pattern table (e.g., "🔴 Major spike — u625 SharePoint batch"), the attributed user MUST appear elsewhere in the report — in the File-Based Top Users table, a drill-down section, or at minimum a footnote. Referencing a user that exists nowhere else in the report violates the evidence-based analysis rule and creates an unverifiable claim Before annotating a spike with a user attribution, verify the user appears in the Top Users ranking. If they don't make the top-10 but are the spike driver, either: (a) add them to the user table with a note "included due to temporal spike attribution", or (b) use a generic annotation ("🔴 Major spike — file-driven") without naming the user
SensitivityLabelId can contain comma-separated GUIDs When an event involves multiple sub-entities (e.g., Copilot accessing multiple labeled resources), the SensitivityLabelId column contains comma-separated GUIDs like aaaa-...,aaaa-...,bbbb-.... This is NOT a single GUID Always split(SensitivityLabelId, ",") then mv-expand to handle multi-GUID values. Querying with == "<GUID>" will miss events; use has "<GUID>" for filtering or split() + mv-expand for enumeration
RiskyAIUsageSensitivityLabelsInfo is mostly [null] The column is populated on 90%+ of Copilot events, but almost all contain [null] (a JSON array with a literal null element). Only events where Copilot actually accessed labeled resources have real data Filter with tostring(RiskyAIUsageSensitivityLabelsInfo) !has "null" to exclude the null-dominated rows. The isnotempty() check alone is insufficient — [null] passes it
Label GUIDs have no ClassifierType field Unlike SIT GUIDs which have ClassifierType: "Content" or "MLModel", label GUIDs have no built-in type indicator. Resolution requires external lookup (datatable, config, or PowerShell Get-Label) Use the Label GUID Mapping Strategy (3 tiers). Unknown labels display as [Unknown] <GUID>
Default label GUIDs are deterministic but not officially documented All 12 default labels (including parents) use defa4170-0d19-0005-XXXX-bc88714345d2 with sequential suffixes 0000000b (confirmed via Get-Label on default-configuration tenants). However, Microsoft does not publish these GUIDs in official docs. Older/customized tenants may have: (a) renamed default labels (e.g., "Non-business" instead of "Personal"), (b) replaced parent label GUIDs with random tenant-specific ones, or (c) added custom sub-labels that break the sequential pattern The embedded datatable includes all 12 confirmed default GUIDs. For tenants with custom parent GUIDs, add them via config.json label_mapping. If Get-Label returns different names for a defa4170 GUID, prefer the Get-Label name for that tenant
PreviousSensitivityLabelId can equal SensitivityLabelId On "Label removed from a file" events, both fields may contain the same GUID (the label that was removed). The current label after removal may be empty Check ActionType to distinguish: "Label downgraded" = actual label change; "Label removed" = label stripped (current may be empty); "Label on file downgraded or removed, then file accessed by Copilot" = compound event with Copilot exposure
Label-only events require label-aware DLP/IRM policies Events with labels but no SIT content (Query 15) only appear if a DLP or IRM policy explicitly targets sensitivity labels. Environments with no label-based policies will see zero label-only events even if documents are labeled If Query 15 returns 0 results AND labels exist (Query 11 > 0), this indicates no label-based DLP/IRM policies are configured — mention this as a gap in the report
Copilot service accounts inflate event counts Automated service accounts can generate 50-70% of all Copilot events. See Rule 13 for full requirements Filter: where not(AccountUpn matches regex ServiceAccountPattern) (defined in Query 16a). Run Query 16a to quantify agent vs human split
Copilot row multiplication (~2-3x per interaction) Each interaction generates ~2-3 DSE rows (prompt + response + compound events), up to 35 for complex exchanges. See Rule 14 for reporting requirements Estimate real interactions as raw_events / 2.5. Always include multiplier context when reporting Copilot volumes
Correct field name is SensitiveInfoTypeId (not SensitiveType) Inside the SensitiveInfoTypeInfo JSON, the SIT GUID field is named SensitiveInfoTypeId. LLMs frequently generate SensitiveType or SensitiveTypeId — both are wrong and return null/empty when accessed Always use tostring(SITJson.SensitiveInfoTypeId). Other valid fields: Confidence (int), ClassifierType (string), Count (int), SubEntityName (string — "Prompt" or "Response" in Copilot events)
SubEntityName distinguishes Prompt vs Response in Copilot SIT events Inside SensitiveInfoTypeInfo JSON, SubEntityName contains "Prompt" or "Response" — indicating whether the SIT was detected in the user's prompt or Copilot's response. This is more reliable than relying on ActionType text matching for prompt/response classification within individual SIT matches Use tostring(SITJson.SubEntityName) when doing per-SIT prompt/response breakdowns. ActionType classifies the event-level action; SubEntityName classifies the per-SIT-match context
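
The row-multiplication and SubEntityName pitfalls above can be illustrated with a minimal Python sketch. It assumes the per-match SIT records have already been exported as dictionaries carrying the documented field names (SensitiveInfoTypeId, Confidence, Count, SubEntityName); the sample records, GUID placeholders, and function names are hypothetical and only serve the illustration.

```python
from collections import Counter


def estimate_interactions(raw_events: int, rows_per_interaction: float = 2.5) -> int:
    """Approximate real Copilot interactions from raw DataSecurityEvents rows.

    Uses the raw_events / 2.5 rule of thumb from the pitfalls table.
    """
    return round(raw_events / rows_per_interaction)


def sit_context_breakdown(sit_matches: list[dict]) -> Counter:
    """Count SIT matches by SubEntityName ("Prompt" or "Response")."""
    return Counter(str(m.get("SubEntityName", "")) for m in sit_matches)


# Hypothetical per-match records; the GUID values are placeholders, not real SIT IDs.
sample_matches = [
    {"SensitiveInfoTypeId": "sit-guid-1", "Confidence": 85, "Count": 2, "SubEntityName": "Prompt"},
    {"SensitiveInfoTypeId": "sit-guid-1", "Confidence": 85, "Count": 1, "SubEntityName": "Response"},
    {"SensitiveInfoTypeId": "sit-guid-2", "Confidence": 75, "Count": 4, "SubEntityName": "Response"},
]

estimated = estimate_interactions(1000)
breakdown = sit_context_breakdown(sample_matches)
```

When reporting, the estimate should be presented alongside the raw row count so the multiplier context stays visible.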

Error Handling

| Error | Cause | Resolution |
|-------|-------|------------|
| Failed to resolve table 'DataSecurityEvents' | Table not available: IRM not opted in, or not connected to Defender XDR | Inform user: "DataSecurityEvents requires Microsoft Purview Insider Risk Management opt-in to share data with Defender XDR." |
| 0 results for SensitiveInfoTypeInfo queries | No SIT detections in timeframe, or SIT detection not enabled in IRM policies | Widen time range; check if IRM policies include SIT detection |
| Failed to resolve column 'ObjectName' | Schema changed or column renamed | Use ObjectId instead (confirmed available). Run getschema to verify current schema |
| PowerShell Get-DlpSensitiveInformationType fails | Not connected to IPPS session | Run Connect-IPPSSession -UserPrincipalName <UPN> first |
| The term 'Get-DlpSensitiveInformationType' is not recognized | Module not installed or IPPS session in different terminal | Install-Module ExchangeOnlineManagement then Connect-IPPSSession in the same terminal session |
| The term 'Get-Label' is not recognized | Not connected to IPPS session or module not loaded | Same as above: Connect-IPPSSession provides both Get-DlpSensitiveInformationType and Get-Label |
| Label GUIDs all show as [Unknown] | Default label datatable doesn't match target tenant (parent GUIDs vary) | Resolve via Get-Label and persist to config.json label_mapping |

File Access Action Types Reference

When the user specifically asks about who opened/accessed/downloaded documents, filter to these ActionTypes:

| ActionType | Meaning |
|------------|---------|
| Sensitive File read | File opened on endpoint (Defender for Endpoint) |
| File accessed on SPO | File opened in SharePoint Online / OneDrive |
| File downloaded from SharePoint | File downloaded from SPO/OneDrive |
| File copied to Removable media | File copied to USB/removable storage |
| File upload to cloud | File uploaded to cloud storage |
| Sensitive file created | New file created with sensitive content |
| File Archived | File moved to archive |
| Text copied to clipboard from sensitive file | Clipboard copy from sensitive doc |

Copilot-related ActionTypes (separate category — AI interaction, not direct file access):

| ActionType | Meaning |
|------------|---------|
| Sensitive response received in Copilot | Copilot surfaced content matching a SIT |
| Risky prompt entered in Copilot | User prompt triggered risk detection |
| Sensitive response received in Copilot;Agent generating sensitive responses | Copilot agent generated response containing SIT matches |
| Risky prompt entered in Copilot;Sensitive response received in Copilot | Both prompt and response contained SIT matches |
| Risky prompt entered in Copilot;Sensitive response received in Copilot;Exposing agent to risky prompts;Agent generating sensitive responses | Compound agent event: prompt + response + agent risk |
| Risky prompt entered in Copilot;Exposing agent to risky prompts | User exposed a Copilot agent to a risky prompt |

Label-related ActionTypes (sensitivity label events):

| ActionType | Meaning |
|------------|---------|
| Label downgraded on a file | Sensitivity label lowered (e.g., HC → Confidential) |
| Label removed from a file | Sensitivity label stripped entirely |
| Label on file downgraded or removed, then file accessed by Copilot | Label reduced AND Copilot subsequently accessed the file |

DLP ActionTypes:

| ActionType | Meaning |
|------------|---------|
| Generated High severity DLP alerts | DLP policy triggered a high-severity alert |
| DLP Rule Matched | DLP rule matched (may be combined with other types) |

SVG Dashboard Generation

After generating a Data Security Events analysis report (markdown file output), an SVG dashboard can be created using the shared SVG rendering skill.

Trigger: User asks "generate an SVG dashboard from the report" or "visualize this report"

Workflow:

  1. Read this skill's svg-widgets.yaml (widget manifest — defines layout, colors, field mapping)
  2. Read .github/skills/svg-dashboard/SKILL.md (rendering rules — component library, quality standards)
  3. Extract data from the completed report using data_sources.field_mapping_notes
  4. Render SVG → save as {report_basename}_dashboard.svg in the same directory

Layout: 5 rows — title banner, KPI cards (events/users/files/file%/copilot%/label%), top SITs bar chart + workload donut, risk-ranked users table + file action bars, assessment banner + recommendations.

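Generates an email threat protection posture report from Microsoft Defender for Office 365 Advanced Hunting data. Covers mail flow, threat composition, phishing protection, authentication, ZAP remediation, Safe Links, and attachment analysis, providing C-level security visibility.
email threat report, email security posture, phishing report, MDO report, Defender for Office 365 report, ZAP effectiveness, Safe Links report, DMARC report, spam report, email volume report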
.github/skills/email-threat-posture/SKILL.md
npx skills add SCStelz/security-investigator --skill email-threat-posture -g -y
SKILL.md
Frontmatter
{
    "name": "email-threat-posture",
    "description": "Generate email threat protection reports and assess email security posture. Triggers on keywords like \"email threat report\", \"email security posture\", \"phishing report\", \"MDO report\", \"Defender for Office 365 report\", \"ZAP effectiveness\", \"Safe Links report\", \"DMARC report\", \"spam report\", \"email volume report\". Queries EmailEvents, EmailPostDeliveryEvents, UrlClickEvents, and EmailAttachmentInfo in Advanced Hunting for a posture assessment covering inbound mail flow, threat composition, phishing detection, email authentication (DMARC\/DKIM\/SPF), post-delivery remediation (ZAP), Safe Links click protection, attachment analysis, detection method effectiveness, and delivery disposition. Supports inline chat, markdown file, and SVG dashboard output.",
    "drill_down_prompt": "Run email threat posture report — phishing trends, delivery gaps, protection effectiveness",
    "threat_pulse_domains": [
        "email"
    ]
}

Email Threat Protection Posture — Instructions

Purpose

This skill generates an Email Threat Protection Posture Report using Microsoft Defender for Office 365 (MDO) telemetry available through Advanced Hunting. It provides C-level visibility into how effectively the organization's email security stack is detecting, blocking, and remediating email-based threats.

What this skill covers:

| Domain | Key Questions Answered |
|--------|------------------------|
| 📬 Mail Flow Overview | How many inbound emails? What's the daily trend? Who are the top senders? |
| 🛡️ Threat Composition | How many phishing, spam, and malware threats were detected? |
| 🎯 Phishing Protection | How many phishing emails were blocked vs delivered? Who are the most targeted users? |
| 🔐 Email Authentication | What are the DMARC/DKIM/SPF/CompAuth pass rates? Which domains fail authentication? |
| 🧹 Post-Delivery Remediation | How effective is ZAP? How many remediations succeeded vs failed? |
| 🔗 Safe Links Protection | How many URL clicks were scanned? Were any phishing clicks allowed through? |
| 📎 Attachment Analysis | What attachment types are flowing through email? Were any malicious? |
| 📊 Detection Methods | What detection technologies are catching threats (URL detonation, fingerprinting, etc.)? |
| 📦 Delivery Disposition | Where do emails end up: inbox, junk, quarantine, blocked? |
| 🚨 MDO Incidents | How many security incidents were generated by Defender for Office? What severity, status, and types? |

Data sources: EmailEvents, EmailPostDeliveryEvents, UrlClickEvents, EmailAttachmentInfo, SecurityAlert, SecurityIncident (Advanced Hunting)

References:

🔴 URL Registry — Canonical Links for Report Generation

MANDATORY: When generating reports, copy URLs verbatim from this registry. NEVER construct, guess, or paraphrase a URL.

| Label | Canonical URL |
|-------|---------------|
| DOCS_EMAILEVENTS | https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-emailevents-table |
| DOCS_EMAILPOSTDELIVERY | https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-emailpostdeliveryevents-table |
| DOCS_URLCLICKEVENTS | https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-urlclickevents-table |
| DOCS_EMAILATTACHMENTINFO | https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-emailattachmentinfo-table |
| DOCS_MDO_EFFICACY | https://learn.microsoft.com/en-us/defender-office-365/reports-mdo-email-collaboration-dashboard#appendix-advanced-hunting-efficacy-query-in-defender-for-office-365-plan-2 |
| DOCS_MDO_OVERVIEW | https://learn.microsoft.com/en-us/defender-office-365/mdo-about |
| DOCS_SECURITY_ALERT | https://learn.microsoft.com/en-us/azure/sentinel/data-connectors/microsoft-sentinel-security-alert |
| DOCS_ZAP | https://learn.microsoft.com/en-us/defender-office-365/zero-hour-auto-purge |
| DOCS_SAFE_LINKS | https://learn.microsoft.com/en-us/defender-office-365/safe-links-about |

📑 TABLE OF CONTENTS

  1. Critical Workflow Rules — Mandatory rules
  2. Email Protection Score Formula — Composite posture scoring
  3. Execution Workflow — Phase-by-phase query plan
  4. Sample KQL Queries — All queries (Q1–Q14)
  5. Output Modes — Inline vs Markdown report
  6. Inline Report Template — Chat-rendered format
  7. Markdown File Report Template — Disk-saved format
  8. Known Pitfalls — Schema quirks and edge cases
  9. Quality Checklist — Pre-delivery validation
  10. SVG Dashboard Generation — Visual dashboard from report

⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

  1. Use RunAdvancedHuntingQuery by default — EmailEvents and related tables are XDR-native tables available in Advanced Hunting. Use Timestamp as the datetime column. If a query fails in AH, fall back to Sentinel Data Lake (query_lake) using TimeGenerated.

  2. Default lookback: 7 days — Unless the user specifies a different period. This provides a meaningful weekly snapshot for executive reporting while staying within AH's 30-day retention.

  3. ASK the user for output format before generating the report:

    • Inline chat summary (quick review in chat)
    • Markdown file report (detailed, archived to reports/email-threat-posture/)
    • Both (markdown + inline summary)
  4. ⛔ MANDATORY: Evidence-based analysis only — Report ONLY what query results show. Use the explicit absence pattern (✅ No [finding] detected) when queries return 0 results. Never fabricate data.

  5. Run queries in parallel batches where possible: queries within a phase are independent of each other. This applies to Phase 1 (Q1–Q4), Phase 2 (Q5–Q8), Phase 3 (Q9–Q12), and Phase 4 (Q13–Q14).

  6. PII handling — Do NOT include recipient email addresses in inline reports or markdown files. Aggregate by domain or use anonymized references (e.g., "2 users in the contoso.com domain"). Top sender domains from external sources are acceptable.

  7. Percentages must be grounded — Always show both the percentage AND the raw count (e.g., "99.8% clean (5,851 of 5,864)").
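
Rule 7 can be applied mechanically with a small formatting helper. The sketch below is illustrative only: the function name and the zero-total fallback string are assumptions, while the output shape follows the example given in the rule.

```python
def grounded_pct(part: int, total: int, label: str) -> str:
    """Render a percentage together with its raw counts.

    The "n/a" fallback for an empty denominator is an assumption, not
    something the skill prescribes.
    """
    if total <= 0:
        return f"n/a {label} (0 of 0)"
    pct = 100.0 * part / total
    return f"{pct:.1f}% {label} ({part:,} of {total:,})"


clean_line = grounded_pct(5851, 5864, "clean")
```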


Email Protection Score Formula

The Email Protection Score is a composite posture indicator summarizing the effectiveness of email security controls. Higher scores indicate stronger protection (inverse of a risk score).

Scoring Dimensions

$$ \text{EmailProtectionScore} = \sum_{i} \text{DimensionScore}_i $$

Each dimension contributes 0–20 points to a maximum of 100:

| Dimension | Max | 🟢 High (16–20) | 🟡 Medium (8–15) | 🔴 Low (0–7) |
|-----------|-----|-----------------|------------------|--------------|
| Threat Block Rate | 20 | ≥95% of threats not in inbox (post-ZAP final state) | 80–94% remediated | <80% remediated (threats remain in inbox) |
| Email Authentication | 20 | SPF+DMARC+DKIM all ≥95% | Any one 80–94% | Any one <80% |
| ZAP Effectiveness | 20 | ≥95% ZAP success rate + 0 failed ZAPs | 80–94% success OR 1–2 failures | <80% success OR ≥3 failures |
| Safe Links Protection | 20 | 0 phishing click-throughs AND active scanning | 1–2 phishing click-throughs | ≥3 phishing click-throughs OR no scanning |
| Phishing Delivery Rate | 20 | 0 phishing emails delivered (post-ZAP) | 1–5 phishing delivered (post-ZAP) | >5 phishing still in mailboxes (post-ZAP) |
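
As one possible reading of the ZAP Effectiveness row, the band could be chosen as sketched below. This is an interpretation, not a rule stated by the skill: it assumes the Low conditions take precedence, that High requires both conditions, and that anything in between (including rates between 94% and 95%) falls into Medium. The table does not say how to pick a point value inside a band, so the sketch only returns the band and its point range.

```python
# Point ranges per band, taken from the table header.
BAND_POINTS = {"High": (16, 20), "Medium": (8, 15), "Low": (0, 7)}


def zap_band(success_rate_pct: float, failed_zaps: int) -> str:
    """Classify ZAP effectiveness into High / Medium / Low (assumed precedence)."""
    if success_rate_pct < 80 or failed_zaps >= 3:
        return "Low"
    if success_rate_pct >= 95 and failed_zaps == 0:
        return "High"
    return "Medium"


band = zap_band(success_rate_pct=97.0, failed_zaps=1)
low, high = BAND_POINTS[band]
```

Here a 97% success rate with one failed ZAP lands in Medium, because the High band also requires zero failures.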

Interpretation Scale

| Score | Rating | Action |
|-------|--------|--------|
| 85–100 | ✅ Strong | Excellent posture: maintain current configurations |
| 65–84 | 🟡 Good | Minor gaps: review flagged dimensions |
| 45–64 | 🟠 Needs Improvement | Multiple weaknesses: prioritize remediation |
| 0–44 | 🔴 Critical | Significant exposure: immediate action required |
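
The composite score and its rating follow directly from the formula and the interpretation scale. A minimal sketch, assuming the five per-dimension scores have already been assigned (the example values and function names are hypothetical):

```python
def email_protection_score(dimension_scores: dict[str, int]) -> int:
    """Sum per-dimension scores, each of which must lie in 0..20."""
    for name, points in dimension_scores.items():
        if not 0 <= points <= 20:
            raise ValueError(f"{name} must be between 0 and 20, got {points}")
    return sum(dimension_scores.values())


def rating(score: int) -> str:
    """Map a composite score to the interpretation scale."""
    if score >= 85:
        return "Strong"
    if score >= 65:
        return "Good"
    if score >= 45:
        return "Needs Improvement"
    return "Critical"


# Hypothetical dimension scores for illustration only.
example = {
    "Threat Block Rate": 18,
    "Email Authentication": 17,
    "ZAP Effectiveness": 20,
    "Safe Links Protection": 12,
    "Phishing Delivery Rate": 14,
}
total = email_protection_score(example)
label = rating(total)
```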

Execution Workflow

Phase 0: Prerequisites

  1. Confirm RunAdvancedHuntingQuery is available (EmailEvents tables are AH-native)
  2. Ask user for output format (inline / markdown / both)
  3. Confirm lookback period (default: 7 days)

Phase 1: Mail Flow & Threat Overview (Q1–Q4)

Run in parallel — no dependencies between queries.

| Query | Purpose |
|-------|---------|
| Q1 | Inbound email summary with threat breakdown |
| Q2 | Email volume trend by day |
| Q3 | Delivery action and location breakdown |
| Q4 | Detection methods breakdown |

Phase 2: Protection Effectiveness (Q5–Q8)

Run in parallel — no dependencies between queries.

| Query | Purpose |
|-------|---------|
| Q5 | Email authentication pass rates (DMARC/DKIM/SPF/CompAuth) |
| Q6 | ZAP and post-delivery remediation summary |
| Q7 | Safe Links click activity summary |
| Q8 | Phishing emails delivered (not blocked) |

Phase 3: Deep Dives & Governance (Q9–Q12)

Run in parallel — no dependencies between queries.

| Query | Purpose |
|-------|---------|
| Q9 | Top phishing sender domains |
| Q10 | Most targeted recipients (aggregated) |
| Q11 | Attachment type distribution |
| Q12 | Post-ZAP threat state (latest delivery location) |

Phase 4: MDO Security Incidents (Q13–Q14)

Run in parallel — no dependencies between queries.

| Query | Purpose |
|-------|---------|
| Q13 | MDO incident summary by severity and status |
| Q14 | MDO incident type breakdown (top alert-driven incidents) |

⚠️ SecurityAlert.Status is IMMUTABLE — always "New" regardless of actual state. These queries use the canonical SecurityAlert→SecurityIncident join to get real Status and Classification from the SecurityIncident table. See copilot-instructions.md Known Table Pitfalls.
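
The reason for the join can be seen in a small Python sketch that mirrors the same steps on in-memory records: filter MDO alerts, keep only the latest record per incident, match incidents to alerts through AlertIds, and read Status and Classification from the incident rather than the alert. The sample records, alert names, and the function name are hypothetical; only the column names and the product filter come from this skill.

```python
MDO_PRODUCT = "Office 365 Advanced Threat Protection"


def mdo_incident_summary(alerts: list[dict], incidents: list[dict]) -> dict:
    """Count distinct incidents by (Severity, Status, Classification)."""
    # MDO alerts only, excluding Communication Compliance (CC_ prefix).
    mdo_alert_ids = {
        a["SystemAlertId"]
        for a in alerts
        if a["ProductName"] == MDO_PRODUCT
        and not a["AlertName"].upper().startswith("CC_")
    }
    # Latest record per incident, like arg_max(TimeGenerated, *) by IncidentNumber.
    latest: dict[int, dict] = {}
    for rec in incidents:
        current = latest.get(rec["IncidentNumber"])
        if current is None or rec["TimeGenerated"] > current["TimeGenerated"]:
            latest[rec["IncidentNumber"]] = rec
    # Inner join on alert IDs, then distinct incident count per group.
    groups: dict[tuple, set] = {}
    for rec in latest.values():
        if mdo_alert_ids & set(rec["AlertIds"]):
            key = (rec["Severity"], rec["Status"], rec["Classification"])
            groups.setdefault(key, set()).add(rec["IncidentNumber"])
    return {key: len(numbers) for key, numbers in groups.items()}


# Hypothetical sample data; ISO timestamps compare correctly as strings.
alerts = [
    {"SystemAlertId": "a1", "ProductName": MDO_PRODUCT, "AlertName": "Hypothetical phishing alert"},
    {"SystemAlertId": "a2", "ProductName": MDO_PRODUCT, "AlertName": "CC_hypothetical_policy"},
]
incidents = [
    {"IncidentNumber": 1, "TimeGenerated": "2025-01-01T10:00:00Z", "AlertIds": ["a1"],
     "Severity": "High", "Status": "New", "Classification": ""},
    {"IncidentNumber": 1, "TimeGenerated": "2025-01-02T09:00:00Z", "AlertIds": ["a1"],
     "Severity": "High", "Status": "Closed", "Classification": "TruePositive"},
    {"IncidentNumber": 2, "TimeGenerated": "2025-01-01T11:00:00Z", "AlertIds": ["a2"],
     "Severity": "Low", "Status": "New", "Classification": ""},
]
summary = mdo_incident_summary(alerts, incidents)
```

Incident 1 is reported as Closed because only its most recent record is kept, and incident 2 drops out because its only alert carries the CC_ prefix.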

Phase 5: Score Computation & Report Generation

  1. Compute per-dimension scores from Phase 1–4 data
  2. Sum dimension scores for composite Email Protection Score
  3. Generate report in requested output mode
  4. Offer SVG dashboard if not already requested
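
The phase plan above could be driven by a small orchestrator that runs each phase's queries concurrently and the phases in sequence. This is a sketch under stated assumptions: run_query is a hypothetical stand-in for whatever actually invokes RunAdvancedHuntingQuery, and running the phases sequentially is a design choice of the sketch, not a requirement of the skill.

```python
from concurrent.futures import ThreadPoolExecutor

# Query IDs per phase, as listed in the Execution Workflow.
PHASES = {
    "Phase 1": ["Q1", "Q2", "Q3", "Q4"],
    "Phase 2": ["Q5", "Q6", "Q7", "Q8"],
    "Phase 3": ["Q9", "Q10", "Q11", "Q12"],
    "Phase 4": ["Q13", "Q14"],
}


def run_query(query_id: str) -> dict:
    """Hypothetical placeholder for a real Advanced Hunting call."""
    return {"query": query_id, "rows": []}


def run_phases(phases: dict = PHASES, runner=run_query) -> dict:
    """Run each phase's queries in parallel; collect results keyed by query ID."""
    results = {}
    for query_ids in phases.values():
        with ThreadPoolExecutor(max_workers=len(query_ids)) as pool:
            # pool.map preserves input order, so zip pairs IDs with their results.
            for query_id, result in zip(query_ids, pool.map(runner, query_ids)):
                results[query_id] = result
    return results


all_results = run_phases()
```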

Sample KQL Queries

Queries Q1–Q12 and QM1–QM4 are verified against the EmailEvents family of tables and use Timestamp for Advanced Hunting; if falling back to Data Lake, replace Timestamp with TimeGenerated. Q13–Q14 query SecurityAlert and SecurityIncident and already use TimeGenerated. Use all queries exactly as written, substituting only the lookback period where noted.

Query 1: Inbound Email Summary with Threat Breakdown

EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound"
| summarize
    TotalInbound = count(),
    Clean = countif(isempty(ThreatTypes)),
    Phish = countif(ThreatTypes has "Phish"),
    Malware = countif(ThreatTypes has "Malware"),
    Spam = countif(ThreatTypes has "Spam"),
    HighConfPhish = countif(ConfidenceLevel has "High" and ThreatTypes has "Phish"),
    Blocked = countif(DeliveryAction == "Blocked"),
    Delivered = countif(DeliveryAction == "Delivered"),
    Junked = countif(DeliveryAction == "Junked"),
    DistinctSenders = dcount(SenderFromAddress),
    DistinctRecipients = dcount(RecipientEmailAddress)
| project TotalInbound, Clean, Phish, Malware, Spam, HighConfPhish,
    Blocked, Delivered, Junked, DistinctSenders, DistinctRecipients

Query 2: Email Volume Trend by Day

EmailEvents
| where Timestamp > ago(7d)
| summarize
    Inbound = countif(EmailDirection == "Inbound"),
    Outbound = countif(EmailDirection == "Outbound"),
    IntraOrg = countif(EmailDirection == "Intra-org")
    by Day = bin(Timestamp, 1d)
| order by Day asc

Query 3: Delivery Action & Location Breakdown

EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound"
| summarize Count = count() by DeliveryAction, DeliveryLocation
| order by Count desc

Query 4: Detection Methods Breakdown

EmailEvents
| where Timestamp > ago(7d)
| where isnotempty(DetectionMethods) and DetectionMethods != "{}"
| extend DetMethods = parse_json(DetectionMethods)
| extend FirstDetection = tostring(bag_keys(DetMethods)[0])
| extend FirstSubcategory = iif(
    FirstDetection != "" and array_length(DetMethods[FirstDetection]) > 0,
    strcat(FirstDetection, ": ", tostring(DetMethods[FirstDetection][0])),
    FirstDetection)
| summarize Count = count() by FirstSubcategory
| order by Count desc
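
To make the grouping logic of Query 4 easier to follow, here is a Python sketch of the same per-row extraction. It assumes DetectionMethods is a JSON object mapping a threat category to a list of detection technologies; the sample value is hypothetical, and key order in Python follows the JSON text, which may not match what bag_keys returns in every case.

```python
import json


def first_detection(detection_methods: str) -> str:
    """Return "<category>: <first method>" for the first key, like Query 4."""
    if not detection_methods or detection_methods == "{}":
        return ""
    methods = json.loads(detection_methods)
    category = next(iter(methods), "")
    subcategories = methods.get(category) or []
    if category and isinstance(subcategories, list) and subcategories:
        return f"{category}: {subcategories[0]}"
    return category


# Hypothetical sample value for illustration.
sample = '{"Phish": ["URL detonation"], "Spam": ["Advanced filter"]}'
label = first_detection(sample)
```

Only the first category and its first method are counted, so an email flagged under several categories contributes to a single row of the breakdown.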

Query 5: Email Authentication Pass Rates

EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound"
| extend AuthDetails = parse_json(AuthenticationDetails)
| extend
    DMARC = tostring(AuthDetails.DMARC),
    DKIM = tostring(AuthDetails.DKIM),
    SPF = tostring(AuthDetails.SPF),
    CompAuth = tostring(AuthDetails.CompAuth)
| summarize
    TotalEmails = count(),
    DMARCPass = countif(DMARC == "pass"),
    DMARCFail = countif(DMARC == "fail"),
    DKIMPass = countif(DKIM == "pass"),
    DKIMFail = countif(DKIM == "fail"),
    SPFPass = countif(SPF == "pass"),
    SPFFail = countif(SPF == "fail"),
    CompAuthPass = countif(CompAuth has "pass"),
    CompAuthFail = countif(CompAuth == "fail")
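
Query 5 returns only pass and fail counts, so the pass rate and the "Other/None" bucket used in the report have to be derived. A minimal sketch, assuming "Other/None" is simply the remainder after pass and fail (the function name and example numbers are hypothetical):

```python
def auth_row(protocol: str, total: int, passed: int, failed: int) -> dict:
    """Derive pass rate and the Other/None remainder for one protocol."""
    other = total - passed - failed
    pass_rate = round(100.0 * passed / total, 1) if total else 0.0
    return {
        "protocol": protocol,
        "pass_rate": pass_rate,
        "pass": passed,
        "fail": failed,
        "other": other,
    }


# Hypothetical counts: many unsigned senders, no DKIM failures.
dkim = auth_row("DKIM", total=5864, passed=4100, failed=0)
```

In this example the DKIM pass rate is low while failures are zero, which points at unsigned senders rather than spoofing.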

Query 6: ZAP & Post-Delivery Remediation Summary

EmailPostDeliveryEvents
| where Timestamp > ago(7d)
| summarize
    TotalActions = count(),
    PhishZAP = countif(ActionType == "Phish ZAP"),
    MalwareZAP = countif(ActionType == "Malware ZAP"),
    SpamZAP = countif(ActionType == "Spam ZAP"),
    ThreatZAPTotal = countif(ActionType in ("Phish ZAP", "Malware ZAP", "Spam ZAP")),
    ManualRemediation = countif(ActionType has "Admin"),
    SuccessCount = countif(ActionResult == "Success"),
    ErrorCount = countif(ActionResult == "Error")
| project TotalActions, PhishZAP, MalwareZAP, SpamZAP, ThreatZAPTotal, ManualRemediation, SuccessCount, ErrorCount

Query 7: Safe Links Click Activity Summary

UrlClickEvents
| where Timestamp > ago(7d)
| summarize
    TotalClicks = count(),
    BlockedClicks = countif(ActionType == "ClickBlocked"),
    AllowedClicks = countif(ActionType == "ClickAllowed"),
    ClickedThrough = countif(IsClickedThrough == true),
    PhishClicks = countif(ThreatTypes has "Phish"),
    DistinctUrls = dcount(Url),
    DistinctUsers = dcount(AccountUpn)

Query 8: Phishing Emails Delivered (Not Blocked)

EmailEvents
| where Timestamp > ago(7d)
| where ThreatTypes has "Phish"
| where DeliveryAction == "Delivered" or LatestDeliveryAction == "Delivered"
| summarize
    DeliveredPhish = count(),
    DistinctRecipients = dcount(RecipientEmailAddress),
    DistinctSenders = dcount(SenderFromAddress),
    Subjects = make_set(Subject, 5)

Query 9: Top Phishing Sender Domains

EmailEvents
| where Timestamp > ago(7d)
| where ThreatTypes has "Phish"
| summarize
    Count = count(),
    DistinctRecipients = dcount(RecipientEmailAddress),
    DeliveredCount = countif(DeliveryAction == "Delivered" or LatestDeliveryAction == "Delivered")
    by SenderFromDomain
| top 10 by Count

Query 10: Most Targeted Recipients (Aggregated by Domain)

EmailEvents
| where Timestamp > ago(7d)
| where isnotempty(ThreatTypes) and EmailDirection == "Inbound"
| extend RecipientDomain = tostring(split(RecipientEmailAddress, "@")[1])
| summarize
    ThreatCount = count(),
    PhishCount = countif(ThreatTypes has "Phish"),
    SpamCount = countif(ThreatTypes has "Spam"),
    MalwareCount = countif(ThreatTypes has "Malware"),
    DistinctRecipients = dcount(RecipientEmailAddress)
    by RecipientDomain
| order by ThreatCount desc

Query 11: Attachment Type Distribution

EmailAttachmentInfo
| where Timestamp > ago(7d)
| summarize
    Count = count(),
    DistinctFiles = dcount(FileName),
    ThreatCount = countif(isnotempty(ThreatTypes))
    by FileType
| order by Count desc
| take 15

Query 12: Post-ZAP Threat State (Latest Delivery Location)

EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound"
| where isnotempty(ThreatTypes)
| summarize Count = count() by LatestDeliveryAction, LatestDeliveryLocation, ThreatTypes
| order by Count desc
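
The Threat Block Rate dimension is defined as the share of threats not in the inbox in their post-ZAP final state, which can be derived from Query 12's rows. The sketch below assumes inbox placement is reported as the LatestDeliveryLocation value "Inbox/folder" (the value shown in the report template); the sample rows and function name are hypothetical.

```python
from typing import Optional


def threat_block_rate(rows: list[dict]) -> Optional[float]:
    """Percentage of detected threats that are not in the inbox post-ZAP.

    Returns None when no threats were detected, so the caller can use the
    explicit absence pattern instead of reporting a made-up rate.
    """
    total = sum(r["Count"] for r in rows)
    if total == 0:
        return None
    in_inbox = sum(
        r["Count"] for r in rows if r["LatestDeliveryLocation"] == "Inbox/folder"
    )
    return 100.0 * (total - in_inbox) / total


# Hypothetical Query 12 output.
rows = [
    {"LatestDeliveryAction": "Blocked", "LatestDeliveryLocation": "Quarantine", "ThreatTypes": "Phish", "Count": 90},
    {"LatestDeliveryAction": "Junked", "LatestDeliveryLocation": "Junk folder", "ThreatTypes": "Spam", "Count": 6},
    {"LatestDeliveryAction": "Delivered", "LatestDeliveryLocation": "Inbox/folder", "ThreatTypes": "Spam", "Count": 4},
]
rate = threat_block_rate(rows)
```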

Query 13: MDO Incident Summary by Severity and Status

Uses the canonical SecurityAlert→SecurityIncident join. Filters to ProductName == "Office 365 Advanced Threat Protection" and excludes Communication Compliance alerts (CC_ prefix).

let MDOAlerts = SecurityAlert
| where TimeGenerated > ago(7d)
| where ProductName == "Office 365 Advanced Threat Protection"
| where AlertName !startswith "CC_"
| summarize arg_max(TimeGenerated, *) by SystemAlertId
| project SystemAlertId;
SecurityIncident
| where CreatedTime > ago(7d)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| join kind=inner MDOAlerts on $left.AlertId == $right.SystemAlertId
| summarize IncidentCount = dcount(IncidentNumber) by Severity, Status, Classification
| order by Severity asc, IncidentCount desc

Query 14: MDO Incident Type Breakdown (Top Alert-Driven Incidents)

Groups incidents by title and alert composition to show the most common MDO-generated incident types.

let MDOAlerts = SecurityAlert
| where TimeGenerated > ago(7d)
| where ProductName == "Office 365 Advanced Threat Protection"
| where AlertName !startswith "CC_"
| summarize arg_max(TimeGenerated, *) by SystemAlertId
| project SystemAlertId, AlertName, AlertSeverity, ProductName;
SecurityIncident
| where CreatedTime > ago(7d)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| join kind=inner MDOAlerts on $left.AlertId == $right.SystemAlertId
| summarize
    IncidentCount = dcount(IncidentNumber),
    AlertCount = count(),
    OpenCount = dcountif(IncidentNumber, Status == "New" or Status == "Active"),
    ClosedCount = dcountif(IncidentNumber, Status == "Closed"),
    TruePositives = dcountif(IncidentNumber, Classification == "TruePositive")
    by AlertName, Severity
| order by IncidentCount desc
| take 10

Output Modes

Mode 1: Inline Chat Summary

Render the full analysis directly in the chat response. Best for quick review and C-level briefings.

Mode 2: Markdown File Report

Save a comprehensive report to disk at:

reports/email-threat-posture/Email_Threat_Protection_Report_YYYYMMDD_HHMMSS.md
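
The timestamped file name can be generated as sketched below. Using UTC for the timestamp is an assumption (the report header states its generation time in UTC), and the function name is hypothetical.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import Optional

REPORT_DIR = PurePosixPath("reports/email-threat-posture")


def report_path(now: Optional[datetime] = None) -> PurePosixPath:
    """Build the report path using the YYYYMMDD_HHMMSS naming pattern."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return REPORT_DIR / f"Email_Threat_Protection_Report_{stamp}.md"


example_path = report_path(datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
```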

Mode 3: Both

Generate the markdown file AND provide an inline summary in chat.

Always ask the user which mode before generating output.


Inline Report Template

Render the following sections in order. Omit sections only if explicitly noted as conditional.

🔴 URL Rule: All hyperlinks in the report MUST be copied verbatim from the URL Registry above. Do NOT generate, recall from memory, or paraphrase any URL. If a needed URL is not in the registry, use plain text (no hyperlink).

# 📧 Email Threat Protection Report

**Generated:** YYYY-MM-DD HH:MM UTC
**Data Source:** Microsoft Defender for Office 365 (Advanced Hunting)
**Analysis Period:** <StartDate> → <EndDate> (<N> days)
**Protected Mailboxes:** <DistinctRecipients>
**Total Inbound Emails:** <N>
**Email Protection Score:** <Score>/100 — <RATING>

---

## Executive Summary

<2-3 sentences: total inbound volume, threat detection rate, key findings, overall posture rating>

**Email Protection Score:** 🟢/🟡/🟠/🔴 <RATING> (<Score>/100)

---

## Key Metrics

| Metric | Value |
|--------|-------|
| Total Inbound Emails | <N> |
| Clean Email Rate | <N>% (<clean> of <total>) |
| Threats Detected | <N> (Phish: <N>, Spam: <N>, Malware: <N>) |
| Threats Blocked Pre-Delivery | <N> |
| Phishing Delivered (Now Remediated) | <N> |
| Threat ZAP Actions | <N> (Phish: <N>, Malware: <N>, Spam: <N>) |
| Total Post-Delivery Actions | <N> (includes system events) |
| ZAP Success Rate | <N>% (Failed: <N>) |
| Threats Still in Mailboxes (Post-ZAP) | <N> (Phish: <N>, Spam: <N>) |
| Safe Links Clicks Scanned | <N> |
| Phishing Click-Throughs | <N> |
| Distinct Senders | <N> |
| Protected Mailboxes | <N> |

---

## 📬 Mail Flow Overview

### Daily Volume Trend

<Table or sparkline showing inbound/outbound/intra-org by day>

| Day | Inbound | Outbound | Intra-org |
|-----|---------|----------|-----------|
| <date> | <N> | <N> | <N> |

**Observations:** <Note any spikes, trends, or anomalies>

---

## 🛡️ Threat Composition

### Threat Categories

| Category | Count | % of Threats |
|----------|-------|-------------|
| Phishing | <N> | <N>% |
| Spam | <N> | <N>% |
| Malware | <N> | <N>% |
| High-Confidence Phishing | <N> | — |

### Detection Methods

| Method | Count |
|--------|-------|
| <method> | <N> |

### Top Phishing Sender Domains

| Domain | Phish Count | Delivered | Recipients Hit |
|--------|-------------|-----------|----------------|
| <domain> | <N> | <N> | <N> |

<If Q9 returns 0 phishing domains:>
✅ No phishing sender domains detected.

---

## 📦 Delivery Disposition

### Initial Delivery Action

| Action | Location | Count |
|--------|----------|-------|
| Delivered | Inbox/folder | <N> |
| Blocked | Dropped | <N> |
| Blocked | Quarantine | <N> |
| Junked | Junk folder | <N> |

### Post-ZAP Threat State

<Shows where threats currently reside after ZAP remediation>

| Latest Action | Location | Threat Type | Count |
|---------------|----------|-------------|-------|
| <action> | <location> | <type> | <N> |

**Summary of current threat locations (post-ZAP):**

| Current Location | Threat Count | % of Threats |
|-----------------|-------------|-------------|
| 🟢 Quarantine | <N> | <N>% |
| 🟢 Junk folder | <N> | <N>% |
| 🟢 Blocked/Dropped/Failed | <N> | <N>% |
| 🟢 Deleted items | <N> | <N>% |
| 🔴 **Still in Inbox** | **<N>** | **<N>%** |
| **Total** | **<N>** | **100%** |

> Show the phishing vs spam breakdown for "Still in Inbox": e.g., "<N> phishing (<N> total threats including spam)"

---

## 🔐 Email Authentication

| Protocol | Pass Rate | Pass Count | Fail Count | Other/None |
|----------|-----------|------------|------------|------------|
| SPF | <N>% | <N> | <N> | <N> |
| DMARC | <N>% | <N> | <N> | <N> |
| DKIM | <N>% | <N> | <N> | <N> |
| CompAuth | <N>% | <N> | <N> | <N> |

> **Note:** "Other/None" = emails with no result for that protocol (e.g., no DKIM signature). A low DKIM pass rate with 0 failures means unsigned senders, not spoofing. Compare against DMARC and CompAuth for the complete authentication picture.

**Assessment:**
- <emoji> <finding for each protocol>

---

## 🧹 Post-Delivery Remediation (ZAP)

| Metric | Value |
|--------|-------|
| Threat ZAP Actions | <N> (Phish: <N>, Malware: <N>, Spam: <N>) |
| Total Post-Delivery Actions | <N> (includes system events, admin actions) |
| ZAP Success Rate | <N>% (<success> of <total>) |
| Failed Remediations | <N> |

> **Reporting guidance:** The Key Metrics "Threat ZAP Actions" row should show **only** the Phish + Malware + Spam ZAP count — NOT the TotalActions, which includes system-initiated post-delivery events (message trace updates, delivery location changes). TotalActions is shown separately with a clarifying note.

<If ErrorCount > 0:>
⚠️ **<N> ZAP remediation(s) failed** — manual follow-up recommended. Threats may remain in user mailboxes.

<If ErrorCount == 0:>
✅ All post-delivery remediations completed successfully.

---

## 🔗 Safe Links Protection

| Metric | Value |
|--------|-------|
| Total Clicks Scanned | <N> |
| Clicks Blocked | <N> |
| Clicks Allowed | <N> |
| Phishing Clicks | <N> |
| Click-Through Overrides | <N> |
| Distinct URLs Scanned | <N> |
| Users Protected | <N> |

<If PhishClicks > 0:>
🔴 **<N> phishing URL click(s) detected** — investigate affected users for credential compromise.

<If PhishClicks == 0:>
✅ No phishing URL click-throughs detected.

---

## 📎 Attachment Analysis

### Top Attachment Types

| File Type | Count | Distinct Files | Threats Detected |
|-----------|-------|----------------|------------------|
| <type> | <N> | <N> | <N> |

<If any ThreatCount > 0:>
⚠️ **Malicious attachments detected in <N> file type(s)** — verify delivery status and endpoint execution.

<If all ThreatCount == 0:>
✅ No malicious attachments detected in email flow.

---

## 🎯 Targeted Recipients

| Recipient Domain | Threat Count | Phish | Spam | Malware | Recipients |
|-----------------|-------------|-------|------|---------|------------|
| <domain> | <N> | <N> | <N> | <N> | <N> |

---

## Email Protection Score Card

```
┌──────────────────────────────────────────────────────┐
│       EMAIL PROTECTION SCORE: <NN>/100               │
│             Rating: <EMOJI> <RATING>                 │
├──────────────────────────────────────────────────────┤
│ Threat Block Rate    [<bar>] <N>/20  (<detail>)      │
│ Email Authentication [<bar>] <N>/20  (<detail>)      │
│ ZAP Effectiveness    [<bar>] <N>/20  (<detail>)      │
│ Safe Links Protection[<bar>] <N>/20  (<detail>)      │
│ Phishing Delivery    [<bar>] <N>/20  (<detail>)      │
└──────────────────────────────────────────────────────┘
```

---

## 🚨 MDO Security Incidents

### Incident Summary (Last <N> Days)

| Severity | Open | Closed | True Positive | Total |
|----------|------|--------|---------------|-------|
| 🔴 High | <N> | <N> | <N> | <N> |
| 🟠 Medium | <N> | <N> | <N> | <N> |
| 🟡 Low | <N> | <N> | <N> | <N> |
| 🔵 Informational | <N> | <N> | <N> | <N> |
| **Total** | **<N>** | **<N>** | **<N>** | **<N>** |

### Top MDO Incident Types

| Alert Name | Severity | Incidents | Open | Closed | True Positives |
|------------|----------|-----------|------|--------|----------------|
| <name> | <sev> | <N> | <N> | <N> | <N> |

<If Q13 returns 0 incidents:>
✅ No MDO-generated security incidents in the analysis period.

---

## Security Assessment

| Factor | Finding |
|--------|---------|
| <emoji> **<Factor>** | <Evidence-based finding> |

---

## Recommendations

1. <emoji> **<Priority action>** — <evidence and rationale>
2. ...

---

## Appendix: Query Execution Summary

| Query | Description | Records | Time |
|-------|-------------|---------|------|
| Q1 | Inbound Email Summary | <N> | <time> |
| Q2 | Daily Volume Trend | <N> | <time> |
| ... | ... | ... | ... |
| Q13 | MDO Incident Summary | <N> | <time> |
| Q14 | MDO Incident Types | <N> | <time> |

Markdown File Report Template

When outputting to markdown file, use the same structure as the Inline Report Template above, saved to:

reports/email-threat-posture/Email_Threat_Protection_Report_YYYYMMDD_HHMMSS.md

Include the following additional sections in the file report that are omitted from inline:

  1. Top sender domains table (full top 10 by volume with phish/spam breakdown)
  2. Authentication failure breakdown by domain (domains failing DMARC/DKIM/SPF)
  3. Overridden threats (emails detected as threats but allowed by policy)
  4. Complete detection methods table (all detection categories, not just top)
  5. First-contact phishing attempts (emails from never-before-seen senders flagged as phish)
  6. MDO security incidents — Full severity × status breakdown + top incident types from Q13/Q14
  7. Raw query references — note that full query definitions are in this SKILL.md file

Markdown Section Ordering

Follow this exact section order in markdown file reports:

| Order | Section | Source |
|-------|---------|--------|
| 1 | Header (with Total Inbound + Score) | Template header |
| 2 | Executive Summary | Template |
| 3 | Key Metrics | Template |
| 4 | Mail Flow Overview (daily trend) | Q2 |
| 5 | Threat Composition (categories + detection methods + top phish senders) | Q1, Q4, Q9 |
| 6 | Delivery Disposition (initial + post-ZAP threat state) | Q3, Q12 |
| 7 | Email Authentication (with auth failures by domain) | Q5, QM2 |
| 8 | Post-Delivery Remediation (ZAP) | Q6 |
| 9 | Safe Links Protection | Q7 |
| 10 | Attachment Analysis | Q11 |
| 11 | Targeted Recipients | Q10 |
| 12 | Deep-dive sections start here | |
| 13 | Overridden Threats | QM3 |
| 14 | First-Contact Phishing | QM4 |
| 15 | MDO Security Incidents | Q13, Q14 |
| 16 | Top Sender Domains by Volume | QM1 |
| 17 | Score and assessment | |
| 18 | Email Protection Score Card | Computed |
| 19 | Security Assessment | Synthesized |
| 20 | Recommendations | Synthesized |
| 21 | Appendix: Query Execution Summary | All queries |
| 22 | References | URL Registry |

Key rule: Score Card → Assessment → Recommendations always come AFTER all data sections (including deep dives). This ensures the reader sees all evidence before the overall assessment.

Additional Queries for Markdown File Deep Dives

These queries provide enrichment data for the markdown file report only. Skip for inline mode.

QM1: Top Sender Domains by Volume

EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound"
| summarize
    EmailCount = count(),
    PhishCount = countif(ThreatTypes has "Phish"),
    SpamCount = countif(ThreatTypes has "Spam"),
    DistinctSenders = dcount(SenderFromAddress)
    by SenderFromDomain
| order by EmailCount desc
| take 10

QM2: Authentication Failures by Domain

EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound"
| extend AuthDetails = parse_json(AuthenticationDetails)
| extend
    DMARC = tostring(AuthDetails.DMARC),
    DKIM = tostring(AuthDetails.DKIM),
    SPF = tostring(AuthDetails.SPF)
| summarize
    TotalEmails = count(),
    DMARCFail = countif(DMARC == "fail"),
    DKIMFail = countif(DKIM == "fail"),
    SPFFail = countif(SPF == "fail")
    by SenderFromDomain
| where DMARCFail > 0 or DKIMFail > 0 or SPFFail > 0
| order by TotalEmails desc
| take 15

QM3: Overridden Threats (Allow Policies)

EmailEvents
| where Timestamp > ago(7d)
| where OrgLevelAction == "Allow" and isnotempty(ThreatTypes)
| summarize Count = count() by ThreatTypes, OrgLevelPolicy, DetectionMethods
| order by Count desc

QM4: First-Contact Phishing Attempts

EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound"
| where IsFirstContact == true
| where ThreatTypes has "Phish" or UrlCount > 3
| summarize
    FirstContactCount = count(),
    PhishCount = countif(ThreatTypes has "Phish"),
    HighUrlCount = countif(UrlCount > 3),
    DistinctSenders = dcount(SenderFromAddress)

File Report Header

# Email Threat Protection Report

**Generated:** YYYY-MM-DD HH:MM UTC
**Data Source:** Microsoft Defender for Office 365 (Advanced Hunting)
**Analysis Period:** <StartDate> → <EndDate> (<N> days)
**Protected Mailboxes:** <DistinctRecipients>
**Total Inbound Emails:** <N>
**Email Protection Score:** <Score>/100 — <RATING>

---

Known Pitfalls

1. DetectionMethods Is a JSON String

Problem: DetectionMethods looks like it should be dynamic but is a string column containing JSON. Direct property access fails.

Solution: Always parse_json(DetectionMethods) before accessing sub-keys:

| extend DetMethods = parse_json(DetectionMethods)
| extend FirstDetection = tostring(bag_keys(DetMethods)[0])
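
Indexing with `[0]` keeps only the first category. As a rough sketch of counting every category instead, the query below runs against an inline `datatable`. The message IDs and JSON values are made up for illustration and may not match the exact shape of real `DetectionMethods` payloads.

```kql
// Illustrative sample rows; the JSON values are hypothetical, not real MDO output.
datatable(NetworkMessageId:string, DetectionMethods:string)
[
    "msg-1", '{"Phish":["URL detonation"]}',
    "msg-2", '{"Spam":["Advanced filter"],"Phish":["Impersonation brand"]}'
]
// DetectionMethods is a string column, so parse it before reading keys.
| extend DetMethods = parse_json(DetectionMethods)
// Emit one row per top-level category key instead of taking only the first.
| mv-expand Category = bag_keys(DetMethods) to typeof(string)
| summarize Emails = dcount(NetworkMessageId) by Category
| order by Emails desc
```

With these sample rows, the second message contributes to both the Phish and the Spam bucket, which `bag_keys(...)[0]` alone would not show.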

2. AuthenticationDetails Is a JSON String

Problem: Same as DetectionMethods — AuthenticationDetails is a string column, not dynamic.

Solution: Always parse_json(AuthenticationDetails):

| extend AuthDetails = parse_json(AuthenticationDetails)
| extend DMARC = tostring(AuthDetails.DMARC)

3. ThreatTypes Is Pipe-Delimited

Problem: ThreatTypes can contain multiple values pipe-delimited (e.g., "Phish|Spam"). Using == will miss multi-category threats.

Solution: Always use has operator:

| where ThreatTypes has "Phish"   // ✅ Correct
| where ThreatTypes == "Phish"    // ❌ Misses "Phish|Spam"
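
The difference between the two operators can be checked without touching tenant data. This is a minimal sketch on an inline `datatable` with hypothetical message IDs; the `ThreatTypes` values follow the pipe-delimited form described above.

```kql
// Hypothetical sample rows to compare == against has.
datatable(NetworkMessageId:string, ThreatTypes:string)
[
    "msg-1", "Phish",
    "msg-2", "Phish|Spam",
    "msg-3", "Spam",
    "msg-4", ""
]
| summarize
    ExactMatch = countif(ThreatTypes == "Phish"),   // counts msg-1 only
    HasMatch   = countif(ThreatTypes has "Phish")   // counts msg-1 and msg-2
```

`has` matches whole terms, so it picks up "Phish" inside a multi-value string while still ignoring rows that contain only "Spam".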

4. Timestamp vs TimeGenerated

Problem: Advanced Hunting uses Timestamp for XDR-native tables. Sentinel Data Lake uses TimeGenerated.

Solution: Default queries use Timestamp (AH). If falling back to Data Lake, replace Timestamp with TimeGenerated throughout.
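
As an example of the substitution, QM3 rewritten for the Data Lake fallback might look like the sketch below. It assumes the Data Lake copy of `EmailEvents` exposes the same columns as Advanced Hunting apart from the time column; verify that against the workspace schema before relying on it.

```kql
// QM3 with Timestamp swapped for TimeGenerated (Data Lake fallback).
// Assumption: all other column names are identical in the Data Lake table.
EmailEvents
| where TimeGenerated > ago(7d)
| where OrgLevelAction == "Allow" and isnotempty(ThreatTypes)
| summarize Count = count() by ThreatTypes, OrgLevelPolicy, DetectionMethods
| order by Count desc
```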

5. IsFirstContact May Be Null

Problem: IsFirstContact can be null for outbound or intra-org emails. Filtering on it without scoping to inbound emails may miss records.

Solution: Always filter EmailDirection == "Inbound" before using IsFirstContact.

6. LatestDeliveryAction vs DeliveryAction

Problem: DeliveryAction is the initial delivery disposition. LatestDeliveryAction reflects the current state after ZAP or manual remediation. Reporting only DeliveryAction overstates the number of threats in mailboxes.

Solution: When assessing current threat exposure, use LatestDeliveryAction and LatestDeliveryLocation. When assessing initial filter effectiveness, use DeliveryAction.
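
One way to see how much the two columns diverge is to cross-tabulate them. The sketch below uses only columns already named in this skill and is meant as a starting point, not as one of the numbered report queries.

```kql
// Initial vs current disposition for detected threats (last 7 days).
EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound" and isnotempty(ThreatTypes)
// Each row is one initial-to-current transition with its volume.
| summarize Count = count() by DeliveryAction, LatestDeliveryAction
| order by Count desc
```

Rows where `DeliveryAction` and `LatestDeliveryAction` differ represent messages whose state changed after delivery, for example through ZAP or manual remediation.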

7. DKIM Pass Rate May Be Lower Than Expected

Problem: DKIM pass rate can appear low because many legitimate emails (especially bulk/marketing) don't sign with DKIM at all. An email with no DKIM signature isn't a DKIM "fail" — it simply has no result. The DKIM field from AuthenticationDetails may be empty or "none" rather than "fail".

Solution: When computing DKIM pass rate, note the denominator: emails with a DKIM result vs total emails. A lower DKIM rate is expected and doesn't necessarily indicate spoofing. Compare against DMARC and CompAuth for a better authentication picture.
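
A possible way to report both denominators side by side is sketched below. The "fail" and "none" values come from the pitfall above; "pass" as the success value and the lower-casing step are assumptions, so check the distinct values in your tenant first.

```kql
// DKIM pass rate with two denominators: evaluated emails vs all inbound emails.
EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound"
| extend DKIM = tolower(tostring(parse_json(AuthenticationDetails).DKIM))
| summarize
    TotalEmails  = count(),
    DKIMPass     = countif(DKIM == "pass"),                 // assumed success value
    DKIMFail     = countif(DKIM == "fail"),
    DKIMNoResult = countif(isempty(DKIM) or DKIM == "none") // unsigned or no result
| extend
    PassRateOfEvaluated = round(100.0 * DKIMPass / (DKIMPass + DKIMFail), 1),
    PassRateOfTotal     = round(100.0 * DKIMPass / TotalEmails, 1)
```

Reporting `DKIMNoResult` next to the two rates makes it clear how much of a low headline figure is simply unsigned mail.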

8. ZAP ErrorCount May Include Non-Threat Emails

Problem: ZAP errors can occur for legitimate reasons: shared mailboxes, retention policies preventing purge, user-moved emails. A ZAP error doesn't always mean a threat is still active.

Solution: When reporting ZAP failures, note that manual investigation may confirm the threat was already handled. Don't over-alarm on ZAP errors without context.

9. ZAP TotalActions ≠ Threat ZAP Count

Problem: EmailPostDeliveryEvents includes all post-delivery events — not just ZAP threat remediations. The TotalActions count from Q6 includes system-initiated events (message trace updates, delivery location changes, admin investigation submissions). Reporting TotalActions as "ZAP Remediations" in Key Metrics massively overstates the threat remediation picture (e.g., 7,790 total when only 674 are actual threat ZAPs).

Solution: Always use ThreatZAPTotal (PhishZAP + MalwareZAP + SpamZAP) for headline ZAP metrics. Show TotalActions separately with a clarifying note: "includes system events". In Key Metrics, use "Threat ZAP Actions: 674" not "ZAP Remediations: 7,790".
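
The separation can be sketched as follows. Q6 itself is defined elsewhere in this skill, so this is only an approximation of it; the `ActionType` strings "Phish ZAP", "Malware ZAP", and "Spam ZAP" are assumed values that should be confirmed against the tenant's data.

```kql
// Headline ZAP metric (threat ZAPs only) shown next to the raw event total.
EmailPostDeliveryEvents
| where Timestamp > ago(7d)
| summarize
    TotalActions = count(),                               // includes system events
    PhishZAP     = countif(ActionType == "Phish ZAP"),    // assumed value
    MalwareZAP   = countif(ActionType == "Malware ZAP"),  // assumed value
    SpamZAP      = countif(ActionType == "Spam ZAP")      // assumed value
| extend ThreatZAPTotal = PhishZAP + MalwareZAP + SpamZAP
```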

10. Threat Block Rate — Post-ZAP vs Pre-Delivery

Problem: The scoring dimension "Threat Block Rate" can be interpreted two ways: (a) pre-delivery block rate (threats blocked before reaching inbox), or (b) final disposition rate (threats not in inbox after ZAP). These give different numbers — e.g., 72.5% pre-delivery vs 81.2% post-ZAP.

Solution: The dimension measures final threat disposition (post-ZAP) — the percentage of detected threats that are NOT currently in user inboxes. This is the operationally relevant metric because it reflects actual user exposure. The dimension description explicitly says "not in inbox (post-ZAP final state)".
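
To make the two interpretations concrete, the sketch below computes both from the same population of detected threats. It is an illustration rather than the skill's scoring query: treating only `DeliveryAction == "Blocked"` as a pre-delivery block and treating any `LatestDeliveryLocation` without the term "Inbox" as "not in inbox" are both assumptions.

```kql
// Pre-delivery block rate vs post-ZAP final disposition rate.
EmailEvents
| where Timestamp > ago(7d)
| where EmailDirection == "Inbound" and isnotempty(ThreatTypes)
| summarize
    DetectedThreats    = count(),
    BlockedPreDelivery = countif(DeliveryAction == "Blocked"),              // assumption: junked mail not counted
    NotInInboxNow      = countif(not(LatestDeliveryLocation has "Inbox"))   // assumption: inbox value contains "Inbox"
| extend
    PreDeliveryBlockRate = round(100.0 * BlockedPreDelivery / DetectedThreats, 1),
    PostZapBlockRate     = round(100.0 * NotInInboxNow / DetectedThreats, 1)
```

Only the post-ZAP figure feeds the scoring dimension; the pre-delivery figure is useful as supporting context.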


Quality Checklist

Before delivering the report, verify:

  • All percentage values show both percentage AND raw count
  • All queries used Timestamp (AH) or TimeGenerated (Data Lake) consistently
  • Zero-result queries are reported with explicit absence confirmation (✅ pattern)
  • The Email Protection Score calculation is transparent with per-dimension evidence
  • Detection methods show the full breakdown, not just "threats detected"
  • ZAP effectiveness distinguishes threat ZAP count vs TotalActions (no inflated headline metric)
  • Key Metrics ZAP row shows ThreatZAPTotal (Phish+Malware+Spam), NOT TotalActions
  • Safe Links section distinguishes blocked vs allowed vs click-through
  • Email authentication covers all four protocols with Other/None column
  • Threats-in-mailbox summary breaks down phishing vs spam (not just total)
  • Markdown report follows section ordering guidance (data → deep dives → score → assessment)
  • Post-ZAP state (Q12) shows where threats currently reside, not just initial delivery
  • MDO incidents section shows severity × status breakdown with open/closed/TP counts
  • Incident queries use canonical SecurityAlert→SecurityIncident join (NOT SecurityAlert.Status)
  • Recommendations are prioritized and evidence-based
  • All hyperlinks copied verbatim from the URL Registry — no fabricated URLs
  • No recipient PII (email addresses) in the report — aggregate by domain only
  • Daily volume trend includes at least a note on peak/anomaly days

SVG Dashboard Generation

📊 Optional post-report step. After an Email Threat Protection report is generated, the user can request a visual SVG dashboard.

Trigger phrases: "generate SVG dashboard", "create a visual dashboard", "visualize this report", "SVG from the report"

How to Request a Dashboard

  • Same chat: "Generate an SVG dashboard from the report" — data is already in context.
  • New chat: Attach or reference the report file, e.g. #file:reports/email-threat-posture/Email_Threat_Protection_Report_<date>.md
  • Customization: Edit svg-widgets.yaml before requesting — the renderer reads it at generation time.

Execution

Step 1:  Read svg-widgets.yaml (this skill's widget manifest)
Step 2:  Read .github/skills/svg-dashboard/SKILL.md (rendering rules — Manifest Mode)
Step 3:  Read the completed report file (data source)
Step 4:  Render SVG → save to reports/email-threat-posture/{report_name}_dashboard.svg

The YAML manifest is the single source of truth for layout, widgets, field mappings, colors, and data source documentation. All customization happens there.

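Generates a comprehensive vulnerability and exposure management report covering CVEs, configuration compliance, EoS software, critical assets, attack paths, and certificate status. Supports org-wide or single-device scope and produces a security posture assessment by querying the TVM-related tables.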
vulnerability report exposure report CVE assessment security posture vulnerability assessment exposure management patch status end of support security recommendations attack paths critical assets configuration compliance Defender device health security score TVM threat and vulnerability management
.github/skills/exposure-investigation/SKILL.md
npx skills add SCStelz/security-investigator --skill exposure-investigation -g -y
SKILL.md
Frontmatter
{
    "name": "exposure-investigation",
    "description": "Use this skill when asked to generate a vulnerability and exposure management report, assess security posture, or review CVEs, security configurations, and attack paths. Triggers on keywords like \"vulnerability report\", \"exposure report\", \"CVE assessment\", \"security posture\", \"vulnerability assessment\", \"exposure management\", \"patch status\", \"end of support\", \"security recommendations\", \"attack paths\", \"critical assets\", \"configuration compliance\", \"Defender device health\", \"security score\", \"TVM\", \"threat and vulnerability management\", or when asking about overall organizational vulnerability\/exposure state. This skill queries DeviceTvm* tables and ExposureGraphNodes\/Edges to produce a comprehensive posture report covering CVEs, exploitable vulnerabilities, security configuration compliance, end-of-support software, critical asset inventory, attack paths, Defender device health, and certificate status. Supports org-wide and per-device scoping with inline chat and markdown file output.",
    "drill_down_prompt": "Run vulnerability and exposure report — CVEs, attack paths, critical assets, configuration compliance",
    "threat_pulse_domains": [
        "exposure"
    ]
}

Vulnerability & Exposure Management Report — Instructions

Purpose

This skill generates a comprehensive Vulnerability & Exposure Management Report covering the full security posture of the organization (or a specific device). It goes beyond CVEs to include security configuration compliance, end-of-support software, Exposure Management critical assets, attack paths, and certificate status.

Entity Type: Organization-wide (default) or single device

| Scope | Primary Tables | Use Case |
|---|---|---|
| Org-wide (default) | DeviceTvmSoftwareVulnerabilities, ExposureGraphNodes, ExposureGraphEdges | Full organizational posture assessment |
| Per-device | DeviceTvmSoftwareVulnerabilities, DeviceTvmSecureConfigurationAssessment | Focused device vulnerability review |

What this skill covers:

| Section | Data Source | Coverage |
|---|---|---|
| CVE Vulnerabilities | DeviceTvmSoftwareVulnerabilities + DeviceTvmSoftwareVulnerabilitiesKB | Severity distribution, exploitable CVEs, CVSS scores |
| Security Configuration | DeviceTvmSecureConfigurationAssessment + ...KB | OS, Network, Security Controls, Accounts, Application compliance |
| End-of-Support Software | DeviceTvmSoftwareInventory | EoS/EoL software with dates and affected devices |
| Critical Assets | ExposureGraphNodes | Criticality levels, internet-facing, RCE/privesc flags |
| Attack Paths | ExposureGraphEdges + ExposureGraphNodes | Multi-hop paths from vulnerable to critical assets |
| Defender Device Health | DeviceTvmSecureConfigurationAssessment + DeviceInfo | AV mode, signatures, RTP, tamper protection, cloud protection compliance by active/inactive status |
| Certificate Status | DeviceTvmCertificateInfo | Expired and expiring certificates |
| Software Evidence (drill-down) | DeviceTvmSoftwareEvidenceBeta | File paths and registry paths linking vulnerable software to on-disk locations; used for targeted remediation |

📑 TABLE OF CONTENTS

  1. Critical Workflow Rules - Start here!
  2. Output Modes - Inline chat vs. Markdown file
  3. Quick Start - 9-step execution pattern
  4. Execution Workflow - Complete phased process
  5. Sample KQL Queries - Validated query patterns (Queries 1-11, 13-16)
  6. Drill-Down Reference Queries - Targeted file-level evidence for remediation (Queries 17-19)
  7. Report Template - Output structure and formatting
  8. Per-Device Mode - Single device scoping
  9. Known Pitfalls - Edge cases
  10. Error Handling - Troubleshooting guide

Investigation shortcuts:

  • Specific CVE assessment (TP Q12): Q2 (exploitable CVEs + KB details) → Q5 (per-device vuln counts, scoped) → Q14 (top vulnerable software) → Q17/Q18 (file evidence drill-down)
  • Internet-facing critical asset exposure (TP Q11): Q7 (critical asset inventory) → Q15 (internet-facing + vulns) → Q10a (vulnerable device summary) → Q10b (blast radius edges) → Q16 (multi-hop attack paths, optional)
  • Per-device vulnerability review (TP Q12, TP Q1): Q5 (per-device vuln counts) → Q6 (per-device compliance) → Q8 (high-impact misconfigs) → Q13 (certificates)
  • Fleet-wide posture report (standalone): Q1 (severity dist) → Q2 (exploitable) → Q3 (config compliance) → Q4 (EoS software) → Q7 (critical assets) → Q9 (Defender health)
  • Defender health audit (TP Q11, standalone): Q9 (fleet summary by control×OS) → Q11 (non-compliant exceptions, active only) → Q6 (per-device compliance scorecard)
  • Attack path analysis (TP Q11+Q12): Q10a (vulnerable device exposure) → Q10b (1-hop blast radius) → Q15 (internet-facing critical + vulns) → Q16 (multi-hop paths, optional)
  • Software version sprawl (after Q14 or Q2): Q14 (top vulnerable software) → Q17 (version sprawl by source) → Q18 (CVE to file mapping) → Q19 (stale extension folders)

⛔ Shortcut Default Rule: When a matching shortcut exists for the investigation context, use it — don't run the full workflow. Only run the full query set when the user explicitly requests "full investigation", "comprehensive", or "deep dive". Shortcuts render only the report sections relevant to their query chain (plus Executive Summary and Recommendations, always).


⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

Before starting ANY vulnerability/exposure report:

  1. ALL queries in this skill use RunAdvancedHuntingQuery — DeviceTvm* and ExposureGraph* tables are Advanced Hunting only (NOT in Sentinel Data Lake)
  2. No Sentinel workspace selection is required — this skill does NOT query Sentinel Data Lake tables
  3. ALWAYS ask the user for output mode if not specified: inline chat summary, markdown file report, or both (default: both)
  4. ALWAYS ask the user for scope if ambiguous: org-wide (default) or specific device name
  5. ALWAYS run independent queries in parallel for performance
  6. ALWAYS use create_file for markdown reports (NEVER use PowerShell terminal commands)
  7. ALWAYS sanitize PII from saved reports — use generic placeholders for real hostnames, tenant names, and UPNs in committed files (reports/ files for user's own use may contain real values)
  8. ExposureGraph tables are snapshot data — no Timestamp or TimeGenerated filter needed
  9. DeviceTvm assessment tables — use summarize arg_max(Timestamp, *) by DeviceId, ConfigurationId to get the latest assessment per device×config. Do NOT use Timestamp > ago(1d) as a pre-filter — lab/weekend environments may have stale data and return 0 results
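
Rule 9 can be paired with a freshness check so that stale data is reported rather than silently filtered out. The sketch below is one way to do that; the output column names are illustrative.

```kql
// Latest assessment per device x configuration, plus a data-freshness summary.
DeviceTvmSecureConfigurationAssessment
| summarize arg_max(Timestamp, *) by DeviceId, ConfigurationId
| summarize
    Assessments      = count(),
    OldestAssessment = min(Timestamp),
    NewestAssessment = max(Timestamp)
// Age of the most recent assessment, in whole days.
| extend NewestAgeDays = datetime_diff("day", now(), NewestAssessment)
```

If `NewestAgeDays` is large, note the staleness in the report instead of adding a `Timestamp > ago(1d)` filter that would return nothing.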

Tool Selection

| Table Pattern | Tool | Notes |
|---|---|---|
| DeviceTvm* | RunAdvancedHuntingQuery | AH-only tables |
| ExposureGraphNodes | RunAdvancedHuntingQuery | AH-only, snapshot data, no timestamp filter |
| ExposureGraphEdges | RunAdvancedHuntingQuery | AH-only, snapshot data, no timestamp filter |

🔴 PROHIBITED:

  • ❌ Using mcp_sentinel-data_query_lake for any table in this skill
  • ❌ Adding TimeGenerated filters to ExposureGraph queries
  • ❌ Reporting findings without actual query evidence
  • ❌ Fabricating CVE IDs, CVSS scores, or device names

When invoked from a parent skill (threat-pulse, incident-investigation, etc.)

  • Skip output mode and scope prompts — the parent skill controls output format
  • Use the investigation shortcut that matches the parent trigger (see shortcuts above the TOC)
  • For quick triage: run only the shortcut query chain
  • For deep investigation: run the full phased workflow

Output Modes

Mode 1: Inline Chat Summary (default for quick requests)

Compact executive summary rendered directly in chat.

Mode 2: Markdown File Report

Full detailed report saved to reports/exposure/vulnerability_exposure_report_<YYYYMMDD_HHMMSS>.md.

Mode 3: Both (default when user says "report" or "generate report")

Inline chat executive summary + full markdown file.

Ask user if not specified:

"How would you like the report? I can provide:

  1. Inline chat summary — executive overview in chat
  2. Markdown file — detailed report saved to reports/exposure/
  3. Both (recommended) — summary in chat + full report file"

Quick Start (TL;DR)

9-step execution pattern for org-wide report:

Step 1: Determine scope (org-wide or specific device) and output mode
Step 2: Run Phase 1 queries in parallel — CVE distribution, exploitable CVEs, config compliance
Step 3: Run Phase 2 queries in parallel — EoS software, per-device vulns, per-device compliance
Step 4: Run Phase 3 queries in parallel — ExposureGraph critical assets, high-impact misconfigs, Defender health fleet summary
Step 5: Run Phase 4 queries in parallel — Attack paths, Defender health exceptions, certificates
Step 6: Run Phase 5 (optional) — Top vulnerable software, internet-facing critical assets
Step 7: Compute summary metrics and risk assessment
Step 8: Render inline chat executive summary
Step 9: Generate markdown file report (if requested)

Execution Workflow

Phase 1: Core Vulnerability & Compliance (3 parallel queries)

Run these simultaneously:

| Query | Description | Reference |
|---|---|---|
| Q1 | CVE severity distribution | Query 1 |
| Q2 | Exploitable CVEs (with known exploits) | Query 2 |
| Q3 | Security config compliance by category | Query 3 |

Phase 2: Software & Per-Device Detail (3 parallel queries)

| Query | Description | Reference |
|---|---|---|
| Q4 | End-of-support software inventory | Query 4 |
| Q5 | Per-device vulnerability counts | Query 5 |
| Q6 | Per-device compliance scorecard | Query 6 |

Phase 3: Exposure Management & Defender Health (3 parallel queries)

| Query | Description | Reference |
|---|---|---|
| Q7 | Critical asset inventory | Query 7 |
| Q8 | High-impact misconfigurations with remediation | Query 8 |
| Q9 | Defender health fleet summary | Query 9 |

Phase 4: Attack Paths & Supplementary (4 parallel queries)

| Query | Description | Reference |
|---|---|---|
| Q10a | Vulnerable device exposure summary (fast) | Query 10a |
| Q10b | Edge connectivity from vulnerable devices (fast) | Query 10b |
| Q11 | Defender health non-compliant exceptions | Query 11 |
| Q13 | Certificate expiration status | Query 13 |

Phase 5: Supplementary Detail (optional, 3 parallel queries)

Run only if Phase 1-4 reveal high-risk items:

| Query | Description | Reference |
|---|---|---|
| Q14 | Top vulnerable software by CVE count | Query 14 |
| Q15 | Internet-facing critical assets with vulnerabilities | Query 15 |
| Q16 | Multi-hop attack path enumeration (slow, uses graph-match) | Query 16 |

Phase 6: Render Output

  1. Compute summary metrics from all query results
  2. Assign overall risk rating (see Risk Assessment)
  3. Render inline chat executive summary
  4. Generate markdown file (if requested)

Sample KQL Queries

All queries use RunAdvancedHuntingQuery via the Sentinel Triage MCP server.

Query 1: CVE Severity Distribution

DeviceTvmSoftwareVulnerabilities
| summarize 
    DeviceCount = dcount(DeviceId),
    VulnCount = count()
    by VulnerabilitySeverityLevel
| order by VulnCount desc

Purpose: Top-level severity breakdown for executive summary.


Query 2: Exploitable CVEs

DeviceTvmSoftwareVulnerabilities
| join kind=inner DeviceTvmSoftwareVulnerabilitiesKB on CveId
| where IsExploitAvailable == true
| summarize 
    AffectedDevices = dcount(DeviceName),
    DeviceList = make_set(DeviceName)
    by CveId, VulnerabilitySeverityLevel, CvssScore, VulnerabilityDescription
| order by CvssScore desc, AffectedDevices desc
| take 20

Purpose: Highest-risk CVEs. An available exploit makes real-world exploitation far more likely, even though it does not by itself prove active exploitation. These are always Priority 1.


Query 3: Security Config Compliance by Category

DeviceTvmSecureConfigurationAssessment
| summarize arg_max(Timestamp, *) by DeviceId, ConfigurationId
| summarize 
    TotalAssessments = count(),
    CompliantCount = countif(IsCompliant == true),
    NonCompliantCount = countif(IsCompliant == false)
    by ConfigurationCategory
| extend ComplianceRate = round(100.0 * CompliantCount / TotalAssessments, 1)
| order by NonCompliantCount desc

Purpose: Compliance posture across OS, Network, Security Controls, Accounts, Application categories.


Query 4: End-of-Support Software

DeviceTvmSoftwareInventory
| where EndOfSupportStatus != ""
| summarize 
    AffectedDevices = dcount(DeviceId),
    DeviceList = make_set(DeviceName)
    by SoftwareVendor, SoftwareName, SoftwareVersion, EndOfSupportStatus, EndOfSupportDate
| order by AffectedDevices desc

Purpose: Identify unsupported software — no patches available, high risk.

EndOfSupportStatus values:

  • EOS Software — Entire product line end-of-support
  • EOS Version — Specific version end-of-support
  • Upcoming EOS Version — EoS within next 6 months

Query 5: Per-Device Vulnerability Counts

DeviceTvmSoftwareVulnerabilities
| summarize 
    Critical = countif(VulnerabilitySeverityLevel == "Critical"),
    High = countif(VulnerabilitySeverityLevel == "High"),
    Medium = countif(VulnerabilitySeverityLevel == "Medium"),
    Low = countif(VulnerabilitySeverityLevel == "Low"),
    Total = count()
    by DeviceName, OSPlatform
| order by Critical desc, High desc, Total desc

Purpose: Per-device vulnerability heatmap — identifies most vulnerable endpoints.


Query 6: Per-Device Compliance Scorecard

DeviceTvmSecureConfigurationAssessment
| summarize arg_max(Timestamp, *) by DeviceId, ConfigurationId
| summarize 
    TotalChecks = count(),
    Compliant = countif(IsCompliant == true),
    NonCompliant = countif(IsCompliant == false),
    NotApplicable = countif(IsApplicable == false)
    by DeviceName
| extend ComplianceRate = round(100.0 * Compliant / (Compliant + NonCompliant), 1)
| order by ComplianceRate asc

Purpose: Rank devices by compliance rate — worst-first for remediation priority.


Query 7: Critical Asset Inventory

🔴 MCP Property Access: NodeProperties is stored as a JSON string. Direct dot-notation (NodeProperties.rawData.criticalityLevel) returns null through MCP serialization. MUST use double parse_json(tostring()) extraction — see Known Pitfalls.

ExposureGraphNodes
| where set_has_element(Categories, "device")
| extend rawData = parse_json(tostring(parse_json(tostring(NodeProperties)).rawData))
| extend critLevel = rawData.criticalityLevel
| extend critValue = toint(critLevel.criticalityLevel)
| extend ruleBasedCrit = toint(critLevel.ruleBasedCriticalityLevel)
| extend ruleNames = tostring(critLevel.ruleNames)
| where isnotnull(critLevel) and critValue < 4
| extend InternetFacing = iff(isnotnull(rawData.IsInternetFacing), "Yes", "No")
| extend VulnerableToRCE = iff(isnotnull(rawData.vulnerableToRCE), "Yes", "No")
| extend VulnerableToPrivEsc = iff(isnotnull(rawData.VulnerableToPrivilegeEscalation), "Yes", "No")
| extend ExposureScore = tostring(rawData.exposureScore)
| project 
    DeviceName = NodeName,
    CriticalityLevel = critValue,
    RuleBasedCriticality = ruleBasedCrit,
    RuleNames = ruleNames,
    InternetFacing,
    VulnerableToRCE,
    VulnerableToPrivEsc,
    ExposureScore,
    NodeLabel
| order by CriticalityLevel asc

Purpose: Inventory critical assets with exposure flags — feeds into prioritization.

Criticality Levels:

  • 0-1: Most critical (domain controllers, high-value servers)
  • 2-3: High priority
  • 4+: Standard (excluded from this query)

Note on zero results: If this query returns 0 results, it means no devices have criticality classifications. To verify the property structure, check the raw NodeProperties with:

ExposureGraphNodes
| where set_has_element(Categories, "device")
| extend rawData = parse_json(tostring(parse_json(tostring(NodeProperties)).rawData))
| project NodeName, rawData
| take 5

Criticality is auto-assigned for domain controllers (Level 0) and can be manually assigned in the Exposure Management portal.
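
The double `parse_json(tostring())` extraction used in Query 7 can be tried on inline data before running it against the graph. In this sketch the node names and JSON payloads are hypothetical, and `NodeProperties` is modelled as a plain string to mimic what the MCP layer returns.

```kql
// Hypothetical nodes: one with a criticality classification, one without.
datatable(NodeName:string, NodeProperties:string)
[
    "dc-01", '{"rawData":{"criticalityLevel":{"criticalityLevel":0,"ruleNames":["Domain Controller"]}}}',
    "ws-17", '{"rawData":{"deviceName":"ws-17"}}'
]
// Parse the outer string, re-serialize rawData, then parse it again.
| extend rawData = parse_json(tostring(parse_json(tostring(NodeProperties)).rawData))
| extend critValue = toint(rawData.criticalityLevel.criticalityLevel)
// Same cut-off as Query 7: classified and below level 4.
| extend IsCritical = isnotnull(critValue) and critValue < 4
| project NodeName, critValue, IsCritical
```

The node without a `criticalityLevel` property ends up with an empty `critValue` and is not flagged, which mirrors how Query 7 excludes unclassified devices.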


Query 8: High-Impact Misconfigurations

DeviceTvmSecureConfigurationAssessment
| summarize arg_max(Timestamp, *) by DeviceId, ConfigurationId
| where IsCompliant == false and IsApplicable == true
| summarize AffectedDevices = dcount(DeviceId), DeviceList = make_set(DeviceName) by ConfigurationId
| join kind=inner DeviceTvmSecureConfigurationAssessmentKB on ConfigurationId
| project 
    ConfigurationId,
    ConfigurationName,
    ConfigurationCategory,
    ConfigurationSubcategory,
    ConfigurationImpact,
    RiskDescription,
    RemediationOptions,
    AffectedDevices,
    DeviceList
| order by ConfigurationImpact desc, AffectedDevices desc
| take 20

Purpose: Top misconfigurations ranked by impact score with actionable remediation steps from the KB.

ConfigurationImpact scores:

  • 9-10: Critical — must remediate immediately
  • 7-8: High — remediate in short term
  • 4-6: Medium — plan remediation
  • 1-3: Low — monitor
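
If the report needs a label rather than a raw score, the bands above can be expressed as a `case()` expression. The sketch below uses made-up configuration IDs and scores; the same `extend` could be appended to Query 8 after its `project` step.

```kql
// Hypothetical IDs and scores to exercise each band.
datatable(ConfigurationId:string, ConfigurationImpact:real)
[
    "scid-example-1", 9.0,
    "scid-example-2", 7.5,
    "scid-example-3", 5.0,
    "scid-example-4", 2.0
]
// Map the numeric impact score onto the bands listed above.
| extend ImpactBand = case(
    ConfigurationImpact >= 9, "Critical",
    ConfigurationImpact >= 7, "High",
    ConfigurationImpact >= 4, "Medium",
    "Low")
| project ConfigurationId, ConfigurationImpact, ImpactBand
```

Using `>=` thresholds means fractional scores such as 8.5 fall into the lower band (High), which is a design choice of this sketch rather than something the skill specifies.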

Query 9: Defender Health Fleet Summary

// Defender Health Fleet Summary — compliance by control × OS × active/inactive status
// Active = DeviceInfo last seen within 7 days; Inactive = last seen > 7 days ago
// SCID Mapping:
//   Windows: scid-2010 (AVMode), scid-2011 (AVSignatures), scid-2012 (RTP),
//            scid-2013 (PUA), scid-2016 (CloudProtection), scid-2003 (TamperProtection),
//            scid-91 (BehaviourMonitoring), scid-2030 (CoreComponentsUpdate)
//   macOS:   scid-5090 (RTP), scid-5091 (PUA), scid-5094 (Cloud), scid-5095 (AVSigs)
//   Linux:   scid-6090 (RTP), scid-6091 (PUA), scid-6094 (Cloud), scid-6095 (AVSigs)
let defenderSCIDs = dynamic([
    "scid-2010", "scid-2011", "scid-2012", "scid-2013", "scid-2016", 
    "scid-2003", "scid-91", "scid-2030",
    "scid-5090", "scid-5091", "scid-5094", "scid-5095",
    "scid-6090", "scid-6091", "scid-6094", "scid-6095"
]);
let deviceStatus = DeviceInfo
| summarize arg_max(Timestamp, DeviceName, OSPlatform) by DeviceId
| extend DeviceActivity = iff(Timestamp > ago(7d), "Active", "Inactive");
DeviceTvmSecureConfigurationAssessment
| where ConfigurationId in~ (defenderSCIDs)
| where IsApplicable == 1
| summarize arg_max(Timestamp, *) by DeviceId, ConfigurationId
| extend Control = case(
    ConfigurationId =~ "scid-2010", "AVMode",
    ConfigurationId =~ "scid-2011", "AVSignatures",
    ConfigurationId =~ "scid-2012", "RealtimeProtection",
    ConfigurationId =~ "scid-2013", "PUAProtection",
    ConfigurationId =~ "scid-2016", "CloudProtection",
    ConfigurationId =~ "scid-2003", "TamperProtection",
    ConfigurationId =~ "scid-91", "BehaviourMonitoring",
    ConfigurationId =~ "scid-2030", "CoreComponentsUpdate",
    ConfigurationId =~ "scid-5090", "RealtimeProtection",
    ConfigurationId =~ "scid-5091", "PUAProtection",
    ConfigurationId =~ "scid-5094", "CloudProtection",
    ConfigurationId =~ "scid-5095", "AVSignatures",
    ConfigurationId =~ "scid-6090", "RealtimeProtection",
    ConfigurationId =~ "scid-6091", "PUAProtection",
    ConfigurationId =~ "scid-6094", "CloudProtection",
    ConfigurationId =~ "scid-6095", "AVSignatures",
    ConfigurationId)
| join kind=leftouter deviceStatus on DeviceId
| extend DeviceActivity = coalesce(DeviceActivity, "Unknown")
| summarize
    Compliant = countif(IsCompliant == 1),
    NonCompliant = countif(IsCompliant == 0),
    TotalDevices = dcount(DeviceId)
    by Control, OSPlatform, DeviceActivity
| extend ComplianceRate = round(100.0 * Compliant / (Compliant + NonCompliant), 1)
| order by DeviceActivity asc, Control asc, OSPlatform asc

Purpose: Fleet-scale Defender for Endpoint health dashboard. Shows compliance rates for each security control by OS platform, split by active/inactive device status. Designed for environments with 1000+ devices — does NOT list individual devices.

Defender Controls Assessed:

| Control | Description | Critical? |
|---|---|---|
| AVMode | Antivirus running in Active mode (vs Passive/EDR Blocked) | 🔴 Yes |
| AVSignatures | Antivirus signature definitions are current | 🟠 High |
| RealtimeProtection | Real-time file scanning enabled | 🔴 Yes |
| PUAProtection | Potentially Unwanted Application blocking enabled | 🟡 Medium |
| CloudProtection | Cloud-delivered protection (MAPS) enabled | 🟠 High |
| TamperProtection | Tamper Protection prevents disabling security settings | 🔴 Yes |
| BehaviourMonitoring | Behavioral analysis and monitoring enabled | 🟠 High |
| CoreComponentsUpdate | MDE unified agent / core components current | 🟡 Medium |

Active vs Inactive Classification:

  • Active: Device last seen in DeviceInfo within 7 days — these are operational endpoints
  • Inactive: Device last seen > 7 days ago — stale signature data is expected and should NOT be flagged as a security gap

Interpretation guidance: Focus on active devices with non-compliant critical controls (AVMode, RTP, TamperProtection). Inactive devices with stale AVSignatures are expected — report as "X inactive devices not reporting" rather than "X devices with outdated signatures."

SCID reference: Based on Jeffrey Appel's Defender health guide and Azure/Azure-Sentinel MDE_DeviceHealth.YAML.


Query 10a: Vulnerable Device Exposure Summary

ExposureGraphNodes
| where set_has_element(Categories, "device")
| extend rawData = parse_json(tostring(parse_json(tostring(NodeProperties)).rawData))
| extend HasHighCritVulns = isnotnull(rawData.highRiskVulnerabilityInsights) 
    and tostring(parse_json(tostring(rawData.highRiskVulnerabilityInsights)).hasHighOrCritical) == "true"
| extend VulnerableToRCE = isnotnull(rawData.vulnerableToRCE)
| extend VulnerableToPrivEsc = isnotnull(rawData.VulnerableToPrivilegeEscalation)
| extend InternetFacing = isnotnull(rawData.IsInternetFacing)
| extend critLevel = rawData.criticalityLevel
| extend IsCritical = isnotnull(critLevel) and toint(critLevel.criticalityLevel) < 4
| summarize 
    TotalDevices = count(),
    HighCritVulnDevices = countif(HasHighCritVulns),
    RCEVulnDevices = countif(VulnerableToRCE),
    PrivEscVulnDevices = countif(VulnerableToPrivEsc),
    InternetFacingDevices = countif(InternetFacing),
    InternetFacingWithHighCritVulns = countif(InternetFacing and HasHighCritVulns),
    CriticalDevices = countif(IsCritical),
    CriticalWithHighCritVulns = countif(IsCritical and HasHighCritVulns)

Purpose: Fast single-table scan that produces executive-level exposure headlines:

  • "X of Y devices have high/critical vulnerabilities"
  • "Z internet-facing devices are vulnerable"
  • "N critical assets have exploitable weaknesses"

Performance: ⚡ Fast. This is a single ExposureGraphNodes scan with no graph-match, and it typically completes in under 5 seconds.

Key property: highRiskVulnerabilityInsights.hasHighOrCritical is the reliable vulnerability flag on device nodes. The property is a nested JSON string requiring parse_json(tostring(...)) to extract. See queries/cloud/exposure_graph_attack_paths.md Node Property Reference for full details.


Query 10b: Edge Connectivity from Vulnerable Devices

let VulnDevices = ExposureGraphNodes
| where set_has_element(Categories, "device")
| extend rawData = parse_json(tostring(parse_json(tostring(NodeProperties)).rawData))
| where isnotnull(rawData.highRiskVulnerabilityInsights)
| extend HasHighCritVulns = tostring(parse_json(tostring(rawData.highRiskVulnerabilityInsights)).hasHighOrCritical) == "true"
| where HasHighCritVulns
| project NodeId;
let TargetNodes = ExposureGraphNodes
| project NodeId, TargetName = NodeName, TargetCategories = Categories, TargetLabel = NodeLabel;
ExposureGraphEdges
| join kind=inner VulnDevices on $left.SourceNodeId == $right.NodeId
| join kind=inner TargetNodes on $left.TargetNodeId == $right.NodeId
| extend TargetType = case(
    set_has_element(TargetCategories, "identity"), "Identity",
    set_has_element(TargetCategories, "compute"), "Compute",
    set_has_element(TargetCategories, "data"), "Data Store",
    set_has_element(TargetCategories, "ip_address"), "IP Address",
    tostring(TargetCategories))
| summarize 
    PathCount = count(),
    UniqueTargets = dcount(TargetNodeId),
    SampleTargets = make_set(TargetName, 5)
    by EdgeLabel, TargetType
| order by PathCount desc

Purpose: Shows the 1-hop blast radius shape from vulnerable devices WITHOUT expensive graph-match. Reveals:

  • How many identities can be reached (lateral movement risk)
  • How many Azure resources are reachable (data exfiltration risk)
  • Which edge types dominate (authentication vs permissions vs network)

Performance: ⚡ Fast — join-based aggregation, no make-graph or graph-match. Runs in <10 seconds even on large graphs.

Interpretation: High counts on "can authenticate as" edges to identities indicate lateral movement risk. High counts on "has permissions to" edges to data stores indicate data exfiltration risk. Feed the most concerning edge types into Q16 (optional deep-dive) if needed.

Portal deep-dive: For interactive multi-hop attack path exploration, use the Exposure Management Attack Paths portal.


Query 11: Defender Health Non-Compliant Exceptions

// Defender Health Non-Compliant Exceptions — exception-based, active devices only
// Groups non-compliant controls per device for fleet-scale readability
// Inactive devices excluded — stale signatures on offline devices are expected
let defenderSCIDs = dynamic([
    "scid-2010", "scid-2011", "scid-2012", "scid-2013", "scid-2016", 
    "scid-2003", "scid-91", "scid-2030",
    "scid-5090", "scid-5091", "scid-5094", "scid-5095",
    "scid-6090", "scid-6091", "scid-6094", "scid-6095"
]);
let deviceStatus = DeviceInfo
| summarize arg_max(Timestamp, DeviceName, OSPlatform) by DeviceId
| extend DeviceActivity = iff(Timestamp > ago(7d), "Active", "Inactive"),
         LastSeen = Timestamp;
DeviceTvmSecureConfigurationAssessment
| where ConfigurationId in~ (defenderSCIDs)
| where IsApplicable == 1
| where IsCompliant == 0
| summarize arg_max(Timestamp, *) by DeviceId, ConfigurationId
| extend Control = case(
    ConfigurationId =~ "scid-2010", "AVMode",
    ConfigurationId =~ "scid-2011", "AVSignatures",
    ConfigurationId =~ "scid-2012", "RealtimeProtection",
    ConfigurationId =~ "scid-2013", "PUAProtection",
    ConfigurationId =~ "scid-2016", "CloudProtection",
    ConfigurationId =~ "scid-2003", "TamperProtection",
    ConfigurationId =~ "scid-91", "BehaviourMonitoring",
    ConfigurationId =~ "scid-2030", "CoreComponentsUpdate",
    ConfigurationId =~ "scid-5090", "RealtimeProtection",
    ConfigurationId =~ "scid-5091", "PUAProtection",
    ConfigurationId =~ "scid-5094", "CloudProtection",
    ConfigurationId =~ "scid-5095", "AVSignatures",
    ConfigurationId =~ "scid-6090", "RealtimeProtection",
    ConfigurationId =~ "scid-6091", "PUAProtection",
    ConfigurationId =~ "scid-6094", "CloudProtection",
    ConfigurationId =~ "scid-6095", "AVSignatures",
    ConfigurationId)
| join kind=inner deviceStatus on DeviceId
| where DeviceActivity == "Active"
| summarize 
    NonCompliantControls = make_set(Control),
    FailedCount = dcount(Control),
    HighestImpact = max(toreal(ConfigurationImpact))
    by DeviceName, OSPlatform, LastSeen
| order by FailedCount desc, HighestImpact desc
| take 100

Purpose: Exception-based reporting — only surfaces active devices failing Defender health controls. Groups all non-compliant controls per device for fleet-scale readability (one row per problem device, not one row per failed check).

Design for scale:

  • Inner join with DeviceInfo → only active devices (seen within 7 days)
  • Summarize by device → one row per device listing all failed controls as an array
  • take 100 → practical limit for very large environments; increase if needed
  • Inactive devices excluded → stale signatures on offline devices are expected, not actionable
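The design points above (active-only join, one row per device, row cap) can be sketched in Python. This is an illustration under assumptions, not the skill's implementation: the input shapes, function name, and the four-entry mapping subset are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Subset of the SCID-to-control mapping used in Query 11
SCID_TO_CONTROL = {
    "scid-2011": "AVSignatures",
    "scid-2012": "RealtimeProtection",
    "scid-5090": "RealtimeProtection",
    "scid-6095": "AVSignatures",
}

def summarize_failures(assessments, last_seen, now, active_days=7, limit=100):
    """Return one (device, failed_controls) row per active device.

    assessments: iterable of (device_id, scid) for non-compliant checks
    last_seen:   dict of device_id -> last DeviceInfo timestamp
    """
    cutoff = now - timedelta(days=active_days)
    failed = defaultdict(set)
    for device_id, scid in assessments:
        seen = last_seen.get(device_id)
        # Inner join plus Active filter: unknown or stale devices drop out
        if seen is None or seen <= cutoff:
            continue
        failed[device_id].add(SCID_TO_CONTROL.get(scid.lower(), scid))
    rows = [(device, sorted(controls)) for device, controls in failed.items()]
    # Most failed controls first, then device id for a stable order
    rows.sort(key=lambda row: (-len(row[1]), row[0]))
    return rows[:limit]
```

Because OS-specific SCIDs collapse to one control name, a device failing both scid-2012 and scid-5090 still counts RealtimeProtection once, matching dcount(Control) in the query.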

Note: If this query returns 0 results, that's a positive finding — report as "✅ All active devices pass all Defender health controls." If the fleet summary (Q9) shows non-compliant devices but all are Inactive, report as: "⚠️ X inactive devices have stale Defender configurations — verify if devices should be decommissioned or reconnected."


Query 13: Certificate Expiration Status

🔴 CRITICAL: DeviceTvmCertificateInfo does NOT have a DeviceName column. You MUST join with DeviceInfo to resolve device names. Using DeviceName directly will fail with SemanticError: Failed to resolve scalar expression named 'DeviceName'. The query below already includes the required join. If the table returns empty or error, skip gracefully — it requires Defender Vulnerability Management add-on licensing.

DeviceTvmCertificateInfo
| extend Status = case(
    ExpirationDate < now(), "Expired",
    ExpirationDate < datetime_add('day', 30, now()), "Expiring within 30 days",
    "Valid"
)
| where Status != "Valid"
| summarize CertCount = count() by Status, DeviceId
| join kind=inner (
    DeviceInfo | summarize arg_max(Timestamp, DeviceName) by DeviceId
) on DeviceId
| project DeviceName, Status, CertCount
| order by Status asc, CertCount desc

Purpose: Identify expired and soon-expiring certificates that can cause service outages or security gaps.

Note: DeviceTvmCertificateInfo does NOT have a DeviceName column — you must join with DeviceInfo to resolve device names. If the table returns empty or error, skip gracefully — it requires Defender Vulnerability Management add-on licensing.
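The status bucketing in Query 13 follows a simple ordered comparison. A minimal Python sketch of the same rule, assuming timezone-aware datetimes; the function name and the warn_days parameter are illustrative additions:

```python
from datetime import datetime, timedelta, timezone

def certificate_status(expiration, now, warn_days=30):
    """Bucket a certificate the same way the case() expression does."""
    if expiration < now:
        return "Expired"
    if expiration < now + timedelta(days=warn_days):
        return f"Expiring within {warn_days} days"
    return "Valid"
```

The order of the checks matters: an expired certificate also satisfies the second condition, so the expiry test has to come first, as it does in the query.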


Query 14: Top Vulnerable Software

DeviceTvmSoftwareVulnerabilities
| summarize 
    CriticalCVEs = countif(VulnerabilitySeverityLevel == "Critical"),
    HighCVEs = countif(VulnerabilitySeverityLevel == "High"),
    TotalCVEs = count(),
    AffectedDevices = dcount(DeviceId)
    by SoftwareVendor, SoftwareName
| order by CriticalCVEs desc, HighCVEs desc, TotalCVEs desc
| take 15

Purpose: Identify which software products contribute the most vulnerabilities — useful for upgrade/removal decisions.


Query 15: Internet-Facing Critical Assets with Vulnerabilities

ExposureGraphNodes
| where set_has_element(Categories, "device")
| extend rawData = parse_json(tostring(parse_json(tostring(NodeProperties)).rawData))
| extend critLevel = rawData.criticalityLevel
| where isnotnull(critLevel) and toint(critLevel.criticalityLevel) < 4
| where isnotnull(rawData.IsInternetFacing)
| extend VulnerableToRCE = isnotnull(rawData.vulnerableToRCE)
| extend VulnerableToPrivEsc = isnotnull(rawData.VulnerableToPrivilegeEscalation)
| project 
    DeviceName = NodeName,
    CriticalityLevel = toint(critLevel.criticalityLevel),
    VulnerableToRCE,
    VulnerableToPrivEsc,
    NodeLabel
| order by CriticalityLevel asc

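Purpose: Lists critical, internet-facing devices together with their RCE and privilege-escalation flags. A device with either flag set is the highest-risk combination (critical + internet-facing + vulnerable) and is always Priority 1 remediation.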


Query 16: Multi-Hop Attack Path Enumeration

⚠️ Optional — slow query. Only run when Q10a/Q10b reveal high exposure (e.g., many vulnerable devices with identity edges) and the user explicitly requests attack path enumeration. Skip by default in standard reports.

let IdentitiesAndCriticalDevices = ExposureGraphNodes
| extend rawData = parse_json(tostring(parse_json(tostring(NodeProperties)).rawData))
| extend HasRCEVuln = isnotnull(rawData.vulnerableToRCE)
| extend CritLevel = toint(rawData.criticalityLevel.criticalityLevel)
| extend HasCritLevel = isnotnull(rawData.criticalityLevel)
| where 
    (set_has_element(Categories, "device") and 
        (
            (HasCritLevel and CritLevel < 4)
            or 
            HasRCEVuln
        )
    )
    or 
    set_has_element(Categories, "identity");
ExposureGraphEdges
| where EdgeLabel in~ ("can authenticate as", "CanRemoteInteractiveLogonTo")
| make-graph SourceNodeId --> TargetNodeId with IdentitiesAndCriticalDevices on NodeId
| graph-match (DeviceWithRCE)-[CanConnectAs]->(Identity)-[CanRemoteLogin]->(CriticalDevice)
    where 
        CanConnectAs.EdgeLabel =~ "can authenticate as" and
        CanRemoteLogin.EdgeLabel =~ "CanRemoteInteractiveLogonTo" and
        set_has_element(Identity.Categories, "identity") and 
        set_has_element(DeviceWithRCE.Categories, "device") and DeviceWithRCE.HasRCEVuln and
        set_has_element(CriticalDevice.Categories, "device") and CriticalDevice.HasCritLevel
    project 
        RCEDeviceName = DeviceWithRCE.NodeName,
        IdentityName = Identity.NodeName,
        CriticalDeviceName = CriticalDevice.NodeName,
        CriticalityLevel = tostring(CriticalDevice.CritLevel)
| order by CriticalityLevel asc

Purpose: Discover multi-hop attack chains: RCE-vulnerable device → user identity → critical server. This is the heavy graph-match query — use Q10a/Q10b for fast summary stats, and only run this when deep enumeration is needed.

Note: This query may return 0 results if no RCE→identity→critical-device paths exist. That's a positive finding — report as "✅ No multi-hop attack paths from RCE-vulnerable devices to critical servers detected."

Performance: ⚠️ Slow — uses make-graph + graph-match. Can take 30-60+ seconds on large environments. Filter nodes tightly BEFORE make-graph to reduce graph size.
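To make the shape of the pattern concrete, here is a small Python sketch of the same two-hop match over an in-memory edge list. It is an illustration only, assuming edges are available as (source, label, target) tuples and that the three node sets were filtered beforehand; all names are hypothetical.

```python
def two_hop_paths(edges, rce_devices, identities, critical_devices):
    """Find device -> identity -> critical device chains.

    edges: iterable of (source_id, edge_label, target_id)
    """
    auth = {}   # RCE-vulnerable device -> identities it can authenticate as
    logon = {}  # identity -> critical devices it can log on to
    for src, label, dst in edges:
        lowered = label.lower()  # the KQL uses case-insensitive label matching
        if lowered == "can authenticate as" and src in rce_devices and dst in identities:
            auth.setdefault(src, set()).add(dst)
        elif lowered == "canremoteinteractivelogonto" and src in identities and dst in critical_devices:
            logon.setdefault(src, set()).add(dst)
    paths = []
    for device, device_identities in auth.items():
        for identity in device_identities:
            for target in logon.get(identity, ()):
                paths.append((device, identity, target))
    return sorted(paths)
```

Filtering the node sets before building the lookup tables plays the same role as filtering nodes before make-graph: the join work grows with the number of retained edges.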

Additional patterns: See queries/cloud/exposure_graph_attack_paths.md for 30+ query patterns covering cookie chains, permission analysis, choke point detection, and Azure Resource Graph integration.


Drill-Down Reference Queries

⚠️ These queries are NOT part of the standard report workflow. They use DeviceTvmSoftwareEvidenceBeta to map vulnerable software to actual file paths on disk. Use them for targeted drill-downs when the user asks to investigate a specific software's vulnerabilities, identify cleanup targets, or understand why a software has so many CVE versions.

Do NOT run these fleet-wide in large environments — the evidence table can be very large. Always scope to a specific SoftwareName and optionally a DeviceId.

When to Use

| Scenario | Query | Trigger |
|----------|-------|---------|
| User asks "why does software X have so many versions?" | Q17 | After Q14 reveals high version sprawl |
| User asks "what files are causing these CVEs?" | Q18 | After Q2 identifies exploitable CVEs for a software |
| User asks "what can I safely clean up?" | Q19 | After Q17/Q18 reveal old extension/app version folders |
| Standard vulnerability report | None | These queries are NOT used in standard reports |

DeviceTvmSoftwareEvidenceBeta — Table Reference

Beta table: Schema and table name may change in future Defender releases. The canonical table name is DeviceTvmSoftwareEvidenceBeta — NOT DeviceTvmSoftwareEvidences or DeviceTvmSoftwareEvidence.

| Column | Type | Description |
|--------|------|-------------|
| DeviceId | string | Device identifier (join with DeviceInfo for DeviceName) |
| SoftwareVendor | string | Software vendor name |
| SoftwareName | string | Software product name (matches DeviceTvmSoftwareVulnerabilities.SoftwareName) |
| SoftwareVersion | string | Detected version (matches DeviceTvmSoftwareVulnerabilities.SoftwareVersion) |
| DiskPaths | dynamic | JSON array of file paths where the software was detected on disk |
| RegistryPaths | dynamic | JSON array of registry keys evidencing the software installation |
| LastSeenTime | string | Last time evidence was observed |

Query 17: Version Sprawl by Source — Per-Software Summary

// Drill-down: For a specific software, show all versions with file locations
// categorized by source (Azure extension, application, standalone install, etc.)
// Scope: Single software — ALWAYS filter by SoftwareName
DeviceTvmSoftwareEvidenceBeta
| where SoftwareName =~ '<SOFTWARE_NAME>'
| extend Paths = parse_json(DiskPaths)
| mv-expand Path = Paths
| extend FilePath = tostring(Path)
| extend Source = case(
    FilePath has "Packages\\Plugins", "Azure Extension",
    FilePath has "Program Files\\Microsoft OneDrive", "OneDrive",
    FilePath has "WindowsApps", "Store App",
    FilePath has "Program Files\\dotnet", ".NET Runtime",
    FilePath has "Python", "Python",
    FilePath has "Windows\\System32", "System",
    FilePath has "Program Files\\", "Installed Software",
    FilePath has "dpkg-query", "Linux Package",
    "Other")
| join kind=inner (
    DeviceInfo | summarize arg_max(Timestamp, DeviceName) by DeviceId
) on DeviceId
| summarize 
    Versions = make_set(SoftwareVersion),
    FileCount = dcount(FilePath),
    Devices = make_set(DeviceName)
    by Source
| extend VersionCount = array_length(Versions), DeviceCount = array_length(Devices)
| order by FileCount desc

Purpose: High-level summary showing WHERE a software's vulnerable files come from — Azure extensions leaving old versions behind, OneDrive version-per-folder sprawl, Store apps, standalone installs, etc. Useful for identifying the root cause of version sprawl and choosing the right remediation approach.

Substitute: Replace <SOFTWARE_NAME> with the software from Q14 results (e.g., openssl, curl, zlib).

When to include in reports: This query produces a compact summary table suitable for including in reports when a specific software dominates the CVE count. Present it under Section 2c (Top Vulnerable Software) as a "Source Breakdown" sub-table for the worst offender.
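The Source column in Query 17 depends on rule order: the OneDrive and .NET paths must be tested before the generic Program Files rule. A minimal Python sketch of that first-match-wins logic, assuming plain case-insensitive substring matching as an approximation of the KQL has operator (which matches whole terms):

```python
# Ordered rules: the first matching fragment wins, as in the case() expression
SOURCE_RULES = [
    ("packages\\plugins", "Azure Extension"),
    ("program files\\microsoft onedrive", "OneDrive"),
    ("windowsapps", "Store App"),
    ("program files\\dotnet", ".NET Runtime"),
    ("python", "Python"),
    ("windows\\system32", "System"),
    ("program files\\", "Installed Software"),
    ("dpkg-query", "Linux Package"),
]

def classify_source(file_path):
    """Return the source category for one evidence path."""
    lowered = file_path.lower()
    for fragment, label in SOURCE_RULES:
        if fragment in lowered:
            return label
    return "Other"
```

Moving the generic "program files\\" rule to the top would label every OneDrive and .NET path as Installed Software, which is why the specific rules come first.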


Query 18: Vulnerable File Paths — CVE to File Mapping

// Drill-down: Map specific software versions to their on-disk file paths
// and correlate with CVE count per version
// Scope: Single software — ALWAYS filter by SoftwareName
let vulnVersions = DeviceTvmSoftwareVulnerabilities
| where SoftwareName =~ '<SOFTWARE_NAME>'
| summarize CVEs = make_set(CveId) by SoftwareVersion
| extend CVECount = array_length(CVEs);
DeviceTvmSoftwareEvidenceBeta
| where SoftwareName =~ '<SOFTWARE_NAME>'
| extend Paths = parse_json(DiskPaths)
| mv-expand Path = Paths
| extend FilePath = tostring(Path)
| join kind=inner (
    DeviceInfo | summarize arg_max(Timestamp, DeviceName) by DeviceId
) on DeviceId
| join kind=leftouter vulnVersions on SoftwareVersion
| summarize 
    Devices = make_set(DeviceName),
    DeviceCount = dcount(DeviceName)
    by FilePath, SoftwareVersion, CVECount
| order by CVECount desc, DeviceCount desc

Purpose: Maps every vulnerable file path to its version and CVE count. Shows exactly which files on which devices are contributing to CVE exposure. Key for building targeted cleanup scripts.

Substitute: Replace <SOFTWARE_NAME> with the target software name.

Common patterns revealed:

  • Azure extensions: C:\Packages\Plugins\<ExtensionName>\<OldVersion>\...\libcrypto-3-x64.dll — old extension versions left behind after upgrades, each bundling their own OpenSSL/curl/zlib
  • OneDrive: C:\Program Files\Microsoft OneDrive\<version>\ — every OneDrive update creates a new version folder with bundled libraries
  • Store apps: C:\Program Files\WindowsApps\<AppName_Version>\ — managed by Microsoft Store, stale versions auto-cleaned eventually
  • Standalone installs: C:\Program Files\<product>\ — requires manual update or reinstall

Query 19: Stale Extension Folder Detection

// Drill-down: Find OLD Azure extension version folders still on disk
// by comparing evidence paths against the latest installed version
// Scope: All Azure extension evidence — safe to run fleet-wide (small result set)
//
// ⚠️ PITFALL: Version comparison uses string max() which is LEXICOGRAPHIC.
//    "1.29.98" > "1.29.104" as strings because '9' > '1' at index 5 (0-based).
//    Review results manually — a "stale" folder with a higher numeric version
//    than "latest" means the comparison inverted. This is a known KQL limitation
//    for dotted version strings with variable-width segments.
DeviceTvmSoftwareEvidenceBeta
| extend Paths = parse_json(DiskPaths)
| mv-expand Path = Paths
| extend FilePath = tostring(Path)
| where FilePath has "packages" and FilePath has "plugins"
| extend ExtensionName = extract(@"plugins\\([^\\]+)", 1, FilePath)
| extend ExtensionVersion = extract(@"plugins\\[^\\]+\\([^\\]+)", 1, FilePath)
| where isnotempty(ExtensionName) and isnotempty(ExtensionVersion)
| join kind=inner (
    DeviceInfo | summarize arg_max(Timestamp, DeviceName) by DeviceId
) on DeviceId
| summarize 
    SoftwareVersions = make_set(SoftwareVersion),
    FileCount = dcount(FilePath),
    Devices = make_set(DeviceName)
    by ExtensionName, ExtensionVersion
| as hint.materialized=true AllExtVersions
| join kind=inner (
    AllExtVersions
    | summarize LatestVersion = max(ExtensionVersion) by ExtensionName
) on ExtensionName
| where ExtensionVersion != LatestVersion
| project ExtensionName, StaleVersion = ExtensionVersion, LatestVersion,
    BundledSoftwareVersions = SoftwareVersions, FileCount, Devices
| order by ExtensionName asc, StaleVersion asc

Purpose: Identifies old Azure extension version folders still present on disk after upgrades. These are the primary source of "phantom" CVEs from bundled libraries (OpenSSL, curl, zlib, etc.) that inflate vulnerability counts. Safe to run fleet-wide because it only returns stale folders (small result set).

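Known limitation: max(ExtensionVersion) uses lexicographic string comparison, which breaks for version segments with different digit counts (e.g., 1.29.98 vs 1.29.104). Always review results: if a "stale" version number looks higher than "latest," the comparison inverted. The query as written performs no numeric version comparison. KQL does provide parse_version(), which converts dotted numeric versions into comparable numbers, but Q19 does not use it; confirm it behaves as expected in your Advanced Hunting environment before relying on it.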
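The inversion is easy to reproduce outside KQL. The sketch below, in Python, contrasts string comparison with a numeric per-segment comparison; it assumes every segment is a plain integer, and the helper names are hypothetical.

```python
def version_key(version):
    """Turn '1.29.104' into (1, 29, 104) so segments compare numerically."""
    return tuple(int(part) for part in version.split("."))

def latest_version(versions):
    """Pick the highest version by numeric segments rather than by text."""
    return max(versions, key=version_key)

versions = ["1.29.98", "1.29.104"]
print(max(versions))             # string comparison picks 1.29.98
print(latest_version(versions))  # numeric comparison picks 1.29.104
```

A manual review of Q19 output amounts to applying this numeric comparison by eye to each "stale" versus "latest" pair.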

Regex note: extract() in KQL is case-sensitive. The evidence table stores paths in lowercase (c:\packages\plugins\...), so the regex uses lowercase plugins. The has operator used for filtering is case-insensitive.
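The same case-sensitivity behaviour can be seen with Python's re module, which is also case-sensitive by default. This sketch uses one combined pattern instead of the two extract() calls in Q19; the extension name in the usage example is a made-up placeholder.

```python
import re

# Lowercase pattern, matching how the evidence table is said to store paths
EXTENSION_PATTERN = re.compile(r"plugins\\([^\\]+)\\([^\\]+)")

def parse_extension(file_path):
    """Return (extension_name, extension_version) or None if no match."""
    match = EXTENSION_PATTERN.search(file_path)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(parse_extension(r"c:\packages\plugins\example.extension\1.29.98\libcrypto-3-x64.dll"))
print(parse_extension(r"C:\Packages\Plugins\example.extension\1.29.98\libcrypto-3-x64.dll"))
```

The second call returns None because the path uses an uppercase "Plugins", which is the failure mode the regex note warns about.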

Remediation pattern: For each stale extension version folder, the entire folder tree can be safely deleted:

Remove-Item -Recurse -Force "C:\Packages\Plugins\<ExtensionName>\<StaleVersion>"

After cleanup, TVM will reflect the reduced vulnerability count within 4-24 hours.

Common culprits: Azure Monitor Agent (AzureMonitorWindowsAgent), Guest Configuration Agent (ConfigurationforWindows), Azure Security Center (MicrosoftMonitoringAgent), and other Azure Arc extensions that bundle OpenSSL, curl, or zlib.


Risk Assessment

Compute an overall risk rating based on query results:

| Rating | Criteria |
|--------|----------|
| 🔴 Critical | Any: exploitable Critical CVEs on internet-facing assets, OR compliance rate < 40%, OR internet-facing devices with high/critical vulnerabilities (Q10a), OR high blast radius from vulnerable devices to identities/data stores (Q10b) |
| 🟠 High | Any: exploitable High CVEs > 5, OR EoS software on critical assets, OR compliance rate < 60%, OR active devices with RTP/TamperProtection/AVMode non-compliant |
| 🟡 Medium | Any: total High CVEs > 50, OR EoS software present, OR compliance rate < 75%, OR expired certificates > 10 |
| 🟢 Low | None of the above criteria met |

Cite specific evidence when assigning risk level (per copilot-instructions.md Evidence-Based Analysis rule).
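The rating criteria can be read as an ordered rule cascade, checked from Critical down to Low. The following Python sketch shows one way to encode them; the Metrics field names are hypothetical, and qualitative criteria such as "high blast radius" are modelled as booleans that the analyst sets after reviewing Q10b.

```python
from dataclasses import dataclass

@dataclass
class Metrics:
    exploitable_critical_on_internet_facing: int = 0
    compliance_rate: float = 100.0            # percent
    internet_facing_high_crit_devices: int = 0
    high_blast_radius: bool = False           # analyst judgement from Q10b
    exploitable_high_cves: int = 0
    eos_on_critical_assets: bool = False
    core_av_controls_noncompliant: bool = False  # RTP/TamperProtection/AVMode
    total_high_cves: int = 0
    eos_present: bool = False
    expired_certificates: int = 0

def risk_rating(m):
    """Return the first rating whose criteria are met, most severe first."""
    if (m.exploitable_critical_on_internet_facing > 0 or m.compliance_rate < 40
            or m.internet_facing_high_crit_devices > 0 or m.high_blast_radius):
        return "Critical"
    if (m.exploitable_high_cves > 5 or m.eos_on_critical_assets
            or m.compliance_rate < 60 or m.core_av_controls_noncompliant):
        return "High"
    if (m.total_high_cves > 50 or m.eos_present
            or m.compliance_rate < 75 or m.expired_certificates > 10):
        return "Medium"
    return "Low"
```

Whatever rating comes out, the report should still cite the specific metric that triggered it, as the evidence rule requires.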


Report Template

Inline Chat Executive Summary

📊 VULNERABILITY & EXPOSURE REPORT — <DATE>
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

**Overall Risk:** 🔴 / 🟠 / 🟡 / 🟢 <RATING> — <1-sentence justification with evidence>

### Vulnerability Overview
| Severity | CVE Count | Devices Affected |
|----------|-----------|------------------|
| 🔴 Critical | X | Y |
| 🟠 High | X | Y |
| 🟡 Medium | X | Y |
| 🔵 Low | X | Y |

⚠️ **X CVEs with known exploits** — see full report for details

### Configuration Compliance
| Category | Compliant % | Non-Compliant |
|----------|-------------|---------------|
| OS | X% | Y |
| Network | X% | Y |
| Security Controls | X% | Y |
| Accounts | X% | Y |
| Application | X% | Y |

### Attack Path Exposure
| Metric | Count |
|--------|-------|
| Devices with high/critical vulnerabilities | X of Y |
| Internet-facing vulnerable devices | Z |
| Critical assets with vulnerabilities | N |
| Lateral movement paths (identity edges) | X → Y targets |
| Data access paths (permission edges) | X → Y targets |

🔗 **Full interactive attack paths:** [Exposure Management Portal](https://security.microsoft.com/exposure-management/attack-paths)

### Defender Device Health
**Active Devices:** X/Y controls fully compliant across Z active devices
**Inactive Devices:** N devices not reporting (excluded — stale signatures expected)
⚠️ / ✅ **Non-compliant active devices:** <count and failed control names, or "None">

### Key Findings
- 🔴 <Critical finding 1>
- 🟠 <High finding 2>
- ⚠️ <Notable finding 3>
- ✅ <Positive finding>

### 🎯 TOP 3 PRIORITY ACTIONS
1. 🔴 <Action 1 — e.g., Patch X exploitable CVEs on internet-facing assets>
2. 🟠 <Action 2 — e.g., Remediate Y Impact-9 security misconfigurations>
3. ⚠️ <Action 3 — e.g., Upgrade Z end-of-support software>

📄 Full report: reports/exposure/vulnerability_exposure_report_<YYYYMMDD_HHMMSS>.md

Markdown File Structure

The full markdown report file MUST follow this structure:

# Vulnerability & Exposure Management Report

**Generated:** <DATE>
**Scope:** <Org-Wide / Device: HOSTNAME>
**Overall Risk Rating:** 🔴/🟠/🟡/🟢 <RATING>

---

## 1. Executive Summary
- Overall risk rating with evidence
- Key metrics dashboard
- Top 3 priority remediation actions

## 2. CVE Vulnerability Assessment

🔗 **Browse all CVEs in Defender portal:** [Weaknesses](https://security.microsoft.com/vulnerabilities) | [Software Inventory](https://security.microsoft.com/software-inventory)

### 2a. Severity Distribution
<Table: Severity × CVE Count × Device Count>

### 2b. Exploitable Vulnerabilities
<Table: CVE ID, CVSS, Description, Affected Devices — sorted by CVSS desc>

### 2c. Top Vulnerable Software
<Table: Vendor, Software, Critical/High/Total CVEs, Affected Devices>

### 2d. Per-Device Vulnerability Matrix
<Table: Device, OS, Critical/High/Med/Low/Total>

## 3. Security Configuration Compliance

🔗 **Detailed recommendations in Defender portal:** [Security Recommendations](https://security.microsoft.com/exposure-recommendations) | [Vulnerability Management Dashboard](https://security.microsoft.com/vulnerability-management/dashboard)

### 3a. Compliance by Category
<Table: Category, Total, Compliant %, Non-Compliant>

### 3b. Per-Device Compliance Scorecard
<Table: Device, Compliance %, Compliant/NonCompliant/NA counts>

### 3c. High-Impact Misconfigurations (Impact ≥ 8)
For each misconfiguration:
- **Configuration:** <Name>
- **Category:** <Category> > <Subcategory>
- **Impact Score:** <Score>/10
- **Risk:** <RiskDescription>
- **Affected Devices:** <count> (<device list>)
- **Remediation:** <Summary of RemediationOptions — strip HTML tags>

## 4. End-of-Support Software
<Table: Vendor, Software, Version, EoS Status, EoS Date, Affected Devices>

## 5. Exposure Management

### 5a. Critical Asset Inventory
<Table: Device, Criticality Level, Internet-Facing, RCE Vuln, PrivEsc Vuln>

### 5b. Attack Path & Exposure Analysis

**Vulnerable Device Exposure (Q10a):**
| Metric | Count |
|--------|-------|
| Total devices | X |
| Devices with high/critical vulnerabilities | Y |
| Internet-facing vulnerable devices | Z |
| RCE-vulnerable devices | N |
| Critical assets with vulnerabilities | N |

**Blast Radius from Vulnerable Devices — 1-Hop Connectivity (Q10b):**
| Edge Type | Target Type | Path Count | Unique Targets | Sample Targets |
|-----------|-------------|------------|----------------|----------------|
| can authenticate as | Identity | X | Y | ... |
| has permissions to | Data Store | X | Y | ... |
| ... | ... | ... | ... | ... |

**Interpretation:** <Narrative summarizing lateral movement risk, data access risk, and key choke points>

🔗 **Full interactive attack path analysis:** [Exposure Management Portal](https://security.microsoft.com/exposure-management/attack-paths)

> If Q16 was run (optional deep-dive):
> **Multi-Hop Attack Chains (Q16):** <Table: Entry Device → Identity → Target Device / Criticality>
> Or: "✅ No multi-hop attack paths from RCE-vulnerable devices to critical servers detected."

## 6. Endpoint Health

### 6a. Defender Device Health
**Fleet Summary (Active Devices):** <Table: Control × OS Platform × Compliant / NonCompliant / ComplianceRate — active devices only>
**Inactive Device Summary:** <Count of inactive devices by OS — signature staleness is expected, flag for decommissioning review>
**Non-Compliant Exceptions (Active Only):** <Table: Device, OS, Failed Controls, Count — only active devices failing Defender controls>
If no non-compliant active devices: "✅ All active devices pass all Defender health controls"
If non-compliant only on inactive: "⚠️ X inactive devices have stale Defender configurations — verify if devices should be decommissioned or reconnected"

### 6b. Certificate Status
<Table: Device, Expired/Expiring count>

## 7. Prioritized Remediation Plan

🔗 **Track remediation in Defender portal:** [Remediation Activities](https://security.microsoft.com/vulnerability-management/remediation) | [Security Recommendations](https://security.microsoft.com/exposure-recommendations)

| Priority | Category | Action | Impact |
|----------|----------|--------|--------|
| 🔴 Immediate | ... | ... | ... |
| 🟠 Short-term | ... | ... | ... |
| 🟡 Medium-term | ... | ... | ... |
| 🟢 Ongoing | ... | ... | ... |

## 8. Appendix
- Query reference (all KQL queries used)
- Data freshness notes
- Methodology

Per-Device Mode

When user specifies a device name, scope all DeviceTvm queries to that device:

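Add this filter to Queries 1-6, 8, 9, 11, 13, 14 (for Q13, apply it after the DeviceInfo join, because DeviceTvmCertificateInfo has no DeviceName column):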

| where DeviceName startswith '<DEVICE_NAME>'  // Use startswith — DeviceName is often FQDN (e.g., hostname.domain.com)

ExposureGraph queries (7, 15): Filter by NodeName:

| where NodeName has '<DEVICE_NAME>'  // Use has — NodeName may be FQDN, short name, or contain domain suffix

Per-device report differences:

  • Section 5b (Attack paths) — filter to paths involving the specific device
  • Title changes to: Vulnerability & Exposure Report — <DEVICE_NAME>
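One side effect of prefix matching is worth keeping in mind: a short name such as srv1 is also a prefix of srv10.domain.com. The Python sketch below contrasts the plain prefix test with a stricter variant that requires the short name to be followed by a dot or end of string. The stricter check is an illustrative suggestion, not part of the skill, and the function names are hypothetical.

```python
def prefix_match(device_name, short_name):
    """Case-insensitive prefix test, similar in spirit to KQL startswith."""
    return device_name.lower().startswith(short_name.lower())

def strict_match(device_name, short_name):
    """Match the bare short name or the short name followed by a domain."""
    name, short = device_name.lower(), short_name.lower()
    return name == short or name.startswith(short + ".")

for candidate in ["srv1", "SRV1.domain.com", "srv10.domain.com"]:
    print(candidate, prefix_match(candidate, "srv1"), strict_match(candidate, "srv1"))
```

When a per-device report unexpectedly covers more than one host, checking the matched DeviceName values for this kind of prefix collision is a reasonable first step.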

Known Pitfalls

| Pitfall | Impact | Mitigation |
|---------|--------|------------|
| DeviceName in TVM tables is stored as FQDN (e.g., hostname.domain.com) | DeviceName =~ 'hostname' returns 0 results because an exact match fails on the FQDN | MUST use DeviceName startswith '<short_name>' for per-device filtering. startswith matches both short names and FQDNs. Same applies to ExposureGraphNodes.NodeName: use has instead of =~ |
| DeviceTvmCertificateInfo requires Defender VM add-on | Query returns empty or error | Skip gracefully, note in report: "Certificate data requires Defender Vulnerability Management add-on" |
| DeviceTvmBrowserExtensions may be empty | No browser extension data | Skip section, note as "No browser extension data available" |
| DeviceTvmSoftwareVulnerabilitiesKB has a specific schema | Ad-hoc project using non-existent columns (CveDescription, ExploitTypes, ExploitVerified, IsExploitVerified, RecommendedSecurityUpdate, RecommendedSecurityUpdateId) returns Failed to resolve scalar expression | Verified columns (via getschema): CveId, CvssScore, CvssVector, CveSupportability, IsExploitAvailable (bool), VulnerabilitySeverityLevel, LastModifiedTime, PublishedDate, VulnerabilityDescription, AffectedSoftware (dynamic), EpssScore (real). There are NO columns named ExploitTypes, ExploitVerified, RecommendedSecurityUpdate, or RecommendedSecurityUpdateId. Those exist on DeviceTvmSoftwareVulnerabilities (the main table), not the KB. Use getschema before adding ad-hoc columns. Stick to skill queries and do NOT improvise projections |
| RemediationOptions in KB tables contains HTML | Raw HTML in output | Strip HTML tags when rendering in markdown: remove <br/>, <ol>, <li>, <a> tags, convert to plain text bullet points |
| NodeProperties is a JSON string, NOT a parsed dynamic object | Direct dot-notation like NodeProperties.rawData.criticalityLevel returns null through MCP JSON serialization, so queries silently return 0 results | MUST use double parse_json(tostring()) extraction: parse_json(tostring(parse_json(tostring(NodeProperties)).rawData)) then access sub-properties. This is the ONLY reliable pattern for NodeProperties access. See Q7, Q10a, Q10b, Q15, Q16 for canonical examples |
| ConfigurationBenchmarks in KB contains benchmark mappings | Can enrich report | Optional: extract CIS/NIST benchmark references for compliance mapping |
| DeviceTvm assessments refresh periodically | Data may be 12-24h old | Note data freshness in report appendix |
| DeviceTvmSecureConfigurationAssessment with Timestamp > ago(1d) returns 0 results | Lab, weekend, and low-activity environments may not have assessments in the last 24h. The ago(1d) filter silently drops all data and is the #1 cause of empty Q3/Q6/Q8 results | NEVER use Timestamp > ago(1d) as a pre-filter. Use summarize arg_max(Timestamp, *) by DeviceId, ConfigurationId to dedup to the latest assessment per device×config without a time floor. Q9 and Q11 already use this pattern correctly |
| graph-match queries can be slow on large graphs | Timeout possible | Filter nodes BEFORE make-graph to reduce graph size |
| parse_json() and graph-match project produce dynamic-typed columns | order by fails with "key can't be of dynamic type" error | Always wrap in explicit type casts (toint(), tostring(), tolong()) before using in order by, summarize, or comparisons. Applies to ALL parse_json() output, not just graph-match. Example: \| extend critValue = toint(rawData.criticalityLevel.criticalityLevel) then \| order by critValue asc |
| DeviceTvmInfoGathering table exists but is NOT used by this skill | Agent may attempt to query it for Defender health data, causing errors due to unfamiliar schema | Defender sensor health is covered by Q9 (SCIDs in DeviceTvmSecureConfigurationAssessment). Do NOT improvise queries against DeviceTvmInfoGathering: its schema differs from other DeviceTvm* tables and is not documented here |
| DeviceTvmCertificateInfo has NO DeviceName column | Failed to resolve scalar expression named 'DeviceName' | Join with DeviceInfo \| summarize arg_max(Timestamp, DeviceName) by DeviceId to resolve device names |
| Context in DeviceTvmSecureConfigurationAssessment is double-nested JSON | First parse_json(Context) returns an array of JSON strings; items need a second parse_json() to extract values | Use parse_json(tostring(parse_json(Context)[0]))[N], e.g., [0] for AV mode code, [2] for signature date |
| SCID numbers are OS-specific: the same control has different IDs per platform | Querying Windows SCIDs on macOS/Linux returns IsApplicable=0 | Use the SCID mapping: Windows 2010-2030, macOS 5090-5095, Linux 6090-6095. Q9/Q11 normalize OS-specific SCIDs to unified control names |
| Inactive devices have naturally stale AV signatures | Non-compliant AVSignatures on devices offline >7 days is expected, not a security gap | Always join DeviceInfo to separate active (seen <7d) from inactive devices; report inactive signature staleness as informational only |
| DeviceTvmSoftwareEvidenceBeta is a Beta table | Table name and schema may change in future Defender releases | Use exact name DeviceTvmSoftwareEvidenceBeta, NOT DeviceTvmSoftwareEvidences or DeviceTvmSoftwareEvidence. If the table returns SemanticError, it may have been renamed or graduated to GA; check FetchAdvancedHuntingTablesOverview for the current name |
| DeviceTvmSoftwareEvidenceBeta has no DeviceName column | Cannot display device names directly | Join with DeviceInfo \| summarize arg_max(Timestamp, DeviceName) by DeviceId (same pattern as DeviceTvmCertificateInfo) |
| DiskPaths and RegistryPaths are dynamic arrays | Need parse_json() + mv-expand to flatten into individual paths | Pattern: \| extend Paths = parse_json(DiskPaths) \| mv-expand Path = Paths \| extend FilePath = tostring(Path) |
| Evidence queries can be expensive fleet-wide | Large environments have millions of file evidence rows | ALWAYS scope to a specific SoftwareName. Never run DeviceTvmSoftwareEvidenceBeta without a filter |
| max() on version strings is lexicographic | "1.29.98" > "1.29.104" because '9' > '1' at the 6th character, which inverts the comparison for multi-digit segments | Q19 compares versions as plain strings, so its results must be manually reviewed |
| extract() regex is case-sensitive | Evidence table paths are lowercase (c:\packages\plugins\...), but regex patterns with uppercase (e.g., Plugins) won't match | Always use lowercase in extract() patterns for file paths. Use case-insensitive has for filtering |
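For the RemediationOptions pitfall above, the HTML-to-bullets conversion can be sketched in Python with the standard library. This is a rough regex-based illustration that assumes simple, well-formed fragments; the function name is hypothetical, and a real HTML parser would be more robust for arbitrary markup.

```python
import re
from html import unescape

def strip_html(remediation_html):
    """Convert a small HTML fragment into plain-text lines with '-' bullets."""
    # Each list item starts a new bullet line
    text = re.sub(r"<\s*li\b[^>]*>", "\n- ", remediation_html, flags=re.IGNORECASE)
    # Line breaks become newlines
    text = re.sub(r"<\s*br\s*/?\s*>", "\n", text, flags=re.IGNORECASE)
    # Drop every remaining tag (<ol>, </li>, <a ...>, and so on)
    text = re.sub(r"<[^>]+>", "", text)
    text = unescape(text)
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)

print(strip_html('Enable it:<ol><li>Open <a href="x">portal</a></li><li>Turn on</li></ol><br/>Done'))
```

Link targets are discarded along with the anchor tags, so only the visible link text survives in the rendered report.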


Error Handling

| Error | Cause | Resolution |
| --- | --- | --- |
| SemanticError: Failed to resolve table 'DeviceTvm...' | Table not available in AH | Verify Defender for Endpoint is onboarded; some DeviceTvm* tables require premium licensing |
| SemanticError: Failed to resolve table 'ExposureGraphNodes' | Exposure Management not enabled | Report as: "⚠️ Microsoft Security Exposure Management is not enabled in this tenant. ExposureGraph sections skipped." |
| Query timeout on graph-match | Graph too large | Reduce node set with tighter filters; try simpler edge queries first |
| Empty results from DeviceTvmSoftwareVulnerabilities | No onboarded devices or no vulns detected | Verify at least one device is MDE-onboarded: `DeviceInfo |
| DeviceTvmCertificateInfo not found | Requires Defender Vulnerability Management add-on | Skip section, note in report |

Graceful Degradation

If a table or query fails, do not abort the entire report. Skip the affected section and note it:

### 6b. Certificate Status
❓ Certificate data not available — `DeviceTvmCertificateInfo` table not found.
This may require the Defender Vulnerability Management add-on license.

Continue with all remaining sections. The report should always produce output for at least:

  • CVE Vulnerability Assessment (Sections 2a-2d)
  • Security Configuration Compliance (Sections 3a-3c)

These are available in all Defender for Endpoint tenants.


Additional References

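Audits the organization's identity security posture. Built on the IdentityAccountInfo table, combined with IdentityInfo and logon data, it evaluates account inventory, privileged accounts, stale/deleted accounts, password policy, risk distribution, and multi-provider identity linking, providing a comprehensive identity hygiene and risk analysis.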
identity posture, identity security report, account hygiene, stale accounts, privileged accounts, password posture, identity providers, multi-provider identity, identity sprawl, service accounts, deleted accounts with roles, cross-IdP, honeytoken, sensitive accounts
.github/skills/identity-posture/SKILL.md
npx skills add SCStelz/security-investigator --skill identity-posture -g -y
SKILL.md
Frontmatter
{
    "name": "identity-posture",
    "description": "Audit identity security posture across the organization. Triggers on keywords like \"identity posture\", \"identity security report\", \"account hygiene\", \"stale accounts\", \"privileged accounts\", \"password posture\", \"identity providers\", \"multi-provider identity\", \"identity sprawl\", \"service accounts\", \"deleted accounts with roles\", \"cross-IdP\", \"honeytoken\", \"sensitive accounts\". Queries IdentityAccountInfo in Advanced Hunting (enriched with IdentityInfo and IdentityLogonEvents) for a posture assessment covering account inventory by provider, privileged account audit, stale\/deleted account hygiene, password posture, risk distribution, multi-provider identity linking, MDI tag analysis, and department-level insights. Inline chat or markdown output.",
    "drill_down_prompt": "Run identity posture report — account hygiene, privilege distribution, stale accounts",
    "threat_pulse_domains": [
        "identity"
    ]
}

Identity Security Posture — Instructions

Purpose

This skill audits the identity security posture across your organization using the IdentityAccountInfo table in Microsoft Defender XDR Advanced Hunting, enriched with IdentityInfo and IdentityLogonEvents for password policy and logon activity context.

Modern organizations use multiple identity providers (Entra ID, Active Directory, Okta, SailPoint, CyberArk, Ping, etc.). IdentityAccountInfo is the only table that provides a unified identity graph across these providers, linking accounts to a single IdentityId. This skill systematically evaluates the security posture of that identity fabric.

What this skill covers:

| Domain | Key Questions Answered |
| --- | --- |
| 🔍 Identity Inventory | How many accounts exist? Across which providers? What types and statuses? |
| 👑 Privileged Account Audit | Who holds high-privilege roles? Across which providers? Are they permanent? |
| 🗑️ Stale & Deleted Account Hygiene | Which enabled accounts have no logon activity? Do deleted accounts retain permissions? |
| 🔑 Password Posture | Password age distribution, PasswordNeverExpires/PasswordNotRequired flags (AD accounts via IdentityInfo join) |
| 🟠 Risk Distribution | How are identity risk levels distributed? Which high-risk accounts are still active? |
| 🔗 Multi-Provider Identity Linking | Which identities span multiple IdPs? Are there status mismatches across providers? |
| 🏷️ Sensitive & Honeytoken Accounts | Which accounts are MDI-tagged? Are sensitive accounts properly protected? |
| 🏢 Organizational Context | Account distribution by department, service account inventory |

Primary data source: IdentityAccountInfo table (Advanced Hunting) — currently in Preview.

Enrichment tables:

  • IdentityInfo — Adds UserAccountControl (PasswordNeverExpires, PasswordNotRequired), DistinguishedName, RiskLevel, BlastRadius, PrivilegedEntraPimRoles (Preview)
  • IdentityLogonEvents — Last logon timestamps across AD, Entra, Okta, SailPoint, M365 apps
  • SigninLogs — Last Entra ID sign-in for stale account detection (via Data Lake for 90d+ lookback)

References:

🔴 URL Registry — Canonical Links for Report Generation

MANDATORY: When generating reports, copy URLs verbatim from this registry. NEVER construct, guess, or paraphrase a URL. If a URL is not in this registry, omit the hyperlink entirely and use plain text.

| Label | Canonical URL |
| --- | --- |
| DOCS_IDENTITYACCOUNTINFO | https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-identityaccountinfo-table |
| DOCS_IDENTITYINFO | https://learn.microsoft.com/en-us/defender-xdr/advanced-hunting-identityinfo-table |
| DOCS_MDI_ACCOUNTS | https://learn.microsoft.com/en-us/defender-for-identity/security-posture-assessments/accounts |
| DOCS_MDI_HYBRID | https://learn.microsoft.com/en-us/defender-for-identity/security-posture-assessments/hybrid-security |
| DOCS_MDI_INFRA | https://learn.microsoft.com/en-us/defender-for-identity/security-posture-assessments/identity-infrastructure |
| GITHUB_VERBOON_PWD | https://github.com/alexverboon/Hunting-Queries-Detection-Rules/blob/main/Defender%20For%20Identity/MDI-Identity-Password%20Security%20Posture%20Assessment.md |

Why Identity Posture Matters

Identity is the new perimeter. Attackers consistently target credentials, stale accounts, and over-privileged identities as the path of least resistance into enterprise environments. Key risks this skill detects:

| Risk | Impact | Skill Detection |
| --- | --- | --- |
| Stale accounts | Dormant accounts with active permissions are prime targets for credential stuffing and lateral movement | Q5 (Stale Account Detection) |
| Deleted accounts with residual permissions | Accounts that are deleted but retain group memberships and role assignments create orphan access | Q6 (Deleted Account Hygiene) |
| Permanent privileged roles | Standing Global Admin / Security Admin roles violate least-privilege and increase blast radius | Q4 (Privileged Account Audit) |
| Password policy gaps | PasswordNeverExpires and PasswordNotRequired on AD accounts undermine credential rotation | Q7 (Password Posture) |
| Multi-provider identity sprawl | Same person with accounts across AAD + AD + Okta + CyberArk with inconsistent status/permissions | Q8 (Multi-Provider Linking) |
| High-risk active accounts | Accounts flagged High risk by Identity Protection that remain active and privileged | Q9 (Risk Distribution) |
| Unprotected sensitive accounts | MDI-tagged Sensitive/Honeytoken accounts without appropriate monitoring | Q10 (MDI Tags) |

This skill maps directly to the following MDI Security Posture Assessments (see Accounts assessments):

  • Remove stale Active Directory accounts
  • Entra ID privileged users also privileged in AD
  • Identify service accounts in privileged groups
  • Locate accounts in built-in Operator Groups
  • Accounts with passwords older than 180 days

📑 TABLE OF CONTENTS

  1. Critical Workflow Rules — Mandatory rules
  2. Table Schema Reference — IdentityAccountInfo columns
  3. Identity Posture Score Formula — Composite risk scoring
  4. Execution Workflow — Phase-by-phase query plan
  5. Sample KQL Queries — All queries (Q1–Q12)
  6. Output Modes — Inline vs Markdown report
  7. Inline Report Template — Chat-rendered format
  8. Markdown File Report Template — Disk-saved format
  9. SVG Dashboard Generation — Visual dashboard from report
  10. Known Pitfalls — Schema quirks and edge cases
  11. Quality Checklist — Pre-delivery validation

⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

  1. ALWAYS use RunAdvancedHuntingQuery — The IdentityAccountInfo table is an Advanced Hunting table. All queries in this skill MUST use RunAdvancedHuntingQuery. Exception: Q5b (stale account enrichment via SigninLogs) may use Data Lake for 90d+ lookback.

  2. ALWAYS deduplicate accounts with arg_max — The table contains multiple snapshots per account (state changes + 24h refresh). Every query that analyzes current account state MUST use | summarize arg_max(Timestamp, *) by AccountId to get the latest record per account.

  3. ASK the user for output format before generating the report:

    • Inline chat summary (quick review in chat)
    • Markdown file report (detailed, archived to reports/identity-posture/)
    • Both (markdown + inline summary)
  4. ⛔ MANDATORY: Evidence-based analysis only — Report ONLY what query results show. Use the explicit absence pattern (✅ No [finding] detected) when queries return 0 results. Never guess or assume.

  5. Dynamic fields require parse_json() + tostring(): AssignedRoles, EligibleRoles, GroupMembership, and Tags are dynamic arrays. Always use parse_json() for mv-expand and tostring() for string comparisons.

  6. Run queries in parallel batches where possible — Phase 1 queries (Q1–Q3) are independent. Phase 2 queries (Q4–Q8) are independent. Phase 3 (Q9–Q12) are independent.

  7. Time tracking — Report elapsed time after each phase.

  8. Table is in Preview — Some fields documented in the schema may not be populated yet (EnrolledMfas, TenantMembershipType, AuthenticationMethod, CriticalityLevel, DefenderRiskLevel). Handle gracefully — check for empty/null and report as "Not yet populated (Preview)" rather than "No data".
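
Rule 2 is the one most likely to distort counts if skipped, so here is a minimal Python sketch of what `arg_max(Timestamp, *) by AccountId` does conceptually. The function name and the sample snapshots are hypothetical; real rows come from IdentityAccountInfo.

```python
from datetime import datetime


def latest_per_account(rows: list[dict]) -> dict[str, dict]:
    """Keep only the newest snapshot for each AccountId."""
    latest: dict[str, dict] = {}
    for row in rows:
        current = latest.get(row["AccountId"])
        if current is None or row["Timestamp"] > current["Timestamp"]:
            latest[row["AccountId"]] = row
    return latest


# Hypothetical snapshots: account "a1" changed state between refreshes.
snapshots = [
    {"AccountId": "a1", "Timestamp": datetime(2026, 3, 1), "AccountStatus": "Enabled"},
    {"AccountId": "a1", "Timestamp": datetime(2026, 3, 2), "AccountStatus": "Disabled"},
    {"AccountId": "a2", "Timestamp": datetime(2026, 3, 1), "AccountStatus": "Enabled"},
]

current_state = latest_per_account(snapshots)

# Counting raw snapshots overstates enabled accounts; the deduplicated view does not.
naive_enabled = sum(1 for r in snapshots if r["AccountStatus"] == "Enabled")
deduped_enabled = sum(1 for r in current_state.values() if r["AccountStatus"] == "Enabled")
```

In this sample the raw count reports two enabled accounts, while only one is enabled in its latest snapshot.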

⛔ PROHIBITED ACTIONS

| Action | Status |
| --- | --- |
| Querying IdentityAccountInfo via mcp_sentinel-data_query_lake | PROHIBITED (AH-only table) |
| Querying without arg_max(Timestamp, *) by AccountId deduplication | PROHIBITED (inflates counts) |
| Reporting empty Preview fields as "No data found" | PROHIBITED (report as "Not yet populated (Preview)") |
| Filtering AssignedRoles or Tags with direct string comparison without parse_json() | PROHIBITED (dynamic fields) |
| Assuming SourceProviderRiskLevel or Tags are populated for all providers | PROHIBITED (availability varies by IdP) |

Table Schema Reference

IdentityAccountInfo (Primary)

| Column | Type | Description | Population |
| --- | --- | --- | --- |
| Timestamp | datetime | Snapshot timestamp (state change or 24h refresh) | ✅ All |
| AccountId | string | Internal account identifier (unique per provider account) | ✅ All |
| IdentityId | string | Unified identity that links accounts across providers | ✅ All |
| AccountUpn | string | User principal name | ✅ All |
| DisplayName | string | Display name | ✅ All |
| SourceProvider | string | Identity provider (AzureActiveDirectory, ActiveDirectory, Okta, SailPoint, CyberArkIdentity, Ping) | ✅ All |
| AccountStatus | string | Status (Enabled, Disabled, Deleted, ACTIVE, STAGED, DEPROVISIONED, etc.) | ✅ All |
| Type | string | Account type (User, ServiceAccount) | ✅ All |
| AssignedRoles | dynamic | Role assignments (AAD roles, CyberArk roles, etc.) | ✅ ~60% |
| EligibleRoles | dynamic | PIM-eligible roles | ❌ Empty (Preview) |
| GroupMembership | dynamic | Group IDs | ✅ ~72% |
| Tags | dynamic | MDI tags (Sensitive, Honeytoken, Privileged Account) | ✅ ~1% (tagged accounts only) |
| SourceProviderRiskLevel | dynamic | Risk level from source provider (Low/Medium/High/None) | ✅ ~18% (AAD + AD) |
| LastPasswordChangeTime | datetime | Last password change | 🟡 ~1% (sparse, mostly non-AAD) |
| CreatedDateTime | datetime | Account creation date | ✅ ~99% |
| Department | string | Department name | ✅ ~60% |
| Manager | string | Manager name | 🟡 ~1% |
| City / Country | string | Location | 🟡 <1% |
| Sid | string | Security Identifier (cloud SID for AAD, on-prem SID for AD) | ✅ ~89% |
| IsPrimary | bool | Whether this is the primary account for the linked identity | ✅ All |
| IdentityLinkType | string | Linkage type (Manual, StrongId) | ✅ All |
| EnrolledMfas | dynamic | MFA enrollment details | ❌ Empty (Preview) |
| TenantMembershipType | string | Guest/Member | ❌ Empty (Preview) |
| AuthenticationMethod | string | Credentials/Federated/Hybrid | ❌ Empty (Preview) |
| CriticalityLevel | int | Criticality score | ❌ Empty (Preview) |

IdentityInfo (Enrichment — Join on IdentityId or AccountUpn)

Key columns used for enrichment:

| Column | Type | What It Adds |
| --- | --- | --- |
| UserAccountControl | dynamic | AD flags: PasswordNeverExpires, PasswordNotRequired, etc. |
| DistinguishedName | string | AD OU path |
| RiskLevel | string | Entra ID risk level (Low/Medium/High) |
| BlastRadius | string | UEBA blast radius (Low/Medium/High); requires Sentinel UEBA |
| PrivilegedEntraPimRoles | dynamic | PIM role schedules (Preview, requires MDI) |
| IsAccountEnabled | boolean | Account enabled status |
| RiskStatus | string | None, AtRisk, Remediated, Dismissed, ConfirmedCompromised |

IdentityLogonEvents (Enrichment — Join on AccountUpn)

Used for stale account detection (last logon across AD, Entra, third-party IdPs).


Identity Posture Score Formula

The Identity Posture Score is a composite risk indicator summarizing the security posture of an organization's identity fabric. Higher scores indicate greater risk.

Scoring Dimensions

$$ \text{IdentityPostureScore} = \sum_{i} \text{DimensionScore}_i $$

Each dimension contributes 0–20 points to a maximum of 100:

| Dimension | Max | 🟢 Low (0–5) | 🟡 Medium (6–12) | 🔴 High (13–20) |
| --- | --- | --- | --- | --- |
| Stale/Deleted Account Risk | 20 | <5% enabled accounts stale; 0 deleted with roles | 5–15% stale; <50 deleted with roles | >15% stale; >50 deleted accounts retaining roles |
| Privileged Account Exposure | 20 | <5 permanent high-priv accounts; all use PIM | 5–15 permanent high-priv; some PIM gaps | >15 permanent high-priv across multiple providers; no PIM |
| Password Posture | 20 | <10% PasswordNeverExpires; avg age <180d | 10–40% PwdNeverExpires; avg age 180–365d | >40% PwdNeverExpires; avg age >365d; PasswordNotRequired present |
| Risk Distribution | 20 | <5% accounts at High risk; all remediated/dismissed | 5–10% High risk; some unresolved | >10% High risk accounts active; unresolved AtRisk state |
| Identity Sprawl | 20 | <5% identities span >1 provider; consistent status | 5–15% multi-provider; some status mismatches | >15% multi-provider; status mismatches (enabled in one, disabled in another) |

Interpretation Scale

| Score | Rating | Action |
| --- | --- | --- |
| 0–20 | ✅ Healthy | Normal posture, routine monitoring |
| 21–45 | 🟡 Elevated | Review: minor hygiene gaps detected |
| 46–70 | 🟠 Concerning | Investigate: multiple risk signals present |
| 71–100 | 🔴 Critical | Immediate remediation: significant identity security risk |
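
The composite score and the interpretation scale can be sketched in a few lines of Python. This is an illustration under stated assumptions: `posture_rating` is a hypothetical helper, and the per-dimension values in `example` are made up; in practice each dimension score is chosen from the Low/Medium/High bands using the Phase 1–3 query results.

```python
def posture_rating(dimension_scores: dict[str, int]) -> tuple[int, str]:
    """Sum per-dimension scores (each 0-20) and map the total to a rating."""
    for name, score in dimension_scores.items():
        if not 0 <= score <= 20:
            raise ValueError(f"{name} score {score} is outside 0-20")
    total = sum(dimension_scores.values())
    if total <= 20:
        rating = "Healthy"
    elif total <= 45:
        rating = "Elevated"
    elif total <= 70:
        rating = "Concerning"
    else:
        rating = "Critical"
    return total, rating


# Hypothetical dimension scores for illustration only.
example = {
    "Stale/Deleted Account Risk": 14,
    "Privileged Account Exposure": 8,
    "Password Posture": 16,
    "Risk Distribution": 3,
    "Identity Sprawl": 6,
}

total, rating = posture_rating(example)
print(total, rating)
```

Validating each dimension against the 0–20 range before summing keeps a single mis-scored dimension from pushing the total past 100.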

Execution Workflow

Phase 0: Prerequisites

  1. Confirm RunAdvancedHuntingQuery is available (IdentityAccountInfo is AH-only)
  2. Ask user for output format (inline / markdown / both)

Phase 1: Inventory & Overview (Q1–Q3)

Run in parallel — no dependencies between queries.

| Query | Purpose | Table |
| --- | --- | --- |
| Q1 | Global inventory summary (accounts, identities, providers, date range) | IdentityAccountInfo |
| Q2 | Account status distribution by provider | IdentityAccountInfo |
| Q3 | Account type and department distribution | IdentityAccountInfo |

Phase 2: Security Risk Analysis (Q4–Q8)

Run in parallel — no dependencies between queries.

| Query | Purpose | Tables |
| --- | --- | --- |
| Q4 | Privileged account audit: high-value roles across providers | IdentityAccountInfo |
| Q5 | Stale account detection: enabled with no logon in 90d | IdentityAccountInfo + IdentityLogonEvents |
| Q6 | Deleted account hygiene: deleted accounts retaining permissions | IdentityAccountInfo |
| Q7 | Password posture: age distribution + AD policy flags | IdentityAccountInfo + IdentityInfo |
| Q7c | Built-in & infrastructure account password audit | IdentityAccountInfo + IdentityInfo |
| Q8 | Multi-provider identity linking: cross-IdP sprawl and mismatches | IdentityAccountInfo |

Phase 3: Risk & Governance (Q9–Q12)

Run in parallel — no dependencies between queries.

| Query | Purpose | Tables |
| --- | --- | --- |
| Q9 | Risk level distribution | IdentityAccountInfo |
| Q10 | MDI tags analysis (Sensitive, Honeytoken) | IdentityAccountInfo |
| Q11 | Service account inventory | IdentityAccountInfo |
| Q12 | Account creation trend | IdentityAccountInfo |

Phase 4: Score Computation & Report Generation

  1. Compute per-dimension scores from Phase 1–3 data
  2. Sum dimension scores for composite Identity Posture Score
  3. Generate report in requested output mode
  4. Report total elapsed time

Sample KQL Queries

All queries below are verified against the IdentityAccountInfo table schema (2026-03-24). Use them exactly as written, substituting only where noted.

Query 1: Global Inventory Summary

IdentityAccountInfo
| summarize 
    TotalRows = count(),
    UniqueAccounts = dcount(AccountId),
    UniqueIdentities = dcount(IdentityId),
    UniqueUPNs = dcount(AccountUpn),
    MinTimestamp = min(Timestamp),
    MaxTimestamp = max(Timestamp),
    SourceProviders = make_set(SourceProvider),
    AccountTypes = make_set(Type),
    AccountStatuses = make_set(AccountStatus)

Query 2: Account Status Distribution by Provider

IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| summarize Count = count() by SourceProvider, AccountStatus, Type
| order by Count desc

Query 3: Department Distribution

IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where isnotempty(Department)
| summarize Count = dcount(AccountId) by Department
| order by Count desc
| take 20

Query 4: Privileged Account Audit

🔴 Security-critical query — identifies accounts with high-privilege roles across all identity providers.

let highPrivRoles = dynamic([
    "Global Administrator", "Security Administrator", "Exchange Administrator",
    "SharePoint Administrator", "Application Administrator",
    "Cloud App Security Administrator", "Privileged Role Administrator",
    "Intune Administrator", "Compliance Administrator",
    "Privileged Authentication Administrator", "User Administrator",
    "Azure AD Joined Device Local Administrator",
    "SYSTEM_ADMINISTRATOR", "PRIVILEGE_CLOUD_ADMINISTRATORS",
    "PRIVILEGE_CLOUD_ADMINISTRATORS_LITE",
    "TDR_ADMINISTRATOR", "RISK_MANAGEMENT_ADMIN"
]);
IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where AccountStatus in ("Enabled", "ACTIVE")
| where isnotempty(tostring(AssignedRoles)) and tostring(AssignedRoles) != "[]"
| mv-expand Role = parse_json(AssignedRoles)
| extend RoleName = tostring(Role)
| where RoleName in (highPrivRoles)
| summarize 
    HighPrivRoles = make_set(RoleName),
    RoleCount = dcount(RoleName)
    by AccountUpn, DisplayName, SourceProvider, AccountStatus
| order by RoleCount desc

Post-processing:

  • Flag accounts with >2 high-privilege roles as excessive
  • Cross-reference with Q8 (multi-provider) — accounts with high-priv roles in both AAD and CyberArk/AD represent dual-privilege risk
  • Check if roles are permanent (currently EligibleRoles is empty in Preview, so all discovered roles appear permanent)
  • Reference MDI Assessment: Entra ID privileged users also privileged in AD
  • Pagination check: If Q4 returns exactly 10,000 rows (AH limit), re-run with | take 500 on the final output and note "Results may be truncated" in the report
  • Global Administrator callout: After the high-priv table, always add a dedicated GA callout listing all accounts with the Global Administrator role. GA is the highest-risk role and should be immediately scannable
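
The Q4 post-processing steps can be sketched as follows. This assumes the Q4 result has already been loaded as a list of dictionaries with the columns the query projects (`AccountUpn`, `HighPrivRoles`, `RoleCount`); the function name and the sample rows are hypothetical.

```python
AH_ROW_LIMIT = 10_000  # row limit stated in the pagination check above


def review_privileged(rows: list[dict]) -> tuple[list[str], list[str], bool]:
    """Return (excessive-role accounts, Global Administrator accounts, truncation flag)."""
    excessive = [r["AccountUpn"] for r in rows if r["RoleCount"] > 2]
    global_admins = [
        r["AccountUpn"] for r in rows if "Global Administrator" in r["HighPrivRoles"]
    ]
    possibly_truncated = len(rows) == AH_ROW_LIMIT
    return excessive, global_admins, possibly_truncated


# Hypothetical Q4 output rows.
q4_rows = [
    {
        "AccountUpn": "admin1@contoso.example",
        "HighPrivRoles": ["Global Administrator", "Security Administrator", "User Administrator"],
        "RoleCount": 3,
    },
    {
        "AccountUpn": "ops@contoso.example",
        "HighPrivRoles": ["Exchange Administrator"],
        "RoleCount": 1,
    },
]

excessive, global_admins, possibly_truncated = review_privileged(q4_rows)
```

The `global_admins` list feeds the dedicated GA callout, and `possibly_truncated` signals when the "Results may be truncated" note belongs in the report.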

Query 4b: Full Role Distribution

IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where isnotempty(tostring(AssignedRoles)) and tostring(AssignedRoles) != "[]"
| mv-expand Role = parse_json(AssignedRoles)
| summarize AccountCount = dcount(AccountId) by tostring(Role)
| order by AccountCount desc
| take 25

Query 5: Stale Account Detection

🔴 Security-critical query — identifies enabled accounts with no logon activity in 90 days.

let lastLogon = IdentityLogonEvents
| where Timestamp > ago(90d)
| summarize LastLogon = max(Timestamp) by AccountUpn;
IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where AccountStatus in ("Enabled", "ACTIVE")
| join kind=leftouter (lastLogon) on AccountUpn
| where isnull(LastLogon) or LastLogon < ago(90d)
| summarize 
    StaleEnabledAccounts = count(),
    WithRoles = countif(isnotempty(tostring(AssignedRoles)) and tostring(AssignedRoles) != "[]"),
    WithGroups = countif(isnotempty(tostring(GroupMembership)) and tostring(GroupMembership) != "[]"),
    Providers = make_set(SourceProvider)
    by Type
| order by StaleEnabledAccounts desc

Post-processing:

  • Stale accounts with active roles = highest priority for deprovisioning
  • Reference MDI Assessment: Remove stale Active Directory accounts
  • Note: IdentityLogonEvents has 30d retention in AH, so the ago(90d) filter in Q5 effectively covers only the last 30 days. Accurate 90d stale detection requires SigninLogs via Data Lake. The 30d window still catches accounts with zero recent activity
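
The left-outer-join logic behind Q5 can be sketched in Python to make the stale condition explicit. The function name, the sample accounts, and the 30-day default are assumptions for illustration; the default mirrors the retention note above.

```python
from datetime import datetime, timedelta


def stale_accounts(
    accounts: list[dict],
    last_logon: dict[str, datetime],
    now: datetime,
    window_days: int = 30,
) -> list[str]:
    """Return UPNs of enabled accounts with no logon inside the lookback window."""
    cutoff = now - timedelta(days=window_days)
    stale = []
    for account in accounts:
        if account["AccountStatus"] not in ("Enabled", "ACTIVE"):
            continue  # disabled or deleted accounts are out of scope here
        seen = last_logon.get(account["AccountUpn"])
        if seen is None or seen < cutoff:
            stale.append(account["AccountUpn"])
    return stale


# Hypothetical data for illustration.
now = datetime(2026, 3, 24)
accounts = [
    {"AccountUpn": "alice@contoso.example", "AccountStatus": "Enabled"},
    {"AccountUpn": "bob@contoso.example", "AccountStatus": "ACTIVE"},
    {"AccountUpn": "carol@contoso.example", "AccountStatus": "Disabled"},
]
last_logon = {"alice@contoso.example": datetime(2026, 3, 20)}

stale = stale_accounts(accounts, last_logon, now)
```

An account with no matching logon record is treated the same as one whose last logon predates the cutoff, which is what the `isnull(LastLogon)` branch does in the query.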

Query 5b: Stale Account Provider Breakdown

let lastLogon = IdentityLogonEvents
| where Timestamp > ago(30d)
| summarize LastLogon = max(Timestamp) by AccountUpn;
IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where AccountStatus in ("Enabled", "ACTIVE")
| join kind=leftouter (lastLogon) on AccountUpn
| where isnull(LastLogon)
| summarize StaleCount = count() by SourceProvider
| order by StaleCount desc

Query 6: Deleted Account Hygiene

🟠 Governance query — identifies deleted accounts that still retain role assignments and group memberships.

IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where AccountStatus == "Deleted"
| extend HasRoles = isnotempty(tostring(AssignedRoles)) and tostring(AssignedRoles) != "[]"
| extend HasGroups = isnotempty(tostring(GroupMembership)) and tostring(GroupMembership) != "[]"
| summarize 
    TotalDeleted = count(),
    DeletedWithRoles = countif(HasRoles),
    DeletedWithGroups = countif(HasGroups),
    DeletedWithBoth = countif(HasRoles and HasGroups),
    Providers = make_set(SourceProvider)

Post-processing:

  • Deleted accounts with roles = orphan permission risk
  • Note: in some providers, "Deleted" status may lag actual deletion. Cross-reference with DeletedDateTime if populated
  • Large numbers indicate lifecycle management gaps

Query 7: Password Posture (IdentityAccountInfo + IdentityInfo Join)

🟠 Security query — combines password age from IdentityAccountInfo with AD policy flags from IdentityInfo. Adapted from Alex Verboon's MDI Password Security Posture Assessment with critical fixes for join direction, null UAC handling, and epoch date filtering.

Key design decisions:

  • IdentityAccountInfo as primary (left) table — using IdentityInfo as primary inflates row counts because IdentityInfo has multiple snapshots per identity. IdentityAccountInfo deduplicated by IdentityId gives the true enabled-account baseline.
  • Join on IdentityId (not AccountUpn) — IdentityId is the stable cross-table key. UPN-based joins can produce 1:many inflation when multiple IdentityInfo records share a UPN.
  • isnotnull(UserAccountControl) guard on IdentityInfo — see Pitfall #8 below. Without this, array_index_of(null, "value") returns null, and null != -1 evaluates to true in KQL, making ALL null-UAC accounts appear to have PasswordNeverExpires.
  • datetime(2000-01-01) date guard — some records contain placeholder dates (e.g., 0001-01-01) producing 700,000+ day password ages.
let accountinfo = IdentityAccountInfo
| where Timestamp > ago(30d)
| summarize arg_max(Timestamp, *) by IdentityId
| where AccountStatus !in ("Disabled", "Deleted", "DEPROVISIONED", "SUSPENDED")
| where Type != "ServiceAccount"
| extend DaysSinceLastPasswordChange =
    iff(isnull(LastPasswordChangeTime) or LastPasswordChangeTime < datetime(2000-01-01), int(null),
        datetime_diff('day', now(), LastPasswordChangeTime))
| extend Sensitive = isnotnull(Tags) and array_index_of(Tags, "Sensitive") != -1
| project IdentityId, AccountUpn, AccountStatus, SourceProvider,
    LastPasswordChangeTime, DaysSinceLastPasswordChange, Sensitive;
let IdInfo = IdentityInfo
| where Timestamp > ago(30d)
| summarize arg_max(Timestamp, *) by IdentityId
| where isnotnull(UserAccountControl)
| extend PasswordNeverExpires = array_index_of(UserAccountControl, "PasswordNeverExpires") != -1,
         PasswordNotRequired = array_index_of(UserAccountControl, "PasswordNotRequired") != -1
| project IdentityId, PasswordNeverExpires, PasswordNotRequired;
accountinfo
| join kind=leftouter (IdInfo) on IdentityId
| summarize
    TotalEnabled = count(),
    WithPasswordData = countif(isnotnull(DaysSinceLastPasswordChange)),
    AvgPasswordAgeDays = avgif(DaysSinceLastPasswordChange, isnotnull(DaysSinceLastPasswordChange)),
    MaxPasswordAgeDays = maxif(DaysSinceLastPasswordChange, isnotnull(DaysSinceLastPasswordChange)),
    PwdOver365d = countif(DaysSinceLastPasswordChange > 365),
    WithUACData = countif(isnotnull(PasswordNeverExpires)),
    PwdNeverExpires = countif(PasswordNeverExpires == true),
    PwdNotRequired = countif(PasswordNotRequired == true),
    SensitiveAccounts = countif(Sensitive)

Post-processing:

  • WithUACData shows how many accounts had AD UAC flags to check — only on-prem AD accounts monitored by MDI will have this data
  • PwdNeverExpires and PwdNotRequired are now accurate counts (not directional) thanks to the isnotnull(UserAccountControl) guard
  • Report password data coverage: WithPasswordData / TotalEnabled — if < 5%, use condensed template
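
The placeholder-date guard and the coverage check from Q7 can be illustrated with a small Python sketch. Both helper names are hypothetical, and `timedelta.days` counts whole 24-hour periods, so results may differ by a day from KQL's `datetime_diff('day', ...)`.

```python
from datetime import datetime
from typing import Optional

PLACEHOLDER_FLOOR = datetime(2000, 1, 1)  # same guard date as the query


def password_age_days(last_change: Optional[datetime], now: datetime) -> Optional[int]:
    """Return password age in days, or None for missing or placeholder dates."""
    if last_change is None or last_change < PLACEHOLDER_FLOOR:
        return None
    return (now - last_change).days


def use_condensed_template(with_password_data: int, total_enabled: int) -> bool:
    """True when fewer than 5% of enabled accounts carry password data."""
    if total_enabled == 0:
        return True  # nothing to report on
    return with_password_data / total_enabled < 0.05


now = datetime(2026, 1, 1)
placeholder_age = password_age_days(datetime(1, 1, 1), now)
real_age = password_age_days(datetime(2025, 1, 1), now)
```

Returning `None` for placeholder dates keeps them out of the average and maximum, the same effect `avgif` and `maxif` have in the query.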

Query 7b: Password Age Distribution Buckets (with PwdNeverExpires Cross-Reference)

let accountinfo = IdentityAccountInfo
| where Timestamp > ago(30d)
| summarize arg_max(Timestamp, *) by IdentityId
| where isnotnull(LastPasswordChangeTime)
| where LastPasswordChangeTime > datetime(2000-01-01)
| where AccountStatus !in ("Disabled", "Deleted", "DEPROVISIONED", "SUSPENDED")
| where Type != "ServiceAccount"
| extend DaysSinceLastPasswordChange = datetime_diff('day', now(), LastPasswordChangeTime)
| project IdentityId, DaysSinceLastPasswordChange;
let IdInfo = IdentityInfo
| where Timestamp > ago(30d)
| summarize arg_max(Timestamp, *) by IdentityId
| where isnotnull(UserAccountControl)
| extend PasswordNeverExpires = array_index_of(UserAccountControl, "PasswordNeverExpires") != -1
| project IdentityId, PasswordNeverExpires;
accountinfo
| join kind=leftouter (IdInfo) on IdentityId
| extend PasswordAgeBucket = case(
    DaysSinceLastPasswordChange <= 30, "0-30 days",
    DaysSinceLastPasswordChange <= 90, "31-90 days",
    DaysSinceLastPasswordChange <= 180, "91-180 days",
    DaysSinceLastPasswordChange <= 365, "181-365 days",
    "365+ days")
| summarize Accounts = count(), PwdNeverExpires = countif(PasswordNeverExpires == true) by PasswordAgeBucket
| order by Accounts desc

Post-processing:

  • The PwdNeverExpires column per bucket reveals the root cause of stale passwords — if most 365+ day accounts have PwdNeverExpires, the issue is AD password policy, not user neglect
  • Highlight correlation: "X of Y accounts with passwords >365 days old have PasswordNeverExpires set"
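
For reference, the bucket boundaries in Q7b and the correlation sentence can be sketched in Python. The helper names are hypothetical; the thresholds mirror the `case()` expression in the query.

```python
def password_age_bucket(days: int) -> str:
    """Map a password age in days to the Q7b bucket label."""
    if days <= 30:
        return "0-30 days"
    if days <= 90:
        return "31-90 days"
    if days <= 180:
        return "91-180 days"
    if days <= 365:
        return "181-365 days"
    return "365+ days"


def correlation_note(never_expires: int, total_over_365: int) -> str:
    """Build the highlight sentence for the 365+ day bucket."""
    return (
        f"{never_expires} of {total_over_365} accounts with passwords "
        ">365 days old have PasswordNeverExpires set"
    )


note = correlation_note(42, 50)  # hypothetical counts
```

Each boundary value belongs to the lower bucket, so an age of exactly 365 days lands in "181-365 days".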

Query 7c: Built-In & Infrastructure Account Password Check

🔴 Security query: audits password posture of built-in and infrastructure accounts (krbtgt, Administrator, Guest, MSOL_*, AAD_*, ADSync*). These accounts are high-value targets; krbtgt password age directly affects Golden Ticket attack risk.

let accountinfo = IdentityAccountInfo
| where Timestamp > ago(30d)
| summarize arg_max(Timestamp, *) by IdentityId
| extend DaysSinceLastPasswordChange =
    iff(isnull(LastPasswordChangeTime) or LastPasswordChangeTime < datetime(2000-01-01), int(null),
        datetime_diff('day', now(), LastPasswordChangeTime))
| extend Sensitive = isnotnull(Tags) and array_index_of(Tags, "Sensitive") != -1
| project IdentityId, AccountUpn, AccountStatus, SourceProvider,
    LastPasswordChangeTime, DaysSinceLastPasswordChange, Sensitive;
let IdInfo = IdentityInfo
| where Timestamp > ago(30d)
| summarize arg_max(Timestamp, *) by IdentityId
| where isnotempty(AccountName)
| extend PasswordNeverExpires = iff(isnotnull(UserAccountControl), array_index_of(UserAccountControl, "PasswordNeverExpires") != -1, bool(null)),
         PasswordNotRequired = iff(isnotnull(UserAccountControl), array_index_of(UserAccountControl, "PasswordNotRequired") != -1, bool(null))
| extend OUPath = extract(@"CN=[^,]+,(.*)", 1, DistinguishedName)
| project IdentityId, AccountName, AccountDomain, AccountDisplayName,
    PasswordNeverExpires, PasswordNotRequired, OUPath;
IdInfo
| join kind=leftouter (accountinfo) on IdentityId
| where tolower(AccountName) in ("krbtgt", "administrator", "guest", "admin")
    or tolower(AccountName) startswith "msol_"
    or tolower(AccountName) startswith "aad_"
    or tolower(AccountName) startswith "adsync"
| project AccountName, AccountDomain, AccountDisplayName, AccountStatus,
    SourceProvider, LastPasswordChangeTime, DaysSinceLastPasswordChange,
    PasswordNeverExpires, PasswordNotRequired, Sensitive, OUPath
| order by DaysSinceLastPasswordChange desc

Post-processing:

  • krbtgt: Microsoft recommends rotation every 180 days. Flag any krbtgt account with password >180d as 🔴 High Risk (Golden Ticket attack window). >365d is critical
  • MSOL_*/AAD_*/ADSync*: Azure AD Connect service accounts. If AccountStatus == "Enabled" but the sync is decommissioned, flag as 🟠 stale privileged account. PwdNeverExpires is common but should be monitored
  • Guest: PwdNotRequired is standard Windows behavior for Guest accounts. Flag only if Guest is Enabled (should always be Disabled)
  • Administrator: Check if renamed (may not appear). Flag if password >365d

Query 8: Multi-Provider Identity Linking

🟡 Governance query — identifies identities that span multiple identity providers, including status mismatches.

IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| summarize 
    Providers = make_set(SourceProvider),
    ProviderCount = dcount(SourceProvider),
    Statuses = make_set(AccountStatus),
    StatusCount = dcount(AccountStatus),
    UPNs = make_set(AccountUpn),
    RolesSummary = make_set(tostring(AssignedRoles))
    by IdentityId
| where ProviderCount > 1
| extend HasStatusMismatch = StatusCount > 1
| summarize 
    MultiProviderIdentities = count(),
    WithStatusMismatch = countif(HasStatusMismatch),
    MaxProviders = max(ProviderCount),
    ProviderCombos = make_set(strcat_array(Providers, " + "))

Query 8b: Multi-Provider Identity Detail (Top 15)

IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| summarize 
    Providers = make_set(SourceProvider),
    ProviderCount = dcount(SourceProvider),
    Statuses = make_set(AccountStatus),
    UPNs = make_set(AccountUpn),
    Roles = make_set(tostring(AssignedRoles))
    by IdentityId, DisplayName
| where ProviderCount > 1
| order by ProviderCount desc
| take 15

Query 9: Risk Level Distribution

IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where isnotempty(tostring(SourceProviderRiskLevel))
| summarize 
    Count = dcount(AccountId),
    EnabledCount = dcountif(AccountId, AccountStatus in ("Enabled", "ACTIVE")),
    WithHighPrivRoles = dcountif(AccountId, isnotempty(tostring(AssignedRoles)) and tostring(AssignedRoles) != "[]")
    by tostring(SourceProviderRiskLevel), SourceProvider
| order by Count desc

Post-processing:

  • High-risk accounts that are Enabled + have high-priv roles = critical finding
  • Cross-reference with IdentityInfo RiskStatus for Entra accounts to check if risk has been remediated/dismissed

Query 10: MDI Tags Analysis

🏷️ Governance query — analyzes Defender for Identity tags (Sensitive, Honeytoken, custom tags).

IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where isnotempty(tostring(Tags)) and tostring(Tags) != "[]"
| mv-expand Tag = parse_json(Tags)
| extend TagName = tostring(Tag)
| summarize 
    AccountCount = dcount(AccountId),
    Accounts = make_set(AccountUpn, 10)
    by TagName, SourceProvider
| order by AccountCount desc

Post-processing:

  • Sensitive-tagged accounts should be cross-referenced with Q4 (privileged) and Q9 (risk) for comprehensive posture view
  • Honeytoken accounts — verify monitoring is active (any logon should generate an alert)

Query 11: Service Account Inventory

IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where Type == "ServiceAccount"
| summarize 
    Count = count(),
    Providers = make_set(SourceProvider),
    Statuses = make_set(AccountStatus),
    EnabledCount = countif(AccountStatus in ("Enabled", "ACTIVE")),
    WithRoles = countif(isnotempty(tostring(AssignedRoles)) and tostring(AssignedRoles) != "[]")

Query 12: Account Creation Trend

📈 Trend query — shows account creation velocity over time.

IdentityAccountInfo
| summarize arg_max(Timestamp, *) by AccountId
| where isnotempty(CreatedDateTime)
| summarize AccountsCreated = count() by bin(CreatedDateTime, 7d), SourceProvider
| order by CreatedDateTime asc

Output Modes

Mode 1: Inline Chat Summary

Render the full analysis directly in the chat response. Best for quick review.

Mode 2: Markdown File Report

Save a comprehensive report to disk at:

reports/identity-posture/Identity_Posture_Report_{tenant}_YYYYMMDD_HHMMSS.md

Where {tenant} is a short identifier for the tenant (e.g., contoso, zava). Derive from the tenant domain in config.json or ask the user. If unknown, omit the tenant tag.
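A minimal sketch of the filename convention above, assuming the caller supplies the tenant tag (or `None` when unknown) and the timestamp to embed. The function name `report_path` is hypothetical and not part of the skill.

```python
from datetime import datetime, timezone
from typing import Optional


def report_path(tenant: Optional[str], now: datetime) -> str:
    """Build the report path; the tenant tag is omitted when unknown."""
    parts = ["Identity_Posture_Report"]
    if tenant:
        parts.append(tenant)
    # YYYYMMDD_HHMMSS, as in the naming convention
    parts.append(now.strftime("%Y%m%d_%H%M%S"))
    return "reports/identity-posture/" + "_".join(parts) + ".md"


when = datetime(2026, 1, 24, 9, 30, 5, tzinfo=timezone.utc)
print(report_path("contoso", when))
print(report_path(None, when))
```

Whether the timestamp should be UTC or local time is not stated in the convention; the sketch simply formats whatever datetime it is given.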

Mode 3: Both

Generate the markdown file AND provide an inline summary in chat.

Always ask the user which mode before generating output.


Inline Report Template

Render the following sections in order. Omit sections only if explicitly noted as conditional.

🔴 URL Rule: All hyperlinks in the report MUST be copied verbatim from the URL Registry above. Do NOT generate, recall from memory, or paraphrase any URL. If a needed URL is not in the registry, use plain text (no hyperlink).

# 🔐 Identity Security Posture Report

**Generated:** YYYY-MM-DD HH:MM UTC
**Data Source:** IdentityAccountInfo (Advanced Hunting — Preview)
**Analysis Period:** <EarliestRecord> → <LatestRecord>
**Identity Providers:** <comma-separated provider list>

---

## Executive Summary

<2-3 sentences: total accounts/identities, key risk findings, overall score>

**Overall Risk Rating:** 🔴/🟠/🟡/✅ <RATING> (<Score>/100)

---

## Key Metrics

| Metric | Value |
|--------|-------|
| Total Accounts (deduplicated) | <N> |
| Unique Identities | <N> |
| Identity Providers | <N> (<list>) |
| Enabled Accounts | <N> |
| Disabled Accounts | <N> |
| Deleted Accounts | <N> |
| Service Accounts | <N> |
| Accounts with High-Privilege Roles | <N> |
| Stale Accounts (no logon 30d*) | <N> |
| Multi-Provider Identities | <N> |
| MDI Sensitive-Tagged Accounts | <N> |

> \* IdentityLogonEvents has 30-day retention in Advanced Hunting, so the true 90-day stale count is likely lower. See Stale & Deleted Account Hygiene section for details.

---

## 🔍 Identity Inventory

### Accounts by Provider
| Provider | Accounts | Enabled | Disabled | Deleted | Other | Service Accounts |
|----------|----------|---------|----------|---------|-------|------------------|
| <provider> | <N> | <N> | <N> | <N> | <N> | <N> |
| **Total** | **<N>** | **<N>** | **<N>** | **<N>** | **<N>** | **<N>** |

> **Account count note:** The provider breakdown may sum to slightly more than the deduplicated "Total Accounts" in Key Metrics because `arg_max(Timestamp, *) by AccountId` resolves each account to a single snapshot, while a small number of AccountIds may share provider rows. Always use the deduplicated count from Q1 as the authoritative total.

### Account Status Vocabulary by Provider

| Status | Meaning | Providers |
|--------|---------|----------|
| Enabled / ACTIVE | Active account | AAD, AD, SailPoint, CyberArk, Okta, Ping |
| Disabled | Administratively disabled | AAD, AD |
| Deleted | Soft-deleted (AAD recycle bin) | AAD |
| NONE | No status (SailPoint) | SailPoint |
| INACTIVE | Deactivated | SailPoint |
| STAGED | Provisioned but not activated | Okta |
| DEPROVISIONED | Fully deactivated | Okta |
| PROVISIONED | Created but pending activation | Okta |
| INVITED | Pending acceptance | CyberArk |
| CREATED | Newly created | CyberArk |
| SUSPENDED | Temporarily suspended | CyberArk |

> Include this table in every report. Values are discovered dynamically from Q2 output — add any new statuses observed.

### Department Distribution (Top 15)
| Department | Accounts |
|------------|----------|
| <dept> | <N> |

> **Department aggregation rule:** When case-inconsistent values exist (e.g., "Internal" vs "internal"), collapse them into a single row with combined count and note the inconsistency: `> ⚠️ Department values have case inconsistency: "Internal" (N) and "internal" (N) appear as separate values. Recommend standardizing.`

---

## 👑 Privileged Account Audit

### High-Privilege Role Holders
| Account | Provider | Roles | Status |
|---------|----------|-------|--------|
| <upn> | <provider> | <role list> | <status> |

> 🔴 **Global Administrators (<N>):** <comma-separated list of GA account UPNs> — Best practice: max 2 permanent GA accounts (break glass only). Convert user-facing GA accounts to PIM-eligible.

### Role Distribution (Top 15)
| Role | Account Count |
|------|---------------|
| <role> | <N> |

**Assessment:**
- <emoji> <evidence-based finding about privilege distribution>
- <emoji> <PIM/permanent role finding>
- <emoji> <cross-provider privilege finding>

---

## 🗑️ Stale & Deleted Account Hygiene

### Stale Accounts (Enabled, No Logon in 30d)
| Metric | Value |
|--------|-------|
| Total Stale Enabled | <N> |
| Stale with Active Roles | <N> |
| Stale with Group Memberships | <N> |
| Stale by Provider | <breakdown> |

> ⚠️ **Important caveat:** IdentityLogonEvents has **30-day retention** in Advanced Hunting. Accounts that last logged in 31–90 days ago appear "stale" in this analysis. The true 90-day stale count is likely lower. For accurate 90-day stale detection, cross-reference with SigninLogs via Data Lake (90d+ retention).

### Deleted Accounts with Residual Permissions
| Metric | Value |
|--------|-------|
| Total Deleted | <N> |
| Deleted with Roles | <N> |
| Deleted with Groups | <N> |
| Deleted with Both | <N> |

**Assessment:**
- <emoji> <evidence-based finding about stale account risk>
- <emoji> <deleted account orphan risk finding>

---

## 🔑 Password Posture

<If LastPasswordChangeTime coverage ≥ 5% of enabled accounts — render full section:>
| Metric | Value |
|--------|-------|
| Accounts with Password Data | <WithPasswordData>/<TotalEnabled> (<pct>%) |
| Accounts with UAC Data | <WithUACData> |
| PasswordNeverExpires | <N> of <WithUACData> with UAC data |
| PasswordNotRequired | <N> of <WithUACData> with UAC data |
| Sensitive Accounts | <N> |
| Avg Password Age (days) | <N> |
| Max Password Age (days) | <N> |
| Passwords > 365 days | <PwdOver365d> |

### Password Age Distribution
| Bucket | Accounts | PwdNeverExpires | % |
|--------|----------|-----------------|---|
| 0-30 days | <N> | <N> | <pct>% |
| 31-90 days | <N> | <N> | <pct>% |
| 91-180 days | <N> | <N> | <pct>% |
| 181-365 days | <N> | <N> | <pct>% |
| 365+ days | <N> | <N> | <pct>% |

<Highlight if PwdNeverExpires correlates with 365+ bucket:>
> 🔴 **X of Y accounts with passwords >365 days old have PasswordNeverExpires set** — these passwords will never rotate without manual intervention.

<If LastPasswordChangeTime coverage < 5% of enabled accounts — render condensed format instead:>
⚠️ **Limited data availability:** `LastPasswordChangeTime` populated for <N>/<TotalEnabled> enabled accounts (<pct>%).
Among accounts with data: <N> have passwords >365d old, <N> changed within 30d.
For comprehensive assessment, use Graph API (`/users?$select=passwordPolicies,lastPasswordChangeDateTime`).

### AD Password Policy Flags (via IdentityInfo UAC enrichment)
| Flag | Accounts | Scope |
|------|----------|-------|
| PasswordNeverExpires | <N> | <WithUACData> accounts with UAC data (on-prem AD with MDI only) |
| PasswordNotRequired | <N> | <WithUACData> accounts with UAC data |

> **Data quality note:** UAC flags are only available for on-prem AD accounts monitored by MDI (~<WithUACData>/<TotalEnabled> accounts in this environment). The `isnotnull(UserAccountControl)` filter ensures accurate counts — no inflation from null-UAC accounts.

### Built-In & Infrastructure Account Password Audit

<Render from Q7c results. Always include this section — built-in accounts exist in every AD environment.>

| Account | Domain | Status | Password Age | PwdNeverExpires | PwdNotRequired | Sensitive |
|---------|--------|--------|-------------|----------------|----------------|----------|
| <AccountName> | <AccountDomain> | <Status> | <DaysSinceLastPasswordChange>d | <Yes/No> | <Yes/No> | <Yes/No> |

<Flag critical findings:>
- 🔴 **krbtgt** accounts with password >180 days — Golden Ticket attack window (Microsoft recommends 180-day rotation)
- 🟠 **MSOL_/AAD_/ADSync** accounts still Enabled with PwdNeverExpires — review if Azure AD Connect is still in use
- 🟡 **Guest** accounts with PwdNotRequired — standard Windows behavior, flag only if Enabled

---

## 🟠 Risk Distribution

| Risk Level | Provider | Total | Enabled | With High-Priv Roles |
|------------|----------|-------|---------|----------------------|
| 🔴 High | <provider> | <N> | <N> | <N> |
| 🟠 Medium | <provider> | <N> | <N> | <N> |
| 🟡 Low | <provider> | <N> | <N> | <N> |
| ⚪ None | <provider> | <N> | <N> | <N> |

**Assessment:**
- <emoji> <evidence-based finding about active high-risk accounts>

---

## 🔗 Multi-Provider Identity Linking

| Metric | Value |
|--------|-------|
| Identities Spanning Multiple Providers | <N> |
| Max Providers per Identity | <N> |
| Identities with Status Mismatches | <N> |
| Provider Combinations | <list> |

<If status mismatches found:>
⚠️ **Status Mismatches Detected:** <N> identities have inconsistent status across providers (e.g., Enabled in AAD but DEPROVISIONED in Okta). This indicates lifecycle management gaps.

<Top 5 multi-provider identities table>

---

## 🏷️ Sensitive & Honeytoken Accounts

| Tag | Count | Provider | Sample Accounts |
|-----|-------|----------|----------------|
| <tag> | <N> | <provider> | <upn list> |

**Assessment:**
- <emoji> <honeytoken monitoring confirmation>
- <emoji> <sensitive account protection finding>

---

## Identity Posture Score Card

```
┌─────────────────────────────────────────────────────────────┐
│          IDENTITY POSTURE SCORE: <NN>/100                   │
│                Rating: <EMOJI> <RATING>                     │
├─────────────────────────────────────────────────────────────┤
│ Stale/Deleted  [<bar>] <N>/20  (<short detail>)             │
│ Privileged     [<bar>] <N>/20  (<short detail>)             │
│ Password       [<bar>] <N>/20  (<short detail>)             │
│ Risk Distrib.  [<bar>] <N>/20  (<short detail>)             │
│ Identity Sprawl[<bar>] <N>/20  (<short detail>)             │
└─────────────────────────────────────────────────────────────┘
```

> **Score card detail rule:** Keep `(<short detail>)` to ~30 characters max so text fits within the box. Use abbreviated phrasing, e.g., `885 deleted w/roles; high stale %` not `885 deleted accounts with active role assignments`.

---

## Security Assessment

| Factor | Finding |
|--------|---------|
| <emoji> **<Factor>** | <Evidence-based finding> |

---

## Recommendations

1. <emoji> **<Priority action>** — <evidence and rationale>
2. ...

---

## Next Steps

<1-2 sentences anchoring the immediate follow-up action based on the highest-priority recommendation. Reference the specific recommendation number.>

Example:
> Begin with Recommendation #1 (High-Risk account remediation) by exporting the 560 affected accounts to the security operations team. Schedule a follow-up identity posture review after remediation to verify score improvement.

---

## Appendix: Query Execution Summary

| Query | Description | Records | Time |
|-------|-------------|---------|------|
| Q1 | Global Inventory | <N> | <time> |
| Q2 | Status by Provider | <N> | <time> |
| ... | ... | ... | ... |

Markdown File Report Template

When outputting to markdown file, use the same structure as the Inline Report Template above, saved to:

reports/identity-posture/Identity_Posture_Report_{tenant}_YYYYMMDD_HHMMSS.md

Where {tenant} matches the Mode 2 filename convention above.

Include the following additional sections in the file report that are omitted from inline:

  1. Full privileged account detail table (all high-priv accounts, not just top N)
  2. Complete multi-provider identity listing (all multi-IdP identities with UPN mapping)
  3. Per-provider account detail (full status/type breakdown per provider)
  4. Stale account detail (top stale accounts with last logon dates)
  5. Preview field coverage summary (which documented fields are/aren't populated)

File Report Header

# Identity Security Posture Report

**Generated:** YYYY-MM-DD HH:MM UTC
**Data Source:** IdentityAccountInfo (Advanced Hunting — Preview)
**Enrichment:** IdentityInfo, IdentityLogonEvents
**Analysis Period:** <EarliestRecord> → <LatestRecord> (<N> days)
**Identity Providers:** <N> (<list with account counts>)
**Total Accounts:** <N> (Enabled/Active: ~<N> | Disabled: ~<N> | Deleted: <N> | Other: ~<N>)
**Unique Identities:** <N>

---

Account count convention: Use the deduplicated count from Q1 (dcount(AccountId)) as the authoritative "Total Accounts". Provider breakdowns from Q2 may sum slightly higher due to snapshot resolution. Present status sub-counts with ~ prefix when derived from Q2 provider rows to signal they are approximate breakdowns.


SVG Dashboard Generation

📊 Optional post-report step. After an Identity Security Posture report is generated, the user can request a visual SVG dashboard.

Trigger phrases: "generate SVG dashboard", "create a visual dashboard", "visualize this report", "SVG from the report"

How to Request a Dashboard

  • Same chat: "Generate an SVG dashboard from the report" — data is already in context.
  • New chat: Attach or reference the report file, e.g. #file:reports/identity-posture/Identity_Posture_Report_<tenant>_<date>.md
  • Customization: Edit svg-widgets.yaml before requesting — the renderer reads it at generation time.

Execution

Step 1:  Read svg-widgets.yaml (this skill's widget manifest)
Step 2:  Read .github/skills/svg-dashboard/SKILL.md (rendering rules — Manifest Mode)
Step 3:  Read the completed report file (data source)
Step 4:  Render SVG → save to reports/identity-posture/{report_name}_dashboard.svg

The YAML manifest is the single source of truth for layout, widgets, field mappings, colors, and data source documentation. All customization happens there.


Known Pitfalls

1. IdentityAccountInfo Is Advanced Hunting Only

Problem: The table does NOT exist in Sentinel Data Lake. Querying via mcp_sentinel-data_query_lake returns SemanticError: Failed to resolve table.

Solution: Always use RunAdvancedHuntingQuery. The table has 30-day retention in AH.

2. Multiple Records Per Account (State Snapshots)

Problem: The table logs configuration snapshots over time (state changes + 24h refresh). Querying without deduplication inflates counts.

Solution: Always use | summarize arg_max(Timestamp, *) by AccountId for current state analysis. Use by IdentityId when you want the latest per unified identity.
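To show why deduplication matters, here is a small Python emulation of the `arg_max(Timestamp, *)` pattern. It is only an illustration of the semantics, not how Advanced Hunting evaluates the query; the helper `latest_by` and the sample rows are invented for the example.

```python
from datetime import datetime


def latest_by(rows, key):
    """Keep the most recent row per key, like `summarize arg_max(Timestamp, *) by <key>`."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row["Timestamp"] > current["Timestamp"]:
            latest[row[key]] = row
    return list(latest.values())


# Three snapshot rows: account a1 changed state, and a2 belongs to the same person.
rows = [
    {"AccountId": "a1", "IdentityId": "i1", "Timestamp": datetime(2026, 1, 1), "AccountStatus": "Enabled"},
    {"AccountId": "a1", "IdentityId": "i1", "Timestamp": datetime(2026, 1, 2), "AccountStatus": "Disabled"},
    {"AccountId": "a2", "IdentityId": "i1", "Timestamp": datetime(2026, 1, 2), "AccountStatus": "ACTIVE"},
]

accounts = latest_by(rows, "AccountId")     # one row per provider account
identities = latest_by(rows, "IdentityId")  # one row per unified identity
print(len(rows), len(accounts), len(identities))
```

Counting the raw rows would report three accounts here, while the deduplicated views report two accounts belonging to one identity.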

3. AccountStatus Values Are Provider-Specific

Problem: Each identity provider uses its own status vocabulary:

  • AAD: Enabled, Disabled, Deleted
  • SailPoint: ACTIVE, NONE, INACTIVE
  • Okta: STAGED, ACTIVE, DEPROVISIONED, PROVISIONED
  • CyberArk: ACTIVE, INVITED, SUSPENDED, CREATED

Solution: When filtering for "active/enabled" accounts, use AccountStatus in ("Enabled", "ACTIVE") to catch both AAD and third-party providers. For "disabled" filtering, include provider-specific disabled states.
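One way to apply this consistently during post-processing is to map each raw status into a small set of buckets. The sketch below is illustrative: the active set comes from the filter above, but which provider states count as "disabled" is an assumption made for the example and should be adjusted to the tenant's lifecycle definitions.

```python
ACTIVE_STATUSES = {"Enabled", "ACTIVE"}
# Assumption: these provider-specific states are treated as disabled equivalents.
DISABLED_STATUSES = {"Disabled", "INACTIVE", "DEPROVISIONED", "SUSPENDED"}


def status_bucket(status: str) -> str:
    """Map a provider-specific AccountStatus value to a report bucket."""
    if status in ACTIVE_STATUSES:
        return "active"
    if status in DISABLED_STATUSES:
        return "disabled"
    if status == "Deleted":
        return "deleted"
    # STAGED, PROVISIONED, INVITED, CREATED, NONE and any newly observed value
    return "other"


for value in ("Enabled", "ACTIVE", "DEPROVISIONED", "STAGED", "Deleted"):
    print(value, "->", status_bucket(value))
```

Unknown statuses fall through to "other" so that a newly observed value is surfaced in the report rather than silently counted as active or disabled.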

4. AssignedRoles Contains Mixed Role Vocabularies

Problem: AssignedRoles contains role names from different providers in the same column — AAD roles ("Global Administrator"), CyberArk roles ("SYSTEM_ADMINISTRATOR"), Okta roles, etc. They are NOT normalized.

Solution: When searching for high-privilege roles, include role names from all providers in the highPrivRoles list. See Q4 for the canonical list.

5. EligibleRoles Is Empty (Preview)

Problem: The EligibleRoles column (for PIM-eligible roles) is documented but currently returns empty for all accounts.

Impact: Cannot distinguish permanent vs PIM-eligible roles from this table alone. All discovered roles in AssignedRoles should be treated as potentially permanent. For accurate PIM data, use Graph API (/roleManagement/directory/roleEligibilityScheduleInstances).

6. EnrolledMfas/TenantMembershipType/AuthenticationMethod Are Empty

Problem: These fields are documented but not yet populated in any provider. This is expected for a Preview table.

Solution: Report as "Not yet populated (Preview)" — not as absence of MFA or guest accounts. For MFA data, use SigninLogs (AuthenticationDetails) or Graph API. For Guest/Member, use IdentityInfo (TenantMembershipType — same issue) or Graph API.

7. LastPasswordChangeTime Is Sparse for AAD

Problem: Only ~1% of accounts have LastPasswordChangeTime populated, mostly non-AAD providers (CyberArk, Okta). AAD accounts typically show null. Some records contain placeholder dates (e.g., 0001-01-01T00:00:00Z) that produce nonsensical password age values (700,000+ days).

Solution: For AD-specific password posture, join with IdentityInfo which has UserAccountControl flags (PasswordNeverExpires, PasswordNotRequired). For cloud-only AAD, password age data may need Graph API enrichment. Always filter where LastPasswordChangeTime > datetime(2000-01-01) to exclude placeholder dates before computing avg/max.
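The effect of the placeholder filter can be seen in a short Python sketch. It mirrors the `LastPasswordChangeTime > datetime(2000-01-01)` condition on a handful of invented sample values; the function name `password_age_stats` is hypothetical.

```python
from datetime import datetime, timezone

# Same cutoff as the KQL filter: anything at or before this is treated as a placeholder.
PLACEHOLDER_CUTOFF = datetime(2000, 1, 1, tzinfo=timezone.utc)


def password_age_stats(change_times, now):
    """Average and maximum password age in days, ignoring nulls and placeholder dates."""
    ages = [
        (now - changed).days
        for changed in change_times
        if changed is not None and changed > PLACEHOLDER_CUTOFF
    ]
    if not ages:
        return None
    return {"count": len(ages), "avg_days": sum(ages) / len(ages), "max_days": max(ages)}


utc = timezone.utc
now = datetime(2026, 1, 1, tzinfo=utc)
samples = [
    datetime(1, 1, 1, tzinfo=utc),     # placeholder date, excluded
    None,                              # not populated, excluded
    datetime(2025, 1, 1, tzinfo=utc),
    datetime(2025, 12, 2, tzinfo=utc),
]
print(password_age_stats(samples, now))
```

Without the cutoff, the first sample alone would contribute an age of more than 700,000 days and dominate both the average and the maximum.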

8. array_index_of(null) Returns Null — Not -1

Problem: When UserAccountControl is null (which it is for ~99% of identities in IdentityInfo — only on-prem AD accounts with MDI have it), array_index_of(null, "PasswordNeverExpires") returns null — NOT -1. In KQL, null != -1 evaluates to true. This means Verboon's original pattern array_index_of(UserAccountControl, "PasswordNeverExpires") != -1 incorrectly returns true for ALL accounts with null UserAccountControl, massively inflating PwdNeverExpires counts (e.g., 16,197 false positives out of 16,297 identities).

Solution: In the IdentityInfo let block, add | where isnotnull(UserAccountControl) BEFORE computing the boolean flags. This limits the UAC analysis to accounts that actually have UAC data (~100 out of 16,000+ in a typical environment). The Q7 query uses leftouter join, so accounts without UAC data get null for the flag columns, and countif(PasswordNeverExpires == true) correctly excludes nulls. Counts from this pattern are now accurate, not directional.

8b. Q7 IdentityInfo Join — Use IdentityId, Not AccountUpn

Problem: Joining on AccountUpn can produce 1:many inflation when multiple IdentityInfo records share the same UPN. Additionally, using IdentityInfo as the primary (left) table inflates the row count because IdentityInfo contains multiple snapshot records per identity.

Solution: Use IdentityAccountInfo as the primary table (deduplicated by IdentityId). Join IdentityInfo on IdentityId (the stable cross-table identity key). Deduplicate IdentityInfo by IdentityId as well. This ensures 1:1 matching and the correct enabled-account baseline.

9. Tags Only Available on Accounts with MDI Coverage

Problem: Tags (Sensitive, Honeytoken, etc.) are populated only by Defender for Identity. Accounts from providers without MDI integration won't have tags.

Solution: Don't interpret "no tags" as "not sensitive." Report the count of tagged accounts and note that only MDI-monitored accounts can be tagged.

10. IdentityLogonEvents Has 30-Day Retention in AH

Problem: When using IdentityLogonEvents for stale account detection (Q5), AH only retains 30 days. Accounts that last logged in 31–90 days ago will appear "stale" if only checking IdentityLogonEvents.

Solution: For accurate 90-day stale detection, consider enriching with SigninLogs via Data Lake (90d+ retention). The 30d IdentityLogonEvents window is still useful for identifying accounts with zero recent activity.

11. Deduplication Key: AccountId vs IdentityId

Problem: AccountId is unique per provider-account pair. IdentityId is the unified identity (one person may have multiple AccountIds). Using the wrong key inflates or deflates counts.

Solution:

  • Use by AccountId when counting individual accounts/provider-specific analysis
  • Use by IdentityId when counting people/unified identity analysis
  • Q7 (password posture) uses by IdentityId because it joins with IdentityInfo per person
  • Q8 (multi-provider) groups by IdentityId to detect cross-provider linking

12. SourceProviderRiskLevel vs IdentityInfo.RiskLevel

Problem: Both tables have risk level fields but they may differ:

  • IdentityAccountInfo.SourceProviderRiskLevel: Risk from the source provider (AAD Identity Protection, AD MDI)
  • IdentityInfo.RiskLevel: Entra ID risk level + RiskStatus for remediation state

Solution: For a complete risk picture, check both. SourceProviderRiskLevel covers more providers; IdentityInfo.RiskLevel + RiskStatus gives Entra-specific remediation context.

13. Provider Count Varies by Tenant

Problem: Not all tenants have 6 providers connected. The provider list depends on which identity sources are integrated with Defender XDR / MDI.

Solution: Always report the actual providers found rather than assuming a fixed set. The inventory query (Q1) discovers this dynamically.


Quality Checklist

Before delivering the report, verify:

  • All queries used arg_max(Timestamp, *) by AccountId (or by IdentityId where noted)
  • All queries ran via RunAdvancedHuntingQuery (not Data Lake, except Q5b enrichment)
  • Zero-result queries reported with explicit absence confirmation (✅ pattern)
  • Identity Posture Score computation is transparent with per-dimension evidence
  • AccountStatus filtering handles provider-specific vocabularies
  • Privileged account audit includes roles from all providers (AAD + CyberArk + Okta)
  • Empty Preview fields reported as "Not yet populated (Preview)" not "No data"
  • Password posture correctly notes LastPasswordChangeTime sparsity
  • Multi-provider identity analysis includes status mismatch detection
  • Recommendations are prioritized and evidence-based
  • All hyperlinks copied verbatim from URL Registry
  • No PII from live environments in the SKILL.md file itself
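Investigates security incidents in Microsoft Defender XDR or Sentinel. Retrieves metadata, alerts, and assets, then guides the user in selecting entities (users, devices, IoCs) for deeper analysis, with support for multi-workspace selection and iterative investigation loops.
investigate incident incident ID incident investigation analyze incident triage incident incident number/ID with investigation context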
.github/skills/incident-investigation/SKILL.md
npx skills add SCStelz/security-investigator --skill incident-investigation -g -y
SKILL.md
Frontmatter
{
    "name": "incident-investigation",
    "description": "Use this skill when asked to investigate a security incident by ID from Microsoft Defender XDR or Microsoft Sentinel. Triggers on keywords like \"investigate incident\", \"incident ID\", \"incident investigation\", \"analyze incident\", \"triage incident\", or when an incident number\/ID is mentioned with investigation context. This skill provides comprehensive incident analysis including metadata retrieval, alert listing, asset enumeration, evidence filtering, and deep entity investigation using Sentinel MCP tools and specialized skills.",
    "drill_down_prompt": "Investigate incident {entity} — alert details, entity extraction, timeline reconstruction",
    "threat_pulse_domains": [
        "incidents"
    ]
}

Incident Investigation - Instructions

Purpose

This skill performs comprehensive security investigations on incidents from Microsoft Defender XDR and Microsoft Sentinel. It retrieves incident details, lists alerts, enumerates assets and evidences, and then performs deep investigation on user-selected entities using appropriate tools and specialized skills.

Investigation Flow:

  1. Phase 1: Incident Description - Retrieve metadata, alerts, assets, and evidences
  2. Phase 2: Incident Investigation Menu - Ask the user to select the incident assets and entities that should be investigated.
  3. Phase 2-A: User Investigation - Follow user-investigation skill workflow
  4. Phase 2-B: Device Investigation - Follow computer-investigation skill workflow
  5. Phase 2-C: IoC Investigation - Follow ioc-investigation skill workflow for IPs, URLs, Files, Domains, Hashes
  6. Phase 3: Looping to Phase 2 - Ask the user to select further assets and entities to investigate.

📑 TABLE OF CONTENTS

  1. Critical Workflow Rules - Start here!
  2. Phase 1: Incident Description - Metadata, Alerts, Assets, Evidences
  3. Phase 2: Incident Investigation Menu - Presenting the options
  4. Phase 2-A: User Investigation - Using user-investigation skill
  5. Phase 2-B: Device Investigation - Using computer-investigation skill
  6. Phase 2-C: IoC Investigation - Using ioc-investigation skill (IPs, URLs, Files, Domains, Hashes)
  7. Phase 3: Post Incident Investigation - Looping to phase 2
  8. JSON Export Structure - Required fields
  9. Error Handling - Troubleshooting guide

⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

Before starting ANY incident investigation:

  1. ALWAYS complete Phase 1 first - Retrieve full incident description before any deep investigation
  2. ALWAYS list Sentinel workspaces at the START of Phase 2 - Call list_sentinel_workspaces MCP tool BEFORE presenting the investigation menu
  3. ⛔ ALWAYS complete workspace selection BEFORE any investigation - This is a MANDATORY CHECKPOINT:
    • If 1 workspace: auto-select and display to user
    • If multiple workspaces: ASK USER to select and WAIT for response
    • DO NOT proceed to any entity investigation without a workspace selected
  4. ALWAYS present extracted entities to user - After workspace selection, ask user which entities to investigate
  5. ALWAYS wait for user confirmation - Do not proceed with deep investigation until user selects entities
  6. ALWAYS use the correct tools for each entity type:
    • Users → Follow .github/skills/user-investigation/SKILL.md
    • Devices → Follow .github/skills/computer-investigation/SKILL.md
    • IPs/URLs/Files/Domains/Hashes → Follow .github/skills/ioc-investigation/SKILL.md
  7. ALWAYS track and report time after each major step
  8. ALWAYS filter evidences - Remove internal IPs (RFC1918) and tenant domains from investigation scope. Also remove any public IPs that belong to devices listed as incident assets.
  9. ALWAYS defang malicious/suspicious URLs and IPs - NEVER return them as clickable links. Use defang format: hxxps://evil[.]com, 203[.]0[.]113[.]42
  10. ⛔ NEVER auto-select a Sentinel workspace when multiple exist - Workspace selection is MANDATORY:
    • ❌ DO NOT select a workspace on behalf of the user when multiple exist
    • ❌ DO NOT switch to another workspace if a query fails
    • ❌ DO NOT proceed with investigation without explicit user selection
    • ✅ If query fails: STOP, report error, ask user to select different workspace
    • ✅ If multiple workspaces: STOP, list all, WAIT for user selection
    • ✅ Only auto-select if exactly ONE workspace exists

Incident ID Patterns:

| Pattern | Source | Tool to Use |
|---------|--------|-------------|
| Numeric (e.g., 12345, 98765) | Defender XDR / Sentinel | GetIncidentById |
| GUID format | Sentinel (internal) | Sentinel query_lake MCP tool |
| INxx-xxxxx format | Defender XDR | GetIncidentById |

⚠️ Sentinel → Defender XDR ID Mapping (Critical):

When an incident is discovered via Sentinel KQL (e.g., SecurityIncident or SecurityAlert tables), its IDs are Sentinel-local and will NOT work with the Triage MCP:

| Sentinel Field | Triage MCP Accepts? | Correct Field to Use |
|----------------|---------------------|----------------------|
| SecurityIncident.IncidentNumber | ❌ Returns "not found" | Use SecurityIncident.ProviderIncidentId |
| SecurityAlert.SystemAlertId | ❌ Returns "not found" | Extract parse_json(ExtendedProperties).IncidentId |
| SecurityIncident.ProviderIncidentId | ✅ Yes | Pass directly to GetIncidentById |

Rule: When querying SecurityIncident for later Triage MCP drill-down, always project ProviderIncidentId alongside IncidentNumber. Use ProviderIncidentId for all GetIncidentById calls.

Date Range Rules:

  • Default analysis window: 7 days before current date to current date (Standard)
  • Investigation depth options:
    • Comprehensive: 30 days window (for thorough analysis)
    • Standard: 7 days window (default)
    • Quick: 1 day window (for rapid triage)
  • Format: ISO 8601 (e.g., 2026-01-17T00:00:00Z to 2026-01-24T00:00:00Z)
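The depth options above translate into a start/end pair as follows. This is a sketch under the assumption that the window ends at the supplied reference time; the names `DEPTH_DAYS` and `analysis_window` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Window length in days for each investigation depth
DEPTH_DAYS = {"quick": 1, "standard": 7, "comprehensive": 30}


def analysis_window(now: datetime, depth: str = "standard"):
    """Return (start, end) as ISO 8601 UTC strings for the chosen depth."""
    end = now.astimezone(timezone.utc)
    start = end - timedelta(days=DEPTH_DAYS[depth])
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)


reference = datetime(2026, 1, 24, tzinfo=timezone.utc)
print(analysis_window(reference))                   # standard, 7 days
print(analysis_window(reference, "quick"))          # 1 day
print(analysis_window(reference, "comprehensive"))  # 30 days
```

Pass a timezone-aware datetime; a naive value would be interpreted as local time by `astimezone`.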

Phase 1: Incident Description

This phase retrieves and presents all incident information. Follow the exact structure below.

1.1 Incident Metadata

Retrieve and list the incident's metadata using GetIncidentById:

| Field | Description |
|-------|-------------|
| Title | Incident display name |
| Description | Detailed incident description |
| Status | Active, Resolved, Redirected |
| Severity | High, Medium, Low, Informational |
| Priority assessment | If available from incident data |
| Classification | TruePositive, FalsePositive, BenignPositive, etc. |
| Determination | Malware, Phishing, etc. |
| Created Date | When incident was created |
| First Activity Date | First malicious activity timestamp |
| Last Updated Date | Most recent modification |
| Assigned To | Analyst assigned to incident |
| MITRE Categories | Tactics and techniques involved |
| Tags | Labels applied to incident |

1.2 Incident Alerts

🔴 Tool Selection for Alert Retrieval

Use GetIncidentById with includeAlertsData=true to retrieve incident-specific alerts. This returns only alerts correlated to the incident.

⛔ DO NOT use ListAlerts to retrieve alerts for a specific incident. ListAlerts has NO incidentId parameter — it can only filter by createdAfter, createdBefore, severity, status. Calling it returns all tenant alerts (up to page size 10,000), not incident-specific ones. Any unsupported parameter (e.g., incidentId) is silently ignored.

If GetIncidentById(includeAlertsData=true) returns a truncated or excessively large response (e.g., incident has hundreds of correlated alerts from noise sources like Purview IRM or DLP), use RunAdvancedHuntingQuery as the fallback:

// Get alerts linked to the incident's primary user/entity
AlertInfo
| where Timestamp > datetime(<incident_created_minus_7d>)
| join kind=inner (
    AlertEvidence
    | where Timestamp > datetime(<incident_created_minus_7d>)
    | where EntityType == "User"
    | where AccountUpn =~ "<primary_user_upn>" or AccountObjectId == "<user_object_id>"
    | distinct AlertId
) on AlertId
| project Timestamp, AlertId, Title, Severity, Category, AttackTechniques, DetectionSource, ServiceSource
| order by Timestamp asc

This approach bypasses the Triage MCP's alert cap and gives full control over date range and entity filtering.

Alert Fields to Retrieve

For each alert, retrieve:

  • Alert name
  • Tags
  • Severity
  • Investigation state
  • Status
  • Impacted assets
  • Correlation reason
  • Detection source
  • First activity
  • Last activity

Presentation Rules:

  1. Return as a table (exclude Alert ID column from display)
  2. Order by last activity date descending
  3. Add row numbers starting from 1
  4. If more than 30 alerts exist, note this after the table and provide a Defender portal link
  5. NEVER calculate and write the total number of alerts
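The presentation rules can be sketched as a small formatting step. The field names (`alert_id`, `last_activity`, `name`) are hypothetical stand-ins for whatever the tool returns, and capping the table at 30 rows is an assumption about how rule 4 is applied.

```python
def present_alerts(alerts, limit=30):
    """Order alerts by last activity (newest first), number the rows, hide the alert ID."""
    ordered = sorted(alerts, key=lambda a: a["last_activity"], reverse=True)
    rows = []
    for number, alert in enumerate(ordered[:limit], start=1):
        visible = {k: v for k, v in alert.items() if k != "alert_id"}
        rows.append({"#": number, **visible})
    # Only a flag is returned, never a total count (rule 5).
    more_exist = len(ordered) > limit
    return rows, more_exist


sample = [
    {"alert_id": "x1", "name": "A", "last_activity": "2026-01-20T10:00:00Z"},
    {"alert_id": "x2", "name": "B", "last_activity": "2026-01-22T08:00:00Z"},
    {"alert_id": "x3", "name": "C", "last_activity": "2026-01-21T09:00:00Z"},
]
rows, more_exist = present_alerts(sample, limit=2)
print(rows, more_exist)
```

ISO 8601 UTC timestamps in the same format sort correctly as plain strings, which is why no date parsing is needed here.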

1.3 Incident Assets

Retrieve and list ALL assets involved in the incident by type:

Device Assets:

| Field | Description |
|-------|-------------|
| Name | Device hostname |
| Domain | AD domain |
| Risk Level | Device risk assessment |
| Exposure Level | Vulnerability exposure |
| OS Platform | Operating system |

User Assets:

| Field | Description |
|-------|-------------|
| Display Name | User's full name |
| UPN | User Principal Name |
| User Status | Account status |
| Domain | User's domain |
| Department | Organizational department |

App Assets:

| Field | Description |
|-------|-------------|
| App Name | Application name |
| App Client ID | OAuth client ID |
| Risk | Application risk level |
| Publisher | App publisher |

Cloud Resource Assets:

| Field | Description |
|-------|-------------|
| Resource Name | Cloud resource identifier |
| Status | Resource status |
| Cloud Environment | Azure, AWS, GCP, etc. |
| Type | Resource type |

Count assets by type ONLY after retrieving complete lists.

1.4 Incident Evidences

Retrieve evidences classified as malicious or suspicious only:

Processes (Top 10):

  • Get ALL malicious/suspicious processes
  • Return only the 10 most probable signs of malicious activity (use judgment)

Files (Top 10):

  • Get ALL malicious/suspicious files
  • Return only the 10 most probable signs of malicious activity (use judgment)

IP Addresses (Top 10, Filtered):

  • Get ALL malicious/suspicious IPs
  • Filter out RFC1918 internal IPs: 10.x.x.x, 172.16-31.x.x, 192.168.x.x
  • Filter out public IPs associated to the devices listed as assets involved in the incident
  • Return only the first 10 from filtered list
  • DEFANG ALL IPs: When presenting IPs and domains to the user, ALWAYS use defanged format: 203[.]0[.]113[.]42, evil[.]com. NEVER output clickable malicious indicators.

URLs and DNS Domains (Top 10, Filtered):

  • Get ALL malicious/suspicious URLs and DNS Domains
  • Filter out tenant domain URLs (DNS domains associated with the organization)
  • Return only the first 10 from filtered list
  • DEFANG ALL URLs AND DNS DOMAINS: When presenting URLs and DNS domains to the user, ALWAYS use defanged format: hxxps://evil[.]com/path, hxxp://malware[.]net. NEVER output clickable malicious URLs.

AD Domains:

  • Return ALL malicious/suspicious AD domains (no limit)

For each evidence type: If more than 10 exist, note this after the table and provide Defender portal link.
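The IP filtering steps above can be sketched as follows. This is a minimal sketch: `filter_ips` is a hypothetical helper, and the set of device public IPs is assumed to have been collected from the incident's device assets beforehand. The RFC1918 ranges are listed explicitly because Python's `is_private` also flags other reserved ranges.

```python
import ipaddress
from typing import Iterable, List, Set

# RFC1918 internal ranges named in the filtering rule
RFC1918 = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def filter_ips(candidates: Iterable[str], device_public_ips: Set[str], limit: int = 10) -> List[str]:
    """Drop RFC1918 addresses and known device public IPs, then keep the first `limit`."""
    kept: List[str] = []
    for raw in candidates:
        ip = ipaddress.ip_address(raw)
        if ip.version == 4 and any(ip in net for net in RFC1918):
            continue  # internal address
        if raw in device_public_ips:
            continue  # public IP of a device already listed as an incident asset
        kept.append(raw)
    return kept[:limit]
```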


Phase 2: Incident Investigation Menu

⛔ MANDATORY CHECKPOINT: Workspace Selection

This checkpoint MUST be completed before ANY entity investigation can proceed.

Step 2.1: List Sentinel Workspaces

ALWAYS execute this step first, regardless of any other considerations:

list_sentinel_workspaces (MCP tool)

Store the result. This determines the workflow for Step 2.3.

Step 2.2: Present Entity Summary

Show a summary of the incident entities and assets from Phase 1:

  • Users (with UPN and display name)
  • Devices (with hostname and risk level)
  • URLs (defanged)
  • IPs (defanged, filtered)
  • File hashes
  • Domains (defanged)

🔴 DEFANG ALL URLs AND DOMAINS: When presenting URLs and DNS Domains to the user, ALWAYS use defanged format: hxxps://evil[.]com/path, hxxp://malware[.]net, evil[.]com. NEVER output clickable malicious URLs.

🔴 DEFANG ALL IPs: When presenting IPs to the user, ALWAYS use defanged format: 203[.]0[.]113[.]42. NEVER output clickable malicious indicators.
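A defanging helper along these lines could produce the formats shown above. It is a sketch under simple assumptions: `defang` is a hypothetical name, only the `http`/`https` schemes are rewritten, and every dot is bracketed, including dots in the URL path.

```python
def defang(indicator: str) -> str:
    """Return a non-clickable form of an IPv4 address, domain, or http(s) URL."""
    scheme, sep, rest = indicator.partition("://")
    if sep and scheme.lower() in ("http", "https"):
        # http -> hxxp, https -> hxxps
        indicator = scheme.lower().replace("http", "hxxp") + sep + rest
    # Undo any existing bracketing first so the function is safe to call twice.
    return indicator.replace("[.]", ".").replace(".", "[.]")
```

Applying it once at the presentation layer, rather than to stored values, keeps the raw indicators available for queries.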

Step 2.3: Workspace Selection Gate

IF workspace_count == 1:
    - Auto-select the single workspace
    - Display: "Using Sentinel workspace: [NAME] ([ID])"
    - Set SESSION_WORKSPACE_SELECTED = true
    
ELSE IF workspace_count > 1 AND SESSION_WORKSPACE_SELECTED == false:
    - Display all workspaces with Name and ID
    - ASK USER: "Which Sentinel workspace should I run my searches in? Select one or more, or choose 'all'."
    - WAIT for user response
    - Set SESSION_WORKSPACE_SELECTED = true after selection
    
ELSE IF workspace_count > 1 AND SESSION_WORKSPACE_SELECTED == true:
    - Display: "Continuing with previously selected workspace: [NAME] ([ID])"
    - DO NOT ask again

⛔ DO NOT PROCEED PAST THIS POINT WITHOUT A WORKSPACE SELECTED

If SESSION_WORKSPACE_SELECTED == false after Step 2.3, STOP and ask the user to select a workspace.
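The gate logic can be expressed as a small function. This is an illustrative sketch only: `Session`, `workspace_gate`, and the `ask_user` callback are hypothetical names standing in for whatever state and prompting mechanism the agent actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Workspace = Tuple[str, str]  # (name, id)


@dataclass
class Session:
    workspace_selected: bool = False
    selected_ids: List[str] = field(default_factory=list)


def workspace_gate(
    workspaces: List[Workspace],
    session: Session,
    ask_user: Callable[[List[Workspace]], List[str]],
) -> List[str]:
    """Return the selected workspace IDs, asking the user at most once per session."""
    if session.workspace_selected:
        return session.selected_ids  # continue with the earlier selection; do not ask again
    if not workspaces:
        raise RuntimeError("no Sentinel workspace available")
    if len(workspaces) == 1:
        session.selected_ids = [workspaces[0][1]]  # auto-select the single workspace
    else:
        chosen = ask_user(workspaces)  # blocks until the user answers
        if not chosen:
            raise RuntimeError("no workspace selected; cannot proceed")
        session.selected_ids = list(chosen)
    session.workspace_selected = True
    return session.selected_ids
```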

Step 2.4: Ask User to Select Entities

Ask the user:

"Which assets and entities involved in the incident should be investigated in depth? Please select them by providing their numbers or names, or simply ask to analyze all of them. The more entities you select, the longer the analysis will take."

🔴 DO NOT OFFER OTHER OPTIONS: Only ask the user whether they want to investigate one or more of the incident entities and assets listed above in more depth.

Read the response.

  • If they do not want to proceed with the proposed investigations, ask them what they want to do.
  • If they want to proceed with one or more of the proposed investigations, continue with Step 2.5.

Step 2.5: Start Investigations

Pre-flight check: Confirm SESSION_WORKSPACE_SELECTED == true before proceeding.

Proceed in accordance with the instructions described below for Phase 2-A, Phase 2-B, and Phase 2-C. When multiple investigation types are selected (users, devices, IoCs), run them in parallel as much as possible.


Phase 2-A: User Investigation

Pre-requisites (MANDATORY)

⛔ VERIFY BEFORE PROCEEDING:

  • SESSION_WORKSPACE_SELECTED == true (workspace explicitly selected by user)
  • SELECTED_WORKSPACE_IDS array is populated with user's selection
  • ✅ User has explicitly selected which user(s) to investigate

If any pre-requisite is FALSE: STOP and return to Step 2.3 (Workspace Selection Gate).

User Investigation Workflow

⚡ PARALLEL EXECUTION: When multiple users are selected, execute user investigations in parallel as much as possible.

📦 WORKSPACE CONTEXT: Pass the selected workspace(s) to all child skill invocations:

  • Use SELECTED_WORKSPACE_IDS from Step 2.3 for all Sentinel queries
  • If a query fails with table/workspace error: STOP, report error, ask user to select different workspace
  • ⛔ DO NOT automatically retry with a different workspace

For EACH user selected by the user:

🔴 REFERENCE THE SKILL FILE: Read and follow the complete workflow defined in:

.github/skills/user-investigation/SKILL.md

Key Steps (summary - see skill file for full details):

  1. Get User Object ID from Microsoft Graph
  2. Calculate date ranges based on investigation type (Standard/Quick/Comprehensive)
  3. Run parallel data collection:
    • Sign-in anomalies (Signinlogs_Anomalies_KQL_CL — note lowercase 'l' in "logs")
    • Sign-in statistics (apps, locations, IPs)
    • Audit log events
    • Office 365 activity
    • Security incidents involving user
    • Identity Protection risk detections
    • MFA and authentication methods
    • Device compliance status
  4. IP enrichment for flagged addresses
  5. Compile and present findings
  6. Generate HTML report (if requested)

DO NOT copy the full workflow here - always read the skill file for the most current instructions.


Phase 2-B: Device Investigation

Device Investigation Workflow

⚡ PARALLEL EXECUTION: When multiple devices are selected, execute device data collection queries in parallel for ALL devices simultaneously. Run Defender alerts, compliance, logged-on users, vulnerabilities, network/process/file events queries concurrently.

For EACH device selected by the user:

🔴 REFERENCE THE SKILL FILE: Read and follow the complete workflow defined in:

.github/skills/computer-investigation/SKILL.md

Key Steps (summary - see skill file for full details):

  1. Get Device IDs (Entra Device ID + Defender Device ID)
  2. Determine device type (Entra Joined, Hybrid Joined, Entra Registered)
  3. Run parallel data collection:
    • Defender alerts for device
    • Device compliance status
    • Logged-on users
    • Software vulnerabilities
    • Network connections
    • Process events
    • File events
    • Automated investigations
  4. Compile and present findings

DO NOT copy the full workflow here - always read the skill file for the most current instructions.


Phase 2-C: IoC Investigation

IoC Investigation Workflow

⚡ PARALLEL EXECUTION: When multiple IoCs are selected, execute ALL IoC investigation queries in parallel. Run threat intel lookups, Sentinel queries, and organizational exposure queries concurrently for all IoCs.

For EACH IoC selected by the user:

🔴 REFERENCE THE SKILL FILE: Read and follow the complete workflow defined in:

.github/skills/ioc-investigation/SKILL.md

Supported IoC Types:

| IoC Type | Detection Pattern | Key Investigation Points |
|----------|-------------------|--------------------------|
| URL | `https?://` or domain pattern | Malicious indicators, phishing, threat intel, organizational exposure |
| IPv4 Address | `\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}` | Threat intel, network connections, geographic analysis |
| IPv6 Address | Contains multiple colons | Same as IPv4 |
| Domain | `[a-zA-Z0-9][-a-zA-Z0-9]*\.[a-zA-Z]{2,}` | DNS queries, email threats, reputation |
| MD5 Hash | 32 hex characters | File prevalence, malware analysis |
| SHA1 Hash | 40 hex characters | File prevalence, malware analysis |
| SHA256 Hash | 64 hex characters | File prevalence, malware analysis |

Key Steps (summary - see skill file for full details):

  1. Identify IoC type and normalize
  2. Query Defender Threat Intelligence
  3. Check Sentinel ThreatIntelIndicators table
  4. Analyze organizational exposure (devices, connections)
  5. Correlate with CVEs if applicable
  6. Present findings with risk assessment

DO NOT copy the full workflow here - always read the skill file for the most current instructions.


Phase 3: Post-Investigation Loop (MANDATORY)

⛔ CRITICAL: DO NOT END THE RESPONSE WITHOUT COMPLETING THIS PHASE

After completing ALL selected entity investigations in Phase 2, you MUST:

  1. List remaining uninvestigated entities - Show all entities from Phase 1 that were NOT yet investigated
  2. Ask the user to select additional entities - Prompt user to continue or conclude
  3. Wait for user response - Do not assume the investigation is complete

Phase 3 Checklist (Execute After Every Phase 2 Completion)

☐ Step 3.1: Compile list of UNINVESTIGATED entities (exclude already-investigated items)
☐ Step 3.2: Present remaining entities to user with numbered list
☐ Step 3.3: Ask: "Would you like to investigate any of the remaining entities? Select by number/name, or say 'done' to conclude."
☐ Step 3.4: Wait for user response before concluding

Required Prompt Format

After presenting investigation findings, ALWAYS end with:

📋 Remaining Uninvestigated Entities:

| # | Type | Entity | Notes |
|---|------|--------|-------|
| 1 | Device | [DEVICE_NAME] | [Risk level or relevant context] |
| 2 | File | [FILENAME] | [Hash or detection status] |
| 3 | URL | [DEFANGED_URL] | [Threat assessment] |
| ... | ... | ... | ... |

Would you like to investigate any of these remaining entities? Select by number/name, type "all" to investigate everything, or say "done" to conclude the investigation.

Rules

  • DO NOT include entities that were already investigated in the list
  • DO NOT ask the user to select Sentinel workspaces again (use previously selected workspace)
  • DO NOT provide a final summary or recommendations until the user explicitly says "done" or declines further investigation
  • DO NOT assume the investigation is complete just because selected entities were analyzed

Loop Behavior

IF user selects additional entities:
    → Return to Phase 2 (2-A, 2-B, or 2-C based on entity type)
    → After completion, return to Phase 3 again
    
ELSE IF user says "done" or declines:
    → Proceed to Final Summary
    → Provide recommendations
    → Offer to generate consolidated report
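The bookkeeping behind the loop, which lists only entities not yet investigated and renumbers them from 1, might look like the sketch below. `remaining_entities` and the `(type, value)` tuple shape are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

Entity = Tuple[str, str]  # (type, value), e.g. ("Device", "WORKSTATION-01")


def remaining_entities(
    phase1_entities: Iterable[Entity], investigated: Iterable[Entity]
) -> List[Tuple[int, str, str]]:
    """Return (row_number, type, value) for every entity not yet investigated."""
    done = set(investigated)
    pending = [entity for entity in phase1_entities if entity not in done]
    return [(row, kind, value) for row, (kind, value) in enumerate(pending, start=1)]
```

An empty result means nothing is left to offer, and the workflow can move to the final summary once the user confirms.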

Sentinel MCP Tools Reference

analyze_user_entity

Purpose: Starts asynchronous security analysis of a user entity.

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| userId | string | Yes | User's Azure AD Object ID (GUID) |
| startTime | string | Yes | ISO 8601 format start time |
| endTime | string | Yes | ISO 8601 format end time |
| workspaceId | string | No | Sentinel workspace GUID (optional if only one workspace) |

Time Window Options: 30 days (Comprehensive), 7 days (Standard), 1 day (Quick)

Returns: 202 Accepted with analysisId

get_entity_analysis

Purpose: Retrieves results of an asynchronous entity analysis.

Parameters:

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| analysisId | string | Yes | Analysis ID returned from analyze_*_entity |

Returns: 200 OK with analysis results when complete, or status if still processing
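Because the analysis is asynchronous, the caller has to poll for the result. The sketch below uses the 5-second interval and 2-minute cap suggested in the Error Handling table. The `fetch` callback and the `"status": "complete"` response shape are assumptions for illustration; the actual shape of the get_entity_analysis response is not specified here.

```python
import time
from typing import Any, Callable, Dict


def poll_analysis(
    fetch: Callable[[], Dict[str, Any]],
    interval_s: float = 5.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch() until it reports completion or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        result = fetch()  # hypothetical wrapper around get_entity_analysis
        if result.get("status") == "complete":  # assumed response field
            return result
        if clock() >= deadline:
            raise TimeoutError("analysis still processing after timeout")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays.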


Quick Start (TL;DR)

When a user requests an incident investigation:

  1. Phase 1 - Incident Description:

    • Retrieve incident metadata using GetIncidentById
    • List top 30 alerts as a table
    • Enumerate all assets by type (devices, users, apps, cloud resources)
    • List filtered evidences (processes, files, IPs, URLs, domains)
  2. ⛔ Phase 2 - Mandatory Workspace Selection:

    • Call list_sentinel_workspaces MCP tool FIRST
    • Present entity summary from Phase 1
    • If 1 workspace: auto-select and display
    • If multiple workspaces: ASK USER to select before proceeding
    • DO NOT proceed to investigations without a workspace selected
  3. Phase 2-A - User Investigation:

    • For each selected user: Follow .github/skills/user-investigation/SKILL.md
    • Present findings
  4. Phase 2-B - Device Investigation:

    • For each selected device: Follow .github/skills/computer-investigation/SKILL.md
    • Present findings
  5. Phase 2-C - IoC Investigation:

    • For each selected IoC (IPs, URLs, Files, Domains, Hashes): Follow .github/skills/ioc-investigation/SKILL.md
    • Present findings
  6. Export & Summary:

    • Create consolidated JSON file
    • Present investigation summary with recommendations

JSON Export Structure

Required Fields

| Field | Type | Description |
|-------|------|-------------|
| investigation_metadata | object | Incident ID, timestamp, investigation phases completed |
| incident_details | object | Metadata, alerts, assets, evidences from Phase 1 |
| user_investigations | array | Results from Phase 2-A (user-investigation skill) |
| device_investigations | array | Results from Phase 2-B (computer-investigation skill) |
| ioc_investigations | array | Results from Phase 2-C (ioc-investigation skill; includes IPs, URLs, Files, Domains, Hashes) |
| summary | object | Key findings, risk assessment, recommendations |

Example JSON Structure

{
  "investigation_metadata": {
    "incident_id": "<INCIDENT_ID>",
    "investigation_timestamp": "<ISO_TIMESTAMP>",
    "phases_completed": ["incident_description", "user_investigation", "device_investigation", "ioc_investigation"],
    "total_elapsed_time_seconds": 300
  },
  "incident_details": {
    "metadata": {
      "title": "<INCIDENT_TITLE>",
      "description": "<DESCRIPTION>",
      "severity": "<SEVERITY>",
      "status": "<STATUS>",
      "classification": "<CLASSIFICATION>",
      "determination": "<DETERMINATION>",
      "created_date": "<TIMESTAMP>",
      "first_activity_date": "<TIMESTAMP>",
      "last_updated_date": "<TIMESTAMP>",
      "assigned_to": "<ANALYST>",
      "mitre_categories": ["<TACTIC1>", "<TACTIC2>"],
      "tags": ["<TAG1>", "<TAG2>"]
    },
    "alerts": [
      {
        "name": "<ALERT_NAME>",
        "severity": "<SEVERITY>",
        "status": "<STATUS>",
        "first_activity": "<TIMESTAMP>",
        "last_activity": "<TIMESTAMP>"
      }
    ],
    "assets": {
      "devices": [...],
      "users": [...],
      "apps": [...],
      "cloud_resources": [...]
    },
    "evidences": {
      "processes": [...],
      "files": [...],
      "ip_addresses": [...],
      "urls": [...],
      "ad_domains": [...]
    }
  },
  "user_investigations": [
    {
      "upn": "user@domain.com",
      "user_id": "<GUID>",
      "analysis_id": "<ANALYSIS_ID>",
      "time_window": {
        "start": "<ISO_TIMESTAMP>",
        "end": "<ISO_TIMESTAMP>"
      },
      "findings": {...},
      "risk_level": "High"
    }
  ],
  "device_investigations": [
    {
      "hostname": "<DEVICE_NAME>",
      "device_id": "<GUID>",
      "findings": {...}
    }
  ],
  "ioc_investigations": [
    {
      "ioc_type": "IP",
      "value": "203.0.113.42",
      "findings": {...}
    },
    {
      "ioc_type": "URL",
      "value": "https://example.com",
      "findings": {...},
      "threat_assessment": "Malicious"
    }
  ],
  "summary": {
    "risk_assessment": "High",
    "key_findings": [...],
    "recommendations": [...]
  }
}
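Before writing the export, the required top-level fields can be checked with a few lines of code. This is a sketch, not part of the skill: `validate_export` is a hypothetical helper that checks only presence and top-level type.

```python
from typing import Any, Dict, List

# Required top-level fields and their JSON types, from the Required Fields table
REQUIRED_FIELDS = {
    "investigation_metadata": dict,
    "incident_details": dict,
    "user_investigations": list,
    "device_investigations": list,
    "ioc_investigations": list,
    "summary": dict,
}


def validate_export(export: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the structure looks complete."""
    problems: List[str] = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in export:
            problems.append(f"missing: {name}")
        elif not isinstance(export[name], expected_type):
            problems.append(f"wrong type: {name}")
    return problems
```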

Error Handling

Common Issues and Solutions

| Issue | Solution |
|-------|----------|
| Incident not found | Verify incident ID format; try Sentinel query if Defender fails |
| User Object ID not found | Verify UPN is correct; check if user exists in Entra ID |
| analyze_user_entity returns error | Check userId is GUID format; verify time window ≤ 30 days |
| get_entity_analysis still processing | Poll again after 5-10 seconds; max 2 minutes |
| No workspace found | Use list_sentinel_workspaces MCP tool to get workspace ID |
| Device investigation fails | Verify device exists in Defender; check device ID type |
| IoC investigation timeout | Reduce date range; check IoC format |

Workspace ID Retrieval

If workspace ID is unknown, retrieve it first:

list_sentinel_workspaces (MCP tool)

Returns: List of workspace name/ID pairs

Workspace ID Selection

If list_sentinel_workspaces returns more than one Sentinel workspace, present the list of workspace names and IDs so the user can select which workspace to use for the investigation. Also offer the option to use all workspaces.

If the user selects a single workspace, pass that workspace's workspaceId when calling investigation tools.

If the user selects more than one workspace, call the investigation tools once per workspace, passing each workspaceId in turn.

Time Window Limits

| Tool | Time Window Options |
|------|---------------------|
| User Investigation | 30 days (Comprehensive), 7 days (Standard), 1 day (Quick) |
| Computer Investigation | 30 days (Comprehensive), 7 days (Standard), 1 day (Quick) |
| IoC Investigation | 30 days (Comprehensive), 7 days (Standard), 1 day (Quick) |

Example Investigation Workflow

User Request: "Investigate incident 12345"

Phase 1: Incident Description

[00:00] Starting incident investigation for ID: 12345

### Incident Metadata
- **Title:** Multi-stage attack with credential theft
- **Severity:** High
- **Status:** Active
- **Classification:** TruePositive
- **Created:** 2026-01-20T10:30:00Z
- **MITRE Categories:** Initial Access, Credential Access, Lateral Movement

### Incident Alerts 
| # | Alert Name | Severity | Status | Last Activity |
|---|------------|----------|--------|---------------|
| 1 | Suspicious sign-in from unusual location | High | New | 2026-01-23 |
| 2 | Credential theft attempt detected | High | InProgress | 2026-01-22 |
| ... | ... | ... | ... | ... |

### Incident Assets
**Devices:**
| Name | Domain | Risk Level | OS |
|------|--------|------------|-----|
| WORKSTATION-01 | contoso.com | High | Windows 11 |
| LAPTOP-EXEC | contoso.com | Medium | Windows 11 |
| SERVER-DC01 | contoso.com | Low | Windows Server 2022 |

**Users:**
| Display Name | UPN | Status | Department |
|--------------|-----|--------|------------|
| John Smith | jsmith@contoso.com | Active | Finance |
| Admin Account | admin@contoso.com | Active | IT |
| Jane Doe | jdoe@contoso.com | Active | HR |
| Service Account | svc-backup@contoso.com | Active | IT |

### Incident Evidences
**IPs (after filtering - excluded private IPs):**
- `203[.]0[.]113[.]42` (Malicious - C2 communication)
- `198[.]51[.]100[.]10` (Suspicious - Data exfiltration)
- `192[.]0[.]2[.]50` (Suspicious - Unusual connection)
...

**URLs (after filtering - excluded managed domains):**
- `hxxps://evil-site[.]com/payload[.]exe` (Malicious)
- `hxxps://phishing[.]example[.]com/login` (Suspicious)
...

[01:30] Phase 1 completed (90 seconds)

Phase 2-A: User Investigation

Which users from the incident assets should be investigated deeply?
Available users:
1. jsmith@contoso.com (Finance)
2. admin@contoso.com (IT)
3. jdoe@contoso.com (HR)
4. svc-backup@contoso.com (IT)

User selects: "1, 2"

[01:35] Starting parallel user analysis for 2 users...
- Getting user Object IDs from Graph API (parallel)
- Starting analyze_user_entity for jsmith@contoso.com (Analysis ID: abc123-def456)
- Starting analyze_user_entity for admin@contoso.com (Analysis ID: xyz789-ghi012)
- Polling for results (parallel)...
[02:15] All analyses complete

### User Analysis: jsmith@contoso.com
**Risk Level:** High
**Key Findings:**
1. Sign-in from unusual location (IP: `203[.]0[.]113[.]42`, Country: Russia)
2. Multiple failed MFA attempts followed by success
3. Unusual file access pattern detected
...

### User Analysis: admin@contoso.com
**Risk Level:** Medium
**Key Findings:**
1. Service account usage from new device
...

[02:20] Phase 2-A completed (45 seconds - parallel execution)

Phase 2-B: Device Investigation

Which devices from the incident assets should be investigated deeply?
Available devices:
1. WORKSTATION-01 (High risk)
2. LAPTOP-EXEC (Medium risk)
3. SERVER-DC01 (Low risk)

User selects: "1"

[03:10] Starting device investigation for WORKSTATION-01...
- Following computer-investigation skill workflow
- Getting device IDs (Entra + Defender)
- Running parallel queries...
[04:30] Device investigation complete

### Device Analysis: WORKSTATION-01
**Key Findings:**
1. Malware execution detected (sha256: abc123...)
2. Outbound C2 communication to `203[.]0[.]113[.]42`
3. Credential dumping tool found
...

[04:35] Phase 2-B completed (85 seconds)

Phase 2-C: IoC Investigation

Which IPs, URLs, Files, Domains, or Hashes should be investigated deeply?
Available IoCs:
1. 203[.]0[.]113[.]42 (IP - C2 communication)
2. 198[.]51[.]100[.]10 (IP - Data exfiltration)
3. hxxps://evil-site[.]com/payload[.]exe (URL - Malicious)
4. hxxps://phishing[.]example[.]com/login (URL - Suspicious)
5. abc123def456... (Hash - Malware)

User selects: "1, 3, 4, 5"

[04:40] Starting parallel IoC investigation for 4 IoCs...
- Following ioc-investigation skill workflow
- Running threat intel, Sentinel, and exposure queries in parallel for all IoCs
[05:30] All IoC analyses complete

### IP Analysis: 203[.]0[.]113[.]42
**Threat Assessment:** Malicious
**Key Findings:**
1. Known C2 infrastructure
2. Associated with threat actor APT-XYZ
...

### URL Analysis: hxxps://evil-site[.]com/payload[.]exe
**Threat Assessment:** Malicious
**Key Findings:**
1. Known malware distribution domain
2. 3 devices in organization accessed this URL
...

### URL Analysis: hxxps://phishing[.]example[.]com/login
**Threat Assessment:** Suspicious
**Key Findings:**
1. Phishing page mimicking corporate login
...

### Hash Analysis: abc123def456...
**Threat Assessment:** Malicious
**Key Findings:**
1. Known malware sample
...

[05:35] Phase 2-C completed (55 seconds - parallel execution)

[05:45] Investigation Summary
=========================
**Incident:** 12345 - Multi-stage attack with credential theft
**Total Investigation Time:** 5 minutes 45 seconds (optimized with parallel execution)

**Key Findings:**
1. Compromised user account (jsmith@contoso.com) used for initial access
2. Malware deployed on WORKSTATION-01 establishing C2 channel
3. Credential theft attempt targeting admin account
4. Data exfiltration attempts detected

**Recommendations:**
1. 🔴 CRITICAL: Isolate WORKSTATION-01 immediately
2. 🔴 CRITICAL: Reset credentials for jsmith@contoso.com and admin@contoso.com
3. 🟠 HIGH: Block IP `203[.]0[.]113[.]42` at firewall
4. 🟠 HIGH: Block domain `evil-site[.]com`
5. 🟡 MEDIUM: Review all sign-ins for affected users in past 30 days

**Export:** temp/incident_investigation_12345_20260124.json

Integration with Skill Files

This skill orchestrates investigations by referencing specialized skills:

| Investigation Phase | Skill/Tool | Location/Reference |
|---------------------|------------|--------------------|
| Phase 1: Incident Description | Built-in workflow | This file (see Phase 1 section) |
| Phase 2-A: User Investigation | user-investigation skill | .github/skills/user-investigation/SKILL.md |
| Phase 2-B: Device Investigation | computer-investigation skill | .github/skills/computer-investigation/SKILL.md |
| Phase 2-C: IoC Investigation | ioc-investigation skill | .github/skills/ioc-investigation/SKILL.md (IPs, URLs, Files, Domains, Hashes) |

🔴 ALWAYS read the referenced skill file before executing that phase to ensure proper workflow execution.

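Investigates threat indicators such as IPs, domains, URLs, and file hashes, correlating Microsoft Defender threat intelligence with Advanced Hunting to identify related CVEs and assess organizational exposure.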
investigate IP check domain IoC investigation threat intel is this malicious suspicious URL
.github/skills/ioc-investigation/SKILL.md
npx skills add SCStelz/security-investigator --skill ioc-investigation -g -y
SKILL.md
Frontmatter
{
    "name": "ioc-investigation",
    "description": "Use this skill when asked to investigate an Indicator of Compromise (IoC) such as an IP address, DNS domain, URL, or file hash. Triggers on keywords like \"investigate IP\", \"check domain\", \"IoC investigation\", \"threat intel\", \"is this malicious\", \"suspicious URL\", or when an IP\/domain\/URL\/hash is mentioned with investigation context. This skill provides comprehensive IoC analysis using Microsoft Defender Threat Intelligence, Sentinel Threat Intel tables, Advanced Hunting, organizational exposure assessment, CVE correlation, and affected device enumeration.",
    "drill_down_prompt": "Investigate IoC {entity} — threat intel, organizational exposure, affected devices",
    "threat_pulse_domains": [
        "identity",
        "endpoint",
        "email",
        "exposure"
    ]
}

IoC (Indicator of Compromise) Investigation - Instructions

Purpose

This skill performs comprehensive security investigations on Indicators of Compromise (IoCs) including:

  • IP Addresses: Network connections, threat intel matches, geographic analysis, organizational exposure
  • DNS Domains: Domain reputation, connection events, email-based threats, URL analysis
  • URLs: URL reputation, phishing detection, email delivery, browser activity
  • File Hashes: Malware analysis, file prevalence, related alerts, affected devices

The investigation correlates IoCs with Microsoft Defender Threat Intelligence, identifies associated CVEs, and enumerates organizational assets affected by those vulnerabilities.


📑 TABLE OF CONTENTS

  1. Critical Workflow Rules - Start here!
  2. Investigation Types - By IoC type
  3. Quick Start - 5-step investigation pattern
  4. Execution Workflow - Complete process
  5. Sample KQL Queries - Validated query patterns
  6. Defender API Queries - Threat Intel & Vulnerability Management
  7. JSON Export Structure - Required fields
  8. Error Handling - Troubleshooting guide

Investigation shortcuts:

  • Suspicious IP from spray/brute-force (TP Q4): Q2 (network connections) → Q11 (sign-in analysis) → Q8 (alert evidence) → Q1 (TI match)
  • IP from user risk event (TP Q3): Q11 (sign-in analysis) → Q2 (device connections) → Q9 (security alerts) → enrich_ips.py
  • Phishing domain/URL (TP Q8): Q4 (DNS/HTTP connections) → Q6 (email delivery) → Q8 (alert evidence) → Q1 (TI match)
  • File hash from incident (TP Q1): Q7 (file events across all tables) → Q9 (security alerts) → Q10 (custom indicator check) → Q12 (CVE extraction)
  • IoC organizational exposure (TP Q1+Q11): Q2/Q4 (affected devices) → Q9 (alert correlation) → Q12 (CVEs from alerts)

⛔ Shortcut Default Rule: When a matching shortcut exists for the investigation context, use it — don't run the full workflow. Only run the full query set when the user explicitly requests "full investigation", "comprehensive", or "deep dive". Shortcuts render only the report sections relevant to their query chain (plus Executive Summary and Recommendations, always).


⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

Before starting ANY IoC investigation:

  1. ALWAYS identify the IoC type FIRST (IP, Domain, URL, or File Hash)
  2. ALWAYS normalize the IoC (lowercase domains, validate IP format, extract domain from URL)
  3. ALWAYS calculate date ranges correctly (use current date from context - see Date Range section)
  4. ALWAYS track and report time after each major step (mandatory)
  5. ALWAYS run independent queries in parallel (drastically faster execution)
  6. ALWAYS use create_file for JSON export (NEVER use PowerShell terminal commands)
  7. ⛔ ALWAYS enforce Sentinel workspace selection (see Workspace Selection section below)

⛔ MANDATORY: Sentinel Workspace Selection

This skill requires a Sentinel workspace to execute queries. Follow these rules STRICTLY:

When invoked from a parent skill (incident-investigation, threat-pulse, etc.):

  • Inherit the workspace selection from the parent investigation context
  • If no workspace was selected in parent context: STOP and ask user to select
  • Use the SELECTED_WORKSPACE_IDS passed from the parent skill
  • Skip output mode prompts — default to inline chat (the parent skill controls the final output format)

When invoked standalone (direct user request):

  1. ALWAYS call list_sentinel_workspaces MCP tool FIRST
  2. If 1 workspace exists: Auto-select, display to user, proceed
  3. If multiple workspaces exist:
    • Display all workspaces with Name and ID
    • ASK: "Which Sentinel workspace should I use for this investigation?"
    • ⛔ STOP AND WAIT for user response
    • ⛔ DO NOT proceed until user explicitly selects
  4. If a query fails on the selected workspace:
    • ⛔ DO NOT automatically try another workspace
    • STOP and report the error
    • Display available workspaces
    • ASK user to select a different workspace
    • WAIT for user response

Workspace Failure Handling

IF query returns "Failed to resolve table" or similar error:
    - STOP IMMEDIATELY
    - Report: "⚠️ Query failed on workspace [NAME] ([ID]). Error: [ERROR_MESSAGE]"
    - Display: "Available workspaces: [LIST_ALL_WORKSPACES]"
    - ASK: "Which workspace should I use instead?"
    - WAIT for explicit user response
    - DO NOT retry with a different workspace automatically

🔴 PROHIBITED ACTIONS:

  • ❌ Selecting a workspace without user consent when multiple exist
  • ❌ Switching to another workspace after a failure without asking
  • ❌ Proceeding with investigation if workspace selection is ambiguous
  • ❌ Assuming a workspace based on previous sessions

IoC Type Detection Rules:

| Pattern | IoC Type | Normalization |
|---------|----------|---------------|
| `\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}` | IPv4 Address | Validate octets ≤255 |
| `[a-fA-F0-9:]+` (with multiple colons) | IPv6 Address | Lowercase, expand if needed |
| `[a-zA-Z0-9][-a-zA-Z0-9]*\.[a-zA-Z]{2,}` | Domain | Lowercase, remove trailing dot |
| `https?://.*` or starts with `www.` | URL | Extract domain for separate analysis |
| 32 hex chars | MD5 Hash | Lowercase |
| 40 hex chars | SHA1 Hash | Lowercase |
| 64 hex chars | SHA256 Hash | Lowercase |
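The detection and normalization rules in the table can be combined into one classifier. The sketch below is one possible reading of those rules, not the skill's own implementation. `classify_ioc` is a hypothetical name, IP validation is delegated to Python's `ipaddress` module instead of the regex, and the domain pattern is extended to accept multiple labels so that subdomains match in full.

```python
import ipaddress
import re
from typing import Dict
from urllib.parse import urlsplit

HASH_TYPES = {32: "MD5", 40: "SHA1", 64: "SHA256"}
# Adapted from the table's domain pattern; repeats the label group to allow subdomains.
DOMAIN_RE = re.compile(r"([a-z0-9][-a-z0-9]*\.)+[a-z]{2,}")


def classify_ioc(raw: str) -> Dict[str, str]:
    """Detect the IoC type and return it with a normalized value."""
    value = raw.strip()
    lowered = value.lower()
    if lowered.startswith(("http://", "https://", "www.")):
        # Add a netloc marker so scheme-less "www." inputs still parse.
        target = value if "://" in value else "//" + value
        host = urlsplit(target).hostname or ""
        return {"type": "URL", "value": value, "domain": host}
    try:
        ip = ipaddress.ip_address(value)  # rejects octets above 255
        return {"type": f"IPv{ip.version}", "value": ip.compressed}
    except ValueError:
        pass
    if re.fullmatch(r"[0-9a-f]+", lowered) and len(lowered) in HASH_TYPES:
        return {"type": HASH_TYPES[len(lowered)], "value": lowered}
    domain = lowered.rstrip(".")
    if DOMAIN_RE.fullmatch(domain):
        return {"type": "Domain", "value": domain}
    return {"type": "Unknown", "value": value}
```

The URL check runs first so that a URL's host is not mistaken for a bare domain; the extracted `domain` can then be investigated separately.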

Date Range Rules:

  • Real-time/recent searches: Add +2 days to current date for end range
  • Historical ranges: Add +1 day to user's specified end date
  • Example: Current date = Jan 23; "Last 7 days" → datetime(2026-01-16) to datetime(2026-01-25)
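The two date rules reduce to simple date arithmetic. A minimal sketch follows, with `recent_range` and `historical_range` as hypothetical helper names:

```python
from datetime import date, timedelta
from typing import Tuple


def recent_range(today: date, lookback_days: int) -> Tuple[date, date]:
    """Real-time/recent search: end the range two days after the current date."""
    return today - timedelta(days=lookback_days), today + timedelta(days=2)


def historical_range(start: date, end: date) -> Tuple[date, date]:
    """Historical search: extend the user's end date by one day."""
    return start, end + timedelta(days=1)


# Reproduces the example above: current date Jan 23, "last 7 days"
start, end = recent_range(date(2026, 1, 23), 7)
print(f"datetime({start.isoformat()}) to datetime({end.isoformat()})")
# prints: datetime(2026-01-16) to datetime(2026-01-25)
```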

Available Investigation Types

IP Address Investigation

When to use: Suspicious inbound/outbound connections, firewall alerts, sign-in anomalies

Example prompts:

  • "Investigate IP 203.0.113.42"
  • "Is 198.51.100.10 malicious?"
  • "Check threat intel for 192.0.2.1"

Data sources:

  • Defender Threat Intelligence (IP alerts, statistics)
  • DeviceNetworkEvents (connection history)
  • ThreatIntelIndicators (Sentinel TI table)
  • SigninLogs (if used for authentication)
  • Defender IOC list (custom indicators)
  • enrich_ips.py (3rd-party enrichment: ipinfo.io geo/ISP, vpnapi.io VPN/proxy/Tor, AbuseIPDB abuse score & reports, Shodan ports/services/CVEs/tags)

Domain Investigation

When to use: Suspicious DNS queries, phishing domains, C2 communication

Example prompts:

  • "Investigate domain malware-c2.example.com"
  • "Is evil.com in our threat intel?"
  • "Check if any devices connected to suspicious.net"

Data sources:

  • DeviceNetworkEvents (DNS queries, HTTP connections)
  • EmailUrlInfo (email-delivered URLs)
  • ThreatIntelIndicators (domain indicators)
  • Defender IOC list (blocked domains)
  • UrlClickEvents (user clicks on domain)

URL Investigation

When to use: Phishing links, malicious downloads, suspicious redirects

Example prompts:

Data sources:

  • EmailUrlInfo (URLs in emails)
  • UrlClickEvents (click tracking)
  • DeviceNetworkEvents (HTTP/HTTPS connections)
  • DeviceFileEvents (downloads from URL)
  • ThreatIntelIndicators (URL patterns)

File Hash Investigation

When to use: Malware analysis, suspicious executables, file reputation

Example prompts:

  • "Investigate hash a1b2c3d4e5f6..."
  • "Is this SHA256 known malware?"
  • "Which devices have this file?"

Data sources:

  • Defender File Info & Statistics
  • Defender File Alerts
  • Defender File Related Machines
  • DeviceFileEvents (file creation/modification)
  • ThreatIntelIndicators (file hash indicators)

Quick Start (TL;DR)

When a user requests an IoC investigation:

  1. Identify & Normalize IoC:

    - Detect IoC type (IP/Domain/URL/Hash)
    - Normalize format (lowercase, validate)
    - Extract embedded IoCs (domain from URL)
    
  2. Run Parallel Queries (Batch 1 - Threat Intel):

    • Sentinel ThreatIntelIndicators query
    • Defender Indicators lookup (ListDefenderIndicators)
    • Defender IP/File alerts (GetDefenderIpAlerts or GetDefenderFileAlerts)
    • Defender IP/File statistics
  3. Run 3rd-Party IP Enrichment (IP IoCs only):

    python enrich_ips.py <IP_ADDRESS>
    
    • ipinfo.io: Geolocation, ISP/ASN, hosting provider
    • vpnapi.io: VPN, proxy, Tor exit node detection
    • AbuseIPDB: Abuse confidence score, recent attack reports
    • Shodan: Open ports, services/banners, CVEs, tags (e.g., c2, eol-os, self-signed)
  4. Run Parallel Queries (Batch 2 - Activity):

    • DeviceNetworkEvents (connections involving IoC)
    • AlertEvidence (alerts with IoC as evidence)
    • SecurityAlert (alerts mentioning IoC)
    • EmailUrlInfo (if domain/URL)
  5. CVE & Vulnerability Correlation:

    • Extract CVE IDs from threat intel results AND Shodan enrichment
    • For each CVE: ListDefenderMachinesByVulnerability
    • Aggregate affected devices
  6. Export to JSON & Generate Summary:

    temp/ioc_investigation_{ioc_type}_{ioc_normalized}_{timestamp}.json
    

Execution Workflow

🚨 MANDATORY: Time Tracking Pattern

YOU MUST TRACK AND REPORT TIME AFTER EVERY MAJOR STEP:

[MM:SS] ✓ Step description (XX seconds)

Required Reporting Points:

  1. After IoC normalization and type detection
  2. After 3rd-party IP enrichment (IP IoCs)
  3. After Defender/Sentinel threat intelligence lookup
  4. After activity/connection analysis
  5. After CVE correlation and device enumeration
  6. After JSON file creation
  7. Final: Total elapsed time
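
The reporting format above can be produced by a small helper. This is a minimal sketch, not part of the skill itself: the `StepTimer` class and its method names are hypothetical, and the injectable `clock` argument is there only so the behaviour can be checked deterministically.

```python
import time


class StepTimer:
    """Formats progress lines as '[MM:SS] ✓ Step description (XX seconds)'."""

    def __init__(self, clock=time.monotonic):
        # `clock` is injectable for testing; it defaults to a monotonic timer.
        self._clock = clock
        self._start = clock()
        self._last = self._start

    def report(self, description):
        """Return the progress line for a step that has just finished."""
        now = self._clock()
        total = int(now - self._start)   # seconds since the investigation began
        step = int(now - self._last)     # seconds spent on this step alone
        self._last = now
        return f"[{total // 60:02d}:{total % 60:02d}] ✓ {description} ({step} seconds)"

    def total(self):
        """Return the final 'total elapsed time' line."""
        elapsed = int(self._clock() - self._start)
        return f"Total elapsed time: {elapsed // 60:02d}:{elapsed % 60:02d}"
```

Calling `report()` once at each of the seven reporting points, and `total()` at the end, yields lines in the required shape.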

Phase 1: IoC Identification and Normalization (REQUIRED FIRST)

Step 1.1: Detect IoC Type

# Regex patterns for IoC detection
IPv4: r'^(\d{1,3}\.){3}\d{1,3}$'
IPv6: r'^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$'
Domain: r'^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$'
URL: r'^https?://'
MD5: r'^[a-fA-F0-9]{32}$'
SHA1: r'^[a-fA-F0-9]{40}$'
SHA256: r'^[a-fA-F0-9]{64}$'

Step 1.2: Normalize IoC

  • IP Address: Validate octets, detect IPv4 vs IPv6
  • Domain: Lowercase, remove trailing dots, extract from URL if needed
  • URL: Keep full URL, also extract domain for parallel investigation
  • Hash: Lowercase
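
Steps 1.1 and 1.2 can be sketched together in Python. This is an illustrative sketch under stated assumptions: `classify_ioc` is a hypothetical helper name, the domain pattern is the one from Step 1.1, and IP validation uses the standard-library `ipaddress` module (which checks octet ranges and also accepts compressed IPv6 forms that the full-form IPv6 regex above would not match).

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Domain pattern copied from Step 1.1.
DOMAIN_RE = re.compile(r'^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$')
# Hex digest lengths for MD5, SHA1, SHA256.
HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}


def classify_ioc(raw):
    """Return (ioc_type, normalized_value, extracted_domain_or_None)."""
    value = raw.strip()

    # URL: keep the full URL, extract the host for a parallel domain investigation.
    if re.match(r'^https?://', value, re.IGNORECASE):
        host = (urlsplit(value).hostname or "").rstrip(".").lower()
        return "url", value, host or None

    # IP: ipaddress validates octets and distinguishes IPv4 from IPv6.
    try:
        return "ip", str(ipaddress.ip_address(value)), None
    except ValueError:
        pass

    # Hash: hex string whose length matches a known digest size; lowercase it.
    if re.fullmatch(r'[a-fA-F0-9]+', value) and len(value) in HASH_LENGTHS:
        return "hash", value.lower(), None

    # Domain: lowercase and drop trailing dots before matching.
    domain = value.rstrip(".").lower()
    if DOMAIN_RE.match(domain):
        return "domain", domain, None

    return "unknown", value, None
```

Checking URLs first matters because a URL also contains a valid domain; checking IPs before domains keeps inputs such as `999.1.1.1` from being treated as a hostname.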

Step 1.3: Create Investigation Context

{
  "ioc_type": "ip|domain|url|hash",
  "ioc_value": "<normalized_value>",
  "ioc_original": "<user_provided_value>",
  "extracted_domain": "<if_url>",
  "investigation_start": "<timestamp>",
  "date_range_start": "<StartDate>",
  "date_range_end": "<EndDate>"
}

Phase 2: 3rd-Party IP Enrichment (IP Address IoCs)

MANDATORY for all IP address investigations. Run enrich_ips.py to get external threat intelligence context that is NOT available from Defender/Sentinel native tools.

python enrich_ips.py <IP_ADDRESS_1> <IP_ADDRESS_2> ...

What it provides:

| Source | Intelligence |
|--------|--------------|
| ipinfo.io | Geolocation (city, country, coordinates), ISP/ASN, organization, hosting provider detection |
| vpnapi.io | VPN, proxy, Tor exit node, relay detection |
| AbuseIPDB | Abuse confidence score (0-100), total reports, last reported date, recent reporter comments with attack categories |
| Shodan | Open ports, service/banner details, OS detection, known CVEs, tags (e.g., c2, eol-os, self-signed, honeypot), CPEs, hostnames |

Output: Per-IP detailed results printed to terminal + JSON export saved to temp/.

Integration with investigation:

  • AbuseIPDB score ≥ 75: 🔴 Strong indicator of malicious activity — flag as high risk
  • VPN/Proxy/Tor detected: 🟠 Potential evasion — note in risk assessment
  • Shodan tags contain c2: 🔴 Known C2 infrastructure — escalate immediately
  • Shodan CVEs found: Cross-reference with Phase 5 CVE correlation for organizational exposure
  • Hosting provider (not residential ISP): 🟡 May indicate attacker infrastructure
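
The integration rules above can be expressed as a small mapping function. This is a sketch under assumptions: the input shape follows the `ip_enrichment` object in the JSON export structure of this skill, not the raw `enrich_ips.py` output (which may differ), and `enrichment_flags` and its `is_hosting` argument are hypothetical names.

```python
def enrichment_flags(enrichment, is_hosting=False):
    """Map enrichment results to (marker, reason) risk flags.

    `enrichment` is assumed to look like the `ip_enrichment` object of the
    JSON export; `is_hosting` is a hypothetical caller-supplied flag.
    """
    flags = []

    abuse = enrichment.get("abuseipdb", {}).get("abuse_confidence_score") or 0
    if abuse >= 75:
        flags.append(("🔴", "AbuseIPDB score >= 75: flag as high risk"))

    anonymizer = enrichment.get("vpn_proxy_tor", {})
    if any(anonymizer.get(key) for key in ("is_vpn", "is_proxy", "is_tor")):
        flags.append(("🟠", "VPN/proxy/Tor detected: potential evasion"))

    shodan = enrichment.get("shodan", {})
    if "c2" in shodan.get("tags", []):
        flags.append(("🔴", "Shodan tag c2: escalate immediately"))
    if shodan.get("vulns"):
        flags.append(("note", "Shodan CVEs found: cross-reference in CVE correlation"))

    if is_hosting:
        flags.append(("🟡", "Hosting provider: may indicate attacker infrastructure"))

    return flags
```

The flags are returned as data rather than printed so they can be copied into the `risk_factors` list of the export.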

Note: For domain and URL IoCs, extract the resolved IP(s) from DeviceNetworkEvents results and run enrichment on those IPs as a follow-up step.


Phase 3: Parallel Threat Intelligence Collection (Defender & Sentinel)

CRITICAL: Run ALL threat intel queries in parallel for speed!

Batch 1: Threat Intelligence APIs (Run ALL in parallel)

| Query | Tool/API | IoC Types |
|-------|----------|-----------|
| Defender IOC List | ListDefenderIndicators ⚠️ | IP, Domain, URL |
| Defender IP Alerts | GetDefenderIpAlerts | IP |
| Defender IP Statistics | GetDefenderIpStatistics | IP |
| Defender File Alerts | GetDefenderFileAlerts | Hash |
| Defender File Info | GetDefenderFileInfo | Hash |
| Defender File Statistics | GetDefenderFileStatistics | Hash |
| Defender File Machines | GetDefenderFileRelatedMachines | Hash |

⚠️ ListDefenderIndicators Note: If result is written to file (>50KB), you MUST read and filter the file manually. See Custom IOC Management for required processing steps.

Batch 2: Sentinel KQL Queries (Run ALL in parallel)

| Query | Table | IoC Types |
|-------|-------|-----------|
| TI Indicators Match | ThreatIntelIndicators | All |
| Network Connections | DeviceNetworkEvents | IP, Domain, URL |
| Alert Evidence | AlertEvidence | All |
| Security Alerts | SecurityAlert | All |
| Email URLs | EmailUrlInfo | Domain, URL |
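
The two batches above amount to "pick the lookups that apply to this IoC type, then run them concurrently". A minimal sketch of that pattern follows. The `LOOKUPS` mapping simply restates the two tables; `lookups_for` and `run_parallel` are hypothetical helpers, and the callables passed to `run_parallel` stand in for whatever MCP tool or KQL invocation the host environment provides.

```python
from concurrent.futures import ThreadPoolExecutor

ALL = {"ip", "domain", "url", "hash"}

# Lookup name -> IoC types it applies to (restated from the Batch 1 and Batch 2 tables).
LOOKUPS = {
    "ListDefenderIndicators": {"ip", "domain", "url"},
    "GetDefenderIpAlerts": {"ip"},
    "GetDefenderIpStatistics": {"ip"},
    "GetDefenderFileAlerts": {"hash"},
    "GetDefenderFileInfo": {"hash"},
    "GetDefenderFileStatistics": {"hash"},
    "GetDefenderFileRelatedMachines": {"hash"},
    "ThreatIntelIndicators": ALL,
    "DeviceNetworkEvents": {"ip", "domain", "url"},
    "AlertEvidence": ALL,
    "SecurityAlert": ALL,
    "EmailUrlInfo": {"domain", "url"},
}


def lookups_for(ioc_type):
    """Return the lookup names that apply to the given IoC type, in table order."""
    return [name for name, types in LOOKUPS.items() if ioc_type in types]


def run_parallel(tasks):
    """Run zero-argument callables concurrently.

    `tasks` maps a lookup name to a callable. The result maps each name to the
    callable's return value, or to the exception it raised, so one failing
    lookup does not discard the others.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:  # keep the failure alongside successful results
                results[name] = exc
    return results
```

Threads are used here because the lookups are I/O-bound API calls; an agent runtime that issues tool calls in parallel natively would not need this wrapper.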

Phase 4: CVE Correlation and Vulnerability Management

Step 4.1: Extract CVE IDs from Threat Intel AND Enrichment

  • Parse threat intel results for CVE references (pattern: CVE-\d{4}-\d{4,})
  • Extract from: alert descriptions, threat family info, MITRE techniques
  • Extract from Shodan enrichment (shodan_vulns field from enrich_ips.py output)
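
As a rough illustration of Step 4.1, the pattern above can be applied to free-text fields and merged with the Shodan list. The helper name `extract_cves` and its argument layout are assumptions for this sketch; only the regular expression comes from the text.

```python
import re

# Pattern from Step 4.1; matching is case-insensitive and results are upper-cased.
CVE_RE = re.compile(r'CVE-\d{4}-\d{4,}', re.IGNORECASE)


def extract_cves(*texts, shodan_vulns=()):
    """Collect unique CVE IDs from free-text fields and a Shodan vulns list."""
    found = set()
    for text in texts:
        # Alert descriptions, threat family info, etc.; None values are skipped.
        found.update(match.upper() for match in CVE_RE.findall(text or ""))
    # Shodan entries are kept only if the whole entry is a CVE ID.
    found.update(v.upper() for v in shodan_vulns if CVE_RE.fullmatch(v))
    return sorted(found)
```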

Step 4.2: Query Affected Devices per CVE

For each CVE_ID found:
  → ListDefenderMachinesByVulnerability(cveId: CVE_ID)
  → Collect: deviceId, deviceName, osPlatform, exposureLevel

Step 4.3: Aggregate Device Exposure

{
  "cve_correlation": {
    "cve_ids_found": ["CVE-2024-1234", "CVE-2024-5678"],
    "affected_devices_by_cve": {
      "CVE-2024-1234": [
        {"deviceId": "...", "deviceName": "...", "osPlatform": "..."}
      ]
    },
    "total_unique_affected_devices": 15,
    "critical_cves": 2,
    "high_cves": 3
  }
}
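
A sketch of how the aggregation in Step 4.3 might be computed once the per-CVE device lists are collected. `aggregate_exposure` is a hypothetical helper; it fills only the fields that can be derived from the device lists, and leaves out the `critical_cves` and `high_cves` counts because those need CVE severity data that this step does not provide.

```python
def aggregate_exposure(devices_by_cve):
    """Build the cve_correlation object from {cve_id: [device dicts]}.

    Each device dict is assumed to carry at least a "deviceId" key, as in the
    fields collected in Step 4.2.
    """
    # A device affected by several CVEs is counted once.
    unique_ids = {
        device["deviceId"]
        for devices in devices_by_cve.values()
        for device in devices
    }
    return {
        "cve_correlation": {
            "cve_ids_found": sorted(devices_by_cve),
            "affected_devices_by_cve": devices_by_cve,
            "total_unique_affected_devices": len(unique_ids),
        }
    }
```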

Phase 5: Activity and Connection Analysis

For IP Address IoCs:

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let IPAddress = '<IP_ADDRESS>';
DeviceNetworkEvents
| where Timestamp between (start .. end)
| where RemoteIP == IPAddress or LocalIP == IPAddress
| summarize 
    ConnectionCount = count(),
    UniqueDevices = dcount(DeviceId),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp),
    Ports = make_set(RemotePort),
    Protocols = make_set(Protocol)
    by ActionType
| order by ConnectionCount desc

For Domain IoCs:

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let Domain = '<DOMAIN>';
DeviceNetworkEvents
| where Timestamp between (start .. end)
| where RemoteUrl has Domain
| summarize 
    ConnectionCount = count(),
    UniqueDevices = dcount(DeviceId),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp),
    UniqueURLs = make_set(RemoteUrl, 10)
    by DeviceName
| order by ConnectionCount desc

For File Hash IoCs:

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let Hash = '<HASH>';
union withsource=SourceTable DeviceProcessEvents, DeviceNetworkEvents, DeviceFileEvents, DeviceRegistryEvents, DeviceLogonEvents, DeviceImageLoadEvents, DeviceEvents
| where Timestamp between (start .. end)
| where SHA1 =~ Hash or SHA256 =~ Hash or MD5 =~ Hash or InitiatingProcessSHA256 =~ Hash
| summarize 
    EventCount = count(),
    UniqueDevices = dcount(DeviceId),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp),
    FileNames = make_set(FileName),
    FolderPaths = make_set(FolderPath, 5)
    by ActionType
| order by EventCount desc

Phase 6: Export to JSON

Create single JSON file: temp/ioc_investigation_{ioc_type}_{ioc_normalized}_{timestamp}.json
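
A minimal sketch of building that file name and writing the export. Two details are assumptions rather than part of the skill: the timestamp layout (`YYYYMMDD_HHMMSS`, UTC) and the replacement of characters that are awkward in file names (for example `:` and `/` in URLs) with underscores. The function names are hypothetical.

```python
import json
import re
from datetime import datetime, timezone
from pathlib import Path


def export_path(ioc_type, ioc_normalized, now=None, base_dir="temp"):
    """Return temp/ioc_investigation_{ioc_type}_{ioc_normalized}_{timestamp}.json."""
    now = now or datetime.now(timezone.utc)
    # Assumption: anything outside letters, digits, '.', '_' and '-' becomes '_'.
    safe = re.sub(r'[^A-Za-z0-9._-]+', '_', ioc_normalized).strip('_')
    stamp = now.strftime("%Y%m%d_%H%M%S")  # assumed timestamp layout
    return Path(base_dir) / f"ioc_investigation_{ioc_type}_{safe}_{stamp}.json"


def export_investigation(data, path):
    """Write the investigation dict as indented JSON, creating the folder if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return path
```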


Sample KQL Queries

Use these exact patterns with appropriate MCP tools. Replace <IOC_VALUE>, <StartDate>, <EndDate>.

⚠️ CRITICAL: START WITH THESE EXACT QUERY PATTERNS. These queries have been tested and validated. Use them as your PRIMARY reference.


📅 Date Range Quick Reference

🔴 STEP 0: GET CURRENT DATE FIRST (MANDATORY) 🔴

  • ALWAYS check the current date from the context header BEFORE calculating date ranges
  • NEVER use hardcoded years - the year changes and you WILL query the wrong timeframe

RULE 1: Real-Time/Recent Searches (Current Activity)

  • Add +2 days to current date for end range
  • Why +2? One day for timezone offset, plus one day for an inclusive end-of-day
  • Pattern: Today is Jan 23, 2026 → Use datetime(2026-01-25) as end date

RULE 2: Historical Searches (User-Specified Dates)

  • Add +1 day to user's specified end date
  • Why +1? To include all 24 hours of the final day
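
The two rules reduce to simple date arithmetic. The sketch below is illustrative only; the function names are hypothetical, and the current date must still come from the context header as required in Step 0.

```python
from datetime import date, timedelta


def recent_end_date(today):
    """Rule 1: real-time/recent searches end at today + 2 days."""
    return today + timedelta(days=2)


def historical_end_date(user_end):
    """Rule 2: historical searches end at the user's end date + 1 day."""
    return user_end + timedelta(days=1)


def kql_datetime(d):
    """Render a date as a KQL datetime literal, e.g. datetime(2026-01-25)."""
    return f"datetime({d.isoformat()})"
```

Because the arithmetic is done on `date` objects, month and year rollovers (for example a historical end date of Dec 31) are handled without special cases.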

1. Threat Intelligence Indicator Match (Sentinel - limited to first 20 IoCs)

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let ioc_value = '<IOC_VALUE>';
ThreatIntelIndicators
| where TimeGenerated between (start .. end)
| where IsActive == true and IsDeleted == false
| summarize arg_max(TimeGenerated, *) by Id
| where ObservableValue =~ ioc_value
    or Pattern has ioc_value
| project 
    TimeGenerated,
    Id,
    ObservableKey,
    ObservableValue,
    Pattern,
    Confidence,
    ValidFrom,
    ValidUntil,
    Tags,
    Data
| order by TimeGenerated desc
| take 20

2. IP Address - Network Connection Activity (Advanced Hunting)

let target_ip = '<IP_ADDRESS>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
DeviceNetworkEvents
| where Timestamp between (start .. end)
| where RemoteIP == target_ip or LocalIP == target_ip
| extend Direction = iff(RemoteIP == target_ip, "Outbound", "Inbound")
| summarize 
    TotalConnections = count(),
    UniqueDevices = dcount(DeviceId),
    UniquePorts = dcount(RemotePort),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp),
    Devices = make_set(DeviceName, 10),
    Ports = make_set(RemotePort, 20),
    Protocols = make_set(Protocol),
    ActionTypes = make_set(ActionType),
    InitiatingProcesses = make_set(InitiatingProcessFileName, 10),
    Direction = make_set(Direction,2)

3. IP Address - Detailed Connection Timeline (limited to first 20 events)

let target_ip = '<IP_ADDRESS>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
DeviceNetworkEvents
| where Timestamp between (start .. end)
| where RemoteIP == target_ip or LocalIP == target_ip
| project 
    Timestamp,
    DeviceName,
    DeviceId,
    ActionType,
    RemoteIP,
    RemotePort,
    RemoteUrl,
    LocalIP,
    LocalPort,
    Protocol,
    InitiatingProcessFileName,
    InitiatingProcessCommandLine,
    InitiatingProcessAccountName
| order by Timestamp desc
| take 20

4. Domain - DNS and HTTP Connection Activity

let target_domain = '<DOMAIN>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
DeviceNetworkEvents
| where Timestamp between (start .. end)
| where RemoteUrl has target_domain
| summarize 
    TotalConnections = count(),
    UniqueDevices = dcount(DeviceId),
    UniqueUsers = dcount(InitiatingProcessAccountName),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp),
    Devices = make_set(DeviceName, 10),
    URLs = make_set(RemoteUrl, 20),
    Ports = make_set(RemotePort),
    InitiatingProcesses = make_set(InitiatingProcessFileName, 10)

5. Domain - Detailed Connection Timeline (limited to first 20 events)

let target_domain = '<DOMAIN>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
DeviceNetworkEvents
| where Timestamp between (start .. end)
| where RemoteUrl has target_domain
| project 
    Timestamp,
    DeviceName,
    InitiatingProcessAccountName,
    ActionType,
    RemoteUrl,
    RemoteIP,
    RemotePort,
    Protocol,
    InitiatingProcessFileName,
    InitiatingProcessCommandLine
| order by Timestamp desc
| take 20

6. URL - Email Delivery Analysis

let target_url = '<URL>';
let target_domain = '<DOMAIN>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
EmailUrlInfo
| where TimeGenerated between (start .. end)
| where Url == target_url or Url has target_domain or UrlDomain =~ target_domain
| summarize 
    EmailCount = dcount(NetworkMessageId),
    UniqueURLs = make_set(Url, 10),
    UrlLocations = make_set(UrlLocation),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated)
    by UrlDomain
| order by EmailCount desc

7. File Hash - Device File Events

let target_hash = '<HASH>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
union withsource=SourceTable DeviceProcessEvents, DeviceNetworkEvents, DeviceFileEvents, DeviceRegistryEvents, DeviceLogonEvents, DeviceImageLoadEvents, DeviceEvents
| where Timestamp between (start .. end)
| where SHA1 =~ target_hash or SHA256 =~ target_hash or MD5 =~ target_hash or InitiatingProcessSHA256 =~ target_hash
| summarize 
    EventCount = count(),
    UniqueDevices = dcount(DeviceId),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp),
    Devices = make_set(DeviceName, 10),
    FileNames = make_set(FileName, 10),
    FolderPaths = make_set(FolderPath, 10),
    ActionTypes = make_set(ActionType)
| extend HashType = case(
    isnotempty(target_hash) and strlen(target_hash) == 32, "MD5",
    isnotempty(target_hash) and strlen(target_hash) == 40, "SHA1",
    isnotempty(target_hash) and strlen(target_hash) == 64, "SHA256",
    "Unknown")

8. Alert Evidence - IoC in Alerts (limited to first 20 alerts)

let ioc_value = '<IOC_VALUE>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
AlertEvidence
| where TimeGenerated between (start .. end)
| where RemoteIP == ioc_value 
    or RemoteUrl has ioc_value 
    or SHA1 =~ ioc_value 
    or SHA256 =~ ioc_value
    or FileName has ioc_value
    or Title has ioc_value
    or Categories has ioc_value
| project 
    TimeGenerated,
    AlertId,
    Title,
    Severity,
    Categories,
    ServiceSource,
    EntityType,
    EvidenceRole,
    RemoteIP,
    RemoteUrl,
    FileName,
    SHA1,
    SHA256,
    DeviceName,
    AccountName
| order by TimeGenerated desc
| take 20

9. Security Alerts Mentioning IoC

let ioc_value = '<IOC_VALUE>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
AlertEvidence
| where TimeGenerated between (start .. end)
| where RemoteIP == ioc_value 
    or RemoteUrl has ioc_value 
    or SHA1 =~ ioc_value 
    or SHA256 =~ ioc_value
    or FileName has ioc_value
    or Title has ioc_value
    or Categories has ioc_value
| join AlertInfo on AlertId
| extend HostFullName = strcat(parse_json(parse_json(AdditionalFields).Host).HostName,".", parse_json(parse_json(AdditionalFields).Host).DnsDomain)
| extend OS = strcat(parse_json(parse_json(AdditionalFields).Host).OSFamily," ", parse_json(parse_json(AdditionalFields).Host).OSVersion)
| extend IsDomainJoined = parse_json(parse_json(AdditionalFields).Host).IsDomainJoined
| extend AffectedDevice = strcat(HostFullName,",", OS, ",IsDomainJoined: ", IsDomainJoined)
| summarize 
    AlertCount = dcount(AlertId),
    Alerts = make_set(Title, 10),
    Severities = make_set(Severity),
    Categories = make_set(Category),
    AttackTechniques = make_set(AttackTechniques),
    AffectedDevices = make_set(AffectedDevice, 10)

10. Defender Custom IOC List Match

// Use Defender API: ListDefenderIndicators with filters
// indicatorType: "IpAddress" | "DomainName" | "Url" | "FileSha1" | "FileSha256" | "FileMd5"
// indicatorValue: "<IOC_VALUE>"

11. IP Address - Sign-in Analysis (Azure AD)

let target_ip = '<IP_ADDRESS>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated between (start .. end)
| where IPAddress == target_ip
| summarize 
    SignInCount = count(),
    UniqueUsers = dcount(UserPrincipalName),
    SuccessCount = countif(ResultType == '0'),
    FailureCount = countif(ResultType != '0'),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated),
    Users = make_set(UserPrincipalName, 10),
    Apps = make_set(AppDisplayName, 10),
    ResultTypes = make_set(ResultType)
| extend SuccessRate = round(100.0 * SuccessCount / SignInCount, 2)

12. CVE Extraction from Alerts

let ioc_value = '<IOC_VALUE>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
AlertEvidence
| where TimeGenerated between (start .. end)
| where RemoteIP == ioc_value 
    or RemoteUrl has ioc_value 
    or SHA1 =~ ioc_value 
    or SHA256 =~ ioc_value
    or FileName has ioc_value
    or Title has ioc_value
    or Categories has ioc_value
| extend CVEs = extract_all(@"(CVE-\d{4}-\d{4,})", tostring(AttackTechniques))
| mv-expand CVE = CVEs
| where isnotempty(CVE)
| summarize 
    CVECount = dcount(tostring(CVE)),
    CVEs = make_set(tostring(CVE)),
    AlertCount = dcount(AlertId),
    Alerts = make_set(Title, 5)

Defender API Queries

IP Address Investigation

Get Alerts for IP:

Tool: GetDefenderIpAlerts (MCP)
Parameter: ipAddress = "<IP_ADDRESS>"
Returns: All security alerts associated with the IP

Get IP Statistics:

Tool: activate_file_and_ip_statistics_tools → GetDefenderIpStatistics
Parameter: ipAddress = "<IP_ADDRESS>"
Returns: Organization prevalence, device count, communication stats

Find Devices by IP:

Tool: FindDefenderMachinesByIp (MCP)
Parameters: 
  ipAddress = "<IP_ADDRESS>"
  timestamp = "<DATETIME>" (ISO 8601 format)
Returns: Devices that communicated with the IP within ±15 minutes of the timestamp

File Hash Investigation

Get File Info:

Tool: activate_file_and_ip_statistics_tools → GetDefenderFileInfo
Parameter: fileHash = "<SHA1_OR_SHA256>"
Returns: File details, global prevalence, threat determination

Get File Statistics:

Tool: activate_file_and_ip_statistics_tools → GetDefenderFileStatistics
Parameter: fileHash = "<SHA1_OR_SHA256>"
Returns: Organization statistics, device count, global stats

Get File Alerts:

Tool: GetDefenderFileAlerts (MCP)
Parameter: fileHash = "<SHA1_OR_SHA256>"
Returns: All alerts associated with the file

Get Devices with File:

Tool: GetDefenderFileRelatedMachines (MCP)
Parameter: fileHash = "<SHA1_OR_SHA256>"
Returns: All devices where file was observed

Vulnerability Management

List Devices Affected by CVE:

Tool: ListDefenderMachinesByVulnerability (MCP)
Parameter: cveId = "CVE-YYYY-NNNNN"
Returns: All devices vulnerable to the CVE with exposure details

Get Device Vulnerabilities:

Tool: GetDefenderMachineVulnerabilities (MCP)
Parameter: id = "<DEVICE_ID>"
Returns: All CVEs affecting the specific device

Custom IOC Management

Search Existing IOCs:

Tool: ListDefenderIndicators (MCP)
Parameters (all optional):
  indicatorType = "IpAddress" | "DomainName" | "Url" | "FileSha1" | "FileSha256" | "FileMd5"
  indicatorValue = "<IOC_VALUE>"
  action = "Alert" | "Block" | "Allow"
  severity = "Informational" | "Low" | "Medium" | "High"
Returns: Matching custom indicators in tenant

⚠️ CRITICAL: Processing Large ListDefenderIndicators Results

The ListDefenderIndicators API may return ALL custom indicators in the tenant regardless of filter parameters. When results are large (>50KB), they are written to a temporary file instead of returned inline.

MANDATORY Processing Steps:

  1. If result says "Large tool result written to file":

    • Use read_file tool to read the content file path provided
    • Parse the JSON response to extract the value array
    • Manually filter for the target IoC using case-insensitive matching:
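      # Filter logic (same comparison works for any IoC type);
      # `indicators` is the parsed JSON response, `target_ioc` the normalized IoC
      matches = [ind for ind in indicators.get("value", [])
                 if str(ind.get("indicatorValue", "")).lower() == target_ioc.lower()]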
      
    • Report: "Found X custom indicator(s) matching [IOC]" or "No custom indicators match [IOC]"
  2. If result is inline JSON with empty value array:

    • Report: "No custom indicators found for [IOC]"

🔴 PROHIBITED:

  • ❌ Assuming "large result = no match" without reading and filtering the file
  • ❌ Reporting "Not in IOC list" without verifying the actual content
  • ❌ Skipping file processing due to result size

Example - Correct Processing:

1. Call: ListDefenderIndicators(indicatorType: "IpAddress", indicatorValue: "203.0.113.42")
2. Result: "Large tool result (69KB) written to file: /path/to/content.json"
3. Action: read_file(/path/to/content.json)
4. Parse: Extract value array from JSON
5. Filter: Search for indicatorValue == "203.0.113.42" (case-insensitive)
6. Report: "No custom indicators match 203.0.113.42" OR "Found 1 custom indicator: [details]"
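
The processing steps above can be sketched end to end in Python. This assumes the file written by the tool is a JSON object with a `value` array whose entries carry an `indicatorValue` field, as described in the steps; the function names are hypothetical and reading the file via `open` stands in for the `read_file` tool.

```python
import json


def filter_indicators(payload, target_ioc):
    """Return indicators whose indicatorValue equals the target (case-insensitive)."""
    target = target_ioc.strip().lower()
    return [
        indicator
        for indicator in payload.get("value", [])
        if str(indicator.get("indicatorValue", "")).strip().lower() == target
    ]


def load_and_filter(path, target_ioc):
    """Read the large-result file and filter it; never assume 'large' means 'no match'."""
    with open(path, encoding="utf-8") as handle:
        return filter_indicators(json.load(handle), target_ioc)


def report(matches, target_ioc):
    """Produce the report line in the wording used by the steps above."""
    if matches:
        return f"Found {len(matches)} custom indicator(s) matching {target_ioc}"
    return f"No custom indicators match {target_ioc}"
```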

JSON Export Structure

Create file: temp/ioc_investigation_{ioc_type}_{ioc_normalized}_{timestamp}.json

{
  "investigation_metadata": {
    "ioc_type": "ip|domain|url|hash",
    "ioc_value": "<normalized_value>",
    "ioc_original": "<user_input>",
    "investigation_timestamp": "<ISO8601>",
    "date_range_start": "<StartDate>",
    "date_range_end": "<EndDate>",
    "elapsed_time_seconds": 45
  },
  "threat_intelligence": {
    "sentinel_ti_matches": [],
    "defender_ioc_matches": [],
    "defender_alerts": [],
    "threat_families": [],
    "confidence_score": 0-100,
    "verdict": "Malicious|Suspicious|Clean|Unknown"
  },
  "ip_enrichment": {
    "geo": { "city": "", "country": "", "org": "", "isp": "" },
    "vpn_proxy_tor": { "is_vpn": false, "is_proxy": false, "is_tor": false },
    "abuseipdb": { "abuse_confidence_score": 0, "total_reports": 0, "last_reported": "", "recent_categories": [] },
    "shodan": { "ports": [], "services": [], "vulns": [], "tags": [], "os": "", "hostnames": [], "cpes": [] }
  },
  "activity_analysis": {
    "network_connections": {
      "total_connections": 0,
      "unique_devices": 0,
      "unique_users": 0,
      "first_seen": "<datetime>",
      "last_seen": "<datetime>",
      "top_devices": [],
      "top_ports": [],
      "top_processes": []
    },
    "email_delivery": {
      "email_count": 0,
      "unique_urls": [],
      "delivery_locations": []
    },
    "file_activity": {
      "event_count": 0,
      "unique_devices": 0,
      "file_names": [],
      "folder_paths": [],
      "action_types": []
    },
    "signin_activity": {
      "signin_count": 0,
      "unique_users": 0,
      "success_rate": 0,
      "affected_users": []
    }
  },
  "alert_correlation": {
    "total_alerts": 0,
    "severity_breakdown": {
      "high": 0,
      "medium": 0,
      "low": 0,
      "informational": 0
    },
    "alert_titles": [],
    "attack_techniques": [],
    "affected_entities": []
  },
  "cve_correlation": {
    "cve_ids_found": [],
    "affected_devices_by_cve": {},
    "total_unique_affected_devices": 0,
    "cve_severity_breakdown": {
      "critical": 0,
      "high": 0,
      "medium": 0,
      "low": 0
    }
  },
  "organizational_exposure": {
    "total_affected_devices": 0,
    "affected_device_list": [],
    "exposure_level": "High|Medium|Low|None",
    "recommended_actions": []
  },
  "risk_assessment": {
    "overall_risk": "Critical|High|Medium|Low|Informational",
    "risk_factors": [],
    "mitigating_factors": [],
    "confidence": "High|Medium|Low"
  }
}

Error Handling

Common Issues and Solutions

| Issue | Solution |
|-------|----------|
| No TI matches found | IoC may be unknown; proceed with activity analysis |
| Defender API returns 404 | IoC not in organization's scope; check Sentinel data |
| Empty DeviceNetworkEvents | Expand date range or check if MDE is deployed |
| CVE not found in vulnerability DB | CVE may be too new or not applicable to org assets |
| Multiple IoC types detected | Investigate each separately, correlate results |
| Rate limiting on API calls | Add delays between API calls, batch where possible |
| ListDefenderIndicators returns large file | Read file with read_file, parse JSON, manually filter for target IoC value |

Required Field Defaults

If queries return no results, use these defaults:

{
  "threat_intelligence": {
    "sentinel_ti_matches": [],
    "defender_alerts": [],
    "verdict": "Unknown",
    "confidence_score": 0
  },
  "activity_analysis": {
    "network_connections": {
      "total_connections": 0,
      "unique_devices": 0
    }
  },
  "cve_correlation": {
    "cve_ids_found": [],
    "affected_devices_by_cve": {},
    "total_unique_affected_devices": 0
  }
}
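
One way to apply these defaults is a recursive merge, so that whatever the queries did return overrides the default and everything else falls back. This is a sketch, not the skill's prescribed method; `with_defaults` is a hypothetical helper and `DEFAULTS` restates the object above.

```python
import copy

DEFAULTS = {
    "threat_intelligence": {
        "sentinel_ti_matches": [],
        "defender_alerts": [],
        "verdict": "Unknown",
        "confidence_score": 0,
    },
    "activity_analysis": {
        "network_connections": {"total_connections": 0, "unique_devices": 0}
    },
    "cve_correlation": {
        "cve_ids_found": [],
        "affected_devices_by_cve": {},
        "total_unique_affected_devices": 0,
    },
}


def with_defaults(results, defaults=DEFAULTS):
    """Return a copy of `defaults` with values from `results` layered on top."""
    merged = copy.deepcopy(defaults)  # never mutate the shared defaults
    for key, value in results.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = with_defaults(value, merged[key])  # merge nested objects
        else:
            merged[key] = value  # scalars and lists replace the default outright
    return merged
```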

Example Workflows

Example 1: IP Address Investigation

User says: "Investigate IP 203.0.113.42 for the last 7 days"

Workflow:

  1. Identify IoC: IPv4 Address, normalized: 203.0.113.42
  2. 3rd-Party Enrichment:
    python enrich_ips.py 203.0.113.42
    
    → Get geo, ISP, VPN/proxy/Tor flags, AbuseIPDB score, Shodan ports/CVEs/tags
  3. Phase 1 - Threat Intel (parallel):
    • GetDefenderIpAlerts(ipAddress: "203.0.113.42")
    • Sentinel ThreatIntelIndicators query
    • ListDefenderIndicators(indicatorType: "IpAddress", indicatorValue: "203.0.113.42")
  4. Phase 2 - Activity Analysis (parallel):
    • DeviceNetworkEvents query for IP
    • SigninLogs query for IP
    • AlertEvidence query for IP
  5. Phase 3 - CVE Correlation:
    • Extract CVEs from alerts AND Shodan enrichment
    • For each CVE: ListDefenderMachinesByVulnerability
  6. Export JSON and summarize findings (include enrichment data in JSON export)

Example 2: Domain Investigation

User says: "Is evil-malware.com in our environment?"

Workflow:

  1. Identify IoC: Domain, normalized: evil-malware.com
  2. Phase 1 - Threat Intel (parallel):
    • Sentinel ThreatIntelIndicators query
    • ListDefenderIndicators(indicatorType: "DomainName", indicatorValue: "evil-malware.com")
  3. Phase 2 - Activity Analysis (parallel):
    • DeviceNetworkEvents query for domain
    • EmailUrlInfo query for domain
    • AlertEvidence query for domain
  4. Phase 3 - Exposure Assessment:
    • List all devices that connected
    • Identify affected users
  5. Export JSON and summarize findings

Example 3: File Hash Investigation with CVE Correlation

User says: "Investigate SHA256 a1b2c3... and check which devices are vulnerable"

Workflow:

  1. Identify IoC: SHA256 Hash, normalized: a1b2c3...
  2. Phase 1 - Threat Intel (parallel):
    • GetDefenderFileInfo(fileHash: "a1b2c3...")
    • GetDefenderFileAlerts(fileHash: "a1b2c3...")
    • GetDefenderFileStatistics(fileHash: "a1b2c3...")
  3. Phase 2 - Device Exposure:
    • GetDefenderFileRelatedMachines(fileHash: "a1b2c3...")
    • DeviceFileEvents query
  4. Phase 3 - CVE Correlation:
    • Extract CVEs from file threat family info
    • For each CVE: ListDefenderMachinesByVulnerability
    • Cross-reference with devices that have the file
  5. Export JSON and summarize with remediation priorities

Security Notes

  • All investigations are logged for audit purposes
  • IoC values may be sensitive - handle with care
  • Follow organizational data classification policies
  • Consider threat actor attribution implications
  • Document investigation actions for incident timeline

Integration with Other Skills

This skill can be combined with:

  • user-investigation: When IoC is found in user's sign-in logs
  • computer-investigation: When IoC is found on specific device
  • authentication-tracing: When IoC IP appears in auth anomalies
  • ca-policy-investigation: When IoC triggers conditional access events

Cross-skill pivot example: "Investigate IP 203.0.113.42" → Found in user sign-ins → "Investigate user@domain.com" using user-investigation skill

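Monitors and audits MCP server usage in Microsoft Sentinel and Defender XDR environments. Tracks telemetry from the Graph, Sentinel, Azure, and other MCP servers; analyzes user behavior, API call trends, sensitive-operation detection, and cross-server activity; and provides security risk assessment and compliance reporting.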
MCP usage MCP server monitoring MCP activity MCP audit tool usage monitoring MCP breakdown who is using MCP
.github/skills/mcp-usage-monitoring/SKILL.md
npx skills add SCStelz/security-investigator --skill mcp-usage-monitoring -g -y
SKILL.md
Frontmatter
{
    "name": "mcp-usage-monitoring",
    "description": "Use this skill when asked to monitor, audit, or analyze MCP (Model Context Protocol) server usage in the environment. Triggers on keywords like \"MCP usage\", \"MCP server monitoring\", \"MCP activity\", \"Graph MCP\", \"Sentinel MCP\", \"Azure MCP\", \"MCP audit\", \"tool usage monitoring\", \"MCP breakdown\", \"who is using MCP\", or when investigating MCP user activity, Graph API calls from MCP servers, or workspace query governance. This skill provides comprehensive MCP server telemetry analysis across Graph MCP, Sentinel MCP, and Azure MCP servers including usage trends, endpoint access patterns, user attribution, cross-server user analysis, sensitive API detection, workspace query governance, and security risk assessment with inline and markdown file reporting.",
    "drill_down_prompt": "Run MCP usage monitoring report — Graph\/Sentinel\/Azure MCP activity, user attribution",
    "threat_pulse_domains": [
        "admin"
    ]
}

MCP Server Usage Monitoring — Instructions

Purpose

This skill monitors and audits Model Context Protocol (MCP) server usage across your Microsoft Sentinel and Defender XDR environment. MCP servers are AI-powered tools that enable language models to interact with Microsoft security services — and like any privileged access channel, they require monitoring.

What this skill tracks:

| MCP Server | Telemetry Source | Key Identifier |
|------------|------------------|----------------|
| Microsoft Graph MCP Server | MicrosoftGraphActivityLogs | AppId = e8c77dc2-69b3-43f4-bc51-3213c9d915b4 |
| Sentinel Data Lake MCP | CloudAppEvents | RecordType 403, Interface = IMcpToolTemplate |
| Sentinel Triage MCP | MicrosoftGraphActivityLogs + SigninLogs | AppId = 7b7b3966-1961-47b5-b080-43ca5482e21c ("Microsoft Defender Mcp"): dedicated AppId with full user attribution via delegated cert auth |
| Azure MCP Server | AzureActivity | No dedicated AppId; uses DefaultAzureCredential |
| Sentinel Data Lake (Direct KQL) | CloudAppEvents | RecordType 379, Operation = KQLQueryCompleted |
| Workspace Query Sources (Analytics Tier) | LAQueryLogs | All clients querying Log Analytics workspace |

What this skill detects:

  • Graph API call volume, trends, and endpoint diversity via MCP
  • Sensitive/high-risk Graph endpoint access (PIM, credentials, Identity Protection)
  • Sentinel workspace query patterns by client application
  • User vs. Service Principal attribution across all MCP channels
  • Cross-server user analysis — identifies users with broadest MCP footprint (multiple server types, highest call volume)
  • Azure ARM operations potentially originating from Azure MCP Server
  • Non-MCP platform query sources for governance context (Sentinel Engine, Logic Apps)
  • Sentinel Data Lake MCP tool usage — tool call breakdown (query_lake, list_sentinel_workspaces, search_tables, etc.), success/failure rates, execution duration, tables accessed via CloudAppEvents (Purview unified audit)
  • MCP-driven vs Direct KQL delineation — distinguishes Data Lake queries initiated via MCP tools (RecordType 403, Interface IMcpToolTemplate) from direct KQL queries (RecordType 379) and Analytics tier queries (LAQueryLogs)
  • Anomalous access patterns: new users, new endpoints, volume spikes, error surges
  • MCP server usage as a proportion of total workspace activity

Extended landscape awareness: Beyond these four actively monitored MCP servers, Microsoft's MCP ecosystem includes 30+ additional servers (Copilot Studio built-in catalog, Power BI, Fabric RTI, Playwright, Security Copilot Agent Creation, and more). See Extended Microsoft MCP Server Landscape for the full catalog, telemetry surfaces, and monitoring expansion priorities.


📑 TABLE OF CONTENTS

  1. Critical Workflow Rules - Start here!
  2. Extended MCP Server Landscape - Full Microsoft MCP ecosystem catalog
  3. Output Modes - Inline chat vs. Markdown file
  4. Scalability & Token Management - Guidance for large environments
  5. Quick Start - 10-step investigation pattern
  6. MCP Usage Score Formula - Composite health & risk scoring
  7. Execution Workflow - Complete 7-phase process
  8. Sample KQL Queries - Validated query patterns
  9. Report Template - Output format specification
  10. Proactive Alerting — KQL Data Lake Jobs - Scheduled anomaly detection
  11. Known Pitfalls - Edge cases and false positives
  12. Error Handling - Troubleshooting guide
  13. SVG Dashboard Generation - Visual dashboard from completed report

⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

Before starting ANY MCP usage monitoring analysis:

  1. ALWAYS enforce Sentinel workspace selection (see Workspace Selection section below)
  2. ALWAYS ask the user for output mode if not specified: inline chat summary or markdown file report (or both)
  3. ALWAYS ask the user for time range if not specified: default to 30 days, configurable
  4. ALWAYS query all MCP telemetry surfaces — do not skip any MCP server type
  5. ALWAYS include non-MCP workspace context (Sentinel Engine, Logic Apps) for governance proportion analysis
  6. ALWAYS run independent queries in parallel for performance
  7. ALWAYS attribute activity to specific users — never present anonymous aggregates
  8. NEVER conflate non-MCP platform activity with MCP activity — clearly label categories
  9. ALWAYS execute pre-authored queries from Sample KQL Queries EXACTLY as written; substitute only the time range parameter (e.g., ago(30d) → ago(90d)). These queries encode mitigations for schema pitfalls documented in Known Pitfalls. Writing equivalent queries from scratch is ❌ PROHIBITED

Known AppIds Reference

MCP Servers & AI Agents

AppId Service Telemetry Table Notes
e8c77dc2-69b3-43f4-bc51-3213c9d915b4 Microsoft Graph MCP Server for Enterprise MicrosoftGraphActivityLogs Read-only Graph API proxy
7b7b3966-1961-47b5-b080-43ca5482e21c Sentinel Triage MCP ("Microsoft Defender Mcp") MicrosoftGraphActivityLogs, SigninLogs, AADNonInteractiveUserSignInLogs Microsoft first-party AppId, same across all tenants. Dedicated AppId — visible in MicrosoftGraphActivityLogs (API calls to /security/* endpoints) and SigninLogs/AADNonInteractiveUserSignInLogs (AppDisplayName = "Microsoft Defender Mcp"). Delegated auth with certificate (ClientAuthMethod=2), full user attribution. Scopes: SecurityAlert.Read.All, SecurityIncident.Read.All, ThreatHunting.Read.All. Target resources: Microsoft Graph, WindowsDefenderATP. No local SPN — display name only visible in SigninLogs. 🔴 Confirmed Feb 2026: Empirical telemetry investigation identified 7b7b3966 as the Triage MCP AppId via MicrosoftGraphActivityLogs + SigninLogs correlation.
253895df-6bd8-4eaf-b101-1381ec4306eb Sentinel Platform Services App Reg SigninLogs Sentinel-hosted MCP platform
04b07795-8ddb-461a-bbee-02f9e1bf7b46 Azure MCP Server (local stdio via DefaultAzureCredential → Azure CLI) SigninLogs, AADNonInteractiveUserSignInLogs, LAQueryLogs Shared AppId with Azure CLI. In LAQueryLogs, RequestClientApp is empty (not a unique fingerprint). Azure MCP appends \n| limit N to query text — the only query-level differentiator. Read-only ARM ops don't appear in AzureActivity. 🔄 Updated Feb 2026: Previously documented as AppId 1950a258 (AzurePowerShellCredential) with csharpsdk,LogAnalyticsPSClient — that fingerprint is obsolete; only 1 occurrence found in 30-day lookback.
(none — uses DefaultAzureCredential) Azure MCP Server (local stdio) AzureActivity ARM write operations only; read ops not logged. Claims.appid = 04b07795. Inherits cred from Azure CLI/VS Code
(no AppId — Purview unified audit) Sentinel Data Lake MCP CloudAppEvents RecordType 403; Interface IMcpToolTemplate; tools: query_lake, list_sentinel_workspaces, search_tables

Sentinel MCP Collection Endpoints

| Endpoint URL | Collection | Monitored |
|---|---|---|
| https://sentinel.microsoft.com/mcp/data-exploration | Data Exploration (Data Lake MCP) | ✅ Phase 3 |
| https://sentinel.microsoft.com/mcp/triage | Triage (Triage MCP) | ✅ Phase 2 |
| https://sentinel.microsoft.com/mcp/security-copilot-agent-creation | Security Copilot Agent Creation | ❌ See Landscape |

Client Applications

AppId Service Telemetry Table Notes
aebc6443-996d-45c2-90f0-388ff96faa56 Visual Studio Code SigninLogs VS Code as MCP client → Sentinel
9ba5f2e4-6bbf-4df2-b19b-7f1bcb926818 PowerPlatform-sentinelmcp-Connector SigninLogs Copilot Studio → Sentinel MCP
04b07795-8ddb-461a-bbee-02f9e1bf7b46 Azure CLI (DefaultAzureCredential) SigninLogs, AADNonInteractiveUserSignInLogs, LAQueryLogs Primary Azure MCP Server credential path (field-tested Feb 2026). RequestClientApp is empty in LAQueryLogs. Azure MCP appends \n| limit N to query text. Shared AppId with manual az CLI — disambiguate via query text pattern or session correlation. 🔄 Previously documented as 1950a258 (AzurePowerShellCredential) — that path is obsolete

Portal & Platform Applications (Non-MCP — for context)

AppId Service Telemetry Table Notes
80ccca67-54bd-44ab-8625-4b79c4dc7775 M365 Security & Compliance Center (Sentinel Portal) LAQueryLogs ASI_Portal, ASI_Portal_Connectors — Sentinel Portal backend, NOT an MCP server
95a5d94c-a1a0-40eb-ac6d-48c5bdee96d5 Azure Portal — AppInsightsPortalExtension LAQueryLogs Azure Portal blade for Log Analytics Usage dashboards/workbooks. RequestClientApp = AppInsightsPortalExtension. Executes billing/usage queries (e.g., Usage | where IsBillable). NOT MCP, NOT VS Code — runs when user opens Workspace Usage Dashboard in browser. No SPN or app registration in tenant (platform-level first-party app). Not in merill/microsoft-info known apps list.
de8c33bb-995b-4d4a-9d04-8d8af5d59601 PowerPlatform-AzureMonitorLogs-Connector AADNonInteractiveUserSignInLogs, LAQueryLogs Logic Apps → Log Analytics (NOT MCP)
fc780465-2017-40d4-a0c5-307022471b92 Sentinel Engine (analytics rules, UEBA, Advanced Hunting backend) LAQueryLogs Built-in scheduled query engine (NOT MCP). Also serves as the execution backend for Advanced Hunting: RequestClientApp = "M365D_AdvancedHunting" indicates AH queries from Triage MCP, Defender portal, or Security Copilot that hit connected LA tables (see Query 7). Separate from analytics rules (RequestClientApp empty or other values).

Extended Microsoft MCP Server Landscape (Reference)

Beyond the four MCP servers actively monitored by this skill, Microsoft's MCP ecosystem includes many additional servers. This section catalogs them for awareness, threat modeling, and future monitoring expansion.

Sentinel MCP Collections (Microsoft-Hosted)

Microsoft Sentinel exposes three official MCP collections, each at a distinct endpoint:

Collection Endpoint URL Purpose Monitored by This Skill
Data Exploration https://sentinel.microsoft.com/mcp/data-exploration query_lake, search_tables, list_sentinel_workspaces, entity analyzer ✅ Phase 3 (CloudAppEvents)
Triage https://sentinel.microsoft.com/mcp/triage Incident triage, Advanced Hunting, entity investigation ✅ Phase 2 (MicrosoftGraphActivityLogs + SigninLogs — AppId 7b7b3966)
Security Copilot Agent Creation https://sentinel.microsoft.com/mcp/security-copilot-agent-creation Create Microsoft Security Copilot agents for complex workflows ❌ Not yet monitored

Sentinel Custom MCP Tools: Organizations can create their own MCP tools by exposing saved KQL queries from Advanced Hunting as MCP tools. These execute through the same Sentinel MCP infrastructure and are audited in CloudAppEvents (RecordType 403) alongside built-in tools. See Create custom Sentinel MCP tools.

🔵 Monitoring note: Custom MCP tools appear in CloudAppEvents with the same RecordType 403 and IMcpToolTemplate interface as built-in tools. The ToolName field will show the custom tool name, making them visible in Query 13 without modification.

Power BI MCP Servers

Server Type Endpoint / Repo Purpose Telemetry Surface
Power BI Remote MCP Microsoft-hosted https://api.fabric.microsoft.com/v1/mcp/powerbi Query Power BI datasets, reports, and workspaces remotely via SSE transport 🟡 PowerBIActivity table (if ingested into Sentinel), Fabric audit logs
Power BI Modeling MCP Local (stdio) microsoft/powerbi-modeling-mcp Local Power BI model operations (DAX queries, schema exploration) ❌ Local only — no Azure telemetry

⚠️ Data exfiltration risk: Power BI Remote MCP provides API-based access to organizational datasets. If an AI agent connects to this endpoint, it can query sensitive business data. Monitor PowerBIActivity for unusual access patterns if this table is available in your Sentinel workspace.

Fabric & Azure Data Explorer MCP Servers

Server Type Endpoint / Repo Purpose Telemetry Surface
Fabric RTI MCP Server Local (stdio) microsoft/fabric-rti-mcp Query Azure Data Explorer clusters and Fabric Real-Time Intelligence Eventhouses via KQL 🟡 ADX audit logs, Fabric audit events
Azure MCP Server — Kusto namespace Local (stdio) Part of Azure MCP Server (azmcp --namespace kusto) Manage ADX clusters, databases, tables, and queries via ARM ✅ Already covered (Azure ARM operations — Phase 4)
Kusto Query MCP Copilot Studio built-in Copilot Studio catalog KQL query execution from Copilot Studio agents 🟡 CloudAppEvents (Copilot Studio workload)

🔵 Note: The Fabric RTI MCP Server is open-source and runs locally. It authenticates to ADX/Eventhouse using the user's credentials. If your org uses ADX, queries from this MCP would appear in ADX audit logs (.show queries / diagnostic logs), NOT in Sentinel LAQueryLogs.

Developer & Productivity MCP Servers

Server Type Repo Purpose Telemetry Surface
Playwright MCP Local (stdio) microsoft/playwright-mcp (26.9k ⭐) Browser automation via accessibility tree — enables LLMs to interact with web pages ❌ Local only — no Azure telemetry
GitHub MCP Server Local (stdio) github/github-mcp-server GitHub repo operations (issues, PRs, code search) via PAT ❌ GitHub audit logs only, not in Sentinel
Microsoft Learn Docs MCP Cloud-hosted Certified Copilot Studio connector Search and fetch official Microsoft Learn documentation ❌ Public docs, no security data

Copilot Studio Built-in MCP Servers (19+ servers)

Microsoft Copilot Studio provides a catalog of built-in MCP servers for agent development. These are Microsoft-managed, cloud-hosted servers that agents can connect to.

Source: Built-in MCP servers catalog

Category MCP Servers Security Relevance
Microsoft 365 Outlook Mail, Outlook Calendar, 365 User Profile, Teams, Word, 365 Copilot (Search) 🔴 High — email, calendar, user profile access
SharePoint & OneDrive SharePoint and OneDrive, SharePoint Lists 🟠 Medium — file and data access
Administration 365 Admin Center 🔴 High — administrative control plane
Dataverse Dataverse MCP 🟠 Medium — business data access
Dynamics 365 Sales, Finance, Supply Chain, Service, ERP, Contact Center (6 sub-variants) 🟡 Low-Medium — business application data
Fabric Fabric MCP 🟠 Medium — analytics data access
Office 365 Outlook Contact Management, Email Management, Meeting Management 🔴 High — email and contact data
Meta-Server MCP Management MCP 🟠 Medium — manages other MCP servers via Dataverse/Graph

⚠️ Telemetry gap: Copilot Studio built-in MCP servers are NOT directly visible in LAQueryLogs or MicrosoftGraphActivityLogs. Their activity may appear in:

  • CloudAppEvents — under Copilot Studio workload (if Purview unified audit is configured)
  • M365 unified audit log — as Copilot Studio agent actions
  • AuditLogs — service principal lifecycle events (creation, modification)
  • AADServicePrincipalSignInLogs — SPN sign-ins to Bot Framework from Azure internal IPs (fd00:*)

To monitor Copilot Studio agent activity, use the ai-agent-posture skill for comprehensive agent security auditing.

Azure MCP Server — Full Tool Surface

The Azure MCP Server (already tracked in Phase 4) has a much broader tool surface than just ARM operations. The complete namespace catalog:

Category Namespaces Security-Relevant Tools
AI & ML foundry, search, speech AI Foundry model access, Search index queries
Identity role ⚠️ RBAC role assignments — view and manage
Security keyvault, appconfig, confidentialledger 🔴 Key Vault secrets/keys/certs, App Configuration
Databases cosmos, mysql, postgres, redis, sql Database access and management
Storage storage, fileshares, storagesync, managedlustre Blob, file, and storage account access
Compute appservice, functionapp, aks App Service, Functions, Kubernetes
Networking eventhubs, servicebus, eventgrid, communication, signalr Messaging and event services
DevOps bicepschema, deploy, monitor, workbooks, grafana Infrastructure deployment, monitoring
Governance policy, quota, resourcehealth, cloudarchitect Policy management, resource health
Other marketplace, virtualdesktop, loadtesting, acr VDI, container registry, load testing

🔵 Key Vault access via MCP is particularly security-sensitive. The Azure MCP Server implements elicitation (user confirmation prompts) before returning secrets. However, this can be bypassed with the --insecure-disable-user-confirmation flag. Monitor AzureActivity for Key Vault operations correlated with MCP usage patterns.

Monitoring Expansion Priorities

If expanding this skill's coverage, prioritize based on data access risk:

Priority Server Why How to Monitor
🔴 P1 Copilot Studio built-in M365 MCPs Email, Teams, admin center access ai-agent-posture skill + CloudAppEvents
🔴 P1 Security Copilot Agent Creation Creates autonomous security agents CloudAppEvents for agent creation events
🟠 P2 Power BI Remote MCP Dataset query access via API PowerBIActivity table if available
🟠 P2 Sentinel Custom MCP Tools User-defined tools, same audit surface Already visible in Phase 3 CloudAppEvents
🟡 P3 Fabric RTI MCP ADX/Eventhouse data access ADX diagnostic logs
🟡 P3 Kusto Query MCP (Copilot Studio) KQL from Copilot Studio agents CloudAppEvents (Copilot Studio workload)
P4 Playwright, GitHub, Learn Docs MCPs Local/public, minimal telemetry Not monitorable from Sentinel

Note: This catalog reflects the Microsoft MCP ecosystem as of February 2026. The Copilot Studio MCP catalog notes: "This list isn't exhaustive. New MCP connectors are added regularly."


⛔ MANDATORY: Sentinel Workspace Selection

This skill requires a Sentinel workspace to execute queries. Follow these rules STRICTLY:

When invoked from another skill (e.g., incident-investigation):

  • Inherit the workspace selection from the parent investigation context
  • If no workspace was selected in parent context: STOP and ask user to select

When invoked standalone (direct user request):

  1. ALWAYS call list_sentinel_workspaces MCP tool FIRST
  2. If 1 workspace exists: Auto-select, display to user, proceed
  3. If multiple workspaces exist:
    • Display all workspaces with Name and ID
    • ASK: "Which Sentinel workspace should I use for this analysis?"
    • ⛔ STOP AND WAIT for user response
    • ⛔ DO NOT proceed until user explicitly selects
  4. If a query fails on the selected workspace:
    • ⛔ DO NOT automatically try another workspace
    • STOP and report the error, display available workspaces, ASK user to select

🔴 PROHIBITED ACTIONS:

  • ❌ Selecting a workspace without user consent when multiple exist
  • ❌ Switching to another workspace after a failure without asking
  • ❌ Proceeding with analysis if workspace selection is ambiguous
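
The selection rules above can be sketched as a small decision function. This is an illustrative sketch only: `Workspace`, `Decision`, and `select_workspace` are hypothetical names, and the list of workspaces is assumed to come from a prior `list_sentinel_workspaces` call.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Workspace:
    name: str
    workspace_id: str


@dataclass(frozen=True)
class Decision:
    action: str                      # "proceed" or "ask_user"
    workspace: Optional[Workspace]   # set only when action == "proceed"
    message: str


ASK = "Which Sentinel workspace should I use for this analysis?"


def select_workspace(
    available: Sequence[Workspace],
    inherited: Optional[Workspace] = None,
    invoked_from_skill: bool = False,
    user_choice: Optional[str] = None,
) -> Decision:
    """Apply the workspace selection rules; never pick silently among several."""
    if invoked_from_skill:
        # Parent context wins; without one we must stop and ask.
        if inherited is not None:
            return Decision("proceed", inherited, f"Inherited {inherited.name}")
        return Decision("ask_user", None, ASK)
    if not available:
        return Decision("ask_user", None, "No workspaces were returned.")
    if len(available) == 1:
        only = available[0]
        return Decision("proceed", only, f"Auto-selected {only.name}")
    # Multiple workspaces: proceed only on an explicit, matching user choice.
    for ws in available:
        if user_choice is not None and ws.workspace_id == user_choice:
            return Decision("proceed", ws, f"User selected {ws.name}")
    listing = ", ".join(f"{ws.name} ({ws.workspace_id})" for ws in available)
    return Decision("ask_user", None, f"{ASK} Available: {listing}")
```

A query failure on the selected workspace would map to the same `ask_user` outcome rather than a retry against a different workspace.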

Output Modes

This skill supports two output modes. ASK the user which they prefer if not explicitly specified. Both may be selected.

Mode 1: Inline Chat Summary (Default)

  • Render the full MCP usage analysis directly in the chat response
  • Includes ASCII tables, trend charts, endpoint breakdowns, and security assessment
  • Best for quick review and interactive follow-up questions

Mode 2: Markdown File Report

  • Save a comprehensive report to reports/mcp-usage/MCP_Usage_Report_<timestamp>.md
  • All ASCII visualizations render correctly inside markdown code fences (```)
  • Includes all data from inline mode plus additional detail sections
  • Use create_file tool — NEVER use terminal commands for file output
  • Filename pattern: reports/mcp-usage/MCP_Usage_Report_YYYYMMDD_HHMMSS.md

Markdown Rendering Notes

  • ✅ ASCII tables, box-drawing characters, and bar charts render perfectly in markdown code blocks
  • ✅ Unicode block characters (▓░█) display correctly in monospaced fonts
  • ✅ Emoji indicators (🔴🟢🟡⚠️✅) render natively in GitHub-flavored markdown
  • ✅ Standard markdown tables (| col |) render as formatted tables
  • Tip: Wrap all ASCII art in triple-backtick code fences for consistent rendering

Scalability & Token Management

This skill was developed in a small lab environment (1–2 users, single workspace). In larger tenants with many users, MCP servers, and higher query volumes, the query complexity is not a concern — all queries use summarize, dcount, make_set(..., N), and take operators, so result sets remain bounded regardless of raw table size. Execution time will increase but output shape stays the same.

The primary risk in large environments is LLM token exhaustion during report generation. All query results accumulate in conversation context before the report is written, and this skill file itself consumes significant context. In a large tenant, richer result sets (more users, endpoints, error categories, AppIds) can push past token limits before the report is complete.

Guardrails for Large Environments

1. Tighten result set limits in queries:

| Parameter | Small Env (default) | Large Env |
|---|---|---|
| make_set(..., N) for users | 10 | 5 |
| make_set(..., N) for endpoints | 20–30 | 10 |
| make_set(..., N) for errors | 5 | 3 |
| take on governance tables | 25 | 15 |
| take on endpoint rankings | 25 | 15 |
| take on error analysis | 50 | 20 |

2. Incremental file writes (markdown mode):

Instead of composing the entire report in memory and writing it in one create_file call:

  • Write the report header and executive summary first with create_file
  • Append each section (Graph MCP, Sentinel Triage, Data Lake, etc.) using replace_string_in_file to insert content at the end of the file
  • This allows earlier query results to fall out of active context after being written

3. Two-pass approach for very large tenants:

  • Pass 1 (Summary): Run all queries with aggressive limits (take 10, make_set(..., 3)). Generate a summary report with top-level numbers only.
  • Pass 2 (Drill-down): If the user wants detail on a specific section (e.g., "show me the full Data Lake error breakdown"), run targeted queries for that section only.

4. Parallel query batching:

Phases 1–5 contain independent queries — always run them in parallel. But avoid running all ~16 queries simultaneously; batch them into 2–3 groups of 5–6 queries. This balances throughput against context accumulation.
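
As a rough illustration of the batching guidance, the sketch below splits a list of query labels into consecutive groups. The labels `Q1` through `Q16` and the helper name `batch_queries` are placeholders, not identifiers defined by this skill.

```python
from typing import List, Sequence


def batch_queries(query_ids: Sequence[str], batch_size: int = 6) -> List[List[str]]:
    """Split query identifiers into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        list(query_ids[start:start + batch_size])
        for start in range(0, len(query_ids), batch_size)
    ]


# Hypothetical labels for the ~16 independent queries.
queries = [f"Q{n}" for n in range(1, 17)]
batches = batch_queries(queries, batch_size=6)
```

With 16 queries and a batch size of 6, this yields three groups (6, 6, and 4 queries); each group would be run in parallel and its results written out before the next group starts.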

5. Omit raw query appendix for large reports:

The "Appendix: Query Details" section listing every KQL query used can be omitted in large environments to save tokens. The queries are documented in this skill file and don't need to be repeated in the report.

Indicators You're Hitting Token Limits

  • Report generation starts but cuts off mid-section
  • The agent switches to a new conversation turn unexpectedly during report writing
  • Sections become progressively less detailed toward the end of the report
  • The agent summarizes findings in chat instead of writing the full markdown file

If any of these occur, ask the agent to: "Continue writing the report from where you left off" — the incremental file write approach ensures partial progress is saved.


Quick Start (TL;DR)

When a user requests MCP usage monitoring:

  1. Select Workspace → list_sentinel_workspaces, auto-select or ask
  2. Determine Output Mode → Ask if not specified: inline, markdown file, or both
  3. Determine Time Range → Ask if not specified; default 30 days
  4. Run Phase 1 (Graph MCP) → Daily usage summary, top endpoints, sensitive API access
  5. Run Phase 2 (Sentinel Triage MCP) → API calls via AppId 7b7b3966, auth events, AH downstream queries
  6. Run Phase 3 (Sentinel Data Lake MCP) → CloudAppEvents tool usage, error analysis, MCP vs Direct KQL
  7. Run Phase 4 (Azure MCP & ARM) → ARM operations, resource provider breakdown
  8. Run Phase 5 (Workspace Governance) → All query sources (Analytics + Data Lake tiers), MCP proportion
  9. Run Phase 6 (Cross-Server User Analysis) → Top MCP users by server breadth, power user identification
  10. Run Phase 7 (Assessment) → Compute MCP Usage Score, security assessment, render report

Parallel execution: Phases 1-5 contain independent queries, so run them in parallel for performance (in large environments, batch them as described in Scalability & Token Management). Phases 6-7 depend on results from 1-5.


MCP Usage Score Formula

The MCP Usage Score is a composite health and risk indicator that summarizes MCP server activity. Unlike the Drift Score (which is a ratio), this is an absolute assessment based on multiple dimensions.

Scoring Dimensions

$$ \text{MCPUsageScore} = \sum_{i} \text{DimensionScore}_i $$

Each dimension contributes 0–20 points to a maximum of 100:

| Dimension | Max Points | Green (0-5) | Yellow (6-12) | Red (13-20) |
|---|---|---|---|---|
| User Diversity | 20 | 1-2 known users | 3-5 users or 1 unknown | >5 users or unknown users |
| Endpoint Sensitivity | 20 | 0% sensitive endpoints | 1-30% sensitive | >30% calls to sensitive APIs |
| Error Rate | 20 | <1% errors | 1-5% errors | >5% errors |
| Volume Anomaly | 20 | Within ±50% of daily avg | 50-200% spike | >200% spike vs avg |
| Off-Hours Activity | 20 | <5% off-hours | 5-20% off-hours | >20% calls outside business hours |

Interpretation Scale

| Score | Meaning | Action |
|---|---|---|
| 0–25 | Healthy | ✅ Normal MCP usage, no concerns |
| 26–50 | Elevated | 🟡 Review: minor anomalies detected |
| 51–75 | Concerning | 🟠 Investigate: multiple risk signals present |
| 76–100 | Critical | 🔴 Immediate review: significant security risk |
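
A minimal sketch of the scoring arithmetic, assuming each dimension has already been assigned 0–20 points by the analyst or agent. The dictionary keys and the function names `mcp_usage_score` and `interpret` are illustrative; how points are chosen within a colour band is not specified here and is left to the caller.

```python
from typing import Dict

DIMENSIONS = (
    "user_diversity",
    "endpoint_sensitivity",
    "error_rate",
    "volume_anomaly",
    "off_hours_activity",
)


def mcp_usage_score(dimension_scores: Dict[str, int]) -> int:
    """Sum the five dimension scores (each 0-20) into a 0-100 total."""
    missing = [name for name in DIMENSIONS if name not in dimension_scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    total = 0
    for name in DIMENSIONS:
        points = dimension_scores[name]
        if not 0 <= points <= 20:
            raise ValueError(f"{name} must be between 0 and 20, got {points}")
        total += points
    return total


def interpret(score: int) -> str:
    """Map a total score onto the interpretation scale."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 25:
        return "Healthy"
    if score <= 50:
        return "Elevated"
    if score <= 75:
        return "Concerning"
    return "Critical"


example = {
    "user_diversity": 3,
    "endpoint_sensitivity": 10,
    "error_rate": 2,
    "volume_anomaly": 15,
    "off_hours_activity": 4,
}
total = mcp_usage_score(example)
band = interpret(total)
```

In this example the dimensions sum to 34, which falls in the Elevated band: one red dimension (volume anomaly) and one yellow dimension are enough to leave the Healthy range even when the other three are green.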

Sensitivity Classification

Sensitive Graph API endpoints — flag any MCP calls to these patterns:

roleManagement, roleAssignments, roleEligibility,
authentication/methods, identityProtection, riskyUsers,
riskDetections, conditionalAccess, servicePrincipals,
appRoleAssignments, oauth2PermissionGrants,
auditLogs, directoryRoles, privilegedAccess,
security/alerts, security/incidents
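
For post-processing query results outside KQL, the pattern list can be applied as below. This is a sketch that assumes case-insensitive substring matching on the request URI; the pre-authored Query 2 may implement the sensitivity flag differently, and `is_sensitive` is a hypothetical helper name.

```python
SENSITIVE_PATTERNS = (
    "roleManagement", "roleAssignments", "roleEligibility",
    "authentication/methods", "identityProtection", "riskyUsers",
    "riskDetections", "conditionalAccess", "servicePrincipals",
    "appRoleAssignments", "oauth2PermissionGrants",
    "auditLogs", "directoryRoles", "privilegedAccess",
    "security/alerts", "security/incidents",
)

# Lower-case once so each lookup is a plain substring test.
_LOWERED = tuple(pattern.lower() for pattern in SENSITIVE_PATTERNS)


def is_sensitive(request_uri: str) -> bool:
    """Return True when the URI contains any sensitive endpoint pattern."""
    uri = request_uri.lower()
    return any(pattern in uri for pattern in _LOWERED)
```

Substring matching means `security/alerts` also flags `security/alerts_v2`, which is the intended behaviour for this list.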

Off-Hours Definition

Business hours: 08:00–18:00 local time (derive from user's primary sign-in timezone, or use UTC if unknown). Weekends count as off-hours for all 24 hours.
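
The definition can be sketched as follows. Two assumptions are made here that the definition does not state: the user's timezone is approximated by a fixed UTC offset (no daylight-saving handling), and 18:00 itself is treated as off-hours. The function names are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable

BUSINESS_START_HOUR = 8   # 08:00 local, inclusive
BUSINESS_END_HOUR = 18    # 18:00 local, exclusive (assumption)


def is_off_hours(ts: datetime, utc_offset_hours: float = 0.0) -> bool:
    """True when ts falls outside 08:00-18:00 local time or on a weekend."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    local = ts.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    if local.weekday() >= 5:  # Saturday = 5, Sunday = 6
        return True
    return not (BUSINESS_START_HOUR <= local.hour < BUSINESS_END_HOUR)


def off_hours_percent(timestamps: Iterable[datetime],
                      utc_offset_hours: float = 0.0) -> float:
    """Percentage of calls that count as off-hours (0.0 for no calls)."""
    stamps = list(timestamps)
    if not stamps:
        return 0.0
    off = sum(1 for ts in stamps if is_off_hours(ts, utc_offset_hours))
    return 100.0 * off / len(stamps)
```

The resulting percentage feeds the Off-Hours Activity dimension; with the default offset of 0 the function falls back to UTC, as the definition prescribes when the timezone is unknown.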


Execution Workflow

Phase 1: Graph MCP Server Analysis

Data source: MicrosoftGraphActivityLogs
Filter: AppId == "e8c77dc2-69b3-43f4-bc51-3213c9d915b4"

Collect:

  • Execute Query 1 (Unified Daily MCP Activity Trend) via RunAdvancedHuntingQuery — returns daily Server | Day | Calls | Errors | ErrorRate for ALL 4 MCP servers in one pass. Run this ONCE here; do NOT re-run in Phases 2–4. Feeds the SVG dashboard Row 5 line chart and volume anomaly detection.
  • Execute Query 2 (Endpoint & Activity Summary) via RunAdvancedHuntingQuery — returns per-endpoint rows with call counts, sensitivity flag, off-hours metrics, error rates, and user sets. Replaces former Q2 + Q3 + Q11. Derive: top endpoints (order by CallCount), sensitive APIs (where IsSensitive), off-hours % (sum(OffHoursCalls)/sum(CallCount)).

Phase 2: Sentinel Triage MCP Analysis

Data sources: MicrosoftGraphActivityLogs, SigninLogs, AADNonInteractiveUserSignInLogs
Filter: AppId = 7b7b3966-1961-47b5-b080-43ca5482e21c ("Microsoft Defender Mcp")

Detection Method (Confirmed Feb 2026):

The Sentinel Triage MCP has a dedicated AppId (7b7b3966-1961-47b5-b080-43ca5482e21c) that appears in both MicrosoftGraphActivityLogs and SigninLogs/AADNonInteractiveUserSignInLogs. This enables definitive attribution of Triage MCP calls — no heuristics or shared-surface estimation needed.

Key characteristics:

  • AppDisplayName: "Microsoft Defender Mcp" (visible in SigninLogs)
  • Auth type: Delegated + certificate (ClientAuthMethod=2) — user identity always available
  • Scopes: SecurityAlert.Read.All, SecurityIncident.Read.All, ThreatHunting.Read.All
  • Target resources: Microsoft Graph, WindowsDefenderATP
  • API endpoints: POST /v1.0/security/runHuntingQuery/, GET /security/incidents/, GET /security/alerts_v2/
  • No local SPN: Microsoft first-party app — display name only visible in SigninLogs, not in Graph API SPN lookup

🔵 MicrosoftGraphActivityLogs retention varies by environment (depends on Log Analytics workspace configuration and diagnostic settings). Do not assume a fixed retention period — check with a baseline row count query first.

Collect:

  • Execute Query 3 to get authentication events by client app (VS Code, Copilot Studio, browser) with user, IP, OS, country
  • Execute Query 4 to get client app usage breakdown with distinct user counts and last-seen timestamps
  • Execute Query 5 to get Triage MCP API usage from MicrosoftGraphActivityLogs — filter by AppId 7b7b3966 for exact Triage MCP calls with endpoint/method/user breakdown
  • Execute Query 6 to get Triage MCP authentication events from SigninLogs/AADNonInteractiveUserSignInLogs — sign-in frequency, user attribution, IP, OS, country
  • Execute Query 7 to get LAQueryLogs for Advanced Hunting downstream queries via fc780465 / M365D_AdvancedHunting. Captures queries from any RunAdvancedHuntingQuery consumer (Triage MCP, Defender portal, Security Copilot) that hit connected LA tables. XDR-native tables (DeviceEvents, EmailEvents) don't appear here.

Phase 3: Sentinel Data Lake MCP Analysis

Data source: CloudAppEvents (Purview unified audit log)
Execution tool: RunAdvancedHuntingQuery preferred (30-day lookback, free for Analytics-tier tables). CloudAppEvents uses Timestamp in AH (not TimeGenerated). Fall back to mcp_sentinel-data_query_lake (uses TimeGenerated, 90d retention) only if lookback > 30 days or AH returns errors.
Filter: ActionType contains "Sentinel" or ActionType contains "KQL". RecordType is inside RawEventData (not a top-level column) — extract with parse_json(tostring(RawEventData)).RecordType. RecordType 403 = MCP tools, 379 = Direct KQL.

⚠️ MANDATORY: Execute Query 10 against query_lake before reporting any gap. If the query returns 0 results or table-not-found, THEN report the gap. Do NOT skip this phase based on assumptions about E5 licensing or Purview configuration — the table may be populated even without explicit Purview setup.

Audit Path: Sentinel Data Lake MCP tools are NOT audited via LAQueryLogs — they are tracked through Purview unified audit log, surfaced in the CloudAppEvents table. RecordType 403 (inside RawEventData) = Sentinel AI Tool activities, RecordType 379 = KQL activities.

MCP vs Direct KQL Delineation:

| Access Pattern | RecordType | Interface | Operation | What It Represents |
|---|---|---|---|---|
| MCP Server-driven | 403 | IMcpToolTemplate | SentinelAIToolRunStarted, SentinelAIToolRunCompleted | Tool calls via Sentinel Data Lake MCP (e.g., query_lake, list_sentinel_workspaces, search_tables) |
| Direct KQL | 379 | Microsoft.SentinelGraph.AIPrimitives.Core.Services.KqsService | KQLQueryCompleted | KQL queries executed directly via Sentinel Graph / Data Lake Explorer (no MCP intermediary) |

⚠️ Known Limitation (Discovered Mar 2026): RecordType 403 (SentinelAIToolRunCompleted / IMcpToolTemplate) may not be emitted by the Data Lake MCP server. In verified testing, all Data Lake MCP tool calls (query_lake, search_tables) appeared as RecordType 379 with Interface = "InterfaceNotProvided" — NOT as RecordType 403. When RecordType 403 returns 0 results:

  1. Do NOT report "0 MCP activity" — the audit pipeline has a gap, not the usage.
  2. Fallback: Use Interface breakdown within RecordType 379. InterfaceNotProvided contains MCP-driven queries. Cross-reference users in InterfaceNotProvided with known Sentinel MCP users from Q4/Q6 (SigninLogs). Known portal interfaces: msglakeexplorer@msec-msg (Portal Data Lake Explorer), msgjobmanagement@msec-msg (scheduled jobs), ipykernel_launcher.py (Jupyter), PowerBIConnector (Power BI), Microsoft.Medeina.Server (Security Copilot).
  3. Report as "Probable MCP" — clearly note the attribution is based on proxy signal (user overlap), not definitive RecordType 403 classification.
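
The delineation and its fallback can be sketched as a classifier over already-extracted fields. This is illustrative only: `classify_access` and the "Unattributed" label are hypothetical, and `known_mcp_users` is assumed to be the set of users identified from the sign-in queries.

```python
from typing import Set

# Interfaces listed above as known non-MCP sources within RecordType 379.
KNOWN_PORTAL_INTERFACES = {
    "msglakeexplorer@msec-msg",
    "msgjobmanagement@msec-msg",
    "ipykernel_launcher.py",
    "PowerBIConnector",
    "Microsoft.Medeina.Server",
}


def classify_access(record_type: int, interface: str, user: str,
                    known_mcp_users: Set[str]) -> str:
    """Label one audit record as MCP, Probable MCP, Direct KQL, or other."""
    if record_type == 403 and interface == "IMcpToolTemplate":
        return "MCP"  # definitive classification
    if record_type == 379:
        if interface == "InterfaceNotProvided":
            # Proxy signal only: user overlap with known Sentinel MCP users.
            if user.lower() in {u.lower() for u in known_mcp_users}:
                return "Probable MCP"
            return "Unattributed"
        return "Direct KQL"  # KqsService and the known portal interfaces
    return "Other"
```

Records labelled "Probable MCP" should be reported with the attribution caveat from step 3, since the label rests on user overlap and not on RecordType 403.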

Key RawEventData Fields:

| Field | Description | Example |
|---|---|---|
| ToolName | MCP tool invoked | query_lake, list_sentinel_workspaces, search_tables, analyze_url_entity |
| Interface | Execution interface; distinguishes MCP from direct | IMcpToolTemplate (MCP) vs KqsService (direct) |
| ExecutionDuration | Duration in seconds (as string) | "2.4731712" |
| FailureReason | Error message if failed | "SemanticError: 'DeviceDetail' column does not exist" |
| TablesRead | Tables accessed by the query | "SigninLogs" |
| DatabasesRead | Log Analytics workspace name | "la-yourworkspace" |
| TotalRows | Rows returned | 100 |
| InputParameters | Full tool input including KQL query text and workspaceId | JSON string with query and workspaceId keys |
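
Because ExecutionDuration arrives as a string and InputParameters is itself a JSON string, exported rows need a little decoding before aggregation. The sketch below assumes RawEventData has been exported as a JSON string with the field names listed above; `parse_tool_event` and the sample payload are illustrative, and the all-zero workspaceId is a placeholder.

```python
import json
from typing import Any, Dict, Optional


def parse_tool_event(raw_event_data: str) -> Dict[str, Any]:
    """Extract typed values from one RawEventData JSON string."""
    data = json.loads(raw_event_data)

    duration = data.get("ExecutionDuration")
    duration_seconds = float(duration) if duration not in (None, "") else None

    # InputParameters is a nested JSON string holding the query text.
    query: Optional[str] = None
    params = data.get("InputParameters")
    if isinstance(params, str) and params:
        try:
            query = json.loads(params).get("query")
        except (json.JSONDecodeError, AttributeError):
            query = None

    return {
        "tool": data.get("ToolName"),
        "interface": data.get("Interface"),
        "duration_seconds": duration_seconds,
        "failed": bool(data.get("FailureReason")),
        "total_rows": data.get("TotalRows"),
        "query": query,
    }


sample = json.dumps({
    "ToolName": "query_lake",
    "Interface": "IMcpToolTemplate",
    "ExecutionDuration": "2.4731712",
    "FailureReason": "",
    "TablesRead": "SigninLogs",
    "TotalRows": 100,
    "InputParameters": json.dumps({
        "query": "SigninLogs | take 10",
        "workspaceId": "00000000-0000-0000-0000-000000000000",
    }),
})
event = parse_tool_event(sample)
```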

Collect:

  • Execute Query 10 to get Data Lake MCP access pattern summary (tool/table/workspace inventory with MCP vs Direct KQL delineation)
  • Execute Query 11 to get tool-level breakdown with call counts and avg execution duration
  • Execute Query 12 to get error analysis for failed Data Lake MCP tool calls

Phase 4: Azure MCP Server Authentication & Queries

Data sources: SigninLogs, AADNonInteractiveUserSignInLogs, LAQueryLogs
Filter: AppId = 04b07795-8ddb-461a-bbee-02f9e1bf7b46 (sign-in logs, LAQueryLogs)

Collect:

  • Execute Query 13 to get Azure MCP Server authentication events from SigninLogs/AADNonInteractiveUserSignInLogs — filter by AppId 04b07795 (Azure CLI credential, field-tested Feb 2026). 🔄 Previously documented as AppId 1950a258 (AzurePowerShellCredential) — that path is obsolete.
  • Execute Query 14 to get Azure MCP Server workspace queries from LAQueryLogs — filter by AADClientId 04b07795. RequestClientApp is empty (not a unique fingerprint). Azure MCP appends \n| limit N to query text — use query text pattern as differentiator.

Detection Method (🔄 Updated Feb 2026):

The Azure MCP Server runs as a local .NET process (stdio mode) and authenticates via DefaultAzureCredential. Field-tested Feb 2026: The credential chain now resolves to Azure CLI credential (04b07795-8ddb-461a-bbee-02f9e1bf7b46), NOT AzurePowerShellCredential (1950a258) as previously documented.

Previous fingerprint (OBSOLETE): AppId 1950a258 + RequestClientApp = csharpsdk,LogAnalyticsPSClient. Only 1 occurrence found in 30-day lookback. The Azure MCP Server SDK path has changed.

Current fingerprint (field-tested Feb 2026):

Signal Azure MCP Server (Current) Azure CLI (Manual) Notes
AppId (SigninLogs) 04b07795 04b07795 Shared — not a unique differentiator
AADClientId (LAQueryLogs) 04b07795 04b07795 Shared
RequestClientApp (LAQueryLogs) Empty ("") Empty ("") Shared — not a unique differentiator. Empty RequestClientApp is also used by 4+ other AADClientIds
Query text pattern (LAQueryLogs) Appends \n| limit N to all queries No standard suffix Best differentiator — Azure MCP monitor_workspace_log_query always appends a limit operator
AzureActivity (Claims.appid) 04b07795 (write ops only) 04b07795 Shared; read ops not logged. Use Q14 HasLimitSuffix for query-level differentiation

🚨 Key change from previous documentation:

  • ❌ RequestClientApp = "csharpsdk,LogAnalyticsPSClient": OBSOLETE, no longer produced by Azure MCP Server
  • ❌ AppId 1950a258 (AzurePowerShellCredential) — OBSOLETE credential path
  • ✅ AppId 04b07795 (Azure CLI) — current credential path
  • RequestClientApp is empty — shared with Azure CLI and other tools
  • ✅ Query text containing \n| limit — most reliable query-level differentiator

Disambiguation challenges:

  • Azure MCP Server queries are difficult to isolate from manual Azure CLI queries in LAQueryLogs because both share the same AppId AND empty RequestClientApp
  • The \n| limit N suffix appended by monitor_workspace_log_query is the best heuristic but is not guaranteed to be unique
  • In SigninLogs, UserAgent containing azsdk-net-Identity with OS Microsoft Windows may still help if the credential chain includes Azure Identity SDK components
  • Consider correlating query timing with known MCP session activity for attribution
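
The query text heuristic can be sketched as a regular expression check on exported LAQueryLogs query text. This is an offline analogue and not the pre-authored Query 14; the pattern assumes the suffix is a newline, a pipe, `limit`, and an integer at the very end of the text, and `has_limit_suffix` is a hypothetical helper name.

```python
import re

# Newline, pipe, "limit", an integer, then optional trailing whitespace.
LIMIT_SUFFIX = re.compile(r"\n\|\s*limit\s+\d+\s*$", re.IGNORECASE)


def has_limit_suffix(query_text: str) -> bool:
    """True when the query text ends with an appended '| limit N' line."""
    return LIMIT_SUFFIX.search(query_text) is not None
```

A manually written query can also end in `| limit N` on its own line, so a match remains a heuristic signal to combine with session timing, not proof of Azure MCP origin.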

Authentication Sequence Observed (Current):

  1. Azure MCP Server acquires token via Azure CLI cached credential
  2. Token is reused for subsequent operations within its lifetime
  3. If MFA claim is missing → interactive browser prompt (rare with CLI credential)
  4. Subsequent calls reuse the cached token until expiry

🔴 Token Caching Behavior (Field-Tested Feb 2026):

  • Sign-in events appear at token acquisition time, NOT at each individual API call time
  • Once a token is cached, subsequent Azure MCP calls (list resources, get configs, etc.) do NOT generate new sign-in events
  • You will see 1-3 sign-in events per token lifecycle, not one per API call
  • To count actual API calls, correlate with AzureActivity (write ops) or LAQueryLogs (monitor_workspace_log_query calls)
  • The ~1hr token lifetime means at most ~24 sign-in event clusters per day of continuous use
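The arithmetic behind these bounds can be sketched as follows. This is a minimal illustration, assuming the ~1 hour token lifetime and the 1-3 sign-in events per token lifecycle stated above; the function names are hypothetical and not part of the skill.

```python
import math


def max_signin_clusters(hours_of_use: float, token_lifetime_hours: float = 1.0) -> int:
    """Upper bound on token acquisitions (sign-in event clusters) for a period of continuous use."""
    if hours_of_use <= 0:
        return 0
    return math.ceil(hours_of_use / token_lifetime_hours)


def signin_event_range(clusters: int, events_per_cluster=(1, 3)):
    """Expected (low, high) number of sign-in events, given 1-3 events per token lifecycle."""
    low, high = events_per_cluster
    return clusters * low, clusters * high


clusters = max_signin_clusters(24)
print(clusters, signin_event_range(clusters))  # 24 (24, 72)
```

A daily sign-in count far above the upper bound for one user suggests multiple sessions or hosts rather than a high API call volume, since API calls on a cached token do not produce sign-in events.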

AzureActivity visibility: Only ARM write/action/delete operations appear in AzureActivity (Administrative category). Azure MCP Server read-only operations (list subscriptions, list resource groups, list clusters) do NOT appear. Claims.appid = 04b07795 when write operations do occur.

Note: Azure MCP Server is difficult to isolate from manual Azure CLI usage because both share the same AppId and both produce an empty RequestClientApp. The \n| limit N query text suffix is the best heuristic for LAQueryLogs. In SigninLogs, the shared AppId means Azure MCP authenticates as Azure CLI, so there is no unique sign-in fingerprint. Present findings as "Azure MCP Server / Azure CLI (shared AppId 04b07795)" in reports.

Phase 5: Workspace Query Governance

Data source: LAQueryLogs (Analytics tier), CloudAppEvents (Data Lake tier)
Filter: All AADClientIds (LAQueryLogs), All Sentinel operations (CloudAppEvents)

Collect:

  • Execute Query 8 to get all clients querying the Analytics tier workspace with query counts, user counts, CPU usage
  • Data Lake tier query volume from Phase 3 results (Queries 10-12)
  • MCP proportion calculation: combined MCP query volume (Analytics + Data Lake tiers) / total query volume
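The proportion calculation can be sketched like this. It is a minimal example, assuming the four counts have already been extracted from Query 8 and Queries 10-12; the function and parameter names are hypothetical, and which Analytics-tier clients count as "MCP" is a judgment the analyst has to make (the AH backend surface is shared with non-MCP callers).

```python
def mcp_proportion(analytics_mcp: int, datalake_mcp: int,
                   analytics_total: int, datalake_total: int) -> float:
    """Percentage of all workspace queries (both tiers) attributed to MCP servers."""
    total = analytics_total + datalake_total
    if total == 0:
        return 0.0  # avoid division by zero when no queries were logged
    return round(100.0 * (analytics_mcp + datalake_mcp) / total, 1)


# Example with made-up counts: 200 MCP queries out of 1250 total
print(mcp_proportion(120, 80, 1000, 250))  # 16.0
```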

Phase 6: Cross-Server User Analysis

Data sources: MicrosoftGraphActivityLogs, CloudAppEvents, SigninLogs, AADNonInteractiveUserSignInLogs

Collect:

  • Execute Query 9 to get Graph MCP caller attribution — User vs SPN breakdown
  • Execute Query 15 to get top MCP users ranked by cross-server breadth — identifies which users span the most MCP servers and their total call volume

Note: Query 15 joins user activity across all 4 MCP channels (Graph MCP, Triage MCP, Data Lake MCP, Azure CLI/MCP) and resolves UserIds to UPNs via SigninLogs. Data Lake MCP attribution uses InterfaceNotProvided proxy signal when RecordType 403 is unavailable.
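To make "cross-server breadth" concrete, the sketch below ranks users from per-user, per-server call counts. It is illustrative only: the ordering (distinct servers first, then total calls) is an assumption based on the description above, and the sample users and counts are invented.

```python
from collections import defaultdict


def rank_by_breadth(rows):
    """rows: iterable of (user, server, calls). Returns [(user, distinct_servers, total_calls)]."""
    servers = defaultdict(set)
    calls = defaultdict(int)
    for user, server, n in rows:
        servers[user].add(server)
        calls[user] += n
    # Broadest footprint first; ties broken by total call volume, then by name
    ranked = sorted(servers, key=lambda u: (-len(servers[u]), -calls[u], u))
    return [(u, len(servers[u]), calls[u]) for u in ranked]


sample = [
    ("alice", "Graph MCP", 40), ("alice", "Triage MCP", 10),
    ("bob", "Graph MCP", 500),
    ("carol", "Graph MCP", 5), ("carol", "Data Lake MCP", 5), ("carol", "Azure CLI/MCP", 1),
]
print(rank_by_breadth(sample))
# [('carol', 3, 11), ('alice', 2, 50), ('bob', 1, 500)]
```

Note that a low-volume user spanning three servers outranks a high-volume user on a single server under this ordering.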

Phase 7: Score Computation & Report Generation

  1. Compute per-dimension scores from Phase 1-6 data:
    • User Diversity: Count distinct users across all MCP channels (use Query 15 cross-server results)
    • Endpoint Sensitivity: % of Graph MCP calls to sensitive patterns (Phase 1 Query 2 IsSensitive column)
    • Error Rate: % of non-2xx responses across all MCP channels
    • Volume Anomaly: Compare most recent day vs rolling average (Phase 1 Query 1 daily data)
    • Off-Hours Activity: % of MCP calls outside 08:00-18:00 (Phase 1 Query 2 OffHoursCalls column)
  2. Sum dimension scores for composite MCP Usage Score
  3. Include Top MCP Users table in report (Phase 6 — Query 15 cross-server results)
  4. Generate security assessment with emoji-coded findings
  5. Render output in the user's selected mode
  6. Validate report completeness — after composing the report, run the Report Completeness Checklist below. Cross-check every required section against the template before saving/presenting. Fix any missing sections before finalizing.
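The raw inputs to several of these dimensions are simple ratios. The sketch below computes them from already-collected counts; it deliberately stops short of mapping them to scores, because the scoring thresholds are not shown in this section. The helper names are hypothetical, and treating the "rolling average" as the mean of all days before the most recent one is an assumption.

```python
from statistics import mean


def pct(part: float, whole: float) -> float:
    """Percentage rounded to one decimal; 0.0 when there is no activity."""
    return round(100.0 * part / whole, 1) if whole else 0.0


def volume_anomaly_ratio(daily_calls):
    """Most recent day's calls divided by the mean of the preceding days."""
    if len(daily_calls) < 2:
        return None  # not enough history for a baseline
    baseline = mean(daily_calls[:-1])
    return round(daily_calls[-1] / baseline, 2) if baseline else None


print(pct(15, 300))                              # error rate: 5.0
print(pct(45, 300))                              # off-hours share: 15.0
print(volume_anomaly_ratio([100, 120, 80, 300])) # 3.0
```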

Sample KQL Queries

🔴 MANDATORY: Execute these queries EXACTLY as written. Substitute only the time range parameter (e.g., ago(30d) → ago(90d)) and entity-specific values where indicated. These queries are schema-verified and encode mitigations for pitfalls documented in Known Pitfalls. Rewriting, paraphrasing, or constructing "equivalent" queries from scratch risks hitting the exact schema issues these queries were designed to avoid.

| Action | Status |
|---|---|
| Rewriting a pre-authored query from scratch | PROHIBITED |
| Removing parse_json() / tostring() wrappers from queries | PROHIBITED |
| Substituting column names without schema verification | PROHIBITED |
| Using has instead of contains for CamelCase fields | PROHIBITED |
| Executing a query not from this section without completing the Pre-Flight Checklist | PROHIBITED |

Query 1: Unified Daily MCP Activity Trend

Note: Consolidates former Q1 (Graph MCP daily), Q7d (Triage MCP daily), Q23 (Data Lake MCP daily), Q25a (Azure MCP daily) into a single union query. Feeds: SVG dashboard Row 5 line chart (daily_mcp_trend) — all 4 series in one query.
Tool: mcp_sentinel-data_query_lake (union of SigninLogs + AADNonInteractiveUserSignInLogs fails in AH when AADNonInteractiveUserSignInLogs is on Data Lake tier — common in customer environments).
⚠️ Timestamp: All tables use TimeGenerated in Data Lake (unlike AH where CloudAppEvents uses Timestamp).

// Unified Daily MCP Activity Trend — all 4 MCP servers in one pass
// Configurable: replace 30d with desired lookback (max 30d for AH)
let lookback = 30d;
// --- Graph MCP (AppId e8c77dc2) ---
let graph_mcp = MicrosoftGraphActivityLogs
| where TimeGenerated >= ago(lookback)
| where AppId == "e8c77dc2-69b3-43f4-bc51-3213c9d915b4"
| summarize Calls = count(),
    Errors = countif(ResponseStatusCode >= 400)
    by Day = bin(TimeGenerated, 1d)
| extend Server = "Graph MCP";
// --- Triage MCP (AppId 7b7b3966) ---
let triage_mcp = MicrosoftGraphActivityLogs
| where TimeGenerated >= ago(lookback)
| where AppId == "7b7b3966-1961-47b5-b080-43ca5482e21c"
| summarize Calls = count(),
    Errors = countif(ResponseStatusCode >= 400)
    by Day = bin(TimeGenerated, 1d)
| extend Server = "Triage MCP";
// --- Data Lake MCP (CloudAppEvents RecordType 379 + InterfaceNotProvided) ---
let data_lake_mcp = CloudAppEvents
| where TimeGenerated >= ago(lookback)
| where ActionType contains "Sentinel" or ActionType contains "KQL"
| extend RawData = parse_json(tostring(RawEventData))
| extend RecordType = toint(RawData.RecordType),
    Interface = tostring(RawData.Interface),
    FailureReason = tostring(RawData.FailureReason)
| where RecordType == 379 and (Interface == "InterfaceNotProvided" or isempty(Interface))
| summarize Calls = count(),
    Errors = countif(isnotempty(FailureReason) and FailureReason != "")
    by Day = bin(TimeGenerated, 1d)
| extend Server = "Data Lake MCP";
// --- Azure MCP/CLI (AppId 04b07795 — shared with Azure CLI) ---
let azure_interactive = SigninLogs
| where TimeGenerated >= ago(lookback)
| where AppId == "04b07795-8ddb-461a-bbee-02f9e1bf7b46"
| project TimeGenerated, ResultType;
let azure_noninteractive = AADNonInteractiveUserSignInLogs
| where TimeGenerated >= ago(lookback)
| where AppId == "04b07795-8ddb-461a-bbee-02f9e1bf7b46"
| project TimeGenerated, ResultType;
let azure_mcp = union azure_interactive, azure_noninteractive
| summarize Calls = count(),
    Errors = countif(ResultType != "0" and ResultType != "")
    by Day = bin(TimeGenerated, 1d)
| extend Server = "Azure MCP/CLI";
// --- Union all servers ---
union graph_mcp, triage_mcp, data_lake_mcp, azure_mcp
| extend ErrorRate = iff(Calls > 0, round(100.0 * Errors / Calls, 1), 0.0)
| project Server, Day, Calls, Errors, ErrorRate
| order by Day asc, Server asc

Query 2: Graph MCP — Endpoint & Activity Summary

Replaces: former Q2 (Top Endpoints), Q3 (Sensitive API Access), Q11 (Off-Hours Activity).
Tool: RunAdvancedHuntingQuery
Report derivation: Top endpoints = all rows by CallCount desc. Sensitive endpoints = where IsSensitive. Off-hours % = sum(OffHoursCalls) / sum(CallCount) across all rows.

// Graph MCP — single-pass endpoint analysis with sensitivity + off-hours enrichment
let sensitive_patterns = dynamic([
    "roleManagement", "roleAssignments", "roleEligibility",
    "authentication/methods", "identityProtection", "riskyUsers",
    "riskDetections", "conditionalAccess", "servicePrincipals",
    "appRoleAssignments", "oauth2PermissionGrants",
    "auditLogs", "directoryRoles", "privilegedAccess",
    "security/alerts", "security/incidents"
]);
MicrosoftGraphActivityLogs
| where TimeGenerated >= ago(30d)
| where AppId == "e8c77dc2-69b3-43f4-bc51-3213c9d915b4"
| extend Endpoint = tostring(split(RequestUri, "?")[0])
| extend HourOfDay = datetime_part("hour", TimeGenerated)
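| extend DayOfWeek = dayofweek(TimeGenerated) / 1d  // dayofweek() counts days since Sunday: 0 = Sunday, 6 = Saturday
| extend IsOffHours = HourOfDay < 8 or HourOfDay >= 18 or DayOfWeek == 0 or DayOfWeek == 6  // hours are UTC; weekend = Sat/Sun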
| extend IsSensitive = RequestUri has_any (sensitive_patterns)
| summarize 
    CallCount = count(),
    DistinctUsers = dcount(UserId),
    ErrorCount = countif(ResponseStatusCode >= 400),
    AvgDurationMs = round(avg(DurationMs), 0),
    OffHoursCalls = countif(IsOffHours),
    Methods = make_set(RequestMethod, 5),
    Users = make_set(UserId, 10),
    LastUsed = max(TimeGenerated)
    by Endpoint, IsSensitive
| extend 
    ErrorRate = round(100.0 * ErrorCount / CallCount, 1),
    OffHoursPct = round(100.0 * OffHoursCalls / CallCount, 1)
| order by CallCount desc
| take 50
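The report derivation described for this query can be sketched as a small post-processing step over the result rows. This is a minimal example, assuming each row has been loaded as a dict keyed by the query's column names; the function name and the sample rows are hypothetical.

```python
def summarize_query2(rows):
    """Derive the three report inputs from Query 2 result rows."""
    total_calls = sum(r["CallCount"] for r in rows)
    off_hours = sum(r["OffHoursCalls"] for r in rows)
    by_volume = sorted(rows, key=lambda r: r["CallCount"], reverse=True)
    return {
        "top_endpoints": [r["Endpoint"] for r in by_volume],
        "sensitive_endpoints": [r["Endpoint"] for r in by_volume if r["IsSensitive"]],
        # Off-hours % = sum(OffHoursCalls) / sum(CallCount) across all rows
        "off_hours_pct": round(100.0 * off_hours / total_calls, 1) if total_calls else 0.0,
    }


rows = [
    {"Endpoint": "/v1.0/users", "IsSensitive": False, "CallCount": 200, "OffHoursCalls": 20},
    {"Endpoint": "/v1.0/identityProtection/riskyUsers", "IsSensitive": True, "CallCount": 50, "OffHoursCalls": 25},
    {"Endpoint": "/v1.0/groups", "IsSensitive": False, "CallCount": 100, "OffHoursCalls": 5},
]
print(summarize_query2(rows)["off_hours_pct"])  # 14.3
```

Because the query ends with take 50, the off-hours percentage derived this way covers only the 50 highest-volume endpoint rows.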

Query 3: Sentinel MCP — Authentication Events

Tool: RunAdvancedHuntingQuery (30-day lookback, free for Analytics-tier tables). Fall back to mcp_sentinel-data_query_lake only if lookback > 30 days.
⚠️ Pitfall-aware: Uses parse_json(Status) and parse_json(DeviceDetail) wrappers — required for Data Lake (string columns) and safe in AH. Uses = syntax (not as) in project — see project as Keyword Fails in Advanced Hunting.

// Who is authenticating to Sentinel MCP (via VS Code, Copilot Studio, browser)
SigninLogs
| where TimeGenerated >= ago(30d)
| where ResourceDisplayName =~ "Sentinel Platform Services"
| project TimeGenerated, UserPrincipalName, AppDisplayName, AppId,
    ResourceDisplayName, IPAddress, 
    ErrorCode = tostring(parse_json(Status).errorCode),
    ConditionalAccessStatus, AuthenticationRequirement, ClientAppUsed,
    OS = tostring(parse_json(DeviceDetail).operatingSystem),
    Country = tostring(parse_json(LocationDetails).countryOrRegion)
| order by TimeGenerated desc

Query 4: Sentinel MCP — Client App Breakdown

Tool: RunAdvancedHuntingQuery (30-day lookback, free for Analytics-tier tables).

// Which client apps (VS Code, Copilot Studio, browser) are accessing Sentinel MCP
SigninLogs
| where TimeGenerated >= ago(30d)
| where ResourceDisplayName =~ "Sentinel Platform Services"
| summarize 
    SignInCount = count(),
    DistinctUsers = dcount(UserPrincipalName),
    Users = make_set(UserPrincipalName, 10),
    LastSeen = max(TimeGenerated)
    by AppDisplayName, AppId, ClientAppUsed
| order by SignInCount desc

Query 5: Sentinel Triage MCP — API Call Activity (Dedicated AppId)

// Measure Sentinel Triage MCP API calls via its dedicated AppId in MicrosoftGraphActivityLogs.
// AppId 7b7b3966 = "Microsoft Defender Mcp" — the Triage MCP server's own identity.
// This gives DEFINITIVE attribution of Triage MCP calls — no shared-surface estimation needed.
//
// Confirmed Feb 2026: AppId 7b7b3966 appears in MicrosoftGraphActivityLogs with delegated
// auth (certificate), full UserId attribution, and scopes SecurityAlert.Read.All,
// SecurityIncident.Read.All, ThreatHunting.Read.All.
//
// Known API endpoints:
//   - POST /v1.0/security/runHuntingQuery/ (Advanced Hunting)
//   - GET  /security/incidents/ (ListIncidents, GetIncidentById)
//   - GET  /security/alerts_v2/ (ListAlerts, GetAlertById)
let triage_mcp_appid = "7b7b3966-1961-47b5-b080-43ca5482e21c";
MicrosoftGraphActivityLogs
| where TimeGenerated >= ago(30d)
| where AppId == triage_mcp_appid
| extend Endpoint = extract(@"/v\d\.\d/(.+?)(\?|$)", 1, RequestUri)
| summarize 
    Calls = count(),
    DistinctUsers = dcount(UserId),
    Users = make_set(UserId, 10),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated)
    by RequestMethod, Endpoint
| order by Calls desc
| take 25

Query 6: Sentinel Triage MCP — Authentication Events (SigninLogs)

Tool: mcp_sentinel-data_query_lake (union of SigninLogs + AADNonInteractiveUserSignInLogs fails in AH when AADNonInteractiveUserSignInLogs is on Data Lake tier — common in customer environments).
⚠️ Pitfall-aware: Uses parse_json() wrappers on DeviceDetail/LocationDetails — required for Data Lake (string columns). Uses = syntax (not as) in project.

// Triage MCP authentication events from SigninLogs + AADNonInteractiveUserSignInLogs.
// AppId 7b7b3966 = "Microsoft Defender Mcp" — delegated auth with certificate.
// Uses parse_json() wrappers for DeviceDetail/LocationDetails (safe in both AH and Data Lake).
let triage_mcp_appid = "7b7b3966-1961-47b5-b080-43ca5482e21c";
let signinlogs_interactive = SigninLogs
| where TimeGenerated >= ago(30d)
| where AppId == triage_mcp_appid
| extend SignInType = "Interactive"
| project TimeGenerated, UserPrincipalName, AppDisplayName, AppId,
    ResourceDisplayName, IPAddress,
    ResultType = tostring(ResultType),
    ResultDescription = tostring(ResultDescription),
    SignInType,
    OS = tostring(parse_json(DeviceDetail).operatingSystem),
    Browser = tostring(parse_json(DeviceDetail).browser),
    Country = tostring(parse_json(LocationDetails).countryOrRegion),
    City = tostring(parse_json(LocationDetails).city);
let signinlogs_noninteractive = AADNonInteractiveUserSignInLogs
| where TimeGenerated >= ago(30d)
| where AppId == triage_mcp_appid
| extend SignInType = "NonInteractive"
| project TimeGenerated, UserPrincipalName, AppDisplayName, AppId,
    ResourceDisplayName, IPAddress,
    ResultType = tostring(ResultType),
    ResultDescription = tostring(ResultDescription),
    SignInType,
    OS = tostring(parse_json(DeviceDetail).operatingSystem),
    Browser = tostring(parse_json(DeviceDetail).browser),
    Country = tostring(parse_json(LocationDetails).countryOrRegion),
    City = tostring(parse_json(LocationDetails).city);
union signinlogs_interactive, signinlogs_noninteractive
| summarize
    SignIns = count(),
    DistinctUsers = dcount(UserPrincipalName),
    Users = make_set(UserPrincipalName, 10),
    IPs = make_set(IPAddress, 10),
    Countries = make_set(Country, 10),
    LastSeen = max(TimeGenerated)
    by AppDisplayName, SignInType, ResourceDisplayName
| order by SignIns desc

Query 7: LAQueryLogs — Advanced Hunting Downstream Queries (Supplementary Signal)

// SUPPLEMENTARY detection: Advanced Hunting queries (from Triage MCP, Defender portal,
// Security Copilot, or any RunAdvancedHuntingQuery consumer) that hit connected
// Log Analytics workspace tables.
//
// AH downstream queries appear under fc780465 (Sentinel Engine) with
// RequestClientApp "M365D_AdvancedHunting" — full user attribution (AADEmail populated).
//
// This is a DOWNSTREAM signal — it only fires when RunAdvancedHuntingQuery targets
// Sentinel-connected LA tables (SigninLogs, AuditLogs, SecurityAlert, etc.).
// Queries hitting XDR-native tables (DeviceEvents, EmailEvents, etc.) stay in the
// Defender XDR backend and never appear here.
//
// Use alongside Query 5 (MicrosoftGraphActivityLogs) for complete Triage MCP coverage:
//   - Query 5 = PRIMARY: Triage MCP API calls filtered by dedicated AppId 7b7b3966
//   - Query 7 = SUPPLEMENTARY: downstream query execution when AH hits LA tables
//
// ATTRIBUTION LIMITATION: Cannot distinguish Triage MCP AH queries from Defender portal
// AH queries or Security Copilot AH queries — all appear as M365D_AdvancedHunting.
LAQueryLogs
| where TimeGenerated >= ago(30d)
| where AADClientId == "fc780465-2017-40d4-a0c5-307022471b92" and RequestClientApp == "M365D_AdvancedHunting"
| summarize 
    QueryCount = count(),
    DistinctUsers = dcount(AADEmail),
    Users = make_set(AADEmail, 10),
    AvgCPUMs = avg(StatsCPUTimeMs),
    TotalRowsReturned = sum(ResponseRowCount),
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated)
    by AADClientId, RequestClientApp
| order by QueryCount desc

Query 8: All Workspace Query Sources — Complete Governance View

// Every client querying the workspace — MCP and non-MCP combined
LAQueryLogs
| where TimeGenerated >= ago(30d)
| summarize 
    QueryCount = count(),
    DistinctUsers = dcount(AADEmail),
    AvgCPUMs = avg(StatsCPUTimeMs),
    TotalRowsReturned = sum(ResponseRowCount)
    by AADClientId
| order by QueryCount desc

Query 9: Graph MCP — Caller Attribution (User vs SPN)

// Attribute Graph MCP calls to User, Service Principal, or SPN subtype
// Key: UserId populated = delegated (user), ServicePrincipalId populated = app-only (SPN)
// ClientAuthMethod: 0 = public client (user), 1 = client secret (SPN), 2 = certificate (SPN)
MicrosoftGraphActivityLogs
| where TimeGenerated >= ago(30d)
| where AppId == "e8c77dc2-69b3-43f4-bc51-3213c9d915b4"
| extend CallerType = case(
    isnotempty(ServicePrincipalId) and isempty(UserId), "ServicePrincipal/Agent (App-Only)",
    isnotempty(UserId) and isnotempty(ServicePrincipalId), "Delegated (User+SPN/Agent OBO)",
    isnotempty(UserId) and isempty(ServicePrincipalId), "User (Delegated)",
    "Unknown")
| extend AuthMethod = case(
    ClientAuthMethod == 0, "Public Client",
    ClientAuthMethod == 1, "Client Secret",
    ClientAuthMethod == 2, "Client Certificate",
    "Unknown")
| summarize
    CallCount = count(),
    DistinctEndpoints = dcount(tostring(split(RequestUri, "?")[0])),
    SuccessRate = round(100.0 * countif(ResponseStatusCode >= 200 and ResponseStatusCode < 300) / count(), 1),
    SampleEndpoints = make_set(tostring(split(RequestUri, "?")[0]), 5),
    IPs = make_set(IPAddress, 5)
    by CallerType, AuthMethod, UserId, ServicePrincipalId
| order by CallCount desc

Post-processing: For any rows where CallerType = "ServicePrincipal/Agent (App-Only)", cross-reference the ServicePrincipalId with Entra via Graph API:

  1. Primary method (most reliable): Query /beta/servicePrincipals/{id}?$select=id,appId,displayName,servicePrincipalType,tags — check tags array for agentic indicators:
    • AgenticApp — confirms this is an agent application
    • AIAgentBuilder — agent was created by an AI agent builder platform
    • AgentCreatedBy:CopilotStudio — specifically created by Copilot Studio
    • AgenticInstance — runtime instance of an agent
    • power-virtual-agents-* — Copilot Studio internal tracking tag
  2. Fallback: Check servicePrincipalType — if it equals "Agent", it is a registered Agent Identity. Note: as of Feb 2026, Copilot Studio agents still show "Application" here despite being true agents.
  3. Name-based filtering is UNRELIABLE — SPNs with "Agent" in display name may be standard app registrations (e.g., "Contoso Agent Tools" = GitCreatedApp).

Use microsoft_graph_suggest_queries → microsoft_graph_get for the Graph API calls. Query multiple SPNs in one call: /beta/servicePrincipals?$count=true&$filter=id in ('id1','id2')&$select=id,appId,displayName,servicePrincipalType,tags.
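The lookup and classification order above can be sketched as follows. The tag names and the request path come from this section; the function names, the returned labels, and the input dict shape (a servicePrincipal object as returned by Graph) are assumptions made for illustration.

```python
AGENTIC_TAGS = {"AgenticApp", "AIAgentBuilder", "AgentCreatedBy:CopilotStudio", "AgenticInstance"}


def build_spn_lookup_path(sp_ids):
    """Build the batched /beta/servicePrincipals request path for several object IDs."""
    id_list = ",".join(f"'{i}'" for i in sp_ids)
    return ("/beta/servicePrincipals?$count=true"
            f"&$filter=id in ({id_list})"
            "&$select=id,appId,displayName,servicePrincipalType,tags")


def classify_spn(sp):
    """Tags first (most reliable), then servicePrincipalType; display name is never used."""
    tags = sp.get("tags") or []
    hits = sorted(t for t in tags
                  if t in AGENTIC_TAGS or t.startswith("power-virtual-agents-"))
    if hits:
        return "agent (tags)", hits
    if sp.get("servicePrincipalType") == "Agent":
        return "agent (servicePrincipalType)", []
    return "not confirmed as agent", []


print(build_spn_lookup_path(["id1", "id2"]))
print(classify_spn({"displayName": "Contoso Agent Tools",
                    "servicePrincipalType": "Application",
                    "tags": ["GitCreatedApp"]}))
```

The second call returns "not confirmed as agent" even though the display name contains "Agent", which is the name-based false positive the third rule warns about.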

Query 10: Data Lake MCP — Access Pattern Summary

Note: Consolidates former Q20 (Tool Usage Summary) + Q24 (MCP vs Direct KQL Delineation) into a single query.
Tool: RunAdvancedHuntingQuery (uses Timestamp for CloudAppEvents).
⚠️ Pitfall-aware: Uses contains (not has) for ActionType/Operation — see CloudAppEvents CamelCase Matching. Uses parse_json(tostring(RawEventData)) — see CloudAppEvents RawEventData Parsing. Filters on SentinelAIToolRunCompleted only — see CloudAppEvents Double-Counting Prevention.

// Data Lake MCP — single-pass access pattern delineation + tool/table/workspace inventory
// Combines former Q20 (summary) and Q24 (delineation) into one query
CloudAppEvents
| where Timestamp >= ago(30d)
| where ActionType contains "Sentinel" or ActionType contains "KQL"
| extend RawData = parse_json(tostring(RawEventData))
| extend 
    Operation = tostring(RawData.Operation),
    RecordType = toint(RawData.RecordType),
    ToolName = tostring(RawData.ToolName),
    Interface = tostring(RawData.Interface),
    ExecutionDuration = todouble(RawData.ExecutionDuration),
    FailureReason = tostring(RawData.FailureReason),
    TablesRead = tostring(RawData.TablesRead),
    DatabasesRead = tostring(RawData.DatabasesRead),
    TotalRows = toint(RawData.TotalRows),
    UserId_raw = tostring(RawData.UserId),
    InputParams = tostring(RawData.InputParameters)
| extend 
    AccessPattern = case(
        RecordType == 403 and Interface == "IMcpToolTemplate", "MCP Server-Driven",
        RecordType == 379 and (Interface == "InterfaceNotProvided" or isempty(Interface)), "MCP-Driven (Probable)",
        RecordType == 379 and Interface has "msglakeexplorer", "Portal (Data Lake Explorer)",
        RecordType == 379 and Interface has "msgjobmanagement", "Scheduled Jobs",
        RecordType == 379, "Other Direct KQL",
        "Other"),
    IsSuccess = isempty(FailureReason) or FailureReason == "",
    HasKQLQuery = InputParams has "query"
| where Operation contains "Completed" or RecordType == 379  // 'contains' not 'has' — CamelCase
| summarize
    TotalCalls = count(),
    SuccessCount = countif(IsSuccess),
    FailureCount = countif(not(IsSuccess)),
    DistinctTools = dcount(ToolName),
    Tools = make_set(ToolName, 20),
    DistinctTables = dcount(TablesRead),
    Tables = make_set(TablesRead, 30),
    Workspaces = make_set(DatabasesRead, 5),
    AvgDurationSec = round(avg(ExecutionDuration), 2),
    TotalRowsReturned = sum(TotalRows),
    DistinctUsers = dcount(UserId_raw),
    Users = make_set(UserId_raw, 10),
    KQLQueryCount = countif(HasKQLQuery),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp)
    by AccessPattern
| extend ErrorRate = round(100.0 * FailureCount / TotalCalls, 1)
| order by TotalCalls desc

Post-processing for Query 10:

  • If MCP Server-Driven (RecordType 403) has results → use it directly as the definitive MCP count.
  • If MCP Server-Driven returns 0 rows but MCP-Driven (Probable) has results → report the probable count with the audit gap caveat. Cross-reference users with Q4/Q6 SigninLogs to validate.
  • Portal (Data Lake Explorer) = msglakeexplorer@msec-msg interface, Scheduled Jobs = msgjobmanagement@msec-msg.
  • Combine with Query 8 (Analytics tier LAQueryLogs — all workspace sources) for a complete two-tier governance view:
| Tier | Data Source | MCP Sources | Non-MCP Sources |
|---|---|---|---|
| Analytics Tier | LAQueryLogs | AH backend fc780465 / M365D_AdvancedHunting (captures AH queries from Triage MCP, Defender portal, Security Copilot that hit connected LA tables; shared surface, see Query 7) | Sentinel Portal (80ccca67), Sentinel Engine analytics (fc780465, non-AH), Logic Apps (de8c33bb) |
| Data Lake Tier | CloudAppEvents | Data Lake MCP (RecordType 403, IMcpToolTemplate) | Direct KQL (RecordType 379, KqsService) |
| Graph API | MicrosoftGraphActivityLogs | Graph MCP (e8c77dc2) | |
| Azure MCP | SigninLogs, AADNonInteractiveUserSignInLogs, LAQueryLogs | Azure MCP Server (04b07795, empty RequestClientApp, query text `\n\| limit N` suffix) | |
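The first two post-processing rules for Query 10 amount to a simple precedence check, sketched below. The AccessPattern labels are the ones produced by the query; the function name, the input shape (a mapping of AccessPattern to TotalCalls), and the returned wording are hypothetical.

```python
def data_lake_mcp_count(calls_by_pattern):
    """Pick the Data Lake MCP call count to report, preferring RecordType 403 evidence."""
    definitive = calls_by_pattern.get("MCP Server-Driven", 0)
    probable = calls_by_pattern.get("MCP-Driven (Probable)", 0)
    if definitive > 0:
        return definitive, "definitive (RecordType 403)"
    if probable > 0:
        # RecordType 403 absent: fall back to the InterfaceNotProvided proxy signal
        return probable, "probable (RecordType 379); report with audit gap caveat"
    return 0, "no Data Lake MCP activity observed"


print(data_lake_mcp_count({"MCP-Driven (Probable)": 42, "Portal (Data Lake Explorer)": 310}))
# (42, 'probable (RecordType 379); report with audit gap caveat')
```

When the fallback branch is taken, the users behind the probable count still need to be cross-referenced against the Q4/Q6 sign-in results before the figure goes into a report.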

Query 11: Data Lake MCP — Interface Breakdown

Tool: RunAdvancedHuntingQuery (uses Timestamp for CloudAppEvents).
⚠️ Pitfall-aware: Uses contains/parse_json(tostring()) pattern — see Query 10 pitfall notes. Uses todouble(ExecutionDuration) — see Data Lake MCP ExecutionDuration Format. When RecordType 403 is present, groups by ToolName; when absent, falls back to Interface field.

// Breakdown of Data Lake access by Interface — identifies MCP vs Portal vs Jobs
// PRIMARY: Uses RecordType 403 / ToolName when available (MCP audit events)
// FALLBACK: When RecordType 403 absent, groups by Interface field from RecordType 379
//   - InterfaceNotProvided = probable MCP-driven (cross-ref with Q4/Q6 SigninLogs)
//   - msglakeexplorer@msec-msg = Sentinel Portal Data Lake Explorer
//   - msgjobmanagement@msec-msg = Scheduled/job-based queries
//   - ipykernel_launcher.py = Jupyter Notebook
//   - PowerBIConnector = Power BI
//   - Microsoft.Medeina.Server = Security Copilot
CloudAppEvents
| where Timestamp >= ago(30d)
| where ActionType contains "Sentinel" or ActionType contains "KQL"
| extend RawData = parse_json(tostring(RawEventData))
| extend 
    Operation = tostring(RawData.Operation),
    RecordType = toint(RawData.RecordType),
    ToolName = tostring(RawData.ToolName),
    Interface = tostring(RawData.Interface),
    ExecutionDuration = todouble(RawData.ExecutionDuration),
    FailureReason = tostring(RawData.FailureReason),
    TablesRead = tostring(RawData.TablesRead),
    UserId_raw = tostring(RawData.UserId)
| where Operation contains "Completed" or RecordType == 379
| extend 
    // When RecordType 403 exists, ToolName is the grouping key; otherwise use Interface
    GroupKey = iff(RecordType == 403, coalesce(ToolName, "unknown_tool"), coalesce(Interface, "InterfaceNotProvided")),
    IsSuccess = isempty(FailureReason) or FailureReason == "",
    Source = iff(RecordType == 403, "MCP Tool (RecordType 403)", "Interface (RecordType 379)")
| summarize
    CallCount = count(),
    SuccessCount = countif(IsSuccess),
    FailureCount = countif(not(IsSuccess)),
    AvgDurationSec = round(avg(ExecutionDuration), 2),
    MaxDurationSec = round(max(ExecutionDuration), 2),
    TablesAccessed = make_set(TablesRead, 20),
    DistinctUsers = dcount(UserId_raw),
    Users = make_set(UserId_raw, 10),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp)
    by GroupKey, Source
| extend ErrorRate = round(100.0 * FailureCount / CallCount, 1)
| order by CallCount desc

Query 12: Data Lake MCP — Error Analysis

Tool: RunAdvancedHuntingQuery (uses Timestamp for CloudAppEvents).
⚠️ Pitfall-aware: Uses contains/parse_json(tostring()) pattern — see Query 10 pitfall notes. Now groups errors by both AccessPattern (MCP vs Portal vs Jobs) and ErrorCategory for richer diagnostics.

// Analyze failed Data Lake queries — identify schema errors, permission issues, etc.
// PRIMARY: Filters on ActionType contains "SentinelAITool" (RecordType 403) when available
// FALLBACK: When RecordType 403 absent, analyzes all failed RecordType 379 events grouped by Interface
CloudAppEvents
| where Timestamp >= ago(30d)
| where ActionType contains "Sentinel" or ActionType contains "KQL"
| extend RawData = parse_json(tostring(RawEventData))
| extend 
    Operation = tostring(RawData.Operation),
    RecordType = toint(RawData.RecordType),
    ToolName = tostring(RawData.ToolName),
    Interface = tostring(RawData.Interface),
    FailureReason = tostring(RawData.FailureReason),
    TablesRead = tostring(RawData.TablesRead),
    UserId_raw = tostring(RawData.UserId)
| where Operation contains "Completed" or RecordType == 379
| where isnotempty(FailureReason) and FailureReason != ""
| extend 
    AccessPattern = case(
        RecordType == 403 and Interface == "IMcpToolTemplate", "MCP Server-Driven",
        RecordType == 379 and (Interface == "InterfaceNotProvided" or isempty(Interface)), "MCP-Driven (Probable)",
        RecordType == 379 and Interface has "msglakeexplorer", "Portal (Data Lake Explorer)",
        RecordType == 379 and Interface has "msgjobmanagement", "Scheduled Jobs",
        RecordType == 379, "Other Direct KQL",
        "Other"),
    ErrorCategory = case(
        FailureReason has "SemanticError", "Schema/Semantic Error",
        FailureReason has "SyntaxError", "KQL Syntax Error",
        FailureReason has "Unauthorized" or FailureReason has "403", "Permission Denied",
        FailureReason has "Timeout", "Query Timeout",
        FailureReason has "NotFound", "Table/Resource Not Found",
        "Other Error")
| summarize
    ErrorCount = count(),
    Tools = make_set(ToolName, 10),
    Tables = make_set(TablesRead, 10),
    Users = make_set(UserId_raw, 10),
    SampleErrors = make_set(substring(FailureReason, 0, 150), 5),
    FirstSeen = min(Timestamp),
    LastSeen = max(Timestamp)
    by AccessPattern, ErrorCategory
| order by AccessPattern asc, ErrorCount desc

Query 13: Azure MCP Server — Authentication Events (SigninLogs)

Tool: mcp_sentinel-data_query_lake (90d lookback exceeds AH 30d limit).
⚠️ Pitfall-aware: Uses parse_json(Status)/parse_json(DeviceDetail) wrappers — see SigninLogs Status Field Needs parse_json(). Uses extend SignInType to avoid Type pseudo-column — see Type Column Unavailable in Data Lake Union Contexts.

// Detect Azure MCP Server authentication events via Azure CLI AppId.
//
// 🔄 UPDATED Feb 2026: Azure MCP Server now uses Azure CLI credential (04b07795),
// NOT AzurePowerShellCredential (1950a258) as previously documented.
// The old AppId 1950a258 + UserAgent 'azsdk-net-Identity' fingerprint is OBSOLETE.
//
// ⚠️ SHARED APPID: 04b07795 is the Azure CLI AppId — shared with manual 'az' CLI usage.
// There is NO unique sign-in fingerprint for Azure MCP Server vs manual Azure CLI.
// This query returns ALL Azure CLI sign-ins. Correlate with LAQueryLogs (Query 14)
// for query-level attribution via the '\n| limit N' text pattern.
//
// NOTE: Sign-in events represent TOKEN ACQUISITIONS, not individual API calls.
// A cached token serves many Azure MCP calls with no additional sign-in events.
// FIX (Feb 2026): Explicit tostring() casts on ResultType, ResultDescription,
// ConditionalAccessStatus, AuthenticationRequirement to prevent union type mismatches
// between SigninLogs and AADNonInteractiveUserSignInLogs. Removed ResourceId (inconsistent
// across tables). Use parse_json() wrapper on DeviceDetail and LocationDetails — these
// columns may be stored as string (not dynamic) in Data Lake workspaces, causing
// SemanticError on dot-notation access without parse_json().
let azure_mcp_appid = "04b07795-8ddb-461a-bbee-02f9e1bf7b46";
let signinlogs_interactive = SigninLogs
| where TimeGenerated >= ago(90d)
| where AppId == azure_mcp_appid
| extend SignInType = "Interactive"
| project TimeGenerated, UserPrincipalName, AppDisplayName, AppId,
    ResourceDisplayName, IPAddress, 
    ResultType = tostring(ResultType),
    ResultDescription = tostring(ResultDescription),
    UserAgent, SignInType,
    ConditionalAccessStatus = tostring(ConditionalAccessStatus),
    AuthenticationRequirement = tostring(AuthenticationRequirement),
    OS = tostring(parse_json(DeviceDetail).operatingSystem),
    Country = tostring(parse_json(LocationDetails).countryOrRegion);
let signinlogs_noninteractive = AADNonInteractiveUserSignInLogs
| where TimeGenerated >= ago(90d)
| where AppId == azure_mcp_appid
| extend SignInType = "Non-Interactive"
| project TimeGenerated, UserPrincipalName, AppDisplayName, AppId,
    ResourceDisplayName, IPAddress,
    ResultType = tostring(ResultType),
    ResultDescription = tostring(ResultDescription),
    UserAgent, SignInType,
    ConditionalAccessStatus = tostring(ConditionalAccessStatus),
    AuthenticationRequirement = tostring(AuthenticationRequirement),
    OS = tostring(parse_json(DeviceDetail).operatingSystem),
    Country = tostring(parse_json(LocationDetails).countryOrRegion);
union signinlogs_interactive, signinlogs_noninteractive
| order by TimeGenerated desc

Query 14: Azure MCP Server — Workspace Queries (LAQueryLogs)

Tool: mcp_sentinel-data_query_lake (90d lookback exceeds AH 30d limit).

// Detect Azure MCP Server workspace queries via LAQueryLogs.
//
// 🔄 UPDATED Feb 2026: Azure MCP Server now uses Azure CLI credential (04b07795).
// RequestClientApp is EMPTY (not 'csharpsdk,LogAnalyticsPSClient' as previously documented).
//
// ⚠️ SHARED FINGERPRINT: Empty RequestClientApp + AppId 04b07795 is shared with manual
// Azure CLI and 4+ other AADClientIds. This query returns ALL queries from AppId 04b07795
// with empty RequestClientApp. To isolate Azure MCP Server queries, look for the
// '\n| limit N' suffix that monitor_workspace_log_query always appends to query text.
//
// 30-day pattern analysis (Feb 2026) showed 11 distinct RequestClientApp values:
//   - Empty ("") = 417 queries across 5 AADClientIds (Azure MCP, Sentinel DL MCP, Portal, etc.)
//   - "csharpsdk,LogAnalyticsPSClient" = only 1 query ever (obsolete fingerprint)
//   - "M365D_AdvancedHunting" = Advanced Hunting backend
//   - "ASI_Portal" / "ASI_Portal_Connectors" = Sentinel Portal
//   - Others: AppInsightsPortalExtension, LogicApps, PSClient, etc.
let azure_cli_appid = "04b07795-8ddb-461a-bbee-02f9e1bf7b46";
LAQueryLogs
| where TimeGenerated >= ago(90d)
| where AADClientId == azure_cli_appid
| extend HasLimitSuffix = QueryText has "\n| limit" or QueryText has "\r\n| limit"
| project TimeGenerated, AADEmail, AADClientId,
    RequestClientApp,
    QueryTextTruncated = substring(QueryText, 0, 300),
    ResponseCode, ResponseRowCount,
    StatsCPUTimeMs,
    RequestTarget,
    HasLimitSuffix
| order by TimeGenerated desc

Post-processing: Rows with HasLimitSuffix = true are highly likely Azure MCP Server queries (the monitor_workspace_log_query command always appends | limit N). Rows without the suffix may be manual Azure CLI or other tools using the same credential.
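
The suffix check above can also be applied client-side to exported rows. The following is a minimal sketch, assuming the full QueryText is available (Query 14 only projects the first 300 characters) and assuming the suffix sits at the very end of the text; `classify_query` and the label strings are hypothetical names, not part of the skill.

```python
import re

# Assumption: the suffix is a newline, a pipe, the word "limit", and an integer
# at the very end of the query text. This is stricter than the KQL check above.
LIMIT_SUFFIX = re.compile(r"\r?\n\|\s*limit\s+\d+\s*$", re.IGNORECASE)


def classify_query(query_text):
    """Label a LAQueryLogs row by whether its text ends with the limit suffix."""
    if LIMIT_SUFFIX.search(query_text or ""):
        return "Likely Azure MCP Server"
    return "Azure CLI or other tool (shared AppId)"


# Hypothetical sample rows for illustration only.
rows = [
    {"QueryText": "SigninLogs\n| take 5\n| limit 100"},
    {"QueryText": "SigninLogs | where ResultType != 0 | limit 10"},
]
labels = [classify_query(r["QueryText"]) for r in rows]
```

The result remains a heuristic: a manual query that happens to end with a newline followed by `| limit N` would be labeled the same way.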

Query 15: Top MCP Users — Cross-Server Breadth

Tool: RunAdvancedHuntingQuery (7-day lookback default, all tables on Analytics tier). Purpose: Identifies users with the broadest MCP footprint — ranking by how many distinct MCP server types they use and their total call volume across all channels. Feeds the Top MCP Users report section and SVG dashboard widget.

let lookback = 7d;
let graph_mcp = MicrosoftGraphActivityLogs
| where TimeGenerated > ago(lookback)
| where AppId == "e8c77dc2-69b3-43f4-bc51-3213c9d915b4"
| where isnotempty(UserId)
| summarize Calls = count() by UserId
| project UserId, Server = "Graph MCP", Calls;
let triage_mcp = MicrosoftGraphActivityLogs
| where TimeGenerated > ago(lookback)
| where AppId == "7b7b3966-1961-47b5-b080-43ca5482e21c"
| where isnotempty(UserId)
| summarize Calls = count() by UserId
| project UserId, Server = "Triage MCP", Calls;
let datalake_mcp = CloudAppEvents
| where Timestamp > ago(lookback)
| where ActionType contains "Sentinel" or ActionType contains "KQL"
| extend RawData = parse_json(tostring(RawEventData))
| where tostring(RawData.Interface) == "InterfaceNotProvided" or isempty(tostring(RawData.Interface))
| where isnotempty(AccountObjectId)
| summarize Calls = count() by UserId = AccountObjectId
| project UserId, Server = "Data Lake MCP", Calls;
let azure_mcp = union SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(lookback)
| where AppId == "04b07795-8ddb-461a-bbee-02f9e1bf7b46"
| where isnotempty(UserId)
| summarize Calls = count() by UserId
| project UserId, Server = "Azure CLI/MCP", Calls;
let upn_map = union SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(lookback)
| where isnotempty(UserPrincipalName)
| summarize arg_max(TimeGenerated, UserPrincipalName) by UserId
| project UserId, UPN = UserPrincipalName;
union graph_mcp, triage_mcp, datalake_mcp, azure_mcp
| summarize Servers = make_set(Server), ServerCount = dcount(Server), TotalCalls = sum(Calls) by UserId
| join kind=leftouter upn_map on UserId
| project UPN = coalesce(UPN, UserId), ServerCount, Servers, TotalCalls
| sort by ServerCount desc, TotalCalls desc
| take 25

⚠️ Pitfall-aware:

  • Data Lake MCP leg: Uses ActionType contains (not has) per the CamelCase pitfall. Parses RawEventData once and filters on Interface field for the InterfaceNotProvided proxy signal when RecordType 403 is unavailable (see Phase 3 Known Limitation).
  • Azure CLI/MCP leg: Uses shared AppId 04b07795 — includes both Azure MCP Server and manual az CLI sign-ins. Cannot distinguish at this level.
  • UPN resolution: Joins with SigninLogs and AADNonInteractiveUserSignInLogs to resolve UserId GUIDs to human-readable UPNs. Users with no sign-ins in the lookback window will show their GUID instead.
  • CloudAppEvents timestamp: Uses Timestamp (not TimeGenerated) since this runs via Advanced Hunting.
  • AADNonInteractiveUserSignInLogs tier: If this table is on Data Lake/Basic tier, the union SigninLogs, AADNonInteractiveUserSignInLogs legs may fail in AH. Fall back to mcp_sentinel-data_query_lake if needed (switch Timestamp → TimeGenerated for the CloudAppEvents leg).

Post-processing:

  • Render as a ranked table in the report: | Rank | User (UPN) | Servers Used | MCP Servers | Total Calls |
  • Users spanning 3+ servers represent the broadest MCP adoption — highlight them.
  • Cross-reference top users with the sensitive endpoint data from Q2 to flag users with both breadth AND sensitive access.
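
The post-processing steps above can be sketched as a small renderer. This is an illustrative sketch, assuming Query 15 rows arrive as dictionaries with the projected column names; `render_top_users`, the `(broad)` marker, and the sample users are hypothetical.

```python
def render_top_users(rows, breadth_threshold=3):
    """Render Query 15 rows as a ranked markdown table, flagging broad users."""
    lines = [
        "| Rank | User (UPN) | Servers Used | MCP Servers | Total Calls |",
        "|------|-----------|:------------:|-------------|------------:|",
    ]
    # Same ordering as the query: server breadth first, then call volume.
    ordered = sorted(rows, key=lambda r: (-r["ServerCount"], -r["TotalCalls"]))
    for rank, r in enumerate(ordered, start=1):
        flag = " (broad)" if r["ServerCount"] >= breadth_threshold else ""
        servers = ", ".join(sorted(r["Servers"]))
        lines.append(
            f"| {rank} | {r['UPN']}{flag} | {r['ServerCount']} | "
            f"{servers} | {r['TotalCalls']:,} |"
        )
    return "\n".join(lines)


# Hypothetical sample rows for illustration only.
sample_rows = [
    {"UPN": "bob@contoso.example", "ServerCount": 1,
     "Servers": ["Graph MCP"], "TotalCalls": 5000},
    {"UPN": "alice@contoso.example", "ServerCount": 3,
     "Servers": ["Graph MCP", "Triage MCP", "Data Lake MCP"], "TotalCalls": 1200},
]
table = render_top_users(sample_rows)
```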

Report Template

Inline Chat Report Structure

The inline report MUST include these sections in order:

  1. Header — Workspace, analysis period, data sources checked, MCP servers detected
  2. Executive Summary — 2-3 sentence overview of MCP usage posture
  3. MCP Footprint Summary (SVG-critical: provides consolidated KPIs for dashboard Row 2 + Row 3)
    • Server Landscape table — one row per MCP server with: Server, API Calls, Auth Events, Distinct Users, Error Rate, Status. This table feeds the SVG server_landscape widget directly.
    • Consolidated KPI block — aggregate totals across all servers:
      Total MCP API Calls: <sum of API calls across Graph + Triage + Data Lake + Azure>
      Total Auth Events: <sum of auth events across Triage + Azure + Platform Services>
      Distinct MCP Users: <deduplicated count or max across channels>
      Active MCP Servers: <count of server types with >0 activity>
      Combined MCP Query Share: <MCP queries / total workspace queries %>
      Sensitive API Rate: <sensitive / total Graph MCP calls %>
      
    • These values are derived from Phase 1-5 query results and MUST be rendered as a single block for SVG extraction. Do not scatter them across per-server sections only.
  4. Graph MCP Server Analysis
    • Daily usage trend (ASCII bar chart showing requests/day — from Query 1 unified trend, Graph MCP series)
    • Top endpoints table (endpoint, call count, % of total, last used)
    • Sensitive API access summary with user attribution
    • Caller attribution (User vs SPN vs Agent — from Query 9)
  5. Sentinel Triage MCP Analysis
    • Triage MCP API calls from MicrosoftGraphActivityLogs — filtered by dedicated AppId 7b7b3966 ("Microsoft Defender Mcp")
    • Daily usage trend (ASCII bar chart showing calls/day — from Query 1 unified trend, Triage MCP series)
    • Triage MCP authentication events from SigninLogs/AADNonInteractiveUserSignInLogs — sign-in frequency, user attribution, IP, country
    • User attribution table with sign-in type breakdown
  6. Sentinel Data Lake MCP Analysis
    • MCP tool usage summary (success/failure, avg duration)
    • Tool breakdown table (query_lake, list_sentinel_workspaces, search_tables, etc.)
    • Error analysis with error categories and sample failure reasons
    • Daily activity trend (ASCII bar chart — from Query 1 unified trend, Data Lake MCP series)
    • MCP vs Direct KQL delineation table
  7. Azure MCP & ARM Analysis
    • Azure MCP Server authentication events (detected via AppId 04b07795 — Azure CLI credential, shared AppId)
    • Daily auth trend (ASCII bar chart showing events/day — from Query 1 unified trend, Azure MCP/CLI series)
    • Azure MCP Server workspace queries from LAQueryLogs (detected via AADClientId 04b07795 + empty RequestClientApp + \n| limit N query text suffix)
    • ARM operation volume and resource providers accessed — if no ARM write ops detected, explicitly state: "✅ No ARM write operations detected for AppId 04b07795 in the analysis period."
    • Source attribution via Claims.appid (Azure Portal, AI Studio, Power Platform connectors, etc.)
  8. Workspace Query Governance (Two-Tier)
    • Analytics Tier (LAQueryLogs): All query sources table with MCP vs Portal vs Platform breakdown
    • Data Lake Tier (CloudAppEvents): MCP-driven vs Direct KQL breakdown
    • Combined MCP proportion across both tiers
    • Pareto analysis of query sources
  9. Top MCP Users (Cross-Server Breadth)
    • Ranked table of users by number of MCP servers used and total call volume
    • Cross-server correlation (Graph MCP, Triage MCP, Data Lake MCP, Azure CLI/MCP)
    • UPN resolution from UserIds
  10. MCP Usage Score — Per-dimension breakdown with scoring rationale
  11. Security Assessment — Emoji-coded findings table with evidence citations
  12. Recommendations — Prioritized action items based on findings
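
The Consolidated KPI block in item 3 can be assembled from per-server totals. The sketch below assumes those totals have already been extracted from the Phase 1-5 query results; the function names, the input shape, and all sample numbers are hypothetical, and distinct users are computed as a deduplicated set.

```python
def pct(part, whole):
    """Percentage with one decimal; 0.0 when the denominator is zero."""
    return round(part / whole * 100, 1) if whole else 0.0


def consolidated_kpis(servers, platform_auth_events, mcp_queries,
                      total_queries, sensitive_graph_calls):
    """Build the single KPI block from per-server totals."""
    api_total = sum(s["api_calls"] for s in servers.values())
    auth_total = sum(s["auth_events"] for s in servers.values()) + platform_auth_events
    users = set()
    for s in servers.values():
        users |= s["users"]  # deduplicate users across channels
    active = sum(1 for s in servers.values() if s["api_calls"] + s["auth_events"] > 0)
    graph_calls = servers["Graph MCP"]["api_calls"]
    return "\n".join([
        f"Total MCP API Calls: {api_total:,}",
        f"Total Auth Events: {auth_total:,}",
        f"Distinct MCP Users: {len(users)}",
        f"Active MCP Servers: {active} of {len(servers)}",
        f"Combined MCP Query Share: {pct(mcp_queries, total_queries)}%",
        f"Sensitive API Rate: {pct(sensitive_graph_calls, graph_calls)}%",
    ])


# Hypothetical inputs for illustration only.
servers = {
    "Graph MCP": {"api_calls": 152, "auth_events": 0, "users": {"u1", "u2"}},
    "Triage MCP": {"api_calls": 40, "auth_events": 10, "users": {"u1"}},
    "Data Lake MCP": {"api_calls": 1028, "auth_events": 0, "users": {"u3"}},
    "Azure MCP/CLI": {"api_calls": 0, "auth_events": 25, "users": {"u1"}},
}
kpi_block = consolidated_kpis(servers, platform_auth_events=5, mcp_queries=481,
                              total_queries=11704, sensitive_graph_calls=82)
```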

Report Completeness Checklist

🔴 MANDATORY — Run before finalizing any report. After composing the full report, verify each row below. Every server section (4-7) must include its Daily Trend chart derived from Query 1. Query 1 returns all 4 server series in a single union — filter by Server column to extract each.

| # | Section | Required Sub-Section | Data Source | Check |
|---|---------|----------------------|-------------|:-----:|
| 4 | Graph MCP Server | Daily Usage Trend (ASCII bar chart) | Q1 → Server = "Graph MCP" | ☐ |
| 4 | Graph MCP Server | Top Endpoints table | Q2 | ☐ |
| 4 | Graph MCP Server | Sensitive API access summary | Q2 IsSensitive rows | ☐ |
| 4 | Graph MCP Server | Caller attribution | Q9 | ☐ |
| 5 | Sentinel Triage MCP | Daily Usage Trend (ASCII bar chart) | Q1 → Server = "Triage MCP" | ☐ |
| 5 | Sentinel Triage MCP | API calls table | Q5 | ☐ |
| 5 | Sentinel Triage MCP | Authentication events | Q6 | ☐ |
| 6 | Data Lake MCP | Daily Activity Trend (ASCII bar chart) | Q1 → Server = "Data Lake MCP" | ☐ |
| 6 | Data Lake MCP | MCP vs Direct KQL delineation | Q10 | ☐ |
| 6 | Data Lake MCP | Tool breakdown table | Q11 | ☐ |
| 6 | Data Lake MCP | Error analysis | Q12 | ☐ |
| 7 | Azure MCP Server | Daily Auth Trend (ASCII bar chart) | Q1 → Server = "Azure MCP/CLI" | ☐ |
| 7 | Azure MCP Server | Authentication events | Q13 | ☐ |
| 7 | Azure MCP Server | Workspace queries (LAQueryLogs) | Q14 | ☐ |
| 7 | Azure MCP Server | AzureActivity write operations | (ad-hoc or explicit "none found") | ☐ |
| 9 | Top MCP Users | Cross-server user breadth table | Q15 | ☐ |

If any checkbox cannot be checked, either the data was missing (state why — e.g., "Q1 returned 0 rows for this server") or the section was accidentally omitted. Do not finalize the report with unchecked boxes unless the data genuinely does not exist.

Report Visualization Patterns

Daily Usage Trend (ASCII)

Graph MCP Usage — Last 30 Days
Day         Calls  Trend
─────────────────────────────────────
2026-02-07  │ 23   ████████████
2026-02-06  │  0   
2026-02-05  │ 45   ██████████████████████
2026-02-04  │ 12   ██████
...
─────────────────────────────────────
Avg: 15.2/day  Peak: 45  Total: 152
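
A chart in this shape can be generated from the per-day counts of one Query 1 series. The sketch below is illustrative only: `render_daily_trend`, the `max_width` scaling, and the sample counts are assumptions, and the bar lengths are not meant to reproduce the sample above exactly.

```python
def render_daily_trend(title, daily_calls, max_width=22):
    """Render per-day call counts as an ASCII bar chart, newest day first."""
    peak = max(daily_calls.values(), default=0)
    rule = "─" * 37
    lines = [title, "Day         Calls  Trend", rule]
    for day in sorted(daily_calls, reverse=True):
        calls = daily_calls[day]
        # Scale each bar relative to the peak day (assumed scaling rule).
        width = round(calls / peak * max_width) if peak else 0
        lines.append(f"{day}  │ {calls:>3}   {'█' * width}".rstrip())
    total = sum(daily_calls.values())
    avg = total / len(daily_calls) if daily_calls else 0.0
    lines.append(rule)
    lines.append(f"Avg: {avg:.1f}/day  Peak: {peak}  Total: {total}")
    return "\n".join(lines)


# Hypothetical per-day counts for illustration only.
chart = render_daily_trend(
    "Graph MCP Usage",
    {"2026-02-07": 23, "2026-02-06": 0, "2026-02-05": 45, "2026-02-04": 12},
)
```

Days with zero calls are kept as rows with an empty bar, so gaps in usage stay visible in the trend.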

Workspace Query Proportion (ASCII)

Analytics Tier Query Sources — Last 30d (LAQueryLogs)
──────────────────────────────────────────
Sentinel Engine    ████████████████████████████████████ 88.4%  (10,354)
Logic Apps         ████                                  7.0%     (821)
Triage MCP          █                                    4.1%     (481)
Sentinel Portal                                          0.4%      (48)
──────────────────────────────────────────
MCP Servers: 4.1% │ Portal: 0.4% │ Platform: 95.4%

Data Lake Tier Query Sources — Last 30d (CloudAppEvents)
──────────────────────────────────────────
Data Lake MCP      ████████████████████████████████████ 96.8%  (1,028)
Direct KQL                                               3.2%      (34)
──────────────────────────────────────────
MCP Server-Driven: 96.8% │ Direct KQL: 3.2%

Endpoint Access Distribution (ASCII)

Top Graph MCP Endpoints — 30d
─────────────────────────────────────────────────────
conditionalAccess/policies    ████████████  27  (17.8%)
users                         ██████████    22  (14.5%)
roleManagement/directory      ████████      18  (11.8%)
servicePrincipals             ██████        14   (9.2%)
groups                        █████         11   (7.2%)
...
─────────────────────────────────────────────────────
🔴 Sensitive: 82/152 (53.9%)  │  ✅ Standard: 70/152 (46.1%)

MCP Usage Score Card (ASCII)

┌──────────────────────────────────────────────────────┐
│               MCP USAGE SCORE: 22/100                │
│                 Rating: ✅ HEALTHY                    │
├──────────────────────────────────────────────────────┤
│ User Diversity     [██░░░░░░░░] 3/20  (1-2 users)   │
│ Endpoint Sensitiv  [███████░░░] 14/20 (54% sensitive)│
│ Error Rate         [░░░░░░░░░░] 0/20  (<1% errors)  │
│ Volume Anomaly     [██░░░░░░░░] 3/20  (within norm)  │
│ Off-Hours Activity [█░░░░░░░░░] 2/20  (<5% off-hrs)  │
└──────────────────────────────────────────────────────┘
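
The total and the ten-segment bars in the score card can be derived from the five dimension scores. This is a minimal sketch, assuming each dimension is scored out of 20 and assuming round-half-up when converting a score to filled segments; `render_bar` is a hypothetical helper, and the rating thresholds are not reproduced here.

```python
def render_bar(score, max_score=20, width=10):
    """Convert a dimension score into a fixed-width bar (round-half-up, assumed)."""
    filled = (score * width + max_score // 2) // max_score
    return "[" + "█" * filled + "░" * (width - filled) + "]"


# Dimension scores taken from the sample card above.
dimensions = {
    "User Diversity": 3,
    "Endpoint Sensitivity": 14,
    "Error Rate": 0,
    "Volume Anomaly": 3,
    "Off-Hours Activity": 2,
}
total = sum(dimensions.values())
card_lines = [f"MCP USAGE SCORE: {total}/100"] + [
    f"{name:<21}{render_bar(score)} {score}/20"
    for name, score in dimensions.items()
]
```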

Markdown File Report Structure

When outputting to markdown file, include everything from the inline format PLUS:

# MCP Server Usage Monitoring Report

**Generated:** YYYY-MM-DD HH:MM UTC
**Workspace:** <workspace_name>
**Analysis Period:** <start> → <end> (<N> days)
**Data Sources:** MicrosoftGraphActivityLogs, SigninLogs, LAQueryLogs, CloudAppEvents, AzureActivity, SentinelAudit

---

## Executive Summary

<2-3 sentence summary: MCP servers detected, total usage volume, risk level, key findings>

---

## MCP Footprint Summary

### Server Landscape
| MCP Server | API Calls | Auth Events | Distinct Users | Error Rate | Status |
|------------|----------:|------------:|---------------:|-----------:|--------|
| Graph MCP | ... | — | ... | ...% | ✅/🟡/🟠/🔴 |
| Triage MCP | ... | ... | ... | ...% | ✅/🟡/🟠/🔴 |
| Data Lake MCP | ... | — | ... | ...% | ✅/🟡/🟠/🔴 |
| Azure MCP/CLI | — | ... | ... | ...% | ✅/🟡/🟠/🔴 |

### Consolidated KPIs
| Metric | Value |
|--------|------:|
| Total MCP API Calls | X,XXX |
| Total Auth Events | X,XXX |
| Distinct MCP Users | XXX |
| Active MCP Servers | N of 4 |
| Combined MCP Query Share | X.X% |
| Sensitive API Rate | X.X% |

> **SVG Note:** These KPIs map directly to Row 2 KPI cards and the Server Landscape maps to Row 3 table widget. Render this section before per-server deep dives to enable incremental SVG generation.

---

## Graph MCP Server

### Daily Usage Trend
<ASCII bar chart — requests per day>

### Top Endpoints
| Rank | Endpoint | Calls | % Total | Users | Last Used |
|------|----------|-------|---------|-------|-----------|
| 1 | ... | ... | ... | ... | ... |

### Sensitive API Access
| Endpoint | Calls | Users | Methods | Risk |
|----------|-------|-------|---------|------|
| roleManagement/... | 18 | 1 | GET | 🟠 Read access to PIM |
| ... | ... | ... | ... | ... |

**Summary:** X of Y calls (Z%) targeted sensitive endpoints. <Risk assessment>.

### Caller Attribution (Query 9)
| Caller Type | Auth Method | Users | Calls | Success Rate |
|-------------|-------------|------:|------:|-------------:|
| 👤 User (Delegated) | ... | ... | ... | ...% |
| 🤖 Service Principal | ... | ... | ... | ...% |

---

## Sentinel Triage MCP

### Triage MCP API Calls (MicrosoftGraphActivityLogs — AppId `7b7b3966`)
| Endpoint | Method | Calls | Users | First Seen | Last Seen |
|----------|--------|-------|-------|------------|----------|
| ... | ... | ... | ... | ... | ... |

### Triage MCP Authentication Events (SigninLogs — "Microsoft Defender Mcp")
| Sign-In Type | Sign-Ins | Users | IPs | Countries | Resource | Last Seen |
|-------------|----------|-------|-----|-----------|----------|----------|
| ... | ... | ... | ... | ... | ... | ... |

---

## Sentinel Data Lake MCP

> **Audit Source:** `CloudAppEvents` (Purview unified audit log)  
> **Classification:** RecordType 403 + Interface `IMcpToolTemplate` = MCP-driven | RecordType 379 = Direct KQL

### MCP vs Direct KQL Delineation
| Access Pattern | Total Calls | Success | Failures | Error Rate | Avg Duration | Users |
|---------------|-------------|---------|----------|------------|-------------|-------|
| 🤖 MCP Server-Driven | ... | ... | ... | ...% | ...s | ... |
| 👤 Direct KQL | ... | ... | ... | ...% | ...s | ... |

### MCP Tool Breakdown
| Tool Name | Calls | Success | Failures | Error Rate | Avg Duration | Last Seen |
|-----------|-------|---------|----------|------------|-------------|-----------|
| `query_lake` | ... | ... | ... | ...% | ...s | ... |
| `list_sentinel_workspaces` | ... | ... | ... | ...% | ...s | ... |
| `search_tables` | ... | ... | ... | ...% | ...s | ... |
| ... | ... | ... | ... | ... | ... | ... |

### Error Analysis
| Error Category | Count | % of Failures | Sample Error | Affected Tools |
|---------------|-------|---------------|--------------|----------------|
| Schema/Semantic Error | ... | ...% | `column 'X' does not exist` | ... |
| ... | ... | ... | ... | ... |

### Daily Activity Trend
<ASCII bar chart — MCP + Direct KQL calls per day>

---

## Azure MCP Server

> **Detection Method:** Azure CLI credential (AppId `04b07795`, shared with manual `az` CLI). `RequestClientApp` is empty in LAQueryLogs. Best differentiator: Azure MCP appends `\n| limit N` to query text via `monitor_workspace_log_query`. 🔄 Previously documented as AppId `1950a258` + `csharpsdk,LogAnalyticsPSClient`; that fingerprint is obsolete.

### Authentication Timeline
| Timestamp | Resource | Result | Auth Type | UserAgent | Notes |
|-----------|----------|--------|-----------|-----------|-------|
| ... | ... | ... | ... | ... | ... |

### Workspace Queries (LAQueryLogs)
| Timestamp | Query (truncated) | Response | CPU (ms) | Source App |
|-----------|-------------------|----------|----------|------------|
| ... | ... | ... | ... | ... |

### AzureActivity Write Operations
| Timestamp | Operation | Resource Provider | Status | Claims.appid |
|-----------|-----------|-------------------|--------|-------------|
| ... | ... | ... | ... | `04b07795` |

> If no ARM write operations found, state: "✅ No ARM write operations detected for AppId `04b07795` in the analysis period. ARM read operations are not logged in AzureActivity."

---

## Azure ARM Operations (All Sources)

> **Source Attribution:** ARM operations attributed via `Claims.appid` in AzureActivity.
> Azure MCP Server read-only operations are NOT logged in AzureActivity.

### ARM Source Attribution
| AppId | App Name | Calls | Operations |
|-------|----------|-------|------------|
| ... | ... | ... | ... |

### Operations by Resource Provider
| Resource Provider | Calls | Top Operations | Distinct Resources |
|-------------------|-------|----------------|-------------------|
| ... | ... | ... | ... |

---

## Workspace Query Governance (Two-Tier)

### Analytics Tier (LAQueryLogs)
| Rank | AppId | Source | Category | Queries | % Total | Users |
|------|-------|--------|----------|---------|---------|-------|
| 1 | ... | Sentinel Engine | Platform | ... | ... | ... |
| 2 | ... | Sentinel Triage MCP | MCP Server | ... | ... | ... |
| 3 | ... | Sentinel Portal | Portal | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... |

### Data Lake Tier (CloudAppEvents)
| Access Pattern | Calls | % Total | Users | Tables Accessed |
|---------------|-------|---------|-------|-----------------|
| 🤖 MCP Server-Driven | ... | ...% | ... | ... |
| 👤 Direct KQL | ... | ...% | ... | ... |

### Combined MCP Proportion
<ASCII proportion bar — Analytics + Data Lake tiers combined>

MCP queries represent X% of combined query volume:
- Analytics tier: X of Y queries via Sentinel Triage MCP (Z%)
- Data Lake tier: X of Y queries via Data Lake MCP (Z%)
- Graph API: X calls via Graph MCP

---

## Top MCP Users (Cross-Server Breadth)

### User Ranking by MCP Server Breadth (Query 15)
| Rank | User (UPN) | Servers Used | MCP Servers | Total Calls |
|------|-----------|:------------:|-------------|------------:|
| 1 | ... | N | Graph MCP, Triage MCP, ... | X,XXX |
| 2 | ... | N | ... | X,XXX |
| ... | ... | ... | ... | ... |

> **Interpretation:** Users spanning 3+ MCP servers represent the broadest AI tool adoption. Cross-reference with sensitive endpoint data (§4) to identify users combining breadth with privileged access.

---

## MCP Usage Score

<ASCII score card>

### Dimension Breakdown
| Dimension | Score | Evidence |
|-----------|-------|----------|
| User Diversity | X/20 | N distinct users across M MCP channels |
| Endpoint Sensitivity | X/20 | N% of Graph MCP calls to sensitive endpoints |
| Error Rate | X/20 | N% error rate across all channels |
| Volume Anomaly | X/20 | Peak day was N% of rolling average |
| Off-Hours Activity | X/20 | N% of calls outside 08:00-18:00 UTC |

---

## Security Assessment

| Factor | Finding |
|--------|---------|
| 🔴/🟢/🟡/🟠 **Factor** | Evidence-based finding |

---

## Recommendations

1. ⚠️/🟢 <Prioritized action item with evidence>
2. ...

---

## Appendix: Query Details

Render a single markdown table summarizing all queries executed. **Do NOT include full KQL text** — the canonical queries are already documented in this SKILL.md file. The appendix serves as an audit trail only.

| Query | Table(s) | Records Scanned | Results | Execution |
|-------|----------|----------------:|--------:|----------:|
| Q1 — Unified Daily MCP Activity Trend | MicrosoftGraphActivityLogs, CloudAppEvents, SigninLogs, AADNonInteractive, LAQueryLogs | X,XXX | N rows | X.XXs |
| Q2 — Graph MCP Endpoint & Activity Summary | MicrosoftGraphActivityLogs | X,XXX | N rows | X.XXs |
| ... | ... | ... | ... | ... |

*Query definitions: see the Sample KQL Queries section in this SKILL.md file.*

Proactive Alerting — KQL Data Lake Jobs

This skill provides on-demand visibility (Phases 1-7 above). For continuous, scheduled anomaly detection that feeds Sentinel analytics rules, use the companion KQL Data Lake Jobs defined in:

📄 queries/identity/mcp_anomaly_detection_kql_jobs.md

Maturity Model

| Tier | Capability | Implementation |
|------|------------|----------------|
| 1. Visibility (current skill) | On-demand MCP usage reports via Copilot chat | This SKILL.md: Phases 1-7, Queries 1-15 |
| 2. Baselining | 14-day behavioral baselines per user per MCP server | KQL Jobs 1-8 build baselines automatically |
| 3. Alerting | Automated anomaly detection → Sentinel incidents | KQL Jobs promote to _KQL_CL tables → Analytics Rules fire |
| 4. Enforcement | Real-time guardrails, scope limits (future) | Not yet available; requires MCP protocol-level controls |

KQL Job Inventory

| Job | Anomaly Type | Source Table(s) | Destination Table | Schedule |
|-----|--------------|-----------------|-------------------|----------|
| 1 | New sensitive Graph endpoint | MicrosoftGraphActivityLogs | MCPGraphAnomalies_KQL_CL | Daily |
| 2 | Graph MCP volume spike (3x baseline) | MicrosoftGraphActivityLogs | MCPGraphAnomalies_KQL_CL | Daily |
| 3 | Off-hours Graph MCP activity | MicrosoftGraphActivityLogs | MCPGraphAnomalies_KQL_CL | Daily |
| 4 | Graph MCP error rate anomaly | MicrosoftGraphActivityLogs | MCPGraphAnomalies_KQL_CL | Daily |
| 5 | New Azure MCP Server user | AADNonInteractiveUserSignInLogs | MCPAzureAnomalies_KQL_CL | Daily |
| 6 | New Azure MCP resource target | AADNonInteractiveUserSignInLogs | MCPAzureAnomalies_KQL_CL | Daily |
| 7 | Sentinel workspace query anomalies | LAQueryLogs | MCPSentinelAnomalies_KQL_CL | Daily |
| 8 | Cross-MCP activity chains | Multiple (join) | MCPCrossMCPCorrelation_KQL_CL | Daily |

Why KQL Jobs (Not Summary Rules)

KQL jobs support multi-table joins, which are critical for Job 7 (LAQueryLogs + baseline) and Job 8 (Graph + Azure + Sentinel cross-correlation). Summary rules are limited to a single source table, with lookup() joins against analytics-tier tables only.

Architecture

Data Lake ──[KQL Jobs (daily)]──► _KQL_CL tables (analytics tier) ──[Analytics Rules]──► Incidents

Key design constraints:

  • 15-minute delay: All queries use now() - 15m to account for Data Lake ingestion latency
  • Anomaly-only promotion: Only flagged records are written to analytics tier (cost optimization)
  • Separate timestamp: DetectedTime preserves original event time; TimeGenerated reflects job execution time
  • 3 concurrent job limit: Per tenant — prioritize Jobs 1, 7, 8 for highest-value detections
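
Two of these constraints, the 15-minute delay and anomaly-only promotion, can be illustrated with the volume-spike rule from Job 2. The sketch below is not the companion job definition: the mean-of-daily-counts baseline, the function names, and the sample numbers are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean

INGESTION_DELAY = timedelta(minutes=15)  # matches the now() - 15m constraint
SPIKE_FACTOR = 3                         # "3x baseline" from the job inventory


def job_window_end(now):
    """End of the query window, shifted back to allow for ingestion latency."""
    return now - INGESTION_DELAY


def is_volume_spike(baseline_daily_counts, today_count, factor=SPIKE_FACTOR):
    """Flag a day whose volume exceeds factor x the baseline mean (assumed rule)."""
    if not baseline_daily_counts:
        return False  # no baseline yet, so nothing to compare against
    return today_count > factor * mean(baseline_daily_counts)


# Hypothetical 14-day baseline of 10 calls/day.
baseline = [10] * 14
# Anomaly-only promotion: keep only the days that are flagged.
promoted = [count for count in (12, 31, 30) if is_volume_spike(baseline, count)]
window_end = job_window_end(datetime(2026, 2, 7, 12, 0, tzinfo=timezone.utc))
```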

For full query definitions, deployment checklist, and companion analytics rule templates, see queries/identity/mcp_anomaly_detection_kql_jobs.md.


Known Pitfalls

project ... as Keyword Fails in Advanced Hunting

Problem: The as keyword for column aliasing inside project (e.g., tostring(parse_json(Status).errorCode) as ErrorCode) fails in Advanced Hunting with Query could not be parsed at 'as'. While as is valid KQL in Log Analytics / Data Lake, the AH parser rejects it inside project statements.
Solution: Always use = assignment syntax instead: ErrorCode = tostring(parse_json(Status).errorCode). This works in both AH and Data Lake. All queries in this skill have been updated to use = syntax. When writing new queries, never use as for column aliasing in project — reserve as for tabular expression naming (let T = ... | as T).

Azure MCP Server Detection (🔄 Updated Feb 2026)

Problem: Azure MCP Server uses DefaultAzureCredential, and the credential chain now resolves to Azure CLI (AppId 04b07795-8ddb-461a-bbee-02f9e1bf7b46), NOT AzurePowerShellCredential (1950a258) as previously documented. In LAQueryLogs, RequestClientApp is empty (not csharpsdk,LogAnalyticsPSClient). The previously documented fingerprint (1950a258 + csharpsdk,LogAnalyticsPSClient) appeared only once in a 30-day lookback and is obsolete. ARM read operations (the majority of MCP calls) do not appear in AzureActivity.

Previous fingerprint (OBSOLETE):

  • ❌ AppId 1950a258-227b-4e31-a9cf-717495945fc2 (AzurePowerShellCredential)
  • ❌ RequestClientApp = "csharpsdk,LogAnalyticsPSClient" in LAQueryLogs
  • ❌ UserAgent azsdk-net-Identity as primary differentiator (shared by many Azure SDK services)

Current fingerprint (field-tested Feb 2026):

  • ✅ AppId 04b07795-8ddb-461a-bbee-02f9e1bf7b46 (Azure CLI)
  • ⚠️ RequestClientApp is empty (shared with Azure CLI and 4+ other AADClientIds, so it is not a unique fingerprint)
  • ✅ Azure MCP monitor_workspace_log_query appends \n| limit N to query text — best query-level differentiator
  • ✅ Token caching: sign-in events represent access sessions, not individual API calls

Solution: Azure MCP Server queries can be identified in LAQueryLogs with moderate confidence by filtering for AADClientId 04b07795 + query text containing \n| limit (the suffix added by monitor_workspace_log_query). In SigninLogs, the shared AppId means Azure MCP is indistinguishable from manual Azure CLI usage — present as "Azure MCP Server / Azure CLI (shared AppId 04b07795)" in reports. The empty RequestClientApp bucket contains queries from 5+ different tools, so this field cannot be used for attribution.

Limitations:

  • ARM read operations produce sign-in events but NOT AzureActivity records
  • If the user also runs az CLI manually, sign-in events from both are indistinguishable
  • The \n| limit N query text suffix is the best available query-level differentiator, but it is a heuristic rather than a guaranteed signature
  • The credential chain may change with Azure MCP Server updates — monitor for AppId shifts
  • AzureActivity ingestion lag is typically 3-20 min (MS docs); SigninLogs ~1-2h; LAQueryLogs/AADNonInteractiveUserSignInLogs ~5-15 min

MicrosoftGraphActivityLogs Availability

Problem: Graph activity logs are NOT enabled by default. If the table is empty or doesn't exist, Graph MCP analysis cannot proceed.
Solution: If MicrosoftGraphActivityLogs returns 0 results or table-not-found error, report: "⚠️ Microsoft Graph activity logs are not enabled in this tenant. Enable them at: https://learn.microsoft.com/en-us/graph/microsoft-graph-activity-logs-overview". Skip Graph MCP analysis gracefully and proceed with other MCP channels.

LAQueryLogs Diagnostic Settings

Problem: LAQueryLogs requires diagnostic settings to be configured on the Log Analytics workspace. Without it, workspace query governance analysis is impossible.
Solution: If LAQueryLogs returns empty, report: "⚠️ LAQueryLogs not available — enable Log Analytics workspace diagnostic settings to monitor query activity." Skip workspace governance analysis and note the gap.

AppId Misclassification History (Field-Tested Feb 2026)

80ccca67 — Previously assumed to be a Graph MCP variant. Actually the M365 Security & Compliance Center (Sentinel Portal backend, RequestClientApp = ASI_Portal). Categorize as "Sentinel Portal (Non-MCP)". Graph MCP has only ONE AppId: e8c77dc2.

95a5d94c — Previously assumed to be "VS Code Copilot" (MCP Client). Actually the Azure Portal — AppInsightsPortalExtension blade, executing Usage dashboard/workbook queries in the browser. No SPN or app registration in tenant; not in merill/microsoft-info known apps list. Categorize as "Portal/Platform (Non-MCP)".

📘 Takeaway: When encountering an unknown AppId in LAQueryLogs, check the RequestClientApp field first. When populated, it reliably reveals the actual source (e.g., AppInsightsPortalExtension, ASI_Portal); an empty value is shared by several tools and cannot be used for attribution. Do not assume an AppId is MCP-related without verifying via Graph API SPN lookup, sign-in logs, and query content analysis.

CloudAppEvents CamelCase Matching (ActionType AND Operation)

Problem: Both ActionType and RawEventData.Operation values in CloudAppEvents for Sentinel operations use CamelCase without word boundaries (e.g., SentinelAIToolRunCompleted, KQLQueryCompleted). The has operator requires word boundaries and will NOT match these values. Field-tested Feb 2026: has "Completed" returns false for ALL Operation values including KQLQueryCompleted — the has operator fails on substrings within CamelCase tokens.
Solution: Always use contains (not has) when filtering ActionType or Operation for Sentinel/KQL operations:

// ✅ CORRECT — 'contains' works with CamelCase
| where ActionType contains "Sentinel" or ActionType contains "KQL"
| where Operation contains "Completed"

// ❌ WRONG — 'has' requires word boundaries, fails on CamelCase
| where ActionType has "Sentinel" or ActionType has "KQL"
| where Operation has "Completed"  // Returns 0 rows — silently drops ALL MCP events!

Impact if missed: Query 10 (MCP vs Direct KQL delineation) will show 0 MCP events and ONLY Direct KQL, because MCP events (RecordType 403) are filtered out by Operation has "Completed", while Direct KQL events (RecordType 379) survive via the OR RecordType == 379 fallback. This creates a false impression that no MCP-driven queries exist.
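
The difference between the two operators can be modelled outside KQL. The sketch below is a simplified approximation, not Kusto's implementation: it treats a term as a maximal run of ASCII letters and digits and handles only single-term search strings. Both function names are hypothetical.

```python
import re


def kql_contains(value, needle):
    """Case-insensitive substring match, like KQL 'contains'."""
    return needle.lower() in value.lower()


def kql_has_simplified(value, needle):
    """Whole-term match: simplified model of KQL 'has' for single-term needles."""
    terms = re.findall(r"[A-Za-z0-9]+", value.lower())
    return needle.lower() in terms


operation = "KQLQueryCompleted"
matched_by_has = kql_has_simplified(operation, "Completed")
matched_by_contains = kql_contains(operation, "Completed")
```

In this model the CamelCase value is a single term, so a term-based match on "Completed" finds nothing while the substring match succeeds, which is the behavior the pitfall describes.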

CloudAppEvents RawEventData Parsing

Problem: RawEventData in CloudAppEvents is a dynamic column but often contains nested JSON that requires double-parsing. Direct property access (e.g., RawEventData.ToolName) may return empty.
Solution: Always parse explicitly with parse_json(tostring(RawEventData)):

| extend RawData = parse_json(tostring(RawEventData))
| extend ToolName = tostring(RawData.ToolName)

Data Lake MCP Has No AppId

Problem: Unlike Graph MCP (e8c77dc2) and Sentinel Triage MCP (7b7b3966), the Sentinel Data Lake MCP has no dedicated AppId in any telemetry table. It is not visible in LAQueryLogs, SigninLogs, or MicrosoftGraphActivityLogs.
Solution: Data Lake MCP activity is audited exclusively via CloudAppEvents (Purview unified audit log). Filter by ActionType contains "SentinelAITool" (preferred — top-level column) or extract RecordType from RawEventData with toint(parse_json(tostring(RawEventData)).RecordType) == 403 and Interface == "IMcpToolTemplate". Note: RecordType is NOT a top-level column in CloudAppEvents — it is nested inside RawEventData and must be extracted via parse_json().

Table availability (field-tested Feb 2026): CloudAppEvents was confirmed available on both Data Lake (TimeGenerated, 90d retention) and Advanced Hunting (Timestamp, 30d retention) in a standard Sentinel workspace without explicit Purview/E5 configuration. Always attempt the query first — only report a gap if the table returns 0 results or a table-not-found error. Do not skip Phase 3 based on licensing assumptions.

CloudAppEvents Double-Counting Prevention

Problem: Each Data Lake MCP tool call generates TWO events: SentinelAIToolRunStarted (RecordType 403) and SentinelAIToolRunCompleted (RecordType 403). Counting both will double the actual call count.
Solution: Always filter on Operation == "SentinelAIToolRunCompleted" for call counts, duration analysis, and error analysis. Use SentinelAIToolRunStarted only when investigating specific timing sequences or queue behavior.

Data Lake MCP ExecutionDuration Format

Problem: The ExecutionDuration field in RawEventData is stored as a string (e.g., "2.4731712"), not a numeric type. Aggregation functions (avg, max) will fail without conversion.
Solution: Use todouble(RawData.ExecutionDuration) to convert before aggregation.
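
When exported CloudAppEvents rows are post-processed outside KQL, the same three precautions apply: parse RawEventData explicitly, count only Completed events, and convert ExecutionDuration before aggregating. The following is a minimal sketch under those assumptions; the function names and the sample events are hypothetical.

```python
import json


def parse_raw(raw):
    """Accept a dict or string-wrapped JSON, mirroring parse_json(tostring(...))."""
    data = raw
    for _ in range(2):  # unwrap at most two layers of string-encoded JSON
        if isinstance(data, str):
            data = json.loads(data)
    return data if isinstance(data, dict) else {}


def summarize_tool_calls(events):
    """Count completed tool calls and average their duration in seconds."""
    calls = 0
    durations = []
    for event in events:
        raw = parse_raw(event.get("RawEventData"))
        # Started + Completed are emitted per call; count Completed only.
        if raw.get("Operation") != "SentinelAIToolRunCompleted":
            continue
        calls += 1
        duration = raw.get("ExecutionDuration")
        if duration is not None:
            durations.append(float(duration))  # stored as a string
    avg = sum(durations) / len(durations) if durations else 0.0
    return {"calls": calls, "avg_duration_s": avg}


# Hypothetical events for illustration only.
events = [
    {"RawEventData": {"Operation": "SentinelAIToolRunStarted"}},
    {"RawEventData": json.dumps(
        {"Operation": "SentinelAIToolRunCompleted", "ExecutionDuration": "2.5"})},
    {"RawEventData": {"Operation": "SentinelAIToolRunCompleted",
                      "ExecutionDuration": "1.5"}},
]
summary = summarize_tool_calls(events)
```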

Sentinel Engine False Association

Problem: The Sentinel analytics engine (fc780465-2017-40d4-a0c5-307022471b92) generates the highest query volume in most workspaces but is NOT an MCP server. Including it in MCP totals would massively inflate the numbers.
Solution: ALWAYS label Sentinel Engine and Logic Apps Connector as "Platform (Non-MCP)" in reports. The MCP proportion calculation MUST exclude these from the MCP numerator.
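
The exclusion rule can be expressed as a proportion in which platform and portal sources count toward the denominator but never the numerator. The sketch below uses the sample Analytics tier counts from the Workspace Query Proportion chart; `mcp_query_share` and the category labels are illustrative assumptions.

```python
def mcp_query_share(sources):
    """Percentage of queries from MCP servers; non-MCP sources stay in the total."""
    total = sum(count for _, count in sources.values())
    mcp = sum(count for category, count in sources.values()
              if category == "MCP Server")
    return round(mcp / total * 100, 1) if total else 0.0


# Source name -> (category, query count), using the sample chart's counts.
sources = {
    "Sentinel Engine": ("Platform (Non-MCP)", 10354),
    "Logic Apps": ("Platform (Non-MCP)", 821),
    "Triage MCP": ("MCP Server", 481),
    "Sentinel Portal": ("Portal (Non-MCP)", 48),
}
share = mcp_query_share(sources)
```

Labeling Sentinel Engine as an MCP source in this input would push the share above 90%, which is the inflation the pitfall warns about.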

SigninLogs Status Field Needs parse_json() in Data Lake

Problem: The Status column in SigninLogs / AADNonInteractiveUserSignInLogs is a dynamic field containing {errorCode, failureReason, additionalDetails}, but Data Lake workspaces may store it as a string. Using dot-notation (Status.errorCode) without parse_json() causes parser errors (Expected: ;) or SemanticErrors.
Solution: Always use tostring(parse_json(Status).errorCode) — same pattern as DeviceDetail, LocationDetails, and ConditionalAccessPolicies. This works regardless of whether the column is stored as dynamic or string. Query 3 was fixed for this in Feb 2026.
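The same "parse regardless of storage type" idea can be sketched in Python for rows that have already been exported from a query. This is an analog of the KQL pattern, not a replacement for it. The helper name `status_error_code` is hypothetical.

```python
import json


def status_error_code(status):
    """Return errorCode as a string whether status is a dict or a JSON string."""
    if isinstance(status, str):
        try:
            status = json.loads(status)
        except json.JSONDecodeError:
            return ""  # not valid JSON: treat as missing
    if not isinstance(status, dict):
        return ""
    code = status.get("errorCode")
    return "" if code is None else str(code)


# Works for both storage shapes described above.
from_dynamic = status_error_code({"errorCode": 53000})
from_string = status_error_code('{"errorCode": 50074, "failureReason": "x"}')
```

Normalizing to a string in both cases keeps later comparisons (for example against "53000") consistent across workspaces.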

Type Column Unavailable in Data Lake Union Contexts

Problem: The Type pseudo-column (table name) is NOT resolvable in union queries executed via Sentinel Data Lake. Using summarize by Type in a union SigninLogs, AADNonInteractiveUserSignInLogs query fails with SemanticError: Failed to resolve scalar expression named 'Type'.
Solution: When you need to distinguish source tables in a union, add | extend TableName = "SigninLogs" (or "AADNonInteractive") within each union leg before the union operator. Then summarize by TableName. This is already handled in Query 13 via the SignInType field pattern (extend SignInType = "Interactive" / "Non-Interactive"), but ad-hoc summary variants must use the extend approach — never Type.

Non-Interactive Sign-In Noise

Problem: AADNonInteractiveUserSignInLogs may contain Logic Apps connector activity (de8c33bb) that looks like user activity but is automated.
Solution: When reporting Sentinel MCP auth events from SigninLogs, distinguish interactive (user-initiated) from non-interactive (automated) sources. The LogicApps connector is NOT MCP — exclude it from MCP auth counts.

AADNonInteractiveUserSignInLogs Commonly on Data Lake Tier

Problem: Many customers place AADNonInteractiveUserSignInLogs on Data Lake (or Basic) tier. When this table is NOT on Analytics tier, any Advanced Hunting query that unions SigninLogs + AADNonInteractiveUserSignInLogs fails with MPC -32600: The query should contain a single Basic or Auxiliary table or silently returns incomplete/unsorted data. This affects Query 1 (daily trend) and Query 6 (Triage MCP auth) in this skill.
Solution: All queries that union SigninLogs + AADNonInteractiveUserSignInLogs in this skill MUST use mcp_sentinel-data_query_lake instead of RunAdvancedHuntingQuery. Data Lake handles cross-table unions natively and works regardless of which tier each table is on. When running via Data Lake, CloudAppEvents uses TimeGenerated (not Timestamp as in AH). Queries 1, 6, and 15 are already configured for Data Lake.

Off-Hours Timezone Uncertainty

Problem: TimeGenerated is always UTC, but "off-hours" has different meaning depending on the user's timezone. A UTC 06:00 call might be 22:00 local or 14:00 local.
Solution: Default to UTC for off-hours calculation. If the user's timezone is known from sign-in data (LocationDetails), adjust. Always state the timezone assumption in the report.
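A small sketch of the off-hours check, assuming a business window of 08:00 to 18:00 local time (the window and the `is_off_hours` helper are illustrative assumptions, not values defined by this skill). It defaults to UTC and accepts a UTC offset when the user's timezone is known.

```python
from datetime import datetime, timedelta, timezone

# Assumed business window (local time); adjust to the organization's policy.
BUSINESS_START_HOUR = 8
BUSINESS_END_HOUR = 18


def is_off_hours(ts_utc, utc_offset_hours=0):
    """Classify a UTC timestamp as off-hours in the given fixed UTC offset."""
    local = ts_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return not (BUSINESS_START_HOUR <= local.hour < BUSINESS_END_HOUR)


call = datetime(2026, 2, 3, 6, 0, tzinfo=timezone.utc)

default_utc = is_off_hours(call)                          # 06:00 UTC
west_coast = is_off_hours(call, utc_offset_hours=-8)      # 22:00 local
east_asia = is_off_hours(call, utc_offset_hours=8)        # 14:00 local
```

The same 06:00 UTC call is off-hours under the UTC default and at UTC-8, but falls inside business hours at UTC+8, which is why the report should always state the timezone assumption. The sketch ignores weekends and daylight saving time.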

Multi-Tenant Token Confusion

Problem: Azure MCP Server uses DefaultAzureCredential and may authenticate against the wrong tenant if multiple credentials are cached, causing queries to fail or return data from an unexpected tenant.
Solution: Read config.json for the azure_mcp.tenant parameter. When making Azure MCP Server calls, always pass the tenant parameter explicitly. Note this risk in the report.

Rate Limiting Not Visible in Logs

Problem: Graph MCP Server is capped at 100 calls/min/user. If throttled, calls may not appear in logs (no log entry = no visibility).
Solution: If daily call counts show sudden drops to 0 after a high-volume period, note possible throttling. Check for 429 Too Many Requests response codes in Query 1 raw data.
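One way to flag the "sudden drop to 0 after a high-volume period" pattern is sketched below. The threshold of 1000 calls per day and the helper name `possible_throttling_days` are assumptions chosen for illustration. A flagged day is only a hint to look for 429 responses, not proof of throttling.

```python
def possible_throttling_days(daily_counts, high_volume=1000):
    """Return days with 0 calls that directly follow a high-volume day.

    daily_counts: list of (day, count) tuples in chronological order.
    """
    flagged = []
    for (_, prev_count), (day, count) in zip(daily_counts, daily_counts[1:]):
        if prev_count >= high_volume and count == 0:
            flagged.append(day)
    return flagged


# Hypothetical daily Graph MCP call counts.
trend = [
    ("2026-02-01", 1500),
    ("2026-02-02", 0),    # follows a high-volume day: flagged
    ("2026-02-03", 20),
    ("2026-02-04", 0),    # follows a low-volume day: not flagged
]
suspect_days = possible_throttling_days(trend)
```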

SentinelAudit Table Availability

Problem: SentinelAudit requires Sentinel auditing and health monitoring to be enabled. It may not exist in all workspaces.
Solution: If SentinelAudit returns table-not-found, skip gracefully. Report: "⚠️ Sentinel auditing not enabled — cannot check configuration changes."


Error Handling

Common Issues

Issue Solution
project ... as ErrorCode fails in AH Advanced Hunting rejects as keyword in project. Use = syntax: ErrorCode = tostring(...). See Known Pitfalls.
MPC -32600 error from Triage MCP Transient error — retry once. If persistent, fall back to mcp_sentinel-data_query_lake.
MicrosoftGraphActivityLogs table not found Graph activity logs not enabled. Report gap, skip Graph MCP analysis, provide enablement link.
LAQueryLogs table not found Diagnostic settings not configured on LA workspace. Report gap, skip governance analysis.
SentinelAudit table not found Sentinel health monitoring not enabled. Report gap, skip config change analysis.
AzureActivity returns 0 results No ARM operations in the time range, or no administrative actions by the specified user.
SigninLogs returns 0 for Sentinel Platform Services No one authenticated to Sentinel MCP in the time range. Report as "✅ No Sentinel MCP authentication events detected."
CloudAppEvents table not found Purview unified audit not available (requires E5 license). Report gap: "⚠️ CloudAppEvents not available — cannot monitor Data Lake MCP usage. Requires Microsoft 365 E5 or Purview audit." Skip Phase 3 (Data Lake MCP).
CloudAppEvents returns 0 for Sentinel operations No Data Lake MCP or Direct KQL activity in the time range. Report as "✅ No Sentinel Data Lake activity detected in CloudAppEvents."
ActionType has "Sentinel" returns 0 but data exists CamelCase bug — use contains instead of has for ActionType matching. See Known Pitfalls.
Operation has "Completed" drops MCP events silently Same CamelCase bug — has "Completed" returns false for ALL CamelCase operations (SentinelAIToolRunCompleted, KQLQueryCompleted). MCP events (RecordType 403) are silently dropped; Direct KQL survives only via OR RecordType == 379 fallback. Use contains "Completed". See Known Pitfalls.
RawEventData.ToolName returns empty Double-parse required: use parse_json(tostring(RawEventData)) then extract fields. See Known Pitfalls.
Query timeout Reduce lookback from 30d to 7d, or add `
Unknown AppId in LAQueryLogs Cross-reference with Entra ID > App Registrations. May be a custom MCP server or third-party tool.
Multiple workspaces available Follow workspace selection rules — STOP, list all, ASK user, WAIT.
Azure MCP calls indistinguishable from CLI Partially resolved: AppId 04b07795 is shared with Azure CLI. Use `\n
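The CamelCase rows in the table above follow from how term matching works: `has` looks for a whole term, while `contains` looks for any substring. The Python sketch below is a simplified model of that difference (tokens split on non-alphanumeric characters). It is not the KQL engine's implementation, and the function names are hypothetical.

```python
import re


def kql_has(text, term):
    """Approximate `has`: term must equal a whole alphanumeric token."""
    tokens = re.split(r"[^0-9A-Za-z]+", text)
    return term.lower() in (t.lower() for t in tokens if t)


def kql_contains(text, term):
    """Approximate `contains`: case-insensitive substring match."""
    return term.lower() in text.lower()


operation = "SentinelAIToolRunCompleted"

has_match = kql_has(operation, "Completed")            # single token, no match
contains_match = kql_contains(operation, "Completed")  # substring, matches
```

Because a CamelCase value has no delimiters, the whole string is one term, so a term match on "Completed" or "Sentinel" finds nothing while a substring match succeeds.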

Validation Checklist

Before presenting results, verify:

  • All MCP telemetry surfaces were queried (Graph, Sentinel Triage, Sentinel Data Lake, Azure ARM, LAQueryLogs, CloudAppEvents)
  • Tables that don't exist are reported as gaps, not silent omissions
  • Non-MCP sources (Sentinel Engine, Logic Apps, Sentinel Portal) are clearly labeled as "Platform/Portal (Non-MCP)"
  • 80ccca67 is classified as "M365 Security & Compliance Center (Sentinel Portal)" — NOT as an MCP server
  • 95a5d94c is classified as "Azure Portal — AppInsightsPortalExtension" — NOT as MCP Client or VS Code Copilot. Verify via RequestClientApp field.
  • MCP proportion calculation excludes non-MCP platform sources from the MCP numerator
  • Two-tier governance view included: Analytics tier (LAQueryLogs) + Data Lake tier (CloudAppEvents)
  • Data Lake MCP vs Direct KQL delineation is clearly presented (RecordType 403 vs 379)
  • CloudAppEvents queries use contains (not has) for ActionType matching
  • CloudAppEvents queries use contains (not has) for Operation field matching (same CamelCase issue)
  • CloudAppEvents RawEventData is parsed with parse_json(tostring(RawEventData)) pattern
  • Data Lake MCP tool call counts use SentinelAIToolRunCompleted only (not Started) to avoid double-counting
  • All user attribution is based on actual query results, not assumptions
  • Azure MCP Server detection uses AppId 04b07795 (Azure CLI) with empty RequestClientApp and query text \n| limit N suffix as differentiator. Present as "Azure MCP Server / Azure CLI (shared AppId)" in reports
  • Graph MCP sensitive endpoint percentage is calculated from actual data
  • Off-hours analysis states the timezone assumption (default: UTC)
  • Empty results are explicitly reported with ✅ (not silently omitted)
  • AppId cross-reference table is included for any unknown AppIds discovered
  • The MCP Usage Score calculation is transparent with per-dimension evidence
  • All ASCII visualizations are wrapped in code fences for markdown compatibility
  • Top MCP Users table (Q15) included in report with cross-server breadth ranking
  • If Agent Identity analysis is needed: refer user to ai-agent-posture skill for comprehensive agent audit

Prerequisites

For complete MCP server monitoring, ensure these data sources are enabled:

| Data Source | Enabling Documentation | Required For |
|---|---|---|
| Microsoft Graph activity logs | Enable Graph activity logs | Graph MCP analysis (Queries 1-2, 5, 9) |
| CloudAppEvents (Purview unified audit) | Requires M365 E5 license; enable Sentinel Data Lake auditing | Data Lake MCP analysis (Queries 10-12) |
| Sentinel auditing and health monitoring | Enable Sentinel monitoring | Config change detection (ad-hoc SentinelAudit queries) |
| LAQueryLogs (diagnostic settings) | Configure diagnostic settings on LA workspace | Workspace governance (Queries 7, 8, 14) |
| AzureActivity | Enabled by default for ARM operations | Azure MCP analysis (ad-hoc ARM queries) |
| SigninLogs | Entra ID diagnostic settings | Sentinel MCP auth events (Queries 3-4, 6, 13) |
| Purview audit logs | Included with E5 license | CloudAppEvents ingestion - required for Data Lake MCP monitoring (Queries 10-12). RecordType 403 (AI Tool) and 379 (KQL) |

If any prerequisite is not met, the skill will report the gap and skip the affected analysis sections.


Cross-References

  • KQL Jobs for proactive alerting: queries/identity/mcp_anomaly_detection_kql_jobs.md — Scheduled Data Lake jobs that promote MCP anomalies to analytics tier for automated Sentinel alerting
  • Main skill registry: .github/copilot-instructions.md — Skill detection and global rules
  • Scope drift analysis: .github/skills/scope-drift-detection/SKILL.md — Can be run on MCP-related service principals for behavioral drift detection
  • Sentinel Data Lake auditing: Auditing lake activities — Official docs on RecordType 403/379 audit events in CloudAppEvents
  • Sentinel MCP tool collections: Tool collection overview — Data Exploration, Triage, and Security Copilot Agent Creation collections
  • Sentinel MCP custom tools: Create custom MCP tools — Expose saved KQL queries as MCP tools
  • Copilot Studio MCP catalog: Built-in MCP servers — 19+ Microsoft-managed MCP servers for agent development
  • Azure MCP Server tools: Available tools — Full Azure MCP Server tool catalog (40+ namespaces)
  • Power BI MCP: Remote endpoint at https://api.fabric.microsoft.com/v1/mcp/powerbi, Modeling at microsoft/powerbi-modeling-mcp
  • Fabric RTI MCP: Fabric RTI MCP overview | GitHub
  • Playwright MCP: GitHub — Browser automation MCP (26.9k ⭐, local only)
  • AI Agent Posture: .github/skills/ai-agent-posture/SKILL.md — Comprehensive Copilot Studio agent security audit (for Agent Identity analysis, use this skill instead)

SVG Dashboard Generation

📊 Optional post-report step. After an MCP Usage report is generated, the user can request a visual SVG dashboard.

Trigger phrases: "generate SVG dashboard", "create a visual dashboard", "visualize this report", "SVG from the report"

How to Request a Dashboard

  • Same chat: "Generate an SVG dashboard from the report" — data is already in context.
  • New chat: Attach or reference the report file, e.g. #file:reports/mcp-usage/MCP_Usage_Report_<workspace>_<date>.md
  • Customization: Edit svg-widgets.yaml before requesting — the renderer reads it at generation time.

Execution

Step 1:  Read svg-widgets.yaml (this skill's widget manifest)
Step 2:  Read .github/skills/svg-dashboard/SKILL.md (rendering rules — Manifest Mode)
Step 3:  Read the completed report file (data source)
Step 4:  Render SVG → save to reports/mcp-usage/{report_name}_dashboard.svg

The YAML manifest is the single source of truth for layout, widgets, field mappings, colors, and data source documentation. All customization happens there.

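Generates a MITRE ATT&CK detection coverage report. A YAML-driven PowerShell script collects rule, alert, and SOC Optimization data, maps it to tactics and techniques, and identifies gaps. The LLM then renders a Markdown report with a coverage matrix, remediation suggestions for untagged rules, and a composite score to help improve detection capability.
Use when you need to: assess a Microsoft Sentinel workspace's detection coverage against the MITRE ATT&CK framework; identify coverage gaps or untagged detection rules; generate a detection capability report with SOC Optimization recommendations and a composite score.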
.github/skills/mitre-coverage-report/SKILL.md
npx skills add SCStelz/security-investigator --skill mitre-coverage-report -g -y
SKILL.md
Frontmatter
{
    "name": "mitre-coverage-report",
    "description": "MITRE ATT&CK Coverage Report — YAML-driven PowerShell pipeline gathers analytic rule MITRE tags, custom detection techniques, SOC Optimization recommendations, and alert\/incident operational data via az rest\/az monitor\/Graph API, writes a deterministic scratchpad, LLM renders the report. Covers tactic-level coverage matrix, technique-level drill-down with rule mapping, coverage gap identification, SOC Optimization threat scenario alignment, untagged rule remediation, ICS\/OT technique tracking, and MITRE Coverage Score (5 weighted dimensions). Inline chat and markdown file output.",
    "drill_down_prompt": "Run MITRE ATT&CK coverage report — tactic\/technique coverage, gaps, SOC optimization",
    "threat_pulse_domains": [
        "incidents"
    ]
}

MITRE ATT&CK Coverage Report — Instructions

Purpose

This skill generates a comprehensive MITRE ATT&CK Coverage Report analyzing detection coverage across the ATT&CK Enterprise framework. It inventories all analytic rules and custom detections, maps them to MITRE tactics and techniques, identifies coverage gaps, and provides prioritized recommendations for improving detection posture.

Entity Type: Sentinel workspace (from config.json)

| Scope | Data Sources | Use Case |
|---|---|---|
| Workspace-wide (default) | Analytic Rules (REST), Custom Detections (Graph), SOC Optimization (REST), SecurityAlert/SecurityIncident (KQL) | Full MITRE coverage analysis |
| Operational correlation | SecurityAlert, SecurityIncident | Which MITRE-tagged rules actually produce alerts and incidents |

What this report covers: Tactic-level coverage matrix with per-tactic technique counts and percentages, technique-level drill-down with rule-to-technique mapping, coverage gap identification against the full ATT&CK Enterprise framework, SOC Optimization threat scenario alignment (AiTM, ransomware, BEC, etc.), untagged rule remediation with AI-suggested MITRE tags, ICS/OT technique tracking, operational MITRE correlation (which rules actually fire), and a composite MITRE Coverage Score.

Complementary to: This skill pairs with the sentinel-ingestion-report skill — ingestion report covers data volume, tier optimization, and cost; MITRE coverage report covers detection posture against the ATT&CK framework. Run both for a complete workspace assessment.


Architecture

 ┌──────────────────────────────────────────────────────────────────┐
 │  YAML query files        PowerShell script         LLM render   │
 │  queries/phase1-3/  ──→  Invoke-MitreScan.ps1  ──→  Phase 4    │
 │  (9 .yaml files)         (~1030 lines)             (SKILL-      │
 │                          • az rest (Sentinel API)   report.md)  │
 │                          • Invoke-MgGraphRequest                │
 │                          • az monitor (KQL)                     │
 │                          • mitre-attck-enterprise.json          │
 │                          • m365-platform-coverage.json (CTID)   │
 │                          ↓                                      │
 │                     temp/mitre_scratch_<ts>.md                  │
 │                     (~35 KB, 18+ sections)                     │
 └──────────────────────────────────────────────────────────────────┘

Execution model:

  • Phases 1-3 (data gathering): Fully automated by Invoke-MitreScan.ps1. Phase 1 uses az rest (Sentinel REST API) and optionally Invoke-MgGraphRequest (Graph API). Phase 2 uses az rest (SOC Optimization API). Phase 3 uses az monitor log-analytics query (KQL).
  • Phase 4 (rendering): LLM reads the scratchpad + SKILL-report.md and renders the report. This is the only phase requiring LLM involvement.

Static reference: mitre-attck-enterprise.json contains ATT&CK Enterprise v16.1 with 14 tactics, 216 techniques, and 475 sub-techniques. The PS1 loads this at startup to compute coverage gaps against the full framework. This file is version-controlled and should be updated when MITRE publishes new ATT&CK releases.

Platform coverage reference: m365-platform-coverage.json is a compact CTID (Center for Threat-Informed Defense) mapping of M365 Defender product capabilities to ATT&CK techniques. Contains detect/protect/respond coverage for 81 detect techniques across 38 capabilities (7 SecurityAlert product groups). Used for the 3-tier platform coverage classification:

  • Tier 1 (Alert-Proven): SecurityAlert from M6 query has MITRE technique attribution — highest confidence
  • Tier 2 (Deployed Capability): Product is active (has alerts) and CTID claims detect coverage for the technique — medium confidence
  • Tier 3 (Catalog Capability): CTID maps coverage but no alert evidence for the product in this workspace — lowest confidence
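The three tiers above can be read as a precedence check, sketched below. The function name, its boolean inputs, and the rule that the highest-confidence tier wins are assumptions for illustration. The actual classification is computed by the PS1.

```python
def classify_tier(has_alert_attribution, product_active, ctid_detect):
    """Return the platform coverage tier for one technique, or None.

    has_alert_attribution: a SecurityAlert (M6) attributes this technique.
    product_active: the mapped product has alerts in this workspace.
    ctid_detect: the CTID mapping claims detect coverage for the technique.
    """
    if has_alert_attribution:
        return "Tier 1 (Alert-Proven)"
    if ctid_detect and product_active:
        return "Tier 2 (Deployed Capability)"
    if ctid_detect:
        return "Tier 3 (Catalog Capability)"
    return None  # no platform coverage evidence at any tier


alert_proven = classify_tier(True, True, True)
deployed = classify_tier(False, True, True)
catalog_only = classify_tier(False, False, True)
```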

To rebuild from upstream: download the CTID M365 mapping JSON, transform with PowerShell (group by parent technique, map capabilities to SecurityAlert ProductName). See temp/ctid_raw.json for the raw source.


Companion Files — When to Load

| File | Purpose | When to Load |
|---|---|---|
| SKILL.md (this file) | Architecture, workflow, rendering rules, score methodology, domain reference | Always - primary entry point |
| SKILL-report.md | Report templates (§1-§6), section-to-scratchpad mapping, formatting rules | Phase 4 rendering only |
| Invoke-MitreScan.ps1 | PowerShell data-gathering pipeline (Phases 1-3) | Execution only - no need to read unless debugging |
| mitre-attck-enterprise.json | ATT&CK Enterprise v16.1 static reference | Referenced by PS1 at runtime - no manual loading |
| m365-platform-coverage.json | CTID M365 platform coverage reference (detect/protect/respond) | Referenced by PS1 at runtime - no manual loading |

📑 TABLE OF CONTENTS

  1. Quick Start - 3-step execution pattern
  2. Critical Workflow Rules - Prerequisites and prohibitions
  3. Execution Workflow - Phases 0-4
  4. Query File Reference - All 9 YAML files
  5. Output Modes - Inline chat vs. Markdown file
  6. Deterministic Rendering Rules - Rules A-D (mandatory for Phase 4)
  7. MITRE Coverage Score - 5-dimension scoring methodology
  8. Domain Reference - ATT&CK interpretation, tactic priorities, Sentinel-specific mappings
  9. SVG Dashboard Generation - Visual dashboard from completed report

Quick Start (TL;DR)

3-step execution pattern:

Step 1:  Run Invoke-MitreScan.ps1 (Phases 1-3 — data gathering)
Step 2:  Read scratchpad + SKILL-report.md (Phase 4 prep)
Step 3:  Render report incrementally (§1 via create_file, then §2–§6 appended via replace_string_in_file)

Step 1: Run Data Gathering

# From workspace root — run all phases (default: 30 days alert/incident lookback):
& ".github/skills/mitre-coverage-report/Invoke-MitreScan.ps1"

# Specify a custom alert/incident lookback:
& ".github/skills/mitre-coverage-report/Invoke-MitreScan.ps1" -Days 7

# Run a specific phase (for re-runs / debugging):
& ".github/skills/mitre-coverage-report/Invoke-MitreScan.ps1" -Phase 1

Output: Scratchpad file at temp/mitre_scratch_<timestamp>.md (~28 KB, 12 sections).

Timing: Full run takes ~60-90 seconds (varying with REST API response times and KQL auth state).

Step 2: Load Rendering Context

  1. Read the scratchpad file (path printed by PS1 at completion)
  2. Read SKILL-report.md for rendering templates

Step 3: Render Report (Incremental Writes)

Render the report across multiple tool calls — one section per call — to avoid single-call output token limits that truncate large reports:

  1. create_file → header + disclaimer + §1 (Executive Summary, Score, Inventory, Top 3 Recs)
  2. replace_string_in_file → append §2 (Tactic Coverage Matrix)
  3. replace_string_in_file → append §3 (Technique Deep Dive — largest section)
  4. replace_string_in_file → append §4 (Coverage Gap Analysis)
  5. replace_string_in_file → append §5 (Operational MITRE Correlation)
  6. replace_string_in_file → append §6 + Appendix

Apply SKILL-report.md templates to scratchpad data, following Rules A–D. See SKILL-report.md for full section templates and the anchor pattern for each append.

🔴 Verbatim table sections — use the deterministic slicer, never hand-copy. Several report tables (§3 TechniqueTables, §5.1 CombinedTacticCoverage, §5.2 AlertFiring, §5.3 ActiveVsTagged, §5.4 IncidentsByTactic, §5.5 DataReadiness, §5.6 ConnectorHealth) are pre-rendered by the PS1 under ## PRERENDERED in the scratchpad. Copy them with the read-only helper instead of transcribing by hand:

python .github/skills/mitre-coverage-report/slice_scratch.py --scratch temp/mitre_scratch_<ts>.md --list
python .github/skills/mitre-coverage-report/slice_scratch.py --scratch temp/mitre_scratch_<ts>.md --section AlertFiring

The slicer prefers the ## PRERENDERED copy when a section name also exists as a raw data block, strips pipeline scaffolding (<!-- … --> comments, SectionTitle: markers) wherever it appears, preserves #### sub-headings, and collapses blank runs — so the output drops straight into the report as a valid markdown table. Do NOT paste the raw Key | Value | … data blocks (the ones with a <!-- header --> comment and no |---| separator row) — they render as plain text, not tables, and pasting the whole scratchpad tail into one section corrupts the report.
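To make the cleanup steps described above concrete, here is a rough Python sketch of the same kind of processing: dropping HTML comments and SectionTitle: markers, then collapsing blank runs. This is not the actual slice_scratch.py. The function name and the assumption that markers appear as whole lines starting with `SectionTitle:` are illustrative only.

```python
import re


def clean_section(text):
    """Strip scaffolding and collapse blank runs in one scratchpad section."""
    # Remove <!-- ... --> comments wherever they appear.
    text = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)
    cleaned = []
    for line in text.splitlines():
        line = line.rstrip()
        if line.strip().startswith("SectionTitle:"):
            continue  # assumed marker format: a whole line
        if line == "" and cleaned and cleaned[-1] == "":
            continue  # collapse consecutive blank lines into one
        cleaned.append(line)
    return "\n".join(cleaned).strip("\n")


raw = (
    "<!-- header -->\n"
    "SectionTitle: AlertFiring\n"
    "#### By rule\n"
    "\n\n\n"
    "| Rule | Alerts |\n"
    "|---|---|\n"
    "| R1 | 3 |\n"
)
sliced = clean_section(raw)
```

The #### sub-heading and the table, including its |---| separator row, survive unchanged, so the result can be pasted as a valid markdown table.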

⛔ Do NOT render §1–§6 in a single create_file call. The output will truncate silently. The scratchpad is ~60 KB; the rendered report exceeds the single-call output budget.

🔴 ALL 6 WRITES ARE MANDATORY (1 create_file + 5 appends). Do NOT stop after §5: §6 (Recommendations) and the Appendix (Score Methodology, Limitations) are critical and must be appended. After the final append, run grep_search for ## 6. Recommendations and ## Appendix on the report file to verify both exist. If either is missing, append the missing content immediately.
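The completeness check can also be expressed as a small script, for example when validating a saved report outside the chat session. This is a sketch, assuming the two headings appear at the start of a line exactly as quoted above. The helper name `missing_headings` is hypothetical.

```python
REQUIRED_HEADINGS = ("## 6. Recommendations", "## Appendix")


def missing_headings(report_text, required=REQUIRED_HEADINGS):
    """Return the required headings that do not start any line of the report."""
    lines = [line.strip() for line in report_text.splitlines()]
    return [h for h in required if not any(l.startswith(h) for l in lines)]


# Hypothetical truncated report: the Appendix was never appended.
truncated = "# Report\n\n## 5. Operational MITRE Correlation\n\n## 6. Recommendations\n"
complete = truncated + "\n## Appendix\n"

gaps = missing_headings(truncated)
no_gaps = missing_headings(complete)
```

A non-empty result names exactly which section still needs to be appended.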


⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

Before starting ANY MITRE coverage report:

  1. Run Invoke-MitreScan.ps1 — this single script handles ALL data gathering (Phases 1-3). The LLM does NOT run queries, transcribe output, or write scratchpad sections
  2. Read config.json for workspace ID, tenant, subscription, and Azure MCP parameters
  3. ALWAYS ask the user for output mode if not specified: inline chat summary, markdown file report, or both (default: both)
  4. ALWAYS ask the user for timeframe if not specified: the -Days parameter controls the alert/incident KQL lookback (Phase 3). Default: 30 days. Phases 1-2 (REST API) are not time-bounded
  5. ALWAYS use create_file for markdown reports (never use terminal commands)
  6. ALWAYS sanitize PII from saved reports — use generic placeholders for real rule names, workspace names, and tenant GUIDs in committed files
  7. Read scratchpad + SKILL-report.md before rendering — the scratchpad is the sole data source
  8. Custom Detections may be SKIPPED — the Graph API requires CustomDetection.Read.All which needs admin consent. If skipped, the report notes this and shows AR-only analysis. Do NOT treat SKIPPED as an error — it's a graceful degradation

Prerequisites

| Dependency | Required By | Setup |
|---|---|---|
| Azure CLI (az) | All phases (REST + KQL) | Install: aka.ms/installazurecli. Authenticate: az login --tenant <tenant_id> then az account set --subscription <subscription_id> |
| Azure RBAC | Phase 1-2 (REST API) | Microsoft Sentinel Reader on the workspace (analytic rule inventory + SOC Optimization) |
| KQL auth | Phase 3 (az monitor) | az login with https://api.loganalytics.io/.default scope (CA policy may enforce re-auth) |
| Microsoft.Graph PowerShell | Phase 1 M2 (Custom Detections) | Install-Module Microsoft.Graph.Authentication -Scope CurrentUser. Required scope: CustomDetection.Read.All. PS1 skips gracefully if unavailable |
| PowerShell 7.0+ | Script execution | #Requires -Version 7.0 |

🔴 PROHIBITED

  • ❌ Running REST/KQL queries via MCP tools during data gathering — PS1 handles all queries
  • ❌ Writing or modifying scratchpad sections manually — PS1 is the sole writer
  • ❌ Fabricating technique counts, rule names, or coverage percentages
  • ❌ Inventing ATT&CK technique IDs or names not in the reference JSON
  • ❌ Overriding MITRE Coverage Score dimensions — the PS1 computes these deterministically
  • ❌ Rendering the report without first reading the scratchpad file
  • ❌ Reporting "100% coverage" for any tactic unless the data actually shows every technique covered

Execution Workflow

Phase 0: Initialization

  1. Read config.json for sentinel_workspace_id, subscription_id, Azure MCP parameters
  2. Confirm output mode and timeframe with user (pass -Days to PS1; default 30)
  3. Verify prerequisites: az login session active, correct subscription set

Phases 1-3: Data Gathering (automated by PS1)

Run Invoke-MitreScan.ps1 — it handles all 3 phases automatically:

| Phase | Queries | Description | Execution Type |
|---|---|---|---|
| 1 | M1, M2 | Rule inventory - Analytic rules with MITRE tactics/techniques (REST), Custom Detection rules with mitreTechniques (Graph, graceful skip) | REST + Graph |
| 2 | M3 | SOC Optimization - Coverage recommendations with threat scenario context, MITRE tagging suggestions for untagged rules | REST |
| 3 | M4, M5, M6, M7, M8, M9 | Operational correlation - SecurityAlert firing counts per rule with MITRE cross-reference, SecurityIncident volume by tactic, platform-native alert MITRE coverage, table ingestion volume for data readiness validation, data connector health from SentinelHealth, Log Analytics table tier metadata | KQL + CLI |

Post-processing (automated by PS1):

| Task | Phase | Description |
|---|---|---|
| Tactic coverage matrix | 1 | For each ATT&CK tactic, count enabled rules and covered techniques against the framework reference |
| Technique drill-down | 3 | Map every framework technique to its covering rules AND pre-compute tier/product annotations from CTID cross-reference |
| Untagged rule identification | 1 | Find rules with no MITRE tactics AND no techniques |
| ICS technique extraction | 1 | Separate T0xxx (ICS/OT) technique mappings |
| Threat scenario parsing | 2 | Extract active/recommended detection counts and per-tactic breakdowns from SOC Optimization |
| AI MITRE tagging suggestions | 2 | Extract suggested tactics/techniques for untagged rules. Cross-reference against Phase 1 actual rule tags to verify if suggestions were applied (emits VerifyStatus: Applied/Partial/NotApplied/NotFound per rule, plus summary counts AR_TagsApplied/AR_TagsPartial/AR_TagsNotApplied/AR_TagsNotFound) |
| Alert-to-MITRE correlation | 3 | Cross-reference firing alerts with Phase 1 MITRE tags |
| Active tactic coverage | 3 | Compute which tactics have rules that actually fire alerts |
| Platform alert MITRE extraction | 3 | Extract MITRE techniques attributed by platform-native product alerts (M6) |
| Product presence detection | 3 | Derive active M365 Defender products from SecurityAlert ProductName |
| CTID tier classification | 3 | Cross-reference active products with CTID mapping to classify techniques as Tier 1/2/3 |
| Combined tactic coverage | 3 | Merge custom rule and platform Tier 1/2 coverage per tactic |
| Data readiness cross-reference | 3 | Extract KQL table dependencies from rule queries, cross-reference with M7 ingestion volumes, classify rules as Ready/Partial/NoData |
| Connector health enrichment | 3 | Cross-reference M8 SentinelHealth connector status with Data Readiness - flag "Ready" rules whose feeding connector is degraded or failing |
| Table tier classification | 3 | Cross-reference M9 table tier metadata with rule KQL table dependencies - flag rules targeting Basic/Data Lake tier tables as "TierBlocked" (phantom coverage: rule structurally cannot fire regardless of data volume) |
| Coverage Score computation | All | Weighted composite score from 5 dimensions |
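The VerifyStatus values emitted by the tagging-suggestion task can be illustrated with a short sketch. The exact matching rules used by the PS1 are not given here, so the semantics below are assumptions: Applied means every suggested tag is present, Partial means some are, NotApplied means none are, and NotFound means the rule is absent from the Phase 1 inventory. The function name is hypothetical.

```python
def verify_status(suggested, actual):
    """Compare suggested MITRE tags with a rule's actual tags.

    suggested: iterable of suggested tactic/technique tags.
    actual: iterable of the rule's current tags, or None if the rule
            was not found in the Phase 1 inventory.
    """
    if actual is None:
        return "NotFound"
    suggested = set(suggested)
    applied = suggested & set(actual)
    if suggested and applied == suggested:
        return "Applied"
    if applied:
        return "Partial"
    return "NotApplied"


# Hypothetical suggestions and rule tags.
fully = verify_status({"T1078", "T1110"}, {"T1078", "T1110", "T1566"})
partly = verify_status({"T1078", "T1110"}, {"T1078"})
none_applied = verify_status({"T1078"}, set())
not_found = verify_status({"T1078"}, None)
```

Counting each outcome across all rules would yield summary values analogous to AR_TagsApplied, AR_TagsPartial, AR_TagsNotApplied, and AR_TagsNotFound.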

Scratchpad output: PS1 writes all results to temp/mitre_scratch_<timestamp>.md (~28 KB, ~12 named sections). See SKILL-report.md for the Section-to-Scratchpad Mapping.

Phase 4: Render Output (LLM)

🔴 MANDATORY — Load scratchpad + report template before rendering:

  1. Read the scratchpad file (path printed by PS1). This single file contains ALL data from Phases 1-3.
  2. Read SKILL-report.md for the complete rendering templates and formatting rules.

Pre-render validation:

  1. Verify scratchpad has all 3 phase sections (PHASE_1 through PHASE_3)
  2. Check SCORE section has all 5 dimensions
  3. If Phase 3 shows FAILED for M4/M5 (token expiry), note this in the report — the Operational dimension defaults to 0

Render — Section-by-Section:

| Section | Data Source (scratchpad keys) | Required |
|---|---|---|
| §1 Executive Summary | All phases + SCORE | ✅ Coverage Score, Workspace at a Glance, Top 3 |
| §2 Tactic Coverage | PHASE_1.TacticCoverage | ✅ 14-tactic matrix with coverage % |
| §3 Technique Deep Dive | PHASE_3.TechniqueDetail (enriched with Tier/TierProducts) | ✅ Per-tactic technique tables with pre-computed tier badges |
| §4 Coverage Gap Analysis | PHASE_1.TacticCoverage + PHASE_3.TechniqueDetail + PHASE_2.ThreatScenarios | ✅ Gaps, priorities, threat scenario alignment |
| §5 Operational MITRE Correlation | PHASE_3.AlertFiring + IncidentsByTactic + ActiveTacticCoverage + PlatformAlertCoverage + PlatformTechniquesByTier + PlatformTacticCoverage + DataReadiness + DataReadiness_Summary + MissingTables + TierBlockedTables + ConnectorHealth + ConnectorHealth_Summary | ✅ Which rules fire, platform coverage, combined tactic view, data readiness, tier-blocked phantom coverage, connector health |
| §6 Recommendations | All phases | ✅ Untagged rule remediation, Content Hub suggestions, coverage priorities |

Query File Reference

All queries are defined as YAML files in queries/phase1-3/.

YAML Format

id: mitre-m1                                   # Unique identifier
name: Analytic Rule MITRE Extraction            # Human-readable name
description: Fetch rules with tactics/techniques # What it does
phase: 1                                        # Which phase (1-3)
type: rest                                      # rest | graph | kql
url: https://management.azure.com/...           # REST API URL with placeholders
jmespath: value[].{...}                         # JMESPath projection (REST)

Complete Query Inventory

| Phase | File | ID | Type | Description |
|---|---|---|---|---|
| 1 | M1-AnalyticRuleMitre.yaml | mitre-m1 | rest | Scheduled + NRT analytic rules with MITRE tactics, techniques, severity, query text |
| 1 | M2-CustomDetectionMitre.yaml | mitre-m2 | graph | Custom Detection rules with mitreTechniques (graceful skip if auth unavailable) |
| 2 | M3-SocOptCoverage.yaml | mitre-m3 | rest | SOC Optimization coverage recommendations with threat scenarios and MITRE tagging suggestions |
| 3 | M4-AlertFiringByMitre.yaml | mitre-m4 | kql | SecurityAlert firing counts per rule with severity breakdown (30d lookback) |
| 3 | M5-IncidentsByTactic.yaml | mitre-m5 | kql | SecurityIncident volume by tactic with classification breakdown |
| 3 | M6-PlatformAlertCoverage.yaml | mitre-m6 | kql | Platform-native SecurityAlert detections with MITRE technique attribution (excludes custom rules) |
| 3 | M7-TableIngestionVolume.yaml | mitre-m7 | kql | 7-day average daily ingestion volume per table from Usage table for data readiness validation |
| 3 | M8-ConnectorHealth.yaml | mitre-m8 | kql | SentinelHealth data connector fetch status - latest state, success/failure counts, health % per connector (supplements M7 with early-warning connector failure detection) |
| 3 | M9-TableTierClassification.yaml | mitre-m9 | cli | Log Analytics table tier metadata (Analytics/Basic/Data Lake) via az monitor log-analytics workspace table list - identifies tables that analytics rules cannot query |

Output Modes

Mode 1: Inline Chat Summary (default for quick requests)

Compact executive summary rendered directly in chat with MITRE Coverage Score and top coverage gaps.

Mode 2: Markdown File Report

Full detailed report saved to reports/sentinel/mitre_coverage_report_<YYYYMMDD_HHMMSS>.md.

Mode 3: Both (default when user says "report" or "generate report")

Inline chat executive summary + full markdown file.

Ask user if not specified:

"How would you like the MITRE coverage report? I can provide:

  1. Inline chat summary — MITRE Score + top gaps in chat
  2. Markdown file — detailed report saved to reports/sentinel/
  3. Both (recommended) — summary in chat + full report file"

Deterministic Rendering Rules

These rules eliminate LLM interpretation variance. Apply them EXACTLY during Phase 4 rendering.

Rule A: Coverage Level Classification

Assign emoji badges to each tactic row in the coverage matrix based on the percentage of techniques covered:

Coverage % Badge Level
0% 🔴 No coverage
1-15% 🟠 Critical gap
16-30% 🟡 Partial
31-50% 🔵 Moderate
51-75% 🟢 Good
>75% Strong

⛔ PROHIBITED: Assigning badges based on "importance" or "this tactic is more relevant." The badge MUST match the percentage threshold table above.
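
The threshold table in Rule A can be read as a pure lookup. The sketch below is illustrative only and is not the skill's own code; the function name `coverage_level` is hypothetical, and it assumes that fractional percentages fall into the band whose upper bound they do not exceed (for example 15.5% counts as Partial).

```python
def coverage_level(pct: float):
    """Return (badge, level) for a tactic's technique coverage percentage.

    Assumption: upper bounds are inclusive, so 15.0 is "Critical gap"
    and anything above 15 up to 30 is "Partial".
    """
    if pct < 0 or pct > 100:
        raise ValueError("coverage percentage must be between 0 and 100")
    if pct == 0:
        return ("🔴", "No coverage")
    if pct <= 15:
        return ("🟠", "Critical gap")
    if pct <= 30:
        return ("🟡", "Partial")
    if pct <= 50:
        return ("🔵", "Moderate")
    if pct <= 75:
        return ("🟢", "Good")
    # The threshold table shows no badge for the >75% row, so none is invented here.
    return ("", "Strong")
```

Because the function takes only the percentage, there is no input through which "importance" or relevance could influence the badge, which is the point of the prohibition above.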

Rule B: Threat Scenario Priority

When rendering SOC Optimization threat scenarios, order by coverage gap (recommended minus active) descending, but assign badges based on completion rate (proportional to scenario size):

Completion Rate Priority Badge
<15% 🔴 High Very early stage — most recommendations unaddressed
15–35% 🟠 Medium Work in progress — significant room for improvement
35–60% 🟡 Low Approaching healthy coverage for typical environments
≥60% ✅ Met Strong coverage — well above realistic implementation targets

Why rate-based? Recommendation counts reflect the full Content Hub template catalogue including templates for vendor products not deployed in the environment (e.g., all firewall vendors). A 609-rule scenario will be permanently 🔴 under absolute-gap thresholds even at 80% coverage. Rate-based badges give proportional, meaningful progress signals.

CompletedBySystem note: CompletedBySystem is a SOC Optimization state, not a rate indicator. Some CompletedBySystem entries have low rates (recommended >> active). Always use the completion rate for badge assignment. The State column is displayed for context but does NOT override the rate-based badge.
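
A minimal sketch of Rule B, assuming each scenario is a dictionary with hypothetical keys `name`, `recommended`, `active`, and `completion_rate` (the rate is taken as given rather than recomputed). The band boundaries in the table overlap at 15, 35, and 60; the sketch assumes the lower bound is inclusive, which matches the `<15%` and `≥60%` rows.

```python
def priority_badge(completion_rate: float) -> str:
    """Map a completion rate (0-100) to the Rule B priority badge."""
    if completion_rate < 15:
        return "🔴 High"
    if completion_rate < 35:
        return "🟠 Medium"
    if completion_rate < 60:
        return "🟡 Low"
    return "✅ Met"


def order_scenarios(scenarios):
    """Sort by coverage gap (recommended minus active), largest first,
    and attach the rate-based badge to each row."""
    ranked = sorted(
        scenarios,
        key=lambda s: s["recommended"] - s["active"],
        reverse=True,
    )
    return [dict(s, badge=priority_badge(s["completion_rate"])) for s in ranked]
```

With these assumptions, a 609-rule scenario at roughly 80% completion still sorts near the top because its absolute gap is large, but it carries the ✅ Met badge, which is the behaviour the "Why rate-based?" note describes.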

Rule C: "Paper Tiger" Detection

When Phase 3 data is available, identify paper tiger rules — rules with MITRE tags that have NEVER produced an alert in the lookback period. These rules are tagged but non-operational, and their coverage is theoretical, not proven.

Condition Classification Display
Rule tagged with MITRE + 0 alerts in lookback ⚠️ Paper tiger Note in technique drill-down
Rule tagged with MITRE + ≥1 alert ✅ Operationally validated Normal display
Phase 3 data unavailable (FAILED/SKIPPED) Skip paper-tiger analysis, note data gap

⛔ PROHIBITED: Reporting coverage percentages as "validated" when Phase 3 data is missing. If M4/M5 failed, state: "Coverage percentages reflect rule tagging only — operational validation unavailable (Phase 3 KQL queries failed)."
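
The Rule C table reduces to a three-way decision per rule. The following is a hedged sketch; `classify_rule` and its parameters are hypothetical names, and `alert_count=None` is an assumed convention for "Phase 3 data unavailable".

```python
from typing import Optional


def classify_rule(has_mitre_tags: bool, alert_count: Optional[int]) -> Optional[str]:
    """Classify a rule for paper-tiger analysis.

    alert_count is the number of alerts in the lookback period, or None
    when Phase 3 queries FAILED or were SKIPPED (assumed convention).
    """
    if not has_mitre_tags:
        return None  # untagged rules are outside paper-tiger analysis
    if alert_count is None:
        return "Data gap: operational validation unavailable"
    if alert_count == 0:
        return "⚠️ Paper tiger"
    return "✅ Operationally validated"
```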

Rule D: Recommendation Ranking

Rank recommendations by impact using this priority order:

Priority Category Criteria
1 🔴 Low-rate threat scenarios SOC Optimization scenarios with <15% completion rate. Exclude CompletedByUser scenarios with ≥50% completion rate (Rule E — Reviewed & Addressed). Only include ⚠️ Premature CompletedByUser (<50% rate)
2 🔴 Zero-coverage detectable tactics Tactics with 0% coverage AND ✅ Detectable classification (see tactic table). Exclude ⬜ Inherent blind spot tactics (Reconnaissance, Resource Development) — report these as acknowledged limitations, not actionable gaps
3 🟠 Untagged rule remediation Rules with AI-suggested MITRE tags from SOC Optimization
4 🟠 Paper tiger rules MITRE-tagged rules that never fire (if Phase 3 available)
5 🟡 Low-coverage tactics Tactics with 1-15% coverage
6 🟡 Content Hub suggestions Template-based rules available for uncovered techniques
7 ⬜ Inherent blind spot tactics Zero-coverage tactics classified as ⬜ Inherent blind spot. Acknowledge the limitation; suggest compensating controls (threat intel feeds, brand monitoring) only if relevant to the organization

Rule E: CompletedByUser Completion-Rate Gate

When a SOC Optimization threat scenario has State == CompletedByUser, the user has manually marked it as reviewed. However, marking a scenario "complete" after enabling 2/500 recommendations is fundamentally different from enabling 28/46. Use the completion rate (ActiveDetections / RecommendedDetections × 100) to determine rendering treatment:

CompletedByUser + Completion Rate Treatment Rationale
≥ 50% 🟢 Reviewed & Addressed — render in a separate muted "Reviewed Scenarios" summary below the active gaps table. Exclude from §6 recommendations and Coverage Priority Matrix User has genuinely triaged the scenario; remaining gap is likely non-applicable templates or platform-only coverage
< 50% ⚠️ Premature Completion — render in the main active gaps table with full gap badge + ⚠️ flag in the State column. Include in §6 recommendations Gap is too large relative to recommendations to be a deliberate triage decision

Threshold: 50% is the default. This balances trust in the user's judgment against protection from rubber-stamped completions.

Scratchpad column: CompletionRate is pre-computed by the PS1 and included in the ThreatScenarios row. The LLM reads this value directly — do not recompute it.

Interaction with Rule B (rate-based badges): Rule B still applies for badge assignment on all scenarios. Rule E only controls where CompletedByUser scenarios are rendered (active table vs reviewed summary) and whether they appear in §6 recommendations.

CompletedBySystem scenarios are not affected — they continue to use rate-based badges (Rule B) without the completion-rate gate, since the system assessment is independent of user action.
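
As an illustration of the Rule E gate, the sketch below returns the rendering treatment for a scenario. The function name and the returned keys are hypothetical; in the workflow described above the rate would be read from the scratchpad `CompletionRate` column rather than computed.

```python
from typing import Optional


def completed_by_user_treatment(
    state: str, completion_rate: float, threshold: float = 50.0
) -> Optional[dict]:
    """Return where a CompletedByUser scenario is rendered.

    Returns None for any other state, because the gate does not apply
    (for example CompletedBySystem keeps plain Rule B handling).
    """
    if state != "CompletedByUser":
        return None
    if completion_rate >= threshold:
        return {
            "label": "🟢 Reviewed & Addressed",
            "table": "reviewed",          # separate muted summary
            "in_recommendations": False,  # excluded from §6
        }
    return {
        "label": "⚠️ Premature Completion",
        "table": "active",               # main active gaps table
        "in_recommendations": True,
    }
```

Using the two cases from the text: 28/46 is about 61% and lands in the reviewed summary, while 2/500 is 0.4% and stays in the active table with the ⚠️ flag.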


MITRE Coverage Score

The MITRE Coverage Score is a composite metric (0-100) computed by the PS1 from 5 weighted dimensions. Each dimension scores 0-100 independently, then the weighted sum produces the final score.

Dimensions

# Dimension Weight Formula What It Measures
1 Breadth 25% (Σ per-technique readiness credit / total ATT&CK techniques) × 100 blended 60/40 with combined platform coverage Readiness-weighted technique coverage. Each technique gets fractional credit based on the best rule covering it: Fired=1.0, Ready=0.75, Partial=0.50, NoData=0.25, TierBlocked=0.0. AR and CD rules follow the same readiness constraints. One firing rule gives full credit even if other rules covering the same technique are NoData
2 Balance 10% (tactics with ≥1 rule / 14 tactics) × 100 Whether coverage spans all kill chain phases or clusters in a few
3 Operational 30% (MITRE-tagged rules that fired alerts / total MITRE-tagged enabled rules) × 100 Whether tagged rules actually produce detections (not paper tigers). Highest weight: directly rewards purple teaming and operationally validated detections
4 Tagging 15% (rules with MITRE tags / total rules) × 100 Completeness of MITRE classification across the rule inventory
5 SOC Alignment 20% (completed SOC recommendations / total SOC coverage recommendations) × 100 Alignment with Microsoft's threat-scenario-driven coverage model

Score Interpretation

| Score Range | Assessment | Typical Profile |
| --- | --- | --- |
| 80-100 | 🟢 Strong | Broad coverage, balanced tactics, operationally validated, well-tagged, SOC-aligned |
| 60-79 | 🔵 Good | Solid coverage with some gaps; may have clustering or unvalidated rules |
| 40-59 | 🟡 Moderate | Significant gaps in breadth or operational validation; improvement opportunities |
| 20-39 | 🟠 Developing | Limited coverage across the framework; many uncovered tactics |
| 0-19 | 🔴 Critical | Minimal detection coverage; urgent investment needed |
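
The weighted sum and the interpretation bands can be sketched as follows. This is not the PS1 implementation, only an illustration of the arithmetic; the dictionary keys are hypothetical, each dimension is assumed to be already scored 0-100, and non-integer scores are assumed to fall into the band whose lower bound they reach.

```python
# Weights in percent, taken from the Dimensions table.
WEIGHTS = {
    "breadth": 25,
    "balance": 10,
    "operational": 30,
    "tagging": 15,
    "soc_alignment": 20,
}


def mitre_coverage_score(dimensions: dict) -> float:
    """Weighted sum of the five dimension scores (each 0-100)."""
    return sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS) / 100


def assessment(score: float) -> str:
    """Map a composite score to its interpretation band."""
    if score >= 80:
        return "🟢 Strong"
    if score >= 60:
        return "🔵 Good"
    if score >= 40:
        return "🟡 Moderate"
    if score >= 20:
        return "🟠 Developing"
    return "🔴 Critical"
```

For example, dimension scores of 20 (Breadth), 50 (Balance), 40 (Operational), 80 (Tagging), and 50 (SOC Alignment) give 5 + 5 + 12 + 12 + 10 = 44, which falls in the Moderate band.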

Score Context Notes

  • Operational = 0 when Phase 3 KQL queries fail (token expiry). Report this: "Operational score 0 reflects data unavailability, not necessarily poor operational coverage."
  • SOC Alignment = 50 (default) when no SOC Optimization recommendations exist. This is a neutral baseline, not a penalty.
  • Breadth score is naturally low because the ATT&CK framework contains 216+ techniques, many of which are endpoint-specific or pre-compromise with limited Sentinel visibility. Do NOT present this as a crisis — contextualize it: "Prioritize coverage by threat scenario relevance rather than pursuing raw percentage."
  • Custom Detections SKIPPED affects Breadth and Tagging dimensions (rules not counted). Note the impact in the report.
  • Platform Coverage is reported as a supplementary metric alongside the MITRE Score (not folded into the 5 dimensions). The scratchpad includes Platform_Tier1/2/3, Platform_ActiveProducts, and RuleBasedPlusPlatform_Coverage. Render this in §1 and §5 per SKILL-report.md templates. The CTID tier classification requires m365-platform-coverage.json — if the file is missing, platform tiers default to empty and the report notes the limitation.

Domain Reference

ATT&CK Enterprise Tactic Kill Chain Order

The 14 ATT&CK Enterprise tactics in kill chain order (PS1 uses this ordering for all output):

# Tactic (Sentinel API name) Display Name Cloud/Identity Relevance Detectability
1 Reconnaissance Reconnaissance 🟡 Low — mostly pre-compromise; limited Sentinel visibility ⬜ Inherent blind spot
2 ResourceDevelopment Resource Development 🟡 Low — attacker infrastructure; limited Sentinel visibility ⬜ Inherent blind spot
3 InitialAccess Initial Access 🔴 High — phishing, valid accounts, external services ✅ Detectable
4 Execution Execution 🟠 Medium — scripting, cloud admin commands ✅ Detectable
5 Persistence Persistence 🔴 High — account manipulation, app registrations, inbox rules ✅ Detectable
6 PrivilegeEscalation Privilege Escalation 🔴 High — tenant policy modification, valid accounts ✅ Detectable
7 DefenseEvasion Defense Evasion 🟠 Medium — many techniques are endpoint-focused ✅ Detectable
8 CredentialAccess Credential Access 🔴 High — brute force, token theft, AiTM ✅ Detectable
9 Discovery Discovery 🟡 Medium — account/cloud service discovery ✅ Detectable
10 LateralMovement Lateral Movement 🟠 Medium — remote services, internal spearphishing ✅ Detectable
11 Collection Collection 🟡 Medium — email collection, data from cloud storage ✅ Detectable
12 CommandAndControl Command and Control 🟠 Medium — application layer protocol, web service ✅ Detectable
13 Exfiltration Exfiltration 🟠 Medium — exfiltration over C2 channel, cloud account ✅ Detectable
14 Impact Impact 🟠 Medium — resource hijacking (crypto mining), account removal ✅ Detectable

Detectability classification:

  • ✅ Detectable: Techniques in this tactic generate observable events in Sentinel data sources (sign-in logs, audit logs, endpoint telemetry, email events, etc.). KQL detection rules can be written and deployed.
  • ⬜ Inherent blind spot: Techniques in this tactic describe attacker activity that occurs outside the monitored environment (e.g., attacker creating fake accounts on external services, acquiring infrastructure). CTID mappings for these tactics are typically protect/respond capabilities (Conditional Access blocking, PAM restrictions), not detect. No KQL detection rules exist or can realistically be created. Do not recommend deploying rules for inherent blind spot tactics — acknowledge the limitation and recommend compensating controls (e.g., brand monitoring services, threat intelligence feeds) if relevant.

Sentinel-Specific MITRE Mapping Notes

  • Sentinel uses PascalCase for tactic names in the REST API: InitialAccess, CommandAndControl, CredentialAccess. The ATT&CK STIX data uses kebab-case (initial-access). The reference JSON maps between these.
  • Sub-techniques (T1xxx.xxx) are tracked by Sentinel but the REST API properties.techniques field may contain both parent techniques (T1078) and sub-techniques (T1078.004). The PS1 counts at the parent technique level for coverage matrix purposes.
  • ICS/OT techniques (T0xxx) use a separate numbering scheme from ATT&CK for ICS. These are extracted and reported separately since they don't map to the Enterprise framework.
  • Custom Detection mitreTechniques uses the same technique ID format but may specify sub-techniques that analytic rules don't. The PS1 aggregates both sources.
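
The naming and counting conventions above can be illustrated with a few helpers. The source states that a reference JSON performs the tactic-name mapping; the regex conversion below is a swapped-in shortcut for illustration, and all function names are hypothetical.

```python
import re


def tactic_to_stix(api_name: str) -> str:
    """Convert a PascalCase tactic name to kebab-case.

    Illustrative shortcut only; the skill itself maps names via a reference JSON.
    """
    return re.sub(r"(?<!^)(?=[A-Z])", "-", api_name).lower()


def parent_technique(technique_id: str) -> str:
    """Collapse a sub-technique ID (T1078.004) to its parent (T1078)."""
    return technique_id.split(".", 1)[0]


def is_ics(technique_id: str) -> bool:
    """ICS/OT technique IDs use the T0xxx numbering scheme."""
    return technique_id.startswith("T0")


def covered_enterprise_parents(technique_ids):
    """Distinct parent techniques across both rule sources, ICS excluded."""
    return {parent_technique(t) for t in technique_ids if not is_ics(t)}
```

Aggregating analytic-rule and Custom Detection technique lists then amounts to concatenating them before calling `covered_enterprise_parents`.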

Tactic-Specific Detection Guidance

When rendering recommendations (§6), use these cloud/identity-relevant technique priorities:

Tactic Key Sentinel-Detectable Techniques Priority
InitialAccess T1078 (Valid Accounts), T1566 (Phishing), T1133 (External Remote Services) 🔴 Must-have
Persistence T1098 (Account Manipulation), T1136 (Create Account), T1078 (Valid Accounts) 🔴 Must-have
CredentialAccess T1110 (Brute Force), T1528 (Steal App Access Token), T1621 (MFA Request Gen) 🔴 Must-have
PrivilegeEscalation T1484 (Domain/Tenant Policy Mod), T1078 (Valid Accounts), T1098 (Account Manipulation) 🔴 Must-have
DefenseEvasion T1078 (Valid Accounts), T1484 (Domain/Tenant Policy Mod), T1562 (Impair Defenses) 🟠 Important
Exfiltration T1567 (Exfil Over Web Service), T1537 (Transfer to Cloud Account) 🟠 Important
Collection T1114 (Email Collection), T1213 (Data from Info Repos) 🟠 Important

SOC Optimization Threat Scenario Reference

SOC Optimization recommendations map to named threat scenarios. When rendering §4, interpret these:

Scenario Key Attack Pattern Priority Tactics
AiTM (Adversary in the Middle) Session token theft, AiTM phishing InitialAccess, CredentialAccess
BEC (Financial Fraud) Email account takeover for wire fraud InitialAccess, CredentialAccess, Persistence
BEC (Mass Credential Harvest) Large-scale phishing campaigns InitialAccess, CredentialAccess, DefenseEvasion
Human Operated Ransomware Post-compromise hands-on keyboard LateralMovement, CredentialAccess, DefenseEvasion, Impact
Credential Exploitation Credential stuffing, password spray InitialAccess, CredentialAccess, Discovery
IaaS Resource Theft Cloud compute hijacking (crypto mining) CredentialAccess, Persistence, Impact
Network Infiltration Traditional network-based attacks Discovery, LateralMovement, CommandAndControl
X-Cloud Attacks Cross-cloud lateral movement CredentialAccess, PrivilegeEscalation, Persistence
ERP (SAP) SAP financial process manipulation InitialAccess, DefenseEvasion

SOC Optimization Recommendation States

State Meaning Report Treatment
Active Recommendation is open and actionable Show as gap — count toward coverage deficit
InProgress User has started addressing the recommendation Show as in-progress — partial credit
CompletedBySystem Microsoft's automated assessment found coverage adequate Use rate-based badge (may still show 🔴/🟠/🟡 if completion rate is low). State displayed in table for context
Completed User manually marked as complete Show as met — ✅

SVG Dashboard Generation

After the report is generated, the user may request an SVG dashboard visualization.

Trigger: "generate SVG dashboard", "visualize this report", "SVG from the MITRE report"

✅ DEFAULT: run the deterministic renderer (render_dashboard.py)

Do this first — do NOT hand-author the SVG. render_dashboard.py produces the manifest-driven 5-row dashboard non-interactively, parsing every value from the scratchpad + report + svg-widgets.yaml (no hardcoded run data). It is faster, deterministic, and produces a known-good layout. Run it:

python .github/skills/mitre-coverage-report/render_dashboard.py \
  --scratch temp/mitre_scratch_<ts>.md \
  --manifest .github/skills/mitre-coverage-report/svg-widgets.yaml \
  --report reports/sentinel/mitre_coverage_report_<label>_<ts>.md \
  --out reports/sentinel/mitre_coverage_report_<label>_<ts>_dashboard.svg

It reads the donut center, score dimensions, tactic bars, threat-scenario table, and KPI values from the scratchpad, and the Top-3 recommendation cards from the report's ### 🎯 Top 3 Recommendations table (falling back to top threat-scenario gaps if absent). Output is self-contained SVG with explicit fill on every <text>.

Action Status
Running render_dashboard.py when the user asks to visualize/generate a dashboard REQUIRED (default path)
Hand-authoring the SVG via the svg-dashboard skill instead of running the script PROHIBITED unless the user explicitly asks for a bespoke/custom layout the renderer can't produce

Fallback — bespoke/interactive dashboards (svg-dashboard skill)

Only use this path when the user explicitly wants a custom layout, different widgets, or styling the deterministic renderer doesn't support. Edit svg-widgets.yaml first if the change is layout/field-level — the renderer reads it at generation time, so many "customizations" don't require hand-authoring.

  1. Load the svg-dashboard skill
  2. Use the rendered report + scratchpad data to build visualization widgets
  3. Recommended widget types for MITRE coverage:
    • Score card — MITRE Coverage Score with 5 dimension breakdown
    • Bar chart — Per-tactic coverage percentages (14 bars)
    • Donut chart — Rule inventory breakdown (AR enabled/disabled, CD enabled/disabled, untagged)
    • Table — Top 5 coverage gaps (tactic + gap %)
    • KPI cards — Total techniques covered, SOC scenarios met, untagged rules

Troubleshooting

Issue Solution
Phase 3 KQL queries fail (token expired) Re-authenticate: az login --tenant <tenant_id> --scope https://api.loganalytics.io/.default
Custom Detections SKIPPED Normal if Graph API admin consent not granted. Report proceeds with AR-only analysis
SOC Optimization returns 0 recs Workspace may not have SOC Optimization enabled, or all recommendations are already completed
Breadth score seems low (10-20%) This is typical — 216+ techniques means even well-covered workspaces have low percentages. Focus on threat-scenario-aligned priorities, not raw percentage
ICS techniques appear in output Normal if Defender for IoT rules are deployed. They're reported separately from Enterprise ATT&CK
az rest returns 403 Check RBAC: user needs Microsoft Sentinel Reader on the workspace
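Detects scope drift in endpoint or device process execution behavior, identifying gradual anomalies that extend beyond the baseline. Supports single-device and fleet-wide modes, with in-depth analysis through five-dimension weighted scoring and correlated security alerts.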
device drift device process drift endpoint drift process baseline device behavioral change
.github/skills/scope-drift-detection/device/SKILL.md
npx skills add SCStelz/security-investigator --skill scope-drift-detection-device -g -y
SKILL.md
Frontmatter
{
    "name": "scope-drift-detection-device",
    "description": "Use this skill when asked to detect scope drift, behavioral expansion, or process baseline deviation on devices or endpoints. Triggers on keywords like \"device drift\", \"device process drift\", \"endpoint drift\", \"process baseline\", \"device behavioral change\", or when investigating whether a device has gradually expanded its process execution beyond an established baseline. This skill builds a configurable-window behavioral baseline using DeviceProcessEvents, compares baseline with recent activity, computes a weighted Drift Score across 5 dimensions (Volume, Processes, Accounts, Process Chains, Signing Companies), and correlates with SecurityAlert, DeviceInfo (for uptime corroboration via MDE sensor health), and command-line pattern analysis. Supports fleet-wide and single-device modes.",
    "drill_down_prompt": "Analyze device process drift for {entity} — behavioral baseline vs recent activity",
    "threat_pulse_domains": [
        "endpoint"
    ]
}

Device Scope Drift Detection — Instructions

Purpose

This skill detects scope drift — the gradual, often imperceptible expansion of process execution behavior beyond an established baseline — in endpoints and devices. Unlike sudden compromise (which triggers alerts), scope drift is a slow-burn pattern that evades threshold-based detections.

Entity Type: Device

Identifier Primary Table(s) Use Case
DeviceName (hostname) DeviceProcessEvents Endpoints, servers, workstations — fleet-wide or single-device process baseline analysis

What this skill detects:

  • Volume spikes in process execution relative to historical baseline
  • New processes or process chains not seen in the baseline period
  • New service accounts or user contexts executing processes
  • Unsigned or unusually-signed binaries executing on endpoints
  • Reconnaissance, lateral movement, persistence, and exfiltration command patterns
  • Security alerts involving the drifting devices

Two operating modes:

Mode When to Use Scope
Fleet-wide "Check all devices for process drift", "device drift across the fleet" Computes per-device drift scores, ranks all devices, flags those > 150%
Single-device "Investigate process drift on DEVICE-01", specific hostname provided Deep dive on one device with full process inventory and command-line analysis


📑 TABLE OF CONTENTS

  1. Critical Workflow Rules - Start here!
  2. Output Modes - Inline chat vs. Markdown file
  3. Quick Start - 10-step investigation pattern
  4. Drift Score Formula - Weighted composite scoring (5 dimensions)
  5. Execution Workflow - Complete 4-phase process
  6. Sample KQL Queries - Validated query patterns (Queries 14-22)
  7. Report Template - Output format specification
  8. Known Pitfalls - Edge cases and false positives
  9. Error Handling - Troubleshooting guide
  10. SVG Dashboard Generation - Visual dashboard from report

Investigation shortcuts:

  • Device with behavioral drift (TP Q6): Q15 (per-device drift scores + dimension ratios) → Q16 (first-seen processes — new in recent window) → Q18 (alert/incident correlation) → Q21 (uptime context)
  • Suspicious process chains (TP Q7): Q17 (rare parent→child chains in recent window) → Q20 (command-line pattern detection — recon, lateral movement, persistence) → Q18 (alert correlation)
  • Fleet uniformity assessment (TP Q6, all devices clustered): Q14 (fleet-wide daily trend) → Q15 (per-device breakdown) → Q22 (per-session volume — confirms burst vs sustained activity)
  • Unsigned binary investigation (standalone): Q19 (unsigned/unusual signing companies in recent window) → Q16 (first-seen process overlap) → Q20 (command-line patterns for flagged binaries)

⛔ Shortcut Default Rule: When a matching shortcut exists for the investigation context, use it — don't run the full workflow. Only run the full query set when the user explicitly requests "full investigation", "comprehensive", or "deep dive". Shortcuts render only the report sections relevant to their query chain (plus Executive Summary and Recommendations, always).


⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

Before starting ANY device scope drift analysis:

  1. ALWAYS enforce Sentinel workspace selection (see Workspace Selection section below)
  2. ALWAYS ask the user for output mode if not specified: inline chat summary or markdown file report (or both)
  3. ALWAYS determine mode — fleet-wide or single-device
  4. ALWAYS determine time windows — baseline period and recent period (configurable, defaults: 6-day baseline, 1-day recent within 7-day lookback)
  5. ALWAYS build baseline FIRST before comparing recent activity
  6. ALWAYS apply the low-volume denominator floor to prevent false-positive drift scores on sparse baselines
  7. ALWAYS correlate across all required data sources (DeviceProcessEvents, SecurityAlert, DeviceInfo)
  8. ALWAYS run independent queries in parallel for performance
  9. NEVER report a drift flag without corroborating evidence from at least one secondary data source

Data Sources

Data Source Role Purpose
DeviceProcessEvents ✅ Primary Device process execution baseline
SecurityAlert ✅ Corroboration Corroborating alert evidence
SecurityIncident ✅ Corroboration Real alert status/classification
DeviceInfo ✅ Corroboration Device uptime/power-on pattern via MDE sensor health (primary — covers all MDE-onboarded devices)
Heartbeat ⚡ Fallback Device uptime for non-MDE devices with Log Analytics agent (AMA/MMA) only

⛔ MANDATORY: Sentinel Workspace Selection

This skill requires a Sentinel workspace to execute queries. Follow these rules STRICTLY:

When invoked from incident-investigation skill:

  • Inherit the workspace selection from the parent investigation context
  • If no workspace was selected in parent context: STOP and ask user to select

When invoked standalone (direct user request):

  1. ALWAYS call list_sentinel_workspaces MCP tool FIRST
  2. If 1 workspace exists: Auto-select, display to user, proceed
  3. If multiple workspaces exist:
    • Display all workspaces with Name and ID
    • ASK: "Which Sentinel workspace should I use for this investigation?"
    • ⛔ STOP AND WAIT for user response
    • ⛔ DO NOT proceed until user explicitly selects
  4. If a query fails on the selected workspace:
    • ⛔ DO NOT automatically try another workspace
    • STOP and report the error, display available workspaces, ASK user to select

🔴 PROHIBITED ACTIONS:

  • ❌ Selecting a workspace without user consent when multiple exist
  • ❌ Switching to another workspace after a failure without asking
  • ❌ Proceeding with investigation if workspace selection is ambiguous

Output Modes

This skill supports two output modes. ASK the user which they prefer if not explicitly specified. Both may be selected.

Mode 1: Inline Chat Summary (Default)

  • Render the full drift analysis directly in the chat response
  • Includes ASCII tables, drift dimension bars, and security assessment
  • Best for quick review and interactive follow-up questions

Mode 2: Markdown File Report

  • Save a comprehensive report to reports/scope-drift/device/Scope_Drift_Report_<entity>_<timestamp>.md
  • All ASCII visualizations render correctly inside markdown code fences (```)
  • Includes all data from inline mode plus additional detail sections
  • Use create_file tool — NEVER use terminal commands for file output
  • Filename patterns:
    • Fleet-wide: Scope_Drift_Report_fleet_devices_YYYYMMDD_HHMMSS.md
    • Single-device: Scope_Drift_Report_<device_name>_YYYYMMDD_HHMMSS.md (lowercase, sanitized)
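
A sketch of the filename patterns above. The source says only "lowercase, sanitized"; replacing every run of characters other than letters, digits, hyphen, and underscore with a single underscore is an assumption, and `report_filename` is a hypothetical helper. It only builds the path string; writing the file is still done with the create_file tool.

```python
import re
from datetime import datetime
from typing import Optional


def report_filename(device_name: Optional[str] = None,
                    now: Optional[datetime] = None) -> str:
    """Build the report path for fleet-wide (device_name=None) or single-device mode."""
    timestamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    if device_name is None:
        entity = "fleet_devices"
    else:
        # Assumed sanitization rule: lowercase, unsafe characters become "_".
        entity = re.sub(r"[^a-z0-9_-]+", "_", device_name.lower()).strip("_")
    return f"reports/scope-drift/device/Scope_Drift_Report_{entity}_{timestamp}.md"
```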

Markdown Rendering Notes

  • ✅ ASCII tables, box-drawing characters, and bar charts render perfectly in markdown code blocks
  • ✅ Unicode block characters (█ full block, ─ box-drawing horizontal) display correctly in monospaced fonts
  • ✅ Emoji indicators (🔴🟢🟡⚠️✅) render natively in GitHub-flavored markdown
  • ✅ Standard markdown tables (| col |) render as formatted tables
  • Tip: Wrap all ASCII art in triple-backtick code fences for consistent rendering

Quick Start (TL;DR)

When a user requests device scope drift detection:

  1. Select Workspace → list_sentinel_workspaces, auto-select or ask
  2. Determine Mode → Fleet-wide or single-device? Determine time windows.
  3. Determine Output Mode → Ask if not specified: inline, markdown file, or both
  4. Run Phase 1 → Query 14 (daily summary) + Query 15 (per-device breakdown)
  5. Apply Fleet Scaling → Compute drift scores, rank devices, apply tiered depth limits (see Fleet Scaling)
  6. Run Phase 2 → Query 16 (first-seen processes) + Query 17 (rare process chains) — scoped to Tier 1 + Tier 2 devices only
  7. Run Phase 3 → Query 18 (SecurityAlert + SecurityIncident) + Query 19 (unsigned/unusual) + Query 20 (notable command-line patterns) — scoped to Tier 1 devices only
  8. Run Phase 4 (corroboration) → Query 21 (DeviceInfo uptime) + Query 22 (per-session volume) for flagged/intermittent devices in Tier 1
  9. Compute Final Assessment → Combine drift scores with corroborating evidence
  10. Output Results → Render in selected mode(s) with tiered depth

Baseline and Recent Windows

Device process drift supports configurable time windows unlike sign-in drift (which uses fixed 90d/7d). The user may specify:

| User Request | Baseline Window | Recent Window |
| --- | --- | --- |
| "24 hours over the last 7 days" | Days 1–6 | Day 7 (last 24h) |
| "last 48 hours vs previous week" | Days 3–9 | Days 1–2 |
| "process drift last 30 days" | Days 8–30 | Days 1–7 |
| No time specified | Last 6 days | Last 24 hours |
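
Each row of the window table can be expressed as a total lookback plus a recent-window length, with the baseline being whatever remains. A minimal sketch under that assumption (the function name and return shape are hypothetical):

```python
from datetime import datetime, timedelta, timezone


def drift_windows(now: datetime, lookback_days: int = 7, recent_days: int = 1) -> dict:
    """Split a lookback into (start, end) tuples for baseline and recent windows.

    Defaults reproduce "No time specified": 6-day baseline, last 24 hours recent.
    """
    if recent_days >= lookback_days:
        raise ValueError("recent window must be shorter than the total lookback")
    recent_start = now - timedelta(days=recent_days)
    baseline_start = now - timedelta(days=lookback_days)
    return {
        "baseline": (baseline_start, recent_start),
        "recent": (recent_start, now),
    }
```

Under this reading, "last 48 hours vs previous week" corresponds to `lookback_days=9, recent_days=2`, and "process drift last 30 days" to `lookback_days=30, recent_days=7`.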

Note: Follow the global Tool Selection Rule in .github/copilot-instructions.md. For lookbacks ≤ 30 days, use RunAdvancedHuntingQuery (free on Analytics-tier DeviceProcessEvents; swap TimeGenerated → Timestamp). For lookbacks > 30 days (AH Graph API cap), use mcp_sentinel-data_query_lake with TimeGenerated. Sample queries below are written with TimeGenerated; adapt the column name when running in Advanced Hunting.


Fleet Scaling (Large Environments)

Problem: In small environments (< 50 devices), every device gets a full deep dive. In environments with hundreds or thousands of devices, running Queries 16–22 for every flagged device is prohibitively expensive (query timeouts, massive result sets, unreadable reports).

Solution: After Phase 1 computes drift scores for all devices, apply tiered depth based on fleet size and drift severity.

Fleet Size Detection

After Query 15, count distinct devices in the result set:

| Fleet Size | Tier | Deep Dive Limit | Behavior |
| --- | --- | --- | --- |
| ≤ 50 devices | Small | All flagged | Full deep dive for every device > 150%. No limiting needed. |
| 51–200 devices | Medium | Top 10 | Full deep dive for top 10 by DriftScore. Summary row for remaining flagged devices. |
| 201–1000 devices | Large | Top 10 | Full deep dive for top 10. Tier 2 summary (next 20) with first-seen processes only. Remaining flagged devices listed in ranking table with scores but no deep dive. |
| > 1000 devices | Very Large | Top 10 | Same as Large, plus: filter Query 15 to BL_TotalEvents > 10 to exclude near-silent devices from scoring. |

Tiered Depth Model

After computing drift scores and ranking all devices, assign tiers:

Tier Devices Queries Run Report Depth
Tier 1 (Full) Top N by DriftScore (N = deep dive limit from table above) All: Q16, Q17, Q18, Q19, Q20, Q21, Q22 Full deep dive: ASCII chart, dimension table, first-seen processes, process chains, command-line patterns, alerts, DeviceInfo uptime
Tier 2 (Summary) Next 20 flagged devices (or remaining if < 20) Q16 only (first-seen processes) One-line summary per device: score, top 3 new processes, flag status
Tier 3 (Score only) All remaining flagged devices None beyond Phase 1 Row in ranking table: device name, drift score, dimension ratios, flag emoji
Stable Devices ≤ 150% None beyond Phase 1 Omitted from deep dives. Included in fleet summary statistics only.
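
The tier assignment can be sketched as a ranking followed by slicing. This is an illustration, not the skill's code: `scores` is a hypothetical mapping of device name to DriftScore, and the sketch applies the Tier 2 / Tier 3 split to every fleet above 50 devices, which is one reading of the Medium row.

```python
def fleet_tier(fleet_size: int) -> str:
    """Classify fleet size per the Fleet Size Detection table."""
    if fleet_size <= 50:
        return "Small"
    if fleet_size <= 200:
        return "Medium"
    if fleet_size <= 1000:
        return "Large"
    return "Very Large"


def assign_tiers(scores: dict, flag_threshold: float = 150.0,
                 deep_dive_limit: int = 10, tier2_size: int = 20) -> dict:
    """Rank devices by DriftScore and split them into analysis tiers."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    flagged = [d for d in ranked if scores[d] > flag_threshold]
    stable = [d for d in ranked if scores[d] <= flag_threshold]
    if fleet_tier(len(scores)) == "Small":
        tier1, rest = flagged, []  # every flagged device gets a full deep dive
    else:
        tier1, rest = flagged[:deep_dive_limit], flagged[deep_dive_limit:]
    return {
        "tier1": tier1,
        "tier2": rest[:tier2_size],
        "tier3": rest[tier2_size:],
        "stable": stable,
    }
```

The resulting `tier1` list is what would be pasted into the `dynamic([...])` device filter when scoping Phase 2-4 queries.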

KQL Scoping for Large Fleets

When running Phase 2–4 queries for large fleets, scope them to the relevant device tier using a let block:

// Scope Phase 2–3 queries to Tier 1 devices only
let lookback = 7d;  // assumed default; use the same lookback as Phase 1
let tier1Devices = dynamic(["device-a", "device-b", "device-c"]);
DeviceProcessEvents
| where TimeGenerated > ago(lookback)
| where DeviceName in~ (tier1Devices)
// ... rest of query

User Override

If the user explicitly asks for "all devices" or "full report", honor the request but warn:

⚠️ Fleet has <N> devices with <X> flagged above 150%. Running full deep dives for all flagged devices may be slow and produce a very long report. Proceed? (Default: top 10 deep dives + summary for others)

Report Disclosure

When tiered depth is applied, always disclose in the report header:

**Fleet Size:** <N> devices (Large fleet — tiered analysis applied)
**Deep Dives:** Top <X> by DriftScore (Tier 1: full analysis)
**Summaries:** <Y> additional flagged devices (Tier 2: first-seen processes only)
**Score Only:** <Z> additional flagged devices (Tier 3: ranking table only)
**Stable:** <W> devices ≤ 150% (omitted from deep dives)

Drift Score Formula

The Drift Score is a weighted composite of behavioral dimensions, normalized so that 100 = identical to baseline.

Device Formula (5 Dimensions)

$$ \text{DriftScore}_{Device} = 0.30V + 0.25P + 0.15A + 0.20C + 0.10S $$

| Dimension | Weight | Metric | Why |
| --- | --- | --- | --- |
| Volume | 30% | Daily avg process events (recent / baseline) | Sudden activity surges indicate new software, lateral movement, or compromise |
| Processes | 25% | Distinct process filenames executed | New processes = new software deployment, malware, or living-off-the-land tools |
| Accounts | 15% | Distinct account identities executing processes | New accounts = lateral movement, privilege escalation, or unauthorized access |
| Process Chains | 20% | Distinct parent→child process relationships | New chains = novel execution patterns, potentially malicious process trees |
| Signing Companies | 10% | Distinct file signing entities | New unsigned or unusually-signed binaries = potential malware or unauthorized tools |

Interpretation Scale

| Score | Meaning | Action |
| --- | --- | --- |
| < 80 | Contracting scope | ✅ Normal: entity is doing less than usual |
| 80–120 | Stable / normal variance | ✅ No action required |
| 120–150 | Moderate deviation | 🟡 Monitor: check for legitimate reasons |
| > 150 | Significant drift | 🔴 FLAG: investigate with corroborating evidence |
| > 250 | Extreme drift | 🔴 CRITICAL: immediate investigation required |
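
A worked sketch of the device formula and interpretation scale. It assumes each dimension ratio is already expressed on the scale where 100 means identical to baseline; the dictionary keys are hypothetical, and where the bands share a boundary (120, 150) the sketch treats the upper bound as inclusive.

```python
# Weights in percent, from the Device Formula table.
DRIFT_WEIGHTS = {
    "volume": 30,
    "processes": 25,
    "accounts": 15,
    "process_chains": 20,
    "signing_companies": 10,
}


def drift_score(ratios: dict) -> float:
    """Weighted composite of the five dimension ratios (100 = baseline)."""
    return sum(DRIFT_WEIGHTS[name] * ratios[name] for name in DRIFT_WEIGHTS) / 100


def interpret(score: float) -> str:
    """Map a drift score to the interpretation scale."""
    if score > 250:
        return "Extreme drift"
    if score > 150:
        return "Significant drift"
    if score > 120:
        return "Moderate deviation"
    if score >= 80:
        return "Stable / normal variance"
    return "Contracting scope"
```

For example, ratios of 300 (Volume), 120 (Processes), 100 (Accounts), 150 (Process Chains), and 100 (Signing Companies) give 90 + 30 + 15 + 30 + 10 = 175, which is flagged as significant drift.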

Low-Volume Denominator Floor

CRITICAL: For devices with sparse baselines (< 10 daily process events), the volume ratio is artificially inflated. Apply a floor:

IF BL_DailyAvg < 10:
    AdjustedVolumeRatio = RC_DailyAvg / max(BL_DailyAvg, 10) * 100
    Flag the score with: "⚠️ Low-volume baseline — ratio may be inflated"
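A minimal Python sketch of the floor, assuming the daily averages have already been computed. The function name and the returned tuple shape are hypothetical; the floor value of 10 comes from the rule above.

```python
def adjusted_volume_ratio(bl_daily_avg: float, rc_daily_avg: float, floor: float = 10.0):
    """Return (volume ratio in percent, low_volume flag) with the denominator floor applied."""
    low_volume = bl_daily_avg < floor
    ratio = rc_daily_avg / max(bl_daily_avg, floor) * 100
    return ratio, low_volume


# Sparse baseline: 2 events/day, then 12 events in the recent day.
ratio, low_volume = adjusted_volume_ratio(2, 12)
print(round(ratio, 1), low_volume)

# Without the floor the same device would show 12 / 2 * 100 = 600%.
print(12 / 2 * 100)
```

When the flag is true, the report should carry the low-volume warning next to the score.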

Execution Workflow

Phase 1: Behavioral Baseline vs. Recent Comparison

Default windows: Recent = last 24h; Baseline = the 6 days before that (together a 7-day lookback). Configurable by user.

This is the primary query that computes per-device behavioral profiles and drift metrics.

| Data Source | Query | Notes |
|-------------|-------|-------|
| DeviceProcessEvents | Query 14 | Fleet-wide daily summary |
| DeviceProcessEvents | Query 15 | Per-device daily breakdown with drift score computation |

Fleet-wide mode produces ONE drift score per device. Devices are ranked by DriftScore; those exceeding 150% are assigned to tiers based on fleet size (see Fleet Scaling). Tier 1 devices get full deep dives, Tier 2 devices get summary analysis, and Tier 3 devices appear in the ranking table only.

Phase 2: Process Drift Pattern Analysis

  • First-seen processes (Query 16): Processes appearing only in the recent window with no baseline history. These are the strongest drift signal — new software, tools, or malware.
  • Rare process chains (Query 17): Parent→child execution relationships seen only in the recent window. New chains may indicate novel attack patterns, lateral movement tools, or changed automation.

Phase 3: Corroborating Signal Collection (Run in Parallel)

  • SecurityAlert + SecurityIncident (Query 18): Alerts referencing any of the analyzed devices, joined with SecurityIncident for real status. Never read SecurityAlert.Status directly — it's always "New".
  • Unsigned/unusual processes (Query 19): Processes with signing companies not seen in the baseline, or unsigned binaries. Legitimate software deployments will show known signing companies; malware or tools may be unsigned or signed by unusual entities.
  • Notable command-line patterns (Query 20): Search for reconnaissance commands (whoami, net user, ipconfig, nltest, systeminfo), lateral movement (psexec, wmic), persistence mechanisms (schtasks, reg add), and exfiltration indicators (curl, wget, certutil).
  • Account landscape analysis: Review which accounts executed processes — flag any new service accounts, admin accounts, or unexpected user contexts in the recent window.

Phase 4: Uptime Corroboration (For Flagged/Intermittent Devices)

  • DeviceInfo uptime pattern (Query 21): For any device with a drift score near or above the 150% threshold, or any device known/suspected to be intermittently powered on, query the DeviceInfo table to determine actual uptime days via MDE sensor health state. This is the primary corroboration source and covers all MDE-onboarded devices. For non-MDE devices with only Log Analytics agent (AMA/MMA), fall back to the Heartbeat table using the same query pattern (substitute DeviceInfo → Heartbeat, DeviceName → Computer, SensorHealthState → OSType).
  • Per-session process volume (Query 22): Query DeviceProcessEvents per-day to show per-session event concentration. This context is critical for interpreting volume-based drift — a device that was online only 5 days out of 90 will have a diluted baseline daily average, making any recent power-on session appear as a massive volume spike.
  • Run Queries 21+22 for flagged devices and include the uptime context in the deep dive section.

Phase 5: Score Computation & Report Generation

  1. Compute DriftScore per device using the 5-dimension formula
  2. Apply the low-volume denominator floor
  3. Flag any device exceeding 150% threshold
  4. Handle special cases:
    • Newly onboarded devices (no baseline = DriftScore 999) should be flagged as "New Device" rather than drift
    • Data Lake ingestion boundaries may cause zero recent-window activity — verify before reporting contraction
  5. For devices with elevated Volume ratio (>200%) or near-threshold DriftScore (>130%): Run Queries 21+22 (DeviceInfo uptime + per-session volume) to determine if the volume spike is explained by intermittent power-on usage. If the device was only online for a small fraction of the baseline window, note as mitigating factor.
  6. Generate risk assessment with emoji-coded findings
  7. Render output in the user's selected mode

Sample KQL Queries

Query 14: Device Process Events — Daily Summary (Fleet-Wide)

// Daily summary of process events across all devices
// Configurable: adjust 'lookback' for total analysis window
let lookback = 7d;
DeviceProcessEvents
| where TimeGenerated > ago(lookback)
| summarize
    TotalEvents = count(),
    DistinctDevices = dcount(DeviceName),
    DistinctProcesses = dcount(FileName),
    DistinctAccounts = dcount(AccountName),
    DistinctChains = dcount(strcat(InitiatingProcessFileName, "→", FileName)),
    DistinctCompanies = dcount(ProcessVersionInfoCompanyName)
    by Day = bin(TimeGenerated, 1d)
| order by Day asc

Purpose: Provides the fleet-wide daily trend to identify volume anomalies and determine optimal baseline/recent window split. Use this to verify data availability before running the per-device breakdown.

Query 15: Per-Device Daily Breakdown & Drift Score Computation

// Per-device per-day behavioral profile with drift score computation
// Configurable time windows:
//   baselineDays = number of days in baseline period
//   recentDays = number of days in recent period
//   lookback = baselineDays + recentDays
let lookback = 7d;
let recentDays = 1;  // Last N days as "recent" window
let baselineDays = 6; // Remaining days as "baseline"
let recentStart = ago(1d * recentDays);
DeviceProcessEvents
| where TimeGenerated > ago(lookback)
| extend IsRecent = TimeGenerated >= recentStart
| summarize
    TotalEvents = count(),
    DistinctProcesses = dcount(FileName),
    DistinctAccounts = dcount(AccountName),
    DistinctChains = dcount(strcat(InitiatingProcessFileName, "→", FileName)),
    DistinctCompanies = dcount(ProcessVersionInfoCompanyName)
    by DeviceName, IsRecent
| extend Period = iff(IsRecent, "Recent", "Baseline")
| order by DeviceName, Period asc

Post-Processing: After retrieving results, compute per-device drift scores:

  1. For each device, extract Baseline and Recent rows
  2. Compute daily averages: BL_DailyAvg = BL_TotalEvents / baselineDays, RC_DailyAvg = RC_TotalEvents / recentDays
  3. Compute dimension ratios: VolumeRatio = RC_DailyAvg / max(BL_DailyAvg, 10) * 100
  4. Apply the Device formula: DriftScore = 0.30×Volume + 0.25×Processes + 0.15×Accounts + 0.20×Chains + 0.10×Companies
  5. Handle edge cases:
    • Device in baseline only (no recent data): Check if data ingestion boundary or genuine silence
    • Device in recent only (no baseline): Set DriftScore = 999, flag as "New Device — no baseline"
    • Apply denominator floor (max(BL_value, 10)) for low-volume devices
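The post-processing steps above could look roughly like this in Python. It is a sketch under stated assumptions rather than the skill's actual implementation: rows are assumed to arrive as dictionaries with the column names from Query 15, the diversity ratios are assumed to be plain recent/baseline ratios of the distinct counts (the steps only spell out the volume ratio), and a zero baseline count is guarded with a denominator of 1. The function name `score_devices` and the note strings are hypothetical.

```python
WEIGHTS = {
    "TotalEvents": 0.30,        # volume (compared as a daily average)
    "DistinctProcesses": 0.25,
    "DistinctAccounts": 0.15,
    "DistinctChains": 0.20,
    "DistinctCompanies": 0.10,
}
NEW_DEVICE_SCORE = 999


def score_devices(rows, baseline_days=6, recent_days=1, floor=10):
    """Pair Baseline/Recent rows per device and compute a DriftScore for each."""
    by_device = {}
    for row in rows:
        by_device.setdefault(row["DeviceName"], {})[row["Period"]] = row

    results = {}
    for device, periods in by_device.items():
        bl, rc = periods.get("Baseline"), periods.get("Recent")
        if bl is None:
            results[device] = {"DriftScore": NEW_DEVICE_SCORE, "Note": "New Device (no baseline)"}
            continue
        if rc is None:
            results[device] = {"DriftScore": None, "Note": "No recent data: check ingestion boundary"}
            continue

        bl_daily = bl["TotalEvents"] / baseline_days
        rc_daily = rc["TotalEvents"] / recent_days
        ratios = {"TotalEvents": rc_daily / max(bl_daily, floor) * 100}
        for col in WEIGHTS:
            if col != "TotalEvents":
                # Assumption: distinct counts are compared directly, guarded against zero.
                ratios[col] = rc[col] / max(bl[col], 1) * 100

        score = sum(WEIGHTS[col] * ratios[col] for col in WEIGHTS)
        note = "Low-volume baseline" if bl_daily < floor else ""
        results[device] = {"DriftScore": round(score, 1), "Note": note}
    return results


sample_rows = [
    {"DeviceName": "srv01", "Period": "Baseline", "TotalEvents": 6000, "DistinctProcesses": 50,
     "DistinctAccounts": 4, "DistinctChains": 100, "DistinctCompanies": 10},
    {"DeviceName": "srv01", "Period": "Recent", "TotalEvents": 1500, "DistinctProcesses": 60,
     "DistinctAccounts": 4, "DistinctChains": 110, "DistinctCompanies": 10},
    {"DeviceName": "new01", "Period": "Recent", "TotalEvents": 300, "DistinctProcesses": 20,
     "DistinctAccounts": 2, "DistinctChains": 25, "DistinctCompanies": 3},
]
print(score_devices(sample_rows))
```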

Single-Device Mode: Add | where DeviceName =~ '<DEVICE_NAME>' as the second filter to scope to one device.

Query 16: First-Seen Processes (New in Recent Window)

// Processes appearing only in the recent window — not seen in baseline
// This is the strongest drift signal for devices
let lookback = 7d;
let recentDays = 1;
let recentStart = ago(1d * recentDays);
let baselineProcesses = DeviceProcessEvents
| where TimeGenerated between (ago(lookback) .. recentStart)
| distinct FileName;
DeviceProcessEvents
| where TimeGenerated >= recentStart
| distinct DeviceName, FileName, ProcessVersionInfoCompanyName
| join kind=leftanti baselineProcesses on FileName
| summarize
    NewProcessCount = dcount(FileName),
    NewProcesses = make_set(FileName, 50),
    Companies = make_set(ProcessVersionInfoCompanyName, 50)
    by DeviceName
| where NewProcessCount > 0
| order by NewProcessCount desc

Interpretation:

  • New processes from recognized vendors (Microsoft, Google, etc.) → likely software updates or deployments
  • Version-stamped update binaries (AM_Delta_Patch_*.exe, MicrosoftEdge_X64_*.exe, odt*.tmp.exe) → expected noise, always appear as "new" (see pitfall: Version-Stamped Process Name False Positives)
  • New unsigned processes or processes from unknown companies → investigate immediately
  • Large number of new processes on a single device → may indicate software deployment, but also possible malware dropper

Single-Device Mode: Add | where DeviceName =~ '<DEVICE_NAME>' to both the baseline and recent subqueries. Then expand to show full process details including ProcessCommandLine and FolderPath.

Fleet-Wide vs. Per-Device First-Seen Behavior: This query identifies processes that are globally novel — not seen on any device during the baseline. If a process ran on DeviceA during baseline but appears on DeviceB for the first time in the recent window, it will NOT be flagged because the baseline distinct FileName covers all devices. This design choice reduces noise (known-good processes aren't re-flagged per device) but may miss per-device novelty. For per-device first-seen analysis, scope the baseline distinct by DeviceName — note this is significantly more expensive on large fleets.
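The difference between the two scopes can be illustrated with plain Python sets. This is only a sketch of the logic, using hypothetical `(device, filename)` pairs in place of query results; the function name `first_seen` is illustrative.

```python
def first_seen(baseline, recent, per_device=False):
    """Return (device, filename) pairs from `recent` that count as first-seen.

    per_device=False mirrors the fleet-wide behaviour: a filename seen on ANY
    device during the baseline is never flagged.
    per_device=True flags a filename that is new on that specific device.
    """
    if per_device:
        seen_pairs = set(baseline)
        return sorted(pair for pair in set(recent) if pair not in seen_pairs)
    seen_names = {name for _, name in baseline}
    return sorted(pair for pair in set(recent) if pair[1] not in seen_names)


baseline = [("DeviceA", "tool.exe"), ("DeviceB", "svchost.exe")]
recent = [("DeviceB", "tool.exe"), ("DeviceB", "new.exe")]

print(first_seen(baseline, recent))                   # fleet-wide novelty
print(first_seen(baseline, recent, per_device=True))  # per-device novelty
```

In the fleet-wide scope `tool.exe` on DeviceB is not reported because DeviceA ran it during the baseline; the per-device scope reports it, at the cost of tracking every device/filename pair.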

Query 17: Rare Process Chains (Parent→Child Relationships)

// Process chains (parent→child) seen only in recent window
let lookback = 7d;
let recentDays = 1;
let recentStart = ago(1d * recentDays);
let baselineChains = DeviceProcessEvents
| where TimeGenerated between (ago(lookback) .. recentStart)
| extend Chain = strcat(InitiatingProcessFileName, "→", FileName)
| distinct Chain;
DeviceProcessEvents
| where TimeGenerated >= recentStart
| extend Chain = strcat(InitiatingProcessFileName, "→", FileName)
| join kind=leftanti baselineChains on Chain
| summarize
    Occurrences = count(),
    Devices = make_set(DeviceName, 20),
    DeviceCount = dcount(DeviceName),
    Accounts = make_set(AccountName, 10),
    SampleCommandLine = take_any(ProcessCommandLine)
    by Chain
| order by Occurrences desc
| take 30

Interpretation:

  • Common chains like explorer.exe→notepad.exe appearing as "new" → baseline window too short or intermittent usage
  • Update chains like wuauclt.exe→AM_Delta_Patch_*.exe or microsoftedgeupdate.exe→MicrosoftEdge_X64_*.exe → expected noise from automatic updates, always appear as "new" due to version-stamped child process names
  • Suspicious chains like cmd.exe→powershell.exe→certutil.exe → investigate for LOLBin abuse
  • Chains appearing on a single device vs. fleet-wide → single device may indicate targeted activity

Query 18: Device SecurityAlert + SecurityIncident Correlation

// Security alerts referencing analyzed devices, joined with SecurityIncident for real status
// IMPORTANT: SecurityAlert.Status is immutable (always "New") — MUST join SecurityIncident
// Substitute <DEVICE_NAMES> with comma-separated, double-quoted device names from Query 15 (e.g. "host-a", "host-b")
let lookback = 7d;
let relevantAlerts = SecurityAlert
| where TimeGenerated > ago(lookback)
| where Entities has_any (<DEVICE_NAMES>) or CompromisedEntity has_any (<DEVICE_NAMES>)
| summarize arg_max(TimeGenerated, *) by SystemAlertId
| project SystemAlertId, AlertName, AlertSeverity, ProductName, ProductComponentName,
    Tactics, Techniques, CompromisedEntity, TimeGenerated;
SecurityIncident
| where CreatedTime > ago(lookback)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| join kind=inner relevantAlerts on $left.AlertId == $right.SystemAlertId
| project IncidentNumber, Title, Severity, Status, Classification,
    AlertName, AlertSeverity, ProductName, Tactics, Techniques,
    CompromisedEntity, AlertTime = TimeGenerated1
| order by AlertTime desc

Interpreting Incident Status in Drift Context:

| Incident Status | Classification | Impact on Drift Assessment |
|-----------------|----------------|----------------------------|
| Closed | TruePositive | 🔴 Confirmed threat: significantly increases drift risk |
| Closed | FalsePositive | 🟢 False alarm: discount from drift risk, note as noise |
| Closed | BenignPositive | 🟡 Expected behavior: note but don't escalate |
| Active/New | Any | 🟠 Unresolved: flag for attention, may indicate ongoing threat |

Product Name Mapping (Legacy → Current Branding):

| SecurityAlert.ProductName (raw) | Report Display Name |
|---------------------------------|---------------------|
| Microsoft Defender Advanced Threat Protection | Microsoft Defender for Endpoint |
| Microsoft Cloud App Security | Microsoft Defender for Cloud Apps |
| Microsoft Data Loss Prevention | Microsoft Purview Data Loss Prevention |
| Azure Sentinel | Microsoft Sentinel |
| Microsoft 365 Defender | Microsoft Defender XDR |
| Office 365 Advanced Threat Protection | Microsoft Defender for Office 365 |
| Azure Advanced Threat Protection | Microsoft Defender for Identity |

Report Rendering: Group by incident, show severity/status/classification. Translate ProductName to current branding. Link back to device drift scores — a device with both high drift score AND correlated security alerts is highest priority for investigation.
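For report rendering, the product name mapping can be applied with a simple lookup. The sketch below copies the mapping from the table above; the function name `display_product_name` is hypothetical, and passing unknown names through unchanged is an assumption about the desired fallback.

```python
PRODUCT_DISPLAY_NAMES = {
    "Microsoft Defender Advanced Threat Protection": "Microsoft Defender for Endpoint",
    "Microsoft Cloud App Security": "Microsoft Defender for Cloud Apps",
    "Microsoft Data Loss Prevention": "Microsoft Purview Data Loss Prevention",
    "Azure Sentinel": "Microsoft Sentinel",
    "Microsoft 365 Defender": "Microsoft Defender XDR",
    "Office 365 Advanced Threat Protection": "Microsoft Defender for Office 365",
    "Azure Advanced Threat Protection": "Microsoft Defender for Identity",
}


def display_product_name(raw_name: str) -> str:
    """Translate a raw SecurityAlert.ProductName to current branding.

    Names that are not in the mapping are returned unchanged (assumed fallback).
    """
    return PRODUCT_DISPLAY_NAMES.get(raw_name, raw_name)


print(display_product_name("Azure Sentinel"))
print(display_product_name("Some Other Product"))
```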

Query 19: Unsigned/Unusual Signing Companies in Recent Window

// Signing companies appearing only in the recent window
// Unsigned or unusually-signed binaries may indicate unauthorized software or malware
let lookback = 7d;
let recentDays = 1;
let recentStart = ago(1d * recentDays);
let baselineCompanies = DeviceProcessEvents
| where TimeGenerated between (ago(lookback) .. recentStart)
| where isnotempty(ProcessVersionInfoCompanyName)
| distinct ProcessVersionInfoCompanyName;
DeviceProcessEvents
| where TimeGenerated >= recentStart
| summarize
    EventCount = count(),
    Devices = make_set(DeviceName, 20),
    Processes = make_set(FileName, 20)
    by ProcessVersionInfoCompanyName
| join kind=leftanti baselineCompanies on ProcessVersionInfoCompanyName
| where isnotempty(ProcessVersionInfoCompanyName)
| order by EventCount desc

For unsigned processes (empty company field):

// Find unsigned processes in the recent window
// NOTE: Linux devices will dominate results — Linux binaries lack ProcessVersionInfoCompanyName by design.
// Consider filtering to Windows devices: | where DeviceName !has "linux"
let lookback = 7d;
let recentDays = 1;
let recentStart = ago(1d * recentDays);
DeviceProcessEvents
| where TimeGenerated >= recentStart
| where isempty(ProcessVersionInfoCompanyName)
| summarize
    EventCount = count(),
    Devices = make_set(DeviceName, 20),
    SampleCommandLine = take_any(ProcessCommandLine)
    by FileName, FolderPath
| order by EventCount desc
| take 20

Query 20: Notable Command-Line Pattern Detection

// Search for reconnaissance, lateral movement, persistence, and exfiltration command patterns
// Run against the recent window to identify suspicious activity
let lookback = 7d;
let recentDays = 1;
let recentStart = ago(1d * recentDays);
DeviceProcessEvents
| where TimeGenerated >= recentStart
| where ProcessCommandLine has_any (
    // Reconnaissance
    "whoami", "net user", "net group", "net localgroup", "nltest", "systeminfo",
    "ipconfig /all", "nslookup", "query user", "qwinsta",
    // Lateral movement
    "psexec", "wmic", "invoke-command", "enter-pssession", "new-pssession",
    // Persistence
    "schtasks /create", "reg add", "sc create", "New-Service",
    // Credential access
    "mimikatz", "sekurlsa", "lsass", "procdump", "comsvcs.dll",
    // Exfiltration / download
    "certutil -urlcache", "bitsadmin /transfer", "curl ", "wget ",
    "Invoke-WebRequest", "downloadstring", "downloadfile"
    )
| project TimeGenerated, DeviceName, AccountName, FileName,
    InitiatingProcessFileName, ProcessCommandLine
| order by TimeGenerated desc
| take 50

Interpretation:

  • Commands executed by expected service accounts (e.g., MDI sensor running ipconfig /flushdns) → benign
  • Linux health checks (curl to MCR, wget for MOTD) executed by root → expected operational noise
  • Reconnaissance commands from user accounts or unexpected contexts → investigate
  • Multiple categories of suspicious commands on the same device → high confidence indicator of compromise
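The last point, multiple categories on the same device, can be checked mechanically once Query 20 results are available. The sketch below is illustrative only: it uses a subset of the patterns from Query 20, a hypothetical `(device, command_line)` input shape, and plain case-insensitive substring matching, which is looser than KQL's term-based `has_any`.

```python
# Subset of the Query 20 patterns, grouped by the categories named in its comments.
CATEGORIES = {
    "reconnaissance": ["whoami", "net user", "nltest", "systeminfo"],
    "lateral_movement": ["psexec", "wmic"],
    "persistence": ["schtasks /create", "reg add"],
    "exfiltration": ["certutil -urlcache", "bitsadmin /transfer"],
}


def categories_by_device(events):
    """Map each device to the set of suspicious-command categories seen on it."""
    hits = {}
    for device, command_line in events:
        lowered = command_line.lower()
        for category, patterns in CATEGORIES.items():
            if any(pattern in lowered for pattern in patterns):
                hits.setdefault(device, set()).add(category)
    return hits


events = [
    ("srv01", "cmd.exe /c whoami"),
    ("srv01", "schtasks /create /tn updater /tr C:\\temp\\u.exe"),
    ("ws02", "notepad.exe report.txt"),
    ("ws03", "systeminfo"),
]
hits = categories_by_device(events)
multi_category = sorted(d for d, cats in hits.items() if len(cats) >= 2)
print(hits)
print(multi_category)
```

Devices in `multi_category` are the ones the interpretation treats as higher-confidence indicators; single-category hits still need the account and context review described above.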

Query 21: DeviceInfo Uptime Pattern (Device Corroboration)

// Corroboration query: Determine actual device uptime days from DeviceInfo table (MDE sensor)
// DeviceInfo records entity snapshots ~hourly for MDE-onboarded devices
// Run for the full analysis window (baseline + recent) to see power-on cadence
// Substitute <DEVICE_NAME> with the target device hostname
let totalDays = 97; // Intentionally wider than the drift analysis window (default 7d) to capture the device's long-term power-on cadence across 90+ days
DeviceInfo
| where TimeGenerated > ago(1d * totalDays)
| where DeviceName has "<DEVICE_NAME>"
| summarize 
    FirstSeen = min(TimeGenerated),
    LastSeen = max(TimeGenerated),
    RecordCount = count(),
    SensorHealth = take_any(SensorHealthState),
    OnboardingStatus = take_any(OnboardingStatus)
    by Day = bin(TimeGenerated, 1d)
| order by Day asc

Heartbeat fallback (for non-MDE devices with Log Analytics agent only):

// Fallback: Use Heartbeat table when DeviceInfo returns 0 results (device not MDE-onboarded)
let totalDays = 97;
Heartbeat
| where TimeGenerated > ago(1d * totalDays)
| where Computer has "<DEVICE_NAME>"
| summarize 
    FirstHeartbeat = min(TimeGenerated),
    LastHeartbeat = max(TimeGenerated),
    HeartbeatCount = count()
    by Day = bin(TimeGenerated, 1d)
| order by Day asc

Interpretation:

  • Gaps between days = device was powered off (or MDE sensor was inactive). Count the rows to determine total days online vs. the full analysis window.
  • SensorHealthState values: Active (sensor reporting normally), Inactive (sensor not communicating), Misconfigured (partial telemetry). Use to assess data quality.
  • Intermittent devices (online <30% of baseline window) will produce artificially diluted baseline daily averages. A single power-on session will appear as a large volume spike. This is a mathematical artifact, not genuine drift.
  • Consistent daily presence confirms the baseline daily average is representative — volume spikes are more meaningful.
  • Use case: When a device shows elevated Volume ratio (>200%) but low Process/Account/Chain diversity ratios, check DeviceInfo first. If the device was only online 5 days out of 90, a volume ratio in the 300% range (e.g., 312%) is expected.
  • Example: A device with 4,243 baseline events spread across only 4 power-on sessions (~40 hrs total) has a "true" daily average of ~1,060 events/session-day, not the diluted ~47 events/calendar-day. A recent session producing 1,031 events is therefore in line with baseline sessions (about 97% of the per-session average).
  • Why DeviceInfo over Heartbeat: DeviceInfo is generated by the MDE sensor (~hourly entity snapshots) and covers all Defender-onboarded devices. Heartbeat requires a Log Analytics agent (AMA/MMA) which many MDE-only devices don't have. In testing, DeviceInfo showed 28 days of coverage where Heartbeat showed only 3 days for the same device.
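The dilution effect in the example can be reproduced with a few lines of arithmetic. The event and session counts come from the example above; the 90-day baseline length is an assumption inferred from the "~47 events/calendar-day" figure, and the variable names are illustrative.

```python
baseline_events = 4243
baseline_calendar_days = 90   # assumption: 4243 / 90 is roughly the quoted ~47 per day
baseline_sessions = 4         # power-on days observed in DeviceInfo
recent_session_events = 1031

calendar_daily_avg = baseline_events / baseline_calendar_days
session_daily_avg = baseline_events / baseline_sessions

# Ratio against the diluted calendar-day average vs. the per-session average.
raw_volume_ratio = recent_session_events / calendar_daily_avg * 100
session_volume_ratio = recent_session_events / session_daily_avg * 100

print(f"calendar-day average: {calendar_daily_avg:.1f}")
print(f"session-day average:  {session_daily_avg:.2f}")
print(f"raw volume ratio:     {raw_volume_ratio:.0f}%")
print(f"per-session ratio:    {session_volume_ratio:.1f}%")
```

The same recent session looks like a large spike against the calendar-day average and like ordinary behaviour against the per-session average, which is why uptime context matters before treating a volume ratio as drift.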

Query 22: Per-Session Process Volume (Device Corroboration)

// Corroboration query: Show event volume and diversity per power-on session
// Confirms events are concentrated in short bursts, not spread evenly
// Substitute <DEVICE_NAME> with the target device hostname
let totalDays = 97; // Intentionally wider than the drift analysis window (default 7d) to capture per-session behavior across the device's full power-on history
DeviceProcessEvents
| where TimeGenerated > ago(1d * totalDays)
| where DeviceName has "<DEVICE_NAME>"
| summarize 
    Events = count(),
    UniqueProcesses = dcount(FileName),
    UniqueAccounts = dcount(AccountName),
    FirstEvent = min(TimeGenerated),
    LastEvent = max(TimeGenerated)
    by Day = bin(TimeGenerated, 1d)
| extend SessionDuration = LastEvent - FirstEvent
| order by Day asc

Interpretation:

  • Per-session event volumes should be compared across sessions. If each power-on session produces roughly similar event counts (600–1,500), the behavior is consistent regardless of how infrequently the device is used.
  • SessionDuration shows how long the device was active per day. Cross-reference with DeviceInfo FirstSeen/LastSeen for validation.
  • Process diversity per session (UniqueProcesses) should be similar across sessions. If the most recent session shows 90+ unique processes and baseline sessions also show 70–90+, the diversity is normal — the same software runs each time the device boots.
  • Use in report: Include a power-on session table in the Flagged Device Deep Dive to contextualize why the volume ratio is elevated. Note: "Volume-driven score inflation due to intermittent usage pattern — per-session behavior is consistent with baseline sessions."

Report Template

Inline Chat Report Structure (Fleet-Wide)

The inline report MUST include these sections in order:

  1. Header — Workspace, analysis period (baseline/recent windows), drift threshold, device count, total events
  2. Fleet Daily Trend Table — Day-by-day event counts, distinct processes, accounts, chains, companies
  3. Per-Device Drift Score Ranking — All devices sorted by DriftScore descending, with per-dimension ratios and flag status
  4. Flagged Device Deep Dive (for each Tier 1 device > 150% or DriftScore=999) — Baseline vs. recent comparison, dimension bar chart, new processes, process chains, account context. For new devices (999): identify as "newly onboarded" and list all processes observed. For devices with elevated volume ratio: include DeviceInfo uptime pattern (Query 21) and per-session volume table (Query 22) showing power-on cadence and per-session event consistency. Flag intermittent devices with: "⚠️ Intermittent device — online N of M baseline days. Volume ratio reflects power-on burst, not behavioral expansion."
  5. Tier 2 Device Summaries (if fleet scaling applied) — One-line summary per Tier 2 device: drift score, top 3 first-seen processes, flag status. No full deep dive.
  6. First-Seen Process Summary — Processes appearing only in recent window, grouped by device (Tier 1 + Tier 2 devices)
  7. Correlated Security Alerts — SecurityAlert+SecurityIncident correlation for all analyzed devices
  8. Uptime Context (if applicable) — For flagged or near-threshold devices, include DeviceInfo-derived power-on session table showing each session's duration, event count, and process diversity. This section contextualizes volume-driven drift scores.
  9. Account Landscape — Summary of which accounts executed processes, flagging any unexpected contexts
  10. Notable Command-Line Patterns — Reconnaissance/lateral movement/persistence command matches
  11. Security Assessment — Emoji-coded findings table with evidence citations
  12. Verdict Box — Overall fleet risk level, per-device verdicts, recommendations

Inline Chat Report Structure (Single-Device)

Same as fleet-wide sections 1 and 3–12 (section 5, Tier 2 Device Summaries, does not apply), but for one device only. Add:

  • Full process inventory (baseline vs recent)
  • Complete command-line analysis for suspicious processes
  • Process chain tree visualization

Markdown File Report Structure

When outputting to markdown file, include everything from the inline format PLUS:

Filename patterns:

  • Fleet-wide: reports/scope-drift/device/Scope_Drift_Report_fleet_devices_YYYYMMDD_HHMMSS.md
  • Single-device: reports/scope-drift/device/Scope_Drift_Report_<device_name>_YYYYMMDD_HHMMSS.md
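A small helper can build these paths consistently. This is a sketch only: the function name `report_filename` is hypothetical, the timestamp is assumed to be UTC (matching the "Generated" header in the template), and no sanitization of the device name is attempted.

```python
from datetime import datetime, timezone


def report_filename(device_name=None, now=None):
    """Build the report path for fleet-wide (device_name=None) or single-device mode."""
    now = now or datetime.now(timezone.utc)  # assumption: timestamps are UTC
    stamp = now.strftime("%Y%m%d_%H%M%S")
    target = device_name if device_name else "fleet_devices"
    return f"reports/scope-drift/device/Scope_Drift_Report_{target}_{stamp}.md"


fixed = datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
print(report_filename(now=fixed))
print(report_filename("srv01", now=fixed))
```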
# Device Process Scope Drift Report

**Generated:** YYYY-MM-DD HH:MM UTC
**Workspace:** <workspace_name>
**Baseline Period:** <start> → <end> (<N> days)
**Recent Period:** <start> → <end> (<N> days)
**Drift Threshold:** 150%
**Data Sources:** DeviceProcessEvents, SecurityAlert, SecurityIncident, DeviceInfo
**Mode:** Fleet-Wide | Single-Device (<device_name>)
**Devices Analyzed:** <count>
**Total Events:** <count>

---

## Executive Summary

<1-3 sentence summary: how many devices analyzed, how many flagged, overall risk level>

---

## Fleet Daily Trend

<ASCII table: Day | Events | Devices | Processes | Accounts | Chains | Companies>
<!-- Wrap in code fence for consistent rendering -->

---

## Per-Device Drift Score Ranking

<Table with all devices, per-dimension ratios, DriftScore, flag status>
<Devices with DriftScore=999 flagged as "New Device">

---

## Flagged Device Deep Dive

### <Device Name> — Drift Score <score>

**ASCII Drift Dimension Chart (REQUIRED):**

Render a box-drawn chart inside a code fence. **Inner width: 58 chars** (every line between `│` markers = exactly 58 visual characters). No emoji inside boxes — use text labels.

**Alignment:** Name (9 chars padded) + weight (5) + gap (2) + bars (20 `█─`) + gap (2) + pct (6, right-aligned: `XXX.X%` or ` XX.X%`) + gap (2) + direction (10 total: `^`/`v`/`=` + 9 trailing spaces). Status labels (centered): `STABLE`, `STABLE (Low-Volume)`, `NEAR THRESHOLD`, `ABOVE THRESHOLD`, `CRITICAL`. Direction: `^` (up), `v` (down), `=` (stable).

**Bar characters:** Use `█` (U+2588 full block) for filled portions and `─` (U+2500 box-drawing horizontal) for the unfilled track.

**Uptime-adjusted Volume:** When the Volume dimension has been adjusted for intermittent uptime (see Pitfalls → Intermittent-Use Device Volume Inflation), display the **effective (adjusted) percentage** in the chart and move the raw value into the description column. This keeps the percentage column fixed-width and avoids breaking bar alignment. Example: `XXX.X%  ^  (raw: YYY.Y%)`.

┌──────────────────────────────────────────────────────────┐
│                 DEVICE DRIFT SCORE: XX.X                 │
│                          STABLE                          │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  Volume   (30%)  ██████──────────────  XXX.X%  ^         │
│  Processes(25%)  ███─────────────────   XX.X%  v         │
│  Accounts (15%)  ██████──────────────  XXX.X%  =         │
│  Chains   (20%)  ██──────────────────   XX.X%  v         │
│  Companies(10%)  ██████──────────────  XXX.X%  =         │
│                                                          │
│  ────────────────────────── 100% baseline ──┤            │
│                                  150% drift threshold ▲  │
└──────────────────────────────────────────────────────────┘


**Bar fill:** 20 chars wide. Filled = round(ratio/100 × 20), capped at 20. Title and status: center within 58 chars. Use `█` for filled, `─` for unfilled.

**Then** render the standard markdown dimension table:

| Dimension | Weight | Baseline | Recent | Ratio | Weighted | Status |
|-----------|--------|----------|--------|-------|----------|--------|

<Baseline vs recent comparison table>
<New processes list with signing companies>
<New process chains>
<Account context>

#### Uptime Context (if intermittent device)

<If Volume ratio >200% or device known to be intermittent, include DeviceInfo-derived power-on session table>

| Session | Power On | Power Off | Duration | Events | Processes |
|---------|----------|-----------|----------|--------|-----------|
| 1 | <date/time> | <date/time> | ~N hrs | <count> | <count> |
| ... | ... | ... | ... | ... | ... |

⚠️ Intermittent device — online N of M baseline days. Volume ratio reflects power-on burst, not behavioral expansion. Per-session behavior is consistent with baseline sessions.

---

## First-Seen Processes

<Processes appearing only in recent window, by device>

---

## Correlated Security Alerts

<SecurityAlert + SecurityIncident correlation>
<Group by incident, show severity/status/classification>

---

## Notable Command-Line Patterns

<Reconnaissance/lateral movement/persistence/exfiltration matches>
<Context: which account, which device, benign vs suspicious>

---

## Security Assessment

| Factor | Finding |
|--------|---------|
| 🔴/🟢/🟡 **Factor** | Evidence-based finding |

---

## Verdict

**ASCII Verdict Box (REQUIRED):**

Render a box-drawn verdict summary inside a code fence. **Inner width: 66 chars.** No emoji inside boxes. Pad every line to exactly 66 chars between `│` markers.

For fleet-wide reports:

┌──────────────────────────────────────────────────────────────────┐
│  OVERALL FLEET RISK: <LEVEL> -- <One-line summary>               │
│  Flagged Devices: X of Y (Threshold: 150%)                       │
│  Root Cause: <Brief root cause explanation>                      │
└──────────────────────────────────────────────────────────────────┘


For single-device reports:

┌──────────────────────────────────────────────────────────────────┐
│  OVERALL RISK: <LEVEL> -- <One-line summary>                     │
│  Drift Score: XX.X (Interpretation)                              │
│  Root Cause: <Brief root cause explanation>                      │
└──────────────────────────────────────────────────────────────────┘


**Then** render the full verdict with:
- Per-device verdicts (for fleet-wide)
- Root Cause Analysis paragraph
- Key Findings (numbered list)
- Recommendations (emoji-prefixed list)

---

## Appendix: Query Details

Render a single markdown table summarizing all queries executed. **Do NOT include full KQL text** — the canonical queries are already documented in this SKILL.md file. The appendix serves as an audit trail only.

| Query | Table(s) | Records Scanned | Results | Execution |
|-------|----------|----------------:|--------:|----------:|
| Q15 — Device Process Baseline vs. Recent | DeviceProcessEvents | X,XXX | N rows | X.XXs |
| ... | ... | ... | ... | ... |

*Query definitions: see the Sample KQL Queries section in this SKILL.md file.*
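The bar fill and alignment rules in the report template (20-character bar, filled = round(ratio/100 × 20) capped at 20, 58-character inner width) can be sketched as below. The function names are hypothetical, and the row layout assumes two leading spaces before the 9-character name so that the documented field widths add up to 58. Note that Python's `round` rounds exact halves to the nearest even integer, which may differ from other rounding conventions at bar boundaries.

```python
def render_bar(ratio: float, width: int = 20) -> str:
    """Filled blocks for the ratio, capped at the bar width, with a dash track for the rest."""
    filled = min(width, max(0, round(ratio / 100 * width)))
    return "█" * filled + "─" * (width - filled)


def render_row(name: str, weight_pct: int, ratio: float, direction: str) -> str:
    """One chart row: name (9) + weight (5) + gap + bar (20) + gap + pct (6) + gap + direction (10)."""
    return f"  {name:<9}({weight_pct:>2}%)  {render_bar(ratio)}  {ratio:>5.1f}%  {direction:<10}"


row = render_row("Volume", 30, 312.0, "^")
print("│" + row + "│")
print("│" + render_row("Processes", 25, 30.0, "v") + "│")
print(len(row))
```

Names longer than 9 characters or ratios of 1000% and above would break the fixed width, so the 999 "New Device" sentinel is better rendered as a text label than as a chart row.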

Known Pitfalls

SecurityAlert.Status Is Immutable — Always Join SecurityIncident

Problem: The Status field on SecurityAlert is set to "New" at creation time and never changes. It does NOT reflect whether the alert has been investigated, closed, or classified.

Solution: MUST join with SecurityIncident to get real Status (New/Active/Closed) and Classification (TruePositive/FalsePositive/BenignPositive). See Query 18, which implements this join.

Low-Volume Statistical Inflation

Problem: Entities with very low baseline activity will show extreme volume ratios even with minor changes.

Solution: Apply the denominator floor (minimum 10 events/day for volume ratio calculation). Always flag low-volume baselines in the report.

Seasonal/Cyclical Baselines

Problem: Some devices have weekly patterns (lower on weekends) or monthly cycles (patch Tuesday).

Solution: Note if the recent window falls on an atypical portion of the cycle. The baseline smooths most cyclical patterns, but edge cases exist.

Newly Onboarded Devices (DriftScore = 999)

Problem: Devices that appear only in the recent window (no baseline data) will have all dimension ratios default to 999, producing an extreme drift score. This does NOT indicate malicious drift; it indicates a newly discovered or recently onboarded device.

Solution: Flag these devices as "🔵 New Device — No Baseline" rather than "🔴 Critical Drift". Review the process inventory to confirm the device is running expected management software (MDM agents, AV, etc.). Recommend monitoring for an additional baseline period before assessing drift.

Data Lake Ingestion Boundary

Problem: DeviceProcessEvents in Sentinel Data Lake may have an ingestion lag or retention boundary that causes the most recent hours of data to be absent. This can make devices appear to have zero recent-window activity when data simply hasn't been ingested yet.

Solution: In the fleet daily trend (Query 14), verify that the most recent day has comparable event counts to previous days. If the last day shows significantly fewer events across ALL devices, note: "⚠️ Data Lake ingestion boundary detected — recent window may be incomplete." Adjust the recent window start time if needed.

Advanced Hunting Fallback

Problem: DeviceProcessEvents may fail in one of the two execution tools due to query complexity, timeout, or API limitations. This table is available in both Advanced Hunting and Sentinel Data Lake.

Solution: Follow the global Tool Selection Rule in .github/copilot-instructions.md: use Advanced Hunting (RunAdvancedHuntingQuery with Timestamp) for lookbacks ≤ 30 days, and Sentinel Data Lake (query_lake with TimeGenerated) for lookbacks > 30 days (e.g., 90-day baselines). If the preferred tool fails, try the other: same table, same data. If both fail, check that the Defender XDR connector is connected to the workspace.

System/Service Accounts Dominating Volume

Problem: The majority of process events on servers come from system accounts (SYSTEM, LOCAL SERVICE, NETWORK SERVICE, root). These accounts are expected and will dominate volume, process, and chain dimensions.

Solution: When analyzing drift, distinguish between system-level processes (expected) and user-driven processes (more significant for drift). In the account landscape, flag any human user accounts (non-system) executing unusual processes. System accounts executing new processes are still worth noting but at lower priority.

Short Baseline Windows and False Positives

Problem: Unlike SPN/user drift which uses a 90-day baseline, device process drift often uses shorter windows (e.g., 6 days baseline, 1 day recent). Short baselines miss infrequent but legitimate processes (weekly maintenance scripts, monthly update cycles, etc.). Solution: Note the baseline length in the report. If many "first-seen" processes are common system utilities (Task Scheduler, Windows Update, antivirus scans), acknowledge that a longer baseline would likely include them. Recommend extending to 14-30 days for production use.

DeviceProcessEvents Volume Limits

Problem: DeviceProcessEvents can generate massive volumes — tens of thousands of events per device per day on busy servers. KQL queries with dcount() and make_set() can be expensive. Solution: Always apply TimeGenerated filter as the FIRST filter. Use take or summarize to limit intermediate results. For fleet-wide analysis across many devices, consider processing in batches if total events exceed 500K.

Intermittent-Use Device Volume Inflation

Problem: Devices that are only powered on occasionally (e.g., once per month for maintenance, lab servers, training VMs) will have their baseline daily average diluted across the full analysis window — even though telemetry only exists for a handful of days. When one of these devices powers on during the recent window, the volume ratio can spike to 300%+ even though per-session behavior is identical to baseline sessions. This creates near-threshold or above-threshold DriftScores driven entirely by the volume dimension, with no meaningful behavioral change. Solution: For any device with Volume ratio >200% but Process/Account/Chain/Company ratios below 100%, run Query 21 (DeviceInfo uptime) to determine actual days online. If the device was online for <30% of the baseline window (i.e., fewer than ~27 out of 90 days), flag as "⚠️ Intermittent device — volume-driven score inflation" and include a per-session comparison (Query 22). Consider reporting both the raw DriftScore and an "adjusted" assessment that contextualizes the volume dimension against actual uptime days rather than calendar days. The diversity dimensions (Processes, Accounts, Chains, Companies) are not affected by intermittent usage and remain reliable drift indicators.

Chart formatting for adjusted Volume: In the ASCII drift chart, display only the effective (adjusted) percentage in the percentage column, and append the raw value in the description text after the bar. This avoids variable-width bracket content that breaks bar alignment. Example:

Volume   [ 85.1%] ████████────── ↓ Adjusted from 288.3% raw (intermittent uptime)
Process  [ 79.5%] ████████────── ↓ Contracting (97/122 unique)
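One way to produce the adjusted Volume figure is to divide baseline events by the days the device was actually online instead of by calendar days. The sketch below assumes that interpretation; the function and parameter names are hypothetical, and the skill does not prescribe this exact calculation.

```python
def uptime_adjusted_volume(bl_total_events, bl_calendar_days, bl_days_online,
                           rc_daily_avg, intermittent_cutoff=0.30):
    """Compare raw and uptime-adjusted volume ratios for one device.

    rc_daily_avg is assumed to be the recent-window average per online day.
    intermittent_cutoff mirrors the "<30% of the baseline window" rule.
    """
    if bl_total_events <= 0 or bl_calendar_days <= 0:
        raise ValueError("baseline must contain events and cover at least one day")
    raw_bl_avg = bl_total_events / bl_calendar_days         # diluted across the window
    adj_bl_avg = bl_total_events / max(bl_days_online, 1)   # per day actually online
    return {
        "raw_ratio": round(rc_daily_avg / raw_bl_avg * 100, 1),
        "adjusted_ratio": round(rc_daily_avg / adj_bl_avg * 100, 1),
        "intermittent": bl_days_online / bl_calendar_days < intermittent_cutoff,
    }
```

A device with 9,000 baseline events over 90 calendar days but only 9 days online, and 1,000 events per day in the recent window, has a raw ratio of 1000% and an adjusted ratio of 100%.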

Version-Stamped Process Name False Positives

Problem: Automatic software updates produce binaries with version numbers embedded in the filename (e.g., AM_Delta_Patch_1.443.XXX.0.exe, MicrosoftEdge_X64_134.0.XXXX.XX_*.exe, odt*.tmp.exe). These appear as "first-seen" in Query 16 and "new chains" in Query 17 regardless of baseline length, because each update generates a unique filename. Solution: When interpreting first-seen processes, check ProcessVersionInfoCompanyName — if the signing company is well-known (Microsoft Corporation, Google LLC, etc.), these are expected update artifacts. In the report, group these under "📦 Expected Update Artifacts" rather than flagging as suspicious drift. For automated scoring, consider excluding filenames matching patterns like AM_Delta_Patch_*, MicrosoftEdge_X64_*, and *.tmp.exe from the drift score calculation, or weighting them lower.
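For automated scoring, the exclusion rule could look like the following sketch. It assumes a file is treated as an expected update artifact only when both the filename pattern and the signing company match; that pairing, the helper name, and the sample filename in the usage note are illustrative choices rather than part of the skill.

```python
from fnmatch import fnmatchcase

# Patterns named in the pitfall above, lowercased for case-insensitive matching.
UPDATE_ARTIFACT_PATTERNS = ("am_delta_patch_*", "microsoftedge_x64_*", "*.tmp.exe")
# Assumed allow-list; extend with the vendors deployed in your fleet.
TRUSTED_COMPANIES = {"microsoft corporation", "google llc"}


def is_expected_update_artifact(file_name, company_name):
    """True when a first-seen binary looks like a version-stamped update file."""
    name = file_name.lower()
    company = (company_name or "").strip().lower()
    matches_pattern = any(fnmatchcase(name, p) for p in UPDATE_ARTIFACT_PATTERNS)
    return matches_pattern and company in TRUSTED_COMPANIES
```

For example, `is_expected_update_artifact("AM_Delta_Patch_1.443.100.0.exe", "Microsoft Corporation")` is true, while the same filename with an empty company name is not.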

Linux Processes Dominate Unsigned Query

Problem: Linux binaries do not populate ProcessVersionInfoCompanyName (a Windows PE metadata field). Query 19b (unsigned processes) will be flooded with legitimate Linux utilities (gawk, bash, grep, sed, curl, apt-get, etc.) on any fleet containing Linux devices. Solution: When running Query 19b on a mixed fleet, filter to Windows devices only (| where DeviceName !has "linux") or annotate Linux results separately. For Linux devices, focus on unusual binary paths (e.g., processes running from /tmp/, /dev/shm/, or user home directories) rather than signing status.
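The path-based triage for Linux devices can be sketched as below. The prefix list follows the examples in the text; treating `/home/` and `/root/` as "user home directories" is an assumption, and the function name is hypothetical.

```python
import posixpath

# /tmp/ and /dev/shm/ come from the text; /home/ and /root/ are assumed
# stand-ins for "user home directories".
UNUSUAL_PREFIXES = ("/tmp/", "/dev/shm/", "/home/", "/root/")


def runs_from_unusual_path(folder_path):
    """True when a Linux process folder sits under a location worth reviewing."""
    # Normalize "." and ".." segments, then add a trailing slash so that
    # "/tmp" matches the "/tmp/" prefix but "/tmpfoo" does not.
    normalized = posixpath.normpath(folder_path) + "/"
    return normalized.startswith(UNUSUAL_PREFIXES)
```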


Error Handling

Common Issues

| Issue | Solution |
| --- | --- |
| DeviceProcessEvents table not found | Table may not be connected via Defender XDR connector. Check with search_tables. Verify Defender for Endpoint is onboarded. |
| DeviceProcessEvents query timeout | Reduce lookback window or add intermediate summarize. Split fleet-wide into batches by device if >20 devices. |
| Advanced Hunting fails for DeviceProcessEvents | Default to Sentinel Data Lake (query_lake). Adapt Timestamp → TimeGenerated. See Advanced Hunting Fallback pitfall. |
| Device appears only in recent window | New device onboarding: set DriftScore=999, flag as "New Device", not malicious drift. |
| All devices show zero recent events | Data Lake ingestion boundary: verify with fleet daily trend (Query 14). Adjust recent window if needed. |
| Query timeout | Reduce the lookback window, or add `| take 100` to intermediate results. |

Validation Checklist

Before presenting results, verify:

  • All applicable data sources were queried (even if some returned 0 results)
  • Low-volume denominator floor was applied to any device with BL_DailyAvg < 10
  • Corroborating evidence was checked for every flagged device
  • Empty results are explicitly reported with ✅ (not silently omitted)
  • The report includes the drift score formula and threshold for transparency
  • SecurityAlert was joined with SecurityIncident for real Status/Classification (never read SecurityAlert.Status directly)
  • Incident classifications (TP/FP/BP) were factored into risk assessment — FalsePositive alerts discounted, TruePositive alerts escalated
  • Fleet daily trend was verified for data completeness (no ingestion boundary issues)
  • Newly onboarded devices (baseline-only = no recent, or recent-only = no baseline) were correctly identified
  • DriftScore=999 entities were flagged as "New Device" not "Critical Drift"
  • System/service account processes were distinguished from user-driven processes
  • First-seen processes were checked for legitimate software deployment vs suspicious binaries
  • Version-stamped update binaries (AM_Delta_Patch_, MicrosoftEdge_X64_, odt*.tmp.exe) were classified as expected noise
  • Unsigned/unusually-signed binaries were identified (Linux devices flagged separately from Windows)
  • Notable command-line patterns were searched (reconnaissance, lateral movement, persistence, exfiltration)
  • SecurityAlert correlation was performed for all analyzed devices
  • Baseline window length was noted and its limitations acknowledged
  • For devices with Volume ratio >200% or DriftScore >130%: DeviceInfo uptime (Query 21) was checked to identify intermittent-use devices
  • Intermittent-use devices were annotated with uptime context and per-session comparison (Query 22)
  • Volume-driven drift scores on intermittent devices were contextualized as mathematical artifacts (not behavioral expansion)

SVG Dashboard Generation

📊 Optional post-report step. After a Device scope drift report is generated, the user can request a visual SVG dashboard.

Trigger phrases: "generate SVG dashboard", "create a visual dashboard", "visualize this report", "SVG from the report"

How to Request a Dashboard

  • Same chat: "Generate an SVG dashboard from the report" — data is already in context.
  • New chat: Attach or reference the report file, e.g. #file:reports/scope-drift/device/Scope_Drift_Report_<entity>_<date>.md
  • Customization: Edit svg-widgets.yaml before requesting — the renderer reads it at generation time.

Execution

Step 1:  Read svg-widgets.yaml (this skill's widget manifest)
Step 2:  Read .github/skills/svg-dashboard/SKILL.md (rendering rules — Manifest Mode)
Step 3:  Read the completed report file (data source)
Step 4:  Render SVG → save to reports/scope-drift/device/{report_name}_dashboard.svg

The YAML manifest is the single source of truth for layout, widgets, field mappings, colors, and data source documentation. All customization happens there.

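Detects scope drift in Entra ID service principals (SPNs), identifying gradual, anomalous expansion of permissions or behavior. It builds a 90-day baseline, compares it with the most recent 7 days of activity, computes a weighted drift score across 5 dimensions, and correlates security logs to surface covert access-expansion risks.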
scope drift service principal drift SPN behavioral change automation account drift baseline deviation access expansion
.github/skills/scope-drift-detection/spn/SKILL.md
npx skills add SCStelz/security-investigator --skill scope-drift-detection-spn -g -y
SKILL.md
Frontmatter
{
    "name": "scope-drift-detection-spn",
    "description": "Use this skill when asked to detect scope drift, behavioral expansion, or gradual privilege\/access creep in service principals or automation accounts. Triggers on keywords like \"scope drift\", \"service principal drift\", \"SPN behavioral change\", \"automation account drift\", \"baseline deviation\", \"access expansion\", or when investigating whether a service principal has gradually expanded beyond its intended purpose. This skill builds a 90-day behavioral baseline per SPN, compares it with 7-day recent activity, computes a weighted Drift Score across 5 dimensions, and correlates with SecurityAlert and AuditLogs for corroborating evidence.",
    "drill_down_prompt": "Analyze service principal drift for {entity} — resource\/IP\/location expansion",
    "threat_pulse_domains": [
        "spn"
    ]
}

Service Principal Scope Drift Detection — Instructions

Purpose

Credit: The scope drift detection concept for service principals was inspired by Iftekhar Hussain's article The Agentic SOC Era: How Sentinel MCP Enables Autonomous Security Reasoning (Feb 2026), which demonstrated multi-source correlation across AADServicePrincipalSignInLogs, AuditLogs, and SecurityAlert to build 90-day behavioral baselines and surface drift via weighted scoring.

This skill detects scope drift — the gradual, often imperceptible expansion of access or behavior beyond an established baseline — in Entra ID service principals. Unlike sudden compromise (which triggers alerts), scope drift is a slow-burn pattern that evades threshold-based detections.

Entity Type: Service Principal

| Identifier | Primary Table(s) | Use Case |
| --- | --- | --- |
| ServicePrincipalName / ServicePrincipalId | AADServicePrincipalSignInLogs | App registrations, automation accounts, managed identities |

What this skill detects:

  • Volume spikes in sign-in activity relative to historical baseline
  • New target resources (APIs, services) not previously accessed
  • New source IP addresses or geographic locations
  • Increased failure rates indicating probing or misconfiguration
  • Credential/permission changes correlated with behavioral shifts
  • Security alerts involving the drifting entities

Related skills:


📑 TABLE OF CONTENTS

  1. Critical Workflow Rules - Start here!
  2. Output Modes - Inline chat vs. Markdown file
  3. Quick Start - 7-step investigation pattern
  4. Drift Score Formula - Weighted composite scoring (5 dimensions)
  5. Execution Workflow - Complete 4-phase process
  6. Sample KQL Queries - Validated query patterns (Queries 1-4)
  7. Report Template - Output format specification
  8. Known Pitfalls - Edge cases and false positives
  9. Error Handling - Troubleshooting guide
  10. SVG Dashboard Generation - Visual dashboard from report

Investigation shortcuts:

  • SPN drift triage (TP Q5): Q1 (baseline vs recent — drift scores + dimension ratios) → Q4 (alert/incident correlation) → Tier 1 deep dives for flagged SPNs
  • Compromised SPN forensics (TP Q5 + incident context): Q1 (behavioral profile) → Q3 (detailed AuditLog changes — credential adds, consent grants, timestamps, actors) → Q4 (incident status/classification check)
  • Permission escalation investigation (TP Q10, standalone): Q2 (AuditLog summary — operation counts baseline vs recent) → Q3 (detailed per-operation rows with initiator/target/modified properties) → Graph API: app permission audit
  • IP infrastructure expansion (TP Q5, high IPDrift): Q1 (new IPs list from NewIPs array) → anti-join baseline IPs to identify novel sources → IP enrichment (enrich_ips.py or ioc-investigation) for non-Azure IPs

⛔ Shortcut Default Rule: When a matching shortcut exists for the investigation context, use it — don't run the full workflow. Only run the full query set when the user explicitly requests "full investigation", "comprehensive", or "deep dive". Shortcuts render only the report sections relevant to their query chain (plus Executive Summary and Recommendations, always).


⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

Before starting ANY SPN scope drift analysis:

  1. ALWAYS enforce Sentinel workspace selection (see Workspace Selection section below)
  2. ALWAYS ask the user for output mode if not specified: inline chat summary or markdown file report (or both)
  3. ALWAYS build baseline FIRST before comparing recent activity
  4. ALWAYS apply the low-volume denominator floor to prevent false-positive drift scores on sparse baselines
  5. ALWAYS correlate across all required data sources (AADServicePrincipalSignInLogs, AuditLogs, SecurityAlert)
  6. ALWAYS run independent queries in parallel for performance
  7. NEVER report a drift flag without corroborating evidence from at least one secondary data source

Data Sources

| Data Source | Role | Purpose |
| --- | --- | --- |
| AADServicePrincipalSignInLogs | ✅ Primary | SPN sign-in behavioral baseline |
| AuditLogs | ✅ Corroboration | Permission/credential/role changes |
| SecurityAlert | ✅ Corroboration | Corroborating alert evidence |
| SecurityIncident | ✅ Corroboration | Real alert status/classification |

⛔ MANDATORY: Sentinel Workspace Selection

This skill requires a Sentinel workspace to execute queries. Follow these rules STRICTLY:

When invoked from incident-investigation skill:

  • Inherit the workspace selection from the parent investigation context
  • If no workspace was selected in parent context: STOP and ask user to select

When invoked standalone (direct user request):

  1. ALWAYS call list_sentinel_workspaces MCP tool FIRST
  2. If 1 workspace exists: Auto-select, display to user, proceed
  3. If multiple workspaces exist:
    • Display all workspaces with Name and ID
    • ASK: "Which Sentinel workspace should I use for this investigation?"
    • ⛔ STOP AND WAIT for user response
    • ⛔ DO NOT proceed until user explicitly selects
  4. If a query fails on the selected workspace:
    • ⛔ DO NOT automatically try another workspace
    • STOP and report the error, display available workspaces, ASK user to select

🔴 PROHIBITED ACTIONS:

  • ❌ Selecting a workspace without user consent when multiple exist
  • ❌ Switching to another workspace after a failure without asking
  • ❌ Proceeding with investigation if workspace selection is ambiguous

Output Modes

This skill supports two output modes. ASK the user which they prefer if not explicitly specified. Both may be selected.

Mode 1: Inline Chat Summary (Default)

  • Render the full drift analysis directly in the chat response
  • Includes ASCII tables, Pareto chart, drift dimension bars, and security assessment
  • Best for quick review and interactive follow-up questions

Mode 2: Markdown File Report

  • Save a comprehensive report to reports/scope-drift/spn/Scope_Drift_Report_<entity>_<timestamp>.md
  • All ASCII visualizations render correctly inside markdown code fences (```)
  • Includes all data from inline mode plus additional detail sections
  • Use create_file tool — NEVER use terminal commands for file output
  • Filename patterns:
    • Single SPN: Scope_Drift_Report_<spn_short_name>_YYYYMMDD_HHMMSS.md (use display name, sanitized: lowercase, spaces/special chars replaced with hyphens)
    • All SPNs: Scope_Drift_Report_all_spns_YYYYMMDD_HHMMSS.md (tenant-wide scan of all service principals)
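The filename rules can be sketched as follows. Collapsing runs of special characters into a single hyphen, trimming leading and trailing hyphens, and using UTC for the timestamp are assumptions beyond what the patterns state; both function names are hypothetical.

```python
import re
from datetime import datetime, timezone


def sanitize_spn_name(display_name):
    """Lowercase the name and replace spaces/special characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", display_name.lower()).strip("-")


def report_filename(display_name=None, now=None):
    """Build the report filename; omit display_name for a tenant-wide scan."""
    now = now or datetime.now(timezone.utc)  # UTC is an assumption
    entity = sanitize_spn_name(display_name) if display_name else "all_spns"
    return f"Scope_Drift_Report_{entity}_{now:%Y%m%d_%H%M%S}.md"
```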

Markdown Rendering Notes

  • ✅ ASCII tables, box-drawing characters, and bar charts render perfectly in markdown code blocks
  • ✅ Unicode block characters (█ full block, ─ box-drawing horizontal) display correctly in monospaced fonts
  • ✅ Emoji indicators (🔴🟢🟡⚠️✅) render natively in GitHub-flavored markdown
  • ✅ Standard markdown tables (| col |) render as formatted tables
  • Tip: Wrap all ASCII art in triple-backtick code fences for consistent rendering

Quick Start (TL;DR)

When a user requests SPN scope drift detection:

  1. Select Workspace → list_sentinel_workspaces, auto-select or ask
  2. Determine Output Mode → Ask if not specified: inline, markdown file, or both
  3. Run Phase 1 → Query 1 (AADServicePrincipalSignInLogs baseline vs recent)
  4. Apply Entity Scaling → Compute drift scores, rank SPNs, apply tiered depth limits (see Entity Scaling)
  5. Run Phases 2-3 → Queries 2-4 (AuditLogs + SecurityAlert) — scoped per tier
  6. Compute Final Assessment → Combine drift scores with corroborating evidence
  7. Output Results → Render in selected mode(s) with tiered depth

Entity Scaling (Large Environments)

Problem: In small tenants, running Queries 2–4 for every SPN is fine. In enterprise environments with hundreds or thousands of service principals, running deep-dive queries for every flagged entity is prohibitively expensive and produces unreadable reports.

Solution: After Phase 1 computes drift scores for all SPNs, apply tiered depth based on entity count and drift severity.

Entity Count Detection

After Query 1, count distinct SPNs in the result set:

| Entity Count | Tier | Deep Dive Limit | Behavior |
| --- | --- | --- | --- |
| ≤ 30 SPNs | Small | All flagged | Full deep dive for every SPN > 150%. No limiting needed. |
| 31–100 SPNs | Medium | Top 10 | Full deep dive for top 10 by DriftScore. Summary row for remaining flagged SPNs. |
| 101–500 SPNs | Large | Top 10 | Full deep dive for top 10. Tier 2 summary (next 15) with new resources/IPs only. Remaining flagged SPNs listed in ranking table with scores but no deep dive. |
| > 500 SPNs | Very Large | Top 10 | Same as Large, plus: filter Phase 1 results to BL_TotalSignIns > 10 to exclude near-silent SPNs from scoring. |

Tiered Depth Model

After computing drift scores and ranking all SPNs, assign tiers:

| Tier | Entities | Queries Run | Report Depth |
| --- | --- | --- | --- |
| Tier 1 (Full) | Top N by DriftScore | All: Q2, Q3, Q4 | Full deep dive: ASCII chart, dimension table, new resources/IPs/locations, AuditLog changes, alerts |
| Tier 2 (Summary) | Next 15 flagged SPNs (or remaining if < 15) | Q4 only (SecurityAlert correlation) | One-line summary per SPN: score, top 3 new resources, new IPs, flag status |
| Tier 3 (Score only) | All remaining flagged SPNs | None beyond Phase 1 | Row in ranking table: SPN name, drift score, dimension ratios, flag emoji |
| Stable | SPNs ≤ 150% | None beyond Phase 1 | Omitted from deep dives. Included in summary statistics only. |
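A minimal sketch of the tier assignment, assuming a dictionary of Query 1 drift scores as input. It simplifies the Medium tier by using the same Tier 2/Tier 3 split as the larger tiers, and it does not apply the extra BL_TotalSignIns filter for very large tenants; the function name is hypothetical.

```python
def assign_tiers(scores, threshold=150.0, tier1_limit=10, tier2_size=15):
    """Split SPNs into Tier 1/2/3 and stable groups by DriftScore.

    scores: dict of SPN name -> DriftScore for every SPN returned by Query 1.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    flagged = [name for name, score in ranked if score > threshold]
    stable = [name for name, score in ranked if score <= threshold]
    # Small tenants (30 SPNs or fewer) get a full deep dive for every flagged SPN.
    tier1_size = len(flagged) if len(scores) <= 30 else tier1_limit
    return {
        "tier1": flagged[:tier1_size],
        "tier2": flagged[tier1_size:tier1_size + tier2_size],
        "tier3": flagged[tier1_size + tier2_size:],
        "stable": stable,
    }
```

The sizes of the four groups map directly onto the counts shown in the report disclosure header.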

User Override

If the user explicitly asks for "all SPNs detailed" or "full report", honor the request but warn:

⚠️ Tenant has <N> service principals with <X> flagged above 150%. Running full deep dives for all flagged SPNs may be slow and produce a very long report. Proceed? (Default: top 10 deep dives + summary for others)

Report Disclosure

When tiered depth is applied, always disclose in the report header:

**Entity Count:** <N> service principals (Large tenant — tiered analysis applied)
**Deep Dives:** Top <X> by DriftScore (Tier 1: full analysis)
**Summaries:** <Y> additional flagged SPNs (Tier 2: alert correlation only)
**Score Only:** <Z> additional flagged SPNs (Tier 3: ranking table only)
**Stable:** <W> SPNs ≤ 150% (omitted from deep dives)

Drift Score Formula

The Drift Score is a weighted composite of behavioral dimensions, normalized so that 100 = identical to baseline.

Service Principal Formula (5 Dimensions)

$$ \text{DriftScore}_{SPN} = 0.30V + 0.25R + 0.20IP + 0.15L + 0.10F $$

| Dimension | Weight | Metric | Why |
| --- | --- | --- | --- |
| Volume (V) | 30% | Daily avg sign-ins (recent / baseline) | Sudden activity surges indicate misuse or compromise |
| Resources (R) | 25% | Distinct target resources accessed | New resource targets = lateral expansion |
| IPs (IP) | 20% | Distinct source IP addresses | New IPs = infrastructure changes, credential theft |
| Locations (L) | 15% | Distinct geographic locations | New geos = impossible travel or proxy rotation |
| Failure Rate (F) | 10% | Failure rate delta (recent − baseline) | Rising failures = probing or brute-force |

Interpretation Scale

| Score | Meaning | Action |
| --- | --- | --- |
| < 80 | Contracting scope | ✅ Normal: entity is doing less than usual |
| 80–120 | Stable / normal variance | ✅ No action required |
| 120–150 | Moderate deviation | 🟡 Monitor: check for legitimate reasons |
| > 150 | Significant drift | 🔴 FLAG: investigate with corroborating evidence |
| > 250 | Extreme drift | 🔴 CRITICAL: immediate investigation required |
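As a worked illustration of the formula and scale, the sketch below computes the weighted score from five per-dimension ratios (each already on the 100 = baseline scale, with the failure rate converted to a ratio) and maps it to a band. Treating the shared boundary values 120 and 150 as belonging to the lower band is an assumption; the function names are hypothetical.

```python
WEIGHTS = {
    "volume": 0.30,
    "resources": 0.25,
    "ips": 0.20,
    "locations": 0.15,
    "fail_rate": 0.10,
}


def drift_score(ratios):
    """Weighted composite of the five dimension ratios (100 = baseline)."""
    return round(sum(ratios[name] * weight for name, weight in WEIGHTS.items()), 1)


def interpret(score):
    """Map a drift score to the interpretation scale."""
    if score > 250:
        return "Extreme drift"
    if score > 150:
        return "Significant drift"
    if score > 120:
        return "Moderate deviation"
    if score >= 80:
        return "Stable / normal variance"
    return "Contracting scope"
```

For example, ratios of 300 (volume), 200 (resources), 100 (IPs), 100 (locations), and 130 (failure rate) give 90 + 50 + 20 + 15 + 13 = 188, which falls in the "Significant drift" band.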

Low-Volume Denominator Floor

CRITICAL: For entities with sparse baselines (< 10 daily sign-ins), the volume ratio is artificially inflated. Apply a floor:

IF BL_DailyAvg < 10:
    AdjustedVolumeRatio = RC_DailyAvg / max(BL_DailyAvg, 10) * 100
    Flag the score with: "⚠️ Low-volume baseline — ratio may be inflated"

This prevents an entity averaging 1 sign-in/day from triggering at 6 sign-ins/day (600% ratio but trivial absolute volume).
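The floor can be expressed as a short helper. This is a sketch of the rule above; the function name and the returned flag are illustrative.

```python
def adjusted_volume_ratio(bl_daily_avg, rc_daily_avg, floor=10.0):
    """Volume ratio with the low-volume denominator floor applied.

    Returns (ratio, floored); floored is True when the baseline was below
    the floor and the low-volume warning should accompany the score.
    """
    floored = bl_daily_avg < floor
    ratio = round(rc_daily_avg / max(bl_daily_avg, floor) * 100, 1)
    return ratio, floored
```

With a baseline of 1 sign-in/day and 6 sign-ins/day recently, the ratio drops from a raw 600% to 60% and the entity is marked as low-volume.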

Failure Rate Dimension — Delta-to-Ratio Conversion

CRITICAL: The FailRate dimension is a percentage-point delta, not a multiplicative ratio like the other dimensions. Convert it to the same 0–200+ scale using this formula:

FailRateDelta = RecentFailRate - BaselineFailRate  (percentage points)
FailRateRatio = 100 + (FailRateDelta × 10)         (scaled: each +1pp = +10 on the ratio scale)

| Baseline FailRate | Recent FailRate | Delta | Ratio | Interpretation |
| --- | --- | --- | --- | --- |
| 5.00% | 5.00% | 0.00 | 100.0 | No change |
| 5.00% | 8.00% | +3.00 | 130.0 | Moderate increase |
| 5.00% | 12.00% | +7.00 | 170.0 | 🔴 Above threshold |
| 5.00% | 2.00% | -3.00 | 70.0 | Improving (contracting) |
| 0.00% | 0.00% | 0.00 | 100.0 | No change (both clean) |
| 0.00% | 5.00% | +5.00 | 150.0 | 🟡 At threshold: new failures appearing |

Edge case: A baseline of 0% cannot cause division by zero, because the conversion adds a delta rather than dividing by the baseline. The scaling factor (×10) means each percentage point of failure-rate increase maps to 10 points on the drift scale, which keeps FailRate on the same magnitude as the other dimensions.

In the ASCII chart: Show the ratio as the bar fill percentage and append the raw delta as direction indicator: ^+X.XX (increasing) or v-X.XX (decreasing).
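The conversion and the chart indicator can be sketched as below. The function names are hypothetical, and returning an empty indicator for a zero delta is an assumption, since the text only defines the increasing and decreasing forms.

```python
def fail_rate_ratio(baseline_pct, recent_pct, scale=10.0):
    """Convert a failure-rate change into (delta, ratio) on the drift scale."""
    delta = round(recent_pct - baseline_pct, 2)   # percentage points
    return delta, round(100.0 + delta * scale, 1)


def chart_indicator(delta):
    """Direction indicator appended after the bar in the ASCII chart."""
    if delta > 0:
        return f"^+{delta:.2f}"
    if delta < 0:
        return f"v{delta:.2f}"   # the minus sign comes from the number itself
    return ""                    # assumed: no indicator when nothing changed
```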


Execution Workflow

Phase 1: Behavioral Baseline vs. Recent Comparison

Baseline window: 90 days (days 8–97 ago)

Recent window: 7 days (last 7 days)

This is the primary query that computes per-SPN behavioral profiles and drift metrics.

| Data Source | Query | Notes |
| --- | --- | --- |
| AADServicePrincipalSignInLogs | Query 1 | Single query, 5 dimensions |

Phase 2: Permission & Configuration Change Audit

Data source: AuditLogs

Correlation: Same 97-day window, filtered to SPNs from Phase 1

Operations to Look For:

  • Add/Remove service principal credentials
  • Update application – Certificates and secrets management
  • Consent to application
  • Add delegated permission grant
  • Add app role assignment to service principal
  • Add application
  • Add service principal
  • Any operation containing: "permission", "role", "consent", "oauth", "credential", "certificate", "secret"

Phase 3: Security Alert Correlation

Run Query 4 in parallel with Phase 2 queries for performance.

  • SecurityAlert + SecurityIncident (Query 4): Check for alerts referencing SPN IDs or names, joined with SecurityIncident for real status/classification. Never read SecurityAlert.Status directly — it's always "New".

Phase 4: Score Computation & Report Generation

  1. Compute DriftScore per SPN using the 5-dimension formula
  2. Apply the low-volume denominator floor
  3. Flag any entity exceeding 150% threshold
  4. For flagged entities: assess corroborating evidence (permission changes, alerts)
  5. Generate risk assessment with emoji-coded findings
  6. Render output in the user's selected mode

Sample KQL Queries

Query 1: Baseline vs. Recent Behavioral Comparison

// Build 90-day baseline (days 8-97 ago) vs recent 7 days per service principal
let baselineStart = ago(97d);
let baselineEnd = ago(7d);
let recentStart = ago(7d);
// Baseline period: per-SPN behavioral profile
let baseline = AADServicePrincipalSignInLogs
| where TimeGenerated between (baselineStart .. baselineEnd)
| summarize
    BL_TotalSignIns = count(),
    BL_Days = dcount(bin(TimeGenerated, 1d)),
    BL_DistinctResources = dcount(ResourceDisplayName),
    BL_DistinctIPs = dcount(IPAddress),
    BL_DistinctLocations = dcount(Location),
    BL_FailRate = round(1.0 * countif(ResultType != "0" and ResultType != 0) / count() * 100, 2),
    BL_Resources = make_set(ResourceDisplayName, 50),
    BL_IPs = make_set(IPAddress, 50),
    BL_Locations = make_set(Location, 50)
    by ServicePrincipalName, ServicePrincipalId;
// Recent period: last 7 days
let recent = AADServicePrincipalSignInLogs
| where TimeGenerated >= recentStart
| summarize
    RC_TotalSignIns = count(),
    RC_Days = dcount(bin(TimeGenerated, 1d)),
    RC_DistinctResources = dcount(ResourceDisplayName),
    RC_DistinctIPs = dcount(IPAddress),
    RC_DistinctLocations = dcount(Location),
    RC_FailRate = round(1.0 * countif(ResultType != "0" and ResultType != 0) / count() * 100, 2),
    RC_Resources = make_set(ResourceDisplayName, 50),
    RC_IPs = make_set(IPAddress, 50),
    RC_Locations = make_set(Location, 50)
    by ServicePrincipalName, ServicePrincipalId;
// Join and compute drift metrics
baseline
| join kind=inner recent on ServicePrincipalId
| extend
    BL_DailyAvg = round(1.0 * BL_TotalSignIns / BL_Days, 1),
    RC_DailyAvg = round(1.0 * RC_TotalSignIns / RC_Days, 1)
| extend
    VolumeRatio = iff(BL_DailyAvg > 0, round(RC_DailyAvg / BL_DailyAvg * 100, 1), 999.0),
    ResourceRatio = iff(BL_DistinctResources > 0, round(1.0 * RC_DistinctResources / BL_DistinctResources * 100, 1), 999.0),
    IPRatio = iff(BL_DistinctIPs > 0, round(1.0 * RC_DistinctIPs / BL_DistinctIPs * 100, 1), 999.0),
    LocationRatio = iff(BL_DistinctLocations > 0, round(1.0 * RC_DistinctLocations / BL_DistinctLocations * 100, 1), 999.0),
    FailRateDelta = RC_FailRate - BL_FailRate,
    NewResources = set_difference(RC_Resources, BL_Resources),
    NewIPs = set_difference(RC_IPs, BL_IPs),
    NewLocations = set_difference(RC_Locations, BL_Locations)
| extend
    NewResourceCount = array_length(NewResources),
    NewIPCount = array_length(NewIPs),
    NewLocationCount = array_length(NewLocations)
| extend
    // Composite Drift Score (weighted)
    // FailRate uses additive delta→ratio conversion: 100 + delta×10
    // Negative deltas (improvement) produce values < 100 (contracting)
    FailRateRatio = 100.0 + FailRateDelta * 10
| extend
    DriftScore = round(
        (VolumeRatio * 0.30) +
        (ResourceRatio * 0.25) +
        (IPRatio * 0.20) +
        (LocationRatio * 0.15) +
        (FailRateRatio * 0.10)
    , 1)
| project ServicePrincipalName, ServicePrincipalId,
    BL_Days, BL_TotalSignIns, BL_DailyAvg, BL_DistinctResources, BL_DistinctIPs, BL_DistinctLocations, BL_FailRate,
    RC_Days, RC_TotalSignIns, RC_DailyAvg, RC_DistinctResources, RC_DistinctIPs, RC_DistinctLocations, RC_FailRate,
    VolumeRatio, ResourceRatio, IPRatio, LocationRatio, FailRateDelta, DriftScore,
    NewResourceCount, NewIPCount, NewLocationCount,
    NewResources, NewIPs, NewLocations,
    BL_Resources, RC_Resources
| order by DriftScore desc

Post-processing note: The low-volume denominator floor (max(BL_DailyAvg, 10)) is NOT applied in the KQL above — it must be applied during post-processing when computing the final assessment. If BL_DailyAvg < 10, recalculate VolumeRatio using the floor value and recompute DriftScore. Flag affected SPNs with: "⚠️ Low-volume baseline — ratio may be inflated."
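A sketch of that post-processing step, applied to one Query 1 result row represented as a dictionary keyed by the projected column names. The function name and the added `LowVolumeBaseline` field are hypothetical; the weights and the FailRate conversion are copied from the query.

```python
def apply_low_volume_floor(row, floor=10.0):
    """Recompute VolumeRatio and DriftScore for a sparse-baseline SPN.

    row: dict holding Query 1 columns (BL_DailyAvg, RC_DailyAvg, ResourceRatio,
    IPRatio, LocationRatio, FailRateDelta, ...). Returns a new dict.
    """
    if row["BL_DailyAvg"] >= floor:
        return dict(row, LowVolumeBaseline=False)
    volume_ratio = round(row["RC_DailyAvg"] / floor * 100, 1)
    fail_rate_ratio = 100.0 + row["FailRateDelta"] * 10
    score = round(
        volume_ratio * 0.30
        + row["ResourceRatio"] * 0.25
        + row["IPRatio"] * 0.20
        + row["LocationRatio"] * 0.15
        + fail_rate_ratio * 0.10,
        1,
    )
    return dict(row, VolumeRatio=volume_ratio, DriftScore=score, LowVolumeBaseline=True)
```

Rows returned with `LowVolumeBaseline` set to True are the ones that should carry the low-volume warning in the report.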

Query 2: AuditLog Permission & Credential Changes

// Permission/credential/role changes for service principals
// Substitute <SPN_IDS> with comma-separated SPN IDs from Query 1
// Substitute <SPN_NAMES> with SPN display names from Query 1
AuditLogs
| where TimeGenerated > ago(97d)
| where OperationName has_any ("service principal", "application", "credential", "certificate",
    "secret", "permission", "role", "consent", "oauth")
| where tostring(TargetResources) has_any (<SPN_IDS>)
    or tostring(InitiatedBy) has_any (<SPN_IDS>)
| extend InBaseline = TimeGenerated < ago(7d)
| summarize
    BaselineOps = countif(InBaseline),
    RecentOps = countif(not(InBaseline))
    by OperationName
| order by RecentOps desc
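The `<SPN_IDS>` and `<SPN_NAMES>` placeholders expect comma-separated KQL string literals. The helper below is one possible way to build that text from the Query 1 results; the function names are hypothetical, and only backslashes and double quotes are escaped.

```python
def kql_string_list(values):
    """Render values as comma-separated, double-quoted KQL string literals."""
    def quote(value):
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return ", ".join(quote(v) for v in values)


def fill_placeholders(query_template, spn_ids, spn_names=()):
    """Substitute the <SPN_IDS> and <SPN_NAMES> placeholders in a query template."""
    return (query_template
            .replace("<SPN_IDS>", kql_string_list(spn_ids))
            .replace("<SPN_NAMES>", kql_string_list(spn_names)))
```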

Query 3: Detailed Recent AuditLog Changes

// Detailed drill-down for the recent 7-day window
// Substitute <SPN_IDS> with SPN IDs from Query 1
AuditLogs
| where TimeGenerated > ago(7d)
| where OperationName has_any ("service principal", "application", "credential", "certificate",
    "secret", "permission", "role", "consent", "oauth", "update")
| where tostring(TargetResources) has_any (<SPN_IDS>)
| project TimeGenerated, OperationName, Result,
    InitiatedBy = tostring(parse_json(tostring(InitiatedBy)).app.displayName),
    TargetName = tostring(parse_json(tostring(parse_json(tostring(TargetResources))[0])).displayName),
    TargetId = tostring(parse_json(tostring(parse_json(tostring(TargetResources))[0])).id),
    ModifiedProperties = tostring(parse_json(tostring(parse_json(tostring(TargetResources))[0])).modifiedProperties)
| order by TimeGenerated desc

Query 4: SecurityAlert + SecurityIncident Correlation

// Security alerts referencing any of the service principals, joined with SecurityIncident for real status
// IMPORTANT: SecurityAlert.Status is immutable (always "New") — MUST join SecurityIncident for real Status/Classification
// Substitute <SPN_IDS> and <SPN_NAMES> with values from Query 1
let relevantAlerts = SecurityAlert
| where TimeGenerated > ago(97d)
| where Entities has_any (<SPN_IDS>) or Entities has_any (<SPN_NAMES>)
    or CompromisedEntity has_any (<SPN_NAMES>)
| summarize arg_max(TimeGenerated, *) by SystemAlertId
| project SystemAlertId, AlertName, AlertSeverity, ProductName, ProductComponentName, Tactics, Techniques, TimeGenerated;
SecurityIncident
| where CreatedTime > ago(97d)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| join kind=inner relevantAlerts on $left.AlertId == $right.SystemAlertId
| extend Period = iff(TimeGenerated1 < ago(7d), "Baseline", "Recent")
| summarize
    BaselineAlerts = countif(Period == "Baseline"),
    RecentAlerts = countif(Period == "Recent"),
    TotalAlerts = count(),
    Severities = make_set(AlertSeverity, 5),
    IncidentStatuses = make_set(Status, 5),
    Classifications = make_set(Classification, 5),
    BaselineIncidents = dcountif(IncidentNumber, Period == "Baseline"),
    RecentIncidents = dcountif(IncidentNumber, Period == "Recent")
    by ProductName
| order by TotalAlerts desc

Interpreting Incident Status in Drift Context:

| Incident Status | Classification | Impact on Drift Assessment |
| --- | --- | --- |
| Closed | TruePositive | 🔴 Confirmed threat: significantly increases drift risk |
| Closed | FalsePositive | 🟢 False alarm: discount from drift risk, note as noise |
| Closed | BenignPositive | 🟡 Expected behavior: note but don't escalate |
| Active/New | Any | 🟠 Unresolved: flag for attention, may indicate ongoing threat |
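The table can be applied mechanically when folding incident outcomes into the risk assessment. In this sketch the returned labels (`escalate`, `discount`, `note`, `unresolved`, `review`) are hypothetical names, and sending any other classification of a closed incident to `review` is an assumption.

```python
CLOSED_IMPACT = {
    "truepositive": "escalate",    # confirmed threat
    "falsepositive": "discount",   # false alarm, treat as noise
    "benignpositive": "note",      # expected behavior
}


def incident_impact(status, classification):
    """Map an incident's status/classification to its effect on drift risk."""
    if status.strip().lower() != "closed":
        return "unresolved"        # Active/New: flag for attention
    return CLOSED_IMPACT.get(classification.strip().lower(), "review")
```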

Product Name Mapping (Legacy → Current Branding):

The ProductName field in SecurityAlert contains the detection product. When rendering reports, translate to current Microsoft branding:

| SecurityAlert.ProductName (raw) | Report Display Name |
| --- | --- |
| Microsoft Defender Advanced Threat Protection | Microsoft Defender for Endpoint |
| Microsoft Cloud App Security | Microsoft Defender for Cloud Apps |
| Microsoft Data Loss Prevention | Microsoft Purview Data Loss Prevention |
| Azure Sentinel | Microsoft Sentinel |
| Microsoft 365 Defender | Microsoft Defender XDR |
| Office 365 Advanced Threat Protection | Microsoft Defender for Office 365 |
| Azure Advanced Threat Protection | Microsoft Defender for Identity |

Note: ProviderName (e.g., ASI Scheduled Alerts, MDATP, MCAS) is the internal provider identifier. ProductName (e.g., Azure Sentinel, Microsoft Defender Advanced Threat Protection) is the user-facing product name. Always use ProductName for grouping and display; ProviderName is unreliable for product identification (e.g., all alerts show as Microsoft XDR at the incident level).

Report Rendering: Group alerts by product using the current branded name. Show Baseline Alerts vs Recent Alerts and Baseline Incidents vs Recent Incidents columns per product row, plus Severity and Classification. Include a Total row. Add a brief 1-2 sentence summary comparing alert volume between periods. Do NOT list individual alert names — keep the table concise at the product level.


Report Template

Inline Chat Report Structure

The inline report MUST include these sections in order:

  1. Header — Workspace, analysis period, drift threshold, data sources
  2. Ranked Drift Score Table — All SPNs sorted by DriftScore descending, with per-dimension ratios
  3. Flagged Entity Deep Dive (for each Tier 1 SPN > 150%) — Baseline vs. recent comparison, dimension bar chart, new IPs/resources, corroborating evidence
  4. Tier 2 Entity Summaries (if entity scaling applied) — One-line summary per Tier 2 SPN: score, top 3 new resources, new IPs, alert count
  5. Correlated Signal Summary — AuditLogs and SecurityAlert/Incident findings in a single table
  6. Behavioral Baseline Chart — ASCII bar chart showing all SPNs' daily avg vs. baseline
  7. Security Assessment — Emoji-coded findings table with evidence citations
  8. Verdict Box — Overall risk level, root cause analysis, recommendations

Markdown File Report Structure

When outputting to markdown file, include everything from the inline format PLUS:

Filename patterns:

  • Single SPN: reports/scope-drift/spn/Scope_Drift_Report_<spn_short_name>_YYYYMMDD_HHMMSS.md
  • All SPNs: reports/scope-drift/spn/Scope_Drift_Report_all_spns_YYYYMMDD_HHMMSS.md
# Service Principal Scope Drift Report

**Generated:** YYYY-MM-DD HH:MM UTC
**Workspace:** <workspace_name>
**Baseline Period:** <start> → <end> (90 days)
**Recent Period:** <start> → <end> (7 days)
**Drift Threshold:** 150%
**Data Sources:** AADServicePrincipalSignInLogs, AuditLogs, SecurityAlert

---

## Executive Summary

<1-3 sentence summary: how many SPNs analyzed, how many flagged, overall risk level>

---

## Drift Score Ranking

<ASCII table with all SPNs, per-dimension ratios, flag status>
<!-- Wrap in code fence for consistent rendering -->

---

## Flagged Entities

### <SPN Name> — Drift Score <score>

**ASCII Drift Dimension Chart (REQUIRED):**

Render a box-drawn chart inside a code fence. **Inner width: 58 chars** (every line between `│` markers = exactly 58 visual characters). No emoji inside boxes — use text labels.

**Alignment:** Name (9 chars padded) + weight (5) + gap (2) + bars (20 `█─`) + gap (2) + pct (6, right-aligned: `XXX.X%` or ` XX.X%`) + gap (2) + direction (10 total: `^`/`v`/`=` + 9 trailing spaces, or FailRate: delta like `v-X.XX` + 4 trailing spaces). Status labels (centered): `STABLE`, `STABLE (Low-Volume)`, `NEAR THRESHOLD`, `ABOVE THRESHOLD`, `CRITICAL`. Direction: `^` (up), `v` (down), `=` (stable).

**Bar characters:** Use `█` (U+2588 full block) for filled portions and `─` (U+2500 box-drawing horizontal) for the unfilled track.

┌──────────────────────────────────────────────────────────┐
│                  SPN DRIFT SCORE: XX.X                   │
│                          STABLE                          │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  Volume   (30%)  ██████──────────────  XXX.X%  ^         │
│  Resources(25%)  ███─────────────────   XX.X%  v         │
│  IPs      (20%)  ██████──────────────  XXX.X%  =         │
│  Locations(15%)  ██──────────────────   XX.X%  v         │
│  FailRate (10%)  ██████──────────────  XXX.X%  v-X.XX    │
│                                                          │
│  ────────────────────────── 100% baseline ──┤            │
│                        150% drift threshold ▲            │
└──────────────────────────────────────────────────────────┘


**Bar fill:** 20 chars wide. Filled = round(ratio/100 × 20), capped at 20. Title and status: center within 58 chars (include adjusted score if applicable, e.g., "SPN DRIFT SCORE: 107.5 (adj 80.5)"). Use `█` for filled, `─` for unfilled.

**Then** render the standard markdown dimension table:

| Dimension | Weight | Baseline (90d) | Recent (7d) | Ratio | Weighted | Status |
|-----------|--------|----------------|-------------|-------|----------|--------|

<New resources, new IPs, new locations enumeration>
<Corroborating evidence from AuditLogs, SecurityAlert>

---

## Pareto Analysis

<ASCII Pareto chart of drift dimensions or categories>
<80/20 analysis text>

---

## Correlated Signals

| Data Source | Finding | Incident Status |
|-------------|---------|-----------------|
| AADServicePrincipalSignInLogs | ... | N/A |
| AuditLogs | ... | N/A |
| SecurityAlert / SecurityIncident | <Group by ProductName, translate to current branding> | <Status: New/Active/Closed, Classification: TP/FP/BP> |

---

## Security Assessment

| Factor | Finding |
|--------|---------|
| 🔴/🟢/🟡 **Factor** | Evidence-based finding |

---

## Verdict

**ASCII Verdict Box (REQUIRED):**

Render a box-drawn verdict summary inside a code fence. **Inner width: 66 chars.** No emoji inside boxes. Pad every line to exactly 66 chars between `│` markers.

┌──────────────────────────────────────────────────────────────────┐
│  OVERALL RISK: <LEVEL> -- <One-line summary>                     │
│  Flagged SPNs: X of Y (Threshold: 150%)                          │
│  Root Cause: <Brief root cause explanation>                      │
└──────────────────────────────────────────────────────────────────┘


**Then** render the full verdict with:
- Root Cause Analysis paragraph
- Key Findings (numbered list)
- Recommendations (emoji-prefixed list)

---

## Appendix: Query Details

Render a single markdown table summarizing all queries executed. **Do NOT include full KQL text** — the canonical queries are already documented in this SKILL.md file. The appendix serves as an audit trail only.

| Query | Table(s) | Records Scanned | Results | Execution |
|-------|----------|----------------:|--------:|----------:|
| Q1 — SPN Baseline vs. Recent | AADServicePrincipalSignInLogs | X,XXX | N rows | X.XXs |
| ... | ... | ... | ... | ... |

*Query definitions: see the Sample KQL Queries section in this SKILL.md file.*

Known Pitfalls

SecurityAlert.Status Is Immutable — Always Join SecurityIncident

Problem: The Status field on SecurityAlert is set to "New" at creation time and never changes. It does NOT reflect whether the alert has been investigated, closed, or classified. Reading SecurityAlert.Status as current investigation status will always show "New" regardless of actual state. Solution: MUST join with SecurityIncident to get real Status (New/Active/Closed) and Classification (TruePositive/FalsePositive/BenignPositive). See Query 4 which implements this join. When assessing drift risk from alerts, differentiate: Closed-FalsePositive alerts are noise (discount), Closed-TruePositive alerts are confirmed threats (escalate), Active/New incidents need attention (flag).

Low-Volume Statistical Inflation

Problem: Entities with very low baseline activity (e.g., 1 sign-in/day) will show extreme volume ratios even with minor changes. Solution: Apply the denominator floor (minimum 10 sign-ins/day for volume ratio calculation). Always flag low-volume baselines in the report.

Seasonal/Cyclical Baselines

Problem: Some entities have weekly patterns (lower on weekends) or monthly cycles (month-end batch jobs). Solution: Note if the 7-day recent window falls on an atypical portion of the cycle. The 90-day baseline smooths most cyclical patterns, but edge cases exist.

IPv6 Fabric Address Churn

Problem: Microsoft first-party SPNs (MCAS, Defender, etc.) rotate through fd00: internal fabric IPv6 addresses automatically. This inflates the IP ratio without representing actual infrastructure changes. Solution: When all new IPs share the same fd00: prefix, note this as "Microsoft internal fabric rotation" and downgrade the IP dimension's contribution to the drift score assessment. Do NOT flag IPv6 churn from Microsoft fabric addresses as suspicious.
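The "all new IPs share the fd00: prefix" check can be sketched with the standard `ipaddress` module. This is a minimal illustration: the function names are hypothetical, and reading the `fd00:` prefix as the network `fd00::/16` is an assumption about what the skill intends.

```python
import ipaddress

# Assumption: "fd00:" in the text means addresses whose first hextet is fd00.
FABRIC_NETWORK = ipaddress.ip_network("fd00::/16")


def is_fabric_address(ip: str) -> bool:
    """True if ip parses as an IPv6 address inside the assumed fabric network."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not a valid IP literal
    return address.version == 6 and address in FABRIC_NETWORK


def all_new_ips_are_fabric(new_ips) -> bool:
    """True only when there is at least one new IP and every one is a fabric address."""
    new_ips = list(new_ips)
    return bool(new_ips) and all(is_fabric_address(ip) for ip in new_ips)
```

A single non-fabric address in the new-IP set makes the second function return `False`, so the IP dimension would not be downgraded in that case.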

Credential Rotation False Positives

Problem: Automated certificate/secret rotation creates regular Add/Remove service principal credentials audit entries. Solution: Check if credential operations follow a regular cadence (weekly/monthly). If rotation is periodic and consistent with baseline, classify as operational — not drift.
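One way to test for a "regular cadence" is to compare each gap between credential operations with the median gap. This is a rough sketch under stated assumptions: the function name `looks_periodic`, the 25% tolerance, and the minimum of three events are hypothetical choices, not values defined by the skill.

```python
from statistics import median


def looks_periodic(timestamps, tolerance=0.25):
    """Return True if gaps between consecutive events stay close to the median gap.

    timestamps: iterable of datetime objects for credential add/remove operations.
    tolerance:  allowed relative deviation from the median gap (assumed value).
    """
    ordered = sorted(timestamps)
    if len(ordered) < 3:
        return False  # too few events to judge a cadence (assumption)
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(ordered, ordered[1:])]
    typical = median(gaps)
    if typical <= 0:
        return False
    return all(abs(gap - typical) <= tolerance * typical for gap in gaps)
```

Weekly rotations pass this check; a burst of operations followed by a long silence does not, which would leave the finding for manual review rather than classifying it as operational.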

SPNs Without Baseline Data

Problem: Newly provisioned SPNs have no baseline to compare against. Solution: These are excluded from the join kind=inner and will not appear in results. If the user asks about a specific SPN with no baseline, report: "No baseline data available — SPN was provisioned within the recent window or has no sign-in history in the 90-day baseline period."

Sentinel IDs vs Defender XDR IDs for Triage MCP Drill-Down

Problem: Query 4 returns IncidentNumber (Sentinel) and SystemAlertId (Sentinel), but the Triage MCP tools (GetIncidentById, GetAlertById) expect Defender XDR IDs. Passing Sentinel IDs returns "not found" errors. Solution: When following up on correlated alerts/incidents via Triage MCP:

  • Incidents: Always project ProviderIncidentId from SecurityIncident and pass that to GetIncidentById — never use IncidentNumber
  • Alerts: Extract the Defender ID from SecurityAlert: tostring(parse_json(ExtendedProperties).IncidentId) — never use SystemAlertId with the Triage MCP
  • See the global Sentinel ↔ Defender XDR ID Mapping rule in copilot-instructions.md

Error Handling

Common Issues

| Issue | Solution |
|-------|----------|
| AADServicePrincipalSignInLogs table not found | This table may not exist in all workspaces. Check if it's available with search_tables. Try Advanced Hunting as fallback. |
| Zero entities in results | Verify the workspace has sign-in data for the entity type. Check if logging is enabled. |
| Query timeout | Reduce the baseline window from 90 to 60 days, or add `\| take 100` to intermediate results. |
| AuditLogs has_any not matching | Ensure IDs are quoted strings in the dynamic() array. Use tostring() on dynamic fields. |
| Very large number of SPNs | Add `\| where BL_TotalSignIns > 10` to filter out extremely low-activity SPNs that add noise. |

Validation Checklist

Before presenting results, verify:

  • All applicable data sources were queried (even if some returned 0 results)
  • Low-volume denominator floor was applied to any entity with BL_DailyAvg < 10
  • Corroborating evidence was checked for every flagged entity
  • Empty results are explicitly reported with ✅ (not silently omitted)
  • The report includes the drift score formula and threshold for transparency
  • SecurityAlert was joined with SecurityIncident for real Status/Classification (never read SecurityAlert.Status directly)
  • Incident classifications (TP/FP/BP) were factored into risk assessment — FalsePositive alerts discounted, TruePositive alerts escalated
  • IPv6 fd00: addresses were identified as Microsoft fabric (not adversary infrastructure)
  • Credential rotation cadence was assessed for AuditLog findings
  • When drilling into incidents/alerts via Triage MCP, ProviderIncidentId was used (never IncidentNumber or SystemAlertId)

SVG Dashboard Generation

📊 Optional post-report step. After an SPN scope drift report is generated, the user can request a visual SVG dashboard.

Trigger phrases: "generate SVG dashboard", "create a visual dashboard", "visualize this report", "SVG from the report"

How to Request a Dashboard

  • Same chat: "Generate an SVG dashboard from the report" — data is already in context.
  • New chat: Attach or reference the report file, e.g. #file:reports/scope-drift/spn/Scope_Drift_Report_<entity>_<date>.md
  • Customization: Edit svg-widgets.yaml before requesting — the renderer reads it at generation time.

Execution

Step 1:  Read svg-widgets.yaml (this skill's widget manifest)
Step 2:  Read .github/skills/svg-dashboard/SKILL.md (rendering rules — Manifest Mode)
Step 3:  Read the completed report file (data source)
Step 4:  Render SVG → save to reports/scope-drift/spn/{report_name}_dashboard.svg

The YAML manifest is the single source of truth for layout, widgets, field mappings, colors, and data source documentation. All customization happens there.

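Detects scope drift in Entra ID user accounts, identifying gradual, anomalous expansion of permissions or behavior. Builds a 90-day behavioral baseline, compares it with recent activity, computes weighted drift scores, and correlates security alerts, audit logs, and cloud app/email events to uncover covert privilege abuse or shadow IT risk.
User scope drift detection, user behavior change analysis, user baseline deviation investigation, gradual access expansion troubleshooting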
.github/skills/scope-drift-detection/user/SKILL.md
npx skills add SCStelz/security-investigator --skill scope-drift-detection-user -g -y
SKILL.md
Frontmatter
{
    "name": "scope-drift-detection-user",
    "description": "Use this skill when asked to detect scope drift, behavioral expansion, or gradual privilege\/access creep in user accounts. Triggers on keywords like \"user drift\", \"user behavioral change\", \"user scope drift\", \"user baseline deviation\", \"user access expansion\", or when investigating whether a user account has gradually expanded beyond its established behavioral baseline. This skill builds a 90-day behavioral baseline for both interactive and non-interactive sign-ins, compares with 7-day recent activity, computes weighted Drift Scores (7 dimensions for interactive, 6 for non-interactive), and correlates with SecurityAlert, AuditLogs, Identity Protection, custom anomaly tables, CloudAppEvents (cloud app activity drift), and EmailEvents (email pattern drift).",
    "drill_down_prompt": "Analyze user behavioral drift for {entity} — sign-in pattern changes, app usage shifts",
    "threat_pulse_domains": [
        "identity"
    ]
}

User Account Scope Drift Detection — Instructions

Purpose

This skill detects scope drift — the gradual, often imperceptible expansion of access or behavior beyond an established baseline — in Entra ID user accounts. Unlike sudden compromise (which triggers alerts), scope drift is a slow-burn pattern that evades threshold-based detections.

Entity Type: User Account

| Identifier | Primary Table(s) | Use Case |
|------------|------------------|----------|
| UserPrincipalName (UPN) | SigninLogs + AADNonInteractiveUserSignInLogs | Human users, admin accounts, shared mailboxes |

What this skill detects:

  • Volume spikes in sign-in activity relative to historical baseline
  • New applications accessed (potential unauthorized access or shadow IT)
  • New target resources (APIs, services) not previously accessed
  • New device/OS/browser combinations
  • New source IP addresses or geographic locations
  • Increased failure rates indicating probing or misconfiguration
  • Account configuration changes correlated with behavioral shifts
  • Security alerts involving the user
  • Identity Protection risk events
  • Pre-computed sign-in anomalies (custom table)
  • Cloud app activity drift — new action types, admin operations, impersonation, external user activity (CloudAppEvents)
  • Email pattern drift — volume/direction changes, new sender domains, threat email trends (EmailEvents)

Related skills:


📑 TABLE OF CONTENTS

  1. Critical Workflow Rules - Start here!
  2. Output Modes - Inline chat vs. Markdown file
  3. Quick Start - 7-step investigation pattern
  4. Drift Score Formula - Weighted composite scoring (Interactive: 7 dimensions, Non-Interactive: 6 dimensions)
  5. Execution Workflow - Complete 4-phase process
  6. Sample KQL Queries - Validated query patterns (Queries 6-13)
  7. Report Template - Output format specification
  8. Known Pitfalls - Edge cases and false positives
  9. Error Handling - Troubleshooting guide
  10. SVG Dashboard Generation - Visual dashboard from report

Investigation shortcuts:

  • User drift triage (TP Q3): Q6 + Q7 (baseline vs recent — both drift scores + dimension ratios) → Q11 (alert/incident correlation) → Tier 1 deep dives for flagged users
  • Compromised user forensics (TP Q3 + incident context): Q6 + Q7 (behavioral profile) → Q8 (AuditLog changes — password/MFA/role changes, timestamps, actors) → Q10 (Identity Protection risk events) → Q11 (incident status/classification)
  • Sign-in anomaly investigation (TP Q3, high anomaly count): Q6 + Q7 (drift scores) → Q9 (custom anomaly table — new IPs, device combos, geo novelty) → Q10 (Identity Protection cross-reference)
  • Cloud app activity expansion (standalone or TP Q9): Q6 (interactive baseline context) → Q12 (CloudAppEvents — new action types, admin ops, impersonation) → Q11 (alert correlation)

⛔ Shortcut Default Rule: When a matching shortcut exists for the investigation context, use it — don't run the full workflow. Only run the full query set when the user explicitly requests "full investigation", "comprehensive", or "deep dive". Shortcuts render only the report sections relevant to their query chain (plus Executive Summary and Recommendations, always).


⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

Before starting ANY user scope drift analysis:

  1. ALWAYS enforce Sentinel workspace selection (see Workspace Selection section below)
  2. ALWAYS ask the user for output mode if not specified: inline chat summary or markdown file report (or both)
  3. ALWAYS build baseline FIRST before comparing recent activity
  4. ALWAYS compute BOTH interactive AND non-interactive drift scores — user accounts produce two drift scores
  5. ALWAYS apply the low-volume denominator floor to prevent false-positive drift scores on sparse baselines
  6. ALWAYS correlate across all required data sources (SigninLogs, AADNonInteractiveUserSignInLogs, AuditLogs, SecurityAlert, Anomaly table, Identity Protection, CloudAppEvents, EmailEvents)
  7. ALWAYS run independent queries in parallel for performance
  8. NEVER report a drift flag without corroborating evidence from at least one secondary data source

Data Sources

| Data Source | Role | Purpose |
|-------------|------|---------|
| SigninLogs | ✅ Primary | User interactive sign-in baseline |
| AADNonInteractiveUserSignInLogs | ✅ Primary | User non-interactive (token refresh) baseline |
| AuditLogs | ✅ Corroboration | Password/MFA/role/group changes |
| SecurityAlert | ✅ Corroboration | Corroborating alert evidence |
| SecurityIncident | ✅ Corroboration | Real alert status/classification |
| Signinlogs_Anomalies_KQL_CL | ✅ Corroboration | Pre-computed anomaly detection (custom table) |
| SigninLogs (risk fields) | ✅ Corroboration | Identity Protection risk events |
| CloudAppEvents | ✅ Corroboration | Cloud app activity drift: action types, admin operations, apps, IPs, impersonation |
| EmailEvents | ✅ Corroboration | Email pattern drift: volume/direction, sender domains, threat emails |

⛔ MANDATORY: Sentinel Workspace Selection

This skill requires a Sentinel workspace to execute queries. Follow these rules STRICTLY:

When invoked from incident-investigation skill:

  • Inherit the workspace selection from the parent investigation context
  • If no workspace was selected in parent context: STOP and ask user to select

When invoked standalone (direct user request):

  1. ALWAYS call list_sentinel_workspaces MCP tool FIRST
  2. If 1 workspace exists: Auto-select, display to user, proceed
  3. If multiple workspaces exist:
    • Display all workspaces with Name and ID
    • ASK: "Which Sentinel workspace should I use for this investigation?"
    • ⛔ STOP AND WAIT for user response
    • ⛔ DO NOT proceed until user explicitly selects
  4. If a query fails on the selected workspace:
    • ⛔ DO NOT automatically try another workspace
    • STOP and report the error, display available workspaces, ASK user to select

🔴 PROHIBITED ACTIONS:

  • ❌ Selecting a workspace without user consent when multiple exist
  • ❌ Switching to another workspace after a failure without asking
  • ❌ Proceeding with investigation if workspace selection is ambiguous

Output Modes

This skill supports two output modes. ASK the user which they prefer if not explicitly specified. Both may be selected.

Mode 1: Inline Chat Summary (Default)

  • Render the full drift analysis directly in the chat response
  • Includes ASCII tables, Pareto chart, drift dimension bars, and security assessment
  • Best for quick review and interactive follow-up questions

Mode 2: Markdown File Report

  • Save a comprehensive report to reports/scope-drift/user/Scope_Drift_Report_<username>_<timestamp>.md
  • All ASCII visualizations render correctly inside markdown code fences (```)
  • Includes all data from inline mode plus additional detail sections
  • Use create_file tool — NEVER use terminal commands for file output
  • Filename pattern: Scope_Drift_Report_<username>_YYYYMMDD_HHMMSS.md (extract username from UPN, e.g., jdoe from jdoe@contoso.com)
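The filename pattern can be sketched as follows. This is an illustrative helper only: the function name `report_path` is hypothetical, and the caller is assumed to supply the timestamp (the text does not say which time zone to use).

```python
from datetime import datetime


def report_path(upn: str, when: datetime) -> str:
    """Build the user report path from a UPN and a timestamp.

    The username is the part of the UPN before '@',
    e.g. 'jdoe' from 'jdoe@contoso.com'.
    """
    username = upn.split("@", 1)[0]
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return f"reports/scope-drift/user/Scope_Drift_Report_{username}_{stamp}.md"
```

For `jdoe@contoso.com` at 2025-01-02 03:04:05 this produces `reports/scope-drift/user/Scope_Drift_Report_jdoe_20250102_030405.md`.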

Markdown Rendering Notes

  • ✅ ASCII tables, box-drawing characters, and bar charts render perfectly in markdown code blocks
  • ✅ Unicode block characters (█ full block, ─ box-drawing horizontal) display correctly in monospaced fonts
  • ✅ Emoji indicators (🔴🟢🟡⚠️✅) render natively in GitHub-flavored markdown
  • ✅ Standard markdown tables (| col |) render as formatted tables
  • Tip: Wrap all ASCII art in triple-backtick code fences for consistent rendering

Quick Start (TL;DR)

When a user requests user scope drift detection:

  1. Select Workspace → list_sentinel_workspaces, auto-select or ask
  2. Determine Output Mode → Ask if not specified: inline, markdown file, or both
  3. Run Phase 1 → Query 6 (SigninLogs interactive) + Query 7 (AADNonInteractiveUserSignInLogs)
  4. Apply Entity Scaling (multi-user only) → If analyzing multiple users, compute drift scores, rank, apply tiered depth limits (see Entity Scaling)
  5. Run Phases 2-3 → Queries 8-13 (AuditLogs + SecurityAlert + Anomaly table + Identity Protection + CloudAppEvents + EmailEvents) — scoped per tier if multi-user
  6. Compute Drift Scores → Apply 7-dimension interactive formula + 6-dimension non-interactive formula, flag if >150%, assess with corroborating evidence
  7. Output Results → Render in selected mode(s)

Entity Scaling (Multi-User Analysis)

Problem: This skill is typically used for single-user investigations, but users may request tenant-wide or group-based analysis ("drift for all users", "drift for finance department"). Running Queries 8–13 for every user in a large tenant is prohibitively expensive and produces unreadable reports.

Solution: For multi-user analysis, after Phase 1 computes drift scores for all target users, apply tiered depth based on user count and drift severity.

Single-user mode: When investigating one specific user (the common case), skip this section entirely — always run all queries at full depth.

User Count Detection

After Queries 6+7, count distinct users in the result set:

| User Count | Tier | Deep Dive Limit | Behavior |
|------------|------|-----------------|----------|
| 1 user | Single | Full | All queries at full depth. This section does not apply. |
| 2–30 users | Small | All flagged | Full deep dive for every user > 150%. No limiting needed. |
| 31–100 users | Medium | Top 10 | Full deep dive for top 10 by max(Interactive, Non-Interactive) DriftScore. Summary row for remaining flagged users. |
| 101–500 users | Large | Top 10 | Full deep dive for top 10. Tier 2 summary (next 15) with Identity Protection + alerts only. Remaining flagged users listed in ranking table. |
| > 500 users | Very Large | Top 10 | Same as Large, plus: filter Phase 1 results to BL_TotalSignIns > 10 to exclude near-silent accounts from scoring. |

Tiered Depth Model (Multi-User)

| Tier | Users | Queries Run | Report Depth |
|------|-------|-------------|--------------|
| Tier 1 (Full) | Top N by DriftScore | All: Q8–Q13 | Full deep dive: both ASCII charts, dimension tables, AuditLog changes, alerts, anomalies, Identity Protection, CloudAppEvents, EmailEvents |
| Tier 2 (Summary) | Next 15 flagged users (or remaining if < 15) | Q10 + Q11 only (Identity Protection + SecurityAlert) | One-line summary: both scores, risk state, alert count, flag status |
| Tier 3 (Score only) | All remaining flagged users | None beyond Phase 1 | Row in ranking table: UPN, interactive score, non-interactive score, flag emoji |
| Stable | Users ≤ 150% | None beyond Phase 1 | Omitted from deep dives. Included in summary statistics only. |
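The two tables above can be read as a ranking-and-slicing step. The sketch below is one possible reading, not the skill's implementation: `cohort` and `assign_tiers` are hypothetical names, each user's score is assumed to be max(Interactive, Non-Interactive), and the Tier 2 size of 15 is applied uniformly whenever a deep-dive limit exists.

```python
THRESHOLD = 150.0  # drift flag threshold from the text


def cohort(user_count: int):
    """Return (cohort name, deep-dive limit); None means 'all flagged users'."""
    if user_count <= 1:
        return ("Single", None)
    if user_count <= 30:
        return ("Small", None)
    if user_count <= 100:
        return ("Medium", 10)
    if user_count <= 500:
        return ("Large", 10)
    return ("Very Large", 10)


def assign_tiers(scores, deep_dive_limit, tier2_size=15):
    """Assign each user a tier from a {upn: max drift score} mapping."""
    tiers = {upn: "Stable" for upn, score in scores.items() if score <= THRESHOLD}
    flagged = sorted((upn for upn in scores if scores[upn] > THRESHOLD),
                     key=lambda upn: scores[upn], reverse=True)
    limit = len(flagged) if deep_dive_limit is None else deep_dive_limit
    for rank, upn in enumerate(flagged):
        if rank < limit:
            tiers[upn] = "Tier 1"   # full deep dive
        elif rank < limit + tier2_size:
            tiers[upn] = "Tier 2"   # Identity Protection + alerts only
        else:
            tiers[upn] = "Tier 3"   # ranking table row only
    return tiers
```

Usage would be `name, limit = cohort(len(scores))` followed by `assign_tiers(scores, limit)`, with the resulting counts feeding the report disclosure header.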

User Override

If the user explicitly asks for "all users detailed" or "full report", honor the request but warn:

⚠️ Analysis covers <N> users with <X> flagged above 150%. Running full deep dives for all flagged users may be slow and produce a very long report. Proceed? (Default: top 10 deep dives + summary for others)

Report Disclosure (Multi-User)

When tiered depth is applied, always disclose in the report header:

**User Count:** <N> users (Large cohort — tiered analysis applied)
**Deep Dives:** Top <X> by DriftScore (Tier 1: full analysis)
**Summaries:** <Y> additional flagged users (Tier 2: risk + alerts only)
**Score Only:** <Z> additional flagged users (Tier 3: ranking table only)
**Stable:** <W> users ≤ 150% (omitted from deep dives)

Drift Score Formula

The Drift Score is a weighted composite of behavioral dimensions, normalized so that 100 = identical to baseline.

User accounts produce TWO drift scores (interactive + non-interactive). Both must be computed and reported.

User Account Formula — Interactive (7 Dimensions)

$$ \text{DriftScore}_{Interactive} = 0.25V + 0.20A + 0.10R + 0.15IP + 0.10L + 0.10D + 0.10F $$

| Dimension | Weight | Metric | Why |
|-----------|--------|--------|-----|
| Volume | 25% | Daily avg interactive sign-ins | Reduced weight vs SPN; user volume is naturally more variable |
| Applications | 20% | Distinct apps accessed | New apps = potential unauthorized access or shadow IT |
| Resources | 10% | Distinct target resources accessed | Reduced weight; apps are a better user-level signal |
| IPs | 15% | Distinct source IP addresses | New IPs = different network, VPN, or credential theft |
| Locations | 10% | Distinct geographic locations | New geos = travel or impossible travel |
| Devices | 10% | Distinct device types (OS + browser) | New devices = potential unauthorized device |
| Failure Rate | 10% | Failure rate delta | Rising failures = password spray target or lockout |

User Account Formula — Non-Interactive (6 Dimensions)

$$ \text{DriftScore}_{NonInteractive} = 0.30V + 0.20A + 0.15R + 0.15IP + 0.10L + 0.10F $$

| Dimension | Weight | Metric | Why |
|-----------|--------|--------|-----|
| Volume | 30% | Daily avg non-interactive sign-ins | Higher weight; non-interactive volume is more predictable |
| Applications | 20% | Distinct apps with token refreshes | New apps = potential token theft or rogue app consent |
| Resources | 15% | Distinct resources targeted | New resources = lateral expansion via token reuse |
| IPs | 15% | Distinct source IPs | New IPs = session hijack or AiTM proxy |
| Locations | 10% | Distinct geographic locations | Geographic shifts in token usage |
| Failure Rate | 10% | Failure rate delta | Rising failures = expired/revoked token churn |

Note: Devices dimension is excluded from non-interactive because token refreshes don't generate reliable device telemetry.
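Both formulas are weighted sums over per-dimension ratios (where 100 means identical to baseline). A minimal sketch, assuming the ratios have already been computed; the dictionary keys and the function name `drift_score` are hypothetical, while the weights are taken from the two formulas above.

```python
INTERACTIVE_WEIGHTS = {
    "volume": 0.25, "apps": 0.20, "resources": 0.10, "ips": 0.15,
    "locations": 0.10, "devices": 0.10, "fail_rate": 0.10,
}

NON_INTERACTIVE_WEIGHTS = {
    "volume": 0.30, "apps": 0.20, "resources": 0.15, "ips": 0.15,
    "locations": 0.10, "fail_rate": 0.10,
}


def drift_score(ratios, weights):
    """Weighted composite of dimension ratios, rounded to one decimal place."""
    missing = set(weights) - set(ratios)
    if missing:
        raise ValueError(f"missing dimension ratios: {sorted(missing)}")
    return round(sum(weight * ratios[name] for name, weight in weights.items()), 1)
```

With every ratio at 100 the score is 100.0 for either formula. Doubling only interactive volume (ratio 200) raises the interactive score to 125.0, which shows that a single dimension rarely crosses the 150 threshold on its own.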

Interpretation Scale

| Score | Meaning | Action |
|-------|---------|--------|
| < 80 | Contracting scope | ✅ Normal: entity is doing less than usual |
| 80–120 | Stable / normal variance | ✅ No action required |
| 120–150 | Moderate deviation | 🟡 Monitor: check for legitimate reasons |
| > 150 | Significant drift | 🔴 FLAG: investigate with corroborating evidence |
| > 250 | Extreme drift | 🔴 CRITICAL: immediate investigation required |
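The scale can be expressed as a simple band lookup. This is a sketch with one labeled assumption: the table leaves the exact boundaries ambiguous, so here a score of exactly 120 or exactly 150 is treated as "moderate deviation" (flagging requires a score strictly above 150). The function name `interpret` is hypothetical.

```python
def interpret(score: float) -> str:
    """Return the interpretation band for a drift score."""
    if score > 250:
        return "extreme drift"       # critical, immediate investigation
    if score > 150:
        return "significant drift"   # flag, investigate with corroboration
    if score >= 120:
        return "moderate deviation"  # monitor (boundary handling is an assumption)
    if score >= 80:
        return "stable"              # normal variance
    return "contracting scope"       # doing less than usual
```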

Low-Volume Denominator Floor

CRITICAL: For entities with sparse baselines (< 10 daily sign-ins), the volume ratio is artificially inflated. Apply a floor:

IF BL_DailyAvg < 10:
    AdjustedVolumeRatio = RC_DailyAvg / max(BL_DailyAvg, 10) * 100
    Flag the score with: "⚠️ Low-volume baseline — ratio may be inflated"

This prevents an entity averaging 1 sign-in/day from triggering at 6 sign-ins/day (600% ratio but trivial absolute volume).
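The floor rule above translates directly into code. A minimal sketch: the function name `volume_ratio` and the returned tuple shape are assumptions, and the default floor of 10 sign-ins/day comes from the text.

```python
def volume_ratio(recent_daily_avg: float, baseline_daily_avg: float, floor: float = 10.0):
    """Return (volume ratio in percent, low_volume flag).

    The baseline is raised to the floor before dividing, so sparse baselines
    cannot inflate the ratio. The flag tells the report to add the
    low-volume warning next to the score.
    """
    low_volume = baseline_daily_avg < floor
    ratio = recent_daily_avg / max(baseline_daily_avg, floor) * 100
    return round(ratio, 1), low_volume
```

For the example in the text, `volume_ratio(6, 1)` gives a ratio of 60.0 with the low-volume flag set, instead of the unadjusted 600%.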

User-specific note: Non-interactive sign-ins often have very high volume (thousands/day) from background token refreshes. The floor is less likely to trigger for non-interactive, but always check interactive separately.

Failure Rate Dimension — Delta-to-Ratio Conversion

CRITICAL: The FailRate dimension is a percentage-point delta, not a multiplicative ratio like the other dimensions. Convert it to the same 0–200+ scale using this formula:

FailRateDelta = RecentFailRate - BaselineFailRate  (percentage points)
FailRateRatio = 100 + (FailRateDelta × 10)         (scaled: each +1pp = +10 on the ratio scale)

| Baseline FailRate | Recent FailRate | Delta | Ratio | Interpretation |
|-------------------|-----------------|-------|-------|----------------|
| 5.00% | 5.00% | 0.00 | 100.0 | No change |
| 5.00% | 8.00% | +3.00 | 130.0 | Moderate increase |
| 5.00% | 12.00% | +7.00 | 170.0 | 🔴 Above threshold |
| 5.00% | 2.00% | -3.00 | 70.0 | Improving (contracting) |
| 0.00% | 0.00% | 0.00 | 100.0 | No change (both clean) |
| 0.00% | 5.00% | +5.00 | 150.0 | 🟡 At threshold: new failures appearing |

Edge case: Baseline = 0% avoids division-by-zero because delta is additive, not multiplicative. The scaling factor (×10) means each percentage point of failure rate increase maps to 10 points on the drift scale. This keeps FailRate on the same magnitude as the other dimensions.

In the ASCII chart: Show the ratio as the bar fill percentage and append the raw delta as direction indicator: ^+X.XX (increasing) or v-X.XX (decreasing).
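The conversion and its chart rendering can be sketched together. The names `fail_rate_ratio`, `direction_label`, and `bar` are hypothetical. The 20-character bar width and fill rule (round(ratio/100 × 20), capped at 20) follow the chart conventions used in these reports; showing `=` for a zero delta and clamping negative ratios to an empty bar are assumptions.

```python
def fail_rate_ratio(baseline_pct: float, recent_pct: float):
    """Convert a failure-rate change into (ratio on the drift scale, delta in pp)."""
    delta = recent_pct - baseline_pct
    return round(100 + delta * 10, 1), round(delta, 2)


def direction_label(delta: float) -> str:
    """Format the raw delta as the chart's direction indicator."""
    if delta > 0:
        return f"^+{delta:.2f}"
    if delta < 0:
        return f"v{delta:.2f}"  # the minus sign comes from the number itself
    return "="


def bar(ratio: float, width: int = 20) -> str:
    """Render a fixed-width bar: filled blocks for the ratio, a track for the rest.

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    filled = max(0, min(width, round(ratio / 100 * width)))
    return "█" * filled + "─" * (width - filled)
```

A baseline of 5.00% and a recent rate of 2.00% gives a ratio of 70.0, the label `v-3.00`, and a bar with 14 of 20 cells filled.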


Execution Workflow

Phase 1: Behavioral Baseline vs. Recent Comparison

Baseline window: 90 days (days 8–97 ago)

Recent window: 7 days (last 7 days)

This is the primary query that computes per-user behavioral profiles and drift metrics.

| Data Source | Query | Notes |
|-------------|-------|-------|
| SigninLogs | Query 6 | Interactive, 7 dimensions (adds Apps, Devices) |
| AADNonInteractiveUserSignInLogs | Query 7 | Non-interactive, 6 dimensions (adds Apps, no Devices) |

User accounts produce TWO drift scores (interactive + non-interactive). Both must be computed and reported.

Phase 2: Account Configuration Change Audit

Data source: AuditLogs

Correlation: Same 97-day window, filtered to the user from Phase 1

Operations to Look For:

  • Reset user password
  • Change user password
  • Update user
  • Add member to group
  • Add member to role
  • Register security info
  • Delete security info
  • Update StsRefreshTokenValidFrom
  • Any operation containing: "password", "MFA", "role", "group", "conditional", "auth"

Phase 3: Corroborating Signal Collection (Run in Parallel)

  • SecurityAlert + SecurityIncident (Query 11): Check for alerts referencing user UPN, joined with SecurityIncident for real status/classification. Never read SecurityAlert.Status directly — it's always "New".
  • Signinlogs_Anomalies_KQL_CL (Query 9): Pre-computed anomaly detection (new IPs, new device combos, geographic novelty). Custom table — may not exist in all workspaces.
  • Identity Protection risk fields (Query 10): RiskLevelDuringSignIn, RiskState, RiskEventTypes_V2 from SigninLogs.
  • CloudAppEvents (Query 12): Cloud app activity drift — baseline vs. recent comparison of action types, applications, IPs, countries, admin/external/impersonated operations. Requires user's AccountObjectId (Entra Object ID) — resolve from UPN via Graph API before querying. May not exist if XDR connector is not streaming to Data Lake.
  • EmailEvents (Query 13): Email pattern drift — baseline vs. recent comparison of volume, send/receive ratio, email direction, sender domains, threat email prevalence. Uses UPN for both sender and recipient matching. May not exist if XDR connector is not streaming to Data Lake.

Phase 4: Score Computation & Report Generation

  1. Compute DriftScore for BOTH interactive and non-interactive using entity-specific formulas
  2. Apply the low-volume denominator floor
  3. Flag if either score exceeds 150% threshold
  4. For flagged users: assess corroborating evidence (account changes, alerts, anomaly table, Identity Protection, cloud app activity drift, email pattern drift)
  5. Generate risk assessment with emoji-coded findings
  6. Render output in the user's selected mode

Sample KQL Queries

Query 6: User Interactive Sign-In Baseline vs. Recent

// Build 90-day baseline vs 7-day recent for user interactive sign-ins
// Substitute <UPN> with user's UPN
let baselineStart = ago(97d);
let baselineEnd = ago(7d);
SigninLogs
| where UserPrincipalName =~ '<UPN>'
| where TimeGenerated >= baselineStart
| extend Period = iff(TimeGenerated < baselineEnd, "Baseline", "Recent")
| summarize
    TotalSignIns = count(),
    Days = dcount(bin(TimeGenerated, 1d)),
    DistinctApps = dcount(AppDisplayName),
    DistinctResources = dcount(ResourceDisplayName),
    DistinctIPs = dcount(IPAddress),
    DistinctLocations = dcount(Location),
    DistinctDevices = dcount(strcat(tostring(parse_json(DeviceDetail).operatingSystem), "|", tostring(parse_json(DeviceDetail).browser))),
    FailRate = round(1.0 * countif(ResultType != "0" and ResultType != 0) / count() * 100, 2),
    Apps = make_set(AppDisplayName, 50),
    Resources = make_set(ResourceDisplayName, 50),
    IPs = make_set(IPAddress, 50),
    Locations = make_set(Location, 50),
    Devices = make_set(strcat(tostring(parse_json(DeviceDetail).operatingSystem), "|", tostring(parse_json(DeviceDetail).browser)), 50)
    by Period
| order by Period asc

Post-processing: Compare Baseline vs Recent rows. Compute ratios per dimension. Calculate set_difference() equivalents in the assessment to identify new apps, IPs, locations, and devices appearing only in the Recent period.
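
The set comparison described above might look like this in Python. It is a sketch under assumptions: the helper name is hypothetical, and the inputs are taken to be the `make_set` arrays (for example `IPs` or `Apps`) from the Baseline and Recent rows.

```python
from typing import Iterable, List


def new_in_recent(baseline: Iterable[str], recent: Iterable[str]) -> List[str]:
    """Values present in the Recent set but absent from the Baseline set."""
    seen = {value for value in baseline if value}
    return sorted({value for value in recent if value and value not in seen})


# Illustrative values only, not real query output.
baseline_ips = ["203.0.113.10", "198.51.100.7"]
recent_ips = ["198.51.100.7", "192.0.2.44", ""]
print(new_in_recent(baseline_ips, recent_ips))
```

Because the queries cap each set at 50 values, a value can look new simply because the baseline set was truncated; treat new items for high-diversity users with that in mind.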

Query 7: User Non-Interactive Sign-In Baseline vs. Recent

// Build 90-day baseline vs 7-day recent for user non-interactive sign-ins
// Substitute <UPN> with user's UPN
let baselineStart = ago(97d);
let baselineEnd = ago(7d);
AADNonInteractiveUserSignInLogs
| where UserPrincipalName =~ '<UPN>'
| where TimeGenerated >= baselineStart
| extend Period = iff(TimeGenerated < baselineEnd, "Baseline", "Recent")
| summarize
    TotalSignIns = count(),
    Days = dcount(bin(TimeGenerated, 1d)),
    DistinctApps = dcount(AppDisplayName),
    DistinctResources = dcount(ResourceDisplayName),
    DistinctIPs = dcount(IPAddress),
    DistinctLocations = dcount(Location),
    FailRate = round(1.0 * countif(ResultType != "0" and ResultType != 0) / count() * 100, 2),
    Apps = make_set(AppDisplayName, 50),
    Resources = make_set(ResourceDisplayName, 50),
    IPs = make_set(IPAddress, 50),
    Locations = make_set(Location, 50)
    by Period
| order by Period asc

Note: Devices dimension is excluded from non-interactive queries — token refreshes don't generate reliable device telemetry.

KQL Pattern Note: Uses single-pass extend Period = iff(...) pattern instead of separate baseline/recent subqueries joined with join kind=inner on 1==1. The cross-join pattern is NOT supported in KQL — always use the Period flag approach for user queries.

Query 8: User AuditLog Configuration Changes

// User account configuration changes (password, MFA, roles, groups)
// Substitute <UPN> with user's UPN
AuditLogs
| where TimeGenerated > ago(97d)
| where OperationName has_any ("password", "MFA", "role", "group", "conditional", "auth",
    "user", "member", "security info")
| where tostring(TargetResources) has '<UPN>'
    or tostring(InitiatedBy) has '<UPN>'
    or Identity =~ '<UPN>'
| extend InBaseline = TimeGenerated < ago(7d)
| summarize
    BaselineOps = countif(InBaseline),
    RecentOps = countif(not(InBaseline)),
    Operations = make_set(OperationName, 30)
    by OperationName
| order by RecentOps desc

Query 9: SigninLogs Anomaly Table (Custom)

🔴 CRITICAL — CASE-SENSITIVE TABLE NAME: The table is Signinlogs_Anomalies_KQL_CL (lowercase 'l' in "logs"). Do NOT use SigninLogs_Anomalies_KQL_CL — that will fail with SemanticError: Failed to resolve table. KQL custom _CL tables are case-sensitive. Copy the name exactly as written below.

// Pre-computed anomalies from Signinlogs_Anomalies_KQL_CL
// Substitute <UPN> with user's UPN
// ⚠️ CASE-SENSITIVE: Table name is "Signinlogs" (lowercase 'l'), NOT "SigninLogs"
// Note: This table may not exist in all workspaces — handle gracefully
Signinlogs_Anomalies_KQL_CL
| where TimeGenerated > ago(14d)
| where UserPrincipalName =~ '<UPN>'
| extend Severity = case(
    BaselineSize < 3, "Informational",
    CountryNovelty and CityNovelty and ArtifactHits >= 20, "High",
    ArtifactHits >= 10 or CountryNovelty or CityNovelty or StateNovelty, "Medium",
    ArtifactHits >= 5, "Low",
    "Informational")
| where Severity in ("High", "Medium", "Low")
| project DetectedDateTime, AnomalyType, Value, Severity, Country, City,
    ArtifactHits, CountryNovelty, CityNovelty, OS, BrowserFamily
| order by DetectedDateTime desc
| take 20

Query 10: Identity Protection Risk Events

// Identity Protection risk signals from SigninLogs
// Substitute <UPN> with user's UPN
SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName =~ '<UPN>'
| where RiskLevelDuringSignIn != "none" and RiskLevelDuringSignIn != ""
| project TimeGenerated, RiskLevelDuringSignIn, RiskState, RiskEventTypes_V2,
    IPAddress, Location, AppDisplayName,
    DeviceOS = tostring(parse_json(DeviceDetail).operatingSystem),
    Browser = tostring(parse_json(DeviceDetail).browser),
    ConditionalAccessStatus
| order by TimeGenerated desc
| take 20

Note: Identity Protection events supplement the drift analysis. Any atRisk or confirmedCompromised risk states in the recent window should be flagged prominently, regardless of drift score.

Query 11: User SecurityAlert + SecurityIncident Correlation

// Security alerts and incidents referencing the user
// IMPORTANT: SecurityAlert.Status is immutable (always "New") — MUST join SecurityIncident for real Status/Classification
// Substitute <UPN> with user's UPN
let relevantAlerts = SecurityAlert
| where TimeGenerated > ago(97d)
| where Entities has '<UPN>' or CompromisedEntity has '<UPN>'
| summarize arg_max(TimeGenerated, *) by SystemAlertId
| project SystemAlertId, AlertName, AlertSeverity, ProductName, ProductComponentName, Tactics, Techniques, TimeGenerated;
SecurityIncident
| where CreatedTime > ago(97d)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| join kind=inner relevantAlerts on $left.AlertId == $right.SystemAlertId
| extend Period = iff(TimeGenerated1 < ago(7d), "Baseline", "Recent")
| summarize
    BaselineAlerts = countif(Period == "Baseline"),
    RecentAlerts = countif(Period == "Recent"),
    TotalAlerts = count(),
    Severities = make_set(AlertSeverity, 5),
    IncidentStatuses = make_set(Status, 5),
    Classifications = make_set(Classification, 5),
    BaselineIncidents = dcountif(IncidentNumber, Period == "Baseline"),
    RecentIncidents = dcountif(IncidentNumber, Period == "Recent")
    by ProductName
| order by TotalAlerts desc

Interpreting Incident Status in Drift Context:

| Incident Status | Classification | Impact on Drift Assessment |
|-----------------|----------------|----------------------------|
| Closed | TruePositive | 🔴 Confirmed threat: significantly increases drift risk |
| Closed | FalsePositive | 🟢 False alarm: discount from drift risk, note as noise |
| Closed | BenignPositive | 🟡 Expected behavior: note but don't escalate |
| Active/New | Any | 🟠 Unresolved: flag for attention, may indicate ongoing threat |

Product Name Mapping (Legacy → Current Branding):

The ProductName field in SecurityAlert contains the detection product. When rendering reports, translate to current Microsoft branding:

| SecurityAlert.ProductName (raw) | Report Display Name |
|---------------------------------|---------------------|
| Microsoft Defender Advanced Threat Protection | Microsoft Defender for Endpoint |
| Microsoft Cloud App Security | Microsoft Defender for Cloud Apps |
| Microsoft Data Loss Prevention | Microsoft Purview Data Loss Prevention |
| Azure Sentinel | Microsoft Sentinel |
| Microsoft 365 Defender | Microsoft Defender XDR |
| Office 365 Advanced Threat Protection | Microsoft Defender for Office 365 |
| Azure Advanced Threat Protection | Microsoft Defender for Identity |

Report Rendering: Same rules as SPN — show Baseline vs Recent alert/incident counts per product, with a Total row and brief summary. Do NOT list individual alert names.
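
The branding translation can be applied with a plain lookup. A minimal Python sketch follows; the dictionary entries come from the mapping in this section, while the helper name `display_name` and the pass-through behavior for unmapped product names are assumptions.

```python
# Legacy SecurityAlert.ProductName values mapped to current branding.
PRODUCT_DISPLAY_NAMES = {
    "Microsoft Defender Advanced Threat Protection": "Microsoft Defender for Endpoint",
    "Microsoft Cloud App Security": "Microsoft Defender for Cloud Apps",
    "Microsoft Data Loss Prevention": "Microsoft Purview Data Loss Prevention",
    "Azure Sentinel": "Microsoft Sentinel",
    "Microsoft 365 Defender": "Microsoft Defender XDR",
    "Office 365 Advanced Threat Protection": "Microsoft Defender for Office 365",
    "Azure Advanced Threat Protection": "Microsoft Defender for Identity",
}


def display_name(raw_product_name: str) -> str:
    """Return the current branding, or the raw name if no mapping exists (assumed fallback)."""
    cleaned = raw_product_name.strip()
    return PRODUCT_DISPLAY_NAMES.get(cleaned, cleaned)


print(display_name("Azure Sentinel"))
```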

Query 12: CloudAppEvents — Cloud App Activity Drift

// Cloud app activity drift — baseline vs recent comparison
// Tracks action type diversity, application usage, IP/geo distribution,
// admin operations, external user activity, and impersonation
// Substitute <ACCOUNT_OBJECT_ID> with user's Entra Object ID (resolve from UPN via Graph API)
// NOTE: This table requires XDR connector streaming to Data Lake
let baselineStart = ago(97d);
let baselineEnd = ago(7d);
CloudAppEvents
| where TimeGenerated >= baselineStart
| where AccountObjectId == '<ACCOUNT_OBJECT_ID>'
| extend Period = iff(TimeGenerated < baselineEnd, "Baseline", "Recent")
| summarize
    TotalEvents = count(),
    Days = dcount(bin(TimeGenerated, 1d)),
    DistinctActions = dcount(ActionType),
    DistinctApps = dcount(Application),
    DistinctObjects = dcount(ObjectName),
    DistinctIPs = dcount(IPAddress),
    DistinctCountries = dcount(CountryCode),
    AdminOps = countif(IsAdminOperation),
    ExternalUserOps = countif(IsExternalUser),
    ImpersonatedOps = countif(IsImpersonated),
    Actions = make_set(ActionType, 100),
    Apps = make_set(Application, 50),
    IPs = make_set(IPAddress, 50),
    Countries = make_set(CountryCode, 20)
    by Period
| order by Period asc

How to resolve AccountObjectId from UPN: Use Microsoft Graph API: GET /v1.0/users/<UPN>?$select=id → use the id field as <ACCOUNT_OBJECT_ID>.
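
A small Python sketch of building that request URL is shown below. The function name is hypothetical, and sending the request (token acquisition, HTTP client) is deliberately left out. The one detail worth illustrating is percent-encoding: guest UPNs contain `#EXT#`, and an unencoded `#` would be treated as a URL fragment.

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def user_id_lookup_url(upn: str) -> str:
    """Build the Graph URL that returns only the user's object ID."""
    # Keep '@' readable; encode everything else that is unsafe in a path segment.
    encoded_upn = quote(upn, safe="@")
    return f"{GRAPH_BASE}/users/{encoded_upn}?$select=id"


# Illustrative UPNs only.
print(user_id_lookup_url("alice@contoso.com"))
print(user_id_lookup_url("jane_contoso.com#EXT#@fabrikam.onmicrosoft.com"))
```

The `id` field of the JSON response is the value to substitute for `<ACCOUNT_OBJECT_ID>` in Query 12.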

Drift Interpretation for CloudAppEvents (Corroboration — not scored):

CloudAppEvents provides qualitative corroboration, not a scored drift dimension. Focus on these signals:

| Signal | Baseline → Recent Change | Risk Implication |
|--------|--------------------------|------------------|
| DistinctActions ↑↑ | New action types appearing | Expanded permissions or new tooling usage |
| AdminOps ↑↑ | New admin-level operations | Privilege escalation or new admin role assignment |
| ExternalUserOps > 0 (new) | External user activity appearing | Potential guest account abuse or B2B compromise |
| ImpersonatedOps > 0 (new) | Impersonation activity appearing | Delegated access abuse or admin impersonation |
| New applications | Apps in Recent not in Baseline | Shadow IT, rogue app consent, or lateral movement |
| New countries | Countries in Recent not in Baseline | Geographic anomaly; correlate with SigninLogs locations |
| DistinctIPs ↑↑ | Significant new IPs | VPN rotation, proxy usage, or credential sharing |

Corroboration with other drift signals:

  • New admin operations in CloudAppEvents + role assignment in AuditLogs = strong privilege escalation signal
  • New applications in CloudAppEvents + new apps in SigninLogs = confirmed shadow IT adoption
  • New countries in CloudAppEvents + geographic anomalies in anomaly table = travel or compromise

Query 13: EmailEvents — Email Pattern Drift

// Email pattern drift — baseline vs recent comparison
// Tracks volume, send/receive ratio, direction distribution,
// sender diversity, domain diversity, and threat email prevalence
// Substitute <UPN> with user's UPN (matches both sender and recipient)
// NOTE: This table requires XDR connector streaming to Data Lake
let baselineStart = ago(97d);
let baselineEnd = ago(7d);
EmailEvents
| where TimeGenerated >= baselineStart
| where RecipientEmailAddress =~ '<UPN>' or SenderMailFromAddress =~ '<UPN>'
| extend Period = iff(TimeGenerated < baselineEnd, "Baseline", "Recent")
| summarize
    TotalEmails = count(),
    Days = dcount(bin(TimeGenerated, 1d)),
    SentCount = countif(SenderMailFromAddress =~ '<UPN>'),
    ReceivedCount = countif(RecipientEmailAddress =~ '<UPN>'),
    InboundCount = countif(EmailDirection == "Inbound"),
    OutboundCount = countif(EmailDirection == "Outbound"),
    IntraOrgCount = countif(EmailDirection == "Intra-org"),
    DistinctSenders = dcount(SenderMailFromAddress),
    DistinctRecipients = dcountif(RecipientEmailAddress, SenderMailFromAddress =~ '<UPN>'),
    DistinctSenderDomains = dcount(SenderMailFromDomain),
    ThreatEmails = countif(ThreatTypes != ""),
    DistinctSubjects = dcount(Subject),
    SenderDomains = make_set(SenderMailFromDomain, 50),
    DeliveryActions = make_set(DeliveryAction, 10)
    by Period
| order by Period asc

Drift Interpretation for EmailEvents (Corroboration — not scored):

EmailEvents provides qualitative corroboration, not a scored drift dimension. Focus on these signals:

| Signal | Baseline → Recent Change | Risk Implication |
|--------|--------------------------|------------------|
| SentCount ↑↑↑ | Sudden spike in outbound email | Potential spam/phishing campaign from compromised account |
| SentCount drops to 0 | User stopped sending email | Account takeover with mail forwarding rule (check OfficeActivity) |
| ThreatEmails ↑ | Increase in threat-flagged inbound | Targeted phishing campaign against user |
| New SenderDomains (inbound) | Domains in Recent not in Baseline | New communication partners or phishing domains |
| IntraOrgCount → 0 (was > 0) | Lost intra-org email patterns | User isolated or moved to different tenant |
| DeliveryAction changes | More "Junked" or "Blocked" in Recent | Email security policies catching more threats |
| DistinctSubjects ↓↓ (with volume ↑) | Many emails with few subjects | Automated/bulk email: potential spam or notification storm |
| OutboundCount ↑ + new recipients | Sudden outbound expansion | Data exfiltration or mass-mailing from compromised mailbox |

Corroboration with other drift signals:

  • Outbound email spike + new forwarding rule in OfficeActivity/AuditLogs = email exfiltration (T1114.003)
  • ThreatEmails ↑ + Identity Protection risk events + new IPs in SigninLogs = active phishing campaign with partial success
  • SentCount → 0 + non-interactive IP drift = account takeover with inbox rule redirect

Report Template

Inline Chat Report Structure

The inline report MUST include these sections in order:

  1. Header — Workspace, analysis period, drift threshold, data sources
  2. Interactive Drift Score — 7-dimension breakdown with ratios
  3. Non-Interactive Drift Score — 6-dimension breakdown with ratios
  4. Flagged Dimension Deep Dive (for any dimension > 150%) — Baseline vs. recent comparison, new IPs/apps/devices, dimension bar chart
  5. Correlated Signal Summary — AuditLogs, SecurityAlert/Incident, and anomaly table findings in a single table
  6. Identity Protection Summary — Risk events, risk states, risk levels
  7. Cloud App Activity Drift — CloudAppEvents baseline vs. recent: action types, apps, admin ops, impersonation, new countries/IPs
  8. Email Pattern Drift — EmailEvents baseline vs. recent: volume, direction, sender domains, threat emails
  9. Security Assessment — Emoji-coded findings table with evidence citations
  10. Verdict Box — Overall risk level, root cause analysis, recommendations

Markdown File Report Structure

When outputting to markdown file, include everything from the inline format PLUS:

Filename pattern: reports/scope-drift/user/Scope_Drift_Report_<username>_YYYYMMDD_HHMMSS.md

# User Account Scope Drift Report

**Generated:** YYYY-MM-DD HH:MM UTC
**Workspace:** <workspace_name>
**User:** <UPN>
**Baseline Period:** <start> → <end> (90 days)
**Recent Period:** <start> → <end> (7 days)
**Drift Threshold:** 150%
**Data Sources:** SigninLogs, AADNonInteractiveUserSignInLogs, AuditLogs, SecurityAlert, Signinlogs_Anomalies_KQL_CL, Identity Protection, CloudAppEvents, EmailEvents

---

## Executive Summary

<1-3 sentence summary: interactive drift score, non-interactive drift score, overall risk level>

---

## Interactive Sign-In Drift

**Drift Score: XX.X%** — <status emoji> <Contracting/Stable/Expanding>

<LaTeX formula block>

**ASCII Drift Dimension Chart (REQUIRED):**

Render a box-drawn chart inside a code fence. **Inner width: 58 chars** (every line between `│` markers = exactly 58 visual characters). No emoji inside boxes — use text labels.

**Alignment:** Name (9 chars padded) + weight (5) + gap (2) + bars (20 `█─`) + gap (2) + pct (6, right-aligned: `XXX.X%` or ` XX.X%`) + gap (2) + direction (10 total: `^`/`v`/`=` + 9 trailing spaces, or FailRate: delta like `v-X.XX` + 4 trailing spaces). Status labels (centered): `STABLE`, `STABLE (Low-Volume)`, `NEAR THRESHOLD`, `ABOVE THRESHOLD`, `CRITICAL`. Direction: `^` (up), `v` (down), `=` (stable).

**Bar characters:** Use `█` (U+2588 full block) for filled portions and `─` (U+2500 box-drawing horizontal) for the unfilled track.

┌──────────────────────────────────────────────────────────┐
│              INTERACTIVE DRIFT SCORE: XX.X               │
│                          STABLE                          │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  Volume   (25%)  ██████──────────────  XXX.X%  ^         │
│  Apps     (20%)  ███─────────────────   XX.X%  v         │
│  Resources(10%)  ██████──────────────  XXX.X%  =         │
│  IPs      (15%)  █───────────────────   XX.X%  v         │
│  Locations(10%)  ███─────────────────   XX.X%  =         │
│  Devices  (10%)  ██──────────────────   XX.X%  v         │
│  FailRate (10%)  ██████──────────────  XXX.X%  v-X.XX    │
│                                                          │
│  ────────────────────────── 100% baseline ──┤            │
│                        150% drift threshold ▲            │
└──────────────────────────────────────────────────────────┘


**Bar fill:** 20 chars wide. Filled = round(ratio/100 × 20), capped at 20. Title and status: center within 58 chars. Use `█` for filled, `─` for unfilled.

**Then** render the standard markdown dimension table:

| Dimension | Weight | Baseline (90d) | Recent (7d) | Ratio | Weighted | Status |
|-----------|--------|----------------|-------------|-------|----------|--------|

<New apps, IPs, locations, devices appearing only in recent period>

---

## Non-Interactive Sign-In Drift

**Drift Score: XX.X%** — <status emoji> <Contracting/Stable/Expanding>

<LaTeX formula block>

**ASCII Drift Dimension Chart (REQUIRED):**

Same box-drawn format as Interactive. **Inner width: 58 chars.** 6 dimensions (no Devices):

┌──────────────────────────────────────────────────────────┐
│            NON-INTERACTIVE DRIFT SCORE: XX.X             │
│                          STABLE                          │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  Volume   (30%)  ███████─────────────  XXX.X%  ^         │
│  Apps     (20%)  ████────────────────   XX.X%  v         │
│  Resources(15%)  █████───────────────  XXX.X%  =         │
│  IPs      (15%)  ██──────────────────   XX.X%  v         │
│  Locations(10%)  ███─────────────────   XX.X%  =         │
│  FailRate (10%)  ███████████─────────  XXX.X%  ^+X.XX    │
│                                                          │
│  ────────────────────────── 100% baseline ──┤            │
│                        150% drift threshold ▲            │
└──────────────────────────────────────────────────────────┘


**Then** render the standard markdown dimension table:

| Dimension | Weight | Baseline (90d) | Recent (7d) | Ratio | Weighted | Status |
|-----------|--------|----------------|-------------|-------|----------|--------|

<New apps, IPs, locations appearing only in recent period>

---

## Account Configuration Changes

<AuditLogs findings: password changes, MFA changes, role assignments, group memberships>

---

## Pre-Computed Anomalies

<Signinlogs_Anomalies_KQL_CL findings or gap note if table unavailable>

---

## Identity Protection

<Risk events, risk states, risk levels from SigninLogs>

---

## Cloud App Activity Drift

<CloudAppEvents baseline vs. recent comparison — action types, apps, IPs, countries, admin/external/impersonated operations>
<New actions, new apps, new countries appearing only in recent period>
<Corroboration notes — cross-reference with AuditLogs, SigninLogs>
<If table unavailable: "⚠️ CloudAppEvents table not available in this workspace — XDR connector may not be streaming to Data Lake.">

---

## Email Pattern Drift

<EmailEvents baseline vs. recent comparison — volume, sent/received, direction, sender domains, threat emails>
<Notable changes — outbound spikes, new sender domains, threat email trends>
<Corroboration notes — cross-reference with OfficeActivity for forwarding rules, Identity Protection for phishing>
<If table unavailable: "⚠️ EmailEvents table not available in this workspace — XDR connector may not be streaming to Data Lake.">

---

## Correlated Security Alerts

| Data Source | Finding | Incident Status |
|-------------|---------|-----------------|
| SigninLogs | ... | N/A |
| AADNonInteractiveUserSignInLogs | ... | N/A |
| AuditLogs | ... | N/A |
| Signinlogs_Anomalies_KQL_CL | ... | N/A |
| CloudAppEvents | ... | N/A |
| EmailEvents | ... | N/A |
| SecurityAlert / SecurityIncident | <Group by ProductName, translate to current branding> | <Status: New/Active/Closed, Classification: TP/FP/BP> |

---

## Security Assessment

| Factor | Finding |
|--------|---------|
| 🔴/🟢/🟡 **Factor** | Evidence-based finding |

---

## Verdict

**ASCII Verdict Box (REQUIRED):**

Render a box-drawn verdict summary inside a code fence. **Inner width: 66 chars.** No emoji inside boxes. Pad every line to exactly 66 chars between `│` markers.

┌──────────────────────────────────────────────────────────────────┐
│  OVERALL RISK: <LEVEL> -- <One-line summary>                     │
│  Interactive Score: XX.X (< 80 = Contracting)                    │
│  Non-Interactive Score: XX.X (< 80 = Contracting)                │
│  Root Cause: <Brief root cause explanation>                      │
└──────────────────────────────────────────────────────────────────┘


**Then** render the full verdict with:
- Root Cause Analysis paragraph
- Key Findings (numbered list)
- Recommendations (emoji-prefixed list)

---

## Appendix: Query Details

Render a single markdown table summarizing all queries executed. **Do NOT include full KQL text** — the canonical queries are already documented in this SKILL.md file (Queries 6–13). The appendix serves as an audit trail only.

| Query | Table(s) | Records Scanned | Results | Execution |
|-------|----------|----------------:|--------:|----------:|
| Q6 — Interactive Baseline vs. Recent | SigninLogs | X,XXX | N rows | X.XXs |
| Q7 — Non-Interactive Baseline vs. Recent | AADNonInteractiveUserSignInLogs | XX,XXX | N rows | X.XXs |
| ... | ... | ... | ... | ... |

*Query definitions: see Queries 6–13 in this SKILL.md file.*
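
The bar-fill rule in the report template above (20 characters wide, filled = round(ratio/100 × 20), capped at 20) can be sketched in Python. The helper name is hypothetical, and the template does not say how ties round, so this sketch assumes round-half-up rather than Python's default banker's rounding.

```python
import math

BAR_WIDTH = 20
FILLED = "\u2588"    # full block, used for the filled portion
UNFILLED = "\u2500"  # box-drawing horizontal, used for the unfilled track


def render_bar(ratio_pct: float) -> str:
    """Render a 20-character bar for a dimension ratio given in percent."""
    # Round half up (assumption); Python's built-in round() rounds ties to even.
    filled = math.floor(ratio_pct * BAR_WIDTH / 100 + 0.5)
    filled = max(0, min(BAR_WIDTH, filled))
    return FILLED * filled + UNFILLED * (BAR_WIDTH - filled)


for ratio in (30.0, 100.0, 250.0):
    print(f"{ratio:6.1f}%  {render_bar(ratio)}")
```

Any ratio of 100% or more renders as a full bar, so the percentage column, not the bar, is what distinguishes 100% from 250%.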

Known Pitfalls

SecurityAlert.Status Is Immutable — Always Join SecurityIncident

Problem: The Status field on SecurityAlert is set to "New" at creation time and never changes. It does NOT reflect whether the alert has been investigated, closed, or classified. Solution: MUST join with SecurityIncident to get real Status (New/Active/Closed) and Classification (TruePositive/FalsePositive/BenignPositive). See Query 11 which implements this join.

Low-Volume Statistical Inflation

Problem: Entities with very low baseline activity (e.g., 1 sign-in/day) will show extreme volume ratios even with minor changes. Solution: Apply the denominator floor (minimum 10 sign-ins/day for volume ratio calculation). Always flag low-volume baselines in the report.

Seasonal/Cyclical Baselines

Problem: Some entities have weekly patterns (lower on weekends) or monthly cycles (month-end batch jobs). Solution: Note if the 7-day recent window falls on an atypical portion of the cycle. The 90-day baseline smooths most cyclical patterns, but edge cases exist.

90-Day IP/App Contraction

Problem: The 90-day baseline captures ISP address rotations, travel IPs, and occasional app usage that won't naturally recur in a 7-day window. This makes user accounts appear to be "contracting" (score < 80) when they are actually stable. Solution: For user accounts showing contraction, check if the absolute numbers are reasonable. If the user had 30 IPs over 90 days but only 2 in 7 days, this is expected — note it as "natural IP diversity compression" rather than genuine scope reduction.

Non-Interactive Volume Inflation

Problem: Non-interactive sign-ins (token refreshes, background app activity) can number in the thousands per day. A brief outage or token cache flush can cause dramatic volume swings. Solution: Weight non-interactive drift scores lower in the overall assessment unless corroborated by new apps or IPs. Volume-only drift in non-interactive is rarely meaningful without other signals.

Cross-Join KQL Error

Problem: join kind=inner on 1==1 (cross-join) is NOT supported in KQL Sentinel Data Lake. The SPN query uses separate subqueries joined on ServicePrincipalId, but user queries target a single UPN and cannot use this pattern. Solution: User queries MUST use the single-pass extend Period = iff(TimeGenerated < baselineEnd, "Baseline", "Recent") pattern with summarize ... by Period. See Queries 6 and 7.

Identity Protection Risk States Lingering

Problem: Risk events (e.g., unfamiliarFeatures, anonymizedIPAddress) may show RiskState == "atRisk" for days/weeks after the triggering event if no admin action is taken. Solution: Check RiskState carefully. "atRisk" doesn't mean ongoing compromise — it means the risk was never remediated or dismissed. Flag these for admin review but don't automatically escalate drift score.

Device Telemetry Gaps

Problem: DeviceDetail in SigninLogs may be empty or {} for some sign-in types (SSO, mobile apps, headless clients). Solution: If DistinctDevices is very low (0-1) despite many sign-ins, note the gap rather than treating low device count as meaningful.

🔴 Custom Anomaly Table — CASE-SENSITIVE NAME

Problem: Signinlogs_Anomalies_KQL_CL is a custom table that may not exist in all workspaces. 🔴 CRITICAL: The table name uses lowercase 'l' in "logs" — Signinlogs NOT SigninLogs. KQL custom _CL table names are case-sensitive. LLMs tend to auto-correct this to match the standard SigninLogs table — this WILL cause a SemanticError: Failed to resolve table error. Always copy the exact table name from Query 9. Solution: If the table returns a SemanticError, first verify you used the correct casing (Signinlogs_Anomalies_KQL_CL). If it still fails after verifying casing, then the table genuinely doesn't exist — skip Query 9 gracefully and note: "⚠️ Custom anomaly table not available in this workspace — skipping pre-computed anomaly check." Do not fail the entire analysis.

CloudAppEvents Uses AccountObjectId, Not UPN

Problem: CloudAppEvents identifies users via AccountObjectId (Entra Object ID GUID), not UserPrincipalName. Querying by UPN will return 0 results. Solution: Before executing Query 12, resolve the user's Entra Object ID from their UPN using Microsoft Graph API: GET /v1.0/users/<UPN>?$select=id. Use the returned id value as <ACCOUNT_OBJECT_ID> in the query. If Graph API is unavailable, fall back to AccountDisplayName with has operator (less precise — display names are not unique).

CloudAppEvents/EmailEvents Table Availability

Problem: Both CloudAppEvents and EmailEvents are XDR-native tables that require the Defender XDR connector to stream data into the Sentinel Data Lake. They may not exist in all workspaces. Solution: If either table is not found, skip the corresponding query gracefully and note: "⚠️ [Table] not available in this workspace — XDR connector may not be streaming to Data Lake." Do not fail the entire analysis. These are corroboration signals, not primary drift dimensions.

CloudAppEvents Empty CountryCode and IPAddress

Problem: Some CloudAppEvents entries (particularly system-initiated or API-driven operations) have empty CountryCode and/or IPAddress fields. These inflate DistinctCountries and DistinctIPs counts with empty string entries. Solution: The query uses dcount() which counts empty strings as a distinct value. When interpreting results, note that one "country" or "IP" may be an empty string representing internal/system events. In the drift interpretation, focus on named countries and non-empty IPs.
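
One way to apply this when interpreting results is to recount from the `Countries` and `IPs` sets with empty entries dropped. A minimal Python sketch, with a hypothetical helper name and illustrative values:

```python
from typing import Iterable, Optional


def named_count(values: Iterable[Optional[str]]) -> int:
    """Count distinct values, ignoring None, empty, and whitespace-only entries."""
    return len({value.strip() for value in values if value and value.strip()})


# Illustrative set as it might come back from make_set(CountryCode, 20).
countries = ["US", "", "DE", "US"]
print(named_count(countries))
```

Comparing `named_count` for Baseline and Recent avoids reading a system-event empty string as a new country or IP.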

EmailEvents ThreatTypes Empty String vs Null

Problem: The ThreatTypes field in EmailEvents uses an empty string "" for clean emails, not null. A null check such as isnotnull() would not exclude them, because string columns in KQL are never null. Solution: Query 13 uses ThreatTypes != "" which correctly filters for threat-flagged emails only (isnotempty(ThreatTypes) is equivalent). When ThreatEmails count is 0 in Recent but > 0 in Baseline, this is a positive signal (fewer threats reaching the user) rather than a drift concern.

EmailEvents Dual-Direction Matching

Problem: Query 13 matches on both RecipientEmailAddress and SenderMailFromAddress, so an email where the user is both sender and recipient (e.g., sending to self) is counted in both SentCount and ReceivedCount. Solution: This edge case is negligible in practice. TotalEmails counts each matching row once, so it is not inflated; only the sum SentCount + ReceivedCount can exceed TotalEmails, by the number of self-sent emails.


Error Handling

Common Issues

| Issue | Solution |
|-------|----------|
| SigninLogs table not found | Rare but possible in workspaces without Entra ID P1/P2 logging enabled. Report as blocker. |
| AADNonInteractiveUserSignInLogs table not found | Check workspace configuration. Non-interactive logs require diagnostic settings. Skip non-interactive analysis and note the gap. |
| Signinlogs_Anomalies_KQL_CL table not found | First check casing: the table name is Signinlogs (lowercase 'l'), NOT SigninLogs. LLMs frequently auto-correct this. If casing is correct and it still fails, the custom table may not exist in this workspace. Skip Query 9 gracefully with a note; do not fail the analysis. |
| CloudAppEvents table not found | XDR connector may not be streaming to Data Lake. Skip Query 12 gracefully with note; do not fail the analysis. These are corroboration signals. |
| EmailEvents table not found | XDR connector may not be streaming to Data Lake. Skip Query 13 gracefully with note; do not fail the analysis. These are corroboration signals. |
| CloudAppEvents returns 0 results for valid user | Verify AccountObjectId: this field uses Entra Object ID (GUID), not UPN. Resolve via Graph API: GET /v1.0/users/<UPN>?$select=id. |
| Zero entities in results | Verify the workspace has sign-in data for the user. Check if logging is enabled. Verify UPN spelling. |
| Query timeout | Reduce the baseline window from 90 to 60 days, or add \| take 100 to intermediate results. |
| AuditLogs has_any not matching | Ensure IDs are quoted strings in the dynamic() array. Use tostring() on dynamic fields. |
| join kind=inner on 1==1 error | Cross-join not supported in KQL. Use single-pass extend Period = iff(...) pattern instead. See Queries 6-7. |
| Identity Protection fields empty | RiskLevelDuringSignIn may be "none" for all records if Identity Protection is not licensed. Note the gap; don't treat as "no risk." |

Validation Checklist

Before presenting results, verify:

  • All applicable data sources were queried (even if some returned 0 results)
  • Low-volume denominator floor was applied to any entity with BL_DailyAvg < 10
  • Corroborating evidence was checked for every flagged entity
  • Empty results are explicitly reported with ✅ (not silently omitted)
  • The report includes the drift score formula and threshold for transparency
  • SecurityAlert was joined with SecurityIncident for real Status/Classification (never read SecurityAlert.Status directly)
  • Incident classifications (TP/FP/BP) were factored into risk assessment — FalsePositive alerts discounted, TruePositive alerts escalated
  • Both interactive AND non-interactive drift scores were computed
  • IP/app contraction was contextualized (90-day diversity vs 7-day window)
  • Identity Protection risk states were checked and reported
  • Custom anomaly table was queried (or gap noted if unavailable)
  • CloudAppEvents was queried for cloud app activity drift (or gap noted if table unavailable)
  • EmailEvents was queried for email pattern drift (or gap noted if table unavailable)
  • CloudAppEvents AccountObjectId was resolved from UPN via Graph API (not queried by UPN)
  • Device telemetry gaps were noted if DeviceDetail was sparse

SVG Dashboard Generation

📊 Optional post-report step. After a User scope drift report is generated, the user can request a visual SVG dashboard.

Trigger phrases: "generate SVG dashboard", "create a visual dashboard", "visualize this report", "SVG from the report"

How to Request a Dashboard

  • Same chat: "Generate an SVG dashboard from the report" — data is already in context.
  • New chat: Attach or reference the report file, e.g. #file:reports/scope-drift/user/Scope_Drift_Report_<user>_<date>.md
  • Customization: Edit svg-widgets.yaml before requesting — the renderer reads it at generation time.

Execution

Step 1:  Read svg-widgets.yaml (this skill's widget manifest)
Step 2:  Read .github/skills/svg-dashboard/SKILL.md (rendering rules — Manifest Mode)
Step 3:  Read the completed report file (data source)
Step 4:  Render SVG → save to reports/scope-drift/user/{report_name}_dashboard.svg

The YAML manifest is the single source of truth for layout, widgets, field mappings, colors, and data source documentation. All customization happens there.

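Generates a Sentinel data ingestion analysis report covering table-level usage, tier classification, anomaly detection, rule health, and optimization recommendations. A YAML-driven PowerShell pipeline gathers the data and writes it to a temporary file, and the LLM then renders the Markdown report; inline chat output is also supported.
Analyze the data ingestion volume and cost structure of an Azure Sentinel workspace · Check for ingestion anomalies or detection rule coverage · Evaluate tier migration candidates and license benefit optimization opportunities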
.github/skills/sentinel-ingestion-report/SKILL.md
npx skills add SCStelz/security-investigator --skill sentinel-ingestion-report -g -y
SKILL.md
Frontmatter
{
    "name": "sentinel-ingestion-report",
    "description": "Sentinel Ingestion Report — YAML-driven PowerShell pipeline gathers all data via az monitor\/az rest\/Graph API, writes a deterministic scratchpad, LLM renders the report. Covers table-level volume breakdown, tier classification (Analytics\/Basic\/Data Lake), SecurityEvent\/Syslog\/CommonSecurityLog deep dives, ingestion anomaly detection (24h and WoW), analytic rule inventory via REST API, rule health via SentinelHealth, detection coverage cross-reference, tier migration candidates with DL-eligibility lookup, license benefit analysis (DfS P2 500MB\/server\/day, M365 E5 data grant). Inline chat and markdown file output."
}

Sentinel Ingestion Analysis Report — Instructions

Purpose

This skill generates a comprehensive Sentinel Ingestion Analysis Report covering workspace data volume, table-level breakdown, tier classification, ingestion anomalies, detection coverage, and optimization opportunities.

Entity Type: Sentinel workspace (from config.json)

| Scope | Primary Tables | Use Case |
|---|---|---|
| Workspace-wide (default) | Usage, SentinelHealth, SentinelAudit | Full ingestion and cost analysis |
| Per-table deep dive | SecurityEvent, Syslog, CommonSecurityLog + any table | Granular breakdown of high-volume tables |

What this report covers: Table-level volume breakdown with tier classification (Analytics/Basic/Data Lake), SecurityEvent/Syslog/CommonSecurityLog deep dives, ingestion anomaly detection (24h and week-over-week), analytic rule inventory with detection coverage cross-reference, rule health monitoring, tier migration candidates with DL-eligibility assessment, and license benefit analysis (DfS P2 and M365 E5).


Architecture

 ┌─────────────────────────────────────────────────────────────────┐
 │  YAML query files        PowerShell script        LLM render    │
 │  queries/phase1-5/  ──→  Invoke-IngestionScan  ──→  Phase 6     │
 │  (23 .yaml files)        .ps1 (~2600 lines)       (SKILL-       │
 │                          • az monitor (KQL)        report.md)   │
 │                          • az rest (REST API)                   │
 │                          • az monitor table list                │
 │                          • Invoke-MgGraphRequest                │
 │                          ↓                                      │
 │                     temp/ingest_scratch_<ts>.md                 │
 │                     (~50 KB, 64 sections)                       │
 └─────────────────────────────────────────────────────────────────┘

Execution model:

  • Phases 1-5 (data gathering): Fully automated by Invoke-IngestionScan.ps1. KQL queries run via az monitor log-analytics query. Non-KQL data (analytic rules, tier classifications, custom detections) is gathered via REST API, Azure CLI, and Microsoft Graph.
  • Phase 6 (rendering): LLM reads the scratchpad + SKILL-report.md and renders the report. This is the only phase requiring LLM involvement.

Design decision: TopRecommendations. The Top 3 Recommendations are computed by the LLM at render time (Phase 6), not pre-computed by PS1. Several Rule E categories (for example Data loss, DCR filter, and Split ingestion) require cross-section reasoning that spans multiple scratchpad sections, which is the kind of work the LLM is suited to. The PS1 provides all the raw data; the LLM applies Rule E scoring across it.


Companion Files — When to Load

This skill spans 6 files. Load only the file(s) needed for the current phase:

| File | Purpose | When to Load |
|---|---|---|
| SKILL.md (this file) | Architecture, workflow, rendering rules, domain reference | Always — primary entry point |
| SKILL-report.md | Report templates (§1-§8), section-to-scratchpad mapping, formatting rules | Phase 6 rendering only |
| SKILL-drilldown.md | Post-report drill-down — rule cross-referencing (AR + CD via Graph API), ASIM parser verification, known pitfalls, error handling | After report is generated, when user asks follow-up questions (see §13 summary) |
| Invoke-IngestionScan.ps1 | PowerShell data-gathering pipeline (Phases 1-5) | Execution only — no need to read unless debugging |
| slice_scratch.py | Read-only verbatim block slicer — extracts ## PRERENDERED tables/skeleton byte-for-byte so they aren't mangled by hand-copy | Phase 6 rendering (optional but recommended) |
| render_dashboard.py | Deterministic SVG dashboard renderer — parses scratchpad + report + svg-widgets.yaml into the 7-row dashboard (no hardcoded run data) | SVG Dashboard Generation (default — run this when asked to visualize) |

📑 TABLE OF CONTENTS

  1. Quick Start - 3-step execution pattern
  2. Critical Workflow Rules - Prerequisites and prohibitions
  3. Execution Workflow - Phases 0-6
  4. Query File Reference - All 23 YAML files
  5. Output Modes - Inline chat vs. Markdown file
  6. Deterministic Rendering Rules - Rules A-G (mandatory for Phase 6)
  7. Domain Reference - SecurityEvent, Syslog, CommonSecurityLog interpretation
  8. Tier Classification - Analytics vs Basic vs Data Lake background
  9. Migration Classification - Zero-rule table categorization for §7a
  10. Reference: Data Lake Migration - DL-eligible tables, decision matrix, trade-off analysis
  11. Reference: License Benefits - DfS P2 / E5 pool calculations
  12. Report Template - JIT pointer → SKILL-report.md
  13. Post-Report Drill-Down Reference - Rule cross-referencing, Custom Detection API, ASIM verification, error handling
  14. SVG Dashboard Generation - Visual dashboard from completed report

Quick Start (TL;DR)

3-step execution pattern:

Step 1:  Run Invoke-IngestionScan.ps1 (Phases 1-5 — data gathering)
Step 2:  Read scratchpad + SKILL-report.md (Phase 6 prep)
Step 3:  Render full report (§1-§8) → create_file

Step 1: Run Data Gathering

# From workspace root — run all phases (default: 30 days):
& ".github/skills/sentinel-ingestion-report/Invoke-IngestionScan.ps1"

# Specify a custom window (1, 7, 30, 60, or 90 days):
& ".github/skills/sentinel-ingestion-report/Invoke-IngestionScan.ps1" -Days 7

# Or run a specific phase (for re-runs / debugging):
& ".github/skills/sentinel-ingestion-report/Invoke-IngestionScan.ps1" -Phase 3

# Synthetic mode — use pre-built test data (no Azure auth required):
& ".github/skills/sentinel-ingestion-report/Invoke-IngestionScan.ps1" -SyntheticDataDir ".github/skills/sentinel-ingestion-report/test-data/enterprise"

Synthetic mode: When the user asks to generate a report using "synthetic data" or "test data", use -SyntheticDataDir pointing to the enterprise test data directory. This bypasses all Azure/Sentinel queries and loads pre-built JSON files instead. Useful for testing report rendering without live workspace access.

Output: Scratchpad file at temp/ingest_scratch_<timestamp>.md (~50 KB, 64 sections).

Timing: Full run (Phase 0 = all phases) takes ~20-25 seconds. Individual phases: 3-8 seconds each.

Step 2: Load Rendering Context

  1. Read the scratchpad file (path printed by PS1 at completion)
  2. Read SKILL-report.md for rendering templates

Step 3: Render Report (Single Write)

Render the complete report (§1-§8) in a single create_file call. Apply SKILL-report.md templates to scratchpad data, following Rules A-G. Render all 8 sections (Executive Summary, Ingestion Overview, Deep Dives, Anomaly Detection, Detection Coverage, License Benefit Analysis, Optimization Recommendations, Appendix) and write to the report file.

⛔ Single-write requirement: The entire report MUST be rendered in one create_file call. Do NOT split rendering across multiple tool calls — splitting causes the LLM to lose template context for later sections (§5-§8), resulting in heading drift, column mutations, and invented content. The complete SKILL-report.md template must be active throughout the entire generation.

🔴 Verbatim table/skeleton blocks — use the deterministic slicer, never hand-copy. The PS1 pre-renders every table, the ASCII cost-waterfall, and the §-heading skeleton under ## PRERENDERED in the scratchpad (Headings, CostWaterfall, DailyChart, TopTables, DetectionPosture, AnomalyTable, CrossReference, SE_Computer, SE_EventID, SyslogHost/Facility/FacSev/Process, CSL_Vendor/Activity, Migration, HealthAlerts, BenefitSummary, DfSP2Detail, E5Tables, QueryTable, Footer). Copy them with the read-only helper instead of transcribing by hand:

python .github/skills/sentinel-ingestion-report/slice_scratch.py --scratch temp/ingest_scratch_<ts>.md --list
python .github/skills/sentinel-ingestion-report/slice_scratch.py --scratch temp/ingest_scratch_<ts>.md --section AnomalyTable

The slicer prefers the ## PRERENDERED copy when a section name also exists as a raw data block, folds the nested ### lines of the Headings skeleton into one block (so --section Headings returns the full §-heading lock list), strips pipeline scaffolding (<!-- … --> comments, SectionTitle: markers), preserves #### sub-headings, and collapses blank runs — so the output drops straight into the report as a valid markdown table or fenced block. Do NOT paste the raw Key | Value | … data blocks (the early raw sections with a <!-- header --> comment and no |---| separator row) — they render as plain text, not tables, and dumping the whole scratchpad tail into one section corrupts the report.


⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

Before starting ANY ingestion report:

  1. Run Invoke-IngestionScan.ps1 — this single script handles ALL data gathering (Phases 1-5). The LLM does NOT run queries, transcribe output, or write scratchpad sections
  2. Read config.json for workspace ID, tenant, subscription, and Azure MCP parameters
  3. ALWAYS ask the user for output mode if not specified: inline chat summary, markdown file report, or both (default: both)
  4. ALWAYS ask the user for timeframe if not specified: supported values are 1, 7, 30 (default), 60, or 90 days. The -Days parameter controls the primary window; deep-dive and comparison windows are derived automatically

Date Window Model

The -Days parameter drives three time windows used across all queries:

| Window | Token | Derivation | Purpose |
|---|---|---|---|
| Primary | {days} | = -Days value | Usage overview (Q1-Q3), alert firing (Q12), license benefits (Q17/Q17b), tier summary (Q10b) |
| Deep-dive | {deepDiveDays} | ≤7→Days, ≤30→7, ≤60→14, ≤90→30 | Table breakdowns (Q4-Q8), rule health (Q11/Q11d), cross-ref (Q13), migration candidates (Q16), WoW "this period" (Q15) |
| Comparison | {wowTotalDays} | = deepDiveDays × 2 | Period-over-period total lookback (Q15) |

Example: -Days 60 → primary=60d, deep-dive=14d, comparison=28d
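The derivation in the table above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `derive_windows` and the returned dictionary shape are assumptions, and the real computation lives in Invoke-IngestionScan.ps1.

```python
SUPPORTED_DAYS = (1, 7, 30, 60, 90)  # values the skill documents for -Days


def derive_windows(days: int) -> dict:
    """Derive the three window tokens from the -Days value (hypothetical helper)."""
    if days not in SUPPORTED_DAYS:
        raise ValueError(f"-Days must be one of {SUPPORTED_DAYS}, got {days}")
    # Deep-dive window: <=7 -> Days, <=30 -> 7, <=60 -> 14, <=90 -> 30
    if days <= 7:
        deep_dive = days
    elif days <= 30:
        deep_dive = 7
    elif days <= 60:
        deep_dive = 14
    else:
        deep_dive = 30
    # Comparison window covers "this period" plus "last period"
    return {"days": days, "deepDiveDays": deep_dive, "wowTotalDays": deep_dive * 2}


windows = derive_windows(60)
print(windows)  # {'days': 60, 'deepDiveDays': 14, 'wowTotalDays': 28}
```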

Dynamic period labels: Report column headers adapt automatically ("This Week"/"Last Week" for 7d deep-dive, "This Month"/"Last Month" for 30d, "This Period"/"Last Period" for 14d).

Exception: Q14 (24h anomaly detection) is unaffected by -Days — it uses fixed algorithmic constants (P30D lookback, 29-day weekday baseline).

  5. ALWAYS use create_file for markdown reports (NEVER use PowerShell terminal commands)
  6. ALWAYS sanitize PII from saved reports — use generic placeholders for real hostnames, workspace names, and tenant GUIDs in committed files
  7. Read scratchpad + SKILL-report.md before rendering — the scratchpad is the sole data source for the report
  8. Tier display convention — Azure CLI reports Data Lake tier tables as plan Auxiliary internally, but always refer to this tier as "Data Lake" in output — never use "Auxiliary"

Prerequisites

| Dependency | Required By | Setup |
|---|---|---|
| Azure CLI (az) | All KQL queries (az monitor log-analytics query), analytic rule inventory (az rest), tier classification (az monitor log-analytics workspace table list) | Install: aka.ms/installazurecli. Authenticate: az login --tenant <tenant_id> then az account set --subscription <subscription_id> |
| log-analytics extension | az monitor log-analytics query (all KQL queries in Phases 1-5) | Install: az extension add --name log-analytics. Verify: az extension list --query "[?name=='log-analytics']" |
| Azure RBAC | Azure CLI calls above | Log Analytics Reader on the workspace (KQL queries + table list). Microsoft Sentinel Reader on the workspace (analytic rule inventory via az rest) |
| Microsoft.Graph PowerShell | Q9b (Custom Detection rules via Invoke-MgGraphRequest) | Install-Module Microsoft.Graph.Authentication -Scope CurrentUser. Required Graph scope: CustomDetection.Read.All (interactive consent on first run). PS1 skips gracefully if module not installed or auth fails |
| PowerShell 7.0+ | Parallel query execution | ForEach-Object -Parallel requires PS7+ |

🔴 PROHIBITED

  • ❌ Running KQL queries via MCP tools during data gathering — PS1 handles all queries
  • ❌ Writing or modifying scratchpad sections manually — PS1 is the sole writer
  • ❌ Reporting cost in dollar amounts — always use GB savings (e.g., "~78.7 GB/month savings")
  • ❌ Fabricating ingestion volumes, device names, or anomaly percentages
  • ❌ Overriding DL eligibility classification from PS1 output based on LLM knowledge
  • ❌ Rendering the report without first reading the scratchpad file

Execution Workflow

Phase 0: Initialization

  1. Read config.json for sentinel_workspace_id, subscription_id, Azure MCP parameters
  2. Confirm output mode and timeframe with user (pass -Days to PS1; default 30)
  3. Verify prerequisites: az login session active, correct subscription set

Phases 1-5: Data Gathering (automated by PS1)

Run Invoke-IngestionScan.ps1 — it handles all 5 phases automatically:

| Phase | Queries | Description | Execution Type |
|---|---|---|---|
| 1 | Q1, Q2, Q3 | Core ingestion overview — Usage by DataType, daily trend, workspace summary | KQL (parallel) |
| 2 | Q4, Q5, Q6a, Q6b, Q6c, Q7, Q8 | Table deep dives — SecurityEvent, Syslog, CommonSecurityLog breakdowns | KQL (parallel) |
| 3 | Q9, Q9b, Q10, Q10b | External data — analytic rule inventory (REST), custom detections (Graph), tier classification (CLI), tier summary (KQL) | REST + Graph + CLI + KQL (sequential, with depends_on) |
| 4 | Q11, Q11d, Q12, Q13 | Detection coverage — rule health (SentinelHealth), alert firing (SecurityAlert), cross-reference (all tables with data vs. rule inventory) | KQL (parallel) + post-processing |
| 5 | Q14, Q15, Q16, Q17, Q17b | Anomaly detection + cost analysis — 24h anomaly, WoW comparison, migration candidates, license benefits, E5 per-table | KQL (parallel) + post-processing |

Post-processing (automated by PS1, Phases 4-5):

| Task | Phase | Description |
|---|---|---|
| Table cross-reference | 4 | For each table with data (Q13), regex-search all enabled rule queries for that table name |
| ASIM parser detection | 4 | Search all rule queries for ASIM function patterns (_Im_, _ASim_, imDns, etc.) |
| Value-level rule verification | 4 | For each EventID/Facility/ProcessName/Activity/Vendor from deep dives, check if any rules reference it |
| Detection gap detection | 4 | Identify tables on DL/Basic tier that have enabled rules (🔴 critical finding) |
| Anomaly severity classification | 5 | Apply Rule A thresholds to Q14/Q15 results |
| DL eligibility classification | 5 | Classify all tables using hardcoded $dlYes/$dlNo reference arrays |
| Migration table assembly | 5 | Cross-reference volume × rule count × tier × DL eligibility → category assignment |
| License benefit computation | 5 | Compute DfS P2 pool, E5 grant breakdown |
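To make the table cross-reference and detection gap tasks concrete, here is a minimal Python sketch of the idea. It is not the PS1 implementation: the function names, the rule dictionary shape, the sample rules, and the word-boundary regex are all assumptions chosen for illustration.

```python
import re


def count_rule_references(tables, rules):
    """Count enabled rules whose query text mentions each table name."""
    counts = {}
    for table in tables:
        # Word boundaries avoid matching "Syslog" inside a longer identifier
        pattern = re.compile(r"\b" + re.escape(table) + r"\b")
        counts[table] = sum(
            1 for rule in rules if rule["enabled"] and pattern.search(rule["query"])
        )
    return counts


def detection_gaps(counts, tiers):
    """Tables that have enabled rules but sit on the Data Lake or Basic tier."""
    return sorted(
        table for table, n in counts.items()
        if n > 0 and tiers.get(table) in ("Data Lake", "Basic")
    )


# Hypothetical sample data
rules = [
    {"name": "Failed logons", "enabled": True,
     "query": "SecurityEvent | where EventID == 4625"},
    {"name": "Old syslog rule", "enabled": False, "query": "Syslog | take 1"},
    {"name": "App errors", "enabled": True,
     "query": "MyApp_CL | where Level == 'Error'"},
]
tiers = {"SecurityEvent": "Analytics", "Syslog": "Analytics", "MyApp_CL": "Data Lake"}

counts = count_rule_references(tiers.keys(), rules)
print(counts)
print(detection_gaps(counts, tiers))
```

A plain text search like this also matches table names that appear in comments or string literals, so treat a hit as a lead to verify rather than proof of a dependency.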

Scratchpad output: PS1 writes all results to temp/ingest_scratch_<timestamp>.md (~50 KB, ~64 named sections including PHASE_*, PRERENDERED, and META blocks). See SKILL-report.md for the Section-to-Scratchpad Mapping.

Phase 6: Render Output (LLM)

🔴 MANDATORY — Load scratchpad + report template before rendering:

  1. Read the scratchpad file (path printed by PS1). This single file contains ALL data from Phases 1-5.
  2. Read SKILL-report.md for the complete rendering templates and formatting rules.

Pre-render validation:

  1. Verify scratchpad has all 5 phase sections (PHASE_1 through PHASE_5)
  2. Check that PHASE_5.DL_Script_Output is populated (proof of DL classification execution)
  3. Cross-validate: Q11 TotalRulesInHealth against Q9 AR_Enabled — if >10% gap, note it

Render — Section-by-Section Checklist:

Render the report section by section per SKILL-report.md templates. Do NOT skip any section. If a section's data returned 0 results, render the section header with a "✅ No anomalies/items found" note.

| Section | Data Source (scratchpad keys) | Required |
|---|---|---|
| §1 | All phases | ✅ Workspace at a Glance, Cost Waterfall, Detection Posture, Top 3 |
| §2 | PHASE_1.Tables, PHASE_3.TierSummary | ✅ Table breakdown + tier summary |
| §3 | PRERENDERED.SE_*, PRERENDERED.Syslog*, PRERENDERED.CSL_* | ✅ Deep dives (skip sub-section only if table not in top 20) |
| §4 | PHASE_5.Anomaly24h/AnomalyWoW, PHASE_1.DailyTrend | ✅ Anomaly table + daily trend chart |
| §5 | PHASE_3.RuleInventory, PHASE_4.* | ✅ Rule inventory + cross-ref + health |
| §6 | PHASE_5.LicenseBenefits/E5_Tables | ✅ DfS P2 + E5 analysis |
| §7 | PHASE_5.Migration, PHASE_4.CrossRef | ✅ Migration candidates + recommendations |
| §8 | All | ✅ Appendix (query reference, methodology) |

Compute Top 3 Recommendations using Rule E: scan all scratchpad sections, score each candidate, select the top 3 by score.

Post-render:

  • Render inline chat executive summary (if requested)
  • Confirm markdown file path to user

Query File Reference

All queries are defined as YAML files in queries/phase1-5/. PS1 discovers, parses, and executes them automatically.

YAML Format

id: ingestion-q1                              # Unique identifier
name: Usage by DataType with Billing Breakdown # Human-readable name
description: Top 20 tables ranked by volume    # What it does
phase: 1                                       # Which phase (1-5)
type: kql                                      # kql | rest | cli | graph
timespan: P{days}D                             # Placeholder — PS1 substitutes at runtime
query: |                                       # KQL query (multiline block scalar)
  Usage
  | where TimeGenerated > ago({days}d)
  ...
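The runtime substitution of the `{days}` placeholder shown above can be illustrated with a short Python sketch. The actual substitution is performed by the PowerShell script; the helper name `substitute_tokens` and the in-memory dictionary standing in for a parsed YAML file are assumptions made for this example.

```python
def substitute_tokens(text: str, tokens: dict) -> str:
    """Replace each {name} placeholder with its value (hypothetical helper)."""
    for name, value in tokens.items():
        text = text.replace("{" + name + "}", str(value))
    return text


# Stand-in for a parsed query file; only the fields with placeholders are shown
query_def = {
    "timespan": "P{days}D",
    "query": "Usage\n| where TimeGenerated > ago({days}d)",
}
tokens = {"days": 30, "deepDiveDays": 7, "wowTotalDays": 14}

resolved = {key: substitute_tokens(value, tokens) for key, value in query_def.items()}
print(resolved["timespan"])  # P30D
print(resolved["query"])
```

Literal replacement is used here instead of `str.format` because KQL text can contain its own curly braces (for example in dynamic literals), which `format` would try to interpret as fields.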

Non-KQL types have additional fields:

| Type | Additional Fields | Description |
|---|---|---|
| rest | url, method, jmespath | Sentinel REST API via az rest |
| cli | command | Azure CLI command (e.g., az monitor log-analytics workspace table list) |
| graph | uri, method | Microsoft Graph API via Invoke-MgGraphRequest |

Complete Query Inventory

| Phase | File | ID | Type | Description |
|---|---|---|---|---|
| 1 | Q1-UsageByDataType.yaml | ingestion-q1 | kql | Top 20 tables by billable volume with solution mapping |
| 1 | Q2-DailyIngestionTrend.yaml | ingestion-q2 | kql | Daily ingestion trend |
| 1 | Q3-WorkspaceSummary.yaml | ingestion-q3 | kql | Executive summary: table count, billable totals, daily average |
| 2 | Q4-SecurityEventByComputer.yaml | ingestion-q4 | kql | SecurityEvent by Computer (top 25) |
| 2 | Q5-SecurityEventByEventID.yaml | ingestion-q5 | kql | SecurityEvent by EventID (top 20) |
| 2 | Q6a-SyslogByHost.yaml | ingestion-q6a | kql | Syslog by source host (top 25) |
| 2 | Q6b-SyslogByFacilitySeverity.yaml | ingestion-q6b | kql | Syslog by Facility × SeverityLevel (top 30) |
| 2 | Q6c-SyslogByProcess.yaml | ingestion-q6c | kql | Syslog top ProcessName by Facility (top 30) |
| 2 | Q7-CSLByVendor.yaml | ingestion-q7 | kql | CommonSecurityLog by DeviceVendor/DeviceProduct (top 20) |
| 2 | Q8-CSLByActivity.yaml | ingestion-q8 | kql | CommonSecurityLog by Activity/LogSeverity/DeviceAction (top 30) |
| 3 | Q9-AnalyticRuleInventory.yaml | ingestion-q9 | rest | Analytic rules (Scheduled + NRT) via Sentinel REST API |
| 3 | Q9b-CustomDetectionRules.yaml | ingestion-q9b | graph | Custom Detection rules via Microsoft Graph SDK |
| 3 | Q10-TableTierClassification.yaml | ingestion-q10 | cli | Table tier classification via Azure CLI |
| 3 | Q10b-TierSummary.yaml | ingestion-q10b | kql | Per-tier volume summary (depends_on: Q10) |
| 4 | Q11-RuleHealthSummary.yaml | ingestion-q11 | kql | SentinelHealth — rule execution health summary |
| 4 | Q11d-FailingRuleDetail.yaml | ingestion-q11d | kql | SentinelHealth — top 20 failing rules detail |
| 4 | Q12-SecurityAlertFiring.yaml | ingestion-q12 | kql | SecurityAlert — top 30 alert-producing rules |
| 4 | Q13-AllTablesWithData.yaml | ingestion-q13 | kql | All tables with data in deep-dive window (for cross-reference) |
| 5 | Q14-IngestionAnomaly24h.yaml | ingestion-q14 | kql | 24h vs same-weekday avg anomaly detection (29d lookback, fallback to flat 7d, >50%, ≥0.01 GB) |
| 5 | Q15-WeekOverWeek.yaml | ingestion-q15 | kql | Period-over-period volume comparison |
| 5 | Q16-MigrationCandidates.yaml | ingestion-q16 | kql | Billable tables with deep-dive volume (for migration analysis) |
| 5 | Q17-LicenseBenefitAnalysis.yaml | ingestion-q17 | kql | DfS P2 + E5 daily ingestion breakdown |
| 5 | Q17b-E5PerTableBreakdown.yaml | ingestion-q17b | kql | E5-eligible per-table volume |

Output Modes

Mode 1: Inline Chat Summary (default for quick requests)

Compact executive summary rendered directly in chat.

Mode 2: Markdown File Report

Full detailed report saved to reports/sentinel/sentinel_ingestion_report_<YYYYMMDD_HHMMSS>.md.

Mode 3: Both (default when user says "report" or "generate report")

Inline chat executive summary + full markdown file.

Ask user if not specified:

"How would you like the report? I can provide:

  1. Inline chat summary — executive overview in chat
  2. Markdown file — detailed report saved to reports/sentinel/
  3. Both (recommended) — summary in chat + full report file"

Deterministic Rendering Rules

These rules eliminate LLM interpretation variance. Apply them EXACTLY during report rendering (Phase 6). No discretion allowed — the thresholds and formulas below are the sole authority.

Rule A: Anomaly Severity Classification

⚙️ Pre-computed by PS1 → PRERENDERED.AnomalyTable. Thresholds below retained for §8 methodology reference and manual verification.

Assign severity to each anomaly row deterministically based on absolute deviation AND volume.

| Condition (both must be true) | Severity | Emoji |
|---|---|---|
| abs(Deviation%) ≥ 200 AND max(Last24hGB, Avg7dGB) ≥ 0.05 GB | High | 🟠 |
| abs(Deviation%) ≥ 100 AND max(Last24hGB, Avg7dGB) ≥ 0.01 GB | Medium | 🟡 |
| abs(Deviation%) ≥ 50 AND max(Last24hGB, Avg7dGB) ≥ 0.01 GB | Low | |
| Below thresholds OR both periods < 0.01 GB volume | Excluded | |

Volume floor: The 0.01 GB minimum is enforced by the KQL queries. Tables below this floor are noise and MUST NOT appear in the anomaly table regardless of deviation percentage.

Override 1 — Rule-count: ANY table with ≥5 enabled rules AND an absolute change ≥40% (in either 24h or WoW) is automatically 🟠 regardless of base thresholds — a significant drop on a table feeding multiple rules signals potential connector or TI feed health issues that affect detection coverage. The 24h override catches same-day connector outages; the WoW override catches gradual multi-day degradation.

Override 2 — Near-zero: ANY table with deviation ≤ −95% AND max(volume) ≥ 0.05 GB is automatically 🟠 regardless of rule count — a near-complete signal loss on a significant table is an operational emergency (e.g., connector failure, API key expiry) even if no rules reference it directly.

⛔ PROHIBITED: Assigning severity based on "judgment", "context", or "this table is important" UNLESS one of the two overrides above (rule-count or near-zero) applies. Outside those specific overrides, the emoji MUST match the threshold table above; no discretionary overrides.
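The threshold table and the two overrides can be expressed as a single function, which is handy for manual verification of the pre-computed values. This is a minimal Python sketch, not the PS1 code: the function and parameter names are hypothetical, and the order of checks (volume floor first, then overrides, then base thresholds) is an assumption based on the statement that tables below the floor must never appear.

```python
def classify_anomaly(deviation_pct, last_gb, baseline_gb, enabled_rules=0):
    """Return 'High', 'Medium', 'Low', or None when the row is excluded."""
    volume = max(last_gb, baseline_gb)
    magnitude = abs(deviation_pct)

    # Volume floor: below 0.01 GB in both periods is noise (assumed to win over overrides)
    if volume < 0.01:
        return None
    # Override 1: table feeds >=5 enabled rules and changed by >=40%
    if enabled_rules >= 5 and magnitude >= 40:
        return "High"
    # Override 2: near-complete signal loss on a significant table
    if deviation_pct <= -95 and volume >= 0.05:
        return "High"
    # Base thresholds from the table
    if magnitude >= 200 and volume >= 0.05:
        return "High"
    if magnitude >= 100:
        return "Medium"
    if magnitude >= 50:
        return "Low"
    return None


print(classify_anomaly(250, 0.20, 0.05))                  # High
print(classify_anomaly(250, 0.03, 0.01))                  # Medium (volume under 0.05 GB)
print(classify_anomaly(-45, 2.0, 3.6, enabled_rules=6))   # High via rule-count override
print(classify_anomaly(500, 0.005, 0.001))                # None (below volume floor)
```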

Rule B: Risk Rating Definition

In the Top 3 Recommendations table (§1), the "Risk" column means:

Risk = the security or operational impact of NOT acting on this recommendation.

| Risk Level | Definition | Examples |
|---|---|---|
| High | Active detection gap or data loss if not addressed | Rules silently failing on DL tier; connector dropping data; 0% detection coverage on critical table |
| Medium | Missed optimization with measurable cost/posture impact | Zero-rule high-volume table on Analytics tier; noisy EventID with no detection value |
| Low | Minor improvement, no immediate security or cost impact | Small-volume table tier change; informational tuning |

⛔ PROHIBITED: Interpreting "Risk" as implementation difficulty, effort, or change management complexity. Those concerns belong in prose recommendations (§7b-d), NOT the Risk column.

Rule C: Weekday Average Computation

⚙️ Pre-computed by PS1 → PRERENDERED.DailyChart. Logic below retained for §8 methodology reference.

When computing per-weekday averages for the §4b daily trend chart:

  1. Exclude the report-generation day: If the last day in PHASE_1.DailyTrend matches META.Generated date, always exclude it from weekday averages — the report was generated mid-day so this is a partial day regardless of its volume. This prevents the partial day from non-deterministically dragging down whichever weekday it falls on.
  2. Exclude ingestion gaps: Any remaining day with total ingestion < 0.1 GB is also excluded. These are ingestion reporting gaps, not representative of normal patterns.
  3. Formula: Weekday Avg = sum(GB for that weekday, excluding days per rules 1–2) / count(qualifying days for that weekday)
  4. Round to 2 decimal places.

⛔ PROHIBITED: Including the report-generation partial day or days with < 0.1 GB in averages — they drag down specific weekdays non-deterministically.
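The four steps of Rule C translate directly into code. The following Python sketch is for methodology illustration only: the function name, the `{date: GB}` input shape, and the sample figures are assumptions, since the real values come from PHASE_1.DailyTrend and META.Generated.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def weekday_averages(daily_gb, generated):
    """Average GB per weekday, skipping the partial report day and ingestion gaps."""
    if not daily_gb:
        return {}
    last_day = max(daily_gb)
    buckets = defaultdict(list)
    for day, gb in daily_gb.items():
        if day == last_day and day == generated:
            continue  # step 1: report-generation day is always a partial day
        if gb < 0.1:
            continue  # step 2: ingestion reporting gap
        buckets[WEEKDAYS[day.weekday()]].append(gb)
    # steps 3-4: mean over qualifying days, rounded to 2 decimals
    return {wd: round(sum(vals) / len(vals), 2) for wd, vals in buckets.items()}


# Hypothetical daily trend; 2024-01-01 is a Monday
trend = {
    date(2024, 1, 1): 10.0,
    date(2024, 1, 2): 0.05,   # gap, excluded
    date(2024, 1, 8): 12.0,
    date(2024, 1, 9): 8.0,
    date(2024, 1, 15): 3.0,   # report day, excluded
}
print(weekday_averages(trend, generated=date(2024, 1, 15)))
# {'Monday': 11.0, 'Tuesday': 8.0}
```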

Rule D: Cross-Validation Denominator

In §5b cross-validation (Q11 vs Q9), always use AR-only enabled count from Q9 as the denominator:

Gap% = (Q9_AR_Enabled - Q11_DistinctRules) / Q9_AR_Enabled × 100

Do NOT use Combined_Enabled (AR+CD) as the denominator. SentinelHealth only tracks AR executions (Scheduled + NRT), not Custom Detection executions. Comparing Q11 against combined AR+CD inflates the gap percentage.
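As a worked example of the formula, the sketch below computes the gap with the AR-only denominator and applies the 10% note threshold from the pre-render validation step. The function name and the sample counts are hypothetical.

```python
def cross_validation_gap_pct(q9_ar_enabled, q11_distinct_rules):
    """Gap% between enabled analytic rules (Q9) and rules seen in SentinelHealth (Q11)."""
    if q9_ar_enabled <= 0:
        return None  # no enabled analytic rules, so the gap is undefined
    return (q9_ar_enabled - q11_distinct_rules) * 100 / q9_ar_enabled


# Hypothetical counts: 200 enabled AR rules, 170 of them present in SentinelHealth
gap = cross_validation_gap_pct(200, 170)
print(gap)        # 15.0
print(gap > 10)   # True, so the gap would be noted in the report
```

Using a combined AR+CD count of, say, 260 in the denominator would report a gap of roughly 34.6% for the same data, which is the inflation the rule warns about.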

Rule E: Top 3 Recommendation Ranking

Rank ALL candidate recommendations using this scoring formula. The top 3 by score become the Top 3 in §1. Computed by the LLM at render time by cross-referencing all scratchpad sections.

| Category | SeverityWeight | ImpactValue | Scratchpad Source |
|---|---|---|---|
| 🔴 Detection gap (rules on wrong tier) | 10 | Number of affected rules | PHASE_4.DetectionGaps — PS1 emits Detection gap (XDR) or Detection gap (non-XDR) in §7a Category column |
| 🔴 Data loss / connector failure | 10 | Affected volume in GB/day | PHASE_5.Anomaly24h (large negative deviations) |
| 🟠 DL-eligible migration (zero rules) | 5 | BillableGB from §2a (or deep-dive GB / deepDiveDays × Days if only in §7a) | PHASE_5.Migration (Strong DL-eligible rows) |
| 🟠 DL + KQL Job promotion | 4 | BillableGB (primary window) | High-volume 🟣/🟢 table — can complement split ingestion or stand alone; present both options and note they are combinable |
| 🟠 License benefit activation | 4 | Eligible unclaimed GB/day | PRERENDERED.BenefitSummary + PRERENDERED.E5Tables + PRERENDERED.DfSP2Detail (volume eligible but benefit not yet activated) |
| 🟠 DCR filter / EventID pruning | 4 | Estimated saveable GB (deep dive % × table BillableGB) | PHASE_2.SE_EventID + PHASE_4.ValueRef_EventID |
| 🟠 Health fix (failing rules) | 4 | Number of failing rules | PHASE_4.FailingRules |
| 🟡 Volume spike / cost anomaly | 3 | Spike GB on zero-rule tables | PHASE_5.Anomaly24h (large positive deviations on zero-rule tables — cost spike with no detection value) |
| 🟡 Duplicate ingestion | 3 | Duplicate GB | Cross-ref PRERENDERED.SyslogFacility × PRERENDERED.CSL_Vendor (same-appliance overlap emitting both Syslog and CEF/ASA = double billing) |
| 🟡 Split ingestion | 3 | BillableGB × estimated non-detection fraction | PHASE_2 deep dives + PHASE_4.ValueRef_* (zero-rule values) |
| 🟡 Tier review / unknown eligibility | 2 | BillableGB | PHASE_5.Migration (Unknown rows) |

Score = SeverityWeight × ImpactValue

Sorting: severity-first, then score. 🔴 items always rank above 🟠 items, which always rank above 🟡 items, regardless of score. Within the same severity tier, rank by descending score. This ensures detection gaps and data loss signals are never buried below cost optimizations.

Tie-breaking within same severity: higher score wins. If scores are equal, higher SeverityWeight wins. If still tied, higher ImpactValue wins.

⛔ PROHIBITED: Selecting Top 3 recommendations based on narrative variety, "one from each category", or subjective importance. The formula determines ranking — the LLM renders, it does not curate.
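The scoring and severity-first sort can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the `Candidate` class, the field names, and the sample candidates with their impact values are hypothetical, and only the weights, the Score formula, and the sort order come from Rule E.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"🔴": 0, "🟠": 1, "🟡": 2}  # lower sorts first


@dataclass
class Candidate:
    title: str
    severity: str          # emoji tier from the category table
    severity_weight: int
    impact_value: float

    @property
    def score(self) -> float:
        return self.severity_weight * self.impact_value


def top_recommendations(candidates, limit=3):
    """Severity tier first, then score, then weight, then impact (all descending)."""
    ranked = sorted(
        candidates,
        key=lambda c: (SEVERITY_ORDER[c.severity], -c.score,
                       -c.severity_weight, -c.impact_value),
    )
    return ranked[:limit]


# Hypothetical candidates
candidates = [
    Candidate("Tier review: CustomApp_CL", "🟡", 2, 500.0),        # score 1000
    Candidate("Health fix: 3 failing rules", "🟠", 4, 3),           # score 12
    Candidate("Detection gap: 2 rules on DL tier", "🔴", 10, 2),    # score 20
    Candidate("DL-eligible migration: 40 GB", "🟠", 5, 40.0),       # score 200
]
for c in top_recommendations(candidates):
    print(c.severity, c.title, c.score)
```

In this sample the 🟡 item has the highest raw score but is left out of the Top 3, because every 🔴 and 🟠 candidate ranks ahead of it.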

License benefit activation: Surfaces when PRERENDERED.BenefitSummary or PRERENDERED.E5Tables show E5-eligible or DfS-P2-eligible volume that is not yet being claimed (benefit shows 0 or is absent while eligible tables are ingesting). ImpactValue = the eligible GB/day that could be offset.

Volume spike / cost anomaly: Surfaces when PHASE_5.Anomaly24h shows a large positive deviation (>50% above baseline) on a table with zero detection rules (per PHASE_4.CrossRef). A spiking table with no rules has cost impact but no detection value — a strong signal to investigate and potentially filter or move to DL.

Duplicate ingestion: Surfaces when the same network appliance sends data via both Syslog and CommonSecurityLog (CEF/ASA). Compare appliance names/IPs in PRERENDERED.SyslogFacility/PRERENDERED.SyslogHost against PRERENDERED.CSL_Vendor — overlapping sources indicate double billing for the same data. ImpactValue = the smaller of the two streams (the duplicate portion).


Domain Reference

This section provides the domain knowledge needed during Phase 6 rendering. When writing deep dive sections (§3), anomaly analysis (§4), and recommendations (§7), consult these reference tables for interpretation guidance.

SecurityEvent — EventID Optimization

Which EventIDs generate the most volume and their detection vs. cost tradeoff:

| EventID | Description | Optimization Potential |
|---|---|---|
| 4663 | Object access (file auditing) | 🔴 High — often excessive. Consider DCR drop filter or scoping SACL |
| 4624 | Successful logon | 🟡 Medium — valuable for hunting/forensics but rarely in analytic rules. Strong split ingestion candidate: send to Data Lake for retention, keep off Analytics tier |
| 4688 | Process creation | 🟡 Medium — consider moving to MDE DeviceProcessEvents. If no rules reference it, split to Data Lake |
| 4799 | Security group membership enumeration | 🟡 Medium — often noisy on domain controllers |
| 4672 | Special privileges assigned | 🟡 Medium — high volume on DCs |
| 4625 | Failed logon | 🟢 Low — usually valuable for security detection |

🟣 Split ingestion tip: For any deep-dive table classified as 🟢 Keep Analytics (active detection rules), individual high-volume values with zero rule references (verified via Phase 4 value-level check) are strong candidates for sub-table split ingestion. Route those values to Data Lake via DCR transformation — they remain available for hunting while the detection-relevant values stay on Analytics tier. KQL jobs can also run against this split-routed DL data to surface aggregated insights back to Analytics if needed.

Syslog — Facility Reference

Optimization potential by Syslog facility:

| Facility | Description | Optimization Potential |
|---|---|---|
| auth | Authentication events (login, su, getty) | 🟢 Low — always security-relevant. Keep in Analytics tier |
| authpriv | Private authentication (PAM, sudo, sshd) | 🟢 Low — critical for security detection. Always keep |
| kern | Kernel messages (hardware, driver, critical system) | 🟡 Medium — security-relevant but can be noisy. Consider Error+ only for high-volume servers |
| cron | Scheduled task notifications | 🔴 High — rarely security-relevant at Info/Notice. Keep Warning+ only |
| daemon | System daemon messages (systemd, sshd, named, httpd) | 🔴 High — typically largest Syslog contributor (50-80% of volume). Contains both security-critical processes (sshd) and noisy infrastructure (systemd). Drill down with Q6c to identify filterable processes |
| syslog | Internal syslog daemon messages | 🟡 Medium — mostly operational. Keep Warning+ in Analytics |
| user | User-space application messages | 🟡 Medium — varies by application. Check ProcessName |
| mail | Mail subsystem (postfix, sendmail, dovecot) | 🟡 Medium — relevant if mail is in scope; otherwise DL candidate |
| local0–local7 | Custom application logs | 🔴 High — most common cost optimization targets. Custom apps often log at Debug/Info verbosity |
| ftp | FTP daemon messages | 🟢 Low volume; keep for auditing if FTP in use |
| lpr | Print subsystem | 🔴 High — almost never security-relevant. Set to None in DCR |
| news | Network news (NNTP) | 🔴 High — almost never security-relevant. Set to None in DCR |
| uucp | UUCP subsystem | 🔴 High — almost never security-relevant. Set to None in DCR |
| mark | Internal timestamp marker | 🔴 High — operational only. Set to None in DCR |

Syslog — DCR Severity-per-Facility Recommendations

The Data Collection Rule allows setting a minimum severity level per facility — the single most impactful cost control for Syslog:

| Facility | Recommended Minimum | Rationale |
|---|---|---|
| auth, authpriv | Debug (collect all) | Security-critical — never filter |
| kern | Notice | Kernel module loads (T1547.006) and promiscuous mode (T1040) are kern.notice. Volume impact is minimal |
| daemon | Warning or Error | Major volume reduction. Note: sshd auth events go to auth/authpriv, not daemon. Trade-off: loses systemd service stop events at Info (security service tampering) — acceptable if EDR covers this |
| cron | Warning | Trade-off: cron job execution events are cron.info (T1053.003 persistence). Acceptable if auditd or MDE covers cron file monitoring |
| syslog | Warning | Internal operational messages are low-value at Info |
| user | Warning | Unless specific apps produce security telemetry |
| mail | Warning | Info-level mail relay logs are very verbose |
| local0–local7 | Assess per-app | No safe default — network appliances, security tools, and databases commonly use local facilities. Review Q6c (Process by Facility) before setting severity filters |
| lpr, news, uucp, mark | None | Disable collection entirely |

Syslog — SeverityLevel Values

SeverityLevel (string) Numeric Meaning Retention Priority
emerg 0 System unusable 🔴 Always keep
alert 1 Immediate action required 🔴 Always keep
crit 2 Critical condition 🔴 Always keep
err 3 Error condition 🟡 Keep for most facilities
warning 4 Warning condition 🟡 Keep for security-relevant facilities
notice 5 Normal but significant 🟡 Keep for auth/authpriv and kern
info 6 Informational 🟢 Filter for high-volume facilities
debug 7 Debug-level detail 🟢 Filter everywhere except auth/authpriv
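
The two tables above can be combined into a simple collection check. The following is a minimal Python sketch, not DCR syntax: the names `SEVERITY`, `MIN_SEVERITY`, and `would_collect` are hypothetical, `daemon` is assumed to use the less aggressive "Warning" option, and `local0`-`local7` are left out because the guidance is to assess them per application.

```python
from typing import Optional

# Numeric severities from the SeverityLevel table (lower number = more severe).
SEVERITY = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
            "warning": 4, "notice": 5, "info": 6, "debug": 7}

# Recommended minimum severity per facility, copied from the table above.
# None means "do not collect this facility at all".
# Assumption: daemon uses "warning" (the table allows Warning or Error).
MIN_SEVERITY = {
    "auth": "debug", "authpriv": "debug",
    "kern": "notice",
    "daemon": "warning",
    "cron": "warning", "syslog": "warning", "user": "warning", "mail": "warning",
    "lpr": None, "news": None, "uucp": None, "mark": None,
}


def would_collect(facility: str, severity: str) -> Optional[bool]:
    """Return True/False per the recommended minimums.

    Returns None when the facility has no default (for example local0-local7)
    and needs a per-application assessment instead.
    """
    if facility not in MIN_SEVERITY:
        return None
    minimum = MIN_SEVERITY[facility]
    if minimum is None:
        return False  # facility disabled entirely
    # A message is kept when it is at least as severe as the minimum.
    return SEVERITY[severity] <= SEVERITY[minimum]
```

For example, `would_collect("daemon", "info")` is `False` under these assumptions, while `would_collect("kern", "notice")` is `True`.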

Syslog — ProcessName Security Relevance

ProcessName Typical Facility Security Relevance Optimization
systemd daemon 🟡 Low-Medium — unit start/stop events 🔴 Often 30-50% of daemon volume. Filter Info/Notice at DCR
systemd-logind daemon 🟡 Medium — session/seat tracking Keep Warning+
sshd auth, authpriv, daemon 🟢 High — SSH login detection (brute force, lateral movement) 🟢 Always keep
sudo authpriv 🟢 High — privilege escalation tracking 🟢 Always keep
su auth, authpriv 🟢 High — user switching 🟢 Always keep
CRON / crond cron 🟡 Low-Medium — scheduled tasks Keep Warning+ unless monitoring for T1053
named / bind daemon 🟡 Medium — DNS. Relevant for DNS tunneling Keep if DNS rules exist; otherwise Warning+
httpd / nginx daemon 🟡 Medium — web server logs Assess overlap with WAF/CSL data
postfix / sendmail mail 🟡 Low-Medium — mail relay Keep Warning+
dhclient / NetworkManager daemon 🟡 Low — DHCP/network changes Filter Info/Notice
kernel kern 🟢 Medium-High — kernel events, module loads Keep Warning+
auditd daemon, user 🟢 High — Linux Audit Framework 🟢 Always keep
polkitd authpriv 🟡 Medium — PolicyKit authorization Keep Warning+
dbus-daemon daemon 🟡 Low — IPC. Rarely security-relevant Filter all or keep Error+
rsyslogd / syslog-ng syslog 🟡 Low — internal syslog ops Keep Warning+

🟣 Split ingestion tip: If daemon facility accounts for >50% of Syslog and Q6c reveals systemd + systemd-logind + dbus-daemon dominate, consider a DCR transformation routing those processes to Data Lake while keeping sshd, auditd, and other security-critical processes in Analytics. KQL jobs can complement this by querying the DL-routed portion on schedule.

Syslog — Log Forwarding Architecture Note

In environments using centralized rsyslog/syslog-ng forwarders:

  • Computer = the log forwarder hostname (many servers collapse to 1-2 forwarders)
  • HostName = the actual originating device (from syslog header)
  • HostIP = the originating device's IP address

The Q6a query uses SourceHost = iff(isnotempty(HostName) and HostName != Computer, HostName, Computer) to prefer the original source. If Q6a shows only 1-2 hosts despite expecting 100+ servers, the environment uses forwarding.
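
The `SourceHost` expression can be mirrored outside KQL to sanity-check exported rows. This is a small Python sketch of the same preference rule; the function names and the row shape (dicts with `Computer` and `HostName` keys) are assumptions made for illustration.

```python
from typing import Iterable, Optional


def source_host(computer: str, host_name: Optional[str]) -> str:
    """Prefer HostName when it is non-empty and differs from Computer."""
    if host_name and host_name != computer:
        return host_name
    return computer


def distinct_hosts(rows: Iterable[dict]) -> dict:
    """Count distinct Computer values versus distinct resolved source hosts."""
    computers, sources = set(), set()
    for row in rows:
        computers.add(row["Computer"])
        sources.add(source_host(row["Computer"], row.get("HostName")))
    return {"computers": len(computers), "source_hosts": len(sources)}
```

A large gap between the two counts (few `Computer` values, many source hosts) is consistent with a forwarder-based architecture.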

CommonSecurityLog — Vendor Reference

DeviceVendor DeviceProduct Optimization Potential
Palo Alto Networks PAN-OS 🔴 High — filter TRAFFIC activity, keep THREAT in Analytics
Check Point Firewall / VPN-1 & FireWall-1 🔴 High — filter routine Accept actions
Fortinet Fortigate 🔴 High — filter traffic subtype, keep utm and event
Cisco ASA 🟡 Medium — filter by message ID ranges
Zscaler NSSWeblog 🟡 Medium — web proxy logs can be high volume
F5 BIG-IP ASM / LTM 🟡 Medium — WAF logs can spike during attacks
Trend Micro Deep Security 🟢 Low — typically moderate volume

Firewall traffic/session logs often account for 60-80% of CSL volume. These are primarily TRAFFIC or Accept events with low detection value. Consider DCR transformation, Data Lake tier, split ingestion, or DL + KQL job promotion (these last two can be combined).

CommonSecurityLog — LogSeverity Values

Value (string) Value (int) Meaning Retention Priority
Very-High 9-10 Critical security event 🔴 Always keep in Analytics
High 7-8 Significant security event 🔴 Keep in Analytics
Medium 4-6 Notable event 🟡 Review — may be filterable
Low 0-3 Informational event 🟢 Candidate for DL or DCR filter
(empty/Unknown) Unmapped severity ⚠️ Check vendor documentation

DeviceAction optimization: If >70% of events have DeviceAction = "Allow" or "Accept", the table is dominated by permitted traffic. Filter at DCR level or move to Data Lake, keeping only denied/blocked/threat events in Analytics.
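
The 70% check can be sketched in a few lines of Python. The helper names are hypothetical, and matching `DeviceAction` case-insensitively is an assumption, since vendors differ in how they capitalize the value.

```python
from collections import Counter
from typing import Iterable, Optional

# Values treated as "permitted traffic" in the guidance above.
PERMIT_ACTIONS = {"allow", "accept"}


def permitted_share(device_actions: Iterable[Optional[str]]) -> float:
    """Fraction of events whose DeviceAction is Allow or Accept."""
    # Assumption: compare case-insensitively and ignore surrounding whitespace.
    counts = Counter((a or "").strip().lower() for a in device_actions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    permitted = sum(counts[a] for a in PERMIT_ACTIONS)
    return permitted / total


def dominated_by_permitted(device_actions, threshold: float = 0.70) -> bool:
    """True when permitted traffic exceeds the threshold share."""
    return permitted_share(device_actions) > threshold
```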

Anomaly Interpretation (Q14/Q15)

24h anomalies (Q14): Flags tables where last-24h ingestion deviates >50% from the same-weekday daily average AND at least one period has ≥0.01 GB volume. Q14 uses a fixed 29-day lookback (algorithmic constant, not affected by -Days).

  • Positive spikes: May indicate attacks, misconfigured connectors, or bulk imports
  • Negative drops: May indicate connector failures, agent issues, or collection gaps
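
As an illustration of the Q14 flagging rule, here is a hedged Python sketch. The real query is KQL and is not shown in this reference, so the function name, the percentage formula, and the handling of a zero baseline are all assumptions.

```python
from typing import Optional


def q14_flag(last_24h_gb: float, same_weekday_avg_gb: float,
             min_gb: float = 0.01, threshold_pct: float = 50.0) -> Optional[str]:
    """Return "spike", "drop", or None for one table.

    A table is considered only when at least one of the two periods
    reaches min_gb, and flagged when the deviation exceeds threshold_pct.
    """
    if max(last_24h_gb, same_weekday_avg_gb) < min_gb:
        return None  # both periods are too small to matter
    if same_weekday_avg_gb == 0:
        # Assumption: no baseline but real volume counts as a positive spike.
        return "spike"
    deviation_pct = (last_24h_gb - same_weekday_avg_gb) / same_weekday_avg_gb * 100
    if deviation_pct > threshold_pct:
        return "spike"
    if deviation_pct < -threshold_pct:
        return "drop"
    return None
```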

Period-over-period (Q15): Compares total volume per table between current and prior period (period length = deep-dive window).

  • New tables (100% change) → appeared only this period (new connector?)
  • Growing tables → expanding collection scope or increased activity
  • Shrinking tables → connector removal, collection changes, or seasonal patterns
  • Stable high-volume tables → included via ThisWeekMB > 100 filter for visibility

Tier Classification

Background

The Sentinel Usage table does NOT contain a TablePlan or Tier column. There is no KQL-native way to determine whether a table is on Analytics, Basic, or Data Lake tier.

PS1 handles this automatically: Q10 (CLI type) runs az monitor log-analytics workspace table list to fetch table plans, then Q10b (KQL, depends_on: Q10) computes per-tier volume summaries using the CLI output. The results are written to PHASE_3.Tiers and PHASE_3.TierSummary in the scratchpad.

Tier Display Convention

Azure CLI reports Data Lake tier tables as plan Auxiliary internally. Always refer to this tier as "Data Lake" in all output — never use "Auxiliary". The _CL suffix denotes a custom log table, not a copy — describe these as "Custom Data Lake table" (not "Auxiliary copy").

Q10b Cross-Reference Query

PS1 automatically populates the DataLakeTables and BasicTables arrays from CLI output and executes the tier summary KQL query. This computes per-tier TotalGB, BillableGB, TableCount, and PercentOfTotal using the full Usage table (not limited to Q1 top-20). These values are the authoritative source for PHASE_3.TierSummary and §2b rendering.


Migration Classification

Used when rendering §7a (Tier Migration Candidates). PS1 computes the Category column using these criteria; the LLM uses this reference for rendering interpretation and recommendation prose.

Category Criteria Action
🔵 KQL Job output Table name ends with _KQL_CL NEVER migrate — promoted data from Data Lake, essential for detection pipeline
🔵 Already on Data Lake Q10 tier = Data Lake AND zero rules Already migrated — no action needed
🟢 Keep Analytics ≥1 enabled analytic rule AND healthy executions Active detection coverage justifies Analytics cost
🟣 Split ingestion candidate 1-2 enabled rules AND high-volume (≥5 GB/week) AND DL-eligible Few rules need only a subset of events. Route detection-relevant subset to Analytics via DCR, rest to Data Lake
Detection gap (non-XDR) ≥1 enabled rule AND table is on Data Lake tier AND table is NOT an XDR table Critical: Analytic rules cannot execute against DL tables — rules silently failing. Custom Detections also do NOT work because non-XDR tables are not available in Advanced Hunting on Data Lake. Remediation: (1) move table back to Analytics, OR (2) remove/disable the analytic rules referencing the table (accept DL tier). ⛔ PROHIBITED: Recommending "convert ARs to Custom Detections" for non-XDR tables — CDs run against Advanced Hunting which only retains Defender XDR tables for 30 days. Non-XDR tables on Data Lake are invisible to Advanced Hunting.
Detection gap (XDR) ≥1 enabled rule AND table is on Data Lake tier AND table IS an XDR table Partial gap: Sentinel Analytic Rules (AR) cannot execute against DL tables — ARs silently failing. However, XDR-native tables (Device*, Email*, CloudAppEvents, UrlClickEvents) are ALWAYS available in Advanced Hunting for 30 days regardless of Sentinel tier. Custom Detection rules run against Advanced Hunting, so CD rules continue to work. Only ARs are broken. Remediation: (1) move table back to Analytics, (2) convert affected ARs to Custom Detections, OR (3) remove/disable the ARs. See Advanced Hunting data retention
🔴 Strong candidate (DL-eligible) 0 rules AND DL classification = Yes Evaluate DCR filtering to reduce unnecessary volume, then migrate remainder to Data Lake
🟠 Not DL-eligible / unknown 0 rules AND DL classification = No or Unknown Optimize via DCR filtering or add analytic rules. Check MS docs for current eligibility

LLM overlay checks (not separate emojis — flag as callout notes in §7b/7c prose):

  • Execution issues: If a 🟢 table's rules appear in PHASE_4.FailingRules with 0 executions or failures, add a ⚠️ note: "Rules targeting [table] have execution issues — see §5b. Fix rules before relying on this coverage."
  • ASIM dependency: If a 🔴 zero-rule table appears in PHASE_4.ASIM as consumed by ASIM parsers, add a ⚠️ note: "[table] is consumed by ASIM parsers ([parser names]) — migrating to Data Lake breaks these detections. Verify ASIM dependency before migrating."

Reference: Data Lake Migration

This section contains lookup tables and background guidance for DL migration classification. Consult when rendering §7a recommendations and explaining Data Lake trade-offs.

Known DL-Eligible Tables

PS1 uses these lists as hardcoded $dlYes/$dlNo arrays. Keep this reference in sync with the script.

Category DL-Eligible Tables Notes
Defender XDR CloudAppEvents, DeviceEvents, DeviceFileCertificateInfo, DeviceFileEvents, DeviceImageLoadEvents, DeviceInfo, DeviceLogonEvents, DeviceNetworkEvents, DeviceNetworkInfo, DeviceProcessEvents, DeviceRegistryEvents, EmailAttachmentInfo, EmailEvents, EmailPostDeliveryEvents, EmailUrlInfo, UrlClickEvents GA Feb 2025
Verified LA tables AADManagedIdentitySignInLogs, AADNonInteractiveUserSignInLogs, AADProvisioningLogs, AADServicePrincipalSignInLogs, AADUserRiskEvents, AuditLogs, AWSCloudTrail, AzureDiagnostics, CommonSecurityLog, Event, GCPAuditLogs, LAQueryLogs, McasShadowItReporting, MicrosoftGraphActivityLogs, OfficeActivity, Perf, SecurityAlert, SecurityEvent, SecurityIncident, SentinelHealth, SigninLogs, StorageBlobLogs, Syslog, W3CIISLog, WindowsEvent, WindowsFirewall ⚠️ Only these LA tables are verified DL-eligible. Unlisted → Unknown
Custom tables Any table ending in _CL (except _KQL_CL) Custom log tables are workspace-managed → DL-eligible

Known DL-Ineligible Tables (as of Feb 2026)

Category Ineligible Tables Notes
XDR — not yet supported DeviceTvmSoftwareInventory, DeviceTvmSoftwareVulnerabilities, AlertEvidence, AlertInfo, IdentityDirectoryEvents, IdentityLogonEvents, IdentityQueryEvents MDI tables announced for future DL support
Entra ID MicrosoftServicePrincipalSignInLogs, MicrosoftNonInteractiveUserSignInLogs, MicrosoftManagedIdentitySignInLogs Not yet DL-eligible
Threat Intelligence ThreatIntelIndicators, ThreatIntelligenceIndicator Required on Analytics for TI matching rules. Never recommend migration
Log Analytics AppDependencies, AppMetrics, AppPerformanceCounters, AppTraces, AzureActivity, AzureMetrics, ConfigurationChange, Heartbeat, SecurityRecommendation Not yet DL-eligible

Fallback rule: If a table is not in either list, the script classifies it as Unknown. Render as ❓ Unknown with note: "Verify at Manage data tiers before migrating."

Decision Matrix

Enabled Rules Executions (Health) Alerts (Q12) DL-Eligible? Volume Recommendation
0 N/A 0 ✅ Yes > 1 GB/week 🔴 Evaluate DCR filtering to reduce volume, then migrate remainder to Data Lake (confirm no ASIM dependency)
0 N/A 0 ✅ Yes < 1 GB/week 🔴 Migrate to Data Lake — minimal savings but cleaner tier alignment. DCR filtering optional at this volume
0 N/A 0 ❌ No / ❓ Unknown Any 🟠 Not eligible or unknown — review ingestion necessity, apply DCR filtering
0 N/A (on DL) 0 N/A — already DL Any 🔵 Already on Data Lake — no action needed
0 (ASIM-dependent) N/A 0 Any Any 🔴 Migrate — but LLM adds ⚠️ ASIM dependency callout in §7b
≥1 0 or failures 0 Any Any 🟢 Keep — but LLM adds ⚠️ execution issues callout in §7b
≥1 0 (on DL) Any N/A — on DL Any 🔴 Detection gap — ARs cannot execute against DL. PS1 emits Detection gap (XDR) or Detection gap (non-XDR). If XDR table: CDs still work via Advanced Hunting; recommend converting ARs→CDs or moving back to Analytics. If non-XDR table: move back to Analytics OR remove/disable rules. ⛔ NEVER recommend CD conversion for non-XDR tables
1-2 > 0, healthy Any ✅ Yes ≥ 5 GB/week 🟣 Split ingestion candidate
≥1 > 0, healthy 0 Any Any 🟢 Keep Analytics — rules executing, no matches (normal for TI rules)
≥1 > 0, healthy > 0 Any Any 🟢 Keep Analytics — active detections generating alerts
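
The matrix above can be read as an ordered set of checks. The sketch below is one possible Python rendering of that ordering; it is not PS1's actual implementation (which is not shown here), and the function and parameter names are hypothetical. The ASIM and execution-issue overlays are left out because they are added as prose callouts rather than categories.

```python
from typing import Optional


def classify_table(name: str, enabled_rules: int, healthy: bool,
                   dl_eligible: Optional[bool], weekly_gb: float,
                   on_data_lake: bool = False, is_xdr: bool = False) -> str:
    """Return a migration category for one table.

    dl_eligible is True (Yes), False (No), or None (Unknown).
    healthy means the table's rules have executions and no failures.
    """
    if name.endswith("_KQL_CL"):
        return "KQL Job output"  # promoted data, never migrate
    if on_data_lake:
        if enabled_rules >= 1:
            # Analytic rules cannot execute against Data Lake tables.
            return "Detection gap (XDR)" if is_xdr else "Detection gap (non-XDR)"
        return "Already on Data Lake"
    if enabled_rules == 0:
        if dl_eligible is True:
            return "Strong candidate (DL-eligible)"
        return "Not DL-eligible / unknown"
    if enabled_rules <= 2 and healthy and dl_eligible is True and weekly_gb >= 5:
        return "Split ingestion candidate"
    return "Keep Analytics"
```

The order of the checks matters: the Data Lake test runs before the rule-count tests so that a table with rules on the Data Lake tier is reported as a detection gap rather than as "Keep Analytics".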

Data Lake Trade-Off

Capability Analytics Tier Data Lake Tier
Analytics rules, alerting, hunting ✅ Full support ❌ Not available (but see XDR exception below)
Custom Detection rules (Advanced Hunting) ✅ Full support ⚠️ XDR tables only: Still available — AH retains 30 days regardless of Sentinel tier. Non-XDR tables: ❌
Workbooks, playbooks, parsers, watchlists ✅ Full support ❌ Not available
KQL query performance ✅ High-performance ⚠️ Slower
Query cost ✅ Included in ingestion price ❌ Billed per query (data scanned)
KQL Jobs / Summary Rules / Search Jobs
Ingestion cost Standard Minimal
Default retention 90 days (Sentinel) / 30 days (XDR) Matches analytics, extendable to 12 years

Primary vs secondary security data: Primary security data (EDR alerts, auth logs, audit trails) belongs on Analytics. Secondary data (NetFlow, storage access logs, firewall traffic, IoT logs) is ideal for Data Lake.

Filter before you migrate: For high-volume zero-rule tables, DL migration and DCR filtering are complementary — not mutually exclusive. Evaluate whether all ingested data serves a hunting, forensic, or compliance purpose. If a portion is noise (e.g., verbose diagnostics, routine health checks, debug-level telemetry), apply DCR transformations to drop or reduce that portion first, then migrate the meaningful remainder to Data Lake. This avoids simply shifting cost from Analytics to Data Lake query charges on data nobody uses.

Even when a table has zero rules, consider whether it serves hunting/forensic purposes. Tables like SigninLogs or AuditLogs should generally remain on Analytics regardless.

Data Lake Promotion via KQL Jobs

For high-volume tables on Data Lake — whether fully migrated or partially routed via split ingestion — that still need detection coverage:

  1. Ingest raw logs into Data Lake tier (cheap)
  2. Create KQL jobs to query Data Lake on schedule, writing aggregated results to Analytics-tier _KQL_CL tables
  3. Point analytics rules at the _KQL_CL output table

KQL Job key facts: Full KQL (joins, unions, CTEs). Schedules: by-minute through monthly. Lookback up to 12 years. Limits: 3 concurrent / 100 enabled per tenant, 1hr query timeout. Data Lake has ~15-min ingestion latency — jobs should use now(-15m) as upper bound. TimeGenerated is overwritten if >2 days old — preserve source timestamps in a custom column.
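
To make the latency guidance concrete, the following Python sketch computes a query window whose upper bound trails the run time by 15 minutes. It only illustrates the time arithmetic; a real KQL job expresses this in KQL, and the assumption that the window length equals the schedule interval is mine, not the source's.

```python
from datetime import datetime, timedelta, timezone

# Approximate Data Lake ingestion latency stated in the key facts above.
INGESTION_LAG = timedelta(minutes=15)


def job_window(run_time: datetime, interval: timedelta):
    """Return (start, end) for one scheduled run.

    The end trails run_time by the ingestion lag so late-arriving rows
    are not skipped; consecutive runs then tile the timeline without gaps.
    """
    end = run_time - INGESTION_LAG
    start = end - interval
    return start, end


run = datetime(2026, 2, 1, 12, 0, tzinfo=timezone.utc)
start, end = job_window(run, timedelta(hours=1))
print(start.isoformat(), end.isoformat())
```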

Split Ingestion and/or DL + KQL Job Promotion

PS1 auto-classifies 🟣 Split candidates (1-2 rules, ≥5 GB/week, DL-eligible). For these tables (and high-volume 🟢 Keep tables), the report should present both optimization paths so the operator can choose — or combine them — based on their knowledge of the rule queries:

Split Ingestion (DCR) DL + KQL Job
How it works DCR routes a detection-relevant subset to Analytics, bulk to DL Any data on DL (full table or split-routed portion); KQL job promotes aggregated results to _KQL_CL on Analytics
Detection latency Real-time (subset stays on Analytics) 15+ min (DL ingestion lag + job schedule)
Rule rewrite needed No — rules keep targeting original table Yes — rules must target _KQL_CL output
Volume savings Moderate (bulk to DL, subset stays) Depends on scope — maximum if entire table goes to DL, incremental if applied to split-routed portion
Best when Rules filter on specific raw events (EventIDs, facilities) Rules use aggregation and tolerate latency

These approaches are complementary, not mutually exclusive. Split ingestion routes bulk data to DL while keeping detection-relevant events on Analytics. KQL jobs can then run against that DL portion to surface additional insights (e.g., aggregated anomalies) back to Analytics via _KQL_CL tables — giving you both real-time detection on the split subset AND scheduled analytics on the DL bulk.

Rendering guidance: The LLM does NOT have visibility into rule query text (aggregation vs raw filters), so it cannot definitively recommend one over the other. For 🟣 tables and high-volume 🟢 tables, present the comparison and note which approach fits which rule pattern. Do NOT change PS1's Category emoji in §7a — express as prose in §7b/7c.


Reference: License Benefits

Defender for Servers P2 — 500MB/Server/Day Benefit

  • Each server protected by DfS P2 contributes 500 MB/day to a pooled daily allowance
  • Pool = (number of protected servers) × 500 MB — aggregate across subscription, not per-machine
  • Applies to security data types: SecurityAlert, SecurityBaseline, SecurityBaselineSummary, SecurityDetection, SecurityEvent, WindowsFirewall, MaliciousIPCommunication, SysmonEvent, ProtectionStatus, Update, UpdateSummary
  • Applied automatically at workspace level — shows as zero cost

Pool calculation from Q4:

Potential DfS P2 Pool = (Distinct servers from Q4) × 500 MB/day

Example: Q4 shows 12 servers → pool = 6 GB/day. If DFSP2-eligible avg is 4.2 GB/day → fully covered.

Scenario Condition Recommendation
Pool far exceeds usage DfSP2_DailyGB < 50% of PoolGB Highlight the unused headroom and recommend increasing SecurityEvent logging levels (e.g., "All Events" instead of "Common") to broaden detection coverage at no additional ingestion cost. Note that increased data volume may affect retention storage costs
Pool covers usage DfSP2_DailyGB ≥ 50% and ≤ 100% of PoolGB Pool covers current need — monitor growth and reference §3a if approaching ceiling
Usage exceeds pool DfSP2_DailyGB > PoolGB Overage is billed at standard rates — review §3a EventID breakdown for reduction opportunities, or consider onboarding more servers to DfS P2 to expand the pool
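
The pool formula and the three scenarios can be worked through in Python. This is a sketch with hypothetical function names; it assumes decimal units (1 GB = 1000 MB), which is what the "12 servers → 6 GB/day" example implies.

```python
MB_PER_SERVER_PER_DAY = 500


def dfs_p2_pool_gb(servers: int) -> float:
    """Pooled daily allowance in GB (assumes 1 GB = 1000 MB)."""
    return servers * MB_PER_SERVER_PER_DAY / 1000


def dfs_p2_scenario(daily_gb: float, pool_gb: float) -> str:
    """Map eligible daily usage and pool size to a scenario from the table."""
    if daily_gb > pool_gb:
        return "Usage exceeds pool"
    if daily_gb < 0.5 * pool_gb:
        return "Pool far exceeds usage"
    return "Pool covers usage"


pool = dfs_p2_pool_gb(12)            # 12 servers from Q4
print(pool, dfs_p2_scenario(4.2, pool))
```

With the example figures (12 servers, 4.2 GB/day eligible), usage sits between 50% and 100% of the pool, so the "Pool covers usage" row applies.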

M365 E5 / Defender XDR Ingestion Benefit

  • M365 E5 (or E5 Security, A5, F5, G5) provides 5 MB per user per day pooled data grant (offer page)
  • Grant = (number of E5 licenses) × 5 MB/day
  • Covers: Entra ID sign-in/audit logs, MCAS shadow IT, Purview info protection, M365 advanced hunting data (29 tables in Q17/Q17b)
  • Applied automatically — Free Benefit - M365 Defender Data Ingestion
  • Always-free (all Sentinel users): Azure Activity, Office 365 Audit Logs, Defender alerts

⚠️ Ask user for E5 license count — not discoverable from Sentinel telemetry.

Example: 500 E5 licenses → grant = 2.5 GB/day. If E5-eligible avg exceeds grant, overage billed at standard rates.
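
The same arithmetic in Python, as a small sketch. The function names are hypothetical and decimal units (1 GB = 1000 MB) are assumed, matching the 2.5 GB/day figure in the example.

```python
MB_PER_LICENSE_PER_DAY = 5


def e5_grant_gb(licenses: int) -> float:
    """Daily E5 data grant in GB (assumes 1 GB = 1000 MB)."""
    return licenses * MB_PER_LICENSE_PER_DAY / 1000


def e5_overage_gb(eligible_daily_gb: float, licenses: int) -> float:
    """Eligible volume above the grant, billed at standard rates."""
    return max(0.0, eligible_daily_gb - e5_grant_gb(licenses))


print(e5_grant_gb(500))
```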


Report Template

📄 Just-in-time loading: Read SKILL-report.md at the start of Phase 6 rendering. It contains:

  • Inline Chat Executive Summary template — Workspace at a Glance, Cost Waterfall, Detection Posture, Overall Assessment, Top 3 Recommendations
  • Markdown File Structure — Complete §1–§8 rendering rules, mandatory format requirements, column specifications, validation checks
  • Section-to-Scratchpad Mapping — Which scratchpad keys feed each report section

Load ONLY when entering Phase 6 — NOT during Phases 1–5. Combine with scratchpad data for rendering.


Post-Report Drill-Down Reference

📄 Just-in-time loading: Read SKILL-drilldown.md for full instructions when any of these are needed.

Available Drill-Down Patterns

Use these when the user asks follow-up questions after a report is generated (e.g., "which rules use EventID 8002?", "look up custom detection rules", "do any ASIM parsers depend on this table?").

Pattern Purpose Tool / Method Trigger Phrases
1. EventID cross-ref Which analytic rules reference a specific EventID? az rest (Sentinel REST API) + JMESPath contains() "which rules use EventID X", "does any rule need this EventID"
2. Syslog facility/process Which rules reference a Syslog facility, source, or process? az rest + JMESPath "which rules use sshd", "any rules for authpriv"
3. CSL vendor/activity Which rules reference a CEF vendor, product, or activity? az rest + JMESPath "rules for Palo Alto TRAFFIC", "which rules use CommonSecurityLog"
4. Full rule query dump Export all enabled rule queries for manual analysis az rest → JSON file "export all rule queries", "build EventID dependency map"
5. ASIM parser verification Which ASIM parsers consume a table slated for migration? az rest + regex match for _Im_/_ASim_ patterns "ASIM dependency", "do parsers use this table"
6. Custom Detection rules Inventory CD rules via Graph API (query text, schedule, last run) PowerShell Invoke-MgGraphRequest (NOT Graph MCP — scope CustomDetection.Read.All unavailable via MCP) "custom detection rules", "CD rules", "lookup custom detections"

⚠️ Graph MCP limitation: The Graph MCP server returns 403 for the Custom Detection endpoint (/beta/security/rules/detectionRules). Always use Invoke-MgGraphRequest via PowerShell terminal. See SKILL-drilldown.md and Q9b-CustomDetectionRules.yaml for the exact endpoint and select fields.

Also in SKILL-drilldown.md

Section Contents
Known Pitfalls Usage table batching, _SPLT_CL naming, case-sensitive custom tables, LogSeverity types, value-level vs table-level coverage confusion
Error Handling Common errors from az rest, Graph API, az monitor; graceful degradation for missing tables; re-running individual PS1 phases
CloudAppEvents Appendix Custom Detection management audit trail (EditCustomDetection events) — distinct from execution telemetry
Additional References Microsoft Learn links for cost optimization, DCR configuration, data tiers, ASIM parsers

SVG Dashboard Generation

After a report is generated, the user can request a visual SVG dashboard.

Trigger phrases: "generate SVG dashboard", "create a visual dashboard", "visualize this report", "SVG from the report"

✅ DEFAULT: run the deterministic renderer (render_dashboard.py)

Do this first — do NOT hand-author the SVG. render_dashboard.py produces the manifest-driven 7-row dashboard non-interactively, parsing every value from the scratchpad + report + svg-widgets.yaml (no hardcoded run data). It is faster, deterministic, and produces a known-good layout. Run it:

python .github/skills/sentinel-ingestion-report/render_dashboard.py \
  --scratch temp/ingest_scratch_<ts>.md \
  --manifest .github/skills/sentinel-ingestion-report/svg-widgets.yaml \
  --report reports/sentinel/sentinel_ingestion_report_<label>_<ts>.md \
  --out reports/sentinel/sentinel_ingestion_report_<label>_<ts>_dashboard.svg

It reads the posture gauge, ingestion KPI cards, daily-volume line chart, cost waterfall, tier donut, top-tables / detection-coverage tables, and WoW anomaly + alert-producing-rule tables from the scratchpad, and the header metadata + Overall Assessment + ### 🎯 Top 3 Recommendations cards from the report (--report is optional — the assessment banner and recommendation cards degrade gracefully if absent). The alert-rule subheader lookback suffix ((7d)/(30d)) is matched by prefix, so any reporting window parses. Output is self-contained SVG with explicit fill on every <text>.

Action Status
Running render_dashboard.py when the user asks to visualize/generate a dashboard REQUIRED (default path)
Hand-authoring the SVG via the svg-dashboard skill instead of running the script PROHIBITED unless the user explicitly asks for a bespoke/custom layout the renderer can't produce

Fallback — bespoke/interactive dashboards (svg-dashboard skill)

Only use this path when the user explicitly wants a custom layout, different widgets, or styling the deterministic renderer doesn't support. Edit svg-widgets.yaml first if the change is layout/field-level — the renderer reads it at generation time, so many "customizations" don't require hand-authoring. The YAML manifest is the single source of truth for layout, widgets, field mappings, colors, and data source documentation.

Step 1:  Read svg-widgets.yaml (this skill's widget manifest)
Step 2:  Read .github/skills/svg-dashboard/SKILL.md (rendering rules — Manifest Mode)
Step 3:  Read the completed report file (data source)
Step 4:  Render SVG → save to reports/sentinel/{report_name}_dashboard.svg
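Generates SVG data visualization dashboards from investigation data or skill reports. Supports a structured mode driven by a YAML manifest and an adaptive mode for ad-hoc data, with components such as KPI cards and charts.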
generate SVG dashboard create a visual dashboard visualize this report SVG from the report visualize results create SVG chart SVG from this data
.github/skills/svg-dashboard/SKILL.md
npx skills add SCStelz/security-investigator --skill svg-dashboard -g -y
SKILL.md
Frontmatter
{
    "name": "svg-dashboard",
    "description": "Use this skill when asked to generate SVG data visualization dashboards from investigation data or skill reports. Triggers on keywords like \"generate SVG dashboard\", \"create a visual dashboard\", \"visualize this report\", \"SVG from the report\", \"visualize results\", \"create SVG chart\", \"SVG from this data\". Supports two modes: manifest-driven structured dashboards (from skill reports with svg-widgets.yaml) and freeform adaptive visualizations from ad-hoc investigation data. Component library includes KPI cards, score cards, bar charts, line charts, donut charts, waterfall charts, tables, recommendation cards, assessment banners. SharePoint Dark Theme default palette."
}

SVG Dashboard Generator

Renders SVG data visualization dashboards — either from a skill's svg-widgets.yaml manifest (structured dashboards) or freeform from ad-hoc investigation data in context.


Mode Detection

Before rendering, determine which mode applies:

Condition Mode Behavior
User asks for a dashboard after a skill report AND the calling skill has an svg-widgets.yaml Manifest Mode Read the YAML manifest → follow its layout exactly → deterministic dashboard
User asks to "visualize", "chart", or "create an SVG" from ad-hoc data in context (query results, investigation findings, inline tables) Freeform Mode Select widget types from the Component Library below based on data shape → creative layout
No svg-widgets.yaml exists for the current workflow Freeform Mode Same as above

Decision flow:

1. Is there an svg-widgets.yaml for the current skill?
   → YES + user said "dashboard" or "SVG from the report" → Manifest Mode
   → NO  → Freeform Mode

2. Does the user have structured data in context (query results, tables, metrics)?
   → YES → Freeform Mode (use data shape to pick widgets)
   → NO  → Ask user what data to visualize
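
One way to read the decision flow above is as a single function. The Python sketch below is an interpretation, not part of the skill: the function name, the return labels, and the substring matching on the user's request are assumptions.

```python
def detect_mode(manifest_exists: bool, user_request: str,
                has_structured_data: bool) -> str:
    """Return "manifest", "freeform", or "ask_user"."""
    text = user_request.lower()
    # Step 1: a manifest plus an explicit dashboard request selects Manifest Mode.
    if manifest_exists and ("dashboard" in text or "svg from the report" in text):
        return "manifest"
    # Step 2: otherwise fall back to Freeform Mode if there is data to draw.
    if has_structured_data:
        return "freeform"
    return "ask_user"
```

Under this reading, a request such as "chart these results" stays in Freeform Mode even when a manifest exists, because the user did not ask for the report dashboard.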

Manifest Mode — Structured Dashboard

Used when a skill provides an svg-widgets.yaml manifest (e.g., mcp-usage-monitoring, sentinel-ingestion-report).

Execution

Step 1:  Read the calling skill's svg-widgets.yaml (widget manifest)
Step 2:  Read this file's Rendering Rules below (component library + quality standards)
Step 3:  Read the completed report file (data source)
         — If same chat: report data is already in context
         — If new chat: read the file path provided by user or find latest in the skill's reports/ subfolder
Step 4:  Map manifest fields → report data using data_sources.field_mapping_notes
Step 5:  Render SVG → save to the same directory as the report: {report_basename}_dashboard.svg

Data Extraction (Manifest Mode)

  • Read the report markdown or scratchpad JSON.
  • Match fields from the manifest's data_sources.field_mapping_notes to locate values.
  • For arrays (top_tables, anomalies, etc.), extract the full dataset and render up to max_items.
  • For single values (KPIs), extract the number and apply the specified unit.
  • If a field is not found in the report data, render the widget with "N/A" in muted text — never omit the widget.
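
The extraction rules above can be sketched as a single lookup helper. This Python example assumes the report data has already been parsed into a dictionary; the function name and the field names in the usage lines are hypothetical.

```python
from typing import Any, Optional


def extract_widget_value(report_data: dict, field: str, unit: str = "",
                         max_items: Optional[int] = None) -> Any:
    """Resolve one manifest field against parsed report data.

    Missing fields yield "N/A" so the widget is still rendered.
    Arrays are truncated to max_items; single values get their unit.
    """
    value = report_data.get(field)
    if value is None:
        return "N/A"
    if isinstance(value, list):
        return value if max_items is None else value[:max_items]
    return f"{value} {unit}".strip()


report = {"total_gb": 42.5, "top_tables": ["SecurityEvent", "Syslog", "SigninLogs"]}
print(extract_widget_value(report, "total_gb", unit="GB"))
print(extract_widget_value(report, "top_tables", max_items=2))
print(extract_widget_value(report, "anomalies"))
```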

Freeform Mode — Adaptive Visualization

Used when no manifest exists or the user wants an ad-hoc visualization from investigation data already in context.

Execution

Step 1:  Identify the data in context (query results, investigation findings, report sections, inline tables)
Step 2:  Analyze data shape — what dimensions, metrics, categories, and time series are present?
Step 3:  Read this file's Rendering Rules below (component library + quality standards)
Step 4:  Select appropriate widget types from the Component Library (see Data Shape Guide below)
Step 5:  Design a layout: title banner → KPI summary → detail charts/tables → optional assessment
Step 6:  Render SVG → save to temp/{descriptive_name}_dashboard.svg or user-specified path

Data Shape → Widget Selection Guide

Data Shape Best Widget Example
Single metrics / counts kpi-card Total failed logins: 47, Unique IPs: 12
Metric with period-over-period change delta-kpi-card Incidents: 47 (↑23% vs last period)
Scored assessment (0-100) score-card Risk Score: 73/100
Categorical counts (top-N) horizontal-bar-chart Top 10 source IPs by attempt count
Composition within categories stacked-bar-chart Alert severity breakdown per week
Time series (values over dates) line-chart Daily sign-in volume over 30 days
Proportional breakdown donut-chart Auth methods: 60% password, 30% MFA, 10% token
Additive/subtractive flow waterfall-chart Ingestion costs with license benefits
Completion / target tracking progress-bar 72% of critical CVEs patched
Inline trend in KPI or table cell sparkline 7-day mini trend beneath a KPI value
Tabular detail rows table-widget IP enrichment results, alert details
Prioritized action items recommendation-cards High/Medium/Low priority findings
Executive summary assessment-banner Overall risk assessment with key risks/strengths
2D framework coverage (categories × items) coverage-matrix MITRE ATT&CK tactic × technique map, permission grids
Report header title-banner Investigation title, date, scope

Layout Heuristics (Freeform)

  • Row 1: Always start with a title-banner (data source, date range, scope)
  • Row 2: KPI cards for key metrics (3-6 cards, one row)
  • Rows 3+: Charts and tables arranged by importance — most critical findings first
  • Final row: Assessment banner or recommendation cards if actionable findings exist
  • Canvas size: Default 1400×900, increase height proportionally for more rows (~100-200px per row)
  • Use the default SharePoint Dark palette (defined below) unless the data context suggests otherwise

Token Budget & Data Limits (Freeform Mode)

Why this matters: SVG is verbose — every <rect>, <text>, and <path> consumes output tokens. Without limits, freeform dashboards with rich investigation data routinely exceed the model's output token budget, producing truncated/broken SVGs. Manifest-mode dashboards avoid this because the YAML max_items and fixed row count act as natural constraints.

Hard Limits — Always Enforced:

Constraint Limit Rationale
Max rows 6 (including title banner) Each row adds ~100-200 SVG elements
Max widgets total 12 Beyond this, SVG size balloons past safe output limits
Max KPI cards per row 5 More than 5 become unreadable at standard canvas width
Max canvas height 1200px Forces prioritization; prevents unbounded vertical growth

Per-Widget Data Limits:

Widget Type Max Data Points What to Do with Excess
horizontal-bar-chart 10 bars Show top 10, add a summary "Other (N remaining)" bar
stacked-bar-chart 8 bars × 6 segments Aggregate smaller segments into "Other"
line-chart 30 data points Resample to weekly if daily exceeds 30; show date range in subtitle
donut-chart 7 segments Merge smallest into "Other"
waterfall-chart 8 segments Combine minor items
table-widget 8 rows Show top 8, add footer "Showing 8 of N"
recommendation-cards 4 cards Prioritize highest-impact recommendations
sparkline 14 data points Resample to fit (e.g., daily → every-other-day)
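
The "show top N, group the rest" rule used by several rows above can be sketched as follows. The helper name is hypothetical; the label format follows the "Other (N remaining)" wording from the table.

```python
def top_n_with_other(items: dict, n: int = 10) -> list:
    """Return the n largest (label, value) pairs plus one summary entry.

    Everything beyond the top n is summed into "Other (N remaining)".
    """
    ranked = sorted(items.items(), key=lambda kv: kv[1], reverse=True)
    head, tail = ranked[:n], ranked[n:]
    if tail:
        head.append((f"Other ({len(tail)} remaining)", sum(v for _, v in tail)))
    return head
```

For a horizontal-bar-chart this yields at most eleven bars: ten ranked bars and one summary bar.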

Data Triage Strategy:

When the data in context exceeds these limits, apply this priority filter:

  1. Summarize first — Extract the 3-5 most important KPIs before plotting details
  2. Top-N everything — For ranked data, show top 10 max; group the rest as "Other"
  3. Aggregate time series — If >30 daily points, resample to weekly; if >30 weekly, resample to monthly
  4. One chart per insight — Don't render the same data as both a bar chart AND a table; pick the one that communicates better
  5. Cut, don't shrink — Rather than making unreadable tiny widgets, remove the lowest-priority widget entirely

If the data is too rich for 6 rows / 12 widgets: Tell the user what was included vs omitted, and suggest they request a second dashboard for the remaining data or provide an svg-widgets.yaml manifest for full control.
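
The triage steps above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the skill: the function names are hypothetical, the "Other" entry is assumed to be added on top of the top-N items, and resampling is assumed to mean averaging consecutive buckets.

```python
from typing import List, Tuple

def top_n_with_other(items: List[Tuple[str, float]], n: int = 10) -> List[Tuple[str, float]]:
    """Keep the n largest items and fold the rest into one summary entry."""
    ranked = sorted(items, key=lambda kv: kv[1], reverse=True)
    head, tail = ranked[:n], ranked[n:]
    if tail:
        # Assumption: the summary bar is appended after the top-n bars.
        head.append((f"Other ({len(tail)} remaining)", sum(v for _, v in tail)))
    return head

def resample_mean(values: List[float], bucket: int = 7) -> List[float]:
    """Average consecutive buckets (e.g. 7 daily points become 1 weekly point)."""
    return [
        sum(values[i:i + bucket]) / len(values[i:i + bucket])
        for i in range(0, len(values), bucket)
    ]

def fit_series(values: List[float], max_points: int = 30, bucket: int = 7) -> List[float]:
    """Resample only when the series exceeds the widget's point limit."""
    return values if len(values) <= max_points else resample_mean(values, bucket)
```

For example, `fit_series` leaves a 10-point series untouched and reduces 60 daily points to 9 weekly averages, which fits the 30-point line-chart limit.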

Creative Freedom (Freeform)

In freeform mode, you have latitude to:

  • Decide which widget types best represent the data
  • Choose how many rows and how to arrange widgets (within the limits above)
  • Add contextual annotations on charts (peak markers, threshold lines)
  • Combine multiple data points into composite widgets
  • Adjust canvas dimensions to fit the content (up to 1400×1200 max)

You are still bound by the Quality Standards and Color & Typography rules below — these ensure visual consistency regardless of mode.


Rendering Rules (Both Modes)

Canvas & Layout

  • Output a single <svg> element with xmlns="http://www.w3.org/2000/svg" and the width/height from the manifest (or chosen dimensions in freeform mode).
  • Fill the background with canvas.background (manifest) or the default palette's background, #0d1117 (freeform default).
  • Apply canvas.padding (manifest) or 40px (freeform default) on all sides. Usable width = width - 2 * padding.
  • Render rows top-to-bottom with canvas.row_gap (manifest) or 24px (freeform default) spacing between rows.
  • Within each row, widgets are laid out left-to-right. If a widget specifies width_pct, it gets that percentage of usable width. Otherwise, widgets share remaining space equally.
  • Use canvas.col_gap (manifest) or 20px (freeform default) for spacing between widgets in the same row.
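
As a rough illustration of the row layout rule, the sketch below computes an x position and width for each widget in a row. The function name is hypothetical, and one detail is an assumption the rules above do not settle: column gaps are taken out of the space shared by widgets without width_pct.

```python
from typing import List, Optional, Tuple

def layout_row(width: float, padding: float, col_gap: float,
               width_pcts: List[Optional[float]]) -> List[Tuple[float, float]]:
    """Return (x, widget_width) for each widget, left to right."""
    usable = width - 2 * padding
    fixed = [usable * p / 100 for p in width_pcts if p is not None]
    flexible = sum(1 for p in width_pcts if p is None)
    gaps = col_gap * (len(width_pcts) - 1)
    # Assumption: widgets without width_pct split what is left after gaps.
    share = (usable - sum(fixed) - gaps) / flexible if flexible else 0.0
    x, out = padding, []
    for p in width_pcts:
        w = usable * p / 100 if p is not None else share
        out.append((x, w))
        x += w + col_gap
    return out
```

With the freeform defaults (1400px wide, 40px padding, 20px column gap), `layout_row(1400, 40, 20, [50, None])` places a 660px widget at x=40 and a 640px widget at x=720, so the row ends exactly at the right padding edge.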

Color & Typography

  • Use palette.* values from the manifest. In freeform mode, use the default palette below.
  • 🔴 GLOBAL TEXT FILL RULE: SVG text defaults to a black fill, which is invisible on dark backgrounds. Every <text> element MUST resolve to an explicit fill: set fill="{palette.text_primary}" on the root <svg> or a top-level <g> so all text inherits a light color by default, and override it per element where a different color is needed. Never rely on SVG's implicit black fill.
  • All text uses canvas.font_family (manifest) or Segoe UI, sans-serif (freeform default).
  • KPI values: bold, 28-36px, colored with palette.primary or widget's highlight_color.
  • KPI labels: 11-12px, palette.text_secondary.
  • Widget titles: bold, 14-16px, palette.text_primary.
  • Axis labels and table headers: 10-12px, palette.text_secondary.
  • Data labels and value labels: 10-11px, palette.text_primary. Never place value labels inside bars — always position them after/outside the bar.
  • The default palette uses a cool dark theme consistent across all skill manifests. Skills may override with their own palette in svg-widgets.yaml.

Default Palette (Freeform Mode)

palette:
  background: "#0d1117"
  card_bg: "#161b22"
  primary: "#409AE1"       # Blue — KPI highlights
  secondary: "#b4a0ff"     # Purple — secondary charts
  success: "#40C5AF"       # Teal-green — healthy metrics
  warning: "#ff8c00"       # Orange — moderate risk
  danger: "#EF6950"        # Red — critical findings
  text_primary: "#e6edf3"
  text_secondary: "#b2b2b2"
  accent: "#FFC83D"        # Yellow — warnings, anomalies
  grid_line: "#30363d"

Widget Type Reference — Component Library

title-banner

Full-width banner. Render the title large and horizontally centered on the canvas, with the subtitle fields centered below it on a single line, separated by " · ". Optional accent underline. Use text-anchor="middle" with x at the canvas midpoint. If the manifest specifies title_align: left, left-align instead; otherwise always center.

kpi-card

Rounded rectangle (rx="12"). Show the value large and centered, label below in small text, optional unit suffix. Color the value with highlight_color if specified, otherwise palette.primary. No actual icon rendering needed — use a colored dot or small indicator instead.

delta-kpi-card

Extends kpi-card with a period-over-period change indicator. Render the primary value the same as kpi-card. Below (or beside) the value, show a delta line: an arrow (▲ or ▼) followed by the percentage or absolute change. Color the delta with palette.success for favorable changes and palette.danger for unfavorable changes. If invert_color is true, reverse the color logic (e.g., for metrics where "down" is good, like error rate). Show the comparison period label in palette.text_secondary at 10px (e.g., "vs prior 7d"). If no delta data is available, render as a standard kpi-card with no delta line.

score-card

Rounded rectangle card (rx="12") with card_bg background. Render the numeric score value large and centered (bold, 42-48px), colored by whichever range it falls into (from the widget's ranges array). Below the number, show the rating label (e.g., "CONCERNING") in 14px bold, same color as the number. Above both, render the widget title in 14-16px bold, palette.text_primary. Add a subtle /100 suffix after the score in smaller muted text (18px, palette.text_secondary). Keep it visually clean — no gauge arcs, needles, or scale markers.

stacked-bar-chart

Vertical or horizontal bars where each bar is subdivided into colored segments representing categories (e.g., severity levels, sources, status). Include a legend mapping segment colors to category names. If orientation: horizontal, render left-to-right stacked rows with labels on the left. If orientation: vertical (default), render bottom-to-top stacked columns with labels on the x-axis. Show segment values on hover via <title> elements. If show_totals is true, display the total above each bar. Use segment_colors from the manifest or assign from palette automatically.

horizontal-bar-chart

Horizontal bars sorted by value descending. Layout per row (left to right): label → optional inline badges → bar (proportional to max value) → value label → optional extra column (rightmost). Value labels MUST be positioned outside (after) the bar, never inside it — use fill="{palette.text_primary}" (white on dark themes). Append value_suffix if specified. If show_rule_count: right, render the rule count as the rightmost column, right-aligned. If a value is 0, render it in palette.danger. If show_tier_badge is true, render a small colored badge after each label using colors from the YAML segments or badge_colors definitions. If bar_color_by: severity is set, color bars by severity level. If show_error_overlay is true, render a red overlay segment proportional to failure count. If highlight_sensitive is true, mark flagged items with a warning indicator.

line-chart

SVG <polyline> or <path> for the trend line with optional area fill (fill_opacity). X-axis = dates, Y-axis = values. Render annotations as labeled markers: peak (triangle up), low (triangle down), average (dashed horizontal line). Grid lines at sensible intervals. If show_weekday_pattern is true, add subtle mini-bars along the bottom showing day-of-week averages.

donut-chart

Render using SVG <circle> elements with stroke-dasharray/stroke-dashoffset. Use this exact formula — do not iterate or try alternative approaches:

circumference = 2 * π * radius    (e.g., radius=70 → C ≈ 439.82)

For each segment i (ordered by value descending):
  arc_len_i    = (value_i / total) * circumference
  start_i      = sum of all previous arc_lens (0 for first segment)
  dasharray    = "arc_len_i, (circumference - arc_len_i)"
  dashoffset   = circumference - start_i
  transform    = "rotate(-90, cx, cy)"     ← starts at 12 o'clock

Each segment is a <circle cx cy r> with fill="none", stroke="{segment_color}", stroke-width="20". Stack all circles at the same position — the dasharray/dashoffset combination makes each one draw only its arc portion. Add <title> tooltips.

Legend to the right or below. If show_center_total is true, display the total count in the donut center. If compact is true, reduce the donut radius and legend font size to fit alongside a stacked widget below.
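
The formula above translates directly into code. The following is a minimal sketch, assuming hypothetical parameter names and a fixed two-decimal rounding; it emits one <circle> per segment with the dasharray, dashoffset, and rotation described above.

```python
import math

def donut_segments(values, colors, cx=100, cy=100, radius=70, stroke_width=20):
    """Build one <circle> string per segment using the dasharray/dashoffset formula."""
    circumference = 2 * math.pi * radius
    total = sum(values)
    if total <= 0:
        return []  # nothing to draw
    # Segments are ordered by value, largest first.
    pairs = sorted(zip(values, colors), key=lambda p: p[0], reverse=True)
    start, circles = 0.0, []
    for value, color in pairs:
        arc_len = value / total * circumference
        circles.append(
            f'<circle cx="{cx}" cy="{cy}" r="{radius}" fill="none" '
            f'stroke="{color}" stroke-width="{stroke_width}" '
            f'stroke-dasharray="{arc_len:.2f}, {circumference - arc_len:.2f}" '
            f'stroke-dashoffset="{circumference - start:.2f}" '
            f'transform="rotate(-90, {cx}, {cy})"><title>{value}</title></circle>'
        )
        start += arc_len  # the next segment begins where this one ends
    return circles
```

For values 50, 25, 25 at radius 70, the first circle gets a dash of about 219.91 with offset 439.82 (the full circumference), and the second starts at offset 219.91, directly after the first arc.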

waterfall-chart

Stacked/cascading vertical bars: each segment starts where the previous ended. Negative segments (benefits) flow downward. Show values on each bar. Final bar shows net total.

progress-bar

Horizontal bar showing completion percentage against a target. Render a rounded track (rx="6") in palette.grid_line (or card_bg), filled proportionally with palette.primary (or bar_color if specified). Show the percentage value (bold, 18-22px) to the right of the bar or centered inside the filled portion. Label text above or to the left in palette.text_primary at 12-14px. If target_label is provided, show it at the 100% mark in palette.text_secondary. If thresholds are defined (e.g., [{"at": 90, "color": "success"}, {"at": 50, "color": "warning"}, {"at": 0, "color": "danger"}]), color the fill bar according to which threshold the value meets. If show_remaining is true, display the remaining percentage in muted text after the bar.
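
One way to read the thresholds rule is "use the color of the highest threshold the value reaches". The sketch below assumes that reading; the function name and the fallback to the primary color are illustrative choices, not part of the widget definition.

```python
def threshold_color(value, thresholds, default="primary"):
    """Return the color key of the highest threshold whose 'at' the value meets."""
    for t in sorted(thresholds, key=lambda t: t["at"], reverse=True):
        if value >= t["at"]:
            return t["color"]
    return default  # assumption: fall back to palette.primary when nothing matches

# The example thresholds from the widget description.
EXAMPLE_THRESHOLDS = [
    {"at": 90, "color": "success"},
    {"at": 50, "color": "warning"},
    {"at": 0, "color": "danger"},
]
```

With the example thresholds, a value of 72 resolves to the warning color and a value of 95 to success.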

sparkline

Miniature trend line — a compact <polyline> rendered inline within a kpi-card, delta-kpi-card, or table-widget cell. Dimensions: typically 60-100px wide × 16-24px tall. No axes, labels, or grid lines — just the trend shape. Stroke width 1.5-2px in palette.primary (or line_color if specified). Optional: fill the area below with the same color at 10-15% opacity. If show_endpoints is true, render small circles (r=2) at the first and last data points. If the last value is higher than the first, color the line palette.success; if lower, palette.danger; if auto_color: false, use the specified line_color instead.
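
A sparkline boils down to scaling values into a small box and picking a color from the first and last points. The sketch below is one possible implementation; the function names, the 2px inner padding, and the handling of equal endpoints (fall back to line_color) are assumptions.

```python
def sparkline_points(values, width=80, height=20, pad=2):
    """Map values to an SVG polyline 'points' string (SVG y grows downward)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # flat series: avoid division by zero
    step = (width - 2 * pad) / max(len(values) - 1, 1)
    return " ".join(
        f"{pad + i * step:.1f},{height - pad - (v - lo) / span * (height - 2 * pad):.1f}"
        for i, v in enumerate(values)
    )

def sparkline_color(values, auto_color=True, line_color="primary"):
    """Choose the stroke color key from the trend direction."""
    if not auto_color or values[-1] == values[0]:
        return line_color  # assumption: no trend means no success/danger tint
    return "success" if values[-1] > values[0] else "danger"
```

The resulting string goes straight into `<polyline points="...">`; a rising two-point series in an 80×20 box yields `2.0,18.0 78.0,2.0`.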

table-widget

Rows of data with alternating row backgrounds (card_bg and slightly lighter). Column headers in text_secondary. If color_scale is true for a column, color positive values red and negative green (or vice versa for cost savings). If badge is true, render small severity badges. If highlight_zero is true for a column, render zero values in palette.danger color. If summary_row is specified, add a totals/summary row at the bottom with a top border separator. If stack_below is specified, this widget shares the same column as the named widget above it — render it directly below that widget rather than side-by-side.

recommendation-cards

Side-by-side rounded cards. Left border colored by priority (card_colors). Title bold, description in text_secondary. If show_impact_estimate, add a small impact line.

assessment-banner

Large panel with a colored left border. Title + main assessment text. Sub-fields rendered as bullet lists (key_risks in palette.danger, strengths in palette.success).

coverage-matrix

Compact grid visualization for displaying coverage status across a two-dimensional framework (e.g., MITRE ATT&CK tactics × techniques, permission matrices, data readiness grids). Renders as a grid of small colored <rect> cells organized into columns, where each column represents a category (e.g., tactic) and each cell represents an item (e.g., technique) within that category.

Layout: Columns are arranged left-to-right. Each column has a rotated header label at the top (45° angle, 10-11px text) and a vertical stack of cells below. Columns are variable-height — each has as many cells as items in that category. A legend bar is rendered below the grid mapping colors to status labels.

Cell rendering: Each cell is a small <rect> (default cell_size: 12 × 12px, cell_gap: 2px between cells). Cells are colored according to their status field using the status_colors map from the manifest. Cells within each column are sorted by status priority (covered items at top, uncovered at bottom) to create a visible "waterline" effect. Each cell has a <title> element containing the item name and status for hover tooltips — this is essential since cell text is not rendered at this scale.

Column rendering: Each column is cell_size + cell_gap wide. Columns are separated by col_gap (default 6px). Column header text is right-rotated and positioned above the first cell. An optional column footer shows the count or percentage (e.g., "5/11" or "45%") in 9px text below the last cell.

Legend: Horizontal bar below the grid with colored squares and labels for each status. Rendered in a single row, 10px text, using the status_colors map.

Manifest fields:

Field Required Description
field Data source — array of {column, items: [{name, status}]} objects
status_colors Map of status label → hex color (e.g., custom_rule: "#409AE1", tier_1: "#40C5AF", uncovered: "#21262d")
cell_size Cell width and height in px (default: 12)
cell_gap Gap between cells in px (default: 2)
col_gap Gap between columns in px (default: 6)
show_col_footer Show count/percentage below each column (default: true)
sort_order Array of status labels defining top-to-bottom cell sort order (covered statuses first)
max_rows Cap the tallest column at this many cells; excess items are collapsed into a single "+" cell with count in tooltip

Token budget: This widget is compact by design — 250 cells ≈ 250 <rect> elements (~15KB SVG). No text per cell keeps it efficient. The primary token cost is the <title> tooltip content. For grids exceeding 300 items, set max_rows to cap column height and keep SVG size manageable.

Example use cases: MITRE ATT&CK tactic × technique coverage map, data source × table readiness grid, permission scope × application access matrix, compliance framework × control status.
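
To make the grid geometry concrete, here is a small sketch that computes cell positions from the manifest defaults. The function name and the x0/y0 origin parameters are hypothetical; the column pitch follows the description above (cell_size + cell_gap wide, plus col_gap between columns).

```python
def matrix_cells(columns, sort_order, cell_size=12, cell_gap=2, col_gap=6, x0=0, y0=0):
    """Return (x, y, name, status) for every cell, sorted by status priority."""
    rank = {status: i for i, status in enumerate(sort_order)}
    cells = []
    for c, col in enumerate(columns):
        x = x0 + c * (cell_size + cell_gap + col_gap)
        # Statuses missing from sort_order sink to the bottom of the column.
        items = sorted(col["items"], key=lambda it: rank.get(it["status"], len(rank)))
        for r, it in enumerate(items):
            y = y0 + r * (cell_size + cell_gap)
            cells.append((x, y, it["name"], it["status"]))
    return cells
```

Each tuple maps to one `<rect>` of cell_size × cell_size with a `<title>` built from the name and status.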

Quality Standards

  • All text must be legible — minimum 10px font size.
  • Maintain consistent rounded corners (rx="8" to rx="12") on all cards and panels.
  • Use <title> elements on interactive-looking elements for accessibility.
  • Encode any special characters in text (&amp;, &lt;, etc.).
  • The SVG must be fully self-contained — no external stylesheets, fonts, or images.
  • Add a <!-- Generated by Copilot SVG Dashboard Generator --> comment at the top.
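
Several of these standards (explicit fill, minimum font size, escaped text) can be enforced by one small helper. The sketch below is illustrative only; the function name and its defaults are assumptions, and the escaping uses Python's standard xml.sax.saxutils.escape.

```python
from xml.sax.saxutils import escape

def svg_text(x, y, content, fill="#e6edf3", size=12):
    """Emit a <text> element with an explicit fill, a 10px floor, and escaped content."""
    size = max(size, 10)  # legibility floor from the quality standards
    return f'<text x="{x}" y="{y}" fill="{fill}" font-size="{size}">{escape(content)}</text>'
```

For instance, a label such as `ATT&CK <T1059>` is emitted as `ATT&amp;CK &lt;T1059&gt;`, which keeps the SVG well-formed.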

Output

  • Manifest mode: Save to the same directory as the report, with filename pattern: {report_basename}_dashboard.svg
  • Freeform mode: Save to temp/{descriptive_name}_dashboard.svg or a user-specified path
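Turns threat-intelligence articles into testable KQL hunting campaigns. Supports RSS/Atom batch processing or single-article mode; parses, applies a relevance filter, writes, tests and tunes queries, and publishes files with no Git side effects, producing structured results and an in-chat hunt findings summary.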
threat intel campaign ingest threat intelligence TI feed write hunts from this article threat intelligence blog build a hunting campaign
.github/skills/threat-intel-campaign/SKILL.md
npx skills add SCStelz/security-investigator --skill threat-intel-campaign -g -y
SKILL.md
Frontmatter
{
    "name": "threat-intel-campaign",
    "description": "Turn a published threat-intelligence article into a tested threat-hunting campaign. Reads a platform-agnostic RSS\/Atom feed (feed_url is a parameter — nothing vendor-specific is hardcoded), triages articles from a recent window, applies a huntability relevance gate to decide whether an article warrants a campaign, then writes\/tests\/tunes KQL hunts and publishes them as a campaign file under queries\/threat-intelligence\/YYYY-MM\/. Also supports a single-article mode (pass an article URL directly). Side-effect-free: it writes campaign files and regenerates the manifest\/TOCs but performs NO git commits or PRs — branch\/PR orchestration belongs to the calling automation. Trigger keywords: \"threat intel campaign\", \"ingest threat intelligence\", \"TI feed\", \"write hunts from this article\", \"threat intelligence blog\", \"build a hunting campaign\"."
}

Threat Intelligence Campaign Authoring — Instructions

Purpose

This skill converts published threat-intelligence reporting into tested, tuned, publish-ready threat-hunting campaigns that land in queries/threat-intelligence/YYYY-MM/. It exists to be driven either:

  • Interactively — a human gives one article URL ("read this article and write/test/tune hunts"), or
  • Unattended — a scheduled automation passes a feed URL and the skill triages everything published in a recent window.

It does the authoring (parse → triage → relevance gate → write → test → tune → publish files → regenerate manifest/TOCs). It deliberately does NOT create branches, commits, or pull requests. That orchestration — and the per-article PR isolation — belongs to the calling workflow. This keeps the skill reusable and free of git side effects when a human runs it.

What this skill produces:

| Output | Description |
|---|---|
| Campaign file(s) | queries/threat-intelligence/YYYY-MM/<slug>.md in the standard campaign format |
| Regenerated artifacts | .github/manifests/discovery-manifest.yaml + per-file Quick Reference TOCs |
| Structured result | A JSON array (one entry per article) the calling automation consumes to drive per-article PRs |
| Human summary | A readable per-article decision log |
| In-chat hunt findings summary | A per-article report of what the test runs actually surfaced: real hits, false positives to tune, and follow-up actions. Emitted to chat/run output only; never written to a tracked file. This is where concrete findings live, keeping the committed campaign file PII-free. |

📑 TABLE OF CONTENTS

  1. Critical Workflow Rules
  2. Prerequisites
  3. Inputs / Parameters
  4. Invocation Modes
  5. Execution Workflow — Phase 0–6
  6. The Relevance Gate (Huntability Rubric)
  7. Writing / Testing / Tuning Queries
  8. Campaign File Format
  9. Structured Output Contract
  10. In-Chat Hunt Findings Summary
  11. Known Pitfalls
  12. Quality Checklist

⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

  1. No git side effects. This skill NEVER runs git commit, git push, gh pr create, or any branch operation. It writes files and regenerates the manifest/TOCs only. Publishing (branch + commit + PR per article) is the calling automation's job. If a human is running this interactively, leave the files in the working tree for them to review.

  2. feed_url is a parameter — nothing vendor-specific is hardcoded. The skill handles any RSS 2.0 or Atom feed. The Microsoft Threat Intelligence feed is just one value a caller may pass; do not assume it.

  3. Advanced Hunting (≤30d) is the primary test/tune engine. Write and validate every query against RunAdvancedHuntingQuery within a 30-day window. Fall back to the Sentinel Data Lake (query_lake, >30d) only when you need additional supporting evidence that AH's 30-day cap cannot provide (e.g., confirming a rare IOC's longer-term absence/presence). Follow the Tool Selection Rule and timestamp-adaptation guidance in copilot-instructions.md.

  4. ⛔ Evidence-based "tested" claims only. A query may be described as tested in the campaign file only if it was actually executed. If a query could not be run (table absent, AH safety filter, telemetry gap), say so explicitly and mark cd_ready: false with honest adaptation_notes. Never imply validation that did not happen. Follow the Evidence-Based Analysis rule in copilot-instructions.md.

  5. ⛔ Committed output is PII-free — but published IOCs are NOT PII. Test/tune runs against the live tenant, but campaign files are version-controlled. NEVER paste real tenant entities (your UPNs, hostnames, IPs, workspace/tenant GUIDs, app names) into a campaign file. The article's published IOCs (hashes, domains, URLs, certs, filenames) are the opposite — they are public, shareable, and MUST be included verbatim in the IOC Reference table and the IOC-sweep queries (this is how the committed companion files do it). Do not placeholder or omit a published IOC. Concrete tenant findings from test runs belong in the In-Chat Hunt Findings Summary, not the file. Perform a PII sanity-check before finalizing each file.

  6. ⛔ Every IOC must trace to the article. Never invent one. Copy IOCs from the article's "Indicators of compromise" table (and any inline-cited indicators) exactly. Before finalizing, re-open the article's IOC section and confirm each hash/domain/URL/filename in your file appears there character-for-character. Hallucinated or mis-transcribed IOCs are a critical evidence-integrity failure — they produce false detections and erode trust. If an indicator only appears in narrative prose (not the IOC table), label it as such.

  7. Reference, don't reinvent. Use the kql-query-authoring skill's discipline for query construction (schema validation via kql-search MCP, table pitfalls, TimeGenerated vs Timestamp), and the detection-authoring skill's CD Metadata Contract for the <!-- cd-metadata --> block on every query. Read those SKILL.md files when authoring.

  8. Workspace selection. Follow the SENTINEL WORKSPACE SELECTION rule in copilot-instructions.md. In unattended runs the caller will specify the workspace; if exactly one exists, auto-select and state it.

  9. Read config.json for workspace ID, tenant, and Azure MCP parameters before querying.

  10. Quiet runs are a success, not a failure. If nothing in the window qualifies, that is a valid, expected outcome. Emit an empty/"skipped"-only result set and stop — do not lower the bar to manufacture a campaign.


Prerequisites

| Dependency | Used for |
|---|---|
| kql-search MCP (GITHUB_TOKEN set) | Schema validation, table discovery, community query examples |
| Sentinel Triage MCP (RunAdvancedHuntingQuery) | Primary query testing/tuning (≤30d) |
| Sentinel Data Lake MCP (query_lake) | Supporting evidence only (>30d) |
| Microsoft Learn MCP | Grounding TTP/technique/error-code explanations |
| Python 3 (stdlib xml.etree, urllib) | RSS/Atom parsing; no external dependency required |
| web_fetch / web_search tools | Fetching article bodies and the feed |
| .github/manifests/build_manifest.py, scripts/generate_tocs.py | Post-processing |

Feed parsing uses Python stdlib (xml.etree.ElementTree) so it works unattended without pip install. feedparser may be used if already installed, but never assume it.


Inputs / Parameters

| Parameter | Default | Description |
|---|---|---|
| feed_url | (required in feed mode) | Any RSS 2.0 / Atom feed URL |
| article_url | (none) | A single article to process directly (single-article mode) |
| lookback_hours | 24 | How far back to consider feed entries (by published date) |
| max_campaigns | 3 | Cap on campaigns produced per run (bounds tenant query load + review burden) |
| workspace_id | from config.json | Sentinel workspace for testing |
| min_queries / max_queries | 4 / 9 | Soft bounds on queries per campaign |

Invocation Modes

A. Single-article mode — caller passes article_url (the classic "read this article and write hunts" prompt). → Skip Phase 1 (feed) and the time filter. Still run Phase 2 (dedup) and Phase 3 (relevance gate) unless the human explicitly says "build it regardless". Then Phases 4–6 for that one article.

B. Feed mode — caller passes feed_url (+ optional lookback_hours). → Full Phase 0–6 across all qualifying entries in the window, capped at max_campaigns.


Execution Workflow

Phase 0 — Setup

  1. Read config.json (workspace ID, tenant, subscription, Azure MCP params).
  2. Resolve workspace per the selection rule. State which workspace is in use.
  3. Confirm kql-search + Triage MCP are available (needed for testing).

Phase 1 — Fetch & parse the feed (feed mode only)

Fetch feed_url and parse entries with Python stdlib so it works for both RSS and Atom:

import sys
import urllib.request
import datetime as dt
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

feed_url = sys.argv[1]
lookback_hours = int(sys.argv[2]) if len(sys.argv) > 2 else 24
cutoff = dt.datetime.now(dt.timezone.utc) - dt.timedelta(hours=lookback_hours)

# Fetch the feed with an explicit User-Agent and a timeout.
req = urllib.request.Request(feed_url, headers={"User-Agent": "ti-campaign/1.0"})
with urllib.request.urlopen(req, timeout=30) as resp:
    raw = resp.read()
root = ET.fromstring(raw)
ATOM = "{http://www.w3.org/2005/Atom}"

def text(el, *tags):
    """Return the first non-empty child text, trying the plain (RSS) tag, then the Atom one."""
    for t in tags:
        for tag in (t, ATOM + t):
            f = el.find(tag)
            if f is not None and f.text and f.text.strip():
                return f.text.strip()
    return None

def parse_date(s):
    """Parse an RSS (RFC 822) or Atom (ISO 8601) date into a timezone-aware datetime."""
    if not s:
        return None
    try:
        d = parsedate_to_datetime(s)                                 # RSS pubDate
    except Exception:
        try:
            d = dt.datetime.fromisoformat(s.replace("Z", "+00:00"))  # Atom ISO
        except Exception:
            return None
    # Treat naive timestamps as UTC so comparing against cutoff cannot raise TypeError.
    return d if d.tzinfo else d.replace(tzinfo=dt.timezone.utc)

entries = list(root.iter("item")) or list(root.iter(ATOM + "entry"))  # RSS first, else Atom
for e in entries:
    title = text(e, "title")
    link = text(e, "link")
    if not link:
        # Atom carries the URL in <link href="...">, not in the element text.
        a = e.find(ATOM + "link")
        link = a.get("href") if a is not None else None
    pub = parse_date(text(e, "pubDate", "published", "updated"))
    if pub and pub >= cutoff:
        print(f"{pub.isoformat()}\t{title}\t{link}")

Run it with the powershell tool (python script.py <feed_url> <lookback_hours>). Collect (published, title, link) for entries inside the window. If the feed only exposes summaries, you'll fetch full bodies in Phase 3.

Phase 2 — Dedup against existing campaigns

For each candidate URL, check whether it's already been turned into a campaign:

  • grep for the article URL (and a normalized form without trailing slash / query string) across queries/threat-intelligence/**.
  • Also grep the proposed slug. If a match exists → mark decision: "skipped", reason: "already published", and drop it from the work list.
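
The dedup check can also be done in Python rather than with grep. The sketch below is one possible approach, not the skill's own tooling: both function names are hypothetical, the slug check is simplified to a filename match, and the URL check is a plain substring search over existing campaign files.

```python
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url: str) -> str:
    """Drop query string, fragment, and trailing slash; lowercase scheme and host."""
    p = urlsplit(url.strip())
    return urlunsplit((p.scheme.lower(), p.netloc.lower(), p.path.rstrip("/"), "", ""))

def already_published(url: str, slug: str, root: str = "queries/threat-intelligence") -> bool:
    """True if an existing campaign file matches the slug or cites the article URL."""
    norm = normalize_url(url)
    for f in Path(root).rglob("*.md"):
        # Assumption: a slug match means a file with that exact stem already exists.
        if f.stem == slug or norm in f.read_text(encoding="utf-8", errors="ignore"):
            return True
    return False
```

Because the normalized form has no trailing slash or query string, it still matches a stored URL that includes either one.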

Phase 3 — Relevance gate

For each remaining candidate, fetch the full article body (web_fetch) and apply the Huntability Rubric. Produce a decision (campaign / skipped) with a one-line reason. Rank campaign candidates by huntability confidence and keep the top max_campaigns.

Phase 4 — Write / test / tune (per qualifying article)

See Writing / Testing / Tuning Queries. Output: a set of validated queries, each with an honest cd-metadata block and tuning notes.

Phase 5 — Publish the campaign file

Write queries/threat-intelligence/YYYY-MM/<slug>.md in the exact Campaign File Format. YYYY-MM = the article's publication month. <slug> = short, kebab/underscore, descriptive (e.g., soho_router_dns_hijacking). Do NOT hand-write the Quick Reference TOC; generate_tocs.py creates it.
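
As an illustration of the path convention, the sketch below derives the target path from a short descriptive name and the publication date. The helper name is hypothetical, and deriving the slug mechanically from a name is an assumption; in practice the slug is an authoring choice.

```python
import re
from datetime import date

def campaign_path(short_name: str, published: date) -> str:
    """Build queries/threat-intelligence/YYYY-MM/<slug>.md from a name and a date."""
    # Assumption: underscore-separated lowercase slug, as in the example above.
    slug = re.sub(r"[^a-z0-9]+", "_", short_name.lower()).strip("_")
    return f"queries/threat-intelligence/{published:%Y-%m}/{slug}.md"
```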

Phase 6 — Regenerate artifacts + emit results

  1. python .github/manifests/build_manifest.py (regenerate + validate; fix any error-level warnings on your new file).
  2. python scripts/generate_tocs.py (insert the Quick Reference TOC).
  3. Emit the Structured Output Contract JSON + a human summary.
  4. Emit the In-Chat Hunt Findings Summary — the per-article report of what the test runs actually found (hits, false positives to tune, follow-up actions). Chat/run output only; never write it to a tracked file.
  5. Stop. No git.

The Relevance Gate — Huntability Rubric

This is the judgment step: does this article warrant a hunting campaign? Decide with explicit gates, not vibes. Cite the evidence from the article for each gate.

Hard gates (BOTH must PASS to build)

| Gate | PASS criteria | FAIL examples |
|---|---|---|
| G1 — Huntable behavior | Article describes specific, observable attacker behaviors mappable to ≥1 ATT&CK technique (process exec, persistence mechanism, C2 pattern, auth abuse, mailbox manipulation, registry/file artifacts, etc.) | "Threat actor targeted sector X" with no technique detail; pure attribution/geopolitics |
| G2 — Telemetry coverage | ≥1 behavior or IOC maps to a table we ingest (Device*, Email*, Identity*, Signin*/EntraId*, Cloud*, Audit*, OfficeActivity, network/DNS) | Behaviors only observable in telemetry we don't collect (e.g., physical, OT-only with no connector, third-party logs not onboarded) |

Confidence signals (raise/lower priority among passing candidates)

| Signal | Effect |
|---|---|
| Concrete IOCs (hashes, domains, IPs, filenames, command lines, registry keys, user-agents) | ↑↑ strong; enables direct-match hunts |
| Multiple distinct mappable TTPs (richer attack chain) | ↑ |
| Named ATT&CK technique IDs in the article | ↑ |
| Novel TTP not already covered by an existing campaign/query | ↑ |
| Overlaps heavily with an existing campaign | ↓ (consider extending the existing file instead of a new one) |

Auto-skip categories (do not build)

  • Product/feature announcements, GA/preview notices, roadmap posts
  • Analyst-recognition / "named a Leader" / awards
  • Strategy, opinion, policy, or business-update posts
  • Event/webinar recaps and partner marketing
  • Pure data-breach news with no attacker TTPs/IOCs

Decision rule

BUILD if G1 PASS and G2 PASS and (concrete IOCs present OR ≥2 distinct mappable TTPs). Otherwise SKIP with a specific reason (which gate failed / which auto-skip category).

Record the rubric outcome in the structured result reason field (e.g., "BUILD: 4 IOCs + 6 mappable TTPs (endpoint, identity)" or "SKIP: product announcement, G1 fail").
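
The decision rule can be written down as a small function, which makes the gate order explicit. This is a sketch under assumptions: the function name, its arguments, and the exact reason strings are illustrative, and the gate outcomes themselves still come from reading the article.

```python
def huntability_decision(g1_pass, g2_pass, ioc_count, ttp_count, auto_skip_category=None):
    """Return (decision, reason) following the BUILD/SKIP rule."""
    if auto_skip_category:
        return "skipped", f"SKIP: {auto_skip_category}"
    if not g1_pass:
        return "skipped", "SKIP: G1 fail (no huntable behavior)"
    if not g2_pass:
        return "skipped", "SKIP: G2 fail (no telemetry coverage)"
    # Both hard gates passed: require concrete IOCs or at least two mappable TTPs.
    if ioc_count > 0 or ttp_count >= 2:
        return "campaign", f"BUILD: {ioc_count} IOCs + {ttp_count} mappable TTPs"
    return "skipped", "SKIP: no concrete IOCs and fewer than 2 mappable TTPs"
```

The returned pair maps onto the decision and reason fields of the structured result.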


Writing / Testing / Tuning Queries

For each qualifying article:

  1. Extract the TTPs and IOCs. Map each TTP to ATT&CK technique IDs (use Microsoft Learn MCP to confirm technique semantics). Build the IOC table (hashes, domains, IPs, filenames, etc.).

  2. Pick detection surfaces. Map TTPs → tables. Prefer XDR-native tables for AH testing. Check the discovery manifest + grep queries/** first — if an existing query file already covers a TTP, reuse/adapt its pattern and cite it as a companion rather than duplicating.

  3. Author each query following the kql-query-authoring discipline:

    • Validate the table/columns via kql-search MCP (get_table_schema) before writing.
    • Respect the table pitfalls in copilot-instructions.md (e.g., TimeGenerated vs Timestamp, dynamic-field parse_json, IpAddress casing).
    • Datetime filter first; project a useful, PII-light column set; order by/summarize to bound output.
  4. Test in Advanced Hunting (≤30d). Run every query via RunAdvancedHuntingQuery. Apply the Step-5 zero-result sanity check from copilot-instructions.md — a 0-row result must be verified correct (e.g., a direct-IOC sweep returning 0 in a clean environment is the desired outcome; a 0 from a broken filter is not). As you test, record the findings for each query — row count, whether hits look like true/false positives, and any notable entities — so you can build the In-Chat Hunt Findings Summary in Phase 6. These raw findings stay in chat; they do NOT go into the campaign file.

  5. Tune. If a query is noisy, add targeted exclusions (trusted publishers, known service accounts, expected automation) and document them in Tuning Notes — generically, never with live tenant identifiers. Re-run after tuning.

  6. Supporting evidence via Data Lake (>30d), only if needed. If 30 days is insufficient to characterize prevalence/absence of a rare IOC, run a scoped query_lake (adapt Timestamp → TimeGenerated for Sentinel/LA tables). Use this for evidence, not as the primary engine.

  7. CD metadata. Attach a <!-- cd-metadata --> block to every query per the detection-authoring CD Metadata Contract. Set cd_ready: true only for high-fidelity, low-noise queries that actually validated cleanly; otherwise cd_ready: false with adaptation_notes explaining what's needed.

  8. IOC freshness note. IOC-match queries (hashes/domains) rot. Note that operators rotate IOCs and recommend periodic refresh from current MS TI / VirusTotal / a TI indicator table.


Campaign File Format

Match the existing files in queries/threat-intelligence/YYYY-MM/ exactly. Structure:

# <Threat / Actor / Campaign> — Threat Hunts

**Created:** YYYY-MM-DD  
**Platform:** Microsoft Defender XDR | Microsoft Sentinel | Both  
**Tables:** <exact KQL table names, comma-separated>  
**Keywords:** <attack techniques, actor names, tooling, artifacts, field names>  
**MITRE:** <technique/tactic IDs, comma-separated>  
**Domains:** <threat-pulse domain tags: incidents|identity|spn|endpoint|email|admin|cloud|exposure>  
**Timeframe:** Last N days (configurable)  
**Source:** [<Article title> (<date>)](<article_url>)

---

## Threat Overview
<2–4 sentence synopsis grounded in the article. Include actor attribution if stated.>

### TTP Summary
| Capability | TTP |
|---|---|
| ... | ... |

### ⚠️ Hunt Pitfalls
| Pitfall | Mitigation |
|---|---|
| ... | ... |

---

## IOC Reference
<Table of published IOCs (hashes/domains/IPs/filenames). Note they rot; recommend refresh.>

---

## Query 1: <Title>

**Purpose:** <what it detects, and what a clean result looks like>  
**Severity:** <Low|Medium|High>  
**MITRE:** <technique IDs>  
<!-- cd-metadata
cd_ready: true|false
cd_table: <PrimaryTable>
cd_frequency: NRT|Hourly|...
cd_severity: <Low|Medium|High>
cd_mitre: ["T...."]
cd_entities: ["device","file","account",...]
cd_adaptation_notes: "<honest notes>"
-->
` ` `kql
<tested query>
` ` `
**Expected results:** <what to expect; 0-row interpretation if a direct IOC sweep>

---

## Query 2: ...
...

---

## General Tuning Notes
1. IOC refresh ...
2. Telemetry gaps ...
3. CD-readiness summary ...

---

## References
- Microsoft Threat Intelligence — [<title>](<url>)
- MITRE ATT&CK — [<technique/actor>](<attack url>)
- Companion files: [`queries/<domain>/<file>.md`](...)

Header field requirements (enforced by build_manifest.py): Tables, Keywords, MITRE, and Domains are mandatory. Domains values must come from the valid set (incidents, identity, spn, endpoint, email, admin, cloud, exposure). A missing Domains is an error-level manifest warning.
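
As a rough illustration of the header rules above, the following sketch checks a campaign file header for the four mandatory fields and for valid Domains values. It is not the actual build_manifest.py logic: the function name `check_header`, the `**Field:** value` parsing, and the comma-separated Domains format are assumptions based on the template shown earlier.

```python
import re

MANDATORY_FIELDS = ("Tables", "Keywords", "MITRE", "Domains")
VALID_DOMAINS = {"incidents", "identity", "spn", "endpoint",
                 "email", "admin", "cloud", "exposure"}

# Matches header lines such as "**Tables:** DeviceEvents, EmailEvents"
HEADER_LINE = re.compile(r"^\*\*(?P<name>[A-Za-z ]+):\*\*\s*(?P<value>.*?)\s*$")


def check_header(markdown_text):
    """Return a list of problems found in a campaign file header (hypothetical helper)."""
    fields = {}
    for line in markdown_text.splitlines():
        if line.strip() == "---":      # assumed: the header ends at the first rule
            break
        match = HEADER_LINE.match(line)
        if match:
            fields[match.group("name")] = match.group("value")

    problems = [f"missing mandatory field: {name}"
                for name in MANDATORY_FIELDS if not fields.get(name)]

    # Assumed format: multiple domains are comma-separated on one line.
    domains = [d.strip() for d in fields.get("Domains", "").split(",") if d.strip()]
    problems += [f"invalid domain: {d}" for d in domains if d not in VALID_DOMAINS]
    return problems
```

An empty list means the header passes these checks; the real manifest build may enforce more than this sketch does.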

Do NOT pre-write a ## Quick Reference — Query Index section — generate_tocs.py inserts it. Pre-creating it breaks the strip-and-reinsert logic.


Structured Output Contract

At the end of every run, emit a JSON array (one object per article considered) so the calling automation can isolate per-article PRs. Print it in a fenced ```json block:

[
  {
    "article_title": "SOHO router compromise leads to DNS hijacking...",
    "article_url": "https://www.microsoft.com/en-us/security/blog/2026/04/07/...",
    "published": "2026-04-07T00:00:00Z",
    "decision": "campaign",
    "reason": "BUILD: 6 mappable TTPs + IOCs (endpoint, identity)",
    "file_path": "queries/threat-intelligence/2026-04/dns_hijacking_soho_compromise.md",
    "queries_written": 9,
    "queries_tested": 9,
    "queries_cd_ready": 4,
    "domains": ["endpoint", "identity"]
  },
  {
    "article_title": "Microsoft named a Leader in ...",
    "article_url": "https://www.microsoft.com/en-us/security/blog/2026/04/05/...",
    "published": "2026-04-05T00:00:00Z",
    "decision": "skipped",
    "reason": "SKIP: analyst-recognition post, G1 fail",
    "file_path": null
  }
]

decision ∈ campaign | skipped. For skipped, file_path is null. Follow the JSON block with a short human-readable summary (counts, what was built, what was skipped and why).
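
The contract above can be checked mechanically before the result is handed to the calling automation. The sketch below is one possible validator, not part of the skill: `validate_results` is a hypothetical name, and the rule that a campaign entry must carry a non-empty file_path is inferred from the example rather than stated.

```python
import json

ALLOWED_DECISIONS = {"campaign", "skipped"}


def validate_results(raw_json):
    """Check a run's JSON result array against the contract; return a list of errors."""
    errors = []
    results = json.loads(raw_json)
    if not isinstance(results, list):
        return ["top-level value must be a JSON array"]

    for index, item in enumerate(results):
        decision = item.get("decision")
        if decision not in ALLOWED_DECISIONS:
            errors.append(f"[{index}] decision must be campaign or skipped, got {decision!r}")
        if decision == "skipped" and item.get("file_path") is not None:
            errors.append(f"[{index}] skipped entries must have file_path null")
        # Assumption: a built campaign always points at its file.
        if decision == "campaign" and not item.get("file_path"):
            errors.append(f"[{index}] campaign entries need a file_path")
        for key in ("article_title", "article_url", "published", "reason"):
            if not item.get(key):
                errors.append(f"[{index}] missing {key}")
    return errors
```

Returning a list of errors instead of raising keeps every violation visible in a single pass, which suits an unattended run.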


In-Chat Hunt Findings Summary

After the structured JSON, emit a per-article findings summary that reports what the test runs actually surfaced in the tenant. This is the counterpart to the PII-free campaign file: the file is the reusable, sanitized hunt definition; this summary is the investigation result of running those hunts right now.

Where it goes: chat / run output only. Never write it to a tracked file (not the campaign file, not any queries/** or docs/** file). For unattended runs, the calling automation decides where to route it (e.g., PR description, notification, ticket) per its own data-handling policy — the skill just emits it.

PII posture: Unlike the committed campaign file, this summary may include the concrete entities an analyst needs to act (device names, UPNs, IPs, file hashes, sender addresses, message IDs) — it is investigation output to an operator who already has tenant access, the same as any other investigation skill's chat output. Do not redact what's needed for triage; do not persist it to the repo.

Skip when nothing actionable: If every query returned a verified-clean 0 (e.g., all IOC sweeps clean in a tenant where the IOCs predate the AH window), say so in one line per query rather than padding. The value is in the hits and the FPs, not in restating "0 rows" decoratively.

Format

## 🔎 Hunt Findings — <Article Title> (<run date>)
**Workspace:** <name>  **Lookback:** <window>  **Queries run:** <n>

| # | Query | Rows | Assessment | Action |
|---|-------|------|------------|--------|
| 1 | AI-brand display-name spoof | 0 | ✅ Clean (tuned; no spoofed-domain phish in window) | None |
| 4 | Fake-AI installer download | 3 | 🟠 2 likely FP (sanctioned vendor), 1 to review | Verify host on DEVICE-X; see below |
| 7 | Endpoint IOC sweep | 0 | ✅ Clean — IOCs predate 30d AH window | Re-run in Data Lake (90d) for retrospective |

### 🔴 True / suspected positives
- **Q4 — DEVICE-X / user@contoso.com:** downloaded `seedance_setup_x64.exe` from `hxxp://…` at <time>. Not a sanctioned vendor host. **Follow-up:** isolate/triage device, pivot Q5 (execution) + Q7 (C2).

### 🟠 False positives to tune
- **Q4 — 2 rows:** `<vendor>` installer from `downloads.<vendor>.com` — legitimate. **Tuning:** add `downloads.<vendor>.com` to the trusted-host list (reflected generically in the file's Tuning Notes, not as a literal tenant value).

### ⚠️ Follow-up actions
- [ ] Re-run Q3/Q7 IOC sweeps in Sentinel Data Lake (>30d) for retrospective coverage.
- [ ] Confirm DEVICE-X download disposition with endpoint team.
- [ ] If positives confirmed, consider promoting Q1/Q4/Q5 to custom detections (see detection-authoring skill).

Closing the loop with the campaign file: when a finding reveals a tuning need (e.g., a legitimate host triggering FPs), capture the generic fix in the campaign file's Tuning Notes / adaptation_notes (e.g., "exclude sanctioned vendor download hosts") — never the literal tenant value. The findings summary names the specific host; the file describes the class of exclusion.


Known Pitfalls

| Pitfall | Mitigation |
|---|---|
| Feed exposes only summaries, not full TTPs | Always web_fetch the full article body before the relevance gate and query authoring |
| Atom vs RSS schema differences (<entry>/<published> vs <item>/<pubDate>) | The Phase-1 parser handles both; never hardcode one shape |
| Treating a marketing/recognition post as huntable | Apply the auto-skip categories; G1 must find real behavior |
| Claiming a query is "tested" when it errored or hit the AH safety filter | Only mark tested if it ran and returned a sane result; otherwise cd_ready: false + notes |
| Pasting live tenant entities (from test runs) into the committed file | Campaign files are PII-free; test data informs tuning notes only |
| Placeholdering or omitting an article's published IOCs | Published IOCs are public, not PII: transcribe them verbatim into the IOC Reference table AND the IOC-sweep queries. Never ship <HASH1> placeholders or "see table" stand-ins. |
| Inventing / mis-transcribing an IOC | Every hash/domain/URL/filename must appear character-for-character in the article's IOC table. Re-verify against the source before finalizing; a hallucinated IOC is a critical evidence-integrity failure. |
| Using Data Lake as the primary engine | AH ≤30d is primary; Data Lake >30d for supporting evidence only |
| Hand-writing the Quick Reference TOC | Let generate_tocs.py generate it |
| Forgetting to regenerate the manifest | Always run build_manifest.py after writing files; resolve error-level warnings |
| Duplicating an existing campaign/query | Grep first; extend or cite companions instead of duplicating |
| Performing git operations | Never; publishing is the automation's responsibility |
| Putting real findings/entities in the committed file, OR omitting them from chat | Two separate outputs: campaign file = PII-free reusable hunt; In-Chat Hunt Findings Summary = real hits/FPs/follow-ups (chat only). Don't merge them. |

Quality Checklist

Before emitting results, confirm:

  • Every candidate has an explicit decision + evidence-based reason
  • Each campaign file matches the standard format (header fields complete, Domains valid)
  • Every query has a cd-metadata block with an honest cd_ready value
  • Every query was tested in Advanced Hunting (or its non-execution is documented)
  • Zero-result queries were sanity-checked (desired vs broken)
  • No live tenant PII anywhere in the committed file
  • Every published IOC from the article is present verbatim (no placeholders/omissions) and every IOC in the file traces back to the article's IOC table (no invented/mis-transcribed indicators)
  • build_manifest.py runs clean (no error-level warnings on the new file)
  • generate_tocs.py has inserted the Quick Reference TOC
  • Structured JSON result emitted + human summary
  • In-Chat Hunt Findings Summary emitted (hits, FPs to tune, follow-ups) — chat/run output only, not a tracked file
  • No git operations performed
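Security skill for comprehensive investigation of Entra ID user accounts, covering sign-in anomalies, MFA status, device compliance, and Identity Protection risk. Supports multiple output formats and includes shortcut investigation paths; suited to security incident analysis and compliance review.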
investigate user security investigation user investigation check user activity analyze sign-ins
.github/skills/user-investigation/SKILL.md
npx skills add SCStelz/security-investigator --skill user-investigation -g -y
SKILL.md
Frontmatter
{
    "name": "user-investigation",
    "description": "Use this skill when asked to investigate a user account for security issues, suspicious activity, or compliance review. Triggers on keywords like \"investigate user\", \"security investigation\", \"user investigation\", \"check user activity\", \"analyze sign-ins\", or when a UPN\/email is mentioned with investigation context. This skill provides comprehensive Entra ID user security analysis including sign-in anomalies, MFA status, device compliance, audit logs, security incidents, Identity Protection risk, and automated reports (HTML, markdown file, or inline chat).",
    "drill_down_prompt": "Investigate user {entity} — sign-in anomalies, MFA, audit trail, Identity Protection risk",
    "threat_pulse_domains": [
        "identity"
    ]
}

User Security Investigation - Instructions

Purpose

This skill performs comprehensive security investigations on Entra ID user accounts, analyzing sign-in patterns, anomalies, MFA status, device compliance, audit logs, Office 365 activity, security incidents, and Identity Protection risk signals.


📑 TABLE OF CONTENTS

  1. Critical Workflow Rules - Start here!
  2. Investigation Types - Standard/Quick/Comprehensive
  3. Output Modes - Inline / Markdown file / HTML report
  4. Quick Start - 6-step investigation pattern
  5. Execution Workflow - Complete process
  6. Sample KQL Queries - Validated query patterns
  7. Microsoft Graph Queries - Identity Protection integration
  8. Markdown Report Template - Full markdown report structure
  9. JSON Export Structure - Required fields (HTML report)
  10. Error Handling - Troubleshooting guide
  11. SVG Dashboard Generation - Visual dashboard from report data

Investigation shortcuts:

  • Risky user quick triage (TP Q3): Q6 (security incidents) → Q2 (anomalies) → Q12 (UEBA anomalies) → Q3d (sign-ins by IP) → Graph: MFA methods
  • Compromised user forensics (TP Q3+Q9): Q3 (sign-in summary) → Q5 (OfficeActivity) → Q3d (IP breakdown) → Q1 (priority IPs for enrichment)
  • Password spray target (TP Q4): Q3c (sign-in failures) → Q3d (IPs hitting this user) → Q6 (related incidents)
  • Post-incident user timeline (TP Q1, incident follow-up): Q4 (audit logs) → Q5 (O365 activity) → Q10 (DLP events) → Q6 (all incidents)
  • IP enrichment for user (TP Q3+Q4): Q1 (priority IP extraction) → Q11 (TI matches) → enrich_ips.py
  • UEBA behavioral context (TP Q3, portal UEBA anomalies): Q12 (Anomalies table) → Q6 (related incidents) → Q4 (audit trail)

⛔ Shortcut Default Rule: When a matching shortcut exists for the investigation context, use it — don't run the full workflow. Only run full Batch 1 + Batch 2 when the user explicitly requests "full investigation", "comprehensive", or "deep dive". Shortcuts render only the report sections relevant to their query chain (plus Executive Summary and Recommendations, always).


⚠️ CRITICAL WORKFLOW RULES - READ FIRST ⚠️

Before starting ANY user investigation:

  1. ALWAYS get User Object ID FIRST (required for SecurityIncident and Identity Protection queries)
  2. ALWAYS calculate date ranges correctly (use current date from context - see Date Range section)
  3. ALWAYS ask the user for output mode if not specified: inline chat summary, markdown file report, HTML report, or any combination (see Output Modes)
  4. ALWAYS track and report time after each major step (mandatory)
  5. ALWAYS run independent queries in parallel (drastically faster execution)
  6. ALWAYS use create_file for JSON export and markdown reports (NEVER use PowerShell terminal commands)
  7. ⛔ ALWAYS enforce Sentinel workspace selection (see Workspace Selection section below)

⛔ MANDATORY: Sentinel Workspace Selection

This skill requires a Sentinel workspace to execute queries. Follow these rules STRICTLY:

When invoked from a parent skill (incident-investigation, threat-pulse, etc.):

  • Inherit the workspace selection from the parent investigation context
  • If no workspace was selected in parent context: STOP and ask user to select
  • Use the SELECTED_WORKSPACE_IDS passed from the parent skill
  • Skip output mode prompts — default to inline chat (the parent skill controls the final output format)

When invoked standalone (direct user request):

  1. ALWAYS call list_sentinel_workspaces MCP tool FIRST
  2. If 1 workspace exists: Auto-select, display to user, proceed
  3. If multiple workspaces exist:
    • Display all workspaces with Name and ID
    • ASK: "Which Sentinel workspace should I use for this investigation?"
    • ⛔ STOP AND WAIT for user response
    • ⛔ DO NOT proceed until user explicitly selects
  4. If a query fails on the selected workspace:
    • ⛔ DO NOT automatically try another workspace
    • STOP and report the error
    • Display available workspaces
    • ASK user to select a different workspace
    • WAIT for user response

Workspace Failure Handling

IF query returns "Failed to resolve table" or similar error:
    - STOP IMMEDIATELY
    - Report: "⚠️ Query failed on workspace [NAME] ([ID]). Error: [ERROR_MESSAGE]"
    - Display: "Available workspaces: [LIST_ALL_WORKSPACES]"
    - ASK: "Which workspace should I use instead?"
    - WAIT for explicit user response
    - DO NOT retry with a different workspace automatically
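
The standalone selection rules and the failure handling above amount to a small decision procedure. The following sketch expresses it under stated assumptions: workspaces are dicts with "name" and "id" keys, the function names are hypothetical, and the empty-list case (not covered by the rules) is treated as a stop.

```python
def choose_workspace(workspaces, user_choice=None):
    """Decide the next action for standalone workspace selection.

    workspaces: list of dicts with "name" and "id" keys (assumed shape).
    user_choice: workspace ID the user picked, or None if not asked yet.
    Returns (action, workspace): action is "proceed", "ask", or "stop".
    """
    if not workspaces:
        return ("stop", None)               # assumption: nothing to query against
    if len(workspaces) == 1:
        return ("proceed", workspaces[0])   # single workspace: auto-select
    if user_choice is None:
        return ("ask", None)                # multiple: wait for the user
    for workspace in workspaces:
        if workspace["id"] == user_choice:
            return ("proceed", workspace)
    return ("ask", None)                    # unknown ID: ask again, never guess


def on_query_failure(workspaces, failed_workspace, error_message):
    """Build the stop-and-ask message; never retry on another workspace."""
    names = ", ".join(f'{w["name"]} ({w["id"]})' for w in workspaces)
    return (f'Query failed on workspace {failed_workspace["name"]} '
            f'({failed_workspace["id"]}). Error: {error_message}\n'
            f"Available workspaces: {names}\n"
            "Which workspace should I use instead?")
```

Neither function ever picks a fallback workspace on its own, which is the point of the rules: the only automatic choice is the single-workspace case.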

🔴 PROHIBITED ACTIONS:

  • ❌ Selecting a workspace without user consent when multiple exist
  • ❌ Switching to another workspace after a failure without asking
  • ❌ Proceeding with investigation if workspace selection is ambiguous
  • ❌ Assuming a workspace based on previous sessions

Date Range Rules:

  • Real-time/recent searches: Add +2 days to current date for end range
  • Historical ranges: Add +1 day to user's specified end date
  • Example: Current date = Nov 25; "Last 7 days" → datetime(2025-11-18) to datetime(2025-11-27)

Available Investigation Types

Standard Investigation (7 days)

When to use: General security reviews, routine investigations

Example prompts:

Quick Investigation (1 day)

When to use: Urgent cases, recent suspicious activity

Example prompts:

Comprehensive Investigation (30 days)

When to use: Deep-dive analysis, compliance reviews, thorough forensics

Example prompts:

All types include: Anomaly detection, sign-in analysis, IP enrichment, Graph identity data, device compliance, audit logs, Office 365 activity, security alerts, threat intelligence, risk assessment, and automated recommendations.


Output Modes

This skill supports three output modes. ASK the user which they prefer if not explicitly specified. Multiple modes may be selected simultaneously.

Mode 1: Inline Chat Summary (Default)

  • Render the full investigation analysis directly in the chat response
  • Includes key metrics, risk assessment, anomalies, IP intelligence, sign-in patterns, and recommendations
  • Best for quick review and interactive follow-up questions
  • No file output — results stay in the chat context

Mode 2: Markdown File Report

  • Save a comprehensive investigation report to reports/user-investigations/user_investigation_<username>_<YYYYMMDD_HHMMSS>.md
  • All sections from inline mode plus additional detail (full IP tables, query appendix, complete audit trail)
  • Uses the Markdown Report Template defined below
  • Use create_file tool — NEVER use terminal commands for file output
  • Filename pattern: user_investigation_<username>_YYYYMMDD_HHMMSS.md (extract username from UPN, e.g., jdoe from jdoe@contoso.com)
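
For illustration, the filename pattern above could be produced as follows. This is a minimal sketch; `report_filename` is a hypothetical helper, and the skill itself writes the file through create_file.

```python
from datetime import datetime


def report_filename(upn, when):
    """Build the Mode 2 report path from a UPN and a timestamp."""
    username = upn.split("@", 1)[0]          # jdoe@contoso.com -> jdoe
    stamp = when.strftime("%Y%m%d_%H%M%S")   # YYYYMMDD_HHMMSS
    return f"reports/user-investigations/user_investigation_{username}_{stamp}.md"


print(report_filename("jdoe@contoso.com", datetime(2025, 11, 25, 14, 30, 5)))
# reports/user-investigations/user_investigation_jdoe_20251125_143005.md
```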

Mode 3: HTML Report (Legacy)

  • Export investigation data to JSON, then generate a styled HTML report via generate_report_from_json.py
  • Interactive IP cards, paginated tables, copy-KQL buttons, and risk-colored visualizations
  • Best for sharing with stakeholders who prefer a polished visual report
  • Requires the Python report generator pipeline (JSON export → IP enrichment → HTML generation)

Markdown Rendering Notes

  • ✅ ASCII tables, box-drawing characters, and bar charts render perfectly in markdown code blocks
  • ✅ Unicode block characters (█ full block, ─ box-drawing horizontal) display correctly in monospaced fonts
  • ✅ Emoji indicators (🔴🟢🟡⚠️✅) render natively in GitHub-flavored markdown
  • ✅ Standard markdown tables (| col |) render as formatted tables
  • Tip: Wrap all ASCII art in triple-backtick code fences for consistent rendering

Mode Selection Examples

| User Request | Mode(s) |
|---|---|
| "Investigate user@domain.com" (no mode specified) | ASK user to choose |
| "Investigate user@domain.com — markdown report" | Mode 2 only |
| "Investigate user@domain.com — full report" | Mode 2 + Mode 3 (both) |
| "Quick investigate user@domain.com" | Mode 1 (inline) |
| "Investigate user@domain.com — HTML report" | Mode 3 only |
| "Investigate user@domain.com — inline and markdown" | Mode 1 + Mode 2 |

Quick Start (TL;DR)

When a user requests a security investigation:

  1. Get User ID:

    mcp_microsoft_mcp_microsoft_graph_suggest_queries("get user by email")
    mcp_microsoft_mcp_microsoft_graph_get("/v1.0/users/<UPN>?$select=id,onPremisesSecurityIdentifier")
    
  2. Determine Output Mode:

    • If user specified: use that mode (inline / markdown / HTML / combination)
    • If not specified: ASK user — "Which output format? Inline chat summary, markdown file report, HTML report, or a combination?"
  3. Run Parallel Queries:

    • Batch 1: 10 Sentinel queries (anomalies, IP extraction, sign-ins, IP counts, audit logs, incidents, etc.)
    • Batch 2: 6 Graph queries (profile, MFA, devices, Identity Protection)
    • Batch 3: Threat intel enrichment (after extracting IPs from batch 1)
  4. Generate Output (based on selected mode):

    Mode 1 — Inline: Render analysis directly in chat (no file output)

    Mode 2 — Markdown file:

    create_file("reports/user-investigations/user_investigation_<username>_<timestamp>.md", markdown_content)
    

    Mode 3 — HTML report:

    create_file("temp/investigation_<upn_prefix>_<timestamp>.json", json_content)
    
    $env:PYTHONPATH = "<WORKSPACE_ROOT>"
    .venv\Scripts\python.exe scripts/generate_report_from_json.py temp/investigation_<upn_prefix>_<timestamp>.json
    
  5. IP Enrichment (Modes 2 & 3):

    • Mode 2 (Markdown): Run python enrich_ips.py <ip1> <ip2> ... for top IPs extracted from queries, then include enrichment results in the markdown report
    • Mode 3 (HTML): IP enrichment is handled automatically by generate_report_from_json.py
  6. Track time after each major step and report to user


Execution Workflow

🚨 MANDATORY: Time Tracking Pattern

YOU MUST TRACK AND REPORT TIME AFTER EVERY MAJOR STEP:

[MM:SS] ✓ Step description (XX seconds)

Required Reporting Points:

  1. After User ID retrieval
  2. After parallel data collection
  3. After JSON file creation
  4. After report generation
  5. Final: Total elapsed time
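
One way to produce report lines in the `[MM:SS] ✓ Step description (XX seconds)` shape is a small timer that remembers both the start of the investigation and the end of the previous step. The class below is an illustrative sketch (the name `StepTimer` and the injectable clock are assumptions, the latter added only to make the behaviour testable).

```python
import time


class StepTimer:
    """Tracks elapsed time and formats lines as '[MM:SS] ✓ description (N seconds)'."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self._last = self._start

    def mark(self, description):
        """Return a report line for the step that just finished."""
        now = self._clock()
        total = int(now - self._start)      # elapsed since the investigation began
        step = int(now - self._last)        # duration of this step only
        self._last = now
        minutes, seconds = divmod(total, 60)
        return f"[{minutes:02d}:{seconds:02d}] ✓ {description} ({step} seconds)"
```

Calling `mark()` once at each required reporting point yields the running total in the bracket and the per-step duration in parentheses.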

Phase 1: Get User ID and SID (REQUIRED FIRST)

- Get user Object ID (Entra ID) and onPremisesSecurityIdentifier (Windows SID) from Microsoft Graph
- Query: /v1.0/users/<UPN>?$select=id,onPremisesSecurityIdentifier

Why this is required:

  • User ID needed for SecurityIncident queries (alerts use User ID, not UPN)
  • User ID needed for Identity Protection queries
  • Windows SID needed for on-premises incident matching
  • Missing User ID = missed incidents (e.g., "Device Code Authentication Flow Detected")

Phase 2: Parallel Data Collection

CRITICAL: Use create_file tool to create JSON - NEVER use PowerShell terminal commands!

Batch 1: Sentinel Queries (Run ALL in parallel)

  • IP selection query (Query 1) - Returns up to 15 prioritized IPs
  • Anomalies query (Query 2)
  • UEBA anomaly summary (Query 12) - Sentinel Anomalies table: scored behavioral detections
  • Sign-in by application (Query 3)
  • Sign-in by location (Query 3b)
  • Sign-in failures (Query 3c)
  • Audit logs (Query 4)
  • Office 365 activity (Query 5)
  • DLP events (Query 10)
  • Security incidents (Query 6)

After Batch 1 completes: Extract IP Array from Query 1 Results

  • Extract IPAddress column into array: ["ip1", "ip2", "ip3", ...]
  • Build dynamic array for next batch: let target_ips = dynamic(["ip1", "ip2", "ip3", ...]);
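
The hand-off from Query 1 to Batch 2 can be sketched as below. It assumes the Query 1 result is available as a list of row dicts with an "IPAddress" key, which is an assumption about the tool's output shape; `build_target_ips` is a hypothetical helper.

```python
import ipaddress
import json


def build_target_ips(rows):
    """Turn Query 1 result rows into a KQL 'let target_ips = dynamic([...]);' line.

    rows: list of dicts with an "IPAddress" key (assumed result shape).
    """
    ips = []
    for row in rows:
        value = str(row.get("IPAddress", "")).strip()
        try:
            ipaddress.ip_address(value)     # skip anything that is not a valid IP
        except ValueError:
            continue
        if value not in ips:                # keep Query 1's priority order
            ips.append(value)
    return f"let target_ips = dynamic({json.dumps(ips)});"
```

Validating each value with `ipaddress` before it is spliced into query text also keeps stray strings out of the generated KQL.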

Batch 2: IP Enrichment + Graph Queries (Run ALL in parallel)

  • Threat Intel query (Query 11) - Uses IPs from Query 1
  • IP frequency query (Query 3d) - Uses IPs from Query 1
  • User profile (Graph)
  • MFA methods (Graph)
  • Registered devices (Graph)
  • User risk profile (Graph)
  • Risk detections (Graph)
  • Risky sign-ins (Graph)

IP Selection Strategy (Query 1 - Deterministic KQL with Risky IPs):

  • Priority 1: Anomaly IPs (from Signinlogs_Anomalies_KQL_CL where AnomalyType endswith "IP") - 8 slots
  • Priority 2: Risky IPs (from AADUserRiskEvents - Identity Protection flagged IPs) - 4 slots
  • Priority 3: Frequent IPs (top sign-in count for baseline context) - 3 slots
  • Deduplication: Anomaly IPs are excluded from the risky slots; anomaly and risky IPs are excluded from the frequent slots (no duplicates)
  • Result: Up to 15 unique IPs (8 anomaly + 4 risky-only + 3 frequent-only)
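
The slot allocation is easier to follow outside KQL. The sketch below mirrors the strategy in plain Python, assuming each input list is already deduplicated and sorted by descending count; the function name is hypothetical, and the pool size of 10 follows Query 1 rather than the bullet list.

```python
def select_priority_ips(anomaly, risky, frequent):
    """Pick up to 15 IPs: 8 anomaly, 4 risky-only, 3 frequent-only.

    Each argument is a list of unique IPs sorted by descending count.
    """
    anomaly_slot = anomaly[:8]
    risky_pool = risky[:10]                   # selection pool, as in Query 1
    frequent_pool = frequent[:10]

    risky_slot = [ip for ip in risky_pool if ip not in anomaly_slot][:4]
    # Frequent IPs are deduplicated against the anomaly slot and the whole risky pool.
    excluded = set(anomaly_slot) | set(risky_pool)
    frequent_slot = [ip for ip in frequent_pool if ip not in excluded][:3]
    return anomaly_slot + risky_slot + frequent_slot
```

Because the frequent slot is filtered against the entire risky pool and not only the four selected risky IPs, a risky IP that misses the cut is not re-admitted as a "frequent" one.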

Phase 3: Export & Generate Report (Mode-Dependent)

Mode 1 — Inline Chat Summary

  • No file export needed
  • Render the full investigation analysis directly in chat using the section structure from the Markdown Report Template as a guide
  • Include: Executive Summary, Key Metrics, Anomalies, IP Intelligence summary, Sign-in Patterns, Risk Assessment, Recommendations
  • Use emoji-coded tables for risk factors and mitigating factors

Mode 2 — Markdown File Report

  1. Assess IP enrichment needs:

    • Extract the top priority IPs from Query 1 results
    • Run python enrich_ips.py <ip1> <ip2> ... for threat intelligence enrichment
    • Parse the output to populate IP Intelligence tables in the report
  2. Build the markdown report using the Markdown Report Template below

    • Populate ALL sections with actual query data
    • For sections with no data: use the explicit absence confirmation pattern (e.g., "✅ No anomalies detected...")
    • Calculate risk score and assessment dynamically (same logic as HTML report — see generate_report_from_json.py)
  3. Save the report:

    create_file("reports/user-investigations/user_investigation_<username>_YYYYMMDD_HHMMSS.md", markdown_content)
    
    • Use create_file tool — NEVER use terminal commands for file output
    • Extract username from UPN (e.g., jdoe from jdoe@contoso.com)

Mode 3 — HTML Report (Legacy)

  1. Export to JSON: Create a single JSON file, temp/investigation_{upn_prefix}_{timestamp}.json, and merge all results into one dict structure (see JSON Export Structure section below).

  2. Generate HTML report:

    $env:PYTHONPATH = "<WORKSPACE_ROOT>"
    cd "<WORKSPACE_ROOT>"
    .\.venv\Scripts\python.exe scripts/generate_report_from_json.py temp/investigation_<upn_prefix>_<timestamp>.json
    

The HTML report generator handles:

  • Dataclass transformation logic
  • IP enrichment (prioritized: anomaly IPs first, then frequent sign-in IPs, cap at 10)
  • Dynamic risk assessment (NO hardcoded text - all metrics calculated from data)
  • KQL query template population
  • Result counts calculation
  • HTML report generation with modern, streamlined design

Combining Modes

When multiple modes are selected (e.g., "markdown and HTML"):

  • Run the data collection once (Phase 2)
  • Generate each output format in sequence
  • For Mode 2 + Mode 3: the JSON export from Mode 3 can reuse the same data; generate markdown first, then JSON + HTML

Required Field Specifications

User Profile Query

/v1.0/users/<UPN>?$select=id,displayName,userPrincipalName,mail,userType,jobTitle,department,officeLocation,accountEnabled,onPremisesSecurityIdentifier
  • All fields REQUIRED for report generation
  • Default null values: department="Unknown", officeLocation="Unknown"
  • onPremisesSecurityIdentifier returns Windows SID (format: S-1-5-21-...) - REQUIRED for on-premises incident matching

Device Query

/v1.0/users/<USER_ID>/ownedDevices?$select=id,deviceId,displayName,operatingSystem,operatingSystemVersion,registrationDateTime,isCompliant,isManaged,trustType,approximateLastSignInDateTime&$orderby=approximateLastSignInDateTime desc&$top=5&$count=true
  • All fields REQUIRED for report generation
  • Default null values: trustType="Workplace", approximateLastSignInDateTime="2025-01-01T00:00:00Z"
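
The default-value rules for the profile and device queries can be applied with a small helper before the data reaches the report. This is a sketch only: `apply_defaults` is a hypothetical name, and the report generator may already handle nulls on its own.

```python
PROFILE_DEFAULTS = {"department": "Unknown", "officeLocation": "Unknown"}
DEVICE_DEFAULTS = {
    "trustType": "Workplace",
    "approximateLastSignInDateTime": "2025-01-01T00:00:00Z",
}


def apply_defaults(record, defaults):
    """Return a copy of a Graph record with null or missing fields filled in."""
    filled = dict(record)
    for key, default in defaults.items():
        if filled.get(key) is None:     # covers both missing keys and JSON null
            filled[key] = default
    return filled
```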

MFA Methods Query

/v1.0/users/<USER_ID>/authentication/methods?$top=5

Sample KQL Queries

Replace <UPN>, <StartDate>, <EndDate> in these patterns.

⚠️ CRITICAL: START WITH THESE EXACT QUERY PATTERNS. These queries have been tested and validated. Use them as your PRIMARY reference.

Tool Selection for This Skill

Follow the global tool selection rule from copilot-instructions.md:

| Investigation Lookback | Tool | Reason |
|---|---|---|
| ≤ 30 days (Quick, Standard, Comprehensive) | RunAdvancedHuntingQuery | Free for Analytics-tier tables; covers all connected workspace tables |
| > 30 days (custom range) | mcp_sentinel-data_query_lake | AH only retains 30 days |
| AH query blocked by safety filter | mcp_sentinel-data_query_lake | Fallback |
| AH returns "table not found" | mcp_sentinel-data_query_lake | Fallback |

Default: Use RunAdvancedHuntingQuery for all standard investigations. All three investigation types (1d, 7d, 30d) fit within AH's 30-day retention window. Only fall back to Data Lake when the lookback exceeds 30 days or AH fails.

Timestamp column: All tables used in this skill (SigninLogs, AuditLogs, SecurityAlert, SecurityIncident, OfficeActivity, CloudAppEvents, AADUserRiskEvents, Signinlogs_Anomalies_KQL_CL, ThreatIntelIndicators) use TimeGenerated in both tools — no adaptation needed when switching.


📅 Date Range Quick Reference

🔴 STEP 0: GET CURRENT DATE FIRST (MANDATORY) 🔴

  • ALWAYS check the current date from the context header BEFORE calculating date ranges
  • NEVER use hardcoded years - the year changes and you WILL query the wrong timeframe

RULE 1: Real-Time/Recent Searches (Current Activity)

  • Add +2 days to current date for end range
  • Why +2? +1 for the timezone offset (PST is behind UTC), plus +1 for an inclusive end-of-day
  • Pattern: Today is Nov 25 (PST) → Use datetime(2025-11-27) as end date

RULE 2: Historical Searches (User-Specified Dates)

  • Add +1 day to user's specified end date
  • Why +1? To include all 24 hours of the final day
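
The two rules reduce to simple date arithmetic. The sketch below shows one way to compute the bounds and render them as a KQL filter; the function names are hypothetical, and in a real run `today` must come from the context header, not a hardcoded value as in this demonstration.

```python
from datetime import date, timedelta


def recent_range(today, lookback_days):
    """Rule 1: real-time searches end two days after the current date."""
    return today - timedelta(days=lookback_days), today + timedelta(days=2)


def historical_range(start, end):
    """Rule 2: user-specified ranges end one day after the stated end date."""
    return start, end + timedelta(days=1)


def kql_between(start, end):
    """Format a range as a KQL 'between' clause on datetime literals."""
    return f"between (datetime({start.isoformat()}) .. datetime({end.isoformat()}))"


today = date(2025, 11, 27)   # demonstration value only
print(kql_between(*recent_range(today, 7)))
# between (datetime(2025-11-20) .. datetime(2025-11-29))
print(kql_between(*historical_range(date(2025, 11, 21), date(2025, 11, 23))))
# between (datetime(2025-11-21) .. datetime(2025-11-24))
```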

Examples Table (Assuming Current Date = November 27, 2025):

| User Request | <StartDate> | <EndDate> | Rule Applied |
|---|---|---|---|
| "Last 7 days" | 2025-11-20 | 2025-11-29 | Rule 1 (+2) |
| "Last 30 days" | 2025-10-28 | 2025-11-29 | Rule 1 (+2) |
| "Nov 21 to Nov 23" | 2025-11-21 | 2025-11-24 | Rule 2 (+1) |

🚨 CRITICAL - SIGN-IN QUERIES REQUIREMENT 🚨 You MUST run ALL THREE sign-in queries (3, 3b, 3c) to populate the signin_events dict!


1. Extract Top Priority IPs (Deterministic IP Selection with Risky IPs)

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let upn = '<UPN>';

// Priority 1: Anomaly IPs (top 8 by anomaly count)
let anomaly_ips = 
    Signinlogs_Anomalies_KQL_CL
    | where DetectedDateTime between (start .. end)
    | where UserPrincipalName =~ upn
    | where AnomalyType endswith "IP"
    | summarize AnomalyCount = count(), FirstSeen = min(DetectedDateTime) by IPAddress = Value
    | order by AnomalyCount desc, FirstSeen asc
    | take 8
    | extend Priority = 1, Source = "Anomaly";

// Priority 2: Risky IPs from Identity Protection (top 10 for selection pool)
let risky_ips_pool = 
    AADUserRiskEvents
    | where ActivityDateTime between (start .. end)
    | where UserPrincipalName =~ upn
    | where isnotempty(IpAddress)
    | summarize RiskCount = count(), FirstSeen = min(ActivityDateTime) by IPAddress = IpAddress
    | order by RiskCount desc, FirstSeen asc
    | take 10
    | extend Priority = 2, Source = "RiskyIP";

// Priority 3: Frequent Sign-in IPs (top 10 for selection pool)
let frequent_ips_pool =
    union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
    | where TimeGenerated between (start .. end)
    | where UserPrincipalName =~ upn
    | summarize SignInCount = count(), FirstSeen = min(TimeGenerated) by IPAddress
    | order by SignInCount desc, FirstSeen asc
    | take 10
    | extend Priority = 3, Source = "Frequent";

// Get anomaly IP list for exclusion from risky slot
let anomaly_ip_list = anomaly_ips | project IPAddress;

// Get anomaly + risky IP list for exclusion from frequent slot
let priority_ip_list = 
    union anomaly_ips, risky_ips_pool
    | project IPAddress;

// Reserve slots with deduplication: 8 anomaly + 4 risky + 3 frequent
let anomaly_slot = anomaly_ips | extend Count = AnomalyCount;
let risky_slot = risky_ips_pool 
    | join kind=anti anomaly_ip_list on IPAddress
    | order by RiskCount desc, FirstSeen asc
    | take 4
    | extend Count = RiskCount;
let frequent_slot = frequent_ips_pool 
    | join kind=anti priority_ip_list on IPAddress
    | order by SignInCount desc, FirstSeen asc
    | take 3
    | extend Count = SignInCount;

union anomaly_slot, risky_slot, frequent_slot
| project IPAddress, Priority, Count, Source
| order by Priority asc, Count desc
| project IPAddress

2. Anomalies (Signinlogs_Anomalies_KQL_CL)

Signinlogs_Anomalies_KQL_CL
| where DetectedDateTime between (datetime(<StartDate>) .. datetime(<EndDate>))
| where UserPrincipalName =~ '<UPN>'
| extend Severity = case(
    BaselineSize < 3, "Informational",
    CountryNovelty and CityNovelty and ArtifactHits >= 20, "High",
    ArtifactHits >= 10, "Medium",
    (CountryNovelty or CityNovelty or StateNovelty), "Medium",
    ArtifactHits >= 5, "Low",
    "Informational")
| extend SeverityOrder = case(Severity == 'High', 1, Severity == 'Medium', 2, Severity == 'Low', 3, 4)
| project
    DetectedDateTime,
    UserPrincipalName,
    AnomalyType,
    Value,
    Severity,
    SeverityOrder,
    Country,
    City,
    State,
    CountryNovelty,
    CityNovelty,
    StateNovelty,
    ArtifactHits,
    FirstSeenRecent,
    BaselineSize,
    OS,
    BrowserFamily,
    RawBrowser
| order by SeverityOrder asc, DetectedDateTime desc
| take 10
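
The severity `case()` in the query above is order-sensitive: the first matching branch wins, so a small baseline downgrades an anomaly to Informational regardless of its other signals. The following Python mirror is a sketch for reasoning about that ordering (the function name is hypothetical, and KQL null semantics are ignored).

```python
def anomaly_severity(baseline_size, artifact_hits,
                     country_novelty=False, city_novelty=False, state_novelty=False):
    """Mirror the KQL case(): the first matching branch wins."""
    if baseline_size < 3:
        return "Informational"      # checked first, so it overrides everything below
    if country_novelty and city_novelty and artifact_hits >= 20:
        return "High"
    if artifact_hits >= 10:
        return "Medium"
    if country_novelty or city_novelty or state_novelty:
        return "Medium"
    if artifact_hits >= 5:
        return "Low"
    return "Informational"


print(anomaly_severity(2, 50, country_novelty=True, city_novelty=True))
# Informational
print(anomaly_severity(10, 25, country_novelty=True, city_novelty=True))
# High
```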

3. Interactive & Non-Interactive Sign-ins (Summary by Application)

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated between (start .. end)
| where UserPrincipalName =~ '<UPN>'
| summarize 
    SignInCount=count(),
    SuccessCount=countif(ResultType == '0'),
    FailureCount=countif(ResultType != '0'),
    FirstSeen=min(TimeGenerated),
    LastSeen=max(TimeGenerated),
    IPAddresses=make_set(IPAddress),
    UniqueLocations=dcount(Location)
    by AppDisplayName
| order by SignInCount desc
| take 5

3b. Sign-ins Summary by Location

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated between (start .. end)
| where UserPrincipalName =~ '<UPN>'
| where isnotempty(Location)
| summarize 
    SignInCount=count(),
    SuccessCount=countif(ResultType == '0'),
    FailureCount=countif(ResultType != '0'),
    FirstSeen=min(TimeGenerated),
    LastSeen=max(TimeGenerated),
    IPAddresses=make_set(IPAddress),
    Applications=make_set(AppDisplayName, 5)
    by Location
| order by SignInCount desc
| take 5

3c. Sign-in Failures (Detailed)

let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated between (start .. end)
| where UserPrincipalName =~ '<UPN>'
| where ResultType != '0'
| summarize 
    FailureCount=count(),
    FirstSeen=min(TimeGenerated),
    LastSeen=max(TimeGenerated),
    Applications=make_set(AppDisplayName, 3),
    Locations=make_set(Location, 3)
    by ResultType, ResultDescription
| order by FailureCount desc
| take 5

3d. Sign-in Counts by IP Address

let target_ips = dynamic(["<IP_1>", "<IP_2>", "<IP_3>", ...]);
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let most_recent_signins = union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
| where TimeGenerated between (start .. end)
| where UserPrincipalName =~ '<UPN>'
| where IPAddress in (target_ips)
| summarize arg_max(TimeGenerated, *) by IPAddress;
most_recent_signins
| extend AuthDetails = parse_json(AuthenticationDetails)
| extend HasAuthDetails = array_length(AuthDetails) > 0
| extend AuthDetailsToExpand = iif(HasAuthDetails, AuthDetails, dynamic([{"authenticationStepResultDetail": ""}]))
| mv-expand AuthDetailsToExpand
| extend AuthStepResultDetail = tostring(AuthDetailsToExpand.authenticationStepResultDetail)
| extend AuthPriority = case(
    AuthStepResultDetail has "MFA requirement satisfied", 1,
    AuthStepResultDetail has "Correct password", 2,
    AuthStepResultDetail has "Passkey", 2,
    AuthStepResultDetail has "Phone sign-in", 2,
    AuthStepResultDetail has "SMS verification", 2,
    AuthStepResultDetail has "First factor requirement satisfied", 3,
    AuthStepResultDetail has "MFA required", 4,
    999)
| summarize 
    MostRecentTime = any(TimeGenerated),
    MostRecentResultType = any(ResultType),
    HasAuthDetails = any(HasAuthDetails),
    MinPriority = min(AuthPriority),
    AllAuthDetails = make_set(AuthStepResultDetail)
    by IPAddress
| extend LastAuthResultDetail = case(
    MostRecentResultType != "0", "Authentication failed",
    not(HasAuthDetails) and MostRecentResultType == "0", "Token",
    MinPriority == 1 and AllAuthDetails has "MFA requirement satisfied", "MFA requirement satisfied by claim in the token",
    MinPriority == 2 and AllAuthDetails has "Correct password", "Correct password",
    MinPriority == 2 and AllAuthDetails has "Passkey (device-bound)", "Passkey (device-bound)",
    MinPriority == 3 and AllAuthDetails has "First factor requirement satisfied by claim in the token", "First factor requirement satisfied by claim in the token",
    MinPriority == 4 and AllAuthDetails has "MFA required in Entra ID", "MFA required in Entra ID",
    tostring(AllAuthDetails[0]))
| join kind=inner (
    union isfuzzy=true SigninLogs, AADNonInteractiveUserSignInLogs
    | where TimeGenerated between (start .. end)
    | where UserPrincipalName =~ '<UPN>'
    | where IPAddress in (target_ips)
    | summarize 
        SignInCount = count(),
        SuccessCount = countif(ResultType == '0'),
        FailureCount = countif(ResultType != '0'),
        FirstSeen = min(TimeGenerated),
        LastSeen = max(TimeGenerated)
        by IPAddress
) on IPAddress
| project IPAddress, SignInCount, SuccessCount, FailureCount, FirstSeen, LastSeen, LastAuthResultDetail
| order by SignInCount desc

4. Entra ID Audit Log Activity (Aggregated Summary)

Tool: RunAdvancedHuntingQuery (≤30d) | mcp_sentinel-data_query_lake (>30d fallback)

AH parsing note: InitiatedBy is dynamic in AH — use tostring(InitiatedBy.user.userPrincipalName) for direct field access. For TargetResources, use tostring(TargetResources[0].displayName). Do NOT double-wrap with parse_json(tostring(parse_json(tostring(...)))) — that Data Lake pattern can cause errors in AH.

AuditLogs
| where TimeGenerated between (datetime(<StartDate>) .. datetime(<EndDate>))
| where Identity =~ '<UPN>' or tostring(InitiatedBy) has '<UPN>'
| summarize 
    Count=count(),
    FirstSeen=min(TimeGenerated),
    LastSeen=max(TimeGenerated),
    Operations=make_set(OperationName, 10)
    by Category, Result
| order by Count desc
| take 10

Ad-hoc drill-down pattern (AH-safe): When you need detailed audit entries beyond the summary above:

AuditLogs
| where TimeGenerated between (datetime(<StartDate>) .. datetime(<EndDate>))
| where Identity =~ '<UPN>' or tostring(InitiatedBy) has '<UPN>'
| extend Actor = tostring(InitiatedBy.user.userPrincipalName)
| extend Target = tostring(TargetResources[0].displayName)
| project TimeGenerated, OperationName, Actor, Target, Result, Category
| order by TimeGenerated desc
| take 30

5. Office 365 (Email / Teams / SharePoint) Activity Distribution

OfficeActivity
| where TimeGenerated between (datetime(<StartDate>) .. datetime(<EndDate>))
| where UserId =~ '<UPN>'
| summarize ActivityCount = count() by RecordType, Operation
| order by ActivityCount desc
| take 5

6. Security Incidents with Alerts Correlated to User

let targetUPN = "<UPN>";
let targetUserId = "<USER_OBJECT_ID>";  // REQUIRED: Get from Microsoft Graph API
let targetSid = "<WINDOWS_SID>";  // REQUIRED: Get from Microsoft Graph API
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
let relevantAlerts = SecurityAlert
| where TimeGenerated between (start .. end)
| where Entities has targetUPN or Entities has targetUserId or Entities has targetSid
| summarize arg_max(TimeGenerated, *) by SystemAlertId
| project SystemAlertId, AlertName, AlertSeverity, ProviderName, Tactics;
SecurityIncident
| where CreatedTime between (start .. end)
| summarize arg_max(TimeGenerated, *) by IncidentNumber
| where not(tostring(Labels) has "Redirected")
| mv-expand AlertId = AlertIds
| extend AlertId = tostring(AlertId)
| join kind=inner relevantAlerts on $left.AlertId == $right.SystemAlertId
| extend ProviderIncidentUrl = tostring(AdditionalData.providerIncidentUrl)
| extend OwnerUPN = tostring(Owner.userPrincipalName)
| extend LastModifiedTime = todatetime(LastModifiedTime)
| summarize 
    Title = any(Title),
    Severity = any(Severity),
    Status = any(Status),
    Classification = any(Classification),
    CreatedTime = any(CreatedTime),
    LastModifiedTime = any(LastModifiedTime),
    OwnerUPN = any(OwnerUPN),
    ProviderIncidentUrl = any(ProviderIncidentUrl),
    AlertCount = count()
    by ProviderIncidentId
| order by LastModifiedTime desc
| take 10

CRITICAL: ALL THREE identifiers are REQUIRED (targetUPN, targetUserId, targetSid) - different alert types use different entity formats.

10. DLP Events (Data Loss Prevention)

let upn = '<UPN>';
let start = datetime(<StartDate>);
let end = datetime(<EndDate>);
CloudAppEvents
| where TimeGenerated between (start .. end)
| where ActionType in ("FileCopiedToRemovableMedia", "FileUploadedToCloud", "FileCopiedToNetworkShare")
| extend ParsedData = parse_json(RawEventData)
| extend DlpAudit = ParsedData["DlpAuditEventMetadata"]
| extend File = ParsedData["ObjectId"]
| extend UserId = ParsedData["UserId"]
| extend DeviceName = ParsedData["DeviceName"]
| extend ClientIP = ParsedData["ClientIP"]
| extend RuleName = ParsedData["PolicyMatchInfo"]["RuleName"]
| extend Operation = ParsedData["Operation"]
| extend TargetDomain = ParsedData["TargetDomain"]
| extend TargetFilePath = ParsedData["TargetFilePath"]
| where isnotnull(DlpAudit)
| where UserId == upn
| summarize by TimeGenerated, tostring(UserId), tostring(DeviceName), tostring(ClientIP), tostring(RuleName), tostring(File), tostring(Operation), tostring(TargetDomain), tostring(TargetFilePath)
| order by TimeGenerated desc
| take 5

11. Threat Intelligence IP Enrichment (Bulk IP Query)

Performance notes: Filter IsActive/ValidUntil before transformations per KQL best practices. The triple replace_string was replaced with direct array indexing split(...)[0].

let target_ips = dynamic(["<IP_1>", "<IP_2>", "<IP_3>"]);
ThreatIntelIndicators
| where IsActive and (ValidUntil > now() or isempty(ValidUntil))
| where tostring(split(ObservableKey, ":")[0]) in ("ipv4-addr", "ipv6-addr", "network-traffic")
| where ObservableValue in (target_ips)
| extend Description = tostring(parse_json(Data).description)
| where Description !contains_cs "State: inactive;" and Description !contains_cs "State: falsepos;"
| extend TrafficLightProtocolLevel = tostring(parse_json(AdditionalFields).TLPLevel)
| extend ActivityGroupNames = extract(@"ActivityGroup:(\S+)", 1, tostring(parse_json(Data).labels))
| summarize arg_max(TimeGenerated, *) by ObservableValue
| project 
    TimeGenerated,
    IPAddress = ObservableValue,
    ThreatDescription = Description,
    ActivityGroupNames,
    Confidence,
    ValidUntil,
    TrafficLightProtocolLevel,
    IsActive
| order by Confidence desc, TimeGenerated desc

12. UEBA Anomaly Summary (Sentinel Anomalies Table)

Purpose: Retrieves scored behavioral anomaly detections from Sentinel's built-in UEBA anomaly rules. Aggregates by anomaly type — collapses high-volume rows (e.g., 50 "Anomalous Role Assignment" events) into a single summary row per template. Extracts only the anomalous flags (IsAnomalous == true) and flattens MITRE arrays. Score range: 0.0–1.0 (≥0.7 = High, 0.3–0.7 = Medium, <0.3 = Low).

Data source: The Anomalies table is the KQL source behind the portal's "UEBA anomalies" section. It is distinct from BehaviorInfo (MCAS, AH-only) and BehaviorAnalytics (raw UEBA events, Data Lake-only). Available in both Advanced Hunting and Data Lake.

Tool: RunAdvancedHuntingQuery (default) or mcp_sentinel-data_query_lake (>30d fallback)

⚠️ TI False Positive: DeviceInsights.ThreatIntelIndicatorType frequently shows BruteForce on corporate/Azure egress IPs (TITAN dynamic reputation). Weight the Score and AnomalyFlags over the TI match — a 0.2-score anomaly with a BruteForce TI hit on a known corporate IP is noise.

let targetUPN = '<UPN>';
let lookback = 30d;
Anomalies
| where TimeGenerated > ago(lookback)
| where UserPrincipalName =~ targetUPN
| extend TI_Type = tostring(DeviceInsights.ThreatIntelIndicatorType)
| mv-apply reason = AnomalyReasons on (
    where tobool(reason.IsAnomalous) == true
    | project FlagName = tostring(reason.Name))
| summarize
    Occurrences = dcount(Id),
    MaxScore = max(Score),
    AvgScore = round(avg(Score), 2),
    Tactics = make_set(parse_json(Tactics)),
    Techniques = make_set(parse_json(Techniques)),
    SourceIPs = make_set(SourceIpAddress, 5),
    AnomalyFlags = make_set(FlagName),
    TI_Flags = make_set_if(TI_Type, isnotempty(TI_Type)),
    FirstSeen = min(StartTime),
    LastSeen = max(EndTime),
    SampleDescription = take_any(Description)
    by AnomalyTemplateName
| mv-apply t = Tactics to typeof(string) on (summarize Tactics = make_set(t))
| mv-apply t = Techniques to typeof(string) on (summarize Techniques = make_set(t))
| extend Tactics = set_difference(Tactics, dynamic([""]))
| extend Techniques = set_difference(Techniques, dynamic([""]))
| order by MaxScore desc, Occurrences desc

Output columns: AnomalyTemplateName, Occurrences (unique anomaly IDs), MaxScore, AvgScore, Tactics, Techniques, SourceIPs, AnomalyFlags (flat set of anomalous reasons), TI_Flags, FirstSeen, LastSeen, SampleDescription (one example description for context).

Verdict guidance:

  • 🔴 Escalate: MaxScore ≥ 0.7 with multiple occurrences, or anomaly type involves credential access / account manipulation
  • 🟠 Investigate: MaxScore ≥ 0.3, or flags include CountryUncommonlyConnectedFromByUser combined with ActionUncommonlyPerformedByUser
  • 🟡 Monitor: Low scores (<0.3) with explainable flags (e.g., first-time admin operations, CTF/lab accounts in target entities)
  • Clear: 0 results — no UEBA anomalies detected
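
The verdict guidance above can be sketched as a small triage function. This is only an illustration, not part of the skill's tooling: the function name `ueba_verdict`, the row shape (one dict per Q12 output row), the reading of "multiple occurrences" as two or more, and the analyst-supplied `sensitive_types` set are all assumptions.

```python
from typing import Iterable, Mapping

# Flag pair named in the verdict guidance for the "Investigate" tier.
INVESTIGATE_FLAGS = {
    "CountryUncommonlyConnectedFromByUser",
    "ActionUncommonlyPerformedByUser",
}


def ueba_verdict(rows: Iterable[Mapping], sensitive_types: frozenset = frozenset()) -> str:
    """Map Q12 summary rows to a verdict tier.

    rows: one dict per AnomalyTemplateName with MaxScore, Occurrences, AnomalyFlags.
    sensitive_types: template names the analyst judges to involve credential
    access or account manipulation (hypothetical parameter).
    """
    rows = list(rows)
    if not rows:
        return "Clear"  # zero results: no built-in anomaly rules fired
    verdict = "Monitor"
    for row in rows:
        score = float(row.get("MaxScore", 0.0))
        occurrences = int(row.get("Occurrences", 0))
        flags = set(row.get("AnomalyFlags", []))
        # Assumption: "multiple occurrences" means at least two.
        if (score >= 0.7 and occurrences >= 2) or row.get("AnomalyTemplateName") in sensitive_types:
            return "Escalate"
        if score >= 0.3 or INVESTIGATE_FLAGS <= flags:
            verdict = "Investigate"
    return verdict
```

The function returns the most severe tier across all rows, so a single escalation-worthy template outweighs any number of low-score rows.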

Zero results note: Unlike Q2 (custom Signinlogs_Anomalies_KQL_CL), Q12 queries the built-in Sentinel UEBA Anomalies table. Zero results means no built-in anomaly rules fired — not that UEBA is disabled. If UEBA is not enabled in the workspace, the table may not exist (handle gracefully).


Microsoft Graph Identity Protection Queries

CRITICAL: Always query Identity Protection data in Phase 2 (Batch 2) of the investigation workflow.

Step 1: Get User Object ID and Windows SID

mcp_microsoft_mcp_microsoft_graph_get("/v1.0/users/<UPN>?$select=id,displayName,userPrincipalName,onPremisesSecurityIdentifier")

Step 2: Get User Risk Profile

mcp_microsoft_mcp_microsoft_graph_get("/v1.0/identityProtection/riskyUsers/<USER_ID>")

Returns: riskLevel (low/medium/high/none), riskState (atRisk/confirmedCompromised/dismissed/remediated)

Step 3: Get Risk Detections

mcp_microsoft_mcp_microsoft_graph_get("/v1.0/identityProtection/riskDetections?$filter=userId eq '<USER_ID>'&$select=id,detectedDateTime,riskEventType,riskLevel,riskState,riskDetail,ipAddress,location,activity,activityDateTime&$orderby=detectedDateTime desc&$top=10")

Returns: Array of risk events with riskEventType (unlikelyTravel, unfamiliarFeatures, anonymizedIPAddress, etc.)

Step 4: Get Risky Sign-ins

mcp_microsoft_mcp_microsoft_graph_get("/beta/auditLogs/signIns?$filter=userId eq '<USER_ID>' and (riskState eq 'atRisk' or riskState eq 'confirmedCompromised')&$select=id,createdDateTime,userPrincipalName,appDisplayName,ipAddress,location,riskState,riskLevelDuringSignIn,riskEventTypes_v2,riskDetail,status&$orderby=createdDateTime desc&$top=5")

NOTE: Risky sign-ins are ONLY available in the /beta endpoint, not in /v1.0
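
Steps 2 to 4 reuse the object ID returned by Step 1, so the three request paths can be built in one place. The following is a minimal sketch: the helper name `identity_protection_paths` is hypothetical, and it assumes the caller passes each returned path to the Graph tool shown above. The query strings are copied from the steps as written.

```python
def identity_protection_paths(user_id: str) -> dict:
    """Build the Graph request paths for Steps 2-4 from a user object ID."""
    # OData string literals escape a single quote by doubling it.
    uid = user_id.replace("'", "''")
    detection_fields = ",".join([
        "id", "detectedDateTime", "riskEventType", "riskLevel", "riskState",
        "riskDetail", "ipAddress", "location", "activity", "activityDateTime",
    ])
    signin_fields = ",".join([
        "id", "createdDateTime", "userPrincipalName", "appDisplayName",
        "ipAddress", "location", "riskState", "riskLevelDuringSignIn",
        "riskEventTypes_v2", "riskDetail", "status",
    ])
    return {
        "risky_user": f"/v1.0/identityProtection/riskyUsers/{user_id}",
        "risk_detections": (
            "/v1.0/identityProtection/riskDetections"
            f"?$filter=userId eq '{uid}'"
            f"&$select={detection_fields}"
            "&$orderby=detectedDateTime desc&$top=10"
        ),
        "risky_signins": (
            "/beta/auditLogs/signIns"
            f"?$filter=userId eq '{uid}' and "
            "(riskState eq 'atRisk' or riskState eq 'confirmedCompromised')"
            f"&$select={signin_fields}"
            "&$orderby=createdDateTime desc&$top=5"
        ),
    }
```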

Common Risk Event Types

  • unlikelyTravel: User traveled an impossible distance between sign-ins
  • unfamiliarFeatures: Sign-in from unfamiliar location/device/IP
  • anonymizedIPAddress: Sign-in from Tor, VPN, or proxy
  • maliciousIPAddress: Sign-in from known malicious IP
  • leakedCredentials: User credentials found in leak databases

Markdown Report Template

When outputting to markdown file (Mode 2), use this template. Populate ALL sections with actual query data. For sections with no data, use the explicit absence confirmation pattern.

Filename pattern: reports/user-investigations/user_investigation_<username>_YYYYMMDD_HHMMSS.md
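
The filename pattern above can be produced with a small helper. This is a sketch under stated assumptions: the function name `report_filename` is hypothetical, `<username>` is taken to be the lowercased part of the UPN before the `@`, and the timestamp is taken in UTC to match the report's "Generated" field.

```python
from datetime import datetime, timezone
from typing import Optional


def report_filename(upn: str, now: Optional[datetime] = None) -> str:
    """Build reports/user-investigations/user_investigation_<username>_YYYYMMDD_HHMMSS.md."""
    # Assumption: <username> is the local part of the UPN, lowercased.
    username = upn.split("@", 1)[0].lower()
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%d_%H%M%S")
    return f"reports/user-investigations/user_investigation_{username}_{stamp}.md"
```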

# User Security Investigation Report

**Generated:** YYYY-MM-DD HH:MM UTC
**Workspace:** <workspace_name>
**User:** <display_name> (`<UPN>`)
**Department:** <department> | **Title:** <job_title> | **Location:** <office_location>
**Account Status:** <Enabled/Disabled> | **User Type:** <Member/Guest>
**Investigation Period:** <start_date> → <end_date> (<N> days)
**Investigation Type:** <Standard (7d) / Quick (1d) / Comprehensive (30d)>
**Data Sources:** SigninLogs, AADNonInteractiveUserSignInLogs, AuditLogs, SecurityAlert, SecurityIncident, OfficeActivity, CloudAppEvents, AADUserRiskEvents, Signinlogs_Anomalies_KQL_CL, Identity Protection (Graph API), ThreatIntelIndicators

---

## Executive Summary

<2-4 sentence summary: overall risk level, key findings, most significant anomalies or concerns, and primary recommendation. Ground every claim in evidence from query results.>

**Overall Risk Level:** 🔴 HIGH / 🟠 MEDIUM / 🟡 LOW / 🟢 INFORMATIONAL

---

## Key Metrics

| Metric | Value |
|--------|-------|
| **Total Sign-ins** | <count> |
| **Successful** | <count> (<percentage>%) |
| **Failed** | <count> (<percentage>%) |
| **Unique IPs** | <count> |
| **Unique Locations** | <count> |
| **Anomalies Detected** | <count> (High: <n>, Medium: <n>, Low: <n>) |
| **Security Incidents** | <count> (Open: <n>, Closed: <n>) |
| **Risk Detections** | <count> (atRisk: <n>, remediated: <n>) |
| **DLP Events** | <count> |
| **MFA Methods** | <count> methods |

---

## MFA & Authentication Status

| Factor | Status |
|--------|--------|
| **MFA Enabled** | 🟢 Yes / 🔴 No |
| **Methods** | <list of methods: Authenticator, FIDO2, Phone, etc.> |
| **FIDO2/Passkey** | 🟢 Enrolled / 🟡 Not enrolled |
| **Authenticator App** | 🟢 Enrolled / 🟡 Not enrolled |
| **Phishing-Resistant** | 🟢 Yes (passkey/FIDO2) / 🟡 No |

---

## Identity Protection

### User Risk Profile

| Field | Value |
|-------|-------|
| **Risk Level** | 🔴/🟠/🟡/🟢 <high/medium/low/none> |
| **Risk State** | <atRisk / confirmedCompromised / remediated / dismissed / none> |
| **Risk Detail** | <detail text> |
| **Last Updated** | <datetime> |

### Risk Detections

<If risk detections found:>

| Detected | Risk Type | Level | State | IP Address | Location | Activity |
|----------|-----------|-------|-------|------------|----------|----------|
| <datetime> | <riskEventType> | <level> | <state> | <ip> | <city, country> | <signin/user> |

<If no risk detections:>
✅ No Identity Protection risk detections for this user in the investigation period.

### Risky Sign-ins

<If risky sign-ins found:>

| Time | Application | IP Address | Location | Risk Level | Risk State | Detail |
|------|-------------|------------|----------|------------|------------|--------|
| <datetime> | <app> | <ip> | <city, country> | <level> | <state> | <detail> |

<If no risky sign-ins:>
✅ No risky sign-ins detected for this user in the investigation period.

---

## Anomalies (Signinlogs_Anomalies_KQL_CL)

<If anomalies found:>

| Detected | Type | Value | Severity | Location | Hits | Geo Novelty |
|----------|------|-------|----------|----------|------|-------------|
| <datetime> | <NewInteractiveIP / NewInteractiveDeviceCombo / etc.> | <IP or OS\|Browser> | 🔴/🟠/🟡 <severity> | <country, city> | <count> | <Country: Y/N, City: Y/N> |

**Anomaly Summary:**
- <X> new IP addresses detected (Y with geographic novelty)
- <X> new device combinations detected
- Highest severity: <level> — <brief description of most critical anomaly>

<If no anomalies:>
✅ No sign-in anomalies detected in the investigation period.
- Checked: Signinlogs_Anomalies_KQL_CL (0 records)

---

## IP Intelligence

<Table of up to 15 prioritized IPs with enrichment data. Run `enrich_ips.py` for top IPs.>

| IP Address | Source | Location | ISP/Org | VPN | Abuse Score | Reports | Risk | Sign-ins | Auth Method |
|------------|--------|----------|---------|-----|-------------|---------|------|----------|-------------|
| <ip> | 🔴 Anomaly / 🟠 Risky / 🔵 Frequent | <city, country> | <org> | 🟢 No / 🔴 Yes | <score>% | <count> | HIGH/MED/LOW | <count> (✓<success>/✗<fail>) | <MFA/Password/Token/Passkey> |

### Threat Intelligence Matches

<If TI matches found:>

| IP Address | Threat Description | Confidence | Activity Groups | Valid Until |
|------------|-------------------|------------|-----------------|------------|
| <ip> | <description> | <score> | <groups> | <date> |

<If no TI matches:>
✅ No threat intelligence matches found for investigated IPs.

---

## Sign-in Activity

### Top Applications

| Application | Sign-ins | Success | Failures | Unique Locations | IP Addresses | First Seen | Last Seen |
|-------------|----------|---------|----------|------------------|--------------|------------|-----------|
| <app> | <count> | <count> | <count> | <count> | <ip_list> | <date> | <date> |

### Top Locations

| Location | Sign-ins | Success | Failures | IP Addresses | Applications | First Seen | Last Seen |
|----------|----------|---------|----------|--------------|--------------|------------|-----------|
| <location> | <count> | <count> | <count> | <ip_list> | <app_list> | <date> | <date> |

### Sign-in Failures

<If failures found:>

| Error Code | Description | Count | Applications | Locations | First Seen | Last Seen |
|------------|-------------|-------|--------------|-----------|------------|-----------|
| <code> | <description> | <count> | <app_list> | <loc_list> | <date> | <date> |

**Failure Analysis:**
- <Brief analysis of failure patterns — device compliance (53000), MFA required (50074), blocked by CA (530032), etc.>

<If no failures:>
✅ No sign-in failures detected in the investigation period.

---

## Registered Devices

<If devices found:>

| Device Name | OS | Trust Type | Compliant | Managed | Last Sign-in |
|-------------|-----|------------|-----------|---------|--------------|
| <name> | <os> <version> | <AzureAd/Hybrid/Workplace> | 🟢 Yes / 🔴 No | 🟢 Yes / 🔴 No | <date> |

<If no devices:>
✅ No registered devices found for this user.

---

## Audit Log Activity

<If audit events found:>

| Category | Result | Count | Operations | First Seen | Last Seen |
|----------|--------|-------|------------|------------|-----------|
| <category> | <Success/Failure> | <count> | <operation_list> | <date> | <date> |

**Notable Operations:**
- <Brief summary of significant audit events — password changes, role assignments, MFA modifications, app consent, etc.>

<If no audit events:>
✅ No audit log activity detected for this user in the investigation period.

---

## Office 365 Activity

<If O365 events found:>

| Record Type | Operation | Count |
|-------------|-----------|-------|
| <type> | <operation> | <count> |

<If no O365 events:>
✅ No Office 365 activity detected for this user in the investigation period.

---

## DLP Events

<If DLP events found:>

| Time | Device | Operation | File | Target | Rule |
|------|--------|-----------|------|--------|------|
| <datetime> | <device> | <operation> | <filename> | <domain/path> | <rule_name> |

**DLP Summary:**
- ⚠️ <X> sensitive file operations detected
- Operations: <network share copy, cloud upload, removable media, etc.>
- Rules triggered: <list of DLP rule names>

<If no DLP events:>
✅ No DLP events detected for this user in the investigation period.

---

## Security Incidents

<If incidents found:>

| ID | Title | Severity | Status | Classification | Created | Owner | Alerts | Link |
|----|-------|----------|--------|----------------|---------|-------|--------|------|
| <id> | <title> | 🔴/🟠/🟡 <severity> | <New/Active/Closed> | <TP/FP/BP/—> | <date> | <owner_upn> | <count> | [View](<url>) |

**Incident Summary:**
- <X> total incidents (<Y> open, <Z> closed)
- Highest severity: <level>
- <Brief description of most critical incident>

<If no incidents:>
✅ No security incidents involving this user in the investigation period.
- Checked: SecurityAlert → SecurityIncident join on UPN, User Object ID, and Windows SID (0 matches)

---

## Risk Assessment

### Risk Score: <XX>/100 — 🔴 HIGH / 🟠 MEDIUM / 🟡 LOW / 🟢 INFORMATIONAL

### Risk Factors

| Factor | Finding |
|--------|---------|
| 🔴/🟠/🟡 **<Factor Name>** | <Evidence-grounded finding with specific numbers> |

### Mitigating Factors

| Factor | Finding |
|--------|---------|
| 🟢 **<Factor Name>** | <Evidence-grounded finding with specific numbers> |

---

## Recommendations

### Critical Actions
<Numbered list of critical actions with evidence. Only include if critical findings exist.>

### High Priority Actions
<Numbered list of high-priority actions with evidence.>

### Monitoring Actions (14-Day Follow-Up)
<Bulleted list of ongoing monitoring recommendations.>

---

## Appendix: Query Details

| # | Query | Table(s) | Records | Execution |
|---|-------|----------|--------:|----------:|
| 1 | IP Selection (Priority IPs) | Signinlogs_Anomalies_KQL_CL, AADUserRiskEvents, SigninLogs | <count> | <time> |
| 2 | Anomaly Detection | Signinlogs_Anomalies_KQL_CL | <count> | <time> |
| 3 | Sign-ins by Application | SigninLogs, AADNonInteractiveUserSignInLogs | <count> | <time> |
| 3b | Sign-ins by Location | SigninLogs, AADNonInteractiveUserSignInLogs | <count> | <time> |
| 3c | Sign-in Failures | SigninLogs, AADNonInteractiveUserSignInLogs | <count> | <time> |
| 3d | IP Sign-in Counts | SigninLogs, AADNonInteractiveUserSignInLogs | <count> | <time> |
| 4 | Audit Log Activity | AuditLogs | <count> | <time> |
| 5 | Office 365 Activity | OfficeActivity | <count> | <time> |
| 6 | Security Incidents | SecurityAlert, SecurityIncident | <count> | <time> |
| 10 | DLP Events | CloudAppEvents | <count> | <time> |
| 11 | Threat Intelligence | ThreatIntelIndicators | <count> | <time> |
| — | User Profile | Microsoft Graph API | 1 | <time> |
| — | MFA Methods | Microsoft Graph API | <count> | <time> |
| — | Registered Devices | Microsoft Graph API | <count> | <time> |
| — | Risk Profile | Microsoft Graph API | 1 | <time> |
| — | Risk Detections | Microsoft Graph API | <count> | <time> |
| — | Risky Sign-ins | Microsoft Graph API (beta) | <count> | <time> |

*Query definitions: see the Sample KQL Queries section in this SKILL.md file.*

**Do NOT include full KQL text in the appendix** — the canonical queries are already documented in this SKILL.md file. The appendix serves as an audit trail only.

---

**Investigation Timeline:**
- [MM:SS] ✓ Phase 1: User ID retrieval (<X>s)
- [MM:SS] ✓ Phase 2: Parallel data collection (<X>s)
- [MM:SS] ✓ IP Enrichment (<X>s)
- [MM:SS] ✓ Phase 3: Report generation (<X>s)
- **Total Investigation Time:** <duration>

Markdown Report Authoring Guidelines

  1. Populate every section — even if data is empty. Use the ✅ No <X> detected... pattern for empty sections.
  2. Never invent data — follow the Evidence-Based Analysis global rule strictly. Every number in the report must come from a query result.
  3. Risk assessment is dynamic — calculate risk score using the same weighted logic as generate_report_from_json.py (risk factors × 10 − mitigating factors × 5 + baseline 30, capped 0–100).
  4. IP enrichment — run enrich_ips.py for IP intelligence data. If enrich_ips.py is unavailable, use Sentinel ThreatIntelIndicators and Signinlogs_Anomalies_KQL_CL data as fallback.
  5. PII-Free — the report file is saved to reports/ which is gitignored. However, exercise caution with any files that may be shared externally.
  6. Emoji consistency — follow the Emoji Formatting table from copilot-instructions.md for all risk/status indicators.
  7. Query appendix — include record counts and execution times but NOT full KQL text. Reference the SKILL.md query numbers.
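
The weighted logic in guideline 3 works out to simple arithmetic. The sketch below follows the formula as stated there. It is not taken from generate_report_from_json.py, whose actual implementation may differ, and the function name `risk_score` is hypothetical.

```python
def risk_score(risk_factors: int, mitigating_factors: int, baseline: int = 30) -> int:
    """Baseline plus 10 per risk factor minus 5 per mitigating factor, clamped to 0..100."""
    raw = baseline + 10 * risk_factors - 5 * mitigating_factors
    return max(0, min(100, raw))
```

For example, three risk factors and two mitigating factors give 30 + 30 − 10 = 50, while ten risk factors with no mitigation would reach 130 and is capped at 100.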

JSON Export Structure (Mode 3 — HTML Report)

Export MCP query results to a single JSON file with these required keys:

{
  "upn": "user@domain.com",
  "user_id": "<USER_OBJECT_ID>",
  "user_sid": "<WINDOWS_SID>",
  "investigation_date": "2025-11-23",
  "start_date": "2025-11-15",
  "end_date": "2025-11-24",
  "timestamp": "20251123_164532",
  "anomalies": [...],
  "signin_apps": [...],
  "signin_locations": [...],
  "signin_failures": [...],
  "signin_ip_counts": [...],
  "audit_events": [...],
  "office_events": [...],
  "dlp_events": [...],
  "incidents": [...],
  "user_profile": {
    "id": "...",
    "displayName": "...",
    "userPrincipalName": "...",
    "mail": "...",
    "userType": "...",
    "jobTitle": "...",
    "department": "...",
    "officeLocation": "...",
    "accountEnabled": true
  },
  "mfa_methods": {...},
  "devices": [...],
  "risk_profile": {...},
  "risk_detections": [...],
  "risky_signins": [...],
  "threat_intel_ips": [...]
}
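
Before handing the export to the report generator, it may help to confirm that every required key is present. The following is a minimal sketch: the key list is copied from the structure above, while the function names `missing_keys` and `load_export` are hypothetical and not part of the skill's scripts.

```python
import json

# Top-level keys copied from the JSON export structure above.
REQUIRED_KEYS = (
    "upn", "user_id", "user_sid", "investigation_date", "start_date",
    "end_date", "timestamp", "anomalies", "signin_apps", "signin_locations",
    "signin_failures", "signin_ip_counts", "audit_events", "office_events",
    "dlp_events", "incidents", "user_profile", "mfa_methods", "devices",
    "risk_profile", "risk_detections", "risky_signins", "threat_intel_ips",
)


def missing_keys(export: dict) -> list:
    """Return the required top-level keys that are absent from the export."""
    return [key for key in REQUIRED_KEYS if key not in export]


def load_export(path: str) -> dict:
    """Load an export file and fail early if any required key is missing."""
    with open(path, encoding="utf-8") as handle:
        export = json.load(handle)
    missing = missing_keys(export)
    if missing:
        raise ValueError(f"export is missing required keys: {', '.join(missing)}")
    return export
```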

Error Handling

Common Issues and Solutions

| Issue | Solution |
|-------|----------|
| Missing department or officeLocation | Use "Unknown" as default value |
| No anomalies found | Export empty array: "anomalies": [] |
| Graph API returns 404 for user | Verify UPN is correct |
| Sentinel query timeout | Reduce date range or add ` |
| Missing trustType in device query | Use default: "Workplace" |
| No results from SecurityIncident query | Ensure using ALL THREE identifiers (UPN, UserID, SID) |
| Risky sign-ins query fails | Must use /beta endpoint |

Required Field Defaults

{
  "department": "Unknown",
  "officeLocation": "Unknown",
  "trustType": "Workplace",
  "approximateLastSignInDateTime": "2025-01-01T00:00:00Z"
}

Empty Result Handling

{
  "anomalies": [],
  "signin_apps": [],
  "signin_locations": [],
  "signin_failures": [],
  "audit_events": [],
  "office_events": [],
  "dlp_events": [],
  "incidents": [],
  "risk_detections": [],
  "risky_signins": [],
  "threat_intel_ips": []
}
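
The field defaults and empty-array rules above could be applied in a single normalization pass before export. This is a sketch under assumptions: the function name `apply_defaults` is hypothetical, and it assumes `department` and `officeLocation` live under `user_profile` while `trustType` and `approximateLastSignInDateTime` belong to each entry in `devices`.

```python
import copy

PROFILE_DEFAULTS = {"department": "Unknown", "officeLocation": "Unknown"}
DEVICE_DEFAULTS = {
    "trustType": "Workplace",
    "approximateLastSignInDateTime": "2025-01-01T00:00:00Z",
}
# Keys listed under "Empty Result Handling".
EMPTY_ARRAY_KEYS = (
    "anomalies", "signin_apps", "signin_locations", "signin_failures",
    "audit_events", "office_events", "dlp_events", "incidents",
    "risk_detections", "risky_signins", "threat_intel_ips",
)


def _fill(record: dict, defaults: dict) -> None:
    """Set each default where the field is missing or null."""
    for field, value in defaults.items():
        if record.get(field) is None:
            record[field] = value


def apply_defaults(export: dict) -> dict:
    """Return a copy of the export with defaults and empty arrays filled in."""
    result = copy.deepcopy(export)
    profile = result.get("user_profile") or {}
    result["user_profile"] = profile
    _fill(profile, PROFILE_DEFAULTS)
    for device in result.get("devices") or []:
        _fill(device, DEVICE_DEFAULTS)
    for key in EMPTY_ARRAY_KEYS:
        if result.get(key) is None:
            result[key] = []
    return result
```

Working on a deep copy leaves the raw query results untouched, so the unmodified data remains available for the audit trail.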

Integration with Main Copilot Instructions

This skill follows all patterns from the main copilot-instructions.md:

  • Date range handling: Uses +2 day rule for real-time searches
  • Parallel execution: Runs independent queries simultaneously
  • Time tracking: Mandatory reporting after each phase
  • Token management: Uses create_file for all output
  • Follow-up analysis: Reference copilot-instructions.md for authentication tracing workflows

Example invocations:


SVG Dashboard Generation

After generating a user investigation report (markdown file output), an SVG dashboard can be created using the shared SVG rendering skill.

Trigger: User asks "generate an SVG dashboard from the report" or "visualize this report"

Workflow:

  1. Read this skill's svg-widgets.yaml (widget manifest — defines layout, colors, field mapping)
  2. Read .github/skills/svg-dashboard/SKILL.md (rendering rules — component library, quality standards)
  3. Extract data from the completed report using data_sources.field_mapping_notes
  4. Render SVG → save as {report_basename}_dashboard.svg in the same directory

Layout: 5 rows — title banner, risk score card + KPI cards (sign-ins/success rate/IPs/incidents/anomalies), top apps bar chart + failure codes bar chart, incidents table + risk/mitigating factors table, assessment banner + recommendations.
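
Step 4's output location follows directly from the report path. A minimal sketch, with the helper name `dashboard_path` being hypothetical:

```python
from pathlib import Path


def dashboard_path(report_path: str) -> Path:
    """Return {report_basename}_dashboard.svg in the same directory as the report."""
    report = Path(report_path)
    # stem drops only the final ".md" suffix, so dots in the username survive.
    return report.with_name(f"{report.stem}_dashboard.svg")
```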


Last Updated: March 24, 2026
