23 KiB
Track Azure API and Service Bus failures with Application Insights correlation
Track Azure API and Service Bus failures with Application Insights correlation
1. Workflow Overview
Purpose:
This workflow queries Azure Application Insights to identify failed (or optionally all) API requests and correlates them with Service Bus traces and exceptions using the shared operationId (Application Insights operation_Id). It then produces a Markdown and HTML failure analysis report, and optionally exports results (Excel) or returns them via webhook (both disabled by default).
Target use cases:
- Investigating API failures end-to-end across APIM → backend → Service Bus
- Building an on-demand operational report for incident review
- Correlating Application Insights telemetry streams (requests/traces/exceptions)
1.1 Entry & Configuration
Receives a manual start and sets App Insights/AAD parameters (app id, tenant, client credentials, time range, filters).
1.2 Data Collection (Application Insights API)
Authenticates to Azure AD using client credentials and runs three KQL queries (requests, traces, exceptions) against the App Insights Query API.
1.3 Correlation & Analysis
Transforms App Insights table format into JSON objects, groups traces/exceptions by operationId, correlates them to request records, and computes summary stats + top lists.
1.4 Report Generation & Output Routing
Builds Markdown + HTML reports, checks if there is any data, then outputs either a structured “report payload” or a “no data” message. Optional output nodes are present but disabled.
2. Block-by-Block Analysis
Block 1 — Entry & Configuration
Overview:
Starts the workflow manually and defines the parameters needed for Azure AD authentication and App Insights queries (including the time range and whether to include successful requests).
Nodes involved:
- Manual Trigger
- Set Configuration
Node: Manual Trigger
- Type / role:
n8n-nodes-base.manualTrigger— interactive entry point for testing/running on demand. - Configuration choices: No parameters.
- Input/Output:
- Output → Set Configuration
- Edge cases / failures: None (only runs when manually executed).
Node: Set Configuration
- Type / role:
n8n-nodes-base.set— creates a config object used by later Code nodes. - Key configuration (interpreted):
appInsightsAppId: Application Insights Application ID (not the resource ID).tenantId: Azure AD tenant GUID.clientId: service principal/app registration client id.clientSecret: service principal secret.timeRange: KQLago()value as string (examples:24h,7d,30d).includeSuccessful: boolean; whenfalse, requests query filters toresultCode >= 400.
- Key expressions / variables: none (static placeholders in JSON).
- Input/Output:
- Input ← Manual Trigger
- Output → Query Application Insights
- Version-specific notes: Node typeVersion
3.3(modern Set node with “assignments” model). - Edge cases / failures:
- Incorrect/missing values will later cause OAuth failures (AAD) or App Insights query failures.
- Storing
clientSecretdirectly in node parameters is functional but risky; credentials/variables are safer.
Sticky note coverage (context):
- “Configuration” note: reminds to set App ID and time range, and ensure OAuth2 credentials are configured—however, this workflow actually performs OAuth internally in Code (not via n8n OAuth credentials).
Block 2 — Data Collection (Application Insights API)
Overview:
Fetches an Azure AD token using the client credentials grant, then issues three POST requests to the Application Insights Query API using KQL to retrieve: APIM requests, Service Bus-related traces, and exceptions.
Nodes involved:
- Query Application Insights
Node: Query Application Insights
- Type / role:
n8n-nodes-base.code— performs OAuth token acquisition + three App Insights KQL queries. - Configuration choices (interpreted):
- Reads config from the incoming item (
$input.item.json). - AAD token endpoint:
https://login.microsoftonline.com/${tenantId}/oauth2/v2.0/token - Grant:
client_credentials - Scope:
https://api.applicationinsights.io/.default - App Insights Query API endpoint:
https://api.applicationinsights.io/v1/apps/${appId}/query - Executes three KQL queries (each
take 500and ordered by timestamp desc).
- Reads config from the incoming item (
- Key logic / expressions:
- Throws if
access_tokenmissing:throw new Error('Failed to obtain access token: ' + JSON.stringify(tokenData))
- Returns a single item with:
apimData: apimData.tables?.[0] || {}sbData: sbData.tables?.[0] || {}exceptionsData: exceptionsData.tables?.[0] || {}
- Throws if
- KQL query intent (summarized):
- APIM Requests (
requeststable)- Filter:
timestamp > ago(timeRange) - Optional filter: if
includeSuccessfulis false →resultCode >= 400 - Extract custom dimensions:
apim-service-name,apim-operation-name,apim-subscription-id
- Projects client geo/type fields and other request fields
- Filter:
- Service Bus Traces (
tracestable)- Filter by time range
- Filter: message contains
ServiceBusOR customDimensions hasMessageId - Extract custom dimensions:
MessageId,EntityName,Endpoint
- Exceptions (
exceptionstable)- Filter by time range
- Projects messages/methods and
details
- APIM Requests (
- Input/Output:
- Input ← Set Configuration
- Output → Correlate and Analyze Data
- Version-specific notes: Code node typeVersion
2(supportsfetchin n8n’s runtime). - Edge cases / failure types:
- Auth failures: invalid tenant/client/secret, missing API permissions, expired secret → token request returns error JSON; workflow throws.
- Authorization: service principal lacking access to the App Insights app → 403/401 from query API.
- API throttling/timeouts: App Insights Query API can throttle; no retry/backoff implemented.
- Unexpected response shape: if
tables[0]missing, downstream sees{}and parses to empty arrays (graceful). - KQL assumptions: expected customDimensions keys may not exist (handled as
tostring(...)which yields empty). - Row limit:
take 500may hide additional failures in high-volume environments.
Sticky note coverage (context):
- “Data Collection” note states “Handles OAuth2 authentication internally.” This matches the Code implementation.
Block 3 — Correlation & Analysis
Overview:
Converts the App Insights “table” responses into object arrays, groups traces and exceptions by operationId, correlates them to request records, and computes summary metrics plus top errors and slowest requests.
Nodes involved:
- Correlate and Analyze Data
Node: Correlate and Analyze Data
- Type / role:
n8n-nodes-base.code— transforms and correlates telemetry; calculates analytics. - Configuration choices (interpreted):
parseAppInsightsTable(table):- Converts
{ columns: [{name...}], rows: [...] }into[ {colName: value, ...}, ... ] - Returns
[]if structure missing.
- Converts
- Builds maps:
sbByOperationId[operationId] = [trace, ...]exceptionsByOperationId[operationId] = [exception, ...]
- Correlates by iterating over
apimRequestsand attaching:serviceBusTraces,serviceBusMessageIdsexceptions,exceptionMessages- Flags:
hasException,hasServiceBusTrace,isFailure
- Key expressions / variables:
- References config timeRange via node lookup:
timeRange: $('Set Configuration').item.json.timeRange
- References config timeRange via node lookup:
- Computed outputs:
summary:- totals, failed/successful counts, counts with exceptions/traces
- average duration (see edge case below)
- analysis timestamp
topErrors:- Count by
${resultCode} - ${apiName}, top 10
- Count by
topSlowRequests:- Sort by duration desc, top 10
correlatedData: enriched per-request objectsrawData: counts of raw rows per source
- Input/Output:
- Input ← Query Application Insights
- Output → Generate Report
- Version-specific notes: Code node typeVersion
2. - Edge cases / failure types:
- Division by zero:
averageDurationdoes... / correlatedData.length. If there are zero APIM requests, this becomesNaN(and later.toFixed()can throw). The workflow tries to guard later with “Check If Data Exists”, but report generation happens before the IF check, so this can still break execution. - Type coercion:
resultCodefrom App Insights can be string; comparisons like>= 400may behave unexpectedly if not numeric. (KQLresultCodeinrequestsis often string.) - OperationId missing: items without
operationIdwill not correlate; they still appear but without linked traces/exceptions.
- Division by zero:
Block 4 — Report Generation & Output Routing
Overview:
Generates Markdown and HTML reports from the analytics, checks whether any correlated data exists, then emits either the report payload or a “no data” message. Optional export/webhook response nodes exist but are disabled.
Nodes involved:
- Generate Report
- Check If Data Exists
- Output Report
- No Data Found
- Export to Excel (disabled)
- Respond to Webhook (disabled)
Node: Generate Report
- Type / role:
n8n-nodes-base.code— builds human-friendly Markdown and HTML strings. - Configuration choices (interpreted):
- Builds
markdownReportwith summary, top errors, and slow requests. - Builds
htmlReportwith styled cards and lists.
- Builds
- Key expressions / variables:
- Failure rate:
((failedRequests/totalRequests)*100).toFixed(1) - Average duration:
summary.averageDuration.toFixed(2) - Slow request duration formatting:
r.duration.toFixed(2)
- Failure rate:
- Input/Output:
- Input ← Correlate and Analyze Data
- Output → Check If Data Exists
- Edge cases / failure types:
- If
summary.totalRequestsis0, several.toFixed()calls and division will produceInfinity/NaNand may throw (depending on the value). - If any
durationis null/undefined,toFixed()can throw. (You partially guard earlier with(r.duration || 0)for sorting, but not when formatting in the report.) - Large HTML/Markdown could exceed downstream size limits if later sent via webhook/email (not present here but common).
- If
Node: Check If Data Exists
- Type / role:
n8n-nodes-base.if— routes flow depending on whether there are correlated records. - Condition (interpreted):
- If
{{$json.correlatedData.length}} > 0→ “true” branch - Else → “false” branch
- If
- Input/Output:
- Input ← Generate Report
- True → Output Report
- False → No Data Found
- Version-specific notes: IF node typeVersion
2with strict validation enabled. - Edge cases / failure types:
- If
correlatedDatais missing/not an array, expression may error. In normal flow it exists.
- If
Node: Output Report
- Type / role:
n8n-nodes-base.set— prepares a clean output payload for downstream use. - Configuration choices (interpreted):
report: Markdown report stringhtmlReport: HTML report stringsummary: summary objecttopErrors: arrayfailedRequests: filters correlatedData to failures via inline JS:={{ $json.correlatedData.filter(r => r.isFailure) }}
- Input/Output:
- Input ← Check If Data Exists (true)
- Output → Export to Excel (disabled) and Respond to Webhook (disabled)
- Version-specific notes: Set node typeVersion
3.3. - Edge cases / failure types:
- If
correlatedDatais huge, filtering inline is fine but could increase memory usage.
- If
Node: No Data Found
- Type / role:
n8n-nodes-base.set— emits a message when nothing matched. - Configuration choices (interpreted):
- Sets
messageexplaining no data found.
- Sets
- Input/Output:
- Input ← Check If Data Exists (false)
- No outgoing connections.
- Version-specific notes: Set node typeVersion
3.3.
Node: Export to Excel (disabled)
- Type / role:
n8n-nodes-base.spreadsheetFile— would export items to an.xlsxfile. - Configuration choices (interpreted):
- Operation: create file (
toFile) - Format:
xlsx - File name pattern:
azure-api-failures-{{ $now.format('yyyy-MM-dd-HHmmss') }}.xlsx - Includes header row.
- Operation: create file (
- Input/Output:
- Input ← Output Report
- No defined output in connections (node is present as optional).
- Edge cases / failure types:
- Disabled by default; if enabled, ensure the incoming data is shaped as rows you want exported (currently the workflow output is a single item with nested arrays/objects; exporting that directly may not yield a useful spreadsheet without additional flattening).
Node: Respond to Webhook (disabled)
- Type / role:
n8n-nodes-base.respondToWebhook— would return a JSON response to an upstream Webhook trigger. - Configuration choices (interpreted):
- Responds with JSON body:
{ status: 'success', data: { summary, topErrors, failedRequests, report }, timestamp }
- Responds with JSON body:
- Input/Output:
- Input ← Output Report
- Edge cases / failure types:
- This workflow has no Webhook Trigger node. Enabling this node alone won’t make it callable via HTTP; you’d need to add a Webhook trigger and route into this flow.
Sticky note coverage (context):
- “Processing & Reporting” note matches correlation + reporting nodes.
- “Output Options” note correctly states Excel and webhook response are disabled.
3. Summary Table
| Node Name | Node Type | Functional Role | Input Node(s) | Output Node(s) | Sticky Note |
|---|---|---|---|---|---|
| Sticky Note | n8n-nodes-base.stickyNote | Documentation / visual guidance | ## How it works\n\nThis workflow queries Azure Application Insights to track failed API calls across APIM, Service Bus, and exceptions. It makes three KQL queries to the Application Insights API, then correlates the results using operationId to show which API calls failed, what exceptions occurred, and which Service Bus messages were involved.\n\n## Setup steps\n\n1. Create service principal: Run az ad sp create-for-rbac --name "n8n-appinsights" --role "Monitoring Reader" --scopes /subscriptions/{subscription-id}\n\n2. Configure workflow: Update 'Set Configuration' with your Application Insights Application ID and Azure AD tenant ID.\n\n3. Add credentials: The workflow uses Azure credentials which must be configured in your n8n instance. |
||
| Sticky Note - Config | n8n-nodes-base.stickyNote | Documentation / visual guidance | ## Configuration\n\nSet your Application Insights Application ID and time range filter. Ensure OAuth2 credentials are configured in n8n. Time range options: 24h, 7d, 30d. | ||
| Sticky Note - Processing | n8n-nodes-base.stickyNote | Documentation / visual guidance | ## Processing & Reporting\n\nCorrelates data using operationId, calculates statistics, identifies top errors and slow requests, then generates Markdown and HTML reports. | ||
| Sticky Note - Data Collection | n8n-nodes-base.stickyNote | Documentation / visual guidance | ## Data Collection\n\nMakes three Application Insights API calls in one node: APIM requests, Service Bus traces, and exceptions. Handles OAuth2 authentication internally. | ||
| Sticky Note - Outputs | n8n-nodes-base.stickyNote | Documentation / visual guidance | ## Output Options\n\nExport to Excel or return JSON via webhook. Both nodes are disabled by default—enable as needed. | ||
| Manual Trigger | n8n-nodes-base.manualTrigger | Manual entry point | Set Configuration | ||
| Set Configuration | n8n-nodes-base.set | Define App Insights + AAD parameters | Manual Trigger | Query Application Insights | ## Configuration\n\nSet your Application Insights Application ID and time range filter. Ensure OAuth2 credentials are configured in n8n. Time range options: 24h, 7d, 30d. |
| Query Application Insights | n8n-nodes-base.code | OAuth token + 3 KQL queries to App Insights | Set Configuration | Correlate and Analyze Data | ## Data Collection\n\nMakes three Application Insights API calls in one node: APIM requests, Service Bus traces, and exceptions. Handles OAuth2 authentication internally. |
| Correlate and Analyze Data | n8n-nodes-base.code | Correlate requests/traces/exceptions by operationId; compute stats | Query Application Insights | Generate Report | ## Processing & Reporting\n\nCorrelates data using operationId, calculates statistics, identifies top errors and slow requests, then generates Markdown and HTML reports. |
| Generate Report | n8n-nodes-base.code | Build Markdown + HTML report | Correlate and Analyze Data | Check If Data Exists | ## Processing & Reporting\n\nCorrelates data using operationId, calculates statistics, identifies top errors and slow requests, then generates Markdown and HTML reports. |
| Check If Data Exists | n8n-nodes-base.if | Branch on correlatedData length | Generate Report | Output Report; No Data Found | |
| Output Report | n8n-nodes-base.set | Prepare final structured output payload | Check If Data Exists (true) | Export to Excel; Respond to Webhook | ## Output Options\n\nExport to Excel or return JSON via webhook. Both nodes are disabled by default—enable as needed. |
| No Data Found | n8n-nodes-base.set | Output “no data” message | Check If Data Exists (false) | ||
| Export to Excel | n8n-nodes-base.spreadsheetFile | Optional XLSX export (disabled) | Output Report | ## Output Options\n\nExport to Excel or return JSON via webhook. Both nodes are disabled by default—enable as needed. | |
| Respond to Webhook | n8n-nodes-base.respondToWebhook | Optional HTTP response (disabled; requires webhook trigger) | Output Report | ## Output Options\n\nExport to Excel or return JSON via webhook. Both nodes are disabled by default—enable as needed. |
4. Reproducing the Workflow from Scratch
-
Create a new workflow
- Name: Track API failures with Application Insights correlation (or your preferred name).
-
Add node: Manual Trigger
- Node type: Manual Trigger
- No configuration needed.
-
Add node: Set Configuration
- Node type: Set
- Add fields (assignments):
appInsightsAppId(String) = your App Insights Application IDtenantId(String) = Azure AD tenant IDclientId(String) = service principal client IDclientSecret(String) = service principal secrettimeRange(String) =24h(or7d,30d)includeSuccessful(Boolean) =false
- Connect: Manual Trigger → Set Configuration
-
(Azure setup) Create/ensure service principal permissions
- Create SP (example from note):
az ad sp create-for-rbac --name "n8n-appinsights" --role "Monitoring Reader" --scopes /subscriptions/{subscription-id} - Ensure this principal can query the target Application Insights resource (at minimum, Reader/Monitoring Reader typically suffices; some environments may require explicit access to the resource).
- Create SP (example from note):
-
Add node: Query Application Insights (Code)
- Node type: Code
- Paste logic that:
- Reads
appInsightsAppId,tenantId,clientId,clientSecret,timeRange,includeSuccessfulfrom input JSON - Requests token from AAD v2 token endpoint using
client_credentialsand scopehttps://api.applicationinsights.io/.default - Calls
POST https://api.applicationinsights.io/v1/apps/{appId}/querythree times with KQL for:requests(APIM requests; optionally failures only)traces(Service Bus-related)exceptions
- Returns a single item containing
apimData,sbData,exceptionsDataset to the first table or{}.
- Reads
- Connect: Set Configuration → Query Application Insights
-
Add node: Correlate and Analyze Data (Code)
- Node type: Code
- Implement logic to:
- Convert App Insights table format to object arrays
- Group traces/exceptions by
operationId - Enrich each request with matching traces/exceptions
- Compute
summary,topErrors,topSlowRequests, includecorrelatedData - Reference
timeRangefrom the configuration node if desired.
- Connect: Query Application Insights → Correlate and Analyze Data
-
Add node: Generate Report (Code)
- Node type: Code
- Build:
markdownReportstringhtmlReportstring
- Return original data plus the two reports.
- Connect: Correlate and Analyze Data → Generate Report
-
Add node: Check If Data Exists (IF)
- Node type: IF
- Condition:
- Left value:
={{ $json.correlatedData.length }} - Operator: greater than
- Right value:
0
- Left value:
- Connect: Generate Report → Check If Data Exists
-
Add node: Output Report (Set)
- Node type: Set
- Add fields:
report(String) =={{ $json.markdownReport }}htmlReport(String) =={{ $json.htmlReport }}summary(Object) =={{ $json.summary }}topErrors(Array) =={{ $json.topErrors }}failedRequests(Array) =={{ $json.correlatedData.filter(r => r.isFailure) }}
- Connect IF true output → Output Report
-
Add node: No Data Found (Set)
- Node type: Set
- Add field:
message(String) = “No data found for the specified time range and filters. Check your App Insights configuration.”
- Connect IF false output → No Data Found
-
(Optional) Add node: Export to Excel (disabled by default)
- Node type: Spreadsheet File
- Operation: To File
- File format:
xlsx - File name:
azure-api-failures-{{ $now.format('yyyy-MM-dd-HHmmss') }}.xlsx - Connect: Output Report → Export to Excel
- Note: you may want to add a “flatten” step first to export a table of
failedRequests.
-
(Optional) Add node: Respond to Webhook (disabled by default)
- Node type: Respond to Webhook
- Respond with: JSON
- Response body (expression) building:
- status, summary, topErrors, failedRequests, report, timestamp
- Connect: Output Report → Respond to Webhook
- Important: Add a Webhook Trigger node at the start if you want this to be callable via HTTP.
5. General Notes & Resources
| Note Content | Context or Link |
|---|---|
Create service principal: az ad sp create-for-rbac --name "n8n-appinsights" --role "Monitoring Reader" --scopes /subscriptions/{subscription-id} |
Sticky note “How it works” |
Time range options: 24h, 7d, 30d |
Sticky note “Configuration” |
| Output nodes “Export to Excel” and “Respond to Webhook” are disabled by default | Sticky note “Output Options” |
Correlation key is operationId derived from operation_Id across requests/traces/exceptions |
Workflow design (Data Processing) |