creation

2026-04-19 17:14:37 +00:00 · 2026-03-10 12:12:01 +08:00
parent f8d86a6db3
commit afe8fd87b1
1 changed files with 125 additions and 0 deletions
--- a/alerts-13791/readme-13791.md
+++ b/alerts-13791/readme-13791.md
@@ -0,0 +1,125 @@
+Auto-heal failing workflows with Azure OpenAI, n8n API, and Slack alerts
+
+https://n8nworkflows.xyz/workflows/auto-heal-failing-workflows-with-azure-openai--n8n-api--and-slack-alerts-13791
+
+
+# Auto-heal failing workflows with Azure OpenAI, n8n API, and Slack alerts
+
+### 1. Workflow Overview
+
+The **AI Self-Healing Engine & Auto-Patching System** is an advanced operational framework designed to minimize downtime in n8n environments. It acts as a global error handler that intercepts failures from other workflows, analyzes the root cause using Artificial Intelligence, and autonomously attempts to resolve the issue.
+
+The workflow logic is divided into four functional blocks:
+1.  **Error Interception & Context Retrieval:** Captures error metadata and fetches the complete source JSON of the failing workflow via the n8n REST API.
+2.  **AI Diagnosis:** Uses an Azure OpenAI-powered agent to distinguish between transient failures (network timeouts) and structural failures (wrong parameters or logic errors).
+3.  **Autonomous Resolution (Self-Healing):** 
+    *   **Retry Logic:** If the error is transient, the system waits for a "cool down" period and triggers a re-execution.
+    *   **Auto-Patching:** If the error is a logic/parameter issue, the system generates a corrected version of the workflow JSON and updates the production workflow live.
+4.  **Notification & Reporting:** Dispatches detailed alerts to Slack regarding successful repairs or requests for manual intervention.
+
+---
+
+### 2. Block-by-Block Analysis
+
+#### 2.1 Error Interception & Context Retrieval
+*   **Overview:** This block triggers upon any workflow failure and gathers the necessary technical context to allow the AI to "read" the code of the failing workflow.
+*   **Nodes Involved:** `On Workflow Error`, `Filter: Ignore Self`, `Get Workflow JSON`.
+*   **Node Details:**
+    *   **On Workflow Error:** The entry point. It captures the execution ID, workflow ID, and the specific error message.
+    *   **Filter: Ignore Self:** Ensures the healing engine doesn't attempt to heal itself if it fails, preventing infinite recursion loops. (Condition: `workflow.id` != current workflow ID).
+    *   **Get Workflow JSON:** An HTTP Request node (GET) that calls the n8n API (`/workflows/{id}`). It requires an `X-N8N-API-KEY` header. Failure to provide a valid API key or incorrect URL will cause this block to fail.
+
+#### 2.2 AI Diagnosis
+*   **Overview:** The "brain" of the workflow. It processes the error message and the workflow structure to decide the best course of action.
+*   **Nodes Involved:** `Diagnose Error`, `Azure OpenAI Chat Model1`, `Decision Schema`.
+*   **Node Details:**
+    *   **Diagnose Error (AI Agent):** Takes the failed node name, error message, and full Workflow JSON as input. It is prompted to act as a "Senior Engineer."
+    *   **Azure OpenAI Chat Model1:** The LLM provider (using GPT-4o). Requires Azure credentials and deployment name.
+    *   **Decision Schema (Output Parser):** Constraints the AI to return a specific JSON object containing a `state` ("RETRY" or "FIX"), a `diagnosis` string, and a `patch` object (parameter name and new value).
+
+#### 2.3 Autonomous Resolution (Self-Healing)
+*   **Overview:** Executes the technical fix based on the AI's verdict.
+*   **Nodes Involved:** `Determine Action`, `Cool Down`, `Retry Execution`, `Generate Patch JSON`, `Update Workflow`.
+*   **Node Details:**
+    *   **Determine Action:** A Switch node that routes the flow based on the AI's `state` output.
+    *   **Cool Down (Wait):** Pauses for a specified interval (e.g., 1 minute) to allow external services to recover.
+    *   **Retry Execution:** Uses the n8n API (POST) to trigger a retry of the specific failed execution ID.
+    *   **Generate Patch JSON (Code Node):** A JavaScript block that clones the original workflow JSON, finds the failing node, updates the identified faulty parameter with the AI's suggested value, and appends a "🛠 AI AUTO-FIX" note to the node for transparency.
+    *   **Update Workflow:** Uses the n8n API (PUT) to overwrite the existing workflow with the newly patched JSON.
+
+#### 2.4 Notification & Reporting
+*   **Overview:** Informs the operations team of the outcome via Slack.
+*   **Nodes Involved:** `Notify Success`, `Notify Manual Fix`.
+*   **Node Details:**
+    *   **Notify Success:** Sent when a "FIX" (patch) is successfully applied. Includes the AI's diagnosis.
+    *   **Notify Manual Fix:** Sent if the AI suggests a manual review or if the "RETRY" logic is exhausted. Includes a direct link to the failed execution.
+
+---
+
+### 3. Summary Table
+
+| Node Name | Node Type | Functional Role | Input Node(s) | Output Node(s) | Sticky Note |
+| :--- | :--- | :--- | :--- | :--- | :--- |
+| On Workflow Error | n8n-nodes-base.errorTrigger | Trigger | None | Filter: Ignore Self | The error trigger responds with failure in any workflow, then AI analyzes the failure context... |
+| Filter: Ignore Self | n8n-nodes-base.filter | Safety Gate | On Workflow Error | Get Workflow JSON | The error trigger responds with failure in any workflow... |
+| Get Workflow JSON | n8n-nodes-base.httpRequest | Data Acquisition | Filter: Ignore Self | Diagnose Error | The error trigger responds with failure in any workflow... |
+| Diagnose Error | @n8n/n8n-nodes-langchain.agent | Logic/Reasoning | Get Workflow JSON | Determine Action | The error trigger responds with failure in any workflow... |
+| Decision Schema | @n8n/n8n-nodes-langchain.outputParserStructured | Data Formatting | None | Diagnose Error | N/A |
+| Azure OpenAI Chat Model1 | @n8n/n8n-nodes-langchain.lmChatAzureOpenAi | AI Model | None | Diagnose Error | N/A |
+| Determine Action | n8n-nodes-base.switch | Routing | Diagnose Error | Cool Down, Generate Patch JSON, Notify Manual Fix | The Switch node directs traffic based on the AI's verdict. |
+| Cool Down | n8n-nodes-base.wait | Delay | Determine Action | Retry Execution | The Switch node directs traffic based on the AI's verdict. If a retry is needed... |
+| Retry Execution | n8n-nodes-base.httpRequest | Action (Retry) | Cool Down | None | The Switch node directs traffic based on the AI's verdict. |
+| Generate Patch JSON | n8n-nodes-base.code | Transformation | Determine Action | Update Workflow | The JavaScript node injects the AI's corrected value into the workflow JSON. |
+| Update Workflow | n8n-nodes-base.httpRequest | Action (Update) | Generate Patch JSON | Notify Success | The JavaScript node injects the AI's corrected value into the workflow JSON. |
+| Notify Success | n8n-nodes-base.slack | Notification | Update Workflow | None | The JavaScript node injects the AI's corrected value into the workflow JSON. |
+| Notify Manual Fix | n8n-nodes-base.slack | Notification | Determine Action | None | If the AI cannot determine a fix or the error is marked as "MANUAL"... |
+
+---
+
+### 4. Reproducing the Workflow from Scratch
+
+1.  **Preparation:**
+    *   Generate an n8n API Key from your User Settings.
+    *   Ensure you have an Azure OpenAI Deployment (Model: GPT-4o).
+    *   Prepare a Slack App with `chat:write` permissions.
+
+2.  **Trigger & Filter Setup:**
+    *   Add an **Error Trigger** node.
+    *   Connect it to a **Filter** node. Set the condition to check that the incoming `workflow.id` (from the error) does not equal the current workflow ID to prevent loops.
+
+3.  **Metadata Fetching:**
+    *   Add an **HTTP Request** node.
+    *   Method: `GET`. URL: `https://YOUR_N8N_DOMAIN/api/v1/workflows/{{ $json.workflow.id }}`.
+    *   Header: `X-N8N-API-KEY` with your API key.
+
+4.  **AI Integration:**
+    *   Add an **AI Agent** node.
+    *   Attach an **Azure OpenAI Chat Model** node.
+    *   Attach a **Structured Output Parser** node. Define the schema with properties: `state` (enum: RETRY, FIX), `diagnosis` (string), and `patch` (object with `parameterName` and `newValue`).
+    *   In the Agent prompt, pass the error message and the stringified JSON from the HTTP Request node.
+
+5.  **Execution Routing:**
+    *   Add a **Switch** node. Create three routing rules based on the AI's `state` output.
+
+6.  **The "Retry" Branch:**
+    *   Add a **Wait** node (e.g., 1-5 minutes).
+    *   Add an **HTTP Request** node. Method: `POST`. URL: `.../api/v1/executions/{{ executionId }}/retry`.
+
+7.  **The "Fix" Branch:**
+    *   Add a **Code** node. Use JavaScript to parse the workflow JSON, locate the specific node mentioned in the AI output, and overwrite the parameter with the new value.
+    *   Add an **HTTP Request** node. Method: `PUT`. URL: `.../api/v1/workflows/{{ workflowId }}`. Pass the updated JSON in the body.
+    *   Add a **Slack** node to notify of the successful patch.
+
+8.  **The "Manual" Branch:**
+    *   Add a **Slack** node configured with a message containing the AI's diagnosis and a link to the execution for human review.
+
+---
+
+### 5. General Notes & Resources
+
+| Note Content | Context or Link |
+| :--- | :--- |
+| This workflow acts as a global error handler. It must be selected as the "Error Workflow" in the settings of other workflows. | Workflow Settings > Error Workflow |
+| Requires n8n API v1 enabled on your instance. | [n8n API Documentation](https://docs.n8n.io/api/v1/) |
+| The "Generate Patch JSON" node adds a visual note to the modified node in the UI for auditing. | Logic implemented in the Code Node |
+| Self-recursion check is critical: without the Filter node, a failure in this workflow would trigger itself infinitely. | Safety Configuration |