# Discover Hidden Website API Endpoints Using Regex and AI

https://n8nworkflows.xyz/workflows/discover-hidden-website-api-endpoints-using-regex-and-ai-4627

### 1. Workflow Overview

This workflow, titled **"Discover Hidden Website API Endpoints Using Regex and AI"**, is designed to automatically identify hidden or undocumented API endpoints embedded within JavaScript files of modern web applications. It is ideal for analyzing Single Page Applications (SPAs), Knockout.js, Next.js/Nuxt.js frameworks, and other sites where API endpoints are present as string literals in bundled or CDN-served JavaScript files.

The workflow is logically divided into several functional blocks:

- **1.1 API Endpoints Extraction with Predefined Regex**: Fetches the target website’s HTML, extracts JS file URLs, filters relevant JS files, downloads their content, and applies a predefined regex to extract potential API endpoints.
- **1.2 Initial AI Analysis of JS Files**: Sends each relevant JS file content to a Large Language Model (LLM) to perform a detailed analysis and description of API endpoints, including methods and parameters.
- **1.3 AI Agent Regex Generation and Validation**: Uses an AI agent to generate custom regex patterns based on the AI analysis, then validates these regex patterns against reference endpoint files through iterative refinement.
- **1.4 Final Regex Execution and Endpoint Aggregation**: Executes the validated regex to extract endpoints from AI analysis results, removes duplicates, and merges both predefined regex and AI-generated endpoints.
- **1.5 Export and Reporting**: Sorts and formats the combined endpoint data and exports the results to an Excel file for comparison and further use.

---

### 2. Block-by-Block Analysis

#### 2.1 API Endpoints Extraction with Predefined Regex

**Overview:**
This block fetches the website HTML, extracts JS file URLs, filters for relevant JS files, retrieves their content, and applies a predefined regex to extract potential API endpoints.

**Nodes Involved:**

- Configuration
- Fetch Website HTML
- Extract URLs of JS files
- Split URLs of JS files
- Keep Relevant JS Files
- Fetch JS Content
- Extract API Endpoints
- Check Endpoints Count
- Split Endpoints
- Remove Duplicates
- Reference for Source Metadata
- Insufficient Endpoints (NoOp)

**Node Details:**

- **Configuration**
  - *Type:* Set
  - *Role:* Stores user inputs: target website URL and User-Agent string for HTTP requests.
  - *Key Parameters:*
    - URL (string): Target site URL (e.g., https://example.com)
    - User-Agent (string): Custom user-agent header to mimic a bot or browser.
  - *Input:* Manual trigger
  - *Output:* Provides URL and User-Agent for HTTP requests.

- **Fetch Website HTML**
  - *Type:* HTTP Request
  - *Role:* Downloads the raw HTML content of the target website.
  - *Configuration:*
    - URL: Dynamic from Configuration node.
    - Timeout: 30s, follow redirects, allow unauthorized SSL certificates.
    - Headers: User-Agent from Configuration.
    - Never error on response to allow downstream handling.
  - *Input:* Configuration node
  - *Output:* HTML content in response body.

- **Extract URLs of JS files**
  - *Type:* HTML Extract
  - *Role:* Parses HTML to extract `src` attributes of all `