From 799e1ad4befafe180860f0d95387d66dfdecb18c Mon Sep 17 00:00:00 2001
From: nusquama <nusquama@gmail.com>
Date: Wed, 12 Nov 2025 19:18:08 +0100
Subject: [PATCH] creation

---
 .../readme-6663.md                            | 262 ++++++++++++++++++
 1 file changed, 262 insertions(+)
 create mode 100644 workflows/Automated Web Scraper: Niche Job-Product Monitor With Telegram Alert-6663/readme-6663.md

diff --git a/workflows/Automated Web Scraper: Niche Job-Product Monitor With Telegram Alert-6663/readme-6663.md b/workflows/Automated Web Scraper: Niche Job-Product Monitor With Telegram Alert-6663/readme-6663.md
new file mode 100644
index 000000000..4456347e3
--- /dev/null
+++ b/workflows/Automated Web Scraper: Niche Job-Product Monitor With Telegram Alert-6663/readme-6663.md	
@@ -0,0 +1,262 @@
+Automated Web Scraper: Niche Job/Product Monitor With Telegram Alert
+
+https://n8nworkflows.xyz/workflows/automated-web-scraper--niche-job-product-monitor-with-telegram-alert-6663
+
+
+# Automated Web Scraper: Niche Job/Product Monitor With Telegram Alert
+
+### 1. Workflow Overview
+
+This workflow is an **Automated Web Scraper** designed to monitor niche job boards, product pages, or any specified webpage at regular intervals and send alerts via Telegram when new or updated items are detected.
+
+It is structured into the following logical blocks:
+
+- **1.1 Hourly Trigger:** Automatically initiates the workflow on a recurring schedule (default every hour).
+- **1.2 Web Content Retrieval:** Downloads the raw HTML content of the targeted webpage.
+- **1.3 Data Extraction:** Parses the HTML to extract specific job or product information using CSS selectors.
+- **1.4 Conditional Check:** Determines if any relevant items were found to proceed further.
+- **1.5 Notification Formatting:** Formats the extracted data into a message suitable for Telegram.
+- **1.6 Alert Dispatch:** Sends the formatted notification message to a specified Telegram chat.
+
+---
+
+### 2. Block-by-Block Analysis
+
+#### 2.1 Hourly Trigger
+
+- **Overview:**  
+  This block uses a Cron node to trigger the workflow execution every hour automatically, ensuring timely monitoring without manual intervention.
+
+- **Nodes Involved:**  
+  - Hourly Monitor Trigger
+
+- **Node Details:**  
+  - **Name:** Hourly Monitor Trigger  
+  - **Type:** Cron Trigger  
+  - **Configuration:**  
+    - Mode set to `everyHour` to run once each hour.  
+    - No additional parameters configured, but can be customized to run at different intervals (e.g., every 4 hours or daily).  
+  - **Inputs:** None (Trigger node).  
+  - **Outputs:** Connects to `Fetch Webpage Content`.  
+  - **Edge Cases / Failures:**  
+    - Cron misconfiguration may cause no triggers or too frequent triggers.  
+    - Timezone differences may affect expected trigger times if server timezone differs.  
+  - **Version Requirements:** Standard Cron node; compatible with n8n v0.133+.
+
+---
+
+#### 2.2 Web Content Retrieval
+
+- **Overview:**  
+  Downloads the full HTML content of the target webpage to enable subsequent data extraction.
+
+- **Nodes Involved:**  
+  - Fetch Webpage Content
+
+- **Node Details:**  
+  - **Name:** Fetch Webpage Content  
+  - **Type:** HTTP Request  
+  - **Configuration:**  
+    - URL set to `https://www.n8n.io/blog/` by default — must be replaced with the URL of the job board, product page, or any target site.  
+    - Response Format: `string` (to get raw HTML).  
+    - No authentication or headers configured by default; may require adding cookies, headers, or tokens if the site needs login or API access.  
+  - **Inputs:** Triggered by `Hourly Monitor Trigger`.  
+  - **Outputs:** Sends HTML content to `Extract Job/Product Info`.  
+  - **Edge Cases / Failures:**  
+    - HTTP errors (404, 500, timeout).  
+    - Sites with JavaScript-rendered content will not be fully captured; may need headless browser approach or code node with Puppeteer/Playwright.  
+  - **Version Requirements:** HTTP Request node v3 or higher recommended for stable response format options.
+
+---
+
+#### 2.3 Data Extraction
+
+- **Overview:**  
+  Parses the downloaded HTML and extracts key data points such as job titles and links using CSS selectors.
+
+- **Nodes Involved:**  
+  - Extract Job/Product Info
+
+- **Node Details:**  
+  - **Name:** Extract Job/Product Info  
+  - **Type:** HTML Extract  
+  - **Configuration:**  
+    - HTML input set dynamically as `{{ $node["Fetch Webpage Content"].json.data }}` to consume prior node’s output.  
+    - Extract Operations configured with two selectors:  
+      1. Selector: `h3.BlogItem_title__d78Xb` → extracts job/product titles (`textContent`).  
+      2. Selector: `a.BlogItem_blogItem__a_H6E` → extracts job/product links (`href`).  
+    - Property names assigned as `JobTitle` and `JobLink`.  
+    - Users must customize selectors and properties to match their target webpage’s structure.  
+  - **Inputs:** Receives HTML from `Fetch Webpage Content`.  
+  - **Outputs:** Passes extracted data to `If Items Found`.  
+  - **Edge Cases / Failures:**  
+    - Incorrect or outdated CSS selectors will result in no data extracted.  
+    - If the page structure changes, the selectors must be updated.  
+    - Empty extraction arrays if site content is missing or access is blocked.  
+  - **Version Requirements:** HTML Extract node v1+; no specific version constraints.  
+  - **Notes:** Testing this node independently is critical to ensure selectors are accurate.
+
+---
+
+#### 2.4 Conditional Check
+
+- **Overview:**  
+  Checks whether the extraction yielded any items before proceeding to notification; prevents unnecessary alerts.
+
+- **Nodes Involved:**  
+  - If Items Found
+
+- **Node Details:**  
+  - **Name:** If Items Found  
+  - **Type:** If Node (Conditional)  
+  - **Configuration:**  
+    - Condition: Checks if the length of the extracted items array (`{{ $json.length }}`) is **not equal** to 0.  
+  - **Inputs:** Receives extracted data from `Extract Job/Product Info`.  
+  - **Outputs:**  
+    - True path: continues to `Format Notification Message`.  
+    - False path: workflow ends silently (no new items).  
+  - **Edge Cases / Failures:**  
+    - If extraction produces an unexpected output format, condition may fail.  
+    - Empty extraction arrays correctly end workflow without alerts.  
+  - **Version Requirements:** Standard If node; compatible with all recent n8n versions.
+
+---
+
+#### 2.5 Notification Formatting
+
+- **Overview:**  
+  Converts extracted items into a formatted Markdown message for Telegram, summarizing new or updated jobs/products.
+
+- **Nodes Involved:**  
+  - Format Notification Message
+
+- **Node Details:**  
+  - **Name:** Format Notification Message  
+  - **Type:** Function Node  
+  - **Configuration:**  
+    - JavaScript code iterates over extracted items:  
+      - Accesses `JobTitle` and `JobLink` (or fallback properties like `ProductName`).  
+      - Optionally appends `StockStatus` if available.  
+      - Builds a Markdown-formatted string with bold titles and links.  
+    - Outputs a single JSON object with property `notificationMessage`.  
+  - **Inputs:** Receives data from `If Items Found` (True path).  
+  - **Outputs:** Sends formatted text to `Send Telegram Alert`.  
+  - **Edge Cases / Failures:**  
+    - Missing or misnamed properties in extracted data lead to placeholders like "N/A" or "No link".  
+    - Empty input array leads to message: "No new items found during this check." (though this path should not occur due to prior conditional).  
+  - **Version Requirements:** Function node v1+ standard.
+
+---
+
+#### 2.6 Alert Dispatch
+
+- **Overview:**  
+  Sends the formatted notification message via Telegram Bot API to a specified chat.
+
+- **Nodes Involved:**  
+  - Send Telegram Alert
+
+- **Node Details:**  
+  - **Name:** Send Telegram Alert  
+  - **Type:** Telegram Node  
+  - **Configuration:**  
+    - Credentials: Requires a Telegram API credential with a Bot Token obtained from @BotFather on Telegram.  
+    - Chat ID: Must be set to the Telegram chat's unique ID where alerts are sent.  
+    - Text: Dynamically set as `{{ $json.notificationMessage }}` from previous node.  
+    - Parse Mode: Set to `Markdown` to allow bold and link formatting.  
+  - **Inputs:** Receives formatted message from `Format Notification Message`.  
+  - **Outputs:** Terminal node; no further outputs.  
+  - **Edge Cases / Failures:**  
+    - Invalid bot token or chat ID causes authentication errors.  
+    - Telegram API rate limits may delay or block message sending.  
+    - Message formatting errors if Markdown syntax is incorrect.  
+  - **Version Requirements:** Telegram node v1+, credentials must be configured.
+
+---
+
+### 3. Summary Table
+
+| Node Name               | Node Type            | Functional Role          | Input Node(s)            | Output Node(s)                | Sticky Note                                                                                                           |
+|-------------------------|----------------------|-------------------------|--------------------------|------------------------------|-----------------------------------------------------------------------------------------------------------------------|
+| Hourly Monitor Trigger   | Cron Trigger         | Workflow trigger        | None                     | Fetch Webpage Content         | This `Cron` node will trigger the workflow automatically every **hour**. Adjust schedule in node parameters.          |
+| Fetch Webpage Content    | HTTP Request         | Download webpage HTML   | Hourly Monitor Trigger   | Extract Job/Product Info      | Change URL to your target site. Response format set to `string`. May require auth or advanced scraping if JS is used. |
+| Extract Job/Product Info | HTML Extract         | Extract data with CSS   | Fetch Webpage Content    | If Items Found                | Define CSS selectors carefully; test output to ensure correct data extraction.                                        |
+| If Items Found           | If Node              | Conditional check       | Extract Job/Product Info | Format Notification Message   | Checks if any items were extracted to decide whether to send notification.                                            |
+| Format Notification Msg  | Function             | Format alert message    | If Items Found           | Send Telegram Alert           | Formats extracted data into Markdown message for Telegram alert.                                                      |
+| Send Telegram Alert      | Telegram             | Send notification       | Format Notification Msg  | None                         | Requires Telegram Bot token and Chat ID. Set Parse Mode to Markdown.                                                  |
+
+---
+
+### 4. Reproducing the Workflow from Scratch
+
+1. **Create a Cron Trigger Node**  
+   - Name: `Hourly Monitor Trigger`  
+   - Type: Cron Trigger  
+   - Set Mode to `everyHour` (or customize as needed for your monitoring frequency).  
+
+2. **Add HTTP Request Node**  
+   - Name: `Fetch Webpage Content`  
+   - Type: HTTP Request  
+   - Connect input from `Hourly Monitor Trigger`.  
+   - Set URL to the webpage you want to monitor (e.g., `https://your-target-site.com/jobs`).  
+   - Set Response Format to `string` to get raw HTML.  
+   - If needed, configure authentication headers, cookies, or tokens depending on the target site.  
+
+3. **Add HTML Extract Node**  
+   - Name: `Extract Job/Product Info`  
+   - Type: HTML Extract  
+   - Connect input from `Fetch Webpage Content`.  
+   - Set HTML field to `{{ $node["Fetch Webpage Content"].json.data }}` (use expression editor).  
+   - Add Extract Operations:  
+     - For each data point, add an operation with:  
+       - Selector: Use browser developer tools to find CSS selector for the target element (e.g., job title).  
+       - Attribute: Usually `textContent` for text or `href` for links.  
+       - Property Name: e.g., `JobTitle`, `JobLink`.  
+   - Test extraction to verify selectors are correct.  
+
+4. **Add If Node for Conditional Check**  
+   - Name: `If Items Found`  
+   - Connect input from `Extract Job/Product Info`.  
+   - Configure condition:  
+     - Value 1: `{{ $json.length }}` (expression)  
+     - Operation: `notEqual`  
+     - Value 2: `0`  
+   - This checks if any items were extracted.  
+
+5. **Add Function Node to Format Notification**  
+   - Name: `Format Notification Message`  
+   - Connect input from `If Items Found` (True path).  
+   - Enter the provided JavaScript function code which:  
+     - Iterates over items.  
+     - Extracts title, link, and optional stock status.  
+     - Builds a Markdown message summarizing all items.  
+
+6. **Add Telegram Node to Send Alert**  
+   - Name: `Send Telegram Alert`  
+   - Connect input from `Format Notification Message`.  
+   - Choose or create Telegram API credential:  
+     - Obtain Bot Token from @BotFather on Telegram.  
+   - Set Chat ID to your Telegram chat’s unique identifier.  
+     - Obtain by sending a message to your bot and querying `getUpdates` via Telegram API.  
+   - Set Text field to `{{ $json.notificationMessage }}` (expression).  
+   - Set Parse Mode to `Markdown`.  
+
+7. **Save and Activate the Workflow**  
+   - Test each step individually to ensure proper data flow and outputs.  
+   - Run the workflow manually first, then enable automatic triggering.  
+
+---
+
+### 5. General Notes & Resources
+
+| Note Content                                                                                                                                                          | Context or Link                                                                                                 |
+|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------|
+| To find CSS selectors for extraction, use browser developer tools: right-click element → Inspect → right-click code → Copy → Copy selector.                         | Standard browser dev tools (Chrome, Firefox).                                                                 |
+| Telegram Bot setup instructions: create a bot using @BotFather, get Bot Token, send a message to your bot, then obtain chat ID via `https://api.telegram.org/bot...` | Telegram API documentation and BotFather instructions.                                                        |
+| For sites with JavaScript-loaded content, consider using Puppeteer or Playwright within a `Code` node for scraping dynamic content.                                | n8n documentation on headless browser scraping techniques.                                                     |
+| Adjust schedule in Cron node to avoid rate limiting or excessive requests on the target website.                                                                     | Best practice for ethical scraping and avoiding bans.                                                          |
+| Markdown formatting in Telegram supports bolding (`**text**`), italics, links, etc.; verify message correctness before sending alerts.                            | Telegram Markdown parse mode documentation.                                                                    |
+
+---
+
+**Disclaimer:** The text provided originates exclusively from an automated workflow created with n8n, a tool for integration and automation. This processing strictly adheres to current content policies and contains no illegal, offensive, or protected elements. All handled data is legal and public.
\ No newline at end of file