diff --git a/workflows/Build a Reddit no-API weekly digest with ScrapeOps and Google Sheets-13992/readme-13992.md b/workflows/Build a Reddit no-API weekly digest with ScrapeOps and Google Sheets-13992/readme-13992.md
new file mode 100644
index 000000000..ab3984186
--- /dev/null
+++ b/workflows/Build a Reddit no-API weekly digest with ScrapeOps and Google Sheets-13992/readme-13992.md
@@ -0,0 +1,847 @@
+Build a Reddit no-API weekly digest with ScrapeOps and Google Sheets
+
+https://n8nworkflows.xyz/workflows/build-a-reddit-no-api-weekly-digest-with-scrapeops-and-google-sheets-13992
+
+
+# Build a Reddit no-API weekly digest with ScrapeOps and Google Sheets
+
+# 1. Workflow Overview
+
+This workflow creates a weekly Reddit industry digest without using the Reddit API. It scrapes public subreddit listing pages through ScrapeOps, extracts post metadata, enriches posts with Reddit JSON post details, deduplicates against a Google Sheet, stores only new posts, then compiles a weekly digest and optionally emails it.
+
+Typical use cases:
+- Weekly monitoring of technical communities such as `selfhosted`, `devops`, `programming`, and `webdev`
+- Building a content pipeline for newsletters, internal trend reports, or research tracking
+- Persisting scraped community content into Google Sheets for later analysis
+
+## 1.1 Trigger & Runtime Configuration
+
+The workflow starts on a weekly schedule. It calculates the current week range, resets workflow-level static memory, and emits one item per subreddit with shared configuration such as timeframe, per-subreddit post limit, and Google Sheet ID.
+
+## 1.2 Subreddit Listing Scraping
+
+Each subreddit is processed one at a time via batching. For each subreddit, the workflow requests the “Top of Week” page from `old.reddit.com` through ScrapeOps Proxy and inserts a randomized 1–3 second delay.
+
+## 1.3 Listing Parsing
+
+The returned HTML is parsed in a Code node. The parser extracts metadata including post title, canonical Reddit URL, author, flair, score, comment count, timestamp, and a generated SHA-1 content hash.
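
The fields that feed the hash are not stated here, so the following is only a minimal sketch of how a stable SHA-1 `content_hash` could be derived in a Code node. The helper names `normalizePostUrl` and `contentHash`, and the choice of subreddit, URL, and title as hash inputs, are assumptions made for illustration.

```javascript
const crypto = require('node:crypto');

// Normalize a Reddit permalink: force the www.reddit.com host,
// drop the query string/fragment, and strip trailing slashes.
function normalizePostUrl(url) {
  const u = new URL(url, 'https://www.reddit.com');
  return `https://www.reddit.com${u.pathname.replace(/\/+$/, '')}`;
}

// Assumed hash basis: subreddit + normalized URL + trimmed title.
function contentHash(post) {
  const basis = [
    post.subreddit,
    normalizePostUrl(post.post_url),
    String(post.post_title || '').trim(),
  ].join('|').toLowerCase();
  return crypto.createHash('sha1').update(basis, 'utf8').digest('hex');
}
```

Because the URL is normalized before hashing, the same post reached through `old.reddit.com` or with tracking parameters produces the same hash. Depending on how the n8n instance is configured, built-in modules such as `crypto` may need to be explicitly allowed in Code nodes.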
+
+## 1.4 Post Enrichment
+
+For every parsed listing item, the workflow fetches the corresponding Reddit `.json` endpoint through ScrapeOps, extracts `selftext` and inferred post type, then merges that data back into the listing metadata and normalizes the final post fields.
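
As a rough illustration of the enrichment step, the sketch below mirrors the `.json` URL expression used by the post-details fetch and pulls the post object from `data.children[0].data`. The function names and the `is_self`-based `post_type` heuristic are assumptions; the actual Code node may infer the type differently.

```javascript
// Same transformation as the workflow's URL expression:
// strip the query string and trailing slash, then append ".json?raw_json=1".
function toJsonUrl(postUrl) {
  return (postUrl || '').replace(/\?.*$/, '').replace(/\/$/, '') + '.json?raw_json=1';
}

// Extract post fields from a Reddit post-page JSON payload.
// The payload is usually an array of listings with the post in the first one,
// but a single listing object is accepted as well.
function extractPost(raw) {
  let parsed;
  try {
    parsed = typeof raw === 'string' ? JSON.parse(raw) : raw;
  } catch (err) {
    // Blocked or challenge pages arrive as HTML and fail to parse.
    return { ok: false, error: `invalid JSON: ${err.message}` };
  }
  const listing = Array.isArray(parsed) ? parsed[0] : parsed;
  const data = listing?.data?.children?.[0]?.data;
  if (!data) return { ok: false, error: 'post data not found' };
  return {
    ok: true,
    post_id: data.id,
    post_title: data.title,
    post_text_extracted: (data.selftext || '').trim(),
    // Assumed heuristic: self posts carry `is_self`; everything else is a link.
    post_type: data.is_self ? 'self' : 'link',
  };
}
```

Returning a diagnostic object instead of throwing keeps one bad response from aborting the whole batch, at the cost of having to check `ok` downstream.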
+
+## 1.5 Deduplication & Persistence
+
+Once the configuration stage completes, and in parallel with the scraping branch, the workflow reads existing rows from the `posts` tab in Google Sheets. Scraped posts are compared against those rows using both `content_hash` and normalized `post_url`, and only unseen posts are marked as new. The intent is to append only those posts to the `posts` sheet; note that the wiring as published passes every item to the append node (see section 4, step 18).
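
A minimal sketch of the hash-and-URL comparison follows. The function names and the exact URL normalization are assumptions; in the workflow itself the existing rows come from the Google Sheets read, not from a plain array.

```javascript
// Normalize a URL for comparison: lowercase, no query/fragment, no trailing slash.
function normalizeUrl(url) {
  return String(url || '').trim().toLowerCase().replace(/[?#].*$/, '').replace(/\/+$/, '');
}

// Flag each scraped post with is_new, based on rows already in the sheet.
function flagNewPosts(scraped, existingRows) {
  const seen = new Set();
  for (const row of existingRows) {
    if (row.content_hash) seen.add(`hash:${row.content_hash}`);
    if (row.post_url) seen.add(`url:${normalizeUrl(row.post_url)}`);
  }
  return scraped.map((post) => {
    const hashKey = `hash:${post.content_hash}`;
    const urlKey = `url:${normalizeUrl(post.post_url)}`;
    const isNew = !seen.has(hashKey) && !seen.has(urlKey);
    // Remember this post so repeats within the same run are flagged too.
    seen.add(hashKey);
    seen.add(urlKey);
    return { ...post, is_new: isNew };
  });
}

// Keep only the rows that should be appended.
function onlyNew(flagged) {
  return flagged.filter((post) => post.is_new);
}
```

Applying `onlyNew` (or an equivalent IF node) before the append is what actually enforces "append only new posts"; flagging alone does not.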
+
+## 1.6 Weekly Digest Generation & Delivery
+
+After batch processing completes, the workflow builds a digest from the in-memory collection of newly discovered posts. It derives lightweight topics from repeated words, selects top posts by score and comment count, writes a summary row into the `weekly_digest` sheet, and optionally emails the digest.
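
To make the digest logic concrete, here is a small sketch of the keyword counting and top-post ranking. The stopword list, the three-character minimum, the "at least two occurrences" threshold, and the function names are all assumptions chosen for illustration, not values taken from the workflow.

```javascript
// Illustrative stopword list; the real node presumably uses a longer one.
const STOPWORDS = new Set(['the', 'and', 'for', 'with', 'how', 'this', 'that', 'you', 'are', 'from']);

// Count repeated words across titles and post text, most frequent first.
function topKeywords(posts, max = 5) {
  const counts = new Map();
  for (const post of posts) {
    const text = `${post.post_title || ''} ${post.post_text || ''}`.toLowerCase();
    for (const word of text.match(/[a-z0-9]{3,}/g) || []) {
      if (STOPWORDS.has(word)) continue;
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .filter(([, n]) => n >= 2) // keep only words that repeat
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, max)
    .map(([word, count]) => ({ word, count }));
}

// Rank posts by score, breaking ties by comment count.
function topPosts(posts, max = 5) {
  return [...posts]
    .sort((a, b) => (b.score - a.score) || (b.num_comments - a.num_comments))
    .slice(0, max);
}
```

With an empty input both functions return empty arrays, which matches the case where the digest runs although no new posts were collected.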
+
+---
+
+# 2. Block-by-Block Analysis
+
+## 2.1 Block: Trigger & Configuration
+
+### Overview
+This block launches the workflow weekly and prepares one execution item per target subreddit. It also initializes workflow static data used later for deduplication and digest generation.
+
+### Nodes Involved
+- Weekly Schedule Trigger
+- Configure Subreddits & Week Range
+
+### Node Details
+
+#### Weekly Schedule Trigger
+- **Type and role:** `n8n-nodes-base.scheduleTrigger`; workflow entry point
+- **Configuration choices:** Uses a basic interval rule. The rule in the exported JSON is minimal, so confirm after import that the trigger actually fires once a week.
+- **Key expressions or variables used:** None
+- **Input and output connections:**
+ - Input: none
+ - Output: `Configure Subreddits & Week Range`
+- **Version-specific requirements:** Type version 1
+- **Edge cases / failure types:**
+ - Misconfigured schedule may cause it to run too often or not at all
+ - Timezone interpretation depends on instance settings
+- **Sub-workflow reference:** None
+
+#### Configure Subreddits & Week Range
+- **Type and role:** `n8n-nodes-base.code`; emits runtime configuration records
+- **Configuration choices:**
+ - Resets static global arrays:
+ - `global.seen = []`
+ - `global.newPosts = []`
+ - Calculates:
+ - `run_id` from current ISO timestamp
+ - `run_date` as `YYYY-MM-DD`
+ - `week_range` from Monday to Sunday in UTC
+ - Defines subreddit list:
+ - `selfhosted`
+ - `devops`
+ - `programming`
+ - `webdev`
+ - Sets fixed parameters per emitted item:
+ - `sort = "top"`
+ - `time_range = "week"`
+ - `limit = 20`
+ - `sheet_id = "1rKuVREV4pedie7uAbuEcLvghNrAEeAjIPUBE6cQPleI"`
+- **Key expressions or variables used:**
+ - Workflow static data via `$getWorkflowStaticData('global')`
+ - UTC week calculations
+- **Input and output connections:**
+ - Input: `Weekly Schedule Trigger`
+ - Outputs:
+ - ` Split Subreddits Into Batches`
+ - `Read Existing Posts from Sheet`
+- **Version-specific requirements:** Code node type version 2
+- **Edge cases / failure types:**
+ - If static data is unavailable in a given runtime, fallback behavior relies on `globalThis`
+ - Hardcoded subreddit list requires manual editing for changes
+ - Hardcoded sheet ID may drift from the IDs configured in Google Sheets nodes
+- **Sub-workflow reference:** None
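
The configuration logic described above could look roughly like the sketch below. The helper names, the `week_range` string format, and the placeholder sheet ID are assumptions; resetting `seen` and `newPosts` through `$getWorkflowStaticData('global')` is left out because that helper only exists inside n8n.

```javascript
// Return "YYYY-MM-DD to YYYY-MM-DD" for the Monday-to-Sunday week (UTC) containing `now`.
function weekRangeUtc(now) {
  const daysSinceMonday = (now.getUTCDay() + 6) % 7; // Monday -> 0, Sunday -> 6
  const monday = new Date(Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() - daysSinceMonday,
  ));
  const sunday = new Date(monday.getTime() + 6 * 24 * 60 * 60 * 1000);
  const fmt = (d) => d.toISOString().slice(0, 10);
  return `${fmt(monday)} to ${fmt(sunday)}`;
}

// Emit one item per subreddit, each carrying the shared run configuration.
function buildConfigItems(now, subreddits, sheetId) {
  const shared = {
    run_id: now.toISOString(),
    run_date: now.toISOString().slice(0, 10),
    week_range: weekRangeUtc(now),
    sort: 'top',
    time_range: 'week',
    limit: 20,
    sheet_id: sheetId,
  };
  // n8n Code nodes return an array of { json } objects.
  return subreddits.map((subreddit) => ({ json: { subreddit, ...shared } }));
}

const items = buildConfigItems(
  new Date('2024-05-15T10:00:00Z'),
  ['selfhosted', 'devops', 'programming', 'webdev'],
  'YOUR_SPREADSHEET_ID', // placeholder, not a real ID
);
```

Computing the week boundaries with `Date.UTC` keeps the range independent of the instance timezone, which matters because the schedule trigger itself follows instance settings.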
+
+---
+
+## 2.2 Block: Scrape Subreddit Listings
+
+### Overview
+This block processes subreddits sequentially, fetches each subreddit’s weekly top page through ScrapeOps, and deliberately slows the request cadence to reduce scraping pressure.
+
+### Nodes Involved
+- Split Subreddits Into Batches
+- ScrapeOps: Fetch Subreddit Listing
+- Polite Delay (1–3s)
+
+### Node Details
+
+#### Split Subreddits Into Batches
+- **Type and role:** `n8n-nodes-base.splitInBatches`; controls iteration over subreddit items
+- **Configuration choices:**
+ - `batchSize = 1`
+ - Processes one subreddit at a time
+- **Key expressions or variables used:** None
+- **Input and output connections:**
+ - Input:
+ - `Configure Subreddits & Week Range`
+ - loop-back from `Append New Posts to Sheet`
+ - Outputs:
+ - `ScrapeOps: Fetch Subreddit Listing`
+ - `Build Weekly Digest`
+- **Version-specific requirements:** Type version 2
+- **Edge cases / failure types:**
+ - Because `Build Weekly Digest` is connected to the second output, the digest runs when batching completes; if no new posts were collected, the digest may still execute with empty data
+ - Loop behavior depends on n8n batch semantics; incorrect downstream termination can cause partial processing
+- **Sub-workflow reference:** None
+
+#### ScrapeOps: Fetch Subreddit Listing
+- **Type and role:** `@scrapeops/n8n-nodes-scrapeops.ScrapeOps`; performs proxied HTTP fetch of subreddit HTML
+- **Configuration choices:**
+ - URL expression:
+ - `https://old.reddit.com/r/{{$json.subreddit}}/top/?t=week`
+ - Uses ScrapeOps account credential
+ - No advanced options explicitly set
+- **Key expressions or variables used:**
+ - `$json.subreddit`
+- **Input and output connections:**
+ - Input: ` Split Subreddits Into Batches`
+ - Output: ` Polite Delay (1–3s)`
+- **Version-specific requirements:** ScrapeOps node type version 1; requires installed ScrapeOps n8n node package and valid API credentials
+- **Edge cases / failure types:**
+ - Invalid or missing ScrapeOps credential
+ - Reddit returning blocked, challenge, or alternate HTML
+ - Network timeout or proxy errors
+ - If subreddit does not exist, parsing stage may return zero posts
+- **Sub-workflow reference:** None
+
+#### Polite Delay (1–3s)
+- **Type and role:** `n8n-nodes-base.wait`; rate-control pause
+- **Configuration choices:**
+ - Wait duration in seconds
+ - Randomized expression: `Math.floor(Math.random()*3)+1`
+- **Key expressions or variables used:**
+ - Dynamic wait duration expression
+- **Input and output connections:**
+ - Input: `ScrapeOps: Fetch Subreddit Listing`
+ - Output: `Parse Listing HTML → Post Metadata`
+- **Version-specific requirements:** Type version 1
+- **Edge cases / failure types:**
+ - Wait node resumes execution asynchronously; environment must support wait/resume properly
+ - Very large runs can accumulate runtime overhead
+- **Sub-workflow reference:** None
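
For reference, the listing URL and the delay arithmetic from this block can be expressed as two small functions. The function names are hypothetical, and the `encodeURIComponent` call is a defensive addition that the original URL expression does not include.

```javascript
// Build the old.reddit.com "Top of Week" URL for one subreddit.
function listingUrl(subreddit) {
  return `https://old.reddit.com/r/${encodeURIComponent(subreddit)}/top/?t=week`;
}

// Same arithmetic as the Wait node expression: an integer in {1, 2, 3}.
// `random` is injectable so the bounds can be checked deterministically.
function politeDelaySeconds(random = Math.random) {
  return Math.floor(random() * 3) + 1;
}
```

Since `Math.random()` never returns exactly 1, the expression cannot produce 4; the delay is always 1, 2, or 3 whole seconds.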
+
+---
+
+## 2.3 Block: Parse Post Metadata
+
+### Overview
+This block converts scraped subreddit listing HTML into structured post objects. It also generates fallback metadata, detects subreddit names, and computes stable hashes for deduplication.
+
+### Nodes Involved
+- Parse Listing HTML → Post Metadata
+
+### Node Details
+
+#### Parse Listing HTML → Post Metadata
+- **Type and role:** `n8n-nodes-base.code`; HTML parser and record builder
+- **Configuration choices:**
+ - Reads HTML from `$json.data`, `$json.body`, or raw `$json`
+ - Uses `limit` from input, defaulting to 20
+ - Calculates fallback values:
+ - `run_id`
+ - `run_date`
+ - `week_range`
+ - Parses old Reddit listing blocks by splitting on `
+
+---
+
+# 3. Summary Table
+
+| Node Name | Node Type | Functional Role | Input Node(s) | Output Node(s) | Sticky Note |
+|---|---|---|---|---|---|
+| Weekly Schedule Trigger | Schedule Trigger | Workflow entry point | | Configure Subreddits & Week Range | ## 1. Trigger & Configuration<br>Fires weekly and sets runtime config: subreddit list, week range, batch size, and Google Sheet IDs. |
+| Configure Subreddits & Week Range | Code | Builds per-subreddit runtime items and resets static state | Weekly Schedule Trigger | Split Subreddits Into Batches; Read Existing Posts from Sheet | ## 1. Trigger & Configuration<br>Fires weekly and sets runtime config: subreddit list, week range, batch size, and Google Sheet IDs. |
+| Split Subreddits Into Batches | Split In Batches | Iterates through subreddits one at a time | Configure Subreddits & Week Range; Append New Posts to Sheet | ScrapeOps: Fetch Subreddit Listing; Build Weekly Digest | ## 2. Scrape Subreddit Listings<br>Batch through each subreddit and scrape the "Top of Week" page via [ScrapeOps Proxy](https://scrapeops.io/docs/n8n/proxy-api/) with a polite delay between requests. |
+| ScrapeOps: Fetch Subreddit Listing | ScrapeOps | Fetches subreddit top-of-week HTML via proxy | Split Subreddits Into Batches | Polite Delay (1–3s) | ## 2. Scrape Subreddit Listings<br>Batch through each subreddit and scrape the "Top of Week" page via [ScrapeOps Proxy](https://scrapeops.io/docs/n8n/proxy-api/) with a polite delay between requests. |
+| Polite Delay (1–3s) | Wait | Adds random delay between requests | ScrapeOps: Fetch Subreddit Listing | Parse Listing HTML → Post Metadata | ## 2. Scrape Subreddit Listings<br>Batch through each subreddit and scrape the "Top of Week" page via [ScrapeOps Proxy](https://scrapeops.io/docs/n8n/proxy-api/) with a polite delay between requests. |
+| Parse Listing HTML → Post Metadata | Code | Parses old Reddit listing HTML into structured posts | Polite Delay (1–3s) | ScrapeOps: Fetch Post Details (JSON); Merge Post Metadata + Text | ## 3. Parse Post Metadata<br>Extract title, URL, score, comment count, author, and timestamps from listing HTML into structured JSON. |
+| ScrapeOps: Fetch Post Details (JSON) | ScrapeOps | Fetches per-post Reddit JSON | Parse Listing HTML → Post Metadata | Extract Selftext & Post Type | ## 4. Enrich & Finalize Posts<br>Fetch each post as JSON to extract `selftext`, merge with listing metadata, and normalize all fields into the final record. |
+| Extract Selftext & Post Type | Code | Extracts selftext and post characteristics from Reddit JSON | ScrapeOps: Fetch Post Details (JSON) | Merge Post Metadata + Text | ## 4. Enrich & Finalize Posts<br>Fetch each post as JSON to extract `selftext`, merge with listing metadata, and normalize all fields into the final record. |
+| Merge Post Metadata + Text | Merge | Merges listing metadata with post JSON extraction | Parse Listing HTML → Post Metadata; Extract Selftext & Post Type | Finalize & Normalize Post Fields | ## 4. Enrich & Finalize Posts<br>Fetch each post as JSON to extract `selftext`, merge with listing metadata, and normalize all fields into the final record. |
+| Finalize & Normalize Post Fields | Code | Chooses best post text and cleans fields | Merge Post Metadata + Text | Merge Scraped + Existing Posts | ## 4. Enrich & Finalize Posts<br>Fetch each post as JSON to extract `selftext`, merge with listing metadata, and normalize all fields into the final record. |
+| Read Existing Posts from Sheet | Google Sheets | Loads existing saved posts for deduplication | Configure Subreddits & Week Range | Merge Scraped + Existing Posts | ## 5. Deduplicate & Save<br>Compare against existing Sheet rows by hash and URL, then append only new posts to the `posts` tab. |
+| Merge Scraped + Existing Posts | Merge | Synchronizes scraped branch and sheet-read branch before deduplication | Finalize & Normalize Post Fields; Read Existing Posts from Sheet | Deduplicate New Posts | ## 5. Deduplicate & Save<br>Compare against existing Sheet rows by hash and URL, then append only new posts to the `posts` tab. |
+| Deduplicate New Posts | Code | Flags duplicates using hash and URL and stores new posts in static memory | Merge Scraped + Existing Posts | Append New Posts to Sheet | ## 5. Deduplicate & Save<br>Compare against existing Sheet rows by hash and URL, then append only new posts to the `posts` tab. |
+| Append New Posts to Sheet | Google Sheets | Appends post rows to the `posts` sheet and loops batch execution | Deduplicate New Posts | Split Subreddits Into Batches | ## 5. Deduplicate & Save<br>Compare against existing Sheet rows by hash and URL, then append only new posts to the `posts` tab. |
+| Build Weekly Digest | Code | Builds digest summary from newly found posts | Split Subreddits Into Batches | Append Weekly Digest to Sheet | ## 6. Weekly Digest & Email<br>Generate topic clusters and top post summaries, write to `weekly_digest` tab, and optionally send by email. |
+| Append Weekly Digest to Sheet | Google Sheets | Stores weekly digest in sheet | Build Weekly Digest | Send Weekly Digest Email | ## 6. Weekly Digest & Email<br>Generate topic clusters and top post summaries, write to `weekly_digest` tab, and optionally send by email. |
+| Send Weekly Digest Email | Email Send | Emails the final digest text | Append Weekly Digest to Sheet | | ## 6. Weekly Digest & Email<br>Generate topic clusters and top post summaries, write to `weekly_digest` tab, and optionally send by email. |
+| Overview (Sticky) | Sticky Note | Workspace documentation | | | # 📰 Reddit Industry Digest (Weekly) → Google Sheets<br>This workflow builds a weekly industry digest by collecting top posts from selected subreddits, with no Reddit API needed. It scrapes public Reddit pages via **ScrapeOps Proxy**, enriches each post with full text using Reddit's JSON endpoint, deduplicates against your Google Sheet, and generates a weekly summary that can optionally be emailed.<br>### How it works<br>1. ⏰ **Weekly Schedule Trigger** fires automatically once a week.<br>2. ⚙️ **Configure Subreddits & Week Range** sets the subreddit list, week range, and Sheet IDs.<br>3. 📦 **Split Subreddits Into Batches** processes each subreddit one at a time.<br>4. 🌐 **ScrapeOps: Fetch Subreddit Listing** scrapes the top-of-week page from `old.reddit.com`.<br>5. ⏳ **Polite Delay** adds a 1–3s pause between requests.<br>6. 🔍 **Parse Listing HTML** extracts title, URL, score, comments, author, and timestamps.<br>7. 📡 **ScrapeOps: Fetch Post Details** retrieves each post as JSON to extract `selftext`.<br>8. 🔀 **Merge & Normalize** combines listing data with post body text into a final record.<br>9. 🧹 **Deduplicate New Posts** filters posts already in the Sheet by hash and URL.<br>10. 💾 **Append New Posts** saves only new posts to the `posts` tab.<br>11. 📊 **Build Weekly Digest** generates topic clusters and top post summaries.<br>12. 📧 **Send Digest Email** optionally emails the weekly summary.<br>### Setup steps<br>- Register for a free ScrapeOps API key: https://scrapeops.io/app/register/n8n<br>- Add ScrapeOps credentials in n8n. Docs: https://scrapeops.io/docs/n8n/overview/<br>- Duplicate [this sheet](https://docs.google.com/spreadsheets/d/1rKuVREV4pedie7uAbuEcLvghNrAEeAjIPUBE6cQPleI/edit?usp=sharing) to copy Columns and Spreadsheet ID.<br>- Connect Google Sheets and set your Spreadsheet ID in the Sheet nodes.<br>- Update your subreddit list in **Configure Subreddits & Week Range**.<br>- Optional: enable **Send Digest Email** and configure credentials.<br>### Customization<br>- Add or remove subreddits in the configure node.<br>- Change timeframe from `week` to `month` in the fetch URL.<br>- Add a Slack node to post the digest to a channel. |
+| Section: Trigger & Inputs | Sticky Note | Visual section label | | | ## 1. Trigger & Configuration<br>Fires weekly and sets runtime config: subreddit list, week range, batch size, and Google Sheet IDs. |
+| Section: Scrape Listings | Sticky Note | Visual section label | | | ## 2. Scrape Subreddit Listings<br>Batch through each subreddit and scrape the "Top of Week" page via [ScrapeOps Proxy](https://scrapeops.io/docs/n8n/proxy-api/) with a polite delay between requests. |
+| Section: Post Enrichment | Sticky Note | Visual section label | | | ## 3. Parse Post Metadata<br>Extract title, URL, score, comment count, author, and timestamps from listing HTML into structured JSON. |
+| Section: Post Enrichment1 | Sticky Note | Visual section label | | | ## 4. Enrich & Finalize Posts<br>Fetch each post as JSON to extract `selftext`, merge with listing metadata, and normalize all fields into the final record. |
+| Section: Post Enrichment2 | Sticky Note | Visual section label | | | ## 5. Deduplicate & Save<br>Compare against existing Sheet rows by hash and URL, then append only new posts to the `posts` tab. |
+| Section: Post Enrichment3 | Sticky Note | Visual section label | | | ## 6. Weekly Digest & Email<br>Generate topic clusters and top post summaries, write to `weekly_digest` tab, and optionally send by email. |
+
+---
+
+# 4. Reproducing the Workflow from Scratch
+
+1. **Create a new workflow**
+ - Name it something like: `Reddit Industry Digest with ScrapeOps and Google Sheets`.
+
+2. **Add a Schedule Trigger node**
+ - Node type: `Schedule Trigger`
+ - Configure it to run weekly.
+ - Choose the desired weekday and time in your n8n instance timezone.
+
+3. **Add a Code node named `Configure Subreddits & Week Range`**
+ - Connect it after the trigger.
+ - Paste logic that:
+ - resets workflow static global arrays `seen` and `newPosts`
+ - computes:
+ - `run_id`
+ - `run_date`
+ - Monday-to-Sunday `week_range`
+ - defines a subreddit list such as:
+ - `selfhosted`
+ - `devops`
+ - `programming`
+ - `webdev`
+ - emits one item per subreddit with:
+ - `subreddit`
+ - `sort = top`
+ - `time_range = week`
+ - `limit = 20`
+ - `sheet_id = your spreadsheet ID`
+
+4. **Add a Google Sheets credential**
+ - Use OAuth2 for Google Sheets.
+ - Ensure access to the destination spreadsheet.
+
+5. **Prepare the spreadsheet**
+ - Create or duplicate a spreadsheet with two tabs:
+ - `posts`
+ - `weekly_digest`
+ - The `posts` tab should contain columns:
+ - `run_id`
+ - `run_date`
+ - `subreddit`
+ - `sort`
+ - `time_range`
+ - `post_id`
+ - `post_url`
+ - `post_title`
+ - `post_text`
+ - `author`
+ - `created_utc`
+ - `score`
+ - `num_comments`
+ - `flair`
+ - `extracted_at`
+ - `content_hash`
+ - `is_new`
+ - The `weekly_digest` tab should contain columns:
+ - `run_id`
+ - `week_range`
+ - `subreddits`
+ - `total_posts`
+ - `top_topics_json`
+ - `weekly_brief_text`
+ - `top_posts_json`
+ - `created_at`
+
+6. **Add a Google Sheets node named `Read Existing Posts from Sheet`**
+ - Connect it from `Configure Subreddits & Week Range`.
+ - Configure it to read from your spreadsheet.
+ - Select the `posts` tab.
+ - Enable it to output data even if empty, if available in your node version.
+
+7. **Add a `Split In Batches` node**
+ - Name it ` Split Subreddits Into Batches`.
+ - Connect it from `Configure Subreddits & Week Range`.
+ - Set `Batch Size` to `1`.
+
+8. **Install and configure ScrapeOps**
+ - Install the ScrapeOps n8n node package if it is not already installed.
+ - Create ScrapeOps credentials with your API key.
+ - Reference:
+ - https://scrapeops.io/app/register/n8n
+ - https://scrapeops.io/docs/n8n/overview/
+
+9. **Add a ScrapeOps node named `ScrapeOps: Fetch Subreddit Listing`**
+ - Connect it from ` Split Subreddits Into Batches`.
+ - Set URL to:
+ - `https://old.reddit.com/r/{{$json.subreddit}}/top/?t=week`
+ - Use the ScrapeOps credential.
+ - Keep response as HTML/text.
+
+10. **Add a Wait node named ` Polite Delay (1–3s)`**
+ - Connect it after the listing fetch.
+ - Set unit to `seconds`.
+ - Set amount expression to:
+ - `{{ Math.floor(Math.random()*3)+1 }}`
+
+11. **Add a Code node named `Parse Listing HTML → Post Metadata`**
+ - Connect it after the wait node.
+ - Implement logic that:
+ - reads listing HTML from `data` or `body`
+ - parses each Reddit post block from `old.reddit.com`
+ - extracts title, author, permalink, score, comments, flair, and timestamp
+ - normalizes Reddit URLs to `https://www.reddit.com/...`
+ - computes `content_hash` using SHA-1
+ - emits one item per post
+ - honors a `limit` from input, default `20`
+ - Enable `Always Output Data`.
+
+12. **Add a ScrapeOps node named ` ScrapeOps: Fetch Post Details (JSON)`**
+ - Connect it from `Parse Listing HTML → Post Metadata`.
+ - Set URL expression to:
+ - `{{ ($json.post_url || '').replace(/\?.*$/, '').replace(/\/$/, '') + '.json?raw_json=1' }}`
+ - Set return type to `json`.
+ - Use the same ScrapeOps credential.
+
+13. **Add a Code node named `Extract Selftext & Post Type`**
+ - Connect it after the post-details node.
+ - Implement logic that:
+ - looks for the raw response in `body`, `data`, `response`, or the longest string field
+ - decodes HTML entities
+ - rejects HTML responses
+ - parses JSON
+ - extracts post data from `data.children[0].data`
+ - emits fields including:
+ - `post_text_extracted`
+ - `post_type`
+ - `post_title`
+ - `post_id`
+ - `post_url`
+ - `subreddit`
+ - `score`
+ - `num_comments`
+ - `author`
+ - `created_utc`
+ - returns diagnostic data on parse failure
+
+14. **Add a Merge node named `Merge Post Metadata + Text`**
+ - Connect input 0 from `Parse Listing HTML → Post Metadata`
+ - Connect input 1 from `Extract Selftext & Post Type`
+ - Set:
+ - Mode: `Combine`
+ - Combination mode: `Merge By Position`
+
+15. **Add a Code node named `Finalize & Normalize Post Fields`**
+ - Connect it after the merge.
+ - Configure it to:
+ - overwrite `post_text` with `post_text_extracted` when non-empty
+ - otherwise keep the existing `post_text`
+ - remove `post_text_extracted`
+
+16. **Add a Merge node named ` Merge Scraped + Existing Posts`**
+ - Connect input 0 from `Finalize & Normalize Post Fields`
+ - Connect input 1 from `Read Existing Posts from Sheet`
+ - Set:
+ - Mode: `Combine`
+ - Combination mode: `Merge By Position`
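+ - Note: this node mainly acts as a synchronization point. `Deduplicate New Posts` reads the sheet rows directly with `$items(...)` (step 17), so the position-based pairing itself carries no meaning.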
+
+17. **Add a Code node named `Deduplicate New Posts`**
+ - Connect it after the merge.
+ - Implement logic that:
+ - loads workflow static global data
+ - reads all rows from `Read Existing Posts from Sheet` with `$items(...)`
+ - builds a set of existing `content_hash` and normalized `post_url`
+ - checks each scraped item against that set
+ - sets `is_new` true or false
+ - pushes only new posts into `global.newPosts`
+ - returns items for downstream use
+ - Enable `Always Output Data`.
+
+18. **Important correction: filter before appending**
+ - The provided workflow claims to append only new posts, but as wired it returns all items to the append node.
+ - To reproduce the intended behavior safely, add an `IF` node or a Code filter after `Deduplicate New Posts`:
+ - condition: `{{$json.is_new}}` is true
+ - Send only the true branch to the append node.
+ - If reproducing the JSON exactly, omit this filter; if reproducing the intended logic, include it.
+
+19. **Add a Google Sheets node named `Append New Posts to Sheet`**
+ - Connect it from:
+ - ideally the filtered `true` branch from step 18
+ - or directly from `Deduplicate New Posts` if you want to mirror the provided wiring
+ - Configure:
+ - Operation: `Append`
+ - Spreadsheet: your spreadsheet
+ - Sheet: `posts`
+ - Map the columns explicitly to the fields listed in step 5
+
+20. **Loop batch execution**
+ - Connect `Append New Posts to Sheet` back to ` Split Subreddits Into Batches`.
+ - This continues processing the next subreddit.
+
+21. **Add a Code node named `Build Weekly Digest`**
+ - Connect it to the second output of ` Split Subreddits Into Batches`, which runs when batching completes.
+ - Implement logic that:
+ - reads `global.newPosts`
+ - counts total new posts
+ - creates a subreddit summary
+ - tokenizes title + post text
+ - excludes common stopwords
+ - derives top keywords and simple topic clusters
+ - sorts top posts by score, then comment count
+ - creates:
+ - `top_topics_json`
+ - `top_posts_json`
+ - `weekly_brief_text`
+ - `created_at`
+ - `run_id`
+ - `week_range`
+
+22. **Add a Google Sheets node named `Append Weekly Digest to Sheet`**
+ - Connect it after `Build Weekly Digest`.
+ - Configure:
+ - Operation: `Append`
+ - Sheet: `weekly_digest`
+ - Explicitly map:
+ - `run_id`
+ - `created_at`
+ - `subreddits`
+ - `week_range`
+ - `total_posts`
+ - `top_posts_json`
+ - `top_topics_json`
+ - `weekly_brief_text`
+
+23. **Add an Email Send node named `Send Weekly Digest Email`**
+ - Connect it after `Append Weekly Digest to Sheet`.
+ - Configure:
+ - To: your recipient address
+ - From: a valid sender address
+ - Subject: `Weekly Developer Tools Digest (Reddit) – {{$json.week_range}}`
+ - Text body: `{{$json.weekly_brief_text}}`
+ - Enable `Execute Once`.
+
+24. **Configure email credentials**
+ - Depending on your n8n environment, configure SMTP or the supported email transport.
+ - Replace placeholder addresses.
+
+25. **Test with manual execution**
+ - Run the workflow manually.
+ - Verify:
+ - subreddit pages are fetched
+ - posts are parsed
+ - per-post JSON is readable
+ - `posts` tab receives rows
+ - `weekly_digest` tab receives one digest row
+ - email sends correctly if enabled
+
+26. **Validate edge conditions**
+ - Test with:
+ - a nonexistent subreddit
+ - an empty `posts` tab
+ - a repeated run on the same week
+ - one or more link/image posts with empty `selftext`
+
+27. **Recommended hardening improvements**
+ - Add a filter before `Append New Posts to Sheet` so only `is_new = true` rows are appended
+ - Replace merge-by-position with a safer key-based join where practical
+ - Add error handling for blocked HTML, bad JSON, and credential failures
+ - Move hardcoded subreddit list and spreadsheet ID into environment variables or workflow variables
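
As one possible shape for the key-based join recommended above, the sketch below matches listing items to enrichment items on `post_id` instead of array position, and applies the same "prefer extracted text when non-empty" rule as step 15. The function name is hypothetical, and it assumes both branches carry a reliable `post_id`.

```javascript
// Join listing items with enrichment items on post_id.
// Items whose enrichment failed keep their original post_text.
function joinByPostId(listingItems, detailItems) {
  const detailsById = new Map(detailItems.map((d) => [d.post_id, d]));
  return listingItems.map((item) => {
    const detail = detailsById.get(item.post_id) || {};
    const text = (detail.post_text_extracted || '').trim();
    return { ...item, post_text: text || item.post_text || '' };
  });
}
```

Unlike a position-based merge, a failed or missing enrichment response for one post cannot shift the text of every following post onto the wrong record.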
+
+---
+
+# 5. General Notes & Resources
+
+| Note Content | Context or Link |
+|---|---|
+| Register for a free ScrapeOps API key | https://scrapeops.io/app/register/n8n |
+| ScrapeOps n8n documentation | https://scrapeops.io/docs/n8n/overview/ |
+| ScrapeOps Proxy API documentation | https://scrapeops.io/docs/n8n/proxy-api/ |
+| Duplicate the sample Google Sheet template | https://docs.google.com/spreadsheets/d/1rKuVREV4pedie7uAbuEcLvghNrAEeAjIPUBE6cQPleI/edit?usp=sharing |
+| Customization note: add or remove subreddits in the configuration Code node | Workflow setup note |
+| Customization note: change timeframe from `week` to `month` in the listing fetch URL | Workflow setup note |
+| Customization note: add a Slack node to send the digest to a channel | Workflow setup note |
+
+## Additional implementation observations
+- The workflow has a single entry point: `Weekly Schedule Trigger`.
+- There are no sub-workflows or workflow-execution nodes in this workflow.
+- The current implementation does **not fully enforce** “append only new posts” because `Deduplicate New Posts` returns all items and `Append New Posts to Sheet` receives them directly.
+- The digest is based only on posts collected during the current run and stored in `global.newPosts`, not on all posts in the spreadsheet.
+- The workflow depends on `old.reddit.com` HTML structure; if Reddit changes markup, the parser will need updates.
\ No newline at end of file
9. 🧹 **Deduplicate New Posts** filters posts already in the Sheet by hash and URL.
10. 💾 **Append New Posts** saves only new posts to the `posts` tab.
11. 📊 **Build Weekly Digest** generates topic clusters and top post summaries.
12. 📧 **Send Digest Email** optionally emails the weekly summary.
### Setup steps
- Register for a free ScrapeOps API key: https://scrapeops.io/app/register/n8n
- Add ScrapeOps credentials in n8n. Docs: https://scrapeops.io/docs/n8n/overview/
- Duplicate [this sheet](https://docs.google.com/spreadsheets/d/1rKuVREV4pedie7uAbuEcLvghNrAEeAjIPUBE6cQPleI/edit?usp=sharing) to get the required columns, then note the Spreadsheet ID of your copy.
- Connect Google Sheets and set your Spreadsheet ID in the Sheet nodes.
- Update your subreddit list in **Configure Subreddits & Week Range**.
- Optional: enable **Send Digest Email** and configure credentials.
### Customization
- Add or remove subreddits in the configure node.
- Change timeframe from `week` to `month` in the fetch URL.
- Add a Slack node to post the digest to a channel. |
+| Section: Trigger & Inputs | Sticky Note | Visual section label | | | ## 1. Trigger & Configuration
Fires weekly and sets runtime config: subreddit list, week range, batch size, and Google Sheet IDs. |
+| Section: Scrape Listings | Sticky Note | Visual section label | | | ## 2. Scrape Subreddit Listings
Batch through each subreddit and scrape the "Top of Week" page via [ScrapeOps Proxy](https://scrapeops.io/docs/n8n/proxy-api/) with a polite delay between requests. |
+| Section: Post Enrichment | Sticky Note | Visual section label | | | ## 3. Parse Post Metadata
Extract title, URL, score, comment count, author, and timestamps from listing HTML into structured JSON. |
+| Section: Post Enrichment1 | Sticky Note | Visual section label | | | ## 4. Enrich & Finalize Posts
Fetch each post as JSON to extract `selftext`, merge with listing metadata, and normalize all fields into the final record. |
+| Section: Post Enrichment2 | Sticky Note | Visual section label | | | ## 5. Deduplicate & Save
Compare against existing Sheet rows by hash and URL, then append only new posts to the `posts` tab. |
+| Section: Post Enrichment3 | Sticky Note | Visual section label | | | ## 6. Weekly Digest & Email
Generate topic clusters and top post summaries, write to `weekly_digest` tab, and optionally send by email. |
+
+---
+
+# 4. Reproducing the Workflow from Scratch
+
+1. **Create a new workflow**
+   - Name it something like: `Reddit Industry Digest with ScrapeOps and Google Sheets`.
+
+2. **Add a Schedule Trigger node**
+   - Node type: `Schedule Trigger`
+   - Configure it to run weekly.
+   - Choose the desired weekday and time in your n8n instance timezone.
+
+3. **Add a Code node named `Configure Subreddits & Week Range`**
+   - Connect it after the trigger.
+   - Paste logic that:
+     - resets workflow static global arrays `seen` and `newPosts`
+     - computes:
+       - `run_id`
+       - `run_date`
+       - Monday-to-Sunday `week_range`
+     - defines a subreddit list such as:
+       - `selfhosted`
+       - `devops`
+       - `programming`
+       - `webdev`
+     - emits one item per subreddit with:
+       - `subreddit`
+       - `sort = top`
+       - `time_range = week`
+       - `limit = 20`
+       - `sheet_id = your spreadsheet ID`
+
+4. **Add a Google Sheets credential**
+   - Use OAuth2 for Google Sheets.
+   - Ensure access to the destination spreadsheet.
+
+5. **Prepare the spreadsheet**
+   - Create or duplicate a spreadsheet with two tabs:
+     - `posts`
+     - `weekly_digest`
+   - The `posts` tab should contain columns:
+     - `run_id`
+     - `run_date`
+     - `subreddit`
+     - `sort`
+     - `time_range`
+     - `post_id`
+     - `post_url`
+     - `post_title`
+     - `post_text`
+     - `author`
+     - `created_utc`
+     - `score`
+     - `num_comments`
+     - `flair`
+     - `extracted_at`
+     - `content_hash`
+     - `is_new`
+   - The `weekly_digest` tab should contain columns:
+     - `run_id`
+     - `week_range`
+     - `subreddits`
+     - `total_posts`
+     - `top_topics_json`
+     - `weekly_brief_text`
+     - `top_posts_json`
+     - `created_at`
+
+6. **Add a Google Sheets node named `Read Existing Posts from Sheet`**
+   - Connect it from `Configure Subreddits & Week Range`.
+   - Configure it to read from your spreadsheet.
+   - Select the `posts` tab.
+   - Enable it to output data even if empty, if available in your node version.
+
+7. **Add a `Split In Batches` node**
+   - Name it ` Split Subreddits Into Batches`.
+   - Connect it from `Configure Subreddits & Week Range`.
+   - Set `Batch Size` to `1`.
+
+8. **Install and configure ScrapeOps**
+   - Install the ScrapeOps n8n node package if it is not already installed.
+   - Create ScrapeOps credentials with your API key.
+   - Reference:
+     - https://scrapeops.io/app/register/n8n
+     - https://scrapeops.io/docs/n8n/overview/
+
+9. **Add a ScrapeOps node named `ScrapeOps: Fetch Subreddit Listing`**
+   - Connect it from ` Split Subreddits Into Batches`.
+   - Set URL to:
+     - `https://old.reddit.com/r/{{$json.subreddit}}/top/?t=week`
+   - Use the ScrapeOps credential.
+   - Keep response as HTML/text.
+
+10. **Add a Wait node named ` Polite Delay (1–3s)`**
+   - Connect it after the listing fetch.
+   - Set unit to `seconds`.
+   - Set amount expression to:
+     - `{{ Math.floor(Math.random()*3)+1 }}`
+
+11. **Add a Code node named `Parse Listing HTML → Post Metadata`**
+   - Connect it after the wait node.
+   - Implement logic that:
+     - reads listing HTML from `data` or `body`
+     - parses each Reddit post block from `old.reddit.com`
+     - extracts title, author, permalink, score, comments, flair, and timestamp
+     - normalizes Reddit URLs to `https://www.reddit.com/...`
+     - computes `content_hash` using SHA-1
+     - emits one item per post
+     - honors a `limit` from input, default `20`
+   - Enable `Always Output Data`.
+
+12. **Add a ScrapeOps node named ` ScrapeOps: Fetch Post Details (JSON)`**
+   - Connect it from `Parse Listing HTML → Post Metadata`.
+   - Set URL expression to:
+     - `{{ ($json.post_url || '').replace(/\?.*$/, '').replace(/\/$/, '') + '.json?raw_json=1' }}`
+   - Set return type to `json`.
+   - Use the same ScrapeOps credential.
+
+13. **Add a Code node named `Extract Selftext & Post Type`**
+   - Connect it after the post-details node.
+   - Implement logic that:
+     - looks for the raw response in `body`, `data`, `response`, or the longest string field
+     - decodes HTML entities
+     - rejects HTML responses
+     - parses JSON
+     - extracts post data from `data.children[0].data`
+     - emits fields including:
+       - `post_text_extracted`
+       - `post_type`
+       - `post_title`
+       - `post_id`
+       - `post_url`
+       - `subreddit`
+       - `score`
+       - `num_comments`
+       - `author`
+       - `created_utc`
+     - returns diagnostic data on parse failure
+
+14. **Add a Merge node named `Merge Post Metadata + Text`**
+   - Connect input 0 from `Parse Listing HTML → Post Metadata`
+   - Connect input 1 from `Extract Selftext & Post Type`
+   - Set:
+     - Mode: `Combine`
+     - Combination mode: `Merge By Position`
+
+15. **Add a Code node named `Finalize & Normalize Post Fields`**
+   - Connect it after the merge.
+   - Configure it to:
+     - overwrite `post_text` with `post_text_extracted` when non-empty
+     - otherwise keep the existing `post_text`
+     - remove `post_text_extracted`
+
+16. **Add a Merge node named ` Merge Scraped + Existing Posts`**
+   - Connect input 0 from `Finalize & Normalize Post Fields`
+   - Connect input 1 from `Read Existing Posts from Sheet`
+   - Set:
+     - Mode: `Combine`
+     - Combination mode: `Merge By Position`
+   - Note: this node mainly acts as a synchronization point.
+
+17. **Add a Code node named `Deduplicate New Posts`**
+   - Connect it after the merge.
+   - Implement logic that:
+     - loads workflow static global data
+     - reads all rows from `Read Existing Posts from Sheet` with `$items(...)`
+     - builds a set of existing `content_hash` and normalized `post_url`
+     - checks each scraped item against that set
+     - sets `is_new` true or false
+     - pushes only new posts into `global.newPosts`
+     - returns items for downstream use
+   - Enable `Always Output Data`.
+
+18. **Important correction: filter before appending**
+   - The provided workflow claims to append only new posts, but as wired it returns all items to the append node.
+   - To reproduce the intended behavior safely, add an `IF` node or a Code filter after `Deduplicate New Posts`:
+     - condition: `{{$json.is_new}}` is true
+   - Send only the true branch to the append node.
+   - If reproducing the JSON exactly, omit this filter; if reproducing the intended logic, include it.
+
+19. **Add a Google Sheets node named `Append New Posts to Sheet`**
+   - Connect it from:
+     - ideally the filtered `true` branch from step 18
+     - or directly from `Deduplicate New Posts` if you want to mirror the provided wiring
+   - Configure:
+     - Operation: `Append`
+     - Spreadsheet: your spreadsheet
+     - Sheet: `posts`
+   - Map the columns explicitly to the fields listed in step 5
+
+20. **Loop batch execution**
+   - Connect `Append New Posts to Sheet` back to ` Split Subreddits Into Batches`.
+   - This continues processing the next subreddit.
+
+21. **Add a Code node named `Build Weekly Digest`**
+   - Connect it to the second output of ` Split Subreddits Into Batches`, which runs when batching completes.
+   - Implement logic that:
+     - reads `global.newPosts`
+     - counts total new posts
+     - creates a subreddit summary
+     - tokenizes title + post text
+     - excludes common stopwords
+     - derives top keywords and simple topic clusters
+     - sorts top posts by score, then comment count
+     - creates:
+       - `top_topics_json`
+       - `top_posts_json`
+       - `weekly_brief_text`
+       - `created_at`
+       - `run_id`
+       - `week_range`
+
+22. **Add a Google Sheets node named `Append Weekly Digest to Sheet`**
+   - Connect it after `Build Weekly Digest`.
+   - Configure:
+     - Operation: `Append`
+     - Sheet: `weekly_digest`
+     - Explicitly map:
+       - `run_id`
+       - `created_at`
+       - `subreddits`
+       - `week_range`
+       - `total_posts`
+       - `top_posts_json`
+       - `top_topics_json`
+       - `weekly_brief_text`
+
+23. **Add an Email Send node named `Send Weekly Digest Email`**
+   - Connect it after `Append Weekly Digest to Sheet`.
+   - Configure:
+     - To: your recipient address
+     - From: a valid sender address
+     - Subject: `Weekly Developer Tools Digest (Reddit) – {{$json.week_range}}`
+     - Text body: `{{$json.weekly_brief_text}}`
+   - Enable `Execute Once`.
+
+24. **Configure email credentials**
+   - Depending on your n8n environment, configure SMTP or the supported email transport.
+   - Replace placeholder addresses.
+
+25. **Test with manual execution**
+   - Run the workflow manually.
+   - Verify:
+     - subreddit pages are fetched
+     - posts are parsed
+     - per-post JSON is readable
+     - `posts` tab receives rows
+     - `weekly_digest` tab receives one digest row
+     - email sends correctly if enabled
+
+26. **Validate edge conditions**
+   - Test with:
+     - a nonexistent subreddit
+     - an empty `posts` tab
+     - a repeated run on the same week
+     - one or more link/image posts with empty `selftext`
+
+27. **Recommended hardening improvements**
+   - Add a filter before `Append New Posts to Sheet` so only `is_new = true` rows are appended
+   - Replace merge-by-position with a safer key-based join where practical
+   - Add error handling for blocked HTML, bad JSON, and credential failures
+   - Move hardcoded subreddit list and spreadsheet ID into environment variables or workflow variables
+
+---
+
+# 5. General Notes & Resources
+
+| Note Content | Context or Link |
+|---|---|
+| Register for a free ScrapeOps API key | https://scrapeops.io/app/register/n8n |
+| ScrapeOps n8n documentation | https://scrapeops.io/docs/n8n/overview/ |
+| ScrapeOps Proxy API documentation | https://scrapeops.io/docs/n8n/proxy-api/ |
+| Duplicate the sample Google Sheet template | https://docs.google.com/spreadsheets/d/1rKuVREV4pedie7uAbuEcLvghNrAEeAjIPUBE6cQPleI/edit?usp=sharing |
+| Customization note: add or remove subreddits in the configuration Code node | Workflow setup note |
+| Customization note: change timeframe from `week` to `month` in the listing fetch URL | Workflow setup note |
+| Customization note: add a Slack node to send the digest to a channel | Workflow setup note |
+
+## Additional implementation observations
+- The workflow has a single entry point: `Weekly Schedule Trigger`.
+- There are no sub-workflows or workflow-execution nodes in this workflow.
+- The current implementation does **not fully enforce** “append only new posts” because `Deduplicate New Posts` returns all items and `Append New Posts to Sheet` receives them directly.
+- The digest is based only on posts collected during the current run and stored in `global.newPosts`, not on all posts in the spreadsheet.
+- The workflow depends on `old.reddit.com` HTML structure; if Reddit changes markup, the parser will need updates.
\ No newline at end of file
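
The recommended "append only new posts" filter could be sketched roughly as follows. This is a minimal illustration, not the workflow's actual Code node: `normalizeUrl` and `filterNewPosts` are hypothetical names, the plain arrays stand in for the items n8n passes between nodes, and the URL normalization (dropping the query string and trailing slash) is an assumption modeled on the post-details URL expression.

```javascript
// Hypothetical helper: canonicalize a post URL so that a query string or
// trailing slash does not make two links to the same post compare unequal.
function normalizeUrl(url) {
  return String(url || '')
    .trim()
    .replace(/\?.*$/, '')
    .replace(/\/+$/, '');
}

// Hypothetical helper: keep only scraped posts whose content_hash and
// normalized post_url are both absent from the rows already in the sheet.
function filterNewPosts(scraped, existingRows) {
  const seenHashes = new Set();
  const seenUrls = new Set();
  for (const row of existingRows) {
    if (row.content_hash) seenHashes.add(row.content_hash);
    if (row.post_url) seenUrls.add(normalizeUrl(row.post_url));
  }

  const fresh = [];
  for (const post of scraped) {
    const url = normalizeUrl(post.post_url);
    const duplicate =
      seenHashes.has(post.content_hash) || (url !== '' && seenUrls.has(url));
    if (duplicate) continue;
    // Record the post so repeats within the same run are dropped as well.
    if (post.content_hash) seenHashes.add(post.content_hash);
    if (url) seenUrls.add(url);
    fresh.push({ ...post, is_new: true });
  }
  return fresh;
}
```

Returning only the filtered list, rather than every item with an `is_new` flag, means the append node cannot write duplicates even if no separate `IF` node is added. In a real Code node the two arguments would have to be mapped from the node's incoming items and the sheet rows.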