Automated Facebook Group Scraper: Posts, Comments, and Sub-comments to Supabase
1. Workflow Overview
This workflow automates the scraping of Facebook group data—posts, comments on those posts, and sub-comments (replies to comments)—and stores the structured information in a Supabase database. It is designed for users who want to continuously gather and archive Facebook group discussions for analysis, monitoring, or archival purposes.
The workflow is logically divided into three main functional blocks:
- 1.1 Input Reception: Captures inputs (Facebook group URL and number of posts to scrape) via a form trigger.
- 1.2 Facebook Group Posts Scraping and Storage: Scrapes group posts using the Apify Facebook Group Posts actor, formats the data, and inserts it into Supabase’s `posts` table.
- 1.3 Comments and Sub-comments Scraping and Storage: For each post, scrapes comments and stores them in a `comments` table, then conditionally scrapes sub-comments (replies to comments) and stores them in a `replies` table in Supabase.
Each block involves multiple nodes that process data step-by-step, ensuring data integrity, conditional flows, and seamless integration between Apify scraping and database insertion.
2. Block-by-Block Analysis
2.1 Input Reception
- Overview:
  This block receives user inputs via a form submission: the URL of the Facebook group and, optionally, the number of posts to scrape. The submission triggers the workflow.
- Nodes Involved:
  - On form submission
  - Edit Fields1
- Node Details:
  - On form submission
    - Type: `formTrigger`
    - Role: Entry point that waits for a form submission with the fields `url` (string, required) and `number of posts` (number, optional).
    - Configuration: Form titled "URL" with two fields: `url` (Facebook group URL) and `number of posts` (to limit scraping).
    - Inputs: External user input (form)
    - Outputs: JSON with `url` and `post` (number of posts)
    - Edge cases: A missing URL prevents the trigger from firing. A missing number of posts is not handled in this node; the following node ("Edit Fields1") sets `post` to 11.
  - Edit Fields1
    - Type: `set`
    - Role: Formats and sets fields for downstream use; it passes the input values through and applies a default where needed.
    - Configuration:
      - Sets `url` from form input
      - Sets `post` number to 11 (default number of posts to scrape if not provided)
    - Inputs: From "On form submission" node
    - Outputs: JSON with set fields for next node
    - Edge cases: If `post` is missing, defaults to 11.
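The defaulting behaviour described for "Edit Fields1" can be sketched outside n8n as a small JavaScript function. This is only a sketch under stated assumptions, not the node's actual configuration: the function name `normalizeFormInput` is hypothetical, and the validation rules (non-empty URL, positive integer count, a submitted count taking precedence over the default) are assumptions. Only the field names `url` and `post` and the default of 11 come from the description above.

```javascript
// Hypothetical helper: mirrors the documented behaviour of "Edit Fields1".
// The default of 11 comes from the workflow description; the validation
// rules below are assumptions added for illustration.
const DEFAULT_POST_COUNT = 11;

function normalizeFormInput(form) {
  const url = typeof form.url === "string" ? form.url.trim() : "";
  if (url === "") {
    // The form marks `url` as required, so a missing value is treated as an error.
    throw new Error("url is required");
  }
  const parsed = Number(form.post);
  // Fall back to the default when `post` is missing, non-numeric, or below 1.
  const post = Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_POST_COUNT;
  return { url, post };
}
```

Calling `normalizeFormInput({ url: "https://www.facebook.com/groups/example" })` would yield a `post` value of 11, while a submitted `post` of `"5"` would be kept as the number 5.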
2.2 Facebook Group Posts Scraping and Storage
- Overview:
  This block uses the Apify Facebook Group Posts actor to scrape posts from the specified Facebook group URL. The scraped posts are then formatted and stored in the Supabase `posts` table.
- Nodes Involved:
  - Run an Actor
  - Get dataset items
  - Add A Post
  - Sticky Note (explanatory)
- Node Details:
  - Run an Actor
    - Type: Apify actor node
    - Role: Executes the Apify actor "AtBpiepuIUNs2k2ku" (Facebook Group Posts Scraper) with parameters from the input.
    - Configuration:
      - Memory: 16 GB
      - Actor ID: fixed to Facebook Group Posts Scraper
      - Custom body includes:
        - `count`: number of posts to scrape (from input JSON)
        - `minDelay` and `maxDelay` for pacing requests
        - `proxy` enabled via Apify proxy
        - `scrapeGroupPosts.groupUrl`: URL of Facebook group to scrape
        - `sortType`: "new_posts" (sort posts by newest first)
    - Inputs: From "Edit Fields1" node
    - Outputs: Initiates dataset creation on Apify for scraped posts
    - Edge cases: API errors, proxy failures, actor timeout, invalid group URL, missing permissions.
    - Credentials: Apify API credentials required.
  - Get dataset items
    - Type: Apify dataset retrieval node
    - Role: Retrieves scraped posts data from the Apify dataset created by the actor.
    - Configuration: Uses dataset ID from the output of "Run an Actor" node.
    - Inputs: From "Run an Actor" node output
    - Outputs: JSON array of posts data
    - Edge cases: Dataset not ready, empty dataset, API errors.
    - Credentials: Apify API credentials required.
  - Add A Post
    - Type: Supabase insert node
    - Role: Inserts each post into the Supabase database table `posts`.
    - Configuration: Maps fields from scraped data to Supabase columns:
      - `createdat`: formatted date from UNIX timestamp (day, month name, year)
      - `url`: post URL
      - `user_name`: poster’s name
      - `text`: post text content
      - `reactioncount`, `sharecount`, `commentcount`: respective counts
      - `attachments`: array of attachment URLs extracted from post data
    - Inputs: From "Get dataset items" node
    - Outputs: Confirmation of inserted rows
    - Edge cases: Invalid data format, insertion errors, missing fields.
    - Credentials: Supabase API credentials required.
  - Sticky Note
    - Content explains that this block automates Facebook group post scraping and storage into the Supabase `posts` table.
    - Provides high-level summary and purpose.
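The actor input and the two transformations described for "Add A Post" can be sketched in plain JavaScript as follows. This is an illustrative sketch rather than the workflow's own expressions. The body keys and the delay values (1 and 10) come from this document. The function names, the assumption that the timestamp is in seconds, the use of UTC, the exact output layout ("15 March 2024"), and the assumed attachment shape (objects with a `url` property) are all hypothetical.

```javascript
// Builds the custom body described for "Run an Actor".
// Keys and fixed values are taken from the workflow description.
function buildPostsActorInput(url, post) {
  return {
    count: post,
    minDelay: 1,
    maxDelay: 10,
    proxy: { useApifyProxy: true },
    // This is a single key that contains a dot, not a nested object.
    "scrapeGroupPosts.groupUrl": url,
    sortType: "new_posts",
  };
}

const MONTHS = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

// Formats a UNIX timestamp as "day month-name year".
// ASSUMPTIONS: the timestamp is in seconds and UTC is acceptable.
function formatCreatedAt(unixSeconds) {
  if (unixSeconds === null || unixSeconds === undefined) return null;
  const d = new Date(Number(unixSeconds) * 1000);
  if (Number.isNaN(d.getTime())) return null;
  return `${d.getUTCDate()} ${MONTHS[d.getUTCMonth()]} ${d.getUTCFullYear()}`;
}

// Collects attachment URLs, returning null when there are none.
// ASSUMPTION: each attachment is an object with a `url` property.
function extractAttachmentUrls(attachments) {
  if (!Array.isArray(attachments)) return null;
  const urls = attachments
    .map((a) => a?.url)
    .filter((u) => typeof u === "string");
  return urls.length > 0 ? urls : null;
}
```

Returning `null` for missing values matches the document's note that inserts default to null; if the real timestamps turn out to be in milliseconds, the multiplication by 1000 would need to be dropped.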
2.3 Comments and Sub-comments Scraping and Storage
- Overview:
  This block scrapes comments for each post stored earlier, adds them to the Supabase `comments` table, then conditionally scrapes sub-comments (replies to comments) and stores those in a separate `replies` table.
- Nodes Involved:
  - ScrapeComments
  - Get dataset items1
  - Add A Comment
  - If1
  - Edit Fields
  - Split Out
  - Create a row
  - Sticky Note1
  - Sticky Note2
- Node Details:
  - ScrapeComments
    - Type: Apify actor node
    - Role: Executes Apify actor "us5srxAYnsrkgUv2v" that scrapes comments from a Facebook post URL.
    - Configuration:
      - Memory: 8 GB
      - Actor ID: fixed to Facebook Comments Scraper
      - Custom body includes:
        - `includeNestedComments`: false (only first-level comments here)
        - `startUrls`: post URL passed from previous node
        - `viewOption`: "RANKED_UNFILTERED" (all comments unfiltered)
    - Inputs: From "Add A Post" (post URL)
    - Outputs: Dataset created on Apify with comments data
    - Edge cases: Actor failures, missing post URL, rate limits, comments disabled on post.
    - Credentials: Apify API credentials required.
  - Get dataset items1
    - Type: Apify dataset retrieval node
    - Role: Retrieves scraped comments for a post from the Apify dataset.
    - Configuration: Uses dataset ID from "ScrapeComments" node output.
    - Inputs: From "ScrapeComments"
    - Outputs: JSON array of comments
    - Edge cases: Dataset empty, retrieval errors.
    - Credentials: Apify API credentials required.
  - Add A Comment
    - Type: Supabase insert node
    - Role: Inserts comments into the Supabase `comments` table, linking them to group title, post title, and original post text.
    - Configuration:
      - Maps fields such as:
        - `group_title`: extracted from URL path segments
        - `post_title`: post title from previous data
        - `facebookurl`: original input URL
        - `commenturl`, `text`, `profilename`, `likescount`, `commentscount` from comment data
        - `Attachments`: array of attachment URLs extracted safely from nested objects
    - Inputs: From "Get dataset items1"
    - Outputs: Confirmation of comment insertion
    - Edge cases: Missing fields, attachment mapping issues, insertion errors.
    - Credentials: Supabase API credentials required.
  - If1
    - Type: Conditional (if) node
    - Role: Checks whether a comment has any nested sub-comments (`commentsCount` > 0) to decide if sub-comment scraping is needed.
    - Configuration: Condition: `commentsCount` from comment data > 0
    - Inputs: From "Add A Comment" node output
    - Outputs: True branch proceeds to scrape sub-comments, false ends branch
    - Edge cases: Missing `commentsCount`, null values causing false negatives.
  - Edit Fields
    - Type: `set`
    - Role: Prepares the `comments` array from the retrieved JSON data so it can be split into individual comment items.
    - Configuration: Assigns the `comments` field with the array from the "Get dataset items1" JSON
    - Inputs: From "If1" true branch
    - Outputs: Structured JSON to be split out
    - Edge cases: Empty or missing comments array.
  - Split Out
    - Type: `splitOut`
    - Role: Splits the `comments` array into individual items so that each sub-comment is processed separately.
    - Configuration: Splits on the `comments` field
    - Inputs: From "Edit Fields"
    - Outputs: Individual comment JSON items
    - Edge cases: Empty array results in no output.
  - Create a row
    - Type: Supabase insert node
    - Role: Inserts sub-comments (replies) into the Supabase `replies` table, linking back to parent comment and post.
    - Configuration: Maps fields including:
      - `parent_comment`, `parent_text` from parent comment data
      - `commenturl`, `text`, `profileurl`, `profilename` from sub-comment data
      - `post_text` from original post text
    - Inputs: From "Split Out"
    - Outputs: Confirmation of sub-comment insertion
    - Edge cases: Missing parent data, insertion errors.
    - Credentials: Supabase API credentials required.
  - Sticky Note1
    - Explains that this block scrapes comments for previously scraped posts and stores them in the Supabase `comments` table.
    - Highlights automated syncing and processing features.
  - Sticky Note2
    - Explains the sub-comments scraping step and its separate storage in the Supabase `replies` table.
    - Emphasizes automation and real-time syncing.
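Three steps in this block carry logic that is easy to get subtly wrong: deriving `group_title` from the URL, the `commentsCount > 0` check, and splitting the `comments` array. The JavaScript below is a rough sketch of each, not the workflow's actual expressions. The function names are hypothetical, taking the last path segment as the group title follows the reproduction steps in this document, and the shape of the items returned by `splitOut` is only loosely modelled on a split-out step.

```javascript
// Returns the last non-empty path segment of a URL, or null if the
// string cannot be parsed as a URL.
function groupTitleFromUrl(url) {
  try {
    const segments = new URL(url).pathname.split("/").filter(Boolean);
    return segments.length > 0 ? segments[segments.length - 1] : null;
  } catch {
    return null;
  }
}

// Mirrors the "If1" condition (commentsCount > 0) but treats missing,
// null, or non-numeric counts as zero rather than relying on coercion.
function hasReplies(comment) {
  const count = Number(comment?.commentsCount ?? 0);
  return Number.isFinite(count) && count > 0;
}

// Emits one item per element of item[field].
// An empty or missing array yields no items, matching the documented edge case.
function splitOut(item, field) {
  const values = Array.isArray(item?.[field]) ? item[field] : [];
  return values.map((value) => ({ [field]: value }));
}
```

Making the null handling explicit in `hasReplies` addresses the edge case listed for "If1", where a missing `commentsCount` could otherwise be evaluated inconsistently.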
3. Summary Table
| Node Name | Node Type | Functional Role | Input Node(s) | Output Node(s) | Sticky Note |
|---|---|---|---|---|---|
| On form submission | formTrigger | Entry point; receives Facebook group URL and post count | - | Edit Fields1 | |
| Edit Fields1 | set | Sets and formats input fields for scraping | On form submission | Run an Actor | |
| Run an Actor | Apify actor node | Scrapes Facebook group posts using Apify actor | Edit Fields1 | Get dataset items | |
| Get dataset items | Apify dataset retrieval | Retrieves scraped posts dataset | Run an Actor | Add A Post | |
| Add A Post | Supabase insert | Inserts posts into Supabase posts table | Get dataset items | ScrapeComments | |
| ScrapeComments | Apify actor node | Scrapes comments for each post | Add A Post | Get dataset items1 | |
| Get dataset items1 | Apify dataset retrieval | Retrieves scraped comments dataset | ScrapeComments | Add A Comment | |
| Add A Comment | Supabase insert | Inserts comments into Supabase comments table | Get dataset items1 | If1 | |
| If1 | if | Checks if comments have sub-comments to scrape | Add A Comment | Edit Fields (true) | |
| Edit Fields | set | Prepares comments array for splitting | If1 (true branch) | Split Out | |
| Split Out | splitOut | Splits comments array into individual sub-comments | Edit Fields | Create a row | |
| Create a row | Supabase insert | Inserts sub-comments into Supabase replies table | Split Out | | |
| Sticky Note | stickyNote | Overview of Facebook posts scraping block | - | - | ## Facebook Scraping Automation \nAutomation: This workflow scrapes posts from a Facebook group and inserts them into a Supabase database. \nStep-by-Step: \n1. Facebook Group Scraping\n2. Data Formatting\n\nFeatures: Automated scraping of posts and real-time syncing with Supabase. |
| Sticky Note1 | stickyNote | Overview of comments scraping block | - | - | ## Scraping Comments for Previous Posts \nAutomation: Scrapes comments for posts and stores them in Supabase comments table. \nFeatures: Automated scraping of comments and syncing with Supabase. |
| Sticky Note2 | stickyNote | Overview of sub-comments scraping block | - | - | ## Scraping Comments on Comments (Sub-comments) \nAutomation: Scrapes sub-comments and stores them in a separate Supabase table. \nFeatures: Automated scraping and syncing with Supabase. |
4. Reproducing the Workflow from Scratch
- Create a form trigger node ("On form submission")
  - Type: `formTrigger`
  - Configure webhook with fields:
    - `url` (string, required, placeholder: "Enter URL")
    - `post` (number, optional, placeholder: "Enter number of posts...")
  - Title the form "URL".
- Add a Set node ("Edit Fields1")
  - Type: `set`
  - Assign:
    - `url` = `{{$json["url"]}}` (from form)
    - `post` = 11 (default number of posts if not provided)
- Add an Apify actor node ("Run an Actor")
  - Type: Apify node
  - Credentials: Apify API with valid key
  - Configure:
    - Memory: 16384 MB
    - Actor ID: `AtBpiepuIUNs2k2ku` (Facebook Group Posts Scraper)
    - Custom Body (use expression):
      `{ "count": {{$json["post"]}}, "maxDelay": 10, "minDelay": 1, "proxy": { "useApifyProxy": true }, "scrapeGroupPosts.groupUrl": "{{$json["url"]}}", "sortType": "new_posts" }`
- Add Apify dataset retrieval node ("Get dataset items")
  - Type: Apify node
  - Credentials: Apify API
  - Configure to get dataset items from the dataset ID returned by "Run an Actor".
- Add Supabase insert node ("Add A Post")
  - Credentials: Supabase API
  - Table: `posts`
  - Map fields from dataset items:
    - `createdat`: format UNIX timestamp `createdAt` to string "day, month name, year"
    - `url`: post URL or null
    - `user_name`: poster’s name or null
    - `text`: post text or null
    - `reactioncount`, `sharecount`, `commentcount`: respective counts or null
    - `attachments`: extract array of URLs from attachments or null
- Add Apify actor node ("ScrapeComments")
  - Credentials: Apify API
  - Memory: 8192 MB
  - Actor ID: `us5srxAYnsrkgUv2v` (Facebook Comments Scraper)
  - Custom Body:
    `{ "includeNestedComments": false, "startUrls": [{"url": "{{$json["url"]}}"}], "viewOption": "RANKED_UNFILTERED" }`
- Add Apify dataset node ("Get dataset items1")
  - Credentials: Apify API
  - Retrieve the dataset for comments from "ScrapeComments".
- Add Supabase insert node ("Add A Comment")
  - Credentials: Supabase API
  - Table: `comments`
  - Map fields:
    - `group_title`: extract last segment of URL path from prior step
    - `post_title`: from input or prior post data
    - `facebookurl`: original URL input
    - `commenturl`, `text`, `profilename`, `likescount`, `commentscount` from comment data
    - `Attachments`: map URLs from attachments safely
- Add If node ("If1")
  - Condition: `commentsCount` > 0
  - Input: from "Add A Comment"
- Add Set node ("Edit Fields") (on If true branch)
  - Assign `comments` = `$('Get dataset items1').item.json.comments`
- Add Split Out node ("Split Out")
  - Split field: `comments`
- Add Supabase insert node ("Create a row")
  - Credentials: Supabase API
  - Table: `replies`
  - Map fields:
    - `parent_comment`, `parent_text` from parent comment data
    - `commenturl`, `text`, `profileurl`, `profilename` from sub-comment
    - `post_text` from original post text
- Connect nodes in the order described:
  - On form submission → Edit Fields1 → Run an Actor → Get dataset items → Add A Post → ScrapeComments → Get dataset items1 → Add A Comment → If1 → (true) Edit Fields → Split Out → Create a row
- Add sticky notes for documentation and clarity at relevant points, summarizing each block’s purpose.
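The connection order above can also be written down as data, which makes it easier to check that no node is left unconnected. The sketch below builds a linear connections map in the general shape n8n uses in exported workflow JSON (a `main` array of output slots, each holding target descriptors). Treat it as an approximation: the function name `buildLinearConnections` is hypothetical, and it assumes that the first output of "If1" (index 0) is the true branch and that the false branch stays unconnected.

```javascript
// Node names in the order given in the reproduction steps.
const NODE_ORDER = [
  "On form submission",
  "Edit Fields1",
  "Run an Actor",
  "Get dataset items",
  "Add A Post",
  "ScrapeComments",
  "Get dataset items1",
  "Add A Comment",
  "If1",
  "Edit Fields",
  "Split Out",
  "Create a row",
];

// Builds { [source]: { main: [[{ node, type, index }]] } } for a simple chain.
// ASSUMPTION: every node feeds only the next one, using its first output.
function buildLinearConnections(names) {
  const connections = {};
  for (let i = 0; i < names.length - 1; i += 1) {
    connections[names[i]] = {
      main: [[{ node: names[i + 1], type: "main", index: 0 }]],
    };
  }
  return connections;
}

const connections = buildLinearConnections(NODE_ORDER);
```

The last node, "Create a row", has no entry in the map because nothing follows it, which matches the empty output column in the summary table.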
5. General Notes & Resources
| Note Content | Context or Link |
|---|---|
| This workflow requires Apify API credentials with access to Facebook scraping actors. | Credential setup in Apify portal |
| Supabase database must have three tables: `posts`, `comments`, and `replies`, with schema fields matching the mappings above. | Supabase schema design |
| Facebook scraping actors rely on proxy usage enabled in Apify to avoid rate limiting and blocks. | Apify Proxy usage recommended |
| Date formatting in "Add A Post" node uses JavaScript Date functions on UNIX timestamp fields. | See node expressions for details |
| Sub-comments scraping is conditional on comments having replies (`commentsCount` > 0). | Prevents unnecessary scraping |
| Workflow handles null or missing data gracefully by defaulting to null in database inserts. | Reduces insertion errors |
| For more info on Apify Facebook actors: https://apify.com/store/actors | Apify actor marketplace |
Disclaimer: The provided text originates solely from an automated workflow built with n8n, an integration and automation tool. All processing complies fully with applicable content policies and contains no illegal, offensive, or protected elements. All handled data is legal and public.