This commit is contained in:
nusquama
2026-03-15 12:01:40 +08:00
parent 687d7b8c88
commit d2b073603c

View File

@@ -0,0 +1,593 @@
Look up contact details and addresses from names using ScraperCity
https://n8nworkflows.xyz/workflows/look-up-contact-details-and-addresses-from-names-using-scrapercity-13929
# Look up contact details and addresses from names using ScraperCity
# 1. Workflow Overview
This workflow performs an asynchronous people-search lookup using the ScraperCity People Finder API. It accepts one or more search inputs for a person—name, phone number, and/or email—submits a lookup job, polls the API until the job completes, downloads the resulting CSV file, parses it into structured records, removes duplicates, and appends the final rows to Google Sheets.
Typical use cases include:
- Enriching a lead or contact from limited known details
- Finding likely addresses and contact details from a persons name
- Storing lookup results for manual review or later processing in a spreadsheet
The workflow is organized into four logical blocks.
## 1.1 Input Configuration
The workflow begins with a manual trigger and a Set node used to define the search criteria. This is the operator-controlled entry point.
## 1.2 ScraperCity Job Submission
The configured search inputs are sent to ScraperCitys `/scrape/people-finder` endpoint. The returned `runId` is extracted and stored for later status checks and final download.
## 1.3 Asynchronous Polling Loop
Because ScraperCity jobs are not immediate, the workflow waits before the first status check, then repeatedly polls the status endpoint. If the scrape is complete, execution continues; otherwise it waits 60 seconds and loops again. A Split In Batches node limits the maximum polling iterations.
## 1.4 Result Download, Parsing, Deduplication, and Storage
Once the scrape succeeds, the workflow downloads the result file, parses CSV text into JSON records using a Code node, removes duplicates based on name and address, and appends the final records to Google Sheets.
---
# 2. Block-by-Block Analysis
## Block 1 — Input Configuration
### Overview
This block defines how the workflow starts and where the user enters the search criteria. It is designed for manual execution and easy modification before each run.
### Nodes Involved
- When clicking 'Execute workflow'
- Configure Search Inputs
### Node Details
#### 1. When clicking 'Execute workflow'
- **Type and role:** `n8n-nodes-base.manualTrigger`; manual entry point for ad hoc execution.
- **Configuration choices:** No parameters are configured. It simply starts the workflow when the user clicks Execute.
- **Key expressions or variables used:** None.
- **Input and output connections:**
- Input: none
- Output: `Configure Search Inputs`
- **Version-specific requirements:** Type version `1`.
- **Edge cases or potential failure types:**
- No runtime failure is likely here.
- Only usable in manual or test-style executions, not as an autonomous production trigger.
- **Sub-workflow reference:** None.
#### 2. Configure Search Inputs
- **Type and role:** `n8n-nodes-base.set`; defines the lookup parameters for the person search.
- **Configuration choices:**
- Creates four fields:
- `searchName` = `"Jane Doe"`
- `searchPhone` = `""`
- `searchEmail` = `""`
- `maxResults` = `3`
- This node acts as the main configurable input surface for the workflow.
- **Key expressions or variables used:**
- Downstream nodes use:
- `$json.searchName`
- `$json.searchPhone`
- `$json.searchEmail`
- `$json.maxResults`
- **Input and output connections:**
- Input: `When clicking 'Execute workflow'`
- Output: `Start People Finder Scrape`
- **Version-specific requirements:** Type version `3.4`.
- **Edge cases or potential failure types:**
- If all three search fields are empty, the API request may still be sent but may return no results or an error depending on ScraperCity validation.
- `maxResults` should be a reasonable positive number; extreme values may cause API rejection or unnecessary processing time.
- Inputs are single strings, but the downstream JSON body wraps non-empty values into one-element arrays.
- **Sub-workflow reference:** None.
---
## Block 2 — ScraperCity Job Submission
### Overview
This block sends the search request to ScraperCity and stores the returned job identifier. The `runId` becomes the central reference for all subsequent polling and result retrieval.
### Nodes Involved
- Start People Finder Scrape
- Store Run ID
### Node Details
#### 3. Start People Finder Scrape
- **Type and role:** `n8n-nodes-base.httpRequest`; submits a people-finder scrape job to ScraperCity.
- **Configuration choices:**
- Method: `POST`
- URL: `https://app.scrapercity.com/api/v1/scrape/people-finder`
- Body format: JSON
- Authentication: Generic credential type using HTTP Header Auth
- Credential expected: `ScraperCity API Key`
- **Key expressions or variables used:**
- Request body is dynamically built from Set node output:
- `name`: one-element JSON array if `searchName` exists, otherwise `[]`
- `email`: one-element JSON array if `searchEmail` exists, otherwise `[]`
- `phone_number`: one-element JSON array if `searchPhone` exists, otherwise `[]`
- `street_citystatezip`: always `[]`
- `max_results`: `$json.maxResults`
- Expressions:
- `{{ $json.searchName ? '["' + $json.searchName + '"]' : '[]' }}`
- `{{ $json.searchEmail ? '["' + $json.searchEmail + '"]' : '[]' }}`
- `{{ $json.searchPhone ? '["' + $json.searchPhone + '"]' : '[]' }}`
- `{{ $json.maxResults }}`
- **Input and output connections:**
- Input: `Configure Search Inputs`
- Output: `Store Run ID`
- **Version-specific requirements:** Type version `4.2`.
- **Edge cases or potential failure types:**
- Missing or invalid HTTP Header Auth credential will cause authentication failure.
- If the body expression renders malformed JSON, the request will fail before or during submission.
- Special characters such as quotes in the search values may break the string-built JSON body because values are concatenated directly rather than encoded safely.
- API rate limits, invalid input, or service outages can cause non-2xx responses.
- If ScraperCity changes field names or endpoint behavior, downstream nodes may break.
- **Sub-workflow reference:** None.
#### 4. Store Run ID
- **Type and role:** `n8n-nodes-base.set`; extracts and preserves the `runId` returned by ScraperCity.
- **Configuration choices:**
- Assigns:
- `runId` = `{{ $json.runId }}`
- **Key expressions or variables used:**
- `={{ $json.runId }}`
- **Input and output connections:**
- Input: `Start People Finder Scrape`
- Output: `Wait Before First Status Check`
- **Version-specific requirements:** Type version `3.4`.
- **Edge cases or potential failure types:**
- If the API response does not include `runId`, later nodes will generate invalid URLs.
- If ScraperCity returns an error payload instead of the expected structure, this node may silently set an empty or undefined `runId`.
- **Sub-workflow reference:** None.
---
## Block 3 — Asynchronous Polling Loop
### Overview
This block handles the long-running asynchronous nature of the scrape job. It delays the first status check, then polls repeatedly until the status becomes `SUCCEEDED`, with a 60-second wait between attempts and a maximum loop count enforced by Split In Batches.
### Nodes Involved
- Wait Before First Status Check
- Poll Loop
- Check Scrape Status
- Is Scrape Complete?
- Wait 60 Seconds Before Retry
### Node Details
#### 5. Wait Before First Status Check
- **Type and role:** `n8n-nodes-base.wait`; pauses the workflow before the initial status poll.
- **Configuration choices:**
- Wait amount: `30`
- No explicit unit is shown in the JSON payload excerpt; in practice this node is intended as an initial delay before the first check.
- **Key expressions or variables used:** None.
- **Input and output connections:**
- Input: `Store Run ID`
- Output: `Poll Loop`
- **Version-specific requirements:** Type version `1.1`.
- **Edge cases or potential failure types:**
- If n8n is not configured to support wait/resume correctly, delayed execution may not resume as expected.
- Long waits rely on the instances persistence and execution-resume configuration.
- **Sub-workflow reference:** None.
#### 6. Poll Loop
- **Type and role:** `n8n-nodes-base.splitInBatches`; used here as an iteration controller rather than for classic item batching.
- **Configuration choices:**
- `maxIterations`: `30`
- It feeds the current item into the poll-check branch and accepts the retry branch back into itself.
- **Key expressions or variables used:** None directly.
- **Input and output connections:**
- Inputs:
- `Wait Before First Status Check`
- `Wait 60 Seconds Before Retry`
- Outputs:
- Main output 0: `Check Scrape Status`
- Output 1: unused
- **Version-specific requirements:** Type version `3`.
- **Edge cases or potential failure types:**
- If the scrape never reaches `SUCCEEDED` within 30 iterations, the loop will stop without a dedicated timeout-handling branch.
- Because no explicit failed/completed timeout branch exists, the workflow may end silently after iteration exhaustion.
- This pattern assumes one item only; multiple items could produce more complex iteration behavior.
- **Sub-workflow reference:** None.
#### 7. Check Scrape Status
- **Type and role:** `n8n-nodes-base.httpRequest`; checks the status of the submitted scrape job.
- **Configuration choices:**
- Method: `GET`
- URL: `https://app.scrapercity.com/api/v1/scrape/status/{{ $json.runId }}`
- Authentication: Generic HTTP Header Auth
- Credential expected: `ScraperCity API Key`
- **Key expressions or variables used:**
- URL expression: `=https://app.scrapercity.com/api/v1/scrape/status/{{ $json.runId }}`
- **Input and output connections:**
- Input: `Poll Loop`
- Output: `Is Scrape Complete?`
- **Version-specific requirements:** Type version `4.2`.
- **Edge cases or potential failure types:**
- Missing or invalid `runId` causes invalid endpoint requests.
- Auth failures, API downtime, or rate limiting can interrupt the polling loop.
- If the API returns an unexpected structure, the IF node may not evaluate as intended.
- **Sub-workflow reference:** None.
#### 8. Is Scrape Complete?
- **Type and role:** `n8n-nodes-base.if`; routes execution based on whether the scrape has completed successfully.
- **Configuration choices:**
- Condition checks whether `$json.status` equals `"SUCCEEDED"`.
- Strict type validation and case-sensitive comparison are enabled.
- **Key expressions or variables used:**
- Left value: `={{ $json.status }}`
- Right value: `SUCCEEDED`
- **Input and output connections:**
- Input: `Check Scrape Status`
- Output 0 (true): `Download Results`
- Output 1 (false): `Wait 60 Seconds Before Retry`
- **Version-specific requirements:** Type version `2.2`.
- **Edge cases or potential failure types:**
- Only `SUCCEEDED` is treated as terminal success.
- Other statuses such as `FAILED`, `CANCELLED`, or unexpected values are all treated as “not complete” and retried, which may be undesirable.
- If `status` is missing or null, the node will route to retry.
- **Sub-workflow reference:** None.
#### 9. Wait 60 Seconds Before Retry
- **Type and role:** `n8n-nodes-base.wait`; delays the next poll attempt.
- **Configuration choices:**
- Wait amount: `60`
- **Key expressions or variables used:** None.
- **Input and output connections:**
- Input: `Is Scrape Complete?` false branch
- Output: `Poll Loop`
- **Version-specific requirements:** Type version `1.1`.
- **Edge cases or potential failure types:**
- Same persistence/resume considerations as the first Wait node.
- Large numbers of delayed executions can affect queue/storage load on busy n8n instances.
- **Sub-workflow reference:** None.
---
## Block 4 — Result Download, Parsing, Deduplication, and Storage
### Overview
After the scrape completes successfully, this block retrieves the CSV export, converts it to structured JSON rows, removes duplicate contacts using name and address, and appends the remaining records to a Google Sheet.
### Nodes Involved
- Download Results
- Parse CSV Results
- Remove Duplicate Contacts
- Save Results to Google Sheets
### Node Details
#### 10. Download Results
- **Type and role:** `n8n-nodes-base.httpRequest`; fetches the completed scrape output.
- **Configuration choices:**
- Method: `GET`
- URL: `https://app.scrapercity.com/api/downloads/{{ $json.runId }}`
- Authentication: Generic HTTP Header Auth
- Credential expected: `ScraperCity API Key`
- **Key expressions or variables used:**
- URL expression: `=https://app.scrapercity.com/api/downloads/{{ $json.runId }}`
- **Input and output connections:**
- Input: `Is Scrape Complete?` true branch
- Output: `Parse CSV Results`
- **Version-specific requirements:** Type version `4.2`.
- **Edge cases or potential failure types:**
- Invalid or expired `runId` may return 404 or similar errors.
- API may return raw text, JSON with a `data` field, or another response shape; the Code node attempts to tolerate several possibilities.
- If the result file is very large, memory usage may become significant.
- **Sub-workflow reference:** None.
#### 11. Parse CSV Results
- **Type and role:** `n8n-nodes-base.code`; parses CSV text into one item per row.
- **Configuration choices:**
- JavaScript code manually:
- Reads raw content from `items[0].json.data`, or `items[0].json.body`, or `items[0].json`
- Converts the payload to a string if needed
- Rejects empty CSV
- Splits into lines
- Uses a custom `parseCsvLine()` function to support quoted values and escaped double quotes
- Uses the first row as headers
- Builds one JSON object per data row
- **Key expressions or variables used:**
- Internal logic references:
- `items[0].json.data`
- `items[0].json.body`
- `items[0].json`
- Output rows are plain objects using CSV headers as keys.
- **Input and output connections:**
- Input: `Download Results`
- Output: `Remove Duplicate Contacts`
- **Version-specific requirements:** Type version `2`.
- **Edge cases or potential failure types:**
- If the response is not valid CSV text, parsing may produce invalid rows.
- Splitting on `\n` alone may leave trailing `\r` characters in some environments.
- Multiline CSV fields are not handled correctly because the parser splits the entire file on line breaks before parsing quotes.
- If there is only a header row and no data rows, it returns an error item instead of zero rows.
- If column counts vary per row, missing values become empty strings.
- **Sub-workflow reference:** None.
#### 12. Remove Duplicate Contacts
- **Type and role:** `n8n-nodes-base.removeDuplicates`; removes duplicate result rows before persistence.
- **Configuration choices:**
- Compares records using:
- `full_name`
- `address`
- **Key expressions or variables used:** None.
- **Input and output connections:**
- Input: `Parse CSV Results`
- Output: `Save Results to Google Sheets`
- **Version-specific requirements:** Type version `2`.
- **Edge cases or potential failure types:**
- If the CSV headers do not contain `full_name` or `address`, deduplication may be ineffective or based on empty values.
- Slight differences in formatting, casing, or whitespace can prevent expected deduplication.
- Different people at the same address with the same name representation may be collapsed unintentionally.
- **Sub-workflow reference:** None.
#### 13. Save Results to Google Sheets
- **Type and role:** `n8n-nodes-base.googleSheets`; appends the final contact rows into a Google Sheet.
- **Configuration choices:**
- Operation: `append`
- Mapping mode: automatic mapping from input data
- Google Sheet document ID: left empty in the provided workflow and must be configured
- Sheet name / sheet ID: left empty in the provided workflow and must be configured
- Credential: Google Sheets OAuth2
- **Key expressions or variables used:**
- No custom expressions beyond unresolved resource locator placeholders.
- **Input and output connections:**
- Input: `Remove Duplicate Contacts`
- Output: none
- **Version-specific requirements:** Type version `4.6`.
- **Edge cases or potential failure types:**
- The node is incomplete as provided; both target spreadsheet and target sheet must be selected.
- Missing or invalid Google OAuth2 credentials will cause auth failure.
- Auto-mapping expects the destination sheet headers to align reasonably with incoming field names.
- Google API quotas, permission issues, or sheet structure changes may cause failures.
- If the previous node outputs an error item rather than normal records, that item may also be appended unless additional filtering is added.
- **Sub-workflow reference:** None.
---
# 3. Summary Table
| Node Name | Node Type | Functional Role | Input Node(s) | Output Node(s) | Sticky Note |
|---|---|---|---|---|---|
| When clicking 'Execute workflow' | Manual Trigger | Manual start of the workflow | | Configure Search Inputs | ## Configuration<br>Set your search target in **Configure Search Inputs** -- name, phone, or email. Add your ScraperCity API key credential to the HTTP nodes. |
| Configure Search Inputs | Set | Defines search criteria and result limit | When clicking 'Execute workflow' | Start People Finder Scrape | ## How it works<br>1. Enter a name, phone, or email in the Configure Search Inputs node.<br>2. The workflow submits a people-finder job to the ScraperCity API.<br>3. It polls for completion every 60 seconds (jobs take 2-20 min).<br>4. Results are downloaded, parsed from CSV, deduped, and saved to Google Sheets.<br><br>## Setup steps<br>1. Create a Header Auth credential named **ScraperCity API Key** -- set header to `Authorization`, value to `Bearer YOUR_KEY`.<br>2. Connect a Google Sheets OAuth2 credential.<br>3. Edit **Configure Search Inputs** with your target person.<br>4. Set your Google Sheet ID in **Save Results to Google Sheets**.<br>5. Click Execute workflow.<br><br>## Configuration<br>Set your search target in **Configure Search Inputs** -- name, phone, or email. Add your ScraperCity API key credential to the HTTP nodes. |
| Start People Finder Scrape | HTTP Request | Submits the ScraperCity people-finder job | Configure Search Inputs | Store Run ID | ## Submit Scrape Job<br>**Start People Finder Scrape** POSTs to the API and returns a job ID. **Store Run ID** saves it for polling and download references. |
| Store Run ID | Set | Extracts and stores the ScraperCity run ID | Start People Finder Scrape | Wait Before First Status Check | ## Submit Scrape Job<br>**Start People Finder Scrape** POSTs to the API and returns a job ID. **Store Run ID** saves it for polling and download references. |
| Wait Before First Status Check | Wait | Initial delay before polling begins | Store Run ID | Poll Loop | |
| Poll Loop | Split In Batches | Controls repeated polling iterations | Wait Before First Status Check; Wait 60 Seconds Before Retry | Check Scrape Status | ## Async Polling Loop<br>**Check Scrape Status** queries the job. **Is Scrape Complete?** routes to download on success. Otherwise **Wait 60 Seconds Before Retry** loops back through the poll. |
| Check Scrape Status | HTTP Request | Polls ScraperCity for job status | Poll Loop | Is Scrape Complete? | ## Async Polling Loop<br>**Check Scrape Status** queries the job. **Is Scrape Complete?** routes to download on success. Otherwise **Wait 60 Seconds Before Retry** loops back through the poll. |
| Is Scrape Complete? | IF | Branches on ScraperCity job completion status | Check Scrape Status | Download Results; Wait 60 Seconds Before Retry | ## Async Polling Loop<br>**Check Scrape Status** queries the job. **Is Scrape Complete?** routes to download on success. Otherwise **Wait 60 Seconds Before Retry** loops back through the poll. |
| Wait 60 Seconds Before Retry | Wait | Delay between polling attempts | Is Scrape Complete? | Poll Loop | ## Async Polling Loop<br>**Check Scrape Status** queries the job. **Is Scrape Complete?** routes to download on success. Otherwise **Wait 60 Seconds Before Retry** loops back through the poll. |
| Download Results | HTTP Request | Downloads completed CSV results | Is Scrape Complete? | Parse CSV Results | ## Parse and Save Results<br>**Download Results** fetches the CSV. **Parse CSV Results** converts to JSON records. **Remove Duplicate Contacts** dedupes. **Save Results to Google Sheets** writes each row. |
| Parse CSV Results | Code | Parses CSV text into JSON rows | Download Results | Remove Duplicate Contacts | ## Parse and Save Results<br>**Download Results** fetches the CSV. **Parse CSV Results** converts to JSON records. **Remove Duplicate Contacts** dedupes. **Save Results to Google Sheets** writes each row. |
| Remove Duplicate Contacts | Remove Duplicates | Deduplicates contacts by full name and address | Parse CSV Results | Save Results to Google Sheets | ## Parse and Save Results<br>**Download Results** fetches the CSV. **Parse CSV Results** converts to JSON records. **Remove Duplicate Contacts** dedupes. **Save Results to Google Sheets** writes each row. |
| Save Results to Google Sheets | Google Sheets | Appends final results to a spreadsheet | Remove Duplicate Contacts | | ## How it works<br>1. Enter a name, phone, or email in the Configure Search Inputs node.<br>2. The workflow submits a people-finder job to the ScraperCity API.<br>3. It polls for completion every 60 seconds (jobs take 2-20 min).<br>4. Results are downloaded, parsed from CSV, deduped, and saved to Google Sheets.<br><br>## Setup steps<br>1. Create a Header Auth credential named **ScraperCity API Key** -- set header to `Authorization`, value to `Bearer YOUR_KEY`.<br>2. Connect a Google Sheets OAuth2 credential.<br>3. Edit **Configure Search Inputs** with your target person.<br>4. Set your Google Sheet ID in **Save Results to Google Sheets**.<br>5. Click Execute workflow.<br><br>## Parse and Save Results<br>**Download Results** fetches the CSV. **Parse CSV Results** converts to JSON records. **Remove Duplicate Contacts** dedupes. **Save Results to Google Sheets** writes each row. |
| Overview | Sticky Note | Workspace documentation | | | |
| Section - Configuration | Sticky Note | Workspace documentation for input setup | | | |
| Section - Submit and Store | Sticky Note | Workspace documentation for submission phase | | | |
| Section - Async Polling Loop | Sticky Note | Workspace documentation for polling phase | | | |
| Section - Parse and Save | Sticky Note | Workspace documentation for result handling | | | |
---
# 4. Reproducing the Workflow from Scratch
1. **Create a new workflow** in n8n.
2. **Add a Manual Trigger node**
- Node type: **Manual Trigger**
- Name it: **When clicking 'Execute workflow'**
- No extra configuration is required.
3. **Add a Set node** after the trigger
- Node type: **Set**
- Name it: **Configure Search Inputs**
- Add these fields:
1. `searchName` as String, default example: `Jane Doe`
2. `searchPhone` as String, default example: empty string
3. `searchEmail` as String, default example: empty string
4. `maxResults` as Number, default value: `3`
- Connect:
- `When clicking 'Execute workflow'``Configure Search Inputs`
4. **Create the ScraperCity credential**
- Credential type: **HTTP Header Auth**
- Credential name: **ScraperCity API Key**
- Header name: `Authorization`
- Header value: `Bearer YOUR_KEY`
- Use the exact credential on all ScraperCity HTTP Request nodes.
5. **Add an HTTP Request node** to submit the scrape job
- Node type: **HTTP Request**
- Name it: **Start People Finder Scrape**
- Method: `POST`
- URL: `https://app.scrapercity.com/api/v1/scrape/people-finder`
- Authentication: **Generic Credential Type**
- Generic Auth Type: **HTTP Header Auth**
- Select credential: **ScraperCity API Key**
- Enable body sending
- Specify body as **JSON**
- Use this expression-based JSON body logic:
- `name`: one-element array when `searchName` is non-empty, else empty array
- `email`: one-element array when `searchEmail` is non-empty, else empty array
- `phone_number`: one-element array when `searchPhone` is non-empty, else empty array
- `street_citystatezip`: empty array
- `max_results`: value from `maxResults`
- Recreate the same behavior as:
- name from `$json.searchName`
- email from `$json.searchEmail`
- phone from `$json.searchPhone`
- max results from `$json.maxResults`
- Connect:
- `Configure Search Inputs``Start People Finder Scrape`
6. **Add a Set node** to preserve the run ID
- Node type: **Set**
- Name it: **Store Run ID**
- Add one field:
- `runId` as String
- Value expression: `{{ $json.runId }}`
- Connect:
- `Start People Finder Scrape``Store Run ID`
7. **Add a Wait node** for the first delay
- Node type: **Wait**
- Name it: **Wait Before First Status Check**
- Configure it to wait **30 seconds**
- Connect:
- `Store Run ID``Wait Before First Status Check`
8. **Add a Split In Batches node** to control polling iterations
- Node type: **Split In Batches**
- Name it: **Poll Loop**
- In options, set **Max Iterations** to `30`
- Connect:
- `Wait Before First Status Check``Poll Loop`
9. **Add an HTTP Request node** to check job status
- Node type: **HTTP Request**
- Name it: **Check Scrape Status**
- Method: `GET`
- URL expression:
- `https://app.scrapercity.com/api/v1/scrape/status/{{ $json.runId }}`
- Authentication: **Generic Credential Type**
- Generic Auth Type: **HTTP Header Auth**
- Credential: **ScraperCity API Key**
- Connect:
- `Poll Loop` main output → `Check Scrape Status`
10. **Add an IF node** to test completion
- Node type: **IF**
- Name it: **Is Scrape Complete?**
- Condition:
- Left value: `{{ $json.status }}`
- Operator: equals
- Right value: `SUCCEEDED`
- Use strict/case-sensitive comparison as in the original workflow.
- Connect:
- `Check Scrape Status``Is Scrape Complete?`
11. **Add a second Wait node** for retries
- Node type: **Wait**
- Name it: **Wait 60 Seconds Before Retry**
- Configure it to wait **60 seconds**
- Connect:
- `Is Scrape Complete?` false output → `Wait 60 Seconds Before Retry`
12. **Close the loop**
- Connect:
- `Wait 60 Seconds Before Retry``Poll Loop`
- This creates the polling cycle.
- The Split In Batches nodes max iterations prevents infinite looping.
13. **Add an HTTP Request node** to download results
- Node type: **HTTP Request**
- Name it: **Download Results**
- Method: `GET`
- URL expression:
- `https://app.scrapercity.com/api/downloads/{{ $json.runId }}`
- Authentication: **Generic Credential Type**
- Generic Auth Type: **HTTP Header Auth**
- Credential: **ScraperCity API Key**
- Connect:
- `Is Scrape Complete?` true output → `Download Results`
14. **Add a Code node** to parse the CSV
- Node type: **Code**
- Name it: **Parse CSV Results**
- Language: JavaScript
- Paste logic equivalent to the provided workflow:
- Read the raw response from `items[0].json.data`, fallback to `items[0].json.body`, fallback to `items[0].json`
- Convert non-string payloads to a string
- Return an error object if the CSV text is empty
- Split the first line into headers
- Parse each remaining line into values with support for quoted commas and escaped quotes
- Build one output item per row
- Connect:
- `Download Results``Parse CSV Results`
15. **Add a Remove Duplicates node**
- Node type: **Remove Duplicates**
- Name it: **Remove Duplicate Contacts**
- Configure fields to compare:
- `full_name`
- `address`
- Connect:
- `Parse CSV Results``Remove Duplicate Contacts`
16. **Create the Google Sheets credential**
- Credential type: **Google Sheets OAuth2 API**
- Authenticate with a Google account that has edit access to the destination spreadsheet.
17. **Prepare the destination Google Sheet**
- Create or choose a spreadsheet.
- Create the target tab.
- Ideally add header columns matching the CSV-derived field names from ScraperCity, or at least compatible columns for auto-mapping.
- Common fields may include name, address, phone, email, or similar depending on the API export structure.
18. **Add a Google Sheets node**
- Node type: **Google Sheets**
- Name it: **Save Results to Google Sheets**
- Operation: **Append**
- Mapping mode: **Auto-map input data**
- Select the spreadsheet document ID
- Select the target sheet/tab
- Credential: your Google Sheets OAuth2 credential
- Connect:
- `Remove Duplicate Contacts``Save Results to Google Sheets`
19. **Optional but recommended: add sticky notes**
- Add one overview note describing the full process.
- Add section notes for:
- Configuration
- Submit and Store
- Async Polling Loop
- Parse and Save
20. **Test the workflow**
- Set at least one search field in **Configure Search Inputs**
- Verify the ScraperCity credential works
- Verify the Google Sheets node points to a writable spreadsheet
- Run the workflow manually
- Expect job completion to take roughly 220 minutes based on the sticky note guidance
21. **Recommended improvements when rebuilding**
- Add validation to ensure at least one of name, phone, or email is provided.
- Add explicit handling for failed statuses like `FAILED` or `CANCELLED`.
- Add a timeout/failure branch if max polling iterations are exceeded.
- Use safer JSON construction in the submission body to avoid malformed JSON when values contain quotes.
- Filter out parser error rows before writing to Google Sheets.
### Credential Summary
- **ScraperCity API Key**
- Type: HTTP Header Auth
- Header: `Authorization`
- Value: `Bearer YOUR_KEY`
- **Google Sheets OAuth2**
- Type: Google Sheets OAuth2 API
- Must have append/write permission on the selected spreadsheet
### Sub-workflow Setup
- This workflow does **not** invoke any sub-workflows.
- It has a single entry point: **When clicking 'Execute workflow'**.
---
# 5. General Notes & Resources
| Note Content | Context or Link |
|---|---|
| Jobs are expected to complete asynchronously and may take approximately 220 minutes. | Operational note from workspace documentation |
| Before use, configure a Header Auth credential named **ScraperCity API Key** with `Authorization: Bearer YOUR_KEY`. | ScraperCity authentication setup |
| Before use, connect a Google Sheets OAuth2 credential and set the destination spreadsheet in **Save Results to Google Sheets**. | Google Sheets setup |
| User-editable search criteria are centralized in **Configure Search Inputs**. | Workflow operation |
| The workflow description area is empty in the provided export. | Metadata note |