creation

2026-04-19 17:14:37 +00:00 · 2026-03-10 12:10:21 +08:00
parent e6139f9624
commit bf2e3d7cdb
1 changed files with 144 additions and 0 deletions
--- a/ScraperCity-13858/readme-13858.md
+++ b/ScraperCity-13858/readme-13858.md
@@ -0,0 +1,144 @@
+Find business emails from contact names and domains using ScraperCity
+
+https://n8nworkflows.xyz/workflows/find-business-emails-from-contact-names-and-domains-using-scrapercity-13858
+
+
+# Find business emails from contact names and domains using ScraperCity
+
+# 1. Workflow Overview
+
+This workflow automates the process of finding professional email addresses using the ScraperCity API. It is designed for sales development representatives, recruiters, and growth marketers who have a list of contact names and company domains and need to enrich them with verified email addresses at scale.
+
+The process operates asynchronously: it submits a batch of contacts, polls the ScraperCity server until the background job is finished, downloads the enriched data, and finally appends the successful results into a Google Sheet.
+
+The workflow is organized into four functional blocks:
+1.  **Contact Configuration:** Defining the input list of names and domains.
+2.  **Job Submission:** Sending the data to ScraperCity and capturing the tracking ID.
+3.  **Async Polling Loop:** Checking the job status repeatedly until the processing is complete.
+4.  **Data Processing & Export:** Normalizing the API response and writing found emails to a spreadsheet.
+
+---
+
+# 2. Block-by-Block Analysis
+
+### 2.1 Contact Configuration
+This block initializes the workflow with the data intended for enrichment.
+
+*   **Nodes Involved:** `When clicking 'Execute workflow'`, `Set Contact List`.
+*   **Node Details:**
+    *   **When clicking 'Execute workflow' (Manual Trigger):** Starts the process manually.
+    *   **Set Contact List (Set):** 
+        *   **Role:** Defines the input data.
+        *   **Configuration:** Assigns a string variable `contacts` containing a JSON array of objects (e.g., `[{"first_name":"Jane", "last_name":"Smith", "domain":"example.com"}]`).
+        *   **Edge Cases:** Malformed JSON strings here will cause the subsequent HTTP request to fail.
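+
+Since a malformed `contacts` string only surfaces later as a failed HTTP request, it can help to validate the string before submitting it. The following is a minimal sketch, not part of the workflow: `parseContacts` is a hypothetical helper name, and the rule that all three fields must be non-empty strings is an assumption.
+
+```javascript
+// Hypothetical helper (not a node in the workflow): parse the `contacts`
+// string from `Set Contact List` and check each entry before submission.
+function parseContacts(contactsJson) {
+  let contacts;
+  try {
+    contacts = JSON.parse(contactsJson);
+  } catch (err) {
+    throw new Error(`contacts is not valid JSON: ${err.message}`);
+  }
+  if (!Array.isArray(contacts)) {
+    throw new Error('contacts must be a JSON array');
+  }
+  // ASSUMPTION: every contact needs these three non-empty string fields.
+  const required = ['first_name', 'last_name', 'domain'];
+  contacts.forEach((contact, index) => {
+    for (const key of required) {
+      const value = contact && typeof contact === 'object' ? contact[key] : undefined;
+      if (typeof value !== 'string' || value.trim() === '') {
+        throw new Error(`contact ${index} is missing "${key}"`);
+      }
+    }
+  });
+  return contacts;
+}
+```
+
+Calling `parseContacts` on the example array above returns the parsed list, while a truncated string or an entry without a `domain` throws a descriptive error.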
+
+### 2.2 Job Submission
+This block handles the initial communication with the ScraperCity API.
+
+*   **Nodes Involved:** `Submit Email Finder Job`, `Store Run ID`.
+*   **Node Details:**
+    *   **Submit Email Finder Job (HTTP Request):** 
+        *   **Role:** Submits the contact list to the `/api/v1/scrape/email-finder` endpoint via POST.
+        *   **Configuration:** Uses Header Authentication. The body sends the `contacts` variable from the previous node.
+        *   **Failure Types:** 401 Unauthorized (invalid API key) or 400 Bad Request (invalid contact format).
+    *   **Store Run ID (Set):**
+        *   **Role:** Extracts the `runId` from the API response.
+        *   **Configuration:** Maps `{{ $json.runId }}` to a `runId` field so the polling and download nodes can reference it.
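+
+The submission step can be sketched outside n8n as two small functions. This is an illustration only: `buildSubmitRequest` and `extractRunId` are hypothetical names, the `Bearer` header format and URL are taken from the setup steps in section 4, and the body shape (an object with a single `contacts` array) is an assumption about what the API expects.
+
+```javascript
+// Sketch of the POST that `Submit Email Finder Job` sends.
+// ASSUMPTION: the request body is an object with one `contacts` array.
+function buildSubmitRequest(apiKey, contacts) {
+  return {
+    method: 'POST',
+    url: 'https://scrapercity.com/api/v1/scrape/email-finder',
+    headers: {
+      Authorization: `Bearer ${apiKey}`,
+      'Content-Type': 'application/json',
+    },
+    body: JSON.stringify({ contacts }),
+  };
+}
+
+// Mirrors `Store Run ID`: keep only the tracking ID from the API response.
+function extractRunId(response) {
+  if (!response || response.runId == null || response.runId === '') {
+    throw new Error('response does not contain a runId');
+  }
+  return { runId: response.runId };
+}
+```
+
+Failing early when `runId` is absent avoids polling a status URL that ends in `undefined`.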
+
+### 2.3 Async Polling Loop
+Because email-finder jobs run in the background, the workflow polls the status endpoint until the results are ready.
+
+*   **Nodes Involved:** `Wait Before First Poll`, `Polling Loop`, `Check Job Status`, `Is Job Complete?`, `Wait 60 Seconds Before Retry`.
+*   **Node Details:**
+    *   **Wait Before First Poll (Wait):** Pauses for 30 seconds to give the API time to start processing.
+    *   **Polling Loop (Split In Batches):** Used as a control gate for the loop. Set to batch size 1 to process the single `runId`.
+    *   **Check Job Status (HTTP Request):** Performs a GET request to `/api/v1/scrape/status/{{runId}}`.
+    *   **Is Job Complete? (If):** Checks if the `status` field in the response equals `SUCCEEDED`.
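+    *   **Wait 60 Seconds Before Retry (Wait):** If the status is not `SUCCEEDED`, the workflow waits one minute and routes back to the `Polling Loop` node. As described, no other status ends the loop, so a job that never reports `SUCCEEDED` keeps polling until the execution is stopped.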
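+
+The control flow of this block can be sketched as a plain function. This is a simplified, synchronous illustration rather than how n8n executes the nodes: `checkStatus` and `wait` are injected stand-ins for the HTTP Request and Wait nodes, and the `maxAttempts` cap is an assumption that the described workflow does not include.
+
+```javascript
+// Synchronous sketch of the polling loop's control flow.
+// `checkStatus` stands in for `Check Job Status`, `wait` for the Wait nodes.
+// ASSUMPTION: `maxAttempts` is an added safety cap, not part of the workflow.
+function pollUntilSucceeded(checkStatus, wait, maxAttempts = 20) {
+  wait(30); // Wait Before First Poll
+  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
+    const { status } = checkStatus(); // Check Job Status
+    if (status === 'SUCCEEDED') {
+      return attempt; // Is Job Complete? -> True branch
+    }
+    if (attempt < maxAttempts) {
+      wait(60); // Wait 60 Seconds Before Retry -> back to Polling Loop
+    }
+  }
+  throw new Error(`job did not reach SUCCEEDED after ${maxAttempts} polls`);
+}
+```
+
+The function returns the number of polls it took to see `SUCCEEDED`, and throws once the cap is reached instead of looping indefinitely.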
+
+### 2.4 Data Processing & Export
+Once the job is finished, this block retrieves, cleans, and saves the data.
+
+*   **Nodes Involved:** `Download Email Results`, `Parse and Format Results`, `Filter Emails Found`, `Write Results to Google Sheets`.
+*   **Node Details:**
+    *   **Download Email Results (HTTP Request):** Fetches the final JSON payload from `/api/downloads/{{runId}}`.
+    *   **Parse and Format Results (Code):**
+        *   **Role:** Normalizes the data structure.
+        *   **Logic:** It iterates through the results and ensures fields like `email`, `email_status`, and `confidence` are present and flat, making them compatible with spreadsheet rows.
+    *   **Filter Emails Found (Filter):** Removes any rows where the `email` field is empty to ensure only successful matches are saved.
+    *   **Write Results to Google Sheets (Google Sheets):** 
+        *   **Role:** Appends the cleaned data to a specific spreadsheet.
+        *   **Configuration:** Maps internal JSON keys to Sheet column headers (e.g., `first_name`, `last_name`, `email`).
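+
+The normalization and filtering steps might look roughly like the sketch below. Several details are assumptions: the download payload is treated as an array of objects, missing fields default to empty strings, and the `no_results` marker mentioned in section 5 is placed in `email_status`. The function names are hypothetical.
+
+```javascript
+// Sketch of `Parse and Format Results`.
+// ASSUMPTION: the download payload is an array of result objects.
+function normalizeResults(payload) {
+  const rows = Array.isArray(payload) ? payload : [];
+  if (rows.length === 0) {
+    // ASSUMPTION: the "no_results" marker is carried in `email_status`.
+    return [{
+      first_name: '', last_name: '', domain: '',
+      email: '', email_status: 'no_results', confidence: '',
+    }];
+  }
+  // Emit the same six flat fields for every row so they map onto sheet columns.
+  return rows.map((row) => ({
+    first_name: row.first_name ?? '',
+    last_name: row.last_name ?? '',
+    domain: row.domain ?? '',
+    email: row.email ?? '',
+    email_status: row.email_status ?? '',
+    confidence: row.confidence ?? '',
+  }));
+}
+
+// Sketch of `Filter Emails Found`: keep only rows with a non-empty email.
+function filterEmailsFound(rows) {
+  return rows.filter((row) => typeof row.email === 'string' && row.email.trim() !== '');
+}
+```
+
+Inside an n8n Code node, each normalized row would additionally be wrapped as `{ json: row }` before being returned. With this sketch, a `no_results` row has an empty `email`, so the filter drops it and nothing is appended to the sheet.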
+
+---
+
+# 3. Summary Table
+
+| Node Name | Node Type | Functional Role | Input Node(s) | Output Node(s) | Sticky Note |
+| :--- | :--- | :--- | :--- | :--- | :--- |
+| When clicking 'Execute workflow' | manualTrigger | Workflow Entry | (None) | Set Contact List | (None) |
+| Set Contact List | set | Data Input | When clicking 'Execute workflow' | Submit Email Finder Job | The `Set Contact List` node is where you define the contacts to look up. |
+| Submit Email Finder Job | httpRequest | API Submission | Set Contact List | Store Run ID | Add your ScraperCity API key credential here |
+| Store Run ID | set | ID Extraction | Submit Email Finder Job | Wait Before First Poll | `Store Run ID` captures this ID and makes it available throughout the rest of the workflow. |
+| Wait Before First Poll | wait | Initial Delay | Store Run ID | Polling Loop | Wait 30 seconds on the first pass, then enters a loop. |
+| Polling Loop | splitInBatches | Loop Controller | Wait Before First Poll, Wait 60 Seconds Before Retry | Check Job Status | ScraperCity email-finder jobs run in the background. |
+| Check Job Status | httpRequest | Status Check | Polling Loop | Is Job Complete? | Polls the status endpoint via GET. |
+| Is Job Complete? | if | Logic Switch | Check Job Status | Download Email Results (True), Wait 60 Seconds Before Retry (False) | Checks whether status equals `SUCCEEDED`. |
+| Wait 60 Seconds Before Retry | wait | Retry Delay | Is Job Complete? | Polling Loop | If not done, `Wait 60 Seconds Before Retry` pauses before looping back. |
+| Download Email Results | httpRequest | Data Retrieval | Is Job Complete? | Parse and Format Results | `Download Email Results` fetches the completed results from ScraperCity. |
+| Parse and Format Results | code | Data Cleaning | Download Email Results | Filter Emails Found | Normalizes each contact row into flat fields. |
+| Filter Emails Found | filter | Data Filtering | Parse and Format Results | Write Results to Google Sheets | Removes contacts where no email was returned. |
+| Write Results to Google Sheets | googleSheets | Data Export | Filter Emails Found | (None) | Set your Google Sheets document ID and sheet name here |
+
+---
+
+# 4. Reproducing the Workflow from Scratch
+
+1.  **Setup Credentials:**
+    *   Create a **Header Auth** credential: Name it `ScraperCity API Key`, Header: `Authorization`, Value: `Bearer [YOUR_API_KEY]`.
+    *   Create a **Google Sheets OAuth2** credential for your Google account.
+
+2.  **Initial Logic:**
+    *   Add a **Manual Trigger** node.
+    *   Connect a **Set** node (`Set Contact List`). Create a string variable `contacts` and paste a JSON array of contacts with `first_name`, `last_name`, and `domain`.
+
+3.  **API Submission:**
+    *   Add an **HTTP Request** node (`Submit Email Finder Job`).
+        *   Method: `POST`.
+        *   URL: `https://scrapercity.com/api/v1/scrape/email-finder`.
+        *   Authentication: `Header Auth`.
+        *   Body: `JSON`, with a `contacts` field populated from the `Set Contact List` output.
+    *   Add a **Set** node (`Store Run ID`). Assign `runId` from the expression `{{ $json.runId }}`.
+
+4.  **Polling Mechanism:**
+    *   Add a **Wait** node (`Wait Before First Poll`) set to 30 seconds.
+    *   Add a **Split In Batches** node (`Polling Loop`) with batch size 1.
+    *   Add an **HTTP Request** node (`Check Job Status`).
+        *   Method: `GET`.
+        *   URL: `https://scrapercity.com/api/v1/scrape/status/{{ $node["Store Run ID"].json["runId"] }}`.
+    *   Add an **If** node (`Is Job Complete?`). Condition: String `{{ $json.status }}` equals `SUCCEEDED`.
+    *   Add a **Wait** node (`Wait 60 Seconds Before Retry`) set to 60 seconds. Connect it from the **False** output of the If node and loop it back to the **Polling Loop** node.
+
+5.  **Data Extraction:**
+    *   Add an **HTTP Request** node (`Download Email Results`) to the **True** output of the If node.
+        *   Method: `GET`.
+        *   URL: `https://scrapercity.com/api/downloads/{{ $node["Store Run ID"].json["runId"] }}`.
+    *   Add a **Code** node (`Parse and Format Results`). Use JavaScript to map the array items into flat JSON objects containing `first_name`, `last_name`, `domain`, `email`, `email_status`, and `confidence`.
+    *   Add a **Filter** node (`Filter Emails Found`). Rule: `email` is not empty.
+
+6.  **Final Export:**
+    *   Add a **Google Sheets** node (`Write Results to Google Sheets`).
+        *   Operation: `Append`.
+        *   Document/Sheet: Select your target sheet.
+        *   Mapping: Map the fields from the Filter node to the sheet columns.
+
+---
+
+# 5. General Notes & Resources
+
+| Note Content | Context or Link |
+| :--- | :--- |
+| **Typical Completion Time** | Jobs usually take 1-10 minutes depending on list size. |
+| **API Documentation** | Refer to the ScraperCity API docs for rate limits and plan details. |
+| **Data Normalization** | If no results are found, the Code node returns a row with a "no_results" status instead of causing the workflow to fail. |
+| **Sheet Preparation** | Ensure your Google Sheet has headers: first_name, last_name, domain, email, email_status, confidence. |