creation

2026-04-28 00:29:22 +00:00 · 2025-11-13 13:23:02 +01:00
parent 1be7a8019c
commit b777326267
1 changed files with 331 additions and 0 deletions
@@ -0,0 +1,331 @@
+Analyze Search Intent for Keywords with Google Scraping, Bright Data, and Gemini AI
+
+https://n8nworkflows.xyz/workflows/analyze-search-intent-for-keywords-with-google-scraping--bright-data--and-gemini-ai-6043
+
+
+# Analyze Search Intent for Keywords with Google Scraping, Bright Data, and Gemini AI
+
+### 1. Workflow Overview
+
+This workflow automates the process of analyzing search intent for a list of keywords by leveraging Google Search scraping, Bright Data proxy services, and AI-powered intent classification via Google Gemini AI and Langchain. It is designed primarily for SEO analysts, digital marketers, and content strategists aiming to understand the intent behind top-ranking search results for targeted keywords.
+
+The workflow is divided into the following logical blocks:
+
+- **1.1 Input Reception:** Trigger and keyword ingestion from Google Sheets.
+- **1.2 Search Results Retrieval:** Scraping Google Search results using Bright Data.
+- **1.3 Data Preparation:** Formatting and splitting search results data for processing.
+- **1.4 AI-based Intent Classification:** Classifying search intent of top-ranking pages using Google Gemini AI and Langchain’s text classifier.
+- **1.5 Intent Output Processing:** Mapping AI classifications to intent labels and merging results.
+- **1.6 Results Persistence:** Appending or updating results in Google Sheets for further analysis.
+
+---
+
+### 2. Block-by-Block Analysis
+
+#### 1.1 Input Reception
+
+- **Overview:** Starts the workflow manually and fetches the keywords and country codes from a Google Sheet.
+- **Nodes Involved:**  
+  - When clicking ‘Execute workflow’  
+  - Get Keywords  
+  - Loop Over Items  
+  - set keyword
+
+- **Node Details:**
+
+  - **When clicking ‘Execute workflow’**  
+    - *Type:* Manual Trigger  
+    - *Role:* Entry point allowing manual start of the workflow.  
+    - *Config:* No parameters needed.  
+    - *Input/Output:* No input; outputs trigger downstream nodes.  
+    - *Edge cases:* None; manual trigger only.
+
+  - **Get Keywords**  
+    - *Type:* Google Sheets  
+    - *Role:* Reads keywords and associated country codes from a predefined Google Sheet.  
+    - *Config:* Reads from sheet gid=0 of the Google Sheets document with ID "1QU9rwawCZLiYW8nlYYRMj-9OvAUNZoe2gP49KbozQqw". Uses Google Sheets OAuth2 credentials.  
+    - *Key variables:* Reads columns "Keyword" and "country code".  
+    - *Input:* Trigger from manual node.  
+    - *Output:* List of keywords and countries for processing.  
+    - *Edge cases:* Auth failures, empty sheet, or rate limits.
+
+  - **Loop Over Items**  
+    - *Type:* Split In Batches  
+    - *Role:* Processes keywords one-by-one to manage API limits and sequential processing.  
+    - *Config:* Default single-item batch size (default).  
+    - *Input:* Keywords list from Google Sheets.  
+    - *Output:* Single keyword and country code per iteration.  
+    - *Edge cases:* Empty input list; batch processing errors.
+
+  - **set keyword**  
+    - *Type:* Set  
+    - *Role:* Maps current keyword and country code for use in HTTP request.  
+    - *Config:* Sets two fields: `search_term` from `Keyword`, and `country code` from input data.  
+    - *Input:* One keyword item from Loop Over Items.  
+    - *Output:* Sets variables for next node.  
+    - *Edge cases:* Missing keyword or country code fields.
+
+---
+
+#### 1.2 Search Results Retrieval
+
+- **Overview:** Scrapes Google Search results for each keyword using Bright Data’s SERP API.
+- **Nodes Involved:**  
+  - Fetch Google Search Results JSON  
+  - Split Out
+
+- **Node Details:**
+
+  - **Fetch Google Search Results JSON**  
+    - *Type:* HTTP Request  
+    - *Role:* Calls Bright Data API to scrape Google Search results JSON.  
+    - *Config:*  
+      - POST to `https://api.brightdata.com/request`  
+      - Sends body parameters including zone `serp_api1`, URL constructed with the keyword query (spaces replaced with `+`), country code, and format `raw`.  
+      - Uses HTTP header auth with preconfigured Bright Data credentials.  
+      - Query parameter `async=true` to enable asynchronous scraping.  
+      - Accept header set to `application/json`.  
+    - *Input:* Keyword and country code from "set keyword".  
+    - *Output:* Raw SERP JSON response.  
+    - *Edge cases:* Auth errors, rate limits, network timeouts, malformed responses, unsupported country codes.
+
+  - **Split Out**  
+    - *Type:* Split Out  
+    - *Role:* Splits the `organic` array of top-ranking search results into individual items for separate processing.  
+    - *Config:* Splits on field `organic` from JSON response.  
+    - *Input:* Output of HTTP Request node.  
+    - *Output:* Individual organic search result entries (pages).  
+    - *Edge cases:* Missing or empty `organic` field.
+
+---
+
+#### 1.3 Data Preparation
+
+- **Overview:** Formats the individual search result items to a structured format and enriches with keyword context.
+- **Nodes Involved:**  
+  - format fields
+
+- **Node Details:**
+
+  - **format fields**  
+    - *Type:* Set  
+    - *Role:* Extracts and maps important fields from each search result item and adds the keyword context for intent classification.  
+    - *Config:*  
+      - Sets fields:  
+        - `Keyword` from the keyword variable  
+        - `ranking page` from `link` field of search result  
+        - `title` from `title` field  
+        - `Meta description` from `description` field  
+        - `extensions` as JSON string of optional extensions  
+        - `position` from `rank` field  
+    - *Input:* One organic search result item.  
+    - *Output:* Structured data for AI classification.  
+    - *Edge cases:* Missing fields, null descriptions.
+
+---
+
+#### 1.4 AI-based Intent Classification
+
+- **Overview:** Uses Langchain's text classifier node powered by Google Gemini AI to classify each search result's intent into predefined categories.
+- **Nodes Involved:**  
+  - Google Gemini Chat Model2  
+  - Intent Classifier  
+  - informational, Navigational, Commercial, Transactional, Mixed
+
+- **Node Details:**
+
+  - **Google Gemini Chat Model2**  
+    - *Type:* Langchain Google Gemini Chat  
+    - *Role:* Invokes Google Gemini AI to process input text for intent classification.  
+    - *Config:* Uses Google PaLM API credentials.  
+    - *Input:* Receives text to classify from "Intent Classifier".  
+    - *Output:* AI response for classification.  
+    - *Edge cases:* API rate limiting, credential expiration, API errors.
+
+  - **Intent Classifier**  
+    - *Type:* Langchain Text Classifier  
+    - *Role:* Defines intent categories and classifies the input text into one of five intents: Informational, Navigational, Commercial, Transactional, Mixed.  
+    - *Config:*  
+      - Input text: includes keyword, page title, meta description.  
+      - Categories with descriptions for classification.  
+    - *Input:* Text from "format fields".  
+    - *Output:* Classification result routed to one of the five subsequent nodes.  
+    - *Edge cases:* Ambiguous classifications, AI errors.
+
+  - **informational, Navigational, Commercial, Transactional, Mixed**  
+    - *Type:* Set nodes  
+    - *Role:* Assign the classified intent label as a string field `intent` to the item.  
+    - *Config:* Each node sets the `intent` field to its respective category name.  
+    - *Input:* Routed from Intent Classifier output.  
+    - *Output:* Labeled data forwarded to merge node.  
+    - *Edge cases:* None expected.
+
+---
+
+#### 1.5 Intent Output Processing
+
+- **Overview:** Merges all classified intent data streams into a single stream for final processing.
+- **Nodes Involved:**  
+  - Merge intents
+
+- **Node Details:**
+
+  - **Merge intents**  
+    - *Type:* Merge  
+    - *Role:* Combines the five separate intent-labeled data streams into one unified output stream.  
+    - *Config:* Set to handle 5 input connections.  
+    - *Input:* From informational, Navigational, Commercial, Transactional, Mixed nodes.  
+    - *Output:* Unified data stream for appending.  
+    - *Edge cases:* Missing input streams, synchronization issues.
+
+---
+
+#### 1.6 Results Persistence
+
+- **Overview:** Appends or updates the processed search result data with intent labels into a Google Sheet for review and further analysis.
+- **Nodes Involved:**  
+  - append ranking result
+
+- **Node Details:**
+
+  - **append ranking result**  
+    - *Type:* Google Sheets  
+    - *Role:* Appends or updates rows in a specific Google Sheet tab with the enriched data.  
+    - *Config:*  
+      - Document ID: same as input sheet.  
+      - Sheet name: GID 1031244896 (Sheet2).  
+      - Operation: appendOrUpdate using `ranking page` as matching column.  
+      - Columns mapped: title, position, ranking page, Meta description, Keyword, extensions, intent.  
+      - Uses Google Sheets OAuth2 credentials.  
+    - *Input:* Unified intent-labeled data from Merge node.  
+    - *Output:* Appended/updated data in Google Sheet.  
+    - *Edge cases:* Auth failures, rate limits, update conflicts.
+
+---
+
+### 3. Summary Table
+
+| Node Name                      | Node Type                           | Functional Role                          | Input Node(s)                      | Output Node(s)                     | Sticky Note                                                                                                  |
+|--------------------------------|-----------------------------------|----------------------------------------|----------------------------------|----------------------------------|--------------------------------------------------------------------------------------------------------------|
+| When clicking ‘Execute workflow’| Manual Trigger                    | Manual workflow start                   | -                                | Get Keywords                     |                                                                                                              |
+| Get Keywords                   | Google Sheets                     | Reads keywords and country codes       | When clicking ‘Execute workflow’ | Loop Over Items                  | - Make a copy of this [G sheet](https://docs.google.com/spreadsheets/d/1QU9rwawCZLiYW8nlYYRMj-9OvAUNZoe2gP49KbozQqw/edit?usp=sharing) - Add your desired keywords |
+| Loop Over Items               | Split In Batches                  | Processes keywords one-by-one           | Get Keywords                    | set keyword / Loop Over Items    | - We loop over each item one at a time                                                                       |
+| set keyword                   | Set                              | Maps keyword and country code           | Loop Over Items                 | Fetch Google Search Results JSON | - Map keyword and country code - Update the Zone name to match your zone on Bright Data - Run the scraper     |
+| Fetch Google Search Results JSON| HTTP Request                    | Scrapes Google Search via Bright Data  | set keyword                    | Split Out                      |                                                                                                              |
+| Split Out                    | Split Out                        | Splits organic search results           | Fetch Google Search Results JSON| format fields                  |                                                                                                              |
+| format fields                | Set                              | Extracts and formats search result data | Split Out                     | Intent Classifier              |                                                                                                              |
+| Google Gemini Chat Model2    | Langchain Google Gemini Chat     | AI processing for text classification   | Intent Classifier (ai_languageModel) | Intent Classifier            |                                                                                                              |
+| Intent Classifier            | Langchain Text Classifier        | Classifies search intent                 | format fields / Google Gemini Chat Model2 | informational, Navigational, Commercial, Transactional, Mixed | ## Analyze intent of the top ranking pages - We use AI to analyze each title and its intent as informational, Commercial, transactional, navigational or Mixed in the case it doesn't fall into any of those |
+| informational               | Set                              | Labels intent as informational           | Intent Classifier              | Merge intents                  |                                                                                                              |
+| Navigational               | Set                              | Labels intent as navigational             | Intent Classifier              | Merge intents                  |                                                                                                              |
+| Commercial               | Set                              | Labels intent as commercial               | Intent Classifier              | Merge intents                  |                                                                                                              |
+| Transactional             | Set                              | Labels intent as transactional             | Intent Classifier              | Merge intents                  |                                                                                                              |
+| Mixed                     | Set                              | Labels intent as mixed                     | Intent Classifier              | Merge intents                  |                                                                                                              |
+| Merge intents             | Merge                            | Merges all intents into one stream         | informational, Navigational, Commercial, Transactional, Mixed | append ranking result |                                                                                                              |
+| append ranking result      | Google Sheets                    | Appends or updates results in Google Sheet | Merge intents               | -                            | - Append the top pages to the G sheet for further analysis                                                  |
+
+---
+
+### 4. Reproducing the Workflow from Scratch
+
+1. **Create a Manual Trigger node** named "When clicking ‘Execute workflow’" — no parameters.
+
+2. **Add a Google Sheets node** named "Get Keywords" to read keyword data:  
+   - Document ID: `1QU9rwawCZLiYW8nlYYRMj-9OvAUNZoe2gP49KbozQqw`  
+   - Sheet GID: `0`  
+   - Authentication: Google Sheets OAuth2 (configure your credentials).  
+   - Connect output of manual trigger to this node.
+
+3. **Add a Split In Batches node** named "Loop Over Items" to process keywords one at a time:  
+   - Default batch size (1) is sufficient.  
+   - Connect output of "Get Keywords" to this node.
+
+4. **Add a Set node** named "set keyword":  
+   - Map `search_term` to `={{ $json.Keyword }}`  
+   - Map `country code` to `={{ $json['country code'] }}`  
+   - Connect output of "Loop Over Items" (main) to this node.
+
+5. **Add an HTTP Request node** named "Fetch Google Search Results JSON":  
+   - Method: POST  
+   - URL: `https://api.brightdata.com/request`  
+   - Authentication: HTTP Header Auth with Bright Data API key.  
+   - Query Parameters: `async=true`  
+   - Headers: Accept: application/json  
+   - Body Parameters (JSON):  
+     - zone: `serp_api1` (or your configured zone name)  
+     - url: `=https://www.google.com/search?q={{ $json.search_term.replaceAll(" ", "+") }}&start=0&brd_json=1`  
+     - country: `={{ $json['country code'] }}`  
+     - format: `raw`  
+   - Connect output of "set keyword" to this node.
+
+6. **Add a Split Out node** named "Split Out":  
+   - Field to split: `organic`  
+   - Connect output of HTTP Request node to this node.
+
+7. **Add a Set node** named "format fields":  
+   - Map fields:  
+     - `Keyword` = `={{ $('set keyword').item.json.search_term }}`  
+     - `ranking page` = `={{ $json.link }}`  
+     - `title` = `={{ $json.title }}`  
+     - `Meta description` = `={{ $json.description }}`  
+     - `extensions` = `={{ $json.extensions?.toJsonString() }}`  
+     - `position` = `={{ $json.rank }}`  
+   - Connect output of "Split Out" to this node.
+
+8. **Add a Langchain Google Gemini Chat node** named "Google Gemini Chat Model2":  
+   - Set credentials with Google PaLM API key.  
+   - Connect output of "Intent Classifier" node (see next step) as AI language model input.
+
+9. **Add a Langchain Text Classifier node** named "Intent Classifier":  
+   - Input text:  
+     ```
+     Keyword: {{ $json.Keyword }}
+
+     Top pages title  : {{ $json.title }}
+
+     meta descriptions : {{ $json.description }}
+     ```
+   - Define categories: Informational, Navigational, Commercial, Transactional, Mixed (each with a description).  
+   - Connect output of "format fields" to this node as input.  
+   - Connect AI language model input to "Google Gemini Chat Model2".
+
+10. **Add five Set nodes** named "informational", "Navigational", "Commercial", "Transactional", and "Mixed":  
+    - Each sets `intent` field to its corresponding category string (e.g., `"informational"`).  
+    - Connect respective outputs from "Intent Classifier" to each Set node.
+
+11. **Add a Merge node** named "Merge intents":  
+    - Number of inputs: 5  
+    - Connect outputs of all five intent Set nodes to this Merge node.
+
+12. **Add a Google Sheets node** named "append ranking result":  
+    - Document ID: same as "Get Keywords" node.  
+    - Sheet GID: `1031244896` (Sheet2).  
+    - Operation: Append or Update  
+    - Match by column: `ranking page`  
+    - Map columns: title, position, ranking page, Meta description, Keyword, extensions, intent.  
+    - Use Google Sheets OAuth2 credentials.  
+    - Connect output of "Merge intents" to this node.
+
+13. **Configure credentials:**  
+    - Google Sheets OAuth2 for both Google Sheets nodes.  
+    - Bright Data HTTP Header Auth for the HTTP Request node.  
+    - Google PaLM API credentials for Google Gemini Chat node.
+
+14. **Test the workflow:** Run manually, verify keywords are read, Google SERP scraped, AI classifies intent, and results append to Google Sheet.
+
+---
+
+### 5. General Notes & Resources
+
+| Note Content                                                                                   | Context or Link                                                                                         |
+|------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------|
+| Make a copy of this Google Sheet to add your own keywords: https://docs.google.com/spreadsheets/d/1QU9rwawCZLiYW8nlYYRMj-9OvAUNZoe2gP49KbozQqw/edit?usp=sharing | Source Google Sheet for keywords and results storage                                                   |
+| Setting up SERP scraper in Bright Data: 1) Go to [Proxies & Scraping](https://brightdata.com/cp/zones) tab, 2) Create zone under SERP API, 3) Name and configure zone accordingly | Bright Data zone setup instructions                                                                     |
+| Analyze intent of the top ranking pages: We use AI to categorize pages as informational, commercial, transactional, navigational, or mixed | Intent classification logic description                                                                |
+| Loop over each keyword item one at a time to avoid API limits or processing bottlenecks        | Processing note                                                                                         |
+| Map keyword and country code, update Zone name to match your Bright Data zone before running scraper | Configuration note                                                                                      |
+| Append the top pages with their intents to the Google Sheet for further analysis               | Data persistence note                                                                                   |
+
+---
+
+**Disclaimer:** The provided text originates exclusively from an automated workflow created with n8n, a tool for integration and automation. This processing strictly complies with current content policies and contains no illegal, offensive, or protected elements. All data handled are legal and publicly available.