creation

2026-04-21 01:24:34 +00:00 · 2025-11-12 19:59:30 +01:00
parent 459be7a552
commit 023ffc9e3d
1 changed files with 288 additions and 0 deletions
--- a/AI-3149/readme-3149.md
+++ b/AI-3149/readme-3149.md
@@ -0,0 +1,288 @@
+Image-Based Data Extraction API using Gemini AI
+
+https://n8nworkflows.xyz/workflows/image-based-data-extraction-api-using-gemini-ai-3149
+
+
+# Image-Based Data Extraction API using Gemini AI
+
+### 1. Workflow Overview
+
+This workflow, titled **"Image-Based Data Extraction API using Gemini AI"**, implements a no-code API endpoint designed to extract structured data from images using AI-powered OCR technology. It is tailored for scenarios requiring automated text extraction from various image types such as ID cards, invoices, receipts, business cards, and scanned documents.
+
+The workflow is logically divided into the following blocks:
+
+- **1.1 Input Reception:** Receives an HTTP GET request whose JSON body carries the image URL and the extraction parameters.
+- **1.2 Image Retrieval and Conversion:** Downloads the image from the provided URL and converts it into a base64-encoded format suitable for AI processing.
+- **1.3 AI Processing with Gemini API:** Sends the base64 image and extraction instructions to the Gemini AI API (Flash Lite model) to perform OCR and extract requested fields.
+- **1.4 Output Formatting and Response:** Parses the AI response to isolate the required data fields and returns the structured JSON result to the API caller.
+
+---
+
+### 2. Block-by-Block Analysis
+
+#### 2.1 Input Reception
+
+- **Overview:**  
+  This block exposes a webhook endpoint that accepts incoming API requests. It captures the image URL and extraction requirements from the request body.
+
+- **Nodes Involved:**  
+  - Webhook
+
+- **Node Details:**
+
+  - **Webhook**  
+    - Type: Webhook (HTTP endpoint)  
+    - Configuration:  
+      - Path: `data-extractor`  
+      - HTTP Method: GET (not set explicitly, so the Webhook node's default applies)  
+      - Response Mode: `responseNode` (response is sent by a downstream node)  
+    - Expressions/Variables: None in the node itself; downstream nodes read the request body fields `image_url`, `Requirement`, and `properties`.  
+    - Input: External HTTP request  
+    - Output: Passes request data downstream  
+    - Edge Cases:  
+      - Missing or invalid `image_url` parameter  
+      - Malformed JSON in request body  
+      - Unsupported HTTP methods  
+    - Notes: This node is the API entry point.
+
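The edge cases listed for the Webhook node can be caught before any image download is attempted. The following is a minimal Python sketch, assuming only the three body fields named above; the helper `validate_request_body` and its error messages are hypothetical and not part of the workflow, which performs no such validation itself.

```python
import json
from typing import List
from urllib.parse import urlparse


def validate_request_body(raw_body: str) -> List[str]:
    """Return a list of problems found in the webhook request body.

    Hypothetical helper: the workflow itself does not validate its input.
    """
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON: {exc.msg}"]
    if not isinstance(body, dict):
        return ["body must be a JSON object"]

    problems = []
    # image_url must be an absolute http(s) URL so the download step can fetch it.
    image_url = body.get("image_url")
    parsed = urlparse(image_url) if isinstance(image_url, str) else None
    if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("image_url is missing or not an http(s) URL")
    # Requirement becomes the system instruction, so it must be non-empty text.
    requirement = body.get("Requirement")
    if not isinstance(requirement, str) or not requirement.strip():
        problems.append("Requirement is missing or empty")
    # properties becomes the field map of the response schema.
    properties = body.get("properties")
    if not isinstance(properties, dict) or not properties:
        problems.append("properties is missing or not a non-empty object")
    return problems
```

An empty list means the body has the shape the downstream nodes expect; anything else could be returned to the caller as a 400-style error instead of letting a later node fail.
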
+#### 2.2 Image Retrieval and Conversion
+
+- **Overview:**  
+  Downloads the image from the URL provided in the webhook request and converts the binary image data into a base64-encoded string for AI processing.
+
+- **Nodes Involved:**  
+  - Get image from URL  
+  - Transform image to base64
+
+- **Node Details:**
+
+  - **Get image from URL**  
+    - Type: HTTP Request  
+    - Configuration:  
+      - URL: Dynamically set to `{{$json.body.image_url}}` from webhook input  
+      - Method: GET (default)  
+      - No additional options configured  
+    - Input: Receives webhook data  
+    - Output: Binary image data  
+    - Edge Cases:  
+      - Invalid or unreachable URL  
+      - Non-image content returned  
+      - Timeout or network errors  
+
+  - **Transform image to base64**  
+    - Type: Extract From File  
+    - Configuration:  
+      - Operation: `binaryToProperty` (converts binary data to a JSON property)  
+      - Destination Key: `data1`  
+      - Encoding: ASCII (base64 encoded string)  
+    - Input: Binary image data from previous node  
+    - Output: JSON with base64 string under `data1`  
+    - Edge Cases:  
+      - Binary data missing or corrupted  
+      - Encoding errors  
+
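The conversion step can be reproduced outside n8n when debugging. The sketch below is a hedged Python approximation, not the node's actual implementation: `to_base64` mirrors the value that ends up under `data1`, while `detect_image_mime` is a hypothetical addition that guesses the format from the file's leading bytes, which may matter because the Gemini request in this workflow declares every image as `image/jpeg`.

```python
import base64


def detect_image_mime(data: bytes) -> str:
    """Guess the MIME type from the file's leading magic bytes."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"RIFF") and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("unrecognized image format")


def to_base64(data: bytes) -> str:
    """Encode raw bytes as a base64 string (the value stored under `data1`)."""
    if not data:
        raise ValueError("binary data is missing")
    return base64.b64encode(data).decode("ascii")
```

Base64 output contains only ASCII characters, so decoding the encoded bytes with `"ascii"` cannot fail.
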
+#### 2.3 AI Processing with Gemini API
+
+- **Overview:**  
+  Sends the base64-encoded image and extraction instructions to the Gemini AI API (Flash Lite model) to perform OCR and extract the requested structured data fields.
+
+- **Nodes Involved:**  
+  - Call Gemini API (Flash Lite) with Image
+
+- **Node Details:**
+
+  - **Call Gemini API (Flash Lite) with Image**  
+    - Type: HTTP Request  
+    - Configuration:  
+      - URL: `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-lite:generateContent`  
+      - Method: POST  
+      - Body Type: JSON  
+      - Body Content:  
+        - `contents`: Array of two entries, both with role `user`:  
+          1. Inline image data with base64 string (`{{$json.data1}}`) and a hardcoded MIME type of `image/jpeg`  
+          2. Text prompt `"check this"` (likely a placeholder or trigger)  
+        - `systemInstruction`: Text from webhook input `Requirement` field  
+        - `generationConfig`: Parameters controlling AI output such as temperature, topK, topP, max tokens, response MIME type (`application/json`), and a dynamic response schema derived from webhook `properties` field  
+      - Authentication: Uses predefined Google Palm API credentials named "Gemini API Srinivasan Online"  
+    - Input: JSON with base64 image and instructions  
+    - Output: AI-generated JSON response containing extracted data candidates  
+    - Expressions/Variables:  
+      - Uses expressions to dynamically insert base64 image and extraction requirements  
+      - Dynamically constructs response schema from input properties  
+    - Edge Cases:  
+      - API authentication failure  
+      - Rate limiting or quota exceeded  
+      - Invalid or malformed request body  
+      - AI model errors or timeouts  
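+      - Unexpected response format  
+      - Non-JPEG input (e.g., PNG), which is still declared as `image/jpeg` because the MIME type is hardcoded  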
+    - Version-specific: Requires n8n version supporting HTTP Request node v4.2 and Google Palm API credentials  
+    - Notes: This is the core AI OCR processing step.
+
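To make the request structure concrete, here is a minimal Python sketch that assembles the same body as a dictionary. The function name `build_gemini_body` and the sample schema types are assumptions for illustration; the generation parameters are the ones this workflow uses. Building the body as a data structure and serializing it with `json.dumps` also sidesteps one failure mode of string templating: a `Requirement` containing quotes or newlines would otherwise produce a malformed request body.

```python
import json


def build_gemini_body(image_b64: str, requirement: str, properties: dict,
                      mime_type: str = "image/jpeg") -> dict:
    """Assemble a generateContent request body (hypothetical helper)."""
    return {
        "contents": [
            # First user entry: the image itself, inlined as base64.
            {"role": "user",
             "parts": [{"inlineData": {"data": image_b64, "mimeType": mime_type}}]},
            # Second user entry: the short text prompt used by the workflow.
            {"role": "user", "parts": [{"text": "check this"}]},
        ],
        # The caller's Requirement text is passed as the system instruction.
        "systemInstruction": {"role": "user", "parts": [{"text": requirement}]},
        "generationConfig": {
            "temperature": 1,
            "topK": 40,
            "topP": 0.95,
            "maxOutputTokens": 8192,
            "responseMimeType": "application/json",
            # The caller's properties map becomes the response schema.
            "responseSchema": {"type": "object", "properties": properties},
        },
    }


body = build_gemini_body(
    image_b64="/9j/",  # placeholder, not a real image
    requirement='Extract the "PAN Number" and check validity.',
    properties={"PAN Number": {"type": "string"}, "Valid": {"type": "boolean"}},
)
payload = json.dumps(body)  # quotes inside the requirement are escaped here
```
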
+#### 2.4 Output Formatting and Response
+
+- **Overview:**  
+  Parses the AI response to extract the first candidate's content, converts it from stringified JSON to a JSON object, and sends it back as the API response.
+
+- **Nodes Involved:**  
+  - Edit fields to output required data alone  
+  - Respond to Webhook
+
+- **Node Details:**
+
+  - **Edit fields to output required data alone**  
+    - Type: Set  
+    - Configuration:  
+      - Assigns a new field `result` with the value parsed from the first candidate's content text:  
+        `={{ $json.candidates[0].content.parts[0].text.parseJson() }}`  
+      - Removes all other fields by default (only `result` remains)  
+    - Input: AI JSON response  
+    - Output: JSON with a single `result` field containing structured extracted data  
+    - Edge Cases:  
+      - Missing or empty `candidates` array  
+      - Parsing errors if AI response text is not valid JSON  
+
+  - **Respond to Webhook**  
+    - Type: Respond to Webhook  
+    - Configuration:  
+      - Sends the output of the previous node as the HTTP response  
+    - Input: Final JSON data to return  
+    - Output: HTTP response to the original API caller  
+    - Edge Cases:  
+      - Network errors while sending response  
+      - Large payloads causing timeout  
+
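The parsing done by the Set node's expression, together with the two edge cases listed for it, can be sketched in Python as follows. This is an illustrative equivalent under stated assumptions, not the workflow's own code: `extract_result` and `ExtractionError` are hypothetical names, and the input is assumed to be the already decoded response of the Gemini call.

```python
import json


class ExtractionError(Exception):
    """Raised when the AI response cannot be turned into a result object."""


def extract_result(response: dict) -> dict:
    """Mirror `candidates[0].content.parts[0].text.parseJson()` with error handling."""
    candidates = response.get("candidates") or []
    if not candidates:
        raise ExtractionError("response contains no candidates")
    try:
        text = candidates[0]["content"]["parts"][0]["text"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ExtractionError("first candidate has no text part") from exc
    try:
        # The model returns stringified JSON; parse it into an object.
        return {"result": json.loads(text)}
    except (json.JSONDecodeError, TypeError) as exc:
        raise ExtractionError("candidate text is not valid JSON") from exc
```

Catching these cases explicitly would let the API return a clear error message instead of failing inside the expression.
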
+---
+
+### 3. Summary Table
+
+| Node Name | Node Type | Functional Role | Input Node(s) | Output Node(s) | Sticky Note |
+|---|---|---|---|---|---|
+| Webhook | Webhook | API entry point, receives request | External HTTP request | Get image from URL | See Sticky Note2 for detailed API endpoint description and use cases |
+| Get image from URL | HTTP Request | Downloads image from provided URL | Webhook | Transform image to base64 | |
+| Transform image to base64 | Extract From File | Converts binary image to base64 string | Get image from URL | Call Gemini API (Flash Lite) with Image | |
+| Call Gemini API (Flash Lite) with Image | HTTP Request | Sends image and instructions to AI | Transform image to base64 | Edit fields to output required data alone | |
+| Edit fields to output required data alone | Set | Parses AI response and extracts data | Call Gemini API (Flash Lite) with Image | Respond to Webhook | |
+| Respond to Webhook | Respond to Webhook | Sends final JSON response to caller | Edit fields to output required data alone | External HTTP response | |
+| Sticky Note | Sticky Note | Sample API call example | | | Contains cURL example for API usage |
+| Sticky Note1 | Sticky Note | Sample output example | | | Shows example JSON output |
+| Sticky Note2 | Sticky Note | Workflow overview and API description | | | Describes workflow purpose, use cases, and processing steps |
+
+---
+
+### 4. Reproducing the Workflow from Scratch
+
+1. **Create Webhook Node**  
+   - Type: Webhook  
+   - Path: `data-extractor`  
+   - Response Mode: `responseNode`  
+   - Accepts GET requests with a JSON body containing:  
+     - `image_url` (string)  
+     - `Requirement` (string)  
+     - `properties` (JSON schema object)  
+
+2. **Create HTTP Request Node: "Get image from URL"**  
+   - Type: HTTP Request  
+   - Method: GET  
+   - URL: Expression `{{$json.body.image_url}}` (from webhook input)  
+   - No authentication or special headers  
+   - Connect Webhook → Get image from URL  
+
+3. **Create Extract From File Node: "Transform image to base64"**  
+   - Type: Extract From File  
+   - Operation: `binaryToProperty`  
+   - Destination Key: `data1`  
+   - Encoding: ASCII (base64)  
+   - Connect Get image from URL → Transform image to base64  
+
+4. **Create HTTP Request Node: "Call Gemini API (Flash Lite) with Image"**  
+   - Type: HTTP Request  
+   - Method: POST  
+   - URL: `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-lite:generateContent`  
+   - Authentication: Use Google Palm API credentials (API key)  
+   - Body Content (JSON):  
+     ```json
+     {
+       "contents": [
+         {
+           "role": "user",
+           "parts": [
+             {
+               "inlineData": {
+                 "data": "{{$json.data1}}",
+                 "mimeType": "image/jpeg"
+               }
+             }
+           ]
+         },
+         {
+           "role": "user",
+           "parts": [
+             {
+               "text": "check this"
+             }
+           ]
+         }
+       ],
+       "systemInstruction": {
+         "role": "user",
+         "parts": [
+           {
+             "text": "{{ $('Webhook').first().json.body.Requirement }}"
+           }
+         ]
+       },
+       "generationConfig": {
+         "temperature": 1,
+         "topK": 40,
+         "topP": 0.95,
+         "maxOutputTokens": 8192,
+         "responseMimeType": "application/json",
+         "responseSchema": {
+           "type": "object",
+           "properties": {{ $('Webhook').first().json.body.properties.toJsonString() }}
+         }
+       }
+     }
+     ```  
+   - Connect Transform image to base64 → Call Gemini API (Flash Lite) with Image  
+
+5. **Create Set Node: "Edit fields to output required data alone"**  
+   - Type: Set  
+   - Remove all fields except one new field:  
+     - Name: `result`  
+     - Value (expression): `={{ $json.candidates[0].content.parts[0].text.parseJson() }}`  
+   - Connect Call Gemini API (Flash Lite) with Image → Edit fields to output required data alone  
+
+6. **Create Respond to Webhook Node**  
+   - Type: Respond to Webhook  
+   - Connect Edit fields to output required data alone → Respond to Webhook  
+
+7. **Add Sticky Notes (Optional for Documentation)**  
+   - Add a sticky note near the Webhook node with sample cURL API call (see Sticky Note content)  
+   - Add a sticky note near the Respond to Webhook node with sample JSON output  
+
+8. **Credentials Setup**  
+   - Configure Google Palm API credentials with a valid API key for Gemini API access  
+   - Assign these credentials to the "Call Gemini API (Flash Lite) with Image" node  
+
+9. **Activate Workflow**  
+   - Save and activate the workflow  
+   - Test by sending a GET request with a JSON body to `/webhook/data-extractor` (or to `/webhook-test/data-extractor` while the workflow is listening in test mode)  
+
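A test call against the activated workflow could look like the Python sketch below. Everything instance-specific is an assumption: the base URL `http://localhost:5678` is n8n's default local address, the image URL is a placeholder, and the schema types are illustrative. The helper `build_extraction_request` is hypothetical; it only constructs the request, so nothing is sent until `urlopen` is called.

```python
import json
import urllib.request


def build_extraction_request(base_url: str, image_url: str, requirement: str,
                             properties: dict, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request to the data-extractor webhook."""
    payload = {
        "image_url": image_url,
        "Requirement": requirement,
        "properties": properties,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/webhook/data-extractor",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,  # the workflow is documented as GET with a JSON body
    )


request = build_extraction_request(
    base_url="http://localhost:5678",              # assumed local n8n instance
    image_url="https://example.com/pan-card.jpg",  # placeholder image URL
    requirement="Extract the listed fields from this ID card.",
    properties={
        "PAN Number": {"type": "string"},
        "Name": {"type": "string"},
        "Date of Birth": {"type": "string"},
        "Valid": {"type": "boolean"},
    },
)
# To send it: urllib.request.urlopen(request, timeout=60).read()
```

Some HTTP clients refuse to attach a body to a GET request; if that applies, configuring the Webhook node for POST and passing `method="POST"` here is one possible workaround.
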
+---
+
+### 5. General Notes & Resources
+
+| Note Content                                                                                                                                                | Context or Link                                                                                   |
+|-------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------|
+| Sample API call using cURL: includes image URL, extraction requirement, and JSON schema for expected fields                                                  | See Sticky Note near Webhook node                                                                |
+| Sample output JSON demonstrating extracted fields such as PAN Number, Name, Date of Birth, and Valid (boolean)                                              | See Sticky Note near Respond to Webhook node                                                     |
+| Workflow converts images to base64 before sending to AI, ensuring compatibility with Gemini API input format                                                | Workflow design detail                                                                           |
+| Uses Gemini API Flash Lite model for OCR and structured data extraction                                                                                      | Requires Google Palm API credentials                                                             |
+| Supports customizable extraction schema via `properties` JSON parameter in API request                                                                     | Enables flexible field extraction                                                                |
+| Suitable for integration with CRM, ERP, document management, and other automation systems                                                                    | Use case guidance                                                                                |
+| For more information on Gemini API and Google Palm API, refer to official Google Cloud documentation                                                       | https://cloud.google.com/vertex-ai/docs/generative-ai/models/gemini                             |
+
+---
+
+This documentation provides a complete, structured understanding of the workflow, enabling reproduction, modification, and troubleshooting by advanced users and AI agents alike.