Image-Based Data Extraction API using Gemini AI

https://n8nworkflows.xyz/workflows/image-based-data-extraction-api-using-gemini-ai-3149

# Image-Based Data Extraction API using Gemini AI

### 1. Workflow Overview

This workflow, titled **"Image-Based Data Extraction API using Gemini AI"**, implements a no-code API endpoint designed to extract structured data from images using AI-powered OCR technology. It is tailored for scenarios requiring automated text extraction from various image types such as ID cards, invoices, receipts, business cards, and scanned documents.

The workflow is logically divided into the following blocks:

- **1.1 Input Reception:** Receives an HTTP GET request containing an image URL and extraction parameters.
- **1.2 Image Retrieval and Conversion:** Downloads the image from the provided URL and converts it into a base64-encoded format suitable for AI processing.
- **1.3 AI Processing with Gemini API:** Sends the base64 image and extraction instructions to the Gemini AI API (Flash Lite model) to perform OCR and extract requested fields.
- **1.4 Output Formatting and Response:** Parses the AI response to isolate the required data fields and returns the structured JSON result to the API caller.

---

### 2. Block-by-Block Analysis

#### 2.1 Input Reception

- **Overview:**
  This block exposes a webhook endpoint that accepts incoming API requests. It captures the image URL and extraction requirements from the request body.

- **Nodes Involved:**
  - Webhook

- **Node Details:**

  - **Webhook**
    - Type: Webhook (HTTP endpoint)
    - Configuration:
      - Path: `data-extractor`
      - HTTP Method: GET (implied by usage)
      - Response Mode: `responseNode` (response is sent by a downstream node)
    - Expressions/Variables: Accesses request body parameters such as `image_url`, `Requirement`, and `properties`.
    - Input: External HTTP request
    - Output: Passes request data downstream
    - Edge Cases:
      - Missing or invalid `image_url` parameter
      - Malformed JSON in request body
      - Unsupported HTTP methods
    - Notes: This node is the API entry point.

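The edge cases listed for the Webhook node can be checked before any downstream work is done. The following is a minimal Python sketch of such a check, not part of the workflow itself: the function name `validate_extraction_request` and the error messages are hypothetical, and the assumption that `properties` is a JSON object follows the field description used elsewhere in this document.

```python
from urllib.parse import urlparse

# Field names taken from the Webhook node description above.
REQUIRED_FIELDS = ("image_url", "Requirement", "properties")


def validate_extraction_request(body):
    """Return a list of problems found in a request body (empty if none).

    Hypothetical helper: the workflow itself performs no such validation.
    """
    if not isinstance(body, dict):
        return ["body must be a JSON object"]

    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in body
    ]

    url = body.get("image_url")
    if url is not None:
        parsed = urlparse(url) if isinstance(url, str) else None
        if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("image_url must be an absolute http(s) URL")

    if "properties" in body and not isinstance(body["properties"], dict):
        problems.append("properties must be a JSON object")

    return problems
```

A caller could reject the request with an HTTP 400 response whenever the returned list is non-empty, which would cover the missing or invalid `image_url` case before the image download is attempted.
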
#### 2.2 Image Retrieval and Conversion

- **Overview:**
  Downloads the image from the URL provided in the webhook request and converts the binary image data into a base64-encoded string for AI processing.

- **Nodes Involved:**
  - Get image from URL
  - Transform image to base64

- **Node Details:**

  - **Get image from URL**
    - Type: HTTP Request
    - Configuration:
      - URL: Dynamically set to `{{$json.body.image_url}}` from webhook input
      - Method: GET (default)
      - No additional options configured
    - Input: Receives webhook data
    - Output: Binary image data
    - Edge Cases:
      - Invalid or unreachable URL
      - Non-image content returned
      - Timeout or network errors

  - **Transform image to base64**
    - Type: Extract From File
    - Configuration:
      - Operation: `binaryToProperty` (converts binary data to a JSON property)
      - Destination Key: `data1`
      - Encoding: ASCII (base64 encoded string)
    - Input: Binary image data from previous node
    - Output: JSON with base64 string under `data1`
    - Edge Cases:
      - Binary data missing or corrupted
      - Encoding errors

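The conversion performed by this block can be sketched outside n8n in a few lines of Python. This is an illustration under stated assumptions rather than the node's actual implementation: `image_to_base64` and `sniff_mime_type` are hypothetical names, and the magic-byte check is an optional extra for the "non-image content returned" edge case (the workflow itself does not inspect the downloaded bytes).

```python
import base64

# Leading bytes ("magic numbers") of a few common image formats.
_MAGIC = (
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)


def sniff_mime_type(data):
    """Guess the image MIME type from the leading bytes, or return None."""
    for magic, mime in _MAGIC:
        if data.startswith(magic):
            return mime
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None


def image_to_base64(data):
    """Encode raw image bytes as an ASCII base64 string (the `data1` value)."""
    if not data:
        raise ValueError("no image data received")
    return base64.b64encode(data).decode("ascii")
```

For example, `image_to_base64(b"\xff\xd8\xff")` returns `"/9j/"`, the familiar prefix of base64-encoded JPEG data, while `sniff_mime_type(b"<html>")` returns `None` and could be used to reject an HTML error page served in place of an image.
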
#### 2.3 AI Processing with Gemini API

- **Overview:**
  Sends the base64-encoded image and extraction instructions to the Gemini AI API (Flash Lite model) to perform OCR and extract the requested structured data fields.

- **Nodes Involved:**
  - Call Gemini API (Flash Lite) with Image

- **Node Details:**

  - **Call Gemini API (Flash Lite) with Image**
    - Type: HTTP Request
    - Configuration:
      - URL: `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-lite:generateContent`
      - Method: POST
      - Body Type: JSON
      - Body Content:
        - `contents`: Array with two entries, both with role `user`:
          1. Inline image data with base64 string (`{{$json.data1}}`) and MIME type `image/jpeg`
          2. Text prompt `"check this"` (likely a placeholder or trigger)
        - `systemInstruction`: Text from the webhook input `Requirement` field
        - `generationConfig`: Parameters controlling AI output such as temperature, topK, topP, max output tokens, response MIME type (`application/json`), and a dynamic response schema derived from the webhook `properties` field
      - Authentication: Uses predefined Google Palm API credentials named "Gemini API Srinivasan Online"
    - Input: JSON with base64 image and instructions
    - Output: AI-generated JSON response containing extracted data candidates
    - Expressions/Variables:
      - Uses expressions to dynamically insert the base64 image and extraction requirements
      - Dynamically constructs the response schema from the input properties
    - Edge Cases:
      - API authentication failure
      - Rate limiting or quota exceeded
      - Invalid or malformed request body
      - AI model errors or timeouts
      - Unexpected response format
    - Version-specific: Requires an n8n version supporting HTTP Request node v4.2 and Google Palm API credentials
    - Notes: This is the core AI OCR processing step.

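The request body described above can be assembled programmatically. The sketch below mirrors the structure and generation parameters documented for this node (see the JSON body in section 4); the function name `build_gemini_body` and the `mime_type` parameter are hypothetical conveniences, since the workflow hardcodes `image/jpeg`.

```python
import json


def build_gemini_body(image_b64, requirement, properties, mime_type="image/jpeg"):
    """Build a generateContent request body shaped like the one this node sends.

    Hypothetical helper; parameter values follow the workflow's documented body.
    """
    return {
        "contents": [
            # First user entry: the inline base64 image.
            {
                "role": "user",
                "parts": [{"inlineData": {"data": image_b64, "mimeType": mime_type}}],
            },
            # Second user entry: the fixed text prompt used by the workflow.
            {"role": "user", "parts": [{"text": "check this"}]},
        ],
        # The caller's extraction instructions become the system instruction.
        "systemInstruction": {"role": "user", "parts": [{"text": requirement}]},
        "generationConfig": {
            "temperature": 1,
            "topK": 40,
            "topP": 0.95,
            "maxOutputTokens": 8192,
            "responseMimeType": "application/json",
            # The caller's `properties` object defines the fields to extract.
            "responseSchema": {"type": "object", "properties": properties},
        },
    }


if __name__ == "__main__":
    body = build_gemini_body("/9j/", "Extract the name", {"name": {"type": "string"}})
    print(json.dumps(body, indent=2))
```

Building the body as a native structure and serializing it once avoids the quoting problems that can arise when a `Requirement` string containing quotes or newlines is spliced directly into a JSON template.
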
#### 2.4 Output Formatting and Response

- **Overview:**
  Parses the AI response to extract the first candidate's content, converts it from stringified JSON to a JSON object, and sends it back as the API response.

- **Nodes Involved:**
  - Edit fields to output required data alone
  - Respond to Webhook

- **Node Details:**

  - **Edit fields to output required data alone**
    - Type: Set
    - Configuration:
      - Assigns a new field `result` with the value parsed from the first candidate's content text:
        `={{ $json.candidates[0].content.parts[0].text.parseJson() }}`
      - Removes all other fields by default (only `result` remains)
    - Input: AI JSON response
    - Output: JSON with a single `result` field containing structured extracted data
    - Edge Cases:
      - Missing or empty `candidates` array
      - Parsing errors if AI response text is not valid JSON

  - **Respond to Webhook**
    - Type: Respond to Webhook
    - Configuration:
      - Sends the output of the previous node as the HTTP response
    - Input: Final JSON data to return
    - Output: HTTP response to the original API caller
    - Edge Cases:
      - Network errors while sending response
      - Large payloads causing timeout

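The parsing step and its two edge cases can be illustrated with a small Python equivalent of the Set node expression. This is a sketch only: `extract_result` and `ExtractionError` are hypothetical names, and the sample response used in the usage note is invented for illustration.

```python
import json


class ExtractionError(Exception):
    """Raised when the AI response cannot be turned into a result object."""


def extract_result(response):
    """Mirror of `$json.candidates[0].content.parts[0].text.parseJson()`.

    Returns {"result": <parsed JSON>} or raises ExtractionError.
    """
    try:
        text = response["candidates"][0]["content"]["parts"][0]["text"]
    except (KeyError, IndexError, TypeError) as exc:
        # Covers a missing or empty `candidates` array and similar shape errors.
        raise ExtractionError("response has no candidate text") from exc

    try:
        return {"result": json.loads(text)}
    except (json.JSONDecodeError, TypeError) as exc:
        # Covers candidate text that is not valid JSON.
        raise ExtractionError("candidate text is not valid JSON") from exc
```

Given a response such as `{"candidates": [{"content": {"parts": [{"text": "{\"name\": \"A\"}"}]}}]}`, the function returns `{"result": {"name": "A"}}`; an empty `candidates` list raises `ExtractionError` instead of an unhandled index error.
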
---

### 3. Summary Table

| Node Name | Node Type | Functional Role | Input Node(s) | Output Node(s) | Sticky Note |
|-----------|-----------|-----------------|---------------|----------------|-------------|
| Webhook | Webhook | API entry point, receives request | External HTTP request | Get image from URL | See Sticky Note2 for detailed API endpoint description and use cases |
| Get image from URL | HTTP Request | Downloads image from provided URL | Webhook | Transform image to base64 | |
| Transform image to base64 | Extract From File | Converts binary image to base64 string | Get image from URL | Call Gemini API (Flash Lite) with Image | |
| Call Gemini API (Flash Lite) with Image | HTTP Request | Sends image and instructions to AI | Transform image to base64 | Edit fields to output required data alone | |
| Edit fields to output required data alone | Set | Parses AI response and extracts data | Call Gemini API (Flash Lite) with Image | Respond to Webhook | |
| Respond to Webhook | Respond to Webhook | Sends final JSON response to caller | Edit fields to output required data alone | External HTTP response | |
| Sticky Note | Sticky Note | Sample API call example | | | Contains cURL example for API usage |
| Sticky Note1 | Sticky Note | Sample output example | | | Shows example JSON output |
| Sticky Note2 | Sticky Note | Workflow overview and API description | | | Describes workflow purpose, use cases, and processing steps |

---

### 4. Reproducing the Workflow from Scratch

1. **Create Webhook Node**
   - Type: Webhook
   - Path: `data-extractor`
   - Response Mode: `responseNode`
   - Accepts GET requests with JSON body containing:
     - `image_url` (string)
     - `Requirement` (string)
     - `properties` (JSON schema object)

2. **Create HTTP Request Node: "Get image from URL"**
   - Type: HTTP Request
   - Method: GET
   - URL: Expression `{{$json.body.image_url}}` (from webhook input)
   - No authentication or special headers
   - Connect Webhook → Get image from URL

3. **Create Extract From File Node: "Transform image to base64"**
   - Type: Extract From File
   - Operation: `binaryToProperty`
   - Destination Key: `data1`
   - Encoding: ASCII (base64)
   - Connect Get image from URL → Transform image to base64

4. **Create HTTP Request Node: "Call Gemini API (Flash Lite) with Image"**
   - Type: HTTP Request
   - Method: POST
   - URL: `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-lite:generateContent`
   - Authentication: Use Google Palm API credentials (OAuth2 or API key)
   - Body Content (JSON):

     ```json
     {
       "contents": [
         {
           "role": "user",
           "parts": [
             {
               "inlineData": {
                 "data": "{{$json.data1}}",
                 "mimeType": "image/jpeg"
               }
             }
           ]
         },
         {
           "role": "user",
           "parts": [
             {
               "text": "check this"
             }
           ]
         }
       ],
       "systemInstruction": {
         "role": "user",
         "parts": [
           {
             "text": "{{ $('Webhook').first().json.body.Requirement }}"
           }
         ]
       },
       "generationConfig": {
         "temperature": 1,
         "topK": 40,
         "topP": 0.95,
         "maxOutputTokens": 8192,
         "responseMimeType": "application/json",
         "responseSchema": {
           "type": "object",
           "properties": {{ $('Webhook').first().json.body.properties.toJsonString() }}
         }
       }
     }
     ```

   - Connect Transform image to base64 → Call Gemini API (Flash Lite) with Image

5. **Create Set Node: "Edit fields to output required data alone"**
   - Type: Set
   - Remove all fields except one new field:
     - Name: `result`
     - Value (expression): `={{ $json.candidates[0].content.parts[0].text.parseJson() }}`
   - Connect Call Gemini API (Flash Lite) with Image → Edit fields to output required data alone

6. **Create Respond to Webhook Node**
   - Type: Respond to Webhook
   - Connect Edit fields to output required data alone → Respond to Webhook

7. **Add Sticky Notes (Optional for Documentation)**
   - Add a sticky note near the Webhook node with sample cURL API call (see Sticky Note content)
   - Add a sticky note near the Respond to Webhook node with sample JSON output

8. **Credentials Setup**
   - Configure Google Palm API credentials with valid API key or OAuth2 token for Gemini API access
   - Assign these credentials to the "Call Gemini API (Flash Lite) with Image" node

9. **Activate Workflow**
   - Save and activate the workflow
   - Test by sending a GET request with JSON body to `/webhook/data-extractor`

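The test request from the final step can be prepared with Python's standard library. This is a minimal sketch under assumptions: `build_webhook_request` is a hypothetical helper, the base URL and image URL in the usage note are placeholders, and the example field names are illustrative rather than taken from the workflow's sticky notes.

```python
import json
import urllib.request


def build_webhook_request(base_url, image_url, requirement, properties):
    """Prepare (but do not send) a GET request with a JSON body for the webhook.

    Hypothetical helper; `base_url` is the root URL of the n8n instance.
    """
    payload = {
        "image_url": image_url,
        "Requirement": requirement,
        "properties": properties,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/webhook/data-extractor",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


if __name__ == "__main__":
    # Placeholder values; replace with a real instance and image before sending.
    request = build_webhook_request(
        "https://n8n.example.com",
        "https://example.com/sample-card.jpg",
        "Extract the name and date of birth from this ID card.",
        {"name": {"type": "string"}, "date_of_birth": {"type": "string"}},
    )
    print(request.get_method(), request.full_url)
    # Passing `request` to urllib.request.urlopen would send it.
```

Because request bodies on GET have no defined semantics in HTTP, some clients and proxies may drop them; if the body does not arrive at the Webhook node, switching the node and the request to POST is one option to consider.
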
---

### 5. General Notes & Resources

| Note Content | Context or Link |
|--------------|-----------------|
| Sample API call using cURL: includes image URL, extraction requirement, and JSON schema for expected fields | See Sticky Note near Webhook node |
| Sample output JSON demonstrating extracted fields such as PAN Number, Name, Date of Birth, and Valid (boolean) | See Sticky Note near Respond to Webhook node |
| Workflow converts images to base64 before sending to AI, ensuring compatibility with Gemini API input format | Workflow design detail |
| Uses Gemini API Flash Lite model for OCR and structured data extraction | Requires Google Palm API credentials |
| Supports customizable extraction schema via `properties` JSON parameter in API request | Enables flexible field extraction |
| Suitable for integration with CRM, ERP, document management, and other automation systems | Use case guidance |
| For more information on Gemini API and Google Palm API, refer to official Google Cloud documentation | https://cloud.google.com/vertex-ai/docs/generative-ai/models/gemini |

---

This documentation provides a complete, structured understanding of the workflow, enabling reproduction, modification, and troubleshooting by advanced users and AI agents alike.