From d5648fa9f709347b038bb322f4bfa5846b7d0579 Mon Sep 17 00:00:00 2001 From: nusquama Date: Wed, 12 Nov 2025 20:02:46 +0100 Subject: [PATCH] creation --- .../readme-2649.md | 324 ++++++++++++++++++ 1 file changed, 324 insertions(+) create mode 100644 workflows/Prompt-based Object Detection with Gemini 2.0-2649/readme-2649.md diff --git a/workflows/Prompt-based Object Detection with Gemini 2.0-2649/readme-2649.md b/workflows/Prompt-based Object Detection with Gemini 2.0-2649/readme-2649.md new file mode 100644 index 000000000..c029b6609 --- /dev/null +++ b/workflows/Prompt-based Object Detection with Gemini 2.0-2649/readme-2649.md @@ -0,0 +1,324 @@ +Prompt-based Object Detection with Gemini 2.0 + +https://n8nworkflows.xyz/workflows/prompt-based-object-detection-with-gemini-2-0-2649 + + +# Prompt-based Object Detection with Gemini 2.0 + +### 1. Workflow Overview + +This workflow demonstrates how to leverage Google Gemini 2.0's experimental prompt-based bounding box object detection capabilities within an n8n automation. The core use case is to identify specific objects or subjects in an image based on a natural language prompt (e.g., "all bunnies"), retrieve their bounding box coordinates, and visually mark these detected objects on the original image. 
+ +The logical flow is divided into these main blocks: + +1.1 **Input Reception & Image Preparation** +- Download the test image from a URL +- Extract image metadata (width and height) needed for coordinate scaling + +1.2 **AI Processing with Gemini 2.0** +- Submit the image and prompt to Gemini 2.0 API for bounding box detection +- Receive normalized bounding box coordinates in JSON format + +1.3 **Coordinate Scaling and Processing** +- Parse and extract the bounding box data +- Rescale normalized coordinates (0 to 1000 scale) to actual pixel values based on image dimensions + +1.4 **Visualization of Results** +- Draw bounding boxes onto the original image using the scaled coordinates +- Output the annotated image for validation or further use + +--- + +### 2. Block-by-Block Analysis + +#### 1.1 Input Reception & Image Preparation + +- **Overview:** + This block handles fetching the target image and extracting its size details, which are essential for accurately mapping the bounding box coordinates returned by the AI. 
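Outside of n8n, the metadata step above can be reproduced in a few lines. As a purely illustrative sketch (assuming a PNG input, whose IHDR chunk stores the dimensions at fixed byte offsets; the demo image is actually a JPEG, which the Edit Image node handles for you):

```javascript
// Illustrative only (not part of the workflow): reading width/height from a
// PNG buffer, analogous to what the "Get Image Info" node extracts.
// PNG stores dimensions as big-endian 32-bit ints at byte offsets 16 and 20.
function pngDimensions(buf) {
  const signature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  if (!buf.subarray(0, 8).equals(signature)) {
    throw new Error("not a PNG buffer");
  }
  return { width: buf.readUInt32BE(16), height: buf.readUInt32BE(20) };
}
```

Whatever tool extracts them, the width and height must travel alongside the image, because the coordinate scaling in block 1.3 cannot work without them.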
+ +- **Nodes Involved:** + - When clicking ‘Test workflow’ (Manual Trigger) + - Get Test Image (HTTP Request) + - Get Image Info (Edit Image) + +- **Node Details:** + + - **When clicking ‘Test workflow’** + - Type: Manual Trigger + - Role: Entry point for manual testing of the workflow + - Config: No parameters; triggers workflow on demand + - Input: None + - Output: Triggers "Get Test Image" + - Edge Cases: None typical; manual trigger ensures controlled start + + - **Get Test Image** + - Type: HTTP Request + - Role: Downloads the image from a fixed URL + - Config: URL set to a petting zoo image with rabbits; no authentication or special headers + - Input: Trigger from manual node + - Output: Binary image data passed to "Get Image Info" + - Edge Cases: + - Network errors or image URL unavailability + - Unsupported image format causing failure downstream + - Large image size could affect performance + + - **Get Image Info** + - Type: Edit Image (Information extraction) + - Role: Extracts metadata (width and height) from the binary image + - Config: Operation set to "information" to retrieve image size metadata + - Input: Binary image from HTTP Request + - Output: JSON containing image width and height, forwarded to Gemini API node + - Edge Cases: + - Corrupted or unsupported image format causing failure + - Missing metadata field errors + +--- + +#### 1.2 AI Processing with Gemini 2.0 + +- **Overview:** + This block sends the image and a text prompt to Google Gemini 2.0 API to detect bounding boxes of objects matching the prompt ("all bunnies" in this demo). It receives normalized coordinates as JSON. 
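Because the request's response schema constrains the output, the detection results come back as JSON text inside the first candidate's first content part. A minimal sketch of that payload and its parsing (the values here are invented for illustration, not real API output):

```javascript
// Hypothetical Gemini response body (box values are made up for illustration).
// The response_schema asks for an array of { box_2d, label }, where box_2d is
// [ymin, xmin, ymax, xmax] on a 0-1000 normalized scale.
const response = {
  candidates: [{
    content: {
      parts: [{ text: '[{"box_2d": [100, 200, 300, 400], "label": "bunny"}]' }]
    }
  }]
};

// Plain-JS equivalent of the Set node's parseJson() expression:
const coords = JSON.parse(response.candidates[0].content.parts[0].text);
console.log(coords[0].label); // "bunny"
```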
+ +- **Nodes Involved:** + - Gemini 2.0 Object Detection + - Get Variables (Set node, prepares extracted data) + +- **Node Details:** + + - **Gemini 2.0 Object Detection** + - Type: HTTP Request + - Role: Sends POST request to Google Gemini 2.0 model API for bounding box detection + - Config: + - URL: Gemini 2.0 API endpoint for generating content with bounding box detection + - Method: POST + - Body: JSON containing prompt text and inline image data (base64-encoded) + - Authentication: Uses predefined Google Palm API credentials + - Response schema expects an array of objects, each with `box_2d` (array of coordinates) and `label` (string) + - Input: Binary image from "Get Image Info" node + - Output: JSON with detected bounding box data passed to "Get Variables" node + - Edge Cases: + - Authentication failures due to invalid or expired credentials + - API rate limits or timeouts + - Malformed or unexpected response payloads + - Gemini 2.0 model being experimental may produce incomplete or inaccurate detections + + - **Get Variables** + - Type: Set + - Role: Extracts and parses the bounding box data and image dimensions for further processing + - Config: + - Uses expressions to parse the first candidate’s content text as JSON, extracting bounding box `coords` + - Retrieves image `width` and `height` from prior node's JSON metadata + - Input: JSON response from Gemini node and image info data + - Output: Structured JSON with bounding box array and image dimensions forwarded to scaling code node + - Edge Cases: + - Failure if parsing JSON fails (e.g., API response format changes) + - Missing or null data in expected fields + +--- + +#### 1.3 Coordinate Scaling and Processing + +- **Overview:** + This block rescales the normalized bounding box coordinates (0-1000 range) to pixel coordinates matching the original image dimensions to allow precise drawing. 
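The rescaling described above can be sketched as a standalone function (an illustration of the formulas used in the Code node, assuming `box_2d` is ordered `[ymin, xmin, ymax, xmax]` as in the Gemini docs):

```javascript
// Convert one normalized box (0-1000 range) to pixel coordinates for an
// image of the given width and height.
const SCALE = 1000;
function toPixels(box2d, width, height) {
  const [ymin, xmin, ymax, xmax] = box2d;
  return {
    xmin: (xmin * width) / SCALE,
    xmax: (xmax * width) / SCALE,
    ymin: (ymin * height) / SCALE,
    ymax: (ymax * height) / SCALE,
  };
}

// e.g. for a 1280x960 image:
// toPixels([100, 200, 300, 400], 1280, 960)
//   → { xmin: 256, xmax: 512, ymin: 96, ymax: 288 }
```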
+ +- **Nodes Involved:** + - Scale Normalised Coords (Code node) + +- **Node Details:** + + - **Scale Normalised Coords** + - Type: Code (JavaScript) + - Role: Converts normalized bounding boxes into pixel coordinates + - Config: + - Extracts `coords`, `width`, and `height` from input JSON + - Applies scaling formulas: + - X coordinates scaled by `(val * width) / 1000` + - Y coordinates scaled by `(val * height) / 1000` + - Filters out invalid boxes (must have 4 coordinates) + - Outputs scaled pixel-coordinate boxes with properties `xmin`, `xmax`, `ymin`, `ymax` + - Passes along the original binary image for annotation + - Input: Parsed coordinates and image dimension data from "Get Variables" + - Output: JSON with scaled bounding boxes and original binary image forwarded to drawing node + - Edge Cases: + - Coordinates missing or malformed (e.g., wrong array length) + - NaN coordinates if `width` or `height` is missing or non-numeric (the values are multiplied, so bad dimensions propagate silently rather than throwing) + - Index errors if bounding box arrays are incorrectly structured + +--- + +#### 1.4 Visualization of Results + +- **Overview:** + This block draws the scaled bounding boxes onto the original image using the "Edit Image" node to visually confirm the detected objects. 
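Because the drawing node's expressions reference fixed array indices, one hedged workaround for the "fewer than 6 boxes" edge case noted in the node details below is to pad the coords array before drawing. This guard is a suggestion, not part of the original workflow:

```javascript
// Suggested guard (not in the original workflow): pad the scaled coords array
// to exactly MAX_BOXES entries with zero-size boxes, so fixed expressions like
// $json.coords[5].xmin never hit an undefined index. A zero-size box degrades
// to a single point at the origin at worst.
const MAX_BOXES = 6;
function padBoxes(coords) {
  const empty = { xmin: 0, xmax: 0, ymin: 0, ymax: 0 };
  const padded = coords.slice(0, MAX_BOXES);
  while (padded.length < MAX_BOXES) padded.push({ ...empty });
  return padded;
}
```

The same helper also truncates any extra detections beyond six, keeping the input in step with the fixed number of drawing operations.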
+ +- **Nodes Involved:** + - Draw Bounding Boxes (Edit Image) + +- **Node Details:** + + - **Draw Bounding Boxes** + - Type: Edit Image (Draw operations) + - Role: Annotates the original image by drawing rectangles for each detected bounding box + - Config: + - Multi-step drawing operation configured with a fixed set of up to 6 bounding boxes + - Each box uses the scaled xmin, ymin, xmax, ymax values for start and end positions + - Color set to a semi-transparent pink (#ff00f277) for visibility + - Corner radius fixed to 0 (sharp corners) + - Coordinates accessed via JSON expressions referencing the scaled coords array + - Input: JSON with scaled bounding boxes and binary image from the code node + - Output: Modified image with drawn bounding boxes (binary data) + - Edge Cases: + - If fewer than 6 bounding boxes are returned, references to non-existent array indices will cause errors + - Coordinates out of image bounds may cause drawing issues + - Image format incompatibility or large image size may impact performance + +--- + +### 3. Summary Table + +| Node Name | Node Type | Functional Role | Input Node(s) | Output Node(s) | Sticky Note | |---------------------------|--------------------|------------------------------------------------|---------------------------|--------------------------|------------------------------------------------------------------------------------------------------------------------------------------------| | When clicking ‘Test workflow’ | Manual Trigger | Entry point to start the workflow manually | - | Get Test Image | Try it out! This n8n template demonstrates how to use Gemini 2.0's new Bounding Box detection capabilities in your workflows. The key difference is that this enables prompt-based object detection for images... [Discord](https://discord.com/invite/XPKeKXeB7d) [Forum](https://community.n8n.io/) | | Get Test Image | HTTP Request | Downloads test image from a fixed URL | When clicking ‘Test workflow’ | Get Image Info | ## 1. 
Download Test Image [HTTP node docs](https://docs.n8n.io/integrations/builtin/core-nodes/n8n-nodes-base.httprequest) [Image format docs](https://ai.google.dev/gemini-api/docs/vision?lang=rest#technical-details-image) | +| Get Image Info | Edit Image | Extracts image width and height | Get Test Image | Gemini 2.0 Object Detection | | +| Gemini 2.0 Object Detection | HTTP Request | Calls Gemini API to detect bounding boxes | Get Image Info | Get Variables | ## 2. Use Prompt-Based Object Detection [HTTP node docs](https://docs.n8n.io/integrations/builtin/core-nodes/n8n-nodes-base.httprequest) [Previous object detection example](https://n8n.io/workflows/2331-build-your-own-image-search-using-ai-object-detection-cdn-and-elasticsearch/) | +| Get Variables | Set | Parses Gemini response and extracts variables | Gemini 2.0 Object Detection | Scale Normalised Coords | | +| Scale Normalised Coords | Code | Scales normalized bounding box coordinates | Get Variables | Draw Bounding Boxes | ## 3. Scale Coords to Fit Original Image [Code node docs](https://docs.n8n.io/integrations/builtin/core-nodes/n8n-nodes-base.code/) [Bounding box explanation](https://ai.google.dev/gemini-api/docs/models/gemini-v2?_gl=1*187cb6v*_up*MQ..*_ga*MTU1ODkzMDc0Mi4xNzM0NDM0NDg2*_ga_P1DBVKWT6V*MTczNDQzNDQ4Ni4xLjAuMTczNDQzNDQ4Ni4wLjAuMjEzNzc5MjU0Ng..#bounding-box) | +| Draw Bounding Boxes | Edit Image | Draws bounding boxes on the original image | Scale Normalised Coords | - | ## 4. Draw! [Edit Image node docs](https://docs.n8n.io/integrations/builtin/core-nodes/n8n-nodes-base.editimage/) | +| Sticky Note | Sticky Note | Image example of output bounding box detection | - | - | Fig 1. Output of Object Detection ![](https://res.cloudinary.com/daglih2g8/image/upload/f_auto,q_auto/v1/n8n-workflows/download_1_qmqyyo#full-width) | +| Sticky Note1 | Sticky Note | Description of downloading the test image | - | - | ## 1. 
Download Test Image [HTTP node docs](https://docs.n8n.io/integrations/builtin/core-nodes/n8n-nodes-base.httprequest) | +| Sticky Note2 | Sticky Note | Explanation of prompt-based object detection | - | - | ## 2. Use Prompt-Based Object Detection [HTTP node docs](https://docs.n8n.io/integrations/builtin/core-nodes/n8n-nodes-base.httprequest) | +| Sticky Note3 | Sticky Note | Explains coordinate scaling logic | - | - | ## 3. Scale Coords to Fit Original Image [Code node docs](https://docs.n8n.io/integrations/builtin/core-nodes/n8n-nodes-base.code/) | +| Sticky Note4 | Sticky Note | Notes on why not using Basic LLM node | - | - | ### Q. Why not use the Basic LLM node? At time of writing, the Langchain version does not recognise Gemini 2.0 as a multimodal model. | +| Sticky Note5 | Sticky Note | Notes on drawing bounding boxes | - | - | ## 4. Draw! [Edit Image node docs](https://docs.n8n.io/integrations/builtin/core-nodes/n8n-nodes-base.editimage/) | +| Sticky Note6 | Sticky Note | Overall workflow description and help links | - | - | Try it out! This n8n template demonstrates how to use Gemini 2.0's new Bounding Box detection capabilities in your workflows. [Discord](https://discord.com/invite/XPKeKXeB7d) [Forum](https://community.n8n.io/) | + +--- + +### 4. Reproducing the Workflow from Scratch + +1. **Create Manual Trigger Node** + - Name: `When clicking ‘Test workflow’` + - Purpose: To start the workflow manually + +2. **Create HTTP Request Node to Download Image** + - Name: `Get Test Image` + - URL: `https://www.stonhambarns.co.uk/wp-content/uploads/jennys-ark-petting-zoo-for-website-6.jpg` + - Method: GET + - Connect output of manual trigger to this node + +3. **Create Edit Image Node to Get Image Info** + - Name: `Get Image Info` + - Operation: `information` (to extract image metadata) + - Input: Connect from `Get Test Image` node's output + - This node outputs JSON with image dimensions + +4. 
**Create HTTP Request Node for Gemini 2.0 API Call** + - Name: `Gemini 2.0 Object Detection` + - URL: `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent` + - Method: POST + - Authentication: Set to Google Palm API credential (OAuth2 or API key as configured) + - Body Type: JSON + - JSON Body: + ```json + { + "contents": [{ + "parts": [ + {"text": "I want to see all bounding boxes of rabbits in this image."}, + { + "inline_data": { + "mime_type": "image/jpeg", + "data": "={{ $input.item.binary.data.data }}" + } + } + ] + }], + "generationConfig": { + "response_mime_type": "application/json", + "response_schema": { + "type": "ARRAY", + "items": { + "type": "OBJECT", + "properties": { + "box_2d": {"type": "ARRAY", "items": {"type": "NUMBER"}}, + "label": {"type": "STRING"} + } + } + } + } + } + ``` + - Connect output of `Get Image Info` node as input (image binary passed along) + +5. **Create Set Node to Extract Variables** + - Name: `Get Variables` + - Purpose: Parse Gemini response and extract coords, width, height + - Assignments: + - `coords` (array): `={{ $json.candidates[0].content.parts[0].text.parseJson() }}` + - `width` (string): `={{ $('Get Image Info').item.json.size.width }}` + - `height` (string): `={{ $('Get Image Info').item.json.size.height }}` + - Connect from `Gemini 2.0 Object Detection` output + +6. **Create Code Node to Scale Coordinates** + - Name: `Scale Normalised Coords` + - Language: JavaScript + - Code: + ```js + const { coords, width, height } = $input.first().json; + const scale = 1000; + const scaleCoordX = (val) => (val * width) / scale; + const scaleCoordY = (val) => (val * height) / scale; + + const normalisedOutput = coords + .filter(coord => coord.box_2d.length === 4) + .map(coord => { + return { + xmin: coord.box_2d[1] ? scaleCoordX(coord.box_2d[1]) : coord.box_2d[1], + xmax: coord.box_2d[3] ? scaleCoordX(coord.box_2d[3]) : coord.box_2d[3], + ymin: coord.box_2d[0] ? 
scaleCoordY(coord.box_2d[0]) : coord.box_2d[0], + ymax: coord.box_2d[2] ? scaleCoordY(coord.box_2d[2]) : coord.box_2d[2], + }; + }); + + return { + json: { coords: normalisedOutput }, + binary: $('Get Test Image').first().binary + }; + ``` + - Connect from `Get Variables` output + +7. **Create Edit Image Node to Draw Bounding Boxes** + - Name: `Draw Bounding Boxes` + - Operation: `multiStep` + - Configure drawing operations for each bounding box (up to 6), with: + - StartPositionX: `={{ $json.coords[index].xmin }}` + - StartPositionY: `={{ $json.coords[index].ymin }}` + - EndPositionX: `={{ $json.coords[index].xmax }}` + - EndPositionY: `={{ $json.coords[index].ymax }}` + - Color: `#ff00f277` (semi-transparent pink) + - CornerRadius: `0` + - Connect from `Scale Normalised Coords` output + +8. **Connect all nodes in sequence:** + `When clicking ‘Test workflow’` → `Get Test Image` → `Get Image Info` → `Gemini 2.0 Object Detection` → `Get Variables` → `Scale Normalised Coords` → `Draw Bounding Boxes` + +9. **Configure Credentials:** + - Set up Google Palm API credentials (OAuth2 or API key) under n8n Credentials for the Gemini API node + +10. **Optional:** Add Sticky Notes for documentation and reference inside the editor as per original workflow + +--- + +### 5. General Notes & Resources + +| Note Content | Context or Link | +|-------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------| +| This workflow is an experimental demonstration of Google Gemini 2.0's bounding box detection capabilities. | Workflow description | +| Gemini 2.0 bounding box coordinates are normalized to a 0–1000 scale and must be rescaled to image pixel dimensions. 
| [Gemini 2.0 bounding box docs](https://ai.google.dev/gemini-api/docs/models/gemini-v2?_gl=1*187cb6v*_up*MQ..#bounding-box) | +| The workflow uses an experimental API and may produce incomplete or inaccurate detections; not recommended for production. | Template warning | +| Join the n8n community Discord or Forum for support and discussion. | [Discord](https://discord.com/invite/XPKeKXeB7d), [Forum](https://community.n8n.io/) | +| Previous related n8n example using ResNet for object detection available for reference. | [Image Search Workflow](https://n8n.io/workflows/2331-build-your-own-image-search-using-ai-object-detection-cdn-and-elasticsearch/) | + +--- + +This documentation provides a comprehensive, structured understanding of the "Prompt-based Object Detection with Gemini 2.0" workflow, facilitating effective use, modification, and troubleshooting. \ No newline at end of file