This commit is contained in:
nusquama
2025-11-13 13:29:26 +01:00
parent 6d736de960
commit 19c96887ba

View File

@@ -0,0 +1,356 @@
Chat with Documents via RAG: Google Drive to GPT-5 with Supabase Vector Database
https://n8nworkflows.xyz/workflows/chat-with-documents-via-rag--google-drive-to-gpt-5-with-supabase-vector-database-7308
# Chat with Documents via RAG: Google Drive to GPT-5 with Supabase Vector Database
### 1. Workflow Overview
This n8n workflow implements a Retrieval-Augmented Generation (RAG) chatbot solution enabling users to interact with documents stored on Google Drive via an advanced GPT-5 language model, with vector similarity search powered by Supabase. It is designed for use cases where users need AI-driven conversational access to documents, including scanned files requiring OCR, and supports multi-channel chat triggers (Slack, Telegram, Gmail, WhatsApp).
The workflow is organized into the following logical blocks:
- **1.1 Document Ingestion & Processing:** Detects new Google Drive files, downloads, optionally OCRs, splits text, generates embeddings, and stores them in Supabase vector database.
- **1.2 Vector Store & Embeddings Setup:** Manages embeddings generation and vector store interactions for document indexing and retrieval.
- **1.3 Chat Trigger & Preprocessing:** Listens for incoming chat messages from multiple platforms, standardizes input, and prepares context.
- **1.4 RAG Agent & Language Model Processing:** Executes the RAG agent that queries Supabase with embeddings, integrates chat memory, and leverages GPT-5 for response generation.
- **1.5 Response Delivery:** Sends generated answers back to users via the appropriate messaging platform.
---
### 2. Block-by-Block Analysis
#### 2.1 Document Ingestion & Processing
**Overview:**
This block listens for new files created in Google Drive, downloads them, performs OCR if needed, splits the text into manageable chunks, generates embeddings, and stores data in the Supabase vector database.
**Nodes Involved:**
- Google Drive Trigger: `Create file`
- Google Drive: `Download File4`
- OCR: `OCR`
- Split Output: `Split Out1`
- Text Splitter: `Character Text Splitter4`
- Default Data Loader: `Default Data Loader4`
- Embeddings OpenAI: `Embeddings OpenAI8`
- Supabase Vector Store Insert: `Insert into Supabase Vectorstore3`
**Node Details:**
- **Create file**
- *Type:* Google Drive Trigger
- *Role:* Detects newly created files in Google Drive to initiate ingestion
- *Config:* Default trigger, no filters specified
- *Connections:* Outputs to `Download File4`
- *Potential failures:* API quota limits, network errors
- **Download File4**
- *Type:* Google Drive (Download)
- *Role:* Downloads the file detected by trigger
- *Config:* Uses file ID from trigger output
- *Connections:* Outputs to `OCR`
- *Edge cases:* Unsupported file types, download failures
- **OCR**
- *Type:* Mistral AI (OCR)
- *Role:* Converts image or scanned PDFs to text
- *Config:* Default OCR parameters
- *Connections:* Outputs to `Split Out1`
- *Edge cases:* Poor image quality causing inaccurate OCR
- **Split Out1**
- *Type:* Split Out
- *Role:* Flattens multi-part OCR output for downstream processing
- *Connections:* Outputs to `Insert into Supabase Vectorstore3`
- **Character Text Splitter4**
- *Type:* Langchain Character Text Splitter
- *Role:* Splits large texts into smaller chunks for embedding generation
- *Config:* Uses character-based splitting strategy (default parameters)
- *Connections:* Sends chunks to `Default Data Loader4`
- **Default Data Loader4**
- *Type:* Langchain Default Data Loader
- *Role:* Prepares chunked text for embedding
- *Connections:* Outputs to `Insert into Supabase Vectorstore3`
- **Embeddings OpenAI8**
- *Type:* Langchain OpenAI Embeddings
- *Role:* Generates vector embeddings for document chunks
- *Config:* Uses OpenAI API with configured credentials
- *Connections:* Outputs to `Insert into Supabase Vectorstore3`
- *Edge cases:* API rate limits, invalid credentials
- **Insert into Supabase Vectorstore3**
- *Type:* Langchain Supabase Vector Store
- *Role:* Inserts document embeddings and metadata into Supabase vector database
- *Config:* Supabase credentials configured
- *Input:* Receives embeddings and document metadata
- *Output:* None (terminal for ingestion branch)
- *Failures:* Network errors, authentication failures, data schema mismatches
---
#### 2.2 Vector Store & Embeddings Setup
**Overview:**
This block sets up the vector store and embeddings for query-time retrieval and interacts with the RAG agent.
**Nodes Involved:**
- Supabase Vector Store: `Supabase Vector Store`
- Embeddings OpenAI: `Embeddings OpenAI6`
**Node Details:**
- **Embeddings OpenAI6**
- *Type:* Langchain OpenAI Embeddings
- *Role:* Generates embeddings for user queries or documents during retrieval
- *Config:* OpenAI API key usage
- *Connections:* Outputs to `Supabase Vector Store`
- **Supabase Vector Store**
- *Type:* Langchain Supabase Vector Store
- *Role:* Provides vector similarity search capabilities from Supabase
- *Config:* Supabase project URL and API key
- *Connections:* Feeds into the `RAG Agent` node for retrieval
- *Failures:* Query timeouts, incorrect vector dimensions
---
#### 2.3 Chat Trigger & Preprocessing
**Overview:**
This block receives chat messages from various messaging platforms, normalizes input, and passes it to the RAG agent.
**Nodes Involved:**
- Langchain Chat Trigger: `When chat message received`
- Slack Trigger: `Slack Trigger`
- Telegram Trigger: `Telegram Trigger`
- Gmail Trigger: `Gmail Trigger`
- WhatsApp Trigger: `WhatsApp Trigger` (disabled)
- Edit Fields: `Edit Fields`
**Node Details:**
- **When chat message received**
- *Type:* Langchain Chat Trigger
- *Role:* Listens for incoming chat messages (main webhook entrypoint)
- *Config:* Webhook ID specified
- *Connections:* Outputs to `RAG Agent`
- *Failures:* Webhook connectivity, invalid message formats
- **Slack Trigger, Telegram Trigger, Gmail Trigger, WhatsApp Trigger**
- *Type:* Messaging platform specific triggers
- *Role:* Provides triggers from respective platforms to capture chat inputs
- *Config:* Webhook IDs set for each platform; WhatsApp trigger disabled
- *Connections:* All output to `Edit Fields` node for normalization
- *Failures:* Auth token expiry, webhook misconfiguration
- **Edit Fields**
- *Type:* Set (Data Editing)
- *Role:* Normalizes incoming message content and metadata before feeding to RAG agent
- *Connections:* Outputs to `RAG Agent`
---
#### 2.4 RAG Agent & Language Model Processing
**Overview:**
Core AI logic combining chat memory, vector store retrieval, and GPT-5 language model to generate contextually relevant answers.
**Nodes Involved:**
- RAG Agent: `RAG Agent`
- Chat Memory: `Chat Memory`
- GPT 5: `GPT 5`
**Node Details:**
- **RAG Agent**
- *Type:* Langchain Agent
- *Role:* Orchestrates retrieval of relevant document chunks from Supabase and interacts with GPT-5 model
- *Inputs:* Receives chat input (from triggers), memory (from `Chat Memory`), and vector store tool (from `Supabase Vector Store`)
- *Outputs:* Generated answer routed to response delivery nodes
- *Failures:* Query failures, model API limits, memory access errors
- **Chat Memory**
- *Type:* Langchain Postgres Chat Memory
- *Role:* Stores and retrieves conversational context to maintain dialogue coherence
- *Config:* Connects to a Postgres database for persistent memory
- *Connections:* Feeds memory context to `RAG Agent`
- **GPT 5**
- *Type:* Langchain OpenAI Chat Model
- *Role:* Language model generating natural language responses
- *Config:* Uses GPT-5 via OpenAI API with proper credentials
- *Connections:* Connected as language model for `RAG Agent`
- *Failures:* API rate limits, malformed input, timeout
---
#### 2.5 Response Delivery
**Overview:**
Delivers the AI-generated responses back to users via their respective messaging channels.
**Nodes Involved:**
- Telegram: `Send a text message`
- WhatsApp: `Send message` (disabled)
- Slack: `Send a message`
- Gmail: `Send a message1`
**Node Details:**
- **Send a text message (Telegram)**
- *Type:* Telegram Node
- *Role:* Sends text reply to Telegram users
- *Config:* Requires Telegram bot credentials
- *Input:* Receives RAG Agent output
- *Failures:* Bot authentication failures, message formatting errors
- **Send message (WhatsApp)**
- *Type:* WhatsApp Node
- *Role:* Intended for WhatsApp message delivery (currently disabled)
- *Failures:* WhatsApp API restrictions, disabled node
- **Send a message (Slack)**
- *Type:* Slack Node
- *Role:* Sends response message to Slack channel or user
- *Config:* Slack OAuth credentials configured
- *Input:* Receives RAG Agent output
- **Send a message1 (Gmail)**
- *Type:* Gmail Node
- *Role:* Sends email responses via Gmail
- *Config:* OAuth2 credentials for Gmail configured
- *Input:* Receives RAG Agent output
---
### 3. Summary Table
| Node Name | Node Type | Functional Role | Input Node(s) | Output Node(s) | Sticky Note |
|-----------------------------|-------------------------------------------|----------------------------------------|-------------------------------|-----------------------------------------------|------------------------------------------|
| Create file | Google Drive Trigger | Detect new Google Drive files | None | Download File4 | |
| Download File4 | Google Drive | Download detected file | Create file | OCR | |
| OCR | Mistral AI | Perform OCR on downloaded files | Download File4 | Split Out1 | |
| Split Out1 | Split Out | Flatten OCR output | OCR | Insert into Supabase Vectorstore3 | |
| Character Text Splitter4 | Langchain Text Splitter | Split text into chunks | None (implied with Default Data Loader4) | Default Data Loader4 | |
| Default Data Loader4 | Langchain Default Data Loader | Prepare documents for embedding | Character Text Splitter4 | Insert into Supabase Vectorstore3 | |
| Embeddings OpenAI8 | Langchain OpenAI Embeddings | Generate embeddings for document chunks | Default Data Loader4 | Insert into Supabase Vectorstore3 | |
| Insert into Supabase Vectorstore3 | Langchain Supabase Vector Store | Store embeddings in Supabase vector DB | Split Out1, Embeddings OpenAI8, Default Data Loader4 | None | |
| Supabase Vector Store | Langchain Supabase Vector Store | Vector store for RAG retrieval | Embeddings OpenAI6 | RAG Agent | |
| Embeddings OpenAI6 | Langchain OpenAI Embeddings | Generate query embeddings | None | Supabase Vector Store | |
| When chat message received | Langchain Chat Trigger | Entry webhook for chat messages | None | RAG Agent | |
| Slack Trigger | Slack Trigger | Trigger on Slack messages | None | Edit Fields | |
| Telegram Trigger | Telegram Trigger | Trigger on Telegram messages | None | Edit Fields | |
| Gmail Trigger | Gmail Trigger | Trigger on Gmail messages | None | Edit Fields | |
| WhatsApp Trigger | WhatsApp Trigger (disabled) | Trigger on WhatsApp messages | None | Edit Fields | Disabled node |
| Edit Fields | Set | Normalize incoming chat fields | Slack Trigger, Telegram Trigger, Gmail Trigger, WhatsApp Trigger | RAG Agent | |
| Chat Memory | Langchain Postgres Chat Memory | Store and retrieve chat conversation context | None | RAG Agent | |
| RAG Agent | Langchain Agent | Retrieve relevant docs and generate AI response | Edit Fields, Chat Memory, Supabase Vector Store, GPT 5 | Send a text message, Send message, Send a message, Send a message1 | |
| GPT 5 | Langchain OpenAI Chat Model | Generate natural language responses | RAG Agent (ai_languageModel) | RAG Agent | |
| Send a text message | Telegram | Send response to Telegram users | RAG Agent | None | |
| Send message | WhatsApp (disabled) | Send response to WhatsApp users | RAG Agent | None | Disabled node |
| Send a message | Slack | Send response to Slack channels/users | RAG Agent | None | |
| Send a message1 | Gmail | Send response emails | RAG Agent | None | |
| Sticky Note | Sticky Note | Visual notes (empty content) | None | None | Multiple sticky notes with empty content |
| Sticky Note1 | Sticky Note | Visual notes (empty content) | None | None | |
| Sticky Note2 | Sticky Note | Visual notes (empty content) | None | None | |
| Sticky Note3 | Sticky Note | Visual notes (empty content) | None | None | |
| Sticky Note4 | Sticky Note | Visual notes (empty content) | None | None | |
| Sticky Note5 | Sticky Note | Visual notes (empty content) | None | None | |
| Sticky Note6 | Sticky Note | Visual notes (empty content) | None | None | |
---
### 4. Reproducing the Workflow from Scratch
1. **Create a Google Drive Trigger node (`Create file`):**
- Configure it to trigger on new file creation events in your Google Drive account.
2. **Add a Google Drive node (`Download File4`):**
- Connect input from `Create file`.
- Configure to download the file using the file ID from the trigger.
3. **Add an OCR node (`OCR`):**
- Connect input from `Download File4`.
- Configure with Mistral AI or your preferred OCR service to extract text from downloaded files.
4. **Add a Split Out node (`Split Out1`):**
- Connect input from `OCR`.
- This node flattens OCR output arrays for further processing.
5. **Add a Character Text Splitter node (`Character Text Splitter4`):**
- Connect logically to the document processing flow (after OCR or download).
- Configure to split long text into smaller chunks (default settings suffice).
6. **Add a Default Data Loader node (`Default Data Loader4`):**
- Connect input from `Character Text Splitter4`.
- Prepares document chunks for embeddings.
7. **Add an Embeddings OpenAI node (`Embeddings OpenAI8`):**
- Connect input from `Default Data Loader4`.
- Set API key credentials for OpenAI. Use embedding model (e.g., text-embedding-ada-002).
8. **Add a Supabase Vector Store node for insertion (`Insert into Supabase Vectorstore3`):**
- Connect inputs from `Split Out1`, `Embeddings OpenAI8`, and `Default Data Loader4`.
- Configure with your Supabase project URL and API key for vector store insertion.
9. **Add an Embeddings OpenAI node for queries (`Embeddings OpenAI6`):**
- Configure similarly to the ingestion embeddings node.
- Connect output to the Supabase Vector Store retrieval node.
10. **Add a Supabase Vector Store node for retrieval (`Supabase Vector Store`):**
- Connect input from `Embeddings OpenAI6`.
- Configure with Supabase credentials, set as vector search tool.
11. **Add a Langchain Chat Trigger node (`When chat message received`):**
- Configure webhook for incoming chat messages.
12. **Add platform-specific trigger nodes:**
- Slack Trigger, Telegram Trigger, Gmail Trigger (enable as needed).
- Configure respective webhook IDs and credentials.
13. **Add an Edit Fields node (`Edit Fields`):**
- Connect inputs from all platform triggers.
- Normalize and standardize incoming message data before feeding to RAG agent.
14. **Add a Langchain Postgres Chat Memory node (`Chat Memory`):**
- Configure connection to a Postgres database for chat context persistence.
15. **Add a GPT-5 Langchain Chat Model node (`GPT 5`):**
- Configure with OpenAI GPT-5 credentials and parameters.
16. **Add a Langchain Agent node (`RAG Agent`):**
- Connect inputs from `Edit Fields` (main input), `Chat Memory` (memory), `Supabase Vector Store` (tool), and `GPT 5` (language model).
- Configure agent to perform retrieval augmented generation.
17. **Add response delivery nodes:**
- Telegram node (`Send a text message`), Slack node (`Send a message`), Gmail node (`Send a message1`).
- Connect all to `RAG Agent` output.
- Configure credentials for each messaging platform.
18. **(Optional) Disable WhatsApp trigger and message sending nodes if not used.**
19. **Test the entire flow:**
- Upload documents to Google Drive to validate ingestion.
- Send chat messages via configured platforms to test RAG responses.
---
### 5. General Notes & Resources
| Note Content | Context or Link |
|-----------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| The workflow leverages Langchain n8n nodes extensively for AI document processing and chat. | See https://js.langchain.com/docs/use/n8n/ |
| Supabase is used as a vector database for fast semantic search. | https://supabase.com/docs/guides/database/vector-search |
| GPT-5 is accessed via OpenAI API—ensure API keys have correct access and quota. | https://platform.openai.com/docs/models/gpt-5 |
| The OCR node uses Mistral AI; alternative OCR services can be configured as needed. | https://mistral.ai/ |
| Multi-channel chat triggers enable broad user access; disable unused triggers to optimize performance.| - |
| For persistent chat memory, a Postgres database is required and should be configured securely.| https://www.postgresql.org/docs/current/index.html |
| n8n documentation for node-specific configurations is available at https://docs.n8n.io/ | - |
---
**Disclaimer:** The text provided is extracted solely from an automated n8n workflow. It complies fully with current content policies and contains no illegal, offensive, or protected material. All processed data is legal and publicly accessible.