mirror of https://github.com/khoaliber/n8nworkflows.xyz.git synced 2026-04-19 17:14:37 +00:00

Files

nusquama 19c96887ba creation

2025-11-13 13:29:26 +01:00

22 KiB

Raw Permalink Blame History

Chat with Documents via RAG: Google Drive to GPT-5 with Supabase Vector Database

https://n8nworkflows.xyz/workflows/chat-with-documents-via-rag--google-drive-to-gpt-5-with-supabase-vector-database-7308

Chat with Documents via RAG: Google Drive to GPT-5 with Supabase Vector Database

1. Workflow Overview

This n8n workflow implements a Retrieval-Augmented Generation (RAG) chatbot solution enabling users to interact with documents stored on Google Drive via an advanced GPT-5 language model, with vector similarity search powered by Supabase. It is designed for use cases where users need AI-driven conversational access to documents, including scanned files requiring OCR, and supports multi-channel chat triggers (Slack, Telegram, Gmail, WhatsApp).

The workflow is organized into the following logical blocks:

1.1 Document Ingestion & Processing: Detects new Google Drive files, downloads, optionally OCRs, splits text, generates embeddings, and stores them in Supabase vector database.
1.2 Vector Store & Embeddings Setup: Manages embeddings generation and vector store interactions for document indexing and retrieval.
1.3 Chat Trigger & Preprocessing: Listens for incoming chat messages from multiple platforms, standardizes input, and prepares context.
1.4 RAG Agent & Language Model Processing: Executes the RAG agent that queries Supabase with embeddings, integrates chat memory, and leverages GPT-5 for response generation.
1.5 Response Delivery: Sends generated answers back to users via the appropriate messaging platform.

2. Block-by-Block Analysis

2.1 Document Ingestion & Processing

Overview:
This block listens for new files created in Google Drive, downloads them, performs OCR if needed, splits the text into manageable chunks, generates embeddings, and stores data in the Supabase vector database.

Nodes Involved:

Google Drive Trigger: Create file
Google Drive: Download File4
OCR: OCR
Split Output: Split Out1
Text Splitter: Character Text Splitter4
Default Data Loader: Default Data Loader4
Embeddings OpenAI: Embeddings OpenAI8
Supabase Vector Store Insert: Insert into Supabase Vectorstore3

Node Details:

Create file
- Type: Google Drive Trigger
- Role: Detects newly created files in Google Drive to initiate ingestion
- Config: Default trigger, no filters specified
- Connections: Outputs to Download File4
- Potential failures: API quota limits, network errors
Download File4
- Type: Google Drive (Download)
- Role: Downloads the file detected by trigger
- Config: Uses file ID from trigger output
- Connections: Outputs to OCR
- Edge cases: Unsupported file types, download failures
OCR
- Type: Mistral AI (OCR)
- Role: Converts image or scanned PDFs to text
- Config: Default OCR parameters
- Connections: Outputs to Split Out1
- Edge cases: Poor image quality causing inaccurate OCR
Split Out1
- Type: Split Out
- Role: Flattens multi-part OCR output for downstream processing
- Connections: Outputs to Insert into Supabase Vectorstore3
Character Text Splitter4
- Type: Langchain Character Text Splitter
- Role: Splits large texts into smaller chunks for embedding generation
- Config: Uses character-based splitting strategy (default parameters)
- Connections: Sends chunks to Default Data Loader4
Default Data Loader4
- Type: Langchain Default Data Loader
- Role: Prepares chunked text for embedding
- Connections: Outputs to Insert into Supabase Vectorstore3
Embeddings OpenAI8
- Type: Langchain OpenAI Embeddings
- Role: Generates vector embeddings for document chunks
- Config: Uses OpenAI API with configured credentials
- Connections: Outputs to Insert into Supabase Vectorstore3
- Edge cases: API rate limits, invalid credentials
Insert into Supabase Vectorstore3
- Type: Langchain Supabase Vector Store
- Role: Inserts document embeddings and metadata into Supabase vector database
- Config: Supabase credentials configured
- Input: Receives embeddings and document metadata
- Output: None (terminal for ingestion branch)
- Failures: Network errors, authentication failures, data schema mismatches

2.2 Vector Store & Embeddings Setup

Overview:
This block sets up the vector store and embeddings for query-time retrieval and interacts with the RAG agent.

Nodes Involved:

Supabase Vector Store: Supabase Vector Store
Embeddings OpenAI: Embeddings OpenAI6

Node Details:

Embeddings OpenAI6
- Type: Langchain OpenAI Embeddings
- Role: Generates embeddings for user queries or documents during retrieval
- Config: OpenAI API key usage
- Connections: Outputs to Supabase Vector Store
Supabase Vector Store
- Type: Langchain Supabase Vector Store
- Role: Provides vector similarity search capabilities from Supabase
- Config: Supabase project URL and API key
- Connections: Feeds into the RAG Agent node for retrieval
- Failures: Query timeouts, incorrect vector dimensions

2.3 Chat Trigger & Preprocessing

Overview:
This block receives chat messages from various messaging platforms, normalizes input, and passes it to the RAG agent.

Nodes Involved:

Langchain Chat Trigger: When chat message received
Slack Trigger: Slack Trigger
Telegram Trigger: Telegram Trigger
Gmail Trigger: Gmail Trigger
WhatsApp Trigger: WhatsApp Trigger (disabled)
Edit Fields: Edit Fields

Node Details:

When chat message received
- Type: Langchain Chat Trigger
- Role: Listens for incoming chat messages (main webhook entrypoint)
- Config: Webhook ID specified
- Connections: Outputs to RAG Agent
- Failures: Webhook connectivity, invalid message formats
Slack Trigger, Telegram Trigger, Gmail Trigger, WhatsApp Trigger
- Type: Messaging platform specific triggers
- Role: Provides triggers from respective platforms to capture chat inputs
- Config: Webhook IDs set for each platform; WhatsApp trigger disabled
- Connections: All output to Edit Fields node for normalization
- Failures: Auth token expiry, webhook misconfiguration
Edit Fields
- Type: Set (Data Editing)
- Role: Normalizes incoming message content and metadata before feeding to RAG agent
- Connections: Outputs to RAG Agent

2.4 RAG Agent & Language Model Processing

Overview:
Core AI logic combining chat memory, vector store retrieval, and GPT-5 language model to generate contextually relevant answers.

Nodes Involved:

RAG Agent: RAG Agent
Chat Memory: Chat Memory
GPT 5: GPT 5

Node Details:

RAG Agent
- Type: Langchain Agent
- Role: Orchestrates retrieval of relevant document chunks from Supabase and interacts with GPT-5 model
- Inputs: Receives chat input (from triggers), memory (from Chat Memory), and vector store tool (from Supabase Vector Store)
- Outputs: Generated answer routed to response delivery nodes
- Failures: Query failures, model API limits, memory access errors
Chat Memory
- Type: Langchain Postgres Chat Memory
- Role: Stores and retrieves conversational context to maintain dialogue coherence
- Config: Connects to a Postgres database for persistent memory
- Connections: Feeds memory context to RAG Agent
GPT 5
- Type: Langchain OpenAI Chat Model
- Role: Language model generating natural language responses
- Config: Uses GPT-5 via OpenAI API with proper credentials
- Connections: Connected as language model for RAG Agent
- Failures: API rate limits, malformed input, timeout

2.5 Response Delivery

Overview:
Delivers the AI-generated responses back to users via their respective messaging channels.

Nodes Involved:

Telegram: Send a text message
WhatsApp: Send message (disabled)
Slack: Send a message
Gmail: Send a message1

Node Details:

Send a text message (Telegram)
- Type: Telegram Node
- Role: Sends text reply to Telegram users
- Config: Requires Telegram bot credentials
- Input: Receives RAG Agent output
- Failures: Bot authentication failures, message formatting errors
Send message (WhatsApp)
- Type: WhatsApp Node
- Role: Intended for WhatsApp message delivery (currently disabled)
- Failures: WhatsApp API restrictions, disabled node
Send a message (Slack)
- Type: Slack Node
- Role: Sends response message to Slack channel or user
- Config: Slack OAuth credentials configured
- Input: Receives RAG Agent output
Send a message1 (Gmail)
- Type: Gmail Node
- Role: Sends email responses via Gmail
- Config: OAuth2 credentials for Gmail configured
- Input: Receives RAG Agent output

3. Summary Table

Node Name	Node Type	Functional Role	Input Node(s)	Output Node(s)	Sticky Note
Create file	Google Drive Trigger	Detect new Google Drive files	None	Download File4
Download File4	Google Drive	Download detected file	Create file	OCR
OCR	Mistral AI	Perform OCR on downloaded files	Download File4	Split Out1
Split Out1	Split Out	Flatten OCR output	OCR	Insert into Supabase Vectorstore3
Character Text Splitter4	Langchain Text Splitter	Split text into chunks	None (implied with Default Data Loader4)	Default Data Loader4
Default Data Loader4	Langchain Default Data Loader	Prepare documents for embedding	Character Text Splitter4	Insert into Supabase Vectorstore3
Embeddings OpenAI8	Langchain OpenAI Embeddings	Generate embeddings for document chunks	Default Data Loader4	Insert into Supabase Vectorstore3
Insert into Supabase Vectorstore3	Langchain Supabase Vector Store	Store embeddings in Supabase vector DB	Split Out1, Embeddings OpenAI8, Default Data Loader4	None
Supabase Vector Store	Langchain Supabase Vector Store	Vector store for RAG retrieval	Embeddings OpenAI6	RAG Agent
Embeddings OpenAI6	Langchain OpenAI Embeddings	Generate query embeddings	None	Supabase Vector Store
When chat message received	Langchain Chat Trigger	Entry webhook for chat messages	None	RAG Agent
Slack Trigger	Slack Trigger	Trigger on Slack messages	None	Edit Fields
Telegram Trigger	Telegram Trigger	Trigger on Telegram messages	None	Edit Fields
Gmail Trigger	Gmail Trigger	Trigger on Gmail messages	None	Edit Fields
WhatsApp Trigger	WhatsApp Trigger (disabled)	Trigger on WhatsApp messages	None	Edit Fields	Disabled node
Edit Fields	Set	Normalize incoming chat fields	Slack Trigger, Telegram Trigger, Gmail Trigger, WhatsApp Trigger	RAG Agent
Chat Memory	Langchain Postgres Chat Memory	Store and retrieve chat conversation context	None	RAG Agent
RAG Agent	Langchain Agent	Retrieve relevant docs and generate AI response	Edit Fields, Chat Memory, Supabase Vector Store, GPT 5	Send a text message, Send message, Send a message, Send a message1
GPT 5	Langchain OpenAI Chat Model	Generate natural language responses	RAG Agent (ai_languageModel)	RAG Agent
Send a text message	Telegram	Send response to Telegram users	RAG Agent	None
Send message	WhatsApp (disabled)	Send response to WhatsApp users	RAG Agent	None	Disabled node
Send a message	Slack	Send response to Slack channels/users	RAG Agent	None
Send a message1	Gmail	Send response emails	RAG Agent	None
Sticky Note	Sticky Note	Visual notes (empty content)	None	None	Multiple sticky notes with empty content
Sticky Note1	Sticky Note	Visual notes (empty content)	None	None
Sticky Note2	Sticky Note	Visual notes (empty content)	None	None
Sticky Note3	Sticky Note	Visual notes (empty content)	None	None
Sticky Note4	Sticky Note	Visual notes (empty content)	None	None
Sticky Note5	Sticky Note	Visual notes (empty content)	None	None
Sticky Note6	Sticky Note	Visual notes (empty content)	None	None

4. Reproducing the Workflow from Scratch

Create a Google Drive Trigger node (Create file):
- Configure it to trigger on new file creation events in your Google Drive account.
Add a Google Drive node (Download File4):
- Connect input from Create file.
- Configure to download the file using the file ID from the trigger.
Add an OCR node (OCR):
- Connect input from Download File4.
- Configure with Mistral AI or your preferred OCR service to extract text from downloaded files.
Add a Split Out node (Split Out1):
- Connect input from OCR.
- This node flattens OCR output arrays for further processing.
Add a Character Text Splitter node (Character Text Splitter4):
- Connect logically to the document processing flow (after OCR or download).
- Configure to split long text into smaller chunks (default settings suffice).
Add a Default Data Loader node (Default Data Loader4):
- Connect input from Character Text Splitter4.
- Prepares document chunks for embeddings.
Add an Embeddings OpenAI node (Embeddings OpenAI8):
- Connect input from Default Data Loader4.
- Set API key credentials for OpenAI. Use embedding model (e.g., text-embedding-ada-002).
Add a Supabase Vector Store node for insertion (Insert into Supabase Vectorstore3):
- Connect inputs from Split Out1, Embeddings OpenAI8, and Default Data Loader4.
- Configure with your Supabase project URL and API key for vector store insertion.
Add an Embeddings OpenAI node for queries (Embeddings OpenAI6):
- Configure similarly to the ingestion embeddings node.
- Connect output to the Supabase Vector Store retrieval node.
Add a Supabase Vector Store node for retrieval (Supabase Vector Store):
- Connect input from Embeddings OpenAI6.
- Configure with Supabase credentials, set as vector search tool.
Add a Langchain Chat Trigger node (When chat message received):
- Configure webhook for incoming chat messages.
Add platform-specific trigger nodes:
- Slack Trigger, Telegram Trigger, Gmail Trigger (enable as needed).
- Configure respective webhook IDs and credentials.
Add an Edit Fields node (Edit Fields):
- Connect inputs from all platform triggers.
- Normalize and standardize incoming message data before feeding to RAG agent.
Add a Langchain Postgres Chat Memory node (Chat Memory):
- Configure connection to a Postgres database for chat context persistence.
Add a GPT-5 Langchain Chat Model node (GPT 5):
- Configure with OpenAI GPT-5 credentials and parameters.
Add a Langchain Agent node (RAG Agent):
- Connect inputs from Edit Fields (main input), Chat Memory (memory), Supabase Vector Store (tool), and GPT 5 (language model).
- Configure agent to perform retrieval augmented generation.
Add response delivery nodes:
- Telegram node (Send a text message), Slack node (Send a message), Gmail node (Send a message1).
- Connect all to RAG Agent output.
- Configure credentials for each messaging platform.
(Optional) Disable WhatsApp trigger and message sending nodes if not used.
Test the entire flow:
- Upload documents to Google Drive to validate ingestion.
- Send chat messages via configured platforms to test RAG responses.

5. General Notes & Resources

Note Content	Context or Link
The workflow leverages Langchain n8n nodes extensively for AI document processing and chat.	See https://js.langchain.com/docs/use/n8n/
Supabase is used as a vector database for fast semantic search.	https://supabase.com/docs/guides/database/vector-search
GPT-5 is accessed via OpenAI API—ensure API keys have correct access and quota.	https://platform.openai.com/docs/models/gpt-5
The OCR node uses Mistral AI; alternative OCR services can be configured as needed.	https://mistral.ai/
Multi-channel chat triggers enable broad user access; disable unused triggers to optimize performance.	-
For persistent chat memory, a Postgres database is required and should be configured securely.	https://www.postgresql.org/docs/current/index.html
n8n documentation for node-specific configurations is available at https://docs.n8n.io/	-

Disclaimer: The text provided is extracted solely from an automated n8n workflow. It complies fully with current content policies and contains no illegal, offensive, or protected material. All processed data is legal and publicly accessible.

22 KiB Raw Permalink Blame History

Chat with Documents via RAG: Google Drive to GPT-5 with Supabase Vector Database

1. Workflow Overview

2. Block-by-Block Analysis

2.1 Document Ingestion & Processing

2.2 Vector Store & Embeddings Setup

2.3 Chat Trigger & Preprocessing

2.4 RAG Agent & Language Model Processing

2.5 Response Delivery

3. Summary Table

4. Reproducing the Workflow from Scratch

5. General Notes & Resources

22 KiB

Raw Permalink Blame History