Create a Paul Graham Essay Q&A System with OpenAI and Milvus Vector Database
1. Workflow Overview
This workflow implements a question-answering (QA) system based on the essays of Paul Graham. It is designed for users who want to query the content of these essays semantically using natural language questions. The workflow is divided into two main logical blocks:
1.1 Data Collection & Processing
This block scrapes the list of Paul Graham essays from his website, fetches the essay contents, extracts the main text, splits the text into manageable chunks, generates vector embeddings using OpenAI, and loads these vectors into a Milvus vector database collection named "my_collection". This prepares the data for efficient semantic search.
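The chunking step can be illustrated with a small sketch. This is a simplified, standard-library approximation of recursive character splitting, not the n8n or LangChain implementation: it tries coarse separators first (paragraphs, lines, spaces) and falls back to finer ones only for pieces that are still too long. The function name `split_recursive`, the separator list, and the absence of chunk overlap are assumptions made for illustration; the 6000-character default matches the chunk size this workflow configures on its splitter node.

```python
from typing import List, Sequence

# Separators tried in order, coarsest first (an assumed list for illustration).
DEFAULT_SEPARATORS = ("\n\n", "\n", " ", "")


def split_recursive(
    text: str,
    chunk_size: int = 6000,
    separators: Sequence[str] = DEFAULT_SEPARATORS,
) -> List[str]:
    """Split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []

    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)  # flush what has been merged so far
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with the next, finer separator.
            chunks.extend(split_recursive(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


# Small demonstration with a tiny chunk size so the behavior is visible.
demo_chunks = split_recursive("aaa bbb ccc", chunk_size=7)
```

With the tiny chunk size in the demonstration, the text is split at a space rather than mid-word; with the workflow's 6000-character setting, most paragraphs would be merged into a few large chunks per essay.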
1.2 Chat Interaction & Question Answering
This block listens for incoming chat messages (questions), retrieves relevant essay chunks from the Milvus vector store using semantic search, and generates answers using an OpenAI chat model (GPT-4o-mini). It provides an interactive QA interface backed by the vector database.
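To make the retrieval idea concrete, here is a toy sketch of semantic search over embedded chunks using cosine similarity. It does not reflect how Milvus works internally (Milvus relies on index structures and configurable distance metrics); the three-dimensional vectors, the sample chunk labels, and the `top_k` helper are invented for illustration. In the workflow, the vectors come from the OpenAI embeddings node.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    store: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[float, str]]:
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up "embeddings" standing in for real OpenAI vectors.
store = [
    ("chunk about startups", [0.9, 0.1, 0.0]),
    ("chunk about programming languages", [0.1, 0.9, 0.0]),
    ("chunk about writing essays", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.0]  # pretend embedding of the user's question
results = top_k(query, store, k=2)
```

The chunks returned by a search like this are what the QA chain passes to the chat model as context.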
2. Block-by-Block Analysis
2.1 Data Collection & Processing
-
Overview:
This block scrapes Paul Graham’s essays, extracts their text content, processes the text into chunks, generates embeddings, and inserts them into the Milvus vector store. It is triggered manually to refresh or initialize the data.
-
Nodes Involved:
- When clicking "Execute Workflow" (Manual Trigger)
- Fetch Essay List (HTTP Request)
- Extract essay names (HTML Extract)
- Split out into items (Split Out)
- Limit to first 3 (Limit)
- Fetch essay texts (HTTP Request)
- Extract Text Only (HTML Extract)
- Recursive Character Text Splitter (Text Splitter)
- Default Data Loader (Document Loader)
- Embeddings OpenAI (Embeddings Generator)
- Milvus Vector Store (Vector Store Insert)
- Sticky Notes (3, 5, and initial instructions)
-
Node Details:
-
When clicking "Execute Workflow"
- Type: Manual Trigger
- Role: Starts the scraping and loading process on demand
- Input: None
- Output: Triggers "Fetch Essay List"
- Failures: None expected; manual start
-
Fetch Essay List
- Type: HTTP Request
- Role: Downloads the HTML page listing Paul Graham essays from http://www.paulgraham.com/articles.html
- Configuration: Simple GET request, no auth or special headers
- Output: HTML content passed to "Extract essay names"
- Failures: Network errors, site down, or changed page structure
-
Extract essay names
- Type: HTML Extract
- Role: Parses the HTML to extract all essay links from nested tables using the CSS selector `table table a`
- Extracted attribute: `href` (essay URLs)
- Output: Array of essay URLs to "Split out into items"
- Failures: Selector mismatch if website structure changes
-
Split out into items
- Type: Split Out
- Role: Splits the array of essay URLs into individual items for sequential processing
- Output: Single essay URL per item to "Limit to first 3"
- Failures: None expected
-
Limit to first 3
- Type: Limit
- Role: Restricts processing to the first 3 essays only (likely for demo or testing)
- Output: Passes limited essay URLs to "Fetch essay texts"
- Failures: None expected
-
Fetch essay texts
- Type: HTTP Request
- Role: Fetches the full HTML content of each essay by constructing the URL `http://www.paulgraham.com/{{ $json.essay }}`
- Output: HTML content to "Extract Text Only"
- Failures: Network errors, 404 if essay URL invalid
-
Extract Text Only
- Type: HTML Extract
- Role: Extracts the main textual content from the essay page using the CSS selector `body` while skipping `img` and `nav` tags
- Output: Extracted essay text to "Milvus Vector Store" and "Default Data Loader"
- Failures: Extraction errors if page structure changes
-
Recursive Character Text Splitter
- Type: Text Splitter
- Role: Splits large essay text into chunks of 6000 characters for embedding
- Output: Chunks to "Default Data Loader"
- Failures: None expected
-
Default Data Loader
- Type: Document Loader
- Role: Loads the chunked text into document format for embedding generation
- Input: Chunked text from splitter
- Output: Document data to "Milvus Vector Store"
- Failures: Expression errors if input data missing
-
Embeddings OpenAI
- Type: OpenAI Embeddings
- Role: Generates vector embeddings for each text chunk using OpenAI embeddings API
- Configuration: Uses default OpenAI credentials and settings
- Output: Embeddings to "Milvus Vector Store"
- Failures: API quota, auth errors, timeouts
-
Milvus Vector Store
- Type: Vector Store (Milvus)
- Role: Inserts embeddings and documents into the "my_collection" Milvus collection
- Configuration: Insert mode with `clearCollection` set to true (clears existing data before insert)
- Input: Embeddings and documents
- Output: None
- Failures: Connection errors, collection not found, auth issues
-
Sticky Notes
- Provide contextual explanations and setup instructions for the scraping and loading block.
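The two HTML Extract nodes, together with the Limit and URL-building steps between them, can be approximated outside n8n. The sketch below uses only Python's standard `html.parser`. It mimics the `table table a` descendant selector by tracking table nesting depth, and mimics the `img`/`nav` skip list by ignoring text inside `nav` (an `img` tag carries no text of its own). The class names, the `build_essay_urls` helper, and the sample HTML are illustrative assumptions and are not part of the workflow.

```python
from html.parser import HTMLParser
from typing import List

BASE_URL = "http://www.paulgraham.com/"


class NestedTableLinks(HTMLParser):
    """Collect href values of <a> tags located inside a table within a table."""

    def __init__(self) -> None:
        super().__init__()
        self.table_depth = 0
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "a" and self.table_depth >= 2:
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1


class BodyText(HTMLParser):
    """Collect text inside <body>, ignoring anything nested in <nav>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_body = False
        self.nav_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag == "nav":
            self.nav_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag == "nav" and self.nav_depth > 0:
            self.nav_depth -= 1

    def handle_data(self, data):
        if self.in_body and self.nav_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_links(html: str) -> List[str]:
    parser = NestedTableLinks()
    parser.feed(html)
    parser.close()
    return parser.hrefs


def build_essay_urls(hrefs: List[str], limit: int = 3) -> List[str]:
    # Mirrors "Limit to first 3" followed by the URL expression in "Fetch essay texts".
    return [BASE_URL + href for href in hrefs[:limit]]


def extract_text(html: str) -> str:
    parser = BodyText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Invented sample pages; the real site layout may differ.
listing = (
    '<table><tr><td><a href="index.html">Home</a>'
    '<table><tr><td><a href="a.html">A</a><a href="b.html">B</a>'
    '<a href="c.html">C</a><a href="d.html">D</a></td></tr></table>'
    "</td></tr></table>"
)
essay_page = (
    "<html><head><title>T</title></head><body><nav>Menu</nav>"
    "<p>Hello <img src='x.png'>world.</p></body></html>"
)
essay_urls = build_essay_urls(extract_links(listing))
essay_text = extract_text(essay_page)
```

In the sample listing, the "Home" link sits in the outer table only and is therefore not matched, which is the effect of requiring two levels of table nesting.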
2.2 Chat Interaction & Question Answering
-
Overview:
This block listens for incoming chat messages (questions), retrieves relevant essay chunks from Milvus using semantic search, and generates answers with an OpenAI chat model. It enables interactive querying of the stored essays.
-
Nodes Involved:
- When chat message received (Chat Trigger)
- Milvus Vector Store1 (Vector Store)
- Milvus Vector Store Retriever (Retriever)
- Embeddings OpenAI1 (Embeddings Generator)
- Q&A Chain to Retrieve from Milvus and Answer Question (Retrieval QA Chain)
- OpenAI Chat Model (Chat Completion)
- Sticky Note1 (Step 2 instructions)
- Sticky Note2 (empty, possibly for future notes)
-
Node Details:
-
When chat message received
- Type: Chat Trigger (LangChain)
- Role: Webhook listener for incoming chat messages (questions)
- Output: Passes question to "Q&A Chain to Retrieve from Milvus and Answer Question"
- Failures: Webhook misconfiguration, network issues
-
Milvus Vector Store1
- Type: Vector Store (Milvus)
- Role: Connects to the "my_collection" Milvus collection for retrieval
- Configuration: Uses existing Milvus credentials and collection name
- Output: Provides vector store to "Milvus Vector Store Retriever"
- Failures: Connection or auth errors
-
Milvus Vector Store Retriever
- Type: Retriever (Vector Store)
- Role: Retrieves relevant document chunks from Milvus based on query embeddings
- Input: Vector store from "Milvus Vector Store1"
- Output: Retrieved documents to "Q&A Chain to Retrieve from Milvus and Answer Question"
- Failures: Retrieval errors, empty results
-
Embeddings OpenAI1
- Type: OpenAI Embeddings
- Role: Generates embeddings for the incoming question to perform semantic search
- Output: Embeddings to "Milvus Vector Store1" (used internally)
- Failures: API errors, rate limits
-
Q&A Chain to Retrieve from Milvus and Answer Question
- Type: Retrieval QA Chain (LangChain)
- Role: Combines retrieved documents and OpenAI Chat Model to generate an answer to the user’s question
- Inputs: Chat message, retriever, language model
- Output: Answer sent back to chat interface
- Failures: Model errors, retrieval failures, timeout
-
OpenAI Chat Model
- Type: Chat Completion (OpenAI GPT-4o-mini)
- Role: Generates natural language answers based on retrieved context and user question
- Configuration: Model set to "gpt-4o-mini"
- Output: Answer to QA Chain
- Failures: API quota, auth errors, model unavailability
-
Sticky Notes
- Provide instructions for Step 2 — chatting with the QA system.
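What the Retrieval QA chain does with the retrieved chunks can be sketched as simple prompt assembly: the chunks are placed in a context section ahead of the user's question, and the combined text is sent to the chat model. The template wording and the `build_prompt` helper below are assumptions for illustration; the actual prompt used by the n8n node is not given in this document. The sketch also shows one possible way to handle the empty-results case listed under the retriever's failures.

```python
from typing import List

# Hypothetical template; the real chain's wording is not documented here.
PROMPT_TEMPLATE = (
    "Use the following context to answer the question. "
    "If the context does not contain the answer, say you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, chunks: List[str]) -> str:
    """Combine retrieved chunks and the user's question into one prompt string."""
    if not chunks:
        # Make the lack of context explicit instead of sending an empty section.
        context = "(no relevant passages were retrieved)"
    else:
        context = "\n---\n".join(chunk.strip() for chunk in chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question.strip())


prompt = build_prompt(
    "What does the essay say about this topic?",
    ["Passage one.", " Passage two. "],
)
```

The resulting string would be the user message passed to the chat model (`gpt-4o-mini` in this workflow).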
3. Summary Table
| Node Name | Node Type | Functional Role | Input Node(s) | Output Node(s) | Sticky Note |
|---|---|---|---|---|---|
| When clicking "Execute Workflow" | Manual Trigger | Starts scraping and loading process | None | Fetch Essay List | ## Step 1 1. Set up a Milvus server based on this guide, then create a collection named `my_collection`. 2. Run this workflow to scrape Paul Graham essays and load them into the Milvus collection. |
| Fetch Essay List | HTTP Request | Downloads Paul Graham essays list page | When clicking "Execute Workflow" | Extract essay names | ## Scrape latest Paul Graham essays |
| Extract essay names | HTML Extract | Extracts essay URLs from HTML | Fetch Essay List | Split out into items | ## Scrape latest Paul Graham essays |
| Split out into items | Split Out | Splits essay URLs into individual items | Extract essay names | Limit to first 3 | ## Scrape latest Paul Graham essays |
| Limit to first 3 | Limit | Limits processing to first 3 essays | Split out into items | Fetch essay texts | ## Scrape latest Paul Graham essays |
| Fetch essay texts | HTTP Request | Fetches full HTML of each essay | Limit to first 3 | Extract Text Only | ## Scrape latest Paul Graham essays |
| Extract Text Only | HTML Extract | Extracts main essay text content | Fetch essay texts | Milvus Vector Store, Default Data Loader | ## Scrape latest Paul Graham essays |
| Recursive Character Text Splitter | Text Splitter | Splits essay text into chunks of 6000 characters | Default Data Loader | Default Data Loader | ## Load into Milvus vector store |
| Default Data Loader | Document Loader | Loads chunked text for embedding generation | Recursive Character Text Splitter | Milvus Vector Store | ## Load into Milvus vector store |
| Embeddings OpenAI | OpenAI Embeddings | Generates vector embeddings for text chunks | Default Data Loader | Milvus Vector Store | ## Load into Milvus vector store |
| Milvus Vector Store | Vector Store (Milvus) | Inserts embeddings and documents into Milvus | Embeddings OpenAI, Extract Text Only, Default Data Loader | None | ## Load into Milvus vector store |
| When chat message received | Chat Trigger | Listens for incoming chat questions | None | Q&A Chain to Retrieve from Milvus and Answer Question | ## Step 2 Chat with this QA Chain with Milvus retriever |
| Milvus Vector Store1 | Vector Store (Milvus) | Connects to Milvus collection for retrieval | Embeddings OpenAI1 | Milvus Vector Store Retriever | |
| Embeddings OpenAI1 | OpenAI Embeddings | Generates embeddings for incoming chat queries | None | Milvus Vector Store1 | |
| Milvus Vector Store Retriever | Retriever (Vector Store) | Retrieves relevant documents from Milvus | Milvus Vector Store1 | Q&A Chain to Retrieve from Milvus and Answer Question | |
| Q&A Chain to Retrieve from Milvus and Answer Question | Retrieval QA Chain (LangChain) | Retrieves context and generates answer | When chat message received, Milvus Vector Store Retriever, OpenAI Chat Model | None | ## Step 2 Chat with this QA Chain with Milvus retriever |
| OpenAI Chat Model | Chat Completion (OpenAI GPT-4o-mini) | Generates natural language answers | Q&A Chain to Retrieve from Milvus and Answer Question | Q&A Chain to Retrieve from Milvus and Answer Question | |
| Sticky Note3 | Sticky Note | Provides context for scraping block | None | None | ## Scrape latest Paul Graham essays |
| Sticky Note5 | Sticky Note | Provides context for Milvus loading block | None | None | ## Load into Milvus vector store |
| Sticky Note | Sticky Note | Setup instructions for Step 1 | None | None | ## Step 1 1. Set up a Milvus server based on this guide, then create a collection named `my_collection`. 2. Run this workflow to scrape Paul Graham essays and load them into the Milvus collection. |
| Sticky Note1 | Sticky Note | Setup instructions for Step 2 | None | None | ## Step 2 Chat with this QA Chain with Milvus retriever |
| Sticky Note2 | Sticky Note | Empty note | None | None | |
4. Reproducing the Workflow from Scratch
-
Create Manual Trigger Node
- Type: Manual Trigger
- Purpose: To start the scraping and loading process manually.
-
Create HTTP Request Node "Fetch Essay List"
- URL: `http://www.paulgraham.com/articles.html`
- Method: GET
- Connect Manual Trigger → Fetch Essay List
-
Create HTML Extract Node "Extract essay names"
- Operation: Extract HTML Content
- Extraction Values: Extract attribute `href` from CSS selector `table table a`
- Connect Fetch Essay List → Extract essay names
-
Create Split Out Node "Split out into items"
- Field to split out: `essay` (the extracted href array)
- Connect Extract essay names → Split out into items
-
Create Limit Node "Limit to first 3"
- Max Items: 3 (to limit processing to first 3 essays)
- Connect Split out into items → Limit to first 3
-
Create HTTP Request Node "Fetch essay texts"
- URL: `http://www.paulgraham.com/{{ $json.essay }}` (use an expression to construct the URL)
- Method: GET
- Connect Limit to first 3 → Fetch essay texts
-
Create HTML Extract Node "Extract Text Only"
- Operation: Extract HTML Content
- Extraction Values: Extract text content from the `body` CSS selector
- Skip selectors: `img,nav` to exclude images and navigation
- Connect Fetch essay texts → Extract Text Only
-
Create Recursive Character Text Splitter Node
- Chunk Size: 6000 characters
- Connect Extract Text Only → Recursive Character Text Splitter
-
Create Default Data Loader Node
- Input JSON Data: Use expression `={{ $('Extract Text Only').item.json.data }}` to pass extracted text
- Connect Recursive Character Text Splitter → Default Data Loader
-
Create OpenAI Embeddings Node "Embeddings OpenAI"
- Use OpenAI credentials (set up in n8n credentials)
- Default options
- Connect Default Data Loader → Embeddings OpenAI
-
Create Milvus Vector Store Node
- Mode: Insert
- Options: Clear collection before insert (clearCollection = true)
- Milvus Collection: Select or enter `my_collection`
- Connect Embeddings OpenAI → Milvus Vector Store
- Also connect Extract Text Only → Milvus Vector Store (for document data)
-
Create Chat Trigger Node "When chat message received"
- Set up webhook to receive chat messages
- Connect to "Q&A Chain to Retrieve from Milvus and Answer Question"
-
Create OpenAI Embeddings Node "Embeddings OpenAI1"
- For embedding incoming chat questions
- Use OpenAI credentials
- Connect to Milvus Vector Store1
-
Create Milvus Vector Store Node "Milvus Vector Store1"
- Milvus Collection: `my_collection`
- Connect Embeddings OpenAI1 → Milvus Vector Store1
-
Create Retriever Node "Milvus Vector Store Retriever"
- Connect Milvus Vector Store1 → Milvus Vector Store Retriever
-
Create OpenAI Chat Model Node
- Model: `gpt-4o-mini`
- Use OpenAI credentials
- Connect to Q&A Chain
-
Create Retrieval QA Chain Node "Q&A Chain to Retrieve from Milvus and Answer Question"
- Inputs: Chat message from trigger, retriever, and OpenAI Chat Model
- Connect When chat message received → Q&A Chain
- Connect Milvus Vector Store Retriever → Q&A Chain
- Connect OpenAI Chat Model → Q&A Chain
-
Connect Q&A Chain output to Chat interface
- This completes the chat interaction flow.
-
Add Sticky Notes
- Add notes for setup instructions and block explanations as per original workflow.
-
Configure Credentials
- OpenAI API key for embeddings and chat model nodes
- Milvus server credentials and connection details for vector store nodes
-
Set up Milvus Server
- Follow official Milvus standalone Docker guide: https://milvus.io/docs/install_standalone-docker.md
- Create collection named `my_collection`
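Before running the loading block, it can be useful to confirm that the Milvus server is accepting connections. The check below is a minimal sketch using a plain TCP connect. It assumes Milvus is listening on `localhost` at its default port 19530, which is what the standalone Docker setup exposes unless it has been changed. The `is_port_open` helper is hypothetical and only verifies reachability; it does not confirm that `my_collection` exists or that credentials are valid.

```python
import socket

MILVUS_HOST = "localhost"  # assumption: Milvus runs on the same machine
MILVUS_PORT = 19530        # Milvus default port; adjust if remapped


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_port_open(MILVUS_HOST, MILVUS_PORT):
        print(f"Milvus is reachable at {MILVUS_HOST}:{MILVUS_PORT}")
    else:
        print(f"Nothing is listening at {MILVUS_HOST}:{MILVUS_PORT}")
```

If the check fails, the Milvus container is probably not running or the port mapping differs from the default, and the vector store nodes in n8n would report connection errors.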
5. General Notes & Resources
| Note Content | Context or Link |
|---|---|
| Step 1: Set up a Milvus server based on this guide. And then create a collection named my_collection. | Workflow setup instructions |
| Scrape latest Paul Graham essays | Workflow block context |
| Load into Milvus vector store | Workflow block context |
| Step 2: Chat with this QA Chain with Milvus retriever | Workflow chat interaction instructions |
| Milvus official installation guide | https://milvus.io/docs/install_standalone-docker.md |
| OpenAI embeddings documentation | https://platform.openai.com/docs/guides/embeddings |
This structured document fully describes the workflow’s architecture, node configurations, data flow, and setup instructions, enabling users or AI agents to understand, reproduce, and maintain the Paul Graham Essay Q&A system built with n8n, OpenAI, and Milvus.