Handle customer support queries with cache-first RAG using Redis, LangCache and OpenAI
1. Workflow Overview
Title: Handle customer support queries with cache-first RAG using Redis, LangCache and OpenAI
Purpose:
A cache-first customer support assistant that (1) decomposes a user’s chat query into cacheable sub-questions, (2) attempts to answer each sub-question via LangCache (semantic cache), (3) falls back to Redis Vector Store RAG (OpenAI embeddings + Redis retrieval tool) on cache misses, (4) quality-scores the retrieved answer, (5) saves only high-quality answers back to LangCache, and finally (6) synthesizes all sub-answers into one user-facing response.
Target use cases: support chatbots, internal help desks, KB assistants where speed, cost control, and reducing hallucinations matter.
1.1 Entry Points & Major Logical Blocks
- Knowledge Base Preparation (scheduled ingestion): Loads example KB documents and inserts them into a Redis vector index.
- Chat Intake + Configuration: Receives chat messages and sets LangCache parameters.
- Query Decomposition: Uses an LLM to decide whether to split the query into 2–4 sub-questions (structured output).
- Per-Question Loop + Cache-First Lookup: Iterates over each sub-question; checks LangCache similarity search first.
- RAG Retrieval on Cache Miss + Quality Gate + Retry Control: Uses Redis vector retrieval as an LLM tool, evaluates answer quality, retries within a max-iteration limit, then caches acceptable results.
- Aggregation + Final Response Synthesis: Aggregates gathered Q/A data and synthesizes a final customer-facing response.
2. Block-by-Block Analysis
Block A — Knowledge Base Preparation (Example Data → Redis Vector Insert)
Overview: Populates a Redis vector index with example customer support documents using OpenAI embeddings. Triggered on a schedule.
Nodes involved: Schedule Trigger, example Data, Default Data Loader, Embeddings OpenAI1, Redis Vector Store, Sticky Note9
A1) Schedule Trigger
- Type / role: `n8n-nodes-base.scheduleTrigger` — periodic trigger to run KB ingestion.
- Config choices: Interval rule present (template default; not customized in the JSON).
- Connections:
  - Out: example Data
- Edge cases: A misconfigured interval may run too frequently (cost/overwrites) or never.
A2) example Data
- Type / role: `n8n-nodes-base.set` — defines an in-workflow KB as an array `raw_docs`.
- Config choices: Sets `raw_docs` to an array of support policy strings (plans/pricing, rate limits, exports, integrations, security, billing, account recovery).
- Connections:
  - In: Schedule Trigger
  - Out: Redis Vector Store
- Edge cases: If `raw_docs` is empty or not an array, ingestion may fail downstream.
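As a concrete illustration, a minimal TypeScript guard for the item shape this node emits (the `ExampleDataItem` name and `assertIngestable` helper are hypothetical, not part of the workflow):

```typescript
// Hypothetical guard for the shape "example Data" emits.
interface ExampleDataItem {
  raw_docs: string[]; // one support-policy string per entry
}

function assertIngestable(item: ExampleDataItem): string[] {
  if (!Array.isArray(item.raw_docs) || item.raw_docs.length === 0) {
    throw new Error("raw_docs must be a non-empty array of strings");
  }
  return item.raw_docs;
}
```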
A3) Default Data Loader
- Type / role: `@n8n/n8n-nodes-langchain.documentDefaultDataLoader` — converts incoming docs/fields to LangChain “Document” objects.
- Config choices: Default options.
- Connections:
  - AI document out: to Redis Vector Store
- Edge cases: If the upstream format is unexpected, document creation may produce zero docs.
A4) Embeddings OpenAI1
- Type / role: `@n8n/n8n-nodes-langchain.embeddingsOpenAi` — generates embeddings for documents.
- Config choices: Uses OpenAI credential “n8n free OpenAI API credits”.
- Connections:
  - AI embedding out: to Redis Vector Store
- Edge cases: OpenAI auth/quota issues; model changes; rate limits; large document sizes.
A5) Redis Vector Store
- Type / role: `@n8n/n8n-nodes-langchain.vectorStoreRedis` — inserts documents into the Redis vector index.
- Config choices:
  - Mode: `insert`
  - Index: `kb-3accd7ed`
  - Uses Redis credential “Redis account”.
- Connections:
  - In: example Data (main), Default Data Loader (ai_document), Embeddings OpenAI1 (ai_embedding)
- Edge cases: Wrong index name/schema; Redis auth/connection errors; missing Redis Search module / vector capability; dimension mismatch if the embedding model changes.
Sticky Note coverage:
- Sticky Note9: “## Prepare the Knowledge Base - Example Data”
Block B — Chat Intake + LangCache Configuration
Overview: Receives incoming chat messages and sets LangCache parameters used throughout the run.
Nodes involved: When chat message received, LangCache Config, Sticky Note1
B1) When chat message received
- Type / role: `@n8n/n8n-nodes-langchain.chatTrigger` — entry point for chat-based executions.
- Config choices: Default options; provides `chatInput` and `sessionId`.
- Key fields used later:
  - `$('When chat message received').item.json.chatInput`
  - `$('When chat message received').item.json.sessionId`
- Connections:
  - Out: LangCache Config
- Edge cases: Missing `chatInput` (empty messages); missing `sessionId` (memory nodes rely on it).
B2) LangCache Config
- Type / role: `n8n-nodes-base.set` — central configuration for LangCache and retry parameters.
- Config choices (assignments):
  - `langcacheBaseUrl`: `https://aws-us-east-1.langcache.redis.io`
  - `langcacheCacheId`: `b83aa61d58be484ebc37c64f1f30c2fa`
  - `similarityThreshold`: `0.75`
  - `max_iterations`: `"2"` (string)
- Connections:
  - In: When chat message received
  - Out: decompose_query
- Edge cases / integration issues:
  - `max_iterations` is stored as a string but later compared numerically; n8n may coerce, but this can also cause strict-type issues in some contexts.
  - A wrong cache ID or base URL leads to HTTP 401/404 from the LangCache endpoints.
Sticky Note coverage:
- Sticky Note1: Configuration instructions for LangCache parameters.
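For clarity, a minimal TypeScript mirror of these assignments (values taken from the node above); the coercion line addresses the string-vs-number pitfall:

```typescript
// Illustrative mirror of the "LangCache Config" assignments.
const langcacheBaseUrl = "https://aws-us-east-1.langcache.redis.io";
const langcacheCacheId = "b83aa61d58be484ebc37c64f1f30c2fa";
const similarityThreshold = 0.75;

// The template stores max_iterations as the string "2". Coercing once up
// front avoids string comparisons downstream ("10" < "2" is true for strings).
const maxIterations = Number("2");
```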
Block C — Query Decomposition (LLM + Structured Parsing)
Overview: Uses an OpenAI chat model to decide whether the user query should be split into multiple cacheable sub-questions; outputs a structured JSON object { questions: [...] }.
Nodes involved: decompose_query, Structured Output Parser, Simple Memory, OpenAI Chat Model, Sticky Note2
C1) Simple Memory
- Type / role: `@n8n/n8n-nodes-langchain.memoryBufferWindow` — conversation memory for the decomposition step.
- Config choices:
  - sessionKey: `={{ $('When chat message received').item.json.sessionId }}`
  - window length: 10
- Connections:
  - AI memory out: to decompose_query
- Edge cases: If sessionId is missing/unstable, memory won’t persist across turns.
C2) OpenAI Chat Model
- Type / role: `@n8n/n8n-nodes-langchain.lmChatOpenAi` — shared LLM used by multiple agent nodes.
- Config choices: `gpt-4.1-mini`
- Connections:
  - AI languageModel out: to decompose_query, search_node1, synthesize_response_node
- Edge cases: Model availability, quota, rate limits, high concurrency.
C3) Structured Output Parser
- Type / role: `@n8n/n8n-nodes-langchain.outputParserStructured` — forces JSON output matching a schema.
- Config choices: Manual JSON schema requiring `questions` (array of strings).
- Connections:
  - AI outputParser out: to decompose_query
- Edge cases: If the LLM returns invalid JSON or the wrong shape, parsing fails (the agent node may error).
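The enforced shape, expressed as a TypeScript type with a defensive parse (the `parseDecomposition` helper is illustrative, not part of the workflow):

```typescript
// Shape required by the manual JSON schema.
interface Decomposition {
  questions: string[];
}

// Defensive parse mirroring what the output parser enforces.
function parseDecomposition(raw: string): Decomposition {
  const obj = JSON.parse(raw) as Partial<Decomposition>;
  if (!Array.isArray(obj.questions) || !obj.questions.every((q) => typeof q === "string")) {
    throw new Error("LLM output does not match { questions: string[] }");
  }
  return obj as Decomposition;
}
```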
C4) decompose_query
- Type / role: `@n8n/n8n-nodes-langchain.agent` — LLM agent that produces the list of sub-questions.
- Config choices:
  - Input text: user chat input
  - System message: rules for SINGLE_QUESTION vs. 2–4 sub-questions
  - Has output parser: enabled (expects structured output)
  - PromptType: define
- Important note: The prompt says “If keeping as single question, respond with exactly: SINGLE_QUESTION”, but the structured schema always requires a `questions` array. This mismatch can break parsing unless the agent always maps SINGLE_QUESTION into `{"questions":[...]}` (see the normalization sketch after this block).
- Connections:
  - In: LangCache Config (main), OpenAI Chat Model (ai_languageModel), Simple Memory (ai_memory), Structured Output Parser (ai_outputParser)
  - Out: Split Out
- Edge cases:
  - Output-parser failures due to the SINGLE_QUESTION behavior.
  - Multi-language queries might produce unexpected decomposition formatting.
Sticky Note coverage:
- Sticky Note2: Query decomposition rationale.
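One way to defuse the SINGLE_QUESTION mismatch flagged in C4, sketched in TypeScript (the workflow itself relies on the agent doing this mapping; this helper is hypothetical):

```typescript
// Normalize the SINGLE_QUESTION sentinel into the schema-required shape
// before the structured parser sees it.
function normalizeDecomposition(raw: string, originalQuery: string): { questions: string[] } {
  if (raw.trim() === "SINGLE_QUESTION") {
    return { questions: [originalQuery] };
  }
  return JSON.parse(raw); // already expected to match { questions: [...] }
}
```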
Block D — Split + Loop Over Sub-Questions
Overview: Converts the questions array into individual items and processes them in batches.
Nodes involved: Split Out, Loop Over Items, Sticky Note3
D1) Split Out
- Type / role: `n8n-nodes-base.splitOut` — splits `output.questions` into separate items.
- Config choices:
  - Field: `output.questions`
  - Destination field name: `question`
- Connections:
  - In: decompose_query
  - Out: Loop Over Items
- Edge cases: If `output.questions` is missing or not an array, the node produces zero items or errors.
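In plain TypeScript, the transformation Split Out performs here (sample questions are illustrative):

```typescript
// decompose_query emits one item shaped like { output: { questions: [...] } }.
const agentItem = {
  output: { questions: ["What plans are available?", "How do exports work?"] },
};

// Split Out fans the array into one item per sub-question, under "question".
const items = (agentItem.output?.questions ?? []).map((q) => ({ question: q }));
// -> [{ question: "What plans are available?" }, { question: "How do exports work?" }]
// An empty array here is the "zero items" edge case noted above.
```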
D2) Loop Over Items
- Type / role: `n8n-nodes-base.splitInBatches` — iterates over items (sub-questions).
- Config choices: Default batch settings (no explicit batch size shown).
- Connections:
  - In: Split Out; also receives a feedback loop from Save to LangCache
  - Out(0): Aggregate (collects data)
  - Out(1): Search LangCache (cache lookup)
- Edge cases: If used incorrectly, this node can create infinite loops; here the feedback connection is intentional.
Sticky Note coverage:
- Sticky Note3: Cache-first strategy explanation.
Block E — LangCache Lookup + Cache Hit Routing
Overview: Performs semantic cache search in LangCache; if hit, returns cached response; if miss, proceeds to retrieval.
Nodes involved: Search LangCache, Is Cache Hit?, current_iteration
E1) Search LangCache
- Type / role: `n8n-nodes-base.httpRequest` — calls the LangCache search endpoint.
- Config choices:
  - Method: POST
  - URL: `{{langcacheBaseUrl}}/v1/caches/{{langcacheCacheId}}/entries/search`
  - Body: `prompt` = current `question`, `similarityThreshold` from config
  - Auth: HTTP Bearer (generic credential type)
  - Header: `accept: application/json`
  - onError: `continueErrorOutput` (the workflow continues even on request failure)
- Connections:
  - In: Loop Over Items
  - Out(0): Is Cache Hit?
  - Out(1): current_iteration (secondary path)
- Edge cases / failure modes:
  - 401/403 invalid bearer token
  - 404 wrong cache ID
  - Network timeouts
  - Because `continueErrorOutput` is enabled, downstream nodes may receive error-shaped JSON; conditions relying on `$json.data` may evaluate unexpectedly.
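A minimal TypeScript sketch of what this node does, assuming the endpoint and body shown above; the return shape is inferred from the Is Cache Hit? condition and should be treated as an assumption:

```typescript
// Approximate equivalent of the "Search LangCache" HTTP Request node.
async function searchLangCache(
  baseUrl: string,
  cacheId: string,
  token: string,
  question: string,
  similarityThreshold: number,
): Promise<{ data?: Array<{ similarity: number; response: string }> }> {
  const res = await fetch(`${baseUrl}/v1/caches/${cacheId}/entries/search`, {
    method: "POST",
    headers: {
      accept: "application/json",
      "content-type": "application/json",
      authorization: `Bearer ${token}`, // HTTP Bearer credential
    },
    body: JSON.stringify({ prompt: question, similarityThreshold }),
  });
  // With onError=continueErrorOutput the workflow proceeds even on non-2xx;
  // a plain client should check instead:
  if (!res.ok) throw new Error(`LangCache search failed: ${res.status}`);
  return res.json(); // assumed shape: { data: [{ similarity, response, ... }] } on a hit
}
```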
E2) Is Cache Hit?
- Type / role: `n8n-nodes-base.if` — checks whether the similarity score meets the threshold.
- Config choices: boolean condition: `{{ $json.data?.[0]?.similarity >= similarityThreshold }}`
- Connections:
  - True path: Loop Over Items (immediately continues the loop, implicitly treating a cache hit as “done”)
  - False path: current_iteration (starts the retrieval pipeline)
- Edge cases:
  - If `data[0]` is missing, the expression yields `false` and forces retrieval (a safe default).
  - If the LangCache API changes its response structure, the condition breaks.
E3) current_iteration
- Type / role: `n8n-nodes-base.set` — initializes/keeps the retry iteration counter.
- Config choices: `current_iteration = {{ $json.current_iteration ?? 1 }}`
- Connections:
  - In: Is Cache Hit? (miss path), Search LangCache (secondary output), and increase iteration
  - Out: search_node1
- Edge cases:
  - A field-naming inconsistency later (the `current_iterration` typo) breaks the retry logic (illustrated below).
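Why the typo matters, in executable TypeScript terms (variable names illustrative):

```typescript
// Reads of the misspelled key come back undefined, so both the retry
// comparison and the increment silently stop working.
const item: { current_iteration?: number } = {};

const current = item.current_iteration ?? 1; // 1 on the first pass, as intended
const typoRead = (item as Record<string, number | undefined>)["current_iterration"];

console.log(typoRead! >= 2); // false — undefined >= 2 is always false
console.log(typoRead! + 1);  // NaN — the increment in "increase iteration" never takes effect
```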
Block F — Redis Vector Retrieval (Tool) + Answer Generation
Overview: On cache miss, an LLM agent answers using ONLY the Redis-backed KB via a retrieval tool.
Nodes involved: search_node1, Redis Vector Store2, Embeddings OpenAI, Simple Memory1, Sticky Note4
F1) Redis Vector Store2
- Type / role: `@n8n/n8n-nodes-langchain.vectorStoreRedis` — exposes Redis retrieval as an LLM tool.
- Config choices:
  - Mode: `retrieve-as-tool`
  - Index: `kb-3accd7ed`
  - Tool description: “Using search_knowledge_base tool for query”
- Connections:
  - AI tool out: to search_node1
  - AI embedding in: from Embeddings OpenAI
- Edge cases: Redis index not populated; embedding dimension mismatch; tool not invoked depending on agent behavior.
F2) Embeddings OpenAI
- Type / role: OpenAI embeddings for query-time retrieval.
- Connections:
  - AI embedding out: to Redis Vector Store2
- Edge cases: auth/quota/rate limits.
F3) Simple Memory1
- Type / role: conversation memory for the retrieval agent step.
- Config choices: same sessionKey, window length 10.
- Connections:
  - AI memory out: to search_node1
F4) search_node1
- Type / role: `@n8n/n8n-nodes-langchain.agent` — a “research engine” that answers the sub-question strictly from the KB.
- Config choices:
  - Text: current `question`
  - System message: forbids external knowledge; if the answer is not in the KB, respond exactly `no info found`
  - Has output parser: enabled (but no explicit parser node connected here)
- Connections:
  - In: current_iteration (main), OpenAI Chat Model (ai_languageModel), Redis Vector Store2 (ai_tool), Simple Memory1 (ai_memory)
  - Out: evaluate_quality
- Edge cases:
  - If the agent doesn’t call the retrieval tool, it may output `no info found` often.
  - “hasOutputParser” without a parser node can be benign depending on node defaults, but can also create runtime configuration expectations.
Sticky Note coverage:
- Sticky Note4: Redis vector retrieval only on cache miss.
Block G — Quality Evaluation + Retry Control + Cache Save
Overview: Scores each sub-answer; if acceptable, saves to LangCache; if low quality, retries retrieval up to max_iterations.
Nodes involved: evaluate_quality, getScore, low quality ?, increase iteration, Save to LangCache, Sticky Note5, Sticky Note6, Sticky Note7
G1) evaluate_quality
- Type / role: `@n8n/n8n-nodes-langchain.openAi` — direct OpenAI call (not the agent node) to evaluate result quality.
- Config choices:
  - Model: `gpt-4.1-mini`
  - Response format: `json_object` (n8n option)
  - Prompts include the original sub-question and the research result; the system message asks for:
    - `SCORE: 0.X`
    - `FEEDBACK: ...`
- Connections:
  - In: search_node1
  - Out: getScore
- Edge cases: The instruction requests a textual “SCORE: …” format while the node enforces JSON output. This can produce parsing/shape inconsistencies unless the model outputs JSON with `SCORE` and `FEEDBACK` keys.
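The nesting that getScore’s expressions assume, sketched in TypeScript (the score and feedback values are illustrative):

```typescript
// The response layout getScore's expressions expect (per G2 below).
const evaluation = {
  output: [{ content: [{ text: { SCORE: 0.8, FEEDBACK: "Answer cites the relevant policy." } }] }],
};

// Direct equivalents of the two Set-node expressions:
const SCORE = evaluation.output[0].content[0].text.SCORE;       // 0.8
const FEEDBACK = evaluation.output[0].content[0].text.FEEDBACK; // "Answer cites the relevant policy."
```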
G2) getScore
- Type / role: `n8n-nodes-base.set` — extracts `SCORE` and `FEEDBACK` from the evaluation response.
- Config choices:
  - `SCORE = {{ $json.output[0].content[0].text.SCORE }}`
  - `FEEDBACK = {{ $json.output[0].content[0].text.FEEDBACK }}`
- Connections:
  - Out: low quality ?
- Edge cases: This path depends heavily on the exact response structure; if evaluate_quality returns a different JSON layout, these expressions fail.
G3) low quality ?
- Type / role: `n8n-nodes-base.if` — decides retry vs. accept-and-cache.
- Config choices: condition: `{{ $json.SCORE < 0.7 && $('current_iteration').item.json.current_iterration >= $('LangCache Config').item.json.max_iterations }}`
- Important issues:
  - Uses `current_iterration` (typo) while the field created is `current_iteration`. This likely makes the comparison evaluate as `undefined >= ...` (false), breaking the retry gating.
  - The logic reads “low quality AND current >= max”, which means it retries only when already at/above the maximum; typically you want to retry while `current < max` (see the sketch below).
- Connections:
  - True path: increase iteration (then loops back to retrieval)
  - False path: Save to LangCache
- Edge cases: Risk of unintended looping or never retrying, depending on how the expression evaluates.
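A side-by-side sketch of the broken condition and the intended gate (variable names are illustrative):

```typescript
// What the node evaluates today vs. what the retry gate should express.
declare const score: number;            // $json.SCORE
declare const currentIteration: number; // current_iteration (correctly spelled)
declare const maxIterations: number;    // max_iterations from LangCache Config

// As configured: low quality AND current >= max — and the typo'd read of
// current_iterration yields undefined, so the whole condition is false.
// Intended: retry only while the iteration budget is not exhausted.
const shouldRetry = score < 0.7 && currentIteration < maxIterations;
const acceptAndCache = !shouldRetry;
```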
G4) increase iteration
- Type / role: `n8n-nodes-base.set` — increments the retry counter.
- Config choices: `current_iteration = {{ $('current_iteration').item.json.current_iterration + 1 }}`
- Issue: The same `current_iterration` typo prevents the counter from incrementing properly.
- Connections:
  - Out: current_iteration (back into the retrieval path)
G5) Save to LangCache
- Type / role: `n8n-nodes-base.httpRequest` — saves the prompt/response pair to LangCache entries.
- Config choices:
  - Method: POST
  - URL: `{{baseUrl}}/v1/caches/{{cacheId}}/entries`
  - Body:
    - `prompt`: current `question`
    - `response`: `{{ $('search_node1').item.json.output }}`
  - Auth: HTTP Bearer
- Connections:
  - Out: Loop Over Items (continues the batch loop)
- Edge cases: If the response is `no info found`, you may cache unhelpful answers unless you explicitly gate on that. Also susceptible to auth failures and timeouts.
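A hedged TypeScript sketch of the save call with an explicit `no info found` gate that the current workflow lacks (endpoint per the node config above; token handling is illustrative):

```typescript
async function saveToLangCache(
  baseUrl: string,
  cacheId: string,
  token: string,
  question: string,
  answer: string,
): Promise<void> {
  // Gate the workflow does not currently apply: skip unhelpful answers.
  if (answer.trim() === "no info found") return;

  const res = await fetch(`${baseUrl}/v1/caches/${cacheId}/entries`, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${token}`, // HTTP Bearer credential
    },
    body: JSON.stringify({ prompt: question, response: answer }),
  });
  if (!res.ok) throw new Error(`LangCache save failed: ${res.status}`);
}
```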
Sticky Note coverage:
- Sticky Note5: Quality evaluation score threshold (≥0.7 accept)
- Sticky Note6: Retry control via max_iterations
- Sticky Note7: Save to cache only high-quality answers (intended; current logic may not fully enforce this due to issues above)
Block H — Aggregation + Final Response Synthesis
Overview: Collects all per-question results and produces a single user-facing answer.
Nodes involved: Aggregate, synthesize_response_node, Simple Memory2, Sticky Note8
H1) Aggregate
- Type / role: `n8n-nodes-base.aggregate` — aggregates all item data for synthesis.
- Config choices: `aggregateAllItemData`
- Connections:
  - In: Loop Over Items
  - Out: synthesize_response_node
- Edge cases: If the loop outputs inconsistent item shapes (cache hit vs. miss), aggregation may include mixed schemas.
H2) Simple Memory2
- Type / role: memory for the final response agent.
- Config choices: same sessionKey, window length 10.
- Connections:
  - AI memory out: to synthesize_response_node
H3) synthesize_response_node
- Type / role: `@n8n/n8n-nodes-langchain.agent` — combines the gathered info into the final response.
- Config choices:
  - Text input: original query + `{{ $json.data.toJsonString() }}`
  - System message: instructs the agent to produce a coherent support answer; if there is insufficient info, return an apology fallback sentence.
  - Has output parser: enabled (no explicit parser node connected here)
- Connections:
  - In: Aggregate (main), OpenAI Chat Model (ai_languageModel), Simple Memory2 (ai_memory)
- Edge cases: If `$json.data` doesn’t exist (e.g., the aggregate output differs), `.toJsonString()` may fail. The agent may also over-apologize if inputs are sparse.
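A defensive TypeScript equivalent of the text-input expression, for reference (the helper name is illustrative):

```typescript
// Null-safe version of {{ $json.data.toJsonString() }}: returns "[]" when
// Aggregate emitted a different shape, instead of throwing at render time.
function gatheredAsJson(json: { data?: unknown }): string {
  return JSON.stringify(json.data ?? []);
}
```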
Sticky Note coverage:
- Sticky Note8: “## Generate the respoonse”
3. Summary Table
| Node Name | Node Type | Functional Role | Input Node(s) | Output Node(s) | Sticky Note |
|---|---|---|---|---|---|
| When chat message received | @n8n/n8n-nodes-langchain.chatTrigger | Chat entry point | — | LangCache Config | |
| LangCache Config | n8n-nodes-base.set | Central config for LangCache + retry params | When chat message received | decompose_query | #### Configuration (Edit First) / Update in LangCache Config: / - langcacheBaseUrl / - langcacheCacheId / - similarityThreshold (default 0.75) / - max_iterations (default 2) |
| Simple Memory | @n8n/n8n-nodes-langchain.memoryBufferWindow | Memory for decomposition | When chat message received (sessionId) | decompose_query (ai_memory) | ## Query Decomposition / Splits complex user input into focused questions to improve retrieval and caching. |
| OpenAI Chat Model | @n8n/n8n-nodes-langchain.lmChatOpenAi | Shared LLM for agents | — | decompose_query; search_node1; synthesize_response_node | |
| Structured Output Parser | @n8n/n8n-nodes-langchain.outputParserStructured | Enforces {questions:[...]} schema | — | decompose_query (ai_outputParser) | ## Query Decomposition / Splits complex user input into focused questions to improve retrieval and caching. |
| decompose_query | @n8n/n8n-nodes-langchain.agent | Decompose query into cacheable sub-questions | LangCache Config; OpenAI Chat Model; Simple Memory; Structured Output Parser | Split Out | ## Query Decomposition / Splits complex user input into focused questions to improve retrieval and caching. |
| Split Out | n8n-nodes-base.splitOut | Split questions array into items | decompose_query | Loop Over Items | |
| Loop Over Items | n8n-nodes-base.splitInBatches | Iterate sub-questions | Split Out; Save to LangCache; Is Cache Hit? (hit loop) | Aggregate; Search LangCache | #### Cache-First Strategy / Each question is checked in LangCache first. / - Hit → reuse answer / - Miss → search Redis / Reduces latency and API cost. |
| Search LangCache | n8n-nodes-base.httpRequest | LangCache semantic search | Loop Over Items | Is Cache Hit?; current_iteration | #### Cache-First Strategy / Each question is checked in LangCache first. / - Hit → reuse answer / - Miss → search Redis / Reduces latency and API cost. |
| Is Cache Hit? | n8n-nodes-base.if | Route hit vs miss | Search LangCache | Loop Over Items (hit); current_iteration (miss) | #### Cache-First Strategy / Each question is checked in LangCache first. / - Hit → reuse answer / - Miss → search Redis / Reduces latency and API cost. |
| current_iteration | n8n-nodes-base.set | Initialize retry counter | Is Cache Hit?; Search LangCache; increase iteration | search_node1 | |
| Redis Vector Store2 | @n8n/n8n-nodes-langchain.vectorStoreRedis | Retrieval tool (Redis vector search) | Embeddings OpenAI (ai_embedding) | search_node1 (ai_tool) | #### Redis Vector Retrieval / Runs only on cache miss. / Uses embeddings to retrieve relevant knowledge from Redis. |
| Embeddings OpenAI | @n8n/n8n-nodes-langchain.embeddingsOpenAi | Query embeddings for retrieval | — | Redis Vector Store2 | #### Redis Vector Retrieval / Runs only on cache miss. / Uses embeddings to retrieve relevant knowledge from Redis. |
| Simple Memory1 | @n8n/n8n-nodes-langchain.memoryBufferWindow | Memory for retrieval agent | When chat message received (sessionId) | search_node1 (ai_memory) | #### Redis Vector Retrieval / Runs only on cache miss. / Uses embeddings to retrieve relevant knowledge from Redis. |
| search_node1 | @n8n/n8n-nodes-langchain.agent | Answer sub-question from KB only | current_iteration; OpenAI Chat Model; Redis Vector Store2; Simple Memory1 | evaluate_quality | #### Redis Vector Retrieval / Runs only on cache miss. / Uses embeddings to retrieve relevant knowledge from Redis. |
| evaluate_quality | @n8n/n8n-nodes-langchain.openAi | Score answer quality | search_node1 | getScore | ## Quality Evaluation / Each answer is scored (0.0 – 1.0). / - ≥ 0.7 → accept / - < 0.7 → retry if allowed |
| getScore | n8n-nodes-base.set | Extract SCORE/FEEDBACK | evaluate_quality | low quality ? | ## Quality Evaluation / Each answer is scored (0.0 – 1.0). / - ≥ 0.7 → accept / - < 0.7 → retry if allowed |
| low quality ? | n8n-nodes-base.if | Retry gate vs accept | getScore | increase iteration; Save to LangCache | ## Retry Control / Retries are limited by max_iterations to avoid loops and high cost. |
| increase iteration | n8n-nodes-base.set | Increment retry iteration | low quality ? | current_iteration | ## Retry Control / Retries are limited by max_iterations to avoid loops and high cost. |
| Save to LangCache | n8n-nodes-base.httpRequest | Save high-quality answers to cache | low quality ? | Loop Over Items | ## ## Save to Cache / Only high-quality answers are saved to LangCache for future reuse. |
| Aggregate | n8n-nodes-base.aggregate | Collect results for final response | Loop Over Items | synthesize_response_node | ## Generate the respoonse |
| Simple Memory2 | @n8n/n8n-nodes-langchain.memoryBufferWindow | Memory for synthesis | When chat message received (sessionId) | synthesize_response_node (ai_memory) | ## Generate the respoonse |
| synthesize_response_node | @n8n/n8n-nodes-langchain.agent | Produce final customer response | Aggregate; OpenAI Chat Model; Simple Memory2 | — | ## Generate the respoonse |
| Schedule Trigger | n8n-nodes-base.scheduleTrigger | Scheduled KB ingestion entry | — | example Data | ## Prepare the Knowledge Base - Example Data |
| example Data | n8n-nodes-base.set | Example KB documents | Schedule Trigger | Redis Vector Store | ## Prepare the Knowledge Base - Example Data |
| Default Data Loader | @n8n/n8n-nodes-langchain.documentDefaultDataLoader | Build Document objects | — | Redis Vector Store (ai_document) | |
| Embeddings OpenAI1 | @n8n/n8n-nodes-langchain.embeddingsOpenAi | Doc embeddings for ingestion | — | Redis Vector Store (ai_embedding) | |
| Redis Vector Store | @n8n/n8n-nodes-langchain.vectorStoreRedis | Insert KB docs into Redis index | example Data; Default Data Loader; Embeddings OpenAI1 | — | |
| Sticky Note | n8n-nodes-base.stickyNote | Canvas documentation | — | — | # Customer support RAG workflow: / ## Workflow Overview / Cache-first RAG workflow for customer support. / Flow: / Chat → Decompose → Cache → Redis Search → Quality Check → Cache → Respond / Goals: Fast, accurate, no hallucinations, cost-controlled. / (full note content continues) |
| Sticky Note1 | n8n-nodes-base.stickyNote | Canvas documentation | — | — | #### Configuration (Edit First) / Update in LangCache Config: / - langcacheBaseUrl / - langcacheCacheId / - similarityThreshold (default 0.75) / - max_iterations (default 2) |
| Sticky Note2 | n8n-nodes-base.stickyNote | Canvas documentation | — | — | ## Query Decomposition / Splits complex user input into focused questions to improve retrieval and caching. |
| Sticky Note3 | n8n-nodes-base.stickyNote | Canvas documentation | — | — | #### Cache-First Strategy / Each question is checked in LangCache first. / - Hit → reuse answer / - Miss → search Redis / Reduces latency and API cost. |
| Sticky Note4 | n8n-nodes-base.stickyNote | Canvas documentation | — | — | #### Redis Vector Retrieval / Runs only on cache miss. / Uses embeddings to retrieve relevant knowledge from Redis. |
| Sticky Note5 | n8n-nodes-base.stickyNote | Canvas documentation | — | — | ## Quality Evaluation / Each answer is scored (0.0 – 1.0). / - ≥ 0.7 → accept / - < 0.7 → retry if allowed |
| Sticky Note6 | n8n-nodes-base.stickyNote | Canvas documentation | — | — | ## Retry Control / Retries are limited by max_iterations to avoid loops and high cost. |
| Sticky Note7 | n8n-nodes-base.stickyNote | Canvas documentation | — | — | ## ## Save to Cache / Only high-quality answers are saved to LangCache for future reuse. |
| Sticky Note8 | n8n-nodes-base.stickyNote | Canvas documentation | — | — | ## Generate the respoonse |
| Sticky Note9 | n8n-nodes-base.stickyNote | Canvas documentation | — | — | ## Prepare the Knowledge Base - Example Data |
4. Reproducing the Workflow from Scratch
A) Create credentials (required)
- OpenAI API credential
  - Create an OpenAI credential in n8n.
  - Ensure it can access:
    - Chat model: `gpt-4.1-mini` (or your chosen model)
    - Embeddings model (default for the embeddings nodes)
- Redis credential
  - Create a Redis connection credential (host/port/password/TLS as required).
  - Ensure Redis supports vector search (Redis Stack / RediSearch with vector).
- HTTP Bearer credential for LangCache
  - Create an “HTTP Bearer Auth” credential.
  - Paste the LangCache API token.
B) Build the scheduled KB ingestion branch (optional but included here)
- Add a Schedule Trigger node.
- Add a Set node named `example Data`:
  - Create a field `raw_docs` as an array of strings (your KB entries).
- Add a Default Data Loader (LangChain Document Default Data Loader).
- Add an Embeddings OpenAI node named `Embeddings OpenAI1` (OpenAI embeddings) and select your OpenAI credential.
- Add a Redis Vector Store node:
  - Mode: insert
  - Redis index: choose/create something like `kb-3accd7ed`
  - Select the Redis credential.
- Wire:
  - Schedule Trigger → example Data → Redis Vector Store (main)
  - Default Data Loader (ai_document) → Redis Vector Store
  - Embeddings OpenAI1 (ai_embedding) → Redis Vector Store
C) Build the chat/RAG branch
- Add When chat message received (Chat Trigger).
- Add a Set node `LangCache Config` with fields:
  - `langcacheBaseUrl` (e.g. `https://aws-us-east-1.langcache.redis.io`)
  - `langcacheCacheId` (your cache ID)
  - `similarityThreshold` (number, e.g. `0.75`)
  - `max_iterations` (number recommended; the template uses the string `"2"`)
- Add OpenAI Chat Model (LangChain Chat Model OpenAI):
  - Set the model to `gpt-4.1-mini`.
- Add a Memory Buffer Window node `Simple Memory`:
  - sessionKey: expression using the chat sessionId
  - contextWindowLength: 10
- Add a Structured Output Parser with schema:
  - Object with required `questions: string[]`
- Add an Agent node `decompose_query`:
  - Text: chat input
  - System message: decomposition rules
  - Enable structured output parsing by connecting the parser
  - Connect the AI language model (OpenAI Chat Model) and AI memory (Simple Memory).
- Add a Split Out node:
  - Field to split: `output.questions`
  - Destination field: `question`
- Add a Split In Batches node `Loop Over Items`.
- Add an HTTP Request node `Search LangCache`:
  - POST to: `{{langcacheBaseUrl}}/v1/caches/{{langcacheCacheId}}/entries/search`
  - Body: `prompt` = `{{$json.question}}`, `similarityThreshold` = `{{similarityThreshold}}`
  - Auth: Bearer credential
  - Consider enabling “Continue on Fail” (the template does).
- Add an IF node `Is Cache Hit?`:
  - Condition: `{{$json.data?.[0]?.similarity >= $('LangCache Config').item.json.similarityThreshold}}`
- Add a Set node `current_iteration`:
  - `current_iteration = {{$json.current_iteration ?? 1}}`
- Add an Embeddings OpenAI node `Embeddings OpenAI` (for retrieval) and set the OpenAI credential.
- Add a Redis Vector Store node `Redis Vector Store2`:
  - Mode: retrieve-as-tool
  - Redis index: same as ingestion
- Add a Memory Buffer Window `Simple Memory1` (sessionKey = sessionId, window = 10).
- Add an Agent node `search_node1`:
  - Text: `{{$json.question}}`
  - System message: “research engine” constraints + `no info found` fallback
  - Connect: OpenAI Chat Model (ai_languageModel), Redis Vector Store2 (ai_tool), Simple Memory1 (ai_memory)
- Add an OpenAI node `evaluate_quality`:
  - Model: `gpt-4.1-mini`
  - Configure it to return a JSON object
  - Prompt with the original question + research result; ask for fields `SCORE` and `FEEDBACK` in JSON.
- Add a Set node `getScore` to map `SCORE` and `FEEDBACK` from the evaluation output.
- Add an IF node `low quality ?`:
  - Implement the intended logic: retry if `SCORE < 0.7` and `current_iteration < max_iterations`
- Add a Set node `increase iteration`:
  - `current_iteration = current_iteration + 1`
- Add an HTTP Request node `Save to LangCache`:
  - POST to: `{{langcacheBaseUrl}}/v1/caches/{{langcacheCacheId}}/entries`
  - Body: `prompt` = `{{$json.question}}`, `response` = `{{$('search_node1').item.json.output}}`
- Add an Aggregate node `Aggregate` (aggregate all item data).
- Add a Memory Buffer Window `Simple Memory2` (session-based).
- Add an Agent node `synthesize_response_node`:
  - Text: original query + aggregated gathered info
  - System: combine the Q/A pairs; if there is insufficient info, output the apology message
  - Connect OpenAI Chat Model + Simple Memory2.
D) Wire the main branch
- When chat message received → LangCache Config → decompose_query → Split Out → Loop Over Items
- Loop Over Items → Search LangCache
- Search LangCache → Is Cache Hit?
- Is Cache Hit?:
  - True → Loop Over Items (continue)
  - False → current_iteration → search_node1 → evaluate_quality → getScore → low quality ?
- low quality ?:
  - Retry → increase iteration → current_iteration (back to search_node1)
  - Accept → Save to LangCache → Loop Over Items
- Loop Over Items (other output) → Aggregate → synthesize_response_node
5. General Notes & Resources
| Note Content | Context or Link |
|---|---|
| “Customer support RAG workflow… Chat → Decompose → Cache → Redis Search → Quality Check → Cache → Respond” | From Sticky Note (overall workflow explanation on canvas) |
| Configuration reminders: set LangCache base URL, cache ID, similarity threshold, max iterations | From Sticky Note1 |
| Design intent: cache-first to reduce latency and API cost; Redis retrieval runs on cache miss | From Sticky Note3 & Sticky Note4 |
| Important implementation caveat: retry logic currently appears broken due to the `current_iterration` typo and inverted comparison; fix to `current_iteration < max_iterations` | Derived from node expressions in low quality ? and increase iteration |
| Disclaimer (provided by user): “The provided text comes exclusively from…” | User-provided compliance disclaimer (non-node content), translated from French |