# Motivation
A major component of useful AI systems is adaptation to the user's
context. This is a major reason we enabled syncing knowledge bases.
The next step in this direction is to dynamically update the evolving
state of the user as conversations take place across time and topics.
This enables more personalized conversations and maintains context
across conversations.
# Overview
This change introduces medium and long term memories in Khoj.
- The scope of a conversation can be thought of as short term memory.
- Medium term memory extends to the past week.
- Long term memory extends to anytime in the past, where a search query
results in a match.
# Details
- Enable users to view and manage agent generated memories from their
settings page
- Fully integrate the memory object into all downstream usage: image
generation, notes extraction, online search, etc.
- Scope memory per agent. The default agent has access to memories
created by other agents as well.
- Enable users and admins to enable/disable Khoj's memory system
---------
Co-authored-by: Debanjum <debanjum@gmail.com>
Fix
- Ensure researcher and coder know to save files to /home/user dir
- Make E2B code executor check for generated files in /home/user
- Do not re-add file types already downloaded from /home/user
Issues
- E2B has a mismatch in default home_dir for run_code & list_dir cmds
So run_code was run with /root as home dir. And list_dir("~") was
checking under /home/user. This caused files written to /home/user
by code not to be discovered by the list_files step.
- Previously the researcher did not know that generated files should
be written to /home/user. So it could tell the coder to save files to
a different directory. Now the researcher knows where to save files
to show them to the user as well.
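The core of the home-dir mismatch fix is that code execution and file
discovery must agree on one home directory, so "~" resolves to
/home/user rather than /root. A minimal sketch with a hypothetical
helper (the real fix lives in Khoj's E2B code executor):

```python
# The sandbox home directory both run_code and list_dir should agree on.
HOME_DIR = "/home/user"

def resolve_sandbox_path(path: str, home_dir: str = HOME_DIR) -> str:
    """Expand '~' against the sandbox home dir, not the host's default.

    With this, list_dir("~") checks the same directory that generated
    code writes files to, so those files are discovered.
    """
    if path == "~" or path.startswith("~/"):
        return home_dir + path[1:]
    return path
```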
- Add excludeFolders field to KhojSetting interface
- Rename 'Sync Folders' to 'Include Folders' for clarity
- Add 'Exclude Folders' UI section with folder picker
- Filter out excluded folders during content sync
- Show file counts when syncing (X of Y files)
- Prevent excluding root folder
This allows users to exclude specific directories (e.g., Inbox,
Highlights) from being indexed, while the existing Include Folders acts
as a whitelist.
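The include/exclude interaction can be sketched as a filter applied
during content sync. Function name and shape are illustrative, not
the actual Obsidian plugin code (which is TypeScript):

```python
from pathlib import PurePosixPath

def should_sync(path: str, include: list[str], exclude: list[str]) -> bool:
    """Include folders act as a whitelist; exclude folders prune within it."""
    p = PurePosixPath(path)
    in_included = any(p.is_relative_to(folder) for folder in include)
    in_excluded = any(p.is_relative_to(folder) for folder in exclude)
    return in_included and not in_excluded
```

So with include `["/vault"]` and exclude `["/vault/Inbox"]`, notes
under /vault sync except those under /vault/Inbox.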
---------
Co-authored-by: Debanjum <debanjum@gmail.com>
This change had been removed in 9a8c707 to avoid overwrites. We now
use random filenames for generated files to avoid overwrites from
subsequent runs.
Encourage the model to write code that saves files with logical
filenames in the home folder so they can be captured.
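The rename-on-capture idea can be sketched as appending a short random
suffix to the logical filename the model chose (helper name is
hypothetical):

```python
import uuid
from pathlib import Path

def unique_filename(logical_name: str) -> str:
    """Keep the model's logical filename but add a random suffix.

    Subsequent runs that regenerate the same file then produce a new
    name instead of overwriting the earlier output.
    """
    stem, suffix = Path(logical_name).stem, Path(logical_name).suffix
    return f"{stem}_{uuid.uuid4().hex[:8]}{suffix}"
```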
Add the Khoj app landing page to the khoj monorepo. Show it in a more
natural place: when non-logged-in users open the Khoj app home page.
Authenticated users still see the logged-in home page experience.
Delete the old login html page. Login via a popup on the home page is
the single, unified login experience.
Have docs mention the Khoj home URL. No need to mention /login as the
login popup shows on the home page too.
Why
--
- The models are now smart enough to usually understand which tools to
call in parallel and when.
- The LLM can request more work for each call to it, which is usually
the slowest step. This speeds up work by the research agent, even
though each tool is still executed in sequence (for now).
Old thought messages are dropped by default by the Anthropic API. This
change ensures old thoughts are kept. This should improve cache
utilization to reduce costs. And keeping old thoughts may also improve
model intelligence.
Khoj doesn't handle parallel tool calling right now. Models were told
to call tools serially but this wasn't enforced via the Anthropic API.
So if the model did try to make parallel tool calls, the next response
would fail, as the API expects a tool result for each tool call, but
Khoj just returned the first tool call's results. This mostly affected
Haiku due to its lower fine-grained instruction following capabilities.
This change enforces serial tool calls at the API layer to avoid this
issue altogether for Claude models.
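Anthropic's Messages API exposes a `disable_parallel_tool_use` flag on
`tool_choice` for exactly this. A minimal sketch of a request payload
using it; everything besides that flag (model name, token limit,
builder function) is illustrative, not Khoj's actual request code:

```python
def build_anthropic_request(messages: list, tools: list) -> dict:
    """Build a Messages API payload that enforces serial tool calls."""
    return {
        "model": "claude-3-5-haiku-latest",
        "max_tokens": 1024,
        "messages": messages,
        "tools": tools,
        # Enforce at the API layer that the model emits at most one
        # tool call per response, instead of relying on prompt
        # instructions the model may not follow.
        "tool_choice": {"type": "auto", "disable_parallel_tool_use": True},
    }
```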
Fix a logical error due to the else conditional not being correctly
indented. This would result in an error when using Gemini 3 Pro Image
with images stored in an S3 bucket.
Overview
---
This change enables specifying fallback chat models for each task
type (fast, deep, default) and user type (free, paid).
Previously we did not fallback to other chat models if the chat model
assigned for a task failed.
Details
---
You can now specify multiple ServerChatSettings via the Admin Panel
with their usage priority. If the highest priority chat model for the
task and user type fails, the task is assigned to a lower priority
chat model configured for the current user and task type.
This change also reduces the retry attempts for openai chat actor
models from 3 to 2 as:
- multiple fallback server chat settings can now be created, so
reducing retries with the same model reduces latency.
- 2 attempts is in line with the retry attempts for other model
types (gemini, anthropic)
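The fallback behavior amounts to trying models in priority order until
one succeeds. A minimal sketch, with `call_model` standing in for the
actual chat completion request (names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class ServerChatSetting:
    chat_model: str
    priority: int  # lower number = higher priority

def run_with_fallback(settings: list[ServerChatSetting], call_model):
    """Try chat models in usage priority order until one succeeds."""
    last_error = None
    for setting in sorted(settings, key=lambda s: s.priority):
        try:
            return call_model(setting.chat_model)
        except Exception as err:
            last_error = err  # fall through to the next priority model
    if last_error is not None:
        raise last_error
    raise RuntimeError("no chat models configured")
```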
What
--
- Default to using the fast model for most chat actors. Specifically,
this change defaults the doc and web search chat actors to the fast
model.
- Only the research chat director uses the deep chat model.
- Make chat actors' use of the fast model configurable via a function
argument.
Code chat actor continues to use deep chat model and webpage reader
continues to use fast chat model.
Deep, fast chat models can be configured via ServerChatSettings on the
admin panel.
Why
--
Modern models are good enough at instruction following. So defaulting
most chat actors to the fast model should improve chat speed with
acceptable response quality.
The option to fallback to research mode for higher quality
responses or deeper research always exists.
Avoid rendering flicker from attempting to render invalid image paths
referenced in Khoj's messages on the web app.
The rendering flicker made it very annoying to interact with
conversations containing such messages on the web app.
This change does lightweight validation of an image URL before
attempting to render it. If an invalid image URL is detected, the
image is replaced with just its alt text.
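The actual validation runs in the web app's TypeScript; this Python
sketch with a hypothetical function shows the idea of the lightweight
check (treating only http(s) and data URLs as renderable):

```python
import re

def sanitize_image_markdown(message: str) -> str:
    """Replace markdown images with invalid URLs by their alt text."""
    def replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        if re.match(r"^(https?://|data:image/)", url):
            return match.group(0)  # keep the valid image as-is
        return alt  # drop the broken image, keep its alt text
    return re.sub(r"!\[([^\]]*)\]\(([^)]+)\)", replace, message)
```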
- Use qwen-style <think> tags to extract Minimax M2 model thoughts
- Use a function to mark models that use in-stream thinking (including
Kimi K2 thinking)
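Extracting in-stream thinking boils down to splitting the response on
the <think> tag pair. A minimal non-streaming sketch (function name is
illustrative; the real code handles the streamed case):

```python
import re

def split_thoughts(response: str) -> tuple[str, str]:
    """Split qwen-style <think> reasoning from the final answer.

    Returns (thoughts, answer); thoughts is empty if the model emitted
    no <think> block.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if not match:
        return "", response.strip()
    thoughts = match.group(1).strip()
    answer = response[match.end():].strip()
    return thoughts, answer
```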
- Server admin can add MCP servers via the admin panel
- Enabled MCP server tools are exposed to the research agent for use
- Use MCP library to standardize interactions with mcp servers
- Support SSE or Stdio as transport to interact with mcp servers
- Reuse session established to MCP servers across research iterations
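Reusing sessions across research iterations is essentially a per-server
cache: connect once, then hand back the same session. A hypothetical
sketch; the session and connect types stand in for the mcp library's
actual client objects:

```python
class McpSessionPool:
    """Cache one live session per MCP server across research iterations."""

    def __init__(self, connect):
        self._connect = connect  # factory: server_name -> session
        self._sessions: dict[str, object] = {}

    def get(self, server_name: str):
        # Establish the session on first use, reuse it afterwards.
        if server_name not in self._sessions:
            self._sessions[server_name] = self._connect(server_name)
        return self._sessions[server_name]
```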
Google and Firecrawl do not provide good web search descriptions
(within the given latency requirements). Exa does better than both.
So prioritize using Exa over Google or Firecrawl when multiple web
search providers are available.
Support using Exa for webpage reading. It seems much faster than
currently available providers.
Remove Jina as a webpage reader and remaining references to Jina from
code and docs. It was slow anyway and its API may shut down soon (as
Jina was bought by Elastic).
Update docs to mention Exa for web search and webpage reading.