What
--
- Default to using the fast model for most chat actors. Specifically,
this change defaults the doc and web search chat actors to the fast model
- Only the research chat director uses the deep chat model
- Make chat actors' use of the fast model configurable via a function argument
The code chat actor continues to use the deep chat model and the webpage
reader continues to use the fast chat model.
The deep and fast chat models can be configured via ServerChatSettings on
the admin panel.
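The model selection described above can be sketched as below. This is a minimal illustration, not Khoj's actual API: the `ChatModels` class, `select_chat_model` function, and model names are all hypothetical stand-ins for what is configured via ServerChatSettings.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the deep and fast models configured via
# ServerChatSettings; the model names are placeholders.
@dataclass
class ChatModels:
    fast: str = "fast-model"
    deep: str = "deep-model"

def select_chat_model(models: ChatModels, deep: bool = False) -> str:
    # Chat actors default to the fast model; callers needing deeper
    # reasoning (e.g. the research director) opt in via the argument.
    return models.deep if deep else models.fast
```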
Why
--
Modern models are good enough at instruction following, so defaulting
most chat actors to the fast model should improve chat speed with
acceptable response quality.
The option to fall back to research mode for higher quality responses or
deeper research always exists.
Avoid the rendering flicker caused by attempting to render invalid image
paths referenced in messages by Khoj on the web app.
The rendering flicker made it very annoying to interact with
conversations containing such messages on the web app.
This change does lightweight validation of the image URL before
attempting to render it. If an invalid image URL is detected, the image
is replaced with just its alt text.
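The kind of lightweight check described above could look like the sketch below. The regex, function name, and notion of "valid" (absolute http(s) URL or data URI) are assumptions for illustration, not the web app's actual implementation.

```python
import re

# Assumed validity rule: accept absolute http(s) URLs and data URIs
# without whitespace; treat anything else as invalid.
VALID_IMAGE_URL = re.compile(r"^(https?://\S+|data:image/\S+)$")

def sanitize_markdown_images(message: str) -> str:
    # Replace markdown images with invalid URLs by just their alt text.
    def replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        return match.group(0) if VALID_IMAGE_URL.fullmatch(url) else alt
    return re.sub(r"!\[([^\]]*)\]\(([^)]*)\)", replace, message)
```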
- Use Qwen-style <think> tags to extract the Minimax M2 model's thoughts
- Use a function to mark models that use in-stream thinking (including
Kimi K2 Thinking)
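The tag extraction above can be sketched as follows. The function name and return shape are illustrative, not the actual code.

```python
import re

def extract_thoughts(response: str) -> tuple:
    # Split a response into (thoughts, answer) using Qwen-style <think>
    # tags, as emitted in-stream by models like Minimax M2. Thoughts are
    # empty if no tags are present.
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if not match:
        return "", response.strip()
    thoughts = match.group(1).strip()
    answer = (response[: match.start()] + response[match.end() :]).strip()
    return thoughts, answer
```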
- Server admins can add MCP servers via the admin panel
- Enabled MCP server tools are exposed to the research agent for use
- Use the MCP library to standardize interactions with MCP servers
- Support SSE or stdio as the transport to interact with MCP servers
- Reuse sessions established to MCP servers across research iterations
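The transport choice above might be driven by the admin-panel config, roughly as sketched below. The config shape and field names are hypothetical, not Khoj's actual schema; the chosen transport would then back a single client session that is reused across research iterations.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical shape of an admin-panel MCP server entry.
@dataclass
class McpServerConfig:
    name: str
    url: Optional[str] = None      # set for SSE transport
    command: Optional[str] = None  # set for stdio transport

def pick_transport(config: McpServerConfig) -> str:
    # SSE when a server URL is configured, stdio when a local command is.
    if config.url:
        return "sse"
    if config.command:
        return "stdio"
    raise ValueError(f"MCP server {config.name} needs a url or a command")
```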
Google and Firecrawl do not provide good web search descriptions (within
the given latency requirements). Exa does better than both. So
prioritize Exa over Google and Firecrawl when multiple web search
providers are available.
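The prioritization amounts to a preference order over configured providers, as in this sketch (the selection logic and names are illustrative, not Khoj's implementation):

```python
from typing import Optional

# Prefer Exa when several web search providers are configured.
SEARCH_PROVIDER_PRIORITY = ["exa", "google", "firecrawl"]

def pick_search_provider(configured: set) -> Optional[str]:
    # Return the highest-priority provider among those configured.
    for provider in SEARCH_PROVIDER_PRIORITY:
        if provider in configured:
            return provider
    return None
```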
Support using Exa for webpage reading. It seems much faster than the
currently available providers.
Remove Jina as a webpage reader, along with remaining references to Jina
in code and docs. It was slow anyway, and its API may shut down soon (as
it was bought by Elastic).
Update docs to mention Exa for web search and webpage reading.
Issue
---
When an agent's personality/instructions are safe, we do not require the
safety agent to give a reason. The safety check agent was told this in
the prompt, but it was not reflected in the JSON schema being used.
The latest OpenAI library started throwing an error if the response
doesn't match the requested JSON schema.
This broke creating/updating agents when using OpenAI models as the
safety agent.
Fix
---
Make the reason field optional.
Also wrap send_message_to_model_wrapper in a try/catch for a more
readable error stacktrace.
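In schema terms, the fix could look like the illustrative sketch below (this is not the actual schema used). Note that with OpenAI's strict structured outputs, optionality is typically expressed by allowing the null type while keeping the field listed under "required".

```python
# Illustrative schema after the fix: "reason" now accepts null, so a
# safe verdict without a reason no longer violates strict validation.
SAFETY_SCHEMA = {
    "type": "object",
    "properties": {
        "safe": {"type": "boolean"},
        "reason": {"type": ["string", "null"]},  # was {"type": "string"}
    },
    "required": ["safe", "reason"],
    "additionalProperties": False,
}

def matches_schema(response: dict) -> bool:
    # Tiny hand-rolled check mirroring what strict validation enforces
    # for this particular schema shape.
    if set(response) != set(SAFETY_SCHEMA["required"]):
        return False
    reason = response["reason"]
    return isinstance(response["safe"], bool) and (reason is None or isinstance(reason, str))
```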
Previously, we only showed unsafe prompt errors to the user when
creating/updating an agent. Name collision errors were not shown in the
web app UX.
This change ensures that such validation errors are bubbled up to the
user in the UX, so they can resolve the agent create/update error on
their end.
Count cached tokens and reasoning tokens for better cost estimates for
models served over an OpenAI-compatible API. Previously, we didn't
include cached or reasoning tokens in costing.
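A hedged sketch of token-aware costing follows. The rates are placeholders, and whether reasoning tokens are already folded into the reported output token count varies by provider; this sketch assumes they are reported separately.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
    reasoning_tokens: int = 0,
    input_rate: float = 1.0,    # $ per million uncached input tokens
    cached_rate: float = 0.25,  # cached input is typically discounted
    output_rate: float = 4.0,   # $ per million output tokens
) -> float:
    # Cached tokens are billed at the discounted input rate; reasoning
    # tokens are billed as output even though they are never shown.
    uncached = input_tokens - cached_tokens
    billed_output = output_tokens + reasoning_tokens
    return (uncached * input_rate + cached_tokens * cached_rate + billed_output * output_rate) / 1e6
```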
There are faster, better web search and webpage reading providers. Only
keep reasonable-quality online context providers.
Jina was good for the self-hosting quickstart as it provided a free API
key without login. It no longer does, and its latencies are pretty high
compared to other online context providers.
The Groq API has stopped supporting the minimum and maximum items fields
in tool schemas. This unexpectedly broke using AI models served via the
Groq API, like Kimi K2 and GPT-OSS, in research mode.
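One possible workaround is to strip the rejected fields before sending the request. This is a sketch under the assumption that the fields in question are JSON Schema's `minItems`/`maxItems`; it is not necessarily the fix shipped here.

```python
# Recursively drop array-size constraint fields from a tool's JSON schema.
UNSUPPORTED_SCHEMA_FIELDS = {"minItems", "maxItems"}

def strip_unsupported(schema):
    if isinstance(schema, dict):
        return {
            key: strip_unsupported(value)
            for key, value in schema.items()
            if key not in UNSUPPORTED_SCHEMA_FIELDS
        }
    if isinstance(schema, list):
        return [strip_unsupported(item) for item in schema]
    return schema
```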
Improve typing of the relevant fields.