Issue
---
When agent personality/instructions are safe, we do not require the
safety agent to give a reason. The safety check agent was told this in
the prompt, but this was not reflected in the JSON schema being used.
The latest openai library started throwing an error if the response
doesn't match the requested JSON schema.
This broke creating/updating agents when using OpenAI models as the
safety agent.
Fix
---
Make the reason field optional.
Also wrap send_message_to_model_wrapper in try/except for a more
readable error stacktrace.
Previously we only showed unsafe prompt errors to the user when
creating/updating an agent. Name collision errors were not shown in
the web app UX.
This change ensures that such validation errors are bubbled up to the
user in the UX, so they can resolve the agent create/update error on
their end.
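A minimal sketch of the intended behavior. The names here (`create_agent`, `AgentValidationError`) are hypothetical, not Khoj's actual API; the point is that the validation message is surfaced instead of swallowed:

```python
class AgentValidationError(ValueError):
    """Hypothetical error type for agent create/update validation."""

def validate_agent_name(name: str, existing: set[str]) -> None:
    # Name collisions previously failed silently in the web app UX.
    if name in existing:
        raise AgentValidationError(f"An agent named '{name}' already exists")

def create_agent(name: str, existing: set[str]) -> dict:
    try:
        validate_agent_name(name, existing)
    except AgentValidationError as e:
        # Bubble the message up so the web app can show it to the user.
        return {"status": "error", "detail": str(e)}
    return {"status": "ok", "name": name}
```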
Count cached tokens and reasoning tokens for better cost estimates for
models served over an OpenAI-compatible API. Previously we didn't
include cached or reasoning tokens in costing.
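The costing change can be sketched as follows. The rates are placeholders, and the assumption that the server reports reasoning tokens separately from completion tokens varies by provider:

```python
# Hypothetical per-million-token rates (USD); actual rates depend on
# the model being served.
INPUT_RATE = 2.50
CACHED_INPUT_RATE = 1.25  # cached prompt tokens are usually discounted
OUTPUT_RATE = 10.00

def estimate_cost(prompt_tokens: int, cached_tokens: int,
                  completion_tokens: int, reasoning_tokens: int) -> float:
    """Estimate request cost, billing cached input tokens at the
    discounted rate and counting reasoning tokens as output."""
    uncached = prompt_tokens - cached_tokens
    input_cost = (uncached * INPUT_RATE + cached_tokens * CACHED_INPUT_RATE) / 1e6
    # Assumes reasoning tokens are reported separately from
    # completion_tokens; some OpenAI-compatible servers fold them in.
    output_cost = (completion_tokens + reasoning_tokens) * OUTPUT_RATE / 1e6
    return input_cost + output_cost
```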
There are faster, better web search and webpage read providers. Only
keep reasonable quality online context providers.
Jina was good for the self-hosting quickstart as it provided a free
API key without login. It does not provide that anymore, and its
latencies are pretty high compared to other online context providers.
The Groq API has stopped supporting the minimum and maximum items
fields in tool schemas. This unexpectedly broke using AI models served
via the Groq API, like Kimi K2 and GPT-OSS, in research mode.
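One way to work around this is to recursively drop the offending fields (`minItems`/`maxItems` in JSON schema terms) from tool schemas before sending them to Groq. This is an illustrative sketch, not necessarily how Khoj fixed it:

```python
UNSUPPORTED_SCHEMA_FIELDS = {"minItems", "maxItems"}

def strip_unsupported_schema_fields(schema):
    """Recursively remove schema fields the Groq API rejects,
    leaving the rest of the tool schema untouched."""
    if isinstance(schema, dict):
        return {
            key: strip_unsupported_schema_fields(value)
            for key, value in schema.items()
            if key not in UNSUPPORTED_SCHEMA_FIELDS
        }
    if isinstance(schema, list):
        return [strip_unsupported_schema_fields(item) for item in schema]
    return schema
```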
Improve typing of relevant fields
Previously eval runs across modes would use different dataset shuffles.
This change enables a strict apples-to-apples perf comparison of the
different Khoj modes across the same (random) subset of questions by
using a dataset seed per workflow run to sample questions.
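The sampling scheme can be sketched as below: seeding the shuffle with a per-run seed makes every mode in that run see the identical question subset, while different runs still vary:

```python
import random

def sample_questions(dataset: list[str], k: int, seed: int) -> list[str]:
    """Draw a reproducible random subset of questions. The same
    (dataset, k, seed) always yields the same subset, so all modes
    in a workflow run are evaluated on identical questions."""
    rng = random.Random(seed)  # isolated RNG; doesn't touch global state
    shuffled = dataset.copy()
    rng.shuffle(shuffled)
    return shuffled[:k]
```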
Instead of implicitly defaulting to assuming it is available, because:
- For a pip install, SearXNG has to be explicitly set up to work
- For a docker install, we already explicitly set it up and set the
  KHOJ_SEARXNG_URL env var
Also check if the SearXNG URL is unset before disabling web search
tools, now that explicit enablement is required.
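A minimal sketch of the check, assuming only the `KHOJ_SEARXNG_URL` env var mentioned above (the `other_search_engine_configured` flag stands in for whatever other providers are configured):

```python
import os

def is_searxng_configured() -> bool:
    """SearXNG is only considered available when its URL is explicitly
    set, instead of implicitly assuming it is running."""
    return bool(os.getenv("KHOJ_SEARXNG_URL"))

def should_disable_web_search(other_search_engine_configured: bool) -> bool:
    """Disable web search tools only when no search backend,
    including SearXNG, is configured."""
    return not (other_search_engine_configured or is_searxng_configured())
```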
Using a prompt cache key enables sticky routing to OpenAI servers.
This increases the probability of a chat actor hitting the same server
and reusing cached prompts.
We use a stable hash of the first N characters to uniquely identify a
chat actor prompt.
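The key derivation can be sketched as below; the prefix length and hash choice are illustrative, not Khoj's exact values. The resulting string would be passed as the `prompt_cache_key` request parameter:

```python
import hashlib

N_CHARS = 1024  # illustrative prefix length

def prompt_cache_key(prompt: str) -> str:
    """Derive a stable key from the prompt prefix, so repeated calls by
    the same chat actor route to the same OpenAI server and can reuse
    its prompt cache. Prompts sharing a prefix share a key."""
    prefix = prompt[:N_CHARS]
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()
```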
Webpage read is gated behind having a web search engine configured for
now. It can later be decoupled from web search and depend on whether
any web scraper is configured.
The new truncation logic returns a new message list.
It does not update the message list by reference/in place since
8a16f5a2a.
So truncation tests should run verification on the truncated chat
history returned by the truncation func instead of the original chat
history passed into it.
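The test pattern this calls for can be shown with a toy stand-in for the truncation func (not Khoj's actual implementation):

```python
def truncate_messages(messages: list[dict], max_messages: int) -> list[dict]:
    """Toy truncation func: returns a NEW list holding only the most
    recent messages; the input list is left untouched."""
    return messages[-max_messages:]

# Assert on the returned list, not the original history:
history = [{"role": "user", "content": f"msg {i}"} for i in range(5)]
truncated = truncate_messages(history, 2)
assert len(truncated) == 2   # verify the returned list
assert len(history) == 5     # original is NOT modified in place
```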
- It does not support strict mode for JSON schema and tool use
- It likes text content to be a plain string, not nested in a dictionary
- Verified to work with GPT-OSS models on Cerebras
The Responses API is starting to be supported by other AI APIs as
well. This change makes preparatory improvements to ease moving to the
Responses API with other AI APIs.
Use the new, better named `supports_responses_api` method.
The method currently just maps to `is_openai_api`. It will cover other
AI APIs once support for using the Responses API with them is added.
- Fix identifying GPT-OSS as an OpenAI reasoning model
- Drop the unsupported stop param for OpenAI reasoning models
- Drop the "Formatting re-enabled" logic for OpenAI reasoning-only models
We use the Responses API for OpenAI models, and the latest OpenAI
models are hybrid models; they don't seem to need this convoluted
system message to format responses as markdown.
The is_automated_task check isn't required as automations cannot be
created via chat anymore.
Conversation-specific file_filters are extracted directly in document
search, so they don't need to be passed down from the chat API
endpoint.
The context building logic was nearly identical across all model
types.
This change extracts that logic into a shared function and calls it
once in `agenerate_chat_response`, the entrypoint to the converse
methods for all 3 model types.
The main differences handled are:
- The Gemini system prompt had additional verbosity instructions.
  Keep them.
- Pass the system message via the chatml messages list to Anthropic
  and Gemini models as well (like OpenAI models), instead of passing
  it as a separate arg to the chat_completion_* funcs.
  The model-specific message formatters for both already extract the
  system instruction from the messages list. So system messages will
  be automatically extracted within the chat_completion_* funcs and
  passed as the separate arg required by the Anthropic and Gemini API
  libraries.