- It does not support strict mode for JSON schema or tool use
- It prefers text content as a plain string, not nested in a dictionary
- Verified to work with GPT OSS models on Cerebras
The Responses API is starting to be supported by other AI APIs as well.
This change makes preparatory improvements to ease moving to the
Responses API with other AI APIs.
Use the new, better named `supports_responses_api` method.
The method currently just maps to `is_openai_api`. Other AI APIs will be
added once support for using the Responses API with them is added.
- Fix identifying gpt-oss as an OpenAI reasoning model
- Drop the unsupported stop param for OpenAI reasoning models
- Drop the "Formatting re-enabled" logic for OpenAI reasoning-only models
We use the Responses API for OpenAI models, and the latest OpenAI
models are hybrid models; they don't seem to need this convoluted
system message to format responses as markdown
The is_automated_task check isn't required as automations cannot be
created via chat anymore.
Conversation-specific file_filters are extracted directly in document
search, so they don't need to be passed down from the chat API endpoint
The context building logic was nearly identical across all model
types.
This change extracts that logic into a shared function and calls it
once in `agenerate_chat_response`, the entrypoint to the converse
methods for all 3 model types.
The main differences handled are:
- The Gemini system prompt had additional verbosity instructions. Keep them
- Pass the system message via the chatml messages list to Anthropic and
Gemini models as well (like OpenAI models) instead of passing it as a
separate arg to the chat_completion_* funcs.
The model-specific message formatters for both already extract the
system instruction from the messages list. So system messages will be
automatically extracted within the chat_completion_* funcs and passed as
the separate arg required by the Anthropic and Gemini API libraries.
Overview
Improve chat speed and cost by setting fast and deep think models for
intermediate steps and non user-facing operations.
Details
- Allow decoupling default chat models from models used for
intermediate steps by setting server chat settings on admin panel
- Use deep think models for most intermediate steps like tool
selection, subquery construction etc. in default and research mode
- Use fast think models for webpage reads, chat title setting etc.
Faster webpage reads should improve conversation latency
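The routing described above could look roughly like this; the operation names, settings keys, and model names are all hypothetical:

```python
# Illustrative operation -> model-tier routing; names are hypothetical.
FAST_OPERATIONS = {"read_webpage", "generate_chat_title"}
DEEP_OPERATIONS = {"select_tool", "construct_subqueries"}


def pick_chat_model(operation: str, settings: dict) -> str:
    """Return the configured model for an intermediate step.

    Falls back to the default chat model when no fast or deep think
    override is set in the (admin-panel style) server chat settings.
    """
    if operation in FAST_OPERATIONS and settings.get("fast_model"):
        return settings["fast_model"]
    if operation in DEEP_OPERATIONS and settings.get("deep_model"):
        return settings["deep_model"]
    return settings["default_model"]
```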
What
Explicit selection of the notes tool/conversation command by the agent
is now required.
Why
- Newer models are good at deciding when to look up notes
- Modern khoj is less of a notes-only chat that searches notes by default
generated_files wasn't being set (anymore?). But it was being passed
around for chat context and saved to the db.
Also reduce the variables used to set the mermaid diagram description
- Process chat history in default order instead of in reverse. This
improves the legibility of context construction for a minor performance
hit from dropping messages from the front of the list.
- Handle multiple system messages by collating them into list
- Remove logic to drop the system role for gemma-2 and o1 models. Better
to keep the code readable than support old models.
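The default-order pass can be sketched as below; the function name and the truncation limit are illustrative:

```python
def build_context(history: list[dict], max_messages: int) -> list[dict]:
    """Build chat context in default (oldest-first) order.

    Walking history front-to-back reads better than the old reversed pass;
    the trade-off is that truncation drops from the front of the list,
    an O(n) removal instead of an O(1) pop from the end.
    """
    context: list[dict] = []
    for message in history:
        context.append(message)
        if len(context) > max_messages:
            context.pop(0)  # minor perf hit vs. popping from the end
    return context
```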
Use a seed to stabilize image changes across turns when
- the KHOJ_LLM_SEED env var is set
- using image models via Replicate
OpenAI and Google do not support image seeds
Inferred queries are stored with an underscore in the db but aliased
with a hyphen in memory.
This conversation.messages logic was broken, so the inferred queries
field of the chat message history was getting ignored.
This change fixes that issue and improves the previous image generation
description for better context in subsequent image generation attempts.
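One way to tolerate both spellings when reading chat history; the helper name and exact dict shape are illustrative, not the actual conversation.messages code:

```python
def normalize_intent(intent: dict) -> dict:
    """Accept either key form for inferred queries.

    The field is stored as inferred_queries in the db but appears as
    inferred-queries in in-memory chat message dicts; looking up only one
    spelling silently drops the other, which is the kind of bug fixed here.
    """
    queries = intent.get("inferred-queries") or intent.get("inferred_queries") or []
    return {**intent, "inferred-queries": list(queries)}
```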
Overview
- Khoj references files it used in its response as markdown links.
For example [1](file://path/to/file.txt#line=121)
- Previously these file links were just shown as raw text
- This change renders khoj's inline file references as proper links
and shows a file content preview (around the specified line if it is a
deeplink) on hover or click in the web app
Details
- Render inline file references as links in chat messages on the web app.
Previously references like [1](file://path/to/file.txt#line=120)
would be shown as plain text. Now they are rendered as links
- Preview file content of referenced files on click or hover.
If a reference uses a deeplink with a line number, the file content
around that line is shown on hover or click. Click allows viewing the
file preview on mobile, unlike hover; hover is easier with a mouse.
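The rendering itself lives in the web app, but the reference shape can be illustrated with a small parser; the regex and helper names are illustrative, not the app's actual code:

```python
import re

# Matches inline file references like [1](file://path/to/file.txt#line=120);
# captures the reference number, file path, and optional line deeplink.
FILE_REFERENCE = re.compile(r"\[(\d+)\]\(file://([^)#]+)(?:#line=(\d+))?\)")


def parse_file_references(message: str) -> list[dict]:
    """Extract khoj-style inline file references from a chat message."""
    return [
        {"ref": int(num), "path": path, "line": int(line) if line else None}
        for num, path, line in FILE_REFERENCE.findall(message)
    ]
```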
Fixes
- Fix to allow khoj to delete content in obsidian write mode
- Do not throw error when no edit blocks in write mode on obsidian
- Limit retries to fix invalid edit blocks in obsidian write mode
Improvements
- Only show 3 recent files as context in obsidian file read, write mode
- Persist open file access mode setting across restarts in obsidian
- Make khoj obsidian keyboard shortcuts toggle voice chat, chat history
- Do not show <SYSTEM> instructions in chat session title on obsidian
Closes #1209
In obsidian we pass a hacky system instruction in the read and write
file access modes. This shouldn't be shown in the chat sessions list
during view or edit. It is an internal implementation detail.
Previously hitting the voice chat keybinding would just start voice
chat, not end it, and just open chat history, not close it.
This is unintuitive and differs from the equivalent button click
behaviors.
The fix toggles voice chat on/off and shows/hides chat history when the
Ctrl+Alt+V and Ctrl+Alt+O keybindings are hit in the khoj obsidian chat
view
Better support for GPT OSS
- Tune reasoning effort, temp, top_p for gpt-oss models
- Extract thoughts of openai style models like gpt-oss from api response
Tool use improvements
- Improve view file, code tool prompts. Format other research tool prompts
- Truncate long words in code tool stdout, stderr for context efficiency
- Use instruction instead of query as code tool argument
- Simplify view file tool. Limit viewing up to 50 lines at a time
- Make regex search tool results look more like grep results
- Update khoj personality prompts with better style, capability guide
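The stdout/stderr truncation could be as simple as the sketch below; the helper name and word-length limit are illustrative:

```python
def truncate_long_words(text: str, max_word_length: int = 100) -> str:
    """Shorten overlong words (e.g. base64 blobs in code tool stdout/stderr)
    to keep tool output context-efficient; the limit is illustrative."""
    words = []
    for word in text.split(" "):
        if len(word) > max_word_length:
            word = word[:max_word_length] + "…"
        words.append(word)
    return " ".join(words)
```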
Web UX improvements
- Wrap long words in train of thought shown on web app
- Do not overwrite charts created in previous code tool use during research
- Update the web UX on server-side errors or when stop is hit with no task running
Fix AI API Usage
- Use subscriber type specific context window to generate response
- Fix max thinking budget for gemini models to generate final response
- Fix passing temp kwarg to non-streaming openai completion endpoint
- Handle unset reasoning, response chunk from openai api while streaming
- Fix using non-reasoning openai model via responses API
- Fix to calculate usage from openai api streaming completion
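Handling unset usage while streaming can be sketched as below; this assumes OpenAI-style chunks where, with stream_options={"include_usage": True}, only the final chunk carries usage (earlier chunks have usage=None). The helper name is illustrative:

```python
def accumulate_stream_usage(chunks: list) -> dict:
    """Collect token usage from an OpenAI-style streaming completion.

    The OpenAI API sends usage on the final chunk only when
    stream_options={"include_usage": True} is set, so the counter must
    tolerate unset usage instead of reading it from every chunk.
    """
    usage = {"prompt_tokens": 0, "completion_tokens": 0}
    for chunk in chunks:
        chunk_usage = getattr(chunk, "usage", None)
        if chunk_usage:
            usage["prompt_tokens"] += chunk_usage.prompt_tokens
            usage["completion_tokens"] += chunk_usage.completion_tokens
    return usage
```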
- Add more color to personality and communication style
- Split prompt into capabilities and style sections
- Remove directives in personality meant for older, less smart models.
- Discourage model from unnecessarily sharing code snippets in final
response unless explicitly requested.