5134 Commits

Author SHA1 Message Date
Debanjum
2091044db5 Prefer agent chat model to extract document search queries
Make chat model preference order for document search consistent with
all other tools.
2025-08-27 13:55:50 -07:00
Debanjum
7a42042488 Share context builder for chat final response across model types
The context building logic was nearly identical across all model
types.

This change extracts that logic into a shared function and calls it
once in the `agenerate_chat_response', the entrypoint to the converse
methods for all 3 model types.

Main differences handled are
- Gemini system prompt had additional verbosity instructions. Keep it
- Pass system messsage via chatml messages list to anthropic, gemini
  models as well (like openai models) instead of passing it as
  separate arg to chat_completion_* funcs.

  The model specific message formatters for both already extract
  system instruction from the messages list. So system messages wil be
  automatically extracted from the chat_completion_* funcs to pass as
  separate arg required by anthropic, gemini api libraries.
2025-08-27 13:48:33 -07:00
Debanjum
02e220f5f5 Pass args to context builder funcs grouped consistently
Put context params together, followed by model params
Use consistent ordering to improve readability
2025-08-27 13:45:36 -07:00
Debanjum
4976b244a4 Set fast, deep think models for intermediary steps via admin panel
Overview
Enable improving speed and cost of chat by setting fast, deep think
models for intermediate steps and non user facing operations.

Details
- Allow decoupling default chat models from models used for
  intermediate steps by setting server chat settings on admin panel
- Use deep think models for most intermediate steps like tool
  selection, subquery construction etc. in default and research mode
- Use fast think models for webpage read, chat title setting etc.
  Faster webpage read should improve conversation latency
2025-08-27 13:45:36 -07:00
Debanjum
a99eb841ff Do not search documents when default tool selected by agent
What
Explicit selection of notes tool/conversation command by agent is
required now.

Why
- Newer models are good at deciding when to look up notes
- Modern khoj is less of a notes only chat to search notes by default
2025-08-27 13:45:28 -07:00
Debanjum
a52a06ad9d Prefer olostep over firecrawl for webpage read by default
Default to Olostep as faster and higher webpage read success rate.
Fallback logic will use Firecrawl if Olostep fails.
2025-08-27 13:45:09 -07:00
Debanjum
e150dc5a91 Improve copying message with math, file links to clipboard on web app 2025-08-27 13:45:09 -07:00
Debanjum
15d1f39d0b Improve instruction to ai model for writing math expressions in LaTeX 2025-08-27 13:45:09 -07:00
Debanjum
05dbb6a7c1 Drop unused generated_files arg from chat context
generated_files wasn't being set (anymore?). But it was being passed
around through for chat context and being saved to db.

Also reduce variables used to set mermaid diagram description
2025-08-27 13:45:09 -07:00
Debanjum
8a16f5a2af Reduce logical complexity of constructing context from chat history
- Process chat history in default order instead of processing it in
  reverse. Improve legibility of context construction for minor
  performance hit in dropping message from front of list.
- Handle multiple system messages by collating them into list
- Remove logic to drop system role for gemma-2, o1 models. Better to
  make code more readable than support old models.
2025-08-27 13:43:10 -07:00
Debanjum
1e81b51abc Support generating images with different aspect ratios
You can now specify shape of images to be generated. It can be one of
portrait, landscape or square.
2025-08-27 13:43:10 -07:00
Debanjum
5a2cae3756 Improve, simplify image generation prompts and context flow
Use seed to stabilize image change consistency across turns when
- KHOJ_LLM_SEED env var is set
- Using Image models via Replicate
  OpenAI, Google do not support image seed
2025-08-27 13:43:10 -07:00
Debanjum
0fb6020f30 Remove model type check to construct structured messages
All model types use a normalized, chatml structured message format
This check isn't used since offline model support was dropped.
2025-08-27 13:43:10 -07:00
Debanjum
386a17371d Fix identifying deepseek r1 model to process its thinking tokens 2025-08-27 13:43:04 -07:00
Debanjum
ff004d31ef Fix extracting inferred queries from chat history db
Inferred queries is stored with underscore in db but aliased with - in memory.

This conversation.messages logic was broken, so inferred queries field
of chat message history was getting ignored.

This change fixes that issue and improve previous image generation
description for better context for subsequent image generation attempts.
2025-08-25 14:19:27 -07:00
Debanjum
892e4d4077 Fix system prompt construction for gemini models
System prompt was duplicating instructions for gemini models
previously
2025-08-24 18:22:31 -07:00
Debanjum
00c5aec614 Release Khoj version 2.0.0-beta.14 2025-08-23 12:12:33 -07:00
Debanjum
b99ccbc4c3 Improve table styling, fix chat sidebar height on web app 2025-08-23 02:05:50 -07:00
Debanjum
29ae476a26 Use groq with service tier auto to fallback to flex on rate limit
Merge gpt-oss config with openai reasoning config as similar tuning.
Add pricing for gpt oss 20b model
2025-08-23 02:05:50 -07:00
Debanjum
c89c5c7b46 Reorder ai model api columns on admin panel for readability 2025-08-23 01:40:05 -07:00
Debanjum
464c1546b7 Support deepseek v3.1 via official deepseek api
The new deepseek-chat is powered by deepseek v3.1, which is a hybrid
reasoning model unlike it's predecessor, deepseek v3.
2025-08-23 01:40:05 -07:00
Debanjum
40488b3b68 Remove redundant exception for retry calls to gemini api
httpx ReadError inherits from NetworkError so not required to mention
it explicitly in gemini api call retry check
2025-08-23 00:48:10 -07:00
Debanjum
8aa9c0f534 Reduce max reasoning tokens for gemini models
A high reasoning tokens does not seem to help for standard Khoj use
cases. And hopefully reducing it may avoid repetition loops by model.
2025-08-23 00:48:10 -07:00
Debanjum
2823c84bb4 Default to gemini 2.5 model series on init and for eval 2025-08-22 20:34:38 -07:00
Debanjum
c53a70c997 Share debug logs from github eval run for debugging 2025-08-22 19:06:37 -07:00
Debanjum
e2f377c27b Render file reference as link with file preview on hover/click in web app
Overview
- Khoj references files it used in its response as markdown links.
  For example [1](file://path/to/file.txt#line=121)
- Previously these file links were just shown as raw text
- This change renders khoj's inline file references as a proper links
  and shows file content preview (around specified line if deeplink)
  on hover or click in the web app

Details
- Render inline file references as links in chat message on web app.
  Previously references like [1](file://path/to/file.txt#line=120)
  would be shown as plain text. Now they are rendered as links
- Preview file content of referenced files on click or hover.
  If reference uses a deeplink with line number, the file content
  around that line is shown on hover, click. Click allows viewing file
  preview on mobile, unlike hover. Hover is easier with mouse.
2025-08-22 18:24:27 -07:00
Debanjum
d8b7e9c8a5 Handle unset content type key when indexing knowledge base on server 2025-08-22 18:24:27 -07:00
Debanjum
3c3205bb06 Fix and improve file read, write handling in Obsidian
Fixes
- Fix to allow khoj to delete content in obsidian write mode
- Do not throw error when no edit blocks in write mode on obsidian
- Limit retries to fix invalid edit blocks in obsidian write mode

Improvements
- Only show 3 recent files as context in obsidian file read, write mode
- Persist open file access mode setting across restarts in obsidian
- Make khoj obsidian keyboard shortcuts toggle voice chat, chat history
- Do not show <SYSTEM> instructions in chat session title on obsidian

Closes #1209
2025-08-20 20:20:12 -07:00
Debanjum
48ed7afab8 Do not show <SYSTEM> instructions in chat session title on obsidian
In obsidian we have a hacky system instruction being passed in read,
write file access modes. This shouldn't be shown in chat sessions list
during view or edit. It is an internal implementation detail.
2025-08-20 20:18:27 -07:00
Debanjum
82dc7b115b Fix to allow khoj to delete content in obsidian write mode
Previous regex and replacement logic did not allow replace block to be
empty
2025-08-20 20:18:27 -07:00
Debanjum
7645cbea3b Do not throw error when no edit blocks in write mode on obsidian
Editing is an option, not a requirement in file write/edit mode.
2025-08-20 20:18:27 -07:00
Debanjum
2e6928c582 Limit retries to fix invalid edit blocks in obsidian write mode 2025-08-20 20:18:27 -07:00
Debanjum
c5e2373d73 Make khoj obsidian keyboard shortcuts toggle voice chat, chat history
Previously hitting voice chat keybinding would just start voice chat,
not end it and just open chat history and not close it.

This is unintuitive and different from the equivalent button click
behaviors.

Fix toggles voice chat on/off and shows/hides chat history when hit
Ctrl+Alt+V, Ctrl+Alt+O keybindings in khoj obsidian chat view
2025-08-20 20:18:27 -07:00
Debanjum
d8b2df4107 Only show 3 recent files as context in obsidian file read, write mode
Related #1209
2025-08-20 20:18:27 -07:00
Debanjum
eb2f0ec6bc Persist open file access mode setting across restarts in obsidian
Allows a lightweight mechanism to persist this user preference.
Improve hover text a bit for readability.

Resolves #1209
2025-08-20 20:18:27 -07:00
Debanjum
2884853c98 Make plugin object accessible to chat, find similar panes in obsidian
Allows ability to access, save settings in a cleaner way
2025-08-20 20:18:27 -07:00
Debanjum
9f6aa922a2 Improve Khoj research tools, gpt-oss support and ai api usage
Better support for GPT OSS
- Tune reasoning effort, temp, top_p for gpt-oss models
- Extract thoughts of openai style models like gpt-oss from api response

Tool use improvements
- Improve view file, code tool prompts. Format other research tool prompts
- Truncate long words in code tool stdout, stderr for context efficiency
- Use instruction instead of query as code tool argument
- Simplify view file tool. Limit viewing upto 50 lines at a time
- Make regex search tool results look more like grep results
- Update khoj personality prompts with better style, capability guide

Web UX improvements
- Wrap long words in train of thought shown on web app
- Do not overwrite charts created in previous code tool use during research
- Update web UX when server side error or hit stop + no task running

Fix AI API Usage
- Use subscriber type specific context window to generate response
- Fix max thinking budget for gemini models to generate final response
- Fix passing temp kwarg to non-streaming openai completion endpoint
- Handle unset reasoning, response chunk from openai api while streaming
- Fix using non-reasoning openai model via responses API
- Fix to calculate usage from openai api streaming completion
2025-08-20 20:06:18 -07:00
Debanjum
13d26ae8b8 Wrap long words in train of thought shown on web app 2025-08-20 19:07:28 -07:00
Debanjum
fb0347a388 Truncate long words in stdout, stderr for context efficiency
Avoid long base64 images etc. in stdout, stderr to result in context
limits being hit.
2025-08-20 19:07:28 -07:00
Debanjum
dbc3330610 Tune reasoning effort, temp, top_p for gpt-oss models 2025-08-20 19:07:28 -07:00
Debanjum
83d725d2d8 Extract thoughts of openai style models like gpt-oss from api response
They use delta.reasoning instead of delta.reasoning_content to share
model reasoning
2025-08-20 19:07:28 -07:00
Debanjum
f483a626b8 Simplify view file tool. Limit viewing upto 50 lines at a time
We were previously truncating by characters. Limiting by max lines
allows model to control line ranges they request
2025-08-20 19:07:28 -07:00
Debanjum
f5a4d106d1 Use instruction instead of query as code tool argument 2025-08-20 19:07:28 -07:00
Debanjum
c5a9c81479 Update khoj personality prompts with better style, capability guide
- Add more color to personality and communication style
- Split prompt into capabilities and style sections
- Remove directives in personality meant for older, less smart models.
- Discourage model from unnecessarily sharing code snippets in final
  response unless explicitly requested.
2025-08-20 19:07:28 -07:00
Debanjum
2c91edbb25 Improve view file, code tool prompts. Format other research tool prompts 2025-08-20 19:07:28 -07:00
Debanjum
452c794e93 Make regex search tool results look more like grep results 2025-08-20 19:07:28 -07:00
Debanjum
9a8c707f84 Do not overwrite charts created in previous code tool use during research 2025-08-20 19:07:28 -07:00
Debanjum
e0007a31bb Update web UX when server side error or hit stop + no task running
- Ack websocket interrupt even when no task running
  Otherwise chat UX isn't updated to indicate query has stopped
  processing for this edge case

- Mark chat request as not being procesed on server side error
2025-08-20 19:07:28 -07:00
Debanjum
222cc19b7f Use subscriber type specific context window to generate response 2025-08-20 19:07:28 -07:00
Debanjum
ff73d30106 Fix max thinking budget for gemini models to generate final response 2025-08-20 19:07:28 -07:00