Commit Graph

5117 Commits

Author SHA1 Message Date
Debanjum
20347e21c2 Reduce noisy indexing logs 2025-08-12 12:06:43 -07:00
Debanjum
bd82626084 Release Khoj version 2.0.0-beta.13 2025-08-11 22:29:06 -07:00
Debanjum
cbeefb7f94 Update researcher prompt to handle ambiguous queries. Clear stale text
Make researcher handle ambiguous requests better by working with
reasonable assumptions (clearly communicated to the user) instead of
burdening the user with clarification requests.

Fix portions of the researcher prompt that had gone stale since moving
to tool use and making researcher more task (vs q&a) oriented
2025-08-11 22:28:47 -07:00
Debanjum
0a6d87067d Fix to have researcher let the coder tool write code
Previously the researcher was passing the whole code to execute in its
queries to the tool AI instead of asking it to write the code and
limiting its query to a natural language request (with required data).

The division of responsibility should help researcher just worry about
constructing a request with all the required details instead of also
worrying about writing correct code.
2025-08-11 22:28:47 -07:00
Debanjum
0186403891 Limit retry to transient openai API errors. Return non-empty tool output 2025-08-11 21:53:21 -07:00
Debanjum
41f89cf7f3 Handle price, responses of models served via Groq
Their tool call response may not strictly follow expected response
format. Let researcher handle incorrect arguments to the code tool
(i.e. ones that trigger a type error).
2025-08-11 19:32:41 -07:00
Debanjum
b2d26088dc Use openai responses api to interact with official openai models
What
- Get reasoning of openai reasoning models from the responses api for show
- Improve cache hits and reasoning reuse for iterative agents like
  research mode.

This should improve speed, quality, cost and transparency of using
openai reasoning models.

More cache hits and better reasoning as reasoning blocks are included
while the model is researching (reasoning interspersed with tool calls)
when using the responses api.
2025-08-09 14:03:24 -07:00
Debanjum
564adb24a7 Add support for GPT 5 model series 2025-08-09 14:03:13 -07:00
Debanjum
0e1615acc8 Fix grep files tool to work with line start, end anchors
Previously line start, end anchors would only work if the whole file
started or ended with the regex pattern rather than matching by line.

Fix it to work like a standard grep tool and match by line start, end.
2025-08-09 12:29:35 -07:00
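
The anchor fix above can be sketched in Python (a hedged illustration, not the actual grep tool code): compiling the pattern with `re.MULTILINE` makes `^` and `$` match at each line boundary instead of only the start and end of the whole file.

```python
import re

text = "alpha one\nbeta two\nalpha three\n"

# Buggy behavior: without re.MULTILINE, ^ only matches the start of the
# whole file and $ only its end, so no full-line match is found here.
whole_file = re.findall(r"^alpha.*$", text)

# Fixed behavior: re.MULTILINE anchors ^ and $ at every line boundary,
# matching standard grep semantics.
per_line = re.findall(r"^alpha.*$", text, re.MULTILINE)
```

Here `whole_file` comes back empty while `per_line` finds both lines starting with "alpha".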
Debanjum
a79025ee93 Limit max queries allowed per doc search tool call. Improve prompt
Reduce usage of boolean operators like "hello OR bye OR see you" which
don't work and reduce search quality. The model was trying to stuff the
search query with multiple different queries.
2025-08-09 12:29:35 -07:00
Debanjum
a3bb7100b4 Speed up app development using a faster, modern toolchain (#1196)
## Overview
Speed up app install and development using a faster, modern development
toolchain

## Details
### Major
- Use [uv](https://docs.astral.sh/uv/) for faster server install (vs
pip)
- Use [bun](https://bun.sh/) for faster web app install (vs yarn)
- Use [ruff](https://docs.astral.sh/ruff/) for faster formatting of
server code (vs black, isort)
- Fix devcontainer builds. See if uv and bun can speed up server and
client installs

### Minor
- Format web app with prettier and server with ruff. This is most of the
file changes in this PR.
- Simplify copying web app built files in pypi workflow to make it less
flaky.
2025-08-09 12:27:20 -07:00
Debanjum
80cce7b439 Fix server, web app to reuse prebuilt deps on dev container setup 2025-08-01 23:36:13 -07:00
Debanjum
0a0b97446c Avoid `click' v8.2.2 server dependency as it breaks pypi validation
Refer pallets/click issue 3024 for details
2025-08-01 23:36:13 -07:00
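
The version exclusion could look like this in pyproject.toml (a sketch; the version floor and surrounding spec are assumptions, the exclusion of 8.2.2 is from the commit):

```toml
[project]
dependencies = [
    # click 8.2.2 breaks pypi validation (see pallets/click issue 3024),
    # so exclude that single release while allowing other versions.
    "click>=8.1,!=8.2.2",
]
```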
Debanjum
f2bd07044e Speed up github workflows by not installing cuda server dependencies
- CI runners don't have GPUs
- Pytorch related Nvidia cuda packages are not required for testing,
  evals or pre-commit checks.
- Avoiding these massive downloads should speed up workflow run.
2025-08-01 23:35:08 -07:00
Debanjum
8ad38dfe11 Switch to Bun instead of Deno (or Yarn) for faster web app builds 2025-08-01 03:00:43 -07:00
Debanjum
b86430227c Dedupe and move dev dependencies out from web app production builds 2025-08-01 00:28:39 -07:00
Debanjum
791ebe3a97 Format web app code with prettier recommendations
Too many of these warnings had accumulated earlier from being ignored.
Fixed to make build logs less noisy.
2025-08-01 00:28:39 -07:00
Debanjum
c8e07e86e4 Format server code with ruff recommendations 2025-08-01 00:28:17 -07:00
Debanjum
4a3ed9e5a4 Replace isort, black with ruff for faster linting, formatting 2025-08-01 00:01:34 -07:00
Debanjum
8700fb8937 Use UV, Deno for faster setup of development container 2025-08-01 00:01:34 -07:00
Debanjum
d2940de367 Use Deno for speed, package locks in dev setup, github workflows
It's faster than yarn and comes with standard convenience utilities
2025-08-01 00:01:34 -07:00
Debanjum
006b958071 Use UV to install server for speed, package locks in dev setup, workflows
It's much faster than pip, includes dependency locks via uv.lock and
comes with standard convenience utilities (e.g pipx, venv replacement)
2025-08-01 00:01:34 -07:00
Debanjum
e0f363d718 Use UV to manage python version, env on khoj computer
- Use khoj username on khoj's computer
- Uv is much faster for builds
2025-07-31 18:31:24 -07:00
Debanjum
0387b86a27 Use portable comparator to get flags used to call dev_setup.sh 2025-07-31 18:31:24 -07:00
Debanjum
c6670e815a Drop Server Side Indexer, Native Offline Chat, Old Migration Scripts (#1212)
### Overview
Make server leaner to increase development speed. 
Remove old indexing code and the native offline chat which was hard to
maintain.

- The native offline chat module was written when the local ai model api
ecosystem wasn't mature. Now it is. Reuse that.
- Offline chat requires GPU for usable speeds. Decoupling offline chat
from Khoj server is the recommended way to go for practical inference
speeds (e.g Ollama on machine, Khoj in docker etc.)

### Details
- Drop old code to index files on server filesystem. Clean cli, init
paths.
- Drop native offline chat support with llama-cpp-python. 
  Use established local ai APIs like Llama.cpp Server, Ollama, vLLM etc.
- Drop old pre 1.0 khoj config migration scripts
- Update test setup to index test data after old indexing code removed.
2025-07-31 20:26:08 -05:00
Debanjum
892d57314e Update test setup to index test data after old indexing code removed
- Delete tests testing deprecated server side indexing flows
- Delete `Local(Plaintext|Org|Markdown|Pdf)Config' methods, files and
  references in tests
- Index test data via new helper method, `get_index_files'
  - It is modelled after the old `get_org_files' variants in main app
  - It passes the test data in required format to `configure_content'
    Allows maintaining the more realistic tests from before while
    using the new indexing mechanism (rather than the deprecated server
    side indexing mechanism)
2025-07-31 18:25:32 -07:00
Debanjum
d9d24dd638 Drop old code to sync files on server filesystem. Clean cli, init paths
This stale code was originally used to index files on the server file
system directly by the server. We currently push files to sync via API.

Server side syncing of remote content like Github and Notion is still
supported. But old, unused code for server side sync of files on
server fs is being cleaned out.

New --log-file cli arg allows specifying where the khoj server should
store logs on fs. This replaces the --config-file cli arg that was
only being used as a proxy for deciding where to store the log file.

- TODO
  - Tests are broken. They were relying on the server side content
    syncing for test setup
2025-07-31 18:25:32 -07:00
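
The new flag might be wired up roughly like this (a minimal argparse sketch, not the actual khoj cli code; the default path shown is an assumption):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="khoj")
    # --log-file replaces the old --config-file arg, which was only
    # being used as a proxy for deciding where to store the log file.
    parser.add_argument(
        "--log-file",
        default="~/.khoj/khoj.log",  # hypothetical default, for illustration
        help="Path where the khoj server should store logs on fs",
    )
    return parser

args = build_parser().parse_args(["--log-file", "/tmp/khoj.log"])
```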
Debanjum
b1f2737c9a Drop native offline chat support with llama-cpp-python
It is recommended to chat with open-source models by running an
open-source server like Ollama, Llama.cpp on your GPU powered machine
or use a commercial provider of open-source models like DeepInfra or
OpenRouter.

These chat model serving options provide a mature Openai compatible
API that already works with Khoj.

Directly using offline chat models only worked reasonably with pip
install on a machine with GPU. Docker setup of khoj had trouble with
accessing GPU. And without GPU access offline chat is too slow.

Deprecating support for an offline chat provider directly within
Khoj will reduce code complexity and increase development velocity.
Offline models are subsumed under the existing Openai ai model provider.
2025-07-31 18:25:32 -07:00
Debanjum
3f8cc71aca Drop old pre 1.0 khoj config migration scripts
These were used when khoj was configured using khoj.yml file
2025-07-31 18:25:32 -07:00
Debanjum
9096f628d0 Release Khoj version 2.0.0-beta.12 2025-07-31 18:13:17 -07:00
Debanjum
a6923fac76 Improve description of query arg to semantic, web search tool
Clarify that the tool AI will perform a maximum of X sub-queries for
each query passed to it by the manager AI.

Prevents the manager AI from trying to directly pass a list of queries
to the search tool AI. It should pass just a single query.
2025-07-31 18:00:46 -07:00
Debanjum
2e13c9a007 Buffer thought chunks on server side for more performant ws streaming
Send larger thought chunks to improve streaming efficiency and
reduce rendering load on web client.

This rendering load was most evident when using high throughput
models or low compute clients.

The server side message buffering should result in fewer re-renders,
faster streaming and lower compute load on client.

Related commit to buffer message content in fc99f8b37
2025-07-31 18:00:46 -07:00
Debanjum
fba4ad27f7 Extract thought stream from reasoning_content of openai model providers
Grok 3 mini at least sends thoughts in reasoning_content field of
streamed chunk delta. Extract model thoughts from that when available.
2025-07-31 18:00:46 -07:00
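
Extraction from a streamed chunk delta could look roughly like this (hedged sketch; the field layout mirrors openai-style stream deltas, and the fallback behavior is an assumption):

```python
from types import SimpleNamespace

def extract_delta_text(delta) -> tuple[str, str]:
    """Split a streamed chunk delta into (thought, message) text.

    Some providers (e.g. Grok 3 mini) stream model thoughts in a
    reasoning_content field alongside the normal content field.
    """
    thought = getattr(delta, "reasoning_content", None) or ""
    message = getattr(delta, "content", None) or ""
    return thought, message

# Simulate a streamed delta carrying only reasoning text.
delta = SimpleNamespace(reasoning_content="weighing options", content=None)
thought, message = extract_delta_text(delta)
```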
Debanjum
b335f8cf79 Support grok 4 reasoning model 2025-07-31 18:00:46 -07:00
Debanjum
c0db9e4fca Use better, standard default temp, top_p for openai model providers 2025-07-31 18:00:46 -07:00
Debanjum
7ab24d875d Release Khoj version 2.0.0-beta.11 2025-07-31 10:25:42 -07:00
Debanjum
6290d744ea Make code tool write safe code to run in sandbox
- Ask both manager and code gen AI to not run or write
  unsafe code for some safety improvement (over code exec in sandbox).
- Disallow custom agent prompts instructing unsafe code gen
2025-07-31 00:11:50 -07:00
Debanjum
0f953f9ec8 Use Gemini suggested retry backoff if set. Improve gemini error handling 2025-07-30 18:16:16 -07:00
Debanjum
bbc14951b4 Redirect to a better error page on server error 2025-07-30 18:08:07 -07:00
Debanjum
6caa6f4008 Make async call to get agent files from async agent/conversation API
This should avoid the sync_to_async errors thrown by django when
calling the /api/agent/conversation API endpoint
2025-07-30 17:37:54 -07:00
Debanjum
b82d4fe68f Resolve Pydantic deprecation warnings (#1211)
## PR Summary
This PR resolves the deprecation warnings of the Pydantic library, which
you can find in the [CI
logs](https://github.com/khoj-ai/khoj/actions/runs/16528997676/job/46749452047#step:9:142):
```python
PydanticDeprecatedSince20: The `copy` method is deprecated; use `model_copy` instead. See the docstring of `BaseModel.copy` for details about how to handle `include` and `exclude`. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
```
2025-07-25 19:50:57 -05:00
Emmanuel Ferdman
655a1b38f2 Resolve Pydantic deprecation warnings
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-07-25 16:55:00 -07:00
Debanjum
f5d12b7546 Bump desktop app and documentation dependencies 2025-07-25 13:37:45 -05:00
Debanjum
f8924f2521 Avoid duplicate chat turn save if chat cancelled during final response
Save to conversation in normal flow should only be done if
interrupt wasn't triggered.

Saving conversations on interrupt is handled completely by the
disconnect monitor since the improvements to interrupt.

This abort is handled correctly for steps before the final response,
but not if the interrupt occurs while the final response is being
sent. This change checks for cancellation after the final response
send attempt and avoids a duplicate chat turn save.
2025-07-25 13:28:13 -05:00
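
The cancellation check described above might look like this (a minimal sketch with hypothetical names; the real flow is tied into the disconnect monitor):

```python
def finish_chat_turn(send_final_response, save_turn, was_interrupted) -> bool:
    """Send the final response, then save the turn only if not interrupted.

    On interrupt, saving is owned entirely by the disconnect monitor,
    so saving here as well would duplicate the chat turn.
    """
    send_final_response()
    if was_interrupted():
        return False  # disconnect monitor will persist this turn
    save_turn()
    return True

saved = []
finish_chat_turn(lambda: None, lambda: saved.append("turn"), lambda: False)
finish_chat_turn(lambda: None, lambda: saved.append("turn"), lambda: True)
```

Only the uninterrupted call saves, so the turn is persisted exactly once.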
Debanjum
bd9f091a71 Show thoughts of more llm models served via openai compatible api
- Extract llm thoughts from more openai compatible ai api providers
  like llama.cpp server, vllm and litellm.
  - Try structured thought extraction by default
  - Try in-stream thought extraction for specific model families like
    qwen and deepseek.
- Show thoughts with tool use. For intermediate steps like research
  mode from openai compatible models

Some consensus on thoughts in model responses is emerging around
deepseek style thoughts in the structured response (via the
"reasoning_content" field) or qwen style thoughts in the main
response (i.e. <think></think> tags).

Default to try deepseek style structured thought extraction. So the
previous default stream processor isn't required.
2025-07-25 13:28:13 -05:00
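
The qwen-style in-stream extraction can be sketched like this (hypothetical helper; real stream processing works incrementally on chunks rather than over a full string):

```python
import re

def split_thoughts(response: str) -> tuple[str, str]:
    """Separate qwen style <think>...</think> thoughts from the reply."""
    thoughts = "".join(re.findall(r"<think>(.*?)</think>", response, re.DOTALL))
    message = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL)
    return thoughts.strip(), message.strip()

thoughts, message = split_thoughts(
    "<think>check the docs first</think>The answer is 42."
)
```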
Debanjum
624d6227ca Expand to enable deep think for more qwen style models like smollm3 2025-07-25 13:28:13 -05:00
Debanjum
c401bb9591 Strictly enforce tool call schema for llms served via openai compat api
This is required by llama.cpp server and is recommended in general for
openai compatible models
2025-07-25 13:28:13 -05:00
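
A strictly enforced tool definition might look like this (a sketch; the tool name and schema are hypothetical, but strict mode in openai-style function calling constrains generated arguments to the declared JSON schema):

```python
# A tool definition with strict schema enforcement enabled. With
# "strict": True, the server constrains generated arguments to match
# the JSON schema exactly (all properties required, no extras allowed).
search_tool = {
    "type": "function",
    "function": {
        "name": "search_documents",  # hypothetical tool name
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
            },
            "required": ["query"],
            "additionalProperties": False,
        },
    },
}
```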
Debanjum
03c4f614dd Handle tool call requests with openai completion in non stream mode 2025-07-25 13:28:13 -05:00
Debanjum
70cfaf72e9 Only send start llm response chat event once, after thoughts streamed
A previous regression resulted in the start llm response event being
sent with every (non-thought) message chunk. It should only be sent
once after thoughts and before first normal message chunk is streamed.

Regression probably introduced with changes to stream thoughts.

This should fix the chat streaming latency logs.
2025-07-25 13:28:13 -05:00
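
The fix above can be sketched as a flag that fires the start event exactly once, after thoughts and before the first normal message chunk (event and chunk names are illustrative):

```python
def stream_events(chunks):
    """Yield chat events, emitting start_llm_response exactly once.

    The regression sent the start event with every non-thought chunk;
    it should fire only before the first normal message chunk, after
    any thought chunks have streamed.
    """
    started = False
    for kind, text in chunks:
        if kind == "thought":
            yield ("thought", text)
            continue
        if not started:
            yield ("start_llm_response", None)
            started = True
        yield ("message", text)

events = list(stream_events([("thought", "plan"), ("message", "Hi"), ("message", "!")]))
```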
Debanjum
15c6118142 Store event delimiter in chat event enum for reuse 2025-07-25 13:28:13 -05:00