klbr/khoj - khoj - Gitea: Git with a cup of tea

klbr/khoj

mirror of https://github.com/khoaliber/khoj.git synced 2026-03-02 13:18:18 +00:00

Author	SHA1	Message	Date
Debanjum	a79025ee93	Limit max queries allowed per doc search tool call. Improve prompt Reduce usage of boolean operators like "hello OR bye OR see you" which doesn't work and reduces search quality. They're trying to stuff the search query with multiple different queries.	2025-08-09 12:29:35 -07:00
Debanjum	a3bb7100b4	Speed up app development using a faster, modern toolchain (#1196 ) ## Overview Speed up app install and development using a faster, modern development toolchain ## Details ### Major - Use [uv](https://docs.astral.sh/uv/) for faster server install (vs pip) - Use [bun](https://bun.sh/) for faster web app install (vs yarn) - Use [ruff](https://docs.astral.sh/ruff/) for faster formatting of server code (vs black, isort) - Fix devcontainer builds. See if uv and bun can speed up server and client installs ### Minor - Format web app with prettier and server with ruff. This is most of the file changes in this PR. - Simplify copying web app built files in pypi workflow to make it less flaky.	2025-08-09 12:27:20 -07:00
Debanjum	80cce7b439	Fix server, web app to reuse prebuilt deps on dev container setup	2025-08-01 23:36:13 -07:00
Debanjum	0a0b97446c	Avoid `click' v8.2.2 server dependency as it breaks pypi validation Refer pallets/click issue 3024 for details	2025-08-01 23:36:13 -07:00
Debanjum	f2bd07044e	Speed up github workflows by not installing cuda server dependencies - CI runners don't have GPUs - Pytorch related Nvidia cuda packages are not required for testing, evals or pre-commit checks. - Avoiding these massive downloads should speed up workflow run.	2025-08-01 23:35:08 -07:00
Debanjum	8ad38dfe11	Switch to Bun instead of Deno (or Yarn) for faster web app builds	2025-08-01 03:00:43 -07:00
Debanjum	b86430227c	Dedupe and move dev dependencies out from web app production builds	2025-08-01 00:28:39 -07:00
Debanjum	791ebe3a97	Format web app code with prettier recommendations Too many of these had accumulated earlier from being ignored. Changed to make build logs less noisy	2025-08-01 00:28:39 -07:00
Debanjum	c8e07e86e4	Format server code with ruff recommendations	2025-08-01 00:28:17 -07:00
Debanjum	4a3ed9e5a4	Replace isort, black with ruff for faster linting, formatting	2025-08-01 00:01:34 -07:00
Debanjum	8700fb8937	Use UV, Deno for faster setup of development container	2025-08-01 00:01:34 -07:00
Debanjum	d2940de367	Use Deno for speed, package locks in dev setup, github workflows It's faster than yarn and comes with standard convenience utilities	2025-08-01 00:01:34 -07:00
Debanjum	006b958071	Use UV to install server for speed, package locks in dev setup, workflows It's much faster than pip, includes dependency locks via uv.lock and comes with standard convenience utilities (e.g pipx, venv replacement)	2025-08-01 00:01:34 -07:00
Debanjum	e0f363d718	Use UV to manage python version, env on khoj computer - Use khoj username on khoj's computer - Uv is much faster for builds	2025-07-31 18:31:24 -07:00
Debanjum	0387b86a27	Use portable comparator to get flags used to call dev_setup.sh	2025-07-31 18:31:24 -07:00
Debanjum	c6670e815a	Drop Server Side Indexer, Native Offline Chat, Old Migration Scripts (#1212 ) ### Overview Make server leaner to increase development speed. Remove old indexing code and the native offline chat which was hard to maintain. - The native offline chat module was written when the local ai model api ecosystem wasn't mature. Now it is. Reuse that. - Offline chat requires GPU for usable speeds. Decoupling offline chat from Khoj server is the recommended way to go for practical inference speeds (e.g Ollama on machine, Khoj in docker etc.) ### Details - Drop old code to index files on server filesystem. Clean cli, init paths. - Drop native offline chat support with llama-cpp-python. Use established local ai APIs like Llama.cpp Server, Ollama, vLLM etc. - Drop old pre 1.0 khoj config migration scripts - Update test setup to index test data after old indexing code removed.	2025-07-31 20:26:08 -05:00
Debanjum	892d57314e	Update test setup to index test data after old indexing code removed - Delete tests testing deprecated server side indexing flows - Delete `Local(Plaintext\|Org\|Markdown\|Pdf)Config' methods, files and references in tests - Index test data via new helper method, `get_index_files' - It is modelled after the old `get_org_files' variants in main app - It passes the test data in required format to `configure_content' Allows maintaining the more realistic tests from before while using new indexing mechanism (rather than the deprecated server side indexing mechanism	2025-07-31 18:25:32 -07:00
Debanjum	d9d24dd638	Drop old code to sync files on server filesystem. Clean cli, init paths This stale code was originally used to index files on server file system directly by server. We currently push files to sync via API. Server side syncing of remote content like Github and Notion is still supported. But old, unused code for server side sync of files on server fs is being cleaned out. New --log-file cli args allows specifying where khoj server should store logs on fs. This replaces the --config-file cli arg that was only being used as a proxy for deciding where to store the log file. - TODO - Tests are broken. They were relying on the server side content syncing for test setup	2025-07-31 18:25:32 -07:00
Debanjum	b1f2737c9a	Drop native offline chat support with llama-cpp-python It is recommended to chat with open-source models by running an open-source server like Ollama, Llama.cpp on your GPU powered machine or use a commercial provider of open-source models like DeepInfra or OpenRouter. These chat model serving options provide a mature Openai compatible API that already works with Khoj. Directly using offline chat models only worked reasonably with pip install on a machine with GPU. Docker setup of khoj had trouble with accessing GPU. And without GPU access offline chat is too slow. Deprecating support for an offline chat provider directly from within Khoj will reduce code complexity and increase developement velocity. Offline models are subsumed to use existing Openai ai model provider.	2025-07-31 18:25:32 -07:00
Debanjum	3f8cc71aca	Drop old pre 1.0 khoj config migration scripts These were used when khoj was configured using khoj.yml file	2025-07-31 18:25:32 -07:00
Debanjum	9096f628d0	Release Khoj version 2.0.0-beta.12	2025-07-31 18:13:17 -07:00
Debanjum	a6923fac76	Improve description of query arg to semantic, web search tool Clarify that the tool AI will perform a maximum of X sub-queries for each query passed to it by the manager AI. Avoids the manager AI from trying to directly pass a list of queries to the search tool AI. It should just pass just a single query.	2025-07-31 18:00:46 -07:00
Debanjum	2e13c9a007	Buffer thought chunks on server side for more performant ws streaming Send larger thought chunks to improve streaming efficiency and reduce rendering load on web client. This rendering load was most evident when using high throughput models or low compute clients. The server side message buffering should result in fewer re-renders, faster streaming and lower compute load on client. Related commit to buffer message content in `fc99f8b37`	2025-07-31 18:00:46 -07:00
Debanjum	fba4ad27f7	Extract thought stream from reasoning_content of openai model providers Grok 3 mini at least sends thoughts in reasoning_content field of streamed chunk delta. Extract model thoughts from that when available.	2025-07-31 18:00:46 -07:00
Debanjum	b335f8cf79	Support grok 4 reasoning model	2025-07-31 18:00:46 -07:00
Debanjum	c0db9e4fca	Use better, standard default temp, top_p for openai model providers	2025-07-31 18:00:46 -07:00
Debanjum	7ab24d875d	Release Khoj version 2.0.0-beta.11	2025-07-31 10:25:42 -07:00
Debanjum	6290d744ea	Make code tool write safe code to run in sandbox - Ask both manager and code gen AI to not run or write unsafe code for some safety improvement (over code exec in sandbox). - Disallow custom agent prompts instructing unsafe code gen	2025-07-31 00:11:50 -07:00
Debanjum	0f953f9ec8	Use Gemini suggested retry backoff if set. Improve gemini error handling	2025-07-30 18:16:16 -07:00
Debanjum	bbc14951b4	Redirect to a better error page on server error	2025-07-30 18:08:07 -07:00
Debanjum	6caa6f4008	Make async call to get agent files from async agent/conversation API This should avoid the sync_to_async errors thrown by django when calling the /api/agent/conversation API endpoint	2025-07-30 17:37:54 -07:00
Debanjum	b82d4fe68f	Resolve Pydantic deprecation warnings (#1211 ) ## PR Summary This PR resolves the deprecation warnings of the Pydantic library, which you can find in the [CI logs](https://github.com/khoj-ai/khoj/actions/runs/16528997676/job/46749452047#step:9:142): ```python PydanticDeprecatedSince20: The `copy` method is deprecated; use `model_copy` instead. See the docstring of `BaseModel.copy` for details about how to handle `include` and `exclude`. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/ ```	2025-07-25 19:50:57 -05:00
Emmanuel Ferdman	655a1b38f2	Resolve Pydantic deprecation warnings Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>	2025-07-25 16:55:00 -07:00
Debanjum	f5d12b7546	Bump desktop app and documentation dependencies	2025-07-25 13:37:45 -05:00
Debanjum	f8924f2521	Avoid duplicate chat turn save if chat cancelled during final response Save to conversation in normal flow should only be done if interrupt wasn't triggered. Saving conversations on interrupt is handled completely by the disconnect monitor since the improvements to interrupt. This abort is handled correctly for steps before final response. But not if interrupt occurs while final response is being sent. This changes checks for cancellation after final response send attempt and avoids duplicate chat turn save.	2025-07-25 13:28:13 -05:00
Debanjum	bd9f091a71	Show thoughts of more llm models served via openai compatible api - Extract llm thoughts from more openai compatible ai api providers like llama.cpp server vllm and litellm. - Try structured thought extraction by default - Try in-stream thought extraction for specific model families like qwen and deepseek. - Show thoughts with tool use. For intermediate steps like research mode from openai compatible models Some consensus on thought in model response is being reached with using deepseek style thoughts in structured response (via "reasoning_content" field) or qwen style thoughts in main response (i.e <think></think> tags). Default to try deepseek style structured thought extraction. So the previous default stream processor isn't required.	2025-07-25 13:28:13 -05:00
Debanjum	624d6227ca	Expand to enable deep think for more qwen style models like smollm3	2025-07-25 13:28:13 -05:00
Debanjum	c401bb9591	Stricty enforce tool call schema for llm served via openai compat api This is required by llama.cpp server and is recommended in general for openai compatible models	2025-07-25 13:28:13 -05:00
Debanjum	03c4f614dd	Handle tool call requests with openai completion in non stream mode	2025-07-25 13:28:13 -05:00
Debanjum	70cfaf72e9	Only send start llm response chat event once, after thoughts streamed A previous regression resulted in the start llm response event being sent with every (non-thought) message chunk. It should only be sent once after thoughts and before first normal message chunk is streamed. Regression probably introduced with changes to stream thoughts. This should fix the chat streaming latency logs.	2025-07-25 13:28:13 -05:00
Debanjum	15c6118142	Store event delimiter in chat event enum for reuse	2025-07-25 13:28:13 -05:00
Debanjum	fc99f8b37e	Buffer message chunks on server side for more performant ws streaming Send larger message chunks to improve streaming efficiency and reduce rendering load on web client. This rendering load was most evident when using high throughput models, low compute clients and message with images. As message content was rerendered on every token sent to the web app. The server side message buffering should result in fewer re-renders and lower compute load on client.	2025-07-25 13:28:13 -05:00
Debanjum	bf9a9c7283	Disable per user websocket connection limits in anon or debug mode This rate limiting is only relevant in production scenarios.	2025-07-25 12:19:15 -05:00
Debanjum	48e21d9f0f	Release Khoj version 2.0.0-beta.10	2025-07-19 21:32:14 -05:00
Debanjum	e57acf617a	Convert websocket rate limiter to async method Fixes calling websocket rate limiter from async chat_ws method. Not sure why the issue did not trigger in local setups. Maybe has to do with gunicorn vs uvicorn / multi-workers setup in prod vs local.	2025-07-19 21:15:51 -05:00
Debanjum	76a1b0b686	Release Khoj version 2.0.0-beta.9	2025-07-19 20:20:50 -05:00
Debanjum	43d7e65a49	Limit chat message interrupt queue size to limit performance impact	2025-07-19 20:16:53 -05:00
Debanjum	749160e38d	Validate websocket origin before establishing connection	2025-07-19 20:07:21 -05:00
Debanjum	69a7d332fc	Limit number of new websocket connections allowed per user	2025-07-19 20:04:36 -05:00
Debanjum	76ddf8645c	Improve rate limit and interrupt messages for user, admin	2025-07-19 19:13:51 -05:00

1 2 3 4 5 ...

5008 Commits