klbr/khoj - khoj - Gitea: Git with a cup of tea

klbr/khoj

mirror of https://github.com/khoaliber/khoj.git synced 2026-03-02 13:18:18 +00:00

Author	SHA1	Message	Date
Debanjum	a6923fac76	Improve description of query arg to semantic, web search tool Clarify that the tool AI will perform a maximum of X sub-queries for each query passed to it by the manager AI. Avoids the manager AI from trying to directly pass a list of queries to the search tool AI. It should just pass just a single query.	2025-07-31 18:00:46 -07:00
Debanjum	2e13c9a007	Buffer thought chunks on server side for more performant ws streaming Send larger thought chunks to improve streaming efficiency and reduce rendering load on web client. This rendering load was most evident when using high throughput models or low compute clients. The server side message buffering should result in fewer re-renders, faster streaming and lower compute load on client. Related commit to buffer message content in `fc99f8b37`	2025-07-31 18:00:46 -07:00
Debanjum	fba4ad27f7	Extract thought stream from reasoning_content of openai model providers Grok 3 mini at least sends thoughts in reasoning_content field of streamed chunk delta. Extract model thoughts from that when available.	2025-07-31 18:00:46 -07:00
Debanjum	b335f8cf79	Support grok 4 reasoning model	2025-07-31 18:00:46 -07:00
Debanjum	c0db9e4fca	Use better, standard default temp, top_p for openai model providers	2025-07-31 18:00:46 -07:00
Debanjum	7ab24d875d	Release Khoj version 2.0.0-beta.11	2025-07-31 10:25:42 -07:00
Debanjum	6290d744ea	Make code tool write safe code to run in sandbox - Ask both manager and code gen AI to not run or write unsafe code for some safety improvement (over code exec in sandbox). - Disallow custom agent prompts instructing unsafe code gen	2025-07-31 00:11:50 -07:00
Debanjum	0f953f9ec8	Use Gemini suggested retry backoff if set. Improve gemini error handling	2025-07-30 18:16:16 -07:00
Debanjum	bbc14951b4	Redirect to a better error page on server error	2025-07-30 18:08:07 -07:00
Debanjum	6caa6f4008	Make async call to get agent files from async agent/conversation API This should avoid the sync_to_async errors thrown by django when calling the /api/agent/conversation API endpoint	2025-07-30 17:37:54 -07:00
Debanjum	b82d4fe68f	Resolve Pydantic deprecation warnings (#1211 ) ## PR Summary This PR resolves the deprecation warnings of the Pydantic library, which you can find in the [CI logs](https://github.com/khoj-ai/khoj/actions/runs/16528997676/job/46749452047#step:9:142): ```python PydanticDeprecatedSince20: The `copy` method is deprecated; use `model_copy` instead. See the docstring of `BaseModel.copy` for details about how to handle `include` and `exclude`. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/ ```	2025-07-25 19:50:57 -05:00
Emmanuel Ferdman	655a1b38f2	Resolve Pydantic deprecation warnings Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>	2025-07-25 16:55:00 -07:00
Debanjum	f5d12b7546	Bump desktop app and documentation dependencies	2025-07-25 13:37:45 -05:00
Debanjum	f8924f2521	Avoid duplicate chat turn save if chat cancelled during final response Save to conversation in normal flow should only be done if interrupt wasn't triggered. Saving conversations on interrupt is handled completely by the disconnect monitor since the improvements to interrupt. This abort is handled correctly for steps before final response. But not if interrupt occurs while final response is being sent. This changes checks for cancellation after final response send attempt and avoids duplicate chat turn save.	2025-07-25 13:28:13 -05:00
Debanjum	bd9f091a71	Show thoughts of more llm models served via openai compatible api - Extract llm thoughts from more openai compatible ai api providers like llama.cpp server vllm and litellm. - Try structured thought extraction by default - Try in-stream thought extraction for specific model families like qwen and deepseek. - Show thoughts with tool use. For intermediate steps like research mode from openai compatible models Some consensus on thought in model response is being reached with using deepseek style thoughts in structured response (via "reasoning_content" field) or qwen style thoughts in main response (i.e <think></think> tags). Default to try deepseek style structured thought extraction. So the previous default stream processor isn't required.	2025-07-25 13:28:13 -05:00
Debanjum	624d6227ca	Expand to enable deep think for more qwen style models like smollm3	2025-07-25 13:28:13 -05:00
Debanjum	c401bb9591	Stricty enforce tool call schema for llm served via openai compat api This is required by llama.cpp server and is recommended in general for openai compatible models	2025-07-25 13:28:13 -05:00
Debanjum	03c4f614dd	Handle tool call requests with openai completion in non stream mode	2025-07-25 13:28:13 -05:00
Debanjum	70cfaf72e9	Only send start llm response chat event once, after thoughts streamed A previous regression resulted in the start llm response event being sent with every (non-thought) message chunk. It should only be sent once after thoughts and before first normal message chunk is streamed. Regression probably introduced with changes to stream thoughts. This should fix the chat streaming latency logs.	2025-07-25 13:28:13 -05:00
Debanjum	15c6118142	Store event delimiter in chat event enum for reuse	2025-07-25 13:28:13 -05:00
Debanjum	fc99f8b37e	Buffer message chunks on server side for more performant ws streaming Send larger message chunks to improve streaming efficiency and reduce rendering load on web client. This rendering load was most evident when using high throughput models, low compute clients and message with images. As message content was rerendered on every token sent to the web app. The server side message buffering should result in fewer re-renders and lower compute load on client.	2025-07-25 13:28:13 -05:00
Debanjum	bf9a9c7283	Disable per user websocket connection limits in anon or debug mode This rate limiting is only relevant in production scenarios.	2025-07-25 12:19:15 -05:00
Debanjum	48e21d9f0f	Release Khoj version 2.0.0-beta.10	2025-07-19 21:32:14 -05:00
Debanjum	e57acf617a	Convert websocket rate limiter to async method Fixes calling websocket rate limiter from async chat_ws method. Not sure why the issue did not trigger in local setups. Maybe has to do with gunicorn vs uvicorn / multi-workers setup in prod vs local.	2025-07-19 21:15:51 -05:00
Debanjum	76a1b0b686	Release Khoj version 2.0.0-beta.9	2025-07-19 20:20:50 -05:00
Debanjum	43d7e65a49	Limit chat message interrupt queue size to limit performance impact	2025-07-19 20:16:53 -05:00
Debanjum	749160e38d	Validate websocket origin before establishing connection	2025-07-19 20:07:21 -05:00
Debanjum	69a7d332fc	Limit number of new websocket connections allowed per user	2025-07-19 20:04:36 -05:00
Debanjum	76ddf8645c	Improve rate limit and interrupt messages for user, admin	2025-07-19 19:13:51 -05:00
Debanjum	de7668daec	Add websocket chat api to ease bi-directional communication (#1207 ) - Add a websocket api endpoint for chat. Reuse most of the existing chat logic. - Communicate from web app using the websocket chat api endpoint. - Pass interrupt messages using websocket to guide research, operator trajectory Previously we were using the abort and send new POST /api/chat mechanism. This didn't scale well to multi-worker setups as a different worker could pick up the new interrupt message request. Using websocket to send messages in the middle of long running tasks should work more naturally.	2025-07-17 18:06:43 -07:00
Debanjum	b90e2367d5	Fix interrupt UX and research when using websocket via web app	2025-07-17 17:09:21 -07:00
Debanjum	0ecd5f497d	Show more informative title for semantic search train of thought	2025-07-17 17:09:21 -07:00
Debanjum	7b7b1830b7	Make callers only share new messages to append to chat logs - Chat history is retrieved and updated with new messages just before write. This is to reduce chance of message loss due to conflicting writes making last to save to conversation win conflict. - This was problematic artifact of old code. Removing it should reduce conflict surface area. - Interrupts and live chat could hit this issue due to different reasons	2025-07-17 17:09:18 -07:00
Debanjum	eaed0c839e	Use websocket chat api endpoint to communicate from web app - Use websocket library to handle setup, reconnection from web app Use react-use-websocket library to handle websocket connection and reconnection logic. Previously connection wasn't re-established on disconnects. - Send interrupt messages with ws to update research, operator trajectory Previously we were using the abort and send new POST /api/chat mechanism. But now we can use the websocket's bi-directional messaging capability to send users messages in the middle of a research, operator run. This change should 1. Allow for a faster, more interactive interruption to shift the research direction without breaking the conversation flow. As previously we were using the DB to communicate interrupts across workers, this would take time and feel sluggish on the UX. 2. Be a more robust interrupt mechanism that'll work in multi worker setups. As same worker is interacted with to send interrupt messages instead of potentially new worker receiving the POST /api/chat with the interrupt user message. On the server we're using an asyncio Queue to pass messages down from websocket api to researcher via event generator. This can be extended to pass to other iterative agents like operator.	2025-07-17 17:06:55 -07:00
Debanjum	9f0eff6541	Handle passing interrupt messages from api to chat actors on server	2025-07-17 17:06:55 -07:00
Debanjum	38dd85c91f	Add websocket chat api endpoint to ease bi-directional communication	2025-07-17 17:06:55 -07:00
Debanjum	99ed796c00	Release Khoj version 2.0.0-beta.8	2025-07-15 16:42:44 -07:00
Debanjum	0a05a5709e	Use agent chat model to generate code instead of default chat model This is consistent with chat model preference order for other tools	2025-07-15 16:22:29 -07:00
Debanjum	238bd66c42	Fix to map user tool names to equivalent tool sets for research mode Fix using research tool names instead of slash command tool names (exposed to user) in research mode conversation history construction. Map agent input tools to relevant research tools. Previously using agents with a limited set of tools in research mode reduces tools available to agent in research mode. Fix checks to skip tools if not configured.	2025-07-15 16:22:29 -07:00
Debanjum	76ed97d066	Set friendly name for auto loaded chat models during first run The chat model friendly name field was introduced in `a8c47a70f`. But we weren't setting the friendly name for ollama models, which get automatically loaded on first run. This broke setting chat model options, server admin settings and creating new chat pages (at least) as they display the chat model's friendly name. This change ensures the friendly name for auto loaded chat models is set to resolve these issues. We also add a null ref check to web app model selector as an additional safeguard to prevent new chat page crash due to missing friendly name going forward. Resolves #1208	2025-07-15 14:27:04 -07:00
Debanjum	0a06f5b41a	Release Khoj version 2.0.0-beta.7	2025-07-11 00:04:56 -07:00
Debanjum	d42176fa7e	Drop tool call, result without tool id on call to Anthropic, Openai APIs	2025-07-11 00:00:05 -07:00
Debanjum	d27aac7f13	Suppress non-actionable pdf indexing warning from logs	2025-07-11 00:00:05 -07:00
Debanjum	05176cd62b	Log dropping messages with invalid content as warnings, not errors They are expected when conversation got interrupted.	2025-07-11 00:00:05 -07:00
Debanjum	b2952236c4	Log conversation id to help troubleshoot errors faster	2025-07-10 23:56:42 -07:00
Debanjum	25db59e49c	Fix to return openai formatted messages in the correct order We'd reversed the formatting of openai messages to drop invalid messages without affecting the other messages being appended . But we need to reverse the final formatted list to return in the right order.	2025-07-10 23:56:22 -07:00
Debanjum	c8ec29551f	Drop invalid messages in reverse order to continue interrupted chats Previously - message with invalid content were getting dropped in normal order which would change the item index being iterated for gemini and anthropic models - messages with empty content weren't getting dropped for openai compatible api models. While openai api is resilient to this, it's better to drop these invalid messages as other openai compatible APIs may not handle this. We see messages with empty or no content when chat gets interrupted due to disconnections, interrupt messages or explicit aborts by user. This changes should now drop invalid messages and not mess formatting of the other messages in a conversation. It should allow continuing interrupted conversations with any ai model.	2025-07-10 22:39:52 -07:00
Debanjum	f1a3ddf2ca	Release Khoj version 2.0.0-beta.6	2025-07-10 13:41:06 -07:00
Debanjum	7b637d3432	Use document style ux when print conversations to pdf Inspired by my previous turnstyle ux explorations. But basically user message becomes section title and khoj message becomes section body with the timestamp being used a section title, body divider.	2025-07-10 13:27:04 -07:00
Debanjum	c28e90f388	Revert to use standard 1.0 temperature for gemini models Using temp of 1.2 didn't help eliminate the repetition loops the gemini models go into sometimes.	2025-07-09 18:22:05 -07:00

1 2 3 4 5 ...

4937 Commits