klbr/khoj

mirror of https://github.com/khoaliber/khoj.git synced 2026-03-02 21:19:12 +00:00

Author	SHA1	Message	Date
Debanjum	21bf7f1d6d	Continue interrupted operator run with new query and previous context. Track research and operator results at each nested iteration step using python object references + async events bubbled up from nested iterators. Instantiates operator with interrupted operator messages from research or normal mode. Reflects actual interaction trajectory as closely as possible to agent including conversation history, partial operator trajectory and new query for fine-grained, corrigible steerability. Research mode continues with operator tool directly if previous iteration was an interrupted operator run.	2025-05-31 20:51:08 -07:00
Debanjum	de35d91e1d	Pass previous trajectory to operator agents for context	2025-05-31 20:51:08 -07:00
Debanjum	864e0ac8b5	Simplify research iteration and main research function names	2025-05-31 20:51:08 -07:00
Debanjum	6c9d569a22	Fix to get user questions in chat history from user, not khoj, message. Since partial state reload after interrupt drops Khoj messages, the assumption that there will always be a Khoj message after a user message is broken. That is, there can now be multiple user messages preceding a Khoj message. This change allows user queries to still be extracted for chat history even if no Khoj message follows.	2025-05-31 20:51:08 -07:00
Debanjum	b6aa77a6f5	Lookback 3 previous turns to select next tool, for questions history	2025-05-31 20:50:03 -07:00
Debanjum	d511cbfa34	Extract constructing question history into shared function for reuse. Minor logic update to only include non image inferred queries for gemini, anthropic models as well instead of just for openai models. Apart from that, the extracted function should be functionally the same.	2025-05-31 16:50:26 -07:00
Debanjum	da663e184c	Type operator results. Enable storing, loading operator trajectories. We were passing operator results as a simple dictionary. Strongly typing it makes sense as operator results become more complex. Storing operator results with trajectory on interrupts will allow restarting an interrupted operator run with agent messages of the interrupted trajectory loaded into operator agents.	2025-05-31 16:50:26 -07:00
Debanjum	675fc0ad05	Decouple trajectory compression from `act`. Reuse func to call llm api	2025-05-31 16:50:26 -07:00
Debanjum	b027024c42	Handle failed operator agent calls to anthropic api more gracefully Add anthropic operator api call errors to trajectory instead of erroring out of current operator run	2025-05-31 16:50:26 -07:00
Debanjum	d54bfc19e5	Add trajectory compression to anthropic operator agent - Add compression parameters to base operator agent for reuse - Increase default operator iterations	2025-05-31 16:50:26 -07:00
Debanjum	cb451fa67c	Put default summarize prompt into operator agent This allows: - Each operator agent to own its summarization prompt. That it can tune if it wants - The outer operator loop to pass an override summarize prompt when it invokes the summarize func but it does not have to	2025-05-31 16:50:26 -07:00
Debanjum	99fdd91a01	Latch to bottom instantly and well when auto scroll chat stream on web	2025-05-31 16:50:26 -07:00
Debanjum	253656b634	Fix engaging anthropic api cache for operator trajectories. It had become broken at some point due to refactoring. The cache control was getting added and removed right after in add_action_results. What we actually wanted to do is clear the old cache breakpoint and put a new one at the latest operator tool result message. This should improve operator speed and lower costs with anthropic models.	2025-05-31 16:50:26 -07:00
Debanjum	faecbdb7d8	Enable operators to use computers	2025-05-31 16:50:25 -07:00
Debanjum	771909f76a	Implement docker computer environment for operator - Generalize building pyautogui into executable python code snippet. This should work across docker and local. And should be easier to extend to operate a remote computer over the network as well. - Create dockerfile for pyautogui operate-able containerized computer	2025-05-28 17:40:32 -07:00
Debanjum	e117f57f64	Implement local computer environment for operator	2025-05-28 17:40:32 -07:00
Debanjum	7eab87bfdf	Generalize operator to operate multiple types of environment Previously it could only operate a (playwright) browser. Now - The operator logic and naming has been updated assuming multiple environment types can be operated - The operator entrypoint is now at __init__.py to simplify imports and the entrypoint function is called operate_environment - All operator agents have been updated to select their system prompts and tools based on the environment they'll operate	2025-05-27 19:01:36 -07:00
Debanjum	c0689b2740	Easily interrupt and redirect khoj's research direction via chat - Khoj can now save and restore research from partial state. This triggers an interrupt that saves the partial research, then when a new query is sent it loads the previous partial research as context and continues, using the new user query to orient its future research - Support natural interrupt and send query behavior from web app. This triggers an abort and send when a user sends a chat message while khoj is in the middle of some previous research. This interrupt mechanism enables a more natural, interactive research flow	2025-05-27 17:57:21 -07:00
Debanjum	c9e6b8e88d	Align expected types to actual returned types by AI APIs, operator	2025-05-26 00:39:06 -07:00
Debanjum	c1c1fc6265	Make send message validation more robust on web app	2025-05-26 00:35:10 -07:00
Debanjum	6cb512d9cf	Support natural interrupt and send query behavior from web app - Just send your new query. If a query was running previously it'd be interrupted and new query would start processing. This improves on the previous 2-click interrupt and send ux. - Utilizes partial research for interrupted query, so you can now redirect khoj's research direction. This is useful if you need to share more details, change khoj's research direction in any way or complete research. Khoj's train of thought can be helpful for this.	2025-05-26 00:35:10 -07:00
Debanjum	2b7dd7401b	Continue interrupt queries only after previous query written to DB	2025-05-26 00:35:10 -07:00
Debanjum	3cd6e1a9a6	Save and restore research from partial state	2025-05-26 00:35:09 -07:00
Debanjum	a83c36fa05	Validate operator, research, context.query fields of ChatMessage - Track operator, research context in ChatMessage - Track query field in (document) context field of ChatMessage This allows validating chat message before inserting into DB	2025-05-26 00:03:59 -07:00
Debanjum	02ee4e90a2	Pass doc/web/code/operator context as list[dict] of message content	2025-05-26 00:03:59 -07:00
Debanjum	98b56316e4	Support constructing chat message as a list of dictionaries Research mode recently started passing iteration as list of message content dicts. This change extends to storing it as is in DB.	2025-05-26 00:03:59 -07:00
Debanjum	df9ab51fd0	Track research results as iteration list instead of iteration summaries	2025-05-26 00:03:59 -07:00
Debanjum	5d65fa8698	Use Django timezone funcs to make datetimes in DB timezone aware. These seem to be a new class of errors showing up. Explicitly using django timezone functions to add awareness to date time fields stored in DB seems to mitigate the issue. Related #1180	2025-05-25 23:43:06 -07:00
Debanjum	231aa1c0df	Support claude 4 models. Engage reasoning, operator. Track costs etc. - Engage reasoning when using claude 4 models - Allow claude 4 models as monolithic operator agents - Ease identifying which anthropic models can reason, operate GUIs - Track costs, set default context window of claude 4 models - Handle stop reason on calls to new claude 4 models	2025-05-25 23:43:06 -07:00
Debanjum	dca17591f3	Handle parsing json from string with plain text suffix	2025-05-23 19:44:02 -07:00
Debanjum	acebb90643	Mention keys expected in prompt to next research tool selector	2025-05-23 19:44:02 -07:00
Debanjum	e968cca273	Clean usage of conversation_id in chat API function - Normalize conversation_id type to str instead of str or UUID - Do not pass conversation_id to agenerate_chat_response as the associated conversation is also being passed. So can get its id directly.	2025-05-23 19:44:02 -07:00
Debanjum	a76032522e	Add type hints to function args calling anthropic model api	2025-05-22 15:02:45 -07:00
Debanjum	97c5222b04	Set type hints and reorder args of all converse_[provider] methods - Query is more important and should be passed before references - Add type hints to user query and references for code readability	2025-05-22 15:02:45 -07:00
Debanjum	2ea16298aa	Create Operator Framework. Enable Khoj to Operate Web Browser (#1174) ## Overview 1. Create base framework to compose different operators and environments for Khoj to operate. 2. Enable Khoj to operate a web browser using anthropic, openai, gemini or open-source models. Note: This is an alpha level feature release. It is meant for local testing by contributors and self-hosters. ## Capabilities - Have Khoj operate a web browser to complete tasks that require actions and visual feedback. - Experiment with any vision model as operator. Khoj supports monolithic and binary operators - Monolithic operators rely on a single model like claude, openai to both reason and ground operator actions - Binary operators allow bootstrapping a fully local operator. It can use any vision model for visual reasoning when paired with a capable visual grounding model. ## Limitations - In general, it is slower, more expensive and less comprehensive than standard Khoj for research ## Setup 1. Install Khoj with playwright by either - running `pip install khoj[local]` - installing playwright separately via `pip install playwright` and `playwright install chromium` 2. Set `KHOJ_OPERATOR_ENABLED` env var to true (i.e. `KHOJ_OPERATOR_ENABLED=true`) 3. Start Khoj (e.g. `USE_EMBEDDED_DB="true" khoj --anonymous-mode -vv`) 4. Add the necessary chat model(s) with `vision enabled` via your [Khoj Admin Panel](http://localhost:42110/server/admin) - To use Anthropic claude: `claude-3.7-sonnet*` chat model is required with vision enabled - To use Openai operator: `gpt-4o` chat model is required with vision enabled - For other operator configurations: a chat model named `ui-tars-1.5` is required with vision enabled. This can technically be any visual grounding model served via an openai compatible api. I've just tested with ui-tars-1.5-7b deployed to an HF inference endpoint for now. See [deployment instructions](https://github.com/bytedance/UI-TARS/blob/main/README_deploy.md) 5. Set your desired vision chat model via [user settings](http://localhost:42110/settings) to use as operator. 6. Run your queries with either the `/operator` slash command or by just asking Khoj in your query to use the operator tool. You can run operator in research mode as well ### Advanced Usage - Reuse Browser Session - Why: Have Khoj operate web services you've logged into. E.g. manage your gmail, github, social media etc. - Setup 1. Start Chromium or Edge in Remote Debugging mode. For example, on Mac you can start Edge by running the following in your terminal: `/Applications/Microsoft\ Edge.app/Contents/MacOS/Microsoft\ Edge --remote-debugging-port=9222` 2. Connect Khoj to that browser instance by setting the environment variable `KHOJ_CDP_URL` to its URL. By default you'd set `KHOJ_CDP_URL="http://localhost:9222"` ## Architecture ### Operator Agents | Type | Design | |-----|-----| | Monolithic | <img src="https://github.com/user-attachments/assets/7a96440f-1732-482b-9bd9-0920cb0c60890" width=400> | | Binary | <img src="https://github.com/user-attachments/assets/c5d101c0-3475-43c2-a301-daa943cde190" width=400> |	2025-05-20 01:30:36 -07:00
Debanjum	19b4c18b69	Configure max iterations per operator run via environment variable	2025-05-20 01:03:11 -07:00
Debanjum	06a1a22e3b	Align generic grounding agent's interface with uitars grounding agent The generic grounding agent has not been tested properly but at least it should be aligned with the interface being used by the ui-tars grounding agent which has been tested.	2025-05-20 00:31:56 -07:00
Debanjum	0ce74e0329	Show operator context when using operator in default and research mode	2025-05-20 00:31:56 -07:00
Debanjum	cc355f93fc	Use operator context consistently as a dict[str, str] of query, result	2025-05-20 00:31:56 -07:00
Debanjum	07e33994f0	Reduce scroll amount to have previous page stay a bit on screen	2025-05-20 00:31:56 -07:00
Debanjum	e2c1b1fcd3	Add dev container config to ease setup for remote development	2025-05-19 23:34:31 -07:00
Debanjum	fdb681ca0e	Only install desktop, obsidian app from dev_setup.sh with --full flag	2025-05-19 23:34:31 -07:00
Debanjum	33dd4c8c33	Handle gemini returning simple string in response candidates	2025-05-19 19:45:10 -07:00
Debanjum	626ced8b8b	Fix adding code results to chatml messages context	2025-05-19 19:45:10 -07:00
Debanjum	ded753ff9a	Improve parsing tool use coordinate returned by claude operator agent. It sometimes outputs coordinates as a string rather than a list. Make parser more robust to those kinds of errors. Share error with operator agent to fix/iterate on instead of exiting the operator loop.	2025-05-19 16:28:55 -07:00
Debanjum	473dd006d5	Remove unnecessary images conversion to png in binary operator agent. It's handled by the ai model interaction handlers in khoj server core.	2025-05-19 16:28:55 -07:00
Debanjum	9f3fbf9021	Encourage reasoner, grounder to work better together in binary operator - Encourage grounder to adhere to the reasoner's action instruction - Encourage reasoner to explore other actions when stuck in a loop. Previously seemed to be forcing it too strongly to choose "single most important" next action. So may not have been exploring other actions to achieve objective on initial failure.	2025-05-19 16:28:55 -07:00
Debanjum	ac19f6d336	Improve operator exception handling - Do not catch error messages just to re-throw them. This results in a confusing "exception happened during handling of an exception" stacktrace, which makes it harder to debug - Log error when action_results.content isn't set or empty to debug this operator run error	2025-05-19 16:28:55 -07:00
Debanjum	59e0e092b0	Remove deprecated prompt for grounding model to choose goto, back func. Goto and back functions are chosen by the visual reasoning model for increased reliability in selecting those tools. The ui-tars grounding model seems too tuned to use a specific set of tools.	2025-05-19 16:28:55 -07:00
Debanjum	1442a4f6fb	Handle reasoning messages returned by openai cua model. Documentation about this is currently limited and confusing. But it seems like a reasoning item should be kept if a computer_call follows it, else dropped. Add noop placeholder for reasoning item to prevent termination of operator run on response with just reasoning.	2025-05-19 16:28:55 -07:00
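
Commit dca17591f3 above mentions parsing JSON from a string that carries a plain text suffix. The log does not show how Khoj does this, so the following is only a minimal sketch of one common way to do it with the Python standard library. The function name `parse_json_prefix` and the sample response are hypothetical and not taken from the Khoj codebase.

```python
import json


def parse_json_prefix(text: str):
    """Parse the leading JSON value in `text`, ignoring trailing plain text.

    Hypothetical helper for illustration; not Khoj's actual implementation.
    """
    decoder = json.JSONDecoder()
    # raw_decode stops at the end of the first JSON value and returns
    # (value, end_index). It does not skip leading whitespace, so strip first.
    value, _end = decoder.raw_decode(text.lstrip())
    return value


# Made-up model response: a JSON object followed by chatty plain text.
response = '{"tool": "operator", "query": "open the docs"} Hope this helps!'
print(parse_json_prefix(response))
```

Unlike `json.loads`, which rejects any trailing characters, `raw_decode` tolerates them, which is why it suits model output that appends commentary after the JSON payload.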

4718 Commits
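
Commit ded753ff9a in the log above describes making the coordinate parser tolerant of a model returning coordinates as a string instead of a list, and sharing the error with the agent rather than exiting the loop. As a rough illustration only, assuming coordinates are an `[x, y]` pair and that errors are returned as messages instead of raised, such a parser might look like the sketch below. `parse_coordinate` and its return shape are hypothetical, not Khoj's API.

```python
import json
from typing import Optional, Sequence, Tuple, Union


def parse_coordinate(
    raw: Union[str, Sequence, None],
) -> Tuple[Optional[Tuple[int, int]], Optional[str]]:
    """Return ((x, y), None) on success or (None, error_message) on failure.

    Hypothetical helper for illustration; not Khoj's actual implementation.
    """
    value = raw
    if isinstance(raw, str):
        # The model may emit "[100, 200]" as a string; try to decode it.
        try:
            value = json.loads(raw)
        except json.JSONDecodeError:
            return None, f"Could not parse coordinate string: {raw!r}"
    if not isinstance(value, (list, tuple)) or len(value) != 2:
        return None, f"Expected [x, y], got: {raw!r}"
    try:
        x, y = (int(v) for v in value)
    except (TypeError, ValueError):
        return None, f"Coordinate values must be numeric: {raw!r}"
    return (x, y), None


# The error message can be appended to the agent's context so it can retry,
# instead of raising and ending the operator run.
print(parse_coordinate("[100, 200]"))
print(parse_coordinate("somewhere near the top"))
```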