Commit Graph

4750 Commits

Author SHA1 Message Date
Debanjum
50f37d541a Pre-install server deps for fast devcontainer start. Fix dev launch.json
There seems to be a more standard mechanism of specifying launch.json
params for devcontainers. Previous mechanism to write launch.json to
.vscode/launch.json in post creation step does not work.

Improve default launch.json to include khoj admin username, password
with placeholder values to get started with local development faster.

Define dockerfile for devcontainer to pre-built server, web app
dependencies during dev container image creation stage. So install on
dev container startup is sped up as no need to install dependencies.
2025-06-03 01:43:23 -07:00
Debanjum
f3a5fe1ae8 Release Khoj version 1.42.0 2025-06-01 20:52:25 -07:00
Debanjum
82ee0f5451 Revert computer dockerfile startup command to fix operating it 2025-06-01 20:39:58 -07:00
Debanjum
a236288ca9 Fixes to enable dockerized khoj to operate its computer 2025-06-01 19:19:01 -07:00
Debanjum
f95d352eb9 Ensure profile is right border aligned on khoj obsidian settings page
On wide screens it wasn't taking up the header wasn't taking up the
full width, so profile picture could hang out in the middle somewhere.
2025-06-01 17:02:08 -07:00
Debanjum
759ffc46b0 Default to read currently open file when chat with Khoj from Obsidian
Vault is already indexed, this should ease engaging with current
context more easily.
2025-06-01 16:56:19 -07:00
Debanjum
3fb8f77cd5 Fix terminal tool passed to claude 3.7 sonnet as anthropic operator 2025-06-01 16:55:17 -07:00
Debanjum
ddf028f7af Fix khoj computer image name used in docker-compose.yml instead 2025-06-01 16:44:28 -07:00
Debanjum
257bdfadef Setup vscode launch.json and configure pytests for dev container 2025-06-01 16:36:46 -07:00
Debanjum
a98525be01 Add default vscode config for khoj to ease development setup 2025-06-01 16:36:46 -07:00
Debanjum
c6cc709f62 Fix khoj computer image name and only build it once for each arch 2025-06-01 16:36:46 -07:00
Debanjum
a4eb85ac41 Reduce (superficial) xdg dir permissions errors on khoj computer start 2025-06-01 16:36:20 -07:00
sabaimran
e9a107cc06 fix spelling of development 2025-06-01 13:41:39 -07:00
Henri Jamet
dbfac89a0c Major updates to Obsidian Khoj plugin chat interface and editing features (#1109)
## Description
This PR introduces significant improvements to the Obsidian Khoj
plugin's chat interface and editing capabilities, enhancing the overall
user experience and content management functionality.

## Features

### 🔍 Enhanced Communication Mode
I've implemented radio buttons below the chat window for easier
communication mode selection. The modes are now displayed as emojis in
the conversation for a cleaner interface, replacing the previous
text-based system (e.g., /default, /research). I've also documented the
search mode functionality in the help command.

#### Screenshots
- Radio buttons for mode selection
- Emoji display in conversations
![Recording 2025-02-11 at 18 56
10](https://github.com/user-attachments/assets/798d15df-ad32-45bd-b03f-581f6093575a)

### 💬 Revamped Message Interaction
I've redesigned the message buttons with improved spacing and color
coding for better visual differentiation. The new edit button allows
quick message modifications - clicking it removes the conversation up to
that point and copies the message to the input field for easy editing or
retrying questions.

#### Screenshots
- New message styling and color scheme
![Recording 2025-02-11 at 18 44
48](https://github.com/user-attachments/assets/159ece3d-2d80-4583-a7a8-2ef1f253adcc)
- Edit button functionality
![Recording 2025-02-11 at 18 47
52](https://github.com/user-attachments/assets/82ee7221-bc49-4088-9a98-744ef74d1e58)

### 🤖 Advanced Agent Selection System
I've added a new chat creation button with agent selection capability.
Users can now choose from their available agents when starting a new
chat. While agents can't be switched mid-conversation to maintain
context, users can easily start fresh conversations with different
agents.

#### Screenshots
- Agent selection dropdown
![Recording 2025-02-11 at 18 51
27](https://github.com/user-attachments/assets/be4208df-224c-45bf-a5b4-cf0a8068b102)

### 👁️ Real-Time Context Awareness
I've added a button that gives Khoj access to read Obsidian opened tabs.
This allows Khoj to read open notes and track changes in real-time,
maintaining a history of previous versions to provide more contextual
assistance.

#### Screenshots
- Window access toggle
![Recording 2025-02-11 at 18 59
01](https://github.com/user-attachments/assets/b596bfca-f622-41b7-b826-25a8e254d4a2)

### ✏️ Smart Document Editing
Inspired by Cursor IDE's intelligent editing and ChatGPT's Canvas
functionality, I've implemented a first version of a content creation
system we've been discussing. Using a JSON-based modification system,
Khoj can now make precise changes to specific parts of files, with
changes previewed in yellow highlighting before application.
Modification code blocks are neatly organized in collapsible sections
with clear action summaries. While this is just a first step, it's
working remarkably well and I have several ideas for expanding this
functionality to make Khoj an even more powerful content creation
assistant.

#### Screenshots
- JSON modification preview
- Change highlighting system
- Collapsible code blocks
- Accept/cancel controls
![Recording 2025-02-11 at 19 02
32](https://github.com/user-attachments/assets/88826c9e-d0c9-40da-ab78-9976c786aa9e)

---------

Co-authored-by: Debanjum <debanjum@gmail.com>
2025-06-01 10:42:36 +05:30
Debanjum
dee767042e Operate Computer with Khoj Operator (#1190)
## Summary
- Enable Khoj to operate computers: Add experimental computer operator
functionality that allows Khoj to interact with desktop environments,
browsers, and terminals to accomplish complex tasks
- Multi-environment support: Implement computer environments with GUI,
file system, and terminal access. Can control host computer or Docker
container computer

## Key Features
### Computer Operation Capabilities
- Desktop control (screenshots, clicking, typing, keyboard shortcuts)
- File editing and management
- Terminal/bash command execution
- Web browser automation
- Visual feedback via train-of-thought video playback

### Infrastructure & Architecture:
- Docker container (ghcr.io/khoj-ai/computer:latest) with Ubuntu 24.04,
XFCE desktop, VNC access
- Local computer environment support with pyautogui
- Modular operator agent system supporting multiple environment types
- Trajectory compression and context management for long-running tasks

###  Model Integration:
- Anthropic models only (Claude Sonnet 4, Claude 3.7 Sonnet, Claude Opus
4)
- OpenAI and binary operator agents temporarily disabled
- Enhanced caching and context management for operator conversations

### User Experience:
- `/operator` command or just ask Khoj to use operator tool to invoke
computer operation
- Integrate with research mode for extended 30+ minute task execution
- Video of computer operation in train of thought for transparency

### Configuration
- Set `KHOJ_OPERATOR_ENABLED=True` in `docker-compose.yml`
- Requires Anthropic API key
- Computer container runs on port 5900 (VNC)
2025-05-31 22:04:12 -07:00
Debanjum
fa2e370ce6 Document how to enable and use computer operator in operator readme 2025-05-31 21:41:23 -07:00
Debanjum
ceb1d82bf6 Create khoj computer via cloud build. Add computer to docker-compose.yml 2025-05-31 21:39:38 -07:00
Debanjum
68f7aae71c Install claude 4 sonnet, latest gemini 2.5s when configure on first run 2025-05-31 20:52:27 -07:00
Debanjum
b90b724f9a Disable openai, binary operator agents until they become useful 2025-05-31 20:51:08 -07:00
Debanjum
830a1af69e Render operator train of thought as video on web app to ease viewing
- You can seek through the train of thought video of computer operation or
  follow it in live mode.
- Interleaves video with normal text thoughts.
- Video available of old interactions and currently streaming message.
2025-05-31 20:51:08 -07:00
Debanjum
6821bd38ed Fix mypy typing errors in operator environment files
- Add type guards for action.path in drag vs text editor actions
- Added type guards for Union type attribute access
- Fixed variable naming conflicts between drag and text editor cases
- Resolved remaining typing issues in OpenAI, Anthropic agents
- Type guard without requiring another code indent level
2025-05-31 20:51:08 -07:00
Debanjum
c5c06a086e Fix, improve openai operator agent for interrupts, computer environment
- Create reusable method to call model
- Fix to summarize messages on operator run.
- Mark assistant tool calls with role = assistant, not environment

- Try fix message format when load after interrupts.
  Does not work well yet
2025-05-31 20:51:08 -07:00
Debanjum
f517566560 Improve invoking keybindings on computer always using lowercase keys
Previously CTRL+A would get triggered instead of ctrl+a. CTRL+A is
equivalent to ctrl+shift+a. This isn't intended and should be
called directly when required.

Now key combos like ctrl+a on computer firefox etc. work as expected
2025-05-31 20:51:08 -07:00
Debanjum
2558ac7f18 Show thinking and engage deep thought for gemini 2.5 model series
Gemini models now show (a summary of) their thoughts. Stream this in
research mode, similar to how it is done already for claude, deepseek,
qwen etc.
2025-05-31 20:51:08 -07:00
Debanjum
cecbfe35e2 Rename compile response into a private operator agents function 2025-05-31 20:51:08 -07:00
Debanjum
ded1db642c Get max context for user, operator model pair for context compression 2025-05-31 20:51:08 -07:00
Debanjum
7eaf0e80c5 Get max prompt size for given user, model via reusable functions 2025-05-31 20:51:08 -07:00
Debanjum
3797f03625 Log ai model usage on every call to get_chat_usage_metrics in debug mode 2025-05-31 20:51:08 -07:00
Debanjum
4cb900658d Cache system prompt, tools of anthropic operator agent for efficiency 2025-05-31 20:51:08 -07:00
Debanjum
928e5ee8ad Cache messages to anthropic models from chat actors for efficiency 2025-05-31 20:51:08 -07:00
Debanjum
0d1e6b0d53 Do not overwrite system_prompt for idempotent AI API calls retry
Previously on tenacity retry the system_prompt could get overwritten
2025-05-31 20:51:08 -07:00
Debanjum
e0ea151f20 Implement file editor and terminal tools, in-built in claude
This should improve viewing, editing files and viewing terminal
command outputs by anthropic operator
2025-05-31 20:51:08 -07:00
Debanjum
21bf7f1d6d Continue interrupted operator run with new query and previous context
Track research and operator results at each nested iteration step
using python object references + async events bubbled up from nested
iterators.

Instantiates operator with interrupted operator messages from research
or normal mode.

Reflects actual interaction trajectory as closely as possible to agent
including conversation history, partial operator trajectory and new
query for fine grained, corrigible steerability.

Research mode continues with operator tool directly if previous
iteration was an interrupted operator run.
2025-05-31 20:51:08 -07:00
Debanjum
de35d91e1d Pass previous trajectory to operator agents for context 2025-05-31 20:51:08 -07:00
Debanjum
864e0ac8b5 Simplify research iteration and main research function names 2025-05-31 20:51:08 -07:00
Debanjum
6c9d569a22 Fix to get user questions in chat history from user not khoj message
Since partial state reload after interrupt drops Khoj messages. The
assumption that there will always be a Khoj message after a user
message is broken. That is, there can now be multiple user messages
preceding a Khoj user message now.

This change allow for user queries to still be extracted for chat
history even if no khoj message follow.
2025-05-31 20:51:08 -07:00
Debanjum
b6aa77a6f5 Lookback 3 previous turns to select next tool, for questions history 2025-05-31 20:50:03 -07:00
Debanjum
d511cbfa34 Extract constructing question history into shared function for reuse
Minor logic update to only include non image inferred queries for
gemini, anthropic models as well instead of just for openai models.

Apart from that the extracted function should be functionally same.
2025-05-31 16:50:26 -07:00
Debanjum
da663e184c Type operator results. Enable storing, loading operator trajectories.
We were passing operator results as a simple dictionary. Strongly
typing it makes sense as operator results becomes more complex.

Storing operator results with trajectory on interrupts will allow
restarting interrupted operator run with agent messages of interrupted
trajectory loaded into operator agents
2025-05-31 16:50:26 -07:00
Debanjum
675fc0ad05 Decouple trajectory compression from `act'. Reuse func to call llm api 2025-05-31 16:50:26 -07:00
Debanjum
b027024c42 Handle failed operator agent calls to anthropic api more gracefully
Add anthropic operator api call errors to trajectory instead of
erroring out of current operator run
2025-05-31 16:50:26 -07:00
Debanjum
d54bfc19e5 Add trajectory compression to anthropic operator agent
- Add compression parameters to base operator agent for reuse
- Increase default operator iterations
2025-05-31 16:50:26 -07:00
Debanjum
cb451fa67c Put default summarize prompt into operator agent
This allows:
- Each operator agent to own its summarization prompt. That it can
  tune if it wants
- The outer operator loop to pass an override summarize prompt when it
  invokes the summarize func but it does not have to
2025-05-31 16:50:26 -07:00
Debanjum
99fdd91a01 Latch to bottom instantly and well when auto scroll chat stream on web 2025-05-31 16:50:26 -07:00
Debanjum
253656b634 Fix engaging anthropic api cache for operator trajectories.
It had become broken at some point due to refactoring. The cache
control was getting added and removed right after in add_action_results

What we actually wanted to do is clear the old cache breakpoint and
put a new one at the latest operator tool result message.

This should improve operator speed and lower costs with anthropic
models.
2025-05-31 16:50:26 -07:00
Debanjum
faecbdb7d8 Enable operators to use computers 2025-05-31 16:50:25 -07:00
Debanjum
771909f76a Implement docker computer environment for operator
- Generalize building pyautogui into executable python code snippet.
  This should work across docker and local. And should be easier to
  extend to operate a remote computer over the network as well.

- Create dockerfile for pyautogui operate-able containerized computer
2025-05-28 17:40:32 -07:00
Debanjum
e117f57f64 Implement local computer environment for operator 2025-05-28 17:40:32 -07:00
Debanjum
7eab87bfdf Generalize operator to operate multiple types of environment
Previously it could only operate a (playwright) browser. Now
- The operator logic and naming has been updated assuming
  multiple environment types can be operated
- The operator entrypoint is now at __init__.py to simplify imports
  and the entrypoint function is called operate_environment
- All operator agents have been updated to select their system prompts
  and tools based on the environment they'll operate
2025-05-27 19:01:36 -07:00
Debanjum
c0689b2740 Easily interrupt and redirect khoj's research direction via chat
- Khoj can now save and restore research from partial state
  This triggers an interrupt that saves the partial research, then
  when a new query is sent it loads the previous partial research as
  context and continues utilizing with the new user query to orient
  its future research
- Support natural interrupt and send query behavior from web app
  This triggers an abort and send when a user sends a chat message
  while khoj is in the middle of some previous research.

This interrupt mechanism enables a more natural, interactive
research flow
2025-05-27 17:57:21 -07:00