From fa2e370ce6dab207059054ad14e8d864c6f803c9 Mon Sep 17 00:00:00 2001 From: Debanjum Date: Sat, 31 May 2025 21:41:23 -0700 Subject: [PATCH] Document how to enable and use computer operator in operator readme --- src/khoj/processor/operator/README.md | 59 +++++++++++++++++++++++++++ 1 file changed, 59 insertions(+) create mode 100644 src/khoj/processor/operator/README.md diff --git a/src/khoj/processor/operator/README.md b/src/khoj/processor/operator/README.md new file mode 100644 index 00000000..defc5c44 --- /dev/null +++ b/src/khoj/processor/operator/README.md @@ -0,0 +1,59 @@ +# Khoj Operator (Experimental) + +## Overview +Give Khoj its own computer to operate in a transparent, controlled manner. Accomplish tasks that require visual browsing, file editing and terminal access. Operator with research mode can work for 30+ minutes to accomplish more substantial tasks like feature development, travel planning, shopping etc. + +## Setup + +### Prerequisites +- Docker and Docker Compose installed +- Anthropic API key (required - only Anthropic models currently enabled) + +### Installation Steps +1. Download the Khoj docker-compose.yml file + ```shell + mkdir ~/.khoj && cd ~/.khoj + wget https://raw.githubusercontent.com/khoj-ai/khoj/master/docker-compose.yml + ``` + +2. Configure environment variables in `docker-compose.yml` + - Set `ANTHROPIC_API_KEY` to your [Anthropic API key](https://console.anthropic.com/settings/keys) + - Uncomment `KHOJ_OPERATOR_ENABLED=True` to enable the operator tool + +3. Start Khoj services + ```shell + docker-compose up + ``` + +4. Access the web app at http://localhost:42110 + Ensure you're using a claude 3.7+ models on your [settings page](http://localhost:42110/settings) + +## Usage +Use the `/operator` command or ask Khoj in normal or research mode to use the operator tool to have it operate its computer: + +**Examples:** +- `/operator Find flights from Bangkok to Mexico City with no US layover` +- `/research Clone the khoj repo and tell me how the operator tool is implemented` + +## Supported Models + +Currently enables **only Anthropic models**: +- Claude Sonnet 4 +- Claude 3.7 Sonnet +- Claude Opus 4 + +*Note: OpenAI and other operator models are disabled while in developemnt.* + +## Capabilities + +The operator can: +- **Computer Control**: Take screenshots, click, type, navigate desktop +- **File Operations**: Create, edit, and manage files +- **Terminal Access**: Execute bash commands and scripts +- **Web Browsing**: Navigate websites, documents and extract information + +## Architecture + +- **Environments**: Operator Computer and Browser environments +- **Models**: Enable Vision Language Models (VLM) to operate computer +- **Execution**: Containerize computer environment for security and isolation