## Overview

1. Create a base framework to compose different operators and environments for Khoj to operate.
2. Enable Khoj to operate a web browser using Anthropic, OpenAI, Gemini or open-source models.

**Note**: *This is an alpha-level feature release. It is meant for local testing by contributors and self-hosters.*

## Capabilities

- Have Khoj operate a web browser to complete tasks that require actions and visual feedback.
- Experiment with any vision model as the operator. Khoj supports monolithic and binary operators.
  - Monolithic operators rely on a single model, such as Claude or an OpenAI model, to both reason about and ground operator actions.
  - Binary operators allow bootstrapping a fully local operator. They can use any vision model for visual reasoning when paired with a capable visual grounding model.

## Limitations

- In general, the operator is slower, more expensive and less comprehensive than standard Khoj for research.

## Setup

1. Install Khoj with Playwright by either
   - running `pip install khoj[local]`, or
   - installing Playwright separately via `pip install playwright` and `playwright install chromium`
2. Set the `KHOJ_OPERATOR_ENABLED` env var to true (i.e. `KHOJ_OPERATOR_ENABLED=true`)
3. Start Khoj (e.g. `USE_EMBEDDED_DB="true" khoj --anonymous-mode -vv`)
4. Add the necessary chat model(s) with `vision enabled` via your [Khoj Admin Panel](http://localhost:42110/server/admin)
   - To use Anthropic Claude: a `claude-3.7-sonnet*` chat model is required with vision enabled
   - To use the OpenAI operator: a `gpt-4o` chat model is required with vision enabled
   - For other operator configurations: a chat model named `ui-tars-1.5` is required with vision enabled.
     This can technically be any visual grounding model served via an OpenAI-compatible API. I've only tested with ui-tars-1.5-7b deployed to an HF inference endpoint for now. See the [deployment instructions](https://github.com/bytedance/UI-TARS/blob/main/README_deploy.md)
5. Set your desired vision chat model via [user settings](http://localhost:42110/settings) to use as the operator.
6. Run your queries with either the `/operator` slash command or by just asking Khoj in your query to use the operator tool. You can also run the operator in research mode.

### Advanced Usage

- Reuse Browser Session
  - Why: Have Khoj operate web services you've logged into, e.g. manage your Gmail, GitHub, social media etc.
  - Setup
    1. Start Chromium or Edge in Remote Debugging mode. For example, on Mac you can start Edge by running the following in your terminal: `/Applications/Microsoft\ Edge.app/Contents/MacOS/Microsoft\ Edge --remote-debugging-port=9222`
    2. Connect Khoj to that browser instance by setting the environment variable `KHOJ_CDP_URL` to its URL. By default you'd set `KHOJ_CDP_URL="http://localhost:9222"`

## Architecture

### Operator Agents

| Type | Design |
|------|--------|
| Monolithic | <img src="https://github.com/user-attachments/assets/7a96440f-1732-482b-9bd9-0920cb0c60890" width=400> |
| Binary | <img src="https://github.com/user-attachments/assets/c5d101c0-3475-43c2-a301-daa943cde190" width=400> |

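
Before starting Khoj with a reused browser session, it can help to confirm that the environment variables from the Setup and Advanced Usage steps are set and that the browser is actually listening. The following is a minimal pre-flight sketch, not part of Khoj itself. The helper names (`operator_enabled`, `cdp_version_url`, `check_cdp`) are hypothetical, and the way `KHOJ_OPERATOR_ENABLED` is parsed here (case-insensitive `true`) is an assumption about how Khoj reads it. The `/json/version` path is the standard HTTP endpoint that Chromium-based browsers expose when started with `--remote-debugging-port`.

```python
import json
import os
import urllib.request


def operator_enabled(env=os.environ) -> bool:
    """Hypothetical check: treat KHOJ_OPERATOR_ENABLED=true (any case) as enabled.

    How Khoj itself parses this variable is an assumption here.
    """
    return env.get("KHOJ_OPERATOR_ENABLED", "").strip().lower() == "true"


def cdp_version_url(env=os.environ) -> str:
    """Build the browser's version endpoint from KHOJ_CDP_URL.

    Falls back to http://localhost:9222, the default mentioned in Advanced Usage.
    """
    base = env.get("KHOJ_CDP_URL", "http://localhost:9222")
    return base.rstrip("/") + "/json/version"


def check_cdp(url: str, timeout: float = 2.0):
    """Return the browser's version info as a dict, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # OSError covers connection failures and timeouts;
        # ValueError covers a non-JSON response.
        return None


if __name__ == "__main__":
    print("Operator enabled:", operator_enabled())
    info = check_cdp(cdp_version_url())
    if info is None:
        print("No browser reachable at", cdp_version_url())
    else:
        print("Connected to", info.get("Browser", "unknown browser"))
```

If the script reports that no browser is reachable, check that the browser was started with `--remote-debugging-port=9222` and that `KHOJ_CDP_URL` points at the same host and port.
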
🎁 New
- Start any message with `/research` to try out the experimental research mode with Khoj.
- Anyone can now create custom agents with tunable personality, tools and knowledge bases.
- Read about Khoj's excellent performance on modern retrieval and reasoning benchmarks.
Overview
Khoj is a personal AI app to extend your capabilities. It smoothly scales up from an on-device personal AI to a cloud-scale enterprise AI.
- Chat with any local or online LLM (e.g. llama3, qwen, gemma, mistral, gpt, claude, gemini, deepseek).
- Get answers from the internet and your docs (including image, pdf, markdown, org-mode, word, notion files).
- Access it from your Browser, Obsidian, Emacs, Desktop, Phone or Whatsapp.
- Create agents with custom knowledge, persona, chat model and tools to take on any role.
- Automate away repetitive research. Get personal newsletters and smart notifications delivered to your inbox.
- Find relevant docs quickly and easily using our advanced semantic search.
- Generate images, talk out loud, play your messages.
- Khoj is open-source, self-hostable. Always.
- Run it privately on your computer or try it on our cloud app.
See it in action
Go to https://app.khoj.dev to see Khoj live.
Full feature list
You can see the full feature list here.
Self-Host
To get started with self-hosting Khoj, read the docs.
Enterprise
Khoj is available as a cloud service, on-premises, or as a hybrid solution. To learn more about Khoj Enterprise, visit our website.
Frequently Asked Questions (FAQ)
Q: Can I use Khoj without self-hosting?
Yes! You can use Khoj right away at https://app.khoj.dev — no setup required.
Q: What kinds of documents can Khoj read?
Khoj supports a wide variety: PDFs, Markdown, Notion, Word docs, org-mode files, and more.
Q: How can I make my own agent?
Check out this blog post for a step-by-step guide to custom agents. For more questions, head over to our Discord!
Interested in Contributing?
Khoj is open source. It is sustained by the community and we’d love for you to join it! Whether you’re a coder, designer, writer, or enthusiast, there’s a place for you.
Why Contribute?
- Make an Impact: Help build, test and improve a tool used by thousands to boost productivity.
- Learn & Grow: Work on cutting-edge AI, LLMs, and semantic search technologies.
You can help us build new features, improve the project documentation, report issues and fix bugs. If you're a developer, please see our Contributing Guidelines and check out good first issues to work on.

