Simplify integrating Ollama, OpenAI proxies with Khoj on first run

- Integrate with Ollama or other OpenAI compatible APIs by simply setting the `OPENAI_API_BASE` environment variable in docker-compose etc.
- Update docs on integrating with Ollama, OpenAI proxies on first run
- Auto populate all chat models supported by OpenAI compatible APIs
- Auto set vision enabled for all commercial models
- Minor
  - Add huggingface cache to khoj_models volume. This is where chat models and (now) sentence transformer models are stored by default
  - Reduce verbosity of yarn install of web app. Otherwise it hits the docker log size limit and docker stops showing the remaining logs after the web app install
  - Suggest `ollama pull <model_name>` to start it in background
2026-04-20 01:24:31 +00:00 · 2024-11-16 23:53:11 -08:00
parent 2366fa08b9
commit 69ef6829c1
6 changed files with 164 additions and 84 deletions
--- a/documentation/docs/advanced/ollama.md
+++ b/documentation/docs/advanced/ollama.md
@@ -1,33 +0,0 @@
-# Ollama
-:::info
-This is only helpful for self-hosted users. If you're using [Khoj Cloud](https://app.khoj.dev), you're limited to our first-party models.
-:::
-
-:::info
-Khoj natively supports local LLMs [available on HuggingFace in GGUF format](https://huggingface.co/models?library=gguf). Using an OpenAI API proxy with Khoj maybe useful for ease of setup, trying new models or using commercial LLMs via API.
-:::
-
-Ollama allows you to run [many popular open-source LLMs](https://ollama.com/library) locally from your terminal.
-For folks comfortable with the terminal, Ollama's terminal based flows can ease setup and management of chat models.
-
-Ollama exposes a local [OpenAI API compatible server](https://github.com/ollama/ollama/blob/main/docs/openai.md#models). This makes it possible to use chat models from Ollama to create your personal AI agents with Khoj.
-
-## Setup
-
-1. Setup Ollama: https://ollama.com/
-2. Start your preferred model with Ollama. For example,
-    ```bash
-    ollama run llama3.1
-    ```
-3. Create a new [OpenAI Processor Conversation Config](http://localhost:42110/server/admin/database/openaiprocessorconversationconfig/add) on your Khoj admin panel
-   - Name: `ollama`
-   - Api Key: `any string`
-   - Api Base Url: `http://localhost:11434/v1/` (default for Ollama)
-4. Create a new [Chat Model Option](http://localhost:42110/server/admin/database/chatmodeloptions/add) on your Khoj admin panel.
-   - Name: `llama3.1` (replace with the name of your local model)
-   - Model Type: `Openai`
-   - Openai Config: `<the ollama config you created in step 3>`
-   - Max prompt size: `20000` (replace with the max prompt size of your model)
-5. Go to [your config](http://localhost:42110/settings) and select the model you just created in the chat model dropdown.
-
-That's it! You should now be able to chat with your Ollama model from Khoj. If you want to add additional models running on Ollama, repeat step 6 for each model.
--- a/documentation/docs/advanced/ollama.mdx
+++ b/documentation/docs/advanced/ollama.mdx
@@ -0,0 +1,78 @@
+# Ollama
+
+```mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+```
+
+:::info
+This is only helpful for self-hosted users. If you're using [Khoj Cloud](https://app.khoj.dev), you can use our first-party supported models.
+:::
+
+:::info
+Khoj can directly run local LLMs [available on HuggingFace in GGUF format](https://huggingface.co/models?library=gguf). The integration with Ollama is useful to run Khoj on Docker and have the chat models use your GPU or to try new models via CLI.
+:::
+
+Ollama allows you to run [many popular open-source LLMs](https://ollama.com/library) locally from your terminal.
+For folks comfortable with the terminal, Ollama's terminal based flows can ease setup and management of chat models.
+
+Ollama exposes a local [OpenAI API compatible server](https://github.com/ollama/ollama/blob/main/docs/openai.md#models). This makes it possible to use chat models from Ollama with Khoj.
+
+## Setup
+:::info
+Restart your Khoj server after first run or update to the settings below to ensure all settings are applied correctly.
+:::
+
+<Tabs groupId="type" queryString>
+  <TabItem value="first-run" label="First Run">
+    <Tabs groupId="server" queryString>
+      <TabItem value="docker" label="Docker">
+      1. Setup Ollama: https://ollama.com/
+      2. Download your preferred chat model with Ollama. For example,
+         ```bash
+         ollama pull llama3.1
+         ```
+      3. Uncomment `OPENAI_API_BASE` environment variable in your downloaded Khoj [docker-compose.yml](https://github.com/khoj-ai/khoj/blob/master/docker-compose.yml#:~:text=OPENAI_API_BASE)
+      4. Start Khoj docker for the first time to automatically integrate and load models from the Ollama running on your host machine
+         ```bash
+         # run below command in the directory where you downloaded the Khoj docker-compose.yml
+         docker-compose up
+         ```
+      </TabItem>
+
+      <TabItem value="pip" label="Pip">
+      1. Setup Ollama: https://ollama.com/
+      2. Download your preferred chat model with Ollama. For example,
+         ```bash
+         ollama pull llama3.1
+         ```
+      3. Set `OPENAI_API_BASE` environment variable to `http://localhost:11434/v1` in your shell before starting Khoj for the first time
+         ```bash
+         export OPENAI_API_BASE="http://localhost:11434/v1"
+         khoj --anonymous-mode
+         ```
+      </TabItem>
+   </Tabs>
+  </TabItem>
+  <TabItem value="update" label="Update">
+   1. Setup Ollama: https://ollama.com/
+   2. Download your preferred chat model with Ollama. For example,
+      ```bash
+      ollama pull llama3.1
+      ```
+   3. Create a new [OpenAI Processor Conversation Config](http://localhost:42110/server/admin/database/openaiprocessorconversationconfig/add) on your Khoj admin panel
+      - Name: `ollama`
+      - Api Key: `any string`
+      - Api Base Url: `http://localhost:11434/v1/` (default for Ollama)
+   4. Create a new [Chat Model Option](http://localhost:42110/server/admin/database/chatmodeloptions/add) on your Khoj admin panel.
+      - Name: `llama3.1` (replace with the name of your local model)
+      - Model Type: `Openai`
+      - Openai Config: `<the ollama config you created in step 3>`
+      - Max prompt size: `20000` (replace with the max prompt size of your model)
+   5. Go to [your config](http://localhost:42110/settings) and select the model you just created in the chat model dropdown.
+
+   If you want to add additional models running on Ollama, repeat step 4 for each model.
+  </TabItem>
+</Tabs>
+
+That's it! You should now be able to chat with your Ollama model from Khoj.
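
Before the first run, it can help to confirm that the URL set in `OPENAI_API_BASE` actually serves a model list, since this change auto-populates chat models from the OpenAI compatible API. The sketch below is not part of this commit. `models_url` and `list_models` are hypothetical helper names, and it assumes the server exposes the standard `/models` route under the base URL, as the Ollama OpenAI compatibility docs linked above describe.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of Khoj or Ollama.

# Build the models listing URL from an OpenAI compatible base URL.
# Drops one trailing slash so both ".../v1" and ".../v1/" work.
models_url() {
  local base="${1%/}"
  printf '%s/models\n' "$base"
}

# Fetch the raw JSON model list from the server.
# Assumes curl is installed and the server (e.g. Ollama) is running.
list_models() {
  curl --silent --fail --max-time 5 "$(models_url "$1")"
}

# Usage (run manually once Ollama is up):
#   list_models "${OPENAI_API_BASE:-http://localhost:11434/v1}"
```

If `list_models` prints a JSON document that names the model you pulled, the server should be reachable at that URL from the machine where you ran the check. From inside a Docker container the host address may differ, so treat a successful check on the host as a necessary rather than sufficient condition.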