Update docs to show how to set up llama-cpp with Khoj

- How to pip install Khoj to run offline chat on GPU.
  After the migration to llama-cpp-python, more GPU types are supported,
  but they require a build step, so document how to run it
- New default offline chat model
- Where to get supported chat models from on HuggingFace
Debanjum Singh Solanky
2024-03-16 04:25:14 +05:30
parent 8ca39a436c
commit dcdd1edde2
3 changed files with 27 additions and 7 deletions

@@ -14,16 +14,16 @@ You can configure Khoj to chat with you about anything. When relevant, it'll use
 ### Setup (Self-Hosting)
 #### Offline Chat
-Offline chat stays completely private and works without internet using open-source models.
+Offline chat stays completely private and can work without internet using open-source models.
 > **System Requirements**:
 > - Minimum 8 GB RAM. Recommend **16Gb VRAM**
 > - Minimum **5 GB of Disk** available
 > - A CPU supporting [AVX or AVX2 instructions](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) is required
-> - A Mac M1+ or [Vulcan supported GPU](https://vulkan.gpuinfo.org/) should significantly speed up chat response times
+> - An Nvidia or AMD GPU, or a Mac M1+ machine, would significantly speed up chat response times
 1. Open your [Khoj offline settings](http://localhost:42110/server/admin/database/offlinechatprocessorconversationconfig/) and click *Enable* on the Offline Chat configuration.
-2. Open your [Chat model options](http://localhost:42110/server/admin/database/chatmodeloptions/) and add a new option for the offline chat model you want to use. Make sure to use `Offline` as its type. We currently only support offline models that use the [Llama chat prompt](https://replicate.com/blog/how-to-prompt-llama#wrap-user-input-with-inst-inst-tags) format. We recommend using `mistral-7b-instruct-v0.1.Q4_0.gguf`.
+2. Open your [Chat model options](http://localhost:42110/server/admin/database/chatmodeloptions/) and add a new option for the offline chat model you want to use. Make sure to use `Offline` as its type. We support any [GGUF chat model](https://huggingface.co/models?library=gguf) for offline chat. For a balanced chat model that runs well on standard consumer hardware, we recommend using [Hermes-2-Pro-Mistral-7B by NousResearch](https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF) by default.
 :::tip[Note]

@@ -97,6 +97,7 @@ sudo -u postgres createdb khoj --password
 ##### Local Server Setup
 - *Make sure [python](https://realpython.com/installing-python/) and [pip](https://pip.pypa.io/en/stable/installation/) are installed on your machine*
+- Check [llama-cpp-python setup](https://python.langchain.com/docs/integrations/llms/llamacpp#installation) if you hit any llama-cpp issues with the installation
 Run the following command in your terminal to install the Khoj backend.
@@ -104,17 +105,36 @@ Run the following command in your terminal to install the Khoj backend.
 <Tabs groupId="operating-systems">
 <TabItem value="macos" label="MacOS">
 ```shell
+# ARM/M1+ Machines
+CMAKE_ARGS="-DLLAMA_METAL=on" python -m pip install khoj-assistant
+# Intel Machines
 python -m pip install khoj-assistant
 ```
 </TabItem>
 <TabItem value="win" label="Windows">
 ```shell
-py -m pip install khoj-assistant
+# 1. (Optional) To use NVIDIA (CUDA) GPU
+$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
+# 1. (Optional) To use AMD (ROCm) GPU
+$env:CMAKE_ARGS = "-DLLAMA_HIPBLAS=on"
+# 1. (Optional) To use Vulkan GPU
+$env:CMAKE_ARGS = "-DLLAMA_VULKAN=on"
+# 2. Install Khoj
+py -m pip install khoj-assistant
 ```
 </TabItem>
 <TabItem value="unix" label="Linux">
 ```shell
-python -m pip install khoj-assistant
+# CPU
+python -m pip install khoj-assistant
+# NVIDIA (CUDA) GPU
+CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 python -m pip install khoj-assistant
+# AMD (ROCm) GPU
+CMAKE_ARGS="-DLLAMA_HIPBLAS=on" FORCE_CMAKE=1 python -m pip install khoj-assistant
+# Vulkan GPU
+CMAKE_ARGS="-DLLAMA_VULKAN=on" FORCE_CMAKE=1 python -m pip install khoj-assistant
 ```
 </TabItem>
 </Tabs>
@@ -181,7 +201,7 @@ To use the desktop client, you need to go to your Khoj server's settings page (h
 1. Select files and folders to index [using the desktop client](/get-started/setup#2-download-the-desktop-client). When you click 'Save', the files will be sent to your server for indexing.
 - Select Notion workspaces and Github repositories to index using the web interface.
-[^1]: Khoj, by default, can use [OpenAI GPT3.5+ chat models](https://platform.openai.com/docs/models/overview) or [GPT4All chat models that follow Llama2 Prompt Template](https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/metadata/models2.json). See [this section](/miscellaneous/advanced#use-openai-compatible-llm-api-server-self-hosting) to use non-standard chat models
+[^1]: Khoj, by default, can use [OpenAI GPT3.5+ chat models](https://platform.openai.com/docs/models/overview) or [GGUF chat models](https://huggingface.co/models?library=gguf). See [this section](/miscellaneous/advanced#use-openai-compatible-llm-api-server-self-hosting) to use non-standard chat models
 :::tip[Note]
 Using Safari on Mac? You might not be able to login to the admin panel. Try using Chrome or Firefox instead.

@@ -10,4 +10,4 @@ Many Open Source projects are used to power Khoj. Here's a few of them:
 - Charles Cave for [OrgNode Parser](http://members.optusnet.com.au/~charles57/GTD/orgnode.html)
 - [Org.js](https://mooz.github.io/org-js/) to render Org-mode results on the Web interface
 - [Markdown-it](https://github.com/markdown-it/markdown-it) to render Markdown results on the Web interface
-- [GPT4All](https://github.com/nomic-ai/gpt4all) to chat with local LLM
+- [Llama.cpp](https://github.com/ggerganov/llama.cpp) to chat with local LLM
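
The per-backend install commands this commit documents for Linux and macOS follow one pattern: set `CMAKE_ARGS` to a llama.cpp build flag, then pip install `khoj-assistant`. A minimal sketch of that pattern is below. The function names `khoj_cmake_args` and `khoj_install_command` are hypothetical and not part of Khoj. The flag names assume the llama-cpp-python build options in use at the time of this commit (`LLAMA_CUBLAS`, `LLAMA_HIPBLAS`, `LLAMA_VULKAN`, `LLAMA_METAL`). The sketch only prints the command, so nothing is installed.

```shell
# Hypothetical helper: map a backend name to a CMAKE_ARGS value.
# Flag names are assumptions based on the llama-cpp-python options of that period.
khoj_cmake_args() {
  case "$1" in
    cuda)   echo "-DLLAMA_CUBLAS=on" ;;
    rocm)   echo "-DLLAMA_HIPBLAS=on" ;;
    vulkan) echo "-DLLAMA_VULKAN=on" ;;
    metal)  echo "-DLLAMA_METAL=on" ;;
    cpu)    echo "" ;;
    *)      echo "unknown backend: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print (do not run) the install command for a backend.
# FORCE_CMAKE=1 is added for every GPU backend here; the docs in this commit
# omit it for Metal, so treat that as an assumption of this sketch.
khoj_install_command() {
  _khoj_args="$(khoj_cmake_args "$1")" || return 1
  if [ -n "$_khoj_args" ]; then
    echo "CMAKE_ARGS=\"$_khoj_args\" FORCE_CMAKE=1 python -m pip install khoj-assistant"
  else
    echo "python -m pip install khoj-assistant"
  fi
}
```

For example, `khoj_install_command vulkan` prints the Vulkan install line, which can be reviewed and then run by hand. On Windows PowerShell the environment variable would instead be set with `$env:CMAKE_ARGS = "..."` before running `py -m pip install khoj-assistant`.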