From dcdd1edde2e7638fffe8e88fca3f8450b7448de5 Mon Sep 17 00:00:00 2001 From: Debanjum Singh Solanky Date: Sat, 16 Mar 2024 04:25:14 +0530 Subject: [PATCH] Update docs to show how to set up llama-cpp with Khoj - How to pip install khoj to run offline chat on GPU After the migration to llama-cpp-python, more GPU types are supported but require a build step, so mention how - New default offline chat model - Where to get supported chat models from on HuggingFace --- documentation/docs/features/chat.md | 6 ++--- documentation/docs/get-started/setup.mdx | 26 ++++++++++++++++++--- documentation/docs/miscellaneous/credits.md | 2 +- 3 files changed, 27 insertions(+), 7 deletions(-) diff --git a/documentation/docs/features/chat.md b/documentation/docs/features/chat.md index f6581746..323ab71c 100644 --- a/documentation/docs/features/chat.md +++ b/documentation/docs/features/chat.md @@ -14,16 +14,16 @@ You can configure Khoj to chat with you about anything. When relevant, it'll use ### Setup (Self-Hosting) #### Offline Chat -Offline chat stays completely private and works without internet using open-source models. +Offline chat stays completely private and can work without internet using open-source models. > **System Requirements**: > - Minimum 8 GB RAM. Recommend **16Gb VRAM** > - Minimum **5 GB of Disk** available > - A CPU supporting [AVX or AVX2 instructions](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) is required -> - A Mac M1+ or [Vulcan supported GPU](https://vulkan.gpuinfo.org/) should significantly speed up chat response times +> - An NVIDIA or AMD GPU, or a Mac M1+ machine, would significantly speed up chat response times 1. Open your [Khoj offline settings](http://localhost:42110/server/admin/database/offlinechatprocessorconversationconfig/) and click *Enable* on the Offline Chat configuration. -2. Open your [Chat model options](http://localhost:42110/server/admin/database/chatmodeloptions/) and add a new option for the offline chat model you want to use. 
Make sure to use `Offline` as its type. We currently only support offline models that use the [Llama chat prompt](https://replicate.com/blog/how-to-prompt-llama#wrap-user-input-with-inst-inst-tags) format. We recommend using `mistral-7b-instruct-v0.1.Q4_0.gguf`. +2. Open your [Chat model options](http://localhost:42110/server/admin/database/chatmodeloptions/) and add a new option for the offline chat model you want to use. Make sure to use `Offline` as its type. We support any [GGUF chat model](https://huggingface.co/models?library=gguf) for offline chat. For a balanced chat model that runs well on standard consumer hardware, we recommend the default, [Hermes-2-Pro-Mistral-7B by NousResearch](https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF). :::tip[Note] diff --git a/documentation/docs/get-started/setup.mdx b/documentation/docs/get-started/setup.mdx index 3b2b8db5..b209bbee 100644 --- a/documentation/docs/get-started/setup.mdx +++ b/documentation/docs/get-started/setup.mdx @@ -97,6 +97,7 @@ sudo -u postgres createdb khoj --password ##### Local Server Setup - *Make sure [python](https://realpython.com/installing-python/) and [pip](https://pip.pypa.io/en/stable/installation/) are installed on your machine* +- Check the [llama-cpp-python setup guide](https://python.langchain.com/docs/integrations/llms/llamacpp#installation) if you hit any llama-cpp issues during installation Run the following command in your terminal to install the Khoj backend. @@ -104,17 +105,36 @@ Run the following command in your terminal to install the Khoj backend. ```shell +# ARM/M1+ Machines +CMAKE_ARGS="-DLLAMA_METAL=on" python -m pip install khoj-assistant + +# Intel Machines python -m pip install khoj-assistant ``` ```shell - py -m pip install khoj-assistant + # 1. (Optional) To use NVIDIA (CUDA) GPU + $env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on" + # 1. (Optional) To use AMD (ROCm) GPU + $env:CMAKE_ARGS = "-DLLAMA_HIPBLAS=on" + # 1. (Optional) To use Vulkan GPU + $env:CMAKE_ARGS = "-DLLAMA_VULKAN=on" + + # 2. Install Khoj + py -m pip install khoj-assistant ``` ```shell -python -m pip install khoj-assistant + # CPU + python -m pip install khoj-assistant + # NVIDIA (CUDA) GPU + CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 python -m pip install khoj-assistant + # AMD (ROCm) GPU + CMAKE_ARGS="-DLLAMA_HIPBLAS=on" FORCE_CMAKE=1 python -m pip install khoj-assistant + # Vulkan GPU + CMAKE_ARGS="-DLLAMA_VULKAN=on" FORCE_CMAKE=1 python -m pip install khoj-assistant ``` @@ -181,7 +201,7 @@ To use the desktop client, you need to go to your Khoj server's settings page (h 1. Select files and folders to index [using the desktop client](/get-started/setup#2-download-the-desktop-client). When you click 'Save', the files will be sent to your server for indexing. - Select Notion workspaces and Github repositories to index using the web interface. -[^1]: Khoj, by default, can use [OpenAI GPT3.5+ chat models](https://platform.openai.com/docs/models/overview) or [GPT4All chat models that follow Llama2 Prompt Template](https://github.com/nomic-ai/gpt4all/blob/main/gpt4all-chat/metadata/models2.json). See [this section](/miscellaneous/advanced#use-openai-compatible-llm-api-server-self-hosting) to use non-standard chat models +[^1]: Khoj, by default, can use [OpenAI GPT3.5+ chat models](https://platform.openai.com/docs/models/overview) or [GGUF chat models](https://huggingface.co/models?library=gguf). See [this section](/miscellaneous/advanced#use-openai-compatible-llm-api-server-self-hosting) to use non-standard chat models :::tip[Note] Using Safari on Mac? You might not be able to login to the admin panel. Try using Chrome or Firefox instead. 
diff --git a/documentation/docs/miscellaneous/credits.md b/documentation/docs/miscellaneous/credits.md index 6f77ed41..d1c3c90c 100644 --- a/documentation/docs/miscellaneous/credits.md +++ b/documentation/docs/miscellaneous/credits.md @@ -10,4 +10,4 @@ Many Open Source projects are used to power Khoj. Here's a few of them: - Charles Cave for [OrgNode Parser](http://members.optusnet.com.au/~charles57/GTD/orgnode.html) - [Org.js](https://mooz.github.io/org-js/) to render Org-mode results on the Web interface - [Markdown-it](https://github.com/markdown-it/markdown-it) to render Markdown results on the Web interface -- [GPT4All](https://github.com/nomic-ai/gpt4all) to chat with local LLM +- [Llama.cpp](https://github.com/ggerganov/llama.cpp) to chat with local LLMs
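The setup.mdx hunk above picks a `CMAKE_ARGS` value per platform and GPU by hand. The same selection can be sketched as a small shell helper; this is a hedged illustration for the patch discussion, not part of the Khoj docs, and the `pick_cmake_args` function name is hypothetical. Only the Apple Silicon case is auto-detected here; GPU builds still need the flags chosen manually, as in the docs.

```shell
# Hypothetical helper: choose llama-cpp-python build flags for
# `pip install khoj-assistant` based on the current machine.
# Flag values mirror the docs above; GPU cases are left to the user
# since a GPU cannot be reliably detected from uname alone.
pick_cmake_args() {
  case "$(uname -s)-$(uname -m)" in
    Darwin-arm64) echo "-DLLAMA_METAL=on" ;;  # Apple Silicon: Metal acceleration
    *)            echo "" ;;                  # default: plain CPU build
  esac
}

# Example usage (NVIDIA users would instead set CMAKE_ARGS="-DLLAMA_CUBLAS=on"):
# CMAKE_ARGS="$(pick_cmake_args)" FORCE_CMAKE=1 python -m pip install khoj-assistant
echo "Selected build flags: $(pick_cmake_args)"
```

`FORCE_CMAKE=1` asks llama-cpp-python to rebuild from source so the chosen flags actually take effect, rather than reusing a prebuilt CPU wheel.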