Update docs to mention using Llama 3.1 and 20K max prompt size for it

Update stale credits to better reflect bigger open source dependencies
2026-03-02 13:18:18 +00:00 · 2024-08-22 20:27:58 -07:00
parent 238bc11a50
commit bdb81260ac
5 changed files with 13 additions and 12 deletions
--- a/documentation/docs/advanced/litellm.md
+++ b/documentation/docs/advanced/litellm.md
@@ -26,11 +26,11 @@ Using LiteLLM with Khoj makes it possible to turn any LLM behind an API into you
   - Api Key: `any string`
   - Api Base Url: **URL of your Openai Proxy API**
 4. Create a new [Chat Model Option](http://localhost:42110/server/admin/database/chatmodeloptions/add) on your Khoj admin panel.
-   - Name: `llama3` (replace with the name of your local model)
+   - Name: `llama3.1` (replace with the name of your local model)
   - Model Type: `Openai`
   - Openai Config: `<the proxy config you created in step 3>`
-   - Max prompt size: `2000` (replace with the max prompt size of your model)
+   - Max prompt size: `20000` (replace with the max prompt size of your model)
-   - Tokenizer: *Do not set for OpenAI, mistral, llama3 based models*
+   - Tokenizer: *Do not set for OpenAI, Mistral, Llama3 based models*
 5. Create a new [Server Chat Setting](http://localhost:42110/server/admin/database/serverchatsettings/add/) on your Khoj admin panel
   - Default model: `<name of chat model option you created in step 4>`
   - Summarizer model: `<name of chat model option you created in step 4>`
--- a/documentation/docs/advanced/lmstudio.md
+++ b/documentation/docs/advanced/lmstudio.md
@@ -19,10 +19,10 @@ LM Studio can expose an [OpenAI API compatible server](https://lmstudio.ai/docs/
   - Api Key: `any string`
   - Api Base Url: `http://localhost:1234/v1/` (default for LMStudio)
 4. Create a new [Chat Model Option](http://localhost:42110/server/admin/database/chatmodeloptions/add) on your Khoj admin panel.
-   - Name: `llama3` (replace with the name of your local model)
+   - Name: `llama3.1` (replace with the name of your local model)
   - Model Type: `Openai`
   - Openai Config: `<the proxy config you created in step 3>`
-   - Max prompt size: `2000` (replace with the max prompt size of your model)
+   - Max prompt size: `20000` (replace with the max prompt size of your model)
   - Tokenizer: *Do not set for OpenAI, mistral, llama3 based models*
 5. Create a new [Server Chat Setting](http://localhost:42110/server/admin/database/serverchatsettings/add/) on your Khoj admin panel
   - Default model: `<name of chat model option you created in step 4>`
--- a/documentation/docs/advanced/ollama.md
+++ b/documentation/docs/advanced/ollama.md
@@ -17,17 +17,17 @@ Ollama exposes a local [OpenAI API compatible server](https://github.com/ollama/
 1. Setup Ollama: https://ollama.com/
 2. Start your preferred model with Ollama. For example,
    ```bash
-    ollama run llama3
+    ollama run llama3.1
    ```
 3. Create a new [OpenAI Processor Conversation Config](http://localhost:42110/server/admin/database/openaiprocessorconversationconfig/add) on your Khoj admin panel
   - Name: `ollama`
   - Api Key: `any string`
   - Api Base Url: `http://localhost:11434/v1/` (default for Ollama)
 4. Create a new [Chat Model Option](http://localhost:42110/server/admin/database/chatmodeloptions/add) on your Khoj admin panel.
-   - Name: `llama3` (replace with the name of your local model)
+   - Name: `llama3.1` (replace with the name of your local model)
   - Model Type: `Openai`
   - Openai Config: `<the ollama config you created in step 3>`
-   - Max prompt size: `1000` (replace with the max prompt size of your model)
+   - Max prompt size: `20000` (replace with the max prompt size of your model)
 5. Create a new [Server Chat Setting](http://localhost:42110/server/admin/database/serverchatsettings/add/) on your Khoj admin panel
   - Default model: `<name of chat model option you created in step 4>`
   - Summarizer model: `<name of chat model option you created in step 4>`
--- a/documentation/docs/features/chat.md
+++ b/documentation/docs/features/chat.md
@@ -25,7 +25,7 @@ Offline chat stays completely private and can work without internet using open-s
 >  - An Nvidia, AMD GPU or a Mac M1+ machine would significantly speed up chat response times
 1. Open your [Khoj offline settings](http://localhost:42110/server/admin/database/offlinechatprocessorconversationconfig/) and click *Enable* on the Offline Chat configuration.
-2. Open your [Chat model options settings](http://localhost:42110/server/admin/database/chatmodeloptions/) and add any [GGUF chat model](https://huggingface.co/models?library=gguf) to use for offline chat. Make sure to use `Offline` as its type. For a balanced chat model that runs well on standard consumer hardware we recommend using [Llama 3.1 by Meta](https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF) by default.
+2. Open your [Chat model options settings](http://localhost:42110/server/admin/database/chatmodeloptions/) and add any [GGUF chat model](https://huggingface.co/models?library=gguf) to use for offline chat. Make sure to use `Offline` as its type. For a balanced chat model that runs well on standard consumer hardware we recommend using [Llama 3.1 by Meta](https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF) by default. For machines with no or small GPU we recommend using [Gemma 2 2B](https://huggingface.co/bartowski/gemma-2-2b-it-GGUF) or [Phi 3.5 mini](https://huggingface.co/bartowski/Phi-3.5-mini-instruct-GGUF)
 :::tip[Note]
--- a/documentation/docs/miscellaneous/credits.md
+++ b/documentation/docs/miscellaneous/credits.md
@@ -5,9 +5,10 @@ sidebar_position: 4
 # Credits
 Many Open Source projects are used to power Khoj. Here's a few of them:
- [Multi-QA MiniLM Model](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1), [All MiniLM Model](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) for Text Search. See [SBert Documentation](https://www.sbert.net/examples/applications/retrieve_rerank/README.html)
+- [Llama.cpp](https://github.com/ggerganov/llama.cpp) to chat with local LLM
- [OpenAI CLIP Model](https://github.com/openai/CLIP) for Image Search. See [SBert Documentation](https://www.sbert.net/examples/applications/image-search/README.html)
+- [SentenceTransformer](https://www.sbert.net/examples/applications/retrieve_rerank/README.html) for Text Search
 - [HuggingFace](https://huggingface.co/) for hosting open-source chat and search models
 - Charles Cave for [OrgNode Parser](http://members.optusnet.com.au/~charles57/GTD/orgnode.html)
 - [Org.js](https://mooz.github.io/org-js/) to render Org-mode results on the Web interface
 - [Markdown-it](https://github.com/markdown-it/markdown-it) to render Markdown results on the Web interface
- [Llama.cpp](https://github.com/ggerganov/llama.cpp) to chat with local LLM
+- [Katex](https://katex.org/) to render math