Mirror of https://github.com/khoaliber/khoj.git, synced 2026-03-08 05:39:13 +00:00
Upgrade to latest GPT4All. Use Mistral as default offline chat model
GPT4All now supports gguf llama.cpp chat models. The latest GPT4All (with Mistral) responds at least 3x faster: on a MacBook Pro, responses start in ~10s vs 30s-120s earlier. Mistral is also a better chat model, although it hallucinates more than Llama-2.
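As a hedged sketch (not taken from this commit), the switch to the new default amounts to passing the gguf model filename to the gpt4all Python bindings; the `try`/`except` guard is only there so the constant is illustrative even when the package is not installed:

```python
# New default offline chat model after this commit (gguf, llama.cpp-based).
MODEL_NAME = "mistral-7b-instruct-v0.1.Q4_0.gguf"

try:
    from gpt4all import GPT4All  # requires `pip install gpt4all`

    # GPT4All downloads the model file on first use, then runs it locally.
    model = GPT4All(MODEL_NAME)
    with model.chat_session():
        print(model.generate("Hello", max_tokens=32))
except ImportError:
    # gpt4all not installed; the constant above still shows the format change
    # from the old ggml ".bin" models to the new ".gguf" ones.
    pass
```

The old `llama-2-7b-chat.ggmlv3.q4_0.bin` model used the legacy ggml format, which newer GPT4All releases no longer load; hence the file-extension change alongside the model swap.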
@@ -24,7 +24,7 @@ from khoj.processor.conversation.gpt4all.utils import download_model
 
 from khoj.processor.conversation.utils import message_to_log
 
-MODEL_NAME = "llama-2-7b-chat.ggmlv3.q4_0.bin"
+MODEL_NAME = "mistral-7b-instruct-v0.1.Q4_0.gguf"
 
 
 @pytest.fixture(scope="session")