Commit Graph

1136 Commits

Author SHA1 Message Date
sabaimran
90efc2ea7a Update comments and add explanations 2023-08-01 09:24:03 -07:00
sabaimran
f7e03f6d63 Switch spinner snake case -> camel case 2023-08-01 08:52:25 -07:00
sabaimran
1c52a6993f add a lock around chat operations to prevent the offline model from getting bombarded and stealing a bunch of compute resources
- This also solves #367
2023-08-01 00:23:17 -07:00
sabaimran
6c3074061b Disable the input bar when chat response is in flight 2023-08-01 00:21:39 -07:00
sabaimran
c14cbe926a Add a loading symbol to web chat. Closes #392 2023-07-31 23:35:48 -07:00
sabaimran
8054bdc896 Use n_batch parameter to increase resource consumption on host machine (and implicitly engage GPU) 2023-07-31 23:25:08 -07:00
sabaimran
e55e9a7b67 Fix unit tests and truncation logic 2023-07-31 21:37:59 -07:00
sabaimran
2335f11b00 Add better error handling for download processes incase of failure 2023-07-31 21:07:38 -07:00
sabaimran
209975e065 Resolve merge conflicts: let Khoj fail if the model tokenizer is not found 2023-07-31 19:12:26 -07:00
sabaimran
2d6c3cd4fa Misc. quality improvements for Llama V2
- Fix download url -- was mapping to q3_K_M, but fixed to use q4_K_S
- Use a proper Llama Tokenizer for counting tokens for truncation with Llama
- Add additional null checks when running
2023-07-31 19:11:20 -07:00
sabaimran
ca195097d7 Update chat hint message at first run 2023-07-31 17:46:09 -07:00
Debanjum Singh Solanky
ded606c7cb Fix format of user query during general conversation with Llama 2 2023-07-31 17:21:14 -07:00
Debanjum Singh Solanky
48e5ac0169 Do not drop system message when truncating context to max prompt size
Previously the system message was getting dropped when the context
size with chat history would be more than the max prompt size
supported by the cat model

Now only the previous chat messages are dropped or the current
message is truncated but the system message is kept to provide
guidance to the chat model
2023-07-31 17:21:14 -07:00
sabaimran
88ef86ad5c Fix typing issues for mypy (#372) 2023-07-30 19:27:48 -07:00
sabaimran
ca2c942b65 Add typing to compiled_references and inferred_queries 2023-07-30 19:10:30 -07:00
sabaimran
3646fd1449 Add a warning to indicate that Khoj is not configured to work with personal data sources 2023-07-30 18:52:10 -07:00
sabaimran
996832dc72 Allow user to chat even if content types aren't configured - use empty references 2023-07-30 18:47:45 -07:00
Debanjum Singh Solanky
53810a0ff7 Create khoj config dir if non-existant, before writing to khoj env file 2023-07-30 01:35:36 -07:00
sabaimran
f65d157244 Release Khoj version 0.10.0 2023-07-28 19:27:47 -07:00
Debanjum Singh Solanky
f76af869f1 Do not log the gpt4all chat response stream in khoj backend
Stream floods stdout and does not provide useful info to user
2023-07-28 19:14:04 -07:00
sabaimran
5ccb01343e Add Offline chat to Obsidian (#359)
* Add support for configuring/using offline chat from within Obsidian
* Fix type checking for search type
* If Github is not configured, /update call should fail
* Fix regenerate tests same as the update ones
* Update help text for offline chat in obsidian
* Update relevant description for Khoj settings in Obsidian
* Simplify configuration logic and use smarter defaults
2023-07-28 18:47:56 -07:00
Debanjum
b3c1507708 Merge pull request #361 from khoj-ai/configure-offline-chat-from-emacs
- Configure using Offline Chat from Emacs: 
- Enable, Disable Offline Chat from Emacs

- Use: Enable offline chat with `(setq khoj-chat-offline t)' during khoj setup
- Benefits: Offline chat models are better for privacy but not great at answering questions
2023-07-28 18:06:58 -07:00
sabaimran
9f78db0579 Let Offline chat override OpenAI API settings (#362)
* Let Offline chat override OpenAI API settings
* Download the offline model whenever offline chat is enabled
* Add progressbar for download for llamav2 model to track progress
* Change ordering of n due to switch of default processor
* Flip ordering of offline/openai checks when extracting questions from query
2023-07-28 17:26:20 -07:00
Debanjum Singh Solanky
ebfbef1f68 Configure using offline chat from Emacs
Closes #358
2023-07-28 16:07:33 -07:00
sabaimran
29081f4429 Adjust parameters for offline chat 2023-07-27 22:22:09 -07:00
sabaimran
124d97c26d Replace Falcon 🦅 model with Llama V2 🦙 for offline chat (#352)
* Working example with LlamaV2 running locally on my machine

- Download from huggingface
- Plug in to GPT4All
- Update prompts to fit the llama format

* Add appropriate prompts for extracting questions based on a query based on llama format

* Rename Falcon to Llama and make some improvements to the extract_questions flow

* Do further tuning to extract question prompts and unit tests

* Disable extracting questions dynamically from Llama, as results are still unreliable
2023-07-27 20:51:20 -07:00
Debanjum Singh Solanky
715d56d4f0 Use new schema to update khoj.yml config from khoj.el 2023-07-26 17:34:16 -07:00
sabaimran
8b2af0b5ef Add support for our first Local LLM 🤖🏠 (#330)
* Add support for gpt4all's falcon model as an additional conversation processor
- Update the UI pages to allow the user to point to the new endpoints for GPT
- Update the internal schemas to support both GPT4 models and OpenAI
- Add unit tests benchmarking some of the Falcon performance
* Add exc_info to include stack trace in error logs for text processors
* Pull shared functions into utils.py to be used across gpt4 and gpt
* Add migration for new processor conversation schema
* Skip GPT4All actor tests due to typing issues
* Fix Obsidian processor configuration in auto-configure flow
* Rename enable_local_llm to enable_offline_chat
2023-07-26 16:27:08 -07:00
sabaimran
23d77ee338 Fix import issues in desktop image builds (#343) 2023-07-26 15:45:52 -07:00
Debanjum Singh Solanky
7722a9c347 Default to using the gpt-3.5-turbo model for chat from khoj.el 2023-07-22 00:29:26 -07:00
Debanjum Singh Solanky
f0d4a4cf9a Revert "Make configure_content functional. Do not pass content index state to it."
This reverts commit 2ddee7e745 as it
broke partial updates of the content index for just the specified
content types
2023-07-21 13:59:09 -07:00
sabaimran
82c725817e Merge branch 'master' of github.com:khoj-ai/khoj 2023-07-21 13:24:05 -07:00
sabaimran
596e11ec6d Use the same function for computing entries for IDs regardless of whether it has prev entries 2023-07-21 13:23:56 -07:00
Debanjum Singh Solanky
2ddee7e745 Make configure_content functional. Do not pass content index state to it. 2023-07-20 23:24:08 -07:00
sabaimran
1610d2ebd9 📝 Add a documentation base for Khoj! (#333)
* Add docs for more organized, accessible information detailing Khoj setup
* Delete duplicated files
* Add a coverpage without enabling it. Add logo and theme
* Remove obsidian README.md
* Add plausible script to index.html via docsify
2023-07-20 22:34:25 -07:00
Debanjum Singh Solanky
3e59be7f1d Release Khoj version 0.9.0 2023-07-18 19:59:27 -07:00
Debanjum Singh Solanky
d078e7b1f6 Clean up search type usage in khoj server, tests and Readme 2023-07-18 19:57:55 -07:00
Debanjum Singh Solanky
4d910936b7 Fix triggering index update on khoj server from khoj.el 2023-07-18 19:57:54 -07:00
Debanjum Singh Solanky
5c7d7f558d Make AI model used for Khoj chat configurable from khoj.el
- Fix bug. Set the unused model-name to a standad default value
2023-07-18 19:57:54 -07:00
Debanjum
5f2be2a9bb Merge pull request #298 from HyunggyuJang/patch-1
Encode config as utf-8 during setup in khoj.el. This will allow utf-8 encoded files etc to be passed in config
2023-07-18 17:54:11 -07:00
Debanjum Singh Solanky
429e1b4b48 Regenerate index to apply corruption fixes on first run of new khoj 2023-07-18 16:10:47 -07:00
Debanjum Singh Solanky
83e1088d42 Manage khoj.yml config migrations on app start. Version the schema
- Add version to khoj.yml schema
  Versioning the khoj.yml config schema will simplify future migrations
2023-07-18 16:10:10 -07:00
Debanjum Singh Solanky
71e8ddd9a2 Check if PDF is configured before showing it as an option in khoj.el 2023-07-17 15:49:20 -07:00
Debanjum
d00c5da8b7 Merge pull request #325 from khoj-ai/stablize-simplify-content-indexing
## Stabilize and Simplify Content Indexing

### Major Updates
- 9bcca43 Unify logic to update entries when indexing from scratch or incrementally
- 89c7819 Unify logic to update embeddings when indexing from scratch or incrementally
- 6a0297c Stable sort new entries when marking entries for update
- 58d86d7 Unify logic to configure server from API or on server start
- Create tests to ensure old entries, embeddings in index are unaffected on adding new entries
  - Refer: 1482fd4, 7669b85, 88d1a29 
  - ad41ef3 Make normalization of embeddings configurable to test this in c73feeb

### Minor Updates
- 1673bb5 Add todo state to compiled form of each entry
- 6e70b91 Remove unused `dump_jsonl` helper method 
- 7ad9603 Improve naming of lock
- b02323a Improve naming text search test methods

Resolves #190
2023-07-17 14:51:10 -07:00
Debanjum Singh Solanky
3e3a1ecbc8 Start app even if server init fails to let user fix it
Show stacktrace on error to help debugging
2023-07-17 14:33:02 -07:00
Debanjum Singh Solanky
ef6a0044f4 Drop embeddings of deleted text entries from index
Previously the deleted embeddings would continue to be in the index,
even after the entry was deleted
2023-07-16 03:47:05 -07:00
Debanjum Singh Solanky
ad41ef3991 Make normalizing embeddings configurable 2023-07-16 02:16:33 -07:00
Debanjum Singh Solanky
89c7819cb7 Unify logic to generate embeddings from scratch and incrementally
This simplifies the `compute_embeddings' method and avoids potential
later divergence in handling the index regenerate vs update scenarios
2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
6a0297cc86 Stable sort new entries when marking entries for update 2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky
6e70b914c2 Remove unused dump_jsonl method
The entries index is stored ingzipped jsonl files for each content type
2023-07-16 01:45:53 -07:00