klbr/khoj

mirror of https://github.com/khoaliber/khoj.git synced 2026-04-23 17:09:18 +00:00

Author	SHA1	Message	Date
sabaimran	596e11ec6d	Use the same function for computing entries for IDs regardless of whether it has prev entries	2023-07-21 13:23:56 -07:00
Debanjum Singh Solanky	3e59be7f1d	Release Khoj version 0.9.0	2023-07-18 19:59:27 -07:00
Debanjum Singh Solanky	d078e7b1f6	Clean up search type usage in khoj server, tests and Readme	2023-07-18 19:57:55 -07:00
Debanjum Singh Solanky	4d910936b7	Fix triggering index update on khoj server from khoj.el	2023-07-18 19:57:54 -07:00
Debanjum Singh Solanky	5c7d7f558d	Make AI model used for Khoj chat configurable from khoj.el - Fix bug. Set the unused model-name to a standard default value	2023-07-18 19:57:54 -07:00
Debanjum	5f2be2a9bb	Merge pull request #298 from HyunggyuJang/patch-1 Encode config as utf-8 during setup in khoj.el. This will allow utf-8 encoded files etc to be passed in config	2023-07-18 17:54:11 -07:00
Debanjum Singh Solanky	429e1b4b48	Regenerate index to apply corruption fixes on first run of new khoj	2023-07-18 16:10:47 -07:00
Debanjum Singh Solanky	83e1088d42	Manage khoj.yml config migrations on app start. Version the schema - Add version to khoj.yml schema Versioning the khoj.yml config schema will simplify future migrations	2023-07-18 16:10:10 -07:00
Debanjum Singh Solanky	71e8ddd9a2	Check if PDF is configured before showing it as an option in khoj.el	2023-07-17 15:49:20 -07:00
Debanjum	d00c5da8b7	Merge pull request #325 from khoj-ai/stablize-simplify-content-indexing ## Stabilize and Simplify Content Indexing ### Major Updates - `9bcca43` Unify logic to update entries when indexing from scratch or incrementally - `89c7819` Unify logic to update embeddings when indexing from scratch or incrementally - `6a0297c` Stable sort new entries when marking entries for update - `58d86d7` Unify logic to configure server from API or on server start - Create tests to ensure old entries, embeddings in index are unaffected on adding new entries - Refer: `1482fd4`, `7669b85`, `88d1a29` - `ad41ef3` Make normalization of embeddings configurable to test this in `c73feeb` ### Minor Updates - `1673bb5` Add todo state to compiled form of each entry - `6e70b91` Remove unused `dump_jsonl` helper method - `7ad9603` Improve naming of lock - `b02323a` Improve naming text search test methods Resolves #190	2023-07-17 14:51:10 -07:00
Debanjum Singh Solanky	3e3a1ecbc8	Start app even if server init fails to let user fix it Show stacktrace on error to help debugging	2023-07-17 14:33:02 -07:00
Debanjum Singh Solanky	ef6a0044f4	Drop embeddings of deleted text entries from index Previously the deleted embeddings would continue to be in the index, even after the entry was deleted	2023-07-16 03:47:05 -07:00
Debanjum Singh Solanky	ad41ef3991	Make normalizing embeddings configurable	2023-07-16 02:16:33 -07:00
Debanjum Singh Solanky	89c7819cb7	Unify logic to generate embeddings from scratch and incrementally This simplifies the `compute_embeddings' method and avoids potential later divergence in handling the index regenerate vs update scenarios	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	6a0297cc86	Stable sort new entries when marking entries for update	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	6e70b914c2	Remove unused dump_jsonl method The entries index is stored in gzipped jsonl files for each content type	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	9bcca43299	Use single func to handle indexing from scratch and incrementally Previous regenerate mechanism did not deduplicate entries with same key So entries looked different between regenerate and update Having single func, mark_entries_for_update, to handle both scenarios will avoid this divergence Update all text_to_jsonl methods to use the above method for generating index from scratch	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	1673bb5558	Add todo state to compiled form of each org-mode entry	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	7ad96036b0	Improve lock name to config_lock instead of search_index_lock It is used to lock updates to all app config state, including processor	2023-07-16 01:45:53 -07:00
Debanjum Singh Solanky	58d86d7876	Use single func to configure server via API and on server start Improve error messages on failure to configure server components	2023-07-16 01:45:53 -07:00
sabaimran	a15711e635	Fix null type checks in get /config	2023-07-15 15:53:56 -07:00
sabaimran	e590d75b20	Start Khoj even when config is not valid (#320 ) * Add icon to indicate bad config, start Khoj even if there was an issue setting up the index	2023-07-15 14:11:54 -07:00
sabaimran	49ab201c30	Fix issues importing PySide in Docker container (#322 ) * Rather than installing PyQT dependencies, remove codepaths that require pyqt files in no-gui mode	2023-07-15 13:33:13 -07:00
sabaimran	ba47f2ab39	Merge branch 'master' of github.com:debanjum/khoj	2023-07-14 22:28:05 -07:00
sabaimran	874cffd256	Add additional support for parsing notion workspaces	2023-07-14 22:27:56 -07:00
Debanjum	52f68167ce	Merge pull request #317 from khoj-ai/reduce-memory-consumption-by-search-model-duplication Reuse Search Models across Content Types to reduce Memory Consumption - Memory consumption now only scales with search models used, not with content types. Previously each content type had its own copy of the search ML models. That'd result in 300+ Mb per enabled text content type - Split model state into 2 separate state objects, `search_models` and `content_index`. This allows loading text_search and image_search models first and then reusing them across all content_types in content_index - The change should cut down memory utilization quite a bit for most users. I see a >50% drop in memory utilization on my Khoj instance. But this will vary for each user based on the amount of content indexed vs number of plugins enabled. - This change does not solve the RAM utilization scaling with size of the index, as the whole content index is still kept in RAM while Khoj is running Should help with #195, #301 and #303	2023-07-14 19:54:12 -07:00
Debanjum Singh Solanky	f08e9539f1	Release lock after updating index even if update fails to prevent deadlock Wrap acquire/release locks in try/catch/finally when updating content index and search models to prevent lock not being released on error and causing a deadlock	2023-07-14 16:57:27 -07:00
sabaimran	37f7f9fd1d	Add additional telemetry for system understanding (#316 ) * Add additional telemetry in order to understand which data sources are the most useful * Make actions side by side in the configuration page * Restore main run command * Update links to point to wiki pages for Github, Notion integrations * Standardize nomenclature of the api_type to use _config suffix Remove header fields that aren't actually helpful for understanding config usage	2023-07-14 10:14:07 -07:00
Debanjum Singh Solanky	86e2bec9a0	Reuse Search Models across Content Types to Reduce Memory Consumption - Memory consumption now only scales with search models used, not with content types as well. Previously each content type had its own copy of the search ML models. That'd result in 300+ Mb per enabled content type - Split model state into 2 separate state objects, `search_models' and `content_index'. This allows loading text_search and image_search models first and then reusing them across all content_types in content_index - This should cut down memory utilization quite a bit for most users. I see a ~50% drop in memory utilization. This will, of course, vary for each user based on the amount of content indexed vs number of plugins enabled - This does not solve the RAM utilization scaling with size of the index. As the whole content index is still kept in RAM while Khoj is running Should help with #195, #301 and #303	2023-07-14 01:27:22 -07:00
Debanjum	b2718d330c	Merge pull request #304 from migrate-from-pyqt-to-pyside Migrate from PyQT6 to PySide6	2023-07-13 11:54:47 -07:00
sabaimran	31e933207f	Set default values for sys.stdout if they're unavailable	2023-07-12 22:22:49 -07:00
Debanjum Singh Solanky	9c76150895	Migrate from PyQT6 to PySide6	2023-07-11 18:43:44 -07:00
HyunggyuJang	88c42b3043	Encode data as utf-8 otherwise it will complain, see https://github.com/magit/ghub/commit/1c855310909340791e5ed0410ffa6b64dbec1fed	2023-07-11 17:06:05 +09:00
Debanjum Singh Solanky	f664a74e77	Update Khoj server to run on non standard port, 42110 instead of 8000 Resolves #295	2023-07-10 21:27:58 -07:00
sabaimran	effb52f859	Fix demo rendering with the new header	2023-07-10 21:16:19 -07:00
sabaimran	55f5be7b03	Release Khoj version 0.8.2	2023-07-10 14:39:32 -07:00
sabaimran	9a63f89f33	Merge branch 'master' of github.com:debanjum/khoj	2023-07-10 14:31:19 -07:00
sabaimran	53809298c0	Release Khoj version 0.8.1	2023-07-10 14:30:04 -07:00
tjsousa	5b37e988e6	Allow using configured GPT chat model (#292 ) My account doesn't have gpt-4 enabled and it wouldn't work as the default value was always used from extract_questions, where the caller could use the configured model.	2023-07-10 14:24:40 -07:00
Debanjum Singh Solanky	75ff871217	Release Khoj version 0.8.0	2023-07-10 13:37:51 -07:00
Debanjum Singh Solanky	979088b3dc	Add tooltip helper text on web settings page buttons - Provide more details on what clicking configure, initialize buttons or changing the results count slider does - This shows up on user hovering over those buttons	2023-07-10 13:32:41 -07:00
Debanjum Singh Solanky	255781e135	Use relative link on logo to jump to correct page on local and cloud	2023-07-10 13:22:20 -07:00
Debanjum Singh Solanky	b2d229c116	Move header pane style to base khoj.css for reuse. Fix logo size	2023-07-10 13:10:17 -07:00
Debanjum Singh Solanky	20cb314171	Open the Khoj config page in the browser on first run	2023-07-10 12:10:20 -07:00
sabaimran	07cf5a214a	Check if PDF files are present in the Obsidian vault before initializing the Khoj configuration (#293 )	2023-07-10 10:33:04 -07:00
sabaimran	7364bac8ae	Make the header take up less space - Use a single row for the header - Needed custom styling for each page because each of them are different in subtle ways, unfortunately	2023-07-09 22:31:37 -07:00
sabaimran	62704cac09	Add a plugin which allows users to index their Notion pages (#284 ) * For the demo instance, re-instate the scheduler, but infrequently for api updates - In constants, determine the cadence based on whether it's a demo instance or not - This allows us to collect telemetry again. This will also allow us to save the chat session * Conditionally skip updating the index altogether if it's a demo instance * Add backend support for Notion data parsing - Add a NotionToJsonl class which parses the text of Notion documents made accessible to the API token - Make corresponding updates to the default config, raw config to support the new notion addition * Add corresponding views to support configuring Notion from the web-based settings page - Support backend APIs for deleting/configuring notion setup as well - Streamline some of the index updating code * Use defaults for search and chat queries results count * Update pagination of retrieving pages from Notion * Update state conversation processor when update is hit * frequency_penalty should be passed to gpt through kwargs * Add check for notion in render_multiple method * Add headings to Notion render * Revert results count slider and split Notion files by blocks * Clean/fix misc things in the function to update index - Use the successText and errorText variables appropriately - Name parameters in function calls - Add emojis, woohoo * Clean up and further modularize code for processing data in Notion	2023-07-09 15:29:26 -07:00
Debanjum	77755c0284	Fix Packaging the Khoj Desktop Apps (#289 ) * Add langchain static files and pytorch metadata to Khoj native app * Add pillow static files, metadata & hidden imports to Khoj native app * Fix path to web interface static files on Khoj native app * Add tiktoken hidden imports to make chat work from Khoj native app * Fix Khoj native app to run with GUI mode enabled This got broken when we moved from using the --no-gui flag to using --gui in https://github.com/khoj-ai/khoj/pull/263	2023-07-09 10:21:16 -07:00
sabaimran	4c135ea316	Make streaming optional for the /chat endpoint (#287 ) * Update the /chat endpoint to conditionally support streaming - If streams are enabled, return the threadgenerator as it does currently - If stream is disabled, return a JSON response with the response/compiled references separated out - Correspondingly, update the chat.html UI to use the streamed API, as well as Obsidian - Rename chat/init/ to chat/history * Update khoj.el to use the /history endpoint - Update corresponding unit tests to use stream=true * Remove & from call to /chat for obsidian * Abstract functions out into a helpers.py file and clean up some of the error-catching	2023-07-09 10:12:09 -07:00
Debanjum Singh Solanky	0a86220d42	Use default values, delete content config on disable and update state	2023-07-07 20:36:16 -07:00
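
Commit f08e9539f1 above describes wrapping lock acquire/release in try/catch/finally so that a failed index update cannot leave the lock held and cause a deadlock. A minimal sketch of that pattern in Python follows. It is not the Khoj code: the name `config_lock` is borrowed from commit 7ad96036b0, while `update_index`, `update_fn` and `failing_update` are hypothetical names used only for illustration.

```python
import threading

# Lock name borrowed from commit 7ad96036b0; how Khoj actually uses it is an assumption
config_lock = threading.Lock()


def update_index(update_fn):
    """Run update_fn while holding config_lock; release the lock even if it raises."""
    config_lock.acquire()
    try:
        return update_fn()
    finally:
        # Runs on success and on error, so a failed update cannot keep the lock held
        config_lock.release()


def failing_update():
    # Hypothetical stand-in for an index update that fails midway
    raise ValueError("simulated indexing failure")


try:
    update_index(failing_update)
except ValueError:
    pass

# The lock is free again, so a later update can still acquire it
print(config_lock.locked())
```

Writing `with config_lock:` around the update call gives the same guarantee more compactly, since the context manager releases the lock on exit whether or not an exception was raised.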

1002 Commits