Commit Graph

293 Commits

Author SHA1 Message Date
sabaimran 73e38fccf3 Explicitly set billing to off in the test for being able to index a large set of data 2023-11-25 20:48:32 -08:00
sabaimran b2afbaa315 Add support for rate limiting the amount of data indexed
- Add a dependency on the indexer API endpoint that rounds up the amount of data indexed and uses that to determine whether the next set of data should be processed
- Delete any files that are being removed for adminstering the calculation
- Show current amount of data indexed in the config page
2023-11-25 20:28:04 -08:00
sabaimran 60c23d9e3a Add online search chat director tests 2023-11-21 23:08:36 -08:00
sabaimran c652a7fd2d Move text_to_entries under the new content folder 2023-11-21 22:25:17 -08:00
sabaimran 1e2af083f0 Rename the data_sources module to content 2023-11-21 22:11:32 -08:00
sabaimran 2bb989e9d8 Resolve merge conflicts and fix some import ordering 2023-11-21 12:30:43 -08:00
sabaimran a474c31e02 Move the django app into the src/khoj folder for better organization and functionality
- Our pypi package currently does not work because the django app and associated database is not included. To remedy this issue, move the app into the src/khoj folder. This has the added benefit of improved organization of the codebase, as all server related code is now in a single folder
- Update associated file paths and system references
2023-11-21 10:56:04 -08:00
sabaimran b8e6883a81 Merge branch 'master' of github.com:khoj-ai/khoj into features/internet-enabled-search 2023-11-19 16:20:08 -08:00
sabaimran 4def8cce36 Merge pull request #541 from asim-shrestha/patch-1
Add test separators
2023-11-19 14:14:34 -08:00
Debanjum 71799add0b Index Parent Headings of Org-Mode Entries to Improve Search Context (#548)
### Overview
The parent hierarchy of org-mode entries can store important context. 
This change updates OrgNode to track parent headings for each org entry and adds the parent outline for each entry to the index

### Details
- Test search uses ancestor headings as context for improved results
- Add ancestor headings of each org-mode entry to their compiled form
- Track ancestor headings for each org-mode entry in org-node parser

Resolves #85
2023-11-19 13:18:19 -08:00
sabaimran e398a76779 Fix test word filter 2023-11-19 13:14:58 -08:00
sabaimran 33a9304428 Resolve merge conflicts 2023-11-19 12:57:55 -08:00
sabaimran ef5e9d66c1 Resolve merge conflicts in dependency imports 2023-11-19 11:42:20 -08:00
sabaimran f688529150 Update the default configuration for the AppConfig 2023-11-17 19:26:31 -08:00
Debanjum Singh Solanky ca87b4ede9 Wrap common API query parameters into shared class to deduplicate code
- Upgrade FastAPI to >= latest version. Required upgrade of FastAPI.
  Earlier version didn't support wrapping common query params in class

- Use per fixture app instead of a global FastAPI app in conftest

- Upgrade minimum required Django version

- Fix no notes chat director test with updated no notes message
  No notes message was updated in commit 118f1143
2023-11-17 18:43:49 -08:00
Debanjum Singh Solanky 33ad9b8e64 Update text search test since indexing ancestor hierarchy added 2023-11-17 15:26:55 -08:00
Debanjum Singh Solanky 55785d50c3 Use title, when present, as root ancestor of entries instead of file path 2023-11-17 15:03:27 -08:00
sabaimran ec06d2c446 Move data indexer files into a separate folder under processor. Update assoc UTs 2023-11-16 17:19:55 -08:00
Debanjum Singh Solanky ddb07def0d Test search uses ancestor headings as context for improved results
- Update test data to add deeper outline hierarchy for testing
  hierarchy as context
- Update collateral tests that need count of entries updated, deleted
  asserts to be updated
2023-11-16 03:05:19 -08:00
Debanjum Singh Solanky 74403e3536 Add ancestor headings of each org-mode entry to their compiled form
Resolves #85
2023-11-16 02:54:41 -08:00
Debanjum Singh Solanky 305c25ae1a Track ancestor headings for each org-mode entry in org-node parser 2023-11-16 02:39:14 -08:00
Debanjum Singh Solanky 922983bd53 Set max cos distance to 0.18. Test search API query with max distance 2023-11-15 20:26:21 -08:00
Debanjum Singh Solanky 08a057bdd5 Rename SearchModel to SearchModelConfig DB model, Require Cross-Encoder 2023-11-15 17:31:50 -08:00
Debanjum Singh Solanky 5a6ab9cc85 Fix failing client tests 2023-11-15 00:17:44 -08:00
Debanjum Singh Solanky 8f200cf53f Remove unused parameter from configure_search_type method 2023-11-14 19:09:35 -08:00
Debanjum Singh Solanky 4af194d74b Make search model configurable on server
- Expose ability to modify search model via Django admin interface
- Previously the bi_encoder and cross_encoder models to use were set
  in code
- Now it's user configurable but with a default config generated by
  default
2023-11-14 19:09:35 -08:00
Debanjum Singh Solanky e98141f4c3 Subscribe default user to standard plan with a far away renewal date
Self hosted users in anonymous mode have all capabilities unlocked
2023-11-14 16:31:39 -08:00
Asim Shrestha 0bfc094e18 Add test separators 2023-11-11 17:08:58 -08:00
Debanjum Singh Solanky c4364b9100 Weaken asking follow-up qs and q&a mode in notes prompt to OpenAI models
- Notes prompt doesn't need to be so tuned to question answering. User
could just want to talk about life. The notes need to be used to
response to those, not necessarily only retrieve answers from notes

- System and notes prompts were forcing asking follow-up questions a
  little too much. Reduce strength of follow-up question asking
2023-11-10 23:36:43 -08:00
sabaimran e2e96f9aa4 Add default settings to let new users be subscribed on trial
- Add the default user to a subscription trial
- Update associated unit tests
2023-11-10 22:38:28 -08:00
Debanjum Singh Solanky c9c0ba67c6 Fix chat_client configurations for OpenAI chat director tests 2023-11-10 17:29:23 -08:00
sabaimran 262a8574d1 Add a test to verify that a user without data sucessfully returns a respones to the /search endpoint 2023-11-10 14:00:58 -08:00
Debanjum Singh Solanky 404d47f1a1 Bubble up content indexing errors to notify user on client apps 2023-11-07 05:28:13 -08:00
Debanjum Singh Solanky 9ab327a2b6 Store the data source of each entry in database
This will be useful for updating, deleting entries by their data
source. Data source can be one of Computer, Github or Notion for now

Store each file/entries source in database
2023-11-07 02:18:48 -08:00
Debanjum Singh Solanky a08b152358 Improve log messages in text_entries and memory leak unit test 2023-11-06 19:27:31 -08:00
Debanjum 38f24a037d Improve Indexing Text Entries (#535)
Major
- Ensure search results logic consistent across migration to DB, multi-user
- Manually verified search results for sample queries look the same across migration
 - Flatten indexing code for better indexing progress tracking and code readability

Minor
- a4f407f Test memory leak on MPS device when generating vector embeddings
- ef24485 Improve Khoj with DB setup instructions in the Django app readme (for now)
- f212cc7 Arrange remaining text search tests in arrange, act, assert order
- 022017d Fix text search tests to test updated indexing log messages
2023-11-06 16:01:53 -08:00
sabaimran 3d6e8d53fe Try adding dependencies for libgl in order to run OCR in github action unit tests 2023-11-05 15:09:40 -08:00
sabaimran fdd727712f Rename test files from x_to_jsonl to x_to_entries 2023-11-05 14:33:07 -08:00
Debanjum Singh Solanky a4f407f595 Test memory leak on MPS device when generating vector embeddings
Slope threshold of 2.0 determined qualitatively on local Mac device
Minor unused import and clean-up
2023-11-05 03:48:54 -08:00
Debanjum Singh Solanky f212cc7174 Arrange remaining text search tests in arrange, act, assert order 2023-11-05 02:04:52 -08:00
Debanjum Singh Solanky 022017dd0f Fix text search tests to test updated indexing log messages 2023-11-05 02:04:52 -08:00
sabaimran d1d210605e Merge branch 'features/multi-user-support-khoj' of github.com:khoj-ai/khoj into features/multi-user-support-khoj 2023-11-04 14:29:34 -07:00
sabaimran 3678aa5614 Add tests to validate expected behaviors in the multi-user scenario 2023-11-04 14:29:30 -07:00
Debanjum Singh Solanky 2f1756cc15 Do not use icon for each file, folder to index in desktop app.
Other minor fixes based on PR feedback
2023-11-04 00:13:10 -07:00
Debanjum Singh Solanky 345856e7be Merge branch 'master' of github.com:khoj-ai/khoj into features/multi-user-support-khoj
Merge changes to use latest GPT4All with GPU, GGUF model support into
khoj multi-user support rearchitecture branch
2023-11-02 22:44:25 -07:00
sabaimran fe6720fa06 [Multi-User Part 8]: Make conversation processor settings server-wide (#529)
- Rather than having each individual user configure their conversation settings, allow the server admin to configure the OpenAI API key or offline model once, and let all the users re-use that code.
- To configure the settings, the admin should go to the `django/admin` page and configure the relevant chat settings. To create an admin, run `python3 src/manage.py createsuperuser` and enter in the details. For simplicity, the email and username should match.
- Remove deprecated/unnecessary endpoints and views for configuring per-user chat settings
2023-11-02 10:43:27 -07:00
Debanjum Singh Solanky d92a2d03a7 Rename Files, Classes from X_To_JSONL to more appropriate X_To_Entries
These content processors are converting content into entries in DB
instead of entries in JSONL file
2023-11-01 14:51:33 -07:00
Debanjum Singh Solanky 87e6b1eab9 Rename TextEmbeddings to TextEntries for improved readability
Improves readability as name has closer match to underlying
constructs
2023-10-31 18:55:59 -07:00
Debanjum Singh Solanky bcbee05a9e Rename DbModels Embeddings, EmbeddingsAdapter to Entry, EntryAdapter
Improves readability as name has closer match to underlying
constructs

- Entry is any atomic item indexed by Khoj. This can be an org-mode
  entry, a markdown section, a PDF or Notion page etc.

- Embeddings are semantic vectors generated by the search ML model
  that encodes for meaning contained in an entries text.

- An "Entry" contains "Embeddings" vectors but also other metadata
  about the entry like filename etc.
2023-10-31 18:50:54 -07:00
sabaimran 54a387326c [Multi-User Part 6]: Address small bugs and upstream PR comments (#518)
- 08654163cb227edb8991ad7f77c99b560819f4f9: Add better parsing for XML files
- f3acfac7fbec0a3876e7586607c376c9dfac6a4d: Add a try/catch around the dateparser in order to avoid internal server errors in app
- 7d43cd62c0d51889978413ca411cec1bd37b024a: Chunk embeddings generation in order to avoid large memory load
- e02d751eb3cb9a005f1b529fde3b34ce3c026f1b: Addresses comments from PR #498 
- a3f393edb49842bedb4f42fba2dfc831ee51db7f: Addresses comments from PR #503 
- 66eb0782867b201a878c7fb13ba662be1258037c: Addresses comments from PR #511 
- Address various items in https://github.com/khoj-ai/khoj/issues/527
2023-10-31 17:59:53 -07:00