Commit Graph

9 Commits

Author SHA1 Message Date
sabaimran
fb6ebd19fc Fix refactor bugs, CSRF token issues for use in production (#531)
Fix refactor bugs, CSRF token issues for use in production
* Add flags for samesite settings to enable django admin login
* Include tzdata to dependencies to work around python package issues in linux
* Use DJANGO_DEBUG flag correctly
* Fix naming of entry field when creating EntryDate objects
* Correctly retrieve openai config settings
* Fix datefilter with embeddings name for field
2023-11-02 23:02:38 -07:00
sabaimran
fe6720fa06 [Multi-User Part 8]: Make conversation processor settings server-wide (#529)
- Rather than having each individual user configure their conversation settings, allow the server admin to configure the OpenAI API key or offline model once, and let all the users re-use that code.
- To configure the settings, the admin should go to the `django/admin` page and configure the relevant chat settings. To create an admin, run `python3 src/manage.py createsuperuser` and enter in the details. For simplicity, the email and username should match.
- Remove deprecated/unnecessary endpoints and views for configuring per-user chat settings
2023-11-02 10:43:27 -07:00
Debanjum Singh Solanky
bcbee05a9e Rename DbModels Embeddings, EmbeddingsAdapter to Entry, EntryAdapter
Improves readability as name has closer match to underlying
constructs

- Entry is any atomic item indexed by Khoj. This can be an org-mode
  entry, a markdown section, a PDF or Notion page etc.

- Embeddings are semantic vectors generated by the search ML model
  that encodes for meaning contained in an entries text.

- An "Entry" contains "Embeddings" vectors but also other metadata
  about the entry like filename etc.
2023-10-31 18:50:54 -07:00
sabaimran
54a387326c [Multi-User Part 6]: Address small bugs and upstream PR comments (#518)
- 08654163cb227edb8991ad7f77c99b560819f4f9: Add better parsing for XML files
- f3acfac7fbec0a3876e7586607c376c9dfac6a4d: Add a try/catch around the dateparser in order to avoid internal server errors in app
- 7d43cd62c0d51889978413ca411cec1bd37b024a: Chunk embeddings generation in order to avoid large memory load
- e02d751eb3cb9a005f1b529fde3b34ce3c026f1b: Addresses comments from PR #498 
- a3f393edb49842bedb4f42fba2dfc831ee51db7f: Addresses comments from PR #503 
- 66eb0782867b201a878c7fb13ba662be1258037c: Addresses comments from PR #511 
- Address various items in https://github.com/khoj-ai/khoj/issues/527
2023-10-31 17:59:53 -07:00
Debanjum
9acc722f7f [Multi-User Part 4]: Authenticate using API Tokens (#513)
###  New
- Use API keys to authenticate from Desktop, Obsidian, Emacs clients
- Create API, UI on web app config page to CRUD API Keys
- Create user API keys table and functions to CRUD them in Database

### 🧪 Improve
- Default to better search model, [gte-small](https://huggingface.co/thenlper/gte-small), to improve search quality
- Only load chat model to GPU if enough space, throw error on load failure
- Show encoding progress, truncate headings to max chars supported
- Add instruction to create db in Django DB setup Readme

### ⚙️ Fix
- Fix error handling when configure offline chat via Web UI
- Do not warn in anon mode about Google OAuth env vars not being set
- Fix path to load static files when server started from project root
2023-10-26 12:33:03 -07:00
sabaimran
4b6ec248a6 [Multi-User Part 3]: Separate chat sesssions based on authenticated users (#511)
- Add a data model which allows us to store Conversations with users. This does a minimal lift over the current setup, where the underlying data is stored in a JSON file. This maintains parity with that configuration.
- There does _seem_ to be some regression in chat quality, which is most likely attributable to search results.

This will help us with #275. It should become much easier to maintain multiple Conversations in a given table in the backend now. We will have to do some thinking on the UI.
2023-10-26 11:37:41 -07:00
sabaimran
a8a82d274a [Multi-User Part 2]: Add login pages and gate access to application behind login wall (#503)
- Make most routes conditional on authentication *if anonymous mode is not enabled*. If anonymous mode is enabled, it scaffolds a default user and uses that for all application interactions.
- Add a basic login page and add routes for redirecting the user if logged in
2023-10-26 10:17:29 -07:00
sabaimran
216acf545f [Multi-User Part 1]: Enable storage of settings for plaintext files based on user account (#498)
- Partition configuration for indexing local data based on user accounts
- Store indexed data in an underlying postgres db using the `pgvector` extension
- Add migrations for all relevant user data and embeddings generation. Very little performance optimization has been done for the lookup time
- Apply filters using SQL queries
- Start removing many server-level configuration settings
- Configure GitHub test actions to run during any PR. Update the test action to run in a containerized environment with a DB.
- Update the Docker image and docker-compose.yml to work with the new application design
2023-10-26 09:42:29 -07:00
sabaimran
c125995d94 [Multi-User]: Part 0 - Add support for logging in with Google (#487)
* Add concept of user authentication to the request session via GoogleUser
2023-10-14 19:39:13 -07:00