klbr/khoj - khoj - Gitea: Git with a cup of tea

klbr/khoj

Fork 0

mirror of https://github.com/khoaliber/khoj.git synced 2026-03-04 05:39:06 +00:00

Commit Graph

Author	SHA1	Message	Date
Debanjum Singh Solanky	2f7a6af56a	Support incremental update of org-mode entries and embeddings - What - Hash the entries and compare to find new/updated entries - Reuse embeddings encoded for existing entries - Only encode embeddings for updated or new entries - Merge the existing and new entries and embeddings to get the updated entries, embeddings - Why - Given most note text entries are expected to be unchanged across time. Reusing their earlier encoded embeddings should significantly speed up embeddings updates - Previously we were regenerating embeddings for all entries, even if they had existed in previous runs	2022-09-10 20:58:33 +03:00
Debanjum Singh Solanky	7606724dbc	Add file of each entry to entry dict in org_to_jsonl converter - This will help filter query to org content type using file filter - Do not explicitly specify items being extracted from json of each entry in text_search as all text search content types do not have file being set in jsonl converters	2022-09-05 15:21:40 +03:00
Debanjum Singh Solanky	ea4fdd9134	Fix logic to ignore notes with no body. Add tests to prevent regression - Notes with empty newlines in body were not being ignored - Add regression tests to avoid above regression in org_to_jsonl conversion	2022-08-21 19:41:40 +03:00

Author

SHA1

Message

Date

Debanjum Singh Solanky

2f7a6af56a

Support incremental update of org-mode entries and embeddings

- What
  - Hash the entries and compare to find new/updated entries
  - Reuse embeddings encoded for existing entries
  - Only encode embeddings for updated or new entries
  - Merge the existing and new entries and embeddings to get the updated
    entries, embeddings

- Why
  - Given most note text entries are expected to be unchanged
    across time. Reusing their earlier encoded embeddings should
    significantly speed up embeddings updates
  - Previously we were regenerating embeddings for all entries,
    even if they had existed in previous runs

2022-09-10 20:58:33 +03:00

Debanjum Singh Solanky

7606724dbc

Add file of each entry to entry dict in org_to_jsonl converter

- This will help filter query to org content type using file filter
- Do not explicitly specify items being extracted from json of each
  entry in text_search as all text search content types do not have
  file being set in jsonl converters

2022-09-05 15:21:40 +03:00

Debanjum Singh Solanky

ea4fdd9134

Fix logic to ignore notes with no body. Add tests to prevent regression

- Notes with empty newlines in body were not being ignored
- Add regression tests to avoid above regression in org_to_jsonl conversion

2022-08-21 19:41:40 +03:00

3 Commits