- Image search already uses a sorted list of images to process
- Prevents index of entries to desync when entries, embeddings
generated by a separate server/app instance
- Update existings code, tests to process input-filters as list
instead of str
- Test `text_to_jsonl' get files methods to work with combination of
`input-files' and `input-filters'
Resolves#84
- Let the specific text_to_jsonl method decide which of the
TextContentConfig fields it needs to convert <text> type to jsonl
- This simplifies extending TextContentConfig for a specific type without
modifying all text_to_jsonl methods
- It keeps the number of args being passed to the `text_to_jsonl'
methods in check
- What
- Hash the entries and compare to find new/updated entries
- Reuse embeddings encoded for existing entries
- Only encode embeddings for updated or new entries
- Merge the existing and new entries and embeddings to get the updated
entries, embeddings
- Why
- Given most note text entries are expected to be unchanged
across time. Reusing their earlier encoded embeddings should
significantly speed up embeddings updates
- Previously we were regenerating embeddings for all entries,
even if they had existed in previous runs
- Pass file associated with entries in markdown, beancount to json converters
- Add File, Word, Date Filters to Ledger, Markdown Types
- Word, Date Filters were accidently removed from the above types yesterday
- File Filter is the only filter that newly got added
- Stop passing verbose flag around app methods
- Minor remap of verbosity levels to match python logging framework levels
- verbose = 0 maps to logging.WARN
- verbose = 1 maps to logging.INFO
- verbose >=2 maps to logging.DEBUG
- Minor clean-up of app: unused modules, conversation file opening
Now that the logic to compile entries is in the processor layer, the
extract_entries method is standard across (text) search_types
Extract the load_jsonl method as a utility helper method.
Use it in (a)symmetric search types
- The logic for compiling a beancount entry (for later encoding) now
completely resides in the org-to-jsonl processor layer
- This allows symmetric search to be generic and not be aware of
beancount specific properties that were extracted by the
beancount-to-jsonl processor layer
- Now symmetric search just expects the jsonl to (at least) have the
'compiled' and 'raw' keys for each entry. What original text the
entry was compiled from is irrelevant to it. The original text
could be location, transaction, chat etc, it doesn't have to care
- Fix loading entries from jsonl in extract_entries method
- Only extract Title from jsonl of each entry
This is the only thing written to the jsonl for symmetric ledger
- This fixes the trailing escape seq in loaded entries
- Remove the need for semantic-search.el response reader to do pointless complicated cleanup
- Make symmetric_ledger:extract_entries use beancount_to_jsonl:load_jsonl
Both methods were doing similar work
- Make load_jsonl handle loading entries from both gzip and uncompressed jsonl