The `has_documents' flag wasn't being passed, so the search tab was
always shown as empty instead of being dynamically enabled once
documents had been indexed.
## Major
- Parse markdown and org parent entries as a single entry if they fit within max tokens
- Parse a file as a single entry if it fits within max token limits
- Add parent heading ancestry to extracted markdown entries for context
- Chunk text in preference order of paragraph, sentence, word, character
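The single-entry rule and the paragraph → sentence → word → character chunking order in the Major list can be sketched as one recursive splitter. This is a minimal illustration, not Khoj's implementation: `chunk_text`, `SEPARATORS` and `max_size` are hypothetical names, and `len()` stands in for a real tokenizer's token count.

```python
from typing import List, Sequence

# Hypothetical separators in preference order: paragraph, sentence, word, character ("").
SEPARATORS = ("\n\n", ". ", " ", "")


def chunk_text(text: str, max_size: int, separators: Sequence[str] = SEPARATORS) -> List[str]:
    """Split text into chunks of at most max_size, preferring coarser separators.

    len() is an assumed stand-in for counting tokens.
    Separators that fall on a chunk boundary are dropped.
    """
    # The whole text fits: keep it as a single entry.
    if len(text) <= max_size:
        return [text]

    sep, finer = separators[0], separators[1:]

    # Character level is the last resort: cut into fixed-size slices.
    if sep == "":
        return [text[i : i + max_size] for i in range(0, len(text), max_size)]

    # This separator doesn't occur in the text; try the next, finer one.
    if sep not in text:
        return chunk_text(text, max_size, finer)

    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_size:
            # Greedily pack adjacent pieces into the current chunk.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_size:
            current = piece
        else:
            # A single piece is still too large: split it with finer separators.
            chunks.extend(chunk_text(piece, max_size, finer))
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("aaa bbb ccc", 7))  # ['aaa bbb', 'ccc']
```

Packing pieces greedily keeps chunks as large as the limit allows, so each chunk holds as much surrounding context as possible before falling back to a finer separator.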
## Minor
- Create wrapper function to get entries from org, md, pdf & text files
- Remove unused Entry to JSONL converter from the text to entry class and its tests
- Dedupe code by using a single function to process an org file into entries
Resolves #620
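The "parent heading ancestry" item above can be illustrated with a heading stack: each markdown section is emitted with the headings of all its ancestors prepended, so an isolated entry still carries its context. This is a hedged sketch, not Khoj's code. `entries_with_ancestry` and `_flush` are hypothetical names, only ATX (`#`) headings are handled, and `#` lines inside fenced code blocks would be misread as headings.

```python
import re
from typing import List, Tuple

# ATX heading: 1 to 6 leading '#' characters followed by whitespace.
HEADING = re.compile(r"^(#{1,6})\s")


def _flush(entries: List[str], prefix: List[str], body: List[str]) -> None:
    # Only emit sections that have some non-blank body text.
    if any(line.strip() for line in body):
        entries.append("\n".join(prefix + body).strip())


def entries_with_ancestry(markdown: str) -> List[str]:
    """Split markdown at headings, prefixing each entry with its ancestor headings."""
    entries: List[str] = []
    ancestors: List[Tuple[int, str]] = []  # (level, heading line), shallowest first
    prefix: List[str] = []  # ancestor headings + own heading of the current section
    body: List[str] = []

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if not match:
            body.append(line)
            continue
        _flush(entries, prefix, body)
        level = len(match.group(1))
        # Headings at the same or a deeper level belong to earlier sections,
        # so they are not ancestors of this one.
        while ancestors and ancestors[-1][0] >= level:
            ancestors.pop()
        prefix = [heading for _, heading in ancestors] + [line]
        ancestors.append((level, line))
        body = []

    _flush(entries, prefix, body)
    return entries


doc = "# Animals\nIntro\n## Cats\nMeow\n## Dogs\nWoof\n# Plants\nGreen"
for entry in entries_with_ancestry(doc):
    print(repr(entry))
```

With the sample document, the "Cats" entry becomes `"# Animals\n## Cats\nMeow"`, so a search hit on it also shows which top-level section it came from.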
### Why
- Python 3.12 is the default Python on Ubuntu 24.04 LTS, Windows and Mac via Homebrew
- Python 3.12 has a bunch of improvements that can be explored with Khoj (e.g. a per-interpreter GIL for performance)
### Changes
- The latest PyTorch now supports Python 3.12
- RapidOCR for indexing image PDFs doesn't currently support Python 3.12.
  But it's an optional dependency, so only install it if Python < 3.12
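Because RapidOCR is only installed on Python < 3.12, code that uses it has to tolerate its absence. A minimal runtime guard might look like the sketch below; `rapidocr_available` is a hypothetical helper, and the importable module name `rapidocr_onnxruntime` is an assumption about the installed RapidOCR package.

```python
import importlib.util
import sys


def rapidocr_available() -> bool:
    """Return True if the optional RapidOCR dependency can be imported."""
    # Mirrors the install rule: RapidOCR is only installed on Python < 3.12.
    if sys.version_info >= (3, 12):
        return False
    # Assumed module name; find_spec returns None when it isn't installed.
    return importlib.util.find_spec("rapidocr_onnxruntime") is not None


if rapidocr_available():
    print("OCR enabled for image PDFs")
else:
    print("Skipping OCR for image PDFs")
```

Checking with `find_spec` avoids importing the package just to test for it, so the check stays cheap on every Python version.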
### Testing
- Verified Khoj installs fine on Windows and Mac with Python 3.12
- Verified Khoj chat works fine on Mac, Windows with Python 3.12
Resolves #522
- RapidOCR for indexing image PDFs doesn't currently support Python 3.12.
  It's an optional dependency anyway, so only install it if Python < 3.12
- Run unit tests with Python 3.12 as well
Resolves #522
* Add support for using OAuth 2.0 in the Notion integration
* Add Notion to the admin page
* Remove unnecessary content_index and image search/setup references
* Trigger background job to start indexing Notion after user configures it
* Add a log line when a new Notion integration is set up
* Fix references to the configure_content methods
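For readers unfamiliar with the OAuth 2.0 flow behind the Notion integration, the two requests involved can be sketched with the standard library alone. The endpoint URLs and parameters follow Notion's public OAuth documentation as we understand it (authorize with `owner=user`, then exchange the code using HTTP Basic client credentials); `authorize_url` and `token_request` are hypothetical helpers, not Khoj functions, and nothing is sent over the network here.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request

NOTION_AUTHORIZE_URL = "https://api.notion.com/v1/oauth/authorize"
NOTION_TOKEN_URL = "https://api.notion.com/v1/oauth/token"


def authorize_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Build the URL the user visits to grant the integration access."""
    query = urlencode(
        {
            "client_id": client_id,
            "response_type": "code",
            "owner": "user",
            "redirect_uri": redirect_uri,
            # Standard OAuth 2.0 CSRF protection; verify it on the callback.
            "state": state,
        }
    )
    return f"{NOTION_AUTHORIZE_URL}?{query}"


def token_request(client_id: str, client_secret: str, code: str, redirect_uri: str) -> Request:
    """Build (but don't send) the request exchanging the callback code for a token."""
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = json.dumps(
        {"grant_type": "authorization_code", "code": code, "redirect_uri": redirect_uri}
    ).encode()
    return Request(
        NOTION_TOKEN_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Basic {credentials}", "Content-Type": "application/json"},
    )


print(authorize_url("my-client-id", "https://example.com/callback", "random-state"))
```

Once the token request succeeds, the returned access token would be stored for the user and an indexing job kicked off, matching the "trigger background job" item above.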
`re.MULTILINE' should be passed to the `flags' argument, not the
`maxsplit' argument of the `re.split' func.
This was messing up the indexing by only allowing a maximum of
re.MULTILINE (i.e. 8) splits. Fixing this restores search quality to
its previous state.
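The bug is easy to reproduce: the third positional parameter of `re.split` is `maxsplit`, and `re.MULTILINE` is an integer flag equal to 8. The text and pattern below are hypothetical; the pattern deliberately avoids `^` so that only the split cap shows (with an anchored pattern the dropped flag would also change what matches). Recent Python versions (3.13+) emit a `DeprecationWarning` when `maxsplit` is passed positionally, which would surface this kind of mistake.

```python
import re

# Hypothetical markdown: 12 top-level headings, each with a one-line body.
text = "\n".join(f"# Heading {i}\nBody {i}" for i in range(12))

# Split on the newline that precedes each heading (illustrative pattern).
pattern = r"\n(?=# )"

# Buggy: re.MULTILINE lands in the maxsplit slot, capping the split count at 8.
buggy = re.split(pattern, text, re.MULTILINE)

# Fixed: pass the flag by keyword, so there is no split limit.
fixed = re.split(pattern, text, flags=re.MULTILINE)

print(len(buggy), len(fixed))  # 9 12
print(buggy[-1].count("# Heading"))  # 4: headings 8 to 11 end up in one entry
```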
Indexing more content per entry tends to lower the overall similarity
scores of search results. Increase the default search distance threshold
to counter that.
- Details
  - Fix expected results after indexing updates
  - Fix search with max distance after indexing updates
- Minor
  - Remove OpenAI chat actor test for the `after:' operator, as it's no longer expected
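The reasoning behind raising the distance threshold can be illustrated with a toy example: the more unrelated content an entry holds, the further it sits from a narrow query, so a fixed cutoff starts dropping relevant entries. The sketch uses cosine distance between term-frequency vectors as a stand-in for embedding distance; `bag_of_words`, `cosine_distance` and `search` are hypothetical helpers, and the thresholds 0.4 and 0.6 are illustrative, not Khoj's defaults.

```python
import math
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    # Term-frequency vector; a stand-in for a real embedding model.
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    # Treat empty vectors as maximally distant.
    return 1.0 if norm == 0 else 1.0 - dot / norm


def search(query: str, entries: List[str], max_distance: float) -> List[str]:
    """Return entries within max_distance of the query, closest first."""
    q = bag_of_words(query)
    scored = sorted((cosine_distance(q, bag_of_words(e)), e) for e in entries)
    return [entry for distance, entry in scored if distance <= max_distance]


entries = ["cat", "cat dog", "cat dog bird fish"]
print(search("cat", entries, max_distance=0.4))  # ['cat', 'cat dog']
print(search("cat", entries, max_distance=0.6))  # ['cat', 'cat dog', 'cat dog bird fish']
```

The longest entry still mentions the query term but sits at distance 0.5, so it only survives once the threshold is raised.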