Debanjum Singh Solanky 86575b2946 Chunk text in preference order of para, sentence, word, character
- Previous simplistic chunking strategy of splitting text by space
  didn't handle notes that contain newlines but no spaces, e.g. in #620

- New strategy tries to chunk the text at more natural points like
  paragraph, sentence and word first. If none of those work, it splits
  at the character level to fit within the max token limit

- Drop long words while preserving original delimiters

Resolves #620
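The chunking strategy described in this commit can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Khoj's actual code: it measures length in characters as a stand-in for model tokens, and the separator list (including a line-break level between paragraph and sentence) and the helper names `chunk_text` and `drop_long_words` are hypothetical.

```python
import re

# Separators tried from coarsest to finest: paragraph, line, sentence, word.
# The empty string is the last resort and means "split at any character".
# This exact list (in particular the line-break level) is an assumption for illustration.
SEPARATORS = ("\n\n", "\n", ". ", " ", "")


def chunk_text(text: str, max_len: int, separators=SEPARATORS) -> list[str]:
    """Split text into chunks of at most max_len characters (max_len >= 1).

    Length is measured in characters as a stand-in for model tokens (an assumption).
    The coarsest separator is tried first; finer separators are only applied to
    pieces that are still too long. Joining the chunks gives back the input text.
    """
    if len(text) <= max_len:
        return [text] if text else []
    sep, *finer = separators
    if sep == "":
        # No natural boundary left: hard split at character positions.
        return [text[i : i + max_len] for i in range(0, len(text), max_len)]
    # Zero-width split right after each separator, so the separator stays attached
    # to the preceding piece and no characters are lost.
    pieces = [p for p in re.split(f"(?<={re.escape(sep)})", text) if p]
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > max_len:
            # This piece alone is too long: flush what we have, then retry it
            # with the finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(chunk_text(piece, max_len, finer))
        elif len(current) + len(piece) <= max_len:
            # Greedily pack neighbouring pieces into the current chunk.
            current += piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


def drop_long_words(text: str, max_word_len: int) -> str:
    """Remove words longer than max_word_len while keeping the original whitespace."""
    # The capturing group makes re.split return the whitespace runs as well,
    # so newlines and tabs survive instead of being collapsed into single spaces.
    parts = re.split(r"(\s+)", text)
    return "".join(p for p in parts if p.isspace() or len(p) <= max_word_len)


print(chunk_text("line1\nline2\nline3", 12))  # ['line1\nline2\n', 'line3']
```

Because finer separators are only used on pieces that are still too long, a note made of short lines with no spaces (the case in #620) is split at line breaks instead of at arbitrary characters. Note that `drop_long_words` keeps the delimiters on both sides of a removed word, so `"a verylongword b"` becomes `"a  b"`; this trades a stray extra space for never losing a newline.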
2024-04-04 02:41:55 +05:30

An AI personal assistant for your digital brain


Khoj is an AI application to search and chat with your notes and documents.
It is open-source, self-hostable and accessible on Desktop, Emacs, Obsidian, Web and WhatsApp.
It works with PDF, Markdown, org-mode and Notion files, and GitHub repositories.
It can paint, search the internet and understand speech.


🔎 Search: Quickly retrieve relevant documents using natural language. Does not need internet.
💬 Chat: Get answers and create content from your existing knowledge base. Can be configured to work without internet.

Contributors

Cheers to our awesome contributors! 🎉


License: AGPL-3.0, repository size: 116 MiB
Languages
Python 51%
TypeScript 36.1%
CSS 4.1%
HTML 3.2%
Emacs Lisp 2.4%
Other 3.1%