
klbr/khoj

mirror of https://github.com/khoaliber/khoj.git synced 2026-04-20 01:24:31 +00:00


Debanjum Singh Solanky db2581459f Parse markdown parent entries as single entry if fit within max tokens

These changes improve context available to the search model.
Specifically, this should improve entry context from short knowledge trees,
that is, knowledge bases with sparse, short heading/entry trees.

Previously, we'd always split markdown files by headings, even if a
parent entry was small enough to fit entirely within the max token
limits of the search model. This reduced the context available to the
search model when selecting appropriate entries for a query,
especially from short entry trees.

Revert to using regexes to parse markdown files instead of using
MarkdownHeaderTextSplitter. It was easier to implement the logical
split using regexes than to bend MarkdownHeaderTextSplitter to
implement it.
- Traverse the markdown knowledge tree depth-first (DFS) and prefix
  each entry with its ancestry
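
The approach described in this commit message could be sketched roughly as follows. This is an illustrative sketch only, not the code from the commit: the function name `split_markdown`, its parameters, and the whitespace-based token count are all assumptions made for the example. A section is kept as a single entry when it fits within `max_tokens`; otherwise it is split on headings of the next level with a regex, and each resulting entry is prefixed with the headings of its ancestors.

```python
import re


def split_markdown(text, max_tokens, ancestry=(), level=1):
    """Depth-first split of markdown text into entries of at most max_tokens.

    Hypothetical helper for illustration. Tokens are approximated by
    whitespace-separated words, which is an assumption; a real search
    model would use its own tokenizer.
    """
    body = text.strip()
    if not body:
        return []

    # Keep the section whole if it fits, or if no deeper heading level
    # exists (markdown headings stop at level 6). In the latter case the
    # entry may still exceed max_tokens; this sketch does not chunk further.
    if len(body.split()) <= max_tokens or level > 6:
        return ["\n".join([*ancestry, body])]

    entries = []
    # Zero-width split just before every heading of the current level,
    # e.g. "^(?=## )" for level 2. Requires Python 3.7+.
    pattern = rf"^(?=#{{{level}}} )"
    for section in re.split(pattern, body, flags=re.MULTILINE):
        heading, _, rest = section.partition("\n")
        if heading.startswith("#" * level + " "):
            # Descend into the child section, adding its heading to the ancestry
            entries += split_markdown(rest, max_tokens, (*ancestry, heading), level + 1)
        else:
            # Text with no heading at this level: look for deeper headings
            entries += split_markdown(section, max_tokens, ancestry, level + 1)
    return entries


doc = "# A\nintro words\n## B\nb one two three\n## C\nc one two three\n"
whole = split_markdown(doc, max_tokens=100)  # small tree: stays a single entry
parts = split_markdown(doc, max_tokens=6)    # too large: split by heading
```

With the larger limit the whole document is returned as one entry; with the smaller limit each entry carries its ancestor headings, for example `"# A\n## B\nb one two three"`. Note that this sketch drops headings that have no body text and does not treat `#` lines inside fenced code blocks specially.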

2024-04-04 02:41:55 +05:30

.github

Update stale Khoj pypi package metadata

2024-03-29 00:06:55 +05:30

documentation

Fix docs showing how to setup llama-cpp with Khoj

2024-03-31 15:36:40 +05:30

scripts

Fix bump_version.sh to commit, clean-up after desktop app version bump

2023-12-22 21:42:03 +05:30

src

Parse markdown parent entries as single entry if fit within max tokens

2024-04-04 02:41:55 +05:30

tests

Parse markdown parent entries as single entry if fit within max tokens

2024-04-04 02:41:55 +05:30

.dockerignore

Use pypi khoj to fix docker builds and dockerize github workflow

2023-02-19 01:57:01 -06:00

.gitattributes

Exclude tests data file from programming stats on Github

2023-08-28 11:00:52 -07:00

.gitignore

[Multi-User Part 5]: Add a production Docker file and use a gunicorn configuration with it (#514)

2023-10-26 13:15:31 -07:00

.pre-commit-config.yaml

Add isort to the pre-commit configuration and apply it to the whole project (#595)

2023-12-28 18:04:02 +05:30

docker-compose.yml

Set default value of KHOJ_DEBUG to False in the docker-compose file

2024-03-01 21:51:13 +05:30

Dockerfile

Remove unused git dependency from Docker images

2024-02-16 17:41:09 +05:30

gunicorn-config.py

Configure production setup for moving to single worker model

2024-03-30 10:35:55 +05:30

LICENSE

Change license to GNU AGPLv3 from GNU GPLv3

2023-11-16 11:14:06 -08:00

manifest.json

Release Khoj version 1.8.0

2024-03-31 00:06:15 +05:30

prod.Dockerfile

Configure production setup for moving to single worker model

2024-03-30 10:35:55 +05:30

pyproject.toml

Rebase with master

2024-04-02 16:16:06 +05:30

pytest.ini

Move the django app into the src/khoj folder for better organization and functionality

2023-11-21 10:56:04 -08:00

README.md

Add num online for Discord badge

2024-03-10 17:48:30 +05:30

versions.json

Release Khoj version 1.8.0

2024-03-31 00:06:15 +05:30

README.md

An AI personal assistant for your digital brain

📜 Read Docs • 🌍 Try Khoj Cloud • 💬 Get Involved

Khoj is an AI application to search and chat with your notes and documents.
It is open-source, self-hostable and accessible on Desktop, Emacs, Obsidian, Web and WhatsApp.
It works with PDF, Markdown, org-mode, Notion files and GitHub repositories.
It can paint, search the internet and understand speech.

| 🔎 Search | 💬 Chat |
| --- | --- |
| Quickly retrieve relevant documents using natural language | Get answers and create content from your existing knowledge base |
| Does not need internet | Can be configured to work without internet |

Contributors

Cheers to our awesome contributors! 🎉

Made with contrib.rocks.

Languages

Python 51%

TypeScript 36.1%

CSS 4.1%

HTML 3.2%

Emacs Lisp 2.4%

Other 3.1%