khoj/tests at db2581459f5e8ae5bfa35de95830bf78367e1a7d

klbr/khoj

mirror of https://github.com/khoaliber/khoj.git synced 2026-04-20 01:24:31 +00:00

Debanjum Singh Solanky db2581459f Parse markdown parent entries as single entry if fit within max tokens

These changes improve the context available to the search model.
Specifically, this should improve entry context from short knowledge trees,
that is, knowledge bases with sparse, short heading/entry trees.

Previously, we'd always split markdown files by headings, even if a
parent entry was small enough to fit entirely within the max token
limit of the search model. This reduced the context available
to the search model to select appropriate entries for a query,
especially from short entry trees.

Revert to using regexes to parse markdown files instead of
using MarkdownHeaderTextSplitter. It was easier to implement the
logical split using regexes than to bend MarkdownHeaderTextSplitter
to implement it.
- DFS traverse the markdown knowledge tree, prefix ancestry to each entry
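
The splitting strategy described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `markdown_to_entries` and `count_tokens` are hypothetical names, whitespace-separated words stand in for the search model's tokens, and headings inside fenced code blocks are not special-cased.

```python
import re
from typing import List, Optional


def count_tokens(text: str) -> int:
    # Hypothetical token counter: words stand in for model tokens.
    return len(text.split())


def markdown_to_entries(
    text: str,
    max_tokens: int,
    ancestry: Optional[List[str]] = None,
    level: int = 1,
) -> List[str]:
    """Depth-first split of markdown text by heading level.

    A (sub)tree that fits within max_tokens is kept as a single entry.
    Every entry is prefixed with the heading lines of its ancestors.
    """
    ancestry = ancestry or []
    prefix = "".join(heading + "\n" for heading in ancestry)

    # Base case: the text fits, or there are no deeper heading levels to split on.
    if count_tokens(text) <= max_tokens or level > 6:
        body = text.strip()
        return [prefix + body] if body else []

    # Match headings of exactly this level, e.g. "## Title" when level == 2.
    pattern = re.compile(rf"^#{{{level}}}[ \t]+.+$", re.MULTILINE)
    matches = list(pattern.finditer(text))
    if not matches:
        # No headings at this level: try the next deeper level.
        return markdown_to_entries(text, max_tokens, ancestry, level + 1)

    entries: List[str] = []

    # Text before the first heading belongs to the parent entry itself.
    preamble = text[: matches[0].start()].strip()
    if preamble:
        entries.append(prefix + preamble)

    # Recurse into each section, adding its heading to the ancestry.
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        heading = match.group(0).strip()
        section_body = text[match.end() : end]
        entries.extend(
            markdown_to_entries(section_body, max_tokens, ancestry + [heading], level + 1)
        )
    return entries


if __name__ == "__main__":
    doc = "# A\nintro text\n## B\nb one two three\n## C\nc one two three\n"
    for entry in markdown_to_entries(doc, max_tokens=6):
        print(repr(entry))
```

With a generous `max_tokens` the whole file comes back as one entry. With a tight limit, each child section becomes its own entry, prefixed by its parent headings. An entry that still exceeds the limit after the deepest heading split would need further chunking, which this sketch leaves out.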

2024-04-04 02:41:55 +05:30

..

Update the default configuration for the AppConfig

2023-11-17 19:26:31 -08:00

__init__.py

Move tests out to project root. Use absolute import in project

2021-09-30 04:12:14 -07:00

conftest.py

Part 1: Server-side changes to support agents integrated with Conversations (#671)

2024-03-23 22:09:38 +05:30

helpers.py

Use llama.cpp for offline chat models

2024-03-26 22:33:01 +05:30

test_cli.py

Add isort to the pre-commit configuration and apply it to the whole project (#595)

2023-12-28 18:04:02 +05:30

test_client.py

Short-circuit API rate limiter for unauthenticated users (#607)

2024-01-17 00:59:52 +05:30

test_conversation_utils.py

Handle msg truncation when question is larger than max prompt size

2024-03-31 15:50:06 +05:30

test_date_filter.py

Improve date filter regexes to extract structured, natural, partial dates

2024-03-30 00:07:19 +05:30

test_file_filter.py

[Multi-User Part 1]: Enable storage of settings for plaintext files based on user account (#498)

2023-10-26 09:42:29 -07:00

test_helpers.py

Part 2: Add web UI updates for basic agent interactions (#675)

2024-03-26 18:13:24 +05:30

test_image_search.py

Add isort to the pre-commit configuration and apply it to the whole project (#595)

2023-12-28 18:04:02 +05:30

test_markdown_to_entries.py

Parse markdown parent entries as single entry if fit within max tokens

2024-04-04 02:41:55 +05:30

test_multiple_users.py

Add isort to the pre-commit configuration and apply it to the whole project (#595)

2023-12-28 18:04:02 +05:30

test_offline_chat_actors.py

Merge branch 'master' into migrate-to-llama-cpp-for-offline-chat

2024-03-31 00:59:20 +05:30

test_offline_chat_director.py

Merge branch 'master' into migrate-to-llama-cpp-for-offline-chat

2024-03-31 00:59:20 +05:30

test_openai_chat_actors.py

Part 2: Add web UI updates for basic agent interactions (#675)

2024-03-26 18:13:24 +05:30

test_openai_chat_director.py

Part 1: Server-side changes to support agents integrated with Conversations (#671)

2024-03-23 22:09:38 +05:30

test_org_to_entries.py

Chunk text in preference order of para, sentence, word, character

2024-04-04 02:41:55 +05:30

test_orgnode.py

Add isort to the pre-commit configuration and apply it to the whole project (#595)

2023-12-28 18:04:02 +05:30

test_pdf_to_entries.py

Remove unused Entry to Jsonl converter from text to entry class, tests

2024-04-04 02:41:55 +05:30

test_plaintext_to_entries.py

Remove unused Entry to Jsonl converter from text to entry class, tests

2024-04-04 02:41:55 +05:30

test_rawconfig.py

Add isort to the pre-commit configuration and apply it to the whole project (#595)

2023-12-28 18:04:02 +05:30

test_text_search.py

Chunk text in preference order of para, sentence, word, character

2024-04-04 02:41:55 +05:30

test_word_filter.py

Fix test word filter

2023-11-19 13:14:58 -08:00