Commit Graph

470 Commits

Author SHA1 Message Date
Debanjum
81c651b5b2 Fix truncation tests to check output chat history for truncation
New truncation logic return a new message list.
It does not update message list by reference/in place since 8a16f5a2a.
So truncation tests should run verification on the truncated chat
history returned by the truncation func instead of the original chat
history passed into the truncation func.
2025-08-28 15:50:32 -07:00
Debanjum
8a16f5a2af Reduce logical complexity of constructing context from chat history
- Process chat history in default order instead of processing it in
  reverse. Improve legibility of context construction for minor
  performance hit in dropping message from front of list.
- Handle multiple system messages by collating them into list
- Remove logic to drop system role for gemma-2, o1 models. Better to
  make code more readable than support old models.
2025-08-27 13:43:10 -07:00
Debanjum
2823c84bb4 Default to gemini 2.5 model series on init and for eval 2025-08-22 20:34:38 -07:00
Debanjum
452c794e93 Make regex search tool results look more like grep results 2025-08-20 19:07:28 -07:00
Debanjum
0e1615acc8 Fix grep files tool to work with line start, end anchors
Previously line start, end anchors would just work if the whole file
started or ended with the regex pattern rather than matching by line.

Fix it to work like a standard grep tool and match by line start, end.
2025-08-09 12:29:35 -07:00
Debanjum
c8e07e86e4 Format server code with ruff recommendations 2025-08-01 00:28:17 -07:00
Debanjum
892d57314e Update test setup to index test data after old indexing code removed
- Delete tests testing deprecated server side indexing flows
- Delete `Local(Plaintext|Org|Markdown|Pdf)Config' methods, files and
  references in tests
- Index test data via new helper method, `get_index_files'
  - It is modelled after the old `get_org_files' variants in main app
  - It passes the test data in required format to `configure_content'
    Allows maintaining the more realistic tests from before while
    using new indexing mechanism (rather than the deprecated server
    side indexing mechanism
2025-07-31 18:25:32 -07:00
Debanjum
d9d24dd638 Drop old code to sync files on server filesystem. Clean cli, init paths
This stale code was originally used to index files on server file
system directly by server. We currently push files to sync via API.

Server side syncing of remote content like Github and Notion is still
supported. But old, unused code for server side sync of files on
server fs is being cleaned out.

New --log-file cli args allows specifying where khoj server should
store logs on fs. This replaces the --config-file cli arg that was
only being used as a proxy for deciding where to store the log file.

- TODO
  - Tests are broken. They were relying on the server side content
    syncing for test setup
2025-07-31 18:25:32 -07:00
Debanjum
b1f2737c9a Drop native offline chat support with llama-cpp-python
It is recommended to chat with open-source models by running an
open-source server like Ollama, Llama.cpp on your GPU powered machine
or use a commercial provider of open-source models like DeepInfra or
OpenRouter.

These chat model serving options provide a mature Openai compatible
API that already works with Khoj.

Directly using offline chat models only worked reasonably with pip
install on a machine with GPU. Docker setup of khoj had trouble with
accessing GPU. And without GPU access offline chat is too slow.

Deprecating support for an offline chat provider directly from within
Khoj will reduce code complexity and increase developement velocity.
Offline models are subsumed to use existing Openai ai model provider.
2025-07-31 18:25:32 -07:00
Emmanuel Ferdman
655a1b38f2 Resolve Pydantic deprecation warnings
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-07-25 16:55:00 -07:00
Debanjum
f0513cbbb1 Fix to run new automation api tests in ci 2025-07-08 23:27:09 -07:00
Debanjum
8d9e75f580 Fix automation url parsing, response handling. Test automations api
- Methods calling send_message_to_model_wrapper_sync handn't been
  update to handle the function returning new ResponseWithThought
- Store, load request.url to DB as, from string to avoid serialization
  issues
2025-07-08 17:40:19 -07:00
Debanjum
6bda8dc20b Fix file upload size limit client test after max upload bump 2025-07-06 10:24:44 -07:00
Debanjum
5010623a0a Deep link to markdown entries by line number in uri
Use url fragment schema for deep link URIs, borrowing from URL/PDF
schemas. E.g file:///path/to/file.txt#line=<line_no>&#page=<page_no>

Compute line number during (recursive) markdown entry chunking.

Test line number in URI maps to line number of chunk in actual md file.

This deeplink URI with line number is passed to llm as context to
better combine with line range based view file tool.

Grep tool already passed matching line number. This change passes
line number in URIs of markdown entries matched by the semantic search
tool.
2025-07-03 19:27:57 -07:00
Debanjum
dcfa4288c4 Deep link to org-mode entries. Deep link by line number in uri
Use url fragment schema for deep link URIs, borrowing from URL/PDF
schemas. E.g file:///path/to/file.txt#line=<line_no>&#page=<page_no>

Compute line number during (recursive) org-mode entry chunking.

Thoroughly test line number in URI maps to line number of chunk in
actual org mode file.

This deeplink URI with line number is passed to llm as context to
better combine with line range based view file tool.

Grep tool already passed matching line number. This change passes
line number in URIs of org entries matched by the semantic search tool
2025-07-03 17:38:34 -07:00
Debanjum
5c4d41d300 Reduce structural changes to indexed raw org mode entries
Reduce structural changes to raw entry allows better deep-linking and
re-annotation. Currently done via line number in new uri field.

Only add properties drawer to raw entry if entry has properties
Previously line and source properties were inserted into raw entries.
This isn't done anymore. Line, source are deprecated for use in khoj.el.
2025-07-03 17:38:31 -07:00
Debanjum
aa081913bf Improve truncation with tool use and Anthropic caching
- Cache last anthropic message. Given research mode now uses function
  calling paradigm and not the old research mode structure.
- Cache tool definitions passed to anthropic models
- Stop dropping first message if by assistant as seems like Anthropic
  API doesn't complain about it any more.

- Drop tool result when tool call is truncated as invalid state
- Do not truncate tool use message content, just drop the whole tool
  use message.

  AI model APIs need tool use assistant message content in specific
  form (e.g with thinking etc.). So dropping content items breaks
  expected tool use message content format.

Handle tool use scenarios where iteration query isn't set for retry
2025-07-02 23:32:44 -07:00
Debanjum
721c55a37b Rename ResponseWithThought response field to text for better naming 2025-07-02 20:48:24 -07:00
Debanjum
e6cc9b1182 Test update agents with large knowledge bases 2025-07-02 18:01:18 -07:00
Debanjum
257c238a88 Improve DB clean up after test runs 2025-06-06 15:09:39 -07:00
Debanjum
d2c7e5516f Fix online chat actor tests, improve offline chat actor tests
The chat actor (and director) tests haven't been looked into in a long
while. They'd gone stale in how they were calling thee functions. And
what was required to run them. Now the online chat actor tests work
again.
2025-06-06 13:28:18 -07:00
Debanjum
2f4160e24b Use single extract questions method across all LLMs for doc search
Using model specific extract questions was an artifact from older
times, with less guidable models.

New changes collate and reuse logic
- Rely on send_message_to_model_wrapper for model specific formatting.
- Use same prompt, context for all LLMs as can handle prompt variation.
- Use response schema enforcer to ensure response consistency across models.

Extract questions (because of its age) was the only tool directly within
each provider code. Put it into helpers to have all the (mini) tools
in one place.
2025-06-06 13:28:18 -07:00
Debanjum
7d59688729 Move document search tool into helpers module with other tools
Document search (because of its age) was the only tool directly within
an api router. Put it into helpers to have all the (mini) tools in one
place.
2025-06-06 13:28:18 -07:00
Debanjum
05d4e19cb8 Pass deep typed chat history for more ergonomic, readable, safe code
The chat dictionary is an artifact from earlier non-db chat history
storage. We've been ensuring new chat messages have valid type before
being written to DB for more than 6 months now.

Move to using the deeply typed chat history helps avoids null refs,
makes code more readable and easier to reason about.

Next Steps:
The current update entangles chat_history written to DB
with any virtual chat history message generated for intermediate
steps. The chat message type written to DB should be decoupled from
type that can be passed to AI model APIs (maybe?).

For now we've made the ChatMessage.message type looser to allow
for list[dict] type (apart from string). But later maybe a good idea
to decouple the chat_history recieved by send_message_to_model from
the chat_history saved to DB (which can then have its stricter type check)
2025-06-04 00:03:14 -07:00
Debanjum
dca17591f3 Handle parsing json from string with plain text suffix 2025-05-23 19:44:02 -07:00
Debanjum
4f3fdaf19d Increase khoj api response timeout on evals call. Handle no decision 2025-05-18 19:14:49 -07:00
Debanjum
fd591c6e6c Upgrade tenacity to respect min time for exponential backoff
Fix for issue is in tenacity 9.0.0. But older langchain required
tenacity <0.9.0.

Explicitly pin version of langchain sub packages to avoid indexing
and doc parsing breakage.
2025-05-17 17:37:15 -07:00
Debanjum
2694734d22 Update truncation logic to handle multi-part message content 2025-05-17 17:37:15 -07:00
Debanjum
8050173ee1 Timeout calls to khoj api in evals to continue to next question 2025-05-17 17:37:11 -07:00
Debanjum
e0352cd8e1 Handle unset ttft in metadata of failed chat response. Fixes evals.
This was causing evals to stop processing rest of batch as well.
2025-05-17 15:06:22 -07:00
Debanjum
911e1bf981 Use gemini 2.0 flash as evaluator. Set seed for it to reduce eval variance.
Gemini 2.0 flash model is cheaper and better than Gemini 1.5 pro
2025-04-04 20:11:00 +05:30
Debanjum
94ca458639 Set default chat model to KHOJ_CHAT_MODEL env var if set
Simplify code log to set default_use_model during init for readability
2025-03-09 18:23:30 +05:30
Debanjum
b4183c7333 Default to gemini 2.0 flash instead of 1.5 flash on Gemini setup
Add price of gemini 2.0 flash for cost calculations
2025-03-07 13:48:15 +05:30
Debanjum
f13bdc5135 Log eval run progress percentage for orientation 2025-03-07 13:48:15 +05:30
Debanjum
dc0bc5bcca Evaluate information retrieval quality using eval script
- Encode article urls in filename indexed in Khoj KB
  Makes it easier for humans to compare, trace retrieval performance
  by looking at logs than using content hash (which was previously
  explored)
2025-01-06 13:19:52 +07:00
Debanjum
daeba66c0d Optionally pass references used by agent for response to eval scorers
This will allow the eval framework to evaluate retrieval quality too
2025-01-06 13:19:52 +07:00
Debanjum
8231f4bb6e Return accuracy as decision to generalize across IR & standard scorers 2025-01-06 13:19:52 +07:00
Debanjum
c4bb92076e Convert async create automation api endpoints to sync 2024-12-26 21:59:55 -08:00
Debanjum
01bc6d35dc Rename Chat Model Options table to Chat Model as short & readable (#1003)
- Previous was incorrectly plural but was defining only a single model
- Rename chat model table field to name
- Update documentation
- Update references functions and variables to match new name
2024-12-12 11:24:16 -08:00
sabaimran
6c8007e23b Improve handling of multiple output modes
- Use the generated descriptions / inferred queries to supply context to the model about what it's created and give a richer response
- Stop sending the generated image in user message. This seemed to be confusing the model more than helping.
- Also, rename the open ai converse method to converse_openai to follow patterns with other providers
2024-12-10 16:54:21 -08:00
Debanjum
9dd3782f5c Rename OpenAIProcessorConversationConfig DB model to more apt AiModelApi (#998)
* Rename OpenAIProcessorConversationConfig to more apt AiModelAPI

The DB model name had drifted from what it is being used for,
a general chat api provider that supports other chat api providers like
anthropic and google chat models apart from openai based chat models.

This change renames the DB model and updates the docs to remove this
confusion.

Using Ai Model Api we catch most use-cases including chat, stt, image generation etc.
2024-12-08 18:02:29 -08:00
sabaimran
886fe4a0c9 Merge branch 'master' of github.com:khoj-ai/khoj into features/allow-multi-outputs-in-chat 2024-12-03 21:37:00 -08:00
Debanjum
fc6be543bd Improve GPQA eval prompt to imrpove parsing answer from Khoj response 2024-11-30 17:21:09 -08:00
sabaimran
c5329d76ba Merge branch 'master' of github.com:khoj-ai/khoj into features/allow-multi-outputs-in-chat 2024-11-29 14:12:03 -08:00
sabaimran
d91935c880 Initial commit of a functional but not yet elegant prototype for this concept 2024-11-28 17:28:23 -08:00
Debanjum
29e801c381 Add MATH500 dataset to eval
Evaluate simpler MATH500 responses with gemini 1.5 flash

This improves both the speed and cost of running this eval
2024-11-28 12:48:25 -08:00
Debanjum
22aef9bf53 Add GPQA (diamond) dataset to eval 2024-11-28 12:48:25 -08:00
Debanjum
70b7e7c73a Improve load of complex json objects. Use it to pick tool, run code
Gemini doesn't work well when trying to output json objects. Using it
to output raw json strings with complex, multi-line structures
requires more intense clean-up of raw json string for parsing
2024-11-26 17:37:57 -08:00
Debanjum
ed364fa90e Track running costs & accuracy of eval runs in progress
Collect, display and store running costs & accuracy of eval run.

This provides more insight into eval runs during execution instead of
having to wait until the eval run completes.
2024-11-20 12:40:51 -08:00
Debanjum
45c623f95c Dedupe, organize chat actor, director tests
- Move Chat actor tests that were previously in chat director tests file
- Dedupe online, offline io selector chat actor tests
2024-11-18 16:10:50 -08:00