Commit Graph

4896 Commits

Author SHA1 Message Date
Debanjum
d81fb08366 Use case insensitive regex matching with grep files tool 2025-07-02 20:48:24 -07:00
Debanjum
9c38326608 Add grep files tool to enable researcher to find documents by regex
Earlier khoj could technically only answer existential questions,
i.e. questions whose search would terminate once any note relevant to
the question was found.

This change enables khoj to answer universal questions, i.e questions
that require searching through all notes or finding all instances.

It enables more thorough retrieval from user's knowledge base by
combining semantic search, regex search, view and list files tools.

For more development details including motivation, see live coding
session 1.1 at https://www.youtube.com/live/-2s_qi4hd2k
2025-07-02 20:48:24 -07:00
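A minimal sketch of what such a grep files tool could look like (the function name, return shape and case-insensitive default below are assumptions for illustration, not Khoj's actual implementation):

```python
import re
from pathlib import Path

def grep_files(pattern: str, root: Path) -> dict:
    """Map relative file path -> [(line number, line)] for lines matching
    the regex pattern, case insensitively."""
    regex = re.compile(pattern, re.IGNORECASE)
    matches = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        hits = [
            (number, line)
            for number, line in enumerate(
                path.read_text(errors="ignore").splitlines(), 1
            )
            if regex.search(line)
        ]
        if hits:
            matches[str(path.relative_to(root))] = hits
    return matches
```

Unlike semantic search, this terminates only after scanning every file, which is what makes "find all instances" style universal questions answerable.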
Debanjum
59f5648dbd Add list files tool to enable researcher to find documents by path
Allow getting a map of the user's knowledge base under a specified path.

This enables more thorough retrieval from user's knowledge base by
combining search, view and list files tools.
2025-07-02 20:48:24 -07:00
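The path-scoped file map could be sketched roughly as below (names and signature are illustrative, not Khoj's actual code):

```python
from pathlib import Path

def list_files(root: Path, prefix: str = "") -> list:
    """Return sorted relative paths of all files under root/prefix,
    i.e. a map of the knowledge base under the specified path."""
    return sorted(
        str(path.relative_to(root))
        for path in (root / prefix).rglob("*")
        if path.is_file()
    )
```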
Debanjum
2f9f608cff Add file viewer tool to enable researcher to read documents
Allow reading whole file contents or content in a specified line range
from the user's knowledge base. This allows for more deterministic
traversal.
2025-07-02 20:48:24 -07:00
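A file viewer with an optional line range might look like this sketch (1-indexed, inclusive range is an assumption; the actual tool's conventions may differ):

```python
from pathlib import Path
from typing import Optional

def view_file(path: Path, start: Optional[int] = None,
              end: Optional[int] = None) -> str:
    """Return whole file contents, or only the 1-indexed inclusive
    line range [start, end] when given."""
    lines = Path(path).read_text().splitlines()
    if start is not None:
        lines = lines[start - 1 : end]
    return "\n".join(lines)
```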
Debanjum
721c55a37b Rename ResponseWithThought response field to text for better naming 2025-07-02 20:48:24 -07:00
Debanjum
490f0a435d Pass research tools directly with their varied args for flexibility
Why
---
Previously the researcher had a uniform response schema to pick the
next tool: scratchpad, query and tool. This didn't allow choosing
different arguments for the different tools being called. And the tool
call, result format passed by khoj was custom and static across all
LLMs.

Passing the tools and their schemas directly to the llm when picking
the next tool allows passing multiple, tool specific arguments for the
llm to select. For example, the model can choose webpage urls to read
or image gen aspect ratio (apart from the tool query) to pass to the
specific tool.

Using the LLM tool calling paradigm lets the model see tool calls and
tool results in a format it understands best.

Using the standard tool calling paradigm also allows incorporating
community built tools more easily via MCP servers, client tools,
native llm api tools etc.

What
---
- Return ResponseWithThought from completion_with_backoff ai model
  provider methods
- Show reasoning model thoughts in research mode train of thought.
  For non-reasoning models do not show the researcher train of
  thought, as non-reasoning models don't (by default) think before
  selecting a tool. Showing just the tool call is redundant and
  resembles the tool's action shown in the next step.

- Store tool calls in standardized format.
- Specify tool schemas in tool for research llm definitions as well.
- Transform tool calls, tool results to standardized form for use
  within khoj. Manage the following tool call, result transformations:
  - Model provider tool_call -> standardized tool call
  - Standardized tool call, result -> model specific tool call, result

- Make the researcher choose webpage urls to read as well for the
  webpage tool. Previously it would just decide the query and let the
  webpage reader infer the url(s). But the researcher has better
  context on which webpages it wants read to answer the query.

  This should eliminate the webpage reader deciding urls to read step
  and speed up webpage read tool use.

Handle unset response thoughts. Useful when retrying a failed request.

Previously this resulted in an unbound local variable
response_thoughts error.
2025-07-02 20:48:23 -07:00
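The transformation between model provider tool calls and a standardized internal form can be sketched as below. The `ToolCall` shape is illustrative, not Khoj's actual class; the provider payload shapes follow the OpenAI and Anthropic tool calling formats:

```python
import json
from dataclasses import dataclass

@dataclass
class ToolCall:
    """Provider-agnostic representation of a model's tool call."""
    name: str
    args: dict
    id: str

def from_openai(tool_call: dict) -> ToolCall:
    # OpenAI encodes tool arguments as a JSON string under "function".
    fn = tool_call["function"]
    return ToolCall(fn["name"], json.loads(fn["arguments"]), tool_call["id"])

def to_anthropic(call: ToolCall) -> dict:
    # Anthropic expects a tool_use content block with a dict "input".
    return {"type": "tool_use", "id": call.id, "name": call.name,
            "input": call.args}
```

One such adapter pair per provider lets the rest of khoj work against a single tool call shape.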
Debanjum
80522e370e Make researcher pick next tool using model function calling feature
The pick next tool step requests the next tool to call from the model
in function calling / tool use format.
2025-07-02 19:10:02 -07:00
Debanjum
b888d5e65e Add function calling support to Anthropic, Gemini and OpenAI models
Previously these models could use response schemas but not the tool
use capabilities provided by these AI model APIs.

This change allows chat actors to use the function calling feature to
specify which tools the LLMs by these providers can call.

This should help simplify tool definition and structure context in
forms that these LLMs natively understand.
(i.e in tool_call - tool_result ~chatml format).
2025-07-02 19:10:02 -07:00
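In the OpenAI function calling format, a tool definition for something like the webpage reader might look as follows. The tool name and parameters here are illustrative assumptions, not Khoj's actual schema:

```python
read_webpages_tool = {
    "type": "function",
    "function": {
        "name": "read_webpages",
        "description": "Read the given webpage urls to answer the query.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Question to answer from the webpages.",
                },
                "urls": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Webpage urls to read.",
                },
            },
            "required": ["query", "urls"],
        },
    },
}
```

Anthropic and Gemini accept structurally similar JSON schema based tool definitions, so one internal schema can be mapped to each provider's format.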
Debanjum
9607f2e87c Release Khoj version 1.42.8 2025-07-02 19:07:51 -07:00
Debanjum
f4fc76645c Upgrade electron package used by desktop app 2025-07-02 18:50:18 -07:00
Debanjum
96fb9bd87c Tune temperature and top_p to reduce gemini model repetition
Gemini models, especially flash models, seem to have a tendency to go
into long, repetitive output token loops. Unsure why.

Tune temp, top_p as the gemini api doesn't seem to allow setting
frequency or presence penalties, at least for reasoning models. Those
would have been a more direct mechanism to avoid the model getting
stuck in a loop.
2025-07-02 18:42:32 -07:00
Debanjum
9774bb012e Update agent knowledge base and configuration atomically
This should help prevent partial updates to an agent. Especially
useful for agents with large knowledge bases being updated. A failing
call should raise an exception. This allows you to retry the save
instead of losing your previous agent changes or saving only
partially.
2025-07-02 18:01:18 -07:00
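The all-or-nothing behavior can be sketched with a plain database transaction. The schema and function below are hypothetical stand-ins, not Khoj's actual Django models:

```python
import sqlite3

def update_agent(conn: sqlite3.Connection, agent_id: int,
                 persona: str, files: list) -> None:
    """Update agent persona and knowledge base atomically.

    `with conn` commits on success and rolls back every statement on
    failure, so a failing call raises instead of leaving a partial
    update behind."""
    with conn:
        conn.execute(
            "UPDATE agent SET persona = ? WHERE id = ?", (persona, agent_id)
        )
        conn.execute("DELETE FROM agent_files WHERE agent_id = ?", (agent_id,))
        conn.executemany(
            "INSERT INTO agent_files (agent_id, file) VALUES (?, ?)",
            [(agent_id, f) for f in files],
        )
```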
Debanjum
e6cc9b1182 Test update agents with large knowledge bases 2025-07-02 18:01:18 -07:00
Debanjum
5fe2ea8a55 Run safety check only when agent persona updated on agent edits
Running the safety check isn't required if the agent persona wasn't
updated this time around, as it would have passed the safety check
previously.

This should speed up editing agents when agent persona isn't updated.
2025-07-02 18:01:18 -07:00
Debanjum
a8c47a70f7 Show friendly name for available ai models on clients when set 2025-07-01 16:59:13 -07:00
Debanjum
487826bc32 Release Khoj version 1.42.7 2025-06-27 18:21:18 -07:00
Debanjum
29e5d7ef08 Improve support for new Deepseek R1 model over Openai compatible api
Parse thinking out from <think>..</think> tags in chat response
Handle merging structured message content, not just str, for deepseek.
2025-06-27 18:17:35 -07:00
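Parsing the thinking out of `<think>..</think>` tags might look like this sketch (the function name and return shape are illustrative, not the actual implementation):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thought(response: str) -> tuple:
    """Split a Deepseek R1 style response into (thought, answer)."""
    thoughts = THINK_RE.findall(response)
    answer = THINK_RE.sub("", response).strip()
    return "\n".join(t.strip() for t in thoughts), answer
```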
Debanjum
a33580d560 Enable cache, proxy to improve firecrawl webpage scrape speed, success 2025-06-27 16:35:25 -07:00
Debanjum
1566e3c74d Ease bulk (de-)selecting of files to add/remove to agent knowledge base
Add select all / deselect all buttons to bulk add or remove all
(filtered) files from an agent's knowledge base.
2025-06-27 15:19:50 -07:00
Debanjum
3bb4e63f3e Add ability to set default chat model via env var in docker-compose.yml 2025-06-27 15:19:50 -07:00
Debanjum
dd89dd3fc8 Bump web, documentation and desktop app package dependencies 2025-06-27 15:19:50 -07:00
Peter Gaultney
9f3ceba541 Allow setting embedded postgres db directory with PGSERVER_DATA_DIR env var (#1202)
It seems to me that it would be useful to be able to be explicit about
where the embedded database should live, as well as to see where it
_does_ live (via the info log) when not specifying.
2025-06-28 03:21:23 +05:30
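Resolving the data directory from the env var could be sketched as below. The fallback default path here is a made-up placeholder, not Khoj's actual default:

```python
import os
from pathlib import Path

def embedded_db_dir(default: Path = Path("/var/lib/khoj/db")) -> Path:
    """Resolve the embedded postgres data directory, honoring the
    PGSERVER_DATA_DIR env var when set, else fall back to the default."""
    override = os.environ.get("PGSERVER_DATA_DIR")
    return Path(override).expanduser() if override else default.expanduser()
```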
Debanjum
d37113850c Let reasoning gemini models dynamically set their thinking budget
All gemini 2.5 series models support dynamic thinking budgets by
setting thinking_budget to -1.
2025-06-27 13:13:24 -07:00
Debanjum
ba059ad8b0 Deduplicate passing chat history to extract question only in prompt
The extract questions prompt had chat history both inline in the
prompt and in the actual chat history.

Only pass in prompt for now. Later update prompts to pass chat history
in chat messages list for better truncation flexibility.
2025-06-24 02:49:29 -07:00
Debanjum
170a8036fe Fix 2 document retrieval bugs to not drop valid search results
1. Due to the interaction of two changes:
  - dedupe by corpus_id, where corpus_id tracks logical content blocks
  like files and org/md headings.
  - return compiled, not logical, blocks, where compiled text tracks
  smaller content chunks that fit within search model and llm context
  windows.

  When combined they returned only one hit compiled chunk per logical
  block, even if multiple chunks matched within a logical content
  block.

  The fix is to either dedupe by compiled text or to return deduped
  logical content blocks (by corpus_id) corresponding to matched
  compiled chunks. This commit fixes it by the first method.

2. Zipping inferred queries with search results caused a single
   search result to be returned per query!
   This silently cut down matching search results and went undetected.
2025-06-24 02:47:07 -07:00
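Both fixes can be sketched in a few lines; the hit shapes and names below are illustrative, not Khoj's actual data structures:

```python
def dedupe_hits(hits: list) -> list:
    """Dedupe search hits by compiled chunk text, not by corpus_id, so
    multiple matching chunks within one logical block all survive."""
    seen, unique = set(), []
    for hit in hits:
        if hit["compiled"] not in seen:
            seen.add(hit["compiled"])
            unique.append(hit)
    return unique

def collect_results(queries: list, search) -> list:
    """Flatten every hit of every query. The buggy version effectively
    did zip(queries, results), keeping only one hit per query."""
    return [hit for query in queries for hit in search(query)]
```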
Debanjum
73c384b052 Reduce chat history spacing to reduce wasted space b/w chat input box
The tailwind theme spacing of the scroll area surrounding chat history
on large screens was causing the large gap between the chat input box
and chat history on some screen layouts.

This change reduces the spacing to a more acceptable level.
2025-06-24 02:46:46 -07:00
Debanjum
ca9109455b Retry on intermittent image generation failure for resilient generation 2025-06-24 02:46:46 -07:00
Debanjum
4448ab665c Improve google image generation configuration 2025-06-24 02:46:46 -07:00
Debanjum
dc202e4441 Release Khoj version 1.42.6 2025-06-20 15:00:22 -07:00
Debanjum
623c8b65f1 Set failed response message when a research iteration fails.
Previously summarizedResult would be unset when a tool call failed.

This caused research to fail due to ChatMessageModel failures when
constructing tool chat histories and would have caused similar errors
in other constructed chat histories.

Putting a failed iteration message in the summary prevents that while
letting the research agent continue its research.
2025-06-20 14:13:50 -07:00
Debanjum
b85c646611 Make organic web search result text snippet field optional
Not all web search providers, e.g. Jina, Searxng, return a text
snippet. Making the snippet optional allows processing search results
from these web search providers without hitting validation errors.
2025-06-20 13:47:08 -07:00
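The fix amounts to making the snippet field optional in the result model, roughly as below. Field names are illustrative, and Khoj likely uses a pydantic model rather than this stdlib dataclass:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OrganicSearchResult:
    title: str
    link: str
    # Optional because some providers omit the text snippet.
    snippet: Optional[str] = None
```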
Debanjum
22d71cab44 Log ChatMessageModel validation errors during conversation save 2025-06-19 16:48:11 -07:00
Debanjum
494e7b3856 Update gemini 2.5 to stable model pricing from preview pricing 2025-06-19 16:48:11 -07:00
Debanjum
029bd3be56 Handle breaking change in write file to e2b code sandbox
For some reason the function signature's kwargs are broken. Removing
usage of keyword args resolves the file upload to sandbox error.
2025-06-19 16:48:11 -07:00
Debanjum
b18b7b2e33 Handle unset response thoughts. Useful when retrying a failed request
Previously this resulted in an unbound local variable response_thoughts error
2025-06-19 16:48:06 -07:00
Debanjum
906ff46e6c Handle research iterations where document search returns no results 2025-06-19 16:47:08 -07:00
Debanjum
aa7b23c125 Handle rendering document references with no compiled text on web app 2025-06-17 15:47:58 -07:00
Debanjum
4ca247f0bc Always append random suffix to shared conversations urls 2025-06-17 15:47:58 -07:00
Debanjum
68b7057a76 Share https url unless explicitly disabled or on localhost 2025-06-17 15:47:58 -07:00
Debanjum
bdda03b0bf Git ignore obsidian config directories 2025-06-16 12:01:19 -07:00
Debanjum
e635b8e3b9 Handle gemini chat response completion chunk when streaming 2025-06-13 18:36:53 -07:00
Debanjum
963ebc8875 Pass query params to doc search function before user, chat history
Makes document search arg ordering more consistent with other tools
like online search, run code etc.
2025-06-13 13:29:30 -07:00
Debanjum
9673f8beba Release Khoj version 1.42.5 2025-06-11 13:36:46 -07:00
Debanjum
e87be4edf4 Pin python version used by github workflow to publish to pypi
Avoids having to update the python path that web app static build
files are written to every time the patch version of python is updated
2025-06-11 13:30:15 -07:00
Debanjum
eaae1cf74e Fix rendering thoughts of Gemini reasoning models
Previously the thought was duplicated in the message to the user and
in the train of thought. This should be resolved now.
2025-06-11 13:09:38 -07:00
Debanjum
4946ea1668 Fix to save organic results to conversation context in DB
This bug was introduced in 05d4e19cb, version 1.42.2, during the
migration to save deeply typed ChatMessageModel, as ChatMessageModel
did not use the right field name for organic results (since the
start).

Previously this did not matter as the dictionary was stored to DB
regardless, but now the mapping of the dictionary to ChatMessageModel
drops that field before the conversation is saved to DB.

This was resulting in organic context being lost on page reload and
only being shown on first response.
2025-06-11 12:52:42 -07:00
Debanjum
30ced1d86c Log non schema adhering chat message before save to DB 2025-06-11 12:52:42 -07:00
Debanjum
71763684a9 Explicitly drop stream_options if not streaming openai chat response
Not sure why, but in some cases when interacting with o3 (which needs
non-streaming) the stream_options seem to be set.

Cannot reproduce, but hopefully dropping the stream_options explicitly
should resolve this issue.

Related 985a98214
2025-06-11 12:52:42 -07:00
Debanjum
65644f78b0 Set lower max output tokens for non reasoning Gemini models
While reasoning models support longer output tokens, non-reasoning
models do not. Use lower max output tokens for them.
2025-06-11 11:12:24 -07:00
Debanjum
71221533c8 Release Khoj version 1.42.4 2025-06-10 23:49:30 -07:00