Commit Graph

4858 Commits

Debanjum
e90ab5341a Add context uri field to deeplink line number in original doc 2025-07-03 17:38:34 -07:00
Debanjum
820b4523fd Show raw rather than compiled entry to llm and users
Only embedding models see and operate on the compiled text.

LLMs should see raw entry to improve combining it with other document
traversal tools for better regex and line matching.

Users see raw entry for better matching with their actual notes.
2025-07-03 17:38:34 -07:00
Debanjum
5c4d41d300 Reduce structural changes to indexed raw org mode entries
Reducing structural changes to the raw entry allows better deep-linking
and re-annotation. This is currently done via a line number in the new uri field.

Only add the properties drawer to the raw entry if the entry has properties.
Previously, line and source properties were inserted into raw entries.
This isn't done anymore. Line and source are deprecated for use in khoj.el.
2025-07-03 17:38:31 -07:00
sabaimran
870d9d851a Only handle Stripe webhooks meant for the KHOJ_CLOUD product 2025-07-03 17:02:49 -07:00
Debanjum
fe44cd3c59 Upgrade Retrieval from KB in Research Mode. Use Function Calling for Tool Use (#1205)
## Why
Move to function calling paradigm to give models tool call -> tool
result in formats they're fine-tuned to understand. Previously we were
giving them results in our specific format (as function calling paradigm
wasn't well-established yet).

And improve prompt cache hits by caching tool definitions.

This is a **breaking change**. AI Models and APIs that do not support
function calling will not work with Khoj in research mode. Function
calling is supported by:
- Standard commercial AI Models and APIs like Anthropic, Gemini, OpenAI,
OpenRouter
- Standard open-source AI APIs like llama.cpp server, Ollama
- Standard open source models like Qwen, DeepSeek, Gemma, Llama, Mistral

## What
### Use Function Calling for Tool Use
- Add Function Calling support to Anthropic, Gemini, OpenAI AI Model
APIs
- Move Existing Research Mode Tools to Use Function Calling

### Get More Comprehensive Results from your Knowledge Base (KB)
- Give Research Agent better Document Retrieval Tools
  - Add grep files tool to enable researcher to find documents by regex
  - Add list files tool to enable researcher to find documents by path
  - Add file viewer tool to enable researcher to read documents

### Miscellaneous
- Improve Research Prompt, Truncation, Retry and Caching
- Show reasoning model thoughts in Khoj train of thought for
intermediate steps as well
2025-07-03 00:14:07 -07:00
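The tool definitions this change moves to can be pictured as standard function-calling schemas. Below is a minimal sketch in the OpenAI function-calling format; the tool name and parameters are illustrative assumptions, not Khoj's actual schema:

```python
# Hypothetical tool definition in the OpenAI function-calling format.
# Tool and parameter names are illustrative, not Khoj's actual schema.
GREP_FILES_TOOL = {
    "type": "function",
    "function": {
        "name": "grep_files",
        "description": "Find documents in the knowledge base matching a regex.",
        "parameters": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string", "description": "Regex to match against document lines."},
                "path_prefix": {"type": "string", "description": "Optionally limit the search to this path."},
            },
            "required": ["pattern"],
        },
    },
}
```

Because the definition is static JSON, it can also be cached by the provider, which is what enables the prompt cache hits mentioned above.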
Debanjum
f343a92b1d Give research tools better, consistent names for balanced usage 2025-07-02 23:32:44 -07:00
Debanjum
aa081913bf Improve truncation with tool use and Anthropic caching
- Cache the last Anthropic message, given research mode now uses the
  function calling paradigm rather than the old research mode structure.
- Cache tool definitions passed to anthropic models
- Stop dropping first message if by assistant as seems like Anthropic
  API doesn't complain about it any more.

- Drop tool result when tool call is truncated as invalid state
- Do not truncate tool use message content, just drop the whole tool
  use message.

  AI model APIs need tool use assistant message content in specific
  form (e.g with thinking etc.). So dropping content items breaks
  expected tool use message content format.

Handle tool use scenarios where iteration query isn't set for retry
2025-07-02 23:32:44 -07:00
Debanjum
786b06bb3f Handle failed llm calls, message idempotency to improve retry success
- Deepcopy messages before formatting message for Anthropic to allow
  idempotency so retry on failure behaves as expected
- Handle failed calls to pick next tools to pass failure warning and
  continue next research iteration. Previously if API call to pick
  next failed, the research run would crash
- Add null response check for when Gemini models fail to respond
2025-07-02 23:32:30 -07:00
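The deep-copy fix is the classic way to keep retries idempotent: the provider-specific formatter can mutate its own copy freely while the canonical history stays untouched, so a retry sees identical input. A minimal sketch (the formatter and its mutation are illustrative, not Khoj's actual code):

```python
import copy

def format_for_provider(messages: list[dict]) -> list[dict]:
    """Illustrative provider-specific formatter that mutates message content."""
    formatted = copy.deepcopy(messages)  # protect the caller's history so retries see identical input
    for message in formatted:
        message["content"] = message["content"].strip()
    return formatted

history = [{"role": "user", "content": "  hello  "}]
first_try = format_for_provider(history)
retry = format_for_provider(history)  # identical, because history was never mutated
```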
Debanjum
30878a2fed Show thoughts and text response in thoughts on anthropic tool use
Previously, if Anthropic models were using tools, the model's text
response accompanying the tool use wouldn't be shown, as it was
overwritten in the aggregated response by the tool call JSON.

This change appends the text response to the thoughts portion on tool
use to still show the model's thinking. Thinking and text response are
delineated by italics vs normal text in such cases.
2025-07-02 20:48:24 -07:00
Debanjum
c2ab75efef Track, reuse raw model response for multi-turn conversations
This should avoid the need to reformat the Khoj standardized tool call
for cache hits and to satisfy AI model API requirements.

Previously, multi-turn tool use calls to Anthropic reasoning models
would fail, as they needed their thoughts to be passed back. Other AI
model providers can have other requirements.

Passing back the raw response as is should satisfy the default case.

Tracking raw response should make it easy to apply any formatting
required before sending previous response back, if any ai model
provider requires that.

Details
---
- Raw response content is passed back in ResponseWithThoughts.
- Research iteration stores this and, when present, puts it into the
  model response ChatMessageModel when constructing iteration history.
  Fall back to using the parsed tool call when the raw response isn't present.
- No need to format tool call messages for anthropic models as we're
  passing the raw response as is.
2025-07-02 20:48:24 -07:00
Debanjum
7cd496ac19 Frame research prompt as accomplish task instead of answer question
The researcher is expanding from its previous behavior of collecting
information to answer the user query into accomplish-task behavior,
especially with tool use.

Update the researcher's system prompt to better reflect the new objective.
Encourage the model to not stop working on the task until the objective is achieved.
2025-07-02 20:48:24 -07:00
Debanjum
4e67ba4d6c Support seeing lines around regex match with grep files tool
Let research agent see lines surrounding regex matched lines when
using grep files tool to improve document retrieval quality
2025-07-02 20:48:24 -07:00
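The surrounding-lines behavior can be sketched as a small grep helper. This is an illustrative sketch, not Khoj's implementation; it also bakes in the case-insensitive matching from the related commit below:

```python
import re

def grep_with_context(text: str, pattern: str, context: int = 1) -> list[str]:
    """Return regex-matched lines plus `context` lines around each match."""
    regex = re.compile(pattern, re.IGNORECASE)  # case-insensitive matching
    lines = text.splitlines()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if regex.search(line):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    return [lines[i] for i in sorted(keep)]

doc = "alpha\nbeta\nGAMMA\ndelta\nepsilon"
matches = grep_with_context(doc, "gamma")  # matches GAMMA plus its neighbors
```

Returning neighbors lets the agent judge whether a hit is relevant without a follow-up file-view call for every match.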
Debanjum
d81fb08366 Use case insensitive regex matching with grep files tool 2025-07-02 20:48:24 -07:00
Debanjum
9c38326608 Add grep files tool to enable researcher to find documents by regex
Earlier Khoj could technically only answer existential questions, i.e.
questions whose search would terminate once any relevant note to answer
them was found.

This change enables Khoj to answer universal questions, i.e. questions
that require searching through all notes or finding all instances.

It enables more thorough retrieval from user's knowledge base by
combining semantic search, regex search, view and list files tools.

For more development details including motivation, see live coding
session 1.1 at https://www.youtube.com/live/-2s_qi4hd2k
2025-07-02 20:48:24 -07:00
Debanjum
59f5648dbd Add list files tool to enable researcher to find documents by path
Allow getting a map of user's knowledge base under specified path.

This enables more thorough retrieval from user's knowledge base by
combining search, view and list files tools.
2025-07-02 20:48:24 -07:00
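Producing a map of the knowledge base under a path is a directory walk. A sketch under the assumption of an on-disk file layout (paths and helper name are illustrative, not Khoj's actual tool):

```python
import os
import tempfile

def list_files(root: str, path_prefix: str = "") -> list[str]:
    """Return relative paths of all files under root, optionally filtered by prefix."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if rel.startswith(path_prefix):
                matches.append(rel)
    return sorted(matches)

# Demo on a throwaway knowledge base layout (paths are hypothetical)
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "notes"))
for rel in ["notes/todo.org", "notes/journal.org", "readme.md"]:
    open(os.path.join(root, rel), "w").close()

kb_map = list_files(root)
org_notes = list_files(root, path_prefix="notes")
```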
Debanjum
2f9f608cff Add file viewer tool to enable researcher to read documents
Allow reading whole file contents or content in specified line range
in user's knowledge base. This allows for more deterministic
traversal.
2025-07-02 20:48:24 -07:00
Debanjum
721c55a37b Rename ResponseWithThought response field to text for better naming 2025-07-02 20:48:24 -07:00
Debanjum
490f0a435d Pass research tools directly with their varied args for flexibility
Why
---
Previously the researcher had a uniform response schema to pick the next
tool: scratchpad, query and tool. This didn't allow choosing different
arguments for the different tools being called. And the tool call,
result format passed by Khoj was custom and static across all LLMs.

Passing the tools and their schemas directly to llm when picking next
tool allows passing multiple, tool specific arguments for llm to
select. For example, model can choose webpage urls to read or image
gen aspect ratio (apart from tool query) to pass to the specific tool.

Using the LLM tool calling paradigm allows model to see tool call,
tool result in a format that it understands best.

Using the standard tool calling paradigm also allows for incorporating
community-built tools more easily via MCP servers, client tools,
native LLM API tools, etc.

What
---
- Return ResponseWithThought from completion_with_backoff ai model
  provider methods
- Show reasoning model thoughts in research mode train of thought.
  For non-reasoning models, do not show the researcher train of thought,
  as non-reasoning models don't (by default) think before selecting a
  tool. Showing the bare tool call is uninformative and resembles the
  tool's action shown in the next step.

- Store tool calls in standardized format.
- Specify tool schemas in tool for research llm definitions as well.
- Transform tool calls, tool results to standardized form for use
  within khoj. Manage the following tool call, result transformations:
  - Model provider tool_call -> standardized tool call
  - Standardized tool call, result -> model specific tool call, result

- Make the researcher choose the webpage urls to read as well for the
  webpage tool. Previously it would just decide the query but let the
  webpage reader infer the query url(s). But the researcher has better
  context on which webpages it wants read to answer its query.

  This should eliminate the webpage reader deciding urls to read step
  and speed up webpage read tool use.

Handle unset response thoughts. Useful when retry on failed request

Previously resulted in unbound local variable response_thoughts error
2025-07-02 20:48:23 -07:00
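The provider-to-standardized transformation described above might look like the following sketch. The `ToolCall` dataclass is a hypothetical internal shape; the OpenAI and Anthropic input dicts mirror the wire formats those APIs return (OpenAI encodes arguments as a JSON string, Anthropic as a dict in `input`):

```python
import json
from dataclasses import dataclass

@dataclass
class ToolCall:
    """Hypothetical standardized tool call shape for internal use."""
    name: str
    args: dict
    id: str

def from_openai(tool_call: dict) -> ToolCall:
    # OpenAI returns function arguments as a JSON-encoded string
    fn = tool_call["function"]
    return ToolCall(name=fn["name"], args=json.loads(fn["arguments"]), id=tool_call["id"])

def from_anthropic(block: dict) -> ToolCall:
    # Anthropic tool_use blocks carry arguments as a dict in `input`
    return ToolCall(name=block["name"], args=block["input"], id=block["id"])

openai_call = {"id": "call_1", "type": "function",
               "function": {"name": "grep_files", "arguments": '{"pattern": "tax"}'}}
anthropic_call = {"type": "tool_use", "id": "toolu_1",
                  "name": "grep_files", "input": {"pattern": "tax"}}
```

The reverse direction (standardized call and result back into each provider's expected message shape) follows the same pattern in mirror image.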
Debanjum
80522e370e Make researcher pick next tool using model function calling feature
The pick next tool requests next tool to call to model in function
calling / tool use format.
2025-07-02 19:10:02 -07:00
Debanjum
b888d5e65e Add function calling support to Anthropic, Gemini and OpenAI models
Previously these models could use response schemas but not the tool use
capabilities provided by these AI model APIs.

This change allows chat actors to use the function calling feature to
specify which tools the LLMs from these providers can call.

This should help simplify tool definition and structure context in
forms that these LLMs natively understand.
(i.e in tool_call - tool_result ~chatml format).
2025-07-02 19:10:02 -07:00
Debanjum
9607f2e87c Release Khoj version 1.42.8 2025-07-02 19:07:51 -07:00
Debanjum
f4fc76645c Upgrade electron package used by desktop app 2025-07-02 18:50:18 -07:00
Debanjum
96fb9bd87c Tune temperature and top_p to reduce gemini model repetition
Gemini models, especially the flash models, seem to have a tendency to
go into long, repetitive output token loops. Unsure why.

Tune temperature and top_p, as the Gemini API doesn't seem to allow
setting frequency or presence penalties, at least for reasoning models.
Those would have been a more direct mechanism to avoid the model getting
stuck in a loop.
2025-07-02 18:42:32 -07:00
Debanjum
9774bb012e Update agent knowledge base and configuration atomically
This should help prevent partial updates to an agent, which is especially
useful when agents with large knowledge bases are updated. A failed call
now raises an exception, allowing you to retry the save instead of losing
your previous agent changes or saving only partially.
2025-07-02 18:01:18 -07:00
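The atomic all-or-nothing behavior is essentially a database transaction: either every row of the agent update commits, or none do. A sketch with stdlib sqlite3 standing in for Khoj's actual (Django ORM) models; table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent (id INTEGER PRIMARY KEY, persona TEXT)")
conn.execute("CREATE TABLE agent_files (agent_id INTEGER, path TEXT)")
conn.execute("INSERT INTO agent VALUES (1, 'old persona')")
conn.commit()

try:
    with conn:  # one transaction: commit on success, roll back on any exception
        conn.execute("UPDATE agent SET persona = 'new persona' WHERE id = 1")
        conn.execute("INSERT INTO agent_files VALUES (1, 'notes/todo.org')")
        raise RuntimeError("simulated failure mid-update")
except RuntimeError:
    pass  # caller can retry the save; no partial update was persisted

persona = conn.execute("SELECT persona FROM agent WHERE id = 1").fetchone()[0]
file_count = conn.execute("SELECT COUNT(*) FROM agent_files").fetchone()[0]
```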
Debanjum
e6cc9b1182 Test update agents with large knowledge bases 2025-07-02 18:01:18 -07:00
Debanjum
5fe2ea8a55 Run safety check only when agent persona updated on agent edits
Running safety check isn't required if the agent persona wasn't
updated this time around as it would have passed safety check
previously.

This should speed up editing agents when agent persona isn't updated.
2025-07-02 18:01:18 -07:00
Debanjum
a8c47a70f7 Show friendly name for available ai models on clients when set 2025-07-01 16:59:13 -07:00
Debanjum
487826bc32 Release Khoj version 1.42.7 2025-06-27 18:21:18 -07:00
Debanjum
29e5d7ef08 Improve support for new Deepseek R1 model over Openai compatible api
Parse thinking out from <think>..</think> tags in the chat response.
Handle merging structured message content, not just str, for Deepseek.
2025-06-27 18:17:35 -07:00
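Separating `<think>...</think>` reasoning from the visible response is a small regex job. An illustrative sketch, not Khoj's actual parser:

```python
import re

def split_thoughts(response: str) -> tuple[str, str]:
    """Separate <think>...</think> reasoning from the visible chat response."""
    thoughts = "".join(re.findall(r"<think>(.*?)</think>", response, flags=re.DOTALL))
    text = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
    return thoughts.strip(), text

raw = "<think>The user wants a summary.</think>Here is the summary."
thoughts, text = split_thoughts(raw)
```

`re.DOTALL` matters because R1-style models routinely emit multi-line thinking blocks.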
Debanjum
a33580d560 Enable cache, proxy to improve firecrawl webpage scrape speed, success 2025-06-27 16:35:25 -07:00
Debanjum
1566e3c74d Ease bulk (de-)selecting of files to add/remove to agent knowledge base
Add select all, deselect all buttons to select all (filtered) files to
add, remove from an agent's knowledge base.
2025-06-27 15:19:50 -07:00
Debanjum
3bb4e63f3e Add ability to set default chat model via env var in docker-compose.yml 2025-06-27 15:19:50 -07:00
Debanjum
dd89dd3fc8 Bump web, documentation and desktop app package dependencies 2025-06-27 15:19:50 -07:00
Peter Gaultney
9f3ceba541 Allow setting embedded postgres db directory with PGSERVER_DATA_DIR env var (#1202)
It seems to me that it would be useful to be able to be explicit about
where the embedded database should live - as well as where it _does_
live (via the info log), when not specifying.
2025-06-28 03:21:23 +05:30
Debanjum
d37113850c Let reasoning gemini models dynamically set their thinking budget
All gemini 2.5 series models support dynamic thinking budgets by
setting thinking_budget to -1.
2025-06-27 13:13:24 -07:00
Debanjum
ba059ad8b0 Deduplicate passing chat history to extract question only in prompt
Extract questions has chat history in prompt and in actual chat history.

Only pass in prompt for now. Later update prompts to pass chat history
in chat messages list for better truncation flexibility.
2025-06-24 02:49:29 -07:00
Debanjum
170a8036fe Fix 2 document retrieval bugs to not drop valid search results
1. Due to the interaction of two changes:
  - dedupe by corpus_id, where corpus_id tracks logical content blocks
  like files, org/md headings.
  - return compiled, not logical blocks, where compiled tracks smaller
  content chunks that fit within search model and LLM context windows.

  When combined, these showed only 1 matching compiled chunk per logical
  block, even if multiple chunks matched within a logical content block.

  Fix is to either dedupe by compiled text or to return deduped
  logical content blocks (by corpus_id) corresponding to matched
  compiled chunks. This commit fixes it by the first method.

2. Zipping inferred queries with search results resulted in a single
   search result being returned per query!
   This silently cut down matching search results and went undetected.
2025-06-24 02:47:07 -07:00
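Both bugs can be illustrated with toy data (the query strings and chunk names are hypothetical):

```python
inferred_queries = ["payroll taxes", "quarterly filings"]
search_results = ["chunk-a", "chunk-b", "chunk-a", "chunk-c"]  # hypothetical compiled chunks

# Bug 2: zip stops at the shorter sequence, silently keeping one result per query
buggy = [result for _query, result in zip(inferred_queries, search_results)]

# Bug 1 fix: dedupe by compiled chunk text, so multiple distinct chunks from the
# same logical block (file / org heading) all survive
seen: set[str] = set()
deduped = [r for r in search_results if not (r in seen or seen.add(r))]
```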
Debanjum
73c384b052 Reduce chat history spacing to reduce wasted space b/w chat input box
The tailwind theme spacing of the scroll area surrounding chat history
on large screens was causing the large gap between the chat input box
and chat history on some screen layouts.

This change reduces the spacing to a more acceptable level.
2025-06-24 02:46:46 -07:00
Debanjum
ca9109455b Retry on intermittent image generation failure for resilient generation 2025-06-24 02:46:46 -07:00
Debanjum
4448ab665c Improve google image generation configuration 2025-06-24 02:46:46 -07:00
Debanjum
dc202e4441 Release Khoj version 1.42.6 2025-06-20 15:00:22 -07:00
Debanjum
623c8b65f1 Set failed response message when a research iteration fails.
Previously summarizedResult would be unset when a tool call failed.

This caused research to fail due to ChatMessageModel failures when
constructing tool chat histories and would have caused similar errors
in other constructed chat histories.

Putting a failed iteration message in the summary prevents that while
letting the research agent continue its research.
2025-06-20 14:13:50 -07:00
Debanjum
b85c646611 Make organic web search result text snippet field optional
Not all web search providers (e.g. Jina, Searxng) return a text
snippet. Making the snippet optional allows processing search results by
these web search providers without hitting validation errors.
2025-06-20 13:47:08 -07:00
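Making the field optional is a one-line schema change. A sketch with a stdlib dataclass standing in for Khoj's actual result model (class and field names are illustrative):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OrganicWebResult:
    """Illustrative result model, not Khoj's actual class."""
    title: str
    link: str
    snippet: Optional[str] = None  # some providers omit a text snippet

with_snippet = OrganicWebResult("Example", "https://example.com", "An example page")
without_snippet = OrganicWebResult("Example Docs", "https://example.com/docs")
```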
Debanjum
22d71cab44 Log ChatMessageModel validation errors during conversation save 2025-06-19 16:48:11 -07:00
Debanjum
494e7b3856 Update gemini 2.5 to stable model pricing from preview pricing 2025-06-19 16:48:11 -07:00
Debanjum
029bd3be56 Handle breaking change in write file to e2b code sandbox
For some reason the function signature, kwargs are broken. Removing
usage of keyword args resolves the file upload to sandbox error.
2025-06-19 16:48:11 -07:00
Debanjum
b18b7b2e33 Handle unset response thoughts. Useful when retry on failed request
Previously resulted in unbound local variable response_thoughts error
2025-06-19 16:48:06 -07:00
Debanjum
906ff46e6c Handle research iterations where document search returns no results 2025-06-19 16:47:08 -07:00
Debanjum
aa7b23c125 Handle rendering document references with no compiled text on web app 2025-06-17 15:47:58 -07:00
Debanjum
4ca247f0bc Always append random suffix to shared conversations urls 2025-06-17 15:47:58 -07:00
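A random, unguessable suffix for share links can come from the stdlib `secrets` module. The URL shape and function name below are illustrative, not Khoj's actual route:

```python
import secrets

def share_url(conversation_slug: str) -> str:
    # Append a random, URL-safe suffix so shared links stay unguessable
    # even for identical conversation slugs. URL shape is illustrative.
    suffix = secrets.token_urlsafe(8)
    return f"https://example.com/share/chat/{conversation_slug}-{suffix}"

url_a = share_url("trip-planning")
url_b = share_url("trip-planning")  # same slug, different link
```

Using `secrets` rather than `random` matters here: share links are effectively capability URLs, so the suffix must be cryptographically unpredictable.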