Use a URL fragment schema for deep link URIs, borrowing from URL/PDF
fragment conventions. E.g. file:///path/to/file.txt#line=<line_no>&page=<page_no>
Compute line numbers during (recursive) org-mode entry chunking.
Thoroughly test that the line number in the URI maps to the line number
of the chunk in the actual org-mode file.
This deep-link URI with line number is passed to the LLM as context so
it combines better with the line-range-based view file tool.
The grep tool already passed matching line numbers. This change passes
the line number in URIs of org entries matched by the semantic search tool.
Only embedding models see and operate on compiled text.
LLMs should see the raw entry to improve combining it with other document
traversal tools for better regex and line matching.
Users see the raw entry for better matching with their actual notes.
Reducing structural changes to the raw entry allows better deep-linking and
re-annotation. Currently done via the line number in the new uri field.
Only add a properties drawer to the raw entry if the entry has properties.
Previously line and source properties were inserted into raw entries.
This isn't done anymore; line and source are deprecated for use in khoj.el.
## Why
Move to the function calling paradigm to give models tool call -> tool
result messages in formats they're fine-tuned to understand. Previously we
were giving them results in our own custom format (as the function calling
paradigm wasn't well-established yet).
This also improves prompt cache hits by caching tool definitions.
This is a **breaking change**. AI Models and APIs that do not support
function calling will not work with Khoj in research mode. Function
calling is supported by:
- Standard commercial AI Models and APIs like Anthropic, Gemini, OpenAI,
OpenRouter
- Standard open-source AI APIs like llama.cpp server, Ollama
- Standard open source models like Qwen, DeepSeek, Gemma, Llama, Mistral
## What
### Use Function Calling for Tool Use
- Add Function Calling support to Anthropic, Gemini, OpenAI AI Model
APIs
- Move Existing Research Mode Tools to Use Function Calling
### Get More Comprehensive Results from your Knowledge Base (KB)
- Give Research Agent better Document Retrieval Tools
- Add grep files tool to enable researcher to find documents by regex
- Add list files tool to enable researcher to find documents by path
- Add file viewer tool to enable researcher to read documents
### Miscellaneous
- Improve Research Prompt, Truncation, Retry and Caching
- Show reasoning model thoughts in Khoj train of thought for
intermediate steps as well
- Cache the last Anthropic message, since research mode now uses the
  function calling paradigm and not the old research mode structure.
- Cache tool definitions passed to anthropic models
- Stop dropping the first message if it's by the assistant, as the
  Anthropic API doesn't seem to complain about it any more.
- Drop the tool result when its tool call is truncated, as that is an
  invalid state
- Do not truncate tool use message content; drop the whole tool use
  message instead.
  AI model APIs need tool use assistant message content in a specific
  form (e.g. with thinking etc.). So dropping individual content items
  breaks the expected tool use message content format.
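A rough sketch of the drop-the-whole-pair logic; the message shape (`type`, `tool_use_id`, `truncated` keys) is an assumption for illustration, not the provider's actual wire format:

```python
def drop_truncated_tool_pairs(messages: list[dict]) -> list[dict]:
    """Drop whole tool-use messages (and their paired tool results) rather
    than truncating their content, which would break the provider's
    expected message format. (Illustrative sketch.)"""
    kept, dropped_ids = [], set()
    for msg in messages:
        if msg.get("type") == "tool_use" and msg.get("truncated"):
            dropped_ids.add(msg["tool_use_id"])  # drop the truncated call itself
            continue
        if msg.get("type") == "tool_result" and msg.get("tool_use_id") in dropped_ids:
            continue  # a result without its call is invalid state, drop it too
        kept.append(msg)
    return kept
```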
- Handle tool use scenarios where the iteration query isn't set, for retry
- Deepcopy messages before formatting them for Anthropic for
  idempotency, so retry on failure behaves as expected
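The idempotency point can be sketched as below; the normalization step shown is illustrative, not Anthropic's exact required format:

```python
import copy

def format_for_anthropic(messages: list[dict]) -> list[dict]:
    """Format messages for the Anthropic API without mutating the caller's
    list, so a retry after failure starts from the same unmodified input.
    (Illustrative sketch of the deepcopy-for-idempotency pattern.)"""
    messages = copy.deepcopy(messages)  # keep the original untouched across retries
    for msg in messages:
        if isinstance(msg.get("content"), str):
            # illustrative normalization of plain strings into content blocks
            msg["content"] = [{"type": "text", "text": msg["content"]}]
    return messages
```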
- Handle failed calls to pick next tools by passing a failure warning
  and continuing to the next research iteration. Previously, if the API
  call to pick the next tool failed, the research run would crash
- Add null response check for when Gemini models fail to respond
Previously, if Anthropic models were using tools, the model's text
response accompanying the tool use wouldn't be shown, as it was
overwritten in the aggregated response by the tool call JSON.
This change appends the text response to the thoughts portion on tool
use to still show the model's thinking. Thinking and text response are
delineated by italics vs normal text in such cases.
This should avoid the need to reformat the Khoj-standardized tool call
for cache hits and for satisfying AI model API requirements.
Previously, multi-turn tool use calls to Anthropic reasoning models
would fail, as they needed their thoughts to be passed back. Other AI
model providers can have other requirements.
Passing back the raw response as-is should satisfy the default case.
Tracking the raw response should make it easy to apply any formatting
required before sending the previous response back, if any AI model
provider requires that.
Details
---
- Raw response content is passed back in ResponseWithThoughts.
- Research iteration stores this and puts it into the model response
  ChatMessageModel when constructing iteration history, when present.
  Fall back to using the parsed tool call when the raw response isn't
  present.
- No need to format tool call messages for Anthropic models as we're
  passing the raw response as-is.
The researcher is expanding from its previous collect-information-to-answer-user-query
behavior into accomplish-task behavior, especially with tool use.
Update the researcher's system prompt to reflect the new objective better.
Encourage the model not to stop working on the task until it achieves the
objective.
Earlier, Khoj could technically only answer existential questions,
i.e. questions that would terminate once any relevant note to answer
the question was found.
This change enables Khoj to answer universal questions, i.e. questions
that require searching through all notes or finding all instances.
It enables more thorough retrieval from the user's knowledge base by
combining semantic search, regex search, view and list files tools.
For more development details including motivation, see live coding
session 1.1 at https://www.youtube.com/live/-2s_qi4hd2k
Allow getting a map of the user's knowledge base under a specified path.
This enables more thorough retrieval from the user's knowledge base by
combining the search, view and list files tools.
Why
---
Previously the researcher had a uniform response schema to pick the
next tool, with scratchpad, query and tool fields. This didn't allow
choosing different arguments for the different tools being called. And
the tool call/result format passed by Khoj was custom and static across
all LLMs.
Passing the tools and their schemas directly to the LLM when picking the
next tool allows passing multiple, tool-specific arguments for the LLM to
select. For example, the model can choose webpage URLs to read or image
generation aspect ratio (apart from the tool query) to pass to the
specific tool.
Using the LLM tool calling paradigm lets the model see tool call and
tool result in a format it understands best.
Using the standard tool calling paradigm also allows incorporating
community-built tools more easily via MCP servers, client tools,
native LLM API tools etc.
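For illustration, an OpenAI-style tool definition carrying tool-specific arguments might look like this; the tool names and parameters are examples, not Khoj's exact schemas:

```python
# Example tool definitions in the OpenAI function-calling format.
# Tool names and parameters are illustrative, not Khoj's actual schemas.
tools = [
    {
        "type": "function",
        "function": {
            "name": "read_webpage",
            "description": "Read the given webpage URLs to answer the query.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    # tool-specific argument: the model picks the URLs itself
                    "urls": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["query", "urls"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "generate_image",
            "description": "Generate an image for the prompt.",
            "parameters": {
                "type": "object",
                "properties": {
                    "prompt": {"type": "string"},
                    # tool-specific argument: aspect ratio only matters here
                    "aspect_ratio": {"type": "string", "enum": ["1:1", "16:9", "9:16"]},
                },
                "required": ["prompt"],
            },
        },
    },
]
```

Each tool carries its own argument schema, so the model can select per-tool arguments instead of being limited to one shared query field.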
What
---
- Return ResponseWithThought from completion_with_backoff ai model
provider methods
- Show reasoning model thoughts in research mode train of thought.
  For non-reasoning models, do not show the researcher's train of
  thought, as non-reasoning models don't (by default) think before
  selecting a tool. Showing just the tool call is redundant and resembles
  the tool action shown in the next step.
- Store tool calls in standardized format.
- Specify tool schemas in the tool definitions for the research LLM as well.
- Transform tool calls and tool results to a standardized form for use
  within Khoj. Manage the following tool call/result transformations:
  - Model provider tool_call -> standardized tool call
  - Standardized tool call, result -> model-specific tool call, result
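A sketch of the two transformations; the provider message shapes and class names below are assumptions for illustration:

```python
import json
from dataclasses import dataclass

@dataclass
class ToolCall:
    """Standardized tool call (illustrative shape, not Khoj's exact class)."""
    name: str
    args: dict
    id: str

def from_anthropic(block: dict) -> ToolCall:
    """Anthropic-style tool_use content block -> standardized tool call."""
    return ToolCall(name=block["name"], args=block["input"], id=block["id"])

def to_openai(call: ToolCall) -> dict:
    """Standardized tool call -> OpenAI-style tool call fragment."""
    return {
        "id": call.id,
        "type": "function",
        "function": {"name": call.name, "arguments": json.dumps(call.args)},
    }
```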
- Make the researcher choose webpage URLs to read as well for the
  webpage tool. Previously it would just decide the query and let the
  webpage reader infer the URL(s) to read. But the researcher has better
  context on which webpages it wants read to answer its query.
  This should eliminate the webpage reader's URL-deciding step and speed
  up webpage read tool use.
Handle unset response thoughts. Useful when retrying a failed request.
Previously this resulted in an unbound local variable response_thoughts error.
Previously these models could use the response schema but not the tool
use capabilities provided by these AI model APIs.
This change allows chat actors to use the function calling feature to
specify which tools the LLM by these providers can call.
This should help simplify tool definitions and structure context in
forms these LLMs natively understand
(i.e. in tool_call - tool_result ~chatml format).
Gemini models, especially the flash models, seem to have a tendency to
go into long, repetitive output token loops. Unsure why.
Tune temperature and top_p, as the Gemini API doesn't seem to allow
setting frequency or presence penalty, at least for reasoning models.
Those would have been a more direct mechanism to avoid the model getting
stuck in a loop.
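A hypothetical generation config along these lines; the exact values are assumptions for illustration, not Khoj's tuned settings:

```python
# Without frequency/presence penalties, temperature and top_p are the
# remaining levers against repetition loops. Values below are assumptions.
generation_config = {
    "temperature": 0.6,   # lower temperature reduces chance of degenerate loops
    "top_p": 0.95,        # trim the long tail of low-probability tokens
    "max_output_tokens": 8192,  # cap runaway repetitive output
}
```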
This should help prevent partial updates to an agent. Especially useful
for agents with large knowledge bases being updated. A failed call now
raises an exception, allowing you to retry the save instead of losing
your previous agent changes or saving only partially.
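The all-or-nothing save semantics can be illustrated with a toy transaction context manager; a real implementation would presumably lean on the database's own transaction support:

```python
from contextlib import contextmanager

@contextmanager
def atomic(state: dict):
    """Toy stand-in for a database transaction: apply all updates or none.
    (Illustration only; not Khoj's actual persistence layer.)"""
    snapshot = dict(state)
    try:
        yield state
    except Exception:
        state.clear()
        state.update(snapshot)  # roll back partial updates on failure
        raise  # surface the failure so the caller can retry the save
```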
Running the safety check isn't required if the agent persona wasn't
updated this time around, as it would have passed the safety check
previously.
This should speed up editing agents when the agent persona isn't updated.
It seems to me that it would be useful to be able to be explicit about
where the embedded database should live - as well as where it _does_
live (via the info log), when not specifying.
Extract questions has chat history both in the prompt and in the actual
chat history. Only pass it in the prompt for now. Later, update prompts
to pass chat history in the chat messages list for better truncation
flexibility.
1. Due to the interaction of two changes:
   - dedupe by corpus_id, where corpus_id tracks logical content blocks
     like files and org/md headings.
   - return compiled, not logical, blocks, where compiled chunks are
     smaller content chunks that fit within search model and LLM context
     windows.
   When combined, they returned only one compiled chunk hit per logical
   block, even if multiple chunks matched within a logical content block.
   The fix is to either dedupe by compiled text or to return deduped
   logical content blocks (by corpus_id) corresponding to matched
   compiled chunks. This commit fixes it by the first method.
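A minimal sketch of the first method, deduping by compiled text; the hit shape is assumed for illustration:

```python
def dedupe_hits(hits: list[dict]) -> list[dict]:
    """Dedupe search hits by compiled text instead of corpus_id, so multiple
    matching chunks within one logical block (file/heading) all survive.
    (Illustrative sketch; hit dict keys are assumptions.)"""
    seen, deduped = set(), []
    for hit in hits:
        if hit["compiled"] in seen:
            continue  # drop only exact duplicate chunks
        seen.add(hit["compiled"])
        deduped.append(hit)
    return deduped
```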
2. Due to zipping inferred queries with search results, which resulted
   in a single search result being returned per query!
   This silently cut down matching search results and went undetected.
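The zip pitfall in miniature:

```python
queries = ["q1", "q2"]
hits = ["hit1", "hit2", "hit3", "hit4"]  # all four matched some query

# Buggy pattern: zip pairs each query with exactly one hit,
# silently discarding the rest of the matching results.
buggy = list(zip(queries, hits))
assert buggy == [("q1", "hit1"), ("q2", "hit2")]  # hit3, hit4 silently dropped
```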
The Tailwind theme spacing of the scroll area surrounding chat history
on large screens was what was causing the large gap between the chat
input box and chat history on some screen layouts.
This change reduces the spacing to a more acceptable level.
Previously, summarizedResult would be unset when a tool call failed.
This caused research to fail due to ChatMessageModel failures when
constructing tool chat histories, and would have caused similar errors
in other constructed chat histories.
Putting a failed iteration message in the summary prevents that while
letting the research agent continue its research.
Some web search providers, like Jina or Searxng, do not return a text
snippet. Making the snippet optional allows processing search results
from these web search providers without hitting validation errors.
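A sketch of the relaxed result model using a dataclass with an optional snippet; field names are illustrative (Khoj presumably uses a pydantic model where a missing required field raises the validation error):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WebSearchResult:
    """Illustrative search result model: snippet defaults to None since
    some providers do not return one."""
    title: str
    url: str
    snippet: Optional[str] = None  # previously required; missing value caused errors
```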