New
- Support Firecrawl as an online search provider
Improve
- Fallback to other enabled online search providers on failure
- Speed up online search with Jina by excluding webpage content in search results
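
A minimal sketch of how falling back across enabled online search providers could work. The `search_with_fallback` helper and the two stand-in providers are hypothetical names for illustration, not Khoj's actual functions.

```python
from typing import Callable

# A provider is modeled as a function from query string to a list of result dicts.
SearchFn = Callable[[str], list[dict]]


def search_with_fallback(query: str, providers: list[tuple[str, SearchFn]]) -> list[dict]:
    """Try each enabled provider in order and return the first response that does not raise."""
    errors: list[str] = []
    for name, search in providers:
        try:
            return search(query)
        except Exception as e:  # any provider failure triggers the fallback
            errors.append(f"{name}: {e}")
    raise RuntimeError("All online search providers failed: " + "; ".join(errors))


def failing_provider(query: str) -> list[dict]:
    # Stand-in for a provider that is down or rejects the request.
    raise ConnectionError("provider unavailable")


def working_provider(query: str) -> list[dict]:
    # Stand-in for a provider that responds normally.
    return [{"title": "example", "link": "https://example.com", "query": query}]


results = search_with_fallback(
    "khoj", [("first", failing_provider), ("second", working_provider)]
)
```

Here the first provider raises, so the results come from the second one. An error is raised only if every enabled provider fails.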
Fix
- Fix Jina webpage reader. Improve it to include generated alt text for each image on the webpage
- Truncate online query to Serper if query exceeds max supported length
Previously, a query to Serper longer than the max supported length
would throw an error instead of returning at least some results.
Truncating the online search query to Serper to the max supported
length mitigates that issue.
- Improve webpage read to include image alt text
- Improve Jina webpage search to not include each page content
- Use POST instead of GET for web search, webpage read with Jina
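
The Serper query truncation described above can be sketched as follows. The `MAX_QUERY_LENGTH` value and the `truncate_query` helper are assumptions for illustration. The actual limit Serper enforces is not stated in these notes.

```python
# Hypothetical limit for illustration only; not Serper's documented maximum.
MAX_QUERY_LENGTH = 2048


def truncate_query(query: str, max_length: int = MAX_QUERY_LENGTH) -> str:
    """Return the query unchanged if it fits, else cut it down to max_length characters."""
    return query if len(query) <= max_length else query[:max_length]
```

Truncating trades some query detail for getting at least some results back instead of an error.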
This avoids installing pgserver on Linux arm64 Docker builds. pgserver
doesn't currently support that platform, and it isn't required there as
Khoj Docker images can use the standard postgres server made available
via our docker-compose.yml.
Use pgserver python package as an embedded postgres db,
installed directly as a khoj python package dependency.
This significantly simplifies self-hosting with just a `pip install khoj`.
No need to also install postgres separately.
Still use standard postgres server for multi-user, production use-cases.
- Update default anthropic chat models to latest good models.
- Now that Google supports a good text to image model, suggest adding
it if the Google AI API is set up on first run.
Previously the agent slug was not considered on create, even when passed
explicitly in the agent creation step.
This made the default agent slug different until the next run, when it
was updated after creation, and didn't allow chat to work on first run.
The fix to use the agent slug when explicitly passed allows users to
chat on first run.
Previously messages got Anthropic specific formatting done before
being passed to Anthropic (chat) completion functions.
Move the code to format messages of type list[ChatMessage] into Anthropic
specific format down to the Anthropic (chat) completion functions.
This allows the rest of the functionality like prompt tracing to work
with normalized list[ChatMessage] type of chat messages across AI API
providers.
Previously we'd always request up to 3 webpage URLs via the prompt but
read only one of the requested webpage URLs.
This would degrade the quality of research and default mode, as the
model may request reading up to 3 webpage links but get only one of the
requested webpages read.
This change passes the number of webpages to read down to the AI model
dynamically via the updated prompt. So the number of webpages requested
to be read should mostly be the same as the number of webpages actually read.
Note: For now, the max webpages to read is kept same as before at 1.
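
A rough sketch of passing the webpage limit into the prompt dynamically. The template wording and the `build_prompt` and `select_urls` helpers are hypothetical; only the idea of one shared limit comes from the change described above.

```python
MAX_WEBPAGES_TO_READ = 1  # kept at 1 for now, as noted above

# Hypothetical prompt template; the real prompt wording differs.
PROMPT_TEMPLATE = "Pick up to {max_webpages} webpage link(s) to read for the query: {query}"


def build_prompt(query: str, max_webpages: int = MAX_WEBPAGES_TO_READ) -> str:
    """Tell the model how many webpages it may request."""
    return PROMPT_TEMPLATE.format(max_webpages=max_webpages, query=query)


def select_urls(requested_urls: list[str], max_webpages: int = MAX_WEBPAGES_TO_READ) -> list[str]:
    """Apply the same limit to the model's response as was stated in the prompt."""
    return requested_urls[:max_webpages]
```

Driving both the prompt and the read step from one value keeps the number of pages requested and the number actually read in sync when the limit changes.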
Previously the research mode planner ignored the current agent or
conversation specific chat model the user was chatting with. Only the
server chat settings, user default chat model, first created chat model
were considered to decide the planner chat model.
This change considers the agent chat model to be used for the planner
as well. The actual chat model picked is decided by the existing
prioritization of server > agent > user > first chat model.
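
The prioritization above can be illustrated with a small sketch. The `pick_planner_chat_model` function and its string arguments are hypothetical stand-ins; the actual selection code and model objects are not shown in these notes.

```python
from typing import Optional


def pick_planner_chat_model(
    server_model: Optional[str],
    agent_model: Optional[str],
    user_model: Optional[str],
    first_model: Optional[str],
) -> Optional[str]:
    """Return the first configured model in priority order: server > agent > user > first."""
    for model in (server_model, agent_model, user_model, first_model):
        if model is not None:
            return model
    return None
```

With no server chat setting configured, an agent's chat model now wins over the user default.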
This change enables the creator of a shared conversation to stop sharing the conversation publicly.
### Details
1. Create an API endpoint to enable the owner of the shared conversation to unshare it
2. Unshare a public conversation from the title pane of the public conversation on the web app
Only show the unshare button on public conversations created by the
currently logged in user. Otherwise hide the button.
Set conversation.isOwner = true only if the currently logged in user
shared the current conversation.
This isOwner information is passed by the get shared conversation API
endpoint.
Previously messages passed to Gemini (chat) completion functions
got a little Gemini specific formatting mixed in.
These functions expect a message of type list[ChatMessage] to work
with prompt tracer etc.
Move the code to format messages of type list[ChatMessage] into gemini
specific format down to the gemini (chat) completion functions.
This allows the rest of the functionality like prompt tracing to work
with normalized list[ChatMessage] type of chat messages across
providers.
This is analogous to how we enable extended thinking for claude models
in research mode.
Default to medium effort irrespective of deepthought for OpenAI
reasoning models, as high effort is currently flaky with regular
timeouts and low effort isn't great.
Sets env vars to empty if condition not met so:
- Terrarium (not e2b) used as code sandbox on release triggered eval
- Internet turned off for math500 eval
- Anthropic expects a 0-1 range. Gemini & OpenAI expect a 0-2 range
- Anneal temperature to explore reasoning trajectories but respond factually
- Default send_message_to_model and extract_question temps to the same
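
One way to respect the differing temperature ranges is to clamp the requested value per provider. This is a sketch under that assumption: the notes state the ranges but not whether values are clamped or rescaled. The provider labels and the `clamp_temperature` helper are illustrative names.

```python
# Ranges as stated above; the dictionary keys are illustrative provider labels.
TEMPERATURE_RANGE = {
    "anthropic": (0.0, 1.0),
    "google": (0.0, 2.0),
    "openai": (0.0, 2.0),
}


def clamp_temperature(provider: str, temperature: float) -> float:
    """Keep the temperature within the range the given provider accepts."""
    low, high = TEMPERATURE_RANGE[provider]
    return max(low, min(high, temperature))
```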
Enable configuring a Khoj AI model API for Vertex AI using GCP credentials.
Specifically use the api key & api base url fields of the AI Model API
associated with the current chat model to extract the GCP region, GCP
project id & credentials. This helps create an AnthropicVertex client.
The api key field should contain the GCP service account keyfile as a
base64 encoded string.
The api base url field should be of the form
`https://{MODEL_GCP_REGION}-aiplatform.googleapis.com/v1/projects/{YOUR_GCP_PROJECT_ID}`
Accepting GCP credentials via the AI model API makes it easy to use
across local and cloud environments, as it bypasses the need for a
separate service account key file on the Khoj server.
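
A minimal sketch of extracting the region, project id and service account info from the two fields described above. The `parse_vertex_config` helper, the regular expression and the sample values are assumptions for illustration; constructing the actual client from these values is left out.

```python
import base64
import json
import re

# Matches https://{MODEL_GCP_REGION}-aiplatform.googleapis.com/v1/projects/{YOUR_GCP_PROJECT_ID}
API_BASE_URL_PATTERN = re.compile(
    r"^https://(?P<region>[a-z0-9-]+)-aiplatform\.googleapis\.com"
    r"/v1/projects/(?P<project_id>[^/]+)/?$"
)


def parse_vertex_config(api_key: str, api_base_url: str) -> tuple[str, str, dict]:
    """Return (region, project_id, keyfile_info) parsed from the AI model API fields."""
    match = API_BASE_URL_PATTERN.match(api_base_url)
    if match is None:
        raise ValueError(f"Unexpected Vertex AI api base url: {api_base_url}")
    # The api key field holds the service account keyfile as a base64 encoded string.
    keyfile_info = json.loads(base64.b64decode(api_key).decode("utf-8"))
    return match["region"], match["project_id"], keyfile_info


# Example with a made-up keyfile and project.
fake_keyfile = {"type": "service_account", "project_id": "my-project"}
encoded_key = base64.b64encode(json.dumps(fake_keyfile).encode("utf-8")).decode("ascii")
region, project_id, keyfile_info = parse_vertex_config(
    encoded_key, "https://us-east5-aiplatform.googleapis.com/v1/projects/my-project"
)
```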
- The 3.4.1 release of sentence transformers fixes offline load latency
of sentence transformer models (and Khoj) by avoiding a call to HF
- The 4.50.0 release of transformers results in a
jax error (unexpected keyword argument 'flatten_with_keys') on load.
Previously google auth library was explicitly installed only for the
cloud variant of Khoj to minimize packages installed for non
production use-cases.
But it was being implicitly installed as a dependency of an explicit
package in the default installation anyway.
Making the dependency on google auth package explicit simplifies
the conditional import of google auth in code while not incurring any
additional cost in terms of space or complexity.
Research mode is reaching >94% on SimpleQA. When answers can be
researched online, the eval becomes too easy. And the FRAMES eval does a
more thorough job of evaluating that use-case anyway.