Previously Gemini 2 Flash and Flash Lite defaulted to a 10K context
window, as no defaults had been added for them.
Increase the default context window for small commercial models from
60K to 120K, as they are cheaper and faster than their pro model
equivalents at 60K context.
We'd moved the research planner to only use tools listed in the schema
enum. This enum enforcement prevented the model from terminating
research by setting the tool field to empty.
Fix the issue by adding a text tool to the research tools enum and
telling the model to use it to terminate research and start its
response instead.
Make the research planner consistently select the tool before the
query, as the model should tune its query for the selected tool. It
already has space to think about which tool to use in the scratchpad.
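A minimal sketch of what a planner response schema like this could look like. The tool names, field names, and structure here are hypothetical illustrations, not Khoj's actual schema; the points it demonstrates are that "text" is part of the tool enum (so the model can terminate research) and that "tool" is defined before "query":

```python
# Hypothetical research planner response schema. Tool names and fields
# are illustrative, not Khoj's actual definitions.
RESEARCH_TOOLS = ["notes", "online", "webpage", "code", "text"]  # "text" ends research

planner_response_schema = {
    "type": "object",
    "properties": {
        "scratchpad": {"type": "string"},  # space to think about which tool to use
        "tool": {"type": "string", "enum": RESEARCH_TOOLS},  # tool chosen before query
        "query": {"type": "string"},  # query tuned for the selected tool
    },
    "required": ["scratchpad", "tool", "query"],
}
```

Since Python dicts preserve insertion order, defining "tool" before "query" keeps that ordering when the schema is serialized for the model.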
- Control auto-read of webpages via the eval workflow. Prefix the env var with KHOJ_.
Default to false, as that is the default that will be used in prod
going forward.
- Set the OpenAI API key via an input param in manual eval workflow runs
- Simplify evaluating other chat models available over an
OpenAI-compatible API via the eval workflow.
- Mask the input API key as a secret in the workflow.
- Discard unnecessary null setting of env vars.
- Control randomization of samples in the eval workflow.
If randomization is turned off, it takes the first SAMPLE_SIZE
items from the eval dataset instead of a random selection of
SAMPLE_SIZE items.
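The selection logic above can be sketched in a few lines. This is an illustrative helper, not the actual workflow code:

```python
import random


def select_eval_samples(dataset: list, sample_size: int, randomize: bool) -> list:
    """Pick eval samples: a random subset when randomization is on,
    else the first sample_size items for reproducible runs.
    Illustrative sketch, not the actual eval workflow code."""
    if randomize:
        return random.sample(dataset, min(sample_size, len(dataset)))
    return dataset[:sample_size]
```

Taking the first N items when randomization is off makes repeated eval runs directly comparable across commits.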
This PR implements a new feature request template with a few UX/UI improvements.
Key changes:
- Use GitHub forms.
- Provide an info note for submitters about feature request submission rules.
- Add a few handy fields like "Describe the feature" and "Use Case".
Overall, with a template like this, feature requests will be more structured and meaningful.
Only set up speech-to-text and text-to-image models served via
OpenAI-compatible APIs when explicitly specified during initialization.
This avoids setting up Whisper and DALL-E when an OpenAI-compatible API
is being configured instead of the OpenAI API itself.
- Specify the min, max number of list items expected in AI responses via JSON schema enforcement. Used by Gemini models
- Warn about and drop invalid/empty messages when formatting messages for Gemini models
- Make Gemini responses adhere to the order of the schema property definitions
- Improve the agent creation safety checker by using a response schema and a better prompt
Without explicitly using the property ordering field, Gemini returns
responses in alphabetically sorted property order.
We want the model to respect the schema property definition order.
This gives us control during development to maintain response quality.
For example, in chain-of-thought we make the model fill the scratchpad
before answering.
Require at least 1 item in lists, otherwise Gemini Flash will
sometimes return an empty list. For chat actors where the max number
of items is known, set that as well.
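A sketch of a response schema combining both enforcements. The Gemini API's structured output schema does accept `propertyOrdering`, `minItems`, and `maxItems`, but the property names used here are hypothetical examples rather than Khoj's actual schemas:

```python
# Sketch of a Gemini response schema using propertyOrdering and list
# size constraints. Property names are illustrative.
cot_response_schema = {
    "type": "object",
    "properties": {
        "scratchpad": {"type": "string"},  # chain-of-thought filled first
        "answers": {
            "type": "array",
            "items": {"type": "string"},
            "minItems": 1,  # Gemini Flash may otherwise return an empty list
            "maxItems": 5,  # set when the max number of items is known
        },
    },
    # Without this, Gemini orders response properties alphabetically.
    "propertyOrdering": ["scratchpad", "answers"],
    "required": ["scratchpad", "answers"],
}
```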
The OpenAI API does not support specifying min, max items in response
schema lists, so drop those properties when a response schema is
passed. Add other enforcements to the response schema to comply with
the response schema format expected by the OpenAI API.
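A minimal sketch of such a schema cleaner. This is an assumed implementation, not Khoj's actual code, and the real cleaner likely adds further OpenAI-specific enforcements beyond dropping list size constraints:

```python
def clean_schema_for_openai(schema):
    """Recursively drop list size constraints that the OpenAI response
    schema format rejects, keeping the rest of the JSON schema intact.
    Illustrative sketch only."""
    if isinstance(schema, dict):
        return {
            key: clean_schema_for_openai(value)
            for key, value in schema.items()
            if key not in ("minItems", "maxItems")
        }
    if isinstance(schema, list):
        return [clean_schema_for_openai(item) for item in schema]
    return schema
```

This lets a single response schema definition be shared across providers, with provider-specific constraints stripped or added at the API boundary.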
Previously we set a message content part with empty text. This
results in an error from the Gemini API. Warn about and drop such
messages instead.
Log empty message content found during construction to root-cause the
issue, but allow Khoj to respond without the offending messages in
context for the call to the Gemini API.
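A sketch of the warn-and-drop behavior. The message shape (`role`/`content` dicts) is a simplifying assumption for illustration:

```python
import logging

logger = logging.getLogger(__name__)


def drop_empty_messages(messages: list[dict]) -> list[dict]:
    """Warn about and drop messages with empty content, since the Gemini
    API errors on empty text parts. Assumed message shape:
    {"role": ..., "content": ...}. Illustrative sketch only."""
    valid = []
    for message in messages:
        if not str(message.get("content") or "").strip():
            # Log to help root-cause where empty messages come from,
            # while letting the request proceed without them.
            logger.warning(f"Dropping empty message from role: {message.get('role')}")
            continue
        valid.append(message)
    return valid
```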
New
- Support Firecrawl as an online search provider
Improve
- Fallback to other enabled online search providers on failure
- Speed up online search with Jina by excluding webpage content in search results
Fix
- Fix Jina webpage reader. Improve it to include generated alt text for each image on the webpage
- Truncate the online query to Serper if the query exceeds the max supported length
Previously a query to Serper longer than the max supported length
would throw an error instead of returning at least some results.
Truncating the online search query to Serper's max supported length
mitigates that issue.
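The mitigation amounts to a simple clamp before the request goes out. The length limit below is a placeholder, not Serper's documented value:

```python
MAX_SERPER_QUERY_LENGTH = 2048  # placeholder limit, not Serper's documented value


def truncate_query(query: str, max_length: int = MAX_SERPER_QUERY_LENGTH) -> str:
    """Clamp an online search query so the provider returns partial
    results instead of erroring on over-long input. Illustrative sketch."""
    return query[:max_length]
```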
- Improve webpage reading to include image alt text
- Improve Jina webpage search by not including each page's content
- Use POST instead of GET for web search and webpage reading with Jina
This avoids installing pgserver on linux arm64 docker builds, which it
doesn't currently support and doesn't need to, as Khoj docker images
can use the standard postgres server made available via our
docker-compose.yml.
Use the pgserver python package as an embedded postgres db,
installed directly as a khoj python package dependency.
This significantly simplifies self-hosting to just a `pip install khoj`.
No need to also install postgres separately.
Still use a standard postgres server for multi-user, production use-cases.
- Update default Anthropic chat models to the latest good models.
- Now that Google supports a good text-to-image model, suggest adding
it if the Google AI API is set up on first run.
Previously the agent slug was not considered on create, even when
passed explicitly in the agent creation step.
This made the default agent slug differ until the next run, when it
was updated after creation, and prevented chat from working on first
run. The fix, using the agent slug when explicitly passed, allows
users to chat on first run.
Previously messages got Anthropic-specific formatting before being
passed to the Anthropic (chat) completion functions.
Move the code that formats messages of type list[ChatMessage] into the
Anthropic-specific format down into the Anthropic (chat) completion
functions.
This allows the rest of the functionality, like prompt tracing, to
work with the normalized list[ChatMessage] type of chat messages
across AI API providers.
Previously we'd always request up to 3 webpage URLs via the prompt but
read only one of the requested webpage URLs.
This would degrade the quality of research and default mode, as the
model may request reading up to 3 webpage links but get only one of
the requested webpages read.
This change passes the number of webpages to read down to the AI model
dynamically via the updated prompt. So the number of webpages
requested to be read should mostly match the number of webpages
actually read.
Note: For now, the max webpages to read is kept the same as before at 1.
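The idea can be sketched as parameterizing the prompt on the read limit. The wording and function below are illustrative, not Khoj's actual prompt:

```python
def webpage_read_instruction(max_webpages: int) -> str:
    """Illustrative sketch: interpolate the webpage read limit into the
    prompt so the number of links the model requests stays aligned with
    the number actually fetched."""
    return (
        f"Pick up to {max_webpages} webpage link(s) to read. "
        "Only the links you list will actually be fetched."
    )
```

With the limit kept at 1 for now, the model is told to request a single link instead of up to 3 it can't have read.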
Previously the research mode planner ignored the current agent or
conversation-specific chat model the user was chatting with. Only the
server chat settings, the user's default chat model, and the first
created chat model were considered when deciding the planner chat
model.
This change considers the agent chat model for the planner as well.
The actual chat model picked is decided by the existing prioritization
of server > agent > user > first chat model.
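The prioritization reduces to picking the first configured model in priority order. A hypothetical sketch, with assumed argument names:

```python
def pick_planner_chat_model(server_model, agent_model, user_model, first_model):
    """Pick the planner chat model by the existing priority:
    server > agent > user > first created chat model.
    Hypothetical sketch; argument names are assumptions."""
    return next(
        (model for model in (server_model, agent_model, user_model, first_model) if model),
        None,
    )
```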
This change enables the creator of a shared conversation to stop sharing the conversation publicly.
### Details
1. Create an API endpoint to enable the owner of the shared conversation to unshare it
2. Unshare a public conversation from the title pane of the public conversation on the web app
Only show the unshare button on public conversations created by the
currently logged in user. Otherwise hide the button.
Set conversation.isOwner = true only if the currently logged in user
shared the current conversation.
This isOwner information is passed by the get shared conversation API
endpoint.
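The ownership check itself is small. A sketch with hypothetical identifiers, not Khoj's actual endpoint code:

```python
def is_conversation_owner(conversation_owner_id, logged_in_user_id) -> bool:
    """isOwner is true only when the logged in user created the shared
    conversation; the web app uses it to decide whether to show the
    unshare button. Hypothetical sketch."""
    return logged_in_user_id is not None and conversation_owner_id == logged_in_user_id
```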
Previously messages passed to the Gemini (chat) completion functions
had some Gemini-specific formatting mixed in.
These functions expect messages of type list[ChatMessage] to work with
the prompt tracer etc.
Move the code that formats messages of type list[ChatMessage] into the
Gemini-specific format down into the Gemini (chat) completion
functions.
This allows the rest of the functionality, like prompt tracing, to
work with the normalized list[ChatMessage] type of chat messages
across providers.
This is analogous to how we enable extended thinking for Claude models
in research mode.
Default to medium effort irrespective of deepthought for OpenAI
reasoning models, as high effort is currently flaky with regular
timeouts and low effort isn't great.
Set env vars to empty if the condition is not met, so that:
- Terrarium (not E2B) is used as the code sandbox on release-triggered evals
- Internet is turned off for the math500 eval