Commit Graph

4527 Commits

Debanjum
16ffebf765 Document how to configure using AI models via GCP Vertex AI 2025-03-23 16:12:46 +05:30
Debanjum
7153d27528 Cache Google AI API client for reuse 2025-03-23 16:12:46 +05:30
Debanjum
da33c7d83c Support access to Gemini models via GCP Vertex AI 2025-03-23 16:12:46 +05:30
Debanjum
603c4bf2df Support access to Anthropic models via GCP Vertex AI
Enable configuring a Khoj AI model API for Vertex AI using GCP credentials.

Specifically, use the api key & api base url fields of the AI Model API
associated with the current chat model to extract the gcp region, gcp
project id & credentials. These are used to create an AnthropicVertex client.

The api key field should contain the GCP service account keyfile as a
base64 encoded string.

The api base url field should be of the form
`https://{MODEL_GCP_REGION}-aiplatform.googleapis.com/v1/projects/{YOUR_GCP_PROJECT_ID}`

Accepting GCP credentials via the AI model API makes them easy to use
across local and cloud environments, as it bypasses the need for a
separate service account key file on the Khoj server.
2025-03-23 16:12:46 +05:30
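The credential extraction described in the commit above can be sketched roughly as below. This is an illustrative sketch, not Khoj's actual code: the function name, return shape, and the region/project pattern are assumptions inferred from the commit message.

```python
import base64
import json
import re


def parse_vertex_ai_config(api_key: str, api_base_url: str) -> dict:
    """Extract gcp region, project id and service account credentials
    from the AI Model API's api key and api base url fields."""
    match = re.match(
        r"https://(?P<region>[\w-]+)-aiplatform\.googleapis\.com"
        r"/v1/projects/(?P<project>[\w-]+)",
        api_base_url,
    )
    if not match:
        raise ValueError("api base url is not in the expected Vertex AI form")
    # The api key field holds the GCP service account keyfile, base64 encoded
    credentials = json.loads(base64.b64decode(api_key))
    return {
        "region": match["region"],
        "project_id": match["project"],
        "credentials": credentials,
    }


# Placeholder keyfile and url, for illustration only
keyfile_b64 = base64.b64encode(
    json.dumps({"type": "service_account"}).encode()
).decode()
config = parse_vertex_ai_config(
    keyfile_b64,
    "https://us-central1-aiplatform.googleapis.com/v1/projects/my-gcp-project",
)
```

The parsed region, project id, and credentials would then be handed to an AnthropicVertex (or Vertex AI Gemini) client constructor.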
Debanjum
8bebcd5f81 Support longer API key field in DB to store GCP service account keyfile 2025-03-23 14:55:50 +05:30
Debanjum
f2b438145f Upgrade sentence-transformers. Avoid transformers v4.50.0 as problematic
- The 3.4.1 release of sentence-transformers fixes the offline load latency
  of sentence transformer models (and Khoj) by avoiding a call to HF

- The 4.50.0 release of transformers results in a
  jax error (unexpected keyword argument 'flatten_with_keys') on load.
2025-03-23 09:02:57 +05:30
Debanjum
510cbed61c Make google auth package dependency explicit to simplify code
Previously, the google auth library was explicitly installed only for the
cloud variant of Khoj to minimize the packages installed for
non-production use cases.

But it was being implicitly installed anyway, as a dependency of an
explicitly installed package in the default installation.

Making the dependency on the google auth package explicit simplifies
the conditional import of google auth in code while not incurring any
additional cost in space or complexity.
2025-03-23 09:02:57 +05:30
Debanjum
5fff05add3 Set seed for Google Gemini models using KHOJ_LLM_SEED env variable
This env var was already being used to set the seed for OpenAI and
offline models
2025-03-22 08:59:31 +05:30
Debanjum
6cc5a10b09 Disable SimpleQA eval on release as saturated & low signal for usecase
Khoj reaches >94% on SimpleQA in research mode. When answers can be
researched online, the eval becomes too easy. And the FRAMES eval does a
more thorough job of evaluating that use case anyway.
2025-03-22 08:05:12 +05:30
Debanjum
45015dae27 Limit to json enforcement via json object with DeepInfra hosted models
DeepInfra based models do not seem to support json schema. See
https://deepinfra.com/docs/advanced/json_mode for reference
2025-03-22 08:04:09 +05:30
Debanjum
dc473015fe Set default model, sandbox to display in eval workflow summary on release 2025-03-20 14:44:56 +05:30
Debanjum
80d864ada7 Release Khoj version 1.37.0 2025-03-20 14:06:57 +05:30
Debanjum
0c53106b30 Fix passing inline images to vision models
- Fix regression: Inline images were not getting passed to the AI
  models since #992
- Format inline images passed to Gemini models correctly
- Format inline images passed to Anthropic models correctly

Verified vision working with inline and url images for OpenAI,
Anthropic and Gemini models.

Resolves #1112
2025-03-20 13:22:46 +05:30
Debanjum
1ce1d2f5ab Deduplicate, clean code for S3 images uploads 2025-03-20 12:30:07 +05:30
Debanjum
f15a95dccf Show Khoj agent in agent dropdown by default on mobile in web app home
Previously, on a slow connection, you'd see the agent dropdown flicker
from undefined to the default Khoj agent on phones and other thin screens.

This is unnecessary and jarring. Populate the dropdown with the default
agent to remove this issue
2025-03-20 12:27:52 +05:30
Debanjum
9a0b126f12 Allow chat input on web app while Khoj responds to speed interactions
Previously the chat input area didn't allow typing text while Khoj was
researching and generating a response.

This change allows the user to type their next message while Khoj
responds. This should speed up interaction cycles, as the user can have
their next query ready to send when Khoj finishes its response.
2025-03-19 23:08:22 +05:30
Debanjum
e68428dd24 Support enforcing json schema in supported AI model APIs (#1133)
- Trigger
  Gemini 2.0 Flash doesn't always follow JSON schema in research prompt

- Details
  - Use json schema to enforce generate online queries format
  - Use json schema to enforce research mode tool pick format
  - Support constraining Gemini model output to specified response schema
  - Support constraining OpenAI model output to specified response schema
  - Only enforce json output in supported AI model APIs
  - Simplify OpenAI reasoning model specific arguments to OpenAI API
2025-03-19 22:59:23 +05:30
Debanjum
a5627ef787 Use json schema to enforce generate online queries format 2025-03-19 22:32:53 +05:30
Debanjum
2c53eb9de1 Use json schema to enforce research mode tool pick format 2025-03-19 22:32:53 +05:30
Debanjum
6980014838 Support constraining Gemini model output to specified response schema
If the response_schema argument is passed to
send_message_to_model_wrapper, it is used to constrain the output of
Gemini models
2025-03-19 22:32:53 +05:30
Debanjum
ac4b36b9fd Support constraining OpenAI model output to specified response schema 2025-03-19 22:32:52 +05:30
Debanjum
4a4d225455 Only enforce json output in supported AI model APIs
Deepseek reasoner does not support json object or schema via the Deepseek API
The Azure AI API does not support json schema

Resolves #1126
2025-03-19 22:32:11 +05:30
Debanjum
d74c3a1db4 Simplify OpenAI reasoning model specific arguments to OpenAI API
Previously, OpenAI reasoning models didn't support stream_options and
response_format.

Add a reasoning_effort arg for calls to OpenAI reasoning models via the API.
Right now it defaults to medium but can be changed to low or high
2025-03-19 21:12:02 +05:30
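The argument selection above could be sketched as below. The reasoning_effort values (low/medium/high) and the stream_options/response_format caveat come from the commit; the model-prefix heuristic and helper name are assumptions for illustration.

```python
def openai_chat_args(model: str, reasoning_effort: str = "medium") -> dict:
    """Build model-specific keyword arguments for an OpenAI chat call."""
    args: dict = {"model": model}
    if model.startswith(("o1", "o3")):
        # Reasoning models accept a reasoning_effort of low, medium, or high,
        # but not stream_options or response_format
        args["reasoning_effort"] = reasoning_effort
    else:
        args["stream_options"] = {"include_usage": True}
    return args


reasoning_args = openai_chat_args("o1")
standard_args = openai_chat_args("gpt-4o")
```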
Debanjum
9b6d626a09 Fix to store e2b code execution text output file content as string
Previously, E2B code execution text output content was encoded as b64.

This was breaking
- The AI model's ability to see the content of the file
- Downloading the output text file with appropriately encoded content

Issue introduced when adding the E2B code sandbox in #1120
2025-03-19 20:09:41 +05:30
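The fix amounts to encoding text and binary outputs differently, roughly as sketched below. The function name and signature are illustrative, not the actual Khoj helper.

```python
import base64


def encode_output_file(raw: bytes, is_text: bool) -> str:
    """Store text outputs as plain utf-8 strings so the AI model can read
    them directly; only binary outputs need base64 encoding."""
    if is_text:
        return raw.decode("utf-8")
    return base64.b64encode(raw).decode("ascii")


text_content = encode_output_file(b"total: 42\n", is_text=True)
binary_content = encode_output_file(b"\x89PNG", is_text=False)
```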
Artem Yurchenko
a7e261a191 Implement better bug issue template (#1129)
* Implement better bug issue template
* Fix IDs in new bug issue template
* Reduce, reorder and improve field descriptions in the bug issue template

---------

Co-authored-by: Debanjum <debanjum@gmail.com>
2025-03-18 20:53:57 +05:30
Debanjum
931f555cf8 Configure max allowed iterations in research mode via env var 2025-03-18 18:15:50 +05:30
Debanjum
2ab8e711d3 Fix Gemini models to output valid json when configured 2025-03-18 17:02:45 +05:30
sabaimran
ce60cb9779 Remove max-w 80vw, which was smushing AI responses 2025-03-13 13:44:05 -07:00
sabaimran
a3c4347c11 Add a one-click action to export all conversations. Add a self-service delete account action to the settings page 2025-03-12 23:54:02 -07:00
Debanjum
79816d2b9b Upgrade package dependencies of server, clients and docs 2025-03-12 00:22:08 +05:30
Debanjum
7bb6facdea Add support for Google Imagen AI models for image generation
Use the new Google GenAI client to generate images with Imagen
2025-03-11 23:39:46 +05:30
Debanjum
bd06fcd9be Stop using old google generativeai package to raise, catch exceptions 2025-03-11 23:39:46 +05:30
Debanjum
bdfa6400ef Upgrade to new Gemini package to interface with Google AI 2025-03-11 22:18:07 +05:30
Debanjum
2790ba3121 Update default temperature for calls to Gemini models to 0.6 from 0.2
This aligns with the default temperature used by Google AI Studio and
may reduce loops and repetitions
2025-03-11 21:28:04 +05:30
Debanjum
50f71be03d Support Claude 3.7 and use its extended thinking in research mode
Claude 3.7 Sonnet is Anthropic's first reasoning model. It provides a
single model/api capable of standard and extended thinking. Utilize
the extended thinking in Khoj's research mode.

Increase default max output tokens to 8K for Anthropic models.
2025-03-11 21:27:59 +05:30
Debanjum
69048a859f Fix E2B tool description prompt to mention plotly package available 2025-03-11 02:20:06 +05:30
Debanjum
9751adb1a2 Improve Code Tool, Sandbox and Eval (#1120)
# Improve Code Tool, Sandbox
  - Improve code gen chat actor to output code in inline md code blocks
  - Stop code sandbox on request timeout to allow sandbox process restarts
  - Use tenacity retry decorator to retry executing code in sandbox
  - Add retry logic to code execution and add health check to sandbox container
  - Add E2B as an optional code sandbox provider

# Improve Gemini Chat Models
  - Default to non-zero temperature for all queries to Gemini models
  - Default to Gemini 2.0 flash instead of 1.5 flash on setup
  - Set default chat model to KHOJ_CHAT_MODEL env var if set
2025-03-09 18:49:59 +05:30
Debanjum
c133d11556 Improvements based on code feedback 2025-03-09 18:23:30 +05:30
Debanjum
94ca458639 Set default chat model to KHOJ_CHAT_MODEL env var if set
Simplify code logic to set default_use_model during init for readability
2025-03-09 18:23:30 +05:30
Debanjum
7b2d0fdddc Improve code gen chat actor to output code in inline md code blocks
Simplify the code gen chat actor to improve correct code gen success,
especially for smaller models & models with limited json mode support

Allow specifying code blocks inline with reasoning to try to improve
code quality

Infer input files based on user file paths referenced in code.
Debanjum
8305fddb14 Default to non-zero temperature for all queries to Gemini models.
This may mitigate the intermittent invalid json output issues. The model
may be going into repetition loops; a non-zero temperature may avoid that.
2025-03-09 18:23:30 +05:30
Debanjum
45fb85f1df Add E2B as an optional code sandbox provider
- Specify the E2B api key and template to use via env variables
- Try to load and use the e2b library when the E2B api key is set
- Fall back to the Terrarium sandbox otherwise
- Enable more python packages in the e2b sandbox, like rdkit, via a custom e2b template

- Use an async E2B sandbox
- Parallelize file IO with the sandbox
- Add documentation on how to enable E2B as the code sandbox instead of Terrarium
2025-03-09 18:23:30 +05:30
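The fallback behavior above can be sketched as below. The E2B_API_KEY env var name and the helper are assumptions for illustration; the commit only says the api key is read from an env variable.

```python
import os


def select_code_sandbox() -> str:
    """Use the E2B sandbox when its api key is configured, else fall back
    to the Terrarium sandbox."""
    if os.environ.get("E2B_API_KEY"):
        return "e2b"
    return "terrarium"


os.environ.pop("E2B_API_KEY", None)
without_key = select_code_sandbox()

os.environ["E2B_API_KEY"] = "dummy-key"  # hypothetical key for illustration
with_key = select_code_sandbox()
```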
Debanjum
b4183c7333 Default to gemini 2.0 flash instead of 1.5 flash on Gemini setup
Add price of gemini 2.0 flash for cost calculations
2025-03-07 13:48:15 +05:30
Debanjum
701a7be291 Stop code sandbox on request timeout to allow sandbox process restarts 2025-03-07 13:48:15 +05:30
Debanjum
ecc2f79571 Use tenacity retry decorator to retry executing code in sandbox 2025-03-07 13:48:15 +05:30
sabaimran
4a28714a08 Add retry logic to code execution and add health check to sandbox container 2025-03-07 13:48:15 +05:30
Debanjum
f13bdc5135 Log eval run progress percentage for orientation 2025-03-07 13:48:15 +05:30
Debanjum
bbe1b63361 Improve Obsidian Sync for Large Vaults (#1078)
- Batch sync files by size to try not to exceed API request payload size limits
- Fix force sync of large vaults from Obsidian
- Add API endpoint to delete all indexed files by file type
- Fix to also delete file objects when calling the DELETE content source API
2025-03-07 13:47:21 +05:30
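The size-based batching in the first bullet can be sketched as a greedy grouping, roughly as below. The function name, signature, and limit value are illustrative assumptions, not the actual Obsidian plugin code.

```python
def batch_files_by_size(
    files: dict[str, str], max_batch_bytes: int
) -> list[dict[str, str]]:
    """Greedily group files so no batch exceeds the payload size limit."""
    batches: list[dict[str, str]] = []
    current: dict[str, str] = {}
    current_size = 0
    for path, content in files.items():
        size = len(content.encode("utf-8"))
        # Flush the current batch when adding this file would overflow it
        if current and current_size + size > max_batch_bytes:
            batches.append(current)
            current, current_size = {}, 0
        current[path] = content
        current_size += size
    if current:
        batches.append(current)
    return batches


# Toy example with a 10-byte limit; real limits would be megabytes
files = {"a.md": "x" * 6, "b.md": "y" * 6, "c.md": "z" * 3}
batches = batch_files_by_size(files, max_batch_bytes=10)
```

A file larger than the limit still gets its own single-file batch rather than being dropped.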
Debanjum
043de068ff Fix force sync of large vaults from Obsidian
Previously, if you tried to force sync a vault with more than 1000
files, only the last batch would be kept because the PUT API
call would delete all previous entries.

This change calls DELETE for all previously indexed data first, followed by
a PATCH to index the current vault on a force sync (regenerate) request.
This ensures that files from previous batches are not deleted.
2025-03-07 13:34:48 +05:30
Debanjum
86fa528a73 Add API endpoint to delete all indexed files by file type 2025-03-07 13:28:53 +05:30