Commit Graph

218 Commits

Author SHA1 Message Date
Debanjum
f0513cbbb1 Fix to run new automation api tests in ci 2025-07-08 23:27:09 -07:00
Debanjum
58f44ad43b Make release/1.x a privileged branch to run workflows, create releases
It'll work similar to the master branch but with pre-1x and latest-1x
tagged series of docker images.

This should ease deployment changes from 1.x vs 2.x series
2025-07-06 09:28:16 -07:00
Debanjum
e87be4edf4 Pin python version used by github workflow to publish to pypi
Avoids having to update python path to write web app static build
files to everytime patch version of python is updated
2025-06-11 13:30:15 -07:00
Debanjum
ddf028f7af Fix khoj computer image name used in docker-compose.yml instead 2025-06-01 16:44:28 -07:00
Debanjum
c6cc709f62 Fix khoj computer image name and only build it once for each arch 2025-06-01 16:36:46 -07:00
Debanjum
ceb1d82bf6 Create khoj computer via cloud build. Add computer to docker-compose.yml 2025-05-31 21:39:38 -07:00
Debanjum
22cd638add Fix handling unset openai_base_url to run eval with openai chat models
The github run_eval workflow sets OPENAI_BASE_URL to empty string.

The ai model api created during initialization for openai models gets
set to empty string rather than None or the actual openai base url

This tries to call llm at to empty string base url instead of the
default openai api base url, which obviously fails.

Fix is to map empty base url's to the actual openai api base url.
2025-05-19 16:19:43 -07:00
Debanjum
ab29ffd799 Fix web app packaging for pypi since upgrade to python 3.11.12 in CI 2025-04-19 18:03:29 +05:30
Debanjum
c1912f8ca7 Default eval to use 10 iterations for research mode 2025-04-05 10:09:58 +05:30
Debanjum
e9928d3c50 Eval more model, control randomization & auto read webpage via workflow
- Control auto read webpage via eval workflow. Prefix env var with KHOJ_
  Default to false as it is the default that is going to be used in prod
  going forward.

- Set openai api key via input param in manual eval workflow runs
  - Simplify evaluating other chat models available over openai
    compatible api via eval workflow.
  - Mask input api key as secret in workflow.
  - Discard unnecessary null setting of env vars.

- Control randomization of samples in eval workflow.
  If randomization is turned off, it'll take the first SAMPLE_SIZE
  items from the eval dataset instead of a random collection of
  SAMPLE_SIZE items.
2025-04-04 20:11:00 +05:30
Debanjum
0dcb2544d7 Use embedded postgres instead of postgres server for eval workflow 2025-04-04 20:11:00 +05:30
Debanjum
66e9ddb6be Support OpenAI (API compatible) models and Firecrawl in eval workflow 2025-04-03 14:03:29 +05:30
Debanjum
d4b0ef5e93 Fix ability to disable code and internet providers in eval workflow
Sets env vars to empty if condition not met so:
- Terrarium (not e2b) used as code sandbox on release triggered eval
- Internet turned off for math500 eval
2025-03-25 14:04:16 +05:30
Debanjum
6cc5a10b09 Disable SimpleQA eval on release as saturated & low signal for usecase
Reaching >94% in research mode on SimpleQA. When answers can be
researched online, it becomes too easy. And the FRAMES eval does a
more thorough job of evaluating that use-case anyway.
2025-03-22 08:05:12 +05:30
Debanjum
dc473015fe Set default model, sandbox to display in eval workflow summary on release 2025-03-20 14:44:56 +05:30
Debanjum
931f555cf8 Configure max allowed iterations in research mode via env var 2025-03-18 18:15:50 +05:30
Debanjum
c133d11556 Improvements based on code feedback 2025-03-09 18:23:30 +05:30
Debanjum
94ca458639 Set default chat model to KHOJ_CHAT_MODEL env var if set
Simplify code log to set default_use_model during init for readability
2025-03-09 18:23:30 +05:30
Debanjum
45fb85f1df Add E2B as an optional code sandbox provider
- Specify E2B api key and template to use via env variables
- Try load, use e2b library when E2B api key set
- Fallback to try use terrarium sandbox otherwise
- Enable more python packages in e2b sandbox like rdkit via custom e2b template

- Use Async E2B Sandbox
- Parallelize file IO with sandbox
- Add documentation on how to enable E2B as code sandbox instead of Terrarium
2025-03-09 18:23:30 +05:30
Debanjum
b4183c7333 Default to gemini 2.0 flash instead of 1.5 flash on Gemini setup
Add price of gemini 2.0 flash for cost calculations
2025-03-07 13:48:15 +05:30
Debanjum
701a7be291 Stop code sandbox on request timeout to allow sandbox process restarts 2025-03-07 13:48:15 +05:30
sabaimran
fd90842d38 Bump postgresql server dev version to 16 for latest ubuntu 2025-01-22 19:07:54 -08:00
sabaimran
8fe08eecce add --break-system-packages to bypass venv requirement 2025-01-20 00:21:27 -08:00
sabaimran
bf58d9430b downgrade postgres server pkg to 16 2025-01-20 00:15:56 -08:00
sabaimran
95ad1f936e upgrade postgres server to 17 2025-01-20 00:10:20 -08:00
sabaimran
a214bd4100 upgrade pg server dev version to 15 2025-01-20 00:05:35 -08:00
sabaimran
82ff74cfa9 Run on container with ubuntu latest for pytest gh action workflow 2025-01-19 23:57:57 -08:00
sabaimran
af9e906cb5 Use python3 instead of python when running pip install commands in gh actions 2025-01-17 17:48:42 -08:00
Debanjum
6bd9f6bb61 Give a shorter, simpler name to github workflow to deploy docs 2025-01-12 10:54:56 +07:00
sabaimran
bac90ad69d Upgrade deploy-pages action to vv4 2025-01-09 19:04:31 -08:00
Debanjum
2069f571c8 Upgrade upload-artifact gh action to v4 as <=v3 deprecated
This started failing github workflow jobs
2025-01-10 00:41:24 +07:00
sabaimran
92144c8102 Remove release step in todesktop flow, since we need to run releases manually now
- Leaving it commented out for the time being so we can revisit automating this later
2024-12-17 16:02:45 -08:00
Debanjum
10bd56d2b9 Attest Khoj pypi package by upgrading pypi publish gh action
- Print hash in CI to ease verifying ci built python package matches
  khoj package published on pypi
- Newer pypi publish github action should speed up workflow by ~30s
2024-12-17 13:40:39 -08:00
Debanjum
df15f00243 Tag docker images with latest tag in dockerize workflow on release 2024-12-17 13:18:51 -08:00
sabaimran
f6abfcfa6b Use latest release version for pypi gh action to publish 2024-12-17 12:19:42 -08:00
sabaimran
e74e922cea Update file path of python installation 2024-12-12 16:50:32 -08:00
Debanjum
2db7a1ca6b Restart code sandbox on crash in eval github workflow (#1007)
See
e3fed3750b
for corresponding change to use pm2 to auto-restart code sandbox
2024-12-12 14:32:03 -08:00
Debanjum
9eb863e964 Restart code sandbox on crash in eval github workflow 2024-12-12 11:28:54 -08:00
Debanjum
59008ae90e Use buildx to create multi platform docker image 2024-12-11 00:21:29 -08:00
Debanjum
ec797bc6b8 Build docker imgs on native arch runners to avoid manifest list error
This also avoids the need to use --amend and annotate steps when
creating the multi-arch docker images
2024-12-10 23:16:36 -08:00
Debanjum
5f7b13df2d Fix new docker tags in workflow to not include forward slashes 2024-12-10 22:55:33 -08:00
Debanjum
ba6237b5c0 Fix to create multi-arch builds. Stop docker image overwrites in workflow 2024-12-10 21:08:17 -08:00
sabaimran
44ede26e67 Temporarily disable cloud arm builds while we disambiguate the build issues 2024-12-10 20:00:59 -08:00
sabaimran
9c403d24e1 Fix reference to directory in the eval workflow for starting terrarium 2024-12-08 13:03:05 -08:00
sabaimran
6940c6379b Add sudo when running installations in order to install relevant packages
add --legacy-peer-deps temporarily to see if it helps mitigate the issue
2024-12-08 11:11:13 -08:00
sabaimran
4c4b7120c6 Use Khoj terrarium fork instead of building from official Cohere repo 2024-12-08 11:06:33 -08:00
sabaimran
2dfd163430 Add more explicity run strategies in the runner matrix 2024-11-28 19:31:34 -08:00
sabaimran
80cd902c86 Since linux/amd64 images aren't being created, try setting a custom description on the image
Refer to this GH documentation on working with multi arch images in the container registry:
https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#adding-a-description-to-multi-arch-images
2024-11-28 19:14:06 -08:00
Debanjum
29e801c381 Add MATH500 dataset to eval
Evaluate simpler MATH500 responses with gemini 1.5 flash

This improves both the speed and cost of running this eval
2024-11-28 12:48:25 -08:00
Debanjum
22aef9bf53 Add GPQA (diamond) dataset to eval 2024-11-28 12:48:25 -08:00