Commit Graph

219 Commits

Author SHA1 Message Date
Debanjum
ab29ffd799 Fix web app packaging for pypi since upgrade to python 3.11.12 in CI 2025-04-19 18:03:29 +05:30
Debanjum
c1912f8ca7 Default eval to use 10 iterations for research mode 2025-04-05 10:09:58 +05:30
Debanjum
e9928d3c50 Eval more model, control randomization & auto read webpage via workflow
- Control auto read webpage via eval workflow. Prefix env var with KHOJ_
  Default to false as it is the default that is going to be used in prod
  going forward.

- Set openai api key via input param in manual eval workflow runs
  - Simplify evaluating other chat models available over openai
    compatible api via eval workflow.
  - Mask input api key as secret in workflow.
  - Discard unnecessary null setting of env vars.

- Control randomization of samples in eval workflow.
  If randomization is turned off, it'll take the first SAMPLE_SIZE
  items from the eval dataset instead of a random collection of
  SAMPLE_SIZE items.
2025-04-04 20:11:00 +05:30
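
The sample selection described in the commit above can be sketched as follows. This is a minimal illustration, not the workflow's actual code: the function name `select_samples` and its `randomize` and `seed` parameters are hypothetical, and only `SAMPLE_SIZE` (here `sample_size`) comes from the commit message.

```python
import random


def select_samples(dataset, sample_size, randomize=True, seed=None):
    """Pick sample_size items from dataset (hypothetical helper).

    With randomize on, return a random collection of sample_size items.
    With randomize off, return the first sample_size items instead.
    """
    items = list(dataset)
    # Never ask for more items than the dataset holds.
    size = min(sample_size, len(items))
    if randomize:
        # A seeded generator makes a random run repeatable when seed is given.
        return random.Random(seed).sample(items, size)
    return items[:size]
```

Turning randomization off makes repeated eval runs score the same items, which helps when comparing chat models against each other.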
Debanjum
0dcb2544d7 Use embedded postgres instead of postgres server for eval workflow 2025-04-04 20:11:00 +05:30
Artem Yurchenko
1ef8c37c3a Implement better template for feature request issue (#1132)
This PR implements a new feature request template with a few UX/UI improvements.

Key changes:
- Use of GitHub forms.
- Provide a note for submitters about the rules for submitting feature requests.
- Adds a few handy fields like "Describe the feature" or "Use Case"

Overall, with a template like this feature requests will be more structured and meaningful.
2025-04-03 15:41:04 +05:30
Debanjum
66e9ddb6be Support OpenAI (API compatible) models and Firecrawl in eval workflow 2025-04-03 14:03:29 +05:30
Debanjum
d4b0ef5e93 Fix ability to disable code and internet providers in eval workflow
Sets env vars to empty if condition not met so:
- Terrarium (not e2b) used as code sandbox on release triggered eval
- Internet turned off for math500 eval
2025-03-25 14:04:16 +05:30
Debanjum
6cc5a10b09 Disable SimpleQA eval on release as saturated & low signal for usecase
Reaching >94% in research mode on SimpleQA. When answers can be
researched online, it becomes too easy. And the FRAMES eval does a
more thorough job of evaluating that use-case anyway.
2025-03-22 08:05:12 +05:30
Debanjum
dc473015fe Set default model, sandbox to display in eval workflow summary on release 2025-03-20 14:44:56 +05:30
Artem Yurchenko
a7e261a191 Implement better bug issue template (#1129)
* Implement better bug issue template
* Fix IDs in new bug issue template
* Reduce, reorder and improve field descriptions in the bug issue template

---------

Co-authored-by: Debanjum <debanjum@gmail.com>
2025-03-18 20:53:57 +05:30
Debanjum
931f555cf8 Configure max allowed iterations in research mode via env var 2025-03-18 18:15:50 +05:30
Debanjum
c133d11556 Improvements based on code feedback 2025-03-09 18:23:30 +05:30
Debanjum
94ca458639 Set default chat model to KHOJ_CHAT_MODEL env var if set
Simplify code logic to set default_use_model during init for readability
2025-03-09 18:23:30 +05:30
Debanjum
45fb85f1df Add E2B as an optional code sandbox provider
- Specify E2B api key and template to use via env variables
- Try load, use e2b library when E2B api key set
- Fallback to try use terrarium sandbox otherwise
- Enable more python packages in e2b sandbox like rdkit via custom e2b template

- Use Async E2B Sandbox
- Parallelize file IO with sandbox
- Add documentation on how to enable E2B as code sandbox instead of Terrarium
2025-03-09 18:23:30 +05:30
Debanjum
b4183c7333 Default to gemini 2.0 flash instead of 1.5 flash on Gemini setup
Add price of gemini 2.0 flash for cost calculations
2025-03-07 13:48:15 +05:30
Debanjum
701a7be291 Stop code sandbox on request timeout to allow sandbox process restarts 2025-03-07 13:48:15 +05:30
sabaimran
fd90842d38 Bump postgresql server dev version to 16 for latest ubuntu 2025-01-22 19:07:54 -08:00
sabaimran
8fe08eecce add --break-system-packages to bypass venv requirement 2025-01-20 00:21:27 -08:00
sabaimran
bf58d9430b downgrade postgres server pkg to 16 2025-01-20 00:15:56 -08:00
sabaimran
95ad1f936e upgrade postgres server to 17 2025-01-20 00:10:20 -08:00
sabaimran
a214bd4100 upgrade pg server dev version to 15 2025-01-20 00:05:35 -08:00
sabaimran
82ff74cfa9 Run on container with ubuntu latest for pytest gh action workflow 2025-01-19 23:57:57 -08:00
sabaimran
af9e906cb5 Use python3 instead of python when running pip install commands in gh actions 2025-01-17 17:48:42 -08:00
Debanjum
6bd9f6bb61 Give a shorter, simpler name to github workflow to deploy docs 2025-01-12 10:54:56 +07:00
sabaimran
bac90ad69d Upgrade deploy-pages action to v4 2025-01-09 19:04:31 -08:00
Debanjum
2069f571c8 Upgrade upload-artifact gh action to v4 as <=v3 deprecated
This started failing github workflow jobs
2025-01-10 00:41:24 +07:00
sabaimran
92144c8102 Remove release step in todesktop flow, since we need to run releases manually now
- Leaving it commented out for the time being so we can revisit automating this later
2024-12-17 16:02:45 -08:00
Debanjum
10bd56d2b9 Attest Khoj pypi package by upgrading pypi publish gh action
- Print hash in CI to ease verifying ci built python package matches
  khoj package published on pypi
- Newer pypi publish github action should speed up workflow by ~30s
2024-12-17 13:40:39 -08:00
Debanjum
df15f00243 Tag docker images with latest tag in dockerize workflow on release 2024-12-17 13:18:51 -08:00
sabaimran
f6abfcfa6b Use latest release version for pypi gh action to publish 2024-12-17 12:19:42 -08:00
sabaimran
e74e922cea Update file path of python installation 2024-12-12 16:50:32 -08:00
Debanjum
2db7a1ca6b Restart code sandbox on crash in eval github workflow (#1007)
See e3fed3750b for corresponding change to use pm2 to auto-restart code sandbox
2024-12-12 14:32:03 -08:00
Debanjum
9eb863e964 Restart code sandbox on crash in eval github workflow 2024-12-12 11:28:54 -08:00
Debanjum
59008ae90e Use buildx to create multi platform docker image 2024-12-11 00:21:29 -08:00
Debanjum
ec797bc6b8 Build docker imgs on native arch runners to avoid manifest list error
This also avoids the need to use --amend and annotate steps when
creating the multi-arch docker images
2024-12-10 23:16:36 -08:00
Debanjum
5f7b13df2d Fix new docker tags in workflow to not include forward slashes 2024-12-10 22:55:33 -08:00
Debanjum
ba6237b5c0 Fix to create multi-arch builds. Stop docker image overwrites in workflow 2024-12-10 21:08:17 -08:00
sabaimran
44ede26e67 Temporarily disable cloud arm builds while we disambiguate the build issues 2024-12-10 20:00:59 -08:00
sabaimran
9c403d24e1 Fix reference to directory in the eval workflow for starting terrarium 2024-12-08 13:03:05 -08:00
sabaimran
6940c6379b Add sudo when running installations in order to install relevant packages
add --legacy-peer-deps temporarily to see if it helps mitigate the issue
2024-12-08 11:11:13 -08:00
sabaimran
4c4b7120c6 Use Khoj terrarium fork instead of building from official Cohere repo 2024-12-08 11:06:33 -08:00
sabaimran
2dfd163430 Add more explicit run strategies in the runner matrix 2024-11-28 19:31:34 -08:00
sabaimran
80cd902c86 Since linux/amd64 images aren't being created, try setting a custom description on the image
Refer to this GH documentation on working with multi arch images in the container registry:
https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#adding-a-description-to-multi-arch-images
2024-11-28 19:14:06 -08:00
Debanjum
29e801c381 Add MATH500 dataset to eval
Evaluate simpler MATH500 responses with gemini 1.5 flash

This improves both the speed and cost of running this eval
2024-11-28 12:48:25 -08:00
Debanjum
22aef9bf53 Add GPQA (diamond) dataset to eval 2024-11-28 12:48:25 -08:00
Debanjum
8cb0db0051 Fix llama-cpp-python install by pytest github workflow
- Use pre-built wheels for torch and llama-cpp-python
- Install and link musl as it's used by llama-cpp-python pre-built
  wheel instead of glibc
- Join Install git and Install Dependencies steps in pytest workflow
  to remove unnecessary steps
2024-11-26 02:04:36 -08:00
Debanjum
e088fcbc7b Build for arm64 on arm64 runner. Parallelize arm64, x64 docker builds
- Building arm64 image on an ubuntu arm64 runner reduces `yarn build`
  step time by 75% from 12mins to 3mins.
  - This is because no QEMU emulation for arm64 on x86 is required now
- Parallelizing x64 and arm64 platform builds halves build time on top
  - Revert to use standard ubuntu-latest runner as large x64 runner
    doesn't give much more speed improvements

This results in an effective additional 50%-66% reduction in build time
on top of #987.

So a full dockerize workflow run now takes *10 mins* vs previous 35+mins.
This is a total of *72% improvement* in max dockerize run time.

Get additional speed improvements on a docker layer cache hit.
2024-11-24 23:18:55 -08:00
Debanjum
4a5646c8da Cache docker layers, nextjs builds in dockerize github workflow 2024-11-24 21:06:22 -08:00
Debanjum
9848d89d03 Try build docker images with github high cpu, ram runner 2024-11-23 19:09:36 -08:00
Debanjum
8dd2122817 Set sample size to 200 for automated eval runs as well 2024-11-23 14:48:38 -08:00