Previously eval run across modes would use different dataset shuffles.
This change enables a strict apples to apples perf comparison of the
different khoj modes across the same (random) subset of questions by
using a dataset seed per workflow run to sample questions
- CI runners don't have GPUs
- Pytorch related Nvidia cuda packages are not required for testing,
evals or pre-commit checks.
- Avoiding these massive downloads should speed up workflow run.
It is recommended to chat with open-source models by running an
open-source server like Ollama, Llama.cpp on your GPU powered machine
or use a commercial provider of open-source models like DeepInfra or
OpenRouter.
These chat model serving options provide a mature Openai compatible
API that already works with Khoj.
Directly using offline chat models only worked reasonably with pip
install on a machine with GPU. Docker setup of khoj had trouble with
accessing GPU. And without GPU access offline chat is too slow.
Deprecating support for an offline chat provider directly from within
Khoj will reduce code complexity and increase developement velocity.
Offline models are subsumed to use existing Openai ai model provider.
It'll work similar to the master branch but with pre-1x and latest-1x
tagged series of docker images.
This should ease deployment changes from 1.x vs 2.x series
The github run_eval workflow sets OPENAI_BASE_URL to empty string.
The ai model api created during initialization for openai models gets
set to empty string rather than None or the actual openai base url
This tries to call llm at to empty string base url instead of the
default openai api base url, which obviously fails.
Fix is to map empty base url's to the actual openai api base url.
- Control auto read webpage via eval workflow. Prefix env var with KHOJ_
Default to false as it is the default that is going to be used in prod
going forward.
- Set openai api key via input param in manual eval workflow runs
- Simplify evaluating other chat models available over openai
compatible api via eval workflow.
- Mask input api key as secret in workflow.
- Discard unnecessary null setting of env vars.
- Control randomization of samples in eval workflow.
If randomization is turned off, it'll take the first SAMPLE_SIZE
items from the eval dataset instead of a random collection of
SAMPLE_SIZE items.
Sets env vars to empty if condition not met so:
- Terrarium (not e2b) used as code sandbox on release triggered eval
- Internet turned off for math500 eval
Reaching >94% in research mode on SimpleQA. When answers can be
researched online, it becomes too easy. And the FRAMES eval does a
more thorough job of evaluating that use-case anyway.
- Specify E2B api key and template to use via env variables
- Try load, use e2b library when E2B api key set
- Fallback to try use terrarium sandbox otherwise
- Enable more python packages in e2b sandbox like rdkit via custom e2b template
- Use Async E2B Sandbox
- Parallelize file IO with sandbox
- Add documentation on how to enable E2B as code sandbox instead of Terrarium
- Print hash in CI to ease verifying ci built python package matches
khoj package published on pypi
- Newer pypi publish github action should speed up workflow by ~30s