klbr/khoj

mirror of https://github.com/khoaliber/khoj.git synced 2026-03-02 21:19:12 +00:00

Author	SHA1	Message	Date
Debanjum	5ef3a3f027	Remove unused eval workflow config to auto read webpage in default mode	2025-09-16 14:55:06 +05:30
Debanjum	703e189979	Deterministically shuffle dataset for consistent data in an eval run. Previously, eval runs across modes used different dataset shuffles. This change enables a strict apples-to-apples performance comparison of the different khoj modes across the same (random) subset of questions by using a dataset seed per workflow run to sample questions.	2025-08-31 23:40:08 -07:00
Debanjum	2823c84bb4	Default to gemini 2.5 model series on init and for eval	2025-08-22 20:34:38 -07:00
Debanjum	c53a70c997	Share debug logs from github eval run for debugging	2025-08-22 19:06:37 -07:00
Debanjum	a494a766a4	Fix eval github workflow and show more logs to debug its startup	2025-08-15 16:26:37 -07:00
Debanjum	f2bd07044e	Speed up github workflows by not installing cuda server dependencies. CI runners don't have GPUs. Pytorch related Nvidia cuda packages are not required for testing, evals or pre-commit checks. Avoiding these massive downloads should speed up workflow runs.	2025-08-01 23:35:08 -07:00
Debanjum	006b958071	Use UV to install server for speed, package locks in dev setup, workflows. It's much faster than pip, includes dependency locks via uv.lock and comes with standard convenience utilities (e.g. pipx, venv replacement).	2025-08-01 00:01:34 -07:00
Debanjum	22cd638add	Fix handling unset openai_base_url to run eval with openai chat models. The github run_eval workflow sets OPENAI_BASE_URL to an empty string. The ai model api created during initialization for openai models then gets set to an empty string rather than None or the actual openai base url. This tries to call the llm at an empty string base url instead of the default openai api base url, which fails. The fix is to map empty base urls to the actual openai api base url.	2025-05-19 16:19:43 -07:00
Debanjum	c1912f8ca7	Default eval to use 10 iterations for research mode	2025-04-05 10:09:58 +05:30
Debanjum	e9928d3c50	Eval more models, control randomization & auto read webpage via workflow. Control auto read webpage via the eval workflow: prefix the env var with KHOJ_ and default to false, as that is the default that will be used in prod going forward. Set the openai api key via an input param in manual eval workflow runs. Simplify evaluating other chat models available over an openai compatible api via the eval workflow. Mask the input api key as a secret in the workflow. Discard unnecessary null setting of env vars. Control randomization of samples in the eval workflow: if randomization is turned off, it takes the first SAMPLE_SIZE items from the eval dataset instead of a random collection of SAMPLE_SIZE items.	2025-04-04 20:11:00 +05:30
Debanjum	0dcb2544d7	Use embedded postgres instead of postgres server for eval workflow	2025-04-04 20:11:00 +05:30
Debanjum	66e9ddb6be	Support OpenAI (API compatible) models and Firecrawl in eval workflow	2025-04-03 14:03:29 +05:30
Debanjum	d4b0ef5e93	Fix ability to disable code and internet providers in eval workflow. Sets env vars to empty if the condition is not met, so that Terrarium (not e2b) is used as the code sandbox on release triggered evals and internet is turned off for the math500 eval.	2025-03-25 14:04:16 +05:30
Debanjum	6cc5a10b09	Disable SimpleQA eval on release as saturated & low signal for usecase. Khoj reaches >94% in research mode on SimpleQA. When answers can be researched online, it becomes too easy, and the FRAMES eval does a more thorough job of evaluating that use case anyway.	2025-03-22 08:05:12 +05:30
Debanjum	dc473015fe	Set default model, sandbox to display in eval workflow summary on release	2025-03-20 14:44:56 +05:30
Debanjum	931f555cf8	Configure max allowed iterations in research mode via env var	2025-03-18 18:15:50 +05:30
Debanjum	c133d11556	Improvements based on code feedback	2025-03-09 18:23:30 +05:30
Debanjum	94ca458639	Set default chat model to KHOJ_CHAT_MODEL env var if set. Simplify code log to set default_use_model during init for readability.	2025-03-09 18:23:30 +05:30
Debanjum	45fb85f1df	Add E2B as an optional code sandbox provider. Specify the E2B api key and template to use via env variables. Try to load and use the e2b library when the E2B api key is set; fall back to the terrarium sandbox otherwise. Enable more python packages in the e2b sandbox, like rdkit, via a custom e2b template. Use the Async E2B Sandbox. Parallelize file IO with the sandbox. Add documentation on how to enable E2B as the code sandbox instead of Terrarium.	2025-03-09 18:23:30 +05:30
Debanjum	b4183c7333	Default to gemini 2.0 flash instead of 1.5 flash on Gemini setup. Add price of gemini 2.0 flash for cost calculations.	2025-03-07 13:48:15 +05:30
Debanjum	701a7be291	Stop code sandbox on request timeout to allow sandbox process restarts	2025-03-07 13:48:15 +05:30
sabaimran	fd90842d38	Bump postgresql server dev version to 16 for latest ubuntu	2025-01-22 19:07:54 -08:00
Debanjum	2069f571c8	Upgrade upload-artifact gh action to v4 as <=v3 is deprecated. This started failing github workflow jobs.	2025-01-10 00:41:24 +07:00
Debanjum	2db7a1ca6b	Restart code sandbox on crash in eval github workflow (#1007). See `e3fed3750b` for the corresponding change to use pm2 to auto-restart the code sandbox.	2024-12-12 14:32:03 -08:00
Debanjum	9eb863e964	Restart code sandbox on crash in eval github workflow	2024-12-12 11:28:54 -08:00
sabaimran	9c403d24e1	Fix reference to directory in the eval workflow for starting terrarium	2024-12-08 13:03:05 -08:00
sabaimran	6940c6379b	Add sudo when running installations in order to install relevant packages. Add --legacy-peer-deps temporarily to see if it helps mitigate the issue.	2024-12-08 11:11:13 -08:00
sabaimran	4c4b7120c6	Use Khoj terrarium fork instead of building from official Cohere repo	2024-12-08 11:06:33 -08:00
Debanjum	29e801c381	Add MATH500 dataset to eval. Evaluate simpler MATH500 responses with gemini 1.5 flash. This improves both the speed and cost of running this eval.	2024-11-28 12:48:25 -08:00
Debanjum	22aef9bf53	Add GPQA (diamond) dataset to eval	2024-11-28 12:48:25 -08:00
Debanjum	8dd2122817	Set sample size to 200 for automated eval runs as well	2024-11-23 14:48:38 -08:00
Debanjum	50d8405981	Enable khoj to use terrarium code sandbox as tool in eval workflow	2024-11-20 14:19:27 -08:00
Debanjum	ffbd0ae3a5	Fix eval github workflow to run on releases, i.e on tags push	2024-11-20 12:57:42 -08:00
Debanjum	a2ccf6f59f	Fix github workflow to start Khoj, connect to PG and upload results. Do not trigger tests to run in CI on updates to evals.	2024-11-18 04:25:15 -08:00
Debanjum	7c0fd71bfd	Add GitHub workflow to quiz Khoj across modes and specified evals (#982). Evaluate khoj on 200 random questions from each of the google frames and openai simpleqa benchmarks across general, default and research modes. Run eval with Gemini 1.5 Flash as the test giver and Gemini 1.5 Pro as the test evaluator model. Trigger the eval workflow on release or manually. Make dataset, khoj mode and sample size configurable when triggered via manual workflow. Enable web search and webpage read tools during evaluation.	2024-11-18 02:19:30 -08:00

35 Commits
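
The sampling behavior described in commits 703e189979 and e9928d3c50 can be illustrated with a minimal Python sketch. The function name `sample_questions` and its parameters are hypothetical and are not taken from the khoj eval script; only the behavior (one seed shared across modes in a run, and the first SAMPLE_SIZE items when randomization is off) comes from the commit messages.

```python
import random


def sample_questions(dataset, sample_size, seed=None, randomize=True):
    """Return up to sample_size items from dataset (hypothetical helper)."""
    items = list(dataset)
    if randomize:
        # A dedicated Random instance makes the shuffle reproducible for a
        # given seed without touching the global random state.
        rng = random.Random(seed)
        rng.shuffle(items)
    # With randomize=False this is simply the first sample_size items.
    return items[:sample_size]


questions = [f"question-{i}" for i in range(1000)]

# Reusing one seed for every mode in a run yields the same subset each time,
# so results across modes can be compared on identical questions.
general_subset = sample_questions(questions, 200, seed=42)
research_subset = sample_questions(questions, 200, seed=42)
print(general_subset == research_subset)
```

Passing the same seed to every mode in a workflow run is what would make the comparison apples-to-apples; a different seed on the next run would select a different subset.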