klbr/khoj

mirror of https://github.com/khoaliber/khoj.git synced 2026-03-02 21:19:12 +00:00

Author	SHA1	Message	Date
Debanjum	6cc5a10b09	Disable SimpleQA eval on release as saturated & low signal for use case. Reaching >94% in research mode on SimpleQA. When answers can be researched online, it becomes too easy. And the FRAMES eval does a more thorough job of evaluating that use case anyway.	2025-03-22 08:05:12 +05:30
Debanjum	dc473015fe	Set default model, sandbox to display in eval workflow summary on release	2025-03-20 14:44:56 +05:30
Debanjum	931f555cf8	Configure max allowed iterations in research mode via env var	2025-03-18 18:15:50 +05:30
Debanjum	c133d11556	Improvements based on code feedback	2025-03-09 18:23:30 +05:30
Debanjum	94ca458639	Set default chat model to KHOJ_CHAT_MODEL env var if set. Simplify code log to set default_use_model during init for readability	2025-03-09 18:23:30 +05:30
Debanjum	45fb85f1df	Add E2B as an optional code sandbox provider - Specify E2B api key and template to use via env variables - Try load, use e2b library when E2B api key set - Fallback to try use terrarium sandbox otherwise - Enable more python packages in e2b sandbox like rdkit via custom e2b template - Use Async E2B Sandbox - Parallelize file IO with sandbox - Add documentation on how to enable E2B as code sandbox instead of Terrarium	2025-03-09 18:23:30 +05:30
Debanjum	b4183c7333	Default to gemini 2.0 flash instead of 1.5 flash on Gemini setup. Add price of gemini 2.0 flash for cost calculations	2025-03-07 13:48:15 +05:30
Debanjum	701a7be291	Stop code sandbox on request timeout to allow sandbox process restarts	2025-03-07 13:48:15 +05:30
sabaimran	fd90842d38	Bump postgresql server dev version to 16 for latest ubuntu	2025-01-22 19:07:54 -08:00
Debanjum	2069f571c8	Upgrade upload-artifact gh action to v4 as <=v3 deprecated. This started failing github workflow jobs	2025-01-10 00:41:24 +07:00
Debanjum	2db7a1ca6b	Restart code sandbox on crash in eval github workflow (#1007). See `e3fed3750b` for corresponding change to use pm2 to auto-restart code sandbox	2024-12-12 14:32:03 -08:00
Debanjum	9eb863e964	Restart code sandbox on crash in eval github workflow	2024-12-12 11:28:54 -08:00
sabaimran	9c403d24e1	Fix reference to directory in the eval workflow for starting terrarium	2024-12-08 13:03:05 -08:00
sabaimran	6940c6379b	Add sudo when running installations in order to install relevant packages. Add --legacy-peer-deps temporarily to see if it helps mitigate the issue	2024-12-08 11:11:13 -08:00
sabaimran	4c4b7120c6	Use Khoj terrarium fork instead of building from official Cohere repo	2024-12-08 11:06:33 -08:00
Debanjum	29e801c381	Add MATH500 dataset to eval. Evaluate simpler MATH500 responses with gemini 1.5 flash. This improves both the speed and cost of running this eval	2024-11-28 12:48:25 -08:00
Debanjum	22aef9bf53	Add GPQA (diamond) dataset to eval	2024-11-28 12:48:25 -08:00
Debanjum	8dd2122817	Set sample size to 200 for automated eval runs as well	2024-11-23 14:48:38 -08:00
Debanjum	50d8405981	Enable khoj to use terrarium code sandbox as tool in eval workflow	2024-11-20 14:19:27 -08:00
Debanjum	ffbd0ae3a5	Fix eval github workflow to run on releases, i.e on tags push	2024-11-20 12:57:42 -08:00
Debanjum	a2ccf6f59f	Fix github workflow to start Khoj, connect to PG and upload results - Do not trigger tests to run in ci on update to evals	2024-11-18 04:25:15 -08:00
Debanjum	7c0fd71bfd	Add GitHub workflow to quiz Khoj across modes and specified evals (#982) - Evaluate khoj on random 200 questions from each of google frames and openai simpleqa benchmarks across general, default and research modes - Run eval with Gemini 1.5 Flash as test giver and Gemini 1.5 Pro as test evaluator models - Trigger eval workflow on release or manually - Make dataset, khoj mode and sample size configurable when triggered via manual workflow - Enable Web search, webpage read tools during evaluation	2024-11-18 02:19:30 -08:00

22 Commits
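
Several of the commits above (45fb85f1df, 94ca458639, 931f555cf8) move configuration into environment variables. The following is a minimal sketch of that pattern, not the code from those commits. Only `KHOJ_CHAT_MODEL` is named in the log; `E2B_API_KEY`, `KHOJ_RESEARCH_ITERATIONS`, the default of 5 iterations, and all function names are hypothetical placeholders chosen for illustration.

```python
import os
from typing import Mapping


def pick_sandbox(env: Mapping[str, str]) -> str:
    """Prefer E2B when an API key is present, else fall back to Terrarium.

    E2B_API_KEY is an assumed variable name, not confirmed by the log.
    """
    return "e2b" if env.get("E2B_API_KEY") else "terrarium"


def max_research_iterations(env: Mapping[str, str], default: int = 5) -> int:
    """Read an iteration cap from the environment.

    KHOJ_RESEARCH_ITERATIONS and the default of 5 are hypothetical.
    Missing, non-numeric, or non-positive values fall back to the default.
    """
    raw = env.get("KHOJ_RESEARCH_ITERATIONS", "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def default_chat_model(env: Mapping[str, str], fallback: str) -> str:
    """Use KHOJ_CHAT_MODEL when set and non-empty, else the given fallback."""
    return env.get("KHOJ_CHAT_MODEL") or fallback


if __name__ == "__main__":
    # Resolve all three settings from the real process environment.
    print(pick_sandbox(os.environ))
    print(max_research_iterations(os.environ))
    print(default_chat_model(os.environ, "fallback-model"))
```

Passing the environment in as a mapping, rather than reading `os.environ` inside each function, keeps the helpers easy to test with a plain dictionary.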