Debanjum 4cad96ded6 Add Script to Evaluate Khoj on Google's FRAMES benchmark (#955)
- Why
We need better, automated evals to measure performance shifts of Khoj
across prompt, model and capability changes.

Google's FRAMES benchmark evaluates multi-step retrieval and reasoning
capabilities of AI agents. It's a good starter benchmark to evaluate Khoj.

- Details
This PR adds an eval script to evaluate Khoj responses to the FRAMES
benchmark prompts against the ground truth provided by the benchmark.

The script lets you configure the sample size and batch size, and can
randomly sample queries from the eval dataset.

Gemini is used as an LLM judge to auto-grade Khoj responses against the
ground truth data from the benchmark.
2024-11-06 17:52:01 -08:00
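The commit message describes an eval loop but does not include the script itself. The following is a minimal sketch of that loop: sample queries, process them in batches, and grade each answer against ground truth with a judge. Everything in it is hypothetical and is not Khoj's actual eval code. The names `Sample`, `sample_queries`, `batched`, `evaluate`, `ask_agent` and `judge` are all stand-ins. The judge is a plain callable here, not a real Gemini API call.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class Sample:
    # Hypothetical record: one benchmark prompt and its ground-truth answer.
    prompt: str
    ground_truth: str


def sample_queries(dataset: Sequence[Sample], sample_size: int, seed: int = 42) -> List[Sample]:
    """Pick up to sample_size prompts without replacement, reproducibly via seed."""
    rng = random.Random(seed)
    return rng.sample(list(dataset), min(sample_size, len(dataset)))


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive, non-empty batches of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def evaluate(
    dataset: Sequence[Sample],
    ask_agent: Callable[[str], str],               # hypothetical: queries the agent under test
    judge: Callable[[str, str, str], bool],        # hypothetical: (prompt, answer, truth) -> correct?
    sample_size: int,
    batch_size: int,
    seed: int = 42,
) -> float:
    """Return the fraction of sampled prompts the judge grades as correct."""
    chosen = sample_queries(dataset, sample_size, seed)
    if not chosen:
        return 0.0
    correct = 0
    for batch in batched(chosen, batch_size):
        # Query the agent concurrently within a batch; batches run one after another.
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            answers = list(pool.map(ask_agent, [s.prompt for s in batch]))
        for sample, answer in zip(batch, answers):
            if judge(sample.prompt, answer, sample.ground_truth):
                correct += 1
    return correct / len(chosen)


if __name__ == "__main__":
    data = [Sample(f"q{i}", f"a{i}") for i in range(10)]
    exact_match = lambda prompt, answer, truth: answer.strip().lower() == truth.strip().lower()
    perfect_agent = lambda prompt: "a" + prompt[1:]
    print(evaluate(data, perfect_agent, exact_match, sample_size=5, batch_size=2))  # 1.0
```

Batching limits how many agent requests are in flight at once, and the fixed seed keeps the sampled subset identical across runs. That matters when comparing scores before and after a prompt or model change. In a real setup, the exact-match judge would be replaced by an LLM judge call.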


Your AI second brain

📑 Docs   •   🌐 Web   •   🔥 App   •   💬 Discord   •   ✍🏽 Blog


Khoj is a personal AI app to extend your capabilities. It smoothly scales up from an on-device personal AI to a cloud-scale enterprise AI.

  • Chat with any local or online LLM (e.g. llama3, qwen, gemma, mistral, gpt, claude, gemini).
  • Get answers from the internet and your docs (including image, pdf, markdown, org-mode, word, notion files).
  • Access it from your browser, Obsidian, Emacs, Desktop, Phone or WhatsApp.
  • Create agents with custom knowledge, persona, chat model and tools to take on any role.
  • Automate away repetitive research. Get personal newsletters and smart notifications delivered to your inbox.
  • Find relevant docs quickly and easily using our advanced semantic search.
  • Generate images, talk out loud, play your messages.
  • Khoj is open-source, self-hostable. Always.
  • Run it privately on your computer or try it on our cloud app.

See it in action


Go to https://app.khoj.dev to see Khoj live.

Full feature list

You can see the full feature list here.

Self-Host

To get started with self-hosting Khoj, read the docs.

Contributors

Cheers to our awesome contributors! 🎉

Made with contrib.rocks.

Interested in Contributing?

We are always looking for contributors to help us build new features, improve the project documentation, or fix bugs. If you're interested, please see our Contributing Guidelines and check out our Contributors Project Board.

License: AGPL-3.0 (repository size: 116 MiB)

Languages: Python 51%, TypeScript 36.1%, CSS 4.1%, HTML 3.2%, Emacs Lisp 2.4%, Other 3.1%