Mirror of https://github.com/khoaliber/khoj.git (synced 2026-03-02 21:19:12 +00:00)
- The multi-qa-MiniLM-L6-cos-v1 model is more extensively benchmarked [1]
- It has the right mix of model query speed, size and performance on benchmarks
- On Hugging Face it has far more downloads and likes than the msmarco model [2]
- On very preliminary evaluation of the model
  - It doubles the encoding speed of all entries (encoding time down from ~8 min to 4 min)
  - It gave more entries that stay relevant to the query (3/5 vs 1/5 earlier)

[1]: https://www.sbert.net/docs/pretrained_models.html
[2]: https://huggingface.co/sentence-transformers
Semantic Search
Allows natural language search over user content, such as notes and images, using transformer-based models.
All data is processed locally. Users can interface with the semantic-search app via Emacs, the API or the command line.
Dependencies
- Python3
- Miniconda
Install
git clone https://github.com/debanjum/semantic-search && cd semantic-search
conda env create -f environment.yml
conda activate semantic-search
Run
Load the ML model, generate embeddings and expose an API to query the specified org-mode files
python3 main.py --input-files ~/Notes/Schedule.org ~/Notes/Incoming.org --verbose
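To illustrate what querying embeddings involves, here is a minimal sketch of ranking entries against a query by cosine similarity. It is not the project's implementation: the function names `cosine_similarity` and `rank_entries` are hypothetical, the short hand-written vectors stand in for real model embeddings, and cosine similarity is assumed as the scoring function because the model name (multi-qa-MiniLM-L6-cos-v1) suggests it.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_entries(query_embedding, entry_embeddings, results_count=5):
    """Return (entry_index, score) pairs for the best-matching entries."""
    scored = [
        (index, cosine_similarity(query_embedding, embedding))
        for index, embedding in enumerate(entry_embeddings)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:results_count]


# Toy 3-dimensional "embeddings" (assumption: real ones come from the model).
entries = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query = [1.0, 0.1, 0.0]
top = rank_entries(query, entries, results_count=2)
```

Here `top` holds the two entries whose vectors point most nearly in the same direction as the query vector. In the real app the vectors would be produced by the transformer model rather than written by hand.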
Use
- Semantic Search via Emacs
  - Install semantic-search.el
  - Run M-x semantic-search <user-query> or call C-c C-s
- Semantic Search via API
  - Query: GET http://localhost:8000/search?q="What is the meaning of life"
  - Regenerate Embeddings: GET http://localhost:8000/regenerate
  - Semantic Search API Docs
- Call Semantic Search via Python Script Directly
  python3 search_types/asymmetric.py \
    --compressed-jsonl .notes.jsonl.gz \
    --embeddings .notes_embeddings.pt \
    --results-count 5 \
    --verbose \
    --interactive
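The API endpoints listed above can also be called from Python. The following is a minimal sketch using only the standard library. The helper names `build_search_url`, `search` and `regenerate` are hypothetical, and since the response format is not described here, the raw response body is returned as text. The sketch sends the query text URL-encoded and without the surrounding quotes shown in the example URL, which is an assumption about what the server accepts.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Host and port as shown in the API examples above.
BASE_URL = "http://localhost:8000"


def build_search_url(query, base_url=BASE_URL):
    """Build the /search URL with the query safely URL-encoded."""
    return f"{base_url}/search?{urlencode({'q': query})}"


def search(query, base_url=BASE_URL, timeout=10):
    """Send a GET request to /search and return the raw response body."""
    with urlopen(build_search_url(query, base_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


def regenerate(base_url=BASE_URL, timeout=600):
    """Send a GET request to /regenerate and return the raw response body."""
    with urlopen(f"{base_url}/regenerate", timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `search("What is the meaning of life")` would return whatever the API responds with. The generous timeout on `regenerate` is a guess, chosen because re-encoding all entries can take minutes.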
Acknowledgments
- MiniLM Model for Asymmetric Text Search. See SBert Documentation
- OpenAI CLIP Model for Image Search. See SBert Documentation
- Charles Cave for OrgNode Parser