mirror of
https://github.com/khoaliber/khoj.git
synced 2026-03-02 21:19:12 +00:00
f4bde75249607fc5b68bf3609339f14656e01eb2
- Previously:
The text the model was trained on was being used to
re-create a semblance of the original org-mode entry.
- Now:
- Store raw entry as another key:value in each entry json too
Only return actual raw org entries in results
But create embeddings like before
- Also add link to entry in file:<filename>::<line_number> form
in property drawer of returned results
This can be used to jump to actual entry in it's original file
Semantic Search
Allow natural language search on user content like notes, images, transactions using transformer based models
All data is processed locally. User can interface with semantic-search app via Emacs, API or Commandline
Dependencies
- Python3
- Miniconda
Install
git clone https://github.com/debanjum/semantic-search && cd semantic-search
conda env create -f environment.yml
conda activate semantic-search
Run
Load ML model, generate embeddings and expose API to query specified org-mode files
python3 src/main.py -c=sample_config.yml --verbose
Use
-
Semantic Search via Emacs
- Install semantic-search.el
- Run
M-x semantic-search <user-query>or CallC-c s
-
Semantic Search via API
- Query:
GEThttp://localhost:8000/search?q="What is the meaning of life" - Regenerate Embeddings:
GEThttp://localhost:8000/regenerate - Semantic Search API Docs
- Query:
Upgrade
cd semantic-search
git pull origin master
conda env update -f environment.yml
conda activate semantic-search
Acknowledgments
- MiniLM Model for Asymmetric Text Search. See SBert Documentation
- OpenAI CLIP Model for Image Search. See SBert Documentation
- Charles Cave for OrgNode Parser
Languages
Python
51%
TypeScript
36.1%
CSS
4.1%
HTML
3.2%
Emacs Lisp
2.4%
Other
3.1%