mirror of
https://github.com/khoaliber/khoj.git
synced 2026-03-02 13:18:18 +00:00
Semantic Search
Provides natural language search over a user's personal content, such as notes and images, using ML models.
All data is processed locally. Users can interact with the semantic-search app via Emacs, the API, or the command line.
Dependencies
- Python3
- Miniconda
Install
git clone https://github.com/debanjum/semantic-search && cd semantic-search
conda env create -f environment.yml
conda activate semantic-search
Setup
Generate compressed JSONL from specified org-mode files
python3 processor/org-mode/org-to-jsonl.py \
--org-files "Schedule.org" "Incoming.org" \
--org-directory "~/Notes" \
--jsonl-file ".notes.jsonl" \
--compress \
--verbose
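To picture the output of this step: compressed JSONL is one JSON object per line, written through gzip. The sketch below is not the project's org-to-jsonl.py. The entry fields (`title`, `body`) and the helper names `write_jsonl_gz` and `read_jsonl_gz` are assumptions made up for illustration.

```python
import gzip
import json
import os
import tempfile


def write_jsonl_gz(entries, path):
    """Write one JSON object per line to a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_jsonl_gz(path):
    """Read a gzip-compressed JSONL file back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical entries; the real script derives its fields from org-mode files.
entries = [
    {"title": "Schedule", "body": "Dentist appointment on Friday"},
    {"title": "Incoming", "body": "Read about sentence embeddings"},
]

# Write to a temporary directory so the sketch leaves the working tree untouched.
path = os.path.join(tempfile.mkdtemp(), "notes.jsonl.gz")
write_jsonl_gz(entries, path)
loaded = read_jsonl_gz(path)
print(len(loaded), loaded[0]["title"])
```

Keeping one entry per line means the file can be read back line by line without loading a single large JSON document.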
Run
Load the ML model, generate embeddings, and expose an API to run user queries against the org-mode files above
python3 main.py -j .notes.jsonl.gz -e .notes_embeddings.pt
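Once embeddings exist, answering a query typically comes down to ranking entries by their similarity to the query embedding. The sketch below illustrates that idea with cosine similarity over toy three-dimensional vectors. It is an assumption about the general technique, not the code in main.py, and the function names `cosine_similarity` and `top_n` are hypothetical. Real embeddings would come from the ML model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b)


def top_n(query_embedding, entry_embeddings, n=5):
    """Return (entry_index, score) pairs for the n most similar entries."""
    scored = [
        (cosine_similarity(query_embedding, emb), idx)
        for idx, emb in enumerate(entry_embeddings)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:n]]


# Toy embeddings standing in for model output (assumed values).
entry_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query_embedding = [1.0, 0.1, 0.0]

results = top_n(query_embedding, entry_embeddings, n=2)
for idx, score in results:
    print(idx, round(score, 3))
```

The `n` parameter plays the same role as a result-count limit: only the highest-scoring entries are returned.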
Use
- Call Semantic Search via Emacs
  M-x semantic-search "<user-query>"
  C-c C-s
- Call Semantic Search via API
- Call Semantic Search via Python Script Directly
  python3 search_types/asymmetric.py \
  -j .notes.jsonl.gz \
  -e .notes_embeddings.pt \
  -n 5 \
  --verbose \
  --interactive
Acknowledgments
- MiniLM Model for Asymmetric Text Search. See SBert Documentation
- OpenAI CLIP Model for Image Search. See SBert Documentation
- Charles Cave for OrgNode Parser