Semantic Search
Allow natural language search on user content like notes, images and transactions using transformer-based models
All search is done locally. Users can interface with the semantic-search app via Emacs, the API or the command line
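Semantic search of this kind typically works by encoding each entry and the query into embedding vectors with a transformer model, then ranking entries by how similar their vectors are to the query's. The sketch below shows only the ranking step, using cosine similarity over toy hand-written vectors. The vectors, the entries and the helper names `cosine_similarity` and `rank` are illustrative assumptions, not this project's code; in the app itself the embeddings come from the models listed under Acknowledgments.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank(
    query_embedding: Sequence[float],
    entry_embeddings: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (entry index, score) pairs for the top_k most similar entries."""
    scores = [
        (i, cosine_similarity(query_embedding, e))
        for i, e in enumerate(entry_embeddings)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical 3-dimensional "embeddings" standing in for real model output.
entries = ["grocery list", "notes on philosophy", "beancount ledger"]
entry_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.2], [0.2, 0.0, 0.9]]
query_embedding = [0.0, 1.0, 0.1]  # imagine: "What is the meaning of life"

for index, score in rank(query_embedding, entry_embeddings, top_k=2):
    print(f"{score:.3f}  {entries[index]}")
```

Real embeddings have hundreds of dimensions and are usually compared with a vectorised library rather than Python loops, but the ranking logic is the same.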
Setup
Setup using Docker
1. Clone Repository
git clone https://github.com/debanjum/semantic-search && cd semantic-search
2. Configure
- Add content directories for Semantic Search to docker-compose
- Update docker-compose.yml to mount your images, org-mode notes and ledger/beancount directories
- If required, edit config settings in docker_sample_config.yml.
3. Run
docker-compose up -d
Troubleshooting
- The first run will take some time. Let it run; it is most likely not hung
- Symptom: Errors out with "Killed" in the error message
- Fix: Increase the RAM available to Docker containers in Docker Settings
- Refer: StackOverflow Solution, Configure Resources on Docker for Mac
- Symptom: Errors out complaining about tensor mismatch, null, etc.
- Mitigation: Delete the content-type > image section from docker_sample_config.yml
Setup on Local Machine
1. Install Dependencies
- Install Python3 [Required]
- Install Conda [Required]
- Install Exiftool [Optional]
sudo apt-get -y install libimage-exiftool-perl
2. Install Semantic Search
git clone https://github.com/debanjum/semantic-search && cd semantic-search
conda env create -f environment.yml
conda activate semantic-search
3. Configure
- Configure the files/directories to search in the content-type section of sample_config.yml
- To run the application on test data, update file paths containing /data/ to tests/data/ in sample_config.yml
- Example: replace /data/notes/*.org with tests/data/notes/*.org
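The test-data path rewrite can also be scripted. The sketch below is a minimal, hypothetical helper (`point_config_at_test_data` is not part of this project) that rewrites paths starting with /data/ to tests/data/ in the config text. It uses a regex with a negative lookbehind so that paths already rewritten to tests/data/ are left alone, which makes running it twice harmless; a plain `str.replace("/data/", "tests/data/")` would turn tests/data/ into teststests/data/ on a second run. The config key in the demo string is an assumption; the real key names in sample_config.yml may differ.

```python
import re

# Match "/data/" only where it starts a path: not when preceded by a word
# character, "~", ".", "/" or "-". This skips "tests/data/" and "~/data/".
_DATA_PREFIX = re.compile(r"(?<![\w~./-])/data/")


def point_config_at_test_data(config_text: str) -> str:
    """Rewrite /data/... paths in config text to tests/data/... paths."""
    return _DATA_PREFIX.sub("tests/data/", config_text)


# Hypothetical excerpt of a config file; real key names may differ.
sample = 'input-filter: "/data/notes/*.org"\n'
print(point_config_at_test_data(sample), end="")
```

To apply it to the real file, read sample_config.yml with `pathlib.Path.read_text`, pass the text through the function and write the result back with `Path.write_text`.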
4. Run
Load the ML model, generate embeddings and expose an API to query the notes, images, transactions, etc. specified in the config YAML
python3 -m src.main -c=sample_config.yml -vv
Use
- Semantic Search via Emacs
- Install semantic-search.el
- Run
M-x semantic-search <user-query>
- Semantic Search via API
- Query:
GET http://localhost:8000/search?q="What is the meaning of life"&t=notes
- Regenerate Embeddings:
GET http://localhost:8000/regenerate
- Semantic Search API Docs
- UI to Edit Config
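The API can also be called from a script. The sketch below uses only the Python standard library to build the /search URL (so spaces and quotes in the query are escaped properly) and to call /search and /regenerate on a locally running server. The endpoint paths and the q and t parameters come from the examples above; the helper names, the assumption that /search returns JSON, and treating the /regenerate response as plain text are illustrative assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # port used in the examples above


def search_url(query: str, content_type: str = "notes") -> str:
    """Build a /search URL; urlencode escapes spaces, quotes and '&'."""
    return f"{BASE_URL}/search?{urlencode({'q': query, 't': content_type})}"


def search(query: str, content_type: str = "notes"):
    """Query a running server. Assumes the /search response body is JSON."""
    with urlopen(search_url(query, content_type)) as response:
        return json.load(response)


def regenerate() -> str:
    """Ask the server to regenerate embeddings; returns the raw response text."""
    with urlopen(f"{BASE_URL}/regenerate") as response:
        return response.read().decode()


if __name__ == "__main__":
    print(search_url("What is the meaning of life"))
    # Requires the server from the Run step to be up:
    # print(search("What is the meaning of life"))
```

Building the URL with `urlencode` avoids hand-escaping queries that contain spaces, quotes or `&`, which would otherwise break the query string.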
Upgrade
On Docker
docker-compose build
On Local Machine
cd semantic-search
git pull origin master
conda deactivate
conda env update -f environment.yml
conda activate semantic-search
Miscellaneous
- The experimental /chat API endpoint uses the OpenAI API
- It is disabled by default
- To use it, add your openai-api-key to config.yml
Acknowledgments
- MiniLM Model for Asymmetric Text Search. See SBert Documentation
- OpenAI CLIP Model for Image Search. See SBert Documentation
- Charles Cave for OrgNode Parser
- Sven Marnach for PyExifTool