mirror of
https://github.com/khoaliber/khoj.git
synced 2026-03-02 21:19:12 +00:00
b1e64fd4a88c2079cccb3efc0b61ee5101ecbc34
- Formalize filters into a class with can_filter() and filter() methods
- Use the can_filter() method to decide whether to apply a filter and create deep copies of entries and embeddings for it
- Improve search speed for queries with no filters, as deep copying entries and embeddings takes the most time after cross-encoder scoring when calling the /search API
  Earlier, deep copies of entries and embeddings were created even if the query did not contain any filter keywords
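The filter design described in the note above can be illustrated with a minimal sketch. Only the method names can_filter() and filter() come from the note; the class names, the `+"word"` query syntax, the `apply_filters` helper, and the plain-list representation of entries and embeddings are assumptions made for illustration, not the project's actual code.

```python
import copy
import re
from abc import ABC, abstractmethod


class BaseFilter(ABC):
    """Interface named in the note: can_filter() and filter()."""

    @abstractmethod
    def can_filter(self, query):
        """Return True if the query contains keywords this filter handles."""

    @abstractmethod
    def filter(self, query, entries, embeddings):
        """Return (cleaned query, filtered entries, filtered embeddings)."""


class WordFilter(BaseFilter):
    """Hypothetical filter: keep entries containing every +"word" term."""

    pattern = re.compile(r'\+"(\w+)"')  # assumed filter syntax

    def can_filter(self, query):
        return self.pattern.search(query) is not None

    def filter(self, query, entries, embeddings):
        required = [word.lower() for word in self.pattern.findall(query)]
        keep = [
            i for i, entry in enumerate(entries)
            if all(word in entry.lower() for word in required)
        ]
        cleaned_query = self.pattern.sub("", query).strip()
        return cleaned_query, [entries[i] for i in keep], [embeddings[i] for i in keep]


def apply_filters(query, entries, embeddings, filters):
    """Deep copy entries and embeddings only when some filter applies."""
    active = [f for f in filters if f.can_filter(query)]
    if not active:
        # Fast path: no filter keywords in the query, so skip the costly copies
        return query, entries, embeddings
    entries, embeddings = copy.deepcopy(entries), copy.deepcopy(embeddings)
    for f in active:
        query, entries, embeddings = f.filter(query, entries, embeddings)
    return query, entries, embeddings
```

With this shape, a query without filter keywords returns the original entries and embeddings objects untouched, which is the speed-up the note describes.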
Khoj
Natural language search engine for your personal notes, transactions and images
Features
- Advanced natural language understanding using Transformer-based ML models
- Your personal data stays local. All search, indexing is done on your machine*
- Index Org-mode and Markdown notes, Beancount transactions and Photos
- Interact with Khoj using a Web Browser, Emacs or the API.
Demo
Setup
1. Clone
git clone https://github.com/debanjum/khoj && cd khoj
2. Configure
- Required: Update docker-compose.yml to mount your images, (org-mode or markdown) notes and beancount directories
- Optional: Edit application configuration in sample_config.yml
3. Run
docker-compose up -d
Note: The first run will take time. Let it run; it is most likely not hung, just generating embeddings
Use
- Khoj via Web
  - Go to http://localhost:8000/ or open index.html in your browser
- Khoj via Emacs
- Khoj via API
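As a rough illustration of API access, the sketch below builds a request URL for the /search endpoint on the local server. The query parameter names `q`, `t` and `n`, the JSON response, and both helper function names are assumptions for illustration; check the running server for the parameters it actually accepts.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_search_url(query, base_url="http://localhost:8000", content_type=None, count=None):
    """Build a /search URL. Parameter names q, t and n are assumed, not confirmed."""
    params = {"q": query}
    if content_type is not None:
        params["t"] = content_type  # assumed: content type to search, e.g. notes
    if count is not None:
        params["n"] = count  # assumed: maximum number of results
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


def search(query, **kwargs):
    """Send the query to a running Khoj server and decode the response as JSON (assumed format)."""
    with urllib.request.urlopen(build_search_url(query, **kwargs)) as response:
        return json.load(response)
```

Calling `search("meaning of life")` requires the server from the Run step to be up; `build_search_url` alone can be used to inspect the URL without a network call.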
Upgrade
docker-compose build --pull
Troubleshooting
- Symptom: Errors out with "Killed" in error message
  - Fix: Increase RAM available to Docker Containers in Docker Settings
  - Refer: StackOverflow Solution, Configure Resources on Docker for Mac
- Symptom: Errors out complaining about Tensors mismatch, null etc
  - Mitigation: Delete content-type > image section from docker_sample_config.yml
Miscellaneous
- The experimental chat API endpoint uses the OpenAI API
  - It is disabled by default
  - To use it, add your openai-api-key to config.yml
Development Setup
Setup on Local Machine
1. Install Dependencies
- Install Python3 [Required]
- Install Conda [Required]
- Install Exiftool [Optional]
  sudo apt-get -y install libimage-exiftool-perl
2. Install Khoj
git clone https://github.com/debanjum/khoj && cd khoj
conda env create -f config/environment.yml
conda activate khoj
3. Configure
- Configure files/directories to search in the content-type section of sample_config.yml
- To run the application on test data, update file paths containing /data/ to tests/data/ in sample_config.yml
  - Example: replace /data/notes/*.org with tests/data/notes/*.org
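The path update for test data can also be scripted. The following is a minimal sketch, assuming the config is treated as plain text and that every path to change starts with /data/; the function name `use_test_data` is hypothetical. The lookbehind keeps the rewrite safe to run twice, since paths already pointing at tests/data/ are left alone.

```python
import re

# Match "/data/" only at the start of a path, not inside "tests/data/"
_DATA_PREFIX = re.compile(r"(?<![\w.])/data/")


def use_test_data(config_text):
    """Return config text with paths under /data/ pointed at tests/data/."""
    return _DATA_PREFIX.sub("tests/data/", config_text)


# Usage sketch (the file location is taken from the Run step; adjust as needed):
# from pathlib import Path
# config_file = Path("config/sample_config.yml")
# config_file.write_text(use_test_data(config_file.read_text()))
```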
4. Run
Load ML model, generate embeddings and expose API to query notes, images, transactions etc specified in config YAML
python3 -m src.main -c=config/sample_config.yml -vv
Upgrade On Local Machine
cd khoj
git pull origin master
conda deactivate
conda env update -f config/environment.yml
conda activate khoj
Run Unit tests
pytest
Acknowledgments
- Multi-QA MiniLM Model for Asymmetric Text Search. See SBert Documentation
- All MiniLM Model for Symmetric Text Search
- OpenAI CLIP Model for Image Search. See SBert Documentation
- Charles Cave for OrgNode Parser
- Sven Marnach for PyExifTool
