Make type of encoder to use for embeddings configurable via khoj.yml

- Previously `model_type' was set in the setup of each `search_type'
  - All encoders were of type `SentenceTransformer'
  - All cross_encoders were of type `CrossEncoder'

- Now the encoder type can be configured via the new `encoder_type' field
  in `TextSearchConfig', under `search-type' in `khoj.yml'

- The class specified in `encoder_type' only needs an `encode' method
  that takes entries and returns embedding vectors
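
As a rough illustration of the contract described above, the sketch below defines a class whose only requirement is an `encode` method that takes entries and returns one vector per entry. The names `HashingEncoder`, `DefaultEncoder` and `pick_encoder_type` are hypothetical and not part of khoj; `DefaultEncoder` merely stands in for `SentenceTransformer`, and the fallback mirrors the `search_config.encoder_type or SentenceTransformer` expression in the diff.

```python
from typing import List, Optional, Type


class HashingEncoder:
    """Hypothetical encoder: it only needs an `encode` method."""

    def __init__(self, model_name: str, dimensions: int = 8):
        # model_name is accepted only to resemble a typical encoder
        # constructor; this toy encoder does not load any model.
        self.model_name = model_name
        self.dimensions = dimensions

    def encode(self, entries: List[str]) -> List[List[float]]:
        """Return one fixed-size bag-of-words vector per entry."""
        vectors = []
        for entry in entries:
            vector = [0.0] * self.dimensions
            for token in entry.lower().split():
                # Bucket each token by the sum of its character codes,
                # which is deterministic across runs.
                bucket = sum(ord(ch) for ch in token) % self.dimensions
                vector[bucket] += 1.0
            vectors.append(vector)
        return vectors


class DefaultEncoder(HashingEncoder):
    """Hypothetical placeholder for the default encoder class."""


def pick_encoder_type(configured: Optional[Type],
                      default: Type = DefaultEncoder) -> Type:
    # Same shape as `search_config.encoder_type or SentenceTransformer`:
    # use the configured class when set, else fall back to the default.
    return configured or default


encoder = pick_encoder_type(HashingEncoder)("any-model-name")
vectors = encoder.encode(["hello world", "hello"])
```

Here `vectors` holds one list of floats per entry. Any class exposing a compatible `encode` method could be substituted in the same way, assuming the loader only ever calls that method.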
Debanjum Singh Solanky
2023-01-06 15:58:03 -03:00
parent fa92adcf0d
commit 2fe37a090f
4 changed files with 15 additions and 5 deletions

@@ -36,7 +36,7 @@ def initialize_model(search_config: ImageSearchConfig):
     encoder = load_model(
         model_dir = search_config.model_directory,
         model_name = search_config.encoder,
-        model_type = SentenceTransformer)
+        model_type = search_config.encoder_type or SentenceTransformer)
     return encoder

@@ -37,7 +37,7 @@ def initialize_model(search_config: TextSearchConfig):
     bi_encoder = load_model(
         model_dir = search_config.model_directory,
         model_name = search_config.encoder,
-        model_type = SentenceTransformer,
+        model_type = search_config.encoder_type or SentenceTransformer,
         device=f'{state.device}')
     # The cross-encoder re-ranks the results to improve quality