mirror of
https://github.com/khoaliber/khoj.git
synced 2026-03-10 13:26:13 +00:00
Use a better model for asymmetric semantic search
- The multi-qa-MiniLM-L6-cos-v1 is more extensively benchmarked[1]
- It has the right mix of model query speed, size and performance on benchmarks
- On hugging face it has way more downloads and likes than the msmarco model[2]
- On very preliminary evaluation of the model
  - It doubles the encoding speed of all entries (down from ~8min to 4mins)
  - It gave more entries that stay relevant to the query (3/5 vs 1/5 earlier)

[1]: https://www.sbert.net/docs/pretrained_models.html
[2]: https://huggingface.co/sentence-transformers
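
The two preliminary numbers in the message can be read as a speed-up ratio and a precision-at-5 score. The sketch below shows one way to compute them. The helper names and the ordering of the relevance flags are illustrative assumptions; only the counts (1 of 5 and 3 of 5 relevant) and the timings (~8 min and ~4 min) come from the commit message.

```python
def precision_at_k(relevant_flags, k=5):
    """Fraction of the top-k results that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = relevant_flags[:k]
    return sum(1 for flag in top if flag) / k


def speedup(old_minutes, new_minutes):
    """How many times faster the new run is compared to the old one."""
    return old_minutes / new_minutes


# Hypothetical per-result judgements; only the totals match the commit message.
old_model_flags = [True, False, False, False, False]  # 1/5 relevant
new_model_flags = [True, True, True, False, False]    # 3/5 relevant

old_precision = precision_at_k(old_model_flags)
new_precision = precision_at_k(new_model_flags)
encoding_speedup = speedup(8, 4)

print(f"precision@5: {old_precision:.1f} -> {new_precision:.1f}")
print(f"encoding speed-up: {encoding_speedup:.1f}x")
```

With a single query and five results this is only a rough signal, which matches the "very preliminary" wording of the message.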
@@ -100,7 +100,7 @@
 #+end_src
 
 ** Acknowledgments
-- [[https://huggingface.co/sentence-transformers/msmarco-MiniLM-L-6-v3][MiniLM Model]] for Asymmetric Text Search. See [[https://www.sbert.net/examples/applications/retrieve_rerank/README.html][SBert Documentation]]
+- [[https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1][Multi-QA MiniLM Model]] for Asymmetric Text Search. See [[https://www.sbert.net/examples/applications/retrieve_rerank/README.html][SBert Documentation]]
 - [[https://github.com/openai/CLIP][OpenAI CLIP Model]] for Image Search. See [[https://www.sbert.net/examples/applications/image-search/README.html][SBert Documentation]]
 - Charles Cave for [[http://members.optusnet.com.au/~charles57/GTD/orgnode.html][OrgNode Parser]]
 - Sven Marnach for [[https://github.com/smarnach/pyexiftool/blob/master/exiftool.py][PyExifTool]]
@@ -33,7 +33,7 @@ search-type:
 model_directory: "/data/models/symmetric"
 
 asymmetric:
-encoder: "sentence-transformers/msmarco-MiniLM-L-6-v3"
+encoder: "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
 cross-encoder: "cross-encoder/ms-marco-MiniLM-L-6-v2"
 model_directory: "/data/models/asymmetric"
 
@@ -85,7 +85,7 @@ default_config = {
 },
 'asymmetric':
 {
-'encoder': "sentence-transformers/msmarco-MiniLM-L-6-v3",
+'encoder': "sentence-transformers/multi-qa-MiniLM-L6-cos-v1",
 'cross-encoder': "cross-encoder/ms-marco-MiniLM-L-6-v2",
 'model_directory': None
 },
@@ -20,7 +20,7 @@ def search_config(tmp_path_factory):
 )
 
 search_config.asymmetric = AsymmetricSearchConfig(
-encoder = "sentence-transformers/msmarco-MiniLM-L-6-v3",
+encoder = "sentence-transformers/multi-qa-MiniLM-L6-cos-v1",
 cross_encoder = "cross-encoder/ms-marco-MiniLM-L-6-v2",
 model_directory = model_dir
 )
@@ -42,6 +42,6 @@
 #+end_src
 
 ** Acknowledgments
-- [[https://huggingface.co/sentence-transformers/msmarco-MiniLM-L-6-v3][MiniLM Model]] for Asymmetric Text Search. See [[https://www.sbert.net/examples/applications/retrieve_rerank/README.html][SBert Documentation]]
+- [[https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1][MiniLM Model]] for Asymmetric Text Search. See [[https://www.sbert.net/examples/applications/retrieve_rerank/README.html][SBert Documentation]]
 - [[https://github.com/openai/CLIP][OpenAI CLIP Model]] for Image Search. See [[https://www.sbert.net/examples/applications/image-search/README.html][SBert Documentation]]
 - Charles Cave for [[http://members.optusnet.com.au/~charles57/GTD/orgnode.html][OrgNode Parser]]
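
For context, the encoder and cross-encoder names touched by this change fit the retrieve and re-rank pattern described in the linked SBert documentation: the bi-encoder embeds the query and all entries, and the cross-encoder re-scores the top hits. A minimal sketch of that pattern with the sentence-transformers library follows. The `AsymmetricModels` dataclass and the `search` function are hypothetical names used for illustration only, not khoj's actual implementation. Calling `search` assumes the sentence-transformers package is installed and downloads both models on first use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsymmetricModels:
    """Model names as set by this commit (hypothetical container)."""
    encoder: str = "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
    cross_encoder: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"


def search(query, entries, models=AsymmetricModels(), top_k=5):
    """Return up to top_k entries, retrieved by the bi-encoder and
    re-ranked by the cross-encoder (illustrative sketch)."""
    # Imported lazily so the config above can be used without the library.
    from sentence_transformers import CrossEncoder, SentenceTransformer, util

    bi_encoder = SentenceTransformer(models.encoder)
    cross_encoder = CrossEncoder(models.cross_encoder)

    # Step 1: embed entries and query, retrieve candidates by cosine similarity.
    entry_embeddings = bi_encoder.encode(entries, convert_to_tensor=True)
    query_embedding = bi_encoder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, entry_embeddings, top_k=top_k)[0]

    # Step 2: re-score each (query, entry) pair with the cross-encoder.
    candidates = [entries[hit["corpus_id"]] for hit in hits]
    scores = cross_encoder.predict([(query, entry) for entry in candidates])

    # Highest cross-encoder score first.
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in ranked]
```

A call such as `search("how do I set up the app?", notes)` with `notes` as a list of strings would return the re-ranked entries. Only the encoder name changes in this commit, so the re-ranking step stays the same.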