mirror of
https://github.com/khoaliber/khoj.git
synced 2026-03-04 05:39:06 +00:00
- Issue
- Explicit filtering was earlier being done after search by bi-encoder
but before re-ranking by cross-encoder
- This was limiting the quality of results being returned. As the
bi-encoder returned results which were going to be excluded. So the
burden of improving those limited results post filtering was on the
cross-encoder by re-ranking the remaining results based on query
- Fix
- Given the embeddings corresponding to an entry are at the same index
in their respective lists. We can run the filter for blocked,
required words before the search by the bi-encoder model. And limit
entries, embeddings being considered for the current query
- Result
- Semantic search by the bi-encoder gets to return most relevant
results for the query, knowing that the results aren't going to be
filtered out after. So the cross-encoder shoulders less of the
burden of improving results
- Corollary
- This pre-filtering technique allows us to apply other explicit
filters on entries relevant for the current query
- E.g limit search for entries within date/time specified in query