Vector rescoring oversamples k instead of num_candidates #119835
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Great stuff! You may have issues with the API compat tests, so they may need to be muted before backport. I am not sure.
I hope not, as I changed the capability name; I'll keep an eye on this 👍
💚 Backport successful
It makes more sense to apply rescoring to an oversampled `k` instead of `num_candidates`: rescoring just a fraction of the candidates is more performant and still offers good recall, especially when `k` is small compared to the number of candidates.

The API changes so that we use `oversample` instead of `num_candidates_factor`. This means rescoring `k * oversample` of the `num_candidates` retrieved on each shard, and returning the top `k` of them.

Follow up to #116663
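The per-shard behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Elasticsearch's implementation: `Candidate`, `exact_score` and `rescore_top_k` are hypothetical names, a plain dot product stands in for the exact similarity on full-precision vectors, and rounding `k * oversample` up with `ceil` is an assumption. The shard first retrieves `num_candidates` hits ranked by an approximate score, keeps only the best `k * oversample` of them, rescores those exactly, and returns the top `k`.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    approx_score: float  # hypothetical: score from the approximate (e.g. quantized) search
    vector: list[float]  # hypothetical: full-precision vector used for exact rescoring


def exact_score(query: list[float], vector: list[float]) -> float:
    # Assumption: dot product stands in for the exact similarity function.
    return sum(q * v for q, v in zip(query, vector))


def rescore_top_k(query: list[float], candidates: list[Candidate],
                  k: int, oversample: float) -> list[str]:
    """Rescore only k * oversample of the num_candidates hits, then keep the top k."""
    # Assumption: k * oversample is rounded up and capped at num_candidates.
    window = min(len(candidates), math.ceil(k * oversample))
    # Take the best `window` candidates by their approximate score.
    shortlist = heapq.nlargest(window, candidates, key=lambda c: c.approx_score)
    # Re-rank the shortlist by exact score and return the top k doc ids.
    reranked = sorted(shortlist, key=lambda c: exact_score(query, c.vector), reverse=True)
    return [c.doc_id for c in reranked[:k]]


# Example: num_candidates = 4 hits retrieved on one shard.
query = [1.0, 0.0]
candidates = [
    Candidate("a", 0.9, [0.5, 0.0]),
    Candidate("b", 0.8, [0.9, 0.0]),
    Candidate("c", 0.7, [0.7, 0.0]),
    Candidate("d", 0.1, [1.0, 0.0]),
]
print(rescore_top_k(query, candidates, k=2, oversample=1.5))  # ['b', 'c']
```

With `k=2` and `oversample=1.5`, only three of the four candidates are rescored, so `d` (the best by exact score but the worst by approximate score) never enters the window; raising `oversample` to `2.0` widens the window to all four and `d` comes out on top. This is the trade-off the change targets: a smaller rescoring window costs less while still correcting most of the approximate ordering.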