Vector rescoring oversamples k instead of num_candidates #119835


carlosdelest
Member

@carlosdelest carlosdelest commented Jan 9, 2025

It makes more sense to apply rescoring to an oversampled k than to num_candidates: rescoring only a fraction of the candidates is cheaper and still offers good recall, especially when k is small compared to the number of candidates.

The API changes to use oversample instead of num_candidates_factor:

GET msmarco-v2-bbq/_search
{
    "query": {
        "knn": {
            "field": "emb",
            "query_vector": [...],
            "k": 10,
            "num_candidates": 100,
            "rescore_vector": {
                "oversample": 2.5
            }
        }
    }
}

With this change, each shard rescores the top k * oversample of the num_candidates it retrieves (10 * 2.5 = 25 in the example above) and returns the top k of those.
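
A minimal sketch of the per-shard behaviour described above, written in Python for brevity and not taken from the Elasticsearch code. The function name `rescore_on_shard`, the `exact_score` callback, and rounding k * oversample up to a whole number are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Tuple

Hit = Tuple[int, float]  # (doc_id, score)


def rescore_on_shard(
    approx_hits: List[Hit],
    exact_score: Callable[[int], float],
    k: int,
    oversample: float,
) -> List[Hit]:
    """Rescore only the best k * oversample approximate hits, then keep the top k.

    approx_hits stands in for the num_candidates hits retrieved on one shard
    using approximate (for example quantized) scores.
    """
    # Assumption: a fractional product is rounded up.
    rescore_count = math.ceil(k * oversample)

    # Keep the best approximate hits; the remaining candidates are never rescored.
    best_approx = sorted(approx_hits, key=lambda hit: hit[1], reverse=True)[:rescore_count]

    # Recompute the score of each kept hit with the (more expensive) exact function.
    rescored = [(doc_id, exact_score(doc_id)) for doc_id, _ in best_approx]

    # Return the top k by exact score.
    return sorted(rescored, key=lambda hit: hit[1], reverse=True)[:k]
```

With k = 10, num_candidates = 100 and oversample = 2.5, this sketch calls `exact_score` 25 times per shard instead of once per candidate, which is the saving the description refers to.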

Follow up to #116663

@carlosdelest carlosdelest added the >non-issue, :Search Relevance/Vectors, Team:Search Relevance, v8.18.0, and auto-backport labels Jan 9, 2025
@carlosdelest carlosdelest changed the base branch from main to 8.x January 9, 2025 09:52
@carlosdelest carlosdelest changed the base branch from 8.x to main January 9, 2025 09:53
@carlosdelest carlosdelest marked this pull request as ready for review January 10, 2025 16:11
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Member

@benwtrent benwtrent left a comment


Great stuff! You may have issues with the api compat tests and thus they may need to be muted before backport. I am not sure.

@carlosdelest carlosdelest merged commit 8ca062a into elastic:main Jan 10, 2025
16 checks passed
@carlosdelest
Member Author

You may have issues with the api compat tests and thus they may need to be muted before backport. I am not sure.

I hope not as I changed the capability name - I'll keep an eye on this 👍

@elasticsearchmachine
Collaborator

💚 Backport successful

Status Branch Result
8.x

Labels
auto-backport, >non-issue, :Search Relevance/Vectors, Team:Search Relevance, v8.18.0, v9.0.0
3 participants