Add rescore knn vector test coverage #122801

Conversation

carlosdelest
Member

Closes #122723

#122653 fixed a scoring bug for knn rescore. This bug should have been caught earlier, as scores were not being sorted when retrieved from knn search.

This PR adds an integration test for the kNN section, kNN query, and kNN retriever to double-check that documents added to multiple shards, in random order, can be successfully rescored.

@carlosdelest added the >test, auto-backport, :Search Relevance/Vectors, Team:Search Relevance, v9.0.0, v8.18.0, v8.19.0, v9.1.0 labels Feb 17, 2025
@carlosdelest marked this pull request as ready for review February 18, 2025 08:55
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

    for (int i = 1; i < docs.length; i++) {
        assert docs[i - 1] < docs[i] : "doc ids are not in order: " + Arrays.toString(docs);
    }
}
Member

it sounds like this is redundant if we have appropriate test coverage? I was also wondering if it may be worth changing the two first arguments into a ScoreDoc[] given that's how stuff comes in, and perhaps unifying the sorting here. I realize though that this is a copy of a Lucene class and the change I am suggesting will make it diverge from its original source.

Member Author

> it sounds like this is redundant if we have appropriate test coverage?

My thinking was to provide a way to understand a test failure in an easier way in case someone provided a non-sorted array, instead of going through all the investigations that you had to do 😓

I'm happy with removing the assertion in case you think it's unnecessary, but I think it helps to understand what the preconditions for this constructor are.
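
The precondition discussed here can be sketched as a small standalone check. This is only an illustration, not the code in the PR: the class and method names are hypothetical, and it throws an explicit exception rather than using `assert`, so it behaves the same whether or not assertions are enabled.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Elasticsearch or Lucene.
final class DocIdOrderCheck {
    private DocIdOrderCheck() {}

    /** Returns true when every doc id is strictly greater than the one before it. */
    static boolean isStrictlyAscending(int[] docs) {
        for (int i = 1; i < docs.length; i++) {
            if (docs[i - 1] >= docs[i]) {
                return false;
            }
        }
        return true;
    }

    /** Fails fast with a readable message when the caller passes unsorted doc ids. */
    static void requireStrictlyAscending(int[] docs) {
        if (!isStrictlyAscending(docs)) {
            throw new IllegalArgumentException("doc ids are not in order: " + Arrays.toString(docs));
        }
    }
}
```

A failure raised at construction time names the broken precondition directly, which is the debugging shortcut argued for above.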

> I was also wondering if it may be worth changing the two first arguments into a ScoreDoc[] given that's how stuff comes in, and perhaps unifying the sorting here. I realize though that this is a copy of a Lucene class and the change I am suggesting will make it diverge from its original source.

I think that's a good idea. I will give it a try.

> I realize though that this is a copy of a Lucene class and the change I am suggesting will make it diverge from its original source.

It already diverges a bit in terms of making it easier to create - as long as it's on the constructor stuff I think we should be good for doing the change.

I'll give it a go and come back for feedback.

Member

We should have a test for RescoreKnnVectorQuery that indexes a bunch of random vectors, searches with a random vector and asserts the rewrite is a KnnScoreDocQuery with the appropriately ordered values.

It seems we are almost there in RescoreKnnVectorQueryTests, but maybe add some assertions there. Maybe via package private methods?

Member Author

> it may be worth changing the two first arguments into a ScoreDoc[] given that's how stuff comes in, and perhaps unifying the sorting here

@javanna I gave it a try in a073f43 - I like it more, it simplifies how clients create this query plus we enforce the invariant in the constructor itself 💯
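
As a rough sketch of that idea (not the actual change in a073f43), a constructor can accept hits in any order and sort them by doc id itself, so callers cannot violate the ordering invariant. `ScoredHit` below is a hypothetical stand-in for Lucene's `ScoreDoc`, and `SortedHits` is an invented name used only for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for Lucene's ScoreDoc: a doc id plus its score.
final class ScoredHit {
    private final int doc;
    private final float score;

    ScoredHit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }

    int doc() { return doc; }
    float score() { return score; }
}

// Hypothetical holder: callers pass hits in any order, the constructor sorts them.
final class SortedHits {
    private final int[] docs;
    private final float[] scores;

    SortedHits(ScoredHit[] hits) {
        // Copy first so the caller's array is left untouched.
        ScoredHit[] sorted = hits.clone();
        // Sort by doc id so each score stays paired with its document.
        Arrays.sort(sorted, Comparator.comparingInt(ScoredHit::doc));
        docs = new int[sorted.length];
        scores = new float[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            docs[i] = sorted[i].doc();
            scores[i] = sorted[i].score();
        }
    }

    int[] docs() { return docs.clone(); }
    float[] scores() { return scores.clone(); }
}
```

Sorting the paired objects, rather than two parallel arrays, keeps each score attached to its doc id through the reorder.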

> We should have a test for RescoreKnnVectorQuery that indexes a bunch of random vectors, searches with a random vector and asserts the rewrite is a KnnScoreDocQuery with the appropriately ordered values.

@benwtrent I think the change in a073f43 makes it unnecessary. We're already checking via random insertions in the test. Do you think we need to add something else to make sure this doesn't bite us again?

Member

doing the sort in the ctor is fine, and as long as we have tests that will fail if somebody removes that sort, I am happy.

Member Author

RescoreKnnVectorQueryIT adds those tests. I removed the sort that Luca added back in #122653 and confirmed that the newly added tests catch the regression.
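
The same kind of check can be sketched in isolation. This is not RescoreKnnVectorQueryIT; every name below is hypothetical. It shuffles doc ids at random and counts how often the consumer would see them out of order, with a flag that simulates removing the sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical, self-contained illustration of a randomized ordering check.
final class ShuffledOrderCheck {
    private ShuffledOrderCheck() {}

    /** Returns the doc ids as a consumer would receive them; the sort can be switched off. */
    static int[] prepare(int[] docIds, boolean sortByDocId) {
        int[] copy = docIds.clone();
        if (sortByDocId) {
            Arrays.sort(copy);
        }
        return copy;
    }

    static boolean isAscending(int[] docs) {
        for (int i = 1; i < docs.length; i++) {
            if (docs[i - 1] >= docs[i]) {
                return false;
            }
        }
        return true;
    }

    /** Runs random trials and returns how many produced out-of-order doc ids. */
    static int countFailures(long seed, int rounds, boolean sortByDocId) {
        Random random = new Random(seed);
        int failures = 0;
        for (int r = 0; r < rounds; r++) {
            int size = 2 + random.nextInt(50);
            List<Integer> ids = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                ids.add(i);
            }
            // Random insertion order, as when documents land on shards unpredictably.
            Collections.shuffle(ids, random);
            int[] docIds = ids.stream().mapToInt(Integer::intValue).toArray();
            if (!isAscending(prepare(docIds, sortByDocId))) {
                failures++;
            }
        }
        return failures;
    }
}
```

With the sort enabled, no trial should fail; with it disabled, shuffled inputs are expected to surface the problem quickly, which is the property the reviewers asked the tests to guarantee.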

Member

I like it. It also allows sharing some code between the two consumers. Perhaps make it clear in the javadocs that this is no longer a straight copy of its lucene sibling. Thanks!

Member Author

> Perhaps make it clear in the javadocs that this is no longer a straight copy of its lucene sibling.

👍 I've clarified that in ee464fe

Member

@benwtrent left a comment

:shipit:

@carlosdelest enabled auto-merge (squash) February 21, 2025 18:43
@carlosdelest
Member Author

@elasticsearchmachine update branch

@carlosdelest merged commit f5e2a92 into elastic:main Feb 24, 2025
17 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
9.0
8.18 Commit could not be cherrypicked due to conflicts
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 122801`

carlosdelest added a commit to carlosdelest/elasticsearch that referenced this pull request Feb 24, 2025
(cherry picked from commit f5e2a92)

# Conflicts:
#	server/src/main/java/org/elasticsearch/search/vectors/RescoreKnnVectorQuery.java
#	server/src/test/java/org/elasticsearch/search/vectors/RescoreKnnVectorQueryTests.java
@carlosdelest
Member Author

💚 All backports created successfully

Status Branch Result
8.x
9.0
8.18

Questions ?

Please refer to the Backport tool documentation

elasticsearchmachine pushed a commit that referenced this pull request Feb 24, 2025
carlosdelest added a commit that referenced this pull request Feb 24, 2025
* Add rescore knn vector test coverage (#122801)

(cherry picked from commit f5e2a92)

# Conflicts:
#	server/src/main/java/org/elasticsearch/search/vectors/RescoreKnnVectorQuery.java
#	server/src/test/java/org/elasticsearch/search/vectors/RescoreKnnVectorQueryTests.java

* Fix merge for 8.x
Development

Successfully merging this pull request may close these issues.

Increase test coverage for knn rescore vector
4 participants