Skip to content

Knn vector rescoring to sort score docs #122653

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 15, 2025

Conversation

javanna
Copy link
Member

@javanna javanna commented Feb 14, 2025

RescoreKnnVectorQuery rewrites to KnnScoreDocQuery, which takes a sorted array of doc ids and corresponding array including scores fo such docs. A binary search is performed on top of the docs array, and such global ids are converted back to segment level ids (subtracting the context docbase) when scoring docs.

RescoreKnnVectoryQuery did not sort the array of docs which caused binary search to return non deterministic results, which in turn made us look up wrong docs, something using out of bound ids. One symptom of this was observed in a DFSProfilerIT test failure which triggered a Lucene assertion around doc id being outside of the range of the bitset of live docs.

The fix is to simply sort the score docs array before extracting docs ids and scores and providing them to KnnScoreDocQuery upon rewrite.

Relates to #116663

Closes #119711

RescoreKnnVectorQuery rewrites to KnnScoreDocQuery, which takes a sorted array of
doc ids and corresponding array including scores fo such docs. A binary search is
performed on top of the docs array, and such global ids are converted back to
segment level ids (subtracting the context docbase) when scoring docs.

RescoreKnnVectoryQuery did not sort the array of docs which caused binary search
to return non deterministic results, which in turn made us look up wrong docs,
something using out of bound ids. One symptom of this was observed in a DFSProfilerIT
test failure which triggered a Lucene assertion around doc id being outside of the
range of the bitset of live docs.

The fix is to simply sort the score docs array before extracting docs ids and scores
and providing them to KnnScoreDocQuery upon rewrite.

Relates to elastic#116663

Closes elastic#119711
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Feb 14, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Copy link
Collaborator

Hi @javanna, I've created a changelog YAML for you.

@javanna
Copy link
Member Author

javanna commented Feb 14, 2025

This looks like a pretty bad one. It had nothing to do with profiling although it surfaced from a DFSProfilerIT test failure. I could reproduce it in DFSProfilerIT removing profiling, trimming down to a single shard, and disabling inter-segment concurrency, via the following code using 7E03FF959CA490B6 as seed. I assume we are missing some test coverage, cause bugs like these should be quickly and more easily caught? Was knn vector rescoring providing correct results at all? We were effectively doing binary search against an unsorted array there....

    public void testProfileDfs() throws Exception {
        String textField = "text_field";
        String numericField = "number";
        String vectorField = "vector";
        String indexName = "text-dfs-profile";
        createIndex(indexName, vectorField);
        ensureGreen();

        int numDocs = randomIntBetween(10, 50);
        IndexRequestBuilder[] docs = new IndexRequestBuilder[numDocs];
        for (int i = 0; i < numDocs; i++) {
            docs[i] = prepareIndex(indexName).setId(String.valueOf(i))
                .setSource(
                    textField,
                    English.intToEnglish(i),
                    numericField,
                    i,
                    vectorField,
                    new float[] { randomFloat(), randomFloat(), randomFloat() }
                );
        }
        indexRandom(true, docs);
        refresh();

        float[] queryVector = new float[]{0.2956158f, 0.67327607f, 0.67715585f};
        int k = 10;
        RescoreVectorBuilder rescoreVectorBuilder = new RescoreVectorBuilder(1.8569797f);
        KnnSearchBuilder knnSearchBuilder = new KnnSearchBuilder(
            vectorField,
            queryVector,
            k,
            50,
            rescoreVectorBuilder, //required to fail the test
            null
        );

        assertResponse(
            prepareSearch()
                .setTrackTotalHits(true) //required to fail the test
                .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
                .setKnnSearch(randomList(2, 5, () -> knnSearchBuilder)),
            response -> {
            }
        );
    }

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

@javanna javanna added the auto-backport Automatically create backport pull requests when merged label Feb 15, 2025
@javanna javanna merged commit 9adb91d into elastic:main Feb 15, 2025
17 checks passed
@javanna javanna deleted the fix/rescore_knn_query_sort_score_docs branch February 15, 2025 13:24
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
9.0
8.18 Commit could not be cherrypicked due to conflicts
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 122653

javanna added a commit to javanna/elasticsearch that referenced this pull request Feb 15, 2025
RescoreKnnVectorQuery rewrites to KnnScoreDocQuery, which takes a sorted array of
doc ids and corresponding array including scores fo such docs. A binary search is
performed on top of the docs array, and such global ids are converted back to
segment level ids (subtracting the context docbase) when scoring docs.

RescoreKnnVectoryQuery did not sort the array of docs which caused binary search
to return non deterministic results, which in turn made us look up wrong docs,
something using out of bound ids. One symptom of this was observed in a DFSProfilerIT
test failure which triggered a Lucene assertion around doc id being outside of the
range of the bitset of live docs.

The fix is to simply sort the score docs array before extracting docs ids and scores
and providing them to KnnScoreDocQuery upon rewrite.

Relates to elastic#116663

Closes elastic#119711
javanna added a commit to javanna/elasticsearch that referenced this pull request Feb 15, 2025
RescoreKnnVectorQuery rewrites to KnnScoreDocQuery, which takes a sorted array of
doc ids and corresponding array including scores fo such docs. A binary search is
performed on top of the docs array, and such global ids are converted back to
segment level ids (subtracting the context docbase) when scoring docs.

RescoreKnnVectoryQuery did not sort the array of docs which caused binary search
to return non deterministic results, which in turn made us look up wrong docs,
something using out of bound ids. One symptom of this was observed in a DFSProfilerIT
test failure which triggered a Lucene assertion around doc id being outside of the
range of the bitset of live docs.

The fix is to simply sort the score docs array before extracting docs ids and scores
and providing them to KnnScoreDocQuery upon rewrite.

Relates to elastic#116663

Closes elastic#119711
javanna added a commit to javanna/elasticsearch that referenced this pull request Feb 15, 2025
RescoreKnnVectorQuery rewrites to KnnScoreDocQuery, which takes a sorted array of
doc ids and corresponding array including scores fo such docs. A binary search is
performed on top of the docs array, and such global ids are converted back to
segment level ids (subtracting the context docbase) when scoring docs.

RescoreKnnVectoryQuery did not sort the array of docs which caused binary search
to return non deterministic results, which in turn made us look up wrong docs,
something using out of bound ids. One symptom of this was observed in a DFSProfilerIT
test failure which triggered a Lucene assertion around doc id being outside of the
range of the bitset of live docs.

The fix is to simply sort the score docs array before extracting docs ids and scores
and providing them to KnnScoreDocQuery upon rewrite.

Relates to elastic#116663

Closes elastic#119711
elasticsearchmachine pushed a commit that referenced this pull request Feb 15, 2025
RescoreKnnVectorQuery rewrites to KnnScoreDocQuery, which takes a sorted array of
doc ids and corresponding array including scores fo such docs. A binary search is
performed on top of the docs array, and such global ids are converted back to
segment level ids (subtracting the context docbase) when scoring docs.

RescoreKnnVectoryQuery did not sort the array of docs which caused binary search
to return non deterministic results, which in turn made us look up wrong docs,
something using out of bound ids. One symptom of this was observed in a DFSProfilerIT
test failure which triggered a Lucene assertion around doc id being outside of the
range of the bitset of live docs.

The fix is to simply sort the score docs array before extracting docs ids and scores
and providing them to KnnScoreDocQuery upon rewrite.

Relates to #116663

Closes #119711
elasticsearchmachine pushed a commit that referenced this pull request Feb 15, 2025
RescoreKnnVectorQuery rewrites to KnnScoreDocQuery, which takes a sorted array of
doc ids and corresponding array including scores fo such docs. A binary search is
performed on top of the docs array, and such global ids are converted back to
segment level ids (subtracting the context docbase) when scoring docs.

RescoreKnnVectoryQuery did not sort the array of docs which caused binary search
to return non deterministic results, which in turn made us look up wrong docs,
something using out of bound ids. One symptom of this was observed in a DFSProfilerIT
test failure which triggered a Lucene assertion around doc id being outside of the
range of the bitset of live docs.

The fix is to simply sort the score docs array before extracting docs ids and scores
and providing them to KnnScoreDocQuery upon rewrite.

Relates to #116663

Closes #119711
elasticsearchmachine pushed a commit that referenced this pull request Feb 15, 2025
RescoreKnnVectorQuery rewrites to KnnScoreDocQuery, which takes a sorted array of
doc ids and corresponding array including scores fo such docs. A binary search is
performed on top of the docs array, and such global ids are converted back to
segment level ids (subtracting the context docbase) when scoring docs.

RescoreKnnVectoryQuery did not sort the array of docs which caused binary search
to return non deterministic results, which in turn made us look up wrong docs,
something using out of bound ids. One symptom of this was observed in a DFSProfilerIT
test failure which triggered a Lucene assertion around doc id being outside of the
range of the bitset of live docs.

The fix is to simply sort the score docs array before extracting docs ids and scores
and providing them to KnnScoreDocQuery upon rewrite.

Relates to #116663

Closes #119711
@carlosdelest
Copy link
Member

carlosdelest commented Feb 17, 2025

Thanks for the fix @javanna . This is a nasty one.

I've opened #122723 as a follow up for adding test coverage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >bug :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v8.18.1 v8.19.0 v9.0.0 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] DfsProfilerIT testProfileDfs failing
4 participants