Skip to content

Remove empty results before merging #126770

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Apr 17, 2025
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions docs/changelog/126770.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
pr: 126770
summary: Remove empty results before merging
area: Search
type: bug
issues:
- 126742
Original file line number Diff line number Diff line change
Expand Up @@ -150,11 +150,12 @@ static TopDocs mergeTopDocs(List<TopDocs> results, int topN, int from) {
return topDocs;
} else if (topDocs instanceof TopFieldGroups firstTopDocs) {
final Sort sort = new Sort(firstTopDocs.fields);
final TopFieldGroups[] shardTopDocs = results.stream().filter(td -> td != Lucene.EMPTY_TOP_DOCS).toArray(TopFieldGroups[]::new);
assert results.stream().noneMatch(topDoc -> topDoc == Lucene.EMPTY_TOP_DOCS);
final TopFieldGroups[] shardTopDocs = results.toArray(TopFieldGroups[]::new);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@original-brownbear can you keep me honest here? the filtering broke the field collapsing tests, I removed it entirely here (see TopFieldGroups#merge). We basically can't deal with empty top docs in there, so we can only assert that that scenario does not present. Is that the case in practice?

Copy link
Contributor

@original-brownbear original-brownbear Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure we can do that, can't we deal with this case by simply returning null here like we did before if the resulting shardTopDocs after filtering are an empty array? (I think there's cases where this can legitimately be empty looking at org.elasticsearch.lucene.grouping.SinglePassGroupingCollector#getTopGroups for example?)

All of that said, I'm really starting to think I simply did a bit of a bad job here upstream. If we simply don't pass the empty top docs into this method ever (like we used to without data node side batching) then we would simply return null out of the box and behavior is unchanged. But for a quick fix, returning null should do the trick?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the null handling one more time, and it seems to work this time. I had to add a bunch of odd null checks, but at least we can now distinguish between proper empty top docs and our own placeholder.

mergedTopDocs = TopFieldGroups.merge(sort, from, topN, shardTopDocs, false);
} else if (topDocs instanceof TopFieldDocs firstTopDocs) {
final Sort sort = checkSameSortTypes(results, firstTopDocs.fields);
final TopFieldDocs[] shardTopDocs = results.stream().filter((td -> td != Lucene.EMPTY_TOP_DOCS)).toArray(TopFieldDocs[]::new);
final TopFieldDocs[] shardTopDocs = removeEmptyResults(results).toArray(TopFieldDocs[]::new);
mergedTopDocs = TopDocs.merge(sort, from, topN, shardTopDocs);
} else {
final TopDocs[] shardTopDocs = results.toArray(new TopDocs[numShards]);
Expand All @@ -163,6 +164,18 @@ static TopDocs mergeTopDocs(List<TopDocs> results, int topN, int from) {
return mergedTopDocs;
}

private static <T extends TopDocs> List<T> removeEmptyResults(List<T> results) {
List<T> nonEmptyResults = new ArrayList<>();
for (T result : results) {
if (result.totalHits.value() > 0 || result.totalHits.relation() != Relation.EQUAL_TO) {
nonEmptyResults.add(result);
} else {
assert result.scoreDocs.length == 0;
}
}
return nonEmptyResults;
}

private static Sort checkSameSortTypes(Collection<TopDocs> results, SortField[] firstSortFields) {
Sort sort = new Sort(firstSortFields);
if (results.size() < 2) return sort;
Expand Down
Loading