Store _doc_count field as custom term frequency (#65776)
A while back, Lucene introduced the ability to index custom term frequencies, i.e. giving users
the ability to provide a numeric value that should be indexed as a term frequency rather than
letting Lucene compute the term frequency by itself based on the number of occurrences of
a term.
This PR modifies the _doc_count field so that it is stored as a Lucene custom term frequency.
A benefit of moving to custom term frequencies is that Lucene will automatically compute global term
statistics like totalTermFreq, which tells us the sum of the values of the _doc_count field across
an entire shard. This could in turn be useful to generalize optimizations to rollup indices,
e.g. bucket aggregations where all documents fall into the same bucket.
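As a rough illustration of why the statistic is useful, here is a toy model in plain Java with no Lucene dependency. It assumes each indexed document carries one integer _doc_count and treats totalTermFreq as the sum of those values over the shard. The class and method names (DocCountSketch, totalTermFreq, singleBucketCount) are hypothetical and are not part of this PR or of Lucene.

```java
// Toy model only: plain arrays stand in for a shard's postings.
// None of these names exist in Elasticsearch or Lucene.
class DocCountSketch {

    // Models the global statistic: the sum of the per-document
    // custom term frequencies (the _doc_count values) in a shard.
    static long totalTermFreq(int[] docCounts) {
        long sum = 0;
        for (int count : docCounts) {
            sum += count; // term frequencies are ints, the sum is a long
        }
        return sum;
    }

    // If every document in the shard falls into the same bucket, the
    // bucket's doc count equals the shard-wide sum, so a precomputed
    // statistic could be used instead of visiting each document.
    static long singleBucketCount(long precomputedTotalTermFreq) {
        return precomputedTotalTermFreq;
    }

    public static void main(String[] args) {
        int[] shard = {3, 5, 1}; // three rollup docs representing 9 raw docs
        long total = totalTermFreq(shard);
        System.out.println(singleBucketCount(total));
    }
}
```

In this sketch the loop plays the role of the work Lucene would do at index time; the point is only that the single-bucket case reduces to reading one number.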
Relates to #64503
server/src/main/java/org/elasticsearch/search/aggregations/bucket/terms/GlobalOrdinalsStringTermsAggregator.java (+2 -2)
@@ -315,7 +315,7 @@ public void collect(int doc, long owningBucketOrd) throws IOException {
                         return;
                     }
                     int ord = singleValues.ordValue();
-                    long docCount = docCountProvider.getDocCount(doc);
+                    int docCount = docCountProvider.getDocCount(doc);
                     segmentDocCounts.increment(ord + 1, docCount);
                 }
             });
@@ -329,7 +329,7 @@ public void collect(int doc, long owningBucketOrd) throws IOException {