[7.x] Store _doc_count field as custom term frequency #65825

Merged
merged 1 commit into from
Dec 3, 2020

Contributor

@csoulios csoulios commented Dec 3, 2020

Backports #65776 to 7.x

A while back, Lucene introduced the ability to index custom term frequencies, i.e. users can provide a numeric value to be indexed as a term frequency rather than letting Lucene compute the term frequency itself from the number of occurrences of a term.

This PR modifies the _doc_count field so that it is stored as a Lucene custom term frequency.

A benefit of moving to custom term frequencies is that Lucene automatically computes global term statistics such as totalTermFreq, which tells us the sum of the _doc_count values across an entire shard. This could in turn help generalize optimizations to rollup indices, e.g. bucket aggregations where all documents fall into the same bucket.
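
To illustrate the arithmetic only, here is a toy model in plain Java. It does not call Lucene or Elasticsearch; the class and method names (`DocCountTermFreqSketch`, `docFreq`, `totalTermFreq`) are hypothetical, and the positivity check reflects the assumption that `_doc_count` values are positive integers. It shows why a shard-level total term frequency equals the sum of the per-document `_doc_count` values once each value is indexed as the term frequency.

```java
// Toy model, not the Lucene API: each array element stands for the
// _doc_count value that one document would index as its term frequency.
class DocCountTermFreqSketch {

    // Number of documents that contain the term (one posting per document).
    public static long docFreq(int[] docCounts) {
        return docCounts.length;
    }

    // Sum of per-document term frequencies. With _doc_count stored as a
    // custom term frequency, this is the sum of _doc_count over the shard.
    public static long totalTermFreq(int[] docCounts) {
        long total = 0;
        for (int docCount : docCounts) {
            // Assumption: _doc_count must be a positive integer.
            if (docCount <= 0) {
                throw new IllegalArgumentException("_doc_count must be positive, got " + docCount);
            }
            total += docCount;
        }
        return total;
    }

    public static void main(String[] args) {
        // Three rollup documents summarizing 3, 1 and 6 original documents.
        int[] shard = {3, 1, 6};
        System.out.println("docFreq = " + docFreq(shard));             // prints "docFreq = 3"
        System.out.println("totalTermFreq = " + totalTermFreq(shard)); // prints "totalTermFreq = 10"
    }
}
```

In this model, if every document in the shard falls into the same bucket, the bucket's document count is just `totalTermFreq`, which is the kind of shortcut the paragraph above hints at.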

Relates to #64503

@csoulios csoulios added the :Search Foundations/Mapping and backport labels Dec 3, 2020
@elasticmachine elasticmachine added the Team:Search label Dec 3, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@csoulios csoulios merged commit c3ff707 into elastic:7.x Dec 3, 2020
@csoulios csoulios deleted the doc_count_term_freq_7.x branch December 3, 2020 15:31