Use doc_values for streaming _uid / _id #15155
Labels
discuss
>enhancement
high hanging fruit
:Search/Search
Search-related issues that do not fall into other categories
This issue relates heavily to #11887.
In many use cases, there is a need to stream all (or many) `_id`s from Elasticsearch to munge them together with some other data set. In these cases, Elasticsearch is often the actual search platform, but something else acts as the source of "truth" or holds a more complete representation of the data (as opposed to just what is indexed to make search work). For example, imagine a scroll request that disables `_source`.
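As a rough sketch of the kind of request meant here, the Python snippet below builds a scroll search body with `_source` disabled, so that each hit carries only metadata such as `_id`. The helper name `build_id_scroll_request`, the index name `my_index`, and the page size are hypothetical. The body keys (`query`, `_source`, `sort`, `size`) and the `scroll` URL parameter are regular search options, but the exact shape may differ between Elasticsearch versions.

```python
import json


def build_id_scroll_request(index, page_size=1000, keep_alive="1m"):
    """Return (path, body) for a scroll search whose hits omit _source.

    The function name and default values are illustrative assumptions.
    """
    path = "/{}/_search?scroll={}".format(index, keep_alive)
    body = {
        "query": {"match_all": {}},
        "_source": False,   # do not return the document source
        "sort": ["_doc"],   # index order, typically the cheapest for scrolling
        "size": page_size,
    }
    return path, body


path, body = build_id_scroll_request("my_index")
print("POST " + path)
print(json.dumps(body, indent=2))
```

Even with `_source` disabled, each returned hit still includes its `_id`, which is the value this issue is concerned with streaming quickly.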
For these use cases, it's not uncommon to want to stream more than 50K document IDs per second (i.e. as fast as possible). In practice, however, streaming `_id`s is bottlenecked by the need to fetch the stored `_uid` field, decompress it, split it to extract the `_id`, and finally serialize it as part of the response. If the aforementioned issue is merged, then we can use doc_values to stream these values from disk more efficiently in this use case.

Note: It may be worthwhile to consider this for other use cases where source filtering is enabled and all of the selected fields exist in doc values, especially if the user supplies the list of fields using `fielddata_fields`.
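To illustrate the note, here is a sketch of a request that could in principle be answered without touching stored fields: `_source` is disabled and the wanted fields are listed under `fielddata_fields`. The helper name `build_fielddata_request` and the field names `user_id` and `timestamp` are hypothetical. Whether the values would actually be read from doc values depends on the mapping and on the change proposed in this issue.

```python
import json


def build_fielddata_request(fields, page_size=1000):
    """Return a search body that asks only for fielddata-backed fields.

    The function name and defaults are illustrative assumptions.
    """
    return {
        "query": {"match_all": {}},
        "_source": False,                  # skip loading the stored source
        "fielddata_fields": list(fields),  # fields to return per hit
        "size": page_size,
    }


fielddata_body = build_fielddata_request(["user_id", "timestamp"])
print(json.dumps(fielddata_body, indent=2))
```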