Fix binary doc values fetching in _search #29567
Binary doc values are retrieved during the DocValueFetchSubPhase through an instance of ScriptDocValues. Since 6.0 ScriptDocValues instances are not allowed to reuse the object that they return (elastic#26775) but BinaryScriptDocValues doesn't follow this restriction and reuses instances of BytesRefBuilder among different documents. This results in `field` values assigned to the wrong document in the response. This commit fixes this issue by recreating the BytesRef for each value that needs to be returned. Fixes elastic#29565
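The aliasing problem described above can be illustrated with a minimal, self-contained sketch. It does not use the Elasticsearch or Lucene classes: `Bytes` is a hypothetical stand-in for a mutable byte holder, and `fetchReusing` / `fetchCopying` are illustrative names for "return the shared object" versus "create a fresh object per value".

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mutable byte container (not Lucene's BytesRef).
final class Bytes {
    byte[] data;
    Bytes(byte[] data) { this.data = data; }
    String utf8() { return new String(data, StandardCharsets.UTF_8); }
}

final class DocValuesReuseSketch {
    // Buggy pattern: a single holder is overwritten for each document,
    // so every collected hit ends up pointing at the last value.
    static List<Bytes> fetchReusing(String[] perDocValues) {
        Bytes shared = new Bytes(new byte[0]);
        List<Bytes> hits = new ArrayList<>();
        for (String value : perDocValues) {
            shared.data = value.getBytes(StandardCharsets.UTF_8);
            hits.add(shared); // all hits alias the same object
        }
        return hits;
    }

    // Fixed pattern: a new holder is created for each returned value.
    static List<Bytes> fetchCopying(String[] perDocValues) {
        List<Bytes> hits = new ArrayList<>();
        for (String value : perDocValues) {
            hits.add(new Bytes(value.getBytes(StandardCharsets.UTF_8)));
        }
        return hits;
    }

    // Helper that renders the collected hits for comparison.
    static List<String> utf8(List<Bytes> hits) {
        List<String> out = new ArrayList<>();
        for (Bytes b : hits) {
            out.add(b.utf8());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] docs = {"a", "b", "c"};
        System.out.println(utf8(fetchReusing(docs))); // [c, c, c]
        System.out.println(utf8(fetchCopying(docs))); // [a, b, c]
    }
}
```

In the reusing variant, values are only read after all documents have been visited, which is roughly the situation of a fetch phase that collects hits before serializing the response.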
Pinging @elastic/es-search-aggs
LGTM
This approach is going to require additional BytesRef copies for strings. Maybe instead replace the `get` calls in `ScriptDocValues.BytesRefs` with a `toBytesRef`?
You already merged it, which is fine. I just started looking into it, and I wonder if we should, instead of having a […]
This is what the PR did initially, but I asked Jim to change it because it added one more object allocation for strings. I suspect that object allocations may be easier to skip thanks to escape analysis this way as well, since we don't put the copies in a long-living array. In the end, I don't feel strongly about it. If you think it's better to change back to performing a deep copy of the BytesRef, I'm good with it.
I wonder if we still need to have a shared version and instead can hold a […]
This would save one copy for the […]
++
This commit refactors ScriptDocValues.Strings to directly create String objects instead of using an intermediate BytesRef copy. ScriptDocValues.Binary is also changed to create a single copy of the BytesRef per consumed value. Relates elastic#29567
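The trade-off discussed in this thread (copying the bytes first versus producing the String directly) can be sketched without any Lucene types. The class and method names below are hypothetical; the shared `byte[]` merely plays the role of a reused buffer, and this is not the code from the follow-up commit.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class DirectStringSketch {
    // Copies the slice into a fresh byte[] first, then decodes the copy.
    static String viaIntermediateCopy(byte[] shared, int offset, int length) {
        byte[] copy = Arrays.copyOfRange(shared, offset, offset + length);
        return new String(copy, StandardCharsets.UTF_8);
    }

    // Decodes straight from the shared buffer; the resulting String holds
    // its own data, so later writes to the buffer cannot change it.
    static String direct(byte[] shared, int offset, int length) {
        return new String(shared, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] shared = "doc-1".getBytes(StandardCharsets.UTF_8);
        String first = direct(shared, 0, shared.length);
        shared[4] = (byte) '2'; // simulate the buffer being reused for the next document
        String second = direct(shared, 0, shared.length);
        System.out.println(first + " " + second); // doc-1 doc-2
    }
}
```

Both methods return a value that is safe to keep across documents; the direct variant simply skips the explicit intermediate array.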
* master:
  Remove the index thread pool (#29556)
  Remove extra copy in ScriptDocValues.Strings
  Fix full cluster restart test recovery (#29545)
  Fix binary doc values fetching in _search (#29567)
  Mutes failing MovAvgIT tests
  Fix the assertion message for an incorrect current version. (#29572)
  Fix the version ID for v5.6.10. (#29570)
  Painless Spec Documentation Clean Up (#29441)
  Add versions 5.6.10 and 6.2.5
  [TEST] test against scaled value instead of fixed epsilon in MovAvgIT
  Remove `flatSettings` support from request classes (#29560)
  MapperService to wrap a single DocumentMapper. (#29511)
  Fix dependency checks on libs when generating Eclipse configuration. (#29550)
  Add null_value support to geo_point type (#29451)
  Add documentation about the include_type_name option. (#29555)
  Enforce translog access via engine (#29542)