Skip to content

Fix binary doc values fetching in _search #29567

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 18, 2018

Conversation

jimczi
Copy link
Contributor

@jimczi jimczi commented Apr 17, 2018

Binary doc values are retrieved during the DocValueFetchSubPhase through an instance of ScriptDocValues.
Since 6.0 ScriptDocValues instances are not allowed to reuse the object that they return
(#26775) but BinaryScriptDocValues doesn't follow this
restriction and reuses instances of BytesRefBuilder among different documents.
This results in field values assigned to the wrong document in the response.
This commit fixes this issue by recreating the BytesRef for each value that needs to be returned.

Fixes #29565

 Binary doc values are retrieved during the DocValueFetchSubPhase through an instance of ScriptDocValues.
 Since 6.0 ScriptDocValues instances are not allowed to reuse the object that they return
 (elastic#26775) but BinaryScriptDocValues doesn't follow this
 restriction and reuses instances of BytesRefBuilder among different documents.
 This results in `field` values assigned to the wrong document in the response.
 This commit fixes this issue by recreating the BytesRef for each value that needs to be returned.

 Fixes elastic#29565
@jimczi jimczi added >bug :Search/Search Search-related issues that do not fall into other categories v7.0.0 v6.3.0 labels Apr 17, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach is going to require additional BytesRef copies for strings. Maybe instead replace the get calls in ScriptDocValues.BytesRefs with a toBytesRef?

@jimczi
Copy link
Contributor Author

jimczi commented Apr 18, 2018

Thanks @rjernst @jpountz
I pushed a commit that removes the additional copy for strings.

@jimczi jimczi merged commit a7c9857 into elastic:master Apr 18, 2018
@jimczi jimczi deleted the bug/binary_doc_values_reuse branch April 18, 2018 11:01
@s1monw
Copy link
Contributor

s1monw commented Apr 18, 2018

you already merged it which is fine, I just started looking into it and I wonder if we should instead of having a BytesRefBuilder that we copy into from SortedBinaryDocValues and then copy again when it's consumed. Wouldn't it be simpler to hold on BytesRef directly that we deepCopy from the incoming values which we do anyway somehow? This would safe on time of copying? @jimczi @jpountz WDYT

@jpountz
Copy link
Contributor

jpountz commented Apr 18, 2018

This is what the PR did initially but I asked Jim to change it because it added one more object allocation for strings. I suspect that object allocations may be easier to skip thanks to escape analysis this way as well, since we don't put the copies in a long-living array. In the end, I don't feel strongly about it, if you think it's better to change back to performing a deep copy of the BytesRef, I'm good with it.

@s1monw
Copy link
Contributor

s1monw commented Apr 18, 2018

I wonder if we still need to have a shared version and instead can hold a String[] that way we can skip all copies to BytesRef for strings and convert to string on nextDoc?

@jimczi
Copy link
Contributor Author

jimczi commented Apr 18, 2018

This would save one copy for the keyword case and would not affect binary .
+1 I can open a new pr for this.

@s1monw
Copy link
Contributor

s1monw commented Apr 18, 2018

+1 I can open a new pr for this.

++

jimczi added a commit to jimczi/elasticsearch that referenced this pull request Apr 18, 2018
This commit refactors ScriptDocValues.Strings to directly creates String objects
instead of using an intermediate BytesRef's copy.
ScriptDocValues.Binary is also changed to create a single copy of BytesRef per consumed value.

Relates elastic#29567
jimczi added a commit that referenced this pull request Apr 18, 2018
This commit removes a BytesRef copy introduced in #29567 and not
required.

Relates #29567
jimczi added a commit that referenced this pull request Apr 18, 2018
Binary doc values are retrieved during the DocValueFetchSubPhase through an instance of ScriptDocValues.
Since 6.0 ScriptDocValues instances are not allowed to reuse the object that they return (#26775) but BinaryScriptDocValues doesn't follow this restriction and reuses instances of BytesRefBuilder among different documents.
This results in `field` values assigned to the wrong document in the response.
This commit fixes this issue by recreating the BytesRef for each value that needs to be returned.

 Fixes #29565
jimczi added a commit that referenced this pull request Apr 18, 2018
This commit removes a BytesRef copy introduced in #29567 and not
required.

Relates #29567
dnhatn added a commit that referenced this pull request Apr 18, 2018
* master:
  Remove the index thread pool (#29556)
  Remove extra copy in ScriptDocValues.Strings
  Fix full cluster restart test recovery (#29545)
  Fix binary doc values fetching in _search (#29567)
  Mutes failing MovAvgIT tests
  Fix the assertion message for an incorrect current version. (#29572)
  Fix the version ID for v5.6.10. (#29570)
  Painless Spec Documentation Clean Up (#29441)
  Add versions 5.6.10 and 6.2.5
  [TEST] test against scaled value instead of fixed epsilon in MovAvgIT
  Remove `flatSettings` support from request classes (#29560)
  MapperService to wrap a single DocumentMapper. (#29511)
  Fix dependency checks on libs when generating Eclipse configuration. (#29550)
  Add null_value support to geo_point type (#29451)
  Add documentation about the include_type_name option. (#29555)
  Enforce translog access via engine (#29542)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories v6.3.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Doc value get indexed into the wrong document
6 participants