Skip to content

Changes DocValueFieldsFetchSubPhase to reuse doc values iterators for multiple hits #25644

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jul 12, 2017
Merged

Conversation

colings86
Copy link
Contributor

Closes #24986

@colings86 colings86 added :Search/Search Search-related issues that do not fall into other categories >non-issue review v6.0.0 labels Jul 11, 2017
@colings86 colings86 self-assigned this Jul 11, 2017
@colings86 colings86 requested a review from jpountz July 11, 2017 09:58
@@ -37,8 +41,44 @@
*/
public final class DocValueFieldsFetchSubPhase implements FetchSubPhase {

// @Override
// public void hitExecute(SearchContext context, HitContext hitContext) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this commented code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, only leaving it here until I ensure the tests pass so I have a reference of the old code

if (context.docValueFieldsContext() == null) {
return;
}

Arrays.sort(hits, (a, b) -> Integer.compare(a.docId(), b.docId()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this also changing the order in which the hits are being serialized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I pushed a change just before you commented :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, thx!

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good to me as-is but I left some suggestions of improvements.

if (context.docValueFieldsContext() == null) {
return;
}

hits = hits.clone(); // don't modify the incoming hits
Arrays.sort(hits, (a, b) -> Integer.compare(a.docId(), b.docId()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matter of taste but I tend to like method refs better, ie. Arrays.sort(hits, Comparators.comparing(SearchHit::docId))

for (SearchHit hit : hits) {
int readerIndex = ReaderUtil.subIndex(hit.docId(), context.searcher().getIndexReader().leaves());
// if the reader index has changed we need to get a new doc values reader instance
if (readerIndex != currentReaderIndex) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could do if (subReaderContext == null || hit.docId() >= subReaderContext.docBase + subReaderContext.reader().maxDoc()) to avoid doing the ReaderUtil.subIndex binary search for every doc

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the problem currently is that ScriptDocValues may reuse the values internally so reusing the same ScriptDocValues inside a segment is not allowed.
One solution is to change all ScriptDocValues to never reuse the values internally but I think we should do the other way around and make the DocValuesFieldsFetchSubPhase clone the values of the ScriptDocValues for each docID. I think it's better to do it this way since the fetch sub phase is not supposed to hit many documents whereas the aggregation that uses the ScriptDocValues will hit them all ?

@jpountz
Copy link
Contributor

jpountz commented Jul 11, 2017

One solution is to change all ScriptDocValues to never reuse the values internally but I think we should do the other way around and make the DocValuesFieldsFetchSubPhase clone the values of the ScriptDocValues for each docID.

I was thinking the same until I realized that both numbers and strings do not reuse, even though they are probably the most common types one would use in scripts. On the other hand, dates and geo points reuse objects even though they are probably less commonly used in scripts. Maybe we should just align them with strings and numbers?

@colings86
Copy link
Contributor Author

My latest commit changes dates and geo points to not reuse objects.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimczi What do you think?

@@ -340,7 +340,7 @@ public void setNextDocId(int docId) throws IOException {
resize(in.docValueCount());
for (int i = 0; i < count; i++) {
GeoPoint point = in.nextValue();
values[i].reset(point.lat(), point.lon());
values[i] = new GeoPoint(point.lat(), point.lon());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether we should keep things this way here and do the cloning in get(int index)/getValue() to help GC by having even shorter lived objects, and potentially make escape analysis more likely to not ever create those objects.

@jimczi
Copy link
Contributor

jimczi commented Jul 12, 2017

On the other hand, dates and geo points reuse objects even though they are probably less commonly used in scripts. Maybe we should just align them with strings and numbers?

Sure that's fine. Your last comment regarding GC is also a solution, we could not reuse objects and make sure that we don't create them when it's not needed (lazy creation on get).

My latest commit changes dates and geo points to not reuse objects.

I think the BinaryScriptDocValues reuses the BytesRef as well so it needs to cloning too ?

@jimczi
Copy link
Contributor

jimczi commented Jul 12, 2017

We don't use the BinaryScriptDocValues directly to retrieve doc values so it should be fine. Though I am not sure that it won't be a problem later so I think it would be good to clearly mark the intention in the javadocs. I think it's dangerous to rely on the fact that some ScriptDocValues can reuse and some can't.

@jpountz
Copy link
Contributor

jpountz commented Jul 12, 2017

Though I am not sure that it won't be a problem later so I think it would be good to clearly mark the intention in the javadocs. I think it's dangerous to rely on the fact that some ScriptDocValues can reuse and some can't.

+1

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@colings86
Copy link
Contributor Author

Thanks @jimczi, there are still some failing rest tests that I am working through so might ping for review again if fixing those gets complex enough to warrant a review

@colings86 colings86 merged commit 55a157e into elastic:master Jul 12, 2017
@colings86 colings86 deleted the fix/24986 branch July 12, 2017 12:03
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jul 12, 2017
* master:
  Fix inadvertent rename of systemd tests
  Adding basic search request documentation for high level client (elastic#25651)
  Disallow lang to be used with Stored Scripts (elastic#25610)
  Fix typo in ScriptDocValues deprecation warnings (elastic#25672)
  Changes DocValueFieldsFetchSubPhase to reuse doc values iterators for multiple hits (elastic#25644)
  Query range fields by doc values when they are expected to be more efficient than points.
  Remove SearchHit#internalHits (elastic#25653)
  [DOCS] Reorganized the highlighting topic so it's less confusing.
jasontedor added a commit to fred84/elasticsearch that referenced this pull request Jul 12, 2017
* master: (181 commits)
  Use a non default port range in MockTransportService
  Add a shard filter search phase to pre-filter shards based on query rewriting (elastic#25658)
  Prevent excessive disk consumption by log files
  Migrate RestHttpResponseHeadersIT to ESRestTestCase (elastic#25675)
  Use config directory to find jvm.options
  Fix inadvertent rename of systemd tests
  Adding basic search request documentation for high level client (elastic#25651)
  Disallow lang to be used with Stored Scripts (elastic#25610)
  Fix typo in ScriptDocValues deprecation warnings (elastic#25672)
  Changes DocValueFieldsFetchSubPhase to reuse doc values iterators for multiple hits (elastic#25644)
  Query range fields by doc values when they are expected to be more efficient than points.
  Remove SearchHit#internalHits (elastic#25653)
  [DOCS] Reorganized the highlighting topic so it's less confusing.
  Add an underscore to flood stage setting
  Avoid failing install if system-sysctl is masked
  Add another parent value option to join documentation (elastic#25609)
  Ensure we rewrite common queries to `match_none` if possible (elastic#25650)
  Remove reference to field-stats docs.
  Optimize the order of bytes in uuids for better compression. (elastic#24615)
  Fix BytesReferenceStreamInput#skip with offset (elastic#25634)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Search/Search Search-related issues that do not fall into other categories v6.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants