Changes DocValueFieldsFetchSubPhase to reuse doc values iterators for multiple hits #25644

colings86 · 2017-07-11T09:58:26Z

martijnvg · 2017-07-11T10:06:03Z

core/src/main/java/org/elasticsearch/search/fetch/subphase/DocValueFieldsFetchSubPhase.java

@@ -37,8 +41,44 @@
 */
 public final class DocValueFieldsFetchSubPhase implements FetchSubPhase {

+//    @Override
+//    public void hitExecute(SearchContext context, HitContext hitContext) throws IOException {


Remove this commented-out code?

Yep, only leaving it here until I ensure the tests pass so I have a reference of the old code

martijnvg · 2017-07-11T10:07:06Z

core/src/main/java/org/elasticsearch/search/fetch/subphase/DocValueFieldsFetchSubPhase.java

        if (context.docValueFieldsContext() == null) {
            return;
        }
+
+        Arrays.sort(hits, (a, b) -> Integer.compare(a.docId(), b.docId()));


Isn't this also changing the order in which the hits are serialized?

Yep, I pushed a change just before you commented :)

jpountz

The PR looks good to me as-is but I left some suggestions of improvements.

jpountz · 2017-07-11T12:01:34Z

core/src/main/java/org/elasticsearch/search/fetch/subphase/DocValueFieldsFetchSubPhase.java

        if (context.docValueFieldsContext() == null) {
            return;
        }
+
+        hits = hits.clone(); // don't modify the incoming hits
+        Arrays.sort(hits, (a, b) -> Integer.compare(a.docId(), b.docId()));


Matter of taste, but I tend to like method refs better, i.e. Arrays.sort(hits, Comparator.comparingInt(SearchHit::docId))
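
For readers following along, the clone-then-sort step discussed above can be sketched in isolation. This is only an illustration: `Hit` and `SortedHits` are hypothetical stand-ins (not Elasticsearch classes), and the only thing assumed about a hit is that it exposes an int `docId()`.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for SearchHit; only docId() matters for this sketch.
final class Hit {
    private final int docId;

    Hit(int docId) {
        this.docId = docId;
    }

    int docId() {
        return docId;
    }
}

final class SortedHits {
    // Returns a copy of the hits ordered by doc ID.
    // The caller's array keeps its original (serialization) order.
    static Hit[] sortedByDocId(Hit[] hits) {
        Hit[] copy = hits.clone(); // shallow copy: new array, same Hit objects
        Arrays.sort(copy, Comparator.comparingInt(Hit::docId));
        return copy;
    }
}
```

Because the copy is shallow, anything written onto a hit while iterating the sorted copy is still visible through the original array, which is what lets the sub phase fill in fields without reordering the response.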

jpountz · 2017-07-11T12:07:26Z

core/src/main/java/org/elasticsearch/search/fetch/subphase/DocValueFieldsFetchSubPhase.java

+                for (SearchHit hit : hits) {
+                    int readerIndex = ReaderUtil.subIndex(hit.docId(), context.searcher().getIndexReader().leaves());
+                    // if the reader index has changed we need to get a new doc values reader instance
+                    if (readerIndex != currentReaderIndex) {


you could do if (subReaderContext == null || hit.docId() >= subReaderContext.docBase + subReaderContext.reader().maxDoc()) to avoid doing the ReaderUtil.subIndex binary search for every doc
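
A rough sketch of that boundary check, assuming the hits have already been sorted by doc ID. `Leaf`, `LeafWalker`, and `findLeaf` are hypothetical names for illustration; `findLeaf` plays the role that `ReaderUtil.subIndex` plays in the diff, and `Leaf` mimics only the `docBase` and `maxDoc` of a segment.

```java
import java.util.List;

// Hypothetical stand-in for a segment covering global doc IDs
// in the range [docBase, docBase + maxDoc).
final class Leaf {
    final int docBase;
    final int maxDoc;

    Leaf(int docBase, int maxDoc) {
        this.docBase = docBase;
        this.maxDoc = maxDoc;
    }
}

final class LeafWalker {
    // Walks doc IDs in ascending order and returns how many times the
    // segment had to be looked up with a binary search.
    static int countLeafLookups(int[] sortedDocIds, List<Leaf> leaves) {
        Leaf current = null;
        int lookups = 0;
        for (int docId : sortedDocIds) {
            // Only search again once the doc ID passes the end of the current segment.
            if (current == null || docId >= current.docBase + current.maxDoc) {
                current = findLeaf(docId, leaves);
                lookups++;
            }
            // At this point docId - current.docBase is the segment-local doc ID.
        }
        return lookups;
    }

    // Binary search for the last leaf whose docBase is <= docId.
    static Leaf findLeaf(int docId, List<Leaf> leaves) {
        int lo = 0;
        int hi = leaves.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (leaves.get(mid).docBase <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return leaves.get(lo);
    }
}
```

With ascending doc IDs, the search runs once per segment that actually contains a hit rather than once per hit; the check would not be valid on unsorted input.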

jimczi

The problem currently is that ScriptDocValues may reuse its value objects internally, so reusing the same ScriptDocValues instance across hits inside a segment is not safe.
One solution is to change all ScriptDocValues to never reuse the values internally but I think we should do the other way around and make the DocValueFieldsFetchSubPhase clone the values of the ScriptDocValues for each docID. I think it's better to do it this way since the fetch sub phase is not supposed to hit many documents, whereas an aggregation that uses ScriptDocValues will hit them all.
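
To make the hazard concrete, here is a small self-contained sketch. Everything in it is hypothetical (`Point`, `ReusingPoints`, `ReuseDemo` are not Elasticsearch classes); it only assumes a values source that hands back the same mutable object for every document, which is the behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable value type, loosely modelled on a geo point.
final class Point {
    double lat;
    double lon;

    Point(double lat, double lon) {
        this.lat = lat;
        this.lon = lon;
    }

    Point reset(double lat, double lon) {
        this.lat = lat;
        this.lon = lon;
        return this;
    }
}

// Hypothetical doc-values view that reuses one scratch object for all documents.
final class ReusingPoints {
    private final double[][] data;
    private final Point scratch = new Point(0, 0);

    ReusingPoints(double[][] data) {
        this.data = data;
    }

    Point valueFor(int docId) {
        return scratch.reset(data[docId][0], data[docId][1]);
    }
}

final class ReuseDemo {
    // Collects one value per document. Without copying, every list entry is the
    // same scratch object and ends up holding the last document's value.
    static List<Point> collect(ReusingPoints values, int docCount, boolean copy) {
        List<Point> out = new ArrayList<>();
        for (int doc = 0; doc < docCount; doc++) {
            Point p = values.valueFor(doc);
            out.add(copy ? new Point(p.lat, p.lon) : p);
        }
        return out;
    }
}
```

The `copy` flag corresponds to the "clone in the fetch sub phase" option; the alternative raised in this thread is to make the values source itself stop reusing objects.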

jpountz · 2017-07-11T16:26:11Z

One solution is to change all ScriptDocValues to never reuse the values internally but I think we should do the other way around and make the DocValueFieldsFetchSubPhase clone the values of the ScriptDocValues for each docID.

I was thinking the same until I realized that both numbers and strings do not reuse, even though they are probably the most common types one would use in scripts. On the other hand, dates and geo points reuse objects even though they are probably less commonly used in scripts. Maybe we should just align them with strings and numbers?

colings86 · 2017-07-12T07:55:54Z

My latest commit changes dates and geo points to not reuse objects.

jpountz

@jimczi What do you think?

jpountz · 2017-07-12T08:12:45Z

core/src/main/java/org/elasticsearch/index/fielddata/ScriptDocValues.java

@@ -340,7 +340,7 @@ public void setNextDocId(int docId) throws IOException {
                resize(in.docValueCount());
                for (int i = 0; i < count; i++) {
                    GeoPoint point = in.nextValue();
-                    values[i].reset(point.lat(), point.lon());
+                    values[i] = new GeoPoint(point.lat(), point.lon());


I'm wondering whether we should keep things this way here and do the cloning in get(int index)/getValue() to help GC by having even shorter lived objects, and potentially make escape analysis more likely to not ever create those objects.
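
One way to read this suggestion, as a sketch only: keep reusing internal scratch objects when advancing to a document, and hand out a fresh copy from the accessor. `GeoPt` and `LazyCopyGeoPoints` are hypothetical names, and `setNextDoc(double[][])` is a simplified stand-in for `setNextDocId(int)`; this is not the code that was merged.

```java
import java.util.Arrays;

// Hypothetical stand-in for a mutable geo point.
final class GeoPt {
    private double lat;
    private double lon;

    GeoPt(double lat, double lon) {
        this.lat = lat;
        this.lon = lon;
    }

    GeoPt reset(double lat, double lon) {
        this.lat = lat;
        this.lon = lon;
        return this;
    }

    double lat() {
        return lat;
    }

    double lon() {
        return lon;
    }
}

final class LazyCopyGeoPoints {
    private GeoPt[] values = new GeoPt[0];
    private int count;

    // Loads the current document's points into reused scratch objects,
    // so advancing to a document allocates only when the array must grow.
    void setNextDoc(double[][] points) {
        if (points.length > values.length) {
            int oldLength = values.length;
            values = Arrays.copyOf(values, points.length);
            for (int i = oldLength; i < values.length; i++) {
                values[i] = new GeoPt(0, 0);
            }
        }
        count = points.length;
        for (int i = 0; i < count; i++) {
            values[i].reset(points[i][0], points[i][1]);
        }
    }

    // Copies on access: callers get an object they can safely keep, and
    // documents whose values are never read cost no extra allocation.
    GeoPt get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index + " out of range for " + count + " values");
        }
        GeoPt p = values[index];
        return new GeoPt(p.lat(), p.lon());
    }
}
```

The trade-off is that a caller reading the same value repeatedly allocates each time; whether the JIT's escape analysis removes those allocations depends on the call site and is not guaranteed.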

jimczi · 2017-07-12T08:22:35Z

On the other hand, dates and geo points reuse objects even though they are probably less commonly used in scripts. Maybe we should just align them with strings and numbers?

Sure, that's fine. Your last comment regarding GC is also a solution: we could stop reusing objects and make sure that we don't create them when they're not needed (lazy creation on get).

My latest commit changes dates and geo points to not reuse objects.

I think the BinaryScriptDocValues reuses the BytesRef as well, so it needs cloning too?

jimczi · 2017-07-12T08:38:49Z

We don't use the BinaryScriptDocValues directly to retrieve doc values so it should be fine. Though I am not sure that it won't be a problem later so I think it would be good to clearly mark the intention in the javadocs. I think it's dangerous to rely on the fact that some ScriptDocValues can reuse and some can't.

jpountz · 2017-07-12T09:16:59Z

Though I am not sure that it won't be a problem later so I think it would be good to clearly mark the intention in the javadocs. I think it's dangerous to rely on the fact that some ScriptDocValues can reuse and some can't.

+1

jimczi

LGTM

colings86 · 2017-07-12T10:59:54Z

Thanks @jimczi. There are still some failing REST tests that I am working through, so I might ping for review again if fixing those gets complex enough to warrant one.

* master:
  Fix inadvertent rename of systemd tests
  Adding basic search request documentation for high level client (elastic#25651)
  Disallow lang to be used with Stored Scripts (elastic#25610)
  Fix typo in ScriptDocValues deprecation warnings (elastic#25672)
  Changes DocValueFieldsFetchSubPhase to reuse doc values iterators for multiple hits (elastic#25644)
  Query range fields by doc values when they are expected to be more efficient than points.
  Remove SearchHit#internalHits (elastic#25653)
  [DOCS] Reorganized the highlighting topic so it's less confusing.

* master: (181 commits)
  Use a non default port range in MockTransportService
  Add a shard filter search phase to pre-filter shards based on query rewriting (elastic#25658)
  Prevent excessive disk consumption by log files
  Migrate RestHttpResponseHeadersIT to ESRestTestCase (elastic#25675)
  Use config directory to find jvm.options
  Fix inadvertent rename of systemd tests
  Adding basic search request documentation for high level client (elastic#25651)
  Disallow lang to be used with Stored Scripts (elastic#25610)
  Fix typo in ScriptDocValues deprecation warnings (elastic#25672)
  Changes DocValueFieldsFetchSubPhase to reuse doc values iterators for multiple hits (elastic#25644)
  Query range fields by doc values when they are expected to be more efficient than points.
  Remove SearchHit#internalHits (elastic#25653)
  [DOCS] Reorganized the highlighting topic so it's less confusing.
  Add an underscore to flood stage setting
  Avoid failing install if system-sysctl is masked
  Add another parent value option to join documentation (elastic#25609)
  Ensure we rewrite common queries to `match_none` if possible (elastic#25650)
  Remove reference to field-stats docs.
  Optimize the order of bytes in uuids for better compression. (elastic#24615)
  Fix BytesReferenceStreamInput#skip with offset (elastic#25634)
  ...

colings86 added :Search/Search, >non-issue, review, v6.0.0 labels Jul 11, 2017

colings86 self-assigned this Jul 11, 2017

colings86 requested a review from jpountz July 11, 2017 09:58

martijnvg reviewed Jul 11, 2017

jpountz approved these changes Jul 11, 2017

jimczi reviewed Jul 11, 2017

jpountz approved these changes Jul 12, 2017

colings86 added 4 commits July 12, 2017 10:51

Changes DocValueFieldsFetchSubPhase to reuse doc values iterators for multiple hits. Closes #24986

d0e9038

iter

e0e9e74

Update ScriptDocValues to not reuse GeoPoint and Date objects

ec37b53

added Javadoc about script value re-use

c8efa14

jimczi approved these changes Jul 12, 2017

colings86 merged commit 55a157e into elastic:master Jul 12, 2017

colings86 deleted the fix/24986 branch July 12, 2017 12:03

clintongormley added v6.0.0-beta1 and removed v6.0.0 labels Jul 25, 2017

jpountz mentioned this pull request Sep 25, 2017

ScriptFieldsFetchSubPhase should create search scripts once per segment #26775

Closed

jimczi mentioned this pull request Nov 14, 2017

Reduce synchronization on field data cache #27365

Merged
