Changes DocValueFieldsFetchSubPhase to reuse doc values iterators for multiple hits #25644
@@ -37,8 +41,44 @@
 */
public final class DocValueFieldsFetchSubPhase implements FetchSubPhase {

    // @Override
    // public void hitExecute(SearchContext context, HitContext hitContext) throws IOException {
remove this commented code?
Yep, only leaving it here until I ensure the tests pass so I have a reference of the old code
if (context.docValueFieldsContext() == null) {
    return;
}

Arrays.sort(hits, (a, b) -> Integer.compare(a.docId(), b.docId()));
isn't this also changing the order in which the hits are being serialized?
Yep, I pushed a change just before you commented :)
cool, thx!
The PR looks good to me as-is but I left some suggestions of improvements.
if (context.docValueFieldsContext() == null) {
    return;
}

hits = hits.clone(); // don't modify the incoming hits
Arrays.sort(hits, (a, b) -> Integer.compare(a.docId(), b.docId()));
Matter of taste, but I tend to like method refs better, i.e. Arrays.sort(hits, Comparator.comparingInt(SearchHit::docId))
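To make the clone-then-sort suggestion concrete, here is a minimal, self-contained sketch. `Hit` is a hypothetical stand-in for `SearchHit` (only `docId()` is modelled) and `SortHitsSketch` is an illustrative name, not code from this PR.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for SearchHit; only docId() matters for this sketch.
final class Hit {
    private final int docId;

    Hit(int docId) {
        this.docId = docId;
    }

    int docId() {
        return docId;
    }
}

final class SortHitsSketch {
    /** Returns a copy of hits ordered by doc ID; the caller's array keeps its order. */
    static Hit[] sortedByDocId(Hit[] hits) {
        Hit[] copy = hits.clone(); // shallow copy, so the incoming order is preserved
        Arrays.sort(copy, Comparator.comparingInt(Hit::docId));
        return copy;
    }
}
```

Sorting a copy keeps the original array in score order for serialization, while the sorted copy can be walked segment by segment.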
for (SearchHit hit : hits) {
    int readerIndex = ReaderUtil.subIndex(hit.docId(), context.searcher().getIndexReader().leaves());
    // if the reader index has changed we need to get a new doc values reader instance
    if (readerIndex != currentReaderIndex) {
you could do if (subReaderContext == null || hit.docId() >= subReaderContext.docBase + subReaderContext.reader().maxDoc()) to avoid doing the ReaderUtil.subIndex binary search for every doc
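A rough sketch of that idea, without any Lucene dependency: `Leaf` is a hypothetical stand-in for a leaf reader context (just `docBase` and `maxDoc`), and `subIndex` here is a hand-written binary search playing the role of `ReaderUtil.subIndex`. It assumes the doc IDs arrive sorted ascending, as they do after the sort above.

```java
final class LeafLookupSketch {
    /** Hypothetical leaf covering global doc IDs [docBase, docBase + maxDoc). */
    static final class Leaf {
        final int docBase;
        final int maxDoc;

        Leaf(int docBase, int maxDoc) {
            this.docBase = docBase;
            this.maxDoc = maxDoc;
        }
    }

    /** Binary search for the index of the leaf that contains docId. */
    static int subIndex(int docId, Leaf[] leaves) {
        int lo = 0;
        int hi = leaves.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (docId < leaves[mid].docBase) {
                hi = mid - 1;
            } else if (docId >= leaves[mid].docBase + leaves[mid].maxDoc) {
                lo = mid + 1;
            } else {
                return mid;
            }
        }
        throw new IllegalArgumentException("docId out of range: " + docId);
    }

    /** Walks ascending doc IDs and returns how many binary searches were needed. */
    static int countLookups(int[] sortedDocIds, Leaf[] leaves) {
        Leaf current = null;
        int lookups = 0;
        for (int docId : sortedDocIds) {
            // Only search again once the doc ID moves past the end of the current leaf.
            if (current == null || docId >= current.docBase + current.maxDoc) {
                current = leaves[subIndex(docId, leaves)];
                lookups++;
            }
            // The real code would read doc values for (docId - current.docBase) here.
        }
        return lookups;
    }
}
```

With sorted input the search runs at most once per leaf that contains a hit, rather than once per hit.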
So the problem currently is that ScriptDocValues may reuse the values internally so reusing the same ScriptDocValues inside a segment is not allowed.
One solution is to change all ScriptDocValues to never reuse the values internally, but I think we should do it the other way around and make the DocValueFieldsFetchSubPhase clone the values of the ScriptDocValues for each docID. I think it's better to do it this way since the fetch sub phase is not supposed to hit many documents, whereas the aggregation that uses the ScriptDocValues will hit them all?
I was thinking the same until I realized that both numbers and strings do not reuse, even though they are probably the most common types one would use in scripts. On the other hand, dates and geo points reuse objects even though they are probably less commonly used in scripts. Maybe we should just align them with strings and numbers?
My latest commit changes dates and geo points to not reuse objects.
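The hazard under discussion can be shown with a small sketch. `Point` is a hypothetical mutable value loosely modelled on a geo point with a `reset(lat, lon)` method; none of these names come from the Elasticsearch code base.

```java
import java.util.ArrayList;
import java.util.List;

final class ReuseHazardSketch {
    /** Hypothetical mutable point with a reset method. */
    static final class Point {
        double lat;
        double lon;

        Point(double lat, double lon) {
            this.lat = lat;
            this.lon = lon;
        }

        Point reset(double lat, double lon) {
            this.lat = lat;
            this.lon = lon;
            return this;
        }
    }

    /** Reuses one Point for every document, so earlier results get overwritten. */
    static List<Point> collectReusing(double[][] docs) {
        Point shared = new Point(0, 0);
        List<Point> out = new ArrayList<>();
        for (double[] d : docs) {
            out.add(shared.reset(d[0], d[1])); // every entry is the same object
        }
        return out;
    }

    /** Allocates a fresh Point per document, so each result keeps its own value. */
    static List<Point> collectFresh(double[][] docs) {
        List<Point> out = new ArrayList<>();
        for (double[] d : docs) {
            out.add(new Point(d[0], d[1]));
        }
        return out;
    }
}
```

When one reader instance serves several hits in a segment, the reusing variant leaves every collected hit pointing at the last document's value, which is why either the reader or its caller has to copy.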
@jimczi What do you think?
@@ -340,7 +340,7 @@ public void setNextDocId(int docId) throws IOException {
     resize(in.docValueCount());
     for (int i = 0; i < count; i++) {
         GeoPoint point = in.nextValue();
-        values[i].reset(point.lat(), point.lon());
+        values[i] = new GeoPoint(point.lat(), point.lon());
I'm wondering whether we should keep things this way here and do the cloning in get(int index)/getValue() to help GC by having even shorter lived objects, and potentially make escape analysis more likely to not ever create those objects.
Sure that's fine. Your last comment regarding GC is also a solution, we could not reuse objects and make sure that we don't create them when it's not needed (lazy creation on get).
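As a sketch of what "lazy creation on get" could look like, assuming a hypothetical `PointValues` holder (this is not the actual ScriptDocValues API): the internal scratch object is still reused when advancing to the next document, and a copy is only allocated when a caller asks for the value.

```java
final class CopyOnGetSketch {
    /** Hypothetical mutable point. */
    static final class Point {
        double lat;
        double lon;

        Point(double lat, double lon) {
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Hypothetical doc-values view: storage is reused, get() hands out copies. */
    static final class PointValues {
        private final Point scratch = new Point(0, 0);

        /** Advances to the next document by overwriting the shared scratch object. */
        void setNextDoc(double lat, double lon) {
            scratch.lat = lat;
            scratch.lon = lon;
        }

        /** Allocates only when the value is actually requested. */
        Point get() {
            return new Point(scratch.lat, scratch.lon);
        }
    }
}
```

Documents whose value is never read cost no allocation, and the copies that are made tend to be short lived.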
I think the BinaryScriptDocValues reuses the BytesRef as well, so it needs cloning too?
We don't use the BinaryScriptDocValues directly to retrieve doc values so it should be fine. Though I am not sure that it won't be a problem later, so I think it would be good to clearly mark the intention in the javadocs. I think it's dangerous to rely on the fact that some ScriptDocValues can reuse and some can't.
+1
LGTM
Thanks @jimczi, there are still some failing REST tests that I am working through, so I might ping for review again if fixing those gets complex enough to warrant a review.
* master:
  Fix inadvertent rename of systemd tests
  Adding basic search request documentation for high level client (elastic#25651)
  Disallow lang to be used with Stored Scripts (elastic#25610)
  Fix typo in ScriptDocValues deprecation warnings (elastic#25672)
  Changes DocValueFieldsFetchSubPhase to reuse doc values iterators for multiple hits (elastic#25644)
  Query range fields by doc values when they are expected to be more efficient than points.
  Remove SearchHit#internalHits (elastic#25653)
  [DOCS] Reorganized the highlighting topic so it's less confusing.
* master: (181 commits)
  Use a non default port range in MockTransportService
  Add a shard filter search phase to pre-filter shards based on query rewriting (elastic#25658)
  Prevent excessive disk consumption by log files
  Migrate RestHttpResponseHeadersIT to ESRestTestCase (elastic#25675)
  Use config directory to find jvm.options
  Fix inadvertent rename of systemd tests
  Adding basic search request documentation for high level client (elastic#25651)
  Disallow lang to be used with Stored Scripts (elastic#25610)
  Fix typo in ScriptDocValues deprecation warnings (elastic#25672)
  Changes DocValueFieldsFetchSubPhase to reuse doc values iterators for multiple hits (elastic#25644)
  Query range fields by doc values when they are expected to be more efficient than points.
  Remove SearchHit#internalHits (elastic#25653)
  [DOCS] Reorganized the highlighting topic so it's less confusing.
  Add an underscore to flood stage setting
  Avoid failing install if system-sysctl is masked
  Add another parent value option to join documentation (elastic#25609)
  Ensure we rewrite common queries to `match_none` if possible (elastic#25650)
  Remove reference to field-stats docs.
  Optimize the order of bytes in uuids for better compression. (elastic#24615)
  Fix BytesReferenceStreamInput#skip with offset (elastic#25634)
  ...
Closes #24986