-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Coalesce getSortedNumeric calls for ES819 doc values merging #126732
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Coalesce getSortedNumeric calls for ES819 doc values merging #126732
Conversation
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
String addressDataOutputName = null; | ||
try ( | ||
var addressMetaOutput = new ByteBuffersIndexOutput(addressMetaBuffer, "meta-temp", "meta-temp"); | ||
// TODO: which IOContext should be used here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment was in Martijn's initial implementation, and I didn't know the answer, so I left it. I'd appreciate suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dnhatn Do you think usage of IOContext.DEFAULT
is ok here or is there a better IOContext that can be used here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to do something like this: #126499 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good Jordan.
Would you be able to change the ES819TSDBDocValuesFormatTests#testForceMergeDenseCase()
and ES819TSDBDocValuesFormatTests#testForceMergeSparseCase()
tests by also indexing multi valued sorted numeric doc values? For example by randomly indexing gauge_2
field with multiple values? (similar to the tags
field)
String addressDataOutputName = null; | ||
try ( | ||
var addressMetaOutput = new ByteBuffersIndexOutput(addressMetaBuffer, "meta-temp", "meta-temp"); | ||
// TODO: which IOContext should be used here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dnhatn Do you think usage of IOContext.DEFAULT
is ok here or is there a better IOContext that can be used here?
I also re-ran the micro benchmark:
Which looks better as was reported in #125403. UPDATE: Running the same micro benchmark from main branch:
which is relatively slightly slower than what was reported above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating. I left two more comments.
} catch (final IOException ignored) { | ||
// ignore exception | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this try-catch be removed? This method signature does allow IOException
. If addressDataOutputName
is not null, then there should be a temp file, I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
I originally added the try-catch because the draft implementation used org.apache.lucene.util.IOUtils.deleteFilesIgnoringExceptions
here. That's a forbidden api so I couldn't use it, but it seemed like we were trying to suppress any IOException that happened during that deletion, so I added the try-catch.
var addressMetaOutput = new ByteBuffersIndexOutput(addressMetaBuffer, "meta-temp", "meta-temp"); | ||
var addressDataOutput = dir.createTempOutput(data.getName(), "address-data", ioContext) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe similarly to DISIAccumulator
when can encapsulate the accumulation of the addresses in a class that implements Closable and has. a build method that copies data from temp file to actual data file and update metadata?
I think this could make the code a little bit more manageable similar to effect of what DISIAccumulator
did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me, I'll add that
When writing the doc values addresses, we currently perform an iteration over all the sorted numeric doc values to calculate the addresses. When merging sorted segments, this iteration is expensive as it requires performing a merge sort.
This patch removes this iteration by instead calculating the addresses while we are writing the values, writing the addresses addresses to a temporary file. Afterwards, they are copied from the temporary file into the merged segment.
Relates to #126111