Coalesce getSortedNumeric calls for ES819 doc values merging #126732

jordan-powers · 2025-04-11T23:37:25Z

When writing the doc values addresses, we currently perform an iteration over all the sorted numeric doc values to calculate the addresses. When merging sorted segments, this iteration is expensive as it requires performing a merge sort.

This patch removes this iteration by instead calculating the addresses while we are writing the values, writing the addresses to a temporary file. Afterwards, they are copied from the temporary file into the merged segment.
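A minimal sketch of the single-pass idea described above, assuming each address is the running count of values written so far. It uses plain JDK streams rather than Lucene's `IndexOutput`, and the class and method names are hypothetical, not taken from the patch. An in-memory buffer stands in for the temporary file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;

// Hypothetical stand-in for the consumer's write path; not the PR's code.
class SinglePassAddressWriter {

    /**
     * Writes every value of every document to {@code data} and, in the same pass,
     * records the running value count (the "address") after each document.
     * The addresses are buffered separately and appended once all values are written.
     *
     * @return the number of addresses written, which is docs.size() + 1
     */
    static int writeValuesThenAddresses(List<long[]> docs, DataOutputStream data) throws IOException {
        // Plays the role of the temporary file from the description.
        ByteArrayOutputStream tempBytes = new ByteArrayOutputStream();
        int addressCount = 0;
        try (DataOutputStream temp = new DataOutputStream(tempBytes)) {
            long address = 0;
            temp.writeLong(address); // the first document starts at offset 0
            addressCount++;
            for (long[] values : docs) {
                for (long value : values) {
                    data.writeLong(value);
                }
                address += values.length;
                temp.writeLong(address); // start offset of the next document
                addressCount++;
            }
        }
        // Copy the buffered addresses behind the values.
        data.write(tempBytes.toByteArray());
        return addressCount;
    }
}
```

The point of the sketch is that the documents are iterated once. In a sorted merge, where each iteration is itself a merge sort, that is where the saving would come from.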

Relates to #126111


elasticsearchmachine · 2025-04-14T19:58:03Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

jordan-powers · 2025-04-14T19:59:13Z

server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesConsumer.java

+                String addressDataOutputName = null;
+                try (
+                    var addressMetaOutput = new ByteBuffersIndexOutput(addressMetaBuffer, "meta-temp", "meta-temp");
+                    // TODO: which IOContext should be used here?


This comment was in Martijn's initial implementation, and I didn't know the answer, so I left it. I'd appreciate suggestions

@dnhatn Do you think usage of IOContext.DEFAULT is ok here or is there a better IOContext that can be used here?

I think we need to do something like this: #126499 (comment)

martijnvg

This looks good Jordan.

Would you be able to change the ES819TSDBDocValuesFormatTests#testForceMergeDenseCase() and ES819TSDBDocValuesFormatTests#testForceMergeSparseCase() tests to also index multi-valued sorted numeric doc values? For example, by randomly indexing the gauge_2 field with multiple values (similar to the tags field)?

martijnvg · 2025-04-15T08:46:43Z

server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesConsumer.java

+                String addressDataOutputName = null;
+                try (
+                    var addressMetaOutput = new ByteBuffersIndexOutput(addressMetaBuffer, "meta-temp", "meta-temp");
+                    // TODO: which IOContext should be used here?


@dnhatn Do you think usage of IOContext.DEFAULT is ok here or is there a better IOContext that can be used here?

martijnvg · 2025-04-15T08:54:14Z

I also re-ran the micro benchmark:

Benchmark                                                    (deltaTime)   (nDocs)  (seed)  Mode  Cnt     Score   Error  Units
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge            1000  20431204      42    ss       4678.076          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge         1000  20431204      42    ss       7230.848          ms/op

Which looks better than what was reported in #125403.

UPDATE:

Running the same micro benchmark from main branch:

Benchmark                                                    (deltaTime)   (nDocs)  (seed)  Mode  Cnt     Score   Error  Units
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge            1000  20431204      42    ss       5607.886          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge         1000  20431204      42    ss       8397.983          ms/op

which is slightly slower than the results reported above.
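To put the two benchmark runs side by side, here is a small sketch of the arithmetic. The helper names are hypothetical. The runs use single-shot mode (`ss`) and report no error estimate, so the resulting ratios should be read as rough indications only.

```java
// Hypothetical helpers for comparing the ms/op scores quoted in this thread.
class MergeBenchmarkComparison {

    /** Fraction of time saved when going from {@code baselineMs} to {@code candidateMs}. */
    static double relativeReduction(double baselineMs, double candidateMs) {
        return (baselineMs - candidateMs) / baselineMs;
    }

    /** How many times faster {@code fasterMs} is than {@code slowerMs}. */
    static double speedup(double slowerMs, double fasterMs) {
        return slowerMs / fasterMs;
    }
}
```

With the scores above, `relativeReduction(5607.886, 4678.076)` comes out at roughly 0.17 for the optimized merge (main versus this branch). `speedup(7230.848, 4678.076)` is roughly 1.5 for optimized versus non-optimized merging on this branch. The non-optimized score also differs between the two runs, which suggests that part of the gap may be run-to-run noise.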


martijnvg

Thanks for iterating. I left two more comments.

martijnvg · 2025-04-17T08:16:33Z

server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesConsumer.java

+                        } catch (final IOException ignored) {
+                            // ignore exception
+                        }


Can this try-catch be removed? This method signature does allow IOException. If addressDataOutputName is not null, then there should be a temp file, I think?

Will do.
I originally added the try-catch because the draft implementation used org.apache.lucene.util.IOUtils.deleteFilesIgnoringExceptions here. That's a forbidden API, so I couldn't use it, but it seemed like we were trying to suppress any IOException thrown during that deletion, so I added the try-catch.
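For illustration, a sketch of the cleanup shape being discussed, written against `java.nio.file` rather than Lucene's `Directory`. The class and method names are hypothetical. The temp file reference stays null until the file exists, and the delete is not wrapped in a try-catch, so a failure reaches the caller through the method's `throws IOException`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical analogue of the temp-file handling; not the PR's code.
class TempAddressCleanup {

    /** Stages {@code addresses} in a temp file under {@code dir}, copies them to {@code data}, then deletes the temp file. */
    static void writeAddresses(Path dir, byte[] addresses, OutputStream data) throws IOException {
        Path tempFile = null; // stays null if creating the temp file fails
        try {
            tempFile = Files.createTempFile(dir, "address-data", ".tmp");
            Files.write(tempFile, addresses);
            Files.copy(tempFile, data);
        } finally {
            if (tempFile != null) {
                // No try-catch here: a failed delete surfaces as an IOException.
                Files.delete(tempFile);
            }
        }
    }
}
```

One trade-off of this shape in Java: if the body throws and the delete in the `finally` block also throws, the delete's exception replaces the earlier one.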

martijnvg · 2025-04-17T08:18:42Z

server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesConsumer.java

+                    var addressMetaOutput = new ByteBuffersIndexOutput(addressMetaBuffer, "meta-temp", "meta-temp");
+                    var addressDataOutput = dir.createTempOutput(data.getName(), "address-data", ioContext)


Maybe, similarly to DISIAccumulator, we can encapsulate the accumulation of the addresses in a class that implements Closeable and has a build method that copies the data from the temp file to the actual data file and updates the metadata?

I think this could make the code a bit more manageable, similar to the effect DISIAccumulator had.

Makes sense to me, I'll add that
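A rough sketch of what such a class could look like, using only JDK file APIs instead of Lucene's `Directory` and `IndexOutput`. The class name, its methods, and the choice to return the address count from `build` are all assumptions made for illustration; the class that ended up in the PR may differ, and the metadata update is left out.

```java
import java.io.Closeable;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical class; the name and methods are illustrative, not taken from the PR.
class AddressAccumulator implements Closeable {
    private final Path tempFile;
    private final DataOutputStream tempOut;
    private long address = 0;
    private long count = 0;

    AddressAccumulator(Path tempDir) throws IOException {
        tempFile = Files.createTempFile(tempDir, "address-data", ".tmp");
        tempOut = new DataOutputStream(Files.newOutputStream(tempFile));
        writeAddress(); // offset of the first document
    }

    /** Records one document that holds {@code valueCount} values. */
    void addDoc(int valueCount) throws IOException {
        address += valueCount;
        writeAddress();
    }

    private void writeAddress() throws IOException {
        tempOut.writeLong(address);
        count++;
    }

    /** Copies the accumulated addresses into {@code data} and returns how many were copied. */
    long build(OutputStream data) throws IOException {
        tempOut.close(); // flush everything before reading the file back
        Files.copy(tempFile, data);
        return count;
    }

    /** Closes the temp output and removes the temp file, whether or not build was called. */
    @Override
    public void close() throws IOException {
        tempOut.close();
        Files.deleteIfExists(tempFile);
    }
}
```

Used in a try-with-resources block, the caller would call `addDoc` once per document while writing values and `build` once at the end, and the temp file would be removed on exit even if writing fails partway.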

Coalesce getSortedNumeric calls for ES819 doc values merging

b1eedf0

jordan-powers added the >non-issue, auto-backport, :StorageEngine/Codec, v8.19.0, and v9.1.0 labels Apr 11, 2025

jordan-powers self-assigned this Apr 11, 2025

jordan-powers added 2 commits April 14, 2025 10:27

Avoid using forbidden lucene IOUtils api

fa51f2c

Merge remote-tracking branch 'upstream/main' into es819-merge-optimization

55c56e2

jordan-powers requested a review from martijnvg April 14, 2025 19:57

jordan-powers marked this pull request as ready for review April 14, 2025 19:57

elasticsearchmachine added the Team:StorageEngine label Apr 14, 2025

jordan-powers commented Apr 14, 2025


martijnvg reviewed Apr 15, 2025


martijnvg mentioned this pull request Apr 15, 2025

Optimize segment merging in the tsdb doc value codec #126111

Open

4 tasks

jordan-powers added 6 commits April 15, 2025 12:27

Index multi-valued sorted numeric doc values in ES819 force merge tests

187fe48

Merge remote-tracking branch 'upstream/main' into es819-merge-optimization

4e94261

Merge remote-tracking branch 'upstream/main' into es819-merge-optimization

183ccb2

Use IOContext from SegmentWriteState

7237f19

Merge remote-tracking branch 'upstream/main' into es819-merge-optimization

6b733d3

Merge remote-tracking branch 'upstream/main' into es819-merge-optimization

5147b91

martijnvg reviewed Apr 17, 2025

