First step optimizing tsdb doc values codec merging. #125403

Open · martijnvg wants to merge 61 commits into main from mergeSortedNumericField_3

Conversation

martijnvg
Member

The doc values codec iterates a few times over the doc values instance that needs to be written to disk. When merging with index sorting enabled, this is much more expensive, because every iteration over the doc values instance performs an expensive doc id sort (to return doc ids in index sort order).

There are several reasons why the doc values instance is iterated multiple times:

  • To compute the stats (number of values, number of docs with a value) required for writing values to disk.
  • To write the bitset that indicates which documents have a value (indexed DISI, jump table).
  • To write the actual values to disk.
  • To write the addresses to disk (in case docs have multiple values).

This applies to numeric doc values, but also to the ordinals of sorted (set) doc values.

This PR addresses the first reason the doc values instance needs to be iterated, as sketched below. This only happens when merging, and only when the segments being merged also use the es87 doc values format, have the same codec version, and contain no deletes.
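
A minimal sketch of the idea (illustrative only, not the PR's actual code; the `FieldEntry` shape loosely mirrors what comes up in the review discussion below):

```java
import java.util.List;

final class MergeStatsSketch {

    // Illustrative per-field stats record, roughly the shape referenced in the
    // review comments below (docsWithFieldOffset, numValues, numDocsWithField).
    record FieldEntry(long docsWithFieldOffset, long numValues, int numDocsWithField) {}

    // When every segment in the merge uses the same es87 codec version and has
    // no deletes, the merged stats are plain sums of the per-segment stats, so
    // no pass over the doc values (and no index-sort doc id remapping) is needed.
    static FieldEntry mergeStats(List<FieldEntry> perSegmentEntries) {
        long sumNumValues = 0;
        int sumNumDocsWithField = 0;
        for (FieldEntry entry : perSegmentEntries) {
            sumNumValues += entry.numValues();
            sumNumDocsWithField += entry.numDocsWithField();
        }
        return new FieldEntry(-1, sumNumValues, sumNumDocsWithField);
    }
}
```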

@martijnvg martijnvg force-pushed the mergeSortedNumericField_3 branch from 52f3084 to 65d97e5 on March 21, 2025 15:50
@martijnvg
Member Author

The attached micro benchmark, which tests the tsdb doc values codec with a force merge, suggests the following:

Benchmark                                                            (deltaTime)   (nDocs)  (seed)    Mode     Cnt     Score   Error  Units
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge                    1000  13431204      42  sample  322932     0.012 ± 0.037  ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge:p0.00              1000  13431204      42  sample            ≈ 10⁻⁴          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge:p0.50              1000  13431204      42  sample            ≈ 10⁻⁴          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge:p0.90              1000  13431204      42  sample            ≈ 10⁻³          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge:p0.95              1000  13431204      42  sample            ≈ 10⁻³          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge:p0.99              1000  13431204      42  sample             0.001          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge:p0.999             1000  13431204      42  sample             0.006          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge:p0.9999            1000  13431204      42  sample             0.031          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge:p1.00              1000  13431204      42  sample          3611.296          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge                 1000  13431204      42  sample  301539     0.014 ± 0.044  ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge:p0.00           1000  13431204      42  sample            ≈ 10⁻⁴          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge:p0.50           1000  13431204      42  sample            ≈ 10⁻⁴          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge:p0.90           1000  13431204      42  sample            ≈ 10⁻³          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge:p0.95           1000  13431204      42  sample            ≈ 10⁻³          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge:p0.99           1000  13431204      42  sample             0.001          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge:p0.999          1000  13431204      42  sample             0.006          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge:p0.9999         1000  13431204      42  sample             0.026          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge:p1.00           1000  13431204      42  sample          4060.086          ms/op
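
The p1.00 samples appear to capture the force merge itself: they drop from roughly 4.06 s without the optimized merge to roughly 3.61 s with it (~11%), while the lower percentiles are essentially unchanged. For anyone wanting to reproduce the scenario outside the benchmark harness, a minimal standalone sketch (plain Lucene instead of the benchmark's JMH setup, default codec rather than the es87 tsdb codec, made-up field names) could look like this:

```java
import java.nio.file.Files;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.SortedNumericSortField;
import org.apache.lucene.store.FSDirectory;

public class ForceMergeSketch {
    public static void main(String[] args) throws Exception {
        try (var dir = FSDirectory.open(Files.createTempDirectory("merge-sketch"))) {
            // Index sorting is what makes repeated doc values iteration expensive:
            // every pass over a doc values instance remaps doc ids into sort order.
            var config = new IndexWriterConfig()
                .setIndexSort(new Sort(new SortedNumericSortField("@timestamp", SortField.Type.LONG, true)));
            try (var writer = new IndexWriter(dir, config)) {
                for (int i = 0; i < 1_000_000; i++) {
                    var doc = new Document();
                    doc.add(new SortedNumericDocValuesField("@timestamp", System.nanoTime()));
                    doc.add(new SortedNumericDocValuesField("gauge", i % 1_000));
                    writer.addDocument(doc);
                    if (i % 100_000 == 0) {
                        writer.flush(); // create several segments so the merge has work to do
                    }
                }
                long start = System.nanoTime();
                writer.forceMerge(1); // the operation the benchmark measures
                System.out.printf("force merge took %d ms%n", (System.nanoTime() - start) / 1_000_000);
            }
        }
    }
}
```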

@martijnvg
Member Author

This PR adds more code than I thought it would need, but the good thing is that the 'format' code that writes to disk didn't need to be changed.

@martijnvg
Member Author

Running elastic/logs track (logging-indexing challenge) without this change as baseline and with this change as contender:

|                                                        Metric |                                   Task |        Baseline |       Contender |            Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|---------------------------------------:|----------------:|----------------:|----------------:|-------:|---------:|
|                    Cumulative indexing time of primary shards |                                        |  1466.96        |  1376.13        |    -90.8323     |    min |   -6.19% |
|             Min cumulative indexing time across primary shard |                                        |    13.6549      |    13.2534      |     -0.40147    |    min |   -2.94% |
|          Median cumulative indexing time across primary shard |                                        |    51.722       |    49.3463      |     -2.37568    |    min |   -4.59% |
|             Max cumulative indexing time across primary shard |                                        |   530.91        |   516.047       |    -14.8631     |    min |   -2.80% |
|           Cumulative indexing throttle time of primary shards |                                        |     0           |     0           |      0          |    min |    0.00% |
|    Min cumulative indexing throttle time across primary shard |                                        |     0           |     0           |      0          |    min |    0.00% |
| Median cumulative indexing throttle time across primary shard |                                        |     0           |     0           |      0          |    min |    0.00% |
|    Max cumulative indexing throttle time across primary shard |                                        |     0           |     0           |      0          |    min |    0.00% |
|                       Cumulative merge time of primary shards |                                        |   372.98        |   368.189       |     -4.79062    |    min |   -1.28% |
|                      Cumulative merge count of primary shards |                                        |   287           |   316           |     29          |        |  +10.10% |
|                Min cumulative merge time across primary shard |                                        |     1.87335     |     1.73993     |     -0.13342    |    min |   -7.12% |
|             Median cumulative merge time across primary shard |                                        |     7.52116     |     7.99812     |      0.47696    |    min |   +6.34% |
|                Max cumulative merge time across primary shard |                                        |   198.019       |   183.153       |    -14.8651     |    min |   -7.51% |
|              Cumulative merge throttle time of primary shards |                                        |    95.1297      |    99.2514      |      4.12168    |    min |   +4.33% |
|       Min cumulative merge throttle time across primary shard |                                        |     0.312833    |     0.286733    |     -0.0261     |    min |   -8.34% |
|    Median cumulative merge throttle time across primary shard |                                        |     1.6415      |     1.56289     |     -0.07861    |    min |   -4.79% |
|       Max cumulative merge throttle time across primary shard |                                        |    46.661       |    45.5059      |     -1.15508    |    min |   -2.48% |
|                     Cumulative refresh time of primary shards |                                        |     6.74532     |     5.47937     |     -1.26595    |    min |  -18.77% |
|                    Cumulative refresh count of primary shards |                                        |  1823           |  1820           |     -3          |        |   -0.16% |
|              Min cumulative refresh time across primary shard |                                        |     0.0782167   |     0.147883    |      0.06967    |    min |  +89.07% |
|           Median cumulative refresh time across primary shard |                                        |     0.344267    |     0.25715     |     -0.08712    |    min |  -25.30% |
|              Max cumulative refresh time across primary shard |                                        |     1.79267     |     1.4992      |     -0.29347    |    min |  -16.37% |
|                       Cumulative flush time of primary shards |                                        |    85.7167      |    70.3317      |    -15.385      |    min |  -17.95% |
|                      Cumulative flush count of primary shards |                                        |  1761           |  1738           |    -23          |        |   -1.31% |
|                Min cumulative flush time across primary shard |                                        |     2.4082      |     1.96297     |     -0.44523    |    min |  -18.49% |
|             Median cumulative flush time across primary shard |                                        |     5.72419     |     4.48419     |     -1.24       |    min |  -21.66% |
|                Max cumulative flush time across primary shard |                                        |    13.6429      |    13.0937      |     -0.5492     |    min |   -4.03% |
|                                       Total Young Gen GC time |                                        |   134.348       |    97.011       |    -37.337      |      s |  -27.79% |
|                                      Total Young Gen GC count |                                        |  6484           |  6100           |   -384          |        |   -5.92% |
|                                         Total Old Gen GC time |                                        |     0           |     0           |      0          |      s |    0.00% |
|                                        Total Old Gen GC count |                                        |     0           |     0           |      0          |        |    0.00% |
|                                                    Store size |                                        |    44.6637      |    47.8919      |      3.22829    |     GB |   +7.23% |
|                                                 Translog size |                                        |     0.477316    |     0.609365    |      0.13205    |     GB |  +27.66% |
|                                        Heap used for segments |                                        |     0           |     0           |      0          |     MB |    0.00% |
|                                      Heap used for doc values |                                        |     0           |     0           |      0          |     MB |    0.00% |
|                                           Heap used for terms |                                        |     0           |     0           |      0          |     MB |    0.00% |
|                                           Heap used for norms |                                        |     0           |     0           |      0          |     MB |    0.00% |
|                                          Heap used for points |                                        |     0           |     0           |      0          |     MB |    0.00% |
|                                   Heap used for stored fields |                                        |     0           |     0           |      0          |     MB |    0.00% |
|                                                 Segment count |                                        |   525           |   582           |     57          |        |  +10.86% |
|                                   Total Ingest Pipeline count |                                        |     4.88641e+08 |     4.88617e+08 | -24000          |        |   -0.00% |
|                                    Total Ingest Pipeline time |                                        |     4.37256e+07 |     4.13417e+07 |     -2.3839e+06 |     ms |   -5.45% |
|                                  Total Ingest Pipeline failed |                                        |     0           |     0           |      0          |        |    0.00% |
|                                                Min Throughput |                       insert-pipelines |    12.8772      |    14.1695      |      1.29227    |  ops/s |  +10.04% |
|                                               Mean Throughput |                       insert-pipelines |    12.8772      |    14.1695      |      1.29227    |  ops/s |  +10.04% |
|                                             Median Throughput |                       insert-pipelines |    12.8772      |    14.1695      |      1.29227    |  ops/s |  +10.04% |
|                                                Max Throughput |                       insert-pipelines |    12.8772      |    14.1695      |      1.29227    |  ops/s |  +10.04% |
|                                      100th percentile latency |                       insert-pipelines |  1107.98        |   986.273       |   -121.702      |     ms |  -10.98% |
|                                 100th percentile service time |                       insert-pipelines |  1107.98        |   986.273       |   -121.702      |     ms |  -10.98% |
|                                                    error rate |                       insert-pipelines |     0           |     0           |      0          |      % |    0.00% |
|                                                Min Throughput |                             insert-ilm |    25.0239      |    27.0675      |      2.04365    |  ops/s |   +8.17% |
|                                               Mean Throughput |                             insert-ilm |    25.0239      |    27.0675      |      2.04365    |  ops/s |   +8.17% |
|                                             Median Throughput |                             insert-ilm |    25.0239      |    27.0675      |      2.04365    |  ops/s |   +8.17% |
|                                                Max Throughput |                             insert-ilm |    25.0239      |    27.0675      |      2.04365    |  ops/s |   +8.17% |
|                                      100th percentile latency |                             insert-ilm |    38.8762      |    35.8612      |     -3.01497    |     ms |   -7.76% |
|                                 100th percentile service time |                             insert-ilm |    38.8762      |    35.8612      |     -3.01497    |     ms |   -7.76% |
|                                                    error rate |                             insert-ilm |     0           |     0           |      0          |      % |    0.00% |
|                                                Min Throughput | validate-package-template-installation |    45.8409      |    49.1556      |      3.31475    |  ops/s |   +7.23% |
|                                               Mean Throughput | validate-package-template-installation |    45.8409      |    49.1556      |      3.31475    |  ops/s |   +7.23% |
|                                             Median Throughput | validate-package-template-installation |    45.8409      |    49.1556      |      3.31475    |  ops/s |   +7.23% |
|                                                Max Throughput | validate-package-template-installation |    45.8409      |    49.1556      |      3.31475    |  ops/s |   +7.23% |
|                                      100th percentile latency | validate-package-template-installation |    21.491       |    20.0744      |     -1.41665    |     ms |   -6.59% |
|                                 100th percentile service time | validate-package-template-installation |    21.491       |    20.0744      |     -1.41665    |     ms |   -6.59% |
|                                                    error rate | validate-package-template-installation |     0           |     0           |      0          |      % |    0.00% |
|                                                Min Throughput |        update-custom-package-templates |    27.9008      |    30.5407      |      2.63995    |  ops/s |   +9.46% |
|                                               Mean Throughput |        update-custom-package-templates |    27.9008      |    30.5407      |      2.63995    |  ops/s |   +9.46% |
|                                             Median Throughput |        update-custom-package-templates |    27.9008      |    30.5407      |      2.63995    |  ops/s |   +9.46% |
|                                                Max Throughput |        update-custom-package-templates |    27.9008      |    30.5407      |      2.63995    |  ops/s |   +9.46% |
|                                      100th percentile latency |        update-custom-package-templates |   429.812       |   392.63        |    -37.1817     |     ms |   -8.65% |
|                                 100th percentile service time |        update-custom-package-templates |   429.812       |   392.63        |    -37.1817     |     ms |   -8.65% |
|                                                    error rate |        update-custom-package-templates |     0           |     0           |      0          |      % |    0.00% |
|                                                Min Throughput |                             bulk-index |   892.867       |   525.49        |   -367.378      | docs/s |  -41.15% |
|                                               Mean Throughput |                             bulk-index | 56625.3         | 58508.5         |   1883.17       | docs/s |   +3.33% |
|                                             Median Throughput |                             bulk-index | 56797.8         | 58562.7         |   1764.92       | docs/s |   +3.11% |
|                                                Max Throughput |                             bulk-index | 57996.2         | 60975.4         |   2979.21       | docs/s |   +5.14% |
|                                       50th percentile latency |                             bulk-index |  1647.44        |   315.067       |  -1332.37       |     ms |  -80.88% |
|                                       90th percentile latency |                             bulk-index |  3106.41        |   609.538       |  -2496.87       |     ms |  -80.38% |
|                                       99th percentile latency |                             bulk-index |  5699.55        |  1121.14        |  -4578.41       |     ms |  -80.33% |
|                                     99.9th percentile latency |                             bulk-index |  8431.32        |  4875.32        |  -3556          |     ms |  -42.18% |
|                                    99.99th percentile latency |                             bulk-index | 11082           |  7170.7         |  -3911.33       |     ms |  -35.29% |
|                                      100th percentile latency |                             bulk-index | 15475           | 11908.2         |  -3566.86       |     ms |  -23.05% |
|                                  50th percentile service time |                             bulk-index |  1646.87        |   316.439       |  -1330.43       |     ms |  -80.79% |
|                                  90th percentile service time |                             bulk-index |  3105.91        |   612.673       |  -2493.24       |     ms |  -80.27% |
|                                  99th percentile service time |                             bulk-index |  5701.42        |  1092.73        |  -4608.69       |     ms |  -80.83% |
|                                99.9th percentile service time |                             bulk-index |  8423.32        |  4887.69        |  -3535.63       |     ms |  -41.97% |
|                               99.99th percentile service time |                             bulk-index | 11085.5         |  7215.36        |  -3870.11       |     ms |  -34.91% |
|                                 100th percentile service time |                             bulk-index | 15475           | 11908.2         |  -3566.86       |     ms |  -23.05% |
|                                                    error rate |                             bulk-index |     0           |     0           |      0          |      % |    0.00% |

Member

@dnhatn dnhatn left a comment

I've left some comments. Thanks @martijnvg.

sumNumDocsWithField += entry.numDocsWithField;
}

// Documents marked as deleted should be rare. Maybe in the case of noop operation?
Member

Should we check this before getting docValues?

@Override
public void mergeNumericField(FieldInfo mergeFieldInfo, MergeState mergeState) throws IOException {
var result = compatibleWithOptimizedMerge(enableOptimizedMerge, mergeFieldInfo, mergeState, (docValuesProducer) -> {
var numeric = docValuesProducer.getNumeric(mergeFieldInfo);
Member

Should we query the NumericEntry directly?

if (docValuesProducer instanceof ES87TSDBDocValuesProducer producer && producer.version == VERSION_CURRENT) {
    var entry = producer.numerics.get(mergeFieldInfo.name);
    return new DocValuesConsumerUtil.FieldEntry(entry.docsWithFieldOffset, entry.numValues, -1);
}

Member Author

This is what I had initially; however, the type of the producer isn't ES87TSDBDocValuesProducer, but PerFieldDocValuesFormat.FieldsReader. So I ended up with the current workaround. I'm not happy about it, but I don't see another way.
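
For illustration, the workaround's rough shape (hypothetical code, later reworked per the review below; the BaseNumericDocValues class is the one discussed further down):

```java
// Because the producer seen during merging is the PerFieldDocValuesFormat.FieldsReader
// wrapper, the type check is done on the per-field doc values instance instead of
// on the producer itself.
var numeric = docValuesProducer.getNumeric(mergeFieldInfo);
if (numeric instanceof ES87TSDBDocValuesProducer.BaseNumericDocValues baseNumeric) {
    // the precomputed stats can then be read from the entry this instance was built from
}
```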

};
}

static NumericDocValues mergeNumericValues(List<NumericDocValuesSub> subs, boolean indexIsSorted) throws IOException {
Member

Did we copy these from DocValuesConsumer? Is there any chance we can avoid copying this?

Member Author

Yes. I don't see another way here; all the logic that we need is private to DocValuesConsumer.
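
For context, the copied logic follows the shape of Lucene's DocValuesConsumer (a simplified sketch based on the Lucene pattern, not the PR's exact code): a DocIDMerger interleaves the per-segment sub-iterators and, when the index is sorted, yields them in mapped doc id order, which is exactly the expensive step the stats optimization avoids re-running.

```java
import java.io.IOException;
import java.util.List;

import org.apache.lucene.index.DocIDMerger;
import org.apache.lucene.index.MergeState;
import org.apache.lucene.index.NumericDocValues;

// Wraps one segment's numeric doc values together with its doc id remapping.
class NumericDocValuesSub extends DocIDMerger.Sub {
    final NumericDocValues values;

    NumericDocValuesSub(MergeState.DocMap docMap, NumericDocValues values) {
        super(docMap);
        this.values = values;
    }

    @Override
    public int nextDoc() throws IOException {
        return values.nextDoc();
    }
}

class MergedNumericValuesSketch {

    static NumericDocValues mergeNumericValues(List<NumericDocValuesSub> subs, boolean indexIsSorted) throws IOException {
        // DocIDMerger re-orders the subs by mapped doc id when the index is sorted.
        DocIDMerger<NumericDocValuesSub> docIDMerger = DocIDMerger.of(subs, indexIsSorted);
        return new NumericDocValues() {
            NumericDocValuesSub current;
            int docID = -1;

            @Override
            public int nextDoc() throws IOException {
                current = docIDMerger.next();
                docID = current == null ? NO_MORE_DOCS : current.mappedDocID;
                return docID;
            }

            @Override
            public int docID() {
                return docID;
            }

            @Override
            public long longValue() throws IOException {
                return current.values.longValue();
            }

            @Override
            public int advance(int target) {
                throw new UnsupportedOperationException(); // merging only walks forward
            }

            @Override
            public boolean advanceExact(int target) {
                throw new UnsupportedOperationException();
            }

            @Override
            public long cost() {
                return 0; // not consulted during merging
            }
        };
    }
}
```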

@martijnvg
Member Author

Running tsdb track without this change as baseline and with this change as contender:

|                                                                   Metric |                    Task |        Baseline |       Contender |        Diff |   Unit |   Diff % |
|-------------------------------------------------------------------------:|------------------------:|----------------:|----------------:|------------:|-------:|---------:|
|                               Cumulative indexing time of primary shards |                         |   267.277       |   272.379       |     5.10255 |    min |   +1.91% |
|                        Min cumulative indexing time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|                     Median cumulative indexing time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|                        Max cumulative indexing time across primary shard |                         |   267.277       |   272.379       |     5.10255 |    min |   +1.91% |
|                      Cumulative indexing throttle time of primary shards |                         |     0           |     0           |     0       |    min |    0.00% |
|               Min cumulative indexing throttle time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|            Median cumulative indexing throttle time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|               Max cumulative indexing throttle time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|                                  Cumulative merge time of primary shards |                         |    88.0096      |    85.7457      |    -2.26392 |    min |   -2.57% |
|                                 Cumulative merge count of primary shards |                         |    60           |    59           |    -1       |        |   -1.67% |
|                           Min cumulative merge time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|                        Median cumulative merge time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|                           Max cumulative merge time across primary shard |                         |    88.0096      |    85.7457      |    -2.26392 |    min |   -2.57% |
|                         Cumulative merge throttle time of primary shards |                         |    15.859       |    14.9264      |    -0.9327  |    min |   -5.88% |
|                  Min cumulative merge throttle time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|               Median cumulative merge throttle time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|                  Max cumulative merge throttle time across primary shard |                         |    15.859       |    14.9264      |    -0.9327  |    min |   -5.88% |
|                                Cumulative refresh time of primary shards |                         |     1.61935     |     1.67273     |     0.05338 |    min |   +3.30% |
|                               Cumulative refresh count of primary shards |                         |   163           |   162           |    -1       |        |   -0.61% |
|                         Min cumulative refresh time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|                      Median cumulative refresh time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|                         Max cumulative refresh time across primary shard |                         |     1.61935     |     1.67273     |     0.05338 |    min |   +3.30% |
|                                  Cumulative flush time of primary shards |                         |     0.0152333   |     0.0009      |    -0.01433 |    min |  -94.09% |
|                                 Cumulative flush count of primary shards |                         |    14           |    13           |    -1       |        |   -7.14% |
|                           Min cumulative flush time across primary shard |                         |     3.33333e-05 |     3.33333e-05 |     0       |    min |    0.00% |
|                        Median cumulative flush time across primary shard |                         |     3.33333e-05 |     3.33333e-05 |     0       |    min |    0.00% |
|                           Max cumulative flush time across primary shard |                         |     0.0147667   |     0.000433333 |    -0.01433 |    min |  -97.07% |
|                                                  Total Young Gen GC time |                         |    36.007       |    37.219       |     1.212   |      s |   +3.37% |
|                                                 Total Young Gen GC count |                         |  1598           |  1602           |     4       |        |   +0.25% |
|                                                    Total Old Gen GC time |                         |     0           |     0           |     0       |      s |    0.00% |
|                                                   Total Old Gen GC count |                         |     0           |     0           |     0       |        |    0.00% |
|                                                               Store size |                         |     4.64955     |     4.67477     |     0.02522 |     GB |   +0.54% |
|                                                            Translog size |                         |     6.65896e-07 |     6.65896e-07 |     0       |     GB |    0.00% |
|                                                   Heap used for segments |                         |     0           |     0           |     0       |     MB |    0.00% |
|                                                 Heap used for doc values |                         |     0           |     0           |     0       |     MB |    0.00% |
|                                                      Heap used for terms |                         |     0           |     0           |     0       |     MB |    0.00% |
|                                                      Heap used for norms |                         |     0           |     0           |     0       |     MB |    0.00% |
|                                                     Heap used for points |                         |     0           |     0           |     0       |     MB |    0.00% |
|                                              Heap used for stored fields |                         |     0           |     0           |     0       |     MB |    0.00% |
|                                                            Segment count |                         |     6           |    13           |     7       |        | +116.67% |
|                                              Total Ingest Pipeline count |                         |     0           |     0           |     0       |        |    0.00% |
|                                               Total Ingest Pipeline time |                         |     0           |     0           |     0       |     ms |    0.00% |
|                                             Total Ingest Pipeline failed |                         |     0           |     0           |     0       |        |    0.00% |
|                                                           Min Throughput |                   index | 82125.6         | 80755.3         | -1370.32    | docs/s |   -1.67% |
|                                                          Mean Throughput |                   index | 87956.1         | 87051.1         |  -904.987   | docs/s |   -1.03% |
|                                                        Median Throughput |                   index | 88444.2         | 87573.7         |  -870.508   | docs/s |   -0.98% |
|                                                           Max Throughput |                   index | 93514.4         | 91517.8         | -1996.6     | docs/s |   -2.14% |
|                                                  50th percentile latency |                   index |   851.696       |   856.605       |     4.90901 |     ms |   +0.58% |
|                                                  90th percentile latency |                   index |  1113.73        |  1136.79        |    23.0587  |     ms |   +2.07% |
|                                                  99th percentile latency |                   index |  2823.29        |  2941.41        |   118.118   |     ms |   +4.18% |
|                                                99.9th percentile latency |                   index |  4246.65        |  4373.47        |   126.818   |     ms |   +2.99% |
|                                               99.99th percentile latency |                   index |  5733.52        |  5060.5         |  -673.019   |     ms |  -11.74% |
|                                                 100th percentile latency |                   index |  5980.77        |  5550.07        |  -430.702   |     ms |   -7.20% |
|                                             50th percentile service time |                   index |   851.235       |   856.613       |     5.37861 |     ms |   +0.63% |
|                                             90th percentile service time |                   index |  1122.35        |  1135.69        |    13.3398  |     ms |   +1.19% |
|                                             99th percentile service time |                   index |  2823.5         |  2934.71        |   111.212   |     ms |   +3.94% |
|                                           99.9th percentile service time |                   index |  4220.14        |  4372.69        |   152.55    |     ms |   +3.61% |
|                                          99.99th percentile service time |                   index |  5733.52        |  5060.5         |  -673.019   |     ms |  -11.74% |
|                                            100th percentile service time |                   index |  5980.77        |  5550.07        |  -430.702   |     ms |   -7.20% |
|                                                               error rate |                   index |     0           |     0           |     0       |      % |    0.00% |
|                                                           Min Throughput |                 default |    72.7266      |    63.9027      |    -8.82396 |  ops/s |  -12.13% |
|                                                          Mean Throughput |                 default |    72.7266      |    66.0422      |    -6.68449 |  ops/s |   -9.19% |
|                                                        Median Throughput |                 default |    72.7266      |    66.0422      |    -6.68449 |  ops/s |   -9.19% |
|                                                           Max Throughput |                 default |    72.7266      |    68.1816      |    -4.54502 |  ops/s |   -6.25% |
|                                                  50th percentile latency |                 default |    11.3524      |    12.8836      |     1.53121 |     ms |  +13.49% |
|                                                  90th percentile latency |                 default |    11.9501      |    13.4078      |     1.45768 |     ms |  +12.20% |
|                                                  99th percentile latency |                 default |    17.505       |    15.3226      |    -2.18244 |     ms |  -12.47% |
|                                                 100th percentile latency |                 default |    17.6197      |    16.5698      |    -1.04989 |     ms |   -5.96% |
|                                             50th percentile service time |                 default |    11.3524      |    12.8836      |     1.53121 |     ms |  +13.49% |
|                                             90th percentile service time |                 default |    11.9501      |    13.4078      |     1.45768 |     ms |  +12.20% |
|                                             99th percentile service time |                 default |    17.505       |    15.3226      |    -2.18244 |     ms |  -12.47% |
|                                            100th percentile service time |                 default |    17.6197      |    16.5698      |    -1.04989 |     ms |   -5.96% |
|                                                               error rate |                 default |     0           |     0           |     0       |      % |    0.00% |
|                                                           Min Throughput |              default_1k |    29.1361      |    27.9969      |    -1.1392  |  ops/s |   -3.91% |
|                                                          Mean Throughput |              default_1k |    29.5797      |    28.6573      |    -0.92236 |  ops/s |   -3.12% |
|                                                        Median Throughput |              default_1k |    29.6672      |    28.7824      |    -0.88483 |  ops/s |   -2.98% |
|                                                           Max Throughput |              default_1k |    29.8482      |    29.0676      |    -0.78057 |  ops/s |   -2.62% |
|                                                  50th percentile latency |              default_1k |    32.2199      |    33.0253      |     0.80531 |     ms |   +2.50% |
|                                                  90th percentile latency |              default_1k |    33.0007      |    33.6412      |     0.64048 |     ms |   +1.94% |
|                                                  99th percentile latency |              default_1k |    51.6371      |    38.0896      |   -13.5476  |     ms |  -26.24% |
|                                                 100th percentile latency |              default_1k |    54.7434      |    48.1042      |    -6.63918 |     ms |  -12.13% |
|                                             50th percentile service time |              default_1k |    32.2199      |    33.0253      |     0.80531 |     ms |   +2.50% |
|                                             90th percentile service time |              default_1k |    33.0007      |    33.6412      |     0.64048 |     ms |   +1.94% |
|                                             99th percentile service time |              default_1k |    51.6371      |    38.0896      |   -13.5476  |     ms |  -26.24% |
|                                            100th percentile service time |              default_1k |    54.7434      |    48.1042      |    -6.63918 |     ms |  -12.13% |
|                                                               error rate |              default_1k |     0           |     0           |     0       |      % |    0.00% |
|                                                           Min Throughput | date-histo-entire-range |   317.361       |   319.564       |     2.20343 |  ops/s |   +0.69% |
|                                                          Mean Throughput | date-histo-entire-range |   317.361       |   319.564       |     2.20343 |  ops/s |   +0.69% |
|                                                        Median Throughput | date-histo-entire-range |   317.361       |   319.564       |     2.20343 |  ops/s |   +0.69% |
|                                                           Max Throughput | date-histo-entire-range |   317.361       |   319.564       |     2.20343 |  ops/s |   +0.69% |
|                                                  50th percentile latency | date-histo-entire-range |     2.66948     |     2.59981     |    -0.06967 |     ms |   -2.61% |
|                                                  90th percentile latency | date-histo-entire-range |     2.94906     |     2.74951     |    -0.19955 |     ms |   -6.77% |
|                                                  99th percentile latency | date-histo-entire-range |     3.67115     |     3.05257     |    -0.61857 |     ms |  -16.85% |
|                                                 100th percentile latency | date-histo-entire-range |     3.7502      |     3.14106     |    -0.60914 |     ms |  -16.24% |
|                                             50th percentile service time | date-histo-entire-range |     2.66948     |     2.59981     |    -0.06967 |     ms |   -2.61% |
|                                             90th percentile service time | date-histo-entire-range |     2.94906     |     2.74951     |    -0.19955 |     ms |   -6.77% |
|                                             99th percentile service time | date-histo-entire-range |     3.67115     |     3.05257     |    -0.61857 |     ms |  -16.85% |
|                                            100th percentile service time | date-histo-entire-range |     3.7502      |     3.14106     |    -0.60914 |     ms |  -16.24% |
|                                                               error rate | date-histo-entire-range |     0           |     0           |     0       |      % |    0.00% |
|                                                           Min Throughput |          esql-fetch-500 |     8.95722     |     9.22561     |     0.26839 |  ops/s |   +3.00% |
|                                                          Mean Throughput |          esql-fetch-500 |     9.60207     |     9.79872     |     0.19664 |  ops/s |   +2.05% |
|                                                        Median Throughput |          esql-fetch-500 |     9.65652     |     9.85814     |     0.20162 |  ops/s |   +2.09% |
|                                                           Max Throughput |          esql-fetch-500 |    10.0294      |    10.1854      |     0.156   |  ops/s |   +1.56% |
|                                                  50th percentile latency |          esql-fetch-500 |    90.0991      |    89.3751      |    -0.72395 |     ms |   -0.80% |
|                                                  90th percentile latency |          esql-fetch-500 |    97.2384      |    95.5091      |    -1.72926 |     ms |   -1.78% |
|                                                  99th percentile latency |          esql-fetch-500 |   108.207       |   122.953       |    14.7463  |     ms |  +13.63% |
|                                                 100th percentile latency |          esql-fetch-500 |   110.563       |   125.449       |    14.8864  |     ms |  +13.46% |
|                                             50th percentile service time |          esql-fetch-500 |    90.0991      |    89.3751      |    -0.72395 |     ms |   -0.80% |
|                                             90th percentile service time |          esql-fetch-500 |    97.2384      |    95.5091      |    -1.72926 |     ms |   -1.78% |
|                                             99th percentile service time |          esql-fetch-500 |   108.207       |   122.953       |    14.7463  |     ms |  +13.63% |
|                                            100th percentile service time |          esql-fetch-500 |   110.563       |   125.449       |    14.8864  |     ms |  +13.46% |
|                                                               error rate |          esql-fetch-500 |     0           |     0           |     0       |      % |    0.00% |

Unfortunately the improvement is less visible here. Looks like ~3% less time spent on merging.

@martijnvg
Member Author

Thanks Nhat for the review!

Member

@dnhatn dnhatn left a comment

It looks much better. Thanks, Martijn. Would you mind taking a look at the test failures? They seem related.

@@ -1262,6 +1262,15 @@ public long longValue() throws IOException {
}
}

abstract static class BaseNumericDocValues extends NumericDocValues {
Member

I think we no longer need these base classes?

Member Author

Good point, we no longer need these base classes. I will remove them.

Member Author

Removed: 39dc98f

@dnhatn dnhatn self-requested a review April 5, 2025 05:37
@martijnvg
Member Author

> Would you mind taking a look at the test failures? They seem related.

They are related and I think this will address the bwc failures: 5a2da25

Member

@dnhatn dnhatn left a comment

LGTM. Thanks Martijn!

@martijnvg martijnvg added auto-backport Automatically create backport pull requests when merged test-full-bwc Trigger full BWC version matrix tests test-release Trigger CI checks against release build labels Apr 8, 2025
@martijnvg
Member Author

Unfortunately it isn't possible to run the release tests due to:

* What went wrong:
Could not determine the dependencies of task ':distribution:docker:buildAarch64FipsDockerContext'.
> Could not resolve all dependencies for configuration ':distribution:docker:metricbeat_fips_aarch64'.
   > Could not find beats:metricbeat-fips:9.1.0.
     Required by:
         project :distribution:docker

I did confirm that locally the unit tests pass with a release build (meaning the feature flag is disabled):

./gradlew ":server:test" --tests "org.elasticsearch.index.codec.*" -Dbuild.snapshot=false -Dtests.jvm.argline="-Dbuild.snapshot=false" -Dlicense.key=x-pack/license-tools/src/test/resources/public.key

@martijnvg martijnvg removed the test-release Trigger CI checks against release build label Apr 8, 2025
Labels
auto-backport Automatically create backport pull requests when merged >enhancement :StorageEngine/Codec Team:StorageEngine test-full-bwc Trigger full BWC version matrix tests v8.19.0 v9.1.0