First step optimizing tsdb doc values codec merging. #125403

Open · martijnvg wants to merge 61 commits into main from mergeSortedNumericField_3

Conversation

martijnvg
Member

The doc values codec iterates a few times over the doc values instance that needs to be written to disk. When merging with index sorting enabled, this is much more expensive, because every iteration over the doc values instance performs an expensive doc id sort (to return doc ids in index sort order).

There are several reasons why the doc values instance is iterated multiple times:

  • To compute the stats (number of values, number of docs with a value) required for writing values to disk.
  • To write the bitset that indicates which documents have a value (indexed DISI, jump table).
  • To write the actual values to disk.
  • To write the addresses to disk (in case docs have multiple values).

This applies to numeric doc values, but also to the ordinals of sorted (set) doc values.

This PR addresses the first reason the doc values instance needs to be iterated, as sketched below. This only happens when merging, and only when the segments being merged also use the es87 doc values format, have the same codec version, and contain no deletes.
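
A minimal sketch of the idea (illustrative only, not the PR's actual code; the `FieldEntry` shape loosely mirrors what comes up in the review discussion below):

```java
import java.util.List;

final class MergeStatsSketch {

    // Illustrative per-field stats record, roughly the shape referenced in the
    // review comments below (docsWithFieldOffset, numValues, numDocsWithField).
    record FieldEntry(long docsWithFieldOffset, long numValues, int numDocsWithField) {}

    // When every segment in the merge uses the same es87 codec version and has
    // no deletes, the merged stats are plain sums of the per-segment stats, so
    // no pass over the doc values (and no index-sort doc id remapping) is needed.
    static FieldEntry mergeStats(List<FieldEntry> perSegmentEntries) {
        long sumNumValues = 0;
        int sumNumDocsWithField = 0;
        for (FieldEntry entry : perSegmentEntries) {
            sumNumValues += entry.numValues();
            sumNumDocsWithField += entry.numDocsWithField();
        }
        return new FieldEntry(-1, sumNumValues, sumNumDocsWithField);
    }
}
```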

@martijnvg martijnvg force-pushed the mergeSortedNumericField_3 branch from 52f3084 to 65d97e5 on March 21, 2025 15:50
@martijnvg
Member Author

The attached micro benchmark, which tests the tsdb doc values codec with a force merge, suggests the following:

Benchmark                                                            (deltaTime)   (nDocs)  (seed)    Mode     Cnt     Score   Error  Units
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge                    1000  13431204      42  sample  322932     0.012 ± 0.037  ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge:p0.00              1000  13431204      42  sample            ≈ 10⁻⁴          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge:p0.50              1000  13431204      42  sample            ≈ 10⁻⁴          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge:p0.90              1000  13431204      42  sample            ≈ 10⁻³          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge:p0.95              1000  13431204      42  sample            ≈ 10⁻³          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge:p0.99              1000  13431204      42  sample             0.001          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge:p0.999             1000  13431204      42  sample             0.006          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge:p0.9999            1000  13431204      42  sample             0.031          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithOptimizedMerge:p1.00              1000  13431204      42  sample          3611.296          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge                 1000  13431204      42  sample  301539     0.014 ± 0.044  ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge:p0.00           1000  13431204      42  sample            ≈ 10⁻⁴          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge:p0.50           1000  13431204      42  sample            ≈ 10⁻⁴          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge:p0.90           1000  13431204      42  sample            ≈ 10⁻³          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge:p0.95           1000  13431204      42  sample            ≈ 10⁻³          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge:p0.99           1000  13431204      42  sample             0.001          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge:p0.999          1000  13431204      42  sample             0.006          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge:p0.9999         1000  13431204      42  sample             0.026          ms/op
TSDBDocValuesMergeBenchmark.forceMergeWithoutOptimizedMerge:p1.00           1000  13431204      42  sample          4060.086          ms/op
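
The p1.00 samples appear to capture the force merge itself: they drop from roughly 4.06 s without the optimized merge to roughly 3.61 s with it (~11%), while the lower percentiles are essentially unchanged. For anyone wanting to reproduce the scenario outside the benchmark harness, a minimal standalone sketch (plain Lucene instead of the benchmark's JMH setup, default codec rather than the es87 tsdb codec, made-up field names) could look like this:

```java
import java.nio.file.Files;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.SortedNumericSortField;
import org.apache.lucene.store.FSDirectory;

public class ForceMergeSketch {
    public static void main(String[] args) throws Exception {
        try (var dir = FSDirectory.open(Files.createTempDirectory("merge-sketch"))) {
            // Index sorting is what makes repeated doc values iteration expensive:
            // every pass over a doc values instance remaps doc ids into sort order.
            var config = new IndexWriterConfig()
                .setIndexSort(new Sort(new SortedNumericSortField("@timestamp", SortField.Type.LONG, true)));
            try (var writer = new IndexWriter(dir, config)) {
                for (int i = 0; i < 1_000_000; i++) {
                    var doc = new Document();
                    doc.add(new SortedNumericDocValuesField("@timestamp", System.nanoTime()));
                    doc.add(new SortedNumericDocValuesField("gauge", i % 1_000));
                    writer.addDocument(doc);
                    if (i % 100_000 == 0) {
                        writer.flush(); // create several segments so the merge has work to do
                    }
                }
                long start = System.nanoTime();
                writer.forceMerge(1); // the operation the benchmark measures
                System.out.printf("force merge took %d ms%n", (System.nanoTime() - start) / 1_000_000);
            }
        }
    }
}
```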

@martijnvg
Member Author

This PR adds more code than I thought it would need, but the good thing is that the 'format' code that writes to disk didn't need to be changed.

@martijnvg
Member Author

Running elastic/logs track (logging-indexing challenge) without this change as baseline and with this change as contender:

|                                                        Metric |                                   Task |        Baseline |       Contender |            Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|---------------------------------------:|----------------:|----------------:|----------------:|-------:|---------:|
|                    Cumulative indexing time of primary shards |                                        |  1466.96        |  1376.13        |    -90.8323     |    min |   -6.19% |
|             Min cumulative indexing time across primary shard |                                        |    13.6549      |    13.2534      |     -0.40147    |    min |   -2.94% |
|          Median cumulative indexing time across primary shard |                                        |    51.722       |    49.3463      |     -2.37568    |    min |   -4.59% |
|             Max cumulative indexing time across primary shard |                                        |   530.91        |   516.047       |    -14.8631     |    min |   -2.80% |
|           Cumulative indexing throttle time of primary shards |                                        |     0           |     0           |      0          |    min |    0.00% |
|    Min cumulative indexing throttle time across primary shard |                                        |     0           |     0           |      0          |    min |    0.00% |
| Median cumulative indexing throttle time across primary shard |                                        |     0           |     0           |      0          |    min |    0.00% |
|    Max cumulative indexing throttle time across primary shard |                                        |     0           |     0           |      0          |    min |    0.00% |
|                       Cumulative merge time of primary shards |                                        |   372.98        |   368.189       |     -4.79062    |    min |   -1.28% |
|                      Cumulative merge count of primary shards |                                        |   287           |   316           |     29          |        |  +10.10% |
|                Min cumulative merge time across primary shard |                                        |     1.87335     |     1.73993     |     -0.13342    |    min |   -7.12% |
|             Median cumulative merge time across primary shard |                                        |     7.52116     |     7.99812     |      0.47696    |    min |   +6.34% |
|                Max cumulative merge time across primary shard |                                        |   198.019       |   183.153       |    -14.8651     |    min |   -7.51% |
|              Cumulative merge throttle time of primary shards |                                        |    95.1297      |    99.2514      |      4.12168    |    min |   +4.33% |
|       Min cumulative merge throttle time across primary shard |                                        |     0.312833    |     0.286733    |     -0.0261     |    min |   -8.34% |
|    Median cumulative merge throttle time across primary shard |                                        |     1.6415      |     1.56289     |     -0.07861    |    min |   -4.79% |
|       Max cumulative merge throttle time across primary shard |                                        |    46.661       |    45.5059      |     -1.15508    |    min |   -2.48% |
|                     Cumulative refresh time of primary shards |                                        |     6.74532     |     5.47937     |     -1.26595    |    min |  -18.77% |
|                    Cumulative refresh count of primary shards |                                        |  1823           |  1820           |     -3          |        |   -0.16% |
|              Min cumulative refresh time across primary shard |                                        |     0.0782167   |     0.147883    |      0.06967    |    min |  +89.07% |
|           Median cumulative refresh time across primary shard |                                        |     0.344267    |     0.25715     |     -0.08712    |    min |  -25.30% |
|              Max cumulative refresh time across primary shard |                                        |     1.79267     |     1.4992      |     -0.29347    |    min |  -16.37% |
|                       Cumulative flush time of primary shards |                                        |    85.7167      |    70.3317      |    -15.385      |    min |  -17.95% |
|                      Cumulative flush count of primary shards |                                        |  1761           |  1738           |    -23          |        |   -1.31% |
|                Min cumulative flush time across primary shard |                                        |     2.4082      |     1.96297     |     -0.44523    |    min |  -18.49% |
|             Median cumulative flush time across primary shard |                                        |     5.72419     |     4.48419     |     -1.24       |    min |  -21.66% |
|                Max cumulative flush time across primary shard |                                        |    13.6429      |    13.0937      |     -0.5492     |    min |   -4.03% |
|                                       Total Young Gen GC time |                                        |   134.348       |    97.011       |    -37.337      |      s |  -27.79% |
|                                      Total Young Gen GC count |                                        |  6484           |  6100           |   -384          |        |   -5.92% |
|                                         Total Old Gen GC time |                                        |     0           |     0           |      0          |      s |    0.00% |
|                                        Total Old Gen GC count |                                        |     0           |     0           |      0          |        |    0.00% |
|                                                    Store size |                                        |    44.6637      |    47.8919      |      3.22829    |     GB |   +7.23% |
|                                                 Translog size |                                        |     0.477316    |     0.609365    |      0.13205    |     GB |  +27.66% |
|                                        Heap used for segments |                                        |     0           |     0           |      0          |     MB |    0.00% |
|                                      Heap used for doc values |                                        |     0           |     0           |      0          |     MB |    0.00% |
|                                           Heap used for terms |                                        |     0           |     0           |      0          |     MB |    0.00% |
|                                           Heap used for norms |                                        |     0           |     0           |      0          |     MB |    0.00% |
|                                          Heap used for points |                                        |     0           |     0           |      0          |     MB |    0.00% |
|                                   Heap used for stored fields |                                        |     0           |     0           |      0          |     MB |    0.00% |
|                                                 Segment count |                                        |   525           |   582           |     57          |        |  +10.86% |
|                                   Total Ingest Pipeline count |                                        |     4.88641e+08 |     4.88617e+08 | -24000          |        |   -0.00% |
|                                    Total Ingest Pipeline time |                                        |     4.37256e+07 |     4.13417e+07 |     -2.3839e+06 |     ms |   -5.45% |
|                                  Total Ingest Pipeline failed |                                        |     0           |     0           |      0          |        |    0.00% |
|                                                Min Throughput |                       insert-pipelines |    12.8772      |    14.1695      |      1.29227    |  ops/s |  +10.04% |
|                                               Mean Throughput |                       insert-pipelines |    12.8772      |    14.1695      |      1.29227    |  ops/s |  +10.04% |
|                                             Median Throughput |                       insert-pipelines |    12.8772      |    14.1695      |      1.29227    |  ops/s |  +10.04% |
|                                                Max Throughput |                       insert-pipelines |    12.8772      |    14.1695      |      1.29227    |  ops/s |  +10.04% |
|                                      100th percentile latency |                       insert-pipelines |  1107.98        |   986.273       |   -121.702      |     ms |  -10.98% |
|                                 100th percentile service time |                       insert-pipelines |  1107.98        |   986.273       |   -121.702      |     ms |  -10.98% |
|                                                    error rate |                       insert-pipelines |     0           |     0           |      0          |      % |    0.00% |
|                                                Min Throughput |                             insert-ilm |    25.0239      |    27.0675      |      2.04365    |  ops/s |   +8.17% |
|                                               Mean Throughput |                             insert-ilm |    25.0239      |    27.0675      |      2.04365    |  ops/s |   +8.17% |
|                                             Median Throughput |                             insert-ilm |    25.0239      |    27.0675      |      2.04365    |  ops/s |   +8.17% |
|                                                Max Throughput |                             insert-ilm |    25.0239      |    27.0675      |      2.04365    |  ops/s |   +8.17% |
|                                      100th percentile latency |                             insert-ilm |    38.8762      |    35.8612      |     -3.01497    |     ms |   -7.76% |
|                                 100th percentile service time |                             insert-ilm |    38.8762      |    35.8612      |     -3.01497    |     ms |   -7.76% |
|                                                    error rate |                             insert-ilm |     0           |     0           |      0          |      % |    0.00% |
|                                                Min Throughput | validate-package-template-installation |    45.8409      |    49.1556      |      3.31475    |  ops/s |   +7.23% |
|                                               Mean Throughput | validate-package-template-installation |    45.8409      |    49.1556      |      3.31475    |  ops/s |   +7.23% |
|                                             Median Throughput | validate-package-template-installation |    45.8409      |    49.1556      |      3.31475    |  ops/s |   +7.23% |
|                                                Max Throughput | validate-package-template-installation |    45.8409      |    49.1556      |      3.31475    |  ops/s |   +7.23% |
|                                      100th percentile latency | validate-package-template-installation |    21.491       |    20.0744      |     -1.41665    |     ms |   -6.59% |
|                                 100th percentile service time | validate-package-template-installation |    21.491       |    20.0744      |     -1.41665    |     ms |   -6.59% |
|                                                    error rate | validate-package-template-installation |     0           |     0           |      0          |      % |    0.00% |
|                                                Min Throughput |        update-custom-package-templates |    27.9008      |    30.5407      |      2.63995    |  ops/s |   +9.46% |
|                                               Mean Throughput |        update-custom-package-templates |    27.9008      |    30.5407      |      2.63995    |  ops/s |   +9.46% |
|                                             Median Throughput |        update-custom-package-templates |    27.9008      |    30.5407      |      2.63995    |  ops/s |   +9.46% |
|                                                Max Throughput |        update-custom-package-templates |    27.9008      |    30.5407      |      2.63995    |  ops/s |   +9.46% |
|                                      100th percentile latency |        update-custom-package-templates |   429.812       |   392.63        |    -37.1817     |     ms |   -8.65% |
|                                 100th percentile service time |        update-custom-package-templates |   429.812       |   392.63        |    -37.1817     |     ms |   -8.65% |
|                                                    error rate |        update-custom-package-templates |     0           |     0           |      0          |      % |    0.00% |
|                                                Min Throughput |                             bulk-index |   892.867       |   525.49        |   -367.378      | docs/s |  -41.15% |
|                                               Mean Throughput |                             bulk-index | 56625.3         | 58508.5         |   1883.17       | docs/s |   +3.33% |
|                                             Median Throughput |                             bulk-index | 56797.8         | 58562.7         |   1764.92       | docs/s |   +3.11% |
|                                                Max Throughput |                             bulk-index | 57996.2         | 60975.4         |   2979.21       | docs/s |   +5.14% |
|                                       50th percentile latency |                             bulk-index |  1647.44        |   315.067       |  -1332.37       |     ms |  -80.88% |
|                                       90th percentile latency |                             bulk-index |  3106.41        |   609.538       |  -2496.87       |     ms |  -80.38% |
|                                       99th percentile latency |                             bulk-index |  5699.55        |  1121.14        |  -4578.41       |     ms |  -80.33% |
|                                     99.9th percentile latency |                             bulk-index |  8431.32        |  4875.32        |  -3556          |     ms |  -42.18% |
|                                    99.99th percentile latency |                             bulk-index | 11082           |  7170.7         |  -3911.33       |     ms |  -35.29% |
|                                      100th percentile latency |                             bulk-index | 15475           | 11908.2         |  -3566.86       |     ms |  -23.05% |
|                                  50th percentile service time |                             bulk-index |  1646.87        |   316.439       |  -1330.43       |     ms |  -80.79% |
|                                  90th percentile service time |                             bulk-index |  3105.91        |   612.673       |  -2493.24       |     ms |  -80.27% |
|                                  99th percentile service time |                             bulk-index |  5701.42        |  1092.73        |  -4608.69       |     ms |  -80.83% |
|                                99.9th percentile service time |                             bulk-index |  8423.32        |  4887.69        |  -3535.63       |     ms |  -41.97% |
|                               99.99th percentile service time |                             bulk-index | 11085.5         |  7215.36        |  -3870.11       |     ms |  -34.91% |
|                                 100th percentile service time |                             bulk-index | 15475           | 11908.2         |  -3566.86       |     ms |  -23.05% |
|                                                    error rate |                             bulk-index |     0           |     0           |      0          |      % |    0.00% |

Member

@dnhatn dnhatn left a comment

I've left some comments. Thanks @martijnvg.

sumNumDocsWithField += entry.numDocsWithField;
}

// Documents marked as deleted should be rare. Maybe in the case of noop operation?
Member

Should we check this before getting docValues?

@Override
public void mergeNumericField(FieldInfo mergeFieldInfo, MergeState mergeState) throws IOException {
var result = compatibleWithOptimizedMerge(enableOptimizedMerge, mergeFieldInfo, mergeState, (docValuesProducer) -> {
var numeric = docValuesProducer.getNumeric(mergeFieldInfo);
Member

Should we query the NumericEntry directly?

if (docValuesProducer instanceof ES87TSDBDocValuesProducer producer && producer.version == VERSION_CURRENT) {
    var entry = producer.numerics.get(mergeFieldInfo.name);
    return new DocValuesConsumerUtil.FieldEntry(entry.docsWithFieldOffset, entry.numValues, -1);
}

Member Author

This is what I had initially; however, the type of the producer isn't ES87TSDBDocValuesProducer, but PerFieldDocValuesFormat.FieldsReader. So I ended up with the current workaround. I'm not happy about it, but I don't see another way.
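
For illustration, the workaround's rough shape (hypothetical code, later reworked per the review below; the BaseNumericDocValues class is the one discussed further down):

```java
// Because the producer seen during merging is the PerFieldDocValuesFormat.FieldsReader
// wrapper, the type check is done on the per-field doc values instance instead of
// on the producer itself.
var numeric = docValuesProducer.getNumeric(mergeFieldInfo);
if (numeric instanceof ES87TSDBDocValuesProducer.BaseNumericDocValues baseNumeric) {
    // the precomputed stats can then be read from the entry this instance was built from
}
```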

};
}

static NumericDocValues mergeNumericValues(List<NumericDocValuesSub> subs, boolean indexIsSorted) throws IOException {
Member

Did we copy these from DocValuesConsumer? Is there any chance we can avoid copying this?

Member Author

Yes. I don't see another way here; all the logic that we need is private to DocValuesConsumer.
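
For context, the copied logic follows the shape of Lucene's DocValuesConsumer (a simplified sketch based on the Lucene pattern, not the PR's exact code): a DocIDMerger interleaves the per-segment sub-iterators and, when the index is sorted, yields them in mapped doc id order, which is exactly the expensive step the stats optimization avoids re-running.

```java
import java.io.IOException;
import java.util.List;

import org.apache.lucene.index.DocIDMerger;
import org.apache.lucene.index.MergeState;
import org.apache.lucene.index.NumericDocValues;

// Wraps one segment's numeric doc values together with its doc id remapping.
class NumericDocValuesSub extends DocIDMerger.Sub {
    final NumericDocValues values;

    NumericDocValuesSub(MergeState.DocMap docMap, NumericDocValues values) {
        super(docMap);
        this.values = values;
    }

    @Override
    public int nextDoc() throws IOException {
        return values.nextDoc();
    }
}

class MergedNumericValuesSketch {

    static NumericDocValues mergeNumericValues(List<NumericDocValuesSub> subs, boolean indexIsSorted) throws IOException {
        // DocIDMerger re-orders the subs by mapped doc id when the index is sorted.
        DocIDMerger<NumericDocValuesSub> docIDMerger = DocIDMerger.of(subs, indexIsSorted);
        return new NumericDocValues() {
            NumericDocValuesSub current;
            int docID = -1;

            @Override
            public int nextDoc() throws IOException {
                current = docIDMerger.next();
                docID = current == null ? NO_MORE_DOCS : current.mappedDocID;
                return docID;
            }

            @Override
            public int docID() {
                return docID;
            }

            @Override
            public long longValue() throws IOException {
                return current.values.longValue();
            }

            @Override
            public int advance(int target) {
                throw new UnsupportedOperationException(); // merging only walks forward
            }

            @Override
            public boolean advanceExact(int target) {
                throw new UnsupportedOperationException();
            }

            @Override
            public long cost() {
                return 0; // not consulted during merging
            }
        };
    }
}
```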

@martijnvg
Member Author

Running tsdb track without this change as baseline and with this change as contender:

|                                                                   Metric |                    Task |        Baseline |       Contender |        Diff |   Unit |   Diff % |
|-------------------------------------------------------------------------:|------------------------:|----------------:|----------------:|------------:|-------:|---------:|
|                               Cumulative indexing time of primary shards |                         |   267.277       |   272.379       |     5.10255 |    min |   +1.91% |
|                        Min cumulative indexing time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|                     Median cumulative indexing time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|                        Max cumulative indexing time across primary shard |                         |   267.277       |   272.379       |     5.10255 |    min |   +1.91% |
|                      Cumulative indexing throttle time of primary shards |                         |     0           |     0           |     0       |    min |    0.00% |
|               Min cumulative indexing throttle time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|            Median cumulative indexing throttle time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|               Max cumulative indexing throttle time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|                                  Cumulative merge time of primary shards |                         |    88.0096      |    85.7457      |    -2.26392 |    min |   -2.57% |
|                                 Cumulative merge count of primary shards |                         |    60           |    59           |    -1       |        |   -1.67% |
|                           Min cumulative merge time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|                        Median cumulative merge time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|                           Max cumulative merge time across primary shard |                         |    88.0096      |    85.7457      |    -2.26392 |    min |   -2.57% |
|                         Cumulative merge throttle time of primary shards |                         |    15.859       |    14.9264      |    -0.9327  |    min |   -5.88% |
|                  Min cumulative merge throttle time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|               Median cumulative merge throttle time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|                  Max cumulative merge throttle time across primary shard |                         |    15.859       |    14.9264      |    -0.9327  |    min |   -5.88% |
|                                Cumulative refresh time of primary shards |                         |     1.61935     |     1.67273     |     0.05338 |    min |   +3.30% |
|                               Cumulative refresh count of primary shards |                         |   163           |   162           |    -1       |        |   -0.61% |
|                         Min cumulative refresh time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|                      Median cumulative refresh time across primary shard |                         |     0           |     0           |     0       |    min |    0.00% |
|                         Max cumulative refresh time across primary shard |                         |     1.61935     |     1.67273     |     0.05338 |    min |   +3.30% |
|                                  Cumulative flush time of primary shards |                         |     0.0152333   |     0.0009      |    -0.01433 |    min |  -94.09% |
|                                 Cumulative flush count of primary shards |                         |    14           |    13           |    -1       |        |   -7.14% |
|                           Min cumulative flush time across primary shard |                         |     3.33333e-05 |     3.33333e-05 |     0       |    min |    0.00% |
|                        Median cumulative flush time across primary shard |                         |     3.33333e-05 |     3.33333e-05 |     0       |    min |    0.00% |
|                           Max cumulative flush time across primary shard |                         |     0.0147667   |     0.000433333 |    -0.01433 |    min |  -97.07% |
|                                                  Total Young Gen GC time |                         |    36.007       |    37.219       |     1.212   |      s |   +3.37% |
|                                                 Total Young Gen GC count |                         |  1598           |  1602           |     4       |        |   +0.25% |
|                                                    Total Old Gen GC time |                         |     0           |     0           |     0       |      s |    0.00% |
|                                                   Total Old Gen GC count |                         |     0           |     0           |     0       |        |    0.00% |
|                                                               Store size |                         |     4.64955     |     4.67477     |     0.02522 |     GB |   +0.54% |
|                                                            Translog size |                         |     6.65896e-07 |     6.65896e-07 |     0       |     GB |    0.00% |
|                                                   Heap used for segments |                         |     0           |     0           |     0       |     MB |    0.00% |
|                                                 Heap used for doc values |                         |     0           |     0           |     0       |     MB |    0.00% |
|                                                      Heap used for terms |                         |     0           |     0           |     0       |     MB |    0.00% |
|                                                      Heap used for norms |                         |     0           |     0           |     0       |     MB |    0.00% |
|                                                     Heap used for points |                         |     0           |     0           |     0       |     MB |    0.00% |
|                                              Heap used for stored fields |                         |     0           |     0           |     0       |     MB |    0.00% |
|                                                            Segment count |                         |     6           |    13           |     7       |        | +116.67% |
|                                              Total Ingest Pipeline count |                         |     0           |     0           |     0       |        |    0.00% |
|                                               Total Ingest Pipeline time |                         |     0           |     0           |     0       |     ms |    0.00% |
|                                             Total Ingest Pipeline failed |                         |     0           |     0           |     0       |        |    0.00% |
|                                                           Min Throughput |                   index | 82125.6         | 80755.3         | -1370.32    | docs/s |   -1.67% |
|                                                          Mean Throughput |                   index | 87956.1         | 87051.1         |  -904.987   | docs/s |   -1.03% |
|                                                        Median Throughput |                   index | 88444.2         | 87573.7         |  -870.508   | docs/s |   -0.98% |
|                                                           Max Throughput |                   index | 93514.4         | 91517.8         | -1996.6     | docs/s |   -2.14% |
|                                                  50th percentile latency |                   index |   851.696       |   856.605       |     4.90901 |     ms |   +0.58% |
|                                                  90th percentile latency |                   index |  1113.73        |  1136.79        |    23.0587  |     ms |   +2.07% |
|                                                  99th percentile latency |                   index |  2823.29        |  2941.41        |   118.118   |     ms |   +4.18% |
|                                                99.9th percentile latency |                   index |  4246.65        |  4373.47        |   126.818   |     ms |   +2.99% |
|                                               99.99th percentile latency |                   index |  5733.52        |  5060.5         |  -673.019   |     ms |  -11.74% |
|                                                 100th percentile latency |                   index |  5980.77        |  5550.07        |  -430.702   |     ms |   -7.20% |
|                                             50th percentile service time |                   index |   851.235       |   856.613       |     5.37861 |     ms |   +0.63% |
|                                             90th percentile service time |                   index |  1122.35        |  1135.69        |    13.3398  |     ms |   +1.19% |
|                                             99th percentile service time |                   index |  2823.5         |  2934.71        |   111.212   |     ms |   +3.94% |
|                                           99.9th percentile service time |                   index |  4220.14        |  4372.69        |   152.55    |     ms |   +3.61% |
|                                          99.99th percentile service time |                   index |  5733.52        |  5060.5         |  -673.019   |     ms |  -11.74% |
|                                            100th percentile service time |                   index |  5980.77        |  5550.07        |  -430.702   |     ms |   -7.20% |
|                                                               error rate |                   index |     0           |     0           |     0       |      % |    0.00% |
|                                                           Min Throughput |                 default |    72.7266      |    63.9027      |    -8.82396 |  ops/s |  -12.13% |
|                                                          Mean Throughput |                 default |    72.7266      |    66.0422      |    -6.68449 |  ops/s |   -9.19% |
|                                                        Median Throughput |                 default |    72.7266      |    66.0422      |    -6.68449 |  ops/s |   -9.19% |
|                                                           Max Throughput |                 default |    72.7266      |    68.1816      |    -4.54502 |  ops/s |   -6.25% |
|                                                  50th percentile latency |                 default |    11.3524      |    12.8836      |     1.53121 |     ms |  +13.49% |
|                                                  90th percentile latency |                 default |    11.9501      |    13.4078      |     1.45768 |     ms |  +12.20% |
|                                                  99th percentile latency |                 default |    17.505       |    15.3226      |    -2.18244 |     ms |  -12.47% |
|                                                 100th percentile latency |                 default |    17.6197      |    16.5698      |    -1.04989 |     ms |   -5.96% |
|                                             50th percentile service time |                 default |    11.3524      |    12.8836      |     1.53121 |     ms |  +13.49% |
|                                             90th percentile service time |                 default |    11.9501      |    13.4078      |     1.45768 |     ms |  +12.20% |
|                                             99th percentile service time |                 default |    17.505       |    15.3226      |    -2.18244 |     ms |  -12.47% |
|                                            100th percentile service time |                 default |    17.6197      |    16.5698      |    -1.04989 |     ms |   -5.96% |
|                                                               error rate |                 default |     0           |     0           |     0       |      % |    0.00% |
|                                                           Min Throughput |              default_1k |    29.1361      |    27.9969      |    -1.1392  |  ops/s |   -3.91% |
|                                                          Mean Throughput |              default_1k |    29.5797      |    28.6573      |    -0.92236 |  ops/s |   -3.12% |
|                                                        Median Throughput |              default_1k |    29.6672      |    28.7824      |    -0.88483 |  ops/s |   -2.98% |
|                                                           Max Throughput |              default_1k |    29.8482      |    29.0676      |    -0.78057 |  ops/s |   -2.62% |
|                                                  50th percentile latency |              default_1k |    32.2199      |    33.0253      |     0.80531 |     ms |   +2.50% |
|                                                  90th percentile latency |              default_1k |    33.0007      |    33.6412      |     0.64048 |     ms |   +1.94% |
|                                                  99th percentile latency |              default_1k |    51.6371      |    38.0896      |   -13.5476  |     ms |  -26.24% |
|                                                 100th percentile latency |              default_1k |    54.7434      |    48.1042      |    -6.63918 |     ms |  -12.13% |
|                                             50th percentile service time |              default_1k |    32.2199      |    33.0253      |     0.80531 |     ms |   +2.50% |
|                                             90th percentile service time |              default_1k |    33.0007      |    33.6412      |     0.64048 |     ms |   +1.94% |
|                                             99th percentile service time |              default_1k |    51.6371      |    38.0896      |   -13.5476  |     ms |  -26.24% |
|                                            100th percentile service time |              default_1k |    54.7434      |    48.1042      |    -6.63918 |     ms |  -12.13% |
|                                                               error rate |              default_1k |     0           |     0           |     0       |      % |    0.00% |
|                                                           Min Throughput | date-histo-entire-range |   317.361       |   319.564       |     2.20343 |  ops/s |   +0.69% |
|                                                          Mean Throughput | date-histo-entire-range |   317.361       |   319.564       |     2.20343 |  ops/s |   +0.69% |
|                                                        Median Throughput | date-histo-entire-range |   317.361       |   319.564       |     2.20343 |  ops/s |   +0.69% |
|                                                           Max Throughput | date-histo-entire-range |   317.361       |   319.564       |     2.20343 |  ops/s |   +0.69% |
|                                                  50th percentile latency | date-histo-entire-range |     2.66948     |     2.59981     |    -0.06967 |     ms |   -2.61% |
|                                                  90th percentile latency | date-histo-entire-range |     2.94906     |     2.74951     |    -0.19955 |     ms |   -6.77% |
|                                                  99th percentile latency | date-histo-entire-range |     3.67115     |     3.05257     |    -0.61857 |     ms |  -16.85% |
|                                                 100th percentile latency | date-histo-entire-range |     3.7502      |     3.14106     |    -0.60914 |     ms |  -16.24% |
|                                             50th percentile service time | date-histo-entire-range |     2.66948     |     2.59981     |    -0.06967 |     ms |   -2.61% |
|                                             90th percentile service time | date-histo-entire-range |     2.94906     |     2.74951     |    -0.19955 |     ms |   -6.77% |
|                                             99th percentile service time | date-histo-entire-range |     3.67115     |     3.05257     |    -0.61857 |     ms |  -16.85% |
|                                            100th percentile service time | date-histo-entire-range |     3.7502      |     3.14106     |    -0.60914 |     ms |  -16.24% |
|                                                               error rate | date-histo-entire-range |     0           |     0           |     0       |      % |    0.00% |
|                                                           Min Throughput |          esql-fetch-500 |     8.95722     |     9.22561     |     0.26839 |  ops/s |   +3.00% |
|                                                          Mean Throughput |          esql-fetch-500 |     9.60207     |     9.79872     |     0.19664 |  ops/s |   +2.05% |
|                                                        Median Throughput |          esql-fetch-500 |     9.65652     |     9.85814     |     0.20162 |  ops/s |   +2.09% |
|                                                           Max Throughput |          esql-fetch-500 |    10.0294      |    10.1854      |     0.156   |  ops/s |   +1.56% |
|                                                  50th percentile latency |          esql-fetch-500 |    90.0991      |    89.3751      |    -0.72395 |     ms |   -0.80% |
|                                                  90th percentile latency |          esql-fetch-500 |    97.2384      |    95.5091      |    -1.72926 |     ms |   -1.78% |
|                                                  99th percentile latency |          esql-fetch-500 |   108.207       |   122.953       |    14.7463  |     ms |  +13.63% |
|                                                 100th percentile latency |          esql-fetch-500 |   110.563       |   125.449       |    14.8864  |     ms |  +13.46% |
|                                             50th percentile service time |          esql-fetch-500 |    90.0991      |    89.3751      |    -0.72395 |     ms |   -0.80% |
|                                             90th percentile service time |          esql-fetch-500 |    97.2384      |    95.5091      |    -1.72926 |     ms |   -1.78% |
|                                             99th percentile service time |          esql-fetch-500 |   108.207       |   122.953       |    14.7463  |     ms |  +13.63% |
|                                            100th percentile service time |          esql-fetch-500 |   110.563       |   125.449       |    14.8864  |     ms |  +13.46% |
|                                                               error rate |          esql-fetch-500 |     0           |     0           |     0       |      % |    0.00% |

Unfortunately the improvement is less visible here. Looks like ~3% less time spent on merging.

@martijnvg
Member Author

Thanks Nhat for the review!

Member

@dnhatn dnhatn left a comment

It looks much better. Thanks, Martijn. Would you mind taking a look at the test failures? They seem related.

@@ -1262,6 +1262,15 @@ public long longValue() throws IOException {
}
}

abstract static class BaseNumericDocValues extends NumericDocValues {
Member

I think we no longer need these base classes?

Member Author

Good point, we no longer need these base classes. I will remove them.

Member Author

Removed: 39dc98f

@dnhatn dnhatn self-requested a review April 5, 2025 05:37
@martijnvg
Member Author

> Would you mind taking a look at the test failures? They seem related.

They are related and I think this will address the bwc failures: 5a2da25

Member

@dnhatn dnhatn left a comment

LGTM. Thanks Martijn!

@martijnvg martijnvg added auto-backport Automatically create backport pull requests when merged test-full-bwc Trigger full BWC version matrix tests test-release Trigger CI checks against release build labels Apr 8, 2025
@martijnvg
Member Author

Unfortunately it isn't possible to run the release tests due to:

* What went wrong:
Could not determine the dependencies of task ':distribution:docker:buildAarch64FipsDockerContext'.
> Could not resolve all dependencies for configuration ':distribution:docker:metricbeat_fips_aarch64'.
   > Could not find beats:metricbeat-fips:9.1.0.
     Required by:
         project :distribution:docker

I did confirm that locally the unit tests pass with a release build (meaning the feature flag is disabled):

./gradlew ":server:test" --tests "org.elasticsearch.index.codec.*" -Dbuild.snapshot=false -Dtests.jvm.argline="-Dbuild.snapshot=false" -Dlicense.key=x-pack/license-tools/src/test/resources/public.key

@martijnvg martijnvg removed the test-release Trigger CI checks against release build label Apr 8, 2025
Labels
auto-backport Automatically create backport pull requests when merged >enhancement :StorageEngine/Codec Team:StorageEngine test-full-bwc Trigger full BWC version matrix tests v8.19.0 v9.1.0