Skip to content

First step optimizing tsdb doc values codec merging. #125403

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 61 commits into from
Apr 9, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
61 commits
Select commit Hold shift + click to select a range
e496e0d
First step optimizing tsdb doc values codec merging.
martijnvg Mar 21, 2025
9bd2907
[CI] Auto commit changes from spotless
elasticsearchmachine Mar 21, 2025
65d97e5
actually use OrdinalMap when merging sorted and sorted dv
martijnvg Mar 21, 2025
7369a22
fix test
martijnvg Mar 21, 2025
3b7822d
[CI] Auto commit changes from spotless
elasticsearchmachine Mar 21, 2025
ce4b326
fix test (2)
martijnvg Mar 21, 2025
486ea20
fix lost of stuff
martijnvg Mar 21, 2025
16c0a00
Merge remote-tracking branch 'es/main' into mergeSortedNumericField_3
martijnvg Mar 21, 2025
984513a
iter
martijnvg Mar 24, 2025
5a575d6
Merge remote-tracking branch 'es/main' into mergeSortedNumericField_3
martijnvg Mar 24, 2025
3b53705
iter test
martijnvg Mar 24, 2025
9fb38b6
moving code around
martijnvg Mar 24, 2025
1e0e2f8
benchmark iter
martijnvg Mar 25, 2025
65741c4
Merge remote-tracking branch 'es/main' into mergeSortedNumericField_3
martijnvg Mar 25, 2025
1ec6308
Check for deleted docs before getting doc value instances.
martijnvg Mar 25, 2025
ccae570
Merge remote-tracking branch 'es/main' into mergeSortedNumericField_3
martijnvg Mar 25, 2025
5e7cc11
remove doc value skipper check
martijnvg Mar 25, 2025
744a665
Remove getEntryFunction lamda and delegate to doc value instance dire…
martijnvg Mar 25, 2025
176fac7
lower doc count in benchmark
martijnvg Mar 25, 2025
ec998a3
added node setting to control whether optimized merge is enabled.
martijnvg Mar 25, 2025
5425079
Merge remote-tracking branch 'es/main' into mergeSortedNumericField_3
martijnvg Mar 25, 2025
066b778
Update docs/changelog/125403.yaml
martijnvg Mar 25, 2025
722f85e
[CI] Auto commit changes from spotless
elasticsearchmachine Mar 25, 2025
2bb9867
register node setting
martijnvg Mar 25, 2025
5bcc62c
fix npe
martijnvg Mar 25, 2025
27efdd2
iter
martijnvg Mar 26, 2025
646c566
Revert node setting for jvm env variable.
martijnvg Mar 26, 2025
98c0874
Merge remote-tracking branch 'es/main' into mergeSortedNumericField_3
martijnvg Mar 26, 2025
71201c8
more tests
martijnvg Mar 27, 2025
c41b1f9
Merge remote-tracking branch 'es/main' into mergeSortedNumericField_3
martijnvg Mar 27, 2025
e6fe87a
iter
martijnvg Mar 27, 2025
020bae7
remove unused field
martijnvg Mar 27, 2025
d8b3c15
fixed bug
martijnvg Mar 27, 2025
30b3c62
Merge remote-tracking branch 'es/main' into mergeSortedNumericField_3
martijnvg Mar 29, 2025
9f96da1
Make it really work:
martijnvg Mar 29, 2025
8edfc39
Store numDocsWithField statistic on NumericEntry instead of SortedNum…
martijnvg Mar 31, 2025
7e958fe
Merge remote-tracking branch 'es/main' into mergeSortedNumericField_3
martijnvg Apr 2, 2025
45c6a7c
Merge remote-tracking branch 'es/main' into mergeSortedNumericField_3
martijnvg Apr 2, 2025
638ae13
iter
martijnvg Apr 2, 2025
7f4773b
removed unused import
martijnvg Apr 2, 2025
09ee20a
cleanup
martijnvg Apr 2, 2025
5ad223d
Merge remote-tracking branch 'es/main' into mergeSortedNumericField_3
martijnvg Apr 2, 2025
b26000e
Merge remote-tracking branch 'es/main' into mergeSortedNumericField_3
martijnvg Apr 4, 2025
75f5a75
fork DocValuesConsumer
martijnvg Apr 4, 2025
fa0c5ee
Remove unused code from XDocValuesConsumer and let ES819TSDBDocValues…
martijnvg Apr 4, 2025
fb9fd6d
spotless and checkstyle
martijnvg Apr 4, 2025
dd460e8
fork PerFieldDocValuesFormat in order to avoid tricky unwrapping in D…
martijnvg Apr 4, 2025
c3abb0e
oops
martijnvg Apr 4, 2025
6921dd8
rename codec name
martijnvg Apr 4, 2025
3e60a74
removed unused exception
martijnvg Apr 4, 2025
5a2da25
address bwc failures
martijnvg Apr 5, 2025
39dc98f
Remove BaseNumericDocValues and BaseSortedNumericDocValues
martijnvg Apr 5, 2025
5bfb302
improve TsdbDocValueBwcTests
martijnvg Apr 5, 2025
b641e01
Merge remote-tracking branch 'es/main' into mergeSortedNumericField_3
martijnvg Apr 5, 2025
5e8ea42
Assert per field format field info attributes.
martijnvg Apr 5, 2025
1c4efa6
Merge remote-tracking branch 'es/main' into mergeSortedNumericField_3
martijnvg Apr 8, 2025
57e2996
Merge remote-tracking branch 'es/main' into mergeSortedNumericField_3
martijnvg Apr 8, 2025
b5f332b
Merge remote-tracking branch 'es/main' into mergeSortedNumericField_3
martijnvg Apr 8, 2025
66c7efb
Merge remote-tracking branch 'es/main' into mergeSortedNumericField_3
martijnvg Apr 8, 2025
86b4d22
Merge remote-tracking branch 'es/main' into mergeSortedNumericField_3
martijnvg Apr 8, 2025
a44ab59
move per field dv code to dedicated package
martijnvg Apr 8, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,196 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the "Elastic License
* 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
* Public License v 1"; you may not use this file except in compliance with, at
* your election, the "Elastic License 2.0", the "GNU Affero General Public
* License v3.0 only", or the "Server Side Public License, v 1".
*/

package org.elasticsearch.benchmark.index.codec.tsdb;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.document.SortedSetDocValuesField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.SortedNumericSortField;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;
import org.elasticsearch.cluster.metadata.DataStream;
import org.elasticsearch.common.logging.LogConfigurator;
import org.elasticsearch.index.codec.Elasticsearch900Lucene101Codec;
import org.elasticsearch.index.codec.tsdb.es819.ES819TSDBDocValuesFormat;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.profile.AsyncProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.io.IOException;
import java.nio.file.Files;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
@Fork(1)
@Threads(1)
@Warmup(iterations = 0)
@Measurement(iterations = 1)
public class TSDBDocValuesMergeBenchmark {

static {
// For Elasticsearch900Lucene101Codec:
LogConfigurator.loadLog4jPlugins();
LogConfigurator.configureESLogging();
LogConfigurator.setNodeName("test");
}

@Param("20431204")
private int nDocs;

@Param("1000")
private int deltaTime;

@Param("42")
private int seed;

private static final String TIMESTAMP_FIELD = "@timestamp";
private static final String HOSTNAME_FIELD = "host.name";
private static final long BASE_TIMESTAMP = 1704067200000L;

private IndexWriter indexWriterWithoutOptimizedMerge;
private IndexWriter indexWriterWithOptimizedMerge;
private ExecutorService executorService;

public static void main(String[] args) throws RunnerException {
final Options options = new OptionsBuilder().include(TSDBDocValuesMergeBenchmark.class.getSimpleName())
.addProfiler(AsyncProfiler.class)
.build();

new Runner(options).run();
}

@Setup(Level.Trial)
public void setup() throws IOException {
executorService = Executors.newSingleThreadExecutor();

final Directory tempDirectoryWithoutDocValuesSkipper = FSDirectory.open(Files.createTempDirectory("temp1-"));
final Directory tempDirectoryWithDocValuesSkipper = FSDirectory.open(Files.createTempDirectory("temp2-"));

indexWriterWithoutOptimizedMerge = createIndex(tempDirectoryWithoutDocValuesSkipper, false);
indexWriterWithOptimizedMerge = createIndex(tempDirectoryWithDocValuesSkipper, true);
}

private IndexWriter createIndex(final Directory directory, final boolean optimizedMergeEnabled) throws IOException {
final var iwc = createIndexWriterConfig(optimizedMergeEnabled);
long counter1 = 0;
long counter2 = 10_000_000;
long[] gauge1Values = new long[] { 2, 4, 6, 8, 10, 12, 14, 16 };
long[] gauge2Values = new long[] { -2, -4, -6, -8, -10, -12, -14, -16 };
int numHosts = 1000;
String[] tags = new String[] { "tag_1", "tag_2", "tag_3", "tag_4", "tag_5", "tag_6", "tag_7", "tag_8" };

final Random random = new Random(seed);
IndexWriter indexWriter = new IndexWriter(directory, iwc);
for (int i = 0; i < nDocs; i++) {
final Document doc = new Document();

final int batchIndex = i / numHosts;
final String hostName = "host-" + batchIndex;
// Slightly vary the timestamp in each document
final long timestamp = BASE_TIMESTAMP + ((i % numHosts) * deltaTime) + random.nextInt(0, deltaTime);

doc.add(new SortedDocValuesField(HOSTNAME_FIELD, new BytesRef(hostName)));
doc.add(new SortedNumericDocValuesField(TIMESTAMP_FIELD, timestamp));
doc.add(new SortedNumericDocValuesField("counter_1", counter1++));
doc.add(new SortedNumericDocValuesField("counter_2", counter2++));
doc.add(new SortedNumericDocValuesField("gauge_1", gauge1Values[i % gauge1Values.length]));
doc.add(new SortedNumericDocValuesField("gauge_2", gauge2Values[i % gauge1Values.length]));
int numTags = tags.length % (i + 1);
for (int j = 0; j < numTags; j++) {
doc.add(new SortedSetDocValuesField("tags", new BytesRef(tags[j])));
}

indexWriter.addDocument(doc);
}
indexWriter.commit();
return indexWriter;
}

@Benchmark
public void forceMergeWithoutOptimizedMerge() throws IOException {
forceMerge(indexWriterWithoutOptimizedMerge);
}

@Benchmark
public void forceMergeWithOptimizedMerge() throws IOException {
forceMerge(indexWriterWithOptimizedMerge);
}

private void forceMerge(final IndexWriter indexWriter) throws IOException {
indexWriter.forceMerge(1);
}

@TearDown(Level.Trial)
public void tearDown() {
if (executorService != null) {
executorService.shutdown();
try {
if (executorService.awaitTermination(30, TimeUnit.SECONDS) == false) {
executorService.shutdownNow();
}
} catch (InterruptedException e) {
executorService.shutdownNow();
Thread.currentThread().interrupt();
}
}
}

private static IndexWriterConfig createIndexWriterConfig(boolean optimizedMergeEnabled) {
var config = new IndexWriterConfig(new StandardAnalyzer());
// NOTE: index sort config matching LogsDB's sort order
config.setIndexSort(
new Sort(
new SortField(HOSTNAME_FIELD, SortField.Type.STRING, false),
new SortedNumericSortField(TIMESTAMP_FIELD, SortField.Type.LONG, true)
)
);
config.setLeafSorter(DataStream.TIMESERIES_LEAF_READERS_SORTER);
config.setMergePolicy(new LogByteSizeMergePolicy());
var docValuesFormat = new ES819TSDBDocValuesFormat(4096, optimizedMergeEnabled);
config.setCodec(new Elasticsearch900Lucene101Codec() {

@Override
public DocValuesFormat getDocValuesFormatForField(String field) {
return docValuesFormat;
}
});
return config;
}
}
5 changes: 5 additions & 0 deletions docs/changelog/125403.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
pr: 125403
summary: First step optimizing tsdb doc values codec merging
area: Codec
type: enhancement
issues: []
1 change: 1 addition & 0 deletions server/src/main/java/module-info.java
Original file line number Diff line number Diff line change
Expand Up @@ -475,4 +475,5 @@
exports org.elasticsearch.monitor.metrics;
exports org.elasticsearch.plugins.internal.rewriter to org.elasticsearch.inference;
exports org.elasticsearch.lucene.util.automaton;
exports org.elasticsearch.index.codec.perfield;
}
Original file line number Diff line number Diff line change
Expand Up @@ -17,9 +17,9 @@
import org.apache.lucene.codecs.lucene101.Lucene101PostingsFormat;
import org.apache.lucene.codecs.lucene90.Lucene90DocValuesFormat;
import org.apache.lucene.codecs.lucene99.Lucene99HnswVectorsFormat;
import org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat;
import org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat;
import org.apache.lucene.codecs.perfield.PerFieldPostingsFormat;
import org.elasticsearch.index.codec.perfield.XPerFieldDocValuesFormat;
import org.elasticsearch.index.codec.zstd.Zstd814StoredFieldsFormat;

/**
Expand All @@ -39,7 +39,7 @@ public PostingsFormat getPostingsFormatForField(String field) {
};

private final DocValuesFormat defaultDVFormat;
private final DocValuesFormat docValuesFormat = new PerFieldDocValuesFormat() {
private final DocValuesFormat docValuesFormat = new XPerFieldDocValuesFormat() {
@Override
public DocValuesFormat getDocValuesFormatForField(String field) {
return Elasticsearch900Lucene101Codec.this.getDocValuesFormatForField(field);
Expand Down
Loading
Loading