Skip to content

Fix segment stats in tsdb #89754

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Aug 31, 2022

The serialization for segment stats was broken for tsdb because we
return a slightly different sort configuration. That caused
_segments and _cat/segments to break when any shard of the tsdb
index is hosted on another node.

Closes #89609

@nik9000
Copy link
Member Author

nik9000 commented Aug 31, 2022

CC @weizijun

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Aug 31, 2022
@elasticsearchmachine
Copy link
Collaborator

Hi @nik9000, I've created a changelog YAML for you.

The serialization for segment stats was broken for tsdb because we
return a *slightly* different sort configuration. That caused
`_segments` and `_cat/segments` to break when any shard of the tsdb
index is hosted on another node.

Closes elastic#89609
@nik9000 nik9000 requested a review from csoulios August 31, 2022 14:26
- match:
$body: |
/^(tsdb \s+ 0 \s+ p \s+ \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} \s+ _\d (\s\d){3} \s+
(\d+|\d+[.]\d+)(kb|b) \s+ \d+ (\s+ (false|true)){2} \s+ \d+\.\d+(\.\d+)? \s+ (false|true) \s? \n?)$/
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can reproduce @weizijun's failure like this:

while g -p qa/smoke-test-multinode/ check -Dtests.method='*segments*tsdb*'; do echo ok; done

Copy link
Contributor

@csoulios csoulios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with the Segment class and its inner workings.

I have left couple comments/questions, but I leave it up to the other reviewers to approve this PR.

} else {
o.writeByte((byte) 5);
o.writeOptionalBoolean(field.getMissingValue() == null ? null : field.getMissingValue() == SortField.STRING_FIRST);
o.writeBoolean(field.getReverse());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I understand this part very well. Don't we need to write a boolean for the selector?

In all other cases (type = 0) I see one byte and thre booleans are written. Not sure why type = 5 requires only 2 booleans.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a selector we use the other case above. In this case we just don't have one.

@@ -189,6 +189,13 @@ private static Sort readSegmentSort(StreamInput in) throws IOException {
if (missingFirst != null) {
fields[i].setMissingValue(missingFirst ? SortedSetSortField.STRING_FIRST : SortedSetSortField.STRING_LAST);
}
} else if (type == 5) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where can I see what type = 5 is? A comment would help here, a constant would be even better, I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I was following the old way and it didn't have them. I didn't want to rock the boat too much here, but there isn't any reason not to try and at least make it more readable. I'll add some constants.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change makes sense to me, _tsid is our first field that we index as a single-value field which can be used for index sorting.

@nik9000
Copy link
Member Author

nik9000 commented Sep 1, 2022

The change makes sense to me, _tsid is our first field that we index as a single-value field which can be used for index sorting.

@jpountz could you comment on why we aren't using Lucene's serialization for these? Or why we don't make our own object rather than this brittle thing? I don't particularly want to change now, but it seems like we wouldn't try to serialize with the instanceof stuff if we were building it now.

@jpountz
Copy link
Contributor

jpountz commented Sep 1, 2022

I would guess that one reason is because segment stats added the sorting configuration to its output before Lucene exposed serialization for SortFields, which used to be a codec implementation before. @romseygeek might have more info about whether moving to Lucene serialization would be possible and a good idea. FWIW I see we have custom serialization logic for SortField in org.elasticsearch.common.lucene.Lucene#readSortField as well.

@nik9000
Copy link
Member Author

nik9000 commented Sep 1, 2022

prg.elasticsearch.common.lucene.Lucene#readSortField

Oh no! It's totally different!

@nik9000 nik9000 added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Sep 1, 2022
@elasticsearchmachine elasticsearchmachine merged commit 188990f into elastic:main Sep 8, 2022
@nik9000 nik9000 deleted the tsdb_segment_stats_sneaky branch September 8, 2022 16:11
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Sep 9, 2022
* main: (34 commits)
  Make sure ivy repo directory exists before downloading artifacts
  Use 'file://' scheme for local repository URL
  Use DRA artifacts for release build CI jobs
  Log unsuccessful attempts to get credentials from web identity tokens (elastic#88241)
  Script: Write Field API path manipulation (elastic#89889)
  Fetch health info action (elastic#89820)
  Fix memory leak in TransportDeleteExpiredDataAction (elastic#89935)
  [ML] Performance improvements for categorization jobs (elastic#89824)
  [DOCS] Revert changes for ES_JAVA_OPTS (elastic#89931)
  Fix deadlock bug exposed by a test (elastic#89934)
  [Downsampling] Remove `FieldValueFetcher` validator (elastic#89497)
  Fix segment stats in tsdb (elastic#89754)
  Synthetic _source: support dense_vector (elastic#89840)
  REST tests fetching fields with synthetic _source (elastic#89888)
  Do not deserialize back BytesTransportRequest to clone a request in MockTransportService (elastic#89926)
  Add SDK request logging to debug failures of S3BlobStoreRepositoryTests#testRequestStats (elastic#89912)
  Fix SnapshotStatusApisIT.testGetSnapshotsWithSnapshotInProgress (elastic#89925)
  Document synthetic source for text and keyword (elastic#89893)
  Fix CloneSnapshotIT.testRemoveFailedCloneFromCSWithQueuedSnapshotInProgress (elastic#89914)
  Add missing index.mapping.total_fields.limit setting to the target index (elastic#89875)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >bug :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TSDB: GET _cat/segments failed in time_series index
4 participants