-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Fix segment stats in tsdb #89754
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix segment stats in tsdb #89754
Conversation
CC @weizijun |
Pinging @elastic/es-analytics-geo (Team:Analytics) |
Hi @nik9000, I've created a changelog YAML for you. |
The serialization for segment stats was broken for tsdb because we return a *slightly* different sort configuration. That caused `_segments` and `_cat/segments` to break when any shard of the tsdb index is hosted on another node. Closes elastic#89609
- match: | ||
$body: | | ||
/^(tsdb \s+ 0 \s+ p \s+ \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} \s+ _\d (\s\d){3} \s+ | ||
(\d+|\d+[.]\d+)(kb|b) \s+ \d+ (\s+ (false|true)){2} \s+ \d+\.\d+(\.\d+)? \s+ (false|true) \s? \n?)$/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can reproduce @weizijun's failure like this:
while g -p qa/smoke-test-multinode/ check -Dtests.method='*segments*tsdb*'; do echo ok; done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with the Segment
class and its inner workings.
I have left couple comments/questions, but I leave it up to the other reviewers to approve this PR.
} else { | ||
o.writeByte((byte) 5); | ||
o.writeOptionalBoolean(field.getMissingValue() == null ? null : field.getMissingValue() == SortField.STRING_FIRST); | ||
o.writeBoolean(field.getReverse()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I understand this part very well. Don't we need to write a boolean for the selector?
In all other cases (type = 0
) I see one byte and thre booleans are written. Not sure why type = 5
requires only 2 booleans.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a selector we use the other case above. In this case we just don't have one.
@@ -189,6 +189,13 @@ private static Sort readSegmentSort(StreamInput in) throws IOException { | |||
if (missingFirst != null) { | |||
fields[i].setMissingValue(missingFirst ? SortedSetSortField.STRING_FIRST : SortedSetSortField.STRING_LAST); | |||
} | |||
} else if (type == 5) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where can I see what type = 5
is? A comment would help here, a constant would be even better, I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I was following the old way and it didn't have them. I didn't want to rock the boat too much here, but there isn't any reason not to try and at least make it more readable. I'll add some constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change makes sense to me, _tsid
is our first field that we index as a single-value field which can be used for index sorting.
@jpountz could you comment on why we aren't using Lucene's serialization for these? Or why we don't make our own object rather than this brittle thing? I don't particularly want to change now, but it seems like we wouldn't try to serialize with the |
I would guess that one reason is because segment stats added the sorting configuration to its output before Lucene exposed serialization for SortFields, which used to be a codec implementation before. @romseygeek might have more info about whether moving to Lucene serialization would be possible and a good idea. FWIW I see we have custom serialization logic for |
Oh no! It's totally different! |
* main: (34 commits) Make sure ivy repo directory exists before downloading artifacts Use 'file://' scheme for local repository URL Use DRA artifacts for release build CI jobs Log unsuccessful attempts to get credentials from web identity tokens (elastic#88241) Script: Write Field API path manipulation (elastic#89889) Fetch health info action (elastic#89820) Fix memory leak in TransportDeleteExpiredDataAction (elastic#89935) [ML] Performance improvements for categorization jobs (elastic#89824) [DOCS] Revert changes for ES_JAVA_OPTS (elastic#89931) Fix deadlock bug exposed by a test (elastic#89934) [Downsampling] Remove `FieldValueFetcher` validator (elastic#89497) Fix segment stats in tsdb (elastic#89754) Synthetic _source: support dense_vector (elastic#89840) REST tests fetching fields with synthetic _source (elastic#89888) Do not deserialize back BytesTransportRequest to clone a request in MockTransportService (elastic#89926) Add SDK request logging to debug failures of S3BlobStoreRepositoryTests#testRequestStats (elastic#89912) Fix SnapshotStatusApisIT.testGetSnapshotsWithSnapshotInProgress (elastic#89925) Document synthetic source for text and keyword (elastic#89893) Fix CloneSnapshotIT.testRemoveFailedCloneFromCSWithQueuedSnapshotInProgress (elastic#89914) Add missing index.mapping.total_fields.limit setting to the target index (elastic#89875) ...
The serialization for segment stats was broken for tsdb because we
return a slightly different sort configuration. That caused
_segments
and_cat/segments
to break when any shard of the tsdbindex is hosted on another node.
Closes #89609