-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Update persistent state document in the index the document belongs to #51751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update persistent state document in the index the document belongs to #51751
Conversation
Pinging @elastic/ml-core (:ml) |
4657afd
to
889df4f
Compare
|
||
stateProcessor.process(stream); | ||
Exception e = expectThrows(IllegalStateException.class, () -> stateProcessor.process(stream)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a dangerous change. Previously if the C++ wrote nothing but whitespace it was silently ignored. Now it will cause the state processor to throw an exception and never process any subsequent state.
I think the behaviour should be changed back to what it was for the pure whitespace case. It's fine to throw an exception if a bulk metadata JSON object is invalid. But if it's not present at all then I think we should maintain the previous behaviour of treating it as a no-op. Otherwise a very careful audit of the C++ code will be required to find what situations it might write blank state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good finding, done.
Please note that I'm now skipping leading blank lines just in case the actual request body is present after those blank lines.
889df4f
to
9095369
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but please can you add some comments to make it clearer to future maintainers why we are doing this
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/process/IndexingStateProcessor.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/process/IndexingStateProcessor.java
Outdated
Show resolved
Hide resolved
9095369
to
8a97833
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Currently, the persistent state document is always indexed into the index pointed by an alias ".ml-state-write".
We plan this alias->index mapping to change over time (as we introduce rollover).
Therefore, we need to make sure we don't end up with two copies of the state document.
This PR achieves that by:
In order to know the document's id, bytes from C++ stream must be parsed (but only until the first new line character).
Relates #29938