[FEATURE][ML] Add checksum checks on dataframe result joining #37259
Pinging @elastic/ml-core
This is the Java side of elastic/ml-cpp#358
.../ml/src/main/java/org/elasticsearch/xpack/ml/analytics/process/AnalyticsResultProcessor.java
I've pushed a commit that uses a checksum of all relevant values instead of just the document ID.
To sanity-check that analytics results are joined correctly with their corresponding dataframe rows, we write a 32-bit hash of the document IDs to the C++ process, which includes it in the results. Upon joining, we check that the ID hashes match.
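As a rough illustration of the check described above, here is a minimal sketch and not the code in this PR. The class `ChecksumJoinSketch`, the `RowResult` type, and the helpers `checksum` and `matches` are hypothetical names, and CRC32 is an assumed stand-in for whatever 32-bit hash the change actually uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ChecksumJoinSketch {

    /** Hypothetical result row, as echoed back by the native process. */
    static final class RowResult {
        final int checksum;   // checksum that was sent alongside the row
        final double value;   // some analytics output for that row

        RowResult(int checksum, double value) {
            this.checksum = checksum;
            this.value = value;
        }
    }

    /** 32-bit checksum of a document ID. CRC32 is an assumption for this sketch. */
    static int checksum(String docId) {
        CRC32 crc = new CRC32();
        crc.update(docId.getBytes(StandardCharsets.UTF_8));
        return (int) crc.getValue(); // keep the low 32 bits
    }

    /** True only if the echoed checksum matches the row we are about to join with. */
    static boolean matches(String docId, RowResult result) {
        return checksum(docId) == result.checksum;
    }

    public static void main(String[] args) {
        String docId = "doc-1";
        // Pretend the native process echoed the checksum back with its result.
        RowResult echoed = new RowResult(checksum(docId), 0.42);

        System.out.println(matches(docId, echoed));   // result joined with its own row
        System.out.println(matches("doc-2", echoed)); // result joined with the wrong row
    }
}
```

In a real joiner, a mismatch would presumably be treated as a failure of the whole join rather than a per-row warning, since it means results and rows have drifted out of alignment.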
LGTM
LGTM
Force-pushed from 13bd906 to c04fa60