Validate build hash of joining node #65249
Comments
Pinging @elastic/es-distributed (Team:Distributed)
We discussed this in the most recent team sync and decided to proceed. This change will block any folks that are inadvertently relying on today's lenience in their testing workflows, and we don't have a good way to determine who those people are, so the Cloud testing folks have requested an escape hatch to temporarily restore the lenient behaviour in case it's needed in an emergency. We'll introduce and immediately deprecate a system property that provides this escape hatch.
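For illustration, an escape hatch of this kind could be read from a JVM system property roughly as follows. This is a minimal sketch, not the actual change: the class name and the property name `example.permit_handshake_from_incompatible_builds` are hypothetical placeholders, and the real property may be named and wired up differently.

```java
// Sketch only: the class and the property name are hypothetical placeholders,
// not the names used by Elasticsearch.
final class BuildCheckEscapeHatch {

    // Hypothetical property name, chosen for this example.
    static final String PERMIT_PROPERTY = "example.permit_handshake_from_incompatible_builds";

    private BuildCheckEscapeHatch() {}

    /**
     * Returns true if the lenient (pre-change) behaviour was requested, i.e. the
     * property is set to "true" (case-insensitive). Unset or any other value
     * keeps the strict check enabled.
     */
    static boolean isLenient() {
        return Boolean.parseBoolean(System.getProperty(PERMIT_PROPERTY, "false"));
    }

    /** Message a node could log at startup when the deprecated property is in use. */
    static String deprecationMessage() {
        return "system property [" + PERMIT_PROPERTY + "] is deprecated and should be removed";
    }
}
```

Under these assumptions the property would be passed on the JVM command line as `-Dexample.permit_handshake_from_incompatible_builds=true`, and the default stays strict so that nobody opts into the lenient behaviour by accident.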
There is no guarantee of wire compatibility between nodes running different builds of the same version, but today we do not validate whether two communicating nodes are compatible or not. This results in confusing failures that look like serialization bugs, and it usually takes nontrivial effort to determine that the failure is in fact due to the user running incompatible builds. This commit adds the build hash to the transport service handshake and validates that matching versions have matching build hashes. Closes elastic#65249
There is no guarantee of wire compatibility between nodes running different builds of the same version, but today we do not validate whether two communicating nodes are compatible or not. This results in confusing failures that look like serialization bugs, and it usually takes nontrivial effort to determine that the failure is in fact due to the user running incompatible builds. This commit adds the build hash to the transport service handshake and validates that matching versions have matching build hashes. Closes #65249
There is no guarantee of wire compatibility between nodes running different builds of the same version, but today we do not validate whether two communicating nodes are compatible or not. This results in confusing failures that look like serialization bugs, and it usually takes nontrivial effort to determine that the failure is in fact due to the user running incompatible builds. This commit adds the build hash to the transport service handshake and validates that matching versions have matching build hashes. Closes #65249 Backport of #65732
Today in `7.x` there is a deprecated system property that bypasses the check that prevents nodes of incompatible builds from communicating. This commit removes the system property in `master` so that the check is always enforced. Relates elastic#65601, elastic#65249
Today our more intrepid users sometimes form a cluster from unreleased versions in order to test out new functionality. This is usually fine, and indeed is encouraged, but can lead to difficulties if they happen to use two different builds of the same numbered version since there is no guarantee of compatibility between such builds. Clusters like this can behave quite strangely and it can be tricky to work out why. A similar issue occurs when using a remote cluster with an equal version but a different build hash.
I'm opening this issue to discuss whether we should prevent an invalid mix of nodes from forming a faulty cluster or remote cluster connection by sharing `Build.CURRENT.hash()` as well as `Version.CURRENT` somewhere in the handshaking process. If two nodes determine that they have the same numbered version, but a different build hash, then they would emit a warning and drop the connection.
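The proposed rule can be sketched as a small standalone check. This is only an illustration of the comparison described above: `HandshakeInfo` and `BuildHashValidator` are hypothetical names invented for this example, versions and hashes are modelled as plain strings, and nothing here reflects the actual transport handshake classes.

```java
import java.util.Optional;

// Hypothetical holder for the two values exchanged during the handshake.
final class HandshakeInfo {
    final String version;   // numbered version, e.g. "7.11.0"
    final String buildHash; // commit hash the build was produced from

    HandshakeInfo(String version, String buildHash) {
        this.version = version;
        this.buildHash = buildHash;
    }
}

// Hypothetical validator implementing "same version implies same build hash".
final class BuildHashValidator {

    private BuildHashValidator() {}

    /**
     * Returns a rejection reason if the two nodes report the same numbered
     * version but different build hashes, or an empty Optional otherwise.
     * Nodes with different versions are not judged here; the usual version
     * compatibility rules would apply to them.
     */
    static Optional<String> validate(HandshakeInfo local, HandshakeInfo remote) {
        boolean sameVersion = local.version.equals(remote.version);
        boolean sameBuild = local.buildHash.equals(remote.buildHash);
        if (sameVersion && !sameBuild) {
            return Optional.of("remote node has version [" + remote.version
                + "] with build hash [" + remote.buildHash
                + "] but this node has build hash [" + local.buildHash + "]");
        }
        return Optional.empty();
    }
}
```

A caller would log the returned reason as a warning and close the connection when it is present, and continue with the handshake otherwise.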