Validate build hash of joining node #65249

Closed
DaveCTurner opened this issue Nov 19, 2020 · 2 comments · Fixed by #65732
Labels
- :Distributed Coordination/Cluster Coordination (Cluster formation and cluster state publication, including cluster membership and fault detection.)
- >enhancement
- Team:Distributed (Obsolete) (Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.)

Comments

@DaveCTurner
Contributor

Today our more intrepid users sometimes form a cluster from unreleased versions in order to test out new functionality. This is usually fine, and indeed is encouraged, but can lead to difficulties if they happen to use two different builds of the same numbered version since there is no guarantee of compatibility between such builds. Clusters like this can behave quite strangely and it can be tricky to work out why. A similar issue occurs when using a remote cluster with an equal version but a different build hash.

I'm opening this issue to discuss whether we should prevent an invalid mix of nodes from forming a faulty cluster or remote cluster connection by sharing Build.CURRENT.hash() as well as Version.CURRENT somewhere in the handshaking process. If two nodes determine that they have the same numbered version, but a different build hash, then they would emit a warning and drop the connection.
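As a rough illustration of the check proposed above, here is a minimal sketch. The `BuildHashHandshake`, `NodeBuild`, and `Outcome` names are hypothetical and are not Elasticsearch classes; in the real code the values would come from `Version.CURRENT` and `Build.CURRENT.hash()`, and the check would sit somewhere in the handshake. The version and hash strings in the usage note are made up.

```java
import java.util.Objects;

// Hypothetical sketch only: these types are NOT part of Elasticsearch.
final class BuildHashHandshake {

    /** What a node reports about itself during the handshake (assumed shape). */
    static final class NodeBuild {
        final String version;   // numbered version, e.g. from Version.CURRENT
        final String buildHash; // build hash, e.g. from Build.CURRENT.hash()

        NodeBuild(String version, String buildHash) {
            this.version = Objects.requireNonNull(version);
            this.buildHash = Objects.requireNonNull(buildHash);
        }
    }

    enum Outcome { ACCEPT, REJECT_BUILD_MISMATCH }

    /**
     * Rejects the connection only when both nodes report the same numbered
     * version but different build hashes. Compatibility between different
     * numbered versions is a separate check and is out of scope here.
     */
    static Outcome validate(NodeBuild local, NodeBuild remote) {
        boolean sameVersion = local.version.equals(remote.version);
        boolean sameBuild = local.buildHash.equals(remote.buildHash);
        if (sameVersion && !sameBuild) {
            // Emit a warning, then let the caller drop the connection.
            System.err.println("WARN: remote node has version [" + remote.version
                + "] but build hash [" + remote.buildHash + "], expected ["
                + local.buildHash + "]; dropping connection");
            return Outcome.REJECT_BUILD_MISMATCH;
        }
        return Outcome.ACCEPT;
    }
}
```

For example, `validate(new NodeBuild("8.0.0-SNAPSHOT", "abc123"), new NodeBuild("8.0.0-SNAPSHOT", "def456"))` would return `REJECT_BUILD_MISMATCH`, while two nodes reporting the same version and the same hash would be accepted.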

DaveCTurner added the >enhancement, :Distributed Coordination/Cluster Coordination, and team-discuss labels Nov 19, 2020
elasticmachine added the Team:Distributed (Obsolete) label Nov 19, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Contributor Author

We discussed this in the most recent team sync and decided to proceed. This change will block any folks that are inadvertently relying on today's lenience in their testing workflows, and we don't have a good way to determine who those people are, so the Cloud testing folks have requested an escape hatch to temporarily restore the lenient behaviour in case it's needed in an emergency. We'll introduce and immediately deprecate a system property that provides this escape hatch.
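To make the escape hatch concrete, here is a minimal sketch of how a system property could restore the lenient behaviour. The property name `example.permit_handshake_from_incompatible_builds` and the `LenientHandshakeEscapeHatch` class are hypothetical: this thread does not state the real property name, and the actual implementation may differ.

```java
// Hypothetical sketch only: the property name and class are assumptions.
final class LenientHandshakeEscapeHatch {

    // Assumed name for illustration; not taken from the Elasticsearch source.
    static final String PROPERTY = "example.permit_handshake_from_incompatible_builds";

    /** True if the operator has opted back in to the old, lenient behaviour. */
    static boolean isLenient() {
        return Boolean.parseBoolean(System.getProperty(PROPERTY, "false"));
    }

    /**
     * Decides whether to drop a connection, given whether the two nodes
     * reported the same numbered version and the same build hash.
     */
    static boolean shouldDropConnection(boolean sameVersion, boolean sameBuildHash) {
        boolean incompatibleBuilds = sameVersion && !sameBuildHash;
        if (!incompatibleBuilds) {
            return false;
        }
        if (isLenient()) {
            // The property is deprecated from the start, so warn whenever it is used.
            System.err.println("WARN: [" + PROPERTY + "] is deprecated; "
                + "permitting a handshake between incompatible builds");
            return false;
        }
        return true;
    }
}
```

Under these assumptions, starting the JVM with `-Dexample.permit_handshake_from_incompatible_builds=true` would keep mismatched builds connected while logging a deprecation warning; without the property, the connection would be dropped.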

DaveCTurner self-assigned this Nov 27, 2020
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Nov 30, 2020
There is no guarantee of wire compatibility between nodes running
different builds of the same version, but today we do not validate
whether two communicating nodes are compatible or not. This results in
confusing failures that look like serialization bugs, and it usually
takes nontrivial effort to determine that the failure is in fact due to
the user running incompatible builds.

This commit adds the build hash to the transport service handshake and
validates that matching versions have matching build hashes.

Closes elastic#65249
DaveCTurner added a commit that referenced this issue Dec 2, 2020
There is no guarantee of wire compatibility between nodes running
different builds of the same version, but today we do not validate
whether two communicating nodes are compatible or not. This results in
confusing failures that look like serialization bugs, and it usually
takes nontrivial effort to determine that the failure is in fact due to
the user running incompatible builds.

This commit adds the build hash to the transport service handshake and
validates that matching versions have matching build hashes.

Closes #65249
DaveCTurner added a commit that referenced this issue Dec 2, 2020
There is no guarantee of wire compatibility between nodes running
different builds of the same version, but today we do not validate
whether two communicating nodes are compatible or not. This results in
confusing failures that look like serialization bugs, and it usually
takes nontrivial effort to determine that the failure is in fact due to
the user running incompatible builds.

This commit adds the build hash to the transport service handshake and
validates that matching versions have matching build hashes.

Closes #65249
Backport of #65732
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Dec 2, 2020
Today in `7.x` there is a deprecated system property that bypasses the
check that prevents nodes of incompatible builds from communicating.
This commit removes the system property in `master` so that the check is
always enforced.

Relates elastic#65601, elastic#65249