Cluster downtime during master node restart while not in discovery file provider #1138
This sounds like the sort of thing that is fixed by elastic/elasticsearch#39629 in 7.2. It's probably a good idea to exclude the master from the voting config before shutting it down, in order to cause it to hand over to another node while it's still alive, and this should help in 7.0 and 7.1 as well.
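To make that hand-over concrete, here is a minimal sketch (not the operator's actual code) of asking Elasticsearch to exclude the current master from the voting configuration before shutting it down. The base URL and node name are placeholders, and the path form `POST /_cluster/voting_config_exclusions/{node_name}` is the one exposed by 7.x at the time (later versions switched to query parameters):

```go
package esrolling

import (
	"fmt"
	"net/http"
	"time"
)

// excludeFromVotingConfig asks Elasticsearch to exclude the given
// master-eligible node from the voting configuration, so it hands over
// mastership while it is still alive. The call only returns 200 once the
// node is no longer in the committed voting config, or fails on timeout.
func excludeFromVotingConfig(baseURL, nodeName string) error {
	url := fmt.Sprintf("%s/_cluster/voting_config_exclusions/%s?timeout=30s", baseURL, nodeName)
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		return err
	}
	client := &http.Client{Timeout: 60 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("voting config exclusion for %s failed: %s", nodeName, resp.Status)
	}
	return nil
}
```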
@DaveCTurner thanks for the feedback! Your hint on voting exclusions (which we already use) helped me find a bug in the current code where exclusions were reset only after all nodes were rolled (which doesn't make sense). I think that was actually the underlying issue behind my cluster getting stuck.
That makes me suspect there might also be a bug in how you're detecting the success of adding voting config exclusions. You must check not only that the nodes are in the exclusions list but also that their node ids are removed from the voting config. This is what the API does, so it's probably simplest to hit the API until it returns
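For illustration, a sketch of that check against the cluster state (the endpoint and node id are placeholders, and the field names are the ones I'd expect under `metadata.cluster_coordination` in `GET _cluster/state`; verify against your cluster):

```go
package esrolling

import (
	"encoding/json"
	"net/http"
)

// clusterState mirrors the part of GET /_cluster/state we care about:
// the committed voting configuration and the exclusions list.
type clusterState struct {
	Metadata struct {
		ClusterCoordination struct {
			LastCommittedConfig    []string `json:"last_committed_config"`
			VotingConfigExclusions []struct {
				NodeID   string `json:"node_id"`
				NodeName string `json:"node_name"`
			} `json:"voting_config_exclusions"`
		} `json:"cluster_coordination"`
	} `json:"metadata"`
}

// isSafelyExcluded reports whether nodeID is in the exclusions list AND gone
// from the committed voting config, which is the condition the exclusions API
// itself waits for before answering.
func isSafelyExcluded(baseURL, nodeID string) (bool, error) {
	resp, err := http.Get(baseURL + "/_cluster/state?filter_path=metadata.cluster_coordination")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	var state clusterState
	if err := json.NewDecoder(resp.Body).Decode(&state); err != nil {
		return false, err
	}
	coord := state.Metadata.ClusterCoordination

	excluded := false
	for _, e := range coord.VotingConfigExclusions {
		if e.NodeID == nodeID {
			excluded = true
		}
	}
	for _, id := range coord.LastCommittedConfig {
		if id == nodeID {
			return false, nil // still part of the voting config
		}
	}
	return excluded, nil
}
```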
@DaveCTurner I'm a bit confused now.
What we do is:
Do you think we are missing a step here?
How?
Which API are you referencing here?
Ok, I was guessing how the OP might have come about, but maybe I guessed wrong. AIUI you ended up trying to add each node to the voting config exclusions list, and then shut it down, but you weren't clearing the list at each step. This would mean that when you got to the last node (the master) you wouldn't get a
I would say to do this sooner, ideally just after stopping the node. This is how it's done in the test suite: When
The safest way is to
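For completeness, a sketch of clearing the exclusions list between steps. The `wait_for_removal=false` flag, which skips waiting for the excluded node to actually leave the cluster, is my assumption about what fits this flow, since the comment above is cut off:

```go
package esrolling

import (
	"fmt"
	"net/http"
)

// clearVotingConfigExclusions wipes the voting config exclusions list so the
// next node to be rolled starts from a clean slate. wait_for_removal=false
// avoids blocking until the excluded node has already left the cluster.
func clearVotingConfigExclusions(baseURL string) error {
	req, err := http.NewRequest(http.MethodDelete,
		baseURL+"/_cluster/voting_config_exclusions?wait_for_removal=false", nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("clearing voting config exclusions failed: %s", resp.Status)
	}
	return nil
}
```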
Closing this one, thanks a lot @DaveCTurner for the help.
I'm opening this issue while working on PVC reuse/rolling upgrades (#312), which is not merged yet, but it seemed important to have this discussion separately.
I observe a short downtime in the cluster during the rolling upgrade process, while the master node is being restarted (we restart it last). There is no downtime when the other nodes are restarted.
During the master node restart, requests to Elasticsearch (a 3-node v7.1 cluster of mdi nodes; 2/3 nodes still alive, the master is down) return:
Master election never happens, even though we have 2/3 master-eligible nodes alive, until the restarted master node gets back into the cluster a few seconds later (restart over).
These are the errors we can see in the logs of one of the 2 remaining Elasticsearch instances:
The two remaining nodes complain about the third node (the master whose restart is in progress) not being available.
Debugging this a bit more, I realised this is related to the way we manage the discovery.seed_providers file. We inject each master node's IP (the Kubernetes pod IP) into this file on every reconciliation loop: we inspect the current pods in the cluster and, if they are master-eligible, append their IP to the file, which then gets propagated to all nodes in the cluster.
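As a simplified illustration of that mechanism (not the operator's actual code; the `pod` struct is a stand-in for the Kubernetes objects we inspect), the file contents are essentially rebuilt like this, one `ip:transport-port` line per master-eligible pod:

```go
package seedhosts

import "strings"

// pod is a simplified stand-in for the Kubernetes pods inspected during
// reconciliation.
type pod struct {
	IP             string
	MasterEligible bool
}

// buildSeedHostsFile renders the contents of the file-based seed hosts
// provider (one address per line) from the master-eligible pods currently
// observed. Pods without an assigned IP (e.g. just recreated) are skipped,
// which is exactly why the restarting master disappears from the file.
func buildSeedHostsFile(pods []pod) string {
	var b strings.Builder
	for _, p := range pods {
		if p.MasterEligible && p.IP != "" {
			// 9300 is Elasticsearch's default transport port.
			b.WriteString(p.IP + ":9300\n")
		}
	}
	return b.String()
}
```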
Very soon after stopping the master node (deleting the pod but keeping the data volume around), its IP address is also deleted from that file. From our perspective there is no reason to keep it around: that "old" IP does not make sense anymore, and when recreated (with the same data) the pod will probably get assigned a new IP.
So we first recreate the pod, then as soon as it has an IP available we inject it into the file. At that point the situation gets unblocked and master election can proceed.
However, during the whole time the pod is being restarted and its IP is absent from the seed provider discovery file, the cluster is unavailable.
If I "manually" delete the pod but keep its IP (even though it is no longer valid) in the discovery.seed_providers file, a new master gets elected instantly among the 2 remaining nodes.
I'm wondering if: