Merge master election with state recovery in the case of a full cluster restart #14016
Labels
:Distributed Coordination/Cluster Coordination
Cluster formation and cluster state publication, including cluster membership and fault detection.
>enhancement
resiliency
At the moment we have a two-step process. First we elect a master, based on the votes of at least min_master_nodes master nodes. Next, the elected master reaches out to at least min_master_nodes master nodes and finds the best last-known cluster state, which is then used as the initial state of the cluster. We can probably merge these two steps into one by making sure that the elected master has the best state locally (similar to how things work in Raft, for example). We should watch out for subtleties around honoring the `recover_after` settings and their implications (they are mostly meant for shard recovery, so it should, in theory, be OK).
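
To make the proposal concrete, here is a minimal sketch of a Raft-style vote rule in which a node only votes for a candidate whose last known cluster state is at least as fresh as its own. This is not Elasticsearch code: `NodeState`, `grants_vote`, `elect_master`, and the `(term, version)` freshness ordering are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class NodeState:
    """Hypothetical summary of a node's last persisted cluster state."""
    term: int     # assumed: election term in which the state was published
    version: int  # assumed: cluster state version within that term

    def freshness(self) -> Tuple[int, int]:
        # Compare by term first, then by version (an assumed ordering).
        return (self.term, self.version)


def grants_vote(voter: NodeState, candidate: NodeState) -> bool:
    # A voter rejects any candidate whose state is older than its own.
    return candidate.freshness() >= voter.freshness()


def elect_master(states: Dict[str, NodeState],
                 min_master_nodes: int) -> Optional[str]:
    """Return the first candidate (by name) that collects a quorum of votes.

    Election and state recovery collapse into one step: a candidate can
    only win if its local state is at least as fresh as that of every
    node that voted for it.
    """
    for name in sorted(states):
        candidate = states[name]
        votes = sum(grants_vote(voter, candidate) for voter in states.values())
        if votes >= min_master_nodes:
            return name
    return None  # no quorum reachable, so no master is elected


if __name__ == "__main__":
    states = {
        "a": NodeState(term=1, version=5),
        "b": NodeState(term=2, version=3),
        "c": NodeState(term=1, version=7),
    }
    print(elect_master(states, min_master_nodes=2))
```

In this sketch the winner is only guaranteed to be at least as fresh as the quorum that voted for it, not as fresh as every node in the cluster. That is sufficient only under the assumption that a cluster state counts as committed once a quorum of master nodes has accepted it, which is the Raft argument the issue alludes to.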