Commit 5db7ed2

Bootstrap a Zen2 cluster once quorum is discovered (#37463)
Today when bootstrapping a Zen2 cluster we wait for every node in the `initial_master_nodes` setting to be discovered, so that we can map the node names or addresses in the `initial_master_nodes` list to their IDs for inclusion in the initial voting configuration. This means that if any of the expected master-eligible nodes fails to start then bootstrapping will not occur and the cluster will not form.

This is not ideal, and we would prefer the cluster to bootstrap even if some of the master-eligible nodes do not start. Safe bootstrapping requires that all pairs of quorums of all initial configurations overlap, and this is particularly troublesome to ensure given that nodes may be concurrently and independently attempting to bootstrap the cluster.

The solution is to bootstrap using an initial configuration whose size matches the size of the expected set of master-eligible nodes, but with the unknown IDs replaced by "placeholder" IDs that can never belong to any node. Any quorum of received votes in any of these placeholder-laden initial configurations is also a quorum of the "true" initial set of master-eligible nodes, giving the guarantee that it intersects all other quorums as required.

Note that this change means that the initial configuration is not necessarily robust to any node failures. Normally the cluster will form and then auto-reconfigure to a more robust configuration in which the placeholder IDs are replaced by the IDs of genuine nodes as they join the cluster; however if a node fails between bootstrapping and this auto-reconfiguration then the cluster may become unavailable. This we feel to be less likely than a node failing to start at all.

This commit also enormously simplifies the cluster bootstrapping process. Today, the cluster bootstrapping process involves two (local) transport actions in order to support a flexible bootstrapping API and to make it easily accessible to plugins. However this flexibility is not required for the current design so it is adding a good deal of unnecessary complexity. Here we remove this complexity in favour of a much simpler ClusterBootstrapService implementation that does all the work itself.
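
The quorum argument in the commit message can be illustrated with a minimal sketch. This is not the code from this commit: the class name, the method names, and the placeholder ID format are all assumptions made for illustration, and a quorum is assumed to mean a strict majority of the voting configuration. The sketch pads the set of discovered node IDs with placeholder IDs up to the expected size, then checks the same set of votes against both the padded configuration and the "true" one.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class PlaceholderBootstrapSketch {

    // Hypothetical format. The commit message only requires that a
    // placeholder ID can never belong to a real node.
    static final String PLACEHOLDER_PREFIX = "{bootstrap-placeholder}-";

    /** True once a strict majority of the expected master-eligible nodes has been discovered. */
    static boolean quorumDiscovered(Set<String> discoveredIds, int expectedSize) {
        return discoveredIds.size() * 2 > expectedSize;
    }

    /** Builds a configuration of exactly expectedSize entries, padding unknown IDs with placeholders. */
    static Set<String> initialConfiguration(Set<String> discoveredIds, int expectedSize) {
        if (discoveredIds.size() > expectedSize) {
            throw new IllegalArgumentException("discovered more nodes than expected");
        }
        Set<String> config = new LinkedHashSet<>(discoveredIds);
        for (int i = 0; config.size() < expectedSize; i++) {
            config.add(PLACEHOLDER_PREFIX + i);
        }
        return config;
    }

    /** A quorum is a strict majority of the configuration; votes from outside it are ignored. */
    static boolean hasQuorum(Set<String> config, Set<String> votes) {
        Set<String> counted = new HashSet<>(votes);
        counted.retainAll(config);
        return counted.size() * 2 > config.size();
    }

    public static void main(String[] args) {
        // The "true" initial set: three expected master-eligible nodes.
        Set<String> trueConfig = Set.of("node-a", "node-b", "node-c");
        // Only two of them have started and been discovered.
        Set<String> discovered = Set.of("node-a", "node-b");

        System.out.println(quorumDiscovered(discovered, trueConfig.size()));   // true
        Set<String> bootstrapConfig = initialConfiguration(discovered, trueConfig.size());

        // Placeholders never vote, so every vote comes from a genuine node.
        Set<String> votes = Set.of("node-a", "node-b");
        System.out.println(hasQuorum(bootstrapConfig, votes));                 // true
        System.out.println(hasQuorum(trueConfig, votes));                      // true
        System.out.println(hasQuorum(bootstrapConfig, Set.of("node-a")));      // false
    }
}
```

Because the padded configuration has the same size as the true one and its genuine members are a subset of the true set, any majority of votes in it is also a majority of the true set. The sketch also shows the trade-off noted in the message: with one placeholder in a three-entry configuration, both discovered nodes must vote, so the initial configuration tolerates no failures until it is reconfigured.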
1 parent e9fcb25 commit 5db7ed2

27 files changed: +563 -2448 lines changed

server/src/main/java/org/elasticsearch/ElasticsearchException.java

Lines changed: 1 addition & 3 deletions
@@ -1009,10 +1009,8 @@ private enum ElasticsearchExceptionHandle {
             MultiBucketConsumerService.TooManyBucketsException::new, 149, Version.V_7_0_0),
     COORDINATION_STATE_REJECTED_EXCEPTION(org.elasticsearch.cluster.coordination.CoordinationStateRejectedException.class,
             org.elasticsearch.cluster.coordination.CoordinationStateRejectedException::new, 150, Version.V_7_0_0),
-    CLUSTER_ALREADY_BOOTSTRAPPED_EXCEPTION(org.elasticsearch.cluster.coordination.ClusterAlreadyBootstrappedException.class,
-            org.elasticsearch.cluster.coordination.ClusterAlreadyBootstrappedException::new, 151, Version.V_7_0_0),
     SNAPSHOT_IN_PROGRESS_EXCEPTION(org.elasticsearch.snapshots.SnapshotInProgressException.class,
-            org.elasticsearch.snapshots.SnapshotInProgressException::new, 152, Version.V_7_0_0);
+            org.elasticsearch.snapshots.SnapshotInProgressException::new, 151, Version.V_7_0_0);

     final Class<? extends ElasticsearchException> exceptionClass;
     final CheckedFunction<StreamInput, ? extends ElasticsearchException, IOException> constructor;

server/src/main/java/org/elasticsearch/action/ActionModule.java

Lines changed: 0 additions & 6 deletions
@@ -23,10 +23,6 @@
 import org.apache.logging.log4j.LogManager;
 import org.elasticsearch.action.admin.cluster.allocation.ClusterAllocationExplainAction;
 import org.elasticsearch.action.admin.cluster.allocation.TransportClusterAllocationExplainAction;
-import org.elasticsearch.action.admin.cluster.bootstrap.BootstrapClusterAction;
-import org.elasticsearch.action.admin.cluster.bootstrap.GetDiscoveredNodesAction;
-import org.elasticsearch.action.admin.cluster.bootstrap.TransportBootstrapClusterAction;
-import org.elasticsearch.action.admin.cluster.bootstrap.TransportGetDiscoveredNodesAction;
 import org.elasticsearch.action.admin.cluster.configuration.AddVotingConfigExclusionsAction;
 import org.elasticsearch.action.admin.cluster.configuration.ClearVotingConfigExclusionsAction;
 import org.elasticsearch.action.admin.cluster.configuration.TransportAddVotingConfigExclusionsAction;
@@ -433,8 +429,6 @@ public <Request extends ActionRequest, Response extends ActionResponse> void reg
         actions.register(GetTaskAction.INSTANCE, TransportGetTaskAction.class);
         actions.register(CancelTasksAction.INSTANCE, TransportCancelTasksAction.class);

-        actions.register(GetDiscoveredNodesAction.INSTANCE, TransportGetDiscoveredNodesAction.class);
-        actions.register(BootstrapClusterAction.INSTANCE, TransportBootstrapClusterAction.class);
         actions.register(AddVotingConfigExclusionsAction.INSTANCE, TransportAddVotingConfigExclusionsAction.class);
         actions.register(ClearVotingConfigExclusionsAction.INSTANCE, TransportClearVotingConfigExclusionsAction.class);
         actions.register(ClusterAllocationExplainAction.INSTANCE, TransportClusterAllocationExplainAction.class);

server/src/main/java/org/elasticsearch/action/admin/cluster/bootstrap/BootstrapClusterAction.java

Lines changed: 0 additions & 41 deletions
This file was deleted.

server/src/main/java/org/elasticsearch/action/admin/cluster/bootstrap/BootstrapClusterRequest.java

Lines changed: 0 additions & 65 deletions
This file was deleted.

server/src/main/java/org/elasticsearch/action/admin/cluster/bootstrap/BootstrapClusterResponse.java

Lines changed: 0 additions & 66 deletions
This file was deleted.

server/src/main/java/org/elasticsearch/action/admin/cluster/bootstrap/BootstrapConfiguration.java

Lines changed: 0 additions & 179 deletions
This file was deleted.