Commit 19eb922

Remove join timeout (#60873)
There is no point in timing out a join attempt any more. Timing out and retrying with the same master is pointless, and an in-flight join attempt to one master no longer blocks attempts to join other masters. This commit removes this unnecessary setting. Relates #60872 in which this setting was deprecated.
1 parent 0dc3364 commit 19eb922
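
The point in the commit message about in-flight joins can be illustrated with a minimal sketch. It assumes, as the `JoinHelper` diff suggests, that outgoing joins are deduplicated per destination and request through a synchronized set. The class name `JoinDedupSketch`, the string key, and the `onJoinCompleted` callback are hypothetical simplifications and not the actual Elasticsearch code, which keys on a `Tuple<DiscoveryNode, JoinRequest>`.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of per-destination join deduplication.
final class JoinDedupSketch {
    // Keys of join requests that have been sent and not yet answered.
    private final Set<String> pendingOutgoingJoins = Collections.synchronizedSet(new HashSet<>());

    // Assumption: a destination name plus a term is enough to identify a join attempt here.
    private static String dedupKey(String destination, long term) {
        return destination + "@" + term;
    }

    /** Returns true if a request would be sent, false if an identical one is already in flight. */
    boolean sendJoinRequest(String destination, long term) {
        return pendingOutgoingJoins.add(dedupKey(destination, term));
    }

    /** Assumed callback: invoked when the destination responds or the request fails. */
    void onJoinCompleted(String destination, long term) {
        pendingOutgoingJoins.remove(dedupKey(destination, term));
    }

    public static void main(String[] args) {
        JoinDedupSketch helper = new JoinDedupSketch();
        System.out.println(helper.sendJoinRequest("master-a", 1)); // first attempt is sent
        System.out.println(helper.sendJoinRequest("master-a", 1)); // identical attempt is suppressed
        System.out.println(helper.sendJoinRequest("master-b", 1)); // a different master is not blocked
        helper.onJoinCompleted("master-a", 1);
        System.out.println(helper.sendJoinRequest("master-a", 1)); // a retry is possible after completion
    }
}
```

In this model a pending join to one destination never prevents a join to another, so a timeout that only retries the same destination adds nothing.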

9 files changed, with 31 additions and 54 deletions
docs/reference/migration/migrate_8_0/cluster.asciidoc

Lines changed: 11 additions & 0 deletions
@@ -24,3 +24,14 @@ to exclude instead of using a node filter. Requests submitted to the
 `/_cluster/voting_config_exclusions/{node_filter}` endpoint will return an
 error.
 ====
+
+.The `cluster.join.timeout` setting has been removed.
+[%collapsible]
+====
+*Details* +
+The `cluster.join.timeout` setting has been removed. Join attempts no longer
+time out.
+
+*Impact* +
+Do not set `cluster.join.timeout` in your `elasticsearch.yml` file.
+====
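
As a rough aid for the *Impact* note above, the following sketch flags lines in a configuration file that still assign the removed setting. Only the setting name `cluster.join.timeout` comes from the migration note. The class `RemovedSettingCheck` and its method are hypothetical, and the check handles only the flat `key: value` form, not the nested YAML form.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Elasticsearch.
final class RemovedSettingCheck {
    // The setting name is taken from the migration note.
    static final String REMOVED_KEY = "cluster.join.timeout";

    /** Returns the 1-based line numbers on which the removed setting is assigned. */
    static List<Integer> findRemovedSetting(List<String> configLines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < configLines.size(); i++) {
            String line = configLines.get(i).trim();
            if (line.startsWith("#")) {
                continue; // commented-out settings are harmless
            }
            // Match the exact key followed by a colon, allowing whitespace before the colon.
            if (line.startsWith(REMOVED_KEY)
                    && line.substring(REMOVED_KEY.length()).trim().startsWith(":")) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "cluster.name: demo",
            "# cluster.join.timeout: 30s",
            "cluster.join.timeout: 10s");
        System.out.println(findRemovedSetting(sample));
    }
}
```

To check a real file, the lines could be read with `java.nio.file.Files.readAllLines` and passed to `findRemovedSetting`.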

docs/reference/modules/discovery/discovery-settings.asciidoc

Lines changed: 4 additions & 10 deletions
@@ -35,10 +35,10 @@ for `discovery.seed_hosts` is `["127.0.0.1", "[::1]"]`. See <<unicast.hosts>>.

 Specifies whether {es} should form a multiple-node cluster. By default, {es}
 discovers other nodes when forming a cluster and allows other nodes to join
-the cluster later. If `discovery.type` is set to `single-node`, {es} forms a
-single-node cluster and suppresses the timeouts set by
-`cluster.publish.timeout` and `cluster.join.timeout`. For more information
-about when you might use this setting, see <<single-node-discovery>>.
+the cluster later. If `discovery.type` is set to `single-node`, {es} forms
+a single-node cluster and suppresses the timeout set by
+`cluster.publish.timeout`. For more information about when you might use
+this setting, see <<single-node-discovery>>.

 `cluster.initial_master_nodes`::

@@ -179,12 +179,6 @@ or may become unstable or intolerant of certain failures.
 time, it is considered to have failed and is removed from the cluster. See
 <<cluster-state-publishing>>.

-`cluster.join.timeout`::
-
-Sets how long a node will wait after sending a request to join a cluster
-before it considers the request to have failed and retries, unless
-`discovery.type` is set to `single-node`. Defaults to `60s`.
-
 `cluster.max_voting_config_exclusions`::

 Sets a limit on the number of voting configuration exclusions at any one

docs/reference/setup/add-nodes.asciidoc

Lines changed: 1 addition & 3 deletions
@@ -67,9 +67,7 @@ to the voting configuration if it is appropriate to do so.

 During master election or when joining an existing formed cluster, a node
 sends a join request to the master in order to be officially added to the
-cluster. You can use the `cluster.join.timeout` setting to configure how long a
-node waits after sending a request to join a cluster. Its default value is `30s`.
-See <<modules-discovery-settings>>.
+cluster.

 [discrete]
 [[modules-discovery-removing-nodes]]

server/src/internalClusterTest/java/org/elasticsearch/action/support/master/IndexingMasterFailoverIT.java

Lines changed: 2 additions & 6 deletions
@@ -52,15 +52,11 @@ protected Collection<Class<? extends Plugin>> nodePlugins() {
     public void testMasterFailoverDuringIndexingWithMappingChanges() throws Throwable {
         logger.info("--> start 4 nodes, 3 master, 1 data");

-        final Settings sharedSettings = Settings.builder()
-            .put("cluster.join.timeout", "10s") // still long to induce failures but not too long so test won't time out
-            .build();
-
         internalCluster().setBootstrapMasterNodeIndex(2);

-        internalCluster().startMasterOnlyNodes(3, sharedSettings);
+        internalCluster().startMasterOnlyNodes(3, Settings.EMPTY);

-        String dataNode = internalCluster().startDataOnlyNode(sharedSettings);
+        String dataNode = internalCluster().startDataOnlyNode(Settings.EMPTY);

         logger.info("--> wait for all nodes to join the cluster");
         ensureStableCluster(4);

server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java

Lines changed: 3 additions & 3 deletions
@@ -168,9 +168,9 @@ public Coordinator(String nodeName, Settings settings, ClusterSettings clusterSe
         this.onJoinValidators = JoinTaskExecutor.addBuiltInJoinValidators(onJoinValidators);
         this.singleNodeDiscovery = DiscoveryModule.isSingleNodeDiscovery(settings);
         this.electionStrategy = electionStrategy;
-        this.joinHelper = new JoinHelper(settings, allocationService, masterService, transportService,
-            this::getCurrentTerm, this::getStateForMasterService, this::handleJoinRequest, this::joinLeaderInTerm, this.onJoinValidators,
-            rerouteService, nodeHealthService);
+        this.joinHelper = new JoinHelper(allocationService, masterService, transportService, this::getCurrentTerm,
+            this::getStateForMasterService, this::handleJoinRequest, this::joinLeaderInTerm, this.onJoinValidators, rerouteService,
+            nodeHealthService);
         this.persistedStateSupplier = persistedStateSupplier;
         this.noMasterBlockService = new NoMasterBlockService(settings, clusterSettings);
         this.lastKnownLeader = Optional.empty();

server/src/main/java/org/elasticsearch/cluster/coordination/JoinHelper.java

Lines changed: 5 additions & 23 deletions
@@ -34,22 +34,17 @@
 import org.elasticsearch.cluster.routing.RerouteService;
 import org.elasticsearch.cluster.routing.allocation.AllocationService;
 import org.elasticsearch.cluster.service.MasterService;
-import org.elasticsearch.common.Nullable;
 import org.elasticsearch.common.Priority;
 import org.elasticsearch.common.collect.Tuple;
 import org.elasticsearch.common.io.stream.StreamInput;
-import org.elasticsearch.common.settings.Setting;
-import org.elasticsearch.common.settings.Settings;
 import org.elasticsearch.common.unit.TimeValue;
-import org.elasticsearch.discovery.DiscoveryModule;
 import org.elasticsearch.monitor.NodeHealthService;
 import org.elasticsearch.monitor.StatusInfo;
 import org.elasticsearch.threadpool.ThreadPool;
 import org.elasticsearch.threadpool.ThreadPool.Names;
 import org.elasticsearch.transport.TransportChannel;
 import org.elasticsearch.transport.TransportException;
 import org.elasticsearch.transport.TransportRequest;
-import org.elasticsearch.transport.TransportRequestOptions;
 import org.elasticsearch.transport.TransportResponse;
 import org.elasticsearch.transport.TransportResponse.Empty;
 import org.elasticsearch.transport.TransportResponseHandler;
@@ -81,32 +76,22 @@ public class JoinHelper {
     public static final String VALIDATE_JOIN_ACTION_NAME = "internal:cluster/coordination/join/validate";
     public static final String START_JOIN_ACTION_NAME = "internal:cluster/coordination/start_join";

-    // the timeout for each join attempt
-    public static final Setting<TimeValue> JOIN_TIMEOUT_SETTING =
-        Setting.timeSetting("cluster.join.timeout",
-            TimeValue.timeValueMillis(60000), TimeValue.timeValueMillis(1), Setting.Property.NodeScope);
-
     private final MasterService masterService;
     private final TransportService transportService;
     private final JoinTaskExecutor joinTaskExecutor;
-
-    @Nullable // if using single-node discovery
-    private final TimeValue joinTimeout;
     private final NodeHealthService nodeHealthService;

     private final Set<Tuple<DiscoveryNode, JoinRequest>> pendingOutgoingJoins = Collections.synchronizedSet(new HashSet<>());
+    private final AtomicReference<FailedJoinAttempt> lastFailedJoinAttempt = new AtomicReference<>();

-    private AtomicReference<FailedJoinAttempt> lastFailedJoinAttempt = new AtomicReference<>();
-
-    JoinHelper(Settings settings, AllocationService allocationService, MasterService masterService,
-               TransportService transportService, LongSupplier currentTermSupplier, Supplier<ClusterState> currentStateSupplier,
+    JoinHelper(AllocationService allocationService, MasterService masterService, TransportService transportService,
+               LongSupplier currentTermSupplier, Supplier<ClusterState> currentStateSupplier,
                BiConsumer<JoinRequest, JoinCallback> joinHandler, Function<StartJoinRequest, Join> joinLeaderInTerm,
                Collection<BiConsumer<DiscoveryNode, ClusterState>> joinValidators, RerouteService rerouteService,
                NodeHealthService nodeHealthService) {
         this.masterService = masterService;
         this.transportService = transportService;
         this.nodeHealthService = nodeHealthService;
-        this.joinTimeout = DiscoveryModule.isSingleNodeDiscovery(settings) ? null : JOIN_TIMEOUT_SETTING.get(settings);
         this.joinTaskExecutor = new JoinTaskExecutor(allocationService, logger, rerouteService) {

             @Override
@@ -249,7 +234,6 @@ public void sendJoinRequest(DiscoveryNode destination, long term, Optional<Join>
         if (pendingOutgoingJoins.add(dedupKey)) {
             logger.debug("attempting to join {} with {}", destination, joinRequest);
             transportService.sendRequest(destination, JOIN_ACTION_NAME, joinRequest,
-                TransportRequestOptions.builder().withTimeout(joinTimeout).build(),
                 new TransportResponseHandler<Empty>() {
                     @Override
                     public Empty read(StreamInput in) {
@@ -309,10 +293,8 @@ public String executor() {
     }

     void sendValidateJoinRequest(DiscoveryNode node, ClusterState state, ActionListener<TransportResponse.Empty> listener) {
-        transportService.sendRequest(node, VALIDATE_JOIN_ACTION_NAME,
-            new ValidateJoinRequest(state),
-            TransportRequestOptions.builder().withTimeout(joinTimeout).build(),
-            new ActionListenerResponseHandler<>(listener, i -> Empty.INSTANCE, ThreadPool.Names.GENERIC));
+        transportService.sendRequest(node, VALIDATE_JOIN_ACTION_NAME, new ValidateJoinRequest(state),
+            new ActionListenerResponseHandler<>(listener, i -> Empty.INSTANCE, ThreadPool.Names.GENERIC));
     }

     public interface JoinCallback {

server/src/main/java/org/elasticsearch/common/settings/ClusterSettings.java

Lines changed: 1 addition & 3 deletions
@@ -21,7 +21,6 @@
 import org.apache.logging.log4j.LogManager;
 import org.elasticsearch.action.admin.cluster.configuration.TransportAddVotingConfigExclusionsAction;
 import org.elasticsearch.action.admin.indices.close.TransportCloseIndexAction;
-import org.elasticsearch.index.IndexingPressure;
 import org.elasticsearch.action.search.TransportSearchAction;
 import org.elasticsearch.action.support.AutoCreateIndex;
 import org.elasticsearch.action.support.DestructiveOperations;
@@ -38,7 +37,6 @@
 import org.elasticsearch.cluster.coordination.Coordinator;
 import org.elasticsearch.cluster.coordination.ElectionSchedulerFactory;
 import org.elasticsearch.cluster.coordination.FollowersChecker;
-import org.elasticsearch.cluster.coordination.JoinHelper;
 import org.elasticsearch.cluster.coordination.LagDetector;
 import org.elasticsearch.cluster.coordination.LeaderChecker;
 import org.elasticsearch.cluster.coordination.NoMasterBlockService;
@@ -79,6 +77,7 @@
 import org.elasticsearch.http.HttpTransportSettings;
 import org.elasticsearch.index.IndexModule;
 import org.elasticsearch.index.IndexSettings;
+import org.elasticsearch.index.IndexingPressure;
 import org.elasticsearch.indices.IndexingMemoryController;
 import org.elasticsearch.indices.IndicesQueryCache;
 import org.elasticsearch.indices.IndicesRequestCache;
@@ -469,7 +468,6 @@ public void apply(Settings value, Settings current, Settings previous) {
             ElectionSchedulerFactory.ELECTION_DURATION_SETTING,
             Coordinator.PUBLISH_TIMEOUT_SETTING,
             Coordinator.PUBLISH_INFO_TIMEOUT_SETTING,
-            JoinHelper.JOIN_TIMEOUT_SETTING,
             FollowersChecker.FOLLOWER_CHECK_TIMEOUT_SETTING,
             FollowersChecker.FOLLOWER_CHECK_INTERVAL_SETTING,
             FollowersChecker.FOLLOWER_CHECK_RETRY_COUNT_SETTING,

server/src/test/java/org/elasticsearch/cluster/coordination/JoinHelperTests.java

Lines changed: 4 additions & 4 deletions
@@ -59,7 +59,7 @@ public void testJoinDeduplication() {
         TransportService transportService = capturingTransport.createTransportService(Settings.EMPTY,
             deterministicTaskQueue.getThreadPool(), TransportService.NOOP_TRANSPORT_INTERCEPTOR,
             x -> localNode, null, Collections.emptySet());
-        JoinHelper joinHelper = new JoinHelper(Settings.EMPTY, null, null, transportService, () -> 0L, () -> null,
+        JoinHelper joinHelper = new JoinHelper(null, null, transportService, () -> 0L, () -> null,
             (joinRequest, joinCallback) -> { throw new AssertionError(); }, startJoinRequest -> { throw new AssertionError(); },
             Collections.emptyList(), (s, p, r) -> {},
             () -> new StatusInfo(HEALTHY, "info"));
@@ -156,7 +156,7 @@ public void testJoinValidationRejectsMismatchedClusterUUID() {
         TransportService transportService = mockTransport.createTransportService(Settings.EMPTY,
             deterministicTaskQueue.getThreadPool(), TransportService.NOOP_TRANSPORT_INTERCEPTOR,
             x -> localNode, null, Collections.emptySet());
-        new JoinHelper(Settings.EMPTY, null, null, transportService, () -> 0L, () -> localClusterState,
+        new JoinHelper(null, null, transportService, () -> 0L, () -> localClusterState,
             (joinRequest, joinCallback) -> { throw new AssertionError(); }, startJoinRequest -> { throw new AssertionError(); },
             Collections.emptyList(), (s, p, r) -> {}, null); // registers request handler
         transportService.start();
@@ -189,9 +189,9 @@ public void testJoinFailureOnUnhealthyNodes() {
             x -> localNode, null, Collections.emptySet());
         AtomicReference<StatusInfo> nodeHealthServiceStatus = new AtomicReference<>
             (new StatusInfo(UNHEALTHY, "unhealthy-info"));
-        JoinHelper joinHelper = new JoinHelper(Settings.EMPTY, null, null, transportService, () -> 0L, () -> null,
+        JoinHelper joinHelper = new JoinHelper(null, null, transportService, () -> 0L, () -> null,
             (joinRequest, joinCallback) -> { throw new AssertionError(); }, startJoinRequest -> { throw new AssertionError(); },
-            Collections.emptyList(), (s, p, r) -> {}, () -> nodeHealthServiceStatus.get());
+            Collections.emptyList(), (s, p, r) -> {}, nodeHealthServiceStatus::get);
         transportService.start();

         DiscoveryNode node1 = new DiscoveryNode("node1", buildNewFakeTransportAddress(), Version.CURRENT);

server/src/test/java/org/elasticsearch/discovery/AbstractDisruptionTestCase.java

Lines changed: 0 additions & 2 deletions
@@ -24,7 +24,6 @@
 import org.elasticsearch.cluster.block.ClusterBlockLevel;
 import org.elasticsearch.cluster.coordination.Coordinator;
 import org.elasticsearch.cluster.coordination.FollowersChecker;
-import org.elasticsearch.cluster.coordination.JoinHelper;
 import org.elasticsearch.cluster.coordination.LeaderChecker;
 import org.elasticsearch.cluster.node.DiscoveryNodes;
 import org.elasticsearch.common.Nullable;
@@ -126,7 +125,6 @@ List<String> startCluster(int numberOfNodes) {
         .put(LeaderChecker.LEADER_CHECK_RETRY_COUNT_SETTING.getKey(), 1) // for hitting simulated network failures quickly
         .put(FollowersChecker.FOLLOWER_CHECK_TIMEOUT_SETTING.getKey(), "5s") // for hitting simulated network failures quickly
         .put(FollowersChecker.FOLLOWER_CHECK_RETRY_COUNT_SETTING.getKey(), 1) // for hitting simulated network failures quickly
-        .put(JoinHelper.JOIN_TIMEOUT_SETTING.getKey(), "10s") // still long to induce failures but to long so test won't time out
         .put(Coordinator.PUBLISH_TIMEOUT_SETTING.getKey(), "5s") // <-- for hitting simulated network failures quickly
         .put(TransportSettings.CONNECT_TIMEOUT.getKey(), "10s") // Network delay disruption waits for the min between this
         // value and the time of disruption and does not recover immediately
