Use default discovery implementation for single-node discovery #40036

ywelsch · 2019-03-14T08:54:13Z

Switches "discovery.type: single-node" from using a separate implementation for single-node discovery to using the existing standard discovery implementation, with the following small adaptations:

- auto-bootstrapping, but requiring initial_master_nodes not to be set.
- not actively pinging other nodes using the PeerFinder.
- not allowing other nodes to join its single-node cluster (e.g. if they have been set up using regular discovery and connect to the node running single-node discovery).
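The following Java sketch is a rough illustration of the three adaptations listed above. It models how a node might derive its behaviour from its settings. It is a simplified, hypothetical model: the class SingleNodeDiscoveryModel, its methods and the error wording are illustrative assumptions, not the Elasticsearch implementation. Only the setting names discovery.type, single-node and cluster.initial_master_nodes come from Elasticsearch.

```java
import java.util.Map;

/** Hypothetical, simplified model of the single-node adaptations; not Elasticsearch code. */
final class SingleNodeDiscoveryModel {
    static final String DISCOVERY_TYPE = "discovery.type";
    static final String INITIAL_MASTER_NODES = "cluster.initial_master_nodes";

    private final boolean singleNode;

    SingleNodeDiscoveryModel(Map<String, String> settings) {
        singleNode = "single-node".equals(settings.get(DISCOVERY_TYPE));
        // Single-node mode bootstraps itself, so an explicit initial_master_nodes
        // setting is rejected rather than silently ignored (illustrative wording).
        if (singleNode && settings.containsKey(INITIAL_MASTER_NODES)) {
            throw new IllegalArgumentException("setting [" + INITIAL_MASTER_NODES
                + "] is not allowed when [" + DISCOVERY_TYPE + "] is set to [single-node]");
        }
    }

    /** Adaptation 1: bootstrap automatically with only the local node.
     *  In this simplified model, standard mode relies on initial_master_nodes instead. */
    boolean shouldAutoBootstrap() {
        return singleNode;
    }

    /** Adaptation 2: the peer finder does not actively ping configured hosts. */
    boolean shouldActivelyPingPeers() {
        return singleNode == false;
    }

    /** Adaptation 3: in single-node mode only the local node may join. */
    boolean acceptsJoinFrom(String sourceNodeId, String localNodeId) {
        return singleNode == false || sourceNodeId.equals(localNodeId);
    }
}
```

Keeping all three decisions keyed off one flag mirrors the idea of the PR: the standard coordination code runs in both modes, and single-node mode only switches off the parts that involve other nodes.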

elasticmachine · 2019-03-14T08:54:15Z

Pinging @elastic/es-distributed

DaveCTurner

I think we can improve our handling of the case where this node used to be part of a larger cluster and has since been restarted in single-node mode, in which it will no longer be able to form a cluster. For instance, the Coordinator could check this in doStart() and throw an exception if it'll never obtain a quorum.

I think we should also refuse to start if node.master: false.

I think both of the above need a mention in the breaking changes docs.
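The two startup checks proposed above might look roughly like the following sketch. It assumes that the last-committed voting configuration is available as a set of node IDs when the coordinator starts. SingleNodeStartupChecks, check() and hasQuorum() are hypothetical names, the messages are illustrative, and the quorum rule is a plain strict majority of the configuration. This is not the code that was eventually merged.

```java
import java.util.Set;

/** Hypothetical startup checks for single-node mode; a sketch of the idea only. */
final class SingleNodeStartupChecks {
    private SingleNodeStartupChecks() {}

    /** True if the given votes form a strict majority of the voting configuration. */
    static boolean hasQuorum(Set<String> votingConfig, Set<String> votes) {
        long granted = votes.stream().filter(votingConfig::contains).count();
        return granted * 2 > votingConfig.size();
    }

    static void check(boolean masterEligible, String localNodeId, Set<String> lastCommittedConfig) {
        // Check 1: a single-node cluster needs its only node to be master-eligible.
        if (masterEligible == false) {
            throw new IllegalStateException(
                "cannot start with [discovery.type] set to [single-node] on a node that is not master-eligible");
        }
        // Check 2: an empty configuration means the cluster was never bootstrapped,
        // so the node may bootstrap itself. Otherwise the local node alone must be a quorum.
        if (lastCommittedConfig.isEmpty() == false
                && hasQuorum(lastCommittedConfig, Set.of(localNodeId)) == false) {
            throw new IllegalStateException("cannot start with [discovery.type] set to [single-node]: local node ["
                + localNodeId + "] does not have a quorum in voting configuration " + lastCommittedConfig);
        }
    }
}
```

Under a strict-majority rule a lone node only has a quorum when the configuration contains that node and nothing else. A node restarted in single-node mode after being part of a three-node cluster therefore fails fast instead of waiting forever.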

I would also like to reject PeersRequests from other nodes; today we respond normally and this means that other nodes' ClusterFormationFailureHelpers will report that they have "discovered" this node (in amongst all the exceptions about being unable to join it). (edit - see below)

server/src/main/java/org/elasticsearch/cluster/coordination/ClusterBootstrapService.java

DaveCTurner · 2019-03-20T08:11:10Z

server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java

        logger.trace("handleJoinRequest: as {}, handling {}", mode, joinRequest);
+
+        if (singleNodeDiscovery && joinRequest.getSourceNode().equals(getLocalNode()) == false) {
+            joinCallback.onFailure(new IllegalStateException("cannot join node with single-node discovery"));


Suggest using the message "cannot join node with [discovery.type] set to [single-node]" so that it's clearer which setting to look for.
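Here is a self-contained sketch of the join check in the hunk above, using the suggested message wording. SingleNodeJoinGuard and admit() are hypothetical stand-ins. The Coordinator in the hunk works with a joinRequest and a joinCallback rather than plain node IDs and a Consumer.

```java
import java.util.function.Consumer;

/** Hypothetical stand-in for the single-node join check; not the Coordinator itself. */
final class SingleNodeJoinGuard {
    private final boolean singleNodeDiscovery;
    private final String localNodeId;

    SingleNodeJoinGuard(boolean singleNodeDiscovery, String localNodeId) {
        this.singleNodeDiscovery = singleNodeDiscovery;
        this.localNodeId = localNodeId;
    }

    /**
     * Returns true if the join may proceed. Otherwise it reports a failure whose
     * message names the setting and its value, so operators know what to look for.
     */
    boolean admit(String sourceNodeId, Consumer<Exception> onFailure) {
        if (singleNodeDiscovery && sourceNodeId.equals(localNodeId) == false) {
            onFailure.accept(new IllegalStateException(
                "cannot join node with [discovery.type] set to [single-node]"));
            return false;
        }
        return true;
    }
}
```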

DaveCTurner · 2019-03-20T08:41:16Z

I would also like to reject PeersRequests from other nodes; today we respond normally and this means that other nodes' ClusterFormationFailureHelpers will report that they have "discovered" this node (in amongst all the exceptions about being unable to join it).

Rejecting PeersRequests isn't sufficient for this: if another node has our transport address then it'll connect to us, handshake, and add us to its known peers list without any further checks. I think we'd need to do something like only exposing known peers once they've responded to a PeersRequest.
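One way to read "only exposing known peers once they've responded to a PeersRequest" is as a registry that tracks handshaken connections separately from verified peers. The sketch below is a hypothetical illustration, and VerifiedPeers and its methods are not Elasticsearch classes. A peer reached by a transport handshake is not reported as discovered until it answers a peers request. A peer that rejects the request, for instance one running single-node discovery, is dropped.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical peer registry: connected peers stay hidden until they answer a peers request. */
final class VerifiedPeers {
    private final Set<String> connected = new LinkedHashSet<>();
    private final Set<String> verified = new LinkedHashSet<>();

    /** Called after the transport handshake; the peer is not yet reported as discovered. */
    void onConnected(String address) {
        connected.add(address);
    }

    /** Called when the peer answered a peers request successfully. */
    void onPeersResponse(String address) {
        if (connected.contains(address)) {
            verified.add(address);
        }
    }

    /** Called when the peer rejected the peers request, e.g. because it runs single-node discovery. */
    void onPeersRequestFailed(String address) {
        connected.remove(address);
        verified.remove(address);
    }

    /** Peers that would be listed as "discovered", e.g. in cluster-formation diagnostics. */
    Set<String> discoveredPeers() {
        return Collections.unmodifiableSet(verified);
    }
}
```

Under this design the handshake alone no longer counts as discovery. The other node's diagnostics would therefore stop claiming to have "discovered" a node that refuses to take part in peer discovery.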

ywelsch · 2019-03-20T09:35:34Z

I think we can improve our handling of the case where this node used to be part of a larger cluster and has since been restarted in single-node mode, in which it will no longer be able to form a cluster. For instance, the Coordinator could check this in doStart() and throw an exception if it'll never obtain a quorum.

It's an extra check in Coordinator for a super-edge case. I'm not sure how much that buys us (additional code to maintain vs slightly better error reporting), so just want to ask again if you think this is really worth pursuing.

I think we should also refuse to start if node.master: false.

Again, extra check for an edge case, and covered by ClusterFormationFailureHelper.

I think both of the above need a mention in the breaking changes docs.

I disagree and think this should be treated as a bug fix.

DaveCTurner · 2019-03-20T10:04:29Z

It's an extra check in Coordinator for a super-edge case. I'm not sure how much that buys us (additional code to maintain vs slightly better error reporting), so just want to ask again if you think this is really worth pursuing

Yes, I think the improvement in the error reporting is worth it. Bootstrap checks continue to cause confusion, and if someone tries to disable the bootstrap checks by moving to single-node discovery after the cluster has formed in development mode then I think that keeping the node running despite a hopeless situation is going to add to the confusion.

Again, extra check for an edge case, and covered by ClusterFormationFailureHelper.

Similarly, keeping the node running in a hopeless situation is confusing. The log message would say "master not discovered yet: discovery will continue using [...] from hosts providers...", which is not true.

DaveCTurner

LGTM

Remove single-node discovery implementation

d888a78

ywelsch added >enhancement, v7.0.0, :Distributed Coordination/Cluster Coordination, v8.0.0 and v7.2.0 labels Mar 14, 2019

ywelsch requested a review from DaveCTurner March 14, 2019 08:54

DaveCTurner requested changes Mar 20, 2019

ywelsch added 5 commits March 25, 2019 16:47

Merge remote-tracking branch 'elastic/master' into remove-single-disco

7add826

fail if not master

40f1e22

Fail single-node discovery at startup if node does not have quorum in voting config

4d9ca65

improve error messages

4d4587d

Merge remote-tracking branch 'elastic/master' into remove-single-disco

5ab3a3f

ywelsch requested a review from DaveCTurner March 26, 2019 16:10

DaveCTurner approved these changes Mar 27, 2019

ywelsch merged commit 730dad6 into elastic:master Mar 27, 2019

jakelandis added v7.0.0-rc2 and removed v7.0.0 labels Apr 3, 2019

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

rajiv-kv mentioned this pull request Jun 29, 2025

[Experimental] Start without joining a cluster if a "clusterless" ClusterPlugin is loaded opensearch-project/OpenSearch#18479

Merged