Persistent Node Ids #19140

bleskes · 2016-06-29T08:42:54Z

Node IDs are currently randomly generated during node startup. That means they change every time the node is restarted. While this doesn't matter for ES proper, it makes it hard for external services to track nodes. Another, more minor, side effect is that indexing the output of, say, the node stats API creates new fields after every restart, because node IDs are used as keys.

The first approach I considered was to use the node's published address as the base for the id. We already treat nodes with the same address as the same so this is a simple change (see here). While this is simple and probably works for most cases, it is not perfect. For example, if after a restart the node is not able to bind to the same port (because it has not yet been freed by the OS), the node will still change identity. Also, in environments where the host IP can change due to a host restart, the identity will not be preserved.

Due to those limitations, I opted to go with a different approach where the node id will be persisted in the node's data folder. This has the upside of connecting the id to the node's data. It also means that the host can be adapted in any way (replace network cards, attach storage to a new VM).
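To make the idea concrete, here is a minimal sketch of "load the id from the data folder, or generate and store it on first start". It is not the NodeEnvironment code from this PR: the class name, the `node.id` file name and the plain-text format are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical helper; names and on-disk format are assumptions, not the PR's code. */
final class PersistentNodeIdSketch {

    // Assumed file name, chosen only for this example.
    static final String ID_FILE = "node.id";

    /** Returns the id stored under dataPath, generating and persisting one on first start. */
    static String loadOrCreate(Path dataPath) {
        Path idFile = dataPath.resolve(ID_FILE);
        try {
            if (Files.exists(idFile)) {
                // Restart: reuse the id that travels with the data folder.
                return new String(Files.readAllBytes(idFile), StandardCharsets.UTF_8).trim();
            }
            // First start: generate a random id and write it next to the data.
            Files.createDirectories(dataPath);
            String id = UUID.randomUUID().toString();
            Files.write(idFile, id.getBytes(StandardCharsets.UTF_8));
            return id;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `loadOrCreate` twice on the same folder returns the same id, while a fresh folder gets a new one. This is also why copying a data folder to another host copies the identity with it.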

It does however also have downsides - we now run the risk of two nodes having the same id, if someone copies or clones a data folder from one node to another. To mitigate this I changed the semantics of the protection against multiple nodes with the same address to be stricter - it will now reject the incoming join if a node exists with the same id but a different address. Note that if the existing node doesn't respond to pings (i.e., it's not alive) it will be removed and the new node will be accepted when it tries another join.
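The stricter join check can be sketched roughly as follows. This is a simplified model, not the discovery code changed in the PR: the class, the method names and the use of plain strings for ids and addresses are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the join check; not the actual discovery implementation. */
final class JoinValidationSketch {

    /** Node id -> published address of the nodes currently in the cluster. */
    private final Map<String, String> addressById = new HashMap<>();

    /** Accepts the join unless a known node has the same id but a different address. */
    boolean tryJoin(String nodeId, String address) {
        String existing = addressById.get(nodeId);
        if (existing != null && !existing.equals(address)) {
            // Same id, different address: possibly a cloned data folder. Keep the existing node.
            return false;
        }
        addressById.put(nodeId, address);
        return true;
    }

    /** Called once an existing node stops answering pings and is removed from the cluster. */
    void removeFailedNode(String nodeId) {
        addressById.remove(nodeId);
    }
}
```

A rejected node simply retries: once the stale entry is removed via `removeFailedNode`, the next `tryJoin` with the same id succeeds.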

Last, and most importantly, this change requires that all nodes persist data to disk. This is a change from current behavior where only data & master nodes store local files. This is the main reason for marking this PR as breaking.

Other less important notes:

DummyTransportAddress is removed as we need a unique network address per node. Use LocalTransportAddress.buildUnique() instead.
I renamed node.add_lid_to_custom_path to node.add_lock_id_to_custom_path to avoid confusion with the node ID which is now part of the NodeEnvironment logic.
I removed the version parameter from MetaDataStateFormat#write, it wasn't really used and was just in the way :)
TribeNodes are special in the sense that they do start multiple sub-nodes (previously known as client nodes). Those sub-nodes do not store local files but derive their ID from the parent node id, so they are generated consistently.
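One way a sub-node id could be derived consistently from the parent id, without storing anything on disk, is a name-based UUID. This is only a sketch: the description above does not say how the derivation is done, so the seed format and the class and method names here are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Hypothetical derivation; the PR's actual scheme is not described in this thread. */
final class TribeSubNodeIdSketch {

    /** The same parent id and tribe name always yield the same sub-node id. */
    static String deriveId(String parentNodeId, String tribeName) {
        // Assumed seed format: "<parent id>/<tribe name>".
        byte[] seed = (parentNodeId + "/" + tribeName).getBytes(StandardCharsets.UTF_8);
        // nameUUIDFromBytes is deterministic (type 3, MD5-based).
        return UUID.nameUUIDFromBytes(seed).toString();
    }
}
```

Because the derivation is a pure function of the parent id, the sub-node ids stay stable across restarts for as long as the parent id is stable.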

This PR supersedes #17811, and changes it by adding an ephemeralId field to DiscoveryNode that maintains the current id semantics, i.e., it changes with each node restart. This allows us to keep the same semantics and use it for node equality.
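The split between the two ids can be illustrated with a small stand-in class. It is a sketch only, not DiscoveryNode itself: the class name is hypothetical and the fields are reduced to the two ids.

```java
import java.util.UUID;

/** Hypothetical stand-in for a node identity; not the real DiscoveryNode. */
final class NodeIdentitySketch {

    final String persistentId; // survives restarts (read from the data folder)
    final String ephemeralId;  // regenerated every time the node starts

    NodeIdentitySketch(String persistentId) {
        this.persistentId = persistentId;
        this.ephemeralId = UUID.randomUUID().toString();
    }

    // Equality follows the ephemeral id, so a restarted node is a different instance.
    @Override
    public boolean equals(Object o) {
        return o instanceof NodeIdentitySketch
            && ephemeralId.equals(((NodeIdentitySketch) o).ephemeralId);
    }

    @Override
    public int hashCode() {
        return ephemeralId.hashCode();
    }
}
```

Two instances created with the same persistent id model the same node before and after a restart: external tools can match them on `persistentId`, while equality still tells them apart.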


ywelsch · 2016-06-30T09:00:57Z

core/src/main/java/org/elasticsearch/cluster/node/DiscoveryNodes.java

+     *
+     * @param node of the node which existence should be verified
+     * @return <code>true</code> if the node exists. Otherwise <code>false</code>
+     */


I see two usages of nodeExists(String nodeId) that could use this one instead. (LocalDiscovery and IndicesClusterStateServiceRandomUpdatesTests)

good catches. fixed.

ywelsch · 2016-06-30T10:14:56Z

@bleskes Left some comments on the PR but looks good overall. Can you add documentation as well? (breaking change). I'm ok if you want to address tribe nodes as well in this PR (I don't think it's too big of a change, see #17987).

bleskes · 2016-07-03T20:23:19Z

@ywelsch Thanks. I pushed a commit addressing your comments. I'm not sure about the tribe nodes - will reach out to discuss more.

bleskes · 2016-07-04T13:35:59Z

@ywelsch I pushed another commit with the tribe node change we discussed.

ywelsch · 2016-07-04T15:30:46Z

LGTM. Can you also add something to the migration docs? (see e.g. https://github.com/elastic/elasticsearch/pull/17987/files#diff-2d50fb3821a6a54aedf97e29135d711aR13 )

With #19140 we started persisting the node ID across node restarts. Now that we have a "stable" anchor, we can use it to generate a stable default node name and make it easier to track nodes across restarts. Sadly, this means we will not have those random fun Marvel characters but we feel this is the right tradeoff.

On the implementation side, this requires a bit of juggling because we now need to read the node id from disk before we can log, as the node name is part of each log message. The PR moves the initialization of NodeEnvironment as high up in the starting sequence as possible, with only one logging message before it to indicate we are initializing. Things now look like this:

```
[2016-07-15 19:38:39,742][INFO ][node ] [_unset_] initializing ...
[2016-07-15 19:38:39,826][INFO ][node ] [aAmiW40] node name set to [aAmiW40] by default. set the [node.name] settings to change it
[2016-07-15 19:38:39,829][INFO ][env ] [aAmiW40] using [1] data paths, mounts [[ /(/dev/disk1)]], net usable_space [5.5gb], net total_space [232.6gb], spins? [unknown], types [hfs]
[2016-07-15 19:38:39,830][INFO ][env ] [aAmiW40] heap size [1.9gb], compressed ordinary object pointers [true]
[2016-07-15 19:38:39,837][INFO ][node ] [aAmiW40] version[5.0.0-alpha5-SNAPSHOT], pid[46048], build[473d3c0/2016-07-15T17:38:06.771Z], OS[Mac OS X/10.11.5/x86_64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_51/25.51-b03]
[2016-07-15 19:38:40,980][INFO ][plugins ] [aAmiW40] modules [percolator, lang-mustache, lang-painless, reindex, aggs-matrix-stats, lang-expression, ingest-common, lang-groovy, transport-netty], plugins []
[2016-07-15 19:38:43,218][INFO ][node ] [aAmiW40] initialized
```

Needless to say, setting `node.name` explicitly still works as before.

The commit also contains some clean ups to the relationship between Environment, Settings and Plugins. The previous code suggested the path related settings could be changed after the initial Environment was created. This did not have any effect as the security manager already locked things down.
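The name resolution can be sketched as below. The rule used here, taking a short prefix of the node id, is an assumption: the seven-character length matches the `aAmiW40` name in the sample log, but the exact derivation is not spelled out in this description, and the class and method names are hypothetical.

```java
/** Hypothetical sketch of default node name resolution; the prefix rule is an assumption. */
final class DefaultNodeNameSketch {

    // Assumed length, chosen to match the seven-character name in the sample log.
    static final int PREFIX_LENGTH = 7;

    /** An explicitly configured node.name wins; otherwise fall back to a prefix of the id. */
    static String resolveName(String configuredName, String nodeId) {
        if (configuredName != null && !configuredName.isEmpty()) {
            return configuredName;
        }
        return nodeId.substring(0, Math.min(PREFIX_LENGTH, nodeId.length()));
    }
}
```

Since the fallback depends only on the persisted id, the default name is the same after every restart, which is the property this change is after.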

@ywelsch

…ster state (#19743)

When we introduced [persistent node ids](#19140) we were concerned that people may copy data folders from one node to another, resulting in two nodes competing for the same id in the cluster. To solve this we elected to not allow an incoming join if a different node with the same id already exists in the cluster, or if some other node already has the same transport address as the incoming join. The rationale there was that it is better to prefer existing nodes and that we can rely on node fault detection to remove any node from the cluster that isn't correct any more, making room for the node that wants to join (and will keep trying). Sadly there were two problems with this:

1) One minor and easy to fix - we didn't allow for the case where the existing node can have the same network address as the incoming one, but have a different ephemeral id (after node restart). This confused the logic in `AllocationService` in these rare cases. The cluster is good enough to detect this and recover later on, but it's not clean.
2) The assumption that Node Fault Detection will clean up is *wrong* when the node just won an election (it wasn't master before) and needs to process the incoming joins in order to commit the cluster state and assume its mastership. In those cases, the Node Fault Detection isn't active.

This PR fixes these two and prefers incoming nodes to existing nodes when finishing an election. On top of that, on request by @ywelsch, `AllocationService` synchronization between the nodes of the cluster and its routing table is now explicit rather than something we do all the time. The same goes for promotion of replicas to primaries.
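The "prefer incoming nodes when finishing an election" rule can be sketched as follows. This is a simplified model, not the code from #19743: the `Node` type, the list-based cluster state and the method name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of join processing at the end of an election. */
final class ElectionJoinSketch {

    static final class Node {
        final String id;
        final String ephemeralId;
        final String address;

        Node(String id, String ephemeralId, String address) {
            this.id = id;
            this.ephemeralId = ephemeralId;
            this.address = address;
        }
    }

    /**
     * Builds the node list for the first cluster state of a freshly elected master.
     * Fault detection is not active yet, so conflicting existing entries are dropped
     * in favour of the joining node instead of waiting for them to be removed.
     */
    static List<Node> applyJoinsOnElection(List<Node> existing, List<Node> joins) {
        List<Node> result = new ArrayList<>(existing);
        for (Node joining : joins) {
            // Drop any existing entry that clashes on id or on transport address.
            result.removeIf(n -> n.id.equals(joining.id) || n.address.equals(joining.address));
            result.add(joining);
        }
        return result;
    }
}
```

In this model a restarted node, with the same id or address but a new ephemeral id, replaces its stale entry right away, while nodes that do not conflict with any join are left untouched.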

bleskes added 28 commits June 21, 2016 08:37

wip

720907e

compiliation

4c771b8

remove DummyTransportAddress

cef3008

overly aggressive replace

e831ae9

infinite loops FTW

66c7033

fix ClusterChangedEventTests

fc6ed34

fix some id based lookups

4557a4d

fix TribeServiceTests

b352fce

fix PipelineStoreTests

cfe587a

fix ClusterStateDiffIT

9205bf5

fix references to add_id_to_custom_path

036d5ed

register NODE_LOCAL_STORAGE_SETTING

28c1e49

Fix TribeUnitTests

8a97b83

fix testThatTribeClientsIgnoreGlobalConfig

1153beb

line length MetaDataStateFormat

17ef26c

line lengths ftw

9023ae5

merge from master

8ea0b2d

line length

72a36b6

merge from master

023d14d

merge from master

790aaba

remove custom data dirs per role, in favor of consistent startup order

02f7ede

fix data folder wiping and make data folders stable in the shared clu…

b35d66c

…ster

merge from master

0885d69

Merge branch 'master' into node_persistent_id2

6ec2197

Merge remote-tracking branch 'upstream/master' into node_persistent_id2

5fb8e7d

merge

833b05f

merge from master

190456d

fix formatting

b5bd842

bleskes added >enhancement >breaking labels Jun 29, 2016

ywelsch reviewed Jun 30, 2016

ywelsch mentioned this pull request Jun 30, 2016

Persistent Node Ids #17987

Closed

bleskes added 2 commits July 2, 2016 22:45

merge from master

244e40c

feedback

f80d254

bleskes added 3 commits July 4, 2016 13:52

merge from master

68dee40

order in the universe

8224a5d

tribe node consistency

4078323

bleskes added 2 commits July 4, 2016 20:57

merge from master

6cb28b3

add migration do to fs.asciidoc

e88b83a

bleskes merged commit 6861d35 into elastic:master Jul 4, 2016

bleskes deleted the node_persistent_id branch July 4, 2016 19:09

bleskes mentioned this pull request Jul 15, 2016

Persistent Node Names #19456

Merged

bleskes mentioned this pull request Aug 2, 2016

Upon being elected as master, prefer joins' node info to existing cluster state #19743

Merged

clintongormley added :Distributed Indexing/Distributed and removed :Cluster labels Feb 13, 2018
