Skip to content

Conversation

@DaveCTurner
Copy link
Contributor

Today when ESIntegTestCase starts some nodes it writes out the unicast hosts
files each time a node starts its transport service. This does mean that a
number of nodes can start and perform their first pinging round without any
unicast hosts which, if the timing is unlucky and a lot of nodes are all
started at the same time, can lead to a split brain as in #35052.

Prior to #33554 this was unlikely to happen since the MockUncasedHostsProvider
would always have yielded the existing hosts, so the timing would have to have
been implausibly unlucky. Since #33554, however, it's more likely because the
race occurs between the start of the first round of pinging and the writing of
the unicast hosts file. It is realistic that new nodes will be configured with
the existing nodes from startup, so this change reinstates that behaviour

Closes #35052.

Today when ESIntegTestCase starts some nodes it writes out the unicast hosts
files each time a node starts its transport service. This does mean that a
number of nodes can start and perform their first pinging round without any
unicast hosts which, if the timing is unlucky and a lot of nodes are all
started at the same time, can lead to a split brain as in elastic#35052.

Prior to elastic#33554 this was unlikely to happen since the MockUncasedHostsProvider
would always have yielded the existing hosts, so the timing would have to have
been implausibly unlucky. Since elastic#33554, however, it's more likely because the
race occurs between the start of the first round of pinging and the writing of
the unicast hosts file. It is realistic that new nodes will be configured with
the existing nodes from startup, so this change reinstates that behaviour

Closes elastic#35052.
@DaveCTurner DaveCTurner added >test Issues or PRs that are addressing/adding tests v7.0.0 :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v6.6.0 v6.5.1 labels Oct 31, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

@DaveCTurner DaveCTurner merged commit 0072c90 into elastic:master Oct 31, 2018
@DaveCTurner DaveCTurner deleted the 2018-10-31-pre-populate-unicast-hosts branch October 31, 2018 19:21
DaveCTurner added a commit that referenced this pull request Oct 31, 2018
Today when ESIntegTestCase starts some nodes it writes out the unicast hosts
files each time a node starts its transport service. This does mean that a
number of nodes can start and perform their first pinging round without any
unicast hosts which, if the timing is unlucky and a lot of nodes are all
started at the same time, can lead to a split brain as in #35052.

Prior to #33554 this was unlikely to happen since the MockUncasedHostsProvider
would always have yielded the existing hosts, so the timing would have to have
been implausibly unlucky. Since #33554, however, it's more likely because the
race occurs between the start of the first round of pinging and the writing of
the unicast hosts file. It is realistic that new nodes will be configured with
the existing nodes from startup, so this change reinstates that behaviour.

Closes #35052.
DaveCTurner added a commit that referenced this pull request Oct 31, 2018
Today when ESIntegTestCase starts some nodes it writes out the unicast hosts
files each time a node starts its transport service. This does mean that a
number of nodes can start and perform their first pinging round without any
unicast hosts which, if the timing is unlucky and a lot of nodes are all
started at the same time, can lead to a split brain as in #35052.

Prior to #33554 this was unlikely to happen since the MockUncasedHostsProvider
would always have yielded the existing hosts, so the timing would have to have
been implausibly unlucky. Since #33554, however, it's more likely because the
race occurs between the start of the first round of pinging and the writing of
the unicast hosts file. It is realistic that new nodes will be configured with
the existing nodes from startup, so this change reinstates that behaviour.

Closes #35052.
@colings86 colings86 added v6.5.0 and removed v6.5.1 labels Nov 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >test Issues or PRs that are addressing/adding tests v6.5.0 v6.6.0 v7.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants