Skip to content

Conversation

@Tim-Brooks
Copy link
Contributor

Currently it is possible for a transient network error to disrupt the
start recovery request from the remote to source node. This disruption
is racy with the recovery occurring on the source node. It is possible
for the source node to finish and clear its recovery. When this occurs,
the recovery cannot be reestablished and the "no two start" assertion
is tripped. This commit fixes this issue by allowing two starts if the
finalize request has been received.

Fixes #57416.

@Tim-Brooks Tim-Brooks added >test Issues or PRs that are addressing/adding tests :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. v8.0.0 v7.9.0 labels Jun 8, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Recovery)

@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jun 8, 2020
@Tim-Brooks Tim-Brooks requested review from dnhatn and ywelsch June 8, 2020 17:05
Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Tim-Brooks Tim-Brooks merged commit 9524daf into elastic:master Jun 9, 2020
Tim-Brooks added a commit that referenced this pull request Jun 9, 2020
Currently it is possible for a transient network error to disrupt the
start recovery request from the remote to source node. This disruption
is racy with the recovery occurring on the source node. It is possible
for the source node to finish and clear its recovery. When this occurs,
the recovery cannot be reestablished and the "no two start" assertion
is tripped. This commit fixes this issue by allowing two starts if the
finalize request has been received.

Fixes #57416.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test Issues or PRs that are addressing/adding tests v7.9.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IndexRecoveryIT testTransientErrorsDuringRecoveryAreRetried timeout

5 participants