Cascading primary failure lead to MSU too low #40249

henningandersen · 2019-03-20T10:55:47Z

If a replica was first reset because of one primary failover and then
promoted before the resync completed, its MSU (max_seq_no_of_updates) would
not cover changes above the global checkpoint, leading to assertion errors
during translog replay.

Fixed by re-initializing MSU before restoring local history.

Two follow-ups to this:

All tests in IndexShardTests that call indexOnReplicaWithGaps should allow updates.
For 8.0 and 7.x, we could move the initialization of MSU into the constructor (or at least into the engine).
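
The failure mode and the fix described above can be illustrated with a toy model. This is a minimal sketch, not Elasticsearch code: the class `MsuReplaySketch` and its methods are hypothetical names, and the check in `restoreLocalHistory` only stands in for the assertion that trips during translog replay. It also assumes that re-initializing means raising MSU to max_seq_no.

```java
/** Toy model only: hypothetical names, not the Elasticsearch engine. */
final class MsuReplaySketch {
    private final long maxSeqNo;      // highest sequence number present in local history
    private long maxSeqNoOfUpdates;   // MSU: highest seq_no of an update/delete this copy knows about

    MsuReplaySketch(long maxSeqNo, long staleMsu) {
        this.maxSeqNo = maxSeqNo;
        this.maxSeqNoOfUpdates = staleMsu;
    }

    /** Assumed shape of the fix: raise MSU to max_seq_no before replaying local history. */
    void reinitializeMsu() {
        maxSeqNoOfUpdates = Math.max(maxSeqNoOfUpdates, maxSeqNo);
    }

    /**
     * Replays the given update operations (identified by seq_no).
     * An update above MSU breaks the invariant; the exception stands in for the assertion error.
     */
    void restoreLocalHistory(long... updateSeqNos) {
        for (long seqNo : updateSeqNos) {
            if (seqNo > maxSeqNoOfUpdates) {
                throw new IllegalStateException(
                    "update at seq_no " + seqNo + " is above MSU " + maxSeqNoOfUpdates);
            }
        }
    }

    long maxSeqNoOfUpdates() {
        return maxSeqNoOfUpdates;
    }
}
```

With max_seq_no 9 and an MSU left at 5 by the earlier reset, `new MsuReplaySketch(9, 5).restoreLocalHistory(7)` throws. Calling `reinitializeMsu()` first lets the same replay pass.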

elasticmachine · 2019-03-20T10:55:49Z

Pinging @elastic/es-distributed

ywelsch

LGTM

dnhatn

LGTM. Thanks @henningandersen

If there's a failover on the follower, its max_seq_no_of_updates is bootstrapped from its max_seq_no, which might be higher than the max_seq_no_of_updates of the leader. We need to relax this check. Relates #40249

Since #40249, we always reinitialize max_seq_no_of_updates to max_seq_no when a promoting primary restores history, regardless of whether it previously rolled back. Closes #40929

Today we choose to initialize max_seq_no_of_updates on primaries only so we can deal with a situation where a primary is on an old node (before 6.5) which does not have MSU while replicas are on new nodes (6.5+). However, this strategy is quite complex and can lead to bugs (for example #40249) since we have to assign a correct value (not too low) to MSU in all possible situations (before recovering from translog, restoring history on promotion, and handing off relocation). Fortunately, we don't have to deal with this BWC in 7.0+ since all nodes in the cluster should have MSU. This change simplifies the initialization of MSU by always assigning it a correct value in the constructor of Engine regardless of whether it's a replica or primary. Relates #33842
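
The simplification described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the actual Engine code: `ConstructorInitSketch` and its methods are hypothetical names, and seeding MSU from max_seq_no is an assumption borrowed from the promotion case.

```java
/** Toy model only: hypothetical names, not the Elasticsearch Engine class. */
final class ConstructorInitSketch {
    private long maxSeqNoOfUpdates; // MSU

    /**
     * Seeds MSU once, at construction, without asking whether the copy is a primary or a replica.
     * Using max_seq_no as the seed is an assumption of this sketch.
     */
    ConstructorInitSketch(long maxSeqNo) {
        this.maxSeqNoOfUpdates = maxSeqNo;
    }

    /** In this sketch MSU only moves forward; a lower value is ignored. */
    void advanceMaxSeqNoOfUpdates(long seqNo) {
        maxSeqNoOfUpdates = Math.max(maxSeqNoOfUpdates, seqNo);
    }

    long maxSeqNoOfUpdates() {
        return maxSeqNoOfUpdates;
    }
}
```

Because the value is set in one place, the later lifecycle steps (translog recovery, promotion, relocation hand-off) no longer need their own initialization; in this model they can only advance it.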

henningandersen added the >bug, :Distributed Indexing/CRUD, v7.0.0, v6.7.0, v8.0.0, and v7.2.0 labels Mar 20, 2019

henningandersen requested a review from dnhatn March 20, 2019 10:57

ywelsch approved these changes Mar 20, 2019

dnhatn approved these changes Mar 20, 2019

henningandersen merged commit df9f0f7 into elastic:master Mar 20, 2019

henningandersen added the backport pending label Mar 20, 2019

henningandersen removed the backport pending label Mar 20, 2019

henningandersen mentioned this pull request Mar 22, 2019

Store Pending Deletions Fix #40345

Merged

michaelbaamonde added v7.0.0-rc1 and removed v7.0.0 labels Mar 25, 2019

dnhatn mentioned this pull request Apr 12, 2019

Simplify initialization of max_seq_no of updates #41161

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
