Seq Number based recovery should validate last lucene commit max seq# #22851
Conversation
…q_no_should_be_below_global_check_point
The seq# based recovery logic relies on rolling back Lucene to remove any operations above the global checkpoint. That part of the plan is not implemented yet, but the logic already depends on those guarantees. Instead, the seq# logic should validate that the last commit point (and the only one we have) maintains the invariant and, if not, fall back to file based recovery. This commit adds a test that creates a situation where rollback is needed (primary fail over with ops in flight) and fixes another issue that was surfaced by it - if a primary can't serve a seq# based recovery request and does a file copy, it still used the incoming `startSeqNo` as a filter. Relates to elastic#22484 & elastic#10708
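A minimal sketch of that decision, under assumed names (the accessors mirror the SeqNoStats getters visible in the diffs below; the sentinel value and method are illustrative, not the exact code in this PR):

```java
// Hypothetical sketch of the check described above, not the committed code.
final class SeqNoRecoveryCheck {
    // Sentinel meaning "no usable starting seq#, do a file-based recovery".
    static final long UNASSIGNED_SEQ_NO = -2;

    static long startingSeqNo(long maxSeqNo, long globalCheckpoint, long requestedStartSeqNo) {
        // We cannot roll Lucene back yet, so the last (and only) commit must not
        // contain operations above the global checkpoint for ops-based recovery.
        if (maxSeqNo <= globalCheckpoint) {
            return requestedStartSeqNo; // seq#-based recovery is safe
        }
        // Fall back to file-based recovery; the incoming startSeqNo must not be
        // used to filter the operations that are sent afterwards.
        return UNASSIGNED_SEQ_NO;
    }
}
```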
test this please
Test this please
ywelsch left a comment
Left some minor comments but change LGTM
    return recoveryTarget.store().loadSeqNoStats(globalCheckpoint).getLocalCheckpoint() + 1;
    final SeqNoStats seqNoStats = recoveryTarget.store().loadSeqNoStats(globalCheckpoint);
    if (seqNoStats.getMaxSeqNo() <= seqNoStats.getGlobalCheckpoint()) {
        assert seqNoStats.getLocalCheckpoint() <= seqNoStats.getGlobalCheckpoint() :
this would be a consequence of maxSeqNo >= localCheckpoint. Should we instead add the assertion maxSeqNo >= localCheckpoint to the constructor of SeqNoStats?
yep. moved.
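For reference, the suggested constructor-level invariant might look roughly like this (field and parameter names are assumptions based on the getters visible in the diff, not the actual SeqNoStats source):

```java
// Sketch of SeqNoStats with the assertion moved into the constructor,
// as suggested in the review; names are assumed, not copied from the PR.
public class SeqNoStats {
    private final long maxSeqNo;
    private final long localCheckpoint;
    private final long globalCheckpoint;

    public SeqNoStats(long maxSeqNo, long localCheckpoint, long globalCheckpoint) {
        // a shard's max seq# can never be lower than its local checkpoint
        assert maxSeqNo >= localCheckpoint :
            "maxSeqNo [" + maxSeqNo + "] < localCheckpoint [" + localCheckpoint + "]";
        this.maxSeqNo = maxSeqNo;
        this.localCheckpoint = localCheckpoint;
        this.globalCheckpoint = globalCheckpoint;
    }

    public long getMaxSeqNo() { return maxSeqNo; }
    public long getLocalCheckpoint() { return localCheckpoint; }
    public long getGlobalCheckpoint() { return globalCheckpoint; }
}
```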
    final long globalCheckpoint = Translog.readGlobalCheckpoint(recoveryTarget.indexShard().shardPath().resolveTranslog());
    return recoveryTarget.store().loadSeqNoStats(globalCheckpoint).getLocalCheckpoint() + 1;
    final SeqNoStats seqNoStats = recoveryTarget.store().loadSeqNoStats(globalCheckpoint);
    if (seqNoStats.getMaxSeqNo() <= seqNoStats.getGlobalCheckpoint()) {
can you add a comment here explaining why we check this (i.e. essentially the first paragraph of the PR description)?
sure, added
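For illustration, the added comment presumably sits above the check from the diff, along these lines (the wording here is a guess, not the committed text):

```java
// Illustrative comment only; the actual wording in the commit may differ.
// We rely on the last Lucene commit containing no operations above the global
// checkpoint, because we cannot roll Lucene back (yet). If that invariant does
// not hold, a seq#-based recovery is unsafe and we fall back to file-based
// recovery instead.
if (seqNoStats.getMaxSeqNo() <= seqNoStats.getGlobalCheckpoint()) {
    // ... proceed with seq#-based (ops) recovery
}
```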
    this.logger = logger;
    this.indexName = this.request.shardId().getIndex().getName();
    this.shardId = this.request.shardId().id();
    this.logger = Loggers.getLogger(getClass(), nodeSettings, request.shardId(),"recover to " + request.targetNode().getName());
space missing
    .setSettings(Settings.builder()
        .put(IndexMetaData.SETTING_NUMBER_OF_SHARDS, 1 + randomInt(2))
        .put(IndexMetaData.SETTING_NUMBER_OF_REPLICAS, randomInt(2))
        .put(IndexSettings.INDEX_SEQ_NO_CHECKPOINT_SYNC_INTERVAL.getKey(), "200ms")
randomize this a bit? Maybe we won't uncover other issues if this is too low?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added randomization between 5s and 200ms
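For illustration, such randomization could slot into the settings builder shown above roughly like this (a fragment only; the helper and exact bounds used in the commit may differ):

```java
// Sketch: replace the fixed "200ms" with a randomized sync interval between
// 200ms and 5s; randomIntBetween is the usual ESTestCase helper, but this is
// an assumption about the change, not the committed code.
.put(IndexSettings.INDEX_SEQ_NO_CHECKPOINT_SYNC_INTERVAL.getKey(),
    randomIntBetween(200, 5000) + "ms")
```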
    }

    public static ShardRouting promoteToPrimary(ShardRouting routing) {
        return new ShardRouting(routing.shardId(), routing.currentNodeId(), routing.relocatingNodeId(), true, routing.state(),
why not use ShardRouting.moveActiveReplicaToPrimary?
Because I didn't find it. Thanks for the tip.
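With that helper, the test utility above could presumably collapse to something like this (assuming moveActiveReplicaToPrimary() is the instance method on ShardRouting referenced in the comment):

```java
// Sketch: delegate to the existing routing helper instead of constructing a
// new ShardRouting by hand; method name taken from the review comment above.
public static ShardRouting promoteToPrimary(ShardRouting routing) {
    return routing.moveActiveReplicaToPrimary();
}
```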
    // simulate docs that were inflight when primary failed, these will be rolled back
    final int rollbackDocs = randomIntBetween(1, 5);
    logger.info("--> indexing {} rollback docs", rollbackDocs);
    for (int i = 0; i< rollbackDocs; i++) {
spaces
    replica.updateGlobalCheckpointOnReplica(maxSeqNo - 1);
    replica.getTranslog().sync();

    // commit is enough, global checkpoint is bellow max *committed* which is NO_OPS_PERFORMED
below
thx.
    replica.updateGlobalCheckpointOnReplica(maxSeqNo);
    replica.getTranslog().sync();
    // commit is enough, global checkpoint is bellow max
below
…q_no_should_be_below_global_check_point
thx @ywelsch