-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Replicate max seq_no of updates to replicas #33967
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We start tracking max seq_no_of_updates on the primary in elastic#33842. This commit replicates that value from a primary to its replicas in replication requests or the translog phase of peer-recovery. With this change, we guarantee that the value of max seq_no_of_updates on a replica when any index/delete operation is performed at least the max_seq_no_of_updates on the primary when that operation was executed. Relates elastic#33656
|
Pinging @elastic/es-distributed |
bleskes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. I left some comments
| private boolean assertMaxSeqNoOfUpdatesIsPropagated(Index index, IndexingStrategy plan) { | ||
| final long maxSeqNoOfUpdates = getMaxSeqNoOfUpdatesOrDeletes(); | ||
| final Version indexVersion = config().getIndexSettings().getIndexVersionCreated(); | ||
| assert plan.useLuceneUpdateDocument == false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this buys us? it just validates the logic in plan as non primary and has nothing to do with the replication code? what am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but with our classic example: index-1, delete-2, and index-3 on the primary; and the order on a replica is index-1(msu=-1), index-3(msu=2), and delete-2(msu=2). An index-3 on replica is an update on a replica but its msu is only 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I don't follow. I'll reach out monday to discuss.
| // If the old primary was on an old version, this promoting primary (was replica before) | ||
| // does not have max_seq_no_of_updates. We need to bootstrap it manually from its local history. | ||
| assert indexSettings.getIndexVersionCreated().before(Version.V_7_0_0_alpha1); | ||
| engine.initializeMaxSeqNoOfUpdatesOrDeletes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need maxSeq no here? maybe the translog wasn't fsynced? (if people disable the per request fsync)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, initializeMaxSeqNoOfUpdatesOrDeletes uses the max_seq_no from the local tracker and translog.
| // assert indexSettings.getIndexVersionCreated().before(Version.V_7_0_0_alpha1) : indexSettings.getIndexVersionCreated(); | ||
| // If the old primary was on an old version, this promoting primary (was replica before) | ||
| // does not have max_seq_no_of_updates. We need to bootstrap it manually from its local history. | ||
| assert indexSettings.getIndexVersionCreated().before(Version.V_7_0_0_alpha1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment
| opPrimaryTerm, currentGlobalCheckpoint, maxSeqNo); | ||
| if (currentGlobalCheckpoint < maxSeqNo) { | ||
| resetEngineToGlobalCheckpoint(); | ||
| resetEngineToGlobalCheckpoint(maxSeqNoOfUpdatesOrDeletes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to set the maxSeqNoUpdatesOrDeletes here? can't we let it be what engine does it self? I think it's also risky because we recovery from the translog (and need make the optimization doesn't create duplicates in thas process)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can let an engine take care itself. The reason I use the max_seq_no_of_updates from the primary, so at the end of a test, all replicas and its primary have the same max_seq_no_of_updates. I will update this.
| }); | ||
| } | ||
| final IndexResult result = super.index(index); | ||
| if (index.seqNo() == SequenceNumbers.UNASSIGNED_SEQ_NO && result.getFailure() == null && result.isCreated() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can inline this into internal engine as a standard assertion no?
| advanceMaxSeqNoOfUpdatesOrDeletes(maxSeqNo); | ||
| } | ||
| final DeleteResult result = super.delete(delete); | ||
| if (delete.seqNo() == SequenceNumbers.UNASSIGNED_SEQ_NO && result.getFailure() == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment this can be in the engine it self?
| * An alternative of {@link InternalEngine} that allows tweaking internals to reduce noise in engine tests. | ||
| */ | ||
| class InternalTestEngine extends InternalEngine { | ||
| private volatile boolean autoAdjustMaxSeqNoOfUpdatesOrDeletes = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't like anyone is change this - is this a follow up?
|
@bleskes Could you please have another look? Thank you! |
bleskes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I left some nits. I'm waiting with LGTM for our discussion
| // If the old primary was on an old version, this promoting primary (was replica before) | ||
| // does not have max_seq_no_of_updates. We need to bootstrap it manually from its local history. | ||
| assert indexSettings.getIndexVersionCreated().before(Version.V_7_0_0_alpha1); | ||
| engine.advanceMaxSeqNoOfUpdatesOrDeletes(seqNoStats().getMaxSeqNo()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reads better thanks. I think this comment will be more intuitive if you change old primary was on an old version to old primary was on an old version that didn't yet replicate the MSU, we need to bootstrap it...
| newEngine = createNewEngine(newEngineConfig()); | ||
| active.set(true); | ||
| } | ||
| newEngine.advanceMaxSeqNoOfUpdatesOrDeletes(seqNoStats.getMaxSeqNo()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be easier to understand if we use seqNoStats.getGlobalCheckpoint() which is the upper bound for the local recovery. The rest we don't care about here, right?
| } else if (plan.indexIntoLucene || plan.addStaleOpToLucene) { | ||
| indexResult = indexIntoLucene(index, plan); | ||
| assert (index.seqNo() != SequenceNumbers.UNASSIGNED_SEQ_NO || indexResult.isCreated() || indexResult.getFailure() != null) | ||
| || indexResult.getSeqNo() <= getMaxSeqNoOfUpdatesOrDeletes() : indexResult.getSeqNo() + " > " + getMaxSeqNoOfUpdatesOrDeletes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to allow for the msu to be unset here, no?
|
@bleskes I've updated the assertion as we discussed. Would you please give it another look? Thank you. |
bleskes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
s1monw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 2 thanks @dnhatn
We start tracking max seq_no_of_updates on the primary in #33842. This commit replicates that value from a primary to its replicas in replication requests or the translog phase of peer-recovery. With this change, we guarantee that the value of max seq_no_of_updates on a replica when any index/delete operation is performed at least the max_seq_no_of_updates on the primary when that operation was executed. Relates #33656
We start tracking max seq_no_of_updates on the primary in #33842. This commit replicates that value from a primary to its replicas in replication requests or the translog phase of peer-recovery. With this change, we guarantee that the value of max seq_no_of_updates on a replica when any index/delete operation is performed at least the max_seq_no_of_updates on the primary when that operation was executed. Relates #33656
We start tracking max seq_no_of_updates on the primary in #33842. This commit replicates that value from a primary to its replicas in replication requests or the translog phase of peer-recovery.
With this change, we guarantee that the value of max seq_no_of_updates on a replica when any index/delete operation is performed at least the max_seq_no_of_updates on the primary when that operation was executed.
Relates #33656