Skip to content

Conversation

@s1monw
Copy link
Contributor

@s1monw s1monw commented Apr 21, 2017

Today we might promote a primary and recover from store where after translog
recovery the local checkpoint is still behind the maximum sequence ID seen.
To fill the holes in the sequence ID history this PR adds a utility method
that fills up all missing sequence IDs up to the maximum seen sequence ID
with no-ops.

Relates to #10708
I still work on a test for store recovery to ensure it's called but I think it's ready for review.

…store

Today we might promote a primary and recover from store where after translog
recovery the local checkpoint is still behind the maximum sequence ID seen.
To fill the holes in the sequence ID history this PR adds a utility method
that fills up all missing sequence IDs up to the maximum seen sequence ID
with no-ops.
Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I one minor request to the test.

final long maxSeqId = seqNoService.getMaxSeqNo();
int numNoOpsAdded = 0;
for (long i = localCheckpoint + 1; i <= maxSeqId;
// the local checkpoint might have been advanced so we are leap-frogging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the local checkpoint must have advanced by at least one. We can assert on that after the noop was indexed.

Engine.Index primaryResponse = indexForDoc(doc);
Engine.IndexResult indexResult = engine.index(primaryResponse);
if (randomBoolean()) {
doc.updateSeqID(indexResult.getSeqNo(), 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed? doesn't the engine take care of that?

assertEquals((maxSeqIDOnReplica+1) - numDocsOnReplica, recoveringEngine.fillSequenceNumberHistory(2));
assertEquals(maxSeqIDOnReplica, recoveringEngine.seqNoService().getMaxSeqNo());
assertEquals(maxSeqIDOnReplica, recoveringEngine.seqNoService().getLocalCheckpoint());
if ((flushed = randomBoolean())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we snapshot the translog and assert that the noops have the right primary term?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah I had that but remvoed it... good catch...

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM. Left a suggestion for the new test.

// start a replica shard and index the second doc
final IndexShard otherShard = newStartedShard(false);
test = otherShard.prepareIndexOnReplica(
SourceToParse.source(SourceToParse.Origin.PRIMARY, shard.shardId().getIndexName(), test.type(), test.id(), test.source(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Origin should REPLICA


/* This test just verifies that we fill up local checkpoint up to max seen seqID on primary recovery */
public void testRecoverFromStoreWithNoOps() throws IOException {
final IndexShard shard = newStartedShard(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can introduce a variant of indexDoc called indexDocOnReplica which takes a seq# as a parameter. This will remove the need for the extra shard. wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that in a sep PR

@s1monw s1monw merged commit 2ca7072 into elastic:master Apr 21, 2017
@s1monw s1monw deleted the fill_gaps_on_tlog_recovery branch April 21, 2017 18:28
@s1monw s1monw removed the review label Apr 21, 2017
asettouf pushed a commit to asettouf/elasticsearch that referenced this pull request Apr 23, 2017
…store (elastic#24238)

Today we might promote a primary and recover from store where after translog
recovery the local checkpoint is still behind the maximum sequence ID seen.
To fill the holes in the sequence ID history this PR adds a utility method
that fills up all missing sequence IDs up to the maximum seen sequence ID
with no-ops.

Relates to elastic#10708
@clintongormley clintongormley added :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Engine :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v6.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants