Avoid losing ops in file-based recovery #22945
When a primary is relocated from an old node to a new node, it can have ops in its translog that do not have a sequence number assigned. When a file-based recovery is started, this can lead to skipping these ops when replaying the translog due to a bug in the recovery logic. This commit addresses this bug and adds a test in the BWC tests.
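For readers who want a concrete picture of the failure mode, here is a minimal, self-contained sketch. It is a simplified model and not the actual Elasticsearch recovery code: the class `TranslogReplaySketch`, the `Op` type, both replay methods, and the sample data are hypothetical. It also assumes that `-2` is the sentinel for "no sequence number assigned", that a file-based recovery passes that sentinel as the starting sequence number, and that the faulty check skipped unassigned ops unconditionally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of translog replay during recovery; NOT the real Elasticsearch code.
class TranslogReplaySketch {
    // Assumption: sentinel meaning "no sequence number assigned".
    static final long UNASSIGNED_SEQ_NO = -2L;

    // Hypothetical stand-in for a translog operation.
    static final class Op {
        final String id;
        final long seqNo;

        Op(String id, long seqNo) {
            this.id = id;
            this.seqNo = seqNo;
        }
    }

    // Assumed shape of the faulty check: ops without a sequence number are
    // always skipped, even when the recovery is file-based.
    static List<String> replaySkippingUnassigned(List<Op> translog, long startingSeqNo) {
        List<String> sent = new ArrayList<>();
        for (Op op : translog) {
            if (op.seqNo == UNASSIGNED_SEQ_NO || op.seqNo < startingSeqNo) {
                continue;
            }
            sent.add(op.id);
        }
        return sent;
    }

    // Sketch of a corrected check: filter only when a real starting sequence
    // number is given, i.e. in a sequence-number-based recovery.
    static List<String> replayFileBasedAware(List<Op> translog, long startingSeqNo) {
        List<String> sent = new ArrayList<>();
        for (Op op : translog) {
            boolean seqNoBased = startingSeqNo >= 0;
            if (seqNoBased && (op.seqNo == UNASSIGNED_SEQ_NO || op.seqNo < startingSeqNo)) {
                continue;
            }
            sent.add(op.id);
        }
        return sent;
    }

    // Made-up translog: two ops written by an old node (no sequence number),
    // then two ops with sequence numbers 0 and 1.
    static List<Op> sampleTranslog() {
        return Arrays.asList(
            new Op("a", UNASSIGNED_SEQ_NO),
            new Op("b", UNASSIGNED_SEQ_NO),
            new Op("c", 0L),
            new Op("d", 1L));
    }

    public static void main(String[] args) {
        List<Op> translog = sampleTranslog();
        // File-based recovery in this model: no starting sequence number.
        System.out.println(replaySkippingUnassigned(translog, UNASSIGNED_SEQ_NO)); // prints [c, d]
        System.out.println(replayFileBasedAware(translog, UNASSIGNED_SEQ_NO));     // prints [a, b, c, d]
        // Sequence-number-based recovery starting at 1.
        System.out.println(replayFileBasedAware(translog, 1L));                    // prints [d]
    }
}
```

In this model the first call loses ops `a` and `b`, which is the kind of loss described above. The second variant replays everything for a file-based recovery and still filters when a real starting sequence number is supplied.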
bleskes
left a comment
This LGTM. It would be great to have tests added in RecoverySourceHandlerTests.
assertOK(response);
final InputStream content = response.getEntity().getContent();
final int actualCount =
    Integer.parseInt(XContentHelper.convertToMap(JsonXContent.jsonXContent, content, false).get("count").toString());
Any chance to use something like ObjectPath.evaluate(shard, "seq_no.local_checkpoint")?
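To illustrate the idea behind this suggestion, here is a rough sketch of evaluating a dotted path such as `seq_no.local_checkpoint` against a parsed response instead of chaining `get(...)` and `toString()` calls. This is not the Elasticsearch `ObjectPath` implementation: the class `DottedPathSketch`, its `evaluate` method, and the sample values are hypothetical, and the sketch handles nested maps only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that mimics the idea of a dotted-path lookup.
class DottedPathSketch {
    // Walks nested maps one path segment at a time; returns null if a
    // segment is missing or an intermediate value is not a map.
    static Object evaluate(Map<String, Object> root, String path) {
        Object current = root;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    // Made-up shard stats, shaped only for this example.
    static Map<String, Object> sampleShard() {
        Map<String, Object> seqNo = new HashMap<>();
        seqNo.put("local_checkpoint", 41);
        seqNo.put("global_checkpoint", 40);
        Map<String, Object> shard = new HashMap<>();
        shard.put("seq_no", seqNo);
        return shard;
    }

    public static void main(String[] args) {
        Map<String, Object> shard = sampleShard();
        System.out.println(evaluate(shard, "seq_no.local_checkpoint")); // prints 41
        System.out.println(evaluate(shard, "seq_no.missing"));          // prints null
    }
}
```

A single path string keeps the test assertion short and makes it obvious which field is being read.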
I pushed 4430caa.
logger.info("allowing shards on all nodes");
updateIndexSetting(index, Settings.builder().putNull("index.routing.allocation.include._name"));
ensureGreen();
Shall we assert the counts here too?
I pushed 0112f90.
4a01087 to bb8884a
Thanks @bleskes. I've pushed commits in response to your comments.
bleskes
left a comment
awesome
test this please
Thanks @bleskes.
Relates #22484