Force flush in FullClusterRestartIT#testRecovery #46956

dnhatn · 2019-09-21T18:42:27Z

If peer recovery happens after indexing, and indexing flushes some shards at the end, then the explicit flush in the test will be a no-op. Replicas will then have some uncommitted translog operations, which are transferred during peer recovery even though all of these operations are already in the commit. If such a replica becomes primary (after we restart the cluster), it will have translog operations to replay and the test will fail. I can reproduce this failure in 0ced108.
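The no-op behavior can be pictured with a toy model. This is a minimal sketch, not Elasticsearch code: `FlushModel` and its fields are hypothetical. It only encodes the documented flush semantics, where a flush without `force` commits nothing if there are no uncommitted changes and a forced flush always creates a new commit. It does not model the translog transfer during peer recovery.

```java
/**
 * Toy model (not Elasticsearch code) of a shard's commit bookkeeping,
 * showing why an explicit flush right after an automatic one does nothing.
 */
class FlushModel {
    /** Highest sequence number indexed so far (-1 means none). */
    long maxSeqNo = -1;
    /** Highest sequence number covered by the last commit. */
    long committedSeqNo = -1;
    /** Number of commits created so far. */
    int commitCount = 0;

    /** Indexes one operation and assigns it the next sequence number. */
    void index() {
        maxSeqNo++;
    }

    boolean hasUncommittedChanges() {
        return maxSeqNo > committedSeqNo;
    }

    /**
     * Without force, only commits when there are uncommitted changes;
     * with force, always creates a new commit. Returns true if a commit was created.
     */
    boolean flush(boolean force) {
        if (!force && !hasUncommittedChanges()) {
            return false; // nothing to commit: the flush is a no-op
        }
        committedSeqNo = maxSeqNo;
        commitCount++;
        return true;
    }

    public static void main(String[] args) {
        FlushModel shard = new FlushModel();
        for (int i = 0; i < 10; i++) {
            shard.index();
        }
        shard.flush(false); // simulates the flush that indexing happens to trigger at the end
        System.out.println("explicit flush committed: " + shard.flush(false)); // false
        System.out.println("forced flush committed: " + shard.flush(true));   // true
        System.out.println("commits: " + shard.commitCount);                  // 2
    }
}
```

In this model the unforced explicit flush returns `false` because the automatic flush already committed everything, which is the situation the PR title addresses by forcing the flush.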

Another issue in this test is that synced_flush is not a replication action, so the global checkpoint on replicas might not be up to date. We need to either wait for the global checkpoint to be synced or call a replication action to sync it.
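The global checkpoint lag can be sketched the same way. This is again a toy model under a simplified protocol, not Elasticsearch code: the primary computes the global checkpoint as the minimum local checkpoint of its in-sync copies, and a replica only learns that value when it is piggybacked on a later replication request or an explicit sync. `GlobalCheckpointModel` and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (not Elasticsearch code): replicas learn the global checkpoint
 * only when the primary sends it along with a replication request or a sync.
 */
class GlobalCheckpointModel {
    /** Local checkpoint of each in-sync copy, as tracked by the primary. */
    private final Map<String, Long> localCheckpoints = new HashMap<>();
    /** Global checkpoint each replica has been told about. */
    private final Map<String, Long> replicaGlobalCheckpoints = new HashMap<>();

    GlobalCheckpointModel(String... replicas) {
        localCheckpoints.put("primary", -1L);
        for (String r : replicas) {
            localCheckpoints.put(r, -1L);
            replicaGlobalCheckpoints.put(r, -1L);
        }
    }

    /** Minimum local checkpoint over all in-sync copies. */
    long primaryGlobalCheckpoint() {
        return Collections.min(localCheckpoints.values());
    }

    long replicaGlobalCheckpoint(String replica) {
        return replicaGlobalCheckpoints.get(replica);
    }

    /** Replicates one operation; the request carries the global checkpoint known before it is acknowledged. */
    void replicate(long seqNo) {
        long carried = primaryGlobalCheckpoint();
        localCheckpoints.replaceAll((copy, cp) -> seqNo);
        replicaGlobalCheckpoints.replaceAll((r, known) -> Math.max(known, carried));
    }

    /** A later replication action or explicit sync brings replicas up to date. */
    void syncGlobalCheckpoint() {
        long gcp = primaryGlobalCheckpoint();
        replicaGlobalCheckpoints.replaceAll((r, known) -> Math.max(known, gcp));
    }

    public static void main(String[] args) {
        GlobalCheckpointModel shard = new GlobalCheckpointModel("replica-1");
        for (long seqNo = 0; seqNo < 5; seqNo++) {
            shard.replicate(seqNo);
        }
        // A non-replicated action sends nothing to replicas, so the replica still lags here.
        System.out.println(shard.primaryGlobalCheckpoint() + " vs " + shard.replicaGlobalCheckpoint("replica-1")); // 4 vs 3
        shard.syncGlobalCheckpoint();
        System.out.println(shard.primaryGlobalCheckpoint() + " vs " + shard.replicaGlobalCheckpoint("replica-1")); // 4 vs 4
    }
}
```

Because each request carries the value computed before the replicas acknowledge it, the replica ends one operation behind; an action that is not replicated never closes that gap, while a sync does.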

Closes #46712

elasticmachine · 2019-09-21T18:42:29Z

Pinging @elastic/es-distributed

ywelsch

LGTM

dnhatn · 2019-09-22T20:54:48Z

Thanks @ywelsch.

The pattern in the latest failure is similar to the one fixed in #46956 but relates to synced flush. If peer recovery happens after indexing, and indexing flushes some shards at the end, then a synced flush in the test will not roll or commit the translog. Closes #46712

Force flush in FullClusterRestartIT#testRecovery

e0e33f4

dnhatn added >test :Distributed Indexing/Recovery v8.0.0 v7.5.0 v6.8.4 v7.4.1 v7.3.3 labels Sep 21, 2019

ywelsch approved these changes Sep 22, 2019

dnhatn merged commit 38277fd into elastic:master Sep 22, 2019

dnhatn deleted the fix-recovery-test branch September 22, 2019 20:55

dnhatn added the backport pending label Sep 22, 2019

dnhatn removed the backport pending label Sep 24, 2019

colings86 added v7.4.0 and removed v7.4.1 labels Sep 25, 2019

dnhatn mentioned this pull request Oct 2, 2019

Always flush in FullClusterRestartIT#testRecovery #47465

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

5 participants
