Skip to content

Conversation

@jasontedor
Copy link
Member

Today we do not fail a replica shard if the primary-replica resync to that replica fails. Yet, we should at least log the failure messages. This commit causes this to be the case.

Relates #24841, relates #27418

Today we do not fail a replica shard if the primary-replica resync to
that replica fails. Yet, we should at least log the failure
messages. This commit causes this to be the case.
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's a good idea to do this. For example, when shutting down a cluster (full cluster restart), this might result in these warnings being logged, which could look alarming to users even though there is no reason to be alarmed. The primary-replica resync being best-effort at the moment anyhow, what advantage can the user gain from seeing these warnings? If it's for debugging purposes, I'm fine logging this as debug here.

@jasontedor
Copy link
Member Author

@ywelsch I disagree; today we log and only log primary-replica resync completed with {} operations and that is misleading. I do not want to throw out having these error messages in the logs when there is a genuine failure (the baby) solely to not have them in the logs when this occurs during shutdown (the bathwater, which will occur rarely).

@ywelsch
Copy link
Contributor

ywelsch commented Nov 17, 2017

I agree that the current log message is misleading. I'm not sure though that we should log every replication failure on a primary-replica resync, though. I'll reach out to discuss.

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if log-level is changed to info

@jasontedor jasontedor merged commit da11515 into elastic:master Nov 17, 2017
@jasontedor
Copy link
Member Author

I discussed this with @ywelsch and we agreed that info logging is acceptable here.

@jasontedor jasontedor deleted the log-resync-failures branch November 17, 2017 18:34
jasontedor added a commit that referenced this pull request Nov 17, 2017
Today we do not fail a replica shard if the primary-replica resync to
that replica fails. Yet, we should at least log the failure
messages. This commit causes this to be the case.

Relates #27421
jasontedor added a commit that referenced this pull request Nov 17, 2017
Today we do not fail a replica shard if the primary-replica resync to
that replica fails. Yet, we should at least log the failure
messages. This commit causes this to be the case.

Relates #27421
@clintongormley clintongormley added :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Sequence IDs labels Feb 14, 2018
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v6.0.1 v6.1.0 v7.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants