Log primary-replica resync failures #27421
Today we do not fail a replica shard if the primary-replica resync to that replica fails. Yet, we should at least log the failure messages. This commit causes this to be the case.
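For illustration only, the behavior described above (log the resync failure, leave the replica shard alone) could be sketched roughly as follows. This is a minimal standalone sketch using `java.util.logging`, not the code from this PR: `ResyncFailureLogging`, `ReplicaFailure`, and `logResyncFailures` are hypothetical names, and the log level is left as a parameter because the level is what the review below debates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: none of these names come from the Elasticsearch code base.
final class ResyncFailureLogging {
    private static final Logger LOGGER =
            Logger.getLogger(ResyncFailureLogging.class.getName());

    /** One failed replica response from a resync round (hypothetical type). */
    static final class ReplicaFailure {
        final String nodeId;
        final Exception cause;

        ReplicaFailure(String nodeId, Exception cause) {
            this.nodeId = nodeId;
            this.cause = cause;
        }
    }

    /**
     * Logs every resync failure at the given level and returns the logged
     * messages. It deliberately does nothing else: the replica shard is not
     * failed, since the resync is treated as best-effort.
     */
    static List<String> logResyncFailures(String shardId,
                                          List<ReplicaFailure> failures,
                                          Level level) {
        List<String> messages = new ArrayList<>();
        for (ReplicaFailure failure : failures) {
            String message = String.format(
                    "%s primary-replica resync to replica on node [%s] failed",
                    shardId, failure.nodeId);
            // The exception is attached so the log record carries the cause.
            LOGGER.log(level, message, failure.cause);
            messages.add(message);
        }
        return messages;
    }

    public static void main(String[] args) {
        List<ReplicaFailure> failures = Arrays.asList(
                new ReplicaFailure("node-1", new IllegalStateException("shard closed")));
        // The level is an assumption here; pass whichever level is chosen.
        logResyncFailures("[index][0]", failures, Level.INFO);
    }
}
```

Keeping the level as an argument means the same call site works whether the failures end up at debug, info, or warn.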
dakrone
left a comment
LGTM
ywelsch
left a comment
I'm not sure it's a good idea to do this. For example, when shutting down a cluster (full cluster restart), this might result in these warnings being logged, which could look alarming to users even though there is no reason to be alarmed. Since the primary-replica resync is best-effort at the moment anyway, what advantage can the user gain from seeing these warnings? If it's for debugging purposes, I'm fine with logging this at debug level here.
@ywelsch I disagree; today we log and only log

I agree that the current log message is misleading. I'm not sure, though, that we should log every replication failure on a primary-replica resync. I'll reach out to discuss.
ywelsch
left a comment
LGTM if the log level is changed to info
I discussed this with @ywelsch and we agreed that
Today we do not fail a replica shard if the primary-replica resync to that replica fails. Yet, we should at least log the failure messages. This commit causes this to be the case. Relates #27421
Today we do not fail a replica shard if the primary-replica resync to that replica fails. Yet, we should at least log the failure messages. This commit causes this to be the case.
Relates #24841, relates #27418