Remove parent-task bans on channels disconnect #66066
Pinging @elastic/es-distributed (Team:Distributed)
DaveCTurner
left a comment
I've done a first pass and think there's a missing bit of synchronisation; I left a few other comments too.
    private static class Ban {
        final String reason;
        final boolean perChannel; // TODO: Remove this in 8.0
        final Set<ChannelPendingTaskTracker> channels;
The mismatch between tracking the Transport.Connection on the parent node and the individual TcpChannel on this node is tricksy. It's all fine, I think: neither can remain open forever if the other one closes. I'm just noting for posterity that this isn't obvious and took a bit of thought.
@DaveCTurner Thanks for looking. I have addressed your comments.
DaveCTurner
left a comment
I left one further comment, otherwise this LGTM I think
    - synchronized (banedParents) {
    + synchronized (bannedParents) {
          lastDiscoveryNodes = event.state().getNodes();
          // Remove all bans that were registered by nodes that are no longer in the cluster state
I think we can drop this now and rely entirely on the channel-closing trigger. I don't think this logic ever completely worked: we could receive a task from a departing node after it had left the cluster but while it was still connected to us. Moreover, it might rejoin the cluster without disconnecting from us, and then we'd keep on processing the task it thought it had cancelled.
(it's still needed for BWC, it's better than nothing, but should only apply to bans that aren't tracking their own channels)
Yes, this applies to bans that are not tracked by channels (see line 512 where we check ban.getValue().perChannel == false).
I will remove this in 8.0 once we have backported this change to 7.x.
Ah yes, you're right again.
DaveCTurner
left a comment
LGTM
@DaveCTurner Thanks so much for your thoughtful reviews on the task cancellation issues. I appreciate that :).
Like #56620, this change relies on channel disconnect instead of node leave events to remove parent-task ban markers.
Relates #65443
Relates #56620
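
For readers skimming this later, here is a minimal sketch of the idea discussed above: per-channel bans are removed when the last tracked channel closes, while the node-leave cleanup only touches bans that are not tracking channels (the BWC case). This is not the code in this PR. `BanTrackerSketch`, `FakeChannel`, `setBan`, `setLegacyBan`, `onNodesChanged` and `isBanned` are hypothetical names made up for illustration; only `bannedParents`, `Ban`, `reason`, `perChannel` and `channels` echo identifiers visible in the diff.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; these are not Elasticsearch's actual classes or signatures. */
final class BanTrackerSketch {

    /** Stand-in for a transport channel that notifies listeners once when it closes. */
    static final class FakeChannel {
        private final List<Runnable> closeListeners = new ArrayList<>();
        private boolean closed;

        void addCloseListener(Runnable listener) {
            synchronized (this) {
                if (closed == false) {
                    closeListeners.add(listener);
                    return;
                }
            }
            listener.run(); // already closed: notify immediately, outside the lock
        }

        void close() {
            final List<Runnable> toRun;
            synchronized (this) {
                if (closed) {
                    return;
                }
                closed = true;
                toRun = new ArrayList<>(closeListeners);
                closeListeners.clear();
            }
            toRun.forEach(Runnable::run);
        }
    }

    private static final class Ban {
        final String parentNodeId;
        final String reason;
        final boolean perChannel; // false only for the assumed legacy (BWC) path
        final Set<FakeChannel> channels = new HashSet<>();

        Ban(String parentNodeId, String reason, boolean perChannel) {
            this.parentNodeId = parentNodeId;
            this.reason = reason;
            this.perChannel = perChannel;
        }
    }

    private final Map<String, Ban> bannedParents = new HashMap<>();

    /** Registers a ban tied to the channel on which the ban request arrived. */
    void setBan(String parentTaskId, String parentNodeId, String reason, FakeChannel channel) {
        synchronized (bannedParents) {
            bannedParents.computeIfAbsent(parentTaskId, k -> new Ban(parentNodeId, reason, true)).channels.add(channel);
        }
        channel.addCloseListener(() -> {
            synchronized (bannedParents) {
                Ban ban = bannedParents.get(parentTaskId);
                if (ban != null && ban.perChannel && ban.channels.remove(channel) && ban.channels.isEmpty()) {
                    bannedParents.remove(parentTaskId); // last channel gone: drop the ban
                }
            }
        });
    }

    /** Registers a ban that tracks no channel, as assumed for requests from older nodes. */
    void setLegacyBan(String parentTaskId, String parentNodeId, String reason) {
        synchronized (bannedParents) {
            bannedParents.putIfAbsent(parentTaskId, new Ban(parentNodeId, reason, false));
        }
    }

    /** Node-leave cleanup: only applies to bans that are not tracking their own channels. */
    void onNodesChanged(Set<String> liveNodeIds) {
        synchronized (bannedParents) {
            bannedParents.values().removeIf(ban -> ban.perChannel == false && liveNodeIds.contains(ban.parentNodeId) == false);
        }
    }

    boolean isBanned(String parentTaskId) {
        synchronized (bannedParents) {
            return bannedParents.containsKey(parentTaskId);
        }
    }

    public static void main(String[] args) {
        BanTrackerSketch tracker = new BanTrackerSketch();
        FakeChannel channel = new FakeChannel();
        tracker.setBan("node1:42", "node1", "parent cancelled", channel);
        System.out.println("banned while channel open: " + tracker.isBanned("node1:42"));
        channel.close();
        System.out.println("banned after channel close: " + tracker.isBanned("node1:42"));
    }
}
```

In this sketch a per-channel ban survives the parent node leaving the cluster state and only goes away when every channel that registered it has closed, which matches the behaviour argued for in the review thread.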