-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Make MasterService.patchVersions not Rebuild the Full CS #79860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make MasterService.patchVersions not Rebuild the Full CS #79860
Conversation
This makes the method really just patch the version via a cheap copy constructor. Moreover, it makes the cluster state builder smarter when it comes to updating the routing nodes so they aren't rebuilt so often as well.
|
Pinging @elastic/es-distributed (Team:Distributed) |
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asking for an assertion; could we also have some tests that we are preserving instance equality in the places on which we're relying for the efficiency gains here?
| this.blocks = blocks; | ||
| this.customs = customs; | ||
| this.wasReadFromDiff = wasReadFromDiff; | ||
| this.routingNodes = routingNodes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assert that this is the routingNodes we would have got from a fresh rebuild here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not easily. Neither RoutingNodes nor ShardRouting have any equals methods.
I could try to add those here, but that would make it a much larger change.
For now it's also not entirely trivial to test the gains made here in isolation I think. I was thinking of just taking this chunk in isolation, then doing a follow-up with the remaining chunks that eliminate the remainder of the unnecessary routing node rebuilds from the big PR and just adding an instance equality assertion in ClusterChangedEvent that makes sure that if the routing table doesn't change, then the nodes instance didn't change.
This change seemed safe enough to not require additional tests in isolation (to me that is:)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ehh I really don't like having this invariant not be enforced. This is a public constructor, there's a risk that a future caller gets this wrong in future. They might just pass in a RoutingNodes that they happen to have on hand (especially since the argument isn't marked as @Nullable).
Instance equality on ShardRouting should be enough right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instance equality on ShardRouting should be enough right?
Right ... I think. You convinced me :) I'll do it right. Working out a proper equals now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, added the assertion and all the necessary equals methods I did exploit some obvious invariants to not have to compare all fields, but I erred on the side of caution here and there might be possible optimizations to these methods, but since they're only used for the assertion it's probably irrelevant.
As for ShardRouting, we unfortunately needed an equals there as well because of the case where we re-create the instance with a null DiscoveryNode when building RoutingNodes (hence the fix to its toString here).
Seems to work fine now though :)
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Thanks David! |
This makes the method really just patch the version via a cheap copy constructor. Moreover, it makes the cluster state builder smarter when it comes to updating the routing nodes so they aren't rebuilt so often as well.
…9900) This makes the method really just patch the version via a cheap copy constructor. Moreover, it makes the cluster state builder smarter when it comes to updating the routing nodes so they aren't rebuilt so often as well.
This makes the method really just patch the version via a cheap
copy constructor. Moreover, it makes the cluster state builder smarter
when it comes to updating the routing nodes so they aren't rebuilt
so often as well.
relates #77466
extracted from #79692 when I benchmarked this to show a pretty hefty speedup by saving lots of routing nodes rebuilds.