-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Integrate Node Shutdown API with cluster metadata #71162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This commit hooks up the Node Shutdown API to the Node Shutdown cluster metadata, so using the API will result in the appropriate writes to the cluster state.
|
Pinging @elastic/es-core-infra (Team:Core/Infra) |
dakrone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I left three comments, but nothing major!
| * @return All {@link SingleNodeShutdownMetadata}s that currently exist. | ||
| */ | ||
| public List<SingleNodeShutdownMetadata> getAllNodeMetadata() { | ||
| return new ArrayList<>(nodes.values()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not return the Map here rather than making a list from the values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did that at one point, but there's not much additional value in exposing the map I think. If we need to do so, it's easy enough to change.
I don't care strongly about this one way or the other, should I change it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd prefer the Map because I think it'll be a bit easier when testing since we can do get-by-id for nodes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I'll remove the getNodeMetadata(nodeId) method above then and replace it with map access as well.
| // Verify that the requested node actually exists in the cluster | ||
| if (state.nodes().nodeExists(request.getNodeId()) == false) { | ||
| throw new IllegalArgumentException("there is no node with id [" + request.getNodeId() + "] in this cluster"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should have this constraint, for example, if you need to add a shutdown for a node that is having connectivity issues, we should still allow marking it as shut down so that data can be migrated, rather than playing whack-a-mole waiting for it to be part of the cluster so that it can be marked as shutting down.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I wasn't thinking about that when I wrote this. It would be nice to have some kind of validation, so that if you completely fumble the node ID it doesn't just silently accept that.
Looking around at the code though, it doesn't look like there's much (any?) validation of node IDs - a lot of tests just use arbitrary strings, so maybe there's nothing we can really do here (besides maybe a length limit?).
| // Verify that there's not already a shutdown metadata for this node | ||
| if (Objects.nonNull(currentShutdownMetadata.getNodeMetadata(request.getNodeId()))) { | ||
| throw new IllegalArgumentException("node [" + request.getNodeId() + "] is already shutting down"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we allow a change in type or reason for a shutdown? I think it'd be nice to be able to change a restart type to removal if we decide that we wanted to remove the node after all.
For now though, we can keep this and maybe think about revisiting it in the future, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we've discussed being able to change the type, at least in one direction. That's a pretty easy follow-up though, so I think we should move forward with this and add that functionality when we need it.
I opened #71305 to keep track.
This commit hooks up the Node Shutdown API to the Node Shutdown cluster metadata, so using the API will result in the appropriate writes to the cluster state.
This commit hooks up the Node Shutdown API to the Node Shutdown cluster metadata, so using the API will result in the appropriate writes to the cluster state.
This commit hooks up the Node Shutdown API to the Node Shutdown cluster
metadata, so using the API will result in the appropriate writes to the
cluster state.
Relates #70338