-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Adding a view of master history #85941
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding a view of master history #85941
Conversation
|
Pinging @elastic/es-data-management (Team:Data Management) |
|
Hi @masseyke, I've created a changelog YAML for you. |
|
@elasticmachine update branch |
|
This is part of #85624 |
…i-master-stability-check
andreidan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this Keith
I left a first round of suggestions/comments
| super(NAME, MasterHistoryAction.Response::new); | ||
| } | ||
|
|
||
| public static class Request extends MasterNodeReadRequest<MasterHistoryAction.Request> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this could be a ActionRequest and receive a DiscoveryNode as a parameter (just to be sure we're getting the history from the node we want - if there's a master change in the meantime we might be getting the history from another node by the time we submit this as there's a new master) https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/searchable-snapshots/src/main/java/org/elasticsearch/xpack/searchablesnapshots/action/cache/FrozenCacheInfoNodeAction.java is a good example
cc @DaveCTurner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More generally I'd suggest adding the code that uses this machinery to this same PR. I think there will be other opportunities to simplify things and perhaps change the message flow but it's hard to see what's needed without the calling code too.
(Acking that this'll make the PR bigger, and we might want to split it up again later, but for now I'd rather see everything)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Acking that this'll make the PR bigger, and we might want to split it up again later, but for now I'd rather see everything)
Ha yeah originally the calling code was in this PR and I pulled it out. I'll put it back in for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added StableMasterHealthIndicatorService.java to show the possible usage of MasterHistoryService. I'll delete it from this PR before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this could be a
ActionRequest
Oops I had originally incorrectly had this as a TransportMasterNodeReadAction and forgot to fix the Request when I fixed the rest. Fixing that now.
and receive a
DiscoveryNodeas a parameter
I don't think I'm following you here. You mean as an argument to its constructor rather than calling remoteAddress() directly? Or something else?
| */ | ||
| public class MasterHistory implements ClusterStateListener, Writeable, Writeable.Reader<MasterHistory> { | ||
| private List<TimeAndMaster> masterHistory; | ||
| Supplier<Long> nowSupplier = System::currentTimeMillis; // Can be changed for testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're working with "things that happened since / in the last X minutes/seconds" we should use nanoTime here as it's monotonically increasing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend ThreadPool#relativeTimeInMillis which is monotonic but easily faked in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think what I did isn't going to work at all the way I've done it if we don't have somewhat-in-sync times across boxes. The reason is the hasSeenMasterInLastNSeconds method. If the times are all only meaningful on the current machine then there's no way to know if we've seen a master in the last 30 seconds on a remote machine, unless we calculate that before we send it over the wire. Might need to rethink this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to have each of these methods be its own remote call rather than returning a MasterHistory object? I'm stuck on how to answer hasSeenMasterInLastNSeconds without it being its own call. And even then, it could easily take more than 30 seconds to respond, so the answer would be void before we even got it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we don't have somewhat-in-sync times across boxes
This is an invariant we have to work with (without specialised hardware we can't do much about it)
I don't completely understand why you'd need to measure and compare time across machines though? Can you maybe explain that?
Machine 1 - computes its local history (ie. masters seen in last X seconds in relation to an origin chosen on machine 1)
Machine 2 - computes its own local history (ie. masters seen in the last X seconds in relation to an origin chosen on machine 2)
Machine 1 might ask Machine 2 about its history, in which case Machine 2 will reply with the history it has in its local "cache" for the last X seconds, according to the origin chosen on Machine 2 (no comparison to Machine 1 time/origin). Machine 1 will receive this history and if it contains repeated changes or master going null/not-null Machine 1 will be able to conclude that Machine 2 is an unstable master (again, Master 1 will work with the entire history it received from Machine 2, without comparing the times to its local seen masters, nor truncating/purging it in any way).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is that in machine 2's history there is no notion of "now". There are just a few events with timestamps that are meaningless to machine 1. Machine 1 can't tell if the newest even was 2 seconds or 2 years ago. I'll change MasterHistory to ship some notion of "now" so that i can answer questions like "has there been a master in the last 30 seconds". I won't need to expose what "now" is -- it'll just be some anchor to answer relative questions like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is that in machine 2's history there is no notion of "now".
There is, but it's relative (and local) to Machine 2.
Machine 1 can't tell if the newest even was 2 seconds or 2 years ago.
Maybe I'm missing something but each machine will only hold 30 minutes' worth of history AFAIK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is, but it's relative (and local) to Machine 2.
Right, but we're talking about Machine 2's history on Machine 1.
Maybe I'm missing something but each machine will only hold 30 minutes' worth of history AFAIK
Yeah it's not an issue for the questions about what happened in the last 30 minutes. It's an issue for the question about what happened on Machine 2 in the last 30 seconds (as seen on machine 1). That's part of the MasterHistory API, but we don't actually have any use case where we need to answer that for a remote machine. Maybe I'll break out an interface for a LocalMasterHistory vs a RemoteMasterHhistory, and only have that method in the Local one. I could also only have the local one be a ClusterChangeEventListener, which would simplify things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I broke it into a MutableMasterHistory that is updated live on the local node, and an ImmutableMasterHistory that is never updated but that can be passed from one node to another.
server/src/main/java/org/elasticsearch/cluster/coordination/MasterHistory.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/coordination/MasterHistory.java
Outdated
Show resolved
Hide resolved
| if (currentMaster == null || currentMaster.equals(previousMaster) == false || masterHistory.isEmpty()) { | ||
| masterHistory.add(new TimeAndMaster(nowSupplier.get(), currentMaster)); | ||
| } | ||
| removeOldMasterHistory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is called from the cluster change notified thread and potentially from a public method (ie. getMostRecentNonNullMaster) this poses a multi-thread safety issue.
We either choose a thread-safe collection to store the history (ie. CopyOnWriteArrayList ) or we just purge the old history in the clusterChanged call (ie. have one writer against the history List).
I believe the latter is a good candidate here. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I forgot to come back to that. If we only purge in clusterChanged I'll need to change the read methods to ignore anything old, since we could probably go a very long time between a cluster changed event. But that would work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wound up protecting write access from multiple threads because I thought it would be easier to read the code (vs another check for all reads to ignore anything older than 30 minutes). Let me know what you think.
…eptions encountered while fetching remote history
|
@elasticmachine update branch |
…/elasticsearch into feature/health-api-master-check
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (except for the indicator service which we'll work on in a separate PR).
.../test/java/org/elasticsearch/action/admin/cluster/coordination/MasterHistoryActionTests.java
Show resolved
Hide resolved
andreidan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (no indicator though :) )
Thanks for iterating on this Keith ! 🚀
| * more than 30 minutes old). Rather than being scheduled, this method is called whenever the cluster state changes. | ||
| */ | ||
| private void removeOldMasterHistory(List<TimeAndMaster> newMasterHistory) { | ||
| if (newMasterHistory.size() < 2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be ok to document this locally? It's not immediately obvious IMO
server/src/main/java/org/elasticsearch/cluster/coordination/MasterHistory.java
Outdated
Show resolved
Hide resolved
| Collections.reverse(masterHistoryCopy); | ||
| for (TimeAndMaster timeAndMaster : masterHistoryCopy) { | ||
| if (timeAndMaster.master != null) { | ||
| return timeAndMaster.master; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would iterating backwards avoid some cpu cycles and allocations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha it used to be that way. See #85941 (comment).
| /** | ||
| * An identity change is when we get notified of a change to a non-null master that is different from the previous non-null master. | ||
| * Note that a master changes to null on (virtually) every identity change. | ||
| * So for example: | ||
| * node1 -> node2 is 1 identity change | ||
| * node1 -> node2 -> node1 is 2 identity changes | ||
| * node1 -> node2 -> node2 is 1 identity change (transitions from a node to itself do not count) | ||
| * node1 -> null -> node1 is 0 identity changes (transitions from a node to itself, even with null in the middle, do not count) | ||
| * node1 -> null -> node2 is 1 identity change | ||
| * @param masterHistory The list of nodes that have been master | ||
| * @return The number of master identity changes as defined above | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YES! This is really nice ❤️ ! Thanks for these docs Keith
| * @param n The number of seconds to look back | ||
| * @return true if the current master is non-null or if a non-null master was seen in the last n seconds | ||
| */ | ||
| public boolean hasSeenMasterInLastNSeconds(int n) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we indicate the time value here ?
| public boolean hasSeenMasterInLastNSeconds(int n) { | |
| public boolean hasSeenMasterInLastSeconds(int numberOfSeconds) { |
| * updated unless this method is called again. | ||
| * @param node The node whose view of the master history we want to fetch | ||
| */ | ||
| public void requestRemoteMasterHistory(DiscoveryNode node) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
What I was proposing before was something similar to what we do in the client https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/client/internal/support/AbstractClient.java#L399
e.g.
public void requestRemoteMasterHistory(DiscoveryNode node, ActionListener<MasterHistoryAction.Response>)
I'm happy with the naming suggestion though. Thanks for iterating on this !
| @Override | ||
| public void onResponse(MasterHistoryAction.Response response) { | ||
| long endTime = System.nanoTime(); | ||
| logger.trace("Received history from {} in {}", node, TimeValue.timeValueNanos(endTime - startTime)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| logger.trace("Received history from {} in {}", node, TimeValue.timeValueNanos(endTime - startTime)); | |
| logger.trace("Received history from {} in {} nanos", node, TimeValue.timeValueNanos(endTime - startTime)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TimeValue.timeValueNanos() returns a TimeValue, and TimeValue's toString puts the units into the string. So no need to add " nanos" a second time.
| ConnectionProfile.buildDefaultConnectionProfile(clusterService.getSettings()), | ||
| new ActionListener<>() { | ||
| @Override | ||
| public void onResponse(Transport.Connection connection) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very nice - TIL
This commit adds the notion of an in-memory MasterHistory of which nodes have been master for the last 30 minutes that is maintained in memory on each node. It is exposed via the MasterHistoryService. This commit also has a transport action so that you can fetch the master history from any node in the cluster, represented as a List of NodeClients. The list is an ordered list of nodes that have been seen as master for the last 30 minutes, with the oldest first. This action is used by the MasterHistoryService exposed for use via the MasterHistoryService. The local and remote master history representations will be used to determine if the master has been stable as part of the health API.