-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Make SnapshotsInProgress Diffable #89619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make SnapshotsInProgress Diffable #89619
Conversation
server/src/test/java/org/elasticsearch/snapshots/SnapshotsInProgressSerializationTests.java
Show resolved
Hide resolved
| */ | ||
| private record ByRepoDiff( | ||
| DiffableUtils.MapDiff<String, Entry, Map<String, Entry>> diffBySnapshotUUID, | ||
| DiffableUtils.MapDiff<String, Integer, Map<String, Integer>> positionDiff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the nicest solution ever to diffing a list that can see both entries change and move about in the list but the best I could come up with without refactoring things in this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, could we do that prep work first and avoid having to do this? If not, could we at least sprinkle a bunch of assertions around here to validate our assumptions about the diff (no gaps, no duplicates, etc)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a big refactoring that avoids the current structure of a map of lists will be hard to pull off in the short-term. It's a massive change-set that would be required here and I'd much rather do it in increments.
The beauty of the code here is that at least for the positions we need no such assertions since List.of(...) will be unforgiving to gaps. So if we take the size of the array from the uuid->snaphot map and then use the positions for the iteration we can't really run into trouble ever can we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - I'll suggest a couple of extra assertions in some following comments.
|
Pinging @elastic/es-distributed (Team:Distributed) |
|
Hi @original-brownbear, I've created a changelog YAML for you. |
|
|
||
| @Override | ||
| public Version getMinimalSupportedVersion() { | ||
| return Version.CURRENT.minimumCompatibilityVersion(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be DIFFABLE_VERSION?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so because we were always allegedly "diffable" but used a non-diff. That's why I had to add the BwC new SimpleDiffable.CompleteDiff<>(after).writeTo(out); below. So in a sense this is just a wire format change from the perspective of the checks on this I think.
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests look ok now, I left a couple of other ideas.
server/src/main/java/org/elasticsearch/cluster/SnapshotsInProgress.java
Outdated
Show resolved
Hide resolved
| */ | ||
| private record ByRepoDiff( | ||
| DiffableUtils.MapDiff<String, Entry, Map<String, Entry>> diffBySnapshotUUID, | ||
| DiffableUtils.MapDiff<String, Integer, Map<String, Integer>> positionDiff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, could we do that prep work first and avoid having to do this? If not, could we at least sprinkle a bunch of assertions around here to validate our assumptions about the diff (no gaps, no duplicates, etc)?
|
@DaveCTurner ping in case you have a second, this would be quite nice to have in for our snapshot benchmarking. It's a surprisingly large speedup :) |
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more small comments
server/src/main/java/org/elasticsearch/cluster/SnapshotsInProgress.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| @SuppressWarnings("unchecked") | ||
| EntryDiff(Entry before, Entry after) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be explicit about the fields that must not change between before and after here, at least asserting that they're the same but maybe also protecting against that kind of bug in production too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I added actual exception throwing in production. If we ever run into a bug here, it's still better to stall everything and (maybe) for a full cluster restart to fix things than to corrupt the repo IMO :)
server/src/test/java/org/elasticsearch/snapshots/SnapshotsInProgressSerializationTests.java
Outdated
Show resolved
Hide resolved
| entries.set(i, mutateEntryWithLegalChange(entry)); | ||
| } | ||
| } | ||
| updatedInstance = updatedInstance.withUpdatedEntriesForRepo(perRepoEntries.get(0).repository(), entries); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also reorder the entries for one or more repositories? Really just to give the ByRepoDiff stuff a proper workout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jup added shuffling the list. Had to change the way we generate the repo name for that. It was a random string per snapshot and we'd effectively never see collisions there. Now we actually have multiple snapshots per repo here and the position reordering is covered by tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 sounds good
| */ | ||
| private record ByRepoDiff( | ||
| DiffableUtils.MapDiff<String, Entry, Map<String, Entry>> diffBySnapshotUUID, | ||
| DiffableUtils.MapDiff<String, Integer, Map<String, Integer>> positionDiff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - I'll suggest a couple of extra assertions in some following comments.
server/src/main/java/org/elasticsearch/cluster/SnapshotsInProgress.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/SnapshotsInProgress.java
Outdated
Show resolved
Hide resolved
|
Thanks @DaveCTurner all points addressed I think :) |
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Thanks David! |
This is very important for #77466. Profiling showed that serializing snapshots-in-progress when there's a few snapshots with high shard count running takes a significant amount of CPU and heap for sending the full data structure over an over.
This PR adds diffing in the simplest way I could think of on top of the existing data structure.
closes #88732