Make RepositoryData Less Memory Heavy #55293
We don't really need a `LinkedHashSet` here. We can assume that all entries are unique, so a plain list is enough, and the list utilities let us create the cheapest possible version of that list. This also fixes a bug in `addSnapshot`, which mutated the existing `LinkedHashSet` held by the current instance (fortunately this never caused a real-world bug), and brings the collection in line with the Javadoc on its getter, which claims immutability.
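A minimal sketch of the copy-on-write shape this description implies. It assumes plain `String` ids as stand-ins for `IndexId`/`SnapshotId` and a hypothetical `IndexSnapshots` class with a `withSnapshot` method; neither is the actual `RepositoryData` code. Each per-index list is immutable, and adding a snapshot builds new lists instead of mutating the ones the current instance holds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the index -> snapshots mapping (not the real RepositoryData).
final class IndexSnapshots {
    private final Map<String, List<String>> indexSnapshots;

    IndexSnapshots(Map<String, List<String>> indexSnapshots) {
        // Defensive copy of the map; the lists themselves are expected to be immutable already.
        this.indexSnapshots = Map.copyOf(indexSnapshots);
    }

    // Returns the stored immutable list, so callers cannot mutate this instance through it.
    List<String> snapshots(String indexId) {
        return indexSnapshots.getOrDefault(indexId, List.of());
    }

    // Copy-on-write: returns a new instance and leaves this one untouched.
    IndexSnapshots withSnapshot(String snapshotId, Set<String> indices) {
        Map<String, List<String>> updated = new HashMap<>(indexSnapshots);
        for (String indexId : indices) {
            List<String> existing = indexSnapshots.getOrDefault(indexId, List.of());
            // Entries are assumed unique, which is what lets a List replace the LinkedHashSet.
            assert !existing.contains(snapshotId);
            List<String> copy = new ArrayList<>(existing);
            copy.add(snapshotId);
            updated.put(indexId, List.copyOf(copy));
        }
        return new IndexSnapshots(updated);
    }
}
```

This is the straightforward version, `new ArrayList<>(...)` followed by `List.copyOf(...)`; the review thread on this PR discusses how many element copies that combination costs.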
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
```diff
-Map<IndexId, Set<SnapshotId>> allIndexSnapshots = new HashMap<>(indexSnapshots);
+Map<IndexId, List<SnapshotId>> allIndexSnapshots = new HashMap<>(indexSnapshots);
 for (final IndexId indexId : shardGenerations.indices()) {
-    allIndexSnapshots.computeIfAbsent(indexId, k -> new LinkedHashSet<>()).add(snapshotId);
```
This was broken: we were mutating the existing `LinkedHashSet`.
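The bug is easy to reproduce in isolation: `new HashMap<>(map)` copies only the map, so the value sets are shared with the original, and `computeIfAbsent(...).add(...)` on an existing key mutates the set the old instance still references. A minimal sketch, assuming `String` ids in place of `IndexId`/`SnapshotId` and a hypothetical `ShallowCopyBug.addSnapshot` helper that only mirrors the shape of the old code:

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class ShallowCopyBug {
    // Mirrors the old pattern: shallow-copy the map, then add to each index's set.
    static Map<String, Set<String>> addSnapshot(Map<String, Set<String>> current,
                                                String snapshotId, Set<String> indices) {
        // Copies the keys and the references to the sets, not the sets themselves.
        Map<String, Set<String>> all = new HashMap<>(current);
        for (String indexId : indices) {
            // For an existing key this returns the very set held by `current` and mutates it.
            all.computeIfAbsent(indexId, k -> new LinkedHashSet<>()).add(snapshotId);
        }
        return all;
    }
}
```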
```java
List<SnapshotId> remaining;
List<SnapshotId> snapshotIds = this.indexSnapshots.get(indexId);
assert snapshotIds != null;
if (snapshotIds.contains(snapshotId)) {
```
Not great that this is now quadratic (a linear `contains` scan inside the nested loop), but I don't think it matters much relative to the significant space and GC savings.
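To make the cost concrete, here is a minimal sketch assuming `String` ids and a hypothetical `removeSnapshot` helper over a plain `Map<String, List<String>>` (not the actual `RepositoryData` method). Each index's list is scanned linearly with `indexOf`, so removing one snapshot costs O(indices × snapshots per index), where a `LinkedHashSet` lookup was O(1) per index. Lists that do not contain the snapshot are reused as-is, and only affected lists are rebuilt.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RemoveSnapshotSketch {
    // Hypothetical helper: removes snapshotId from every index's list.
    // Outer loop over indices, inner linear indexOf scan per list.
    static Map<String, List<String>> removeSnapshot(Map<String, List<String>> indexSnapshots,
                                                    String snapshotId) {
        Map<String, List<String>> result = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : indexSnapshots.entrySet()) {
            List<String> snapshotIds = entry.getValue();
            int listIndex = snapshotIds.indexOf(snapshotId); // O(n), unlike a hash-set lookup
            if (listIndex < 0) {
                result.put(entry.getKey(), snapshotIds); // unchanged lists are shared, not copied
                continue;
            }
            // Presize to the final length so the ArrayList never has to grow.
            List<String> remaining = new ArrayList<>(snapshotIds.size() - 1);
            remaining.addAll(snapshotIds.subList(0, listIndex));
            remaining.addAll(snapshotIds.subList(listIndex + 1, snapshotIds.size()));
            result.put(entry.getKey(), Collections.unmodifiableList(remaining));
        }
        return result;
    }
}
```

The sketch keeps an index whose list becomes empty; whether such indices are dropped is outside what this thread shows.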
Found this while examining a heap dump from a cluster running into #55153.
ywelsch left a comment
Change looks ok, but there is too much unnecessary list copying going on.
```java
} else {
    final List<SnapshotId> copy = new ArrayList<>(snapshotIds);
    copy.add(snapshotId);
    allIndexSnapshots.put(indexId, List.copyOf(copy));
```
Why create a copy of the copy?
These `RepositoryData` instances live for quite a while, so I figured the cost of another copy is worth the lower storage overhead and the shorter path to the GC root compared to wrapping with `Collections.unmodifiableList`. I could technically make this more efficient by copying into a `SnapshotId[]` and then just wrapping that array, but I figured this wasn't much slower and was nicer to read.
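One possible reading of the array-based alternative, as a minimal sketch: it assumes `String` ids and a hypothetical `appendImmutable` helper. It allocates a single array of the final size, copies the existing ids into it, and wraps it without a further copy. `Arrays.asList` returns a fixed-size view backed by the array, and `Collections.unmodifiableList` adds a thin read-only wrapper on top; that extra wrapper object is the kind of overhead the comment weighs against `List.copyOf`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class AppendSketch {
    // Hypothetical helper: returns an unmodifiable list of `existing` followed by `added`.
    static List<String> appendImmutable(List<String> existing, String added) {
        // The array has one spare slot, so toArray fills it in place and returns it.
        String[] ids = existing.toArray(new String[existing.size() + 1]);
        ids[ids.length - 1] = added;
        // Arrays.asList wraps the array without copying; unmodifiableList blocks set/add/remove.
        return Collections.unmodifiableList(Arrays.asList(ids));
    }
}
```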
I looked into how `List.copyOf` is implemented and, lo and behold, it copies the elements twice: first it calls `Collection.toArray()`, and then `List.of` creates another copy of that temporary array (using a manual for loop, FFS).
This means the list is copied three times here, plus the resize of the `ArrayList` when calling `copy.add(snapshotId)`, which is another full copy...
High-level languages ftw.
:) You win. I pushed 8319e21; probably not worth the hassle to go further than this.
```diff
-set.remove(snapshotId);
+remaining = new ArrayList<>(snapshotIds);
+remaining.remove(listIndex);
+remaining = List.copyOf(remaining);
```
Same thing here: a copy of a copy.
```diff
 }
 assert indexId != null;
-indexSnapshots.put(indexId, snapshotIds);
+indexSnapshots.put(indexId, List.copyOf(snapshotIds));
```
Copy of a copy.
ywelsch left a comment
LGTM
Thanks, Yannick!