Conversation

@original-brownbear (Contributor) commented Apr 16, 2020


We don't really need `LinkedHashSet` here. We can assume that all the
entries are unique, just use a list, and use the list utilities to
create the cheapest possible version of that list.
This also fixes a bug in `addSnapshot` that would mutate the existing
linked hash set on the current instance (fortunately this never caused a real-world bug)
and brings the collection in line with the Javadoc on its getter, which claims immutability.
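The mutation bug and its fix can be illustrated in isolation. A minimal sketch with hypothetical names (`SnapshotHolder` is not the actual `RepositoryData` code), showing the copy-on-write pattern the change moves to:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the copy-on-write pattern: the instance's
// collection is never mutated in place; an update builds a new immutable
// list and returns a new holder object.
class SnapshotHolder {
    private final List<String> snapshotIds;

    SnapshotHolder(List<String> snapshotIds) {
        // List.copyOf produces a compact, truly immutable copy, so the
        // getter below can hand it out without defensive copying.
        this.snapshotIds = List.copyOf(snapshotIds);
    }

    // Returns a NEW holder instead of mutating this one; callers holding
    // a reference to the old instance never observe the change.
    SnapshotHolder addSnapshot(String snapshotId) {
        List<String> copy = new ArrayList<>(snapshotIds);
        copy.add(snapshotId);
        return new SnapshotHolder(copy);
    }

    List<String> getSnapshotIds() {
        return snapshotIds; // immutable, matching what the Javadoc promises
    }
}
```

With the old `computeIfAbsent(...).add(...)` approach, the first line of `addSnapshot` would instead have reached into the current instance's set and mutated it, leaking the update to anyone still holding the old instance.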
@elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

```diff
-Map<IndexId, Set<SnapshotId>> allIndexSnapshots = new HashMap<>(indexSnapshots);
+Map<IndexId, List<SnapshotId>> allIndexSnapshots = new HashMap<>(indexSnapshots);
 for (final IndexId indexId : shardGenerations.indices()) {
-    allIndexSnapshots.computeIfAbsent(indexId, k -> new LinkedHashSet<>()).add(snapshotId);
```
@original-brownbear (Contributor, Author):

This was broken: we were mutating the existing `LinkedHashSet`.

```java
List<SnapshotId> remaining;
List<SnapshotId> snapshotIds = this.indexSnapshots.get(indexId);
assert snapshotIds != null;
if (snapshotIds.contains(snapshotId)) {
```
@original-brownbear (Contributor, Author):

Not great that we're quadratic here now (a linear `contains` scan inside the loop), but I don't think it really matters much relative to the significant space and GC savings.

@original-brownbear (Contributor, Author) commented:

Found this while examining a heap dump for a cluster running into #55153.
For a few thousand indices and ~1k snapshots, the heap was full of uncollected java.util.LinkedHashMap.Entry objects (the sets/maps themselves were mostly already collected, but collecting the entries is apparently trickier for the JVM) under load on the snapshot status APIs, coming from the somewhat long-lived RepositoryData instances. The outright overhead of LinkedHashSet is also massive on its face, at least 44 bytes per element: if the cluster contains 100 live indices at a time and we have 100 snapshots, this easily comes out to roughly half a MB of overhead per instance.
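As a sanity check on that figure, a back-of-envelope calculation (a rough estimate only; the exact per-entry cost depends on the JVM, compressed oops, etc.):

```java
// Rough estimate of LinkedHashSet bookkeeping overhead: each
// java.util.LinkedHashMap.Entry carries an object header plus key, value,
// hash, and before/after/next references, on the order of ~44 bytes on
// top of the element itself.
public class LinkedHashSetOverhead {
    static long overheadBytes(int indices, int snapshotsPerIndex, int bytesPerEntry) {
        return (long) indices * snapshotsPerIndex * bytesPerEntry;
    }

    public static void main(String[] args) {
        // 100 live indices x 100 snapshots x ~44 bytes per entry
        long bytes = overheadBytes(100, 100, 44);
        System.out.println(bytes + " bytes"); // 440000 bytes, ~0.44 MB per instance
    }
}
```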

@ywelsch (Contributor) left a comment:

Change looks ok, but there is too much unnecessary list copying going on.

```java
} else {
    final List<SnapshotId> copy = new ArrayList<>(snapshotIds);
    copy.add(snapshotId);
    allIndexSnapshots.put(indexId, List.copyOf(copy));
```
@ywelsch (Contributor):

why create a copy of the copy?

@original-brownbear (Contributor, Author):

These RepositoryData instances live for quite a while, so I figured the cost of doing another copy is worth the lower storage overhead and shorter path to the GC root compared to wrapping with Collections.unmodifiableList. I could technically make this more efficient by copying to a SnapshotId[] and then just wrapping that array, but I figured this wasn't that much slower and is nicer to read.
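The trade-off being weighed here can be demonstrated directly. A small sketch (not PR code) contrasting wrapping with copying:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Collections.unmodifiableList only wraps the backing list: no copy, but
// the mutable ArrayList stays reachable (a longer path to the GC root)
// and later mutations show through the view. List.copyOf pays for a copy
// up front and yields a compact list with no tie to the original.
public class WrapVersusCopy {
    public static void main(String[] args) {
        List<String> backing = new ArrayList<>(List.of("s1", "s2"));

        List<String> wrapped = Collections.unmodifiableList(backing); // view only
        List<String> copied = List.copyOf(backing);                   // detached copy

        backing.add("s3"); // later mutation of the backing list

        System.out.println(wrapped.size()); // 3: the view tracks the backing list
        System.out.println(copied.size());  // 2: the copy is unaffected
    }
}
```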

@ywelsch (Contributor):

I looked into how List.copyOf is implemented, and lo and behold, it copies the elements twice: it first calls Collection.toArray(), and then creates another copy of that temporary array in List.of (using a manual for loop, FFS).
This means that the list is copied three times here, plus the resize of the ArrayList when calling copy.add(snapshotId), leading to yet another full copy...
High-level languages ftw.
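The single-copy route alluded to earlier (copy once into a `SnapshotId[]`, then just wrap the array) might look like the sketch below; `append` and the use of `String` ids are hypothetical, not code from the PR:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical single-copy append: toArray(T[]) fills the pre-sized
// target array in one pass, and Arrays.asList plus unmodifiableList
// merely wrap it, so the elements are copied exactly once.
public class SingleCopyAppend {
    static List<String> append(List<String> source, String element) {
        String[] grown = new String[source.size() + 1];
        source.toArray(grown);              // the one and only element copy
        grown[grown.length - 1] = element;  // fill the reserved last slot
        return Collections.unmodifiableList(Arrays.asList(grown));
    }
}
```

As the thread concludes below, the extra copies of `List.copyOf` were ultimately judged an acceptable price for readability.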

@original-brownbear (Contributor, Author):

:) you win => I pushed 8319e21. Probably not worth the hassle to go further than this, then.

```diff
-set.remove(snapshotId);
+remaining = new ArrayList<>(snapshotIds);
+remaining.remove(listIndex);
+remaining = List.copyOf(remaining);
```
@ywelsch (Contributor):

same thing here, copy of copy

```diff
 }
 assert indexId != null;
-indexSnapshots.put(indexId, snapshotIds);
+indexSnapshots.put(indexId, List.copyOf(snapshotIds));
```
@ywelsch (Contributor):

copy of copy

@ywelsch (Contributor) left a comment:

LGTM

@original-brownbear original-brownbear merged commit 8fd81df into elastic:master Apr 20, 2020
@original-brownbear original-brownbear deleted the smaller-repository-data branch April 20, 2020 15:23
@original-brownbear (Contributor, Author):

Thanks Yannick!

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Apr 20, 2020
original-brownbear added a commit that referenced this pull request Apr 20, 2020
@original-brownbear original-brownbear restored the smaller-repository-data branch August 6, 2020 18:35