Conversation

@original-brownbear (Contributor) commented on Jan 25, 2021

Enhance the transport request deduplicator to allow for more general deduplication.
Make use of that to deduplicate RepositoryData under concurrent request load
(which so far has been the only situation where RepositoryData has created unmanageable
memory pressure).

relates #66042 (improves loading snapshot info for many snapshots in parallel, as done by the Cloud snapshotter for example)

relates #55153 (pushes back at least a little by creating a bottleneck on the repository data loading step, and saves significant memory by reducing the number of RepositoryData instances on heap during concurrent snapshot get requests)
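
For illustration, a minimal, self-contained sketch of the deduplication idea. The names (`SimpleResultDeduplicator`, `Listener`) are placeholders, not the actual Elasticsearch classes; `Listener` stands in for `ActionListener`, and the real implementation differs in detail (error handling here is simplified):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Placeholder for org.elasticsearch.action.ActionListener.
interface Listener<R> {
    void onResponse(R result);
    void onFailure(Exception e);
}

// Sketch: concurrent calls sharing the same key are collapsed so the expensive
// computation runs once; every waiting listener is completed with that result.
final class SimpleResultDeduplicator<T, R> {

    private final Map<T, List<Listener<R>>> waiting = new HashMap<>();

    void executeOnce(T key, Listener<R> listener, BiConsumer<T, Listener<R>> compute) {
        final boolean firstCaller;
        synchronized (waiting) {
            List<Listener<R>> listeners = waiting.computeIfAbsent(key, k -> new ArrayList<>());
            firstCaller = listeners.isEmpty();
            listeners.add(listener);
        }
        if (firstCaller) {
            // Only the first caller for this key actually runs the computation.
            compute.accept(key, new Listener<R>() {
                @Override
                public void onResponse(R result) {
                    complete(key, l -> l.onResponse(result));
                }

                @Override
                public void onFailure(Exception e) {
                    complete(key, l -> l.onFailure(e));
                }
            });
        }
    }

    private void complete(T key, Consumer<Listener<R>> action) {
        final List<Listener<R>> done;
        synchronized (waiting) {
            done = waiting.remove(key);
        }
        // Notify everyone who piled up while the computation was in flight.
        done.forEach(action);
    }
}
```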

@elasticmachine added the Team:Distributed label on Jan 25, 2021
@elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed (Team:Distributed)

    } catch (Exception e) {
        listener.onFailure(e);
    }
    repoDataDeduplicator.executeOnce(metadata, listener, (metadata, l) -> l.onResponse(cached.repositoryData()));
@original-brownbear (Author) commented:

This is a little less significant in savings relative to the other spot, but I think it's worth it (a small demo of the collapsing behavior is sketched after this list):

  1. Deserializing the cached RepositoryData is still expensive, and it's nice to save that work.
  2. For status APIs this gives a nice natural rate limit by only fanning out after fetching RepositoryData in the case of massive request concurrency (e.g. the snapshotter fetching all snapshots in a 1k-snapshot repo).
  3. For other operations like create, delete, and clone, we want those to run serially anyway (which we generally do via the master service), so it's fine if we just run all the listeners in series here to begin with.
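
For illustration, a hypothetical driver for the sketch above: many concurrent callers of `executeOnce` under one key collapse onto (ideally) a single expensive load. `RepoData`, the key string, and the sleep are made up for the demo:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class DeduplicationDemo {

    record RepoData(long generation) {}

    public static void main(String[] args) throws InterruptedException {
        SimpleResultDeduplicator<String, RepoData> dedup = new SimpleResultDeduplicator<>();
        AtomicInteger loads = new AtomicInteger();
        int requests = 100;
        CountDownLatch done = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            new Thread(() -> dedup.executeOnce("repository-metadata", new Listener<RepoData>() {
                @Override
                public void onResponse(RepoData data) {
                    done.countDown();
                }

                @Override
                public void onFailure(Exception e) {
                    done.countDown();
                }
            }, (key, l) -> {
                loads.incrementAndGet(); // the expensive read + deserialization would happen here
                try {
                    Thread.sleep(50);    // simulate the cost so callers actually overlap
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                l.onResponse(new RepoData(1));
            })).start();
        }
        done.await();
        // With overlapping callers this prints a small number (often 1), not 100.
        System.out.println("loads executed: " + loads.get());
    }
}
```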

Contributor replied:

> For status APIs this gives a nice natural rate limit by only fanning out after fetching RepositoryData in the case of massive request concurrency (e.g. the snapshotter fetching all snapshots in a 1k-snapshot repo)

I fail to see how this change could act as a rate limit for that scenario; it's a good improvement, but the requests would accumulate anyway, right?

@original-brownbear (Author) replied:

> but the requests would accumulate anyway, right?

Yeah, I guess "rate-limiter" was a poor choice of words. We "rate limit" the dispatch of the actions that run after loading repo data to a single thread at a time, but requests still pile up. In practice that probably means things effectively run much quicker than before; with requests queuing and waiting for the first one to finish instead of executing in parallel, though, it's more of a "thread-limiter" I suppose :)

Contributor replied:

That makes sense, thanks for the clarification!

    if (bestEffortConsistency) {
        threadPool.generic().execute(ActionRunnable.wrap(listener, this::doGetRepositoryData));
    } else {
        repoDataDeduplicator.executeOnce(metadata, listener, (metadata, l) ->
@original-brownbear (Author) commented:

This would completely resolve the broken situations we observed, where 50+ concurrent GENERIC threads were fetching RepositoryData and deserializing it over and over, eventually running the system out of memory.
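
A hedged sketch of the dispatch pattern in the diff above, reusing the simplified deduplicator from earlier. `ExecutorService` stands in for `threadPool.generic()`; `ActionRunnable`, the metadata key, and `doGetRepositoryData` are paraphrased and this is not the exact Elasticsearch code:

```java
import java.util.concurrent.ExecutorService;

final class RepositoryDataDispatch {

    record RepoData(long generation) {}

    private final SimpleResultDeduplicator<String, RepoData> repoDataDeduplicator = new SimpleResultDeduplicator<>();
    private final ExecutorService generic;       // stands in for threadPool.generic()
    private final boolean bestEffortConsistency;
    private final String repoKey;                // stands in for the repository metadata key

    RepositoryDataDispatch(ExecutorService generic, boolean bestEffortConsistency, String repoKey) {
        this.generic = generic;
        this.bestEffortConsistency = bestEffortConsistency;
        this.repoKey = repoKey;
    }

    void getRepositoryData(Listener<RepoData> listener) {
        if (bestEffortConsistency) {
            // Best-effort-consistency callers must each do a fresh load: no dedup.
            generic.execute(() -> doGetRepositoryData(listener));
        } else {
            // Everyone else collapses onto a single in-flight load per repository,
            // so 50+ concurrent requests deserialize RepositoryData once, not 50+ times.
            repoDataDeduplicator.executeOnce(repoKey, listener, (key, l) -> generic.execute(() -> doGetRepositoryData(l)));
        }
    }

    private void doGetRepositoryData(Listener<RepoData> listener) {
        listener.onResponse(new RepoData(1)); // the expensive blob read + deserialize happens here in the real code
    }
}
```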

* Returns a {@link RepositoryData} to describe the data in the repository, including the snapshots
* and the indices across all snapshots found in the repository. Throws a {@link RepositoryException}
* if there was an error in reading the data.
* @param listener listener that may be resolved on different kinds of threads including transport and cluster state applier threads
@original-brownbear (Author) commented:

I audited all usages of this method and couldn't find any remaining spot where this is a problem (at least in master; it might need some adjustments in 7.x).

@fcofdez (Contributor) left a comment:

Nice change, I've left some questions and small suggestions. 👍

  * @param <T> Request type
  */
-public final class TransportRequestDeduplicator<T> {
+public final class AbstractResultDeduplicator<T, R> {
Contributor commented:

nit: Maybe we should name it just ResultDeduplicator, since we don't expect to extend this class?
Also, now that we're using it outside of the transport package, maybe it makes sense to move it to another package?

@original-brownbear (Author) replied:

++ to both, let's rename it and move it to the .action package :)


@original-brownbear (Author) commented:
Thanks Francisco, all addressed I think :)

@fcofdez (Contributor) left a comment:

LGTM, thanks Armin!

@original-brownbear (Author) commented:
Thanks Francisco!

original-brownbear merged commit 29d1d25 into elastic:master on Jan 26, 2021
original-brownbear deleted the deduplicate-blob-reads-during-clone branch on January 26, 2021 18:04
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 26, 2021
Enhance the transport request deduplicator to allow for more general deduplication.
Make use of that to deduplicate RepositoryData under concurrent request load
(which so far has been the only situation where RepositoryData has created unmanageable
memory pressure).
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 26, 2021
There is a small chance here that elastic#67947 would cause the callback
for the repository data to run on a transport or CS updater thread
and do a lot of IO to fetch `SnapshotInfo`.

Fixed by always forking to the generic pool for the callback.
Added a test that concurrently triggers lots of repository-data
deserialization from the cache on the transport thread, which triggers
this bug relatively reliably (in more than half the runs) while still
being reasonably fast (under 5s).
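
To illustrate the fix described in this commit message, a hedged sketch of a listener wrapper that forks completion to a general-purpose pool, so heavy follow-up work never runs inline on a transport or cluster-state applier thread. The `ForkingListener` name is made up for this sketch; Elasticsearch's actual mechanism (`ActionRunnable` and friends) differs:

```java
import java.util.concurrent.ExecutorService;

// Made-up wrapper: completes the delegate on a generic pool instead of on
// whatever thread happened to finish the repository-data load.
final class ForkingListener<R> implements Listener<R> {

    private final ExecutorService generic; // stands in for threadPool.generic()
    private final Listener<R> delegate;

    ForkingListener(ExecutorService generic, Listener<R> delegate) {
        this.generic = generic;
        this.delegate = delegate;
    }

    @Override
    public void onResponse(R result) {
        // Heavy callbacks (e.g. fetching SnapshotInfo) run off the calling thread.
        generic.execute(() -> delegate.onResponse(result));
    }

    @Override
    public void onFailure(Exception e) {
        generic.execute(() -> delegate.onFailure(e));
    }
}
```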
original-brownbear added a commit that referenced this pull request Jan 26, 2021
Enhance the transport request deduplicator to allow for more general deduplication.
Make use of that to deduplicate RepositoryData under concurrent request load
(which so far has been the only situation where RepositoryData has created unmanageable
memory pressure).
original-brownbear added a commit that referenced this pull request Jan 28, 2021

There is a small chance here that #67947 would cause the callback
for the repository data to run on a transport or CS updater thread
and do a lot of IO to fetch `SnapshotInfo`.

Fixed by always forking to the generic pool for the callback.
Added a test that concurrently triggers lots of repository-data
deserialization from the cache on the transport thread, which triggers
this bug relatively reliably (in more than half the runs) while still
being reasonably fast (under 5s).
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 28, 2021
(elastic#68023)

There is a small chance here that elastic#67947 would cause the callback
for the repository data to run on a transport or CS updater thread
and do a lot of IO to fetch `SnapshotInfo`.

Fixed by always forking to the generic pool for the callback.
Added a test that concurrently triggers lots of repository-data
deserialization from the cache on the transport thread, which triggers
this bug relatively reliably (in more than half the runs) while still
being reasonably fast (under 5s).
original-brownbear added a commit that referenced this pull request Jan 28, 2021
(#68092)

There is a small chance here that #67947 would cause the callback
for the repository data to run on a transport or CS updater thread
and do a lot of IO to fetch `SnapshotInfo`.

Fixed by always forking to the generic pool for the callback.
Added a test that concurrently triggers lots of repository-data
deserialization from the cache on the transport thread, which triggers
this bug relatively reliably (in more than half the runs) while still
being reasonably fast (under 5s).
original-brownbear restored the deduplicate-blob-reads-during-clone branch on April 18, 2023 20:52
Labels: :Distributed Coordination/Snapshot/Restore, >non-issue, Team:Distributed (Obsolete), v7.12.0, v8.0.0-alpha1