Simplify and optimize deduplication of RepositoryData for a non-caching repository instance #91851
Conversation
…ng repository instance

This makes use of the new deduplicator infrastructure to move to more efficient deduplication mechanics. The existing solution hardly ever deduplicated because it would only deduplicate after the repository entered a consistent state. The adjusted solution is much simpler, in that it simply deduplicates such that only a single loading of `RepositoryData` will ever happen at a time, fixing memory issues from massively concurrent loading of the repo data as described in elastic#89952.

closes elastic#89952
Pinging @elastic/es-distributed (Team:Distributed)

Hi @original-brownbear, I've created a changelog YAML for you.
DaveCTurner left a comment
Nice simplification. I left one small comment.
```java
this.basePath = basePath;
this.maxSnapshotCount = MAX_SNAPSHOTS_SETTING.get(metadata.settings());
this.repoDataDeduplicator = new ResultDeduplicator<>(threadPool.getThreadContext());
this.repoDataLoadDeduplicator = new SingleResultDeduplicator<>(threadPool.getThreadContext(), this::doGetRepositoryData);
```
I think `SingleResultDeduplicator` risks a stack overflow exception if the action doesn't fork, which is the case here. Can we fork to `SNAPSHOT_META` here rather than when calling `repoDataLoadDeduplicator::execute`?
++ I pushed below commit, thanks for catching this! Also simplifies the call-sites nicely :)
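The fix discussed in this thread can be sketched roughly as follows: make the deduplicated action itself fork onto the snapshot-meta pool before doing any work, so it never completes synchronously on a caller's stack, and the deduplicator's "completed, now run again for queued listeners" step cannot recurse until the stack overflows. This is a hypothetical, minimal Java sketch under that assumption — the executor setup and the names `SNAPSHOT_META`, `loadRepositoryData`, and `doGetRepositoryData` are illustrative stand-ins, not the actual Elasticsearch code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class ForkRepositoryDataLoad {
    // Stand-in for the SNAPSHOT_META executor; daemon threads so the demo JVM can exit.
    private static final ExecutorService SNAPSHOT_META = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "snapshot-meta");
        t.setDaemon(true);
        return t;
    });

    // The deduplicated action forks before doing any work, so it can never
    // complete on the caller's stack. That breaks the potential recursion
    // "complete -> run again for queued listeners -> complete -> ..." that a
    // synchronously-completing action would cause inside the deduplicator.
    static void loadRepositoryData(Consumer<String> listener) {
        SNAPSHOT_META.submit(() -> listener.accept(doGetRepositoryData()));
    }

    private static String doGetRepositoryData() {
        return "repository-data"; // stand-in for the real blob-store read
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch done = new CountDownLatch(1);
        String[] result = new String[1];
        loadRepositoryData(v -> { result[0] = v; done.countDown(); });
        if (!done.await(5, TimeUnit.SECONDS)) throw new AssertionError("timed out");
        System.out.println(result[0]); // prints: repository-data
    }
}
```

Forking inside the action (rather than at every call site) also matches the "simplifies the call-sites" remark above: callers can invoke the deduplicator directly without each wrapping the call in an executor submission.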
DaveCTurner left a comment
LGTM
```java
        repoData.getGenId()
    )
)
repoDataLoadDeduplicator.execute(
```
👍
Thanks David!
…-caching repository instance (#91851) (#91866)

* Simplify and optimize deduplication of RepositoryData for a non-caching repository instance (#91851)

* fix compile
Would it be safe to backport this to 7.17 too? We definitely see #89952 affecting them, and I don't see any obvious reasons not to apply this there.

Yea it should be fine :) let me try ..

#92661 worked
…-caching repository instance (#91851) (#91866) (#92661)

* Simplify and optimize deduplication of RepositoryData for a non-caching repository instance (#91851)

* fix compile
This makes use of the new deduplicator infrastructure to move to more efficient deduplication mechanics.

The existing solution hardly ever deduplicated because it would only deduplicate after the repository entered a consistent state. The adjusted solution is much simpler, in that it simply deduplicates such that only a single loading of `RepositoryData` will ever happen at a time, fixing memory issues from massively concurrent loading of the repo data as described in #89952.

closes #89952

(this should be all that's needed to fix the memory issue in practice, judging by heap dumps, as the problem really just is the initial set of snapshots going wild with concurrent loading of repo data on the huge snapshot-meta pool)
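As a rough illustration of the mechanic described above — only one load of `RepositoryData` in flight at a time, with callers that arrive mid-load queued and served together by a single follow-up load — here is a minimal, single-threaded Java sketch. It is a toy under stated assumptions, not the actual `SingleResultDeduplicator` implementation; the class and field names are invented for illustration:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Toy single-result deduplicator: while one run of the loader is in flight,
// additional callers queue a listener instead of starting their own load.
class SingleLoadDeduplicator<T> {
    private final Consumer<Consumer<T>> loader; // async computation: receives a completion callback
    private final Queue<Consumer<T>> waiting = new ArrayDeque<>();
    private boolean loading = false;
    int loads = 0; // how many times the loader actually ran

    SingleLoadDeduplicator(Consumer<Consumer<T>> loader) {
        this.loader = loader;
    }

    synchronized void execute(Consumer<T> listener) {
        waiting.add(listener);
        if (loading == false) {
            loading = true;
            startLoad();
        }
    }

    private synchronized void startLoad() {
        loads++;
        // Everyone waiting right now shares the result of this one load.
        final List<Consumer<T>> batch = new ArrayList<>(waiting);
        waiting.clear();
        loader.accept(result -> {
            for (Consumer<T> l : batch) {
                l.accept(result);
            }
            synchronized (this) {
                if (waiting.isEmpty()) {
                    loading = false;
                } else {
                    // Listeners that queued mid-load get a fresh value. Note this
                    // recurses if the loader completes synchronously -- the stack
                    // overflow risk raised in review, avoided by forking the loader.
                    startLoad();
                }
            }
        });
    }

    public static void main(String[] args) {
        // Simulate an async loader by capturing completion callbacks.
        List<Consumer<String>> pending = new ArrayList<>();
        SingleLoadDeduplicator<String> dedup = new SingleLoadDeduplicator<>(pending::add);
        List<String> seen = new ArrayList<>();
        dedup.execute(seen::add);
        dedup.execute(seen::add);
        dedup.execute(seen::add);
        System.out.println(pending.size()); // 1: one load in flight for three callers
        pending.get(0).accept("gen-1");     // first load completes, serves the first caller
        pending.get(1).accept("gen-2");     // one follow-up load serves both queued callers
        System.out.println(seen);           // [gen-1, gen-2, gen-2]
        System.out.println(dedup.loads);    // 2 loads for 3 callers
    }
}
```

The demo in `main` shows why this fixes the memory issue described in #89952: however many callers pile up concurrently, at most one `RepositoryData` load is ever being materialized at a time.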