Conversation

@original-brownbear
Contributor

  • Cleans up all root-level temp, snap-%s.dat, and meta-%s.dat blobs that aren't referenced by any snapshot, to deal with dangling blobs left behind by delete and snapshot finalization failures
    • The scenario that gets us here is a snapshot failing before it was finalized, or a delete failing right after it wrote the updated index-(N+1) that no longer references a snapshot but then fails to remove that snapshot's blobs
    • Not deleting other dangling blobs that don't follow the snap-, meta-, or tempfile naming schemes, to avoid accidentally deleting blobs not created by the snapshot logic
  • Follow-up to Recursively Delete Unreferenced Index Directories #42189
    • Same safety logic: get the list of all blobs before writing the new index-N blob, and only delete things after the index-N blob has been written

This leaves only cleaning up unreferenced index metadata blobs (in the index folders) and some rare corner cases of snap- and meta- blobs in shard folders (older repos in which shard-level deletes failed). A minimal sketch of the listing-before-write ordering follows.
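The following is a small, self-contained sketch of that ordering idea, not the actual BlobStoreRepository code; the class, the blob names, and the ".tmp" suffix used as a stand-in for temp blobs are all illustrative assumptions.

// Minimal sketch (illustrative names, not the real BlobStoreRepository code) of the
// "list root blobs before writing index-N, delete only afterwards" safety ordering.
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RootBlobCleanupSketch {

    // Root-level snapshot metadata blobs look like snap-<uuid>.dat and meta-<uuid>.dat
    private static final Pattern SNAP_META = Pattern.compile("^(?:snap|meta)-(.+)\\.dat$");

    public static void main(String[] args) {
        // Root listing taken BEFORE the updated index-N blob is written, so a blob
        // created concurrently by another writer can never show up in this set
        Set<String> rootBlobsBeforeWrite = new LinkedHashSet<>(Set.of(
            "index-7", "snap-aaa.dat", "meta-aaa.dat",   // referenced by a snapshot
            "snap-bbb.dat", "meta-bbb.dat",              // dangling, e.g. failed finalization
            "upload.tmp"));                              // dangling temp blob (stand-in name)
        // Snapshot UUIDs still referenced by the index-(N+1) blob written after the listing
        Set<String> referencedUUIDs = Set.of("aaa");

        // Only blobs that were listed before the write, follow the known naming schemes,
        // and are unreferenced get deleted; everything else is left alone
        for (String blob : rootBlobsBeforeWrite) {
            if (isStale(blob, referencedUUIDs)) {
                System.out.println("would delete stale root blob: " + blob);
            }
        }
    }

    static boolean isStale(String blobName, Set<String> referencedUUIDs) {
        if (blobName.endsWith(".tmp")) {
            return true; // temp blobs are never referenced once a snapshot has finished
        }
        Matcher m = SNAP_META.matcher(blobName);
        return m.matches() && referencedUUIDs.contains(m.group(1)) == false;
    }
}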

@elasticmachine
Collaborator

Pinging @elastic/es-distributed

}
final SnapshotInfo finalSnapshotInfo = snapshot;
try {
blobContainer().deleteBlobsIgnoringIfNotExists(
Contributor

should we still do this, even though we might clean up later as well? I wonder if doing this regardless of the listing helps with eventually consistent repos.

Contributor

Also, explicitly deleting these blobs here avoids the bogus "Found stale root level blobs" log message later and removes them from the rootBlobs list.

Contributor Author

I wouldn't:

  1. It's highly unlikely that we will miss a blob here even on S3. The only way this would happen in rare cases is if you were to delete a snapshot right after creating it.
  2. It needlessly adds RPC calls (and code) for no apparent gain.
  3. The logging is debug only anyway. If you think that message is confusing, we could just reword the message slightly so that it doesn't look like a potential issue?

Contributor

It's highly unlikely that we will miss a blob here even on S3. The only way this would happen in rare cases is if you were to delete a snapshot right after creating it.

True. I think though that if we can avoid relying on list operations for regular clean-up, then we should.

It needlessly adds RPC calls (and code)

It's one extra RPC call per snapshot deletion, so ok I think. It also makes it clearer what needs to be cleaned up as part of normal clean-up operation and what is cleaned up as part of garbage removal. It is a little bit of extra code, but also makes the intent clearer (which files belong to the snapshot and need to be cleaned up). Separating this in terms of code also allows us to apply different logging levels and messages instead of having a vague one. I was pondering for example whether we should provide a summary of clean-up information about dangling snapshot files at info level.

Member

True. I think though that if we can avoid relying on list operations for regular clean-up, then we should.

👍

@jpountz jpountz added v7.4.0 and removed v7.3.0 labels Jul 3, 2019
Member

@tlrx tlrx left a comment

I'm really sorry to come to this so late, but I'm a bit uncomfortable with this change. As a user I would not expect that deleting a specific snapshot also cleans up orphan files left behind by a failed creation or deletion.

Sorry if that sounds stupid, but I'd expect the repository clean up tool to do this instead.

@original-brownbear
Contributor Author

@tlrx

Sorry if that sounds stupid, but I'd expect the repository clean up tool to do this instead.

No worries, it doesn't. I think in a perfect world I'd agree (not with the tool approach, as that's very specific to the cloud providers and not generally applicable, given the lack of locking on the repo and the general unreliability of timestamps) and say that packaging the cleanup in an endpoint (incoming in #43900) is enough and there shouldn't be any need for automatic cleanup.
In practice, though, I think the long-term cost of never automatically cleaning up stale data is pretty significant, for one. It also makes operating the snapshot repository needlessly complex and forces users to run the cleanup endpoint at some frequency (running it costs money in terms of API calls, as does storage, so large-scale operations will need to balance here to find an optimal approach). Automatic cleanups need fewer API calls since they can reuse data from the normal delete, and they don't require manual maintenance. That was the motivation here in a nutshell :) Moving forward, I think it would be wise to further optimize this functionality to require fewer list calls and be a little less brute-force, but given the current situation and backwards-compatibility constraints, I think this is the cleanest way forward for now.

@original-brownbear original-brownbear requested a review from tlrx July 4, 2019 09:10
@original-brownbear
Contributor Author

@ywelsch @tlrx I've now brought back the separate delete call for the known root level meta blobs and adjusted the logic to not catch those in the cleanup (so that I could move the cleanup to INFO log level as Yannick suggested).

@original-brownbear
Contributor Author

Jenkins run elasticsearch-ci/packaging-sample
Jenkins run elasticsearch-ci/2

Contributor

@ywelsch ywelsch left a comment

I've left one comment; once that's addressed, LGTM.

snapshotId,
ActionListener.map(listener, v -> {
cleanupStaleIndices(foundIndices, survivingIndices);
// Cleaning up according to repository data before the delete so we don't accidentally identify the two just deleted
Contributor

I find this too subtle, in particular because the cleanupStaleRootFiles method now operates on old repository data, which is not clear when looking at that method itself.

Can you instead do something like the following:

diff --git a/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java b/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
index 8b731f05a39..5c6695a93e0 100644
--- a/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
+++ b/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
@@ -59,6 +59,7 @@ import org.elasticsearch.common.settings.Setting;
 import org.elasticsearch.common.settings.Settings;
 import org.elasticsearch.common.unit.ByteSizeUnit;
 import org.elasticsearch.common.unit.ByteSizeValue;
+import org.elasticsearch.common.util.set.Sets;
 import org.elasticsearch.common.xcontent.LoggingDeprecationHandler;
 import org.elasticsearch.common.xcontent.NamedXContentRegistry;
 import org.elasticsearch.common.xcontent.XContentFactory;
@@ -101,6 +102,7 @@ import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Collection;
 import java.util.Collections;
+import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Optional;
@@ -394,11 +396,10 @@ public abstract class BlobStoreRepository extends AbstractLifecycleComponent imp
             }
             // Delete snapshot from the index file, since it is the maintainer of truth of active snapshots
             final RepositoryData updatedRepositoryData;
-            final RepositoryData repositoryData;
             final Map<String, BlobContainer> foundIndices;
             final Set<String> rootBlobs;
             try {
-                repositoryData = getRepositoryData();
+                final RepositoryData repositoryData = getRepositoryData();
                 updatedRepositoryData = repositoryData.removeSnapshot(snapshotId);
                 // Cache the indices that were found before writing out the new index-N blob so that a stuck master will never
                 // delete an index that was created by another master node after writing this index-N blob.
@@ -410,9 +411,10 @@ public abstract class BlobStoreRepository extends AbstractLifecycleComponent imp
                 return;
             }
             final SnapshotInfo finalSnapshotInfo = snapshot;
+            final List<String> snapMetaFilesToDelete =
+                Arrays.asList(snapshotFormat.blobName(snapshotId.getUUID()), globalMetaDataFormat.blobName(snapshotId.getUUID()));
             try {
-                blobContainer().deleteBlobsIgnoringIfNotExists(
-                    Arrays.asList(snapshotFormat.blobName(snapshotId.getUUID()), globalMetaDataFormat.blobName(snapshotId.getUUID())));
+                blobContainer().deleteBlobsIgnoringIfNotExists(snapMetaFilesToDelete);
             } catch (IOException e) {
                 logger.warn(() -> new ParameterizedMessage("[{}] Unable to delete global metadata files", snapshotId), e);
             }
@@ -425,9 +427,8 @@ public abstract class BlobStoreRepository extends AbstractLifecycleComponent imp
                 snapshotId,
                 ActionListener.map(listener, v -> {
                     cleanupStaleIndices(foundIndices, survivingIndices);
-                    // Cleaning up according to repository data before the delete so we don't accidentally identify the two just deleted
-                    // blobs for the current snapshot as stale.
-                    cleanupStaleRootFiles(rootBlobs, repositoryData);
+                    // Remove snapMetaFilesToDelete, which have been deleted in a prior step, so that they are not identified as stale
+                    cleanupStaleRootFiles(Sets.difference(rootBlobs, new HashSet<>(snapMetaFilesToDelete)), updatedRepositoryData);
                     return null;
                 })
             );

Contributor Author

Sure, will do :)

Member

@tlrx tlrx left a comment

@original-brownbear Thanks for the detailed response, @ywelsch and I also talked about this via another channel. I understand the motivation around this change and my main concern is about multiple clusters accessing the same repository for writes (this is not supported, but we know many users do it) and how this change could make things even worse by silently deleting data (instead of just leaving orphaned files in the repository).

Anyway, I agree we can move forward with this change. We also think that adding a test (in another PR) that executes multiple concurrent snapshot deletions/creations and then checks that it always leaves the repository in a clean state would be a good idea.

@original-brownbear
Contributor Author

@tlrx

Anyway, I agree we can move forward with this change. We also think that adding a test (in another PR) that executes multiple concurrent snapshot deletions/creations and then checks that it always leaves the repository in a clean state would be a good idea.

See SnapshotResiliencyTests; we have that test (if I understand your wishes correctly) :) (not with concurrent deletes/creates from multiple clusters, since that would always fail at some frequency), but it simulates all kinds of stuck nodes that lead to concurrent operations, etc. :)

@original-brownbear
Contributor Author

@tlrx

how this change could make things even worse by silently deleting data (instead of just leaving orphaned files in the repository).

Actually, there's a certain beauty to the approach here that makes this less of a concern :) If you look at the code here and for the stale indices cleanup, we always do this:

  1. Find what we want to delete
  2. Write updated index-N blob
  3. Actually delete

so in step 3 we always bring things back in line with the latest index-N blob. If a concurrent action started between steps 1 and 3, we wouldn't clean up the files it created, and the action itself would fail because of the index-N change that happened concurrently. Hope that helps :) A small sketch of this ordering is below.
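To illustrate the three-step ordering above, here is a minimal, self-contained sketch; the class, the generation counter, and the in-memory blob set are made-up stand-ins, not the real BlobStoreRepository implementation.

// Illustrative sketch of the three-step ordering: (1) list candidates, (2) write the
// updated index-N, (3) delete. All names here are assumptions.
import java.util.HashSet;
import java.util.Set;

public class DeleteOrderingSketch {

    private long indexGeneration = 7;                  // stand-in for the index-N generation
    private final Set<String> blobs = new HashSet<>(); // stand-in for repository contents

    public static void main(String[] args) {
        DeleteOrderingSketch repo = new DeleteOrderingSketch();
        repo.blobs.add("snap-stale.dat");

        // Step 1: find what we want to delete (listing happens before any writes)
        Set<String> candidates = new HashSet<>(repo.blobs);
        long expectedGen = repo.indexGeneration;

        // A concurrent snapshot writes a new blob between steps 1 and 3
        repo.blobs.add("snap-new.dat");

        // Steps 2 + 3: write index-(N+1), then delete only what was listed in step 1
        repo.finishDelete(expectedGen, candidates);

        // The concurrently created blob survives because it was never listed
        System.out.println(repo.blobs); // prints [snap-new.dat]
    }

    void finishDelete(long expectedGeneration, Set<String> staleCandidates) {
        // Step 2: write the updated index-N; a writer holding a stale generation fails,
        // mirroring how a concurrent operation fails on the index-N change
        if (indexGeneration != expectedGeneration) {
            throw new IllegalStateException("concurrent repository modification");
        }
        indexGeneration = expectedGeneration + 1;
        // Step 3: delete only blobs listed in step 1
        blobs.removeAll(staleCandidates);
    }
}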

@original-brownbear original-brownbear merged commit b9ce649 into elastic:master Jul 4, 2019
@original-brownbear original-brownbear deleted the cleanup-round-2 branch July 4, 2019 13:49
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jul 11, 2019
original-brownbear added a commit that referenced this pull request Jul 11, 2019