Conversation

@original-brownbear
Contributor

  • Cleans up all root-level temp, snap-%s.dat, and meta-%s.dat blobs that aren't referenced by any snapshot, to deal with dangling blobs left behind by delete and snapshot finalization failures
    • The scenario that gets us here is a snapshot failing before it was finalized, or a delete failing right after it wrote the updated index-(N+1) that no longer references a snapshot but then fails to remove that snapshot's blobs
    • Not deleting other dangling blobs that don't follow the snap-, meta-, or tempfile naming schemes, to avoid accidentally deleting blobs not created by the snapshot logic
  • Follow-up to Recursively Delete Unreferenced Index Directories #42189
    • Same safety logic: get the list of all blobs before writing the new index-N blob, and only delete things after the index-N blob has been written

This leaves only cleaning up unreferenced index metadata blobs (in the index folders) and some rare corner cases of snap- and meta- blobs in shard folders (older repos in which shard-level deletes failed). A minimal sketch of the listing-before-write ordering follows.
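The following is a small, self-contained sketch of that ordering idea, not the actual BlobStoreRepository code; the class, the blob names, and the ".tmp" suffix used as a stand-in for temp blobs are all illustrative assumptions.

// Minimal sketch (illustrative names, not the real BlobStoreRepository code) of the
// "list root blobs before writing index-N, delete only afterwards" safety ordering.
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RootBlobCleanupSketch {

    // Root-level snapshot metadata blobs look like snap-<uuid>.dat and meta-<uuid>.dat
    private static final Pattern SNAP_META = Pattern.compile("^(?:snap|meta)-(.+)\\.dat$");

    public static void main(String[] args) {
        // Root listing taken BEFORE the updated index-N blob is written, so a blob
        // created concurrently by another writer can never show up in this set
        Set<String> rootBlobsBeforeWrite = new LinkedHashSet<>(Set.of(
            "index-7", "snap-aaa.dat", "meta-aaa.dat",   // referenced by a snapshot
            "snap-bbb.dat", "meta-bbb.dat",              // dangling, e.g. failed finalization
            "upload.tmp"));                              // dangling temp blob (stand-in name)
        // Snapshot UUIDs still referenced by the index-(N+1) blob written after the listing
        Set<String> referencedUUIDs = Set.of("aaa");

        // Only blobs that were listed before the write, follow the known naming schemes,
        // and are unreferenced get deleted; everything else is left alone
        for (String blob : rootBlobsBeforeWrite) {
            if (isStale(blob, referencedUUIDs)) {
                System.out.println("would delete stale root blob: " + blob);
            }
        }
    }

    static boolean isStale(String blobName, Set<String> referencedUUIDs) {
        if (blobName.endsWith(".tmp")) {
            return true; // temp blobs are never referenced once a snapshot has finished
        }
        Matcher m = SNAP_META.matcher(blobName);
        return m.matches() && referencedUUIDs.contains(m.group(1)) == false;
    }
}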

@elasticmachine
Collaborator

Pinging @elastic/es-distributed

}
final SnapshotInfo finalSnapshotInfo = snapshot;
try {
blobContainer().deleteBlobsIgnoringIfNotExists(
Contributor

should we still do this, even though we might clean up later as well? I wonder if doing this regardless of the listing helps with eventually consistent repos.

Contributor

Also, explicitly deleting these blobs here avoids the bogus "Found stale root level blobs" log message later and removes them from the rootBlobs list.

Contributor Author

I wouldn't:

  1. It's highly unlikely that we will miss a blob here even on S3. The only way this would happen in rare cases is if you were to delete a snapshot right after creating it.
  2. It needlessly adds RPC calls (and code) for no apparent gain.
  3. The logging is debug only anyway. If you think that message is confusing, we could just reword the message slightly so that it doesn't look like a potential issue?

Contributor

It's highly unlikely that we will miss a blob here even on S3. The only way this would happen in rare cases is if you were to delete a snapshot right after creating it.

True. I think though that if we can avoid relying on list operations for regular clean-up, then we should.

It needlessly adds RPC calls (and code)

It's one extra RPC call per snapshot deletion, so ok I think. It also makes it clearer what needs to be cleaned up as part of normal clean-up operation and what is cleaned up as part of garbage removal. It is a little bit of extra code, but also makes the intent clearer (which files belong to the snapshot and need to be cleaned up). Separating this in terms of code also allows us to apply different logging levels and messages instead of having a vague one. I was pondering for example whether we should provide a summary of clean-up information about dangling snapshot files at info level.

Member

True. I think though that if we can avoid relying on list operations for regular clean-up, then we should.

👍

@jpountz jpountz added v7.4.0 and removed v7.3.0 labels Jul 3, 2019
Member

@tlrx tlrx left a comment

I'm really sorry to come to this so late, but I'm a bit uncomfortable with this change. As a user I would not expect that deleting a specific snapshot also cleans up orphan files left behind by a failed creation or deletion.

Sorry if that sounds stupid, but I'd expect the repository clean up tool to do this instead.

@original-brownbear
Contributor Author

@tlrx

Sorry if that sounds stupid, but I'd expect the repository clean up tool to do this instead.

No worries, it doesn't. I think in a perfect world I'd agree (not with the tool approach, as that's very specific to the cloud providers and not generally applicable, given the lack of locking on the repo and the general unreliability of timestamps) and say that packaging the cleanup in an endpoint (incoming in #43900) is enough and there shouldn't be any need for automatic cleanup.
In practice, though, I think the long-term cost of never automatically cleaning up stale data is pretty significant, for one. It also makes operating the snapshot repository needlessly complex and forces users to run the cleanup endpoint at some frequency (running it costs money in terms of API calls, as does storage, so large-scale operations will need to balance here to find an optimal approach). Automatic cleanups need fewer API calls since they can reuse data from the normal delete, and they don't require manual maintenance. That was the motivation here in a nutshell :) Moving forward, I think it would be wise to further optimize this functionality to require fewer list calls and be a little less brute-force, but given the current situation and backwards-compatibility constraints, I think this is the cleanest way forward for now.

@original-brownbear original-brownbear requested a review from tlrx July 4, 2019 09:10
@original-brownbear
Contributor Author

@ywelsch @tlrx I've now brought back the separate delete call for the known root level meta blobs and adjusted the logic to not catch those in the cleanup (so that I could move the cleanup to INFO log level as Yannick suggested).

@original-brownbear
Contributor Author

Jenkins run elasticsearch-ci/packaging-sample
Jenkins run elasticsearch-ci/2

Contributor

@ywelsch ywelsch left a comment

I've left one comment; once that's addressed, LGTM.

snapshotId,
ActionListener.map(listener, v -> {
cleanupStaleIndices(foundIndices, survivingIndices);
// Cleaning up according to repository data before the delete so we don't accidentally identify the two just deleted
Contributor

I find this too subtle, in particular because the cleanupStaleRootFiles method now operates on old repository data, which is not clear when looking at that method itself.

Can you instead do something like the following:

diff --git a/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java b/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
index 8b731f05a39..5c6695a93e0 100644
--- a/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
+++ b/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
@@ -59,6 +59,7 @@ import org.elasticsearch.common.settings.Setting;
 import org.elasticsearch.common.settings.Settings;
 import org.elasticsearch.common.unit.ByteSizeUnit;
 import org.elasticsearch.common.unit.ByteSizeValue;
+import org.elasticsearch.common.util.set.Sets;
 import org.elasticsearch.common.xcontent.LoggingDeprecationHandler;
 import org.elasticsearch.common.xcontent.NamedXContentRegistry;
 import org.elasticsearch.common.xcontent.XContentFactory;
@@ -101,6 +102,7 @@ import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Collection;
 import java.util.Collections;
+import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Optional;
@@ -394,11 +396,10 @@ public abstract class BlobStoreRepository extends AbstractLifecycleComponent imp
             }
             // Delete snapshot from the index file, since it is the maintainer of truth of active snapshots
             final RepositoryData updatedRepositoryData;
-            final RepositoryData repositoryData;
             final Map<String, BlobContainer> foundIndices;
             final Set<String> rootBlobs;
             try {
-                repositoryData = getRepositoryData();
+                final RepositoryData repositoryData = getRepositoryData();
                 updatedRepositoryData = repositoryData.removeSnapshot(snapshotId);
                 // Cache the indices that were found before writing out the new index-N blob so that a stuck master will never
                 // delete an index that was created by another master node after writing this index-N blob.
@@ -410,9 +411,10 @@ public abstract class BlobStoreRepository extends AbstractLifecycleComponent imp
                 return;
             }
             final SnapshotInfo finalSnapshotInfo = snapshot;
+            final List<String> snapMetaFilesToDelete =
+                Arrays.asList(snapshotFormat.blobName(snapshotId.getUUID()), globalMetaDataFormat.blobName(snapshotId.getUUID()));
             try {
-                blobContainer().deleteBlobsIgnoringIfNotExists(
-                    Arrays.asList(snapshotFormat.blobName(snapshotId.getUUID()), globalMetaDataFormat.blobName(snapshotId.getUUID())));
+                blobContainer().deleteBlobsIgnoringIfNotExists(snapMetaFilesToDelete);
             } catch (IOException e) {
                 logger.warn(() -> new ParameterizedMessage("[{}] Unable to delete global metadata files", snapshotId), e);
             }
@@ -425,9 +427,8 @@ public abstract class BlobStoreRepository extends AbstractLifecycleComponent imp
                 snapshotId,
                 ActionListener.map(listener, v -> {
                     cleanupStaleIndices(foundIndices, survivingIndices);
-                    // Cleaning up according to repository data before the delete so we don't accidentally identify the two just deleted
-                    // blobs for the current snapshot as stale.
-                    cleanupStaleRootFiles(rootBlobs, repositoryData);
+                    // Remove snapMetaFilesToDelete, which have been deleted in a prior step, so that they are not identified as stale
+                    cleanupStaleRootFiles(Sets.difference(rootBlobs, new HashSet<>(snapMetaFilesToDelete)), updatedRepositoryData);
                     return null;
                 })
             );

Contributor Author

Sure, will do :)

Member

@tlrx tlrx left a comment

@original-brownbear Thanks for the detailed response, @ywelsch and I also talked about this via another channel. I understand the motivation around this change and my main concern is about multiple clusters accessing the same repository for writes (this is not supported, but we know many users do it) and how this change could make things even worse by silently deleting data (instead of just leaving orphaned files in the repository).

Anyway, I agree we can move forward with this change. We also think that adding a test (in another PR) that executes multiple concurrent snapshot deletions/creations and then checks that it always leaves the repository in a clean state would be a good idea.

@original-brownbear
Contributor Author

@tlrx

Anyway, I agree we can move forward with this change. We also think that adding a test (in another PR) that executes multiple concurrent snapshot deletions/creations and then checks that it always leaves the repository in a clean state would be a good idea.

See SnapshotResiliencyTests; we have that test (if I understand your wishes correctly) :) (not with concurrent deletes/creates from multiple clusters, since that would always fail at some frequency), but it simulates all kinds of stuck nodes that lead to concurrent operations, etc. :)

@original-brownbear
Contributor Author

@tlrx

how this change could make things even worse by silently deleting data (instead of just leaving orphaned files in the repository).

Actually, there's a certain beauty to the approach here that makes this less of a concern :) If you look at the code here and for the stale indices cleanup, we always do this:

  1. Find what we want to delete
  2. Write updated index-N blob
  3. Actually delete

so in step 3 we always bring things back in line with the latest index-N blob. If a concurrent action started between steps 1 and 3, we wouldn't clean up the files it created, and the action itself would fail because of the index-N change that happened concurrently. Hope that helps :) A small sketch of this ordering is below.
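To illustrate the three-step ordering above, here is a minimal, self-contained sketch; the class, the generation counter, and the in-memory blob set are made-up stand-ins, not the real BlobStoreRepository implementation.

// Illustrative sketch of the three-step ordering: (1) list candidates, (2) write the
// updated index-N, (3) delete. All names here are assumptions.
import java.util.HashSet;
import java.util.Set;

public class DeleteOrderingSketch {

    private long indexGeneration = 7;                  // stand-in for the index-N generation
    private final Set<String> blobs = new HashSet<>(); // stand-in for repository contents

    public static void main(String[] args) {
        DeleteOrderingSketch repo = new DeleteOrderingSketch();
        repo.blobs.add("snap-stale.dat");

        // Step 1: find what we want to delete (listing happens before any writes)
        Set<String> candidates = new HashSet<>(repo.blobs);
        long expectedGen = repo.indexGeneration;

        // A concurrent snapshot writes a new blob between steps 1 and 3
        repo.blobs.add("snap-new.dat");

        // Steps 2 + 3: write index-(N+1), then delete only what was listed in step 1
        repo.finishDelete(expectedGen, candidates);

        // The concurrently created blob survives because it was never listed
        System.out.println(repo.blobs); // prints [snap-new.dat]
    }

    void finishDelete(long expectedGeneration, Set<String> staleCandidates) {
        // Step 2: write the updated index-N; a writer holding a stale generation fails,
        // mirroring how a concurrent operation fails on the index-N change
        if (indexGeneration != expectedGeneration) {
            throw new IllegalStateException("concurrent repository modification");
        }
        indexGeneration = expectedGeneration + 1;
        // Step 3: delete only blobs listed in step 1
        blobs.removeAll(staleCandidates);
    }
}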

@original-brownbear original-brownbear merged commit b9ce649 into elastic:master Jul 4, 2019
@original-brownbear original-brownbear deleted the cleanup-round-2 branch July 4, 2019 13:49
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jul 11, 2019
original-brownbear added a commit that referenced this pull request Jul 11, 2019