Cleanup Stale Root Level Blobs in Snapshot Repository #43542
Conversation
* Cleans up all root-level temp, snap-%s.dat, and meta-%s.dat blobs that aren't referenced by any snapshot, to deal with dangling blobs left behind by delete and snapshot-finalization failures
* The scenario that gets us here is a snapshot failing before it was finalized, or a delete failing right after it wrote the updated index-(N+1) that no longer references a snapshot but then failing to remove that snapshot's blobs
* Not deleting other dangling blobs that don't follow the snap-, meta-, or temp-file naming schemes, so we don't accidentally delete blobs not created by the snapshot logic
* Follow-up to elastic#42189
* Same safety logic: get the list of all blobs before writing the index-N blob, and delete things only after the index-N blob was written
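The key safety property described above is that only blobs matching the snapshot layer's own naming schemes are ever candidates for deletion. A minimal sketch of that filtering idea, assuming a tmp_ prefix for temp blobs and hypothetical class/method names (this is not the actual BlobStoreRepository code):

import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class StaleRootBlobFilter {

    // Returns the root-level blobs that are safe to delete: temp blobs, plus
    // snap-/meta- blobs whose UUID is not referenced by any surviving snapshot.
    // Blobs with an unknown naming scheme are never touched.
    static List<String> staleRootBlobs(Set<String> rootBlobs, Set<String> survivingSnapshotUuids) {
        return rootBlobs.stream()
            .filter(blob -> {
                if (blob.startsWith("tmp_")) {   // assumed temp-blob prefix
                    return true;                 // leftover from an aborted write
                }
                if (blob.startsWith("snap-") && blob.endsWith(".dat")) {
                    return survivingSnapshotUuids.contains(uuid(blob, "snap-")) == false;
                }
                if (blob.startsWith("meta-") && blob.endsWith(".dat")) {
                    return survivingSnapshotUuids.contains(uuid(blob, "meta-")) == false;
                }
                return false; // not created by the snapshot logic
            })
            .collect(Collectors.toList());
    }

    private static String uuid(String blob, String prefix) {
        return blob.substring(prefix.length(), blob.length() - ".dat".length());
    }
}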
Pinging @elastic/es-distributed
}
final SnapshotInfo finalSnapshotInfo = snapshot;
try {
    blobContainer().deleteBlobsIgnoringIfNotExists(
should we still do this, even though we might clean up later as well? I wonder if doing this regardless of the listing helps with eventually consistent repos.
Also, explicitly deleting these blobs here will avoid the bogus log message "Found stale root level blobs" later and remove them from the rootBlobs list.
I wouldn't:
- It's highly unlikely that we will miss a blob here even on S3. The only way this would happen in rare cases is if you were to delete a snapshot right after creating it.
- It needlessly adds RPC calls (and code) for no apparent gain.
- The logging is debug-only anyway. If you think that message is confusing, we could just reword it slightly so that it doesn't look like a potential issue?
> It's highly unlikely that we will miss a blob here even on S3. The only way this would happen in rare cases is if you were to delete a snapshot right after creating it.
True. I think though that if we can avoid relying on list operations for regular clean-up, then we should.
> It needlessly adds RPC calls (and code)
It's one extra RPC call per snapshot deletion, so ok I think. It also makes it clearer what needs to be cleaned up as part of the normal clean-up operation and what is cleaned up as part of garbage removal. It is a little bit of extra code, but it also makes the intent clearer (which files belong to the snapshot and need to be cleaned up). Separating this in terms of code also allows us to apply different logging levels and messages instead of having one vague message. I was pondering, for example, whether we should provide a summary of clean-up information about dangling snapshot files at info level.
> True. I think though that if we can avoid relying on list operations for regular clean-up, then we should.
👍
tlrx left a comment
I'm really sorry to come so late to this change, but I'm a bit uncomfortable with it. As a user I would not expect that deleting a specific snapshot also cleans up orphan files left behind by a failed creation or deletion.
Sorry if that sounds stupid, but I'd expect the repository cleanup tool to do this instead.
> Sorry if that sounds stupid

No worries, it doesn't. I think in a perfect world I'd agree (not with the tool approach, as that's very specific to the Cloud providers and not generally applicable due to the lack of locking on the repo and the unreliability of timestamps in general) and say that the packaging of the cleanup in an endpoint (incoming in #43900) is enough and there shouldn't be any need for automatic cleanup.
Jenkins run elasticsearch-ci/packaging-sample
ywelsch left a comment
I've left one comment; once that's addressed, LGTM.
snapshotId,
ActionListener.map(listener, v -> {
    cleanupStaleIndices(foundIndices, survivingIndices);
    // Cleaning up according to repository data before the delete so we don't accidentally identify the two just deleted
    // blobs for the current snapshot as stale.
I find this too subtle, in particular because the cleanupStaleRootFiles method now operates on old repository data, which is not clear when looking at that method itself.
Can you instead do something like the following:
diff --git a/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java b/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
index 8b731f05a39..5c6695a93e0 100644
--- a/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
+++ b/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
@@ -59,6 +59,7 @@ import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
+import org.elasticsearch.common.util.set.Sets;
import org.elasticsearch.common.xcontent.LoggingDeprecationHandler;
import org.elasticsearch.common.xcontent.NamedXContentRegistry;
import org.elasticsearch.common.xcontent.XContentFactory;
@@ -101,6 +102,7 @@ import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
+import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
@@ -394,11 +396,10 @@ public abstract class BlobStoreRepository extends AbstractLifecycleComponent imp
}
// Delete snapshot from the index file, since it is the maintainer of truth of active snapshots
final RepositoryData updatedRepositoryData;
- final RepositoryData repositoryData;
final Map<String, BlobContainer> foundIndices;
final Set<String> rootBlobs;
try {
- repositoryData = getRepositoryData();
+ final RepositoryData repositoryData = getRepositoryData();
updatedRepositoryData = repositoryData.removeSnapshot(snapshotId);
// Cache the indices that were found before writing out the new index-N blob so that a stuck master will never
// delete an index that was created by another master node after writing this index-N blob.
@@ -410,9 +411,10 @@ public abstract class BlobStoreRepository extends AbstractLifecycleComponent imp
return;
}
final SnapshotInfo finalSnapshotInfo = snapshot;
+ final List<String> snapMetaFilesToDelete =
+ Arrays.asList(snapshotFormat.blobName(snapshotId.getUUID()), globalMetaDataFormat.blobName(snapshotId.getUUID()));
try {
- blobContainer().deleteBlobsIgnoringIfNotExists(
- Arrays.asList(snapshotFormat.blobName(snapshotId.getUUID()), globalMetaDataFormat.blobName(snapshotId.getUUID())));
+ blobContainer().deleteBlobsIgnoringIfNotExists(snapMetaFilesToDelete);
} catch (IOException e) {
logger.warn(() -> new ParameterizedMessage("[{}] Unable to delete global metadata files", snapshotId), e);
}
@@ -425,9 +427,8 @@ public abstract class BlobStoreRepository extends AbstractLifecycleComponent imp
snapshotId,
ActionListener.map(listener, v -> {
cleanupStaleIndices(foundIndices, survivingIndices);
- // Cleaning up according to repository data before the delete so we don't accidentally identify the two just deleted
- // blobs for the current snapshot as stale.
- cleanupStaleRootFiles(rootBlobs, repositoryData);
+ // Remove snapMetaFilesToDelete, which have been deleted in a prior step, so that they are not identified as stale
+ cleanupStaleRootFiles(Sets.difference(rootBlobs, new HashSet<>(snapMetaFilesToDelete)), updatedRepositoryData);
return null;
})
);
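The effect of the Sets.difference call in the suggested diff is just to exclude the two explicitly deleted blob names before the stale-blob scan runs. A small self-contained illustration of the same idea, using plain java.util collections instead of the Elasticsearch Sets helper (blob names here are made up for the demo):

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SnapMetaExclusionDemo {
    public static void main(String[] args) {
        Set<String> rootBlobs = new HashSet<>(List.of(
            "index-5", "snap-abc.dat", "meta-abc.dat", "snap-orphan.dat"));
        List<String> snapMetaFilesToDelete = List.of("snap-abc.dat", "meta-abc.dat");

        // Equivalent of Sets.difference(rootBlobs, new HashSet<>(snapMetaFilesToDelete)):
        Set<String> candidates = new HashSet<>(rootBlobs);
        candidates.removeAll(snapMetaFilesToDelete);

        // The just-deleted blobs can no longer be misidentified as stale;
        // only genuinely dangling blobs remain candidates.
        System.out.println(candidates); // e.g. [index-5, snap-orphan.dat] (order may vary)
    }
}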
Sure will do :)
tlrx left a comment
@original-brownbear Thanks for the detailed response; @ywelsch and I also talked about this via another channel. I understand the motivation around this change, and my main concern is about multiple clusters accessing the same repository for writes (this is not supported, but we know many users do it) and how this change could make things even worse by silently deleting data (instead of just leaving orphaned files in the repository).
Anyway, I agree we can move forward with this change. We also think that adding a test (in another PR) that executes multiple concurrent snapshot deletions/creations and then checks that it always leaves the repository in a clean state would be a good idea.
Actually there's a certain beauty to the approach here that makes this less of a concern :) If you look at the code here and for the stale indices cleanup, we always do this:
1. Get the list of all blobs before writing the new index-N blob
2. Write the updated index-N blob
3. Delete those blobs from step 1 that are not referenced by the index-N blob written in step 2
so in step 3 we always bring things back in line with the latest index-N blob. If there was a concurrent action that started between
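Under stated assumptions (an illustrative interface, not the real BlobStoreRepository API), a sketch of that three-step protocol could look like this:

import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the list / write index-N / delete-unreferenced protocol.
// SimpleRepo and its methods are illustrative stand-ins, not the actual API.
final class SafeDeleteSketch {

    interface SimpleRepo {
        Set<String> listRootBlobs();                               // step 1
        Set<String> writeUpdatedIndexWithout(String snapshotUuid); // step 2: returns blobs still referenced
        void deleteBlobs(Set<String> blobNames);                   // step 3
    }

    static void deleteSnapshot(SimpleRepo repo, String snapshotUuid) {
        // Step 1: list all root blobs BEFORE writing the new index-N, so blobs
        // created by a later concurrent operation can never appear in this set.
        Set<String> preExisting = new HashSet<>(repo.listRootBlobs());

        // Step 2: write the updated index-(N+1) that no longer references the snapshot.
        Set<String> stillReferenced = repo.writeUpdatedIndexWithout(snapshotUuid);

        // Step 3: delete only blobs that both existed in step 1 and are not
        // referenced by the index blob written in step 2.
        preExisting.removeAll(stillReferenced);
        repo.deleteBlobs(preExisting);
    }
}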
This leaves only cleaning up unreferenced index metadata blobs (in the index folders) and some rare corner cases of snap- and meta- blobs in shards (older repos in which shard deletes failed).