Description
Identifying snapshots only by name has led to several issues, especially in how snapshots are represented in their underlying storage repositories. For example, if a snapshot is deleted but the deletion fails to remove some of its files from the repository, and a snapshot with the same name is then created, the new snapshot can conflict with or overwrite the leftover files. This is captured in #15579 and #13159.
In addition, snapshot names don't necessarily make good blob names for the snapshot repository. For example, a colon (:) in the snapshot name is legal, but it presents an issue when accessing that snapshot in URI-based repositories. We would therefore have to strip such problematic characters when naming blobs, and as a result the name itself would no longer be a valid way to uniquely identify each snapshot. See issue #7540.
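To illustrate the collision, here is a minimal sketch. The `sanitize` helper, its character set, and the `snap-<name>.dat` blob pattern are hypothetical and only for illustration; they are not the repository's actual naming code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BlobNameCollisionDemo {

    // Hypothetical sanitizer: drops every character outside a conservative,
    // URI-safe set. The real set of problematic characters may differ.
    static String sanitize(String snapshotName) {
        return snapshotName.replaceAll("[^A-Za-z0-9._-]", "");
    }

    public static void main(String[] args) {
        // Two distinct, legal snapshot names.
        String[] names = {"nightly:1", "nightly1"};

        // Maps the derived blob name back to the snapshot that produced it.
        Map<String, String> blobToSnapshot = new LinkedHashMap<>();
        for (String name : names) {
            String blob = "snap-" + sanitize(name) + ".dat"; // assumed pattern
            String previous = blobToSnapshot.put(blob, name);
            if (previous != null) {
                System.out.println("Collision: '" + previous + "' and '" + name
                        + "' both map to blob " + blob);
            }
        }
    }
}
```

Once characters are stripped, two different snapshot names can map to the same blob, so the sanitized name cannot serve as the identifier.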
A solution is to introduce the notion of a UUID for each snapshot. In this way, we can store the UUID along with the name for each snapshot, and the repository should identify, store, and retrieve snapshots using the UUID. UUIDs will also help define the repository semantics more clearly, as discussed in #15580.
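As a rough sketch of the idea, an identifier carrying both the name and a UUID could look like the following. The class name `SnapshotIdSketch`, the use of `java.util.UUID`, and the `snap-<uuid>.dat` blob pattern are assumptions made for illustration, not the eventual implementation.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical stand-in for a snapshot identifier that pairs the
// user-visible name with a generated UUID.
final class SnapshotIdSketch {
    private final String name;
    private final String uuid;

    SnapshotIdSketch(String name, String uuid) {
        this.name = Objects.requireNonNull(name);
        this.uuid = Objects.requireNonNull(uuid);
    }

    // Assigns a fresh UUID at creation time (java.util.UUID is used here
    // only as a convenient generator).
    static SnapshotIdSketch create(String name) {
        return new SnapshotIdSketch(name, UUID.randomUUID().toString());
    }

    String getName() { return name; }

    String getUuid() { return uuid; }

    // Assumed blob naming: derived from the UUID only, so characters in the
    // snapshot name never reach the repository path.
    String blobName() { return "snap-" + uuid + ".dat"; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SnapshotIdSketch)) return false;
        SnapshotIdSketch other = (SnapshotIdSketch) o;
        return name.equals(other.name) && uuid.equals(other.uuid);
    }

    @Override
    public int hashCode() { return Objects.hash(name, uuid); }
}
```

With this shape, deleting a snapshot and recreating one with the same name yields a different identifier and different blob names, so leftover files from the old snapshot cannot be mistaken for the new one.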
This effort can be broken up into the following tasks, each of which builds on the previous ones:
- Define the contract of the BlobContainer API, using Javadocs. (Clarify the semantics of the BlobContainer interface #18157)
- SnapshotInfo and Snapshot represent the same data, so merge all usages into SnapshotInfo and get rid of the Snapshot class. This will help us in naming as well, as you will see below. (Remove the Snapshot class in favor of using SnapshotInfo #18167)
- Modify the notion of a SnapshotId so it includes a UUID. Store the UUID along with the name in the repository's snapshot index file. (Adds UUIDs to snapshots #18228)
- When writing a new snapshot index file to the snapshot repository, ensure that it is an atomic move (similar to how the MetaDataStateFormat class atomically writes the metadata state to disk), and make the snapshot index file generational. (Adding repository index generational files #19002)
- Use the snapshot index file as the source of truth for which snapshots are in the repository and valid, instead of listing snapshot blobs with the snapshot index file as a backup. Treating the blob listing as the source of truth for which snapshots are part of the ES cluster could lead to confusion if a snapshot deletion leaves behind undeleted files. (Adding repository index generational files #19002)
- Use snapshot UUIDs to name blobs in a snapshot repository. (Snapshot UUIDs in blob names #19421)
- Blobs related to indices in the snapshot repository should also be named with a stripped-down version of the index name plus the index UUID, to prevent the same blob naming problems as described above. (Snapshot UUIDs in blob names #19421)
- FsBlobContainer should pass in StandardOpenOption.CREATE_NEW when opening a file output stream, so we never silently truncate or overwrite a file. Since snapshot files are stored by name + UUID, this should no longer be a problem to implement. (BlobContainer#writeBlob no longer can overwrite a blob #19749)
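
Two of the tasks above (writing blobs with CREATE_NEW, and moving a generational index file into place atomically) can be sketched with plain java.nio calls. This is only an illustration under assumptions: the class `RepositoryWriteSketch`, its method names, and the `pending-index-N` / `index-N` file names are hypothetical and are not the actual FsBlobContainer code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

final class RepositoryWriteSketch {

    // Writes a blob and fails with FileAlreadyExistsException if a file with
    // that name is already present, instead of truncating it.
    static void writeBlob(Path dir, String blobName, byte[] data) throws IOException {
        Path file = dir.resolve(blobName);
        try (OutputStream out = Files.newOutputStream(
                file, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            out.write(data);
        }
    }

    // Writes the index for the given generation to a temporary file, then
    // moves it into place in one step so readers never see a partial file.
    // Each generation gets its own file name, so an existing index is not
    // expected at the target path.
    static Path writeIndexGeneration(Path dir, long generation, byte[] data) throws IOException {
        String tmpName = "pending-index-" + generation; // assumed naming
        Path tmp = dir.resolve(tmpName);
        Path target = dir.resolve("index-" + generation); // assumed naming
        writeBlob(dir, tmpName, data);
        // ATOMIC_MOVE throws AtomicMoveNotSupportedException if the file
        // system cannot perform the move atomically.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        return target;
    }
}
```

Calling writeBlob twice with the same blob name makes the second call throw rather than overwrite, which is the behaviour the last task asks for.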