
Conversation


@joshua-adams-1 joshua-adams-1 commented Sep 10, 2025

Skip unnecessary loading of IndexMetadata during snapshot deletion

During snapshot deletion we load the metadata for an index into heap purely to calculate the shard count. On nodes with small heaps, loading multiple such objects concurrently can cause the node to run out of memory (OOM).

Rather than recomputing the shard count for the same index across multiple snapshots, this change stores the shard count in a map, so we no longer load multiple IndexMetadata objects into heap for the same index.

This means that for commands such as:

assertAcked(client().admin().cluster().prepareDeleteSnapshot(timeout, repoName, snapshotName1, snapshotName2).get())

assuming both snapshots include Index1, we only load Index1's metadata once.

For commands such as:

assertAcked(client().admin().cluster().prepareDeleteSnapshot(timeout, repoName, snapshotName1).get())
assertAcked(client().admin().cluster().prepareDeleteSnapshot(timeout, repoName, snapshotName2).get())

since these are different delete-snapshot requests, we load Index1's metadata once for each request.
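
A minimal sketch of the deduplication idea (loadIndexMetadata is a hypothetical stand-in for the real metadata blob read, not the actual code):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Cache the shard count per index UUID so the (potentially large)
// IndexMetadata blob is read at most once per delete operation.
final Map<String, Integer> indexUUIDToShardCountMap = new ConcurrentHashMap<>();

int shardCountFor(String indexUUID) {
    // The blob read only happens on the first lookup for a given index UUID;
    // loadIndexMetadata is hypothetical here.
    return indexUUIDToShardCountMap.computeIfAbsent(
        indexUUID,
        uuid -> loadIndexMetadata(uuid).getNumberOfShards()
    );
}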

Relates to: #131822

Closes ES-12539
Closes ES-12538
Closes elastic#100569

elasticsearchmachine and others added 3 commits September 10, 2025 10:47

-     + indexMetaData.getMappingVersion()
-     + "-"
-     + indexMetaData.getAliasesVersion();
+ return indexMetaData.getIndexUUID() + DELIMITER + indexMetaData.getSettings()
Contributor Author

I prefer the other format, but Spotless keeps reformatting it to this.

@joshua-adams-1 joshua-adams-1 added >non-issue :Distributed Coordination/Distributed A catch all label for anything in the Distributed Coordination area. Please avoid if you can. labels Sep 10, 2025
Comment on lines 177 to 178
// Cannot use - as a delimiter because getIndexUUID() produces UUIDs with - in them
private static final String DELIMITER = "/";
Contributor

For better or for worse I don't think we can change this. There are repositories out there in production that use - as their delimiter; we can't just ignore them like this.

Contributor Author

AFAICT this identifier is used as a UUID for the index metadata object, not for a repository.

Contributor Author

That being said, when I think about it, if this code is deployed to a cluster already with a repository, and already with index metadata classes instantiated (with - inside their UUIDs) this code would break, right?

Contributor

Yes exactly.

/**
* Maps the Index UUID to its shard count
*/
private final ConcurrentMap<String, Integer> indexUUIDToShardCountMap = new ConcurrentHashMap<>();
Contributor

This is going to grow to enormous size. I'd much prefer to scope this tracking down to just a single IndexSnapshotsDeletion which gets discarded as soon as we've finished processing that index.

Comment on lines 1294 to 1295
for (SnapshotId snapshotId : snapshotIds.stream().filter(snapshotsWithIndex::contains).collect(Collectors.toSet())) {
snapshotExecutor.execute(ActionRunnable.run(listeners.acquire(), () -> {
Contributor

All the IDs in snapshotIds are already unique so there's no need to build a separate Set if we're doing it like this. But I don't think we should remove the .map() call here, we should process each indexMetadataId once, rather than spawning a task for every single snapshot.
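
A sketch of the suggested shape (getOneShardCount comes from the diff quoted later in this review; the stream wiring here is illustrative, not the exact code):

// Map each snapshot to its index metadata blob ID, de-duplicate, and fork
// one task per distinct blob rather than one task per snapshot.
snapshotIds.stream()
    .filter(snapshotsWithIndex::contains)
    .map(snapshotId -> indexMetaDataGenerations.indexMetaBlobId(snapshotId, indexId))
    .distinct()
    .forEach(blobId -> snapshotExecutor.execute(
        ActionRunnable.run(listeners.acquire(), () -> getOneShardCount(blobId))
    ));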

Contributor Author

I dropped the map since I need to use the snapshotId to generate two subsequent IDs. I'm not sure how I can (cleanly) write the for loop to include the map, but also generate the required indexMetadataId from the snapshotId

Contributor

Are you sure you need to call snapshotIndexMetadataIdentifier yourself? I think calling indexMetaDataGenerations().indexMetaBlobId() is sufficient. If the ID that this returns doesn't contain an index UUID and all the other junk (i.e. it's just the plain snapshot UUID) then you have to load it.

@DaveCTurner
Contributor

> would not load any index metadata into heap, assuming the repository was not deleted.

I think the safest way to do this would be to record the (max) shard count for each index in the RepositoryData stored in the repository. We can't reasonably keep it in memory on the master, updating it as we finalize snapshots, because then it'd be lost on a master failover.

@joshua-adams-1 joshua-adams-1 marked this pull request as ready for review October 22, 2025 08:26
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Oct 22, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Contributor

@DaveCTurner DaveCTurner left a comment

Some comments on the production code. I skimmed through the tests and they look to be doing the right sort of thing, will come back to them in a later cycle.

// unnecessary to read multiple metadata blobs corresponding to the same index UUID.
// TODO Skip this unnecessary work? Maybe track the shard count in RepositoryData?
snapshotExecutor.execute(ActionRunnable.run(listeners.acquire(), () -> getOneShardCount(indexMetaGeneration)));
snapshotExecutor.execute(ActionRunnable.run(listeners.acquire(), () -> {
Contributor

I'd rather we didn't fork the no-op tasks - can we check the index UUID first?

/**
* Maps the Index UUID to its shard count
*/
private final ConcurrentMap<String, Integer> indexUUIDToShardCountMap = new ConcurrentHashMap<>();
Contributor

No need for a map here, we're looking for the max so we can just update the shardCount field as needed.
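
Since the shard-count tasks run concurrently (see the next thread), a single field would need atomic max semantics; a minimal sketch, reusing the updateShardCount name from the diff below:

import java.util.concurrent.atomic.AtomicInteger;

// Track only the running maximum shard count instead of a whole map.
final AtomicInteger shardCount = new AtomicInteger(0);

void updateShardCount(int newCount) {
    // Atomically keep the larger of the current and new values.
    shardCount.accumulateAndGet(newCount, Math::max);
}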

Comment on lines 1325 to 1327
// indexUUIDToShardCountMap is shared across all threads. Therefore, while there may be an entry for this
// UUID in the map, there is no guarantee that we've encountered it in this thread.
updateShardCount(indexUUIDToShardCountMap.get(indexUUID));
Contributor

This isn't necessary, we don't read the shardCount field until all these tasks have finished.

// index UUID; the shard count is going to be the same for all metadata with the same index UUID, so it is
// unnecessary to read multiple metadata blobs corresponding to the same index UUID.
String indexUUID = originalRepositoryData.indexMetaDataGenerations().convertBlobIdToIndexUUID(blobId);
if (indexUUIDToShardCountMap.containsKey(indexUUID) == false) {
Contributor

This still risks duplicates since we call containsKey separately from put. Once we're just tracking the set of the index UUIDs we can just call add here which returns a flag indicating whether the item was newly added.
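
A sketch of the set-based approach being suggested (names assumed from the surrounding diff):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// A concurrent set: add() returns false if the UUID was already present,
// so check-and-insert is a single atomic step with no containsKey/put race.
final Set<String> seenIndexUUIDs = ConcurrentHashMap.newKeySet();

if (seenIndexUUIDs.add(indexUUID)) {
    // First time we've seen this index UUID: read its metadata blob.
}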

// NB since 7.9.0 we deduplicate index metadata blobs, and one of the components of the deduplication key is the
// index UUID; the shard count is going to be the same for all metadata with the same index UUID, so it is
// unnecessary to read multiple metadata blobs corresponding to the same index UUID.
String indexUUID = originalRepositoryData.indexMetaDataGenerations().convertBlobIdToIndexUUID(blobId);
Contributor

This will be null if the index metadata blob is in the pre-7.9.0 format, but we have to load all such blobs. Not sure if the map accepts a null key, but even if it does it won't accept more than one of them.

Comment on lines 209 to 210
for (Map.Entry<String, String> entry : this.identifiers.entrySet()) {
if (Objects.equals(entry.getValue(), blobId)) {
Contributor

Rather than a linear scan for each blob ID, could we compute a Map<String,String> which inverts identifiers and use that for all the lookups?
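
A minimal sketch of the inversion (this assumes blob IDs are unique values in identifiers, since Collectors.toMap throws on duplicate keys):

import java.util.Map;
import java.util.stream.Collectors;

// Build the reverse map (blobId -> identifier) once, then do O(1) lookups
// instead of a linear scan per blob ID.
Map<String, String> blobIdToIdentifier = identifiers.entrySet()
    .stream()
    .collect(Collectors.toMap(Map.Entry::getValue, Map.Entry::getKey));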

// The uniqueIdentifier was built in buildUniqueIdentifier, is of the format IndexUUID-String-long-long-long,
// and uses '-' as a delimiter.
// The below regex accounts for the fact that the IndexUUID can also contain the '-' character
Pattern pattern = Pattern.compile("^(.*?)-[^-]+-\\d+-\\d+-\\d+$");
Contributor

The index UUID (and the history UUID) are fixed-length. Can we just split the string at the appropriate index?

Contributor Author

@joshua-adams-1 joshua-adams-1 Oct 23, 2025

AFAICT the length is set to 22, or set to _na_ if unknown. Will simplify this accordingly.

- Remove unreferenced method
- Rename convertBlobIdToIndexUUID to getIndexUUIDFromBlobId
- Tweak comments
Contributor

@DaveCTurner DaveCTurner left a comment

Sorry got to dash but I saw a couple more things. Will give it a more thorough look next week.

/**
* Map of blob uuid to index metadata identifier. This is a reverse lookup of the identifiers map.
*/
final Map<String, String> blobUuidToIndexMetadataMap;
Contributor

We only need this for a snapshot deletion, and it could be pretty big, so I'd suggest we don't pre-compute it for every IndexMetaDataGenerations instance. Instead, let's have a method that computes and returns this map on demand which the deletion calls once up-front.
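
Something like the following shape, perhaps (indexUuidFromIdentifier is a hypothetical helper; per the later comment in this review, storing just the index UUID as the value also shrinks the map):

import java.util.Map;
import java.util.stream.Collectors;

// Computed on demand, not stored as a field, so only a deletion pays the
// cost of building this map, and pays it once up-front.
public Map<String, String> getBlobIdToIndexUuidMap() {
    return identifiers.entrySet()
        .stream()
        .collect(Collectors.toMap(
            Map.Entry::getValue,                      // blob ID
            e -> indexUuidFromIdentifier(e.getKey())  // hypothetical: extract the UUID prefix
        ));
}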

Comment on lines 234 to 235
if (na) {
return ClusterState.UNKNOWN_UUID;
Contributor

I don't think this is possible for a blob that actually ended up in the repository - we use _na_ for the index UUID only if the index hasn't been created yet, but in that case we couldn't possibly have snapshotted it.

return ClusterState.UNKNOWN_UUID;
}
assert uniqueIdentifier.length() >= 22;
return uniqueIdentifier.substring(0, 22);
Contributor

Rather than doing this parsing on demand I'd suggest we put the substring in the blobUuidToIndexMetadataMap right away. That'll reduce the size of all the values in this map by more than half.

Comment on lines 207 to 210
// Map of blob id to index uuid. This is a reverse lookup of the identifiers map.
final Map<String, String> blobUuidToIndexUUIDMap = identifiers.entrySet()
.stream()
.collect(
Contributor

This is going to walk identifiers and construct a new complete map for every single call to getIndexUUIDFromBlobId (of which there may be thousands++). We need to construct this map once at the start of the snapshot deletion process and then use it for each lookup.

Comment on lines 414 to 415
// Wait for the cluster to be green after deletion
ensureGreen();
Contributor

Hmm this seems like a surprising side-effect of this method: if the cluster was green before the delete then it'll be green immediately after (no shards will have become unassigned) and conversely if the cluster wasn't green first then maybe we shouldn't wait for it to become green here.

// NB if the index metadata blob is in the pre-7.9.0 format then this will return null
String indexUUID = originalRepositoryData.indexMetaDataGenerations().getIndexUUIDFromBlobId(blobId);

// Without an index UUID, we don't know if we've encountered this index before and must read it's IndexMetadata
Contributor

grammar nit

Suggested change
// Without an index UUID, we don't know if we've encountered this index before and must read it's IndexMetadata
// Without an index UUID, we don't know if we've encountered this index before and must read its IndexMetadata

Contributor

@DaveCTurner DaveCTurner left a comment

Given the massive savings we've been able to find in #137210 I wonder if we should defer this for now. Computing blobIdToIndexUuidMap is kinda expensive, even if we do it just once, and its expense scales with the size of the repository rather than the number of indices being deleted; if we're only deleting a few indices then most of that expense is wasted. We can find ways to refine this for sure, and it's kinda nice that this might reduce the number of blobs we read during a large delete, but I don't think this is critical once #137210 lands and I worry that it might end up causing more problems than it solves.

private void determineShardCount(ActionListener<Void> listener) {
try (var listeners = new RefCountingListener(listener)) {
for (final var indexMetaGeneration : snapshotIds.stream()
Map<String, String> blobIdToIndexUuidMap = originalRepositoryData.indexMetaDataGenerations().getBlobIdToIndexUuidMap();
Contributor

This gets called once per index, which could still be (hundreds-of-)thousands of times, but it returns the same value each time. It really needs to be done once for the entire snapshot deletion process.

Contributor Author

I've made the above suggestion irrespective of whether we merge this change 👍

@joshua-adams-1
Contributor Author

> I wonder if we should defer this for now

@DaveCTurner I agree that there is a computation cost in generating the blobIdToIndexUuidMap, even if this computation is moved into the SnapshotsDeletion class and therefore only computed once per snapshot deletion. However, for large deletes, there will be a benefit in reducing the number of reads into heap, especially given that #137210 only reduces allocations and not the amount of time/work during parsing (since we still have to read to the end of the object).

Therefore, I don't want to introduce "magic numbers", but is it worth considering logic of "if this is a big delete then compute this map so we don't have to keep reading from heap memory, else set this map to null so that we're forced to read from the heap every time, but we don't care because we're only deleting 3 indices and it's quicker that way"? We could use a setting for this value, whatever we decide that to be.

/**
* A map of blob id to index UUID
*/
private final Map<String, String> blobIdToIndexUuidMap;
Contributor

As a field here we will be retaining this map until the very end of the post-deletion cleanup when the SnapshotsDeletion instance becomes unreachable, but that will overlap with the next snapshot operation (which may be another deletion, generating another such map, etc).

However, we only need this map for the determineShardCount calls at the start of each IndexSnapshotsDeletion, all of which happen before we update the RepositoryData root blob. I'd rather we dropped this potentially-large map as soon as possible, and definitely before allowing the next snapshot operation to proceed. It'd be best if it were a local variable computed in writeUpdatedShardMetadataAndComputeDeletes and passed as an argument to each determineShardCount call via IndexSnapshotsDeletion#run.

@DaveCTurner
Contributor

> is it worth considering logic of "if this is a big delete then compute this map so we don't have to keep reading from heap memory, else set this map to null so that we're forced to read from the heap every time, but we don't care because we're only deleting 3 indices and it's quicker that way"?

Right yeah if we really wanted to do this that would be something worth considering. The time taken to generate this map is less of a concern to me than the failures it might cause: with this change a tiny master may now go OOM during every snapshot delete if the repository has so many indices that this map's size becomes overwhelming. We could make it conditional (maybe even a function of the heap space available on the master and the size of the repository) and that might have been worthwhile when we started but the extra complexity carries its own costs and #137210 reduces the marginal benefits of this change.

@ywangd
Member

ywangd commented Nov 12, 2025

> #137210 reduces the marginal benefits of this change.

I was wondering whether we still need this PR. My hunch is no. It might be helpful in certain theoretical cases. But still probably better to address them when actually observed?

@joshua-adams-1
Contributor Author

Closing as per the above conversation

Labels

:Distributed Coordination/Distributed A catch all label for anything in the Distributed Coordination area. Please avoid if you can. >non-issue Team:Distributed Coordination Meta label for Distributed Coordination team v9.3.0
