@tlrx tlrx commented Sep 29, 2021

In #77686 we added a service to clean up blob store cache docs after a searchable snapshot is no longer used. We noticed situations where cache docs could still remain in the system index: when the system index is not available at the time the searchable snapshot index is deleted; when the system index is restored from a backup; or when the searchable snapshot index was deleted on a version before #77686.

This pull request introduces a maintenance task that periodically scans and cleans up unused blob store cache docs. This task is scheduled to run every hour on the data node that contains the blob store cache primary shard. The periodic task works by using a point in time context with search_after. I avoided using the Reindex and AbstractBulkByScrollRequest infrastructure as it requires a lot of plumbing, and I wanted this task to be easily modifiable in case we want to upgrade/reindex the system index ourselves in the future.
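For illustration, here is a minimal sketch of the search_after-over-PIT loop (variable names, batch size, and the PIT open/close plumbing are assumptions, not the actual implementation):

// Hypothetical sketch: page through the blob store cache system index using
// a point-in-time context and search_after; the staleness checks are elided.
final PointInTimeBuilder pointInTime = new PointInTimeBuilder(pitId);
pointInTime.setKeepAlive(TimeValue.timeValueMinutes(5L));

Object[] searchAfter = null;
while (true) {
    final SearchSourceBuilder searchSource = new SearchSourceBuilder();
    searchSource.size(100);                           // docs per batch (assumed)
    searchSource.sort("_doc");                        // stable sort for paging
    searchSource.trackTotalHits(searchAfter == null); // only count hits on the first page
    searchSource.pointInTimeBuilder(pointInTime);
    if (searchAfter != null) {
        searchSource.searchAfter(searchAfter);
    }
    final SearchResponse response = client.search(new SearchRequest().source(searchSource)).actionGet();
    final SearchHit[] hits = response.getHits().getHits();
    if (hits.length == 0) {
        break; // no more cache docs to scan
    }
    // ... check each hit against the searchable snapshots known in the
    // cluster state and bulk-delete the docs that are no longer referenced ...
    searchAfter = hits[hits.length - 1].getSortValues();
}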

@tlrx tlrx added the >enhancement, :Distributed Coordination/Snapshot/Restore, v8.0.0 and v7.16.0 labels Sep 29, 2021
@tlrx tlrx changed the title from "Periodic maintenance task" to "Add periodic maintenance task to clean up unused blob store cache docs" Sep 29, 2021
if (Objects.equals(indexId, otherIndexId)) {
return true;
}
if (Objects.equals(snapshotId, SNAPSHOT_SNAPSHOT_ID_SETTING.get(indexSettings))
@tlrx (Member Author)

Note: this method does not check whether the snapshot belongs to the same repository. Since the snapshot UUID and index UUID are unique ids, I think the risk of collision is very limited, and in case of a collision the cached docs are simply not deleted.


final Set<Tuple<String, String>> knownSnapshots = new HashSet<>();
final Set<Tuple<String, String>> missingSnapshots = new HashSet<>();
final Set<String> knownRepositories = state.metadata()
@tlrx (Member Author)

We keep the snapshots around to avoid re-iterating over all indices in the cluster state.


// See {@link BlobStoreCacheService#generateId}
// doc id = {repository name}/{snapshot id}/{snapshot index id}/{shard id}/{file name}/@{file offset}
final String[] parts = Objects.requireNonNull(searchHit.getId()).split("/");
@tlrx (Member Author)

I noticed that the repository name is part of the document id :( Using our own maintenance task here could help us reindex the docs without the repository name, if we think the snapshot UUID + index UUID are unique enough to avoid collisions.
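For reference, a sketch of how such a doc id decomposes (the local variable names are illustrative):

// Doc id format, per BlobStoreCacheService#generateId:
// {repository name}/{snapshot id}/{snapshot index id}/{shard id}/{file name}/@{file offset}
final String[] parts = Objects.requireNonNull(searchHit.getId()).split("/");
final String repositoryName = parts[0]; // the part we would like to drop
final String snapshotUuid = parts[1];
final String indexUuid = parts[2];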

@henningandersen (Contributor) left a comment

I went through the production code changes; I have one concern I would like to clarify before reviewing the rest.

Comment on lines 171 to 174
if (periodicTask == null || periodicTask.isCancelled()) {
schedulePeriodic = true;
schedulePeriodicTask();
}
@henningandersen (Contributor)

nit: since we have stop in its own method, I would prefer to also add a start method.

@tlrx (Member Author)

Sure, I pushed 71ff57a

final Tuple<String, String> snapshot = Tuple.tuple(parts[1], parts[2]);
boolean isMissing = missingSnapshots.contains(snapshot);
boolean isKnown = knownSnapshots.contains(snapshot);
if (isMissing
@henningandersen (Contributor) Sep 30, 2021

Rather than maintain this extra state and loop through the indices for every snapshot we see, I wonder if we could simply find the set of snapshots to keep after opening the PIT? Since the PIT will not return docs indexed after it was opened, that should be relatively safe.

There is a race condition though: we risk deleting entries for a searchable snapshot that is in the process of being mounted but that this node does not know about yet. That race also exists with the current structure.

I can think of a few options:

  1. Remember max seq-no from last round and only allow deleting entries with lower seq-no than that.
  2. Wait 10 sec after opening PIT before proceeding.
  3. Collect the snapshot ids to delete but wait a round before deleting, rechecking that they can still be deleted.
  4. Ignore the issue, put in a comment and accept that we rarely delete a bit too eagerly.
  5. Add a timestamp to each entry, allowing us to not delete entries newer than an hour or so.

Do you have other ideas/options?

@tlrx (Member Author)

Thanks for this feedback. I have always considered this cache a best-effort thing, so in my mind I accepted that we might sometimes delete cache entries too early. Since the task does not run frequently and the cluster state is "refreshed" for every batch of docs to scan, I found that acceptable.

But I do like your suggestion of using a timestamp. It fits well with the current behavior (assuming we list the snapshots to keep once the PIT is opened), and in fact I already added a creation_time field to the docs as I anticipated it would be useful in the future... for cleanup.

So I'd go with listing the snapshots to keep and using the existing timestamp.

@henningandersen (Contributor)

Sounds great. I somehow missed the timestamp (it was not given as an argument to putAsync), but I do see it now. I agree with this path forward.

@tlrx (Member Author)

> I somehow missed the timestamp (was not given as args to putAsync)

I moved the creation time to an argument of putAsync in 6ac5a6e. This is more natural, and it is useful in tests too.

I pushed 70b7666 to compute the existing searchable snapshots before the first bulk request is executed. I updated the test to check that recent cache entries are correctly ignored.

@tlrx tlrx commented Sep 30, 2021

Thanks Henning for your first comments. I updated the code, let me know if you have more feedback.

@henningandersen (Contributor) left a comment

Thanks Tanguy, I left a number of smaller comments, otherwise this looks good.

/**
* The interval at which the periodic cleanup of the blob store cache index is scheduled.
*/
public static final Setting<TimeValue> SNAPSHOT_SNAPSHOT_CLEANUP_INTERVAL_SETTING = Setting.timeSetting(
@henningandersen (Contributor)

Let us document these settings as part of this PR.

@tlrx (Member Author)

I have no idea where the documentation for these settings should go. searchable-snapshots/index.asciidoc is more of a high-level explanation of the feature. Maybe in the Mount API docs, since that is what triggers the creation of the system index?

@henningandersen (Contributor)

I think we can put it in the high-level description for now (which also contains the cache size settings). Maybe @jrodewig can then figure out if a new settings page should be added as a follow-up?

@jrodewig (Contributor) Oct 1, 2021

Thanks @tlrx @henningandersen.

I agree that adding this setting to the high-level page is okay for now. That page currently includes docs for some of the cache-related settings, like xpack.searchable.snapshot.shared_cache.size.

I'll defer to @debadair on whether to create a separate page in the long term. She's the docs lead for things related to searchable snapshots and data tiers.

@tlrx (Member Author)

Thanks both! I pushed 8909ab8, suggestions welcome


if (searchResponse == null) {
final SearchSourceBuilder searchSource = new SearchSourceBuilder();
searchSource.trackScores(false);
@henningandersen (Contributor)

Should we add searchSource.trackTotalHits(searchAfter == null)?

@tlrx (Member Author)

Makes sense, yes. I pushed 2a84b4a

Comment on lines 466 to 467
searchSource.pointInTimeBuilder(pointInTime);
pointInTime.setKeepAlive(keepAlive);
@henningandersen (Contributor)

Perhaps swap the order of these two lines?

@tlrx (Member Author)

Sure, pushed in 2a84b4a

Comment on lines 470 to 472
if (searchAfter != null) {
searchSource.searchAfter(searchAfter);
}
@henningandersen (Contributor)

Can we move this before creating the search request?

@tlrx (Member Author)

Sure, pushed in 2a84b4a

.minus(retention.duration(), retention.timeUnit().toChronoUnit());

// compute the list of existing searchable snapshots once
Map<String, Set<String>> knownSnapshots = existingSnapshots;
@henningandersen (Contributor)

I think we can make this a field to avoid collecting this for every search round?

@tlrx (Member Author)

I think it was already computed once. But I reworked this a bit in 15ca8cd where the expiration time, snapshots and repositories are all computed once.

final BulkRequest bulkRequest = new BulkRequest();

final TimeValue retention = periodicTaskRetention;
final Instant expirationTime = Instant.ofEpochMilli(threadPool.absoluteTimeInMillis())
@henningandersen (Contributor)

We can also make this a field. At least we should take the current time before taking the state (though this is best effort, so it really does not matter).

@tlrx (Member Author)

Makes sense. I made expirationTime a field and used it as a trigger to compute the other fields (snapshots and repositories) in 15ca8cd
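A sketch of what that looks like (names assumed; getCreationTime is taken to be the existing helper that parses the creation_time field into an Instant):

// Computed once per maintenance round: cache entries created after this
// cut-off are too recent to delete, guarding against a race with a
// searchable snapshot that is concurrently being mounted.
private Instant computeExpirationTime() {
    final TimeValue retention = periodicTaskRetention;
    return Instant.ofEpochMilli(threadPool.absoluteTimeInMillis())
        .minus(retention.duration(), retention.timeUnit().toChronoUnit());
}

// A cache entry only becomes a deletion candidate once it is older than the cut-off.
private boolean isCandidateForDeletion(SearchHit searchHit) {
    return getCreationTime(searchHit).isBefore(expirationTime);
}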

Comment on lines 546 to 547
deleteRequest.setIfSeqNo(searchHit.getSeqNo());
deleteRequest.setIfPrimaryTerm(searchHit.getPrimaryTerm());
@henningandersen (Contributor)

I wonder if this is necessary. Not that it does any harm, but we do not retry on failure, and I think it would be correct without it. It will obviously be sorted out on the next round in case the delete should fail.

@tlrx (Member Author)

I added this to avoid collisions in case of a primary shard failing over. But since we only delete and don't retry or log, it does not make much sense. I removed it in 1d0654f
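For reference, a sketch of the conditional delete that was removed (the index constant name is assumed; it also requires seqNoAndPrimaryTerm(true) on the search source so that hits carry the values):

// Removed variant: the delete only succeeds if the doc still has the
// seq_no/primary_term it had when read, so a concurrent write (e.g. after
// a primary fail-over) fails with a version conflict instead of removing
// a newer doc.
final DeleteRequest deleteRequest = new DeleteRequest(SNAPSHOT_BLOB_CACHE_INDEX).id(searchHit.getId());
deleteRequest.setIfSeqNo(searchHit.getSeqNo());
deleteRequest.setIfPrimaryTerm(searchHit.getPrimaryTerm());
bulkRequest.add(deleteRequest);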

threadPool.generic().execute(maintenanceTask);
}

private void complete(PeriodicMaintenanceTask maintenanceTask, @Nullable Exception failure) {
@henningandersen (Contributor)

Could this not be a method on PeriodicMaintenanceTask to avoid the maintenanceTask parameter here and the dereferences of it in the implementation?

@tlrx (Member Author)

Sure, I pushed b009c35.

assertAcked(client().admin().cluster().prepareDeleteRepository("repo"));
ensureClusterStateConsistency();

refreshSystemIndex(false);
@henningandersen (Contributor)

Is this line necessary?

@tlrx (Member Author)

Looks like a leftover from debugging the test. It is not required, as the system index is deleted beforehand. I removed it in f963685

return; // state not fully recovered
}
final ShardRouting primary = systemIndexPrimaryShard(state);
if (primary == null || Objects.equals(state.nodes().getLocalNodeId(), primary.currentNodeId()) == false) {
@henningandersen (Contributor)

I wonder if we should also check that the shard is active here?

@tlrx (Member Author)

Yes, we should. I thought I had added this in #77686 🤔 I pushed 03d178b

@tlrx tlrx requested a review from henningandersen October 1, 2021 11:35
@tlrx tlrx commented Oct 1, 2021

@henningandersen This is ready for another round of review; I'm still trying to figure out where to add the docs for the new settings.

@henningandersen (Contributor) left a comment

LGTM, thanks for the docs and extra iteration.

maintenanceTask.searchAfter = null;
executeNext(maintenanceTask);
if (searchAfter == null) {
PeriodicMaintenanceTask.this.total.compareAndSet(0L, response.getHits().getTotalHits().value);
@henningandersen (Contributor)

Can we instead assert that total == 0 and then set the value?

@tlrx (Member Author)

Sure
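Roughly, a sketch of the suggested shape, reusing the names from the snippet above:

// The total is only tracked on the first page (searchAfter == null), so it
// must not have been set yet; assert that instead of a silent compareAndSet.
if (searchAfter == null) {
    assert PeriodicMaintenanceTask.this.total.get() == 0L : "total hits already set";
    PeriodicMaintenanceTask.this.total.set(response.getHits().getTotalHits().value);
}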


if (searchResponse == null) {
final SearchSourceBuilder searchSource = new SearchSourceBuilder();
searchSource.fetchField(CachedBlob.CREATION_TIME_FIELD);
@henningandersen (Contributor)

Can we use a new FieldAndFormat(CachedBlob.CREATION_TIME_FIELD, "epoch_millis") to avoid parsing the string to long in getCreationTime?

@tlrx (Member Author)

Yes, but it seems that the fields API still returns a String to be parsed.
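A sketch of that variant (the fetched value is still a formatted String, just one that is now trivially parseable):

// Request the creation time rendered as epoch millis...
searchSource.fetchField(new FieldAndFormat(CachedBlob.CREATION_TIME_FIELD, "epoch_millis"));

// ...but the fields API hands back the formatted value as a String, so
// getCreationTime still parses it, just with Long.parseLong instead of a
// date parser.
final DocumentField field = searchHit.field(CachedBlob.CREATION_TIME_FIELD);
final Instant creationTime = Instant.ofEpochMilli(Long.parseLong((String) field.getValue()));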

@tlrx tlrx merged commit 63d663e into elastic:master Oct 4, 2021
@tlrx tlrx deleted the periodic-maintenance-task branch October 4, 2021 11:16
@tlrx tlrx commented Oct 4, 2021

Thanks Henning and James!

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Oct 4, 2021
Add periodic maintenance task to clean up unused blob store cache docs (elastic#78438)

elasticsearchmachine pushed a commit that referenced this pull request Oct 4, 2021
Add periodic maintenance task to clean up unused blob store cache docs (#78610)

* Add periodic maintenance task to clean up unused blob store cache docs (#78438)

* fix