Add peer recoveries using snapshot files when possible #76237

fcofdez · 2021-08-09T12:18:57Z

This commit adds peer recoveries using snapshots when it's possible to
reuse existing snapshot files.

Relates #73496

elasticmachine · 2021-08-09T12:18:59Z

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner

Great, I've done a first pass and left a few small comments but overall this looks to be the right sort of shape.

qa/snapshot-based-recoveries/fs/build.gradle

server/src/internalClusterTest/java/org/elasticsearch/indices/recovery/IndexRecoveryIT.java

server/src/main/java/org/elasticsearch/indices/recovery/MultiFileWriter.java

server/src/main/java/org/elasticsearch/indices/recovery/RecoverySettings.java

server/src/main/java/org/elasticsearch/indices/recovery/RecoverySourceHandler.java

server/src/main/java/org/elasticsearch/indices/recovery/SnapshotFilesProvider.java

server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java

server/src/test/java/org/elasticsearch/index/replication/RecoveryDuringReplicationTests.java

henningandersen

This is looking good. I did a first read of the production code and here are my initial comments. The main ones are about failure handling.

server/src/main/java/org/elasticsearch/indices/recovery/RecoverySourceHandler.java

server/src/main/java/org/elasticsearch/indices/recovery/MultiFileWriter.java

server/src/main/java/org/elasticsearch/indices/recovery/RecoverySnapshotFileRequest.java

server/src/main/java/org/elasticsearch/indices/recovery/RecoveryState.java

server/src/main/java/org/elasticsearch/indices/recovery/RecoveryTargetHandler.java

server/src/test/java/org/elasticsearch/indices/recovery/PeerRecoveryTargetServiceTests.java

server/src/test/java/org/elasticsearch/indices/recovery/RecoverySourceHandlerTests.java

DaveCTurner · 2021-08-10T13:06:58Z

qa/rolling-upgrade/src/test/java/org/elasticsearch/upgrades/SnapshotBasedRecoveryIT.java

+import static org.hamcrest.Matchers.lessThan;
+
+public class SnapshotBasedRecoveryIT extends AbstractRollingTestCase {
+    public void testSnapshotBasedRecovery() throws Exception {


I'm guessing there's not yet a way to check that we actually do use the snapshot for recovery here? But there will be when we enhance the recovery API.

qa/rolling-upgrade/src/test/java/org/elasticsearch/upgrades/SnapshotBasedRecoveryIT.java

...nternalClusterTest/java/org/elasticsearch/indices/recovery/SnapshotBasedIndexRecoveryIT.java

qa/snapshot-based-recoveries/fs/build.gradle

DaveCTurner · 2021-08-10T13:18:02Z

Oh weird it seems you got some review comments one-by-one and then my summary comment got lost. Anyway, mostly a few smaller things remaining. I'll look over Henning's questions now.

henningandersen · 2021-08-12T13:26:54Z

...nternalClusterTest/java/org/elasticsearch/indices/recovery/SnapshotBasedIndexRecoveryIT.java

+            );
+
+            // We're setting the rate limiting at 50kb/s, meaning that
+            // we need an index with size > 50kb


I think we theoretically only need more than 256 bytes, since SimpleRateLimiter.MIN_PAUSE_CHECK_MSEC=5. We do need a bit more though, to leave enough time in case the network and CI are generally slow: if the observed download rate is below 50KB/s there will be no throttling. I would go for at least 4x that to be somewhat safe against things like a single GC.

So 1000 docs minimum looks fine, but perhaps you want to update the comment a bit.
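The 256-byte figure can be reproduced with a little arithmetic. The sketch below assumes the limiter checks for a pause once every `MIN_PAUSE_CHECK_MSEC` milliseconds' worth of bytes at the configured rate, and that "50kb/s" means 50 * 1024 bytes per second; the class and method names are hypothetical and not part of the PR.

```java
// Hypothetical helper, not code from the PR: how many bytes pass between two
// pause checks at a given rate limit, assuming bytes = rate * check interval.
final class RateLimitArithmetic {

    static long bytesPerPauseCheck(long bytesPerSecond, long minPauseCheckMillis) {
        return bytesPerSecond * minPauseCheckMillis / 1000;
    }

    public static void main(String[] args) {
        long rateLimit = 50L * 1024; // 50kb/s, as in the test comment
        long minPauseCheckMillis = 5; // SimpleRateLimiter.MIN_PAUSE_CHECK_MSEC, as quoted above
        System.out.println(bytesPerPauseCheck(rateLimit, minPauseCheckMillis) + " bytes per pause check");
    }
}
```

Anything smaller than this could, under the stated assumption, be transferred before the limiter ever gets a chance to pause, which is why the index in the test needs to be comfortably larger.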

henningandersen · 2021-08-12T13:35:15Z

server/src/main/java/org/elasticsearch/indices/recovery/RecoverySourceHandler.java

+        }
+
+        private void trackOutstandingRequest(ListenableFuture<Void> future) {
+            boolean cancelled = false;


nit: initialization is unused:

Suggested change:

-            boolean cancelled = false;
+            boolean cancelled;

henningandersen · 2021-08-12T13:37:44Z

server/src/main/java/org/elasticsearch/indices/recovery/RecoverySourceHandler.java

+            boolean cancelled = false;
+            synchronized (outstandingRequests) {
+                outstandingRequests.add(future);
+                cancelled = cancellableThreads.isCancelled();


I think we should check the cancelled flag first and only add to outstandingRequests if cancelled==false? Otherwise we risk adding a request to outstandingRequests and then never responding to it, since we bail out during checkForCancel below.
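A minimal sketch of the ordering suggested here, not the code that was merged: it uses the JDK's `CompletableFuture` in place of `ListenableFuture`, keeps the cancelled flag inside the tracker rather than in `cancellableThreads`, and every name in it is hypothetical. The point is only that the flag is read under the same lock before the future is added, so a request is either tracked (and later completed by `cancel`) or rejected immediately.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for the request tracking discussed above.
final class OutstandingRequestTracker {
    private final Set<CompletableFuture<Void>> outstandingRequests = new HashSet<>();
    private boolean cancelled; // guarded by outstandingRequests

    // Track the future only if the recovery has not been cancelled yet.
    void track(CompletableFuture<Void> future) {
        synchronized (outstandingRequests) {
            if (cancelled == false) {
                outstandingRequests.add(future);
                return;
            }
        }
        // Never added, so nothing is left waiting for a response.
        throw new CancellationException("recovery was cancelled");
    }

    // Mark as cancelled and fail everything that was tracked before the flag flipped.
    void cancel() {
        final Set<CompletableFuture<Void>> toFail;
        synchronized (outstandingRequests) {
            cancelled = true;
            toFail = new HashSet<>(outstandingRequests);
            outstandingRequests.clear();
        }
        for (CompletableFuture<Void> future : toFail) {
            future.completeExceptionally(new CancellationException("recovery was cancelled"));
        }
    }

    int outstandingCount() {
        synchronized (outstandingRequests) {
            return outstandingRequests.size();
        }
    }
}
```

Because `track` and `cancel` take the same lock, a future cannot slip into the set after `cancel` has drained it.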

...nternalClusterTest/java/org/elasticsearch/indices/recovery/SnapshotBasedIndexRecoveryIT.java

henningandersen

LGTM, just 3 more comments.

fcofdez · 2021-08-12T14:17:06Z

@DaveCTurner do you want to take another look at the PR?

DaveCTurner

LGTM2, thanks Henning for the extra reviews too

This commit adds peer recoveries from snapshots. It allows establishing a replica by downloading file data from a snapshot rather than transferring the data from the primary. Enabling this feature is done on the repository definition. Repositories having the setting `use_for_peer_recovery=true` will be consulted to find a good snapshot when recovering a shard. Relates elastic#73496 Backport of elastic#76237

Randomly enable using a snapshot for recovery in the searchable snapshot and snapshot tests, to verify that recovering from a snapshot does not break other features (those should not care about the flag). Relates elastic#76237

Add test to verify that indexing concurrently with recovery from a snapshot works. Relates elastic#76237

henningandersen · 2021-08-20T07:55:54Z

I completed some benchmarking against this, using a 100GB dataset on i3en_2xlarge. 13 indices/39 shards, with sizes ranging from 40MB to 17.5GB.

~~With standard rate limiting (40MB/s), recovery time went from 249576ms to 175410ms.~~
With no rate limiting, recovery time went from 187219ms to 106938ms.
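To put the no-rate-limit numbers in relative terms, the following sketch computes the percentage reduction from the two timings quoted above. The class and method names are made up for illustration; only the two millisecond values come from the benchmark.

```java
// Hypothetical helper: relative reduction in recovery time between two runs.
final class RecoverySpeedup {

    static double reductionPercent(long baselineMillis, long contenderMillis) {
        return 100.0 * (baselineMillis - contenderMillis) / baselineMillis;
    }

    public static void main(String[] args) {
        // Timings quoted above for the run without rate limiting.
        long baselineMillis = 187_219;
        long contenderMillis = 106_938;
        System.out.printf("recovery time reduced by %.1f%%%n", reductionPercent(baselineMillis, contenderMillis));
    }
}
```

For these two values the reduction comes out at roughly 43%.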

Comparison (interesting measurement is `wait-for-recovered-data-streams`), effectively no ratelimit

Comparing baseline
  Race ID: c3dbfe94-6dbf-4281-888f-3505fed82a33
  Race timestamp: 2021-08-19 13:09:52
  Challenge: logging-snapshot-restore
  Car: external
  User tags: env-id=7bcbb2c7-223f-4a50-a8e3-980f1474acdc

with contender
  Race ID: 888259f5-d21f-4fcd-9861-4baa6873cc31
  Race timestamp: 2021-08-19 12:25:27
  Challenge: logging-snapshot-restore
  Car: external
  User tags: env-id=7bcbb2c7-223f-4a50-a8e3-980f1474acdc

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------
            
|                                                        Metric |                                                  Task |    Baseline |   Contender |         Diff |   Unit |
|--------------------------------------------------------------:|------------------------------------------------------:|------------:|------------:|-------------:|-------:|
|                    Cumulative indexing time of primary shards |                                                       |   0.0258167 |      0.0222 |     -0.00362 |    min |
|             Min cumulative indexing time across primary shard |                                                       |           0 |           0 |            0 |    min |
|          Median cumulative indexing time across primary shard |                                                       |           0 |           0 |            0 |    min |
|             Max cumulative indexing time across primary shard |                                                       |   0.0258167 |      0.0222 |     -0.00362 |    min |
|           Cumulative indexing throttle time of primary shards |                                                       |           0 |           0 |            0 |    min |
|    Min cumulative indexing throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
| Median cumulative indexing throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|    Max cumulative indexing throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|                       Cumulative merge time of primary shards |                                                       |           0 |           0 |            0 |    min |
|                      Cumulative merge count of primary shards |                                                       |           0 |           0 |            0 |        |
|                Min cumulative merge time across primary shard |                                                       |           0 |           0 |            0 |    min |
|             Median cumulative merge time across primary shard |                                                       |           0 |           0 |            0 |    min |
|                Max cumulative merge time across primary shard |                                                       |           0 |           0 |            0 |    min |
|              Cumulative merge throttle time of primary shards |                                                       |           0 |           0 |            0 |    min |
|       Min cumulative merge throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|    Median cumulative merge throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|       Max cumulative merge throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|                     Cumulative refresh time of primary shards |                                                       |     0.00355 |     0.00405 |       0.0005 |    min |
|                    Cumulative refresh count of primary shards |                                                       |         516 |         520 |            4 |        |
|              Min cumulative refresh time across primary shard |                                                       |           0 |           0 |            0 |    min |
|           Median cumulative refresh time across primary shard |                                                       |           0 |           0 |            0 |    min |
|              Max cumulative refresh time across primary shard |                                                       |     0.00355 |     0.00405 |       0.0005 |    min |
|                       Cumulative flush time of primary shards |                                                       |      0.0019 |  0.00213333 |      0.00023 |    min |
|                      Cumulative flush count of primary shards |                                                       |          47 |          48 |            1 |        |
|                Min cumulative flush time across primary shard |                                                       |           0 |           0 |            0 |    min |
|             Median cumulative flush time across primary shard |                                                       |           0 |           0 |            0 |    min |
|                Max cumulative flush time across primary shard |                                                       |      0.0019 |  0.00213333 |      0.00023 |    min |
|                                       Total Young Gen GC time |                                                       |       1.955 |       1.646 |       -0.309 |      s |
|                                      Total Young Gen GC count |                                                       |          53 |          44 |           -9 |        |
|                                         Total Old Gen GC time |                                                       |           0 |           0 |            0 |      s |
|                                        Total Old Gen GC count |                                                       |           0 |           0 |            0 |        |
|                                                    Store size |                                                       |     277.597 |     278.307 |      0.71002 |     GB |
|                                                 Translog size |                                                       | 8.70787e-06 | 8.70787e-06 |            0 |     GB |
|                                        Heap used for segments |                                                       |     14.5983 |     14.5973 |     -0.00103 |     MB |
|                                      Heap used for doc values |                                                       |     1.64378 |     1.64371 |       -7e-05 |     MB |
|                                           Heap used for terms |                                                       |     12.3456 |     12.3452 |     -0.00049 |     MB |
|                                           Heap used for norms |                                                       |   0.0812988 |   0.0812988 |            0 |     MB |
|                                          Heap used for points |                                                       |           0 |           0 |            0 |     MB |
|                                   Heap used for stored fields |                                                       |    0.527626 |    0.527161 |     -0.00047 |     MB |
|                                                 Segment count |                                                       |         881 |         880 |           -1 |        |
|                                                Min Throughput |                                      insert-pipelines |     5.80861 |     5.98248 |      0.17387 |  ops/s |
|                                               Mean Throughput |                                      insert-pipelines |     5.80861 |     5.98248 |      0.17387 |  ops/s |
|                                             Median Throughput |                                      insert-pipelines |     5.80861 |     5.98248 |      0.17387 |  ops/s |
|                                                Max Throughput |                                      insert-pipelines |     5.80861 |     5.98248 |      0.17387 |  ops/s |
|                                      100th percentile latency |                                      insert-pipelines |     2408.22 |     2338.04 |     -70.1868 |     ms |
|                                 100th percentile service time |                                      insert-pipelines |     2408.22 |     2338.04 |     -70.1868 |     ms |
|                                                    error rate |                                      insert-pipelines |           0 |           0 |            0 |      % |
|                                                Min Throughput |                                            insert-ilm |     16.4125 |     22.9607 |      6.54826 |  ops/s |
|                                               Mean Throughput |                                            insert-ilm |     16.4125 |     22.9607 |      6.54826 |  ops/s |
|                                             Median Throughput |                                            insert-ilm |     16.4125 |     22.9607 |      6.54826 |  ops/s |
|                                                Max Throughput |                                            insert-ilm |     16.4125 |     22.9607 |      6.54826 |  ops/s |
|                                      100th percentile latency |                                            insert-ilm |     59.6869 |     41.7628 |     -17.9241 |     ms |
|                                 100th percentile service time |                                            insert-ilm |     59.6869 |     41.7628 |     -17.9241 |     ms |
|                                                    error rate |                                            insert-ilm |           0 |           0 |            0 |      % |
|                                                Min Throughput |                                tune-recovery-settings |     15.2768 |     14.2949 |     -0.98197 |  ops/s |
|                                               Mean Throughput |                                tune-recovery-settings |     15.2768 |     14.2949 |     -0.98197 |  ops/s |
|                                             Median Throughput |                                tune-recovery-settings |     15.2768 |     14.2949 |     -0.98197 |  ops/s |
|                                                Max Throughput |                                tune-recovery-settings |     15.2768 |     14.2949 |     -0.98197 |  ops/s |
|                                      100th percentile latency |                                tune-recovery-settings |     65.0607 |     69.5207 |      4.46002 |     ms |
|                                 100th percentile service time |                                tune-recovery-settings |     65.0607 |     69.5207 |      4.46002 |     ms |
|                                                    error rate |                                tune-recovery-settings |           0 |           0 |            0 |      % |
|                                      100th percentile latency |                                 delete-local-snapshot |     19270.8 |     22.0259 |     -19248.8 |     ms |
|                                 100th percentile service time |                                 delete-local-snapshot |     19270.8 |     22.0259 |     -19248.8 |     ms |
|                                                    error rate |                                 delete-local-snapshot |           0 |         100 |          100 |      % |
|                                                Min Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-3-100 | 7.89989e+08 | 7.77125e+08 | -1.28643e+07 | byte/s |
|                                               Mean Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-3-100 | 7.89989e+08 | 7.77125e+08 | -1.28643e+07 | byte/s |
|                                             Median Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-3-100 | 7.89989e+08 | 7.77125e+08 | -1.28643e+07 | byte/s |
|                                                Max Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-3-100 | 7.89989e+08 | 7.77125e+08 | -1.28643e+07 | byte/s |
|                                      100th percentile latency | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-3-100 |      184909 |      188177 |       3268.2 |     ms |
|                                 100th percentile service time | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-3-100 |      184909 |      188177 |       3268.2 |     ms |
|                                                    error rate | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-3-100 |           0 |           0 |            0 |      % |
|                                                Min Throughput |                               wait-for-local-snapshot | 1.06294e+09 | 1.06882e+09 |  5.87923e+06 | byte/s |
|                                               Mean Throughput |                               wait-for-local-snapshot | 1.06294e+09 | 1.06882e+09 |  5.87923e+06 | byte/s |
|                                             Median Throughput |                               wait-for-local-snapshot | 1.06294e+09 | 1.06882e+09 |  5.87923e+06 | byte/s |
|                                                Max Throughput |                               wait-for-local-snapshot | 1.06294e+09 | 1.06882e+09 |  5.87923e+06 | byte/s |
|                                      100th percentile latency |                               wait-for-local-snapshot |      134503 |      135641 |      1137.86 |     ms |
|                                 100th percentile service time |                               wait-for-local-snapshot |      134503 |      135641 |      1137.86 |     ms |
|                                                    error rate |                               wait-for-local-snapshot |           0 |           0 |            0 |      % |
|                                                Min Throughput |                               set-shards-data-streams |     5.19107 |     5.44916 |      0.25809 |  ops/s |
|                                               Mean Throughput |                               set-shards-data-streams |     5.19107 |     5.44916 |      0.25809 |  ops/s |
|                                             Median Throughput |                               set-shards-data-streams |     5.19107 |     5.44916 |      0.25809 |  ops/s |
|                                                Max Throughput |                               set-shards-data-streams |     5.19107 |     5.44916 |      0.25809 |  ops/s |
|                                      100th percentile latency |                               set-shards-data-streams |     192.202 |     183.085 |     -9.11691 |     ms |
|                                 100th percentile service time |                               set-shards-data-streams |     192.202 |     183.085 |     -9.11691 |     ms |
|                                                    error rate |                               set-shards-data-streams |           0 |           0 |            0 |      % |
|                                                Min Throughput |                       wait-for-recovered-data-streams |    0.997689 |    0.997658 |       -3e-05 |  ops/s |
|                                               Mean Throughput |                       wait-for-recovered-data-streams |    0.997689 |    0.997658 |       -3e-05 |  ops/s |
|                                             Median Throughput |                       wait-for-recovered-data-streams |    0.997689 |    0.997658 |       -3e-05 |  ops/s |
|                                                Max Throughput |                       wait-for-recovered-data-streams |    0.997689 |    0.997658 |       -3e-05 |  ops/s |
|                                      100th percentile latency |                       wait-for-recovered-data-streams |      249576 |      175410 |       -74166 |     ms |
|                                 100th percentile service time |                       wait-for-recovered-data-streams |      249576 |      175410 |       -74166 |     ms |
|                                                    error rate |                       wait-for-recovered-data-streams |           0 |           0 |            0 |      % |
|                                       50th percentile latency |                          create-required-data-streams |     3.89542 |     3.68277 |     -0.21265 |     ms |
|                                       90th percentile latency |                          create-required-data-streams |     14.6611 |     19.0404 |      4.37931 |     ms |
|                                      100th percentile latency |                          create-required-data-streams |     18.5574 |     30.0101 |      11.4527 |     ms |
|                                  50th percentile service time |                          create-required-data-streams |     3.89542 |     3.68277 |     -0.21265 |     ms |
|                                  90th percentile service time |                          create-required-data-streams |     14.6611 |     19.0404 |      4.37931 |     ms |
|                                 100th percentile service time |                          create-required-data-streams |     18.5574 |     30.0101 |      11.4527 |     ms |
|                                                    error rate |                          create-required-data-streams |           0 |           0 |            0 |      % |


-------------------------------
[INFO] SUCCESS (took 0 seconds)
-------------------------------

With no rate limiting

Comparing baseline
  Race ID: 7f906076-9687-4a45-a405-20e3168ae937
  Race timestamp: 2021-08-19 12:55:39
  Challenge: logging-snapshot-restore
  Car: external
  User tags: env-id=7bcbb2c7-223f-4a50-a8e3-980f1474acdc

with contender
  Race ID: 9f305852-2513-496b-bcc0-3c19513008f9
  Race timestamp: 2021-08-19 12:42:29
  Challenge: logging-snapshot-restore
  Car: external
  User tags: env-id=7bcbb2c7-223f-4a50-a8e3-980f1474acdc

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------
            
|                                                        Metric |                                                  Task |    Baseline |   Contender |         Diff |   Unit |
|--------------------------------------------------------------:|------------------------------------------------------:|------------:|------------:|-------------:|-------:|
|                    Cumulative indexing time of primary shards |                                                       |   0.0262833 |   0.0250333 |     -0.00125 |    min |
|             Min cumulative indexing time across primary shard |                                                       |           0 |           0 |            0 |    min |
|          Median cumulative indexing time across primary shard |                                                       |           0 |           0 |            0 |    min |
|             Max cumulative indexing time across primary shard |                                                       |   0.0262833 |   0.0250333 |     -0.00125 |    min |
|           Cumulative indexing throttle time of primary shards |                                                       |           0 |           0 |            0 |    min |
|    Min cumulative indexing throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
| Median cumulative indexing throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|    Max cumulative indexing throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|                       Cumulative merge time of primary shards |                                                       |           0 |           0 |            0 |    min |
|                      Cumulative merge count of primary shards |                                                       |           0 |           0 |            0 |        |
|                Min cumulative merge time across primary shard |                                                       |           0 |           0 |            0 |    min |
|             Median cumulative merge time across primary shard |                                                       |           0 |           0 |            0 |    min |
|                Max cumulative merge time across primary shard |                                                       |           0 |           0 |            0 |    min |
|              Cumulative merge throttle time of primary shards |                                                       |           0 |           0 |            0 |    min |
|       Min cumulative merge throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|    Median cumulative merge throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|       Max cumulative merge throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|                     Cumulative refresh time of primary shards |                                                       |  0.00323333 |  0.00398333 |      0.00075 |    min |
|                    Cumulative refresh count of primary shards |                                                       |         528 |         290 |         -238 |        |
|              Min cumulative refresh time across primary shard |                                                       |           0 |           0 |            0 |    min |
|           Median cumulative refresh time across primary shard |                                                       |           0 |           0 |            0 |    min |
|              Max cumulative refresh time across primary shard |                                                       |  0.00321667 |  0.00398333 |      0.00077 |    min |
|                       Cumulative flush time of primary shards |                                                       |  0.00216667 |  0.00236667 |       0.0002 |    min |
|                      Cumulative flush count of primary shards |                                                       |          49 |          44 |           -5 |        |
|                Min cumulative flush time across primary shard |                                                       |           0 |           0 |            0 |    min |
|             Median cumulative flush time across primary shard |                                                       |           0 |           0 |            0 |    min |
|                Max cumulative flush time across primary shard |                                                       |  0.00216667 |  0.00236667 |       0.0002 |    min |
|                                       Total Young Gen GC time |                                                       |       1.917 |       1.661 |       -0.256 |      s |
|                                      Total Young Gen GC count |                                                       |          53 |          43 |          -10 |        |
|                                         Total Old Gen GC time |                                                       |           0 |           0 |            0 |      s |
|                                        Total Old Gen GC count |                                                       |           0 |           0 |            0 |        |
|                                                    Store size |                                                       |     278.307 |     278.307 |       -2e-05 |     GB |
|                                                 Translog size |                                                       | 8.70787e-06 | 4.71249e-06 |           -0 |     GB |
|                                        Heap used for segments |                                                       |     14.5994 |     14.5983 |     -0.00103 |     MB |
|                                      Heap used for doc values |                                                       |     1.64385 |     1.64378 |       -7e-05 |     MB |
|                                           Heap used for terms |                                                       |     12.3461 |     12.3456 |     -0.00049 |     MB |
|                                           Heap used for norms |                                                       |   0.0812988 |   0.0812988 |            0 |     MB |
|                                          Heap used for points |                                                       |           0 |           0 |            0 |     MB |
|                                   Heap used for stored fields |                                                       |    0.528076 |    0.527611 |     -0.00047 |     MB |
|                                                 Segment count |                                                       |         882 |         881 |           -1 |        |
|                                                Min Throughput |                                      insert-pipelines |      6.5086 |     5.80094 |     -0.70767 |  ops/s |
|                                               Mean Throughput |                                      insert-pipelines |      6.5086 |     5.80094 |     -0.70767 |  ops/s |
|                                             Median Throughput |                                      insert-pipelines |      6.5086 |     5.80094 |     -0.70767 |  ops/s |
|                                                Max Throughput |                                      insert-pipelines |      6.5086 |     5.80094 |     -0.70767 |  ops/s |
|                                      100th percentile latency |                                      insert-pipelines |     2148.89 |      2411.1 |      262.216 |     ms |
|                                 100th percentile service time |                                      insert-pipelines |     2148.89 |      2411.1 |      262.216 |     ms |
|                                                    error rate |                                      insert-pipelines |           0 |           0 |            0 |      % |
|                                                Min Throughput |                                            insert-ilm |     18.9031 |     21.5496 |      2.64653 |  ops/s |
|                                               Mean Throughput |                                            insert-ilm |     18.9031 |     21.5496 |      2.64653 |  ops/s |
|                                             Median Throughput |                                            insert-ilm |     18.9031 |     21.5496 |      2.64653 |  ops/s |
|                                                Max Throughput |                                            insert-ilm |     18.9031 |     21.5496 |      2.64653 |  ops/s |
|                                      100th percentile latency |                                            insert-ilm |     51.1772 |     45.8049 |     -5.37231 |     ms |
|                                 100th percentile service time |                                            insert-ilm |     51.1772 |     45.8049 |     -5.37231 |     ms |
|                                                    error rate |                                            insert-ilm |           0 |           0 |            0 |      % |
|                                                Min Throughput |                                tune-recovery-settings |     15.5724 |     15.5789 |      0.00659 |  ops/s |
|                                               Mean Throughput |                                tune-recovery-settings |     15.5724 |     15.5789 |      0.00659 |  ops/s |
|                                             Median Throughput |                                tune-recovery-settings |     15.5724 |     15.5789 |      0.00659 |  ops/s |
|                                                Max Throughput |                                tune-recovery-settings |     15.5724 |     15.5789 |      0.00659 |  ops/s |
|                                      100th percentile latency |                                tune-recovery-settings |     63.8047 |       63.79 |     -0.01474 |     ms |
|                                 100th percentile service time |                                tune-recovery-settings |     63.8047 |       63.79 |     -0.01474 |     ms |
|                                                    error rate |                                tune-recovery-settings |           0 |           0 |            0 |      % |
|                                                Min Throughput |                                 delete-local-snapshot |   0.0855213 |   0.0414938 |     -0.04403 |  ops/s |
|                                               Mean Throughput |                                 delete-local-snapshot |   0.0855213 |   0.0414938 |     -0.04403 |  ops/s |
|                                             Median Throughput |                                 delete-local-snapshot |   0.0855213 |   0.0414938 |     -0.04403 |  ops/s |
|                                                Max Throughput |                                 delete-local-snapshot |   0.0855213 |   0.0414938 |     -0.04403 |  ops/s |
|                                      100th percentile latency |                                 delete-local-snapshot |     11692.6 |     24099.6 |        12407 |     ms |
|                                 100th percentile service time |                                 delete-local-snapshot |     11692.6 |     24099.6 |        12407 |     ms |
|                                                    error rate |                                 delete-local-snapshot |           0 |           0 |            0 |      % |
|                                                Min Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-3-100 | 8.53651e+08 | 8.17701e+08 | -3.59501e+07 | byte/s |
|                                               Mean Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-3-100 | 8.53651e+08 | 8.17701e+08 | -3.59501e+07 | byte/s |
|                                             Median Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-3-100 | 8.53651e+08 | 8.17701e+08 | -3.59501e+07 | byte/s |
|                                                Max Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-3-100 | 8.53651e+08 | 8.17701e+08 | -3.59501e+07 | byte/s |
|                                      100th percentile latency | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-3-100 |      169879 |      177967 |      8087.84 |     ms |
|                                 100th percentile service time | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-3-100 |      169879 |      177967 |      8087.84 |     ms |
|                                                    error rate | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-3-100 |           0 |           0 |            0 |      % |
|                                                Min Throughput |                               wait-for-local-snapshot | 9.65389e+08 | 1.06126e+09 |  9.58736e+07 | byte/s |
|                                               Mean Throughput |                               wait-for-local-snapshot | 9.65389e+08 | 1.06126e+09 |  9.58736e+07 | byte/s |
|                                             Median Throughput |                               wait-for-local-snapshot | 9.65389e+08 | 1.06126e+09 |  9.58736e+07 | byte/s |
|                                                Max Throughput |                               wait-for-local-snapshot | 9.65389e+08 | 1.06126e+09 |  9.58736e+07 | byte/s |
|                                      100th percentile latency |                               wait-for-local-snapshot |      148677 |      138000 |     -10677.2 |     ms |
|                                 100th percentile service time |                               wait-for-local-snapshot |      148677 |      138000 |     -10677.2 |     ms |
|                                                    error rate |                               wait-for-local-snapshot |           0 |           0 |            0 |      % |
|                                                Min Throughput |                               set-shards-data-streams |     6.22509 |      6.2704 |      0.04531 |  ops/s |
|                                               Mean Throughput |                               set-shards-data-streams |     6.22509 |      6.2704 |      0.04531 |  ops/s |
|                                             Median Throughput |                               set-shards-data-streams |     6.22509 |      6.2704 |      0.04531 |  ops/s |
|                                                Max Throughput |                               set-shards-data-streams |     6.22509 |      6.2704 |      0.04531 |  ops/s |
|                                      100th percentile latency |                               set-shards-data-streams |     160.201 |     159.041 |     -1.15941 |     ms |
|                                 100th percentile service time |                               set-shards-data-streams |     160.201 |     159.041 |     -1.15941 |     ms |
|                                                    error rate |                               set-shards-data-streams |           0 |           0 |            0 |      % |
|                                                Min Throughput |                       wait-for-recovered-data-streams |    0.998827 |     1.00058 |      0.00175 |  ops/s |
|                                               Mean Throughput |                       wait-for-recovered-data-streams |    0.998827 |     1.00058 |      0.00175 |  ops/s |
|                                             Median Throughput |                       wait-for-recovered-data-streams |    0.998827 |     1.00058 |      0.00175 |  ops/s |
|                                                Max Throughput |                       wait-for-recovered-data-streams |    0.998827 |     1.00058 |      0.00175 |  ops/s |
|                                      100th percentile latency |                       wait-for-recovered-data-streams |      187219 |      106938 |     -80281.1 |     ms |
|                                 100th percentile service time |                       wait-for-recovered-data-streams |      187219 |      106938 |     -80281.1 |     ms |
|                                                    error rate |                       wait-for-recovered-data-streams |           0 |           0 |            0 |      % |
|                                       50th percentile latency |                          create-required-data-streams |     3.48745 |     2.57473 |     -0.91271 |     ms |
|                                       90th percentile latency |                          create-required-data-streams |     13.5358 |     13.1077 |     -0.42804 |     ms |
|                                      100th percentile latency |                          create-required-data-streams |     16.2955 |     16.5085 |      0.21309 |     ms |
|                                  50th percentile service time |                          create-required-data-streams |     3.48745 |     2.57473 |     -0.91271 |     ms |
|                                  90th percentile service time |                          create-required-data-streams |     13.5358 |     13.1077 |     -0.42804 |     ms |
|                                 100th percentile service time |                          create-required-data-streams |     16.2955 |     16.5085 |      0.21309 |     ms |
|                                                    error rate |                          create-required-data-streams |           0 |           0 |            0 |      % |


-------------------------------
[INFO] SUCCESS (took 0 seconds)
-------------------------------

henningandersen · 2021-08-25T07:00:21Z

A second set of benchmarks has been run with a 500GB snapshot, 13 indices/26 shards.

~~With standard rate limiting (40MB/s), recovery time went from 999913ms to 558342ms. (I am mystified by these numbers, which will require further investigation).~~
With no rate limiting, recovery time went from 876308ms to 588860ms.
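
To put the two pairs of timings above in relative terms, here is a minimal sketch of the arithmetic. The function name `recovery_improvement` and the run labels are illustrative assumptions, not part of Rally or Elasticsearch; the millisecond values are the ones quoted in this comment.

```python
def recovery_improvement(baseline_ms, contender_ms):
    """Return (percent reduction in recovery time, speedup factor).

    Hypothetical helper for illustration only; not a Rally API.
    """
    saved_ms = baseline_ms - contender_ms
    percent_reduction = 100.0 * saved_ms / baseline_ms
    speedup = baseline_ms / contender_ms
    return percent_reduction, speedup


# Recovery times in ms as quoted above: (baseline, contender).
runs = {
    "first 500GB comparison (struck-through figures)": (999913, 558342),
    "no rate limiting": (876308, 588860),
}

for label, (baseline_ms, contender_ms) in runs.items():
    pct, speedup = recovery_improvement(baseline_ms, contender_ms)
    print(f"{label}: {pct:.1f}% less time, {speedup:.2f}x speedup")
```

On these figures the contender needs roughly 44% less time in the first comparison and roughly 33% less in the run without rate limiting. These are single runs, so the percentages are indicative rather than statistically robust.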

~~With default ratelimit~~ Comparing the baseline (without recovery from snapshot) with the contender (with recovery from snapshot); this run actually had no rate limit:

Comparing baseline
  Race ID: ca0ed860-8d91-4db5-a03e-aa728fe47f89
  Race timestamp: 2021-08-20 18:38:25
  Challenge: logging-snapshot-restore
  Car: external
  User tags: env-id=7bcbb2c7-223f-4a50-a8e3-980f1474acdc

with contender
  Race ID: a6d04b41-cbcb-4a1f-b5fd-d0160e6cfb9e
  Race timestamp: 2021-08-20 19:37:49
  Challenge: logging-snapshot-restore
  Car: external
  User tags: env-id=7bcbb2c7-223f-4a50-a8e3-980f1474acdc

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------
            
|                                                        Metric |                                                  Task |    Baseline |   Contender |        Diff |   Unit |
|--------------------------------------------------------------:|------------------------------------------------------:|------------:|------------:|------------:|-------:|
|                    Cumulative indexing time of primary shards |                                                       |     0.02605 |   0.0218167 |    -0.00423 |    min |
|             Min cumulative indexing time across primary shard |                                                       |           0 |           0 |           0 |    min |
|          Median cumulative indexing time across primary shard |                                                       |           0 |           0 |           0 |    min |
|             Max cumulative indexing time across primary shard |                                                       |     0.02605 |   0.0218167 |    -0.00423 |    min |
|           Cumulative indexing throttle time of primary shards |                                                       |           0 |           0 |           0 |    min |
|    Min cumulative indexing throttle time across primary shard |                                                       |           0 |           0 |           0 |    min |
| Median cumulative indexing throttle time across primary shard |                                                       |           0 |           0 |           0 |    min |
|    Max cumulative indexing throttle time across primary shard |                                                       |           0 |           0 |           0 |    min |
|                       Cumulative merge time of primary shards |                                                       |           0 |           0 |           0 |    min |
|                      Cumulative merge count of primary shards |                                                       |           0 |           0 |           0 |        |
|                Min cumulative merge time across primary shard |                                                       |           0 |           0 |           0 |    min |
|             Median cumulative merge time across primary shard |                                                       |           0 |           0 |           0 |    min |
|                Max cumulative merge time across primary shard |                                                       |           0 |           0 |           0 |    min |
|              Cumulative merge throttle time of primary shards |                                                       |           0 |           0 |           0 |    min |
|       Min cumulative merge throttle time across primary shard |                                                       |           0 |           0 |           0 |    min |
|    Median cumulative merge throttle time across primary shard |                                                       |           0 |           0 |           0 |    min |
|       Max cumulative merge throttle time across primary shard |                                                       |           0 |           0 |           0 |    min |
|                     Cumulative refresh time of primary shards |                                                       |  0.00326667 |     0.00345 |     0.00018 |    min |
|                    Cumulative refresh count of primary shards |                                                       |         583 |         591 |           8 |        |
|              Min cumulative refresh time across primary shard |                                                       |           0 |           0 |           0 |    min |
|           Median cumulative refresh time across primary shard |                                                       |           0 |           0 |           0 |    min |
|              Max cumulative refresh time across primary shard |                                                       |  0.00326667 |     0.00345 |     0.00018 |    min |
|                       Cumulative flush time of primary shards |                                                       |      0.0021 |  0.00196667 |    -0.00013 |    min |
|                      Cumulative flush count of primary shards |                                                       |          99 |          99 |           0 |        |
|                Min cumulative flush time across primary shard |                                                       |           0 |           0 |           0 |    min |
|             Median cumulative flush time across primary shard |                                                       |           0 |           0 |           0 |    min |
|                Max cumulative flush time across primary shard |                                                       |      0.0021 |  0.00196667 |    -0.00013 |    min |
|                                       Total Young Gen GC time |                                                       |       4.239 |       3.881 |      -0.358 |      s |
|                                      Total Young Gen GC count |                                                       |         196 |         147 |         -49 |        |
|                                         Total Old Gen GC time |                                                       |           0 |           0 |           0 |      s |
|                                        Total Old Gen GC count |                                                       |           0 |           0 |           0 |        |
|                                                    Store size |                                                       |     1319.73 |     1319.66 |    -0.07114 |     GB |
|                                                 Translog size |                                                       | 9.93721e-06 | 9.93721e-06 |           0 |     GB |
|                                        Heap used for segments |                                                       |     21.4268 |     21.4268 |           0 |     MB |
|                                      Heap used for doc values |                                                       |       3.013 |       3.013 |           0 |     MB |
|                                           Heap used for terms |                                                       |     17.2572 |     17.2572 |           0 |     MB |
|                                           Heap used for norms |                                                       |   0.0710449 |   0.0710449 |           0 |     MB |
|                                          Heap used for points |                                                       |           0 |           0 |           0 |     MB |
|                                   Heap used for stored fields |                                                       |     1.08556 |     1.08556 |           0 |     MB |
|                                                 Segment count |                                                       |        1157 |        1157 |           0 |        |
|                                                Min Throughput |                                      insert-pipelines |      6.1827 |     6.10731 |    -0.07539 |  ops/s |
|                                               Mean Throughput |                                      insert-pipelines |      6.1827 |     6.10731 |    -0.07539 |  ops/s |
|                                             Median Throughput |                                      insert-pipelines |      6.1827 |     6.10731 |    -0.07539 |  ops/s |
|                                                Max Throughput |                                      insert-pipelines |      6.1827 |     6.10731 |    -0.07539 |  ops/s |
|                                      100th percentile latency |                                      insert-pipelines |     2261.86 |     2290.21 |     28.3533 |     ms |
|                                 100th percentile service time |                                      insert-pipelines |     2261.86 |     2290.21 |     28.3533 |     ms |
|                                                    error rate |                                      insert-pipelines |           0 |           0 |           0 |      % |
|                                                Min Throughput |                                            insert-ilm |     19.8725 |      18.429 |    -1.44342 |  ops/s |
|                                               Mean Throughput |                                            insert-ilm |     19.8725 |      18.429 |    -1.44342 |  ops/s |
|                                             Median Throughput |                                            insert-ilm |     19.8725 |      18.429 |    -1.44342 |  ops/s |
|                                                Max Throughput |                                            insert-ilm |     19.8725 |      18.429 |    -1.44342 |  ops/s |
|                                      100th percentile latency |                                            insert-ilm |     48.0512 |     52.7942 |     4.74299 |     ms |
|                                 100th percentile service time |                                            insert-ilm |     48.0512 |     52.7942 |     4.74299 |     ms |
|                                                    error rate |                                            insert-ilm |           0 |           0 |           0 |      % |
|                                                Min Throughput |                                tune-recovery-settings |     13.7105 |     15.6771 |     1.96666 |  ops/s |
|                                               Mean Throughput |                                tune-recovery-settings |     13.7105 |     15.6771 |     1.96666 |  ops/s |
|                                             Median Throughput |                                tune-recovery-settings |     13.7105 |     15.6771 |     1.96666 |  ops/s |
|                                                Max Throughput |                                tune-recovery-settings |     13.7105 |     15.6771 |     1.96666 |  ops/s |
|                                      100th percentile latency |                                tune-recovery-settings |     72.5099 |     63.3558 |    -9.15413 |     ms |
|                                 100th percentile service time |                                tune-recovery-settings |     72.5099 |     63.3558 |    -9.15413 |     ms |
|                                                    error rate |                                tune-recovery-settings |           0 |           0 |           0 |      % |
|                                      100th percentile latency |                                 delete-local-snapshot |     23.3652 |     26749.4 |       26726 |     ms |
|                                 100th percentile service time |                                 delete-local-snapshot |     23.3652 |     26749.4 |       26726 |     ms |
|                                                    error rate |                                 delete-local-snapshot |         100 |           0 |        -100 |      % |
|                                                Min Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 | 7.89547e+08 | 8.94858e+08 | 1.05311e+08 | byte/s |
|                                               Mean Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 | 7.89547e+08 | 8.94858e+08 | 1.05311e+08 | byte/s |
|                                             Median Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 | 7.89547e+08 | 8.94858e+08 | 1.05311e+08 | byte/s |
|                                                Max Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 | 7.89547e+08 | 8.94858e+08 | 1.05311e+08 | byte/s |
|                                      100th percentile latency | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 |      894341 |      788446 |     -105895 |     ms |
|                                 100th percentile service time | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 |      894341 |      788446 |     -105895 |     ms |
|                                                    error rate | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 |           0 |           0 |           0 |      % |
|                                                Min Throughput |                               wait-for-local-snapshot | 1.12986e+09 | 1.20579e+09 | 7.59296e+07 | byte/s |
|                                               Mean Throughput |                               wait-for-local-snapshot | 1.12986e+09 | 1.20579e+09 | 7.59296e+07 | byte/s |
|                                             Median Throughput |                               wait-for-local-snapshot | 1.12986e+09 | 1.20579e+09 | 7.59296e+07 | byte/s |
|                                                Max Throughput |                               wait-for-local-snapshot | 1.12986e+09 | 1.20579e+09 | 7.59296e+07 | byte/s |
|                                      100th percentile latency |                               wait-for-local-snapshot |      620956 |      581423 |    -39532.1 |     ms |
|                                 100th percentile service time |                               wait-for-local-snapshot |      620956 |      581423 |    -39532.1 |     ms |
|                                                    error rate |                               wait-for-local-snapshot |           0 |           0 |           0 |      % |
|                                                Min Throughput |                               set-shards-data-streams |     3.78222 |     3.50564 |    -0.27658 |  ops/s |
|                                               Mean Throughput |                               set-shards-data-streams |     3.78222 |     3.50564 |    -0.27658 |  ops/s |
|                                             Median Throughput |                               set-shards-data-streams |     3.78222 |     3.50564 |    -0.27658 |  ops/s |
|                                                Max Throughput |                               set-shards-data-streams |     3.78222 |     3.50564 |    -0.27658 |  ops/s |
|                                      100th percentile latency |                               set-shards-data-streams |     263.968 |     284.829 |     20.8613 |     ms |
|                                 100th percentile service time |                               set-shards-data-streams |     263.968 |     284.829 |     20.8613 |     ms |
|                                                    error rate |                               set-shards-data-streams |           0 |           0 |           0 |      % |
|                                                Min Throughput |                       wait-for-recovered-data-streams |    0.994085 |     0.99401 |      -7e-05 |  ops/s |
|                                               Mean Throughput |                       wait-for-recovered-data-streams |    0.994085 |     0.99401 |      -7e-05 |  ops/s |
|                                             Median Throughput |                       wait-for-recovered-data-streams |    0.994085 |     0.99401 |      -7e-05 |  ops/s |
|                                                Max Throughput |                       wait-for-recovered-data-streams |    0.994085 |     0.99401 |      -7e-05 |  ops/s |
|                                      100th percentile latency |                       wait-for-recovered-data-streams |      999913 |      558342 |     -441570 |     ms |
|                                 100th percentile service time |                       wait-for-recovered-data-streams |      999913 |      558342 |     -441570 |     ms |
|                                                    error rate |                       wait-for-recovered-data-streams |           0 |           0 |           0 |      % |
|                                       50th percentile latency |                          create-required-data-streams |     2.94076 |     2.86268 |    -0.07808 |     ms |
|                                       90th percentile latency |                          create-required-data-streams |     11.1517 |     12.4218 |     1.27007 |     ms |
|                                      100th percentile latency |                          create-required-data-streams |     13.1239 |     14.4092 |     1.28536 |     ms |
|                                  50th percentile service time |                          create-required-data-streams |     2.94076 |     2.86268 |    -0.07808 |     ms |
|                                  90th percentile service time |                          create-required-data-streams |     11.1517 |     12.4218 |     1.27007 |     ms |
|                                 100th percentile service time |                          create-required-data-streams |     13.1239 |     14.4092 |     1.28536 |     ms |
|                                                    error rate |                          create-required-data-streams |           0 |           0 |           0 |      % |


-------------------------------
[INFO] SUCCESS (took 0 seconds)
-------------------------------

Similarly, without any rate limit:

Comparing baseline
  Race ID: 1524f8ca-5b7f-41b5-8a84-ea2570ed5dcf
  Race timestamp: 2021-08-24 05:22:03
  Challenge: logging-snapshot-restore
  Car: external
  User tags: env-id=7bcbb2c7-223f-4a50-a8e3-980f1474acdc

with contender
  Race ID: 60225ad5-addf-4b1a-b969-9faf4308a25c
  Race timestamp: 2021-08-23 07:41:39
  Challenge: logging-snapshot-restore
  Car: external
  User tags: env-id=7bcbb2c7-223f-4a50-a8e3-980f1474acdc

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------
            
|                                                        Metric |                                                  Task |    Baseline |   Contender |         Diff |   Unit |
|--------------------------------------------------------------:|------------------------------------------------------:|------------:|------------:|-------------:|-------:|
|                    Cumulative indexing time of primary shards |                                                       |   0.0273333 |      0.0241 |     -0.00323 |    min |
|             Min cumulative indexing time across primary shard |                                                       |           0 |           0 |            0 |    min |
|          Median cumulative indexing time across primary shard |                                                       |           0 |           0 |            0 |    min |
|             Max cumulative indexing time across primary shard |                                                       |   0.0273333 |      0.0241 |     -0.00323 |    min |
|           Cumulative indexing throttle time of primary shards |                                                       |           0 |           0 |            0 |    min |
|    Min cumulative indexing throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
| Median cumulative indexing throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|    Max cumulative indexing throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|                       Cumulative merge time of primary shards |                                                       |           0 |           0 |            0 |    min |
|                      Cumulative merge count of primary shards |                                                       |           0 |           0 |            0 |        |
|                Min cumulative merge time across primary shard |                                                       |           0 |           0 |            0 |    min |
|             Median cumulative merge time across primary shard |                                                       |           0 |           0 |            0 |    min |
|                Max cumulative merge time across primary shard |                                                       |           0 |           0 |            0 |    min |
|              Cumulative merge throttle time of primary shards |                                                       |           0 |           0 |            0 |    min |
|       Min cumulative merge throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|    Median cumulative merge throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|       Max cumulative merge throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|                     Cumulative refresh time of primary shards |                                                       |  0.00388333 |  0.00356667 |     -0.00032 |    min |
|                    Cumulative refresh count of primary shards |                                                       |         601 |         592 |           -9 |        |
|              Min cumulative refresh time across primary shard |                                                       |           0 |           0 |            0 |    min |
|           Median cumulative refresh time across primary shard |                                                       |           0 |           0 |            0 |    min |
|              Max cumulative refresh time across primary shard |                                                       |  0.00388333 |  0.00356667 |     -0.00032 |    min |
|                       Cumulative flush time of primary shards |                                                       |  0.00208333 |  0.00198333 |      -0.0001 |    min |
|                      Cumulative flush count of primary shards |                                                       |          99 |         100 |            1 |        |
|                Min cumulative flush time across primary shard |                                                       |           0 |           0 |            0 |    min |
|             Median cumulative flush time across primary shard |                                                       |           0 |           0 |            0 |    min |
|                Max cumulative flush time across primary shard |                                                       |  0.00208333 |  0.00198333 |      -0.0001 |    min |
|                                       Total Young Gen GC time |                                                       |       4.219 |       3.799 |        -0.42 |      s |
|                                      Total Young Gen GC count |                                                       |         195 |         148 |          -47 |        |
|                                         Total Old Gen GC time |                                                       |           0 |           0 |            0 |      s |
|                                        Total Old Gen GC count |                                                       |           0 |           0 |            0 |        |
|                                                    Store size |                                                       |     1319.73 |     1319.73 |      -0.0013 |     GB |
|                                                 Translog size |                                                       | 9.93721e-06 | 9.93721e-06 |            0 |     GB |
|                                        Heap used for segments |                                                       |     21.4289 |     21.4279 |     -0.00104 |     MB |
|                                      Heap used for doc values |                                                       |     3.01315 |     3.01308 |       -7e-05 |     MB |
|                                           Heap used for terms |                                                       |     17.2582 |     17.2577 |     -0.00049 |     MB |
|                                           Heap used for norms |                                                       |   0.0710449 |   0.0710449 |            0 |     MB |
|                                          Heap used for points |                                                       |           0 |           0 |            0 |     MB |
|                                   Heap used for stored fields |                                                       |     1.08651 |     1.08603 |     -0.00048 |     MB |
|                                                 Segment count |                                                       |        1159 |        1158 |           -1 |        |
|                                                Min Throughput |                                      insert-pipelines |     6.14809 |     6.33387 |      0.18579 |  ops/s |
|                                               Mean Throughput |                                      insert-pipelines |     6.14809 |     6.33387 |      0.18579 |  ops/s |
|                                             Median Throughput |                                      insert-pipelines |     6.14809 |     6.33387 |      0.18579 |  ops/s |
|                                                Max Throughput |                                      insert-pipelines |     6.14809 |     6.33387 |      0.18579 |  ops/s |
|                                      100th percentile latency |                                      insert-pipelines |        2275 |     2207.76 |     -67.2437 |     ms |
|                                 100th percentile service time |                                      insert-pipelines |        2275 |     2207.76 |     -67.2437 |     ms |
|                                                    error rate |                                      insert-pipelines |           0 |           0 |            0 |      % |
|                                                Min Throughput |                                            insert-ilm |      19.202 |     19.5398 |      0.33787 |  ops/s |
|                                               Mean Throughput |                                            insert-ilm |      19.202 |     19.5398 |      0.33787 |  ops/s |
|                                             Median Throughput |                                            insert-ilm |      19.202 |     19.5398 |      0.33787 |  ops/s |
|                                                Max Throughput |                                            insert-ilm |      19.202 |     19.5398 |      0.33787 |  ops/s |
|                                      100th percentile latency |                                            insert-ilm |     50.5231 |     48.7004 |     -1.82267 |     ms |
|                                 100th percentile service time |                                            insert-ilm |     50.5231 |     48.7004 |     -1.82267 |     ms |
|                                                    error rate |                                            insert-ilm |           0 |           0 |            0 |      % |
|                                                Min Throughput |                                tune-recovery-settings |      16.768 |      16.957 |      0.18893 |  ops/s |
|                                               Mean Throughput |                                tune-recovery-settings |      16.768 |      16.957 |      0.18893 |  ops/s |
|                                             Median Throughput |                                tune-recovery-settings |      16.768 |      16.957 |      0.18893 |  ops/s |
|                                                Max Throughput |                                tune-recovery-settings |      16.768 |      16.957 |      0.18893 |  ops/s |
|                                      100th percentile latency |                                tune-recovery-settings |     59.2336 |     58.5507 |     -0.68291 |     ms |
|                                 100th percentile service time |                                tune-recovery-settings |     59.2336 |     58.5507 |     -0.68291 |     ms |
|                                                    error rate |                                tune-recovery-settings |           0 |           0 |            0 |      % |
|                                                Min Throughput |                                 delete-local-snapshot |    0.034256 |   0.0350075 |      0.00075 |  ops/s |
|                                               Mean Throughput |                                 delete-local-snapshot |    0.034256 |   0.0350075 |      0.00075 |  ops/s |
|                                             Median Throughput |                                 delete-local-snapshot |    0.034256 |   0.0350075 |      0.00075 |  ops/s |
|                                                Max Throughput |                                 delete-local-snapshot |    0.034256 |   0.0350075 |      0.00075 |  ops/s |
|                                      100th percentile latency |                                 delete-local-snapshot |     29191.6 |     28564.9 |     -626.715 |     ms |
|                                 100th percentile service time |                                 delete-local-snapshot |     29191.6 |     28564.9 |     -626.715 |     ms |
|                                                    error rate |                                 delete-local-snapshot |           0 |           0 |            0 |      % |
|                                                Min Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 | 7.57763e+08 | 8.06163e+08 |  4.84001e+07 | byte/s |
|                                               Mean Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 | 7.57763e+08 | 8.06163e+08 |  4.84001e+07 | byte/s |
|                                             Median Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 | 7.57763e+08 | 8.06163e+08 |  4.84001e+07 | byte/s |
|                                                Max Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 | 7.57763e+08 | 8.06163e+08 |  4.84001e+07 | byte/s |
|                                      100th percentile latency | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 |      931938 |      875335 |     -56603.2 |     ms |
|                                 100th percentile service time | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 |      931938 |      875335 |     -56603.2 |     ms |
|                                                    error rate | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 |           0 |           0 |            0 |      % |
|                                                Min Throughput |                               wait-for-local-snapshot | 1.17574e+09 | 1.13387e+09 | -4.18656e+07 | byte/s |
|                                               Mean Throughput |                               wait-for-local-snapshot | 1.17574e+09 | 1.13387e+09 | -4.18656e+07 | byte/s |
|                                             Median Throughput |                               wait-for-local-snapshot | 1.17574e+09 | 1.13387e+09 | -4.18656e+07 | byte/s |
|                                                Max Throughput |                               wait-for-local-snapshot | 1.17574e+09 | 1.13387e+09 | -4.18656e+07 | byte/s |
|                                      100th percentile latency |                               wait-for-local-snapshot |      600461 |      623225 |      22763.7 |     ms |
|                                 100th percentile service time |                               wait-for-local-snapshot |      600461 |      623225 |      22763.7 |     ms |
|                                                    error rate |                               wait-for-local-snapshot |           0 |           0 |            0 |      % |
|                                                Min Throughput |                               set-shards-data-streams |     3.72414 |     3.63289 |     -0.09124 |  ops/s |
|                                               Mean Throughput |                               set-shards-data-streams |     3.72414 |     3.63289 |     -0.09124 |  ops/s |
|                                             Median Throughput |                               set-shards-data-streams |     3.72414 |     3.63289 |     -0.09124 |  ops/s |
|                                                Max Throughput |                               set-shards-data-streams |     3.72414 |     3.63289 |     -0.09124 |  ops/s |
|                                      100th percentile latency |                               set-shards-data-streams |     268.094 |     274.819 |      6.72498 |     ms |
|                                 100th percentile service time |                               set-shards-data-streams |     268.094 |     274.819 |      6.72498 |     ms |
|                                                    error rate |                               set-shards-data-streams |           0 |           0 |            0 |      % |
|                                                Min Throughput |                       wait-for-recovered-data-streams |    0.995082 |    0.993442 |     -0.00164 |  ops/s |
|                                               Mean Throughput |                       wait-for-recovered-data-streams |    0.995082 |    0.993442 |     -0.00164 |  ops/s |
|                                             Median Throughput |                       wait-for-recovered-data-streams |    0.995082 |    0.993442 |     -0.00164 |  ops/s |
|                                                Max Throughput |                       wait-for-recovered-data-streams |    0.995082 |    0.993442 |     -0.00164 |  ops/s |
|                                      100th percentile latency |                       wait-for-recovered-data-streams |      876308 |      588860 |      -287448 |     ms |
|                                 100th percentile service time |                       wait-for-recovered-data-streams |      876308 |      588860 |      -287448 |     ms |
|                                                    error rate |                       wait-for-recovered-data-streams |           0 |           0 |            0 |      % |
|                                       50th percentile latency |                          create-required-data-streams |     2.94675 |     2.25423 |     -0.69252 |     ms |
|                                       90th percentile latency |                          create-required-data-streams |     12.3682 |     11.3334 |     -1.03482 |     ms |
|                                      100th percentile latency |                          create-required-data-streams |     13.5988 |     11.3678 |     -2.23099 |     ms |
|                                  50th percentile service time |                          create-required-data-streams |     2.94675 |     2.25423 |     -0.69252 |     ms |
|                                  90th percentile service time |                          create-required-data-streams |     12.3682 |     11.3334 |     -1.03482 |     ms |
|                                 100th percentile service time |                          create-required-data-streams |     13.5988 |     11.3678 |     -2.23099 |     ms |
|                                                    error rate |                          create-required-data-streams |           0 |           0 |            0 |      % |


-------------------------------
[INFO] SUCCESS (took 0 seconds)
-------------------------------
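
For reading the comparison tables in relative terms, here is a minimal sketch that parses one row of the report and computes the relative change between baseline and contender. The helper names `parse_row` and `relative_change` are hypothetical and not part of Rally; the sample row is copied from the table above.

```python
from typing import Dict, Union


def parse_row(row: str) -> Dict[str, Union[str, float]]:
    """Split one pipe-delimited comparison row into named fields."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    metric, task, baseline, contender, diff, unit = cells
    return {
        "metric": metric,
        "task": task,
        "baseline": float(baseline),
        "contender": float(contender),
        "diff": float(diff),
        "unit": unit,
    }


def relative_change(baseline: float, contender: float) -> float:
    """Fractional change of the contender relative to the baseline."""
    return (contender - baseline) / baseline


# Row taken verbatim from the comparison without a rate limit.
sample = (
    "|                                      100th percentile latency "
    "|                       wait-for-recovered-data-streams "
    "|      876308 |      588860 |      -287448 |     ms |"
)
parsed = parse_row(sample)
change = relative_change(parsed["baseline"], parsed["contender"])
print(f"{parsed['task']}: {change:+.1%}")  # wait-for-recovered-data-streams: -32.8%
```
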

henningandersen · 2021-08-26T06:10:23Z

A new set of benchmarks, with a rate limit of 40MB/s, a 500GB snapshot, and 13 indices/26 shards. Recovery time dropped from 11612s to 7175s. One contributor to this drop is that all nodes hold both primaries and replicas: when recovering files from the primary, the outgoing bytes on a node also consume the rate limit quota available for incoming bytes, since both directions share the same 40MB/s rate limiter.
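
To put those numbers in perspective, here is a rough sketch of the arithmetic together with a toy model of the shared limiter. The function names and the 15MB/s outbound figure are assumptions for illustration only; this is not how the recovery code accounts for bytes, just a way to see why sharing one budget between directions slows recovery from the primary.

```python
from typing import Tuple


def recovery_improvement(baseline_s: float, contender_s: float) -> Tuple[float, float]:
    """Return (fractional reduction in time, speedup factor)."""
    reduction = (baseline_s - contender_s) / baseline_s
    speedup = baseline_s / contender_s
    return reduction, speedup


def inbound_budget(limit_mb_s: float, outbound_mb_s: float) -> float:
    """Toy model: inbound and outbound recovery bytes draw from one budget."""
    return max(limit_mb_s - outbound_mb_s, 0.0)


# Recovery times reported above (seconds).
reduction, speedup = recovery_improvement(11612, 7175)
print(f"reduction: {reduction:.1%}, speedup: {speedup:.2f}x")
# reduction: 38.2%, speedup: 1.62x

# Hypothetical: a node serving 15MB/s to a peer has 25MB/s left for its own
# incoming recovery; when files come from the snapshot and nothing is sent
# to peers, the full 40MB/s is available for incoming bytes.
print(inbound_budget(40.0, 15.0))  # 25.0
print(inbound_budget(40.0, 0.0))   # 40.0
```

Under this toy model the upper bound on the speedup would be 2x (a node that sends as much as it receives); the measured 1.62x sits below that bound.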

Comparison of recovery from primary (baseline) vs. recovery from snapshot (contender):

Comparing baseline
  Race ID: 1dd448d2-7bc3-41fc-801d-7fbf16ba2ac1
  Race timestamp: 2021-08-25 19:30:08
  Challenge: logging-snapshot-restore
  Car: external
  User tags: env-id=7bcbb2c7-223f-4a50-a8e3-980f1474acdc

with contender
  Race ID: fdd72f87-6b71-4b63-9cb7-24358d78c822
  Race timestamp: 2021-08-25 12:53:21
  Challenge: logging-snapshot-restore
  Car: external
  User tags: env-id=7bcbb2c7-223f-4a50-a8e3-980f1474acdc

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------
            
|                                                        Metric |                                                  Task |    Baseline |   Contender |         Diff |   Unit |
|--------------------------------------------------------------:|------------------------------------------------------:|------------:|------------:|-------------:|-------:|
|                    Cumulative indexing time of primary shards |                                                       |   0.0250667 |   0.0233667 |      -0.0017 |    min |
|             Min cumulative indexing time across primary shard |                                                       |           0 |           0 |            0 |    min |
|          Median cumulative indexing time across primary shard |                                                       |           0 |           0 |            0 |    min |
|             Max cumulative indexing time across primary shard |                                                       |   0.0250667 |   0.0233667 |      -0.0017 |    min |
|           Cumulative indexing throttle time of primary shards |                                                       |           0 |           0 |            0 |    min |
|    Min cumulative indexing throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
| Median cumulative indexing throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|    Max cumulative indexing throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|                       Cumulative merge time of primary shards |                                                       |           0 |           0 |            0 |    min |
|                      Cumulative merge count of primary shards |                                                       |           0 |           0 |            0 |        |
|                Min cumulative merge time across primary shard |                                                       |           0 |           0 |            0 |    min |
|             Median cumulative merge time across primary shard |                                                       |           0 |           0 |            0 |    min |
|                Max cumulative merge time across primary shard |                                                       |           0 |           0 |            0 |    min |
|              Cumulative merge throttle time of primary shards |                                                       |           0 |           0 |            0 |    min |
|       Min cumulative merge throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|    Median cumulative merge throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|       Max cumulative merge throttle time across primary shard |                                                       |           0 |           0 |            0 |    min |
|                     Cumulative refresh time of primary shards |                                                       |  0.00421667 |  0.00436667 |      0.00015 |    min |
|                    Cumulative refresh count of primary shards |                                                       |         594 |         592 |           -2 |        |
|              Min cumulative refresh time across primary shard |                                                       |           0 |           0 |            0 |    min |
|           Median cumulative refresh time across primary shard |                                                       |           0 |           0 |            0 |    min |
|              Max cumulative refresh time across primary shard |                                                       |  0.00421667 |  0.00433333 |      0.00012 |    min |
|                       Cumulative flush time of primary shards |                                                       |  0.00188333 |  0.00236667 |      0.00048 |    min |
|                      Cumulative flush count of primary shards |                                                       |         100 |         100 |            0 |        |
|                Min cumulative flush time across primary shard |                                                       |           0 |           0 |            0 |    min |
|             Median cumulative flush time across primary shard |                                                       |           0 |           0 |            0 |    min |
|                Max cumulative flush time across primary shard |                                                       |  0.00188333 |  0.00236667 |      0.00048 |    min |
|                                       Total Young Gen GC time |                                                       |       4.415 |       3.887 |       -0.528 |      s |
|                                      Total Young Gen GC count |                                                       |         214 |         158 |          -56 |        |
|                                         Total Old Gen GC time |                                                       |           0 |           0 |            0 |      s |
|                                        Total Old Gen GC count |                                                       |           0 |           0 |            0 |        |
|                                                    Store size |                                                       |     1319.73 |     1319.73 |       -1e-05 |     GB |
|                                                 Translog size |                                                       | 9.93721e-06 | 9.93721e-06 |            0 |     GB |
|                                        Heap used for segments |                                                       |     21.4279 |     21.4258 |     -0.00207 |     MB |
|                                      Heap used for doc values |                                                       |     3.01308 |     3.01293 |     -0.00014 |     MB |
|                                           Heap used for terms |                                                       |     17.2577 |     17.2567 |     -0.00098 |     MB |
|                                           Heap used for norms |                                                       |   0.0710449 |   0.0710449 |            0 |     MB |
|                                          Heap used for points |                                                       |           0 |           0 |            0 |     MB |
|                                   Heap used for stored fields |                                                       |     1.08604 |      1.0851 |     -0.00095 |     MB |
|                                                 Segment count |                                                       |        1158 |        1156 |           -2 |        |
|                                                Min Throughput |                                      insert-pipelines |     6.03682 |       5.793 |     -0.24382 |  ops/s |
|                                               Mean Throughput |                                      insert-pipelines |     6.03682 |       5.793 |     -0.24382 |  ops/s |
|                                             Median Throughput |                                      insert-pipelines |     6.03682 |       5.793 |     -0.24382 |  ops/s |
|                                                Max Throughput |                                      insert-pipelines |     6.03682 |       5.793 |     -0.24382 |  ops/s |
|                                      100th percentile latency |                                      insert-pipelines |     2316.74 |     2413.95 |      97.2034 |     ms |
|                                 100th percentile service time |                                      insert-pipelines |     2316.74 |     2413.95 |      97.2034 |     ms |
|                                                    error rate |                                      insert-pipelines |           0 |           0 |            0 |      % |
|                                                Min Throughput |                                            insert-ilm |     18.8207 |     18.9421 |      0.12137 |  ops/s |
|                                               Mean Throughput |                                            insert-ilm |     18.8207 |     18.9421 |      0.12137 |  ops/s |
|                                             Median Throughput |                                            insert-ilm |     18.8207 |     18.9421 |      0.12137 |  ops/s |
|                                                Max Throughput |                                            insert-ilm |     18.8207 |     18.9421 |      0.12137 |  ops/s |
|                                      100th percentile latency |                                            insert-ilm |     51.4174 |     51.3318 |     -0.08562 |     ms |
|                                 100th percentile service time |                                            insert-ilm |     51.4174 |     51.3318 |     -0.08562 |     ms |
|                                                    error rate |                                            insert-ilm |           0 |           0 |            0 |      % |
|                                                Min Throughput |                                tune-recovery-settings |     16.5618 |     16.3233 |     -0.23848 |  ops/s |
|                                               Mean Throughput |                                tune-recovery-settings |     16.5618 |     16.3233 |     -0.23848 |  ops/s |
|                                             Median Throughput |                                tune-recovery-settings |     16.5618 |     16.3233 |     -0.23848 |  ops/s |
|                                                Max Throughput |                                tune-recovery-settings |     16.5618 |     16.3233 |     -0.23848 |  ops/s |
|                                      100th percentile latency |                                tune-recovery-settings |     59.9837 |      60.842 |      0.85828 |     ms |
|                                 100th percentile service time |                                tune-recovery-settings |     59.9837 |      60.842 |      0.85828 |     ms |
|                                                    error rate |                                tune-recovery-settings |           0 |           0 |            0 |      % |
|                                                Min Throughput |                                 delete-local-snapshot |   0.0331672 |   0.0358357 |      0.00267 |  ops/s |
|                                               Mean Throughput |                                 delete-local-snapshot |   0.0331672 |   0.0358357 |      0.00267 |  ops/s |
|                                             Median Throughput |                                 delete-local-snapshot |   0.0331672 |   0.0358357 |      0.00267 |  ops/s |
|                                                Max Throughput |                                 delete-local-snapshot |   0.0331672 |   0.0358357 |      0.00267 |  ops/s |
|                                      100th percentile latency |                                 delete-local-snapshot |     30149.9 |     27904.7 |     -2245.17 |     ms |
|                                 100th percentile service time |                                 delete-local-snapshot |     30149.9 |     27904.7 |     -2245.17 |     ms |
|                                                    error rate |                                 delete-local-snapshot |           0 |           0 |            0 |      % |
|                                                Min Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 |   6.523e+08 |  7.9338e+08 |   1.4108e+08 | byte/s |
|                                               Mean Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 |   6.523e+08 |  7.9338e+08 |   1.4108e+08 | byte/s |
|                                             Median Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 |   6.523e+08 |  7.9338e+08 |   1.4108e+08 | byte/s |
|                                                Max Throughput | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 |   6.523e+08 |  7.9338e+08 |   1.4108e+08 | byte/s |
|                                      100th percentile latency | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 | 1.08335e+06 |      889728 |      -193625 |     ms |
|                                 100th percentile service time | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 | 1.08335e+06 |      889728 |      -193625 |     ms |
|                                                    error rate | wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 |           0 |           0 |            0 |      % |
|                                                Min Throughput |                               wait-for-local-snapshot | 1.14557e+09 | 1.06318e+09 | -8.23916e+07 | byte/s |
|                                               Mean Throughput |                               wait-for-local-snapshot | 1.14557e+09 | 1.06318e+09 | -8.23916e+07 | byte/s |
|                                             Median Throughput |                               wait-for-local-snapshot | 1.14557e+09 | 1.06318e+09 | -8.23916e+07 | byte/s |
|                                                Max Throughput |                               wait-for-local-snapshot | 1.14557e+09 | 1.06318e+09 | -8.23916e+07 | byte/s |
|                                      100th percentile latency |                               wait-for-local-snapshot |      616617 |      660410 |      43793.4 |     ms |
|                                 100th percentile service time |                               wait-for-local-snapshot |      616617 |      660410 |      43793.4 |     ms |
|                                                    error rate |                               wait-for-local-snapshot |           0 |           0 |            0 |      % |
|                                                Min Throughput |                   reset-recovery-settings-to-defaults |     22.4368 |     23.1654 |      0.72865 |  ops/s |
|                                               Mean Throughput |                   reset-recovery-settings-to-defaults |     22.4368 |     23.1654 |      0.72865 |  ops/s |
|                                             Median Throughput |                   reset-recovery-settings-to-defaults |     22.4368 |     23.1654 |      0.72865 |  ops/s |
|                                                Max Throughput |                   reset-recovery-settings-to-defaults |     22.4368 |     23.1654 |      0.72865 |  ops/s |
|                                      100th percentile latency |                   reset-recovery-settings-to-defaults |      44.166 |     42.7751 |     -1.39095 |     ms |
|                                 100th percentile service time |                   reset-recovery-settings-to-defaults |      44.166 |     42.7751 |     -1.39095 |     ms |
|                                                    error rate |                   reset-recovery-settings-to-defaults |           0 |           0 |            0 |      % |
|                                                Min Throughput |                               set-shards-data-streams |     3.46802 |     3.78544 |      0.31742 |  ops/s |
|                                               Mean Throughput |                               set-shards-data-streams |     3.46802 |     3.78544 |      0.31742 |  ops/s |
|                                             Median Throughput |                               set-shards-data-streams |     3.46802 |     3.78544 |      0.31742 |  ops/s |
|                                                Max Throughput |                               set-shards-data-streams |     3.46802 |     3.78544 |      0.31742 |  ops/s |
|                                      100th percentile latency |                               set-shards-data-streams |     286.469 |     262.133 |     -24.3365 |     ms |
|                                 100th percentile service time |                               set-shards-data-streams |     286.469 |     262.133 |     -24.3365 |     ms |
|                                                    error rate |                               set-shards-data-streams |           0 |           0 |            0 |      % |
|                                                Min Throughput |                       wait-for-recovered-data-streams |    0.995121 |    0.994857 |     -0.00026 |  ops/s |
|                                               Mean Throughput |                       wait-for-recovered-data-streams |    0.995121 |    0.994857 |     -0.00026 |  ops/s |
|                                             Median Throughput |                       wait-for-recovered-data-streams |    0.995121 |    0.994857 |     -0.00026 |  ops/s |
|                                                Max Throughput |                       wait-for-recovered-data-streams |    0.995121 |    0.994857 |     -0.00026 |  ops/s |
|                                      100th percentile latency |                       wait-for-recovered-data-streams | 1.16127e+07 | 7.17591e+06 | -4.43675e+06 |     ms |
|                                 100th percentile service time |                       wait-for-recovered-data-streams | 1.16127e+07 | 7.17591e+06 | -4.43675e+06 |     ms |
|                                                    error rate |                       wait-for-recovered-data-streams |           0 |           0 |            0 |      % |
|                                       50th percentile latency |                          create-required-data-streams |     2.27007 |      2.7955 |      0.52543 |     ms |
|                                       90th percentile latency |                          create-required-data-streams |     10.9978 |      11.302 |      0.30423 |     ms |
|                                      100th percentile latency |                          create-required-data-streams |     12.8601 |     12.5243 |     -0.33584 |     ms |
|                                  50th percentile service time |                          create-required-data-streams |     2.27007 |      2.7955 |      0.52543 |     ms |
|                                  90th percentile service time |                          create-required-data-streams |     10.9978 |      11.302 |      0.30423 |     ms |
|                                 100th percentile service time |                          create-required-data-streams |     12.8601 |     12.5243 |     -0.33584 |     ms |
|                                                    error rate |                          create-required-data-streams |           0 |           0 |            0 |      % |


-------------------------------
[INFO] SUCCESS (took 0 seconds)
-------------------------------
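
The Diff column above is in absolute units, so comparing operations against each other is easier with a relative change. Below is a minimal sketch of that arithmetic. It assumes the first numeric column is the baseline and the second is the contender, as in Rally comparison output. The class and method names are hypothetical and not part of this PR or of Rally.

```java
public class RelativeChange {

    /** Returns the change from baseline to contender as a percentage of the baseline. */
    public static double percentChange(double baseline, double contender) {
        if (baseline == 0.0) {
            // Rows such as "error rate" have a zero baseline; a percentage is undefined there.
            throw new IllegalArgumentException("baseline must be non-zero");
        }
        return (contender - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Values copied from the wait-for-snapshot-recovery-logging-0.1-7.11.1-3-2-500 rows above.
        double throughput = percentChange(6.523e+08, 7.9338e+08); // byte/s
        double latency = percentChange(1.08335e+06, 889728);      // ms
        System.out.printf("throughput: %+.1f%%%n", throughput);
        System.out.printf("latency:    %+.1f%%%n", latency);
    }
}
```

For those two rows this works out to roughly 22% higher throughput and roughly 18% lower latency in the second column. Each task has a single sample (min, mean, median and max are identical), so these figures are indicative only.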

Compression using indexing_data or lz4, as well as recovery from snapshot, are primarily intended for ESS and are therefore marked ESS only in docs. Relates elastic#76237 and elastic#74587

Compression using indexing_data or lz4, as well as recovery from snapshot, are primarily intended for ESS and are therefore marked ESS only in docs. Relates #76237 and #74587

Add logic to recover files from snapshot

c6080be

fcofdez added the >enhancement, :Distributed Indexing/Recovery, v8.0.0, Team:Distributed (Obsolete), and v7.15.0 labels Aug 9, 2021

fcofdez requested review from DaveCTurner and henningandersen August 9, 2021 12:19

This was referenced Aug 9, 2021

Replicate or relocate data via snapshot #73496

Closed

Peer recovery using snapshot files #76114

Closed

Disable security on rest tests

8767582

DaveCTurner reviewed Aug 9, 2021


fcofdez added 4 commits August 10, 2021 10:29

Review comments

7714b7b

Fine grained exceptions for writeFile

b7883af

Use CHUNK_SIZE as a default read buffer size

d254fb9

Merge remote-tracking branch 'origin/master' into snapshot-files-peer-recovery

ce74c48

fcofdez requested a review from DaveCTurner August 10, 2021 10:29

henningandersen reviewed Aug 10, 2021


Use BlobStoreRepository buffer size

4b55438

DaveCTurner reviewed Aug 10, 2021


fcofdez added 5 commits August 10, 2021 16:51

Review comments

09fa0ad

Merge remote-tracking branch 'origin/master' into snapshot-files-peer-recovery

eda9725

Notify listener on cancellation only after all outstanding requests are done

44ef9ff

Merge remote-tracking branch 'origin/master' into snapshot-files-peer-recovery

dbe8781

Take into account when there aren't in-flight requests

e355dfc

fcofdez requested review from DaveCTurner and henningandersen August 11, 2021 07:11

Add additional test

36e5dda

henningandersen reviewed Aug 12, 2021


henningandersen approved these changes Aug 12, 2021


DaveCTurner approved these changes Aug 12, 2021


fcofdez added 4 commits August 12, 2021 20:10

Review comments

94e5776

Merge remote-tracking branch 'origin/master' into snapshot-files-peer-recovery

58e6bf0

Improve docs

8c6fd4d

Merge remote-tracking branch 'origin/master' into snapshot-files-peer-recovery

1dda7a3

fcofdez merged commit 2ebe5cd into elastic:master Aug 13, 2021

fcofdez mentioned this pull request Aug 13, 2021

[7.x] Add peer recoveries using snapshot files when possible #76482

Merged

henningandersen mentioned this pull request Aug 14, 2021

Add recovery from snapshot to tests #76535

Merged

henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Aug 16, 2021

Test recovery from snapshot with indexing

c151d6e

Add test to verify that concurrent indexing together with recovery from snapshot works. Relates elastic#76237

henningandersen mentioned this pull request Aug 16, 2021

Test recovery from snapshot with indexing #76550

Merged

henningandersen added a commit that referenced this pull request Aug 17, 2021

Test recovery from snapshot with indexing (#76550)

031d9bb

Add test to verify that concurrent indexing together with recovery from snapshot works. Relates #76237

henningandersen added a commit that referenced this pull request Aug 17, 2021

Test recovery from snapshot with indexing (#76550)

440bb8a

Add test to verify that concurrent indexing together with recovery from snapshot works. Relates #76237

henningandersen mentioned this pull request Sep 1, 2021

Indexing_data/lz4, recover from snapshot ESS only #77130

Merged

jakelandis added v8.0.0-alpha2 and removed v8.0.0 labels Sep 15, 2021

kingherc mentioned this pull request Sep 1, 2022

Editing the found-snapshots repository fails elastic/kibana#139896

Closed
