GCS repository does not use new handlers for failed request for every new request 

Google Cloud Support was contacted by a customer who sees many 503 errors when taking snapshots to Google Cloud Storage (GCS). We believe we have tracked the bug down to an incorrect initialization of HttpRequest in [GoogleCloudStorageService](https://github.com/elastic/elasticsearch/blob/master/plugins/repository-gcs/src/main/java/org/elasticsearch/repositories/gcs/GoogleCloudStorageService.java).

The example code in https://cloud.google.com/storage/transfer/create-client#retry attaches new instances of HttpBackOffUnsuccessfulResponseHandler and HttpBackOffIOExceptionHandler to every request. However, the plugin initializes only one instance of each class in the constructor (lines 142-143) and attaches the same instances to every request.

Please also consult the JavaDoc of both handler classes:

* [HttpBackOffUnsuccessfulResponseHandler](https://developers.google.com/api-client-library/java/google-http-java-client/reference/1.20.0/com/google/api/client/http/HttpBackOffUnsuccessfulResponseHandler): "you MUST create a new instance of HttpBackOffUnsuccessfulResponseHandler with a new instance of BackOff for each instance of HttpRequest."
* [HttpBackOffIOExceptionHandler](https://developers.google.com/api-client-library/java/google-http-java-client/reference/1.20.0/com/google/api/client/http/HttpBackOffIOExceptionHandler): "you MUST create a new instance of HttpBackOffIOExceptionHandler with a new instance of BackOff for each instance of HttpRequest."

We believe that reusing the failure handlers causes the client to stop retrying failed requests once the number of possible backoffs (or the backoff time) has been exhausted once for each failure handler. From then on, requests fail immediately instead of being retried. This is even more problematic because the snapshot logic in an ES cluster causes an immediate spike of write requests and is thus very prone to temporary failures.

Please move the initialization of the failure handlers from the constructor of `DefaultHttpRequestInitializer` into the method `initialize`, so that every request gets its own handler instances.
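To illustrate the suspected failure mode, here is a minimal, self-contained simulation. It does not use the google-http-java-client classes: `CountingBackOff` and `RetryHandler` are hypothetical stand-ins, and the back-off budget is modelled as a simple retry counter rather than elapsed time. It only sketches why a handler whose back-off state is never reset stops retrying once that state is exhausted.

```java
import java.util.function.Supplier;

class BackoffReuseDemo {

    /** Hypothetical stand-in for a stateful back-off: a fixed retry budget that is never reset. */
    static final class CountingBackOff {
        private final int maxRetries;
        private int used;

        CountingBackOff(int maxRetries) {
            this.maxRetries = maxRetries;
        }

        boolean nextRetryAllowed() {
            if (used >= maxRetries) {
                return false; // budget exhausted: the caller must give up
            }
            used++;
            return true;
        }
    }

    /** Hypothetical stand-in for a failure handler that owns exactly one back-off. */
    static final class RetryHandler {
        private final CountingBackOff backOff;

        RetryHandler(CountingBackOff backOff) {
            this.backOff = backOff;
        }

        boolean shouldRetry() {
            return backOff.nextRetryAllowed();
        }
    }

    /** Simulates one request whose first failuresBeforeSuccess attempts fail (e.g. with a 503). */
    static boolean execute(RetryHandler handler, int failuresBeforeSuccess) {
        for (int failures = 1; failures <= failuresBeforeSuccess; failures++) {
            if (!handler.shouldRetry()) {
                return false; // handler refuses to retry: the request fails
            }
        }
        return true; // the attempt after the last failure succeeds
    }

    static int countSuccesses(int requests, int failuresPerRequest, Supplier<RetryHandler> handlerForRequest) {
        int ok = 0;
        for (int i = 0; i < requests; i++) {
            if (execute(handlerForRequest.get(), failuresPerRequest)) {
                ok++;
            }
        }
        return ok;
    }

    /** One handler created up front and attached to every request (the reported behaviour). */
    static int successesWithSharedHandler(int requests, int failuresPerRequest, int maxRetries) {
        RetryHandler shared = new RetryHandler(new CountingBackOff(maxRetries));
        return countSuccesses(requests, failuresPerRequest, () -> shared);
    }

    /** A new handler with a new back-off for each request (the requested behaviour). */
    static int successesWithFreshHandlers(int requests, int failuresPerRequest, int maxRetries) {
        return countSuccesses(requests, failuresPerRequest,
                () -> new RetryHandler(new CountingBackOff(maxRetries)));
    }

    public static void main(String[] args) {
        System.out.println("shared: " + successesWithSharedHandler(5, 2, 3));
        System.out.println("fresh: " + successesWithFreshHandlers(5, 2, 3));
    }
}
```

With five requests that each fail twice and a budget of three retries, `main` prints `shared: 1` and `fresh: 5`: the shared handler runs out of retries during the second request and every later request fails on its first error. In the plugin, the corresponding change would be to construct both handlers inside `initialize` rather than in the constructor.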

Thank you for your consideration!

**Elasticsearch version** (`bin/elasticsearch --version`): 5.4.3

**Plugins**:

* analysis-icu
* analysis-smartcn
* repository-gcs
* repository-s3

**Details for GCS plugin**:

* Name: repository-gcs
* Description: The GCS repository plugin adds Google Cloud Storage support for repositories.
* Version: 5.4.3
* Native Controller: false
* Classname: org.elasticsearch.repositories.gcs.GoogleCloudStoragePlugin

**OS**: Debian 3.16.43-2+deb8u2

**JVM version** (`java -version`): 1.8.0_131

**Description of the problem including expected versus actual behavior**:

**Steps to reproduce**:

 1. Use the GCS snapshot repository to take multiple snapshots, without restarting in between, in an ES cluster of more than 6 GCE VMs, with an index that takes around 30-40s to snapshot.

**Provide logs (if relevant)**:

```
logs snippet from search-es-data-b-scale-1709191530 VM Instance:
[2017-10-10T03:30:03,139][WARN ][o.e.s.SnapshotShardsService] [search-es-data-b-scale-1709191530] [[listings_v7][14]] [gcs-search-es-snapshots:search-es-10102017_1129_29688/Hy6gbA22Q0WVv9i8KL9PvQ] failed to create snapshot
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: Failed to perform snapshot (index files)
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.snapshot(BlobStoreRepository.java:1376) ~[elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:971) ~[elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:382) ~[elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.snapshots.SnapshotShardsService.access$200(SnapshotShardsService.java:88) ~[elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.snapshots.SnapshotShardsService$1.doRun(SnapshotShardsService.java:335) [elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.4.3.jar:5.4.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 503 Service Unavailable
Service Unavailable
    at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145) ~[?:?]
    at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113) ~[?:?]
    at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40) ~[?:?]
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432) ~[?:?]
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352) ~[?:?]
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469) ~[?:?]
    at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore.lambda$writeBlob$5(GoogleCloudStorageBlobStore.java:219) ~[?:?]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_131]
    at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore.doPrivileged(GoogleCloudStorageBlobStore.java:333) ~[?:?]
    at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore.writeBlob(GoogleCloudStorageBlobStore.java:213) ~[?:?]
    at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobContainer.writeBlob(GoogleCloudStorageBlobContainer.java:72) ~[?:?]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.snapshotFile(BlobStoreRepository.java:1432) ~[elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.snapshot(BlobStoreRepository.java:1374) ~[elasticsearch-5.4.3.jar:5.4.3]
    ... 9 more
```


GCS repository does not use new handlers for failed request for every new request #27092
