GCS repository does not use new handlers for failed request for every new request  #27092

@thkoch2001

Description

Google Cloud Support was contacted by a customer who sees many 503 errors when taking snapshots to Google Cloud Storage (GCS). We believe we have tracked the bug down to an incorrect initialization of HttpRequest in GoogleCloudStorageService.

The example code in https://cloud.google.com/storage/transfer/create-client#retry attaches new instances of HttpUnsuccessfulResponseHandler and HttpBackOffIOExceptionHandler to every request. However, the plugin initializes only one instance of each class in the constructor (lines 142-143) and attaches those same instances to every request.

Please also consult the JavaDoc of both handler classes:

We believe that reusing the failure handlers causes the client to stop retrying failed requests once the number of possible back-offs (or the back-off time) of each handler has been exhausted for the first time. From then on, requests fail immediately. This is even more problematic because the snapshot logic in an ES cluster causes a sudden spike of write requests and is thus very prone to temporary failures.
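The suspected failure mode can be illustrated with a small, self-contained simulation. It does not use the Google HTTP client: `CountingBackOff`, `sendRequest`, and `countSuccesses` are hypothetical stand-ins, and the budget of 5 retries and the 2 transient failures per request are arbitrary assumptions chosen for illustration.

```java
import java.util.function.Supplier;

/** Demo entry point: compares a shared retry budget with a per-request one. */
class SharedBackoffDemo {

    /** Hypothetical stand-in for a stateful back-off policy: a fixed retry budget that is never reset. */
    static final class CountingBackOff {
        private int remaining;

        CountingBackOff(int maxRetries) {
            this.remaining = maxRetries;
        }

        /** Consumes one retry if any are left; returns false once the budget is exhausted. */
        boolean tryAcquireRetry() {
            if (remaining == 0) {
                return false;
            }
            remaining--;
            return true;
        }
    }

    /**
     * Simulates one request that hits `failures` transient errors before it would succeed.
     * Returns true only if the back-off budget allowed a retry after every failure.
     */
    static boolean sendRequest(CountingBackOff backOff, int failures) {
        for (int i = 0; i < failures; i++) {
            if (!backOff.tryAcquireRetry()) {
                return false; // budget exhausted: the request fails immediately
            }
        }
        return true;
    }

    /** Sends `requests` requests, each failing twice before succeeding, and counts the successes. */
    static int countSuccesses(Supplier<CountingBackOff> backOffForRequest, int requests) {
        int succeeded = 0;
        for (int i = 0; i < requests; i++) {
            if (sendRequest(backOffForRequest.get(), 2)) {
                succeeded++;
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        CountingBackOff shared = new CountingBackOff(5);
        int sharedOk = countSuccesses(() -> shared, 10);                 // same instance every time
        int freshOk = countSuccesses(() -> new CountingBackOff(5), 10);  // new instance per request
        System.out.println("shared handler:      " + sharedOk + " of 10 requests succeeded");
        System.out.println("per-request handler: " + freshOk + " of 10 requests succeeded");
    }
}
```

With these numbers the shared budget is used up during the third request, so only the first two requests succeed, while per-request budgets let all ten through.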

Please move the initialization of the failure handlers from the constructor of DefaultHttpRequestInitializer into its initialize method.
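A minimal sketch of the requested change in shape only, using plain Java stand-ins rather than the plugin's real code: `RetryHandler`, `Request`, `RequestInitializer`, and both initializer classes are hypothetical names. In the actual client, the equivalent step would presumably be constructing the handlers inside `initialize(HttpRequest)` and attaching them with `setUnsuccessfulResponseHandler` and `setIOExceptionHandler`.

```java
/** Demo entry point: contrasts constructor-time and per-request handler creation. */
class InitializerPlacementDemo {

    /** Hypothetical stand-in for a stateful failure handler. */
    static final class RetryHandler { }

    /** Hypothetical stand-in for an HTTP request that carries its failure handler. */
    static final class Request {
        RetryHandler handler;
    }

    /** Stand-in for the request initializer contract: invoked once for every new request. */
    interface RequestInitializer {
        void initialize(Request request);
    }

    /** Pattern described in this issue: one handler is built up front and attached to every request. */
    static final class SharedHandlerInitializer implements RequestInitializer {
        private final RetryHandler handler = new RetryHandler();

        @Override
        public void initialize(Request request) {
            request.handler = handler;
        }
    }

    /** Requested pattern: a new handler is built inside initialize, so no state is shared. */
    static final class PerRequestHandlerInitializer implements RequestInitializer {
        @Override
        public void initialize(Request request) {
            request.handler = new RetryHandler();
        }
    }

    /** Returns true if two requests prepared by the initializer end up with the same handler instance. */
    static boolean sharesHandler(RequestInitializer initializer) {
        Request first = new Request();
        Request second = new Request();
        initializer.initialize(first);
        initializer.initialize(second);
        return first.handler == second.handler;
    }

    public static void main(String[] args) {
        System.out.println("constructor-built handler shared: "
                + sharesHandler(new SharedHandlerInitializer()));
        System.out.println("initialize-built handler shared:  "
                + sharesHandler(new PerRequestHandlerInitializer()));
    }
}
```

The only difference between the two initializers is where `new RetryHandler()` is evaluated, which is what decides whether retry state leaks from one request to the next.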

Thank you for your consideration!

Elasticsearch version (bin/elasticsearch --version): 5.4.3

** plugins:
analysis-icu
analysis-smartcn
repository-gcs
repository-s3

** details for GCS plugin:
Name: repository-gcs
Description: The GCS repository plugin adds Google Cloud Storage support for repositories.
Version: 5.4.3
Native Controller: false

  • Classname: org.elasticsearch.repositories.gcs.GoogleCloudStoragePlugin

** OS: Debian 3.16.43-2+deb8u2

JVM version (java -version): 1.8.0_131

Steps to reproduce:

  1. Use the GCS snapshot repository for multiple snapshots, without restarting in between, in an ES cluster of more than 6 GCE VMs with an index that takes around 30-40 s to snapshot.

Provide logs (if relevant):

Log snippet from the VM instance search-es-data-b-scale-1709191530:
[2017-10-10T03:30:03,139][WARN ][o.e.s.SnapshotShardsService] [search-es-data-b-scale-1709191530] [[listings_v7][14]] [gcs-search-es-snapshots:search-es-10102017_1129_29688/Hy6gbA22Q0WVv9i8KL9PvQ] failed to create snapshot
org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: Failed to perform snapshot (index files)
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.snapshot(BlobStoreRepository.java:1376) ~[elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:971) ~[elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:382) ~[elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.snapshots.SnapshotShardsService.access$200(SnapshotShardsService.java:88) ~[elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.snapshots.SnapshotShardsService$1.doRun(SnapshotShardsService.java:335) [elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.4.3.jar:5.4.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 503 Service Unavailable
Service Unavailable
    at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145) ~[?:?]
    at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113) ~[?:?]
    at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40) ~[?:?]
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432) ~[?:?]
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352) ~[?:?]
    at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469) ~[?:?]
    at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore.lambda$writeBlob$5(GoogleCloudStorageBlobStore.java:219) ~[?:?]
    at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_131]
    at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore.doPrivileged(GoogleCloudStorageBlobStore.java:333) ~[?:?]
    at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStore.writeBlob(GoogleCloudStorageBlobStore.java:213) ~[?:?]
    at org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobContainer.writeBlob(GoogleCloudStorageBlobContainer.java:72) ~[?:?]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.snapshotFile(BlobStoreRepository.java:1432) ~[elasticsearch-5.4.3.jar:5.4.3]
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository$SnapshotContext.snapshot(BlobStoreRepository.java:1374) ~[elasticsearch-5.4.3.jar:5.4.3]
    ... 9 more
