S3 snapshots with timeout failures after upgrade to 5.5.2 #26576

@davekonopka

Elasticsearch version (bin/elasticsearch --version):
5.5.2

Plugins installed: []
discovery-ec2
repository-s3

JVM version (java -version):
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

OS version (uname -a if on a Unix-like system):
Amazon Linux on EC2 instances
Linux 4.9.43-17.38.amzn1.x86_64 #1 SMP Thu Aug 17 00:20:39 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

Steps to reproduce:

I recently upgraded a few clusters in different environments from 5.2.2 to 5.5.2. Since then, one of the clusters has been hitting timeout failures when creating snapshots to S3. A few snapshots have succeeded, and the other clusters have had no failures, so I know it does work. However, most runs produce one or more failed shards, all with the same timeout error. So far this has been limited to our production cluster, which has the most and largest indices.

Provide logs (if relevant):

Some data redacted with ... below.

     "failures": [
        {
          "index": "...-2017.09.11",
          "index_uuid": "...-2017.09.11",
          "shard_id": 4,
          "reason": "IndexShardSnapshotFailedException[Failed to perform snapshot (index files)]; nested: IOException[Unable to upload object elasticsearch-snapshots/indices/.../4/__e]; nested: AmazonS3Exception[Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: BB3062E801AD4513)]; ",
          "node_id": "...",
          "status": "INTERNAL_SERVER_ERROR"
        }
      ],
     "failures": [
        {
          "index": "...-2017.09.10",
          "index_uuid": "...-2017.09.10",
          "shard_id": 0,
          "reason": "IndexShardSnapshotFailedException[Failed to perform snapshot (index files)]; nested: IOException[Unable to upload object elasticsearch-snapshots/indices/.../0/__1]; nested: AmazonS3Exception[Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: 4C030EBC4EB49F51)]; ",
          "node_id": "...",
          "status": "INTERNAL_SERVER_ERROR"
        },
        {
          "index": "...-2017.08.30",
          "index_uuid": "...-2017.08.30",
          "shard_id": 0,
          "reason": "IndexShardSnapshotFailedException[Failed to write file list]; nested: IOException[Unable to upload object elasticsearch-snapshots/indices/.../0/pending-index-11]; nested: AmazonS3Exception[Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: 1562D61EBA5696BD)]; ",
          "node_id": "...",
          "status": "INTERNAL_SERVER_ERROR"
        }
      ],
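For anyone triaging similar output, here is a minimal sketch of pulling the index, shard, S3 error code, and request ID out of a `failures` array like the ones above. It is only an illustration: the helper name `summarize_failures` and the regular expression are assumptions based on the message format shown in these logs, not part of Elasticsearch or the repository-s3 plugin, and the `reason` strings are abbreviated with `...` in the same way as the redacted logs.

```python
import re
from collections import Counter

# Assumed pattern: matches the S3 error details as they appear in the
# "reason" strings above; other versions may format the message differently.
S3_ERROR = re.compile(r"Error Code: (?P<code>\w+); Request ID: (?P<request_id>\w+)")


def summarize_failures(failures):
    """Return one (index, shard_id, error_code, request_id) tuple per failure."""
    rows = []
    for failure in failures:
        match = S3_ERROR.search(failure["reason"])
        code = match.group("code") if match else None
        request_id = match.group("request_id") if match else None
        rows.append((failure["index"], failure["shard_id"], code, request_id))
    return rows


# Entries copied from the logs above, with the reason text abbreviated.
failures = [
    {
        "index": "...-2017.09.11",
        "shard_id": 4,
        "reason": "... (Service: Amazon S3; Status Code: 400; "
                  "Error Code: RequestTimeout; Request ID: BB3062E801AD4513)]; ",
    },
    {
        "index": "...-2017.08.30",
        "shard_id": 0,
        "reason": "... (Service: Amazon S3; Status Code: 400; "
                  "Error Code: RequestTimeout; Request ID: 1562D61EBA5696BD)]; ",
    },
]

rows = summarize_failures(failures)
for row in rows:
    print(row)

# Count failures per S3 error code to confirm they share one cause.
codes = Counter(code for _, _, code, _ in rows)
print(codes)
```

Grouping by error code makes it easy to confirm that every failed shard hit the same `RequestTimeout` response, and the request IDs are the values AWS support would typically ask for.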
