No response returned from snapshot restore #27791

@dolaru

Description

Elasticsearch version (bin/elasticsearch --version): 6.2.0, 7.0.0-alpha1

Plugins installed: repository-s3, x-pack

JVM version (java -version): openjdk version "1.8.0_151"

OS version (uname -a if on a Unix-like system): Linux 4.10.0-40-generic #44~16.04.1-Ubuntu SMP Thu Nov 9 15:37:44 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
During ML QA test runs, index snapshots are being restored from S3 by calling the _restore endpoint with the wait_for_completion parameter set to true. About 1% of the API calls from the ML QA framework to the _restore endpoint fail with the following error:

Exception org.apache.http.NoHttpResponseException

Message: localhost:9200 failed to respond

Stacktrace:

at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:281)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:257)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:207)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:684)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:835)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at org.apache.http.client.HttpClient$execute$0.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
at org.apache.http.client.HttpClient$execute$0.call(Unknown Source)
...

The snapshot restore request is received, acknowledged and completed successfully by the Elasticsearch node, but no response is returned to the client. This is not caused by a timeout, as the request fails immediately after it is sent.

The issue has only been observed in 6.2.0 and 7.0.0-alpha1, and it first appeared after the following changes were introduced: 58b4d6c...8b49b3f

Reproduction rate: 1%

Steps to reproduce:

  1. Send snapshot restore request
  2. Note that a response is not returned, but the request is received, acknowledged and completed successfully by the Elasticsearch node.
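
The steps above could be sketched roughly as follows. This is not the ML QA framework's code (the stack trace shows it uses Apache HttpClient from Groovy); it is a minimal Python illustration using only the standard library. The helper names `restore_path` and `restore_once` are hypothetical, as are any repository and snapshot names passed to them. `http.client.RemoteDisconnected` is used here as the nearest analogue of `NoHttpResponseException`: the connection is closed before any response data is read.

```python
import http.client
import urllib.parse


def restore_path(repository: str, snapshot: str) -> str:
    """Build the path for a blocking snapshot restore call."""
    # Percent-encode both names so they are safe as single path segments.
    repo = urllib.parse.quote(repository, safe="")
    snap = urllib.parse.quote(snapshot, safe="")
    return f"/_snapshot/{repo}/{snap}/_restore?wait_for_completion=true"


def restore_once(host: str, port: int, repository: str, snapshot: str) -> str:
    """Send one restore request and report whether a response came back."""
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.request("POST", restore_path(repository, snapshot))
        response = conn.getresponse()
        response.read()  # drain the body before closing the connection
        return f"status {response.status}"
    except http.client.RemoteDisconnected:
        # The server closed the connection without sending any response,
        # which is the symptom described in this issue.
        return "no response"
    finally:
        conn.close()
```

Calling `restore_once("localhost", 9200, "my_repo", "my_snapshot")` in a loop (with the restored indices deleted or closed between attempts) and counting the `"no response"` results would be one way to estimate the failure rate. The repository and snapshot names here are placeholders.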
