[CI] unexpected failure while replicating translog entry

A test failure occurred in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+intake/99/console. However, the root cause was that one of the nodes in the cluster suffered a fatal exception:

```
[2019-02-14T11:46:31,144][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [node-1] fatal error in thread [elasticsearch[node-1][generic][T#2]], exiting
java.lang.AssertionError: unexpected failure while replicating translog entry: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
    at org.elasticsearch.indices.recovery.RecoveryTarget.lambda$indexTranslogOperations$2(RecoveryTarget.java:362) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
    at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:191) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT] 
    at org.elasticsearch.indices.recovery.RecoveryTarget.indexTranslogOperations(RecoveryTarget.java:333) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$TranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:521) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
    at org.elasticsearch.indices.recovery.PeerRecoveryTargetService$TranslogOperationsRequestHandler.messageReceived(PeerRecoveryTargetService.java:480) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
    at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1076) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.1.0-SNAPSHOT.jar:7.1.0-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]
```

The test that was running when this happened was:

```
./gradlew :qa:smoke-test-multinode:integTestRunner \
  -Dtests.seed=666DFE2D5892C30E \
  -Dtests.class=org.elasticsearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT \
  -Dtests.method="test {yaml=smoke_test_multinode/10_basic/cluster health basic test, wait for both nodes to join}" \
  -Dtests.security.manager=true \
  -Dtests.locale=mk \
  -Dtests.timezone=Europe/Malta \
  -Dcompiler.java=11 \
  -Druntime.java=8
```

Since an almost identical test immediately before it succeeded, I doubt there is anything wrong with the test itself. (The "stash dump on failure" in the Jenkins log is also confusing, as it contains the result of the previous successful test.)

[cluster_logs.zip](https://github.com/elastic/elasticsearch/files/2865147/cluster_logs.zip) contains the node logs from the test cluster in which the fatal error occurred.
