Closed
Labels
:Distributed Coordination/Cluster Coordination (cluster formation and cluster state publication, including cluster membership and fault detection)
>test-failure (triaged test failures from CI)
Description
It looks like the cluster from SettingsBasedHostProviderIT#testClusterFormsWithSingleSeedHostInSettings isn't always shut down properly (the logs show SEVERE: There are still zombie threads that couldn't be terminated: ...). Because threads are leaked, the suite fails as a whole, and the next test case, SettingsBasedHostProviderIT#testClusterFormsByScanningPorts, also fails with a port conflict. I haven't been able to reproduce this locally.
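The zombie-thread failure comes from a post-suite check: any thread the test started that is still alive at the end counts as leaked. A minimal sketch in plain Java, assuming a single java.util.concurrent pool stands in for a node's thread pools (the class, factory and thread names are hypothetical, not Elasticsearch or randomizedtesting code). An idle worker that is never shut down stays alive indefinitely; shutdown(), awaitTermination() and join() let it exit, so the sketch should report one live worker before shutdown and none after.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

public class ThreadLeakSketch {

    // Records every thread the pool creates so the "test" can inspect them afterwards.
    static final List<Thread> created = new CopyOnWriteArrayList<>();

    // Hypothetical factory; the name only imitates the style seen in the logs.
    static final ThreadFactory RECORDING_FACTORY = r -> {
        Thread t = new Thread(r, "sketch-node[generic][T#" + (created.size() + 1) + "]");
        created.add(t);
        return t;
    };

    // Pool threads still alive: these are what a leak checker reports as zombies.
    static long liveWorkers() {
        return created.stream().filter(Thread::isAlive).count();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1, RECORDING_FACTORY);
        pool.submit(() -> { }).get(); // after the task, the worker parks waiting for more work

        // Without shutdown the idle core worker never exits.
        System.out.println("live workers before shutdown: " + liveWorkers());

        // Proper teardown: stop accepting work, wait for the pool, then for the threads themselves.
        pool.shutdown();
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            pool.shutdownNow();
        }
        for (Thread t : created) {
            t.join(10_000);
        }
        System.out.println("live workers after shutdown: " + liveWorkers());
    }
}
```

Joining the threads matters: awaitTermination() can return while a worker is still finishing its exit path, so checking isAlive() immediately afterwards could race.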
Link to the build: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=centos/14/console
Example reproduction line:
./gradlew :server:integTest \
-Dtests.seed=D9449C904EA3FE3D \
-Dtests.class=org.elasticsearch.discovery.zen.SettingsBasedHostProviderIT \
-Dtests.security.manager=true \
-Dtests.locale=lv-LV \
-Dtests.timezone=Africa/Khartoum \
-Dcompiler.java=11 \
-Druntime.java=8
Relevant excerpts from the logs:
SEVERE: There are still zombie threads that couldn't be terminated:
2> 1) Thread[id=9776, name=elasticsearch[node_t1][generic][T#1], state=WAITING, group=TGRP-SettingsBasedHostProviderIT]
2> at sun.misc.Unsafe.park(Native Method)
2> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
2> at java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:737)
2> at java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:647)
2> at java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1269)
2> at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
2> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
2> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2> at java.lang.Thread.run(Thread.java:748)
2> 2) Thread[id=9772, name=elasticsearch[node_t1][scheduler][T#1], state=TIMED_WAITING, group=TGRP-SettingsBasedHostProviderIT]
2> at sun.misc.Unsafe.park(Native Method)
2> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
2> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
2> at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
2> at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
2> at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
2> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
2> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2> at java.lang.Thread.run(Thread.java:748)

1> {node_t2}{eTfDiXEjTeW36FIwAQPIMw}{_-79Lb0zQzS2QNF3xm2TUg}{127.0.0.1}{127.0.0.1:44959}
1> {node_t1}{C8ielKNXQHy_CWeeULZJHg}{shpu0GWiSoOaJY1WTe6anQ}{127.0.0.1}{127.0.0.1:33331}, local
1> [2018-10-23T13:45:52,004][WARN ][o.e.t.n.MockNioTransport ] [node_t3] send message failed [channel: NioSocketChannel{localAddress=0.0.0.0/0.0.0.0:40045, remoteAddress=/127.0.0.1:45116}]
1> java.nio.channels.ClosedChannelException: null
1> at org.elasticsearch.nio.SocketChannelContext.sendMessage(SocketChannelContext.java:131) [elasticsearch-nio-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
1> at org.elasticsearch.transport.nio.MockNioTransport$MockSocketChannel.sendMessage(MockNioTransport.java:288) [framework-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
1> at org.elasticsearch.transport.TcpTransport.internalSendMessage(TcpTransport.java:921) [main/:?]
1> at org.elasticsearch.transport.TcpTransport.sendResponse(TcpTransport.java:1010) [main/:?]
1> at org.elasticsearch.transport.TcpTransport.sendResponse(TcpTransport.java:978) [main/:?]
1> at org.elasticsearch.transport.TcpTransportChannel.sendResponse(TcpTransportChannel.java:66) [main/:?]
1> at org.elasticsearch.transport.TcpTransportChannel.sendResponse(TcpTransportChannel.java:60) [main/:?]
1> at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:54) [main/:?]
1> at org.elasticsearch.discovery.zen.MembershipAction$LeaveRequestRequestHandler.messageReceived(MembershipAction.java:287) [main/:?]
1> at org.elasticsearch.discovery.zen.MembershipAction$LeaveRequestRequestHandler.messageReceived(MembershipAction.java:282) [main/:?]
1> at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) [main/:?]
1> at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1448) [main/:?]
1> at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) [main/:?]
1> at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [main/:?]
1> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_192]
1> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_192]
...
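Both zombie stacks end in ThreadPoolExecutor.getTask, which suggests the threads are not stuck mid-operation but are idle pool workers whose pools were never shut down: the generic worker waits in LinkedTransferQueue.take() with no timeout (WAITING), and the scheduler worker waits in DelayedWorkQueue.take() via awaitNanos for a future task (TIMED_WAITING). A minimal sketch reproducing those two states with plain JDK pools; the pools and thread names are hypothetical stand-ins, not Elasticsearch's own executors.

```java
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class IdleWorkerStatesSketch {

    // Idle workers may spin briefly before parking, so poll for up to ~5 s for the expected state.
    static Thread.State awaitState(Thread t, Thread.State wanted) throws InterruptedException {
        for (int i = 0; i < 500 && t.getState() != wanted; i++) {
            Thread.sleep(10);
        }
        return t.getState();
    }

    // Returns {generic worker state, scheduler worker state} while both pools sit idle.
    static Thread.State[] observeIdleStates() throws Exception {
        AtomicReference<Thread> genericWorker = new AtomicReference<>();
        // Core worker of a plain pool: once idle it blocks in queue.take() with no timeout.
        ThreadPoolExecutor generic = new ThreadPoolExecutor(1, 1, 30, TimeUnit.SECONDS,
                new LinkedTransferQueue<>(),
                r -> {
                    Thread t = new Thread(r, "sketch[generic][T#1]");
                    genericWorker.set(t);
                    return t;
                });
        generic.execute(() -> { });

        AtomicReference<Thread> schedulerWorker = new AtomicReference<>();
        // Scheduler worker with a far-future task: it waits with a timeout until the task is due.
        ScheduledThreadPoolExecutor scheduler = new ScheduledThreadPoolExecutor(1,
                r -> {
                    Thread t = new Thread(r, "sketch[scheduler][T#1]");
                    schedulerWorker.set(t);
                    return t;
                });
        scheduler.schedule(() -> { }, 1, TimeUnit.HOURS);

        try {
            return new Thread.State[] {
                awaitState(genericWorker.get(), Thread.State.WAITING),
                awaitState(schedulerWorker.get(), Thread.State.TIMED_WAITING)
            };
        } finally {
            generic.shutdown();
            // By default, shutdown() alone still runs pending delayed tasks, keeping the worker alive.
            scheduler.shutdownNow();
            generic.awaitTermination(10, TimeUnit.SECONDS);
            scheduler.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        Thread.State[] states = observeIdleStates();
        System.out.println("generic:   " + states[0]);
        System.out.println("scheduler: " + states[1]);
    }
}
```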
ERROR 0.12s J1 | SettingsBasedHostProviderIT.testClusterFormsByScanningPorts <<< FAILURES!
> Throwable #1: java.lang.RuntimeException: failed to start nodes
> at __randomizedtesting.SeedInfo.seed([D9449C904EA3FE3D:47EF44A5DD493630]:0)
> at org.elasticsearch.test.InternalTestCluster.startAndPublishNodesAndClients(InternalTestCluster.java:1574)
> at org.elasticsearch.test.InternalTestCluster.startNodes(InternalTestCluster.java:1891)
> at org.elasticsearch.test.InternalTestCluster.startNode(InternalTestCluster.java:1859)
> at org.elasticsearch.test.InternalTestCluster.startNode(InternalTestCluster.java:1852)
> at org.elasticsearch.discovery.zen.SettingsBasedHostProviderIT.testClusterFormsByScanningPorts(SettingsBasedHostProviderIT.java:78)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.ExecutionException: BindTransportException[Failed to bind to [44859-44860]]; nested: BindException[Address already in use];
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at org.elasticsearch.test.InternalTestCluster.startAndPublishNodesAndClients(InternalTestCluster.java:1569)
> ... 41 more
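The underlying error is an ordinary java.net.BindException: presumably a node leaked by the previous test still held the ports in the range [44859-44860], so the new node could not bind any of them. A sketch of that failure mode, assuming a hypothetical helper bindFirstFree that tries each port of a range on loopback in turn. It only mirrors the shape of the error message and is not Elasticsearch's transport code.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class PortRangeBindSketch {

    // Hypothetical helper: bind the first free port in [from, to] on loopback, else fail.
    static ServerSocket bindFirstFree(int from, int to) throws IOException {
        BindException last = null;
        for (int port = from; port <= to; port++) {
            ServerSocket socket = new ServerSocket();
            try {
                socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
                return socket;
            } catch (BindException e) {
                socket.close();
                last = e; // "Address already in use": try the next port in the range
            } catch (IOException e) {
                socket.close();
                throw e;
            }
        }
        BindException failure = new BindException("Failed to bind to [" + from + "-" + to + "]");
        if (last != null) {
            failure.initCause(last);
        }
        throw failure;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the leaked node: it still holds a port after its test has finished.
        try (ServerSocket leaked = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            int port = leaked.getLocalPort();
            try (ServerSocket s = bindFirstFree(port, port)) {
                System.out.println("unexpectedly bound port " + s.getLocalPort());
            } catch (BindException e) {
                System.out.println(e.getMessage() + "; caused by " + e.getCause());
            }
        }
    }
}
```

Because every port in the range is held by the leaked sockets, the new node fails immediately, which fits the 0.12s failure time above. Fixing the shutdown of the previous test's cluster, rather than widening the range, addresses the actual cause.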
Full build log: build_log.txt