Description
Elasticsearch version: 2.3.4
JVM version: 1.8.0_91
OS version: RedHat 6.5
We are using the TribeNode feature to enable search across a number of geographically distributed Elasticsearch clusters. Occasionally, when we take one of these clusters completely offline, our TribeNode throws the following exception:
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:85)
at org.elasticsearch.threadpool.ThreadPool$ThreadedRunnable.run(ThreadPool.java:676)
at org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:640)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This exception is caused by thread exhaustion: the TribeNode creates a new thread every couple of seconds, and these threads are never released. Below is the stack trace of the leaked threads:
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
org.elasticsearch.common.util.concurrent.KeyedLock.acquire(KeyedLock.java:75)
org.elasticsearch.transport.netty.NettyTransport.disconnectFromNode(NettyTransport.java:1063)
org.elasticsearch.transport.TransportService.disconnectFromNode(TransportService.java:274)
org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$2$1.doRun(UnicastZenPing.java:258)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
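The stack above shows each leaked thread parked in KeyedLock.acquire inside NettyTransport.disconnectFromNode. The minimal Java model below illustrates how that pattern can leak threads. It is a sketch under stated assumptions, not Elasticsearch's code: it assumes a connection attempt to the unreachable host holds the same per-node lock indefinitely, and that each periodic unicast ping round submits its disconnect to an effectively unbounded pool. SimpleKeyedLock, simulateLeak, and the cached pool standing in for the real executor are all hypothetical. Because no worker can ever finish while the lock is held, the pool has to start a fresh thread on every tick.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class ThreadLeakSketch {

    // Hypothetical stand-in for a keyed lock: one ReentrantLock per node address.
    static final class SimpleKeyedLock {
        private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

        void acquire(String key) {
            locks.computeIfAbsent(key, k -> new ReentrantLock()).lock();
        }

        void release(String key) {
            locks.get(key).unlock();
        }
    }

    // Runs the model for `ticks` scheduler rounds and returns the number of
    // worker threads the unbounded pool holds once the last round has submitted.
    static int simulateLeak(int ticks, long periodMillis) throws InterruptedException {
        SimpleKeyedLock nodeLocks = new SimpleKeyedLock();
        String unreachable = "10.10.10.10";
        CountDownLatch connectHeld = new CountDownLatch(1);
        CountDownLatch releaseConnect = new CountDownLatch(1);

        // Assumption: a connect attempt holds the node's lock while waiting on a host that never answers.
        Thread connector = new Thread(() -> {
            nodeLocks.acquire(unreachable);
            try {
                connectHeld.countDown();
                releaseConnect.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                nodeLocks.release(unreachable);
            }
        });
        connector.start();
        connectHeld.await();

        // Cached pool: starts a new thread whenever no idle worker is available.
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newCachedThreadPool();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch ticksDone = new CountDownLatch(ticks);

        scheduler.scheduleAtFixedRate(() -> {
            if (ticksDone.getCount() == 0) {
                return; // stop submitting once the requested rounds are done
            }
            // Each round hands a "disconnect" to the pool; it blocks on the held node lock.
            pool.execute(() -> {
                nodeLocks.acquire(unreachable);
                nodeLocks.release(unreachable);
            });
            ticksDone.countDown();
        }, 0, periodMillis, TimeUnit.MILLISECONDS);

        ticksDone.await();
        scheduler.shutdownNow();
        int poolSize = pool.getPoolSize(); // every worker is still blocked, so one thread per round

        // Clean up: let the simulated connect finish so the blocked disconnects can drain.
        releaseConnect.countDown();
        connector.join();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return poolSize;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker threads after 10 rounds: " + simulateLeak(10, 20));
    }
}
```

In this model the thread count tracks the number of ping rounds rather than any fixed pool limit, which matches the unbounded growth described in this report; capping the pool or bounding how long the disconnect waits for the lock would stop the growth in the sketch.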
Steps to reproduce:
Create a TribeNode configuration in which one cluster is offline. It's not enough for the processes to be shut down while the machines stay online; the hosts specified in discovery.zen.ping.unicast.hosts for the offline cluster must be offline and must not respond to ping or connection attempts. Here is a simple configuration I was able to use to reproduce the problem:
---
cluster.name: "thread-leak-test"
node.name: "thread-leak-node"
http.port: "9201"
http.host: "127.0.0.1"
tribe:
  online-cluster:
    cluster.name: "online-cluster"
    discovery.zen.ping.unicast.hosts:
      - "localhost"
  offline-cluster:
    cluster.name: "offline-cluster"
    discovery.zen.ping.unicast.hosts:
      - "10.10.10.10"
Start the tribe node. Observe that the number of threads grows without bound (ps -m <pid> | wc -l) until "OutOfMemoryError: unable to create new native thread" exceptions are thrown.
This issue appears similar to the problem described in #8057.