Description
Elasticsearch version (bin/elasticsearch --version):
Tested on all major 5.x versions [ 5.1.2 5.2.x 5.3.x ... ]
Plugins installed: [ defaults ]
JVM version (java -version):
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
OS version (uname -a if on a Unix-like system): CentOS 6
Description of the problem including expected versus actual behavior: The issue occurs when using the Java API Transport client in client-side applications. If ES hangs or goes down, all bulk processor threads get deadlocked.
This issue was already discussed here: https://discuss.elastic.co/t/java-application-using-bulkprocessing-hangs-if-elasticsearch-hangs/36960/8
We are facing this issue as a result of ES going down due to #24359.
Can we, as a feature, implement a timed-waiting semaphore as described here: https://discuss.elastic.co/t/java-application-using-bulkprocessing-hangs-if-elasticsearch-hangs/36960/2 and expose the semaphore release timeout as a parameter? (Or, of course, a better way of fixing this.)
Is there any workaround possible from code outside the driver, as a quick fix, in case this can't be handled at the driver level?
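A minimal sketch of the timed-semaphore idea, assuming (as the linked discussion suggests) that the hang comes from an unbounded `Semaphore.acquire()` that waits for permits which are only released when a bulk response arrives. `TimedBulkGate` is a hypothetical class, not part of the Elasticsearch client: it uses `Semaphore.tryAcquire(timeout, unit)` so that, when earlier requests never complete, the next caller fails with a `TimeoutException` instead of parking forever.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical gate limiting concurrent bulk requests; NOT an Elasticsearch API.
class TimedBulkGate {
    private final Semaphore permits;
    private final long timeout;
    private final TimeUnit unit;

    TimedBulkGate(int concurrentRequests, long timeout, TimeUnit unit) {
        this.permits = new Semaphore(concurrentRequests);
        this.timeout = timeout;
        this.unit = unit;
    }

    // Starts an async request; the request must call onDone when its response (or failure) arrives.
    void execute(Consumer<Runnable> asyncRequest) throws InterruptedException, TimeoutException {
        // Timed wait instead of acquire(): give up if no permit is released in time.
        if (!permits.tryAcquire(timeout, unit)) {
            throw new TimeoutException("no bulk permit released within " + timeout + " " + unit);
        }
        AtomicBoolean released = new AtomicBoolean(false);
        // Release the permit at most once, even if onDone is invoked repeatedly.
        Runnable onDone = () -> {
            if (released.compareAndSet(false, true)) {
                permits.release();
            }
        };
        try {
            asyncRequest.accept(onDone);
        } catch (RuntimeException e) {
            onDone.run(); // submission itself failed, so hand the permit back
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        TimedBulkGate gate = new TimedBulkGate(1, 200, TimeUnit.MILLISECONDS);
        // Simulates a request whose response never arrives (node down): onDone is never called.
        gate.execute(onDone -> { });
        try {
            gate.execute(onDone -> onDone.run());
            System.out.println("acquired");
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

With one concurrent request and a lost response, the second `execute` gives up after roughly 200 ms rather than blocking indefinitely. The timeout value would be the parameter proposed above.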
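One possible caller-side mitigation, sketched under the assumption that the application thread hangs inside a blocking client call (for example a `bulkProcessor.add(...)` that waits for a permit): run that call on a separate executor and wait with `Future.get(timeout, unit)`. `BoundedCall.callWithTimeout` is a hypothetical helper. It only stops the *calling* thread from hanging; whether the blocked worker thread actually exits depends on the underlying call honoring interruption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; not part of the Elasticsearch client.
class BoundedCall {
    // Runs a possibly-blocking call on the executor and gives up after the timeout.
    static <T> T callWithTimeout(ExecutorService executor, Callable<T> call, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        Future<T> future = executor.submit(call);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; it only stops if the call is interruptible
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            // Stand-in for a client call that never returns.
            callWithTimeout(executor, () -> { new CountDownLatch(1).await(); return null; },
                    200, TimeUnit.MILLISECONDS);
            System.out.println("completed");
        } catch (TimeoutException e) {
            System.out.println("gave up after timeout");
        } finally {
            executor.shutdownNow();
        }
    }
}
```

On timeout the application can log the failure, stop feeding the processor, and rebuild the client once the cluster is reachable again.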
Steps to reproduce:
- ES is running and bulk insertion from the application is in progress.
- ES nodes restart.
- Deadlock on the application side.
Thread dump:
It looks like the following:
"HistoryCachedExecutor-125" #429 daemon prio=5 os_prio=0 tid=0x00007fd268dea000 nid=0x8fe1 waiting on condition [0x00007fd0de2e3000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000744a1ca98> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
Any help with a fix or a workaround for this would be really appreciated. Glad to provide any further info needed.