Closed
The ArrayCompareConditionSearchTests test suite is flaky because the integ cluster's
SchedulerEngine thread, trigger_engine_scheduler, does not shut down in time.
Failure instance in CI: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+matrix-java-periodic/ES_BUILD_JAVA=openjdk12,ES_RUNTIME_JAVA=openjdk12,nodes=immutable&&linux&&docker/240/console
Number of occurrences: 6 in the last 6 months.
Stack trace:
ERROR 0.00s J2 | ArrayCompareConditionSearchTests (suite) <<< FAILURES!
> Throwable #1: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.elasticsearch.xpack.watcher.condition.ArrayCompareConditionSearchTests:
> 1) Thread[id=337, name=elasticsearch[node_sm1][trigger_engine_scheduler][T#1], state=TIMED_WAITING, group=TGRP-ArrayCompareConditionSearchTests]
> at java.base@12/jdk.internal.misc.Unsafe.park(Native Method)
> at java.base@12/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:235)
> at java.base@12/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
> at java.base@12/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1182)
> at java.base@12/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)
> at java.base@12/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
> at java.base@12/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
> at java.base@12/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base@12/java.lang.Thread.run(Thread.java:835)
> at __randomizedtesting.SeedInfo.seed([3D43839813A5AEA5]:0)
> Throwable #2: com.carrotsearch.randomizedtesting.ThreadLeakError: There are still zombie threads that couldn't be terminated:
> 1) Thread[id=337, name=elasticsearch[node_sm1][trigger_engine_scheduler][T#1], state=TIMED_WAITING, group=TGRP-ArrayCompareConditionSearchTests]
> at java.base@12/jdk.internal.misc.Unsafe.park(Native Method)
> at java.base@12/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:235)
> at java.base@12/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123)
> at java.base@12/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1182)
> at java.base@12/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)
> at java.base@12/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
> at java.base@12/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
> at java.base@12/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base@12/java.lang.Thread.run(Thread.java:835)
> at __randomizedtesting.SeedInfo.seed([3D43839813A5AEA5]:0)
Completed [111/140] on J2 in 18.31s, 1 test, 2 errors <<< FAILURES!
RollupIT had the same problem because it also uses the SchedulerEngine. Its solution was to move away from integ tests and rewrite the test as a REST test. It looks like this test suite can do the same.
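For reference, the leaked thread in the stack trace is a ScheduledThreadPoolExecutor worker parked in DelayedWorkQueue.take, which is where a worker sits while waiting for a delayed task to come due. The following is a minimal sketch in plain java.util.concurrent, not the SchedulerEngine code, of stopping such an executor and waiting for its worker to exit. The class name SchedulerShutdownSketch and the method stopAndWait are hypothetical, and whether a missing wait of this kind is what causes the leak here is an assumption, not something the CI log confirms.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from Elasticsearch.
final class SchedulerShutdownSketch {

    /**
     * Stops the scheduler and waits up to timeoutMillis for its worker
     * threads to exit. Returns true if the executor fully terminated.
     */
    static boolean stopAndWait(ScheduledThreadPoolExecutor scheduler, long timeoutMillis) {
        // shutdownNow() removes queued delayed tasks and interrupts workers.
        // Plain shutdown() would, by default, leave already-scheduled
        // one-shot delayed tasks in the queue and keep the worker alive
        // until they have run.
        scheduler.shutdownNow();
        try {
            return scheduler.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt flag for the caller and report failure.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ScheduledThreadPoolExecutor scheduler = new ScheduledThreadPoolExecutor(1);
        // A task far in the future leaves the worker parked in
        // DelayedWorkQueue.take, similar to the thread in the stack trace.
        scheduler.schedule(() -> {}, 1, TimeUnit.HOURS);
        boolean terminated = stopAndWait(scheduler, 5_000);
        System.out.println("terminated=" + terminated);
    }
}
```

A suite-scope thread leak check only passes once the worker thread has actually exited, so the teardown has to wait for termination rather than just request it.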