
[CI] Timeouts in CcrRetentionLeaseIT#testRetentionLeaseIsAddedIfItDisappearsWhileFollowing #41737

@cbuescher

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.7+multijob-unix-compatibility/os=opensuse/142/console

Neither of these commands reproduces the failure:

REPRODUCE WITH: ./gradlew :x-pack:plugin:ccr:internalClusterTest \
  -Dtests.seed=4B6E4686DF432308 \
  -Dtests.class=org.elasticsearch.xpack.ccr.CcrRetentionLeaseIT \
  -Dtests.method="testRetentionLeaseIsAddedIfItDisappearsWhileFollowing" \
  -Dtests.security.manager=true \
  -Dtests.locale=es-PR \
  -Dtests.timezone=Antarctica/South_Pole \
  -Dcompiler.java=12 \
  -Druntime.java=8

REPRODUCE WITH: ./gradlew :x-pack:plugin:ccr:internalClusterTest \
  -Dtests.seed=4B6E4686DF432308 \
  -Dtests.class=org.elasticsearch.xpack.ccr.CcrRetentionLeaseIT \
  -Dtests.security.manager=true \
  -Dtests.locale=en-US \
  -Dtests.timezone=Etc/UTC \
  -Dcompiler.java=12 \
  -Druntime.java=8

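The full console log contains many thread-leak reports like the ones below. The JDK log messages are in Spanish because the run used -Dtests.locale=es-PR (ADVERTENCIA = WARNING, GRAVE = SEVERE):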

1> [2019-05-02T21:54:22,863][INFO ][o.e.x.c.a.ShardFollowTasksExecutor] [followerd1] [follower][0] Starting to track leader shard [leader][0]
  1> [2019-05-02T21:54:22,867][INFO ][o.e.x.c.CcrRetentionLeaseIT] [testRetentionLeaseIsAddedIfItDisappearsWhileFollowing] ensure green follower indices [follower]
  1> [2019-05-02T21:54:22,876][INFO ][o.e.x.c.a.ShardFollowNodeTask] [followerd1] [follower][0] following leader shard [leader][0], follower global checkpoint=[-1], mapping version=[1], settings version=[1]
  2> may 02, 2019 10:13:49 PM com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate
  2> ADVERTENCIA: Suite execution timed out: org.elasticsearch.xpack.ccr.CcrRetentionLeaseIT
  2> ==== jstack at approximately timeout time ====
  2> "elasticsearch[followerd1][flush][T#1]" ID=457 WAITING on org.elasticsearch.common.util.concurrent.EsExecutors$ExecutorScalingQueue@7aedf8c7
  2> 	at sun.misc.Unsafe.park(Native Method)
  2> 	- waiting on org.elasticsearch.common.util.concurrent.EsExecutors$ExecutorScalingQueue@7aedf8c7
  2> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  2> 	at java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:737)
  2> 	at java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:647)
  2> 	at java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1269)
  2> 	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
  2> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
  2> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  2> 	at java.lang.Thread.run(Thread.java:748)
[...]
2> ADVERTENCIA: Will linger awaiting termination of 328 leaked thread(s).
  2> may 02, 2019 10:13:55 PM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
  2> GRAVE: 328 threads leaked from SUITE scope at org.elasticsearch.xpack.ccr.CcrRetentionLeaseIT: 
  2>    1) Thread[id=139, name=elasticsearch[followerm0][[timer]], state=TIMED_WAITING, group=TGRP-CcrRetentionLeaseIT]
  2>         at java.lang.Thread.sleep(Native Method)
  2>         at org.elasticsearch.threadpool.ThreadPool$CachedTimeThread.run(ThreadPool.java:574)
  2>    2) Thread[id=169, name=elasticsearch[leaderd4][__mock_network_thread][T#15], state=RUNNABLE, group=TGRP-CcrRetentionLeaseIT]
  2>         at java.net.SocketInputStream.socketRead0(Native Method)
  2>         at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
  2>         at java.net.SocketInputStream.read(SocketInputStream.java:171)
  2>         at java.net.SocketInputStream.read(SocketInputStream.java:141)
  2>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
  2>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
  2>         at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
  2>         at org.elasticsearch.common.io.Streams.readFully(Streams.java:214)
  2>         at org.elasticsearch.common.io.stream.InputStreamStreamInput.readBytes(InputStreamStreamInput.java:67)
  2>         at org.elasticsearch.common.io.stream.StreamInput.readFully(StreamInput.java:192)
  2>         at org.elasticsearch.transport.MockTcpTransport.readMessage(MockTcpTransport.java:150)
  2>         at org.elasticsearch.transport.MockTcpTransport.access$800(MockTcpTransport.java:75)
  2>         at org.elasticsearch.transport.MockTcpTransport$MockChannel$2.lambda$doRun$0(MockTcpTransport.java:349)
  2>         at org.elasticsearch.transport.MockTcpTransport$MockChannel$2$$Lambda$1931/705689348.run(Unknown Source)
  2>         at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:108)
  2>         at org.elasticsearch.transport.MockTcpTransport$MockChannel$2.doRun(MockTcpTransport.java:349)
  2>         at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
  2>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  2>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  2>         at java.lang.Thread.run(Thread.java:748)
  2>    3) Thread[id=286, name=elasticsearch[followerm0][ccr][T#3], state=WAITING, group=TGRP-CcrRetentionLeaseIT]
  2>         at sun.misc.Unsafe.park(Native Method)
  2>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  2>         at java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:737)
  2>         at java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:647)
  2>         at java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1269)
  2>         at org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:165)
  2>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
  2>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
  2>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  2>         at java.lang.Thread.run(Thread.java:748)
  2>    4) Thread[id=338, name=elasticsearch[followerd1][ccr][T#25], state=WAITING, group=TGRP-CcrRetentionLeaseIT]
  2>         at sun.misc.Unsafe.park(Native Method)
  2>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  2>         at java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:737)
  2>         at java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:647)
  2>         at java.util.concurrent.LinkedTransferQueue.take(LinkedTransferQueue.java:1269)
  2>         at org.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:165)
  2>         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
  2>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
  2>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  2>         at java.lang.Thread.run(Thread.java:748)
[...]
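
With 328 leaked threads reported, it may help to group them by node, thread pool, and state before reading individual stack traces. The following is only a minimal sketch, assuming the console log has been saved locally and that leak entries follow the `Thread[id=..., name=elasticsearch[node][pool]..., state=...]` shape shown above. The function name `summarize_leaks` and the regular expression are illustrative and not part of the Elasticsearch test tooling.

```python
import re
from collections import Counter

# Matches leak entries such as:
#   2>    3) Thread[id=286, name=elasticsearch[followerm0][ccr][T#3], state=WAITING, group=...]
# The pool name may itself be bracketed, as in [[timer]].
LEAK_RE = re.compile(
    r"Thread\[id=\d+, name=elasticsearch\[(?P<node>[^\]]+)\]"
    r"\[(?P<pool>\[[^\]]+\]|[^\]]+)\].*?state=(?P<state>[A-Z_]+)"
)


def summarize_leaks(lines):
    """Count leaked threads per (node, pool, state) in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LEAK_RE.search(line)
        if match:
            # Strip the extra brackets around names like [timer].
            pool = match.group("pool").strip("[]")
            counts[(match.group("node"), pool, match.group("state"))] += 1
    return counts


if __name__ == "__main__":
    # Lines copied from the excerpt above; a real run would read the saved console log.
    excerpt = [
        "  2>    1) Thread[id=139, name=elasticsearch[followerm0][[timer]], state=TIMED_WAITING, group=TGRP-CcrRetentionLeaseIT]",
        "  2>    3) Thread[id=286, name=elasticsearch[followerm0][ccr][T#3], state=WAITING, group=TGRP-CcrRetentionLeaseIT]",
        "  2>    4) Thread[id=338, name=elasticsearch[followerd1][ccr][T#25], state=WAITING, group=TGRP-CcrRetentionLeaseIT]",
    ]
    for (node, pool, state), n in summarize_leaks(excerpt).most_common():
        print(f"{n:4d}  {node}  {pool}  {state}")
```

Passing an open file handle for the saved log instead of `excerpt` would summarize the whole report. The jstack section of the log uses a different line format and is deliberately not matched.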

Possibly related to #41428 and #41679, but filing this as a new issue so the team can decide.

Metadata

Labels

:Distributed Indexing/CCR (Issues around the Cross Cluster State Replication features)
>test-failure (Triaged test failures from CI)
