Closed
Labels
:Distributed Coordination/Snapshot/Restore: Anything directly related to the `_snapshot/*` APIs
>test-failure: Triaged test failures from CI
Team:Distributed (Obsolete): Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
Description
This failed exactly once in 7.x, and so far I can't explain why or how (https://gradle-enterprise.elastic.co/s/jeugiua6ddqlc).
REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.snapshots.CloneSnapshotIT.testBackToBackClonesForIndexNotInCluster" -Dtests.seed=8166C04A329F955 -Dtests.security.manager=true -Dtests.locale=nl -Dtests.timezone=Canada/Pacific -Druntime.java=11
The test failed without throwing any exception: one of the clone operations never made progress, which led to a suite timeout. In the log below, the clone [target-snapshot-1] is logged as started but never as completed:
1> [2020-10-23T21:24:03,703][INFO ][o.e.s.CloneSnapshotIT ] [testBackToBackClonesForIndexNotInCluster] --> creating repository [test-repo] [mock]
1> [2020-10-23T21:24:03,703][INFO ][o.e.p.PluginsService ] [testBackToBackClonesForIndexNotInCluster] no modules loaded
1> [2020-10-23T21:24:03,703][INFO ][o.e.p.PluginsService ] [testBackToBackClonesForIndexNotInCluster] loaded plugin [org.elasticsearch.transport.nio.MockNioTransportPlugin]
1> [2020-10-23T21:24:03,864][INFO ][o.e.s.m.MockRepository ] [node_t0] starting mock repository with random prefix default
1> [2020-10-23T21:24:03,866][INFO ][o.e.r.RepositoriesService] [node_t0] put repository [test-repo]
1> [2020-10-23T21:24:03,889][INFO ][o.e.s.m.MockRepository ] [node_t1] starting mock repository with random prefix default
1> [2020-10-23T21:24:03,906][INFO ][o.e.s.m.MockRepository ] [node_t0] starting mock repository with random prefix default
1> [2020-10-23T21:24:03,980][INFO ][o.e.s.CloneSnapshotIT ] [testBackToBackClonesForIndexNotInCluster] --> creating index [index-blocked]
1> [2020-10-23T21:24:03,986][INFO ][o.e.c.m.MetadataCreateIndexService] [node_t0] [index-blocked] creating index, cause [api], templates [], shards [1]/[0]
1> [2020-10-23T21:24:04,157][INFO ][o.e.c.r.a.AllocationService] [node_t0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[index-blocked][0]]]).
1> [2020-10-23T21:24:04,252][INFO ][o.e.c.m.MetadataMappingService] [node_t0] [index-blocked/7J6PbSNNQ9mjoNRHNyS7QA] create_mapping [_doc]
1> [2020-10-23T21:24:04,367][INFO ][o.e.s.CloneSnapshotIT ] [testBackToBackClonesForIndexNotInCluster] --> creating full snapshot [source-snapshot] in [test-repo]
1> [2020-10-23T21:24:04,417][INFO ][o.e.s.SnapshotsService ] [node_t0] snapshot [test-repo:source-snapshot/JPDXkiKjT_uG73g8ey_C5w] started
1> [2020-10-23T21:24:04,679][INFO ][o.e.s.SnapshotsService ] [node_t0] snapshot [test-repo:source-snapshot/JPDXkiKjT_uG73g8ey_C5w] completed with state [SUCCESS]
1> [2020-10-23T21:24:04,681][INFO ][o.e.p.PluginsService ] [testBackToBackClonesForIndexNotInCluster] no modules loaded
1> [2020-10-23T21:24:04,681][INFO ][o.e.p.PluginsService ] [testBackToBackClonesForIndexNotInCluster] loaded plugin [org.elasticsearch.transport.nio.MockNioTransportPlugin]
1> [2020-10-23T21:24:04,826][INFO ][o.e.c.m.MetadataDeleteIndexService] [node_t0] [index-blocked/7J6PbSNNQ9mjoNRHNyS7QA] deleting index
1> [2020-10-23T21:24:04,901][INFO ][o.e.s.CloneSnapshotIT ] [testBackToBackClonesForIndexNotInCluster] --> waiting for [test-repo] to be blocked on node [node_t0]
1> [2020-10-23T21:24:04,914][INFO ][o.e.s.SnapshotsService ] [node_t0] snapshot clone [test-repo:target-snapshot/SJqFswO9Q_W68n9LQx42BA] started
1> [2020-10-23T21:24:04,976][INFO ][o.e.s.m.MockRepository ] [node_t0] [test-repo] blocking I/O operation for file [snap-SJqFswO9Q_W68n9LQx42BA.dat] at path [[indices][FYwjm9zlTIO5OcEDX_aOMA][0]]
1> [2020-10-23T21:24:05,002][INFO ][o.e.s.CloneSnapshotIT ] [testBackToBackClonesForIndexNotInCluster] --> wait for [3] snapshots to show up in the cluster state
1> [2020-10-23T21:24:05,015][INFO ][o.e.s.SnapshotsService ] [node_t0] snapshot clone [test-repo:target-snapshot-1/gFemqMWARE2xRT092DA9gg] started
1> [2020-10-23T21:24:05,024][INFO ][o.e.s.CloneSnapshotIT ] [testBackToBackClonesForIndexNotInCluster] --> wait for [3] snapshots to show up in the cluster state
1> [2020-10-23T21:24:05,024][INFO ][o.e.s.SnapshotsService ] [node_t0] snapshot clone [test-repo:target-snapshot-0/KGqbSozaSQyL17ul8iaAlA] started
1> [2020-10-23T21:24:05,025][INFO ][o.e.s.CloneSnapshotIT ] [testBackToBackClonesForIndexNotInCluster] --> unblocking [test-repo] on node [node_t0]
1> [2020-10-23T21:24:05,134][INFO ][o.e.s.SnapshotsService ] [node_t0] snapshot [test-repo:target-snapshot/SJqFswO9Q_W68n9LQx42BA] completed with state [SUCCESS]
1> [2020-10-23T21:24:05,194][INFO ][o.e.s.SnapshotsService ] [node_t0] snapshot [test-repo:target-snapshot-0/KGqbSozaSQyL17ul8iaAlA] completed with state [SUCCESS]
2> okt 23, 2020 9:44:02 PM com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate
2> WARNING: Suite execution timed out: org.elasticsearch.snapshots.CloneSnapshotIT
2> ==== jstack at approximately timeout time ====
2> "main" ID=1 WAITING on java.util.concurrent.CountDownLatch$Sync@393e85e7
2> at java.base/jdk.internal.misc.Unsafe.park(Native Method)
2> - waiting on java.util.concurrent.CountDownLatch$Sync@393e85e7
2> at java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
2> at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
2> at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
2> at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
2> at java.base/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
2> at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:73)
2> at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:47)
2> at org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:56)
2> at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:133)
2> at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:71)
2> at app//worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
2> at app//worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)
2> "Reference Handler" ID=2 RUNNABLE
2> at java.base/java.lang.ref.Reference.waitForReferencePendingList(Native Method)
2> at java.base/java.lang.ref.Reference.processPendingReferences(Reference.java:241)
2> at java.base/java.lang.ref.Reference$ReferenceHandler.run(Reference.java:213)
I'll try to reason about this a little more and add some logging to see if I can track it down.
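As a rough aid for triaging logs like the one above, the stuck operation can be found by pairing each "snapshot clone [...] started" line with a later "snapshot [...] completed with state" line and reporting whatever is left over. The sketch below is a standalone helper, not part of Elasticsearch: the class name `UnfinishedCloneFinder` and its method are hypothetical, and the two regular expressions assume the exact message wording seen in this log.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical triage helper (not an Elasticsearch class). */
public final class UnfinishedCloneFinder {

    // Assumed wording: "snapshot clone [repo:name/uuid] started"
    private static final Pattern STARTED =
        Pattern.compile("snapshot clone \\[([^\\]]+)\\] started");

    // Assumed wording: "snapshot [repo:name/uuid] completed with state [...]"
    private static final Pattern COMPLETED =
        Pattern.compile("snapshot \\[([^\\]]+)\\] completed with state");

    private UnfinishedCloneFinder() {}

    /** Returns clones that logged "started" but never "completed", in start order. */
    public static List<String> findUnfinished(List<String> logLines) {
        Set<String> pending = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher started = STARTED.matcher(line);
            if (started.find()) {
                pending.add(started.group(1));
                continue;
            }
            Matcher completed = COMPLETED.matcher(line);
            if (completed.find()) {
                // No-op for regular (non-clone) snapshots, which were never added.
                pending.remove(completed.group(1));
            }
        }
        return new ArrayList<>(pending);
    }

    public static void main(String[] args) {
        // Message parts copied from the log in this report.
        List<String> lines = List.of(
            "snapshot clone [test-repo:target-snapshot/SJqFswO9Q_W68n9LQx42BA] started",
            "snapshot clone [test-repo:target-snapshot-1/gFemqMWARE2xRT092DA9gg] started",
            "snapshot clone [test-repo:target-snapshot-0/KGqbSozaSQyL17ul8iaAlA] started",
            "snapshot [test-repo:target-snapshot/SJqFswO9Q_W68n9LQx42BA] completed with state [SUCCESS]",
            "snapshot [test-repo:target-snapshot-0/KGqbSozaSQyL17ul8iaAlA] completed with state [SUCCESS]");
        // Prints: [test-repo:target-snapshot-1/gFemqMWARE2xRT092DA9gg]
        System.out.println(findUnfinished(lines));
    }
}
```

Run against the excerpt in this report, it singles out [target-snapshot-1], which matches what the log shows. It only narrows down which clone hung; it says nothing about why.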