[CI] Test Failure in CloneSnapshotIT.testBackToBackClonesForIndexNotInCluster #64115

@original-brownbear

Description

This failed exactly once on 7.x, and I can't yet explain why or how (https://gradle-enterprise.elastic.co/s/jeugiua6ddqlc).

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.snapshots.CloneSnapshotIT.testBackToBackClonesForIndexNotInCluster" -Dtests.seed=8166C04A329F955 -Dtests.security.manager=true -Dtests.locale=nl -Dtests.timezone=Canada/Pacific -Druntime.java=11

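It failed without any exception: one of the clone operations in the test simply never made progress, which eventually led to a suite timeout. In the log below, [test-repo:target-snapshot-1] starts but, unlike [target-snapshot] and [target-snapshot-0], never completes: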

  1> [2020-10-23T21:24:03,703][INFO ][o.e.s.CloneSnapshotIT    ] [testBackToBackClonesForIndexNotInCluster] --> creating repository [test-repo] [mock]
  1> [2020-10-23T21:24:03,703][INFO ][o.e.p.PluginsService     ] [testBackToBackClonesForIndexNotInCluster] no modules loaded
  1> [2020-10-23T21:24:03,703][INFO ][o.e.p.PluginsService     ] [testBackToBackClonesForIndexNotInCluster] loaded plugin [org.elasticsearch.transport.nio.MockNioTransportPlugin]
  1> [2020-10-23T21:24:03,864][INFO ][o.e.s.m.MockRepository   ] [node_t0] starting mock repository with random prefix default
  1> [2020-10-23T21:24:03,866][INFO ][o.e.r.RepositoriesService] [node_t0] put repository [test-repo]
  1> [2020-10-23T21:24:03,889][INFO ][o.e.s.m.MockRepository   ] [node_t1] starting mock repository with random prefix default
  1> [2020-10-23T21:24:03,906][INFO ][o.e.s.m.MockRepository   ] [node_t0] starting mock repository with random prefix default
  1> [2020-10-23T21:24:03,980][INFO ][o.e.s.CloneSnapshotIT    ] [testBackToBackClonesForIndexNotInCluster] --> creating index [index-blocked]
  1> [2020-10-23T21:24:03,986][INFO ][o.e.c.m.MetadataCreateIndexService] [node_t0] [index-blocked] creating index, cause [api], templates [], shards [1]/[0]
  1> [2020-10-23T21:24:04,157][INFO ][o.e.c.r.a.AllocationService] [node_t0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[index-blocked][0]]]).
  1> [2020-10-23T21:24:04,252][INFO ][o.e.c.m.MetadataMappingService] [node_t0] [index-blocked/7J6PbSNNQ9mjoNRHNyS7QA] create_mapping [_doc]
  1> [2020-10-23T21:24:04,367][INFO ][o.e.s.CloneSnapshotIT    ] [testBackToBackClonesForIndexNotInCluster] --> creating full snapshot [source-snapshot] in [test-repo]
  1> [2020-10-23T21:24:04,417][INFO ][o.e.s.SnapshotsService   ] [node_t0] snapshot [test-repo:source-snapshot/JPDXkiKjT_uG73g8ey_C5w] started
  1> [2020-10-23T21:24:04,679][INFO ][o.e.s.SnapshotsService   ] [node_t0] snapshot [test-repo:source-snapshot/JPDXkiKjT_uG73g8ey_C5w] completed with state [SUCCESS]
  1> [2020-10-23T21:24:04,681][INFO ][o.e.p.PluginsService     ] [testBackToBackClonesForIndexNotInCluster] no modules loaded
  1> [2020-10-23T21:24:04,681][INFO ][o.e.p.PluginsService     ] [testBackToBackClonesForIndexNotInCluster] loaded plugin [org.elasticsearch.transport.nio.MockNioTransportPlugin]
  1> [2020-10-23T21:24:04,826][INFO ][o.e.c.m.MetadataDeleteIndexService] [node_t0] [index-blocked/7J6PbSNNQ9mjoNRHNyS7QA] deleting index
  1> [2020-10-23T21:24:04,901][INFO ][o.e.s.CloneSnapshotIT    ] [testBackToBackClonesForIndexNotInCluster] --> waiting for [test-repo] to be blocked on node [node_t0]
  1> [2020-10-23T21:24:04,914][INFO ][o.e.s.SnapshotsService   ] [node_t0] snapshot clone [test-repo:target-snapshot/SJqFswO9Q_W68n9LQx42BA] started
  1> [2020-10-23T21:24:04,976][INFO ][o.e.s.m.MockRepository   ] [node_t0] [test-repo] blocking I/O operation for file [snap-SJqFswO9Q_W68n9LQx42BA.dat] at path [[indices][FYwjm9zlTIO5OcEDX_aOMA][0]]
  1> [2020-10-23T21:24:05,002][INFO ][o.e.s.CloneSnapshotIT    ] [testBackToBackClonesForIndexNotInCluster] --> wait for [3] snapshots to show up in the cluster state
  1> [2020-10-23T21:24:05,015][INFO ][o.e.s.SnapshotsService   ] [node_t0] snapshot clone [test-repo:target-snapshot-1/gFemqMWARE2xRT092DA9gg] started
  1> [2020-10-23T21:24:05,024][INFO ][o.e.s.CloneSnapshotIT    ] [testBackToBackClonesForIndexNotInCluster] --> wait for [3] snapshots to show up in the cluster state
  1> [2020-10-23T21:24:05,024][INFO ][o.e.s.SnapshotsService   ] [node_t0] snapshot clone [test-repo:target-snapshot-0/KGqbSozaSQyL17ul8iaAlA] started
  1> [2020-10-23T21:24:05,025][INFO ][o.e.s.CloneSnapshotIT    ] [testBackToBackClonesForIndexNotInCluster] --> unblocking [test-repo] on node [node_t0]
  1> [2020-10-23T21:24:05,134][INFO ][o.e.s.SnapshotsService   ] [node_t0] snapshot [test-repo:target-snapshot/SJqFswO9Q_W68n9LQx42BA] completed with state [SUCCESS]
  1> [2020-10-23T21:24:05,194][INFO ][o.e.s.SnapshotsService   ] [node_t0] snapshot [test-repo:target-snapshot-0/KGqbSozaSQyL17ul8iaAlA] completed with state [SUCCESS]
  2> okt 23, 2020 9:44:02 PM com.carrotsearch.randomizedtesting.ThreadLeakControl$2 evaluate
  2> WARNING: Suite execution timed out: org.elasticsearch.snapshots.CloneSnapshotIT
  2> ==== jstack at approximately timeout time ====
  2> "main" ID=1 WAITING on java.util.concurrent.CountDownLatch$Sync@393e85e7
  2> 	at jdk.internal.misc.Unsafe.park(Native Method)
  2> 	- waiting on java.util.concurrent.CountDownLatch$Sync@393e85e7
  2> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
  2> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
  2> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
  2> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
  2> 	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
  2> 	at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:73)
  2> 	at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:47)
  2> 	at org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:56)
  2> 	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:133)
  2> 	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:71)
  2> 	at app//worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
  2> 	at app//worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)

  2> "Reference Handler" ID=2 RUNNABLE
  2> 	at java.lang.ref.Reference.waitForReferencePendingList(Native Method)
  2> 	at java.lang.ref.Reference.processPendingReferences(Reference.java:241)
  2> 	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:213)

I'll try to reason about this a little more and add some logging to see if I can track it down.
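When adding logging to track this down, it can help to mechanically list which snapshot or clone operations logged "started" but never "completed", as target-snapshot-1 did above. The following is a minimal, hypothetical sketch (StuckSnapshotFinder and findUnfinished are illustrative names, not part of Elasticsearch or its test framework). It assumes only the SnapshotsService INFO message formats visible in this log, and it only shows which operation stalled, not why.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical log-analysis helper (not part of Elasticsearch): reports snapshot or
 * clone operations that logged "started" but never logged "completed".
 * Assumes the SnapshotsService message formats seen in this issue's output.
 */
public class StuckSnapshotFinder {

    // Matches e.g. "snapshot clone [test-repo:target-snapshot-1/gFemq...] started"
    // and "snapshot [test-repo:target-snapshot/SJqF...] completed with state [SUCCESS]".
    private static final Pattern EVENT =
        Pattern.compile("snapshot (?:clone )?\\[([^\\]]+)\\] (started|completed)");

    public static Set<String> findUnfinished(List<String> logLines) {
        // LinkedHashSet keeps the report in the order the operations started.
        Set<String> pending = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher m = EVENT.matcher(line);
            if (!m.find()) {
                continue; // not a start/completion message
            }
            String snapshot = m.group(1);
            if (m.group(2).equals("started")) {
                pending.add(snapshot);
            } else {
                pending.remove(snapshot);
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        // Abbreviated lines taken from the failing run above.
        List<String> lines = List.of(
            "[node_t0] snapshot clone [test-repo:target-snapshot/SJqFswO9Q_W68n9LQx42BA] started",
            "[node_t0] snapshot clone [test-repo:target-snapshot-1/gFemqMWARE2xRT092DA9gg] started",
            "[node_t0] snapshot clone [test-repo:target-snapshot-0/KGqbSozaSQyL17ul8iaAlA] started",
            "[node_t0] snapshot [test-repo:target-snapshot/SJqFswO9Q_W68n9LQx42BA] completed with state [SUCCESS]",
            "[node_t0] snapshot [test-repo:target-snapshot-0/KGqbSozaSQyL17ul8iaAlA] completed with state [SUCCESS]");
        System.out.println("never completed: " + findUnfinished(lines));
    }
}
```

Run against these lines, it prints `never completed: [test-repo:target-snapshot-1/gFemqMWARE2xRT092DA9gg]`. Lines such as "--> creating full snapshot [source-snapshot] in [test-repo]" are ignored because the bracketed name is not followed by "started" or "completed".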

Metadata

Labels

:Distributed Coordination/Snapshot/Restore: Anything directly related to the `_snapshot/*` APIs
>test-failure: Triaged test failures from CI
Team:Distributed (Obsolete): Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests