Closed
Labels
`:Distributed Coordination/Snapshot/Restore` — Anything directly related to the `_snapshot/*` APIs
`>test-failure` — Triaged test failures from CI
Description
This test failed in this CI run: https://gradle-enterprise.elastic.co/s/rlbqh5tapantg/ (in 7.x). The test shuts down a node while that node is snapshotting a shard, but the shard snapshot is never marked as failed:
1> [2019-10-25T01:22:40,150][INFO ][o.e.s.DedicatedClusterSnapshotRestoreIT] [testDataNodeRestartAfterShardSnapshotFailure] --> snapshot
1> [2019-10-25T01:22:40,160][INFO ][o.e.s.SnapshotsService ] [node_tm0] snapshot [test-repo:test-snap/8CMeKZucSfKdomiV2gX9Lw] started
1> [2019-10-25T01:22:40,217][INFO ][o.e.s.DedicatedClusterSnapshotRestoreIT] [testDataNodeRestartAfterShardSnapshotFailure] --> restarting first data node, which should cause the primary shard on it to be failed
1> [2019-10-25T01:22:40,217][INFO ][o.e.t.InternalTestCluster] [testDataNodeRestartAfterShardSnapshotFailure] Restarting node [node_td1]
1> [2019-10-25T01:22:40,218][INFO ][o.e.n.Node ] [testDataNodeRestartAfterShardSnapshotFailure] stopping ...
1> [2019-10-25T01:22:40,220][INFO ][o.e.c.c.Coordinator ] [node_td1] master node [{node_tm0}{ztB71aT-Qfmpv14TI36OOg}{JRG4MuoYREqTFOGkDH3D2g}{127.0.0.1}{127.0.0.1:34059}{m}] failed, restarting discovery
1> org.elasticsearch.transport.NodeDisconnectedException: [node_tm0][127.0.0.1:34059][disconnected] disconnected
1> [2019-10-25T01:22:40,223][INFO ][o.e.s.m.MockRepository ] [node_td2] [test-repo] blocking I/O operation for file [__Gee36ShzRIWE9EIdskMBfw.part0] at path [[indices][Md-z-wvRQPuh-C_U1reAfQ][1]]
1> [2019-10-25T01:22:40,224][INFO ][o.e.c.s.MasterService ] [node_tm0] node-left[{node_td1}{Kn1zpH90QPqXMtL1NqqVOg}{DKlTH9pkRymZtUbRoe8_rA}{127.0.0.1}{127.0.0.1:37309}{di} reason: disconnected], term: 1, version: 12, delta: removed {{node_td1}{Kn1zpH90QPqXMtL1NqqVOg}{DKlTH9pkRymZtUbRoe8_rA}{127.0.0.1}{127.0.0.1:37309}{di}}
1> [2019-10-25T01:22:40,224][WARN ][o.e.s.SnapshotShardsService] [node_td1] [[test-idx][0]][test-repo:test-snap/8CMeKZucSfKdomiV2gX9Lw] failed to snapshot shard
1> org.elasticsearch.index.snapshots.IndexShardSnapshotFailedException: Aborted
1> at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:1112) ~[main/:?]
1> at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:337) ~[main/:?]
1> at org.elasticsearch.snapshots.SnapshotShardsService.lambda$startNewShards$1(SnapshotShardsService.java:285) ~[main/:?]
1> at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) [main/:?]
1> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_221]
1> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_221]
1> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_221]
1> [2019-10-25T01:22:40,226][INFO ][o.e.n.Node ] [testDataNodeRestartAfterShardSnapshotFailure] stopped
1> [2019-10-25T01:22:40,227][INFO ][o.e.n.Node ] [testDataNodeRestartAfterShardSnapshotFailure] closing ...
1> [2019-10-25T01:22:40,229][WARN ][o.e.s.SnapshotShardsService] [node_td1] [test-repo:test-snap/8CMeKZucSfKdomiV2gX9Lw] [ShardSnapshotStatus[state=FAILED, nodeId=Kn1zpH90QPqXMtL1NqqVOg, reason=[test-idx/l-lv0KpjQX6MGCl5PTKXUw][[test-idx][0]] IndexShardSnapshotFailedException[Aborted]
1> at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:1112)
1> at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:337)
1> at org.elasticsearch.snapshots.SnapshotShardsService.lambda$startNewShards$1(SnapshotShardsService.java:285)
1> at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703)
1> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
1> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
1> at java.lang.Thread.run(Thread.java:748)
1> , generation=null]] failed to update snapshot state
1> org.elasticsearch.transport.SendRequestTransportException: [node_td1][127.0.0.1:37309][internal:cluster/snapshot/update_snapshot_status]
1> at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:704) ~[main/:?]
1> at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:602) ~[main/:?]
1> at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:577) ~[main/:?]
1> at org.elasticsearch.snapshots.SnapshotShardsService.lambda$sendSnapshotShardUpdate$2(SnapshotShardsService.java:471) ~[main/:?]
1> at org.elasticsearch.transport.TransportRequestDeduplicator.executeOnce(TransportRequestDeduplicator.java:52) ~[main/:?]
1> at org.elasticsearch.snapshots.SnapshotShardsService.sendSnapshotShardUpdate(SnapshotShardsService.java:457) ~[main/:?]
1> at org.elasticsearch.snapshots.SnapshotShardsService.notifyFailedSnapshotShard(SnapshotShardsService.java:451) ~[main/:?]
1> at org.elasticsearch.snapshots.SnapshotShardsService.access$400(SnapshotShardsService.java:91) ~[main/:?]
1> at org.elasticsearch.snapshots.SnapshotShardsService$1.onFailure(SnapshotShardsService.java:301) ~[main/:?]
1> at org.elasticsearch.action.ActionListener$5.onFailure(ActionListener.java:256) ~[main/:?]
1> at org.elasticsearch.repositories.blobstore.BlobStoreRepository.lambda$snapshotShard$32(BlobStoreRepository.java:1061) ~[main/:?]
1> at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:71) ~[main/:?]
1> at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:1239) ~[main/:?]
1> at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:337) ~[main/:?]
1> at org.elasticsearch.snapshots.SnapshotShardsService.lambda$startNewShards$1(SnapshotShardsService.java:285) ~[main/:?]
1> at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) ~[main/:?]
1> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_221]
1> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_221]
1> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_221]
1> Caused by: org.elasticsearch.transport.TransportException: TransportService is closed stopped can't send request
1> at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:686) ~[main/:?]
1> ... 18 more
1> [2019-10-25T01:22:40,232][INFO ][o.e.n.Node ] [testDataNodeRestartAfterShardSnapshotFailure] closed
1> [2019-10-25T01:22:40,240][INFO ][o.e.e.NodeEnvironment ] [testDataNodeRestartAfterShardSnapshotFailure] using [1] data paths, mounts [[/ (/dev/sda1)]], net usable_space [307gb], net total_space [349.9gb], types [xfs]
1> [2019-10-25T01:22:40,240][INFO ][o.e.e.NodeEnvironment ] [testDataNodeRestartAfterShardSnapshotFailure] heap size [491mb], compressed ordinary object pointers [true]
1> [2019-10-25T01:22:40,249][INFO ][o.e.n.Node ] [testDataNodeRestartAfterShardSnapshotFailure] node name [node_td1], node ID [Kn1zpH90QPqXMtL1NqqVOg], cluster name [TEST-TEST_WORKER_VM=[435]-CLUSTER_SEED=[-2213907199717514849]-HASH=[17AAA92C130]-cluster]
1> [2019-10-25T01:22:40,249][INFO ][o.e.n.Node ] [testDataNodeRestartAfterShardSnapshotFailure] version[7.6.0], pid[217518], build[unknown/unknown/unknown/unknown], OS[Linux/4.14.35-1902.6.6.el7uek.x86_64/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_221/25.221-b11]
1> [2019-10-25T01:22:40,249][INFO ][o.e.n.Node ] [testDataNodeRestartAfterShardSnapshotFailure] JVM home [/var/lib/jenkins/.java/oracle-8u221-linux/jre]
1> [2019-10-25T01:22:40,249][INFO ][o.e.n.Node ] [testDataNodeRestartAfterShardSnapshotFailure] JVM arguments [-Dfile.encoding=UTF8, -Dcompiler.java=12, -Des.scripting.update.ctx_in_params=false, -Des.transport.cname_in_publish_address=true, -Dgradle.dist.lib=/var/lib/jenkins/.gradle/wrapper/dists/gradle-5.6.2-all/9st6wgf78h16so49nn74lgtbb/gradle-5.6.2/lib, -Dgradle.user.home=/var/lib/jenkins/.gradle, -Dgradle.worker.jar=/var/lib/jenkins/.gradle/caches/5.6.2/workerMain/gradle-worker.jar, -Dio.netty.noKeySetOptimization=true, -Dio.netty.noUnsafe=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Djava.awt.headless=true, -Djava.locale.providers=SPI,JRE, -Djava.security.manager=worker.org.gradle.process.internal.worker.child.BootstrapSecurityManager, -Djna.nosys=true, -Dorg.gradle.native=false, -Druntime.java=8, -Dtests.artifact=server, -Dtests.gradle=true, -Dtests.logger.level=WARN, -Dtests.security.manager=true, -Dtests.seed=14AC7937BAB45A6E, -Dtests.task=:server:integTest, -esa, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/var/lib/jenkins/workspace/elastic+elasticsearch+7.x+multijob-unix-compatibility/os/oraclelinux-7&&immutable/server/build/heapdump, -Xms512m, -Xmx512m, -Dfile.encoding=UTF-8, -Djava.io.tmpdir=./temp, -Duser.country=US, -Duser.language=en, -Duser.variant, -ea]
1> [2019-10-25T01:22:40,250][INFO ][o.e.p.PluginsService ] [testDataNodeRestartAfterShardSnapshotFailure] no modules loaded
1> [2019-10-25T01:22:40,250][INFO ][o.e.p.PluginsService ] [testDataNodeRestartAfterShardSnapshotFailure] loaded plugin [org.elasticsearch.snapshots.DedicatedClusterSnapshotRestoreIT$BrokenSettingPlugin]
1> [2019-10-25T01:22:40,250][INFO ][o.e.p.PluginsService ] [testDataNodeRestartAfterShardSnapshotFailure] loaded plugin [org.elasticsearch.snapshots.DedicatedClusterSnapshotRestoreIT$TestCustomMetaDataPlugin]
1> [2019-10-25T01:22:40,250][INFO ][o.e.p.PluginsService ] [testDataNodeRestartAfterShardSnapshotFailure] loaded plugin [org.elasticsearch.snapshots.mockstore.MockRepository$Plugin]
1> [2019-10-25T01:22:40,250][INFO ][o.e.p.PluginsService ] [testDataNodeRestartAfterShardSnapshotFailure] loaded plugin [org.elasticsearch.test.ESIntegTestCase$AssertActionNamePlugin]
1> [2019-10-25T01:22:40,250][INFO ][o.e.p.PluginsService ] [testDataNodeRestartAfterShardSnapshotFailure] loaded plugin [org.elasticsearch.test.ESIntegTestCase$TestSeedPlugin]
1> [2019-10-25T01:22:40,250][INFO ][o.e.p.PluginsService ] [testDataNodeRestartAfterShardSnapshotFailure] loaded plugin [org.elasticsearch.test.MockHttpTransport$TestPlugin]
1> [2019-10-25T01:22:40,250][INFO ][o.e.p.PluginsService ] [testDataNodeRestartAfterShardSnapshotFailure] loaded plugin [org.elasticsearch.transport.nio.MockNioTransportPlugin]
1> [2019-10-25T01:22:40,269][INFO ][o.e.d.DiscoveryModule ] [testDataNodeRestartAfterShardSnapshotFailure] using discovery type [zen] and seed hosts providers [settings, file]
1> [2019-10-25T01:22:40,290][INFO ][o.e.n.Node ] [testDataNodeRestartAfterShardSnapshotFailure] initialized
1> [2019-10-25T01:22:40,290][INFO ][o.e.n.Node ] [testDataNodeRestartAfterShardSnapshotFailure] starting ...
1> [2019-10-25T01:22:40,294][INFO ][o.e.t.TransportService ] [testDataNodeRestartAfterShardSnapshotFailure] publish_address {127.0.0.1:38475}, bound_addresses {[::1]:35853}, {127.0.0.1:38475}
1> [2019-10-25T01:22:40,301][INFO ][o.e.c.s.ClusterApplierService] [node_td2] removed {{node_td1}{Kn1zpH90QPqXMtL1NqqVOg}{DKlTH9pkRymZtUbRoe8_rA}{127.0.0.1}{127.0.0.1:37309}{di}}, term: 1, version: 12, reason: ApplyCommitRequest{term=1, version=12, sourceNode={node_tm0}{ztB71aT-Qfmpv14TI36OOg}{JRG4MuoYREqTFOGkDH3D2g}{127.0.0.1}{127.0.0.1:34059}{m}}
1> [2019-10-25T01:22:40,305][INFO ][o.e.c.s.ClusterApplierService] [node_tm0] removed {{node_td1}{Kn1zpH90QPqXMtL1NqqVOg}{DKlTH9pkRymZtUbRoe8_rA}{127.0.0.1}{127.0.0.1:37309}{di}}, term: 1, version: 12, reason: Publication{term=1, version=12}
1> [2019-10-25T01:22:40,309][INFO ][o.e.c.c.Coordinator ] [testDataNodeRestartAfterShardSnapshotFailure] cluster UUID [6WUNaFxyS9e_GZgDQ7k3FQ]
1> [2019-10-25T01:22:40,356][INFO ][o.e.c.s.MasterService ] [node_tm0] node-join[{node_td1}{Kn1zpH90QPqXMtL1NqqVOg}{AdrMl80YTKu-MHo2vIUEhw}{127.0.0.1}{127.0.0.1:38475}{di} join existing leader], term: 1, version: 14, delta: added {{node_td1}{Kn1zpH90QPqXMtL1NqqVOg}{AdrMl80YTKu-MHo2vIUEhw}{127.0.0.1}{127.0.0.1:38475}{di}}
1> [2019-10-25T01:22:40,359][INFO ][o.e.c.s.ClusterApplierService] [node_td2] added {{node_td1}{Kn1zpH90QPqXMtL1NqqVOg}{AdrMl80YTKu-MHo2vIUEhw}{127.0.0.1}{127.0.0.1:38475}{di}}, term: 1, version: 14, reason: ApplyCommitRequest{term=1, version=14, sourceNode={node_tm0}{ztB71aT-Qfmpv14TI36OOg}{JRG4MuoYREqTFOGkDH3D2g}{127.0.0.1}{127.0.0.1:34059}{m}}
1> [2019-10-25T01:22:40,359][INFO ][o.e.c.s.ClusterApplierService] [node_td1] master node changed {previous [], current [{node_tm0}{ztB71aT-Qfmpv14TI36OOg}{JRG4MuoYREqTFOGkDH3D2g}{127.0.0.1}{127.0.0.1:34059}{m}]}, added {{node_td2}{mpwldoJOQTucDLY79bMP6w}{Ge9Glg2TTyCthw9_1wtfnQ}{127.0.0.1}{127.0.0.1:36515}{di},{node_tm0}{ztB71aT-Qfmpv14TI36OOg}{JRG4MuoYREqTFOGkDH3D2g}{127.0.0.1}{127.0.0.1:34059}{m}}, term: 1, version: 14, reason: ApplyCommitRequest{term=1, version=14, sourceNode={node_tm0}{ztB71aT-Qfmpv14TI36OOg}{JRG4MuoYREqTFOGkDH3D2g}{127.0.0.1}{127.0.0.1:34059}{m}}
1> [2019-10-25T01:22:40,362][INFO ][o.e.s.m.MockRepository ] [node_td1] starting mock repository with random prefix default
1> [2019-10-25T01:22:40,442][WARN ][o.e.s.SnapshotShardsService] [node_td1] [[test-idx][0]][test-repo:test-snap/8CMeKZucSfKdomiV2gX9Lw] failed to snapshot shard
1> org.elasticsearch.index.IndexNotFoundException: no such index [test-idx]
1> at org.elasticsearch.indices.IndicesService.indexServiceSafe(IndicesService.java:458) ~[main/:?]
1> at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:317) ~[main/:?]
1> at org.elasticsearch.snapshots.SnapshotShardsService.lambda$startNewShards$1(SnapshotShardsService.java:285) ~[main/:?]
1> at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) [main/:?]
1> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_221]
1> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_221]
1> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_221]
1> [2019-10-25T01:22:40,444][INFO ][o.e.n.Node ] [testDataNodeRestartAfterShardSnapshotFailure] started
1> [2019-10-25T01:22:40,444][INFO ][o.e.c.s.ClusterApplierService] [node_tm0] added {{node_td1}{Kn1zpH90QPqXMtL1NqqVOg}{AdrMl80YTKu-MHo2vIUEhw}{127.0.0.1}{127.0.0.1:38475}{di}}, term: 1, version: 14, reason: Publication{term=1, version=14}
1> [2019-10-25T01:22:40,447][INFO ][o.e.s.DedicatedClusterSnapshotRestoreIT] [testDataNodeRestartAfterShardSnapshotFailure] --> wait for shard snapshot of first primary to show as failed
1> [2019-10-25T01:22:40,632][INFO ][o.e.c.r.a.AllocationService] [node_tm0] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[test-idx][0]]]).
1> [2019-10-25T01:23:46,034][INFO ][o.e.a.a.c.r.c.TransportCleanupRepositoryAction] [node_tm0] Running cleanup operations on repository [test-repo][-1]
1> [2019-10-25T01:23:46,036][WARN ][o.e.a.a.c.r.c.TransportCleanupRepositoryAction] [node_tm0] Failed to run repository cleanup operations on [test-repo][-1]
1> java.lang.IllegalStateException: Cannot cleanup [test-repo] - a snapshot is currently running
1> at org.elasticsearch.action.admin.cluster.repositories.cleanup.TransportCleanupRepositoryAction$2.execute(TransportCleanupRepositoryAction.java:187) ~[main/:?]
1> at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47) ~[main/:?]
1> at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:702) ~[main/:?]
1> at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:324) ~[main/:?]
1> at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:219) [main/:?]
1> at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73) [main/:?]
1> at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151) [main/:?]
1> at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [main/:?]
1> at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [main/:?]
1> at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) [main/:?]
1> at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [main/:?]
1> at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [main/:?]
1> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_221]
1> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_221]
1> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_221]
1> [2019-10-25T01:23:46,039][INFO ][o.e.s.DedicatedClusterSnapshotRestoreIT] [testDataNodeRestartAfterShardSnapshotFailure] [DedicatedClusterSnapshotRestoreIT#testDataNodeRestartAfterShardSnapshotFailure]: cleaning up after test
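Reading the log, the shard snapshot did fail locally on `node_td1` (`IndexShardSnapshotFailedException: Aborted`), but the status update to the master was lost because the node's `TransportService` was already stopped (`SendRequestTransportException` ... `TransportService is closed stopped can't send request`). After the restart, the node rejoins with the same node ID but never resends the FAILED status, so the master still thinks a snapshot is running (the later cleanup attempt fails with "a snapshot is currently running"). The following is a minimal, hypothetical Java sketch of that race — all class and method names are illustrative, not Elasticsearch's actual implementation — contrasting a fire-and-forget notification (lost on shutdown) with one that queues the update for resend after the node comes back:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the lost shard-status update. Names are
// illustrative only; this is not Elasticsearch's real API.
public class LostUpdateSketch {

    /** Simulated transport that can be stopped, like TransportService in the log. */
    static class Transport {
        boolean closed;
        int delivered;
        void send(String msg) {
            if (closed) {
                throw new IllegalStateException("transport is closed");
            }
            delivered++;
        }
    }

    /** Fire-and-forget: the FAILED status is dropped if the send throws. */
    static boolean fireAndForget(Transport t, String update) {
        try {
            t.send(update);
            return true;
        } catch (IllegalStateException e) {
            return false; // update lost: the master never marks the shard FAILED
        }
    }

    /** Queue-and-resend: keep the update until a send actually succeeds. */
    static class ResendingNotifier {
        final Deque<String> pending = new ArrayDeque<>();
        void notifyFailed(Transport t, String update) {
            pending.add(update);
            flush(t);
        }
        void flush(Transport t) {
            while (!pending.isEmpty()) {
                try {
                    t.send(pending.peek());
                    pending.poll();
                } catch (IllegalStateException e) {
                    return; // retry after the node restarts and reconnects
                }
            }
        }
    }

    public static void main(String[] args) {
        Transport transport = new Transport();
        transport.closed = true; // node is shutting down, as in the log

        boolean sent = fireAndForget(transport, "shard [test-idx][0] FAILED");
        System.out.println("fire-and-forget delivered: " + sent);

        ResendingNotifier notifier = new ResendingNotifier();
        notifier.notifyFailed(transport, "shard [test-idx][0] FAILED");
        transport.closed = false; // node restarted and rejoined the cluster
        notifier.flush(transport);
        System.out.println("resent after restart: " + (transport.delivered == 1));
    }
}
```

Under this (assumed) reading, the fix direction would be for either the restarted node or the master to reconcile shard snapshot state on rejoin, rather than relying on a one-shot status update surviving node shutdown.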