CloseWhileRelocatingShardsIT.testCloseWhileRelocatingShards failure on master due to AssertionError in production code #39588

@gwbrown

Example CI links:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+matrix-java-periodic/ES_BUILD_JAVA=java11,ES_RUNTIME_JAVA=zulu8,nodes=immutable&&linux&&docker/272/console
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+intake/2323/console

I believe this may be related to replicated closed indices, as this particular failure first appeared (according to build stats) on the replicated-closed-indices branch on Feb. 8.

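The assertion appears to come from the check in ReadOnlyEngine.assertMaxSeqNoEqualsToGlobalCheckpoint (ReadOnlyEngine.java:142 in the stack trace below), which is called from the ReadOnlyEngine constructor.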
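For readers unfamiliar with the check, here is a minimal sketch of the kind of comparison that would produce the message seen in the stack trace. It is not the actual Elasticsearch source: the class name `SeqNoCheckSketch`, the method name, and the parameter order (max sequence number first, global checkpoint second) are assumptions inferred from the assertion message alone.

```java
// Hypothetical, simplified sketch; NOT the real ReadOnlyEngine code.
final class SeqNoCheckSketch {

    // Assumed semantics: fail when the highest sequence number found in the
    // shard's last commit differs from the global checkpoint.
    static void assertMaxSeqNoEqualsGlobalCheckpoint(long maxSeqNo, long globalCheckpoint) {
        if (maxSeqNo != globalCheckpoint) {
            throw new AssertionError(
                "max seq. no. [" + maxSeqNo + "] does not match [" + globalCheckpoint + "]");
        }
    }

    public static void main(String[] args) {
        // Matching values pass silently.
        assertMaxSeqNoEqualsGlobalCheckpoint(531L, 531L);

        // The values from the reported failure trip the check.
        try {
            assertMaxSeqNoEqualsGlobalCheckpoint(-1L, 531L);
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, the reported failure would mean the engine was opened on a shard whose max sequence number read as -1 while the other value was 531, that is, the two had diverged by the time the closed index's shard was recovered from the store.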

Reproduce line (does not reproduce locally):

./gradlew :server:integTest \
  -Dtests.seed=BB15E4FDA1CABDD9 \
  -Dtests.class=org.elasticsearch.indices.state.CloseWhileRelocatingShardsIT \
  -Dtests.method="testCloseWhileRelocatingShards" \
  -Dtests.security.manager=true \
  -Dtests.locale=de-DE \
  -Dtests.timezone=America/Lower_Princes \
  -Dcompiler.java=11 \
  -Druntime.java=8

Stack trace:

com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=5128, name=elasticsearch[node_sd2][generic][T#1], state=RUNNABLE, group=TGRP-CloseWhileRelocatingShardsIT]
	at __randomizedtesting.SeedInfo.seed([B020647676D038FA:9FAB7AE3594E6E40]:0)
Caused by: java.lang.AssertionError: max seq. no. [-1] does not match [531]
	at __randomizedtesting.SeedInfo.seed([B020647676D038FA]:0)
	at org.elasticsearch.index.engine.ReadOnlyEngine.assertMaxSeqNoEqualsToGlobalCheckpoint(ReadOnlyEngine.java:142)
	at org.elasticsearch.index.engine.ReadOnlyEngine.<init>(ReadOnlyEngine.java:116)
	at org.elasticsearch.index.engine.NoOpEngine.<init>(NoOpEngine.java:40)
	at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1442)
	at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1395)
	at org.elasticsearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:424)
	at org.elasticsearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:95)
	at org.elasticsearch.index.shard.StoreRecovery.executeRecovery(StoreRecovery.java:302)
	at org.elasticsearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:93)
	at org.elasticsearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:1681)
	at org.elasticsearch.index.shard.IndexShard.lambda$startRecovery$9(IndexShard.java:2318)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

I'm going to mute this test in master as it appears to be failing a few times per day and looks like a legitimate failure.

Labels: :Distributed Indexing/Distributed, >test-failure
