Closed
Labels
:ml (Machine learning), >test-failure (Triaged test failures from CI)
Description
Failure not reproducible locally.
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+internalClusterTest/3881/consoleFull
Reproduce Line:
./gradlew :x-pack:plugin:ml:internalClusterTest -Dtests.seed=D9A30C4EA4AC438B -Dtests.class=org.elasticsearch.xpack.ml.integration.BasicDistributedJobsIT -Dtests.method="testFailOverBasics" -Dtests.security.manager=true -Dtests.locale=en-AU -Dtests.timezone=Etc/GMT-14 -Dcompiler.java=12 -Druntime.java=8
Digging into the failure, it appears that the test timed out waiting for ensureGreen:
1> [2019-03-28T07:05:01,572][INFO ][o.e.x.m.i.BasicDistributedJobsIT] [testFailOverBasics] ensureGreen timed out, cluster state:
The sequence of events appears to be:
• We killed the master (node0), and it abdicated to node3.
• We waited for green but timed out.
• The .ml-state index never finished building a replica. It seems we may have killed the node too quickly, while the replica was still initializing. With the primary gone and no valid shard copy left, the .ml-state index kept the cluster from going green.
1> ----shard_id [.ml-state][0]
1> --------[.ml-state][0], node[null], [P], recovery_source[existing store recovery; bootstrap_history_uuid=false], s[UNASSIGNED], unassigned_info[[reason=NODE_LEFT], at[2019-03-27T17:04:31.506Z], delayed=false, details[node_left [7teV8OudStufgnfSDcALpw]], allocation_status[no_valid_shard_copy]]
1> --------[.ml-state][0], node[null], [R], recovery_source[peer recovery], s[UNASSIGNED], unassigned_info[[reason=PRIMARY_FAILED], at[2019-03-27T17:04:31.506Z], delayed=false, details[primary failed while replica initializing], allocation_status[no_attempt]]
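To make the reasoning above concrete, here is a minimal sketch of how the two routing lines translate into a health status. It is not Elasticsearch code: the class `ShardHealthSketch`, its `Health` enum, and the `healthOf` helper are hypothetical names, and the rule is a simplification (any inactive primary means red, any inactive replica means yellow, with STARTED and RELOCATING treated as active). The input lines are abbreviated copies of the log output above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Elasticsearch: derives a simplified
// health status from routing-table lines like the ones in the CI log.
public class ShardHealthSketch {

    enum Health { GREEN, YELLOW, RED }

    // Captures the primary/replica flag ("[P]" or "[R]") and the shard state ("s[...]").
    private static final Pattern SHARD_LINE =
        Pattern.compile("\\[(P|R)\\].*?\\bs\\[([A-Z_]+)\\]");

    static Health healthOf(List<String> routingLines) {
        Health health = Health.GREEN;
        for (String line : routingLines) {
            Matcher m = SHARD_LINE.matcher(line);
            if (!m.find()) {
                continue; // not a shard routing line
            }
            boolean primary = m.group(1).equals("P");
            String state = m.group(2);
            boolean active = state.equals("STARTED") || state.equals("RELOCATING");
            if (active) {
                continue;
            }
            if (primary) {
                return Health.RED; // simplification: an inactive primary means red
            }
            health = Health.YELLOW; // an inactive replica means at best yellow
        }
        return health;
    }

    public static void main(String[] args) {
        // Abbreviated from the cluster state dump in the failure log.
        List<String> lines = Arrays.asList(
            "[.ml-state][0], node[null], [P], recovery_source[existing store recovery], s[UNASSIGNED]",
            "[.ml-state][0], node[null], [R], recovery_source[peer recovery], s[UNASSIGNED]");
        System.out.println(healthOf(lines)); // prints RED
    }
}
```

Under this simplified rule the unassigned primary alone is enough to keep ensureGreen from succeeding, regardless of what happens to the replica.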