org.elasticsearch.xpack.ml.integration.BasicDistributedJobsIT.testFailOverBasics failed

Failure not reproducible locally. 
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+internalClusterTest/3881/consoleFull
Reproduce Line:
```
./gradlew :x-pack:plugin:ml:internalClusterTest -Dtests.seed=D9A30C4EA4AC438B -Dtests.class=org.elasticsearch.xpack.ml.integration.BasicDistributedJobsIT -Dtests.method="testFailOverBasics" -Dtests.security.manager=true -Dtests.locale=en-AU -Dtests.timezone=Etc/GMT-14 -Dcompiler.java=12 -Druntime.java=8
```

Digging into the failure, it appears that the test timed out waiting for `ensureGreen`:
```
  1> [2019-03-28T07:05:01,572][INFO ][o.e.x.m.i.BasicDistributedJobsIT] [testFailOverBasics] ensureGreen timed out, cluster state:
```
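As a rough cross-check of the timing, the log timestamp above can be compared with the `NODE_LEFT` timestamp in the shard dump further down. This is only a sketch: it assumes the log line is rendered in the test's timezone (`Etc/GMT-14`, i.e. UTC+14) while `unassigned_info` is in UTC, and `TimeoutGap` and `gap` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Illustrative helper, not part of the test suite.
final class TimeoutGap {
    // Time elapsed between a UTC instant and a later log timestamp written in local time.
    // Assumption: the log timestamp uses the given zone.
    static Duration gap(String localLogTime, String zone, String utcInstant) {
        Instant logged = LocalDateTime.parse(localLogTime)
                .atZone(ZoneId.of(zone))
                .toInstant();
        return Duration.between(Instant.parse(utcInstant), logged);
    }

    public static void main(String[] args) {
        // "ensureGreen timed out" log line vs. the NODE_LEFT time of the .ml-state primary
        Duration d = gap("2019-03-28T07:05:01.572", "Etc/GMT-14", "2019-03-27T17:04:31.506Z");
        System.out.println(d.toMillis() + " ms"); // prints "30066 ms"
    }
}
```

If the timezone assumption holds, the timeout was reported roughly 30 seconds after the node left, which would fit `ensureGreen` starting to wait at about the moment the shards became unassigned.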

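The sequence of events appears to be:
• We killed the master (node0), and it abdicated to node3.
• We then waited for green but timed out.
• `.ml-state` never finished building its replica, apparently because we killed the node too quickly: the primary was lost with the node (`NODE_LEFT`, `no_valid_shard_copy`) while the replica was still initializing (`PRIMARY_FAILED`). With both copies of `.ml-state` unassigned, the cluster could not go green.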

```
  1> ----shard_id [.ml-state][0]
  1> --------[.ml-state][0], node[null], [P], recovery_source[existing store recovery; bootstrap_history_uuid=false], s[UNASSIGNED], unassigned_info[[reason=NODE_LEFT], at[2019-03-27T17:04:31.506Z], delayed=false, details[node_left [7teV8OudStufgnfSDcALpw]], allocation_status[no_valid_shard_copy]]
  1> --------[.ml-state][0], node[null], [R], recovery_source[peer recovery], s[UNASSIGNED], unassigned_info[[reason=PRIMARY_FAILED], at[2019-03-27T17:04:31.506Z], delayed=false, details[primary failed while replica initializing], allocation_status[no_attempt]]
```
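To illustrate why this shard table rules out a green cluster: health is red while any primary is unassigned and yellow while only replicas are unassigned. The following is a simplified model of that rule, not Elasticsearch code; `HealthSketch`, `Copy`, and `health` are hypothetical names introduced for the example.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of the health rule; not Elasticsearch code.
final class HealthSketch {
    enum Health { GREEN, YELLOW, RED }

    // One shard copy: whether it is the primary and whether it is started.
    static final class Copy {
        final boolean primary;
        final boolean started;

        Copy(boolean primary, boolean started) {
            this.primary = primary;
            this.started = started;
        }
    }

    // RED if any primary is not started, YELLOW if only replicas are missing, else GREEN.
    static Health health(List<Copy> copies) {
        boolean replicaMissing = false;
        for (Copy c : copies) {
            if (!c.started) {
                if (c.primary) {
                    return Health.RED;
                }
                replicaMissing = true;
            }
        }
        return replicaMissing ? Health.YELLOW : Health.GREEN;
    }

    public static void main(String[] args) {
        // [.ml-state][0] as dumped above: primary and replica both UNASSIGNED
        List<Copy> mlState = Arrays.asList(new Copy(true, false), new Copy(false, false));
        System.out.println(health(mlState)); // prints "RED"
    }
}
```

In this model the state above is red, not merely yellow, so waiting longer would not help unless a valid copy of the primary comes back.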
