Conversation

@davidkyle
Member

@davidkyle davidkyle commented Dec 16, 2019

The test was failing to relocate a job to a new node after a network disruption because not all primary shards of the .ml-state index were active:

[o.e.p.PersistentTasksClusterService] ignoring task job-relocation-job because assignment is the same node: [null], explanation: [Not opening job [relocation-job], because not all primary shards are active for the following indices [.ml-state]]

.ml-state is created when the first job is opened, and the node holding it was then removed from the cluster before the index had time to replicate. Waiting for a green cluster state before triggering the disruption should ensure the replicas are present and fix the test.
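To illustrate the ordering, here is a minimal, self-contained sketch of "wait for green, then disrupt". It is not the code in this PR and does not use the Elasticsearch test framework: `Health`, `waitForGreen`, and the simulated health supplier are hypothetical stand-ins. In a real integration test the wait would normally come from the framework's own cluster health helper (for example `ensureGreen()` in `ESIntegTestCase`).

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class WaitForGreenSketch {

    /** Hypothetical stand-in for the cluster health status. */
    enum Health { RED, YELLOW, GREEN }

    /**
     * Polls the health check until it reports GREEN or the timeout elapses.
     * Returns true only if GREEN was observed in time.
     */
    static boolean waitForGreen(Supplier<Health> healthCheck, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (healthCheck.get() == Health.GREEN) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out while replicas were still unassigned
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated cluster: YELLOW for the first two polls (replicas not yet
        // allocated), GREEN afterwards.
        int[] polls = {0};
        Supplier<Health> simulated = () -> polls[0]++ < 2 ? Health.YELLOW : Health.GREEN;

        if (!waitForGreen(simulated, 1000, 10)) {
            throw new IllegalStateException("cluster did not reach green; not starting the disruption");
        }
        // Only at this point would the test isolate the node.
        System.out.println("cluster is green after " + polls[0] + " polls; safe to trigger the disruption");
    }
}
```

The point of the ordering is that a green state implies every replica of .ml-state is allocated, so removing one node cannot leave the index without an active primary.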

I hope this closes #49908, but I'll leave the issue open and trace logging enabled for a week in case the failure recurs.

@davidkyle davidkyle added the >test and :ml labels Dec 16, 2019
@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

@droberts195 droberts195 left a comment

LGTM

@davidkyle davidkyle merged commit 736e9f9 into elastic:master Dec 16, 2019
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
@davidkyle davidkyle deleted the fix-disruption-test branch June 2, 2020 08:59

Labels

:ml, >test, v7.6.0, v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] NetworkDisruptionIT.testJobRelocation failing

4 participants