Skip to content

Conversation

@droberts195
Copy link

@droberts195 droberts195 commented Sep 26, 2019

When the ML native multi-node tests use _cat/indices/_all
and the request goes to a non-master node, _all is
translated to a list of concrete indices by the authz layer
on the coordinating node before the request is forwarded
to the master node. Then it is possible for the master
node to return an index_not_found_exception if one of
the concrete indices that was expanded on the
coordinating node has been deleted in the meantime.
(#47159 has been opened to track the underlying problem.)

It has been observed that the index that gets deleted when
the problem affects the ML native multi-node tests is
always the ML notifications index. The tests that fail are
only interested in the presence or absense of ML results
indices. Therefore the workaround is to only _cat indices
that match the ML results index pattern.

Fixes #45652

When the ML native multi-node tests use _cat/indices/_all
and the request goes to a non-master node, _all is
translated to a list of concrete indices on the coordinating
node before the request is forwarded to the master node.
Then it is possible for the master node to return an
index_not_found_exception if one of the concrete indices
that was expanded on the coordinating node has been
deleted in the meantime.

It has been observed that the index that gets deleted when
the problem affects the ML native multi-node tests is
always the ML notifications index. The tests that fail are
only interested in the presence or absense of ML results
indices. Therefore the workaround is to only _cat indices
that match the ML results patten.

Fixes elastic#45652
@droberts195 droberts195 added >test Issues or PRs that are addressing/adding tests :ml Machine learning v8.0.0 v7.5.0 v7.4.1 v7.3.3 labels Sep 26, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a safe change, given that the results indices should already be created and will not be deleted in the middle of these tests.

What a crazy bug!!!

@droberts195 droberts195 merged commit 1304274 into elastic:master Sep 26, 2019
@droberts195 droberts195 deleted the work_around_cat_indices_bug branch September 26, 2019 12:19
droberts195 pushed a commit that referenced this pull request Sep 26, 2019
When the ML native multi-node tests use _cat/indices/_all
and the request goes to a non-master node, _all is
translated to a list of concrete indices by the authz layer
on the coordinating node before the request is forwarded
to the master node. Then it is possible for the master
node to return an index_not_found_exception if one of
the concrete indices that was expanded on the
coordinating node has been deleted in the meantime.
(#47159 has been opened to track the underlying problem.)

It has been observed that the index that gets deleted when
the problem affects the ML native multi-node tests is
always the ML notifications index. The tests that fail are
only interested in the presence or absense of ML results
indices. Therefore the workaround is to only _cat indices
that match the ML results index pattern.

Fixes #45652
droberts195 pushed a commit that referenced this pull request Sep 26, 2019
When the ML native multi-node tests use _cat/indices/_all
and the request goes to a non-master node, _all is
translated to a list of concrete indices by the authz layer
on the coordinating node before the request is forwarded
to the master node. Then it is possible for the master
node to return an index_not_found_exception if one of
the concrete indices that was expanded on the
coordinating node has been deleted in the meantime.
(#47159 has been opened to track the underlying problem.)

It has been observed that the index that gets deleted when
the problem affects the ML native multi-node tests is
always the ML notifications index. The tests that fail are
only interested in the presence or absense of ML results
indices. Therefore the workaround is to only _cat indices
that match the ML results index pattern.

Fixes #45652
droberts195 pushed a commit that referenced this pull request Sep 26, 2019
When the ML native multi-node tests use _cat/indices/_all
and the request goes to a non-master node, _all is
translated to a list of concrete indices by the authz layer
on the coordinating node before the request is forwarded
to the master node. Then it is possible for the master
node to return an index_not_found_exception if one of
the concrete indices that was expanded on the
coordinating node has been deleted in the meantime.
(#47159 has been opened to track the underlying problem.)

It has been observed that the index that gets deleted when
the problem affects the ML native multi-node tests is
always the ML notifications index. The tests that fail are
only interested in the presence or absense of ML results
indices. Therefore the workaround is to only _cat indices
that match the ML results index pattern.

Fixes #45652
@colings86 colings86 added v7.4.0 and removed v7.4.1 labels Sep 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:ml Machine learning >test Issues or PRs that are addressing/adding tests v7.3.3 v7.4.0 v7.5.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] MlJobIT.testDeleteJob and testDeleteJobAsync fail with index_not_found_exception

5 participants