@jasontedor
Member

When a client sends a request to fail a shard to the master, the current
behavior is that the master will submit the cluster state update task
and then immediately send a successful response back to the client;
additionally, if there are any failures while processing the cluster
state update task to fail the shard, then the client will never be
notified of these failures.

This commit modifies the master behavior when handling requests to fail
a shard. In particular, the master will now wait until successful
publication of the cluster state update before notifying the request
client that the shard is marked as failed; additionally, the client is
now notified of any failures during the execution of the cluster state
update task.

Relates #14252
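
The behavior change described above can be illustrated with a toy model. This is a minimal sketch, not the code from this pull request: `Listener`, `UpdateTask`, `submit`, `handleOld`, and `handleNew` are hypothetical names invented for illustration, and the "master service" here is a synchronous stand-in that runs a task and then treats it as published.

```java
final class ShardFailedSketch {

    /** Hypothetical callback through which the master answers the requesting node. */
    interface Listener {
        void onSuccess();
        void onFailure(Exception e);
    }

    /** Hypothetical stand-in for a cluster state update task with lifecycle hooks. */
    interface UpdateTask {
        void execute() throws Exception; // compute the new cluster state
        void onPublished();              // invoked after successful publication
        void onFailure(Exception e);     // invoked if the task fails
    }

    /** Toy synchronous "master service": run the task, then treat it as published. */
    static void submit(UpdateTask task) {
        try {
            task.execute();
        } catch (Exception e) {
            task.onFailure(e);
            return;
        }
        task.onPublished();
    }

    /** Old behavior: respond with success right after submission; failures are dropped. */
    static void handleOld(boolean taskFails, Listener listener) {
        submit(new UpdateTask() {
            public void execute() throws Exception {
                if (taskFails) throw new IllegalStateException("failed to apply shard failure");
            }
            public void onPublished() { }
            public void onFailure(Exception e) { } // the client never hears about this
        });
        listener.onSuccess();
    }

    /** New behavior: respond only after publication, and forward any failure. */
    static void handleNew(boolean taskFails, Listener listener) {
        submit(new UpdateTask() {
            public void execute() throws Exception {
                if (taskFails) throw new IllegalStateException("failed to apply shard failure");
            }
            public void onPublished() { listener.onSuccess(); }
            public void onFailure(Exception e) { listener.onFailure(e); }
        });
    }

    /** Runs one scenario and reports what the client observed. */
    static String outcome(boolean waitForPublication, boolean taskFails) {
        String[] result = {"no response"};
        Listener listener = new Listener() {
            public void onSuccess() { result[0] = "success"; }
            public void onFailure(Exception e) { result[0] = "failure: " + e.getMessage(); }
        };
        if (waitForPublication) {
            handleNew(taskFails, listener);
        } else {
            handleOld(taskFails, listener);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("old, task fails:    " + outcome(false, true));
        System.out.println("new, task fails:    " + outcome(true, true));
        System.out.println("new, task succeeds: " + outcome(true, false));
    }
}
```

In this sketch the old handler reports success even when the task throws, while the new handler reports success only from the publication hook and passes the exception back otherwise.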

Contributor

this keeps bugging me :) we should do something on the executor as well....

Member Author

@bleskes What keeps bugging you?

Member Author

We discussed this through another channel; the issue is that possible reroutes are being issued for every task rather than once per batch. Modifying this behavior will require a little extra machinery in the cluster state task execution framework. I opened #15482 to address this.
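
To make the per-task versus per-batch distinction concrete, here is a rough sketch. It is not the change made in #15482; `Allocator`, `perTask`, and `perBatch` are hypothetical names, and the "reroute" is just a counter standing in for an allocation pass.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class BatchedRerouteSketch {

    /** Hypothetical allocation service that counts how often a reroute is computed. */
    static final class Allocator {
        int reroutes = 0;
        final Set<String> failedShards = new TreeSet<>();

        void markFailed(String shard) { failedShards.add(shard); }

        void reroute() { reroutes++; } // stand-in for a full allocation pass
    }

    /** Per-task behavior: every shard-failed task triggers its own reroute. */
    static int perTask(List<String> shards) {
        Allocator allocator = new Allocator();
        for (String shard : shards) {
            allocator.markFailed(shard);
            allocator.reroute();
        }
        return allocator.reroutes;
    }

    /** Per-batch behavior: apply all failures first, then reroute once. */
    static int perBatch(List<String> shards) {
        Allocator allocator = new Allocator();
        for (String shard : shards) {
            allocator.markFailed(shard);
        }
        if (!shards.isEmpty()) {
            allocator.reroute();
        }
        return allocator.reroutes;
    }
}
```

With a batch of three shard-failed tasks, `perTask` counts three reroutes and `perBatch` counts one.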

@bleskes
Contributor

bleskes commented Dec 16, 2015

LGTM

jasontedor added a commit that referenced this pull request Dec 16, 2015
…d-failures

Master should wait on cluster state publication when failing a shard
@jasontedor merged commit 3e8768f into elastic:master Dec 16, 2015
@jasontedor deleted the master-side-of-wait-on-shard-failures branch December 16, 2015 15:39
@clintongormley added :Distributed Indexing/Distributed and removed :Cluster labels Feb 13, 2018
Labels

:Distributed Indexing/Distributed >enhancement resiliency v5.0.0-alpha1