@DaveCTurner
Contributor

Today we reroute the cluster as part of the process of starting a shard, which
runs at URGENT priority. In large clusters, rerouting may take some time to
complete, and this means that a mere trickle of shard-started events can cause
starvation for other, lower-priority, tasks that are pending on the master.

However, it isn't really necessary to perform a reroute when starting a shard,
as long as one occurs eventually. This commit removes the inline reroute from
the process of starting a shard and replaces it with a deferred one that runs
at `NORMAL` priority, so that the reroute can no longer starve higher-priority tasks.
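
To illustrate the ordering effect described above, here is a minimal, self-contained sketch of a single-threaded priority queue. It is not the Elasticsearch implementation: the `Priority` enum, the `Task` class, the task names, and the choice to enqueue at most one pending reroute are all assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class DeferredRerouteSketch {
    // Hypothetical priorities, highest first. Not the Elasticsearch enum.
    enum Priority { URGENT, HIGH, NORMAL }

    // A queued task, ordered by priority and then by submission order.
    static final class Task {
        final Priority priority;
        final long seq;
        final String name;

        Task(Priority priority, long seq, String name) {
            this.priority = priority;
            this.seq = seq;
            this.name = name;
        }
    }

    // Returns the order in which work is executed on the simulated master.
    static List<String> run(boolean deferReroute, int shardStartedEvents) {
        PriorityQueue<Task> queue = new PriorityQueue<>(
            Comparator.comparing((Task t) -> t.priority).thenComparingLong(t -> t.seq));
        long seq = 0;
        queue.add(new Task(Priority.HIGH, seq++, "other-high-priority-task"));
        for (int i = 0; i < shardStartedEvents; i++) {
            queue.add(new Task(Priority.URGENT, seq++, "shard-started-" + i));
        }

        List<String> executed = new ArrayList<>();
        boolean reroutePending = false;
        while (!queue.isEmpty()) {
            Task task = queue.poll();
            executed.add(task.name);
            if (task.name.startsWith("shard-started-")) {
                if (deferReroute == false) {
                    // Inline: the reroute runs as part of the URGENT task.
                    executed.add("reroute(inline)");
                } else if (reroutePending == false) {
                    // Deferred: submit one reroute at NORMAL priority
                    // (deduplication is an assumption of this sketch).
                    queue.add(new Task(Priority.NORMAL, seq++, "reroute"));
                    reroutePending = true;
                }
            } else if (task.name.equals("reroute")) {
                reroutePending = false;
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println("inline:   " + run(false, 2));
        System.out.println("deferred: " + run(true, 2));
    }
}
```

In the inline variant every shard-started task carries its own reroute ahead of the `HIGH` task; in the deferred variant the `HIGH` task runs before the single `NORMAL` reroute.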

Backport of #44433 to 7.x.

* Defer reroute when starting shards

This may improve some of the situations related to elastic#42738 and elastic#42105.

* Specific test case for followup priority setting

We cannot set the priority in all InternalTestClusters because the deprecation
warning makes some tests unhappy. This commit adds a specific test instead.

* Checkstyle

* Cluster state always changed here

* Assert consistency of routing nodes

* Restrict setting only to reasonable priorities
@DaveCTurner added the labels >enhancement, :Distributed Coordination/Allocation, backport, and v7.4.0 on Jul 18, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

The change in elastic#44433 introduces a state in which the cluster has no relocating
shards but still has a pending reroute task which might start a shard
relocation. `TransportSearchFailuresIT` failed on a PR build, apparently because
it did not also wait for this pending task to complete, and so it reported more
active shards than expected:

    2> java.lang.AssertionError:
      Expected: <9>
           but: was <10>
          at __randomizedtesting.SeedInfo.seed([4057CA4301FE95FA:207EC88573747235]:0)
          at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
          at org.junit.Assert.assertThat(Assert.java:956)
          at org.junit.Assert.assertThat(Assert.java:923)
          at org.elasticsearch.search.basic.TransportSearchFailuresIT.testFailedSearchWithWrongQuery(TransportSearchFailuresIT.java:97)

This commit addresses this failure by waiting until there are neither pending
tasks nor shard relocations in progress.
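
The waiting condition described above can be sketched in isolation as a polling loop. This is only an illustration under stated assumptions: `ClusterView`, `awaitQuiescent`, and their parameters are hypothetical names, not Elasticsearch APIs, and the actual test change may express the wait differently.

```java
class WaitForQuiescenceSketch {
    // Hypothetical read-only view of the cluster. Not an Elasticsearch type.
    interface ClusterView {
        int pendingTasks();
        int relocatingShards();
    }

    // Polls until there are no pending tasks and no relocating shards.
    // Returns false if the timeout elapses or the thread is interrupted.
    static boolean awaitQuiescent(ClusterView cluster, long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            // Check pending tasks first: a pending reroute may still start a
            // relocation, so "no relocations" alone is not a stable condition.
            if (cluster.pendingTasks() == 0 && cluster.relocatingShards() == 0) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test would call something like `awaitQuiescent(view, 10_000, 50)` before asserting on the number of active shards, so that a deferred reroute cannot change the shard count after the check.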
@DaveCTurner
Contributor Author

PR build failure looks legitimately related - I opened #44543 to address it in master and added it here in a775047.

@DaveCTurner merged commit 452f7f6 into elastic:7.x on Jul 18, 2019
@DaveCTurner deleted the 2019-07-18-defer-reroute-when-starting-shards-7x branch on July 18, 2019 13:10