
Suboptimal shard allocation with imbalanced shards #17213

@bobrik

Description

Elasticsearch version:

# elasticsearch --version
Version: 2.2.0, Build: 8ff36d1/2016-01-27T13:32:39Z, JVM: 1.8.0_72-internal

JVM version:

# java -version
openjdk version "1.8.0_72-internal"
OpenJDK Runtime Environment (build 1.8.0_72-internal-b15)
OpenJDK 64-Bit Server VM (build 25.72-b15, mixed mode)

OS version: Debian Jessie on kernel 4.1.3.

Description of the problem including expected versus actual behavior:

Elasticsearch does not try to spread shards for new indices equally if one node has fewer shards than the others.
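Not part of the original report, but a sketch of how to confirm and work around this. The index pattern `logs-*` and the limit of 2 are illustrative, not taken from the reporter's cluster; `total_shards_per_node` is an index-level setting that caps how many shards of one index a single node may hold:

```shell
# Inspect per-node shard counts to confirm the imbalance.
curl -s 'localhost:9200/_cat/allocation?v'

# Mitigation sketch: cap how many shards of any one index a single node
# may hold, so a freshly emptied node cannot absorb all of a new index's
# shards. Pattern and limit are illustrative; tune for your shard counts.
curl -s -XPUT 'localhost:9200/logs-*/_settings' -d '{
  "index.routing.allocation.total_shards_per_node": 2
}'
```

Setting the cap too low can leave shards unassigned if nodes go missing, so it trades one risk for another.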

Elasticsearch on one of 12 machines died early in the morning (UTC) and caused several indices to go red, even though each index has 1 replica. All of the red indices were today's indices. That is a separate problem, though; this one is about allocation. I unmounted the bad disk on the unlucky node and restarted it roughly 6 hours after the incident. The indices did not recover.

Since there wasn't much to do at this point, I dropped the red indices to let them refill from Kafka. At the same time Elasticsearch started its recovery procedure to rebalance shards evenly. This is what happened for the new indices:

[screenshot: shard allocation for the new indices]

This is how the node list looked:

[screenshot: node list]

It seems like Elasticsearch ignored the fact that it was going to rebalance shards anyway and allocated more shards to the emptiest node. This caused a much higher load on that node and made it a bottleneck.
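To illustrate why an emptied node attracts the new shards, here is a toy model of purely count-based placement. This is not Elasticsearch's actual `BalancedShardsAllocator` (which weighs several factors), and all node names and counts are made up; it only shows the failure mode described above:

```python
# Toy model: each new shard goes to the node with the fewest shards.
# With one freshly wiped node, that node wins every allocation until it
# catches up, concentrating all of the new (hot, actively indexed)
# shards on it.

def pick_node(shard_counts):
    """Return the node with the fewest shards (ties broken by name)."""
    return min(sorted(shard_counts), key=lambda n: shard_counts[n])

def allocate(shard_counts, new_shards):
    """Place each new shard on the currently emptiest node."""
    placements = []
    counts = dict(shard_counts)
    for shard in new_shards:
        node = pick_node(counts)
        counts[node] += 1
        placements.append((shard, node))
    return placements

# Hypothetical cluster: 11 healthy nodes with ~200 shards each,
# one restarted node that lost most of its data.
cluster = {f"node-{i}": 200 for i in range(11)}
cluster["node-11"] = 50  # the node whose disk failed

today = [f"logs-shard-{i}" for i in range(10)]
print(allocate(cluster, today))
# every one of today's shards lands on node-11
```

Since 50 + 10 is still far below 200, node-11 is the minimum on every iteration, which matches the observed pile-up of today's indices on the restarted node.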

Another issue is that all shards disappeared. I think the following happened:

  1. After the incident, Elasticsearch recovered the available indices to have 2 copies on healthy nodes.
  2. After the bad node rejoined, its copies were removed since they were redundant.
  3. Since the bad node now had fewer shards, it started recovering inactive shards from old indices, the very ones that had just been removed from it.

If that's right, it looks suboptimal. This could be #17019 again, since I run optimize and flush every night, and the synced flush happened after the disk failure.

Labels: :Distributed Coordination/Allocation, >enhancement, Meta, Team:Distributed (Obsolete)
