
Suboptimal shard allocation with imbalanced shards #17213

@bobrik

Description

Elasticsearch version:

# elasticsearch --version
Version: 2.2.0, Build: 8ff36d1/2016-01-27T13:32:39Z, JVM: 1.8.0_72-internal

JVM version:

# java -version
openjdk version "1.8.0_72-internal"
OpenJDK Runtime Environment (build 1.8.0_72-internal-b15)
OpenJDK 64-Bit Server VM (build 25.72-b15, mixed mode)

OS version: Debian Jessie on kernel 4.1.3.

Description of the problem including expected versus actual behavior:

Elasticsearch does not try to spread shards for new indices equally if one node has fewer shards than the others.
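Not part of the original report, but a sketch of how to confirm and work around this. The index pattern `logs-*` and the limit of 2 are illustrative, not taken from the reporter's cluster; `total_shards_per_node` is an index-level setting that caps how many shards of one index a single node may hold:

```shell
# Inspect per-node shard counts to confirm the imbalance.
curl -s 'localhost:9200/_cat/allocation?v'

# Mitigation sketch: cap how many shards of any one index a single node
# may hold, so a freshly emptied node cannot absorb all of a new index's
# shards. Pattern and limit are illustrative; tune for your shard counts.
curl -s -XPUT 'localhost:9200/logs-*/_settings' -d '{
  "index.routing.allocation.total_shards_per_node": 2
}'
```

Setting the cap too low can leave shards unassigned if nodes go missing, so it trades one risk for another.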

Elasticsearch on one of 12 machines died early in the morning (UTC) and caused several indices to go red, even though each index has 1 replica. All of the red indices were today's indices. That is a separate problem, though; this one is about allocation. I unmounted the bad disk on the unlucky node and restarted it roughly 6 hours after the incident. The indices did not recover.

Since there wasn't much to do at this point, I dropped the red indices to let them refill from Kafka. At the same time Elasticsearch started its recovery procedure to rebalance shards evenly. This is what happened for the new indices:

[screenshot: shard allocation for the new indices]

This is how the node list looked:

[screenshot: node list]

It seems like Elasticsearch ignored the fact that it was going to rebalance shards anyway and allocated more shards to the emptiest node. This caused a much higher load on that node and made it a bottleneck.
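To illustrate why an emptied node attracts the new shards, here is a toy model of purely count-based placement. This is not Elasticsearch's actual `BalancedShardsAllocator` (which weighs several factors), and all node names and counts are made up; it only shows the failure mode described above:

```python
# Toy model: each new shard goes to the node with the fewest shards.
# With one freshly wiped node, that node wins every allocation until it
# catches up, concentrating all of the new (hot, actively indexed)
# shards on it.

def pick_node(shard_counts):
    """Return the node with the fewest shards (ties broken by name)."""
    return min(sorted(shard_counts), key=lambda n: shard_counts[n])

def allocate(shard_counts, new_shards):
    """Place each new shard on the currently emptiest node."""
    placements = []
    counts = dict(shard_counts)
    for shard in new_shards:
        node = pick_node(counts)
        counts[node] += 1
        placements.append((shard, node))
    return placements

# Hypothetical cluster: 11 healthy nodes with ~200 shards each,
# one restarted node that lost most of its data.
cluster = {f"node-{i}": 200 for i in range(11)}
cluster["node-11"] = 50  # the node whose disk failed

today = [f"logs-shard-{i}" for i in range(10)]
print(allocate(cluster, today))
# every one of today's shards lands on node-11
```

Since 50 + 10 is still far below 200, node-11 is the minimum on every iteration, which matches the observed pile-up of today's indices on the restarted node.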

Another issue is that all shards disappeared. I think the following happened:

  1. After the incident, Elasticsearch recovered the available indices to have 2 copies on healthy nodes.
  2. After the bad node rejoined, its copies were removed since they were redundant.
  3. Since the bad node now had fewer shards, it started recovering inactive shards from old indices, the very ones that had just been removed from it.

If that's right, it looks suboptimal. This could be #17019 again, since I run optimize and flush every night, and the synced flush happened after the disk failure.

Labels: :Distributed Coordination/Allocation, >enhancement, Meta, Team:Distributed (Obsolete)
