A common need when administering nodes is to remove one or more nodes from the cluster without interrupting any operations. I think it would be useful to have a dedicated API for performing this decommission that is smarter than our current process.
Currently, to decommission a node, a user sets the "cluster.routing.allocation.exclude._id" setting to the ID of the node they want to remove from the cluster (alternatively, _ip, _host, or _name can be used). They then have to wait for the shards to be moved off of the node, checking the /_cat/allocation or similar APIs.
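For reference, the current manual workflow looks roughly like this (the node ID is illustrative); first exclude the node, then repeatedly check that its shard count has dropped to 0:
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._id": "node-id-1"
  }
}

GET /_cat/allocation?v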
The issue with this is that during this window it's possible for mutually exclusive settings to be set that cause data not to be migrated off the node, or for other filtering settings to become unsatisfiable because the node will subsequently be removed from the cluster. Additionally, there isn't a simple way to check readiness other than reading the cat APIs (which should not be relied on programmatically) or the cluster state routing table.
What would be nice is the ability to mark a node as "pending decommission" such as:
PUT /_cluster/allocation/decommission
{
"nodes": ["node-id-1", "node-id-2"]
}This API would act similarly to the allocation exclusion API. We could then also treat these node ids "specially" internally, for instance we could disallow setting index-level shard filtering directly to these node IDs (this would be very nice for ILM since we randomly pick a node for allocation during a shrink). We could also potentially say "when the master node fails, prefer not to elect this node because it is going to be shortly removed from the cluster".
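To illustrate the first point, this is the kind of index-level filtering request that could be rejected (or at least flagged) while node-id-1 is pending decommission; the index name is made up and the rejection behavior is hypothetical:
PUT /my-index/_settings
{
  "index.routing.allocation.require._id": "node-id-1"
}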
Rather than requiring the user to check our other allocation APIs for the status of the decommission, we could provide a way to check the status of decommissioning nodes:
GET /_cluster/allocation/status
Which could give status like:
{
  "nodes": [
    {
      "node": "data-node-1",
      "node_id": "node-id-1",
      "safe_to_shutdown": false,
      "data_loss_on_removal": false,
      "decommission_possible": false,
      "shards_remaining": 7,
      "time_since_decommission": "1.2h",
      "time_since_decommission_millis": 4320000
    },
    {
      "node": "data-node-2",
      "node_id": "node-id-2",
      "safe_to_shutdown": true,
      "data_loss_on_removal": false,
      "decommission_possible": true,
      "shards_remaining": 0,
      "time_since_decommission": "1.2h",
      "time_since_decommission_millis": 4320000
    }
  ]
}

These fields are all made up and are just ideas, but here's how they'd work:
- safe_to_shutdown - whether the node contains any shards; if it holds 0, this would be true
- data_loss_on_removal - in some cases (2-node clusters) you cannot evacuate all the data, because primaries and replicas can't be on the same node; this could return true if there were any 0-replica indices present on the node (so you could still remove the node if you didn't mind your cluster going yellow)
- decommission_possible - internally ES could check whether the shards could even be moved off of the node (maybe they can't due to disk space, for instance), and return false if it's never going to be possible to decommission this node
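Until something like this exists, the closest stand-in for shards_remaining and safe_to_shutdown is filtering the allocation cat API to the node in question (the node name is illustrative) and waiting for its shard count to reach 0:
GET /_cat/allocation/data-node-1?v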
Additionally, we'd want a way to recommission a node in the event that it was marked for decommission but an admin changed their mind:
PUT /_cluster/allocation/recommission
{
"nodes": ["node-id-2"]
}There's a lot of additional ideas around this (potential things like "as soon as a node has moved all data off, automatically shut itself down"), but this is a bare bones issue to get discussion started.
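For comparison, the closest present-day equivalent of recommissioning is simply clearing the exclusion setting again:
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._id": null
  }
}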