[DOCS] Document cluster behavior when a file system crashes but node remains operational

**Describe the feature**:  Document cluster behavior when a file system crashes but node remains operational



**Elasticsearch version**: Generic 

**Plugins installed**: n/a

**JVM version** (`java -version`): n/a

**OS version** (`uname -a` if on a Unix-like system): generic 

**Description of the problem including expected versus actual behavior**:

The Elasticsearch documentation currently describes the behavior when a node in a cluster fails, but it does not describe the behavior when a node's file system fails while the node itself remains operational. Such failures can and will happen, especially for customers using third-party high-performance disk systems (SSD, RAID, etc.) that are loosely coupled with the OS. It is also common for customers to mount their data directories on high-performance disk systems while keeping their log data on the system drive.


**General issues that need to be addressed:** 

- Cluster actions when primary shards are lost due to disk failure (in my testing, replica shards on other nodes are promoted to primaries)
- Cluster actions when replica shards are lost due to disk failure (in my testing, new replica shards are created on the surviving nodes)
- Parameters affecting shard management when a disk failure occurs
- Cluster response when the disk failure is resolved and the disk system is brought back online (in my testing, nothing happens until the entire cluster is restarted)
- Response of the node and the cluster to queries and CRUD requests addressed to the node with the failed file system
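
One way to gather the observations behind the items above is to snapshot shard states through the `_cat/shards` API before and after taking a node's data disk offline. The sketch below is a minimal, hypothetical test helper, not part of Elasticsearch: it assumes an unsecured node at `http://localhost:9200`, a version whose cat APIs accept `format=json`, and the standard `index`, `shard`, `prirep`, `state`, and `node` columns. The helper names `fetch_shards`, `summarize_shards`, and `shards_on_node` are illustrative only.

```python
import json
import urllib.request
from collections import Counter


def fetch_shards(base_url="http://localhost:9200"):
    """Return one dict per shard copy from the _cat/shards API (assumes no auth)."""
    url = base_url + "/_cat/shards?format=json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_shards(rows):
    """Count shard copies by (primary/replica, state), e.g. ("replica", "UNASSIGNED")."""
    counts = Counter()
    for row in rows:
        kind = "primary" if row["prirep"] == "p" else "replica"
        counts[(kind, row["state"])] += 1
    return counts


def shards_on_node(rows, node_name):
    """List (index, shard number, "p" or "r") for every copy allocated to node_name."""
    return [
        (row["index"], int(row["shard"]), row["prirep"])
        for row in rows
        if row.get("node") == node_name
    ]
```

Comparing `summarize_shards(fetch_shards())` and `shards_on_node(rows, "<affected node>")` across the failure and the recovery of the disk would record which copies became unassigned, were promoted, or were reallocated, which is the behavior this issue asks to have documented.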


**Relevant Discussions** 

"Expected behavior" during disk crashes has changed significantly between Elasticsearch versions, and several significant open issues address this question, including:

- https://github.com/elastic/elasticsearch/issues/18417
- https://github.com/elastic/elasticsearch/pull/18467
- https://github.com/elastic/elasticsearch/issues/19789

The cluster's response to failed-disk conditions specifically should be documented to support users' system design and recovery planning.

[DOCS] Document cluster behavior when a file system crashes but node remains operational #25591