[Indexing] A network partition can cause in flight documents to be lost #7572

This ticket captures an issue discovered during the work done in #7493, which contains a [failing reproduction test](https://github.com/elasticsearch/elasticsearch/blob/596a4a073584c4262d574828c9caea35b5ed1de5/src/test/java/org/elasticsearch/discovery/DiscoveryWithServiceDisruptions.java#L375) marked with `@AwaitsFix`.

If a network partition separates a node from the master, there is a window of time before the node detects it. The length of that window depends on the type of partition. It is extremely small if a socket is broken. More adversarial partitions, such as silently dropping requests without breaking the socket, can take longer to detect (up to 3x30s using current defaults).
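To make the "3x30s" figure concrete, here is a minimal sketch of the arithmetic. It assumes the figure means three ping attempts that each wait out a 30 second timeout before the node gives up on the master. The class and method names are hypothetical and are not part of Elasticsearch.

```java
import java.time.Duration;

public class DetectionWindow {
    // Hypothetical helper: worst-case time before a node notices a silent
    // partition, assuming every ping attempt runs to its full timeout.
    public static Duration worstCase(int attempts, Duration pingTimeout) {
        return pingTimeout.multipliedBy(attempts);
    }

    public static void main(String[] args) {
        // Values taken from the "3x30s" figure above (assumed: 3 attempts, 30s each).
        Duration window = worstCase(3, Duration.ofSeconds(30));
        System.out.println("Worst-case detection window: " + window.getSeconds() + "s");
    }
}
```

Under those assumptions the node can keep acting as if it were still part of the cluster for up to 90 seconds.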

If the node hosts a _primary_ shard at the moment of the partition and ends up isolated from the cluster (a situation that could previously have resulted in split brain), some documents being indexed into the primary _may_ be lost if they fail to reach one of the allocated replicas (due to the partition) and that replica is later promoted to primary by the master.
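The scenario can be illustrated with a toy model. This is only a sketch: shard copies are modeled as plain sets of document ids, and `ShardCopy`, `index`, and `lostOnPromotion` are hypothetical names that do not correspond to Elasticsearch classes or to the reproduction test linked above.

```java
import java.util.HashSet;
import java.util.Set;

public class InFlightLossSketch {
    // Toy shard copy: nothing more than the set of document ids it holds.
    public static class ShardCopy {
        public final Set<String> docs = new HashSet<>();
    }

    // Index on the primary first, then replicate only if the replica is reachable.
    public static void index(ShardCopy primary, ShardCopy replica,
                             String docId, boolean replicaReachable) {
        primary.docs.add(docId);
        if (replicaReachable) {
            replica.docs.add(docId);
        }
    }

    // Documents held only by the old primary; they disappear from the index
    // once the master promotes the replica to primary.
    public static Set<String> lostOnPromotion(ShardCopy oldPrimary, ShardCopy promotedReplica) {
        Set<String> lost = new HashSet<>(oldPrimary.docs);
        lost.removeAll(promotedReplica.docs);
        return lost;
    }

    public static void main(String[] args) {
        ShardCopy primary = new ShardCopy();
        ShardCopy replica = new ShardCopy();
        index(primary, replica, "doc-1", true);   // indexed before the partition
        index(primary, replica, "doc-2", false);  // in flight while the partition is undetected
        System.out.println("Lost after promotion: " + lostOnPromotion(primary, replica));
    }
}
```

In this model `doc-2` exists only on the isolated primary, so it is the document that goes missing when the replica takes over.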

