Timeout for primary relocation handoff seems too long #41307

Today we wait up to 30 minutes for the block on all operations to be put in place when performing a primary relocation handoff:

https://github.com/elastic/elasticsearch/blob/4bd8e7b9f49ed091a53ce2717d65dc0ecb77b8d8/server/src/main/java/org/elasticsearch/index/shard/IndexShard.java#L636

However, this behaves badly if a write task gets stuck, for instance in the case fixed in https://github.com/elastic/elasticsearch/pull/36770. The block cannot be put in place because there is an in-flight operation, but the pending block still prevents any further operations from taking place until the attempt to block eventually fails.
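
The stall described above can be reproduced in miniature. This is only a sketch under assumptions: a fair `java.util.concurrent.Semaphore` stands in for the shard's operation permits, and the class `HandoffTimeoutSketch`, its methods, the permit count, and the millisecond-scale timeout are all hypothetical names and values chosen for illustration, not Elasticsearch's code.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a shard's operation permits (not Elasticsearch code).
class HandoffTimeoutSketch {
    // Assumed permit count; "blocking all operations" means taking every permit.
    static final int TOTAL_PERMITS = Integer.MAX_VALUE;

    // Fair semaphore: timed acquires queue in FIFO order, so a pending
    // "take everything" request holds back operations that arrive after it.
    private final Semaphore permits = new Semaphore(TOTAL_PERMITS, true);

    boolean tryStartOperation(long timeoutMillis) throws InterruptedException {
        return permits.tryAcquire(1, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    boolean tryBlockAllOperations(long timeoutMillis) throws InterruptedException {
        return permits.tryAcquire(TOTAL_PERMITS, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Returns {writeAdmittedDuringHandoff, handoffSucceeded, writeAdmittedAfterTimeout}. */
    static boolean[] demo(long handoffTimeoutMillis) throws InterruptedException {
        HandoffTimeoutSketch shard = new HandoffTimeoutSketch();

        // A write that takes its permit and never releases it (the stuck task).
        shard.tryStartOperation(0);

        // The handoff tries to take all permits; it cannot while the stuck write holds one.
        AtomicBoolean handoffSucceeded = new AtomicBoolean();
        Thread handoff = new Thread(() -> {
            try {
                handoffSucceeded.set(shard.tryBlockAllOperations(handoffTimeoutMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        handoff.start();

        // Wait until the handoff is actually queued on the semaphore.
        while (handoff.isAlive() && !shard.permits.hasQueuedThreads()) {
            Thread.sleep(1);
        }

        // A new write arriving now queues behind the pending block.
        boolean duringHandoff = shard.tryStartOperation(handoffTimeoutMillis / 4);

        // Once the block attempt has timed out, new writes are admitted again.
        handoff.join();
        boolean afterTimeout = shard.tryStartOperation(1000);

        return new boolean[] { duringHandoff, handoffSucceeded.get(), afterTimeout };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = demo(400);
        System.out.println("write admitted while block pending: " + r[0]);
        System.out.println("handoff block acquired: " + r[1]);
        System.out.println("write admitted after block timed out: " + r[2]);
    }
}
```

In this sketch the length of the write outage equals the handoff timeout, which is why a shorter timeout lets writes resume sooner when an operation is stuck.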

I think we should consider reducing this timeout, because it seems preferable to fail the primary relocation (and therefore resume write operations) much sooner than 30 minutes.
