
Timeout for primary relocation handoff seems too long #41307

@DaveCTurner

Today we wait for 30 minutes to block all operations when performing a primary relocation handoff:

indexShardOperationPermits.blockOperations(30, TimeUnit.MINUTES, () -> {

However, this behaves badly if a write task gets stuck, for instance in the case fixed in #36770. The block cannot be put in place because there is an in-flight operation, but the pending block prevents any further operations from starting until the blocking attempt eventually times out and fails.

I think we should consider reducing this timeout, because it seems preferable to fail the primary relocation (and therefore resume write operations) much sooner than 30 minutes.
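
The stall described above can be reproduced with a minimal sketch. This is not the actual `IndexShardOperationPermits` code: the class `PermitsSketch`, its method names other than `blockOperations`, and the `pendingBlocks` counter are hypothetical and only model the behaviour described in this issue, assuming a single semaphore from which each operation takes one permit and a block takes all of them.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of operation permits; not Elasticsearch's implementation.
public class PermitsSketch {
    private static final int TOTAL_PERMITS = Integer.MAX_VALUE;

    private final Semaphore semaphore = new Semaphore(TOTAL_PERMITS);
    // Number of block requests that are waiting for, or currently holding, all permits.
    private final AtomicInteger pendingBlocks = new AtomicInteger();

    // Returns true if the operation may proceed. New operations are rejected
    // as soon as a block is pending, even if that block has not been acquired yet.
    // (The check and the acquire are not atomic; that is acceptable for a sketch.)
    public boolean tryAcquireOperation() {
        if (pendingBlocks.get() > 0) {
            return false;
        }
        return semaphore.tryAcquire();
    }

    // Must be called once for every successful tryAcquireOperation().
    public void releaseOperation() {
        semaphore.release();
    }

    // Waits up to the timeout for all in-flight operations to finish, then runs
    // onBlocked while no operation can be in flight. Throws TimeoutException if
    // an in-flight operation does not release its permit in time.
    public void blockOperations(long timeout, TimeUnit unit, Runnable onBlocked)
            throws InterruptedException, TimeoutException {
        pendingBlocks.incrementAndGet();
        try {
            if (!semaphore.tryAcquire(TOTAL_PERMITS, timeout, unit)) {
                throw new TimeoutException("timed out waiting for in-flight operations");
            }
            try {
                onBlocked.run();
            } finally {
                semaphore.release(TOTAL_PERMITS);
            }
        } finally {
            // Only now can new operations start again.
            pendingBlocks.decrementAndGet();
        }
    }
}
```

In this model, if one operation never calls `releaseOperation()`, a call such as `blockOperations(30, TimeUnit.MINUTES, ...)` rejects every new operation for the full 30 minutes before it throws `TimeoutException`. A shorter timeout bounds that window accordingly.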

Labels: :Distributed Indexing/CRUD, team-discuss
