[CCR] Tweak the retry-able error list #33954

@martijnvg

When a shard follow task encounters an error, it determines whether the error is retry-able or non retry-able. A retry-able error is retried with back-off. A non retry-able error is not retried and fails the shard follow task.
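The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the CCR implementation: the class name `BackoffRetrySketch`, the delay constants, the `maxAttempts` limit, and the doubling schedule are all assumptions made for the example.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

final class BackoffRetrySketch {
    // Hypothetical tuning values for illustration; not taken from CCR.
    static final long INITIAL_DELAY_MS = 50;
    static final long MAX_DELAY_MS = 1_000;

    /** Delay before retry number `attempt` (0-based): doubles each time, capped at MAX_DELAY_MS. */
    static long delayMillis(int attempt) {
        int shift = Math.min(attempt, 20); // bound the shift so the multiplication cannot overflow
        return Math.min(MAX_DELAY_MS, INITIAL_DELAY_MS << shift);
    }

    /**
     * Runs `operation`, retrying with back-off while `retryable` accepts the error.
     * A non retry-able error, or running out of attempts, rethrows the error to the caller.
     */
    static <T> T run(Callable<T> operation, Predicate<Exception> retryable, int maxAttempts) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                if (!retryable.test(e) || attempt + 1 >= maxAttempts) {
                    throw e; // fail the task
                }
                Thread.sleep(delayMillis(attempt));
            }
        }
    }
}
```

The `retryable` predicate is where the list of retry-able errors plugs in; everything else is independent of which errors are on that list.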

The following errors are currently retried:

  • java.net.ConnectException
  • java.nio.channels.ClosedChannelException
  • "Connection reset" (any exception with this message)
  • "connection was aborted" (any exception with this message)
  • "forcibly closed" (any exception with this message)
  • "Broken pipe" (any exception with this message)
  • "Connection timed out" (any exception with this message)
  • "Socket is closed" (any exception with this message)
  • "Socket closed" (any exception with this message)
  • ShardNotFoundException
  • IndexNotFoundException
  • IllegalIndexShardStateException
  • NoShardAvailableActionException
  • UnavailableShardsException
  • AlreadyClosedException
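As a rough sketch of how the JDK-level part of this list could be checked, the snippet below matches on exception type and on message text. It is an illustration under stated assumptions, not the CCR code: the class name `NetworkRetrySketch` is made up, messages are matched as substrings, the cause chain is inspected up to a fixed depth, and the Elasticsearch and Lucene exception types from the list are left out so that the example compiles against the JDK alone.

```java
import java.net.ConnectException;
import java.nio.channels.ClosedChannelException;
import java.util.List;

final class NetworkRetrySketch {
    // Message fragments from the list above.
    static final List<String> RETRYABLE_MESSAGES = List.of(
        "Connection reset",
        "connection was aborted",
        "forcibly closed",
        "Broken pipe",
        "Connection timed out",
        "Socket is closed",
        "Socket closed");

    // Assumption: wrapped errors count too, so walk a bounded number of causes.
    static final int MAX_CAUSE_DEPTH = 10;

    static boolean isRetryable(Throwable error) {
        Throwable current = error;
        for (int depth = 0; current != null && depth < MAX_CAUSE_DEPTH; depth++) {
            if (current instanceof ConnectException || current instanceof ClosedChannelException) {
                return true;
            }
            String message = current.getMessage();
            if (message != null) {
                for (String fragment : RETRYABLE_MESSAGES) {
                    if (message.contains(fragment)) { // assumption: substring match
                        return true;
                    }
                }
            }
            current = current.getCause();
        }
        return false;
    }
}
```

Matching on message text is fragile, since the wording comes from the operating system and JDK, which is presumably why the list carries several near-duplicate phrasings such as "Socket is closed" and "Socket closed".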

I think we should make the following changes to this list of retry-able errors:

  • Add ElasticsearchSecurityException, for the case where the current user has insufficient privileges while an index is being followed. Before following starts, CCR already checks whether the current user has sufficient privileges, and the follow API fails with an error if not.
  • Add index block exception. This exception is returned if the leader index gets closed.
  • Add ClusterBlockException with a service unavailable status, for example when the leader cluster has no elected master.
  • Remove IndexNotFoundException. If the leader or follower index has been deleted, CCR needs to stop the shard follow task with an error.
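The proposed changes could look roughly like this in a classifier. The nested exception classes are hypothetical stand-ins, defined here only so the sketch compiles without Elasticsearch on the classpath; they are not the real Elasticsearch types, and the `status` field and the 503 constant are assumptions about how "service unavailable" would be detected.

```java
final class ProposedRetryListSketch {
    // Hypothetical stand-ins for the Elasticsearch exception types named in the proposal.
    static class SecurityError extends RuntimeException {}
    static class IndexBlockError extends RuntimeException {}
    static class IndexNotFoundError extends RuntimeException {}
    static class ClusterBlockError extends RuntimeException {
        final int status; // assumption: the block carries an HTTP-style status code
        ClusterBlockError(int status) {
            this.status = status;
        }
    }

    static final int SERVICE_UNAVAILABLE = 503;

    static boolean isRetryable(Exception e) {
        if (e instanceof SecurityError) {
            return true; // privileges may be restored while following
        }
        if (e instanceof IndexBlockError) {
            return true; // a closed leader index may be reopened
        }
        if (e instanceof ClusterBlockError) {
            // Only retry blocks that signal temporary unavailability, e.g. no elected master.
            return ((ClusterBlockError) e).status == SERVICE_UNAVAILABLE;
        }
        // IndexNotFoundError deliberately falls through: a deleted index fails the task.
        return false;
    }
}
```

Checking the status rather than only the exception type keeps other cluster blocks, which may not be temporary, from being retried.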

Labels: :Distributed Indexing/CCR (Issues around the Cross Cluster State Replication features)
