Decouple global checkpoint sync from a shard falling idle #26573

@jasontedor

Global checkpoints are inlined with replication operations: a replication request from a primary to a replica carries the primary's global checkpoint, and the replica uses it to update its local knowledge of the global checkpoint. The problem is that the replica will always lag. Consider the last indexing operation on a shard. It carries the global checkpoint known to the primary at the time the primary completed the operation, but before the replicas have completed it; by definition, this global checkpoint cannot equal what the global checkpoint could advance to after the operation completes. When the replicas complete the operation, their local checkpoints advance and they relay this information back to the primary, so the global checkpoint can now advance further. Without a follow-up operation, though, the replicas' local knowledge of the global checkpoint will never advance. To account for this, today we send a background global checkpoint sync when the primary shard falls idle. This approach has several problems:

  • the shard idle timer defaults to five minutes, which is a long time to wait for the global checkpoint to advance
  • there is no follow-up sync if a replica misses the sync sent when the shard idle timer fires
  • there is an inherent race condition: the primary shard could fall idle before an ongoing operation completes, in which case there will never be a background sync when that operation does complete

To address these problems, we are going to decouple the background sync from a shard falling idle, and instead add a periodic background sync that sends a sync whenever there is a replica that is behind the global checkpoint on the primary.
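
To make the lag and the proposed fix concrete, here is a minimal sketch in Java. It is a toy model, not Elasticsearch's actual code: the names `GlobalCheckpointSyncSketch`, `Primary`, `Replica`, `anyReplicaBehind`, and `maybeSync` are all hypothetical, replication is modeled as synchronous method calls, and the primary reads each replica's state directly instead of learning it from responses.

```java
// Toy model of inlined global checkpoints and a conditional background sync.
// All class and method names here are hypothetical illustrations.
class GlobalCheckpointSyncSketch {

    static final class Replica {
        long localCheckpoint = -1;        // highest sequence number processed
        long knownGlobalCheckpoint = -1;  // replica's local knowledge of the global checkpoint

        // Apply a replicated operation. The request carries the global
        // checkpoint as the primary knew it when the request was sent.
        long apply(long seqNo, long globalCheckpointOnPrimary) {
            localCheckpoint = seqNo;
            knownGlobalCheckpoint = Math.max(knownGlobalCheckpoint, globalCheckpointOnPrimary);
            return localCheckpoint;
        }

        // Receive a background sync that carries only the global checkpoint.
        void sync(long globalCheckpointOnPrimary) {
            knownGlobalCheckpoint = Math.max(knownGlobalCheckpoint, globalCheckpointOnPrimary);
        }
    }

    static final class Primary {
        final Replica[] replicas;
        long localCheckpoint = -1;
        long globalCheckpoint = -1;
        long nextSeqNo = 0;

        Primary(int replicaCount) {
            replicas = new Replica[replicaCount];
            for (int i = 0; i < replicaCount; i++) {
                replicas[i] = new Replica();
            }
        }

        // Index one operation and replicate it to every replica.
        void index() {
            long seqNo = nextSeqNo++;
            localCheckpoint = seqNo;
            long carried = globalCheckpoint; // fixed before any replica has acknowledged
            long min = localCheckpoint;
            for (Replica replica : replicas) {
                min = Math.min(min, replica.apply(seqNo, carried));
            }
            globalCheckpoint = min; // advances only after the replicas respond
        }

        // True if some replica's knowledge is behind the primary's global checkpoint.
        boolean anyReplicaBehind() {
            for (Replica replica : replicas) {
                if (replica.knownGlobalCheckpoint < globalCheckpoint) {
                    return true;
                }
            }
            return false;
        }

        // One tick of the periodic background task: send a sync only if needed.
        // Returns true if a sync was sent.
        boolean maybeSync() {
            if (!anyReplicaBehind()) {
                return false;
            }
            for (Replica replica : replicas) {
                replica.sync(globalCheckpoint);
            }
            return true;
        }
    }

    public static void main(String[] args) {
        Primary primary = new Primary(2);
        primary.index();
        primary.index();
        primary.index();
        // After the last operation the replicas are one step behind.
        System.out.println("primary=" + primary.globalCheckpoint
                + " replica=" + primary.replicas[0].knownGlobalCheckpoint);
        System.out.println("sync sent: " + primary.maybeSync());
        System.out.println("sync sent again: " + primary.maybeSync());
    }
}
```

In this model the replicas' knowledge stays one operation behind after the final `index()` call, and a single `maybeSync()` tick closes the gap; later ticks do nothing until new operations arrive. In a real system the tick would be driven by a timer (for example, `ScheduledExecutorService.scheduleWithFixedDelay` in plain Java), independent of whether the shard has fallen idle.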
