Original comment by @jasontedor:
The goal of cross-cluster replication is to enable users to replicate operations from one cluster to another cluster via active-passive replication. There are three drivers for providing CCR functionality in X-Pack:
- resiliency: in case the primary cluster fails, a secondary cluster can serve as a hot backup
- geo-proximity, so that reads can be served locally
- ECE can use the infrastructure for a UI to replicate data from one cluster to another
The purpose of this meta-issue is to serve as a high-level plan for the initial implementation.
Our initial implementation will build a shard-to-shard pull-based model without automatic setup built on the transport layer.
In this model, shards on a following index are responsible for pulling from shards on the leader index. We chose this model because:
- this model has simpler state management than a push-based model (each follower has one leader, but not vice-versa)
- recovery during failure is simpler as the follower knows how far it is in the sequence of operations
- operations can be streamed from any copy of the leader shard (the primary shard, or any of its replicas)
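As a rough illustration of why the pull model keeps state simple, here is a minimal Python sketch. `LeaderShard`, `FollowerShard`, `read_from`, and `pull_once` are hypothetical names invented for this example and are not part of Elasticsearch. The only replication state is the follower's own position in the sequence of operations.

```python
from dataclasses import dataclass, field


@dataclass
class LeaderShard:
    """Hypothetical leader shard copy: an append-only history indexed by sequence number."""
    history: list = field(default_factory=list)

    def index(self, doc):
        # The position in the list plays the role of the sequence number.
        self.history.append(doc)

    def read_from(self, from_seq_no, max_ops):
        # Return (seq_no, doc) pairs starting at from_seq_no, at most max_ops of them.
        end = min(from_seq_no + max_ops, len(self.history))
        return [(seq_no, self.history[seq_no]) for seq_no in range(from_seq_no, end)]


@dataclass
class FollowerShard:
    """Hypothetical follower shard: it has exactly one leader and tracks its own progress."""
    leader: LeaderShard
    next_seq_no: int = 0
    docs: dict = field(default_factory=dict)

    def pull_once(self, max_ops=2):
        # The follower asks for everything after what it already has.
        ops = self.leader.read_from(self.next_seq_no, max_ops)
        for seq_no, doc in ops:
            self.docs[seq_no] = doc
            self.next_seq_no = seq_no + 1
        return len(ops)
```

After a failure, the follower in this sketch resumes by calling `pull_once()` again from its stored `next_seq_no`; the leader keeps no per-follower state.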
As for automatic setup, for now users will have to manually set up a following index for a leader index, and will have to bootstrap existing data via snapshot/restore. This does not necessarily mean that we will not have something more sophisticated when the first version ships, only that for the initial implementation we will not look at anything else. We can consider automatic setup (for example, for time-based indices) and remote recovery infrastructure later.
Utilizing the transport layer allows us to reuse existing infrastructure, and we can follow the path blazed by cross-cluster search for reading from remote clusters.
Basic infrastructure:
- API to serve operations from a given sequence number
  - needs to fetch the sequence number
  - respond with batch of requested size to client
  - only return documents below the global checkpoint
- persistent task for the following index to pull data from the leader index
- a following engine implementation that can index operations that already have a sequence number (from the leader)
  - the engine should reject operations that do not have a sequence number
  - the engine will add a primary term
  - this engine will not close gaps in histories upon recoveries, that is expected to be done from the leader
  - we need to carefully consider the tradeoff of having the engine open for pluggability versus a key component of the CCR infrastructure being in open source
- implement a mechanism to transfer index metadata changes (e.g., mapping) to a follower from the leader
  - could be done inline with the shard operation stream
  - alternatively, the shard operation stream could indicate the minimum index metadata version required and the following index could wait (through an observer) until the local version catches up
- CCR REST API for users to set up remote replication
  - design API
  - implement API
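The operations API and the following engine described in the list above could look roughly like the Python sketch below. All names here (`Operation`, `read_operations`, the `FollowingEngine` class shape) are illustrative assumptions, not the real Elasticsearch classes or signatures. The sketch also assumes that "below the global checkpoint" means "at or below" it.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Operation:
    """Hypothetical replicated operation."""
    doc_id: str
    source: dict
    seq_no: Optional[int] = None
    primary_term: Optional[int] = None


def read_operations(history, from_seq_no, batch_size, global_checkpoint):
    """Leader side: serve a batch of operations starting at from_seq_no.

    Operations above the global checkpoint are never returned
    (assumption: the checkpoint itself is included).
    """
    if from_seq_no < 0 or batch_size <= 0:
        raise ValueError("from_seq_no must be >= 0 and batch_size must be > 0")
    last = min(from_seq_no + batch_size - 1, global_checkpoint)
    batch = [op for op in history if from_seq_no <= op.seq_no <= last]
    return sorted(batch, key=lambda op: op.seq_no)


class FollowingEngine:
    """Follower side: only accepts operations that already carry a sequence number."""

    def __init__(self, primary_term):
        self.primary_term = primary_term
        self.ops = {}

    def index(self, op):
        if op.seq_no is None:
            # Operations must come from the leader, which assigns sequence numbers.
            raise ValueError("operation has no sequence number")
        # The engine adds its own primary term to the incoming operation.
        self.ops[op.seq_no] = replace(op, primary_term=self.primary_term)
```

A persistent task on the follower would then repeatedly call something like `read_operations` on the leader and feed each returned operation to `FollowingEngine.index`.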
Things to do and investigate:
- [ ] Old indices cannot be followed. The soft delete index setting should not be allowed to be set on older indices (indices that were created before CCR was released) (Nhat). We discussed and decided not to do this for now.
- We currently allow duplicate seq# in Lucene (marked as deleted and pointing to the same document version), in some rare cases around recovery and delayed recovery. This is not considered a big deal, but if we find a simple solution to avoid it, we would prefer a clean history. @dnhatn (no)
- [ ] Until rollbacks are fully implemented for Lucene, Lucene may contain seq# collisions. These can be resolved by making CCR term-aware, i.e., using terms as a secondary sort and using them to dedup operations. We should decide whether this needs to be done, especially since at some point Lucene rollbacks will be implemented (and are expected to clean up these collisions). @dnhatn (yes). Discussed this with Martijn: the dedup logic we have in LuceneChangesSnapshot is sufficient.
- Add additional validation to FollowIndexAction#validate(...). For example, mappings need to be identical and the leader and follow indices must be open. (MvG) [CCR] Added more validation to follow index api. #31068
- Reject a follow request if the follower does not have the `index.xpack.ccr.following_index` setting (Nhat) (Reject follow request if following setting not enabled on follower #32448).
- Add a test that verifies that when all copies of a shard are unavailable, CCR keeps on following the leader index and restarts the shard follow task when a copy of this shard becomes available. (@dnhatn) (yes)
- Modifying a mapping of a follower index should be prohibited. The mapping is kept in sync with the leader index by an internal mechanism in CCR. (@dnhatn) (Do not allow put mapping on follower #37675)
- Add a version to MappingMetaData in cluster state, so that we only sync mappings between leader and follow index if the mapping has changed. Currently we keep track of changes via the index metadata version. So if the refresh rate is changed then we try to sync mappings too. @jasontedor
- The follow shard task should poll for the global checkpoint in a more efficient manner. Currently the shard follow task checks the leader's indices stats API at a fixed interval to get the global checkpoint of a specific primary shard. However, we can add an additional API that allows us to do this in a long-polling manner. That way we get notified right away when the global checkpoint has advanced beyond what the follow shard task knows. This also avoids a lot of chatter when no changes occur in the leader index. Add global checkpoint polling to cross-cluster replication #32651 (@jasontedor) (yes)
- Add validation that prevents following incompatible indices. There is no good validation that prevents following a leader index with an incompatible existing follow index. It is likely that this will fail because of non-matching global checkpoints and sequence numbers, but this is not guaranteed. Ideally you should only be able to follow an index that has been created by the create and follow API. Validation could be based on the leader index uuid that we store in the index settings of a follow index, or on something more clever like the history uuid, but that requires more work. [CCR] Introduce leader index setting for follow index #31505 @martijnvg (yes)
- FollowingEngine shouldn't fill history gaps upon promotion and recovery (#31318)
- Re-evaluate shard follow parameter defaults. [CCR] Re-evaluate shard follow parameter defaults #31717 (@dliappis) (yes)
- Improve shard follow task retry mechanism (#31816)
- ShardFollowNodeTask should fetch operations once (ShardFollowNodeTask should fetch operation once #32455)
- Move requests, response and action classes to xpack core module and verify that transport client works with ccr. (no)
- Make ccr work with high level rest client. (no)
- Store or generate autoGeneratedIdTimestamp for append-only documents. (@dnhatn) (yes) (Uses auto generated timestamp with soft-deletes #33656)
- Make `index.xpack.ccr.following_index` a final / internal setting, to avoid writing directly into the following index. Only CCR can write into a follow index. (@martijnvg) (yes)
- Record the history uuid of the leader index in index-level metadata and validate, in the follow API and on each fetch in the shard follow task, that the leader's history uuid is the same as what was recorded. (@martijnvg) (yes)
- Force a new history uuid when force allocating a stale primary, so that we can detect in the follow shard task that the history uuid has changed and then fail. Force a new history uuid when force allocating a stale primary #26712 (@dnhatn) (yes)
- Improve failure handling by retrying failed fetch calls with exponential backoff. Retryable failures should be retried indefinitely. [CCR] Improve shard follow task's retryable error handling #33371 (@martijnvg) (yes)
- [ ] Disable external usage of the follow API until we can safely execute a follow call after an unfollow call. (no)
- Expose the current number of retries for failed fetch tasks in the stats API. (@jasontedor) (yes)
- Auto-follow patterns [CCR] Auto follow patterns #33007 (yes and no 😉)
- Rename `search.remote.*` settings to `cluster.remote.*` (@jasontedor) (Generalize search.remote settings to cluster.remote #33413)
- Automatically upgrade `search.remote.*` in the cluster state to `cluster.remote.*` (@jasontedor) Add infrastructure to upgrade settings #33536, Upgrade remote cluster settings #33537
- Follow APIs should check if user has sufficient privileges before executing [CCR] create_and_follow api and follow api should check if user has sufficient privileges before executing #33553 (yes) (@martijnvg)
- Add create_follow_index privilege [CCR] Add create_follow_index privilege #33555 (yes) (@martijnvg)
- Auto resume following a leader index shard in case the history uuid of a follow shard has changed.
- When following a leader index that is also a follower index, use the history uuid of the leader index's leader index.
- Auto-follow patterns should fail and report if they match an index where soft deletes are not enabled. This is likely an operator mistake.
- Include CCR in the xpack usage api. Include CCR in the xpack usage api #37221
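For the retry item in the list above (retry failed fetch calls with exponential backoff, indefinitely for retryable failures), a minimal Python sketch might look like the following. `RetryableError`, `backoff_delays`, and `fetch_with_retry` are hypothetical names, and the initial delay, factor, and delay cap are assumed values rather than anything CCR defines. The returned retry count is the kind of number a stats API could expose.

```python
class RetryableError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def backoff_delays(initial=0.5, factor=2.0, max_delay=60.0):
    # Infinite generator: 0.5, 1.0, 2.0, ... with each delay capped at max_delay.
    delay = initial
    while True:
        yield min(delay, max_delay)
        delay = min(delay * factor, max_delay)


def fetch_with_retry(fetch, sleep, delays=None):
    """Call fetch() until it succeeds, sleeping with growing delays in between.

    Retryable failures are retried indefinitely; any other exception propagates.
    Returns (result, number_of_retries).
    """
    if delays is None:
        delays = backoff_delays()
    retries = 0
    for delay in delays:
        try:
            return fetch(), retries
        except RetryableError:
            retries += 1
            sleep(delay)
```

In real use `sleep` would be something like `time.sleep`; passing it in as a parameter keeps the sketch testable without waiting.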