Closed
Labels: :Distributed Coordination/Cluster Coordination, >feature, Meta
Description
In Zen, ClusterState is stored through GatewayMetaState. There are the following issues with the current approach:
- Since GatewayMetaState implements ClusterStateApplier, if there is an IOException while storing the state, the ClusterState change is still applied to the in-memory node state.
- GatewayMetaState stores the global metadata and the metadata of each index in separate files (see MetaDataStateFormat). MetaDataStateFormat ensures that each global or index metadata file is written atomically. However, if a change touches both the global metadata and index metadata, or the metadata of several indices, the state update can be only partially persisted.
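To make the second issue concrete, here is a minimal sketch of the usual write-to-temp-then-rename pattern that makes a single state file update atomic. AtomicStateFile and its methods are hypothetical names for illustration, not the MetaDataStateFormat API, and the file layout is assumed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical helper showing the write-to-temp-then-rename pattern for one state file.
 * It is an illustration only, not the MetaDataStateFormat API.
 */
public final class AtomicStateFile {

    private AtomicStateFile() {}

    public static void write(Path target, String content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        ByteBuffer bytes = ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8));
        try (FileChannel channel = FileChannel.open(tmp,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
            while (bytes.hasRemaining()) {
                channel.write(bytes);
            }
            channel.force(true); // flush the data before the rename makes it visible
        }
        // Readers see either the complete old file or the complete new one, never a partial write.
        // (A production implementation would also fsync the parent directory after the rename.)
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
    }

    public static String read(Path target) throws IOException {
        return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
    }
}
```

Each call is atomic on its own, but nothing ties two calls together: if the node crashes after writing the global state file and before writing an index file, the next load sees new global metadata next to old index metadata.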
Zen2 needs a reliable mechanism to store ClusterState without these drawbacks. Two alternative approaches were discussed:
- Instead of storing metadata in separate files, create a translog for ClusterState diffs. Once a ClusterState.Diff is received, persist it to the translog. A background process could merge multiple ClusterState diffs together.
- Enhance the existing solution by adding a manifest file that contains pointers to the global state and index metadata files and thereby ensures atomicity.
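The manifest idea can be sketched roughly as follows, assuming that state files are never overwritten in place: changed global or index metadata is written to new generation files, and a single atomic manifest write switches readers over. ManifestStore, the file names ("global-N.st", "index-NAME-N.st", "manifest.st") and the Properties-based format are hypothetical choices for illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch of the manifest approach; names and formats are invented. */
public final class ManifestStore {
    private static final String MANIFEST = "manifest.st";
    private final Path dir;

    public ManifestStore(Path dir) {
        this.dir = dir;
    }

    /**
     * Writes new-generation files for whatever changed, then commits them with one atomic
     * manifest write. The generation must exceed every previously committed one.
     * A null globalState means the global metadata did not change.
     */
    public void commit(long generation, String globalState, Map<String, String> changedIndices)
            throws IOException {
        Properties manifest = loadManifest();
        if (globalState != null) {
            writeAtomically(dir.resolve("global-" + generation + ".st"), globalState);
            manifest.setProperty("global", Long.toString(generation));
        }
        for (Map.Entry<String, String> e : changedIndices.entrySet()) {
            writeAtomically(dir.resolve("index-" + e.getKey() + "-" + generation + ".st"), e.getValue());
            manifest.setProperty("index." + e.getKey(), Long.toString(generation));
        }
        // Commit point: until this rename, the old manifest references only old-generation files,
        // so a crash anywhere above leaves the previous state fully loadable.
        StringWriter out = new StringWriter();
        manifest.store(out, null);
        writeAtomically(dir.resolve(MANIFEST), out.toString());
        // A real implementation would now delete files the new manifest no longer references.
    }

    /** Loads exactly the files the manifest references; leftovers from failed commits are ignored. */
    public Map<String, String> load() throws IOException {
        Properties manifest = loadManifest();
        Map<String, String> state = new HashMap<>();
        for (String key : manifest.stringPropertyNames()) {
            String gen = manifest.getProperty(key);
            String file = key.equals("global")
                    ? "global-" + gen + ".st"
                    : "index-" + key.substring("index.".length()) + "-" + gen + ".st";
            state.put(key, Files.readString(dir.resolve(file)));
        }
        return state;
    }

    private Properties loadManifest() throws IOException {
        Properties manifest = new Properties();
        Path path = dir.resolve(MANIFEST);
        if (Files.exists(path)) {
            manifest.load(new StringReader(Files.readString(path)));
        }
        return manifest;
    }

    private static void writeAtomically(Path target, String content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.writeString(tmp, content);
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

The design choice is that the manifest write is the only commit point: a multi-file update either becomes visible in full (the manifest points at all new files) or not at all (the old manifest still points at the old ones).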
While the first approach is preferable in the long run, we decided to go with the second approach for now.
Below is the list of things that should be done:
- Migrate MetaDataStateFormat to the Lucene directory abstraction for easier failure testing. (Switch MetaDataStateFormat to Lucene directory abstraction #33989)
- Change MetaDataStateFormat.write semantics to clearly distinguish two failure cases: the write has failed and loadLatestState must return the old state, and the write has failed and loadLatestState may return either the old or the new state. ([Zen2] Change MetaDataStateFormat write semantics #34709)
- Add manifest file support. ([Zen2] Write manifest file #35049)
- Move the ClusterState fields that need to be persisted into MetaData (except version, which is updated very often and will go directly to the Manifest), namely term, lastCommittedConfiguration, lastAcceptedConfiguration and votingTombstones. ([Zen2] Move ClusterState fields to be persisted to ClusterState.MetaData #35625)
- Implement the PersistedState interface for Zen2. Note that term != currentTerm: term goes to MetaData, while currentTerm goes to the Manifest. ([Zen2] PersistedState interface implementation #35819)
- Properly handle the WriteStateException thrown by GatewayMetaState.
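The two failure cases and the persist-before-apply ordering could fit together roughly as in the sketch below. Everything here is hypothetical and simplified: PersistedStateSketch, StateWriteException with its dirty flag, and the Storage interface are invented names, and the cluster state is modelled as a plain String. A clean failure keeps the old in-memory state; a dirty failure stops the node instead of letting memory and disk silently diverge.

```java
import java.io.IOError;
import java.io.IOException;

/** Hypothetical, simplified persisted-state holder; not the Elasticsearch PersistedState API. */
public final class PersistedStateSketch {

    /** Write failure that records whether the on-disk state may already have changed. */
    public static final class StateWriteException extends IOException {
        private final boolean dirty;

        public StateWriteException(boolean dirty, String message) {
            super(message);
            this.dirty = dirty;
        }

        /** false: a reload returns the old state; true: a reload may return the old or the new state. */
        public boolean isDirty() {
            return dirty;
        }
    }

    /** Storage backend; currentTerm would go to the manifest, the accepted state to metadata files. */
    public interface Storage {
        void write(long currentTerm, String acceptedState) throws StateWriteException;
    }

    private final Storage storage;
    private long currentTerm;
    private String lastAcceptedState;

    public PersistedStateSketch(Storage storage, long currentTerm, String lastAcceptedState) {
        this.storage = storage;
        this.currentTerm = currentTerm;
        this.lastAcceptedState = lastAcceptedState;
    }

    public long getCurrentTerm() {
        return currentTerm;
    }

    public String getLastAcceptedState() {
        return lastAcceptedState;
    }

    public void setCurrentTerm(long newTerm) {
        persist(newTerm, lastAcceptedState);
        currentTerm = newTerm; // applied in memory only after the write succeeded
    }

    public void setLastAcceptedState(String newState) {
        persist(currentTerm, newState);
        lastAcceptedState = newState; // applied in memory only after the write succeeded
    }

    private void persist(long term, String state) {
        try {
            storage.write(term, state);
        } catch (StateWriteException e) {
            if (e.isDirty()) {
                // Disk may hold the new state while memory holds the old one: stop rather than diverge.
                throw new IOError(e);
            }
            // Disk still holds the old state, so keeping the old in-memory state is consistent.
            throw new IllegalStateException("failed to persist state; keeping previous state", e);
        }
    }
}
```

Unlike a ClusterStateApplier that updates memory regardless of the outcome of the write, this ordering never exposes a state that was not durably stored.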
Although points 1-3 are also relevant for Zen, we decided to make these changes on the Zen2 branch.
Relates to #32006