Description
While reviewing #14062 I found it somewhat difficult to reason about which version is incremented when in the cluster state. I am opening this issue to summarize how I think versions work right now and maybe get some feedback on how this could be handled in a way that is easier to understand.
These are the version numbers we maintain:
- Cluster state version:
  - Should be incremented only by one each time a cluster state update task is processed and actually yields a change anywhere in the cluster state.
  - Currently only changed in the cluster state update thread.
- `MetaData` version:
  - Should increment each time something in the cluster settings or index metadata changes.
  - Currently only changed in the cluster state update thread.
- `IndexMetaData` version:
  - Should increment whenever some index-specific property changes (mapping, settings, etc., except for routing changes, which we track in the shard version).
  - Changed in many places; look at `IndexMetaData.Builder.version(..)` and where it is called.
- `RoutingTable` version:
  - Should increment each time any shard routing changes.
  - Currently changed in the cluster state update thread and in `AllocationService.buildChangedResult(..)`.
- Shard version:
  - Incremented each time something in the routing of a single shard changes (add/remove replica, relocate shard, etc.).
  - Stored in the `ShardRouting`s. It is updated when a single `ShardRouting` changes during allocation and then copied over to all `ShardRouting`s from the same copy in `IndexRoutingTable.normalizeVersions()` once allocation has finished.
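
To make the shard version step concrete, here is a minimal sketch of what the normalization after allocation might look like. `ShardCopy` and `ShardVersions` are hypothetical stand-ins, not the real `ShardRouting` or `IndexRoutingTable` classes, and the sketch assumes that normalizing means giving every copy of a shard the highest version found among its copies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for one copy of a shard; not the real ShardRouting.
record ShardCopy(int shardId, boolean primary, String node, long version) {}

final class ShardVersions {
    // Assumption: all entries in `copies` belong to the same shard.
    // Every copy receives the highest version seen among them, so a bump made
    // on a single copy during allocation becomes visible on all copies.
    static List<ShardCopy> normalize(List<ShardCopy> copies) {
        long highest = 0;
        for (ShardCopy c : copies) {
            highest = Math.max(highest, c.version());
        }
        List<ShardCopy> normalized = new ArrayList<>();
        for (ShardCopy c : copies) {
            normalized.add(new ShardCopy(c.shardId(), c.primary(), c.node(), highest));
        }
        return normalized;
    }
}
```

In this model, if the replica of a shard was bumped from 3 to 4 during allocation while the primary stayed at 3, both copies end up at 4 after `normalize`.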
Q: Cluster state version and `MetaData` version are updated only once per cluster state update task, but the others are not, or at least it is not immediately clear whether they can be incremented by more than one. Is it OK if the `IndexMetaData` version, `RoutingTable` version and shard version increment by more than one between cluster states?
While I think I understand why we do versioning the way we do it now, I find it cumbersome to read and wonder if there is a cleaner way to maintain these versions.
For example: we only need the version increments before the master sends the new cluster state to the other nodes. Can we build a new cluster state without incrementing any version in any of the components, and then here check the difference between the new and old cluster state and update all versions in one go? This would leave no open questions about when and where versions are updated. I chatted very briefly with @s1monw, who thinks this will add complexity rather than remove any, but I have not yet given up hope and will give it a shot.
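
A rough sketch of the idea, to show what "update all versions in one go" could mean. All types here (`State`, `Meta`, `IndexMeta`, `Routing`, `VersionStamper`) are hypothetical, heavily simplified stand-ins and not the real Elasticsearch classes; the choice to start a new index at version 1 is also just an assumption for illustration. The candidate state is built without touching any version, and a single method compares it with the previous state and bumps each changed component by exactly one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins; not the real Elasticsearch classes.
record IndexMeta(long version, String settings) {}
record Meta(long version, String clusterSettings, Map<String, IndexMeta> indices) {}
record Routing(long version, Map<String, String> shardToNode) {}
record State(long version, Meta meta, Routing routing) {}

final class VersionStamper {
    // Versions on `candidate` are ignored. Each component gets the previous
    // version, plus one only if its content differs from `previous`.
    static State stamp(State previous, State candidate) {
        Meta oldMeta = previous.meta();
        Meta newMeta = candidate.meta();
        boolean metaChanged = !oldMeta.clusterSettings().equals(newMeta.clusterSettings())
                || !oldMeta.indices().keySet().equals(newMeta.indices().keySet());

        Map<String, IndexMeta> indices = new HashMap<>();
        for (Map.Entry<String, IndexMeta> e : newMeta.indices().entrySet()) {
            IndexMeta old = oldMeta.indices().get(e.getKey());
            IndexMeta cur = e.getValue();
            if (old == null) {
                // Assumption: a newly created index starts at version 1.
                indices.put(e.getKey(), new IndexMeta(1, cur.settings()));
            } else if (!old.settings().equals(cur.settings())) {
                indices.put(e.getKey(), new IndexMeta(old.version() + 1, cur.settings()));
                metaChanged = true;
            } else {
                indices.put(e.getKey(), old);
            }
        }

        boolean routingChanged =
                !previous.routing().shardToNode().equals(candidate.routing().shardToNode());

        Meta meta = new Meta(oldMeta.version() + (metaChanged ? 1 : 0),
                newMeta.clusterSettings(), indices);
        Routing routing = new Routing(previous.routing().version() + (routingChanged ? 1 : 0),
                candidate.routing().shardToNode());
        long stateVersion = previous.version() + ((metaChanged || routingChanged) ? 1 : 0);
        return new State(stateVersion, meta, routing);
    }
}
```

With this shape no version can move by more than one between two published cluster states, because every increment is derived from the previous state in a single place. Whether the extra comparison is cheaper or simpler than the current scattered increments is exactly the open question.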
Any kind of feedback is more than welcome.