
Cluster state and versions - when should we increment which version and how often? #14158

@brwe


While reviewing #14062 I found it somewhat difficult to reason about which version in the cluster state is incremented when. I am opening this issue to summarize how I think versions work right now and to get some feedback on how this could be handled in a way that is easier to understand.

These are the version numbers we maintain:

  • cluster state version:
    • should be incremented by exactly one each time a cluster state update task is processed and actually changes something in the cluster state as a whole
    • currently only changed in the cluster state update thread.
  • MetaData version:
    • should increment each time something in the cluster settings or index meta data changes.
    • currently only changed in the cluster state update thread.
  • IndexMetaData version:
    • should increment whenever some index-specific property changes (mapping, settings, etc., except for routing changes, which we track in the shard version).
    • changed in many places, look at IndexMetaData.Builder.version(..) and where it is called.
  • RoutingTable version:
    • should increment each time any shard routing changes
    • currently changed in the cluster state update thread and AllocationService.buildChangedResult(..).
  • Shard version:
    • incremented each time something in the routing of a single shard changes (add/remove replica, relocate shard etc.).
    • Shard version is stored in the ShardRoutings. It is updated when a single ShardRouting changes during allocation and then copied over to all ShardRoutings from the same copy in IndexRoutingTable.normalizeVersions() once allocation has finished.
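The normalization step described in the last bullet can be pictured with a small sketch. It is not the Elasticsearch implementation: `ShardVersionSketch` and `Copy` are hypothetical stand-ins, and it assumes that "copied over" means raising every copy of a shard to the highest version seen among them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; not the real ShardRouting / IndexRoutingTable classes.
final class ShardVersionSketch {

    /** One copy (primary or replica) of a shard, with its routing version. */
    static final class Copy {
        final String node;
        final long version;

        Copy(String node, long version) {
            this.node = node;
            this.version = version;
        }
    }

    /**
     * Assumption: after allocation, all copies of the same shard should carry
     * the highest version that any single copy reached during allocation.
     */
    static List<Copy> normalizeVersions(List<Copy> copies) {
        long highest = 0;
        for (Copy c : copies) {
            highest = Math.max(highest, c.version);
        }
        List<Copy> normalized = new ArrayList<>(copies.size());
        for (Copy c : copies) {
            // Reuse copies that are already up to date, rebuild the others.
            normalized.add(c.version == highest ? c : new Copy(c.node, highest));
        }
        return normalized;
    }
}
```

Under this assumption the shard version can jump by more than one between two published cluster states if a copy changed several times during a single allocation round.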

Q: Cluster state version and MetaData version are updated only once per cluster state update task, but the others are not, or at least it is not immediately clear whether they can be incremented by more than one. Is it OK if IndexMetaData version, RoutingTable version and shard version increment by more than one between cluster states?

While I think I understand why we do versioning the way we do now, I find it cumbersome to read and wonder if there is a cleaner way to maintain these versions.
For example: we only need the version increments before the master sends the new cluster state to the other nodes. Could we build a new cluster state without incrementing any version in any of the components, then at that point check the difference between the new and old cluster state and update all versions in one go? This would leave no open questions about when and where versions are updated. I chatted very briefly with @s1monw, who thinks this will add complexity rather than remove any, but I have not yet given up hope and will give it a shot.
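To make the proposal concrete, here is a minimal sketch of "diff, then bump in one go". Everything in it is hypothetical: `VersionedStateSketch`, `Component`, `State` and `assignVersions` are illustration-only names, component content is reduced to a string, and a new component is assumed to start at version 1. The real cluster state classes look nothing like this.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model; none of these types are real Elasticsearch classes.
final class VersionedStateSketch {

    /** Content plus version of one component (e.g. MetaData or RoutingTable). */
    static final class Component {
        final String content;
        final long version;

        Component(String content, long version) {
            this.content = content;
            this.version = version;
        }
    }

    /** A cluster state: its own version plus its versioned components. */
    static final class State {
        final long version;
        final Map<String, Component> components;

        State(long version, Map<String, Component> components) {
            this.version = version;
            this.components = components;
        }
    }

    /**
     * Takes the previously published state and the newly built, unversioned
     * contents, and assigns all versions in a single pass: each changed
     * component is bumped by exactly one, and the state version is bumped by
     * one if anything changed at all.
     */
    static State assignVersions(State previous, Map<String, String> newContents) {
        Map<String, Component> result = new HashMap<>();
        // A removed or added component also counts as a change.
        boolean anyChange = !previous.components.keySet().equals(newContents.keySet());
        for (Map.Entry<String, String> e : newContents.entrySet()) {
            Component old = previous.components.get(e.getKey());
            if (old == null) {
                result.put(e.getKey(), new Component(e.getValue(), 1)); // assumed start version
                anyChange = true;
            } else if (!old.content.equals(e.getValue())) {
                result.put(e.getKey(), new Component(e.getValue(), old.version + 1));
                anyChange = true;
            } else {
                result.put(e.getKey(), old); // unchanged: keep the old version
            }
        }
        return new State(anyChange ? previous.version + 1 : previous.version, result);
    }
}
```

With this shape, no component can move by more than one version between two published states, at the cost of a full comparison of old and new state before publishing.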

Any kind of feedback is more than welcome.
