@dumbbell
Collaborator

Why

If we run reset again on an already virgin node, it will make decisions based on the wrong state. In particular, the information about the previous use of Khepri or Mnesia is lost with the first reset. Therefore, the second reset would delete non-Khepri-related files that belong to the coordination Ra system.

This is particularly problematic with the previously documented way of joining two nodes using the CLI:

    rabbitmqctl stop_app
    rabbitmqctl reset
    rabbitmqctl join_cluster $REMOTE_NODE
    rabbitmqctl start_app

Indeed, join_cluster implies a reset. If the admin already reset the node as documented, the reset implied by join_cluster would delete too many files, breaking Khepri after the join if Khepri is used by the remote node.

How

In rabbit_db:reset/0, we skip the reset if the node is already virgin.
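
The guard described above can be sketched roughly as follows. This is a minimal illustration, not the actual body of rabbit_db:reset/0: the module name `reset_sketch` and the two callback arguments are hypothetical and only stand in for "is the node virgin?" and "perform the real reset".

```erlang
-module(reset_sketch).
-export([reset/2]).

%% Hypothetical sketch: neither argument is part of RabbitMQ's API.
%%   IsVirgin :: fun(() -> boolean())  answers "is this node virgin?"
%%   DoReset  :: fun(() -> ok)         performs the actual reset
reset(IsVirgin, DoReset) ->
    case IsVirgin() of
        true ->
            %% The node is already virgin: skip the reset so that a
            %% second reset cannot act on the wrong state.
            ok;
        false ->
            DoReset()
    end.
```

With this shape, a reset that follows an earlier reset becomes a no-op, which is the behaviour the join_cluster case relies on.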

Fixes #14748.

[Why]
The previous implementation checked if the store was empty. This part is
unchanged. However, if `is_empty/0` returned an error, typically because
the store was stopped, it would also consider the node virgin.

This approximation is wrong because an admin could have executed
`rabbitmqctl stop_app` before something checks whether the node is virgin.

[How]
With this patch, we introduce `rabbit_khepri:is_virgin_node/0`. It will
start the store if it is stopped, then it will query if it is empty. It
will stop the store again if it was stopped initially.

This way, we have a more accurate answer to the question.
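
The start/query/stop sequence described above could look roughly like this. It is a sketch under stated assumptions, not the code of `rabbit_khepri:is_virgin_node/0`: the module name `virgin_node_sketch` and the callbacks in the `Store` map are hypothetical placeholders for the real store operations.

```erlang
-module(virgin_node_sketch).
-export([is_virgin_node/1]).

%% Store is a map of hypothetical callbacks (not RabbitMQ functions):
%%   is_running => fun(() -> boolean())
%%   start      => fun(() -> ok)
%%   stop       => fun(() -> ok)
%%   is_empty   => fun(() -> boolean())
is_virgin_node(#{is_running := IsRunning,
                 start := Start,
                 stop := Stop,
                 is_empty := IsEmpty}) ->
    WasRunning = IsRunning(),
    %% Start the store only if it was stopped.
    case WasRunning of
        true -> ok;
        false -> ok = Start()
    end,
    try
        %% Query the running store instead of treating an error from
        %% a stopped store as "virgin".
        IsEmpty()
    after
        %% Restore the initial state: stop the store again only if
        %% this function started it.
        case WasRunning of
            true -> ok;
            false -> ok = Stop()
        end
    end.
```

The `after` clause is used so that the store is put back in its initial state even if the emptiness query raises an exception.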

[Why]
If we run `reset` again on an already virgin node, it will make
decisions based on the wrong state. In particular, the information about
the previous use of Khepri or Mnesia is lost with the first reset.
Therefore, the second reset would delete non-Khepri-related files that
belong to the coordination Ra system.

This is particularly problematic with the previously documented way of
joining two nodes using the CLI:

    rabbitmqctl stop_app
    rabbitmqctl reset
    rabbitmqctl join_cluster $REMOTE_NODE
    rabbitmqctl start_app

Indeed, `join_cluster` implies a reset. If the admin already reset the
node as documented, the reset implied by `join_cluster` would delete too
many files, breaking Khepri after the join if Khepri is used by the
remote node.

[How]
In `rabbit_db:reset/0`, we skip the reset if the node is already virgin.

Fixes #14748.
@dumbbell dumbbell added this to the 4.3.0 milestone Oct 21, 2025
@dumbbell dumbbell self-assigned this Oct 21, 2025