rabbit_db: Skip reset if the node is already virgin #14768

dumbbell · 2025-10-21T11:18:29Z

Why

If we run reset again on an already virgin node, it makes decisions based on the wrong state. In particular, the record of whether the node previously used Khepri or Mnesia is lost with the first reset. Therefore, the second reset would delete non-Khepri-related files that belong to the coordination Ra system.

This is particularly problematic with the previously documented way of joining two nodes using the CLI:

rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl join_cluster $REMOTE_NODE
rabbitmqctl start_app

Indeed, join_cluster implies a reset. If the admin already reset the node as documented, the reset implied by join_cluster would delete too many files, breaking Khepri after the join if the remote node uses Khepri.
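
Because join_cluster already performs a reset, the join sequence can be sketched without the explicit `rabbitmqctl reset` step. The `join_node` function below is a hypothetical wrapper, not part of RabbitMQ; it assumes `rabbitmqctl` is on the `PATH` and acts on the local node.

```shell
#!/bin/sh
# Hypothetical helper (not part of RabbitMQ): join the local node to the
# cluster that $1 belongs to, relying on the reset implied by join_cluster.
join_node() {
    remote_node="$1"
    if [ -z "$remote_node" ]; then
        echo "usage: join_node REMOTE_NODE" >&2
        return 2
    fi

    # Stop the RabbitMQ application; the Erlang VM keeps running.
    rabbitmqctl stop_app || return 1
    # No explicit `rabbitmqctl reset` here: join_cluster resets the node itself.
    rabbitmqctl join_cluster "$remote_node" || return 1
    # Start the application again as a member of the remote node's cluster.
    rabbitmqctl start_app
}
```

Calling `join_node "$REMOTE_NODE"` runs the three commands in order and stops at the first one that fails.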

How

In rabbit_db:reset/0, we skip the reset if the node is already virgin.

Fixes #14748.

[Why]
The previous implementation checked if the store was empty. This part is unchanged. However, if `is_empty/0` returned an error, typically because the store was stopped, it would also consider the node virgin. This approximation is wrong because an admin could have executed `rabbitmqctl stop_app` before something checks whether the node is virgin.

[How]
With this patch, we introduce `rabbit_khepri:is_virgin_node/0`. It starts the store if it is stopped, then queries whether it is empty. It stops the store again if it was stopped initially. This way, we get a more accurate answer to the question.

dumbbell added 2 commits October 21, 2025 13:06

dumbbell added this to the 4.3.0 milestone Oct 21, 2025

dumbbell self-assigned this Oct 21, 2025

dumbbell added the backport-v4.2.x label Oct 21, 2025

dumbbell added 11 commits October 21, 2025 15:42

khepri

32aaf70

WIP

8f7f7b3

assert

3555a1b

wipe

51aec0a

join

0325c45

khepri_reset

5ad7988

are_running

84b4081

feature_flags

963e556

no_khepri_setup

47cb873

mnesia_reset

6b575e0

khepri_remove_member

d37d8d7
