Keep exclusive/auto-delete queues with Khepri + network partition #14573

dumbbell · 2025-09-19T11:38:00Z

Keep exclusive/auto-delete queues with Khepri + network partition

Why

With Mnesia, when the network partition strategy is set to pause_minority, nodes on the "minority side" are stopped.

Thus, the exclusive queues that were hosted by nodes on that minority side are lost:

* Consumers connected on these nodes are disconnected because the nodes are stopped.
* Queue records on the majority side are deleted from the metadata store.

This was acceptable with Mnesia, given how this network partition handling strategy is implemented. However, it does not work with Khepri because the nodes on the "minority side" continue to run and serve clients. The cluster therefore ends up in an inconsistent state:

1. The "majority side" deleted the queue records.
2. When the network partition is resolved, the "minority side" receives the record deletion, but the queue processes continue to run.

The same applied to auto-delete queues.

How

With Khepri, we no longer delete transient queue records just because a node goes down. Thanks to this, an exclusive or an auto-delete queue and its consumer(s) are not affected by a network partition: they continue to work.

However, if a node is really lost, we need to clean up dead queue records. This was already done for durable queues with both Mnesia and Khepri. But with Khepri, transient queue records persist in the store like durable queue records (unlike with Mnesia).

That's why this commit changes the clean-up function rabbit_amqqueue:forget_all_durable/1 into rabbit_amqqueue:forget_all/1, which deletes the records of all queues that were hosted on the given node, regardless of whether they are transient or durable.
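The difference between the two clean-up functions can be illustrated with a minimal sketch. This is not the RabbitMQ implementation: the module name, the map-based queue records and the two-argument signatures (the real functions take only the node) are assumptions made so that the example is self-contained.

```erlang
-module(forget_all_sketch).
-export([forget_all/2, forget_all_durable/2]).

%% Hypothetical, simplified queue record: a map holding the queue name,
%% the node hosting the queue process and a durability flag. Real
%% RabbitMQ queue records live in the metadata store (Mnesia or Khepri).

%% Old behaviour: only durable records hosted on Node are dropped.
%% Transient records hosted on Node are kept in the returned list.
forget_all_durable(Node, Records) ->
    [Q || Q = #{node := N, durable := D} <- Records,
          not (N =:= Node andalso D)].

%% New behaviour: every record hosted on Node is dropped, whether the
%% queue is transient or durable.
forget_all(Node, Records) ->
    [Q || Q = #{node := N} <- Records, N =/= Node].
```

With Khepri, a transient record left behind by the old function would stay in the store after the node is forgotten, which is the case the renamed function covers.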

In addition, if no other process is waiting for a reply from the queue process, the queue process spawns a temporary process that keeps trying to delete the underlying record indefinitely. That's the case for queues that are deleted because of an internal event (like the exclusive/auto-delete conditions). The queue process then exits, which notifies connections that the queue is gone.

Thanks to this, the temporary process will do its best to delete the record in case of a network partition, whether the consumers go away during or after that partition. That said, the node monitor drives some failsafe code that cleans up the record if the queue process was killed before it could delete its own record.
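The retry behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the module name, `DeleteFun`, its `ok | {error, timeout}` return values and the fixed retry delay are all hypothetical and do not reflect the actual code in rabbit_amqqueue_process.erl.

```erlang
-module(delete_retry_sketch).
-export([spawn_deleter/2, delete_until_done/2]).

%% DeleteFun is a hypothetical stand-in for the real record deletion. It
%% must return ok on success, or {error, timeout} when the metadata
%% store cannot be reached (for example on the minority side of a
%% network partition).

%% Spawn a temporary process so that the queue process itself can exit,
%% and notify connections, without waiting for the deletion to succeed.
spawn_deleter(DeleteFun, RetryDelayMs) ->
    spawn(fun() -> delete_until_done(DeleteFun, RetryDelayMs) end).

%% Retry until the deletion succeeds. There is deliberately no upper
%% bound on the number of attempts.
delete_until_done(DeleteFun, RetryDelayMs) ->
    case DeleteFun() of
        ok ->
            ok;
        {error, timeout} ->
            timer:sleep(RetryDelayMs),
            delete_until_done(DeleteFun, RetryDelayMs)
    end.
```

The review thread on this pull request notes that blind retries could grow the Khepri log when the partitioned node hosts the leader, and that the pushed version uses a Khepri fence after the first delete attempt instead. That refinement is not modelled in this sketch.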

Fixes #12949, #12597, #14527.

dumbbell · 2025-09-22T09:08:25Z

There are two test flakes in the new tests that I’m looking at. Otherwise, the patch looks ready for further testing.

… message

[Why]
So far, when there was a network partition with Mnesia, the most popular partition handling strategies restarted RabbitMQ nodes. Therefore, `rabbit` would execute the boot steps and one of them would notify other members of the cluster that "this RabbitMQ node is live".

With Khepri, nodes are not restarted anymore and thus, boot steps are not executed at the end of a network partition. As a consequence, other members are not notified that a member is back online.

[How]
When the node monitor receives the `nodeup` message (managed by Erlang, meaning that "a remote Erlang node just connected to this node through Erlang distribution"), a `node_up` message is sent to all cluster members (meaning "RabbitMQ is now running on the originating node"). Yeah, very poor naming...

This lets the RabbitMQ node monitor know when other nodes running RabbitMQ are back online and react accordingly.

If a node is restarted, it means that another node could receive the `node_up` message twice. The actions behind it must be idempotent.
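The idempotency requirement can be illustrated with a small sketch. The module, its function names and the returned tags are hypothetical, and the real node monitor keeps far more state. The only point shown is that handling the same `node_up` twice leaves the state unchanged.

```erlang
-module(node_up_sketch).
-export([new/0, handle_node_up/2, live_nodes/1]).

%% The state is the set of nodes known to be running RabbitMQ. Using a
%% set makes a repeated node_up for the same node a no-op.
new() ->
    sets:new().

%% Returns {first_seen, NewState} when Node was not known yet, or
%% {already_known, State} when the message is a duplicate. A caller
%% would run its "node is back" actions only in the first case.
handle_node_up(Node, Live) ->
    case sets:is_element(Node, Live) of
        true  -> {already_known, Live};
        false -> {first_seen, sets:add_element(Node, Live)}
    end.

%% Sorted list of known live nodes, convenient for inspection.
live_nodes(Live) ->
    lists:sort(sets:to_list(Live)).
```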

kjnilsson · 2025-09-30T10:37:04Z

deps/rabbit/src/rabbit_amqqueue_process.erl

    end.

+infinite_internal_delete(Q, ActingUser, Reason) ->
+    case delete_queue_record(Q, ActingUser, Reason) of


If the node is partitioned with a Khepri leader on it, this code could grow the Khepri log infinitely.

I see what you mean. Then, I need to explore @lhoguin’s idea of waiting for the node_up message.

In fact, using a Khepri fence after the first delete attempt should be enough: the call waits for all updates to be applied locally. I just pushed that change.


mkuratczyk

Everything seems to work as expected. Thanks!


dumbbell requested review from kjnilsson and mkuratczyk September 19, 2025 11:38

dumbbell self-assigned this Sep 19, 2025

dumbbell force-pushed the fix-exclusive-queues-with-khepri branch 2 times, most recently from 2b31b23 to 7cc220b Compare September 19, 2025 13:01

the-mikedavis linked an issue Sep 19, 2025 that may be closed by this pull request

With Khepri, transient exclusive queues are not deleted in case of a partition #12597

Closed

michaelklishin added the backport-v4.2.x label Sep 19, 2025

michaelklishin added this to the 4.3.0 milestone Sep 19, 2025

michaelklishin mentioned this pull request Sep 19, 2025

Exclusive queues can be deleted without the consumers being notified #12949

Closed

dumbbell force-pushed the fix-exclusive-queues-with-khepri branch 5 times, most recently from b4d18c4 to 69cf89c Compare September 29, 2025 09:32

dumbbell force-pushed the fix-exclusive-queues-with-khepri branch 2 times, most recently from 5f40cb5 to b850e37 Compare September 29, 2025 13:54

dumbbell changed the title ~~Keep exclusive queues with Khepri + network partition~~ Keep exclusive/auto-delete queues with Khepri + network partition Sep 29, 2025

dumbbell force-pushed the fix-exclusive-queues-with-khepri branch 2 times, most recently from 175133b to ad3fdfa Compare September 29, 2025 14:59

dumbbell force-pushed the fix-exclusive-queues-with-khepri branch from f95aa4b to 47d59eb Compare September 30, 2025 09:35

dumbbell marked this pull request as ready for review September 30, 2025 10:21

dumbbell mentioned this pull request Sep 30, 2025

With Khepri, transient exclusive queues are not deleted in case of a partition #12597

Closed

kjnilsson reviewed Sep 30, 2025


dumbbell force-pushed the fix-exclusive-queues-with-khepri branch from 47d59eb to 3c4d073 Compare September 30, 2025 11:11

mkuratczyk approved these changes Oct 1, 2025


michaelklishin merged commit 40728af into main Oct 1, 2025
568 of 569 checks passed

michaelklishin deleted the fix-exclusive-queues-with-khepri branch October 1, 2025 14:37

mergify bot mentioned this pull request Oct 1, 2025

Keep exclusive/auto-delete queues with Khepri + network partition (backport #14573) #14654

Merged

dumbbell mentioned this pull request Oct 2, 2025

clustering_recovery_SUITE: Skip tests that require RabbitMQ 4.2.0 in mixed-version testing #14665

Merged

dumbbell added a commit that referenced this pull request Oct 3, 2025

Merge pull request #14654 from rabbitmq/mergify/bp/v4.2.x/pr-14573

1c856ee

Keep exclusive/auto-delete queues with Khepri + network partition (backport #14573)
