Quorum Queue followers may stop taking snapshots under specific usage patterns #14137
-
Describe the bug

Under specific usage patterns, followers in a QQ cluster may stop taking snapshots while the leader continues taking snapshots normally. This is reproducible up to v4.1.0 and should be fixed by #13971. I've seen this in 3.13.7, and the reproduction works up to v4.1.0 as long as the delivery limit is undefined.

Reproduction steps

I have edited the v4.1.0 source to add a list shuffle (a sketch of that kind of helper is included under Additional context below). With those changes, the following steps can cause the followers of a 3-node QQ cluster to stop taking snapshots.

$ make start-cluster
# Set the delivery limit of QQ "qq" to unlimited.
$ rabbitmqctl -n rabbit-1 set_policy qq-unlimited-delivery-limit '^qq$' '{"delivery-limit": -1}' --priority 123 --apply-to "quorum_queues"
# Publish many messages so we can consume them with basic.get.
$ perf-test --quorum-queue --queue qq --consumers 0 --producers 1 --pmessages 10000
# Run the reproduction code from the patch in another terminal.
$ rabbitmq-diagnostics -n rabbit-1 remote_shell
> rabbit_repro:run().
# Consume the remaining messages in the queue. This will cause the leader to take a
# snapshot.
$ perf-test --quorum-queue --queue qq --consumers 1 --producers 0 --time 2
# (optional) Publish and consume many messages. This is not necessary but it
# moves the leader's snapshot index forward.
# View the quorum status.
$ rabbitmq-queues -n rabbit-1 quorum_status qq
Status of quorum queue qq on node rabbit-1@mango2 ...
┌─────────────────┬────────────┬────────────┬────────────────┬──────────────┬──────────────┬──────────────┬────────────────┬──────┬─────────────────┐
│ Node Name       │ Raft State │ Membership │ Last Log Index │ Last Written │ Last Applied │ Commit Index │ Snapshot Index │ Term │ Machine Version │
├─────────────────┼────────────┼────────────┼────────────────┼──────────────┼──────────────┼──────────────┼────────────────┼──────┼─────────────────┤
│ rabbit-1@mango2 │ leader     │ voter      │ 1590120        │ 1590120      │ 1590120      │ 1590120      │ 1550540        │ 1    │ 5               │
├─────────────────┼────────────┼────────────┼────────────────┼──────────────┼──────────────┼──────────────┼────────────────┼──────┼─────────────────┤
│ rabbit-2@mango2 │ follower   │ voter      │ 1590120        │ 1590120      │ 1590120      │ 1590120      │ -1             │ 1    │ 5               │
├─────────────────┼────────────┼────────────┼────────────────┼──────────────┼──────────────┼──────────────┼────────────────┼──────┼─────────────────┤
│ rabbit-3@mango2 │ follower   │ voter      │ 1590120        │ 1590120      │ 1590120      │ 1590120      │ -1             │ 1    │ 5               │
└─────────────────┴────────────┴────────────┴────────────────┴──────────────┴──────────────┴──────────────┴────────────────┴──────┴─────────────────┘

As the QQ sees continued usage, the followers' disk space will be consumed while the leader's will remain relatively empty. The followers are 'stuck': they cannot snapshot and truncate their old data away.

Expected behavior

Leader and follower snapshot indices are not necessarily identical, but if the leader and followers have similar last applied indices, the snapshot indices should be 'within spitting distance' of each other. The followers should not stop taking snapshots while the leader continues taking snapshots.

Additional context

The reproduction relies on unstable map ordering. The iteration order of an Erlang map is not defined and can vary across OTP releases or even across nodes. (TODO: information about reproducing this would be valuable.) In the reproduction code in … The next step of … Specifically, what can happen here is that when followers handle the …
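
For reference, the shuffle itself is unremarkable; what matters is where it is applied, and that call site is not shown here. A hypothetical helper of the kind such a patch can use (plain Erlang, not the actual patch) just produces a random but otherwise valid reordering of whatever list it is given:

> Shuffle = fun(L) -> [X || {_, X} <- lists:sort([{rand:uniform(), X} || X <- L])] end.
> Shuffle(lists:seq(1, 5)).   %% a random permutation of [1,2,3,4,5]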
-
This is fixed by #13971, which was released in v4.1.1. The reproduction relies on unstable map ordering, which the PR addresses.
-
I say that this usage pattern is very specific because it relies on multiple … It might also technically be possible to reproduce this with …
-
One quick remediation for this is to send a … Another way to remediate is to use …

Also, to avoid the issue, you can set a delivery limit on the QQ. This is not a remediation (it doesn't fix followers that are already lagging behind), but it prevents the problem since it avoids the …
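
For example (the policy name and the limit value here are illustrative; the command mirrors the unlimited-delivery-limit policy from the reproduction steps):

$ rabbitmqctl -n rabbit-1 set_policy qq-delivery-limit '^qq$' '{"delivery-limit": 20}' --priority 123 --apply-to "quorum_queues"

If I remember correctly, 20 is also the default delivery limit that 4.x applies when nothing overrides it.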
-
I'm not satisfied with the shuffle part of the reproduction steps. I'll be looking into the large map implementation upstream and trying to figure out where natural changes in ordering can come from. From what I've seen so far, it sounds like collision nodes in the HAMT are expected to be rare - maybe that's a lead.
-
@the-mikedavis #13971 is not safe to backport to …
-
Actually I was wrong here ☝️, both about the delivery-limit and the purge part.

When a channel with 33 or more consumers closes, the QQ returns the messages in a different order on all replicas. … basic.get it will return basic.get_empty, but the followers will check out their extra messages. Those checkouts will sit there stuck in the followers since positive or negative acknowledgement…
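
The 33-consumer threshold lines up with how Erlang stores maps: with up to 32 keys a map is a flatmap whose keys are kept (and iterated) in term order, while at 33 keys it switches to a hash-based representation whose iteration order follows the internal hashing. A minimal illustration in a plain Erlang shell (this is OTP behaviour, not RabbitMQ code):

> Small = maps:from_list([{N, N} || N <- lists:seq(1, 32)]).
> Large = Small#{33 => 33}.
> maps:keys(Small) =:= lists:seq(1, 32).   %% true: flatmap keys come out in term order
> maps:keys(Large) =:= lists:seq(1, 33).   %% usually false: the larger map iterates in hash order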