Conversation

@the-mikedavis
This is an extension of the free disk space alarm that allows configuring additional mount points to monitor and which queue type(s) to block when they are nearly full. For example, with a config like so:

stream.data_dir = /mnt/data/streams
# Directory where the file system is mounted.
disk_free_limits.stream.mount_point = /mnt/data/streams
# Alarm threshold: if free space falls under this absolute
# limit then an alarm fires per queue type.
disk_free_limits.stream.absolute = 2GB
# Queue types to block when the threshold is breached.
disk_free_limits.stream.queue_types = stream

Publishers to streams would be blocked once the free space of /mnt/data/streams falls under 2GB. Publishers to classic or quorum queues could continue, though.

The motivation for this feature is that you may want to use separate disks for different queue types. For example, you might use volume(s) with better throughput and/or IOPS for streams but standard disks for other queue data. Also, alarms are currently fairly aggressive: they block all publishing. Ideally you should be able to continue using queues when the space you have allocated for streams fills up, or vice versa.

This is a different approach from #14086. Instead of measuring disk usage under a directory like du(1), rabbit_disk_monitor is updated to measure the free space of all mounts at once with disksup:get_disk_info/0. Under the hood this performs the same df(1) check that rabbit_disk_monitor had been doing previously - measuring mount-point free space is much cheaper than measuring a directory's disk footprint. Monitoring mount points is also quite flexible: you can back one mount point with multiple disks via RAID-0 striping or split a single disk up with partitions.
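To sketch the mechanics (this is an illustration, not the code in this branch; the Limits shape and function name are made up):

alarmed_queue_types(Limits) ->
    %% disksup:get_disk_info/0 (OTP 26+) returns one tuple per mount:
    %% {MountPoint, TotalKiB, AvailableKiB, CapacityPercent}
    lists:flatmap(
      fun({Mount, _TotalKiB, AvailKiB, _Capacity}) ->
              case maps:get(Mount, Limits, undefined) of
                  {LimitBytes, QueueTypes} when AvailKiB * 1024 < LimitBytes ->
                      %% free space fell under the absolute limit:
                      %% alarm these queue types
                      QueueTypes;
                  _ ->
                      []
              end
      end, disksup:get_disk_info()).

With Limits = #{"/mnt/data/streams" => {2_000_000_000, [stream]}}, this would return [stream] once that mount has less than 2GB free.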

This is a draft - it needs tests, and currently only AMQP 0-9-1 is updated to perform selective blocking. All other protocols currently block for any alarm.

Some of the commits in this branch are refactors that could be cherry-picked out. #14814 is pretty trivial, and the refactors to use maps instead of dict in rabbit_alarm and to use disksup instead of the custom df code in rabbit_disk_monitor are not strictly related to the feature here.

Discussed in #14590

This is the same as the `raft.data_dir` option but for Osiris' data
directory. Configuring this in Cuttlefish is nicer than the existing
`$RABBITMQ_STREAM_DIR` environment variable way of changing the dir.

This is not a functional change, just a refactor to eliminate dicts and
use maps instead. This cleans up some helper functions like
dict_append/3, and we can use map comprehensions in some places to
avoid intermediate lists.
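Roughly (the helper name is from this branch; the rewrite below is just an illustration): dict_append/3 collapses to a maps:update_with/4 call, and map comprehensions (OTP 26+) replace fold-and-rebuild patterns:

%% append a value to the list stored under Key, starting a new
%% list if the key is absent
dict_append(Key, Val, Map) ->
    maps:update_with(Key, fun(Vs) -> [Val | Vs] end, [Val], Map).

%% transform every value without an intermediate to_list/from_list:
%% #{K => transform(V) || K := V <- Map}
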
Previously we set `start_disksup` to `false` to avoid OTP's automatic
monitoring of disk space. `disksup`'s gen_server starts a port (which
runs `df` on Unix) to measure disk usage, and sets an alarm through
OTP's `alarm_handler` when usage exceeds the configured
`disk_almost_full_threshold`. We can set this threshold to 1.0 to
effectively turn off disksup's monitoring (i.e. the alarm will never be
set).

By enabling disksup we have access to `get_disk_data/0` and
`get_disk_info/0,1` which can be used to replace the copied versions in
`rabbit_disk_monitor`.
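The idea, roughly (not necessarily the exact code in this branch):

%% keep disksup running for its polling, but make its own
%% alarm effectively unreachable
ok = application:set_env(os_mon, disk_almost_full_threshold, 1.0),
%% then rabbit_disk_monitor can lean on:
%%   disksup:get_disk_data/0
%%   disksup:get_disk_info/0,1
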
`disksup` now exposes the calculation of available disk space for a
given path using the same `df` mechanism on Unix. We can use this
directly and drop the custom code that reimplements it.
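For example (path made up), getting the free space of the mount backing a directory becomes:

%% one entry for the mount point that backs the given path
[{_MountPoint, _TotalKiB, AvailableKiB, _Capacity}] =
    disksup:get_disk_info("/mnt/data/streams"),
AvailableBytes = AvailableKiB * 1024.
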
This introduces a new variant of `rabbit_alarm:resource_alarm_source()`:
`{disk, QueueType}`, which triggers when the configured mount point for
a queue type falls under its limit of available space.
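Sketched as a type (the spelling here is illustrative, with disk and memory being the existing variants):

-type resource_alarm_source() :: disk | memory | {disk, QueueType :: atom()}.
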
This covers both network and direct connections for 0-9-1. We store a
set of the queue types that have been published to at both the channel
and connection level, since blocking is done at the connection level but
only the channel knows which queue types have been published to.

Then, when the set of published queue types or the set of alarms changes,
the connection evaluates whether it is affected by an alarm. If not, it
may keep publishing, but once a channel publishes to an alarmed queue
type the connection blocks until the channel exits or the alarm clears.
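The connection-side check could look roughly like this (names invented for illustration):

%% PublishedQueueTypes: set accumulated from the channels' casts
%% Alarms: the resource alarms currently in effect
blocked_by_alarms(PublishedQueueTypes, Alarms) ->
    lists:any(fun({disk, QType}) ->
                      %% a per-queue-type disk alarm only blocks
                      %% connections that published to that type
                      sets:is_element(QType, PublishedQueueTypes);
                 (_MemoryOrPlainDisk) ->
                      %% all other resource alarms block everyone
                      true
              end, Alarms).
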
        false ->
            {noreply, State1}
    end;
handle_cast({channel_published_to_queue_type, _ChPid, QT},
@the-mikedavis
This feature might need a feature flag. Here, for direct connections, if old client code is used on a newer server then it would error after publishing, since it isn't expecting this cast. I think it would be unlikely to happen in practice, but the mixed-version test suite will probably run into this.

@samuelmasse

What would the config setup be for having a main disk that contains quorum and classic queues and a secondary disk that contains streams? Would we specify the same mount point for quorum and classic, with each defining queue_types as quorum and classic respectively? Would that result in a common alarm for both or two alarms looking at the same thing?

@the-mikedavis commented Oct 24, 2025

Ah yeah, in that scenario you could have a config like so:

disk_free_limits.streaming.mount_point = /mnt/data/streams
disk_free_limits.streaming.absolute = 2GB
disk_free_limits.streaming.queue_types = stream

disk_free_limits.messaging.mount_point = /mnt/data/queues
disk_free_limits.messaging.absolute = 2GB
disk_free_limits.messaging.queue_types = classic,quorum

And if /mnt/data/queues fell under its configured limit it would set two alarms (disk for classic and disk for quorum queue types) but wouldn't affect streams.

@samuelmasse

Ah, I see, thanks! So if I understand correctly, the name in disk_free_limits.[name].mount_point can be anything we want to set it to; it doesn't have to be the name of a queue type. So I could set disk_free_limits.bob.mount_point, for example.

Taking that thought further, what would the process of adding that "bob" disk alarm to an existing broker look like? If node A thinks the "bob" alarm exists but node B doesn't, can there be issues that come from the disagreement? When node A restarts with the new configuration, do all nodes now know of the "bob" alarm, or just node A until all other nodes also restart?

Also, after I add my new "bob" disk alarm, what ways do I have as a user to monitor it: to see whether it's currently alarming, what value it is configured to, and how close it's getting to the alarm point? For MQ's use case we currently get this information from the /api/nodes endpoint using the disk_free_limit, disk_free and disk_free_alarm values. Are we thinking about adding an alarm-name map to the output of this API to give those values for each disk alarm? Then I would maybe access disk_free_map.bob.disk_free to know the status of my "bob" disk alarm.

Lastly, for the RabbitMQ console we currently have a column named "Disk space" that displays the information for the (as of now) only disk alarm. When I add this "bob" disk alarm, would we want to dynamically add a new column to that table named something like "Disk space (bob)"? In that case, would we also support defining the ordering of those columns? For example, if we consider it most relevant to display the disk alarm for quorum queues on the left, then streams, then classic, and at the end the disk alarm for non-queue storage, would we be able to define that order manually in some console config?
