

kernel-patches-daemon-bpf-rc[bot]

Pull request for series with
subject: Add overwrite mode for bpf ring buffer
version: 2
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=999415

@kernel-patches-daemon-bpf-rc

Upstream branch: 9621eb6
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=999415
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: e12873e
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=999415
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: 93a83d0
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=999415
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: 60ef541
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=999415
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: f859813
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=999415
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: 5d87e96
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=999415
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: a578b54
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=999415
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: 6798668
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=999415
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: fd2e081
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=999415
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: 32d3766
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=999415
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: f7528e4
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=999415
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: 61ee2cc
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=999415
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: 3ae4c52
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=999415
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: b13448d
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=999415
version: 2

Xu Kuohai added 3 commits September 15, 2025 13:09
When the bpf ring buffer is full, new events cannot be recorded until
the consumer consumes some events to free space. This may cause critical
events to be discarded, for example in fault diagnostics, where recent
events are more critical than older ones.

So add an overwrite mode for the bpf ring buffer. In this mode, a new event
overwrites the oldest event when the buffer is full.

The scheme is as follows:

1. producer_pos tracks the next position to write new data. When there
   is enough free space, the producer simply moves producer_pos forward to
   make space for the new event.

2. To avoid waiting for the consumer to free space when the buffer is full,
   a new variable, overwrite_pos, is introduced for the producer. overwrite_pos
   tracks the next event to be overwritten (the oldest committed event) in
   the buffer. The producer moves it forward to discard the oldest events when
   the buffer is full.

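3. pending_pos tracks the oldest event that is still being written (not yet
   committed). When making space for new events, the producer ensures that
   producer_pos never moves more than one buffer size ahead of pending_pos,
   i.e. it never wraps around onto data still being written. So multiple
   producers never write to the same position at the same time.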

4. The producer wakes up the consumer every time it gets half a round ahead,
   to give the consumer a chance to retrieve data. However, for an
   overwrite-mode ring buffer, users typically only care about the ring buffer
   snapshot taken before a fault occurs. In this case, the producer should
   commit data with the BPF_RB_NO_WAKEUP flag to avoid unnecessary wakeups.

To make this clearer, here are some example diagrams.

1. Suppose we have a ring buffer of size 4096.

    At first, {producer,overwrite,pending,consumer}_pos are all set to 0.

    0       512      1024    1536     2048     2560     3072     3584       4096
    +-----------------------------------------------------------------------+
    |                                                                       |
    |                                                                       |
    |                                                                       |
    +-----------------------------------------------------------------------+
    ^
    |
    |
producer_pos = 0
overwrite_pos = 0
pending_pos = 0
consumer_pos = 0

2. Reserve event A, size 512.

    There is enough free space, so A is allocated at offset 0 and producer_pos
    is moved to 512, the end of A. Since A has not been submitted yet, its BUSY
    bit is set.

    0       512      1024    1536     2048     2560     3072     3584       4096
    +-----------------------------------------------------------------------+
    |        |                                                              |
    |   A    |                                                              |
    | [BUSY] |                                                              |
    +-----------------------------------------------------------------------+
    ^        ^
    |        |
    |        |
    |    producer_pos = 512
    |
overwrite_pos = 0
pending_pos = 0
consumer_pos = 0

3. Reserve event B, size 1024.

    B is allocated at offset 512 with BUSY bit set, and producer_pos is moved
    to the end of B.

    0       512      1024    1536     2048     2560     3072     3584       4096
    +-----------------------------------------------------------------------+
    |        |                 |                                            |
    |   A    |        B        |                                            |
    | [BUSY] |      [BUSY]     |                                            |
    +-----------------------------------------------------------------------+
    ^                          ^
    |                          |
    |                          |
    |                   producer_pos = 1536
    |
overwrite_pos = 0
pending_pos = 0
consumer_pos = 0

4. Reserve event C, size 2048.

    C is allocated at offset 1536 and producer_pos becomes 3584.

    0       512      1024    1536     2048     2560     3072     3584       4096
    +-----------------------------------------------------------------------+
    |        |                 |                                   |        |
    |    A   |        B        |                 C                 |        |
    | [BUSY] |      [BUSY]     |               [BUSY]              |        |
    +-----------------------------------------------------------------------+
    ^                                                              ^
    |                                                              |
    |                                                              |
    |                                                    producer_pos = 3584
    |
overwrite_pos = 0
pending_pos = 0
consumer_pos = 0

5. Submit event A.

    The BUSY bit of A is cleared. B is now the oldest event still being
    written, so pending_pos is moved to 512, the start of B.

    0       512      1024    1536     2048     2560     3072     3584       4096
    +-----------------------------------------------------------------------+
    |        |                 |                                   |        |
    |    A   |        B        |                 C                 |        |
    |        |      [BUSY]     |               [BUSY]              |        |
    +-----------------------------------------------------------------------+
    ^        ^                                                     ^
    |        |                                                     |
    |        |                                                     |
    |   pending_pos = 512                                  producer_pos = 3584
    |
overwrite_pos = 0
consumer_pos = 0

6. Submit event B.

    The BUSY bit of B is cleared, and pending_pos is moved to the start of C,
    which is now the oldest event still being written.

    0       512      1024    1536     2048     2560     3072     3584       4096
    +-----------------------------------------------------------------------+
    |        |                 |                                   |        |
    |    A   |        B        |                 C                 |        |
    |        |                 |               [BUSY]              |        |
    +-----------------------------------------------------------------------+
    ^                          ^                                   ^
    |                          |                                   |
    |                          |                                   |
    |                     pending_pos = 1536               producer_pos = 3584
    |
overwrite_pos = 0
consumer_pos = 0

7. Reserve event D, size 1536 (3 * 512).

    There are 2048 bytes not being written between producer_pos and
    pending_pos, so D is allocated at offset 3584, wrapping around the end of
    the buffer, and producer_pos is moved from 3584 to 5120.

    Since event D overwrites all bytes of event A and the first 512 bytes of
    event B, overwrite_pos is moved to the start of event C, the oldest event
    that is not even partially overwritten.

    0       512      1024    1536     2048     2560     3072     3584       4096
    +-----------------------------------------------------------------------+
    |                 |        |                                   |        |
    |      D End      |        |                 C                 | D Begin|
    |      [BUSY]     |        |               [BUSY]              | [BUSY] |
    +-----------------------------------------------------------------------+
    ^                 ^        ^
    |                 |        |
    |                 |   pending_pos = 1536
    |                 |   overwrite_pos = 1536
    |                 |
    |             producer_pos=5120
    |
consumer_pos = 0

8. Reserve event E, size 1024.

    Although there are 512 bytes not being written between producer_pos and
    pending_pos, E cannot be reserved, as it would overwrite the first 512
    bytes of event C, which is still being written.
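
Reading rule 3 as a single comparison (a condensed reading, not a quote of the patch), a reservation of len bytes is allowed only if (producer_pos + len) - pending_pos <= size. With the numbers from steps 7 and 8:

```latex
\begin{aligned}
\text{D (step 7):}\quad & (3584 + 1536) - 1536 = 3584 \le 4096 &&\Rightarrow \text{reserved} \\
\text{E (step 8):}\quad & (5120 + 1024) - 1536 = 4608 > 4096 &&\Rightarrow \text{rejected}
\end{aligned}
```

The 512-byte excess, 4608 - 4096, is exactly the part of C that E would overwrite.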

9. Submit events C and D.

    pending_pos is moved to the end of D.

    0       512      1024    1536     2048     2560     3072     3584       4096
    +-----------------------------------------------------------------------+
    |                 |        |                                   |        |
    |      D End      |        |                 C                 | D Begin|
    |                 |        |                                   |        |
    +-----------------------------------------------------------------------+
    ^                 ^        ^
    |                 |        |
    |                 |   overwrite_pos = 1536
    |                 |
    |             producer_pos=5120
    |             pending_pos=5120
    |
consumer_pos = 0

The performance data for overwrite mode will be provided in a follow-up
patch that adds overwrite mode benchmarks.

Below is a sample of performance data for non-overwrite mode on x86_64 and
arm64 CPUs, before and after this patch. No obvious performance regression
occurs: every data point stays within 5% of its baseline, and the largest
drop is about 2.6% (arm64, nr_prod 36).

- x86_64 (AMD EPYC 9654)

Before:

Ringbuf, multi-producer contention
==================================
  rb-libbpf nr_prod 1  13.218 ± 0.039M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 2  15.684 ± 0.015M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 3  7.771 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 4  6.281 ± 0.001M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 8  2.842 ± 0.003M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 12 2.001 ± 0.004M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 16 1.833 ± 0.003M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 20 1.508 ± 0.003M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 24 1.421 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 28 1.309 ± 0.001M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 32 1.265 ± 0.003M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 36 1.198 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 40 1.174 ± 0.001M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 44 1.113 ± 0.003M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 48 1.097 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 52 1.070 ± 0.002M/s (drops 0.000 ± 0.000M/s)

After:

Ringbuf, multi-producer contention
==================================
  rb-libbpf nr_prod 1  13.751 ± 0.673M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 2  15.592 ± 0.008M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 3  7.776 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 4  6.463 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 8  2.883 ± 0.003M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 12 2.017 ± 0.003M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 16 1.816 ± 0.004M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 20 1.512 ± 0.003M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 24 1.396 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 28 1.303 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 32 1.267 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 36 1.210 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 40 1.181 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 44 1.136 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 48 1.090 ± 0.001M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 52 1.091 ± 0.002M/s (drops 0.000 ± 0.000M/s)

- arm64 (HiSilicon Kunpeng 920)

Before:

  Ringbuf, multi-producer contention
  ==================================
  rb-libbpf nr_prod 1  11.602 ± 0.423M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 2  9.599 ± 0.007M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 3  6.669 ± 0.008M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 4  4.806 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 8  3.856 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 12 3.368 ± 0.003M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 16 3.210 ± 0.007M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 20 3.003 ± 0.007M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 24 2.944 ± 0.007M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 28 2.863 ± 0.008M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 32 2.819 ± 0.007M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 36 2.887 ± 0.008M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 40 2.837 ± 0.008M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 44 2.787 ± 0.012M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 48 2.738 ± 0.010M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 52 2.700 ± 0.007M/s (drops 0.000 ± 0.000M/s)

After:

  Ringbuf, multi-producer contention
  ==================================
  rb-libbpf nr_prod 1  11.614 ± 0.268M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 2  9.917 ± 0.007M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 3  6.920 ± 0.008M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 4  4.803 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 8  3.898 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 12 3.426 ± 0.008M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 16 3.320 ± 0.008M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 20 3.029 ± 0.013M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 24 3.068 ± 0.012M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 28 2.890 ± 0.009M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 32 2.950 ± 0.012M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 36 2.812 ± 0.006M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 40 2.834 ± 0.009M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 44 2.803 ± 0.010M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 48 2.766 ± 0.010M/s (drops 0.000 ± 0.000M/s)
  rb-libbpf nr_prod 52 2.754 ± 0.009M/s (drops 0.000 ± 0.000M/s)

Signed-off-by: Xu Kuohai <[email protected]>

Add a test for the overwrite-mode ring buffer. The test creates a bpf ring
buffer in overwrite mode, then repeatedly reserves and commits data to
check that the ring buffer works as expected both before and after
overwriting happens.

Signed-off-by: Xu Kuohai <[email protected]>

Add an rb-prod benchmark for the bpf ring buffer to measure producer
performance without a consumer thread, and add an --rb-overwrite option
to benchmark the ring buffer in overwrite mode.

For reference, below are bench numbers collected from x86_64 and
arm64 CPUs.

- AMD EPYC 9654 (x86_64)

  Ringbuf, overwrite mode with multi-producer contention, no consumer
  ===================================================================
  rb-prod nr_prod 1    32.295 ± 0.004M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 2    9.591 ± 0.003M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 3    8.895 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 4    9.206 ± 0.003M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 8    9.220 ± 0.002M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 12   4.595 ± 0.022M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 16   4.348 ± 0.016M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 20   3.957 ± 0.017M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 24   3.787 ± 0.014M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 28   3.603 ± 0.011M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 32   3.707 ± 0.011M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 36   3.562 ± 0.012M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 40   3.616 ± 0.012M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 44   3.598 ± 0.016M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 48   3.555 ± 0.014M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 52   3.463 ± 0.020M/s (drops 0.000 ± 0.000M/s)

- HiSilicon Kunpeng 920 (arm64)

  Ringbuf, overwrite mode with multi-producer contention, no consumer
  ===================================================================
  rb-prod nr_prod 1    14.687 ± 0.058M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 2    22.263 ± 0.007M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 3    5.736 ± 0.003M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 4    4.934 ± 0.001M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 8    4.661 ± 0.001M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 12   3.753 ± 0.013M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 16   3.706 ± 0.018M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 20   3.660 ± 0.015M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 24   3.610 ± 0.016M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 28   3.238 ± 0.010M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 32   3.270 ± 0.018M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 36   2.892 ± 0.021M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 40   2.995 ± 0.018M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 44   2.830 ± 0.019M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 48   2.877 ± 0.015M/s (drops 0.000 ± 0.000M/s)
  rb-prod nr_prod 52   2.814 ± 0.015M/s (drops 0.000 ± 0.000M/s)

Signed-off-by: Xu Kuohai <[email protected]>