
bpf: misc performance improvements for cgroup hooks #597


Closed
kernel-patches-bot wants to merge 10 commits

Conversation

kernel-patches-bot

Pull request for series with
subject: bpf: try to avoid kzalloc in cgroup/{s,g}etsockopt
version: 2
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=408925

anakryiko and others added 10 commits December 24, 2020 02:11
Add BPF_CORE_READ_USER(), BPF_CORE_READ_USER_STR() and their _INTO()
variations to allow reading CO-RE-relocatable kernel data structures from
user space. One such case is reading input arguments of syscalls, while
reaping the benefits of CO-RE relocations w.r.t. handling 32/64-bit
conversions and missing/new fields in UAPI data structs.
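For illustration, here is a minimal sketch (not part of the patch itself) of
how BPF_CORE_READ_USER() can follow a user-space pointer from a syscall
tracepoint; the program and field choices are just an example:

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_core_read.h>

    SEC("tracepoint/syscalls/sys_enter_nanosleep")
    int handle_nanosleep(struct trace_event_raw_sys_enter *ctx)
    {
            /* args[0] is a user-space pointer to the requested timespec */
            struct __kernel_timespec *req = (void *)ctx->args[0];

            /* The field offset is CO-RE-relocated, while the read itself
             * goes through user memory (bpf_probe_read_user() under the
             * hood). */
            __s64 sec = BPF_CORE_READ_USER(req, tv_sec);

            bpf_printk("nanosleep for %lld sec", sec);
            return 0;
    }

    char LICENSE[] SEC("license") = "GPL";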

Suggested-by: Gilad Reti <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
BPF_CORE_READ(), in addition to handling CO-RE relocations, also allows a much
nicer way to read data structures with nested pointers. Instead of writing
a sequence of bpf_probe_read() calls to follow links, one can just write
BPF_CORE_READ(a, b, c, d) to effectively do an a->b->c->d read. This is a welcome
ability when porting BCC code, which (in most cases) allows exactly this
intuitive a->b->c->d variant.

This patch adds non-CO-RE variants of the BPF_CORE_READ() family of macros for
cases where CO-RE is not supported (e.g., old kernels). In such cases, the
property of shortening a sequence of bpf_probe_read()s to a simple
BPF_PROBE_READ(a, b, c, d) invocation is still desirable, especially when
porting BCC code to libbpf. Yet no CO-RE relocations are going to be emitted.
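A hedged sketch of the ported pattern (the program and field choices are
illustrative only, not taken from the patch):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>
    #include <bpf/bpf_core_read.h>

    SEC("kprobe/do_exit")
    int BPF_KPROBE(handle_exit)
    {
            struct task_struct *task = (void *)bpf_get_current_task();

            /* Equivalent to task->real_parent->tgid: the macro lowers the
             * chain to consecutive bpf_probe_read() calls, but no CO-RE
             * relocations are emitted, so field offsets come from the
             * headers used at compile time. */
            pid_t ppid = BPF_PROBE_READ(task, real_parent, tgid);

            bpf_printk("parent tgid: %d", ppid);
            return 0;
    }

    char LICENSE[] SEC("license") = "GPL";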

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]

Add selftests validating that newly added variations of BPF_CORE_READ(), for
use with user-space addresses and for non-CO-RE reads, work as expected.

Signed-off-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
This program does not use argp (which is a glibcism). Instead, include
<errno.h> directly; it was previously pulled in via <argp.h>.

Signed-off-by: Leah Neukirchen <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
The function sockfd_lookup() uses fget() on the value stored in
the file field of the returned structure, so fput() should ultimately
be applied to this value. This can be done directly, but it seems
better to use the dedicated macro sockfd_put(), which does the same
thing.

The cleanup was done using the following semantic patch:
    (http://www.emn.fr/x-info/coccinelle/)

    // <smpl>
    @@
    expression s;
    @@

       s = sockfd_lookup(...)
       ...
    +  sockfd_put(s);
    ?- fput(s->file);
    // </smpl>
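The resulting shape of the code looks like this (illustrative only, not
taken from an actual driver):

    struct socket *sock;
    int err;

    sock = sockfd_lookup(fd, &err);
    if (!sock)
            return err;

    /* ... use sock ... */

    sockfd_put(sock);   /* drops the fget() reference via fput(sock->file) */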

Signed-off-by: Zheng Yongjun <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Introduce the xdp_init_buff utility routine to initialize the xdp_buff fields
that are constant over NAPI iterations (e.g. frame_sz or the rxq pointer).
Rely on xdp_init_buff() in all XDP-capable drivers.
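A minimal sketch of the intended driver pattern (rx_ring and the frame size
are hypothetical driver state):

    struct xdp_buff xdp;

    /* Set once per NAPI poll: frame_sz and the rxq pointer do not
     * change from frame to frame. */
    xdp_init_buff(&xdp, PAGE_SIZE, &rx_ring->xdp_rxq);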

Signed-off-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: John Fastabend <[email protected]>
Acked-by: Shay Agroskin <[email protected]>
Acked-by: Martin Habets <[email protected]>
Acked-by: Camelia Groza <[email protected]>
Acked-by: Marcin Wojtas <[email protected]>
Link: https://lore.kernel.org/bpf/7f8329b6da1434dc2b05a77f2e800b29628a8913.1608670965.git.lorenzo@kernel.org
Introduce the xdp_prepare_buff utility routine to initialize the per-descriptor
xdp_buff fields (e.g. the data/data_end/data_meta pointers). Rely on
xdp_prepare_buff() in all XDP-capable drivers.
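A matching sketch of the per-descriptor part of the RX loop (page, len, and
xdp_prog are hypothetical driver state; xdp was initialized once via
xdp_init_buff() as in the previous commit):

    u32 act;

    /* Point the xdp_buff at the current frame: sets data_hard_start,
     * data, data_end and (optionally) data_meta. */
    xdp_prepare_buff(&xdp, page_address(page), XDP_PACKET_HEADROOM,
                     len, false);

    act = bpf_prog_run_xdp(xdp_prog, &xdp);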

Signed-off-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: John Fastabend <[email protected]>
Acked-by: Shay Agroskin <[email protected]>
Acked-by: Martin Habets <[email protected]>
Acked-by: Camelia Groza <[email protected]>
Acked-by: Marcin Wojtas <[email protected]>
Link: https://lore.kernel.org/bpf/45f46f12295972a97da8ca01990b3e71501e9d89.1608670965.git.lorenzo@kernel.org
When we attach a bpf program to cgroup/getsockopt, any getsockopt()
syscall starts incurring kzalloc/kfree cost. While, in general, that's
not an issue, sometimes it is, as in the case of TCP_ZEROCOPY_RECEIVE.
TCP_ZEROCOPY_RECEIVE (ab)uses the getsockopt system call to implement a
fast path for incoming TCP, and we don't want extra allocations in
there.

Let's add a small buffer on the stack and use it for the small (majority of)
{s,g}etsockopt values. I've started with 128 bytes to cover
the options we care about (TCP_ZEROCOPY_RECEIVE, which is 32 bytes
currently, with a planned extension to 64).

It seems natural to do the same for setsockopt, but it's a bit more
involved when the BPF program modifies the data (where we have to
kmalloc). The assumption is that for the majority of setsockopt
calls (which handle pure BPF options or apply policy) this
will bring some benefit as well.
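In code, the idea looks roughly like this (simplified sketch; helper and
struct names may differ from the actual patch):

    #define SOCKOPT_STACK_BUF_SIZE 128

    struct sockopt_buf {
            u8 data[SOCKOPT_STACK_BUF_SIZE];
    };

    static void *sockopt_alloc_buf(int max_optlen, struct sockopt_buf *buf)
    {
            if (max_optlen <= SOCKOPT_STACK_BUF_SIZE)
                    return buf->data;       /* fast path: no allocation */
            return kzalloc(max_optlen, GFP_USER);
    }

    static void sockopt_free_buf(void *optval, struct sockopt_buf *buf)
    {
            if (optval != buf->data)        /* only free the heap fallback */
                    kfree(optval);
    }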

I collected some performance numbers (on a 65k MTU localhost in a VM) using:
$ perf record -g -- ./tcp_mmap -s -z
$ ./tcp_mmap -H ::1 -z
$ ...
$ perf report --symbol-filter=__cgroup_bpf_run_filter_getsockopt

Without this patch:
     4.81%     0.07%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt
            |
             --4.74%--__cgroup_bpf_run_filter_getsockopt
                       |
                       |--1.06%--__kmalloc
                       |
                       |--0.71%--lock_sock_nested
                       |
                       |--0.62%--__might_fault
                       |
                        --0.52%--release_sock

With the patch applied:
     3.29%     0.07%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt
            |
             --3.22%--__cgroup_bpf_run_filter_getsockopt
                       |
                       |--0.66%--lock_sock_nested
                       |
                       |--0.57%--__might_fault
                       |
                        --0.56%--release_sock

So it saves about 1% of the system call cost. Unfortunately, we still get
2-3% overhead due to another socket lock/unlock :-(

Signed-off-by: Stanislav Fomichev <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
When we attach any cgroup hook, the rest (even if unused/unattached) start
to contribute a small overhead. In particular, the one we want to avoid is
__cgroup_bpf_run_filter_skb, which does two redirections to get to
the cgroup and pushes/pulls the skb.

Let's split cgroup_bpf_enabled to be per attach type, so that only the
attach types in use trigger.

I've dropped some existing high-level cgroup_bpf_enabled checks in some
places because the BPF_PROG_CGROUP_XXX_RUN macros usually have another
cgroup_bpf_enabled check.

I also had to copy-paste BPF_CGROUP_RUN_SA_PROG_LOCK for
GETPEERNAME/GETSOCKNAME because the type for cgroup_bpf_enabled[type]
has to be a constant known at compile time.
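A hedged sketch of the resulting split (simplified; exact names may differ
from the patch):

    /* One static key per attach type instead of a single global one. */
    extern struct static_key_false cgroup_bpf_enabled_key[MAX_BPF_ATTACH_TYPE];

    #define cgroup_bpf_enabled(type) \
            static_branch_unlikely(&cgroup_bpf_enabled_key[type])

    /* Usage in a hook wrapper; `type` must be a compile-time constant,
     * hence the copy-paste for GETPEERNAME/GETSOCKNAME: */
    if (cgroup_bpf_enabled(BPF_CGROUP_GETSOCKOPT))
            /* ... run the getsockopt programs ... */;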

Signed-off-by: Stanislav Fomichev <[email protected]>
Acked-by: Song Liu <[email protected]>
@kernel-patches-bot (Author)

Master branch: 482ec34
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=408925
version: 2

kernel-patches-bot force-pushed the bpf-next branch 2 times, most recently from 17b75d3 to 9a8120a on January 8, 2021 21:57
kernel-patches-bot changed the title from "bpf: try to avoid kzalloc in cgroup/{s,g}etsockopt" to "bpf: misc performance improvements for cgroup hooks" on Mar 17, 2021
@kernel-patches-bot (Author)

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=415587 is irrelevant now. Closing PR.

kernel-patches-bot deleted the series/408925=>bpf-next branch on March 20, 2021 04:13
eddyz87 added a commit to eddyz87/bpf that referenced this pull request Jul 30, 2025
Failing tests:
- kernel-patches#110     fexit_bpf2bpf:FAIL
- kernel-patches#124     for_each:FAIL
- kernel-patches#144     iters:FAIL
- kernel-patches#148     kfree_skb:FAIL
- kernel-patches#161     l4lb_all:FAIL
- kernel-patches#193     map_kptr:FAIL
- kernel-patches#23      bpf_loop:FAIL
- kernel-patches#260     pkt_access:FAIL
- kernel-patches#269     prog_run_opts:FAIL
- kernel-patches#280     rbtree_success:FAIL
- kernel-patches#356     res_spin_lock_failure:FAIL
- kernel-patches#364     setget_sockopt:FAIL
- kernel-patches#381     sock_fields:FAIL
- kernel-patches#394     spin_lock:FAIL
- kernel-patches#395     spin_lock_success:FAIL
- kernel-patches#444     test_bpffs:FAIL
- kernel-patches#453     test_profiler:FAIL
- kernel-patches#479     usdt:FAIL
- kernel-patches#488     verifier_bits_iter:FAIL
- kernel-patches#597     verif_scale_pyperf600:FAIL
- kernel-patches#598     verif_scale_pyperf600_bpf_loop:FAIL
- kernel-patches#599     verif_scale_pyperf600_iter:FAIL
- kernel-patches#608     verif_scale_strobemeta_subprogs:FAIL
- kernel-patches#622     xdp_attach:FAIL
- kernel-patches#637     xdp_noinline:FAIL
- kernel-patches#639     xdp_synproxy:FAIL
- kernel-patches#72      cls_redirect:FAIL
- kernel-patches#88      crypto_sanity:FAIL
- kernel-patches#97      dynptr:FAIL

Signed-off-by: Eduard Zingerman <[email protected]>