scripts/pahole-flags.sh: Parse DWARF and generate BTF with multithreading. #31
Master branch: 1b8c924
Master branch: b38101c
78624f8 to 42c9b07
Master branch: b75daca
42c9b07 to e9ba484
Master branch: d24d2a2
e9ba484 to 7f26e6f
Master branch: 086d490
7f26e6f to db245f6
Master branch: 9087c6f
…ding.

Pass a "-j" argument to pahole, if the installed version supports it, to reduce the time spent generating BTF info. Since v1.22, pahole can parse DWARF and generate BTF with multiple threads to speed up the conversion. This reduces the overall kernel build time by several seconds.

v3 fixes whitespace and improves the commit description.
v2 checks the pahole version so that multithreading is enabled only when it is supported.

[v2] https://lore.kernel.org/bpf/[email protected]/
[v1] https://lore.kernel.org/bpf/[email protected]/

Signed-off-by: Kui-Feng Lee <[email protected]>
Acked-by: Yonghong Song <[email protected]>
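The actual change lives in the shell script scripts/pahole-flags.sh. The C sketch below only mirrors the version gate described above: pass "-j" for pahole v1.22 and newer, and nothing otherwise. The helper names pahole_version_code() and pahole_jobs_flag() are hypothetical, and encoding a version as major * 100 + minor is an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn a version string such as "v1.22" into
 * major * 100 + minor (e.g. 122). Returns -1 if the string does not parse. */
static int pahole_version_code(const char *s)
{
	int major, minor;

	if (sscanf(s, "v%d.%d", &major, &minor) != 2)
		return -1;
	return major * 100 + minor;
}

/* Hypothetical helper: return the extra flag to pass to pahole.
 * Multithreaded DWARF parsing / BTF generation needs v1.22 or newer,
 * so older (or unparsable) versions get no extra flag. */
static const char *pahole_jobs_flag(const char *version)
{
	return pahole_version_code(version) >= 122 ? "-j" : "";
}
```

Gating on the version, as v2 of the patch does, keeps builds working with older pahole releases that would not accept the extra argument.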
db245f6 to 98c9362
At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=615515 expired. Closing PR.
The crash below can be encountered when using xdpsock in rx mode for legacy rq: the buffer gets released in the XDP_REDIRECT path, and then once again in the driver. This fix sets the flag to avoid releasing on the driver side.

XSK handling of buffers for legacy rq relied on the caller to set the skip release flag. However, the referenced fix started using fragment counts for pages instead of the skip flag.

Crash log:
general protection fault, probably for non-canonical address 0xffff8881217e3a: 0000 [#1] SMP
CPU: 0 PID: 14 Comm: ksoftirqd/0 Not tainted 6.5.0-rc1+ #31
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:bpf_prog_03b13f331978c78c+0xf/0x28
Code: ...
RSP: 0018:ffff88810082fc98 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888138404901 RCX: c0ffffc900027cbc
RDX: ffffffffa000b514 RSI: 00ffff8881217e32 RDI: ffff888138404901
RBP: ffff88810082fc98 R08: 0000000000091100 R09: 0000000000000006
R10: 0000000000000800 R11: 0000000000000800 R12: ffffc9000027a000
R13: ffff8881217e2dc0 R14: ffff8881217e2910 R15: ffff8881217e2f00
FS: 0000000000000000(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000564cb2e2cde0 CR3: 000000010e603004 CR4: 0000000000370eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? die_addr+0x32/0x80
 ? exc_general_protection+0x192/0x390
 ? asm_exc_general_protection+0x22/0x30
 ? 0xffffffffa000b514
 ? bpf_prog_03b13f331978c78c+0xf/0x28
 mlx5e_xdp_handle+0x48/0x670 [mlx5_core]
 ? dev_gro_receive+0x3b5/0x6e0
 mlx5e_xsk_skb_from_cqe_linear+0x6e/0x90 [mlx5_core]
 mlx5e_handle_rx_cqe+0x55/0x100 [mlx5_core]
 mlx5e_poll_rx_cq+0x87/0x6e0 [mlx5_core]
 mlx5e_napi_poll+0x45e/0x6b0 [mlx5_core]
 __napi_poll+0x25/0x1a0
 net_rx_action+0x28a/0x300
 __do_softirq+0xcd/0x279
 ? sort_range+0x20/0x20
 run_ksoftirqd+0x1a/0x20
 smpboot_thread_fn+0xa2/0x130
 kthread+0xc9/0xf0
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>
Modules linked in: mlx5_ib mlx5_core rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter overlay zram zsmalloc fuse [last unloaded: mlx5_core]
---[ end trace 0000000000000000 ]---

Fixes: 7abd955 ("net/mlx5e: RX, Fix page_pool page fragment tracking for XDP")
Signed-off-by: Dragos Tatulea <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
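The double release, and how a skip-release flag prevents it, can be illustrated with a small C model. Everything here is hypothetical: struct rx_buf, release_buf(), xdp_redirect_path() and driver_cleanup() are illustrative names, not mlx5e code, and a plain reference count stands in for the page fragment accounting the driver actually uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an RX buffer (not the mlx5e data structures). */
struct rx_buf {
	int refs;          /* outstanding references; must never drop below 0 */
	bool skip_release; /* set once another path has already released it */
};

static void release_buf(struct rx_buf *b)
{
	assert(b->refs > 0); /* a second release here would be a double free */
	b->refs--;
}

/* XDP_REDIRECT path: the buffer is released here, so mark it to stop
 * the driver from releasing it a second time. */
static void xdp_redirect_path(struct rx_buf *b)
{
	release_buf(b);
	b->skip_release = true;
}

/* Driver cleanup path: only release buffers no other path has released. */
static void driver_cleanup(struct rx_buf *b)
{
	if (!b->skip_release)
		release_buf(b);
}
```

Without the flag, driver_cleanup() would call release_buf() again after the redirect path, which is the kind of double release the crash log above reflects.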
The MCAM register reports the management features supported by the device. Querying this register shows whether features are supported by the current firmware version on the current ASIC, so the driver can choose between different implementations dynamically.

The MCAM register supports querying whether the MCIA register supports 128-byte payloads or only 48-byte payloads. Add support for the register in preparation for allowing larger MCIA transactions.

Note that access to the bits in the field 'mng_feature_cap_mask' differs from other mask fields in other registers. In most cases, bit #0 is the first bit in the last dword; in the MCAM register, bits #0-#31 are in the first dword, and so on. Declare the mask field using per-dword bit arrays to simplify the access.

Signed-off-by: Amit Cohen <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Link: https://lore.kernel.org/r/1427a3f57ba93db1c5dd4f982bfb31dd5c82356e.1690281940.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <[email protected]>
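The difference between the two bit layouts can be sketched in C. This is not the mlxsw register-access code: MASK_DWORDS (a 128-bit mask), the helper names, and numbering bits within a dword from the least significant bit are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MASK_DWORDS 4 /* hypothetical: a 128-bit capability mask */

/* Common layout: bit #0 lives in the LAST dword of the mask, so the
 * dword index shrinks as the bit number grows. */
static int common_layout_test_bit(const uint32_t mask[MASK_DWORDS], unsigned int bit)
{
	assert(bit < MASK_DWORDS * 32);
	unsigned int dword = MASK_DWORDS - 1 - bit / 32;
	return (mask[dword] >> (bit % 32)) & 1;
}

/* MCAM-style layout: bits #0-#31 live in the FIRST dword, #32-#63 in
 * the second, and so on, so the dword index grows with the bit number. */
static int mcam_layout_test_bit(const uint32_t mask[MASK_DWORDS], unsigned int bit)
{
	assert(bit < MASK_DWORDS * 32);
	unsigned int dword = bit / 32;
	return (mask[dword] >> (bit % 32)) & 1;
}
```

Declaring one bit array per dword, as the commit does, lets each accessor index the dword directly instead of applying the reversed-dword arithmetic of the common layout.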
Use 32-bit subranges to prune some 64-bit BPF_JEQ/BPF_JNE conditions that would otherwise be "inconclusive" (i.e., is_branch_taken() would return -1). This can happen, for example, when registers are initialized as 64-bit u64/s64, then compared for inequality as 32-bit subregisters, and then followed by a 64-bit equality/inequality check. That 32-bit inequality can establish some pattern for the lower 32 bits of a register (e.g., an s< 0 condition determines whether bit #31 is zero or not), while the overall 64-bit value could be anything (according to its value range representation).

This is not a quirky special case; this handling is necessary to prevent a correctness issue in the BPF verifier's range tracking: set_range_min_max() assumes that register ranges are non-overlapping, and if that condition is not guaranteed by is_branch_taken(), we can end up with invalid ranges, where min > max.

[0] https://lore.kernel.org/bpf/CACkBjsY2q1_fUohD7hRmKGqv1MV=eP2f6XK8kjkYNw7BaiF8iQ@mail.gmail.com/

Signed-off-by: Andrii Nakryiko <[email protected]>
Use 32-bit subranges to prune some 64-bit BPF_JEQ/BPF_JNE conditions that would otherwise be "inconclusive" (i.e., is_branch_taken() would return -1). This can happen, for example, when registers are initialized as 64-bit u64/s64, then compared for inequality as 32-bit subregisters, and then followed by a 64-bit equality/inequality check. That 32-bit inequality can establish some pattern for the lower 32 bits of a register (e.g., an s< 0 condition determines whether bit #31 is zero or not), while the overall 64-bit value could be anything (according to its value range representation).

This is not a quirky special case; this handling is necessary to prevent a correctness issue in the BPF verifier's range tracking: set_range_min_max() assumes that register ranges are non-overlapping, and if that condition is not guaranteed by is_branch_taken(), we can end up with invalid ranges, where min > max.

[0] https://lore.kernel.org/bpf/CACkBjsY2q1_fUohD7hRmKGqv1MV=eP2f6XK8kjkYNw7BaiF8iQ@mail.gmail.com/

Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Use 32-bit subranges to prune some 64-bit BPF_JEQ/BPF_JNE conditions that would otherwise be "inconclusive" (i.e., is_branch_taken() would return -1). This can happen, for example, when registers are initialized as 64-bit u64/s64, then compared for inequality as 32-bit subregisters, and then followed by a 64-bit equality/inequality check. That 32-bit inequality can establish some pattern for the lower 32 bits of a register (e.g., an s< 0 condition determines whether bit #31 is zero or not), while the overall 64-bit value could be anything (according to its value range representation).

This is not a quirky special case; this handling is necessary to prevent a correctness issue in the BPF verifier's range tracking: set_range_min_max() assumes that register ranges are non-overlapping, and if that condition is not guaranteed by is_branch_taken(), we can end up with invalid ranges, where min > max.

[0] https://lore.kernel.org/bpf/CACkBjsY2q1_fUohD7hRmKGqv1MV=eP2f6XK8kjkYNw7BaiF8iQ@mail.gmail.com/

Acked-by: Shung-Hsi Yu <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Use 32-bit subranges to prune some 64-bit BPF_JEQ/BPF_JNE conditions that would otherwise be "inconclusive" (i.e., is_branch_taken() would return -1). This can happen, for example, when registers are initialized as 64-bit u64/s64, then compared for inequality as 32-bit subregisters, and then followed by a 64-bit equality/inequality check. That 32-bit inequality can establish some pattern for the lower 32 bits of a register (e.g., an s< 0 condition determines whether bit #31 is zero or not), while the overall 64-bit value could be anything (according to its value range representation).

This is not a quirky special case; this handling is necessary to prevent a correctness issue in the BPF verifier's range tracking: set_range_min_max() assumes that register ranges are non-overlapping, and if that condition is not guaranteed by is_branch_taken(), we can end up with invalid ranges, where min > max.

[0] https://lore.kernel.org/bpf/CACkBjsY2q1_fUohD7hRmKGqv1MV=eP2f6XK8kjkYNw7BaiF8iQ@mail.gmail.com/

Acked-by: Shung-Hsi Yu <[email protected]>
Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
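The pruning idea can be sketched in C for the simple case of a register compared against a 64-bit constant. This is a hypothetical model: struct subreg_bounds and subreg_decides_eq() are illustrative names, not the verifier's bpf_reg_state or is_branch_taken() code. It only reuses the 1 / 0 / -1 (taken / not taken / inconclusive) return convention. The reasoning is that if the constant's lower 32 bits fall outside the register's known 32-bit ranges, the two 64-bit values cannot be equal, whatever the upper halves hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified 32-bit subregister bounds of a register. */
struct subreg_bounds {
	int32_t s32_min, s32_max;
	uint32_t u32_min, u32_max;
};

/* Decide a 64-bit "reg == imm" (or "reg != imm" when is_jne) using only
 * the register's 32-bit subrange. Returns 1 if the branch is always
 * taken, 0 if never taken, and -1 if the 32-bit bounds cannot decide. */
static int subreg_decides_eq(const struct subreg_bounds *r, uint64_t imm, bool is_jne)
{
	uint32_t lo = (uint32_t)imm; /* lower 32 bits of the constant */
	int32_t slo = (int32_t)lo;   /* same bits viewed as signed */

	/* Low halves can never match, so the 64-bit values always differ. */
	if (slo < r->s32_min || slo > r->s32_max ||
	    lo < r->u32_min || lo > r->u32_max)
		return is_jne ? 1 : 0;

	return -1; /* inconclusive from the 32-bit view alone */
}
```

For example, after a taken 32-bit "s< 0" check the subregister has bit #31 set, so a later 64-bit equality check against 0 can be pruned as never taken.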
syzkaller reported an overflowing write in arp_req_get(). [0]

When ioctl(SIOCGARP) is issued, arp_req_get() looks up a neighbour entry and copies neigh->ha to struct arpreq.arp_ha.sa_data.

The arp_ha here is struct sockaddr, not struct sockaddr_storage, so the sa_data buffer is just 14 bytes.

In the splat below, 2 bytes overflow into the next int field, arp_flags. We initialise that field just after the memcpy(), so it's not a problem.

However, when dev->addr_len is greater than 22 (e.g. MAX_ADDR_LEN), arp_netmask is overwritten, which could be set as htonl(0xFFFFFFFFUL) in arp_ioctl() before calling arp_req_get().

To avoid the overflow, let's limit the max length of memcpy().

Note that commit b5f0de6 ("net: dev: Convert sa_data to flexible array in struct sockaddr") just silenced syzkaller.

[0]:
memcpy: detected field-spanning write (size 16) of single field "r->arp_ha.sa_data" at net/ipv4/arp.c:1128 (size 14)
WARNING: CPU: 0 PID: 144638 at net/ipv4/arp.c:1128 arp_req_get+0x411/0x4a0 net/ipv4/arp.c:1128
Modules linked in:
CPU: 0 PID: 144638 Comm: syz-executor.4 Not tainted 6.1.74 #31
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
RIP: 0010:arp_req_get+0x411/0x4a0 net/ipv4/arp.c:1128
Code: fd ff ff e8 41 42 de fb b9 0e 00 00 00 4c 89 fe 48 c7 c2 20 6d ab 87 48 c7 c7 80 6d ab 87 c6 05 25 af 72 04 01 e8 5f 8d ad fb <0f> 0b e9 6c fd ff ff e8 13 42 de fb be 03 00 00 00 4c 89 e7 e8 a6
RSP: 0018:ffffc900050b7998 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88803a815000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8641a44a RDI: 0000000000000001
RBP: ffffc900050b7a98 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 203a7970636d656d R12: ffff888039c54000
R13: 1ffff92000a16f37 R14: ffff88803a815084 R15: 0000000000000010
FS: 00007f172bf306c0(0000) GS:ffff88805aa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f172b3569f0 CR3: 0000000057f12005 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 arp_ioctl+0x33f/0x4b0 net/ipv4/arp.c:1261
 inet_ioctl+0x314/0x3a0 net/ipv4/af_inet.c:981
 sock_do_ioctl+0xdf/0x260 net/socket.c:1204
 sock_ioctl+0x3ef/0x650 net/socket.c:1321
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:870 [inline]
 __se_sys_ioctl fs/ioctl.c:856 [inline]
 __x64_sys_ioctl+0x18e/0x220 fs/ioctl.c:856
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x37/0x90 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x64/0xce
RIP: 0033:0x7f172b262b8d
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f172bf300b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f172b3abf80 RCX: 00007f172b262b8d
RDX: 0000000020000000 RSI: 0000000000008954 RDI: 0000000000000003
RBP: 00007f172b2d3493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f172b3abf80 R15: 00007f172bf10000
 </TASK>

Reported-by: syzkaller <[email protected]>
Reported-by: Bjoern Doebel <[email protected]>
Fixes: 1da177e ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
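The "limit the max length of memcpy()" idea can be sketched as a userspace C model. This is not the kernel patch: struct fake_sockaddr, struct fake_arpreq and copy_hw_addr() are hypothetical stand-ins that only reproduce the layout relevant to the bug, a 14-byte sa_data immediately followed by an int field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct sockaddr / struct arpreq. */
struct fake_sockaddr {
	unsigned short sa_family;
	char sa_data[14];
};

struct fake_arpreq {
	struct fake_sockaddr arp_ha;
	int arp_flags; /* the field an unbounded copy would overwrite */
};

/* Copy a hardware address of addr_len bytes into arp_ha.sa_data,
 * clamping the length to the size of the destination buffer.
 * Returns the number of bytes actually copied. */
static size_t copy_hw_addr(struct fake_arpreq *r, const unsigned char *ha, size_t addr_len)
{
	size_t room = sizeof(r->arp_ha.sa_data);
	size_t n = addr_len < room ? addr_len : room;

	memcpy(r->arp_ha.sa_data, ha, n);
	return n;
}
```

With the clamp, a 32-byte address (the MAX_ADDR_LEN case from the commit message) fills sa_data and leaves the following field untouched, at the cost of truncating the address.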
The current implementation of the mov instruction with sign extension has the following problems:

1. It clobbers the source register if it is not stacked, because it sign extends the source and then moves it to the destination.
2. If the dst_reg is stacked, the current code doesn't write the value back in case of a 64-bit mov.
3. There is room for improvement by emitting fewer instructions.

The steps for fixing this and the instructions emitted by the JIT are explained below with examples for all combinations:

Case A: offset == 32:
=====================

Case A.1: src and dst are stacked registers:
--------------------------------------------
1. Load src_lo into tmp_lo
2. Store tmp_lo into dst_lo
3. Sign extend tmp_lo into tmp_hi
4. Store tmp_hi to dst_hi

Example: r3 = (s32)r3
r3 is a stacked register

    ldr r6, [r11, #-16]   // Load r3_lo into tmp_lo
    // str to dst_lo is not emitted because src_lo == dst_lo
    asr r7, r6, #31       // Sign extend tmp_lo into tmp_hi
    str r7, [r11, #-12]   // Store tmp_hi into r3_hi

Case A.2: src is stacked but dst is not:
----------------------------------------
1. Load src_lo into dst_lo
2. Sign extend dst_lo into dst_hi

Example: r6 = (s32)r3
r6 maps to {ARM_R5, ARM_R4} and r3 is stacked

    ldr r4, [r11, #-16]   // Load r3_lo into r6_lo
    asr r5, r4, #31       // Sign extend r6_lo into r6_hi

Case A.3: src is not stacked but dst is stacked:
------------------------------------------------
1. Store src_lo into dst_lo
2. Sign extend src_lo into tmp_hi
3. Store tmp_hi to dst_hi

Example: r3 = (s32)r6
r3 is stacked and r6 maps to {ARM_R5, ARM_R4}

    str r4, [r11, #-16]   // Store r6_lo to r3_lo
    asr r7, r4, #31       // Sign extend r6_lo into tmp_hi
    str r7, [r11, #-12]   // Store tmp_hi to dst_hi

Case A.4: Both src and dst are not stacked:
-------------------------------------------
1. Mov src_lo into dst_lo
2. Sign extend src_lo into dst_hi

Example: (bf) r6 = (s32)r6
r6 maps to {ARM_R5, ARM_R4}

    // Mov not emitted because dst == src
    asr r5, r4, #31       // Sign extend r6_lo into r6_hi

Case B: offset != 32:
=====================

Case B.1: src and dst are stacked registers:
--------------------------------------------
1. Load src_lo into tmp_lo
2. Sign extend tmp_lo according to offset.
3. Store tmp_lo into dst_lo
4. Sign extend tmp_lo into tmp_hi
5. Store tmp_hi to dst_hi

Example: r9 = (s8)r3
r9 and r3 are both stacked registers

    ldr r6, [r11, #-16]   // Load r3_lo into tmp_lo
    lsl r6, r6, #24       // Sign extend tmp_lo
    asr r6, r6, #24       // ..
    str r6, [r11, #-56]   // Store tmp_lo to r9_lo
    asr r7, r6, #31       // Sign extend tmp_lo to tmp_hi
    str r7, [r11, #-52]   // Store tmp_hi to r9_hi

Case B.2: src is stacked but dst is not:
----------------------------------------
1. Load src_lo into dst_lo
2. Sign extend dst_lo according to offset.
3. Sign extend dst_lo into dst_hi

Example: r6 = (s8)r3
r6 maps to {ARM_R5, ARM_R4} and r3 is stacked

    ldr r4, [r11, #-16]   // Load r3_lo to r6_lo
    lsl r4, r4, #24       // Sign extend r6_lo
    asr r4, r4, #24       // ..
    asr r5, r4, #31       // Sign extend r6_lo into r6_hi

Case B.3: src is not stacked but dst is stacked:
------------------------------------------------
1. Sign extend src_lo into tmp_lo according to offset.
2. Store tmp_lo into dst_lo.
3. Sign extend tmp_lo into tmp_hi.
4. Store tmp_hi to dst_hi.

Example: r3 = (s8)r1
r3 is stacked and r1 maps to {ARM_R3, ARM_R2}

    lsl r6, r2, #24       // Sign extend r1_lo to tmp_lo
    asr r6, r6, #24       // ..
    str r6, [r11, #-16]   // Store tmp_lo to r3_lo
    asr r7, r6, #31       // Sign extend tmp_lo to tmp_hi
    str r7, [r11, #-12]   // Store tmp_hi to r3_hi

Case B.4: Both src and dst are not stacked:
-------------------------------------------
1. Sign extend src_lo into dst_lo according to offset.
2. Sign extend dst_lo into dst_hi.

Example: r6 = (s8)r1
r6 maps to {ARM_R5, ARM_R4} and r1 maps to {ARM_R3, ARM_R2}

    lsl r4, r2, #24       // Sign extend r1_lo to r6_lo
    asr r4, r4, #24       // ..
    asr r5, r4, #31       // Sign extend r6_lo to r6_hi

Fixes: fc83265 ("arm32, bpf: add support for sign-extension mov instruction")
Reported-by: [email protected]
Closes: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Puranjay Mohan <[email protected]>
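The arithmetic behind the lsl/asr sequences above can be modelled in C. This is a hypothetical sketch, not the arm32 JIT code: asr32() and sext_mov() are illustrative names, asr32() is a portable stand-in for the ARM asr instruction (avoiding implementation-defined signed right shifts), and only the values written to the destination pair are modelled, not stacked versus register placement.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of ARM "asr": arithmetic shift right of a 32-bit value. */
static uint32_t asr32(uint32_t x, unsigned int n)
{
	assert(n < 32);
	if (x & 0x80000000u)
		return (x >> n) | ~(0xffffffffu >> n); /* shift in copies of bit 31 */
	return x >> n;
}

/* Model of dst = (sOFF)src for a 32-bit register pair:
 *   off != 32: lsl lo, src_lo, #(32 - off) ; asr lo, lo, #(32 - off)
 *   then:      asr hi, lo, #31
 * src_lo is passed by value, so the source is only read, never clobbered. */
static void sext_mov(uint32_t src_lo, unsigned int off, uint32_t *dst_lo, uint32_t *dst_hi)
{
	uint32_t lo = src_lo;

	assert(off == 8 || off == 16 || off == 32);
	if (off != 32) {
		unsigned int shift = 32 - off;
		lo = asr32(lo << shift, shift);
	}
	*dst_lo = lo;
	*dst_hi = asr32(lo, 31); /* hi word is all ones or all zeros */
}
```

Computing the high word from the already sign-extended low word matters: for (s8)0x80 the high word must be all ones even though bit #31 of the original source is clear.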
The current implementation of the mov instruction with sign extension has
the following problems:

  1. It clobbers the source register if it is not stacked because it
     sign extends the source and then moves it to the destination.
  2. If the dst_reg is stacked, the current code doesn't write the value
     back in case of 64-bit mov.
  3. There is room for improvement by emitting fewer instructions.

The steps for fixing this and the instructions emitted by the JIT are
explained below with examples in all combinations:

Case A: offset == 32:
=====================

Case A.1: src and dst are stacked registers:
--------------------------------------------
  1. Load src_lo into tmp_lo
  2. Store tmp_lo into dst_lo
  3. Sign extend tmp_lo into tmp_hi
  4. Store tmp_hi to dst_hi

  Example: r3 = (s32)r3
  r3 is a stacked register

    ldr r6, [r11, #-16]  // Load r3_lo into tmp_lo
    // str to dst_lo is not emitted because src_lo == dst_lo
    asr r7, r6, #31      // Sign extend tmp_lo into tmp_hi
    str r7, [r11, #-12]  // Store tmp_hi into r3_hi

Case A.2: src is stacked but dst is not:
----------------------------------------
  1. Load src_lo into dst_lo
  2. Sign extend dst_lo into dst_hi

  Example: r6 = (s32)r3
  r6 maps to {ARM_R5, ARM_R4} and r3 is stacked

    ldr r4, [r11, #-16]  // Load r3_lo into r6_lo
    asr r5, r4, #31      // Sign extend r6_lo into r6_hi

Case A.3: src is not stacked but dst is stacked:
------------------------------------------------
  1. Store src_lo into dst_lo
  2. Sign extend src_lo into tmp_hi
  3. Store tmp_hi to dst_hi

  Example: r3 = (s32)r6
  r3 is stacked and r6 maps to {ARM_R5, ARM_R4}

    str r4, [r11, #-16]  // Store r6_lo to r3_lo
    asr r7, r4, #31      // Sign extend r6_lo into tmp_hi
    str r7, [r11, #-12]  // Store tmp_hi to r3_hi

Case A.4: Both src and dst are not stacked:
-------------------------------------------
  1. Mov src_lo into dst_lo
  2. Sign extend src_lo into dst_hi

  Example: (bf) r6 = (s32)r6
  r6 maps to {ARM_R5, ARM_R4}

    // Mov not emitted because dst == src
    asr r5, r4, #31      // Sign extend r6_lo into r6_hi

Case B: offset != 32:
=====================

Case B.1: src and dst are stacked registers:
--------------------------------------------
  1. Load src_lo into tmp_lo
  2. Sign extend tmp_lo according to offset.
  3. Store tmp_lo into dst_lo
  4. Sign extend tmp_lo into tmp_hi
  5. Store tmp_hi to dst_hi

  Example: r9 = (s8)r3
  r9 and r3 are both stacked registers

    ldr r6, [r11, #-16]  // Load r3_lo into tmp_lo
    lsl r6, r6, #24      // Sign extend tmp_lo
    asr r6, r6, #24      // ..
    str r6, [r11, #-56]  // Store tmp_lo to r9_lo
    asr r7, r6, #31      // Sign extend tmp_lo to tmp_hi
    str r7, [r11, #-52]  // Store tmp_hi to r9_hi

Case B.2: src is stacked but dst is not:
----------------------------------------
  1. Load src_lo into dst_lo
  2. Sign extend dst_lo according to offset.
  3. Sign extend dst_lo into dst_hi

  Example: r6 = (s8)r3
  r6 maps to {ARM_R5, ARM_R4} and r3 is stacked

    ldr r4, [r11, #-16]  // Load r3_lo to r6_lo
    lsl r4, r4, #24      // Sign extend r6_lo
    asr r4, r4, #24      // ..
    asr r5, r4, #31      // Sign extend r6_lo into r6_hi

Case B.3: src is not stacked but dst is stacked:
------------------------------------------------
  1. Sign extend src_lo into tmp_lo according to offset.
  2. Store tmp_lo into dst_lo.
  3. Sign extend tmp_lo into tmp_hi.
  4. Store tmp_hi to dst_hi.

  Example: r3 = (s8)r1
  r3 is stacked and r1 maps to {ARM_R3, ARM_R2}

    lsl r6, r2, #24      // Sign extend r1_lo to tmp_lo
    asr r6, r6, #24      // ..
    str r6, [r11, #-16]  // Store tmp_lo to r3_lo
    asr r7, r6, #31      // Sign extend tmp_lo to tmp_hi
    str r7, [r11, #-12]  // Store tmp_hi to r3_hi

Case B.4: Both src and dst are not stacked:
-------------------------------------------
  1. Sign extend src_lo into dst_lo according to offset.
  2. Sign extend dst_lo into dst_hi.

  Example: r6 = (s8)r1
  r6 maps to {ARM_R5, ARM_R4} and r1 maps to {ARM_R3, ARM_R2}

    lsl r4, r2, #24      // Sign extend r1_lo to r6_lo
    asr r4, r4, #24      // ..
    asr r5, r4, #31      // Sign extend r6_lo to r6_hi

Fixes: fc83265 ("arm32, bpf: add support for sign-extension mov instruction")
Reported-by: [email protected]
Signed-off-by: Puranjay Mohan <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]
Link: https://lore.kernel.org/bpf/[email protected]
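The register-pair arithmetic in Cases A and B can be checked outside the JIT. The sketch below is a hypothetical userspace model, not JIT code. The `reg64` type and the `asr32()`/`movsx()` helpers are illustrative names. The model applies the same lsl/asr steps to the 32-bit halves of a 64-bit BPF register. It ignores whether the halves are stacked, because that only changes the ldr/str traffic around the shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit BPF register split into two 32-bit halves. */
struct reg64 {
	uint32_t lo;
	uint32_t hi;
};

/* Model of "asr rd, rm, #imm": arithmetic shift right, written portably. */
static uint32_t asr32(uint32_t v, unsigned int imm)
{
	uint32_t res;

	assert(imm > 0 && imm < 32);
	res = v >> imm;
	if (v & 0x80000000u)
		res |= ~(UINT32_MAX >> imm);   /* replicate the sign bit */
	return res;
}

/* Model of "dst = (sN)src" for N = off; only src.lo is read, as in the JIT. */
static struct reg64 movsx(struct reg64 src, unsigned int off)
{
	struct reg64 dst;

	assert(off == 8 || off == 16 || off == 32);
	if (off == 32) {
		/* Case A: the low word is moved unchanged */
		dst.lo = src.lo;
	} else {
		/* Case B: lsl #(32 - off) then asr #(32 - off) sign-extends the low 'off' bits */
		dst.lo = asr32((uint32_t)(src.lo << (32 - off)), 32 - off);
	}
	/* Both cases: asr #31 fills the high word with the sign of dst.lo */
	dst.hi = asr32(dst.lo, 31);
	return dst;
}
```

For example, `movsx((struct reg64){ .lo = 0xFF }, 8)` yields `lo = hi = 0xFFFFFFFF`, the 64-bit value -1. The original `hi` word never influences the result, which matches the JIT's behaviour of reading only `src_lo`.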
Since commit 1c123c5 ("bpf: Resolve fext program type when checking map
compatibility"), an freplace prog can be used as a tail-callee. However,
when an freplace prog has been attached and is then updated into a
PROG_ARRAY map, the kernel panics. The update checks the prog type of the
freplace prog via 'prog->aux->dst_prog->type', and 'prog->aux->dst_prog'
of the attached freplace prog is NULL.

[309049.036402] BUG: kernel NULL pointer dereference, address: 0000000000000004
[309049.036419] #PF: supervisor read access in kernel mode
[309049.036426] #PF: error_code(0x0000) - not-present page
[309049.036432] PGD 0 P4D 0
[309049.036437] Oops: 0000 [#1] PREEMPT SMP NOPTI
[309049.036444] CPU: 2 PID: 788148 Comm: test_progs Not tainted 6.8.0-31-generic #31-Ubuntu
[309049.036465] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023
[309049.036477] RIP: 0010:bpf_prog_map_compatible+0x2a/0x140
[309049.036488] Code: 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 53 44 8b 6e 04 48 89 f3 41 83 fd 1c 75 0c 48 8b 46 38 48 8b 40 70 <44> 8b 68 04 f6 43 03 01 75 1c 48 8b 43 38 44 0f b6 a0 89 00 00 00
[309049.036505] RSP: 0018:ffffb2e080fd7ce0 EFLAGS: 00010246
[309049.036513] RAX: 0000000000000000 RBX: ffffb2e0807c1000 RCX: 0000000000000000
[309049.036521] RDX: 0000000000000000 RSI: ffffb2e0807c1000 RDI: ffff990290259e00
[309049.036528] RBP: ffffb2e080fd7d08 R08: 0000000000000000 R09: 0000000000000000
[309049.036536] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990290259e00
[309049.036543] R13: 000000000000001c R14: ffff990290259e00 R15: ffff99028e29c400
[309049.036551] FS: 00007b82cbc28140(0000) GS:ffff9903b3f00000(0000) knlGS:0000000000000000
[309049.036559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[309049.036566] CR2: 0000000000000004 CR3: 0000000101286002 CR4: 00000000003706f0
[309049.036573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[309049.036581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[309049.036588] Call Trace:
[309049.036592] <TASK>
[309049.036597] ? show_regs+0x6d/0x80
[309049.036604] ? __die+0x24/0x80
[309049.036619] ? page_fault_oops+0x99/0x1b0
[309049.036628] ? do_user_addr_fault+0x2ee/0x6b0
[309049.036634] ? exc_page_fault+0x83/0x1b0
[309049.036641] ? asm_exc_page_fault+0x27/0x30
[309049.036649] ? bpf_prog_map_compatible+0x2a/0x140
[309049.036656] prog_fd_array_get_ptr+0x2c/0x70
[309049.036664] bpf_fd_array_map_update_elem+0x37/0x130
[309049.036671] bpf_map_update_value+0x1d3/0x260
[309049.036677] map_update_elem+0x1fa/0x360
[309049.036683] __sys_bpf+0x54c/0xa10
[309049.036689] __x64_sys_bpf+0x1a/0x30
[309049.036694] x64_sys_call+0x1936/0x25c0
[309049.036700] do_syscall_64+0x7f/0x180
[309049.036706] ? do_syscall_64+0x8c/0x180
[309049.036712] ? do_syscall_64+0x8c/0x180
[309049.036717] ? irqentry_exit+0x43/0x50
[309049.036723] ? common_interrupt+0x54/0xb0
[309049.036729] entry_SYSCALL_64_after_hwframe+0x73/0x7b

Why is 'prog->aux->dst_prog' of the freplace prog NULL? It is caused by
commit 3aac1ea ("bpf: Move prog->aux->linked_prog and trampoline into
bpf_link on attach"). Because 'prog->aux->dst_prog' of the freplace prog
is set to NULL on attach, the freplace prog does not have a stable prog
type. However, updating the freplace prog into a PROG_ARRAY map requires
checking its prog type. The two requirements conflict in principle.

This patch does not resolve the issue completely. It resolves the prog
type of the freplace prog via 'prog->aux->saved_dst_prog_type' to avoid
the panic.

Fixes: 1c123c5 ("bpf: Resolve fext program type when checking map compatibility")
Signed-off-by: Leon Hwang <[email protected]>
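The failure mode and the workaround can be modelled in a few lines of userspace C. This is a hypothetical sketch. The `prog`/`prog_aux` structures, the `PROG_TYPE_*` values and `model_resolve_prog_type()` are illustrative stand-ins, not the kernel's definitions. The sketch shows why reading the type through `dst_prog` breaks once attach clears that pointer, while a saved copy of the target type does not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; not the kernel's definitions. */
enum prog_type { PROG_TYPE_UNSPEC = 0, PROG_TYPE_SCHED_CLS, PROG_TYPE_EXT };

struct prog;

struct prog_aux {
	struct prog *dst_prog;              /* set to NULL when the freplace prog is attached */
	enum prog_type saved_dst_prog_type; /* target's type, kept independently of dst_prog */
};

struct prog {
	enum prog_type type;
	struct prog_aux *aux;
};

/*
 * The crashing pattern read prog->aux->dst_prog->type for EXT programs,
 * which faults once attach has cleared dst_prog. This model instead takes
 * the target type from saved_dst_prog_type and never reads dst_prog.
 */
static enum prog_type model_resolve_prog_type(const struct prog *prog)
{
	assert(prog != NULL);
	if (prog->type == PROG_TYPE_EXT &&
	    prog->aux->saved_dst_prog_type != PROG_TYPE_UNSPEC)
		return prog->aux->saved_dst_prog_type;
	return prog->type;
}
```

When `saved_dst_prog_type` is left unset, the model returns `PROG_TYPE_EXT` rather than the target's type. It avoids the NULL dereference, but it cannot recover a target type that was never recorded.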
Commit f7866c3 ("bpf: Fix null pointer dereference in resolve_prog_type()
for BPF_PROG_TYPE_EXT") fixed the following panic, which was caused by
updating an attached freplace prog into a PROG_ARRAY map. However, it
does not support updating an attached freplace prog into a PROG_ARRAY map.

[309049.036402] BUG: kernel NULL pointer dereference, address: 0000000000000004
[309049.036419] #PF: supervisor read access in kernel mode
[309049.036426] #PF: error_code(0x0000) - not-present page
[309049.036432] PGD 0 P4D 0
[309049.036437] Oops: 0000 [#1] PREEMPT SMP NOPTI
[309049.036444] CPU: 2 PID: 788148 Comm: test_progs Not tainted 6.8.0-31-generic #31-Ubuntu
[309049.036465] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023
[309049.036477] RIP: 0010:bpf_prog_map_compatible+0x2a/0x140
[309049.036488] Code: 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 53 44 8b 6e 04 48 89 f3 41 83 fd 1c 75 0c 48 8b 46 38 48 8b 40 70 <44> 8b 68 04 f6 43 03 01 75 1c 48 8b 43 38 44 0f b6 a0 89 00 00 00
[309049.036505] RSP: 0018:ffffb2e080fd7ce0 EFLAGS: 00010246
[309049.036513] RAX: 0000000000000000 RBX: ffffb2e0807c1000 RCX: 0000000000000000
[309049.036521] RDX: 0000000000000000 RSI: ffffb2e0807c1000 RDI: ffff990290259e00
[309049.036528] RBP: ffffb2e080fd7d08 R08: 0000000000000000 R09: 0000000000000000
[309049.036536] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990290259e00
[309049.036543] R13: 000000000000001c R14: ffff990290259e00 R15: ffff99028e29c400
[309049.036551] FS: 00007b82cbc28140(0000) GS:ffff9903b3f00000(0000) knlGS:0000000000000000
[309049.036559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[309049.036566] CR2: 0000000000000004 CR3: 0000000101286002 CR4: 00000000003706f0
[309049.036573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[309049.036581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[309049.036588] Call Trace:
[309049.036592] <TASK>
[309049.036597] ? show_regs+0x6d/0x80
[309049.036604] ? __die+0x24/0x80
[309049.036619] ? page_fault_oops+0x99/0x1b0
[309049.036628] ? do_user_addr_fault+0x2ee/0x6b0
[309049.036634] ? exc_page_fault+0x83/0x1b0
[309049.036641] ? asm_exc_page_fault+0x27/0x30
[309049.036649] ? bpf_prog_map_compatible+0x2a/0x140
[309049.036656] prog_fd_array_get_ptr+0x2c/0x70
[309049.036664] bpf_fd_array_map_update_elem+0x37/0x130
[309049.036671] bpf_map_update_value+0x1d3/0x260
[309049.036677] map_update_elem+0x1fa/0x360
[309049.036683] __sys_bpf+0x54c/0xa10
[309049.036689] __x64_sys_bpf+0x1a/0x30
[309049.036694] x64_sys_call+0x1936/0x25c0
[309049.036700] do_syscall_64+0x7f/0x180
[309049.036706] ? do_syscall_64+0x8c/0x180
[309049.036712] ? do_syscall_64+0x8c/0x180
[309049.036717] ? irqentry_exit+0x43/0x50
[309049.036723] ? common_interrupt+0x54/0xb0
[309049.036729] entry_SYSCALL_64_after_hwframe+0x73/0x7b

Since commit 1c123c5 ("bpf: Resolve fext program type when checking map
compatibility"), an freplace prog can be used as a tail-callee of its
target prog. And commit 3aac1ea ("bpf: Move prog->aux->linked_prog and
trampoline into bpf_link on attach") sets prog->aux->dst_prog to NULL when
attaching an freplace prog to its target.

Consider the following example:

tailcall_freplace.c:

    // SPDX-License-Identifier: GPL-2.0

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>
    #include "bpf_legacy.h"

    struct {
        __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
        __uint(max_entries, 1);
        __uint(key_size, sizeof(__u32));
        __uint(value_size, sizeof(__u32));
    } jmp_table SEC(".maps");

    int count = 0;

    __noinline int subprog(struct __sk_buff *skb)
    {
        volatile int ret = 1;

        count++;
        bpf_tail_call_static(skb, &jmp_table, 0);
        return ret;
    }

    SEC("freplace")
    int entry(struct __sk_buff *skb)
    {
        return subprog(skb);
    }

    char __license[] SEC("license") = "GPL";

tc_bpf2bpf.c:

    // SPDX-License-Identifier: GPL-2.0

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>
    #include "bpf_legacy.h"

    __noinline int subprog(struct __sk_buff *skb)
    {
        volatile int ret = 1;

        return ret;
    }

    SEC("tc")
    int entry(struct __sk_buff *skb)
    {
        return subprog(skb);
    }

    char __license[] SEC("license") = "GPL";

The freplace entry prog's target is the tc subprog. After loading, the
owner type of the freplace prog's jmp_table is BPF_PROG_TYPE_SCHED_CLS.
Next, after the freplace prog is attached to the tc subprog, its
prog->aux->dst_prog is NULL. Then, when the freplace prog is updated into
jmp_table, bpf_prog_map_compatible() returns false because
resolve_prog_type() returns BPF_PROG_TYPE_EXT instead of
BPF_PROG_TYPE_SCHED_CLS.

With this patch, resolve_prog_type() returns BPF_PROG_TYPE_SCHED_CLS,
which allows the attached freplace prog to be updated into the PROG_ARRAY
map in this example.

Fixes: f7866c3 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT")
Cc: Toke Høiland-Jørgensen <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Signed-off-by: Leon Hwang <[email protected]>
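The owner-type check behind this failure can be sketched in userspace C. In this hypothetical model, the first program associated with a PROG_ARRAY map fixes the map's owner type, and every later update must resolve to that same type. The `map_owner` struct, the `PROG_TYPE_*` values and `map_compatible()` are illustrative names. They are not the kernel's `bpf_prog_map_compatible()`, which checks more than the program type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified program types (the real enum bpf_prog_type is larger). */
enum prog_type { PROG_TYPE_UNSPEC = 0, PROG_TYPE_SCHED_CLS, PROG_TYPE_EXT };

/* Hypothetical model of the owner information a PROG_ARRAY map keeps. */
struct map_owner {
	bool set;
	enum prog_type type;
};

/*
 * 'resolved' is the program's type after resolution (for an freplace prog,
 * ideally its target's type). The first caller records it as the owner
 * type; later callers are accepted only if they resolve to the same type.
 */
static bool map_compatible(struct map_owner *owner, enum prog_type resolved)
{
	assert(owner != 0);
	if (!owner->set) {
		owner->set = true;
		owner->type = resolved;
		return true;
	}
	return owner->type == resolved;
}
```

In the scenario described in the message, loading records `PROG_TYPE_SCHED_CLS` as the owner type. An update that resolves the attached freplace prog to `PROG_TYPE_EXT` is then rejected. An update that resolves it to `PROG_TYPE_SCHED_CLS` is accepted.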
…er dereference

A malicious HID device with quirk APPLE_MAGIC_BACKLIGHT can trigger a NULL pointer dereference whilst the power feature-report is toggled and sent to the device in apple_magic_backlight_report_set(). The power feature-report is expected to have two data fields, but if the descriptor declares one field then accessing field[1] and dereferencing it in apple_magic_backlight_report_set() becomes invalid since field[1] will be NULL.

An example of a minimal descriptor which can cause the crash is something like the following where the report with ID 3 (power report) only references a single 1-byte field. When hid core parses the descriptor it will encounter the final feature tag, allocate a hid_report (all members of field[] will be zeroed out), create a field structure and populate it, increasing the maxfield to 1. The subsequent field[1] access and dereference causes the crash.

  Usage Page (Vendor Defined 0xFF00)
  Usage (0x0F)
  Collection (Application)
    Report ID (1)
    Usage (0x01)
    Logical Minimum (0)
    Logical Maximum (255)
    Report Size (8)
    Report Count (1)
    Feature (Data,Var,Abs)

    Usage (0x02)
    Logical Maximum (32767)
    Report Size (16)
    Report Count (1)
    Feature (Data,Var,Abs)

    Report ID (3)
    Usage (0x03)
    Logical Minimum (0)
    Logical Maximum (1)
    Report Size (8)
    Report Count (1)
    Feature (Data,Var,Abs)
  End Collection

Here we see the KASAN splat when the kernel dereferences the NULL pointer and crashes:

  [ 15.164723] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] SMP KASAN NOPTI
  [ 15.165691] KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
  [ 15.165691] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Not tainted 6.15.0 #31 PREEMPT(voluntary)
  [ 15.165691] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
  [ 15.165691] RIP: 0010:apple_magic_backlight_report_set+0xbf/0x210
  [ 15.165691] Call Trace:
  [ 15.165691]  <TASK>
  [ 15.165691]  apple_probe+0x571/0xa20
  [ 15.165691]  hid_device_probe+0x2e2/0x6f0
  [ 15.165691]  really_probe+0x1ca/0x5c0
  [ 15.165691]  __driver_probe_device+0x24f/0x310
  [ 15.165691]  driver_probe_device+0x4a/0xd0
  [ 15.165691]  __device_attach_driver+0x169/0x220
  [ 15.165691]  bus_for_each_drv+0x118/0x1b0
  [ 15.165691]  __device_attach+0x1d5/0x380
  [ 15.165691]  device_initial_probe+0x12/0x20
  [ 15.165691]  bus_probe_device+0x13d/0x180
  [ 15.165691]  device_add+0xd87/0x1510
  [...]

To fix this issue we should validate the number of fields that the backlight and power reports have, and if they do not have the required number of fields then bail.

Fixes: 394ba61 ("HID: apple: Add support for magic keyboard backlight on T2 Macs")
Cc: [email protected]
Signed-off-by: Qasim Ijaz <[email protected]>
Reviewed-by: Orlando Chamberlain <[email protected]>
Tested-by: Aditya Garg <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Benjamin Tissoires <[email protected]>
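The validation described above can be sketched as follows, assuming a simplified report model. `struct report`, `report_has_fields()`, `set_power()`, and `MAX_FIELDS` are hypothetical and only mirror the shape of `struct hid_report` (a `maxfield` count plus a `field[]` array whose unused entries stay NULL). The values written into the fields are placeholders, not the driver's actual payload.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FIELDS 4 /* hypothetical bound for this sketch */

struct field {
	int value;
};

/* Hypothetical stand-in for struct hid_report: maxfield counts the
 * populated entries; the remaining field[] slots are NULL. */
struct report {
	unsigned int maxfield;
	struct field *field[MAX_FIELDS];
};

/* Check the field count before touching field[1]: a report that declares
 * fewer fields than required must be rejected. */
static bool report_has_fields(const struct report *r, unsigned int required)
{
	return r != NULL && r->maxfield >= required;
}

/* Returns 0 on success, or -1 if the report is too short (the caller bails). */
static int set_power(struct report *power, int on, int rate)
{
	if (!report_has_fields(power, 2))
		return -1;
	power->field[0]->value = on ? 1 : 0;
	power->field[1]->value = rate; /* placeholder for the second field's data */
	return 0;
}
```

With the one-field power report from the descriptor above, `set_power()` returns -1 instead of dereferencing the NULL `field[1]`.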
Without the change, `perf` hangs on character devices. On my system it's enough to run a system-wide sampler for a few seconds to get the hang:

    $ perf record -a -g --call-graph=dwarf
    $ perf report
    # hung

`strace` shows that the hang happens on a read from the character device `/dev/dri/renderD128`:

    $ strace -y -f -p 2780484
    strace: Process 2780484 attached
    pread64(101</dev/dri/renderD128>,
    strace: Process 2780484 detached

Its call trace descends into `elfutils`:

    $ gdb -p 2780484
    (gdb) bt
    #0  0x00007f5e508f04b7 in __libc_pread64 (fd=101, buf=0x7fff9df7edb0, count=0, offset=0) at ../sysdeps/unix/sysv/linux/pread64.c:25
    #1  0x00007f5e52b79515 in read_file () from /<<NIX>>/elfutils-0.192/lib/libelf.so.1
    #2  0x00007f5e52b25666 in libdw_open_elf () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #3  0x00007f5e52b25907 in __libdw_open_file () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #4  0x00007f5e52b120a9 in dwfl_report_elf@@ELFUTILS_0.156 () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #5  0x000000000068bf20 in __report_module (al=al@entry=0x7fff9df80010, ip=ip@entry=139803237033216, ui=ui@entry=0x5369b5e0) at util/dso.h:537
    #6  0x000000000068c3d1 in report_module (ip=139803237033216, ui=0x5369b5e0) at util/unwind-libdw.c:114
    #7  frame_callback (state=0x535aef10, arg=0x5369b5e0) at util/unwind-libdw.c:242
    #8  0x00007f5e52b261d3 in dwfl_thread_getframes () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #9  0x00007f5e52b25bdb in get_one_thread_cb () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #10 0x00007f5e52b25faa in dwfl_getthreads () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #11 0x00007f5e52b26514 in dwfl_getthread_frames () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #12 0x000000000068c6ce in unwind__get_entries (cb=cb@entry=0x5d4620 <unwind_entry>, arg=arg@entry=0x10cd5fa0, thread=thread@entry=0x1076a290, data=data@entry=0x7fff9df80540, max_stack=max_stack@entry=127, best_effort=best_effort@entry=false) at util/thread.h:152
    #13 0x00000000005dae95 in thread__resolve_callchain_unwind (evsel=0x106006d0, thread=0x1076a290, cursor=0x10cd5fa0, sample=0x7fff9df80540, max_stack=127, symbols=true) at util/machine.c:2939
    #14 thread__resolve_callchain_unwind (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, sample=0x7fff9df80540, max_stack=127, symbols=true) at util/machine.c:2920
    #15 __thread__resolve_callchain (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, evsel@entry=0x7fff9df80440, sample=0x7fff9df80540, parent=parent@entry=0x7fff9df804a0, root_al=root_al@entry=0x7fff9df80440, max_stack=127, symbols=true) at util/machine.c:2970
    #16 0x00000000005d0cb2 in thread__resolve_callchain (thread=<optimized out>, cursor=<optimized out>, evsel=0x7fff9df80440, sample=<optimized out>, parent=0x7fff9df804a0, root_al=0x7fff9df80440, max_stack=127) at util/machine.h:198
    #17 sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fff9df804a0, evsel=evsel@entry=0x106006d0, al=al@entry=0x7fff9df80440, max_stack=max_stack@entry=127) at util/callchain.c:1127
    #18 0x0000000000617e08 in hist_entry_iter__add (iter=iter@entry=0x7fff9df80480, al=al@entry=0x7fff9df80440, max_stack_depth=127, arg=arg@entry=0x7fff9df81ae0) at util/hist.c:1255
    #19 0x000000000045d2d0 in process_sample_event (tool=0x7fff9df81ae0, event=<optimized out>, sample=0x7fff9df80540, evsel=0x106006d0, machine=<optimized out>) at builtin-report.c:334
    #20 0x00000000005e3bb1 in perf_session__deliver_event (session=0x105ff2c0, event=0x7f5c7d735ca0, tool=0x7fff9df81ae0, file_offset=2914716832, file_path=0x105ffbf0 "perf.data") at util/session.c:1367
    #21 0x00000000005e8d93 in do_flush (oe=0x105ffa50, show_progress=false) at util/ordered-events.c:245
    #22 __ordered_events__flush (oe=0x105ffa50, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:324
    #23 0x00000000005e1f64 in perf_session__process_user_event (session=0x105ff2c0, event=0x7f5c7d752b18, file_offset=2914835224, file_path=0x105ffbf0 "perf.data") at util/session.c:1419
    #24 0x00000000005e47c7 in reader__read_event (rd=rd@entry=0x7fff9df81260, session=session@entry=0x105ff2c0, prog=prog@entry=0x7fff9df81220) at util/session.c:2132
    #25 0x00000000005e4b37 in reader__process_events (rd=0x7fff9df81260, session=0x105ff2c0, prog=0x7fff9df81220) at util/session.c:2181
    #26 __perf_session__process_events (session=0x105ff2c0) at util/session.c:2226
    #27 perf_session__process_events (session=session@entry=0x105ff2c0) at util/session.c:2390
    #28 0x0000000000460add in __cmd_report (rep=0x7fff9df81ae0) at builtin-report.c:1076
    #29 cmd_report (argc=<optimized out>, argv=<optimized out>) at builtin-report.c:1827
    #30 0x00000000004c5a40 in run_builtin (p=p@entry=0xd8f7f8 <commands+312>, argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0) at perf.c:351
    #31 0x00000000004c5d63 in handle_internal_command (argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0) at perf.c:404
    #32 0x0000000000442de3 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:448
    #33 main (argc=<optimized out>, argv=0x7fff9df844b0) at perf.c:556

The hang happens because nothing in `perf` or `elfutils` checks whether a mapped file is easily readable. The change conservatively skips all non-regular files.

Signed-off-by: Sergei Trofimovich <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
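A minimal sketch of the "skip non-regular files" test, using POSIX `stat()` and `S_ISREG()`. The helper name `is_regular_file()` is hypothetical, and the message does not say where in `perf` the real check sits; the point illustrated is that `stat()` reports the file type without opening or reading the file, so a character device such as `/dev/dri/renderD128` can be rejected before any blocking `pread64()` is attempted.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper: true only for regular files. Character devices,
 * FIFOs, sockets, and paths that cannot be stat()ed are all treated as
 * "skip", matching the conservative policy described above. stat() does
 * not open the file, so this check itself cannot block on a device read. */
static bool is_regular_file(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return false;
	return S_ISREG(st.st_mode);
}
```

A caller would run this check before handing a mapped file's path to the unwinder and skip the mapping when it returns false.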
Pull request for series with
subject: scripts/pahole-flags.sh: Parse DWARF and generate BTF with multithreading.
version: 3
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=615515