kernel-patches-bot

Pull request for series with
subject: bpf: Fix the irq and nmi check in bpf_sk_storage for tracing usage
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=385417

kernel-patches-bot and others added 2 commits November 16, 2020 12:10
The current check is intended to prevent bpf_sk_storage from being used
in irq and nmi context. Jakub pointed out that it cannot do that: for
example, in_serving_softirq() still returns true when softirq handling
is interrupted by a hard irq.

Fixes: 8e4597c ("bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP")
Suggested-by: Jakub Kicinski <[email protected]>
Signed-off-by: Martin KaFai Lau <[email protected]>
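To make the failure mode concrete, here is a small user-space model of the context check. It is a sketch, not kernel code: the preempt_count layout (softirq count from bit 8, hardirq from bit 16, NMI from bit 20) is assumed to mirror include/linux/preempt.h, and the `pc_*` predicates and the two `*_check_allows()` functions are hypothetical stand-ins that take the count as an explicit argument. The "old" predicate admits any context where in_serving_softirq() is true; the "new" one simply rejects hard irq and nmi context.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed preempt_count layout (modelled on include/linux/preempt.h):
 * bits 8-15 softirq, bits 16-19 hardirq, bits 20-23 NMI. */
#define SOFTIRQ_OFFSET (1UL << 8)
#define HARDIRQ_OFFSET (1UL << 16)
#define NMI_OFFSET     (1UL << 20)
#define HARDIRQ_MASK   (0xfUL << 16)
#define NMI_MASK       (0xfUL << 20)

/* Model versions of the kernel predicates; unlike the real ones they
 * take the preempt count as an explicit argument. */
static bool pc_in_serving_softirq(unsigned long pc) { return pc & SOFTIRQ_OFFSET; }
static bool pc_in_irq(unsigned long pc) { return pc & HARDIRQ_MASK; }
static bool pc_in_nmi(unsigned long pc) { return pc & NMI_MASK; }
static bool pc_in_task(unsigned long pc)
{
	return !(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET));
}

/* "Allow in task context or while serving a softirq": a hard irq that
 * interrupts a softirq still has SOFTIRQ_OFFSET set, so it slips through. */
bool old_check_allows(unsigned long pc)
{
	return pc_in_serving_softirq(pc) || pc_in_task(pc);
}

/* "Reject hard irq and NMI context", regardless of the softirq state. */
bool new_check_allows(unsigned long pc)
{
	return !(pc_in_irq(pc) || pc_in_nmi(pc));
}
```

For example, a hard irq arriving while a softirq is being served corresponds to a count of SOFTIRQ_OFFSET | HARDIRQ_OFFSET: old_check_allows() returns true for it, while new_check_allows() returns false.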
@kernel-patches-bot

Master branch: 024cd2c
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=385417
version: 1

@kernel-patches-bot

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=385417 is irrelevant now. Closing PR.

kernel-patches-bot deleted the series/385417=>bpf-next branch November 17, 2020 01:06
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Nov 27, 2023
With the latest upstream llvm18, the following test cases failed:

  $ ./test_progs -j
  #13/2    bpf_cookie/multi_kprobe_link_api:FAIL
  #13/3    bpf_cookie/multi_kprobe_attach_api:FAIL
  #13      bpf_cookie:FAIL
  #77      fentry_fexit:FAIL
  #78/1    fentry_test/fentry:FAIL
  #78      fentry_test:FAIL
  #82/1    fexit_test/fexit:FAIL
  #82      fexit_test:FAIL
  #112/1   kprobe_multi_test/skel_api:FAIL
  #112/2   kprobe_multi_test/link_api_addrs:FAIL
  ...
  #112     kprobe_multi_test:FAIL
  #356/17  test_global_funcs/global_func17:FAIL
  #356     test_global_funcs:FAIL

Further analysis shows that llvm upstream patch [1] is responsible
for the above failures. For example, for function bpf_fentry_test7()
in net/bpf/test_run.c, without [1], the asm code is:

  0000000000000400 <bpf_fentry_test7>:
     400: f3 0f 1e fa                   endbr64
     404: e8 00 00 00 00                callq   0x409 <bpf_fentry_test7+0x9>
     409: 48 89 f8                      movq    %rdi, %rax
     40c: c3                            retq
     40d: 0f 1f 00                      nopl    (%rax)

and with [1], the asm code is:

  0000000000005d20 <bpf_fentry_test7.specialized.1>:
    5d20: e8 00 00 00 00                callq   0x5d25 <bpf_fentry_test7.specialized.1+0x5>
    5d25: c3                            retq

and <bpf_fentry_test7.specialized.1> is called instead of <bpf_fentry_test7>,
which caused the failures of #13, #77, etc. (all of the above except #356).

For test case #356/17 (progs/test_global_func17.c), with [1]
the main prog looks like:

  0000000000000000 <global_func17>:
       0:       b4 00 00 00 2a 00 00 00 w0 = 0x2a
       1:       95 00 00 00 00 00 00 00 exit

which passes verification, while the test itself expects a verification
failure.

Let us add 'barrier_var'-style asm code in both places to prevent
the function specialization that caused the selftest failures.

  [1] llvm/llvm-project#72903

Signed-off-by: Yonghong Song <[email protected]>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Nov 27, 2023
With the latest upstream llvm18, the following test cases failed:

  $ ./test_progs -j
  #13/2    bpf_cookie/multi_kprobe_link_api:FAIL
  #13/3    bpf_cookie/multi_kprobe_attach_api:FAIL
  #13      bpf_cookie:FAIL
  #77      fentry_fexit:FAIL
  #78/1    fentry_test/fentry:FAIL
  #78      fentry_test:FAIL
  #82/1    fexit_test/fexit:FAIL
  #82      fexit_test:FAIL
  #112/1   kprobe_multi_test/skel_api:FAIL
  #112/2   kprobe_multi_test/link_api_addrs:FAIL
  [...]
  #112     kprobe_multi_test:FAIL
  #356/17  test_global_funcs/global_func17:FAIL
  #356     test_global_funcs:FAIL

Further analysis shows that llvm upstream patch [1] is responsible for the above
failures. For example, for function bpf_fentry_test7() in net/bpf/test_run.c,
without [1], the asm code is:

  0000000000000400 <bpf_fentry_test7>:
     400: f3 0f 1e fa                   endbr64
     404: e8 00 00 00 00                callq   0x409 <bpf_fentry_test7+0x9>
     409: 48 89 f8                      movq    %rdi, %rax
     40c: c3                            retq
     40d: 0f 1f 00                      nopl    (%rax)

... and with [1], the asm code is:

  0000000000005d20 <bpf_fentry_test7.specialized.1>:
    5d20: e8 00 00 00 00                callq   0x5d25 <bpf_fentry_test7.specialized.1+0x5>
    5d25: c3                            retq

... and <bpf_fentry_test7.specialized.1> is called instead of <bpf_fentry_test7>,
which caused the failures of #13, #77, etc. (all of the above except #356).

For test case #356/17 (progs/test_global_func17.c), with [1] the main prog
looks like:

  0000000000000000 <global_func17>:
       0:       b4 00 00 00 2a 00 00 00 w0 = 0x2a
       1:       95 00 00 00 00 00 00 00 exit

... which passes verification, while the test itself expects a verification
failure.

Let us add 'barrier_var'-style asm code in both places to prevent the function
specialization that caused the selftest failures.

  [1] llvm/llvm-project#72903

Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Nov 5, 2024
KASAN reports that the GPU metrics table allocated in
vangogh_tables_init() is not large enough for the memset done in
smu_cmn_init_soft_gpu_metrics(). Condensed report follows:

[   33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu]
[   33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067
...
[   33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G        W          6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544
[   33.861816] Tainted: [W]=WARN
[   33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023
[   33.861822] Call Trace:
[   33.861826]  <TASK>
[   33.861829]  dump_stack_lvl+0x66/0x90
[   33.861838]  print_report+0xce/0x620
[   33.861853]  kasan_report+0xda/0x110
[   33.862794]  kasan_check_range+0xfd/0x1a0
[   33.862799]  __asan_memset+0x23/0x40
[   33.862803]  smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.863306]  vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.864257]  vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.865682]  amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.866160]  amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779]
[   33.867135]  dev_attr_show+0x43/0xc0
[   33.867147]  sysfs_kf_seq_show+0x1f1/0x3b0
[   33.867155]  seq_read_iter+0x3f8/0x1140
[   33.867173]  vfs_read+0x76c/0xc50
[   33.867198]  ksys_read+0xfb/0x1d0
[   33.867214]  do_syscall_64+0x90/0x160
...
[   33.867353] Allocated by task 378 on cpu 7 at 22.794876s:
[   33.867358]  kasan_save_stack+0x33/0x50
[   33.867364]  kasan_save_track+0x17/0x60
[   33.867367]  __kasan_kmalloc+0x87/0x90
[   33.867371]  vangogh_init_smc_tables+0x3f9/0x840 [amdgpu]
[   33.867835]  smu_sw_init+0xa32/0x1850 [amdgpu]
[   33.868299]  amdgpu_device_init+0x467b/0x8d90 [amdgpu]
[   33.868733]  amdgpu_driver_load_kms+0x19/0xf0 [amdgpu]
[   33.869167]  amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu]
[   33.869608]  local_pci_probe+0xda/0x180
[   33.869614]  pci_device_probe+0x43f/0x6b0

Empirically, we can confirm that the former allocates 152 bytes for the
table, while the latter memsets a 168-byte block.

The root cause appears to be that, when the GPU metrics tables for v2_4
parts were added, the table allocation was not enlarged to fit them.

The fix in this patch is rather "brute force"; it should perhaps later be
done in a smarter way, by extracting and consolidating the part-version-to-
size logic into a common helper instead of brute forcing the largest
possible allocation. Nevertheless, for now this works and fixes the
out-of-bounds write.

v2:
 * Drop impossible v3_0 case. (Mario)

Signed-off-by: Tvrtko Ursulin <[email protected]>
Fixes: 41cec40 ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics")
Cc: Mario Limonciello <[email protected]>
Cc: Evan Quan <[email protected]>
Cc: Wenyou Yang <[email protected]>
Cc: Alex Deucher <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mario Limonciello <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
(cherry picked from commit 0880f58)
Cc: [email protected] # v6.6+
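A minimal sketch of the two sizing strategies discussed above, using hypothetical stand-in structs whose only relevant property is their size (152 and 168 bytes, the figures from the KASAN analysis). metrics_size_for() is a toy version of the "version to size" helper, metrics_table_size() is the "brute force" variant that allocates for the largest layout, and init_soft_metrics() models the memset in smu_cmn_init_soft_gpu_metrics() but refuses to write past the table instead of overflowing it. None of these names are the amdgpu driver's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins: only their sizes matter here. */
struct metrics_small { uint8_t raw[152]; };
struct metrics_v2_4  { uint8_t raw[168]; };

enum metrics_layout { LAYOUT_SMALL, LAYOUT_V2_4, LAYOUT_COUNT };

/* The "smarter" direction: one helper mapping a layout to its size. */
size_t metrics_size_for(enum metrics_layout layout)
{
	switch (layout) {
	case LAYOUT_SMALL: return sizeof(struct metrics_small);
	case LAYOUT_V2_4:  return sizeof(struct metrics_v2_4);
	default:           return 0;
	}
}

/* The "brute force" direction: size the table for the largest layout. */
size_t metrics_table_size(void)
{
	size_t max = 0;
	for (int l = 0; l < LAYOUT_COUNT; l++) {
		size_t sz = metrics_size_for((enum metrics_layout)l);
		if (sz > max)
			max = sz;
	}
	return max;
}

void *alloc_metrics_table(void)
{
	return calloc(1, metrics_table_size());
}

/* Models zeroing one layout's worth of the table; returns NULL instead of
 * writing past the end when the table is too small. */
void *init_soft_metrics(void *table, size_t table_size, size_t layout_size)
{
	assert(table != NULL);
	if (layout_size > table_size)
		return NULL; /* this would be the slab-out-of-bounds write */
	memset(table, 0, layout_size);
	return table;
}
```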
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Nov 22, 2024
eddyz87 added a commit to eddyz87/bpf that referenced this pull request Jul 30, 2025
Failing tests:
- #110     fexit_bpf2bpf:FAIL
- #124     for_each:FAIL
- #144     iters:FAIL
- #148     kfree_skb:FAIL
- #161     l4lb_all:FAIL
- #193     map_kptr:FAIL
- #23      bpf_loop:FAIL
- #260     pkt_access:FAIL
- #269     prog_run_opts:FAIL
- #280     rbtree_success:FAIL
- #356     res_spin_lock_failure:FAIL
- #364     setget_sockopt:FAIL
- #381     sock_fields:FAIL
- #394     spin_lock:FAIL
- #395     spin_lock_success:FAIL
- #444     test_bpffs:FAIL
- #453     test_profiler:FAIL
- #479     usdt:FAIL
- #488     verifier_bits_iter:FAIL
- #597     verif_scale_pyperf600:FAIL
- #598     verif_scale_pyperf600_bpf_loop:FAIL
- #599     verif_scale_pyperf600_iter:FAIL
- #608     verif_scale_strobemeta_subprogs:FAIL
- #622     xdp_attach:FAIL
- #637     xdp_noinline:FAIL
- #639     xdp_synproxy:FAIL
- #72      cls_redirect:FAIL
- #88      crypto_sanity:FAIL
- #97      dynptr:FAIL

Signed-off-by: Eduard Zingerman <[email protected]>
eddyz87 added a commit to eddyz87/bpf that referenced this pull request Jul 30, 2025
guidosarducci added a commit to guidosarducci/bpf-ci that referenced this pull request Sep 19, 2025
The commit in the Fixes: tag made use of the lower 3 bits of
(void *)sk->sk_user_data for flags, and refactored the code to simplify
adding even more.

This change immediately broke 32-bit usage. In BPF's reuseport_array, for
example, 'struct reuseport_array' has an array 'struct sock __rcu *ptrs[]'
whose members must be cleared on socket close via references from
sk->sk_user_data, which are now broken. This leads to subtle memory
corruption and locking issues that result in kernel hangs and panics while
running the BPF selftests:

root@qemu-armhf:/usr/libexec/kselftests-bpf# test_progs -a select_reuseport
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_config:PASS:netns_new 0 nsec
#356/1   select_reuseport/reuseport_sockarray IPv4/TCP LOOPBACK test_err_inner_map:OK
[...]
------------[ cut here ]------------
WARNING: CPU: 0 PID: 87 at kernel/locking/lockdep.c:238 __lock_acquire+0xac0/0xd1c
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: bpf_testmod(OE) bpf_preload
CPU: 0 UID: 0 PID: 87 Comm: test_progs Tainted: G           OE       6.17.0-rc1-00233-ge37b36224f81-dirty #114 NONE
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Generic DT based system
Call trace:
 dump_backtrace from show_stack+0x20/0x24
 r7:c01e2ebc r6:00000080 r5:60010093 r4:c14d3d80
 show_stack from dump_stack_lvl+0x90/0xc0
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:c01e2ebc r6:00000009 r5:000000ee r4:c14c5bc4
 dump_stack from __warn+0x8c/0x1b4
 __warn from warn_slowpath_fmt+0x130/0x1a4
 r8:c01e2ebc r7:c14bd144 r6:c14c5bc4 r5:c3cad400 r4:c1cf8a04
 warn_slowpath_fmt from __lock_acquire+0xac0/0xd1c
 r8:c2896b50 r7:00000000 r6:c58863b8 r5:c3cad400 r4:c3cadcc0
 __lock_acquire from lock_acquire.part.0+0xbc/0x240
 r10:00000000 r9:1c0ed000 r8:00000000 r7:60010013 r6:c1b902f0 r5:c1b902f0
 r4:df865cd0
 lock_acquire.part.0 from lock_acquire+0x90/0x168
 r10:c5886100 r9:c46a6c04 r8:00000000 r7:00000000 r6:00000000 r5:00000000
 r4:c58863b8
 lock_acquire from _raw_write_lock_bh+0x54/0x90
 r9:c46a6c04 r8:00000000 r7:00000055 r6:c58863b8 r5:c58863a8 r4:c0394774
 _raw_write_lock_bh from bpf_fd_reuseport_array_update_elem+0x16c/0x26c
 r6:c59a4000 r5:c5191400 r4:c58863a8
 bpf_fd_reuseport_array_update_elem from bpf_map_update_value+0x454/0x5dc
 r10:c329a901 r9:c329a900 r8:c1cf72f0 r7:c3cad400 r6:c595dc00 r5:00000000
 r4:00000000
 bpf_map_update_value from map_update_elem+0x210/0x430
 r10:c329a901 r9:00000004 r8:c595df40 r7:df865ec0 r6:c329a900 r5:c46a6c00
 r4:c46a6cf8
 map_update_elem from __sys_bpf+0x594/0xc94
 r10:00000000 r9:befb18b0 r8:00000051 r7:00000000 r6:00000002 r5:df865eb0
 r4:00000020
 __sys_bpf from sys_bpf+0x34/0x3c
 r10:00000182 r9:c3cad400 r8:c0100234 r7:00000182 r6:00000002 r5:befb18b0
 r4:00000020
 sys_bpf from ret_fast_syscall+0x0/0x1c
Exception stack(0xdf865fa8 to 0xdf865ff0)
5fa0:                   00000020 befb18b0 00000002 befb18b0 00000020 00000000
5fc0: 00000020 befb18b0 00000002 00000182 00839395 b6fa3ce0 00000000 012ac774
5fe0: befb1880 befb1870 00863133 b6ec3312
irq event stamp: 260676
hardirqs last  enabled at (260676): [<c0149fac>] __local_bh_enable_ip+0xc4/0x1b0
hardirqs last disabled at (260675): [<c014a024>] __local_bh_enable_ip+0x13c/0x1b0
softirqs last  enabled at (260668): [<c0a1c31c>] release_sock+0x94/0x98
softirqs last disabled at (260674): [<c03946f4>] bpf_fd_reuseport_array_update_elem+0xec/0x26c
---[ end trace 0000000000000000 ]---

Reviewing kernel usage of sk->sk_user_data and the current flag bits:

    #define SK_USER_DATA_NOCOPY    1UL
    #define SK_USER_DATA_BPF       2UL
    #define SK_USER_DATA_PSOCK     4UL

reveals that SK_USER_DATA_PSOCK and SK_USER_DATA_BPF both imply
SK_USER_DATA_NOCOPY, and suggests we can instead use an equivalent
2-bit enum like:

    enum sk_user_data {
        SK_USER_DATA_NONE       = 0,
        SK_USER_DATA_NOCOPY     = 1,
        SK_USER_DATA_BPF        = 2,
        SK_USER_DATA_PSOCK      = 3,
    };

Implement this to fix the pointer corruption, and update related call
signatures and comments to clarify the change from multiple flag bits to
an enum value, with a note highlighting the 2-bit limitation.

Fixes: 2a01337 ("net: fix refcount bug in sk_psock_get (2)")
Signed-off-by: Tony Ambardar <[email protected]>
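The 2-bit encoding can be illustrated with a small user-space sketch of pointer tagging. The enum is the one proposed above; tag_user_data(), user_data_ptr(), user_data_kind(), user_data_nocopy() and strip_3bit_flags() are hypothetical helpers, not the kernel's sk_user_data accessors. The sketch assumes that objects referenced from sk_user_data are only guaranteed 4-byte alignment on 32-bit systems (for example, pointer-sized elements of an array like ptrs[]), so only the low 2 bits of the address are free; masking off 3 bits then clobbers bit 2 of a genuine address.

```c
#include <assert.h>
#include <stdint.h>

/* The 2-bit encoding proposed above. */
enum sk_user_data {
	SK_USER_DATA_NONE   = 0,
	SK_USER_DATA_NOCOPY = 1,
	SK_USER_DATA_BPF    = 2,
	SK_USER_DATA_PSOCK  = 3,
};

#define USER_DATA_TAG_MASK ((uintptr_t)3) /* low 2 bits only */

/* Hypothetical helper: store a pointer together with its 2-bit state. */
uintptr_t tag_user_data(void *ptr, enum sk_user_data kind)
{
	/* Requires at least 4-byte alignment of the pointed-to object. */
	assert(((uintptr_t)ptr & USER_DATA_TAG_MASK) == 0);
	return (uintptr_t)ptr | (uintptr_t)kind;
}

void *user_data_ptr(uintptr_t tagged)
{
	return (void *)(tagged & ~USER_DATA_TAG_MASK);
}

enum sk_user_data user_data_kind(uintptr_t tagged)
{
	return (enum sk_user_data)(tagged & USER_DATA_TAG_MASK);
}

/* Per the analysis above, every non-NONE value implies no-copy semantics. */
int user_data_nocopy(uintptr_t tagged)
{
	return user_data_kind(tagged) != SK_USER_DATA_NONE;
}

/* For contrast: stripping 3 flag bits assumes 8-byte alignment. On a
 * 4-byte-aligned address with bit 2 set, it destroys part of the pointer. */
uintptr_t strip_3bit_flags(uintptr_t tagged)
{
	return tagged & ~(uintptr_t)7;
}
```

With a valid 4-byte-aligned address such as 0x1004, the 2-bit helpers round-trip the pointer, whereas strip_3bit_flags() turns it into 0x1000, which is the kind of broken reference described above.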
guidosarducci added a commit to guidosarducci/bpf-ci that referenced this pull request Sep 19, 2025
guidosarducci added a commit to guidosarducci/bpf-ci that referenced this pull request Sep 22, 2025
guidosarducci added a commit to guidosarducci/bpf-ci that referenced this pull request Sep 23, 2025
The Fixes: commit made use of the lower 3 bits of (void *)sk->sk_user_data
for flags, and refactored to simplify adding even more.

This change immediately broke 32-bit usage: in BPF's reuseport_array for
example, 'struct reuseport_array' has an array 'struct sock __rcu *ptrs[]'
whose members must be cleared on socket close via now-broken references
from sk->sk_user_data. This leads to subtle memory corruption and lock
issues that result in kernel hangs and panics while running BPF selftests:

root@qemu-armhf:/usr/libexec/kselftests-bpf# test_progs -a select_reuseport
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_config:PASS:netns_new 0 nsec
kernel-patches#356/1   select_reuseport/reuseport_sockarray IPv4/TCP LOOPBACK test_err_inner_map:OK
[...]
------------[ cut here ]------------
WARNING: CPU: 0 PID: 87 at kernel/locking/lockdep.c:238 __lock_acquire+0xac0/0xd1c
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: bpf_testmod(OE) bpf_preload
CPU: 0 UID: 0 PID: 87 Comm: test_progs Tainted: G           OE       6.17.0-rc1-00233-ge37b36224f81-dirty kernel-patches#114 NONE
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Generic DT based system
Call trace:
 dump_backtrace from show_stack+0x20/0x24
 r7:c01e2ebc r6:00000080 r5:60010093 r4:c14d3d80
 show_stack from dump_stack_lvl+0x90/0xc0
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:c01e2ebc r6:00000009 r5:000000ee r4:c14c5bc4
 dump_stack from __warn+0x8c/0x1b4
 __warn from warn_slowpath_fmt+0x130/0x1a4
 r8:c01e2ebc r7:c14bd144 r6:c14c5bc4 r5:c3cad400 r4:c1cf8a04
 warn_slowpath_fmt from __lock_acquire+0xac0/0xd1c
 r8:c2896b50 r7:00000000 r6:c58863b8 r5:c3cad400 r4:c3cadcc0
 __lock_acquire from lock_acquire.part.0+0xbc/0x240
 r10:00000000 r9:1c0ed000 r8:00000000 r7:60010013 r6:c1b902f0 r5:c1b902f0
 r4:df865cd0
 lock_acquire.part.0 from lock_acquire+0x90/0x168
 r10:c5886100 r9:c46a6c04 r8:00000000 r7:00000000 r6:00000000 r5:00000000
 r4:c58863b8
 lock_acquire from _raw_write_lock_bh+0x54/0x90
 r9:c46a6c04 r8:00000000 r7:00000055 r6:c58863b8 r5:c58863a8 r4:c0394774
 _raw_write_lock_bh from bpf_fd_reuseport_array_update_elem+0x16c/0x26c
 r6:c59a4000 r5:c5191400 r4:c58863a8
 bpf_fd_reuseport_array_update_elem from bpf_map_update_value+0x454/0x5dc
 r10:c329a901 r9:c329a900 r8:c1cf72f0 r7:c3cad400 r6:c595dc00 r5:00000000
 r4:00000000
 bpf_map_update_value from map_update_elem+0x210/0x430
 r10:c329a901 r9:00000004 r8:c595df40 r7:df865ec0 r6:c329a900 r5:c46a6c00
 r4:c46a6cf8
 map_update_elem from __sys_bpf+0x594/0xc94
 r10:00000000 r9:befb18b0 r8:00000051 r7:00000000 r6:00000002 r5:df865eb0
 r4:00000020
 __sys_bpf from sys_bpf+0x34/0x3c
 r10:00000182 r9:c3cad400 r8:c0100234 r7:00000182 r6:00000002 r5:befb18b0
 r4:00000020
 sys_bpf from ret_fast_syscall+0x0/0x1c
Exception stack(0xdf865fa8 to 0xdf865ff0)
5fa0:                   00000020 befb18b0 00000002 befb18b0 00000020 00000000
5fc0: 00000020 befb18b0 00000002 00000182 00839395 b6fa3ce0 00000000 012ac774
5fe0: befb1880 befb1870 00863133 b6ec3312
irq event stamp: 260676
hardirqs last  enabled at (260676): [<c0149fac>] __local_bh_enable_ip+0xc4/0x1b0
hardirqs last disabled at (260675): [<c014a024>] __local_bh_enable_ip+0x13c/0x1b0
softirqs last  enabled at (260668): [<c0a1c31c>] release_sock+0x94/0x98
softirqs last disabled at (260674): [<c03946f4>] bpf_fd_reuseport_array_update_elem+0xec/0x26c
---[ end trace 0000000000000000 ]---

Reviewing kernel usage of sk->sk_user_data and the current flag bits:

    #define SK_USER_DATA_NOCOPY    1UL
    #define SK_USER_DATA_BPF       2UL
    #define SK_USER_DATA_PSOCK     4UL

reveals that SK_USER_DATA_PSOCK and SK_USER_DATA_BPF both imply
SK_USER_DATA_NOCOPY, and suggests we can instead use an equivalent
2-bit enum like:

    enum sk_user_data {
        SK_USER_DATA_NONE       = 0,
        SK_USER_DATA_NOCOPY     = 1,
        SK_USER_DATA_BPF        = 2,
        SK_USER_DATA_PSOCK      = 3,
    };
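The claimed equivalence can be checked mechanically. If BPF and PSOCK each imply NOCOPY, and (as the 4-value enum assumes) are never set together, then the only legal flag words are 0, NOCOPY, NOCOPY|BPF and NOCOPY|PSOCK, which map one-to-one onto the enum. The sketch below is user-space C for illustration only. The `OLD_*` macro names and the `flags_to_enum`/`enum_to_flags` helpers are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Old flag bits, renamed with an OLD_ prefix so they can coexist with the enum. */
#define OLD_NOCOPY 1UL
#define OLD_BPF    2UL
#define OLD_PSOCK  4UL

enum sk_user_data {
    SK_USER_DATA_NONE   = 0,
    SK_USER_DATA_NOCOPY = 1,
    SK_USER_DATA_BPF    = 2,
    SK_USER_DATA_PSOCK  = 3,
};

/* Hypothetical helper: map a legal old flag word onto the enum.
 * Returns false for words the enum cannot represent. */
static bool flags_to_enum(unsigned long flags, enum sk_user_data *out)
{
    switch (flags) {
    case 0:
        *out = SK_USER_DATA_NONE;
        return true;
    case OLD_NOCOPY:
        *out = SK_USER_DATA_NOCOPY;
        return true;
    case OLD_NOCOPY | OLD_BPF:
        *out = SK_USER_DATA_BPF;
        return true;
    case OLD_NOCOPY | OLD_PSOCK:
        *out = SK_USER_DATA_PSOCK;
        return true;
    default:
        /* e.g. BPF without NOCOPY, or BPF and PSOCK together */
        return false;
    }
}

/* Hypothetical inverse: BPF and PSOCK both carry the implied NOCOPY bit. */
static unsigned long enum_to_flags(enum sk_user_data v)
{
    switch (v) {
    case SK_USER_DATA_NOCOPY:
        return OLD_NOCOPY;
    case SK_USER_DATA_BPF:
        return OLD_NOCOPY | OLD_BPF;
    case SK_USER_DATA_PSOCK:
        return OLD_NOCOPY | OLD_PSOCK;
    default:
        return 0;
    }
}
```

Every legal flag word survives a round trip through the enum, while illegal words such as a bare BPF bit (2) are rejected rather than silently mapped.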

Implement this to fix the pointer corruption, and update related call
signatures and comments to clarify the change from multiple flag bits to
an enum value, with a note highlighting the 2-bit limitation.
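The 2-bit limitation comes from alignment: tag bits can only live in address bits that are guaranteed to be zero. On a 32-bit build, a pointer that is only 4-byte aligned has just two such bits, so a third flag bit (the old 4UL) overlaps real address bits. This happens, for example, if sk_user_data points at an element of an array of pointers such as reuseport_array's `ptrs[]`, which is an assumption about the layout here. The sketch below is user-space C. The helpers `tag_ptr`, `untag_ptr`, `ptr_tag` and `three_bit_strip_corrupts`, and the macro `SK_USER_DATA_TAG_MASK`, are hypothetical stand-ins, not the kernel's sk_user_data accessors.

```c
#include <assert.h>
#include <stdint.h>

enum sk_user_data {
    SK_USER_DATA_NONE   = 0,
    SK_USER_DATA_NOCOPY = 1,
    SK_USER_DATA_BPF    = 2,
    SK_USER_DATA_PSOCK  = 3,
};

/* Two low bits: the most a 4-byte-aligned pointer can spare. */
#define SK_USER_DATA_TAG_MASK 3UL

/* Hypothetical: store the enum value in the pointer's low 2 bits. */
static void *tag_ptr(void *p, enum sk_user_data tag)
{
    uintptr_t v = (uintptr_t)p;

    /* The low 2 bits must be free, i.e. p is at least 4-byte aligned. */
    assert((v & SK_USER_DATA_TAG_MASK) == 0);
    return (void *)(v | (uintptr_t)tag);
}

/* Hypothetical: recover the original pointer by clearing the tag bits. */
static void *untag_ptr(void *p)
{
    return (void *)((uintptr_t)p & ~(uintptr_t)SK_USER_DATA_TAG_MASK);
}

/* Hypothetical: read back the enum value stored in the low 2 bits. */
static enum sk_user_data ptr_tag(void *p)
{
    return (enum sk_user_data)((uintptr_t)p & SK_USER_DATA_TAG_MASK);
}

/* Returns 1 if clearing 3 low bits (the old flag scheme) would change
 * p, i.e. the "untagged" pointer would no longer point at the object. */
static int three_bit_strip_corrupts(const void *p)
{
    uintptr_t v = (uintptr_t)p;

    return (v & ~(uintptr_t)7) != v;
}
```

For two adjacent 4-byte-aligned slots, exactly one address has bit 2 set. Stripping 3 bits therefore corrupts one of them, while the 2-bit mask round-trips both.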

Fixes: 2a01337 ("net: fix refcount bug in sk_psock_get (2)")
Signed-off-by: Tony Ambardar <[email protected]>
guidosarducci added a commit to guidosarducci/bpf-ci that referenced this pull request Sep 24, 2025