tools: bpftool: support creating and dumping outer maps #7

Closed
wants to merge 3 commits into from

kernel-patches-bot

Pull request for series with
subject: tools: bpftool: support creating and dumping outer maps
version: 1
url: https://patchwork.ozlabs.org/project/netdev/list/?series=199591

tsipa and others added 3 commits September 4, 2020 12:57
(hash-of-maps or array-of-maps), bpftool does not allow to do so.

It seems that the only reason for that is historical. Lookup support for outer
maps was added in commit 14dc6f0 ("bpf: Add syscall lookup support
for fd array and htab"), and although the relevant code in bpftool had
not been merged yet, I suspect it had already been written with the
assumption that user space could not read outer maps.

Let's remove the restriction; dumping outer maps works with no further
change.

Reported-by: Martynas Pumputis <[email protected]>
Signed-off-by: Quentin Monnet <[email protected]>
---
 tools/bpf/bpftool/map.c | 4 ----
 1 file changed, 4 deletions(-)
hash-of-map in bpftool. This is because the kernel needs an inner_map_fd
to collect metadata on the inner maps to be supported by the new map,
but bpftool does not provide a way to pass this file descriptor.

Add a new optional "inner_map" keyword that can be used to pass a
reference to a map, retrieve a fd to that map, and pass it as the
inner_map_fd.

Add related documentation and bash completion. Note that we can
reference the inner map by its name, meaning the keyword "name" can
appear several times with different meanings (the mandatory outer map
name, and possibly a name used to find the inner_map_fd). The bash
completion will offer it just once, and will not suggest "name" on the
following command:

    # bpftool map create /sys/fs/bpf/my_outer_map type hash_of_maps \
        inner_map name my_inner_map [TAB]

Fixing that specific case seems too convoluted. Completion will work as
expected, however, if the outer map name comes first and the "inner_map
name ..." is passed second.

Signed-off-by: Quentin Monnet <[email protected]>
---
 .../bpf/bpftool/Documentation/bpftool-map.rst | 10 +++-
 tools/bpf/bpftool/bash-completion/bpftool     | 22 ++++++++-
 tools/bpf/bpftool/map.c                       | 48 +++++++++++++------
 3 files changed, 62 insertions(+), 18 deletions(-)
@kernel-patches-bot

At least one diff in series https://patchwork.ozlabs.org/project/netdev/list/?series=199591 expired. Closing PR.

kernel-patches-bot pushed a commit that referenced this pull request Sep 9, 2020
Andrii reported that with latest clang, when building selftests, we have
errors like:
  error: progs/test_sysctl_loop1.c:23:16: in function sysctl_tcp_mem i32 (%struct.bpf_sysctl*):
  Looks like the BPF stack limit of 512 bytes is exceeded.
  Please move large on stack variables into BPF per-cpu array map.

The error is triggered by the following LLVM patch:
  https://reviews.llvm.org/D87134

For example, the following code is from test_sysctl_loop1.c:
  static __always_inline int is_tcp_mem(struct bpf_sysctl *ctx)
  {
    volatile char tcp_mem_name[] = "net/ipv4/tcp_mem/very_very_very_very_long_pointless_string";
    ...
  }
Without the above LLVM patch, the compiler optimized the load of the string
(59 bytes long) into 7 64bit loads, 1 8bit load and 1 16bit load,
occupying 64 bytes of stack.

With the above LLVM patch, the compiler only uses 8bit loads, but each subregister is 32bit,
so the stack requirement becomes 4 * 59 = 236 bytes. Together with other stuff on
the stack, the total stack size exceeds 512 bytes, hence the compiler complains and quits.
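
The byte counts above can be checked with a small user-space sketch. The
helper names are illustrative only, and the 4-bytes-per-8bit-load figure is
taken from the explanation above rather than measured:

```c
#include <assert.h>
#include <stddef.h>

/* The string from test_sysctl_loop1.c; sizeof() includes the trailing NUL. */
static const char tcp_mem_name[] =
	"net/ipv4/tcp_mem/very_very_very_very_long_pointless_string";

/* Hypothetical helper: round a byte count up to a multiple of 8. */
static size_t round_up_8(size_t n)
{
	return (n + 7) & ~(size_t)7;
}

/* Old codegen: 7 64bit loads + 1 16bit load + 1 8bit load. */
static size_t stack_bytes_wide_loads(void)
{
	size_t copied = 7 * 8 + 2 + 1;	/* equals sizeof(tcp_mem_name) */

	return round_up_8(copied);
}

/* New codegen, as described above: one 32bit slot per 8bit load. */
static size_t stack_bytes_byte_loads(void)
{
	return 4 * sizeof(tcp_mem_name);
}
```

With these assumptions, the wide-load variant needs 64 bytes and the
byte-load variant needs 236 bytes for the same 59-byte string.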

Removing the "volatile" keyword or changing "volatile" to
"const"/"static const" does not fix the issue: the string is then put in the .rodata.str1.1 section,
which libbpf does not process, and it errors out with
  libbpf: elf: skipping unrecognized data section(6) .rodata.str1.1
  libbpf: prog 'sysctl_tcp_mem': bad map relo against '.L__const.is_tcp_mem.tcp_mem_name'
          in section '.rodata.str1.1'

Defining the string constant as a global variable fixes the issue, as it puts the string
in the '.rodata' section, which is recognized by libbpf. In the future, when libbpf can process
'.rodata.str*.*' properly, the global definition can be changed back to a local definition.

Defining tcp_mem_name as a global, however, triggered a verifier failure.
   ./test_progs -n 7/21
  libbpf: load bpf program failed: Permission denied
  libbpf: -- BEGIN DUMP LOG ---
  libbpf:
  invalid stack off=0 size=1
  verification time 6975 usec
  stack depth 160+64
  processed 889 insns (limit 1000000) max_states_per_insn 4 total_states
  14 peak_states 14 mark_read 10

  libbpf: -- END LOG --
  libbpf: failed to load program 'sysctl_tcp_mem'
  libbpf: failed to load object 'test_sysctl_loop2.o'
  test_bpf_verif_scale:FAIL:114
  #7/21 test_sysctl_loop2.o:FAIL
This actually exposed a bpf program bug. In test_sysctl_loop{1,2}, we have code
like
  const char tcp_mem_name[] = "<...long string...>";
  ...
  char name[64];
  ...
  for (i = 0; i < sizeof(tcp_mem_name); ++i)
      if (name[i] != tcp_mem_name[i])
          return 0;
In the above code, if sizeof(tcp_mem_name) > 64, the name[i] access may be
out of bounds. The sizeof(tcp_mem_name) is 59 for test_sysctl_loop1.c and
79 for test_sysctl_loop2.c.

Without the promotion-to-global change, the old compiler generates code where
the overflowed stack access actually reads a valid value, hiding
the bpf program bug. With the promotion-to-global change, the code is different:
the previous loading of constants to the stack is gone,
"name" occupies stack[-64:0], and the overflowing access triggers a verifier error.
To fix the issue, adjust the "name" buffer size properly.
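
A minimal user-space sketch of the adjusted buffer, assuming "name" is sized
from sizeof(tcp_mem_name). The string contents and the is_tcp_mem() signature
here are hypothetical stand-ins, not the selftest code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the long sysctl name; the exact text is illustrative. */
static const char tcp_mem_name[] =
	"net/ipv4/tcp_mem/a_deliberately_long_name_used_only_for_this_sketch_padding";

/*
 * "name" is sized from sizeof(tcp_mem_name), so name[i] stays in bounds
 * for every index the comparison loop visits.
 */
static int is_tcp_mem(const char *sysctl_name)
{
	char name[sizeof(tcp_mem_name)];
	size_t i;

	/* Zero the buffer so the copy below is always NUL-terminated. */
	memset(name, 0, sizeof(name));
	strncpy(name, sysctl_name, sizeof(name) - 1);

	for (i = 0; i < sizeof(tcp_mem_name); ++i)
		if (name[i] != tcp_mem_name[i])
			return 0;
	return 1;
}
```

Because the loop bound and the buffer size come from the same sizeof()
expression, changing the string cannot reintroduce the overflow.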

Reported-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
---
 tools/testing/selftests/bpf/progs/test_sysctl_loop1.c | 2 +-
 tools/testing/selftests/bpf/progs/test_sysctl_loop2.c | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

Changelog:
  v1 -> v2:
    . The tcp_mem_name change actually triggers a verifier failure due to
      a bpf program bug. Fixing the bpf program bug can make test pass
      with both old and latest llvm. (Alexei)
kernel-patches-bot pushed a commit that referenced this pull request Sep 9, 2020
Reported-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
---
 tools/testing/selftests/bpf/progs/test_sysctl_loop1.c | 4 ++--
 tools/testing/selftests/bpf/progs/test_sysctl_loop2.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Changelog:
  v2 -> v3:
    . using sizeof(tcp_mem_name) instead of hardcoded value for
      local buf "name". (Andrii)
  v1 -> v2:
    . The tcp_mem_name change actually triggers a verifier failure due to
      a bpf program bug. Fixing the bpf program bug can make test pass
      with both old and latest llvm. (Alexei)
kernel-patches-bot pushed a commit that referenced this pull request Sep 9, 2020
Reported-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
@kernel-patches-bot kernel-patches-bot deleted the series/199591 branch September 15, 2020 17:49
kernel-patches-bot pushed a commit that referenced this pull request Sep 16, 2020
I got the following lockdep splat while testing:

  ======================================================
  WARNING: possible circular locking dependency detected
  5.8.0-rc7-00172-g021118712e59 #932 Not tainted
  ------------------------------------------------------
  btrfs/229626 is trying to acquire lock:
  ffffffff828513f0 (cpu_hotplug_lock){++++}-{0:0}, at: alloc_workqueue+0x378/0x450

  but task is already holding lock:
  ffff889dd3889518 (&fs_info->scrub_lock){+.+.}-{3:3}, at: btrfs_scrub_dev+0x11c/0x630

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #7 (&fs_info->scrub_lock){+.+.}-{3:3}:
	 __mutex_lock+0x9f/0x930
	 btrfs_scrub_dev+0x11c/0x630
	 btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
	 btrfs_ioctl+0x2799/0x30a0
	 ksys_ioctl+0x83/0xc0
	 __x64_sys_ioctl+0x16/0x20
	 do_syscall_64+0x50/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #6 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
	 __mutex_lock+0x9f/0x930
	 btrfs_run_dev_stats+0x49/0x480
	 commit_cowonly_roots+0xb5/0x2a0
	 btrfs_commit_transaction+0x516/0xa60
	 sync_filesystem+0x6b/0x90
	 generic_shutdown_super+0x22/0x100
	 kill_anon_super+0xe/0x30
	 btrfs_kill_super+0x12/0x20
	 deactivate_locked_super+0x29/0x60
	 cleanup_mnt+0xb8/0x140
	 task_work_run+0x6d/0xb0
	 __prepare_exit_to_usermode+0x1cc/0x1e0
	 do_syscall_64+0x5c/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #5 (&fs_info->tree_log_mutex){+.+.}-{3:3}:
	 __mutex_lock+0x9f/0x930
	 btrfs_commit_transaction+0x4bb/0xa60
	 sync_filesystem+0x6b/0x90
	 generic_shutdown_super+0x22/0x100
	 kill_anon_super+0xe/0x30
	 btrfs_kill_super+0x12/0x20
	 deactivate_locked_super+0x29/0x60
	 cleanup_mnt+0xb8/0x140
	 task_work_run+0x6d/0xb0
	 __prepare_exit_to_usermode+0x1cc/0x1e0
	 do_syscall_64+0x5c/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #4 (&fs_info->reloc_mutex){+.+.}-{3:3}:
	 __mutex_lock+0x9f/0x930
	 btrfs_record_root_in_trans+0x43/0x70
	 start_transaction+0xd1/0x5d0
	 btrfs_dirty_inode+0x42/0xd0
	 touch_atime+0xa1/0xd0
	 btrfs_file_mmap+0x3f/0x60
	 mmap_region+0x3a4/0x640
	 do_mmap+0x376/0x580
	 vm_mmap_pgoff+0xd5/0x120
	 ksys_mmap_pgoff+0x193/0x230
	 do_syscall_64+0x50/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #3 (&mm->mmap_lock#2){++++}-{3:3}:
	 __might_fault+0x68/0x90
	 _copy_to_user+0x1e/0x80
	 perf_read+0x141/0x2c0
	 vfs_read+0xad/0x1b0
	 ksys_read+0x5f/0xe0
	 do_syscall_64+0x50/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #2 (&cpuctx_mutex){+.+.}-{3:3}:
	 __mutex_lock+0x9f/0x930
	 perf_event_init_cpu+0x88/0x150
	 perf_event_init+0x1db/0x20b
	 start_kernel+0x3ae/0x53c
	 secondary_startup_64+0xa4/0xb0

  -> #1 (pmus_lock){+.+.}-{3:3}:
	 __mutex_lock+0x9f/0x930
	 perf_event_init_cpu+0x4f/0x150
	 cpuhp_invoke_callback+0xb1/0x900
	 _cpu_up.constprop.26+0x9f/0x130
	 cpu_up+0x7b/0xc0
	 bringup_nonboot_cpus+0x4f/0x60
	 smp_init+0x26/0x71
	 kernel_init_freeable+0x110/0x258
	 kernel_init+0xa/0x103
	 ret_from_fork+0x1f/0x30

  -> #0 (cpu_hotplug_lock){++++}-{0:0}:
	 __lock_acquire+0x1272/0x2310
	 lock_acquire+0x9e/0x360
	 cpus_read_lock+0x39/0xb0
	 alloc_workqueue+0x378/0x450
	 __btrfs_alloc_workqueue+0x15d/0x200
	 btrfs_alloc_workqueue+0x51/0x160
	 scrub_workers_get+0x5a/0x170
	 btrfs_scrub_dev+0x18c/0x630
	 btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
	 btrfs_ioctl+0x2799/0x30a0
	 ksys_ioctl+0x83/0xc0
	 __x64_sys_ioctl+0x16/0x20
	 do_syscall_64+0x50/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  other info that might help us debug this:

  Chain exists of:
    cpu_hotplug_lock --> &fs_devs->device_list_mutex --> &fs_info->scrub_lock

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(&fs_info->scrub_lock);
				 lock(&fs_devs->device_list_mutex);
				 lock(&fs_info->scrub_lock);
    lock(cpu_hotplug_lock);

   *** DEADLOCK ***

  2 locks held by btrfs/229626:
   #0: ffff88bfe8bb86e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_scrub_dev+0xbd/0x630
   #1: ffff889dd3889518 (&fs_info->scrub_lock){+.+.}-{3:3}, at: btrfs_scrub_dev+0x11c/0x630

  stack backtrace:
  CPU: 15 PID: 229626 Comm: btrfs Kdump: loaded Not tainted 5.8.0-rc7-00172-g021118712e59 #932
  Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
  Call Trace:
   dump_stack+0x78/0xa0
   check_noncircular+0x165/0x180
   __lock_acquire+0x1272/0x2310
   lock_acquire+0x9e/0x360
   ? alloc_workqueue+0x378/0x450
   cpus_read_lock+0x39/0xb0
   ? alloc_workqueue+0x378/0x450
   alloc_workqueue+0x378/0x450
   ? rcu_read_lock_sched_held+0x52/0x80
   __btrfs_alloc_workqueue+0x15d/0x200
   btrfs_alloc_workqueue+0x51/0x160
   scrub_workers_get+0x5a/0x170
   btrfs_scrub_dev+0x18c/0x630
   ? start_transaction+0xd1/0x5d0
   btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
   btrfs_ioctl+0x2799/0x30a0
   ? do_sigaction+0x102/0x250
   ? lockdep_hardirqs_on_prepare+0xca/0x160
   ? _raw_spin_unlock_irq+0x24/0x30
   ? trace_hardirqs_on+0x1c/0xe0
   ? _raw_spin_unlock_irq+0x24/0x30
   ? do_sigaction+0x102/0x250
   ? ksys_ioctl+0x83/0xc0
   ksys_ioctl+0x83/0xc0
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x50/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happens because we're allocating the scrub workqueues under the
scrub and device list mutex, which brings in a whole host of other
dependencies.

Because the work queue allocation is done with GFP_KERNEL, it can
trigger reclaim, which can lead to a transaction commit, which in turn
needs the device_list_mutex, so it can lead to a deadlock. This is a different
problem, which this fix also solves.

Fix this by moving the actual allocation outside of the
scrub lock, and then only take the lock once we're ready to actually
assign them to the fs_info.  We'll now have to clean up the workqueues in
a few more places, so I've added a helper to do the refcount dance to
safely free the workqueues.
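
The allocate-outside-the-lock pattern described above can be sketched in user
space with a pthread mutex. This is not the btrfs patch: struct fs_state,
struct workers and both helpers are hypothetical stand-ins, and calloc()
plays the role of the blocking allocation:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for fs_info and its scrub workqueues. */
struct workers {
	int refs;
};

struct fs_state {
	pthread_mutex_t scrub_lock;
	struct workers *scrub_workers;
};

/* Allocate with no lock held, then take the lock only to publish. */
static int scrub_workers_get(struct fs_state *fs)
{
	struct workers *w;

	/* Fast path: already set up, just take a reference. */
	pthread_mutex_lock(&fs->scrub_lock);
	if (fs->scrub_workers) {
		fs->scrub_workers->refs++;
		pthread_mutex_unlock(&fs->scrub_lock);
		return 0;
	}
	pthread_mutex_unlock(&fs->scrub_lock);

	w = calloc(1, sizeof(*w));	/* may block; no lock is held here */
	if (!w)
		return -1;
	w->refs = 1;

	pthread_mutex_lock(&fs->scrub_lock);
	if (!fs->scrub_workers) {
		fs->scrub_workers = w;	/* publish our allocation */
		w = NULL;
	} else {
		fs->scrub_workers->refs++;	/* lost the race; reuse theirs */
	}
	pthread_mutex_unlock(&fs->scrub_lock);

	free(w);	/* no-op unless we lost the race */
	return 0;
}

/* Drop a reference; free outside the lock once the last user is gone. */
static void scrub_workers_put(struct fs_state *fs)
{
	struct workers *to_free = NULL;

	pthread_mutex_lock(&fs->scrub_lock);
	if (fs->scrub_workers && --fs->scrub_workers->refs == 0) {
		to_free = fs->scrub_workers;
		fs->scrub_workers = NULL;
	}
	pthread_mutex_unlock(&fs->scrub_lock);

	free(to_free);
}
```

The point of the shape is that nothing that can sleep or take further locks
runs while scrub_lock is held; the lock only guards the pointer and refcount.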

CC: [email protected] # 5.4+
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Sep 16, 2020
…s metrics" test

Linux 5.9 introduced perf test case "Parse and process metrics" and
on s390 this test case always dumps core:

  [root@t35lp67 perf]# ./perf test -vvvv -F 67
  67: Parse and process metrics                             :
  --- start ---
  metric expr inst_retired.any / cpu_clk_unhalted.thread for IPC
  parsing metric: inst_retired.any / cpu_clk_unhalted.thread
  Segmentation fault (core dumped)
  [root@t35lp67 perf]#

I debugged this core dump and gdb shows this call chain:

  (gdb) where
   #0  0x000003ffabc3192a in __strnlen_c_1 () from /lib64/libc.so.6
   #1  0x000003ffabc293de in strcasestr () from /lib64/libc.so.6
   #2  0x0000000001102ba2 in match_metric(list=0x1e6ea20 "inst_retired.any",
            n=<optimized out>)
       at util/metricgroup.c:368
   #3  find_metric (map=<optimized out>, map=<optimized out>,
           metric=0x1e6ea20 "inst_retired.any")
      at util/metricgroup.c:765
   #4  __resolve_metric (ids=0x0, map=<optimized out>, metric_list=0x0,
           metric_no_group=<optimized out>, m=<optimized out>)
      at util/metricgroup.c:844
   #5  resolve_metric (ids=0x0, map=0x0, metric_list=0x0,
          metric_no_group=<optimized out>)
      at util/metricgroup.c:881
   #6  metricgroup__add_metric (metric=<optimized out>,
        metric_no_group=metric_no_group@entry=false, events=<optimized out>,
        events@entry=0x3ffd84fb878, metric_list=0x0,
        metric_list@entry=0x3ffd84fb868, map=0x0)
      at util/metricgroup.c:943
   #7  0x00000000011034ae in metricgroup__add_metric_list (map=0x13f9828 <map>,
        metric_list=0x3ffd84fb868, events=0x3ffd84fb878,
        metric_no_group=<optimized out>, list=<optimized out>)
      at util/metricgroup.c:988
   #8  parse_groups (perf_evlist=perf_evlist@entry=0x1e70260,
          str=str@entry=0x12f34b2 "IPC", metric_no_group=<optimized out>,
          metric_no_merge=<optimized out>,
          fake_pmu=fake_pmu@entry=0x1462f18 <perf_pmu.fake>,
          metric_events=0x3ffd84fba58, map=0x1)
      at util/metricgroup.c:1040
   #9  0x0000000001103eb2 in metricgroup__parse_groups_test(
  	evlist=evlist@entry=0x1e70260, map=map@entry=0x13f9828 <map>,
  	str=str@entry=0x12f34b2 "IPC",
  	metric_no_group=metric_no_group@entry=false,
  	metric_no_merge=metric_no_merge@entry=false,
  	metric_events=0x3ffd84fba58)
      at util/metricgroup.c:1082
   #10 0x00000000010c84d8 in __compute_metric (ratio2=0x0, name2=0x0,
          ratio1=<synthetic pointer>, name1=0x12f34b2 "IPC",
  	vals=0x3ffd84fbad8, name=0x12f34b2 "IPC")
      at tests/parse-metric.c:159
   #11 compute_metric (ratio=<synthetic pointer>, vals=0x3ffd84fbad8,
  	name=0x12f34b2 "IPC")
      at tests/parse-metric.c:189
   #12 test_ipc () at tests/parse-metric.c:208
.....
..... omitted many more lines

This test case was added with
commit 218ca91 ("perf tests: Add parse metric test for frontend metric").

When I compile with make DEBUG=y it works fine and I do not get a core dump.

It turned out that the function call chain listed above worked on a struct
pmu_event array which requires a trailing element of zeroes, which was
missing. The macro map_for_each_event() loops over that array and tests for members
metric_expr/metric_name/metric_group being non-NULL. Adding this element fixes
the issue.
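
A minimal sketch of the sentinel convention described above, assuming a
cut-down struct with only the three members named in this message. The
count_events() helper is hypothetical and stands in for the loop macro:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for perf's struct pmu_event. */
struct pmu_event {
	const char *metric_expr;
	const char *metric_name;
	const char *metric_group;
};

static const struct pmu_event pme_test[] = {
	{
		.metric_expr = "inst_retired.any / cpu_clk_unhalted.thread",
		.metric_name = "IPC",
	},
	{
		/* Sentinel: members not named here are zero-initialized too. */
		.metric_expr = NULL,
	},
};

/* Walk the table until the all-NULL sentinel, counting real entries. */
static size_t count_events(const struct pmu_event *table)
{
	size_t n = 0;

	while (table[n].metric_expr || table[n].metric_name ||
	       table[n].metric_group)
		n++;
	return n;
}
```

Without the trailing element, the loop would keep reading past the end of
the array until it happened to find three NULL pointers in a row.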

Output after:

  [root@t35lp46 perf]# ./perf test 67
  67: Parse and process metrics                             : Ok
  [root@t35lp46 perf]#

Committer notes:

As Ian remarks, this is not s390 specific:

<quote Ian>
  This also shows up with address sanitizer on all architectures
  (perhaps change the patch title) and perhaps add a "Fixes: <commit>"
  tag.

  =================================================================
  ==4718==ERROR: AddressSanitizer: global-buffer-overflow on address
  0x55c93b4d59e8 at pc 0x55c93a1541e2 bp 0x7ffd24327c60 sp
  0x7ffd24327c58
  READ of size 8 at 0x55c93b4d59e8 thread T0
      #0 0x55c93a1541e1 in find_metric tools/perf/util/metricgroup.c:764:2
      #1 0x55c93a153e6c in __resolve_metric tools/perf/util/metricgroup.c:844:9
      #2 0x55c93a152f18 in resolve_metric tools/perf/util/metricgroup.c:881:9
      #3 0x55c93a1528db in metricgroup__add_metric
  tools/perf/util/metricgroup.c:943:9
      #4 0x55c93a151996 in metricgroup__add_metric_list
  tools/perf/util/metricgroup.c:988:9
      #5 0x55c93a1511b9 in parse_groups tools/perf/util/metricgroup.c:1040:8
      #6 0x55c93a1513e1 in metricgroup__parse_groups_test
  tools/perf/util/metricgroup.c:1082:9
      #7 0x55c93a0108ae in __compute_metric tools/perf/tests/parse-metric.c:159:8
      #8 0x55c93a010744 in compute_metric tools/perf/tests/parse-metric.c:189:9
      #9 0x55c93a00f5ee in test_ipc tools/perf/tests/parse-metric.c:208:2
      #10 0x55c93a00f1e8 in test__parse_metric
  tools/perf/tests/parse-metric.c:345:2
      #11 0x55c939fd7202 in run_test tools/perf/tests/builtin-test.c:410:9
      #12 0x55c939fd6736 in test_and_print tools/perf/tests/builtin-test.c:440:9
      #13 0x55c939fd58c3 in __cmd_test tools/perf/tests/builtin-test.c:661:4
      #14 0x55c939fd4e02 in cmd_test tools/perf/tests/builtin-test.c:807:9
      #15 0x55c939e4763d in run_builtin tools/perf/perf.c:313:11
      #16 0x55c939e46475 in handle_internal_command tools/perf/perf.c:365:8
      #17 0x55c939e4737e in run_argv tools/perf/perf.c:409:2
      #18 0x55c939e45f7e in main tools/perf/perf.c:539:3

  0x55c93b4d59e8 is located 0 bytes to the right of global variable
  'pme_test' defined in 'tools/perf/tests/parse-metric.c:17:25'
  (0x55c93b4d54a0) of size 1352
  SUMMARY: AddressSanitizer: global-buffer-overflow
  tools/perf/util/metricgroup.c:764:2 in find_metric
  Shadow bytes around the buggy address:
    0x0ab9a7692ae0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ab9a7692af0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ab9a7692b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ab9a7692b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ab9a7692b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  =>0x0ab9a7692b30: 00 00 00 00 00 00 00 00 00 00 00 00 00[f9]f9 f9
    0x0ab9a7692b40: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
    0x0ab9a7692b50: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
    0x0ab9a7692b60: f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
    0x0ab9a7692b70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ab9a7692b80: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone:	   fa
    Freed heap region:	   fd
    Stack left redzone:	   f1
    Stack mid redzone:	   f2
    Stack right redzone:     f3
    Stack after return:	   f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:	   f6
    Poisoned by user:        f7
    Container overflow:	   fc
    Array cookie:            ac
    Intra object redzone:    bb
    ASan internal:           fe
    Left alloca redzone:     ca
    Right alloca redzone:    cb
    Shadow gap:              cc
</quote>

I'm also adding the missing "Fixes" tag and setting just .name to NULL,
as doing it that way is more compact (the compiler will zero out
everything else) and the table iterators look for .name being NULL as
the sentinel marking the end of the table.

Fixes: 0a507af ("perf tests: Add parse metric test for ipc metric")
Signed-off-by: Thomas Richter <[email protected]>
Reviewed-by: Sumanth Korikkar <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Sep 24, 2020
Krzysztof Kozlowski says:

====================
nfc: s3fwrn5: Few cleanups

Changes since v2:
1. Fix dtschema ID after rename (patch 1/8).
2. Apply patch 9/9 (defconfig change).

Changes since v1:
1. Rename dtschema file and add additionalProperties:false, as Rob
   suggested,
2. Add Marek's tested-by,
3. New patches: #4, #5, #6, #7 and #9.
====================

Signed-off-by: David S. Miller <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Sep 24, 2020
Commit

  b972fdb ("EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()")

didn't clear all the information from the scanned system and, more
specifically, left ghes_hw.num_dimms to its previous value. On a
second load (CONFIG_DEBUG_TEST_DRIVER_REMOVE=y), the driver would use
the leftover num_dimms value which is not 0 and thus the 0 check in
enumerate_dimms() will get bypassed and it would go directly to the
pointer deref:

  d = &hw->dimms[hw->num_dimms];

which is, of course, NULL:

  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 0 P4D 0
  Oops: 0002 [#1] PREEMPT SMP
  CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #7
  Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018
  RIP: 0010:enumerate_dimms.cold+0x7b/0x375

Reset the whole ghes_hw on driver unregister so that no stale values are
used on a second system scan.
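
A user-space sketch of the stale-counter problem and the reset, under the
assumption that the scan allocates the array only when num_dimms is 0. The
struct layout, MAX_DIMMS and both function signatures are hypothetical
simplifications, not the driver code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DIMMS 4	/* hypothetical capacity for this sketch */

struct dimm {
	int handle;
};

struct hw_info {
	int num_dimms;
	struct dimm *dimms;
};

/* Record one DIMM; the array is allocated only on the first call. */
static int enumerate_dimm(struct hw_info *hw, int handle)
{
	struct dimm *d;

	if (!hw->num_dimms) {
		hw->dimms = calloc(MAX_DIMMS, sizeof(*hw->dimms));
		if (!hw->dimms)
			return -1;
	}
	if (hw->num_dimms >= MAX_DIMMS)
		return -1;

	/* A stale, non-zero num_dimms would skip the allocation above. */
	d = &hw->dimms[hw->num_dimms];
	d->handle = handle;
	hw->num_dimms++;
	return 0;
}

/* Free the array and reset the whole structure, counter included. */
static void hw_unregister(struct hw_info *hw)
{
	free(hw->dimms);
	memset(hw, 0, sizeof(*hw));
}
```

Resetting the whole structure, rather than individual fields, means a later
scan starts from the same state as the first one.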

Fixes: b972fdb ("EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()")
Cc: Shiju Jose <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
kernel-patches-bot pushed a commit that referenced this pull request Sep 24, 2020
The aliases were never released causing the following leaks:

  Indirect leak of 1224 byte(s) in 9 object(s) allocated from:
    #0 0x7feefb830628 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x107628)
    #1 0x56332c8f1b62 in __perf_pmu__new_alias util/pmu.c:322
    #2 0x56332c8f401f in pmu_add_cpu_aliases_map util/pmu.c:778
    #3 0x56332c792ce9 in __test__pmu_event_aliases tests/pmu-events.c:295
    #4 0x56332c792ce9 in test_aliases tests/pmu-events.c:367
    #5 0x56332c76a09b in run_test tests/builtin-test.c:410
    #6 0x56332c76a09b in test_and_print tests/builtin-test.c:440
    #7 0x56332c76ce69 in __cmd_test tests/builtin-test.c:695
    #8 0x56332c76ce69 in cmd_test tests/builtin-test.c:807
    #9 0x56332c7d2214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #10 0x56332c6701a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #11 0x56332c6701a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #12 0x56332c6701a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #13 0x7feefb359cc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: 956a783 ("perf test: Test pmu-events aliases")
Signed-off-by: Namhyung Kim <[email protected]>
Reviewed-by: John Garry <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Sep 24, 2020
The evsel->unit borrows a pointer from a pmu event or alias instead of
owning a string.  But the tool event (duration_time) passes the result of
strdup(), which caused a leak.

It was found by ASAN during metric test:

  Direct leak of 210 byte(s) in 70 object(s) allocated from:
    #0 0x7fe366fca0b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5)
    #1 0x559fbbcc6ea3 in add_event_tool util/parse-events.c:414
    #2 0x559fbbcc6ea3 in parse_events_add_tool util/parse-events.c:1414
    #3 0x559fbbd8474d in parse_events_parse util/parse-events.y:439
    #4 0x559fbbcc95da in parse_events__scanner util/parse-events.c:2096
    #5 0x559fbbcc95da in __parse_events util/parse-events.c:2141
    #6 0x559fbbc28555 in check_parse_id tests/pmu-events.c:406
    #7 0x559fbbc28555 in check_parse_id tests/pmu-events.c:393
    #8 0x559fbbc28555 in check_parse_cpu tests/pmu-events.c:415
    #9 0x559fbbc28555 in test_parsing tests/pmu-events.c:498
    #10 0x559fbbc0109b in run_test tests/builtin-test.c:410
    #11 0x559fbbc0109b in test_and_print tests/builtin-test.c:440
    #12 0x559fbbc03e69 in __cmd_test tests/builtin-test.c:695
    #13 0x559fbbc03e69 in cmd_test tests/builtin-test.c:807
    #14 0x559fbbc691f4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #15 0x559fbbb071a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #16 0x559fbbb071a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #17 0x559fbbb071a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #18 0x7fe366b68cc9 in __libc_start_main ../csu/libc-start.c:308
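
The ownership mismatch described above can be modelled outside perf. The following is only a minimal sketch: `struct fake_evsel` and the three helpers are hypothetical names invented for illustration, not perf code. It shows why storing a strdup() result in a field that is documented as borrowed leaks, and how storing the caller's pointer instead avoids the problem.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an event selector; not the real struct evsel. */
struct fake_evsel {
	const char *unit;	/* borrowed: the storage is owned elsewhere */
};

/* Leaky variant: the duplicate has no owner, so nothing ever frees it. */
static int set_unit_leaky(struct fake_evsel *evsel, const char *unit)
{
	assert(evsel && unit);
	evsel->unit = strdup(unit);
	return evsel->unit ? 0 : -1;
}

/* Borrowing variant: keep the caller's pointer; the caller stays the owner. */
static void set_unit_borrowed(struct fake_evsel *evsel, const char *unit)
{
	assert(evsel && unit);
	evsel->unit = unit;
}

/* Teardown that honours the borrowed contract: unit is deliberately not freed. */
static void fake_evsel_exit(struct fake_evsel *evsel)
{
	assert(evsel);
	evsel->unit = NULL;
}
```

With the borrowing variant, a string literal or any string that outlives the selector can be passed safely. With the leaky variant, the caller would have to remember to free the copy, which the teardown above never does.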

Fixes: f0fbb11 ("perf stat: Implement duration_time as a proper event")
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Sep 24, 2020
test_generic_metric() failed to release the entries in the pctx.  ASAN
reported the following leak (and more):

  Direct leak of 128 byte(s) in 1 object(s) allocated from:
    #0 0x7f4c9396980e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
    #1 0x55f7e748cc14 in hashmap_grow (/home/namhyung/project/linux/tools/perf/perf+0x90cc14)
    #2 0x55f7e748d497 in hashmap__insert (/home/namhyung/project/linux/tools/perf/perf+0x90d497)
    #3 0x55f7e7341667 in hashmap__set /home/namhyung/project/linux/tools/perf/util/hashmap.h:111
    #4 0x55f7e7341667 in expr__add_ref util/expr.c:120
    #5 0x55f7e7292436 in prepare_metric util/stat-shadow.c:783
    #6 0x55f7e729556d in test_generic_metric util/stat-shadow.c:858
    #7 0x55f7e712390b in compute_single tests/parse-metric.c:128
    #8 0x55f7e712390b in __compute_metric tests/parse-metric.c:180
    #9 0x55f7e712446d in compute_metric tests/parse-metric.c:196
    #10 0x55f7e712446d in test_dcache_l2 tests/parse-metric.c:295
    #11 0x55f7e712446d in test__parse_metric tests/parse-metric.c:355
    #12 0x55f7e70be09b in run_test tests/builtin-test.c:410
    #13 0x55f7e70be09b in test_and_print tests/builtin-test.c:440
    #14 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661
    #15 0x55f7e70c101a in cmd_test tests/builtin-test.c:807
    #16 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #17 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #18 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #19 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #20 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: 6d432c4 ("perf tools: Add test_generic_metric function")
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Sep 24, 2020
metricgroup__add_metric() can find multiple matches for a metric group,
and any of them can fail.  It can also fail midway, for example in
resolve_metric(), even for a single metric.

In those cases, the intermediate list and ids are leaked, like:

  Direct leak of 3 byte(s) in 1 object(s) allocated from:
    #0 0x7f4c938f40b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5)
    #1 0x55f7e71c1bef in __add_metric util/metricgroup.c:683
    #2 0x55f7e71c31d0 in add_metric util/metricgroup.c:906
    #3 0x55f7e71c3844 in metricgroup__add_metric util/metricgroup.c:940
    #4 0x55f7e71c488d in metricgroup__add_metric_list util/metricgroup.c:993
    #5 0x55f7e71c488d in parse_groups util/metricgroup.c:1045
    #6 0x55f7e71c60a4 in metricgroup__parse_groups_test util/metricgroup.c:1087
    #7 0x55f7e71235ae in __compute_metric tests/parse-metric.c:164
    #8 0x55f7e7124650 in compute_metric tests/parse-metric.c:196
    #9 0x55f7e7124650 in test_recursion_fail tests/parse-metric.c:318
    #10 0x55f7e7124650 in test__parse_metric tests/parse-metric.c:356
    #11 0x55f7e70be09b in run_test tests/builtin-test.c:410
    #12 0x55f7e70be09b in test_and_print tests/builtin-test.c:440
    #13 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661
    #14 0x55f7e70c101a in cmd_test tests/builtin-test.c:807
    #15 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #16 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #17 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #18 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #19 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308
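
The general pattern behind this kind of fix, releasing everything built so far when a multi-step collection fails midway, can be sketched as follows. This is a hypothetical stand-alone illustration: `struct id_node`, `free_ids()` and `collect_ids()` are invented names, and the `fail_at` parameter only simulates a failure; none of it is the actual metricgroup code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical list node; the real code uses perf's own list and id types. */
struct id_node {
	char *id;
	struct id_node *next;
};

/* Free every node together with the string it owns. */
static void free_ids(struct id_node *head)
{
	while (head) {
		struct id_node *next = head->next;

		free(head->id);
		free(head);
		head = next;
	}
}

/*
 * Collect n ids into a temporary list. fail_at simulates a failure at that
 * index (pass -1 for no failure). On failure, everything built so far is
 * released before returning, so the caller never sees a partial list.
 */
static int collect_ids(const char *const *ids, int n, int fail_at,
		       struct id_node **out)
{
	struct id_node *head = NULL;
	int i;

	assert(ids && out);
	*out = NULL;
	for (i = 0; i < n; i++) {
		struct id_node *node;

		if (i == fail_at)
			goto err;
		node = calloc(1, sizeof(*node));
		if (!node)
			goto err;
		node->id = strdup(ids[i]);
		if (!node->id) {
			free(node);
			goto err;
		}
		node->next = head;	/* prepend to the temporary list */
		head = node;
	}
	*out = head;
	return 0;
err:
	free_ids(head);
	return -1;
}
```

The single `err` label keeps the cleanup in one place, so any later step that can fail only needs a `goto err`.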

Fixes: 83de0b7 ("perf metric: Collect referenced metrics in struct metric_ref_node")
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Sep 24, 2020
The following leaks were detected by ASAN:

  Indirect leak of 360 byte(s) in 9 object(s) allocated from:
    #0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
    #1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333
    #2 0x560578f752fc in perf_pmu_parse util/pmu.y:59
    #3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73
    #4 0x560578e07045 in test__pmu tests/pmu.c:155
    #5 0x560578de109b in run_test tests/builtin-test.c:410
    #6 0x560578de109b in test_and_print tests/builtin-test.c:440
    #7 0x560578de401a in __cmd_test tests/builtin-test.c:661
    #8 0x560578de401a in cmd_test tests/builtin-test.c:807
    #9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: cff7f95 ("perf tests: Move pmu tests into separate object")
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Sep 29, 2020
Andrii Nakryiko says:

====================
This patch set introduces a new set of BTF APIs to libbpf that make it
convenient to produce BTF types and strings. These APIs will allow libbpf to do
more intrusive modifications of a program's BTF (by rewriting it, at least as of
right now), which is necessary for the upcoming libbpf static linking. But
they are complete and generic, so they can be adopted by anyone who needs to
produce BTF type information.

One such example outside of libbpf is pahole, which was converted to
these APIs completely (locally, pending landing of these changes in libbpf).
The conversion reduces the amount of custom pahole code needed and brings nice
savings in memory usage (about 370MB reduction at peak for my kernel
configuration) and even in BTF deduplication time (one second reduction,
23.7s -> 22.7s). Memory savings are due to avoiding pahole's own copy of
"uncompressed" raw BTF data. The time reduction comes from faster string
search and deduplication by relying on a hashmap instead of the BST used by
pahole's own code. Consequently, these APIs are already tested on real-world
complicated kernel BTF, but there is also a pretty extensive selftest doing
extra validations.

Selftests in patch #3 add a set of generic ASSERT_{EQ,STREQ,ERR,OK} macros
that are useful for writing shorter and less repetitive selftests. I decided
to keep them local to that selftest for now, but if they prove to be useful in
more contexts we should move them to test_progs.h. A few more macros (e.g.,
inequality tests) are probably necessary to have a more complete set.
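
To give a rough idea of what such helpers look like, here is a simplified sketch. It is not the selftest's actual code: the `SKETCH_` prefix, the `report()` helper and the messages are hypothetical, and the semantics are assumed (OK means the result is zero, ERR means it is negative). Each macro evaluates to true on success so that checks can be chained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Print a diagnostic when the check fails; return whether it passed. */
static bool report(bool ok, const char *name, const char *what)
{
	if (!ok)
		fprintf(stderr, "%s: %s\n", name, what);
	return ok;
}

/* Hypothetical simplified helpers; each evaluates to true on success. */
#define SKETCH_ASSERT_EQ(actual, expected, name) \
	report((actual) == (expected), (name), "unexpected value")
#define SKETCH_ASSERT_STREQ(actual, expected, name) \
	report(strcmp((actual), (expected)) == 0, (name), "unexpected string")
#define SKETCH_ASSERT_OK(res, name) \
	report((res) == 0, (name), "expected success")
#define SKETCH_ASSERT_ERR(res, name) \
	report((res) < 0, (name), "expected an error")

/* Example test body: evaluation stops at the first failed check. */
static bool example_test(int id, const char *str, int err)
{
	return SKETCH_ASSERT_EQ(id, 1, "first_id") &&
	       SKETCH_ASSERT_STREQ(str, "int", "type_name") &&
	       SKETCH_ASSERT_ERR(err, "bad_add");
}

/* Usage: a passing and a failing run of the example test. */
static void macros_self_check(void)
{
	assert(example_test(1, "int", -22));
	assert(!example_test(2, "int", -22));	/* wrong id is reported */
	assert(SKETCH_ASSERT_OK(0, "ok_call"));
}
```

Compared with open-coded `if` statements and custom messages, each check becomes one line, which is the kind of reduction in repetition the cover letter refers to.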

Cc: Arnaldo Carvalho de Melo <[email protected]>

v2->v3:
  - resending original patches #7-9 as patches #1-3 due to merge conflict;

v1->v2:
  - fixed comments (John);
  - renamed btf__append_xxx() into btf__add_xxx() (Alexei);
  - added btf__find_str() in addition to btf__add_str();
  - btf__new_empty() now sets kernel FD to -1 initially.
====================

Signed-off-by: Alexei Starovoitov <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Oct 2, 2020
Ido Schimmel says:

====================
drop_monitor: Convert to use devlink tracepoint

Drop monitor is able to monitor both software and hardware originated
drops. Software drops are monitored by having drop monitor register its
probe on the 'kfree_skb' tracepoint. Hardware originated drops are
monitored by having devlink call into drop monitor whenever it receives
a dropped packet from the underlying hardware.

This patch set converts drop monitor to monitor both software and
hardware originated drops in the same way - by registering its probe on
the relevant tracepoint.

In addition to drop monitor being more consistent, it is now also
possible to build drop monitor as a module instead of as a builtin and
still monitor hardware originated drops. Initially, CONFIG_NET_DEVLINK
implied CONFIG_NET_DROP_MONITOR, but after commit def2fbf
("kconfig: allow symbols implied by y to become m") we can have
CONFIG_NET_DEVLINK=y and CONFIG_NET_DROP_MONITOR=m and, before this
patch set, hardware originated drops would not be monitored in that
configuration.

Patch set overview:

Patch #1 adds a tracepoint in devlink for trap reports.

Patch #2 prepares probe functions in drop monitor for the new
tracepoint.

Patch #3 converts drop monitor to use the new tracepoint.

Patches #4-#6 perform cleanups after the conversion.

Patch #7 adds a test case for drop monitor. Both software originated
drops and hardware originated drops (using netdevsim) are tested.

Tested:

| CONFIG_NET_DEVLINK | CONFIG_NET_DROP_MONITOR | Build | SW drops | HW drops |
| -------------------|-------------------------|-------|----------|----------|
|          y         |            y            |   v   |     v    |     v    |
|          y         |            m            |   v   |     v    |     v    |
|          y         |            n            |   v   |     x    |     x    |
|          n         |            y            |   v   |     v    |     x    |
|          n         |            m            |   v   |     v    |     x    |
|          n         |            n            |   v   |     x    |     x    |
====================

Signed-off-by: David S. Miller <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Oct 3, 2020
With the latest llvm trunk, bpf programs under the samples/bpf
directory that use CORE may hit the following
error:

LLVM ERROR: Cannot select: intrinsic %llvm.preserve.struct.access.index
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llc -march=bpf -filetype=obj -o samples/bpf/test_probe_write_user_kern.o
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'BPF DAG->DAG Pattern Instruction Selection' on function '@bpf_prog1'
 #0 0x000000000183c26c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int)
    (/data/users/yhs/work/llvm-project/llvm/build.cur/install/bin/llc+0x183c26c)
...
 #7 0x00000000017c375e (/data/users/yhs/work/llvm-project/llvm/build.cur/install/bin/llc+0x17c375e)
 #8 0x00000000016a75c5 llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*)
    (/data/users/yhs/work/llvm-project/llvm/build.cur/install/bin/llc+0x16a75c5)
 #9 0x00000000016ab4f8 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*,
    unsigned int) (/data/users/yhs/work/llvm-project/llvm/build.cur/install/bin/llc+0x16ab4f8)
...
Aborted (core dumped) | llc -march=bpf -filetype=obj -o samples/bpf/test_probe_write_user_kern.o

The cause is llvm change https://reviews.llvm.org/D87153,
which moved CORE relocation global generation from the beginning
of target dependent optimization (llc) to the beginning
of target independent optimization (opt).

Since samples/bpf programs do not use vmlinux.h and their clang compilation
uses the native architecture, we need to adjust the arch triple at the opt
level to do CORE relocation global generation properly. Otherwise, the above
error will appear.

This patch fixes the issue by introducing opt and llvm-dis into the
compilation chain, which will do proper CORE relocation global generation as
well as O2 level optimization. Tested with llvm10, llvm11 and trunk/llvm12.

Signed-off-by: Yonghong Song <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Oct 5, 2020
With the latest llvm trunk, bpf programs under the samples/bpf
directory that use CORE may hit the following
error:

LLVM ERROR: Cannot select: intrinsic %llvm.preserve.struct.access.index
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llc -march=bpf -filetype=obj -o samples/bpf/test_probe_write_user_kern.o
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'BPF DAG->DAG Pattern Instruction Selection' on function '@bpf_prog1'
 #0 0x000000000183c26c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int)
    (/data/users/yhs/work/llvm-project/llvm/build.cur/install/bin/llc+0x183c26c)
...
 #7 0x00000000017c375e (/data/users/yhs/work/llvm-project/llvm/build.cur/install/bin/llc+0x17c375e)
 #8 0x00000000016a75c5 llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*)
    (/data/users/yhs/work/llvm-project/llvm/build.cur/install/bin/llc+0x16a75c5)
 #9 0x00000000016ab4f8 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*,
    unsigned int) (/data/users/yhs/work/llvm-project/llvm/build.cur/install/bin/llc+0x16ab4f8)
...
Aborted (core dumped) | llc -march=bpf -filetype=obj -o samples/bpf/test_probe_write_user_kern.o

The cause is llvm change https://reviews.llvm.org/D87153,
which moved CORE relocation global generation from the beginning
of target dependent optimization (llc) to the beginning
of target independent optimization (opt).

Since samples/bpf programs do not use vmlinux.h and their clang compilation
uses the native architecture, we need to adjust the arch triple at the opt
level to do CORE relocation global generation properly. Otherwise, the above
error will appear.

This patch fixes the issue by introducing opt and llvm-dis into the
compilation chain, which will do proper CORE relocation global generation as
well as O2 level optimization. Tested with llvm10, llvm11 and trunk/llvm12.

Signed-off-by: Yonghong Song <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Apr 15, 2025
As shown in [1], it is possible to corrupt a BPF ELF file such that
arbitrary BPF instructions are loaded by libbpf. This can be done by
setting a symbol (BPF program) section offset to a large (unsigned)
number such that <section start + symbol offset> overflows and points
before the section data in the memory.

Consider the situation below where:
- prog_start = sec_start + symbol_offset    <-- size_t overflow here
- prog_end   = prog_start + prog_size

    prog_start        sec_start        prog_end        sec_end
        |                |                 |              |
        v                v                 v              v
    .....................|################################|............

The report in [1] also provides a corrupted BPF ELF which can be used as
a reproducer:

    $ readelf -S crash
    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
    ...
      [ 2] uretprobe.mu[...] PROGBITS         0000000000000000  00000040
           0000000000000068  0000000000000000  AX       0     0     8

    $ readelf -s crash
    Symbol table '.symtab' contains 8 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
    ...
         6: ffffffffffffffb8   104 FUNC    GLOBAL DEFAULT    2 handle_tp

Here, the handle_tp prog has section offset ffffffffffffffb8, i.e. will
point before the actual memory where section 2 is allocated.

This is also reported by AddressSanitizer:

    =================================================================
    ==1232==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c7302fe0000 at pc 0x7fc3046e4b77 bp 0x7ffe64677cd0 sp 0x7ffe64677490
    READ of size 104 at 0x7c7302fe0000 thread T0
        #0 0x7fc3046e4b76 in memcpy (/lib64/libasan.so.8+0xe4b76)
        #1 0x00000040df3e in bpf_object__init_prog /src/libbpf/src/libbpf.c:856
        #2 0x00000040df3e in bpf_object__add_programs /src/libbpf/src/libbpf.c:928
        #3 0x00000040df3e in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3930
        #4 0x00000040df3e in bpf_object_open /src/libbpf/src/libbpf.c:8067
        #5 0x00000040f176 in bpf_object__open_file /src/libbpf/src/libbpf.c:8090
        #6 0x000000400c16 in main /poc/poc.c:8
        #7 0x7fc3043d25b4 in __libc_start_call_main (/lib64/libc.so.6+0x35b4)
        #8 0x7fc3043d2667 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x3667)
        #9 0x000000400b34 in _start (/poc/poc+0x400b34)

    0x7c7302fe0000 is located 64 bytes before 104-byte region [0x7c7302fe0040,0x7c7302fe00a8)
    allocated by thread T0 here:
        #0 0x7fc3046e716b in malloc (/lib64/libasan.so.8+0xe716b)
        #1 0x7fc3045ee600 in __libelf_set_rawdata_wrlock (/lib64/libelf.so.1+0xb600)
        #2 0x7fc3045ef018 in __elf_getdata_rdlock (/lib64/libelf.so.1+0xc018)
        #3 0x00000040642f in elf_sec_data /src/libbpf/src/libbpf.c:3740

The problem here is that currently, libbpf only checks that the program
end is within the section bounds. There used to be a check
`while (sec_off < sec_sz)` in bpf_object__add_programs, however, it was
removed by commit 6245947 ("libbpf: Allow gaps in BPF program
sections to support overriden weak functions").

Add a check for detecting the overflow of `sec_off + prog_sz` to
bpf_object__init_prog to fix this issue.
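
The effect of the wraparound can be reproduced with plain integer arithmetic. The sketch below is not the libbpf patch itself; the function names are hypothetical and `uint64_t` stands in for `size_t` on a 64-bit build. It contrasts an end-only bounds check with one that cannot overflow, using the values from the reproducer above (section size 0x68, symbol offset 0xffffffffffffffb8, program size 104).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Naive check: only verifies that the program end lies inside the section.
 * sec_off + prog_sz may wrap around, making a huge offset look valid.
 */
static bool prog_fits_naive(uint64_t sec_sz, uint64_t sec_off, uint64_t prog_sz)
{
	return sec_off + prog_sz <= sec_sz;
}

/*
 * Overflow-safe check: never computes sec_off + prog_sz. Once sec_off is
 * known to be within the section, sec_sz - sec_off cannot underflow.
 */
static bool prog_fits_checked(uint64_t sec_sz, uint64_t sec_off, uint64_t prog_sz)
{
	return sec_off <= sec_sz && prog_sz <= sec_sz - sec_off;
}

/* Usage: the reproducer's values wrap to an end offset of 0x20. */
static void overflow_self_check(void)
{
	const uint64_t sec_sz = 0x68, sec_off = 0xffffffffffffffb8ULL, prog_sz = 104;

	assert(sec_off + prog_sz == 0x20);			/* wrapped sum */
	assert(prog_fits_naive(sec_sz, sec_off, prog_sz));	/* wrongly accepted */
	assert(!prog_fits_checked(sec_sz, sec_off, prog_sz));	/* rejected */
	assert(prog_fits_checked(sec_sz, 0, prog_sz));		/* valid program */
}
```

Comparing against the remaining space `sec_sz - sec_off` is one common way to avoid the overflow; compiler builtins such as `__builtin_add_overflow` in GCC and Clang are an alternative.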

[1] https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md

Fixes: 6245947 ("libbpf: Allow gaps in BPF program sections to support overriden weak functions")
Reported-by: lmarch2 <[email protected]>
Signed-off-by: Viktor Malik <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Reviewed-by: Shung-Hsi Yu <[email protected]>
Link: https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md
Link: https://lore.kernel.org/bpf/[email protected]
olsajiri pushed a commit to olsajiri/bpf that referenced this pull request May 15, 2025
Without CONFIG_DRM_XE_GPUSVM set, GPU SVM is not initialized, so the
warning below is triggered. Make the flush work code conditional on the
config option to avoid the warning:
"
[  453.132028] ------------[ cut here ]------------
[  453.132527] WARNING: CPU: 9 PID: 4491 at kernel/workqueue.c:4205 __flush_work+0x379/0x3a0
[  453.133355] Modules linked in: xe drm_ttm_helper ttm gpu_sched drm_buddy drm_suballoc_helper drm_gpuvm drm_exec
[  453.134352] CPU: 9 UID: 0 PID: 4491 Comm: xe_exec_mix_mod Tainted: G     U  W           6.15.0-rc3+ #7 PREEMPT(full)
[  453.135405] Tainted: [U]=USER, [W]=WARN
...
[  453.136921] RIP: 0010:__flush_work+0x379/0x3a0
[  453.137417] Code: 8b 45 00 48 8b 55 08 89 c7 48 c1 e8 04 83 e7 08 83 e0 0f 83 cf 02 89 c6 48 0f ba 6d 00 03 e9 d5 fe ff ff 0f 0b e9 db fd ff ff <0f> 0b 45 31 e4 e9 d1 fd ff ff 0f 0b e9 03 ff ff ff 0f 0b e9 d6 fe
[  453.139250] RSP: 0018:ffffc90000c67b18 EFLAGS: 00010246
[  453.139782] RAX: 0000000000000000 RBX: ffff888108a24000 RCX: 0000000000002000
[  453.140521] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8881016d61c8
[  453.141253] RBP: ffff8881016d61c8 R08: 0000000000000000 R09: 0000000000000000
[  453.141985] R10: 0000000000000000 R11: 0000000008a24000 R12: 0000000000000001
[  453.142709] R13: 0000000000000002 R14: 0000000000000000 R15: ffff888107db8c00
[  453.143450] FS:  00007f44853d4c80(0000) GS:ffff8882f469b000(0000) knlGS:0000000000000000
[  453.144276] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  453.144853] CR2: 00007f4487629228 CR3: 00000001016aa000 CR4: 00000000000406f0
[  453.145594] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  453.146320] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  453.147061] Call Trace:
[  453.147336]  <TASK>
[  453.147579]  ? tick_nohz_tick_stopped+0xd/0x30
[  453.148067]  ? xas_load+0x9/0xb0
[  453.148435]  ? xa_load+0x6f/0xb0
[  453.148781]  __xe_vm_bind_ioctl+0xbd5/0x1500 [xe]
[  453.149338]  ? dev_printk_emit+0x48/0x70
[  453.149762]  ? _dev_printk+0x57/0x80
[  453.150148]  ? drm_ioctl+0x17c/0x440
[  453.150544]  ? __drm_dev_vprintk+0x36/0x90
[  453.150983]  ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe]
[  453.151575]  ? drm_ioctl_kernel+0x9f/0xf0
[  453.151998]  ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe]
[  453.152560]  drm_ioctl_kernel+0x9f/0xf0
[  453.152968]  drm_ioctl+0x20f/0x440
[  453.153332]  ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe]
[  453.153893]  ? ioctl_has_perm.constprop.0.isra.0+0xae/0x100
[  453.154489]  ? memory_bm_test_bit+0x5/0x60
[  453.154935]  xe_drm_ioctl+0x47/0x70 [xe]
[  453.155419]  __x64_sys_ioctl+0x8d/0xc0
[  453.155824]  do_syscall_64+0x47/0x110
[  453.156228]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
"

v2 (Matt):
    refine commit message to have more details
    add Fixes tag
    move the code to xe_svm.h, which already has the config
    remove a blank line per codestyle suggestion

Fixes: 63f6e48 ("drm/xe: Add SVM garbage collector")
Cc: Matthew Brost <[email protected]>
Signed-off-by: Shuicheng Lin <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
(cherry picked from commit 9d80698)
Signed-off-by: Lucas De Marchi <[email protected]>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jul 5, 2025
There are two sites in the atm mpoa code that assume the fetched
net_device object is of lec type. However, both of them only check
that the device name starts with the "lec" pattern string.

As a result, a malicious user can exploit this by creating another device
whose name starts with that pattern, thereby causing type confusion. For
example, creating a *team* interface with a lecX name, binding that interface
and sending messages produces a crash like the one below:

[   18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[   18.452366] BUG: unable to handle page fault for address: ffff888005702a70
[   18.454253] #PF: supervisor instruction fetch in kernel mode
[   18.455058] #PF: error_code(0x0011) - permissions violation
[   18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3
[   18.455725] Oops: 0011 [#1] PREEMPT SMP PTI
[   18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 #7
[   18.456921] RIP: 0010:0xffff888005702a70
[   18.457151] Code: .....
[   18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286
[   18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b
[   18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000
[   18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000
[   18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000
[   18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000
[   18.460425] FS:  0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[   18.460872] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0
[   18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   18.462368] Call Trace:
[   18.462518]  <TASK>
[   18.462645]  ? __die_body+0x64/0xb0
[   18.462856]  ? page_fault_oops+0x353/0x3e0
[   18.463092]  ? exc_page_fault+0xaf/0xd0
[   18.463322]  ? asm_exc_page_fault+0x22/0x30
[   18.463589]  ? msg_from_mpoad+0x431/0x9d0
[   18.463820]  ? vcc_sendmsg+0x165/0x3b0
[   18.464031]  vcc_sendmsg+0x20a/0x3b0
[   18.464238]  ? wake_bit_function+0x80/0x80
[   18.464511]  __sys_sendto+0x38c/0x3a0
[   18.464729]  ? percpu_counter_add_batch+0x87/0xb0
[   18.465002]  __x64_sys_sendto+0x22/0x30
[   18.465219]  do_syscall_64+0x6c/0xa0
[   18.465465]  ? preempt_count_add+0x54/0xb0
[   18.465697]  ? up_read+0x37/0x80
[   18.465883]  ? do_user_addr_fault+0x25e/0x5b0
[   18.466126]  ? exit_to_user_mode_prepare+0x12/0xb0
[   18.466435]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   18.466727] RIP: 0033:0x785e61be4407
[   18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[   18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407
[   18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003
[   18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000
[   18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[   18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98

There are several ways to validate the net_device object correctly. For
example, function xgbe_netdev_event() checks the `netdev_ops` field, and
function clip_device_event() checks the `type` field. Since the
related variable `lec_netdev_ops` is not defined in the same file,
introduce another type value `ARPHRD_ATM_LANE` for a simple and correct
check.
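
The difference between the two kinds of check can be shown with a small user-space model. Everything in this sketch is hypothetical: `struct fake_net_device` is a heavily reduced stand-in for the kernel's net_device, and the `FAKE_ARPHRD_*` constants are placeholder values for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Placeholder values for this sketch only; not the kernel's constants. */
#define FAKE_ARPHRD_ATM_LANE 1
#define FAKE_ARPHRD_OTHER    2

/* Hypothetical, heavily reduced model of a network device. */
struct fake_net_device {
	char name[16];		/* chosen by the user who creates the device */
	unsigned short type;	/* set by the driver that owns the device */
};

/* Name-based check: any device whose name starts with "lec" passes. */
static bool looks_like_lec_by_name(const struct fake_net_device *dev)
{
	return strncmp(dev->name, "lec", 3) == 0;
}

/* Type-based check: only devices set up with the LANE type pass. */
static bool is_lec_by_type(const struct fake_net_device *dev)
{
	return dev->type == FAKE_ARPHRD_ATM_LANE;
}

/* Usage: a non-lec device named "lec0" fools only the name check. */
static void lec_self_check(void)
{
	struct fake_net_device team = { "lec0", FAKE_ARPHRD_OTHER };
	struct fake_net_device lane = { "lec0", FAKE_ARPHRD_ATM_LANE };

	assert(looks_like_lec_by_name(&team));	/* the name check is fooled */
	assert(!is_lec_by_type(&team));		/* the type check is not */
	assert(is_lec_by_type(&lane));
}
```

The name is under the control of whoever creates the interface, while the type is assigned by the driver, which is why the type is the safer discriminator before casting private data.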

By the way, this bug dates back to pre-git history (2.3.15), hence the
first commit in the git history is used as the reference for tracking.

Signed-off-by: Lin Ma <[email protected]>
Fixes: 1da177e ("Linux-2.6.12-rc2")
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jul 5, 2025
Two sites in the atm mpoa code assume that the fetched net_device object
is of lec type. However, both of them only check that the device name
starts with the "lec" prefix.

As a result, a malicious user can exploit this by creating another
device whose name starts with that prefix, causing type confusion. For
example, creating a *team* interface named lecX, binding that interface,
and sending messages triggers a crash like the one below:

[   18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[   18.452366] BUG: unable to handle page fault for address: ffff888005702a70
[   18.454253] #PF: supervisor instruction fetch in kernel mode
[   18.455058] #PF: error_code(0x0011) - permissions violation
[   18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3
[   18.455725] Oops: 0011 [#1] PREEMPT SMP PTI
[   18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 #7
[   18.456921] RIP: 0010:0xffff888005702a70
[   18.457151] Code: .....
[   18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286
[   18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b
[   18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000
[   18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000
[   18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000
[   18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000
[   18.460425] FS:  0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[   18.460872] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0
[   18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   18.462368] Call Trace:
[   18.462518]  <TASK>
[   18.462645]  ? __die_body+0x64/0xb0
[   18.462856]  ? page_fault_oops+0x353/0x3e0
[   18.463092]  ? exc_page_fault+0xaf/0xd0
[   18.463322]  ? asm_exc_page_fault+0x22/0x30
[   18.463589]  ? msg_from_mpoad+0x431/0x9d0
[   18.463820]  ? vcc_sendmsg+0x165/0x3b0
[   18.464031]  vcc_sendmsg+0x20a/0x3b0
[   18.464238]  ? wake_bit_function+0x80/0x80
[   18.464511]  __sys_sendto+0x38c/0x3a0
[   18.464729]  ? percpu_counter_add_batch+0x87/0xb0
[   18.465002]  __x64_sys_sendto+0x22/0x30
[   18.465219]  do_syscall_64+0x6c/0xa0
[   18.465465]  ? preempt_count_add+0x54/0xb0
[   18.465697]  ? up_read+0x37/0x80
[   18.465883]  ? do_user_addr_fault+0x25e/0x5b0
[   18.466126]  ? exit_to_user_mode_prepare+0x12/0xb0
[   18.466435]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   18.466727] RIP: 0033:0x785e61be4407
[   18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[   18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407
[   18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003
[   18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000
[   18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[   18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98

There are several ways to validate the net_device object correctly. For
example, xgbe_netdev_event() checks the `netdev_ops` field, and
clip_device_event() checks the `type` field. Because the relevant
variable `lec_netdev_ops` is not defined in the same file, introduce a
new type value, `ARPHRD_ATM_LANE`, for a simple and correct check.
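
The difference between the two checks can be sketched in plain userspace C.
This is only an illustration, not the patch itself: `struct demo_net_device`,
both helper functions, and the `DEMO_` constants are hypothetical stand-ins,
and the numeric value used for the LANE type is a placeholder rather than the
value the patch assigns to `ARPHRD_ATM_LANE`.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; the real structure is
 * struct net_device and the real constants are the kernel's ARPHRD_* values. */
#define DEMO_ARPHRD_ETHER    1      /* type an unrelated device might carry */
#define DEMO_ARPHRD_ATM_LANE 0x7FFE /* placeholder, not the patch's value */

struct demo_net_device {
    char name[16];
    unsigned short type;
};

/* Name-based check: accepts any device whose name begins with "lec",
 * so a device of another kind named "lec0" passes. */
bool looks_like_lec_by_name(const struct demo_net_device *dev)
{
    return strncmp(dev->name, "lec", 3) == 0;
}

/* Type-based check: accepts only devices that carry the dedicated type
 * value, regardless of what the device is called. */
bool is_lec_by_type(const struct demo_net_device *dev)
{
    return dev->type == DEMO_ARPHRD_ATM_LANE;
}
```

With a device named `lec0` whose type is `DEMO_ARPHRD_ETHER`, the name-based
helper returns true while the type-based helper returns false, which is the
confusion the commit message describes.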

This bug dates back to pre-git history (2.3.15), so the Fixes tag uses
the first commit in the git history for tracking.

Signed-off-by: Lin Ma <[email protected]>
Fixes: 1da177e ("Linux-2.6.12-rc2")
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jul 5, 2025
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jul 5, 2025
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jul 5, 2025
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jul 6, 2025
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jul 6, 2025
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jul 6, 2025
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jul 6, 2025
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jul 6, 2025
Two sites in the atm mpoa code assume that the fetched net_device object
is of lec type. However, both of them only check that the device name
starts with the "lec" prefix.

As a result, a malicious user can exploit this by creating another
device whose name starts with that prefix, causing type confusion. For
example, creating a *team* interface named lecX, binding that interface,
and sending messages triggers a crash like the one below:

[   18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[   18.452366] BUG: unable to handle page fault for address: ffff888005702a70
[   18.454253] #PF: supervisor instruction fetch in kernel mode
[   18.455058] #PF: error_code(0x0011) - permissions violation
[   18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3
[   18.455725] Oops: 0011 [#1] PREEMPT SMP PTI
[   18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 #7
[   18.456921] RIP: 0010:0xffff888005702a70
[   18.457151] Code: .....
[   18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286
[   18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b
[   18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000
[   18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000
[   18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000
[   18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000
[   18.460425] FS:  0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[   18.460872] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0
[   18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   18.462368] Call Trace:
[   18.462518]  <TASK>
[   18.462645]  ? __die_body+0x64/0xb0
[   18.462856]  ? page_fault_oops+0x353/0x3e0
[   18.463092]  ? exc_page_fault+0xaf/0xd0
[   18.463322]  ? asm_exc_page_fault+0x22/0x30
[   18.463589]  ? msg_from_mpoad+0x431/0x9d0
[   18.463820]  ? vcc_sendmsg+0x165/0x3b0
[   18.464031]  vcc_sendmsg+0x20a/0x3b0
[   18.464238]  ? wake_bit_function+0x80/0x80
[   18.464511]  __sys_sendto+0x38c/0x3a0
[   18.464729]  ? percpu_counter_add_batch+0x87/0xb0
[   18.465002]  __x64_sys_sendto+0x22/0x30
[   18.465219]  do_syscall_64+0x6c/0xa0
[   18.465465]  ? preempt_count_add+0x54/0xb0
[   18.465697]  ? up_read+0x37/0x80
[   18.465883]  ? do_user_addr_fault+0x25e/0x5b0
[   18.466126]  ? exit_to_user_mode_prepare+0x12/0xb0
[   18.466435]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   18.466727] RIP: 0033:0x785e61be4407
[   18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[   18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407
[   18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003
[   18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000
[   18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[   18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98
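
As an aside for readers of the log above, the `error_code(0x0011)` value can be decoded by hand: bit 0 marks a protection violation and bit 4 marks an instruction fetch, which matches the "supervisor instruction fetch" and "permissions violation" lines. Below is a minimal userspace sketch in C that follows the x86 page-fault error code bit layout. The `PF_*` macro names, `struct pf_info`, and `decode_pf_error_code()` are illustrative names chosen here, not kernel identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits (illustrative macro names). */
#define PF_PROT  (1u << 0) /* 1: protection violation, 0: page not present */
#define PF_WRITE (1u << 1) /* 1: write access, 0: read access */
#define PF_USER  (1u << 2) /* 1: fault raised in user mode, 0: supervisor */
#define PF_RSVD  (1u << 3) /* 1: reserved bit set in a paging entry */
#define PF_INSTR (1u << 4) /* 1: fault caused by an instruction fetch */

/* Hypothetical helper type: one flag per decoded bit. */
struct pf_info {
	bool protection_violation;
	bool write;
	bool user_mode;
	bool reserved_bit;
	bool instruction_fetch;
};

/* Split a raw error code such as 0x0011 into its individual flags. */
static struct pf_info decode_pf_error_code(uint32_t code)
{
	struct pf_info info = {
		.protection_violation = (code & PF_PROT) != 0,
		.write                = (code & PF_WRITE) != 0,
		.user_mode            = (code & PF_USER) != 0,
		.reserved_bit         = (code & PF_RSVD) != 0,
		.instruction_fetch    = (code & PF_INSTR) != 0,
	};
	return info;
}
```

Calling `decode_pf_error_code(0x0011)` sets only the protection-violation and instruction-fetch flags, i.e. the kernel tried to execute a page that is mapped but not executable.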

There are several ways to validate the net_device object correctly. For
example, xgbe_netdev_event() checks the `netdev_ops` field, and
clip_device_event() checks the `type` field. Because the relevant
variable `lec_netdev_ops` is not defined in the same file, introduce a
new type value `ARPHRD_ATM_LANE` for a simple and correct check.
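
The difference between the name check and the type check can be sketched in userspace C. This is only a mock under stated assumptions: `struct mock_net_device`, the two helper functions, and the numeric value of `MOCK_ARPHRD_ATM_LANE` are hypothetical stand-ins, not the kernel's definitions and not the actual patch.

```c
#include <stdbool.h>
#include <string.h>

#define ARPHRD_ETHER 1               /* Ethernet hardware type */
#define MOCK_ARPHRD_ATM_LANE 0xfff0  /* placeholder value, NOT the real one */

/* Hypothetical, heavily reduced stand-in for struct net_device. */
struct mock_net_device {
	char name[16];
	unsigned short type;
};

/* Old-style check: trusts any device whose name starts with "lec". */
static bool looks_like_lec_by_name(const struct mock_net_device *dev)
{
	return strncmp(dev->name, "lec", 3) == 0;
}

/* Type-based check: the name is ignored, only the device type counts. */
static bool is_lec_by_type(const struct mock_net_device *dev)
{
	return dev->type == MOCK_ARPHRD_ATM_LANE;
}
```

With this mock, a device named `lec0` whose type is `ARPHRD_ETHER` passes the name check but fails the type check, which is the case the fix is meant to reject.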

This bug dates back to pre-git history (2.3.15), so the Fixes tag uses
the first commit in the git history for tracking.

Signed-off-by: Lin Ma <[email protected]>
Fixes: 1da177e ("Linux-2.6.12-rc2")
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jul 7, 2025
Two sites in the atm mpoa code assume that the fetched net_device
object is of lec type. However, both of them only check that the device
name starts with the "lec" pattern string.

A malicious user can therefore hijack this by creating another device
whose name starts with that pattern, causing type confusion. For example,
creating a *team* interface named lecX, binding that interface, and
sending messages triggers a crash like the one below:

[   18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[   18.452366] BUG: unable to handle page fault for address: ffff888005702a70
[   18.454253] #PF: supervisor instruction fetch in kernel mode
[   18.455058] #PF: error_code(0x0011) - permissions violation
[   18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3
[   18.455725] Oops: 0011 [#1] PREEMPT SMP PTI
[   18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 #7
[   18.456921] RIP: 0010:0xffff888005702a70
[   18.457151] Code: .....
[   18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286
[   18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b
[   18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000
[   18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000
[   18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000
[   18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000
[   18.460425] FS:  0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[   18.460872] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0
[   18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   18.462368] Call Trace:
[   18.462518]  <TASK>
[   18.462645]  ? __die_body+0x64/0xb0
[   18.462856]  ? page_fault_oops+0x353/0x3e0
[   18.463092]  ? exc_page_fault+0xaf/0xd0
[   18.463322]  ? asm_exc_page_fault+0x22/0x30
[   18.463589]  ? msg_from_mpoad+0x431/0x9d0
[   18.463820]  ? vcc_sendmsg+0x165/0x3b0
[   18.464031]  vcc_sendmsg+0x20a/0x3b0
[   18.464238]  ? wake_bit_function+0x80/0x80
[   18.464511]  __sys_sendto+0x38c/0x3a0
[   18.464729]  ? percpu_counter_add_batch+0x87/0xb0
[   18.465002]  __x64_sys_sendto+0x22/0x30
[   18.465219]  do_syscall_64+0x6c/0xa0
[   18.465465]  ? preempt_count_add+0x54/0xb0
[   18.465697]  ? up_read+0x37/0x80
[   18.465883]  ? do_user_addr_fault+0x25e/0x5b0
[   18.466126]  ? exit_to_user_mode_prepare+0x12/0xb0
[   18.466435]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   18.466727] RIP: 0033:0x785e61be4407
[   18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[   18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407
[   18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003
[   18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000
[   18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[   18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98

There are several ways to validate the net_device object correctly. For
example, function xgbe_netdev_event() checks the `netdev_ops` field and
function clip_device_event() checks the `type` field. Because the
related variable `lec_netdev_ops` is not defined in the same file,
introduce another type value, `ARPHRD_ATM_LANE`, for a simple and correct
check.
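
The difference between the two kinds of check can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: `struct sketch_net_device`, the `SKETCH_*` type values and both helper functions are hypothetical stand-ins, not kernel definitions and not the code from the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for struct net_device; only the two fields
 * relevant to the check are modelled. */
struct sketch_net_device {
    char name[16];
    unsigned short type;
};

/* Placeholder type values for illustration only. The real constants are
 * defined by the kernel headers and by the patch. */
#define SKETCH_TYPE_OTHER    1
#define SKETCH_TYPE_ATM_LANE 2

/* Name-based check: the name is chosen by the user, so any device can
 * pass it. */
static bool sketch_is_lec_by_name(const struct sketch_net_device *dev)
{
    return strncmp(dev->name, "lec", 3) == 0;
}

/* Type-based check: passes only for devices whose driver set the
 * dedicated type value. */
static bool sketch_is_lec_by_type(const struct sketch_net_device *dev)
{
    return dev->type == SKETCH_TYPE_ATM_LANE;
}
```

In this model, a device named "lec0" with an unrelated type passes the name check but fails the type check, which is the confusion the commit message describes.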

By the way, this bug dates back to pre-git history (2.3.15), hence use
the first reference for tracking.

Signed-off-by: Lin Ma <[email protected]>
Fixes: 1da177e ("Linux-2.6.12-rc2")
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jul 17, 2025
A crash in conntrack was reported while trying to unlink the conntrack
entry from the hash bucket list:
    [exception RIP: __nf_ct_delete_from_lists+172]
    [..]
 #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack]
 #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack]
 #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack]
    [..]

The nf_conn struct is marked as allocated from slab but appears to be in
a partially initialised state:

 ct hlist pointer is garbage; looks like the ct hash value
 (hence crash).
 ct->status is equal to IPS_CONFIRMED|IPS_DYING, which is expected
 ct->timeout is 30000 (=30s), which is unexpected.

Everything else looks like a normal udp conntrack entry.  If we ignore
ct->status and pretend it's 0, the entry matches those that are newly
allocated but not yet inserted into the hash:
  - ct hlist pointers are overloaded and store/cache the raw tuple hash
  - ct->timeout matches the relative time expected for a new udp flow
    rather than the absolute 'jiffies' value.

If it were not for the presence of IPS_CONFIRMED,
__nf_conntrack_find_get() would have skipped the entry.

The theory is that we hit the following race:

cpu x 			cpu y			cpu z
 found entry E		found entry E
 E is expired		<preemption>
 nf_ct_delete()
 return E to rcu slab
					init_conntrack
					E is re-inited,
					ct->status set to 0
					reply tuplehash hnnode.pprev
					stores hash value.

cpu y found E right before it was deleted on cpu x.
E is now re-inited on cpu z.  cpu y was preempted before
checking for expiry and/or confirm bit.

					->refcnt set to 1
					E now owned by skb
					->timeout set to 30000

If cpu y were to resume now, it would observe E as
expired but would skip E due to missing CONFIRMED bit.

					nf_conntrack_confirm gets called
					sets: ct->status |= CONFIRMED
					This is wrong: E is not yet added
					to hashtable.

cpu y resumes, it observes E as expired but CONFIRMED:
			<resumes>
			nf_ct_expired()
			 -> yes (ct->timeout is 30s)
			confirmed bit set.

cpu y will try to delete E from the hashtable:
			nf_ct_delete() -> set DYING bit
			__nf_ct_delete_from_lists

Even this scenario doesn't guarantee a crash:
cpu z still holds the table bucket lock(s) so y blocks:

			wait for spinlock held by z

					CONFIRMED is set but there is no
					guarantee ct will be added to hash:
					"chaintoolong" or "clash resolution"
					logic both skip the insert step.
					reply hnnode.pprev still stores the
					hash value.

					unlocks spinlock
					return NF_DROP
			<unblocks, then
			 crashes on hlist_nulls_del_rcu pprev>

In case CPU z does insert the entry into the hashtable, cpu y will unlink
E again right away but no crash occurs.

Without 'cpu y' race, 'garbage' hlist is of no consequence:
ct refcnt remains at 1, eventually skb will be free'd and E gets
destroyed via: nf_conntrack_put -> nf_conntrack_destroy -> nf_ct_destroy.

To resolve this, move the IPS_CONFIRMED assignment after the table
insertion but before the unlock.

Pablo points out that the confirm-bit store could be reordered to happen
before the hlist add or the timeout fixup, so switch to set_bit and a
before_atomic memory barrier to prevent this.
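
The ordering described above can be sketched in userspace C. This is only a model under stated assumptions: C11 release/acquire atomics are swapped in for the kernel's set_bit plus before_atomic barrier, the bucket lock is assumed held and is not modelled, and every `sketch_*` name and the `SKETCH_CONFIRMED` value are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_CONFIRMED 0x1u /* hypothetical bit value, not the kernel's */

/* Hypothetical stand-in for a conntrack entry. */
struct sketch_ct {
    struct sketch_ct *next;   /* stand-in for the hash list linkage */
    unsigned long timeout;    /* relative before confirm, absolute after */
    atomic_uint status;
};

struct sketch_bucket {
    struct sketch_ct *head;
};

/*
 * Confirm path. `can_insert` stands in for the "chaintoolong" and
 * "clash resolution" outcomes that skip the insert step.
 */
static bool sketch_confirm(struct sketch_bucket *b, struct sketch_ct *ct,
                           unsigned long now, bool can_insert)
{
    if (!can_insert)
        return false;          /* never inserted, so never CONFIRMED */

    ct->timeout += now;        /* relative -> absolute */
    ct->next = b->head;        /* link into the bucket */
    b->head = ct;

    /* Publish CONFIRMED last; release ordering keeps the stores above
     * from being reordered after it. */
    atomic_fetch_or_explicit(&ct->status, SKETCH_CONFIRMED,
                             memory_order_release);
    return true;
}

/* Reader side: an acquire load pairs with the release store above. */
static bool sketch_is_confirmed(struct sketch_ct *ct)
{
    return atomic_load_explicit(&ct->status, memory_order_acquire) &
           SKETCH_CONFIRMED;
}
```

In this model an entry that is never linked into the bucket never carries the confirmed bit, so a reader that raced with re-initialisation skips it instead of trying to unlink it.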

It doesn't matter if other CPUs can observe a newly inserted entry right
before the CONFIRMED bit was set:

Such an event cannot be distinguished from the above "E is the old
incarnation" case: the entry will be skipped.

Also change nf_ct_should_gc() to first check the confirmed bit.

The gc sequence is:
 1. Check if entry has expired, if not skip to next entry
 2. Obtain a reference to the expired entry.
 3. Call nf_ct_should_gc() to double-check step 1.

nf_ct_should_gc() is thus called only for entries that already failed an
expiry check. After this patch, once the confirmed bit check passes
ct->timeout has been altered to reflect the absolute 'best before' date
instead of a relative time.  Step 3 will therefore not remove the entry.

Without this change to nf_ct_should_gc() we could still get this sequence:

 1. Check if entry has expired.
 2. Obtain a reference.
 3. Call nf_ct_should_gc() to double-check step 1:
    4 - entry is still observed as expired
    5 - meanwhile, ct->timeout is corrected to absolute value on other CPU
      and confirm bit gets set
    6 - confirm bit is seen
    7 - valid entry is removed again

First do check 6), then 4) so the gc expiry check always picks up either
confirmed bit unset (entry gets skipped) or expiry re-check failure for
re-inited conntrack objects.
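
The check order can be sketched as follows. This is a simplified userspace model, not the kernel's nf_ct_should_gc(): the `gc_*` names and the `GC_CONFIRMED` value are hypothetical, the expiry test ignores counter wraparound, and a C11 acquire load is assumed as the counterpart of the writer-side barrier.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define GC_CONFIRMED 0x1u /* hypothetical bit value */

/* Hypothetical stand-in for a conntrack entry. */
struct gc_ct {
    unsigned long timeout;  /* relative until confirmed, absolute after */
    atomic_uint status;
};

/* Simplified expiry test; wraparound is deliberately not handled. */
static bool gc_is_expired(const struct gc_ct *ct, unsigned long now)
{
    return ct->timeout <= now;
}

/* Check the confirmed bit first, then re-check expiry. */
static bool gc_should_gc(struct gc_ct *ct, unsigned long now)
{
    unsigned int status =
        atomic_load_explicit(&ct->status, memory_order_acquire);

    if (!(status & GC_CONFIRMED))
        return false;               /* unconfirmed: skip the entry */

    /* Confirmed: timeout has been converted to an absolute value. */
    return gc_is_expired(ct, now);
}
```

With this order, an unconfirmed entry whose relative timeout merely looks expired is skipped, and a confirmed entry is judged on its absolute timeout.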

This change cannot be backported to releases before 5.19. Without
commit 8a75a2c ("netfilter: conntrack: remove unconfirmed list"), the
|= IPS_CONFIRMED line cannot be moved without further changes.

Cc: Razvan Cojocaru <[email protected]>
Link: https://lore.kernel.org/netfilter-devel/[email protected]/
Link: https://lore.kernel.org/netfilter-devel/[email protected]/
Fixes: 1397af5 ("netfilter: conntrack: remove the percpu dying list")
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Jul 30, 2025
The hfsplus_bnode_read() method can trigger the issue:

[  174.852007][ T9784] ==================================================================
[  174.852709][ T9784] BUG: KASAN: slab-out-of-bounds in hfsplus_bnode_read+0x2f4/0x360
[  174.853412][ T9784] Read of size 8 at addr ffff88810b5fc6c0 by task repro/9784
[  174.854059][ T9784]
[  174.854272][ T9784] CPU: 1 UID: 0 PID: 9784 Comm: repro Not tainted 6.16.0-rc3 #7 PREEMPT(full)
[  174.854281][ T9784] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[  174.854286][ T9784] Call Trace:
[  174.854289][ T9784]  <TASK>
[  174.854292][ T9784]  dump_stack_lvl+0x10e/0x1f0
[  174.854305][ T9784]  print_report+0xd0/0x660
[  174.854315][ T9784]  ? __virt_addr_valid+0x81/0x610
[  174.854323][ T9784]  ? __phys_addr+0xe8/0x180
[  174.854330][ T9784]  ? hfsplus_bnode_read+0x2f4/0x360
[  174.854337][ T9784]  kasan_report+0xc6/0x100
[  174.854346][ T9784]  ? hfsplus_bnode_read+0x2f4/0x360
[  174.854354][ T9784]  hfsplus_bnode_read+0x2f4/0x360
[  174.854362][ T9784]  hfsplus_bnode_dump+0x2ec/0x380
[  174.854370][ T9784]  ? __pfx_hfsplus_bnode_dump+0x10/0x10
[  174.854377][ T9784]  ? hfsplus_bnode_write_u16+0x83/0xb0
[  174.854385][ T9784]  ? srcu_gp_start+0xd0/0x310
[  174.854393][ T9784]  ? __mark_inode_dirty+0x29e/0xe40
[  174.854402][ T9784]  hfsplus_brec_remove+0x3d2/0x4e0
[  174.854411][ T9784]  __hfsplus_delete_attr+0x290/0x3a0
[  174.854419][ T9784]  ? __pfx_hfs_find_1st_rec_by_cnid+0x10/0x10
[  174.854427][ T9784]  ? __pfx___hfsplus_delete_attr+0x10/0x10
[  174.854436][ T9784]  ? __asan_memset+0x23/0x50
[  174.854450][ T9784]  hfsplus_delete_all_attrs+0x262/0x320
[  174.854459][ T9784]  ? __pfx_hfsplus_delete_all_attrs+0x10/0x10
[  174.854469][ T9784]  ? rcu_is_watching+0x12/0xc0
[  174.854476][ T9784]  ? __mark_inode_dirty+0x29e/0xe40
[  174.854483][ T9784]  hfsplus_delete_cat+0x845/0xde0
[  174.854493][ T9784]  ? __pfx_hfsplus_delete_cat+0x10/0x10
[  174.854507][ T9784]  hfsplus_unlink+0x1ca/0x7c0
[  174.854516][ T9784]  ? __pfx_hfsplus_unlink+0x10/0x10
[  174.854525][ T9784]  ? down_write+0x148/0x200
[  174.854532][ T9784]  ? __pfx_down_write+0x10/0x10
[  174.854540][ T9784]  vfs_unlink+0x2fe/0x9b0
[  174.854549][ T9784]  do_unlinkat+0x490/0x670
[  174.854557][ T9784]  ? __pfx_do_unlinkat+0x10/0x10
[  174.854565][ T9784]  ? __might_fault+0xbc/0x130
[  174.854576][ T9784]  ? getname_flags.part.0+0x1c5/0x550
[  174.854584][ T9784]  __x64_sys_unlink+0xc5/0x110
[  174.854592][ T9784]  do_syscall_64+0xc9/0x480
[  174.854600][ T9784]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  174.854608][ T9784] RIP: 0033:0x7f6fdf4c3167
[  174.854614][ T9784] Code: f0 ff ff 73 01 c3 48 8b 0d 26 0d 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 08
[  174.854622][ T9784] RSP: 002b:00007ffcb948bca8 EFLAGS: 00000206 ORIG_RAX: 0000000000000057
[  174.854630][ T9784] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6fdf4c3167
[  174.854636][ T9784] RDX: 00007ffcb948bcc0 RSI: 00007ffcb948bcc0 RDI: 00007ffcb948bd50
[  174.854641][ T9784] RBP: 00007ffcb948cd90 R08: 0000000000000001 R09: 00007ffcb948bb40
[  174.854645][ T9784] R10: 00007f6fdf564fc0 R11: 0000000000000206 R12: 0000561e1bc9c2d0
[  174.854650][ T9784] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  174.854658][ T9784]  </TASK>
[  174.854661][ T9784]
[  174.879281][ T9784] Allocated by task 9784:
[  174.879664][ T9784]  kasan_save_stack+0x20/0x40
[  174.880082][ T9784]  kasan_save_track+0x14/0x30
[  174.880500][ T9784]  __kasan_kmalloc+0xaa/0xb0
[  174.880908][ T9784]  __kmalloc_noprof+0x205/0x550
[  174.881337][ T9784]  __hfs_bnode_create+0x107/0x890
[  174.881779][ T9784]  hfsplus_bnode_find+0x2d0/0xd10
[  174.882222][ T9784]  hfsplus_brec_find+0x2b0/0x520
[  174.882659][ T9784]  hfsplus_delete_all_attrs+0x23b/0x320
[  174.883144][ T9784]  hfsplus_delete_cat+0x845/0xde0
[  174.883595][ T9784]  hfsplus_rmdir+0x106/0x1b0
[  174.884004][ T9784]  vfs_rmdir+0x206/0x690
[  174.884379][ T9784]  do_rmdir+0x2b7/0x390
[  174.884751][ T9784]  __x64_sys_rmdir+0xc5/0x110
[  174.885167][ T9784]  do_syscall_64+0xc9/0x480
[  174.885568][ T9784]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  174.886083][ T9784]
[  174.886293][ T9784] The buggy address belongs to the object at ffff88810b5fc600
[  174.886293][ T9784]  which belongs to the cache kmalloc-192 of size 192
[  174.887507][ T9784] The buggy address is located 40 bytes to the right of
[  174.887507][ T9784]  allocated 152-byte region [ffff88810b5fc600, ffff88810b5fc698)
[  174.888766][ T9784]
[  174.888976][ T9784] The buggy address belongs to the physical page:
[  174.889533][ T9784] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10b5fc
[  174.890295][ T9784] flags: 0x57ff00000000000(node=1|zone=2|lastcpupid=0x7ff)
[  174.890927][ T9784] page_type: f5(slab)
[  174.891284][ T9784] raw: 057ff00000000000 ffff88801b4423c0 ffffea000426dc80 dead000000000002
[  174.892032][ T9784] raw: 0000000000000000 0000000080100010 00000000f5000000 0000000000000000
[  174.892774][ T9784] page dumped because: kasan: bad access detected
[  174.893327][ T9784] page_owner tracks the page as allocated
[  174.893825][ T9784] page last allocated via order 0, migratetype Unmovable, gfp_mask 0x52c00(GFP_NOIO|__GFP_NOWARN|__GFP_NO1
[  174.895373][ T9784]  post_alloc_hook+0x1c0/0x230
[  174.895801][ T9784]  get_page_from_freelist+0xdeb/0x3b30
[  174.896284][ T9784]  __alloc_frozen_pages_noprof+0x25c/0x2460
[  174.896810][ T9784]  alloc_pages_mpol+0x1fb/0x550
[  174.897242][ T9784]  new_slab+0x23b/0x340
[  174.897614][ T9784]  ___slab_alloc+0xd81/0x1960
[  174.898028][ T9784]  __slab_alloc.isra.0+0x56/0xb0
[  174.898468][ T9784]  __kmalloc_noprof+0x2b0/0x550
[  174.898896][ T9784]  usb_alloc_urb+0x73/0xa0
[  174.899289][ T9784]  usb_control_msg+0x1cb/0x4a0
[  174.899718][ T9784]  usb_get_string+0xab/0x1a0
[  174.900133][ T9784]  usb_string_sub+0x107/0x3c0
[  174.900549][ T9784]  usb_string+0x307/0x670
[  174.900933][ T9784]  usb_cache_string+0x80/0x150
[  174.901355][ T9784]  usb_new_device+0x1d0/0x19d0
[  174.901786][ T9784]  register_root_hub+0x299/0x730
[  174.902231][ T9784] page last free pid 10 tgid 10 stack trace:
[  174.902757][ T9784]  __free_frozen_pages+0x80c/0x1250
[  174.903217][ T9784]  vfree.part.0+0x12b/0xab0
[  174.903645][ T9784]  delayed_vfree_work+0x93/0xd0
[  174.904073][ T9784]  process_one_work+0x9b5/0x1b80
[  174.904519][ T9784]  worker_thread+0x630/0xe60
[  174.904927][ T9784]  kthread+0x3a8/0x770
[  174.905291][ T9784]  ret_from_fork+0x517/0x6e0
[  174.905709][ T9784]  ret_from_fork_asm+0x1a/0x30
[  174.906128][ T9784]
[  174.906338][ T9784] Memory state around the buggy address:
[  174.906828][ T9784]  ffff88810b5fc580: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[  174.907528][ T9784]  ffff88810b5fc600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  174.908222][ T9784] >ffff88810b5fc680: 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc
[  174.908917][ T9784]                                            ^
[  174.909481][ T9784]  ffff88810b5fc700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  174.910432][ T9784]  ffff88810b5fc780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[  174.911401][ T9784] ==================================================================

The reason for the issue is that the code doesn't check the correctness
of the requested offset and length. As a result, an incorrect value
of offset and/or length could result in access outside the allocated
memory.

This patch introduces the is_bnode_offset_valid() method, which checks
the requested offset value. It also introduces the
check_and_correct_requested_length() method, which checks and
corrects the requested length (if necessary). These methods
are used in hfsplus_bnode_read(), hfsplus_bnode_write(),
hfsplus_bnode_clear(), hfsplus_bnode_copy(), and hfsplus_bnode_move()
to prevent access outside the allocated memory
and the resulting crash.
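
A minimal sketch of this kind of validation is shown below. It is not the patch's implementation: `struct sketch_bnode`, its fields and all three `sketch_*` functions are hypothetical, and the exact rules (reject an offset at or past the node size, clamp the length to the remaining bytes) are assumptions about what such helpers would do.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a b-tree node backed by one flat buffer. */
struct sketch_bnode {
    const uint8_t *data;
    uint32_t size;          /* number of valid bytes in data */
};

/* An offset is usable only if it points inside the node. */
static bool sketch_offset_valid(const struct sketch_bnode *node, uint32_t off)
{
    return off < node->size;
}

/* Clamp the requested length to what remains after off; 0 if off is bad. */
static uint32_t sketch_correct_length(const struct sketch_bnode *node,
                                      uint32_t off, uint32_t len)
{
    if (!sketch_offset_valid(node, off))
        return 0;

    uint32_t room = node->size - off;
    return len > room ? room : len;
}

/* Bounded read: copies at most the corrected length, returns bytes copied. */
static uint32_t sketch_bnode_read(const struct sketch_bnode *node,
                                  uint8_t *buf, uint32_t off, uint32_t len)
{
    len = sketch_correct_length(node, off, len);
    if (len > 0)
        memcpy(buf, node->data + off, len);
    return len;
}
```

A caller can compare the returned count with the requested length to detect a request that was shortened or rejected.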

Reported-by: Kun Hu <[email protected]>
Reported-by: Jiaji Qin <[email protected]>
Reported-by: Shuoran Bai <[email protected]>
Signed-off-by: Viacheslav Dubeyko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Viacheslav Dubeyko <[email protected]>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Aug 2, 2025
perf script tests fail with a segmentation fault as below:

  92: perf script tests:
  --- start ---
  test child forked, pid 103769
  DB test
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.012 MB /tmp/perf-test-script.7rbftEpOzX/perf.data (9 samples) ]
  /usr/libexec/perf-core/tests/shell/script.sh: line 35:
  103780 Segmentation fault      (core dumped)
  perf script -i "${perfdatafile}" -s "${db_test}"
  --- Cleaning up ---
  ---- end(-1) ----
  92: perf script tests                                               : FAILED!

The backtrace points to:
	#0  0x0000000010247dd0 in maps.machine ()
	#1  0x00000000101d178c in db_export.sample ()
	#2  0x00000000103412c8 in python_process_event ()
	#3  0x000000001004eb28 in process_sample_event ()
	#4  0x000000001024fcd0 in machines.deliver_event ()
	#5  0x000000001025005c in perf_session.deliver_event ()
	#6  0x00000000102568b0 in __ordered_events__flush.part.0 ()
	#7  0x0000000010251618 in perf_session.process_events ()
	#8  0x0000000010053620 in cmd_script ()
	#9  0x00000000100b5a28 in run_builtin ()
	#10 0x00000000100b5f94 in handle_internal_command ()
	#11 0x0000000010011114 in main ()

Further investigation reveals that this occurs in the `perf script tests`
because they use the `db_test.py` script, which sets `perf_db_export_mode = True`.

With `perf_db_export_mode` enabled, if a sample originates from a hypervisor,
perf doesn't set maps for the "[H]" sample in the code. Consequently, `al->maps` remains NULL
when `maps__machine(al->maps)` is called from `db_export__sample`.

As al->maps can be NULL for hypervisor samples, use thread->maps,
because even for a hypervisor sample the machine should exist.
If we don't have a machine for some reason, return -1 to avoid the segmentation fault.
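
The fallback can be sketched with mock types. None of these definitions come from perf: the `sketch_*` structs and `sketch_export_sample()` are hypothetical, and they only model the decision of which maps pointer to consult and when to return -1.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the perf structures named above. */
struct sketch_machine { int id; };
struct sketch_maps { struct sketch_machine *machine; };
struct sketch_thread { struct sketch_maps *maps; };
struct sketch_addr_location { struct sketch_maps *maps; };

/*
 * Look the machine up through the thread's maps rather than al->maps,
 * which may be NULL for hypervisor samples. Returns 0 on success and -1
 * when no machine can be found.
 */
static int sketch_export_sample(const struct sketch_addr_location *al,
                                const struct sketch_thread *thread)
{
    struct sketch_machine *machine = NULL;

    (void)al; /* al->maps is deliberately not consulted */

    if (thread && thread->maps)
        machine = thread->maps->machine;

    if (!machine)
        return -1;

    /* ... the export work that needs `machine` would go here ... */
    return 0;
}
```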

Reported-by: Disha Goel <[email protected]>
Signed-off-by: Aditya Bodkhe <[email protected]>
Reviewed-by: Adrian Hunter <[email protected]>
Tested-by: Disha Goel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Suggested-by: Adrian Hunter <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Aug 2, 2025
Without the change `perf` hangs on character devices. On my system
it's enough to run a system-wide sampler for a few seconds to get the
hang:

    $ perf record -a -g --call-graph=dwarf
    $ perf report
    # hung

`strace` shows that the hang happens when reading a character device,
`/dev/dri/renderD128`:

    $ strace -y -f -p 2780484
    strace: Process 2780484 attached
    pread64(101</dev/dri/renderD128>, strace: Process 2780484 detached

Its call trace descends into `elfutils`:

    $ gdb -p 2780484
    (gdb) bt
    #0  0x00007f5e508f04b7 in __libc_pread64 (fd=101, buf=0x7fff9df7edb0, count=0, offset=0)
        at ../sysdeps/unix/sysv/linux/pread64.c:25
    #1  0x00007f5e52b79515 in read_file () from /<<NIX>>/elfutils-0.192/lib/libelf.so.1
    #2  0x00007f5e52b25666 in libdw_open_elf () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #3  0x00007f5e52b25907 in __libdw_open_file () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #4  0x00007f5e52b120a9 in dwfl_report_elf@@ELFUTILS_0.156 ()
       from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #5  0x000000000068bf20 in __report_module (al=al@entry=0x7fff9df80010, ip=ip@entry=139803237033216, ui=ui@entry=0x5369b5e0)
        at util/dso.h:537
    #6  0x000000000068c3d1 in report_module (ip=139803237033216, ui=0x5369b5e0) at util/unwind-libdw.c:114
    #7  frame_callback (state=0x535aef10, arg=0x5369b5e0) at util/unwind-libdw.c:242
    #8  0x00007f5e52b261d3 in dwfl_thread_getframes () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #9  0x00007f5e52b25bdb in get_one_thread_cb () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #10 0x00007f5e52b25faa in dwfl_getthreads () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #11 0x00007f5e52b26514 in dwfl_getthread_frames () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #12 0x000000000068c6ce in unwind__get_entries (cb=cb@entry=0x5d4620 <unwind_entry>, arg=arg@entry=0x10cd5fa0,
        thread=thread@entry=0x1076a290, data=data@entry=0x7fff9df80540, max_stack=max_stack@entry=127,
        best_effort=best_effort@entry=false) at util/thread.h:152
    #13 0x00000000005dae95 in thread__resolve_callchain_unwind (evsel=0x106006d0, thread=0x1076a290, cursor=0x10cd5fa0,
        sample=0x7fff9df80540, max_stack=127, symbols=true) at util/machine.c:2939
    #14 thread__resolve_callchain_unwind (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, sample=0x7fff9df80540,
        max_stack=127, symbols=true) at util/machine.c:2920
    #15 __thread__resolve_callchain (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, evsel@entry=0x7fff9df80440,
        sample=0x7fff9df80540, parent=parent@entry=0x7fff9df804a0, root_al=root_al@entry=0x7fff9df80440, max_stack=127, symbols=true)
        at util/machine.c:2970
    #16 0x00000000005d0cb2 in thread__resolve_callchain (thread=<optimized out>, cursor=<optimized out>, evsel=0x7fff9df80440,
        sample=<optimized out>, parent=0x7fff9df804a0, root_al=0x7fff9df80440, max_stack=127) at util/machine.h:198
    #17 sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fff9df804a0,
        evsel=evsel@entry=0x106006d0, al=al@entry=0x7fff9df80440, max_stack=max_stack@entry=127) at util/callchain.c:1127
    #18 0x0000000000617e08 in hist_entry_iter__add (iter=iter@entry=0x7fff9df80480, al=al@entry=0x7fff9df80440, max_stack_depth=127,
        arg=arg@entry=0x7fff9df81ae0) at util/hist.c:1255
    #19 0x000000000045d2d0 in process_sample_event (tool=0x7fff9df81ae0, event=<optimized out>, sample=0x7fff9df80540,
        evsel=0x106006d0, machine=<optimized out>) at builtin-report.c:334
    #20 0x00000000005e3bb1 in perf_session__deliver_event (session=0x105ff2c0, event=0x7f5c7d735ca0, tool=0x7fff9df81ae0,
        file_offset=2914716832, file_path=0x105ffbf0 "perf.data") at util/session.c:1367
    #21 0x00000000005e8d93 in do_flush (oe=0x105ffa50, show_progress=false) at util/ordered-events.c:245
    #22 __ordered_events__flush (oe=0x105ffa50, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:324
    #23 0x00000000005e1f64 in perf_session__process_user_event (session=0x105ff2c0, event=0x7f5c7d752b18, file_offset=2914835224,
        file_path=0x105ffbf0 "perf.data") at util/session.c:1419
    #24 0x00000000005e47c7 in reader__read_event (rd=rd@entry=0x7fff9df81260, session=session@entry=0x105ff2c0,
        prog=prog@entry=0x7fff9df81220) at util/session.c:2132
    #25 0x00000000005e4b37 in reader__process_events (rd=0x7fff9df81260, session=0x105ff2c0, prog=0x7fff9df81220)
        at util/session.c:2181
    #26 __perf_session__process_events (session=0x105ff2c0) at util/session.c:2226
    #27 perf_session__process_events (session=session@entry=0x105ff2c0) at util/session.c:2390
    #28 0x0000000000460add in __cmd_report (rep=0x7fff9df81ae0) at builtin-report.c:1076
    #29 cmd_report (argc=<optimized out>, argv=<optimized out>) at builtin-report.c:1827
    #30 0x00000000004c5a40 in run_builtin (p=p@entry=0xd8f7f8 <commands+312>, argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0)
        at perf.c:351
    #31 0x00000000004c5d63 in handle_internal_command (argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0) at perf.c:404
    #32 0x0000000000442de3 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:448
    #33 main (argc=<optimized out>, argv=0x7fff9df844b0) at perf.c:556

The hangup happens because nothing in `perf` or `elfutils` checks if a
mapped file is easily readable.

The change conservatively skips all non-regular files.
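
A minimal sketch of such a check, assuming it is done with `stat()` and `S_ISREG()`. The commit message does not show the actual patch, so the helper name `is_regular_file()` and where it would be called from are assumptions for illustration only.

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: report whether `path` names a regular file.
 * Character devices, FIFOs, sockets, directories, and paths that
 * cannot be stat()ed all count as "not regular" and would be skipped.
 */
static bool is_regular_file(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return false;	/* conservative: treat errors as "skip" */

	return S_ISREG(st.st_mode);
}
```

With a check like this, a mapping backed by `/dev/dri/renderD128` would be rejected before any attempt to open it as an ELF file, at the cost of also skipping any other non-regular mapping.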

Signed-off-by: Sergei Trofimovich <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Aug 2, 2025
Symbolize stack traces by creating a live machine. Add this
functionality to dump_stack and switch dump_stack users to use
it. Switch TUI to use it. Add stack traces to the child test function
which can be useful to diagnose blocked code.

Example output:
```
$ perf test -vv PERF_RECORD_
...
  7: PERF_RECORD_* events & perf_sample fields:
  7: PERF_RECORD_* events & perf_sample fields                       : Running (1 active)
^C
Signal (2) while running tests.
Terminating tests with the same signal
Internal test harness failure. Completing any started tests:
:  7: PERF_RECORD_* events & perf_sample fields:

---- unexpected signal (2) ----
    #0 0x55788c6210a3 in child_test_sig_handler builtin-test.c:0
    #1 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
    #2 0x7fc12fe99687 in __internal_syscall_cancel cancellation.c:64
    #3 0x7fc12fee5f7a in clock_nanosleep@GLIBC_2.2.5 clock_nanosleep.c:72
    #4 0x7fc12fef1393 in __nanosleep nanosleep.c:26
    #5 0x7fc12ff02d68 in __sleep sleep.c:55
    #6 0x55788c63196b in test__PERF_RECORD perf-record.c:0
    #7 0x55788c620fb0 in run_test_child builtin-test.c:0
    #8 0x55788c5bd18d in start_command run-command.c:127
    #9 0x55788c621ef3 in __cmd_test builtin-test.c:0
    #10 0x55788c6225bf in cmd_test ??:0
    #11 0x55788c5afbd0 in run_builtin perf.c:0
    #12 0x55788c5afeeb in handle_internal_command perf.c:0
    #13 0x55788c52b383 in main ??:0
    #14 0x7fc12fe33ca8 in __libc_start_call_main libc_start_call_main.h:74
    #15 0x7fc12fe33d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    #16 0x55788c52b9d1 in _start ??:0

---- unexpected signal (2) ----
    #0 0x55788c6210a3 in child_test_sig_handler builtin-test.c:0
    #1 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
    #2 0x7fc12fea3a14 in pthread_sigmask@GLIBC_2.2.5 pthread_sigmask.c:45
    #3 0x7fc12fe49fd9 in __GI___sigprocmask sigprocmask.c:26
    #4 0x7fc12ff2601b in __longjmp_chk longjmp.c:36
    #5 0x55788c6210c0 in print_test_result.isra.0 builtin-test.c:0
    #6 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
    #7 0x7fc12fe99687 in __internal_syscall_cancel cancellation.c:64
    #8 0x7fc12fee5f7a in clock_nanosleep@GLIBC_2.2.5 clock_nanosleep.c:72
    #9 0x7fc12fef1393 in __nanosleep nanosleep.c:26
    #10 0x7fc12ff02d68 in __sleep sleep.c:55
    #11 0x55788c63196b in test__PERF_RECORD perf-record.c:0
    #12 0x55788c620fb0 in run_test_child builtin-test.c:0
    #13 0x55788c5bd18d in start_command run-command.c:127
    #14 0x55788c621ef3 in __cmd_test builtin-test.c:0
    #15 0x55788c6225bf in cmd_test ??:0
    #16 0x55788c5afbd0 in run_builtin perf.c:0
    #17 0x55788c5afeeb in handle_internal_command perf.c:0
    #18 0x55788c52b383 in main ??:0
    #19 0x7fc12fe33ca8 in __libc_start_call_main libc_start_call_main.h:74
    #20 0x7fc12fe33d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    #21 0x55788c52b9d1 in _start ??:0
  7: PERF_RECORD_* events & perf_sample fields                       : Skip (permissions)
```

Signed-off-by: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Aug 2, 2025
Calling perf top with branch filters enabled on Intel CPUs
with branch counters logging (a.k.a. LBR event logging [1]) support
results in a segfault.

$ perf top  -e '{cpu_core/cpu-cycles/,cpu_core/event=0xc6,umask=0x3,frontend=0x11,name=frontend_retired_dsb_miss/}' -j any,counter
...
Thread 27 "perf" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffafff76c0 (LWP 949003)]
perf_env__find_br_cntr_info (env=0xf66dc0 <perf_env>, nr=0x0, width=0x7fffafff62c0) at util/env.c:653
653			*width = env->cpu_pmu_caps ? env->br_cntr_width :
(gdb) bt
 #0  perf_env__find_br_cntr_info (env=0xf66dc0 <perf_env>, nr=0x0, width=0x7fffafff62c0) at util/env.c:653
 #1  0x00000000005b1599 in symbol__account_br_cntr (branch=0x7fffcc3db580, evsel=0xfea2d0, offset=12, br_cntr=8) at util/annotate.c:345
 #2  0x00000000005b17fb in symbol__account_cycles (addr=5658172, start=5658160, sym=0x7fffcc0ee420, cycles=539, evsel=0xfea2d0, br_cntr=8) at util/annotate.c:389
 #3  0x00000000005b1976 in addr_map_symbol__account_cycles (ams=0x7fffcd7b01d0, start=0x7fffcd7b02b0, cycles=539, evsel=0xfea2d0, br_cntr=8) at util/annotate.c:422
 #4  0x000000000068d57f in hist__account_cycles (bs=0x110d288, al=0x7fffafff6540, sample=0x7fffafff6760, nonany_branch_mode=false, total_cycles=0x0, evsel=0xfea2d0) at util/hist.c:2850
 #5  0x0000000000446216 in hist_iter__top_callback (iter=0x7fffafff6590, al=0x7fffafff6540, single=true, arg=0x7fffffff9e00) at builtin-top.c:737
 #6  0x0000000000689787 in hist_entry_iter__add (iter=0x7fffafff6590, al=0x7fffafff6540, max_stack_depth=127, arg=0x7fffffff9e00) at util/hist.c:1359
 #7  0x0000000000446710 in perf_event__process_sample (tool=0x7fffffff9e00, event=0x110d250, evsel=0xfea2d0, sample=0x7fffafff6760, machine=0x108c968) at builtin-top.c:845
 #8  0x0000000000447735 in deliver_event (qe=0x7fffffffa120, qevent=0x10fc200) at builtin-top.c:1211
 #9  0x000000000064ccae in do_flush (oe=0x7fffffffa120, show_progress=false) at util/ordered-events.c:245
 #10 0x000000000064d005 in __ordered_events__flush (oe=0x7fffffffa120, how=OE_FLUSH__TOP, timestamp=0) at util/ordered-events.c:324
 #11 0x000000000064d0ef in ordered_events__flush (oe=0x7fffffffa120, how=OE_FLUSH__TOP) at util/ordered-events.c:342
 #12 0x00000000004472a9 in process_thread (arg=0x7fffffff9e00) at builtin-top.c:1120
 #13 0x00007ffff6e7dba8 in start_thread (arg=<optimized out>) at pthread_create.c:448
 #14 0x00007ffff6f01b8c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

The cause is that perf_env__find_br_cntr_info tries to access a
null pointer pmu_caps in the perf_env struct. A similar issue exists
for homogeneous core systems which use the cpu_pmu_caps structure.

Fix this by populating cpu_pmu_caps and pmu_caps structures with
values from sysfs when calling perf top with branch stack sampling
enabled.

[1] LBR event logging was introduced here:
https://lore.kernel.org/all/[email protected]/

Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Thomas Falcon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Aug 8, 2025
syzbot [1] reported the following:

R10: 0000000000000100 R11: 0000000000000206 R12: 00007ffe17473450
R13: 00007f28b1c10854 R14: 000000000000dae5 R15: 00007ffe17474520
 </TASK>
---[ end trace 0000000000000000 ]---
==================================================================
BUG: KASAN: use-after-free in __list_del_entry_valid+0xa6/0x130 lib/list_debug.c:62
Read of size 8 at addr ffff88812d962278 by task syz-executor/564

CPU: 1 PID: 564 Comm: syz-executor Tainted: G        W          6.1.129-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Call Trace:
 <TASK>
 __dump_stack+0x21/0x24 lib/dump_stack.c:88
 dump_stack_lvl+0xee/0x158 lib/dump_stack.c:106
 print_address_description+0x71/0x210 mm/kasan/report.c:316
 print_report+0x4a/0x60 mm/kasan/report.c:427
 kasan_report+0x122/0x150 mm/kasan/report.c:531
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report_generic.c:351
 __list_del_entry_valid+0xa6/0x130 lib/list_debug.c:62
 __list_del_entry include/linux/list.h:134 [inline]
 list_del_init include/linux/list.h:206 [inline]
 f2fs_inode_synced+0xf7/0x2e0 fs/f2fs/super.c:1531
 f2fs_update_inode+0x74/0x1c40 fs/f2fs/inode.c:585
 f2fs_update_inode_page+0x137/0x170 fs/f2fs/inode.c:703
 f2fs_write_inode+0x4ec/0x770 fs/f2fs/inode.c:731
 write_inode fs/fs-writeback.c:1460 [inline]
 __writeback_single_inode+0x4a0/0xab0 fs/fs-writeback.c:1677
 writeback_single_inode+0x221/0x8b0 fs/fs-writeback.c:1733
 sync_inode_metadata+0xb6/0x110 fs/fs-writeback.c:2789
 f2fs_sync_inode_meta+0x16d/0x2a0 fs/f2fs/checkpoint.c:1159
 block_operations fs/f2fs/checkpoint.c:1269 [inline]
 f2fs_write_checkpoint+0xca3/0x2100 fs/f2fs/checkpoint.c:1658
 kill_f2fs_super+0x231/0x390 fs/f2fs/super.c:4668
 deactivate_locked_super+0x98/0x100 fs/super.c:332
 deactivate_super+0xaf/0xe0 fs/super.c:363
 cleanup_mnt+0x45f/0x4e0 fs/namespace.c:1186
 __cleanup_mnt+0x19/0x20 fs/namespace.c:1193
 task_work_run+0x1c6/0x230 kernel/task_work.c:203
 exit_task_work include/linux/task_work.h:39 [inline]
 do_exit+0x9fb/0x2410 kernel/exit.c:871
 do_group_exit+0x210/0x2d0 kernel/exit.c:1021
 __do_sys_exit_group kernel/exit.c:1032 [inline]
 __se_sys_exit_group kernel/exit.c:1030 [inline]
 __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1030
 x64_sys_call+0x7b4/0x9a0 arch/x86/include/generated/asm/syscalls_64.h:232
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x4c/0xa0 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x68/0xd2
RIP: 0033:0x7f28b1b8e169
Code: Unable to access opcode bytes at 0x7f28b1b8e13f.
RSP: 002b:00007ffe174710a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007f28b1c10879 RCX: 00007f28b1b8e169
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
RBP: 0000000000000002 R08: 00007ffe1746ee47 R09: 00007ffe17472360
R10: 0000000000000009 R11: 0000000000000246 R12: 00007ffe17472360
R13: 00007f28b1c10854 R14: 000000000000dae5 R15: 00007ffe17474520
 </TASK>

Allocated by task 569:
 kasan_save_stack mm/kasan/common.c:45 [inline]
 kasan_set_track+0x4b/0x70 mm/kasan/common.c:52
 kasan_save_alloc_info+0x25/0x30 mm/kasan/generic.c:505
 __kasan_slab_alloc+0x72/0x80 mm/kasan/common.c:328
 kasan_slab_alloc include/linux/kasan.h:201 [inline]
 slab_post_alloc_hook+0x4f/0x2c0 mm/slab.h:737
 slab_alloc_node mm/slub.c:3398 [inline]
 slab_alloc mm/slub.c:3406 [inline]
 __kmem_cache_alloc_lru mm/slub.c:3413 [inline]
 kmem_cache_alloc_lru+0x104/0x220 mm/slub.c:3429
 alloc_inode_sb include/linux/fs.h:3245 [inline]
 f2fs_alloc_inode+0x2d/0x340 fs/f2fs/super.c:1419
 alloc_inode fs/inode.c:261 [inline]
 iget_locked+0x186/0x880 fs/inode.c:1373
 f2fs_iget+0x55/0x4c60 fs/f2fs/inode.c:483
 f2fs_lookup+0x366/0xab0 fs/f2fs/namei.c:487
 __lookup_slow+0x2a3/0x3d0 fs/namei.c:1690
 lookup_slow+0x57/0x70 fs/namei.c:1707
 walk_component+0x2e6/0x410 fs/namei.c:1998
 lookup_last fs/namei.c:2455 [inline]
 path_lookupat+0x180/0x490 fs/namei.c:2479
 filename_lookup+0x1f0/0x500 fs/namei.c:2508
 vfs_statx+0x10b/0x660 fs/stat.c:229
 vfs_fstatat fs/stat.c:267 [inline]
 vfs_lstat include/linux/fs.h:3424 [inline]
 __do_sys_newlstat fs/stat.c:423 [inline]
 __se_sys_newlstat+0xd5/0x350 fs/stat.c:417
 __x64_sys_newlstat+0x5b/0x70 fs/stat.c:417
 x64_sys_call+0x393/0x9a0 arch/x86/include/generated/asm/syscalls_64.h:7
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x4c/0xa0 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x68/0xd2

Freed by task 13:
 kasan_save_stack mm/kasan/common.c:45 [inline]
 kasan_set_track+0x4b/0x70 mm/kasan/common.c:52
 kasan_save_free_info+0x31/0x50 mm/kasan/generic.c:516
 ____kasan_slab_free+0x132/0x180 mm/kasan/common.c:236
 __kasan_slab_free+0x11/0x20 mm/kasan/common.c:244
 kasan_slab_free include/linux/kasan.h:177 [inline]
 slab_free_hook mm/slub.c:1724 [inline]
 slab_free_freelist_hook+0xc2/0x190 mm/slub.c:1750
 slab_free mm/slub.c:3661 [inline]
 kmem_cache_free+0x12d/0x2a0 mm/slub.c:3683
 f2fs_free_inode+0x24/0x30 fs/f2fs/super.c:1562
 i_callback+0x4c/0x70 fs/inode.c:250
 rcu_do_batch+0x503/0xb80 kernel/rcu/tree.c:2297
 rcu_core+0x5a2/0xe70 kernel/rcu/tree.c:2557
 rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2574
 handle_softirqs+0x178/0x500 kernel/softirq.c:578
 run_ksoftirqd+0x28/0x30 kernel/softirq.c:945
 smpboot_thread_fn+0x45a/0x8c0 kernel/smpboot.c:164
 kthread+0x270/0x310 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Last potentially related work creation:
 kasan_save_stack+0x3a/0x60 mm/kasan/common.c:45
 __kasan_record_aux_stack+0xb6/0xc0 mm/kasan/generic.c:486
 kasan_record_aux_stack_noalloc+0xb/0x10 mm/kasan/generic.c:496
 call_rcu+0xd4/0xf70 kernel/rcu/tree.c:2845
 destroy_inode fs/inode.c:316 [inline]
 evict+0x7da/0x870 fs/inode.c:720
 iput_final fs/inode.c:1834 [inline]
 iput+0x62b/0x830 fs/inode.c:1860
 do_unlinkat+0x356/0x540 fs/namei.c:4397
 __do_sys_unlink fs/namei.c:4438 [inline]
 __se_sys_unlink fs/namei.c:4436 [inline]
 __x64_sys_unlink+0x49/0x50 fs/namei.c:4436
 x64_sys_call+0x958/0x9a0 arch/x86/include/generated/asm/syscalls_64.h:88
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x4c/0xa0 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x68/0xd2

The buggy address belongs to the object at ffff88812d961f20
 which belongs to the cache f2fs_inode_cache of size 1200
The buggy address is located 856 bytes inside of
 1200-byte region [ffff88812d961f20, ffff88812d9623d0)

The buggy address belongs to the physical page:
page:ffffea0004b65800 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12d960
head:ffffea0004b65800 order:2 compound_mapcount:0 compound_pincount:0
flags: 0x4000000000010200(slab|head|zone=1)
raw: 4000000000010200 0000000000000000 dead000000000122 ffff88810a94c500
raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Reclaimable, gfp_mask 0x1d2050(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_RECLAIMABLE), pid 569, tgid 568 (syz.2.16), ts 55943246141, free_ts 0
 set_page_owner include/linux/page_owner.h:31 [inline]
 post_alloc_hook+0x1d0/0x1f0 mm/page_alloc.c:2532
 prep_new_page mm/page_alloc.c:2539 [inline]
 get_page_from_freelist+0x2e63/0x2ef0 mm/page_alloc.c:4328
 __alloc_pages+0x235/0x4b0 mm/page_alloc.c:5605
 alloc_slab_page include/linux/gfp.h:-1 [inline]
 allocate_slab mm/slub.c:1939 [inline]
 new_slab+0xec/0x4b0 mm/slub.c:1992
 ___slab_alloc+0x6f6/0xb50 mm/slub.c:3180
 __slab_alloc+0x5e/0xa0 mm/slub.c:3279
 slab_alloc_node mm/slub.c:3364 [inline]
 slab_alloc mm/slub.c:3406 [inline]
 __kmem_cache_alloc_lru mm/slub.c:3413 [inline]
 kmem_cache_alloc_lru+0x13f/0x220 mm/slub.c:3429
 alloc_inode_sb include/linux/fs.h:3245 [inline]
 f2fs_alloc_inode+0x2d/0x340 fs/f2fs/super.c:1419
 alloc_inode fs/inode.c:261 [inline]
 iget_locked+0x186/0x880 fs/inode.c:1373
 f2fs_iget+0x55/0x4c60 fs/f2fs/inode.c:483
 f2fs_fill_super+0x3ad7/0x6bb0 fs/f2fs/super.c:4293
 mount_bdev+0x2ae/0x3e0 fs/super.c:1443
 f2fs_mount+0x34/0x40 fs/f2fs/super.c:4642
 legacy_get_tree+0xea/0x190 fs/fs_context.c:632
 vfs_get_tree+0x89/0x260 fs/super.c:1573
 do_new_mount+0x25a/0xa20 fs/namespace.c:3056
page_owner free stack trace missing

Memory state around the buggy address:
 ffff88812d962100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88812d962180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88812d962200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                                ^
 ffff88812d962280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88812d962300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

[1] https://syzkaller.appspot.com/x/report.txt?x=13448368580000

This bug can be reproduced with the reproducer [2]. Once
CONFIG_F2FS_CHECK_FS is enabled, the reproducer triggers the panic below,
so the direct cause of this bug is the same as the one fixed by patch [3].

kernel BUG at fs/f2fs/inode.c:857!
RIP: 0010:f2fs_evict_inode+0x1204/0x1a20
Call Trace:
 <TASK>
 evict+0x32a/0x7a0
 do_unlinkat+0x37b/0x5b0
 __x64_sys_unlink+0xad/0x100
 do_syscall_64+0x5a/0xb0
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0010:f2fs_evict_inode+0x1204/0x1a20

[2] https://syzkaller.appspot.com/x/repro.c?x=17495ccc580000
[3] https://lore.kernel.org/linux-f2fs-devel/[email protected]

Tracepoints before panic:

f2fs_unlink_enter: dev = (7,0), dir ino = 3, i_size = 4096, i_blocks = 8, name = file1
f2fs_unlink_exit: dev = (7,0), ino = 7, ret = 0
f2fs_evict_inode: dev = (7,0), ino = 7, pino = 3, i_mode = 0x81ed, i_size = 10, i_nlink = 0, i_blocks = 0, i_advise = 0x0
f2fs_truncate_node: dev = (7,0), ino = 7, nid = 8, block_address = 0x3c05

f2fs_unlink_enter: dev = (7,0), dir ino = 3, i_size = 4096, i_blocks = 8, name = file3
f2fs_unlink_exit: dev = (7,0), ino = 8, ret = 0
f2fs_evict_inode: dev = (7,0), ino = 8, pino = 3, i_mode = 0x81ed, i_size = 9000, i_nlink = 0, i_blocks = 24, i_advise = 0x4
f2fs_truncate: dev = (7,0), ino = 8, pino = 3, i_mode = 0x81ed, i_size = 0, i_nlink = 0, i_blocks = 24, i_advise = 0x4
f2fs_truncate_blocks_enter: dev = (7,0), ino = 8, i_size = 0, i_blocks = 24, start file offset = 0
f2fs_truncate_blocks_exit: dev = (7,0), ino = 8, ret = -2

The root cause is: in the fuzzed image, dnode #8 belongs to inode #7,
so after inode #7 was evicted, dnode #8 was dropped.

However, there is a dirent that has ino #8, so once we unlink file3,
both f2fs_truncate() and f2fs_update_inode_page() in f2fs_evict_inode()
fail because node #8 cannot be loaded. As a result, f2fs_inode_synced()
is never called to clear the inode's dirty status.

Let's fix this by calling f2fs_inode_synced() in the error path of
f2fs_evict_inode().
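
The shape of the fix can be illustrated with a toy model. Nothing below is real f2fs code: the `toy_*` names, the two structs, and the `node_loadable` flag are all hypothetical, and the dirty list is reduced to a flag plus a counter. The sketch only shows why the error path must clear the dirty state before the inode is freed.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-ins; not the real f2fs structures. */
struct toy_sb {
	int nr_dirty;		/* models the length of the dirty-inode list */
};

struct toy_inode {
	bool on_dirty_list;	/* models membership of that list */
};

/* Models f2fs_inode_synced(): drop the inode from the dirty list. */
static void toy_inode_synced(struct toy_sb *sb, struct toy_inode *inode)
{
	if (inode->on_dirty_list) {
		inode->on_dirty_list = false;
		sb->nr_dirty--;
	}
}

/* Models the inode write-back: it clears the dirty state only on success. */
static int toy_update_inode_page(struct toy_sb *sb, struct toy_inode *inode,
				 bool node_loadable)
{
	if (!node_loadable)
		return -ENOENT;	/* node block missing, as in the fuzzed image */

	toy_inode_synced(sb, inode);
	return 0;
}

/* Models the eviction path with the fix applied. */
static int toy_evict_inode(struct toy_sb *sb, struct toy_inode *inode,
			   bool node_loadable)
{
	int err = toy_update_inode_page(sb, inode, node_loadable);

	/*
	 * Error path: the write-back never ran, so clear the dirty state
	 * here. Otherwise the list would keep pointing at freed memory.
	 */
	if (err)
		toy_inode_synced(sb, inode);

	return err;
}
```

Without the `if (err)` branch, the model leaves `on_dirty_list` set after a failed eviction, which corresponds to the stale list entry that `__list_del_entry_valid()` later trips over in the KASAN report.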

PS: As I verified, the reproducer [2] triggers this bug in v6.1.129
but not in v6.16-rc4, because there the testcase stops early once f2fs
detects other corruption:

F2FS-fs (loop0): inconsistent node block, node_type:2, nid:8, node_footer[nid:8,ino:8,ofs:0,cpver:5013063228981249506,blkaddr:15366]
F2FS-fs (loop0): f2fs_lookup: inode (ino=9) has zero i_nlink

Fixes: 0f18b46 ("f2fs: flush inode metadata when checkpoint is doing")
Closes: https://syzkaller.appspot.com/x/report.txt?x=13448368580000
Signed-off-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>