@sdimitro

No description provided.

osandov and others added 26 commits February 21, 2020 10:37
I've been wanting to add type hints for the _drgn C extension for
a while. The main blocker was that there is a large overlap between the
documentation (in docs/api_reference.rst) and the stub file, and I
really didn't want to duplicate the information. Therefore, it was a
requirement that the documentation could be generated from the stub
file, or vice versa. Unfortunately, none of the existing tools that I
could find supported this very well. So, I bit the bullet and wrote my
own Sphinx extension that uses the stub file as the source of truth (and
subsumes my old autopackage extension and gen_docstrings script).

The stub file is probably incomplete/inaccurate in places, but this
should be a good starting point to improve on.

Closes #22.
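The core idea of treating a stub file as the source of truth for documentation can be sketched with the standard library's ast module. This is a minimal illustration, not the Sphinx extension from this commit. The sample stub contents and the collect_docstrings helper are hypothetical. The sketch only maps dotted names to docstrings, which a documentation generator could then render.

```python
import ast

# Hypothetical stub contents for illustration; not drgn's actual _drgn.pyi.
STUB_SOURCE = '''
class Program:
    """A program being debugged."""

    def read(self, address: int, size: int) -> bytes:
        """Read size bytes of memory starting at address."""
        ...
'''


def collect_docstrings(source: str) -> dict:
    """Map dotted names of classes/functions in a stub to their docstrings."""
    tree = ast.parse(source)
    docs = {}

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                name = prefix + child.name
                doc = ast.get_docstring(child)  # cleaned docstring or None
                if doc is not None:
                    docs[name] = doc
                # Recurse so methods are recorded as "Class.method".
                visit(child, name + ".")

    visit(tree, "")
    return docs
```

Because the stub is parsed rather than imported, this works even though the real C extension is never loaded.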
String annotations (i.e., forward references) need to be parsed into an
ast node. Do it as a transformation step immediately after parsing the
source. We can also squash the constant node transformation into this
one.
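A transformation step like this can be written as an ast.NodeTransformer that runs right after ast.parse(). The sketch below is hypothetical and is not the code from this commit. It replaces string constants inside argument, return, and variable annotations with the expression they contain. It also parses strings nested inside annotations, so a real implementation would need to skip cases such as Literal["..."].

```python
import ast
from typing import Optional


class _StringToExpr(ast.NodeTransformer):
    """Replace every string constant in a subtree with its parsed expression."""

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        if isinstance(node.value, str):
            # mode="eval" parses a single expression; .body is that expression.
            return ast.parse(node.value, mode="eval").body
        return node


class ForwardRefTransformer(ast.NodeTransformer):
    """Turn string annotations (forward references) into real AST nodes."""

    @staticmethod
    def _parse(annotation: Optional[ast.expr]) -> Optional[ast.AST]:
        if annotation is None:
            return None
        return _StringToExpr().visit(annotation)

    def visit_arg(self, node: ast.arg) -> ast.AST:
        node.annotation = self._parse(node.annotation)
        return self.generic_visit(node)

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        node.returns = self._parse(node.returns)
        return self.generic_visit(node)

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_AnnAssign(self, node: ast.AnnAssign) -> ast.AST:
        node.annotation = self._parse(node.annotation)
        return self.generic_visit(node)
```

Usage: `tree = ForwardRefTransformer().visit(ast.parse(source))`. Later passes can then treat `"Foo"` and `Foo` identically.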
While we're here, make generate_dwarf_constants.py use the bundled
dwarf.h, generate code that black is happy with, and use the keyword
list from the standard library.
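A keyword list matters here because some DWARF constants end in a Python keyword once their prefix is stripped (for example DW_OP_and, DW_OP_or, DW_OP_not). The sketch below uses keyword.iskeyword() from the standard library in place of a hand-maintained list. The helper name and the trailing-underscore convention are assumptions for illustration. They are not necessarily what generate_dwarf_constants.py emits.

```python
import keyword


def constant_to_identifier(prefix: str, name: str) -> str:
    """Strip a DWARF prefix such as "DW_OP_" and return a valid Python name.

    Hypothetical helper: appending "_" to keywords is an assumed convention.
    """
    ident = name[len(prefix):] if name.startswith(prefix) else name
    # keyword.iskeyword() tracks the running interpreter's keyword list.
    if keyword.iskeyword(ident):
        ident += "_"
    return ident
```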
We only lazily evaluate compound type members and function type
parameters, which are never void.
The plain variant is a trivial wrapper around the internal variant, so
get rid of the wrapper and use the internal variant directly everywhere.
This way, languages can be identified by an index, which will be useful
for adding Python bindings for drgn_language and for adding a language
field to drgn_type.
For types obtained from DWARF, we determine it from the language of the
CU. For other types, it can be specified manually or fall back to the
default (C). Then, we can use the language for operations where the type
is available.
For operations where we don't have a type available, we currently fall
back to C. Instead, we should guess the language of the program and use
that as the default. The heuristic implemented here gets the language
of the CU containing "main" (except for the Linux kernel, which is
always C). In the future, we should allow manually overriding the
automatically determined language.
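The heuristic described above can be sketched in a few lines. The following is a hypothetical model, not libdrgn's implementation (which is in C). Language, CompilationUnit and guess_default_language are illustrative stand-ins, not drgn's API. Each compile unit is modeled as a language plus the set of function names it defines.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Iterable


class Language(Enum):
    # Hypothetical stand-in for drgn's language objects.
    C = "C"
    CPP = "C++"


@dataclass
class CompilationUnit:
    # Hypothetical model of a DWARF compile unit: its language and the
    # names of the functions it defines.
    language: Language
    functions: FrozenSet[str]


def guess_default_language(
    cus: Iterable[CompilationUnit], is_linux_kernel: bool
) -> Language:
    """Guess a program's default language as described in the commit."""
    if is_linux_kernel:
        return Language.C  # The kernel is always C.
    for cu in cus:
        if "main" in cu.functions:
            return cu.language  # Language of the CU containing main().
    return Language.C  # Fall back to C when main() isn't found.
```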
Introduce the bpf_inspect.py drgn script to list BPF programs and maps and
their properties that are unavailable to user space via the kernel API.

The script was initially sent to the kernel tree [1], but it was agreed
that the drgn repo is a better place for it and that it's a good idea to
create a `tools/` directory in drgn to keep tools like this. See [2] for
details.

The main use case bpf_inspect.py covers is showing BPF programs attached
to other BPF programs via the recently introduced freplace/fentry/fexit
mechanisms. There is no user-space API to get this info; for example,
bpftool can show all BPF programs but can't show whether program A
replaces a function in program B.

Example:

  % sudo tools/bpf_inspect.py p | grep test_pkt_access
     650: BPF_PROG_TYPE_SCHED_CLS          test_pkt_access
     654: BPF_PROG_TYPE_TRACING            test_main                        linked:[650->25: BPF_TRAMP_FEXIT test_pkt_access->test_pkt_access()]
     655: BPF_PROG_TYPE_TRACING            test_subprog1                    linked:[650->29: BPF_TRAMP_FEXIT test_pkt_access->test_pkt_access_subprog1()]
     656: BPF_PROG_TYPE_TRACING            test_subprog2                    linked:[650->31: BPF_TRAMP_FEXIT test_pkt_access->test_pkt_access_subprog2()]
     657: BPF_PROG_TYPE_TRACING            test_subprog3                    linked:[650->21: BPF_TRAMP_FEXIT test_pkt_access->test_pkt_access_subprog3()]
     658: BPF_PROG_TYPE_EXT                new_get_skb_len                  linked:[650->16: BPF_TRAMP_REPLACE test_pkt_access->get_skb_len()]
     659: BPF_PROG_TYPE_EXT                new_get_skb_ifindex              linked:[650->23: BPF_TRAMP_REPLACE test_pkt_access->get_skb_ifindex()]
     660: BPF_PROG_TYPE_EXT                new_get_constant                 linked:[650->19: BPF_TRAMP_REPLACE test_pkt_access->get_constant()]

This shows that there is a program test_pkt_access with id 650, and that
multiple other tracing and ext programs are attached to functions in
test_pkt_access.

For example, the line:

     658: BPF_PROG_TYPE_EXT                new_get_skb_len                  linked:[650->16: BPF_TRAMP_REPLACE test_pkt_access->get_skb_len()]

means that BPF program new_get_skb_len, id 658, type BPF_PROG_TYPE_EXT
replaces (BPF_TRAMP_REPLACE) function get_skb_len() that has BTF id 16
in BPF program test_pkt_access, prog id 650.
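To post-process this output, the reading of the line above can be turned into a small parser. The field layout here is inferred from the sample output only, not from the script's source. The regular expression and parse_prog_line are hypothetical and may need adjusting if the output format changes.

```python
import re
from typing import Optional

# Layout inferred from the sample output above (an assumption):
#   <prog id>: <prog type> <prog name>
#     [linked:[<target prog id>-><BTF id>: <attach type> <target prog>-><func>()]]
LINE_RE = re.compile(
    r"^\s*(?P<prog_id>\d+):\s+(?P<prog_type>\w+)\s+(?P<prog_name>\w+)"
    r"(?:\s+linked:\[(?P<target_prog_id>\d+)->(?P<btf_id>\d+):\s+"
    r"(?P<attach_type>\w+)\s+(?P<target_prog_name>\w+)->(?P<target_func>\w+)\(\)\])?"
    r"\s*$"
)


def parse_prog_line(line: str) -> Optional[dict]:
    """Parse one line of `bpf_inspect.py prog` output into a dict, or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    info = m.groupdict()
    # Numeric fields become ints; unlinked programs leave link fields as None.
    for key in ("prog_id", "target_prog_id", "btf_id"):
        if info[key] is not None:
            info[key] = int(info[key])
    return info
```

For the example line, this yields prog_id 658, target_prog_id 650, btf_id 16, attach_type "BPF_TRAMP_REPLACE" and target_func "get_skb_len".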

Only very simple output is supported for now, but it can be extended in
the future if needed.

The script is extensible and currently implements two subcommands:
* prog (alias: p) to list all BPF programs;
* map (alias: m) to list all BPF maps.

Developers can easily tweak the script to print other interesting parts
of programs or maps.
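A subcommand layout with aliases like this maps directly onto argparse subparsers, and produces the `{prog,p,map,m}` choices shown in the help output below. The following is a hypothetical skeleton, not the script's actual code. list_progs and list_maps are placeholders, and the drgn-specific traversal of kernel BPF structures is omitted.

```python
import argparse


def list_progs(args: argparse.Namespace) -> str:
    # Placeholder: the real script walks kernel BPF programs with drgn.
    return "BPF programs"


def list_maps(args: argparse.Namespace) -> str:
    # Placeholder: the real script walks kernel BPF maps with drgn.
    return "BPF maps"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="drgn script to list BPF programs or maps and their "
        "properties unavailable via kernel API."
    )
    subparsers = parser.add_subparsers(title="subcommands")
    # aliases= lets "p" and "m" act as short forms of "prog" and "map".
    prog_parser = subparsers.add_parser("prog", aliases=["p"], help="list BPF programs")
    prog_parser.set_defaults(func=list_progs)
    map_parser = subparsers.add_parser("map", aliases=["m"], help="list BPF maps")
    map_parser.set_defaults(func=list_maps)
    return parser


def main(argv=None) -> None:
    parser = build_parser()
    args = parser.parse_args(argv)
    if not hasattr(args, "func"):
        # No subcommand given: print usage instead of failing.
        parser.print_help()
        return
    print(args.func(args))


if __name__ == "__main__":
    main()
```

Adding another subcommand is then one more add_parser() call plus a handler function.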

More examples of output:

  % sudo tools/bpf_inspect.py p | shuf -n 3
      81: BPF_PROG_TYPE_CGROUP_SOCK_ADDR   tw_ipt_bind
      94: BPF_PROG_TYPE_CGROUP_SOCK_ADDR   tw_ipt_bind
      43: BPF_PROG_TYPE_KPROBE             kprobe__tcp_reno_cong_avoid

  % sudo tools/bpf_inspect.py m | shuf -n 3
     213: BPF_MAP_TYPE_HASH                errors
      30: BPF_MAP_TYPE_ARRAY               sslwall_setting
      41: BPF_MAP_TYPE_LRU_HASH            flow_to_snd

Help:

  % sudo tools/bpf_inspect.py
  usage: bpf_inspect.py [-h] {prog,p,map,m} ...

  drgn script to list BPF programs or maps and their properties
  unavailable via kernel API.

  See https://github.com/osandov/drgn/ for more details on drgn.

  optional arguments:
    -h, --help      show this help message and exit

  subcommands:
    {prog,p,map,m}
      prog (p)      list BPF programs
      map (m)       list BPF maps

[1] https://lore.kernel.org/bpf/20200228201514.GB51456@rdna-mbp/T/
[2] https://lore.kernel.org/bpf/20200228201514.GB51456@rdna-mbp/T/#mefed65e8a98116bd5d07d09a570a3eac46724951

Signed-off-by: Andrey Ignatov <[email protected]>
`examples/linux/bpf.py` was superseded by `tools/bpf_inspect.py` so no
reason to keep it around anymore. Remove it.

Signed-off-by: Andrey Ignatov <[email protected]>
We should be looking at the kind of the previous token, not the kind of
the unexpected token.

Closes #52.
We need to keep the Program alive for its types to stay valid, not just
the objects the Program has pinned. (I have no idea why I changed this
in commit 565e034 ("libdrgn: make symbol index pluggable with
callbacks").)
Instead, print a warning (unless in quiet mode).
The upcoming vmtest rework won't have any block devices, so let's add a
loop device so that we always have a device to test with.
sdimitro requested a review from shartse March 27, 2020 17:45
sdimitro merged commit 2d449de into 6.0/stage Apr 1, 2020
delphix-devops-bot pushed a commit that referenced this pull request Sep 27, 2025
The CI has intermittently been hitting the following test failures on
Python 3.8 with Clang:

  ======================================================================
  ERROR: test_task_cpu (tests.linux_kernel.helpers.test_sched.TestSched)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "/home/runner/work/drgn/drgn/tests/linux_kernel/helpers/test_sched.py", line 40, in test_task_cpu
      with fork_and_stop(os.sched_setaffinity, 0, (cpu,)) as (pid, _):
    File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/contextlib.py", line 113, in __enter__
      return next(self.gen)
    File "/home/runner/work/drgn/drgn/tests/linux_kernel/__init__.py", line 203, in fork_and_stop
      ret = pickle.load(pipe_r)
  EOFError: Ran out of input

The EOFError occurs because the forked process segfaults immediately:

  python[132]: segfault at 7f8f87085014 ip 00007f8f891e9774 sp 00007ffccf7acf00 error 4 in ld-linux-x86-64.so.2[16774,7f8f891d5000+2a000] likely on CPU 0 (core 0, socket 0)
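The EOFError above follows from how a fork-and-report helper behaves when the child dies early. The traceback only shows the parent's `pickle.load(pipe_r)`. It is an assumption here that fork_and_stop forks a child that sends a pickled result back over a pipe. Under that assumption, a child that dies before writing anything leaves the parent reading an empty pipe. The simplified, hypothetical run_in_child below does not stop the child the way fork_and_stop presumably does. It reproduces the same EOFError without a segfault by having the child exit immediately.

```python
import os
import pickle


def run_in_child(fn, *args):
    """Fork, call fn(*args) in the child, and return its pickled result.

    Hypothetical simplification of a fork-and-report test helper.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send the result to the parent, then exit without running
        # the parent's cleanup handlers.
        os.close(r)
        try:
            with os.fdopen(w, "wb") as pipe_w:
                pickle.dump(fn(*args), pipe_w)
        finally:
            os._exit(0)
    os.close(w)
    try:
        with os.fdopen(r, "rb") as pipe_r:
            # If the child dies before writing (e.g. it segfaults in a fork
            # handler), the pipe is empty and pickle.load raises EOFError.
            return pickle.load(pipe_r)
    finally:
        os.waitpid(pid, 0)
```

For example, `run_in_child(pow, 2, 10)` returns 1024, while `run_in_child(os._exit, 1)` raises `EOFError: Ran out of input` in the parent.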

The segfault is on dereferencing cache_new in _dl_load_cache_lookup()
in ld-linux here:
https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/dl-cache.c;h=88bf78ad7c914b02109d6ddef7e08c0e8fd4574d;hb=f94f6d8a3572840d3ba42ab9ace3ea522c99c0c2#l489

The call comes from a libomp fork handler:

  #0  0x00007f5566f9d774 in _dl_load_cache_lookup (name=name@entry=0x7f55654afde6 "libmemkind.so")
      at ./elf/dl-cache.c:498
  #1  0x00007f5566f91982 in _dl_map_object (loader=loader@entry=0x55f8a170b670,
      name=name@entry=0x7f55654afde6 "libmemkind.so", type=type@entry=2, trace_mode=trace_mode@entry=0,
      mode=mode@entry=-1879048191, nsid=<optimized out>) at ./elf/dl-load.c:2193
  #2  0x00007f5566f959a9 in dl_open_worker_begin (a=a@entry=0x7fffcf5851f0) at ./elf/dl-open.c:534
  #3  0x00007f5566b4ab08 in __GI__dl_catch_exception (exception=exception@entry=0x7fffcf585050,
      operate=operate@entry=0x7f5566f95900 <dl_open_worker_begin>, args=args@entry=0x7fffcf5851f0)
      at ./elf/dl-error-skeleton.c:208
  #4  0x00007f5566f94f9a in dl_open_worker (a=a@entry=0x7fffcf5851f0) at ./elf/dl-open.c:782
  #5  0x00007f5566b4ab08 in __GI__dl_catch_exception (exception=exception@entry=0x7fffcf5851d0,
      operate=operate@entry=0x7f5566f94f60 <dl_open_worker>, args=args@entry=0x7fffcf5851f0)
      at ./elf/dl-error-skeleton.c:208
  #6  0x00007f5566f9534e in _dl_open (file=<optimized out>, mode=-2147483647, caller_dlopen=0x7f55653fa882, nsid=-2,
      argc=9, argv=<optimized out>, env=0x55f8a1477e10) at ./elf/dl-open.c:883
  #7  0x00007f5566a6663c in dlopen_doit (a=a@entry=0x7fffcf585460) at ./dlfcn/dlopen.c:56
  #8  0x00007f5566b4ab08 in __GI__dl_catch_exception (exception=exception@entry=0x7fffcf5853c0, operate=<optimized out>,
      args=<optimized out>) at ./elf/dl-error-skeleton.c:208
  #9  0x00007f5566b4abd3 in __GI__dl_catch_error (objname=0x7fffcf585418, errstring=0x7fffcf585420,
      mallocedp=0x7fffcf585417, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:227
  #10 0x00007f5566a6612e in _dlerror_run (operate=operate@entry=0x7f5566a665e0 <dlopen_doit>,
      args=args@entry=0x7fffcf585460) at ./dlfcn/dlerror.c:138
  #11 0x00007f5566a666c8 in dlopen_implementation (dl_caller=<optimized out>, mode=<optimized out>, file=<optimized out>)
      at ./dlfcn/dlopen.c:71
  #12 ___dlopen (file=<optimized out>, mode=<optimized out>) at ./dlfcn/dlopen.c:81
  #13 0x00007f55653fa882 in ?? () from /usr/lib/llvm-14/lib/libomp.so.5
  #14 0x00007f5565413556 in ?? () from /usr/lib/llvm-14/lib/libomp.so.5
  #15 0x00007f5565421d1a in ?? () from /usr/lib/llvm-14/lib/libomp.so.5
  #16 0x00007f5566ac0fc1 in __run_fork_handlers (who=who@entry=atfork_run_child, do_locking=do_locking@entry=true)
      at ./posix/register-atfork.c:130
  #17 0x00007f5566ac08d3 in __libc_fork () at ./posix/fork.c:108
  #18 0x00007f5566e108ad in os_fork_impl (module=<optimized out>) at ./Modules/posixmodule.c:6250
  #19 os_fork (module=<optimized out>, _unused_ignored=<optimized out>) at ./Modules/clinic/posixmodule.c.h:2750

This doesn't happen in Python 3.9, which I bisected to CPython commit
45a78f906d2d ("bpo-44434: Don't call PyThread_exit_thread() explicitly
(GH-26758)") (in v3.11, backported to v3.9.6).

That commit describes a different symptom where the process aborts
because libgcc_s can't be loaded. I don't understand how that issue can
cause our crash, but the fix appears to be the same. The discussion also
suggests a workaround: linking to libgcc_s explicitly.

Apply the workaround, which appears to fix our problem. We only do this
for the CI and not for the general build for a few reasons:

1. I'm nervous about explicitly linking to this low-level library
   unconditionally, and the logic to decide when it's necessary (only
   for Python 3.8 and glibc) isn't worth the trouble.
2. The situation required to hit it (drgn + Python threading + fork) is
   unlikely outside of our test suite.
3. Python 3.8 is EOL.
4. Builds with libkdumpfile already pull in libgcc_s via libkdumpfile ->
   libsnappy -> libstdc++ -> libgcc_s.

Signed-off-by: Omar Sandoval <[email protected]>