@github-actions github-actions bot commented Oct 9, 2020

No description provided.

osandov and others added 16 commits September 29, 2020 23:04
doxygen warns about a few obsolete Doxyfile options. Update it with
doxygen -u.

Signed-off-by: Omar Sandoval <[email protected]>
The Doxygen documentation for libdrgn has bit-rotted over time. Bring
back the Internal module, clean up a few renamed members and parameters,
and fix broken parsing caused by the generic definition macros.

Signed-off-by: Omar Sandoval <[email protected]>
drgn_type_members_eq() skips comparing the types of anonymous members.
Fix that and add a test for it.

Signed-off-by: Omar Sandoval <[email protected]>
These will be used in upcoming changes.

Signed-off-by: Omar Sandoval <[email protected]>
And improve their documentation.

Signed-off-by: Omar Sandoval <[email protected]>
min() and max() from the Linux kernel go through the trouble of
resulting in a constant expression if the arguments are constant
expressions, but they can't be used outside of a function due to their
use of ({ }). This means that they can't be used for, e.g., enumerators
or global arrays. Let's simplify min() and max() and instead add
explicit min_iconst() and max_iconst() macros that can be used
everywhere that an integer constant expression is required. We can then
use them in hash_table.h. While we're here, let's split these into their
own header file and document them better.

Signed-off-by: Omar Sandoval <[email protected]>
Use *_hash_pair() for hash functions that do the full double hashing and
return a struct hash_pair and hash_*() for other hashing utility
functions. Also change some of the equality function names to be more
symmetric and improve the documentation.

Signed-off-by: Omar Sandoval <[email protected]>
These were added in commit e5874ad ("libdrgn: use libdwfl"), but
they have never been used. Remove them.

Signed-off-by: Omar Sandoval <[email protected]>
The next commit will allow using the offline path for the live kernel,
so the offline naming won't make much sense. Fold the offline path into
the top-level functions, and make the live path an escape hatch. Also
add some comments and improve naming for the file and directory handles
and update the coding style.

Signed-off-by: Omar Sandoval <[email protected]>
…and /sys/module

We use /proc/modules and /sys/module to find loaded kernel modules for
the running kernel instead of walking the module list in the core dump
as an optimization. To make it easier to test the core dump path, add an
environment variable to disable the optimization.
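
A minimal sketch of such an escape hatch, written in Python for brevity rather than libdrgn's C. The variable name `EXAMPLE_USE_PROC_AND_SYS_MODULES` and the rule that "0" disables the fast path are assumptions made for illustration, not the names or semantics used by this commit.

```python
import os

# Hypothetical variable name, used only for this illustration.
EXAMPLE_ENV_VAR = "EXAMPLE_USE_PROC_AND_SYS_MODULES"


def use_proc_and_sys_modules(environ=None):
    """Return True unless the environment explicitly disables the fast path."""
    if environ is None:
        environ = os.environ
    value = environ.get(EXAMPLE_ENV_VAR)
    if value is None:
        # Unset: keep the optimization enabled by default.
        return True
    # Assumption for this sketch: "0" disables, any other value enables.
    return value != "0"
```

A test of the core dump path would set the variable to "0" before starting, so that the slower module-list walk is exercised even on a live kernel.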

Signed-off-by: Omar Sandoval <[email protected]>
We're freeing path and then using it to report an error.

This has some weird knock-on effects. Since we freed the path, the error
message contains garbage. So, PyErr_SetString() can't decode it as a
UTF-8 string. The end result is a MissingDebugInfoError with no message.

Fix it by creating the error before freeing the path.

Signed-off-by: Omar Sandoval <[email protected]>
If cache_kernel_module_sections() in report_loaded_kernel_module()
fails, we continue to the next iteration without advancing to the next
kernel module. Then, we fail on that same kernel module and repeat. Make
sure that we go to the next kernel module.
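
The shape of this bug can be sketched in Python (the real code is C, and `report_modules` and `cache_sections` are hypothetical names for illustration only). In a loop that advances its cursor by hand, an early `continue` placed before the advance retries the same element forever, so the advance has to happen before any path that can skip the rest of the body.

```python
def report_modules(modules, cache_sections):
    """Return the modules whose sections were cached, skipping failures."""
    reported = []
    i = 0
    while i < len(modules):
        module = modules[i]
        # Advance before anything that can `continue`. If this increment
        # sat at the bottom of the loop body, a failure below would retry
        # the same module indefinitely.
        i += 1
        try:
            cache_sections(module)
        except OSError:
            # Skip this module and move on to the next one.
            continue
        reported.append(module)
    return reported
```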

Fixes: 423d2cd ("libdrgn: dwarf_index: rework file reporting")
Reported-by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Linux v5.8 changed the module section structure, so we need to get the
section name differently.

Closes #73.

Reported-by: Serapheim Dimitropoulos <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
@sdimitro sdimitro requested a review from prakashsurya October 16, 2020 15:09
@prakashsurya prakashsurya merged commit 347ff46 into 6.0/stage Oct 16, 2020
delphix-devops-bot pushed a commit that referenced this pull request Sep 27, 2025
The CI has intermittently been hitting the following test failures on
Python 3.8 with Clang:

  ======================================================================
  ERROR: test_task_cpu (tests.linux_kernel.helpers.test_sched.TestSched)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "/home/runner/work/drgn/drgn/tests/linux_kernel/helpers/test_sched.py", line 40, in test_task_cpu
      with fork_and_stop(os.sched_setaffinity, 0, (cpu,)) as (pid, _):
    File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/contextlib.py", line 113, in __enter__
      return next(self.gen)
    File "/home/runner/work/drgn/drgn/tests/linux_kernel/__init__.py", line 203, in fork_and_stop
      ret = pickle.load(pipe_r)
  EOFError: Ran out of input

The EOFError occurs because the forked process segfaults immediately:

  python[132]: segfault at 7f8f87085014 ip 00007f8f891e9774 sp 00007ffccf7acf00 error 4 in ld-linux-x86-64.so.2[16774,7f8f891d5000+2a000] likely on CPU 0 (core 0, socket 0)
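
To see why a dying child surfaces as EOFError in the parent, here is a minimal sketch of the pattern, assuming a pipe-and-pickle handshake like the one in the traceback. `read_result_from_child` is a hypothetical helper, not the test suite's `fork_and_stop`, and the crash is simulated with `os._exit()` rather than a real segfault.

```python
import os
import pickle


def read_result_from_child(child_fn):
    """Fork, run child_fn in the child, and unpickle what it sends back."""
    pipe_r_fd, pipe_w_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send the result to the parent, then exit immediately.
        try:
            os.close(pipe_r_fd)
            with os.fdopen(pipe_w_fd, "wb") as pipe_w:
                pickle.dump(child_fn(), pipe_w)
        finally:
            os._exit(0)
    # Parent: close our copy of the write end so that a dead child
    # produces end-of-file instead of a read that blocks forever.
    os.close(pipe_w_fd)
    try:
        with os.fdopen(pipe_r_fd, "rb") as pipe_r:
            return pickle.load(pipe_r)
    finally:
        os.waitpid(pid, 0)
```

If the child dies before writing anything, for example `read_result_from_child(lambda: os._exit(1))`, the parent reads an empty stream and `pickle.load()` raises EOFError, which matches the symptom above.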

The segfault occurs when dereferencing cache_new in _dl_load_cache_lookup()
in ld-linux, here:
https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/dl-cache.c;h=88bf78ad7c914b02109d6ddef7e08c0e8fd4574d;hb=f94f6d8a3572840d3ba42ab9ace3ea522c99c0c2#l489

The call comes from a libomp fork handler:

  #0  0x00007f5566f9d774 in _dl_load_cache_lookup (name=name@entry=0x7f55654afde6 "libmemkind.so")
      at ./elf/dl-cache.c:498
  #1  0x00007f5566f91982 in _dl_map_object (loader=loader@entry=0x55f8a170b670,
      name=name@entry=0x7f55654afde6 "libmemkind.so", type=type@entry=2, trace_mode=trace_mode@entry=0,
      mode=mode@entry=-1879048191, nsid=<optimized out>) at ./elf/dl-load.c:2193
  #2  0x00007f5566f959a9 in dl_open_worker_begin (a=a@entry=0x7fffcf5851f0) at ./elf/dl-open.c:534
  #3  0x00007f5566b4ab08 in __GI__dl_catch_exception (exception=exception@entry=0x7fffcf585050,
      operate=operate@entry=0x7f5566f95900 <dl_open_worker_begin>, args=args@entry=0x7fffcf5851f0)
      at ./elf/dl-error-skeleton.c:208
  #4  0x00007f5566f94f9a in dl_open_worker (a=a@entry=0x7fffcf5851f0) at ./elf/dl-open.c:782
  #5  0x00007f5566b4ab08 in __GI__dl_catch_exception (exception=exception@entry=0x7fffcf5851d0,
      operate=operate@entry=0x7f5566f94f60 <dl_open_worker>, args=args@entry=0x7fffcf5851f0)
      at ./elf/dl-error-skeleton.c:208
  #6  0x00007f5566f9534e in _dl_open (file=<optimized out>, mode=-2147483647, caller_dlopen=0x7f55653fa882, nsid=-2,
      argc=9, argv=<optimized out>, env=0x55f8a1477e10) at ./elf/dl-open.c:883
  #7  0x00007f5566a6663c in dlopen_doit (a=a@entry=0x7fffcf585460) at ./dlfcn/dlopen.c:56
  #8  0x00007f5566b4ab08 in __GI__dl_catch_exception (exception=exception@entry=0x7fffcf5853c0, operate=<optimized out>,
      args=<optimized out>) at ./elf/dl-error-skeleton.c:208
  #9  0x00007f5566b4abd3 in __GI__dl_catch_error (objname=0x7fffcf585418, errstring=0x7fffcf585420,
      mallocedp=0x7fffcf585417, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:227
  #10 0x00007f5566a6612e in _dlerror_run (operate=operate@entry=0x7f5566a665e0 <dlopen_doit>,
      args=args@entry=0x7fffcf585460) at ./dlfcn/dlerror.c:138
  #11 0x00007f5566a666c8 in dlopen_implementation (dl_caller=<optimized out>, mode=<optimized out>, file=<optimized out>)
      at ./dlfcn/dlopen.c:71
  #12 ___dlopen (file=<optimized out>, mode=<optimized out>) at ./dlfcn/dlopen.c:81
  #13 0x00007f55653fa882 in ?? () from /usr/lib/llvm-14/lib/libomp.so.5
  #14 0x00007f5565413556 in ?? () from /usr/lib/llvm-14/lib/libomp.so.5
  #15 0x00007f5565421d1a in ?? () from /usr/lib/llvm-14/lib/libomp.so.5
  #16 0x00007f5566ac0fc1 in __run_fork_handlers (who=who@entry=atfork_run_child, do_locking=do_locking@entry=true)
      at ./posix/register-atfork.c:130
  #17 0x00007f5566ac08d3 in __libc_fork () at ./posix/fork.c:108
  #18 0x00007f5566e108ad in os_fork_impl (module=<optimized out>) at ./Modules/posixmodule.c:6250
  #19 os_fork (module=<optimized out>, _unused_ignored=<optimized out>) at ./Modules/clinic/posixmodule.c.h:2750

This doesn't happen in Python 3.9. I bisected the fix to CPython commit
45a78f906d2d ("bpo-44434: Don't call PyThread_exit_thread() explicitly
(GH-26758)") (in v3.11, backported to v3.9.6).

That commit describes a different symptom where the process aborts
because libgcc_s can't be loaded. I don't understand how that issue can
cause our crash, but the fix appears to be the same. The discussion also
suggests a workaround: linking to libgcc_s explicitly.

Apply the workaround, which appears to fix our problem. We only do this
for the CI and not for the general build for a few reasons:

1. I'm nervous about explicitly linking to this low-level library
   unconditionally, and the logic to decide when it's necessary (only
   for Python 3.8 and glibc) isn't worth the trouble.
2. The situation required to hit it (drgn + Python threading + fork) is
   unlikely outside of our test suite.
3. Python 3.8 is EOL.
4. Builds with libkdumpfile already pull in libgcc_s via libkdumpfile ->
   libsnappy -> libstdc++ -> libgcc_s.

Signed-off-by: Omar Sandoval <[email protected]>

3 participants