Skip to content

Conversation

github-actions[bot]
Copy link

@github-actions github-actions bot commented Jul 1, 2020

No description provided.

osandov and others added 7 commits June 24, 2020 13:33
…SSE 4.2

We were forgetting to mask away the extra bits. There are two places
that we use the tag without converting it to a uint8_t:
hash_table_probe_delta(), which is mostly benign since we mask it by the
chunk mask anyways; and table_chunk_match() without SSE 2, which
completely breaks.

While we're here, let's align the comments better.
This makes it much easier to follow along with the code and understand
the format.
Declaring a local vector or hash table and separately initializing it
with vector_init()/hash_table_init() is annoying. Add macros that can be
used as initializers.

This exposes several places where the C89 style of placing all
declarations at the beginning of a block is awkward. I adopted this
style from the Linux kernel, which uses C89 and thus requires this
style. I'm now convinced that it's usually nicer to declare variables
where they're used. So let's officially adopt the style of mixing
declarations and code (and ditch the blank line after declarations) and
update the functions touched by this change.
I once tried to implement a generic arithmetic right shift macro without
relying on any implementation-defined behavior, but this turned out to
be really hard. drgn is fairly tied to GCC and GCC-compatible compilers
(like Clang), so let's just assume GCC's model [1]: modular conversion
to signed types, two's complement signed bitwise operators, and sign
extension for signed right shift.

1: https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
This documents best practices for contributing to drgn. We now require a
DCO sign-off.

Also clean up some related areas in the documentation.

Signed-off-by: Omar Sandoval <[email protected]>
@sdimitro sdimitro requested a review from prakashsurya July 8, 2020 17:08
osandov and others added 5 commits July 8, 2020 18:34
Commit eea5422 ("libdrgn: make Linux kernel stack unwinding more
robust") overlooked that if the task is running in userspace, the stack
pointer in PRSTATUS obviously won't match the kernel stack pointer.
Let's bite the bullet and use the PID. If the race shows up in practice,
we can try to come up with another workaround.
GCC 10 doesn't generate a DIE for union thread_union, which breaks our
THREAD_SIZE object finder. The previous change removed our internal
dependency on THREAD_SIZE, so disable this test while I investigate why
GCC changed.
@prakashsurya prakashsurya merged commit bdb9336 into 6.0/stage Jul 9, 2020
delphix-devops-bot pushed a commit that referenced this pull request Sep 27, 2025
The CI has intermittently been hitting the following test failures on
Python 3.8 with Clang:

  ======================================================================
  ERROR: test_task_cpu (tests.linux_kernel.helpers.test_sched.TestSched)
  ----------------------------------------------------------------------
  Traceback (most recent call last):
    File "/home/runner/work/drgn/drgn/tests/linux_kernel/helpers/test_sched.py", line 40, in test_task_cpu
      with fork_and_stop(os.sched_setaffinity, 0, (cpu,)) as (pid, _):
    File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/contextlib.py", line 113, in __enter__
      return next(self.gen)
    File "/home/runner/work/drgn/drgn/tests/linux_kernel/__init__.py", line 203, in fork_and_stop
      ret = pickle.load(pipe_r)
  EOFError: Ran out of input

The EOFError occurs because the forked process segfaults immediately:

  python[132]: segfault at 7f8f87085014 ip 00007f8f891e9774 sp 00007ffccf7acf00 error 4 in ld-linux-x86-64.so.2[16774,7f8f891d5000+2a000] likely on CPU 0 (core 0, socket 0)

The segfault is on dereferencing cache_new in in _dl_load_cache_lookup()
in ld-linux here:
https://sourceware.org/git/?p=glibc.git;a=blob;f=elf/dl-cache.c;h=88bf78ad7c914b02109d6ddef7e08c0e8fd4574d;hb=f94f6d8a3572840d3ba42ab9ace3ea522c99c0c2#l489

Which is coming from a libomp fork handler:

  #0  0x00007f5566f9d774 in _dl_load_cache_lookup (name=name@entry=0x7f55654afde6 "libmemkind.so")
      at ./elf/dl-cache.c:498
  #1  0x00007f5566f91982 in _dl_map_object (loader=loader@entry=0x55f8a170b670,
      name=name@entry=0x7f55654afde6 "libmemkind.so", type=type@entry=2, trace_mode=trace_mode@entry=0,
      mode=mode@entry=-1879048191, nsid=<optimized out>) at ./elf/dl-load.c:2193
  #2  0x00007f5566f959a9 in dl_open_worker_begin (a=a@entry=0x7fffcf5851f0) at ./elf/dl-open.c:534
  #3  0x00007f5566b4ab08 in __GI__dl_catch_exception (exception=exception@entry=0x7fffcf585050,
      operate=operate@entry=0x7f5566f95900 <dl_open_worker_begin>, args=args@entry=0x7fffcf5851f0)
      at ./elf/dl-error-skeleton.c:208
  #4  0x00007f5566f94f9a in dl_open_worker (a=a@entry=0x7fffcf5851f0) at ./elf/dl-open.c:782
  #5  0x00007f5566b4ab08 in __GI__dl_catch_exception (exception=exception@entry=0x7fffcf5851d0,
      operate=operate@entry=0x7f5566f94f60 <dl_open_worker>, args=args@entry=0x7fffcf5851f0)
      at ./elf/dl-error-skeleton.c:208
  #6  0x00007f5566f9534e in _dl_open (file=<optimized out>, mode=-2147483647, caller_dlopen=0x7f55653fa882, nsid=-2,
      argc=9, argv=<optimized out>, env=0x55f8a1477e10) at ./elf/dl-open.c:883
  #7  0x00007f5566a6663c in dlopen_doit (a=a@entry=0x7fffcf585460) at ./dlfcn/dlopen.c:56
  #8  0x00007f5566b4ab08 in __GI__dl_catch_exception (exception=exception@entry=0x7fffcf5853c0, operate=<optimized out>,
      args=<optimized out>) at ./elf/dl-error-skeleton.c:208
  #9  0x00007f5566b4abd3 in __GI__dl_catch_error (objname=0x7fffcf585418, errstring=0x7fffcf585420,
      mallocedp=0x7fffcf585417, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:227
  #10 0x00007f5566a6612e in _dlerror_run (operate=operate@entry=0x7f5566a665e0 <dlopen_doit>,
      args=args@entry=0x7fffcf585460) at ./dlfcn/dlerror.c:138
  #11 0x00007f5566a666c8 in dlopen_implementation (dl_caller=<optimized out>, mode=<optimized out>, file=<optimized out>)
      at ./dlfcn/dlopen.c:71
  #12 ___dlopen (file=<optimized out>, mode=<optimized out>) at ./dlfcn/dlopen.c:81
  #13 0x00007f55653fa882 in ?? () from /usr/lib/llvm-14/lib/libomp.so.5
  #14 0x00007f5565413556 in ?? () from /usr/lib/llvm-14/lib/libomp.so.5
  #15 0x00007f5565421d1a in ?? () from /usr/lib/llvm-14/lib/libomp.so.5
  #16 0x00007f5566ac0fc1 in __run_fork_handlers (who=who@entry=atfork_run_child, do_locking=do_locking@entry=true)
      at ./posix/register-atfork.c:130
  #17 0x00007f5566ac08d3 in __libc_fork () at ./posix/fork.c:108
  #18 0x00007f5566e108ad in os_fork_impl (module=<optimized out>) at ./Modules/posixmodule.c:6250
  #19 os_fork (module=<optimized out>, _unused_ignored=<optimized out>) at ./Modules/clinic/posixmodule.c.h:2750

This doesn't happen in Python 3.9, which I bisected to CPython commit
45a78f906d2d ("bpo-44434: Don't call PyThread_exit_thread() explicitly
(GH-26758)") (in v3.11, backported to v3.9.6).

That commit describes a different symptom where the process aborts
because libgcc_s can't be loaded. I don't understand how that issue can
cause our crash, but the fix appears to be the same. The discussion also
suggests a workaround: linking to libgcc_s explicitly.

Apply the workaround, which appears to fix our problem. We only do this
for the CI and not for the general build for a few reasons:

1. I'm nervous about explicitly linking to this low-level library
   unconditionally, and the logic to decide when it's necessary (only
   for Python 3.8 and glibc) isn't worth the trouble.
2. The situation required to hit it (drgn + Python threading + fork) is
   unlikely outside of our test suite.
3. Python 3.8 is EOL.
4. Builds with libkdumpfile already pull in libgcc_s via libkdumpfile ->
   libsnappy -> libstdc++ -> libgcc_s.

Signed-off-by: Omar Sandoval <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants