-
Notifications
You must be signed in to change notification settings - Fork 6
Merge remote-tracking branch 'origin/upstreams/os-upgrade' into os-upgrade #70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge remote-tracking branch 'origin/upstreams/os-upgrade' into os-upgrade #70
Conversation
Ubuntu enables -Wformat-security by default, but upstream GCC doesn't enable it even with -Wall. It caught some legitimate issues in the module API branch, so let's enable it explicitly. Signed-off-by: Omar Sandoval <[email protected]>
An upcoming commit needs to look up a few notes by name+type, so add a common helper function for that. Signed-off-by: Omar Sandoval <[email protected]>
This will be used for checking the CRC in .gnu_debuglink. Signed-off-by: Omar Sandoval <[email protected]>
These will be used for GNU build IDs. Signed-off-by: Omar Sandoval <[email protected]>
We are going to need to parse various structures that have different 64- and 32-bit formats, may be byte-swapped, and may be unaligned (e.g., various ELF and link map structures). Provide a couple of convenience macros for doing this. Signed-off-by: Omar Sandoval <[email protected]>
This is a trivial wrapper to delete by entry instead of by key or iterator. Signed-off-by: Omar Sandoval <[email protected]>
This has been available since Linux kernel commit 0935288c6e00 ("kdump:
append kernel build-id string to VMCOREINFO") (in v5.9). Save it so we
can use it when loading debugging information.
Unfortunately, the build ID in VMCOREINFO was briefly broken in several
stable releases. 6.10 and 5.15 reached their end-of-life while broken
and so will remain broken forever. It feels like overkill to drop
support for those versions over this, so we work around it with a
version check.
Signed-off-by: Omar Sandoval <[email protected]>
No code changes other than moving it in the file to make upcoming diffs cleaner. Signed-off-by: Omar Sandoval <[email protected]>
We currently use debuginfod via libdwfl, but when we get rid of our libdwfl dependency, we'll need to do the debuginfod calls ourselves. So, let's add the scaffolding for using libdebuginfod. We provide three choices at build time: * No debuginfod (./configure --without-debuginfod). * Soft dependency: load libdebuginfod with dlopen at runtime if available (./configure --with-debuginfod --enable-dlopen-debuginfod). This is the default and probably what we want distros to use. * Hard dependency: link against libdebuginfod (./configure --with-debuginfod --disable-dlopen-debuginfod). This is intended for environments where dlopen can't be used (e.g., manylinux wheels). The client handle will be created lazily, so for now this just sets up some wrappers and doesn't do much with them. Signed-off-by: Omar Sandoval <[email protected]>
We try to pick a good default, but it's not exactly the same as logging so it needs extra flexibility. This will be used by upcoming debuginfod integration. Signed-off-by: Omar Sandoval <[email protected]>
drgn currently provides limited control over how debugging information is found. drgn has hardcoded logic for where to search for debugging information. The most the user can do is provide a list of files for drgn to try in addition to the default locations (with the -s CLI option or the drgn.Program.load_debug_info() method). The implementation is also a mess. We use libdwfl, but its data model is slightly different from what we want, so we have to work around it or reimplement its functionality in several places: see commits e5874ad ("libdrgn: use libdwfl"), e6abfea ("libdrgn: debug_info: report userspace core dump debug info ourselves"), and 1d4854a ("libdrgn: implement optimized x86-64 ELF relocations") for some examples. The mismatched combination of libdwfl and our own code is difficult to maintain, and the lack of control over the whole debug info pipeline has made it difficult to fix several longstanding issues. The solution is a major rework removing our libdwfl dependency and replacing it with our own model. This (huge) commit is that rework comprising the following components: - drgn.Module/struct drgn_module, a representation of a binary used by a program. - Automatic discovery of the modules loaded in a program. - Interfaces for manually creating and overriding modules. - Automatic discovery of debugging information from the standard locations and debuginfod. - Interfaces for custom debug info finders and for manually overriding debugging information. - Tons of test cases. A lot of care was taken to make these interfaces extremely flexible yet cohesive. The existing interfaces are also reimplemented on top of the new functionality to maintain backwards compatibility, with one exception: drgn.Program.load_debug_info()/-s would previously accept files that it didn't find loaded in the program. This turned out to be a big footgun for users, so now this must be done explicitly (with drgn.ExtraModule/--extra-symbols). The API and implementation both owe a lot to libdwfl: - The concepts of modules, module address ranges/section addresses, and file biases are heavily inspired by the libdwfl interfaces. - Ideas for determining modules in userspace processes and core dumps were taken from libdwfl. - Our implementation of ELF symbol table address lookups is based on dwfl_module_addrinfo(). drgn has taken these concepts and fine-tuned them based on lessons learned. Credit is also due to Stephen Brennan for early testing and feedback. Closes #16, closes #25, closes osandov#332. Signed-off-by: Omar Sandoval <[email protected]>
Overwriting a file that libelf has already mmap'd can confuse it and cause it to crash. In particular, libelf/elf_begin.c::file_read_elf() initializes Elf_Scn::rawdata_base and Elf_Scn::data_base from the mmap'd file. libelf/elf_getdata.c::__libelf_set_rawdata_wrlock() also sets Elf_Scn::rawdata_base from the mmap'd file. If the file changes between those two events, then Elf_Scn::rawdata_base will change. Then, the following line in libelf/elf_end.c::elf_end() will try to free an mmap'd pointer: if (scn->data_base != scn->rawdata_base) free (scn->data_base); Stephen reported crashes like this from test_gnu_debugaltlink_debug_directories() while testing a patch that inadvertently caused debug info to be indexed on module creation. Reported-by: Stephen Brennan <[email protected]> Signed-off-by: Omar Sandoval <[email protected]>
Since commit 4e83130 ("Introduce module and debug info finder APIs"), DWARF indexing is somewhat lazy. As a result, drgn_void_type() may require DWARF indexing in order to determine the program's main language. This is overkill for drgn_object_init(), which just needs to initialize a valid dummy object that is usually reinitialized immediately. Signed-off-by: Omar Sandoval <[email protected]>
Fixes: 4e83130 ("Introduce module and debug info finder APIs") Signed-off-by: Omar Sandoval <[email protected]>
Mention that `drgn` is shipped with Ubuntu except jammy, but it's an older version Signed-off-by: Michel Lind <[email protected]>
Since libkdumpfile commit 5b044292abe9 ("Clarify and fix attribute data
lifetime") changes the lifetime of attribute values retrieved with
kdump_attr_ref_get(), the extra reference would keep the PRSTATUS blob
around even after kdump_free().
However, the attribute hierarchy cannot change while iterating over the
PRSTATUS attributes, so it is not necessary to take an attribute reference
and we can use kdump_get_typed_attr().
The attribute blob itself should not change either, but it is a good idea
to keep its data pinned, because a raw pointer to it is stored in the
drgn_thread_set hash table. If some code tries to modify the PRSTATUS
attribute data, the attempt will fail with KDUMP_ERR_BUSY rather than leave
a dangling pointer in the hash table and possibly cause a UAF bug later.
The blob pin does not prevent freeing the blob when the blob reference
count reaches zero.
Signed-off-by: Petr Tesarik <[email protected]>
The kdump_get_typed_attr() function prototype changed in libkdumpfile
commit e182aeaf4d72 ("Make kdump_get_typed_attr() easier to use").
Signed-off-by: Petr Tesarik <[email protected]>
…on_name Multiple people have lamented that StackFrame.name is None for functions implemented in assembly or missing debug info for any other reason. With DWARFless debugging, this will be way more common. My original hope was that StackFrame.name would strictly be the function name from the debugging information and that callers would fall back to getting the symbol name themselves. However, the distinction isn't super meaningful to users, so let's add the fallback directly to StackFrame.name and add StackFrame.function_name with the old behavior of StackFrame.name. Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
This was previously used for testing internals via ctypes, but it's no longer needed. Fixes: 7d251fe ("Translate C lexer tests to C unit tests") Signed-off-by: Omar Sandoval <[email protected]>
The x86-64 fallback unwinder currently has a special case for handling a call to a NULL pointer. Other architectures need the same workaround. To avoid code duplication, let's extract the null program counter check into the generic stack tracing code and add a bad_call_unwind architecture callback. This also gives us a centralized place to add heuristics for detecting non-null bad calls. Signed-off-by: Omar Sandoval <[email protected]>
This will mainly be useful for bad_unwind_call implementations that make use of drgn_register_state_dup() and need to mark some registers as unknown in the unwound frame. Signed-off-by: Omar Sandoval <[email protected]>
Since AArch64 uses a link register rather than storing the return address on the stack, this is a bit easier than on x86-64. Fixes osandov#462. Signed-off-by: Omar Sandoval <[email protected]>
This is a shorter-term solution for anyone who can't run a version of drgn with the previous fix. Signed-off-by: Omar Sandoval <[email protected]>
PID 0 is not unique in the Linux kernel; there is a task with PID 0 for each CPU. stack_trace(0) currently fails with a generic "task not found" error message, which can be confusing; see osandov#462. Add a hint to use idle_task() to the error message when the given PID is 0. Signed-off-by: Omar Sandoval <[email protected]>
Alec Rivers noticed in osandov#461 that WITH_LIBKDUMPFILE is misspelled as WITH_KDUMPFILE here. The whole ifdef block isn't actually needed, so remove it. Fixes: 4e330bb ("cli: indicate if drgn was compiled with libkdumpfile") Signed-off-by: Omar Sandoval <[email protected]>
None of our setter functions handle deletions, which pass the value as NULL. This results in a segfault when attempting to access the value. Fix them all with a new convenience macro. Fixes: 50e4ac8 ("libdrgn: allow overriding program default language") Fixes: 4e83130 ("Introduce module and debug info finder APIs") Signed-off-by: Omar Sandoval <[email protected]>
STRING_BUILDER has been really convenient, so let's do the same for vectors and hash tables. Signed-off-by: Omar Sandoval <[email protected]>
…lable Since the module API was introduced, Program.load_debug_info() and the drgn CLI's -s option match strictly based on build IDs. This fails when the build ID is not available, specifically in the case of Linux kernel core dumps without a usable build ID in VMCOREINFO (old versions and a few buggy stable versions). Before the module API, Program.load_debug_info() and -s used any vmlinux file given to them. This caused confusion when the wrong file was given, so we don't want to bring that behavior back. Instead, let's look for a vmlinux file matching the Linux version from VMCOREINFO. Fixes osandov#464. Fixes: 4e83130 ("Introduce module and debug info finder APIs") Signed-off-by: Omar Sandoval <[email protected]>
We do this so often I'm surprised I didn't add this sooner. Signed-off-by: Omar Sandoval <[email protected]>
Tweak style, split up some examples for clarification, and a couple more small updates. Signed-off-by: Omar Sandoval <[email protected]>
I'm not really even sure how or why the URLs changed. Again. Signed-off-by: Stephen Brennan <[email protected]>
Fixes: adf6472 ("Add plugin system") Signed-off-by: Omar Sandoval <[email protected]>
In some test environments, like Buck, it's not easy to reexecute the Python interpreter and have it find the drgn module. Instead of using subprocess, fork and call drgn.cli._main() directly (setting up the necessary pipes ourselves). This gives us slightly less realistic end-to-end coverage, but it makes the tests more robust. Signed-off-by: Omar Sandoval <[email protected]>
We need to close the socket. Signed-off-by: Omar Sandoval <[email protected]>
Our handling of rbp in ORC_TYPE_REGS is incorrect. First, we set the CFI rule to restore rbp from the on-stack pt_regs. Then, we set it again based on bp_reg, which is always ORC_REG_UNDEFINED, meaning to use the previous rbp. The intention is to keep the value we just got from pt_regs, but we interpret it as the previous frame's rbp, which is incorrect. Fix it by skipping the bp_reg handling for ORC_TYPE_REGS. Reported-by: Leo Martins <[email protected]> Signed-off-by: Omar Sandoval <[email protected]>
…iles again Since the fixes commit, the _drgn Python extension module links directly to libpython. However, manylinux extensions are not supposed to link against libpython [1], and the manylinux container images intentionally omit libpython, so building drgn for manylinux fails. Add a few modes of building libdrgn to address this: * --disable-libdrgn --enable-python-extension: build _drgn.so (the Python extension module), which doesn't link against libpython, and don't build libdrgn.so. This is what setup.py uses since Python-only installations don't need libdrgn. It works in manylinux. * --enable-python: build libdrgn.so with Python support, which links against libpython, and don't build _drgn.so. In this configuration, libdrgn.so also works as the Python extension module, so _drgn.so can be installed as a symlink to libdrgn.so. * (No options, equivalent to --enable-libdrgn --disable-python --disable-python-extension): build libdrgn.so with no Python support, don't build _drgn.so. For backwards-compatibility, this is the default when building libdrgn directly, but I doubt anyone does that these days. * --enable-libdrgn --enable-python-extension: build libdrgn.so with Python support, which links against libpython, and build _drgn.so, which doesn't link against libpython. I guess this works but I don't see a reason to prefer it over --enable-python + a symlink. This also requires refactoring our libtool convenience libraries. Hopefully this sucks less in Meson. [1]: https://peps.python.org/pep-0513/#libpythonx-y-so-1 Fixes: 3d49360 ("libdrgn: combine libdrgn and _drgn Python extension into one .so") Signed-off-by: Omar Sandoval <[email protected]>
The new version includes a new kdumpid binary with a dependency on BFD that we don't want, so disable it. Signed-off-by: Omar Sandoval <[email protected]>
Some versions of the kernel fail to build for ppc64 with GCC 12 and CONFIG_KPROBES enabled due to -Wdangling-pointer. Backport the commit that disabled that warning. Fixes: 9b7297d ("vmtest.config: enable CONFIG_KPROBES for upcoming kmodify breakpoints") Signed-off-by: Omar Sandoval <[email protected]>
The GitHub Actions Ubuntu 20.04 image has just been shut down: actions/runner-images#11101. Update to Ubuntu 22.04. Since 22.04 doesn't include Python 3.6 or 3.7, also drop those. I've already announced that this upcoming release will be the last one with support for those Python versions (osandov#467). I'll test them manually before cutting the release. Signed-off-by: Omar Sandoval <[email protected]>
I've found that these flood the logs and often make me lose more interesting stuff in the scrollback. The values are available in Module.section_addresses, so just log where we got them. Signed-off-by: Omar Sandoval <[email protected]>
…lugins These options also disable directories added by plugins, so document that. Signed-off-by: Omar Sandoval <[email protected]>
dnf_debug_info_finder() needs a couple more comments, and example_debug_info_finder() wouldn't actually run as is. Signed-off-by: Omar Sandoval <[email protected]>
It's missing a blank line, which messes up the formatting. Also reword it a tiny bit. Fixes: 545aa52 ("mm, slab: Fix test failures on kernels with SLOB") Signed-off-by: Omar Sandoval <[email protected]>
The major reworks for the module/debug info finder APIs and proper debuginfod support substantially change the story for getting debugging symbols. Revamp our documentation: * Document how to use debuginfod on different distributions. * Add flow charts with our recommendations on each distribution. * Add openSUSE documentation. * Also document how to build with debugging symbols on different build systems. Closes osandov#380. Signed-off-by: Omar Sandoval <[email protected]>
Fixes: adf6472 ("Add plugin system") Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Omar Sandoval <[email protected]>
struct foo64 is only 20 bytes with i386's alignment requirements, so we get the following compiler warning on i386: ‘memcpy’ forming offset [20, 23] is out of the bounds [0, 20] of object ‘foo’ with type ‘struct foo64’ and a crash when running the tests. Fix it by adding padding manually. I verified that this only affects the test cases. All of the structs we actually use deserialize_struct64 for are properly defined. Fixes osandov#493. Signed-off-by: Omar Sandoval <[email protected]>
We need to catch and print exceptions in the forked process. stdout and stderr are also str, so we shouldn't be calling .decode(). Fixes: 22ac3f3 ("tests: don't shell out for CLI tests") Signed-off-by: Omar Sandoval <[email protected]>
drgn -e '' should start drgn, execute the empty statement, and exit, not enter interactive mode. Fix it by checking whether args.exec is None instead of bool(args.exec). Fixes: 150ee76 ("cli: add -e option to exec() code directly") Signed-off-by: Omar Sandoval <[email protected]>
We don't have a nice automated way to run the C unit tests from the main vmtest CLI yet (I'm probably going to wait until we convert to Meson to see how that will work), but at least install check so it can be done manually. Signed-off-by: Omar Sandoval <[email protected]>
6420980 to
b067e43
Compare
|
Before we move forward with this, can we get a handle on why we're only updating os-upgrade, and not develop? |
|
Also, do we want to update os-upgrade, vs. keeping in sync with develop.. and then potentially doing this sync on develop, after the LTS "merge"? assuming there's a good reason not to do the sync on develop prior to the merge..? |
|
I'm not going to move forward with this @prakashsurya unless you want to move forward with it. We can certainly take care of this merge on develop later. |
|
We've decided to pause merge conflict resolution for drgn until after the os-upgrade branches have been merged to develop. |
Test restoring the pre-commit hook