Contributor

@kcbanner kcbanner commented Aug 26, 2023

Closes #16975.

Before:

thread 4288 panic: panic
Panicked during a panic. Aborting.
Aborted

After:

thread 4135 panic: panic
/mnt/c/cygwin64/home/kcbanner/temp/16975/lib.zig:2:5: 0x2ae695 in panic (lib)
    @panic("panic");
    ^
/mnt/c/cygwin64/home/kcbanner/temp/16975/exe.zig:4:10: 0x23a7e8 in main (exe)
    panic();
         ^
/mnt/c/cygwin64/home/kcbanner/kit/zig/lib/std/start.zig:360:22: 0x23a0ac in posixCallMainAndExit (exe)
    while (envp_optional[envp_count]) |_| : (envp_count += 1) {}
                     ^
/mnt/c/cygwin64/home/kcbanner/kit/zig/lib/std/start.zig:243:5: 0x239c01 in _start (exe)
    asm volatile (switch (native_arch) {
    ^
???:?:?:

The issue was that the lib's copy of elf_aux_maybe (linux.zig) was never initialized by the startup code, which only initializes the copy in the exe. As a result, the phdr lookup failed because of an underflow when subtracting from zero.
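To illustrate the underflow, here is a minimal sketch rather than the actual std code. It assumes the failing lookup derives a base address by subtracting the ELF header size from the AT_PHDR value, which reads as 0 when the aux vector was never set. `baseFromPhdr` and `ehdr_size` are hypothetical names introduced for this example.

```zig
const std = @import("std");

// Size of an Elf64_Ehdr in bytes, hard-coded so the sketch does not depend on std.elf.
const ehdr_size: usize = 64;

// Hypothetical helper (not part of std): derive a base address from AT_PHDR,
// returning null instead of underflowing when the value is too small.
fn baseFromPhdr(at_phdr: usize) ?usize {
    if (at_phdr < ehdr_size) return null;
    return at_phdr - ehdr_size;
}

test "a zero AT_PHDR must not underflow" {
    // An uninitialized aux vector makes the lookup return 0.
    try std.testing.expect(baseFromPhdr(0) == null);
    // A plausible, made-up AT_PHDR value for a 64-bit executable.
    try std.testing.expect(baseFromPhdr(0x200040).? == 0x200000);
}
```

In a safety-checked build, an unguarded `at_phdr - ehdr_size` with `at_phdr == 0` traps on integer overflow, which is consistent with the second panic shown in the trace below.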

The panic within a panic trace:

#0  0x00000000002ddc54 in process.getBaseAddress () at /home/kcbanner/kit/zig-linux-x86_64-0.12.0-dev.170+750998eef/lib/std/process.zig:1079
#1  0x00000000002cec55 in os.dl_iterate_phdr__anon_5275 (context=0x7fffffffc120) at /home/kcbanner/kit/zig-linux-x86_64-0.12.0-dev.170+750998eef/lib/std/os.zig:5409
#2  0x00000000002ce749 in debug.DebugInfo.lookupModuleDl (self=0x326608 <debug.self_debug_info>, address=2992457) at /home/kcbanner/kit/zig-linux-x86_64-0.12.0-dev.170+750998eef/lib/std/debug.zig:1837
#3  0x00000000002cf5dd in debug.DebugInfo.getModuleForAddress (self=0x326608 <debug.self_debug_info>, address=2992457) at /home/kcbanner/kit/zig-linux-x86_64-0.12.0-dev.170+750998eef/lib/std/debug.zig:1561
#4  0x00000000002f633f in debug.StackIterator.next_unwind (self=0x7fffffffd120) at /home/kcbanner/kit/zig-linux-x86_64-0.12.0-dev.170+750998eef/lib/std/debug.zig:643
#5  0x00000000002e9670 in debug.StackIterator.next_internal (self=0x7fffffffd120) at /home/kcbanner/kit/zig-linux-x86_64-0.12.0-dev.170+750998eef/lib/std/debug.zig:669
#6  0x00000000002dad0f in debug.StackIterator.next (self=0x7fffffffd120) at /home/kcbanner/kit/zig-linux-x86_64-0.12.0-dev.170+750998eef/lib/std/debug.zig:584
#7  0x00000000002daadc in debug.writeCurrentStackTrace__anon_7051 (out_stream=..., debug_info=0x326608 <debug.self_debug_info>, tty_config=..., start_addr=...)
    at /home/kcbanner/kit/zig-linux-x86_64-0.12.0-dev.170+750998eef/lib/std/debug.zig:731
#8  0x00000000002aef67 in debug.dumpCurrentStackTrace (start_addr=...) at /home/kcbanner/kit/zig-linux-x86_64-0.12.0-dev.170+750998eef/lib/std/debug.zig:127
#9  0x00000000002ae897 in debug.panicImpl (trace=0x0, first_trace_addr=..., msg=...) at /home/kcbanner/kit/zig-linux-x86_64-0.12.0-dev.170+750998eef/lib/std/debug.zig:421
#10 0x00000000002ae6f7 in builtin.default_panic (msg=..., error_return_trace=0x0, ret_addr=...) at /home/kcbanner/kit/zig-linux-x86_64-0.12.0-dev.170+750998eef/lib/std/builtin.zig:813
#11 0x00000000002ae696 in panic () at lib.zig:2
#12 0x000000000023a7e9 in exe.main () at exe.zig:4
#13 0x000000000023a0ad in start.posixCallMainAndExit () at /home/kcbanner/kit/zig-linux-x86_64-0.12.0-dev.170+750998eef/lib/std/start.zig:360
#14 0x0000000000239c02 in _start () at /home/kcbanner/kit/zig-linux-x86_64-0.12.0-dev.170+750998eef/lib/std/start.zig:243

However, this fix does raise a couple of questions:

@kcbanner
Contributor Author

The CI failure:

error: ld.lld: relocation R_X86_64_PC32 cannot be used against symbol '_elf_aux_maybe'; recompile with -fPIC
    note: defined in /mnt/c/cygwin64/home/kcbanner/kit/zig/test/standalone/load_dynamic_library/zig-cache/o/a3b4a5c0e1497f31df239eba4427558f/libadd.so.1.0.0.o
    note: referenced by linux.zig:163 (/mnt/c/cygwin64/home/kcbanner/kit/zig/lib/std/os/linux.zig:163)
    note:               /mnt/c/cygwin64/home/kcbanner/kit/zig/test/standalone/load_dynamic_library/zig-cache/o/a3b4a5c0e1497f31df239eba4427558f/libadd.so.1.0.0.o:(os.linux.getauxval)

This happens in the tests that use dynamic linking, but specifying force_pic on the exe/lib doesn't resolve it.

@kcbanner kcbanner changed the title from "linux: export elf_aux_maybe so that libraries can call getauxval" to "linux: export getauxval when not compiling with libc" Aug 28, 2023
Member

@kubkon kubkon left a comment

Looks good to my inexperienced eyes, but I do have a few questions that perhaps you can answer.

/// This matches the libc getauxval function.
pub extern fn getauxval(index: usize) usize;
comptime {
    @export(getauxvalImpl, .{ .name = "getauxval", .linkage = .Weak });
Member

As I understand it, we are exporting it as weak because we expect it to be overridden by a different version in an exe/dso that has getauxval hooked up to the startup routine? What if we're building an exe and linking with a dso? Which version takes precedence then? Do I even understand the problem correctly here?

Contributor Author

@kcbanner kcbanner Sep 4, 2023

> As I understand it, we are exporting it as weak because we expect it to be overridden by a different version in an exe/dso that has getauxval hooked up to the startup routine?

Yes, exactly. The intention is that the version used is the one that references the elf_aux_maybe initialized by the startup code.

> What if we're building an exe and link with a dso? Which version takes precedence then? Do I even understand the problem correctly here?

My assumption had been that the version in the exe would take precedence, but if this is not true in all cases then this solution won't be adequate.
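As a rough model of the intent described in this reply (not the std implementation): each linked image carries its own copy of the aux state, and only the copy set by the startup code returns useful values, so callers in the lib need to end up in the exe's version. `ImageState`, `AuxEntry`, and the sample address are hypothetical; AT_PHDR = 3 is the standard ELF value.

```zig
const std = @import("std");

// Standard ELF aux vector tag for the program header address.
const AT_PHDR: usize = 3;

// Hypothetical simplified aux vector entry.
const AuxEntry = struct { a_type: usize, a_val: usize };

// Hypothetical model: every image (exe, lib) has its own copy of this state.
const ImageState = struct {
    // Stays null unless the startup code of this image sets it.
    aux_maybe: ?[]const AuxEntry = null,

    // Returns 0 when the state was never initialized or the tag is absent.
    fn getauxval(self: ImageState, index: usize) usize {
        const auxv = self.aux_maybe orelse return 0;
        for (auxv) |entry| {
            if (entry.a_type == index) return entry.a_val;
        }
        return 0;
    }
};

test "only the image initialized by startup code sees the aux vector" {
    const auxv = [_]AuxEntry{.{ .a_type = AT_PHDR, .a_val = 0x200040 }};
    const exe = ImageState{ .aux_maybe = &auxv };
    const lib = ImageState{}; // never touched by startup code

    try std.testing.expect(exe.getauxval(AT_PHDR) == 0x200040);
    try std.testing.expect(lib.getauxval(AT_PHDR) == 0);
}
```

The sketch says nothing about which definition the linker or loader actually picks; it only shows why resolving to the uninitialized copy yields 0.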

Member

Gotcha, thanks!

@Aransentin
Contributor

Right now this pollutes every non-libc Zig Linux binary with the getauxval symbol and code, even if it never loads libraries. This also means the compiler can't optimize away the ELF header parsing if nothing else uses it.

Not a massive amount of bloat for real programs, but significant for the tiny "demo" binaries I like to make.

Successfully merging this pull request may close these issues.

Stack trace not printed when panicking in imported function