
build failure for Python 3.13 in EESSI 2025.06 due to missing __gcov_indirect_call for test_embed during profile-guided optimization #1227

@boegel


The build of Python-3.13.1-GCCcore-14.2.0.eb for EESSI 2025.06 is failing with:

LD_LIBRARY_PATH=/tmp/eessibot/easybuild/build/Python/3.13.1/GCCcore-14.2.0/Python-3.13.1 ./python -m test --pgo --timeout=
...
0:00:13 load avg: 1.33 [16/44] test_embed
test test_embed failed
0:00:14 load avg: 1.33 [17/44] test_float -- test_embed failed (71 failures)

The actual problem becomes clear when running the _testembed binary from the interactive debug shell, which can be started with the cmd.sh script generated by EasyBuild:

eb-shell> Programs/_testembed
Programs/_testembed: symbol lookup error: Programs/_testembed: undefined symbol: __gcov_indirect_call

Indeed, the _testembed binary is missing this symbol:

eb-shell> nm Programs/_testembed | grep __gcov_indirect_call
                 U __gcov_indirect_call
                 U __gcov_indirect_call_profiler_v4

(U stands for "undefined" here)

This symbol is supposed to be resolved from the libgcov.a static library provided by GCC:

eb-shell> nm $EBROOTGCCCORE/lib/gcc/x86_64-pc-linux-gnu/14.2.0/libgcov.a | grep __gcov_indirect_call
0000000000000000 B __gcov_indirect_call
0000000000000000 T __gcov_indirect_call_profiler_v4
0000000000000220 T __gcov_indirect_call_profiler_v4_atomic

It was a bit tricky to figure out, but re-running the command that links the _testembed binary with an extra linker option (-Wl,--trace-symbol=__gcov_indirect_call), which makes ld report every file that references or defines the __gcov_indirect_call symbol, pinpoints the problem:

$ gcc -Wl,--trace-symbol=__gcov_indirect_call ... -fprofile-generate ... -o Programs/_testembed: ...
/cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/usr/bin/ld: Programs/_testembed.o (symbol from plugin): reference to __gcov_indirect_call
/cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/usr/bin/ld: ./libpython3.13.so: definition of __gcov_indirect_call
/cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/usr/bin/ld: /tmp/eb-fpnz_fhe/ccGRDgYP.lto.o: reference to __gcov_indirect_call
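With a longer link line the trace can get noisy, so it may help to reduce it to just the files that define the symbol. A minimal sketch, assuming the ld message layout shown above; `defining_files` is a hypothetical helper name, not something provided by binutils or EasyBuild:

```shell
# Hypothetical helper: read `ld --trace-symbol` messages on stdin and print
# the file named in each "definition of" message.
# Assumes the "<ld path>: <file>: definition of <symbol>" layout shown above.
defining_files() {
    awk -F': ' '$3 ~ /^definition of / { print $2 }'
}

# Sample input copied from the trace output in this issue.
trace_sample='/cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/usr/bin/ld: Programs/_testembed.o (symbol from plugin): reference to __gcov_indirect_call
/cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/usr/bin/ld: ./libpython3.13.so: definition of __gcov_indirect_call
/cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/usr/bin/ld: /tmp/eb-fpnz_fhe/ccGRDgYP.lto.o: reference to __gcov_indirect_call'

definers=$(printf '%s\n' "$trace_sample" | defining_files)
printf '%s\n' "$definers"
```

On a real link command, the same filter could be applied to the linker's messages, which ld may write to stderr.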

This shows that the __gcov_indirect_call symbol is actually being resolved through the libpython3.13.so in the current directory.
Indeed, this (temporary, profile-enabled) library provides this symbol:

$ nm ./libpython3.13.so | grep __gcov_indirect_call
0000000000000020 B __gcov_indirect_call
000000000063f320 T __gcov_indirect_call_profiler_v4
000000000063f540 T __gcov_indirect_call_profiler_v4_atomic

Checking which libraries are used by _testembed at runtime then reveals the problem:

eb-shell> ldd Programs/_testembed
        linux-vdso.so.1 (0x000014f8f1e31000)
        libpython3.13.so.1.0 => /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/usr/lib/../lib64/libpython3.13.so.1.0 (0x000014f8f1800000)
        libm.so.6 => /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/../lib64/libm.so.6 (0x000014f8f1714000)
        libc.so.6 => /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/../lib64/libc.so.6 (0x000014f8f1539000)
        /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib64/ld-linux-x86-64.so.2 (0x000014f8f1e33000)
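The same check can be scripted, for example to flag a libpython that resolves outside the build directory. A minimal sketch, assuming the usual `name => path (address)` layout of ldd output; `resolved_path` is a hypothetical helper name, and the sample input is copied from the output above:

```shell
# Hypothetical helper: read `ldd` output on stdin and print the path that a
# library whose name starts with the given prefix resolves to.
resolved_path() {
    awk -v prefix="$1" '$2 == "=>" && index($1, prefix) == 1 { print $3 }'
}

# Sample input copied from the ldd output in this issue.
ldd_sample='        linux-vdso.so.1 (0x000014f8f1e31000)
        libpython3.13.so.1.0 => /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/usr/lib/../lib64/libpython3.13.so.1.0 (0x000014f8f1800000)
        libm.so.6 => /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/../lib64/libm.so.6 (0x000014f8f1714000)'

resolved=$(printf '%s\n' "$ldd_sample" | resolved_path libpython)

# Compare against the current directory, where the profile-enabled library lives.
case "$resolved" in
    "$PWD"/*) echo "libpython comes from the build directory" ;;
    *)        echo "libpython comes from elsewhere: $resolved" ;;
esac
```

In the debug shell, the input would come from `ldd Programs/_testembed` instead of the sample text.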

Because the RPATH section in _testembed includes the path to the libraries provided by the compat layer, the libpython3.13.so.1.0 used at runtime is the one from the compat layer.
That library does not provide the __gcov_indirect_call symbol, because it wasn't built with -fprofile-generate (which implies linking with -lgcov):

eb-shell> nm /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/usr/lib/../lib64/libpython3.13.so.1.0
nm: /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/usr/lib/../lib64/libpython3.13.so.1.0: no symbols
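One caveat about this check: plain `nm` reads the regular symbol table, which a stripped library no longer has, so `no symbols` mainly shows that the compat-layer library is stripped. The dynamic symbol table, which is what the runtime linker consults, can be listed with `nm -D`. A minimal sketch of a check along those lines; `defines_symbol` is a hypothetical helper name, the `/path/to/` in the comment is a placeholder, and the sample input reuses the nm listings from earlier in this issue rather than real `nm -D` output for the compat-layer library:

```shell
# Hypothetical helper: read `nm` output on stdin; succeed if the given symbol
# is listed with a type other than U (undefined).
defines_symbol() {
    awk -v sym="$1" '$NF == sym && $(NF - 1) != "U" { found = 1 } END { exit !found }'
}

# Intended use on a real library (not run here, path is a placeholder):
#   nm -D /path/to/libpython3.13.so.1.0 | defines_symbol __gcov_indirect_call

# Sample listings copied from the nm output earlier in this issue.
profiled_lib='0000000000000020 B __gcov_indirect_call
000000000063f320 T __gcov_indirect_call_profiler_v4
000000000063f540 T __gcov_indirect_call_profiler_v4_atomic'
testembed_bin='                 U __gcov_indirect_call
                 U __gcov_indirect_call_profiler_v4'

if printf '%s\n' "$profiled_lib" | defines_symbol __gcov_indirect_call; then
    echo "profile-enabled libpython: symbol is defined"
fi
if ! printf '%s\n' "$testembed_bin" | defines_symbol __gcov_indirect_call; then
    echo "_testembed: symbol is only referenced"
fi
```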

Hence, the LD_LIBRARY_PATH=... prefix of the ./python -m test --pgo --timeout= command no longer has the intended effect of making sure that libpython3.13.so is picked up from the current working directory, because it is overruled by the RPATH section in the _testembed binary.
It's also a stroke of "bad luck", since there happens to be a libpython3.13.so.1.0 in the compat layer. Without it, the lookup would fall through to the path specified in $LD_LIBRARY_PATH, and it would all work out fine. This explains why this problem doesn't pop up with Python-3.12.3-GCCcore-13.3.0.eb.
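The lookup order behind this can be modelled in a few lines. This is a simplified sketch, not the dynamic linker's actual implementation: it only covers directories from RPATH followed by directories from LD_LIBRARY_PATH, and ignores RUNPATH, the ld.so cache and the default directories. `first_match` and the temporary `compat` and `build` directories are hypothetical stand-ins for the compat layer and the Python build directory:

```shell
# Simplified model: RPATH directories are tried before LD_LIBRARY_PATH ones.
# Hypothetical helper: first_match LIBNAME RPATH_DIRS LD_LIBRARY_PATH_DIRS
# (both directory lists are colon-separated). Runs in a subshell so the
# IFS change does not leak into the caller.
first_match() (
    lib=$1
    IFS=:
    for dir in $2 $3; do
        if [ -e "$dir/$lib" ]; then
            printf '%s\n' "$dir/$lib"
            exit 0
        fi
    done
    exit 1
)

# Stand-ins for the compat layer and the build directory.
demo=$(mktemp -d)
mkdir "$demo/compat" "$demo/build"
: > "$demo/compat/libpython3.13.so.1.0"
: > "$demo/build/libpython3.13.so.1.0"

# Both copies exist: the RPATH directory wins, as in the failing build.
with_compat=$(first_match libpython3.13.so.1.0 "$demo/compat" "$demo/build")

# No copy in the RPATH directory: the LD_LIBRARY_PATH directory gets its turn.
rm "$demo/compat/libpython3.13.so.1.0"
without_compat=$(first_match libpython3.13.so.1.0 "$demo/compat" "$demo/build")

printf '%s\n' "$with_compat" "$without_compat"
rm -rf "$demo"
```

The second call corresponds to the situation where the compat layer ships no matching libpython, so the copy named via LD_LIBRARY_PATH is found.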

Labels: 2025.06-software.eessi.io (2025.06 version of software.eessi.io), bug (Something isn't working)
