Conversation

@lizhenneng
Contributor

When using GCC 9 and GCC 12 on the arm64 platform of Ubuntu 20.04, the command "gcc -mcpu=native -E -v -" fails to detect the correct CPU flags, which causes compilation failures for certain extended instructions. The correct CPU flags can be obtained by using gcc -march.

Signed-off-by: lizhenneng <[email protected]>
@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Sep 25, 2025
@taronaeo taronaeo linked an issue Sep 25, 2025 that may be closed by this pull request
@angt
Collaborator

angt commented Sep 25, 2025

Hi,

Could you share the output of your test?

From my understanding, -march=native and -mcpu=native should give the same result only for GCC ≥ 9, but you still need to explicitly check the -mcpu flag.

$ for d in arch cpu; do for r in arch cpu; do echo "use -m$d=native, read -m$r:" && gcc -m$d=native -E -v - 2>&1 </dev/null | grep -o "m$r=[^ ']*"; done; done
use -march=native, read -march:
use -march=native, read -mcpu:
mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs
mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs
mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs
use -mcpu=native, read -march:
use -mcpu=native, read -mcpu:
mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs
mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs
mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs

$ gcc --version
gcc (Ubuntu 14.2.0-19ubuntu2) 14.2.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

@lizhenneng
Contributor Author

kylin@kylin-pc:~$ gcc -mcpu=native -E -v -
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc_old
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='openKylin 12.3.0-1ok3k0.1' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --program-prefix=aarch64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support --enable-plugin --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.3.0 (openKylin 12.3.0-1ok3k0.1)
COLLECT_GCC_OPTIONS= '-E' '-v' '-mlittle-endian' '-mabi=lp64' '-march=armv8-a+crypto+crc+lse+fp16+rcpc+rdma+dotprod+sha3+sm4'
/usr/lib/gcc/aarch64-linux-gnu/12/cc1 -E -quiet -v -imultiarch aarch64-linux-gnu - -mlittle-endian -mabi=lp64 -march=armv8-a+crypto+crc+lse+fp16+rcpc+rdma+dotprod+sha3+sm4 -fasynchronous-unwind-tables -dumpbase -
ignoring nonexistent directory "/usr/local/include/aarch64-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/aarch64-linux-gnu/12/include-fixed"
ignoring nonexistent directory "/usr/lib/gcc/aarch64-linux-gnu/12/../../../../aarch64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/lib/gcc/aarch64-linux-gnu/12/include
/usr/local/include
/usr/include/aarch64-linux-gnu
/usr/include
End of search list.
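
The output above shows GCC 12 answering "-mcpu=native" with a "-march=..." value and no "-mcpu=..." value at all. A minimal sketch of a detection step that tolerates both behaviours follows. The function name pick_native_flag is hypothetical, and this is not necessarily what this PR or #16333 implements; it only illustrates reading -mcpu first and falling back to -march.

```shell
#!/bin/sh
# Hypothetical helper (not part of ggml): read the output of
# `gcc -m<arch|cpu>=native -E -v -` on stdin and print the first
# -mcpu=... value, or the first -march=... value when no -mcpu is present.
pick_native_flag() {
    out=$(cat)
    # Same pattern style as the grep used earlier in this thread.
    flag=$(printf '%s\n' "$out" | grep -o -- "-mcpu=[^ ']*" | head -n 1)
    if [ -z "$flag" ]; then
        flag=$(printf '%s\n' "$out" | grep -o -- "-march=[^ ']*" | head -n 1)
    fi
    printf '%s\n' "$flag"
}

# Usage on an arm64 host with gcc installed:
#   gcc -mcpu=native -E -v - 2>&1 </dev/null | pick_native_flag
```

On the GCC 12 output above this would print the -march value, while on the GCC 14 output it would print the -mcpu value.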

@angt
Collaborator

angt commented Sep 29, 2025

Thanks for the help!
I've opened a PR (#16333) that should fix the build on your version of GCC while working for recent ones too.
Please let me know if it works for you!

@angt
Collaborator

angt commented Nov 6, 2025

up :) This PR is still relevant.

gcc-12:

$ for d in arch cpu; do for r in arch cpu; do echo "use -m$d=native, read -m$r:" && gcc-12 -m$d=native -E -v - 2>&1 </dev/null | grep -o "m$r=[^ ']*"; done; done
use -march=native, read -march:
march=armv9-a+crypto+rng+sve2-aes+sve2-sha3+sve2-bitperm+i8mm+bf16+nossbs+nopredres
march=armv9-a+crypto+rng+sve2-aes+sve2-sha3+sve2-bitperm+i8mm+bf16+nossbs+nopredres
march=armv9-a+crypto+rng+sve2-aes+sve2-sha3+sve2-bitperm+i8mm+bf16+nossbs+nopredres
use -march=native, read -mcpu:
use -mcpu=native, read -march:
use -mcpu=native, read -mcpu:
mcpu=demeter+crypto+sve2-aes+sve2-sha3+noprofile+nomemtag+nossbs+nopredres
mcpu=demeter+crypto+sve2-aes+sve2-sha3+noprofile+nomemtag+nossbs+nopredres
mcpu=demeter+crypto+sve2-aes+sve2-sha3+noprofile+nomemtag+nossbs+nopredres

gcc-14:

$ for d in arch cpu; do for r in arch cpu; do echo "use -m$d=native, read -m$r:" && gcc-14 -m$d=native -E -v - 2>&1 </dev/null | grep -o "m$r=[^ ']*"; done; done
use -march=native, read -march:
use -march=native, read -mcpu:
mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs
mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs
mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs
use -mcpu=native, read -march:
use -mcpu=native, read -mcpu:
mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs
mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs
mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs

Member

@ggerganov ggerganov left a comment

It's still confusing to me how -mcpu and -march work and when to use one or the other. But it seems you've confirmed their usage, so it should be good to merge.

Will let @slaren have the final word.

@angt
Collaborator

angt commented Nov 7, 2025

Indeed it is!

The key point is that ARM's arch covers a very large range of CPUs.
-march works, but it misses almost all the details of the specific CPU you're running on.
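
To make the distinction concrete: in the gcc-12 output earlier in this thread, the -march value starts with an architecture name (armv9-a) while the -mcpu value starts with a CPU name (demeter). A small sketch that pulls out that leading name; target_base is a hypothetical helper used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the base name of a march=/mcpu= value,
# i.e. the part between "=" and the first "+extension".
target_base() {
    value=${1#*=}                  # drop everything up to and including "="
    printf '%s\n' "${value%%+*}"   # drop the "+ext+ext..." suffix
}

target_base "march=armv9-a+crypto+rng+sve2-aes"   # prints the architecture name
target_base "mcpu=demeter+crypto+sve2-aes"        # prints the CPU name
```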

@slaren
Member

slaren commented Nov 7, 2025

This is very hard to test for, and I am starting to think that this approach to detect the CPU features on ARM is never going to work reliably. It may make more sense to write a program that returns the features in a similar way to arm/cpu-feats.cpp, and use that as the basis to decide what architecture options to enable for GGML_NATIVE. Or alternatively, instruct ARM users to use GGML_CPU_ALL_VARIANTS, and let ggml choose the best option at runtime.
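
As a rough illustration of querying the features directly instead of parsing compiler output, the sketch below reads the "Features" line that Linux exposes in /proc/cpuinfo on arm64. It is an assumption-laden stand-in, not the approach of arm/cpu-feats.cpp: it is Linux-only, the function names are hypothetical, and the mapping from a feature name to a compiler option is left out.

```shell
#!/bin/sh
# Hypothetical helpers (Linux-only sketch, not part of ggml).

# Print the feature names from the first "Features" line of a
# cpuinfo-style file, one per line. Defaults to /proc/cpuinfo.
list_cpu_features() {
    file=${1:-/proc/cpuinfo}
    sed -n 's/^Features[[:space:]]*:[[:space:]]*//p' "$file" | head -n 1 | tr -s ' ' '\n'
}

# Return 0 if feature $1 is listed in file $2 (default /proc/cpuinfo).
has_cpu_feature() {
    list_cpu_features "${2:-/proc/cpuinfo}" | grep -qx -- "$1"
}

# Usage on an arm64 Linux host:
#   if has_cpu_feature sve2; then echo "SVE2 available"; fi
```

A build script could call has_cpu_feature once per feature it cares about and choose architecture options from the results, which avoids depending on how a given GCC version reports -mcpu or -march.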

@angt
Collaborator

angt commented Nov 7, 2025

I agree the current approach is not ideal and we can definitely do something better, maybe similar to how I handle it in the new packaging layer: https://github.com/angt/target-features.

But today, GCC 12 is still very common and does not optimize correctly for ARM.

@slaren
Member

slaren commented Nov 7, 2025

Ok, let's merge this to fix the immediate issue.

@slaren slaren merged commit 7c23f3f into ggml-org:master Nov 7, 2025
62 of 65 checks passed
@angt
Collaborator

angt commented Nov 7, 2025

Damn... I commented on the wrong PR... :(

It was this one: #16333

Development

Successfully merging this pull request may close these issues.

Compile bug: Failed to retrive the correct cpu flag on arm64

4 participants