
martin-frbg
Collaborator

This is sufficient to enable the SME version of the "small matrix SGEMM" kernel on Apple M4.
Also added is commented-out code for recognizing the M4 as ARMV9SME. This is not yet useful except for testing, as none of the ARMV8SVE kernels that the ARMV9SME target builds upon support streaming SVE.

@vaiskv
Contributor

vaiskv commented Apr 13, 2025

Hi @martin-frbg

For a non-Apple CPU, the check should enter this part of get_coretype() (verified on QEMU). Here, when TARGET is set to ARMV8, gotoblas_ARMV9SME is NULL, whereas when TARGET is set to ARMV9SME, gotoblas_ARMV9SME is not NULL and the architecture initialization therefore succeeds.

Please note that for compilation I am using the following command:

make BINARY=64 CC=aarch64-linux-android35-clang ONLY_CBLAS=1 HOSTCC=gcc TARGET=ARMV8 DYNAMIC_ARCH=1

Also, although the test is on QEMU, the SME sgemm_direct kernel will eventually have to run on a Qualcomm device as well. So I think we need to add a support_sme1() check for the 0x51 implementer ID here, similar to the one you added for Apple M4.

@martin-frbg
Collaborator Author

The way this is supposed to work is that on Linux, it checks a variety of implementer and CPU IDs, and if none of them matches, it runs support_sme1() to see if it should return ARMV9SME.
I think your future Qualcomm device should fit right into this code even though the 0x51 implementer ID is not specifically catered for, unless it runs Windows on Arm: for that, there is currently no conditional code (like the sysctl used on macOS) to replace the Linux-specific hwcap- or /proc-based calls.
I wonder if QEMU does not set the capability flag for SME, so that support_sme1() is returning false?

@vaiskv
Contributor

vaiskv commented Apr 13, 2025

On QEMU, support_sme1() returns true, which I verified using debug prints.

I think the issue is somewhere in gotoblas->init being NULL:


```c
if (gotoblas && gotoblas->init) {
  strncpy(coren, gotoblas_corename(), 20);
  sprintf(coremsg, "Core: %s\n", coren);
  openblas_warning(2, coremsg);
  gotoblas -> init();
} else {
  openblas_warning(0, "OpenBLAS : Architecture Initialization failed. No initialization function found.\n");
  exit(1);
}
```

Moreover, the check (gotoblas && gotoblas->init) is true when the library is compiled with TARGET=ARMV9SME DYNAMIC_ARCH=1. It fails with TARGET=ARMV8 or TARGET=ARMV8SVE and DYNAMIC_ARCH=1.

I believe the init function maps to init_parameter() taken from the generated file setparam-ARMV9SME.c. This object (setparam-ARMV9SME.o) is generated in both cases (ARMV8 and ARMV9SME). Not sure if I am missing something here. :(

@vaiskv
Contributor

vaiskv commented Apr 23, 2025

Hi @martin-frbg

Were you able to check on this issue? I tried to fix it, but without any luck. Please let me know if you figure out a solution.

@martin-frbg
Collaborator Author

Unfortunately I'm still at the stage of building a kernel with SME support in a Debian VM under QEMU (which is a lot slower than anticipated, even on a fast x86_64 host). I wanted to try Arm FVP instead, but did not quite figure out how to make that work.

@martin-frbg
Collaborator Author

I think I have it sorted now (after spending too much time in vain trying to get QEMU with SME working on x86_64).
The issue wasn't so much that the "init" function of ARMV9SME was NULL, but that it pointed to the preceding entry in the struct, zlaswp_ncopy: the conditional inclusion of sgemm_direct led to a mismatch in the layout of the gotoblas struct between arm64 targets built with and without SME.

martin-frbg merged commit 5141a90 into OpenMathLib:develop on May 10, 2025
113 of 119 checks passed
@qti-vaiskv

Hi @martin-frbg,

The architecture init failure is fixed, but when we compile with TARGET=ARMV8 DYNAMIC_ARCH=1, sgemm_direct is not used by cblas_sgemm. It is used only in the TARGET=ARMV9SME DYNAMIC_ARCH=1 case.

@martin-frbg
Collaborator Author

Hmm. This appears to be due to a general flaw in the current implementation of the "direct" SGEMM code: USE_SGEMM_KERNEL_DIRECT is not actually available as a CPU-specific datum at runtime, as the gemm interface is compiled only once, for the TARGET CPU. This was probably meant to be an "OR" conditional, instead of tying DYNAMIC_ARCH to it. (Another oddity is that the direct code path is only available to CBLAS, and only when using C-style row-major order.)

@martin-frbg
Collaborator Author

#5268
