
Conversation

taronaeo
Collaborator

taronaeo commented Sep 10, 2025

closes #13243

Introduce s390x & ppc64le CI using IBM Actions on POWER and Z Runner images.

TODO:

Edit: I have to use this PR to test the GitHub Actions, as only Llama.cpp is authorised to use the s390x and ppc64le images. Unfortunately I do not have an alternative way to develop this elsewhere and submit the PR once it's ready.

@github-actions bot added the devops (improvements to build systems and github actions) label Sep 10, 2025
@CISC
Collaborator

CISC commented Sep 10, 2025

But it's not a cross build, is it? It should be in build.yml.

@taronaeo
Collaborator Author

Nope, native build. I'm just testing things out, hence the draft, haha. Thanks for the heads up!

@CISC
Collaborator

CISC commented Sep 10, 2025

I think you will need to resolve the warnings and revert before undrafting; the build is done with LLAMA_FATAL_WARNINGS=ON for a reason.

This looks like an actual bug (fixed in #15928):
https://github.com/ggml-org/llama.cpp/actions/runs/17615331967/job/50046823487#step:5:87

This might be a bug, as it suggests these loops are no-ops:
https://github.com/ggml-org/llama.cpp/actions/runs/17615204324/job/50046429801#step:5:121

for (int i = 0; i < np; i += GGML_F32_STEP) {
    for (int j = 0; j < GGML_F32_ARR; j++) {
        ay[j] = GGML_F32_VEC_LOAD(y + i + j*GGML_F32_EPR);
        ay[j] = GGML_F32_VEC_MUL(ay[j], vx);
        GGML_F32_VEC_STORE(y + i + j*GGML_F32_EPR, ay[j]);
    }
}

for (int i = 0; i < np; i += GGML_F16_STEP) {
    for (int j = 0; j < GGML_F16_ARR; j++) {
        ay[j] = GGML_F16_VEC_LOAD(y + i + j*GGML_F16_EPR, j);
        ay[j] = GGML_F16_VEC_MUL(ay[j], vx);
        GGML_F16_VEC_STORE(y + i + j*GGML_F16_EPR, ay, j);
    }
}

@taronaeo
Collaborator Author

I think you will need to resolve the warnings and revert before undrafting; the build is done with LLAMA_FATAL_WARNINGS=ON for a reason.

Yep, will take note of that before marking as ready. I wanted to progress past the compile stage first so I could fix the endianness issues with the test models, because that is the problematic part.

Also, the problem with GCC on s390x and ppc64le is that the compiler is a little stricter. Things that would have been okay on x86 or ARM are reported as warnings on s390x and ppc64le and, in this case, treated as fatal.

I suppose we can either have the code owners fix the warnings on the affected lines, or (not ideal) disable fatal warnings for s390x and ppc64le.

@github-actions bot added the testing (Everything test related) label Sep 10, 2025
@CISC
Collaborator

CISC commented Sep 10, 2025

Maybe just convert and overwrite the vocab GGUFs for BE tests?

@CISC
Collaborator

CISC commented Sep 10, 2025

You need the .out files too.

@taronaeo
Collaborator Author

It's 3 AM on my side now, will continue this tomorrow 🙂

@CISC
Collaborator

CISC commented Sep 10, 2025

Looks like there's an endianness issue in the WPM and BPE tokenizers.

@CISC
Collaborator

CISC commented Sep 10, 2025

If you add vocab GGUF conversion instead of duplicating files, test-tokenizers-ggml-vocabs could also do that, so that's preferable IMO.

@taronaeo
Collaborator Author

If you add vocab GGUF conversion instead of duplicating files

You mean use the gguf-py/gguf/scripts/gguf_convert_endian.py script to convert the GGUF files during CI tests, right? That's a good approach, but I didn't want to increase the CI runtime because our s390x and ppc64le runners are pretty limited (supplied directly by the IBM Actions team).

I don't think storage is an issue here either, right? But if it is a concern, do let me know and we can go the conversion route during the CI run.

test-tokenizers-ggml-vocabs could also do that, so that's preferable IMO.

I didn't quite get this part. Can you explain further?

@CISC
Collaborator

CISC commented Sep 11, 2025

If you add vocab GGUF conversion instead of duplicating files

You mean use the gguf-py/gguf/scripts/gguf_convert_endian.py script to convert the GGUF files during CI tests, right? That's a good approach, but I didn't want to increase the CI runtime because our s390x and ppc64le runners are pretty limited (supplied directly by the IBM Actions team).

Yes, it shouldn't add much to the runtime as they are only small vocab files.

I don't think storage is an issue here either, right? But if it is a concern, do let me know and we can go the conversion route during the CI run.

Generally we don't want to bloat git with more binaries than necessary; that's why new vocab files are being stored on HF.

test-tokenizers-ggml-vocabs could also do that, so that's preferable IMO.

I didn't quite get this part. Can you explain further?

This test downloads extra vocab files from HF. It's preferable that you also run this test and not skip it, but those files are little-endian only, so they must be converted.
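
Something like this might work as the conversion step (a sketch only; the clone path is an assumption, and gguf_convert_endian.py is assumed to convert in place):

# byteswap the downloaded little-endian vocab GGUFs on a big-endian runner
# before running test-tokenizers-ggml-vocabs (paths are illustrative)
for f in models/ggml-vocabs/*.gguf; do
    python3 gguf-py/gguf/scripts/gguf_convert_endian.py "$f" big
done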

@CISC
Collaborator

CISC commented Sep 11, 2025

Don't try to install all the repo requirements just for gguf_convert_endian.py; pip install gguf or even just numpy and tqdm is enough.
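
For example (a sketch; either line should be sufficient):

pip install numpy tqdm   # bare minimum to run the script standalone
pip install ./gguf-py    # or install the local gguf package with its dependencies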

@CISC
Collaborator

CISC commented Sep 11, 2025

It's just gguf, not gguf-py. :)

@taronaeo
Collaborator Author

It's just gguf, not gguf-py. :)

I was trying to target the relative path so it installs from the local source. Should I prefer installing via PyPI instead?

@CISC
Collaborator

CISC commented Sep 11, 2025

It's just gguf, not gguf-py. :)

I was trying to target the relative path so it installs from the local source. Should I prefer installing via PyPI instead?

Ah, no, that's fine, it'll install the right dependencies then.

@CISC
Collaborator

CISC commented Sep 11, 2025

Here's a nice trick to check if the system is big-endian, which can be added to test-tokenizers-repo.sh; it returns 1 for big-endian and 0 for little-endian:

echo -n I | od -to2 | head -n1 | cut -f2 -d" " | cut -c1

@taronaeo
Collaborator Author

Taking a quick glance at the tokeniser tests, it's clear that Little-Endian and Big-Endian models process these a little differently (i.e., space vs. no space before the text):

- 6: src: 'Hello, y'all! How are you 😁 ?我想在apple工作1314151天~'
+ 6: res: 'Hello, y'all! How are you 😁?我想在apple工作1314151天~'
6: tok: 15496 11 331 6 439 0 1374 389 345 30325 223 5633 22755 239 46349 111 28839 101 18040 32432 98 43291 1485 1415 24309 25465 171 121 252

As for the emojis, I'm not sure why it can't produce the correct ones, even though I've had no issues with them in my Z & LinuxONE demos. I suppose it's okay for me to edit the .out file to match the Big-Endian results, right? :)

@taronaeo
Collaborator Author

taronaeo commented Sep 11, 2025

Here's a nice trick to check if the system is big-endian, which can be added to test-tokenizers-repo.sh; it returns 1 for big-endian and 0 for little-endian:

echo -n I | od -to2 | head -n1 | cut -f2 -d" " | cut -c1

I suppose this is to separate the Little-Endian and Big-Endian .inp and .out files, right?

Edit: Doesn't work :(

[taronaeo@aqlinux2 ~]$ echo -n I | od -to2 | head -n1 | cut -f2 -d" " | cut -c1
0

@CISC
Collaborator

CISC commented Sep 11, 2025

I suppose this is to separate the Little-Endian and Big-Endian .inp and .out files, right?

For converting the downloaded files.

Edit: Doesn't work :(

Dang, I was making a guess there to invert the result; guess I was wrong. Change -c1 to -c6 and it should return 0 for big-endian (and 1 on little-endian).
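
With that fix, a sketch of the gate in test-tokenizers-repo.sh could look like this (the variable and file names are illustrative):

# prints 1 on little-endian, 0 on big-endian (note the corrected -c6)
is_le=$(echo -n I | od -to2 | head -n1 | cut -f2 -d" " | cut -c6)
if [ "$is_le" = "0" ]; then
    # big-endian host: byteswap the downloaded little-endian GGUF first
    python3 gguf-py/gguf/scripts/gguf_convert_endian.py "$vocab_file" big
fi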

@CISC
Collaborator

CISC commented Sep 11, 2025

Taking a quick glance at the tokeniser tests, it's clear that Little-Endian and Big-Endian models process these a little differently (i.e., space vs. no space before the text):

- 6: src: 'Hello, y'all! How are you 😁 ?我想在apple工作1314151天~'
+ 6: res: 'Hello, y'all! How are you 😁?我想在apple工作1314151天~'
6: tok: 15496 11 331 6 439 0 1374 389 345 30325 223 5633 22755 239 46349 111 28839 101 18040 32432 98 43291 1485 1415 24309 25465 171 121 252

As for the emojis, I'm not sure why it can't produce the correct ones, even though I've had no issues with them in my Z & LinuxONE demos. I suppose it's okay for me to edit the .out file to match the Big-Endian results, right? :)

No, this looks like an endianness bug in the tokenizer; the output must be equal.

Edit: It would be interesting to know what transformers produces though. Have you run convert_hf_to_gguf_update.py and looked at ggml-vocab-phi-3.gguf.out, f.ex.?

@taronaeo
Collaborator Author

taronaeo commented Sep 11, 2025

Hmm, I've added this check though. I think it's enough to prevent the endianness conversion script from running on LE systems.

if: ${{ matrix.build == 's390x' }}

I was thinking your check could be added for the .inp and .out files though, since the tokeniser output will be different between LE and BE.

@CISC
Collaborator

CISC commented Sep 11, 2025

Hmm, I've added this check though. I think it's enough to prevent the endianness conversion script from running on LE systems.

Yeah, but test-tokenizers-repo.sh downloads more, see:
https://github.com/ggml-org/llama.cpp/actions/runs/17650662673/job/50160591872?pr=15925#step:8:4454

I was thinking your check could be added for the .inp and .out files though, since the tokeniser output will be different between LE and BE.

They should not be different; this is a bug.

@CISC
Collaborator

CISC commented Sep 11, 2025

As for the emojis, I'm not sure why it can't produce the correct ones, even though I've had no issues with them in my Z & LinuxONE demos.

The emoji issue seems to be specific to the WPM tokenizer, so you probably just haven't used any models with that.

@taronaeo
Collaborator Author

taronaeo commented Sep 11, 2025

Taking a quick glance at the tokeniser tests, it's clear that Little-Endian and Big-Endian models process these a little differently (i.e., space vs. no space before the text):

- 6: src: 'Hello, y'all! How are you 😁 ?我想在apple工作1314151天~'
+ 6: res: 'Hello, y'all! How are you 😁?我想在apple工作1314151天~'
6: tok: 15496 11 331 6 439 0 1374 389 345 30325 223 5633 22755 239 46349 111 28839 101 18040 32432 98 43291 1485 1415 24309 25465 171 121 252

As for the emojis, I'm not sure why it can't produce the correct ones, even though I've had no issues with them in my Z & LinuxONE demos. I suppose it's okay for me to edit the .out file to match the Big-Endian results, right? :)

No, this looks like an endianness bug in the tokenizer; the output must be equal.

Edit: It would be interesting to know what transformers produces though. Have you run convert_hf_to_gguf_update.py and looked at ggml-vocab-phi-3.gguf.out, f.ex.?

Got the same tokeniser result using the Big-Endian model.

Steps taken:

  1. python3 convert_hf_to_gguf_update.py
  2. python3 convert_hf_to_gguf.py models/tokenizers/phi-3/ --outfile models/ggml-vocab-phi-3.gguf --vocab-only --bigendian
➜  llama.cpp git:(feat/s390x-ci) diff '/Users/taronaeo/Documents/llama.cpp/models/ggml-vocab-phi-3.gguf.out' '/Users/taronaeo/Downloads/ggml-vocab-phi-3.gguf.out'
➜  llama.cpp git:(feat/s390x-ci) echo $?
0
Big-Endian Phi-3 Tokeniser output:
 474 287 29871 29946 29871 30226 7378
 11585 7810 295

 259
 1678
 268
 29871 12
 29871 13
 29871 13 13
 29871 13 13 13
 29871 12 13
 15043 3186
 29871 15043 3186
 15043 2787
 29871 15043 2787
 29871 15043 2787 29991
 15043 29892 3186 29991
 29871 15043 29892 3186 29991
 29871 445 338 29871 243 162 169 156 29889 8223
 281 29900 29946 29947 29871 29955 9161 13535 18031 2176 6905
 1538 4851 665 1386 29713 1305
 29871 31849 31324 31934 228 162 142 228 161 146 228 162 133 228 161 153 228 161 186 31708 228 162 132 31708 228 161 165 31324 228 161 136 228 161 132 228 161 158 228 161 136 228 162 132 228 161 140
 29871 243 162 157 131 313 8945 29897 29871 243 162 155 185 30722 243 162 143 174 30598 313 20787 953 3848 275 16125 630 29897 29871 31681 313 6194 953 29877 2397 393 756 967 1914 5993 29897
 15043
 29871 15043
 259 15043
 1678 15043
 268 15043
 268 15043 13 1678 15043
 29871 313
 29871 13 353
 525 3152
 15043 29892 343 29915 497 29991 1128 526 366 29871 243 162 155 132 1577 30672 31522 30505 11548 31041 30732 29896 29941 29896 29946 29896 29945 29896 30408 30739
 1738 6824 21004
 29871 29941
 29871 29941 29941
 29871 29941 29941 29941
 29871 29941 29941 29941 29941
 29871 29941 29941 29941 29941 29941
 29871 29941 29941 29941 29941 29941 29941
 29871 29941 29941 29941 29941 29941 29941 29941
 29871 29941 29941 29941 29941 29941 29941 29941 29941
 29871 29941 29941 29941 29941 29941 29941 29941 29941 29941
 315 228 190 176 29874 10630 30529 29873
 29871 2313 3163
 29871 13 29871 13 13 29871 13 13 13 29871 12 29871 12 12 29871 12 13 259 13 1678 13 268 13 418 13 243 162 157 131 313 8945 29897 29871 243 162 155 185 30722 243 162 143 174 30598 313 20787 953 3848 275 16125 630 29897 29871 31681 29871 243 162 169 156 243 162 169 156 29871 29941 29871 29941 29941 29871 29941 29941 29941 29871 29941 29941 29941 29941 29871 29941 29941 29941 29941 29941 29871 29941 29941 29941 29941 29941 29941 29871 29941 29941 29941 29941 29941 29941 29941 29871 29941 29941 29941 29941 29941 29941 29941 29941 29871 29941 29889 29941 29871 29941 636 29941 29871 29941 856 29941 29871 31849 31324 31934 228 162 142 228 161 146 228 162 133 228 161 153 228 161 186 31708 228 162 132 31708 228 161 165 31324 228 161 136 243 162 155 132 1577 30672 31522 30505 11548 31041 30732 29896 29941 29896 29946 29896 29945 29896 30408 30739 448 23648 2751 25512 1538 4851 665 1386 29713 1305 14550 4907 11120 16159 16159 16159 15945 15945 3045 636 6824 6824 6824 8773 8773 8773 306 29915 345 1063 525 29873 1025 540 29915 29879 727 29892 525 1525 366 1854 29973 525 29924 451 1854 306 29915 645 1207 372 29892 525 29928 366 763 777 23429 29973 1334 29915 29963 29872 263 29915 29880 29931

@taronaeo
Collaborator Author

Hmm, I've added this check though. I think it's enough to prevent the endianness conversion script from running on LE systems.

Yeah, but test-tokenizers-repo.sh downloads more, see: https://github.com/ggml-org/llama.cpp/actions/runs/17650662673/job/50160591872?pr=15925#step:8:4454

Got it. I was looking at the test-tokenizer-0 results and I didn't scroll down to check that portion.

With regards to the tokeniser, I'm a little stumped now. If the generated .out file is the same as the Little-Endian variant then, as you said, the test results should match the Little-Endian ones 1-to-1. I'm not an expert on this; any idea how we can move forward?

@CISC
Collaborator

CISC commented Sep 11, 2025

With regards to the tokeniser, I'm a little stumped now. If the generated .out file is the same as the Little-Endian variant then, as you said, the test results should match the Little-Endian ones 1-to-1. I'm not an expert on this; any idea how we can move forward?

Someone will have to debug the tokenizers in question in llama-vocab.cpp and figure out where it goes wrong.

@CISC
Collaborator

CISC commented Sep 12, 2025

Got the same tokeniser result using the Big-Endian model.

Great, but that was just SPM, which appears to work fine on big-endian, so we need to test some others. Check bert-bge (WPM) and t5 (UGM, though I think this one is skipped?) too. If you remove the chkhsh entry for f.ex. seed-coder (BPE) in convert_hf_to_gguf.py and run the update script again, it will download and generate files to test with.
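
Mirroring the phi-3 steps above, that check might look like this (paths follow the same pattern and are assumptions):

# after removing the seed-coder chkhsh entry from convert_hf_to_gguf.py:
python3 convert_hf_to_gguf_update.py
python3 convert_hf_to_gguf.py models/tokenizers/seed-coder/ --outfile models/ggml-vocab-seed-coder.gguf --vocab-only --bigendian
# then compare the regenerated models/ggml-vocab-seed-coder.gguf.out with the little-endian reference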

@taronaeo
Collaborator Author

Ah, I have a flight tomorrow and will be on vacation from 14 to 21 September. I'll come back to this the week after, or I'll check if @AlekseiNikiforovIBM is able to continue this whilst I'm away :)

taronaeo and others added 14 commits September 26, 2025 15:38
Array q8bytes had only 4 elements allocated, but 8 elements were accessed.
This led to out-of-bounds writes, later out-of-bounds reads of the
overwritten values, and an incorrect result.

Signed-off-by: Aaron Teo <[email protected]>
for some reason it keeps failing test-thread-safety tests and I do not
have a machine that is able to replicate the tests.

Signed-off-by: Aaron Teo <[email protected]>
Ensure it works even if both XDG_CACHE_HOME and HOME are unset.
This might happen in containers.

Signed-off-by: Aaron Teo <[email protected]>
Signed-off-by: Aaron Teo <[email protected]>
Only memcpy data from sections argument if it's non-NULL.

Signed-off-by: Aaron Teo <[email protected]>
@AlekseiNikiforovIBM
Contributor

I've rebased this PR since #16275 was merged, removed the miniaudio-related commits since #16212 was merged, and removed the commits 'devops: add s390x to build-linux-cross' and 'Revert "devops: add s390x to build-linux-cross"' since together they change nothing.

Co-authored-by: Sigbjørn Skjæret <[email protected]>
@ggerganov
Member

@taronaeo It's up: https://huggingface.co/ggml-org/models/tree/main/tinyllamas

Would it be possible to upload stories15M-be.Q4_0.gguf there too?

Yes, it's uploaded now.

@CISC
Collaborator

CISC commented Sep 26, 2025

@AlekseiNikiforovIBM Tests are passing. Wait for @taronaeo to merge, or shall I merge when CI is done?

@taronaeo
Collaborator Author

Feel free to merge this, I didn't have the time to check the CI results :)

@taronaeo
Collaborator Author

  • The CI / ggml-ci-x64-cpu-amx failure is the same as in other PRs.
  • The CI / ggml-ci-arm64-cpu-high-perf-sve, CI / ggml-ci-x64-nvidia-vulkan-cm2 and CI / macOS-latest-cmake-arm64 failures do not seem related to this PR.

CI / ubuntu-22-cmake-vulkan (pull_request) I'm unsure about, though:

	 27 - test-thread-safety (ILLEGAL)                      main
	 29 - test-opt (ILLEGAL)                                main
	 31 - test-backend-ops (ILLEGAL)                        main
	 34 - test-barrier (ILLEGAL)                            main
	 35 - test-quantize-fns (ILLEGAL)                       main
	 36 - test-quantize-perf (ILLEGAL)                      main
	 37 - test-rope (ILLEGAL)                               main

@CISC
Collaborator

CISC commented Sep 26, 2025

CI / ubuntu-22-cmake-vulkan (pull_request) I'm unsure about, though:

It's "OK", it's a corrupt ccache.

@taronaeo
Collaborator Author

CI / ubuntu-22-cmake-vulkan (pull_request) I'm unsure about, though:

It's "OK", it's a corrupt ccache.

Gotcha. Will proceed to merge then.

taronaeo merged commit 624207e into ggml-org:master Sep 26, 2025
62 of 67 checks passed
struct pushed a commit to struct/llama.cpp that referenced this pull request Sep 26, 2025
* devops: move s390x and ppc64le ci build

we have access to ubuntu-24.04-s390x and ppc64le images now

Signed-off-by: Aaron Teo <[email protected]>

* devops: disable ppc64le for now since they have compiler errors

Signed-off-by: Aaron Teo <[email protected]>

* devops: stop warnings as errors

Signed-off-by: Aaron Teo <[email protected]>

* devops: switch to non-macro flag

Signed-off-by: Aaron Teo <[email protected]>

* devops: going the llama macro route

Signed-off-by: Aaron Teo <[email protected]>

* devops: add big-endian gguf test models

Signed-off-by: Aaron Teo <[email protected]>

* devops: disable ppc64le to test s390x, check test build

Signed-off-by: Aaron Teo <[email protected]>

* devops: dup .gguf.inp files for big-endian tests

Signed-off-by: Aaron Teo <[email protected]>

* devops: dup .gguf.out files for big-endian too

Signed-off-by: Aaron Teo <[email protected]>

* devops: add python setup and endian byteswap

Signed-off-by: Aaron Teo <[email protected]>

* devops: poor thing does not have s390x python3

Signed-off-by: Aaron Teo <[email protected]>

* devops: add missing rust compiler for s390x

Signed-off-by: Aaron Teo <[email protected]>

* devops: try rust actions runner

Signed-off-by: Aaron Teo <[email protected]>

* Revert "devops: try rust actions runner"

This reverts commit 3f8db04.

Signed-off-by: Aaron Teo <[email protected]>

* devops: try a different path for rust

Signed-off-by: Aaron Teo <[email protected]>

* devops: dump home directory and user info

Signed-off-by: Aaron Teo <[email protected]>

* devops: install gguf-py only

Signed-off-by: Aaron Teo <[email protected]>

* devops: missed relative path

Signed-off-by: Aaron Teo <[email protected]>

* devops: remove big-endian files since local swapping is working

Signed-off-by: Aaron Teo <[email protected]>

* devops: revert test-tokenizer-0 cmakelists

Signed-off-by: Aaron Teo <[email protected]>

* Fix unicode flags conversion from and to uint16_t

Bitfields are allocated in different order on s390x

Signed-off-by: Aaron Teo <[email protected]>

* Simplify byteswap command

Signed-off-by: Aaron Teo <[email protected]>

* Add byteswapping and git-lfs for test-tokenizers-ggml-vocabs

Signed-off-by: Aaron Teo <[email protected]>

* Fix endianness detection in vocab loader

Signed-off-by: Aaron Teo <[email protected]>

* Disable test-thread-safety on s390x

In this test a model is downloaded,
then immediately loaded to check if more downloads are needed,
and then used for test.

There is no clean way to separate all those steps
to add byteswapping between them, so just skip this test.

Signed-off-by: Aaron Teo <[email protected]>

* Fix q8_0 test in test-quantize-fns

vec_signed uses unexpected rounding mode.
Explicitly use different rounding function.

Signed-off-by: Aaron Teo <[email protected]>

* devops: add big-endian stories260K

Signed-off-by: Aaron Teo <[email protected]>

* devops: add s390x test-eval-callback

Signed-off-by: Aaron Teo <[email protected]>

* devops: fix test does not exist

Signed-off-by: Aaron Teo <[email protected]>

* devops: fix model not found llama-eval-callback

Signed-off-by: Aaron Teo <[email protected]>

* Fix q3_K dot product error in test-quantize-fns on s390x

Array q8bytes had only 4 elements allocated, but 8 elements were accessed.
This led to out-of-bounds writes, later out-of-bounds reads of the
overwritten values, and an incorrect result.

Signed-off-by: Aaron Teo <[email protected]>

* devops: re-enable ppc64le for testing

Signed-off-by: Aaron Teo <[email protected]>

* devops: activate test-thread-safety for s390x

Signed-off-by: Aaron Teo <[email protected]>

* devops: disable ppc64le tests

for some reason it keeps failing test-thread-safety tests and I do not
have a machine that is able to replicate the tests.

Signed-off-by: Aaron Teo <[email protected]>

* devops: LLAMA_FATAL_WARNINGS=ON

Signed-off-by: Aaron Teo <[email protected]>

* Correct repository URL for s390x for test-thread-safety model

Signed-off-by: Aaron Teo <[email protected]>

* Fix fs_get_cache_directory

Ensure it works even if both XDG_CACHE_HOME and HOME are unset.
This might happen in containers.

Signed-off-by: Aaron Teo <[email protected]>

* Re-enable CI for ppc64le

Signed-off-by: Aaron Teo <[email protected]>

* Fortify ggml_rope_impl

Only memcpy data from sections argument if it's non-NULL.

Signed-off-by: Aaron Teo <[email protected]>

* Add TODO in struct unicode_cpt_flags to reimplement it in endian-independent way

* Update URL for big-endian model

* Update .github/workflows/build.yml

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Update remaining mentions of BE models to ggml-org/models repo

---------

Signed-off-by: Aaron Teo <[email protected]>
Co-authored-by: Aleksei Nikiforov <[email protected]>
Co-authored-by: Aleksei Nikiforov <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
CISC mentioned this pull request Oct 3, 2025
yael-works pushed a commit to yael-works/llama.cpp that referenced this pull request Oct 15, 2025