Rebuild UCX, UCC, OSU and NCCL for the CUDA sanity check #1165
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80 |
|
New job on instance
|
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90 |
|
New job on instance
|
|
Strange... I can see that the job for #1165 (comment) is running, but at this point the comment doesn't reflect that; it still says 'eligible to start in 20 seconds'. I'm curious whether it will report the result. |
|
I guess not, let's retry: bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90 |
|
New job on instance
|
|
I guess the event handler is still running, but the job manager crashed? :| Not sure why though, nothing in the logs... |
There are known issues with the job manager that can make it crash, and they are not resolved yet. See for example: |
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws on:arch=zen2 for:arch=x86_64/amd/zen2,accel=nvidia/cc70 |
|
New job on instance
|
|
bot: help |
Updates by the bot instance
|
|
bot: help |
Updates by the bot instance
|
|
Need to rebuild the haswell cc70 tarball, as the last attempt failed, and the one before that (which is the one that got uploaded) didn't include NCCL: bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws on:arch=haswell for:arch=x86_64/intel/haswell,accel=nvidia/cc70 |
|
New job on instance
|
|
Once more, should only build NCCL now: bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws on:arch=haswell for:arch=x86_64/intel/haswell,accel=nvidia/cc70 |
|
New job on instance
|
|
Hmm, because it's a rebuild it will still (re)build everything. Let's ingest that one anyway and overwrite the existing installations. |
|
All tarballs have been ingested 🎉 |