-
Notifications
You must be signed in to change notification settings - Fork 796
[UR][CI] Update to v1.25.2 L0 with fixed L0 api symbols #20537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
22512b8 to
4e78a1c
Compare
4e78a1c to
9a664ee
Compare
9a664ee to
84e351f
Compare
84e351f to
1f4dc3d
Compare
1f4dc3d to
6817bf4
Compare
6817bf4 to
42c4cf8
Compare
42c4cf8 to
16f7816
Compare
- Due to symbol conflicts in the L0 gpu driver, zeCommandListAppendLaunchKernelWithArguments can only be used from the spec definition and the driver exp cannot be used as of the update to v1.14 supported symbols. Signed-off-by: Neil R. Spruit <[email protected]>
16f7816 to
e777ffb
Compare
|
@intel/llvm-gatekeepers , please consider merging - we need this fix for benchmarks to fully work |
|
Some jobs in our pre-commit CI started to fail like this yesterday: which seems related to this PR. The jobs in question are "Dev IGC" and "ABI Compatibility". The latter, in particular, uses an older docker container image and the test binaries built with an official release. Then we copy newly built toolchain (think @nrspruit , @ldorau , @pbalcer , @lukaszstolarczuk , Can you confirm this PR is the reason of these failures? What can be done here? |
Adding @PatKamin |
The only way that would fail is if the UR adapter was NOT built statically linking the L0 static loader v1.25.x+ . What is different between the workflows for Dev IGC and ABI compatibility? Why are they allowing the build of the adapters without statically linking the L0 Loader? All the CI run in the PR passed because the loader is statically linked. |
I don't know anything about that. Docker image for "ABI Compatibility tasks" is built using the following workflow: https://github.com/intel/llvm/blob/sycl/.github/workflows/sycl-prebuilt-e2e-container.yml And the "current" toolchain is built the same way as for any other E2E jobs in the pre-commit. |
|
Pre-commit CI is still broken, should we revert this PR due to no response from the author? |
|
The issues cannot be reproduced locally. This only happens if somehow the UR adapters are built without the L0 static loader and only in those CI flows.
Someone who is familiar with those CI paths will need to help because there is some config being forced such that the adapters are not being built with L0 static loader.
@balcer, ***@***.***>, I was under the impression that the v2 adapter is always built with the static loader, but this error shows that in some cases it is built dynamic. Can we fix that or do we need to add an ifdef for the static build?
…________________________________
From: Andrei Elovikov ***@***.***>
Sent: Friday, November 7, 2025 7:50:27 AM
To: intel/llvm ***@***.***>
Cc: Spruit, Neil R ***@***.***>; Mention ***@***.***>
Subject: Re: [intel/llvm] [UR][CI] Update to v1.25.2 L0 with fixed L0 api symbols (PR #20537)
[https://avatars.githubusercontent.com/u/55163882?s=20&v=4]aelovikov-intel left a comment (intel/llvm#20537)<#20537 (comment)>
Pre-commit CI is still broken, should we revert this PR due to no response from the author?
—
Reply to this email directly, view it on GitHub<#20537 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABZWKTVLICRQW76AVKHHLEL33S5UHAVCNFSM6AAAAACLAJP6UGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTKMBTGMYTEOJWGE>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
|
I can help with any of the CI build questions but I am incredibly busy so I would need someone familiar with the UR/L0 code to tell me exactly what I need to be looking for, some of the other devops people may be more free |
|
Here's my guess what's happening. In regular CI jobs, the level-zero loader is fetched dynamically from GitHub, and the loader is linked statically. This is independent of the adapter. See here:
It's possible that the jobs that are failing use a preinstalled dynamic loader. And then, at runtime, on the actual system where the job is executing, the symbol loading fails. Long-term, the loader should ship and export two separate targets, static and dynamic, so that UR can always choose the static variant when linking with the adapter, regardless how the loader is sourced. |
|
The docker images definitely have a preinstalled L0, the change in this PR updated the version of it, so it should be v1.25.2 now. |
|
I just built v1.25.2, and the symbol that's failing to load is exported: Other explanations for the behavior we are seeing in CI are a) LD_LIBRARY_PATH being set to a location where a different, older, version of the loader is installed, b) rpath, c) ... ?
If we revert this, other parts of CI will stop working. |
|
Sorry, I think the only remaining failures are the ABI compatibility jobs and those do not have an updated L0, by design. I fixed IGC-Dev just by rebuilding the container. Note sure what we should do for the ABI compatibility jobs. If we could make the compiler use the just-built L0 instead of the system one that should work I guess, not sure if that's a good solution though. |
I considered that. We could change the CMake's to do that, but that would also affect other builds, and those may not be able to fetch stuff dynamically from GitHub. |
|
Could we set some CMake envvar or actual CMake variable passed in build such that it won't find the system one or won't even try? Not sure if there is a variable like that. In the general case we should use the system one IMO. |
|
Extra note: we're in ABI breaking window, so we'll have to stop testing for ABI compatibility soon'ish and might just do it now as well, but I want to make sure that the issue that happened here won't happen in future, so we do need to root cause it. |
No idea either, I experimented with a few things, but no luck. I think it may be best to add a cmake option that would force it to go to this path:
|
|
Sounds good to me, the question is who has the bandwidth to implement it :P |
It's 6pm Friday for me, so... xd Given what @aelovikov-intel said, and with IGC-Dev jobs fixed, this no longer seems that urgent. I'll ask someone in my team for help once we get back from holidays (Wednesday). |
|
I can just disable the ABI jobs until we can implement the fix. Is that okay @aelovikov-intel? |
#20597, waiting for the CI result before merging (not 100% sure it'll work like that). |
No description provided.