run_prebuilt_e2e_tests CI jobs fail in cases with UR API changes #16982
Comments
**Same patch as oneapi-src#2606, which was reverted due to intel/llvm#16982.** The API to enqueue a closed command-buffer to a queue is defined in the YAML as a part of the command-buffer class, but it should be part of the enqueue class like other enqueue API extensions. This PR updates the YAML and regenerates UR code, making the associated changes to adapters and CTS. Closes oneapi-src#2600
**Same PR as was closed in intel#16747 due to intel#16982.** Reflects the change in name of the UR entry-point from `urCommandBufferEnqueueExp` to `urEnqueueCommandBufferExp` in oneapi-src/unified-runtime#2606
@ayylol Do you mind taking a look at this? It seems specific to prebuilt e2e testing. If that's not related to the problem let me know and I'll investigate.
AFAIK, normally one can use llvm/.github/workflows/sycl-linux-build.yml Line 281 in 0e44bb4
Hey @EwanC, for some context the way the "dev_kit" features are added, which seems to be the ones causing the issues in your pr, is by testing to see if a simple file that depends on those dev_kits can be compiled. In the case for the
using the following command:
Given the similarity of the error message
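To make the "dev_kit" probing described above concrete, here is a minimal Python sketch of the idea: try to compile a small source file and enable the feature only if the compiler exits successfully. This is not the actual lit.cfg.py code; `can_compile`, `detect_features`, the `-o` flag convention, and the probe dictionary are all hypothetical names chosen for illustration.

```python
import os
import subprocess
import tempfile


def can_compile(compiler_cmd, source_text, suffix=".cpp"):
    """Return True if `compiler_cmd` exits with status 0 on `source_text`.

    `compiler_cmd` is a list such as ["clang++", "-fsycl"]; the exact
    command a real test suite uses is an assumption here.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe" + suffix)
        out = os.path.join(tmp, "probe.out")
        with open(src, "w") as f:
            f.write(source_text)
        try:
            result = subprocess.run(
                compiler_cmd + [src, "-o", out],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            # The compiler binary could not be launched at all.
            return False
        return result.returncode == 0


def detect_features(compiler_cmd, probes):
    """Map of feature name -> probe source; keep the names that compile."""
    return {
        name for name, source in probes.items()
        if can_compile(compiler_cmd, source)
    }
```

A consequence of this design is that a link failure in the probe (for example an undefined reference coming from a mismatched runtime library) silently turns the feature off instead of reporting an error, which would be consistent with tests that need the feature being reported as unsupported.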
Thanks for investigating, I've tried that reproducer locally using #16984 which is the same change and shows the same CI issues in the prebuilt e2e test. Using the reproducer you suggest seems to work fine locally. Is there something I need to tweak locally to reproduce this issue?
This is the same output I see as tip sycl
@EwanC OK, I investigated this a bit further, and Andrei's comment above is correct. The way the CI works at the build stage is that we don't set the PATH to include the compiler; rather we rely on the The prebuilt running stages, on the other hand, do outright set the PATH/LD_LIBRARY_PATH to include the bin/lib folders, and in this case the l0 check file is able to be compiled. This is likely why you weren't able to reproduce this issue locally, and this is also why we do end up getting the I'm not entirely sure why we have this difference (perhaps @sarnex or @aelovikov-intel could comment), but I think it's important to note that this same issue is not reproducible on HEAD (there it works fine in either case), just on this PR.
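For anyone trying to mirror the two situations locally, the difference boils down to whether the toolchain's bin/lib folders are prepended to PATH/LD_LIBRARY_PATH. The sketch below only illustrates that difference; `prebuilt_env` and the `/ws/toolchain` path are made-up names, and the real workflow files may set the variables differently.

```python
import os


def _prepend(entry, existing):
    """Put `entry` in front of an existing os.pathsep-separated list."""
    return entry if not existing else entry + os.pathsep + existing


def prebuilt_env(base_env, toolchain_dir):
    """Return a copy of `base_env` with the toolchain's bin/lib folders
    placed first on PATH and LD_LIBRARY_PATH (assumed layout: bin/, lib/)."""
    env = dict(base_env)
    env["PATH"] = _prepend(
        os.path.join(toolchain_dir, "bin"), base_env.get("PATH", "")
    )
    env["LD_LIBRARY_PATH"] = _prepend(
        os.path.join(toolchain_dir, "lib"),
        base_env.get("LD_LIBRARY_PATH", ""),
    )
    return env


# Build-stage style: the environment is left untouched.
build_stage_env = dict(os.environ)
# Prebuilt-run style: toolchain folders come first on both search paths.
prebuilt_stage_env = prebuilt_env(os.environ, "/ws/toolchain")
```

Passing one of these dictionaries as `env=` to `subprocess.run` when invoking the check compile should show whether the result depends on the search paths.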
If you'd like to check I made this reproducer which follows what the CI is doing at the build stage (with some minor tweaks)
ran this on #16992 locally in the
Thanks again for coming up with a reproducer; unfortunately I still don't see this issue locally. I tried it out on the git branch for #16984 and ran the
If I add
Which if I run by itself in the command-line does indeed give me the error "$WS/toolchain/bin/../lib/libsycl.so: undefined reference to " for all the UR entry-points, but if I add The only way I've managed to reproduce the "undefined reference to 'urEnqueueCommandBufferExp@LIBUR_LOADER_0.12'" error is by intentionally pointing LD_LIBRARY_PATH to a build of the commit prior to the one I'm testing when running
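One quick way to check for the stale-library situation described above is to see which copy of the loader library comes first on the search path. The helper below is a simplified sketch: `first_on_search_path` is a hypothetical name, the file name `libur_loader.so` is assumed from the symbol version in the error message, and the real dynamic loader also consults rpath/runpath entries and the system cache, which this ignores.

```python
import os


def first_on_search_path(search_path, filename):
    """Return the first existing `filename` among the directories of an
    os.pathsep-separated `search_path`, or None if no directory has it."""
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Report which loader library LD_LIBRARY_PATH would offer first.
    print(first_on_search_path(
        os.environ.get("LD_LIBRARY_PATH", ""), "libur_loader.so"
    ))
```

If the printed path belongs to an older build than the libsycl.so being linked, an undefined versioned reference to a newly added entry-point is the expected outcome.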
Nobody owns the CI; all contributions there are based on good will.
An update on where I am with trying to debug this further: I modified my #16984 PR so that it only added a new UR entry-point and didn't remove the old UR entry-point (although it updates all the callsites in UR CTS tests and DPC++ to the new entry-point). Interestingly, the CI runs still failed, although this isn't the case for other PRs like #17117 which add new UR entry-points. Sticking some debug prints into lit.cfg.py, I see this in the CI log, which is suspicious.
I will try to debug this error message further and work out what is different about my PR compared to others that add UR entry-points.
I think this error in particular is expected in this scenario. The line that spits this out is trying to call the compiler with an MSVC style option (
I've run into very similar My first suspicion was that this was caused by the Is there any idea why this problem might be occurring?
Thanks both, I don't have any other leads as to what's going on here then, so unassigning myself.
Thanks for looking into this more! I'm about to go on holiday until Tuesday but will rebase then.
E2E features in the #16900 (https://github.com/intel/llvm/actions/runs/13565842833/job/37918644306?pr=16900#step:25:2581)
The features after the CI changes #17221 (https://github.com/intel/llvm/actions/runs/13570764161/job/37934910659?pr=17221#step:25:2886)
Looks like that worked, since after the CI changes we get the
That's great news! Let me know if/when you'd like for me to close my test PR: #17221.
**Same PR as was closed in #16747 due to CI issue #16982 which has since been resolved.** The API to enqueue a closed command-buffer to a queue is defined in the YAML as a part of the command-buffer class, but it should be part of the enqueue class like other enqueue API extensions. This PR updates the YAML and regenerates UR code, making the associated changes to adapters and CTS. Closes UR issue oneapi-src/unified-runtime#2600
**Same PR as was closed in intel/llvm#16747 due to CI issue intel/llvm#16982 which has since been resolved.** The API to enqueue a closed command-buffer to a queue is defined in the YAML as a part of the command-buffer class, but it should be part of the enqueue class like other enqueue API extensions. This PR updates the YAML and regenerates UR code, making the associated changes to adapters and CTS. Closes UR issue #2600
As discovered by @yingcong-wu in #16747 (comment) the run_prebuilt_e2e_tests CI jobs in PR testing fail in the case that the UR API changes.
Experimental features in Unified Runtime are allowed to make API/ABI breaking changes, and the SYCL-Graph experimental oneAPI extension relies on being able to do this. This CI job should be updated so that it can pass in such a case, or a policy should be established with the gatekeepers whereby PRs that fail run_prebuilt_e2e_tests CI runs due to this reason can still be merged.