Revert task sequence changes #2

aejjehint · 2024-06-28T19:45:21Z

This reverts PR intel#12453 and intel#13080

…#13987) Bump UR L0 commit to oneapi-src/unified-runtime#1694 so that the SYCL device aspect for supporting update in graphs is correctly reported for L0 devices. Currently, support can be incorrectly reported. --------- Co-authored-by: Kenneth Benzie (Benie) <[email protected]>

Running the `Graph/Update` E2E tests on Level Zero with `UR_L0_LEAKS_DEBUG=1` shows that we are leaking a PI kernel and module. On investigation, this was because we retain these objects in `getOrCreateKernel()` but do not release them. This adds release calls, similar to how the scheduler does it in [enqueueImpCommandBufferKernel](https://github.com/intel/llvm/blob/b49303c7e13ca0a69454eaaaeb8c3d094916218d/sycl/source/detail/scheduler/commands.cpp#L2550).
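The leak described above is a retain without a matching release. A minimal sketch of that pattern, using a hypothetical `MockHandle` and helper functions in place of the real PI kernel and module objects (none of these names come from the runtime):

```cpp
#include <cassert>

// Hypothetical ref-counted handle standing in for a PI kernel or module.
struct MockHandle {
  int ref_count = 1; // the creator holds one reference
};

void retain(MockHandle& h) { ++h.ref_count; }

void release(MockHandle& h) {
  assert(h.ref_count > 0); // releasing a dead handle would be a bug
  --h.ref_count;
}

// Models a cache lookup that hands the caller an extra reference.
MockHandle& get_or_create(MockHandle& cached) {
  retain(cached);
  return cached;
}

// Leaky use: the extra reference is never given back.
int use_without_release(MockHandle& cached) {
  MockHandle& k = get_or_create(cached);
  (void)k; // ... the handle would be used here ...
  return cached.ref_count;
}

// Balanced use: every retain is paired with a release.
int use_with_release(MockHandle& cached) {
  MockHandle& k = get_or_create(cached);
  release(k);
  return cached.ref_count;
}
```

In the balanced variant the count returns to its starting value after each use; in the leaky variant it grows by one per call, which is the kind of imbalance a leak checker reports.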

This upgrades the docker to use the cuda 12.5 image. I've run the test-e2e locally using cuda 12.5 and all is well. cuda 12.5 also fixed an issue introduced by the cuda 12.4 driver: see intel#13661 (comment). Signed-off-by: JackAKirk <[email protected]>

…mbiguity (intel#14018) This change avoids the ambiguity between the deprecated `sycl::ext::oneapi::sub_group` and `sycl::sub_group` when both namespaces are used. This fixes a failure on windows for cuda. --------- Signed-off-by: JackAKirk <[email protected]>
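A minimal sketch of the kind of ambiguity described above, using hypothetical `mock_sycl` namespaces rather than the real SYCL headers (requires C++17 for the nested namespace definition). It shows qualification as one way to sidestep the clash, which is not necessarily how the patch itself resolves it:

```cpp
#include <string>

// Hypothetical stand-ins for the two namespaces; not the real SYCL headers.
namespace mock_sycl {
struct sub_group {
  static std::string origin() { return "sycl::sub_group"; }
};
namespace ext::oneapi {
// Models the deprecated type that shares the same unqualified name.
struct sub_group {
  static std::string origin() { return "sycl::ext::oneapi::sub_group"; }
};
} // namespace ext::oneapi
} // namespace mock_sycl

using namespace mock_sycl;
using namespace mock_sycl::ext::oneapi;

// With both using-directives active, an unqualified `sub_group` names two
// different types, so the following would be rejected as ambiguous:
//   std::string bad() { return sub_group::origin(); }

// Qualifying the name selects exactly one declaration.
std::string pick_core() { return mock_sycl::sub_group::origin(); }

std::string pick_deprecated() {
  return mock_sycl::ext::oneapi::sub_group::origin();
}
```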

Supported matrix dimensions are queried from the device and tell the tests which tile sizes can be used. This is a subset of all tests that are planned to be modified. Manually tested on PVC and SPR. --------- Co-authored-by: Yury Plyakhin <[email protected]>

Supported matrix dimensions are queried from the device and tell the tests which tile sizes can be used. This is a subset of all tests that are planned to be modified. Manually tested on PVC and SPR: no new regressions.

The following tests have been marked as XFAIL on all platforms. I removed them from the XMX8 folder. Once they pass, they can be modified to query the supported matrix dimensions from the device.

* joint_matrix_colA_rowB_colC.cpp
* joint_matrix_out_bounds.cpp
* joint_matrix_unaligned_k.cpp
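The idea of letting device-reported dimensions drive the tile sizes a test uses can be sketched as follows. `MockCombination`, `supports_tile`, and `usable_tiles` are hypothetical names for illustration; the real device query and its result type are not shown here, and any tile sizes passed in are arbitrary:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of one tile shape a device reports as supported.
// The real query result carries more information; this is illustration only.
struct MockCombination {
  std::size_t M, N, K;
};

// Returns true if the device-reported list contains the requested tile shape.
bool supports_tile(const std::vector<MockCombination>& reported,
                   std::size_t M, std::size_t N, std::size_t K) {
  for (const MockCombination& c : reported)
    if (c.M == M && c.N == N && c.K == K)
      return true;
  return false;
}

// Keeps only the candidate tile shapes that the device reports as supported,
// so a test can iterate over shapes that are valid on the current device.
std::vector<MockCombination>
usable_tiles(const std::vector<MockCombination>& reported,
             const std::vector<MockCombination>& candidates) {
  std::vector<MockCombination> result;
  for (const MockCombination& c : candidates)
    if (supports_tile(reported, c.M, c.N, c.K))
      result.push_back(c);
  return result;
}
```

A test written this way skips unsupported shapes instead of hard-coding one tile size per hardware folder.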

For some reason, we used to emit unused member functions only if they were explicitly annotated with the `sycl_device` attribute (through the `SYCL_EXTERNAL` macro). This logic was introduced in 3baec18, and there is no clear indication as to why exactly we check that the attribute is explicit. The SYCL extension for virtual functions introduces an alternative markup for marking such functions: SYCL compile-time properties, which we turn into attributes implicitly under the hood. Essentially, we now have a situation where an implicit `sycl_device` attribute on a member function should be treated as an explicit one, because it could be the result of a SYCL compile-time property being applied to that method. Considering our current codebase, it seems that we intend member functions to be emitted in all cases where `sycl_device` is added implicitly, and therefore this patch removes the requirement for the attribute to be explicit.

This PR adds functionality for:

* Listing devices in stdout
* Filtering devices

Tests and docs updated accordingly.

--------- Signed-off-by: Alberto Cabrera <[email protected]> Co-authored-by: Joe Todd <[email protected]>

…intel#14015) This PR adds a `wait_and_free` function. This makes it safer and less likely that memory is released during or before its use by enqueued commands. `async_free` is renamed to `enqueue_free` to make its behaviour clearer. This PR updates the comments and tests accordingly.

…d to run on Windows (intel#13957) [Windows doesn't support cudaMemPrefetchAsync()](bitsandbytes-foundation/bitsandbytes#453), which is used in the call to `prefetch` in the test. [urEnqueueUSMPrefetch](https://github.com/oneapi-src/unified-runtime/blob/c0c607c3a88933b4c5c20a0aca4539781c678411/source/adapters/cuda/enqueue.cpp#L1629) also carries a comment noting that this is not supported for CUDA on Windows.

…e exception message (intel#14055)

- C++ thrown exception message not shown when running from Windows terminal.
- The patch fixes [cuda-max-local-mem-size.cpp](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Plugin/cuda-max-local-mem-size.cpp) test failure.

… GPUs through OpenCL. (intel#14072) Extend the `sycl-ls-gpu-default.cpp` test to cover the support of Intel GPUs through OpenCL. The patch fixes the failure when running the test on a system with both Intel and CUDA GPUs.

…ntel#14088) CODEOWNERS seems to be missing a line attributing `sycl/test/check_device_code/matrix` tests to intel/sycl-matrix-reviewers (as per [this discussion](intel#14063 (comment))). This PR remedies this. However, I noticed that the current CODEOWNERS section for the matrix reviewers uses paths; let me know if I should use `sycl/**/matrix` instead.

…address) (intel#13948) UR: oneapi-src/unified-runtime#1677 In the kernel, we save at most `ASAN_MAX_NUM_REPORTS` (default: 10) SanitizerReport entries. We select the SanitizerReport index by `WG_LINEAR_ID % ASAN_MAX_NUM_REPORTS`. When `-fsanitize-recover=address` is passed as a compiler flag, `asan_loadX_noabort`/`asan_storeX_noabort` is used, and we use the `is_recover = true` flag to distinguish this case. If `is_recover` is true, the UR will print out all the error reports and continue (use at your own risk). If `is_recover` is false (default case), the UR will print out only one error report and exit. --------- Co-authored-by: Callum Fare <[email protected]>
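The slot selection and the two reporting modes described above can be sketched on the host as follows. `MockReport`, `ReportTable`, `save_report`, and `reports_to_print` are hypothetical names; the real SanitizerReport and UR reporting code are more involved, and the overwrite behaviour on a slot collision is an assumption of this sketch:

```cpp
#include <array>
#include <cstddef>

// Default cap named in the description above.
constexpr std::size_t ASAN_MAX_NUM_REPORTS = 10;

// Hypothetical, simplified record; the real SanitizerReport carries more data.
struct MockReport {
  bool used = false;
  std::size_t wg_linear_id = 0;
};

using ReportTable = std::array<MockReport, ASAN_MAX_NUM_REPORTS>;

// Slot selection as described: work-group linear id modulo the cap.
std::size_t report_slot(std::size_t wg_linear_id) {
  return wg_linear_id % ASAN_MAX_NUM_REPORTS;
}

// Stores a report in its slot. Assumption: a later report that maps to the
// same slot overwrites the earlier one; the source does not say which wins.
void save_report(ReportTable& table, std::size_t wg_linear_id) {
  MockReport& r = table[report_slot(wg_linear_id)];
  r.used = true;
  r.wg_linear_id = wg_linear_id;
}

// Number of reports that would be printed: every stored report in recover
// mode, only one otherwise (after which the program would exit).
std::size_t reports_to_print(const ReportTable& table, bool is_recover) {
  std::size_t used = 0;
  for (const MockReport& r : table)
    if (r.used)
      ++used;
  if (used == 0)
    return 0;
  return is_recover ? used : 1;
}
```

Because the index wraps at the cap, work-groups 3 and 13 share a slot, so at most `ASAN_MAX_NUM_REPORTS` distinct reports can be held at once.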

This patch fixes errors like "call to 'XXX' is ambiguous" and also disables all TaskSequence tests on Windows because they fail in the same way as on Linux. A GitHub issue was created for that, and the comment in the tests was updated.

…SYCL offloading under an option (intel#14262) We have added an experimental option (off by default): `-fsycl-use-spirv-backend-for-spirv-gen`. This causes the device compilation pipeline in SYCL offloading to invoke llc (with the SPIR-V target) instead of llvm-spirv to perform SPIR-V code generation. This change results in the SPIR-V backend being built by default. Also, if spirv-val is available, a sanity check of the generated SPIR-V code is done and any issues are reported to the user as warnings. Thanks --------- Signed-off-by: Arvind Sudarsanam <[email protected]>

…l#14175) In the same vein as intel#14174, this PR moves test cases that do not check device code outside of the `check_device_code` folder. This is a separate PR from intel#14174 because the code is technically owned by another team.

Due to some confusion about the output from the in_order_profiling_queue test on L0, the test was disabled. However, the test can be safely reenabled for that target, while keeping it disabled for FPGA. Additionally, the failure in profiling_queue is believed to be due to the same issue, so the JIRA has been added to it and the note in in_order_profiling_queue has been updated to reflect the known information about the failure. Signed-off-by: Larsen, Steffen <[email protected]>

…n into IR-based analysis (intel#14220) Based on feedback from intel#14197, I separated out the code that generates the module properties and symbol table into separate functions that can be called by anyone and that just look at the IR and entry points.

For now, we still call it inside `sycl-post-link` because we still support the old offloading model, but once we drop support for that we can drop this responsibility from sycl-post-link and only compute it inside `clang-linker-wrapper`, both for normal compilation and thinLTO.

In a (hopefully soon) future PR I plan to call these functions from `clang-linker-wrapper` when compiling for thinLTO, which we need because we will split early.

Most of this change should be NFC(I). The expected changes are:

1) New option to sycl-post-link to generate the properties file
2) Driver change to NOT pass the option from 1) in thinLTO mode
3) Two minor changes in logic from properties generation; I've called these out inline.

--------- Signed-off-by: Sarnie, Nick <[email protected]>

… (#92855) This solves some ambiguity introduced in P0522 regarding how template template parameters are partially ordered, and should reduce the negative impact of enabling `-frelaxed-template-template-args` by default.

When performing template argument deduction, we extend the provisional wording introduced in llvm/llvm-project#89807 so it also covers deduction of class templates.

Given the following example:

```C++
template <class T1, class T2 = float> struct A;
template <class T3> struct B;

template <template <class T4> class TT1, class T5> struct B<TT1<T5>>; // #1
template <class T6, class T7> struct B<A<T6, T7>>; // #2

template struct B<A<int>>;
```

Prior to P0522, `#2` was picked. Afterwards, this became ambiguous. This patch restores the pre-P0522 behavior, `#2` is picked again.

This has the beneficial side effect of making the following code valid:

```C++
template<class T, class U> struct A {};
A<int, float> v;
template<template<class> class TT> void f(TT<int>);

// OK: TT picks 'float' as the default argument for the second parameter.
void g() { f(v); }
```

---

Since this changes provisional implementation of CWG2398 which has not been released yet, and already contains a changelog entry, we don't provide a changelog entry here.

EwanC and others added 30 commits June 5, 2024 10:20

[UR] Bump HIP tag to 399430d (intel#14037)

0cec128

[ESIMD] Allow full autodeduction for block_load/block_store and slm_b…

643d21f

…lock_load/slm_block_store APIs accepting simd_view (intel#13978) Co-authored-by: Nick Sarnie <[email protected]>

[SYCL] Allow specifying -foffload-lto with the new offload driver and…

7ff1a29

… build libdevice with thinLTO (intel#14036) This is the first change in my work on thinLTO for SYCL. --------- Signed-off-by: Sarnie, Nick <[email protected]>

[GHA] Uplift Linux IGC Dev RT version to igc-dev-8b999ec (intel#13963)

fbeb7d4

Scheduled igc dev drivers uplift Co-authored-by: GitHub Actions <[email protected]>

[SYCL][E2E] Disable flaky profiling_queue.cpp test on CUDA (intel#14054)

aa92b24

See intel#14053 Signed-off-by: Sarnie, Nick <[email protected]>

[ESIMD] Allow full autodeduction for slm_atomic_update and atomic_upd…

2dc80c3

…ate for local_accessors API accepting simd_view (intel#14024)

[ESIMD] Allow full autodeduction for prefetch APIs accepting simd_view (

c7d627f

intel#14000) Co-authored-by: Nick Sarnie <[email protected]>

[SYCL] Disable flaky test EnqueueNoMemObjTwoHostTasks on Windows (int…

037c67f

…el#14061) See intel#14060 --------- Signed-off-by: Sarnie, Nick <[email protected]>

[SYCL][Graph] Clarify graph enable_profiling property in finalize() (i…

09c9384

…ntel#14067) Co-authored-by: Ewan Crawford <[email protected]>

[CI] pre-commit/aws pointed back to old image. (intel#14074)

f4829ab

temp fix for problems from cuda 12.5 uplift that were caused by intel#14049. Should fix intel#14071 --------- Signed-off-by: JackAKirk <[email protected]>

[SYCL][E2E] Refactor/fix bfloat16 test (intel#14062)

463f00c

Fixes ansi-alias violation and reads from uninitialized buffers. Fixes intel#13790.

[SYCL][E2E] Disable NonUniformGroups/ballot_group_algorithms.cpp on C…

0f796bc

…UDA (intel#14058) Fails in Nightly testing on the self-hosted CUDA runner: intel#12995.

[SYCL][E2E] Remove warnings in Basic e2e tests (intel#13994)

0cbc9a0

[Doc] Add Mar'24 Release Notes (intel#13879)

b8693eb

[ESIMD] Allow full autodeduction of template parameters for atomic_up…

d66106c

…date ACC API accepting simd_view (intel#14065) Co-authored-by: Nick Sarnie <[email protected]>

[ESIMD] Allow full autodeduction of template parameters for atomic_up…

353cc51

…date USM API accepting simd_view (intel#14043)

[SYCL][E2E] Disable memory_management_test3.cpp on Gen12 linux (intel…

f8552d4

…#14087) See intel#14086 Signed-off-by: Sarnie, Nick <[email protected]>

AllanZyne and others added 28 commits June 27, 2024 14:19

[SYCL] Bump boost to 1.85.0 and trim dependencies (intel#14163)

3767874

[SYCL] Allow raw pointers in SYCL vec load and store (intel#13895)

f5178eb

In accordance with KhronosGroup/SYCL-Docs#555 proposal, this commit allows raw pointers in the `load` and `store` member functions on `sycl::vec`. --------- Signed-off-by: Larsen, Steffen <[email protected]>

[SYCL] Remove ESIMD Emulator (intel#13295)

4dca823

[SYCL] Remove old ABI AccessorBaseHost ctors (intel#13386)

787670e

[SYCL] Remove old ABI entry points from detail::stream_impl (intel#13387

a9e7ece

)

[SYCL] Remove unused ctors in AccessorImplHost (intel#13388)

65fb79e

[SYCL] Remove sycl::errc_for (intel#13903)

b429484

It had been removed from the specification in KhronosGroup/SYCL-Docs#431. Originally introduced in intel#4298 the implementation has never been completed, so, while technically a breaking change, no customer code can be really using it.

[ABI-Break][SYCL] Remove collectives in the sub-group class (intel#13665

7193c26

) Re-lands intel#13199 and reverts intel#13464.

[SYCL] Remove unused ctors in buffer_impl (intel#13390)

a14689f

[SYCL] Remove old ABI entry points from memory_manager.[ch]pp (intel#…

c0eab4b

…13385)

[SYCL][E2E] Use callable device selector in `FilterSelector/select_de…

58e60a5

…vice_level_zero.cpp` (intel#14310) Fixing test that was missed in intel#14162

[Driver][SYCL] Restrict --coverage for SYCL device (intel#14312)

c60a12e

When using -fsycl --coverage, we should not enable code coverage for device compilations as code coverage for device is not supported at this time.

[NFC][SYCL] Remove unused var in CGBulitin (intel#13972)

f4220d7

error: unused variable 'DecorateAddressIndex' [-Werror,-Wunused-variable]

[DeviceSanitizer] Support out-of-bounds on private memory (intel#13935)

df0dc3b

UR: oneapi-src/unified-runtime#1676 Instrument "__asan_mem_to_shadow" to convert a private address to its shadow memory address. Other steps are the same as for ASan on stack.

[SYCL] Remove deprecated shuffles from the sub-group class (intel#13666)

c9842c1

Re-lands intel#13236 and reverts intel#13463.

[SYCL][E2E] Fix deprecated warnings in Regression e2e tests (intel#…

b51ebb8

…14168)

[SYCL] Remove deprecated interop interfaces (intel#13306)

0b59341

[SYCL][ESIMD][E2E] Disable atomic_update test on Windows until new dr…

7ce48cf

…iver (intel#14334) A similar test was disabled in intel@672b225, and this test was found to have the same issue. Signed-off-by: Sarnie, Nick <[email protected]>

[SYCL][E2E] Remove subgroup supported checks from e2e tests (intel#14313

deeb664

) Subgroups are core sycl functionality which should be tested on all backends.

Revert "[SYCL][E2E] Disable TaskSequence/* tests on Linux (intel#13080)"

239bdd8

This reverts commit 92f5b98.

Revert "[SYCL][FPGA] Implement task_sequence header, properties, and …

26dc3b8

…add new fpga_cluster kernel property (intel#12453)" This reverts commit 7b9001e.

Empty commit

cb2de31

aejjehint closed this Jun 28, 2024
