Skip to content

Revert task sequence changes #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1,514 commits into from
Closed

Revert task sequence changes #2

wants to merge 1,514 commits into from

Conversation

aejjehint
Copy link
Owner

This reverts PR intel#12453 and intel#13080

EwanC and others added 30 commits June 5, 2024 10:20
…#13987)

Bump UR L0 commit to
oneapi-src/unified-runtime#1694 so that the SYCL
device aspect for supporting update in graphs is correctly reported for
L0 devices. Currently, support can be incorrectly reported.

---------

Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
Running the `Graph/Update` E2E tests on Level Zero with
`UR_L0_LEAKS_DEBUG=1` shows that we are leaking a PI kernel and module.

On investigation this was because we are retaining these objects in
`getOrCreateKernel()` but not releasing them. Added release calls
similar to how it is done in
[enqueueImpCommandBufferKernel](https://github.com/intel/llvm/blob/b49303c7e13ca0a69454eaaaeb8c3d094916218d/sycl/source/detail/scheduler/commands.cpp#L2550)
by the scheduler
This upgrades the docker to use the cuda 12.5 image.

I've ran the test-e2e locally using cuda 12.5 and all is well. cuda 12.5
also fixed an issue introduced by the cuda 12.4 driver: see
intel#13661 (comment)

Signed-off-by: JackAKirk <[email protected]>
…lock_load/slm_block_store APIs accepting simd_view (intel#13978)

Co-authored-by: Nick Sarnie <[email protected]>
… build libdevice with thinLTO (intel#14036)

This is the first change in my work on thinLTO for SYCL.

---------

Signed-off-by: Sarnie, Nick <[email protected]>
Scheduled igc dev drivers uplift

Co-authored-by: GitHub Actions <[email protected]>
…mbiguity (intel#14018)

This change avoids the ambiguity between the deprecated
`sycl::ext::oneapi::sub_group` and `sycl::sub_group` when both
namespaces are used. This fixes a failure on windows for cuda.

---------

Signed-off-by: JackAKirk <[email protected]>
Supported matrix dimensions are queried from the device, and inform the
tests which tile sizes one can use.

This is a subset of all tests that are planned to be modified.

Test manually tested on PVC and SPR

---------

Co-authored-by: Yury Plyakhin <[email protected]>
Supported matrix dimensions are queried from the device, and inform the
tests which tile sizes one can use.

This is a subset of all tests that are planned to be modified.

Test manually tested on PVC and SPR - no new regresssions

The following tests have been marked as XFAIL on all platforms. I
removed them from XMX8 folder. Once they are passing then they can be
modified to query the supported matrix dimensions form the device.
* joint_matrix_colA_rowB_colC.cpp
* joint_matrix_out_bounds.cpp
* joint_matrix_unaligned_k.cpp
By some reason, we used to only emit unused member functions if they
are explicitly annotated with `sycl_device` attribute (through
`SYCL_EXTERNAL` macro).

This logic was introduced in 3baec18
and there is no clear indication as to why exactly we have a check that
the attribute is explicit.

SYCL extension for virtual functions introduces an alternative markup
for specifying which function and that markup is SYCL compile-time
properties that we turn into attributes implicitly under the hood.

Essentially, we now have a situation where an implicit `sycl_device`
attribute on a member function should be treated as an explicit one,
because it could be a result of SYCL compile-time property being applied
to that method.

Considering our current codebase, it seems like we intend to
have member function to be emitted in all cases where 
`sycl_device` is being implicitly added and therefore this patch removes
the requirement for the attribute to be explicit.
This PR adds functionalities for:
* Listing devices in stdout
* Filtering devices

Tests and docs updated accordingly.

---------

Signed-off-by: Alberto Cabrera <[email protected]>
Co-authored-by: Joe Todd <[email protected]>
…intel#14015)

This PR adds a `wait_and_free` func. This makes it safer and less likely
to release memory during or before it is used by enqueued commands.

`async_free` is renamed `enqueue_free`, to make its behaviour clearer

This PR updates the comments and tests accordingly
…d to run on Windows (intel#13957)

[Windows doesn't support
cudaMemPrefetchAsync()](bitsandbytes-foundation/bitsandbytes#453)
which is used in the call to `prefetch` in the test.

[urEnqueueUSMPrefetch](https://github.com/oneapi-src/unified-runtime/blob/c0c607c3a88933b4c5c20a0aca4539781c678411/source/adapters/cuda/enqueue.cpp#L1629)
is also commented with a note for not having the support for CUDA on
Windows.
temp fix for problems from cuda 12.5 uplift that were caused by
intel#14049. Should fix
intel#14071

---------

Signed-off-by: JackAKirk <[email protected]>
…e exception message (intel#14055)

- C++ thrown exception message not shown when running from Windows
terminal.
- The patch fixes
[cuda-max-local-mem-size.cpp](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Plugin/cuda-max-local-mem-size.cpp)
test failure.
… GPUs through OpenCL. (intel#14072)

Extend the `sycl-ls-gpu-default.cpp` test to cover the support of Intel
GPUs through OpenCL.
The patch fixes the failure when running the test on a system with Intel
and CUDA gpus.
Fixes ansi-alias violation and reads from uninitialized buffers. Fixes
intel#13790.
…UDA (intel#14058)

Fails in Nightly testing on the self-hosted CUDA runner:
intel#12995.
…date ACC API accepting simd_view (intel#14065)

Co-authored-by: Nick Sarnie <[email protected]>
…ntel#14088)

CODEOWNERS seems to be missing a line attributing
`sycl/test/check_device_code/matrix` tests to
intel/sycl-matrix-reviewers (As per [this
discussion](intel#14063 (comment))).
This PR remedies this.

Although, I noticed the current CODEOWNERS section for the matrix
reviewers uses paths; Let me know if I should use `sycl/**/matrix`
instead.
AllanZyne and others added 28 commits June 27, 2024 14:19
…address) (intel#13948)

UR: oneapi-src/unified-runtime#1677

In kernel, we save at most `ASAN_MAX_NUM_REPORTS` (default: 10) number
of SanitizerReport.
We select the index of SanitizerReport by `WG_LINEAR_ID %
ASAN_MAX_NUM_REPORTS`.

When `-fsanitize-recover=address` is passed in compiler flag,
`asan_loadX_noabort`/`asan_storeX_noabort` is used, we use `is_recover =
true` flag to distinguish this case.
If `is_recover` is true, the UR will print out all the error reports and
continue (use at your own risk).
If `is_recover` is false (default case), the UR will print out only one
error report and exit.

---------

Co-authored-by: Callum Fare <[email protected]>
This patch fixes errors like "call to 'XXX' is ambiguous" and also
disables all TaskSequence tests on Windows because they fail in the same
way as on Linux. Github issue was created against that and comment in
tests was updated.
In accordance with KhronosGroup/SYCL-Docs#555
proposal, this commit allows raw pointers in the `load` and `store`
member functions on `sycl::vec`.

---------

Signed-off-by: Larsen, Steffen <[email protected]>
It had been removed from the specification in
KhronosGroup/SYCL-Docs#431. Originally
introduced in intel#4298 the implementation
has never been completed, so, while technically a breaking change, no
customer code can be really using it.
…SYCL offloading under an option (intel#14262)

We have added an experimental option (off by default):
-fsycl-use-spirv-backend-for-spirv-gen

This will cause device compilation pipeline in SYCL offloading to invoke
llc (with SPIR-V target) instead of llvm-spirv to perform SPIR-V code
generation.
This change will result in the SPIR-V backend being built by default.
Also, if spirv-val is available, a sanity check of the generated SPIR-V
code will be done and any issues will be reported as warnings to the
user.

Thanks

---------

Signed-off-by: Arvind Sudarsanam <[email protected]>
…l#14175)

In the same vein as intel#14174, this PR moves test cases that do not check
device code outside of the `check_device_code` folder. This is a
separate PR as intel#14174 because the code is technically owned by another
team.
When using -fsycl --coverage, we should not enable code coverage for
device compilations as code coverage for device is not supported at this
time.
error: unused variable 'DecorateAddressIndex'
[-Werror,-Wunused-variable]
Due to some confusion about the output from the in_order_profiling_queue
test on L0, the test was disabled. However, the test can be safely
reenabled for that target, while keeping it disabled for FPGA.

Additionally, the failure in profiling_queue is believed to be due to
the same issue, so the JIRA has been added to it and the note in
in_order_profiling_queue has been updated to reflect the known
information about the failure.

Signed-off-by: Larsen, Steffen <[email protected]>
UR: oneapi-src/unified-runtime#1676

Instrument "__asan_mem_to_shadow" to convert private address to its
shadow memory address
Other steps are same with ASan on stack.
…n into IR-based analysis (intel#14220)

Based on feedback from intel#14197, I
seperated out the code that generates the module properties and symbol
table into separate functions that can be called by anyone, and just
looks at the IR and entry points.

For now, we still call it inside `sycl-post-link` because we still
support the old offloading model, but once we drop support for that we
can drop this responsibility from sycl-post-link and only compute it
inside `clang-linker-wrapper`, both for normal compilation and thinLTO.

In a (hopefully soon) future PR I plan to call these functions from
`clang-linker-wrapper` when compiling for thinLTO, which we need because
we will split early.

Most of this change should be NFC(I). The expected changes are:

1) New option to sycl-post-link to generate the properties file
2) Driver change to NOT pass the option from 1) in thinLTO mode
3) Two minor chages in logic from properties generation, I've called
these out inline.

---------

Signed-off-by: Sarnie, Nick <[email protected]>
…iver (intel#14334)

A similar test was disabled in
intel@672b225,
and this test was found to have the same issue.

Signed-off-by: Sarnie, Nick <[email protected]>
)

Subgroups are core sycl functionality which should be tested on all
backends.
…add new fpga_cluster kernel property (intel#12453)"

This reverts commit 7b9001e.
aejjehint pushed a commit that referenced this pull request Jun 28, 2024
… (#92855)

This solves some ambuguity introduced in P0522 regarding how template
template parameters are partially ordered, and should reduce the
negative impact of enabling `-frelaxed-template-template-args` by
default.

When performing template argument deduction, we extend the provisional
wording introduced in llvm/llvm-project#89807 so
it also covers deduction of class templates.

Given the following example:
```C++
template <class T1, class T2 = float> struct A;
template <class T3> struct B;

template <template <class T4> class TT1, class T5> struct B<TT1<T5>>;   // #1
template <class T6, class T7>                      struct B<A<T6, T7>>; // #2

template struct B<A<int>>;
```
Prior to P0522, `#2` was picked. Afterwards, this became ambiguous. This
patch restores the pre-P0522 behavior, `#2` is picked again.

This has the beneficial side effect of making the following code valid:
```C++
template<class T, class U> struct A {};
A<int, float> v;
template<template<class> class TT> void f(TT<int>);

// OK: TT picks 'float' as the default argument for the second parameter.
void g() { f(v); }
```

---

Since this changes provisional implementation of CWG2398 which has not
been released yet, and already contains a changelog entry, we don't
provide a changelog entry here.
@aejjehint aejjehint closed this Jun 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.