[SYCL][ESIMD] Intel "Explicit SIMD" extension documentation draft. #1731
Conversation
Contributors (alphabetic order): Konst Bobrovsky, Peter Caday, Gang Chen, Kai Yu Chen, Ken Lueh, Wei Pan, Xinmin Tian

Signed-off-by: Konstantin S Bobrovsky <[email protected]>
See auto-generated documentation for the complete list of APIs here. (TBD)

#### USM pointer-based memory access
##### Flat-address gather/scatter
This comment shouldn't block this PR, but I'd like us to think more about what we can do to make these interfaces more user-friendly.
Two things stand out:
- A user providing cache hints has to provide a lot more template arguments than required. Could we make this nicer by providing the hints as tag-type arguments?

  ```cpp
  // This works
  float* p;
  simd<uint32_t, 16> offsets;
  auto result = flat_load(p, offsets);

  // Adding cache hints makes it much more complex
  float* p;
  simd<uint32_t, 16> offsets;
  auto result = flat_load<uint32_t, 16, 1, CacheHint::Foo, CacheHint::Bar>(p, offsets);
  ```
- I find the existing names quite confusing. I'd much prefer that the names were more obvious (e.g. `flat_load` could be called `gather` if that's all it does) and/or used overloading where appropriate, to minimize the number of times a reader of the code has to check documentation to figure out what a function is supposed to be doing.
Makes sense to me. @kychendev, please comment.
I'm open to changing the name to make it more user friendly. The cache hint settings are mostly based on the existing implementation. Please check with Gang, who added these memory access APIs.
I do not see why gather/scatter is better naming than load/store. Our current naming scheme has its own consistency. I also do not see why a tagged argument is any better than a template argument.
The more explicit and obvious a name is, the better. Anybody reading a function called "gather" (or "gather load") knows that the function performs a gather operation. I wouldn't know that a "flat load" was a gather without reading the documentation.
A tagged argument is better in this case because it allows more template arguments to be deduced. Expanding on my previous example:
```cpp
// Without cache hints, type and length can be deduced from offsets
float* p;
simd<uint32_t, 16> offsets;
auto result = flat_load(p, offsets);

// With cache hints as templates, verbosity increases significantly:
// - Providing any cache hint forces the user to specify the type and length
float* p;
simd<uint32_t, 16> offsets;
auto result = flat_load<uint32_t, 16, 1, CacheHint::Foo, CacheHint::Bar>(p, offsets);

// With cache hints as tag types, verbosity is reduced:
// - Providing a cache hint does not prevent deduction of type and length
float* p;
simd<uint32_t, 16> offsets;
auto result = flat_load(p, offsets, CacheHint::Foo{});
```
Note also that the templated form prevents a developer from specifying an L3 hint without also explicitly specifying an L1 hint. If `flat_load` accepted a list of hints, it might be possible to specify the hints in any order, and it may be more extensible to future cache hints:

```cpp
auto result = flat_load(p, offsets, l3_cache_hint_foo, ...);
```
@cmc-rep are you being facetious about reading documentation? Descriptive names and interfaces are very important for readable/maintainable/accessible code.
'flat' addresses are only meaningful on Gen, and the convention that load == gather is obviously a Gen-specific notion too, since we keep having to say 'oh this is a gather' over and over just in this conversation!
Maybe it would be best to change the name of this extension back to DPC++-CM?
Reading documentation is very dangerous when designing/writing write-only programs... :-)
Also, another area where it needlessly differs from `std::simd`: non-gather load/store is called `copy_from` and `copy_to`. Why invent new names just to confuse people?
```cpp
// Without cache hints, type and length can be deduced from offsets
float* p;
simd<uint32_t, 16> offsets;
auto result = flat_load(p, offsets);
auto result = flat_load<uint32_t, 16, 1, CacheHint::Foo, CacheHint::Bar>(p, offsets);
```
An approach à la https://github.com/chriskohlhoff/propria from @chriskohlhoff would be to add a property to the pointer, such as

```cpp
auto result = flat_load(p, offsets);
auto result = flat_load(decorate<CacheHint::Foo, CacheHint::Bar>(p), offsets);
```

The advantage is that you do not have to change all your API, and all the uses of this decorated pointer will benefit from it. `decorate` is to be bikeshedded accordingly.
added TODO to the headers
CacheHint L1H = CacheHint::Default,
CacheHint L3H = CacheHint::Default>
typename std::enable_if<__check_atomic<Op, T, n, 0>(), simd<T, n>>::type
flat_atomic(T *p, simd<unsigned, n> offset, simd<ushort, n> pred);
We should look into what can be done to simplify these atomic functions and align their design with the other new atomic features. That is perhaps out of scope for this PR (the direction is less clear than for the `reduce` changes, for example), but we should open an issue to track it.
@Pennycook , can you please open the issue then?
I'll open it as soon as this is merged.
thanks
I didn't see this discussed in the extension documentation. I think you should add some discussion of it there, and I'd recommend changing it to
Co-authored-by: John Pennycook <[email protected]> Co-authored-by: rolandschulz <[email protected]>
Signed-off-by: Konstantin S Bobrovsky <[email protected]>
};

template <int __N>
using __mask_type_t = typename __vector_type<uint16_t, __N>::type;
The mask should also be a wrapper around the clang vector type rather than the clang vector type itself.
The internal storage should be implementation-defined. `uint16_t` is a bad choice for some HW. Nor is it how clang vector types work (they use an integer of the same size as the corresponding element type used for comparison, e.g. `long` for `double` and `int` for `float`).
I added a TODO
Signed-off-by: Konstantin S Bobrovsky <[email protected]>
Signed-off-by: Konstantin S Bobrovsky <[email protected]>
CacheHint L3H = CacheHint::Default>
typename std::enable_if<(n == 16 || n == 32),
                        simd<T, n * __NumChannels(Mask)>>::type
flat_load4(T *p, simd<uint32_t, n> offsets, simd<uint16_t, n> pred = 1);
Can this be parameterized? Why is 4 magic?
4 maps to a Gen hardware memory access instruction (message).
I understand, but is that what it is going to be like for all future hardware? Would it be wise to parameterize this so that we don't need new function names like `flat_loadX` at some point?
I'm not sure. For now there are only 1 and 4, and people would have to use a more verbose templated syntax anticipating other widths, and it is not clear if/when those will happen.
Can't this be deduced?
Probably. Do you have an example?
```cpp
#include <iostream>
#include <CL/sycl.hpp>
#include <sycl_[[sycl_explicit_simd]].hpp>
```
This looks like a find+replace error.
thanks
@rolandschulz , please take a look at the changes @kychendev pushed recently to address your comments.
other devices will result in an error.

Kernels and `SYCL_EXTERNAL` functions using ESP must be explicitly marked with
the `[[sycl_explicit_simd]]` attribute. Subgroup size query within such
I don't want to bikeshed on this, but just a note: I think `[[sycl_explicit_simd]]` is better than a macro (for all the reasons @rolandschulz pointed out), but the spelling here is different from attributes like `[[intel::reqd_work_group_size]]`.
We should decide what rules we're following for attributes (if any). I'll open an issue for this as well.
I think we should also update all the examples to align with this proposal: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/KernelRHSAttributes/SYCL_INTEL_attribute_style.asciidoc
Done. (@bader - I assume alignment means a function type attribute placed before the definition instead of a function attribute placed before the declaration.)
This one is implemented in #1743
That seems useful for this SIMD architecture. Since it is also specific to some instructions, perhaps you could add some links to public presentations/documentation to help understanding?
@keryell, @Ruyk, there is only documentation for the prior incarnation called "C-for-Metal". Some links: https://01.org/c-for-metal-development-package/downloads/cm-programming-reference-1 @kychendev, please add if you are aware of anything else.
Co-authored-by: John Pennycook <[email protected]>
@Pennycook, I think @kychendev updated the reduction APIs - please take a look.
Signed-off-by: Konstantin S Bobrovsky <[email protected]>
Looks great, thanks!
Here are some future directions in which this API is intended to evolve:
- enabling this extension for other architectures, such as x86, by extracting
  and clearly marking the generic and target-dependent API portions
- aligning with `std::simd` and maybe providing a `std::simd` implementation atop
Why would we want to implement `std::simd` on top of `sycl::intel::gpu::simd` rather than directly implementing the former on top of clang vectors?
To avoid re-implementing lots of stuff and to take advantage of `sycl::intel::gpu::simd` efficiency.
Implementing it on top of `gpu::simd` is the strategy which requires reimplementing the `std::simd` features missing from `gpu::simd`. On the other hand, there is an existing implementation directly on top of gcc/clang vectors: https://github.com/VcDevel/std-simd . We should explore how we can reuse most of that implementation for the GPU as well.
This is a pretty fundamental design question; I'm sorry if I am missing context, but I only found out about your work today. First of all, 👍 thanks for working on a SIMD type for GPUs. I have wanted to do this for years but didn't have the manpower/time.
What I'd like to know is why you can't (or don't want to) take the `std::experimental::simd` specification as is. I'd be very happy to help, especially with folding improvements and extensions back into C++ committee papers targeting C++23. If the `simd<T, ?>` type is 1:1 portable between CPU and GPU, that would be a huge win in terms of software maintenance and freedom to move to newer platforms.
The short answer is that the primary target for this specification is Intel GPU, which is reflected in the namespace. But generalizing the spec (and implementation) to fit more devices and be more compatible with std C++ is definitely welcome, as long as it does not hurt GPU performance. The `simd<T, ?>` type is portable to CPU, but the performance of various associated APIs is another question. Our plan is to upstream the existing initial implementation, then iteratively improve it along with the specification, making sure it remains functional and performant on Intel GPU.
> The short answer is that the primary target for this specification is Intel GPU, which is reflected in the namespace.

Right. I think it totally makes sense to explore what would be the perfect API for programming a specific target. It makes less sense to write applications with it, because portability and being able to move to newer/other hardware is an important reason for using SIMD types in the first place.
Also, "Intel GPU" is a narrow target right now. But do you know whether it will stay that way?

> But generalizing the spec (and implementation) to fit more devices and be more compatible with std C++ is definitely welcome, as long as it does not hurt GPU performance.

👍 A SIMD type that doesn't allow you to reach full performance is a failure (or hopefully just buggy and needs to be fixed). Performance is the reason why we use it.
I guess what I'd like to discuss is: if you deviate from the Parallelism TS 2, why? Is it to simplify your implementation, is it because of missing functionality, is it because of performance, or something else? This is important feedback to the C++ committee when merging the TS into the IS is discussed.
To help you understand our points of view, I recommend you take a look at the documents and examples at https://github.com/intel/cm-compiler
Be sure that all the interesting links which were mentioned in the discussions end up in the extension description itself, so they are not lost for the future.
Added these comments to the bottom of the doc as TODO
The `sycl::intel::gpu::simd` class is a vector templated on some element type. The element type must be a vectorizable type. The set of vectorizable types is the set of fundamental SYCL arithmetic types (C++ arithmetic types or the `half` type) excluding `bool`. The length of the vector is the second template parameter.

Each simd class object is mapped to a consecutive block of general register
This isn't guaranteed. And any stack variable is mapped to a register unless spilled. I'm not sure what you are trying to say with this sentence.
reworded
// Constructors.
simd_view(BaseTy &Base, RegionTy Region);
simd_view(BaseTy &&Base, RegionTy Region);
simd_view(simd_view &Other);
Is this intentionally not a correct copy constructor (it would need to be `const` for that)?
I believe we agreed that `simd_view` would have a deleted copy and move constructor. Why are they suddenly back?
added TODO to the headers
using element_type = typename ShapeTy::element_type;

// Constructors.
simd_view(BaseTy &Base, RegionTy Region);
Why are this and the next constructor public? Those should only be called internally by e.g. `select`, correct?
added TODO to the headers
Co-authored-by: John Pennycook <[email protected]>
auto Acc1 = Buf1.get_access<cl::sycl::access::mode::read>(Cgh);
auto Acc2 = Buf2.get_access<cl::sycl::access::mode::read_write>(Cgh);

Cgh.single_task<class KernelID>([=] () [[intel::sycl_explicit_simd]] {
I do not remember whether we can remove the `()` when we have an attribute...
sycl::intel::gpu::simd<float, VL> va = sycl::intel::gpu::flat_block_load<float, VL>(A + offset);
sycl::intel::gpu::simd<float, VL> vb = sycl::intel::gpu::flat_block_load<float, VL>(B + offset);
sycl::intel::gpu::simd<float, VL> vc = va + vb;
sycl::intel::gpu::flat_block_store<float, VL>(C + offset, vc);
Is it not possible to infer `float` and `VL`?
Compiler complains that it can't determine vector element type
Definitely useful for SIMD architectures and... FPGA! :-) @aisoard @javier-cabezas
@Ruyk , @keryell , @mattkretz , @iburyl , @rolandschulz , and other reviewers - apologies for the delay. This is an initial experimental version to try out functionality and performance. We are going to work with the community on incrementally improving the APIs to bring them closer to std C++ and SYCL in the next several months. At this point we would like to
Then review can be resumed, and we can keep adding comments to this document and/or the sources.
Will approve after @rolandschulz does.
Signed-off-by: Konstantin S Bobrovsky <[email protected]>
- Collected TODOs and:
  * addressed some of them
  * summarized some of those at the end of this document
  * added others directly to the source code
- Removed API definitions from the spec, as they are now part of the source base

Signed-off-by: Konstantin S Bobrovsky <[email protected]>
@rolandschulz, @Pennycook, @keryell, @Ruyk, @iburyl, @mattkretz, @jasonsewall-intel - sorry for the delay. I updated the spec, please take another look. Specifically:
Thanks for adding the TODOs, @kbobrovs. This LGTM.
Thanks, looks good so far, and looking forward to see those TODO's acted upon later on!
All remaining issues will be resolved later
This PR starts a series of PRs to integrate the DPC++ "Explicit SIMD" extension for efficient Intel GPU programming. This PR provides an overall description of the extension.
Contributors (alphabetic order):
Konst Bobrovsky
Peter Caday
Gang Chen
Kai Yu Chen
Ken Lueh
Wei Pan
Xinmin Tian
The upcoming changes will be spread across a number of components:
- `-fsycl-explicit-simd` option added to tweak FE behavior slightly compared to normal SYCL. In the future it is expected that no such distinction will be necessary, and the need for the extra `-fsycl-explicit-simd` option will go away.
- `__SYCL_EXPLICIT_SIMD__` macro is set (by the device FE only).
- This is the largest piece, implementing the essence of the ESIMD extension. It provides APIs to write device code in a close-to-metal manner.
More description of changes in each component will come with respective PR.