Skip to content

[SYCL][Doc] Add draft LocalMemory extension #2338

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 19, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions sycl/doc/extensions/LocalMemory/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# SYCL_INTEL_local_memory

A free function enabling the declaration of local memory objects at kernel function scope.
228 changes: 228 additions & 0 deletions sycl/doc/extensions/LocalMemory/SYCL_INTEL_local_memory.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,228 @@
= SYCL_INTEL_local_memory

:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en

:blank: pass:[ +]

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}

// This is necessary for asciidoc, but not for asciidoctor
:cpp: C++

== Introduction
IMPORTANT: This specification is a draft.

NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are
trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc.
used by permission by Khronos.

NOTE: This document is better viewed when rendered as html with asciidoctor.
GitHub does not render image icons.

This document describes an extension enabling the declaration of local memory
objects at the kernel functor scope.

== Name Strings

+SYCL_INTEL_local_memory+

== Notice

Copyright (c) 2020 Intel Corporation. All rights reserved.

== Status

Working Draft

This is a preview extension specification, intended to provide early access to
a feature for review and community feedback. When the feature matures, this
specification may be released as a formal extension.

Because the interfaces defined by this specification are not final and are
subject to change they are not intended to be used by shipping software
products.

== Version

Built On: {docdate} +
Revision: 1

== Contact

John Pennycook, Intel (john 'dot' pennycook 'at' intel 'dot' com)
Roland Schulz, Intel (roland 'dot' schulz 'at' intel 'dot' com)

== Contributors

Felipe de Azevedo Piovezan, Intel
Michael Kinsner, Intel

== Dependencies

This extension is written against the SYCL 1.2.1 specification, revision 6.

== Overview

OpenCL provides two ways for local memory to be used in a kernel:

* The kernel accepts a pointer in the `local` address space as an argument,
and the host passes the size of the allocation to the OpenCL runtime when
the kernel is launched.
* The kernel declares `local` variables in the kernel function
scope.

In SYCL, programmers have two choices:

* Local accessors created by the host, analogous to the OpenCL kernel argument
mechanism.
* Variables declared at the kernel functor scope, in hierarchical parallelism
kernels.

Note that SYCL currently lags behind OpenCL when it comes to local memory
allocations; in particular, work-group data parallel SYCL kernels are limited
to the accessor method. This is undesirable for some architectures, where
allocating local memory with a compile-time known size is required for
performance.

This limitation is also undesirable from a usability point of view, since
programmers have to declare an accessor *outside* a kernel and capture it
inside the kernel functor.

This extension introduces a concept of group-local memory, with semantics
similar to OpenCL kernel-scope `local` variables and C++ `thread_local`
variables.

== Modifications of SYCL 1.2.1 Specification

=== Modify sentence in Section 3.5.2.1 Access to memory

==== From:

To allocate local memory within a kernel, the user can either pass a
`cl::sycl::local_accessor` object to the kernel as a parameter, or can define a
variable in work-group scope inside `cl::sycl::parallel_for_work_group`.

==== To:

To allocate local memory within a kernel, the user can:

* Pass a `cl::sycl::local_accessor` object to the kernel as a parameter.
* Define a variable in work-group scope inside `cl::sycl::parallel_for_work_group`.
* Define a group-local variable at the kernel functor scope of a work-group
data parallel kernel using the `group_local_memory` or
`group_local_memory_for_overwrite` functions.

[_Note_ - The restriction that group-local variables must be defined at kernel
functor scope may be lifted in a future version of this extension.]

==== Extend Section 4.8.5.2

==== Include paragraphs:

The `nd_range` variant of `parallel_for` also enables the declaration of
group-local variables; those variables are allocated in the an address space
accessible by all work-items in the group and are shared by all work-items of a
work-group.

[source,c++]
----
myQueue.submit([&](handler &cgh) {
cgh.parallel_for<class example_kernel>(
nd_range<1>(range<1>(128), range<1>(32)), [=](nd_item<1> item) {
multi_ptr<int[64], access::address_space::local_space> ptr = group_local_memory<int[64]>(item.get_group());
auto& ref = *ptr;
ref[2 * item.get_local_linear_id()] = 42;
});
});
----

The example above creates a kernel with four work-groups, each containing 32
work-items. An `int[64]` object is defined as a group-local variable, and
each work-item in the work-group obtains a `multi_ptr` to the same variable.

There are two interfaces for defining group-local variables:

[source,c++]
----
namespace sycl {

template <typename T, typename Group, typename... Args>
multi_ptr<T, Group::address_space> group_local_memory(Group g, Args&&... args);

template <typename T, typename Group>
multi_ptr<T, Group::address_space> group_local_memory_for_overwrite(Group g);

} // namespace sycl
----

==== Add table: Functions for defining group-local variables

[frame="topbot",options="header,footer"]
|======================
|Functions |Description

|`template <typename T, typename Group, typename ... Args>
multi_ptr<T, Group::address_space> group_local_memory(Group g, Args&&... args)` |
Constructs an object of type `T` in an address space accessible by all
work-items in group _g_, forwarding _args_ to the constructor's parameter list.
The constructor is called once per group, upon or before the first call to
`group_local_memory`. The storage for the object is allocated upon or before
the first call to `group_local_memory`, and deallocated when all work-items in
the group have completed execution of the kernel.

All arguments in _args_ must be the same for all work-items in the group.

`Group` must be `sycl::group`, and `T` must be trivially destructible.
[_Note_ - These restrictions may be lifted in a future version of this
extension.]

|`template <typename T, typename Group>
multi_ptr<T, Group::address_space> group_local_memory_for_overwrite(Group g)` |
Constructs an object of type `T` in an address space accessible by all
work-items in group _g_, using default initialization. The object is
initialized pon or before the first call to `group_local_memory`. The storage
for the object is allocated upon or before the first call to
`group_local_memory`, and deallocated when all work-items in the group have
completed execution of the kernel.

All arguments in _args_ must be the same for all work-items in the group.

`Group` must be `sycl::group`, and `T` must be trivially destructible.
[_Note_ - These restrictions may be lifted in a future version of this
extension.]

|======================

== Issues

None.

== Revision History

[cols="5,15,15,70"]
[grid="rows"]
[options="header"]
|========================================
|Rev|Date|Author|Changes
|1|2020-08-18|John Pennycook|*Initial public working draft*
|========================================

//************************************************************************
//Other formatting suggestions:
//
//* Use *bold* text for host APIs, or [source] syntax highlighting.
//* Use +mono+ text for device APIs, or [source] syntax highlighting.
//* Use +mono+ text for extension names, types, or enum values.
//* Use _italics_ for parameters.
//************************************************************************
1 change: 1 addition & 0 deletions sycl/doc/extensions/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ DPC++ extensions status:
| [SYCL_INTEL_reqd_work_group_size](ReqdWorkGroupSize/SYCL_INTEL_reqd_work_group_size.asciidoc) | Supported(OpenCL: CPU, GPU) | |
| [SPV_INTEL_function_pointers](SPIRV/SPV_INTEL_function_pointers.asciidoc) | Supported(OpenCL: CPU, GPU; HOST) | |
| [SPV_INTEL_inline_assembly](SPIRV/SPV_INTEL_inline_assembly.asciidoc) | Supported(OpenCL: GPU) | |
| [SYCL_INTEL_local_memory](LocalMemory/SYCL_INTEL_local_memory.asciidoc) | Proposal | |
| [SYCL_INTEL_static_local_memory_query](StaticLocalMemoryQuery/SYCL_INTEL_static_local_memory_query.asciidoc) | Proposal | |
| [SYCL_INTEL_sub_group_algorithms](SubGroupAlgorithms/SYCL_INTEL_sub_group_algorithms.asciidoc) | Partially supported(OpenCL: CPU, GPU) | Features from SYCL_INTEL_group_algorithms extended to sub-groups |
| [Sub-groups for NDRange Parallelism](SubGroupNDRange/SubGroupNDRange.md) | Deprecated(OpenCL: CPU, GPU) | |
Expand Down