Skip to content

[SYCL] [FPGA] Update the experimental latency control API to use property list #5993

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Apr 28, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
150 changes: 87 additions & 63 deletions sycl/doc/extensions/supported/sycl_ext_intel_dataflow_pipes.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -110,9 +110,9 @@ A pipe type is a specialization of the pipe class:

[source,c++,Pipe type def,linenums]
----
template <typename name,
typename dataT,
size_t min_capacity = 0>
template <typename Name,
typename DataT,
size_t MinCapacity = 0>
class pipe;
----

Expand All @@ -131,7 +131,7 @@ using pipe<class bar, float, 5>;

The interface of a pipe is through static member functions, and instances of a pipe class cannot be instantiated. Allowing instances of pipe objects, when their type defines connectivity, would introduce an error prone secondary mechanism of reference.

The first template parameter, `name`, can be any type, and is typically expected to be a user defined class in a user namespace. The type only needs to be forward declared, and not defined.
The first template parameter, `Name`, can be any type, and is typically expected to be a user defined class in a user namespace. The type only needs to be forward declared, and not defined.

Above this basic mechanism of {cpp} type being used to identify a pipe, additional layers can be built on top to contain the type in an object that can be passed by value. Because such mechanisms (e.g. `boost::hana::type`) can layer on top of the fundamental type-based mechanism described here, those mechanisms are not included in the extension specification.

Expand Down Expand Up @@ -166,37 +166,37 @@ myQueue.submit([&](handler& cgh) {

== Read/write member functions, and pipe template parameters

The pipe class exposes static member functions for writing a data word to a pipe, and for reading a data word from a pipe. A data word in this context is the data type that the pipe contains (`dataT` pipe template argument).
The pipe class exposes static member functions for writing a data word to a pipe, and for reading a data word from a pipe. A data word in this context is the data type that the pipe contains (`DataT` pipe template argument).

Blocking and non-blocking forms of the read and write members are defined, with the form chosen based on overload resolution.

[source,c++,Read write members,linenums]
----
template <typename name,
typename dataT,
size_t min_capacity = 0>
template <typename Name,
typename DataT,
size_t MinCapacity = 0>
class pipe {
// Blocking
static dataT read();
static void write( const dataT &data );
static DataT read();
static void write( const DataT &Data );

// Non-blocking
static dataT read( bool &success_code );
static void write( const dataT &data, bool &success_code );
static DataT read( bool &Success );
static void write( const DataT &Data, bool &Success );

// Static members
using value_type = dataT;
size_t min_capacity = min_capacity;
using value_type = DataT;
size_t min_capacity = MinCapacity;
}
----

The read and write member functions may be invoked within device code, or within a SYCL host program. Some interfaces may not be available on all devices/implementations, but the pipe definition itself does not gate availability. Instead, implementations should error if an unsupported pipe is used. See section <<device_queries>> for information on querying the availability of specific pipe features relative to a device.

The template parameters of the device type are defined as:

* `name`: Type that is the basis of pipe identification. Typically a user-defined class, in a user namespace. Forward declaration of the type is sufficient, and the type does not need to be defined.
* `dataT`: The type of data word/packet contained within a pipe. This is the data type that is read during a successful `pipe::read` operation, or written during a successful `pipe::write` operation. The type must be standard layout and trivially copyable. This template parameter can be queried by using the `value_type` type alias.
* `min_capacity`: User defined minimum number of words in units of `dataT` that the pipe must be able to store without any being read out. A minimum capacity is required in some algorithms to avoid deadlock, or for performance tuning. An implementation can include more capacity than this parameter, but not less. This template parameter can be queried by using the `min_capacity` static member.
* `Name`: Type that is the basis of pipe identification. Typically a user-defined class, in a user namespace. Forward declaration of the type is sufficient, and the type does not need to be defined.
* `DataT`: The type of data word/packet contained within a pipe. This is the data type that is read during a successful `pipe::read` operation, or written during a successful `pipe::write` operation. The type must be standard layout and trivially copyable. This template parameter can be queried by using the `value_type` type alias.
* `MinCapacity`: User defined minimum number of words in units of `DataT` that the pipe must be able to store without any being read out. A minimum capacity is required in some algorithms to avoid deadlock, or for performance tuning. An implementation can include more capacity than this parameter, but not less. This template parameter can be queried by using the `min_capacity` static member.

== Pipe types and {cpp} scope

Expand Down Expand Up @@ -260,12 +260,12 @@ Pipes expose two additional static member functions that are available within ho

[source,c++,Read write members,linenums]
----
template <typename name,
typename dataT,
size_t min_capacity = 0>
template <typename Name,
typename DataT,
size_t MinCapacity = 0>
class pipe {
template <pipe_property::writeable host_writeable>
static dataT* map(size_t requested_size, size_t &mapped_size);
static DataT* map(size_t requested_size, size_t &mapped_size);

static size_t unmap(T* mapped_ptr, size_t size_to_unmap);
}
Expand All @@ -284,11 +284,11 @@ The APIs are defined as:
|Function |Description
|`template <pipe_property::writeable host_writeable> +
dataT* map(size_t requested_size, size_t &mapped_size);`
|Returns a _dataT *_ in the host address space. The host can write data to this pointer for reading by a device pipe endpoint, if it was created with template parameter `host_writeable = true`. Alternatively, the host can read data from this pointer if it was created with template parameter `host_writeable = false`.
|Returns a DataT *_ in the host address space. The host can write data to this pointer for reading by a device pipe endpoint, if it was created with template parameter `host_writeable = true`. Alternatively, the host can read data from this pointer if it was created with template parameter `host_writeable = false`.

The value returned in the mapped_size argument specifies the maximum number of bytes that the host can access. The value specified by _mapped_size_ is less than or equal to the value of the _requested_size_ argument that the caller specifies. _mapped_size_ does not impact the _min_capacity_ property of the pipe.

After writing to or reading from the returned _dataT *_, the host must execute one or more `unmap` calls on the same pipe, to signal to the runtime that data is ready for transfer to the device (on a write), and that the runtime can reclaim the memory for reuse (on a read or write). If `map` is called on a pipe before `unmap` has been used to unmap all memory mapped by a previous `map` operation, the buffer returned by the second `map` call will not overlap with that returned by the first call.
After writing to or reading from the returned DataT *_, the host must execute one or more `unmap` calls on the same pipe, to signal to the runtime that data is ready for transfer to the device (on a write), and that the runtime can reclaim the memory for reuse (on a read or write). If `map` is called on a pipe before `unmap` has been used to unmap all memory mapped by a previous `map` operation, the buffer returned by the second `map` call will not overlap with that returned by the first call.

|`static size_t unmap(T* mapped_ptr, size_t size_to_unmap);`
|Signals to the runtime that the host is no longer using _size_to_unmap_ bytes of the host allocation that was returned previously from a call to `map`. In the case of a writeable host pipe, calling `unmap` allows the unmapped data to become available to the kernel. If the _size_to_unmap_ value is smaller than the _mapped_size_ value specified to `map`, then multiple `unmap` function calls are necessary to unmap the full capacity of the host allocation. It is legal to perform multiple `unmap` function calls to unmap successive bytes in the buffer returned by `map`, up to the _mapped_size_ value defined in the `map` operation.
Expand Down Expand Up @@ -382,22 +382,22 @@ The pipe class described above exposes both read and write static member functio

[source,c++,iopipes,linenums]
----
template <typename name,
typename dataT,
size_t min_capacity = 0>
template <typename Name,
typename DataT,
size_t MinCapacity = 0>
class kernel_readable_io_pipe {
public:
static dataT read(); // Blocking
static dataT read( bool &success_code ); // Non-blocking
static DataT read(); // Blocking
static DataT read( bool &Success ); // Non-blocking
};

template <typename name,
typename dataT,
size_t min_capacity = 0>
template <typename Name,
typename DataT,
size_t MinCapacity = 0>
class kernel_writeable_io_pipe {
public:
static void write( dataT data ); // Blocking
static void write( dataT data, bool &success_code ); // Non-blocking
static void write( DataT Data ); // Blocking
static void write( DataT Data, bool &Success ); // Non-blocking
}
----

Expand Down Expand Up @@ -642,55 +642,76 @@ Automated mechanisms are possible to provide uniquification across calls, and co

*NOTE*: The APIs described in this section are experimental. Future versions of this extension may change these APIs in ways that are incompatible with the versions described here.

In the experimental API version, read/write methods take template arguments, which can contain the latency control properties `latency_anchor_id` and/or `latency_constraint`.
The Intel FPGA experimental `pipe` class is implemented in `sycl/ext/intel/experimental/pipes.hpp` which is included in `sycl/ext/intel/fpga_extensions.hpp`.

In the experimental API version, read/write methods take in a property list as function argument, which can contain the latency control properties `latency_anchor_id` and/or `latency_constraint`.

* `sycl::ext::intel::experimental::latency_anchor_id<N>`, where `N` is an integer: An ID to associate with the current read/write function call, which can then be referenced by other `latency_constraint` properties elsewhere in the program to define relative latency constaints. ID must be unique within the application, and a diagnostic is required if that condition is not met.
* `sycl::ext::intel::experimental::latency_constraint<A, B, C>`: A tuple of three values which cause the current read/write function call to act as an endpoint of a latency constraint relative to a specified `latency_anchor_id` defined by a different instruction.
** `A` is an integer: The ID of the target anchor defined on a different instruction through a `latency_anchor_id` property.
** `B` is an enum value: The type of control from the set {`type::exact`, `type::max`, `type::min`}.
** `B` is an enum value: The type of control from the set {`latency_control_type::exact`, `latency_control_type::max`, `latency_control_type::min`}.
** `C` is an integer: The relative clock cycle difference between the target anchor and the current function call, that the constraint should infer subject to the type of the control (exact, max, min).

The template arguments above don't have to be specified if user doesn't want to apply latency controls. The template arguments can be passed in arbitrary order.

=== Implementation
=== Synopsis

[source,c++]
----
// Added in version 2 of this extension.
namespace sycl::ext::intel::experimental {
enum class type {
enum class latency_control_type {
none, // default
exact,
max,
min
};

template <int32_t _N> struct latency_anchor_id {
static constexpr int32_t value = _N;
static constexpr int32_t default_value = -1;
struct latency_anchor_id_key {
template <int Anchor>
using value_t =
oneapi::experimental::property_value<latency_anchor_id_key,
std::integral_constant<int, Anchor>>;
};

template <int32_t _N1, type _N2, int32_t _N3> struct latency_constraint {
static constexpr std::tuple<int32_t, type, int32_t> value = {_N1, _N2, _N3};
static constexpr std::tuple<int32_t, type, int32_t> default_value = {
0, type::none, 0};
struct latency_constraint_key {
template <int Target, latency_control_type Type, int Cycle>
using value_t = oneapi::experimental::property_value<
latency_constraint_key, std::integral_constant<int, Target>,
std::integral_constant<latency_control_type, Type>,
std::integral_constant<int, Cycle>>;
};

template <typename name,
typename dataT,
size_t min_capacity = 0>
template <int Anchor>
inline constexpr latency_anchor_id_key::value_t<Anchor> latency_anchor_id;

template <int Target, latency_control_type Type, int Cycle>
inline constexpr latency_constraint_key::value_t<Target, Type, Cycle>
latency_constraint;

template <typename Name,
typename DataT,
size_t MinCapacity = 0>
class pipe {
// Blocking
template <class... _Params>
static dataT read();
template <class... _Params>
static void write( const dataT &data );
static DataT read();

template <typename PropertiesT>
static DataT read( PropertiesT Properties );

static void write( const DataT &Data);

template <typename PropertiesT>
static void write( const DataT &Data, PropertiesT Properties );

// Non-blocking
template <class... _Params>
static dataT read( bool &success_code );
template <class... _Params>
static void write( const dataT &data, bool &success_code );
static DataT read( bool &Success );

template <typename PropertiesT>
static DataT read( bool &Success, PropertiesT Properties );

static void write( const DataT &Data, bool &Success );

template <typename PropertiesT>
static void write( const DataT &Data, bool &Success, PropertiesT Properties );
}
} // namespace sycl::ext::intel::experimental
----
Expand All @@ -709,17 +730,20 @@ using Pipe3 = ext::intel::experimental::pipe<class PipeClass2, int, 8>;
myQueue.submit([&](handler &cgh) {
cgh.single_task<class foo>([=] {
// The following Pipe1::read is anchor 0
int value = Pipe1::read<ext::intel::experimental::latency_anchor_id<0>>();
int value = Pipe1::read(
ext::oneapi::experimental::properties(latency_anchor_id<0>));

// The following Pipe2::write is anchor 1
// The following Pipe2::write occurs exactly 2 cycles after anchor 0
Pipe2::write<ext::intel::experimental::latency_anchor_id<1>,
ext::intel::experimental::latency_constraint<
0, ext::intel::experimental::type::exact, 2>>(value);
Pipe2::write(value,
ext::oneapi::experimental::properties(
latency_anchor_id<1>,
latency_constraint<0, latency_control_type::exact, 2>));

// The following Pipe3::write occurs at least 2 cycles after anchor 1
Pipe3::write<ext::intel::experimental::latency_constraint<
1, ext::intel::experimental::type::min, 2>>(value);
Pipe3::write(value,
ext::oneapi::experimental::properties(
latency_constraint<1, latency_control_type::min, 2>));
});
});
----
Expand Down
Loading