Skip to content

[SYCL] [FPGA] Update the experimental latency control API to use property list #5993

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Apr 28, 2022
Original file line number Diff line number Diff line change
Expand Up @@ -642,55 +642,70 @@ Automated mechanisms are possible to provide uniquification across calls, and co

*NOTE*: The APIs described in this section are experimental. Future versions of this extension may change these APIs in ways that are incompatible with the versions described here.

In the experimental API version, read/write methods take template arguments, which can contain the latency control properties `latency_anchor_id` and/or `latency_constraint`.
In the experimental API version, read/write methods take in a property list as function argument, which can contain the latency control properties `latency_anchor_id` and/or `latency_constraint`.

* `sycl::ext::intel::experimental::latency_anchor_id<N>`, where `N` is an integer: An ID to associate with the current read/write function call, which can then be referenced by other `latency_constraint` properties elsewhere in the program to define relative latency constaints. ID must be unique within the application, and a diagnostic is required if that condition is not met.
* `sycl::ext::intel::experimental::latency_constraint<A, B, C>`: A tuple of three values which cause the current read/write function call to act as an endpoint of a latency constraint relative to a specified `latency_anchor_id` defined by a different instruction.
* `sycl::ext::oneapi::experimental::latency_anchor_id<N>`, where `N` is an integer: An ID to associate with the current read/write function call, which can then be referenced by other `latency_constraint` properties elsewhere in the program to define relative latency constaints. ID must be unique within the application, and a diagnostic is required if that condition is not met.
* `sycl::ext::oneapi::experimental::latency_constraint<A, B, C>`: A tuple of three values which cause the current read/write function call to act as an endpoint of a latency constraint relative to a specified `latency_anchor_id` defined by a different instruction.
** `A` is an integer: The ID of the target anchor defined on a different instruction through a `latency_anchor_id` property.
** `B` is an enum value: The type of control from the set {`type::exact`, `type::max`, `type::min`}.
** `B` is an enum value: The type of control from the set {`latency_control_type::exact`, `latency_control_type::max`, `latency_control_type::min`}.
** `C` is an integer: The relative clock cycle difference between the target anchor and the current function call, that the constraint should infer subject to the type of the control (exact, max, min).

The template arguments above don't have to be specified if user doesn't want to apply latency controls. The template arguments can be passed in arbitrary order.

=== Implementation
=== Synopsis

[source,c++]
----
// Added in version 2 of this extension.
namespace sycl::ext::intel::experimental {
enum class type {
none, // default
exact,
max,
min
};

template <int32_t _N> struct latency_anchor_id {
static constexpr int32_t value = _N;
static constexpr int32_t default_value = -1;
};

template <int32_t _N1, type _N2, int32_t _N3> struct latency_constraint {
static constexpr std::tuple<int32_t, type, int32_t> value = {_N1, _N2, _N3};
static constexpr std::tuple<int32_t, type, int32_t> default_value = {
0, type::none, 0};
};
#include <sycl/ext/oneapi/latency_control/properties.hpp>
#include <sycl/ext/oneapi/properties/properties.hpp>
//***********sycl/ext/oneapi/latency_control/properties.hpp***************
// enum class latency_control_type {
// none, // default
// exact,
// max,
// min
// };
//
// struct latency_anchor_id_key {
// template <int Anchor>
// using value_t = property_value<latency_anchor_id_key,
// std::integral_constant<int, Anchor>>;
// };
//
// struct latency_constraint_key {
// template <int Target, latency_control_type Type, int Cycle>
// using value_t =
// property_value<latency_constraint_key,
// std::integral_constant<int, Target>,
// std::integral_constant<latency_control_type, Type>,
// std::integral_constant<int, Cycle>>;
// };
//************************************************************************

template <typename name,
typename dataT,
size_t min_capacity = 0>
class pipe {
// Blocking
template <class... _Params>
static dataT read();
template <class... _Params>
static void write( const dataT &data );

template <typename _propertiesT>
static dataT read( _propertiesT Properties );

static void write( const dataT &data);

template <typename _propertiesT>
static void write( const dataT &data, _propertiesT Properties );

// Non-blocking
template <class... _Params>
static dataT read( bool &success_code );
template <class... _Params>

template <typename _propertiesT>
static dataT read( bool &success_code, _propertiesT Properties );

static void write( const dataT &data, bool &success_code );

template <typename _propertiesT>
static void write( const dataT &data, bool &success_code, _propertiesT Properties );
}
} // namespace sycl::ext::intel::experimental
----
Expand All @@ -709,17 +724,20 @@ using Pipe3 = ext::intel::experimental::pipe<class PipeClass2, int, 8>;
myQueue.submit([&](handler &cgh) {
cgh.single_task<class foo>([=] {
// The following Pipe1::read is anchor 0
int value = Pipe1::read<ext::intel::experimental::latency_anchor_id<0>>();
int value = Pipe1::read(ext::oneapi::experimental::properties(
ext::oneapi::experimental::latency_anchor_id<0>));

// The following Pipe2::write is anchor 1
// The following Pipe2::write occurs exactly 2 cycles after anchor 0
Pipe2::write<ext::intel::experimental::latency_anchor_id<1>,
ext::intel::experimental::latency_constraint<
0, ext::intel::experimental::type::exact, 2>>(value);
Pipe2::write(value, ext::oneapi::experimental::properties(
ext::oneapi::experimental::latency_anchor_id<1>,
ext::oneapi::experimental::latency_constraint<
0, ext::oneapi::experimental::latency_control_type::exact, 2>));

// The following Pipe3::write occurs at least 2 cycles after anchor 1
Pipe3::write<ext::intel::experimental::latency_constraint<
1, ext::intel::experimental::type::min, 2>>(value);
Pipe3::write(value, ext::oneapi::experimental::properties(
ext::oneapi::experimental::latency_constraint<
1, ext::oneapi::experimental::latency_control_type::min, 2>));
});
});
----
Expand Down
155 changes: 81 additions & 74 deletions sycl/doc/extensions/supported/sycl_ext_intel_fpga_lsu.md
Original file line number Diff line number Diff line change
Expand Up @@ -126,113 +126,119 @@ this extension may change these APIs in ways that are incompatible with the
versions described here.

In the experimental API version, member functions `load()` and `store()` take
template arguments, which can contain the latency control properties
`latency_anchor_id` and/or `latency_constraint`.
in a property list as function argument, which can contain the latency control
properties `latency_anchor_id` and/or `latency_constraint`.

1. **`sycl::ext::intel::experimental::latency_anchor_id<N>`, where `N` is an integer**:
1. **`sycl::ext::oneapi::experimental::latency_anchor_id<N>`, where `N` is an integer**:
represents ID of the current function call when it performs as an anchor. The ID
must be unique within the application, with a diagnostic required if that
condition is not met.
2. **`sycl::ext::intel::experimental::latency_constraint<A, B, C>`** contains control
2. **`sycl::ext::oneapi::experimental::latency_constraint<A, B, C>`** contains control
parameters when the current function performs as a non-anchor, where:
- **`A` is an integer**: The ID of the target anchor defined on a different
instruction through a `latency_anchor_id` property.
- **`B` is an enum value**: The type of control from the set
{`type::exact`, `type::max`, `type::min`}.
{`latency_control_type::exact`, `latency_control_type::max`, `latency_control_type::min`}.
- **`C` is an integer**: The relative clock cycle difference between the
target anchor and the current function call, that the constraint should
infer subject to the type of the control (exact, max, min).

The template arguments above don't have to be specified if user doesn't want to
apply latency controls. The template arguments can be passed in arbitrary order.

### Implementation
```c++
// Added in version 2 of this extension.
namespace sycl::ext::intel::experimental {
enum class type {
none, // default
exact,
max,
min
};

template <int32_t _N> struct latency_anchor_id {
static constexpr int32_t value = _N;
static constexpr int32_t default_value = -1;
};

template <int32_t _N1, type _N2, int32_t _N3> struct latency_constraint {
static constexpr std::tuple<int32_t, type, int32_t> value = {_N1, _N2, _N3};
static constexpr std::tuple<int32_t, type, int32_t> default_value = {
0, type::none, 0};
};

#include <sycl/ext/oneapi/latency_control/properties.hpp>
#include <sycl/ext/oneapi/properties/properties.hpp>
...
template <class... mem_access_params> class lsu final {
public:
lsu() = delete;

template <class... _Params, typename _T, access::address_space _space>
static _T load(sycl::multi_ptr<_T, _space> Ptr) {
template <typename _T, access::address_space _space, typename _propertiesT>
static _T load(sycl::multi_ptr<_T, _space> Ptr, _propertiesT Properties) {
check_space<_space>();
check_load();
#if defined(__SYCL_DEVICE_ONLY__) && __has_builtin(__builtin_intel_fpga_mem)
static constexpr auto _anchor_id =
__GetValue<int, latency_anchor_id, _Params...>::value;
static constexpr auto _constraint =
__GetValue3<int, type, int, latency_constraint, _Params...>::value;

static constexpr int _target_anchor = std::get<0>(_constraint);
static constexpr type _control_type = std::get<1>(_constraint);
static constexpr int _cycle = std::get<2>(_constraint);
int _type;
if (_control_type == type::none) {
_type = 0;
} else if (_control_type == type::exact) {
_type = 1;
} else if (_control_type == type::max) {
_type = 2;
} else { // _control_type == type::min
_type = 3;
using _latency_anchor_id_prop =
typename GetOrDefaultValT<_propertiesT,
oneapi::experimental::latency_anchor_id_key,
defaultLatencyAnchorIdProperty>::type;
using _latency_constraint_prop =
typename GetOrDefaultValT<_propertiesT,
oneapi::experimental::latency_constraint_key,
defaultLatencyConstraintProperty>::type;
static constexpr int32_t _anchor_id = _latency_anchor_id_prop::value;
static constexpr int32_t _target_anchor = _latency_constraint_prop::target;
static constexpr oneapi::experimental::latency_control_type _control_type =
_latency_constraint_prop::type;
static constexpr int32_t _relative_cycle = _latency_constraint_prop::cycle;

int32_t _control_type_code = 0; // latency_control_type::none
if constexpr (_control_type ==
oneapi::experimental::latency_control_type::exact) {
_control_type_code = 1;
} else if constexpr (_control_type ==
oneapi::experimental::latency_control_type::max) {
_control_type_code = 2;
} else if constexpr (_control_type ==
oneapi::experimental::latency_control_type::min) {
_control_type_code = 3;
}

return *__latency_control_mem_wrapper((_T *)Ptr, _anchor_id, _target_anchor,
_type, _cycle);
_control_type_code, _relative_cycle);
#else
return *Ptr;
#endif
}

template <class... _Params, typename _T, access::address_space _space>
static void store(sycl::multi_ptr<_T, _space> Ptr, _T Val) {
template <typename _T, access::address_space _space>
static _T load(sycl::multi_ptr<_T, _space> Ptr) {
return load<_T, _space>(Ptr, oneapi::experimental::properties{});
}

template <typename _T, access::address_space _space, typename _propertiesT>
static void store(sycl::multi_ptr<_T, _space> Ptr, _T Val,
_propertiesT Properties) {
check_space<_space>();
check_store();
#if defined(__SYCL_DEVICE_ONLY__) && __has_builtin(__builtin_intel_fpga_mem)
static constexpr auto _anchor_id =
__GetValue<int, latency_anchor_id, _Params...>::value;
static constexpr auto _constraint =
__GetValue3<int, type, int, latency_constraint, _Params...>::value;

static constexpr int _target_anchor = std::get<0>(_constraint);
static constexpr type _control_type = std::get<1>(_constraint);
static constexpr int _cycle = std::get<2>(_constraint);
int _type;
if (_control_type == type::none) {
_type = 0;
} else if (_control_type == type::exact) {
_type = 1;
} else if (_control_type == type::max) {
_type = 2;
} else { // _control_type == type::min
_type = 3;
using _latency_anchor_id_prop =
typename GetOrDefaultValT<_propertiesT,
oneapi::experimental::latency_anchor_id_key,
defaultLatencyAnchorIdProperty>::type;
using _latency_constraint_prop =
typename GetOrDefaultValT<_propertiesT,
oneapi::experimental::latency_constraint_key,
defaultLatencyConstraintProperty>::type;
static constexpr int32_t _anchor_id = _latency_anchor_id_prop::value;
static constexpr int32_t _target_anchor = _latency_constraint_prop::target;
static constexpr oneapi::experimental::latency_control_type _control_type =
_latency_constraint_prop::type;
static constexpr int32_t _relative_cycle = _latency_constraint_prop::cycle;

int32_t _control_type_code = 0; // latency_control_type::none
if constexpr (_control_type ==
oneapi::experimental::latency_control_type::exact) {
_control_type_code = 1;
} else if constexpr (_control_type ==
oneapi::experimental::latency_control_type::max) {
_control_type_code = 2;
} else if constexpr (_control_type ==
oneapi::experimental::latency_control_type::min) {
_control_type_code = 3;
}

*__latency_control_mem_wrapper((_T *)Ptr, _anchor_id, _target_anchor, _type,
_cycle) = Val;
*__latency_control_mem_wrapper((_T *)Ptr, _anchor_id, _target_anchor,
_control_type_code, _relative_cycle) = Val;
#else
*Ptr = Val;
#endif
}

template <typename _T, access::address_space _space>
static void store(sycl::multi_ptr<_T, _space> Ptr, _T Val) {
store<_T, _space>(Ptr, Val, oneapi::experimental::properties{});
}
...
private:
#if defined(__SYCL_DEVICE_ONLY__) && __has_builtin(__builtin_intel_fpga_mem)
Expand Down Expand Up @@ -277,14 +283,15 @@ Queue.submit([&](sycl::handler &cgh) {
sycl::ext::intel::experimental::statically_coalesce<false>>;

// The following load is anchor 1
int Z = ExpPrefetchingLSU::load<
sycl::ext::intel::experimental::latency_anchor_id<1>>(input_ptr + 2);
int Z = ExpPrefetchingLSU::load(
input_ptr + 2, sycl::ext::oneapi::experimental::properties(
sycl::ext::oneapi::experimental::latency_anchor_id<1>));

// The following store occurs exactly 5 cycles after the anchor 1 read
ExpBurstCoalescedLSU::store<
sycl::ext::intel::experimental::latency_constraint<
1, sycl::ext::intel::experimental::type::exact, 5>>(output_ptr + 2,
Z);
ExpBurstCoalescedLSU::store(output_ptr + 2, Z,
sycl::ext::oneapi::experimental::properties(
sycl::ext::oneapi::experimental::latency_constraint<
1, sycl::ext::oneapi::experimental::latency_control_type::exact, 5>));
});
});
...
Expand Down
Loading