Skip to content

Commit 2bc8b5b

Browse files
BensuoEwanC
andauthored
[SYCL][Graph] Implementation of explicit update with indices (#12840)
- Experimental implementation of explicit update with indices (based on API in #12486 ) - New scheduler command for updating a command buffer command - PI equivalents for new UR APIs - E2E and Unit Tests --------- Co-authored-by: Ewan Crawford <[email protected]>
1 parent 84426d1 commit 2bc8b5b

File tree

64 files changed

+2937
-93
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

64 files changed

+2937
-93
lines changed

llvm/include/llvm/SYCLLowerIR/DeviceConfigFile.td

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -71,6 +71,7 @@ def AspectExt_oneapi_is_composite : Aspect<"ext_oneapi_is_composite">;
7171
def AspectExt_oneapi_is_component : Aspect<"ext_oneapi_is_component">;
7272
def AspectExt_oneapi_graph : Aspect<"ext_oneapi_graph">;
7373
def AspectExt_intel_fpga_task_sequence : Aspect<"ext_intel_fpga_task_sequence">;
74+
def AspectExt_oneapi_limited_graph : Aspect<"ext_oneapi_limited_graph">;
7475
// Deprecated aspects
7576
def AspectInt64_base_atomics : Aspect<"int64_base_atomics">;
7677
def AspectInt64_extended_atomics : Aspect<"int64_extended_atomics">;
@@ -122,7 +123,7 @@ def : TargetInfo<"__TestAspectList",
122123
AspectExt_oneapi_mipmap, AspectExt_oneapi_mipmap_anisotropy, AspectExt_oneapi_mipmap_level_reference, AspectExt_intel_esimd,
123124
AspectExt_oneapi_ballot_group, AspectExt_oneapi_fixed_size_group, AspectExt_oneapi_opportunistic_group,
124125
AspectExt_oneapi_tangle_group, AspectExt_intel_matrix, AspectExt_oneapi_is_composite, AspectExt_oneapi_is_component,
125-
AspectExt_oneapi_graph, AspectExt_intel_fpga_task_sequence],
126+
AspectExt_oneapi_graph, AspectExt_intel_fpga_task_sequence, AspectExt_oneapi_limited_graph],
126127
[]>;
127128
// This definition serves the only purpose of testing whether the deprecated aspect list defined in here and in SYCL RT
128129
// match.

sycl/doc/design/CommandGraph.md

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -46,6 +46,7 @@ with the following entry-points:
4646
| `urCommandBufferAppendMemBufferReadRectExp` | Append a rectangular memory read command to a command-buffer object. |
4747
| `urCommandBufferAppendMemBufferFillExp` | Append a memory fill command to a command-buffer object. |
4848
| `urCommandBufferEnqueueExp` | Submit command-buffer to a command-queue for execution. |
49+
| `urCommandBufferUpdateKernelLaunchExp` | Updates the parameters of a previous kernel launch command. |
4950

5051
See the [UR EXP-COMMAND-BUFFER](https://oneapi-src.github.io/unified-runtime/core/EXP-COMMAND-BUFFER.html)
5152
specification for more details.
@@ -230,6 +231,47 @@ on buffer usage in a graph so that their lifetime semantics are compatible with
230231
a lazy work execution model. However these changes to storage lifetimes have not
231232
yet been implemented.
232233

234+
## Graph Update
235+
236+
### Design Challenges
237+
238+
Graph update faces significant design challenges in SYCL:
239+
240+
* Lambda capture order is explicitly undefined in C++, so the user cannot reason
241+
about the indices of arguments captured by kernel lambdas.
242+
* Once arguments have been captured the actual type information is lost in the
243+
transition through the integration header and extracting arguments in the SYCL
244+
runtime, therefore we cannot automatically match new argument values by
245+
querying the captured arguments without significant possibility for
246+
collisions. For example, if a kernel captures two USM pointers and the user
247+
wishes to update one, we cannot reason about which pointer they actually want
248+
to update when we only know that: they are pointer args of a certain size.
249+
250+
The current approach is to limit graph update to the explicit APIs and where the
251+
user is using `handler::set_arg()` or some equivalent to manually set kernel
252+
arguments using indices. Therefore when updating we can use indices to avoid
253+
collisions. In practice there are only a few current scenarios where `set_arg()`
254+
can be used:
255+
256+
* The proposed ["Free Function Kernel"
257+
extension](../extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc)
258+
* OpenCL interop kernels created from SPIR-V source at runtime.
259+
260+
A possible future workaround lambda capture issues could be "Whole-Graph Update"
261+
where if we can guarantee that lambda capture order is the same across two
262+
different recordings we can then match parameter order when updating.
263+
264+
### Scheduler Integration
265+
266+
Graph updates in the runtime are synchronous calls however they can optionally
267+
be done through the scheduler using a new command,
268+
`sycl::detail::UpdateCommandBufferCommand`. This is needed when dealing with
269+
accessor updates. Since a new buffer which the user creates for updating may not
270+
yet have been lazily initialized on device we schedule a new command which has
271+
requirements for these new accessors to correctly trigger allocations before
272+
updating. This is similar to how individual graph commands are enqueued when
273+
accessors are used in a graph node.
274+
233275
## Backend Implementation
234276

235277
Implementation of UR command-buffers

sycl/include/sycl/detail/pi.def

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -184,6 +184,9 @@ _PI_API(piextCommandBufferFillUSM)
184184
_PI_API(piextCommandBufferPrefetchUSM)
185185
_PI_API(piextCommandBufferAdviseUSM)
186186
_PI_API(piextEnqueueCommandBuffer)
187+
_PI_API(piextCommandBufferUpdateKernelLaunch)
188+
_PI_API(piextCommandBufferRetainCommand)
189+
_PI_API(piextCommandBufferReleaseCommand)
187190

188191
_PI_API(piextUSMPitchedAlloc)
189192

sycl/include/sycl/detail/pi.h

Lines changed: 65 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -156,9 +156,10 @@
156156
// piextEnqueueCooperativeKernelLaunch.
157157
// 15.46 Add piextGetGlobalVariablePointer
158158
// 15.47 Added PI_ERROR_FEATURE_UNSUPPORTED.
159+
// 15.48 Add CommandBuffer update definitions
159160

160161
#define _PI_H_VERSION_MAJOR 15
161-
#define _PI_H_VERSION_MINOR 47
162+
#define _PI_H_VERSION_MINOR 48
162163

163164
#define _PI_STRING_HELPER(a) #a
164165
#define _PI_CONCAT(a, b) _PI_STRING_HELPER(a.b)
@@ -445,6 +446,10 @@ typedef enum {
445446
// Composite device
446447
PI_EXT_ONEAPI_DEVICE_INFO_COMPONENT_DEVICES = 0x20111,
447448
PI_EXT_ONEAPI_DEVICE_INFO_COMPOSITE_DEVICE = 0x20112,
449+
450+
// Command Buffers
451+
PI_EXT_ONEAPI_DEVICE_INFO_COMMAND_BUFFER_SUPPORT = 0x20113,
452+
PI_EXT_ONEAPI_DEVICE_INFO_COMMAND_BUFFER_UPDATE_SUPPORT = 0x20114,
448453
} _pi_device_info;
449454

450455
typedef enum {
@@ -2320,7 +2325,10 @@ __SYCL_EXPORT pi_result piGetDeviceAndHostTimer(pi_device Device,
23202325
/// Command buffer extension
23212326
struct _pi_ext_command_buffer;
23222327
struct _pi_ext_sync_point;
2328+
struct _pi_ext_command_buffer_command;
2329+
23232330
using pi_ext_command_buffer = _pi_ext_command_buffer *;
2331+
using pi_ext_command_buffer_command = _pi_ext_command_buffer_command *;
23242332
using pi_ext_sync_point = pi_uint32;
23252333

23262334
typedef enum {
@@ -2330,7 +2338,40 @@ typedef enum {
23302338
struct pi_ext_command_buffer_desc final {
23312339
pi_ext_structure_type stype;
23322340
const void *pNext;
2333-
pi_queue_properties *properties;
2341+
pi_bool is_updatable;
2342+
};
2343+
2344+
// Command Buffer Update types
2345+
struct pi_ext_command_buffer_update_memobj_arg_desc_t final {
2346+
uint32_t arg_index;
2347+
const pi_mem_obj_property *properties;
2348+
pi_mem new_mem_obj;
2349+
};
2350+
2351+
struct pi_ext_command_buffer_update_pointer_arg_desc_t final {
2352+
uint32_t arg_index;
2353+
void *new_ptr;
2354+
};
2355+
2356+
struct pi_ext_command_buffer_update_value_arg_desc_t final {
2357+
uint32_t arg_index;
2358+
uint32_t arg_size;
2359+
void *new_value;
2360+
};
2361+
2362+
struct pi_ext_command_buffer_update_kernel_launch_desc final {
2363+
uint32_t num_mem_obj_args;
2364+
uint32_t num_ptr_args;
2365+
uint32_t num_value_args;
2366+
uint32_t num_work_dim;
2367+
2368+
pi_ext_command_buffer_update_memobj_arg_desc_t *mem_obj_arg_list;
2369+
pi_ext_command_buffer_update_pointer_arg_desc_t *ptr_arg_list;
2370+
pi_ext_command_buffer_update_value_arg_desc_t *value_arg_list;
2371+
2372+
size_t *global_work_offset;
2373+
size_t *global_work_size;
2374+
size_t *local_work_size;
23342375
};
23352376

23362377
/// API to create a command-buffer.
@@ -2374,12 +2415,14 @@ piextCommandBufferFinalize(pi_ext_command_buffer command_buffer);
23742415
/// \param sync_point_wait_list A list of sync points that this command must
23752416
/// wait on.
23762417
/// \param sync_point The sync_point associated with this kernel execution.
2418+
/// \param command Return pointer to the command representing this kernel
2419+
/// execution.
23772420
__SYCL_EXPORT pi_result piextCommandBufferNDRangeKernel(
23782421
pi_ext_command_buffer command_buffer, pi_kernel kernel, pi_uint32 work_dim,
23792422
const size_t *global_work_offset, const size_t *global_work_size,
23802423
const size_t *local_work_size, pi_uint32 num_sync_points_in_wait_list,
23812424
const pi_ext_sync_point *sync_point_wait_list,
2382-
pi_ext_sync_point *sync_point);
2425+
pi_ext_sync_point *sync_point, pi_ext_command_buffer_command *command);
23832426

23842427
/// API to append a USM memcpy command to the command-buffer.
23852428
/// \param command_buffer The command-buffer to append onto.
@@ -2607,6 +2650,25 @@ piextEnqueueCommandBuffer(pi_ext_command_buffer command_buffer, pi_queue queue,
26072650
pi_uint32 num_events_in_wait_list,
26082651
const pi_event *event_wait_list, pi_event *event);
26092652

2653+
/// API to update a kernel launch command inside of a command-buffer.
2654+
/// @param command The command to be updated.
2655+
/// @param desc Descriptor which describes the updated parameters of the kernel
2656+
/// launch.
2657+
__SYCL_EXPORT pi_result piextCommandBufferUpdateKernelLaunch(
2658+
pi_ext_command_buffer_command command,
2659+
pi_ext_command_buffer_update_kernel_launch_desc *desc);
2660+
2661+
/// API to increment the reference count of a command-buffer command.
2662+
/// \param command The command to release.
2663+
__SYCL_EXPORT pi_result
2664+
piextCommandBufferRetainCommand(pi_ext_command_buffer_command command);
2665+
2666+
/// API to decrement the reference count of a command-buffer command. After the
2667+
/// command reference count becomes zero, the command is deleted.
2668+
/// \param command The command to release.
2669+
__SYCL_EXPORT pi_result
2670+
piextCommandBufferReleaseCommand(pi_ext_command_buffer_command command);
2671+
26102672
/// API to destroy bindless unsampled image handles.
26112673
///
26122674
/// \param context is the pi_context

sycl/include/sycl/detail/pi.hpp

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -156,6 +156,7 @@ using PiKernelCacheConfig = ::pi_kernel_cache_config;
156156
using PiExtSyncPoint = ::pi_ext_sync_point;
157157
using PiExtCommandBuffer = ::pi_ext_command_buffer;
158158
using PiExtCommandBufferDesc = ::pi_ext_command_buffer_desc;
159+
using PiExtCommandBufferCommand = ::pi_ext_command_buffer_command;
159160
using PiPeerAttr = ::pi_peer_attr;
160161
using PiImageHandle = ::pi_image_handle;
161162
using PiImageMemHandle = ::pi_image_mem_handle;

sycl/include/sycl/detail/property_helper.hpp

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -47,8 +47,9 @@ enum DataLessPropKind {
4747
GraphAssumeDataOutlivesBuffer = 22,
4848
GraphAssumeBufferOutlivesGraph = 23,
4949
GraphDependOnAllLeaves = 24,
50+
GraphUpdatable = 25,
5051
// Indicates the last known dataless property.
51-
LastKnownDataLessPropKind = 24,
52+
LastKnownDataLessPropKind = 25,
5253
// Exceeding 32 may cause ABI breaking change on some of OSes.
5354
DataLessPropKindSize = 32
5455
};

sycl/include/sycl/device_aspect_macros.hpp

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -323,6 +323,11 @@
323323
#define __SYCL_ALL_DEVICES_HAVE_ext_intel_fpga_task_sequence__ 0
324324
#endif
325325

326+
#ifndef __SYCL_ALL_DEVICES_HAVE_ext_oneapi_limited_graph__
327+
// __SYCL_ASPECT(ext_oneapi_limited_graph, 63)
328+
#define __SYCL_ALL_DEVICES_HAVE_ext_oneapi_limited_graph__ 0
329+
#endif
330+
326331
#ifndef __SYCL_ANY_DEVICE_HAS_host__
327332
// __SYCL_ASPECT(host, 0)
328333
#define __SYCL_ANY_DEVICE_HAS_host__ 0
@@ -637,3 +642,8 @@
637642
// __SYCL_ASPECT(ext_intel_fpga_task_sequence__, 62)
638643
#define __SYCL_ANY_DEVICE_HAS_ext_intel_fpga_task_sequence__ 0
639644
#endif
645+
646+
#ifndef __SYCL_ANY_DEVICE_HAS_ext_oneapi_limited_graph__
647+
// __SYCL_ASPECT(ext_oneapi_limited_graph, 63)
648+
#define __SYCL_ANY_DEVICE_HAS_ext_oneapi_limited_graph__ 0
649+
#endif

0 commit comments

Comments
 (0)