[SYCL][Graph][Doc] Specify API for whole graph updates #13253

Merged
merged 12 commits into from
Jul 3, 2024
264 changes: 171 additions & 93 deletions sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Expand Up @@ -117,7 +117,7 @@ is run repeatedly for different inputs, this is particularly useful.
In order to achieve the goals described in previous sections, the following
requirements were considered:

1. Ability to update inputs/outputs of the graph between submissions, without
1. Ability to update parameters of the graph between submissions, without
changing the overall graph structure.
2. Enable low effort porting of existing applications to use the extension.
3. Profiling, debugging, and tracing functionality at the granularity of graph
Expand Down Expand Up @@ -206,7 +206,7 @@ Table {counter: tableNumber}. Values of the `SYCL_EXT_ONEAPI_GRAPH` macro.
|1 |Initial extension version. Base features are supported.
|===

=== SYCL Graph Terminology
=== SYCL Graph Terminology [[terminology]]

Table {counter: tableNumber}. Terminology.
[%header,cols="1,3"]
Expand Down Expand Up @@ -646,6 +646,7 @@ public:

void update(node& node);
void update(const std::vector<node>& nodes);
void update(const command_graph<graph_state::modifiable>& graph);
};

} // namespace sycl::ext::oneapi::experimental
Expand Down Expand Up @@ -703,14 +704,14 @@ graph LR

A graph in the executable state can have the configuration of its nodes modified
using a concept called graph _update_. This avoids a user having to rebuild and
finalize a new executable graph when only the inputs & outputs to a graph change
finalize a new executable graph when only the parameters of graph nodes change
between submissions.

Updates to a graph will be scheduled after any in-flight executions of the same
graph and will not affect previous submissions of the same graph. The user is
not required to wait on any previous submissions of a graph before updating it.

The only type of nodes that are currently supported for updating in a graph are
The only type of nodes that are currently able to be updated in a graph are
kernel execution nodes.

The aspects of a kernel execution node that can be configured during update are:
Expand Down Expand Up @@ -742,9 +743,9 @@ The other node configuration that can be updated is the execution range of the
kernel. This can be set through `node::update_nd_range()` or
`node::update_range()` but does not require any prior registration.

These updated nodes can then be passed to
`command_graph<graph_state::executable>::update()` which will update the
executable graph with the current state of the nodes.
The executable graph can then be updated by passing the updated nodes to
`command_graph<graph_state::executable>::update(node& node)` or
`command_graph<graph_state::executable>::update(const std::vector<node>& nodes)`.

Since the structure of the graph became fixed when finalizing, updating
parameters on a node will not change the already defined dependencies between
Expand All @@ -761,6 +762,51 @@ dynamic parameter for the buffer can be registered with all the nodes which
use the buffer as a parameter. Then a single `dynamic_parameter::update()` call
will maintain the graph's data dependencies.

===== Whole Graph Update [[whole-graph-update]]

A graph in the executable state can have all of its nodes updated using the
`command_graph<graph_state::executable>::update(graph)` method. This method
takes a source graph in the modifiable state and updates the nodes in the target
executable state graph to reflect any changes made to the nodes in the source
graph. The characteristics which will be updated are detailed in the section on
<<executable-graph-update, Executable Graph Update>>.

Both the source and target graphs for the update must satisfy the following
conditions:

* Both graphs must have been created with the same device and context.
* Both graphs must be topologically identical. The graphs are considered
topologically identical when:

** Both graphs have the same number of nodes and edges.
** Internal edges are between corresponding nodes in each graph.
** Nodes are added in the same order in the two graphs. Nodes may be added
via `command_graph::add`, or for a recorded queue via `queue::submit` or
queue shortcut functions.
** Corresponding nodes in each graph are kernels that have the same type:

*** When the kernel is defined as a lambda, the lambda must be the same.
*** When the kernel is defined as a named function object, the kernel class
must be the same.
*** When the kernel is defined as a plain function, the function must be the
same.

** Edge dependencies for each node in the two graphs are created in the
same order by using the same API invocation to create each edge. See
the <<terminology, terminology section>> for an exhaustive definition of
how edges are defined in a graph for each of the two graph construction
APIs.

Attempting to use whole-graph update with source or target graphs which do not
satisfy the conditions of topological identity results in undefined behaviour,
as it may prevent the runtime from pairing nodes in the source and target
graphs.

It is valid to use nodes that contain dynamic parameters in whole graph updates.
If a node containing a dynamic parameter is updated through the whole graph
update API, then any previous updates to the dynamic parameter will be reflected
in the new graph.
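
The topological identity conditions above can be illustrated with a small
standalone model. This is only a sketch under stated assumptions, not the
runtime's implementation: `ModelNode`, `ModelGraph`, `topologicallyIdentical`
and `makeChain` are hypothetical names that are not part of the extension, a
string stands in for the kernel type, and each node records the indices of its
predecessors in the order the edges were created. Because violating the
conditions is undefined behaviour, an implementation is not required to perform
any such check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a graph node (not part of the extension API).
struct ModelNode {
  std::string kernelType;          // stands in for the identity of the kernel
  std::vector<std::size_t> preds;  // predecessor indices, in edge creation order
};

// Hypothetical model of a modifiable graph. Nodes are stored in the order in
// which they were added.
struct ModelGraph {
  std::vector<ModelNode> nodes;

  // Adds a node and returns its index. Predecessors must already exist.
  std::size_t add(std::string kernelType, std::vector<std::size_t> preds = {}) {
    for (std::size_t p : preds) {
      assert(p < nodes.size());
    }
    nodes.push_back({std::move(kernelType), std::move(preds)});
    return nodes.size() - 1;
  }
};

// Returns true when the two model graphs have the same number of nodes, and
// nodes at the same insertion position have the same kernel type and the same
// predecessors created in the same order.
bool topologicallyIdentical(const ModelGraph& a, const ModelGraph& b) {
  if (a.nodes.size() != b.nodes.size()) {
    return false;
  }
  for (std::size_t i = 0; i < a.nodes.size(); ++i) {
    if (a.nodes[i].kernelType != b.nodes[i].kernelType) {
      return false;
    }
    if (a.nodes[i].preds != b.nodes[i].preds) {
      return false;
    }
  }
  return true;
}

// Builds a three-kernel chain A -> B -> C.
ModelGraph makeChain() {
  ModelGraph g;
  std::size_t a = g.add("KernelA");
  std::size_t b = g.add("KernelB", {a});
  g.add("KernelC", {b});
  return g;
}
```

In this model, two graphs built by calling `makeChain()` twice compare as
identical, while a graph that adds the same kernels in a different order, or
creates an additional edge, does not. Kernel arguments are deliberately absent
from the model, since they are the part that is allowed to differ between the
source and target graphs.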

==== Graph Properties [[graph-properties]]

===== No-Cycle-Check Property
Expand Down Expand Up @@ -1167,6 +1213,58 @@ Exceptions:
`property::graph::updatable` was not set when the executable graph was created.
* Throws with error code `invalid` if any node in `nodes` is not part of the
graph.

|
[source, c++]
----
void
update(const command_graph<graph_state::modifiable>& source);
----

|Updates all of the nodes in the target graph with parameters from a
topologically identical source graph in the modifiable state. The full
definition of what constitutes a topologically identical graph can be found in
the <<whole-graph-update, Whole-Graph Update>> section. Violating any of these
topology requirements results in undefined behaviour.

The characteristics in the executable graph which will be updated are detailed
in the section on <<executable-graph-update, Executable Graph Update>>.

It is not an error to update an executable graph such that all parameters of
nodes in `source` are identical to the arguments of the executable graph prior to
the update.

The implementation may perform a blocking wait during this call on
any in-flight executions of that same graph if required by the backend.

This function may only be called if the graph was created with the `updatable`
property.

Constraints:

* This member function is only available when the `command_graph` state is
`graph_state::executable`.

Parameters:

* `source` - Modifiable graph object used as the source for updating this graph.

Exceptions:

* Throws synchronously with error code `invalid` if `source` contains any node
which is not one of the following types:

** `node_type::empty`
** `node_type::ext_oneapi_barrier`
** `node_type::kernel`

* Throws synchronously with error code `invalid` if the context or device
associated with `source` does not match that of the `command_graph` being
updated.

* Throws synchronously with error code `invalid` if
`property::graph::updatable` was not set when the executable graph was
created.
|===
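
The synchronous error conditions listed for the whole graph `update()` overload
can be pictured with the following standalone sketch. It is an illustration
under assumptions rather than the actual runtime logic: all type and function
names are hypothetical, integer IDs stand in for the context and device,
`std::invalid_argument` stands in for a SYCL exception with error code
`invalid`, and the order in which the conditions are checked is an assumption
that the specification does not state.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for node types. `memory_copy` represents any node
// type outside the three that the whole graph update accepts.
enum class ModelNodeType { empty, barrier, kernel, memory_copy };

// Hypothetical model of the modifiable source graph.
struct ModelSourceGraph {
  int contextId;
  int deviceId;
  std::vector<ModelNodeType> nodeTypes;
};

// Hypothetical model of the executable target graph.
struct ModelExecutableGraph {
  int contextId;
  int deviceId;
  bool updatable;  // whether the updatable property was set on finalization
};

// Throws std::invalid_argument (standing in for error code `invalid`) when one
// of the documented error conditions holds. The check order is an assumption.
void validateWholeGraphUpdate(const ModelExecutableGraph& target,
                              const ModelSourceGraph& source) {
  if (!target.updatable) {
    throw std::invalid_argument("executable graph was not created as updatable");
  }
  if (target.contextId != source.contextId ||
      target.deviceId != source.deviceId) {
    throw std::invalid_argument("source context or device does not match");
  }
  for (ModelNodeType type : source.nodeTypes) {
    const bool supported = type == ModelNodeType::empty ||
                           type == ModelNodeType::barrier ||
                           type == ModelNodeType::kernel;
    if (!supported) {
      throw std::invalid_argument("source contains an unsupported node type");
    }
  }
}
```

Note that topology mismatches are intentionally not modelled here, because the
table above specifies them as undefined behaviour rather than as an exception.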

Table {counter: tableNumber}. Member functions of the `command_graph` class for
Expand Down Expand Up @@ -2125,6 +2223,69 @@ node nodeA = myGraph.add([&](handler& cgh) {
dynParamAccessor.update(bufferB.get_access());
----

=== Whole Graph Update

Example that shows recording and updating several nodes with different
parameters using <<whole-graph-update, Whole Graph Update>>.

[source,c++]
----
...
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

// Enqueue several kernels which use inputPtr
void run_kernels(int* inputPtr, queue syclQueue){
  event eventA = syclQueue.submit([&](handler& CGH){
    CGH.parallel_for(...);
  });
  event eventB = syclQueue.submit([&](handler& CGH){
    CGH.depends_on(eventA);
    CGH.parallel_for(...);
  });
  syclQueue.submit([&](handler& CGH){
    CGH.depends_on(eventB);
    CGH.parallel_for(...);
  });
}

...

queue myQueue;

// USM allocations
const size_t n = 1024;
int *ptrA = malloc_device<int>(n, myQueue);
int *ptrB = malloc_device<int>(n, myQueue);

// Main graph which will be updated later
sycl_ext::command_graph mainGraph(myQueue);

// Record the kernels to mainGraph, using ptrA
mainGraph.begin_recording(myQueue);
run_kernels(ptrA, myQueue);
mainGraph.end_recording();

auto execMainGraph = mainGraph.finalize({sycl_ext::property::graph::updatable});

// Execute execMainGraph
myQueue.ext_oneapi_graph(execMainGraph);

// Record a second graph which records the same kernels, but using ptrB instead
sycl_ext::command_graph updateGraph(myQueue);
updateGraph.begin_recording(myQueue);
run_kernels(ptrB, myQueue);
updateGraph.end_recording();

// Update execMainGraph using updateGraph. We do not need to finalize
// updateGraph (this would be expensive)
execMainGraph.update(updateGraph);

// Execute execMainGraph again, which will now be operating on ptrB instead of
// ptrA
myQueue.ext_oneapi_graph(execMainGraph);
----

== Future Direction [[future-direction]]

This section contains both features of the specification which have been
Expand Down Expand Up @@ -2185,89 +2346,6 @@ if all the commands accessing this buffer use `access_mode::write` or the
Note, however, that these cases require the application to disable copy-back
as described in <<buffer-limitations, Buffer Limitations>>.

==== Whole Graph Update

A graph in the executable state can have each nodes inputs & outputs updated
using the `command_graph::update()` method. This takes a graph in the
modifiable state and updates the executable graph to use the node input &
outputs of the modifiable graph, a technique called _Whole Graph Update_. The
modifiable graph must have the same topology as the graph originally used to
create the executable graphs, with the nodes targeting the same devices and
added in the same order.
If a graph has been updated since its last submission, the sequential
execution constraint is no longer required.
The automatic addition of dependencies is disabled and updated graphs
can be submitted simultaneously.
Users are therefore responsible for explicitly managing potential dependencies
between these executions to avoid data races.

:sycl-kernel-function: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sycl-kernel-function

Table {counter: tableNumber}. Member functions of the `command_graph` class (executable graph update).
[cols="2a,a"]
|===
|Member function|Description

|
[source, c++]
----
void
update(const command_graph<graph_state::modifiable>& graph);
----


|Updates the executable graph node inputs & outputs from a topologically
identical modifiable graph. A topologically identical graph is one with the
same structure of nodes and edges, and the nodes added in the same order to
both graphs. Equivalent nodes in topologically identical graphs each have the
same command, targeting the same device. There is the additional limitation that
to update an executable graph, every node in the graph must be either a kernel
command or a host task.

The only characteristic that can differ between two topologically identical
graphs during an update are the arguments to kernel nodes. For example,
the graph may capture different values for the USM pointers or accessors used
in the graph. It is these kernels arguments in `graph` that constitute the
inputs & outputs to update to.

Differences in the following characteristics between two graphs during an
update results in undefined behavior:

* Modifying the native C++ callable of a `host task` node.
* Modifying the {sycl-kernel-function}[kernel function] of a kernel node.

The effects of the update will be visible on the next submission of the
executable graph without the need for additional user synchronization.

Constraints:

* This member function is only available when the `command_graph` state is
`graph_state::executable`.

Parameters:

* `graph` - Modifiable graph object to update graph node inputs & outputs with.
This graph must have the same topology as the original graph used on
executable graph creation.

Exceptions:

* Throws synchronously with error code `invalid` if the topology of `graph` is
not the same as the existing graph topology, or if the nodes were not added in
the same order.

:handler-copy-functions: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#table.members.handler.copy

* Throws synchronously with error code `invalid` if `graph` contains any node
which is not a kernel command or host task, e.g.
{handler-copy-functions}[memory operations].

* Throws synchronously with error code `invalid` if the context or device
associated with `graph` does not match that of the `command_graph` being
updated.

|===

=== Features Still in Development

==== Memory Allocation Nodes
Expand Down Expand Up @@ -2331,11 +2409,11 @@ runtime.

=== Update More Command Types

Support updating arguments to types of nodes other that kernel execution
Support updating arguments to types of nodes other than kernel execution
commands.

**UNRESOLVED** Should be added for at least memory copy nodes, however
full scope of support needs to be designed and implemented.
**UNRESOLVED** Should be added for at least memory copy nodes and host-tasks.
However, the full scope of support needs to be designed and implemented.

=== Updatable Property Graph Resubmission
