@@ -117,7 +117,7 @@ is run repeatedly for different inputs, this is particularly useful.
In order to achieve the goals described in previous sections, the following
requirements were considered:

- 1. Ability to update inputs/outputs of the graph between submissions, without
+ 1. Ability to update parameters of the graph between submissions, without
changing the overall graph structure.
2. Enable low effort porting of existing applications to use the extension.
3. Profiling, debugging, and tracing functionality at the granularity of graph
@@ -206,7 +206,7 @@ Table {counter: tableNumber}. Values of the `SYCL_EXT_ONEAPI_GRAPH` macro.
|1 |Initial extension version. Base features are supported.
|===

- === SYCL Graph Terminology
+ === SYCL Graph Terminology [[terminology]]

Table {counter: tableNumber}. Terminology.
[%header,cols="1,3"]
@@ -646,6 +646,7 @@ public:

void update(node& node);
void update(const std::vector<node>& nodes);
+ void update(const command_graph<graph_state::modifiable>& graph);
};

} // namespace sycl::ext::oneapi::experimental
@@ -703,14 +704,14 @@ graph LR

A graph in the executable state can have the configuration of its nodes modified
using a concept called graph _update_. This avoids a user having to rebuild and
- finalize a new executable graph when only the inputs & outputs to a graph change
+ finalize a new executable graph when only the parameters of graph nodes change
between submissions.

Updates to a graph will be scheduled after any in-flight executions of the same
graph and will not affect previous submissions of the same graph. The user is
not required to wait on any previous submissions of a graph before updating it.

- The only type of nodes that are currently supported for updating in a graph are
+ The only type of nodes that are currently able to be updated in a graph are
kernel execution nodes.

The aspects of a kernel execution node that can be configured during update are:
@@ -742,9 +743,9 @@ The other node configuration that can be updated is the execution range of the
kernel, this can be set through `node::update_nd_range()` or
`node::update_range()` but does not require any prior registration.

- These updated nodes can then be passed to
- `command_graph<graph_state::executable>::update()` which will update the
- executable graph with the current state of the nodes.
+ The executable graph can then be updated by passing the updated nodes to
+ `command_graph<graph_state::executable>::update(node& node)` or
+ `command_graph<graph_state::executable>::update(const std::vector<node>& nodes)`.

Since the structure of the graph became fixed when finalizing, updating
parameters on a node will not change the already defined dependencies between
@@ -761,6 +762,51 @@ dynamic parameter for the buffer can be registered with all the nodes which
use the buffer as a parameter. Then a single `dynamic_parameter::update()` call
will maintain the graphs data dependencies.

+ ===== Whole Graph Update [[whole-graph-update]]
+
+ A graph in the executable state can have all of its nodes updated using the
+ `command_graph<graph_state::executable>::update(graph)` method. This method
+ takes a source graph in the modifiable state and updates the nodes in the target
+ executable state graph to reflect any changes made to the nodes in the source
+ graph. The characteristics which will be updated are detailed in the section on
+ <<executable-graph-update, Executable Graph Update>>.
+
+ Both the source and target graphs for the update must satisfy the following
+ conditions:
+
+ * Both graphs must have been created with the same device and context.
+ * Both graphs must be topologically identical. The graphs are considered
+ topologically identical when:
+
+ ** Both graphs must have the same number of nodes and edges.
+ ** Internal edges must be between corresponding nodes in each graph.
+ ** Nodes must be added in the same order in the two graphs. Nodes may be added
+ via `command_graph::add`, or for a recorded queue via `queue::submit` or
+ queue shortcut functions.
+ ** Corresponding nodes in each graph must be kernels that have the same type:
+
+ *** When the kernel is defined as a lambda, the lambda must be the same.
+ *** When the kernel is defined as a named function object, the kernel class
+ must be the same.
+ *** When the kernel is defined as a plain function, the function must be the
+ same.
+
+ ** Edge dependencies for each node in the two graphs must be created in the
+ same order by using the same API invocation to create each edge. See
+ the <<terminology, terminology section>> for an exhaustive definition of
+ how edges are defined in a graph for each of the two graph construction
+ APIs.
+
+ Attempting to use whole-graph update with source or target graphs which do not
+ satisfy the conditions of topological identity results in undefined behaviour,
+ as it may prevent the runtime from pairing nodes in the source and target
+ graphs.
+
+ It is valid to use nodes that contain dynamic parameters in whole graph updates.
+ If a node containing a dynamic parameter is updated through the whole graph
+ update API, then any previous updates to the dynamic parameter will be reflected
+ in the new graph.
+
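The topological-identity conditions listed in this hunk can be illustrated with a small model in plain C++. This is only a sketch under stated assumptions: `ModelNode`, `ModelGraph`, `addNode`, and `topologicallyIdentical` are hypothetical names invented here and are not part of the SYCL extension, and a kernel's identity is reduced to a string. The extension does not require the runtime to perform such a check; violating the conditions is undefined behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a graph node; not part of the SYCL extension API.
struct ModelNode {
  std::string kernelName;          // stands in for the lambda, functor class, or function
  std::vector<std::size_t> preds;  // predecessor indices, in edge-creation order
};

// Nodes are stored in the order they were added to the graph.
using ModelGraph = std::vector<ModelNode>;

// Appends a node and returns its index. Dependencies must already exist.
std::size_t addNode(ModelGraph& g, std::string kernel,
                    std::vector<std::size_t> preds) {
  for (std::size_t p : preds) {
    assert(p < g.size() && "dependency must refer to an earlier node");
  }
  g.push_back({std::move(kernel), std::move(preds)});
  return g.size() - 1;
}

// Models the conditions: same node count, same kernel at each position,
// and the same edges created in the same order between corresponding nodes.
bool topologicallyIdentical(const ModelGraph& source, const ModelGraph& target) {
  if (source.size() != target.size()) return false;
  for (std::size_t i = 0; i < source.size(); ++i) {
    if (source[i].kernelName != target[i].kernelName) return false;
    if (source[i].preds != target[i].preds) return false;  // order-sensitive
  }
  return true;
}
```

Because nodes are compared position by position, two graphs that hold the same kernels and edges but were built in a different order are treated as not identical in this model, mirroring the requirement that nodes and edges be created in the same order.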
==== Graph Properties [[graph-properties]]

===== No-Cycle-Check Property
@@ -1169,6 +1215,58 @@ Exceptions:
`property::graph::updatable` was not set when the executable graph was created.
* Throws with error code `invalid` if any node in `nodes` is not part of the
graph.
+
+ |
+ [source, c++]
+ ----
+ void
+ update(const command_graph<graph_state::modifiable>& source);
+ ----
+
+ |Updates all of the nodes in the target graph with parameters from a
+ topologically identical source graph in the modifiable state. The full
+ definition of what constitutes a topologically identical graph can be found in
+ the <<whole-graph-update, Whole-Graph Update>> section. Violating any of these
+ topology requirements results in undefined behaviour.
+
+ The characteristics in the executable graph which will be updated are detailed
+ in the section on <<executable-graph-update, Executable Graph Update>>.
+
+ It is not an error to update an executable graph such that all parameters of
+ nodes in `source` are identical to the arguments of the executable graph prior to
+ the update.
+
+ The implementation may perform a blocking wait during this call on
+ any in-flight executions of that same graph if required by the backend.
+
+ This function may only be called if the graph was created with the `updatable`
+ property.
+
+ Constraints:
+
+ * This member function is only available when the `command_graph` state is
+ `graph_state::executable`.
+
+ Parameters:
+
+ * `source` - Modifiable graph object used as the source for updating this graph.
+
+ Exceptions:
+
+ * Throws synchronously with error code `invalid` if `source` contains any node
+ which is not one of the following types:
+
+ ** `node_type::empty`
+ ** `node_type::ext_oneapi_barrier`
+ ** `node_type::kernel`
+
+ * Throws synchronously with error code `invalid` if the context or device
+ associated with `source` does not match that of the `command_graph` being
+ updated.
+
+ * Throws synchronously with error code `invalid` if
+ `property::graph::updatable` was not set when the executable graph was
+ created.
|===

Table {counter: tableNumber}. Member functions of the `command_graph` class for
@@ -2135,6 +2233,69 @@ node nodeA = myGraph.add([&](handler& cgh) {
dynParamAccessor.update(bufferB.get_access());
----

+ === Whole Graph Update
+
+ Example that shows recording and updating several nodes with different
+ parameters using <<whole-graph-update, Whole Graph Update>>.
+
+ [source,c++]
+ ----
+ ...
+ using namespace sycl;
+ namespace sycl_ext = sycl::ext::oneapi::experimental;
+
+ // Enqueue several kernels which use inputPtr
+ void run_kernels(int* inputPtr, queue syclQueue){
+   event eventA = syclQueue.submit([&](handler& CGH){
+     CGH.parallel_for(...);
+   });
+   event eventB = syclQueue.submit([&](handler& CGH){
+     CGH.depends_on(eventA);
+     CGH.parallel_for(...);
+   });
+   syclQueue.submit([&](handler& CGH){
+     CGH.depends_on(eventB);
+     CGH.parallel_for(...);
+   });
+ }
+
+ ...
+
+ queue myQueue;
+
+ // USM allocations
+ const size_t n = 1024;
+ int *ptrA = malloc_device<int>(n, myQueue);
+ int *ptrB = malloc_device<int>(n, myQueue);
+
+ // Main graph which will be updated later
+ sycl_ext::command_graph mainGraph(myQueue);
+
+ // Record the kernels to mainGraph, using ptrA
+ mainGraph.begin_recording(myQueue);
+ run_kernels(ptrA, myQueue);
+ mainGraph.end_recording();
+
+ // The updatable property is required for a later call to update()
+ auto execMainGraph = mainGraph.finalize({sycl_ext::property::graph::updatable{}});
+
+ // Execute execMainGraph
+ myQueue.ext_oneapi_graph(execMainGraph);
+
+ // Record a second graph which records the same kernels, but using ptrB instead
+ sycl_ext::command_graph updateGraph(myQueue);
+ updateGraph.begin_recording(myQueue);
+ run_kernels(ptrB, myQueue);
+ updateGraph.end_recording();
+
+ // Update execMainGraph using updateGraph. We do not need to finalize
+ // updateGraph (this would be expensive)
+ execMainGraph.update(updateGraph);
+
+ // Execute execMainGraph again, which will now be operating on ptrB instead of
+ // ptrA
+ myQueue.ext_oneapi_graph(execMainGraph);
+ ----
+
== Future Direction [[future-direction]]

This section contains both features of the specification which have been
@@ -2195,89 +2356,6 @@ if all the commands accessing this buffer use `access_mode::write` or the
Note, however, that these cases require the application to disable copy-back
as described in <<buffer-limitations, Buffer Limitations>>.

- ==== Whole Graph Update
-
- A graph in the executable state can have each nodes inputs & outputs updated
- using the `command_graph::update()` method. This takes a graph in the
- modifiable state and updates the executable graph to use the node input &
- outputs of the modifiable graph, a technique called _Whole Graph Update_. The
- modifiable graph must have the same topology as the graph originally used to
- create the executable graphs, with the nodes targeting the same devices and
- added in the same order.
- If a graph has been updated since its last submission, the sequential
- execution constraint is no longer required.
- The automatic addition of dependencies is disabled and updated graphs
- can be submitted simultaneously.
- Users are therefore responsible for explicitly managing potential dependencies
- between these executions to avoid data races.
-
- :sycl-kernel-function: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sycl-kernel-function
-
- Table {counter: tableNumber}. Member functions of the `command_graph` class (executable graph update).
- [cols="2a,a"]
- |===
- |Member function|Description
-
- |
- [source, c++]
- ----
- void
- update(const command_graph<graph_state::modifiable>& graph);
- ----
-
-
- |Updates the executable graph node inputs & outputs from a topologically
- identical modifiable graph. A topologically identical graph is one with the
- same structure of nodes and edges, and the nodes added in the same order to
- both graphs. Equivalent nodes in topologically identical graphs each have the
- same command, targeting the same device. There is the additional limitation that
- to update an executable graph, every node in the graph must be either a kernel
- command or a host task.
-
- The only characteristic that can differ between two topologically identical
- graphs during an update are the arguments to kernel nodes. For example,
- the graph may capture different values for the USM pointers or accessors used
- in the graph. It is these kernels arguments in `graph` that constitute the
- inputs & outputs to update to.
-
- Differences in the following characteristics between two graphs during an
- update results in undefined behavior:
-
- * Modifying the native C++ callable of a `host task` node.
- * Modifying the {sycl-kernel-function}[kernel function] of a kernel node.
-
- The effects of the update will be visible on the next submission of the
- executable graph without the need for additional user synchronization.
-
- Constraints:
-
- * This member function is only available when the `command_graph` state is
- `graph_state::executable`.
-
- Parameters:
-
- * `graph` - Modifiable graph object to update graph node inputs & outputs with.
- This graph must have the same topology as the original graph used on
- executable graph creation.
-
- Exceptions:
-
- * Throws synchronously with error code `invalid` if the topology of `graph` is
- not the same as the existing graph topology, or if the nodes were not added in
- the same order.
-
- :handler-copy-functions: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#table.members.handler.copy
-
- * Throws synchronously with error code `invalid` if `graph` contains any node
- which is not a kernel command or host task, e.g.
- {handler-copy-functions}[memory operations].
-
- * Throws synchronously with error code `invalid` if the context or device
- associated with `graph` does not match that of the `command_graph` being
- updated.
-
- |===
-
=== Features Still in Development

==== Memory Allocation Nodes
@@ -2341,11 +2419,11 @@ runtime.

=== Update More Command Types

- Support updating arguments to types of nodes other that kernel execution
+ Support updating arguments to types of nodes other than kernel execution
commands.

- **UNRESOLVED** Should be added for at least memory copy nodes, however
- full scope of support needs to be designed and implemented.
+ **UNRESOLVED** Should be added for at least memory copy nodes and host-tasks.
+ However, the full scope of support needs to be designed and implemented.

=== Updatable Property Graph Resubmission