intel · martygrant · Jul 3, 2024 · Apr 2, 2024 · Apr 4, 2024 · May 22, 2024
@@ -117,7 +117,7 @@ is run repeatedly for different inputs, this is particularly useful.
 In order to achieve the goals described in previous sections, the following
 requirements were considered:
 
-1. Ability to update inputs/outputs of the graph between submissions, without
+1. Ability to update parameters of the graph between submissions, without
    changing the overall graph structure.
 2. Enable low effort porting of existing applications to use the extension.
 3. Profiling, debugging, and tracing functionality at the granularity of graph
@@ -206,7 +206,7 @@ Table {counter: tableNumber}. Values of the `SYCL_EXT_ONEAPI_GRAPH` macro.
 |1     |Initial extension version. Base features are supported.
 |===
 
-=== SYCL Graph Terminology
+=== SYCL Graph Terminology [[terminology]]
 
 Table {counter: tableNumber}. Terminology.
 [%header,cols="1,3"]
@@ -646,6 +646,7 @@ public:
 
     void update(node& node);
     void update(const std::vector<node>& nodes);
+    void update(const command_graph<graph_state::modifiable>& graph);
 };
 
 }  // namespace sycl::ext::oneapi::experimental
@@ -703,14 +704,14 @@ graph LR
 
 A graph in the executable state can have the configuration of its nodes modified
 using a concept called graph _update_. This avoids a user having to rebuild and
-finalize a new executable graph when only the inputs & outputs to a graph change
+finalize a new executable graph when only the parameters of graph nodes change
 between submissions.
 
 Updates to a graph will be scheduled after any in-flight executions of the same
 graph and will not affect previous submissions of the same graph. The user is
 not required to wait on any previous submissions of a graph before updating it.
 
-The only type of nodes that are currently supported for updating in a graph are
+The only nodes that can currently be updated in a graph are
 kernel execution nodes.
 
 The aspects of a kernel execution node that can be configured during update are:
@@ -742,9 +743,9 @@ The other node configuration that can be updated is the execution range of the
 kernel, this can be set through `node::update_nd_range()` or
 `node::update_range()` but does not require any prior registration.
 
-These updated nodes can then be passed to 
-`command_graph<graph_state::executable>::update()` which will update the
-executable graph with the current state of the nodes.
+The executable graph can then be updated by passing the updated nodes to
+`command_graph<graph_state::executable>::update(node& node)` or
+`command_graph<graph_state::executable>::update(const std::vector<node>& nodes)`.
 
 Since the structure of the graph became fixed when finalizing, updating
 parameters on a node will not change the already defined dependencies between
@@ -761,6 +762,51 @@ dynamic parameter for the buffer can be registered with all the nodes which
 use the buffer as a parameter. Then a single `dynamic_parameter::update()` call
 will maintain the graphs data dependencies.
 
+===== Whole Graph Update [[whole-graph-update]]
+
+A graph in the executable state can have all of its nodes updated using the
+`command_graph<graph_state::executable>::update(graph)` method. This method
+takes a source graph in the modifiable state and updates the nodes in the target
+executable graph to reflect any changes made to the nodes in the source
+graph. The characteristics which will be updated are detailed in the section on
+<<executable-graph-update, Executable Graph Update>>.
+
+Both the source and target graphs for the update must satisfy the following
+conditions:
+
+* Both graphs must have been created with the same device and context.
+* Both graphs must be topologically identical. Two graphs are topologically
+  identical when all of the following requirements are met:
+
+** Both graphs must have the same number of nodes and edges.
+** Internal edges must be between corresponding nodes in each graph.
+** Nodes must be added in the same order in the two graphs. Nodes may be added
+   via `command_graph::add`, or for a recorded queue via `queue::submit` or
+   queue shortcut functions.
+** Corresponding nodes in each graph must be kernels that have the same type:
+
+*** When the kernel is defined as a lambda, the lambda must be the same.
+*** When the kernel is defined as a named function object, the kernel class
+    must be the same.
+*** When the kernel is defined as a plain function, the function must be the
+    same.
+
+** Edge dependencies for each node in the two graphs must be created in the
+   same order by using the same API invocation to create each edge. See
+   the <<terminology, terminology section>> for an exhaustive description of
+   how edges are defined in a graph for each of the two graph construction
+   APIs.
+
+Attempting to use whole-graph update with source or target graphs which do not
+satisfy the conditions of topological identity results in undefined behaviour,
+as it may prevent the runtime from pairing nodes in the source and target
+graphs.
+
+It is valid to use nodes that contain dynamic parameters in whole graph updates.
+If a node containing a dynamic parameter is updated through the whole graph
+update API, then any previous updates to the dynamic parameter will be reflected
+in the new graph.
+
 ==== Graph Properties [[graph-properties]]
 
 ===== No-Cycle-Check Property
@@ -1167,6 +1213,58 @@ Exceptions:
   `property::graph::updatable` was not set when the executable graph was created.
 * Throws with error code `invalid` if any node in `nodes` is not part of the
   graph.
+
+|
+[source, c++]
+----
+void
+update(const command_graph<graph_state::modifiable>& source);
+----
+
+|Updates all of the nodes in the target graph with parameters from a
+topologically identical source graph in the modifiable state. The full
+definition of what constitutes a topologically identical graph can be found in
+the <<whole-graph-update, Whole Graph Update>> section. Violating any of these
+topology requirements results in undefined behaviour.
+
+The characteristics in the executable graph which will be updated are detailed
+in the section on <<executable-graph-update, Executable Graph Update>>.
+
+It is not an error to update an executable graph such that all parameters of
+nodes in `source` are identical to the arguments of the executable graph prior to
+the update.
+
+The implementation may perform a blocking wait during this call on
+any in-flight executions of that same graph if required by the backend.
+
+This function may only be called if the graph was created with the `updatable`
+property.
+
+Constraints:
+
+* This member function is only available when the `command_graph` state is
+  `graph_state::executable`.
+
+Parameters:
+
+* `source` - Modifiable graph object used as the source for updating this graph.
+
+Exceptions:
+
+* Throws synchronously with error code `invalid` if `source` contains any node
+  which is not one of the following types:
+
+** `node_type::empty`
+** `node_type::ext_oneapi_barrier`
+** `node_type::kernel`
+
+* Throws synchronously with error code `invalid` if the context or device
+  associated with `source` does not match that of the `command_graph` being
+  updated.
+
+* Throws synchronously with error code `invalid` if
+  `property::graph::updatable` was not set when the executable graph was
+  created.
 |===
 
 Table {counter: tableNumber}. Member functions of the `command_graph` class for
@@ -2125,6 +2223,69 @@ node nodeA = myGraph.add([&](handler& cgh) {
 dynParamAccessor.update(bufferB.get_access());
 ----
 
+=== Whole Graph Update
+
+Example that shows recording and updating several nodes with different
+parameters using <<whole-graph-update, Whole Graph Update>>.
+
+[source,c++]
+----
+...
+using namespace sycl;
+namespace sycl_ext = sycl::ext::oneapi::experimental;
+
+// Enqueue several kernels which use inputPtr
+void run_kernels(int* inputPtr, queue syclQueue){
+  event eventA = syclQueue.submit([&](handler& CGH){
+    CGH.parallel_for(...);
+  });
+  event eventB = syclQueue.submit([&](handler& CGH){
+    CGH.depends_on(eventA);
+    CGH.parallel_for(...);
+  });
+  syclQueue.submit([&](handler& CGH){
+    CGH.depends_on(eventB);
+    CGH.parallel_for(...);
+  });
+}
+
+...
+
+queue myQueue;
+
+// USM allocations
+const size_t n = 1024;
+int *ptrA = malloc_device<int>(n, myQueue);
+int *ptrB = malloc_device<int>(n, myQueue);
+
+// Main graph which will be updated later
+sycl_ext::command_graph mainGraph(myQueue);
+
+// Record the kernels to mainGraph, using ptrA
+mainGraph.begin_recording(myQueue);
+run_kernels(ptrA, myQueue);
+mainGraph.end_recording();
+
+auto execMainGraph = mainGraph.finalize({sycl_ext::property::graph::updatable{}});
+
+// Execute execMainGraph
+myQueue.ext_oneapi_graph(execMainGraph);
+
+// Record a second graph which records the same kernels, but using ptrB instead
+sycl_ext::command_graph updateGraph(myQueue);
+updateGraph.begin_recording(myQueue);
+run_kernels(ptrB, myQueue);
+updateGraph.end_recording();
+
+// Update execMainGraph using updateGraph. We do not need to finalize
+// updateGraph (this would be expensive)
+execMainGraph.update(updateGraph);
+
+// Execute execMainGraph again, which will now be operating on ptrB instead of
+// ptrA
+myQueue.ext_oneapi_graph(execMainGraph);
+----
+
 == Future Direction [[future-direction]]
 
 This section contains both features of the specification which have been
@@ -2185,89 +2346,6 @@ if all the commands accessing this buffer use `access_mode::write` or the
 Note, however, that these cases require the application to disable copy-back
 as described in <<buffer-limitations, Buffer Limitations>>.
 
-==== Whole Graph Update
-
-A graph in the executable state can have each nodes inputs & outputs updated
-using the `command_graph::update()` method. This takes a graph in the
-modifiable state and updates the executable graph to use the node input &
-outputs of the modifiable graph, a technique called _Whole Graph Update_. The
-modifiable graph must have the same topology as the graph originally used to
-create the executable graphs, with the nodes targeting the same devices and
-added in the same order.
-If a graph has been updated since its last submission, the sequential 
-execution constraint is no longer required.
-The automatic addition of dependencies is disabled and updated graphs 
-can be submitted simultaneously.
-Users are therefore responsible for explicitly managing potential dependencies
-between these executions to avoid data races.
-
-:sycl-kernel-function: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sycl-kernel-function
-
-Table {counter: tableNumber}. Member functions of the `command_graph` class (executable graph update).
-[cols="2a,a"]
-|===
-|Member function|Description
-
-|
-[source, c++]
-----
-void
-update(const command_graph<graph_state::modifiable>& graph);
-----
-
-
-|Updates the executable graph node inputs & outputs from a topologically
-identical modifiable graph. A topologically identical graph is one with the
-same structure of nodes and edges, and the nodes added in the same order to
-both graphs. Equivalent nodes in topologically identical graphs each have the
-same command, targeting the same device. There is the additional limitation that
-to update an executable graph, every node in the graph must be either a kernel
-command or a host task.
-
-The only characteristic that can differ between two topologically identical
-graphs during an update are the arguments to kernel nodes. For example,
-the graph may capture different values for the USM pointers or accessors used
-in the graph. It is these kernels arguments in `graph` that constitute the
-inputs & outputs to update to.
-
-Differences in the following characteristics between two graphs during an
-update results in undefined behavior:
-
-* Modifying the native C++ callable of a `host task` node.
-* Modifying the {sycl-kernel-function}[kernel function] of a kernel node.
-
-The effects of the update will be visible on the next submission of the
-executable graph without the need for additional user synchronization.
-
-Constraints:
-
-* This member function is only available when the `command_graph` state is
-  `graph_state::executable`.
-
-Parameters:
-
-* `graph` - Modifiable graph object to update graph node inputs & outputs with.
-  This graph must have the same topology as the original graph used on
-  executable graph creation.
-
-Exceptions:
-
-* Throws synchronously with error code `invalid` if the topology of `graph` is
-  not the same as the existing graph topology, or if the nodes were not added in
-  the same order.
-
-:handler-copy-functions: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#table.members.handler.copy
-
-* Throws synchronously with error code `invalid` if `graph` contains any node
-  which is not a kernel command or host task, e.g.
-  {handler-copy-functions}[memory operations].
-
-* Throws synchronously with error code `invalid` if the context or device
-  associated with `graph` does not match that of the `command_graph` being
-  updated.
-
-|===
-
 === Features Still in Development
 
 ==== Memory Allocation Nodes
@@ -2331,11 +2409,11 @@ runtime.
 
 === Update More Command Types
 
-Support updating arguments to types of nodes other that kernel execution
+Support updating arguments to types of nodes other than kernel execution
 commands.
 
-**UNRESOLVED** Should be added for at least memory copy nodes, however
-full scope of support needs to be designed and implemented.
+**UNRESOLVED** Should be added for at least memory copy nodes and host tasks.
+However, the full scope of support needs to be designed and implemented.
 
 === Updatable Property Graph Resubmission