Aligned spec.md error messages for NodePublishDevice/NodeUnpublishDevice. Clarified some wording

davidz627 · davidz627 · commit e5e86f731e06 · 2018-01-17T18:19:25.000-08:00
diff --git a/spec.md b/spec.md
@@ -1061,9 +1061,9 @@ It is NOT REQUIRED for a controller plugin to implement the `LIST_VOLUMES` capab
 A Node Plugin MUST implement this RPC call if it has `PUBLISH_UNPUBLISH_DEVICE` node capability.
 This RPC is called by the CO when a workload that wants to use the specified volume is placed (scheduled) on a node.
 The Plugin SHALL assume that this RPC will be executed on the node where the volume will be used.
-This RPC MUST be called by the CO once per node, per volume.
-If the corresponding Controller Plugin has `PUBLISH_UNPUBLISH_VOLUME` controller capability and the Node Plugin has `PUBLISH_UNPUBLISH_DEVICE`, then the CO MUST guarantee that this RPC is called after `ControllerPublishVolume` is called for the given volume on the given node and returns a success.
-The CO MUST guarantee that this RPC is called and returns a success before `NodePublishVolume` is called for the given volume on the given node.
+This RPC MUST be called by the CO at most once per node, per volume.
+If the corresponding Controller Plugin has `PUBLISH_UNPUBLISH_VOLUME` controller capability and the Node Plugin has `PUBLISH_UNPUBLISH_DEVICE` capability, then the CO MUST guarantee that this RPC is called after `ControllerPublishVolume` is called for the given volume on the given node and returns a success.
+The CO MUST guarantee that this RPC is called and returns a success before any `NodePublishVolume` is called for the given volume on the given node.
 This operation MUST be idempotent.
 If this RPC failed, or the CO does not know if it failed or not, it MAY choose to call `NodePublishDevice` again, or choose to call `NodeUnpublishDevice`.
 
@@ -1094,24 +1094,31 @@ message NodePublishDeviceRequest {
   VolumeCapability volume_capability = 5;
 }
 
-message NodePublishDeviceResponse {
-  message Result {}
-
-  // One of the following fields MUST be specified.
-  oneof reply {
-    Result result = 1;
-    Error error = 2;
-  }
-}
+message NodePublishDeviceResponse {}
 ```
 
+##### NodePublishDevice Errors
+
+If the plugin is unable to complete the NodePublishDevice call successfully, it MUST return a non-ok gRPC code in the gRPC status.
+If the conditions defined below are encountered, the plugin MUST return the specified gRPC error code.
+The CO MUST implement the specified error recovery behavior when it encounters the gRPC error code.
+
+| Condition | gRPC Code | Description | Recovery Behavior |
+|-----------|-----------|-------------|-------------------|
+| Volume does not exist | 5 NOT_FOUND | Indicates that a volume corresponding to the specified `volume_id` does not exist. | Caller MUST verify that the `volume_id` is correct and that the volume is accessible and has not been deleted before retrying with exponential back off. |
+| Volume published but is incompatible | 6 ALREADY_EXISTS | Indicates that a volume corresponding to the specified `volume_id` has already been published at the specified `global_target_path` but is incompatible with the specified `volume_capability` flag. | Caller MUST fix the arguments before retrying. |
+| Operation pending for volume | 10 ABORTED | Indicates that there is already an operation pending for the specified volume. In general the Cluster Orchestrator (CO) is responsible for ensuring that there is no more than one call "in-flight" per volume at a given time. However, in some circumstances, the CO MAY lose state (for example when the CO crashes and restarts), and MAY issue multiple calls simultaneously for the same volume. The Plugin SHOULD handle this as gracefully as possible, and MAY return this error code to reject secondary calls. | Caller SHOULD ensure that there are no other calls pending for the specified volume, and then retry with exponential back off. |
+| Exceeds capabilities | 9 FAILED_PRECONDITION | Indicates that the CO has exceeded the volume's capabilities because the volume does not have MULTI_NODE capability. | Caller MAY choose to call `ValidateVolumeCapabilities` to validate the volume capabilities, or wait for the volume to be unpublished on the node. |
+
 #### `NodeUnpublishDevice`
 
 A Node Plugin MUST implement this RPC call if it has `PUBLISH_UNPUBLISH_DEVICE` node capability.
 This RPC is a reverse operation of `NodePublishDevice`.
 This RPC MUST undo the work by the corresponding `NodePublishDevice`.
 This RPC SHALL be called by the CO once for each `global_target_path` that was successfully setup via `NodePublishDevice`.
-If the corresponding Controller Plugin has `PUBLISH_UNPUBLISH_VOLUME` controller capability, the CO SHOULD issue all `NodeUnpublishDevice` (as specified above) before calling `ControllerUnpublishVolume` for the given node and the given volume.
+If the corresponding Controller Plugin has `PUBLISH_UNPUBLISH_VOLUME` controller capability and the Node Plugin has `PUBLISH_UNPUBLISH_DEVICE` capability, the CO MUST guarantee that this RPC is called and returns success before calling `ControllerUnpublishVolume` for the given node and the given volume.
+The CO MUST guarantee that this RPC is called after all `NodeUnpublishVolume` calls have been made and have returned success for the given volume on the given node.
+
 The Plugin SHALL assume that this RPC will be executed on the node where the volume is being used.
 
 This RPC is typically called by the CO when the workload using the volume is being moved to a different node, or all the workload using the volume on a node has finished.
@@ -1133,17 +1140,20 @@ message NodeUnpublishDeviceRequest {
   string global_target_path = 3;
 }
 
-message NodeUnpublishDeviceResponse {
-  message Result {}
-
-  // One of the following fields MUST be specified.
-  oneof reply {
-    Result result = 1;
-    Error error = 2;
-  }
-}
+message NodeUnpublishDeviceResponse {}
 ```
 
+##### NodeUnpublishDevice Errors
+
+If the plugin is unable to complete the NodeUnpublishDevice call successfully, it MUST return a non-ok gRPC code in the gRPC status.
+If the conditions defined below are encountered, the plugin MUST return the specified gRPC error code.
+The CO MUST implement the specified error recovery behavior when it encounters the gRPC error code.
+
+| Condition | gRPC Code | Description | Recovery Behavior |
+|-----------|-----------|-------------|-------------------|
+| Volume does not exist | 5 NOT_FOUND | Indicates that a volume corresponding to the specified `volume_id` does not exist. | Caller MUST verify that the `volume_id` is correct and that the volume is accessible and has not been deleted before retrying with exponential back off. |
+| Operation pending for volume | 10 ABORTED | Indicates that there is already an operation pending for the specified volume. In general the Cluster Orchestrator (CO) is responsible for ensuring that there is no more than one call "in-flight" per volume at a given time. However, in some circumstances, the CO MAY lose state (for example when the CO crashes and restarts), and MAY issue multiple calls simultaneously for the same volume. The Plugin SHOULD handle this as gracefully as possible, and MAY return this error code to reject secondary calls. | Caller SHOULD ensure that there are no other calls pending for the specified volume, and then retry with exponential back off. |
+
 #### `NodePublishVolume`
 
 This RPC is called by the CO when a workload that wants to use the specified volume is placed (scheduled) on a node.
@@ -1184,6 +1194,8 @@ message NodePublishVolumeRequest {
   // The path to which the device was mounted by `NodePublishDevice`.
   // It MUST be an absolute path in the root filesystem of the process
   // serving this request.
+  // It MUST be set if the Node Plugin implements the
+  // `PUBLISH_UNPUBLISH_DEVICE` node capability.
   // This is an OPTIONAL field.
   string global_target_path = 4;
 
@@ -1264,6 +1276,8 @@ message NodeUnpublishVolumeRequest {
   // The path to which the device was mounted by `NodePublishDevice`.
   // It MUST be an absolute path in the root filesystem of the process
   // serving this request.
+  // It MUST be set if the Node Plugin implements the
+  // `PUBLISH_UNPUBLISH_DEVICE` node capability.
   // This is an OPTIONAL field.
   string global_target_path = 3;
 
@@ -1400,111 +1414,6 @@ message NodeServiceCapability {
 ##### NodeGetCapabilities Errors
 
 If the plugin is unable to complete the NodeGetCapabilities call successfully, it MUST return a non-ok gRPC code in the gRPC status.
-    string error_description = 2;
-  }
-
-  // `NodePublishDevice` specific error.
-  message NodePublishDeviceError {
-    enum NodePublishDeviceErrorCode {
-      // Default value for backwards compatibility. SHOULD NOT be
-      // returned by Plugins. However, if a Plugin returns a
-      // `NodePublishDeviceErrorCode` code that an older CSI
-      // client is not aware of, the client will see this code (the
-      // default fallback).
-      //
-      // Recovery behavior: Caller SHOULD consider updating CSI client
-      // to match Plugin CSI version.
-      UNKNOWN = 0;
-
-      // Indicates that there is a already an operation pending for the
-      // specified volume. In general the Cluster Orchestrator (CO) is
-      // responsible for ensuring that there is no more than one call
-      // “in-flight” per volume at a given time. However, in some
-      // circumstances, the CO MAY lose state (for example when the CO
-      // crashes and restarts), and MAY issue multiple calls
-      // simultaneously for the same volume. The Plugin, SHOULD handle
-      // this as gracefully as possible, and MAY return this error code
-      // to reject secondary calls.
-      //
-      // Recovery behavior: Caller SHOULD ensure that there are no other
-      // calls pending for the specified volume, and then retry with
-      // exponential back off.
-      OPERATION_PENDING_FOR_VOLUME = 1;
-
-      // Indicates that a volume corresponding to the specified
-      // volume ID does not exist.
-      //
-      // Recovery behavior: Caller SHOULD verify that the volume ID
-      // is correct and that the volume is accessible and has not been
-      // deleted before retrying with exponential back off.
-      VOLUME_DOES_NOT_EXIST = 2;
-
-      UNSUPPORTED_MOUNT_FLAGS = 3;
-      UNSUPPORTED_VOLUME_TYPE = 4;
-      UNSUPPORTED_FS_TYPE = 5;
-      MOUNT_ERROR = 6;
-
-      // Indicates that the specified volume ID is not allowed or
-      // understood by the Plugin. More human-readable information MAY
-      // be provided in the `error_description` field.
-      //
-      // Recovery behavior: Caller MUST fix the volume ID before
-      // retrying.
-      INVALID_VOLUME_ID = 7;
-    }
-
-    NodePublishDeviceErrorCode error_code = 1;
-    string error_description = 2;
-  }
-
-  // `NodeUnpublishDevice` specific error.
-  message NodeUnpublishDeviceError {
-    enum NodeUnpublishDeviceErrorCode {
-      // Default value for backwards compatibility. SHOULD NOT be
-      // returned by Plugins. However, if a Plugin returns a
-      // `NodeUnpublishDeviceErrorCode` code that an older CSI
-      // client is not aware of, the client will see this code (the
-      // default fallback).
-      //
-      // Recovery behavior: Caller SHOULD consider updating CSI client
-      // to match Plugin CSI version.
-      UNKNOWN = 0;
-
-      // Indicates that there is a already an operation pending for the
-      // specified volume. In general the Cluster Orchestrator (CO) is
-      // responsible for ensuring that there is no more than one call
-      // “in-flight” per volume at a given time. However, in some
-      // circumstances, the CO MAY lose state (for example when the CO
-      // crashes and restarts), and MAY issue multiple calls
-      // simultaneously for the same volume. The Plugin, SHOULD handle
-      // this as gracefully as possible, and MAY return this error code
-      // to reject secondary calls.
-      //
-      // Recovery behavior: Caller SHOULD ensure that there are no other
-      // calls pending for the specified volume, and then retry with
-      // exponential back off.
-      OPERATION_PENDING_FOR_VOLUME = 1;
-
-      // Indicates that a volume corresponding to the specified
-      // volume ID does not exist.
-      //
-      // Recovery behavior: Caller SHOULD verify that the volume ID
-      // is correct and that the volume is accessible and has not been
-      // deleted before retrying with exponential back off.
-      VOLUME_DOES_NOT_EXIST = 2;
-
-      UNMOUNT_ERROR = 3;
-
-      // Indicates that the specified volume ID is not allowed or
-      // understood by the Plugin. More human-readable information MAY
-      // be provided in the `error_description` field.
-      //
-      // Recovery behavior: Caller MUST fix the volume ID before
-      // retrying.
-      INVALID_VOLUME_ID = 4;
-    }
-
-    NodeUnpublishDeviceErrorCode error_code = 1;
 
 ## Protocol
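
Illustrative note, not part of the patch: the error tables this commit adds for `NodePublishDevice` and `NodeUnpublishDevice` pair each gRPC status code with a recovery behavior the CO is expected to follow. A minimal sketch of how a CO might dispatch on those codes is shown below. The names `GrpcCode`, `RECOVERY`, `recovery_action`, and `backoff_delays`, the action labels, and the backoff parameters are all hypothetical and do not come from the spec. The numeric values are the standard gRPC status codes, in which `FAILED_PRECONDITION` is 9 and `ABORTED` is 10.

```python
from enum import IntEnum


class GrpcCode(IntEnum):
    """Subset of the standard gRPC status codes used in the error tables."""
    OK = 0
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    FAILED_PRECONDITION = 9
    ABORTED = 10


# Hypothetical action labels; the spec describes the behavior in prose only.
RECOVERY = {
    GrpcCode.NOT_FOUND: "verify_volume_then_retry_with_backoff",
    GrpcCode.ALREADY_EXISTS: "fix_arguments_before_retry",
    GrpcCode.ABORTED: "wait_for_pending_calls_then_retry_with_backoff",
    GrpcCode.FAILED_PRECONDITION: "validate_capabilities_or_wait_for_unpublish",
}


def recovery_action(code: int) -> str:
    """Return the action label for a status code returned by a node RPC."""
    if code == GrpcCode.OK:
        return "none"
    try:
        return RECOVERY[GrpcCode(code)]
    except (ValueError, KeyError):
        # Codes that the tables do not list have no prescribed recovery.
        return "unspecified"


def backoff_delays(base: float = 1.0, factor: float = 2.0, attempts: int = 4):
    """Delays in seconds for retrying with exponential back off (assumed values)."""
    return [base * factor ** i for i in range(attempts)]
```

A CO following this sketch would look up the action for the returned code and, for the two retry-with-backoff actions, sleep for each delay in `backoff_delays()` between attempts. Keeping the mapping in a table mirrors the layout of the spec's error tables, so a new row in the spec becomes one new dictionary entry.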