autoscaler: update storageautoscaler design doc

parth-gr · parth-gr · commit 6323f076d961 · 2025-03-27T13:25:10.000+05:30
Currently the controller depends on reading the CR status.
Update the controller so that it no longer depends on the CR status.
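
A rough sketch of the new in-progress check from the design doc, in Go.
The `osdState` type and the function names are hypothetical and are not
taken from the controller code. The timeout transition to `Failed` is
left out, and sizes are assumed to be in bytes.

```go
package main

import "fmt"

// osdState is a hypothetical snapshot of one device class:
// number of OSDs and the size of each OSD in bytes.
type osdState struct {
	Count int
	Size  int64
}

// expansionInProgress compares what Prometheus reports (actual) with what
// the StorageCluster spec asks for (desired). No CR status is consulted.
func expansionInProgress(actual, desired osdState) bool {
	return actual.Size < desired.Size || actual.Count < desired.Count
}

// nextPhase applies steps 2 and 5.1 of the design doc:
// an empty phase becomes NotStarted, and an InProgress phase becomes
// Succeeded once no expansion is in progress any more.
func nextPhase(current string, inProgress bool) string {
	if current == "" {
		current = "NotStarted"
	}
	if !inProgress && current == "InProgress" {
		return "Succeeded"
	}
	return current
}

func main() {
	actual := osdState{Count: 3, Size: 2 << 40}  // 2 TiB per OSD observed
	desired := osdState{Count: 3, Size: 4 << 40} // 4 TiB per OSD requested

	inProgress := expansionInProgress(actual, desired)
	fmt.Println(inProgress, nextPhase("InProgress", inProgress))

	// Once the observed OSDs match the spec, the phase can move on.
	fmt.Println(nextPhase("InProgress", expansionInProgress(desired, desired)))
}
```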

Signed-off-by: parth-gr <partharora1010@gmail.com>
diff --git a/docs/design/storage-auto-scale.md b/docs/design/storage-auto-scale.md
@@ -29,7 +29,7 @@ Implement auto storage scale CR per device class of a specific Storagecluster.
         maxOsdSize: 8Ti
         timeoutSeconds: 1800   
     status:
-        phase: (NotStarted | InProgress | Succeeded | Failed)
+        phase: ("" | NotStarted | InProgress | Succeeded | Failed)
         error:
             message: ""
             timestamp: <time>
@@ -59,7 +59,7 @@ Go-routine Scraper:
     Query: (ceph_osd_metadata * on (ceph_daemon, namespace, managedBy) group_right(device_class,hostname) (ceph_osd_stat_bytes_used / ceph_osd_stat_bytes))
 
 2) With every run calculate different device class highest osd filled percentage.
-   
+
         OsdPercentage:
         
         ssd               69%
@@ -72,36 +72,54 @@ Controller:
 
 1) Create a named controller that watches for [channel generic event](https://book-v1.book.kubebuilder.io/beyond_basics/controller_watches) per device class.
 
-2) If the expansion is not in progress set the status `phase` to `NotStarted` 
+2) If no expansion has been triggered yet (`status.phase == ""`), set the status `phase` to `NotStarted`.
+   
+3) Check if the expansion is in progress:
+   
+   1) Query actualOsdCount and actualOsdSize from Prometheus.
+   
+   2) Calculate the desiredOsdCount and desiredOsdSize from the Storagecluster.
+   
+   3) If (actualOsdSize < desiredOsdSize || actualOsdCount < desiredOsdCount), the expansion is in progress.
    
-3) If an expansion is in progress(expectedOsdSize!=startOsdSize || expectedOsdCount!=startOsdCount), check the progress and then requeue each 1 minute until the expansion is completed successfully(jump to step 11).
+4) If the expansion is in progress, check the progress and requeue every 1 minute until the expansion completes successfully (jump to step 13).
    
-4) If the LSO storageclass is detected in the storageClassDeviceSet, raise a warning and do not recocnile further.
+5) If no expansion is in progress:
    
-5) If the highest osd percentage reported in the sync map is more than osdScalingThresholdPercent(70%) means reaching osd nearfull, scaling is needed.
+   1) Check whether `status.phase == "InProgress"`; if yes, move the status `phase` to `Succeeded`.
 
-6) If scaling is needed calculate the `expectedOsdSize` and `expectedOsdCount`.
+   2) Proceed with further steps.
+
+6) If the LSO storageclass is detected in the storageClassDeviceSet, raise a warning and do not reconcile further.
+   
+7) If the highest osd percentage reported in the sync map exceeds osdScalingThresholdPercent (70%), the osds are approaching nearfull and scaling is needed.
+
+8) If scaling is needed calculate the `expectedOsdSize` and `expectedOsdCount`.
        
     1)  If the Osd size is less than maxOsdSize(default:8Tib), do vertical scaling by doubling the each osd sizes for that device class.
             
     2)  If the Osd size is equal to maxOsdSize(default:8Tib), do a horizontal scaling, by adding 1 osd of maxOsdSize(default:8Tib) on each `storageDeviceSet`.
    
-7) Calculate the `expectedStorageCapacity` based on expected size and count.
+9)  Calculate the `expectedStorageCapacity` based on expected size and count.
    
-8) Check if the `storageCapacityLimit` > `expectedStorageCapacity`.
+10) Check if the `storageCapacityLimit` > `expectedStorageCapacity`.
         
     1) If yes, Update `phase` to `InProgress`  and `lastExpansionStartTime` as `current-time` on the `StorageAutoScaling` CR which need scaling.
         
     2) If no, don't reconcile further and the set the `storageCapacityLimitReached` as true in the status.
 
-9)  Update the status, set the `expectedOsdSize` and `expectedOsdCount` to reflect the new expected value and also set `startOsdSize` and `startOsdCount` with current storagecluster values.
+11) Update the status, set the `expectedOsdSize` and `expectedOsdCount` to reflect the new expected value and also set `startOsdSize` and `startOsdCount` with current storagecluster values.
    
-10) Scale by patching the Storagecluster, with all the device sets update needed at the same time.
+12) Scale by patching the Storagecluster, applying all the needed device set updates at the same time.
 
-11) Verify and Alert:
+13) Verify and Alert:
    
     1) Verify the Storagecluster whether the new osds are added or scaled in size, for all the device sets.
     
+       1) For vertical scaling, query the osd size from Prometheus and match it with storagecluster.spec..size.
+    
+       2) For horizontal scaling, query the osd count from Prometheus and match it with storagecluster.spec..count.
+    
     2) If the scaling is successful will update the status of the `StorageAutoScaling` CR with `lastExpansionCompletionTime` and `phase` and also osd count and size.
     
     3) If the auto scale is not completed, it will do a requeue every 1 min and, change the phase to `failed` if scaling not `Succeeded` with in timeoutSeconds(default:1800) interval.
@@ -113,19 +131,31 @@ Controller:
 Based on the above algorithm there would be two conditions where in-progress is set, elaborating those conditions,
 
 1) If scaling is just started:
+
     1) Set `phase` to `InProgress`.
-    2) Verify is the scaling is successful. 
+   
+    2) Verify whether the scaling is successful.
+
     3) If the scaling is successful set the `phase` to `Succeeded`.
+
     4) Alert the user if the phase changes to `Succeeded`, alerting will be implemented with ocs-metrics-exporter.
+   
     5) If the scaling is not yet completed requeue every 1 min, we have the 2nd case. 
 
 2) If the scaling has already started and its requeue
+
     1) Now the requeue will happen every 1 min.
+
     2) At the start of reconcile will match that `startOsdSize` and `expectedOsdSize` is not equal and similar for osd count.
+
     3) And another validation will do is equating storagecluster spec with prometheus response.
+
     4) Will requeue till the scaling is in-progress.
+
     5) If the scaling is in-progress with more than timeoutSeconds(default:1800) interval we set the phase to `failed`.
+
     6) Alert the user if the phase changes to `Failed`, alerting will be implemented with ocs-metrics-exporter. 
+
     7) If there as a failure alert, provide a mitigation guide for the user. 
 
 ## Failure case