Currently the controller depended on reading the CR status;
update the controller so that it does not depend on the CR status.
Signed-off-by: parth-gr <[email protected]>
2) With every run calculate different device class highest osd filled percentage.
62
-
63
-
OsdPercentage:
61
+
2) On every run, calculate the highest OSD fill percentage, osdCount, and osdSize for each device class.
62
+
63
+
OsdInfo:
64
64
65
-
ssd 69%
66
-
nvme 71%
65
+
ssd{
66
+
osdPercentage: 69%
67
+
osdCount: 3
68
+
osdSize: 100Gi
69
+
}
70
+
nvme{
71
+
osdPercentage: 71%
72
+
osdCount: 3
73
+
osdSize: 200Gi
74
+
}
67
75
...
68
76
69
-
3) Create a sync map with `OsdPercentage`, Send a event to the go channel.
77
+
3) Create a sync map with `OsdInfo` and send an event to the Go channel.
70
78
71
79
Controller:
72
80
73
81
1) Create a named controller that watches for [channel generic event](https://book-v1.book.kubebuilder.io/beyond_basics/controller_watches) per device class.
74
82
75
-
2) If the expansion is not in progress set the status `phase` to `NotStarted`
83
+
2) Set the status `phase` to `NotStarted` if no expansion has been triggered (status.phase=="").
84
+
85
+
3) Check if the expansion is in progress:
86
+
87
+
1) Load the actualOsdCount and actualOsdSize from the syncMap.
88
+
89
+
2) Load the desiredOsdCount and desiredOsdSize from the Storagecluster.
90
+
91
+
3) If (actualOsdSize!=desiredOsdSize && actualOsdCount!=desiredOsdCount), the expansion is in progress.
76
92
77
-
3) If an expansion is in progress(expectedOsdSize!=startOsdSize || expectedOsdCount!=startOsdCount), check the progress and then requeue each 1 minute until the expansion is completed successfully(jump to step 11).
93
+
4) Check the progress, then requeue every 1 minute until the expansion completes successfully (jump to step 11).
94
+
95
+
5) If no expansion is in progress, proceed with the further steps.
78
96
79
97
4) If the LSO storageclass is detected in the storageClassDeviceSet, raise a warning and do not reconcile further.
80
98
@@ -102,6 +120,10 @@ Controller:
102
120
103
121
1) Verify against the Storagecluster whether the new OSDs have been added or scaled in size, for all the device sets.
104
122
123
+
1) For vertical scaling, query the OSD size from Prometheus and match it with storagecluster.spec..size.
124
+
125
+
2) For horizontal scaling, query the OSD count from Prometheus and match it with storagecluster.spec..count.
126
+
105
127
2) If the scaling is successful, update the status of the `StorageAutoScaling` CR with `lastExpansionCompletionTime`, `phase`, and the OSD count and size.
106
128
107
129
3) If the auto scale is not completed, requeue every 1 min, and change the phase to `Failed` if the scaling has not `Succeeded` within the timeoutSeconds (default: 1800) interval.
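
The verification in step 1 can be sketched as a comparison between what Prometheus reports and what the StorageCluster spec requests. This is only an illustration: the `Observed` and `Desired` types, the `ScaleKind` values, and `scalingVerified` are hypothetical names, sizes are assumed to be already converted to bytes, and exact equality is assumed to be the matching rule.

```go
package main

import "fmt"

// ScaleKind distinguishes the two expansion modes named in the design.
type ScaleKind int

const (
	Vertical   ScaleKind = iota // the OSD size grows
	Horizontal                  // the OSD count grows
)

// Observed is a hypothetical shape for the values queried from Prometheus.
type Observed struct {
	OsdCount     int
	OsdSizeBytes int64
}

// Desired is a hypothetical shape for the values requested in the
// StorageCluster spec for one device set.
type Desired struct {
	OsdCount     int
	OsdSizeBytes int64
}

// scalingVerified reports whether the observed state matches the spec
// for the given kind of scaling. Exact equality is an assumption.
func scalingVerified(kind ScaleKind, obs Observed, want Desired) bool {
	switch kind {
	case Vertical:
		return obs.OsdSizeBytes == want.OsdSizeBytes
	case Horizontal:
		return obs.OsdCount == want.OsdCount
	default:
		return false
	}
}

func main() {
	const gi = int64(1) << 30
	want := Desired{OsdCount: 6, OsdSizeBytes: 200 * gi}

	// Size already matches the spec, count does not yet.
	obs := Observed{OsdCount: 3, OsdSizeBytes: 200 * gi}

	fmt.Println("vertical verified:", scalingVerified(Vertical, obs, want))
	fmt.Println("horizontal verified:", scalingVerified(Horizontal, obs, want))
}
```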
@@ -113,19 +135,31 @@ Controller:
113
135
Based on the above algorithm, there are two conditions in which in-progress is set. These conditions are elaborated below:
114
136
115
137
1) If scaling is just started:
138
+
116
139
1) Set `phase` to `InProgress`.
117
-
2) Verify is the scaling is successful.
140
+
141
+
2) Verify if the scaling is successful.
142
+
118
143
3) If the scaling is successful set the `phase` to `Succeeded`.
144
+
119
145
4) Alert the user if the phase changes to `Succeeded`; alerting will be implemented with ocs-metrics-exporter.
146
+
120
147
5) If the scaling is not yet completed, requeue every 1 min; this leads to the 2nd case.
121
148
122
149
2) If the scaling has already started and this is a requeue:
150
+
123
151
1) Now the requeue will happen every 1 min.
152
+
124
153
2) At the start of reconcile, check that `startOsdSize` and `expectedOsdSize` are not equal, and similarly for the OSD count.
154
+
125
155
3) As another validation, compare the storagecluster spec with the Prometheus response.
156
+
126
157
4) Requeue for as long as the scaling is in progress.
158
+
127
159
5) If the scaling stays in progress for longer than the timeoutSeconds (default: 1800) interval, set the phase to `Failed`.
160
+
128
161
6) Alert the user if the phase changes to `Failed`; alerting will be implemented with ocs-metrics-exporter.
162
+
129
163
7) If there is a failure alert, provide a mitigation guide for the user.