|
| 1 | +# NodeAgentConfig configuration with restic/kopia |
| 2 | +Date: 2023-07-25 |
| 3 | + |
| 4 | +## Abstract |
| 5 | + |
| 6 | +This design introduces a new configuration option for file system backup and restore in OADP that lets users choose either Restic or Kopia as the uploader type.
| 7 | + |
| 8 | +## Background |
| 9 | + |
| 10 | +For file system backup and restore, Velero may use either Restic or Kopia as the uploader mechanism. The Velero community [performed](https://velero.io/docs/main/performance-guidance/) a number of tests to compare the two. In many cases the Kopia uploader performs considerably better, so it should be supported by the OADP operator.
| 11 | + |
| 12 | +Please refer to the upstream [kopia uploader integration design](https://github.com/vmware-tanzu/velero/blob/main/design/unified-repo-and-kopia-integration/unified-repo-and-kopia-integration.md) for underlying Velero design and backup & restore workflow. |
| 13 | + |
| 14 | +The upstream [kopia](https://github.com/kopia/kopia) project is used by Velero and, as this design proposes, configured by OADP.
| 15 | + |
| 16 | +## Goals |
| 17 | + |
| 18 | +- A new option `nodeAgentConfig` to allow configuration of restic or kopia uploader |
| 19 | +- Backwards compatibility with the current OADP configuration schema options, namely `restic` |
| 20 | +- Preparation for the future deprecation of the current `restic` configuration option (from OADP 1.4+) |
| 21 | +- Allow new schema(s) to be used by the datamover node agent |
| 22 | +- Enablement of the datamover node agent
| 23 | +- Deprecation of the `restic` configuration option |
| 24 | +- New environment option `FS_PV_HOSTPATH` that is used as a replacement for `RESTIC_PV_HOSTPATH`. See [Compatibility](#compatibility) section for more details. |
| 25 | +- Removal of the `restic-restore-action-config` ConfigMap with direct replacement by `fs-restore-action-config`. See [Compatibility](#compatibility) section for more details. |
| 26 | + |
| 27 | +## Non Goals |
| 28 | +- Removal of the `restic` configuration option |
| 29 | +- Removal of the `RESTIC_PV_HOSTPATH` environment option |
| 30 | +- Support for downgrading the OADP operator once the new configuration options are in use
| 31 | +- E2E tests for the `kopia` or `restic` uploader. They should be added in the near future to cover DPA testing of the new fields. Backup/restore E2E tests that exercise both kopia and restic (and eventually datamover) through this new struct are also needed.
| 32 | + |
| 33 | +## High-Level Design |
| 34 | + |
| 35 | +Because the new `nodeAgentConfig` configuration option is a sibling of the existing `restic` option, a new common structure, `NodeAgentCommonFields`, will be created. It has exactly the same fields as the current `ResticConfig` and will be used by both `Restic` and the new `NodeAgentConfig`. The only difference between `NodeAgentConfig` and `Restic` is the addition of one `UploaderType` option to `NodeAgentConfig`, whose value is either `kopia` or `restic`.
| 36 | +
| 37 | +When `nodeAgentConfig` is used, the `UploaderType` option is required, so the user must select either `kopia` or `restic`.
| 38 | + |
| 39 | +## Detailed Design |
| 40 | + |
| 41 | + |
| 42 | +### New data structures |
| 43 | + |
| 44 | +A new structure will be added that embeds `NodeAgentCommonFields` inline and extends it with one new parameter, `UploaderType`:
| 45 | + |
| 46 | +```go |
| 47 | +type NodeAgentConfig struct { |
| 48 | + // Embedding NodeAgentCommonFields |
| 49 | + // +optional |
| 50 | + NodeAgentCommonFields `json:",inline"` |
| 51 | + |
| 52 | + // The type of uploader to transfer the data of pod volumes, the supported values are 'restic' or 'kopia' |
| 53 | + // +kubebuilder:validation:Enum=restic;kopia |
| 54 | + // +kubebuilder:validation:Required |
| 55 | + UploaderType string `json:"uploaderType"` |
| 56 | +} |
| 57 | +``` |
| 58 | + |
| 59 | +The `NodeAgentCommonFields` structure mirrors the current `Restic` structure field for field:
| 60 | + |
| 61 | +```go |
| 62 | +type NodeAgentCommonFields struct { |
| 63 | + // enable defines a boolean pointer whether we want the daemonset to |
| 64 | + // exist or not |
| 65 | + // +optional |
| 66 | + Enable *bool `json:"enable,omitempty"` |
| 67 | + // supplementalGroups defines the linux groups to be applied to the NodeAgent Pod |
| 68 | + // +optional |
| 69 | + SupplementalGroups []int64 `json:"supplementalGroups,omitempty"` |
| 70 | + // timeout defines the NodeAgent timeout, default value is 1h |
| 71 | + // +optional |
| 72 | + Timeout string `json:"timeout,omitempty"` |
| 73 | + // Pod specific configuration |
| 74 | + PodConfig *PodConfig `json:"podConfig,omitempty"` |
| 75 | +} |
| 76 | +``` |
| 77 | + |
| 78 | +The current `Restic` structure will embed `NodeAgentCommonFields` without any additional options or changes.
| 79 | +
| 80 | +The above `NodeAgentConfig` is a member of `ApplicationConfig`, which already includes `ResticConfig`. We do not replace `ResticConfig`. Instead, for backwards compatibility, we add a new `NodeAgentConfig` parameter alongside it:
| 81 | + |
| 82 | +```go |
| 83 | +// ApplicationConfig defines the configuration for the Data Protection Application |
| 84 | +type ApplicationConfig struct { |
| 85 | + Velero *VeleroConfig `json:"velero,omitempty"` |
| 86 | + // (deprecation warning) ResticConfig is the configuration for restic server. |
| 87 | + // Restic is for backwards compatibility and is replaced by the nodeAgentConfig |
| 88 | + // Restic will be removed with the OADP 1.4 |
| 89 | + // +kubebuilder:deprecatedversion:warning=1.3 |
| 90 | + // +optional |
| 91 | + Restic *ResticConfig `json:"restic,omitempty"` |
| 92 | + |
| 93 | + // NodeAgentConfig is needed to allow selection between kopia or restic |
| 94 | + // +kubebuilder:validation:Optional |
| 95 | + NodeAgent *NodeAgentConfig `json:"nodeAgentConfig,omitempty"` |
| 96 | +} |
| 97 | +``` |
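The embedding described above can be illustrated with a minimal, self-contained sketch. It assumes plain `encoding/json` decoding, trims `NodeAgentCommonFields` to three fields (`PodConfig` is omitted), and uses a hypothetical helper `decodeNodeAgent` that is not part of OADP. With `encoding/json`, the fields of an embedded struct are promoted to the outer object, so `uploaderType` sits next to `enable` and `timeout` in the serialized form.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NodeAgentCommonFields is trimmed for brevity; PodConfig is omitted here.
type NodeAgentCommonFields struct {
	Enable             *bool   `json:"enable,omitempty"`
	SupplementalGroups []int64 `json:"supplementalGroups,omitempty"`
	Timeout            string  `json:"timeout,omitempty"`
}

// NodeAgentConfig embeds the common fields and adds UploaderType.
type NodeAgentConfig struct {
	NodeAgentCommonFields `json:",inline"`
	UploaderType          string `json:"uploaderType"`
}

// decodeNodeAgent is a hypothetical helper used only for this illustration.
func decodeNodeAgent(raw []byte) (NodeAgentConfig, error) {
	var cfg NodeAgentConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	raw := []byte(`{"enable": true, "timeout": "2h", "uploaderType": "kopia"}`)
	cfg, err := decodeNodeAgent(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// Embedded fields are promoted, so cfg.Timeout is reachable directly.
	fmt.Println(*cfg.Enable, cfg.Timeout, cfg.UploaderType)
}
```

Because both `Restic` and `NodeAgentConfig` embed the same struct, code that only needs the shared fields can accept a `NodeAgentCommonFields` value regardless of which option the user set.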
| 98 | + |
| 99 | +### Configuration YAML options |
| 100 | + |
| 101 | +The `nodeAgentConfig` and `restic` configuration options will share the same schema as the current `restic` option, with one additional `uploaderType` field under `nodeAgentConfig`. The part of the YAML that shows the additions:
| 102 | + |
| 103 | +```yaml |
| 104 | +configuration: |
| 105 | + description: configuration is used to configure the data protection application's server config |
| 106 | + properties: |
| 107 | + |
| 108 | + nodeAgentConfig: |
| 109 | + description: NodeAgent is needed to allow selection between kopia or restic |
| 110 | + properties: |
| 111 | + |
| 112 | + [...] // <Same as all current restic configuration options> |
| 113 | + |
| 114 | + uploaderType: |
| 115 | + description: The type of uploader to transfer the data of pod volumes, the supported values are 'restic' or 'kopia' |
| 116 | + enum: |
| 117 | + - restic |
| 118 | + - kopia |
| 119 | + type: string |
| 120 | + type: object |
| 121 | + |
| 122 | + restic: |
| 123 | + description: (deprecation warning) Restic is for backwards compatibility and will be removed with the OADP 1.4+. Use nodeAgentConfig instead. |
| 124 | + properties: |
| 125 | + |
| 126 | + [...] // <Same as all current restic configuration options> |
| 127 | + |
| 128 | +``` |
| 129 | + |
| 130 | +### Validations |
| 131 | + |
| 132 | +In OADP 1.3 the operator must reject a configuration that sets both `restic` and `nodeAgentConfig`. Using the two options together is an error state.
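The rule above could be enforced with a check along the following lines. This is only a sketch under stated assumptions: the structs are reduced to what the check needs, and `validateUploaderConfig` and `errBothConfigured` are hypothetical names, not existing OADP functions. The enum check repeats what the kubebuilder `Enum` marker already enforces at admission time and is included only for completeness.

```go
package main

import (
	"errors"
	"fmt"
)

// Reduced stand-ins for the structures described in this design.
type ResticConfig struct{ Enable *bool }

type NodeAgentConfig struct {
	Enable       *bool
	UploaderType string
}

type ApplicationConfig struct {
	Restic    *ResticConfig
	NodeAgent *NodeAgentConfig
}

// errBothConfigured is a hypothetical sentinel error for the conflicting case.
var errBothConfigured = errors.New("restic and nodeAgentConfig must not be used together")

// validateUploaderConfig is a hypothetical validation helper.
func validateUploaderConfig(cfg ApplicationConfig) error {
	if cfg.Restic != nil && cfg.NodeAgent != nil {
		return errBothConfigured
	}
	if cfg.NodeAgent != nil {
		switch cfg.NodeAgent.UploaderType {
		case "restic", "kopia":
			// Supported uploader types.
		default:
			return fmt.Errorf("unsupported uploaderType %q", cfg.NodeAgent.UploaderType)
		}
	}
	return nil
}

func main() {
	conflicting := ApplicationConfig{
		Restic:    &ResticConfig{},
		NodeAgent: &NodeAgentConfig{UploaderType: "kopia"},
	}
	fmt.Println(validateUploaderConfig(conflicting))
}
```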
| 133 | + |
| 134 | +### Deprecation of the `restic` configuration option in OADP 1.3 and its future removal
| 135 | +
| 136 | +The `restic` configuration option will be deprecated in OADP 1.3 and removed in a future OADP release. Restic backup functionality remains fully supported, but restic users are encouraged to switch to the new `nodeAgentConfig` struct so that they are not affected on upgrade when the legacy struct is removed.
| 137 | +
| 138 | +A few alternatives were considered (see [Alternatives Considered](#alternatives-considered)). The deprecation information will be presented to the user in four places:
| 139 | +1. The description of the `restic` property will carry the deprecation warning.
| 140 | +2. If `restic` is used, a `warning` event stating that `restic` is deprecated will appear in the DPA `Events`.
| 141 | +3. If `restic` is used, the application log will contain a `warning` message stating that `restic` is deprecated.
| 142 | +4. The release notes for OADP 1.3 will describe the new `nodeAgentConfig` configuration option that should be used instead of `restic`.
| 143 | + |
| 144 | + |
| 145 | +## Alternatives Considered |
| 146 | + |
| 147 | +- leave `restic` as the only option and do not allow to use `kopia` |
| 148 | +- remove the `restic` and use `kopia` as the default and only available option |
| 149 | + |
| 150 | +### Schema structure |
| 151 | + |
| 152 | +Alternative schema structures for `Restic` and the new `NodeAgentConfig` were considered. However, because these structures may be used directly by other `go` applications outside of the API schema, the decision was to use the structure described in the [New data structures](#new-data-structures) section. The rejected alternatives were:
| 153 | +- Keeping `Restic` as is and including inline into `NodeAgentConfig` |
| 154 | +- Duplicating all the fields from `Restic` within `NodeAgentConfig` |
| 155 | +- Moving all the fields from the `Restic` to the `NodeAgentConfig` and ignoring new field which will appear in the `Restic` |
| 156 | + |
| 157 | +### Deprecation warnings |
| 158 | +To inform users about the deprecation of `restic`, we considered a few additional options:
| 159 | +- Embedding a `warning` within the OpenShift console. This would be the nicest approach because push notifications would appear in the main OCP console, but it would require a larger implementation effort.
| 160 | + |
| 161 | +- Creating custom information on the DPA object itself, so that whenever a user describes an object that uses `restic`, the deprecation warning appears in the `events` section. This option also requires publishing custom `events` from the reconcile function, and it still requires the user to pull the information rather than having it pushed to them.
| 162 | + |
| 163 | +- Using kubebuilder annotation |
| 164 | + ``` |
| 165 | + // +kubebuilder:deprecatedversion:warning=<string> |
| 166 | + ``` |
| 167 | + This annotation does not apply to a single configuration field but to the entire CRD version. That is outside our needs: we do not want to deprecate the entire CRD and create a new version of it, only to rename one field.
| 168 | + |
| 169 | +- Adding an additional Reconcile condition with a warning message
| 170 | + Currently the main DPA reconcile status may have one of two `Reasons`:
| 171 | + ``` |
| 172 | + const ReconciledReasonComplete = "Complete" |
| 173 | + const ReconciledReasonError = "Error" |
| 174 | + ``` |
| 175 | + |
| 176 | + A message is also attached to each of these `Reasons`. For the `Error` reason it is the propagated error message, and for the `Complete` reason it is a single fixed string:
| 177 | + ``` |
| 178 | + const ReconcileCompleteMessage = "Reconcile complete" |
| 179 | + ``` |
| 180 | + |
| 181 | + An additional Reconcile reason with its own message could be added. It would be similar to `Complete`, meaning the operator is functioning properly, but would signal that some messages require user attention.
| 182 | + ``` |
| 183 | + const ReconciledReasonWarning = "Warning" |
| 184 | + ``` |
| 185 | + |
| 186 | + This, however, would require refactoring the `Reconcile` functions, which calls for a separate design document and a separate implementation.
| 187 | + |
| 188 | +## Security Considerations |
| 189 | + |
| 190 | +The enablement of an extra upload mechanism could potentially introduce security implications due to the integration of a new backend, which might contain vulnerabilities. |
| 191 | + |
| 192 | +## Compatibility |
| 193 | + |
| 194 | +### Current `restic` configuration option |
| 195 | + |
| 196 | +This design does not change the current configuration options. It does, however, add a deprecation warning to the current `Restic` schema. The `Restic` configuration option will be removed in OADP 1.4.
| 197 | + |
| 198 | +### RESTIC_PV_HOSTPATH environment option |
| 199 | + |
| 200 | +This design does not remove `RESTIC_PV_HOSTPATH`, which is used to override the default `/var/lib/kubelet/pods` path. It adds the environment variable `FS_PV_HOSTPATH`, which may be used in the same way as `RESTIC_PV_HOSTPATH`.
| 201 | +
| 202 | +If both are set, `RESTIC_PV_HOSTPATH` takes precedence over `FS_PV_HOSTPATH`.
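The precedence between the two variables can be sketched as follows. The helper `resolveHostPath` and the constant `defaultHostPath` are hypothetical names for illustration. Treating an empty value as unset is an assumption of this sketch, not something the design specifies. The lookup function is injected so the logic can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// defaultHostPath is the default mentioned in this design.
const defaultHostPath = "/var/lib/kubelet/pods"

// resolveHostPath is a hypothetical helper. lookup has the same signature as
// os.LookupEnv. RESTIC_PV_HOSTPATH is checked first, so it wins over
// FS_PV_HOSTPATH when both are set.
func resolveHostPath(lookup func(string) (string, bool)) string {
	for _, name := range []string{"RESTIC_PV_HOSTPATH", "FS_PV_HOSTPATH"} {
		// Assumption: an empty value is treated the same as an unset variable.
		if value, ok := lookup(name); ok && value != "" {
			return value
		}
	}
	return defaultHostPath
}

func main() {
	fmt.Println(resolveHostPath(os.LookupEnv))
}
```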
| 203 | + |
| 204 | +### Replacing `restic-restore-action-config` with `fs-restore-action-config` |
| 205 | + |
| 206 | +The `restic-restore-action-config` ConfigMap was [removed](https://github.com/vmware-tanzu/helm-charts/commit/7484d8e8365ab91da698e5b5d6346153cde30af4) in Velero v1.10.0. It should therefore also be removed from OADP and replaced with [fs-restore-action-config](https://github.com/vmware-tanzu/helm-charts/blob/c15d13a916c255c93ef48e8bf9c59a3ba198d5ca/charts/velero/templates/NOTES.txt#L62).
| 207 | + |
| 208 | +## Implementation |
| 209 | + |
| 210 | +Implementation for the new data structure will follow the [New data structures](#new-data-structures) design. |
| 211 | + |
| 212 | +The additional `--uploader-type` argument, which specifies the restic or kopia uploader type, is used in two places:
| 213 | +
| 214 | +- The first is in the call to the following function from the `github.com/vmware-tanzu/velero` package:
| 215 | + ```go |
| 216 | + func Deployment(namespace string, opts ...podTemplateOption) *appsv1.Deployment |
| 217 | + ``` |
| 218 | + |
| 219 | +- The second is in the OADP `pkg/velero/server/args.go`, which is used for testing and development purposes.
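As a rough illustration of how the argument might be assembled, the sketch below builds the `--uploader-type` value from the configured uploader type. `uploaderTypeArg` is a hypothetical helper and does not reflect the actual contents of `pkg/velero/server/args.go`. The leading `server` element stands in for whatever arguments are already present.

```go
package main

import "fmt"

// uploaderTypeArg is a hypothetical helper that turns the configured
// uploader type into a command-line argument, rejecting unknown values.
func uploaderTypeArg(uploaderType string) (string, error) {
	switch uploaderType {
	case "restic", "kopia":
		return "--uploader-type=" + uploaderType, nil
	default:
		return "", fmt.Errorf("unsupported uploader type %q", uploaderType)
	}
}

func main() {
	// "server" is a placeholder for the arguments that already exist.
	args := []string{"server"}
	arg, err := uploaderTypeArg("kopia")
	if err != nil {
		fmt.Println("cannot build arguments:", err)
		return
	}
	args = append(args, arg)
	fmt.Println(args)
}
```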
| 220 | + |
| 221 | +We will also rename the `restic.go` module to `nodeagent.go` and rename the relevant functions within it so that they are more generic.
| 222 | + |
| 223 | +## Open Issues |
| 224 | + |
| 225 | +- n/a |