| 1 | +--- |
| 2 | +title: Error handling and retries |
| 3 | +weight: 46 |
| 4 | +--- |
| 5 | + |
| 6 | +## Automatic Retries on Error |
| 7 | + |
| 8 | +JOSDK will schedule an automatic retry of the reconciliation whenever an exception is thrown by |
| 9 | +your `Reconciler`. The retry behavior is configurable, but a default implementation covering |
| 10 | +most typical use cases is provided; see |
| 11 | +[GenericRetry](https://github.com/java-operator-sdk/java-operator-sdk/blob/master/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/retry/GenericRetry.java) |
| 12 | +. |
| 13 | + |
| 14 | +```java |
| 15 | + GenericRetry.defaultLimitedExponentialRetry() |
| 16 | + .setInitialInterval(5000) |
| 17 | + .setIntervalMultiplier(1.5D) |
| 18 | + .setMaxAttempts(5); |
| 19 | +``` |
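To get a feel for the schedule such a configuration produces, the following self-contained sketch computes the delay before each retry. It assumes that each delay is the previous one multiplied by the interval multiplier, truncated to whole milliseconds, and it ignores any maximum-interval cap. `BackoffSchedule` and `delays` are hypothetical names used for illustration only; they are not part of JOSDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JOSDK: models an exponential backoff schedule.
class BackoffSchedule {

  // Returns the delay (in milliseconds) that precedes each retry attempt.
  // Assumption: delay_n = delay_(n-1) * multiplier, truncated to a long,
  // with no maximum-interval cap applied.
  static List<Long> delays(long initialIntervalMs, double multiplier, int maxAttempts) {
    List<Long> result = new ArrayList<>();
    long current = initialIntervalMs;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      if (attempt > 0) {
        current = (long) (current * multiplier);
      }
      result.add(current);
    }
    return result;
  }

  public static void main(String[] args) {
    // Same values as the configuration snippet: 5000 ms, multiplier 1.5, 5 attempts.
    System.out.println(delays(5000, 1.5, 5));
  }
}
```

Under these assumptions, the values from the snippet above give delays of 5000, 7500, 11250, 16875 and 25312 ms, so all five retries are spent after roughly 66 seconds.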
| 20 | + |
| 21 | +You can also configure the default retry behavior using the `@GradualRetry` annotation. |
| 22 | + |
| 23 | +It is possible to provide a custom implementation using the `retry` field of the |
| 24 | +`@ControllerConfiguration` annotation and specifying the class of your custom implementation. |
| 25 | +Note that this class will need to provide an accessible no-arg constructor for automated |
| 26 | +instantiation. Additionally, your implementation can be automatically configured from an |
| 27 | +annotation that you can provide by having your `Retry` implementation implement the |
| 28 | +`AnnotationConfigurable` interface, parameterized with your annotation type. See the |
| 29 | +`GenericRetry` implementation for more details. |
| 30 | + |
| 31 | +Information about the current retry state is accessible from |
| 32 | +the [Context](https://github.com/java-operator-sdk/java-operator-sdk/blob/master/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/Context.java) |
| 33 | +object. Of particular interest is the `isLastAttempt` method, which allows your |
| 34 | +`Reconciler` to behave differently on the final retry, for example by setting an error message |
| 35 | +in your resource's status. |
| 36 | + |
| 37 | +Note, though, that reaching the retry limit won't prevent new events from being processed. New |
| 38 | +reconciliations will happen for new events as usual. However, if an error occurs that |
| 39 | +would normally trigger a retry, the SDK won't schedule one, since the retry limit |
| 40 | +has already been reached. |
| 41 | + |
| 42 | +A successful execution resets the retry state. |
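The bookkeeping described above can be sketched as a small state model. This is not JOSDK's implementation: `RetryStateModel` and its methods are hypothetical, and the sketch assumes that the initial reconciliation has attempt count 0 and that an attempt is the last one once the count reaches the configured limit.

```java
// Hypothetical model of the retry bookkeeping, not JOSDK's implementation.
class RetryStateModel {

  private final int maxAttempts;
  private int attemptCount = 0; // 0 means the initial reconciliation, not a retry

  RetryStateModel(int maxAttempts) {
    this.maxAttempts = maxAttempts;
  }

  int getAttemptCount() {
    return attemptCount;
  }

  // Assumption: the current attempt is the last one once the limit is reached.
  boolean isLastAttempt() {
    return attemptCount >= maxAttempts;
  }

  // Called after a failed execution; returns true if a retry would be scheduled.
  boolean onFailure() {
    if (isLastAttempt()) {
      return false; // limit reached: the failure is not retried
    }
    attemptCount++;
    return true;
  }

  // A successful execution resets the retry state.
  void onSuccess() {
    attemptCount = 0;
  }

  public static void main(String[] args) {
    RetryStateModel state = new RetryStateModel(2);
    state.onFailure(); // schedules retry 1
    state.onFailure(); // schedules retry 2, which is the last attempt
    System.out.println(state.isLastAttempt());
    state.onSuccess(); // success clears the retry state
    System.out.println(state.getAttemptCount());
  }
}
```

In this model, a failure after the limit is reached leaves the state untouched, which mirrors the behavior described above: new events are still reconciled, but no further retry is scheduled until a successful execution resets the state.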
| 43 | + |
| 44 | +### Setting Error Status After Last Retry Attempt |
| 45 | + |
| 46 | +To facilitate error reporting, your `Reconciler` can implement the |
| 47 | +[ErrorStatusHandler](https://github.com/java-operator-sdk/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/ErrorStatusHandler.java) |
| 48 | +interface: |
| 49 | + |
| 50 | +```java |
| 51 | +public interface ErrorStatusHandler<P extends HasMetadata> { |
| 52 | + |
| 53 | + ErrorStatusUpdateControl<P> updateErrorStatus(P resource, Context<P> context, Exception e); |
| 54 | + |
| 55 | +} |
| 56 | +``` |
| 57 | + |
| 58 | +The `updateErrorStatus` method is called whenever an exception is thrown from the `Reconciler`. It is |
| 59 | +called right after the reconciler execution, even if no retry policy is configured. |
| 60 | +`RetryInfo.getAttemptCount()` is zero after the first reconciliation attempt, since it is not a |
| 61 | +result of a retry (regardless of whether a retry policy is configured or not). |
| 62 | + |
| 63 | +`ErrorStatusUpdateControl` tells the SDK what to do and how to perform the status update on |
| 64 | +the primary resource, which is always performed as a status sub-resource request. Note that |
| 65 | +this update request will also produce an event, and will result in a reconciliation if the |
| 66 | +controller is not generation aware. |
| 67 | + |
| 68 | +This feature is only available for the `reconcile` method of the `Reconciler` interface, since |
| 69 | +there should not be updates to resources that have been marked for deletion. |
| 70 | + |
| 71 | +Retry can be skipped in cases of unrecoverable errors: |
| 72 | + |
| 73 | +```java |
| 74 | + ErrorStatusUpdateControl.patchStatus(customResource).withNoRetry(); |
| 75 | +``` |
| 76 | + |
| 77 | +### Correctness and Automatic Retries |
| 78 | + |
| 79 | +While it is possible to deactivate automatic retries, this is not desirable except for very |
| 80 | +specific reasons. Errors naturally occur, whether it be transient network errors or conflicts |
| 81 | +when a given resource is handled by a `Reconciler` but is modified at the same time by a user in |
| 82 | +a different process. Automatic retries handle these cases nicely and will usually result in a |
| 83 | +successful reconciliation. |
| 84 | + |
| 85 | +## Retry, Rescheduling and Event Handling Common Behavior |
| 86 | + |
| 87 | +Retry, reschedule and standard event processing form a relatively complex system, each of these |
| 88 | +functionalities interacting with the others. In the following, we describe the interplay of |
| 89 | +these features: |
| 90 | + |
| 91 | +1. A successful execution resets a retry and the rescheduled executions which were present before |
| 92 | + the reconciliation. However, a new rescheduling can be instructed from the reconciliation |
| 93 | + outcome (`UpdateControl` or `DeleteControl`). |
| 94 | + |
| 95 | + For example, if a reconciliation had previously been rescheduled after some amount of time, but an event triggered |
| 96 | + the reconciliation (or cleanup) in the meantime, the scheduled execution would be automatically cancelled. In other |
| 97 | + words, rescheduling a reconciliation does not guarantee that one will occur exactly at that time; it only guarantees |
| 98 | + that one will occur at that time at the latest, triggering one if no event from the cluster triggered one earlier. |
| 99 | + Of course, it's always possible to reschedule a new reconciliation at the end of that "automatic" reconciliation. |
| 100 | + |
| 101 | + Similarly, if a retry was scheduled, any event from the cluster triggering a successful execution in the meantime |
| 102 | + would cancel the scheduled retry (because there's no point in retrying something that already succeeded). |
| 103 | + |
| 104 | +2. If an exception is thrown, a retry is initiated. However, if an event is received in the |
| 105 | + meantime, it will be reconciled instantly, and this execution won't count as a retry attempt. |
| 106 | +3. If the retry limit is reached (so no more automatic retries would happen) but a new event is |
| 107 | + received, the reconciliation will still happen, but it won't reset the retry and will still be |
| 108 | + marked as the last attempt in the retry info. Point (1) still holds, but in case of an |
| 109 | + error, no retry will happen. |
| 110 | + |
| 111 | +The thing to keep in mind when it comes to retrying or rescheduling is that JOSDK tries to avoid unnecessary work. When |
| 112 | +you reschedule an operation, you instruct JOSDK to perform that operation at the latest by the end of the rescheduling |
| 113 | +delay. If something occurs on the cluster that triggers that particular operation (reconciliation or cleanup) before |
| 114 | +then, JOSDK considers that there's no point in attempting that operation again at the end of the specified delay. |
| 115 | +The same idea also applies to retries. |