-
Notifications
You must be signed in to change notification settings - Fork 1.6k
GH-2116: Add blocking retries to RT #2124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cc49f7e
to
09b063f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I will merge this for 2.8.3; need to think it through some more. As I understand the logic, it will always go through blocking retries first and then through non-blocking (assuming default config). I think we need the reverse logic - i.e. don't retry (blocking) for any exceptions unless explicitly set by the user; I think that will need changes in the DLPR.
Yes, I thought about this too. I think we could solve this by allowing the |
We'd then have something like:
|
And just to be clear, with this PR as is the default behavior would stay the same, with the no ops BackOff policy. Only if the user provides their own BackOff policy the blocking retries would kick in. |
Right; I get that; I just have a problem with the default behavior if you just add a back off. Thanks for all your work on these issues, by the way. |
Sure, no worries, I'm glad we were able to go through them in time for this release, thanks. Let me know if there's anything else I can help with for this issue. We can close this PR and open another one when we have a clearer picture if you prefer. If nothing more pressing comes up I should start working on improving the way RT's components are configured in general, if that's ok. So hopefully users will have a nice and clean API to use these features instead of all current non-intuitive verbosity. I'll open an issue when I have a draft so we can all take a look into it before actual implementation starts. |
I am ok with this PR, I think we just need a modification to |
Here's a proposal to support classifying exceptions as false by default... |
Correction: garyrussell@63c5a2f |
In preparation for spring-projects#2124
In preparation for spring-projects#2124
Option to not retry by default and retry specific. * Polishing and Tests In preparation for #2124
Option to not retry by default and retry specific. * Polishing and Tests In preparation for #2124
With that improvement in the ExceptionClassifier I can resume working on this, thanks. Should have something later today for your appreciation. |
Before we hardcoded a no-ops back off in the DefaultErrorHandler used in the Retryable Topics feature. Adds a setter to let the user provide their own back off policy and configure blocking retries in conjunction with RT.
09b063f
to
8f1d58a
Compare
With this we no longer need a no ops back off. Some minor adjustments were needed to maintain behavior when the logic gets to DLPR.
8f1d58a
to
85cab3f
Compare
Now that we have the DHE set to default false, we no longer need the No Ops back off for the retryable topics - the user can just add Retryable exceptions and use the DHE default one. Since the logic for when the back off policy has no retries and the logic for when the classification returns false are a bit different, I had to make some minor adjustments to other classes so that the DLPR partition checking behavior stays the same. As far as I could check this adjustment should have no impact besides enabling partition checking for users that use DLPR with exception classification that returns false, which I think should be a good thing for consistency and easy to disable if needed since there's a flag for that. Of course, if that's a problem I can revert to using the NoOps back off. Thanks |
With this we no longer need a no ops back off. Some minor adjustments were needed to maintain behavior when the logic gets to DLPR.
7e86fb8
to
a7e069b
Compare
I'll just add a small improvement to the API and the docs. |
* Set a {@link BackOff} to be used with blocking retries. | ||
* You can specify the exceptions to be retried using the method | ||
* {@link org.springframework.kafka.listener.ExceptionClassifier#addRetryableExceptions(Class[])} | ||
* By default, no exceptions are retried via blocking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add something like "If the backoff execution returns STOP (retries are exhausted), the record will then go to the next retry topic (if present)."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think the aggregated behavior of blocking and non-blocking can be better documented in general. I'll see what I can come up with.
protected DefaultErrorHandler createDefaultErrorHandlerInstance(DeadLetterPublishingRecoverer deadLetterPublishingRecoverer) { | ||
return this.providedBlockingBackOff != null | ||
? new DefaultErrorHandler(deadLetterPublishingRecoverer, this.providedBlockingBackOff) | ||
: new DefaultErrorHandler(deadLetterPublishingRecoverer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't right; the default fixed back off is (0, 9), we need (0, 0).
Why not just set the field to new FixedBackOff(0, 0);
? We then don't need this test. Rename the field too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, considering all exceptions will return false when classified by default, it doesn't really matter what's the backoff policy - we can just leave the default one and then the user would have only to add the retryable exceptions, which would then make the default or provided back off kick in.
Anyway, we can set the no ops back off too, no worries. Either way I think the other classes modifications to pass the consumer to DLPR are still needed, since we might go that route for exception classifications that return false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh; Ok; makes sense to leave it.
Improved the documentation and made a slight improvement to the API. |
Now retryable exceptions can be set directly in the lcfc class. Improved the docs on how to combine blocking and non-blocking behaviors. Added what's new entry for this feature.
de984b5
to
29ced81
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Thanks. Just one question.
@@ -185,7 +186,8 @@ public boolean removeNotRetryableException(Class<? extends Exception> exceptionT | |||
* @see #addNotRetryableExceptions(Class...) | |||
* @see #setClassifications(Map, boolean) | |||
*/ | |||
public boolean removeClassification(Class<? extends Exception> exceptionType) { | |||
@Nullable | |||
public Boolean removeClassification(Class<? extends Exception> exceptionType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This change is because as is if we try to remove an exception that is not there we get a NPE for trying to convert null to a boolean. With this change it'll return null and the user can deal with that as they wish.
I for one thought it would return false
for not finding the exception in the map, but turns out it returns the current value, or null if it's not found.
* <p>Returns the value to which this map previously associated the key, * or {@code null} if the map contained no mapping for the key.
It's not something extremely necessary, I can roll it back if you prefer. Or maybe it's worth adding some explanation to the javadocs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops; thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should change the javadoc to @return the classification of the exception if the removal was successful; null otherwise.
I can do it during the merge if you like; but I have to run out for a short while.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that, no worries, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the javadoc and added two assertions to the LCFC new methods
Also add assertions to the LCFC new methods to warn the user if they already set the blocking configurations.
ef4668d
to
ce9656b
Compare
* GH-2116: Add blocking retries to RT Before we hardcoded a no-ops back off in the DefaultErrorHandler used in the Retryable Topics feature. Adds a setter to let the user provide their own back off policy and configure blocking retries in conjunction with RT. * Change DHE in LCFC to defaultFalse With this we no longer need a no ops back off. Some minor adjustments were needed to maintain behavior when the logic gets to DLPR. * Change DHE in LCFC to defaultFalse With this we no longer need a no ops back off. Some minor adjustments were needed to maintain behavior when the logic gets to DLPR. * Improve API and docs Now retryable exceptions can be set directly in the lcfc class. Improved the docs on how to combine blocking and non-blocking behaviors. Added what's new entry for this feature. * Improve ExceptionClassifier JavaDoc Also add assertions to the LCFC new methods to warn the user if they already set the blocking configurations.
...and cherry-picked as a7621a1 |
Before we hardcoded a no-ops back off in the DefaultErrorHandler used in the Retryable Topics feature.
Adds a setter to let the user provide their own back off policy and configure blocking retries in conjunction with RT.
I also thought of making this configuration possible on a per-topic basis by adding it to the builder and annotation, but I think this is enough for this release since we won't have much time to review. We can open an issue for that if that's the case
and I can provide a PR soon.