Combine blocking and non-blocking error handling #2116

ghost · 2022-02-18T10:52:18Z

As previously discussed here on GitHub.

Expected Behavior

A combination of blocking and non-blocking retry should be possible to handle three different types of exceptions:

* Non-blocking (but not fatal) exceptions should be retried.
* Non-blocking (but fatal, e.g. DeserializationException) exceptions should be routed to the DLT directly.
* Blocking exceptions (e.g. a database is currently not available) should not be routed to a retry topic, since all following messages would be routed there as well. The consumer should remain at the current offset.
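
To make the three cases concrete, here is a minimal sketch of the requested routing in plain Java. It deliberately does not use the Spring Kafka API: `Route`, `ErrorRouter`, and the exception types chosen in the usage note are hypothetical names used only for illustration.

```java
import java.util.Set;

// Hypothetical model of the requested behavior; not Spring Kafka's API.
enum Route { RETRY_TOPIC, DEAD_LETTER_TOPIC, BLOCK_AND_RETRY }

final class ErrorRouter {
    private final Set<Class<? extends Exception>> fatal;
    private final Set<Class<? extends Exception>> blocking;

    ErrorRouter(Set<Class<? extends Exception>> fatal,
                Set<Class<? extends Exception>> blocking) {
        this.fatal = Set.copyOf(fatal);
        this.blocking = Set.copyOf(blocking);
    }

    // Decide what should happen to a record whose processing threw e.
    Route route(Exception e) {
        if (matches(blocking, e)) {
            return Route.BLOCK_AND_RETRY;   // stay at the current offset
        }
        if (matches(fatal, e)) {
            return Route.DEAD_LETTER_TOPIC; // retrying cannot help
        }
        return Route.RETRY_TOPIC;           // non-blocking retry
    }

    // True if e is an instance of any listed type (subclasses included).
    private static boolean matches(Set<Class<? extends Exception>> types, Exception e) {
        for (Class<? extends Exception> type : types) {
            if (type.isInstance(e)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, with `IllegalArgumentException` registered as fatal and `java.io.IOException` as blocking, an `IOException` maps to `BLOCK_AND_RETRY`, a `NumberFormatException` (a subclass of `IllegalArgumentException`) maps to `DEAD_LETTER_TOPIC`, and anything else maps to `RETRY_TOPIC`.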

Current Behavior
The @RetryableTopic annotation satisfies the first two bullets, and DefaultErrorHandler can implement the last one. However, the two approaches cannot currently be combined.

Context
@garyrussell mentioned a workaround: stopping the container using the stopImmediate container property. Nevertheless, there are a few blog articles that address handling different error types differently, e.g., this article. A proper way to configure this behavior would therefore be useful.

tomazfernandes · 2022-02-18T20:18:04Z

Thanks for the suggestion @mariecuriiee, that's a good article.

I'm currently working on a couple of issues that should pave the way for this feature to work, by leveraging the existing DefaultErrorHandler we already set for the RetryableTopics. So hopefully we'll have this implemented soon.

Previously, we hardcoded a no-op back off in the DefaultErrorHandler used by the Retryable Topics feature. This adds a setter that lets users provide their own back off policy and configure blocking retries in conjunction with RT.

tomazfernandes · 2022-02-20T21:52:06Z

Hi @mariecuriiee, I've added a PR with this feature, maybe you want to take a look:
#2124

LMWYT, thanks

ghost · 2022-02-22T14:50:45Z

Hey @tomazfernandes, thanks for implementing this so quickly! If I understand the implementation correctly, I only have to set the backoff using setBlockingRetryBackoff and add a classification behavior using setErrorHandlerCustomizer? And with setErrorHandlerCustomizer I could then define the behavior depending on the exception, e.g., for some exceptions the records go directly to the retry topics, for other exceptions, the records are handled using blocking retries or for other exceptions, the records are routed directly to the DLT?

tomazfernandes · 2022-02-22T15:08:17Z

Hi @mariecuriiee, thanks for looking into the solution. We’re currently reviewing it to improve the classification for blocking retries and make it allowlistable instead of the current denylist only behavior.

Either way, you’d have to configure the classification for each type of retry, i.e. define which exceptions are fatal for the blocking retries, and which are fatal for the non-blocking retries.

So if you want a non-default fatal exception to go straight to the DLT, you’d have to add it to both classifications. With this you can have any behavior you want - both retries, only one or another, or no retries.
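
The interplay of the two classifications can be sketched as a simplified model in plain Java. This is not the Spring Kafka API: `TwoStageRetryPlan` and its parameters are hypothetical, and the sketch assumes a record that keeps failing with the same exception and that blocking retries are attempted only once, before any retry topic is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the two independent classifications; not Spring Kafka's API.
final class TwoStageRetryPlan {
    private final Predicate<Exception> fatalForBlocking;
    private final Predicate<Exception> fatalForNonBlocking;
    private final int blockingAttempts;
    private final int retryTopics;

    TwoStageRetryPlan(Predicate<Exception> fatalForBlocking,
                      Predicate<Exception> fatalForNonBlocking,
                      int blockingAttempts,
                      int retryTopics) {
        this.fatalForBlocking = fatalForBlocking;
        this.fatalForNonBlocking = fatalForNonBlocking;
        this.blockingAttempts = blockingAttempts;
        this.retryTopics = retryTopics;
    }

    // Steps a record goes through if processing keeps failing with e.
    List<String> stepsFor(Exception e) {
        List<String> steps = new ArrayList<>();
        if (!fatalForBlocking.test(e)) {
            for (int i = 1; i <= blockingAttempts; i++) {
                steps.add("blocking-retry-" + i);
            }
        }
        if (!fatalForNonBlocking.test(e)) {
            for (int i = 1; i <= retryTopics; i++) {
                steps.add("retry-topic-" + i);
            }
        }
        steps.add("dlt"); // every exhausted or fatal record ends up here
        return steps;
    }
}
```

In this model, an exception that is fatal in both classifications yields only the `dlt` step, while one that is fatal only for blocking retries skips straight to the retry topics, which matches the "add it to both classifications" advice above.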

We’re also in the process of reviewing the configuration API for the retryable topics feature as a whole, so when that comes out we’ll probably have a nicer API for configuring this.

As a side note, we’ve just introduced a way of setting global fatal exceptions for the non-blocking retries, maybe that’s something you’ll want to check out too.

ghost · 2022-02-22T20:43:02Z

@tomazfernandes that makes sense, thanks for the clarification!
I thought about Gary's comment again and I agree that it would be more intuitive the way he described it, even though this means that this feature won't be released in 2.8.3 😢

But as soon as it is released, I will use it in my project 😄 I also had a quick look at the global fatal exceptions - will test that too!

* GH-2116: Add blocking retries to RT. Previously, we hardcoded a no-op back off in the DefaultErrorHandler used by the Retryable Topics feature. This adds a setter that lets users provide their own back off policy and configure blocking retries in conjunction with RT.
* Change DHE in LCFC to defaultFalse. With this we no longer need a no-op back off. Some minor adjustments were needed to maintain behavior when the logic gets to DLPR.
* Improve API and docs. Retryable exceptions can now be set directly in the LCFC class. Improved the docs on how to combine blocking and non-blocking behaviors. Added a what's new entry for this feature.
* Improve ExceptionClassifier JavaDoc. Also add assertions to the new LCFC methods to warn the user if they already set the blocking configurations.

tomazfernandes · 2022-03-02T02:27:37Z

@mariecuriiee, we’ve implemented the solution, don’t know if you had the chance to take a look. It’s on 2.8.4-SNAPSHOT, and referenced in the docs: https://docs.spring.io/spring-kafka/docs/2.8.4-SNAPSHOT/reference/html/#retry-topic-combine-blocking

Thanks again for the suggestion, and feel free to provide any feedback.

@garyrussell, DYT there’s any more work to be done regarding this issue, or maybe we could close it?

Thanks

garyrussell · 2022-03-03T15:08:15Z

Thanks @tomazfernandes ; closing; resolved.

ghost added status: waiting-for-triage type: enhancement labels Feb 18, 2022

tomazfernandes mentioned this issue Feb 18, 2022

Record doesn't get to DLT with FATAL exception and single-topic strategy #2118

Closed

garyrussell removed the status: waiting-for-triage label Mar 2, 2022

garyrussell closed this as completed Mar 3, 2022
