
Avoid unnecessary partition seeks on successful record recovery #2195

Closed
v-chernyshev opened this issue Mar 28, 2022 · 10 comments · Fixed by #2207

Comments

@v-chernyshev

I've reordered the sections for this request because, in my opinion, it is easier to understand the desired behaviour when the limitations of the current approach are explained first.

Current Behavior

Let's consider a situation where a full batch of max.poll.records (500 by default) is available for consumption in a non-transactional record listener container, but the very first record cannot be processed immediately due to e.g. an issue with an external data provider. Here is what happens in the code, at least in the case of non-blocking retries (paraphrased in the sketch after the list):

  • KafkaMessageListenerContainer.ListenerConsumer.doInvokeRecordListener is eventually invoked with record pointing to the first record and iterator holding the remaining 499.
  • this.invokeOnMessage fails with an exception.
  • An error handler is installed, so this.invokeErrorHandler is called.
  • this.commonErrorHandler.remainingRecords() returns true. Both the failed record and all the remaining records from the iterator are drained into a temporary list, and this.commonErrorHandler.handleRemaining leads to DefaultErrorHandler.handleRemaining, which performs SeekUtils.seekOrRecover.
  • An attempt is made to recover the failed record.
  • No matter whether the recovery operation succeeds or not, the rest of the batch is rewound in seekPartitions.
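
For reference, here is the flow just described, paraphrased as a sketch (this is not the actual Spring Kafka source; local names and the container reference are simplified):

private void invokeErrorHandler(ConsumerRecord<K, V> record,
        Iterator<ConsumerRecord<K, V>> iterator, RuntimeException rte) {
    if (this.commonErrorHandler.remainingRecords()) {
        // Drain the failed record plus everything left in the current batch...
        List<ConsumerRecord<?, ?>> remaining = new ArrayList<>();
        remaining.add(record);
        while (iterator.hasNext()) {
            remaining.add(iterator.next());
        }
        // ...and hand all of them to the error handler. DefaultErrorHandler.handleRemaining
        // performs SeekUtils.seekOrRecover: it attempts to recover remaining.get(0) and then
        // rewinds every partition to its earliest remaining offset, recovered or not.
        this.commonErrorHandler.handleRemaining(rte, remaining, this.consumer, this.container);
    }
    else {
        this.commonErrorHandler.handleRecord(rte, record, this.consumer, this.container);
    }
}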

Unfortunately, this unconditional rewind operation leads to enormous spikes in network I/O as pretty much the same records are requested from Kafka over and over again. To be precise, with a local Kafka broker and a test application that always triggers a recovery I could see spikes exceeding 2000 Mbps in iptraf.

Expected Behavior

I believe that it is not necessary to always rewind the remaining 499 records if the recovery is successful. This is what may happen with the failed record when non-blocking retries are enabled:

  • If the record cannot be processed due to a fatal exception (e.g. ClassCastException) then it is sent straight to the DLT.
  • If the exception is not fatal, then the record is sent to the next retry destination.

Both these cases allow the next iterator record to be processed immediately as if nothing happened. This very same record will be at the front of the batch during the next poll invocation anyway! The only special case I can think of is KafkaBackoffException, which does absolutely require rewinding all the offsets so that the same batch is consumed again when the affected partitions are resumed.

Context

The production rollout of our service that uses the non-blocking retries feature provided by Spring Kafka triggered a number of network I/O alarms, so we decided to find the root cause. This feature request is the result of the investigation :)

Implementing a workaround for this issue is possible but, admittedly, quite tricky, as invokeErrorHandler is a private API method. These are the steps that may be taken (sketched in code after the list):

  • Implement a custom CommonErrorHandler that:
    • Exposes an API that allows remainingRecords to return either true or false.
    • Overrides handleRecord in a way that delegates to handleRemaining of the default error handler with a singleton list as the second argument.
  • Implement a custom RecordInterceptor with an overridden failure method that uses the above API to make remainingRecords return true only if SeekUtils.isBackoffException is true.
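
A rough sketch of those two classes is shown below (the class names, the volatile flag and the delegation choices are illustrative rather than existing Spring Kafka types; it assumes the 2.8 CommonErrorHandler and RecordInterceptor contracts plus the SeekUtils.isBackoffException check mentioned above):

import java.util.Collections;
import java.util.List;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;

import org.springframework.kafka.listener.CommonErrorHandler;
import org.springframework.kafka.listener.MessageListenerContainer;
import org.springframework.kafka.listener.RecordInterceptor;
import org.springframework.kafka.listener.SeekUtils;

public class CustomErrorHandler implements CommonErrorHandler {

    private final CommonErrorHandler delegate;

    // Set by the interceptor below before the container queries remainingRecords();
    // the customizer creates one handler per container, so a simple flag is enough.
    private volatile boolean seekRemaining = true;

    public CustomErrorHandler(CommonErrorHandler delegate) {
        this.delegate = delegate;
    }

    public void setSeekRemaining(boolean seekRemaining) {
        this.seekRemaining = seekRemaining;
    }

    @Override
    public boolean remainingRecords() {
        return this.seekRemaining;
    }

    @Override
    public void handleRecord(Exception thrownException, ConsumerRecord<?, ?> record,
            Consumer<?, ?> consumer, MessageListenerContainer container) {
        // Recover only the failed record; the container then carries on with the iterator.
        this.delegate.handleRemaining(thrownException, Collections.singletonList(record),
                consumer, container);
    }

    @Override
    public void handleRemaining(Exception thrownException, List<ConsumerRecord<?, ?>> records,
            Consumer<?, ?> consumer, MessageListenerContainer container) {
        // Only reached for KafkaBackoffException: keep the default drain-and-seek behaviour.
        this.delegate.handleRemaining(thrownException, records, consumer, container);
    }
}

// In a separate file:
public class CustomRecordInterceptor<K, V> implements RecordInterceptor<K, V> {

    private final CustomErrorHandler errorHandler;

    public CustomRecordInterceptor(CustomErrorHandler errorHandler) {
        this.errorHandler = errorHandler;
    }

    @Override
    public ConsumerRecord<K, V> intercept(ConsumerRecord<K, V> record) {
        return record;
    }

    @Override
    public void failure(ConsumerRecord<K, V> record, Exception exception, Consumer<K, V> consumer) {
        // Rewind the remaining records only when the partitions are being backed off.
        this.errorHandler.setSeekRemaining(SeekUtils.isBackoffException(exception));
    }
}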

ListenerContainerFactoryConfigurer.setContainerCustomizer may then be used to tie things together, e.g.:

@Bean(name = RetryTopicInternalBeanNames.LISTENER_CONTAINER_FACTORY_CONFIGURER_NAME)
public ListenerContainerFactoryConfigurer listenerContainerFactoryConfigurer(
        KafkaConsumerBackoffManager kafkaConsumerBackoffManager,
        DeadLetterPublishingRecovererFactory deadLetterPublishingRecovererFactory,
        @Qualifier(RetryTopicInternalBeanNames.INTERNAL_BACKOFF_CLOCK_BEAN_NAME) Clock clock) {
    final var configurer = new ListenerContainerFactoryConfigurer(...);

    configurer.setContainerCustomizer(container -> {
        final var customErrorHandler = new CustomErrorHandler(container.getCommonErrorHandler());
        container.setRecordInterceptor(new CustomRecordInterceptor<>(customErrorHandler));
        container.setCommonErrorHandler(customErrorHandler);
    });

    return configurer;
}
@tomazfernandes
Contributor

Hi @v-chernyshev, thanks for bringing this up. I'm not sure why this logic is the way it is - @garyrussell can surely elucidate that. But it seems to me that if we don't do seeks in case the record is successfully recovered, in SeekUtils.doSeeks(), everything should work as expected, at least in this particular scenario.

About KafkaBackOffException, at first I can't see why it should have any special treatment in this case - if the DLPRF rethrows it, recovery won't have been successful and seeks will be performed to the first record's offset, as they should be.

@v-chernyshev
Author

But it seems to me that if we don't do seeks in case the record is successfully recovered, in SeekUtils.doSeeks(), everything should work as expected, at least in this particular scenario.

It may indeed be possible with some API tweaks. The main obstacle, in my opinion, is this block of code. As the iterator is drained before handleRemaining, normal processing can no longer resume.

@v-chernyshev
Author

About KafkaBackOffException, at first I can't see why it should have any special treatment in this case

Yeah, you are absolutely right. What I meant when I said that this exception is special is that targeted single-record recovery won't ever be possible for it.

@garyrussell
Contributor

I'm not sure why this logic is the way it is ...

It's historical (error handlers pre-dated the retryable topic code by several years).

However, I can see that this might be helpful, even for normal (blocking) retries, after a successful recovery; but the problem is really exacerbated by retryable topics because recovery is generally successful (at least for the first/main topic).

That said, one of the reasons for doing the seeks is to avoid a rebalance; if we do the retries without polling the consumer and the aggregate of the retry delays exceeds max.poll.interval.ms, Kafka will think we are dead.

I can see (at least) a couple of solutions:

  1. Decorate the iterator with one that can be reset to the position of the first remaining record and have handleRemaining return a boolean to indicate the iterator should be reset. (However, this could suffer from the rebalance problem).
  2. Use similar logic to the FallbackBatchErrorHandler (pause the consumer and invoke the listener with the remaining records).

consumer.pause(consumer.assignment());
try {
    while (nextBackOff != BackOffExecution.STOP) {
        consumer.poll(Duration.ZERO);
        try {
            ListenerUtils.stoppableSleep(container, nextBackOff);
        }
        catch (InterruptedException e1) {
            Thread.currentThread().interrupt();
            seeker.handleBatch(thrownException, records, consumer, container, () -> { });
            throw new KafkaException("Interrupted during retry", logLevel, e1);
        }
        if (!container.isRunning()) {
            throw new KafkaException("Container stopped during retries");
        }
        try {
            invokeListener.run();
            return;
        }
        catch (Exception e) {
            if (failed == null) {
                failed = recordsToString(records);
            }
            String toLog = failed;
            logger.debug(e, () -> "Retry failed for: " + toLog);
        }
        nextBackOff = execution.nextBackOff();
    }
    try {
        recoverer.accept(records, thrownException);
    }
    catch (Exception e) {
        logger.error(e, () -> "Recoverer threw an exception; re-seeking batch");
        seeker.handleBatch(thrownException, records, consumer, container, () -> { });
    }
}
finally {
    consumer.resume(consumer.assignment());
}

We could probably combine 1 & 2 in the listener container. I'll give it some thought.

@garyrussell garyrussell added this to the 3.0 Backlog milestone Mar 28, 2022
@tomazfernandes
Contributor

Just a couple of thoughts:

That said, one of the reasons for doing the seeks is to avoid a rebalance; if we do the retries without polling the consumer and the aggregate of the retry delays exceeds max.poll.interval.ms, Kafka will think we are dead.

Do we still have any retry mechanisms that rely on backing off and retrying several times in memory? If error handling doesn't include a wait (other than when we wait to seek to current again), the rebalancing problem might not be that much of an issue anymore.

I can see (at least) a couple of solutions:

  1. Decorate the iterator with one that can be reset to the position of the first remaining record and have handleRemaining return a boolean to indicate the iterator should be reset. (However, this could suffer from the rebalance problem).

Why do we need to drain the iterator early? Maybe we could just have the failed record and pass the iterator itself to the error handler, which we'd only iterate if we needed to. Of course, that would have API implications, so maybe it's not worth it.

Thanks

@garyrussell
Contributor

Do we still have any retry mechanisms that rely on backing off and retrying several times in memory?

Yes, SeekUtils.doSeeks() calls stoppableSleep() according to the next back off.

I didn't mean to imply that this would be a problem for non-blocking retries, but it would be a problem for blocking retries:

If we change the logic to never do seeks and the aggregate backoff is, say, 10 seconds per record and we have 500 records, we'd not call poll() for 5000 seconds.

I suppose we could change the logic to do the seeks whenever we are retrying and only continue from where we left off when recovery is successful; that would be ok (but that really wouldn't satisfy @v-chernyshev's complaint that the seeks aren't really necessary).

I think my final comment will solve it because we can pause/poll/resume the consumer during the retries as well.

As I said

I'll give it some thought.

Why do we need to drain the iterator early? Maybe we could just have the failed record and pass the iterator itself to the error handler, which we'd only iterate if we needed to. Of course, that would have API implications, so maybe it's not worth it.

Yes, that's one possibility.

Either way, these are non-trivial changes.

@tomazfernandes
Contributor

I suppose we could change the logic to do the seeks whenever we are retrying and only continue from where we left off when recovery is successful; that would be ok (but that really wouldn't satisfy @v-chernyshev's complaint that the seeks aren't really necessary).

Sure, that's what I had in mind, but I see your point.

I think my final comment will solve it because we can pause/poll/resume the consumer during the retries as well.

I see now that for blocking retries that fail this would probably be much better.

Thanks.

@garyrussell
Contributor

I have had some fresh thoughts on this. Stay tuned for a PoC early next week.

@tomazfernandes
Contributor

Looking forward to it @garyrussell!

garyrussell added a commit to garyrussell/spring-kafka that referenced this issue Apr 4, 2022
Resolves spring-projects#2195

Add an option to avoid seeks after handling exceptions.

Instead, pause the consumer for one `poll()` and use the remaining records as the
result of that poll.

New methods on `CommonErrorHandler` - `handleOne` for record listeners, returning
a boolean to indicate whether the record was recovered and should not be redelivered.

`handleBatchAndReturnRemaining` for batch listeners, returning either the complete
set or a subset, e.g. when the `DEH` receives a `BatchListenerExecutionFailedException`
and commits a partial batch.

Also includes the classifier refactoring discussed here
spring-projects#2185 (comment)

The new logic is disabled by default, we can consider enabling it in 3.0 and remove
the deprecations.
@garyrussell
Contributor

I issued a PR for review; please take a look if you get a chance.

@garyrussell garyrussell modified the milestones: 3.0 Backlog, 3.0.0-M4 Apr 5, 2022
@garyrussell garyrussell self-assigned this Apr 13, 2022
artembilan pushed a commit that referenced this issue Apr 13, 2022
Resolves #2195

Add an option to avoid seeks after handling exceptions.

Instead, pause the consumer for one `poll()` and use the remaining records as the
result of that poll.

New methods on `CommonErrorHandler` - `handleOne` for record listeners, returning
a boolean to indicate whether the record was recovered and should not be redelivered.

`handleBatchAndReturnRemaining` for batch listeners, returning either the complete
set or a subset, e.g. when the `DEH` receives a `BatchListenerExecutionFailedException`
and commits a partial batch.

Also includes the classifier refactoring discussed here
#2185 (comment)

The new logic is disabled by default, we can consider enabling it in 3.0 and remove
the deprecations.

* Fix race - do not call `resume()` on the container; the user might have paused after the error.

* Change Since to 2.9.

* Fix typos.

Co-authored-by: Artem Bilan <[email protected]>

* Remove unnecessary local variable; add docs.

* Polishing - see commit comment for more details

- move the resume logic to after the invokes and don't resume if pending records
- don't check `isPaused()` after empty poll due to errors; always restore the pending records

* Remove unnecessary boolean; fix deprecation warnings and delegating error handlers.

* Emergency stop container if the consumer returns records while paused after an error.

* Fix race in test - prevent consumer thread from changing pausedConsumers while the test thread is calling revoke/assign.

* Remove System.out().

* Add diagnostics to test.

* Fix race in test; wait until next poll after consumer thread pauses the partitions.

* Fix stubbing in emergency stop test.

* Remove unnecessary boolean.

**Cherry-pick to `2.9.x`**
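
Opting in to the new behaviour from application code might look roughly like this (a minimal sketch assuming the 2.9 `DefaultErrorHandler.setSeekAfterError(boolean)` setter; the recoverer, back-off values and bean wiring are illustrative):

@Bean
public DefaultErrorHandler errorHandler(KafkaTemplate<Object, Object> template) {
    // Recover to a dead-letter topic after two retries, one second apart (illustrative values).
    DefaultErrorHandler handler = new DefaultErrorHandler(
            new DeadLetterPublishingRecoverer(template), new FixedBackOff(1000L, 2L));
    // Opt in to the new behaviour: pause the consumer for one poll() and replay the
    // remaining records from memory instead of seeking all partitions after an error.
    handler.setSeekAfterError(false);
    return handler;
}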
artembilan pushed a commit that referenced this issue Jun 15, 2022
Related to: #2195

- Revert to seeking when `CommitFailedException` - due to a rebalance
- For batch listeners, move committing to the try block
- For record listeners, `AckMode.RECORD` already commits in the try block
- With `AckMode.BATCH`, detect a no-seek retry and call the error handler `handleRemaining`

**cherry-pick to 2.9.x**

* Fix test class names.
artembilan pushed a commit that referenced this issue Jul 5, 2022
Related to #2195

Do not process the remaining records if the first record is from a
paused partition.
artembilan pushed a commit that referenced this issue Jul 5, 2022
Related to #2195

- check `isPartitionPauseRequested()` instead of `isPartitionPaused()`
- add diagnostic debug logging
- prevent re-pausing the entire consumer each time a poll exits but the first
  partition is still paused.

**cherry-pick to 2.9.x**

* Checkstyle fix.
artembilan pushed a commit that referenced this issue Jul 6, 2022
Related to #2195

Similar to the recent fix when the first partition is paused, we should
not continue to process the remaining records if the container itself
was paused during the poll after an error occurred with `seekAfterError=false`.

**cherry-pick to 2.9.x**

* Remove unneeded copy/paste code from test.

* Fix creation of remaining records.

Previously, they were all inserted into the failed record's `TopicPartition` record list.
garyrussell added a commit that referenced this issue Jul 6, 2022
Use a `LinkedHashMap` for the remaining records so that the order is
retained.
Fix test after removing some logic in the last commit.
garyrussell added a commit that referenced this issue Mar 14, 2023
Don't reference deprecated methods on `CommonErrorHandler`.