Considerations regarding shutdown timeout #3039


Closed
marcogramy opened this issue Apr 10, 2025 · 4 comments

@marcogramy

Hi,
following the resolution of issues #2941 and #3032, we have conducted a further analysis concerning the shutdown process within our custom requirements.

In our custom usage scenario, we customize the shutdownTimeout, which the shutdownAndWaitOrCallback() method uses to allow time for processing messages after the BasicCancel before the consumers stop.

However, the commit 02db80b has introduced a modification to the basicCancel() method's behavior. The invocation of RabbitUtils.closeMessageConsumer() now triggers a sequential execution of both the Cancel and the Recover operations.

To restore the previous behavior (with a timeout on shutdown), one potential solution would be to introduce a distinct timeout period between the execution of the BasicCancel and Recover commands.
Or, better yet in our opinion, the basicCancel() method should be restored to its pre-change state, and the Recover command should be issued after waiting for the shutdown timeout.
I am referring to the shutdownAndWaitOrCallback() method, after the instruction:

```java
boolean finished = this.cancellationLock.await(this.getShutdownTimeout(), TimeUnit.MILLISECONDS);
```

when the result is false.
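The proposed ordering (cancel first, wait up to the shutdown timeout, and only recover if consumers did not finish) can be sketched in plain JDK terms. `Channel` here is a hypothetical stand-in for the real AMQP channel, not `com.rabbitmq.client.Channel`, and `shutdownAndWait` is an illustrative name, not the actual Spring AMQP method:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ShutdownSketch {

    // Hypothetical stand-in for the AMQP channel operations involved.
    interface Channel {
        void basicCancel();
        void basicRecover(boolean requeue);
    }

    // Sketch of the requested flow: Basic Cancel, then wait for in-flight
    // processing up to shutdownTimeout, then Recover only on timeout.
    static String shutdownAndWait(Channel channel, CountDownLatch cancellationLock,
                                  long shutdownTimeoutMs) throws InterruptedException {
        channel.basicCancel();  // stop new deliveries first
        boolean finished = cancellationLock.await(shutdownTimeoutMs, TimeUnit.MILLISECONDS);
        if (!finished) {
            channel.basicRecover(true);  // requeue only what was not processed in time
            return "recovered";
        }
        return "clean";
    }

    public static void main(String[] args) throws InterruptedException {
        Channel noop = new Channel() {
            public void basicCancel() { }
            public void basicRecover(boolean requeue) { }
        };
        // Latch already at zero: consumer "finished" before the timeout.
        System.out.println(shutdownAndWait(noop, new CountDownLatch(0), 100));
        // Latch never released: the timeout elapses and Recover runs.
        System.out.println(shutdownAndWait(noop, new CountDownLatch(1), 50));
    }
}
```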

We are unaware of the reasoning behind the aforementioned change, so we are unsure of the most appropriate approach.
We do agree that, before the change, unacknowledged messages were never requeued when the channel was transacted, but we think the shutdown timeout should still be honored before the Recover.

What are your thoughts on this?

@artembilan
Member

> We do agree that, before the change, unacknowledged messages were never requeued in the case of channel transacted,

That's exactly why RabbitUtils.closeMessageConsumer() was added to the basicCancel() logic: to roll back all in-flight messages for the transactional channel on this consumer.
There is no other reasoning: it simply satisfies the transaction semantics in RabbitMQ.

I'll think about moving the channel.basicRecover(true); call to after the mentioned shutdown timeout.

However, I would still like to understand why you insist on using a transactional channel.
There are no transactions with AMQP 1.0: https://www.rabbitmq.com/docs/next/amqp#amqp-091-features.

Have you considered using a DirectMessageListenerContainer instead: https://docs.spring.io/spring-amqp/reference/amqp/receiving-messages/choose-container.html?

@artembilan artembilan added this to the 4.0.0-M3 milestone Apr 11, 2025
spring-builds pushed a commit that referenced this issue Apr 11, 2025
Fixes: #3039
Issue link: #3039

Currently, the `BlockingQueueConsumer` initiates a Basic Recover command on the
channel for a transactional consumer immediately after Basic Cancel.
However, it is still possible to handle in-flight messages during `shutdownTimeout`
in the listener container

* Leave only the Basic Cancel command in the `BlockingQueueConsumer.basicCancel()` API
* Revert the `BlockingQueueConsumer.nextMessage(timeout)` method logic to a normal loop
until the message pulled from the in-memory cache is `null`
* Call `basicCancel(true)` from `stop()` if not cancelled yet
* Perform `channel.basicRecover()` for the transactional channel in `stop()`.
This `stop()` is usually called from the listener container when in-flight messages
have not been processed during `shutdownTimeout`

(cherry picked from commit 14fe215)
@artembilan
Member

I have pushed a fix according to your request.
I'd appreciate it if you could test it on your side before our next release cycle on April 22.

Thanks

@marcogramy
Author

Hi,
thank you for the fix!

I have successfully tested it in the demo project, confirming that the shutdown process now waits correctly. I did not observe any redelivered messages, indicating that the consumers were given sufficient time to process the in-flight messages.

We will integrate the new version into our production projects as soon as it becomes available, for further testing.

I am currently maintaining a large, decade-old project that has consistently utilized transactional channels. Before removing this configuration we need to fully understand the implications of the change. Our primary requirement is to ensure no message loss occurs.

We process several million messages daily, with traffic loads fluctuating hourly. Consequently, we maintain a dynamically varying number of consumers, leveraging the auto-scaling capabilities of the SimpleMessageListenerContainer, a feature not present in the DirectMessageListenerContainer.
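A setup like the one described, auto-scaling SimpleMessageListenerContainer with transacted channels and a shutdown timeout, might look roughly like the following configuration sketch. It assumes spring-rabbit on the classpath; the queue name, concurrency values, and timeout are illustrative, not taken from the project in question:

```java
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;
import org.springframework.context.annotation.Bean;

// Hedged sketch of an auto-scaling, transacted container; all values hypothetical.
class ContainerConfig {

    @Bean
    SimpleMessageListenerContainer container(ConnectionFactory connectionFactory) {
        SimpleMessageListenerContainer container =
                new SimpleMessageListenerContainer(connectionFactory);
        container.setQueueNames("orders");        // hypothetical queue name
        container.setChannelTransacted(true);     // transactional channels, as above
        container.setConcurrentConsumers(4);      // scale up from 4 consumers...
        container.setMaxConcurrentConsumers(32);  // ...to 32 under load
        container.setShutdownTimeout(30_000);     // ms to drain in-flight messages
        container.setMessageListener(message -> { /* process the message */ });
        return container;
    }
}
```

DirectMessageListenerContainer exposes `setConsumersPerQueue()` but does not grow and shrink consumers dynamically, which is why the auto-scaling of SimpleMessageListenerContainer matters here.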

@marcogramy
Author

Hi Artem,
we've had the opportunity to test the latest version in our production environments as well, and I can confirm that it works correctly, exhibiting the expected behavior.

Thank you once more for your support!
