Considerations regarding shutdown timeout #3039


Closed
marcogramy opened this issue Apr 10, 2025 · 4 comments

@marcogramy

Hi,
following the resolution of issues #2941 and #3032, we have conducted a further analysis concerning the shutdown process within our custom requirements.

In our custom usage scenario, we customize the shutdownTimeout, which the shutdownAndWaitOrCallback() method uses to allow time for processing messages after the BasicCancel before the consumers stop.

However, the commit 02db80b has introduced a modification to the basicCancel() method's behavior. The invocation of RabbitUtils.closeMessageConsumer() now triggers a sequential execution of both the Cancel and the Recover operations.

To restore the previous behavior (with a timeout on shutdown), one potential solution would be to introduce a distinct timeout period between the execution of the BasicCancel and Recover commands.
Or, better yet in our opinion, the basicCancel() method should be restored to its pre-change state, and the Recover command should be issued after waiting for the shutdown timeout.
I am referring to the shutdownAndWaitOrCallback() method, after the instruction:

```java
boolean finished = this.cancellationLock.await(this.getShutdownTimeout(), TimeUnit.MILLISECONDS);
```

when the result is false.
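The proposed ordering (cancel first, wait up to the shutdown timeout, and only recover if consumers did not finish) can be sketched in plain JDK terms. `Channel` here is a hypothetical stand-in for the real AMQP channel, not `com.rabbitmq.client.Channel`, and `shutdownAndWait` is an illustrative name, not the actual Spring AMQP method:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ShutdownSketch {

    // Hypothetical stand-in for the AMQP channel operations involved.
    interface Channel {
        void basicCancel();
        void basicRecover(boolean requeue);
    }

    // Sketch of the requested flow: Basic Cancel, then wait for in-flight
    // processing up to shutdownTimeout, then Recover only on timeout.
    static String shutdownAndWait(Channel channel, CountDownLatch cancellationLock,
                                  long shutdownTimeoutMs) throws InterruptedException {
        channel.basicCancel();  // stop new deliveries first
        boolean finished = cancellationLock.await(shutdownTimeoutMs, TimeUnit.MILLISECONDS);
        if (!finished) {
            channel.basicRecover(true);  // requeue only what was not processed in time
            return "recovered";
        }
        return "clean";
    }

    public static void main(String[] args) throws InterruptedException {
        Channel noop = new Channel() {
            public void basicCancel() { }
            public void basicRecover(boolean requeue) { }
        };
        // Latch already at zero: consumer "finished" before the timeout.
        System.out.println(shutdownAndWait(noop, new CountDownLatch(0), 100));
        // Latch never released: the timeout elapses and Recover runs.
        System.out.println(shutdownAndWait(noop, new CountDownLatch(1), 50));
    }
}
```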

We are unaware of the reasoning behind the aforementioned change, so we are unsure of the most appropriate approach.
We do agree that, before the change, unacknowledged messages were never requeued when the channel was transacted, but we think the shutdown timeout should still be honored before the Recover.

What are your thoughts on this?

@artembilan
Member

> We do agree that, before the change, unacknowledged messages were never requeued in the case of channel transacted,

That's exactly why RabbitUtils.closeMessageConsumer() was added to the basicCancel() logic: to roll back all in-flight messages for the transactional channel on this consumer.
There is no other reasoning: it simply satisfies the transaction semantics in RabbitMQ.

I'll think about moving the channel.basicRecover(true); call to after the mentioned shutdown timeout.

However, I would still like to understand why you insist on using a transactional channel.
There are no transactions with AMQP 1.0: https://www.rabbitmq.com/docs/next/amqp#amqp-091-features.

Have you considered using a DirectMessageListenerContainer instead: https://docs.spring.io/spring-amqp/reference/amqp/receiving-messages/choose-container.html?

@artembilan artembilan added this to the 4.0.0-M3 milestone Apr 11, 2025
spring-builds pushed a commit that referenced this issue Apr 11, 2025
Fixes: #3039
Issue link: #3039

Currently, the `BlockingQueueConsumer` initiates a Basic Recover command on the
channel for a transactional consumer immediately after Basic Cancel.
However, it is still possible to handle in-flight messages during `shutdownTimeout`
in the listener container

* Leave only the Basic Cancel command in the `BlockingQueueConsumer.basicCancel()` API
* Revert the `BlockingQueueConsumer.nextMessage(timeout)` method logic to a normal loop
until the message pulled from the in-memory cache is `null`
* Call `basicCancel(true)` from `stop()` if not cancelled yet
* Perform `channel.basicRecover()` for the transactional channel in `stop()`.
This `stop()` is usually called from the listener container when in-flight messages
have not been processed during `shutdownTimeout`

(cherry picked from commit 14fe215)
@artembilan
Member

I have pushed a fix according to your request.
I'd appreciate it if you could test it on your side before our next release cycle on April 22.

Thanks

@marcogramy
Author

Hi,
thank you for the fix!

I have successfully tested it in the demo project, confirming that the shutdown process now waits correctly. I did not observe any redelivered messages, indicating that the consumers were given sufficient time to process the in-flight messages.

We will integrate the new version into our production projects as soon as it becomes available, for further testing.

I am currently maintaining a large, decade-old project that has consistently utilized transactional channels. Before removing this configuration we need to fully understand the implications of the change. Our primary requirement is to ensure no message loss occurs.

We process several million messages daily, with traffic loads fluctuating hourly. Consequently, we maintain a dynamically varying number of consumers, leveraging the auto-scaling capabilities of the SimpleMessageListenerContainer, a feature not present in the DirectMessageListenerContainer.
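A setup like the one described, auto-scaling SimpleMessageListenerContainer with transacted channels and a shutdown timeout, might look roughly like the following configuration sketch. It assumes spring-rabbit on the classpath; the queue name, concurrency values, and timeout are illustrative, not taken from the project in question:

```java
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;
import org.springframework.context.annotation.Bean;

// Hedged sketch of an auto-scaling, transacted container; all values hypothetical.
class ContainerConfig {

    @Bean
    SimpleMessageListenerContainer container(ConnectionFactory connectionFactory) {
        SimpleMessageListenerContainer container =
                new SimpleMessageListenerContainer(connectionFactory);
        container.setQueueNames("orders");        // hypothetical queue name
        container.setChannelTransacted(true);     // transactional channels, as above
        container.setConcurrentConsumers(4);      // scale up from 4 consumers...
        container.setMaxConcurrentConsumers(32);  // ...to 32 under load
        container.setShutdownTimeout(30_000);     // ms to drain in-flight messages
        container.setMessageListener(message -> { /* process the message */ });
        return container;
    }
}
```

DirectMessageListenerContainer exposes `setConsumersPerQueue()` but does not grow and shrink consumers dynamically, which is why the auto-scaling of SimpleMessageListenerContainer matters here.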

@marcogramy
Author

Hi Artem,
we've had the opportunity to test the latest version in our production environments as well, and I can confirm that it works correctly, exhibiting the expected behavior.

Thank you once more for your support!
