Skip to content

Tracing information lost for RabbitTemplate.sendAndReceive timeouts #1825

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ionel-sirbu-crunch opened this issue Jan 15, 2021 · 8 comments
Closed
Assignees
Labels
status: duplicate A duplicate of another issue

Comments

@ionel-sirbu-crunch
Copy link

ionel-sirbu-crunch commented Jan 15, 2021

Describe the bug
We have a typical client-server setup, running on spring-cloud Hoxton.SR8:

  • the server (S) handles requests via a @RabbitListener;
  • the client (C) uses RabbitTemplate.sendAndReceive() to communicate with the server.

The problem is that if the server takes too long to get back with a response, the client times out & later on RabbitTemplate logsReply received after timeout for <correlationId> warnings. Those warning lack the tracing information, so they cannot be filtered by traceId for example in kibana.

Debugging the spring-sleuth code, I can see there's instrumentation added for the (C) -> (S) leg of the request, which works fine, but no similar instrumentation is present for the return leg, i.e. (S) -> (C). Of course, in the normal (non-timed out) case, that wouldn't be required, as all the MDC values are there for the duration of the current thread context. But in the case of a timeout, that context is gone, along with the MDC values.

This could be fixed if similar instrumentation was added to that 2nd leg:

  • (S) - AbstractRabbitListenerContainerFactory.setBeforeSendReplyPostProcessors() decorator;
  • (C) - RabbitTemplate.afterReceivePostProcessors decorator.
@ionel-sirbu-crunch
Copy link
Author

I realize this is a bit of an edge case & I am curious to know if you've considered it before, but maybe you realized it's not worth implementing, e.g. for performance reasons?

@jonatan-ivanov
Copy link
Member

I haven't checked this deeper but can it happen that the log event does not contain tracing information because the message coming back from rabbit does not contain it either?
This could be related: #1660

Could you please check if the message in the queue contains tracing info or not? Also, could you please check what happens if you try to simulate this using a fake message that contains the correlationId and some fake tracing info?

@ionel-sirbu-crunch
Copy link
Author

Hey @jonatan-ivanov ! Well, that's exactly what I was trying to explain in my first post: there is no logic to handle tracing information anywhere in the return leg of the sendAndReceive(), neither on server side, nor on the client. So no, there is no tracing B3 header present in the message coming back from the server.

I doubt anything would happen if I added some fake tracing header on the response, as RabbitTemplate.afterReceivePostProcessors is null, so at least in theory there's nothing to unpack & process that header. I'll give it a try anyway when I have a bit of time & get back to you on this.

Thanks for looking into this!

@ionel-sirbu-crunch
Copy link
Author

@jonatan-ivanov Just finished the test. I manually set the B3 header on the response on server side:

message.messageProperties.headers["b3"] = "600809eb37c8ac851e6aec0b6a867cf2-cb97412158406f57-1"

Then breakpoint in MDC.put() never triggered on client side. This is also confirmed in the logs:

2021-01-20 11:16:46,750 WARN  [pool-2-thread-9] [trace: span:] RabbitTemplate: Reply received after timeout for 5

So it is as I was expecting, as explained in my previous comment.

@jonatan-ivanov
Copy link
Member

@ionel-sirbu-crunch Thank you for the extra details and explanation.
Could you please create a minimal sample app that reproduces the issue? I can take a look into this and debug it.

@spring-cloud-issues
Copy link

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@ionel-sirbu-crunch
Copy link
Author

Here you go: https://github.com/ionel-sirbu-crunch/zipkin-rabbit-test
Please let me know if that works for you.

@jonatan-ivanov
Copy link
Member

@ionel-sirbu-crunch Thank you for the project (especially for the docker-compose file). I was able to reproduce your issue and indeed it is a bug. As I mentioned in #1825 (comment), this issue is related: #1660. Now, that I debugged your project, it is not just related, it is the exact same issue (just the trigger is different: timeout vs. explicit exception).

Please see the details in #1660 (e.g.: this comment: #1660 (comment)). TL;DR: the tracing code in spring-amqp only wraps the the invocation of the listener not the invocation and the error handling together.

I suggested a fix for this but it is a breaking change, it can be considered for the 2.4 release of rabbit-amqp. Gary also suggested a workaround using a RabbitListenerErrorHandler , you can give it a try.

Since this seems like a dupe, I'm closing this issue, please comment on #1660 if you have questions about the fix or the workaround.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: duplicate A duplicate of another issue
Projects
None yet
Development

No branches or pull requests

3 participants