Exporters shutdown takes longer than a minute when failing to send metrics/traces #3309
Comments
Looking deeper into this issue, it appears that both the metrics and traces exporters go through the same backoff sleep, which basically means that even if we notify the exporter loop to shut down, it can hit the spot where it sleeps for 1 minute and only then become aware of the event. A possible solution could be to add an exporter shutdown event and a slightly customized sleep method that would be aware of that event.
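A minimal sketch of such an event-aware sleep, assuming a `threading.Event` as the shutdown signal (the class and method names here are illustrative, not the library's actual API):

```python
import threading


class OTLPExporterWorker:
    """Illustrative only; not the actual exporter mixin implementation."""

    def __init__(self):
        # Set when shutdown is requested; any pending backoff sleep wakes up.
        self._shutdown_event = threading.Event()

    def _sleep(self, seconds: float) -> bool:
        # Event.wait() returns True as soon as the event is set, otherwise
        # returns False after `seconds` have elapsed -- an interruptible sleep.
        return self._shutdown_event.wait(timeout=seconds)

    def shutdown(self) -> None:
        # Instead of letting the export loop finish a full backoff sleep
        # (up to a minute), wake it immediately.
        self._shutdown_event.set()
```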
Something like this ^. And finally, changing the order in which shutdown happens, so that it would first notify the exporter(s) and then wait for the thread to die.
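A sketch of that reordered shutdown, where the exporter is notified before the worker thread is joined (again purely illustrative, with assumed `_exporter`/`_worker` attributes):

```python
import threading


class BatchProcessor:
    """Illustrative only: owns a background export thread and an exporter."""

    def __init__(self, exporter):
        self._exporter = exporter
        self._worker = threading.Thread(target=self._export_loop, daemon=True)
        self._worker.start()

    def _export_loop(self):
        ...  # export batches, sleeping between failed attempts

    def shutdown(self) -> None:
        # Notify the exporter first so any in-progress backoff sleep is
        # interrupted, then wait for the worker thread to exit.
        self._exporter.shutdown()
        self._worker.join()
```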
I have a similar issue: some of the users of our application run it on systems where the required port is blocked by firewalls. The code defines the …
The same for me.
Another change is about …
@Elli-Rid seems like you almost have a solution, do you think you could open a PR for this issue? 🙂
Kindly review my PR. I hope shutting down exporters before calling threads.join() should resolve the problem.
I have a bunch of additional context in #2663, but I'm not sure if it's all still relevant. @rajat315315 your PR looks valuable, but I think the biggest issue was called out by @Elli-Rid above in #3309 (comment): the exponential backoff is unconditionally sleeping (see line 316 in 8378db9).
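For context, a simplified sketch of the retry pattern being described (not the actual code at that line): with unconditional `time.sleep()` calls and delays doubling up to a 64-second cap, the waits alone add up to about a minute.

```python
import time


def export_with_retries(send, max_delay=64):
    # Illustrative retry loop with exponential backoff between attempts.
    delay = 1
    while delay < max_delay:
        if send():
            return True
        # time.sleep() cannot be interrupted by a shutdown request, so a
        # failing export keeps the thread busy for 1 + 2 + 4 + ... + 32 ≈ 63 s.
        time.sleep(delay)
        delay *= 2
    return False
```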
@rajat315315 could you make a separate PR with this?
@aabmass I have updated my PR with waiting for a …
Environment: Mac OS, RHEL 8 (doesn't matter)
Steps to reproduce
Configure the traces and/or metrics gRPC exporter with an invalid collector URL.
What is the expected behavior?
When executing a shutdown of the configured providers/exporters (for example, as sketched below), it should shut down in about 3 seconds.
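For concreteness, a minimal reproduction along these lines, assuming the gRPC OTLP exporters pointed at an unreachable endpoint (the endpoint and instrument names here are arbitrary, not taken from the original report):

```python
import time

from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Point both exporters at a collector endpoint that is not reachable.
tracer_provider = TracerProvider()
tracer_provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://invalid-collector:4317", insecure=True))
)
meter_provider = MeterProvider(
    metric_readers=[
        PeriodicExportingMetricReader(
            OTLPMetricExporter(endpoint="http://invalid-collector:4317", insecure=True)
        )
    ]
)

# Produce a little telemetry so there is something to flush on shutdown.
with tracer_provider.get_tracer(__name__).start_as_current_span("demo"):
    meter_provider.get_meter(__name__).create_counter("demo_counter").add(1)

start = time.monotonic()
tracer_provider.shutdown()
meter_provider.shutdown()
print(f"shutdown took {time.monotonic() - start:.1f}s")  # expected ~3s, observed ~60s
```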
What is the actual behavior?
It takes ~60 seconds to shut down, with logs like …
Additional context
From what it looks like, both the metrics and traces exporters use the same base/mixin, whose shutdown method takes a `timeout_millis` argument. Now, going a level up, `OTLPMetricExporter.shutdown` calls it correctly. However, `PeriodicExportingMetricReader.shutdown` calls `OTLPMetricExporter.shutdown` with a different kwarg name, which seems to be completely ignored (the value it passes is computed with `time_ns()`); even with the correct kwarg name, it would lead to the error of a negative timeout value being supplied. As for traces, the exporter calls `OTLPExporterMixin.shutdown` without propagating any timeouts at all. This leads to some bad behaviour when combined with k8s and an async application, since a timeout-less thread lock blocks the event loop and also leads to hanging containers in the k8s cluster until they are killed.
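To illustrate the kwarg mismatch described above, a simplified sketch with assumed signatures (not the library's actual code): a keyword passed under the wrong name is swallowed by `**kwargs` and the default timeout is used instead.

```python
class ExporterMixinSketch:
    """Simplified stand-in for the shared exporter mixin (not the real code)."""

    def shutdown(self, timeout_millis: float = 30_000, **kwargs) -> None:
        # Only `timeout_millis` is honoured; anything passed under another
        # name silently lands in **kwargs and the default is used instead.
        print(f"shutting down with timeout_millis={timeout_millis}")


exporter = ExporterMixinSketch()
exporter.shutdown(timeout=5_000)         # wrong kwarg name -> timeout_millis=30000
exporter.shutdown(timeout_millis=5_000)  # correct name     -> timeout_millis=5000
```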