Kibana Instrumentation and APM Server transport error (ECONNRESET): socket hang up
Log Messages
#127
Comments
Kibana's dev server starts up with multiple Node.js processes It's unclear if all these processes are started via the It's also unclear if all these processes are serving Kibana and being instrumented by the agent, or if some processes are independent of that. Finally, it's unclear what's meant by a "kibana restart" -- does this mean some (or all) of the child processes are restarted? Understanding Kibana's process model will be critical to understanding this bug. Without the Our current working theory on this bug is that during the process cycling of a restart (waves hands vaguely) bad things happen with the processes and the TCP connections while things are settling. (Theory: one process closes the connection but other processes try to use that connection.) In addition to solving this for Kibana, this also points to a general need to expand our multi-process support. |
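To make the symptom in the theory above concrete, here is a minimal sketch of how a connection that is torn down while a request is in flight surfaces in Node.js. It is not the agent's code and does not model Kibana's process handling: the helper name `reproduceSocketHangUp` and the throwaway local server are assumptions for illustration, using only Node's built-in `http` module.

```javascript
const http = require('node:http');

// Hypothetical helper (not part of the agent): starts a throwaway local
// server that destroys the socket instead of responding, and resolves with
// the error the client observes.
function reproduceSocketHangUp() {
  return new Promise((resolve, reject) => {
    const server = http.createServer((req) => {
      // Simulate the peer closing the connection while the request is
      // still waiting for a response.
      req.socket.destroy();
    });

    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address();
      const req = http.request({ host: '127.0.0.1', port, method: 'GET', path: '/' });
      req.on('response', () => {
        server.close();
        reject(new Error('unexpected response'));
      });
      req.on('error', (err) => {
        server.close();
        resolve(err);
      });
      req.end();
    });
  });
}

reproduceSocketHangUp().then((err) => {
  // The client-side error carries the ECONNRESET code seen in the reports.
  console.log(err.code, err.message);
});
```

This only shows that a closed connection on the far side is enough to produce the error class being reported; it says nothing about which process or layer closes the connection in the Kibana case.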
Another aspect to consider here -- users have reported they're using APM Server in the cloud when this error occurs. This means their agent configuration looks something like
Understanding what sort of load balancing layers exist between the Agent and the APM Server in the cloud will be important in diagnosing this issue. |
From Slack: Here’s what I think is happening:
|
Let me know if I can help diagnose this problem. I wrote the stream implementation here and I know it's quite complicated and not easy to understand, so if there's anything I can do to help don't hesitate to ask 😃 |
I see this happening on starts, but also restarts, and potentially at any time during the lifecycle of the proxy server, but I haven't been able to confirm the latter yet. |
@dgieselaar Do you know if it happens outside of Kibana as well, or have you only seen this in Kibana so far? If only in Kibana, do you know if it also happens if connecting to a non-proxied APM Server? |
I've only seen it in Kibana in development mode with a proxy Kibana server (which is the proxy I'm referring to). I've not tried any other ways of running Kibana. |
Dario and Tyler have been using Kibana's master branch, which IIUC no longer uses cluster as of elastic/kibana@fd1328f |
I was able to consistently reproduce this by delaying the initialisation of the stream by about 1.5s. I did this in a very gross manner, by adding a timeout before initialising the StreamChopper instance. There is probably a better way. The right delay probably depends on the machine. But for it to reproduce consistently, the stream has to be created before the file watcher log message ( |
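The delay described above could look roughly like the following sketch. It is an assumption about the shape of the hack, not the actual change made to the client: `createStreamAfterDelay` is a hypothetical helper, and a `PassThrough` stream stands in for the StreamChopper instance so the example runs without extra dependencies.

```javascript
const { PassThrough } = require('node:stream');

// Hypothetical helper, not part of the agent: postpones creating a stream by
// `delayMs` and hands it to `onReady` once it exists.
function createStreamAfterDelay(createStream, delayMs, onReady) {
  const timer = setTimeout(() => onReady(createStream()), delayMs);
  // Return a cancel function so callers can clear the pending timer.
  return () => clearTimeout(timer);
}

// PassThrough is a stand-in for the StreamChopper instance; 1500 ms mirrors
// the approximate delay mentioned in the comment above.
createStreamAfterDelay(() => new PassThrough(), 1500, (stream) => {
  console.log('stream created, writable:', stream.writable);
  stream.end();
});
```

As the comment notes, the delay that triggers the error is likely machine-dependent, so the 1500 ms value would need tuning.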
Thanks all for helping out with this. Here is a bit more information: While in development, I was able to see the socket hang up without a Kibana server restart or any other change.
There have been discussions around this being related to the Kibana server restarts, so I decided to work on reproducing outside that environment. I have been able to reproduce using an 8.0.0 snapshot build of Kibana.
I am wondering if this is an issue with the APM server in Cloud. Is there anything that would be helpful to make that determination or rule it out? |
@dgieselaar has informed me my previous comment was due to the |
I have recently started sending data to a different APM Server instance, also in cloud, and am not seeing this error anymore, at least not as often. |
@dgieselaar v3.14.0 of the agent includes a fix for the blocking behaviour issue we were seeing with the agent talking to APM server. I have elastic/kibana#97509 open to update Kibana to use the new agent. Would you be able/willing some time to try to reproduce those same errors you were seeing with the updated agent? |
I don't know for certain, but I've not heard any more reports, so I'm closing this now. We can re-open this or an issue on https://github.com/elastic/apm-agent-nodejs later if the issue re-occurs. (Note that in elastic/apm-agent-nodejs#3507 the http-client code was moved to the apm-agent-nodejs repo.) |
We've received reports that some users are seeing the following error message
These users are using the Elastic Node.js Agent to instrument their Kibana development instances. This issue is a general catch-all thread for information about these errors and our attempts to get a stable, working reproduction so we can diagnose the issue further.