Transactions not always ended if client closes socket prematurely #1411
I've tracked it down to the transaction never ending. Because of that, it is never sent to the mock APM Server, so the test never reaches the expected number of assertions. The reason it never ends is that we're listening for the event referenced in apm-agent-nodejs/lib/instrumentation/http-shared.js (lines 50 to 54 in 3e360ce).
The place this event is fired inside Node core is in the …

The problem is that our code seems to contain a race condition, which triggers if data is written to the response object too soon after the underlying socket has been aborted. At the time the data is written, the ServerResponse object doesn't yet know that the socket is closed, so it queues the data (the last part is just a guess of mine).

Our tests normally don't trigger this race condition because the sockets operate quite quickly. On Jenkins, however, probably because of how we use Docker, it takes much longer for the ServerResponse to be notified that the socket has been closed. According to my tests it sometimes takes more than 10ms on Jenkins, compared with around 1ms on my own machine. Since we only wait 10ms after the client request has been aborted before writing to the response again, we risk triggering the race condition. I can reproduce this error locally by changing the 10ms timeout to 0ms (see apm-agent-nodejs/test/instrumentation/modules/http/aborted-requests-enabled.js, lines 39 to 45 in 3e360ce).
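The failure mode can be illustrated with a simplified, self-contained model. This is a sketch, not Node's real ServerResponse: FakeResponse, Transaction and handleRequest are hypothetical names, and the standard writable-stream 'finish' event stands in for whichever response event the agent actually listens for. The model simply assumes that once the socket is gone, ending the response never produces that event, so a transaction that waits for it stays open forever.

```javascript
'use strict'
const { EventEmitter } = require('events')

// Hypothetical stand-in for http.ServerResponse (a simplified model, not
// Node's real implementation). If the socket has already been destroyed,
// end() silently drops the data and never emits 'finish'.
class FakeResponse extends EventEmitter {
  constructor (socketDestroyed) {
    super()
    this.socketDestroyed = socketDestroyed
  }

  end () {
    if (!this.socketDestroyed) {
      // Emit asynchronously, as a real stream does once data is flushed.
      process.nextTick(() => this.emit('finish'))
    }
    return this
  }
}

// Hypothetical transaction: it is only reported once end() has been called.
class Transaction {
  constructor () {
    this.ended = false
  }

  end () {
    this.ended = true
  }
}

// Instrumentation that relies solely on a response event to end the
// transaction ('finish' is a stand-in event name in this model).
function handleRequest (socketDestroyed) {
  const res = new FakeResponse(socketDestroyed)
  const trans = new Transaction()
  res.on('finish', () => trans.end())
  res.end() // the application finishes the response
  return trans
}

const normal = handleRequest(false)
const aborted = handleRequest(true)

// Inspect the transactions after the nextTick queue has drained.
setImmediate(() => {
  console.log('normal request ended:', normal.ended) // true
  console.log('aborted request ended:', aborted.ended) // false
})
```

In the aborted case nothing ever calls trans.end(), which corresponds to the transaction never reaching the mock APM Server and the test never reaching its expected number of assertions.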
We could just "fix" our tests by changing the 10ms timeout to 100ms, but the race condition would still be there, and whenever it triggers, no transaction is recorded for the request. So why don't we just end the transaction when the ServerResponse closes? We could, and it would be more foolproof, but the reported transaction duration would be cut short whenever the underlying socket was closed prematurely. The server code still treats the response as one it can write to, even though the data is just thrown away, so from the server's point of view the transaction hasn't ended yet. Ending the transaction when the socket closes would therefore skew the average transaction duration metrics and report shorter response times than the real ones.

Possible solutions
Perhaps we should get input from other agents? @elastic/apm-agent-devs
Perhaps I'm missing something, but this specific issue seems to be closely related to how Node.js works. FWIW, in Go the server is notified when the socket is closed, but the application code needs to actively watch for this and return if it wants to. Only once the application code returns is the transaction ended. Option 3 seems like the right thing to do to me, but I have a couple of questions:
Maybe so... at least Node.js streams don't make this easy.
In that case, it sounds like the Go agent "follows" the implementation of option 3 (which is also what I would expect for all the other agents). What HTTP status code/result do you report for those types of
It might be pretty tricky to implement, which is why I leaned more towards option 2. But if we can implement it and also combine it with option 2 as you suggest, it would be the best of both worlds, I guess (see below).
Yes, that's what I meant. Combining 2 and 3 would solve this issue 👍
The issue is that our current implementation relies on the event …

Currently, our implementation relies on the response object being a stream, and we "simply" listen for when that stream ends. This is what makes the implementation complicated. I guess we could instead listen for when someone calls …
At the moment the Go agent does the same. Having an indication that the request was cancelled seems like a useful change, but I'm a bit unsure whether changing the result is the right thing to do, since it'll affect throughput graphs. OTOH, being able to filter them easily may be useful for isolating outliers. From your earlier analysis:
Yes, this does appear to be the case. I tested with v8.11.0 and v10.16.0; the former marks the connection as destroyed immediately, the latter does not.
How about we hook into …
That sounds like a great idea. Unless there's some hidden edge case that I haven't thought of yet, it might just work 👍
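One way to end the transaction when the application code is done with the response, independently of what happens to the socket, is to wrap the response's end() method. The sketch below assumes that approach; endTransactionOnResponseEnd, the trans object and the fake res object are hypothetical names for illustration, not the agent's actual API or the actual patch.

```javascript
'use strict'

// Hypothetical helper (not the agent's real API): end `trans` as soon as the
// application calls res.end(), whether or not the socket is still open.
function endTransactionOnResponseEnd (res, trans) {
  const origEnd = res.end
  let transactionEnded = false

  res.end = function (...args) {
    try {
      // Preserve the original behaviour and return value of end().
      return origEnd.apply(this, args)
    } finally {
      // Guard so that repeated end() calls only end the transaction once,
      // and so that it still ends if the original end() throws.
      if (!transactionEnded) {
        transactionEnded = true
        trans.end()
      }
    }
  }
}

// Minimal fake response whose socket is already gone: its end() never leads
// to any completion event, mirroring the failure mode in this issue.
const res = { end () { return this } }
const trans = { endCount: 0, end () { this.endCount++ } }

endTransactionOnResponseEnd(res, trans)
res.end('late data') // the server writes after the client has disconnected
res.end() // a second call must not end the transaction again
console.log(trans.endCount) // 1
```

Because the wrapper runs whenever the server code calls end(), the recorded duration covers the full time the server spent on the response, even when the client has already gone away. Responses that the application never ends would still need some other fallback.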
This fixes a race condition that in some cases meant that a request transaction was never ended if the underlying TCP socket was closed by the client and the server didn't discover this in time. Fixes elastic#1411
…c#1439) This fixes a race condition that in some cases meant that a request transaction was never ended if the underlying TCP socket was closed by the client and the server didn't discover this in time. Fixes elastic#1411
I created a separate issue regarding marking the transaction as aborted: elastic/apm#154
The first test in test/instrumentation/modules/http/aborted-requests-enabled.js randomly hangs and never completes on Jenkins. It happens roughly once every 5 runs on Jenkins, across multiple Node.js versions. This might be related to #1350.