Getting a connection from pool without actually running a query against it will leak it. #137
What version are you using? If you look in lib/defaults.js, poolIdleTimeout should be set to 30 seconds by default, meaning that after 30 seconds your client will be destroyed and removed from the pool, which should cause a new client to be created on the next call to connect. Granted, this is probably not what you want either, as you'll have a bunch of reconnects going on.

In a web app, I used to grab a client via connect in the middleware and attach it to the request object as a convenience, to save a callback in the main "controller" function, and I ran into this same problem. My fix was to emit "drain" on the response.end, but ultimately I ended up just installing plv8 on the server, as dealing with transactions in an async environment turns out to be a PITA. plv8 is very cool: it has a synchronous workflow yet is very performant because there are no round trips to the server.

+1 for better logging. I've added logging of queries to the console, etc. generic-pool, used by node-postgres, already has some debug logging, but it's not exposed via the node-postgres API. Pull request forthcoming when I get my project in a deployable state!
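The middleware fix described above can be sketched roughly like this. It is a hypothetical illustration, not the actual node-postgres API: `getClient` stands in for `pg.connect`, and it assumes, as discussed in this thread, that emitting `'drain'` returns a pooled client to the pool.

```javascript
// Sketch: attach a pooled client to each request, and emit 'drain' when
// the response ends so the client is returned even if no query ran.
// All names here are illustrative.
function attachClient(getClient) {
  return function middleware(req, res, next) {
    getClient(function (err, client) {
      if (err) return next(err);
      req.db = client;
      var originalEnd = res.end;
      res.end = function () {
        // In this era of node-postgres, emitting 'drain' handed a
        // pooled client back to the pool.
        client.emit('drain');
        return originalEnd.apply(this, arguments);
      };
      next();
    });
  };
}
```

The downside, as noted above, is that the client is held for the full lifetime of the request even when the handler never needs it.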
No, I'm aware of the poolIdleTimeout. The issue is that you can grab a connection from the pool and then not do anything with it, i.e. call pg.connect and never run a query on the client it hands you. However, like I said above, I'm not sure there is a fix for this: because I asked for the connection, the pg library is making a "proper" assumption that I needed it, and since I haven't used it yet, it leaves it alone.

The only solution that seems worthwhile is adding some additional (on by default) warning logging when the pool gets to 0 available connections and the number of waiting requests is greater than 10% of the pool size. Emitting "drain" would work to return the connection to the pool, but it might be better to have a helper function that does this (e.g. client.finished(), client.returnToPool(), or client.unused()) -- just something that documents this issue, because no one on my team had a clue this could happen, and it took a lengthy debug session to track it through our code back to this issue.

On the logging: I can see immediately that I have 5 connections in use and 3 requests waiting in the pool. Within a few seconds I would expect the numbers to even out, since those three waiting requests will create three new connections. But if I frequently see the pool stay exhausted with requests still waiting, I know something leaked. In addition, my debug logging associates a random "id" with every connection (this.id = parseInt(Math.random()*10000,10); at client creation). All the logging carries this "id", so I can easily trace an id through the log -- when it is returned to the pool, any re-use, and finally its cleanup -- and see which connection leaked and who called it.

As for plv8 -- that is very cool, I'm going to have to look into that. I hadn't seen that project, thx! :)
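The helper suggested above could look something like this sketch. The method name is hypothetical, not part of node-postgres, and it assumes (as discussed in this thread) that emitting `'drain'` hands a pooled client back to the pool.

```javascript
// Sketch: give a checked-out client an explicit "I never used this,
// take it back" method, so the intent is documented in calling code.
function addReturnHelper(client) {
  client.returnToPool = function () {
    // Nothing was queued on this client; give it straight back.
    this.emit('drain');
  };
  return client;
}
```

With something like this, a code path that decides it does not need the connection after all can call `client.returnToPool()` instead of silently leaking it.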
+1 for exposing the built-in pool logging from node-pool, since it's already there. As for adding debug logging into node-postgres in other places, I'm not a fan of that. If you really want logging, you can monkey-patch around the methods you need to log, or log when you call into the node-postgres library. I find having different modules all log in different ways annoying (like the way express and socket.io log for you by default). Logging should be handled in the application layer, since it is infinitely varied depending on the needs and likes of the developer reading the log. I view requesting a connection from the pool and not using the connection as a code bug, actually. If you take a connection from the pool, you had better use it. If you don't use it, you can manually emit "drain" to return it.
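The monkey-patching approach suggested above might look like this. Everything here is illustrative: `pool` is any object exposing `connect(config, cb)`, the log format is made up, and it assumes `'drain'` is how a client goes back to the pool in this version.

```javascript
// Sketch: wrap connect at the application layer to count and log
// checkouts and returns, without touching the library's internals.
function logCheckouts(pool, log) {
  var originalConnect = pool.connect.bind(pool);
  var outstanding = 0;
  pool.connect = function (config, callback) {
    originalConnect(config, function (err, client) {
      if (client) {
        outstanding += 1;
        log('checked out, outstanding=' + outstanding);
        client.once('drain', function () {
          outstanding -= 1;
          log('returned, outstanding=' + outstanding);
        });
      }
      callback(err, client);
    });
  };
}
```

A steadily climbing `outstanding` count in the log is exactly the leak signature this issue describes.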
Should be moved outside of the
Oh, I don't disagree, Brian -- that is the best solution. I am now checking, outside the connect, whether the variables that typically get built and sent to the server are all null, and bypassing the connect entirely in that case. :-) And I would also consider it a bug in our code. However, this is a nasty gotcha if you don't know about it. I fully understand the reasoning for why it works the way it does, and I'm not advocating any sweeping changes; all I'm advocating is adding a simple debug check, which would be quick to write and would make sure this is caught early rather than much, much later after a very long debug session. :-) My team probably won't get caught by this again; I want to make sure tomorrow someone else doesn't get caught by it.
We were also caught by this. In our previous code base (before we added caching) we fetched a database connection for each user request. In the cases where one wasn't used, because the user made an invalid request, the connection was leaked. Until I saw this issue, we never actually took the time to find this problem. Now that we ask for connections lazily, it all works.

However, I think the way this pool functions is fundamentally broken. Having the pool code guess when to free the connection is alright only if it is 100% correct. Since it isn't, it should not accept that responsibility. If it were me, I'd make the connection objects proxies that only actually connect when they are first used. That way, you wouldn't leak them. Also, putting a warning somewhere might be a good idea.
I like the idea of returning a proxy which only connects if it receives a query.
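The lazy proxy both comments above describe could be sketched like this. None of it is the node-postgres API: `checkout` stands in for the pool's acquire function, and the proxy's shape is illustrative.

```javascript
// Sketch: the caller gets a proxy immediately, but a real client is
// checked out of the pool only when the first query() arrives. A proxy
// that is never used takes nothing from the pool, so nothing can leak.
function lazyClient(checkout) {
  var client = null;
  var waiting = null; // non-null while a checkout is in flight
  function withClient(fn) {
    if (client) return fn(null, client);
    if (waiting) return waiting.push(fn);
    waiting = [fn];
    checkout(function (err, c) {
      client = c;
      var fns = waiting;
      waiting = null;
      fns.forEach(function (f) { f(err, c); });
    });
  }
  return {
    query: function (text, callback) {
      withClient(function (err, c) {
        if (err) return callback(err);
        c.query(text, callback);
      });
    }
  };
}
```

Subsequent queries on the same proxy reuse the client that was already checked out.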
Did the debug code ever enter the codebase? I have code starving while waiting for a callback after exactly 5 connection failures, and I'm wondering if it's due to this problem.
It hasn't entered the codebase. Do you have a failing test case for this?
On Fri, Jul 13, 2012 at 08:51:02AM -0700, Brian Carlson wrote:
Not really. I'm chasing a heisenbug and can't really tell yet where it's rooted. --strk
On Fri, Jul 13, 2012 at 06:04:56PM +0200, strk wrote:
Oh, by the way, the tests do run fine for me, but I guess none of the tests exercise this scenario. --strk
try
On Fri, Jul 13, 2012 at 01:06:59PM -0700, Brian Carlson wrote:
Ah, great! May I suggest having the required tables created as part of it? It is possible you have not yet run the table create script under script/create-test-tables. --strk
I'm going to talk to felixge sometime soon about connection pooling, and hopefully we'll have a slightly more elegant solution once we put our heads together.
@NathanaelA asked for a helper along the lines of client.returnToPool() above.
Yeah, I think the methods on the pooled clients dealing with checking in and checking out should semantically match the actual operations. I've been thinking a lot about this recently, and it will probably require a new-ish API for using the pool. I'm actually thinking of implementing the pool in a stand-alone module, but I'm still not sure about that. What do you propose the helper look like?
FYI, we were just bitten by this pretty hard; it eventually caused our app to hang. I've removed all the paths through our code where we would grab a client and not do anything with it, which solves the immediate problem. I'm considering doing something with a process.nextTick check against the queue size to fire drain, especially since we can't (immediately) upgrade to node 0.8 and use newer stuff. Would such a patch be useful?
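A rough sketch of that process.nextTick idea: hand the client to the caller, then on the next tick check whether any work was queued; if not, emit `'drain'` so the pool reclaims it. Treat the property name `queryQueue` as an assumption mirroring the internal queue clients of that era kept, not as documented API.

```javascript
// Sketch: guard pg.connect-style checkouts against callers that take a
// client and queue nothing on it before returning to the event loop.
function connectWithLeakGuard(connect, config, callback) {
  connect(config, function (err, client) {
    callback(err, client);
    if (!client) return;
    process.nextTick(function () {
      // If the caller queued no work at all, return the client.
      if (client.queryQueue && client.queryQueue.length === 0) {
        client.emit('drain');
      }
    });
  });
}
```

The obvious caveat is the one raised below: a caller that legitimately wants to hold the client across ticks (e.g. for LISTEN) would need a way to opt out.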
@mattly I don't get your process.nextTick idea. If you don't mind, provide us with a pull request so we can discuss the code?
For what it is worth, I seem to be hitting this issue too. The way I structure my code is to apply a "decorator" to all my data functions that checks out the postgres client and returns it whenever the callback is executed. Since the pool handling is done outside my function, it is tough to ensure that I don't have a return path that discards the client without use. I'm thinking about just using node-pool manually, since my use case might be too hard to bake in.
It seems like the API in this case is backwards. We're instantiating and obtaining connections, and then we're stuck with the problem of when to drain/release them back into the pool. Just thinking out loud here, but wouldn't it be interesting to have an API like this:

```js
var query = new Query('select * from users');
var config = 'postgres://localhost/my_database'; // could be an object of params as well

query.run(config, function (err, results) {
  // do something with the results
});
```

In this API, checking a client out of the pool and releasing it afterwards are handled internally by the run call. The one case that still needs a long-lived client, such as LISTEN, could have run return the client:

```js
var query = new Query('LISTEN my_channel');
var config = 'postgres://localhost/my_database';

var client = query.run(config, function (err, results) {
  // do something with the results
});

client.pauseDrain();
```

This way you could explicitly tell the client that you want it to hang around for a while and not be automatically collected. Just some thoughts, but it would help to resolve this issue and some of the headaches around managing the client pool.
How are you envisioning transactions? |
@nick-apperson Off the top of my head, I think we could just handle them in a similar fashion:

```js
var trans = new Transaction();

trans.add(query1);
trans.add(query2);
trans.add(query3);

trans.run(config, function (err, results) {
  // do something with the results
});
```

If you needed results on a per-query basis, you could aggregate them in the returned results object.
@mjijackson That behavior would be broken. Often, when you use transactions, you do not know when you start how many queries you will need. Perhaps it would be better to do:

```js
var trans = new Transaction();

trans.run(query1, callback1);
trans.run(query2, callback2);

trans.commit(final_callback);
```
That looks good to me.
About bringing transactions into the library interface: IMHO, it will only complicate the library without a good return on investment. In its current use, a connection/client stands at the perfect point as a container for transactions or any other grouping of queries. Once you start bringing SQL into the library, the need for adding more and more containers will never end. Fortunately this is not a Java library.

In one of my applications, I switch between schemas, run a few queries, and then switch to another. Multiple jobs are done concurrently, so I need each job to keep things separate. These groups of queries could of course be contained in separate transactions, but that would complicate things that are already really simple. I believe the library is a good example of Node's philosophy of keeping things simple and playable so people can shape them as they need. The suggested implementations are trivial, so anybody can build them if they wish that convenience in their application. Bringing them into the library itself would break the beauty of simplicity it currently has.

For the original issue, I believe keeping the solution in the implementation of the client object is the correct way to go. Making the client object separate from an actual connection, and acquiring the actual connection only when the first query arrives, looks like a very reasonable solution: it is simple, and it solves the problem without any change to the API. On the other hand, returning a client object from a query exec call is not a nice way of carrying implementation details up to the API level. These are just my humble opinions, as I stated at the beginning.
Ugh, I just typed a huge response and accidentally deleted it. If you have it in your email, would you be so kind as to repost it here or forward it to me? The tl;dr version is this:

```js
var pg = require('pg');
var connectionString = 'pg://blabla@blabla:5432/bla';

for (var i = 0; i < 100; i++) {
  pg.connect(connectionString, function (err, client) {
    // do nothing
  });
}
```

We want connections to be lazily initialized only on the first call to query.
From brianc:

> I absolutely agree with @hasanyasin -- honestly he said it better than I could have with regards to a transactions API. I believe the library is a good example of Node's philosophy of keeping things simple and playable so people can play with them as they need. Suggested implementations are trivial to implement, so anybody can do it if they wish that convenience in their application. It might be good to come up with a wrapper module you enjoy using, with transaction support, SQL support, and so on, for a higher-level API.
>
> I definitely think the pooling logic within node-postgres needs some iterations or tweaks to get rid of the large gotchas it currently contains, but I aim for "the simplest thing that could possibly work", valuing correctness, explicitness, and freedom from errors over succinctness or a high-level API. This is, after all, a database client and not a fully implemented data access layer. The less we assume for the consumers of this API, the less our consumers, present and future, will have to force their code to work around our assumptions or monkey-patch their way back to their version of code sanity.
>
> Now, for the implementation of the lazy client: what we want is something like the loop above, where pg.connect is called many times without any queries, to no longer leak connections?

I agree that bringing transactions into the system would be unnecessary and undesired. However, there is connection pooling built in that is fundamentally broken. I wrote a connection pool for our redis connections in about an hour; connection pooling is easy to do. What is not easy is working with connection pooling that is broken. I'd honestly prefer it just wasn't in the node pg module at all.
Thanks @nick-apperson... I accidentally clicked "delete" and then, like always, clicked the confirmation box without reading it. Impatience FTL
I've changed the pool substantially to no longer rely on the drain event: https://github.com/brianc/node-postgres/wiki/pg
If I grab a connection and never send a query to the server, this will leak the connection out of the pool. Once you hit the max (the default is 10), you are hosed. I'm not sure there is a lot that can be done about this, since you don't know why the caller got the connection or why it is still unused. Maybe add a debug-mode timer to check how old all unused connections are and spit out an error in the event they are more than 60 seconds old -- but at the minimum it probably should be documented somewhere.
Consideration for debugging purposes: I added a `this.id = parseInt(Math.random()*10000,10);` to the client.js p.connect routine, then printed this id on every pool log output. This allowed me to see the entire flow of a pooled connection (create, reuse, return, pause, drain, destruction) and find the missing connection. Then in my code I was able to use client.id to track it through my code. This was very useful for debugging purposes. You might also consider adding to your pool logging how many connections are out (count minus available), including, in the automated destruction routine, how many are still "out".
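The debug-mode timer floated above could be sketched along these lines. Everything here (names, wiring, log wording) is illustrative rather than node-postgres code; it only tracks ids the application registers.

```javascript
// Sketch: stamp each checked-out connection id and periodically warn
// about any that have been out, unused, for longer than the threshold.
function leakDetector(maxIdleMs, warn) {
  var outstanding = {}; // id -> checkout timestamp
  function check(now) {
    now = now || Date.now();
    Object.keys(outstanding).forEach(function (id) {
      var age = now - outstanding[id];
      if (age > maxIdleMs) {
        warn('connection ' + id + ' checked out ' + age + 'ms ago and never used');
      }
    });
  }
  var timer = setInterval(check, maxIdleMs);
  if (timer.unref) timer.unref(); // don't keep the process alive for this
  return {
    checkedOut: function (id) { outstanding[id] = Date.now(); },
    used: function (id) { delete outstanding[id]; },
    check: check, // exposed so it can also be run on demand
    stop: function () { clearInterval(timer); }
  };
}
```

Paired with the random per-connection id above, each warning points straight at the call site that leaked.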