Question: AWS Lambda + RDS Proxy and Pool class - Best practices? #3016
Yes, Lambda and similar environments are special, as the freeze/thaw cycle doesn't allow for long-term holding of TCP sockets. It's best to treat each request as stateless and open/close any connections on a per-request basis. There are a couple of threads in the past discussing this topic. Creating a pool per request and destroying it at the end of the request should work fine. You can do the same with a single client as well. I'd suggest either using the pool interface directly (e.g. pool.query) or a single client per request; if they're independent SQL statements, they should each be run as separate queries.

If you are going the pool route, you should also consider limiting the size of the pool. Some people set it to one (1) so that no single Lambda can monopolize all the DB connections.

There's also a set of extension events for Lambda that could be used to clean up resources when it's being shut down: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html. If you register an external extension, your Lambda will receive a SIGTERM signal prior to being shut down. You could use that signal to close a shared connection pool. There are some examples of it here: https://github.com/aws-samples/graceful-shutdown-with-aws-lambda. It's more complicated than simply having things be created per request, but may be worth looking into.
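A minimal sketch of the shutdown-hook approach described above, assuming an external extension is registered so the runtime actually receives SIGTERM; the pool settings and query are illustrative, and connection details are read from the standard PG* environment variables:

```js
const { Pool } = require('pg');

// Shared pool, reused across invocations while the execution environment stays warm.
const pool = new Pool({ max: 1 });

// With an external extension registered, the runtime gets SIGTERM shortly before
// the execution environment is shut down; use it to return connections to the proxy.
process.on('SIGTERM', async () => {
  console.log('SIGTERM received, closing pool');
  await pool.end();
  process.exit(0);
});

exports.handler = async () => {
  const { rows } = await pool.query('SELECT now() AS current_time');
  return { statusCode: 200, body: JSON.stringify(rows[0]) };
};
```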
Thanks for the response.
The way I described it in the question, the idea is that the code is not supposed to be sequential, as IMO that would take too much of a performance hit: e.g. concurrently retrieving multiple result sets which do not depend on each other, or updating as many records as possible at the same time, where the update differs enough for each record that it can't be done with a single UPDATE...SET...WHERE command and must instead be applied to one record at a time. (Edit: OK, the second example should be using a transaction, so it would be limited to a single connection anyway, but bear with me on that one.)

In this case, would a better solution be to obtain/create a client instance for each concurrent query we're planning to execute, and make sure they're all released (in the finally block) before calling pool.end()?
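For illustration, a sketch of the pattern described here, assuming three independent read queries (table names and pool size are placeholders):

```js
const { Pool } = require('pg');

async function handleRequest() {
  const pool = new Pool({ max: 3 }); // one connection per planned concurrent query
  const clients = await Promise.all([pool.connect(), pool.connect(), pool.connect()]);

  try {
    // Each checked-out client runs its query on its own connection.
    const [users, orders, invoices] = await Promise.all([
      clients[0].query('SELECT * FROM users'),
      clients[1].query('SELECT * FROM orders'),
      clients[2].query('SELECT * FROM invoices'),
    ]);
    return { users: users.rows, orders: orders.rows, invoices: invoices.rows };
  } finally {
    clients.forEach((client) => client.release()); // release every client first...
    await pool.end();                              // ...then shut the pool down
  }
}
```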
If you want the DB operations to actually happen concurrently, then you will need a separate connection for each one. A single connection can only operate serially. The driver's query interface will internally wait for one query to complete before starting a new one (though, again, I personally do not like relying on that).

You cannot perform multiple SQL DML (i.e. INSERT, UPDATE, DELETE, ...) commands concurrently in a transaction. That has nothing to do with this driver: Postgres simply does not support it, as a given connection can only execute a single command at a time. If the operations are DML, you cannot do them concurrently.

If the operations are only reads (i.e. SELECT), then to execute multiple commands concurrently, create a pool with a size greater than one (I think the default is 10) and issue the queries through the pool; it will run them on separate connections.

Note that each connection will have its own transaction snapshot and may not see exactly the same data. That's how the Postgres MVCC concurrency model works, and it's generally fine for most applications. If not, look into explicit transaction snapshot management to control the exact snapshot each connection reads from: https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION (this is well beyond the scope of this driver, so if you have questions on that, follow up on the Postgres mailing lists).
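As a concrete illustration of the transaction point above, a sketch of per-record updates applied serially on a single checked-out connection (table and column names are placeholders):

```js
const { Pool } = require('pg');

const pool = new Pool();

// DML inside a transaction has to run serially on a single connection.
async function applyUpdates(updates) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    for (const { id, value } of updates) {
      // Each statement waits for the previous one; Postgres will not run
      // multiple commands concurrently on the same connection.
      await client.query('UPDATE items SET value = $1 WHERE id = $2', [value, id]);
    }
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
```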
Me either! Just a heinous API decision I made 12 years ago that I don't want to remove because it could subtly break a ton of old legacy code. I did try to remove any references to that behavior from the documentation. Though maybe in the next major version I could deprecate it with a warning and then remove it in an even later major version. A single client (not a pool) should throw/reject if it's already busy running a query, IMO.
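For context, a small sketch of the queueing behaviour being discussed: a single Client currently accepts overlapping query() calls and simply runs them one after another (connection settings come from the standard environment variables):

```js
const { Client } = require('pg');

async function demo() {
  const client = new Client();
  await client.connect();

  // Both calls are accepted immediately; the client queues the second query
  // internally and executes them serially on its single connection.
  const slow = client.query('SELECT pg_sleep(1)');
  const fast = client.query('SELECT 1 AS done');

  await Promise.all([slow, fast]); // resolves after both have run, one after the other
  await client.end();
}

demo();
```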
Hey @ivrtaric, can you tell me if you are able to connect to the RDS Proxy from AWS Lambda? If yes, then how? What do your pool settings look like?

For me, I am getting "Connection terminated unexpectedly" when using the "kysely" query builder with their Postgres dialect, which under the hood uses node-postgres. My localhost Postgres server connection works just fine, but when I deploy and try to connect to the RDS Proxy (using username + password) with proper security group settings, I get "Connection terminated unexpectedly". While researching the issue, I found this GitHub issue: #2112

If you could tell me how you are able to connect and what your pool settings are, I would really appreciate it.

PS: I tried to use this library too: https://github.com/igalklebanov/kysely-postgres-js

Again, my apologies for hijacking this thread. Regards.
Hi @geekyayush. From what I know, there should be no particular difference between connecting to an RDS Proxy and connecting to the actual RDS database, aside from the POSTGRES_HOST value, as the RDS Proxy only exposes an additional hostname to connect to, and the actual connections to the database are handled by the Proxy in the background. There are a couple of ways you can investigate where the problem might lie.
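For example (purely as an illustration, with placeholder host and credentials), one way to narrow it down is to connect with node-postgres directly, bypassing kysely, to see whether the problem is in the query builder or in the connection settings; note that an RDS Proxy is often configured to require TLS, in which case ssl must be enabled in the client config:

```js
const { Client } = require('pg');

async function checkProxyConnection() {
  const client = new Client({
    host: 'my-proxy.proxy-xxxxxxxx.us-east-1.rds.amazonaws.com', // placeholder proxy endpoint
    user: process.env.PGUSER,
    password: process.env.PGPASSWORD,
    database: process.env.PGDATABASE,
    // If the proxy enforces TLS, a plain-text connection is dropped;
    // enabling ssl here rules that out as the cause.
    ssl: { rejectUnauthorized: false },
    connectionTimeoutMillis: 5000,
  });

  await client.connect();
  const { rows } = await client.query('SELECT 1 AS ok');
  console.log(rows);
  await client.end();
}

checkProxyConnection().catch(console.error);
```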
Hello everyone,
What would be the best way to use the Pool object together with the RDS Proxy, i.e. to be able to run multiple queries concurrently (e.g. await Promise.all([ query1(), query2(), query3() ])) during a single request to the Lambda handler, but also to make sure the connections are correctly returned to the RDS Proxy pool after the request is processed?

All around the web, the suggestion for using the Pool is: declare the pool variable outside the handler function, create the Pool instance on the first request, then have subsequent requests reuse the pool, removing the need to connect to the DB every time the Lambda is invoked. However, if the Lambda is not invoked often, and/or requires as many connections as it can get (with some reasonable upper limit), it would seem proper to return those connections to the RDS Proxy as soon as the processing is finished, instead of keeping them reserved for the Lambda's lifetime and only returning them after the Lambda is shut down.
I was thinking about something similar to this:
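A rough sketch of the kind of thing being described (the handler shape, pool settings, and queries are assumptions, not the original snippet): a per-request pool used for concurrent queries and closed before the handler returns.

```js
const { Pool } = require('pg');

exports.handler = async (event) => {
  // Pool created per invocation; connections go back to the RDS Proxy at the end.
  const pool = new Pool({
    host: process.env.PGHOST, // RDS Proxy endpoint
    max: 3,                   // reasonable upper limit for this handler
  });

  try {
    // Independent queries run concurrently, each on its own pooled connection.
    const [a, b, c] = await Promise.all([
      pool.query('SELECT 1 AS one'),
      pool.query('SELECT 2 AS two'),
      pool.query('SELECT 3 AS three'),
    ]);
    return {
      statusCode: 200,
      body: JSON.stringify({ one: a.rows[0], two: b.rows[0], three: c.rows[0] }),
    };
  } finally {
    await pool.end();
  }
};
```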
Does this type of Pool usage make sense, or is there some other (better?) way of doing this?