Question: AWS Lambda + RDS Proxy and Pool class - Best practices? #3016


Closed
ivrtaric opened this issue Jun 21, 2023 · 6 comments

@ivrtaric

Hello everyone,

What would be the best way to use the Pool object together with the RDS Proxy, i.e. to be able to run multiple queries concurrently (e.g. await Promise.all([ query1(), query2(), query3() ])) during a single request to the Lambda handler, but also to make sure the connections are correctly returned to the RDS Proxy pool after the request is processed?

The common suggestion around the web for using the Pool is to declare the pool variable outside the handler function, create the Pool instance on the first request, and have subsequent requests reuse the pool, removing the need to connect to the DB on every Lambda invocation. However, if the Lambda is not invoked often, and/or requires as many connections as it can get (with some reasonable upper limit), it would seem proper to return those connections to the RDS Proxy as soon as processing is finished, instead of keeping them reserved for the lifetime of the Lambda and only returning them after the Lambda is shut down.
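For reference, a minimal sketch of that commonly suggested pattern (assuming a config object holding the usual connection settings):

import { Pool } from 'pg';

// Shared pool, created once per Lambda execution environment and
// reused across invocations.
let pool: Pool | undefined;

export const handler = async (event) => {
  pool ??= new Pool({ ...config }); // created lazily on the first invocation
  const { rows } = await pool.query('SELECT NOW()');
  return rows[0];
};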

I was thinking about something similar to this:

import { Pool, PoolClient } from 'pg';

// config, query1..query3 and retValue are placeholders.
export const handler = async (event) => {
  let pool: Pool | undefined = undefined;
  let client: PoolClient | undefined = undefined;

  try {
    pool = new Pool({ ...config });
    client = await pool.connect();

    const [value1, value2, value3] = await Promise.all([
      client.query(query1),
      client.query(query2),
      client.query(query3)
    ]);

    // Do something with values
    return retValue;
  } catch (e) {
    // Error handling
  } finally {
    client?.release(); // release() is synchronous, no await needed
    await pool?.end(); // close all pooled connections
  }
};

Does this type of Pool usage make sense, or is there some other (better?) way of doing this?

@sehrope (Contributor) commented Jun 22, 2023

Yes, Lambda and similar environments are special, as the freeze / thaw cycle doesn't allow for long-term holding of TCP sockets. It's best to treat each request as stateless and open / close any connections on a per-request basis.

There are a couple of threads from the past discussing this topic:

#1180
#2112
#2243

Creating a pool per request and destroying it at the end of the request should work fine. You can do the same with a single client as well.

I'd suggest either using the pool interface directly with pool.query(...) or changing your code to be sequential. Your example uses the same client concurrently, which, while it does work, I personally think is a bad idea, as there's no explicit order to the commands being executed.

If they're independent SQL statements, they should be run via pool.query(...). If they're meant to run in a particular order, then each should be resolved before starting the next one.

If you are going the pool route, you should also consider limiting the size of the pool. Some people set it to one (1) so that no one lambda can monopolize all the DB connections.
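A minimal sketch of that per-request approach, assuming the same config object as above (max: 1 is just one illustration of the size cap):

import { Pool } from 'pg';

export const handler = async (event) => {
  // Fresh pool per invocation; max caps how many DB connections
  // this one lambda can hold at a time.
  const pool = new Pool({ ...config, max: 1 });
  try {
    // Independent statements go straight through the pool; each
    // pool.query() checks a client out and returns it automatically.
    const { rows } = await pool.query('SELECT NOW()');
    return rows[0];
  } finally {
    await pool.end(); // return all connections before the invocation ends
  }
};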

There's also a set of extension events for Lambda that could be used to clean up resources when it's being shut down: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html. If you register an external extension, your lambda will receive a SIGTERM signal prior to being shut down. You could use that signal to close a shared connection pool. There are some examples of it here: https://github.com/aws-samples/graceful-shutdown-with-aws-lambda

It's more complicated than simply having things be created per-request, but may be worth looking into.
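A sketch of what that cleanup could look like, assuming a module-level shared pool and an extension registered as described in the linked docs:

import { Pool } from 'pg';

const pool = new Pool({ ...config }); // config assumed as above

// With an external extension registered, the runtime sends SIGTERM
// before shutting the execution environment down.
process.on('SIGTERM', async () => {
  await pool.end(); // close idle connections gracefully
  process.exit(0);
});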

@ivrtaric (Author) commented Jun 23, 2023

Thanks for the response.

I'd suggest either using the pool interface directly with pool.query(...) or changing your code to be sequential. Your example uses the same client concurrently, which, while it does work, I personally think is a bad idea, as there's no explicit order to the commands being executed.

The way I described it in the question, the code is not supposed to be sequential, as IMO that would take too much of a performance hit (e.g. concurrently retrieving multiple result sets that don't depend on each other, or updating as many records as possible at the same time, where the update is sufficiently different for each record that it can't be done with a single UPDATE...SET...WHERE command and must instead be applied to one record at a time).

(Edit: OK, the second example should use a transaction, so it would be limited to a single connection anyway, but bear with me on that one)

In this case, would a better solution be to obtain/create a client instance for each concurrent query we're planning to execute, and make sure they're all released (in the finally block) before calling pool.end()?

@sehrope (Contributor) commented Jun 23, 2023

If you want the DB operations to actually happen concurrently, then you will need a separate connection for each one. A single connection can only operate serially. The driver's query interface will internally wait for one query to complete before starting a new one (though, again, I personally do not like relying on that).

You cannot perform multiple SQL DML commands (i.e. INSERT, UPDATE, DELETE, ...) concurrently in a transaction. That has nothing to do with this driver; Postgres simply does not support it, as a given connection can only execute a single command at a time. If the operations are DML, you cannot do them concurrently.

If the operations are only reads (i.e. SELECT), then to execute multiple commands concurrently, create a pool with a size greater than one (the default max is 10) and use it via pool.query(...). The connections will be independent and can execute their commands concurrently.
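A sketch of that, with hypothetical table names and the same assumed config:

import { Pool } from 'pg';

const pool = new Pool({ ...config, max: 3 });

export const handler = async (event) => {
  // Each pool.query() checks out its own client, so these SELECTs
  // run on separate connections concurrently.
  const [users, orders, items] = await Promise.all([
    pool.query('SELECT * FROM users'),
    pool.query('SELECT * FROM orders'),
    pool.query('SELECT * FROM items')
  ]);
  return { users: users.rows, orders: orders.rows, items: items.rows };
};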

Note that each connection will have its own transaction snapshot and may not see exactly the same data. That's how the Postgres MVCC concurrency model works, and it's generally fine for most applications. If not, look into explicit transaction snapshot management to control the exact snapshot each connection reads from: https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION (this is well beyond the scope of this driver, so if you have questions on that, follow up on the Postgres mailing lists).
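For completeness, a rough sketch of that snapshot synchronization (hypothetical usage; see the Postgres docs linked above for the details):

// One connection opens a transaction and exports its snapshot...
const exporter = await pool.connect();
await exporter.query('BEGIN ISOLATION LEVEL REPEATABLE READ');
const { rows } = await exporter.query('SELECT pg_export_snapshot() AS id');

// ...and other connections import it, so all reads see the same data.
// The exporting transaction must stay open while others import.
const reader = await pool.connect();
await reader.query('BEGIN ISOLATION LEVEL REPEATABLE READ');
await reader.query(`SET TRANSACTION SNAPSHOT '${rows[0].id}'`);
// ...run the concurrent SELECTs, then COMMIT and release both clients.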

@brianc (Owner) commented Sep 1, 2023

(though, again, I personally do not like relying on that).

Me neither! Just a heinous API decision I made 12 years ago that I don't want to remove, because it could subtly break a ton of old legacy code. I did try to remove any references to that behavior from the documentation. Maybe in the next major version I could deprecate it with a warning and then remove it in a later major version. A single client (not a pool) should throw/reject if it's already busy running a query, IMO.

@geekyayush

Hey @ivrtaric
Sorry for chiming in.

Can you tell me if you are able to connect to the RDS Proxy from AWS Lambda? If yes, then how? What do your pool settings look like?

For me, I am getting a "Connection terminated unexpectedly" error when using the "kysely" query builder with its Postgres dialect, which under the hood uses node-postgres.

My localhost Postgres server connection works just fine, but when I deploy and try to connect to the RDS Proxy (using username + password) with proper security group settings, I get "Connection terminated unexpectedly".

While researching the issue, I found this GitHub issue: #2112

If you could share how you were able to connect, and what your pool settings are, I would really appreciate it.

PS: I tried to use this library too: https://github.com/igalklebanov/kysely-postgres-js (it uses postgres.js), but then I get a somewhat different error (probably the same issue under the hood, but maybe postgres.js reports it differently).

Again, my apologies for hijacking this thread.

Regards.

@ivrtaric (Author)

Hi @geekyayush

From what I know, there should be no particular difference between connecting to an RDS Proxy and connecting to the actual RDS database, aside from the POSTGRES_HOST value; the RDS Proxy only exposes an additional hostname to connect to, and the actual connections to the database are handled by the Proxy in the background.

There are a couple of ways you can investigate where the problem might lie:

  • first of all, check your Kysely/node-postgres connection properties; maybe you need to specify that the connection should use SSL. Also, if any other devs are successfully connecting to the same RDS Proxy (from different apps), check with them which connection properties they used
  • try connecting directly to the remote Aurora/RDS Postgres instance, i.e. not through the RDS Proxy. If that fails, check with the admins of your AWS account to set up the database access permissions properly. If it works, though, then see what needs to be modified in the RDS Proxy access permissions
  • try creating a small lambda that uses the node-postgres package without any wrappers to connect both to the RDS Proxy and to the actual remote database, and issue a simple query, something like SELECT NOW() (see the sketch below). If both pass, then the issue might lie somewhere in the wrapper library you're using
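Something like this bare-bones handler would do for that last test (taking the host and credentials from environment variables is just an assumption here; the ssl setting may need adjusting for your setup):

import { Pool } from 'pg';

export const handler = async () => {
  const pool = new Pool({
    host: process.env.POSTGRES_HOST, // RDS Proxy endpoint or the DB host
    user: process.env.POSTGRES_USER,
    password: process.env.POSTGRES_PASSWORD,
    database: process.env.POSTGRES_DB,
    ssl: true // RDS Proxy typically requires TLS; adjust as needed
  });
  try {
    const { rows } = await pool.query('SELECT NOW()');
    return rows[0];
  } finally {
    await pool.end();
  }
};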
