Question: AWS Lambda + RDS Proxy and Pool class - Best practices? #3016


Closed
ivrtaric opened this issue Jun 21, 2023 · 6 comments

@ivrtaric

Hello everyone,

What would be the best way to use the Pool object together with the RDS Proxy, i.e. to be able to run multiple queries concurrently (e.g. await Promise.all([ query1(), query2(), query3() ])) during a single request to the Lambda handler, but also to make sure the connections are correctly returned to the RDS Proxy pool after the request is processed?

The common suggestion around the web for using the Pool is to declare the pool variable outside the handler function, create the Pool instance on the first request, and have subsequent requests reuse the pool, removing the need to connect to the DB on every Lambda invocation. However, if the Lambda is not invoked often, and/or requires as many connections as it can get (with some reasonable upper limit), it would seem proper to return those connections to the RDS Proxy as soon as processing is finished, instead of keeping them reserved for the lifetime of the Lambda and only returning them after the Lambda is shut down.
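For reference, a minimal sketch of that commonly suggested pattern (assuming a config object holding the usual connection settings):

import { Pool } from 'pg';

// Shared pool, created once per Lambda execution environment and
// reused across invocations.
let pool: Pool | undefined;

export const handler = async (event) => {
  pool ??= new Pool({ ...config }); // created lazily on the first invocation
  const { rows } = await pool.query('SELECT NOW()');
  return rows[0];
};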

I was thinking about something similar to this:

import { Pool, PoolClient } from 'pg';

// config, query1..query3 and retValue are placeholders.
export const handler = async (event) => {
  let pool: Pool | undefined = undefined;
  let client: PoolClient | undefined = undefined;

  try {
    pool = new Pool({ ...config });
    client = await pool.connect();

    const [value1, value2, value3] = await Promise.all([
      client.query(query1),
      client.query(query2),
      client.query(query3)
    ]);

    // Do something with values
    return retValue;
  } catch (e) {
    // Error handling
  } finally {
    client?.release(); // release() is synchronous, no await needed
    await pool?.end(); // close all pooled connections
  }
};

Does this type of Pool usage make sense, or is there some other (better?) way of doing this?

@sehrope (Contributor) commented Jun 22, 2023

Yes, Lambda and similar environments are special, as the freeze / thaw cycle doesn't allow for long-term holding of TCP sockets. It's best to treat each request as stateless and open / close any connections on a per-request basis.

There are a couple of threads from the past discussing this topic:

#1180
#2112
#2243

Creating a pool per request and destroying it at the end of the request should work fine. You can do the same with a single client as well.

I'd suggest either using the pool interface directly with pool.query(...) or changing your code to be sequential. Your example uses the same client concurrently, which, while it does work, I personally think is a bad idea, as there's no explicit order to the commands being executed.

If they're independent SQL statements, they should be run via pool.query(...). If they're meant to run in a particular order, then each should be resolved before starting the next one.

If you are going the pool route, you should also consider limiting the size of the pool. Some people set it to one (1) so that no one lambda can monopolize all the DB connections.
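A minimal sketch of that per-request approach, assuming the same config object as above (max: 1 is just one illustration of the size cap):

import { Pool } from 'pg';

export const handler = async (event) => {
  // Fresh pool per invocation; max caps how many DB connections
  // this one lambda can hold at a time.
  const pool = new Pool({ ...config, max: 1 });
  try {
    // Independent statements go straight through the pool; each
    // pool.query() checks a client out and returns it automatically.
    const { rows } = await pool.query('SELECT NOW()');
    return rows[0];
  } finally {
    await pool.end(); // return all connections before the invocation ends
  }
};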

There's also a set of extension events for Lambda that could be used to clean up resources when it's being shut down: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html. If you register an external extension, your lambda will receive a SIGTERM signal prior to being shut down. You could use that signal to close a shared connection pool. There are some examples of it here: https://github.com/aws-samples/graceful-shutdown-with-aws-lambda

It's more complicated than simply having things be created per-request, but may be worth looking into.
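A sketch of what that cleanup could look like, assuming a module-level shared pool and an extension registered as described in the linked docs:

import { Pool } from 'pg';

const pool = new Pool({ ...config }); // config assumed as above

// With an external extension registered, the runtime sends SIGTERM
// before shutting the execution environment down.
process.on('SIGTERM', async () => {
  await pool.end(); // close idle connections gracefully
  process.exit(0);
});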

@ivrtaric (Author) commented Jun 23, 2023

Thanks for the response.

I'd suggest either using the pool interface directly with pool.query(...) or changing your code to be sequential. Your example uses the same client concurrently, which, while it does work, I personally think is a bad idea, as there's no explicit order to the commands being executed.

The way I described it in the question, the code is not supposed to be sequential, as IMO that would take too much of a performance hit (e.g. concurrently retrieving multiple result sets that don't depend on each other, or updating as many records as possible at the same time, where the update is sufficiently different for each record that it can't be done with a single UPDATE...SET...WHERE command and must instead be applied to one record at a time).

(Edit: OK, the second example should use a transaction, so it would be limited to a single connection anyway, but bear with me on that one)

In this case, would a better solution be to obtain/create a client instance for each concurrent query we're planning to execute, and make sure they're all released (in the finally block) before calling pool.end()?

@sehrope (Contributor) commented Jun 23, 2023

If you want the DB operations to actually happen concurrently, then you will need a separate connection for each one. A single connection can only operate serially. The driver's query interface will internally wait for one query to complete before starting a new one (though, again, I personally do not like relying on that).

You cannot perform multiple SQL DML commands (i.e. INSERT, UPDATE, DELETE, ...) concurrently in a transaction. That has nothing to do with this driver; Postgres simply does not support it, as a given connection can only execute a single command at a time. If the operations are DML, you cannot do them concurrently.

If the operations are only reads (i.e. SELECT), then to execute multiple commands concurrently, create a pool with a size greater than one (the default max is 10) and use it via pool.query(...). The connections will be independent and can execute their commands concurrently.
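A sketch of that, with hypothetical table names and the same assumed config:

import { Pool } from 'pg';

const pool = new Pool({ ...config, max: 3 });

export const handler = async (event) => {
  // Each pool.query() checks out its own client, so these SELECTs
  // run on separate connections concurrently.
  const [users, orders, items] = await Promise.all([
    pool.query('SELECT * FROM users'),
    pool.query('SELECT * FROM orders'),
    pool.query('SELECT * FROM items')
  ]);
  return { users: users.rows, orders: orders.rows, items: items.rows };
};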

Note that each connection will have its own transaction snapshot and may not see exactly the same data. That's how the Postgres MVCC concurrency model works, and it's generally fine for most applications. If not, look into explicit transaction snapshot management to control the exact snapshot each connection reads from: https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION (this is well beyond the scope of this driver, so if you have questions on that, follow up on the Postgres mailing lists).
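For completeness, a rough sketch of that snapshot synchronization (hypothetical usage; see the Postgres docs linked above for the details):

// One connection opens a transaction and exports its snapshot...
const exporter = await pool.connect();
await exporter.query('BEGIN ISOLATION LEVEL REPEATABLE READ');
const { rows } = await exporter.query('SELECT pg_export_snapshot() AS id');

// ...and other connections import it, so all reads see the same data.
// The exporting transaction must stay open while others import.
const reader = await pool.connect();
await reader.query('BEGIN ISOLATION LEVEL REPEATABLE READ');
await reader.query(`SET TRANSACTION SNAPSHOT '${rows[0].id}'`);
// ...run the concurrent SELECTs, then COMMIT and release both clients.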

@brianc (Owner) commented Sep 1, 2023

(though, again, I personally do not like relying on that).

Me neither! Just a heinous API decision I made 12 years ago that I don't want to remove, because it could subtly break a ton of old legacy code. I did try to remove any references to that behavior from the documentation. Maybe in the next major version I could deprecate it with a warning and then remove it in a later major version. A single client (not a pool) should throw/reject if it's already busy running a query, IMO.

@geekyayush

Hey @ivrtaric
Sorry for chiming in.

Can you tell me if you are able to connect to the RDS Proxy from AWS Lambda? If yes, then how? What do your pool settings look like?

For me, I am getting a "Connection terminated unexpectedly" error when using the "kysely" query builder with its Postgres dialect, which under the hood uses node-postgres.

My localhost Postgres server connection works just fine, but when I deploy and try to connect to the RDS Proxy (using username + password) with proper security group settings, I get "Connection terminated unexpectedly".

While researching the issue, I found this GitHub issue: #2112

If you could share how you were able to connect, and what your pool settings are, I would really appreciate it.

PS: I tried to use this library too: https://github.com/igalklebanov/kysely-postgres-js (it uses postgres.js), but then I get a somewhat different error (probably the same issue under the hood, but maybe postgres.js reports it differently).

Again, my apologies for hijacking this thread.

Regards.

@ivrtaric (Author)

Hi @geekyayush

From what I know, there should be no particular difference between connecting to an RDS Proxy and connecting to the actual RDS database, aside from the POSTGRES_HOST value; the RDS Proxy only exposes an additional hostname to connect to, and the actual connections to the database are handled by the Proxy in the background.

There are a couple of ways you can investigate where the problem might lie:

  • first of all, check your Kysely/node-postgres connection properties; maybe you need to specify that the connection should use SSL. Also, if any other devs are successfully connecting to the same RDS Proxy (from different apps), check with them which connection properties they used
  • try connecting directly to the remote Aurora/RDS Postgres instance, i.e. not through the RDS Proxy. If that fails, check with the admins of your AWS account to set up the database access permissions properly. If it works, though, then see what needs to be modified in the RDS Proxy access permissions
  • try creating a small lambda that uses the node-postgres package without any wrappers to connect both to the RDS Proxy and to the actual remote database, and issue a simple query, something like SELECT NOW() (see the sketch below). If both pass, then the issue might lie somewhere in the wrapper library you're using
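Something like this bare-bones handler would do for that last test (taking the host and credentials from environment variables is just an assumption here; the ssl setting may need adjusting for your setup):

import { Pool } from 'pg';

export const handler = async () => {
  const pool = new Pool({
    host: process.env.POSTGRES_HOST, // RDS Proxy endpoint or the DB host
    user: process.env.POSTGRES_USER,
    password: process.env.POSTGRES_PASSWORD,
    database: process.env.POSTGRES_DB,
    ssl: true // RDS Proxy typically requires TLS; adjust as needed
  });
  try {
    const { rows } = await pool.query('SELECT NOW()');
    return rows[0];
  } finally {
    await pool.end();
  }
};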
