
avoid static lifetime requirement for handler functions #310


Closed
rasviitanen opened this issue Mar 28, 2021 · 21 comments

Comments

@rasviitanen
Contributor

rasviitanen commented Mar 28, 2021

I would like to remove the requirement of having 'static bounds on the handler generics if possible. This would make it easier to pass in shared resources that have been set up before creating the handler, and avoid a bunch of cloning.

It seems like a tokio::spawn is used to produce a future here:

let task = tokio::spawn(async move { handler.call(body, ctx) });
let req = match task.await {

By taking a guess, I would assume that the reason is to catch a panic when generating the future (the JoinHandle is immediately awaited and nothing else seems to be going on).
I might very well be wrong here, so if the reason for using a tokio::spawn has to do with something else, please correct me.

Since tokio::spawn requires both the future and the output to be 'static, all the handler generics need to be bounded by 'static. This makes it quite cumbersome to use shared resources. Consider:

let client = SharedClient::try_from("Shared Client 1 (perhaps a database)")?;
let client_ref = &client;
lambda_runtime::run(handler_fn(|event: Request, ctx: Context| async move {
    let command = event.command;
    Ok::<Response, Error>(client_ref.response(ctx.request_id, command))
}))
.await?;

This would not compile, as client_ref is not 'static. To me it would make sense if you were able to do some initial setup (e.g. database clients, env configs, etc.) before calling run.

I was thinking it might be possible to remove the 'static requirements by removing tokio::spawn?
Perhaps it could be replaced with a catch_unwind?

I have been experimenting a bit here (I added an example as well):
rasviitanen@9f9b674
Would this work or would it result in other problems I have not considered?

PS great work on the 0.3 release! :)

@bahildebrand
Contributor

bahildebrand commented Mar 28, 2021

We use tokio::spawn in this case to allow us to handle multiple requests at once. I'd recommend reading through the whole tutorial, but this section in particular goes over why you would want to use tokio::spawn. Your changes would allow us to only respond to a single invocation of the lambda at a time.

For your specific example I would suggest reading this to see why the 'static lifetime is required. TL;DR: since a spawned task can run until the end of the program, the compiler needs to be certain that any data it captures is not dropped before then. In this case you're taking a reference to client, which carries no such guarantee.

The above linked example on the 'static requirement hints towards what the solution to your issue is. By wrapping your client in an Arc you will be able to satisfy this requirement, as you can ensure that your client is not dropped as long as there is a reference to it.
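
As a rough sketch of what that could look like (assuming the handler_fn/run API and the SharedClient/Request/Response types from your example above; not tested):

use std::sync::Arc;

let client = Arc::new(SharedClient::try_from("Shared Client 1 (perhaps a database)")?);
lambda_runtime::run(handler_fn(move |event: Request, ctx: Context| {
    // Cloning the Arc is a cheap pointer copy; the async block then owns it,
    // so the resulting future can be 'static.
    let client = Arc::clone(&client);
    async move {
        let command = event.command;
        Ok::<Response, Error>(client.response(ctx.request_id, command))
    }
}))
.await?;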

@rasviitanen
Contributor Author

rasviitanen commented Mar 28, 2021

I know how to solve the problem and I am fully aware of what a static lifetime is and why the requirement exists.
My point is that it might not be required to have a static lifetime here.

re:

Your changes would allow us to only respond to a single invocation of the lambda at a time.

The code today essentially reduces to:

while let Some(event) = incoming.next().await {
    // ..
    let task = tokio::spawn(async move { handler.call(body, ctx) });
    let req = match task.await {
        Ok(response) => match response.await {
            _ => (),
        },
        Err(err) if err.is_panic() => {
            // ...
        }
        Err(_) => unreachable!("tokio::task should not be canceled"),
    };
    // reply to client
}

If we clean it up a bit and add an example "handler":

let handler = |body, ctx| {
    // lets call this stage 1
    println!("stage 1");
    async move {
        // lets call this stage 2
        println!("stage 2");
    }
};
while let Some(event) = incoming.next().await {
    // ..
    let task = tokio::spawn(async move { handler.call(body, ctx) });
    let req = match task.await {
        // `stage 1` was printed
        Ok(response) => match response.await {
            _ => { /* `stage 2` was printed */ },
        },
        _ => ()
    };
}

So the flow is:

  1. An event is taken from the incoming queue
  2. A task that runs the handler is spawned; its JoinHandle is called task
  3. The JoinHandle is immediately awaited (which could be a problem, as you can put some bad blocking code in stage 1, so this should perhaps be a spawn_blocking?).
  4. This means that no further events will be consumed from the incoming event queue, as we need to wait for stage 1 to run to completion.
  5. If we successfully proceeded to stage 2 without panicking in stage 1, we proceed to await stage 2. This will once again block us from consuming another event from the incoming queue.
  6. Once stage 2 returns with Poll::Ready we are finally able to continue to process the next event.

So to me, even if you do spawn, you are not spawning the expensive part, which is probably in stage 2. And as we are awaiting both stage 1 and stage 2 in the while loop, we cannot process more than one event at a time.
Additionally, you are spawning a single future and immediately awaiting it, so it wouldn't come with any concurrency benefits.
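
To illustrate the difference, here is a self-contained toy sketch (not the runtime's code; the sleeps and prints just stand in for handler work):

// [dependencies]
// tokio = { version = "1", features = ["full"] }
use tokio::time::{sleep, Duration};

#[tokio::main]
async fn main() {
    // Spawn-and-immediately-await: each iteration waits for its task to
    // finish before starting the next one, so the total time is ~300ms.
    for i in 0..3 {
        let task = tokio::spawn(async move {
            sleep(Duration::from_millis(100)).await;
            println!("sequential {}", i);
        });
        task.await.unwrap();
    }

    // Spawn everything first and await afterwards: the tasks overlap and
    // the total time is ~100ms.
    let handles: Vec<_> = (0..3)
        .map(|i| {
            tokio::spawn(async move {
                sleep(Duration::from_millis(100)).await;
                println!("concurrent {}", i);
            })
        })
        .collect();
    for handle in handles {
        handle.await.unwrap();
    }
}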

I can't see how this processes multiple events concurrently.
Sorry for not understanding the code, but I would be happy if you could clarify.
Maybe there is some stuff going on in the background?

@rimutaka
Contributor

I have never seen more than one invocation per lambda instance at a time from AWS. It would either start a new instance or throttle the request until a lambda instance becomes available. So even if we can process multiple requests, it never happens.
Am I mistaken?

@rimutaka
Contributor

If the changes proposed by @rasviitanen don't go ahead, I suggest we add an example for using Arc with a shared resource as per #310 (comment)

@bahildebrand
Contributor

@rasviitanen Sorry, I think I misread your initial request and may have come off a bit condescending. That was not my intent. Looking at the code section more carefully, I think you're correct that we're not running these events concurrently now, and that our use of tokio::spawn in this context doesn't make a ton of sense. Based on @rimutaka 's comment this is either a logic error, or a sign that I don't understand the life cycle of a lambda.

Considering that next is just the runtime polling that endpoint, I think there is a chance to handle multiple requests concurrently, but it would mean that multiple invocations would have to be queued at once. If the answer is that we don't care about executing the invocations concurrently, then I think @rasviitanen is correct that we could remove the 'static requirement.

Based on this snippet from the runtime doc I think we do want to be executing concurrently here, so this loop may need to be reworked:
When your function is invoked, Lambda attempts to re-use the execution environment from a previous invocation if one is available. This saves time preparing the execution environment, and it allows you to save resources such as database connections and temporary files in the execution environment to avoid creating them every time your function runs.

So overall I think @rasviitanen is correct that we don't need the 'static requirement in the code's current form, but I think the current loop might be incorrect, or at the very least leaving some performance on the table for lambdas that are receiving a large number of invocations.

@rasviitanen
Contributor Author

rasviitanen commented Mar 28, 2021

@bahildebrand no problem! I think you are right that the correct solution here isn't to remove the spawn, but rather to fix the loop, given that the interpretation is correct and they don't mean 'after the whole invocation'. I'm not 100% sure about it; I am leaning towards not running the loop concurrently. Thanks for looking into this btw!

@bahildebrand
Contributor

To continue the tradition of bouncing back and forth: @rimutaka brought up a good point in a separate thread. Handling these invocations in parallel would break the model for resource management in lambda. I think you were correct in that we can probably remove the 'static lifetime. The use of tokio::spawn looks to be an artifact of a refactor when async/await support was added in the past.

After playing around with refactoring this loop a bit I think you're on the right track. I should have some more time later this week to dig into this further, and I can get back to you then.

@rasviitanen
Contributor Author

rasviitanen commented Mar 29, 2021

@bahildebrand Cool, thank you! I have been thinking about the behaviour when creating the task panics, i.e. error_message: format!("Lambda panicked: {}", err). It seems a bit scary to me: if this message is sent to some end user, it's potentially a security issue. I haven't looked into it in detail, but if that's the case, I think it would make sense not to include the panic information.
(This applies to my example code; the current version with the tokio error displays a safe panic message.)

@rimutaka
Contributor

I think the main benefit of tokio::spawn is to isolate the handler function and protect the runtime in case the handler panics.

The error messages logged by the runtime are sent to stdout and end up in CloudWatch. They would need to be sent back to the AWS API endpoint as a response for the caller of the lambda to see them.

@rimutaka
Contributor

rimutaka commented Apr 2, 2021

In response to @davidbarsky 's comment (big change, small gains) in PR #309 I started looking into the possible performance gains of zero-copy.

David Tolnay, the creator of Serde, kindly shared some code we could use for benchmarking. See his full reply on Reddit.

David's sample benchmark:
running 2 tests
test copied   ... bench:       2,153 ns/iter (+/- 258)
test zerocopy ... bench:           7 ns/iter (+/- 2)

Lambda's max payload size is 6 MB, which can be all in a single base64 blob or spread across thousands of JSON properties, so any gains will be different for different payloads.

@rasviitanen Rasmus, do you want to try it with your type of payload and share the numbers?

David's code:
// [dependencies]
// bincode = "1.0"
// serde = { version = "1.0", features = ["derive"] }
#![feature(test)]
extern crate test;
use serde::Deserialize;

#[derive(Deserialize)]
pub struct Copied {
    pub id: Vec<u8>,
}

#[derive(Deserialize)]
pub struct ZeroCopy<'a> {
    pub id: &'a [u8],
}

fn input() -> Vec<u8> {
    let mut input = Vec::with_capacity(1024);
    input.extend_from_slice(&[248, 3]);
    input.extend_from_slice(&[0; 1022]);
    input
}

#[bench]
fn copied(b: &mut test::Bencher) {
    let input = input();
    b.iter(|| bincode::deserialize::<Copied>(&input).unwrap());
}

#[bench]
fn zerocopy(b: &mut test::Bencher) {
    let input = input();
    b.iter(|| bincode::deserialize::<ZeroCopy>(&input).unwrap());
}

@bahildebrand
Contributor

@rasviitanen sorry for not responding in a while, this fell off of my radar. Were you able to run @rimutaka 's code and get any results off of your branch?

Additionally, looking at it a bit more, I think @rimutaka is correct that the spawn call is here to isolate the handler from the runtime in case of a panic. I think that would need to be taken into account if we wanted to remove the 'static lifetime.

@rasviitanen
Contributor Author

rasviitanen commented Apr 14, 2021

@bahildebrand Sorry I forgot about this too. I'll see if I can find some time to test it, we have some different payloads I can test.

re:

isolating the handler from the runtime in case of a panic

This is what I have been suggesting from the start. A possible solution is to use std::panic::catch_unwind to isolate the panic.
We need to make sure, however, that the response message is what we want it to be, as the information from catch_unwind is different from the information from tokio's panic handler. While at it, it might be good to catch a panic in the future's execution as well.
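
A rough, self-contained sketch of that idea (not the runtime code; the handler below is a hypothetical stand-in), using std::panic::catch_unwind for the synchronous part and futures::FutureExt::catch_unwind for the future itself:

// [dependencies]
// futures = "0.3"
// tokio = { version = "1", features = ["full"] }
use futures::FutureExt;
use std::panic::{catch_unwind, AssertUnwindSafe};

#[tokio::main]
async fn main() {
    // Hypothetical handler: calling it is "stage 1", awaiting the returned
    // future is "stage 2".
    let handler = |x: i32| async move {
        if x == 0 {
            panic!("boom in stage 2");
        }
        x * 2
    };

    // Stage 1: catch a panic raised while creating the future.
    let fut = match catch_unwind(AssertUnwindSafe(|| handler(0))) {
        Ok(fut) => fut,
        Err(_) => {
            // Report a generic error instead of echoing the panic payload.
            eprintln!("handler panicked while creating the future");
            return;
        }
    };

    // Stage 2: catch a panic raised while the future is being polled.
    match AssertUnwindSafe(fut).catch_unwind().await {
        Ok(result) => println!("handler returned {}", result),
        Err(_) => eprintln!("handler panicked while executing the future"),
    }
}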

@rasviitanen
Contributor Author

@bahildebrand It looks like the issue for the serde stuff was closed, so I guess I can skip the benchmarks?

@bahildebrand
Contributor

Yeah, I think you're fine. Feel free to open a PR with these changes so we can look at it more in depth.

@rasviitanen
Contributor Author

@bahildebrand sure! Here is the PR: #319

@rimutaka mentioned this issue May 23, 2021
@rimutaka
Contributor

  1. I can report that Fix rust format issue #330 works great and I'm saving ~500ms on every invocation pre-fetching and caching some values locally. Yaaaaay! Thanks everyone involved! :)

  2. I wonder if there is an elegant way of making these shared resources mutable.
    I'm happy to put some time into a PR to enable that.

@milenkovicm

  1. I can report that Fix rust format issue #330 works great and I'm saving ~500ms on every invocation pre-fetching and caching some values locally. Yaaaaay! Thanks everyone involved! :)

  2. I wonder if there is an elegant way of making these shared resources mutable.

I'm happy to put some time into a PR to enable that.

Thanks for the info @rimutaka

Question about the second point: what do you have in mind when you say "elegant way"? Use of a mutex is not that terrible 😀

If performance is a consideration, IMHO a mutex will do just fine. As the function is single threaded, there should be no mutex contention, so lock/unlock should be cheap since no syscall will be made.
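
A minimal sketch of that (reusing the handler_fn/Request shapes from the example at the top of the issue; the counter is just a hypothetical stand-in for a mutable shared resource):

use std::sync::{Arc, Mutex};

// Hypothetical per-instance state that is mutated on every invocation.
let invocations = Arc::new(Mutex::new(0u64));

lambda_runtime::run(handler_fn(move |_event: Request, _ctx: Context| {
    let invocations = Arc::clone(&invocations);
    async move {
        // The handler loop is single threaded, so this lock is uncontended
        // and locking/unlocking stays cheap.
        let mut count = invocations.lock().unwrap();
        *count += 1;
        Ok::<u64, Error>(*count)
    }
}))
.await?;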

@fmonniot
Contributor

Hello there, first of all thanks for allowing resources to be shared between lambda invocations. That's awesome!

Is there any plan to remove that restriction in the lambda-http crate too? The handler in that crate still has a 'static bound on its future implementation (I have been able to remove it locally by patching the crate and adding an explicit lifetime on the handler, but it would be nice to have a solution upstream).

@coltonweaver
Contributor

@fmonniot I think that's a reasonable request. I gave that a quick pass tonight here:

#333

@coltonweaver
Contributor

PR merged, hope that helps :)

@altaurog

We are just starting out with Rust and had some confusion on this issue using lambda_runtime, which we got cleared up in the help forum. Curious if/how this PR will affect us?
