Deadlock with recursive task::block_on #644


Closed
rmanoka opened this issue Dec 23, 2019 · 16 comments
Labels
bug Something isn't working

Comments

rmanoka commented Dec 23, 2019

When calling task::block_on recursively, the executor seems to deadlock even when the recursion depth is much smaller than the number of CPUs.

Sample code (deadlocks on an 8-CPU machine):

#[async_std::test]
async fn test_async_deadlock() {
    use std::future::Future;
    use futures::FutureExt;
    fn nth(n: usize) -> impl Future<Output=usize> + Send {
        async move {
            async_std::task::block_on(async move {

                if n == 0 {
                    0
                } else {
                    let fut = async_std::task::spawn(nth(n-1)).boxed();
                    fut.await + 1
                }

            })
        }
    }
    let input = 2;
    assert_eq!(nth(input).await, input);
}

It seems that the test should deadlock only when input >= num_cpus, but even with input = 2 on an 8-CPU machine it deadlocks. Is this expected behaviour? Interestingly, input = 1 (which does involve a recursive call) does not deadlock. If the block_on is removed, the test passes even for large input values.

Aside: it would be great if the executor could detect block_on being called from within the pool's worker threads and spawn more threads. The block_on call could be treated as an explicit hint that the particular worker is probably going to block.

ghost commented Dec 24, 2019

I believe this should work with the new scheduler: #631

Can you try running the test again with async-std from the new-scheduler branch?

rmanoka commented Dec 24, 2019

@stjepang Absolutely! Works in the new-scheduler branch.

kylerky commented Dec 26, 2019

There is still a problem with recursive block_on.

The documentation says that

Calling this function is similar to spawning a thread and immediately joining it, except an asynchronous task will be spawned.

But that is not accurate.

use async_std::{stream, sync, task};

use futures::select;
use futures::{FutureExt, StreamExt};

use std::time::Duration;

fn main() {
    let (close_tx, close_rx) = sync::channel::<()>(1);
    task::spawn(async move {
        task::sleep(Duration::from_secs(1)).await;
        let _tx = close_tx;
        println!("close_tx should be dropped");
    });
    task::block_on(async move {
        let mut close_rx = close_rx.fuse();
        select! {
            _ = work().fuse() => (),
            _ = close_rx.next() => (),
        }
    });
}

async fn work() {
    let interval1 = stream::interval(Duration::from_millis(3000));
    let interval2 = stream::interval(Duration::from_millis(2000));
    // task::block_on(async move {
    //     let mut interval1 = interval1.fuse();
    //     let mut interval2 = interval2.fuse();
    //     loop {
    //         select! {
    //             _ = interval1.next() => {
    //                 println!("interval1");
    //             },
    //             _ = interval2.next() => {
    //                 println!("interval2");
    //             },
    //         }
    //     }
    // });
    task::spawn(async move {
        let mut interval1 = interval1.fuse();
        let mut interval2 = interval2.fuse();
        loop {
            select! {
                _ = interval1.next() => {
                    println!("interval1");
                },
                _ = interval2.next() => {
                    println!("interval2");
                },
            }
        }
    })
    .await;
}

In the code above, if we replace the spawn-then-await with block_on, the program never exits; this is true for both the new scheduler and the old one.

In my use case, I have to use block_on because the Future I am dealing with is non-Send.

rmanoka commented Dec 26, 2019

@kylerky I think your example could be simplified as:

#[async_std::test]
async fn block_on() {
    use async_std::{task, future::{timeout, pending}};
    use std::time::Duration;
    task::block_on(async {
        let _ = timeout(
            Duration::from_secs(1),
            async {
                task::block_on(pending::<()>())
            }
        ).await;
    });
}

The reason it never finishes is that the inner block_on is stuck on a never-ending task. I believe the equivalent thread-based code is:

// The spawned thread never finishes, so the join below never returns.
std::thread::spawn(|| {
    loop {}
}).join().unwrap();

Once the above statement runs, neither the spawned thread nor the parent thread can be stopped unless the whole process aborts or exits. See "how to terminate or suspend a Rust thread". As suggested in the link, you could use (the async variant of) channels or other communication to signal the timeout to the inner task, so that it exits.

An issue with the current scheduler is that it uses a slot as an optimization, and this does not seem to play nicely with block_on. When spawning from within a block_on call, the spawned Task gets placed inside this slot, and it can never be stolen by other processors (i.e. stealers), hence creating deadlocks.

kylerky commented Dec 26, 2019

@rmanoka Thanks for your explanation.

I find it surprising that tasks spawned by block_on cannot be stolen, because I had assumed that every task can be. I think this should at least be documented.

Also, block_on is, I think, the only way to run non-Send futures in async-std. If it is used in a library, we may easily end up with recursion like the examples above.

rmanoka commented Dec 26, 2019

Since block_on accepts futures that are not Send, they are in any case not safe to off-load to other threads, and hence are never stolen by the other processing threads. I meant that even tasks spawned from within block_on do not get stolen; that doesn't happen in the example you posted, though.

kylerky commented Dec 26, 2019

By "being stolen", I do not mean offloading the task to a different thread.

In the spawn version, when one of the intervals goes to sleep, control yields to the executor, which can poll both close_rx and work, where it can poll the two intervals.

However, in the recursive block_on version, the executor will only poll the two intervals.

algesten commented Sep 6, 2020

@rmanoka

An issue with the current scheduler is that it uses a slot as an optimization. This does not seem to play nicely with block_on. When spawning from within a block_on construct, the spawned Task gets placed inside this slot, and it can never be stolen by other processors (i.e. stealers) hence creating deadlocks.

Is this still true?

My mental model is that async_std::task::spawn will put any spawned task into some big happy pool of tasks that are handled by the executor threads.

But if I understand what you're saying correctly, spawns happening from within an async_std::task::block_on are special and are effectively "orphaned" if they are not finished by the time block_on exits.

It seems like quite a severe restriction that should at the very least be clearly documented.

ghost commented Sep 6, 2020

Is this still true?

It's not true anymore. More info: smol-rs/smol#177 (comment)

rmanoka commented Sep 7, 2020

@algesten In the latest release (1.6.3), async-std no longer accepts the above example and panics if spawn is called inside block_on. However, using either smol::block_on or futures_lite::block_on does work (for input smaller than the number of cores).

It would be nice to document the panic behaviour, though, if it is to be expected.

installgentoo commented
@rmanoka
On 1.6.4 it doesn't panic anymore, but it deadlocks once again with block_on inside spawn when I have 1 core.

smol, on the other hand, works perfectly on one CPU with the same block_on/spawn/block_on recursion, even with 4 smol threads. Perhaps it would lock up on deeper recursion with a bunch of heavy tasks, I don't know.

Moreover, smol has an unbounded sync channel with try_send, so you don't even need to call block_on inside other tasks, because you can just communicate from a sync context into async normally.

I'm jumping from async_std to smol, personally.

rmanoka commented Sep 21, 2020

@installgentoo This is fantastic! The use-case for block_on (possibly inside a task) comes from the async-scoped crate, where unfortunately the only safe abstraction we could think of was to block_on a particular future (and the call might happen inside a task). I think the deadlock at arbitrary recursion depths may unfortunately not be avoidable unless the executor dynamically spawns more worker threads.

I'm closing this as the original issue has been solved. Shout out to the authors of this and the smol crate for working on this!

rmanoka closed this as completed Sep 21, 2020
ghost commented Sep 21, 2020

@rmanoka By the way, smol::Executor can now spawn non-'static futures, while having a safe and sound API: https://docs.rs/smol/1.2.1/smol/struct.Executor.html#method.spawn

rmanoka commented Sep 21, 2020

@stjepang this is awesome! Just clarifying: to spawn a Future: 'a, I'll need to create a new Executor<'a>, correct? The motivation behind async-scoped was to provide something like crossbeam's scope, but for tasks, and hopefully without spawning a new thread for each task.

ghost commented Sep 21, 2020

Yes, exactly: you can think of Executor<'a> as a scope that is allowed to borrow anything with lifetime 'a. Note that Executor<'a> is something you need to "drive" yourself with run() or tick(); it won't run tasks by itself. There are no threads involved unless you spawn your own threads and then call run() on them.

This design makes it possible to do lots of interesting things - here's a scoped executor with task priorities:
https://github.com/stjepang/async-executor/blob/master/examples/priority.rs

rmanoka commented Sep 21, 2020

That is a very elegant abstraction!
