Deadlock with recursive task::block_on #644


Closed
rmanoka opened this issue Dec 23, 2019 · 16 comments
Labels
bug Something isn't working

Comments

rmanoka commented Dec 23, 2019

When calling task::block_on recursively, the executor seems to deadlock even when the recursion depth is much smaller than the number of CPUs.

Sample code (deadlocks on an 8-CPU machine):

#[async_std::test]
async fn test_async_deadlock() {
    use std::future::Future;
    use futures::FutureExt;
    fn nth(n: usize) -> impl Future<Output=usize> + Send {
        async move {
            async_std::task::block_on(async move {

                if n == 0 {
                    0
                } else {
                    let fut = async_std::task::spawn(nth(n-1)).boxed();
                    fut.await + 1
                }

            })
        }
    }
    let input = 2;
    assert_eq!(nth(input).await, input);
}

It seems that the test should deadlock only when input >= num_cpus, but even with input = 2 on an 8-CPU machine it deadlocks. Is this expected behaviour? Interestingly, input = 1 (which does involve a recursive call) does not deadlock. If the block_on is removed, the test passes even for large input values.

Aside: it would be great if the executor could detect block_on being called from within the pool's worker threads and spawn more threads. The block_on call could be treated as an explicit hint that the particular worker is probably going to block.

ghost commented Dec 24, 2019

I believe this should work with the new scheduler: #631

Can you try running the test again with async-std from the new-scheduler branch?

rmanoka commented Dec 24, 2019

@stjepang Absolutely! Works in the new-scheduler branch.

kylerky commented Dec 26, 2019

There is still a problem with recursive block_on.

The documentation says that

Calling this function is similar to spawning a thread and immediately joining it, except an asynchronous task will be spawned.

But that is not accurate.

use async_std::{stream, sync, task};

use futures::select;
use futures::{FutureExt, StreamExt};

use std::time::Duration;

fn main() {
    let (close_tx, close_rx) = sync::channel::<()>(1);
    task::spawn(async move {
        task::sleep(Duration::from_secs(1)).await;
        let _tx = close_tx;
        println!("close_tx should be dropped");
    });
    task::block_on(async move {
        let mut close_rx = close_rx.fuse();
        select! {
            _ = work().fuse() => (),
            _ = close_rx.next() => (),
        }
    });
}

async fn work() {
    let interval1 = stream::interval(Duration::from_millis(3000));
    let interval2 = stream::interval(Duration::from_millis(2000));
    // task::block_on(async move {
    //     let mut interval1 = interval1.fuse();
    //     let mut interval2 = interval2.fuse();
    //     loop {
    //         select! {
    //             _ = interval1.next() => {
    //                 println!("interval1");
    //             },
    //             _ = interval2.next() => {
    //                 println!("interval2");
    //             },
    //         }
    //     }
    // });
    task::spawn(async move {
        let mut interval1 = interval1.fuse();
        let mut interval2 = interval2.fuse();
        loop {
            select! {
                _ = interval1.next() => {
                    println!("interval1");
                },
                _ = interval2.next() => {
                    println!("interval2");
                },
            }
        }
    })
    .await;
}

In the code above, if we replace the spawn-then-await with block_on, the program never exits; this is true for both the new scheduler and the old one.

In my use case, I have to use block_on because the Future I am dealing with is non-Send.

rmanoka commented Dec 26, 2019

@kylerky I think your example could be simplified as:

#[async_std::test]
async fn block_on() {
    use async_std::{task, future::{timeout, pending}};
    use std::time::Duration;
    task::block_on(async {
        let _ = timeout(
            Duration::from_secs(1),
            async {
                task::block_on(pending::<()>())
            }
        ).await;
    });
}

The reason it never finishes is that the inner block_on is stuck on a never-ending task. I believe the equivalent thread-based code is:

// The spawned thread never finishes, so the join below never returns.
std::thread::spawn(|| {
    loop {}
}).join().unwrap();

Once the above statement runs, neither the spawned thread nor the parent thread can be stopped unless the whole process aborts or exits. See "how to terminate or suspend a Rust thread". As suggested in the link, you could use (the async variant of) channels or other communication to signal the timeout to the inner task, so that it exits.

An issue with the current scheduler is that it uses a slot as an optimization, and this does not seem to play nicely with block_on. When spawning from within a block_on call, the spawned Task gets placed inside this slot, and it can never be stolen by other processors (i.e. stealers), hence creating deadlocks.

kylerky commented Dec 26, 2019

@rmanoka Thanks for your explanation.

I find it surprising that tasks spawned by block_on cannot be stolen, because I had assumed that every task can be. I think this should at least be documented.

Also, block_on is, I think, the only way to run non-Send futures in async-std. If it is used in a library, we may easily end up with recursion like the examples above.

rmanoka commented Dec 26, 2019

Since block_on accepts futures that are not Send, they are in any case not safe to off-load to other threads, and hence are never stolen by the other processing threads. I meant that even tasks spawned from within block_on do not get stolen; that doesn't happen in the example you posted, though.

kylerky commented Dec 26, 2019

By "being stolen", I do not mean offloading the task to a different thread.

In the spawn version, when one of the intervals goes to sleep, control yields to the executor, which can poll both close_rx and work, where it can poll the two intervals.

However, in the recursive block_on version, the executor will only poll the two intervals.

algesten commented Sep 6, 2020

@rmanoka

An issue with the current scheduler is that it uses a slot as an optimization. This does not seem to play nicely with block_on. When spawning from within a block_on construct, the spawned Task gets placed inside this slot, and it can never be stolen by other processors (i.e. stealers) hence creating deadlocks.

Is this still true?

My mental model is that async_std::task::spawn will put any spawned task into some big happy pool of tasks that are handled by the executor threads.

But if I understand what you're saying correctly, spawns happening from within an async_std::task::block_on are special and are effectively "orphaned" if they are not finished by the time block_on exits.

It seems like quite a severe restriction that should at the very least be clearly documented.

ghost commented Sep 6, 2020

Is this still true?

It's not true anymore. More info: smol-rs/smol#177 (comment)

rmanoka commented Sep 7, 2020

@algesten In the latest release (1.6.3), async-std no longer accepts the above example and panics if spawn is called inside block_on. However, using either smol::block_on or futures_lite::block_on does work (for input smaller than the number of cores).

It would be nice to document the panic behaviour, though, if it is to be expected.

installgentoo commented
@rmanoka
On 1.6.4 it doesn't panic anymore, but it deadlocks once again with block_on inside spawn when I have 1 core.

smol, on the other hand, works perfectly on one CPU with the same block_on/spawn/block_on recursion, even with 4 smol threads. Perhaps it would lock up on deeper recursion with a bunch of heavy tasks, I don't know.

Moreover, smol has an unbounded sync channel with try_send, so you don't even need to call block_on inside other tasks, because you can just communicate from a sync context into async normally.

I'm jumping from async_std to smol, personally.

rmanoka commented Sep 21, 2020

@installgentoo This is fantastic! The use-case for block_on (possibly inside a task) comes from the async-scoped crate, where unfortunately the only safe abstraction we could think of was to block_on a particular future (and the call might happen inside a task). I think the deadlock at arbitrary recursion depths may unfortunately not be avoidable unless the executor dynamically spawns more worker threads.

I'm closing this as the original issue has been solved. Shout out to the authors of this and the smol crate for working on this!

rmanoka closed this as completed Sep 21, 2020
ghost commented Sep 21, 2020

@rmanoka By the way, smol::Executor can now spawn non-'static futures, while having a safe and sound API: https://docs.rs/smol/1.2.1/smol/struct.Executor.html#method.spawn

rmanoka commented Sep 21, 2020

@stjepang this is awesome! Just clarifying: to spawn a Future: 'a, I'll need to create a new Executor<'a>, correct? The motivation behind async-scoped was to provide something like crossbeam's scope, but for tasks, and hopefully without spawning a new thread for each task.

ghost commented Sep 21, 2020

Yes, exactly: you can think of Executor<'a> as a scope that is allowed to borrow anything with lifetime 'a. Note that Executor<'a> is something you need to "drive" yourself with run() or tick(); it won't run tasks by itself. There are no threads involved unless you spawn your own threads and then call run() on them.

This design makes it possible to do lots of interesting things - here's a scoped executor with task priorities:
https://github.com/stjepang/async-executor/blob/master/examples/priority.rs

rmanoka commented Sep 21, 2020

That is a very elegant abstraction!
