
Barbara solves computational problems with async #132

Merged

Conversation

erichgess
Contributor

Written during Vision Doc Writing Sessions, along with other helpful folks.

Related to #105

### Solution Path
What Barbara wanted was a way to use threads more efficiently: a fixed number of threads, each mapped to a CPU core, with patches assigned to those threads as the patches became ready to compute. The design of the `async` framework seemed to provide exactly that behavior. She also wanted to move away from the message-passing design, because the number of messages being passed was proportional to the number of trace particles.

As Barbara began working on her new design with `tokio`, her use of `async` went from a general (from the textbook) use of basic `async` features to a more specific implementation leveraging exactly the features that were most suited for her needs.
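The thread-per-core pattern Barbara was after can be sketched with std types alone. This is an illustrative sketch, not her actual code (the function and names here are hypothetical): a fixed pool of worker threads pulls "patches" off a shared queue as they become ready, so the number of threads stays constant no matter how many patches there are.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Hypothetical stand-in for Barbara's workload: a fixed pool of `workers`
// threads drains a queue of patches; each worker computes on whatever patch
// becomes available next.
fn process_patches(patches: Vec<u64>, workers: usize) -> u64 {
    let (tx, rx) = mpsc::channel::<u64>();
    let rx = Arc::new(Mutex::new(rx)); // share one receiver among all workers
    let (out_tx, out_rx) = mpsc::channel::<u64>();

    let mut handles = Vec::new();
    for _ in 0..workers {
        let rx = Arc::clone(&rx);
        let out = out_tx.clone();
        handles.push(thread::spawn(move || loop {
            // Take the next ready patch, or exit when the queue is closed.
            let msg = rx.lock().unwrap().recv();
            match msg {
                Ok(p) => out.send(p * 2).unwrap(), // stand-in for the real computation
                Err(_) => break,
            }
        }));
    }
    drop(out_tx);

    for p in patches {
        tx.send(p).unwrap();
    }
    drop(tx); // close the queue so workers terminate

    for h in handles {
        h.join().unwrap();
    }
    out_rx.iter().sum()
}
```

This is essentially what a work-stealing async runtime provides out of the box, which is why `async` looked like a natural fit at this point in the story.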
Member

I'm surprised by Barbara's choice of tokio since tokio is normally meant for I/O bound workloads. Can you talk about the decision to use tokio more? Was it because Barbara needed an async runtime and tokio seemed like the only choice?

Contributor

I think that it was just the most famous executor. @jzrake could confirm.


Yes, I (Barbara) was looking for a multi-threaded executor, and Tokio seemed to be the most mature. Along the way I learned that the async feature was never the right solution for data parallelism / CPU-bound tasks. With generous help from @erichgess I'm now working on a more appropriate solution.

I must say I don't believe Barbara is the right character for this story, because she is an experienced systems programmer. I (jzrake) am an astrophysicist, and have been learning Rust for less than a year from "the book", Stack Overflow, browsing GitHub, etc. It's only since contacting @nikomatsakis and now talking with @erichgess that I am getting guidance on industry practices. This little bit of interaction has already proven to be really productive.

There's a bigger moral here, which I hope is not being missed. There is no "Marvin the scientist" on the cast of characters, and I think that's symbolic of a neglected user base: the HPC community and scientists who badly need to modernize their codes. Many of us would love to adopt Rust, but there's very little intellectual capital at the intersection of Rust and scientific computing that could enable researchers to learn by osmosis. I see a lot of potential in the language, and my group here at Clemson is accumulating experience, which will be shared with the HPC / physics simulation community. However, widespread adoption of Rust in HPC is not going to happen until someone (Amazon maybe?) lends an ear to that community's needs, and invests in the types of training and educational materials that were made available to Python users by outfits like Enthought and Anaconda -- efforts which fueled the widespread adoption of Python (over Fortran and IDL) in observational astronomy and experimental physics.

Contributor Author

One quick note: Barbara is the experienced Rust developer (Grace is the experienced systems developer). I went with Barbara here because, from what I could tell, you have some level of experience with Rust and are comfortable with both Rust and learning more advanced programming concepts. However, I don't know if "experienced" for Barbara means deep knowledge of Rust or if it means that you've used Rust enough that you can understand lifetimes and borrow errors without having to use Google 😄.

Barbara's priorities are:

    Top priority: overall productivity and long-term maintenance -- she loves Rust, and wants to see it extended to new areas; she has an existing code base to maintain
    Expectations: elegance and craftsmanship, fits well with Rust

Niklaus is an alternative that lines up with your not coming from a programming background, but I don't think you're close to being a beginner programmer.


Yes, thanks, I guess Barbara is the best fit of the available characters (I still kind of like Marvin but oh well). To me experienced means "productive" -- can get things done with the language. As opposed to "deep knowledge", which I'd think means you work on the compiler. Or at least do lots of good quality unsafe.

Contributor

There may well be a missing character. Interesting. Marvin is kind of like Niklaus, but I do agree @jzrake that you seemed like someone who knew Rust fairly well.

Contributor Author

One thing about the 3 experienced characters is that they differ in what types of software they write rather than how programming fits into their life. When reading their descriptions, to me, it's pretty clear that they're all people whose job is writing software. But that doesn't cover people who use programming as only one of many tools to do their job. @jzrake would fall into this group, along with data scientists, economists, and so on.

Contributor

Having given this some thought, I think we ought to use Niklaus for this. I think we can generalize him to "non-programmers (yet)" of various backgrounds.

Contributor Author

Done.

```rust
    stage_map.insert(index, runtime.spawn(stage).map(|f| f.unwrap()).shared());
}
```
lacked performance because she needed to clone the value for every task. So, Barbara switched to using `Arc` to keep a thread-safe reference count to the shared data. But this change introduced a lot of `.map` and `.unwrap` calls, making the code much harder to read. She realized that managing the dependency graph was not intuitive when using `async` for concurrency.
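The clone-versus-`Arc` tradeoff can be seen in miniature with std types alone (a hypothetical data buffer, not Barbara's code): cloning the data copies the whole buffer for every task, while cloning an `Arc` is just a pointer copy plus a refcount bump, and every task reads the same allocation.

```rust
use std::sync::Arc;
use std::thread;

// Give each of `tasks` threads a handle to the same buffer. Cloning the Arc
// does not copy the underlying data, so this stays cheap no matter how large
// the buffer is.
fn sum_on_threads(data: Vec<f64>, tasks: usize) -> Vec<f64> {
    let shared = Arc::new(data);
    let handles: Vec<_> = (0..tasks)
        .map(|_| {
            let shared = Arc::clone(&shared); // refcount bump, not a deep copy
            thread::spawn(move || shared.iter().sum::<f64>())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

The readability cost in the story comes from layering this pattern on top of spawned futures, where each shared result also needs `.map`, `.unwrap`, and `.shared()` adapters.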
Member

"she needed to clone the value for every task" - which value are you referring to?

I'm having a hard time relating the code example to the problems mentioned in this paragraph. Perhaps comments in the code examples could bring better attention to the relevant details?

Contributor

it means she has to clone the value that was computed -- in this case, that's the future that resulted from runtime.spawn. So maybe...


The problem with this design was that her stage_map stored an impl Future as the result of each cell. When she needed to access and combine the results of those futures, it involved a lot of cloning:

```rust
// compute the value at index `i` based on `i-1`
let previous_value = stage_map.get(i - 1).clone();
```

I don't think I quite got this right though. @jzrake could probably fill in the right types etc here. I'd actually like to have this noted down. Also, where does the shared function come from? (The futures crate probably?)


Yes, shared comes from the futures crate. I think the basic pattern I was trying can be represented more cleanly than with the copy-pastes from the code base that are currently in there. I'll propose a cleaner example for the story.


To contextualize the snippets in @erichgess's post, here is an approximation of the parallelization strategy in our science codes.

```rust
use std::sync::Arc;
use std::collections::HashMap;
use futures::FutureExt;
use futures::future::join_all;
use tokio::runtime::Runtime;

async fn update(runtime: Runtime, solver: Solver, state: State) -> Result<State, Error> {

    let mut stage1 = HashMap::new();
    let mut stage2 = HashMap::new();
    let mut stage3 = HashMap::new();

    // Stage 1: results can be computed independently on each block. Tokio
    // futures must be 'static, so anything moved into the async block has to
    // be cloned first.
    for (index, block) in &state.patches {
        let block = block.clone();
        let solver = solver.clone();

        let future = async move {
            solver.try_stage1(block)
        };
        stage1.insert(index, runtime.spawn(future).map(|f| f.unwrap()).shared());
    }
    let stage1 = Arc::new(stage1);

    // Stage 2: each result requires multiple blocks from stage 1.
    for (index, _) in &state.patches {
        let solver = solver.clone();
        let stage1 = stage1.clone();

        let future = async move {
            let neighbors = join_all(solver.neighbors_of(index, stage1)).await;
            solver.try_stage2(index, neighbors)
        };
        stage2.insert(index, runtime.spawn(future).map(|f| f.unwrap()).shared());
    }
    let stage2 = Arc::new(stage2);

    // Stage 3: similar to stage 2
    for (index, _) in &state.patches {

    }
    let stage3 = Arc::new(stage3);

    // Final stage: convert HashMap<Future<Result<T, E>>> -> Future<Result<Vec<T>, E>>
    let new_patches: Result<Vec<_>, _> = join_all(stage3.into_values())
        .await
        .into_iter()
        .collect();

    Ok(State { patches: new_patches? })
}
```


## 🤔 Frequently Asked Questions

### **What are the morals of the story?**
Member

I'd like to explore the choice of tokio a bit more. I wonder if the choice of that particular runtime led to some of the issues (though it's clear that not all of them came from this choice). Perhaps the lack of a runtime designed for compute bound workloads is another issue here.

Contributor Author

That's a very good point: I'll follow up with the source to get more details on how tokio was chosen. Personally, I believe this points to a missing piece of "guidance": helping people understand the type of concurrency problem they are solving (I/O bound, compute bound, etc.) and how to get started on the right foot for solving that type of problem.

Contributor Author

I don't yet know enough about this area of the Rust ecosystem. What does the runtime refer to with regards to tokio? The library as a whole, its API, or the code under the hood that manages threads, task execution, scheduling, etc.?

One big problem, from my perspective, is that the semantics of async are simply not able to express a complex network of thousands of compute tasks in a simple way. Blocking a task until 3 other tasks have completed isn't easy with async. In this story, Barbara had to jury-rig a dependency-based scheduler into async that would block tasks from starting their computations until their dependencies were completed. Put another way, the semantics of async are meant for a very different type of concurrency problem than the one Barbara was solving.
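For illustration only (not Barbara's actual scheduler), the bookkeeping such a dependency-based scheduler needs can be sketched with std types: each task records how many dependencies are outstanding and becomes runnable only when that count reaches zero (Kahn's algorithm). The task ids and dependency map here are hypothetical.

```rust
use std::collections::HashMap;

// Run tasks in an order compatible with their dependency graph: a task
// "runs" only after everything it depends on has completed.
fn run_in_dependency_order(deps: &HashMap<u32, Vec<u32>>) -> Vec<u32> {
    // Outstanding-dependency count per task.
    let mut pending: HashMap<u32, usize> =
        deps.iter().map(|(&id, d)| (id, d.len())).collect();

    // Reverse edges: which tasks are unblocked when `id` completes.
    let mut unblocks: HashMap<u32, Vec<u32>> = HashMap::new();
    for (&id, d) in deps {
        for &dep in d {
            unblocks.entry(dep).or_default().push(id);
        }
    }

    // Tasks with no dependencies are runnable immediately.
    let mut ready: Vec<u32> = pending
        .iter()
        .filter(|&(_, &n)| n == 0)
        .map(|(&id, _)| id)
        .collect();

    let mut order = Vec::new();
    while let Some(id) = ready.pop() {
        order.push(id); // "run" the task here
        if let Some(next_ids) = unblocks.get(&id) {
            for &next in next_ids {
                let n = pending.get_mut(&next).unwrap();
                *n -= 1;
                if *n == 0 {
                    ready.push(next); // last dependency finished
                }
            }
        }
    }
    order
}
```

None of this logic is provided by `async`/`await` itself, which is the friction the story describes: the dependency tracking had to be built by hand on top of the executor.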

@jzrake let me know if this is an accurate interpretation of your experience.

Member

> this is either a missing piece of "guidance"

I 100% agree here. The async ecosystem is highly focused on server side scenarios (for the obvious reason that it's a very popular use case), but we need to do better advising users who have different needs.

> simply meant for very different type of concurrency problem than the one Barbara was solving.

I'm not sure whether I agree or disagree with you. async/await lets you declare dependencies of a certain piece of computation in the same way you would in synchronous code:

```rust
let dep1 = dep1.await;
let dep2 = dep2.await;
// now compute based on dep1 and dep2
let answer = compute(dep1, dep2);
```

If you need dependencies to run in parallel you can join them.

```rust
let deps = join_all(vec![dep1, dep2]).await;
let answer = compute(deps);
```

Obviously, the experience represented here shows that async/await as it is today does not lead to simple-to-understand code for compute-bound workloads, but I'm just not sure if this is a fundamental property of the computation model or just a lack of good guidance and good tooling for this use case.

Contributor Author

> I'm not sure whether I agree or disagree with you. async/await lets you declare dependencies of a certain piece of computation in the same way you would in synchronous code:
> If you need dependencies to run in parallel you can join them.
Right, but I'm wondering if that API just doesn't work well when you have thousands of dynamically created tasks that are all joined together in a complex dependency graph. Which makes me wonder if the async API assumes that the structure of the code defines the dependencies of the tasks. For example, if you have tasks X, Y, Z that depend on A, B then you write:

```rust
// define tasks a and b
let deps = join_all(vec![a, b]).await;

// define tasks x, y, z
join_all(vec![x, y, z]).await;
```

and that might create some friction when trying to write code that coordinates a lot of compute-based tasks.

> Obviously, the experience represented here shows that async/await as it is today does not lead to simple-to-understand code for compute-bound workloads, but I'm just not sure if this is a fundamental property of the computation model or just a lack of good guidance and good tooling for this use case.

Totally agree. One of my take-aways from this story is that it's worth taking some time to see if there is a good solution to Barbara's problem using async; if so, then we could use that to improve the documentation and guidance, and if not, then ask whether this is a feature that async is missing, or whether async is not the right tool, etc.

Contributor

What I mean is: the difficulty I imagine tokio brings is that its fair scheduler may be suboptimal for compute-bound workflows, where fairness is not a concern and a simpler scheduler could be more efficient. But that doesn't play into this story, which is more about the challenge of picking the right thing and the general "hardness" of async Rust overall.

I do think a library that had functions or abstractions for doing computational DAGs like this might help, of course, but I don't see why that would be tied closely to the executor (at least, not yet).

@jzrake Apr 9, 2021

I think there are two main barriers to the code clarity:

  1. there are thousands of tasks, and linking them creates awkward code: not Tokio's fault
  2. running the tasks on workers requires spawn: also not Tokio's fault (I think)

(1) I believe is just inherent to the async construct. (2) is a problem because it couples the computation to the executor (to perform well, Tokio wants to be the spawner and the executor). If the async block had syntax to hint the executor it wants to run on a worker thread, such as

```
let y = async cpu_bound {
    calculate(x.await)
};
```

that would alleviate (2). But (1) is still an issue.

Contributor Author

To circle back, are there any changes to the story that I should make based on this discussion?

Contributor Author

After thinking about it over the weekend, I think this indicated that how I described the chain of events that led to using tokio was not explicit enough. So, I've updated the story to make that more explicit. The first paragraph of the Solution Path section now has a very simple step-by-step account of how the character wound up using tokio for compute-bound parallelization.

Contributor

I'm quite happy with it.

Contributor

@nikomatsakis left a comment

I left a few nits. I think the story could be improved somewhat, made more clear, but I also feel that the major morals carry through. I'd be inclined to merge -- although I think we should switch to Niklaus.


guswynn added a commit to guswynn/wg-async-foundations that referenced this pull request Apr 12, 2021
The technical conclusion of the specific example in this doc is similar to: rust-lang#132, but the story in general can be about how to compare rust async code and other languages async code (particularly c++20)
Contributor

@nikomatsakis left a comment

Looks ready to land to me, I left a few nits



@nikomatsakis
Contributor

Going to go ahead and merge.

@nikomatsakis nikomatsakis merged commit fa4d364 into rust-lang:master Apr 12, 2021
@nikomatsakis nikomatsakis added the status-quo-story-ideas "Status quo" user story ideas label Apr 18, 2021