
Barbara solves computational problems with async #132

Merged

Conversation

erichgess
Contributor

Written during Vision Doc Writing Sessions, along with other helpful folks.

Related to #105

### Solution Path
What Barbara wanted was a way to use threads more efficiently: a fixed number of threads, each mapped to a CPU core, with patches assigned to those threads as the patches became ready to compute. The design of the `async` framework seemed to provide exactly that behavior. She also wanted to move away from the message-passing design, because the number of messages being passed was proportional to the number of trace particles.

As Barbara began working on her new design with `tokio`, her use of `async` went from a general (from the textbook) use of basic `async` features to a more specific implementation leveraging exactly the features that were most suited for her needs.
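The thread-per-core pattern Barbara was after can be sketched with std types alone. This is an illustrative sketch, not her actual code (the function and names here are hypothetical): a fixed pool of worker threads pulls "patches" off a shared queue as they become ready, so the number of threads stays constant no matter how many patches there are.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Hypothetical stand-in for Barbara's workload: a fixed pool of `workers`
// threads drains a queue of patches; each worker computes on whatever patch
// becomes available next.
fn process_patches(patches: Vec<u64>, workers: usize) -> u64 {
    let (tx, rx) = mpsc::channel::<u64>();
    let rx = Arc::new(Mutex::new(rx)); // share one receiver among all workers
    let (out_tx, out_rx) = mpsc::channel::<u64>();

    let mut handles = Vec::new();
    for _ in 0..workers {
        let rx = Arc::clone(&rx);
        let out = out_tx.clone();
        handles.push(thread::spawn(move || loop {
            // Take the next ready patch, or exit when the queue is closed.
            let msg = rx.lock().unwrap().recv();
            match msg {
                Ok(p) => out.send(p * 2).unwrap(), // stand-in for the real computation
                Err(_) => break,
            }
        }));
    }
    drop(out_tx);

    for p in patches {
        tx.send(p).unwrap();
    }
    drop(tx); // close the queue so workers terminate

    for h in handles {
        h.join().unwrap();
    }
    out_rx.iter().sum()
}
```

This is essentially what a work-stealing async runtime provides out of the box, which is why `async` looked like a natural fit at this point in the story.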
Member

I'm surprised by Barbara's choice of tokio since tokio is normally meant for I/O bound workloads. Can you talk about the decision to use tokio more? Was it because Barbara needed an async runtime and tokio seemed like the only choice?

Contributor

I think that it was just the most famous executor. @jzrake could confirm.


Yes, I (Barbara) was looking for a multi-threaded executor, and Tokio seemed to be the most mature. Along the way I learned that the async feature was never the right solution for data parallelism / CPU-bound tasks. With generous help from @erichgess I'm now working on a more appropriate solution.

I must say I don't believe Barbara is the right character for this story, because she is an experienced systems programmer. I (jzrake) am an astrophysicist, and have been learning Rust for less than a year from "the book", Stack Overflow, browsing GitHub, etc. It's only since contacting @nikomatsakis and now talking with @erichgess that I am getting guidance on industry practices. This little bit of interaction has already proven to be really productive.

There's a bigger moral here, which I hope is not being missed. There is no "Marvin the scientist" on the cast of characters, and I think that's symbolic of a neglected user base: the HPC community and scientists who badly need to modernize their codes. Many of us would love to adopt Rust, but there's very little intellectual capital at the intersection of Rust and scientific computing that could enable researchers to learn by osmosis. I see a lot of potential in the language, and my group here at Clemson is accumulating experience, which will be shared with the HPC / physics simulation community. However, widespread adoption of Rust in HPC is not going to happen until someone (Amazon maybe?) lends an ear to that community's needs, and invests in the types of training and educational materials that were made available to Python users by outfits like Enthought and Anaconda -- efforts which fueled the widespread adoption of Python (over Fortran and IDL) in observational astronomy and experimental physics.

Contributor Author

One quick note: Barbara is the experienced Rust developer (Grace is the experienced systems developer). I went with Barbara here because, from what I could tell, you have some level of experience with Rust and are comfortable with both Rust and learning more advanced programming concepts. However, I don't know if "experienced" for Barbara means deep knowledge of Rust or if it means that you've used Rust enough that you can understand lifetimes and borrow errors without having to use Google 😄.

Barbara's priorities are:

    Top priority: overall productivity and long-term maintenance -- she loves Rust, and wants to see it extended to new areas; she has an existing code base to maintain
    Expectations: elegance and craftsmanship, fits well with Rust

Niklaus is an alternative that lines up with your not coming from a programming background, but I don't think you're close to being a beginner programmer.


Yes, thanks, I guess Barbara is the best fit of the available characters (I still kind of like Marvin but oh well). To me experienced means "productive" -- can get things done with the language. As opposed to "deep knowledge", which I'd think means you work on the compiler. Or at least do lots of good quality unsafe.

Contributor

There may well be a missing character. Interesting. Marvin is kind of like Niklaus, but I do agree @jzrake that you seemed like someone who knew Rust fairly well.

Contributor Author

One thing about the 3 experienced characters is that they differ in what types of software they write rather than how programming fits into their life. When reading their descriptions, to me, it's pretty clear that they're all people whose job is writing software. But that doesn't cover people who use programming as only one of many tools to do their job. @jzrake would fall into this group, along with data scientists, economists, and so on.

Contributor

Having given this some thought, I think we ought to use Niklaus for this. I think we can generalize him to "non-programmers (yet)" of various backgrounds.

Contributor Author

Done.

```rust
    stage_map.insert(index, runtime.spawn(stage).map(|f| f.unwrap()).shared());
}
```
lacked performance because she needed to clone the value for every task. So, Barbara switched to using `Arc` to keep a thread-safe reference count to the shared data. But this change introduced a lot of `.map` and `.unwrap` calls, making the code much harder to read. She realized that managing the dependency graph was not intuitive when using `async` for concurrency.
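The clone-versus-`Arc` tradeoff can be seen in miniature with std types alone (a hypothetical data buffer, not Barbara's code): cloning the data copies the whole buffer for every task, while cloning an `Arc` is just a pointer copy plus a refcount bump, and every task reads the same allocation.

```rust
use std::sync::Arc;
use std::thread;

// Give each of `tasks` threads a handle to the same buffer. Cloning the Arc
// does not copy the underlying data, so this stays cheap no matter how large
// the buffer is.
fn sum_on_threads(data: Vec<f64>, tasks: usize) -> Vec<f64> {
    let shared = Arc::new(data);
    let handles: Vec<_> = (0..tasks)
        .map(|_| {
            let shared = Arc::clone(&shared); // refcount bump, not a deep copy
            thread::spawn(move || shared.iter().sum::<f64>())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

The readability cost in the story comes from layering this pattern on top of spawned futures, where each shared result also needs `.map`, `.unwrap`, and `.shared()` adapters.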
Member

"she needed to clone the value for every task" - which value are you referring to?

I'm having a hard time relating the code example to the problems mentioned in this paragraph. Perhaps comments in the code examples could bring better attention to the relevant details?

Contributor

it means she has to clone the value that was computed -- in this case, that's the future that resulted from runtime.spawn. So maybe...


The problem with this design was that her stage_map stored an impl Future as the result of each cell. When she needed to access and combine the results of those futures, it involved a lot of cloning:

```rust
// compute the value at index `i` based on `i-1`
let previous_value = stage_map.get(i - 1).clone();
```

I don't think I quite got this right though. @jzrake could probably fill in the right types etc here. I'd actually like to have this noted down. Also, where does the shared function come from? (The futures crate probably?)


Yes, shared comes from the futures crate. I think the basic pattern I was trying can be represented more cleanly than with the copy-pastes from the code base that are currently in there. I'll propose a cleaner example for the story.


To contextualize the snippets in @erichgess's post, here is an approximation of the parallelization strategy in our science codes.

```rust
use std::sync::Arc;
use std::collections::HashMap;
use futures::FutureExt;
use futures::future::join_all;
use tokio::runtime::Runtime;

async fn update(runtime: Runtime, solver: Solver, state: State) -> Result<State, Error> {

    let mut stage1 = HashMap::new();
    let mut stage2 = HashMap::new();
    let mut stage3 = HashMap::new();

    // Stage 1: results can be computed independently on each block. Tokio
    // futures must be 'static, so anything moved into the async block has to
    // be cloned first.
    for (index, block) in &state.patches {
        let block = block.clone();
        let solver = solver.clone();

        let future = async move {
            solver.try_stage1(block)
        };
        stage1.insert(index, runtime.spawn(future).map(|f| f.unwrap()).shared());
    }
    let stage1 = Arc::new(stage1);

    // Stage 2: each result requires multiple blocks from stage 1.
    for (index, _) in &state.patches {
        let solver = solver.clone();
        let stage1 = stage1.clone();

        let future = async move {
            let neighbors = join_all(solver.neighbors_of(index, stage1)).await;
            solver.try_stage2(index, neighbors)
        };
        stage2.insert(index, runtime.spawn(future).map(|f| f.unwrap()).shared());
    }
    let stage2 = Arc::new(stage2);

    // Stage 3: similar to stage 2
    for (index, _) in &state.patches {

    }
    let stage3 = Arc::new(stage3);

    // Final stage: convert HashMap<Future<Result<T, E>>> -> Future<Result<Vec<T>, E>>
    let new_patches: Result<Vec<_>, _> = join_all(stage3.into_values())
        .await
        .into_iter()
        .collect();

    Ok(State { patches: new_patches? })
}
```


## 🤔 Frequently Asked Questions

### **What are the morals of the story?**
Member

I'd like to explore the choice of tokio a bit more. I wonder if the choice of that particular runtime led to some of the issues (though it's clear that not all of them came from this choice). Perhaps the lack of a runtime designed for compute bound workloads is another issue here.

Contributor Author

That's a very good point: I'll follow up with the source to get more details on how tokio was chosen. Personally, I believe this points to a missing piece of "guidance": helping people understand the type of concurrency problem they are solving (I/O bound, compute bound, etc.) and how to get started on the right foot for solving that type of problem.

Contributor Author

I don't yet know enough about this area of the Rust ecosystem. What does the runtime refer to with regards to tokio? The library as a whole, its API, or the code under the hood that manages threads, task execution, scheduling, etc.?

One big problem, from my perspective, is that the semantics of async are simply not able to express a complex network of thousands of compute tasks in a simple way. Blocking a task until 3 other tasks have completed isn't easy with async. In this story, Barbara had to jury-rig a dependency-based scheduler into async that would block tasks from starting their computations until their dependencies were completed. Put another way, the semantics of async are meant for a very different type of concurrency problem than the one Barbara was solving.
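For illustration only (not Barbara's actual scheduler), the bookkeeping such a dependency-based scheduler needs can be sketched with std types: each task records how many dependencies are outstanding and becomes runnable only when that count reaches zero (Kahn's algorithm). The task ids and dependency map here are hypothetical.

```rust
use std::collections::HashMap;

// Run tasks in an order compatible with their dependency graph: a task
// "runs" only after everything it depends on has completed.
fn run_in_dependency_order(deps: &HashMap<u32, Vec<u32>>) -> Vec<u32> {
    // Outstanding-dependency count per task.
    let mut pending: HashMap<u32, usize> =
        deps.iter().map(|(&id, d)| (id, d.len())).collect();

    // Reverse edges: which tasks are unblocked when `id` completes.
    let mut unblocks: HashMap<u32, Vec<u32>> = HashMap::new();
    for (&id, d) in deps {
        for &dep in d {
            unblocks.entry(dep).or_default().push(id);
        }
    }

    // Tasks with no dependencies are runnable immediately.
    let mut ready: Vec<u32> = pending
        .iter()
        .filter(|&(_, &n)| n == 0)
        .map(|(&id, _)| id)
        .collect();

    let mut order = Vec::new();
    while let Some(id) = ready.pop() {
        order.push(id); // "run" the task here
        if let Some(next_ids) = unblocks.get(&id) {
            for &next in next_ids {
                let n = pending.get_mut(&next).unwrap();
                *n -= 1;
                if *n == 0 {
                    ready.push(next); // last dependency finished
                }
            }
        }
    }
    order
}
```

None of this logic is provided by `async`/`await` itself, which is the friction the story describes: the dependency tracking had to be built by hand on top of the executor.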

@jzrake let me know if this is an accurate interpretation of your experience.

Member

> this is either a missing piece of "guidance"

I 100% agree here. The async ecosystem is highly focused on server side scenarios (for the obvious reason that it's a very popular use case), but we need to do better advising users who have different needs.

> simply meant for very different type of concurrency problem than the one Barbara was solving.

I'm not sure whether I agree or disagree with you. async/await lets you declare dependencies of a certain piece of computation in the same way you would in synchronous code:

```rust
let dep1 = dep1.await;
let dep2 = dep2.await;
// now compute based on dep1 and dep2
let answer = compute(dep1, dep2);
```

If you need dependencies to run in parallel you can join them.

```rust
let deps = join_all(vec![dep1, dep2]).await;
let answer = compute(deps);
```

Obviously, the experience represented here shows that async/await as it is today does not lead to simple-to-understand code for compute-bound workloads, but I'm just not sure if this is a fundamental property of the computation model or just a lack of good guidance and good tooling for this use case.

Contributor Author

> I'm not sure whether I agree or disagree with you. async/await lets you declare dependencies of a certain piece of computation in the same way you would in synchronous code:
> If you need dependencies to run in parallel you can join them.
Right, but I'm wondering if that API just doesn't work well when you have thousands of dynamically created tasks that are all joined together in a complex dependency graph. Which makes me wonder if the async API assumes that the structure of the code defines the dependencies of the tasks. For example, if you have tasks X, Y, Z that depend on A, B then you write:

```rust
// define tasks a and b
let deps = join_all(vec![a, b]).await;

// define tasks x, y, z
join_all(vec![x, y, z]).await;
```

and that might create some friction when trying to write code that coordinates a lot of compute-based tasks.

> Obviously, the experience represented here shows that async/await as it is today does not lead to simple-to-understand code for compute-bound workloads, but I'm just not sure if this is a fundamental property of the computation model or just a lack of good guidance and good tooling for this use case.

Totally agree. One of my take-aways from this story is that it's worth taking some time to see if there is a good solution to Barbara's problem using async; if so, then we could use that to improve the documentation and guidance, and if not, then ask whether this is a feature that async is missing, or whether async is not the right tool, etc.

Contributor

What I mean is: the difficulty I imagine tokio brings is that its fair scheduler may be suboptimal for compute-bound workflows, where fairness is not a concern and a simpler scheduler could be more efficient. But that doesn't play into this story, which is more about the challenge of picking the right thing and the general "hardness" of async Rust overall.

I do think a library that had functions or abstractions for doing computational DAGs like this might help, of course, but I don't see why that would be tied closely to the executor (at least, not yet).

@jzrake Apr 9, 2021

I think there are two main barriers to the code clarity:

  1. there are thousands of tasks, and linking them creates awkward code: not Tokio's fault
  2. running the tasks on workers requires spawn: also not Tokio's fault (I think)

(1) I believe is just inherent to the async construct. (2) is a problem because it couples the computation to the executor (to perform well, Tokio wants to be the spawner and the executor). If the async block had syntax to hint the executor it wants to run on a worker thread, such as

```
let y = async cpu_bound {
    calculate(x.await)
};
```

that would alleviate (2). But (1) is still an issue.

Contributor Author

To circle back, are there any changes to the story that I should make based on this discussion?

Contributor Author

After thinking about it over the weekend, I think this indicated that how I described the chain of events that led to using tokio was not explicit enough. So, I've updated the story to make that more explicit. The first paragraph of the Solution Path section now has a very simple step-by-step account of how the character wound up using tokio for compute-bound parallelization.

Contributor

I'm quite happy with it.

Contributor

@nikomatsakis left a comment

I left a few nits. I think the story could be improved somewhat, made more clear, but I also feel that the major morals carry through. I'd be inclined to merge -- although I think we should switch to Niklaus.


guswynn added a commit to guswynn/wg-async-foundations that referenced this pull request Apr 12, 2021
The technical conclusion of the specific example in this doc is similar to: rust-lang#132, but the story in general can be about how to compare rust async code and other languages async code (particularly c++20)
Contributor

@nikomatsakis left a comment

Looks ready to land to me, I left a few nits



@nikomatsakis
Contributor

Going to go ahead and merge.

@nikomatsakis nikomatsakis merged commit fa4d364 into rust-lang:master Apr 12, 2021
@nikomatsakis nikomatsakis added the status-quo-story-ideas "Status quo" user story ideas label Apr 18, 2021