kernel/sched: Fix SMP must-wait-for-switch conditions in abort/join #58334

andyross · 2023-05-26T17:12:50Z

[This is @carlocaione's one-line fix from #58116 with some added refactoring from me to clean up docs and use some of the spiffy new APIs for spinning and barriers.]

As discovered by Carlo Caione, the k_thread_join code had a case where
it detected it had been called on a thread already marked _THREAD_DEAD
and exited early. That's not sufficient. The thread state is mutated
from the thread itself on its exit path. It may still be running!

Just like the code in z_swap(), we need to spin waiting on the other
CPU to write the switch handle before knowing it's safe to return,
otherwise the calling context might (and did) do something like
immediately k_thread_create() a new thread in the "dead" thread's
struct while it was still running on the other core.

There was also a similar case in k_thread_abort() which had the same
issue: it needs to spin waiting on the other CPU to kill the thread
via the same mechanism.

Fixes #58116

…pin() This trick turns out also to be needed by the abort/join code. Promote it to a more formal-looking internal API and clean up the documentation to (hopefully) clarify the exact behavior and better explain the need. This is one of the more... enchanted bits of the scheduler, and while the trick is IMHO pretty clean, it remains a big SMP footgun. Signed-off-by: Andy Ross <[email protected]>

The switch_handle field in the thread struct is used as an atomic flag between CPUs in SMP, and has been known for a long time to technically require memory barriers for correct operation. We have an API for that now, so put them in: * The code immediately before arch_switch() needs a write barrier to ensure that thread state written by the scheduler is seen to happen before the outgoing thread is flagged with a valid switch handle. * The loop in z_sched_switch_spin() needs a read barrier at the end, to make sure the calling context doesn't load state from before the other CPU stored the switch handle. Also, that same spot in switch_spin was spinning with interrupts held, which means it needs a call to arch_spin_relax() to avoid a FPU state deadlock on some architectures. Signed-off-by: Andy Ross <[email protected]>

As discovered by Carlo Caione, the k_thread_join code had a case where it detected it had been called on a thread already marked _THREAD_DEAD and exited early. That's not sufficient. The thread state is mutated from the thread itself on its exit path. It may still be running! Just like the code in z_swap(), we need to spin waiting on the other CPU to write the switch handle before knowing it's safe to return, otherwise the calling context might (and did) do something like immediately k_thread_create() a new thread in the "dead" thread's struct while it was still running on the other core. There was also a similar case in k_thread_abort() which had the same issue: it needs to spin waiting on the other CPU to kill the thread via the same mechanism. Fixes zephyrproject-rtos#58116 Originally-by: Carlo Caione <[email protected]> Signed-off-by: Andy Ross <[email protected]>

andyross · 2023-05-26T17:42:02Z

kernel/include/kswap.h

 			new_thread->switch_handle = NULL;
+			barrier_dmem_fence_full(); /* write barrier */


Hilariously, after I was the one who said we should just merge the full barrier API first for simplicity and worry about read/write (ARM, traditional) and/or acquire/release/consume (C++ terms, sigh) complexity later...

...I hit a spot where there's a clear use for finer barrier control within mere days. Still don't think it's wrong to have the conservative API though.

cfriedt · 2023-05-26T18:00:47Z

Taking a look right now.

npitre

Looks good and is holding up in testing.

cfriedt · 2023-05-26T18:07:01Z

It's a definite improvement. I think there are still a few cracks in pthreads where water is getting through, but k_threads look good to me
https://pastebin.com/yEJ2CbAF

andyross · 2023-05-26T20:14:09Z

FWIW, after the expected Keystone Cops hijinks getting the scripting to work, I have a rig with 32 of these running on arm64 in parallel. They've been going a little over an hour now without issue, which is more testing in aggregate than either riscv platform got. I'll let it run for the rest of the afternoon, but it seems pretty clear that @carlocaione found the single issue that was affecting that particular test. The remaining pthread crashes are almost certainly higher level.

cfriedt · 2023-05-27T01:16:16Z

pthreads have some windows where there are race conditions.

My next question: @andyross , @carlocaione , @npitre - is there a list of PR's that should be backported to v2.7-branch, because this issue was first discovered there 😁

Today is the release date for v2.7.5, although the fixes are unlikely to make it in for this release

andyross · 2023-05-27T12:36:17Z

Regarding backports: this relies on brand new APIs, so it's not going to backport cleanly. @carlocaione 's original one-line fix is what we probably want in older branches. Though it's worth pointing out that as interesting as this was as a bug, it's actually going to be extremely rare in practice. Even arm64 was going through millions of thread create/join cycles before it tripped. Real apps just aren't ever going to do that; I'd view this as fairly low priority as a backport candidate.

cfriedt · 2023-05-27T23:06:56Z

Regarding backports: this relies on brand new APIs, so it's not going to backport cleanly. @carlocaione 's original one-line fix is what we probably want in older branches. Though it's

I'll give that a shot then

andyross requested review from dcpleung, nashif, carlocaione, abrodkin, ruuddw and evgeniy-paltsev as code owners May 26, 2023 17:12

andyross requested review from npitre and cfriedt May 26, 2023 17:13

zephyrbot added area: ARM64 ARM (64-bit) Architecture area: Kernel area: ARC ARC Architecture labels May 26, 2023

zephyrbot requested review from ceolin, peter-mitsis, povergoing, SgrrZhf and SiyuanCheng-CN May 26, 2023 17:13

zephyrbot assigned andyross May 26, 2023

andyross mentioned this pull request May 26, 2023

kernel: thread: race condition between create and join #58116

Closed

andyross added 3 commits May 26, 2023 10:33

andyross force-pushed the join-switch-spin branch from 24fb5fb to a654d8a Compare May 26, 2023 17:37

andyross commented May 26, 2023

View reviewed changes

npitre approved these changes May 26, 2023

View reviewed changes

cfriedt approved these changes May 26, 2023

View reviewed changes

jgl-meta merged commit a08e23f into zephyrproject-rtos:main May 26, 2023

cfriedt added the backport v2.7-branch Request backport to the v2.7-branch label May 27, 2023

zephyrbot mentioned this pull request May 27, 2023

[Backport v2.7-branch] Failed to backport #58334 #58354

Closed

This was referenced May 27, 2023

kernel/sched: Fix SMP must-wait-for-switch conditions in abort/join #58357

Closed

[v2.7] kernel: thread: race condition between create and join #58362

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kernel/sched: Fix SMP must-wait-for-switch conditions in abort/join #58334

kernel/sched: Fix SMP must-wait-for-switch conditions in abort/join #58334

andyross commented May 26, 2023

andyross May 26, 2023

cfriedt commented May 26, 2023

npitre left a comment

cfriedt commented May 26, 2023

andyross commented May 26, 2023

cfriedt commented May 27, 2023 •

edited

Loading

andyross commented May 27, 2023

cfriedt commented May 27, 2023

		new_thread->switch_handle = NULL;
		barrier_dmem_fence_full(); /* write barrier */

kernel/sched: Fix SMP must-wait-for-switch conditions in abort/join #58334

kernel/sched: Fix SMP must-wait-for-switch conditions in abort/join #58334

Conversation

andyross commented May 26, 2023

andyross May 26, 2023

Choose a reason for hiding this comment

cfriedt commented May 26, 2023

npitre left a comment

Choose a reason for hiding this comment

cfriedt commented May 26, 2023

andyross commented May 26, 2023

cfriedt commented May 27, 2023 • edited Loading

andyross commented May 27, 2023

cfriedt commented May 27, 2023

cfriedt commented May 27, 2023 •

edited

Loading