Skip to content

kernel/sched: Fix SMP must-wait-for-switch conditions in abort/join #58334

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 26, 2023

Conversation

andyross
Copy link
Collaborator

[This is @carlocaione's one-line fix from #58116 with some added refactoring from me to clean up docs and use some of the spiffy new APIs for spinning and barriers.]

As discovered by Carlo Caione, the k_thread_join code had a case where
it detected it had been called on a thread already marked _THREAD_DEAD
and exited early. That's not sufficient. The thread state is mutated
from the thread itself on its exit path. It may still be running!

Just like the code in z_swap(), we need to spin waiting on the other
CPU to write the switch handle before knowing it's safe to return,
otherwise the calling context might (and did) do something like
immediately k_thread_create() a new thread in the "dead" thread's
struct while it was still running on the other core.

There was also a similar case in k_thread_abort() which had the same
issue: it needs to spin waiting on the other CPU to kill the thread
via the same mechanism.

Fixes #58116

andyross added 3 commits May 26, 2023 10:33
…pin()

This trick turns out also to be needed by the abort/join code.
Promote it to a more formal-looking internal API and clean up the
documentation to (hopefully) clarify the exact behavior and better
explain the need.

This is one of the more... enchanted bits of the scheduler, and while
the trick is IMHO pretty clean, it remains a big SMP footgun.

Signed-off-by: Andy Ross <[email protected]>
The switch_handle field in the thread struct is used as an atomic flag
between CPUs in SMP, and has been known for a long time to technically
require memory barriers for correct operation.  We have an API for
that now, so put them in:

* The code immediately before arch_switch() needs a write barrier to
  ensure that thread state written by the scheduler is seen to happen
  before the outgoing thread is flagged with a valid switch handle.

* The loop in z_sched_switch_spin() needs a read barrier at the end,
  to make sure the calling context doesn't load state from before the
  other CPU stored the switch handle.

Also, that same spot in switch_spin was spinning with interrupts held,
which means it needs a call to arch_spin_relax() to avoid a FPU state
deadlock on some architectures.

Signed-off-by: Andy Ross <[email protected]>
As discovered by Carlo Caione, the k_thread_join code had a case where
it detected it had been called on a thread already marked _THREAD_DEAD
and exited early.  That's not sufficient.  The thread state is mutated
from the thread itself on its exit path.  It may still be running!

Just like the code in z_swap(), we need to spin waiting on the other
CPU to write the switch handle before knowing it's safe to return,
otherwise the calling context might (and did) do something like
immediately k_thread_create() a new thread in the "dead" thread's
struct while it was still running on the other core.

There was also a similar case in k_thread_abort() which had the same
issue: it needs to spin waiting on the other CPU to kill the thread
via the same mechanism.

Fixes zephyrproject-rtos#58116

Originally-by: Carlo Caione <[email protected]>
Signed-off-by: Andy Ross <[email protected]>
@andyross andyross force-pushed the join-switch-spin branch from 24fb5fb to a654d8a Compare May 26, 2023 17:37
new_thread->switch_handle = NULL;
barrier_dmem_fence_full(); /* write barrier */
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hilariously, after I was the one who said we should just merge the full barrier API first for simplicity and worry about read/write (ARM, traditional) and/or acquire/release/consume (C++ terms, sigh) complexity later...

...I hit a spot where there's a clear use for finer barrier control within mere days. Still don't think it's wrong to have the conservative API though.

@cfriedt
Copy link
Member

cfriedt commented May 26, 2023

Taking a look right now.

Copy link
Collaborator

@npitre npitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and is holding up in testing.

@cfriedt
Copy link
Member

cfriedt commented May 26, 2023

It's a definite improvement. I think there are still a few cracks in pthreads where water is getting through, but k_threads look good to me
https://pastebin.com/yEJ2CbAF

@andyross
Copy link
Collaborator Author

FWIW, after the expected Keystone Cops hijinks getting the scripting to work, I have a rig with 32 of these running on arm64 in parallel. They've been going a little over an hour now without issue, which is more testing in aggregate than either riscv platform got. I'll let it run for the rest of the afternoon, but it seems pretty clear that @carlocaione found the single issue that was affecting that particular test. The remaining pthread crashes are almost certainly higher level.

@jgl-meta jgl-meta merged commit a08e23f into zephyrproject-rtos:main May 26, 2023
@cfriedt
Copy link
Member

cfriedt commented May 27, 2023

pthreads have some windows where there are race conditions.

My next question: @andyross , @carlocaione , @npitre - is there a list of PR's that should be backported to v2.7-branch, because this issue was first discovered there 😁

Today is the release date for v2.7.5, although the fixes are unlikely to make it in for this release

@andyross
Copy link
Collaborator Author

Regarding backports: this relies on brand new APIs, so it's not going to backport cleanly. @carlocaione 's original one-line fix is what we probably want in older branches. Though it's worth pointing out that as interesting as this was as a bug, it's actually going to be extremely rare in practice. Even arm64 was going through millions of thread create/join cycles before it tripped. Real apps just aren't ever going to do that; I'd view this as fairly low priority as a backport candidate.

@cfriedt
Copy link
Member

cfriedt commented May 27, 2023

Regarding backports: this relies on brand new APIs, so it's not going to backport cleanly. @carlocaione 's original one-line fix is what we probably want in older branches. Though it's

I'll give that a shot then

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: ARC ARC Architecture area: ARM64 ARM (64-bit) Architecture area: Kernel backport v2.7-branch Request backport to the v2.7-branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kernel: thread: race condition between create and join
5 participants