kernel/sched: Fix SMP must-wait-for-switch conditions in abort/join #58334
Conversation
…pin()

This trick turns out also to be needed by the abort/join code. Promote it to a more formal-looking internal API and clean up the documentation to (hopefully) clarify the exact behavior and better explain the need.

This is one of the more... enchanted bits of the scheduler, and while the trick is IMHO pretty clean, it remains a big SMP footgun.

Signed-off-by: Andy Ross <[email protected]>
The switch_handle field in the thread struct is used as an atomic flag between CPUs in SMP, and has been known for a long time to technically require memory barriers for correct operation. We have an API for that now, so put them in:

* The code immediately before arch_switch() needs a write barrier to ensure that thread state written by the scheduler is seen to happen before the outgoing thread is flagged with a valid switch handle.

* The loop in z_sched_switch_spin() needs a read barrier at the end, to make sure the calling context doesn't load state from before the other CPU stored the switch handle.

Also, that same spot in switch_spin was spinning with interrupts held, which means it needs a call to arch_spin_relax() to avoid an FPU state deadlock on some architectures.

Signed-off-by: Andy Ross <[email protected]>
As discovered by Carlo Caione, the k_thread_join code had a case where it detected it had been called on a thread already marked _THREAD_DEAD and exited early. That's not sufficient. The thread state is mutated from the thread itself on its exit path. It may still be running!

Just like the code in z_swap(), we need to spin waiting on the other CPU to write the switch handle before knowing it's safe to return, otherwise the calling context might (and did) do something like immediately k_thread_create() a new thread in the "dead" thread's struct while it was still running on the other core.

There was also a similar case in k_thread_abort() which had the same issue: it needs to spin waiting on the other CPU to kill the thread via the same mechanism.

Fixes zephyrproject-rtos#58116

Originally-by: Carlo Caione <[email protected]>
Signed-off-by: Andy Ross <[email protected]>
```c
new_thread->switch_handle = NULL;
barrier_dmem_fence_full(); /* write barrier */
```
Hilariously, after I was the one who said we should just merge the full barrier API first for simplicity and worry about read/write (ARM, traditional) and/or acquire/release/consume (C++ terms, sigh) complexity later...
...I hit a spot where there's a clear use for finer barrier control within mere days. Still don't think it's wrong to have the conservative API though.
Taking a look right now.
Looks good and is holding up in testing.
It's a definite improvement. I think there are still a few cracks in pthreads where water is getting through, but k_threads look good to me.
FWIW, after the expected Keystone Cops hijinks getting the scripting to work, I have a rig with 32 of these running on arm64 in parallel. They've been going a little over an hour now without issue, which is more testing in aggregate than either riscv platform got. I'll let it run for the rest of the afternoon, but it seems pretty clear that @carlocaione found the single issue that was affecting that particular test. The remaining pthread crashes are almost certainly higher level.
pthreads have some windows where there are race conditions. My next question for @andyross, @carlocaione, @npitre: is there a list of PRs that should be backported to v2.7-branch? This issue was first discovered there 😁 Today is the release date for v2.7.5, although the fixes are unlikely to make it in for this release.
Regarding backports: this relies on brand new APIs, so it's not going to backport cleanly. @carlocaione's original one-line fix is what we probably want in older branches. Though it's worth pointing out that, as interesting as this was as a bug, it's actually going to be extremely rare in practice. Even arm64 was going through millions of thread create/join cycles before it tripped. Real apps just aren't ever going to do that; I'd view this as fairly low priority as a backport candidate.
I'll give that a shot then.
[This is @carlocaione's one-line fix from #58116 with some added refactoring from me to clean up docs and use some of the spiffy new APIs for spinning and barriers.]