[v2.7] kernel: thread: race condition between create and join #58362

Closed
cfriedt opened this issue May 27, 2023 · 1 comment
Labels: area: Kernel, bug, LTS, priority: medium, Stale

Comments

@cfriedt
Member

cfriedt commented May 27, 2023

Describe the bug
This is a placeholder for, and duplicate of, #58116, filed specifically for v2.7-branch (where the issue was originally discovered).

Please also mention any information which could help others to understand
the problem you're facing:

  • What target platform are you using? qemu_cortex_a53_smp (some of us are also using qemu_riscv64_smp via topic-v2.7-riscv)
  • What have you tried to diagnose or workaround this issue? Backporting #58334 ("kernel/sched: Fix SMP must-wait-for-switch conditions in abort/join"), but that alone is insufficient.
  • Is this a regression? It's hard to say, because until very recently we had no test coverage for this kind of thing.
  • ...

To Reproduce
Steps to reproduce the behavior:

git checkout -b backport-58334-to-v2.7-branch origin/backport-58334-to-v2.7-branch
git remote add cfriedt https://github.com/cfriedt/zephyr.git
git fetch cfriedt
git merge cfriedt/pthread-pressure-v2.7
west build -b qemu_cortex_a53_smp -t run tests/posix/pthread_pressure

The error still occurs with k_threads on qemu_cortex_a53_smp, even with my cobbled-together backport applied. The run doesn't even get as far as testing pthreads.

Expected behavior
Creating and joining threads in quick succession (both k_thread and pthread) should work reliably.
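For anyone who wants to see the shape of the workload without checking out the branch, here is a minimal sketch of a create/join pressure loop using plain POSIX pthreads. It is not the code in tests/posix/pthread_pressure; the names `noop_worker` and `create_join_cycles` are hypothetical, and it only illustrates the pattern of creating a thread and joining it immediately, over and over.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: does no work and returns its argument right away,
 * so nearly all of the time is spent in create/join themselves. */
void *noop_worker(void *arg)
{
    return arg;
}

/* Create one thread and join it, `iterations` times in a row.
 * Returns the number of completed cycles; stops at the first failure. */
unsigned long create_join_cycles(unsigned long iterations)
{
    unsigned long done = 0;

    for (unsigned long i = 0; i < iterations; i++) {
        pthread_t tid;
        void *ret = NULL;

        if (pthread_create(&tid, NULL, noop_worker, &done) != 0) {
            break;
        }
        if (pthread_join(tid, &ret) != 0) {
            break;
        }
        /* The worker should hand back exactly the pointer it was given. */
        if (ret != &done) {
            break;
        }
        done++;
    }

    return done;
}
```

A call such as `create_join_cycles(100000)` should return 100000 on a working implementation. Running one such loop per CPU from separate threads would be closer to the NUM_THREADS: 2 configuration shown in the log.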

Impact
Bit of a showstopper for anyone wishing to run high throughput payloads on SMP platforms, unfortunately.

Logs and console output

west build -p auto -b qemu_cortex_a53_smp -t run tests/posix/pthread_pressure
...
*** Booting Zephyr OS build v2.7.4-68-g2bf3e0fb0219  ***
Secondary CPU core 1 (MPID:0x1) is up
Running test suite pthread_pressure
===================================================================
START - test_k_thread_create_join
BOARD: qemu_cortex_a53
NUM_THREADS: 2
TEST_NUM_CPUS: 2
TEST_DURATION_S: 5
TEST_DELAY_US: 0
now (ms): 1010 end (ms): 5000
Thread 0 created and joined 26446 times
Thread 1 created and joined 26446 times
now (ms): 2010 end (ms): 5000
Thread 0 created and joined 54041 times
Thread 1 created and joined 54040 times
now (ms): 3010 end (ms): 5000
Thread 0 created and joined 80825 times
Thread 1 created and joined 80822 times
now (ms): 4010 end (ms): 5000
Thread 0 created and joined 104379 times
Thread 1 created and joined 104372 times
now (ms): 5000 end (ms): 5000
Thread 0 created and joined 129959 times
Thread 1 created and joined 129951 times
E: ELR_ELn: 0x0000000000000000
E: ESR_ELn: 0x0000000086000006
E:   EC:  0x21 (Instruction Abort taken without a change in Exception level.)
E:   IL:  0x1
E:   ISS: 0x6
E: FAR_ELn: 0x0000000000000000
E: TPIDRRO: 0x010000004001c9f8
E: x0:  0x0000000000000000  x1:  0x000000004001c360
E: x2:  0x0000000000000000  x3:  0x000000004001ca58
E: x4:  0x0000000000000000  x5:  0x0000000000000000
E: x6:  0x0000000000000000  x7:  0x00000000ffffffff
E: x8:  0x000000000000000a  x9:  0x00000000ffffffff
E: x10: 0x0000000000000000  x11: 0x0000000000000000
E: x12: 0x0000000000000000  x13: 0x0000000000000000
E: x14: 0x0000000000000000  x15: 0x0000000000000000
E: x16: 0x0000000000000000  x17: 0x0000000000000000
E: x18: 0x0000000000000000  x30: 0x0000000040005b78
E: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
E: Current thread: 0x4001c260 (main)
E: Halting system
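
As a reading aid for the dump above, here is a small sketch that splits the ESR_ELn value into the EC, IL, and ISS fields the kernel printed. The field positions (EC in bits 31:26, IL in bit 25, ISS in bits 24:0) follow the AArch64 ESR_ELx layout; the struct and function names are hypothetical and not part of Zephyr.

```c
#include <stdint.h>

/* Hypothetical container for the three fields printed in the log. */
struct esr_fields {
    uint32_t ec;  /* exception class, bits 31:26 */
    uint32_t il;  /* instruction length bit, bit 25 */
    uint32_t iss; /* instruction specific syndrome, bits 24:0 */
};

/* Split a raw ESR_ELx value into EC, IL, and ISS. */
struct esr_fields esr_decode(uint64_t esr)
{
    struct esr_fields f;

    f.ec = (uint32_t)((esr >> 26) & 0x3f);
    f.il = (uint32_t)((esr >> 25) & 0x1);
    f.iss = (uint32_t)(esr & 0x1ffffff);
    return f;
}
```

Feeding in 0x86000006 gives EC 0x21, IL 0x1, and ISS 0x6, matching the log. An instruction abort with ELR_ELn and FAR_ELn both zero would be consistent with a branch to address 0 (for example through a NULL function pointer), though the log alone does not confirm that.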

Environment (please complete the following information):

  • OS: (e.g. Linux, MacOS, Windows): macOS
  • Toolchain (e.g Zephyr SDK, ...): Zephyr SDK 0.13.1
  • Commit SHA or Version used: 4fc4dc7

Additional context
Originally reported in #56163

@cfriedt added the bug, priority: medium, area: Kernel, and LTS labels May 27, 2023
@github-actions

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions bot added the Stale label Jul 27, 2023
@github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Aug 10, 2023