Some testcases cannot cleanup when assert fail causing an "unexpected eof" on qemu #57681

Closed
povergoing opened this issue May 9, 2023 · 4 comments
Labels: area: QEMU, area: Tests, bug, priority: medium, Stale

Comments

@povergoing
Member

Describe the bug
Encountered in #55207 (comment) while looking into a CI failure.
Platform: qemu_cortex_a53_smp; other platforms may be affected as well.

Some test cases, such as tests/kernel/sched/schedule_api/kernel.scheduler, have the following structure:

  1. resource alloc
  2. test & assert
  3. resource cleanup

The issue is that when an assert fails, the failure routine calls k_thread_abort to abort the current thread, so the thread never cleans up the resources it allocated. Taking tests/kernel/sched/schedule_api/kernel.scheduler as an example:

setup_threads();
spawn_threads(0);
/* checkpoint: higher priority threads get executed immediately */
zassert_true(tdata[0].executed == 1);
k_busy_wait(500000); /* 500 ms */
/* checkpoint: equal priority threads get executed every time slice */
zassert_true(tdata[1].executed == 0);
for (int i = 2; i < THREADS_NUM; i++) {
	zassert_true(tdata[i].executed == 0);
}
/* restore environment */
teardown_threads();

The test thread first spawns 3 threads, then runs its checks, and finally aborts the spawned threads. But if the assert fails at
https://github.com/zephyrproject-rtos/zephyr/blob/main/tests/kernel/sched/schedule_api/src/test_sched_timeslice_and_lock.c#L255
the assert calls k_thread_abort to abort the current thread (the test thread), so the test thread never tears down the spawned threads, which then stay in the runq 'forever'. The next test respawns the threads without checking whether they have been aborted or are still in the runq, and simply initializes them again, so the same threads end up in the runq twice or more. This probably causes an unexpected exception, and the exception handling then calls the PSCI function to shut down the core, which makes QEMU exit unexpectedly.
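
To make the mechanism concrete, here is a minimal hosted-C model of it. This is not Zephyr code, and every name in it is hypothetical: `runq` is a plain array standing in for the scheduler's run queue, `model_assert` uses `longjmp` as a stand-in for the thread abort on assert failure, and the `model_*` functions only mimic the shape of spawn_threads/teardown_threads.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

#define THREADS_NUM 3
#define RUNQ_CAP 16

/* Hypothetical stand-ins for kernel objects; not Zephyr data structures. */
struct model_thread { int id; };

struct model_thread tthread[THREADS_NUM];
struct model_thread *runq[RUNQ_CAP];
size_t runq_len;
jmp_buf fail_jmp;

/* Queue every test thread without checking whether it is already queued. */
void model_spawn_threads(void)
{
	for (int i = 0; i < THREADS_NUM && runq_len < RUNQ_CAP; i++) {
		tthread[i].id = i;
		runq[runq_len++] = &tthread[i];
	}
}

/* Only test threads are modeled, so teardown simply empties the queue. */
void model_teardown_threads(void)
{
	runq_len = 0;
}

/* Models the abort on assert failure: control never returns to the caller. */
void model_assert(bool cond)
{
	if (!cond) {
		longjmp(fail_jmp, 1);
	}
}

/* Same shape as the test above: alloc, assert, cleanup. */
void model_test_body(bool pass)
{
	model_spawn_threads();
	model_assert(pass);
	model_teardown_threads(); /* skipped when the assert fails */
}

/* Returns true if the modeled test passed. */
bool model_run_test(bool pass)
{
	if (setjmp(fail_jmp) != 0) {
		return false;
	}
	model_test_body(pass);
	return true;
}

/* Counts how many times one thread appears in the modeled run queue. */
size_t model_count_in_runq(const struct model_thread *t)
{
	size_t n = 0;

	for (size_t i = 0; i < runq_len; i++) {
		if (runq[i] == t) {
			n++;
		}
	}
	return n;
}
```

In this model, a failing `model_run_test(false)` leaves all three threads queued, and a following `model_spawn_threads()` queues each of them a second time, which is the duplicated state described above.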

To Reproduce
Steps to reproduce the behavior:

  1. tweak source
diff --git a/tests/kernel/sched/schedule_api/src/test_sched_timeslice_and_lock.c b/tests/kernel/sched/schedule_api/src/test_sched_timeslice_and_lock.c
index eb5de2cd81..fca0e3f29a 100644
--- a/tests/kernel/sched/schedule_api/src/test_sched_timeslice_and_lock.c
+++ b/tests/kernel/sched/schedule_api/src/test_sched_timeslice_and_lock.c
@@ -290,7 +290,8 @@ ZTEST(threads_scheduling, test_time_slicing_disable_preemptible)
        zassert_true(tdata[0].executed == 1);
        k_busy_wait(500000); /* 500 ms */
        /* checkpoint: equal priority threads get executed every time slice */
-       zassert_true(tdata[1].executed == 0);
+       //zassert_true(tdata[1].executed == 0);
+       zassert_true(false);
        for (int i = 2; i < THREADS_NUM; i++) {
                zassert_true(tdata[i].executed == 0);
        }
  2. west build -p always -b qemu_cortex_a53_smp -t run -T zephyr/tests/kernel/sched/schedule_api/kernel.scheduler
  3. See an unexpected error and no ztest summary statistics
    or
  2. west twister -p qemu_cortex_a53_smp -s tests/kernel/sched/schedule_api/kernel.scheduler
  3. See "unexpected eof":
INFO    - Using Ninja..
INFO    - Zephyr version: zephyr-v3.3.0-1084-g15ed0457b525
INFO    - Using 'zephyr' toolchain.
INFO    - Building initial testsuite list...
INFO    - Writing JSON report /src/zephyrproject/twister-out/testplan.json
INFO    - JOBS: 8
INFO    - Adding tasks to the queue...
INFO    - Added initial list of jobs to queue
ERROR   - qemu_cortex_a53_smp       tests/kernel/sched/schedule_api/kernel.scheduler    FAILED : unexpected eof
ERROR   - see: /src/zephyrproject/twister-out/qemu_cortex_a53_smp/tests/kernel/sched/schedule_api/kernel.scheduler/handler.log
INFO    - Total complete:    1/   1  100%  skipped:    0, failed:    1, error:    0
INFO    - 1 test scenarios (1 test instances) selected, 0 configurations skipped (0 by static filter, 0 at runtime).
INFO    - 0 of 1 test configurations passed (0.00%), 1 failed, 0 errored, 0 skipped with 0 warnings in 31.71 seconds
INFO    - In total 29 test cases were executed, 0 skipped on 1 out of total 551 platforms (0.18%)
INFO    - 1 test configurations executed on platforms, 0 test configurations were only built.
INFO    - Saving reports...
INFO    - Writing JSON report /src/zephyrproject/twister-out/twister.json
INFO    - Writing xunit report /src/zephyrproject/twister-out/twister.xml...
INFO    - Writing xunit report /src/zephyrproject/twister-out/twister_report.xml...
INFO    - -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
INFO    - The following issues were found (showing the top 10 items):
INFO    - 1) tests/kernel/sched/schedule_api/kernel.scheduler on qemu_cortex_a53_smp failed (unexpected eof)
INFO    - 
INFO    - To rerun the tests, call twister using the following commandline:
INFO    - west twister -p <PLATFORM> -s <TEST ID>, for example:
INFO    - 
INFO    - west twister -p qemu_cortex_a53_smp -s tests/kernel/sched/schedule_api/kernel.scheduler
INFO    - or with west:
INFO    - west build -p -b qemu_cortex_a53_smp -T tests/kernel/sched/schedule_api/kernel.scheduler
INFO    - -+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
INFO    - Run completed

Expected behavior
ztest should print its summary statistics.
twister should not report "unexpected eof".
Impact
annoyance

Logs and console output

Environment (please complete the following information):

  • OS: Linux
  • Toolchain: Zephyr SDK
  • main branch

Additional context
Possible solutions:

  1. Rewrite all the test cases that have this issue, but that is too much effort.
  2. Rewrite the assert so that it does not abort the thread immediately, but that would have a greater impact.
  3. Disable PSCI shutdown when testing on QEMU, but that is really just a workaround.
  4. ...
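
As a rough illustration of what a cheaper variant of option 1 could look like, the sketch below moves cleanup out of the test body and into a hook that the runner calls after every test, pass or fail. It is a hosted-C model under stated assumptions, not ztest code: `run_with_hooks`, `before_each`, `after_each`, `sketch_assert` and `live_threads` are all hypothetical names, and `longjmp` again stands in for the abort on assert failure. ztest suites do accept per-test before/after callbacks, but whether they fit these tests has not been verified here.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*hook_fn)(void);
typedef void (*test_fn)(void);

/* Hypothetical count of spawned threads that have not been torn down. */
int live_threads;
jmp_buf fail_jmp;

/* Models an assert that aborts the test body on failure. */
void sketch_assert(bool cond)
{
	if (!cond) {
		longjmp(fail_jmp, 1);
	}
}

/* Resource allocation lives in the "before" hook... */
void before_each(void)
{
	live_threads += 3;
}

/* ...and cleanup lives in the "after" hook, outside the test body. */
void after_each(void)
{
	live_threads = 0;
}

void passing_test(void)
{
	sketch_assert(live_threads == 3);
}

void failing_test(void)
{
	sketch_assert(false);
}

/* Runs one test; the after hook runs whether or not the body aborted. */
bool run_with_hooks(hook_fn before, test_fn test, hook_fn after)
{
	volatile bool passed = false;

	if (before != NULL) {
		before();
	}
	if (setjmp(fail_jmp) == 0) {
		test();
		passed = true;
	}
	if (after != NULL) {
		after();
	}
	return passed;
}
```

With this structure a failing test no longer leaks its threads into the next one: `run_with_hooks(before_each, failing_test, after_each)` reports failure but still leaves `live_threads` at 0, so a following passing test starts from a clean state.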
@povergoing
Member Author

Kindly pinging @carlocaione: could you please help tag someone who is interested in this topic?

@github-actions

github-actions bot commented Jul 9, 2023

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions github-actions bot closed this as not planned on Dec 11, 2023