tests/kernel/fifo/fifo_timeout fails on nrf51_pca10028 and nrf52_pca10040 #8159

nashif · 2018-06-04T16:23:08Z

Looks like this test has been failing on these platforms since it was introduced:

***** delaying boot 1000ms (per build configuration) *****
***** Booting Zephyr OS 1.12.0-rc2 (delayed boot 1000ms) *****
Running test suite test_fifo_timeout
===================================================================
starting test - test_timeout_empty_fifo
PASS - test_timeout_empty_fifo
===================================================================
starting test - test_timeout_non_empty_fifo
PASS - test_timeout_non_empty_fifo
===================================================================
starting test - test_timeout_fifo_thread
PASS - test_timeout_fifo_thread
===================================================================
starting test - test_timeout_threads_pend_on_fifo
thread (q order: 2, t/o: 0, fifo 0x20000000)
thread (q order: 3, t/o: 10, fifo 0x20000000)
thread (q order: 0, t/o: 20, fifo 0x20000000)
thread (q order: 4, t/o: 30, fifo 0x20000000)
thread (q order: 1, t/o: 40, fifo 0x20000000)
PASS - test_timeout_threads_pend_on_fifo
===================================================================
starting test - test_timeout_threads_pend_on_dual_fifos
thread (q order: 0, t/o: 0, fifo 0x20000010)
thread (q order: 5, t/o: 10, fifo 0x20000000)
FAIL - test_multiple_threads_pending@156. *** thread 3 woke up, expected 2

 Assertion failed at /home/jenkins/workspace/zephyr-master-tcf-v0.11-branch/LABEL/verify/SHARD/3-3/ZEPHYR_GCC_VARIANT/zephyr/zephyr.git/tests/kernel/fifo/fifo_timeout/src/main.c:386: test_timeout_threads_pend_on_dual_fifos: rv not equal to TC_PASS
FAIL - test_timeout_threads_pend_on_dual_fifos
===================================================================
starting test - test_timeout_threads_pend_fail_on_fifo
FAIL - test_multiple_threads_get_data@207. *** thread 7 woke up, expected 0

 Assertion failed at /home/jenkins/workspace/zephyr-master-tcf-v0.11-branch/LABEL/verify/SHARD/3-3/ZEPHYR_GCC_VARIANT/zephyr/zephyr.git/tests/kernel/fifo/fifo_timeout/src/main.c:401: test_timeout_threads_pend_fail_on_fifo: rv not equal to TC_PASS
FAIL - test_timeout_threads_pend_fail_on_fifo
===================================================================
===================================================================
RunID: ci-180601-1931-1716:k8if
PROJECT EXECUTION FAILED

 Assertion failed at /home/jenkins/workspace/zephyr-master-tcf-v0.11-branch/LABEL/verify/SHARD/3-3/ZEPHYR_GCC_VARIANT/zephyr/zephyr.git/tests/kernel/fifo/fifo_timeout/src/main.c:171: test_thread_pend_and_get_data: packet != NULL is false


 Assertion failed at /home/jenkins/workspace/zephyr-master-tcf-v0.11-branch/LABEL/verify/SHARD/3-3/ZEPHYR_GCC_VARIANT/zephyr/zephyr.git/tests/kernel/fifo/fifo_timeout/src/main.c:171: test_thread_pend_and_get_data: packet != NULL is false


 Assertion failed at /home/jenkins/workspace/zephyr-master-tcf-v0.11-branch/LABEL/verify/SHARD/3-3/ZEPHYR_GCC_VARIANT/zephyr/zephyr.git/tests/kernel/fifo/fifo_timeout/src/main.c:171: test_thread_pend_and_get_data: packet != NULL is false

This test also fails with "prj_poll.conf" (that is, with CONFIG_POLL=y) at the latest commit, 5b8e4ae:

***** delaying boot 1000ms (per build configuration) *****
***** Booting Zephyr OS v1.12.0-831-g5b8e4ae (delayed boot 1000ms) *****
Running test suite test_fifo_timeout
===================================================================
starting test - test_timeout_empty_fifo
PASS - test_timeout_empty_fifo
===================================================================
starting test - test_timeout_non_empty_fifo
PASS - test_timeout_non_empty_fifo
===================================================================
starting test - test_timeout_fifo_thread
PASS - test_timeout_fifo_thread
===================================================================
starting test - test_timeout_threads_pend_on_fifo
 thread (q order: 2, t/o: 0, fifo 0x20000000)
 thread (q order: 3, t/o: 10, fifo 0x20000000)
 thread (q order: 0, t/o: 20, fifo 0x20000000)
 thread (q order: 4, t/o: 30, fifo 0x20000000)
 thread (q order: 1, t/o: 40, fifo 0x20000000)
PASS - test_timeout_threads_pend_on_fifo
===================================================================
starting test - test_timeout_threads_pend_on_dual_fifos
 thread (q order: 0, t/o: 0, fifo 0x20000010)
 thread (q order: 5, t/o: 10, fifo 0x20000000)
FAIL - test_multiple_threads_pending@156.  *** thread 3 woke up, expected 2

    Assertion failed at /home/pswarnak/workspace/1.12_execution/zephyr/tests/kernel/fifo/fifo_timeout/src/main.c:396: test_timeout_threads_pend_on_dual_fifos: rv not equal to TC_PASS

FAIL - test_timeout_threads_pend_on_dual_fifos
===================================================================
starting test - test_timeout_threads_pend_fail_on_fifo
FAIL - test_multiple_threads_get_data@207.  *** thread 7 woke up, expected 0

    Assertion failed at /home/pswarnak/workspace/1.12_execution/zephyr/tests/kernel/fifo/fifo_timeout/src/main.c:411: test_timeout_threads_pend_fail_on_fifo: rv not equal to TC_PASS


    Assertion failed at /home/pswarnak/workspace/1.12_execution/zephyr/tests/kernel/fifo/fifo_timeout/src/main.c:129: test_thread_pend_and_timeout: packet == NULL is false

FAIL - test_timeout_threads_pend_fail_on_fifo
===================================================================
===================================================================
RunID: :wjv2
PROJECT EXECUTION FAILED

    Assertion failed at /home/pswarnak/workspace/1.12_execution/zephyr/tests/kernel/fifo/fifo_timeout/src/main.c:171: test_thread_pend_and_get_data: packet != NULL is false


    Assertion failed at /home/pswarnak/workspace/1.12_execution/zephyr/tests/kernel/fifo/fifo_timeout/src/main.c:171: test_thread_pend_and_get_data: packet != NULL is false


    Assertion failed at /home/pswarnak/workspace/1.12_execution/zephyr/tests/kernel/fifo/fifo_timeout/src/main.c:171: test_thread_pend_and_get_data: packet != NULL is false


    Assertion failed at /home/pswarnak/workspace/1.12_execution/zephyr/tests/kernel/fifo/fifo_timeout/src/main.c:171: test_thread_pend_and_get_data: packet != NULL is false

This failure (with CONFIG_POLL=y) was not seen in earlier commits.


nashif · 2018-06-04T18:01:39Z

most likely a testcase issue?

punitvara · 2018-06-06T10:24:20Z

Looks like a test case ordering issue. After moving test_timeout_threads_pend_on_dual_fifos to the end, the test passes. Doing further debugging to root-cause the issue.

pizi-nordic · 2018-06-07T14:32:58Z

Applying #8249 together with #8259 might solve this problem if it was caused by timer issues.

ramakrishnapallala · 2018-06-07T14:36:27Z

@punitvara can you re-test with this PR #8249

punitvara · 2018-06-08T05:29:49Z

@pizi-nordic You are right to suspect that this could be timer related, because the only platform-dependent function is k_cycle_get_32. However, even after following your suggestion, this test case still fails with both PR #8249 and #8259 applied.

punitvara · 2018-06-11T11:19:45Z

After adding even a single printk, the error is no longer reproducible.

punitvara · 2018-06-12T05:23:41Z

@pizi-nordic Can you please check whether a timer-related issue still persists on nRF? Even adding a printk makes this problem go away.

pizi-nordic · 2018-06-14T09:42:49Z

I am still working on the timing issues. I can look at this problem as soon as all the timing issues I currently see are resolved.

pizi-nordic · 2018-07-05T13:50:21Z

The test is pretty simple:

1. Create N threads.
2. In each thread, call k_fifo_get(<empty-fifo>) with a different timeout, then notify the main thread.
3. Check that the notifications from the threads arrive in the correct order (the order given by the timeout values).

I found 2 different cases when this test fails:

Case 1:

- Thread A is created.
- Thread B is created.
- Thread A starts and calls k_fifo_get() with 20ms timeout (2+1 ticks)
--- TICK ----
- Thread B starts and calls k_fifo_get() with 10 ms timeout (1+1 ticks)
--- TICK ---
--- TICK ---
- Thread A wakes up and pushes notification (correct, slept 3 ticks).
- Thread B wakes up and pushes notification (correct, slept 2 ticks).

The test fails because it expects the notification from thread B before the notification from thread A (the scheduler wakes the threads in a different order than the test expects). However, the system behaviour looks correct: each thread is woken up after its specified timeout.
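
The tick arithmetic in Case 1 can be reproduced with a minimal sketch. It assumes a 10 ms tick (CONFIG_SYS_CLOCK_TICKS_PER_SEC=100) and the "round up, plus one tick" conversion implied by the "(2+1 ticks)" annotations above. The names TICK_MS, timeout_ticks() and expiry_tick() are hypothetical helpers for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 10 ms tick period, i.e. CONFIG_SYS_CLOCK_TICKS_PER_SEC=100. */
#define TICK_MS 10u

/* Ticks a timeout occupies: the duration rounded up to whole ticks, plus one
 * extra tick because the call lands somewhere inside the current tick period.
 * This mirrors the "(2+1 ticks)" notation above; it is not the kernel's code. */
uint32_t timeout_ticks(uint32_t timeout_ms)
{
    return (timeout_ms + TICK_MS - 1u) / TICK_MS + 1u;
}

/* Absolute tick on which a timeout expires, given how many ticks had already
 * elapsed when the thread called k_fifo_get(). */
uint32_t expiry_tick(uint32_t start_tick, uint32_t timeout_ms)
{
    assert(timeout_ms <= UINT32_MAX - TICK_MS); /* keep the rounding overflow-free */
    return start_tick + timeout_ticks(timeout_ms);
}
```

With these assumptions, thread A (20 ms, started before the first tick) and thread B (10 ms, started after the first tick) both expire on tick 3, so their relative wake-up order depends on how the kernel orders same-tick timeouts rather than on the timeout values.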

@nashif: You know more about the Zephyr scheduler: is the thread wakeup order defined if multiple threads time out at the same time? (If yes, we could have a bug in the scheduler; otherwise the test should be updated.)

Case 2:

- Thread A is created.
- Thread B is created.
- Thread A starts and calls k_fifo_get() with 20ms timeout (2+1 ticks)
--- SYSTEM BLOCKED FOR 100-200ms ---
- Thread A wakes up and pushes notification (correct, slept much more than 3 ticks).
- Thread B starts and calls k_fifo_get() with 10 ms timeout (1+1 ticks)
- Thread B wakes up and pushes notification (correct, slept 2 ticks).

In this case, the test is disrupted by a gap in execution lasting dozens of ms. The gap is easily observed when the UART console is used and almost disappears when RTT is used, so I think it is related to printk() handling. I am going to investigate this scenario a bit.

pizi-nordic · 2018-07-05T15:53:15Z

It looks like Case 2 was introduced by my debug infrastructure due to #8763.

ManojSubbarao · 2018-07-09T07:20:55Z

@punitvara is this issue fixed?

punitvara · 2018-07-10T13:39:21Z

@ManojSubbarao No. I am working on ADC consolidation and will look into it whenever I get some time.

ManojSubbarao · 2018-08-28T04:51:38Z

@rgundi Please look into this issue.

rgundi · 2018-09-03T15:26:58Z

Working on this. Looks like this is happening because the nRF boards are pretty slow, which means even a few instructions end up adding milliseconds. Will explore more and update shortly.

nashif · 2018-09-04T17:18:21Z

@rgundi any progress?

rgundi · 2018-09-05T13:26:53Z

This seems to be a weird issue. I verified that everything said by @punitvara and @pizi-nordic is true. I did some more experiments and saw that the issue goes away even with minor modifications.
@pizi-nordic : Regarding your question about the Zephyr scheduler, please refer to the _add_timeout function in timeout_q.h. If a new timeout expires on the same system clock tick as other timeouts already present in the _timeout_q, it is "prepended" to these timeouts. So we should have seen this working properly if we were hitting that case.
@nashif : This is one of those timing issues. It is so weird that the issue is not seen if "boot_delay" is disabled. The issue also goes away if we double the timeout values for the fifo (which sounds logical). I will continue to debug this and keep you posted.

nashif · 2018-09-06T15:56:35Z

I am tempted to declare this as a test related issue, do you agree?

rgundi · 2018-09-06T16:56:01Z

No, not yet. I'll probably need one more day to reach a conclusion. I see that all the threads are properly populated in the timeout queue just before the issue is seen (last pass case). However, it appears that one of the threads is somehow getting bumped off the timeout queue without being serviced. I'll know more tomorrow.

rgundi · 2018-09-09T15:55:20Z

Looks like there's some issue when "delta_ticks_from_prev" is 0 (i.e. when at least 2 timeouts are expiring on the same system clock tick). Further debug in progress.

rgundi · 2018-09-11T19:17:38Z

Finally I am able to understand the behaviour. @pizi-nordic: This is about the zephyr scheduler as you rightly speculated. There are 2 functions at play here.

_add_timeout function in timeout_q.h - If there's a new timeout expiring on the same system clock tick as other timeouts already present in the _timeout_q, it is "prepended" to these timeouts. So, if thread A and thread B are timing out on the same system clock tick and if thread B is already in the _timeout_q, thread A will be "prepended" to the q.
handle_timeouts in sys_clock.c – This function handles timeouts by dequeuing the expired ones from _timeout_q and queuing them on a local queue. In this local queue, the order of queuing in _timeout_q for the threads timing out on the same system clock tick is reversed. Hence, effectively, they end up being processed in the same order they were added, time-wise. This means thread B will be serviced first followed by thread A.

In this particular test case, thread 2 and thread 3 time out on the same system clock tick, but thread 3 gets processed earlier because it was added before thread 2. The order of servicing can be ascertained by putting a breakpoint in the function test_thread_pend_and_timeout just after the k_fifo_get function call and printing the _kernel.current value there on each hit. Before this, the entire timeout_q can be printed for comparison (with delta_ticks_from_prev and thread for each entry). Below are the gdb commands that do that.

(gdb) p ((struct _timeout *)(_kernel.timeout_q.head))->thread
$1 = (struct k_thread *) 0x20000290 <ttdata+624>
(gdb) p ((struct _timeout *)(_kernel.timeout_q.head))->delta_ticks_from_prev
$2 = 1

(gdb) p ((struct _timeout *)(_kernel.timeout_q.head->next))->thread
$3 = (struct k_thread *) 0x20000020
(gdb) p ((struct _timeout *)(_kernel.timeout_q.head->next))->delta_ticks_from_prev
$4 = 0
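
The interplay of the two functions described above can be modelled with a simplified sketch. This is not the kernel code: it stores absolute expiry ticks in a small array instead of delta_ticks_from_prev in a dlist, and the names struct timeout_q, add_timeout() and handle_timeouts() are illustrative stand-ins. It only shows that "prepend on equal tick" followed by "reverse while moving to a local queue" yields servicing in insertion order.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TIMEOUTS 8

struct timeout {
    int thread_id;
    unsigned expiry_tick;   /* absolute tick; the kernel stores deltas instead */
};

struct timeout_q {
    struct timeout items[MAX_TIMEOUTS];
    size_t count;
};

/* Insert in expiry order. A new entry is placed *before* any existing entries
 * that expire on the same tick, modelling the "prepend" behaviour. */
void add_timeout(struct timeout_q *q, int thread_id, unsigned expiry_tick)
{
    size_t pos = 0;

    assert(q->count < MAX_TIMEOUTS);
    while (pos < q->count && q->items[pos].expiry_tick < expiry_tick) {
        pos++;
    }
    for (size_t i = q->count; i > pos; i--) {
        q->items[i] = q->items[i - 1];
    }
    q->items[pos].thread_id = thread_id;
    q->items[pos].expiry_tick = expiry_tick;
    q->count++;
}

/* Remove every entry that has expired at tick `now` from the head of the
 * queue and write the thread ids to `woken` in reversed order, modelling the
 * local queue that reverses the entries. Returns how many were woken. */
size_t handle_timeouts(struct timeout_q *q, unsigned now, int *woken)
{
    size_t expired = 0;

    while (expired < q->count && q->items[expired].expiry_tick <= now) {
        expired++;
    }
    for (size_t i = 0; i < expired; i++) {
        woken[i] = q->items[expired - 1 - i].thread_id;
    }
    for (size_t i = expired; i < q->count; i++) {
        q->items[i - expired] = q->items[i];
    }
    q->count -= expired;
    return expired;
}
```

In this model, if thread 3 is added for tick 3 and thread 2 is then added for the same tick, the queue holds thread 2 ahead of thread 3, but handle_timeouts() reports thread 3 first, which matches the "thread 3 woke up, expected 2" failure in the log.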

rgundi · 2018-09-11T19:21:52Z

@nashif : Since this is a known behavior of the kernel and since there's nothing wrong with this, I propose we classify this as a test case issue. The test case can simply be fixed by doubling the timeouts specified (i.e. 10ms should become 20ms, 20ms should become 40ms and so on). Let me know if you agree with this modification.

andyross · 2018-09-11T19:59:00Z

That sounds right to me. No: in general there is no guarantee of wakeup order when multiple threads are woken up on the same tick. While I think timeout handling is uniformly done with a simple dlist that never reorders, things like wait_q's can be more complicated when iterated over.

So if you have a situation like this where "ordered" timeouts are aliasing into a single tick, you can get this behavior. I'd consider that a test bug -- it should be validating that (at least) the timeout values differ by more than one full tick as defined by CONFIG_SYS_CLOCK_TICKS_PER_SEC. (Even then it's not foolproof if something else loads the system or locks interrupts to cause a tick to be handled late, but the test should be able to guarantee that too)
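
A check of the kind suggested above could look like the following sketch. The function name timeouts_are_tick_separated() is hypothetical, and ticks_per_sec stands for the value of CONFIG_SYS_CLOCK_TICKS_PER_SEC; the sketch only validates the timeout table and does not cover the late-tick caveat mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if every pair of timeouts (in ms, in any order) differs by
 * more than one full tick at the given tick rate.
 * diff_ms > 1000 / ticks_per_sec is evaluated as
 * diff_ms * ticks_per_sec > 1000 to avoid integer-division truncation. */
bool timeouts_are_tick_separated(const uint32_t *timeouts_ms, size_t n,
                                 uint32_t ticks_per_sec)
{
    assert(ticks_per_sec > 0);

    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            uint32_t a = timeouts_ms[i];
            uint32_t b = timeouts_ms[j];
            uint64_t diff_ms = (a > b) ? (uint64_t)(a - b) : (uint64_t)(b - a);

            if (diff_ms * ticks_per_sec <= 1000u) {
                return false; /* these two can alias into the same tick */
            }
        }
    }
    return true;
}
```

For example, the timeouts 0, 10, 20, 30 and 40 ms seen in the log are not separated by more than one tick at 100 ticks per second, while the doubled values 0, 20, 40, 60 and 80 ms are.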

andyross · 2018-09-11T20:00:32Z

And for reference: NRF5x timer handling does seem to produce surprises. There's a similar thread ordering bug (that I haven't tried to dig into yet) reported against the EDF test on NRF5 when combined with CONFIG_BT: #9843

pizi-nordic · 2018-09-12T13:35:07Z

Finally I am able to understand the behaviour. @pizi-nordic: This is about the zephyr scheduler as you rightly speculated. There are 2 functions at play here. (...)

This wasn't speculation :).
Thank you for digging into this problem. Great job!

rgundi · 2018-09-18T06:22:41Z

@andyross : Please review the PR #10047

There is no guarantee of wake-up order when multiple threads are woken up on the same tick. Hence, modified the tests accordingly. Fixes zephyrproject-rtos#8159. Signed-off-by: Rajavardhan Gundi <[email protected]>


nashif added the bug and priority: medium labels Jun 4, 2018

nashif assigned ramakrishnapallala Jun 4, 2018

spoorthik mentioned this issue Jun 6, 2018

Tests: fifo_timeout fails on nrf51_pca10028 #8198

Closed

ramakrishnapallala assigned punitvara Jun 6, 2018

nashif unassigned ramakrishnapallala Jun 7, 2018

nashif added the In progress label Jun 7, 2018

carlescufi added the platform: nRF label Jun 11, 2018

nashif added this to the v1.13.0 milestone Aug 26, 2018

ManojSubbarao assigned rgundi Aug 28, 2018

ManojSubbarao unassigned punitvara Aug 28, 2018

carlescufi changed the title ~~tests/kernel/fifo/fifo_timeout fails on nrf51_pca10028~~ tests/kernel/fifo/fifo_timeout fails on nrf51_pca10028 and nrf52_pca10040 Aug 31, 2018

nashif modified the milestones: v1.13.0, v1.14.0 Sep 11, 2018

carlescufi mentioned this issue Sep 11, 2018

System timer handling with low-frequency timers #9904

Closed

rgundi mentioned this issue Sep 18, 2018

tests/kernel: fifo_timeout: Remove wake-up order checking #10047

Merged

nashif closed this as completed in #10047 Oct 2, 2018

ghost removed the In progress label Oct 2, 2018
