
Timer fixes #12204


Merged: 5 commits merged into master on Jan 3, 2019

Conversation

andyross
Collaborator

Two bugs have been discovered since the timer rewrite: one produced incorrect expiration times for timeouts set from within timeout handlers when multiple timeouts were expiring in the same tick. The other was potential timeout-list corruption if a timeout handler (or, under SMP, simultaneous code racing against it) tries to remove a timeout (for example, by aborting a thread).

This doesn't fix the API confusion about "what is 'current time' to a timeout handler when the interrupt was delayed" detailed in #11722. I'm going to think a little more on that one and see if an API option makes sense, or at least update the docs.

Relevant to #11722, this works for me over 30 iterations of the timer_api test, so I'm reasonably confident the bug is fixed even if I was never able to reproduce it.

@andyross
Collaborator Author

(Apologies for the assignee spam -- I added folks as assignees instead of reviewers, sorry!)

@andyross
Collaborator Author

Oh, and note that the first patch (the list handling one) is also included in the x86_64 series. I cherry-picked it here because it's more relevant to timer work. I'll update whichever PR gets merged last to drop the duplicate; just be aware that if you try to pull both you'll get a collision.

@codecov-io

codecov-io commented Dec 20, 2018

Codecov Report

Merging #12204 into master will increase coverage by 0.02%.
The diff coverage is 93.1%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #12204      +/-   ##
==========================================
+ Coverage    48.2%   48.22%   +0.02%     
==========================================
  Files         294      294              
  Lines       44192    44200       +8     
  Branches    10603    10607       +4     
==========================================
+ Hits        21301    21316      +15     
+ Misses      18624    18616       -8     
- Partials     4267     4268       +1
Impacted Files                            | Coverage Δ
----------------------------------------- | --------------------------
kernel/sched.c                            | 92.83% <ø> (ø) ⬆️
kernel/timeout.c                          | 86.11% <93.1%> (-0.56%) ⬇️
drivers/clock_control/nrf5_power_clock.c  | 50% <0%> (-2.11%) ⬇️
boards/posix/native_posix/timer_model.c   | 51.82% <0%> (+0.72%) ⬆️
boards/posix/nrf52_bsim/time_machine.c    | 52.45% <0%> (+4.91%) ⬆️
boards/posix/nrf52_bsim/argparse.c        | 54.28% <0%> (+12.85%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e8ecd27...4c5d450.

if (next(t) != NULL) {
	next(t)->dticks += t->dticks;
}
if (t->node.next != NULL && t->node.prev != NULL) {
Collaborator

Does this problem exist generally, in that whenever a dnode is removed from a dlist that fact can't be determined from the dnode itself? If so, would it be better to extend the dlist API with something like:

/* mark that node is not in any list */
static inline void sys_dnode_unlink(sys_dnode_t *node)
{
   if (node) {
      node->next = NULL;
   }
}

/* check whether node is in a list */
static bool sys_dnode_is_linked(sys_dnode_t *node)
{
   return node && (node->next != NULL);
}

then implement as:

static void remove_timeout(struct _timeout *t)
{
   if (sys_dnode_is_linked(&t->node)) {
      if (next(t) != NULL) {
         next(t)->dticks += t->dticks;
      }
      sys_dlist_remove(&t->node);
      sys_dnode_unlink(&t->node);
      t->dticks = _INACTIVE;
   }
}

?

It seems this would make the intent more clear, and eliminate the manipulations of prev which I believe should be non-null if and only if next is non-null. It would also make the solution available to any other uses of sys_dlist_remove where the same issue might be relevant.

It might also remove the need to use _INACTIVE as a flag value.

Collaborator Author

I don't disagree, though I think there's an argument that having "unlink" and "remove" as distinct APIs is going to be confusing for readers. The dlist is a very low-level intrusive data structure; honestly, I think it's always going to be a little error-prone.

But I totally agree about the "_INACTIVE" state; it's used very inconsistently already and ought to be removed.

Collaborator

I'd considered clear instead of unlink but that didn't seem as...clear. If there's another wording that's better....

Alternatively, put the node->next = NULL assignment into remove, which would be safer anyway at a very minor overall cost. An ability to reliably check whether a node is already in a list before adding it to a list could be very helpful.
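For illustration, such a check could guard add sites like this (a minimal sketch: sys_dnode_is_linked() is the helper proposed above, sys_dlist_append() is the existing dlist API, and add_timeout_once() is a hypothetical wrapper):

/* Hypothetical wrapper: append the node only if it isn't already on a
 * list, using the sys_dnode_is_linked() check proposed above.
 */
static void add_timeout_once(sys_dlist_t *list, sys_dnode_t *node)
{
	if (!sys_dnode_is_linked(node)) {
		sys_dlist_append(list, node);
	}
}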

Collaborator

I generally support @pabigot's idea; however, I would like to add a small modification:
To be consistent with the way lists are implemented in Zephyr, I suggest following Linux (https://elixir.bootlin.com/linux/latest/ident/list_del_init) and marking a removed node by setting node->next (and node->prev) to the node itself.
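A rough sketch of that convention applied to Zephyr's sys_dnode_t (which exposes next/prev pointers, as used elsewhere in this PR); the helper names are illustrative, not an existing API:

/* Linux list_del_init()-style marking: a removed node points at itself,
 * so "not on any list" can be detected from the node alone.
 */
static inline void sys_dnode_mark_removed(sys_dnode_t *node)
{
	node->next = node;
	node->prev = node;
}

static inline bool sys_dnode_is_on_list(const sys_dnode_t *node)
{
	/* Assumes never-added nodes are zero-initialized. */
	return node->next != NULL && node->next != node;
}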

Collaborator

Researching what @pizi-nordic proposed: In Linux list_del_init removes the entry and initializes the entry as an empty list head. list_del removes the entry and sets the next and prev pointers to poisoned values (dereference causes a page fault that can be diagnosed). list_del is used about twice as often as list_del_init in Linux, but list_del_init is to be used when the removed entry may later be added to another list.

Although the structure of a doubly-linked list is the same in Linux and Zephyr, the pointer management is different, in both the operations supported and the order of assignment for specific operations like add/prepend. Zephyr uses distinct type names for list nodes and list heads, and they aren't initialized the same way, unlike Linux, which only uses list_head. So I'm not convinced that resetting a removed sys_dnode_t to appear like an empty sys_dlist_t should be preferred over initializing a removed node to have null pointers, which is presumably what a Zephyr sys_dnode_t that's never been added to a list would have (if it isn't left uninitialized).

In either case, after reviewing this I do think it's important that both next and prev be assigned to self or a poison/null value when the entry is removed, not just next as I originally proposed.

Collaborator Author

Can we submit this as an enhancement against the dlist code? It's not really timer-related, and obviously has code-size impact to consider: there are a lot of dlist users, and while many could probably benefit from being ported to this scheme, a naive change would do nothing but add a bunch of code bytes.

Collaborator

@andyross Do you mean you're OK with dropping this patch from this PR, or that you'd rather merge this change as-is rather than bike-shed it here?

Either is fine with me. I will file a PR to fix dlist, because as you've found the existing code is problematic and it should be fixed more generally. IMO there's too much "hack it now, we'll fix it properly later" in Zephyr, and later never comes.

Collaborator

If the problem is fixed globally, about 42 tests fail sanitycheck. That's after working around a bug in poll which double-removes events from a list (segfaults in many native_posix tests). Looks like several parts of Zephyr depend on the pointers to neighboring nodes remaining dereferenceable after removal from a list.

Probably best to merge your fix for now.

@pizi-nordic
Collaborator

Unfortunately the problem still exists on top of #12204.
The test failed 2 times during my test run (160 iterations).

===================================================================
starting test - test_timer_periodicity

    Assertion failed at /home/pizi/repos/zephyr/tests/kernel/timer/timer_api/src/main.c:240: test_timer_periodicity: (((delta) >= (50)) && ((delta) <= (50) + (1))) is false

FAIL - test_timer_periodicity

@pizi-nordic
Collaborator

#11722 still fails (#11722 (comment)) with this patch applied.

Collaborator

@pizi-nordic left a comment

The problems described in #11722 are still not resolved.


announce_remaining = ticks;

while (first() != NULL && first()->dticks <= announce_remaining) {
Collaborator

This still allows for immediate firing of timeouts scheduled by t->fn(t), breaking monotonicity of the timer.
(Please consider the case where announce_remaining is larger than the scheduled time.)

Collaborator Author

For like the ninth time, this is a feature, not a bug. It's not the source of the problem you're seeing, I promise.

I opened #12247 to try to explain this yet again. Please take the discussion about API changes there.

(Also: there's no violation of monotonicity -- timers always expire in the expected order. DELTA times between timeouts will and MUST be skewed in the face of lost interrupts; the argument is about which delta gets skewed.)

@andyross
Collaborator Author

Is that sub-2% frequency consistent with the original report? Are we sure we aren't talking about a different bug here? The issue fixed would have happened any time you had a double-event stacked in the same tick and more than one tick was being announced. That shouldn't have been stochastic in this way. I was originally led to believe this was a consistent problem.

@pizi-nordic, I can't tell if you're trying to -1 this patch because there's another unresolved bug or because you believe this shouldn't be merged. Can you clarify? These are real bugs and the fixes need to go in. Which patches specifically do you want to keep out of the tree?

Collaborator

@pabigot left a comment

I'm taking the dlist issue off the table; will have a PR for that. I have no dog in the desired-behavior fight. I do question whether this PR actually changes anything other than the double-remove issue. Please clarify so I can remove the nack.

Andy Ross added 4 commits January 2, 2019 11:37
This loop was structured badly, as a while(true) with multiple "exit
if" cases in the body.  It was bad enough that I genuinely fooled
myself into rewriting it, having convinced myself there was a bug in
it when there wasn't.

So keep the rewritten loop which expresses the iteration in a more
invariant way (i.e. "while we have an element to expire" and not "test
if we have to exit the loop").  Shorter and easier.  Also makes the
locking clearer as we can simply release the lock around the callback
in a natural/obvious way.

Signed-off-by: Andy Ross <[email protected]>
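For reference, a rough sketch of the loop shape this commit describes -- not the merged diff itself. first(), remove_timeout(), timeout_lock, curr_tick, and announce_remaining are taken from the snippets quoted in this PR or assumed here:

void z_clock_announce_sketch(s32_t ticks)
{
	k_spinlock_key_t key = k_spin_lock(&timeout_lock);

	announce_remaining = ticks;

	/* "While we have an element to expire" -- the invariant form. */
	while (first() != NULL && first()->dticks <= announce_remaining) {
		struct _timeout *t = first();
		int dt = t->dticks;

		curr_tick += dt;
		announce_remaining -= dt;
		t->dticks = 0;
		remove_timeout(t);

		/* Release the lock only around the user callback. */
		k_spin_unlock(&timeout_lock, key);
		t->fn(t);
		key = k_spin_lock(&timeout_lock);
	}

	/* ... account for leftover ticks, reprogram the driver ... */

	k_spin_unlock(&timeout_lock, key);
}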
The use of dticks == INACTIVE to tell whether or not a timeout was
already in the list was insufficient.  There is a time period between
the moment a timeout is removed from the list and the end of its
handler where it is not in the list, yet its list node pointers still
point into it.  Doing things like aborting a thread while that is true
(which can be asynchronous too!)  would corrupt the list even though
all the operations on it were "atomic".

Set the timeout node pointers to nulls atomically when removed, and
check for double-remove conditions (which, again, might be perfectly
OK).

Signed-off-by: Andy Ross <[email protected]>
The recent change that added a locked z_set_timeout_expiry() API
obsoleted the subtle note about synchronization above
reset_time_slice().  None of that matters any more, the API is
synchronized internally in a conventional way.

Signed-off-by: Andy Ross <[email protected]>
The logic in z_set_timeout_expiry() missed the case where the ticks
argument could be zero (or lower), which can happen naturally due to
timing/interrupt slop.  In those circumstances, it would still try to
reset a timer that was "about to expire at the next tick", which would
run afoul of the drivers' internal decisions about how soon a timer
interrupt could be set, and then get pushed out to the next tick.

Explicitly detect this as an "imminent" predicate to make the logic
clearer.

Signed-off-by: Andy Ross <[email protected]>
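Roughly, the check described above takes this shape (an illustration only, not the merged diff; next_timeout() and z_clock_set_timeout() are assumed from kernel/timeout.c and the timer driver API):

void z_set_timeout_expiry_sketch(s32_t ticks, bool idle)
{
	k_spinlock_key_t key = k_spin_lock(&timeout_lock);
	int next = next_timeout();

	/* If the next timeout is already due at the next tick, or the
	 * request itself is for <= 0 ticks, expiry is "imminent": the
	 * driver can't fire any sooner, so don't reprogram it.
	 */
	bool imminent = (next <= 1) || (ticks <= 0);

	if (!imminent && ticks < next) {
		z_clock_set_timeout(ticks, idle);
	}

	k_spin_unlock(&timeout_lock, key);
}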
@andyross
Collaborator Author

andyross commented Jan 2, 2019

New version contains a fix for an edge case that certainly could produce the behavior in #11722 (a delayed tick that would result in late callbacks in the same way a delayed interrupt would). Also a comment cleanup, and I rewrote the commit message for the z_clock_announce() loop since the original bug I saw was apparently a phantom (the new loop structure is still better though).

I'm still working on reproducing the bug locally though.

@pabigot
Collaborator

pabigot commented Jan 2, 2019

This commit at the top of pabigot/zephyr:pr/12204 is what I use for reproducibility. Usually fails within 10 repetitions, always on the first iteration. A large EXPIRE_TIMES appears to be critical to quick reproduction.

The updated PR still fails:

===================================================================
starting test - test_timer_periodicity
rep 3 iter 0 failed at 17240 delta 40 exp 50

    Assertion failed at ../src/main.c:246: test_timer_periodicity: false is false

FAIL - test_timer_periodicity

The test_timer_periodicity test is racy and subject to initial state
bugs.  The operation of that test is to:

1. Start a timer with a known period
2. Take the current time with k_uptime_get()
3. Wait for the timer to fire with k_timer_status_sync()
4. Check that the current time minus start time is the period

But that's wrong, because a tick expiring between any of the first
three steps is going to skew the math (i.e. the timer will have
started on a different tick than the "start time").

And taking an interrupt lock around the process can't fix the issue,
because in the tickless world we live in, k_uptime_get() is actually a
realtime quantity based on a hardware counter and doesn't rely on
interrupt delivery.

Instead, use another timer object to synchronize the test start to a
driver tick, ensuring that even if the race is unfixable the initial
conditions are always correct.

Signed-off-by: Andy Ross <[email protected]>
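The sync-to-tick idea from this commit, sketched with the public k_timer API (the tick_sync() helper and the 1 ms period are illustrative):

static struct k_timer sync_timer;

/* Align the test start with a tick boundary so the k_uptime_get()
 * sample and the k_timer start land on the same tick.
 */
static void tick_sync(void)
{
	k_timer_init(&sync_timer, NULL, NULL);
	k_timer_start(&sync_timer, 0, 1);
	k_timer_status_sync(&sync_timer);
	k_timer_stop(&sync_timer);
}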
@andyross
Collaborator Author

andyross commented Jan 3, 2019

Got it. After all that, it was a test bug. And it happens everywhere, not just on nRF. We just didn't notice it because the frequency of failure was on the same order as what we already see with qemu timer tests, and sanitycheck will retry (one of @andrewboie's peeves -- yeah, we really need to get -icount working).

Basically, the test had a nonatomic initialization process where it would check current time and start a k_timer, and there was no protection against the case where a tick happened to expire between the two, causing the two times to be off by one tick. Fixed by adding a sync-to-tick process at the start. (This is also a very tidy explanation for why the test only ever failed on the first run -- after the end, something has waited on a timeout and execution is synchronized.)

FWIW: I eventually got it reproduced by putting the test into a while(1) loop with a randomized delay to start at random points in the tick. With this series applied I'm seeing it at 4500 successful iterations and counting.
