dlist API enhancements supporting timer cleanup and scheduling fix #12485

pabigot · 2019-01-15T12:48:45Z

This is a roll-up of what was in #12248 and a solution to #12332. At a high level it enables detection of linked state for dnodes, uses that to replace a couple workarounds, extends the dlist API to support splitting and joining lists, then uses that to decouple pending timeouts from new timeouts to avoid an undesirable feature/bug when a timeout callback schedules a new timeout.

There is a large @todo comment in kernel/timeout that needs some eyes on it.

I've also kept the test case for #12332 separate from (after) the solution to #12332, primarily so it can be cherry-picked into existing branches to confirm that the behavior was properly reproduced before the fix. The commits can be merged if that would be preferred.

kernel: thread: correct type and member name of dlist node

Although sys_dnode_t and sys_dlist_t are aliases, their roles are
different and they appear in different positions in dlist API calls.
Change the name of the thread structure field used to link threads into
run queues and wait queues to reflect that it acts as a list node, not a
list head.

kernel: sched: fix empty list detection

CONTAINER_OF() on a NULL pointer returns some offset around NULL and not
another NULL pointer. We have to check for that ourselves.

This only worked because the dnode happened to be at the start of the
struct.

sys: dlist: Add sys_dnode_is_linked

The original implementation allows a list to be corrupted by list
operations on the removed node. Existing code attempts to avoid this by
using external state to determine whether a node is in a list, but this
is fragile and fails when the state that holds the flag value is changed
after the node is removed, e.g. in preparation for re-using the node.

Follow Linux in invalidating the link pointers in a removed node. Add
API so that detection of particpation in a list is available at the node
abstraction.

This solution relies on the following steady-state invariants:

A node (as opposed to a list) will never be adjacent to itself in a
list;
The next and prev pointers of a node are always either both null or
both non-null.

kernel: timeout: remove local fix for double-remove

Use the new generic capability to detect unlinked sys_dnode_t instances.

kernel: poll: fix double-remove of node

k_poll events are registered in a linked list when their signal
condition has been met. The code to clear event registration did not
account for events that were not registered, resulting in double-removes
that produced core dumps on native-posix sanitycheck.

kernel: timeout: detect inactive timeouts using dnode linked state

Whether a timeout is linked into the timeout queue can be determined
from the corresponding sys_dnode_t linked state. This removes the need
to use a special flag value in dticks to determine that the timeout is
inactive.

Update _abort_timeout to return an error code, rather than the flag
value, when the timeout to be aborted was not active.

Remove the _INACTIVE flag value, and replace its external uses with an
internal API function that checks whether a timeout is inactive.

sys: dlist: add API to split lists

Add a constant-time operation that splits a list into a portion before a
node and the portion starting at the node, moving the portion before the
node to the end of a different list and leaving the original list
starting at the node.

Add a constant-time list join operation to append an existing list to a
different list. This is a utility function supporting list splitting.

This supports use cases where a dlist is split at some point
(e.g. timers that have reached their deadline), so that the extracted
elements can be processed without affecting content added to the
dlist during processing.

kernel: timeout: avoid premature adjustment of timeout

The existing loop over timeouts in z_clock_announce adjusts the delay of
the first timeout that has not been expired by the number of ticks
remaining. This adjustment is (by some interpretations) incorrect when
the timeout was added while processing a previous timeout and the
adjustment is not zero, which can happen if multiple ticks are announced
in one call.

Resolve this by changing how the tick clock is managed:

curr_ticks/k_uptime_get() is adjusted as soon as the
z_clock_announce() is invoked. This means that timer callbacks are
able to observe the best available estimate of the current time,
rather than the time at which each callback was supposed to be
invoked. The new policy also ensures that k_uptime_get() is
consistent with k_cycle_get().
Prior to invoking any callbacks the sequence of expired timeouts is
removed from the timeout queue, ensuring new timeouts are scheduled
without the risk of inadvertently reducing their delay by unprocessed
ticks.
Prior to a callback the timeout dticks field is updated to be relative
to the current time, i.e. a non-positive value corresponding to how
late the timeout was invoked.
The tick offset used when scheduling a recurring timeout is adjusted
by the lateness of the invocation, ensuring that the expected callback
times are at a fixed interval even if individual callbacks are late.
The dticks field is zeroed when a timer is initialized, aborted,
started, or stopped, ensuring that non-recurring timeouts respect the
offset passed as the initial duration.

The new policy also enables a future enhancement where each k_timer can
be configured to select between policies where repeating timeouts should
be relative to the scheduled time of the last callback (historical
behavior) or the actual time of the last callback (useful for certain
cases).

Closes #12332.

tests: kernel: timer: test delay for schedule in timer callback

This verifies that when a second timer is scheduled within a delayed
timer callback the lateness of the callback (as reflected by unprocessed
announced ticks) does not impact the timeout of the newly scheduled
timer.

Relates to issue #12332.

Although sys_dnode_t and sys_dlist_t are aliases, their roles are different and they appear in different positions in dlist API calls. Change the name of the thread structure field used to link threads into run queues and wait queues to reflect that it acts as a list node, not a list head. Signed-off-by: Peter A. Bigot <[email protected]>

CONTAINER_OF() on a NULL pointer returns some offset around NULL and not another NULL pointer. We have to check for that ourselves. This only worked because the dnode happened to be at the start of the struct. Signed-off-by: Peter A. Bigot <[email protected]>

The original implementation allows a list to be corrupted by list operations on the removed node. Existing code attempts to avoid this by using external state to determine whether a node is in a list, but this is fragile and fails when the state that holds the flag value is changed after the node is removed, e.g. in preparation for re-using the node. Follow Linux in invalidating the link pointers in a removed node. Add API so that detection of particpation in a list is available at the node abstraction. This solution relies on the following steady-state invariants: * A node (as opposed to a list) will never be adjacent to itself in a list; * The next and prev pointers of a node are always either both null or both non-null. Signed-off-by: Peter A. Bigot <[email protected]>

Use the new generic capability to detect unlinked sys_dnode_t instances. Signed-off-by: Peter A. Bigot <[email protected]>

k_poll events are registered in a linked list when their signal condition has been met. The code to clear event registration did not account for events that were not registered, resulting in double-removes that produced core dumps on native-posix sanitycheck. Signed-off-by: Peter A. Bigot <[email protected]>

Whether a timeout is linked into the timeout queue can be determined from the corresponding sys_dnode_t linked state. This removes the need to use a special flag value in dticks to determine that the timeout is inactive. Update _abort_timeout to return an error code, rather than the flag value, when the timeout to be aborted was not active. Remove the _INACTIVE flag value, and replace its external uses with an internal API function that checks whether a timeout is inactive. Signed-off-by: Peter A. Bigot <[email protected]>

zephyrbot · 2019-01-15T12:50:21Z

Found the following issues, please fix and resubmit:

License/Copyright issues

In most cases you do not need to do anything here, especially if the files reported below are going into ext/ and if license was approved for inclusion into ext/ already. Fix any missing license/copyright issues. The license exception if a JFYI for the maintainers and can be overriden when merging the pull request.

tests/kernel/timer/timer_schedule/README missing license.
tests/kernel/timer/timer_schedule/README missing copyright.

pabigot

Issues to be addressed and pointers to open questions.

kernel/sched.c

pabigot · 2019-01-15T12:55:47Z

kernel/include/timeout_q.h

 }

 void _add_timeout(struct _timeout *to, _timeout_func_t fn, s32_t ticks);

 int _abort_timeout(struct _timeout *to);

+static inline bool _is_inactive_timeout(struct _timeout *t)


@andyross The sense of this can be reversed to _is_active_timeout() if #12264 is resolved. I might be able to include a fix for that if you still want this change made.

pabigot · 2019-01-15T12:58:34Z

kernel/timeout.c

+	 * they missed more than an interval.  But without it
+	 * kernel/timer/timer_api:test_timer_periodicity fails.  Is
+	 * the test making incorrect assumptions about how to trick
+	 * the system? */


This is the point where I need insight. Negative delays are important for maintaining periodicity when ticks are not announced in a timely manner.

@pabigot: Question is what is reference point of _add_timeout()? As far as I know, the reference is "now()" in real time. So scheduling timeouts in past is not possible. On the other hand we should be able to add timeouts which should fire "now()".

The incoming argument ticks is signed, relative to now = k_uptime_get(). It's negative if we've already missed the deadline. We can't unilaterally discard the only information we have on the alignment of the recurring deadline, so negative delays must be recorded and preserved so subsequent reschedules maintain the correct offsets. In practice the timeout will be near the head of the queue (before anything that isn't late) so it will be invoked in the next tick interrupt. That's the earliest possible unless the z_clock_announce is reworked to loop as long as there's a late timeout at the head of the list.

Ok. Now i see it. We have to handle missed periodic timeouts which were missed more than the period.

codecov-io · 2019-01-15T13:46:00Z

Codecov Report

Merging #12485 into master will increase coverage by 0.07%.
The diff coverage is 95.74%.

@@            Coverage Diff             @@
##           master   #12485      +/-   ##
==========================================
+ Coverage   49.72%    49.8%   +0.07%     
==========================================
  Files         292      292              
  Lines       42425    42471      +46     
  Branches     9968     9979      +11     
==========================================
+ Hits        21096    21151      +55     
+ Misses      17178    17170       -8     
+ Partials     4151     4150       -1

Impacted Files	Coverage Δ
include/kernel.h	`94.44% <ø> (ø)`	⬆️
kernel/include/wait_q.h	`100% <ø> (ø)`	⬆️
kernel/include/ksched.h	`94.25% <100%> (ø)`	⬆️
kernel/pipes.c	`96.62% <100%> (ø)`	⬆️
include/misc/dlist.h	`92.68% <100%> (+3.39%)`	⬆️
kernel/include/timeout_q.h	`100% <100%> (ø)`	⬆️
kernel/poll.c	`81.87% <100%> (+0.34%)`	⬆️
kernel/timer.c	`94.66% <66.66%> (+0.07%)`	⬆️
kernel/sched.c	`92.69% <89.47%> (-0.15%)`	⬇️
kernel/timeout.c	`89.81% <97.05%> (+3.7%)`	⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f3f8f96...2e0d00b. Read the comment docs.

pizi-nordic

This should also fix the #9843 (assuming that the tick<->ms conversion errors are taken under account in the mentioned test).

include/misc/dlist.h

pizi-nordic · 2019-01-15T14:03:49Z

kernel/timeout.c

+	 * they missed more than an interval.  But without it
+	 * kernel/timer/timer_api:test_timer_periodicity fails.  Is
+	 * the test making incorrect assumptions about how to trick
+	 * the system? */


@pabigot: Question is what is reference point of _add_timeout()? As far as I know, the reference is "now()" in real time. So scheduling timeouts in past is not possible. On the other hand we should be able to add timeouts which should fire "now()".

kernel/timeout.c

pizi-nordic · 2019-01-15T14:24:07Z

kernel/timeout.c

+		node = sys_dlist_peek_head(&ready);
+		if (node) {
+			k_spin_unlock(&timeout_lock, key);
+			while (node) {


do { } while (node = sys_dlist_peek_head(&ready))
In my opinion such code will make concept easier to understand.

kernel/timeout.c

pizi-nordic · 2019-01-15T14:26:46Z

kernel/timeout.c

-		first()->dticks -= announce_remaining;
-	}
+		node = sys_dlist_peek_head(&ready);
+		if (node) {


Could we replace this long if by early exit if (!node)?

pizi-nordic · 2019-01-15T14:33:43Z

tests/kernel/timer/timer_schedule/src/main.c

+	k_timer_start(&timer1, K_MSEC(T1_PERIOD), K_MSEC(T1_PERIOD));
+
+	while (state.run < 6) {
+		static s32_t t2_lower_tick = T2_TIMEOUT_TICK - 1;


I assume that "- 1" here is a way to make this test passing on nRF5x and avoid rounding-down problem in tick to ms conversion. If so, please add comment, that this "- 1" have to be removed when the rounding issue will be fixed.

It'd never be removed. I'm not convinced the rounding issue will be fixed (it's not isolated to Nordic, and may fall under "not a bug" or "too late, that's the way it's always worked"), nor that if it is that the more strict lower bound will be satisfied, nor that if it is anybody will see the comment to remove it.

I think that this "- 1" relaxes the test. We should never expire prematurely, and here we tolerate such situation. If we keep this as is, it will be fuel for "not a bug" or "too late, that's the way it's always worked" discussion.

To keep this test failing due to rounding, you can revert the #9745 (which changed round-up back to round-down instead of fixing faulty tests) and include such change in this PR.

Please tell me what issue number to put in the comment. When the bound is tightened the test fails regularly on qemu_xtensa, qemu_x86_64, qemu_x86, and mps_an385 so the bug (if it is one) is pretty widespread.

Add a constant-time operation that splits a list into a portion before a node and the portion starting at the node, moving the portion before the node to the end of a different list and leaving the original list starting at the node. Add a constant-time list join operation to append an existing list to a different list. This supports use cases where a dlist is split at some point (e.g. timers that have reached their deadline), so that the extracted elements can be processed without affecting content added to the dlist during processing. Signed-off-by: Peter A. Bigot <[email protected]>

The existing loop over timeouts in z_clock_announce adjusts the delay of the first timeout that has not been expired by the number of ticks remaining. This adjustment is (by some interpretations) incorrect when the timeout was added while processing a previous timeout and the adjustment is not zero, which can happen if multiple ticks are announced in one call. Resolve this by changing how the tick clock is managed: * curr_ticks/k_uptime_get() is adjusted as soon as the z_clock_announce() is invoked. This means that timer callbacks are able to observe the best available estimate of the current time, rather than the time at which each callback was supposed to be invoked. The new policy also ensures that k_uptime_get() is consistent with k_cycle_get(). * Prior to invoking any callbacks the sequence of expired timeouts is removed from the timeout queue, ensuring new timeouts are scheduled without the risk of inadvertently reducing their delay by unprocessed ticks. * Prior to a callback the timeout dticks field is updated to be relative to the current time, i.e. a non-positive value corresponding to how late the timeout was invoked. * The tick offset used when scheduling a recurring timeout is adjusted by the lateness of the invocation, ensuring that the expected callback times are at a fixed interval even if individual callbacks are late. * The dticks field is zeroed when a timer is initialized, aborted, started, or stopped, ensuring that non-recurring timeouts respect the offset passed as the initial duration. The new policy also enables a future enhancement where each k_timer can be configured to select between policies where repeating timeouts should be relative to the scheduled time of the last callback (historical behavior) or the actual time of the last callback (useful for certain cases). Closes zephyrproject-rtos#12332. Signed-off-by: Peter A. Bigot <[email protected]>

This verifies that when a second timer is scheduled within a delayed timer callback the lateness of the callback (as reflected by unprocessed announced ticks) does not impact the timeout of the newly scheduled timer. Relates to issue zephyrproject-rtos#12332. Signed-off-by: Peter A. Bigot <[email protected]>

pabigot · 2019-01-15T16:29:23Z

@pizi-nordic following changes made:

Change parameter order to sys_dlist_split to put the destination list first, more consistent with sys_dlist_join and mem* routines.
Refactor z_clock_announce to reduce complexity of control flow.
Rename sys_dlist_catenate to sys_dlist_join.

I think that's everything but the too-loose lower bound in the test, which I can document as a workaround if there's an open issue that appears to document the behavior.

andyross

This isn't what I thought we'd agreed on. I was honestly expecting a 10-ish line change that would allow k_timer to specify a different flag or something to _add_timeout() for its first argument.

I need to spend some time staring at this. Can you remove the timer patch and resubmit? We can merge everything else, but this one needs more argument.

andyross · 2019-01-15T17:06:16Z

include/kernel.h

@@ -389,7 +389,7 @@ struct _thread_base {

 	/* this thread's entry in a ready/wait queue */
 	union {
-		sys_dlist_t qnode_dlist;
+		sys_dnode_t qnode_dnode;


Minor point, but if you're going to correct naming, it should remain "qnode_dlist" or become "qnode_list" or something. It's the node field that goes into a list, to distinguish it from the equivalent field that goes into an rbtree. There's little value in trying to encode the type of an object in its field name.

You can bring this up when I re-open #12248 and it shows up there, but basically---no, it's not encoding the type in the name, it's encoding the value's role.

The whole problem is the original name results in passing something named qnode_dlist in a position where the API requires a sys_dnode_t instance. It looks weird and is confusing. The field should not claim to be a list when it's a node. The list is somewhere else.

no, it's not encoding the type in the name, it's encoding the value's role.

Peter, please don't start this fight. Variable names don't uniformly encode what a value "is" and never have, that's what compiler typesafety is for (which fell down here because of a design glitch with dlist -- those are typedefs and not separate structs, but that's a digression). Names tell you what it does.

The type was wrong, that's a bug that you fixed. The name was fine, and correctly indicates the intent of the author whether or not you personally agree with the convention in use. It's not productive (or even feasible) to run around giant codebases "fixing" stuff like other people's variable naming, no one ever agrees with that kind of thing and it just makes a mess. Some fights just aren't worth having.

andyross · 2019-01-15T17:12:27Z

kernel/include/timeout_q.h

@@ -23,6 +23,7 @@ extern "C" {
 static inline void _init_timeout(struct _timeout *t, _timeout_func_t fn)
 {
 	sys_dnode_init(&t->node);
+	t->dticks = 0;


This... is not the scheme I thought we'd agreed to. I thought this was simply going to change the convention

We've had multiple bugs with this kind of separate list handling in the past, because the timer objects may be modified while in that list, and it's not possible to keep execution locked while the callbacks execute. How does this work if another context aborts a thread while it's queued in the "to be executed" list? What happens if the callback tries to reschedule itself?

It is what I said I was going to do: a solution based on the branch I'd referenced in some issue attached to the problem we were discussing. Both abort and self-reschedule should work just fine.

It looks like #12499 causes the test I created to pass, so we'll probably go with that. I'll end up re-opening #12248 though because if you're not going to take the change to managing the timeout list everything else that got added lacks motivation.

Please don't dump the whole thing. The cleaned up list removal workaround has obvious value, as does the test case.

I can't add the test case as it will fail until your solution is merged. The rest is back in #12248 where it started out.

andyross · 2019-01-15T17:49:50Z

See #12499 for an implementation of the somewhat simpler solution I thought we were looking at.

pabigot · 2019-01-15T18:58:07Z

Closing this as #12499 provides a simpler workaround for a specific problem. The additional API for dlist and support for negative relative timeout deadlines will have to be argued in another context.

@andyross if you want to cherry-pick the commit from this that provides a test case into your PR that might be worthwhile.

pizi-nordic · 2019-02-05T12:28:17Z

Ehhh.... 3 weeks later the issues discussed here are still not fixed. Competitive PRs stuck in review. In my opinion, this one provides the best solution we could have without big timers rework. Could we get this merged (especially 2d971c7)?

pizi-nordic · 2019-02-05T12:28:35Z

@pabigot: Could you please rebase?

pabigot · 2019-02-20T15:09:37Z

Per discussion two weeks ago: this PR will not be merged, but whenever #12499 is finished and merged one or more commits from it may need to be reworked and submitted.

pabigot added 6 commits January 15, 2019 06:10

kernel: timeout: remove local fix for double-remove

681da23

Use the new generic capability to detect unlinked sys_dnode_t instances. Signed-off-by: Peter A. Bigot <[email protected]>

pabigot added the area: Timer Timer label Jan 15, 2019

pabigot requested review from andyross, andrewboie, carlescufi and pizi-nordic January 15, 2019 12:48

pabigot requested review from nniranjhana and spoorthik as code owners January 15, 2019 12:48

pabigot commented Jan 15, 2019

View reviewed changes

pabigot force-pushed the wip/20190106a branch 2 times, most recently from 5b0ae2f to dc44f13 Compare January 15, 2019 13:09

pizi-nordic reviewed Jan 15, 2019

View reviewed changes

pabigot added 3 commits January 15, 2019 10:07

pabigot force-pushed the wip/20190106a branch from dc44f13 to 2e0d00b Compare January 15, 2019 16:26

andyross requested changes Jan 15, 2019

View reviewed changes

andyross mentioned this pull request Jan 15, 2019

kernel/timer: Make k_timer "duration" realtime but "period" regular #12499

Closed

pabigot closed this Jan 15, 2019

pabigot mentioned this pull request Jan 15, 2019

extend dlist to detect unlinked nodes and avoid double-remove bugs #12248

Merged

pizi-nordic reopened this Feb 5, 2019

pabigot closed this Feb 20, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dlist API enhancements supporting timer cleanup and scheduling fix #12485

dlist API enhancements supporting timer cleanup and scheduling fix #12485

pabigot commented Jan 15, 2019

zephyrbot commented Jan 15, 2019 •

edited

Loading

pabigot left a comment

pabigot Jan 15, 2019

pabigot Jan 15, 2019

pizi-nordic Jan 15, 2019

pabigot Jan 15, 2019

pizi-nordic Jan 15, 2019

codecov-io commented Jan 15, 2019 •

edited

Loading

pizi-nordic left a comment

pizi-nordic Jan 15, 2019

pizi-nordic Jan 15, 2019

pizi-nordic Jan 15, 2019

pizi-nordic Jan 15, 2019

pabigot Jan 15, 2019

pizi-nordic Jan 15, 2019

pabigot Jan 15, 2019

pizi-nordic Jan 16, 2019

pabigot commented Jan 15, 2019

andyross left a comment

andyross Jan 15, 2019

pabigot Jan 15, 2019

andyross Jan 15, 2019

andyross Jan 15, 2019

pabigot Jan 15, 2019

andyross Jan 15, 2019

pabigot Jan 15, 2019

andyross commented Jan 15, 2019

pabigot commented Jan 15, 2019

pizi-nordic commented Feb 5, 2019 •

edited

Loading

pizi-nordic commented Feb 5, 2019

pabigot commented Feb 20, 2019

dlist API enhancements supporting timer cleanup and scheduling fix #12485

dlist API enhancements supporting timer cleanup and scheduling fix #12485

Conversation

pabigot commented Jan 15, 2019

kernel: thread: correct type and member name of dlist node

kernel: sched: fix empty list detection

sys: dlist: Add sys_dnode_is_linked

kernel: timeout: remove local fix for double-remove

kernel: poll: fix double-remove of node

kernel: timeout: detect inactive timeouts using dnode linked state

sys: dlist: add API to split lists

kernel: timeout: avoid premature adjustment of timeout

tests: kernel: timer: test delay for schedule in timer callback

zephyrbot commented Jan 15, 2019 • edited Loading

License/Copyright issues

pabigot left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-io commented Jan 15, 2019 • edited Loading

Codecov Report

pizi-nordic left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pabigot commented Jan 15, 2019

andyross left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andyross commented Jan 15, 2019

pabigot commented Jan 15, 2019

pizi-nordic commented Feb 5, 2019 • edited Loading

pizi-nordic commented Feb 5, 2019

pabigot commented Feb 20, 2019

zephyrbot commented Jan 15, 2019 •

edited

Loading

codecov-io commented Jan 15, 2019 •

edited

Loading

pizi-nordic commented Feb 5, 2019 •

edited

Loading