Skip to content

dlist API enhancements supporting timer cleanup and scheduling fix #12485

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

pabigot
Copy link
Collaborator

@pabigot pabigot commented Jan 15, 2019

This is a roll-up of what was in #12248 and a solution to #12332. At a high level it enables detection of linked state for dnodes, uses that to replace a couple workarounds, extends the dlist API to support splitting and joining lists, then uses that to decouple pending timeouts from new timeouts to avoid an undesirable feature/bug when a timeout callback schedules a new timeout.

There is a large @todo comment in kernel/timeout that needs some eyes on it.

I've also kept the test case for #12332 separate from (after) the solution to #12332, primarily so it can be cherry-picked into existing branches to confirm that the behavior was properly reproduced before the fix. The commits can be merged if that would be preferred.

kernel: thread: correct type and member name of dlist node

Although sys_dnode_t and sys_dlist_t are aliases, their roles are
different and they appear in different positions in dlist API calls.
Change the name of the thread structure field used to link threads into
run queues and wait queues to reflect that it acts as a list node, not a
list head.

kernel: sched: fix empty list detection

CONTAINER_OF() on a NULL pointer returns some offset around NULL and not
another NULL pointer. We have to check for that ourselves.

This only worked because the dnode happened to be at the start of the
struct.

sys: dlist: Add sys_dnode_is_linked

The original implementation allows a list to be corrupted by list
operations on the removed node. Existing code attempts to avoid this by
using external state to determine whether a node is in a list, but this
is fragile and fails when the state that holds the flag value is changed
after the node is removed, e.g. in preparation for re-using the node.

Follow Linux in invalidating the link pointers in a removed node. Add
API so that detection of particpation in a list is available at the node
abstraction.

This solution relies on the following steady-state invariants:

  • A node (as opposed to a list) will never be adjacent to itself in a
    list;
  • The next and prev pointers of a node are always either both null or
    both non-null.

kernel: timeout: remove local fix for double-remove

Use the new generic capability to detect unlinked sys_dnode_t instances.

kernel: poll: fix double-remove of node

k_poll events are registered in a linked list when their signal
condition has been met. The code to clear event registration did not
account for events that were not registered, resulting in double-removes
that produced core dumps on native-posix sanitycheck.

kernel: timeout: detect inactive timeouts using dnode linked state

Whether a timeout is linked into the timeout queue can be determined
from the corresponding sys_dnode_t linked state. This removes the need
to use a special flag value in dticks to determine that the timeout is
inactive.

Update _abort_timeout to return an error code, rather than the flag
value, when the timeout to be aborted was not active.

Remove the _INACTIVE flag value, and replace its external uses with an
internal API function that checks whether a timeout is inactive.

sys: dlist: add API to split lists

Add a constant-time operation that splits a list into a portion before a
node and the portion starting at the node, moving the portion before the
node to the end of a different list and leaving the original list
starting at the node.

Add a constant-time list join operation to append an existing list to a
different list. This is a utility function supporting list splitting.

This supports use cases where a dlist is split at some point
(e.g. timers that have reached their deadline), so that the extracted
elements can be processed without affecting content added to the
dlist during processing.

kernel: timeout: avoid premature adjustment of timeout

The existing loop over timeouts in z_clock_announce adjusts the delay of
the first timeout that has not been expired by the number of ticks
remaining. This adjustment is (by some interpretations) incorrect when
the timeout was added while processing a previous timeout and the
adjustment is not zero, which can happen if multiple ticks are announced
in one call.

Resolve this by changing how the tick clock is managed:

  • curr_ticks/k_uptime_get() is adjusted as soon as the
    z_clock_announce() is invoked. This means that timer callbacks are
    able to observe the best available estimate of the current time,
    rather than the time at which each callback was supposed to be
    invoked. The new policy also ensures that k_uptime_get() is
    consistent with k_cycle_get().
  • Prior to invoking any callbacks the sequence of expired timeouts is
    removed from the timeout queue, ensuring new timeouts are scheduled
    without the risk of inadvertently reducing their delay by unprocessed
    ticks.
  • Prior to a callback the timeout dticks field is updated to be relative
    to the current time, i.e. a non-positive value corresponding to how
    late the timeout was invoked.
  • The tick offset used when scheduling a recurring timeout is adjusted
    by the lateness of the invocation, ensuring that the expected callback
    times are at a fixed interval even if individual callbacks are late.
  • The dticks field is zeroed when a timer is initialized, aborted,
    started, or stopped, ensuring that non-recurring timeouts respect the
    offset passed as the initial duration.

The new policy also enables a future enhancement where each k_timer can
be configured to select between policies where repeating timeouts should
be relative to the scheduled time of the last callback (historical
behavior) or the actual time of the last callback (useful for certain
cases).

Closes #12332.

tests: kernel: timer: test delay for schedule in timer callback

This verifies that when a second timer is scheduled within a delayed
timer callback the lateness of the callback (as reflected by unprocessed
announced ticks) does not impact the timeout of the newly scheduled
timer.

Relates to issue #12332.

Although sys_dnode_t and sys_dlist_t are aliases, their roles are
different and they appear in different positions in dlist API calls.
Change the name of the thread structure field used to link threads into
run queues and wait queues to reflect that it acts as a list node, not a
list head.

Signed-off-by: Peter A. Bigot <[email protected]>
CONTAINER_OF() on a NULL pointer returns some offset around NULL and not
another NULL pointer.  We have to check for that ourselves.

This only worked because the dnode happened to be at the start of the
struct.

Signed-off-by: Peter A. Bigot <[email protected]>
The original implementation allows a list to be corrupted by list
operations on the removed node.  Existing code attempts to avoid this by
using external state to determine whether a node is in a list, but this
is fragile and fails when the state that holds the flag value is changed
after the node is removed, e.g. in preparation for re-using the node.

Follow Linux in invalidating the link pointers in a removed node.  Add
API so that detection of particpation in a list is available at the node
abstraction.

This solution relies on the following steady-state invariants:
* A node (as opposed to a list) will never be adjacent to itself in a
  list;
* The next and prev pointers of a node are always either both null or
  both non-null.

Signed-off-by: Peter A. Bigot <[email protected]>
Use the new generic capability to detect unlinked sys_dnode_t instances.

Signed-off-by: Peter A. Bigot <[email protected]>
k_poll events are registered in a linked list when their signal
condition has been met.  The code to clear event registration did not
account for events that were not registered, resulting in double-removes
that produced core dumps on native-posix sanitycheck.

Signed-off-by: Peter A. Bigot <[email protected]>
Whether a timeout is linked into the timeout queue can be determined
from the corresponding sys_dnode_t linked state.  This removes the need
to use a special flag value in dticks to determine that the timeout is
inactive.

Update _abort_timeout to return an error code, rather than the flag
value, when the timeout to be aborted was not active.

Remove the _INACTIVE flag value, and replace its external uses with an
internal API function that checks whether a timeout is inactive.

Signed-off-by: Peter A. Bigot <[email protected]>
@zephyrbot
Copy link
Collaborator

zephyrbot commented Jan 15, 2019

Found the following issues, please fix and resubmit:

License/Copyright issues

In most cases you do not need to do anything here, especially if the files reported below are going into ext/ and if license was approved for inclusion into ext/ already. Fix any missing license/copyright issues. The license exception if a JFYI for the maintainers and can be overriden when merging the pull request.

  • tests/kernel/timer/timer_schedule/README missing license.
  • tests/kernel/timer/timer_schedule/README missing copyright.

Copy link
Collaborator Author

@pabigot pabigot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issues to be addressed and pointers to open questions.

}

void _add_timeout(struct _timeout *to, _timeout_func_t fn, s32_t ticks);

int _abort_timeout(struct _timeout *to);

static inline bool _is_inactive_timeout(struct _timeout *t)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andyross The sense of this can be reversed to _is_active_timeout() if #12264 is resolved. I might be able to include a fix for that if you still want this change made.

* they missed more than an interval. But without it
* kernel/timer/timer_api:test_timer_periodicity fails. Is
* the test making incorrect assumptions about how to trick
* the system? */
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the point where I need insight. Negative delays are important for maintaining periodicity when ticks are not announced in a timely manner.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pabigot: Question is what is reference point of _add_timeout()? As far as I know, the reference is "now()" in real time. So scheduling timeouts in past is not possible. On the other hand we should be able to add timeouts which should fire "now()".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The incoming argument ticks is signed, relative to now = k_uptime_get(). It's negative if we've already missed the deadline. We can't unilaterally discard the only information we have on the alignment of the recurring deadline, so negative delays must be recorded and preserved so subsequent reschedules maintain the correct offsets. In practice the timeout will be near the head of the queue (before anything that isn't late) so it will be invoked in the next tick interrupt. That's the earliest possible unless the z_clock_announce is reworked to loop as long as there's a late timeout at the head of the list.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Now i see it. We have to handle missed periodic timeouts which were missed more than the period.

@pabigot pabigot force-pushed the wip/20190106a branch 2 times, most recently from 5b0ae2f to dc44f13 Compare January 15, 2019 13:09
@codecov-io
Copy link

codecov-io commented Jan 15, 2019

Codecov Report

Merging #12485 into master will increase coverage by 0.07%.
The diff coverage is 95.74%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #12485      +/-   ##
==========================================
+ Coverage   49.72%    49.8%   +0.07%     
==========================================
  Files         292      292              
  Lines       42425    42471      +46     
  Branches     9968     9979      +11     
==========================================
+ Hits        21096    21151      +55     
+ Misses      17178    17170       -8     
+ Partials     4151     4150       -1
Impacted Files Coverage Δ
include/kernel.h 94.44% <ø> (ø) ⬆️
kernel/include/wait_q.h 100% <ø> (ø) ⬆️
kernel/include/ksched.h 94.25% <100%> (ø) ⬆️
kernel/pipes.c 96.62% <100%> (ø) ⬆️
include/misc/dlist.h 92.68% <100%> (+3.39%) ⬆️
kernel/include/timeout_q.h 100% <100%> (ø) ⬆️
kernel/poll.c 81.87% <100%> (+0.34%) ⬆️
kernel/timer.c 94.66% <66.66%> (+0.07%) ⬆️
kernel/sched.c 92.69% <89.47%> (-0.15%) ⬇️
kernel/timeout.c 89.81% <97.05%> (+3.7%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f3f8f96...2e0d00b. Read the comment docs.

Copy link
Collaborator

@pizi-nordic pizi-nordic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also fix the #9843 (assuming that the tick<->ms conversion errors are taken under account in the mentioned test).

* they missed more than an interval. But without it
* kernel/timer/timer_api:test_timer_periodicity fails. Is
* the test making incorrect assumptions about how to trick
* the system? */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pabigot: Question is what is reference point of _add_timeout()? As far as I know, the reference is "now()" in real time. So scheduling timeouts in past is not possible. On the other hand we should be able to add timeouts which should fire "now()".

kernel/timeout.c Outdated
node = sys_dlist_peek_head(&ready);
if (node) {
k_spin_unlock(&timeout_lock, key);
while (node) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do { } while (node = sys_dlist_peek_head(&ready))
In my opinion such code will make concept easier to understand.

kernel/timeout.c Outdated
first()->dticks -= announce_remaining;
}
node = sys_dlist_peek_head(&ready);
if (node) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we replace this long if by early exit if (!node)?

k_timer_start(&timer1, K_MSEC(T1_PERIOD), K_MSEC(T1_PERIOD));

while (state.run < 6) {
static s32_t t2_lower_tick = T2_TIMEOUT_TICK - 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that "- 1" here is a way to make this test passing on nRF5x and avoid rounding-down problem in tick to ms conversion. If so, please add comment, that this "- 1" have to be removed when the rounding issue will be fixed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd never be removed. I'm not convinced the rounding issue will be fixed (it's not isolated to Nordic, and may fall under "not a bug" or "too late, that's the way it's always worked"), nor that if it is that the more strict lower bound will be satisfied, nor that if it is anybody will see the comment to remove it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this "- 1" relaxes the test. We should never expire prematurely, and here we tolerate such situation. If we keep this as is, it will be fuel for "not a bug" or "too late, that's the way it's always worked" discussion.

To keep this test failing due to rounding, you can revert the #9745 (which changed round-up back to round-down instead of fixing faulty tests) and include such change in this PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please tell me what issue number to put in the comment. When the bound is tightened the test fails regularly on qemu_xtensa, qemu_x86_64, qemu_x86, and mps_an385 so the bug (if it is one) is pretty widespread.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a constant-time operation that splits a list into a portion before a
node and the portion starting at the node, moving the portion before the
node to the end of a different list and leaving the original list
starting at the node.

Add a constant-time list join operation to append an existing list to a
different list.

This supports use cases where a dlist is split at some point
(e.g. timers that have reached their deadline), so that the extracted
elements can be processed without affecting content added to the
dlist during processing.

Signed-off-by: Peter A. Bigot <[email protected]>
The existing loop over timeouts in z_clock_announce adjusts the delay of
the first timeout that has not been expired by the number of ticks
remaining.  This adjustment is (by some interpretations) incorrect when
the timeout was added while processing a previous timeout and the
adjustment is not zero, which can happen if multiple ticks are announced
in one call.

Resolve this by changing how the tick clock is managed:
* curr_ticks/k_uptime_get() is adjusted as soon as the
  z_clock_announce() is invoked.  This means that timer callbacks are
  able to observe the best available estimate of the current time,
  rather than the time at which each callback was supposed to be
  invoked.  The new policy also ensures that k_uptime_get() is
  consistent with k_cycle_get().
* Prior to invoking any callbacks the sequence of expired timeouts is
  removed from the timeout queue, ensuring new timeouts are scheduled
  without the risk of inadvertently reducing their delay by unprocessed
  ticks.
* Prior to a callback the timeout dticks field is updated to be relative
  to the current time, i.e. a non-positive value corresponding to how
  late the timeout was invoked.
* The tick offset used when scheduling a recurring timeout is adjusted
  by the lateness of the invocation, ensuring that the expected callback
  times are at a fixed interval even if individual callbacks are late.
* The dticks field is zeroed when a timer is initialized, aborted,
  started, or stopped, ensuring that non-recurring timeouts respect the
  offset passed as the initial duration.

The new policy also enables a future enhancement where each k_timer can
be configured to select between policies where repeating timeouts should
be relative to the scheduled time of the last callback (historical
behavior) or the actual time of the last callback (useful for certain
cases).

Closes zephyrproject-rtos#12332.

Signed-off-by: Peter A. Bigot <[email protected]>
This verifies that when a second timer is scheduled within a delayed
timer callback the lateness of the callback (as reflected by unprocessed
announced ticks) does not impact the timeout of the newly scheduled
timer.

Relates to issue zephyrproject-rtos#12332.

Signed-off-by: Peter A. Bigot <[email protected]>
@pabigot
Copy link
Collaborator Author

pabigot commented Jan 15, 2019

@pizi-nordic following changes made:

  • Change parameter order to sys_dlist_split to put the destination list first, more consistent with sys_dlist_join and mem* routines.
  • Refactor z_clock_announce to reduce complexity of control flow.
  • Rename sys_dlist_catenate to sys_dlist_join.

I think that's everything but the too-loose lower bound in the test, which I can document as a workaround if there's an open issue that appears to document the behavior.

Copy link
Collaborator

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't what I thought we'd agreed on. I was honestly expecting a 10-ish line change that would allow k_timer to specify a different flag or something to _add_timeout() for its first argument.

I need to spend some time staring at this. Can you remove the timer patch and resubmit? We can merge everything else, but this one needs more argument.

@@ -389,7 +389,7 @@ struct _thread_base {

/* this thread's entry in a ready/wait queue */
union {
sys_dlist_t qnode_dlist;
sys_dnode_t qnode_dnode;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor point, but if you're going to correct naming, it should remain "qnode_dlist" or become "qnode_list" or something. It's the node field that goes into a list, to distinguish it from the equivalent field that goes into an rbtree. There's little value in trying to encode the type of an object in its field name.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can bring this up when I re-open #12248 and it shows up there, but basically---no, it's not encoding the type in the name, it's encoding the value's role.

The whole problem is the original name results in passing something named qnode_dlist in a position where the API requires a sys_dnode_t instance. It looks weird and is confusing. The field should not claim to be a list when it's a node. The list is somewhere else.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it's not encoding the type in the name, it's encoding the value's role.

Peter, please don't start this fight. Variable names don't uniformly encode what a value "is" and never have, that's what compiler typesafety is for (which fell down here because of a design glitch with dlist -- those are typedefs and not separate structs, but that's a digression). Names tell you what it does.

The type was wrong, that's a bug that you fixed. The name was fine, and correctly indicates the intent of the author whether or not you personally agree with the convention in use. It's not productive (or even feasible) to run around giant codebases "fixing" stuff like other people's variable naming, no one ever agrees with that kind of thing and it just makes a mess. Some fights just aren't worth having.

@@ -23,6 +23,7 @@ extern "C" {
static inline void _init_timeout(struct _timeout *t, _timeout_func_t fn)
{
sys_dnode_init(&t->node);
t->dticks = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This... is not the scheme I thought we'd agreed to. I thought this was simply going to change the convention

We've had multiple bugs with this kind of separate list handling in the past, because the timer objects may be modified while in that list, and it's not possible to keep execution locked while the callbacks execute. How does this work if another context aborts a thread while it's queued in the "to be executed" list? What happens if the callback tries to reschedule itself?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is what I said I was going to do: a solution based on the branch I'd referenced in some issue attached to the problem we were discussing. Both abort and self-reschedule should work just fine.

It looks like #12499 causes the test I created to pass, so we'll probably go with that. I'll end up re-opening #12248 though because if you're not going to take the change to managing the timeout list everything else that got added lacks motivation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't dump the whole thing. The cleaned up list removal workaround has obvious value, as does the test case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't add the test case as it will fail until your solution is merged. The rest is back in #12248 where it started out.

@andyross
Copy link
Collaborator

See #12499 for an implementation of the somewhat simpler solution I thought we were looking at.

@pabigot
Copy link
Collaborator Author

pabigot commented Jan 15, 2019

Closing this as #12499 provides a simpler workaround for a specific problem. The additional API for dlist and support for negative relative timeout deadlines will have to be argued in another context.

@andyross if you want to cherry-pick the commit from this that provides a test case into your PR that might be worthwhile.

@pizi-nordic
Copy link
Collaborator

pizi-nordic commented Feb 5, 2019

Ehhh.... 3 weeks later the issues discussed here are still not fixed. Competitive PRs stuck in review. In my opinion, this one provides the best solution we could have without big timers rework. Could we get this merged (especially 2d971c7)?

@pizi-nordic pizi-nordic reopened this Feb 5, 2019
@pizi-nordic
Copy link
Collaborator

@pabigot: Could you please rebase?

@pabigot
Copy link
Collaborator Author

pabigot commented Feb 20, 2019

Per discussion two weeks ago: this PR will not be merged, but whenever #12499 is finished and merged one or more commits from it may need to be reworked and submitted.

@pabigot pabigot closed this Feb 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Undesirable behaviour of z_clock_announce() when ticks argument is greater by 1
5 participants