dlist API enhancements supporting timer cleanup and scheduling fix #12485
Conversation
Although sys_dnode_t and sys_dlist_t are aliases, their roles are different and they appear in different positions in dlist API calls. Change the name of the thread structure field used to link threads into run queues and wait queues to reflect that it acts as a list node, not a list head. Signed-off-by: Peter A. Bigot <[email protected]>
CONTAINER_OF() on a NULL pointer returns some offset around NULL and not another NULL pointer. We have to check for that ourselves. This only worked because the dnode happened to be at the start of the struct. Signed-off-by: Peter A. Bigot <[email protected]>
The original implementation allows a list to be corrupted by list operations on the removed node. Existing code attempts to avoid this by using external state to determine whether a node is in a list, but this is fragile and fails when the state that holds the flag value is changed after the node is removed, e.g. in preparation for re-using the node. Follow Linux in invalidating the link pointers in a removed node. Add API so that detection of participation in a list is available at the node abstraction. This solution relies on the following steady-state invariants:
* A node (as opposed to a list) will never be adjacent to itself in a list;
* The next and prev pointers of a node are always either both null or both non-null.
Signed-off-by: Peter A. Bigot <[email protected]>
Use the new generic capability to detect unlinked sys_dnode_t instances. Signed-off-by: Peter A. Bigot <[email protected]>
k_poll events are registered in a linked list when their signal condition has been met. The code to clear event registration did not account for events that were not registered, resulting in double-removes that produced core dumps on native-posix sanitycheck. Signed-off-by: Peter A. Bigot <[email protected]>
Whether a timeout is linked into the timeout queue can be determined from the corresponding sys_dnode_t linked state. This removes the need to use a special flag value in dticks to determine that the timeout is inactive. Update _abort_timeout to return an error code, rather than the flag value, when the timeout to be aborted was not active. Remove the _INACTIVE flag value, and replace its external uses with an internal API function that checks whether a timeout is inactive. Signed-off-by: Peter A. Bigot <[email protected]>
Found the following issues, please fix and resubmit: License/Copyright issues. In most cases you do not need to do anything here, especially if the files reported below are going into ext/ and if the license was approved for inclusion into ext/ already. Fix any missing license/copyright issues. The license exception is a JFYI for the maintainers and can be overridden when merging the pull request.
Issues to be addressed and pointers to open questions.
}

void _add_timeout(struct _timeout *to, _timeout_func_t fn, s32_t ticks);

int _abort_timeout(struct _timeout *to);

static inline bool _is_inactive_timeout(struct _timeout *t)
 * they missed more than an interval. But without it
 * kernel/timer/timer_api:test_timer_periodicity fails. Is
 * the test making incorrect assumptions about how to trick
 * the system? */
This is the point where I need insight. Negative delays are important for maintaining periodicity when ticks are not announced in a timely manner.
@pabigot: The question is, what is the reference point of _add_timeout()? As far as I know, the reference is "now()" in real time, so scheduling timeouts in the past is not possible. On the other hand we should be able to add timeouts which should fire "now()".
The incoming argument ticks is signed, relative to now = k_uptime_get(). It's negative if we've already missed the deadline. We can't unilaterally discard the only information we have on the alignment of the recurring deadline, so negative delays must be recorded and preserved so subsequent reschedules maintain the correct offsets. In practice the timeout will be near the head of the queue (before anything that isn't late), so it will be invoked in the next tick interrupt. That's the earliest possible unless z_clock_announce is reworked to loop as long as there's a late timeout at the head of the list.
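For concreteness, a minimal sketch of the arithmetic being described (illustrative only, not kernel code; the helper name is made up):

#include <zephyr/types.h>

/* A periodic timeout with period 'period' that is serviced 'late' ticks
 * after its deadline reschedules with period - late, keeping the next
 * deadline on the original grid. When 'late' exceeds the period the
 * result is negative; clamping it to zero would silently shift the grid. */
static s32_t next_relative_delay(s32_t period, s32_t late)
{
	return period - late;	/* may be <= 0 when the callback ran late */
}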
Ok. Now I see it. We have to handle periodic timeouts which were missed by more than the period.
Force-pushed from 5b0ae2f to dc44f13 (compare).
Codecov Report
@@ Coverage Diff @@
## master #12485 +/- ##
==========================================
+ Coverage 49.72% 49.8% +0.07%
==========================================
Files 292 292
Lines 42425 42471 +46
Branches 9968 9979 +11
==========================================
+ Hits 21096 21151 +55
+ Misses 17178 17170 -8
+ Partials 4151 4150 -1
Continue to review full report at Codecov.
This should also fix #9843 (assuming that the tick<->ms conversion errors are taken into account in the mentioned test).
kernel/timeout.c (outdated)
node = sys_dlist_peek_head(&ready);
if (node) {
	k_spin_unlock(&timeout_lock, key);
	while (node) {
do { } while (node = sys_dlist_peek_head(&ready))
In my opinion such code would make the concept easier to understand.
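Something like the following captures the idea (an illustrative, self-contained sketch rather than the kernel code; the same shape works as a do/while when the first peek has already happened, as in the quoted snippet):

#include <misc/dlist.h>

/* Drain a list by re-peeking the head in the loop condition instead of
 * carrying a separately-maintained 'node' across iterations. */
static void demo_drain(sys_dlist_t *ready, void (*handle)(sys_dnode_t *))
{
	sys_dnode_t *node;

	while ((node = sys_dlist_peek_head(ready)) != NULL) {
		sys_dlist_remove(node);	/* detach before handling */
		handle(node);
	}
}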
kernel/timeout.c (outdated)
	first()->dticks -= announce_remaining;
}
node = sys_dlist_peek_head(&ready);
if (node) {
Could we replace this long if by an early exit if (!node)?
k_timer_start(&timer1, K_MSEC(T1_PERIOD), K_MSEC(T1_PERIOD));

while (state.run < 6) {
	static s32_t t2_lower_tick = T2_TIMEOUT_TICK - 1;
I assume that the "- 1" here is a way to make this test pass on nRF5x and avoid the rounding-down problem in tick-to-ms conversion. If so, please add a comment that this "- 1" has to be removed when the rounding issue is fixed.
It'd never be removed. I'm not convinced the rounding issue will be fixed (it's not isolated to Nordic, and may fall under "not a bug" or "too late, that's the way it's always worked"), nor, if it is, that the stricter lower bound will be satisfied, nor that anybody will see the comment and remove it.
I think that this "- 1" relaxes the test. We should never expire prematurely, and here we tolerate such a situation. If we keep this as is, it will be fuel for the "not a bug" or "too late, that's the way it's always worked" discussion.
To keep this test failing due to rounding, you can revert #9745 (which changed round-up back to round-down instead of fixing the faulty tests) and include that change in this PR.
Please tell me what issue number to put in the comment. When the bound is tightened the test fails regularly on qemu_xtensa, qemu_x86_64, qemu_x86, and mps_an385, so the bug (if it is one) is pretty widespread.
Add a constant-time operation that splits a list into a portion before a node and the portion starting at the node, moving the portion before the node to the end of a different list and leaving the original list starting at the node. Add a constant-time list join operation to append an existing list to a different list. This supports use cases where a dlist is split at some point (e.g. timers that have reached their deadline), so that the extracted elements can be processed without affecting content added to the dlist during processing. Signed-off-by: Peter A. Bigot <[email protected]>
The existing loop over timeouts in z_clock_announce adjusts the delay of the first timeout that has not been expired by the number of ticks remaining. This adjustment is (by some interpretations) incorrect when the timeout was added while processing a previous timeout and the adjustment is not zero, which can happen if multiple ticks are announced in one call. Resolve this by changing how the tick clock is managed:
* curr_ticks/k_uptime_get() is adjusted as soon as the z_clock_announce() is invoked. This means that timer callbacks are able to observe the best available estimate of the current time, rather than the time at which each callback was supposed to be invoked. The new policy also ensures that k_uptime_get() is consistent with k_cycle_get().
* Prior to invoking any callbacks the sequence of expired timeouts is removed from the timeout queue, ensuring new timeouts are scheduled without the risk of inadvertently reducing their delay by unprocessed ticks.
* Prior to a callback the timeout dticks field is updated to be relative to the current time, i.e. a non-positive value corresponding to how late the timeout was invoked.
* The tick offset used when scheduling a recurring timeout is adjusted by the lateness of the invocation, ensuring that the expected callback times are at a fixed interval even if individual callbacks are late.
* The dticks field is zeroed when a timer is initialized, aborted, started, or stopped, ensuring that non-recurring timeouts respect the offset passed as the initial duration.
The new policy also enables a future enhancement where each k_timer can be configured to select between policies where repeating timeouts should be relative to the scheduled time of the last callback (historical behavior) or the actual time of the last callback (useful for certain cases). Closes zephyrproject-rtos#12332. Signed-off-by: Peter A. Bigot <[email protected]>
This verifies that when a second timer is scheduled within a delayed timer callback the lateness of the callback (as reflected by unprocessed announced ticks) does not impact the timeout of the newly scheduled timer. Relates to issue zephyrproject-rtos#12332. Signed-off-by: Peter A. Bigot <[email protected]>
Force-pushed from dc44f13 to 2e0d00b (compare).
@pizi-nordic, the following changes were made:
I think that's everything but the too-loose lower bound in the test, which I can document as a workaround if there's an open issue that appears to document the behavior.
This isn't what I thought we'd agreed on. I was honestly expecting a 10-ish line change that would allow k_timer to specify a different flag or something to _add_timeout() for its first argument.
I need to spend some time staring at this. Can you remove the timer patch and resubmit? We can merge everything else, but this one needs more argument.
@@ -389,7 +389,7 @@ struct _thread_base {

 	/* this thread's entry in a ready/wait queue */
 	union {
-		sys_dlist_t qnode_dlist;
+		sys_dnode_t qnode_dnode;
Minor point, but if you're going to correct naming, it should remain "qnode_dlist" or become "qnode_list" or something. It's the node field that goes into a list, to distinguish it from the equivalent field that goes into an rbtree. There's little value in trying to encode the type of an object in its field name.
You can bring this up when I re-open #12248 and it shows up there, but basically---no, it's not encoding the type in the name, it's encoding the value's role.
The whole problem is the original name results in passing something named qnode_dlist in a position where the API requires a sys_dnode_t instance. It looks weird and is confusing. The field should not claim to be a list when it's a node. The list is somewhere else.
no, it's not encoding the type in the name, it's encoding the value's role.
Peter, please don't start this fight. Variable names don't uniformly encode what a value "is" and never have; that's what compiler type safety is for (which fell down here because of a design glitch with dlist -- those are typedefs and not separate structs, but that's a digression). Names tell you what it does.
The type was wrong, that's a bug that you fixed. The name was fine, and correctly indicates the intent of the author whether or not you personally agree with the convention in use. It's not productive (or even feasible) to run around giant codebases "fixing" stuff like other people's variable naming; no one ever agrees with that kind of thing and it just makes a mess. Some fights just aren't worth having.
@@ -23,6 +23,7 @@ extern "C" {
 static inline void _init_timeout(struct _timeout *t, _timeout_func_t fn)
 {
 	sys_dnode_init(&t->node);
+	t->dticks = 0;
This... is not the scheme I thought we'd agreed to. I thought this was simply going to change the convention
We've had multiple bugs with this kind of separate list handling in the past, because the timer objects may be modified while in that list, and it's not possible to keep execution locked while the callbacks execute. How does this work if another context aborts a thread while it's queued in the "to be executed" list? What happens if the callback tries to reschedule itself?
It is what I said I was going to do: a solution based on the branch I'd referenced in some issue attached to the problem we were discussing. Both abort and self-reschedule should work just fine.
It looks like #12499 causes the test I created to pass, so we'll probably go with that. I'll end up re-opening #12248 though because if you're not going to take the change to managing the timeout list everything else that got added lacks motivation.
Please don't dump the whole thing. The cleaned up list removal workaround has obvious value, as does the test case.
I can't add the test case as it will fail until your solution is merged. The rest is back in #12248 where it started out.
See #12499 for an implementation of the somewhat simpler solution I thought we were looking at.
Closing this as #12499 provides a simpler workaround for a specific problem. The additional API for dlist and support for negative relative timeout deadlines will have to be argued in another context. @andyross if you want to cherry-pick the commit from this that provides a test case into your PR, that might be worthwhile.
Ehhh.... 3 weeks later the issues discussed here are still not fixed. Competing PRs are stuck in review. In my opinion, this one provides the best solution we could have without a big timers rework. Could we get this merged (especially 2d971c7)?
@pabigot: Could you please rebase?
Per discussion two weeks ago: this PR will not be merged, but whenever #12499 is finished and merged, one or more commits from it may need to be reworked and submitted.
This is a roll-up of what was in #12248 and a solution to #12332. At a high level it enables detection of linked state for dnodes, uses that to replace a couple workarounds, extends the dlist API to support splitting and joining lists, then uses that to decouple pending timeouts from new timeouts to avoid an undesirable feature/bug when a timeout callback schedules a new timeout.
There is a large @todo comment in kernel/timeout that needs some eyes on it.
I've also kept the test case for #12332 separate from (after) the solution to #12332, primarily so it can be cherry-picked into existing branches to confirm that the behavior was properly reproduced before the fix. The commits can be merged if that would be preferred.
kernel: thread: correct type and member name of dlist node
Although sys_dnode_t and sys_dlist_t are aliases, their roles are
different and they appear in different positions in dlist API calls.
Change the name of the thread structure field used to link threads into
run queues and wait queues to reflect that it acts as a list node, not a
list head.
kernel: sched: fix empty list detection
CONTAINER_OF() on a NULL pointer returns some offset around NULL and not
another NULL pointer. We have to check for that ourselves.
This only worked because the dnode happened to be at the start of the
struct.
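As an illustration of the pattern this fix enforces, a sketch with made-up struct and helper names (not the scheduler code; header paths assumed for this Zephyr era):

#include <misc/dlist.h>
#include <misc/util.h>	/* CONTAINER_OF() */

struct demo_item {
	int priority;
	sys_dnode_t node;	/* no longer needs to be the first member */
};

static struct demo_item *demo_peek(sys_dlist_t *list)
{
	sys_dnode_t *n = sys_dlist_peek_head(list);

	/* CONTAINER_OF(NULL, ...) yields a small non-NULL offset, so the
	 * NULL check must be made on the node pointer, before conversion. */
	return (n == NULL) ? NULL : CONTAINER_OF(n, struct demo_item, node);
}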
sys: dlist: Add sys_dnode_is_linked
The original implementation allows a list to be corrupted by list
operations on the removed node. Existing code attempts to avoid this by
using external state to determine whether a node is in a list, but this
is fragile and fails when the state that holds the flag value is changed
after the node is removed, e.g. in preparation for re-using the node.
Follow Linux in invalidating the link pointers in a removed node. Add
API so that detection of participation in a list is available at the node
abstraction.
This solution relies on the following steady-state invariants:
* A node (as opposed to a list) will never be adjacent to itself in a list;
* The next and prev pointers of a node are always either both null or both non-null.
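A usage sketch of the node-state semantics described above; sys_dnode_init() and sys_dnode_is_linked() are assumed to be the helpers this commit adds, as they appear elsewhere in this PR:

#include <misc/dlist.h>

static void demo_node_state(void)
{
	sys_dlist_t list;
	sys_dnode_t node;

	sys_dlist_init(&list);
	sys_dnode_init(&node);		/* link pointers invalidated */
	/* here !sys_dnode_is_linked(&node) */

	sys_dlist_append(&list, &node);
	/* here sys_dnode_is_linked(&node) */

	sys_dlist_remove(&node);	/* link pointers invalidated again, */
	/* so !sys_dnode_is_linked(&node): a second remove is detectable */
}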
kernel: timeout: remove local fix for double-remove
Use the new generic capability to detect unlinked sys_dnode_t instances.
kernel: poll: fix double-remove of node
k_poll events are registered in a linked list when their signal
condition has been met. The code to clear event registration did not
account for events that were not registered, resulting in double-removes
that produced core dumps on native-posix sanitycheck.
kernel: timeout: detect inactive timeouts using dnode linked state
Whether a timeout is linked into the timeout queue can be determined
from the corresponding sys_dnode_t linked state. This removes the need
to use a special flag value in dticks to determine that the timeout is
inactive.
Update _abort_timeout to return an error code, rather than the flag
value, when the timeout to be aborted was not active.
Remove the _INACTIVE flag value, and replace its external uses with an
internal API function that checks whether a timeout is inactive.
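The predicate this commit introduces presumably reduces to something like the following sketch (assuming the struct _timeout definition from the kernel's timeout headers; the actual body may differ):

/* A timeout is active exactly while its dnode is linked into the timeout
 * queue, so no sentinel dticks value is needed. */
static inline bool _is_inactive_timeout(struct _timeout *t)
{
	return !sys_dnode_is_linked(&t->node);
}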
sys: dlist: add API to split lists
Add a constant-time operation that splits a list into a portion before a
node and the portion starting at the node, moving the portion before the
node to the end of a different list and leaving the original list
starting at the node.
Add a constant-time list join operation to append an existing list to a
different list. This is a utility function supporting list splitting.
This supports use cases where a dlist is split at some point
(e.g. timers that have reached their deadline), so that the extracted
elements can be processed without affecting content added to the
dlist during processing.
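To make the constant-time claim concrete, here is an illustrative split written against the public sys_dlist_t/sys_dnode_t layout; the function name and signature are invented for this sketch and are not necessarily what the commit adds:

#include <misc/dlist.h>

/* Move the portion of 'from' that precedes 'node' to the tail of 'to',
 * leaving 'from' starting at 'node'. Both lists must be initialized and
 * 'node' must be an element of 'from'. O(1): only boundary pointers move. */
static void demo_dlist_split(sys_dlist_t *to, sys_dlist_t *from,
			     sys_dnode_t *node)
{
	sys_dnode_t *first = sys_dlist_peek_head(from);
	sys_dnode_t *last = node->prev;

	if (first == NULL || first == node) {
		return;		/* nothing precedes 'node' */
	}

	/* detach [first, last] and make 'from' begin at 'node' */
	from->head = node;
	node->prev = from;

	/* append [first, last] after the current tail of 'to' */
	first->prev = to->tail;
	to->tail->next = first;
	last->next = to;
	to->tail = last;
}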
kernel: timeout: avoid premature adjustment of timeout
The existing loop over timeouts in z_clock_announce adjusts the delay of
the first timeout that has not been expired by the number of ticks
remaining. This adjustment is (by some interpretations) incorrect when
the timeout was added while processing a previous timeout and the
adjustment is not zero, which can happen if multiple ticks are announced
in one call.
Resolve this by changing how the tick clock is managed:
* curr_ticks/k_uptime_get() is adjusted as soon as the z_clock_announce() is invoked. This means that timer callbacks are able to observe the best available estimate of the current time, rather than the time at which each callback was supposed to be invoked. The new policy also ensures that k_uptime_get() is consistent with k_cycle_get().
* Prior to invoking any callbacks the sequence of expired timeouts is removed from the timeout queue, ensuring new timeouts are scheduled without the risk of inadvertently reducing their delay by unprocessed ticks.
* Prior to a callback the timeout dticks field is updated to be relative to the current time, i.e. a non-positive value corresponding to how late the timeout was invoked.
* The tick offset used when scheduling a recurring timeout is adjusted by the lateness of the invocation, ensuring that the expected callback times are at a fixed interval even if individual callbacks are late.
* The dticks field is zeroed when a timer is initialized, aborted, started, or stopped, ensuring that non-recurring timeouts respect the offset passed as the initial duration.
The new policy also enables a future enhancement where each k_timer can
be configured to select between policies where repeating timeouts should
be relative to the scheduled time of the last callback (historical
behavior) or the actual time of the last callback (useful for certain
cases).
Closes #12332.
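A sketch of the rescheduling arithmetic implied by the new policy, using the _add_timeout() signature quoted earlier in this review (illustrative only, not the code in kernel/timeout.c):

/* Under the new policy dticks is non-positive at callback time (how late
 * the callback ran), so adding it to the period keeps the next deadline
 * on the original grid. */
static void demo_reschedule_periodic(struct _timeout *to, _timeout_func_t fn,
				     s32_t period_ticks)
{
	_add_timeout(to, fn, period_ticks + to->dticks);
}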
tests: kernel: timer: test delay for schedule in timer callback
This verifies that when a second timer is scheduled within a delayed
timer callback the lateness of the callback (as reflected by unprocessed
announced ticks) does not impact the timeout of the newly scheduled
timer.
Relates to issue #12332.