Improving resolution and usability of controlled delays in kernel APIs #17162
Comments
One more thing to consider: we should specify clock domains for k_busy_wait(), k_cycle_get_32(), and the other k_* APIs. Currently we assume that all are clocked from the same source, but at least for Nordic this is not true.
Thanks Peter for the detailed summary! I think the requirements contain all the things which I would need for an IoT application with precise TDMA scheme. I don't have a deep understanding of Zephyr so here are just a few comments from my side:
@leapslabs Thanks for your comments. Although synchronization is important, the primary use of this feature is internal to a single device. System cycle clocks are always above 1 MHz, so nanoseconds are necessary to measure deltas. There are also use cases for setting alarms that will fire months in the future (e.g. preparing for a time zone offset or TAI-UTC delta change). My tests on boards from four different vendors show the system clock is generally accurate to about 25 ppm, which accumulates 1 ms of error in 40 s of runtime, or 1 s in about eleven hours. That is consistent with your 10s-of-minutes estimate for coordinating time between devices at the application level, e.g. to synchronize scheduled operations like turning lights on or off.
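The 25 ppm figures above can be checked with a little integer arithmetic. The following is only a sketch; the helper names are hypothetical and are not part of any Zephyr API.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Error (ns) accumulated over elapsed_ns of runtime by a clock that is off
 * by ppm parts per million. The product overflows for very long runtimes
 * (roughly decades at 25 ppm), which is acceptable for a sketch. */
static inline uint64_t drift_ns(uint64_t elapsed_ns, uint32_t ppm)
{
	return (elapsed_ns * ppm) / 1000000u;
}

/* Runtime (ns) needed for a clock that is off by ppm to accumulate
 * error_ns of error. */
static inline uint64_t runtime_for_error_ns(uint64_t error_ns, uint32_t ppm)
{
	return (error_ns * 1000000u) / ppm;
}
```

With these definitions, 40 s of runtime at 25 ppm gives 1,000,000 ns (1 ms) of error, and 1 s of error takes 40,000 s, a little over eleven hours.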
Thank you for listing all the problems in a single place!
I like both the idea of specifying absolute deadlines and the idea of fetching deadlines from any kind of kernel API. Looking at @andyross's PR, the following idea popped into my mind:
Whether this idea is really usable is an open question, but IMHO it might be a step closer to satisfying @pabigot's requirements. Moreover, if we are going to change the kernel API, there is no better moment than the upcoming Zephyr 2.0 release. Even if not all new features are available, we could start familiarizing users with the new API.
This needs discussion. We have to set clear boundaries for what will be interpreted as the past, in both the absolute and the relative API.
IMHO we should not allow negative deltas from now() in the API. If users want a "schedule in the past" approach, we should force them to use absolute timeouts and do the math in the application.
That is the hard part. I can see the same device needing sub-us precision (for example, to implement a radio protocol) and long timeouts (for example, running a self-check of the controlled machinery every 3 months, reminding the user about service once per year, etc.). I support @pabigot in one more area: we should clearly define clock domains for the kernel API. For example, on Nordic the k_cycle_get_32() API uses a different clock than k_busy_wait(), which some of the tests do not expect.
Long timeouts can always be achieved in the application in fragments, so I'm not that worried about them. Regarding sub-us: if we are talking about kernel infrastructure, and the APIs use ticks (and macros for ms2ticks, us2ticks conversion), then nanoseconds are not a problem if a nanoseconds-to-ticks macro is added. For lower frequencies that will of course make no sense.
@nordic-krch: You reminded me about one important problem:
Conversion between clock domains is not necessarily both injective (one-to-one) and surjective (onto), which is required to create a total inverse function. I don't think this expectation can be satisfied, especially since these functions don't specify the direction of rounding, and if they both round up the composition might be a monotonically increasing function.
Ahhh, I'm missing my math courses :). But you are right. We are working in a finite field and cannot avoid rounding, so there will be no total inverse. However, at the moment ms2tick rounds up while tick2ms rounds down, which results in
Technically, yes; without changing behavior, no. For the purposes of this PR what I want is something that in its fundamental form could be:
But we're bikeshedding solutions; the intent here is to discuss requirements. So the requirement the above satisfies is the ability to convert durations between clock domains while controlling the rounding direction.
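To make the rounding requirement concrete, here is a minimal sketch of a conversion between two clock domains with an explicit rounding direction. The names round_dir and convert_duration are hypothetical; this is not the form proposed in the thread, and it ignores overflow for very large counts.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum round_dir { ROUND_DOWN, ROUND_UP };

/* Convert count units of a from_hz clock into units of a to_hz clock,
 * rounding in the requested direction.
 * Assumption: count * to_hz fits in 64 bits. */
static inline uint64_t convert_duration(uint64_t count, uint32_t from_hz,
					uint32_t to_hz, enum round_dir dir)
{
	uint64_t scaled = count * to_hz;
	uint64_t result = scaled / from_hz;

	/* Integer division truncates; bump the result when rounding up
	 * and a remainder was discarded. */
	if (dir == ROUND_UP && (scaled % from_hz) != 0) {
		result++;
	}
	return result;
}
```

With a 100 Hz tick, 15 ms rounds up to 2 ticks, and 2 ticks converts back to 20 ms, so the round trip does not return the original value. That is the non-invertibility discussed above.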
And adding this to the list was my goal. Thank you.
We have to use GitHub for design discussions as there isn't anything else available (Slack ruled out by TSC), but its single-threaded nature is really inconvenient. So in a couple of days what I'd like to do is take into account all comments received so far and post a new version of the It'll be ugly. I don't have a better solution.
Experience in #17155 shows another point of discussion: Threads, which are initialized at both compile-time and runtime, include an It may be necessary to make an exception to use milliseconds for
It was discussed a while back that that argument to k_thread_create() is probably not a good idea to have in the API and that a better scheme would be to simply have an "initialize but do not start" flag, allowing the user to do it at an appropriate time via whatever mechanism she likes.
Just wanted to point out that given the documentation of It looks like the current way of starting a thread yourself is creating a stack using
@cfriedt @nordic-krch One more rather old clock-related ticket which now feels more like an RFC than an enhancement request. @nashif RFC tag + backlog? Maybe it would be adequate to collect the remaining open RFCs/enhancement requests in one RFC (I'd propose #40099, as it is already a meta RFC and has the best link collection AFAICS). For someone trying to understand for the first time what proposals are being made, this would be very handy. I think this would be a good time, as the RFC API has just been introduced and @cfriedt is doing all that POSIX work, which is related. Especially in the clock/timing area, the number of open stale proposals, abandoned PRs, etc. is really impressive. ;-)
It's being pulled into the LTSv3 POSIX roadmap / RFC, which does include a lot of non-POSIX work, ironically.
Hi @peter-mitsis, @andyross, This issue, marked as an Enhancement, was opened a while ago and did not get any traction. It was just assigned to you based on the labels. If you don't consider yourself the right person to address this issue, please re-assign it to the right person. Please take a moment to review if the issue is still relevant to the project. If it is, please provide feedback and direction on how to move forward. If it is not, has already been addressed, is a duplicate, or is no longer relevant, please close it with a short comment explaining the reason. @pabigot you are also encouraged to help move this issue forward by providing additional information and confirming this request/issue is still relevant to you. Thanks!
Pretty sure the subsystem rewrite which this was opened to track and evangelize got abandoned. Let's close it as stale; certainly there's always going to be room for discussing improvements to core subsystems, but this one seems to have run its course. We can always reference it later.
This issue outlines the current approach to controlled delays in Zephyr, identifies some weaknesses and functional gaps, and serves as a point of discussion of goals and requirements for changes to address those weaknesses.
Existing Approach
Several kernel system APIs allow an application to specify a delay before some operation is initiated or terminated. These APIs include (delay parameters are in bold type):
k_timer_start(timer, duration, period)
k_timer_remaining_get(timer) => remaining (nb: unsigned)
k_queue_get(queue, timeout) (underlying k_fifo_get, k_lifo_get)
k_futex_wait(futex, expected, timeout)
k_stack_pop(stack, data, timeout)
k_delayed_work_submit_to_queue(queue, work, delay)
k_delayed_work_remaining_get(work) => remaining (nb: signed)
k_mutex_lock(mutex, timeout)
k_sem_take(sem, timeout)
k_{msgq,mbox,pipe}_[block_]{put,get}(store, data, timeout)
k_mem_{slab,pool}_alloc(slab, mem, timeout)
k_poll(events, num, timeout)
k_sleep(ms)
k_usleep(us)
k_busy_wait(us)
k_thread_deadline_set(thread, deadline)
k_uptime_get() and k_uptime_delta(&ms)
In current Zephyr most controlled delays are specified as a signed 32-bit integer counting milliseconds, with helper macros like K_MINUTES(d) to convert from more coarse-grained measures. Exceptions are:
k_usleep and k_busy_wait, which operate in microseconds;
k_thread_deadline_set, which operates in cycles of the hardware clock;
k_uptime_get, which operates in s64_t milliseconds clamped to tick increments.
Internally most delays are implemented through struct _timeout, which operates on ticks as defined by SYS_CLOCK_TICKS_PER_SEC. The requested delay is converted to the smallest span of ticks that is not less than the requested delay, except that a duration of zero may be converted to a single tick in some cases. An exception is k_busy_wait under the influence of ARCH_HAS_CUSTOM_BUSY_WAIT, currently used in-tree only by Nordic.
Functional Gaps
For all APIs, but specifically k_timer, interrupts between the point the application supplies the relative delay and the point the timer infrastructure inserts it into a processing queue introduce complexity in maintaining precise delays. To reduce this complexity it is desirable in some cases to specify delays as deadlines rather than relative offsets (#2811).
Use of milliseconds was tolerable while the tick duration was historically 10 ms. With the upcoming merge of #16782 decreasing the tick duration to 100 us, finer-grained specification will soon be needed for most if not all APIs. Arguments have been made to go as fine as nanoseconds (#6498).
Base Requirements and Questions
In a recent telecon @andyross proposed addressing these gaps by changing the way delays are specified, from signed 32-bit millisecond counts to another representation.
The following are positions and questions which I (@pabigot) have summarized and extended based on previous related discussions and experience. All are open for debate.
k_sem_take(&sem, K_MSEC(5)) -- code that uses helper macros to translate delay durations to a timeout value -- must be unaffected by any underlying API changes.
k_poll or k_sleep should allow for absolute deadlines.
s32_t milliseconds that's currently 2147483.647 s (about 3.5 weeks). As resolution is increased this delay is reduced significantly unless larger data types are used.
k_uptime_get* which converts its scale to milliseconds. New API should be added to access the full precision for use in maintaining absolute deadlines.
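The range concern raised above can be quantified with simple division. This sketch uses hypothetical helper names and standard C integer limits rather than Zephyr's s32_t/s64_t typedefs.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Longest delay, in whole seconds, representable by a signed 32-bit
 * count of units at the given resolution (units per second). */
static inline uint32_t max_delay_s32_seconds(uint32_t units_per_sec)
{
	return (uint32_t)INT32_MAX / units_per_sec;
}

/* The same limit for a signed 64-bit count. */
static inline uint64_t max_delay_s64_seconds(uint64_t units_per_sec)
{
	return (uint64_t)INT64_MAX / units_per_sec;
}
```

A signed 32-bit count covers 2147483 s at millisecond resolution, 2147 s (under 36 minutes) at microsecond resolution, and only 2 s at nanosecond resolution. A signed 64-bit nanosecond count covers 9223372036 s, roughly 292 years.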