Skip to content

Boost default tick rate to double digit kHz on tickless hardware #16782

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Jul 3, 2019

Conversation

andyross
Copy link
Collaborator

Since the introduction of pervasively-supported tickless mode, it's been possible for apps to achieve effectively arbitrary timing precision (up to the limits of the hardware counter, of course) by adjusting the CONFIG_SYS_CLOCK_TICKS_PER_SEC value.

But experience has shown that this mechanism is too confusing or too controversial for app authors, its uptake has been very low, and there are now currently two separate reimplementations of system timing architectures in review. All to achieve a feature that... was supposed to already be present.

So let's just make it default.

On all tickless capable hardware platforms, the default tick rate is now 10 kHz. On nRF5x devices, which have a counter that doesn't divide well into decimal frequencies, we're using a raw tick rate of 32768 Hz to exactly match the hardware counter (pleasingly, this GREATLY simplifies the generated code for the ISR and should show significantly improved latency numbers). Note that software emulation platforms (qemu and nsim) which expose the host clock must continue to use a 100 Hz default, as the emulator timing precision is limited by host OS scheduling quanta.

Note that this is a small patch series (that removes more lines than it adds!), and that almost all the content is just kconfig muckery. There were only a handful of bugs fixed, two were bad test assumptions about tick rate and one was an issue found by inspection in the "next tick round up" code in the nrf driver.

This completes sanitycheck correctly on the emulation platofrms, and runs tests/timer and tests/sched tests successfully on nrf52840_pca10056 and frdm_k64f. I'm having trouble with my SAM E70 board at the moment but hope to have results from that in a bit.

Anyway, hopefully this can help people looking for increased precision in the near term while we continue to discuss the various rewrites.

Andy Ross added 12 commits July 2, 2019 14:32
The test for the k_uptime_delta utilities was calling it in a loop and
waiting for the uptime to advance.  But the code was specifically
wanting it to advance 5ms or more at one go, which clearly isn't going
to work for a tick rate above 200 Hz.

The weird thing is that the test knew this and even commented about
the limitation.  Which seems silly: it's perfectly fine for the clock
to advance just a single millisecond.  That's correct behavior too.
Let's test for that, and it will work everywhere.

Signed-off-by: Andy Ross <[email protected]>
The sleep test was checking that the sleep took no longer than "2
ticks" longer than requested.  But "2 ticks" for fast tick rate
configurations can be "zero ms", and for aliasing reasons it's always
possible to delay for 1 unit more than requested (becuase you can
cross a millisecond/tick/whatever boundary in your own code on either
side of the sleep).  So that "slop" value needs to be no less than
1ms.

Signed-off-by: Andy Ross <[email protected]>
The default tick rate is now 10 kHz, but that driver was demanding
that it be exactly 1 kHz instead of at least that rate.  I checked the
source, and the driver isn't actually extracting "ticks" from the
kernel illegally, it just needs fine-grained timers that work with the
existing millisecond API.  Let it build, this is fine.

Signed-off-by: Andy Ross <[email protected]>
The scheduler API has always allowed setting a zero slice size as a
way to disable timeslicing.  But the workaround introduced for
CONFIG_SWAP_NONATOMIC forgot that convention, and was calling
reset_time_slice() with that zero value (i.e. requesting an immediate
interrupt) in circumstances where z_swap() had been interrupted
nonatomically.

In practice, this never happened.  And if it did, it was a single
spurious no-op interrupt that no one cared about.  Until it did,
anyway...

Now that ticks on nRF devices are at full 32 kHz speed, we can get
into a situation where the rapidly triggering timeslice interrupts are
interrupting z_swap() calls, and the process feeds back on itself and
becomes self-sustaining.

Put that test into the time slice code itself to prevent this kind of
mistake in the future.

Signed-off-by: Andy Ross <[email protected]>
The logic about minimal sleep sizes due to "tick" aliasing was
correct, but drivers also have similar behavior with "cycle" aliasing
too.  When cycles are 3-4 orders of magnitude faster than ticks, it's
undetectable noise.  But now on nRF they're exactly the same and we
need to correct for that, essentially doubling the number of ticks a
usleep() might wait for.

The logic here was simply too strict, basically.  Fast tick rates
can't guarantee what the test promised.

Note that this relaxes the test bounds on the other side of the
equation too: it's no longer an error to usleep() for only one tick
(i.e. an improved sleep/timeout implementation no longer gets detected
as a test failure).

Signed-off-by: Andy Ross <[email protected]>
This test was written to assume ~100 Hz ticks in ways that are
difficult to fix.  It wants to sleep for periods on the order of the
TICKLESS_IDLE_THRESH kconfig, which is extremely small on high tick
rate systems and (on nRF in particular) does not have a cleanly
divisible representation in milliseconds.

Fixing precision issues by cranking the idle threshold up on a
per-system basis seems like an abuse, as that is what we want to be
testing in the first place.  Just let the test run at the tick rate it
has always expected.

Signed-off-by: Andy Ross <[email protected]>
"50" ticks is fine with 100 Hz timer precision but way too short to
survive the conversion to milliseconds on fast, non-decimal tick
rates.  Make it half a second, which was the original intent.

Signed-off-by: Andy Ross <[email protected]>
This test was written to properly align its millisecond-measured wait
time and assumed that there would be no other overhead.  In fact on
fast tick rate systems (or even ones where the alignment computation
doesn't provide the needed padding as "slop") that's not quite enough
time to complete the full test.  There are cycles between the sleep
calls that need to be accounted for, and aren't.

Just give it one extra work item of time before failing.  We aren't
testing work queue timing precision here, just evaluation semantics.

Signed-off-by: Andy Ross <[email protected]>
This test seems a little confused.  It does a POSIX usleep() for 90ms,
then checks the time taken, and verifies that it was no less
than... 91ms!

On existing platforms, tick alignment makes sure that we always take a
little longer, so this passes. But on high tick rate configurations we
get it exactly right.  And fail.

Adjust the calibration to allow (exactly) 90ms sleeps.  Also fixed a
comment that described the wrong units.

Signed-off-by: Andy Ross <[email protected]>
This test was written to assume that k_busy_wait() and CMSIS
osKernelSysTick() (which is just k_cycle_get_32()) were perfectly
synchronized.  On nRF, they aren't (one is the 32 kHz RTC timer, the
other is a calibrated spin loop using the CPU frequency).

When ticks were being reported at 100 Hz granularity, there wasn't
enough precision to detect the mismatch.  Now there is.  Rework the
test to require that the clocks match to within 1%.

Signed-off-by: Andy Ross <[email protected]>
The current CMSIS v2 implementation is clearly assuming that timeout
arguments being passed to e.g. osDelay() are in units of Zephyr ticks,
not milliseconds as specified by ARM or (inconsistently) assumed by
our test code.

Most tests work with the ~100 Hz default tick rate, but they tend to
fail on precision issues at higher tick rates.  Force the CMSIS v2
applications to be 1000 Hz for now as a workaround, and detect the
mismatch as a build failure.

Signed-off-by: Andy Ross <[email protected]>
The mempool blocking implementation was mixing tick and millisecond
APIs.  Get it right.

Signed-off-by: Andy Ross <[email protected]>
@andyross
Copy link
Collaborator Author

andyross commented Jul 2, 2019

Rebased. Fixed up doc errors due to the endrst/endrststar change that this crossed and mismerged with.

@nashif nashif merged commit 905209b into zephyrproject-rtos:master Jul 3, 2019
@ghost ghost mentioned this pull request Jul 29, 2019
sigurdnev added a commit to sigurdnev/fw-nrfconnect-zephyr that referenced this pull request Nov 4, 2019
…t time

The 1ms wait time has been shown to not be enough. Increasing to 10ms.

This change has been shown to be necessary after
CONFIG_SYS_CLOCK_TICKS_PER_SEC was changed from 128 to 32768
in zephyrproject-rtos/zephyr#16782

Signed-off-by: Sigurd Olav Nevstad <[email protected]>
mbolivar pushed a commit to nrfconnect/sdk-zephyr that referenced this pull request Nov 4, 2019
…t time

The 1ms wait time has been shown to not be enough. Increasing to 10ms.

This change has been shown to be necessary after
CONFIG_SYS_CLOCK_TICKS_PER_SEC was changed from 128 to 32768
in zephyrproject-rtos/zephyr#16782

Signed-off-by: Sigurd Olav Nevstad <[email protected]>
backporting bot pushed a commit to nrfconnect/sdk-zephyr that referenced this pull request Nov 4, 2019
…t time

The 1ms wait time has been shown to not be enough. Increasing to 10ms.

This change has been shown to be necessary after
CONFIG_SYS_CLOCK_TICKS_PER_SEC was changed from 128 to 32768
in zephyrproject-rtos/zephyr#16782

Signed-off-by: Sigurd Olav Nevstad <[email protected]>
ioannisg pushed a commit that referenced this pull request Nov 5, 2019
The 1ms wait time has been shown to not be enough. Increasing to 10ms.

This change has been shown to be necessary after
CONFIG_SYS_CLOCK_TICKS_PER_SEC was changed from 128 to 32768
in #16782

Signed-off-by: Sigurd Olav Nevstad <[email protected]>
mbolivar pushed a commit to nrfconnect/sdk-zephyr that referenced this pull request Nov 5, 2019
…t time

The 1ms wait time has been shown to not be enough. Increasing to 10ms.

This change has been shown to be necessary after
CONFIG_SYS_CLOCK_TICKS_PER_SEC was changed from 128 to 32768
in zephyrproject-rtos/zephyr#16782

Signed-off-by: Sigurd Olav Nevstad <[email protected]>
tejlmand pushed a commit to tejlmand/fw-nrfconnect-zephyr-1 that referenced this pull request Nov 15, 2019
…t time

The 1ms wait time has been shown to not be enough. Increasing to 10ms.

This change has been shown to be necessary after
CONFIG_SYS_CLOCK_TICKS_PER_SEC was changed from 128 to 32768
in zephyrproject-rtos/zephyr#16782

Signed-off-by: Sigurd Olav Nevstad <[email protected]>
SebastianBoe pushed a commit to tejlmand/fw-nrfconnect-zephyr-1 that referenced this pull request Nov 18, 2019
…t time

The 1ms wait time has been shown to not be enough. Increasing to 10ms.

This change has been shown to be necessary after
CONFIG_SYS_CLOCK_TICKS_PER_SEC was changed from 128 to 32768
in zephyrproject-rtos/zephyr#16782

Signed-off-by: Sigurd Olav Nevstad <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: API Changes to public APIs area: Bluetooth area: Boards area: Kernel area: Samples Samples area: Sensors Sensors area: Tests Issues related to a particular existing or missing test area: Timer Timer
Projects
None yet
Development

Successfully merging this pull request may close these issues.