Skip to content

Delay time influenced by memory configuration #151

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
tpwrules opened this issue Jul 1, 2019 · 5 comments
Open

Delay time influenced by memory configuration #151

tpwrules opened this issue Jul 1, 2019 · 5 comments

Comments

@tpwrules
Copy link

tpwrules commented Jul 1, 2019

How many cycles asm::delay() can be affected by how the chip memory is configured and even where the function ends up in Flash. The code assumes the loop takes 4 cycles and explains that interrupts may cause the function to delay longer than requested.

However, items such as Flash wait states, instruction cache, and function alignment can also cause the loop to take longer, even when interrupts are disabled. As an example, asm::delay() running on the NRF52840 can take 4, 5, or 6 cycles per loop depending on whether the function ends up 16, 8, or 2 byte aligned. This also means that the delay depends on how exactly the binary is compiled and linked.

This is very surprising behavior and I think a note to the above effect would be useful in the function's documentation. Perhaps the loop can be measured as part of initialization, but this would probably be unnecessary overhead.

@therealprof
Copy link
Contributor

The documentation clearly states that delay will block for at least the specified amount of cycles. It certainly is a really poor delay implementation and not meant to be used for accurate timing. PRs to improve upon that would certainly be appreciated.

NB: Since this function is inlined it will behave differently depending on the caller so measuring as part of the initialisation is not possible. Not inlining it would make it even less dependable, at least for short delays.

@ianrrees
Copy link
Contributor

I believe the concerns in the OP here have been addressed (to the extent they can be, as discussed in other issues) now that #587 has been merged.

Though, that is a curious note about the NRF52840 and 16 byte alignment. I only tested #587 on a SAMD51 (Cortex M4) on which the branch timing seems to only differentiate between 2, 4, or 8 byte aligned targets.

@tpwrules
Copy link
Author

tpwrules commented Apr 23, 2025

I don’t have the bandwidth right now to confirm the fix or sort out why I came to the cycle vs. alignment conclusions that I did, but this looks like a good step to me.

Thanks for working on it. Feel free to close the issue.

I do still think a doc clarification would be worthwhile. The “at least” statement to me implies an additive error (starting overhead, interrupts) rather than the multiplicative error caused by the cycle timing effects.

@ianrrees
Copy link
Contributor

I agree about the "at least", will aim to PR a change to that wording and fix a minor error I made in the comment that explains the alignment. In another thread, I suggested that we put the argument in terms of instructions/NOPs instead of cycles - maybe that would better communicate the situation.

With a slightly wider view, I'm wondering if the right place for this method is actually in the HAL layer instead. Just thinking about the ATSAM one that I am most familiar with, we could quite easily make a delay that's more-or-less cycle correct since we know exactly what chip the code will run on.

@adamgreig
Copy link
Member

A lot of HALs can implement delays more usefully, including using timer peripherals and such, and can implement the embedded-hal Delay traits etc too, but I think it's useful to have some very basic, very compatible "delay" in cortex-m, even if it varies a bit chip-to-chip (and at least not alignment-to-alignment any more after your PR).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants