Skip to content

k_uptime_get_32() does not behave as documented #18739

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pabigot opened this issue Aug 28, 2019 · 10 comments · Fixed by #18744
Closed

k_uptime_get_32() does not behave as documented #18739

pabigot opened this issue Aug 28, 2019 · 10 comments · Fixed by #18744
Assignees
Labels
bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug
Milestone

Comments

@pabigot
Copy link
Collaborator

pabigot commented Aug 28, 2019

It was recently pointed out on slack that k_uptime_get_32() does not return The lower 32-bits of the elapsed time since the system booted, in milliseconds.

It returns instead the number of milliseconds corresponding to the low 32 bits of the tick counter.

u32_t z_impl_k_uptime_get_32(void)
{
        return __ticks_to_ms(z_tick_get_32());
}

The values are significantly different when you look at a 32-bit rollover of the tick clock:

At 10000 Hz ticks and 2^32 +/- 50:
t0 = 0x0000ffffffce = 4294967246 ticks => t0m = 429496724 ms
t1 = 0x000100000032 = 4294967346 ticks => t1m = 429496734 ms
t1-t0 = 100 ticks = 10 ms
and t1m-t0m = 10 ms
tt0 = 0x0000ffffffce = 4294967246 ticks => tt0m = 429496724 ms
tt1 = 0x000000000032 = 50 ticks => tt1m = 5 ms
tt1-tt0 = 100 ticks = 10 ms
but tt1m-tt1m = 3865470577 ms

This explains #17155 (comment) which had claimed that the current implementation followed an algorithm that I noted was incorrect. Apparently it does.

@pabigot pabigot added the bug The issue is a bug, or the PR is fixing a bug label Aug 28, 2019
@alexanderwachter alexanderwachter added the TSC Topics that need TSC discussion label Aug 28, 2019
@pabigot
Copy link
Collaborator Author

pabigot commented Aug 28, 2019

I'm going to work up a patch that marks it deprecated, removes all uses from master, and documents the misbehavior.

pabigot added a commit to pabigot/zephyr that referenced this issue Aug 28, 2019
This function does not return the low 32 bits of k_uptime_get() as
suggested by it's documentation; it returns the number of milliseconds
represented by the low 32-bits of the underlying system clock.  The
truncation before translation results in discontinuities at every point
where the system clock increments bit 33.

Refine the documentation to describe what it actually does, mark it
deprecated, and replace all in-tree uses with an expression that is
slower but does the right thing.

Closes zephyrproject-rtos#18739.

Signed-off-by: Peter Bigot <[email protected]>
@alexanderwachter alexanderwachter removed the TSC Topics that need TSC discussion label Aug 28, 2019
@andyross
Copy link
Collaborator

Doesn't #17155 already fix this, though? The resolution to that discussion was that the 32 bit conversion results should be correct to the full precision of the output word, and that's what the new conversion utilities implement and test for.

@carlescufi
Copy link
Member

Doesn't #17155 already fix this, though? The resolution to that discussion was that the 32 bit conversion results should be correct to the full precision of the output word, and that's what the new conversion utilities implement and test for.

@pabigot might want to weigh in, but I think the reason for this is the fact that we can get a fix in time for 2.0, whereas #17155 will only be merged in time for 2.1.

@andyross
Copy link
Collaborator

That's fair. Mostly I'm just whining about extra work when rebasing, I guess. I agree with the fix on the whole.

@pabigot
Copy link
Collaborator Author

pabigot commented Aug 28, 2019

My take-away from TSC is that nobody thought this was important enough to fix for 2.0 (otherwise TSC would have had to vote to approve it). So it's not going in.

@carlescufi
Copy link
Member

@pabigot we've had this issue for a few releases now, so that might be why there was no vote. Just not deemed urgent enough given the amount of time this has been in the tree.

@andyross andyross added this to the v2.1.0 milestone Aug 28, 2019
@andyross
Copy link
Collaborator

Bug tracking accounting. Update milestone to match PR

@pabigot
Copy link
Collaborator Author

pabigot commented Aug 28, 2019

@pabigot we've had this issue for a few releases now, so that might be why there was no vote. Just not deemed urgent enough given the amount of time this has been in the tree.

@carlescufi That may be due to a misunderstanding of the significance of the problem, either by me or by others.

Until the 2.0 development series most Zephyr targets had a 100 Hz tick rate, which meant a 32-bit system clock wouldn't wrap until just over 497 days runtime.

Now with #16782 most targets have a 10 kHz tick rate, which wraps the system clock after less than 5 days runtime.

As I noted, durations based on subtracting k_uptime_32() values that cross a 32-bit wrap of the system clock can be wrong because bits above 2^32 affect the ms conversion. So theoretically this should now be observable on any application that runs more than five days, and will recur at 4.97 day intervals on most targets.

Or so I calculate. I haven't had lunch yet, so somebody should probably check that math.

@carlescufi
Copy link
Member

@pabigot that's a very good point.
That said, I am afraid it's just too late now for code changes for 2.0. Is there a chance you can instead send a PR that updates the documentation of k_uptime_get_32()?

@pabigot pabigot added the has-pr label Aug 29, 2019
@pabigot
Copy link
Collaborator Author

pabigot commented Aug 29, 2019

@carlescufi @ioannisg OK, there's now #18771 for this, intended for 2.0 merge as a "documentation fix".

It's a fine line between documenting the serious problems with this API to suggest people not to use it, while it's still used for many timeout operations within the kernel.

BTW: Accepting this for 2.0 probably implies deprecating the function in 2.1, since we don't want the documented behavior of an API function to change multiple times in successive releases.

@rljordan-zz rljordan-zz added the priority: medium Medium impact/importance bug label Aug 30, 2019
@ioannisg ioannisg added priority: high High impact/importance bug and removed priority: high High impact/importance bug priority: medium Medium impact/importance bug labels Sep 3, 2019
pabigot added a commit to pabigot/zephyr that referenced this issue Sep 3, 2019
The current implementation does not return the low 32 bits of
k_uptime_get() as suggested by it's documentation; it returns the number
of milliseconds represented by the low 32-bits of the underlying system
clock.  The truncation before translation results in discontinuities at
every point where the system clock increments bit 33.

Reimplement it using the full-precision value, and update the
documentation to note that this variant has little value for
long-running applications.

Closes zephyrproject-rtos#18739.

Signed-off-by: Peter Bigot <[email protected]>
ioannisg pushed a commit that referenced this issue Sep 3, 2019
The current implementation does not return the low 32 bits of
k_uptime_get() as suggested by it's documentation; it returns the number
of milliseconds represented by the low 32-bits of the underlying system
clock.  The truncation before translation results in discontinuities at
every point where the system clock increments bit 33.

Reimplement it using the full-precision value, and update the
documentation to note that this variant has little value for
long-running applications.

Closes #18739.

Signed-off-by: Peter Bigot <[email protected]>
backporting bot pushed a commit that referenced this issue Sep 3, 2019
The current implementation does not return the low 32 bits of
k_uptime_get() as suggested by it's documentation; it returns the number
of milliseconds represented by the low 32-bits of the underlying system
clock.  The truncation before translation results in discontinuities at
every point where the system clock increments bit 33.

Reimplement it using the full-precision value, and update the
documentation to note that this variant has little value for
long-running applications.

Closes #18739.

Signed-off-by: Peter Bigot <[email protected]>
nashif pushed a commit that referenced this issue Sep 25, 2019
The current implementation does not return the low 32 bits of
k_uptime_get() as suggested by it's documentation; it returns the number
of milliseconds represented by the low 32-bits of the underlying system
clock.  The truncation before translation results in discontinuities at
every point where the system clock increments bit 33.

Reimplement it using the full-precision value, and update the
documentation to note that this variant has little value for
long-running applications.

Closes #18739.

Signed-off-by: Peter Bigot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug
Projects
None yet
7 participants