Skip to content

Conversation

@nordic-krch
Copy link
Contributor

Avoid result overflow due to intermediate product overflow.
Algorithm was multiplying input value by target frequency
before dividing it by source frequency. If target frequency was
high (e.g. conversion to nanoseconds) it could easily lead to
overflow even though final result would not overflow. Adjusting
algorithm to avoid that.

Note, that typically this code is resolved at compile time so
it will not impact performance as long as it can be resolved.

Fixes #41111.

Signed-off-by: Krzysztof Chruscinski [email protected]

@github-actions github-actions bot added the area: API Changes to public APIs label Jan 5, 2022
@zephyrbot zephyrbot added the area: Base OS Base OS Library (lib/os) label Jan 5, 2022
@zephyrbot zephyrbot requested review from andyross and dcpleung January 5, 2022 14:22
Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems not unreasonable. Though regarding the linked bug... honestly doing math in terms of system uptime measured in (!) gigahertz is just going to lead to tears one way or another, whether or not the API does it optimally. Dance too close to a precision limit and you're going to be burned. The intended application of these utilities was for constructing delays to feed to kernel timeouts.

FWIW: the selection of the algorithm happens at compile time, but the operations still have to happen at runtime as "t" is input. So this is adding three operations[1] and effectively cutting performance in half in practice as I read the assembly output.

[1] Though libgcc has a divmod routine (and x86 has DIVMOD as an instruction) so in practice the worst penalty comes "for free" along with the already-existing quotient.

@nordic-krch
Copy link
Contributor Author

@andyross what about limiting this approach to to_hz < BIT_MASK(x) exceeding arbitrary value (e.g.x=20 bits so 1mhz fits in)? That would reduce computation for typical conversions in us,ms range where overflow is unlikely.

@andyross
Copy link
Contributor

I think that's reasonable too, you could even (with a little care) express that mask in units of real time via a kconfig, so you'd get a guarantee like "we use the fast form as long as conversions smaller than CONFIG_MAX_TIMEOUT_DAYS are representable without precision loss".

Though it's worth remembering that anyone who selected CONFIG_TIMEOUT_64BIT is already willing to pay some overhead. People who really want to microoptimize timeouts and are willing to deal with precision and rollover at the app level already have a much-faster-still 32 bit representation. With care, you can even use that with GHz values (though not values representing system uptime, heh).

@nordic-krch
Copy link
Contributor Author

Added kconfig option which target frequency threshold. I think that using 20bits ensures that milli/microseconds will be using fast algorithm and it's unlikely that it will be touched by anyone (unless someone converts 48h represented in nanoseconds).

I would like to hide this somewhere but haven't found any suitable kernel section in menuconfig. Looking at menuconfig it might be worth creating System clock options menu and put there all clock related stuff from main menu.

@nordic-krch nordic-krch force-pushed the fix_z_tmcvt branch 2 times, most recently from b643210 to 78b59d6 Compare January 12, 2022 11:01
@andyross
Copy link
Contributor

Still looks great to me (though I still think doing this as a "maximum time that can be up-converted to the fastest defined clock" instead of a bitmask would be clever).

@nordic-krch
Copy link
Contributor Author

@andyross

I still think doing this as a "maximum time that can be up-converted to the fastest defined clock" instead of a bitmask would be clever

But that would compare against value t and not to_hz. In that case both algorithms would be compiled in when t is not constant. I don't fully understand what is fastest defined clock here?

@github-actions github-actions bot added the area: Test Framework Issues related not to a particular test, but to the framework instead label Jan 13, 2022
@andyross
Copy link
Contributor

You hit the overflow in the expression from "t * to_hz", the to_hz value is one of the list of specified clocks in the system. So it only overflows if t > 2^64/to_hz. And t is a unit of time in some other clock.

So if your highest clock (cycles) is e.g. 24 Mhz, and your lowest clock (ticks) is 10 kHz, then the largest delay reliably convertible is 2^64/CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC/CONFIG_SYS_CLOCK_TICKS_PER_SEC ~= 77Msec, or 889.6 days.

So I'm proposing that we have a CONFIG_SYS_CLOCK_MAX_TIMEOUT_DAYS value representing the maximum timeout value the application wants to compute without overflow. If (on this particular example system) this is <= 889, then on that system it's permissible to use the fast version of the algorithm.

The math isn't trivial, but it's all computible at compile time. And the advantage is that the app gets a really obvious tunable to control this instead of a somewhat opaque bitmask.

But again I don't see that this should block merge. We can always come back and optimize later.

@andyross
Copy link
Contributor

and your lowest clock (ticks) is

That was poorly phrased. Not the lowest clock in the time_units.h header (which would probably be ms, not ticks), but the lower of the two clocks in the specific conversion being generated. You apply that math in every conversion, so if you want to do "ms to ticks" you get the fast version every time and only pay the cost when you're trying to convert GHz cycles.

@nordic-krch
Copy link
Contributor Author

Ok, i think i got that. Applied. Can you check? I used 365 days as default but if #41814 get it i guess that it can be reduced (to 30 days?).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think you want a ULL suffix on those constants, otherwise the (kconfig * 24 * 3600) is computed in 32 bit precision and can overflow (for somewhat pathological, but still permissible, values of the tunable) before being widened by the multiplication by the (64 bit) from_hz.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Avoid result overflow due to intermediate product overflow.
Algorithm was multiplying input value by target frequency
before dividing it by source frequency. If target frequency was
high (e.g. conversion to nanoseconds) it could easily lead to
overflow even though final result would not overflow. Adjusting
algorithm to avoid that.

Note, that typically this code is resolved at compile time so
it will not impact performance as long as it can be resolved.

Signed-off-by: Krzysztof Chruscinski <[email protected]>
Add maximum timeout used for conversion to Kconfig. Option is used
to determine which conversion algorithm to use: faster but overflowing
earlier or slower without early overflow.

Signed-off-by: Krzysztof Chruscinski <[email protected]>
@andyross
Copy link
Contributor

Reup the +1 for the third time. Honestly I thought it was fine as originally submitted, but I like this version much better.

@cfriedt
Copy link
Member

cfriedt commented Dec 7, 2022

Next time, let's get a unit test as well.

cfriedt pushed a commit to cfriedt/zephyr that referenced this pull request Dec 15, 2022
Prior to zephyrproject-rtos#41602, due to the ordering of operations (first mul,
then div), an intermediate value would overflow, resulting in
a time non-linearity.

This test ensures that time rolls-over properly.

Signed-off-by: Chris Friedt <[email protected]>
fabiobaltieri pushed a commit that referenced this pull request Dec 19, 2022
Prior to #41602, due to the ordering of operations (first mul,
then div), an intermediate value would overflow, resulting in
a time non-linearity.

This test ensures that time rolls-over properly.

Signed-off-by: Chris Friedt <[email protected]>
zephyrbot pushed a commit that referenced this pull request Dec 19, 2022
Prior to #41602, due to the ordering of operations (first mul,
then div), an intermediate value would overflow, resulting in
a time non-linearity.

This test ensures that time rolls-over properly.

Signed-off-by: Chris Friedt <[email protected]>
(cherry picked from commit 74c9c0e)
cfriedt pushed a commit that referenced this pull request Dec 21, 2022
Prior to #41602, due to the ordering of operations (first mul,
then div), an intermediate value would overflow, resulting in
a time non-linearity.

This test ensures that time rolls-over properly.

Signed-off-by: Chris Friedt <[email protected]>
(cherry picked from commit 74c9c0e)
cfriedt pushed a commit to cfriedt/zephyr that referenced this pull request Apr 7, 2023
Prior to zephyrproject-rtos#41602, due to the ordering of operations (first mul,
then div), an intermediate value would overflow, resulting in
a time non-linearity.

This test ensures that time rolls-over properly.

Signed-off-by: Chris Friedt <[email protected]>
(cherry picked from commit 74c9c0e)
cfriedt pushed a commit that referenced this pull request Apr 7, 2023
Prior to #41602, due to the ordering of operations (first mul,
then div), an intermediate value would overflow, resulting in
a time non-linearity.

This test ensures that time rolls-over properly.

Signed-off-by: Chris Friedt <[email protected]>
(cherry picked from commit 74c9c0e)
ghost pushed a commit to Taggr-AB/zephyr that referenced this pull request May 2, 2023
Prior to zephyrproject-rtos#41602, due to the ordering of operations (first mul,
then div), an intermediate value would overflow, resulting in
a time non-linearity.

This test ensures that time rolls-over properly.

Signed-off-by: Chris Friedt <[email protected]>
(cherry picked from commit 74c9c0e)
efra-mx-aqua pushed a commit to efra-mx/zephyr that referenced this pull request Jun 13, 2023
Prior to zephyrproject-rtos#41602, due to the ordering of operations (first mul,
then div), an intermediate value would overflow, resulting in
a time non-linearity.

This test ensures that time rolls-over properly.

Signed-off-by: Chris Friedt <[email protected]>
(cherry picked from commit 74c9c0e)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: API Changes to public APIs area: Base OS Base OS Library (lib/os) area: Kernel area: Test Framework Issues related not to a particular test, but to the framework instead

Projects

None yet

Development

Successfully merging this pull request may close these issues.

uint64 overflow in z_tmcvt() function

6 participants