-
Notifications
You must be signed in to change notification settings - Fork 8.2k
sys: time_units: Increase range of z_tmcvt #41602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3e382e3 to
946709b
Compare
andyross
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems not unreasonable. Though regarding the linked bug... honestly doing math in terms of system uptime measured in (!) gigahertz is just going to lead to tears one way or another, whether or not the API does it optimally. Dance too close to a precision limit and you're going to be burned. The intended application of these utilities was for constructing delays to feed to kernel timeouts.
FWIW: the selection of the algorithm happens at compile time, but the operations still have to happen at runtime as "t" is input. So this is adding three operations[1] and effectively cutting performance in half in practice as I read the assembly output.
[1] Though libgcc has a divmod routine (and x86 has DIVMOD as an instruction) so in practice the worst penalty comes "for free" along with the already-existing quotient.
|
@andyross what about limiting this approach to |
|
I think that's reasonable too, you could even (with a little care) express that mask in units of real time via a kconfig, so you'd get a guarantee like "we use the fast form as long as conversions smaller than CONFIG_MAX_TIMEOUT_DAYS are representable without precision loss". Though it's worth remembering that anyone who selected CONFIG_TIMEOUT_64BIT is already willing to pay some overhead. People who really want to microoptimize timeouts and are willing to deal with precision and rollover at the app level already have a much-faster-still 32 bit representation. With care, you can even use that with GHz values (though not values representing system uptime, heh). |
|
Added kconfig option which target frequency threshold. I think that using 20bits ensures that milli/microseconds will be using fast algorithm and it's unlikely that it will be touched by anyone (unless someone converts 48h represented in nanoseconds). I would like to hide this somewhere but haven't found any suitable kernel section in menuconfig. Looking at menuconfig it might be worth creating |
b643210 to
78b59d6
Compare
|
Still looks great to me (though I still think doing this as a "maximum time that can be up-converted to the fastest defined clock" instead of a bitmask would be clever). |
But that would compare against value |
78b59d6 to
8f4c696
Compare
|
You hit the overflow in the expression from " So if your highest clock (cycles) is e.g. 24 Mhz, and your lowest clock (ticks) is 10 kHz, then the largest delay reliably convertible is 2^64/CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC/CONFIG_SYS_CLOCK_TICKS_PER_SEC ~= 77Msec, or 889.6 days. So I'm proposing that we have a CONFIG_SYS_CLOCK_MAX_TIMEOUT_DAYS value representing the maximum timeout value the application wants to compute without overflow. If (on this particular example system) this is <= 889, then on that system it's permissible to use the fast version of the algorithm. The math isn't trivial, but it's all computible at compile time. And the advantage is that the app gets a really obvious tunable to control this instead of a somewhat opaque bitmask. But again I don't see that this should block merge. We can always come back and optimize later. |
That was poorly phrased. Not the lowest clock in the time_units.h header (which would probably be ms, not ticks), but the lower of the two clocks in the specific conversion being generated. You apply that math in every conversion, so if you want to do "ms to ticks" you get the fast version every time and only pay the cost when you're trying to convert GHz cycles. |
8f4c696 to
5eb522f
Compare
|
Ok, i think i got that. Applied. Can you check? I used 365 days as default but if #41814 get it i guess that it can be reduced (to 30 days?). |
include/sys/time_units.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think you want a ULL suffix on those constants, otherwise the (kconfig * 24 * 3600) is computed in 32 bit precision and can overflow (for somewhat pathological, but still permissible, values of the tunable) before being widened by the multiplication by the (64 bit) from_hz.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Avoid result overflow due to intermediate product overflow. Algorithm was multiplying input value by target frequency before dividing it by source frequency. If target frequency was high (e.g. conversion to nanoseconds) it could easily lead to overflow even though final result would not overflow. Adjusting algorithm to avoid that. Note, that typically this code is resolved at compile time so it will not impact performance as long as it can be resolved. Signed-off-by: Krzysztof Chruscinski <[email protected]>
Add maximum timeout used for conversion to Kconfig. Option is used to determine which conversion algorithm to use: faster but overflowing earlier or slower without early overflow. Signed-off-by: Krzysztof Chruscinski <[email protected]>
5eb522f to
3fc4dd3
Compare
|
Reup the +1 for the third time. Honestly I thought it was fine as originally submitted, but I like this version much better. |
|
Next time, let's get a unit test as well. |
Prior to zephyrproject-rtos#41602, due to the ordering of operations (first mul, then div), an intermediate value would overflow, resulting in a time non-linearity. This test ensures that time rolls-over properly. Signed-off-by: Chris Friedt <[email protected]>
Prior to #41602, due to the ordering of operations (first mul, then div), an intermediate value would overflow, resulting in a time non-linearity. This test ensures that time rolls-over properly. Signed-off-by: Chris Friedt <[email protected]>
Prior to #41602, due to the ordering of operations (first mul, then div), an intermediate value would overflow, resulting in a time non-linearity. This test ensures that time rolls-over properly. Signed-off-by: Chris Friedt <[email protected]> (cherry picked from commit 74c9c0e)
Prior to #41602, due to the ordering of operations (first mul, then div), an intermediate value would overflow, resulting in a time non-linearity. This test ensures that time rolls-over properly. Signed-off-by: Chris Friedt <[email protected]> (cherry picked from commit 74c9c0e)
Prior to zephyrproject-rtos#41602, due to the ordering of operations (first mul, then div), an intermediate value would overflow, resulting in a time non-linearity. This test ensures that time rolls-over properly. Signed-off-by: Chris Friedt <[email protected]> (cherry picked from commit 74c9c0e)
Prior to #41602, due to the ordering of operations (first mul, then div), an intermediate value would overflow, resulting in a time non-linearity. This test ensures that time rolls-over properly. Signed-off-by: Chris Friedt <[email protected]> (cherry picked from commit 74c9c0e)
Prior to zephyrproject-rtos#41602, due to the ordering of operations (first mul, then div), an intermediate value would overflow, resulting in a time non-linearity. This test ensures that time rolls-over properly. Signed-off-by: Chris Friedt <[email protected]> (cherry picked from commit 74c9c0e)
Prior to zephyrproject-rtos#41602, due to the ordering of operations (first mul, then div), an intermediate value would overflow, resulting in a time non-linearity. This test ensures that time rolls-over properly. Signed-off-by: Chris Friedt <[email protected]> (cherry picked from commit 74c9c0e)
Avoid result overflow due to intermediate product overflow.
Algorithm was multiplying input value by target frequency
before dividing it by source frequency. If target frequency was
high (e.g. conversion to nanoseconds) it could easily lead to
overflow even though final result would not overflow. Adjusting
algorithm to avoid that.
Note, that typically this code is resolved at compile time so
it will not impact performance as long as it can be resolved.
Fixes #41111.
Signed-off-by: Krzysztof Chruscinski [email protected]