posix: Fix handling of timeouts #17813
All checks are passing now. Review history of this comment for details about previous failed status.
FYI the checkpatch warning about positive errno is incorrect (we're following the POSIX spec here, not the Zephyr convention).
andyross
left a comment
That's clearly a real bug. Designwise, it strikes me that it would be a lot simpler to just convert the abstime parameter into a millisecond value, check vs. k_uptime_get() and do a subtraction instead of all that modular microseconds stuff.
Which would then need to be merged with the equivalent code in clock_gettime() which is doing the same thing.
Also check some of the new code in #17155, which adds features like absolute-valued kernel timeouts which speak to exactly this problem.
None of that is a reason not to merge this, but looking at this code this is definitely an area we're going to want to come back to and simplify soon.
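A minimal sketch of the millisecond approach suggested above, not the code in this PR. The helper name `abstime_to_rel_ms` is hypothetical, and `now_ms` stands in for the value `k_uptime_get()` would return on target; it also assumes the deadline and the uptime share the same time base, which may not hold for CLOCK_REALTIME.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper, not Zephyr API: convert an absolute deadline into a
 * relative timeout in milliseconds. now_ms stands in for k_uptime_get().
 * Assumes abstime and now_ms are measured from the same epoch.
 */
static int64_t abstime_to_rel_ms(const struct timespec *abstime, int64_t now_ms)
{
	/* Sub-millisecond nanoseconds are truncated, not rounded up. */
	int64_t deadline_ms = (int64_t)abstime->tv_sec * 1000 +
			      abstime->tv_nsec / 1000000;

	/* A deadline at or before "now" has already expired. */
	if (deadline_ms <= now_ms) {
		return 0;
	}
	return deadline_ms - now_ms;
}
```

Returning 0 for an expired deadline mirrors the K_NO_WAIT behaviour discussed later in this thread; a caller that needs to tell "expired" from "zero wait" would have to check before calling.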
pfalcon
left a comment
@jimparis, Thanks for the patch! May I ask you to try to rework it to avoid the extra timeval/timespec conversion and to make the code "more clear":
- Please use clock_gettime().
- Please compute the struct timespec difference once, then branch on it being negative or zero (if you strongly feel that the current way is better, please elaborate why).
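One way the "compute the difference once, then branch" request could look, as a rough sketch only. `timespec_sub` and `deadline_expired` are hypothetical names, both inputs are assumed to be normalized (0 <= tv_nsec < 1e9), and `now` is passed in rather than read so the logic can be exercised off target; in real code it would come from clock_gettime().

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: return a - b, normalized so 0 <= tv_nsec < 1e9. */
static struct timespec timespec_sub(struct timespec a, struct timespec b)
{
	struct timespec d;

	d.tv_sec = a.tv_sec - b.tv_sec;
	d.tv_nsec = a.tv_nsec - b.tv_nsec;
	if (d.tv_nsec < 0) {
		/* Borrow one second to keep tv_nsec non-negative. */
		d.tv_sec -= 1;
		d.tv_nsec += 1000000000L;
	}
	return d;
}

/* True when the deadline is at or before "now" (difference <= 0). */
static bool deadline_expired(const struct timespec *abstime,
			     const struct timespec *now)
{
	struct timespec d = timespec_sub(*abstime, *now);

	/* After normalization, a negative difference shows up in tv_sec only. */
	return d.tv_sec < 0 || (d.tv_sec == 0 && d.tv_nsec == 0);
}
```

Because the difference is normalized, the sign test needs only tv_sec and the zero test needs both fields, so there is a single subtraction and a single branch.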
Force-pushed: 43cad8b to 22228e3
pthread_cond_timedwait and pthread_mutex_timedlock were treating the timeout as relative, but it's absolute. Use existing helper to convert. Rename argument to clarify.

Rename timespec_to_timeoutms -> _timespec_to_timeoutms and include in posix/time.h so that files don't need to declare the prototype themselves.

Ensure that expired timeouts return ETIMEDOUT in pthread_rwlock_timedrdlock, pthread_rwlock_timedwrlock, mq_timedsend, and mq_timedreceive.

Signed-off-by: Jim Paris <[email protected]>
Rebased on master and rewritten. Turns out there was already a "timespec_to_timeoutms" function being used by This needs closer reviews and testing. Of these changes I've only lightly tested the
pfalcon
left a comment
@jimparis, Thanks for the updates!
> In general the POSIX compat stuff seems to need a lot of work.
That's absolutely true. The initial Pthreads-and-stuff implementation was rushed and contains atrocious bugs. On top of that, the integration of the POSIX subsys with the rest of the Zephyr subsystems left much to be desired, and was the subject of heated debate in #16626, #16621, #17353, etc.
So, cleaning up that mess is on us, the folks interested in real usage of the POSIX subsys in Zephyr, not on those who did the initial unreviewable code-drop. That's why I have to ask you to fix an obvious issue of mis-comparing truncated values as present in the old code, now that you touch it to underscore an internal func.
The good news is that for 2.0, an initial but noticeable chunk of POSIX subsys fixes is already in. And this could be a good addition to it. So, thanks again.
  */
 int pthread_cond_timedwait(pthread_cond_t *cv, pthread_mutex_t *mut,
-			   const struct timespec *to);
+			   const struct timespec *abstime);
Generic note: I personally (and I make this note as a POSIX subsys maintainer) consider that a patch made the right way tries to make the minimal number of changes. Renaming the argument isn't necessary for the nature of this patch, ergo... (Not calling to rename it back, but please keep this point in mind for future changes.)
 #define TIMER_ABSTIME 4
 #endif

+s64_t _timespec_to_timeoutms(const struct timespec *abstime);
Generic note: not an ideal place for this func (this is an official POSIX header!), but as _ts_to_ms() below is already there, OK.
 s32_t timeout;

-timeout = (s32_t) timespec_to_timeoutms(abstime);
+timeout = (s32_t) _timespec_to_timeoutms(abstime);
Well, I see that this is left over from the original code, but this can't be right: it just truncates the s64_t to its lower 32 bits and interprets the result as signed. Please do a proper <= check on the s64_t value. Extra points for checking whether a positive s64_t value fits into the positive range of the s32_t timeout, and returning EINVAL if it doesn't.
Worse, it turns out that _timespec_to_timeoutms never returns a negative value; in case of a negative diff it returns K_NO_WAIT instead.
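To illustrate the check being asked for, here is a hedged sketch that validates the 64-bit value before narrowing it. `narrow_timeout_ms` is a hypothetical helper, and standard `int64_t`/`int32_t` are used in place of Zephyr's `s64_t`/`s32_t`. Since the conversion helper reportedly returns K_NO_WAIT (zero) rather than a negative value for an expired deadline, the `<= 0` test covers both cases.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: check a 64-bit millisecond timeout, then narrow it.
 * Returns 0 on success, ETIMEDOUT if already expired, EINVAL if the value
 * does not fit the 32-bit timeout type.
 */
static int narrow_timeout_ms(int64_t timeout_ms, int32_t *out)
{
	/* Compare on the full 64-bit value, before any cast. */
	if (timeout_ms <= 0) {
		return ETIMEDOUT;
	}
	if (timeout_ms > INT32_MAX) {
		return EINVAL;
	}
	*out = (int32_t)timeout_ms;
	return 0;
}
```

With a bare cast, a value such as 2^32 + 5 ms would silently become 5 ms; here it is rejected instead.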
-timeout = (s32_t) timespec_to_timeoutms(abstime);
+timeout = (s32_t) _timespec_to_timeoutms(abstime);
+if (timeout <= 0)
+	return ETIMEDOUT;
Well, you'd need to follow the code style: braces are mandatory in all cases, even for a single statement. (Here and below.)
Oh, and yeah, we won't win this battle unless we start adding test cases for these things. Not asking for it to be part of this patch, just setting the stage right ;-).
ping @jimparis
It may be a while before I have time to work more on this, so if someone else wants to pick it up that would be 👍. Regarding tests, I agree, although it seems silly to create our own. Maybe use Bionic's?
@jimparis, OK, I'll pick this up then. My plan is: submit a new PR with my changes; commits will still have you as git author, I'll just add my sign-off to the commit message body. Let me know if you have any concerns.
That would seem like a good idea, but anyone who pursued it would drown in the bureaucratic process of arguing for it. Bionic's plus is that it's at least Apache2, so no licensing bureaucracy, but it's C++, and brings in yet another test harness lib. So, +1 and good luck to anyone who'd do that ;-).
 timeout = (s32_t) _timespec_to_timeoutms(abstime);

-if (write_lock_acquire(rwlock, timeout) != 0U) {
+if (timeout <= 0 || write_lock_acquire(rwlock, timeout) != 0U) {
This is not correct: http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_rwlock_timedwrlock.html
Under no circumstances shall the function fail with a timeout if the lock can be acquired immediately. The validity of the abstime parameter need not be checked if the lock can be immediately acquired.
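The ordering the spec text implies can be sketched with a toy lock: try the immediate acquisition first, and only consult the timeout if that fails. Everything here (`struct toy_lock`, `toy_try_acquire`, `toy_wait_acquire`, `toy_timed_acquire`) is hypothetical and single-threaded, purely to show the control flow; it is not the Zephyr rwlock implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a lock; not thread-safe, illustration only. */
struct toy_lock {
	bool held;
};

/* Non-blocking attempt: returns true if the lock was acquired. */
static bool toy_try_acquire(struct toy_lock *l)
{
	if (l->held) {
		return false;
	}
	l->held = true;
	return true;
}

/* Hypothetical blocking wait; this toy never obtains the lock. */
static int toy_wait_acquire(struct toy_lock *l, int64_t timeout_ms)
{
	(void)l;
	(void)timeout_ms;
	return ETIMEDOUT;
}

static int toy_timed_acquire(struct toy_lock *l, int64_t timeout_ms)
{
	/* Immediate acquisition wins, even if the deadline has passed. */
	if (toy_try_acquire(l)) {
		return 0;
	}
	/* Only now does an expired deadline turn into ETIMEDOUT. */
	if (timeout_ms <= 0) {
		return ETIMEDOUT;
	}
	return toy_wait_acquire(l, timeout_ms);
}
```

Compared with `timeout <= 0 || acquire(...)`, the only change is the order of the two checks, which is exactly what the quoted clause constrains.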
OK, on closer inspection, at least some of the changes included in this PR aren't correct. I checked a couple of the patched functions here, and POSIX gives clauses for them like (http://pubs.opengroup.org/onlinepubs/9699919799/functions/mq_send.html):
The moral of the story: never put unrelated changes in one commit. (And the criterion of unrelatedness is simple: can a change be separated out? If yes, it's unrelated and should be separated. Yeah, we're far from that in Zephyr, so we're going to keep introducing regressions.) What I'm going to do is look at the original issue, #17812, and the minimal set of changes required to fix it.
Funnily enough, there's no such clause in the description of pthread_cond_timedwait():
The timeout parameter is absolute, not relative. Addresses #17812
This is based on the implementation in esp-idf, with an inlined
timercmp/timersub since we don't have those.