Skip to content

Conversation

@jimparis
Copy link
Contributor

The timeout parameter is absolute, not relative. Addresses #17812

This is based on the implementation in esp-idf, with an inlined timercmp/timersub since we don't have those.

@jimparis jimparis requested a review from pfalcon as a code owner July 26, 2019 19:18
@zephyrbot zephyrbot added the area: POSIX POSIX API Library label Jul 26, 2019
@zephyrbot
Copy link

zephyrbot commented Jul 26, 2019

All checks are passing now.

Review history of this comment for details about previous failed status.
Note that some checks might have not completed yet.

@jimparis
Copy link
Contributor Author

FYI the checkpatch warning about positive errno is incorrect (we're following posix spec here, not zephyr convention)

@ioannisg ioannisg requested a review from andyross July 29, 2019 18:22
Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's clearly a real bug. Designwise, it strikes me that it would be a lot simpler to just convert the abstime parameter into a millisecond value, check vs. k_uptime_get() and do a subtraction instead of all that modular microseconds stuff.

Which would then need to be merged with the equivalent code in clock_gettime() which is doing the same thing.

Also check some of the new code in #17155, which adds features like absolute-valued kernel timeouts which speak to exactly this problem.

None of that is a reason not to merge this, but looking at this code this is definitely an area we're going to want to come back to and simplify soon.

Copy link
Contributor

@pfalcon pfalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimparis, Thanks for the patch! May I ask you to try to rework it to avoid extra timeval/timespec conversion and to make code "more clear":

  1. Please use clock_gettime().
  2. Please compute struct timespec difference once, then branch on it being negative or zero (if you strongly feel than the current way is better, please elaborate why).

@jimparis jimparis force-pushed the jim-posix branch 2 times, most recently from 43cad8b to 22228e3 Compare August 9, 2019 16:14
@zephyrbot zephyrbot added the area: API Changes to public APIs label Aug 9, 2019
pthread_cond_timedwait and pthread_mutex_timedlock were treating the
timeout as relative, but it's absolute.  Use existing helper to
convert.  Rename argument to clarify.

Rename timespec_to_timeoutms -> _timespec_to_timeoutms and include in
posix/time.h so that files don't need to declare the prototype
themselves.

Ensure that expired timeouts return ETIMEDOUT in
pthread_rwlock_timedrdlock, pthread_rwlock_timedwrlock, mq_timedsend,
and mq_timedreceive.

Signed-off-by: Jim Paris <[email protected]>
@jimparis
Copy link
Contributor Author

jimparis commented Aug 9, 2019

Rebased on master and rewritten. Turns out there was already a "timespec_to_timeoutms" function being used by mqueue.c and pthread_rwlock.c, although it's not clear that expired timeouts were handled correctly. Also, the relative/absolute bug was present in both pthread_cond_timedwait and pthread_mutex_timedlock. So I added a bunch of cleanups and fixes all around.

This needs closer reviews and testing. Of these changes I've only lightly tested the pthread_cond_timedwait, but I'm not going to have a lot of time to dive in this area. In general the POSIX compat stuff seems to need a lot of work.

@jimparis jimparis changed the title posix: Fix handling of timeout in pthread_cond_timedwait posix: Fix handling of timeouts Aug 9, 2019
@pfalcon pfalcon requested a review from andyross August 12, 2019 14:31
Copy link
Contributor

@pfalcon pfalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimparis, Thanks for the updates!

In general the POSIX compat stuff seems to need a lot of work.

That's absolutely true. Initial Pthreads-and-stuff implementation was hasted and contains atrocious bugs. On top of that, integration of POSIX subsys with the rest of Zephyr subsys'es left much to be desired, and was a subject of heated debate in #16626, #16621, #17353, etc.

So, cleaning up that mess on us, folks interested in real usage of POSIX subsys in Zephyr, not on those who did initial unreviewable code-drop. That's why I have to ask you to fix and obvious issue of mis-comparing truncated values as present in the old code, now that you touch it to underscore an internal func.

On the good news, for 2.0, initial, but noticeable chunk of POSIX subsys fixes is already in. And this could be a good addition to it. So, thanks again.

*/
int pthread_cond_timedwait(pthread_cond_t *cv, pthread_mutex_t *mut,
const struct timespec *to);
const struct timespec *abstime);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generic note: I personally (and I make this note as a POSIX subsys maintainer) consider that the patch made right way tries to minimal number of changes. Renaming argument name isn't necessary for the nature of this patch, ergo... (Not calling to rename it back, but please consider this point in mind for future changes.)

#define TIMER_ABSTIME 4
#endif

s64_t _timespec_to_timeoutms(const struct timespec *abstime);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generic note: not ideal place for this func (this is official POSIX header!), but as _ts_to_ms() below is already there, ok.

s32_t timeout;

timeout = (s32_t) timespec_to_timeoutms(abstime);
timeout = (s32_t) _timespec_to_timeoutms(abstime);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I see that this is left from the original code, but this can't be right, it just truncates s64_t to lower 32 bits, and interprets it as signed. Please do proper <= check on s64_t value. Extra points for checking if positive s64_t value won't fit into positive range of s32_t timeout, and return EINVAL in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worse, turns out that _timespec_to_timeoutms never returns negative value, in case of negative diff it return K_NO_WAIT instead.

timeout = (s32_t) timespec_to_timeoutms(abstime);
timeout = (s32_t) _timespec_to_timeoutms(abstime);
if (timeout <= 0)
return ETIMEDOUT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you'd need to follow the codestyle: braces are mandatory in all cases, even for a single statement. (Here and below.)

@pfalcon
Copy link
Contributor

pfalcon commented Aug 12, 2019

This needs closer reviews and testing. Of these changes I've only lightly tested the pthread_cond_timedwait, but I'm not going to have a lot of time to dive in this area.

Oh, and yeah, we won't win this battle, unless we start to add testcases for these things. Not asking it to be part of this patch, just setting the stage right ;-).

@pfalcon pfalcon added this to the v2.0.0 milestone Aug 13, 2019
@pfalcon
Copy link
Contributor

pfalcon commented Aug 19, 2019

ping @jimparis

@jimparis
Copy link
Contributor Author

It may be a while before I have time to work more on this, so if someone else wants to pick it up that would be 👍. Regarding tests, I agree, although it seems silly to create our own. Maybe use Bionic's?

@pfalcon
Copy link
Contributor

pfalcon commented Aug 27, 2019

if someone else wants to pick it up that would be +1

@jimparis, Ok, I'll pick up this then. My plan is: submit a new PR with my changes; commits will still have you as git author, I'll just add my sign-off to the commit message body. Let me know if you have any concerns.

@pfalcon
Copy link
Contributor

pfalcon commented Aug 27, 2019

Regarding tests, I agree, although it seems silly to create our own. Maybe use Bionic's?

That would seem like a good idea, but someone who'd pursue it, would drown in a bureaucratic process of arguing it. Bionic's plus is that it's least Apache2, so no licensing bureaucracy, but it's C++, and brings in yet another test harness lib. So, +1 and good luck to anyone who'd do that ;-).

timeout = (s32_t) _timespec_to_timeoutms(abstime);

if (write_lock_acquire(rwlock, timeout) != 0U) {
if (timeout <= 0 || write_lock_acquire(rwlock, timeout) != 0U) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct: http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_rwlock_timedwrlock.html

Under no circumstances shall the function fail with a timeout if the lock can be acquired immediately. The validity of the abstime parameter need not be checked if the lock can be immediately acquired.

@pfalcon
Copy link
Contributor

pfalcon commented Aug 27, 2019

Ok, on the closer inspection, at least some of the changes included in this PR aren't correct. I checked a couple of patched functions here, and POSIX gives clauses for them like (http://pubs.opengroup.org/onlinepubs/9699919799/functions/mq_send.html):

Under no circumstance shall the operation fail with a timeout if there is sufficient room in the queue to add the message immediately. The validity of the abstime parameter need not be checked when there is sufficient room in the queue.

The moral of the story: never put unrelated changes in one commit. (And the criteria of unrelatedness is simple: can a change be separated out? If yes, it's unrelated and should be separated. Yeah, we're far from that in Zephyr, so going to keep introducing regressions.)

What I'm going to do is to look at the original issue, #17812, and the minimal set of changes required to fix it.

@pfalcon
Copy link
Contributor

pfalcon commented Aug 27, 2019

Under no circumstance shall the operation fail with a timeout if there is sufficient room in the queue to add the message immediately. The validity of the abstime parameter need not be checked when there is sufficient room in the queue.

Funnily enough, there's no such a clause in the description of pthread_cond_timedwait():
http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_cond_timedwait.html . But instead, they have informative section on why "deadline" (i.e. absolute time) vs "timeout" model was used: http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_cond_timedwait.html#tag_16_420_08_02 . And it doesn't say "to purposely add confusing edge cases". So I think that it still makes sense to behave consistently with other functions re: handling already-passed deadlines. And that's in turn consistent with how timeout of 0 would be handled, were all these functions accepted a timeout: first, useful work should be tried to be done, only then timeout should be taken care of.

@pfalcon
Copy link
Contributor

pfalcon commented Aug 27, 2019

I now posted #18708 as an alternative, minimal implementation of a fix to #17812. As we're very close to the release deadline, let me close this PR, in which problems were identified, to help with release triaging.

@pfalcon pfalcon closed this Aug 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: API Changes to public APIs area: POSIX POSIX API Library

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants