Improving resolution and usability of controlled delays in kernel APIs #17162

This issue outlines the current approach to *controlled delays* in Zephyr, identifies some weaknesses and functional gaps, and serves as a starting point for discussing the goals and requirements of changes that address those weaknesses.

## Existing Approach

Several kernel system APIs allow an application to specify a delay before some operation is initiated or terminated.  These APIs include (delay parameters are in bold type):
* `k_timer_start`(timer, **duration**, **period**)
* `k_timer_remaining_get`(timer) => **remaining** (nb: unsigned)
* `k_queue_get`(queue, **timeout**) (underlying `k_fifo_get`, `k_lifo_get`)
* `k_futex_wait`(futex, expected, **timeout**)
* `k_stack_pop`(stack, data, **timeout**)
* `k_delayed_work_submit_to_queue`(queue, work, **delay**)
* `k_delayed_work_remaining_get`(work) => **remaining** (nb: signed)
* `k_mutex_lock`(mutex, **timeout**)
* `k_sem_take`(sem, **timeout**)
* `k_{msgq,mbox,pipe}_[block_]{put,get}`(store, data, **timeout**)
* `k_mem_{slab,pool}_alloc`(slab, mem, **timeout**)
* `k_poll`(events, num, **timeout**)
* `k_sleep`(**ms**)
* `k_usleep`(**us**)
* `k_busy_wait`(**us**)
* `k_thread_deadline_set`(thread, **deadline**)
* `k_uptime_get`() and `k_uptime_delta`(**&ms**)

In current Zephyr most controlled delays are specified as a signed 32-bit integer counting milliseconds, with helper macros like `K_MINUTES(d)` to convert from coarser-grained units.  Exceptions are:
* `k_usleep` and `k_busy_wait`, which operate in microseconds;
* `k_thread_deadline_set`, which operates in cycles of the hardware clock;
* `k_uptime_get`, which operates in `s64_t` milliseconds clamped to tick increments.

Internally most delays are implemented through `struct _timeout`, which operates on `ticks` as defined by `SYS_CLOCK_TICKS_PER_SEC`.  The requested delay is converted to the smallest span of ticks that is not less than the requested delay, except that a duration of zero may be converted to a single tick in some cases.  An exception is `k_busy_wait` when `ARCH_HAS_CUSTOM_BUSY_WAIT` is enabled, which in-tree is currently done only by Nordic.

## Functional Gaps

For all APIs, but specifically `k_timer`, interrupts that occur between the point where the application supplies a relative delay and the point where the timer infrastructure inserts it into a processing queue complicate precise delay maintenance.  To reduce this complexity it is desirable in some cases to specify delays as absolute deadlines rather than relative offsets (#2811).

Use of milliseconds was tolerable because the tick duration has historically been 10 ms.  With the upcoming merge of #16782, which decreases the tick duration to 100 us, finer-grained specification will soon be needed for most if not all APIs.  Arguments have been made to go as fine as nanoseconds (#6498).

## Base Requirements and Questions

In a recent telecon @andyross proposed addressing these gaps by changing the way delays are specified, from signed 32-bit millisecond counts to another representation.

The following are positions and questions which I (@pabigot) have summarized and extended based on previous related discussions and experience.  All are open for debate.

* Existing code like `k_sem_take(&sem, K_MSEC(5))`, which uses helper macros to translate delay durations into timeout values, must be unaffected by any underlying API changes.
* It must be possible to specify timer delays either as relative offsets or as absolute deadlines.
* It is TBD whether other delays such as `k_poll` or `k_sleep` should allow for absolute deadlines.
* It is desirable for the application to be able to retrieve the absolute deadline assigned by the infrastructure when given a relative delay.  Note that returning time remaining does *not* satisfy this desire.
* Precise handling of the order of completion for delays with the same deadline should be defined.  The current implementation is inconsistent when timeouts are scheduled during callbacks or with deadlines that have passed (#12332).  The problem will be exacerbated by the ability to schedule at absolute deadlines, which may already have passed.
* We should revisit the semantics of passing a relative delay that has a non-positive value: in some cases converting it to an absolute deadline in the past may be preferable to the existing practice of treating it the same as zero in all cases.
* We need to determine the finest resolution that must be supported. Microseconds would handle many cases, but going to nanoseconds may be worth doing for future-proofing and because it has been requested in the past.
* We need to determine the maximum relative delay that must be expressible.  With `s32_t` milliseconds, that's currently 2147483.647 s (about 3.5 weeks).  As resolution increases, this maximum shrinks proportionally unless larger data types are used.
* We need to review the clock domain used to control these delays, in particular its resolution and span.  Currently this is the 64-bit system tick clock.  At this time the only public API to read this clock is `k_uptime_get*`, which converts its scale to milliseconds.  A new API should be added to access the full precision, for use in maintaining absolute deadlines.
* We might consider using POSIX [clock_gettime()](http://pubs.opengroup.org/onlinepubs/009695399/functions/clock_getres.html) as a model for reading clocks, and consider whether there is value in specifying delays as being expressed for a specific clock.
