Description
The way _Swap() is implemented on ARM is nonatomic, which, while not incorrect, has turned out to be very surprising. It's done with a PendSV exception whose priority sits below that of hardware interrupts, so it's possible for a process to decide to context switch based on state it read atomically, and for an interrupt handler to then change that state before the context switch actually happens.
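To make the window concrete, here is a minimal host-side model of the sequence described above. It is a sketch, not Zephyr code: `thread_b_ready`, `pendsv_pending`, `request_swap()`, `hw_irq()` and `pendsv_handler()` are all hypothetical names standing in for scheduler state, the PendSV pend bit, and the three actors involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not Zephyr symbols. */
static bool thread_b_ready;  /* state the switch decision is based on */
static bool pendsv_pending;  /* models the PendSV pend bit */

/* The caller inspects the state and, if a switch is warranted, pends PendSV. */
static void request_swap(void)
{
    if (thread_b_ready) {
        pendsv_pending = true;
    }
}

/* A hardware interrupt that outranks PendSV and changes the state. */
static void hw_irq(void)
{
    thread_b_ready = false;
}

/* The deferred switch. Returns true if the decision it acts on is stale. */
static bool pendsv_handler(void)
{
    assert(pendsv_pending);  /* the handler only runs once it has been pended */
    pendsv_pending = false;
    return !thread_b_ready;
}

/* Runs one scenario. irq_in_window selects whether the interrupt lands
 * between the decision and the actual switch. */
bool swap_sees_stale_state(bool irq_in_window)
{
    thread_b_ready = true;
    pendsv_pending = false;

    request_swap();
    if (irq_in_window) {
        hw_irq();
    }
    return pendsv_handler();
}
```

In this model, `swap_sees_stale_state(true)` reports a stale decision and `swap_sees_stale_state(false)` does not. The only difference between the two runs is whether an interrupt fits into the gap between pending the exception and servicing it.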
This trick has resulted in three moderately excruciating bug hunts so far (cf. commits 41070c3 and 6c95daf, and the current work submitted in PR #12448). In all honesty, I doubt that's going to be the last of them.
There's now a framework that keeps the workarounds (which so far haven't been complicated) tidy, which should help some.
But basically: how wedded are we to this architecture? How hard would it be, and what would it break, to set the PendSV exception to the maximum interrupt priority so that it can't be interrupted like this? That would match ARM's behavior to that of other architectures (which do context switching with more typical musical-chairs register swaps and not in an exception handler), albeit at the cost of higher worst-case latencies for high-priority interrupts.
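The effect of that change can be sketched with the Cortex-M preemption rule, where a numerically lower priority value means a more urgent exception and equal priorities do not preempt each other. This is a simplified host-side model, not a patch: `can_preempt()` and `pendsv_interruptible()` are hypothetical helpers, priority grouping is ignored, and the fixed-priority exceptions (Reset, NMI, HardFault) would still outrank PendSV whatever it is set to.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative 8-bit priority values; real parts implement fewer bits. */
#define PRIO_HIGHEST 0x00u
#define PRIO_LOWEST  0xFFu

/* Cortex-M rule: an incoming exception preempts the active one only if
 * its priority value is strictly lower (that is, more urgent). */
bool can_preempt(uint8_t incoming_prio, uint8_t active_prio)
{
    return incoming_prio < active_prio;
}

/* Can an interrupt at irq_prio break into a running PendSV handler? */
bool pendsv_interruptible(uint8_t pendsv_prio, uint8_t irq_prio)
{
    return can_preempt(irq_prio, pendsv_prio);
}
```

With PendSV at `PRIO_LOWEST`, an interrupt at, say, `0x20` can cut in; with PendSV at `PRIO_HIGHEST`, no configurable-priority interrupt can. The flip side is that such an interrupt now has to wait for the switch to finish, which is the latency cost mentioned above.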
IMHO it would make for better reliability on ARM in the long term, and certainly would make my life easier.