-
Notifications
You must be signed in to change notification settings - Fork 8.2k
arch: arm: cortex-m: Reduce ZLI latency by not disabling them in wrapper #91078
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| unsigned int key; | ||
|
|
||
| /* irq_lock() does what we want for this CPU */ | ||
| /* Disable all interrupts except ZLIs. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_arch_isr_direct_pm() could be called in ZLI and non-ZLI direct ISRs. A race will lead to pm_system_resume() not being called when it should be or corrupt the data accessed inside the function call, right?
On ARMv6, irq_lock() is nothing but disable all interrupts. On ARMv7, cpsid i needs to be used to disable all interrupts
Zephyr Controller calls ISR_DIRECT_PM() here:
| ISR_DIRECT_PM(); |
Does this PR mean, no ZLI ISR call ISR_DIRECT_PM() ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, ZLIs must not call any zephyr APIs since it breaks synchronisation :) pm_system_resume() is a zephyr API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be pedantic: Zephyr APIs are not guaranteed to work in a ZLI, and specifically as you point out anything that relies on synchronization is guaranteed not to work (or not to work reliably). But there's lots of good stuff in the Zephyr API that you can use as long as you know what you're doing. Like, Best Practice for communication with a ZLI almost certainly involves the Zephyr atomics API. We've just been too lazy/conservative to try to document where the boundary is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, IPC style communication with software IRQs like nordics SWI and shared memory area I think is the only truly safe way of syncing between ZLI and the zephyr kernel :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have never been able to call _arch_isr_direct_pm or pm_system_resume() safely from a ZLI. Starting with the fact that they read and modify kernel structures like _kernel.idle, this was never possible (at least safely). ZLIs can preempt the kernel any time so modifying kernel structures is a no-no. Remember: even though all interrupts were disabled prior to modifying kernel structs and calling pm_system_resume(), it was already too late! The ZLI could have preempted the kernel during an irq_lock() or sched_lock() execution, so you already had a race, regardless of global disabling of interrupts. Then if we go into pm_system_resume() it's even worse: tons of variables are being read and set there. In summary: it does not matter if you disable global interrupts or not, you can never call code from a ZLI that relies on irq_lock() or sched_lock() to prevent races.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nordic-krch agreed! Could you follow-up with #17301 so I can add the assert after?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be an assert to ensure that ZLI interrupt does not call things that should not be called in that context. For that we would need API for getting current priority. It was proposed long time ago but did not get in #17301
@nordic-krch Could you expand on that? asserts themselves are a zephyr API, so ZLIs are not even allowed to call those, how would you enforce this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if that is an issue we can add ZLI_ASSERT that will do while(1) or reset. The thing is that it will be fail immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That does present an issue though, ZLIs are meant to be "above" the zephyr kernel, a place where a user can do "whatever they want", so such a macro would have to be application specific I would think right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if it's just a loop you could simply disable interrupts and spin. This should work for all applications in principle? Then perhaps it should be overridable.
arch/arm/core/cortex_m/irq_manage.c
Outdated
|
|
||
| /* irq_lock() does what we want for this CPU */ | ||
| /* Disable all interrupts except ZLIs. */ | ||
| key = irq_lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't really matter in context, but while you're here it's probably best to swap this for arch_irq_lock(), which is the same thing on any uniprocessor architecture. But on SMP, "irq_lock()" is actually a complicated global spinlock and "arch_irq_lock()" is the call that locally masks interrupts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, did not know that. Will make the change.
andyross
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely sounds correct to me, modulo the inevitable app somewhere that turns out to actually care about this behavior (because of bad assumptions about interrupt priority, maybe). Even then we should probably have the fight, as this wasn't ever really justified.
The difference between __irq_disable() and irq_lock() is that the former essentially translates to `cpsid i`, whereas `irq_lock()` translates to setting BASEPRI (on cores with BASEPRI). This means that using irq_lock() does not disable zero-latency interrupts (ZLIs), which reduces the potential execution latency of ZLIs. In both isr_wrapper and _arch_isr_direct_pm() (which is just really an implementation of ISR_DIRECT_PM()), we were using __irq_disable() to disable all interrups, including ZLIs. But the code executed with interrupts disabled handles waking up from idle, and so must only be protected against regular interrupts being executed, not ZLIs, which should have no effect on the correct execution of the code. Signed-off-by: Carles Cufi <[email protected]>
d0347e0
|
@andyross switched to |
|
Hope the maintainer reviews use of ISR_DIRECT_PM in the repo, and reviews the change introduce by moving away from use of __irq_disable to using arch_irq_lock . Please also update documentation of ISR_DIRECT_PM and its documentation references like for instance here:
zephyr/include/zephyr/arch/arm/irq.h
Lines 224 to 232 in 6559c7f
| * The major differences with regular Zephyr interrupts are the following: | |
| * - Similar to direct interrupts, the call into the OS to exit power | |
| * management idle state is optional. Normal interrupts always do this | |
| * before the ISR is run, but with dynamic direct ones when and if it runs | |
| * is controlled by the placement of | |
| * a ISR_DIRECT_PM() macro, or omitted entirely. | |
| * - Similar to direct interrupts, scheduling decisions are optional. Unlike | |
| * direct interrupts, the decisions must be made at build time. | |
| * They are controlled by @param resch to this macro. |
I think the maintainers have already reviewed it. Regarding the doxygen update you requested, I did that in a separate PR: |
|
Related Zephyr Controller PR #91119. |
wearyzen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have much experience with ZLI but the PR and discussions make sense so LGTM.



The difference between
__irq_disable()andirq_lock()is that the former essentially translates tocpsid i, whereasirq_lock()translates to settingBASEPRI(on cores withBASEPRI). This means that usingirq_lock()does not disable zero-latency interrupts (ZLIs), which reduces the potential execution latency of ZLIs.In both
isr_wrapper()and_arch_isr_direct_pm()(which is just really an implementation ofISR_DIRECT_PM()), we were using__irq_disable()to disable all interrupts, including ZLIs. But the code executed with interrupts disabled handles waking up from idle, and so must only be protected against regular interrupts being executed, not ZLIs, which should have no effect on the correct execution of the code.