-
Notifications
You must be signed in to change notification settings - Fork 8.2k
Enforce COOP priorities for networking subsystem #8386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If user has set CONFIG_COOP_ENABLED, then set the main thread default priority to -1. This is needed in later patches to enforce this setting for networking code which requires that CONFIG_COOP_ENABLED=y and CONFIG_MAIN_THREAD_PRIORITY < 0. Signed-off-by: Jukka Rissanen <[email protected]>
The networking subsystem does not work properly if running in preemptive priority. So make sure that system workqueue priority is set <0 which means cooperative priority. Signed-off-by: Jukka Rissanen <[email protected]>
The networking subsystem requires cooperative priorities to be enabled so make sure CONFIG_COOP_ENABLED is set. Signed-off-by: Jukka Rissanen <[email protected]>
Make sure that networking subsystem cannot be compiled unless CONFIG_MAIN_THREAD_PRIORITY < 0. This will avoid hard to track bugs as network subsystem functions should not be called from preemptive thread. Signed-off-by: Jukka Rissanen <[email protected]>
| * and that the main thread priority is negative (cooperative). | ||
| */ | ||
| BUILD_ASSERT(CONFIG_MAIN_THREAD_PRIORITY < 0); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invalid configurations should be detected during configuration instead
of during compilation whenever possible.
I believe that adding the below at an appropriate net Kconfig location would achieve this.
# The networking subsystem requires the main thread to execute at
# a cooperative priority.
config MAIN_THREAD_PRIORITY
range -256 -1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see that this is going to do much anyway. The fact that the main thread is or is not preemptible says nothing about users of the network APIs which may or may not be running in the main thread. Not, for example, that ztest always spawns a new thread for every test, so none of the existing tests would be caught by this check if they happened to be hitting the network out of a preemptible thread.
I'd suggest a runtime __ASSERT(k_thread_prioroity_get(k_current_get() < 0) at the relevant API entry points.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that adding the below at an appropriate net Kconfig location would achieve this.
config MAIN_THREAD_PRIORITY
range -256 -1
This will give error if the default MAIN_THREAD_PRIORITY is 0. In which case user needs to add the proper value into prj.conf file. This PR tries to avoid this.
SebastianBoe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feedback inline.
| select NET_BUF | ||
| select POLL | ||
| select ENTROPY_GENERATOR | ||
| select COOP_ENABLED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this actually does anything useful.
You can test this by trying to come up with a situation where this select prevents an invalid configuration from occuring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No real complaints about the checking (except the note that checking only the main thread is insufficient).
But... why can't the network API be used in a preemptable context? I can understand doing tricks with the internal threads that rely on cooperative scheduling, but forcing every user to forego timeslicing and take a worst-case latency hit[1] just to use networking seems like a bug worth fixing.
[1] High priority threads need to wait for low priority ones to block. That's sometimes a good thing for latency if you're really careful, but in general it means that service for woken-up high priority threads will be delayed.
| * and that the main thread priority is negative (cooperative). | ||
| */ | ||
| BUILD_ASSERT(CONFIG_MAIN_THREAD_PRIORITY < 0); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see that this is going to do much anyway. The fact that the main thread is or is not preemptible says nothing about users of the network APIs which may or may not be running in the main thread. Not, for example, that ztest always spawns a new thread for every test, so none of the existing tests would be caught by this check if they happened to be hitting the network out of a preemptible thread.
I'd suggest a runtime __ASSERT(k_thread_prioroity_get(k_current_get() < 0) at the relevant API entry points.
|
FWIW: note that if you really want to disable preemption inside the network API hooks, you can use k_sched_lock() for that, which effectively elevates the current thread to cooperative status until released. |
I think this is due to legacy from the times of the nanokernel/microkernel architecture where we had everything run in the nanokernel as fibers. The same pattern applies to bluetooth. Not sure if there is anything requirements to be running those threads in coop mode. |
|
Currently the networking code expects not to be preempted unless it calls k_yield() or waits fifo etc. This requirement is coming from times when we had micro/nano kernel as @nashif mentioned. Lately there has been some reports like #8131 where system either crashes or start to behave weirdly if called from thread with preemptive priority. This PR is kind of useless in a way, as user can set the priority of the application thread as he wishes and we do not call k_sched_lock() anywhere in the code to overcome this. So we could abandon this PR and start to add calls to k_sched_lock() in places. I can investigate how big task it would be to do this to existing code. In near future we will have a clear separation between user / kernel space for networking APIs. The plan is to have BSD socket interface to separate user / kernel space. In this case we could simply add k_sched_lock() calls to socket interface calls which would solve the coop / preempt issues. I created this PR to help to mitigate immediate issues but perhaps it is not worth having these kind of workarounds if we have a proper fix coming soonish. Any opinions about this? |
pfalcon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that we should patch-enforce main thread priority like that. This would be a good patch for debugging with "[RFC][DNM]" (and indeed, I've been using it for few weeks), but I can't imagine this to be merged in the mainline. We should do it better. (k_sched_lock() is an obvious choice which was mentioned.)
andrewboie
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is definitely real, but this patch won't really fix it, as @andyross says there's no guarantee that the APIs are being called from the main thread or the system workqueue.
This is going to only work for a portion of all possible configurations, do not assume this is going to be the norm, userspace is an option and enabling it is hardware dependent. |
|
@jukkar did you consider moving to a preemptive model and deprecate the legacy behavior? |
This needs more investigation as it means implementing locking in the IP stack. We have some global resources in IP stack that need to be protected by locks. |
|
This was failed experiment so closing it. |
This is related to similar changes in #8377. The networking subsystem should not be accessed from preemptive thread so try to check this at compile time. Later we might need to add run-time checks but lets start with this atm.