Skip to content

Conversation

@jukkar
Copy link
Member

@jukkar jukkar commented Jun 14, 2018

This is related to similar changes in #8377. The networking subsystem should not be accessed from preemptive thread so try to check this at compile time. Later we might need to add run-time checks but lets start with this atm.

jukkar added 4 commits June 14, 2018 17:38
If user has set CONFIG_COOP_ENABLED, then set the main thread
default priority to -1. This is needed in later patches to
enforce this setting for networking code which requires that
CONFIG_COOP_ENABLED=y and CONFIG_MAIN_THREAD_PRIORITY < 0.

Signed-off-by: Jukka Rissanen <[email protected]>
The networking subsystem does not work properly if running in
preemptive priority. So make sure that system workqueue priority
is set <0 which means cooperative priority.

Signed-off-by: Jukka Rissanen <[email protected]>
The networking subsystem requires cooperative priorities to be
enabled so make sure CONFIG_COOP_ENABLED is set.

Signed-off-by: Jukka Rissanen <[email protected]>
Make sure that networking subsystem cannot be compiled unless
CONFIG_MAIN_THREAD_PRIORITY < 0. This will avoid hard to track
bugs as network subsystem functions should not be called from
preemptive thread.

Signed-off-by: Jukka Rissanen <[email protected]>
* and that the main thread priority is negative (cooperative).
*/
BUILD_ASSERT(CONFIG_MAIN_THREAD_PRIORITY < 0);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalid configurations should be detected during configuration instead
of during compilation whenever possible.

I believe that adding the below at an appropriate net Kconfig location would achieve this.

# The networking subsystem requires the main thread to execute at
# a cooperative priority.
config MAIN_THREAD_PRIORITY
	range -256 -1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see that this is going to do much anyway. The fact that the main thread is or is not preemptible says nothing about users of the network APIs which may or may not be running in the main thread. Not, for example, that ztest always spawns a new thread for every test, so none of the existing tests would be caught by this check if they happened to be hitting the network out of a preemptible thread.

I'd suggest a runtime __ASSERT(k_thread_prioroity_get(k_current_get() < 0) at the relevant API entry points.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that adding the below at an appropriate net Kconfig location would achieve this.
config MAIN_THREAD_PRIORITY
range -256 -1

This will give error if the default MAIN_THREAD_PRIORITY is 0. In which case user needs to add the proper value into prj.conf file. This PR tries to avoid this.

Copy link
Contributor

@SebastianBoe SebastianBoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feedback inline.

select NET_BUF
select POLL
select ENTROPY_GENERATOR
select COOP_ENABLED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this actually does anything useful.

You can test this by trying to come up with a situation where this select prevents an invalid configuration from occuring.

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No real complaints about the checking (except the note that checking only the main thread is insufficient).

But... why can't the network API be used in a preemptable context? I can understand doing tricks with the internal threads that rely on cooperative scheduling, but forcing every user to forego timeslicing and take a worst-case latency hit[1] just to use networking seems like a bug worth fixing.

[1] High priority threads need to wait for low priority ones to block. That's sometimes a good thing for latency if you're really careful, but in general it means that service for woken-up high priority threads will be delayed.

* and that the main thread priority is negative (cooperative).
*/
BUILD_ASSERT(CONFIG_MAIN_THREAD_PRIORITY < 0);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see that this is going to do much anyway. The fact that the main thread is or is not preemptible says nothing about users of the network APIs which may or may not be running in the main thread. Not, for example, that ztest always spawns a new thread for every test, so none of the existing tests would be caught by this check if they happened to be hitting the network out of a preemptible thread.

I'd suggest a runtime __ASSERT(k_thread_prioroity_get(k_current_get() < 0) at the relevant API entry points.

@andyross
Copy link
Contributor

FWIW: note that if you really want to disable preemption inside the network API hooks, you can use k_sched_lock() for that, which effectively elevates the current thread to cooperative status until released.

@nashif
Copy link
Member

nashif commented Jun 14, 2018

But... why can't the network API be used in a preemptable context?

I think this is due to legacy from the times of the nanokernel/microkernel architecture where we had everything run in the nanokernel as fibers. The same pattern applies to bluetooth. Not sure if there is anything requirements to be running those threads in coop mode.

@jukkar
Copy link
Member Author

jukkar commented Jun 15, 2018

Currently the networking code expects not to be preempted unless it calls k_yield() or waits fifo etc. This requirement is coming from times when we had micro/nano kernel as @nashif mentioned. Lately there has been some reports like #8131 where system either crashes or start to behave weirdly if called from thread with preemptive priority.

This PR is kind of useless in a way, as user can set the priority of the application thread as he wishes and we do not call k_sched_lock() anywhere in the code to overcome this. So we could abandon this PR and start to add calls to k_sched_lock() in places. I can investigate how big task it would be to do this to existing code.

In near future we will have a clear separation between user / kernel space for networking APIs. The plan is to have BSD socket interface to separate user / kernel space. In this case we could simply add k_sched_lock() calls to socket interface calls which would solve the coop / preempt issues.

I created this PR to help to mitigate immediate issues but perhaps it is not worth having these kind of workarounds if we have a proper fix coming soonish. Any opinions about this?

Copy link
Contributor

@pfalcon pfalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that we should patch-enforce main thread priority like that. This would be a good patch for debugging with "[RFC][DNM]" (and indeed, I've been using it for few weeks), but I can't imagine this to be merged in the mainline. We should do it better. (k_sched_lock() is an obvious choice which was mentioned.)

@jukkar jukkar added the DNM This PR should not be merged (Do Not Merge) label Jun 15, 2018
Copy link
Contributor

@andrewboie andrewboie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is definitely real, but this patch won't really fix it, as @andyross says there's no guarantee that the APIs are being called from the main thread or the system workqueue.

@nashif
Copy link
Member

nashif commented Jun 28, 2018

The plan is to have BSD socket interface to separate user / kernel space.

This is going to only work for a portion of all possible configurations, do not assume this is going to be the norm, userspace is an option and enabling it is hardware dependent.

@nashif
Copy link
Member

nashif commented Jun 28, 2018

@jukkar did you consider moving to a preemptive model and deprecate the legacy behavior?

@jukkar
Copy link
Member Author

jukkar commented Jun 28, 2018

did you consider moving to a preemptive model and deprecate the legacy behavior?

This needs more investigation as it means implementing locking in the IP stack. We have some global resources in IP stack that need to be protected by locks.

@jukkar
Copy link
Member Author

jukkar commented Jul 26, 2018

This was failed experiment so closing it.

@jukkar jukkar closed this Jul 26, 2018
@jukkar jukkar deleted the net-priorities branch July 26, 2018 10:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: Networking DNM This PR should not be merged (Do Not Merge)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants