Commit eb28342

Lai Jiangshan authored and htejun committed
workqueue: fix possible pool stall bug in wq_unbind_fn()
Since multiple pools per cpu have been introduced, wq_unbind_fn() has a
subtle bug which may theoretically stall work item processing.  The
problem is two-fold.

* wq_unbind_fn() depends on the worker executing wq_unbind_fn() itself
  to start unbound chain execution, which works fine when there was
  only a single pool.  With multiple pools, only the pool which is
  running wq_unbind_fn() - the highpri one - is guaranteed to have such
  a kick-off.  The other pool could stall when its busy workers block.

* The current code sets WORKER_UNBIND / POOL_DISASSOCIATED on the two
  pools in succession without initiating work execution in between.
  Because setting the flags requires grabbing assoc_mutex, which is
  held while new workers are created, this could lead to stalls if a
  pool's manager is waiting for the previous pool's work items to
  release memory.  This is almost purely theoretical, though.

Update wq_unbind_fn() such that it sets WORKER_UNBIND /
POOL_DISASSOCIATED, goes over schedule(), and explicitly kicks off
execution for a pool before moving on to the next one.

tj: Updated comments and description.

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Cc: [email protected]
1 parent 6dbe51c commit eb28342
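
For orientation, the per-pool loop of wq_unbind_fn() after this patch looks roughly like the sketch below. This is a minimal sketch assembled from the hunk further down plus the commit description; the local variable setup is assumed, and the unbind prelude inside the loop is only paraphrased as a comment, since those lines are unchanged and not part of the diff.

static void wq_unbind_fn(struct work_struct *work)
{
	int cpu = smp_processor_id();
	struct worker_pool *pool;

	for_each_std_worker_pool(pool, cpu) {
		/*
		 * Prelude (unchanged, paraphrased from the commit message):
		 * take pool->assoc_mutex and pool->lock, flag the pool's
		 * workers WORKER_UNBOUND and the pool POOL_DISASSOCIATED,
		 * then drop both locks.
		 */

		/* Cross rq->lock so sched callbacks see WORKER_UNBOUND. */
		schedule();

		/* Concurrency management is now off for this pool. */
		atomic_set(&pool->nr_running, 0);

		/*
		 * New in this patch: kick this pool before moving on to the
		 * next one, so its pending work items cannot stall behind a
		 * blocked busy worker.
		 */
		spin_lock_irq(&pool->lock);
		wake_up_worker(pool);
		spin_unlock_irq(&pool->lock);
	}
}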

File tree: 1 file changed (+25 −19)

kernel/workqueue.c

Lines changed: 25 additions & 19 deletions
@@ -3446,28 +3446,34 @@ static void wq_unbind_fn(struct work_struct *work)
 
 		spin_unlock_irq(&pool->lock);
 		mutex_unlock(&pool->assoc_mutex);
-	}
 
-	/*
-	 * Call schedule() so that we cross rq->lock and thus can guarantee
-	 * sched callbacks see the %WORKER_UNBOUND flag.  This is necessary
-	 * as scheduler callbacks may be invoked from other cpus.
-	 */
-	schedule();
+		/*
+		 * Call schedule() so that we cross rq->lock and thus can
+		 * guarantee sched callbacks see the %WORKER_UNBOUND flag.
+		 * This is necessary as scheduler callbacks may be invoked
+		 * from other cpus.
+		 */
+		schedule();
 
-	/*
-	 * Sched callbacks are disabled now.  Zap nr_running.  After this,
-	 * nr_running stays zero and need_more_worker() and keep_working()
-	 * are always true as long as the worklist is not empty.  Pools on
-	 * @cpu now behave as unbound (in terms of concurrency management)
-	 * pools which are served by workers tied to the CPU.
-	 *
-	 * On return from this function, the current worker would trigger
-	 * unbound chain execution of pending work items if other workers
-	 * didn't already.
-	 */
-	for_each_std_worker_pool(pool, cpu)
+		/*
+		 * Sched callbacks are disabled now.  Zap nr_running.
+		 * After this, nr_running stays zero and need_more_worker()
+		 * and keep_working() are always true as long as the
+		 * worklist is not empty.  This pool now behaves as an
+		 * unbound (in terms of concurrency management) pool which
+		 * are served by workers tied to the pool.
+		 */
 		atomic_set(&pool->nr_running, 0);
+
+		/*
+		 * With concurrency management just turned off, a busy
+		 * worker blocking could lead to lengthy stalls.  Kick off
+		 * unbound chain execution of currently pending work items.
+		 */
+		spin_lock_irq(&pool->lock);
+		wake_up_worker(pool);
+		spin_unlock_irq(&pool->lock);
+	}
 }
 
 /*
