Commit 60ef541

Author: Alexei Starovoitov (committed)
Parents: 93a83d0 + a857210

Merge branch 'bpf-replace-wq-users-and-add-wq_percpu-to-alloc_workqueue-users'
Marco Crivellari says:

====================
Below is a summary of a discussion about the Workqueue API and CPU isolation considerations. Details and more information are available here:

"workqueue: Always use wq_select_unbound_cpu() for WORK_CPU_UNBOUND."
https://lore.kernel.org/all/[email protected]/

=== Current situation: problems ===

Consider a nohz_full system with isolated CPUs: wq_unbound_cpumask is set to the housekeeping CPUs, while for !WQ_UNBOUND the local CPU is selected. This leads to different scenarios when a work item is scheduled on an isolated CPU, depending on whether the "delay" value is 0 or greater than 0:

schedule_delayed_work(, 0);

This is handled by __queue_work(), which queues the work item on the current local (isolated) CPU, while:

schedule_delayed_work(, 1);

moves the timer to a housekeeping CPU and schedules the work there.

Currently, if a user enqueues a work item using schedule_delayed_work(), the workqueue used is "system_wq" (a per-CPU wq), while queue_delayed_work() uses WORK_CPU_UNBOUND (used when no CPU is specified). The same applies to schedule_work(), which uses system_wq, and queue_work(), which again makes use of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed without refactoring the API.

=== Plan and future plans ===

This patchset is the first step of a refactoring needed to address the points above; in the long term it will also have a positive impact on CPU isolation, moving away from per-CPU workqueues in favor of an unbound model.

These are the main steps:

1) API refactoring (introduced by this patchset)
   - Make the system wq names clearer and more uniform, both per-CPU and unbound, to avoid any possible confusion about which one should be used.
   - Introduce WQ_PERCPU: this flag is the complement of WQ_UNBOUND, introduced in this patchset and used by all callers that do not currently use WQ_UNBOUND. WQ_UNBOUND will be removed in a future release cycle. Most users don't need to be per-CPU because they have no locality requirements; because of that, a future step will be to make "unbound" the default behavior.

2) Check who really needs to be per-CPU
   - Remove the WQ_PERCPU flag where it is not strictly required.

3) Add a new API (prefer local CPU)
   - As mentioned above, there are users that don't require local execution; despite that, local execution yields a performance gain. This new API will prefer local execution without requiring it.

=== Changes introduced by this series ===

1) [P 1-2] Replace uses of system_wq and system_unbound_wq

system_wq is a per-CPU workqueue, but its name does not make that clear. system_unbound_wq is to be used when locality is not required. Because of that, system_wq has been renamed to system_percpu_wq, and system_unbound_wq has been renamed to system_dfl_wq.

2) [P 3] Add WQ_PERCPU to remaining alloc_workqueue() users

Every alloc_workqueue() caller should use either WQ_PERCPU or WQ_UNBOUND. This is enforced with a warning if both, or neither, are present at the same time. WQ_UNBOUND will be removed in a future release cycle.

=== For Maintainers ===

There are prerequisites for this series, already merged in the master branch. The commits are:

128ea9f ("workqueue: Add system_percpu_wq and system_dfl_wq")
930c2ea ("workqueue: Add new WQ_PERCPU flag")
====================

Acked-by: Tejun Heo <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
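For illustration only, here is a minimal sketch of the caller-side change, assuming a tree that already carries the prerequisite commits 128ea9f and 930c2ea; the example_wq / example_*_work / example_init() names are hypothetical and exist purely for this sketch:

#include <linux/errno.h>
#include <linux/init.h>
#include <linux/workqueue.h>

/* Hypothetical module-local state, invented for this sketch. */
static struct workqueue_struct *example_wq;
static struct work_struct example_percpu_work;
static struct work_struct example_dfl_work;

static void example_work_fn(struct work_struct *work)
{
        /* work body elided */
}

static int __init example_init(void)
{
        INIT_WORK(&example_percpu_work, example_work_fn);
        INIT_WORK(&example_dfl_work, example_work_fn);

        /*
         * Before this series a flags value of 0 silently meant "per-CPU":
         *
         *      example_wq = alloc_workqueue("example_wq", 0, 1);
         *
         * After it, the per-CPU choice has to be spelled out, and exactly
         * one of WQ_PERCPU / WQ_UNBOUND is expected.
         */
        example_wq = alloc_workqueue("example_wq", WQ_PERCPU, 1);
        if (!example_wq)
                return -ENOMEM;

        /* Needs the local CPU: was queue_work(system_wq, ...). */
        queue_work(system_percpu_wq, &example_percpu_work);

        /* No locality requirement: was queue_work(system_unbound_wq, ...). */
        queue_work(system_dfl_wq, &example_dfl_work);

        return 0;
}

The diffs below follow this pattern: the cgroup-bpf destroy workqueue gains WQ_PERCPU, cpumap keeps its per-CPU system workqueue under the new system_percpu_wq name, and the map, timer, and memory-allocator teardown paths move from system_unbound_wq to system_dfl_wq.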

5 files changed: 8 additions & 7 deletions

kernel/bpf/cgroup.c

Lines changed: 3 additions & 2 deletions
@@ -27,14 +27,15 @@ EXPORT_SYMBOL(cgroup_bpf_enabled_key);
 /*
  * cgroup bpf destruction makes heavy use of work items and there can be a lot
  * of concurrent destructions. Use a separate workqueue so that cgroup bpf
- * destruction work items don't end up filling up max_active of system_wq
+ * destruction work items don't end up filling up max_active of system_percpu_wq
  * which may lead to deadlock.
  */
 static struct workqueue_struct *cgroup_bpf_destroy_wq;
 
 static int __init cgroup_bpf_wq_init(void)
 {
-        cgroup_bpf_destroy_wq = alloc_workqueue("cgroup_bpf_destroy", 0, 1);
+        cgroup_bpf_destroy_wq = alloc_workqueue("cgroup_bpf_destroy",
+                                                WQ_PERCPU, 1);
         if (!cgroup_bpf_destroy_wq)
                 panic("Failed to alloc workqueue for cgroup bpf destroy.\n");
         return 0;

kernel/bpf/cpumap.c

Lines changed: 1 addition & 1 deletion
@@ -550,7 +550,7 @@ static void __cpu_map_entry_replace(struct bpf_cpu_map *cmap,
         old_rcpu = unrcu_pointer(xchg(&cmap->cpu_map[key_cpu], RCU_INITIALIZER(rcpu)));
         if (old_rcpu) {
                 INIT_RCU_WORK(&old_rcpu->free_work, __cpu_map_entry_free);
-                queue_rcu_work(system_wq, &old_rcpu->free_work);
+                queue_rcu_work(system_percpu_wq, &old_rcpu->free_work);
         }
 }
 

kernel/bpf/helpers.c

Lines changed: 2 additions & 2 deletions
@@ -1594,7 +1594,7 @@ void bpf_timer_cancel_and_free(void *val)
          * timer callback.
          */
         if (this_cpu_read(hrtimer_running)) {
-                queue_work(system_unbound_wq, &t->cb.delete_work);
+                queue_work(system_dfl_wq, &t->cb.delete_work);
                 return;
         }
 
@@ -1607,7 +1607,7 @@ void bpf_timer_cancel_and_free(void *val)
                 if (hrtimer_try_to_cancel(&t->timer) >= 0)
                         kfree_rcu(t, cb.rcu);
                 else
-                        queue_work(system_unbound_wq, &t->cb.delete_work);
+                        queue_work(system_dfl_wq, &t->cb.delete_work);
         } else {
                 bpf_timer_delete_work(&t->cb.delete_work);
         }

kernel/bpf/memalloc.c

Lines changed: 1 addition & 1 deletion
@@ -736,7 +736,7 @@ static void destroy_mem_alloc(struct bpf_mem_alloc *ma, int rcu_in_progress)
         /* Defer barriers into worker to let the rest of map memory to be freed */
         memset(ma, 0, sizeof(*ma));
         INIT_WORK(&copy->work, free_mem_alloc_deferred);
-        queue_work(system_unbound_wq, &copy->work);
+        queue_work(system_dfl_wq, &copy->work);
 }
 
 void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)

kernel/bpf/syscall.c

Lines changed: 1 addition & 1 deletion
@@ -905,7 +905,7 @@ static void bpf_map_free_in_work(struct bpf_map *map)
         /* Avoid spawning kworkers, since they all might contend
          * for the same mutex like slab_mutex.
          */
-        queue_work(system_unbound_wq, &map->work);
+        queue_work(system_dfl_wq, &map->work);
 }
 
 static void bpf_map_free_rcu_gp(struct rcu_head *rcu)
