[LTS 8.8] net/sched: sch_qfq: account for stab overhead in qfq_enqueue #271

pvts-mat · 2025-05-19T16:41:46Z

[LTS 8.8]
CVE-2023-3611
VULN-6560

Problem

https://www.cve.org/CVERecord?id=CVE-2023-3611

An out-of-bounds write vulnerability in the Linux kernel's net/sched: sch_qfq component can be exploited to achieve local privilege escalation. The qfq_change_agg() function in net/sched/sch_qfq.c allows an out-of-bounds write because lmax is updated according to packet sizes without bounds checks.
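As we understand it, the hazard is that lmax feeds an index calculation into a fixed-size array, so an unbounded lmax becomes an out-of-bounds index. Below is a hypothetical userspace model of that pattern. It is not the kernel's qfq_calc_index(), and MODEL_NUM_SLOTS, model_slot_index(), model_index_in_bounds() and model_store() are invented for illustration. The slot index is floor(log2(lmax)). That index stays inside a 17-entry table while lmax is at most 1 << 16, but it runs past the table for a length near 10^9.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical table size for the model: valid indices are 0..16. */
#define MODEL_NUM_SLOTS 17u

/* floor(log2(lmax)) for lmax > 0; returns 0 for lmax == 0. */
static unsigned int model_slot_index(unsigned long lmax)
{
	unsigned int idx = 0;

	while (lmax >>= 1)
		idx++;
	return idx;
}

/* True when table[model_slot_index(lmax)] is a valid element. */
static bool model_index_in_bounds(unsigned long lmax)
{
	return model_slot_index(lmax) < MODEL_NUM_SLOTS;
}

/* Write into the model table. The assert stands in for a bounds check:
 * without any check, a large lmax would write past the end of the table. */
static void model_store(unsigned int table[MODEL_NUM_SLOTS], unsigned long lmax,
			unsigned int value)
{
	unsigned int idx = model_slot_index(lmax);

	assert(idx < MODEL_NUM_SLOTS);
	table[idx] = value;
}
```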

Applicability

The vulnerability applies to the sch_qfq module, which is enabled in ciqlts8_8 (built as a module in every config):

grep CONFIG_NET_SCH_QFQ= configs/*

configs/kernel-aarch64-debug.config:CONFIG_NET_SCH_QFQ=m
configs/kernel-aarch64.config:CONFIG_NET_SCH_QFQ=m
configs/kernel-ppc64le-debug.config:CONFIG_NET_SCH_QFQ=m
configs/kernel-ppc64le.config:CONFIG_NET_SCH_QFQ=m
configs/kernel-s390x-debug.config:CONFIG_NET_SCH_QFQ=m
configs/kernel-s390x-zfcpdump.config:CONFIG_NET_SCH_QFQ=m
configs/kernel-s390x.config:CONFIG_NET_SCH_QFQ=m
configs/kernel-x86_64-debug.config:CONFIG_NET_SCH_QFQ=m
configs/kernel-x86_64.config:CONFIG_NET_SCH_QFQ=m

Solution

The mainline fix is commit 3e33708. The official backport to Linux 4.19 (closest to ciqlts8_8's 4.18) is commit ee3bc82. Note that it differs from the mainline fix: it doesn't use the QFQ_MAX_LMAX constant, which was introduced upstream in the meantime by 2536989 (picked as 6f22c11 for Rocky LTS 8.6). Instead, it inlines the constant's definition.

3e33708:

net/sched/sch_qfq.c:116:

#define QFQ_MAX_LMAX		(1UL << QFQ_MTU_SHIFT)

net/sched/sch_qfq.c:387:

if (lmax > QFQ_MAX_LMAX)
	return -EINVAL;

ee3bc82:

net/sched/sch_qfq.c:393:

if (lmax > (1UL << QFQ_MTU_SHIFT))
	return -EINVAL;

The fix already applied for Rocky LTS 8.6 works around the missing QFQ_MAX_LMAX in two steps. It first cherry-picks 2536989 (as 6f22c11), which introduces QFQ_MAX_LMAX. It then cherry-picks the mainline fix 3e33708 (as fe72210):

git log --oneline -n 2 fe72210071638a9c4cedfa4a09629cf284fd9631

fe7221007 net/sched: sch_qfq: account for stab overhead in qfq_enqueue
6f22c114d net/sched: sch_qfq: refactor parsing of netlink parameters

git log -n 1 fe72210071638a9c4cedfa4a09629cf284fd9631 | grep -A 1 upstream-diff

upstream-diff Cherry-pick is clean however QFQ_MAX_LMAX is undeclared so
a prereq commit was needed

For ciqlts8_8's fix, the stable-branch backport ee3bc82 was used directly. It applies without any conflicts or code changes and provides a simpler solution, while being no less official than the mainline fix.

kABI check: passed

DEBUG=1 CVE=CVE-2023-3611 ./ninja.sh _kabi_checked__$(uname -m)--test--ciqlts8_8-CVE-2023-3611

[1/2] Check ABI of kernel [ciqlts8_8-CVE-2023-3611]
++ uname -m
+ python3 /data/src/ctrliq-github/kernel-dist-git-el-8.8/SOURCES/check-kabi -k /data/src/ctrliq-github/kernel-dist-git-el-8.8/SOURCES/Module.kabi_x86_64 -s vms/x86_64--build--ciqlts8_8/build_files/kernel-src-tree-ciqlts8_8-CVE-2023-3611/Module.symvers
kABI check passed
+ touch state/kernels/ciqlts8_8-CVE-2023-3611/x86_64/kabi_checked

Boot test: passed

boot-test.log

Kselftests: passed relative

Coverage

Specific tests that have proved unreliable in the past were skipped.

android, bpf (except test_progs, test_xsk.sh, test_progs-no_alu32, test_kmod.sh, test_sockmap), breakpoints, capabilities, cgroup, core, cpu-hotplug, cpufreq, drivers/net/bonding, drivers/net/team, efivarfs, exec, firmware, fpu, ftrace, futex, gpio, intel_pstate, ipc, kcmp, kexec, kvm, lib, livepatch, membarrier, memfd, memory-hotplug, mount, mqueue, net/forwarding (except mirror_gre_vlan_bridge_1q.sh, sch_ets.sh, ipip_hier_gre_keys.sh, sch_tbf_ets.sh, tc_actions.sh, mirror_gre_bridge_1d_vlan.sh, sch_tbf_prio.sh, sch_tbf_root.sh), net/mptcp (except simult_flows.sh), net (except xfrm_policy.sh, udpgro_fwd.sh, gro.sh, txtimestamp.sh, udpgso_bench.sh, ip_defrag.sh, reuseaddr_conflict, reuseport_addr_any.sh), netfilter (except nft_trans_stress.sh), nsfs, pstore, ptrace, rseq, sgx, sigaltstack, size, splice, static_keys, sync, sysctl, tc-testing, tdx, timens, timers (except raw_skew), tpm2, user, vm, x86, zram

Reference

kselftests--ciqlts8_8--run3.log
kselftests--ciqlts8_8--run2.log
kselftests--ciqlts8_8--run1.log

Patch

kselftests--ciqlts8_8-CVE-2023-3611--run3.log
kselftests--ciqlts8_8-CVE-2023-3611--run2.log
kselftests--ciqlts8_8-CVE-2023-3611--run1.log

Comparison

All test results are the same

./ktests.xsh diff -d kselftests*.log

Column    File
--------  ---------------------------------------------
Status0   kselftests--ciqlts8_8--run1.log
Status1   kselftests--ciqlts8_8--run2.log
Status2   kselftests--ciqlts8_8--run3.log
Status3   kselftests--ciqlts8_8-CVE-2023-3611--run1.log
Status4   kselftests--ciqlts8_8-CVE-2023-3611--run2.log
Status5   kselftests--ciqlts8_8-CVE-2023-3611--run3.log

Specific tests: skipped

bmastbergen

The change itself looks good to me, and the justification for pulling the 4.19 stable version is fine. The only thing I'm unsure about is what we want in the commit log from a bookkeeping perspective. @PlaidCat, will any of our scripts be confused by two different 'commit hash' lines in the commit log of a single commit?

PlaidCat · 2025-05-21T14:22:53Z

The official backport to Linux 4.19 (closest to ciqlts8_8's 4.18)

To clarify, this is only true sometimes. This 8.8 kernel has ~100k commits from upstream in it, spanning 4.18 -> 6.x (I don't remember off the top of my head which version kernel.org was up to when 8.8 started). So this is not always a good evaluation, and that is the primary reason we've defaulted to Linus's mainline.

With regard to tooling: right now, nothing in our automation looks at the GKH (stable) kernel tree. Unless both trees are cloned (or you just use stable's tree), you won't find that sha. I know stable adds another header line that references Linus's mainline tree the old way, but our tooling stops looking at the first blank line.

I think I'd prefer it to be something like

jira VULN-6560
cve CVE-2023-3611
commit-author Pedro Tammela <[email protected]>
commit 3e337087c3b5805fe0b8a46ba622a962880b5d64
upstream-diff used linux-stable LT-4.19 sha ee3bc829f9b4df96d208d58b654e400fa1f3b46c

<cherry-pick from linux-stable>

While it's called out in the PR, the commit log doesn't make it obvious where the actual code change comes from. To discover that it was taken from LT-4.19, you'd have to find the commit on GitHub and then follow the PR links. If we were ever to migrate from GitHub to other systems, PRs could become lost or no longer accurate, so context as close to the code (i.e. in code or commit message) is the only way to prevent loss of context.
We could probably deduce that this came from GKH stable, but the rationale is lost if the above isn't done correctly.

Note that loss of data during git (and ticket system) migrations has happened to me at multiple companies, so it's not theoretical.
Additionally, look at the kernel itself: its git history starts from the 2.6 seed, and all changes before that are lost to time.
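PlaidCat notes that the tooling reads header lines only up to the first blank line. Under that rule, the proposed header keeps both the mainline commit and the linux-stable sha visible, while stable's own "commit ... upstream." line, which comes after the blank line, is ignored. Below is a hypothetical C sketch of such a lookup. It is not CIQ's actual tooling, and header_value() is an invented name. It assumes each header line has the form "key value".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copy the value of header `key` (e.g. "commit" or
 * "upstream-diff") into `out`, scanning only the lines before the first
 * blank line of the commit message. Returns 1 if found, 0 otherwise. */
static int header_value(const char *msg, const char *key, char *out, size_t out_len)
{
	size_t key_len = strlen(key);
	const char *line = msg;

	assert(out != NULL && out_len > 0);

	/* An empty line (or the end of the string) ends the header block. */
	while (*line != '\0' && *line != '\n') {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		/* Match "key value"; "commit-author" does not match "commit". */
		if (len > key_len && strncmp(line, key, key_len) == 0 &&
		    line[key_len] == ' ') {
			size_t vlen = len - key_len - 1;

			if (vlen >= out_len)
				vlen = out_len - 1;	/* truncate to fit */
			memcpy(out, line + key_len + 1, vlen);
			out[vlen] = '\0';
			return 1;
		}
		if (!eol)
			break;
		line = eol + 1;
	}
	return 0;
}
```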

bmastbergen · 2025-05-21T14:39:18Z

jira VULN-6560
cve CVE-2023-3611
commit-author Pedro Tammela <[email protected]>
commit 3e337087c3b5805fe0b8a46ba622a962880b5d64
upstream-diff used linux-stable LT-4.19 sha ee3bc829f9b4df96d208d58b654e400fa1f3b46c

<cherry-pick from linux-stable>

I like this ^^^ 👍

jira VULN-6560
cve CVE-2023-3611
commit-author Pedro Tammela <[email protected]>
commit 3e33708
upstream-diff used linux-stable LT-4.19 sha ee3bc82

commit 3e33708 upstream.

Lion says:
-------
In the QFQ scheduler a similar issue to CVE-2023-31436 persists.

Consider the following code in net/sched/sch_qfq.c:

static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
		struct sk_buff **to_free)
{
	unsigned int len = qdisc_pkt_len(skb), gso_segs;

	// ...

	if (unlikely(cl->agg->lmax < len)) {
		pr_debug("qfq: increasing maxpkt from %u to %u for class %u",
			 cl->agg->lmax, len, cl->common.classid);
		err = qfq_change_agg(sch, cl, cl->agg->class_weight, len);
		if (err) {
			cl->qstats.drops++;
			return qdisc_drop(skb, sch, to_free);
		}

	// ...

	}

Similarly to CVE-2023-31436, "lmax" is increased without any bounds
checks according to the packet length "len". Usually this would not
impose a problem because packet sizes are naturally limited.

This is however not the actual packet length, rather the
"qdisc_pkt_len(skb)" which might apply size transformations according to
"struct qdisc_size_table" as created by "qdisc_get_stab()" in
net/sched/sch_api.c if the TCA_STAB option was set when modifying the qdisc.

A user may choose virtually any size using such a table.

As a result the same issue as in CVE-2023-31436 can occur, allowing heap
out-of-bounds read / writes in the kmalloc-8192 cache.
-------

We can create the issue with the following commands:

tc qdisc add dev $DEV root handle 1: stab mtu 2048 tsize 512 mpu 0 \
overhead 999999999 linklayer ethernet qfq
tc class add dev $DEV parent 1: classid 1:1 htb rate 6mbit burst 15k
tc filter add dev $DEV parent 1: matchall classid 1:1
ping -I $DEV 1.1.1.2

This is caused by incorrectly assuming that qdisc_pkt_len() returns a
length within the QFQ_MIN_LMAX < len < QFQ_MAX_LMAX.

Fixes: 462dbc9 ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Reported-by: Lion <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: Jamal Hadi Salim <[email protected]>
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Shaoying Xu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit ee3bc82)
Signed-off-by: Marcin Wcisło <[email protected]>
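The commit message's key point is that qdisc_pkt_len() reflects the size table, not the wire length. The simplified model below applies only the per-packet overhead step of that transformation. The real computation also maps the length through the table's cells, and that step is omitted here. model_stab_pkt_len() and model_exceeds_lmax() are hypothetical names, and the 1UL << 16 bound assumes QFQ_MTU_SHIFT is 16. With the reproducer's overhead of 999999999, even a small packet lands far above the bound.

```c
#include <assert.h>

/* Assumption: QFQ_MAX_LMAX is 1UL << QFQ_MTU_SHIFT with QFQ_MTU_SHIFT == 16. */
#define MODEL_QFQ_MAX_LMAX (1UL << 16)
static_assert(MODEL_QFQ_MAX_LMAX == 65536UL, "model assumes a 64 KiB lmax bound");

/* Simplified model of the length qdisc_pkt_len() reports when a size table
 * is attached: only the configured per-packet overhead is added (the cell
 * lookup is omitted). 64-bit arithmetic keeps the model free of overflow. */
static long long model_stab_pkt_len(unsigned int wire_len, int overhead)
{
	long long len = (long long)wire_len + overhead;

	return len < 1 ? 1 : len;	/* clamp to at least one byte */
}

/* Nonzero if the modeled length would exceed the bound added by the fix. */
static int model_exceeds_lmax(unsigned int wire_len, int overhead)
{
	return model_stab_pkt_len(wire_len, overhead) > (long long)MODEL_QFQ_MAX_LMAX;
}
```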

pvts-mat · 2025-05-21T17:26:54Z

To clarify, this is only true sometimes. This 8.8 kernel has ~100k commits from upstream in it, spanning 4.18 -> 6.x (I don't remember off the top of my head which version kernel.org was up to when 8.8 started). So this is not always a good evaluation, and that is the primary reason we've defaulted to Linus's mainline.

I've just learned this clearly from the other PR, #282.

PRs could become lost or no longer accurate so context as close to the code (ie in code or commit message) is the only way to prevent loss of context.

Relying on GitHub PRs should definitely be avoided (and migrating from GH encouraged 👍)
I had assumed that all the official Linux trees were valid sources to cherry-pick from, and that a GKH stable origin would be deduced.

PlaidCat

Thanks for the change

thefossguy-ciq

🚤

pvts-mat changed the title ~~net/sched: sch_qfq: account for stab overhead in qfq_enqueue~~ [LTS 8.8] net/sched: sch_qfq: account for stab overhead in qfq_enqueue May 19, 2025

PlaidCat requested review from kerneltoast, PlaidCat, bmastbergen and thefossguy-ciq May 20, 2025 15:44

bmastbergen reviewed May 20, 2025

PlaidCat mentioned this pull request May 21, 2025

ciq-cherry-pick should detect GKH linux-stable shas and appropreately set commit header ctrliq/kernel-src-tree-tools#21

Open

pvts-mat force-pushed the ciqlts8_8-CVE-2023-3611 branch from a5ee20a to c83e32c May 21, 2025 17:21

PlaidCat approved these changes May 22, 2025

thefossguy-ciq approved these changes May 26, 2025

PlaidCat merged commit 286db31 into ctrliq:ciqlts8_8 May 27, 2025
2 checks passed
