Skip to content

[LTS 8.8] net/sched: sch_qfq: account for stab overhead in qfq_enqueue #271

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 27, 2025

Conversation

pvts-mat
Copy link
Contributor

[LTS 8.8]
CVE-2023-3611
VULN-6560

Problem

https://www.cve.org/CVERecord?id=CVE-2023-3611

An out-of-bounds write vulnerability in the Linux kernel's net/sched: sch_qfq component can be exploited to achieve local privilege escalation. The qfq_change_agg() function in net/sched/sch_qfq.c allows an out-of-bounds write because lmax is updated according to packet sizes without bounds checks.

Applicability

The vulnerability applies to the sch_qfq module which is enabled in ciqlts8_8:

grep CONFIG_NET_SCH_QFQ= configs/*

configs/kernel-aarch64-debug.config:CONFIG_NET_SCH_QFQ=m
configs/kernel-aarch64.config:CONFIG_NET_SCH_QFQ=m
configs/kernel-ppc64le-debug.config:CONFIG_NET_SCH_QFQ=m
configs/kernel-ppc64le.config:CONFIG_NET_SCH_QFQ=m
configs/kernel-s390x-debug.config:CONFIG_NET_SCH_QFQ=m
configs/kernel-s390x-zfcpdump.config:CONFIG_NET_SCH_QFQ=m
configs/kernel-s390x.config:CONFIG_NET_SCH_QFQ=m
configs/kernel-x86_64-debug.config:CONFIG_NET_SCH_QFQ=m
configs/kernel-x86_64.config:CONFIG_NET_SCH_QFQ=m

Solution

The mainline fix is provided in the commit 3e33708. The official backport to Linux 4.19 (closest to ciqlts8_8's 4.18) is given in the commit ee3bc82. Note that it's different from the mainline fix, as it doesn't use the introduced in the meantime (6f22c11) QFQ_MAX_LMAX constant, inlining its definition instead

3e33708:

net/sched/sch_qfq.c:116:

#define QFQ_MAX_LMAX		(1UL << QFQ_MTU_SHIFT)

net/sched/sch_qfq.c:387:

if (lmax > QFQ_MAX_LMAX)
	return -EINVAL;

ee3bc82:

net/sched/sch_qfq.c:393:

if (lmax > (1UL << QFQ_MTU_SHIFT))
	return -EINVAL;

The already applied fix for Rocky LTS 8.6 gets around it by first picking 2536989 (as 6f22c11) which introduces QFQ_MAX_LMAX and then the mainline fix 3e33708 (as fe72210):

git log --oneline -n 2 fe72210071638a9c4cedfa4a09629cf284fd9631

fe7221007 net/sched: sch_qfq: account for stab overhead in qfq_enqueue
6f22c114d net/sched: sch_qfq: refactor parsing of netlink parameters

git log -n 1 fe72210071638a9c4cedfa4a09629cf284fd9631 | grep -A 1 upstream-diff

upstream-diff Cherry-pick is clean however QFQ_MAX_LMAX is undeclared so
a prereq commit was needed

For ciqlts8_8's fix the stable's branch backport ee3bc82 was used directly as it applies without any conflicts or code changes and provides a simpler solution while remaining no less official than the mainlne fix.

kABI check: passed

DEBUG=1 CVE=CVE-2023-3611 ./ninja.sh _kabi_checked__$(uname -m)--test--ciqlts8_8-CVE-2023-3611

[1/2] Check ABI of kernel [ciqlts8_8-CVE-2023-3611]
++ uname -m
+ python3 /data/src/ctrliq-github/kernel-dist-git-el-8.8/SOURCES/check-kabi -k /data/src/ctrliq-github/kernel-dist-git-el-8.8/SOURCES/Module.kabi_x86_64 -s vms/x86_64--build--ciqlts8_8/build_files/kernel-src-tree-ciqlts8_8-CVE-2023-3611/Module.symvers
kABI check passed
+ touch state/kernels/ciqlts8_8-CVE-2023-3611/x86_64/kabi_checked

Boot test: passed

boot-test.log

Kselftests: passed relative

Coverage

Specific tests were skipped which proved to be unreliable in the past.

android, bpf (except test_progs, test_xsk.sh, test_progs-no_alu32, test_kmod.sh, test_sockmap), breakpoints, capabilities, cgroup, core, cpu-hotplug, cpufreq, drivers/net/bonding, drivers/net/team, efivarfs, exec, firmware, fpu, ftrace, futex, gpio, intel_pstate, ipc, kcmp, kexec, kvm, lib, livepatch, membarrier, memfd, memory-hotplug, mount, mqueue, net/forwarding (except mirror_gre_vlan_bridge_1q.sh, sch_ets.sh, ipip_hier_gre_keys.sh, sch_tbf_ets.sh, tc_actions.sh, mirror_gre_bridge_1d_vlan.sh, sch_tbf_prio.sh, sch_tbf_root.sh), net/mptcp (except simult_flows.sh), net (except xfrm_policy.sh, udpgro_fwd.sh, gro.sh, txtimestamp.sh, udpgso_bench.sh, ip_defrag.sh, reuseaddr_conflict, reuseport_addr_any.sh), netfilter (except nft_trans_stress.sh), nsfs, pstore, ptrace, rseq, sgx, sigaltstack, size, splice, static_keys, sync, sysctl, tc-testing, tdx, timens, timers (except raw_skew), tpm2, user, vm, x86, zram

Reference

kselftests–ciqlts8_8–run3.log
kselftests–ciqlts8_8–run2.log
kselftests–ciqlts8_8–run1.log

Patch

kselftests–ciqlts8_8-CVE-2023-3611–run3.log
kselftests–ciqlts8_8-CVE-2023-3611–run2.log
kselftests–ciqlts8_8-CVE-2023-3611–run1.log

Comparison

All test results are the same

./ktests.xsh diff -d kselftests*.log

Column    File
--------  ---------------------------------------------
Status0   kselftests--ciqlts8_8--run1.log
Status1   kselftests--ciqlts8_8--run2.log
Status2   kselftests--ciqlts8_8--run3.log
Status3   kselftests--ciqlts8_8-CVE-2023-3611--run1.log
Status4   kselftests--ciqlts8_8-CVE-2023-3611--run2.log
Status5   kselftests--ciqlts8_8-CVE-2023-3611--run3.log

Specific tests: skipped

@pvts-mat pvts-mat changed the title net/sched: sch_qfq: account for stab overhead in qfq_enqueue [LTS 8.8] net/sched: sch_qfq: account for stab overhead in qfq_enqueue May 19, 2025
Copy link
Collaborator

@bmastbergen bmastbergen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change itself looks good to me, and the justification for pulling the 4.19 stable version is fine. The only thing I don't know is what we want from a book keeping perspective in the commit log. @PlaidCat will any of our scripts be confused by two different 'commit hash' lines in the commit log for a single commit?

@PlaidCat
Copy link
Collaborator

PlaidCat commented May 21, 2025

The official backport to Linux 4.19 (closest to ciqlts8_8's 4.18)

To clarify this is only true sometimes as this kernel 8.8 has ~100k commits from upstream in it from 4.18 -> 6.x (I don't remember which off the top of my head kernel.org was up to when 8.8 was to start) so this is not always a good evaluation, and the primary reason we've defaulted to the Linus mainline.

WRT to tooling: Right now we don't use anything that look at the GKH (stable) kernel tree when it comes to automation, so unless both trees are cloned (or you just use stable's tree) you won't find that sha. I know there is another header line from stable that uses the old method of defining the reference to Linus's mainline tree but our tooling stops looking at the first blank line.

I think I'd prefer it to be something like

jira VULN-6560
cve CVE-2023-3611
commit-author Pedro Tammela <[email protected]>
commit 3e337087c3b5805fe0b8a46ba622a962880b5d64
upstream-diff used linux-stable LT-4.19 sha ee3bc829f9b4df96d208d58b654e400fa1f3b46c

<cherry-pick from linux-stable>

While its called out in the PR its not obvious where the actual code change comes from inside the commit log, so to discover it was from the LT-4.19 you'd have to come to the commit in github and then follow the PR links. If we were to ever migrate from GitHub to other systems PRs could become lost or no longer accurate so context as close to the code (ie in code or commit message) is the only way to prevent loss of context.
We could probably deduce this is from GHK stable but the rational is lost if the above isn't done correctly.

Note the git migration (and ticket systems) loss of data has happened to me at multiple companies so its not a theoretical.
Additionally you can look at the 2.6 seed for the kernel itself where all changes before that are lost to time.

@bmastbergen
Copy link
Collaborator

jira VULN-6560
cve CVE-2023-3611
commit-author Pedro Tammela <[email protected]>
commit 3e337087c3b5805fe0b8a46ba622a962880b5d64
upstream-diff used linux-stable LT-4.19 sha ee3bc829f9b4df96d208d58b654e400fa1f3b46c

<cherry-pick from linux-stable>

I like this ^^^ 👍

jira VULN-6560
cve CVE-2023-3611
commit-author Pedro Tammela <[email protected]>
commit 3e33708
upstream-diff used linux-stable LT-4.19 sha ee3bc82

commit 3e33708 upstream.

Lion says:
-------
In the QFQ scheduler a similar issue to CVE-2023-31436
persists.

Consider the following code in net/sched/sch_qfq.c:

static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                struct sk_buff **to_free)
{
     unsigned int len = qdisc_pkt_len(skb), gso_segs;

    // ...

     if (unlikely(cl->agg->lmax < len)) {
         pr_debug("qfq: increasing maxpkt from %u to %u for class %u",
              cl->agg->lmax, len, cl->common.classid);
         err = qfq_change_agg(sch, cl, cl->agg->class_weight, len);
         if (err) {
             cl->qstats.drops++;
             return qdisc_drop(skb, sch, to_free);
         }

    // ...

     }

Similarly to CVE-2023-31436, "lmax" is increased without any bounds
checks according to the packet length "len". Usually this would not
impose a problem because packet sizes are naturally limited.

This is however not the actual packet length, rather the
"qdisc_pkt_len(skb)" which might apply size transformations according to
"struct qdisc_size_table" as created by "qdisc_get_stab()" in
net/sched/sch_api.c if the TCA_STAB option was set when modifying the qdisc.

A user may choose virtually any size using such a table.

As a result the same issue as in CVE-2023-31436 can occur, allowing heap
out-of-bounds read / writes in the kmalloc-8192 cache.
-------

We can create the issue with the following commands:

tc qdisc add dev $DEV root handle 1: stab mtu 2048 tsize 512 mpu 0 \
overhead 999999999 linklayer ethernet qfq
tc class add dev $DEV parent 1: classid 1:1 htb rate 6mbit burst 15k
tc filter add dev $DEV parent 1: matchall classid 1:1
ping -I $DEV 1.1.1.2

This is caused by incorrectly assuming that qdisc_pkt_len() returns a
length within the QFQ_MIN_LMAX < len < QFQ_MAX_LMAX.

Fixes: 462dbc9 ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
	Reported-by: Lion <[email protected]>
	Reviewed-by: Eric Dumazet <[email protected]>
	Signed-off-by: Jamal Hadi Salim <[email protected]>
	Signed-off-by: Pedro Tammela <[email protected]>
	Reviewed-by: Simon Horman <[email protected]>
	Signed-off-by: Paolo Abeni <[email protected]>
	Signed-off-by: Shaoying Xu <[email protected]>
	Signed-off-by: Greg Kroah-Hartman <[email protected]>
(cherry picked from commit ee3bc82)
	Signed-off-by: Marcin Wcisło <[email protected]>
@pvts-mat pvts-mat force-pushed the ciqlts8_8-CVE-2023-3611 branch from a5ee20a to c83e32c Compare May 21, 2025 17:21
@pvts-mat
Copy link
Contributor Author

To clarify this is only true sometimes as this kernel 8.8 has ~100k commits from upstream in it from 4.18 -> 6.x (I don't remember which off the top of my head kernel.org was up to when 8.8 was to start) so this is not always a good evaluation, and the primary reason we've defaulted to the Linus mainline.

I just learned it clearly from the other PR #282

PRs could become lost or no longer accurate so context as close to the code (ie in code or commit message) is the only way to prevent loss of context.

Relying on GitHub PRs should definitely be avoided (and migrating from GH encouraged 👍)
I just thought we have all the official Linux trees to pick cherries from and deducing from GKH stable is assumed.

Copy link
Collaborator

@PlaidCat PlaidCat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change
:shipit:

Copy link

@thefossguy-ciq thefossguy-ciq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚤

@PlaidCat PlaidCat merged commit 286db31 into ctrliq:ciqlts8_8 May 27, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants