Conversation

@bmastbergen (Collaborator)
jira VULN-1303
cve CVE-2024-36971

commit-author Eric Dumazet <[email protected]>
commit 92f1655aa2b2294d0b49925f3b875a634bd3b59e
upstream-diff This change breaks the kabi.  Use the RH_KABI_REPLACE
              macro to define the negative_advice function such that
              check-kabi will still pass.  From rh_kabi.h:

              "The RH_KABI_REPLACE* macros attempt to add the ability
               to use the '_new' element while preserving size
               alignment and kabi agreement with the '_orig' element."

              Also, this version of the kernel does not have rcu
              locking in ip6_negative_advice like the upstream does
              so the rcu_read_lock/rcu_read_unlock calls are not
              in this changeset.

__dst_negative_advice() does not enforce proper RCU rules when sk->dst_cache must be cleared, leading to possible UAF.

RCU rules are that we must first clear sk->sk_dst_cache, then call dst_release(old_dst).

Note that sk_dst_reset(sk) is implementing this protocol correctly, while __dst_negative_advice() uses the wrong order.

Given that ip6_negative_advice() has special logic against RTF_CACHE, this means each of the three ->negative_advice() existing methods must perform the sk_dst_reset() themselves.

Note the check against NULL dst is centralized in __dst_negative_advice();
there is no need to duplicate it in various callbacks.

Many thanks to Clement Lecigne for tracking this issue.

This old bug became visible after the blamed commit, using UDP sockets.

Fixes: a87cb3e48ee8 ("net: Facility to report route quality of connected sockets")
Reported-by: Clement Lecigne <[email protected]>
Diagnosed-by: Clement Lecigne <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Tom Herbert <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 92f1655aa2b2294d0b49925f3b875a634bd3b59e)
Signed-off-by: Brett Mastbergen <[email protected]>

Build log

build.log

Testing

The very limited kselftests were run before and after the change was applied:

selftests-before.log

selftests-after.log

brett@lycia ~/ciq/vuln-1303 % grep ^ok selftests-before.log | wc -l
2
brett@lycia ~/ciq/vuln-1303 % grep ^ok selftests-after.log | wc -l
2
brett@lycia ~/ciq/vuln-1303 %

@PlaidCat (Collaborator) left a comment:

:shipit:

@kerneltoast (Collaborator) left a comment:

I think a KABI break is required, otherwise an OOT module which really depends on the old KABI will introduce the CVE via its negative_advice callback.

@kerneltoast self-requested a review on March 11, 2025 at 16:46
@kerneltoast (Collaborator) left a comment:

Does indeed break the KABI as desired, sorry for the confusion on my part.

@bmastbergen (Collaborator, Author)

I think a KABI break is required, otherwise an OOT module which really depends on the old KABI will introduce the CVE via its negative_advice callback.

Just to sync up the Slack discussion here: the RH_KABI_REPLACE macro does force users to use the new call, but it also fools the check-kabi tool into thinking everything is OK. In reality, the kABI is broken by this change.

@bmastbergen merged commit be03ca1 into ciqcbr7_9 on Mar 11, 2025
@bmastbergen deleted the bmastbergen_ciqcbr7_9/VULN-1303 branch on March 11, 2025 at 16:55
github-actions bot pushed a commit that referenced this pull request May 26, 2025
apply_to_pte_range() enters the lazy MMU mode and then invokes the
kasan_populate_vmalloc_pte() callback on each page table walk iteration.
However, the callback can sleep when trying to allocate a single page,
e.g. if an architecture disables preemption on lazy MMU mode enter.

On s390, if arch_enter_lazy_mmu_mode() is made to call preempt_disable()
and arch_leave_lazy_mmu_mode() to call preempt_enable(), the following
crash occurs:

[    0.663336] BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:321
[    0.663348] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
[    0.663358] preempt_count: 1, expected: 0
[    0.663366] RCU nest depth: 0, expected: 0
[    0.663375] no locks held by kthreadd/2.
[    0.663383] Preemption disabled at:
[    0.663386] [<0002f3284cbb4eda>] apply_to_pte_range+0xfa/0x4a0
[    0.663405] CPU: 0 UID: 0 PID: 2 Comm: kthreadd Not tainted 6.15.0-rc5-gcc-kasan-00043-gd76bb1ebb558-dirty #162 PREEMPT
[    0.663408] Hardware name: IBM 3931 A01 701 (KVM/Linux)
[    0.663409] Call Trace:
[    0.663410]  [<0002f3284c385f58>] dump_stack_lvl+0xe8/0x140
[    0.663413]  [<0002f3284c507b9e>] __might_resched+0x66e/0x700
[    0.663415]  [<0002f3284cc4f6c0>] __alloc_frozen_pages_noprof+0x370/0x4b0
[    0.663419]  [<0002f3284ccc73c0>] alloc_pages_mpol+0x1a0/0x4a0
[    0.663421]  [<0002f3284ccc8518>] alloc_frozen_pages_noprof+0x88/0xc0
[    0.663424]  [<0002f3284ccc8572>] alloc_pages_noprof+0x22/0x120
[    0.663427]  [<0002f3284cc341ac>] get_free_pages_noprof+0x2c/0xc0
[    0.663429]  [<0002f3284cceba70>] kasan_populate_vmalloc_pte+0x50/0x120
[    0.663433]  [<0002f3284cbb4ef8>] apply_to_pte_range+0x118/0x4a0
[    0.663435]  [<0002f3284cbc7c14>] apply_to_pmd_range+0x194/0x3e0
[    0.663437]  [<0002f3284cbc99be>] __apply_to_page_range+0x2fe/0x7a0
[    0.663440]  [<0002f3284cbc9e88>] apply_to_page_range+0x28/0x40
[    0.663442]  [<0002f3284ccebf12>] kasan_populate_vmalloc+0x82/0xa0
[    0.663445]  [<0002f3284cc1578c>] alloc_vmap_area+0x34c/0xc10
[    0.663448]  [<0002f3284cc1c2a6>] __get_vm_area_node+0x186/0x2a0
[    0.663451]  [<0002f3284cc1e696>] __vmalloc_node_range_noprof+0x116/0x310
[    0.663454]  [<0002f3284cc1d950>] __vmalloc_node_noprof+0xd0/0x110
[    0.663457]  [<0002f3284c454b88>] alloc_thread_stack_node+0xf8/0x330
[    0.663460]  [<0002f3284c458d56>] dup_task_struct+0x66/0x4d0
[    0.663463]  [<0002f3284c45be90>] copy_process+0x280/0x4b90
[    0.663465]  [<0002f3284c460940>] kernel_clone+0xd0/0x4b0
[    0.663467]  [<0002f3284c46115e>] kernel_thread+0xbe/0xe0
[    0.663469]  [<0002f3284c4e440e>] kthreadd+0x50e/0x7f0
[    0.663472]  [<0002f3284c38c04a>] __ret_from_fork+0x8a/0xf0
[    0.663475]  [<0002f3284ed57ff2>] ret_from_fork+0xa/0x38

Instead of allocating single pages per PTE, bulk-allocate the shadow
memory prior to applying the kasan_populate_vmalloc_pte() callback to
the page range.

Link: https://lkml.kernel.org/r/c61d3560297c93ed044f0b1af085610353a06a58.1747316918.git.agordeev@linux.ibm.com
Fixes: 3c5c3cf ("kasan: support backing vmalloc space with real shadow memory")
Signed-off-by: Alexander Gordeev <[email protected]>
Suggested-by: Andrey Ryabinin <[email protected]>
Reviewed-by: Harry Yoo <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>