Commit af19487

CmdrMoozy authored and akpm00 committed
mm: make PTE_MARKER_SWAPIN_ERROR more general
Patch series "add UFFDIO_POISON to simulate memory poisoning with UFFD", v4.

This series adds a new userfaultfd feature, UFFDIO_POISON. See commit 4 for a detailed description of the feature.

This patch (of 8):

Future patches will reuse PTE_MARKER_SWAPIN_ERROR to implement UFFDIO_POISON, so make some various preparations for that:

First, rename it to just PTE_MARKER_POISONED. The "SWAPIN" can be confusing since we're going to re-use it for something not really related to swap. This can be particularly confusing for things like hugetlbfs, which doesn't support swap whatsoever. Also rename some various helper functions.

Next, fix pte marker copying for hugetlbfs. Previously, it would WARN on seeing a PTE_MARKER_SWAPIN_ERROR, since hugetlbfs doesn't support swap. But, since we're going to re-use it, we want it to go ahead and copy it just like non-hugetlbfs memory does today. Since the code to do this is more complicated now, pull it out into a helper which can be re-used in both places. While we're at it, also make it slightly more explicit in its handling of e.g. uffd wp markers.

For non-hugetlbfs page faults, instead of returning VM_FAULT_SIGBUS for an error entry, return VM_FAULT_HWPOISON. For most cases this change doesn't matter, e.g. a userspace program would receive a SIGBUS either way. But for UFFDIO_POISON, this change will let KVM guests get an MCE out of the box, instead of giving a SIGBUS to the hypervisor and requiring it to somehow inject an MCE.

Finally, for hugetlbfs faults, handle PTE_MARKER_POISONED, and return VM_FAULT_HWPOISON_LARGE in such cases. Note that this can't happen today because the lack of swap support means we'll never end up with such a PTE anyway, but this behavior will be needed once such entries *can* show up via UFFDIO_POISON.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Axel Rasmussen <[email protected]>
Acked-by: Peter Xu <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Brian Geffon <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Gaosheng Cui <[email protected]>
Cc: Huang, Ying <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: Jan Alexander Steffens (heftig) <[email protected]>
Cc: Jiaqi Yan <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kefeng Wang <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Suleiman Souhlal <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: T.J. Alumbaugh <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: ZhangPeng <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
1 parent 60b1e24 commit af19487

8 files changed, +65 -28 lines changed

include/linux/mm_inline.h

Lines changed: 19 additions & 0 deletions
@@ -523,6 +523,25 @@ static inline bool mm_tlb_flush_nested(struct mm_struct *mm)
 	return atomic_read(&mm->tlb_flush_pending) > 1;
 }
 
+/*
+ * Computes the pte marker to copy from the given source entry into dst_vma.
+ * If no marker should be copied, returns 0.
+ * The caller should insert a new pte created with make_pte_marker().
+ */
+static inline pte_marker copy_pte_marker(
+		swp_entry_t entry, struct vm_area_struct *dst_vma)
+{
+	pte_marker srcm = pte_marker_get(entry);
+	/* Always copy error entries. */
+	pte_marker dstm = srcm & PTE_MARKER_POISONED;
+
+	/* Only copy PTE markers if UFFD register matches. */
+	if ((srcm & PTE_MARKER_UFFD_WP) && userfaultfd_wp(dst_vma))
+		dstm |= PTE_MARKER_UFFD_WP;
+
+	return dstm;
+}
+
 /*
  * If this pte is wr-protected by uffd-wp in any form, arm the special pte to
  * replace a none pte. NOTE! This should only be called when *pte is already
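
The copy rule in copy_pte_marker() can be exercised outside the kernel with a small userspace model. The sketch below is only an illustration: the MODEL_* constants and model_copy_pte_marker() are hypothetical names that mirror the patch, and the boolean dst_uffd_wp stands in for userfaultfd_wp(dst_vma).

```c
#include <stdbool.h>

/* Userspace model only; none of this is kernel code. */
typedef unsigned long pte_marker;

#define MODEL_MARKER_UFFD_WP	(1UL << 0)	/* mirrors PTE_MARKER_UFFD_WP */
#define MODEL_MARKER_POISONED	(1UL << 1)	/* mirrors PTE_MARKER_POISONED */

/*
 * Hypothetical stand-in for copy_pte_marker(). dst_uffd_wp plays the role
 * of userfaultfd_wp(dst_vma). Returns 0 when no marker should be copied.
 */
static pte_marker model_copy_pte_marker(pte_marker srcm, bool dst_uffd_wp)
{
	/* The poisoned bit is always carried over to the destination. */
	pte_marker dstm = srcm & MODEL_MARKER_POISONED;

	/* The uffd-wp bit is carried over only if the destination uses uffd-wp. */
	if ((srcm & MODEL_MARKER_UFFD_WP) && dst_uffd_wp)
		dstm |= MODEL_MARKER_UFFD_WP;

	return dstm;
}
```

In this model, a source marker with both bits set copies only the poisoned bit into a destination without uffd-wp, and a source marker with only the uffd-wp bit yields 0 there, so the caller would install nothing.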

include/linux/swapops.h

Lines changed: 10 additions & 5 deletions
@@ -393,7 +393,12 @@ static inline bool is_migration_entry_dirty(swp_entry_t entry)
 typedef unsigned long pte_marker;
 
 #define PTE_MARKER_UFFD_WP BIT(0)
-#define PTE_MARKER_SWAPIN_ERROR BIT(1)
+/*
+ * "Poisoned" here is meant in the very general sense of "future accesses are
+ * invalid", instead of referring very specifically to hardware memory errors.
+ * This marker is meant to represent any of various different causes of this.
+ */
+#define PTE_MARKER_POISONED BIT(1)
 #define PTE_MARKER_MASK (BIT(2) - 1)
 
 static inline swp_entry_t make_pte_marker_entry(pte_marker marker)
@@ -421,15 +426,15 @@ static inline pte_t make_pte_marker(pte_marker marker)
 	return swp_entry_to_pte(make_pte_marker_entry(marker));
 }
 
-static inline swp_entry_t make_swapin_error_entry(void)
+static inline swp_entry_t make_poisoned_swp_entry(void)
 {
-	return make_pte_marker_entry(PTE_MARKER_SWAPIN_ERROR);
+	return make_pte_marker_entry(PTE_MARKER_POISONED);
 }
 
-static inline int is_swapin_error_entry(swp_entry_t entry)
+static inline int is_poisoned_swp_entry(swp_entry_t entry)
 {
 	return is_pte_marker_entry(entry) &&
-	    (pte_marker_get(entry) & PTE_MARKER_SWAPIN_ERROR);
+	    (pte_marker_get(entry) & PTE_MARKER_POISONED);
 }
 
 /*

mm/hugetlb.c

Lines changed: 21 additions & 11 deletions
@@ -34,6 +34,7 @@
 #include <linux/nospec.h>
 #include <linux/delayacct.h>
 #include <linux/memory.h>
+#include <linux/mm_inline.h>
 
 #include <asm/page.h>
 #include <asm/pgalloc.h>
@@ -5101,15 +5102,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				entry = huge_pte_clear_uffd_wp(entry);
 			set_huge_pte_at(dst, addr, dst_pte, entry);
 		} else if (unlikely(is_pte_marker(entry))) {
-			/* No swap on hugetlb */
-			WARN_ON_ONCE(
-			    is_swapin_error_entry(pte_to_swp_entry(entry)));
-			/*
-			 * We copy the pte marker only if the dst vma has
-			 * uffd-wp enabled.
-			 */
-			if (userfaultfd_wp(dst_vma))
-				set_huge_pte_at(dst, addr, dst_pte, entry);
+			pte_marker marker = copy_pte_marker(
+				pte_to_swp_entry(entry), dst_vma);
+
+			if (marker)
+				set_huge_pte_at(dst, addr, dst_pte,
+						make_pte_marker(marker));
 		} else {
 			entry = huge_ptep_get(src_pte);
 			pte_folio = page_folio(pte_page(entry));
@@ -6089,14 +6087,26 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	}
 
 	entry = huge_ptep_get(ptep);
-	/* PTE markers should be handled the same way as none pte */
-	if (huge_pte_none_mostly(entry))
+	if (huge_pte_none_mostly(entry)) {
+		if (is_pte_marker(entry)) {
+			pte_marker marker =
+				pte_marker_get(pte_to_swp_entry(entry));
+
+			if (marker & PTE_MARKER_POISONED) {
+				ret = VM_FAULT_HWPOISON_LARGE;
+				goto out_mutex;
+			}
+		}
+
 		/*
+		 * Other PTE markers should be handled the same way as none PTE.
+		 *
 		 * hugetlb_no_page will drop vma lock and hugetlb fault
 		 * mutex internally, which make us return immediately.
 		 */
 		return hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
 				      entry, flags);
+	}
 
 	ret = 0;
 
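
The new branch in hugetlb_fault() can be summarized as a three-way decision. The following is a rough userspace sketch, not kernel code: struct model_huge_pte, the MODEL_* names, and model_hugetlb_fault() are hypothetical, and the sketch assumes that huge_pte_none_mostly() is true for both empty entries and PTE markers, as the hunk above implies.

```c
#include <stdbool.h>

/* Userspace model only; none of this is kernel code. */
typedef unsigned long pte_marker;

#define MODEL_MARKER_POISONED	(1UL << 1)	/* mirrors PTE_MARKER_POISONED */

enum model_huge_fault {
	MODEL_HUGE_NO_PAGE,		/* take the hugetlb_no_page() path */
	MODEL_HUGE_HWPOISON_LARGE,	/* mirrors VM_FAULT_HWPOISON_LARGE */
	MODEL_HUGE_CONTINUE,		/* entry is populated; keep handling */
};

/* Hypothetical, simplified view of a huge PTE for this sketch. */
struct model_huge_pte {
	bool none;		/* entry is empty */
	bool is_marker;		/* entry is a PTE marker */
	pte_marker marker;	/* marker bits, valid if is_marker */
};

static enum model_huge_fault model_hugetlb_fault(struct model_huge_pte e)
{
	/* Assumed equivalent of huge_pte_none_mostly(): none or a marker. */
	if (e.none || e.is_marker) {
		/* Poisoned markers fail the fault instead of populating. */
		if (e.is_marker && (e.marker & MODEL_MARKER_POISONED))
			return MODEL_HUGE_HWPOISON_LARGE;

		/* Other markers are treated like an empty entry. */
		return MODEL_HUGE_NO_PAGE;
	}

	return MODEL_HUGE_CONTINUE;
}
```

Only a marker carrying the poisoned bit short-circuits the fault in this model; every other "mostly none" entry still takes the no-page path, as it did before the patch.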

mm/madvise.c

Lines changed: 1 addition & 1 deletion
@@ -664,7 +664,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 				free_swap_and_cache(entry);
 				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 			} else if (is_hwpoison_entry(entry) ||
-				   is_swapin_error_entry(entry)) {
+				   is_poisoned_swp_entry(entry)) {
 				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 			}
 			continue;

mm/memory.c

Lines changed: 9 additions & 6 deletions
@@ -860,8 +860,11 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			return -EBUSY;
 		return -ENOENT;
 	} else if (is_pte_marker_entry(entry)) {
-		if (is_swapin_error_entry(entry) || userfaultfd_wp(dst_vma))
-			set_pte_at(dst_mm, addr, dst_pte, pte);
+		pte_marker marker = copy_pte_marker(entry, dst_vma);
+
+		if (marker)
+			set_pte_at(dst_mm, addr, dst_pte,
+				   make_pte_marker(marker));
 		return 0;
 	}
 	if (!userfaultfd_wp(dst_vma))
@@ -1502,7 +1505,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			    !zap_drop_file_uffd_wp(details))
 				continue;
 		} else if (is_hwpoison_entry(entry) ||
-			   is_swapin_error_entry(entry)) {
+			   is_poisoned_swp_entry(entry)) {
 			if (!should_zap_cows(details))
 				continue;
 		} else {
@@ -3651,7 +3654,7 @@ static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
 	 * none pte. Otherwise it means the pte could have changed, so retry.
 	 *
 	 * This should also cover the case where e.g. the pte changed
-	 * quickly from a PTE_MARKER_UFFD_WP into PTE_MARKER_SWAPIN_ERROR.
+	 * quickly from a PTE_MARKER_UFFD_WP into PTE_MARKER_POISONED.
 	 * So is_pte_marker() check is not enough to safely drop the pte.
 	 */
 	if (pte_same(vmf->orig_pte, ptep_get(vmf->pte)))
@@ -3697,8 +3700,8 @@ static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
 		return VM_FAULT_SIGBUS;
 
 	/* Higher priority than uffd-wp when data corrupted */
-	if (marker & PTE_MARKER_SWAPIN_ERROR)
-		return VM_FAULT_SIGBUS;
+	if (marker & PTE_MARKER_POISONED)
+		return VM_FAULT_HWPOISON;
 
 	if (pte_marker_entry_uffd_wp(entry))
 		return pte_marker_handle_uffd_wp(vmf);
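
The ordering in handle_pte_marker() matters when a marker carries both bits: the poisoned check runs first, so such an entry reports a hardware-poison fault rather than going to uffd-wp handling. A minimal userspace sketch of that priority follows. The enum and function names are hypothetical, and the SIGBUS results for an empty or unknown marker are an assumption based on the context lines visible in the hunk.

```c
/* Userspace model only; none of this is kernel code. */
typedef unsigned long pte_marker;

#define MODEL_MARKER_UFFD_WP	(1UL << 0)	/* mirrors PTE_MARKER_UFFD_WP */
#define MODEL_MARKER_POISONED	(1UL << 1)	/* mirrors PTE_MARKER_POISONED */

enum model_fault {
	MODEL_FAULT_SIGBUS,	/* mirrors VM_FAULT_SIGBUS */
	MODEL_FAULT_HWPOISON,	/* mirrors VM_FAULT_HWPOISON */
	MODEL_FAULT_UFFD_WP,	/* stands in for pte_marker_handle_uffd_wp() */
};

static enum model_fault model_handle_pte_marker(pte_marker marker)
{
	/* Assumption: an empty marker is unexpected and yields SIGBUS. */
	if (!marker)
		return MODEL_FAULT_SIGBUS;

	/* Checked first, so it wins over uffd-wp when both bits are set. */
	if (marker & MODEL_MARKER_POISONED)
		return MODEL_FAULT_HWPOISON;

	if (marker & MODEL_MARKER_UFFD_WP)
		return MODEL_FAULT_UFFD_WP;

	/* Assumption: any other marker bit is unknown and yields SIGBUS. */
	return MODEL_FAULT_SIGBUS;
}
```

Before this patch the poisoned case would have returned the SIGBUS value; the change to the HWPOISON value is what the commit message relies on to let KVM guests receive an MCE.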

mm/mprotect.c

Lines changed: 2 additions & 2 deletions
@@ -230,10 +230,10 @@ static long change_pte_range(struct mmu_gather *tlb,
 					newpte = pte_swp_mkuffd_wp(newpte);
 			} else if (is_pte_marker_entry(entry)) {
 				/*
-				 * Ignore swapin errors unconditionally,
+				 * Ignore error swap entries unconditionally,
 				 * because any access should sigbus anyway.
 				 */
-				if (is_swapin_error_entry(entry))
+				if (is_poisoned_swp_entry(entry))
 					continue;
 				/*
 				 * If this is uffd-wp pte marker and we'd like

mm/shmem.c

Lines changed: 2 additions & 2 deletions
@@ -1707,7 +1707,7 @@ static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
 	swp_entry_t swapin_error;
 	void *old;
 
-	swapin_error = make_swapin_error_entry();
+	swapin_error = make_poisoned_swp_entry();
 	old = xa_cmpxchg_irq(&mapping->i_pages, index,
 			     swp_to_radix_entry(swap),
 			     swp_to_radix_entry(swapin_error), 0);
@@ -1752,7 +1752,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	swap = radix_to_swp_entry(*foliop);
 	*foliop = NULL;
 
-	if (is_swapin_error_entry(swap))
+	if (is_poisoned_swp_entry(swap))
 		return -EIO;
 
 	si = get_swap_device(swap);

mm/swapfile.c

Lines changed: 1 addition & 1 deletion
@@ -1771,7 +1771,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 		swp_entry = make_hwpoison_entry(swapcache);
 		page = swapcache;
 	} else {
-		swp_entry = make_swapin_error_entry();
+		swp_entry = make_poisoned_swp_entry();
 	}
 	new_pte = swp_entry_to_pte(swp_entry);
 	ret = 0;
