Commit 88f306b

kiryl authored and torvalds committed
mm: fix locking order in mm_take_all_locks()
Dmitry Vyukov has reported[1] a possible deadlock (triggered by his syzkaller fuzzer):

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&hugetlbfs_i_mmap_rwsem_key);
                               lock(&mapping->i_mmap_rwsem);
                               lock(&hugetlbfs_i_mmap_rwsem_key);
  lock(&mapping->i_mmap_rwsem);

Both traces point to mm_take_all_locks() as the source of the problem: it does not take care about the ordering of hugetlbfs_i_mmap_rwsem_key (aka mapping->i_mmap_rwsem for hugetlb mappings) vs. i_mmap_rwsem.

huge_pmd_share() does memory allocation under hugetlbfs_i_mmap_rwsem_key, and the allocator can take i_mmap_rwsem if it hits reclaim. So we need to take i_mmap_rwsem from all hugetlb VMAs before taking i_mmap_rwsem from the rest of the VMAs.

The patch also documents the locking order for hugetlbfs_i_mmap_rwsem_key.

[1] http://lkml.kernel.org/r/CACT4Y+Zu95tBs-0EvdiAKzUOsb4tczRRfCRTpLr4bg_OP9HuVg@mail.gmail.com

Signed-off-by: Kirill A. Shutemov <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
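To make the inversion concrete, here is a minimal sketch of the two acquisition orders described above. This is illustrative pseudocode, not the kernel's exact code: the helpers i_mmap_lock_write() and pmd_alloc() are real, but the variable names, the structure, and the reclaim path are simplified assumptions.

	/* Thread A: huge_pmd_share() path (simplified sketch) */
	i_mmap_lock_write(hugetlb_mapping);	/* class hugetlbfs_i_mmap_rwsem_key */
	pte = pmd_alloc(mm, pud, addr);		/* allocation may enter reclaim,
						 * and reclaim may take a regular
						 * mapping->i_mmap_rwsem */
	i_mmap_unlock_write(hugetlb_mapping);

	/* Thread B: mm_take_all_locks() before this patch, walking the VMA
	 * list in whatever order it happens to be, so a regular mapping can
	 * be locked before a hugetlb one: */
	i_mmap_lock_write(regular_mapping);	/* class i_mmap_rwsem */
	i_mmap_lock_write(hugetlb_mapping);	/* class hugetlbfs_i_mmap_rwsem_key;
						 * opposite order, hence the
						 * lockdep report */

Taking every hugetlb mapping's rwsem before any regular mapping's rwsem, as the mm/mmap.c hunk below does, makes mm_take_all_locks() agree with the ordering huge_pmd_share() already implies.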
1 parent d645fc0 commit 88f306b

3 files changed: +37, -21 lines

fs/hugetlbfs/inode.c

Lines changed: 1 addition & 1 deletion

@@ -708,7 +708,7 @@ static struct inode *hugetlbfs_get_root(struct super_block *sb,
 /*
  * Hugetlbfs is not reclaimable; therefore its i_mmap_rwsem will never
  * be taken from reclaim -- unlike regular filesystems. This needs an
- * annotation because huge_pmd_share() does an allocation under
+ * annotation because huge_pmd_share() does an allocation under hugetlb's
  * i_mmap_rwsem.
  */
 static struct lock_class_key hugetlbfs_i_mmap_rwsem_key;
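The key declared on the last line above only takes effect once it is attached to each hugetlbfs inode's i_mmap_rwsem, which is what gives the hugetlb rwsem its own lockdep class. A condensed sketch of that attachment, assuming the usual lockdep_set_class() pattern at inode creation; arguments and the rest of the initialization are elided:

	static struct lock_class_key hugetlbfs_i_mmap_rwsem_key;

	static struct inode *hugetlbfs_get_inode(struct super_block *sb /* , ... */)
	{
		struct inode *inode = new_inode(sb);

		if (inode) {
			/*
			 * Put this inode's i_mmap_rwsem into its own lockdep
			 * class, so it can be ordered against a regular
			 * mapping->i_mmap_rwsem instead of sharing its class.
			 */
			lockdep_set_class(&inode->i_mapping->i_mmap_rwsem,
					  &hugetlbfs_i_mmap_rwsem_key);
			/* ... remaining initialization ... */
		}
		return inode;
	}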

mm/mmap.c

Lines changed: 20 additions & 5 deletions

@@ -3184,10 +3184,16 @@ static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping)
  * mapping->flags avoid to take the same lock twice, if more than one
  * vma in this mm is backed by the same anon_vma or address_space.
  *
- * We can take all the locks in random order because the VM code
- * taking i_mmap_rwsem or anon_vma->rwsem outside the mmap_sem never
- * takes more than one of them in a row. Secondly we're protected
- * against a concurrent mm_take_all_locks() by the mm_all_locks_mutex.
+ * We take locks in following order, accordingly to comment at beginning
+ * of mm/rmap.c:
+ *   - all hugetlbfs_i_mmap_rwsem_key locks (aka mapping->i_mmap_rwsem for
+ *     hugetlb mapping);
+ *   - all i_mmap_rwsem locks;
+ *   - all anon_vma->rwseml
+ *
+ * We can take all locks within these types randomly because the VM code
+ * doesn't nest them and we protected from parallel mm_take_all_locks() by
+ * mm_all_locks_mutex.
  *
  * mm_take_all_locks() and mm_drop_all_locks are expensive operations
  * that may have to take thousand of locks.
@@ -3206,7 +3212,16 @@ int mm_take_all_locks(struct mm_struct *mm)
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
 		if (signal_pending(current))
 			goto out_unlock;
-		if (vma->vm_file && vma->vm_file->f_mapping)
+		if (vma->vm_file && vma->vm_file->f_mapping &&
+				is_vm_hugetlb_page(vma))
+			vm_lock_mapping(mm, vma->vm_file->f_mapping);
+	}
+
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		if (signal_pending(current))
+			goto out_unlock;
+		if (vma->vm_file && vma->vm_file->f_mapping &&
+				!is_vm_hugetlb_page(vma))
 			vm_lock_mapping(mm, vma->vm_file->f_mapping);
 	}
 
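For context, vm_lock_mapping(), named in the hunk header above, is where the "mapping->flags avoid to take the same lock twice" comment is implemented. A sketch of that helper under the assumption that it matches the contemporaneous upstream code; treat the exact comment wording as a reconstruction:

	static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping)
	{
		if (!test_bit(AS_MM_ALL_LOCKS, &mapping->flags)) {
			/*
			 * AS_MM_ALL_LOCKS can't change from under us because
			 * we hold the mm_all_locks_mutex; setting the bit
			 * ensures a mapping shared by several VMAs in this mm
			 * is write-locked only once.
			 */
			if (test_and_set_bit(AS_MM_ALL_LOCKS, &mapping->flags))
				BUG();
			down_write_nest_lock(&mapping->i_mmap_rwsem, &mm->mmap_sem);
		}
	}

Because the two passes call the same helper, the lock ordering comes purely from pass order: all hugetlb mappings first, then all regular ones.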

mm/rmap.c

Lines changed: 16 additions & 15 deletions

@@ -23,21 +23,22 @@
  * inode->i_mutex	(while writing or truncating, not reading or faulting)
  *   mm->mmap_sem
  *     page->flags PG_locked (lock_page)
- *       mapping->i_mmap_rwsem
- *         anon_vma->rwsem
- *           mm->page_table_lock or pte_lock
- *             zone->lru_lock (in mark_page_accessed, isolate_lru_page)
- *             swap_lock (in swap_duplicate, swap_info_get)
- *               mmlist_lock (in mmput, drain_mmlist and others)
- *               mapping->private_lock (in __set_page_dirty_buffers)
- *                 mem_cgroup_{begin,end}_page_stat (memcg->move_lock)
- *                   mapping->tree_lock (widely used)
- *                 inode->i_lock (in set_page_dirty's __mark_inode_dirty)
- *                 bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty)
- *                   sb_lock (within inode_lock in fs/fs-writeback.c)
- *                   mapping->tree_lock (widely used, in set_page_dirty,
- *                             in arch-dependent flush_dcache_mmap_lock,
- *                             within bdi.wb->list_lock in __sync_single_inode)
+ *       hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share)
+ *         mapping->i_mmap_rwsem
+ *           anon_vma->rwsem
+ *             mm->page_table_lock or pte_lock
+ *               zone->lru_lock (in mark_page_accessed, isolate_lru_page)
+ *               swap_lock (in swap_duplicate, swap_info_get)
+ *                 mmlist_lock (in mmput, drain_mmlist and others)
+ *                 mapping->private_lock (in __set_page_dirty_buffers)
+ *                   mem_cgroup_{begin,end}_page_stat (memcg->move_lock)
+ *                     mapping->tree_lock (widely used)
+ *                 inode->i_lock (in set_page_dirty's __mark_inode_dirty)
+ *                 bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty)
+ *                   sb_lock (within inode_lock in fs/fs-writeback.c)
+ *                   mapping->tree_lock (widely used, in set_page_dirty,
+ *                             in arch-dependent flush_dcache_mmap_lock,
+ *                             within bdi.wb->list_lock in __sync_single_inode)
  *
  * anon_vma->rwsem,mapping->i_mutex      (memory_failure, collect_procs_anon)
  *   ->tasklist_lock
