
Commit 3f70dc3

Michal Hocko authored and torvalds committed
mm: make sure that kthreads will not refault oom reaped memory
There are only a few use_mm() users in the kernel right now.  Most of them write to the target memory, but the vhost driver relies on copy_from_user/get_user from a kernel thread context.  This makes it impossible to reap the memory of an oom victim which shares the mm with the vhost kernel thread, because it could see a zero page unexpectedly and theoretically make an incorrect decision visible outside of the killed task context.  To quote Michael S. Tsirkin:

: Getting an error from __get_user and friends is handled gracefully.
: Getting zero instead of a real value will cause userspace
: memory corruption.

The vhost kernel thread is bound to an open fd of the vhost device, which is not tied to the mm owner's life cycle in general.  The device fd can be inherited or passed over to another process, which means that we really have to be careful about unexpected memory corruption, because unlike for normal oom victims the result will be visible outside of the oom victim context.

Make sure that no kthread context (users of use_mm) can ever see corrupted data because of the oom reaper, and hook into the page fault path by checking the MMF_UNSTABLE mm flag.  __oom_reap_task_mm will set the flag before it starts unmapping the address space, while the flag is checked after the page fault has been handled.  If the flag is set then SIGBUS is triggered, so any g-u-p user will get an error code.

Regular tasks do not need this protection because all tasks which share the mm are killed when the mm is reaped, so the corruption will not outlive them.

This patch shouldn't have any visible effect at this moment because the OOM killer doesn't invoke the oom reaper for tasks with an mm shared with kthreads yet.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: "Michael S. Tsirkin" <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
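For illustration, a minimal sketch (hypothetical helper, not part of this patch) of the kthread pattern the message describes: a vhost-style worker borrows the owner's mm with use_mm() and must treat a failed user copy as a hard error.  With this patch, a refault into a reaped mm fails with -EFAULT instead of silently reading zeros:

	#include <linux/sched.h>
	#include <linux/uaccess.h>
	#include <linux/mmu_context.h>

	/*
	 * Hypothetical vhost-style consumer: runs in a kernel thread,
	 * borrows the target mm, and copies a descriptor from userspace.
	 * Once the oom reaper has set MMF_UNSTABLE, the copy fails with
	 * an error rather than returning a zero-filled buffer.
	 */
	static int fetch_from_guest(struct mm_struct *mm, void __user *uptr,
				    void *buf, size_t len)
	{
		int ret = 0;

		use_mm(mm);		/* switch this kthread to the target mm */
		if (copy_from_user(buf, uptr, len))
			ret = -EFAULT;	/* fail the request, do not trust zeros */
		unuse_mm(mm);

		return ret;
	}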
1 parent 3853120 commit 3f70dc3

3 files changed: 22 additions, 0 deletions

include/linux/sched.h

Lines changed: 1 addition & 0 deletions
@@ -525,6 +525,7 @@ static inline int get_dumpable(struct mm_struct *mm)
 #define MMF_HAS_UPROBES		19	/* has uprobes */
 #define MMF_RECALC_UPROBES	20	/* MMF_HAS_UPROBES can be wrong */
 #define MMF_OOM_SKIP		21	/* mm is of no interest for the OOM killer */
+#define MMF_UNSTABLE		22	/* mm is unstable for copy_from_user */
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK)

mm/memory.c

Lines changed: 13 additions & 0 deletions
@@ -3658,6 +3658,19 @@ int handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
 		mem_cgroup_oom_synchronize(false);
 	}
 
+	/*
+	 * This mm has already been reaped by the oom reaper and so the
+	 * refault cannot be trusted in general.  Anonymous refaults would
+	 * lose data and get a zero page instead, for example.  This is
+	 * especially a problem for use_mm() because regular tasks will
+	 * just die and the corrupted data will not be visible anywhere,
+	 * while a kthread will outlive the oom victim and potentially
+	 * propagate the data further.
+	 */
+	if (unlikely((current->flags & PF_KTHREAD) && !(ret & VM_FAULT_ERROR)
+				&& test_bit(MMF_UNSTABLE, &vma->vm_mm->flags)))
+		ret = VM_FAULT_SIGBUS;
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(handle_mm_fault);
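Read on its own, the new check combines three conditions.  A restatement as a standalone predicate (hypothetical helper name, not in the patch) may help:

	#include <linux/sched.h>
	#include <linux/mm.h>

	/*
	 * True when a successfully handled fault must still be failed:
	 * only kernel threads (use_mm() users) are affected, only when
	 * the fault did not already return an error, and only when the
	 * oom reaper has marked this mm as unstable.
	 */
	static bool fault_in_reaped_mm(struct vm_area_struct *vma, int ret)
	{
		return (current->flags & PF_KTHREAD) &&
		       !(ret & VM_FAULT_ERROR) &&
		       test_bit(MMF_UNSTABLE, &vma->vm_mm->flags);
	}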

mm/oom_kill.c

Lines changed: 8 additions & 0 deletions
@@ -495,6 +495,14 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 		goto unlock_oom;
 	}
 
+	/*
+	 * Tell all users of get_user/copy_from_user etc... that the content
+	 * is no longer stable.  No barriers are really needed because the
+	 * unmapping should imply barriers already, and the reader would hit
+	 * a page fault if it stumbled over reaped memory.
+	 */
+	set_bit(MMF_UNSTABLE, &mm->flags);
+
 	tlb_gather_mmu(&tlb, mm, 0, -1);
 	for (vma = mm->mmap ; vma; vma = vma->vm_next) {
 		if (is_vm_hugetlb_page(vma))
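The "no barriers needed" argument amounts to a simple ordering guarantee: the flag is set before any page is unmapped, so by the time a kthread can observe a missing page, MMF_UNSTABLE is already visible, and its refault takes the VM_FAULT_SIGBUS path added to handle_mm_fault() above.  A condensed, hypothetical sketch of the reap path (omitting the VMA filtering the real loop does; unmap_page_range() is mm-internal, declared in mm/internal.h):

	#include <linux/sched.h>
	#include <linux/mm.h>
	#include <asm/tlb.h>

	/* Condensed ordering sketch, not the literal function body. */
	static void reap_sketch(struct mm_struct *mm)
	{
		struct mmu_gather tlb;
		struct vm_area_struct *vma;

		set_bit(MMF_UNSTABLE, &mm->flags);	/* 1: mark mm unstable  */
		tlb_gather_mmu(&tlb, mm, 0, -1);	/* 2: only then unmap   */
		for (vma = mm->mmap; vma; vma = vma->vm_next)
			unmap_page_range(&tlb, vma, vma->vm_start,
					 vma->vm_end, NULL);
		tlb_finish_mmu(&tlb, 0, -1);
	}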
