  1. Dec 31, 2021
  2. Dec 25, 2021
    • mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page() · 2a57d83c
      Liu Shixin authored
      Hulk Robot reported a panic in put_page_testzero() when testing
      madvise() with MADV_SOFT_OFFLINE.  The BUG() is triggered when retrying
      get_any_page().  This is because we keep the MF_COUNT_INCREASED flag on
      the second try even though the refcount has not been increased.
      
          page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
          ------------[ cut here ]------------
          kernel BUG at include/linux/mm.h:737!
          invalid opcode: 0000 [#1] PREEMPT SMP
          CPU: 5 PID: 2135 Comm: sshd Tainted: G    B             5.16.0-rc6-dirty #373
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
          RIP: release_pages+0x53f/0x840
          Call Trace:
            free_pages_and_swap_cache+0x64/0x80
            tlb_flush_mmu+0x6f/0x220
            unmap_page_range+0xe6c/0x12c0
            unmap_single_vma+0x90/0x170
            unmap_vmas+0xc4/0x180
            exit_mmap+0xde/0x3a0
            mmput+0xa3/0x250
            do_exit+0x564/0x1470
            do_group_exit+0x3b/0x100
            __do_sys_exit_group+0x13/0x20
            __x64_sys_exit_group+0x16/0x20
            do_syscall_64+0x34/0x80
            entry_SYSCALL_64_after_hwframe+0x44/0xae
          Modules linked in:
          ---[ end trace e99579b570fe0649 ]---
          RIP: 0010:release_pages+0x53f/0x840
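
      The failure mode can be illustrated with a minimal userspace sketch.
      This is not the kernel code: the toy_* names and TOY_COUNT_INCREASED
      are hypothetical stand-ins, and the model only assumes that a set flag
      means "the caller already holds a reference".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for MF_COUNT_INCREASED. */
#define TOY_COUNT_INCREASED 0x1

struct toy_page {
	int refcount;
};

/* Take a reference unless the caller claims to hold one already. */
static void toy_get_page(struct toy_page *p, int flags)
{
	if (!(flags & TOY_COUNT_INCREASED))
		p->refcount++;
}

/* Drop a reference; dropping from zero is the bug being modelled. */
static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/*
 * Run two attempts on the same page.  Each attempt drops its reference
 * before the next one starts.  Returns 0 on success, or -1 if an attempt
 * would have dropped a reference it never took.
 */
static int toy_soft_offline(struct toy_page *p, int flags, bool clear_on_retry)
{
	for (int attempt = 0; attempt < 2; attempt++) {
		toy_get_page(p, flags);
		if (p->refcount == 0)
			return -1;	/* stale flag: no reference was taken */
		toy_put_page(p);
		if (clear_on_retry)
			flags &= ~TOY_COUNT_INCREASED;
	}
	return 0;
}
```

      With a page that starts at refcount 1 and the flag set, the sketch
      succeeds when the flag is cleared before the retry and reports the
      underflow when the flag is kept.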
      
      Link: https://lkml.kernel.org/r/20211221074908.3910286-1-liushixin2@huawei.com
      
      
      Fixes: b94e0282 ("mm,hwpoison: try to narrow window race for free pages")
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/damon/dbgfs: protect targets destructions with kdamond_lock · 34796417
      SeongJae Park authored
      The DAMON debugfs interface iterates the current monitoring targets in
      'dbgfs_target_ids_read()' while holding the corresponding
      'kdamond_lock'.  However, it also destroys the monitoring targets in
      'dbgfs_before_terminate()' without holding the lock.  This can result in
      a use-after-free bug.  This commit avoids the race by protecting the
      destruction with the corresponding 'kdamond_lock'.
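
      The locking rule can be sketched in userspace as follows.  This is an
      illustration under assumptions, not the DAMON code: struct toy_ctx and
      the toy_* functions are hypothetical, and a pthread mutex stands in
      for 'kdamond_lock'.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct toy_ctx {
	pthread_mutex_t lock;	/* stand-in for kdamond_lock */
	int *targets;		/* stand-in for the monitoring targets */
	size_t nr_targets;
};

/* Create n targets with ids 1..n.  Returns 0 on success, -1 on failure. */
static int toy_ctx_init(struct toy_ctx *ctx, size_t n)
{
	assert(ctx != NULL);
	pthread_mutex_init(&ctx->lock, NULL);
	ctx->nr_targets = 0;
	ctx->targets = calloc(n, sizeof(*ctx->targets));
	if (!ctx->targets)
		return -1;
	for (size_t i = 0; i < n; i++)
		ctx->targets[i] = (int)i + 1;
	ctx->nr_targets = n;
	return 0;
}

/* Reader: walks the targets only while holding the lock. */
static long toy_sum_target_ids(struct toy_ctx *ctx)
{
	long sum = 0;

	pthread_mutex_lock(&ctx->lock);
	for (size_t i = 0; i < ctx->nr_targets; i++)
		sum += ctx->targets[i];
	pthread_mutex_unlock(&ctx->lock);
	return sum;
}

/* Destructor: frees under the same lock, so no reader sees freed memory. */
static void toy_destroy_targets(struct toy_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	free(ctx->targets);
	ctx->targets = NULL;
	ctx->nr_targets = 0;
	pthread_mutex_unlock(&ctx->lock);
}
```

      Because the reader and the destructor take the same lock, a reader
      sees either the full target list or an empty one, never a freed one.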
      
      Link: https://lkml.kernel.org/r/20211221094447.2241-1-sj@kernel.org
      
      
      Reported-by: Sangwoo Bae <sangwoob@amazon.com>
      Fixes: 4bc05954 ("mm/damon: implement a debugfs-based user space interface")
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Cc: <stable@vger.kernel.org>	[5.15.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, hwpoison: fix condition in free hugetlb page path · e37e7b0b
      Naoya Horiguchi authored
      When a memory error hits a tail page of a free hugepage,
      __page_handle_poison() is expected to be called to isolate the error in
      4kB units, but it is not called because of an outdated if-condition in
      memory_failure_hugetlb().  This loses the chance to isolate the error at
      the finer granularity, so it is not optimal.  Drop the condition.

      This "(p != head && TestSetPageHWPoison(head))" condition is based on the
      old semantics of PageHWPoison on hugepages (where the PG_hwpoison flag was
      set on the subpage), so it is not necessary any more.  Now that
      PG_hwpoison is set on the head page for hugepages, concurrent error
      events on different subpages in a single hugepage are prevented by
      TestSetPageHWPoison(head) at the beginning of memory_failure_hugetlb().
      So dropping the condition should not reopen the race window originally
      mentioned in commit b985194c ("hwpoison, hugetlb:
      lock_page/unlock_page does not match for handling a free hugepage").
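
      The serialization argument can be sketched with a C11 atomic flag.
      This is a simplified model, not the kernel implementation: struct
      toy_hugepage and toy_memory_failure() are hypothetical, and
      atomic_flag_test_and_set() plays the role of
      TestSetPageHWPoison(head).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_hugepage {
	atomic_flag head_poisoned;	/* stand-in for PG_hwpoison on the head */
	int isolated_subpage;		/* subpage index isolated so far, or -1 */
};

/*
 * Handle an error on any subpage, head or tail.  Only the first caller
 * wins the test-and-set on the head and goes on to isolate its subpage;
 * later callers on the same hugepage back off.
 */
static bool toy_memory_failure(struct toy_hugepage *hp, int subpage)
{
	assert(subpage >= 0);
	if (atomic_flag_test_and_set(&hp->head_poisoned))
		return false;	/* already being handled */
	hp->isolated_subpage = subpage;
	return true;
}
```

      The first error on subpage 3 is handled and isolated; a second error
      on subpage 7 of the same hugepage is rejected by the head-level flag,
      with no extra check on whether the subpage is the head.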
      
      [naoya.horiguchi@linux.dev: fix "HardwareCorrupted" counter]
        Link: https://lkml.kernel.org/r/20211220084851.GA1460264@u2004
      
      Link: https://lkml.kernel.org/r/20211210110208.879740-1-naoya.horiguchi@linux.dev
      
      
      Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reported-by: Fei Luo <luofei@unicloud.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: <stable@vger.kernel.org>	[5.14+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: mempolicy: fix THP allocations escaping mempolicy restrictions · 33863534
      Andrey Ryabinin authored
      alloc_pages_vma() may try to allocate THP page on the local NUMA node
      first:
      
      	page = __alloc_pages_node(hpage_node,
      		gfp | __GFP_THISNODE | __GFP_NORETRY, order);
      
      And if the allocation fails it retries allowing remote memory:
      
      	if (!page && (gfp & __GFP_DIRECT_RECLAIM))
      		page = __alloc_pages_node(hpage_node,
      					gfp, order);
      
      However, this retry allocation ignores the memory policy nodemask
      entirely, allowing the allocation to escape the policy's restrictions.

      The bug seems to have first appeared in commit ac5b2c18
      ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings").

      It disappeared later in commit 89c83fb5 ("mm, thp:
      consolidate THP gfp handling into alloc_hugepage_direct_gfpmask") and
      reappeared in a slightly different form in commit 76e654cc
      ("mm, page_alloc: allow hugepage fallback to remote nodes when
      madvised").

      Fix this by passing the correct nodemask to the __alloc_pages() call.
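
      The difference between the two fallback behaviours can be sketched
      with a toy two-node allocator.  This is an illustrative model only:
      struct toy_zones, toy_alloc() and toy_alloc_thp() are hypothetical and
      do not mirror the real page allocator interfaces.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_NODES 2

/* Free huge pages available on each node. */
struct toy_zones {
	int free[TOY_NR_NODES];
};

/*
 * Allocate one page, trying the preferred node first and then any other
 * node permitted by allowed_mask.  Returns the node id, or -1 on failure.
 */
static int toy_alloc(struct toy_zones *z, int preferred,
		     unsigned long allowed_mask)
{
	assert(preferred >= 0 && preferred < TOY_NR_NODES);
	if ((allowed_mask & (1UL << preferred)) && z->free[preferred] > 0) {
		z->free[preferred]--;
		return preferred;
	}
	for (int n = 0; n < TOY_NR_NODES; n++) {
		if ((allowed_mask & (1UL << n)) && z->free[n] > 0) {
			z->free[n]--;
			return n;
		}
	}
	return -1;
}

/*
 * First try the local node only, then retry allowing remote nodes.  When
 * honor_policy is false the retry ignores policy_mask, which models the
 * bug; when it is true the retry is limited to the policy's nodes.
 */
static int toy_alloc_thp(struct toy_zones *z, int local,
			 unsigned long policy_mask, bool honor_policy)
{
	unsigned long all_nodes = (1UL << TOY_NR_NODES) - 1;
	int node = toy_alloc(z, local, 1UL << local);

	if (node >= 0)
		return node;
	return toy_alloc(z, local, honor_policy ? policy_mask : all_nodes);
}
```

      With node 0 exhausted and a policy that binds to node 0 only, the
      policy-aware retry fails, while the policy-blind retry hands out a
      page from node 1.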
      
      The demonstration/reproducer of the problem:
      
          $ mount -oremount,size=4G,huge=always /dev/shm/
          $ echo always > /sys/kernel/mm/transparent_hugepage/defrag
          $ cat mbind_thp.c
          #include <unistd.h>
          #include <sys/mman.h>
          #include <sys/stat.h>
          #include <fcntl.h>
          #include <assert.h>
          #include <stdlib.h>
          #include <stdio.h>
          #include <numaif.h>
      
          #define SIZE (2ULL << 30)	/* parenthesized so it expands safely */
          int main(int argc, char **argv)
          {
              int fd;
              unsigned long long i;
              char *addr;
              pid_t pid;
              char buf[100];
              unsigned long nodemask = 1;
      
              fd = open("/dev/shm/test", O_RDWR|O_CREAT, 0600); /* O_CREAT needs a mode */
              assert(fd > 0);
              assert(ftruncate(fd, SIZE) == 0);
      
              addr = mmap(NULL, SIZE, PROT_READ|PROT_WRITE,
                                 MAP_SHARED, fd, 0);
      
              assert(mbind(addr, SIZE, MPOL_BIND, &nodemask, 2, MPOL_MF_STRICT|MPOL_MF_MOVE)==0);
              for (i = 0; i < SIZE; i+=4096) {
                addr[i] = 1;
              }
              pid = getpid();
              snprintf(buf, sizeof(buf), "grep shm /proc/%d/numa_maps", pid);
              system(buf);
              sleep(10000);
      
              return 0;
          }
          $ gcc mbind_thp.c -o mbind_thp -lnuma
          $ numactl -H
          available: 2 nodes (0-1)
          node 0 cpus: 0 2
          node 0 size: 1918 MB
          node 0 free: 1595 MB
          node 1 cpus: 1 3
          node 1 size: 2014 MB
          node 1 free: 1731 MB
          node distances:
          node   0   1
            0:  10  20
            1:  20  10
          $ rm -f /dev/shm/test; taskset -c 0 ./mbind_thp
          7fd970a00000 bind:0 file=/dev/shm/test dirty=524288 active=0 N0=396800 N1=127488 kernelpagesize_kB=4
      
      Link: https://lkml.kernel.org/r/20211208165343.22349-1-arbn@yandex-team.com
      
      
      Fixes: ac5b2c18 ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
      Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kfence: fix memory leak when cat kfence objects · 0129ab1f
      Baokun Li authored
      Hulk Robot reported a kmemleak problem:
      
          unreferenced object 0xffff93d1d8cc02e8 (size 248):
            comm "cat", pid 23327, jiffies 4624670141 (age 495992.217s)
            hex dump (first 32 bytes):
              00 40 85 19 d4 93 ff ff 00 10 00 00 00 00 00 00  .@..............
              00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            backtrace:
               seq_open+0x2a/0x80
               full_proxy_open+0x167/0x1e0
               do_dentry_open+0x1e1/0x3a0
               path_openat+0x961/0xa20
               do_filp_open+0xae/0x120
               do_sys_openat2+0x216/0x2f0
               do_sys_open+0x57/0x80
               do_syscall_64+0x33/0x40
               entry_SYSCALL_64_after_hwframe+0x44/0xa9
          unreferenced object 0xffff93d419854000 (size 4096):
            comm "cat", pid 23327, jiffies 4624670141 (age 495992.217s)
            hex dump (first 32 bytes):
              6b 66 65 6e 63 65 2d 23 32 35 30 3a 20 30 78 30  kfence-#250: 0x0
              30 30 30 30 30 30 30 37 35 34 62 64 61 31 32 2d  0000000754bda12-
            backtrace:
               seq_read_iter+0x313/0x440
               seq_read+0x14b/0x1a0
               full_proxy_read+0x56/0x80
               vfs_read+0xa5/0x1b0
               ksys_read+0xa0/0xf0
               do_syscall_64+0x33/0x40
               entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      I find that we can easily reproduce this problem with the following
      commands:
      
      	cat /sys/kernel/debug/kfence/objects
      	echo scan > /sys/kernel/debug/kmemleak
      	cat /sys/kernel/debug/kmemleak
      
      The leaked memory is allocated in the stack below:
      
          do_syscall_64
            do_sys_open
              do_dentry_open
                full_proxy_open
                  seq_open            ---> alloc seq_file
            vfs_read
              full_proxy_read
                seq_read
                  seq_read_iter
                    traverse          ---> alloc seq_buf
      
      And it should have been released in the following process:
      
          do_syscall_64
            syscall_exit_to_user_mode
              exit_to_user_mode_prepare
                task_work_run
                  ____fput
                    __fput
                      full_proxy_release  ---> free here
      
      However, kfence does not implement the release callback in the
      file_operations for this file, so the memory is never freed and a leak
      occurs.  The solution is to implement the corresponding release
      function.
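
      The effect of a missing release callback can be sketched in userspace.
      This is a hedged model, not the kfence or seq_file code: struct
      toy_fops, struct toy_file and the toy_* helpers are hypothetical, and
      a counter stands in for kmemleak.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of open-time allocations that have not been freed yet. */
static int toy_live_allocs;

struct toy_file {
	void *private_data;	/* stand-in for the seq_file state */
};

struct toy_fops {
	int (*open)(struct toy_file *f);
	int (*release)(struct toy_file *f);	/* may be NULL */
};

/* Allocate per-open state, as an open callback typically does. */
static int toy_seq_open(struct toy_file *f)
{
	f->private_data = malloc(64);
	if (!f->private_data)
		return -1;
	toy_live_allocs++;
	return 0;
}

/* Free the per-open state; this is the counterpart that was missing. */
static int toy_seq_release(struct toy_file *f)
{
	free(f->private_data);
	f->private_data = NULL;
	toy_live_allocs--;
	return 0;
}

static const struct toy_fops toy_fops_leaky = {
	.open = toy_seq_open,
};

static const struct toy_fops toy_fops_fixed = {
	.open = toy_seq_open,
	.release = toy_seq_release,
};

/* Open and close a file once; returns the allocations still alive. */
static int toy_open_close(const struct toy_fops *fops, struct toy_file *f)
{
	assert(fops != NULL && fops->open != NULL);
	if (fops->open(f))
		return toy_live_allocs;
	if (fops->release)
		fops->release(f);
	return toy_live_allocs;
}
```

      Opening and closing through toy_fops_fixed leaves no live allocation,
      while toy_fops_leaky leaves one behind per open, which matches the
      shape of the report above.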
      
      Link: https://lkml.kernel.org/r/20211206133628.2822545-1-libaokun1@huawei.com
      
      
      Fixes: 0ce20dd8 ("mm: add Kernel Electric-Fence infrastructure")
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Acked-by: Marco Elver <elver@google.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. Dec 11, 2021
  4. Dec 06, 2021
    • percpu: km: ensure it is used with NOMMU (either UP or SMP) · 3583521a
      Vladimir Murzin authored
      
      Currently, NOMMU configurations pull in the km allocator via the !SMP
      dependency because most of them are UP.  For SMP+NOMMU, however, the vm
      allocator gets pulled in, which:

      * may lead to a broken build [1]
      * ...or may fail at runtime due to [2]

      It looks like the SMP+NOMMU case was overlooked in bbddff05 ("percpu:
      use percpu allocator on UP too"), so restore that.
      
      [1]
      For ARM SMP+NOMMU (R-class cores)
      
      arm-none-linux-gnueabihf-ld: mm/percpu.o: in function `pcpu_post_unmap_tlb_flush':
      mm/percpu-vm.c:188: undefined reference to `flush_tlb_kernel_range'
      
      [2]
      static inline
      int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
                      pgprot_t prot, struct page **pages, unsigned int page_shift)
      {
             return -EINVAL;
      }
      
      Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
      Tested-by: Rob Landley <rob@landley.net>
      Tested-by: Rich Felker <dalias@libc.org>
      [Dennis: use depends instead of default for condition]
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
  5. Dec 03, 2021
  6. Nov 22, 2021
    • hugetlbfs: flush before unlock on move_hugetlb_page_tables() · 13e4ad2c
      Nadav Amit authored
      
      We must flush the TLB before releasing i_mmap_rwsem to avoid the
      potential reuse of an unshared PMDs page.  This is not done in
      move_hugetlb_page_tables().  The last reference on the page table can
      therefore be dropped before the TLB flush takes place.
      
      Prevent it by reordering the operations and flushing the TLB before
      releasing i_mmap_rwsem.
      
      Fixes: 550a7d60 ("mm, hugepages: add mremap() support for hugepage backed vma")
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlbfs: flush TLBs correctly after huge_pmd_unshare · a4a118f2
      Nadav Amit authored
      
      When a call to huge_pmd_unshare() from __unmap_hugepage_range()
      succeeds, a TLB flush is missing.  This TLB flush must be performed
      before releasing the i_mmap_rwsem, in order to prevent an unshared PMDs
      page from being released and reused before the TLB flush takes place.

      Arguably, a comprehensive solution would use the mmu_gather interface to
      batch the TLB flushes and the PMDs page release.  However, that is not
      an easy solution: (1) try_to_unmap_one() and try_to_migrate_one() also
      call huge_pmd_unshare() and they cannot use the mmu_gather interface;
      and (2) deferring the release of the page reference for the PMDs page
      until after i_mmap_rwsem is dropped can confuse huge_pmd_unshare() into
      thinking PMDs are shared when they are not.
      
      Fix __unmap_hugepage_range() by adding the missing TLB flush, and
      forcing a flush when unshare is successful.
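
      The required ordering can be sketched as an event log.  This is a toy
      model under assumptions, not the hugetlb code: the toy_* names are
      hypothetical, and the lock is taken inside the function only to keep
      the sketch short.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_event { TOY_LOCK, TOY_UNSHARE, TOY_FLUSH, TOY_UNLOCK };

#define TOY_LOG_MAX 16

struct toy_log {
	enum toy_event ev[TOY_LOG_MAX];
	int n;
};

static void toy_record(struct toy_log *log, enum toy_event e)
{
	assert(log->n < TOY_LOG_MAX);
	log->ev[log->n++] = e;
}

/*
 * Walk nr entries.  shared[i] says whether unsharing entry i succeeds.
 * Any successful unshare forces a flush, and the flush is recorded
 * before the unlock so the page cannot be reused while stale TLB
 * entries may still point at it.
 */
static void toy_unmap_range(struct toy_log *log, const bool *shared, int nr)
{
	bool force_flush = false;

	toy_record(log, TOY_LOCK);
	for (int i = 0; i < nr; i++) {
		if (shared[i]) {
			toy_record(log, TOY_UNSHARE);
			force_flush = true;
		}
	}
	if (force_flush)
		toy_record(log, TOY_FLUSH);
	toy_record(log, TOY_UNLOCK);
}
```

      For a range with one shared entry the log reads lock, unshare, flush,
      unlock; for a range with no shared entries no flush is forced.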
      
      Fixes: 24669e58 ("hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages") # 3.6
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. Nov 20, 2021
  8. Nov 18, 2021
  9. Nov 17, 2021
  10. Nov 13, 2021
  11. Nov 11, 2021