  Oct 13, 2022
    • mm/migrate_device.c: add migrate_device_range() · e778406b
      Alistair Popple authored
      Device drivers can use the migrate_vma family of functions to migrate
      existing private anonymous mappings to device private pages.  These pages
      are backed by memory on the device with drivers being responsible for
      copying data to and from device memory.
      
      Device private pages are freed via the pgmap->page_free() callback when
      they are unmapped and their refcount drops to zero.  Alternatively they
      may be freed indirectly via migration back to CPU memory in response to a
      pgmap->migrate_to_ram() callback called whenever the CPU accesses an
      address mapped to a device private page.
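
      For orientation, these are the two hooks involved on struct
      dev_pagemap_ops. A minimal sketch of a driver's ops table (the
      my_devmem_* handler names are hypothetical):

      static const struct dev_pagemap_ops my_devmem_pagemap_ops = {
              /* Called when a device private page is unmapped and its
               * refcount drops to zero; returns the page to the driver. */
              .page_free      = my_devmem_page_free,
              /* Called when the CPU faults on an address backed by a
               * device private page; must migrate it back to CPU memory. */
              .migrate_to_ram = my_devmem_migrate_to_ram,
      };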
      
      In other words drivers cannot control the lifetime of data allocated on
      the devices and must wait until these pages are freed from userspace. 
      This causes issues when memory needs to be reclaimed on the device, either
      because the device is going away due to a ->release() callback or because
      another user needs to use the memory.
      
      Drivers could use the existing migrate_vma functions to migrate data off
      the device.  However this would require them to track the mappings of each
      page which is both complicated and not always possible.  Instead drivers
      need to be able to migrate device pages directly so they can free up
      device memory.
      
      To allow that, this patch introduces the migrate_device family of
      functions, which are functionally similar to migrate_vma but skip the
      initial lookup based on mapping.
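
      As a rough illustration, a device memory eviction path built on the
      new family might look like the sketch below (loosely modeled on the
      hmm-tests changes later in this series; the function name, start_pfn
      and npages are placeholders, and the device-to-CPU copy is elided):

      /* Requires <linux/migrate.h>, <linux/mm.h> and <linux/slab.h>. */
      static void my_evict_device_range(unsigned long start_pfn,
                                        unsigned long npages)
      {
              unsigned long i, *src_pfns, *dst_pfns;

              src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
              dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
              if (!src_pfns || !dst_pfns)
                      goto out;

              /* Collect and isolate the device private pages backing the
               * pfn range, with no vma or mapping lookup required. */
              migrate_device_range(src_pfns, start_pfn, npages);

              for (i = 0; i < npages; i++) {
                      struct page *spage = migrate_pfn_to_page(src_pfns[i]);
                      struct page *dpage;

                      if (!spage || !(src_pfns[i] & MIGRATE_PFN_MIGRATE))
                              continue;

                      /* Allocate and lock a CPU page as the target. */
                      dpage = alloc_page(GFP_HIGHUSER);
                      if (!dpage)
                              continue;
                      lock_page(dpage);
                      /* A real driver copies spage's device data into
                       * dpage here, e.g. via DMA. */
                      dst_pfns[i] = migrate_pfn(page_to_pfn(dpage));
              }

              /* Replace the old mappings, then free the device pages. */
              migrate_device_pages(src_pfns, dst_pfns, npages);
              migrate_device_finalize(src_pfns, dst_pfns, npages);
      out:
              kfree(src_pfns);
              kfree(dst_pfns);
      }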
      
      Link: https://lkml.kernel.org/r/868116aab70b0c8ee467d62498bb2cf0ef907295.1664366292.git-series.apopple@nvidia.com
      
      
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Alex Sierra <alex.sierra@amd.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: free device private pages have zero refcount · ef233450
      Alistair Popple authored
      Since 27674ef6 ("mm: remove the extra ZONE_DEVICE struct page
      refcount") device private pages have no longer had an extra reference
      count when the page is in use.  However before handing them back to the
      owning device driver we add an extra reference count such that free pages
      have a reference count of one.
      
      This makes it difficult to tell if a page is free or not because both free
      and in use pages will have a non-zero refcount.  Instead we should return
      pages to the driver's page allocator with a zero reference count.  Kernel
      code can then safely use kernel functions such as get_page_unless_zero().
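
      On the allocation side, the series adds zone_device_page_init(),
      which sets the refcount back to one and locks the page before the
      driver hands it out. A minimal sketch of an allocation path under
      that scheme (my_pop_free_page() is a hypothetical driver free-list
      helper):

      struct page *page;

      /* Free device private pages now sit at refcount zero on the
       * driver's free list. */
      page = my_pop_free_page();
      if (page)
              /* Re-initialise the refcount (to one) and lock the page. */
              zone_device_page_init(page);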
      
      Link: https://lkml.kernel.org/r/cf70cf6f8c0bdb8aaebdbfb0d790aea4c683c3c6.1664366292.git-series.apopple@nvidia.com
      
      
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Alex Sierra <alex.sierra@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/memory.c: fix race when faulting a device private page · 16ce101d
      Alistair Popple authored
      Patch series "Fix several device private page reference counting issues",
      v2
      
      This series aims to fix a number of page reference counting issues in
      drivers dealing with device private ZONE_DEVICE pages.  These result in
      use-after-free type bugs, either from accessing a struct page which no
      longer exists because it has been removed or accessing fields within the
      struct page which are no longer valid because the page has been freed.
      
      During normal usage it is unlikely these will cause any problems.  However
      without these fixes it is possible to crash the kernel from userspace. 
      These crashes can be triggered either by unloading the kernel module or
      unbinding the device from the driver prior to a userspace task exiting. 
      In modules such as Nouveau it is also possible to trigger some of these
      issues by explicitly closing the device file-descriptor prior to the task
      exiting and then accessing device private memory.
      
      This involves some minor changes to both PowerPC and AMD GPU code.
      Unfortunately I lack hardware to test either of those so any help there
      would be appreciated.  The changes mimic what is done for both Nouveau
      and hmm-tests though, so I doubt they will cause problems.
      
      
      This patch (of 8):
      
      When the CPU tries to access a device private page the migrate_to_ram()
      callback associated with the pgmap for the page is called.  However no
      reference is taken on the faulting page.  Therefore a concurrent migration
      of the device private page can free the page and possibly the underlying
      pgmap.  This results in a race which can crash the kernel due to the
      migrate_to_ram() function pointer becoming invalid.  It also means drivers
      can't reliably read the zone_device_data field because the page may have
      been freed with memunmap_pages().
      
      Close the race by getting a reference on the page while holding the ptl to
      ensure it has not been freed.  Unfortunately the elevated reference count
      will cause the migration required to handle the fault to fail.  To avoid
      this failure pass the faulting page into the migrate_vma functions so that
      if an elevated reference count is found it can be checked to see if it's
      expected or not.
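
      The shape of the resulting fix in do_swap_page() is the standard
      revalidate-pin-unlock pattern, simplified here from the actual
      patch:

      } else if (is_device_private_entry(entry)) {
              vmf->page = pfn_swap_entry_to_page(entry);
              vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
                              vmf->address, &vmf->ptl);
              if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte)))
                      goto unlock;

              /* Get a page reference while we know the page can't be
               * freed. */
              get_page(vmf->page);
              pte_unmap_unlock(vmf->pte, vmf->ptl);
              ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
              put_page(vmf->page);
      }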
      
      [mpe@ellerman.id.au: fix build]
        Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au
      Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@nvidia.com
      Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.1664366292.git-series.apopple@nvidia.com
      
      
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Alex Sierra <alex.sierra@amd.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon: move sz_damon_region to damon_sz_region · 652e0446
      Xin Hao authored
      Rename sz_damon_region() to damon_sz_region() and move it to
      "include/linux/damon.h", because it is needed in many places.
      
      Link: https://lkml.kernel.org/r/20220927001946.85375-1-xhao@linux.alibaba.com
      
      
      Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
      Suggested-by: SeongJae Park <sj@kernel.org>
      Reviewed-by: SeongJae Park <sj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Oct 11, 2022
    • prandom: remove unused functions · de492c83
      Jason A. Donenfeld authored
      
      With no callers left of prandom_u32() and prandom_bytes(), as well as
      get_random_int(), remove these deprecated wrappers, in favor of
      get_random_u32() and get_random_bytes().
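
      A short before/after sketch of the replacements callers had already
      switched to (r and buf are placeholder variables):

      u32 r;
      u8 buf[16];

      /* Removed wrapper             Replacement                         */
      /* r = prandom_u32();      */  r = get_random_u32();
      /* r = get_random_int();   */  r = get_random_u32();
      /* prandom_bytes(buf, 16); */  get_random_bytes(buf, sizeof(buf));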
      
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Yury Norov <yury.norov@gmail.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • treewide: use prandom_u32_max() when possible, part 1 · 81895a65
      Jason A. Donenfeld authored
      
      Rather than incurring a division or requesting too many random bytes for
      the given range, use the prandom_u32_max() function, which only takes
      the minimum required bytes from the RNG and avoids divisions. This was
      done mechanically with this coccinelle script:
      
      @basic@
      expression E;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u64;
      @@
      (
      - ((T)get_random_u32() % (E))
      + prandom_u32_max(E)
      |
      - ((T)get_random_u32() & ((E) - 1))
      + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
      |
      - ((u64)(E) * get_random_u32() >> 32)
      + prandom_u32_max(E)
      |
      - ((T)get_random_u32() & ~PAGE_MASK)
      + prandom_u32_max(PAGE_SIZE)
      )
      
      @multi_line@
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      identifier RAND;
      expression E;
      @@
      
      -       RAND = get_random_u32();
              ... when != RAND
      -       RAND %= (E);
      +       RAND = prandom_u32_max(E);
      
      // Find a potential literal
      @literal_mask@
      expression LITERAL;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      position p;
      @@
      
              ((T)get_random_u32()@p & (LITERAL))
      
      // Add one to the literal.
      @script:python add_one@
      literal << literal_mask.LITERAL;
      RESULT;
      @@
      
      value = None
      if literal.startswith('0x'):
              value = int(literal, 16)
      elif literal[0] in '123456789':
              value = int(literal, 10)
      if value is None:
              print("I don't know how to handle %s" % (literal))
              cocci.include_match(False)
      elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
              print("Skipping 0x%x for cleanup elsewhere" % (value))
              cocci.include_match(False)
      elif value & (value + 1) != 0:
              print("Skipping 0x%x because it's not a power of two minus one" % (value))
              cocci.include_match(False)
      elif literal.startswith('0x'):
              coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
      else:
              coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))
      
      // Replace the literal mask with the calculated result.
      @plus_one@
      expression literal_mask.LITERAL;
      position literal_mask.p;
      expression add_one.RESULT;
      identifier FUNC;
      @@
      
      -       (FUNC()@p & (LITERAL))
      +       prandom_u32_max(RESULT)
      
      @collapse_ret@
      type T;
      identifier VAR;
      expression E;
      @@
      
       {
      -       T VAR;
      -       VAR = (E);
      -       return VAR;
      +       return E;
       }
      
      @drop_var@
      type T;
      identifier VAR;
      @@
      
       {
      -       T VAR;
              ... when != VAR
       }
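
      For context, prandom_u32_max() avoids the division by using the
      multiply-and-shift range reduction trick; at the time it was
      defined essentially as:

      static inline u32 prandom_u32_max(u32 ep_ro)
      {
              return (u32)(((u64) prandom_u32() * ep_ro) >> 32);
      }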
      
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Yury Norov <yury.norov@gmail.com>
      Reviewed-by: KP Singh <kpsingh@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
      Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
      Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
      Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • cgroup: add cgroup_v1v2_get_from_[fd/file]() · a6d1ce59
      Yosry Ahmed authored
      
      Add cgroup_v1v2_get_from_fd() and cgroup_v1v2_get_from_file() that
      support both cgroup1 and cgroup2.
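
      A minimal sketch of how a caller might use the fd variant (fd is
      caller-supplied; the cgroup is released with cgroup_put()):

      struct cgroup *cgrp;

      /* Resolve an fd that may name either a cgroup1 or cgroup2
       * directory. */
      cgrp = cgroup_v1v2_get_from_fd(fd);
      if (IS_ERR(cgrp))
              return PTR_ERR(cgrp);

      /* ... use cgrp ... */

      cgroup_put(cgrp);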
      
      Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>