  1. May 27, 2022
    • mm: fix is_pinnable_page against a cma page · 1c563432
      Minchan Kim authored
      Pages in the CMA area can have MIGRATE_ISOLATE as well as MIGRATE_CMA,
      so the current is_pinnable_page() can miss CMA pages whose migratetype
      is MIGRATE_ISOLATE.  Such pages end up long-term pinned via the
      pin_user_pages() API, so CMA allocations keep failing until the pin is
      released.
      
           CPU 0                                   CPU 1 - Task B
      
      cma_alloc
      alloc_contig_range
                                              pin_user_pages_fast(FOLL_LONGTERM)
      change pageblock as MIGRATE_ISOLATE
                                              internal_get_user_pages_fast
                                              lockless_pages_from_mm
                                              gup_pte_range
                                              try_grab_folio
                                              is_pinnable_page
                                                return true;
                                              So, pinned the page successfully.
      page migration failure with pinned page
                                              ..
                                              .. After 30 sec
                                              unpin_user_page(page)
      
      CMA allocation succeeded after 30 sec.
      
      The CMA allocation path protects against the migratetype-change race
      with zone->lock, but all the GUP path needs to know is whether the
      page is in the CMA area, not its exact migratetype.  Thus, we don't
      need zone->lock; it is enough to check whether the migratetype is
      either MIGRATE_ISOLATE or MIGRATE_CMA.
      
      Adding the MIGRATE_ISOLATE check to is_pinnable_page() could reject
      pinning pages on MIGRATE_ISOLATE pageblocks even when they belong to
      neither the CMA area nor a movable zone, since such pages are
      temporarily unmovable.  However, migration failure caused by an
      unexpected transient refcount hold is a general issue, not one
      specific to MIGRATE_ISOLATE, and MIGRATE_ISOLATE is itself a transient
      state, just like any other temporarily elevated refcount.
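
      The fix can be sketched as below; this paraphrases the patched helper
      in include/linux/mm.h rather than quoting the diff, and assumes the
      surrounding helpers (get_pageblock_migratetype(),
      is_zone_movable_page()) as they exist in this kernel:

      static inline bool is_pinnable_page(struct page *page)
      {
      #ifdef CONFIG_CMA
              int mt = get_pageblock_migratetype(page);

              /*
               * Racy read without zone->lock, but a CMA page is caught
               * whether its pageblock currently reads MIGRATE_CMA or the
               * transient MIGRATE_ISOLATE set by alloc_contig_range().
               */
              if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
                      return false;
      #endif
              return !is_zone_movable_page(page);
      }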
      
      Link: https://lkml.kernel.org/r/20220524171525.976723-1-minchan@kernel.org

      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Acked-by: Paul E. McKenney <paulmck@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: split free page with properly free memory accounting and without race · 86d28b07
      Zi Yan authored
      In isolate_single_pageblock(), free pages are checked without holding
      the zone lock, but they can go away before split_free_page() takes
      that lock.  Check the free page and its order again in
      split_free_page() once the zone lock is held, and recheck the page if
      the free page has gone away under the lock.

      In addition, split_free_page() deleted the free page from the free
      list without updating the free page accounting.  Add the missing
      accounting code.

      Also fix the type of the order parameter in split_free_page().
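
      A hedged sketch of the resulting split_free_page() entry path follows;
      it paraphrases mm/page_alloc.c around this release (details elided
      where marked) rather than reproducing the diff:

      static int split_free_page(struct page *free_page,
                                 unsigned int order, unsigned long split_pfn_offset)
      {
              struct zone *zone = page_zone(free_page);
              unsigned long flags;
              int mt, ret = 0;

              if (split_pfn_offset == 0)
                      return ret;

              spin_lock_irqsave(&zone->lock, flags);

              /* The page may have been allocated while the lock was not held. */
              if (!PageBuddy(free_page) || buddy_order(free_page) != order) {
                      ret = -ENOENT;
                      goto out;
              }

              /* Keep the freepage counters in sync with the list removal. */
              mt = get_pageblock_migratetype(free_page);
              if (likely(!is_migrate_isolate(mt)))
                      __mod_zone_freepage_state(zone, -(1UL << order), mt);

              del_page_from_free_list(free_page, zone, order);
              /* ... split into smaller buddies and hand them back ... */
      out:
              spin_unlock_irqrestore(&zone->lock, flags);
              return ret;
      }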
      
      Link: https://lore.kernel.org/lkml/20220525103621.987185e2ca0079f7b97b856d@linux-foundation.org/
      Link: https://lkml.kernel.org/r/20220526231531.2404977-2-zi.yan@sent.com

      Fixes: b2c9e2fb ("mm: make alloc_contig_range work at pageblock granularity")
      Signed-off-by: Zi Yan <ziy@nvidia.com>
      Reported-by: Doug Berger <opendmb@gmail.com>
        Link: https://lore.kernel.org/linux-mm/c3932a6f-77fe-29f7-0c29-fe6b1c67ab7b@gmail.com/
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Qian Cai <quic_qiancai@quicinc.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Eric Ren <renzhengeek@gmail.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michael Walle <michael@walle.cc>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/page_alloc: always attempt to allocate at least one page during bulk allocation · c572e488
      Mel Gorman authored
      Peter Pavlisko reported the following problem on kernel bugzilla 216007.
      
      	When I try to extract an uncompressed tar archive (2.6 million
      	files, 760.3 GiB in size) on a newly created (empty) XFS file
      	system, after the first low tens of gigabytes extracted the
      	process hangs in iowait indefinitely. One CPU core is 100%
      	occupied with iowait, the other CPU core is idle (on a 2-core
      	Intel Celeron G1610T).
      
      It was bisected to c9fa5630 ("xfs: use alloc_pages_bulk_array() for
      buffers") but XFS is only the messenger.  The problem is that nothing
      wakes kswapd to reclaim pages when the PCP lists cannot be refilled
      until some reclaim happens.  The bulk allocator merely checks that
      there are some pages in the array; the original intent was that a bulk
      allocation did not necessarily need all the requested pages and that
      it was best to return as quickly as possible.
      
      This was fine for the first user of the API, but both NFS and XFS
      require the requested number of pages to be available before making
      progress.  Both could be adjusted to call the page allocator directly
      if a bulk allocation fails, but that puts a burden on users of the
      API.  Instead, adjust the semantics to attempt at least one allocation
      via __alloc_pages() before returning, so that kswapd is woken if
      necessary.
      
      It was reported via bugzilla that the patch addressed the problem and that
      the tar extraction completed successfully.  This may also address bug
      215975 but has yet to be confirmed.
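
      The new semantics amount to the sketch below, a paraphrase of the
      failure path in __alloc_pages_bulk() in mm/page_alloc.c (locals and
      labels simplified, not the verbatim diff):

      failed:
              /*
               * Instead of returning an empty array, fall back to one
               * __alloc_pages() call: the normal slow path runs, kswapd
               * is woken if needed, and the caller gets at least one page.
               */
              page = __alloc_pages(gfp, 0, preferred_nid, nodemask);
              if (page) {
                      if (page_list)
                              list_add(&page->lru, page_list);
                      else
                              page_array[nr_populated] = page;
                      nr_populated++;
              }
              goto out;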
      
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216007
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215975
      Link: https://lkml.kernel.org/r/20220526091210.GC3441@techsingularity.net

      Fixes: 387ba26f ("mm/page_alloc: add a bulk page allocator")
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: "Darrick J. Wong" <djwong@kernel.org>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: <stable@vger.kernel.org>	[5.13+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. May 25, 2022
    • mm: fix a potential infinite loop in start_isolate_page_range() · 88ee1343
      Zi Yan authored
      In isolate_single_pageblock(), called by start_isolate_page_range(),
      there are several pageblock isolation issues that can cause a
      potential infinite loop when isolating a page range.  This was
      reported by Qian Cai.
      
      1. The pageblock was isolated by just changing the pageblock
         migratetype, without checking for unmovable pages. Call
         set_migratetype_isolate() to isolate the pageblock properly (see
         the sketch after this list).
      2. An off-by-one error caused pages to be migrated unnecessarily even
         though the page does not cross a pageblock boundary.
      3. Migrating a compound page across a pageblock boundary and then
         splitting the free page later leaves a small race window in which
         the free page might be allocated again, making the code retry and
         causing a potential infinite loop. Temporarily set the
         to-be-migrated page's pageblock to MIGRATE_ISOLATE to prevent this,
         and bail out early if no free page is found after page migration.
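
      For fix 1, a hedged sketch of the change in isolate_single_pageblock()
      (paraphrasing mm/page_isolation.c; the set_migratetype_isolate()
      signature and the flags local are assumptions as of this kernel and
      should be checked against the tree):

              /*
               * Previously a bare set_pageblock_migratetype(page,
               * MIGRATE_ISOLATE), which flips the migratetype without
               * checking for unmovable pages. Isolate properly instead,
               * failing the range early if the pageblock has unmovable
               * pages; flags are the caller's isolation flags.
               */
              ret = set_migratetype_isolate(pfn_to_page(isolate_pageblock),
                              MIGRATE_ISOLATE, flags, isolate_pageblock,
                              isolate_pageblock + pageblock_nr_pages);
              if (ret)
                      return ret;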
      
      An additional fix to split_free_page() aims to avoid crashing in
      __free_one_page().  When the free page is split at the specified
      split_pfn_offset, free_page_order should consider both the lowest set
      bit of free_page_pfn and the highest set bit of split_pfn_offset and
      use the smaller of the two.  For example, with free_page_pfn=0x10000
      and split_pfn_offset=0xc000, the split chunks should be 0x8000 then
      0x4000 pages, instead of 0x4000 then 0x8000 as the original algorithm
      produced.
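
      Concretely, the fixed loop picks each chunk order as the smaller of
      the two bounds above; the sketch below paraphrases the splitting loop
      in split_free_page() rather than quoting the diff:

              for (pfn = free_page_pfn;
                   pfn < free_page_pfn + (1UL << order);) {
                      int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);

                      /*
                       * __ffs(pfn) bounds the order by pfn alignment, while
                       * __fls(split_pfn_offset) bounds it by the distance to
                       * the split point. For free_page_pfn=0x10000 and
                       * split_pfn_offset=0xc000 this frees an 0x8000-page
                       * buddy first, then a 0x4000-page buddy.
                       */
                      free_page_order = min_t(unsigned int,
                                              pfn ? __ffs(pfn) : order,
                                              __fls(split_pfn_offset));
                      __free_one_page(pfn_to_page(pfn), pfn, zone,
                                      free_page_order, mt, FPI_NONE);
                      pfn += 1UL << free_page_order;
                      split_pfn_offset -= (1UL << free_page_order);
                      /* The split point is reached; cover the remainder. */
                      if (split_pfn_offset == 0)
                              split_pfn_offset =
                                      (1UL << order) - (pfn - free_page_pfn);
              }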
      
      [akpm@linux-foundation.org: suppress min() warning]
      Link: https://lkml.kernel.org/r/20220524194756.1698351-1-zi.yan@sent.com

      Fixes: b2c9e2fb ("mm: make alloc_contig_range work at pageblock granularity")
      Signed-off-by: Zi Yan <ziy@nvidia.com>
      Reported-by: Qian Cai <quic_qiancai@quicinc.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric Ren <renzhengeek@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  3. Apr 15, 2022
    • mm, page_alloc: fix build_zonerefs_node() · e553f62f
      Juergen Gross authored
      Since commit 6aa303de ("mm, vmscan: only allocate and reclaim from
      zones with pages managed by the buddy allocator"), only zones with
      free memory are included in a rebuilt zonelist.  This is problematic
      when, for example, all memory of a zone has been ballooned out at the
      time the zonelists are rebuilt.
      
      The decision whether to rebuild the zonelists when onlining new memory
      is based on populated_zone() returning 0 for the zone the memory will
      be added to.  The new zone is added to the zonelists only if it has
      free memory pages (managed_zone() returns a non-zero value) after the
      memory has been onlined.  This implies that onlining memory will
      always free the added pages to the allocator immediately, but that is
      not true in all cases: when running as a Xen guest, for example, the
      onlined new memory is added only to the ballooned memory list and is
      freed only when the guest is ballooned up afterwards.
      
      Another problem with using managed_zone() to decide whether a zone is
      added to the zonelists is that a zone whose memory is entirely in use
      will in fact be removed from all zonelists should the zonelists happen
      to be rebuilt.

      Use populated_zone() when building a zonelist, as was done before that
      commit.
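
      The resulting helper is essentially the following (a sketch of
      build_zonerefs_node() in mm/page_alloc.c after the switch back to
      populated_zone(), not the literal diff):

      static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
      {
              struct zone *zone;
              enum zone_type zone_type = MAX_NR_ZONES;
              int nr_zones = 0;

              do {
                      zone_type--;
                      zone = pgdat->node_zones + zone_type;
                      /*
                       * Check present_pages (populated_zone) rather than
                       * managed_pages (managed_zone): a zone whose memory
                       * is currently all ballooned out or in use must
                       * still be reachable through the zonelists.
                       */
                      if (populated_zone(zone)) {
                              zoneref_set_zone(zone, &zonerefs[nr_zones++]);
                              check_highest_zone(zone_type);
                      }
              } while (zone_type);

              return nr_zones;
      }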
      
      There was a report that QubesOS (based on Xen) is hitting this
      problem.  Xen switched to using the zone device functionality in
      kernel 5.9, and QubesOS wants to use memory hotplug for guests in
      order to start a guest with minimal memory and expand it as needed.
      This was the report that led to the patch.
      
      Link: https://lkml.kernel.org/r/20220407120637.9035-1-jgross@suse.com

      Fixes: 6aa303de ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>