  3. Sep 04, 2024
    • ipc/shm, mm: drop do_vma_munmap() · 63fc66f5
      Liam R. Howlett authored
      The do_vma_munmap() wrapper existed for callers that didn't have a vma
      iterator and needed to check the vma mseal status prior to calling the
      underlying munmap().  All callers now use a vma iterator and since the
      mseal check has been moved to do_vmi_align_munmap() and the vmas are
      aligned, this function can just be called instead.
      
      do_vmi_align_munmap() can no longer be static, since ipc/shm now uses
      it; it is exported via the mm.h header.
      
      Link: https://lkml.kernel.org/r/20240830040101.822209-19-Liam.Howlett@oracle.com
      
      
      Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
      Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
      Cc: Bert Karwatzki <spasswolf@web.de>
      Cc: Jeff Xu <jeffxu@chromium.org>
      Cc: Jiri Olsa <olsajiri@gmail.com>
      Cc: Kees Cook <kees@kernel.org>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  4. Aug 13, 2024
    • introduce fd_file(), convert all accessors to it. · 1da91ea8
      Al Viro authored
      
      	For any change to the struct fd representation, we need to
      turn existing field accesses into calls of wrappers.
      Accesses to struct fd::flags are very few (3 in linux/file.h,
      1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in
      explicit initializers).
      	Those can be dealt with in the commit converting to
      new layout; accesses to struct fd::file are too many for that.
      	This commit converts (almost) all of f.file to
      fd_file(f).  It's not entirely mechanical ('file' is used as
      a member name more than just in struct fd) and it does not
      even attempt to distinguish the uses in pointer context from
      those in boolean context; the latter will eventually be turned
      into a separate helper (fd_empty()).
      
      	NOTE: mass conversion to fd_empty(), tempting as it
      might be, is a bad idea; better to do that piecewise in the commits
      that convert from fdget...() to CLASS(...).
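      The accessor-wrapper pattern the commit describes can be sketched in
      plain C (simplified stand-in types; the real struct fd and fd_file()
      live in include/linux/file.h and differ in detail):

      ```c
      #include <assert.h>
      #include <stdio.h>

      /* Simplified stand-ins for the kernel's types. */
      struct file { int dummy; };

      struct fd {
          struct file *file;
          unsigned int flags;
      };

      /* Accessor wrapper: callers write fd_file(f) instead of f.file, so
       * the representation of struct fd can later change without touching
       * every caller. */
      #define fd_file(f) ((f).file)

      int main(void)
      {
          struct file filp = { 42 };
          struct fd f = { &filp, 0 };

          /* before: direct field access; after: the wrapper */
          assert(f.file == fd_file(f));
          printf("%d\n", fd_file(f)->dummy);
          return 0;
      }
      ```

      This is why the conversion can be "(almost) mechanical": only the
      spelling of the access changes, not its meaning.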
      
      [conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c
      caught by git; fs/stat.c one got caught by git grep]
      [fs/xattr.c conflict]
      
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  5. Jul 24, 2024
    • sysctl: treewide: constify the ctl_table argument of proc_handlers · 78eb4ea2
      Joel Granados authored
      
      const qualify the struct ctl_table argument in the proc_handler function
      signatures. This is a prerequisite to moving the static ctl_table
      structs into .rodata, which will ensure that proc_handler function
      pointers cannot be modified.
      
      This patch has been generated by the following coccinelle script:
      
      ```
        virtual patch
      
        @r1@
        identifier ctl, write, buffer, lenp, ppos;
        identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
        @@
      
        int func(
        - struct ctl_table *ctl
        + const struct ctl_table *ctl
          ,int write, void *buffer, size_t *lenp, loff_t *ppos);
      
        @r2@
        identifier func, ctl, write, buffer, lenp, ppos;
        @@
      
        int func(
        - struct ctl_table *ctl
        + const struct ctl_table *ctl
          ,int write, void *buffer, size_t *lenp, loff_t *ppos)
        { ... }
      
        @r3@
        identifier func;
        @@
      
        int func(
        - struct ctl_table *
        + const struct ctl_table *
          ,int , void *, size_t *, loff_t *);
      
        @r4@
        identifier func, ctl;
        @@
      
        int func(
        - struct ctl_table *ctl
        + const struct ctl_table *ctl
          ,int , void *, size_t *, loff_t *);
      
        @r5@
        identifier func, write, buffer, lenp, ppos;
        @@
      
        int func(
        - struct ctl_table *
        + const struct ctl_table *
          ,int write, void *buffer, size_t *lenp, loff_t *ppos);
      
      ```
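      The resulting signature change can be illustrated with a minimal,
      self-contained sketch (the ctl_table here is a simplified stand-in
      for the kernel's, and loff_t is replaced by long long):

      ```c
      #include <stddef.h>
      #include <stdio.h>

      /* Simplified stand-in; the real definition is in include/linux/sysctl.h. */
      struct ctl_table { const char *procname; };

      /* After the treewide change the table argument is const-qualified,
       * so a handler cannot modify the ctl_table it is handed -- the
       * prerequisite for moving the static tables into .rodata. */
      static int sample_handler(const struct ctl_table *ctl, int write,
                                void *buffer, size_t *lenp, long long *ppos)
      {
          (void)write; (void)buffer; (void)lenp; (void)ppos;
          printf("%s\n", ctl->procname);
          return 0;
      }

      int main(void)
      {
          struct ctl_table t = { "kernel_demo" };
          return sample_handler(&t, 0, NULL, NULL, NULL);
      }
      ```

      Any handler that tried to write through `ctl` would now fail to
      compile, which is exactly the property the coccinelle patch enforces
      treewide.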
      
      * Code formatting was adjusted in xfs_sysctl.c to comply with code
        conventions. The xfs_stats_clear_proc_handler,
        xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax were
        adjusted.
      
      * The ctl_table argument in proc_watchdog_common was const qualified.
        This is called from a proc_handler itself and is calling back into
        another proc_handler, making it necessary to change it as part of the
        proc_handler migration.
      
      Co-developed-by: Thomas Weißschuh <linux@weissschuh.net>
      Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
      Co-developed-by: Joel Granados <j.granados@samsung.com>
      Signed-off-by: Joel Granados <j.granados@samsung.com>
  15. Oct 25, 2023
    • mempolicy: alloc_pages_mpol() for NUMA policy without vma · ddc1a5cb
      Hugh Dickins authored
      Shrink shmem's stack usage by eliminating the pseudo-vma from its folio
      allocation.  alloc_pages_mpol(gfp, order, pol, ilx, nid) becomes the
      principal actor for passing mempolicy choice down to __alloc_pages(),
      rather than vma_alloc_folio(gfp, order, vma, addr, hugepage).
      
      vma_alloc_folio() and alloc_pages() remain, but as wrappers around
      alloc_pages_mpol().  alloc_pages_bulk_*() untouched, except to provide the
      additional args to policy_nodemask(), which subsumes policy_node(). 
      Cleanup throughout, cutting out some unhelpful "helpers".
      
      It would all be much simpler without MPOL_INTERLEAVE, but that adds a
      dynamic to the constant mpol: complicated by v3.6 commit 09c231cb
      ("tmpfs: distribute interleave better across nodes"), which added ino bias
      to the interleave, hidden from mm/mempolicy.c until this commit.
      
      Hence "ilx" throughout, the "interleave index".  Originally I thought it
      could be done just with nid, but that's wrong: the nodemask may come from
      the shared policy layer below a shmem vma, or it may come from the task
      layer above a shmem vma; and without the final nodemask then nodeid cannot
      be decided.  And how ilx is applied depends also on page order.
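      The role of the interleave index can be shown with a hedged sketch:
      under MPOL_INTERLEAVE, the allocation lands on the (ilx mod N)-th
      node of a nodemask with N allowed nodes. Function and variable names
      here are illustrative, not the kernel's:

      ```c
      #include <stdio.h>

      /* Toy model of interleave node selection: pick the (ilx % N)-th
       * allowed node.  The kernel additionally biases ilx by inode (the
       * "ino bias" from commit 09c231cb) so different files interleave
       * starting from different nodes. */
      static int interleave_nid(const int *allowed, int nr_allowed,
                                unsigned long ilx)
      {
          return allowed[ilx % nr_allowed];
      }

      int main(void)
      {
          int nodes[] = { 0, 2, 3 };  /* nodemask with three allowed nodes */
          for (unsigned long ilx = 0; ilx < 6; ilx++)
              printf("ilx=%lu -> node %d\n", ilx, interleave_nid(nodes, 3, ilx));
          return 0;
      }
      ```

      This also shows why nid alone is not enough: without the final
      nodemask, ilx cannot be turned into a node id.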
      
      The interleave index is almost always irrelevant unless MPOL_INTERLEAVE:
      with one exception in alloc_pages_mpol(), where the NO_INTERLEAVE_INDEX
      passed down from vma-less alloc_pages() is also used as hint not to use
      THP-style hugepage allocation - to avoid the overhead of a hugepage arg
      (though I don't understand why we never just added a GFP bit for THP - if
      it actually needs a different allocation strategy from other pages of the
      same order).  vma_alloc_folio() still carries its hugepage arg here, but
      it is not used, and should be removed when agreed.
      
      get_vma_policy() no longer allows a NULL vma: over time I believe we've
      eradicated all the places which used to need it e.g.  swapoff and madvise
      used to pass NULL vma to read_swap_cache_async(), but now know the vma.
      
      [hughd@google.com: handle NULL mpol being passed to __read_swap_cache_async()]
        Link: https://lkml.kernel.org/r/ea419956-4751-0102-21f7-9c93cb957892@google.com
      Link: https://lkml.kernel.org/r/74e34633-6060-f5e3-aee-7040d43f2e93@google.com
      Link: https://lkml.kernel.org/r/1738368e-bac0-fd11-ed7f-b87142a939fe@google.com
      
      
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Nhat Pham <nphamcs@gmail.com>
      Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Cc: Domenico Cerasuolo <mimmocerasuolo@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  18. Aug 15, 2023
    • sysctl: Add a size arg to __register_sysctl_table · bff97cf1
      Joel Granados authored
      
      We make these changes in order to prepare __register_sysctl_table and
      its callers for when we remove the sentinel element (empty element at
      the end of ctl_table arrays). We don't actually remove any sentinels in
      this commit, but we *do* make sure to use ARRAY_SIZE so the table_size
      is available when the removal occurs.
      
      We add a table_size argument to __register_sysctl_table and adjust
      callers, all of which pass ctl_table pointers and need an explicit call
      to ARRAY_SIZE. We implement a size calculation in register_net_sysctl in
      order to forward the size of the array pointer received from the network
      register calls.
      
      The new table_size argument does not yet have any effect in the
      init_header call which is still dependent on the sentinel's presence.
      table_size *does* however drive the `kzalloc` allocation in
      __register_sysctl_table with no adverse effects as the allocated memory
      is either one element greater than the calculated ctl_table array (for
      the calls in ipc_sysctl.c, mq_sysctl.c and ucount.c) or the exact size
      of the calculated ctl_table array (for the call from sysctl_net.c and
      register_sysctl). This approach will allow us to "just" remove the
      sentinel without further changes to __register_sysctl_table, as
      table_size will represent the exact size for all the callers at that
      point.
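      The new calling convention can be sketched in userspace C
      (register_table and its ctl_table are hypothetical stand-ins for the
      kernel's __register_sysctl_table machinery):

      ```c
      #include <stdio.h>

      #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

      struct ctl_table { const char *procname; };  /* simplified stand-in */

      /* Registration takes an explicit table_size instead of scanning for
       * a sentinel (empty) terminating element. */
      static void register_table(const struct ctl_table *table,
                                 size_t table_size)
      {
          for (size_t i = 0; i < table_size; i++)
              printf("registering %s\n", table[i].procname);
      }

      int main(void)
      {
          static const struct ctl_table demo[] = {
              { "fs_demo_a" },
              { "fs_demo_b" },
          };
          /* callers pass ARRAY_SIZE, so no sentinel entry is needed */
          register_table(demo, ARRAY_SIZE(demo));
          return 0;
      }
      ```

      Since ARRAY_SIZE is computed at compile time from the array's
      declaration, the size stays correct automatically as entries are
      added or removed.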
      
      Signed-off-by: Joel Granados <j.granados@samsung.com>
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
  23. Jan 19, 2023
    • fs: port ->permission() to pass mnt_idmap · 4609e1f1
      Christian Brauner authored
      
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient, but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevant on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area, this can be a potential source of
      bugs.
      
      Once the conversion to struct mnt_idmap is done, all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the
      two, eliminating this class of bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
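      The "dedicated type prevents conflation" idea can be sketched as
      follows (simplified stand-ins; the real struct mnt_idmap is opaque
      and its fields differ, so the `owner` member here is hypothetical):

      ```c
      #include <stdio.h>

      struct user_namespace { const char *name; };

      /* Wrapping the mount's idmapping in its own type means a caller can
       * no longer pass a filesystem-level namespace where a mount-level
       * one is expected: the compiler rejects it. */
      struct mnt_idmap {
          struct user_namespace *owner;  /* hypothetical field */
      };

      /* old-style (easy to conflate or swap the two arguments):
       *   int check(struct user_namespace *mnt_ns,
       *             struct user_namespace *fs_ns);
       * new-style: */
      static void check_permission(struct mnt_idmap *idmap)
      {
          printf("mount idmap owner: %s\n", idmap->owner->name);
      }

      int main(void)
      {
          struct user_namespace ns = { "init_user_ns" };
          struct mnt_idmap idmap = { &ns };
          check_permission(&idmap);
          return 0;
      }
      ```

      Passing a bare `struct user_namespace *` to check_permission() would
      be a type error, which is exactly the protection the conversion buys.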
      
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    • fs: port ->create() to pass mnt_idmap · 6c960e68
      Christian Brauner authored
      
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient, but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevant on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area, this can be a potential source of
      bugs.
      
      Once the conversion to struct mnt_idmap is done, all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the
      two, eliminating this class of bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
  24. Jan 18, 2023
    • fs: port vfs_*() helpers to struct mnt_idmap · abf08576
      Christian Brauner authored
      
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in
      256c8aed ("fs: introduce dedicated idmap type for mounts").
      This is just the conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient, but it makes it easy to
      conflate namespaces that are relevant on the filesystem with namespaces
      that are relevant on the mount level. Especially for non-vfs developers
      without detailed knowledge in this area, this can be a potential source of
      bugs.
      
      Once the conversion to struct mnt_idmap is done, all helpers down to the
      really low-level helpers will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the
      two, eliminating this class of bugs. All of the vfs and all filesystems
      only operate on struct mnt_idmap.
      
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
  26. Dec 05, 2022
    • ipc/sem: Fix dangling sem_array access in semtimedop race · b52be557
      Jann Horn authored
      
      When __do_semtimedop() goes to sleep because it has to wait for a
      semaphore value becoming zero or becoming bigger than some threshold, it
      links the on-stack sem_queue to the sem_array, then goes to sleep
      without holding a reference on the sem_array.
      
      When __do_semtimedop() comes back out of sleep, one of two things must
      happen:
      
       a) We prove that the on-stack sem_queue has been disconnected from the
          (possibly freed) sem_array, making it safe to return from the stack
          frame that the sem_queue exists in.
      
       b) We stabilize our reference to the sem_array, lock the sem_array, and
          detach the sem_queue from the sem_array ourselves.
      
      sem_array has RCU lifetime, so for case (b), the reference can be
      stabilized inside an RCU read-side critical section by locklessly
      checking whether the sem_queue is still connected to the sem_array.
      
      However, the current code does the lockless check on sem_queue before
      starting an RCU read-side critical section, so the result of the
      lockless check immediately becomes useless.
      
      Fix it by doing rcu_read_lock() before the lockless check.  Now RCU
      ensures that if we observe the object being on our queue, the object
      can't be freed until rcu_read_unlock().
      
      This bug is only hittable on kernel builds with full preemption support
      (either CONFIG_PREEMPT or PREEMPT_DYNAMIC with preempt=full).
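      The fixed ordering can be illustrated with a userspace sketch; the
      RCU primitives are no-op stand-ins here (the real ones are kernel
      APIs), and queue_still_linked() is an illustrative name, not the
      kernel's:

      ```c
      #include <stdbool.h>
      #include <stdio.h>

      /* No-op stand-ins so the ordering can be shown in a plain program. */
      static void rcu_read_lock(void)   { }
      static void rcu_read_unlock(void) { }

      struct sem_queue { bool on_list; };  /* simplified stand-in */

      /* Buggy order (before the fix):
       *
       *     if (queue->on_list) {   // lockless check...
       *         rcu_read_lock();    // ...but RCU starts too late: the
       *         ...                 // sem_array may already be freed
       *     }
       *
       * Fixed order: enter the read-side critical section first; RCU then
       * guarantees an object observed on the queue cannot be freed before
       * rcu_read_unlock(). */
      static bool queue_still_linked(struct sem_queue *q)
      {
          rcu_read_lock();
          bool on_list = q->on_list;  /* lockless check, now under RCU */
          /* ...in the kernel: lock the sem_array and detach here... */
          rcu_read_unlock();
          return on_list;
      }

      int main(void)
      {
          struct sem_queue q = { true };
          printf("%s\n", queue_still_linked(&q) ? "still linked" : "detached");
          return 0;
      }
      ```

      The whole point of the fix is that the check is worthless unless it
      happens inside the critical section that pins the object's lifetime.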
      
      Fixes: 370b262c ("ipc/sem: avoid idr tree lookup for interrupted semop")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  27. Nov 23, 2022
    • ipc/shm: call underlying open/close vm_ops · b6305049
      Mike Kravetz authored
      Shared memory segments can be created that are backed by hugetlb pages. 
      When this happens, the vmas associated with any mappings (shmat) are
      marked VM_HUGETLB, yet the vm_ops for such mappings are provided by
      ipc/shm (shm_vm_ops).  There is a mechanism to call the underlying hugetlb
      vm_ops, and this is done for most operations.  However, it is not done for
      open and close.
      
      This was not an issue until the introduction of the hugetlb vma_lock. 
      This lock structure is pointed to by vm_private_data and the open/close
      vm_ops help maintain this structure.  The special hugetlb routine called
      at fork took care of structure updates at fork time.  However, vma
      splitting is not properly handled for ipc shared memory mappings
      backed by hugetlb pages.  This can result in a "kernel NULL pointer
      dereference" BUG or a use-after-free as two vmas point to the same lock
      structure.
      
      Update the shm open and close routines to always call the underlying open
      and close routines.
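      The wrapper-forwarding pattern the fix restores can be sketched like
      this (simplified stand-ins; the real code lives in ipc/shm.c and
      mm/hugetlb.c, and the function bodies here are illustrative):

      ```c
      #include <stdio.h>

      struct vm_area_struct { int dummy; };  /* simplified stand-in */

      struct vm_operations_struct {
          void (*open)(struct vm_area_struct *);
          void (*close)(struct vm_area_struct *);
      };

      /* Stand-in for hugetlb's vm_ops, which maintain the per-vma lock
       * structure pointed to by vm_private_data. */
      static void hugetlb_open(struct vm_area_struct *vma)
      {
          (void)vma;
          printf("hugetlb open: vma_lock bookkeeping runs\n");
      }

      static const struct vm_operations_struct hugetlb_vm_ops = {
          .open = hugetlb_open,
      };

      /* shm's wrapper op: after the fix it always forwards to the
       * underlying ops, so splitting a hugetlb-backed shm vma keeps the
       * lock structure consistent instead of leaving two vmas sharing it. */
      static void shm_open(struct vm_area_struct *vma)
      {
          /* ...shm's own bookkeeping... */
          if (hugetlb_vm_ops.open)
              hugetlb_vm_ops.open(vma);
      }

      int main(void)
      {
          struct vm_area_struct vma = { 0 };
          shm_open(&vma);
          return 0;
      }
      ```

      Before the fix, shm's open/close simply never made the forwarded
      call, so hugetlb's bookkeeping was skipped on split.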
      
      Link: https://lkml.kernel.org/r/20221114210018.49346-1-mike.kravetz@oracle.com
      
      
      Fixes: 8d9bfb26 ("hugetlb: add vma based lock for pmd sharing")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reported-by: Doug Nelson <doug.nelson@intel.com>
      Reported-by: <syzbot+83b4134621b7c326d950@syzkaller.appspotmail.com>
      Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>