  1. Oct 30, 2024
  2. Oct 25, 2024
  3. Oct 24, 2024
  4. Oct 22, 2024
  5. Oct 18, 2024
  6. Oct 17, 2024
  7. Oct 14, 2024
  8. Oct 11, 2024
  9. Oct 09, 2024
  10. Oct 08, 2024
  11. Oct 03, 2024
    • drm/msm/a6xx: Add a flag to allow preemption to submitqueue_create · 7788d320
      Antonino Maniscalco authored and Rob Clark committed
      
      Some userspace changes are necessary so add a flag for userspace to
      advertise support for preemption when creating the submitqueue.
      
      When this flag is not set, preemption is not allowed in the middle
      of the submitted IBs, thereby maintaining compatibility with older
      userspace.
      
      The flag is rejected if preemption is not supported on the target, which
      lets userspace detect whether preemption is supported.
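The flag handling described above can be sketched as standalone C. This is an illustration under stated assumptions, not the driver code: `QUEUE_ALLOW_PREEMPT`, `QUEUE_FLAGS_MASK` and `queue_check_flags()` are hypothetical names, and the flag value is made up.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; the real UAPI name and value may differ. */
#define QUEUE_ALLOW_PREEMPT 0x1u
#define QUEUE_FLAGS_MASK    QUEUE_ALLOW_PREEMPT

/*
 * Validate submitqueue creation flags (hypothetical helper).
 * Returns 0 on success, -EINVAL for unknown flags or when preemption
 * is requested on a target that cannot preempt.
 */
int queue_check_flags(uint32_t flags, bool target_has_preemption)
{
	if (flags & ~QUEUE_FLAGS_MASK)
		return -EINVAL;
	if ((flags & QUEUE_ALLOW_PREEMPT) && !target_has_preemption)
		return -EINVAL;
	return 0;
}
```

Because the request fails on targets without preemption, userspace could probe for support by attempting creation with the flag set and retrying without it on failure.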
      
      Tested-by: Rob Clark <robdclark@gmail.com>
      Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD
      Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
      Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8450-HDK
      Signed-off-by: Antonino Maniscalco <antomani103@gmail.com>
      Patchwork: https://patchwork.freedesktop.org/patch/618028/
      Signed-off-by: Rob Clark <robdclark@chromium.org>
    • net: test for not too small csum_start in virtio_net_hdr_to_skb() · 49d14b54
      Eric Dumazet authored
      
      syzbot was able to trigger this warning [1], after injecting a
      malicious packet through af_packet, setting skb->csum_start and thus
      the transport header to an incorrect value.
      
      We can at least make sure the transport header is after
      the end of the network header (with an estimated minimal size).
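The sanity check described above can be sketched in userspace C. This is a simplified model, not the kernel change itself: `struct pkt_offsets` and `csum_start_is_sane()` are hypothetical, both offsets are assumed to be measured from the same origin, and the minimal sizes are the fixed IPv4 and IPv6 header lengths.

```c
#include <stdbool.h>
#include <stdint.h>

#define IPV4_MIN_HDR_LEN 20u /* IPv4 header without options */
#define IPV6_HDR_LEN     40u /* fixed IPv6 header */

/* Hypothetical, simplified view of the offsets involved. */
struct pkt_offsets {
	uint32_t network_offset; /* start of the network header */
	uint32_t csum_start;     /* user-supplied checksum start */
};

/*
 * Accept csum_start only if it lies at or beyond the end of a
 * minimal-size network header.
 */
bool csum_start_is_sane(const struct pkt_offsets *p, bool is_ipv6)
{
	uint32_t nh_min_len = is_ipv6 ? IPV6_HDR_LEN : IPV4_MIN_HDR_LEN;

	/* The first comparison guards the unsigned subtraction. */
	return p->csum_start >= p->network_offset &&
	       p->csum_start - p->network_offset >= nh_min_len;
}
```

With the values from the report (network header at 16, transport header at 10), this check rejects the packet.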
      
      [1]
      [   67.873027] skb len=4096 headroom=16 headlen=14 tailroom=0
      mac=(-1,-1) mac_len=0 net=(16,-6) trans=10
      shinfo(txflags=0 nr_frags=1 gso(size=0 type=0 segs=0))
      csum(0xa start=10 offset=0 ip_summed=3 complete_sw=0 valid=0 level=0)
      hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=0
      priority=0x0 mark=0x0 alloc_cpu=10 vlan_all=0x0
      encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0)
      [   67.877172] dev name=veth0_vlan feat=0x000061164fdd09e9
      [   67.877764] sk family=17 type=3 proto=0
      [   67.878279] skb linear:   00000000: 00 00 10 00 00 00 00 00 0f 00 00 00 08 00
      [   67.879128] skb frag:     00000000: 0e 00 07 00 00 00 28 00 08 80 1c 00 04 00 00 02
      [   67.879877] skb frag:     00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.880647] skb frag:     00000020: 00 00 02 00 00 00 08 00 1b 00 00 00 00 00 00 00
      [   67.881156] skb frag:     00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.881753] skb frag:     00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.882173] skb frag:     00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.882790] skb frag:     00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.883171] skb frag:     00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.883733] skb frag:     00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.884206] skb frag:     00000090: 00 00 00 00 00 00 00 00 00 00 69 70 76 6c 61 6e
      [   67.884704] skb frag:     000000a0: 31 00 00 00 00 00 00 00 00 00 2b 00 00 00 00 00
      [   67.885139] skb frag:     000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.885677] skb frag:     000000c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.886042] skb frag:     000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.886408] skb frag:     000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.887020] skb frag:     000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   67.887384] skb frag:     00000100: 00 00
      [   67.887878] ------------[ cut here ]------------
      [   67.887908] offset (-6) >= skb_headlen() (14)
      [   67.888445] WARNING: CPU: 10 PID: 2088 at net/core/dev.c:3332 skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
      [   67.889353] Modules linked in: macsec macvtap macvlan hsr wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 dummy bridge sr_mod cdrom evdev pcspkr i2c_piix4 9pnet_virtio 9p 9pnet netfs
      [   67.890111] CPU: 10 UID: 0 PID: 2088 Comm: b363492833 Not tainted 6.11.0-virtme #1011
      [   67.890183] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
      [   67.890309] RIP: 0010:skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
      [   67.891043] Call Trace:
      [   67.891173]  <TASK>
      [   67.891274] ? __warn (kernel/panic.c:741)
      [   67.891320] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
      [   67.891333] ? report_bug (lib/bug.c:180 lib/bug.c:219)
      [   67.891348] ? handle_bug (arch/x86/kernel/traps.c:239)
      [   67.891363] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
      [   67.891372] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
      [   67.891388] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
      [   67.891399] ? skb_checksum_help (net/core/dev.c:3332 (discriminator 2))
      [   67.891416] ip_do_fragment (net/ipv4/ip_output.c:777 (discriminator 1))
      [   67.891448] ? __ip_local_out (./include/linux/skbuff.h:1146 ./include/net/l3mdev.h:196 ./include/net/l3mdev.h:213 net/ipv4/ip_output.c:113)
      [   67.891459] ? __pfx_ip_finish_output2 (net/ipv4/ip_output.c:200)
      [   67.891470] ? ip_route_output_flow (./arch/x86/include/asm/preempt.h:84 (discriminator 13) ./include/linux/rcupdate.h:96 (discriminator 13) ./include/linux/rcupdate.h:871 (discriminator 13) net/ipv4/route.c:2625 (discriminator 13) ./include/net/route.h:141 (discriminator 13) net/ipv4/route.c:2852 (discriminator 13))
      [   67.891484] ipvlan_process_v4_outbound (drivers/net/ipvlan/ipvlan_core.c:445 (discriminator 1))
      [   67.891581] ipvlan_queue_xmit (drivers/net/ipvlan/ipvlan_core.c:542 drivers/net/ipvlan/ipvlan_core.c:604 drivers/net/ipvlan/ipvlan_core.c:670)
      [   67.891596] ipvlan_start_xmit (drivers/net/ipvlan/ipvlan_main.c:227)
      [   67.891607] dev_hard_start_xmit (./include/linux/netdevice.h:4916 ./include/linux/netdevice.h:4925 net/core/dev.c:3588 net/core/dev.c:3604)
      [   67.891620] __dev_queue_xmit (net/core/dev.h:168 (discriminator 25) net/core/dev.c:4425 (discriminator 25))
      [   67.891630] ? skb_copy_bits (./include/linux/uaccess.h:233 (discriminator 1) ./include/linux/uaccess.h:260 (discriminator 1) ./include/linux/highmem-internal.h:230 (discriminator 1) net/core/skbuff.c:3018 (discriminator 1))
      [   67.891645] ? __pskb_pull_tail (net/core/skbuff.c:2848 (discriminator 4))
      [   67.891655] ? skb_partial_csum_set (net/core/skbuff.c:5657)
      [   67.891666] ? virtio_net_hdr_to_skb.constprop.0 (./include/linux/skbuff.h:2791 (discriminator 3) ./include/linux/skbuff.h:2799 (discriminator 3) ./include/linux/virtio_net.h:109 (discriminator 3))
      [   67.891684] packet_sendmsg (net/packet/af_packet.c:3145 (discriminator 1) net/packet/af_packet.c:3177 (discriminator 1))
      [   67.891700] ? _raw_spin_lock_bh (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:127 (discriminator 4) kernel/locking/spinlock.c:178 (discriminator 4))
      [   67.891716] __sys_sendto (net/socket.c:730 (discriminator 1) net/socket.c:745 (discriminator 1) net/socket.c:2210 (discriminator 1))
      [   67.891734] ? do_sock_setsockopt (net/socket.c:2335)
      [   67.891747] ? __sys_setsockopt (./include/linux/file.h:34 net/socket.c:2355)
      [   67.891761] __x64_sys_sendto (net/socket.c:2222 (discriminator 1) net/socket.c:2218 (discriminator 1) net/socket.c:2218 (discriminator 1))
      [   67.891772] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
      [   67.891785] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
      
      Fixes: 9181d6f8 ("net: add more sanity check in virtio_net_hdr_to_skb()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://patch.msgid.link/20240926165836.3797406-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  12. Oct 02, 2024
    • move asm/unaligned.h to linux/unaligned.h · 5f60d5f6
      Al Viro authored
      asm/unaligned.h is always an include of asm-generic/unaligned.h;
      might as well move that thing to linux/unaligned.h and include
      that - there's nothing arch-specific in that header.
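To illustrate why an unaligned-access helper needs nothing arch-specific, here is a portable sketch in plain C. It is not the kernel header's implementation; `load_le32_unaligned()` is a hypothetical name chosen for this example.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value from a possibly unaligned address. */
uint32_t load_le32_unaligned(const void *p)
{
	uint8_t b[4];

	memcpy(b, p, sizeof(b)); /* byte copy: no alignment requirement */
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

The function assembles the value byte by byte, so it behaves the same regardless of the host's alignment rules or endianness.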
      
      auto-generated by the following:
      
      for i in `git grep -l -w asm/unaligned.h`; do
      	sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
      done
      for i in `git grep -l -w asm-generic/unaligned.h`; do
      	sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
      done
      git mv include/asm-generic/unaligned.h include/linux/unaligned.h
      git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
      sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
      sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
    • inotify: Fix possible deadlock in fsnotify_destroy_mark · cad3f4a2
      Lizhi Xu authored and Jan Kara committed
      
      [Syzbot reported]
      WARNING: possible circular locking dependency detected
      6.11.0-rc4-syzkaller-00019-gb311c1b497e5 #0 Not tainted
      ------------------------------------------------------
      kswapd0/78 is trying to acquire lock:
      ffff88801b8d8930 (&group->mark_mutex){+.+.}-{3:3}, at: fsnotify_group_lock include/linux/fsnotify_backend.h:270 [inline]
      ffff88801b8d8930 (&group->mark_mutex){+.+.}-{3:3}, at: fsnotify_destroy_mark+0x38/0x3c0 fs/notify/mark.c:578
      
      but task is already holding lock:
      ffffffff8ea2fd60 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:6841 [inline]
      ffffffff8ea2fd60 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0xbb4/0x35a0 mm/vmscan.c:7223
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (fs_reclaim){+.+.}-{0:0}:
             ...
             kmem_cache_alloc_noprof+0x3d/0x2a0 mm/slub.c:4044
             inotify_new_watch fs/notify/inotify/inotify_user.c:599 [inline]
             inotify_update_watch fs/notify/inotify/inotify_user.c:647 [inline]
             __do_sys_inotify_add_watch fs/notify/inotify/inotify_user.c:786 [inline]
             __se_sys_inotify_add_watch+0x72e/0x1070 fs/notify/inotify/inotify_user.c:729
             do_syscall_x64 arch/x86/entry/common.c:52 [inline]
             do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
             entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      -> #0 (&group->mark_mutex){+.+.}-{3:3}:
             ...
             __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
             fsnotify_group_lock include/linux/fsnotify_backend.h:270 [inline]
             fsnotify_destroy_mark+0x38/0x3c0 fs/notify/mark.c:578
             fsnotify_destroy_marks+0x14a/0x660 fs/notify/mark.c:934
             fsnotify_inoderemove include/linux/fsnotify.h:264 [inline]
             dentry_unlink_inode+0x2e0/0x430 fs/dcache.c:403
             __dentry_kill+0x20d/0x630 fs/dcache.c:610
             shrink_kill+0xa9/0x2c0 fs/dcache.c:1055
             shrink_dentry_list+0x2c0/0x5b0 fs/dcache.c:1082
             prune_dcache_sb+0x10f/0x180 fs/dcache.c:1163
             super_cache_scan+0x34f/0x4b0 fs/super.c:221
             do_shrink_slab+0x701/0x1160 mm/shrinker.c:435
             shrink_slab+0x1093/0x14d0 mm/shrinker.c:662
             shrink_one+0x43b/0x850 mm/vmscan.c:4815
             shrink_many mm/vmscan.c:4876 [inline]
             lru_gen_shrink_node mm/vmscan.c:4954 [inline]
             shrink_node+0x3799/0x3de0 mm/vmscan.c:5934
             kswapd_shrink_node mm/vmscan.c:6762 [inline]
             balance_pgdat mm/vmscan.c:6954 [inline]
             kswapd+0x1bcd/0x35a0 mm/vmscan.c:7223
      
      [Analysis]
      The problem is that inotify_new_watch() uses GFP_KERNEL to allocate
      new watches under group->mark_mutex. However, if dentry reclaim races
      with unlinking of an inode, it can end up dropping the last dentry reference
      for an unlinked inode, resulting in removal of the fsnotify mark from reclaim
      context, which wants to acquire group->mark_mutex as well.
      
      This scenario shows that all notification groups are in principle prone
      to this kind of a deadlock (previously, we considered only fanotify and
      dnotify to be problematic for other reasons) so make sure all
      allocations under group->mark_mutex happen with GFP_NOFS.
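The effect of allocating without filesystem reclaim while the mutex is held can be shown with a toy model. Everything here is hypothetical (`struct task_ctx`, `nofs_save()`, `nofs_restore()`, `alloc_under_pressure()`, `add_watch()`); it models only the idea of a scoped "no fs reclaim" state and does not reproduce the kernel's allocator or the patch itself.

```c
#include <stdbool.h>

/* Hypothetical per-task allocation context. */
struct task_ctx {
	bool nofs;            /* allocations must not enter fs reclaim */
	int fs_reclaim_calls; /* times an allocation entered fs reclaim */
};

/* Enter a no-fs-reclaim scope; returns the previous state. */
bool nofs_save(struct task_ctx *t)
{
	bool old = t->nofs;

	t->nofs = true;
	return old;
}

/* Leave the scope by restoring the saved state. */
void nofs_restore(struct task_ctx *t, bool old)
{
	t->nofs = old;
}

/* Model of an allocation that has to reclaim memory to succeed. */
void alloc_under_pressure(struct task_ctx *t)
{
	if (!t->nofs)
		t->fs_reclaim_calls++; /* could prune dentries and want mark_mutex */
}

/* Model of adding a watch: the allocation happens with mark_mutex held. */
void add_watch(struct task_ctx *t, bool use_nofs)
{
	bool old = false;

	/* lock(mark_mutex) would go here */
	if (use_nofs)
		old = nofs_save(t);
	alloc_under_pressure(t);
	if (use_nofs)
		nofs_restore(t, old);
	/* unlock(mark_mutex) would go here */
}
```

In this model, an allocation made inside the scope never enters filesystem reclaim, so it cannot end up waiting for the mutex its own caller holds.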
      
      Reported-and-tested-by: <syzbot+c679f13773f295d2da53@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=c679f13773f295d2da53
      Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://patch.msgid.link/20240927143642.2369508-1-lizhi.xu@windriver.com
    • ALSA: hda: fix trigger_tstamp_latched · df521561
      Jaroslav Kysela authored and Takashi Iwai committed
      When the trigger_tstamp_latched flag is set, the PCM core code assumes that
      the low-level driver handles the trigger timestamping itself. Ensure that
      runtime->trigger_tstamp is always updated.
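The contract described above can be sketched as follows. This is a simplified model with hypothetical names (`struct pcm_runtime_model`, `core_stamp_trigger()`, `driver_stamp_trigger()`), not the ALSA core or the HDA driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced model of the PCM runtime state. */
struct pcm_runtime_model {
	bool trigger_tstamp_latched; /* driver claims it stamps the trigger */
	uint64_t trigger_tstamp_ns;
};

/* Core side: stamp only when the driver has not claimed responsibility. */
void core_stamp_trigger(struct pcm_runtime_model *rt, uint64_t now_ns)
{
	if (!rt->trigger_tstamp_latched)
		rt->trigger_tstamp_ns = now_ns;
}

/*
 * Driver side: once the latched flag is set, the driver must always
 * write the timestamp itself, because the core will skip it.
 */
void driver_stamp_trigger(struct pcm_runtime_model *rt, uint64_t hw_ns)
{
	rt->trigger_tstamp_ns = hw_ns;
	rt->trigger_tstamp_latched = true;
}
```

If a driver set the flag without writing the timestamp, the value would stay stale in this model, which is the kind of situation the fix is meant to rule out.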
      
      Buglink: https://github.com/alsa-project/alsa-lib/issues/387
      Reported-by: Zeno Endemann <zeno.endemann@mailbox.org>
      Signed-off-by: Jaroslav Kysela <perex@perex.cz>
      Link: https://patch.msgid.link/20241002081306.1788405-1-perex@perex.cz
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
  13. Oct 01, 2024
  14. Sep 30, 2024
    • netfs: Fix the netfs_folio tracepoint to handle NULL mapping · f801850b
      David Howells authored
      
      Fix the netfs_folio tracepoint to handle folios that have a NULL mapping
      pointer.  In such a case, just substitute a zero inode number.
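The substitution can be sketched with reduced stand-in types. The structures below are hypothetical simplifications for illustration, and `trace_ino()` is not the tracepoint code.

```c
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct inode_model   { unsigned long i_ino; };
struct mapping_model { struct inode_model *host; };
struct folio_model   { struct mapping_model *mapping; };

/* Inode number to record for a folio; zero when there is no mapping. */
unsigned long trace_ino(const struct folio_model *folio)
{
	return folio->mapping ? folio->mapping->host->i_ino : 0;
}
```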
      
      Fixes: c38f4e96 ("netfs: Provide func to copy data to pagecache for buffered write")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Link: https://lore.kernel.org/r/2917423.1727697556@warthog.procyon.org.uk
      cc: Jeff Layton <jlayton@kernel.org>
      cc: netfs@lists.linux.dev
      cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • netfs: Add folio_queue API documentation · 28e8c5c0
      David Howells authored
      
      Add API documentation for folio_queue.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Link: https://lore.kernel.org/r/2912369.1727691281@warthog.procyon.org.uk
      cc: Jeff Layton <jlayton@kernel.org>
      cc: netfs@lists.linux.dev
      cc: linux-doc@vger.kernel.org
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • close_range(): fix the logics in descriptor table trimming · 678379e1
      Al Viro authored
      
      Cloning a descriptor table picks the size that would cover all currently
      opened files.  That's fine for clone() and unshare(), but for close_range()
      there's an additional twist - we clone before we close, and it would be
      a shame to have
      	close_range(3, ~0U, CLOSE_RANGE_UNSHARE)
      leave us with a huge descriptor table when we are not going to keep
      anything past stderr, just because some large file descriptor used to
      be open before our call has taken it out.
      
      Unfortunately, it had been dealt with in an inherently racy way -
      sane_fdtable_size() gets a "don't copy anything past that" argument
      (passed via unshare_fd() and dup_fd()), close_range() decides how much
      should be trimmed and passes that to unshare_fd().
      
      The problem is, a range that used to extend to the end of descriptor
      table back when close_range() had looked at it might very well have stuff
      grown after it by the time dup_fd() has allocated a new files_struct
      and started to figure out the capacity of fdtable to be attached to that.
      
      That leads to interesting pathological cases; at the very least it's a
      QoI issue, since unshare(CLONE_FILES) is atomic in a sense that it takes
      a snapshot of descriptor table one might have observed at some point.
      Since CLOSE_RANGE_UNSHARE close_range() is supposed to be a combination
      of unshare(CLONE_FILES) with plain close_range(), ending up with a
      weird state that would never occur with unshare(2) is confusing, to put
      it mildly.
      
      It's not hard to get rid of - all it takes is passing both ends of the
      range down to sane_fdtable_size().  There we are under ->files_lock,
      so the race is trivially avoided.
      
      So we do the following:
      	* switch close_files() from calling unshare_fd() to calling
      dup_fd().
      	* undo the calling convention change done to unshare_fd() in
      60997c3d "close_range: add CLOSE_RANGE_UNSHARE"
      	* introduce struct fd_range, pass a pointer to that to dup_fd()
      and sane_fdtable_size() instead of "trim everything past that point"
      they are currently getting.  NULL means "we are not going to be punching
      any holes"; NR_OPEN_MAX is gone.
      	* make sane_fdtable_size() use find_last_bit() instead of
      open-coding it; it's easier to follow that way.
      	* while we are at it, have dup_fd() report errors by returning
      ERR_PTR(), no need to use a separate int *errorp argument.
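The sizing logic outlined in the list above can be sketched in userspace C. This is a model under assumptions, not the kernel function: the `from`/`to` field names of `struct fd_range`, the boolean array standing in for the open-descriptor bitmap, the helper names, and the 64-entry granule are all choices made for this example, and locking is not modelled.

```c
#include <stdbool.h>

#define TABLE_GRANULE 64u /* assumed minimum size and rounding unit */

/* Inclusive range of descriptors about to be closed (field names assumed). */
struct fd_range {
	unsigned int from, to;
};

/* Highest open descriptor below 'limit', or 'limit' if there is none. */
unsigned int last_open_fd(const bool *open_fds, unsigned int limit)
{
	unsigned int i = limit;

	while (i-- > 0)
		if (open_fds[i])
			return i;
	return limit;
}

/*
 * Size needed for a copy of the table. If the highest open descriptor
 * falls inside the range that is about to be closed, only descriptors
 * below the start of that range count.
 */
unsigned int sane_table_size(const bool *open_fds, unsigned int max_fds,
			     const struct fd_range *punch_hole)
{
	unsigned int last = last_open_fd(open_fds, max_fds);

	if (last == max_fds)
		return TABLE_GRANULE; /* nothing open at all */
	if (punch_hole && punch_hole->from <= last && last <= punch_hole->to) {
		last = last_open_fd(open_fds, punch_hole->from);
		if (last == punch_hole->from)
			return TABLE_GRANULE; /* nothing open below the hole */
	}
	return (last / TABLE_GRANULE + 1) * TABLE_GRANULE;
}
```

Passing both ends of the range means the decision is made in the same place where the table is inspected, which is what allows it to happen under the lock in the real code.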
      
      Fixes: 60997c3d "close_range: add CLOSE_RANGE_UNSHARE"
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  15. Sep 27, 2024
    • [tree-wide] finally take no_llseek out · cb787f4a
      Al Viro authored
      
      no_llseek had been defined to NULL two years ago, in commit 868941b1
      ("fs: remove no_llseek")
      
      To quote that commit,
      
        At -rc1 we'll need do a mechanical removal of no_llseek -
      
        git grep -l -w no_llseek | grep -v porting.rst | while read i; do
      	sed -i '/\<no_llseek\>/d' $i
        done
      
        would do it.
      
      Unfortunately, that hadn't been done.  Linus, could you do that now, so
      that we could finally put that thing to rest? All instances are of the
      form
      	.llseek = no_llseek,
      so it's obviously safe.
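Why deleting such lines is safe can be shown with a small C sketch. `struct file_ops_model` and `demo_read()` are hypothetical stand-ins, and the `#define` models the state described above in which no_llseek was already NULL.

```c
#include <stddef.h>

/* Hypothetical, reduced operations table. */
struct file_ops_model {
	long (*llseek)(long offset, int whence);
	long (*read)(char *buf, size_t len);
};

#define no_llseek NULL /* models the state before the removal */

long demo_read(char *buf, size_t len)
{
	(void)buf;
	return (long)len;
}

/* With the explicit line ... */
const struct file_ops_model with_line = {
	.llseek = no_llseek,
	.read   = demo_read,
};

/* ... and without it: members left out of the initializer are NULL. */
const struct file_ops_model without_line = {
	.read = demo_read,
};
```

Both tables end up with a NULL `llseek` member, so dropping the line does not change behavior.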
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. Sep 26, 2024
    • compiler.h: specify correct attribute for .rodata..c_jump_table · c5b1184d
      Tiezhu Yang authored
      Currently, there is an assembler message when generating kernel/bpf/core.o
      under CONFIG_OBJTOOL with LoongArch compiler toolchain:
      
        Warning: setting incorrect section attributes for .rodata..c_jump_table
      
      This is because the section ".rodata..c_jump_table" should be readonly,
      but there is a "W" (writable) part of the flags:
      
        $ readelf -S kernel/bpf/core.o | grep -A 1 "rodata..c"
        [34] .rodata..c_j[...] PROGBITS         0000000000000000  0000d2e0
             0000000000000800  0000000000000000  WA       0     0     8
      
      This issue does not occur on x86 because the generated section flag there is
      only "A" (allocatable). To silence the warning on LoongArch, specify
      the attribute explicitly as ".rodata..c_jump_table,\"a\",@progbits #",
      so that the ".rodata..c_jump_table" section is read-only
      in the kernel/bpf/core.o file.
      
      Before:
      
        $ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c"
         21 .rodata..c_jump_table 00000800  0000000000000000  0000000000000000  0000d2e0  2**3
                        CONTENTS, ALLOC, LOAD, RELOC, DATA
      
      After:
      
        $ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c"
         21 .rodata..c_jump_table 00000800  0000000000000000  0000000000000000  0000d2e0  2**3
                        CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
      
      By the way, AFAICT, the root cause may be related to differing
      compiler behavior across architectures, so to some extent this change is a
      workaround for LoongArch. With this patch there is also no effect on x86,
      which was the only port supported by objtool before LoongArch.
      
      Link: https://lkml.kernel.org/r/20240924062710.1243-1-yangtiezhu@loongson.cn
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: Josh Poimboeuf <jpoimboe@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>	[6.9+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/hugetlb: fix memfd_pin_folios resv_huge_pages leak · 26a8ea80
      Steve Sistare authored
      memfd_pin_folios followed by unpin_folios leaves resv_huge_pages elevated
      if the pages were not already faulted in.  During a normal page fault,
      resv_huge_pages is consumed here:
      
      hugetlb_fault()
        alloc_hugetlb_folio()
          dequeue_hugetlb_folio_vma()
            dequeue_hugetlb_folio_nodemask()
              dequeue_hugetlb_folio_node_exact()
                free_huge_pages--
            resv_huge_pages--
      
      During memfd_pin_folios, the page is created by calling
      alloc_hugetlb_folio_nodemask instead of alloc_hugetlb_folio, and
      resv_huge_pages is not modified:
      
      memfd_alloc_folio()
        alloc_hugetlb_folio_nodemask()
          dequeue_hugetlb_folio_nodemask()
            dequeue_hugetlb_folio_node_exact()
              free_huge_pages--
      
      alloc_hugetlb_folio_nodemask has other callers that must not modify
      resv_huge_pages.  Therefore, to fix, define an alternate version of
      alloc_hugetlb_folio_nodemask for this call site that adjusts
      resv_huge_pages.
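The accounting difference between the two paths can be modelled with two counters. This is a toy model with hypothetical names (`struct hstate_model` and the three functions); it ignores locking and any checks the real allocator performs.

```c
/* Hypothetical model of the two pool counters involved. */
struct hstate_model {
	long free_huge_pages;
	long resv_huge_pages;
};

/* Fault path: takes a free page and consumes its reservation. */
void fault_alloc(struct hstate_model *h)
{
	h->free_huge_pages--;
	h->resv_huge_pages--;
}

/* Generic nodemask path: takes a free page only; other callers rely on this. */
void nodemask_alloc(struct hstate_model *h)
{
	h->free_huge_pages--;
}

/* Alternate version for the memfd pin call site: also consumes the reservation. */
void nodemask_alloc_reserved(struct hstate_model *h)
{
	nodemask_alloc(h);
	h->resv_huge_pages--;
}
```

Starting from one reserved page, the generic path leaves the reservation count at 1 after the page is handed out, while the alternate version brings it to 0 as the fault path would.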
      
      Link: https://lkml.kernel.org/r/1725373521-451395-4-git-send-email-steven.sistare@oracle.com
      Fixes: 89c1905d ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
      Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
      Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>