  1. Dec 05, 2024
    • net: avoid potential UAF in default_operstate() · 750e5160
      Eric Dumazet authored and Paolo Abeni committed
      
      syzbot reported a UAF in default_operstate() [1]

      The issue is a race between device and netns dismantling.

      After calling __rtnl_unlock() from netdev_run_todo(),
      we cannot assume the netns of each device is still alive.

      Make sure the device is not in NETREG_UNREGISTERED state,
      and add an ASSERT_RTNL() before the call to
      __dev_get_by_index().

      We might move this ASSERT_RTNL() into __dev_get_by_index()
      in the future.
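      A minimal userspace sketch of the guard described above. It assumes a reduced model in which `enum reg_state`, `struct fake_dev` and `may_lookup_iflink()` are hypothetical names; the kernel enum has more states. The sketch only shows the idea: refuse the iflink lookup once the device is unregistered, because its netns may already be gone. It does not model RTNL or ASSERT_RTNL().

```c
#include <stdbool.h>

/* Hypothetical, reduced model of a netdev's registration state. */
enum reg_state {
	REG_UNINITIALIZED,
	REG_REGISTERED,
	REG_UNREGISTERING,
	REG_UNREGISTERED,
};

/* Hypothetical device: only the fields the check needs. */
struct fake_dev {
	enum reg_state reg_state;
	int ifindex;
	int iflink;
};

/*
 * Decide whether resolving dev->iflink in the device's netns is allowed.
 * Once the device is unregistered (e.g. after netdev_run_todo() dropped
 * RTNL), its netns may be dismantled concurrently, so skip the lookup.
 */
static bool may_lookup_iflink(const struct fake_dev *dev)
{
	if (dev->reg_state == REG_UNREGISTERED)
		return false;
	/* Only a real lower link (iflink differs from ifindex) needs a lookup. */
	return dev->iflink != 0 && dev->iflink != dev->ifindex;
}
```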
      
      [1]
      
      BUG: KASAN: slab-use-after-free in __dev_get_by_index+0x5d/0x110 net/core/dev.c:852
      Read of size 8 at addr ffff888043eba1b0 by task syz.0.0/5339
      
      CPU: 0 UID: 0 PID: 5339 Comm: syz.0.0 Not tainted 6.12.0-syzkaller-10296-gaaf20f870da0 #0
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:94 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
        print_address_description mm/kasan/report.c:378 [inline]
        print_report+0x169/0x550 mm/kasan/report.c:489
        kasan_report+0x143/0x180 mm/kasan/report.c:602
        __dev_get_by_index+0x5d/0x110 net/core/dev.c:852
        default_operstate net/core/link_watch.c:51 [inline]
        rfc2863_policy+0x224/0x300 net/core/link_watch.c:67
        linkwatch_do_dev+0x3e/0x170 net/core/link_watch.c:170
        netdev_run_todo+0x461/0x1000 net/core/dev.c:10894
        rtnl_unlock net/core/rtnetlink.c:152 [inline]
        rtnl_net_unlock include/linux/rtnetlink.h:133 [inline]
        rtnl_dellink+0x760/0x8d0 net/core/rtnetlink.c:3520
        rtnetlink_rcv_msg+0x791/0xcf0 net/core/rtnetlink.c:6911
        netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2541
        netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
        netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1347
        netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1891
        sock_sendmsg_nosec net/socket.c:711 [inline]
        __sock_sendmsg+0x221/0x270 net/socket.c:726
        ____sys_sendmsg+0x52a/0x7e0 net/socket.c:2583
        ___sys_sendmsg net/socket.c:2637 [inline]
        __sys_sendmsg+0x269/0x350 net/socket.c:2669
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7f2a3cb80809
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f2a3d9cd058 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f2a3cd45fa0 RCX: 00007f2a3cb80809
      RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000008
      RBP: 00007f2a3cbf393e R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000000000 R14: 00007f2a3cd45fa0 R15: 00007ffd03bc65c8
       </TASK>
      
      Allocated by task 5339:
        kasan_save_stack mm/kasan/common.c:47 [inline]
        kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
        poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
        __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394
        kasan_kmalloc include/linux/kasan.h:260 [inline]
        __kmalloc_cache_noprof+0x243/0x390 mm/slub.c:4314
        kmalloc_noprof include/linux/slab.h:901 [inline]
        kmalloc_array_noprof include/linux/slab.h:945 [inline]
        netdev_create_hash net/core/dev.c:11870 [inline]
        netdev_init+0x10c/0x250 net/core/dev.c:11890
        ops_init+0x31e/0x590 net/core/net_namespace.c:138
        setup_net+0x287/0x9e0 net/core/net_namespace.c:362
        copy_net_ns+0x33f/0x570 net/core/net_namespace.c:500
        create_new_namespaces+0x425/0x7b0 kernel/nsproxy.c:110
        unshare_nsproxy_namespaces+0x124/0x180 kernel/nsproxy.c:228
        ksys_unshare+0x57d/0xa70 kernel/fork.c:3314
        __do_sys_unshare kernel/fork.c:3385 [inline]
        __se_sys_unshare kernel/fork.c:3383 [inline]
        __x64_sys_unshare+0x38/0x40 kernel/fork.c:3383
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      Freed by task 12:
        kasan_save_stack mm/kasan/common.c:47 [inline]
        kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
        kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:582
        poison_slab_object mm/kasan/common.c:247 [inline]
        __kasan_slab_free+0x59/0x70 mm/kasan/common.c:264
        kasan_slab_free include/linux/kasan.h:233 [inline]
        slab_free_hook mm/slub.c:2338 [inline]
        slab_free mm/slub.c:4598 [inline]
        kfree+0x196/0x420 mm/slub.c:4746
        netdev_exit+0x65/0xd0 net/core/dev.c:11992
        ops_exit_list net/core/net_namespace.c:172 [inline]
        cleanup_net+0x802/0xcc0 net/core/net_namespace.c:632
        process_one_work kernel/workqueue.c:3229 [inline]
        process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310
        worker_thread+0x870/0xd30 kernel/workqueue.c:3391
        kthread+0x2f0/0x390 kernel/kthread.c:389
        ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
        ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
      
      The buggy address belongs to the object at ffff888043eba000
       which belongs to the cache kmalloc-2k of size 2048
      The buggy address is located 432 bytes inside of
       freed 2048-byte region [ffff888043eba000, ffff888043eba800)
      
      The buggy address belongs to the physical page:
      page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x43eb8
      head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
      flags: 0x4fff00000000040(head|node=1|zone=1|lastcpupid=0x7ff)
      page_type: f5(slab)
      raw: 04fff00000000040 ffff88801ac42000 dead000000000122 0000000000000000
      raw: 0000000000000000 0000000000080008 00000001f5000000 0000000000000000
      head: 04fff00000000040 ffff88801ac42000 dead000000000122 0000000000000000
      head: 0000000000000000 0000000000080008 00000001f5000000 0000000000000000
      head: 04fff00000000003 ffffea00010fae01 ffffffffffffffff 0000000000000000
      head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5339, tgid 5338 (syz.0.0), ts 69674195892, free_ts 69663220888
        set_page_owner include/linux/page_owner.h:32 [inline]
        post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1556
        prep_new_page mm/page_alloc.c:1564 [inline]
        get_page_from_freelist+0x3649/0x3790 mm/page_alloc.c:3474
        __alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4751
        alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265
        alloc_slab_page+0x6a/0x140 mm/slub.c:2408
        allocate_slab+0x5a/0x2f0 mm/slub.c:2574
        new_slab mm/slub.c:2627 [inline]
        ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3815
        __slab_alloc+0x58/0xa0 mm/slub.c:3905
        __slab_alloc_node mm/slub.c:3980 [inline]
        slab_alloc_node mm/slub.c:4141 [inline]
        __do_kmalloc_node mm/slub.c:4282 [inline]
        __kmalloc_noprof+0x2e6/0x4c0 mm/slub.c:4295
        kmalloc_noprof include/linux/slab.h:905 [inline]
        sk_prot_alloc+0xe0/0x210 net/core/sock.c:2165
        sk_alloc+0x38/0x370 net/core/sock.c:2218
        __netlink_create+0x65/0x260 net/netlink/af_netlink.c:629
        __netlink_kernel_create+0x174/0x6f0 net/netlink/af_netlink.c:2015
        netlink_kernel_create include/linux/netlink.h:62 [inline]
        uevent_net_init+0xed/0x2d0 lib/kobject_uevent.c:783
        ops_init+0x31e/0x590 net/core/net_namespace.c:138
        setup_net+0x287/0x9e0 net/core/net_namespace.c:362
      page last free pid 1032 tgid 1032 stack trace:
        reset_page_owner include/linux/page_owner.h:25 [inline]
        free_pages_prepare mm/page_alloc.c:1127 [inline]
        free_unref_page+0xdf9/0x1140 mm/page_alloc.c:2657
        __slab_free+0x31b/0x3d0 mm/slub.c:4509
        qlink_free mm/kasan/quarantine.c:163 [inline]
        qlist_free_all+0x9a/0x140 mm/kasan/quarantine.c:179
        kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
        __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:329
        kasan_slab_alloc include/linux/kasan.h:250 [inline]
        slab_post_alloc_hook mm/slub.c:4104 [inline]
        slab_alloc_node mm/slub.c:4153 [inline]
        kmem_cache_alloc_node_noprof+0x1d9/0x380 mm/slub.c:4205
        __alloc_skb+0x1c3/0x440 net/core/skbuff.c:668
        alloc_skb include/linux/skbuff.h:1323 [inline]
        alloc_skb_with_frags+0xc3/0x820 net/core/skbuff.c:6612
        sock_alloc_send_pskb+0x91a/0xa60 net/core/sock.c:2881
        sock_alloc_send_skb include/net/sock.h:1797 [inline]
        mld_newpack+0x1c3/0xaf0 net/ipv6/mcast.c:1747
        add_grhead net/ipv6/mcast.c:1850 [inline]
        add_grec+0x1492/0x19a0 net/ipv6/mcast.c:1988
        mld_send_initial_cr+0x228/0x4b0 net/ipv6/mcast.c:2234
        ipv6_mc_dad_complete+0x88/0x490 net/ipv6/mcast.c:2245
        addrconf_dad_completed+0x712/0xcd0 net/ipv6/addrconf.c:4342
       addrconf_dad_work+0xdc2/0x16f0
        process_one_work kernel/workqueue.c:3229 [inline]
        process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310
      
      Memory state around the buggy address:
       ffff888043eba080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff888043eba100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff888043eba180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
       ffff888043eba200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff888043eba280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 8c55face ("net: linkwatch: only report IF_OPER_LOWERLAYERDOWN if iflink is actually down")
      Reported-by: <syzbot+1939f24bdb783e9e43d9@syzkaller.appspotmail.com>
      Closes: https://lore.kernel.org/netdev/674f3a18.050a0220.48a03.0041.GAE@google.com/T/#u
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://patch.msgid.link/20241203170933.2449307-1-edumazet@google.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ethtool: Fix wrong mod state in case of verbose and no_mask bitset · 910c4788
      Kory Maincent authored
      
      A bitset without a mask in a _SET request means we want exactly the bits
      in the bitset to be set. This works correctly for the compact format, but
      when the verbose format is parsed, ethnl_update_bitset32_verbose() only
      sets the bits present in the request bitset and does not clear the rest.
      Commit 66991703 ("ethtool: fix application of verbose no_mask bitset")
      fixes this issue by clearing the whole target bitmap before we start
      iterating. However, that solution broke the behavior of the mod
      variable: because the bitmap is always cleared first, the old value
      will always be seen as different from the new value.

      Fix it by adding a new function to compare bitmaps and a temporary
      variable which saves the state of the old bitmap.
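      A userspace sketch of the approach the fix describes: save the old bitmap, clear and rebuild the target, then derive mod by comparing the saved copy with the result. `NWORDS`, `bitmaps_differ()` and `apply_verbose_no_mask()` are hypothetical names. The kernel's ethnl helpers work on their own bitmap types and are not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NWORDS 2 /* hypothetical bitmap size: 2 x 32 = 64 bits */

/* Hypothetical helper: true if two bitmaps of NWORDS words differ. */
static bool bitmaps_differ(const uint32_t *a, const uint32_t *b)
{
	return memcmp(a, b, NWORDS * sizeof(uint32_t)) != 0;
}

/*
 * Apply a verbose request without a mask: afterwards exactly the bits in
 * req_bits[] are set. Each entry must be < 32 * NWORDS. *mod is only ever
 * set to true, never cleared, so callers can accumulate it.
 */
static void apply_verbose_no_mask(uint32_t *bitmap, const unsigned int *req_bits,
				  size_t nreq, bool *mod)
{
	uint32_t old[NWORDS];

	memcpy(old, bitmap, sizeof(old));             /* save the old state */
	memset(bitmap, 0, NWORDS * sizeof(uint32_t)); /* clear, then rebuild */
	for (size_t i = 0; i < nreq; i++)
		bitmap[req_bits[i] / 32] |= UINT32_C(1) << (req_bits[i] % 32);

	/* Compare against the saved copy, not against the cleared bitmap. */
	if (bitmaps_differ(old, bitmap))
		*mod = true;
}
```

      Re-applying the bits that are already set leaves mod false. Only a request that really changes the bitmap sets it.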
      
      Fixes: 66991703 ("ethtool: fix application of verbose no_mask bitset")
      Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
      Link: https://patch.msgid.link/20241202153358.1142095-1-kory.maincent@bootlin.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipmr: tune the ipmr_can_free_table() checks. · 50b94204
      Paolo Abeni authored
      
      Eric reported a syzkaller-triggered splat caused by recent ipmr changes:
      
      WARNING: CPU: 2 PID: 6041 at net/ipv6/ip6mr.c:419
      ip6mr_free_table+0xbd/0x120 net/ipv6/ip6mr.c:419
      Modules linked in:
      CPU: 2 UID: 0 PID: 6041 Comm: syz-executor183 Not tainted
      6.12.0-syzkaller-10681-g65ae975e97d5 #0
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
      1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
      RIP: 0010:ip6mr_free_table+0xbd/0x120 net/ipv6/ip6mr.c:419
      Code: 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c
      02 00 75 58 49 83 bc 24 c0 0e 00 00 00 74 09 e8 44 ef a9 f7 90 <0f> 0b
      90 e8 3b ef a9 f7 48 8d 7b 38 e8 12 a3 96 f7 48 89 df be 0f
      RSP: 0018:ffffc90004267bd8 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: ffff88803c710000 RCX: ffffffff89e4d844
      RDX: ffff88803c52c880 RSI: ffffffff89e4d87c RDI: ffff88803c578ec0
      RBP: 0000000000000001 R08: 0000000000000005 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000001 R12: ffff88803c578000
      R13: ffff88803c710000 R14: ffff88803c710008 R15: dead000000000100
      FS: 00007f7a855ee6c0(0000) GS:ffff88806a800000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f7a85689938 CR3: 000000003c492000 CR4: 0000000000352ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <TASK>
      ip6mr_rules_exit+0x176/0x2d0 net/ipv6/ip6mr.c:283
      ip6mr_net_exit_batch+0x53/0xa0 net/ipv6/ip6mr.c:1388
      ops_exit_list+0x128/0x180 net/core/net_namespace.c:177
      setup_net+0x4fe/0x860 net/core/net_namespace.c:394
      copy_net_ns+0x2b4/0x6b0 net/core/net_namespace.c:500
      create_new_namespaces+0x3ea/0xad0 kernel/nsproxy.c:110
      unshare_nsproxy_namespaces+0xc0/0x1f0 kernel/nsproxy.c:228
      ksys_unshare+0x45d/0xa40 kernel/fork.c:3334
      __do_sys_unshare kernel/fork.c:3405 [inline]
      __se_sys_unshare kernel/fork.c:3403 [inline]
      __x64_sys_unshare+0x31/0x40 kernel/fork.c:3403
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
      entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7f7a856332d9
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 18 00 00 90 48 89 f8 48
      89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
      01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f7a855ee238 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
      RAX: ffffffffffffffda RBX: 00007f7a856bd308 RCX: 00007f7a856332d9
      RDX: 00007f7a8560f8c6 RSI: 0000000000000000 RDI: 0000000062040200
      RBP: 00007f7a856bd300 R08: 00007fff932160a7 R09: 00007f7a855ee6c0
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7a856bd30c
      R13: 0000000000000000 R14: 00007fff93215fc0 R15: 00007fff932160a8
      </TASK>
      
      The root cause is a network namespace creation failing after successful
      initialization of the ipmr subsystem. Such a case is not currently
      matched by the ipmr_can_free_table() helper.

      New namespaces are zeroed on allocation and inserted into the netns list
      only after successful creation; when deleting an ipmr table, the list
      next pointer can be NULL only on netns initialization failure.

      Update the ipmr_can_free_table() checks to leverage this condition.
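      The condition can be sketched as a small predicate over a reduced model. In this model, `alive` means "the netns is not being dismantled", and `list.next` is the netns list linkage, which stays NULL (zeroed) until setup succeeds. `struct fake_net`, `struct fake_list_head` and `can_free_table()` are hypothetical names; this is not the kernel helper itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a list linkage. */
struct fake_list_head {
	struct fake_list_head *next, *prev;
};

/* Hypothetical netns: only the state the check relies on. */
struct fake_net {
	bool alive;                 /* false once dismantling has started */
	struct fake_list_head list; /* zeroed at allocation, linked after setup succeeds */
};

/*
 * Freeing an mr table is expected either while the netns is being
 * dismantled, or when its initialization failed (list.next still NULL
 * because the netns was never inserted into the netns list).
 */
static bool can_free_table(const struct fake_net *net)
{
	return !net->alive || net->list.next == NULL;
}
```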
      
      Reported-by: Eric Dumazet <edumazet@google.com>
      Reported-by: <syzbot+6e8cb445d4b43d006e0c@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=6e8cb445d4b43d006e0c
      Fixes: 11b6e701 ("ipmr: add debug check for mr table cleanup")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/8bde975e21bbca9d9c27e36209b2dd4f1d7a3f00.1733212078.git.pabeni@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. Dec 04, 2024
  3. Dec 03, 2024
    • netfilter: nft_inner: incorrect percpu area handling under softirq · 7b1d83da
      Pablo Neira Ayuso authored
      
      A softirq can interrupt a packet being processed from process context
      while it is walking over the percpu area that contains the inner header
      offsets.

      Disable bh and perform three checks before restoring the percpu inner
      header offsets, to validate that the percpu area is valid for this
      skbuff:

      1) If the NFT_PKTINFO_INNER_FULL flag is set, then this skbuff
         has already been parsed for inner header fetching to
         register.

      2) Validate that the percpu area refers to this skbuff, using the
         skbuff pointer as a cookie. If there is a cookie mismatch, then
         this skbuff needs to be parsed again.

      3) Finally, validate that the percpu area refers to this tunnel type.

      Only after these three checks is the percpu area restored to an on-stack
      copy and bh enabled again.
      
      After inner header fetching, the on-stack copy is stored back to the
      percpu area.
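      The three checks can be sketched in userspace as a validated copy from a shared scratch area into a caller-owned copy. `struct inner_tun_ctx`, `PKTINFO_INNER_FULL` (a stand-in for NFT_PKTINFO_INNER_FULL) and `restore_inner_ctx()` are hypothetical names. The sketch cannot model disabling bottom halves, which the kernel does around this step.

```c
#include <stdbool.h>
#include <string.h>

#define PKTINFO_INNER_FULL 0x1u /* hypothetical stand-in for NFT_PKTINFO_INNER_FULL */

/* Hypothetical model of the per-CPU scratch area (and its on-stack copy). */
struct inner_tun_ctx {
	const void *cookie;       /* skb this area was filled for */
	int type;                 /* tunnel type the offsets belong to */
	unsigned int inner_thoff; /* example offset field */
};

/*
 * Copy the shared area into *out only if it is valid for this skb and
 * tunnel type; otherwise the caller must parse the packet again.
 */
static bool restore_inner_ctx(unsigned int pkt_flags, const void *skb, int type,
			      const struct inner_tun_ctx *percpu,
			      struct inner_tun_ctx *out)
{
	if (!(pkt_flags & PKTINFO_INNER_FULL))
		return false; /* 1) this skb has not been parsed yet */
	if (percpu->cookie != skb)
		return false; /* 2) area was overwritten for another skb */
	if (percpu->type != type)
		return false; /* 3) offsets belong to another tunnel type */
	memcpy(out, percpu, sizeof(*out));
	return true;
}
```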
      
      Fixes: 3a07327d ("netfilter: nft_inner: support for inner tunnel header matching")
      Reported-by: <syzbot+84d0441b9860f0d63285@syzkaller.appspotmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: hsr: must allocate more bytes for RedBox support · af8edaed
      Eric Dumazet authored and Paolo Abeni committed
      
      The blamed commit forgot to change hsr_init_skb() to allocate a
      larger skb for the RedBox case.

      Indeed, send_hsr_supervision_frame() will add
      two additional components (struct hsr_sup_tlv
      and struct hsr_sup_payload).
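      The failure can be illustrated with a reduced model of the bounds check that skb_put() performs: an append must not move tail past end. `struct fake_skb` and `fake_skb_put()` are hypothetical names. Unlike the kernel, which calls skb_over_panic() instead of returning, this model returns false and leaves tail unchanged.

      In the report that follows, tail:0x144 is the value after the 6-byte put. The put therefore started at 0x13e and ended 4 bytes past end:0x140, which is why the allocation must account for the extra TLV and payload.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an skb's linear buffer, as offsets from head. */
struct fake_skb {
	size_t tail; /* end of data written so far */
	size_t end;  /* end of the allocated buffer */
};

/* Append len bytes only if they fit before end (the check skb_put() makes). */
static bool fake_skb_put(struct fake_skb *skb, size_t len)
{
	if (skb->tail + len > skb->end)
		return false; /* the kernel would hit skb_over_panic() here */
	skb->tail += len;
	return true;
}
```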
      
      syzbot reported the following crash:
      skbuff: skb_over_panic: text:ffffffff8afd4b0a len:34 put:6 head:ffff88802ad29e00 data:ffff88802ad29f22 tail:0x144 end:0x140 dev:gretap0
      ------------[ cut here ]------------
       kernel BUG at net/core/skbuff.c:206 !
      Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
      CPU: 2 UID: 0 PID: 7611 Comm: syz-executor Not tainted 6.12.0-syzkaller #0
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
       RIP: 0010:skb_panic+0x157/0x1d0 net/core/skbuff.c:206
      Code: b6 04 01 84 c0 74 04 3c 03 7e 21 8b 4b 70 41 56 45 89 e8 48 c7 c7 a0 7d 9b 8c 41 57 56 48 89 ee 52 4c 89 e2 e8 9a 76 79 f8 90 <0f> 0b 4c 89 4c 24 10 48 89 54 24 08 48 89 34 24 e8 94 76 fb f8 4c
      RSP: 0018:ffffc90000858ab8 EFLAGS: 00010282
      RAX: 0000000000000087 RBX: ffff8880598c08c0 RCX: ffffffff816d3e69
      RDX: 0000000000000000 RSI: ffffffff816de786 RDI: 0000000000000005
      RBP: ffffffff8c9b91c0 R08: 0000000000000005 R09: 0000000000000000
      R10: 0000000000000302 R11: ffffffff961cc1d0 R12: ffffffff8afd4b0a
      R13: 0000000000000006 R14: ffff88804b938130 R15: 0000000000000140
      FS:  000055558a3d6500(0000) GS:ffff88806a800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f1295974ff8 CR3: 000000002ab6e000 CR4: 0000000000352ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <IRQ>
        skb_over_panic net/core/skbuff.c:211 [inline]
        skb_put+0x174/0x1b0 net/core/skbuff.c:2617
        send_hsr_supervision_frame+0x6fa/0x9e0 net/hsr/hsr_device.c:342
        hsr_proxy_announce+0x1a3/0x4a0 net/hsr/hsr_device.c:436
        call_timer_fn+0x1a0/0x610 kernel/time/timer.c:1794
        expire_timers kernel/time/timer.c:1845 [inline]
        __run_timers+0x6e8/0x930 kernel/time/timer.c:2419
        __run_timer_base kernel/time/timer.c:2430 [inline]
        __run_timer_base kernel/time/timer.c:2423 [inline]
        run_timer_base+0x111/0x190 kernel/time/timer.c:2439
        run_timer_softirq+0x1a/0x40 kernel/time/timer.c:2449
        handle_softirqs+0x213/0x8f0 kernel/softirq.c:554
        __do_softirq kernel/softirq.c:588 [inline]
        invoke_softirq kernel/softirq.c:428 [inline]
        __irq_exit_rcu kernel/softirq.c:637 [inline]
        irq_exit_rcu+0xbb/0x120 kernel/softirq.c:649
        instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1049 [inline]
        sysvec_apic_timer_interrupt+0xa4/0xc0 arch/x86/kernel/apic/apic.c:1049
       </IRQ>
      
      Fixes: 5055cccf ("net: hsr: Provide RedBox support (HSR-SAN)")
      Reported-by: <syzbot+7f4643b267cc680bfa1c@syzkaller.appspotmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Lukasz Majewski <lukma@denx.de>
      Link: https://patch.msgid.link/20241202100558.507765-1-edumazet@google.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • rtnetlink: fix double call of rtnl_link_get_net_ifla() · 48327566
      Cong Wang authored and Paolo Abeni committed
      
      Currently rtnl_link_get_net_ifla() gets called twice when we create
      peer devices, once in rtnl_add_peer_net() and once in each ->newlink()
      implementation.

      This looks safer; however, it leads to a classic Time-of-Check to
      Time-of-Use (TOCTOU) bug, since IFLA_NET_NS_PID is very dynamic. And
      because the return value of the second call is not checked for an
      error pointer, it also leads to a kernel crash, as reported by syzbot.

      Fix this by getting rid of the second call, which has already become
      redundant after Kuniyuki's work. We have to propagate the result of the
      first rtnl_link_get_net_ifla() down to each ->newlink().
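      A reduced sketch of the "resolve once, check, then pass down" shape of the fix. Every name is hypothetical: `struct fake_net`, `lookup_peer_net()`, `fake_newlink()`, `add_peer()` and the minimal ERR_PTR-style helpers are not the kernel's rtnetlink API. The lookup counter only demonstrates that the netns is resolved a single time and that its error pointer is checked before use.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for struct net. */
struct fake_net {
	int id;
};

#define FAKE_MAX_ERRNO 4095

/* Minimal error-pointer helpers modelled on the kernel's ERR_PTR() idiom. */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-FAKE_MAX_ERRNO;
}

static int lookup_calls;                  /* counts lookups, to show there is one */
static struct fake_net peer_ns = { 42 };

/* Hypothetical lookup whose answer could change over time (like a PID-based
 * netns lookup); here fail selects the error case. */
static struct fake_net *lookup_peer_net(int fail)
{
	lookup_calls++;
	return fail ? err_ptr(-ESRCH) : &peer_ns;
}

/* Hypothetical ->newlink(): uses the netns it is given, no second lookup. */
static int fake_newlink(struct fake_net *peer_net, int *peer_id)
{
	*peer_id = peer_net->id;
	return 0;
}

/* Resolve once, check for an error pointer, then pass the result down. */
static int add_peer(int fail, int *peer_id)
{
	struct fake_net *peer_net = lookup_peer_net(fail);

	if (is_err(peer_net))
		return (int)ptr_err(peer_net);
	return fake_newlink(peer_net, peer_id);
}
```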
      
      Reported-by: <syzbot+21ba4d5adff0b6a7cfc6@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=21ba4d5adff0b6a7cfc6
      Fixes: 0eb87b02 ("veth: Set VETH_INFO_PEER to veth_link_ops.peer_type.")
      Fixes: 6b84e558 ("vxcan: Set VXCAN_INFO_PEER to vxcan_link_ops.peer_type.")
      Fixes: fefd5d08 ("netkit: Set IFLA_NETKIT_PEER_INFO to netkit_link_ops.peer_type.")
      Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://patch.msgid.link/20241129212519.825567-1-xiyou.wangcong@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net/smc: fix LGR and link use-after-free issue · 2c7f14ed
      Wen Gu authored and Paolo Abeni committed
      
      We encountered an LGR/link use-after-free issue, which manifested as
      the LGR/link refcnt reaching 0 early and entering the clear process,
      making resource access unsafe.
      
       refcount_t: addition on 0; use-after-free.
       WARNING: CPU: 14 PID: 107447 at lib/refcount.c:25 refcount_warn_saturate+0x9c/0x140
       Workqueue: events smc_lgr_terminate_work [smc]
       Call trace:
        refcount_warn_saturate+0x9c/0x140
        __smc_lgr_terminate.part.45+0x2a8/0x370 [smc]
        smc_lgr_terminate_work+0x28/0x30 [smc]
        process_one_work+0x1b8/0x420
        worker_thread+0x158/0x510
        kthread+0x114/0x118
      
      or
      
       refcount_t: underflow; use-after-free.
       WARNING: CPU: 6 PID: 93140 at lib/refcount.c:28 refcount_warn_saturate+0xf0/0x140
       Workqueue: smc_hs_wq smc_listen_work [smc]
       Call trace:
        refcount_warn_saturate+0xf0/0x140
        smcr_link_put+0x1cc/0x1d8 [smc]
        smc_conn_free+0x110/0x1b0 [smc]
        smc_conn_abort+0x50/0x60 [smc]
        smc_listen_find_device+0x75c/0x790 [smc]
        smc_listen_work+0x368/0x8a0 [smc]
        process_one_work+0x1b8/0x420
        worker_thread+0x158/0x510
        kthread+0x114/0x118
      
      It is caused by repeated release of the LGR/link refcnt. One suspect is
      that smc_conn_free() is called repeatedly because some smc_conn_free()
      calls from the server listening path are not protected by the sock lock.
      
      e.g.
      
      Calls under socklock        | smc_listen_work
      -------------------------------------------------------
      lock_sock(sk)               | smc_conn_abort
      smc_conn_free               | \- smc_conn_free
      \- smcr_link_put            |    \- smcr_link_put (duplicated)
      release_sock(sk)
      
      So add sock lock protection in the smc_listen_work() path, making it
      mutually exclusive with other connection operations.
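      A minimal sketch of why serializing both paths on one lock prevents the double put. It assumes a hypothetical `freed` flag that the free path checks and sets. `struct fake_link`, `struct fake_conn`, `conn_init()` and `conn_free()` are hypothetical names, and a pthread mutex stands in for the sock lock. The free path is called twice in sequence rather than reproducing the race; what matters is that the check-and-set always runs under the same lock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical link holding a reference count. */
struct fake_link {
	int refcnt;
};

/* Hypothetical connection holding one reference on its link. */
struct fake_conn {
	pthread_mutex_t lock; /* stands in for the sock lock */
	bool freed;           /* set once the connection's resources are released */
	struct fake_link *lnk;
};

static void conn_init(struct fake_conn *conn, struct fake_link *lnk)
{
	pthread_mutex_init(&conn->lock, NULL);
	conn->freed = false;
	conn->lnk = lnk;
}

/*
 * Drop the link reference at most once. Because both the close path and
 * the listen-work abort path take the same lock, the check of conn->freed
 * and the put cannot interleave, so the reference is not put twice.
 */
static void conn_free(struct fake_conn *conn)
{
	pthread_mutex_lock(&conn->lock);
	if (!conn->freed) {
		conn->freed = true;
		conn->lnk->refcnt--;
	}
	pthread_mutex_unlock(&conn->lock);
}
```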
      
      Fixes: 3b2dec26 ("net/smc: restructure client and server code in af_smc")
      Co-developed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
      Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
      Co-developed-by: Kai <KaiShen@linux.alibaba.com>
      Signed-off-by: Kai <KaiShen@linux.alibaba.com>
      Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net/smc: initialize close_work early to avoid warning · 0541db8e
      Wen Gu authored and Paolo Abeni committed
      
      We encountered a warning that close_work was canceled before
      initialization.
      
        WARNING: CPU: 7 PID: 111103 at kernel/workqueue.c:3047 __flush_work+0x19e/0x1b0
        Workqueue: events smc_lgr_terminate_work [smc]
        RIP: 0010:__flush_work+0x19e/0x1b0
        Call Trace:
         ? __wake_up_common+0x7a/0x190
         ? work_busy+0x80/0x80
         __cancel_work_timer+0xe3/0x160
         smc_close_cancel_work+0x1a/0x70 [smc]
         smc_close_active_abort+0x207/0x360 [smc]
         __smc_lgr_terminate.part.38+0xc8/0x180 [smc]
         process_one_work+0x19e/0x340
         worker_thread+0x30/0x370
         ? process_one_work+0x340/0x340
         kthread+0x117/0x130
         ? __kthread_cancel_work+0x50/0x50
         ret_from_fork+0x22/0x30
      
      This is because when smc_close_cancel_work is triggered, e.g. when the
      RDMA driver is removed with rmmod and the LGR is terminated, the
      conn->close_work is flushed before initialization, resulting in
      WARN_ON(!work->func).
      
      __smc_lgr_terminate             | smc_connect_{rdma|ism}
      -------------------------------------------------------------
                                      | smc_conn_create
                                      | \- smc_lgr_register_conn
      for conn in lgr->conns_all      |
      \- smc_conn_kill                |
         \- smc_close_active_abort    |
            \- smc_close_cancel_work  |
               \- cancel_work_sync    |
                  \- __flush_work     |
                       (close_work)   |
                                      | smc_close_init
                                      | \- INIT_WORK(&close_work)
      
      So fix this by initializing close_work before establishing the
      connection.
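      The ordering problem can be modelled in a few lines. A work item whose func is still NULL is flagged when cancelled, mirroring WARN_ON(!work->func). The fix initializes close_work before the connection becomes visible to LGR termination. `struct fake_work`, `struct fake_conn`, `fake_cancel_work_sync()`, `conn_create()` and `lgr_terminate()` are hypothetical names, not the SMC or workqueue APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item: func stays NULL until initialized. */
struct fake_work {
	void (*func)(struct fake_work *work);
};

/* Hypothetical connection: close_work plus "visible in lgr->conns_all". */
struct fake_conn {
	struct fake_work close_work;
	bool registered;
};

static int warnings; /* counts what __flush_work() would WARN about */

static void close_work_fn(struct fake_work *work)
{
	(void)work; /* body irrelevant for this sketch */
}

/* Models cancel_work_sync() -> __flush_work(): WARN_ON(!work->func). */
static void fake_cancel_work_sync(struct fake_work *work)
{
	if (work->func == NULL)
		warnings++;
}

/* Fixed order: INIT_WORK() equivalent first, then make the conn visible. */
static void conn_create(struct fake_conn *conn)
{
	conn->close_work.func = close_work_fn;
	conn->registered = true;
}

/* Models __smc_lgr_terminate() cancelling work on registered connections. */
static void lgr_terminate(struct fake_conn *conn)
{
	if (conn->registered)
		fake_cancel_work_sync(&conn->close_work);
}
```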
      
      Fixes: 46c28dbd ("net/smc: no socket state changes in tasklet context")
      Fixes: 41349844 ("net/smc: add SMC-D support in af_smc")
      Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • tipc: Fix use-after-free of kernel socket in cleanup_bearer(). · 6a2fa133
      Kuniyuki Iwashima authored and Paolo Abeni committed
      
      syzkaller reported a use-after-free of a UDP kernel socket
      in cleanup_bearer() without a repro. [0][1]

      When bearer_disable() calls tipc_udp_disable(), cleanup
      of the UDP kernel socket is deferred by work calling
      cleanup_bearer().

      tipc_net_stop() waits for such works to finish by checking
      tipc_net(net)->wq_count.  However, the work decrements the
      count too early, before releasing the kernel socket,
      unblocking cleanup_net() and resulting in use-after-free.

      Let's move the decrement after releasing the socket in
      cleanup_bearer().
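      A tiny single-threaded model of the ordering requirement: the waiter treats wq_count == 0 as "all deferred sockets are gone", so the work must release the socket before it decrements. `struct fake_tipc_net`, `struct fake_sock`, `cleanup_bearer_model()` and `teardown_safe()` are hypothetical names. A plain int replaces the kernel's atomic counter, and no concurrency is modelled.

```c
#include <stdbool.h>

/* Hypothetical per-netns state: number of pending cleanup works. */
struct fake_tipc_net {
	int wq_count;
};

/* Hypothetical kernel socket owned by the deferred work. */
struct fake_sock {
	bool released;
};

/* Fixed ordering: release the socket first, then signal completion. */
static void cleanup_bearer_model(struct fake_tipc_net *tn, struct fake_sock *sk)
{
	sk->released = true; /* stands in for sock_release() */
	tn->wq_count--;      /* the netns teardown may proceed after this */
}

/* What the waiter relies on: count 0 implies the socket is released. */
static bool teardown_safe(const struct fake_tipc_net *tn, const struct fake_sock *sk)
{
	return tn->wq_count == 0 && sk->released;
}
```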
      
      [0]:
      ref_tracker: net notrefcnt@000000009b3d1faf has 1/1 users at
           sk_alloc+0x438/0x608
           inet_create+0x4c8/0xcb0
           __sock_create+0x350/0x6b8
           sock_create_kern+0x58/0x78
           udp_sock_create4+0x68/0x398
           udp_sock_create+0x88/0xc8
           tipc_udp_enable+0x5e8/0x848
           __tipc_nl_bearer_enable+0x84c/0xed8
           tipc_nl_bearer_enable+0x38/0x60
           genl_family_rcv_msg_doit+0x170/0x248
           genl_rcv_msg+0x400/0x5b0
           netlink_rcv_skb+0x1dc/0x398
           genl_rcv+0x44/0x68
           netlink_unicast+0x678/0x8b0
           netlink_sendmsg+0x5e4/0x898
           ____sys_sendmsg+0x500/0x830
      
      [1]:
      BUG: KMSAN: use-after-free in udp_hashslot include/net/udp.h:85 [inline]
      BUG: KMSAN: use-after-free in udp_lib_unhash+0x3b8/0x930 net/ipv4/udp.c:1979
       udp_hashslot include/net/udp.h:85 [inline]
       udp_lib_unhash+0x3b8/0x930 net/ipv4/udp.c:1979
       sk_common_release+0xaf/0x3f0 net/core/sock.c:3820
       inet_release+0x1e0/0x260 net/ipv4/af_inet.c:437
       inet6_release+0x6f/0xd0 net/ipv6/af_inet6.c:489
       __sock_release net/socket.c:658 [inline]
       sock_release+0xa0/0x210 net/socket.c:686
       cleanup_bearer+0x42d/0x4c0 net/tipc/udp_media.c:819
       process_one_work kernel/workqueue.c:3229 [inline]
       process_scheduled_works+0xcaf/0x1c90 kernel/workqueue.c:3310
       worker_thread+0xf6c/0x1510 kernel/workqueue.c:3391
       kthread+0x531/0x6b0 kernel/kthread.c:389
       ret_from_fork+0x60/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:244
      
      Uninit was created at:
       slab_free_hook mm/slub.c:2269 [inline]
       slab_free mm/slub.c:4580 [inline]
       kmem_cache_free+0x207/0xc40 mm/slub.c:4682
       net_free net/core/net_namespace.c:454 [inline]
       cleanup_net+0x16f2/0x19d0 net/core/net_namespace.c:647
       process_one_work kernel/workqueue.c:3229 [inline]
       process_scheduled_works+0xcaf/0x1c90 kernel/workqueue.c:3310
       worker_thread+0xf6c/0x1510 kernel/workqueue.c:3391
       kthread+0x531/0x6b0 kernel/kthread.c:389
       ret_from_fork+0x60/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:244
      
      CPU: 0 UID: 0 PID: 54 Comm: kworker/0:2 Not tainted 6.12.0-rc1-00131-gf66ebf37d69c #7 91723d6f74857f70725e1583cba3cf4adc716cfa
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
      Workqueue: events cleanup_bearer
      
      Fixes: 26abe143 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://patch.msgid.link/20241127050512.28438-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Ivan Solodovnikov
      dccp: Fix memory leak in dccp_feat_change_recv · 22be4727
      Ivan Solodovnikov authored and Paolo Abeni committed
      
      If dccp_feat_push_confirm() fails after a new value for an SP feature was
      accepted without reconciliation ('entry == NULL' branch), the memory
      allocated for that value with dccp_feat_clone_sp_val() is never freed.
      
      Here is the kmemleak stack for this:
      
      unreferenced object 0xffff88801d4ab488 (size 8):
        comm "syz-executor310", pid 1127, jiffies 4295085598 (age 41.666s)
        hex dump (first 8 bytes):
          01 b4 4a 1d 80 88 ff ff                          ..J.....
        backtrace:
          [<00000000db7cabfe>] kmemdup+0x23/0x50 mm/util.c:128
          [<0000000019b38405>] kmemdup include/linux/string.h:465 [inline]
          [<0000000019b38405>] dccp_feat_clone_sp_val net/dccp/feat.c:371 [inline]
          [<0000000019b38405>] dccp_feat_clone_sp_val net/dccp/feat.c:367 [inline]
          [<0000000019b38405>] dccp_feat_change_recv net/dccp/feat.c:1145 [inline]
          [<0000000019b38405>] dccp_feat_parse_options+0x1196/0x2180 net/dccp/feat.c:1416
          [<00000000b1f6d94a>] dccp_parse_options+0xa2a/0x1260 net/dccp/options.c:125
          [<0000000030d7b621>] dccp_rcv_state_process+0x197/0x13d0 net/dccp/input.c:650
          [<000000001f74c72e>] dccp_v4_do_rcv+0xf9/0x1a0 net/dccp/ipv4.c:688
          [<00000000a6c24128>] sk_backlog_rcv include/net/sock.h:1041 [inline]
          [<00000000a6c24128>] __release_sock+0x139/0x3b0 net/core/sock.c:2570
          [<00000000cf1f3a53>] release_sock+0x54/0x1b0 net/core/sock.c:3111
          [<000000008422fa23>] inet_wait_for_connect net/ipv4/af_inet.c:603 [inline]
          [<000000008422fa23>] __inet_stream_connect+0x5d0/0xf70 net/ipv4/af_inet.c:696
          [<0000000015b6f64d>] inet_stream_connect+0x53/0xa0 net/ipv4/af_inet.c:735
          [<0000000010122488>] __sys_connect_file+0x15c/0x1a0 net/socket.c:1865
          [<00000000b4b70023>] __sys_connect+0x165/0x1a0 net/socket.c:1882
          [<00000000f4cb3815>] __do_sys_connect net/socket.c:1892 [inline]
          [<00000000f4cb3815>] __se_sys_connect net/socket.c:1889 [inline]
          [<00000000f4cb3815>] __x64_sys_connect+0x6e/0xb0 net/socket.c:1889
          [<00000000e7b1e839>] do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
          [<0000000055e91434>] entry_SYSCALL_64_after_hwframe+0x67/0xd1
      
      Clean up the allocated memory in case of dccp_feat_push_confirm() failure
      and bail out with an error reset code.
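A minimal userspace sketch of the ownership rule this fix restores, under a simplified model: struct sp_val, clone_sp_val(), push_confirm(), change_recv() and RESET_CODE are hypothetical stand-ins for the DCCP code, not the kernel implementation. The push step takes ownership of the cloned value only on success, so on failure the caller has to free the clone before returning the reset code; the live_allocs counter exists only to make a leak observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of an SP feature value; not the kernel's struct. */
struct sp_val {
	unsigned char *vec;
	size_t len;
};

/* Number of clones not yet freed, so that a leak becomes observable. */
static int live_allocs;

#define RESET_CODE 1 /* hypothetical "error reset code" */

/* Stand-in for dccp_feat_clone_sp_val(): duplicate the received bytes. */
static int clone_sp_val(struct sp_val *dst, const unsigned char *src, size_t len)
{
	dst->vec = malloc(len);
	if (!dst->vec)
		return -1;
	memcpy(dst->vec, src, len);
	dst->len = len;
	live_allocs++;
	return 0;
}

static void free_sp_val(struct sp_val *val)
{
	if (val->vec) {
		assert(live_allocs > 0);
		free(val->vec);
		val->vec = NULL;
		live_allocs--;
	}
}

/* Stand-in for dccp_feat_push_confirm(): takes ownership of *val only on
 * success (by storing it in *slot); on failure the caller still owns it. */
static int push_confirm(struct sp_val *slot, const struct sp_val *val, int fail)
{
	if (fail)
		return -1;
	*slot = *val;
	return 0;
}

/* Models the 'entry == NULL' branch: accept the value without reconciliation. */
static int change_recv(struct sp_val *slot, const unsigned char *opt,
		       size_t len, int fail)
{
	struct sp_val fval;

	if (clone_sp_val(&fval, opt, len))
		return RESET_CODE;
	if (push_confirm(slot, &fval, fail)) {
		free_sp_val(&fval); /* the fix: release the clone, then bail out */
		return RESET_CODE;
	}
	return 0;
}
```

Calling change_recv() with fail set returns RESET_CODE and leaves live_allocs at 0; without the free_sp_val() call in the failure branch, the counter would stay at 1, which is the leak kmemleak reported.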
      
      Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
      
      Fixes: e77b8363 ("dccp: Process incoming Change feature-negotiation options")
      Signed-off-by: Ivan Solodovnikov <solodovnikov.ia@phystech.edu>
      Link: https://patch.msgid.link/20241126143902.190853-1-solodovnikov.ia@phystech.edu
      
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Jiri Wiesner
      net/ipv6: release expired exception dst cached in socket · 3301ab7d
      Jiri Wiesner authored
      Dst objects get leaked in ip6_negative_advice() when this function is
      executed for an expired IPv6 route located in the exception table. There
      are several conditions that must be fulfilled for the leak to occur:
      * an ICMPv6 packet indicating a change of the MTU for the path is received,
        resulting in an exception dst being created
      * a TCP connection that uses the exception dst for routing packets must
        start timing out so that TCP begins retransmissions
      * after the exception dst expires, the FIB6 garbage collector must not run
        before TCP executes ip6_negative_advice() for the expired exception dst
      
      When TCP executes ip6_negative_advice() for an exception dst that has
      expired and if no other socket holds a reference to the exception dst, the
      refcount of the exception dst is 2, which corresponds to the increment
      made by dst_init() and the increment made by the TCP socket for which the
      connection is timing out. The reference taken by the socket is never
      released. The refcount of the dst is decremented in sk_dst_reset() but
      that decrement is counteracted by a dst_hold() intentionally placed just
      before the sk_dst_reset() in ip6_negative_advice(). After
      ip6_negative_advice() has finished, there is no other object tied to the
      dst. The socket lost its reference stored in sk_dst_cache and the dst is
      no longer in the exception table. The exception dst becomes a leaked
      object.
      
      As a result of this dst leak, an unbalanced refcount is reported for the
      loopback device of a net namespace being destroyed under kernels that do
      not contain e5f80fcf ("ipv6: give an IPv6 dev to blackhole_netdev"):
      unregister_netdevice: waiting for lo to become free. Usage count = 2
      
      Fix the dst leak by removing the dst_hold() in ip6_negative_advice(). The
      patch that introduced the dst_hold() in ip6_negative_advice() was
      92f1655a ("net: fix __dst_negative_advice() race"). But 92f1655a
      merely refactored the code with regard to the dst refcount, so the issue
      was present even before 92f1655a. The bug was introduced in
      54c1a859 ("ipv6: Don't drop cache route entry unless timer actually
      expired.") where the expired cached route is deleted and the sk_dst_cache
      member of the socket is set to NULL by calling dst_negative_advice() but
      the refcount belonging to the socket is left unbalanced.
      
      The IPv4 version - ipv4_negative_advice() - is not affected by this bug.
      When the TCP connection times out, ipv4_negative_advice() merely resets the
      sk_dst_cache of the socket while decrementing the refcount of the
      exception dst.
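The refcount arithmetic described above can be modeled in a few lines of userspace C. This is a toy: struct toy_dst, struct toy_sock and the toy_* helpers are hypothetical, and it assumes that unlinking the expired dst from the exception table drops the reference taken at dst_init() time. Starting from a refcount of 2 (dst_init() plus the TCP socket), the path with the extra hold ends at 1 with no owner left, while the fixed path ends at 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: these structs and helpers are hypothetical, not kernel code. */
struct toy_dst {
	int refcnt;
	bool in_exception_table;
};

struct toy_sock {
	struct toy_dst *dst_cache;
};

static void toy_dst_hold(struct toy_dst *d) { d->refcnt++; }

static void toy_dst_release(struct toy_dst *d)
{
	assert(d->refcnt > 0);
	d->refcnt--;
}

/* Like sk_dst_reset(): clear the socket's cache and drop the socket's ref. */
static void toy_sk_dst_reset(struct toy_sock *sk)
{
	struct toy_dst *d = sk->dst_cache;

	sk->dst_cache = NULL;
	if (d)
		toy_dst_release(d);
}

/* Assumed behaviour: unlinking the expired exception drops the dst_init() ref. */
static void toy_remove_exception(struct toy_dst *d)
{
	d->in_exception_table = false;
	toy_dst_release(d);
}

/* Expired-exception path of negative advice; 'buggy' keeps the extra hold. */
static void toy_negative_advice(struct toy_sock *sk, struct toy_dst *d, bool buggy)
{
	if (buggy)
		toy_dst_hold(d); /* the dst_hold() that the fix removes */
	toy_sk_dst_reset(sk);
	toy_remove_exception(d);
}
```

In the buggy run the count goes 2 → 3 → 2 → 1, leaving one reference that neither the socket nor the exception table owns; in the fixed run it goes 2 → 1 → 0.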
      
      Fixes: 92f1655a ("net: fix __dst_negative_advice() race")
      Fixes: 54c1a859 ("ipv6: Don't drop cache route entry unless timer actually expired.")
      Link: https://lore.kernel.org/netdev/20241113105611.GA6723@incl/T/#u
      
      Signed-off-by: Jiri Wiesner <jwiesner@suse.de>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20241128085950.GA4505@incl
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. Dec 02, 2024
  5. Dec 01, 2024
    • Linus Torvalds
      Get rid of 'remove_new' relic from platform driver struct · e70140ba
      Linus Torvalds authored
      
      The continual trickle of small conversion patches is grating on me, and
      is really not helping.  Just get rid of the 'remove_new' member
      function, which is just an alias for the plain 'remove', and had a
      comment to that effect:
      
        /*
         * .remove_new() is a relic from a prototype conversion of .remove().
         * New drivers are supposed to implement .remove(). Once all drivers are
         * converted to not use .remove_new any more, it will be dropped.
         */
      
      This was just a tree-wide 'sed' script that replaced '.remove_new' with
      '.remove', with some care taken to turn a subsequent tab into two tabs
      to make things line up.
      
      I did do some minimal manual whitespace adjustment for places that used
      spaces to line things up.
      
      Then I just removed the old (sic) .remove_new member function, and this
      is the end result.  No more unnecessary conversion noise.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Eric Dumazet
      ipv6: avoid possible NULL deref in modify_prefix_route() · a747e024
      Eric Dumazet authored
      
      syzbot found a NULL deref [1] in modify_prefix_route(), caused by a
      fib6_info without its fib6_table pointer set.
      
      This can happen for net->ipv6.fib6_null_entry.
      
      [1]
      Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN NOPTI
      KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
      CPU: 1 UID: 0 PID: 5837 Comm: syz-executor888 Not tainted 6.12.0-syzkaller-09567-g7eef7e306d3c #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
       RIP: 0010:__lock_acquire+0xe4/0x3c40 kernel/locking/lockdep.c:5089
      Code: 08 84 d2 0f 85 15 14 00 00 44 8b 0d ca 98 f5 0e 45 85 c9 0f 84 b4 0e 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 96 2c 00 00 49 8b 04 24 48 3d a0 07 7f 93 0f 84
      RSP: 0018:ffffc900035d7268 EFLAGS: 00010006
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000006 RSI: 1ffff920006bae5f RDI: 0000000000000030
      RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
      R10: ffffffff90608e17 R11: 0000000000000001 R12: 0000000000000030
      R13: ffff888036334880 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000555579e90380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffc59cc4278 CR3: 0000000072b54000 CR4: 00000000003526f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
        lock_acquire.part.0+0x11b/0x380 kernel/locking/lockdep.c:5849
        __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
        _raw_spin_lock_bh+0x33/0x40 kernel/locking/spinlock.c:178
        spin_lock_bh include/linux/spinlock.h:356 [inline]
        modify_prefix_route+0x30b/0x8b0 net/ipv6/addrconf.c:4831
        inet6_addr_modify net/ipv6/addrconf.c:4923 [inline]
        inet6_rtm_newaddr+0x12c7/0x1ab0 net/ipv6/addrconf.c:5055
        rtnetlink_rcv_msg+0x3c7/0xea0 net/core/rtnetlink.c:6920
        netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2541
        netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
        netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1347
        netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1891
        sock_sendmsg_nosec net/socket.c:711 [inline]
        __sock_sendmsg net/socket.c:726 [inline]
        ____sys_sendmsg+0xaaf/0xc90 net/socket.c:2583
        ___sys_sendmsg+0x135/0x1e0 net/socket.c:2637
        __sys_sendmsg+0x16e/0x220 net/socket.c:2669
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7fd1dcef8b79
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007ffc59cc4378 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd1dcef8b79
      RDX: 0000000000040040 RSI: 0000000020000140 RDI: 0000000000000004
      RBP: 00000000000113fd R08: 0000000000000006 R09: 0000000000000006
      R10: 0000000000000006 R11: 0000000000000246 R12: 00007ffc59cc438c
      R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001
       </TASK>
      
      Fixes: 5eb902b8 ("net/ipv6: Remove expired routes with a separated list of routes.")
      Reported-by: <syzbot+1de74b0794c40c8eb300@syzkaller.appspotmail.com>
      Closes: https://lore.kernel.org/netdev/67461f7f.050a0220.1286eb.0021.GAE@google.com/T/#u
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Kui-Feng Lee <thinker.li@gmail.com>
      Cc: David Ahern <dsahern@kernel.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. Nov 30, 2024
    • Dong Chenchen
      net: Fix icmp host relookup triggering ip_rt_bug · c44daa7e
      Dong Chenchen authored
      
      An ARP link failure may trigger ip_rt_bug() when xfrm is enabled. The call trace is:
      
      WARNING: CPU: 0 PID: 0 at net/ipv4/route.c:1241 ip_rt_bug+0x14/0x20
      Modules linked in:
      CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc6-00077-g2e1b3cc9d7f7
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      RIP: 0010:ip_rt_bug+0x14/0x20
      Call Trace:
       <IRQ>
       ip_send_skb+0x14/0x40
       __icmp_send+0x42d/0x6a0
       ipv4_link_failure+0xe2/0x1d0
       arp_error_report+0x3c/0x50
       neigh_invalidate+0x8d/0x100
       neigh_timer_handler+0x2e1/0x330
       call_timer_fn+0x21/0x120
       __run_timer_base.part.0+0x1c9/0x270
       run_timer_softirq+0x4c/0x80
       handle_softirqs+0xac/0x280
       irq_exit_rcu+0x62/0x80
       sysvec_apic_timer_interrupt+0x77/0x90
      
      The script below reproduces this scenario:
      ip xfrm policy add src 0.0.0.0/0 dst 0.0.0.0/0 \
      	dir out priority 0 ptype main flag localok icmp
      ip l a veth1 type veth
      ip a a 192.168.141.111/24 dev veth0
      ip l s veth0 up
      ping 192.168.141.155 -c 1
      
      icmp_route_lookup() creates input routes for locally generated packets
      while xfrm performs the relookup for ICMP traffic. It then attaches the
      input route (dst->out = ip_rt_bug) to the skb for DESTUNREACH.
      
      For ICMP errors triggered by locally generated packets, dst->dev of the
      output route is loopback. Generally, xfrm relookup verification is not
      required on loopback interfaces (net.ipv4.conf.lo.disable_xfrm = 1).
      
      Fix this by skipping the ICMP relookup for locally generated packets.
      
      Fixes: 8b7817f3 ("[IPSEC]: Add ICMP host relookup support")
      Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20241127040850.1513135-1-dongchenchen2@huawei.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Eric Dumazet
      net: hsr: avoid potential out-of-bound access in fill_frame_info() · b9653d19
      Eric Dumazet authored
      
      syzbot is able to feed a 14-byte packet, pretending
      it is a VLAN one.
      
      Since fill_frame_info() already relies on skb->mac_len,
      extend the check to cover this case.
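The length rule can be illustrated on a raw Ethernet frame. This sketch checks the buffer length directly rather than skb->mac_len, and frame_proto() is a hypothetical helper, not the HSR code: a frame whose EtherType says 802.1Q (0x8100) must be at least 18 bytes long (the 14-byte Ethernet header plus the 4-byte VLAN tag) before the inner EtherType at offset 16 may be read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN    14     /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define VLAN_HLEN   4      /* TCI (2) + inner EtherType (2) after the 0x8100 TPID */
#define ETH_P_8021Q 0x8100

static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical helper: return the effective protocol of a frame, or -1 if
 * the frame is too short for the header it claims to carry. */
static int32_t frame_proto(const uint8_t *frame, size_t len)
{
	uint16_t proto;

	if (len < ETH_HLEN)
		return -1;
	proto = read_be16(frame + 12);
	if (proto == ETH_P_8021Q) {
		/* A 14-byte frame "pretending" to be VLAN-tagged stops here. */
		if (len < ETH_HLEN + VLAN_HLEN)
			return -1;
		proto = read_be16(frame + 16);
	}
	return proto;
}
```

Without the inner length check, the 14-byte case would read two bytes past the end of the buffer, which is the same class of problem KMSAN flagged above.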
      
      BUG: KMSAN: uninit-value in fill_frame_info net/hsr/hsr_forward.c:709 [inline]
       BUG: KMSAN: uninit-value in hsr_forward_skb+0x9ee/0x3b10 net/hsr/hsr_forward.c:724
        fill_frame_info net/hsr/hsr_forward.c:709 [inline]
        hsr_forward_skb+0x9ee/0x3b10 net/hsr/hsr_forward.c:724
        hsr_dev_xmit+0x2f0/0x350 net/hsr/hsr_device.c:235
        __netdev_start_xmit include/linux/netdevice.h:5002 [inline]
        netdev_start_xmit include/linux/netdevice.h:5011 [inline]
        xmit_one net/core/dev.c:3590 [inline]
        dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3606
        __dev_queue_xmit+0x366a/0x57d0 net/core/dev.c:4434
        dev_queue_xmit include/linux/netdevice.h:3168 [inline]
        packet_xmit+0x9c/0x6c0 net/packet/af_packet.c:276
        packet_snd net/packet/af_packet.c:3146 [inline]
        packet_sendmsg+0x91ae/0xa6f0 net/packet/af_packet.c:3178
        sock_sendmsg_nosec net/socket.c:711 [inline]
        __sock_sendmsg+0x30f/0x380 net/socket.c:726
        __sys_sendto+0x594/0x750 net/socket.c:2197
        __do_sys_sendto net/socket.c:2204 [inline]
        __se_sys_sendto net/socket.c:2200 [inline]
        __x64_sys_sendto+0x125/0x1d0 net/socket.c:2200
        x64_sys_call+0x346a/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:45
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      Uninit was created at:
        slab_post_alloc_hook mm/slub.c:4091 [inline]
        slab_alloc_node mm/slub.c:4134 [inline]
        kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4186
        kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587
        __alloc_skb+0x363/0x7b0 net/core/skbuff.c:678
        alloc_skb include/linux/skbuff.h:1323 [inline]
        alloc_skb_with_frags+0xc8/0xd00 net/core/skbuff.c:6612
        sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2881
        packet_alloc_skb net/packet/af_packet.c:2995 [inline]
        packet_snd net/packet/af_packet.c:3089 [inline]
        packet_sendmsg+0x74c6/0xa6f0 net/packet/af_packet.c:3178
        sock_sendmsg_nosec net/socket.c:711 [inline]
        __sock_sendmsg+0x30f/0x380 net/socket.c:726
        __sys_sendto+0x594/0x750 net/socket.c:2197
        __do_sys_sendto net/socket.c:2204 [inline]
        __se_sys_sendto net/socket.c:2200 [inline]
        __x64_sys_sendto+0x125/0x1d0 net/socket.c:2200
        x64_sys_call+0x346a/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:45
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      Fixes: 48b491a5 ("net: hsr: fix mac_len checks")
      Reported-by: <syzbot+671e2853f9851d039551@syzkaller.appspotmail.com>
      Closes: https://lore.kernel.org/netdev/6745dc7f.050a0220.21d33d.0018.GAE@google.com/T/#u
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: WingMan Kwok <w-kwok2@ti.com>
      Cc: Murali Karicheri <m-karicheri2@ti.com>
      Cc: MD Danish Anwar <danishanwar@ti.com>
      Cc: Jiri Pirko <jiri@nvidia.com>
      Cc: George McCollister <george.mccollister@gmail.com>
      Link: https://patch.msgid.link/20241126144344.4177332-1-edumazet@google.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Martin Ottens
      net/sched: tbf: correct backlog statistic for GSO packets · 1596a135
      Martin Ottens authored
      
      When the length of a GSO packet in the tbf qdisc is larger than the
      configured burst size, the packet will be segmented by tbf_segment().
      Whenever this function is used to enqueue SKBs, the backlog statistic of
      the tbf is not increased correctly. This can lead to underflows of the
      'backlog' byte-statistic value when these packets are dequeued from tbf.
      
      Reproduce the bug:
      Ensure that the sender machine has GSO enabled. Configure the tbf on
      the outgoing interface of the machine as follows (burst size = 1 MTU):
      $ tc qdisc add dev <oif> root handle 1: tbf rate 50Mbit burst 1514 latency 50ms
      
      Send bulk TCP traffic out via this interface, e.g., by running an iPerf3
      client on this machine. Check the qdisc statistics:
      $ tc -s qdisc show dev <oif>
      
      The 'backlog' byte-statistic has incorrect values while traffic is
      transferred, e.g., high values due to u32 underflows. When the transfer
      is stopped, the value is != 0, which should never happen.
      
      This patch fixes this bug by updating the statistics correctly, even if
      single SKBs of a GSO SKB cannot be enqueued.
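A toy model of the accounting invariant, assuming a bounded child FIFO (QCAP is an arbitrary, hypothetical capacity) and ignoring the adjustment the real code also has to make on ancestor qdiscs. All names here are illustrative, not the sch_tbf code: qlen and backlog grow only by the segments that were actually enqueued, so draining the queue brings backlog back to exactly zero instead of wrapping the u32.

```c
#include <assert.h>
#include <stdint.h>

#define QCAP 8 /* hypothetical child-queue capacity, in packets */

/* Toy qdisc: a bounded FIFO of segment lengths plus tbf-like counters. */
struct toy_qdisc {
	uint32_t seg_len[QCAP];
	unsigned int head, count;
	uint32_t qlen;    /* packets queued */
	uint32_t backlog; /* bytes queued; u32, so a missed increment wraps */
};

/* Child enqueue: fails (drops) when the FIFO is full. */
static int child_enqueue(struct toy_qdisc *q, uint32_t len)
{
	if (q->count == QCAP)
		return -1;
	q->seg_len[(q->head + q->count) % QCAP] = len;
	q->count++;
	return 0;
}

/* Split a GSO packet into mss-sized segments and enqueue them, accounting
 * only for the segments the child queue actually accepted. */
static unsigned int segment_enqueue(struct toy_qdisc *q, uint32_t gso_len,
				    uint32_t mss)
{
	unsigned int nb = 0;
	uint32_t len = 0;

	assert(mss > 0);
	while (gso_len > 0) {
		uint32_t seg = gso_len < mss ? gso_len : mss;

		gso_len -= seg;
		if (child_enqueue(q, seg) == 0) {
			nb++;
			len += seg;
		}
	}
	q->qlen += nb;
	q->backlog += len; /* bytes of accepted segments only */
	return nb;
}

/* Dequeue one segment and subtract exactly its length from backlog. */
static uint32_t toy_dequeue(struct toy_qdisc *q)
{
	uint32_t len;

	if (q->count == 0)
		return 0;
	len = q->seg_len[q->head];
	q->head = (q->head + 1) % QCAP;
	q->count--;
	q->qlen--;
	q->backlog -= len;
	return len;
}
```

With a capacity of 8, a 10000-byte packet and a 1000-byte MSS yield 8 accepted segments and a backlog of 8000; draining the queue returns both counters to 0, which is the "value is 0 after the transfer stops" property the reproducer checks with tc -s qdisc show.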
      
      Fixes: e43ac79a ("sch_tbf: segment too big GSO packets")
      Signed-off-by: Martin Ottens <martin.ottens@fau.de>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20241125174608.1484356-1-martin.ottens@fau.de
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Eric Dumazet
      tcp: populate XPS related fields of timewait sockets · 0a4cc4ac
      Eric Dumazet authored
      
      syzbot reported that netdev_core_pick_tx() was reading an uninitialized
      field [1].
      
      This is indeed happening for timewait sockets after recent commits.
      
      We can copy the original established socket sk_tx_queue_mapping
      and sk_rx_queue_mapping fields, instead of adding more checks
      in fast paths.
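A minimal sketch of the approach, using hypothetical cut-down structs (the real socket structures are much larger, and the field layout here is an assumption): the timewait socket inherits both queue mappings when it is created, so a later transmit-queue pick reads initialized values and reuses the established connection's queue. TOY_NO_QUEUE_MAPPING and toy_pick_tx() are illustrative only and do not reproduce the kernel's selection logic.

```c
#include <assert.h>

/* Hypothetical "no queue recorded" marker for this sketch. */
#define TOY_NO_QUEUE_MAPPING 0xffffu

/* Hypothetical, cut-down socket structs: only the XPS-related fields. */
struct toy_est_sock {
	unsigned short tx_queue_mapping;
	unsigned short rx_queue_mapping;
};

struct toy_tw_sock {
	unsigned short tx_queue_mapping;
	unsigned short rx_queue_mapping;
};

/* Copy the mappings when the timewait socket is created, so nothing on the
 * transmit path later reads uninitialized fields. */
static void toy_tw_init(struct toy_tw_sock *tw, const struct toy_est_sock *sk)
{
	tw->tx_queue_mapping = sk->tx_queue_mapping;
	tw->rx_queue_mapping = sk->rx_queue_mapping;
}

/* Toy queue picker: reuse a recorded queue if valid, else spread by hash. */
static unsigned int toy_pick_tx(unsigned int mapping, unsigned int hash,
				unsigned int nqueues)
{
	assert(nqueues > 0);
	if (mapping != TOY_NO_QUEUE_MAPPING && mapping < nqueues)
		return mapping;
	return hash % nqueues;
}
```

Copying once at creation keeps the check out of the per-packet path, which matches the stated reason for not adding more checks in fast paths.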
      
      As a bonus, packets will use the same transmit queue as
      prior ones, which can potentially avoid reordering.
      
      [1]
      BUG: KMSAN: uninit-value in netdev_pick_tx+0x5c7/0x1550
       netdev_pick_tx+0x5c7/0x1550
        netdev_core_pick_tx+0x1d2/0x4a0 net/core/dev.c:4312
        __dev_queue_xmit+0x128a/0x57d0 net/core/dev.c:4394
        dev_queue_xmit include/linux/netdevice.h:3168 [inline]
        neigh_hh_output include/net/neighbour.h:523 [inline]
        neigh_output include/net/neighbour.h:537 [inline]
        ip_finish_output2+0x187c/0x1b70 net/ipv4/ip_output.c:236
       __ip_finish_output+0x287/0x810
        ip_finish_output+0x4b/0x600 net/ipv4/ip_output.c:324
        NF_HOOK_COND include/linux/netfilter.h:303 [inline]
        ip_output+0x15f/0x3f0 net/ipv4/ip_output.c:434
        dst_output include/net/dst.h:450 [inline]
        ip_local_out net/ipv4/ip_output.c:130 [inline]
        ip_send_skb net/ipv4/ip_output.c:1505 [inline]
        ip_push_pending_frames+0x444/0x570 net/ipv4/ip_output.c:1525
        ip_send_unicast_reply+0x18c1/0x1b30 net/ipv4/ip_output.c:1672
        tcp_v4_send_reset+0x238d/0x2a40 net/ipv4/tcp_ipv4.c:910
        tcp_v4_rcv+0x48f8/0x5750 net/ipv4/tcp_ipv4.c:2431
        ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
        ip_local_deliver_finish+0x336/0x500 net/ipv4/ip_input.c:233
        NF_HOOK include/linux/netfilter.h:314 [inline]
        ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
        dst_input include/net/dst.h:460 [inline]
        ip_sublist_rcv_finish net/ipv4/ip_input.c:578 [inline]
        ip_list_rcv_finish net/ipv4/ip_input.c:628 [inline]
        ip_sublist_rcv+0x15f3/0x17f0 net/ipv4/ip_input.c:636
        ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:670
        __netif_receive_skb_list_ptype net/core/dev.c:5715 [inline]
        __netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5762
        __netif_receive_skb_list net/core/dev.c:5814 [inline]
        netif_receive_skb_list_internal+0x1085/0x1700 net/core/dev.c:5905
        gro_normal_list include/net/gro.h:515 [inline]
        napi_complete_done+0x3d4/0x810 net/core/dev.c:6256
        virtqueue_napi_complete drivers/net/virtio_net.c:758 [inline]
        virtnet_poll+0x5d80/0x6bf0 drivers/net/virtio_net.c:3013
        __napi_poll+0xe7/0x980 net/core/dev.c:6877
        napi_poll net/core/dev.c:6946 [inline]
        net_rx_action+0xa5a/0x19b0 net/core/dev.c:7068
        handle_softirqs+0x1a0/0x7c0 kernel/softirq.c:554
        __do_softirq kernel/softirq.c:588 [inline]
        invoke_softirq kernel/softirq.c:428 [inline]
        __irq_exit_rcu+0x68/0x180 kernel/softirq.c:655
        irq_exit_rcu+0x12/0x20 kernel/softirq.c:671
        common_interrupt+0x97/0xb0 arch/x86/kernel/irq.c:278
        asm_common_interrupt+0x2b/0x40 arch/x86/include/asm/idtentry.h:693
        __preempt_count_sub arch/x86/include/asm/preempt.h:84 [inline]
        kmsan_virt_addr_valid arch/x86/include/asm/kmsan.h:95 [inline]
        virt_to_page_or_null+0xfb/0x150 mm/kmsan/shadow.c:75
        kmsan_get_metadata+0x13e/0x1c0 mm/kmsan/shadow.c:141
        kmsan_get_shadow_origin_ptr+0x4d/0xb0 mm/kmsan/shadow.c:102
        get_shadow_origin_ptr mm/kmsan/instrumentation.c:38 [inline]
        __msan_metadata_ptr_for_store_4+0x27/0x40 mm/kmsan/instrumentation.c:93
        rcu_preempt_read_enter kernel/rcu/tree_plugin.h:390 [inline]
        __rcu_read_lock+0x46/0x70 kernel/rcu/tree_plugin.h:413
        rcu_read_lock include/linux/rcupdate.h:847 [inline]
        batadv_nc_purge_orig_hash net/batman-adv/network-coding.c:408 [inline]
        batadv_nc_worker+0x114/0x19e0 net/batman-adv/network-coding.c:719
        process_one_work kernel/workqueue.c:3229 [inline]
        process_scheduled_works+0xae0/0x1c40 kernel/workqueue.c:3310
        worker_thread+0xea7/0x14f0 kernel/workqueue.c:3391
        kthread+0x3e2/0x540 kernel/kthread.c:389
        ret_from_fork+0x6d/0x90 arch/x86/kernel/process.c:147
        ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
      
      Uninit was created at:
        __alloc_pages_noprof+0x9a7/0xe00 mm/page_alloc.c:4774
        alloc_pages_mpol_noprof+0x299/0x990 mm/mempolicy.c:2265
        alloc_pages_noprof+0x1bf/0x1e0 mm/mempolicy.c:2344
        alloc_slab_page mm/slub.c:2412 [inline]
        allocate_slab+0x320/0x12e0 mm/slub.c:2578
        new_slab mm/slub.c:2631 [inline]
        ___slab_alloc+0x12ef/0x35e0 mm/slub.c:3818
        __slab_alloc mm/slub.c:3908 [inline]
        __slab_alloc_node mm/slub.c:3961 [inline]
        slab_alloc_node mm/slub.c:4122 [inline]
        kmem_cache_alloc_noprof+0x57a/0xb20 mm/slub.c:4141
        inet_twsk_alloc+0x11f/0x9d0 net/ipv4/inet_timewait_sock.c:188
        tcp_time_wait+0x83/0xf50 net/ipv4/tcp_minisocks.c:309
       tcp_rcv_state_process+0x145a/0x49d0
        tcp_v4_do_rcv+0xbf9/0x11a0 net/ipv4/tcp_ipv4.c:1939
        tcp_v4_rcv+0x51df/0x5750 net/ipv4/tcp_ipv4.c:2351
        ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205
        ip_local_deliver_finish+0x336/0x500 net/ipv4/ip_input.c:233
        NF_HOOK include/linux/netfilter.h:314 [inline]
        ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
        dst_input include/net/dst.h:460 [inline]
        ip_sublist_rcv_finish net/ipv4/ip_input.c:578 [inline]
        ip_list_rcv_finish net/ipv4/ip_input.c:628 [inline]
        ip_sublist_rcv+0x15f3/0x17f0 net/ipv4/ip_input.c:636
        ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:670
        __netif_receive_skb_list_ptype net/core/dev.c:5715 [inline]
        __netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5762
        __netif_receive_skb_list net/core/dev.c:5814 [inline]
        netif_receive_skb_list_internal+0x1085/0x1700 net/core/dev.c:5905
        gro_normal_list include/net/gro.h:515 [inline]
        napi_complete_done+0x3d4/0x810 net/core/dev.c:6256
        virtqueue_napi_complete drivers/net/virtio_net.c:758 [inline]
        virtnet_poll+0x5d80/0x6bf0 drivers/net/virtio_net.c:3013
        __napi_poll+0xe7/0x980 net/core/dev.c:6877
        napi_poll net/core/dev.c:6946 [inline]
        net_rx_action+0xa5a/0x19b0 net/core/dev.c:7068
        handle_softirqs+0x1a0/0x7c0 kernel/softirq.c:554
        __do_softirq kernel/softirq.c:588 [inline]
        invoke_softirq kernel/softirq.c:428 [inline]
        __irq_exit_rcu+0x68/0x180 kernel/softirq.c:655
        irq_exit_rcu+0x12/0x20 kernel/softirq.c:671
        common_interrupt+0x97/0xb0 arch/x86/kernel/irq.c:278
        asm_common_interrupt+0x2b/0x40 arch/x86/include/asm/idtentry.h:693
      
      CPU: 0 UID: 0 PID: 3962 Comm: kworker/u8:18 Not tainted 6.12.0-syzkaller-09073-g9f16d5e6f220 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
      Workqueue: bat_events batadv_nc_worker
      
      Fixes: 79636038 ("ipv4: tcp: give socket pointer to control skbs")
      Fixes: 507a9673 ("ipv6: tcp: give socket pointer to control skbs")
      Reported-by: <syzbot+8b0959fc16551d55896b@syzkaller.appspotmail.com>
      Link: https://lore.kernel.org/netdev/674442bd.050a0220.1cc393.0072.GAE@google.com/T/#u
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Brian Vazquez <brianvv@google.com>
      Link: https://patch.msgid.link/20241125093039.3095790-1-edumazet@google.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  7. Nov 28, 2024
    • Liu Jian
      sunrpc: fix one UAF issue caused by sunrpc kernel tcp socket · 3f23f965
      Liu Jian authored
      
      BUG: KASAN: slab-use-after-free in tcp_write_timer_handler+0x156/0x3e0
      Read of size 1 at addr ffff888111f322cd by task swapper/0/0
      
      CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc4-dirty #7
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1
      Call Trace:
       <IRQ>
       dump_stack_lvl+0x68/0xa0
       print_address_description.constprop.0+0x2c/0x3d0
       print_report+0xb4/0x270
       kasan_report+0xbd/0xf0
       tcp_write_timer_handler+0x156/0x3e0
       tcp_write_timer+0x66/0x170
       call_timer_fn+0xfb/0x1d0
       __run_timers+0x3f8/0x480
       run_timer_softirq+0x9b/0x100
       handle_softirqs+0x153/0x390
       __irq_exit_rcu+0x103/0x120
       irq_exit_rcu+0xe/0x20
       sysvec_apic_timer_interrupt+0x76/0x90
       </IRQ>
       <TASK>
       asm_sysvec_apic_timer_interrupt+0x1a/0x20
      RIP: 0010:default_idle+0xf/0x20
      Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90
       90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d 33 f8 25 00 fb f4 <fa> c3 cc cc cc
       cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
      RSP: 0018:ffffffffa2007e28 EFLAGS: 00000242
      RAX: 00000000000f3b31 RBX: 1ffffffff4400fc7 RCX: ffffffffa09c3196
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9f00590f
      RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed102360835d
      R10: ffff88811b041aeb R11: 0000000000000001 R12: 0000000000000000
      R13: ffffffffa202d7c0 R14: 0000000000000000 R15: 00000000000147d0
       default_idle_call+0x6b/0xa0
       cpuidle_idle_call+0x1af/0x1f0
       do_idle+0xbc/0x130
       cpu_startup_entry+0x33/0x40
       rest_init+0x11f/0x210
       start_kernel+0x39a/0x420
       x86_64_start_reservations+0x18/0x30
       x86_64_start_kernel+0x97/0xa0
       common_startup_64+0x13e/0x141
       </TASK>
      
      Allocated by task 595:
       kasan_save_stack+0x24/0x50
       kasan_save_track+0x14/0x30
       __kasan_slab_alloc+0x87/0x90
       kmem_cache_alloc_noprof+0x12b/0x3f0
       copy_net_ns+0x94/0x380
       create_new_namespaces+0x24c/0x500
       unshare_nsproxy_namespaces+0x75/0xf0
       ksys_unshare+0x24e/0x4f0
       __x64_sys_unshare+0x1f/0x30
       do_syscall_64+0x70/0x180
       entry_SYSCALL_64_after_hwframe+0x76/0x7e
      
      Freed by task 100:
       kasan_save_stack+0x24/0x50
       kasan_save_track+0x14/0x30
       kasan_save_free_info+0x3b/0x60
       __kasan_slab_free+0x54/0x70
       kmem_cache_free+0x156/0x5d0
       cleanup_net+0x5d3/0x670
       process_one_work+0x776/0xa90
       worker_thread+0x2e2/0x560
       kthread+0x1a8/0x1f0
       ret_from_fork+0x34/0x60
       ret_from_fork_asm+0x1a/0x30
      
      Reproduction script:
      
      mkdir -p /mnt/nfsshare
      mkdir -p /mnt/nfs/netns_1
      mkfs.ext4 /dev/sdb
      mount /dev/sdb /mnt/nfsshare
      systemctl restart nfs-server
      chmod 777 /mnt/nfsshare
      exportfs -i -o rw,no_root_squash *:/mnt/nfsshare
      
      ip netns add netns_1
      ip link add name veth_1_peer type veth peer veth_1
      ifconfig veth_1_peer 11.11.0.254 up
      ip link set veth_1 netns netns_1
      ip netns exec netns_1 ifconfig veth_1 11.11.0.1
      
      ip netns exec netns_1 /root/iptables -A OUTPUT -d 11.11.0.254 -p tcp \
      	--tcp-flags FIN FIN  -j DROP
      
      (note: In my environment, a DESTROY_CLIENTID operation is always sent
       immediately, breaking the nfs tcp connection.)
      ip netns exec netns_1 timeout -s 9 300 mount -t nfs -o proto=tcp,vers=4.1 \
      	11.11.0.254:/mnt/nfsshare /mnt/nfs/netns_1
      
      ip netns del netns_1
      
      The reason is that the TCP socket in netns_1 (NFS side) has been
      shut down and closed (done in xs_destroy), but the FIN message (with ACK)
      is discarded, and the nfsd side keeps sending retransmissions.
      As a result, when the TCP socket in netns_1 processes a received message,
      it sends the FIN message still in its send queue, and the TCP timer
      is re-armed. When the network namespace is then deleted, the TCP timer
      handler accesses the already freed net structure.
      
      To fix this problem, let's hold netns refcnt for the tcp kernel socket as
      done in other modules. This is an ugly hack which can easily be backported
      to earlier kernels. A proper fix which cleans up the interfaces will
      follow, but may not be so easy to backport.
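      The ownership rule behind the fix can be modelled in user space: an
      object whose timer may still run must pin the namespace it points into,
      and drop that pin only when it is finally released. The sketch below is
      a hypothetical model (struct model_net, struct model_sock and the
      helpers are invented for illustration, with plain integer refcounts);
      it is not the kernel's netns reference-tracking code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net: freed when its refcount hits zero. */
struct model_net {
	int refcnt;
	bool *freed_flag;   /* lets the caller observe when the netns is freed */
};

/* Hypothetical stand-in for a kernel TCP socket created inside a netns. */
struct model_sock {
	struct model_net *net;
	bool holds_net_ref; /* the fix: the kernel socket also pins its netns */
};

static struct model_net *net_create(bool *freed_flag)
{
	struct model_net *net = malloc(sizeof(*net));

	if (!net)
		return NULL;
	net->refcnt = 1;    /* reference owned by "ip netns add" */
	net->freed_flag = freed_flag;
	*freed_flag = false;
	return net;
}

static void net_put(struct model_net *net)
{
	if (--net->refcnt == 0) {
		*net->freed_flag = true;
		free(net);
	}
}

static void sock_init(struct model_sock *sk, struct model_net *net, bool hold_ref)
{
	sk->net = net;
	sk->holds_net_ref = hold_ref;
	if (hold_ref)
		net->refcnt++;  /* keep the netns alive as long as the socket */
}

/* Called once the socket and its retransmit timer are completely gone. */
static void sock_release(struct model_sock *sk)
{
	if (sk->holds_net_ref)
		net_put(sk->net);
	sk->net = NULL;
}
```

      With the reference held, "ip netns del" only drops its own reference;
      the namespace is freed when the socket is released, so a late timer
      never sees freed memory.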
      
      Fixes: 26abe143 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Acked-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      3f23f965
    • Benjamin Coddington
      SUNRPC: timeout and cancel TLS handshake with -ETIMEDOUT · d7bdd849
      Benjamin Coddington authored
      
      We've noticed a situation where an unstable TCP connection can cause the
      TLS handshake to time out waiting for userspace to complete it.  When this
      happens, we don't want to return from xs_tls_handshake_sync() with zero, as
      this will cause the upper xprt to be set CONNECTED, and subsequent attempts
      to transmit will fail with -EPIPE.  The sunrpc machine does not
      recover from this situation and will spin attempting to transmit.
      
      The return value of tls_handshake_cancel() can be used to detect a race
      with completion:
      
       * tls_handshake_cancel - cancel a pending handshake
       * Return values:
       *   %true - Uncompleted handshake request was canceled
       *   %false - Handshake request already completed or not found
      
      If true, we do not want the upper xprt to be connected, so return
      -ETIMEDOUT.  If false, it's possible the handshake request was lost and
      that may be the reason for our timeout.  Again we do not want the upper
      xprt to be connected, so return -ETIMEDOUT.
      
      Ensure that we always return an error from xs_tls_handshake_sync() if we
      call tls_handshake_cancel().
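      The return-value policy above can be sketched as a small decision
      function. This is a user-space model, not the SUNRPC code: the enum,
      model_handshake_sync_result() and the two cancel stubs are hypothetical
      stand-ins for the wait result and the two documented outcomes of
      tls_handshake_cancel().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical result of waiting for the userspace handshake agent. */
enum model_wait_result { MODEL_WAIT_COMPLETED, MODEL_WAIT_TIMED_OUT };

/* Hypothetical stand-ins for the two tls_handshake_cancel() outcomes. */
static bool cancel_pending_request(void) { return true; }  /* was canceled */
static bool cancel_already_done(void) { return false; }    /* done or lost */

static int model_handshake_sync_result(enum model_wait_result waited,
				       int handshake_status,
				       bool (*cancel)(void))
{
	if (waited == MODEL_WAIT_COMPLETED)
		return handshake_status;  /* 0 on success, negative errno otherwise */

	/*
	 * Timed out.  Whether cancel() reports true (request canceled) or
	 * false (already completed, or lost), never return 0 here: a zero
	 * would let the upper transport be marked CONNECTED.
	 */
	(void)cancel();
	return -ETIMEDOUT;
}
```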
      
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
      Fixes: 75eb6af7 ("SUNRPC: Add a TCP-with-TLS RPC transport class")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      d7bdd849
    • Liu Jian
      sunrpc: clear XPRT_SOCK_UPD_TIMEOUT when reset transport · 4db9ad82
      Liu Jian authored
      
      Since transport->sock is set to NULL when the transport is reset,
      XPRT_SOCK_UPD_TIMEOUT also needs to be cleared. Otherwise,
      xs_tcp_send_request() may call xs_tcp_set_socket_timeouts(), which
      dereferences the now-NULL transport->sock.
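      A user-space model of the invariant "no socket, no pending timeout
      update" follows. struct model_transport, the flag bit and the helpers
      are hypothetical; the assert() in model_set_socket_timeouts() marks
      where the real code would dereference the socket.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit number standing in for XPRT_SOCK_UPD_TIMEOUT. */
enum { MODEL_SOCK_UPD_TIMEOUT = 0 };

struct model_transport {
	void *sock;               /* NULL once the transport has been reset */
	unsigned long sock_state; /* bit flags */
	int timeouts_applied;     /* counts trips through the timeout path */
};

static void model_set_socket_timeouts(struct model_transport *t)
{
	assert(t->sock != NULL);  /* the real code dereferences the socket here */
	t->timeouts_applied++;
}

static void model_reset_transport(struct model_transport *t)
{
	t->sock = NULL;
	/* The fix: a pending timeout update is meaningless without a socket. */
	t->sock_state &= ~(1UL << MODEL_SOCK_UPD_TIMEOUT);
}

static void model_send_request(struct model_transport *t)
{
	/* Test-and-clear the flag, then apply the new timeouts. */
	if (t->sock_state & (1UL << MODEL_SOCK_UPD_TIMEOUT)) {
		t->sock_state &= ~(1UL << MODEL_SOCK_UPD_TIMEOUT);
		model_set_socket_timeouts(t);
	}
	/* ... transmit ... */
}
```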
      
      Fixes: 7196dbb0 ("SUNRPC: Allow changing of the TCP timeout parameters on the fly")
      Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      4db9ad82
    • Paolo Abeni
      ipmr: fix build with clang and DEBUG_NET disabled. · f6d7695b
      Paolo Abeni authored
      
      Sasha reported a build issue in ipmr::
      
      net/ipv4/ipmr.c:320:13: error: function 'ipmr_can_free_table' is not \
      	needed and will not be emitted \
      	[-Werror,-Wunneeded-internal-declaration]
         320 | static bool ipmr_can_free_table(struct net *net)
      
      Apparently clang is too smart with BUILD_BUG_ON_INVALID(); let's
      fall back to a plain WARN_ON_ONCE().
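      To my understanding, the kernel defines BUILD_BUG_ON_INVALID(e) as
      ((void)(sizeof((__force long)(e)))): the expression is type-checked but
      never evaluated, so a static function referenced only there is never
      called, which is why clang can report it as not needed. The sketch
      below contrasts a simplified analog (the hypothetical CHECK_ONLY()
      macro) with a run-time check in the spirit of WARN_ON_ONCE();
      model_can_free() is a hypothetical stand-in for ipmr_can_free_table().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified analog of BUILD_BUG_ON_INVALID(): type-check e, never evaluate it. */
#define CHECK_ONLY(e) ((void)sizeof((long)(e)))

static int can_free_calls;  /* how many times the predicate actually ran */

/* Hypothetical stand-in for ipmr_can_free_table(). */
static bool model_can_free(bool net_is_dying)
{
	can_free_calls++;
	return net_is_dying;
}

static void model_free_checked_at_build_time(bool dying)
{
	CHECK_ONLY(!model_can_free(dying));  /* model_can_free() is never called */
}

/* Analog of WARN_ON_ONCE(): evaluated at run time, reported at most once. */
static bool model_free_checked_at_run_time(bool dying)
{
	static bool warned;
	bool bad = !model_can_free(dying);

	if (bad && !warned) {
		warned = true;
		fprintf(stderr, "warning: table freed while still in use\n");
	}
	return bad;
}
```

      Here model_can_free() is also called by the run-time variant, so it is
      genuinely used; if CHECK_ONLY() were its only reference, it would be in
      the same position as ipmr_can_free_table() in the reported build.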
      
      Reported-by: Sasha Levin <sashal@kernel.org>
      Closes: https://qa-reports.linaro.org/lkft/sashal-linus-next/build/v6.11-25635-g6813e2326f1e/testrun/26111580/suite/build/test/clang-nightly-lkftconfig/details/
      Fixes: 11b6e701 ("ipmr: add debug check for mr table cleanup")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://patch.msgid.link/ee75faa926b2446b8302ee5fc30e129d2df73b90.1732810228.git.pabeni@redhat.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      f6d7695b
    • Pablo Neira Ayuso
      netfilter: nft_socket: remove WARN_ON_ONCE on maximum cgroup level · b7529880
      Pablo Neira Ayuso authored
      
      The cgroup maximum depth is INT_MAX by default; there is a cgroup toggle
      to restrict it to a more reasonable value so as not to harm performance.
      Remove the unnecessary WARN_ON_ONCE, which is reachable from userspace.
      
      Fixes: 7f3287db ("netfilter: nft_socket: make cgroupsv2 matching work with namespaces")
      Reported-by: syzbot+57bac0866ddd99fe47c0@syzkaller.appspotmail.com
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      b7529880
    • Dmitry Antipov
      netfilter: x_tables: fix LED ID check in led_tg_check() · 04317f4e
      Dmitry Antipov authored
      
      Syzbot has reported the following BUG detected by KASAN:
      
      BUG: KASAN: slab-out-of-bounds in strlen+0x58/0x70
      Read of size 1 at addr ffff8881022da0c8 by task repro/5879
      ...
      Call Trace:
       <TASK>
       dump_stack_lvl+0x241/0x360
       ? __pfx_dump_stack_lvl+0x10/0x10
       ? __pfx__printk+0x10/0x10
       ? _printk+0xd5/0x120
       ? __virt_addr_valid+0x183/0x530
       ? __virt_addr_valid+0x183/0x530
       print_report+0x169/0x550
       ? __virt_addr_valid+0x183/0x530
       ? __virt_addr_valid+0x183/0x530
       ? __virt_addr_valid+0x45f/0x530
       ? __phys_addr+0xba/0x170
       ? strlen+0x58/0x70
       kasan_report+0x143/0x180
       ? strlen+0x58/0x70
       strlen+0x58/0x70
       kstrdup+0x20/0x80
       led_tg_check+0x18b/0x3c0
       xt_check_target+0x3bb/0xa40
       ? __pfx_xt_check_target+0x10/0x10
       ? stack_depot_save_flags+0x6e4/0x830
       ? nft_target_init+0x174/0xc30
       nft_target_init+0x82d/0xc30
       ? __pfx_nft_target_init+0x10/0x10
       ? nf_tables_newrule+0x1609/0x2980
       ? nf_tables_newrule+0x1609/0x2980
       ? rcu_is_watching+0x15/0xb0
       ? nf_tables_newrule+0x1609/0x2980
       ? nf_tables_newrule+0x1609/0x2980
       ? __kmalloc_noprof+0x21a/0x400
       nf_tables_newrule+0x1860/0x2980
       ? __pfx_nf_tables_newrule+0x10/0x10
       ? __nla_parse+0x40/0x60
       nfnetlink_rcv+0x14e5/0x2ab0
       ? __pfx_validate_chain+0x10/0x10
       ? __pfx_nfnetlink_rcv+0x10/0x10
       ? __lock_acquire+0x1384/0x2050
       ? netlink_deliver_tap+0x2e/0x1b0
       ? __pfx_lock_release+0x10/0x10
       ? netlink_deliver_tap+0x2e/0x1b0
       netlink_unicast+0x7f8/0x990
       ? __pfx_netlink_unicast+0x10/0x10
       ? __virt_addr_valid+0x183/0x530
       ? __check_object_size+0x48e/0x900
       netlink_sendmsg+0x8e4/0xcb0
       ? __pfx_netlink_sendmsg+0x10/0x10
       ? aa_sock_msg_perm+0x91/0x160
       ? __pfx_netlink_sendmsg+0x10/0x10
       __sock_sendmsg+0x223/0x270
       ____sys_sendmsg+0x52a/0x7e0
       ? __pfx_____sys_sendmsg+0x10/0x10
       __sys_sendmsg+0x292/0x380
       ? __pfx___sys_sendmsg+0x10/0x10
       ? lockdep_hardirqs_on_prepare+0x43d/0x780
       ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
       ? exc_page_fault+0x590/0x8c0
       ? do_syscall_64+0xb6/0x230
       do_syscall_64+0xf3/0x230
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      ...
       </TASK>
      
      Since userspace may pass an invalid byte sequence (one with no '\0' byte
      at all), add an extra check to ensure that such a sequence is rejected as
      a possible ID and is therefore never passed to 'kstrdup()' or beyond.
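      The check amounts to "the fixed-size ID field must be non-empty and
      contain a NUL somewhere inside it" before any strlen()-based copy runs.
      Below is a user-space sketch; struct model_led_info, MODEL_ID_LEN and
      model_led_dup_id() are hypothetical, and malloc()+memcpy() stand in for
      kstrdup().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_ID_LEN 27  /* assumed size of the fixed-length ID field */

struct model_led_info {
	char id[MODEL_ID_LEN];  /* copied verbatim from userspace */
};

/* Returns a heap copy of the ID, or NULL with *err set on failure. */
static char *model_led_dup_id(const struct model_led_info *info, int *err)
{
	/* Reject empty IDs and IDs with no terminating NUL inside the field. */
	if (info->id[0] == '\0' || !memchr(info->id, '\0', sizeof(info->id))) {
		*err = -EINVAL;
		return NULL;
	}

	/* Safe now: strlen() cannot run past the end of info->id. */
	size_t len = strlen(info->id);
	char *copy = malloc(len + 1);

	if (!copy) {
		*err = -ENOMEM;
		return NULL;
	}
	memcpy(copy, info->id, len + 1);
	*err = 0;
	return copy;
}
```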
      
      Reported-by: syzbot+6c8215822f35fdb35667@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=6c8215822f35fdb35667
      Fixes: 268cb38e ("netfilter: x_tables: add LED trigger target")
      Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      04317f4e
    • Jinghao Jia
      ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init() · 146b6f11
      Jinghao Jia authored
      
      Under certain kernel configurations when building with Clang/LLVM, the
      compiler does not generate a return or jump as the terminator
      instruction for ip_vs_protocol_init(), triggering the following objtool
      warning during build time:
      
        vmlinux.o: warning: objtool: ip_vs_protocol_init() falls through to next function __initstub__kmod_ip_vs_rr__935_123_ip_vs_rr_init6()
      
      At runtime, this either causes an oops when trying to load the ipvs
      module or a boot-time panic if ipvs is built-in. This same issue has
      been reported by the Intel kernel test robot previously.
      
      Digging deeper into both LLVM and the kernel code reveals this to be an
      undefined behavior problem. ip_vs_protocol_init() uses an on-stack buffer
      of 64 chars to store the registered protocol names and leaves it
      uninitialized after definition. The function calls strnlen() when
      concatenating protocol names into the buffer. With CONFIG_FORTIFY_SOURCE,
      strnlen() performs an extra step to check whether the last byte of the
      input char buffer is a null character (commit 3009f891 ("fortify:
      Allow strlen() and strnlen() to pass compile-time known lengths")).
      This, together with possibly other configurations, causes the following
      IR to be generated:
      
        define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #5 section ".init.text" align 16 !kcfi_type !29 {
          %1 = alloca [64 x i8], align 16
          ...
      
        14:                                               ; preds = %11
          %15 = getelementptr inbounds i8, ptr %1, i64 63
          %16 = load i8, ptr %15, align 1
          %17 = tail call i1 @llvm.is.constant.i8(i8 %16)
          %18 = icmp eq i8 %16, 0
          %19 = select i1 %17, i1 %18, i1 false
          br i1 %19, label %20, label %23
      
        20:                                               ; preds = %14
          %21 = call i64 @strlen(ptr noundef nonnull dereferenceable(1) %1) #23
          ...
      
        23:                                               ; preds = %14, %11, %20
          %24 = call i64 @strnlen(ptr noundef nonnull dereferenceable(1) %1, i64 noundef 64) #24
          ...
        }
      
      The above code calculates the address of the last char in the buffer
      (value %15) and then loads from it (value %16). Because the buffer is
      never initialized, the LLVM GVN pass marks value %16 as undefined:
      
        %13 = getelementptr inbounds i8, ptr %1, i64 63
        br i1 undef, label %14, label %17
      
      This gives later passes (SCCP, in particular) more DCE opportunities by
      propagating the undef value further, and eventually removes everything
      after the load on the uninitialized stack location:
      
        define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #0 section ".init.text" align 16 !kcfi_type !11 {
          %1 = alloca [64 x i8], align 16
          ...
      
        12:                                               ; preds = %11
          %13 = getelementptr inbounds i8, ptr %1, i64 63
          unreachable
        }
      
      In this way, the generated native code will just fall through to the
      next function, as LLVM does not generate any code for the unreachable IR
      instruction and leaves the function without a terminator.
      
      Zero the on-stack buffer to avoid this possible UB.
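      The shape of the fix is to give every byte of the stack buffer a defined
      value at its definition, so that any later read, including a fortified
      look at its last byte, is well defined. The user-space sketch below uses
      a hypothetical format_protocols() helper and its own snprintf()-based
      concatenation rather than the kernel's string helpers; only the
      "= { 0 }" initializer corresponds to the described change.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Joins names as "A, B, C" into out, via a zero-initialized stack buffer. */
static void format_protocols(const char *const names[], size_t n,
			     char *out, size_t out_len)
{
	char protocols[64] = { 0 };  /* the fix: no uninitialized bytes */

	for (size_t i = 0; i < n; i++) {
		/* Always NUL-terminated, so strlen() stays inside the buffer. */
		size_t used = strlen(protocols);

		snprintf(protocols + used, sizeof(protocols) - used, "%s%s",
			 used ? ", " : "", names[i]);
	}
	snprintf(out, out_len, "%s", protocols);
}
```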
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202402100205.PWXIz1ZK-lkp@intel.com/
      Co-developed-by: Ruowen Qin <ruqin@redhat.com>
      Signed-off-by: Ruowen Qin <ruqin@redhat.com>
      Signed-off-by: Jinghao Jia <jinghao7@illinois.edu>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      146b6f11
    • Paolo Abeni
      ipmr: fix tables suspicious RCU usage · fc9c273d
      Paolo Abeni authored
      
      Similar to the previous patch, plumb the RCU lock inside
      ipmr_get_table(), provide a lockless variant, and apply
      the latter in the few spots where the lock is already held.
      
      Fixes: 709b46e8 ("net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT")
      Fixes: f0ad0860 ("ipv4: ipmr: support multiple tables")
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      fc9c273d
    • Paolo Abeni
      ip6mr: fix tables suspicious RCU usage · f1553c98
      Paolo Abeni authored
      
      Several places call ip6mr_get_table() with neither the RCU nor the RTNL
      lock held. Add RCU protection inside that helper and provide a lockless
      variant for the few callers that already acquired the relevant lock.
      
      Note that some users additionally reference the table outside the RCU
      lock. That is actually safe as the table deletion can happen only
      after all table accesses are completed.
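      The "locked helper plus lockless variant" split can be sketched in user
      space. Real RCU is not available here, so a hypothetical read-lock depth
      counter stands in for rcu_read_lock()/RTNL, and the assert() in the
      lockless variant plays the role of a lockdep check. All names below are
      invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock-state tracking standing in for RCU/RTNL. */
static int read_lock_depth;

static void model_read_lock(void)   { read_lock_depth++; }
static void model_read_unlock(void) { read_lock_depth--; }

struct model_table { unsigned int id; };

static struct model_table tables[] = { { 0 }, { 254 } };

/* Lockless variant: the caller must already hold the read-side lock. */
static struct model_table *model_get_table_nolock(unsigned int id)
{
	assert(read_lock_depth > 0);  /* analog of a lockdep assertion */
	for (size_t i = 0; i < sizeof(tables) / sizeof(tables[0]); i++)
		if (tables[i].id == id)
			return &tables[i];
	return NULL;
}

/* Locked variant: safe for callers that hold no lock at all. */
static struct model_table *model_get_table(unsigned int id)
{
	struct model_table *t;

	model_read_lock();
	t = model_get_table_nolock(id);
	model_read_unlock();
	/*
	 * t escapes the critical section; as the message above explains, this
	 * is safe only because tables are freed after all accesses complete.
	 */
	return t;
}
```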
      
      Fixes: e2d57766 ("net: Provide compat support for SIOCGETMIFCNT_IN6 and SIOCGETSGCNT_IN6.")
      Fixes: d7c31cbd ("net: ip6mr: add RTM_GETROUTE netlink op")
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      f1553c98
    • Paolo Abeni
      ipmr: add debug check for mr table cleanup · 11b6e701
      Paolo Abeni authored
      
      The lifecycle of the multicast route tables, for both ipv4 and ipv6, is
      protected by RCU using the RTNL lock for write access. In many
      places a table pointer escapes the RCU (or RTNL) protected critical
      section, but such scenarios are actually safe because tables are
      deleted only at namespace cleanup time or just after allocation, in
      case of default rule creation failure.
      
      Tables freed at namespace cleanup time are assured to be alive for the
      whole netns lifetime; tables freed just after creation time are never
      exposed to other possible users.
      
      Check that the free conditions are respected in ip{,6}mr_free_table(),
      both to document the locking scheme and to prevent a possible future
      'table del' operation from breaking it.
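      The free condition described above can be modelled as a predicate over
      the owning namespace's lifecycle: freeing is legal while the namespace
      is being torn down, or while it is still being set up (the
      default-rule failure path). The enum, predicate and warning counter
      below are hypothetical; the kernel checks namespace state through its
      own helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lifecycle states of the namespace that owns a table. */
enum model_net_state { NET_INITIALIZING, NET_LIVE, NET_DYING };

/* Freeing is allowed only when no other user can hold the table. */
static bool model_can_free_table(enum model_net_state state)
{
	return state == NET_DYING || state == NET_INITIALIZING;
}

static int free_warnings;  /* counts violated invariants */

static void model_free_table(enum model_net_state state)
{
	if (!model_can_free_table(state)) {
		free_warnings++;
		fprintf(stderr, "warning: freeing mr table of a live netns\n");
	}
	/* ... the table would actually be freed here ... */
}
```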
      
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      11b6e701
    • Jakub Kicinski
      net_sched: sch_fq: don't follow the fast path if Tx is behind now · 122aba8c
      Jakub Kicinski authored and Paolo Abeni committed
      
      Recent kernels cause a lot of TCP retransmissions:
      
      [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
      [  5]   0.00-1.00   sec  2.24 GBytes  19.2 Gbits/sec  2767    442 KBytes
      [  5]   1.00-2.00   sec  2.23 GBytes  19.1 Gbits/sec  2312    350 KBytes
                                                            ^^^^
      
      Replacing the qdisc with pfifo makes retransmissions go away.
      
      It appears that a flow may have a delayed packet with a very near
      Tx time. Later, we may get busy processing Rx and the target Tx time
      will pass, but we won't service Tx since the CPU is busy with Rx.
      If Rx sees an ACK and we try to push more data for the delayed flow
      we may fastpath the skb, not realizing that there are already "ready
      to send" packets for this flow sitting in the qdisc.
      
      Don't trust the fastpath if we are "behind" according to the projected
      Tx time for the next flow waiting in the Qdisc. Because we consider anything
      within the offload window to be okay for fastpath we must consider
      the entire offload window as "now".
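      The rule can be written as a single comparison: treat the whole
      offload window as "now", and refuse the fast path if the next delayed
      flow is already due within it. The sketch below is a user-space model;
      struct model_fq and its two fields are assumed names for the
      qdisc state involved, not a copy of the patched function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the fq qdisc state relevant to the fast path. */
struct model_fq {
	uint64_t time_next_delayed_flow; /* ns: earliest Tx time of a waiting flow */
	uint64_t offload_horizon;        /* ns: window the NIC may schedule itself */
};

/* Returns true if an arriving packet may bypass the per-flow queues. */
static bool model_fastpath_ok(const struct model_fq *q, uint64_t now)
{
	/*
	 * Everything inside the offload window counts as "now": if a delayed
	 * flow is due by then, ready packets are already queued and the fast
	 * path must not jump ahead of them.
	 */
	if (q->time_next_delayed_flow <= now + q->offload_horizon)
		return false;
	/* ... the other fast-path conditions would be checked here ... */
	return true;
}
```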
      
      Qdisc config:
      
      qdisc fq 8001: dev eth0 parent 1234:1 limit 10000p flow_limit 100p \
        buckets 32768 orphan_mask 1023 bands 3 \
        priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 \
        weights 589824 196608 65536 quantum 3028b initial_quantum 15140b \
        low_rate_threshold 550Kbit \
        refill_delay 40ms timer_slack 10us horizon 10s horizon_drop
      
      For iperf this change seems to do fine, the reordering is gone.
      The fastpath still gets used most of the time:
      
        gc 0 highprio 0 fastpath 142614 throttled 418309 latency 19.1us
         xx_behind 2731
      
      where "xx_behind" counts how many times we hit the new "return false".
      
      CC: stable@vger.kernel.org
      Fixes: 076433bd ("net_sched: sch_fq: add fast path for mostly idle qdisc")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20241124022148.3126719-1-kuba@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      122aba8c
    • Kuniyuki Iwashima
      tcp: Fix use-after-free of nreq in reqsk_timer_handler(). · c31e72d0
      Kuniyuki Iwashima authored and Paolo Abeni committed
      
      The cited commit replaced inet_csk_reqsk_queue_drop_and_put() with
      __inet_csk_reqsk_queue_drop() and reqsk_put() in reqsk_timer_handler().
      
      With that change, oreq should be passed to reqsk_put() instead of req;
      otherwise, a use-after-free of nreq can happen when the reqsk is
      migrated but the retry attempt fails (e.g. due to timeout).
      
      Let's pass oreq to reqsk_put().
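      In refcount terms, the failure path must drop exactly one reference on
      each object: the one owned for nreq and the timer's reference on oreq.
      Putting nreq a second time instead of oreq is the use-after-free. The
      sketch below models this with a hypothetical struct model_req and
      helpers; the "freed" flag stands in for returning memory to the slab.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request object with a refcount and a "freed" marker. */
struct model_req {
	int refcnt;
	bool freed;
};

static void model_reqsk_put(struct model_req *r)
{
	assert(!r->freed);        /* a put on freed memory is the UAF */
	if (--r->refcnt == 0)
		r->freed = true;  /* stand-in for freeing the request */
}

/*
 * Failure path after oreq was migrated to nreq: nreq is dropped through
 * its own reference, and the timer's reference is dropped on oreq.
 */
static void model_timer_failure_path(struct model_req *oreq,
				     struct model_req *nreq)
{
	model_reqsk_put(nreq);
	model_reqsk_put(oreq);    /* the fix: pass oreq, not the migrated req */
}
```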
      
      Fixes: e8c526f2 ("tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().")
      Reported-by: Liu Jian <liujian56@huawei.com>
      Closes: https://lore.kernel.org/netdev/1284490f-9525-42ee-b7b8-ccadf6606f6d@huawei.com/
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
      Reviewed-by: Liu Jian <liujian56@huawei.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
      Link: https://patch.msgid.link/20241123174236.62438-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      c31e72d0
    • Michal Luczaj
      rxrpc: Improve setsockopt() handling of malformed user input · 02020056
      Michal Luczaj authored and Paolo Abeni committed
      
      copy_from_sockptr() does not return a negative value on error; instead,
      it reports the number of bytes that failed to copy. Since it's deprecated,
      switch to copy_safe_from_sockptr().

      Note: the `optlen != sizeof(unsigned int)` check is kept, because
      copy_safe_from_sockptr() by itself would also accept
      optlen > sizeof(unsigned int), which would allow more lenient handling
      of inputs.
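      As I understand it, copy_safe_from_sockptr(dst, ksize, optval, optlen)
      returns -EINVAL when optlen < ksize, -EFAULT when the user copy fails,
      and 0 otherwise, copying exactly ksize bytes. The user-space model below
      mirrors those semantics with memcpy() (which cannot fault here) and
      shows why the strict length check is still kept; both function names
      are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Model of copy_safe_from_sockptr(): copy exactly ksize bytes, reject short input. */
static int model_copy_safe(void *dst, size_t ksize, const void *src,
			   unsigned int optlen)
{
	if (optlen < ksize)
		return -EINVAL;
	memcpy(dst, src, ksize);  /* the kernel's user copy could return -EFAULT */
	return 0;
}

/* Sketch of a setsockopt() handler for an unsigned int option. */
static int model_setsockopt_uint(unsigned int *opt, const void *optval,
				 unsigned int optlen)
{
	unsigned int val;
	int ret;

	/* Strict check kept: the copy helper alone would accept longer input. */
	if (optlen != sizeof(unsigned int))
		return -EINVAL;

	ret = model_copy_safe(&val, sizeof(val), optval, optlen);
	if (ret)
		return ret;  /* a negative errno, not a count of uncopied bytes */

	*opt = val;
	return 0;
}
```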
      
      Fixes: 17926a79 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
      Signed-off-by: Michal Luczaj <mhal@rbox.co>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      02020056
    • Michal Luczaj
      llc: Improve setsockopt() handling of malformed user input · 1465036b
      Michal Luczaj authored and Paolo Abeni committed
      
      copy_from_sockptr() is used incorrectly: its return value is the number
      of bytes that could not be copied. Since it's deprecated, switch to
      copy_safe_from_sockptr().

      Note: the `optlen != sizeof(int)` check is kept, because
      copy_safe_from_sockptr() by itself would also accept optlen > sizeof(int),
      which would allow more lenient handling of inputs.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Suggested-by: David Wei <dw@davidwei.uk>
      Signed-off-by: Michal Luczaj <mhal@rbox.co>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      1465036b