Skip to content
Snippets Groups Projects
  1. Oct 22, 2022
  2. Oct 20, 2022
    • Eric Dumazet's avatar
      net: sched: fix race condition in qdisc_graft() · ebda44da
      Eric Dumazet authored
      
      We had one syzbot report [1] in syzbot queue for a while.
      I was waiting for more occurrences and/or a repro but
      Dmitry Vyukov spotted the issue right away.
      
      <quoting Dmitry>
      qdisc_graft() drops reference to qdisc in notify_and_destroy
      while it's still assigned to dev->qdisc
      </quoting>
      
      Indeed, RCU rules are clear when replacing a data structure.
      The visible pointer (dev->qdisc in this case) must be updated
      to the new object _before_ RCU grace period is started
      (qdisc_put(old) in this case).
      
      [1]
      BUG: KASAN: use-after-free in __tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
      Read of size 4 at addr ffff88802065e038 by task syz-executor.4/21027
      
      CPU: 0 PID: 21027 Comm: syz-executor.4 Not tainted 6.0.0-rc3-syzkaller-00363-g7726d4c3e60b #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
      print_address_description mm/kasan/report.c:317 [inline]
      print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
      kasan_report+0xb1/0x1e0 mm/kasan/report.c:495
      __tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
      __tcf_qdisc_find net/sched/cls_api.c:1051 [inline]
      tc_new_tfilter+0x34f/0x2200 net/sched/cls_api.c:2018
      rtnetlink_rcv_msg+0x955/0xca0 net/core/rtnetlink.c:6081
      netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
      netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
      netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
      netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg+0xcf/0x120 net/socket.c:734
      ____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
      ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
      __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7f5efaa89279
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f5efbc31168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f5efab9bf80 RCX: 00007f5efaa89279
      RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000005
      RBP: 00007f5efaae32e9 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007f5efb0cfb1f R14: 00007f5efbc31300 R15: 0000000000022000
      </TASK>
      
      Allocated by task 21027:
      kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
      kasan_set_track mm/kasan/common.c:45 [inline]
      set_alloc_info mm/kasan/common.c:437 [inline]
      ____kasan_kmalloc mm/kasan/common.c:516 [inline]
      ____kasan_kmalloc mm/kasan/common.c:475 [inline]
      __kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
      kmalloc_node include/linux/slab.h:623 [inline]
      kzalloc_node include/linux/slab.h:744 [inline]
      qdisc_alloc+0xb0/0xc50 net/sched/sch_generic.c:938
      qdisc_create_dflt+0x71/0x4a0 net/sched/sch_generic.c:997
      attach_one_default_qdisc net/sched/sch_generic.c:1152 [inline]
      netdev_for_each_tx_queue include/linux/netdevice.h:2437 [inline]
      attach_default_qdiscs net/sched/sch_generic.c:1170 [inline]
      dev_activate+0x760/0xcd0 net/sched/sch_generic.c:1229
      __dev_open+0x393/0x4d0 net/core/dev.c:1441
      __dev_change_flags+0x583/0x750 net/core/dev.c:8556
      rtnl_configure_link+0xee/0x240 net/core/rtnetlink.c:3189
      rtnl_newlink_create net/core/rtnetlink.c:3371 [inline]
      __rtnl_newlink+0x10b8/0x17e0 net/core/rtnetlink.c:3580
      rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3593
      rtnetlink_rcv_msg+0x43a/0xca0 net/core/rtnetlink.c:6090
      netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
      netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
      netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
      netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg+0xcf/0x120 net/socket.c:734
      ____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
      ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
      __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Freed by task 21020:
      kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
      kasan_set_track+0x21/0x30 mm/kasan/common.c:45
      kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
      ____kasan_slab_free mm/kasan/common.c:367 [inline]
      ____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
      kasan_slab_free include/linux/kasan.h:200 [inline]
      slab_free_hook mm/slub.c:1754 [inline]
      slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1780
      slab_free mm/slub.c:3534 [inline]
      kfree+0xe2/0x580 mm/slub.c:4562
      rcu_do_batch kernel/rcu/tree.c:2245 [inline]
      rcu_core+0x7b5/0x1890 kernel/rcu/tree.c:2505
      __do_softirq+0x1d3/0x9c6 kernel/softirq.c:571
      
      Last potentially related work creation:
      kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
      __kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
      call_rcu+0x99/0x790 kernel/rcu/tree.c:2793
      qdisc_put+0xcd/0xe0 net/sched/sch_generic.c:1083
      notify_and_destroy net/sched/sch_api.c:1012 [inline]
      qdisc_graft+0xeb1/0x1270 net/sched/sch_api.c:1084
      tc_modify_qdisc+0xbb7/0x1a00 net/sched/sch_api.c:1671
      rtnetlink_rcv_msg+0x43a/0xca0 net/core/rtnetlink.c:6090
      netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
      netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
      netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
      netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg+0xcf/0x120 net/socket.c:734
      ____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
      ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
      __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Second to last potentially related work creation:
      kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
      __kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
      kvfree_call_rcu+0x74/0x940 kernel/rcu/tree.c:3322
      neigh_destroy+0x431/0x630 net/core/neighbour.c:912
      neigh_release include/net/neighbour.h:454 [inline]
      neigh_cleanup_and_release+0x1f8/0x330 net/core/neighbour.c:103
      neigh_del net/core/neighbour.c:225 [inline]
      neigh_remove_one+0x37d/0x460 net/core/neighbour.c:246
      neigh_forced_gc net/core/neighbour.c:276 [inline]
      neigh_alloc net/core/neighbour.c:447 [inline]
      ___neigh_create+0x18b5/0x29a0 net/core/neighbour.c:642
      ip6_finish_output2+0xfb8/0x1520 net/ipv6/ip6_output.c:125
      __ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
      ip6_finish_output+0x690/0x1160 net/ipv6/ip6_output.c:206
      NF_HOOK_COND include/linux/netfilter.h:296 [inline]
      ip6_output+0x1ed/0x540 net/ipv6/ip6_output.c:227
      dst_output include/net/dst.h:451 [inline]
      NF_HOOK include/linux/netfilter.h:307 [inline]
      NF_HOOK include/linux/netfilter.h:301 [inline]
      mld_sendpack+0xa09/0xe70 net/ipv6/mcast.c:1820
      mld_send_cr net/ipv6/mcast.c:2121 [inline]
      mld_ifc_work+0x71c/0xdc0 net/ipv6/mcast.c:2653
      process_one_work+0x991/0x1610 kernel/workqueue.c:2289
      worker_thread+0x665/0x1080 kernel/workqueue.c:2436
      kthread+0x2e4/0x3a0 kernel/kthread.c:376
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
      
      The buggy address belongs to the object at ffff88802065e000
      which belongs to the cache kmalloc-1k of size 1024
      The buggy address is located 56 bytes inside of
      1024-byte region [ffff88802065e000, ffff88802065e400)
      
      The buggy address belongs to the physical page:
      page:ffffea0000819600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20658
      head:ffffea0000819600 order:3 compound_mapcount:0 compound_pincount:0
      flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000010200 0000000000000000 dead000000000001 ffff888011841dc0
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 3523, tgid 3523 (sshd), ts 41495190986, free_ts 41417713212
      prep_new_page mm/page_alloc.c:2532 [inline]
      get_page_from_freelist+0x109b/0x2ce0 mm/page_alloc.c:4283
      __alloc_pages+0x1c7/0x510 mm/page_alloc.c:5515
      alloc_pages+0x1a6/0x270 mm/mempolicy.c:2270
      alloc_slab_page mm/slub.c:1824 [inline]
      allocate_slab+0x27e/0x3d0 mm/slub.c:1969
      new_slab mm/slub.c:2029 [inline]
      ___slab_alloc+0x7f1/0xe10 mm/slub.c:3031
      __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3118
      slab_alloc_node mm/slub.c:3209 [inline]
      __kmalloc_node_track_caller+0x2f2/0x380 mm/slub.c:4955
      kmalloc_reserve net/core/skbuff.c:358 [inline]
      __alloc_skb+0xd9/0x2f0 net/core/skbuff.c:430
      alloc_skb_fclone include/linux/skbuff.h:1307 [inline]
      tcp_stream_alloc_skb+0x38/0x580 net/ipv4/tcp.c:861
      tcp_sendmsg_locked+0xc36/0x2f80 net/ipv4/tcp.c:1325
      tcp_sendmsg+0x2b/0x40 net/ipv4/tcp.c:1483
      inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg+0xcf/0x120 net/socket.c:734
      sock_write_iter+0x291/0x3d0 net/socket.c:1108
      call_write_iter include/linux/fs.h:2187 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x9e9/0xdd0 fs/read_write.c:578
      ksys_write+0x1e8/0x250 fs/read_write.c:631
      page last free stack trace:
      reset_page_owner include/linux/page_owner.h:24 [inline]
      free_pages_prepare mm/page_alloc.c:1449 [inline]
      free_pcp_prepare+0x5e4/0xd20 mm/page_alloc.c:1499
      free_unref_page_prepare mm/page_alloc.c:3380 [inline]
      free_unref_page+0x19/0x4d0 mm/page_alloc.c:3476
      __unfreeze_partials+0x17c/0x1a0 mm/slub.c:2548
      qlink_free mm/kasan/quarantine.c:168 [inline]
      qlist_free_all+0x6a/0x170 mm/kasan/quarantine.c:187
      kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:294
      __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:447
      kasan_slab_alloc include/linux/kasan.h:224 [inline]
      slab_post_alloc_hook mm/slab.h:727 [inline]
      slab_alloc_node mm/slub.c:3243 [inline]
      slab_alloc mm/slub.c:3251 [inline]
      __kmem_cache_alloc_lru mm/slub.c:3258 [inline]
      kmem_cache_alloc+0x267/0x3b0 mm/slub.c:3268
      kmem_cache_zalloc include/linux/slab.h:723 [inline]
      alloc_buffer_head+0x20/0x140 fs/buffer.c:2974
      alloc_page_buffers+0x280/0x790 fs/buffer.c:829
      create_empty_buffers+0x2c/0xee0 fs/buffer.c:1558
      ext4_block_write_begin+0x1004/0x1530 fs/ext4/inode.c:1074
      ext4_da_write_begin+0x422/0xae0 fs/ext4/inode.c:2996
      generic_perform_write+0x246/0x560 mm/filemap.c:3738
      ext4_buffered_write_iter+0x15b/0x460 fs/ext4/file.c:270
      ext4_file_write_iter+0x44a/0x1660 fs/ext4/file.c:679
      call_write_iter include/linux/fs.h:2187 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x9e9/0xdd0 fs/read_write.c:578
      
      Fixes: af356afa ("net_sched: reintroduce dev->qdisc for use by sch_api")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Diagnosed-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221018203258.2793282-1-edumazet@google.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ebda44da
  3. Oct 19, 2022
    • Paul Blakey's avatar
      net: Fix return value of qdisc ingress handling on success · 672e97ef
      Paul Blakey authored
      
      Currently qdisc ingress handling (sch_handle_ingress()) doesn't
      set a return value and it is left to the old return value of
      the caller (__netif_receive_skb_core()) which is RX drop, so if
      the packet is consumed, caller will stop and return this value
      as if the packet was dropped.
      
      This causes a problem in the kernel tcp stack when having a
      egress tc rule forwarding to a ingress tc rule.
      The tcp stack sending packets on the device having the egress rule
      will see the packets as not successfully transmitted (although they
      actually were), will not advance it's internal state of sent data,
      and packets returning on such tcp stream will be dropped by the tcp
      stack with reason ack-of-unsent-data. See reproduction in [0] below.
      
      Fix that by setting the return value to RX success if
      the packet was handled successfully.
      
      [0] Reproduction steps:
       $ ip link add veth1 type veth peer name peer1
       $ ip link add veth2 type veth peer name peer2
       $ ifconfig peer1 5.5.5.6/24 up
       $ ip netns add ns0
       $ ip link set dev peer2 netns ns0
       $ ip netns exec ns0 ifconfig peer2 5.5.5.5/24 up
       $ ifconfig veth2 0 up
       $ ifconfig veth1 0 up
      
       #ingress forwarding veth1 <-> veth2
       $ tc qdisc add dev veth2 ingress
       $ tc qdisc add dev veth1 ingress
       $ tc filter add dev veth2 ingress prio 1 proto all flower \
         action mirred egress redirect dev veth1
       $ tc filter add dev veth1 ingress prio 1 proto all flower \
         action mirred egress redirect dev veth2
      
       #steal packet from peer1 egress to veth2 ingress, bypassing the veth pipe
       $ tc qdisc add dev peer1 clsact
       $ tc filter add dev peer1 egress prio 20 proto ip flower \
         action mirred ingress redirect dev veth1
      
       #run iperf and see connection not running
       $ iperf3 -s&
       $ ip netns exec ns0 iperf3 -c 5.5.5.6 -i 1
      
       #delete egress rule, and run again, now should work
       $ tc filter del dev peer1 egress
       $ ip netns exec ns0 iperf3 -c 5.5.5.6 -i 1
      
      Fixes: f697c3e8 ("[NET]: Avoid unnecessary cloning for ingress filtering")
      Signed-off-by: default avatarPaul Blakey <paulb@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      672e97ef
    • Zhengchao Shao's avatar
      net: sched: sfb: fix null pointer access issue when sfb_init() fails · 2a3fc782
      Zhengchao Shao authored
      
      When the default qdisc is sfb, if the qdisc of dev_queue fails to be
      inited during mqprio_init(), sfb_reset() is invoked to clear resources.
      In this case, the q->qdisc is NULL, and it will cause gpf issue.
      
      The process is as follows:
      qdisc_create_dflt()
      	sfb_init()
      		tcf_block_get()          --->failed, q->qdisc is NULL
      	...
      	qdisc_put()
      		...
      		sfb_reset()
      			qdisc_reset(q->qdisc)    --->q->qdisc is NULL
      				ops = qdisc->ops
      
      The following is the Call Trace information:
      general protection fault, probably for non-canonical address
      0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
      RIP: 0010:qdisc_reset+0x2b/0x6f0
      Call Trace:
      <TASK>
      sfb_reset+0x37/0xd0
      qdisc_reset+0xed/0x6f0
      qdisc_destroy+0x82/0x4c0
      qdisc_put+0x9e/0xb0
      qdisc_create_dflt+0x2c3/0x4a0
      mqprio_init+0xa71/0x1760
      qdisc_create+0x3eb/0x1000
      tc_modify_qdisc+0x408/0x1720
      rtnetlink_rcv_msg+0x38e/0xac0
      netlink_rcv_skb+0x12d/0x3a0
      netlink_unicast+0x4a2/0x740
      netlink_sendmsg+0x826/0xcc0
      sock_sendmsg+0xc5/0x100
      ____sys_sendmsg+0x583/0x690
      ___sys_sendmsg+0xe8/0x160
      __sys_sendmsg+0xbf/0x160
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f2164122d04
      </TASK>
      
      Fixes: e13e02a3 ("net_sched: SFB flow scheduler")
      Signed-off-by: default avatarZhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2a3fc782
    • Zhengchao Shao's avatar
      Revert "net: sched: fq_codel: remove redundant resource cleanup in fq_codel_init()" · f5ffa3b1
      Zhengchao Shao authored
      
      This reverts commit 494f5063.
      
      When the default qdisc is fq_codel, if the qdisc of dev_queue fails to be
      inited during mqprio_init(), fq_codel_reset() is invoked to clear
      resources. In this case, the flow is NULL, and it will cause gpf issue.
      
      The process is as follows:
      qdisc_create_dflt()
      	fq_codel_init()
      		...
      		q->flows_cnt = 1024;
      		...
      		q->flows = kvcalloc(...)      --->failed, q->flows is NULL
      	...
      	qdisc_put()
      		...
      		fq_codel_reset()
      			...
      			flow = q->flows + i   --->q->flows is NULL
      
      The following is the Call Trace information:
      general protection fault, probably for non-canonical address
      0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
      RIP: 0010:fq_codel_reset+0x14d/0x350
      Call Trace:
      <TASK>
      qdisc_reset+0xed/0x6f0
      qdisc_destroy+0x82/0x4c0
      qdisc_put+0x9e/0xb0
      qdisc_create_dflt+0x2c3/0x4a0
      mqprio_init+0xa71/0x1760
      qdisc_create+0x3eb/0x1000
      tc_modify_qdisc+0x408/0x1720
      rtnetlink_rcv_msg+0x38e/0xac0
      netlink_rcv_skb+0x12d/0x3a0
      netlink_unicast+0x4a2/0x740
      netlink_sendmsg+0x826/0xcc0
      sock_sendmsg+0xc5/0x100
      ____sys_sendmsg+0x583/0x690
      ___sys_sendmsg+0xe8/0x160
      __sys_sendmsg+0xbf/0x160
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7fd272b22d04
      </TASK>
      
      Signed-off-by: default avatarZhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f5ffa3b1
    • Zhengchao Shao's avatar
      net: sched: cake: fix null pointer access issue when cake_init() fails · 51f9a892
      Zhengchao Shao authored
      
      When the default qdisc is cake, if the qdisc of dev_queue fails to be
      inited during mqprio_init(), cake_reset() is invoked to clear
      resources. In this case, the tins is NULL, and it will cause gpf issue.
      
      The process is as follows:
      qdisc_create_dflt()
      	cake_init()
      		q->tins = kvcalloc(...)        --->failed, q->tins is NULL
      	...
      	qdisc_put()
      		...
      		cake_reset()
      			...
      			cake_dequeue_one()
      				b = &q->tins[...]   --->q->tins is NULL
      
      The following is the Call Trace information:
      general protection fault, probably for non-canonical address
      0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
      RIP: 0010:cake_dequeue_one+0xc9/0x3c0
      Call Trace:
      <TASK>
      cake_reset+0xb1/0x140
      qdisc_reset+0xed/0x6f0
      qdisc_destroy+0x82/0x4c0
      qdisc_put+0x9e/0xb0
      qdisc_create_dflt+0x2c3/0x4a0
      mqprio_init+0xa71/0x1760
      qdisc_create+0x3eb/0x1000
      tc_modify_qdisc+0x408/0x1720
      rtnetlink_rcv_msg+0x38e/0xac0
      netlink_rcv_skb+0x12d/0x3a0
      netlink_unicast+0x4a2/0x740
      netlink_sendmsg+0x826/0xcc0
      sock_sendmsg+0xc5/0x100
      ____sys_sendmsg+0x583/0x690
      ___sys_sendmsg+0xe8/0x160
      __sys_sendmsg+0xbf/0x160
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f89e5122d04
      </TASK>
      
      Fixes: 046f6fd5 ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
      Signed-off-by: default avatarZhengchao Shao <shaozhengchao@huawei.com>
      Acked-by: default avatarToke Høiland-Jørgensen <toke@toke.dk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      51f9a892
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements · 96df8360
      Pablo Neira Ayuso authored
      
      Otherwise EINVAL is bogusly reported to userspace when deleting a set
      element. NFTA_SET_ELEM_KEY_END does not need to be set in case of:
      
      - insertion: if not present, start key is used as end key.
      - deletion: only start key needs to be specified, end key is ignored.
      
      Hence, relax the sanity check.
      
      Fixes: 88cccd90 ("netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      96df8360
    • Guillaume Nault's avatar
      netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces. · 1fcc064b
      Guillaume Nault authored
      
      Currently netfilter's rpfilter and fib modules implicitely initialise
      ->flowic_uid with 0. This is normally the root UID. However, this isn't
      the case in user namespaces, where user ID 0 is mapped to a different
      kernel UID. By initialising ->flowic_uid with sock_net_uid(), we get
      the root UID of the user namespace, thus keeping the same behaviour
      whether or not we're running in a user namepspace.
      
      Note, this is similar to commit 8bcfd092 ("ipv4: add missing
      initialization for flowi4_uid"), which fixed the rp_filter sysctl.
      
      Fixes: 622ec2c9 ("net: core: add UID to flows, rules, and routes")
      Signed-off-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      1fcc064b
    • Eric Dumazet's avatar
      net: hsr: avoid possible NULL deref in skb_clone() · d8b57135
      Eric Dumazet authored
      
      syzbot got a crash [1] in skb_clone(), caused by a bug
      in hsr_get_untagged_frame().
      
      When/if create_stripped_skb_hsr() returns NULL, we must
      not attempt to call skb_clone().
      
      While we are at it, replace a WARN_ONCE() by netdev_warn_once().
      
      [1]
      general protection fault, probably for non-canonical address 0xdffffc000000000f: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000078-0x000000000000007f]
      CPU: 1 PID: 754 Comm: syz-executor.0 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
      RIP: 0010:skb_clone+0x108/0x3c0 net/core/skbuff.c:1641
      Code: 93 02 00 00 49 83 7c 24 28 00 0f 85 e9 00 00 00 e8 5d 4a 29 fa 4c 8d 75 7e 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <0f> b6 04 02 4c 89 f2 83 e2 07 38 d0 7f 08 84 c0 0f 85 9e 01 00 00
      RSP: 0018:ffffc90003ccf4e0 EFLAGS: 00010207
      
      RAX: dffffc0000000000 RBX: ffffc90003ccf5f8 RCX: ffffc9000c24b000
      RDX: 000000000000000f RSI: ffffffff8751cb13 RDI: 0000000000000000
      RBP: 0000000000000000 R08: 00000000000000f0 R09: 0000000000000140
      R10: fffffbfff181d972 R11: 0000000000000000 R12: ffff888161fc3640
      R13: 0000000000000a20 R14: 000000000000007e R15: ffffffff8dc5f620
      FS: 00007feb621e4700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007feb621e3ff8 CR3: 00000001643a9000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <TASK>
      hsr_get_untagged_frame+0x4e/0x610 net/hsr/hsr_forward.c:164
      hsr_forward_do net/hsr/hsr_forward.c:461 [inline]
      hsr_forward_skb+0xcca/0x1d50 net/hsr/hsr_forward.c:623
      hsr_handle_frame+0x588/0x7c0 net/hsr/hsr_slave.c:69
      __netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
      __netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
      __netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
      netif_receive_skb_internal net/core/dev.c:5685 [inline]
      netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
      tun_rx_batched+0x4ab/0x7a0 drivers/net/tun.c:1544
      tun_get_user+0x2686/0x3a00 drivers/net/tun.c:1995
      tun_chr_write_iter+0xdb/0x200 drivers/net/tun.c:2025
      call_write_iter include/linux/fs.h:2187 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x9e9/0xdd0 fs/read_write.c:584
      ksys_write+0x127/0x250 fs/read_write.c:637
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: f266a683 ("net/hsr: Better frame dispatch")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221017165928.2150130-1-edumazet@google.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d8b57135
  4. Oct 18, 2022
    • Zhengchao Shao's avatar
      ip6mr: fix UAF issue in ip6mr_sk_done() when addrconf_init_net() failed · 1ca69520
      Zhengchao Shao authored and Paolo Abeni's avatar Paolo Abeni committed
      
      If the initialization fails in calling addrconf_init_net(), devconf_all is
      the pointer that has been released. Then ip6mr_sk_done() is called to
      release the net, accessing devconf->mc_forwarding directly causes invalid
      pointer access.
      
      The process is as follows:
      setup_net()
      	ops_init()
      		addrconf_init_net()
      		all = kmemdup(...)           ---> alloc "all"
      		...
      		net->ipv6.devconf_all = all;
      		__addrconf_sysctl_register() ---> failed
      		...
      		kfree(all);                  ---> ipv6.devconf_all invalid
      		...
      	ops_exit_list()
      		...
      		ip6mr_sk_done()
      			devconf = net->ipv6.devconf_all;
      			//devconf is invalid pointer
      			if (!devconf || !atomic_read(&devconf->mc_forwarding))
      
      The following is the Call Trace information:
      BUG: KASAN: use-after-free in ip6mr_sk_done+0x112/0x3a0
      Read of size 4 at addr ffff888075508e88 by task ip/14554
      Call Trace:
      <TASK>
      dump_stack_lvl+0x8e/0xd1
      print_report+0x155/0x454
      kasan_report+0xba/0x1f0
      kasan_check_range+0x35/0x1b0
      ip6mr_sk_done+0x112/0x3a0
      rawv6_close+0x48/0x70
      inet_release+0x109/0x230
      inet6_release+0x4c/0x70
      sock_release+0x87/0x1b0
      igmp6_net_exit+0x6b/0x170
      ops_exit_list+0xb0/0x170
      setup_net+0x7ac/0xbd0
      copy_net_ns+0x2e6/0x6b0
      create_new_namespaces+0x382/0xa50
      unshare_nsproxy_namespaces+0xa6/0x1c0
      ksys_unshare+0x3a4/0x7e0
      __x64_sys_unshare+0x2d/0x40
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f7963322547
      
      </TASK>
      Allocated by task 14554:
      kasan_save_stack+0x1e/0x40
      kasan_set_track+0x21/0x30
      __kasan_kmalloc+0xa1/0xb0
      __kmalloc_node_track_caller+0x4a/0xb0
      kmemdup+0x28/0x60
      addrconf_init_net+0x1be/0x840
      ops_init+0xa5/0x410
      setup_net+0x5aa/0xbd0
      copy_net_ns+0x2e6/0x6b0
      create_new_namespaces+0x382/0xa50
      unshare_nsproxy_namespaces+0xa6/0x1c0
      ksys_unshare+0x3a4/0x7e0
      __x64_sys_unshare+0x2d/0x40
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Freed by task 14554:
      kasan_save_stack+0x1e/0x40
      kasan_set_track+0x21/0x30
      kasan_save_free_info+0x2a/0x40
      ____kasan_slab_free+0x155/0x1b0
      slab_free_freelist_hook+0x11b/0x220
      __kmem_cache_free+0xa4/0x360
      addrconf_init_net+0x623/0x840
      ops_init+0xa5/0x410
      setup_net+0x5aa/0xbd0
      copy_net_ns+0x2e6/0x6b0
      create_new_namespaces+0x382/0xa50
      unshare_nsproxy_namespaces+0xa6/0x1c0
      ksys_unshare+0x3a4/0x7e0
      __x64_sys_unshare+0x2d/0x40
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Fixes: 7d9b1b57 ("ip6mr: fix use-after-free in ip6mr_sk_done()")
      Signed-off-by: default avatarZhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221017080331.16878-1-shaozhengchao@huawei.com
      
      
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      1ca69520
    • Kuniyuki Iwashima's avatar
      udp: Update reuse->has_conns under reuseport_lock. · 69421bf9
      Kuniyuki Iwashima authored and Paolo Abeni's avatar Paolo Abeni committed
      When we call connect() for a UDP socket in a reuseport group, we have
      to update sk->sk_reuseport_cb->has_conns to 1.  Otherwise, the kernel
      could select a unconnected socket wrongly for packets sent to the
      connected socket.
      
      However, the current way to set has_conns is illegal and possible to
      trigger that problem.  reuseport_has_conns() changes has_conns under
      rcu_read_lock(), which upgrades the RCU reader to the updater.  Then,
      it must do the update under the updater's lock, reuseport_lock, but
      it doesn't for now.
      
      For this reason, there is a race below where we fail to set has_conns
      resulting in the wrong socket selection.  To avoid the race, let's split
      the reader and updater with proper locking.
      
       cpu1                               cpu2
      +----+                             +----+
      
      __ip[46]_datagram_connect()        reuseport_grow()
      .                                  .
      |- reuseport_has_conns(sk, true)   |- more_reuse = __reuseport_alloc(more_socks_size)
      |  .                               |
      |  |- rcu_read_lock()
      |  |- reuse = rcu_dereference(sk->sk_reuseport_cb)
      |  |
      |  |                               |  /* reuse->has_conns == 0 here */
      |  |                               |- more_reuse->has_conns = reuse->has_conns
      |  |- reuse->has_conns = 1         |  /* more_reuse->has_conns SHOULD BE 1 HERE */
      |  |                               |
      |  |                               |- rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb,
      |  |                               |                     more_reuse)
      |  `- rcu_read_unlock()            `- kfree_rcu(reuse, rcu)
      |
      |- sk->sk_state = TCP_ESTABLISHED
      
      Note the likely(reuse) in reuseport_has_conns_set() is always true,
      but we put the test there for ease of review.  [0]
      
      For the record, usually, sk_reuseport_cb is changed under lock_sock().
      The only exception is reuseport_grow() & TCP reqsk migration case.
      
        1) shutdown() TCP listener, which is moved into the latter part of
           reuse->socks[] to migrate reqsk.
      
        2) New listen() overflows reuse->socks[] and call reuseport_grow().
      
        3) reuse->max_socks overflows u16 with the new listener.
      
        4) reuseport_grow() pops the old shutdown()ed listener from the array
           and update its sk->sk_reuseport_cb as NULL without lock_sock().
      
      shutdown()ed TCP sk->sk_reuseport_cb can be changed without lock_sock(),
      but, reuseport_has_conns_set() is called only for UDP under lock_sock(),
      so likely(reuse) never be false in reuseport_has_conns_set().
      
      [0]: https://lore.kernel.org/netdev/CANn89iLja=eQHbsM_Ta2sQF0tOGU8vAGrh_izRuuHjuO1ouUag@mail.gmail.com/
      
      
      
      Fixes: acdcecc6 ("udp: correct reuseport selection with connected sockets")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20221014182625.89913-1-kuniyu@amazon.com
      
      
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      69421bf9
  5. Oct 16, 2022
    • Eric Dumazet's avatar
      skmsg: pass gfp argument to alloc_sk_msg() · 2d1f274b
      Eric Dumazet authored
      
      syzbot found that alloc_sk_msg() could be called from a
      non sleepable context. sk_psock_verdict_recv() uses
      rcu_read_lock() protection.
      
      We need the callers to pass a gfp_t argument to avoid issues.
      
      syzbot report was:
      
      BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
      in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 3613, name: syz-executor414
      preempt_count: 0, expected: 0
      RCU nest depth: 1, expected: 0
      INFO: lockdep is turned off.
      CPU: 0 PID: 3613 Comm: syz-executor414 Not tainted 6.0.0-syzkaller-09589-g55be6084c8e0 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0x1e3/0x2cb lib/dump_stack.c:106
      __might_resched+0x538/0x6a0 kernel/sched/core.c:9877
      might_alloc include/linux/sched/mm.h:274 [inline]
      slab_pre_alloc_hook mm/slab.h:700 [inline]
      slab_alloc_node mm/slub.c:3162 [inline]
      slab_alloc mm/slub.c:3256 [inline]
      kmem_cache_alloc_trace+0x59/0x310 mm/slub.c:3287
      kmalloc include/linux/slab.h:600 [inline]
      kzalloc include/linux/slab.h:733 [inline]
      alloc_sk_msg net/core/skmsg.c:507 [inline]
      sk_psock_skb_ingress_self+0x5c/0x330 net/core/skmsg.c:600
      sk_psock_verdict_apply+0x395/0x440 net/core/skmsg.c:1014
      sk_psock_verdict_recv+0x34d/0x560 net/core/skmsg.c:1201
      tcp_read_skb+0x4a1/0x790 net/ipv4/tcp.c:1770
      tcp_rcv_established+0x129d/0x1a10 net/ipv4/tcp_input.c:5971
      tcp_v4_do_rcv+0x479/0xac0 net/ipv4/tcp_ipv4.c:1681
      sk_backlog_rcv include/net/sock.h:1109 [inline]
      __release_sock+0x1d8/0x4c0 net/core/sock.c:2906
      release_sock+0x5d/0x1c0 net/core/sock.c:3462
      tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1483
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg net/socket.c:734 [inline]
      __sys_sendto+0x46d/0x5f0 net/socket.c:2117
      __do_sys_sendto net/socket.c:2129 [inline]
      __se_sys_sendto net/socket.c:2125 [inline]
      __x64_sys_sendto+0xda/0xf0 net/socket.c:2125
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: 43312915 ("skmsg: Get rid of unncessary memset()")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Cong Wang <cong.wang@bytedance.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2d1f274b
  6. Oct 15, 2022
  7. Oct 14, 2022
    • Jakub Kicinski's avatar
      tls: strp: make sure the TCP skbs do not have overlapping data · 0d87bbd3
      Jakub Kicinski authored
      
      TLS tries to get away with using the TCP input queue directly.
      This does not work if there is duplicated data (multiple skbs
      holding bytes for the same seq number range due to retransmits).
      Check for this condition and fall back to copy mode, it should
      be rare.
      
      Fixes: 84c61fe1 ("tls: rx: do not use the standard strparser")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0d87bbd3
    • Alexander Potapenko's avatar
      tipc: fix an information leak in tipc_topsrv_kern_subscr · 777ecaab
      Alexander Potapenko authored
      
      Use a 8-byte write to initialize sub.usr_handle in
      tipc_topsrv_kern_subscr(), otherwise four bytes remain uninitialized
      when issuing setsockopt(..., SOL_TIPC, ...).
      This resulted in an infoleak reported by KMSAN when the packet was
      received:
      
        =====================================================
        BUG: KMSAN: kernel-infoleak in copyout+0xbc/0x100 lib/iov_iter.c:169
         instrument_copy_to_user ./include/linux/instrumented.h:121
         copyout+0xbc/0x100 lib/iov_iter.c:169
         _copy_to_iter+0x5c0/0x20a0 lib/iov_iter.c:527
         copy_to_iter ./include/linux/uio.h:176
         simple_copy_to_iter+0x64/0xa0 net/core/datagram.c:513
         __skb_datagram_iter+0x123/0xdc0 net/core/datagram.c:419
         skb_copy_datagram_iter+0x58/0x200 net/core/datagram.c:527
         skb_copy_datagram_msg ./include/linux/skbuff.h:3903
         packet_recvmsg+0x521/0x1e70 net/packet/af_packet.c:3469
         ____sys_recvmsg+0x2c4/0x810 net/socket.c:?
         ___sys_recvmsg+0x217/0x840 net/socket.c:2743
         __sys_recvmsg net/socket.c:2773
         __do_sys_recvmsg net/socket.c:2783
         __se_sys_recvmsg net/socket.c:2780
         __x64_sys_recvmsg+0x364/0x540 net/socket.c:2780
         do_syscall_x64 arch/x86/entry/common.c:50
         do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
         entry_SYSCALL_64_after_hwframe+0x63/0xcd arch/x86/entry/entry_64.S:120
      
        ...
      
        Uninit was stored to memory at:
         tipc_sub_subscribe+0x42d/0xb50 net/tipc/subscr.c:156
         tipc_conn_rcv_sub+0x246/0x620 net/tipc/topsrv.c:375
         tipc_topsrv_kern_subscr+0x2e8/0x400 net/tipc/topsrv.c:579
         tipc_group_create+0x4e7/0x7d0 net/tipc/group.c:190
         tipc_sk_join+0x2a8/0x770 net/tipc/socket.c:3084
         tipc_setsockopt+0xae5/0xe40 net/tipc/socket.c:3201
         __sys_setsockopt+0x87f/0xdc0 net/socket.c:2252
         __do_sys_setsockopt net/socket.c:2263
         __se_sys_setsockopt net/socket.c:2260
         __x64_sys_setsockopt+0xe0/0x160 net/socket.c:2260
         do_syscall_x64 arch/x86/entry/common.c:50
         do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
         entry_SYSCALL_64_after_hwframe+0x63/0xcd arch/x86/entry/entry_64.S:120
      
        Local variable sub created at:
         tipc_topsrv_kern_subscr+0x57/0x400 net/tipc/topsrv.c:562
         tipc_group_create+0x4e7/0x7d0 net/tipc/group.c:190
      
        Bytes 84-87 of 88 are uninitialized
        Memory access of size 88 starts at ffff88801ed57cd0
        Data copied to user address 0000000020000400
        ...
        =====================================================
      
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Fixes: 026321c6 ("tipc: rename tipc_server to tipc_topsrv")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      777ecaab
    • Mark Tomlinson's avatar
      tipc: Fix recognition of trial period · 28be7ca4
      Mark Tomlinson authored
      
      The trial period exists until jiffies is after addr_trial_end. But as
      jiffies will eventually overflow, just using time_after will eventually
      give incorrect results. As the node address is set once the trial period
      ends, this can be used to know that we are not in the trial period.
      
      Fixes: e415577f ("tipc: correct discovery message handling during address trial period")
      Signed-off-by: default avatarMark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28be7ca4
  8. Oct 13, 2022
    • Eric Dumazet's avatar
      kcm: avoid potential race in kcm_tx_work · ec7eede3
      Eric Dumazet authored
      
      syzbot found that kcm_tx_work() could crash [1] in:
      
      	/* Primarily for SOCK_SEQPACKET sockets */
      	if (likely(sk->sk_socket) &&
      	    test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
      <<*>>	clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
      		sk->sk_write_space(sk);
      	}
      
      I think the reason is that another thread might concurrently
      run in kcm_release() and call sock_orphan(sk) while sk is not
      locked. kcm_tx_work() find sk->sk_socket being NULL.
      
      [1]
      BUG: KASAN: null-ptr-deref in instrument_atomic_write include/linux/instrumented.h:86 [inline]
      BUG: KASAN: null-ptr-deref in clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline]
      BUG: KASAN: null-ptr-deref in kcm_tx_work+0xff/0x160 net/kcm/kcmsock.c:742
      Write of size 8 at addr 0000000000000008 by task kworker/u4:3/53
      
      CPU: 0 PID: 53 Comm: kworker/u4:3 Not tainted 5.19.0-rc3-next-20220621-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: kkcmd kcm_tx_work
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
      kasan_report+0xbe/0x1f0 mm/kasan/report.c:495
      check_region_inline mm/kasan/generic.c:183 [inline]
      kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189
      instrument_atomic_write include/linux/instrumented.h:86 [inline]
      clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline]
      kcm_tx_work+0xff/0x160 net/kcm/kcmsock.c:742
      process_one_work+0x996/0x1610 kernel/workqueue.c:2289
      worker_thread+0x665/0x1080 kernel/workqueue.c:2436
      kthread+0x2e9/0x3a0 kernel/kthread.c:376
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
      </TASK>
      
      Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Link: https://lore.kernel.org/r/20221012133412.519394-1-edumazet@google.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ec7eede3
    • Kuniyuki Iwashima's avatar
      tcp: Clean up kernel listener's reqsk in inet_twsk_purge() · 740ea3c4
      Kuniyuki Iwashima authored
      Eric Dumazet reported a use-after-free related to the per-netns ehash
      series. [0]
      
      When we create a TCP socket from userspace, the socket always holds a
      refcnt of the netns.  This guarantees that a reqsk timer is always fired
      before netns dismantle.  Each reqsk has a refcnt of its listener, so the
      listener is not freed before the reqsk, and the net is not freed before
      the listener as well.
      
      OTOH, when in-kernel users create a TCP socket, it might not hold a refcnt
      of its netns.  Thus, a reqsk timer can be fired after the netns dismantle
      and access freed per-netns ehash.
      
      To avoid the use-after-free, we need to clean up TCP_NEW_SYN_RECV sockets
      in inet_twsk_purge() if the netns uses a per-netns ehash.
      
      [0]: https://lore.kernel.org/netdev/CANn89iLXMup0dRD_Ov79Xt8N9FM0XdhCHEN05sf3eLwxKweM6w@mail.gmail.com/
      
      
      
      BUG: KASAN: use-after-free in tcp_or_dccp_get_hashinfo
      include/net/inet_hashtables.h:181 [inline]
      BUG: KASAN: use-after-free in reqsk_queue_unlink+0x320/0x350
      net/ipv4/inet_connection_sock.c:913
      Read of size 8 at addr ffff88807545bd80 by task syz-executor.2/8301
      
      CPU: 1 PID: 8301 Comm: syz-executor.2 Not tainted
      6.0.0-syzkaller-02757-gaf7d23f9d96a #0
      Hardware name: Google Google Compute Engine/Google Compute Engine,
      BIOS Google 09/22/2022
      Call Trace:
      <IRQ>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
      print_address_description mm/kasan/report.c:317 [inline]
      print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
      kasan_report+0xb1/0x1e0 mm/kasan/report.c:495
      tcp_or_dccp_get_hashinfo include/net/inet_hashtables.h:181 [inline]
      reqsk_queue_unlink+0x320/0x350 net/ipv4/inet_connection_sock.c:913
      inet_csk_reqsk_queue_drop net/ipv4/inet_connection_sock.c:927 [inline]
      inet_csk_reqsk_queue_drop_and_put net/ipv4/inet_connection_sock.c:939 [inline]
      reqsk_timer_handler+0x724/0x1160 net/ipv4/inet_connection_sock.c:1053
      call_timer_fn+0x1a0/0x6b0 kernel/time/timer.c:1474
      expire_timers kernel/time/timer.c:1519 [inline]
      __run_timers.part.0+0x674/0xa80 kernel/time/timer.c:1790
      __run_timers kernel/time/timer.c:1768 [inline]
      run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1803
      __do_softirq+0x1d0/0x9c8 kernel/softirq.c:571
      invoke_softirq kernel/softirq.c:445 [inline]
      __irq_exit_rcu+0x123/0x180 kernel/softirq.c:650
      irq_exit_rcu+0x5/0x20 kernel/softirq.c:662
      sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1107
      </IRQ>
      
      Fixes: d1e5e640 ("tcp: Introduce optional per-netns ehash.")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221012145036.74960-1-kuniyu@amazon.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      740ea3c4
    • Xin Long's avatar
      openvswitch: add nf_ct_is_confirmed check before assigning the helper · 3c186054
      Xin Long authored
      
      A WARN_ON call trace would be triggered when 'ct(commit, alg=helper)'
      applies on a confirmed connection:
      
        WARNING: CPU: 0 PID: 1251 at net/netfilter/nf_conntrack_extend.c:98
        RIP: 0010:nf_ct_ext_add+0x12d/0x150 [nf_conntrack]
        Call Trace:
         <TASK>
         nf_ct_helper_ext_add+0x12/0x60 [nf_conntrack]
         __nf_ct_try_assign_helper+0xc4/0x160 [nf_conntrack]
         __ovs_ct_lookup+0x72e/0x780 [openvswitch]
         ovs_ct_execute+0x1d8/0x920 [openvswitch]
         do_execute_actions+0x4e6/0xb60 [openvswitch]
         ovs_execute_actions+0x60/0x140 [openvswitch]
         ovs_packet_cmd_execute+0x2ad/0x310 [openvswitch]
         genl_family_rcv_msg_doit.isra.15+0x113/0x150
         genl_rcv_msg+0xef/0x1f0
      
      which can be reproduced with these OVS flows:
      
        table=0, in_port=veth1,tcp,tcp_dst=2121,ct_state=-trk
        actions=ct(commit, table=1)
        table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new
        actions=ct(commit, alg=ftp),normal
      
      The issue was introduced by commit 248d45f1 ("openvswitch: Allow
      attaching helper in later commit") where it somehow removed the check
      of nf_ct_is_confirmed before asigning the helper. This patch is to fix
      it by bringing it back.
      
      Fixes: 248d45f1 ("openvswitch: Allow attaching helper in later commit")
      Reported-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarAaron Conole <aconole@redhat.com>
      Tested-by: default avatarAaron Conole <aconole@redhat.com>
      Link: https://lore.kernel.org/r/c5c9092a22a2194650222bffaf786902613deb16.1665085502.git.lucien.xin@gmail.com
      
      
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      3c186054
    • Kuniyuki Iwashima's avatar
      tcp: Fix data races around icsk->icsk_af_ops. · f49cd2f4
      Kuniyuki Iwashima authored
      
      setsockopt(IPV6_ADDRFORM) and tcp_v6_connect() change icsk->icsk_af_ops
      under lock_sock(), but tcp_(get|set)sockopt() read it locklessly.  To
      avoid load/store tearing, we need to add READ_ONCE() and WRITE_ONCE()
      for the reads and writes.
      
      Thanks to Eric Dumazet for providing the syzbot report:
      
      BUG: KCSAN: data-race in tcp_setsockopt / tcp_v6_connect
      
      write to 0xffff88813c624518 of 8 bytes by task 23936 on cpu 0:
      tcp_v6_connect+0x5b3/0xce0 net/ipv6/tcp_ipv6.c:240
      __inet_stream_connect+0x159/0x6d0 net/ipv4/af_inet.c:660
      inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:724
      __sys_connect_file net/socket.c:1976 [inline]
      __sys_connect+0x197/0x1b0 net/socket.c:1993
      __do_sys_connect net/socket.c:2003 [inline]
      __se_sys_connect net/socket.c:2000 [inline]
      __x64_sys_connect+0x3d/0x50 net/socket.c:2000
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffff88813c624518 of 8 bytes by task 23937 on cpu 1:
      tcp_setsockopt+0x147/0x1c80 net/ipv4/tcp.c:3789
      sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3585
      __sys_setsockopt+0x212/0x2b0 net/socket.c:2252
      __do_sys_setsockopt net/socket.c:2263 [inline]
      __se_sys_setsockopt net/socket.c:2260 [inline]
      __x64_sys_setsockopt+0x62/0x70 net/socket.c:2260
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0xffffffff8539af68 -> 0xffffffff8539aff8
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 23937 Comm: syz-executor.5 Not tainted
      6.0.0-rc4-syzkaller-00331-g4ed9c1e971b1-dirty #0
      
      Hardware name: Google Google Compute Engine/Google Compute Engine,
      BIOS Google 08/26/2022
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      f49cd2f4
    • Kuniyuki Iwashima's avatar
      ipv6: Fix data races around sk->sk_prot. · 364f997b
      Kuniyuki Iwashima authored
      
      Commit 086d4905 ("ipv6: annotate some data-races around sk->sk_prot")
      fixed some data-races around sk->sk_prot but it was not enough.
      
      Some functions in inet6_(stream|dgram)_ops still access sk->sk_prot
      without lock_sock() or rtnl_lock(), so they need READ_ONCE() to avoid
      load tearing.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      364f997b
    • Kuniyuki Iwashima's avatar
      tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). · d38afeec
      Kuniyuki Iwashima authored
      
      Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were
      able to clean them up by calling inet6_destroy_sock() during the IPv6 ->
      IPv4 conversion by IPV6_ADDRFORM.  However, commit 03485f2a ("udpv6:
      Add lockless sendmsg() support") added a lockless memory allocation path,
      which could cause a memory leak:
      
      setsockopt(IPV6_ADDRFORM)                 sendmsg()
      +-----------------------+                 +-------+
      - do_ipv6_setsockopt(sk, ...)             - udpv6_sendmsg(sk, ...)
        - sockopt_lock_sock(sk)                   ^._ called via udpv6_prot
          - lock_sock(sk)                             before WRITE_ONCE()
        - WRITE_ONCE(sk->sk_prot, &tcp_prot)
        - inet6_destroy_sock()                    - if (!corkreq)
        - sockopt_release_sock(sk)                  - ip6_make_skb(sk, ...)
          - release_sock(sk)                          ^._ lockless fast path for
                                                          the non-corking case
      
                                                      - __ip6_append_data(sk, ...)
                                                        - ipv6_local_rxpmtu(sk, ...)
                                                          - xchg(&np->rxpmtu, skb)
                                                            ^._ rxpmtu is never freed.
      
                                                      - goto out_no_dst;
      
                                                  - lock_sock(sk)
      
      For now, rxpmtu is only the case, but not to miss the future change
      and a similar bug fixed in commit e2732600 ("net: ping6: Fix
      memleak in ipv6_renew_options()."), let's set a new function to IPv6
      sk->sk_destruct() and call inet6_cleanup_sock() there.  Since the
      conversion does not change sk->sk_destruct(), we can guarantee that
      we can clean up IPv6 resources finally.
      
      We can now remove all inet6_destroy_sock() calls from IPv6 protocol
      specific ->destroy() functions, but such changes are invasive to
      backport.  So they can be posted as a follow-up later for net-next.
      
      Fixes: 03485f2a ("udpv6: Add lockless sendmsg() support")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d38afeec
    • Kuniyuki Iwashima's avatar
      udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM). · 21985f43
      Kuniyuki Iwashima authored
      
      Commit 4b340ae2 ("IPv6: Complete IPV6_DONTFRAG support") forgot
      to add a change to free inet6_sk(sk)->rxpmtu while converting an IPv6
      socket into IPv4 with IPV6_ADDRFORM.  After conversion, sk_prot is
      changed to udp_prot and ->destroy() never cleans it up, resulting in
      a memory leak.
      
      This is due to the discrepancy between inet6_destroy_sock() and
      IPV6_ADDRFORM, so let's call inet6_destroy_sock() from IPV6_ADDRFORM
      to remove the difference.
      
      However, this is not enough for now because rxpmtu can be changed
      without lock_sock() after commit 03485f2a ("udpv6: Add lockless
      sendmsg() support").  We will fix this case in the following patch.
      
      Note we will rename inet6_destroy_sock() to inet6_cleanup_sock() and
      remove unnecessary inet6_destroy_sock() calls in sk_prot->destroy()
      in the future.
      
      Fixes: 4b340ae2 ("IPv6: Complete IPV6_DONTFRAG support")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      21985f43
    • Kuniyuki Iwashima's avatar
      tcp/udp: Fix memory leak in ipv6_renew_options(). · 3c52c6bb
      Kuniyuki Iwashima authored
      
      syzbot reported a memory leak [0] related to IPV6_ADDRFORM.
      
      The scenario is that while one thread is converting an IPv6 socket into
      IPv4 with IPV6_ADDRFORM, another thread calls do_ipv6_setsockopt() and
      allocates memory to inet6_sk(sk)->XXX after conversion.
      
      Then, the converted sk with (tcp|udp)_prot never frees the IPv6 resources,
      which inet6_destroy_sock() should have cleaned up.
      
      setsockopt(IPV6_ADDRFORM)                 setsockopt(IPV6_DSTOPTS)
      +-----------------------+                 +----------------------+
      - do_ipv6_setsockopt(sk, ...)
        - sockopt_lock_sock(sk)                 - do_ipv6_setsockopt(sk, ...)
          - lock_sock(sk)                         ^._ called via tcpv6_prot
        - WRITE_ONCE(sk->sk_prot, &tcp_prot)          before WRITE_ONCE()
        - xchg(&np->opt, NULL)
        - txopt_put(opt)
        - sockopt_release_sock(sk)
          - release_sock(sk)                      - sockopt_lock_sock(sk)
                                                    - lock_sock(sk)
                                                  - ipv6_set_opt_hdr(sk, ...)
                                                    - ipv6_update_options(sk, opt)
                                                      - xchg(&inet6_sk(sk)->opt, opt)
                                                        ^._ opt is never freed.
      
                                                  - sockopt_release_sock(sk)
                                                    - release_sock(sk)
      
      Since IPV6_DSTOPTS allocates options under lock_sock(), we can avoid this
      memory leak by testing whether sk_family is changed by IPV6_ADDRFORM after
      acquiring the lock.
      
      This issue exists from the initial commit between IPV6_ADDRFORM and
      IPV6_PKTOPTIONS.
      
      [0]:
      BUG: memory leak
      unreferenced object 0xffff888009ab9f80 (size 96):
        comm "syz-executor583", pid 328, jiffies 4294916198 (age 13.034s)
        hex dump (first 32 bytes):
          01 00 00 00 48 00 00 00 08 00 00 00 00 00 00 00  ....H...........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000002ee98ae1>] kmalloc include/linux/slab.h:605 [inline]
          [<000000002ee98ae1>] sock_kmalloc+0xb3/0x100 net/core/sock.c:2566
          [<0000000065d7b698>] ipv6_renew_options+0x21e/0x10b0 net/ipv6/exthdrs.c:1318
          [<00000000a8c756d7>] ipv6_set_opt_hdr net/ipv6/ipv6_sockglue.c:354 [inline]
          [<00000000a8c756d7>] do_ipv6_setsockopt.constprop.0+0x28b7/0x4350 net/ipv6/ipv6_sockglue.c:668
          [<000000002854d204>] ipv6_setsockopt+0xdf/0x190 net/ipv6/ipv6_sockglue.c:1021
          [<00000000e69fdcf8>] tcp_setsockopt+0x13b/0x2620 net/ipv4/tcp.c:3789
          [<0000000090da4b9b>] __sys_setsockopt+0x239/0x620 net/socket.c:2252
          [<00000000b10d192f>] __do_sys_setsockopt net/socket.c:2263 [inline]
          [<00000000b10d192f>] __se_sys_setsockopt net/socket.c:2260 [inline]
          [<00000000b10d192f>] __x64_sys_setsockopt+0xbe/0x160 net/socket.c:2260
          [<000000000a80d7aa>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          [<000000000a80d7aa>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
          [<000000004562b5c6>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      3c52c6bb
  9. Oct 12, 2022
    • Pavel Begunkov's avatar
      io_uring/af_unix: defer registered files gc to io_uring release · 0091bfc8
      Pavel Begunkov authored
      
      Instead of putting io_uring's registered files in unix_gc() we want it
      to be done by io_uring itself. The trick here is to consider io_uring
      registered files for cycle detection but not actually putting them down.
      Because io_uring can't register other ring instances, this will remove
      all refs to the ring file triggering the ->release path and clean up
      with io_ring_ctx_free().
      
      Cc: stable@vger.kernel.org
      Fixes: 6b06314c ("io_uring: add file set registration")
      Reported-and-tested-by: default avatarDavid Bouman <dbouman03@gmail.com>
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarThadeu Lima de Souza Cascardo <cascardo@canonical.com>
      [axboe: add kerneldoc comment to skb, fold in skb leak fix]
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      0091bfc8
    • Jeremy Kerr's avatar
      mctp: prevent double key removal and unref · 3a732b46
      Jeremy Kerr authored
      
      Currently, we have a bug where a simultaneous DROPTAG ioctl and socket
      close may race, as we attempt to remove a key from lists twice, and
      perform an unref for each removal operation. This may result in a uaf
      when we attempt the second unref.
      
      This change fixes the race by making __mctp_key_remove tolerant to being
      called on a key that has already been removed from the socket/net lists,
      and only performs the unref when we do the actual remove. We also need
      to hold the list lock on the ioctl cleanup path.
      
      This fix is based on a bug report and comprehensive analysis from
      butt3rflyh4ck <butterflyhuangxx@gmail.com>, found via syzkaller.
      
      Cc: stable@vger.kernel.org
      Fixes: 63ed1aab ("mctp: Add SIOCMCTP{ALLOC,DROP}TAG ioctls for tag control")
      Reported-by: default avatarbutt3rflyh4ck <butterflyhuangxx@gmail.com>
      Signed-off-by: default avatarJeremy Kerr <jk@codeconstruct.com.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3a732b46
    • Phil Sutter's avatar
      netfilter: rpfilter/fib: Populate flowic_l3mdev field · acc641ab
      Phil Sutter authored
      
      Use the introduced field for correct operation with VRF devices instead
      of conditionally overwriting flowic_oif. This is a partial revert of
      commit b575b24b ("netfilter: Fix rpfilter dropping vrf packets by
      mistake"), implementing a simpler solution.
      
      Signed-off-by: default avatarPhil Sutter <phil@nwl.cc>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Reviewed-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      acc641ab
    • Eric Dumazet's avatar
      tcp: cdg: allow tcp_cdg_release() to be called multiple times · 72e560cb
      Eric Dumazet authored
      
      Apparently, mptcp is able to call tcp_disconnect() on an already
      disconnected flow. This is generally fine, unless current congestion
      control is CDG, because it might trigger a double-free [1]
      
      Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
      more resilient.
      
      [1]
      BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
      BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567
      
      CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
      Workqueue: events mptcp_worker
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
      print_address_description mm/kasan/report.c:317 [inline]
      print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
      kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
      ____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
      kasan_slab_free include/linux/kasan.h:200 [inline]
      slab_free_hook mm/slub.c:1759 [inline]
      slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
      slab_free mm/slub.c:3539 [inline]
      kfree+0xe2/0x580 mm/slub.c:4567
      tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
      __mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
      mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
      mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
      process_one_work+0x991/0x1610 kernel/workqueue.c:2289
      worker_thread+0x665/0x1080 kernel/workqueue.c:2436
      kthread+0x2e4/0x3a0 kernel/kthread.c:376
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
      </TASK>
      
      Allocated by task 3671:
      kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
      kasan_set_track mm/kasan/common.c:45 [inline]
      set_alloc_info mm/kasan/common.c:437 [inline]
      ____kasan_kmalloc mm/kasan/common.c:516 [inline]
      ____kasan_kmalloc mm/kasan/common.c:475 [inline]
      __kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
      kmalloc_array include/linux/slab.h:640 [inline]
      kcalloc include/linux/slab.h:671 [inline]
      tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
      tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
      tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
      tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
      do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
      tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
      mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
      __sys_setsockopt+0x2d6/0x690 net/socket.c:2252
      __do_sys_setsockopt net/socket.c:2263 [inline]
      __se_sys_setsockopt net/socket.c:2260 [inline]
      __x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Freed by task 16:
      kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
      kasan_set_track+0x21/0x30 mm/kasan/common.c:45
      kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
      ____kasan_slab_free mm/kasan/common.c:367 [inline]
      ____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
      kasan_slab_free include/linux/kasan.h:200 [inline]
      slab_free_hook mm/slub.c:1759 [inline]
      slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
      slab_free mm/slub.c:3539 [inline]
      kfree+0xe2/0x580 mm/slub.c:4567
      tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
      tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
      tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
      inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
      tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
      tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
      tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
      tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
      ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
      ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
      NF_HOOK include/linux/netfilter.h:302 [inline]
      NF_HOOK include/linux/netfilter.h:296 [inline]
      ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
      dst_input include/net/dst.h:455 [inline]
      ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
      ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
      ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
      nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
      nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
      nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
      NF_HOOK include/linux/netfilter.h:300 [inline]
      ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
      __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
      __netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
      netif_receive_skb_internal net/core/dev.c:5685 [inline]
      netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
      NF_HOOK include/linux/netfilter.h:302 [inline]
      NF_HOOK include/linux/netfilter.h:296 [inline]
      br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
      br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
      br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
      br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
      NF_HOOK include/linux/netfilter.h:302 [inline]
      br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
      br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
      nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
      nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
      br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
      __netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
      __netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
      __netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
      process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
      __napi_poll+0xb3/0x6d0 net/core/dev.c:6494
      napi_poll net/core/dev.c:6561 [inline]
      net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
      __do_softirq+0x1d0/0x9c8 kernel/softirq.c:571
      
      Fixes: 2b0a8c9e ("tcp: add CDG congestion control")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      72e560cb
    • Eric Dumazet's avatar
      inet: ping: fix recent breakage · 0d24148b
      Eric Dumazet authored
      
      Blamed commit broke the assumption used by ping sendmsg() that
      allocated skb would have MAX_HEADER bytes in skb->head.
      
      This patch changes the way ping works, by making sure
      the skb head contains space for the icmp header,
      and adjusting ping_getfrag() which was desperate
      about going past the icmp header :/
      
      This is adopting what UDP does, mostly.
      
      syzbot is able to crash a host using both kfence and following repro in a loop.
      
      fd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_ICMPV6)
      connect(fd, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0),
      		inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28
      sendmsg(fd, {msg_name=NULL, msg_namelen=0, msg_iov=[
      		{iov_base="\200\0\0\0\23\0\0\0\0\0\0\0\0\0"..., iov_len=65496}],
      		msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0
      
      When kfence triggers, skb->head only has 64 bytes, immediately followed
      by struct skb_shared_info (no extra headroom based on ksize(ptr))
      
      Then icmpv6_push_pending_frames() is overwriting first bytes
      of skb_shinfo(skb), making nr_frags bigger than MAX_SKB_FRAGS,
      and/or setting shinfo->gso_size to a non zero value.
      
      If nr_frags is mangled, a crash happens in skb_release_data()
      
      If gso_size is mangled, we have the following report:
      
      lo: caps=(0x00000516401d7c69, 0x00000516401d7c69)
      WARNING: CPU: 0 PID: 7548 at net/core/dev.c:3239 skb_warn_bad_offload+0x119/0x230 net/core/dev.c:3239
      Modules linked in:
      CPU: 0 PID: 7548 Comm: syz-executor268 Not tainted 6.0.0-syzkaller-02754-g557f050166e5 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
      RIP: 0010:skb_warn_bad_offload+0x119/0x230 net/core/dev.c:3239
      Code: 70 03 00 00 e8 58 c3 24 fa 4c 8d a5 e8 00 00 00 e8 4c c3 24 fa 4c 89 e9 4c 89 e2 4c 89 f6 48 c7 c7 00 53 f5 8a e8 13 ac e7 01 <0f> 0b 5b 5d 41 5c 41 5d 41 5e e9 28 c3 24 fa e8 23 c3 24 fa 48 89
      RSP: 0018:ffffc9000366f3e8 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: ffff88807a9d9d00 RCX: 0000000000000000
      RDX: ffff8880780c0000 RSI: ffffffff8160f6f8 RDI: fffff520006cde6f
      RBP: ffff888079952000 R08: 0000000000000005 R09: 0000000000000000
      R10: 0000000000000400 R11: 0000000000000000 R12: ffff8880799520e8
      R13: ffff88807a9da070 R14: ffff888079952000 R15: 0000000000000000
      FS: 0000555556be6300(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020010000 CR3: 000000006eb7b000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <TASK>
      gso_features_check net/core/dev.c:3521 [inline]
      netif_skb_features+0x83e/0xb90 net/core/dev.c:3554
      validate_xmit_skb+0x2b/0xf10 net/core/dev.c:3659
      __dev_queue_xmit+0x998/0x3ad0 net/core/dev.c:4248
      dev_queue_xmit include/linux/netdevice.h:3008 [inline]
      neigh_hh_output include/net/neighbour.h:530 [inline]
      neigh_output include/net/neighbour.h:544 [inline]
      ip6_finish_output2+0xf97/0x1520 net/ipv6/ip6_output.c:134
      __ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
      ip6_finish_output+0x690/0x1160 net/ipv6/ip6_output.c:206
      NF_HOOK_COND include/linux/netfilter.h:291 [inline]
      ip6_output+0x1ed/0x540 net/ipv6/ip6_output.c:227
      dst_output include/net/dst.h:445 [inline]
      ip6_local_out+0xaf/0x1a0 net/ipv6/output_core.c:161
      ip6_send_skb+0xb7/0x340 net/ipv6/ip6_output.c:1966
      ip6_push_pending_frames+0xdd/0x100 net/ipv6/ip6_output.c:1986
      icmpv6_push_pending_frames+0x2af/0x490 net/ipv6/icmp.c:303
      ping_v6_sendmsg+0xc44/0x1190 net/ipv6/ping.c:190
      inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg+0xcf/0x120 net/socket.c:734
      ____sys_sendmsg+0x712/0x8c0 net/socket.c:2482
      ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
      __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7f21aab42b89
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fff1729d038 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f21aab42b89
      RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d
      R10: 000000000000000d R11: 0000000000000246 R12: 00007fff1729d050
      R13: 00000000000f4240 R14: 0000000000021dd1 R15: 00007fff1729d044
      </TASK>
      
      Fixes: 47cf8899 ("net: unify alloclen calculation for paged requests")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0d24148b
    • Eric Dumazet's avatar
      ipv6: ping: fix wrong checksum for large frames · 87445f36
      Eric Dumazet authored
      
      For a given ping datagram, ping_getfrag() is called once
      per skb fragment.
      
      A large datagram requiring more than one page fragment
      is currently getting the checksum of the last fragment,
      instead of the cumulative one.
      
      After this patch, "ping -s 35000 ::1" is working correctly.
      
      Fixes: 6d0bfe22 ("net: ipv6: Add IPv6 support to the ping socket.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      87445f36
  10. Oct 11, 2022
    • Jason A. Donenfeld's avatar
      treewide: use get_random_bytes() when possible · 197173db
      Jason A. Donenfeld authored
      
      The prandom_bytes() function has been a deprecated inline wrapper around
      get_random_bytes() for several releases now, and compiles down to the
      exact same code. Replace the deprecated wrapper with a direct call to
      the real function. This was done as a basic find and replace.
      
      Reviewed-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarYury Norov <yury.norov@gmail.com>
      Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> # powerpc
      Acked-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      197173db
    • Jason A. Donenfeld's avatar
      treewide: use get_random_u32() when possible · a251c17a
      Jason A. Donenfeld authored
      
      The prandom_u32() function has been a deprecated inline wrapper around
      get_random_u32() for several releases now, and compiles down to the
      exact same code. Replace the deprecated wrapper with a direct call to
      the real function. The same also applies to get_random_int(), which is
      just a wrapper around get_random_u32(). This was done as a basic find
      and replace.
      
      Reviewed-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarYury Norov <yury.norov@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz> # for ext4
      Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
      Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd
      Acked-by: default avatarJakub Kicinski <kuba@kernel.org>
      Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt
      Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
      Acked-by: Helge Deller <deller@gmx.de> # for parisc
      Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      a251c17a
    • Jason A. Donenfeld's avatar
      treewide: use get_random_{u8,u16}() when possible, part 2 · f743f16c
      Jason A. Donenfeld authored
      
      Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value,
      simply use the get_random_{u8,u16}() functions, which are faster than
      wasting the additional bytes from a 32-bit value. This was done by hand,
      identifying all of the places where one of the random integer functions
      was used in a non-32-bit context.
      
      Reviewed-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarYury Norov <yury.norov@gmail.com>
      Acked-by: default avatarJakub Kicinski <kuba@kernel.org>
      Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      f743f16c
    • Jason A. Donenfeld's avatar
      treewide: use get_random_{u8,u16}() when possible, part 1 · 7e3cf084
      Jason A. Donenfeld authored
      
      Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value,
      simply use the get_random_{u8,u16}() functions, which are faster than
      wasting the additional bytes from a 32-bit value. This was done
      mechanically with this coccinelle script:
      
      @@
      expression E;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u16;
      typedef __be16;
      typedef __le16;
      typedef u8;
      @@
      (
      - (get_random_u32() & 0xffff)
      + get_random_u16()
      |
      - (get_random_u32() & 0xff)
      + get_random_u8()
      |
      - (get_random_u32() % 65536)
      + get_random_u16()
      |
      - (get_random_u32() % 256)
      + get_random_u8()
      |
      - (get_random_u32() >> 16)
      + get_random_u16()
      |
      - (get_random_u32() >> 24)
      + get_random_u8()
      |
      - (u16)get_random_u32()
      + get_random_u16()
      |
      - (u8)get_random_u32()
      + get_random_u8()
      |
      - (__be16)get_random_u32()
      + (__be16)get_random_u16()
      |
      - (__le16)get_random_u32()
      + (__le16)get_random_u16()
      |
      - prandom_u32_max(65536)
      + get_random_u16()
      |
      - prandom_u32_max(256)
      + get_random_u8()
      |
      - E->inet_id = get_random_u32()
      + E->inet_id = get_random_u16()
      )
      
      @@
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u16;
      identifier v;
      @@
      - u16 v = get_random_u32();
      + u16 v = get_random_u16();
      
      @@
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u8;
      identifier v;
      @@
      - u8 v = get_random_u32();
      + u8 v = get_random_u8();
      
      @@
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u16;
      u16 v;
      @@
      -  v = get_random_u32();
      +  v = get_random_u16();
      
      @@
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u8;
      u8 v;
      @@
      -  v = get_random_u32();
      +  v = get_random_u8();
      
      // Find a potential literal
      @literal_mask@
      expression LITERAL;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      position p;
      @@
      
              ((T)get_random_u32()@p & (LITERAL))
      
      // Examine limits
      @script:python add_one@
      literal << literal_mask.LITERAL;
      RESULT;
      @@
      
      value = None
      if literal.startswith('0x'):
              value = int(literal, 16)
      elif literal[0] in '123456789':
              value = int(literal, 10)
      if value is None:
              print("I don't know how to handle %s" % (literal))
              cocci.include_match(False)
      elif value < 256:
              coccinelle.RESULT = cocci.make_ident("get_random_u8")
      elif value < 65536:
              coccinelle.RESULT = cocci.make_ident("get_random_u16")
      else:
              print("Skipping large mask of %s" % (literal))
              cocci.include_match(False)
      
      // Replace the literal mask with the calculated result.
      @plus_one@
      expression literal_mask.LITERAL;
      position literal_mask.p;
      identifier add_one.RESULT;
      identifier FUNC;
      @@
      
      -       (FUNC()@p & (LITERAL))
      +       (RESULT() & LITERAL)
      
      Reviewed-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarYury Norov <yury.norov@gmail.com>
      Acked-by: default avatarJakub Kicinski <kuba@kernel.org>
      Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      7e3cf084
    • Jason A. Donenfeld's avatar
      treewide: use prandom_u32_max() when possible, part 1 · 81895a65
      Jason A. Donenfeld authored
      
      Rather than incurring a division or requesting too many random bytes for
      the given range, use the prandom_u32_max() function, which only takes
      the minimum required bytes from the RNG and avoids divisions. This was
      done mechanically with this coccinelle script:
      
      @basic@
      expression E;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u64;
      @@
      (
      - ((T)get_random_u32() % (E))
      + prandom_u32_max(E)
      |
      - ((T)get_random_u32() & ((E) - 1))
      + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
      |
      - ((u64)(E) * get_random_u32() >> 32)
      + prandom_u32_max(E)
      |
      - ((T)get_random_u32() & ~PAGE_MASK)
      + prandom_u32_max(PAGE_SIZE)
      )
      
      @multi_line@
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      identifier RAND;
      expression E;
      @@
      
      -       RAND = get_random_u32();
              ... when != RAND
      -       RAND %= (E);
      +       RAND = prandom_u32_max(E);
      
      // Find a potential literal
      @literal_mask@
      expression LITERAL;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      position p;
      @@
      
              ((T)get_random_u32()@p & (LITERAL))
      
      // Add one to the literal.
      @script:python add_one@
      literal << literal_mask.LITERAL;
      RESULT;
      @@
      
      value = None
      if literal.startswith('0x'):
              value = int(literal, 16)
      elif literal[0] in '123456789':
              value = int(literal, 10)
      if value is None:
              print("I don't know how to handle %s" % (literal))
              cocci.include_match(False)
      elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
              print("Skipping 0x%x for cleanup elsewhere" % (value))
              cocci.include_match(False)
      elif value & (value + 1) != 0:
              print("Skipping 0x%x because it's not a power of two minus one" % (value))
              cocci.include_match(False)
      elif literal.startswith('0x'):
              coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
      else:
              coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))
      
      // Replace the literal mask with the calculated result.
      @plus_one@
      expression literal_mask.LITERAL;
      position literal_mask.p;
      expression add_one.RESULT;
      identifier FUNC;
      @@
      
      -       (FUNC()@p & (LITERAL))
      +       prandom_u32_max(RESULT)
      
      @collapse_ret@
      type T;
      identifier VAR;
      expression E;
      @@
      
       {
      -       T VAR;
      -       VAR = (E);
      -       return VAR;
      +       return E;
       }
      
      @drop_var@
      type T;
      identifier VAR;
      @@
      
       {
      -       T VAR;
              ... when != VAR
       }
      
      Reviewed-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarYury Norov <yury.norov@gmail.com>
      Reviewed-by: default avatarKP Singh <kpsingh@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
      Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
      Acked-by: default avatarJakub Kicinski <kuba@kernel.org>
      Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
      Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
      Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      81895a65
  11. Oct 10, 2022
Loading