  1. Feb 12, 2021
    • net: fix dev_ifsioc_locked() race condition · 3b23a32a
      Cong Wang authored
      
      dev_ifsioc_locked() is called with only RCU read lock, so when
      there is a parallel writer changing the mac address, it could
      get a partially updated mac address, as shown below:
      
      Thread 1			Thread 2
      // eth_commit_mac_addr_change()
      memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
      				// dev_ifsioc_locked()
      				memcpy(ifr->ifr_hwaddr.sa_data,
      					dev->dev_addr,...);
      
      Close this race condition by guarding both sides with a RW semaphore,
      like netdev_get_name(). We cannot use a seqlock here, as it does not
      allow blocking. The writers already take RTNL anyway, so this does
      not affect the slow path. To avoid touching existing
      dev_set_mac_address() callers in drivers, introduce a new wrapper
      just for user-facing callers on the ioctl and rtnetlink paths.
      
      Note, bonding also changes slave mac addresses but that requires
      a separate patch due to the complexity of bonding code.
      
      Fixes: 3710becf ("net: RCU locking for simple ioctl()")
      Reported-by: "Gong, Sishuai" <sishuai@purdue.edu>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. Jan 16, 2021
  3. Jan 08, 2021
  4. Jan 08, 2020
  5. Oct 23, 2019
  6. Sep 27, 2019
  7. Jul 09, 2019
    • coallocate socket_wq with socket itself · 333f7909
      Al Viro authored
      
      socket->wq is assign-once, set when we are initializing both
      the struct socket it's in and the struct socket_wq it points to. As a
      matter of fact, the only reason for the separate allocation was the
      ability to RCU-delay freeing of socket_wq. RCU-delaying the
      freeing of the socket itself removes that need, so we can just
      fold struct socket_wq into the end of struct socket and simplify
      life both for sock_alloc_inode() (one allocation instead of
      two) and for the tun/tap oddballs, where we used to embed struct socket
      and struct socket_wq into the same structure (now embedding just
      the struct socket).
      
      Note that the reference to struct socket_wq in struct sock remains
      a reference; that is unchanged.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
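The layout change can be sketched in plain C. The types below (`hyp_socket_wq`, `hyp_socket`, `hyp_sock`) are hypothetical simplifications, not the kernel's definitions: the wait-queue struct is embedded as the last member of the socket, so one `calloc()` covers both, while the separate protocol-level object keeps a pointer to the embedded member, matching the note that the reference in struct sock stays a reference.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified wait-queue state. */
struct hyp_socket_wq {
	unsigned long flags;
	void *fasync_list;
};

/* After the change: the wait queue lives at the end of the socket itself. */
struct hyp_socket {
	int state;
	struct hyp_socket_wq wq; /* embedded, so no second allocation */
};

/* The protocol-level object still refers to the wait queue by pointer. */
struct hyp_sock {
	struct hyp_socket_wq *sk_wq;
};

/* One allocation instead of two; returns NULL on failure. */
struct hyp_socket *hyp_sock_alloc(void)
{
	return calloc(1, sizeof(struct hyp_socket));
}

/* Point the sock's reference at the wait queue embedded in the socket. */
void hyp_sock_attach(struct hyp_sock *sk, struct hyp_socket *sock)
{
	sk->sk_wq = &sock->wq;
}

/* A single free() releases both the socket and its wait queue. */
void hyp_sock_release(struct hyp_socket *sock)
{
	free(sock);
}
```

This sketch ignores RCU entirely; in the kernel, freeing the wait queue together with the socket is only safe because the socket's own freeing is RCU-delayed, as the commit message explains.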
  8. May 21, 2019
  9. Feb 22, 2019
    • net: Don't set transport offset to invalid value · d2aa125d
      Maxim Mikityanskiy authored
      
      If the socket was created with socket(AF_PACKET, SOCK_RAW, 0),
      skb->protocol will be unset, __skb_flow_dissect() will fail, and
      skb_probe_transport_header() will fall back to the offset_hint, making
      the resulting skb_transport_offset incorrect.
      
      If, however, there is no transport header in the packet,
      transport_header shouldn't be set to an arbitrary value.
      
      Fix it by leaving the transport offset unset if it couldn't be found,
      which is explicit, rather than filling it with some wrong value. This
      changes behavior, but any code that relied on the old behavior was
      already broken, as the old behavior is incorrect.
      
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
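A sketch of the behavioral change, under stated assumptions: `hyp_skb`, `HYP_OFFSET_UNSET` and `hyp_probe_transport_header()` are hypothetical, and the flow-dissector result is passed in as a flag rather than computed. The point is the control flow: when dissection fails, the offset stays in its "unset" state instead of being overwritten with a guessed hint.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sentinel meaning "transport header not set". */
#define HYP_OFFSET_UNSET UINT16_MAX

struct hyp_skb {
	uint16_t transport_header;
};

static inline bool hyp_transport_header_was_set(const struct hyp_skb *skb)
{
	return skb->transport_header != HYP_OFFSET_UNSET;
}

/*
 * dissect_ok/thoff stand in for the result of flow dissection.
 * Old behavior (not modeled): fall back to an offset hint on failure.
 * New behavior: only set the offset when dissection found one.
 */
void hyp_probe_transport_header(struct hyp_skb *skb, bool dissect_ok,
				uint16_t thoff)
{
	if (hyp_transport_header_was_set(skb))
		return; /* already known; nothing to probe */
	if (dissect_ok)
		skb->transport_header = thoff;
	/* else: leave it unset rather than guess */
}
```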
  10. Jan 01, 2019
  11. Dec 14, 2018
  12. Sep 21, 2018
  13. Sep 13, 2018
    • tap: accept an array of XDP buffs through sendmsg() · 0efac277
      Jason Wang authored
      
      This patch implements the TUN_MSG_PTR msg_control type. This type allows
      the caller to pass an array of XDP buffs to tuntap through the ptr field
      of struct tun_msg_ctl. Tap will build skbs from those XDP buffers.
      
      This avoids many indirect calls, which improves icache
      utilization, and allows batched XDP flushing when doing XDP
      redirection.
      
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
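To illustrate why passing an array through one `sendmsg()` call helps, here is a hypothetical sketch: the caller fills a control block whose `ptr` points at an array of buffers together with a count, and the receiver walks the array in a single loop with one flush at the end instead of one per buffer. `hyp_xdp_buff`, `hyp_batch_ctl` (including its `num` field), `hyp_stats` and `hyp_consume_batch()` are illustrative names, not the tap driver's actual definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for an XDP buffer: just a data pointer and length. */
struct hyp_xdp_buff {
	void *data;
	size_t len;
};

/* Hypothetical control block: an array of buffers plus its length. */
struct hyp_batch_ctl {
	unsigned short num;
	struct hyp_xdp_buff *ptr;
};

/* Counters so the effect of batching is observable. */
struct hyp_stats {
	size_t packets;
	size_t bytes;
	size_t flushes;
};

/* Build one "packet" per buffer, then flush once for the whole batch. */
void hyp_consume_batch(const struct hyp_batch_ctl *ctl, struct hyp_stats *st)
{
	for (unsigned short i = 0; i < ctl->num; i++) {
		st->packets++;
		st->bytes += ctl->ptr[i].len;
	}
	if (ctl->num > 0)
		st->flushes++; /* one flush per call, not per buffer */
}
```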
    • tun: switch to new type of msg_control · fe8dd45b
      Jason Wang authored
      
      This patch introduces a new tun/tap-specific msg_control:
      
      #define TUN_MSG_UBUF 1
      #define TUN_MSG_PTR  2
      struct tun_msg_ctl {
             int type;
             void *ptr;
      };
      
      This allows us to pass different kinds of msg_control through
      sendmsg(). The first supported type is ubuf (TUN_MSG_UBUF), which will
      be used by the existing vhost_net zerocopy code. The second is XDP
      buff (TUN_MSG_PTR), which allows vhost_net to pass XDP buffs to TUN.
      Following patches can use this to accept an array of XDP buffs from
      vhost_net.
      
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
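The receiving side has to inspect `type` before interpreting `ptr`. The sketch below reuses the `TUN_MSG_UBUF`/`TUN_MSG_PTR` values and the `struct tun_msg_ctl` layout quoted in the commit message; the `hyp_ctl_kind` enum and `hyp_classify_ctl()` are hypothetical and only show the dispatch, not what tun actually does with each payload.

```c
#include <stddef.h>

#define TUN_MSG_UBUF 1
#define TUN_MSG_PTR  2

/* Layout as quoted in the commit message. */
struct tun_msg_ctl {
	int type;
	void *ptr;
};

/* Hypothetical classification of what a sendmsg() caller handed over. */
enum hyp_ctl_kind {
	HYP_CTL_NONE,     /* no msg_control: plain copy from the iovec */
	HYP_CTL_ZEROCOPY, /* ptr is a zerocopy ubuf (vhost_net) */
	HYP_CTL_XDP,      /* ptr refers to XDP buffer(s) */
	HYP_CTL_INVALID,  /* unknown type: assumed to be rejected */
};

enum hyp_ctl_kind hyp_classify_ctl(const struct tun_msg_ctl *ctl)
{
	if (ctl == NULL)
		return HYP_CTL_NONE;
	switch (ctl->type) {
	case TUN_MSG_UBUF:
		return HYP_CTL_ZEROCOPY;
	case TUN_MSG_PTR:
		return HYP_CTL_XDP;
	default:
		return HYP_CTL_INVALID;
	}
}
```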
  14. Jun 07, 2018
    • net: in virtio_net_hdr only add VLAN_HLEN to csum_start if payload holds vlan · fd3a8862
      Willem de Bruijn authored
      
      Tun, tap, virtio, packet and uml vector all use struct virtio_net_hdr
      to communicate packet metadata to userspace.
      
      For skbuffs with vlan, the first two return the packet as it may have
      existed on the wire, inserting the VLAN tag in the user buffer.  Then
      virtio_net_hdr.csum_start needs to be adjusted by VLAN_HLEN bytes.
      
      Commit f09e2249 ("macvtap: restore vlan header on user read")
      added this feature to macvtap. Commit 3ce9b20f ("macvtap: Fix
      csum_start when VLAN tags are present") then fixed up csum_start.
      
      Virtio, packet and uml do not insert the vlan header in the user
      buffer.
      
      When introducing virtio_net_hdr_from_skb to deduplicate filling in
      the virtio_net_hdr, the variant from macvtap which adds VLAN_HLEN was
      applied uniformly, breaking csum offset for packets with vlan on
      virtio and packet.
      
      Make insertion of VLAN_HLEN optional. Convert the callers to pass it
      when needed.
      
      Fixes: e858fae2 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
      Fixes: 1276f24e ("packet: use common code for virtio_net_hdr and skb GSO conversion")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
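The arithmetic is small but easy to get wrong. In this sketch (hypothetical helper names; `VLAN_HLEN` is the 4-byte 802.1Q tag length), `csum_start` is measured from the start of the buffer handed to userspace, so it grows by `VLAN_HLEN` only when the driver actually inserted the tag into that buffer, as tun/tap do and virtio/packet do not.

```c
#include <stdbool.h>
#include <stdint.h>

#define VLAN_HLEN 4 /* bytes in an 802.1Q VLAN tag */

/*
 * Hypothetical helper: how many extra bytes precede the payload in the
 * user buffer. Only devices that re-insert the tag (tun/tap) add them.
 */
int hyp_vlan_hlen(bool skb_has_vlan_tag, bool driver_inserts_tag)
{
	return (skb_has_vlan_tag && driver_inserts_tag) ? VLAN_HLEN : 0;
}

/* csum_start as seen by userspace: the skb's offset plus any inserted tag. */
uint16_t hyp_csum_start(uint16_t skb_csum_start_offset, int vlan_hlen)
{
	return (uint16_t)(skb_csum_start_offset + vlan_hlen);
}
```

For example, with a 14-byte Ethernet header and a 20-byte IPv4 header the skb offset is 34; tap reports 38 for a tagged packet, while packet sockets keep reporting 34.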
  15. Feb 11, 2018
    • vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds authored
      
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But the keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
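Why the distinction matters in practice: code that moves readiness bits between poll(2)-style and epoll-style masks should not assume the two constant sets are numerically identical. A minimal userspace sketch, assuming glibc's `<poll.h>` and `<sys/epoll.h>`; `hyp_poll_to_epoll()` is a hypothetical helper that translates bit by bit instead of copying the mask.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Translate each POLL* bit to its EPOLL* counterpart explicitly. */
uint32_t hyp_poll_to_epoll(short revents)
{
	uint32_t ev = 0;

	if (revents & POLLIN)
		ev |= EPOLLIN;
	if (revents & POLLOUT)
		ev |= EPOLLOUT;
	if (revents & POLLPRI)
		ev |= EPOLLPRI;
	if (revents & POLLERR)
		ev |= EPOLLERR;
	if (revents & POLLHUP)
		ev |= EPOLLHUP;
	return ev;
}
```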
  16. Jan 29, 2018
  17. Jan 09, 2018
  18. Dec 03, 2017
  19. Nov 28, 2017
  20. Nov 23, 2017
    • net: accept UFO datagrams from tuntap and packet · 0c19f846
      Willem de Bruijn authored

      Tuntap and similar devices can inject GSO packets. Accept type
      VIRTIO_NET_HDR_GSO_UDP, even though the stack no longer generates UFO natively.
      
      Processes are expected to use feature negotiation such as TUNSETOFFLOAD
      to detect supported offload types and refrain from injecting other
      packets. This process breaks down with live migration: guest kernels
      do not renegotiate flags, so destination hosts need to expose all
      features that the source host does.
      
      Partially revert the UFO removal from 182e0b6b~1..d9d30adf.
      This patch introduces nearly(*) no new code to simplify verification.
      It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
      insertion and software UFO segmentation.
      
      It does not reinstate protocol stack support, hardware offload
      (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
      of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.
      
      To support SKB_GSO_UDP reappearing in the stack, also reinstate
      logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
      by squashing in commit 93991221 ("net: skb_needs_check() removes
      CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee6
      ("net: avoid skb_warn_bad_offload false positives on UFO").
      
      (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
      ipv6_proxy_select_ident is changed to return a __be32 and this is
      assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
      at the end of the enum to minimize code churn.
      
      Tested
        Booted a v4.13 guest kernel with QEMU. On a host kernel before this
        patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
        enabled, same as on a v4.13 host kernel.
      
        A UFO packet sent from the guest appears on the tap device:
          host:
            nc -l -u -p 8000 &
            tcpdump -n -i tap0
      
          guest:
            dd if=/dev/zero of=payload.txt bs=1 count=2000
            nc -u 192.16.1.1 8000 < payload.txt
      
        Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
        packets arriving fragmented:
      
          ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
          (from https://github.com/wdebruij/kerneltools/tree/master/tests)
      
      Changes
        v1 -> v2
          - simplified set_offload change (review comment)
          - documented test procedure
      
      Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com>
      Fixes: fb652fdf ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
      Reported-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
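A sketch of the acceptance check this commit restores, assuming the `VIRTIO_NET_HDR_GSO_*` values from the Linux uapi header `<linux/virtio_net.h>` (copied here as `HYP_` constants so the example builds without kernel headers). The `hyp_gso_kind` enum and `hyp_gso_from_virtio()` are hypothetical: they show a header-to-GSO-type mapping that accepts `VIRTIO_NET_HDR_GSO_UDP` again alongside TCP, while rejecting unknown types.

```c
#include <stdint.h>

/* Values mirror VIRTIO_NET_HDR_GSO_* in <linux/virtio_net.h>. */
#define HYP_GSO_NONE  0
#define HYP_GSO_TCPV4 1
#define HYP_GSO_UDP   3
#define HYP_GSO_TCPV6 4
#define HYP_GSO_ECN   0x80 /* flag bit that may be OR-ed into the type */

enum hyp_gso_kind {
	HYP_KIND_NONE,
	HYP_KIND_TCPV4,
	HYP_KIND_TCPV6,
	HYP_KIND_UDP,
	HYP_KIND_REJECT,
};

/* Map the header's gso_type to a kind, ignoring the ECN flag bit. */
enum hyp_gso_kind hyp_gso_from_virtio(uint8_t gso_type)
{
	switch (gso_type & ~HYP_GSO_ECN) {
	case HYP_GSO_NONE:
		return HYP_KIND_NONE;
	case HYP_GSO_TCPV4:
		return HYP_KIND_TCPV4;
	case HYP_GSO_TCPV6:
		return HYP_KIND_TCPV6;
	case HYP_GSO_UDP:
		return HYP_KIND_UDP; /* accepted again; segmented in software */
	default:
		return HYP_KIND_REJECT;
	}
}
```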
  21. Nov 01, 2017
    • tun/tap: sanitize TUNSETSNDBUF input · 93161922
      Craig Gallek authored
      
      Syzkaller found several variants of the lockup below by setting negative
      values with the TUNSETSNDBUF ioctl.  This patch adds a sanity check
      to both the tun and tap versions of this ioctl.
      
        watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [repro:2389]
        Modules linked in:
        irq event stamp: 329692056
        hardirqs last  enabled at (329692055): [<ffffffff824b8381>] _raw_spin_unlock_irqrestore+0x31/0x75
        hardirqs last disabled at (329692056): [<ffffffff824b9e58>] apic_timer_interrupt+0x98/0xb0
        softirqs last  enabled at (35659740): [<ffffffff824bc958>] __do_softirq+0x328/0x48c
        softirqs last disabled at (35659731): [<ffffffff811c796c>] irq_exit+0xbc/0xd0
        CPU: 0 PID: 2389 Comm: repro Not tainted 4.14.0-rc7 #23
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        task: ffff880009452140 task.stack: ffff880006a20000
        RIP: 0010:_raw_spin_lock_irqsave+0x11/0x80
        RSP: 0018:ffff880006a27c50 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff10
        RAX: ffff880009ac68d0 RBX: ffff880006a27ce0 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffff880006a27ce0 RDI: ffff880009ac6900
        RBP: ffff880006a27c60 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000001 R11: 000000000063ff00 R12: ffff880009ac6900
        R13: ffff880006a27cf8 R14: 0000000000000001 R15: ffff880006a27cf8
        FS:  00007f4be4838700(0000) GS:ffff88000cc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000020101000 CR3: 0000000009616000 CR4: 00000000000006f0
        Call Trace:
         prepare_to_wait+0x26/0xc0
         sock_alloc_send_pskb+0x14e/0x270
         ? remove_wait_queue+0x60/0x60
         tun_get_user+0x2cc/0x19d0
         ? __tun_get+0x60/0x1b0
         tun_chr_write_iter+0x57/0x86
         __vfs_write+0x156/0x1e0
         vfs_write+0xf7/0x230
         SyS_write+0x57/0xd0
         entry_SYSCALL_64_fastpath+0x1f/0xbe
        RIP: 0033:0x7f4be4356df9
        RSP: 002b:00007ffc18101c08 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
        RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4be4356df9
        RDX: 0000000000000046 RSI: 0000000020101000 RDI: 0000000000000005
        RBP: 00007ffc18101c40 R08: 0000000000000001 R09: 0000000000000001
        R10: 0000000000000001 R11: 0000000000000293 R12: 0000559c75f64780
        R13: 00007ffc18101d30 R14: 0000000000000000 R15: 0000000000000000
      
      Fixes: 33dccbb0 ("tun: Limit amount of queued packets per device")
      Fixes: 20d29d7a ("net: macvtap driver")
      Signed-off-by: Craig Gallek <kraig@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
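The shape of the sanity check, as a hypothetical standalone helper: the value copied from userspace is an `int`, so a negative input must be refused before it is used as a send-buffer size. The name `hyp_check_sndbuf()` is illustrative, and treating zero as invalid as well is an assumption of this sketch.

```c
#include <errno.h>

/*
 * Validate a TUNSETSNDBUF-style argument before storing it.
 * Returns 0 on success, -EINVAL for values that cannot be a buffer size.
 * On error, *out is left untouched so the previous limit stays in place.
 */
int hyp_check_sndbuf(int sndbuf, int *out)
{
	if (sndbuf <= 0)
		return -EINVAL;
	*out = sndbuf;
	return 0;
}
```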
  22. Oct 28, 2017
    • tap: reference to KVA of an unloaded module causes kernel panic · dea6e19f
      Girish Moodalbail authored
      
      The commit 9a393b5d ("tap: tap as an independent module") created a
      separate tap module that implements tap functionality and exports
      interfaces that will be used by the macvtap and ipvtap modules to create
      their respective tap devices.
      
      However, that patch introduced a regression wherein the modules macvtap
      and ipvtap can be removed (through modprobe -r) while there are
      applications using the respective /dev/tapX devices. These applications
      cause the kernel to hold a reference to /dev/tapX through 'struct cdev
      macvtap_cdev' and 'struct cdev ipvtap_dev', defined in the macvtap and
      ipvtap modules respectively. So when the application is later closed,
      the kernel panics because it references kernel virtual addresses (KVA)
      that were in the now-unloaded modules.
      
      ----------8<------- Example ----------8<----------
      $ sudo ip li add name mv0 link enp7s0 type macvtap
      $ sudo ip li show mv0 |grep mv0| awk -e '{print $1 $2}'
        14:mv0@enp7s0:
      $ cat /dev/tap14 &
      $ lsmod |egrep -i 'tap|vlan'
      macvtap                16384  0
      macvlan                24576  1 macvtap
      tap                    24576  3 macvtap
      $ sudo modprobe -r macvtap
      $ fg
      cat /dev/tap14
      ^C
      
      <...system panics...>
      BUG: unable to handle kernel paging request at ffffffffa038c500
      IP: cdev_put+0xf/0x30
      ----------8<-----------------8<----------
      
      The fix is to set cdev.owner to the module that creates the tap device
      (either macvtap or ipvtap). With this set, the char device operations
      (in fs/char_dev.c) hold and release the module through cdev_get() and
      cdev_put() and will not allow the module to unload prematurely.
      
      Fixes: 9a393b5d ("tap: tap as an independent module")
      Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
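A userspace model of why setting the owner fixes the panic, under loudly stated assumptions: `hyp_module`, `hyp_cdev` and the functions below are hypothetical stand-ins for `struct module`, `struct cdev`, `cdev_get()`/`cdev_put()`, and the refcount check that blocks `modprobe -r`. While an open file holds a reference through `owner`, unloading fails; with `owner == NULL` (the bug), nothing pins the module.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct hyp_module {
	int refcnt;
	bool loaded;
};

struct hyp_cdev {
	struct hyp_module *owner; /* the fix: point this at the creating module */
};

/* Called on open(): pin the owning module, if any. */
void hyp_cdev_get(struct hyp_cdev *c)
{
	if (c->owner)
		c->owner->refcnt++;
}

/* Called on the final close(): drop the pin. */
void hyp_cdev_put(struct hyp_cdev *c)
{
	if (c->owner)
		c->owner->refcnt--;
}

/* Model of module removal: refuse while references are held. */
int hyp_try_unload(struct hyp_module *m)
{
	if (m->refcnt > 0)
		return -EBUSY;
	m->loaded = false;
	return 0;
}
```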
  23. Oct 26, 2017
  24. Oct 25, 2017
    • locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() · 6aa7de05
      Mark Rutland authored and Ingo Molnar committed
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
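For readers outside the kernel, a simplified userspace sketch of what the two accessors do for scalar types: each forces exactly one volatile access, so the compiler cannot merge, cache in a register, or re-read it. The real kernel macros handle more cases, and `__typeof__` is a GCC/Clang extension; treat this as an approximation, not the kernel's definition. The flag `hyp_stop_flag` and its helpers are hypothetical.

```c
/* Simplified scalar-only versions; GCC/Clang __typeof__ extension assumed. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)                               \
	do {                                             \
		*(volatile __typeof__(x) *)&(x) = (val); \
	} while (0)

/* Hypothetical flag polled by one thread and set by another. */
static int hyp_stop_flag;

void hyp_request_stop(void)
{
	WRITE_ONCE(hyp_stop_flag, 1); /* was: ACCESS_ONCE(hyp_stop_flag) = 1 */
}

int hyp_should_stop(void)
{
	return READ_ONCE(hyp_stop_flag); /* was: ACCESS_ONCE(hyp_stop_flag) */
}
```

These accessors only constrain the compiler; they add no memory barriers, so ordering between CPUs still needs explicit primitives.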
  25. Aug 16, 2017
  26. Aug 14, 2017
  27. Jul 17, 2017
  28. Jul 11, 2017
  29. May 18, 2017
  30. Mar 03, 2017
    • sched/headers: Move task_struct::signal and task_struct::sighand types and accessors into <linux/sched/signal.h> · c3edc401
      Ingo Molnar authored
      
      task_struct::signal and task_struct::sighand are pointers, which would normally make it
      straightforward to not define those types in sched.h.
      
      That is not so, because the types are accompanied by a myriad of APIs (macros and inline
      functions) that dereference them.
      
      Split the types and the APIs out of sched.h and move them into a new header, <linux/sched/signal.h>.
      
      With this change sched.h no longer knows about 'struct signal' and 'struct sighand';
      trying to put accessors into sched.h as a test fails as follows:
      
        ./include/linux/sched.h: In function ‘test_signal_types’:
        ./include/linux/sched.h:2461:18: error: dereferencing pointer to incomplete type ‘struct signal_struct’
                          ^
      
      This reduces the size and complexity of sched.h significantly.
      
      Update all headers and .c code that relied on getting the signal handling
      functionality from <linux/sched.h> to include <linux/sched/signal.h>.
      
      The list of affected files in the preparatory patch was partly generated by
      grepping for the APIs, and partly by doing coverage build testing, both
      all[yes|mod|def|no]config builds on 64-bit and 32-bit x86, and an array of
      cross-architecture builds.
      
      Nevertheless some (trivial) build breakage is still expected related to rare
      Kconfig combinations and in-flight patches to various kernel code, but most
      of it should be handled by this patch.
      
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
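The mechanism the split relies on can be shown in one file: a pointer to an incomplete struct type is allowed in a struct definition, but dereferencing it needs the full definition, which is the error quoted above. In this hypothetical sketch, the "sched.h" part only forward-declares `struct hyp_signal`, and the accessor comes after the full definition, as it would in a separate signal header.

```c
#include <stddef.h>

/* --- "sched.h" part: forward declaration only --- */
struct hyp_signal; /* incomplete type */

struct hyp_task {
	struct hyp_signal *signal; /* OK: pointer to incomplete type */
};

/*
 * Dereferencing task->signal at this point would fail with
 * "dereferencing pointer to incomplete type".
 */

/* --- "sched/signal.h" part: full definition plus accessors --- */
struct hyp_signal {
	int group_exit_code;
};

static inline int hyp_task_exit_code(const struct hyp_task *t)
{
	return t->signal ? t->signal->group_exit_code : 0;
}
```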
  31. Feb 12, 2017