  1. Feb 11, 2021
    • Revert "net-loopback: set lo dev initial state to UP" · 1edb5cbf
      Petr Machata authored
      
      In commit c9dca822 ("net-loopback: set lo dev initial state to UP"),
      Linux started automatically bringing up the loopback device of a newly
      created namespace. However, an existing user script might reasonably have
      the following stanza when creating a new namespace -- and in fact at least
      tools/testing/selftests/net/fib_nexthops.sh in Linux's very own testsuite
      does:
      
       # set -e
       # ip netns add foo
       # ip -netns foo addr add 127.0.0.1/8 dev lo
       # ip -netns foo link set lo up
       # set +e
      
      This will now fail, because the kernel reasonably rejects "ip addr add" of
      a duplicate address. The described change of behavior therefore constitutes
      a breakage. Revert it.
      
      Fixes: c9dca822 ("net-loopback: set lo dev initial state to UP")
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
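Because lo may or may not already be up in a fresh namespace, depending on whether c9dca822 is present in the running kernel, a tool that has to work on both kinds of kernel can check the interface state instead of assuming it. The following is a minimal userspace sketch, assuming Linux with glibc headers; it reads the interface flags with the SIOCGIFFLAGS ioctl, and the helper name `iface_is_up` is hypothetical.

```c
#define _DEFAULT_SOURCE /* expose struct ifreq and IFF_* in glibc's <net/if.h> */
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the named interface is administratively
 * up, 0 if it is down, and -1 on error (e.g. no such interface). */
int iface_is_up(const char *name)
{
    struct ifreq ifr;
    int fd, ret;

    if (strlen(name) >= IFNAMSIZ)
        return -1;

    /* Any socket serves as a handle for interface ioctls. */
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    memcpy(ifr.ifr_name, name, strlen(name) + 1);

    ret = ioctl(fd, SIOCGIFFLAGS, &ifr);
    close(fd);
    if (ret < 0)
        return -1;

    return (ifr.ifr_flags & IFF_UP) ? 1 : 0;
}
```

A setup tool could, for example, skip its own `ip addr add 127.0.0.1/8 dev lo` step when `iface_is_up("lo")` returns 1, rather than relying on the kernel's default behavior for new namespaces.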
  2. Feb 05, 2021
  3. Nov 08, 2019
  4. Jul 03, 2019
    • loopback: fix lockdep splat · d62962b3
      Mahesh Bandewar authored
      
      dev_init_scheduler() and dev_activate() expect the caller to
      hold RTNL. Since we don't want the blackhole device to be initialized
      per namespace, we initialize it at init time.
      
      [    3.855027] Call Trace:
      [    3.855034]  dump_stack+0x67/0x95
      [    3.855037]  lockdep_rcu_suspicious+0xd5/0x110
      [    3.855044]  dev_init_scheduler+0xe3/0x120
      [    3.855048]  ? net_olddevs_init+0x60/0x60
      [    3.855050]  blackhole_netdev_init+0x45/0x6e
      [    3.855052]  do_one_initcall+0x6c/0x2fa
      [    3.855058]  ? rcu_read_lock_sched_held+0x8c/0xa0
      [    3.855066]  kernel_init_freeable+0x1e5/0x288
      [    3.855071]  ? rest_init+0x260/0x260
      [    3.855074]  kernel_init+0xf/0x180
      [    3.855076]  ? rest_init+0x260/0x260
      [    3.855078]  ret_from_fork+0x24/0x30
      
      Fixes: 4de83b88 ("loopback: create blackhole net device similar to loopack.")
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
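The splat is about a locking contract: functions that expect RTNL must be called with it held, even from an initcall where nothing else competes for the lock. The toy model below illustrates that contract in a single thread. It assumes the fix amounts to taking the lock around those calls in the init path; all names are hypothetical, and a plain counter stands in for the real mutex and for lockdep's check.

```c
#include <stdio.h>

/* Toy stand-in for RTNL: a depth counter rather than a real mutex, enough
 * to model the "caller must hold the lock" contract in one thread. */
static int rtnl_depth;

static void toy_rtnl_lock(void)   { rtnl_depth++; }
static void toy_rtnl_unlock(void) { rtnl_depth--; }

/* Models a function like dev_init_scheduler(): it does not take the lock
 * itself, it only checks that the caller already holds it. */
static int toy_init_scheduler(void)
{
    if (rtnl_depth == 0) {
        fprintf(stderr, "toy_init_scheduler: RTNL not held\n");
        return -1;
    }
    return 0;
}

/* Init path that honors the contract: lock, call, unlock. */
int toy_blackhole_init(void)
{
    int err;

    toy_rtnl_lock();
    err = toy_init_scheduler();
    toy_rtnl_unlock();
    return err;
}

/* Init path that calls in without the lock, as in the reported trace. */
int toy_blackhole_init_unlocked(void)
{
    return toy_init_scheduler();
}
```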
  5. Jul 02, 2019
  6. May 30, 2019
  7. Apr 12, 2019
  8. Oct 20, 2018
  9. Sep 14, 2018
  10. Mar 27, 2018
  11. Feb 13, 2018
    • net: Convert loopback_net_ops · 9a4d105d
      Kirill Tkhai authored
      
      These pernet_operations have only an init() method. It allocates
      memory for the net_device, calls register_netdev() and assigns
      net::loopback_dev.
      
      register_netdev() may be used without additional locks,
      as it's synchronized on rtnl_lock(). There are many examples
      of using this function directly from ioctl().
      
      The only difference, compared to ioctl(), is that net is not
      completely alive at this moment. But it looks like there is
      no way for parallel pernet_operations to dereference
      the net_device, as most of the struct net_device lists
      where it's linked are related to net, and the net is not linked yet.
      
      The exceptions are net_device::unreg_list, close_list, todo_list,
      used for unregistration, and ::link_watch_list, where net_device
      may be linked to global lists.
      
      Unregistration of loopback_dev obviously can't happen while
      loopback_net_init() is executing, as the net is alive. It occurs
      in default_device_ops, which currently requires net_mutex,
      and it behaves as a barrier at the moment. It will be considered
      in the next patch.
      
      Speaking about link_watch_list, it seems there is no way
      for loopback_dev at the time of registration to be linked in lweventlist
      and be available to other pernet_operations.
      
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. Jun 07, 2017
    • net: Fix inconsistent teardown and release of private netdev state. · cf124db5
      David S. Miller authored
      
      Network devices can allocate resources and private memory using
      netdev_ops->ndo_init().  However, the release of these resources
      can occur in one of two different places.
      
      Either netdev_ops->ndo_uninit() or netdev->destructor().
      
      The decision of which operation frees the resources depends upon
      whether it is necessary for all netdev refs to be released before it
      is safe to perform the freeing.
      
      netdev_ops->ndo_uninit() presumably can occur right after the
      NETDEV_UNREGISTER notifier completes and the unicast and multicast
      address lists are flushed.
      
      netdev->destructor(), on the other hand, does not run until the
      netdev references all go away.
      
      Further complicating the situation is that netdev->destructor()
      almost universally also does a free_netdev().
      
      This creates a problem for the logic in register_netdevice(),
      because all callers of register_netdevice() manage the freeing
      of the netdev, and invoke free_netdev(dev) if register_netdevice()
      fails.
      
      If netdev_ops->ndo_init() succeeds, but something else fails inside
      of register_netdevice(), it does call netdev_ops->ndo_uninit().  But
      it is not able to invoke netdev->destructor().
      
      This is because netdev->destructor() will do a free_netdev() and
      then the caller of register_netdevice() will do the same.
      
      However, this means that the resources that would normally be released
      by netdev->destructor() will not be.
      
      Over the years drivers have added local hacks to deal with this, by
      invoking their destructor parts by hand when register_netdevice()
      fails.
      
      Many drivers do not try to deal with this, and instead we have leaks.
      
      Let's close this hole by formalizing the distinction between what
      private things need to be freed up by netdev->destructor() and whether
      the driver needs unregister_netdevice() to perform the free_netdev().
      
      netdev->priv_destructor() performs all actions to free up the private
      resources that used to be freed by netdev->destructor(), except for
      free_netdev().
      
      netdev->needs_free_netdev is a boolean that indicates whether
      free_netdev() should be done at the end of unregister_netdevice().
      
      Now, register_netdevice() can sanely release all resources after
      netdev_ops->ndo_init() succeeds, by invoking both netdev_ops->ndo_uninit()
      and netdev->priv_destructor().
      
      And at the end of unregister_netdevice(), we invoke
      netdev->priv_destructor() and optionally call free_netdev().
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
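The new split of responsibilities can be modeled in a few lines of userspace C: a private destructor that frees only private state, plus a flag that says whether unregistration should also free the device. This is a simplified sketch with hypothetical names (`toy_netdev`, `toy_register`, `toy_unregister`); it is not the kernel's net_device code, and ndo_uninit() appears only as a comment.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Simplified, hypothetical model of the lifecycle described above. */
struct toy_netdev {
    void *priv_buf;                                   /* allocated by ndo_init */
    void (*priv_destructor)(struct toy_netdev *dev);  /* frees private state only */
    bool needs_free_netdev;                           /* free dev on unregister? */
};

static int frees; /* counts free_netdev()-equivalent calls */

static void toy_free_netdev(struct toy_netdev *dev) { frees++; free(dev); }

static void toy_priv_destructor(struct toy_netdev *dev)
{
    free(dev->priv_buf);
    dev->priv_buf = NULL;
}

static int toy_ndo_init(struct toy_netdev *dev)
{
    dev->priv_buf = malloc(64);
    return dev->priv_buf ? 0 : -1;
}

/* If a step after ndo_init fails, release the private state but leave
 * the final free to the caller, which owns the device on failure. */
int toy_register(struct toy_netdev *dev, bool later_step_fails)
{
    if (toy_ndo_init(dev))
        return -1;
    if (later_step_fails) {
        /* ndo_uninit() would also run here */
        if (dev->priv_destructor)
            dev->priv_destructor(dev);
        return -1;
    }
    return 0;
}

/* Always run priv_destructor; free the device only if asked to. */
void toy_unregister(struct toy_netdev *dev)
{
    if (dev->priv_destructor)
        dev->priv_destructor(dev);
    if (dev->needs_free_netdev)
        toy_free_netdev(dev);
}
```

The point of the design is that the failure path and the unregister path now run the same private cleanup, and only the final free differs.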
  13. Mar 21, 2017
  14. Feb 08, 2017
  15. Jan 08, 2017
  16. Dec 24, 2016
  17. Jun 03, 2016
  18. Dec 15, 2015
  19. Aug 18, 2015
  20. Oct 07, 2014
    • net: better IFF_XMIT_DST_RELEASE support · 02875878
      Eric Dumazet authored
      
      Testing xmit_more support with netperf and connected UDP sockets,
      I found strange dst refcount false sharing.
      
      Current handling of IFF_XMIT_DST_RELEASE is not optimal.
      
      Dropping dst in validate_xmit_skb() is certainly too late in case
      a packet was queued by CPU X but dequeued by CPU Y.
      
      The logical point to take care of drop/force is in __dev_queue_xmit()
      before even taking qdisc lock.
      
      As Julian Anastasov pointed out, the need for skb_dst() might come from some
      packet schedulers or classifiers.
      
      This patch adds a new helper to cleanly express the needs of various drivers
      or qdiscs/classifiers.
      
      Drivers that need skb_dst() in their ndo_start_xmit() should call
      the following helper in their setup instead of the prior code:
      
      	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
      ->
      	netif_keep_dst(dev);
      
      Instead of using a single bit, we use two bits, one of which may
      be rebuilt by the bonding/team drivers.
      
      The other one is permanent and prevents IFF_XMIT_DST_RELEASE from being
      rebuilt in bonding/team. We could add something smarter later.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
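The two-bit scheme can be sketched as plain bit manipulation. This assumes that netif_keep_dst() clears both the rebuildable bit and the permanent bit, and that a bonding/team master re-enables release only when its permanent bit is still set and every slave releases dst. The `TOY_*` bit values and function names are hypothetical; the kernel defines its own flags.

```c
#include <stdbool.h>

/* Hypothetical bit values, not the kernel's. */
#define TOY_XMIT_DST_RELEASE      (1u << 0) /* may be rebuilt by bond/team */
#define TOY_XMIT_DST_RELEASE_PERM (1u << 1) /* cleared only by keep_dst */

/* Analogue of calling netif_keep_dst() in a driver's setup: clear both
 * bits so a master device can never turn release back on. */
static void toy_keep_dst(unsigned int *priv_flags)
{
    *priv_flags &= ~(TOY_XMIT_DST_RELEASE | TOY_XMIT_DST_RELEASE_PERM);
}

/* Sketch of a master recomputing its flag: release dst only if it is
 * still permitted to and every slave also releases. */
static void toy_master_recompute(unsigned int *master,
                                 const unsigned int *slaves, int n)
{
    bool all_release = true;

    for (int i = 0; i < n; i++)
        if (!(slaves[i] & TOY_XMIT_DST_RELEASE))
            all_release = false;

    *master &= ~TOY_XMIT_DST_RELEASE;
    if ((*master & TOY_XMIT_DST_RELEASE_PERM) && all_release)
        *master |= TOY_XMIT_DST_RELEASE;
}
```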
  21. Jul 15, 2014
    • net: set name_assign_type in alloc_netdev() · c835a677
      Tom Gundersen authored
      
      Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
      all users to pass NET_NAME_UNKNOWN.
      
      Coccinelle patch:
      
      @@
      expression sizeof_priv, name, setup, txqs, rxqs, count;
      @@
      
      (
      -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
      +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
      |
      -alloc_netdev_mq(sizeof_priv, name, setup, count)
      +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
      |
      -alloc_netdev(sizeof_priv, name, setup)
      +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
      )
      
      v9: move comments here from the wrong commit
      
      Signed-off-by: Tom Gundersen <teg@jklm.no>
      Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. Mar 15, 2014
  23. Feb 25, 2014
    • loopback: sctp: add NETIF_F_SCTP_CSUM to device features · b17c7069
      Daniel Borkmann authored
      
      Drivers are allowed to set NETIF_F_SCTP_CSUM if they have
      hardware crc32c checksumming support for the SCTP protocol.
      Currently, NETIF_F_SCTP_CSUM flag is available in igb,
      ixgbe, i40e/i40evf drivers and for vlan devices.
      
      If we don't have NETIF_F_SCTP_CSUM, then crc32c is computed
      with CPU instructions invoked from the crypto layer or, if those
      are not available, by a slow-path software fallback.
      
      Currently, loopback device propagates checksum offloading
      feature flags in dev->features, but is missing SCTP checksum
      offloading. Therefore, account for NETIF_F_SCTP_CSUM as
      well.
      
      Before patch:
      
      ./netperf_sctp -H 192.168.0.100 -t SCTP_STREAM_MANY
      SCTP 1-TO-MANY STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.100 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
      4194304 4194304   4096    10.00    4683.50
      
      After patch:
      
      ./netperf_sctp -H 192.168.0.100 -t SCTP_STREAM_MANY
      SCTP 1-TO-MANY STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.100 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
      4194304 4194304   4096    10.00    15348.26
      
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
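To make concrete what the checksum offload flag spares the CPU, here is a reference bitwise CRC32c (Castagnoli, reflected polynomial 0x82F63B78), the checksum SCTP uses. This is a generic textbook sketch, not the kernel's crypto-layer implementation, which uses table-driven or hardware-accelerated variants; the function name `crc32c_sw` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c: one bit per inner iteration, no lookup tables.
 * Initial value and final XOR are both 0xFFFFFFFF. */
uint32_t crc32c_sw(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            /* Shift right; XOR in the polynomial if the low bit was set. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check value, `crc32c_sw("123456789", 9)`, is 0xE3069283. Eight shift/XOR steps per byte is why the software path is slow compared with letting loopback skip the checksum entirely.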
  24. Feb 14, 2014
  25. Feb 13, 2014
    • net: allow setting mac address of loopback device · 25f929fb
      WANG Cong authored
      
      We are trying to mirror the local traffic from lo to eth0.
      Allowing the MAC address of lo to be set to that of eth0 makes
      the Ethernet addresses in these packets correct, so that
      we don't have to modify the Ethernet header again.
      
      Since usually no one cares about its MAC address (all-zero),
      it is safe to allow those who care to set it.
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. Jan 16, 2014
  27. Nov 06, 2013
    • net: Explicitly initialize u64_stats_sync structures for lockdep · 827da44c
      John Stultz authored and Ingo Molnar committed
      
      In order to enable lockdep on seqcount/seqlock structures, we
      must explicitly initialize any locks.
      
      The u64_stats_sync structure uses a seqcount, and thus we need
      to introduce a u64_stats_init() function and use it to initialize
      the structure.
      
      This unfortunately adds a lot of fairly trivial initialization code
      to a number of drivers. But the benefit of ensuring correctness makes
      this worthwhile.
      
      Because these changes are required for lockdep to be enabled, and the
      changes are quite trivial, I've not yet split this patch out into 30-some
      separate patches, as I figured it would be better to get the various
      maintainers' thoughts on how best to merge this change along with
      the seqcount lockdep enablement.
      
      Feedback would be appreciated!
      
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Mirko Lindner <mlindner@marvell.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Roger Luethi <rl@hellgate.ch>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Simon Horman <horms@verge.net.au>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Wensong Zhang <wensong@linux-vs.org>
      Cc: netdev@vger.kernel.org
      Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
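The structure being initialized is a sequence counter guarding 64-bit counters. A userspace analogue in C11 atomics, assuming a single writer, looks roughly like the sketch below; the names are hypothetical, it is not the kernel's u64_stats_sync API, and it cannot show lockdep itself. `toy_stats_init()` plays the role of the explicit init step the commit adds, and the fence placement follows the standard C11 seqlock pattern.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Toy analogue of u64_stats_sync: a sequence counter guarding two
 * 64-bit counters. Single writer assumed. */
struct toy_stats {
    atomic_uint seq;
    _Atomic uint64_t packets;
    _Atomic uint64_t bytes;
};

/* Explicit initialization instead of relying on zeroed memory. */
void toy_stats_init(struct toy_stats *s)
{
    atomic_init(&s->seq, 0);
    atomic_init(&s->packets, 0);
    atomic_init(&s->bytes, 0);
}

/* Writer: an odd sequence number marks an update in progress. */
void toy_stats_add(struct toy_stats *s, uint64_t len)
{
    unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

    atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->packets,
        atomic_load_explicit(&s->packets, memory_order_relaxed) + 1,
        memory_order_relaxed);
    atomic_store_explicit(&s->bytes,
        atomic_load_explicit(&s->bytes, memory_order_relaxed) + len,
        memory_order_relaxed);
    atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader: retry until it sees an even, unchanged sequence number. */
void toy_stats_fetch(struct toy_stats *s, uint64_t *packets, uint64_t *bytes)
{
    unsigned int start;

    do {
        start = atomic_load_explicit(&s->seq, memory_order_acquire);
        *packets = atomic_load_explicit(&s->packets, memory_order_relaxed);
        *bytes = atomic_load_explicit(&s->bytes, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
    } while ((start & 1u) ||
             start != atomic_load_explicit(&s->seq, memory_order_relaxed));
}
```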
  28. Sep 17, 2013
  29. Jan 27, 2013
    • net: loopback: fix a dst refcounting issue · 794ed393
      Eric Dumazet authored
      
      Ben Greear reported crashes in ip_rcv_finish() on a stress
      test involving many macvlans.
      
      We tracked the bug to a dst use after free. ip_rcv_finish()
      was calling dst->input() and got garbage for dst->input value.
      
      It appears the bug is in the loopback driver, which lacks
      an skb_dst_force() before calling netif_rx().
      
      As a result, a non-refcounted dst, normally protected by an
      RCU read_lock section, was escaping this section and could
      be freed before the packet was processed.
      
        [<ffffffff813a3c4d>] loopback_xmit+0x64/0x83
        [<ffffffff81477364>] dev_hard_start_xmit+0x26c/0x35e
        [<ffffffff8147771a>] dev_queue_xmit+0x2c4/0x37c
        [<ffffffff81477456>] ? dev_hard_start_xmit+0x35e/0x35e
        [<ffffffff8148cfa6>] ? eth_header+0x28/0xb6
        [<ffffffff81480f09>] neigh_resolve_output+0x176/0x1a7
        [<ffffffff814ad835>] ip_finish_output2+0x297/0x30d
        [<ffffffff814ad6d5>] ? ip_finish_output2+0x137/0x30d
        [<ffffffff814ad90e>] ip_finish_output+0x63/0x68
        [<ffffffff814ae412>] ip_output+0x61/0x67
        [<ffffffff814ab904>] dst_output+0x17/0x1b
        [<ffffffff814adb6d>] ip_local_out+0x1e/0x23
        [<ffffffff814ae1c4>] ip_queue_xmit+0x315/0x353
        [<ffffffff814adeaf>] ? ip_send_unicast_reply+0x2cc/0x2cc
        [<ffffffff814c018f>] tcp_transmit_skb+0x7ca/0x80b
        [<ffffffff814c3571>] tcp_connect+0x53c/0x587
        [<ffffffff810c2f0c>] ? getnstimeofday+0x44/0x7d
        [<ffffffff810c2f56>] ? ktime_get_real+0x11/0x3e
        [<ffffffff814c6f9b>] tcp_v4_connect+0x3c2/0x431
        [<ffffffff814d6913>] __inet_stream_connect+0x84/0x287
        [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
        [<ffffffff8108d695>] ? _local_bh_enable_ip+0x84/0x9f
        [<ffffffff8108d6c8>] ? local_bh_enable+0xd/0x11
        [<ffffffff8146763c>] ? lock_sock_nested+0x6e/0x79
        [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
        [<ffffffff814d6b49>] inet_stream_connect+0x33/0x49
        [<ffffffff814632c6>] sys_connect+0x75/0x98
      
      This bug was introduced in linux-2.6.35, in commit
      7fee226a ("net: add a noref bit on skb dst").
      
      skb_dst_force() is enforced in dev_queue_xmit() for devices having a
      qdisc.
      
      Reported-by: Ben Greear <greearb@candelatech.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Tested-by: Ben Greear <greearb@candelatech.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
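The idea behind skb_dst_force() can be shown with a toy model: a "noref" dst pointer is only valid inside the sender's RCU read-side section, so before the packet can outlive that section (as it does when netif_rx() queues it for later processing) the borrowed pointer must become a counted reference. The structures and names below are hypothetical simplifications, and the sketch ignores the case of a dst that is already being freed.

```c
#include <stdbool.h>

/* Hypothetical, simplified objects: not the kernel's sk_buff/dst_entry. */
struct toy_dst { int refcnt; };
struct toy_skb { struct toy_dst *dst; bool dst_noref; };

/* Convert a borrowed (noref) dst into a counted reference so the skb
 * may safely outlive the RCU section that protected the borrow. */
void toy_skb_dst_force(struct toy_skb *skb)
{
    if (skb->dst && skb->dst_noref) {
        skb->dst->refcnt++;
        skb->dst_noref = false;
    }
}
```

Calling it twice is harmless: the second call sees the reference is already counted and does nothing.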
  30. Sep 24, 2012
    • net: loopback: set default mtu to 64K · 0cf833ae
      Eric Dumazet authored
      
      The loopback device's current MTU of 16436 bytes allows no more than
      3 MSS-sized TCP segments per frame, or 48 Kbytes. Changing the MTU to 64K
      allows the TCP stack to build large frames and significantly reduces
      stack overhead.
      
      The performance boost on bulk TCP transfers can be up to 30%, partly
      because we now have one ACK message for two 64KB segments, and a lower
      probability of hitting the /proc/sys/net/ipv4/tcp_reordering default limit.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
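The "3 segments, 48 Kbytes" figure can be reproduced with a short calculation. It assumes IPv4 with 20-byte IP and TCP headers, a 12-byte TCP timestamp option, and the 65535-byte cap imposed by the 16-bit IPv4 total-length field on a single (GSO) packet:

```latex
\begin{aligned}
\text{MSS} &= 16436 - 20 - 20 - 12 = 16384 \text{ bytes} \\
\left\lfloor \frac{65535 - 52}{16384} \right\rfloor &= \left\lfloor \frac{65483}{16384} \right\rfloor = 3 \text{ segments} \\
3 \times 16384 &= 49152 \text{ bytes} = 48 \text{ KiB}
\end{aligned}
```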
  31. Aug 09, 2012
  32. Jul 22, 2012
  33. Mar 28, 2012
  34. Nov 16, 2011
  35. May 08, 2011
    • net: Allow ethtool to set interface in loopback mode. · eed2a12f
      Mahesh Bandewar authored
      
      This patch enables ethtool to set the loopback mode on a given interface.
      By configuring the interface in loopback mode in conjunction with a policy
      route/rule, a userland application can stress the egress/ingress path,
      exercising the code affected by a change in progress and potentially helping
      developers understand its impact without sending a single packet out
      on the network.
      
      The following set of commands illustrates one such example:
          a) ip -4 addr add 192.168.1.1/24 dev eth1
          b) ip -4 rule add from all iif eth1 lookup 250
          c) ip -4 route add local 0/0 dev lo proto kernel scope host table 250
          d) arp -Ds 192.168.1.100 eth1
          e) arp -Ds 192.168.1.200 eth1
          f) sysctl -w net.ipv4.ip_nonlocal_bind=1
          g) sysctl -w net.ipv4.conf.all.accept_local=1
          # Assuming that the machine has 8 cores
          h) taskset 000f netserver -L 192.168.1.200
          i) taskset 00f0 netperf -t TCP_CRR -L 192.168.1.100 -H 192.168.1.200 -l 30
      
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Acked-by: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  36. Apr 18, 2011
    • ip6_pol_route panic: Do not allow VLAN on loopback · 0553c891
      Krishna Kumar authored
      
      Several tests in the ipv6 routing code check IFF_LOOPBACK, and
      allowing stacking such as VLAN'ing on top of loopback results in a
      netdevice which reports IFF_LOOPBACK but really isn't the loopback
      device.
      
      Instead of spamming the ipv6 routing code with even more special tests,
      simply disallow VLAN over loopback.
      
      The result of this patch is:
      
      # modprobe 8021q
      # vconfig add lo 43
      ERROR: trying to add VLAN #43 to IF -:lo:-  error: Operation not supported
      
      Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  37. Feb 17, 2011