Commits · kunit-prototype2 · Isabella Bassoasso / linux

Feb 11, 2021

Revert "net-loopback: set lo dev initial state to UP" · 1edb5cbf

Petr Machata authored 4 years ago


In commit c9dca822 ("net-loopback: set lo dev initial state to UP"),
linux started automatically bringing up the loopback device of a newly
created namespace. However, an existing user script might reasonably have
the following stanza when creating a new namespace -- and in fact at least
tools/testing/selftests/net/fib_nexthops.sh in Linux's very own testsuite
does:

 # set -e
 # ip netns add foo
 # ip -netns foo addr add 127.0.0.1/8 dev lo
 # ip -netns foo link set lo up
 # set +e

This will now fail, because the kernel reasonably rejects "ip addr add" of
a duplicate address. The described change of behavior therefore constitutes
a breakage. Revert it.

Fixes: c9dca822 ("net-loopback: set lo dev initial state to UP")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

1edb5cbf

Feb 05, 2021

net-loopback: set lo dev initial state to UP · c9dca822

Jian Yang authored 4 years ago

Traditionally loopback devices come up with initial state as DOWN for
any new network-namespace. This would mean that anyone needing this
device would have to bring this UP by issuing something like 'ip link
set lo up'. This can be avoided if the initial state is set as UP.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Jian Yang <jianyang@google.com>
Link: https://lore.kernel.org/r/20210201233445.2044327-1-jianyang.kernel@gmail.com

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c9dca822

Nov 08, 2019

net: use u64_stats_t in struct pcpu_lstats · fd2f4737

Eric Dumazet authored 5 years ago


In order to fix the data-race found by KCSAN, we
can use the new u64_stats_t type and its accessors instead
of plain u64 fields. This will still generate optimal code
for both 32 and 64 bit platforms.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fd2f4737

net: provide dev_lstats_add() helper · dd5382a0

Eric Dumazet authored 5 years ago


Many network drivers need it and hand-coded the same function.

In order to ease u64_stats_t adoption, it is time to factorize.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dd5382a0

net: provide dev_lstats_read() helper · de7d5084

Eric Dumazet authored 5 years ago


Many network drivers use hand-coded implementation of the same thing,
let's factorize things so that u64_stats_t adoption is done once.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de7d5084

Jul 03, 2019

loopback: fix lockdep splat · d62962b3

Mahesh Bandewar authored 5 years ago


dev_init_scheduler() and dev_activate() expect the caller to
hold RTNL. Since we don't want blackhole device to be initialized
per ns, we are initializing at init.

[    3.855027] Call Trace:
[    3.855034]  dump_stack+0x67/0x95
[    3.855037]  lockdep_rcu_suspicious+0xd5/0x110
[    3.855044]  dev_init_scheduler+0xe3/0x120
[    3.855048]  ? net_olddevs_init+0x60/0x60
[    3.855050]  blackhole_netdev_init+0x45/0x6e
[    3.855052]  do_one_initcall+0x6c/0x2fa
[    3.855058]  ? rcu_read_lock_sched_held+0x8c/0xa0
[    3.855066]  kernel_init_freeable+0x1e5/0x288
[    3.855071]  ? rest_init+0x260/0x260
[    3.855074]  kernel_init+0xf/0x180
[    3.855076]  ? rest_init+0x260/0x260
[    3.855078]  ret_from_fork+0x24/0x30

Fixes: 4de83b88 ("loopback: create blackhole net device similar to loopack.")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

d62962b3

Jul 02, 2019

loopback: create blackhole net device similar to loopack. · 4de83b88

Mahesh Bandewar authored 5 years ago

Create a blackhole net device that can be used for "dead"
dst entries instead of loopback device. This blackhole device differs
from loopback in few aspects: (a) It's not per-ns. (b) MTU on this
device is ETH_MIN_MTU (c) The xmit function is essentially kfree_skb().
and (d) since it's not registered it won't have ifindex.

Lower MTU effectively make the device not pass the MTU check during
the route check when a dst associated with the skb is dead.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4de83b88

May 30, 2019

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 · 2874c5fd

Thomas Gleixner authored 5 years ago


Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de


Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2874c5fd

Apr 12, 2019

net: loopback: use generic helper to report timestamping info · af730342

Julian Wiedmann authored 5 years ago


For reporting the common set of SW timestamping capabilities, use
ethtool_op_get_ts_info() instead of re-implementing it.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af730342

Oct 20, 2018

net: loopback: clear skb->tstamp before netif_rx() · 4c16128b

Eric Dumazet authored 6 years ago


At least UDP / TCP stacks can now cook skbs with a tstamp using
MONOTONIC base (or arbitrary values with SCM_TXTIME)

Since loopback driver does not call (directly or indirectly)
skb_scrub_packet(), we need to clear skb->tstamp so that
net_timestamp_check() can eventually resample the time,
using ktime_get_real().

Fixes: 80b14dee ("net: Add a new socket option for a future transmit time.")
Fixes: fb420d5d ("tcp/fq: move back to CLOCK_MONOTONIC")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4c16128b

Sep 14, 2018

net: move definition of pcpu_lstats to header file · 52bb6677

Li RongQing authored 6 years ago


pcpu_lstats is defined in several files, so unify them as one
and move to header file

Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

52bb6677

Mar 27, 2018

net: Drop pernet_operations::async · 2f635cee

Kirill Tkhai authored 7 years ago


Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f635cee

Feb 13, 2018

net: Convert loopback_net_ops · 9a4d105d

Kirill Tkhai authored 7 years ago


These pernet_operations have only init() method. It allocates
memory for net_device, calls register_netdev() and assigns
net::loopback_dev.

register_netdev() is allowed be used without additional locks,
as it's synchronized on rtnl_lock(). There are many examples
of using this functon directly from ioctl().

The only difference, compared to ioctl(), is that net is not
completely alive at this moment. But it looks like, there is
no way for parallel pernet_operations to dereference
the net_device, as the most of struct net_device lists,
where it's linked, are related to net, and the net is not liked.

The exceptions are net_device::unreg_list, close_list, todo_list,
used for unregistration, and ::link_watch_list, where net_device
may be linked to global lists.

Unregistration of loopback_dev obviously can't happen, when
loopback_net_init() is executing, as the net as alive. It occurs
in default_device_ops, which currently requires net_mutex,
and it behaves as a barrier at the moment. It will be considered
in next patch.

Speaking about link_watch_list, it seems, there is no way
for loopback_dev at time of registration to be linked in lweventlist
and be available for another pernet_operations.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9a4d105d

Jun 07, 2017

net: Fix inconsistent teardown and release of private netdev state. · cf124db5

David S. Miller authored 7 years ago


Network devices can allocate reasources and private memory using
netdev_ops->ndo_init().  However, the release of these resources
can occur in one of two different places.

Either netdev_ops->ndo_uninit() or netdev->destructor().

The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.

netdev_ops->ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.

netdev->destructor(), on the other hand, does not run until the
netdev references all go away.

Further complicating the situation is that netdev->destructor()
almost universally does also a free_netdev().

This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.

If netdev_ops->ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
it is not able to invoke netdev->destructor().

This is because netdev->destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.

However, this means that the resources that would normally be released
by netdev->destructor() will not be.

Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.

Many drivers do not try to deal with this, and instead we have leaks.

Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev->destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().

netdev->priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev->destructor(), except for
free_netdev().

netdev->needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().

Now, register_netdevice() can sanely release all resources after
ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
and netdev->priv_destructor().

And at the end of unregister_netdevice(), we invoke
netdev->priv_destructor() and optionally call free_netdev().

Signed-off-by: David S. Miller <davem@davemloft.net>

cf124db5

Mar 21, 2017

Cleanup some warning from timestamping code. · b3407c8e

Ezequiel Lara Gomez authored 8 years ago


Following checkpatch.pl recommendations (which include
replacing with <linux/io.h> the <asm/io.h>, since linux/io.h includes
it).

Signed-off-by: Ezequiel Lara Gomez <ezegomez@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b3407c8e

Enable tx timestamping on loopback and dummy · 6df014cf

Ezequiel Lara Gomez authored 8 years ago


This enables developing code that uses SOF_TIMESTAMPING_TX_SOFTWARE
by using localhost addresses (without needing to send packets outside),
as well as enabling unit and functional testing of TX timestamping code
without needing hardware support or network access.

It also fulfills the expectation of software network devices supporting
software-based timestamping.

Tested on qemu using txtimestamping.c from the kernel selftests, and
ethtool -T.

Signed-off-by: Ezequiel Lara Gomez <ezegomez@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6df014cf

Feb 08, 2017

net: introduce device min_header_len · 217e6fa2

Willem de Bruijn authored 8 years ago


The stack must not pass packets to device drivers that are shorter
than the minimum link layer header length.

Previously, packet sockets would drop packets smaller than or equal
to dev->hard_header_len, but this has false positives. Zero length
payload is used over Ethernet. Other link layer protocols support
variable length headers. Support for validation of these protocols
removed the min length check for all protocols.

Introduce an explicit dev->min_header_len parameter and drop all
packets below this value. Initially, set it to non-zero only for
Ethernet and loopback. Other protocols can follow in a patch to
net-next.

Fixes: 9ed988cd ("packet: validate variable length ll headers")
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

217e6fa2

Jan 08, 2017

net: make ndo_get_stats64 a void function · bc1f4470

Stephen Hemminger authored 8 years ago


The network device operation for reading statistics is only called
in one place, and it ignores the return value. Having a structure
return value is potentially confusing because some future driver could
incorrectly assume that the return value was used.

Fix all drivers with ndo_get_stats64 to have a void function.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc1f4470

Dec 24, 2016

Replace <asm/uaccess.h> with <linux/uaccess.h> globally · 7c0f6ba6

Linus Torvalds authored 8 years ago


This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7c0f6ba6

Jun 03, 2016

loopback: make use of NETIF_F_GSO_SOFTWARE · f6c382fc

Marcelo Ricardo Leitner authored 8 years ago


NETIF_F_GSO_SOFTWARE was defined to list all GSO software types, so lets
make use of it in loopback code. Note that veth/vxlan/others already
uses it.

Within this patch series, this patch causes lo to pick up SCTP GSO feature
automatically (as it's added to NETIF_F_GSO_SOFTWARE) and thus avoiding
segmentation if possible.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6c382fc

Dec 15, 2015

sctp: Rename NETIF_F_SCTP_CSUM to NETIF_F_SCTP_CRC · 53692b1d

Tom Herbert authored 9 years ago


The SCTP checksum is really a CRC and is very different from the
standards 1's complement checksum that serves as the checksum
for IP protocols. This offload interface is also very different.
Rename NETIF_F_SCTP_CSUM to NETIF_F_SCTP_CRC to highlight these
differences. The term CSUM should be reserved in the stack to refer
to the standard 1's complement IP checksum.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

53692b1d

Aug 18, 2015

net: loopback: convert to using IFF_NO_QUEUE · e65db2b7

Phil Sutter authored 9 years ago


Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

e65db2b7

Oct 07, 2014

net: better IFF_XMIT_DST_RELEASE support · 02875878

Eric Dumazet authored 10 years ago


Testing xmit_more support with netperf and connected UDP sockets,
I found strange dst refcount false sharing.

Current handling of IFF_XMIT_DST_RELEASE is not optimal.

Dropping dst in validate_xmit_skb() is certainly too late in case
packet was queued by cpu X but dequeued by cpu Y

The logical point to take care of drop/force is in __dev_queue_xmit()
before even taking qdisc lock.

As Julian Anastasov pointed out, need for skb_dst() might come from some
packet schedulers or classifiers.

This patch adds new helper to cleanly express needs of various drivers
or qdiscs/classifiers.

Drivers that need skb_dst() in their ndo_start_xmit() should call
following helper in their setup instead of the prior :

	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
->
	netif_keep_dst(dev);

Instead of using a single bit, we use two bits, one being
eventually rebuilt in bonding/team drivers.

The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
rebuilt in bonding/team. Eventually, we could add something
smarter later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

02875878

Jul 15, 2014

net: set name_assign_type in alloc_netdev() · c835a677

Tom Gundersen authored 10 years ago


Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
all users to pass NET_NAME_UNKNOWN.

Coccinelle patch:

@@
expression sizeof_priv, name, setup, txqs, rxqs, count;
@@

(
-alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
+alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
|
-alloc_netdev_mq(sizeof_priv, name, setup, count)
+alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
|
-alloc_netdev(sizeof_priv, name, setup)
+alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
)

v9: move comments here from the wrong commit

Signed-off-by: Tom Gundersen <teg@jklm.no>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c835a677

Mar 15, 2014

net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq · 57a7744e

Eric W. Biederman authored 11 years ago


Replace the bh safe variant with the hard irq safe variant.

We need a hard irq safe variant to deal with netpoll transmitting
packets from hard irq context, and we need it in most if not all of
the places using the bh safe variant.

Except on 32bit uni-processor the code is exactly the same so don't
bother with a bh variant, just have a hard irq safe variant that
everyone can use.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

57a7744e

Feb 25, 2014

loopback: sctp: add NETIF_F_SCTP_CSUM to device features · b17c7069

Daniel Borkmann authored 11 years ago


Drivers are allowed to set NETIF_F_SCTP_CSUM if they have
hardware crc32c checksumming support for the SCTP protocol.
Currently, NETIF_F_SCTP_CSUM flag is available in igb,
ixgbe, i40e/i40evf drivers and for vlan devices.

If we don't have NETIF_F_SCTP_CSUM then crc32c is done
through CPU instructions, invoked from crypto layer, or
if not available as slow-path fallback in software.

Currently, loopback device propagates checksum offloading
feature flags in dev->features, but is missing SCTP checksum
offloading. Therefore, account for NETIF_F_SCTP_CSUM as
well.

Before patch:

./netperf_sctp -H 192.168.0.100 -t SCTP_STREAM_MANY
SCTP 1-TO-MANY STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.100 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

4194304 4194304   4096    10.00    4683.50

After patch:

./netperf_sctp -H 192.168.0.100 -t SCTP_STREAM_MANY
SCTP 1-TO-MANY STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.100 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

4194304 4194304   4096    10.00    15348.26

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b17c7069

Feb 14, 2014

net: introduce netdev_alloc_pcpu_stats() for drivers · 1c213bd2

WANG Cong authored 11 years ago


There are many drivers calling alloc_percpu() to allocate pcpu stats
and then initializing ->syncp. So just introduce a helper function for them.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1c213bd2

Feb 13, 2014

net: allow setting mac address of loopback device · 25f929fb

WANG Cong authored 11 years ago


We are trying to mirror the local traffic from lo to eth0,
allowing setting mac address of lo to eth0 would make
the ether addresses in these packets correct, so that
we don't have to modify the ether header again.

Since usually no one cares about its mac address (all-zero),
it is safe to allow those who care to set its mac address.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25f929fb

Jan 16, 2014

drivers/net: delete non-required instances of include <linux/init.h> · a81ab36b

Paul Gortmaker authored 11 years ago


None of these files are actually using any __init type directives
and hence don't need to include <linux/init.h>.   Most are just a
left over from __devinit and __cpuinit removal, or simply due to
code getting copied from one driver to the next.

This covers everything under drivers/net except for wireless, which
has been submitted separately.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a81ab36b

Nov 06, 2013

net: Explicitly initialize u64_stats_sync structures for lockdep · 827da44c

John Stultz authored 11 years ago and

Ingo Molnar committed 11 years ago


In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.

The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.

This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worth while.

Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.

Feedback would be appreciated!

Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Roger Luethi <rl@hellgate.ch>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Simon Horman <horms@verge.net.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org


Signed-off-by: Ingo Molnar <mingo@kernel.org>

827da44c

Sep 17, 2013

net loopback: Set loopback_dev to NULL when freed · e05e9070

Eric W. Biederman authored 11 years ago


It has recently turned up that we have a number of long standing bugs
in the network stack cleanup code with use of the loopback device
after it has been freed that have not turned up because in most cases
the storage allocated to the loopback device is not reused, when those
accesses happen.

Set looback_dev to NULL to trigger oopses instead of silent data corrupt
when we hit this class of bug.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e05e9070

Jan 27, 2013

net: loopback: fix a dst refcounting issue · 794ed393

Eric Dumazet authored 12 years ago


Ben Greear reported crashes in ip_rcv_finish() on a stress
test involving many macvlans.

We tracked the bug to a dst use after free. ip_rcv_finish()
was calling dst->input() and got garbage for dst->input value.

It appears the bug is in loopback driver, lacking
a skb_dst_force() before calling netif_rx().

As a result, a non refcounted dst, normally protected by a
RCU read_lock section, was escaping this section and could
be freed before the packet being processed.

  [<ffffffff813a3c4d>] loopback_xmit+0x64/0x83
  [<ffffffff81477364>] dev_hard_start_xmit+0x26c/0x35e
  [<ffffffff8147771a>] dev_queue_xmit+0x2c4/0x37c
  [<ffffffff81477456>] ? dev_hard_start_xmit+0x35e/0x35e
  [<ffffffff8148cfa6>] ? eth_header+0x28/0xb6
  [<ffffffff81480f09>] neigh_resolve_output+0x176/0x1a7
  [<ffffffff814ad835>] ip_finish_output2+0x297/0x30d
  [<ffffffff814ad6d5>] ? ip_finish_output2+0x137/0x30d
  [<ffffffff814ad90e>] ip_finish_output+0x63/0x68
  [<ffffffff814ae412>] ip_output+0x61/0x67
  [<ffffffff814ab904>] dst_output+0x17/0x1b
  [<ffffffff814adb6d>] ip_local_out+0x1e/0x23
  [<ffffffff814ae1c4>] ip_queue_xmit+0x315/0x353
  [<ffffffff814adeaf>] ? ip_send_unicast_reply+0x2cc/0x2cc
  [<ffffffff814c018f>] tcp_transmit_skb+0x7ca/0x80b
  [<ffffffff814c3571>] tcp_connect+0x53c/0x587
  [<ffffffff810c2f0c>] ? getnstimeofday+0x44/0x7d
  [<ffffffff810c2f56>] ? ktime_get_real+0x11/0x3e
  [<ffffffff814c6f9b>] tcp_v4_connect+0x3c2/0x431
  [<ffffffff814d6913>] __inet_stream_connect+0x84/0x287
  [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
  [<ffffffff8108d695>] ? _local_bh_enable_ip+0x84/0x9f
  [<ffffffff8108d6c8>] ? local_bh_enable+0xd/0x11
  [<ffffffff8146763c>] ? lock_sock_nested+0x6e/0x79
  [<ffffffff814d6b38>] ? inet_stream_connect+0x22/0x49
  [<ffffffff814d6b49>] inet_stream_connect+0x33/0x49
  [<ffffffff814632c6>] sys_connect+0x75/0x98

This bug was introduced in linux-2.6.35, in commit
7fee226a (net: add a noref bit on skb dst)

skb_dst_force() is enforced in dev_queue_xmit() for devices having a
qdisc.

Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

794ed393

Sep 24, 2012

net: loopback: set default mtu to 64K · 0cf833ae

Eric Dumazet authored 12 years ago

loopback current mtu of 16436 bytes allows no more than 3 MSS TCP
segments per frame, or 48 Kbytes. Changing mtu to 64K allows TCP
stack to build large frames and significantly reduces stack overhead.

Performance boost on bulk TCP transferts can be up to 30 %, partly
because we now have one ACK message for two 64KB segments, and a lower
probability of hitting /proc/sys/net/ipv4/tcp_reordering default limit.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0cf833ae

Aug 09, 2012

net: Loopback ifindex is constant now · 1fb9489b

Pavel Emelyanov authored 12 years ago


As pointed out, there are places, that access net->loopback_dev->ifindex
and after ifindex generation is made per-net this value becomes constant
equals 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use
it where appropriate.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1fb9489b

Jul 22, 2012

net: fix race condition in several drivers when reading stats · e3906486

Kevin Groeneveld authored 12 years ago


Fix race condition in several network drivers when reading stats on 32bit
UP architectures.  These drivers update their stats in a BH context and
therefore should use u64_stats_fetch_begin_bh/u64_stats_fetch_retry_bh
instead of u64_stats_fetch_begin/u64_stats_fetch_retry when reading the
stats.

Signed-off-by: Kevin Groeneveld <kgroeneveld@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e3906486

Mar 28, 2012

Remove all #inclusions of asm/system.h · 9ffc93f2

David Howells authored 13 years ago

Remove all #inclusions of asm/system.h preparatory to splitting and killing
it. Performed with the following command:

perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

Signed-off-by: David Howells <dhowells@redhat.com>

9ffc93f2

Nov 16, 2011

net: remove NETIF_F_NO_CSUM feature bit · 34324dc2

Michał Mirosław authored 13 years ago


Only distinct use is checking if NETIF_F_NOCACHE_COPY should be
enabled by default. The check heuristics is altered a bit here,
so it hits other people than before. The default shouldn't be
trusted for performance-critical cases anyway.

For all other uses NETIF_F_NO_CSUM is equivalent to NETIF_F_HW_CSUM.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

34324dc2

May 08, 2011

net: Allow ethtool to set interface in loopback mode. · eed2a12f

Mahesh Bandewar authored 13 years ago


This patch enables ethtool to set the loopback mode on a given interface.
By configuring the interface in loopback mode in conjunction with a policy
route / rule, a userland application can stress the egress / ingress path
exposing the flows of the change in progress and potentially help developer(s)
understand the impact of those changes without even sending a packet out
on the network.

Following set of commands illustrates one such example -
    a) ip -4 addr add 192.168.1.1/24 dev eth1
    b) ip -4 rule add from all iif eth1 lookup 250
    c) ip -4 route add local 0/0 dev lo proto kernel scope host table 250
    d) arp -Ds 192.168.1.100 eth1
    e) arp -Ds 192.168.1.200 eth1
    f) sysctl -w net.ipv4.ip_nonlocal_bind=1
    g) sysctl -w net.ipv4.conf.all.accept_local=1
    # Assuming that the machine has 8 cores
    h) taskset 000f netserver -L 192.168.1.200
    i) taskset 00f0 netperf -t TCP_CRR -L 192.168.1.100 -H 192.168.1.200 -l 30

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eed2a12f

Apr 18, 2011

ip6_pol_route panic: Do not allow VLAN on loopback · 0553c891

Krishna Kumar authored 13 years ago


Several tests in the ipv6 routing code check IFF_LOOPBACK, and
allowing stacking such as VLAN'ing on top of loopback results in a
netdevice which reports IFF_LOOPBACK but really isn't the loopback
device.

Instead of spamming the ipv6 routing code with even more special tests,
simply disallow VLAN over loopback.

The result of this patch is:

# modprobe 8021q
# vconfig add lo 43
ERROR: trying to add VLAN #43 to IF -:lo:-  error: Operation not supported

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0553c891

Feb 17, 2011

loopback: convert to hw_features · cf0bdefd

Michał Mirosław authored 14 years ago

This also enables TSOv6, TSO-ECN, and UFO as loopback clearly can handle them.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf0bdefd

Admin message

Admin message