  1. Apr 23, 2021
  2. Apr 11, 2021
  3. Mar 31, 2021
  4. Mar 26, 2021
    • geneve: do not modify the shared tunnel info when PMTU triggers an ICMP reply · 68c1a943
      Antoine Tenart authored
      
      When the interface is part of a bridge or an Open vSwitch port and a
      packet exceeds a PMTU estimate, an ICMP reply is sent to the sender. When
      using the external mode (collect metadata) the source and destination
      addresses are reversed, so that Open vSwitch can match the packet
      against an existing (reverse) flow.
      
      But inverting the source and destination addresses in the shared
      ip_tunnel_info makes the following packets of the flow use a wrong
      destination address (the packets are tunnelled to the host itself),
      unless the flow is updated. With Open vSwitch, this lasts until the
      flow times out.
      
      Fix this by uncloning the skb's ip_tunnel_info before inverting its
      source and destination addresses, so that the modification is only
      made for the PMTU packet, not for the following ones.
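
The copy-before-modify idea behind this fix can be illustrated with a small userspace model. This is only a sketch: `TunnelInfo`, `Packet` and both swap functions are hypothetical names invented for the illustration, and the real driver works on `struct ip_tunnel_info` attached to an skb, not on Python objects.

```python
from copy import deepcopy
from dataclasses import dataclass


@dataclass
class TunnelInfo:
    """Toy stand-in for per-flow tunnel metadata (hypothetical fields)."""
    src: str
    dst: str


@dataclass
class Packet:
    """Toy packet pointing at tunnel metadata that other packets may share."""
    info: TunnelInfo


def swap_in_place(pkt: Packet) -> None:
    # Buggy variant: mutates the object every packet of the flow points to.
    pkt.info.src, pkt.info.dst = pkt.info.dst, pkt.info.src


def swap_after_unclone(pkt: Packet) -> None:
    # Fixed variant: take a private copy first, then modify only that copy.
    pkt.info = deepcopy(pkt.info)
    pkt.info.src, pkt.info.dst = pkt.info.dst, pkt.info.src


shared = TunnelInfo(src="10.0.0.1", dst="10.0.0.2")
pmtu_pkt, next_pkt = Packet(shared), Packet(shared)

swap_after_unclone(pmtu_pkt)
print(pmtu_pkt.info.dst)  # destination used for the ICMP reply
print(next_pkt.info.dst)  # destination still used by later packets
```

With `swap_in_place()` instead, both packets would end up with the reversed addresses, which is the behaviour the commit message describes as the bug.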
      
      Fixes: c1a800e8 ("geneve: Support for PMTU discovery on directly bridged links")
      Tested-by: Eelco Chaudron <echaudro@redhat.com>
      Reviewed-by: Eelco Chaudron <echaudro@redhat.com>
      Signed-off-by: Antoine Tenart <atenart@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. Jan 19, 2021
  6. Jan 07, 2021
  7. Dec 09, 2020
  8. Dec 02, 2020
    • geneve: pull IP header before ECN decapsulation · 4179b00c
      Eric Dumazet authored
      
      IP_ECN_decapsulate() and IP6_ECN_decapsulate() assume that the
      IP header has already been pulled.
      
      geneve does not ensure this yet.
      
      Fixing this generically in IP_ECN_decapsulate() and
      IP6_ECN_decapsulate() is not possible, since callers
      pass a pointer that might be freed by pskb_may_pull().
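
Why the header has to be pulled by the caller, and why a pointer taken before the pull cannot be trusted afterwards, can be shown with a toy buffer model. This is a sketch under loud assumptions: `Buffer`, `may_pull()` and `inner_ecn()` are hypothetical userspace stand-ins, not the kernel's `sk_buff` API, and the model linearises the whole packet where the kernel would pull only what is needed.

```python
class Buffer:
    """Toy packet buffer: a linear head plus extra fragments (not a real sk_buff)."""

    def __init__(self, head: bytes, frags: list):
        self.head = bytearray(head)
        self.frags = list(frags)

    def may_pull(self, n: int) -> bool:
        """Ensure the first n bytes are in the linear head, or report failure.

        On success the head storage may have been replaced, so references
        to the old head must not be used afterwards.
        """
        if len(self.head) >= n:
            return True
        data = bytes(self.head) + b"".join(self.frags)
        if len(data) < n:
            return False  # packet too short; buffer left untouched
        self.head = bytearray(data)  # the old head object is now stale
        self.frags = []
        return True


IPV4_HEADER_LEN = 20


def inner_ecn(buf: Buffer):
    """Return the ECN bits of the inner IPv4 header, or None if it is truncated."""
    if not buf.may_pull(IPV4_HEADER_LEN):
        return None
    tos = buf.head[1]  # read only after the pull, from the current head
    return tos & 0x03


header = bytes([0x45, 0x02]) + bytes(18)   # IPv4, TOS 0x02, remaining bytes zeroed
split = Buffer(header[:1], [header[1:]])   # only one byte is linear at first
print(inner_ecn(split))
print(inner_ecn(Buffer(header[:10], [])))
```

Reading `buf.head[1]` before calling `may_pull()` on the split buffer would index past the linear data, which is the class of problem the KMSAN report points at.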
      
      syzbot reported:
      
      BUG: KMSAN: uninit-value in __INET_ECN_decapsulate include/net/inet_ecn.h:238 [inline]
      BUG: KMSAN: uninit-value in INET_ECN_decapsulate+0x345/0x1db0 include/net/inet_ecn.h:260
      CPU: 1 PID: 8941 Comm: syz-executor.0 Not tainted 5.10.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x21c/0x280 lib/dump_stack.c:118
       kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
       __msan_warning+0x5f/0xa0 mm/kmsan/kmsan_instr.c:197
       __INET_ECN_decapsulate include/net/inet_ecn.h:238 [inline]
       INET_ECN_decapsulate+0x345/0x1db0 include/net/inet_ecn.h:260
       geneve_rx+0x2103/0x2980 include/net/inet_ecn.h:306
       geneve_udp_encap_recv+0x105c/0x1340 drivers/net/geneve.c:377
       udp_queue_rcv_one_skb+0x193a/0x1af0 net/ipv4/udp.c:2093
       udp_queue_rcv_skb+0x282/0x1050 net/ipv4/udp.c:2167
       udp_unicast_rcv_skb net/ipv4/udp.c:2325 [inline]
       __udp4_lib_rcv+0x399d/0x5880 net/ipv4/udp.c:2394
       udp_rcv+0x5c/0x70 net/ipv4/udp.c:2564
       ip_protocol_deliver_rcu+0x572/0xc50 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ip_local_deliver+0x583/0x8d0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:449 [inline]
       ip_rcv_finish net/ipv4/ip_input.c:428 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ip_rcv+0x5c3/0x840 net/ipv4/ip_input.c:539
       __netif_receive_skb_one_core net/core/dev.c:5315 [inline]
       __netif_receive_skb+0x1ec/0x640 net/core/dev.c:5429
       process_backlog+0x523/0xc10 net/core/dev.c:6319
       napi_poll+0x420/0x1010 net/core/dev.c:6763
       net_rx_action+0x35c/0xd40 net/core/dev.c:6833
       __do_softirq+0x1a9/0x6fa kernel/softirq.c:298
       asm_call_irq_on_stack+0xf/0x20
       </IRQ>
       __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
       run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
       do_softirq_own_stack+0x6e/0x90 arch/x86/kernel/irq_64.c:77
       do_softirq kernel/softirq.c:343 [inline]
       __local_bh_enable_ip+0x184/0x1d0 kernel/softirq.c:195
       local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
       rcu_read_unlock_bh include/linux/rcupdate.h:730 [inline]
       __dev_queue_xmit+0x3a9b/0x4520 net/core/dev.c:4167
       dev_queue_xmit+0x4b/0x60 net/core/dev.c:4173
       packet_snd net/packet/af_packet.c:2992 [inline]
       packet_sendmsg+0x86f9/0x99d0 net/packet/af_packet.c:3017
       sock_sendmsg_nosec net/socket.c:651 [inline]
       sock_sendmsg net/socket.c:671 [inline]
       __sys_sendto+0x9dc/0xc80 net/socket.c:1992
       __do_sys_sendto net/socket.c:2004 [inline]
       __se_sys_sendto+0x107/0x130 net/socket.c:2000
       __x64_sys_sendto+0x6e/0x90 net/socket.c:2000
       do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 2d07dc79 ("geneve: add initial netdev driver for GENEVE tunnels")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20201201090507.4137906-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  9. Nov 24, 2020
  10. Nov 14, 2020
  11. Nov 10, 2020
  12. Oct 06, 2020
  13. Sep 17, 2020
    • geneve: add transport ports in route lookup for geneve · 34beb215
      Mark Gray authored
      
      This patch adds transport ports information for route lookup so that
      IPsec can select Geneve tunnel traffic to do encryption. This is
      needed for OVS/OVN IPsec with encrypted Geneve tunnels.
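
A rough way to see why the ports matter: an IPsec policy that selects on a UDP port can only match a lookup key that actually carries ports. The sketch below is a simplified model, not the kernel's flow or XFRM structures; `FlowKey`, `Selector` and the wildcard value `ANY` are assumptions made for the illustration, and 49152 is an arbitrary example source port.

```python
from dataclasses import dataclass

UDP = 17
GENEVE_PORT = 6081
ANY = 0  # wildcard port in this sketch


@dataclass(frozen=True)
class FlowKey:
    """What the route lookup knows about the outgoing tunnel packet."""
    proto: int
    sport: int = ANY
    dport: int = ANY


@dataclass(frozen=True)
class Selector:
    """Port-aware policy selector, loosely modelled on an IPsec policy."""
    proto: int
    sport: int = ANY
    dport: int = ANY

    def matches(self, flow: FlowKey) -> bool:
        return (
            self.proto == flow.proto
            and self.sport in (ANY, flow.sport)
            and self.dport in (ANY, flow.dport)
        )


policy = Selector(proto=UDP, dport=GENEVE_PORT)

without_ports = FlowKey(proto=UDP)                               # before the patch
with_ports = FlowKey(proto=UDP, sport=49152, dport=GENEVE_PORT)  # after the patch

print(policy.matches(without_ports))
print(policy.matches(with_ports))
```

In this model the key without ports never matches the port-specific policy, so the traffic would bypass encryption.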
      
      This can be tested by configuring a host-host VPN using an IKE
      daemon and specifying port numbers. For example, for an
      Openswan-type configuration, the following parameters should be
      configured on both hosts and IPsec set up as per normal:
      
      $ cat /etc/ipsec.conf
      
      conn in
      ...
      left=$IP1
      right=$IP2
      ...
      leftprotoport=udp/6081
      rightprotoport=udp
      ...
      conn out
      ...
      left=$IP1
      right=$IP2
      ...
      leftprotoport=udp
      rightprotoport=udp/6081
      ...
      
      The tunnel can then be set up using "ip" on both hosts (but
      changing the relevant IP addresses):
      
      $ ip link add tun type geneve id 1000 remote $IP2
      $ ip addr add 192.168.0.1/24 dev tun
      $ ip link set tun up
      
      This can then be tested by pinging from $IP1:
      
      $ ping 192.168.0.2
      
      Without this patch the traffic is unencrypted on the wire.
      
      Fixes: 2d07dc79 ("geneve: add initial netdev driver for GENEVE tunnels")
      Signed-off-by: Qiuyu Xiao <qiuyu.xiao.qyx@gmail.com>
      Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
      Reviewed-by: Greg Rose <gvrose8192@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. Aug 04, 2020
    • geneve: Support for PMTU discovery on directly bridged links · c1a800e8
      Stefano Brivio authored
      
      If the interface is a bridge or Open vSwitch port, and we can't
      forward a packet because it exceeds the local PMTU estimate,
      trigger an ICMP or ICMPv6 reply to the sender, using the same
      interface to forward it back.
      
      If metadata collection is enabled, set destination and source
      addresses for the flow as if we were receiving the packet, so that
      Open vSwitch can match the ICMP error against the existing
      association.
      
      v2: Use netif_is_any_bridge_port() (David Ahern)
      
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tunnels: PMTU discovery support for directly bridged IP packets · 4cb47a86
      Stefano Brivio authored
      
      It's currently possible to bridge Ethernet tunnels carrying IP
      packets directly to external interfaces without assigning them
      addresses and routes on the bridged network itself: this is the case
      for UDP tunnels bridged with a standard bridge or by Open vSwitch.
      
      PMTU discovery is currently broken with those configurations, because
      the encapsulation effectively decreases the MTU of the link, and
      while we are able to account for this using PMTU discovery on the
      lower layer, we don't have a way to relay ICMP or ICMPv6 messages
      needed by the sender, because we don't have valid routes to it.
      
      On the other hand, as a tunnel endpoint, we can't fragment packets
      as a general approach: this is for instance clearly forbidden for
      VXLAN by RFC 7348, section 4.3:
      
         VTEPs MUST NOT fragment VXLAN packets.  Intermediate routers may
         fragment encapsulated VXLAN packets due to the larger frame size.
         The destination VTEP MAY silently discard such VXLAN fragments.
      
      The same paragraph recommends that the MTU over the physical network
      accommodates the encapsulation, but this isn't a practical option for
      complex topologies, especially for typical Open vSwitch use cases.
      
      Further, it states that:
      
         Other techniques like Path MTU discovery (see [RFC1191] and
         [RFC1981]) MAY be used to address this requirement as well.
      
      Now, PMTU discovery already works for routed interfaces: route
      exceptions are created by the encapsulation devices as they receive
      ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and
      we already rebuild those messages with the appropriate MTU and route
      them back to the sender.
      
      Add the missing bits for bridged cases:
      
      - checks in skb_tunnel_check_pmtu() to understand if it's appropriate
        to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and
        RFC 4443 section 2.4 for ICMPv6. This function is already called by
        UDP tunnels
      
      - a new function generating those ICMP or ICMPv6 replies. We can't
        reuse icmp_send() and icmp6_send() as we don't see the sender as a
        valid destination. This doesn't need to be generic, as we don't
        cover any other type of ICMP errors given that we only provide an
        encapsulation function to the sender
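
The IPv4 side of the first bullet can be sketched as a predicate over the conditions listed in RFC 1122 section 3.2.2. This is an illustration only, not the code added to skb_tunnel_check_pmtu(): the `Ipv4Packet` fields are hypothetical, and directed broadcast addresses are not detected because the model has no netmask information.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Ipv4Packet:
    """Hypothetical summary of the packet that exceeded the PMTU."""
    src: str
    dst: str
    is_icmp_error: bool = False
    link_layer_broadcast: bool = False
    fragment_offset: int = 0


def may_send_icmp_error(pkt: Ipv4Packet) -> bool:
    """Return True if RFC 1122 section 3.2.2 allows replying with an ICMP error."""
    src = ipaddress.IPv4Address(pkt.src)
    dst = ipaddress.IPv4Address(pkt.dst)
    limited_broadcast = ipaddress.IPv4Address("255.255.255.255")

    if pkt.is_icmp_error:
        return False  # never answer an ICMP error with another one
    if dst.is_multicast or dst == limited_broadcast:
        return False  # destined to an IP multicast or broadcast address
    if pkt.link_layer_broadcast:
        return False  # sent as a link-layer broadcast
    if pkt.fragment_offset != 0:
        return False  # non-initial fragment
    if (src.is_unspecified or src.is_loopback or src.is_multicast
            or src == limited_broadcast or src.is_reserved):
        return False  # source does not define a single host
    return True


print(may_send_icmp_error(Ipv4Packet("192.0.2.1", "198.51.100.7")))
print(may_send_icmp_error(Ipv4Packet("192.0.2.1", "224.0.0.1")))
```

The ICMPv6 rules from RFC 4443 section 2.4 follow the same pattern but are not modelled here.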
      
      While at it, make the MTU check in skb_tunnel_check_pmtu() accurate:
      we might receive GSO buffers here, and the passed headroom already
      includes the inner MAC length, so we don't have to account for it
      a second time (that would imply three MAC headers on the wire, but
      there are just two).
      
      This issue became visible while bridging IPv6 packets with 4500 bytes
      of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50
      bytes of encapsulation headroom, we would advertise MTU as 3950, and
      we would reject fragmented IPv6 datagrams of 3958 bytes size on the
      wire. We're exclusively dealing with network MTU here, though, so we
      could get Ethernet frames up to 3964 octets in that case.
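
The figures in the paragraph above can be reproduced with a few lines of arithmetic. Two things here are assumptions rather than statements from the commit: the breakdown of the 50-byte headroom into individual headers, and the reading that the old check compared the buffer length including the inner MAC header against the advertised MTU.

```python
ETH_HLEN = 14     # Ethernet header
IPV4_HLEN = 20    # outer IPv4 header without options
UDP_HLEN = 8      # outer UDP header
GENEVE_HLEN = 8   # GENEVE base header without options

# Assumed breakdown of the 50 bytes of encapsulation headroom.
headroom = IPV4_HLEN + UDP_HLEN + GENEVE_HLEN + ETH_HLEN

lower_pmtu = 4000
advertised_mtu = lower_pmtu - headroom       # network-layer MTU shown to the sender
largest_frame = advertised_mtu + ETH_HLEN    # largest Ethernet frame that still fits

fragment_on_wire = 3958                      # fragment size including the MAC header
fragment_ip_len = fragment_on_wire - ETH_HLEN

# Assumed old behaviour: the length still includes the inner MAC header.
rejected_before = fragment_on_wire > advertised_mtu
# Assumed new behaviour: only the network-layer length is compared.
rejected_after = fragment_ip_len > advertised_mtu

print(headroom, advertised_mtu, largest_frame)
print(rejected_before, rejected_after)
```

Under these assumptions the 3958-byte fragment is rejected only when the MAC header is counted twice, and frames of up to 3964 octets fit once the check uses the network-layer length.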
      
      v2:
      - moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern)
      - split IPv4/IPv6 functions (David Ahern)
      
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. Jul 23, 2020
  16. Jul 10, 2020
    • udp_tunnel: add central NIC RX port offload infrastructure · cc4e3835
      Jakub Kicinski authored
      
      Cater to devices which:
       (a) may want to sleep in the callbacks;
       (b) only have IPv4 support;
       (c) need all the programming to happen while the netdev is up.
      
      Drivers attach UDP tunnel offload info struct to their netdevs,
      where they declare how many UDP ports of various tunnel types
      they support. Core takes care of tracking which ports to offload.
      
      Use a fixed-size array since this matches what almost all drivers
      do, and avoids complexity and uncertainty around memory allocations
      in an atomic context.
      
      Make sure that tunnel drivers don't try to replay the ports when
      new NIC netdev is registered. Automatic replays would mess up
      reference counting, and will be removed completely once all drivers
      are converted.
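
The combination of a fixed-size array and reference counting can be pictured with the toy table below. It is a sketch under assumptions: `PortTable`, `Entry` and their methods are invented for this illustration and do not mirror the actual structures or callbacks introduced by the patch.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    port: int = 0
    tunnel_type: str = ""
    refs: int = 0  # 0 means the slot is free


class PortTable:
    """Toy fixed-size table of offloaded UDP tunnel ports with use counts."""

    def __init__(self, n_entries: int):
        # Allocated once up front, so add() never needs to allocate memory.
        self.entries = [Entry() for _ in range(n_entries)]

    def _find(self, port: int, tunnel_type: str):
        for e in self.entries:
            if e.refs and e.port == port and e.tunnel_type == tunnel_type:
                return e
        return None

    def add(self, port: int, tunnel_type: str) -> bool:
        """Record one more user of the port; return False if the table is full."""
        e = self._find(port, tunnel_type)
        if e is None:
            e = next((x for x in self.entries if x.refs == 0), None)
            if e is None:
                return False  # no free slot: the port is not offloaded
            e.port, e.tunnel_type = port, tunnel_type
        e.refs += 1
        return True

    def remove(self, port: int, tunnel_type: str) -> None:
        e = self._find(port, tunnel_type)
        if e is not None:
            e.refs -= 1  # the slot becomes free again when refs reaches 0


table = PortTable(n_entries=1)
print(table.add(6081, "geneve"))
print(table.add(4789, "vxlan"))
```

In this model, replaying an `add()` for a port that is already counted leaves a reference that no later `remove()` balances, which is one way to picture the reference-counting concern mentioned above.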
      
      v4:
       - use a #define NULL to avoid build issues with CONFIG_INET=n.
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. Jul 06, 2020
    • geneve: move all configuration under struct geneve_config · 9e06e859
      Sabrina Dubroca authored
      
      This patch adds a new structure geneve_config and moves the per-device
      configuration attributes to it, like we already have in VXLAN with
      struct vxlan_config. This ends up being pretty invasive since those
      attributes are used everywhere.
      
      This allows us to clean up the argument lists for geneve_configure (4
      arguments instead of 8) and geneve_nl2info (5 instead of 9).
      
      This also reduces the copy-paste of code setting those attributes
      between geneve_configure and geneve_changelink to a single memcpy,
      which would have avoided the bug fixed in commit
      56c09de3 ("geneve: allow changing DF behavior after creation").
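
The benefit of copying one structure instead of assigning attributes one by one can be shown with a small model. The `Config` fields and the `Device` methods below are hypothetical and only loosely inspired by the description; they are not the real struct geneve_config or the driver's functions.

```python
from dataclasses import dataclass, replace


@dataclass
class Config:
    """Hypothetical subset of per-device settings; not the real geneve_config."""
    ttl: int = 0
    tos: int = 0
    df: str = "unset"


class Device:
    def __init__(self, cfg: Config):
        self.cfg = replace(cfg)  # private copy of the whole structure

    def changelink_field_by_field(self, new: Config) -> None:
        # Error-prone pattern: every new field needs its own line here.
        self.cfg.ttl = new.ttl
        self.cfg.tos = new.tos
        # 'df' was forgotten, so changing it after creation has no effect.

    def changelink_whole_struct(self, new: Config) -> None:
        # One copy covers every field, present and future.
        self.cfg = replace(new)


dev = Device(Config())
dev.changelink_field_by_field(Config(ttl=64, df="set"))
print(dev.cfg.df)   # still the old value
dev.changelink_whole_struct(Config(ttl=64, df="set"))
print(dev.cfg.df)   # now updated
```

The forgotten `df` line stands in for the kind of omission that a whole-structure copy rules out.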
      
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. Jun 20, 2020
  19. Jun 04, 2020
    • geneve: change from tx_error to tx_dropped on missing metadata · 9d149045
      Jiri Benc authored
      
      If the geneve interface is in collect_md (external) mode, it can't send any
      packets submitted directly to its net interface, as such packets won't have
      metadata attached. This is expected.
      
      However, the kernel itself sends some packets to the interface, most
      notably, IPv6 DAD, IPv6 multicast listener reports, etc. This is not wrong,
      as tunnel metadata can be specified in routing table (although technically,
      that has never worked for IPv6, but hopefully will be fixed eventually) and
      then the interface must correctly participate in IPv6 housekeeping.
      
      The problem is that any such attempt increases the tx_error counter. Just
      bringing up a geneve interface with IPv6 enabled is enough to see a number
      of tx_errors. That causes confusion among users, prompting them to find
      a network error where there is none.
      
      Change the counter used to tx_dropped. That better conveys the meaning
      (there's nothing wrong going on, just some packets are getting dropped) and
      hopefully will make admins panic less.
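
As a minimal sketch of the accounting change, the transmit path of an external-mode device might be modelled as follows. `Stats`, `Device` and `xmit()` are hypothetical names for this illustration; only the choice of counter reflects the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class Stats:
    tx_errors: int = 0
    tx_dropped: int = 0
    tx_packets: int = 0


@dataclass
class Device:
    collect_md: bool
    stats: Stats = field(default_factory=Stats)

    def xmit(self, payload: bytes, metadata=None) -> bool:
        """Try to send a packet; return True if it was transmitted."""
        if self.collect_md and metadata is None:
            # Expected situation, so count a drop rather than an error.
            self.stats.tx_dropped += 1
            return False
        self.stats.tx_packets += 1
        return True


dev = Device(collect_md=True)
dev.xmit(b"ipv6 housekeeping packet")            # no metadata attached
dev.xmit(b"payload", metadata={"dst": "$IP2"})   # metadata present
print(dev.stats)
```

After these two calls the model shows one dropped packet, one transmitted packet and no errors.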
      
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. Apr 23, 2020
  21. Mar 15, 2020
  22. Dec 09, 2019
  23. Dec 04, 2019
    • net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup · 6c8991f4
      Sabrina Dubroca authored
      
      ipv6_stub uses the ip6_dst_lookup function to allow other modules to
      perform IPv6 lookups. However, this function skips the XFRM layer
      entirely.
      
      All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the
      ip_route_output_key and ip_route_output helpers) for their IPv4 lookups,
      which calls xfrm_lookup_route(). This patch fixes this inconsistent
      behavior by switching the stub to ip6_dst_lookup_flow, which also calls
      xfrm_lookup_route().
      
      This requires some changes in all the callers, as these two functions
      take different arguments and have different return types.
      
      Fixes: 5f81bd2e ("ipv6: export a stub for IPv6 symbols used by vxlan")
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. Sep 05, 2019
  25. Jun 19, 2019
  26. Jun 11, 2019
  27. Mar 29, 2019
  28. Mar 22, 2019
  29. Mar 02, 2019
    • geneve: correctly handle ipv6.disable module parameter · cf1c9ccb
      Jiri Benc authored
      
      When IPv6 is compiled but disabled at runtime, geneve_sock_add returns
      -EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole
      operation of bringing up the tunnel.
      
      Ignore failure of IPv6 socket creation for metadata based tunnels caused by
      IPv6 not being available.
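
The error handling described above can be sketched as follows. The functions are hypothetical userspace stand-ins (the driver's own helper is geneve_sock_add, whose exact signature is not reproduced here), and the model covers only the metadata-based case the commit addresses.

```python
import errno


def create_socket(family: str, ipv6_available: bool) -> int:
    """Hypothetical socket setup: 0 on success, negative errno on failure."""
    if family == "ipv6" and not ipv6_available:
        return -errno.EAFNOSUPPORT
    return 0


def open_tunnel(metadata_mode: bool, ipv6_available: bool) -> int:
    """Bring up the sockets a tunnel needs; return 0 or a negative errno."""
    if metadata_mode:
        # Metadata based tunnels try to open sockets for both families.
        ret = create_socket("ipv6", ipv6_available)
        if ret == -errno.EAFNOSUPPORT:
            ret = 0  # IPv6 is simply unavailable: carry on with IPv4 only
        if ret < 0:
            return ret
    return create_socket("ipv4", ipv6_available)


print(open_tunnel(metadata_mode=True, ipv6_available=False))
```

Without the `EAFNOSUPPORT` special case, the first failure would be returned and the whole bring-up would fail, as the commit message describes.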
      
      This is the same fix as what commit d074bf96 ("vxlan: correctly handle
      ipv6.disable module parameter") is doing for vxlan.
      
      Note there's also commit c0a47e44 ("geneve: should not call rt6_lookup()
      when ipv6 was disabled") which fixes a similar issue but for regular
      tunnels, while this patch is needed for metadata based tunnels.
      
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  30. Feb 07, 2019
  31. Nov 18, 2018
  32. Nov 09, 2018
  33. Nov 06, 2018
  34. Oct 18, 2018
  35. Oct 04, 2018
  36. Sep 13, 2018
  37. Jul 02, 2018
    • net: fix use-after-free in GRO with ESP · 603d4cf8
      Sabrina Dubroca authored
      
      Since the addition of GRO for ESP, gro_receive can consume the skb and
      return -EINPROGRESS. In that case, the lower layer GRO handler cannot
      touch the skb anymore.
      
      Commit 5f114163 ("net: Add a skb_gro_flush_final helper.") converted
      some of the gro_receive handlers that can lead to ESP's gro_receive so
      that they wouldn't access the skb when -EINPROGRESS is returned, but
      missed other spots, mainly in tunneling protocols.
      
      This patch finishes the conversion to using skb_gro_flush_final(), and
      adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and
      GUE.
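
The ownership rule behind the helper can be modelled in a few lines. This is a simplified sketch: `Skb`, `gro_flush_final()` and the two inner handlers are hypothetical Python stand-ins, and the real helper operates on GRO control-block state rather than a plain attribute.

```python
import errno


class Skb:
    """Toy packet with the single flag this sketch cares about."""

    def __init__(self):
        self.flush = 0


def gro_flush_final(skb: Skb, result, flush: int) -> None:
    """Update skb.flush unless the inner handler has taken ownership of skb."""
    if result != -errno.EINPROGRESS:
        skb.flush |= flush


def esp_like_receive(skb: Skb):
    return -errno.EINPROGRESS  # consumed: the caller must not touch skb


def plain_receive(skb: Skb):
    return None  # nothing merged, skb is still owned by the caller


def tunnel_gro_receive(skb: Skb, inner_receive):
    flush = 1
    result = inner_receive(skb)
    gro_flush_final(skb, result, flush)  # the only place skb is touched afterwards
    return result


consumed, kept = Skb(), Skb()
tunnel_gro_receive(consumed, esp_like_receive)
tunnel_gro_receive(kept, plain_receive)
print(consumed.flush, kept.flush)
```

A tunnel handler that wrote `skb.flush |= flush` directly, without the check, would touch a packet it no longer owns in the ESP case.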
      
      Fixes: 5f114163 ("net: Add a skb_gro_flush_final helper.")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>