  1. Sep 27, 2024
    • [tree-wide] finally take no_llseek out · cb787f4a
      Al Viro authored
      
      no_llseek had been defined to NULL two years ago, in commit 868941b1
      ("fs: remove no_llseek")
      
      To quote that commit,
      
        At -rc1 we'll need do a mechanical removal of no_llseek -
      
        git grep -l -w no_llseek | grep -v porting.rst | while read i; do
      	sed -i '/\<no_llseek\>/d' $i
        done
      
        would do it.
      
      Unfortunately, that hadn't been done.  Linus, could you do that now, so
      that we could finally put that thing to rest? All instances are of the
      form
      	.llseek = no_llseek,
      so it's obviously safe.
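
Why deleting the line is behavior-neutral can be sketched in plain C. This is only an illustration under stated assumptions: demo_file_ops, demo_read and demo_llseek are hypothetical stand-ins, and the "NULL means not seekable" dispatch rule is assumed for the sketch rather than taken from this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for struct file_operations. */
struct demo_file_ops {
	long (*llseek)(long pos, long offset);
	long (*read)(char *buf, size_t len);
};

/* State of the tree since the earlier commit: no_llseek is just NULL. */
#define no_llseek NULL

long demo_read(char *buf, size_t len)
{
	(void)buf;
	return (long)len;
}

/* Before the cleanup: explicit but redundant initializer. */
const struct demo_file_ops ops_before = {
	.llseek = no_llseek,
	.read   = demo_read,
};

/* After the sed run: omitted members are zero-initialized, i.e. NULL. */
const struct demo_file_ops ops_after = {
	.read   = demo_read,
};

/* Assumed dispatch rule for this sketch: a NULL llseek means "not seekable". */
long demo_llseek(const struct demo_file_ops *ops, long pos, long offset)
{
	assert(ops != NULL);
	if (!ops->llseek)
		return -ESPIPE;
	return ops->llseek(pos, offset);
}
```

Both tables hold a NULL llseek member, so any caller that dispatches through them sees the same result before and after the removal.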
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. Sep 26, 2024
    • netfilter: nfnetlink_queue: remove old clash resolution logic · 8af79d3e
      Florian Westphal authored
      
      For historical reasons there are two clash resolution spots in
      netfilter, one in nfnetlink_queue and one in conntrack core.
      
      The nfnetlink_queue one was added first: if a colliding entry is found, the
      NAT transformation is reversed by calling the NAT engine again with an
      altered tuple.
      
      See commit 368982cd ("netfilter: nfnetlink_queue: resolve clash for
      unconfirmed conntracks") for details.
      
      One problem is that nf_reroute() won't take an action if the queueing
      doesn't occur in the OUTPUT hook, i.e. when queueing in forward or
      postrouting, packet will be sent via the wrong path.
      
      Another problem is that the scenario addressed (2nd UDP packet sent with
      identical addresses while first packet is still being processed) can also
      occur without any nfqueue involvement due to threaded resolvers doing
      A and AAAA requests back-to-back.
      
      This led us to add clash resolution logic to the conntrack core, see
      commit 6a757c07 ("netfilter: conntrack: allow insertion of clashing
      entries").  Instead of fixing the nfqueue based logic, let's remove it
      and let conntrack core handle this instead.
      
      Retain the ->update hook for sake of nfqueue based conntrack helpers.
      We could axe this hook completely but we'd have to split confirm and
      helper logic again, see commit ee04805f ("netfilter: conntrack: make
      conntrack userspace helpers work again").
      
      This SHOULD NOT be backported to kernels earlier than v5.6; they lack
      adequate clash resolution handling.
      
      Patch was originally written by Pablo Neira Ayuso.
      
      Reported-by: Antonio Ojea <aojea@google.com>
      Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1766
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Tested-by: Antonio Ojea <aojea@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: missing objects with no memcg accounting · 69e687ce
      Pablo Neira Ayuso authored
      
      Several ruleset objects are still not using GFP_KERNEL_ACCOUNT for
      memory accounting, update them. This includes:
      
      - catchall elements
      - compat match large info area
      - log prefix
      - meta secctx
      - numgen counters
      - pipapo set backend datastructure
      - tunnel private objects
      
      Fixes: 33758c89 ("memcg: enable accounting for nft objects")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: use rcu chain hook list iterator from netlink dump path · 4ffcf5ca
      Pablo Neira Ayuso authored
      
      Lockless iteration over hook list is possible from netlink dump path,
      use rcu variant to iterate over the hook list as is done with flowtable
      hooks.
      
      Fixes: b9703ed4 ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
      Reported-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS · e1f1ee0e
      Simon Horman authored
      Only provide ctnetlink_label_size when it is used,
      which is when CONFIG_NF_CONNTRACK_EVENTS is configured.
      
      Flagged by clang-18 W=1 builds as:
      
      .../nf_conntrack_netlink.c:385:19: warning: unused function 'ctnetlink_label_size' [-Wunused-function]
        385 | static inline int ctnetlink_label_size(const struct nf_conn *ct)
            |                   ^~~~~~~~~~~~~~~~~~~~
      
      The condition on CONFIG_NF_CONNTRACK_LABELS being removed by
      this patch guards compilation of non-trivial implementations
      of ctnetlink_dump_labels() and ctnetlink_label_size().
      
      However, this is not necessary as each of these functions
      will always return 0 if CONFIG_NF_CONNTRACK_LABELS is not defined
      as each function starts with the equivalent of:
      
      	struct nf_conn_labels *labels = nf_ct_labels_find(ct);
      
      	if (!labels)
      		return 0;
      
      And nf_ct_labels_find always returns NULL if CONFIG_NF_CONNTRACK_LABELS
      is not enabled.  So I believe that the compiler optimises the code away
      in such cases anyway.
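
The early-return pattern described above can be sketched in self-contained C. All demo_* names and the DEMO_LABELS_ENABLED macro are hypothetical stand-ins (for nf_ct_labels_find(), ctnetlink_label_size() and CONFIG_NF_CONNTRACK_LABELS); this is not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>

struct demo_labels { unsigned char bits[16]; };
struct demo_conn   { struct demo_labels *labels; };

/* DEMO_LABELS_ENABLED is a hypothetical stand-in for the Kconfig option. */
#ifdef DEMO_LABELS_ENABLED
static inline struct demo_labels *demo_labels_find(const struct demo_conn *ct)
{
	return ct->labels;
}
#else
/* Feature compiled out: the lookup is a constant NULL. */
static inline struct demo_labels *demo_labels_find(const struct demo_conn *ct)
{
	(void)ct;
	return NULL;
}
#endif

/* No #ifdef needed here: with the stub above, the body folds to "return 0". */
int demo_label_size(const struct demo_conn *ct)
{
	struct demo_labels *labels;

	assert(ct != NULL);
	labels = demo_labels_find(ct);
	if (!labels)
		return 0;
	return (int)sizeof(labels->bits);
}
```

With the macro undefined, the size function returns 0 even for a connection that carries a label pointer, which is the behavior the commit message relies on.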
      
      Found by inspection.
      Compile tested only.
      
      Originally split into two patches; Pablo Neira Ayuso collapsed them and
      added the Fixes: tag.
      
      Fixes: 0ceabd83 ("netfilter: ctnetlink: deliver labels to userspace")
      Link: https://lore.kernel.org/netfilter-devel/20240909151712.GZ2097826@kernel.org/
      Signed-off-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_reject: Fix build warning when CONFIG_BRIDGE_NETFILTER=n · fc56878c
      Simon Horman authored
      
      If CONFIG_BRIDGE_NETFILTER is not enabled, which is the case for x86_64
      defconfig, then building nf_reject_ipv4.c and nf_reject_ipv6.c with W=1
      using gcc-14 results in the following warnings, which are treated as
      errors:
      
      net/ipv4/netfilter/nf_reject_ipv4.c: In function 'nf_send_reset':
      net/ipv4/netfilter/nf_reject_ipv4.c:243:23: error: variable 'niph' set but not used [-Werror=unused-but-set-variable]
        243 |         struct iphdr *niph;
            |                       ^~~~
      cc1: all warnings being treated as errors
      net/ipv6/netfilter/nf_reject_ipv6.c: In function 'nf_send_reset6':
      net/ipv6/netfilter/nf_reject_ipv6.c:286:25: error: variable 'ip6h' set but not used [-Werror=unused-but-set-variable]
        286 |         struct ipv6hdr *ip6h;
            |                         ^~~~
      cc1: all warnings being treated as errors
      
      Address this by reducing the scope of these local variables to where
      they are used, which is code only compiled when CONFIG_BRIDGE_NETFILTER
      is enabled.
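
A minimal sketch of the scope reduction, assuming a hypothetical DEMO_BRIDGE macro in place of CONFIG_BRIDGE_NETFILTER and a made-up demo_hdr type. The variable name niph is borrowed from the warning above; the surrounding logic is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct demo_hdr { int ttl; };

/*
 * DEMO_BRIDGE is a hypothetical stand-in for CONFIG_BRIDGE_NETFILTER.
 * Returns the TTL handed to the bridge-only path, or 0 when that path
 * is compiled out.
 */
int demo_send_reset(struct demo_hdr *hdr)
{
	int bridged_ttl = 0;

	assert(hdr != NULL);
	hdr->ttl = 64;	/* common work done in every configuration */

#ifdef DEMO_BRIDGE
	{
		/* Declared inside the guarded block, the only place it is read. */
		struct demo_hdr *niph = hdr;

		bridged_ttl = niph->ttl;
	}
#endif
	return bridged_ttl;
}
```

Because niph no longer exists when the guarded block is compiled out, there is nothing left for -Wunused-but-set-variable to flag.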
      
      Compile tested and run through netfilter selftests.
      
      Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Closes: https://lore.kernel.org/netfilter-devel/20240906145513.567781-1-andriy.shevchenko@linux.intel.com/
      Signed-off-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: Keep deleted flowtable hooks until after RCU · 642c89c4
      Phil Sutter authored
      
      Documentation of list_del_rcu() warns callers to not immediately free
      the deleted list item. While it seems not necessary to use the
      RCU-variant of list_del() here in the first place, doing so seems to
      require calling kfree_rcu() on the deleted item as well.
      
      Fixes: 3f0465a9 ("netfilter: nf_tables: dynamically allocate hooks per net_device in flowtables")
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ctnetlink: Guard possible unused functions · 2cadd3b1
      Andy Shevchenko authored
      
      Some of the functions may be unused (CONFIG_NETFILTER_NETLINK_GLUE_CT=n
      and CONFIG_NF_CONNTRACK_EVENTS=n), which breaks kernel builds with clang,
      `make W=1` and CONFIG_WERROR=y:
      
      net/netfilter/nf_conntrack_netlink.c:657:22: error: unused function 'ctnetlink_acct_size' [-Werror,-Wunused-function]
        657 | static inline size_t ctnetlink_acct_size(const struct nf_conn *ct)
            |                      ^~~~~~~~~~~~~~~~~~~
      net/netfilter/nf_conntrack_netlink.c:667:19: error: unused function 'ctnetlink_secctx_size' [-Werror,-Wunused-function]
        667 | static inline int ctnetlink_secctx_size(const struct nf_conn *ct)
            |                   ^~~~~~~~~~~~~~~~~~~~~
      net/netfilter/nf_conntrack_netlink.c:683:22: error: unused function 'ctnetlink_timestamp_size' [-Werror,-Wunused-function]
        683 | static inline size_t ctnetlink_timestamp_size(const struct nf_conn *ct)
            |                      ^~~~~~~~~~~~~~~~~~~~~~~~
      
      Fix this by guarding possible unused functions with ifdeffery.
      
      See also commit 6863f564 ("kbuild: allow Clang to find unused static
      inline functions for W=1 build").
      
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: add clash resolution for reverse collisions · a4e6a103
      Florian Westphal authored
      
      Given existing entry:
      ORIGIN: a:b -> c:d
      REPLY:  c:d -> a:b
      
      And colliding entry:
      ORIGIN: c:d -> a:b
      REPLY:  a:b -> c:d
      
      The colliding ct (and the associated skb) get dropped on insert.
      Permit this by checking if the colliding entry matches the reply
      direction.
      
      This happens when both ends send packets at the same time: both requests
      are picked up as NEW, rather than NEW for the 'first' and 'ESTABLISHED'
      for the second packet.
      
      This is an esoteric condition, as ruleset must permit NEW connections
      in either direction and both peers must already have a bidirectional
      traffic flow at the time conntrack gets enabled.
      
      Allow the 'reverse' skb to pass and assign the existing (clashing)
      entry.
      
      While at it, also drop the extra 'dying' check; this is already
      tested earlier by the calling function.
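
The reverse-direction check can be sketched as a tuple comparison. The demo_tuple and demo_ct types and both function names are hypothetical simplifications for illustration; the real conntrack tuple carries more fields (protocol, zone and so on) that this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for a conntrack tuple and entry. */
struct demo_tuple {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

struct demo_ct {
	struct demo_tuple orig;		/* ORIGINAL direction */
	struct demo_tuple reply;	/* REPLY direction */
};

bool demo_tuple_equal(const struct demo_tuple *a, const struct demo_tuple *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

/*
 * True when the incoming entry is the mirror image of the existing one:
 * its ORIGINAL tuple equals the existing REPLY tuple and vice versa.
 */
bool demo_is_reverse_clash(const struct demo_ct *existing,
			   const struct demo_ct *incoming)
{
	assert(existing != NULL && incoming != NULL);
	return demo_tuple_equal(&incoming->orig, &existing->reply) &&
	       demo_tuple_equal(&incoming->reply, &existing->orig);
}
```

In this model, a caller that detects a reverse clash would let the packet pass and attach the existing entry instead of dropping the packet.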
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash · d8f84a9b
      Florian Westphal authored
      
      A conntrack entry can be inserted to the connection tracking table if there
      is no existing entry with an identical tuple in either direction.
      
      Example:
      INITIATOR -> NAT/PAT -> RESPONDER
      
      Initiator passes through NAT/PAT ("us") and SNAT is done (saddr rewrite).
      Then, later, NAT/PAT machine itself also wants to connect to RESPONDER.
      
      This will not work if the SNAT done earlier has same IP:PORT source pair.
      
      Conntrack table has:
      ORIGINAL: $IP_INITIATOR:$SPORT -> $IP_RESPONDER:$DPORT
      REPLY:    $IP_RESPONDER:$DPORT -> $IP_NAT:$SPORT
      
      and new locally originating connection wants:
      ORIGINAL: $IP_NAT:$SPORT -> $IP_RESPONDER:$DPORT
      REPLY:    $IP_RESPONDER:$DPORT -> $IP_NAT:$SPORT
      
      This is handled by the NAT engine which will do a source port reallocation
      for the locally originating connection that is colliding with an existing
      tuple by attempting a source port rewrite.
      
      This is done even if this new connection attempt did not go through a
      masquerade/snat rule.
      
      There is a rare race condition with connection-less protocols like UDP,
      where we do the port reallocation even though it's not needed.
      
      This happens when new packets from the same, pre-existing flow are received
      in both directions at the exact same time on different CPUs after the
      conntrack table was flushed (or conntrack becomes active for first time).
      
      With strict ordering/single cpu, the first packet creates new ct entry and
      second packet is resolved as established reply packet.
      
      With parallel processing, both packets are picked up as new and both get
      their own ct entry.
      
      In this case, the 'reply' packet (picked up as ORIGINAL) can be mangled by
      NAT engine because a port collision is detected.
      
      This change isn't enough to prevent a packet drop later during
      nf_conntrack_confirm(): the existing clash resolution strategy will not
      detect such a reverse clash case.  This is resolved by a followup patch.
      
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  3. Sep 25, 2024
    • vsock/virtio: avoid queuing packets when intermediate queue is empty · efcd71af
      Luigi Leonardi authored and Michael S. Tsirkin committed
      
      When the driver needs to send new packets to the device, it always
      queues the new sk_buffs into an intermediate queue (send_pkt_queue)
      and schedules a worker (send_pkt_work) to then queue them into the
      virtqueue exposed to the device.
      
      This increases the chance of batching, but also introduces a lot of
      latency into the communication. So we can optimize this path by
      adding a fast path to be taken when there is no element in the
      intermediate queue, there is space available in the virtqueue,
      and no other process is currently sending packets (i.e. holding tx_lock).
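
The three fast-path conditions can be modelled in a few lines of C. This is a single-threaded sketch with hypothetical names (demo_tx, demo_send_pkt); a boolean stands in for the lock, whereas a real driver would need an actual try-lock and would hand slow-path packets to a worker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the transmit state; not the driver's real layout. */
struct demo_tx {
	size_t pending;		/* packets waiting in the intermediate queue */
	size_t vq_free;		/* free slots in the virtqueue */
	bool   tx_locked;	/* another sender currently holds the tx lock */
	size_t sent_direct;	/* packets that took the fast path */
};

/* Returns true when the packet went straight to the virtqueue. */
bool demo_send_pkt(struct demo_tx *tx)
{
	assert(tx != NULL);

	/* Fast path: nothing queued ahead of us, room available, lock free. */
	if (tx->pending == 0 && tx->vq_free > 0 && !tx->tx_locked) {
		tx->vq_free--;
		tx->sent_direct++;
		return true;
	}

	/* Slow path: enqueue and leave delivery to the (not modelled) worker. */
	tx->pending++;
	return false;
}
```

Checking the intermediate queue first keeps ordering intact in this model: a new packet never overtakes packets that are already waiting for the worker.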
      
      The following benchmarks were run to check improvements in latency and
      throughput. The test bed is a host with Intel i7-10700KF CPU @ 3.80GHz
      and L1 guest running on QEMU/KVM with vhost process and all vCPUs
      pinned individually to pCPUs.
      
      - Latency
         Tool: Fio version 3.37-56
         Mode: pingpong (h-g-h)
         Test runs: 50
         Runtime-per-test: 50s
         Type: SOCK_STREAM
      
      In the following fio benchmark (pingpong mode) the host sends
      a payload to the guest and waits for the same payload back.
      
      The fio process was pinned both in the host and in the guest system.
      
      Before: Linux 6.9.8
      
      Payload 64B:
      
      	1st perc.	overall		99th perc.
      Before	12.91		16.78		42.24		us
      After	9.77		13.57		39.17		us
      
      Payload 512B:
      
      	1st perc.	overall		99th perc.
      Before	13.35		17.35		41.52		us
      After	10.25		14.11		39.58		us
      
      Payload 4K:
      
      	1st perc.	overall		99th perc.
      Before	14.71		19.87		41.52		us
      After	10.51		14.96		40.81		us
      
      - Throughput
         Tool: iperf-vsock
      
      The size represents the buffer length (-l) to read/write
      P represents the number of parallel streams
      
      P=1
      	4K	64K	128K
      Before	6.87	29.3	29.5 Gb/s
      After	10.5	39.4	39.9 Gb/s
      
      P=2
      	4K	64K	128K
      Before	10.5	32.8	33.2 Gb/s
      After	17.8	47.7	48.5 Gb/s
      
      P=4
      	4K	64K	128K
      Before	12.7	33.6	34.2 Gb/s
      After	16.9	48.1	50.5 Gb/s
      
      The performance improvement is related to this optimization:
      I used an eBPF kretprobe on virtio_transport_send_skb to check
      that each packet was sent directly to the virtqueue.
      
      Co-developed-by: Marco Pinna <marco.pinn95@gmail.com>
      Signed-off-by: Marco Pinna <marco.pinn95@gmail.com>
      Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com>
      Message-Id: <20240730-pinna-v4-2-5c9179164db5@outlook.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
    • vsock/virtio: refactor virtio_transport_send_pkt_work · 26618da3
      Marco Pinna authored and Michael S. Tsirkin committed
      
      Preliminary patch to introduce an optimization to the
      enqueue system.
      
      All the code used to enqueue a packet into the virtqueue
      is removed from virtio_transport_send_pkt_work()
      and moved to the new virtio_transport_send_skb() function.
      
      Co-developed-by: Luigi Leonardi <luigi.leonardi@outlook.com>
      Signed-off-by: Luigi Leonardi <luigi.leonardi@outlook.com>
      Signed-off-by: Marco Pinna <marco.pinn95@gmail.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Message-Id: <20240730-pinna-v4-1-5c9179164db5@outlook.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
  4. Sep 24, 2024
  5. Sep 23, 2024
  6. Sep 22, 2024
  7. Sep 20, 2024
  8. Sep 19, 2024
    • netfilter: nf_reject_ipv6: fix nf_reject_ip6_tcphdr_put() · 9c778fe4
      Eric Dumazet authored and Paolo Abeni committed
      
      syzbot reported that nf_reject_ip6_tcphdr_put() was possibly sending
      garbage in the four reserved TCP bits (th->res1).

      Use skb_put_zero() to clear the whole TCP header,
      as done in nf_reject_ip_tcphdr_put().
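
The difference between reserving space and reserving zeroed space can be sketched with a plain byte buffer. demo_buf, demo_put and demo_put_zero are hypothetical userspace analogues written for this note; they are not the skb API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size packet buffer standing in for an skb. */
struct demo_buf {
	unsigned char data[64];
	size_t len;
};

/* Reserve n bytes at the tail; the contents are whatever was there before. */
unsigned char *demo_put(struct demo_buf *b, size_t n)
{
	unsigned char *p;

	assert(b != NULL && b->len + n <= sizeof(b->data));
	p = b->data + b->len;
	b->len += n;
	return p;
}

/* Reserve n bytes at the tail and clear them, so unset fields read as 0. */
unsigned char *demo_put_zero(struct demo_buf *b, size_t n)
{
	unsigned char *p = demo_put(b, n);

	memset(p, 0, n);
	return p;
}
```

With the zeroing variant, a header field that the caller never assigns, such as reserved bits, goes out as zero instead of stale allocator contents.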
      
      BUG: KMSAN: uninit-value in nf_reject_ip6_tcphdr_put+0x688/0x6c0 net/ipv6/netfilter/nf_reject_ipv6.c:255
        nf_reject_ip6_tcphdr_put+0x688/0x6c0 net/ipv6/netfilter/nf_reject_ipv6.c:255
        nf_send_reset6+0xd84/0x15b0 net/ipv6/netfilter/nf_reject_ipv6.c:344
        nft_reject_inet_eval+0x3c1/0x880 net/netfilter/nft_reject_inet.c:48
        expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
        nft_do_chain+0x438/0x22a0 net/netfilter/nf_tables_core.c:288
        nft_do_chain_inet+0x41a/0x4f0 net/netfilter/nft_chain_filter.c:161
        nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
        nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626
        nf_hook include/linux/netfilter.h:269 [inline]
        NF_HOOK include/linux/netfilter.h:312 [inline]
        ipv6_rcv+0x29b/0x390 net/ipv6/ip6_input.c:310
        __netif_receive_skb_one_core net/core/dev.c:5661 [inline]
        __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5775
        process_backlog+0x4ad/0xa50 net/core/dev.c:6108
        __napi_poll+0xe7/0x980 net/core/dev.c:6772
        napi_poll net/core/dev.c:6841 [inline]
        net_rx_action+0xa5a/0x19b0 net/core/dev.c:6963
        handle_softirqs+0x1ce/0x800 kernel/softirq.c:554
        __do_softirq+0x14/0x1a kernel/softirq.c:588
        do_softirq+0x9a/0x100 kernel/softirq.c:455
        __local_bh_enable_ip+0x9f/0xb0 kernel/softirq.c:382
        local_bh_enable include/linux/bottom_half.h:33 [inline]
        rcu_read_unlock_bh include/linux/rcupdate.h:908 [inline]
        __dev_queue_xmit+0x2692/0x5610 net/core/dev.c:4450
        dev_queue_xmit include/linux/netdevice.h:3105 [inline]
        neigh_resolve_output+0x9ca/0xae0 net/core/neighbour.c:1565
        neigh_output include/net/neighbour.h:542 [inline]
        ip6_finish_output2+0x2347/0x2ba0 net/ipv6/ip6_output.c:141
        __ip6_finish_output net/ipv6/ip6_output.c:215 [inline]
        ip6_finish_output+0xbb8/0x14b0 net/ipv6/ip6_output.c:226
        NF_HOOK_COND include/linux/netfilter.h:303 [inline]
        ip6_output+0x356/0x620 net/ipv6/ip6_output.c:247
        dst_output include/net/dst.h:450 [inline]
        NF_HOOK include/linux/netfilter.h:314 [inline]
        ip6_xmit+0x1ba6/0x25d0 net/ipv6/ip6_output.c:366
        inet6_csk_xmit+0x442/0x530 net/ipv6/inet6_connection_sock.c:135
        __tcp_transmit_skb+0x3b07/0x4880 net/ipv4/tcp_output.c:1466
        tcp_transmit_skb net/ipv4/tcp_output.c:1484 [inline]
        tcp_connect+0x35b6/0x7130 net/ipv4/tcp_output.c:4143
        tcp_v6_connect+0x1bcc/0x1e40 net/ipv6/tcp_ipv6.c:333
        __inet_stream_connect+0x2ef/0x1730 net/ipv4/af_inet.c:679
        inet_stream_connect+0x6a/0xd0 net/ipv4/af_inet.c:750
        __sys_connect_file net/socket.c:2061 [inline]
        __sys_connect+0x606/0x690 net/socket.c:2078
        __do_sys_connect net/socket.c:2088 [inline]
        __se_sys_connect net/socket.c:2085 [inline]
        __x64_sys_connect+0x91/0xe0 net/socket.c:2085
        x64_sys_call+0x27a5/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:43
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      Uninit was stored to memory at:
        nf_reject_ip6_tcphdr_put+0x60c/0x6c0 net/ipv6/netfilter/nf_reject_ipv6.c:249
        nf_send_reset6+0xd84/0x15b0 net/ipv6/netfilter/nf_reject_ipv6.c:344
        nft_reject_inet_eval+0x3c1/0x880 net/netfilter/nft_reject_inet.c:48
        expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
        nft_do_chain+0x438/0x22a0 net/netfilter/nf_tables_core.c:288
        nft_do_chain_inet+0x41a/0x4f0 net/netfilter/nft_chain_filter.c:161
        nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
        nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626
        nf_hook include/linux/netfilter.h:269 [inline]
        NF_HOOK include/linux/netfilter.h:312 [inline]
        ipv6_rcv+0x29b/0x390 net/ipv6/ip6_input.c:310
        __netif_receive_skb_one_core net/core/dev.c:5661 [inline]
        __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5775
        process_backlog+0x4ad/0xa50 net/core/dev.c:6108
        __napi_poll+0xe7/0x980 net/core/dev.c:6772
        napi_poll net/core/dev.c:6841 [inline]
        net_rx_action+0xa5a/0x19b0 net/core/dev.c:6963
        handle_softirqs+0x1ce/0x800 kernel/softirq.c:554
        __do_softirq+0x14/0x1a kernel/softirq.c:588
      
      Uninit was stored to memory at:
        nf_reject_ip6_tcphdr_put+0x2ca/0x6c0 net/ipv6/netfilter/nf_reject_ipv6.c:231
        nf_send_reset6+0xd84/0x15b0 net/ipv6/netfilter/nf_reject_ipv6.c:344
        nft_reject_inet_eval+0x3c1/0x880 net/netfilter/nft_reject_inet.c:48
        expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
        nft_do_chain+0x438/0x22a0 net/netfilter/nf_tables_core.c:288
        nft_do_chain_inet+0x41a/0x4f0 net/netfilter/nft_chain_filter.c:161
        nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
        nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626
        nf_hook include/linux/netfilter.h:269 [inline]
        NF_HOOK include/linux/netfilter.h:312 [inline]
        ipv6_rcv+0x29b/0x390 net/ipv6/ip6_input.c:310
        __netif_receive_skb_one_core net/core/dev.c:5661 [inline]
        __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5775
        process_backlog+0x4ad/0xa50 net/core/dev.c:6108
        __napi_poll+0xe7/0x980 net/core/dev.c:6772
        napi_poll net/core/dev.c:6841 [inline]
        net_rx_action+0xa5a/0x19b0 net/core/dev.c:6963
        handle_softirqs+0x1ce/0x800 kernel/softirq.c:554
        __do_softirq+0x14/0x1a kernel/softirq.c:588
      
      Uninit was created at:
        slab_post_alloc_hook mm/slub.c:3998 [inline]
        slab_alloc_node mm/slub.c:4041 [inline]
        kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4084
        kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:583
        __alloc_skb+0x363/0x7b0 net/core/skbuff.c:674
        alloc_skb include/linux/skbuff.h:1320 [inline]
        nf_send_reset6+0x98d/0x15b0 net/ipv6/netfilter/nf_reject_ipv6.c:327
        nft_reject_inet_eval+0x3c1/0x880 net/netfilter/nft_reject_inet.c:48
        expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
        nft_do_chain+0x438/0x22a0 net/netfilter/nf_tables_core.c:288
        nft_do_chain_inet+0x41a/0x4f0 net/netfilter/nft_chain_filter.c:161
        nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
        nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626
        nf_hook include/linux/netfilter.h:269 [inline]
        NF_HOOK include/linux/netfilter.h:312 [inline]
        ipv6_rcv+0x29b/0x390 net/ipv6/ip6_input.c:310
        __netif_receive_skb_one_core net/core/dev.c:5661 [inline]
        __netif_receive_skb+0x1da/0xa00 net/core/dev.c:5775
        process_backlog+0x4ad/0xa50 net/core/dev.c:6108
        __napi_poll+0xe7/0x980 net/core/dev.c:6772
        napi_poll net/core/dev.c:6841 [inline]
        net_rx_action+0xa5a/0x19b0 net/core/dev.c:6963
        handle_softirqs+0x1ce/0x800 kernel/softirq.c:554
        __do_softirq+0x14/0x1a kernel/softirq.c:588
      
      Fixes: c8d7b98b ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Link: https://patch.msgid.link/20240913170615.3670897-1-edumazet@google.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  9. Sep 15, 2024
  10. Sep 14, 2024
  11. Sep 13, 2024
    • bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error · 4b3786a6
      Daniel Borkmann authored
      
      For all non-tracing helpers which formerly had ARG_PTR_TO_{LONG,INT} as input
      arguments, zero the value in the error case, as otherwise it could leak
      uninitialized memory. For tracing, this is not needed given CAP_PERFMON can
      already read all kernel memory anyway, hence bpf_get_func_arg() and
      bpf_get_func_ret() are skipped here.
      
      Also, the MTU helpers' mtu_len pointer value is being written but also read.
      Technically, the MEM_UNINIT should not be there in order to always force init.
      Removing MEM_UNINIT needs more verifier rework though: MEM_UNINIT right now
      implies two things actually: i) write into memory, ii) memory does not have
      to be initialized. If we lift MEM_UNINIT, it then becomes: i) read from memory,
      ii) memory must be initialized. This means that for bpf_*_check_mtu() we're
      re-adding the issue we're trying to fix, that is, it would then be able to
      write back into things like .rodata BPF maps. Follow-up work will rework the
      MEM_UNINIT semantics such that the intent can be better expressed. For now
      just clear the *mtu_len on the error path, which can be lifted later again.
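
The "zero the output on error" rule can be sketched with a userspace parser. demo_strtol is a hypothetical helper built on the C library's strtol(); it only mirrors the shape of the fix (write a defined value on every path) and is not the BPF helper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal number from s into *res.
 * Returns the number of characters consumed, or -EINVAL on failure.
 * On failure *res is set to 0, so the caller never reads stale bytes
 * from a buffer it assumed the helper had filled in.
 */
int demo_strtol(const char *s, long *res)
{
	char *end;
	long val;

	assert(s != NULL && res != NULL);

	errno = 0;
	val = strtol(s, &end, 10);
	if (end == s || errno == ERANGE) {
		*res = 0;	/* error path: still write a defined value */
		return -EINVAL;
	}

	*res = val;
	return (int)(end - s);
}
```

Writing the output on both paths matters whenever the caller, or a verifier reasoning about the caller, treats the buffer as initialized once the call returns.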
      
      Fixes: 8a67f2de ("bpf: expose bpf_strtol and bpf_strtoul to all program types")
      Fixes: d7a4cb9b ("bpf: Introduce bpf_strtol and bpf_strtoul helpers")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/e5edd241-59e7-5e39-0ee5-a51e31b6840a@iogearbox.net
      Link: https://lore.kernel.org/r/20240913191754.13290-5-daniel@iogearbox.net
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>