1. 04 Jan, 2019 1 commit
    • Linus Torvalds's avatar
      Remove 'type' argument from access_ok() function · 96d4f267
      Linus Torvalds authored
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted this patch, because there is no way we're going to
      move the old VERIFY_xyz interface to that model.  And it's best done at
      the end of the merge window when I've done most of my merges, so let's
      just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      96d4f267
  2. 01 Jan, 2019 2 commits
    • Willem de Bruijn's avatar
      ip: validate header length on virtual device xmit · cb9f1b78
      Willem de Bruijn authored
      KMSAN detected read beyond end of buffer in vti and sit devices when
      passing truncated packets with PF_PACKET. The issue affects additional
      ip tunnel devices.
      
      Extend commit 76c0ddd8 ("ip6_tunnel: be careful when accessing the
      inner header") and commit ccfec9e5 ("ip_tunnel: be careful when
      accessing the inner header").
      
      Move the check to a separate helper and call at the start of each
      ndo_start_xmit function in net/ipv4 and net/ipv6.
      
      Minor changes:
      - convert dev_kfree_skb to kfree_skb on error path,
        as dev_kfree_skb calls consume_skb which is not for error paths.
      - use pskb_network_may_pull even though that is pedantic here,
        as the same as pskb_may_pull for devices without llheaders.
      - do not cache ipv6 hdrs if used only once
        (unsafe across pskb_may_pull, was more relevant to earlier patch)
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cb9f1b78
    • Deepa Dinamani's avatar
      sock: Make sock->sk_stamp thread-safe · 3a0ed3e9
      Deepa Dinamani authored
      Al Viro mentioned (Message-ID
      <20170626041334.GZ10672@ZenIV.linux.org.uk>)
      that there is probably a race condition
      lurking in accesses of sk_stamp on 32-bit machines.
      
      sock->sk_stamp is of type ktime_t which is always an s64.
      On a 32 bit architecture, we might run into situations of
      unsafe access as the access to the field becomes non atomic.
      
      Use seqlocks for synchronization.
      This allows us to avoid using spinlocks for readers as
      readers do not need mutual exclusion.
      
      Another approach to solve this is to require sk_lock for all
      modifications of the timestamps. The current approach allows
      for timestamps to have their own lock: sk_stamp_lock.
      This allows for the patch to not compete with already
      existing critical sections, and side effects are limited
      to the paths in the patch.
      
      The addition of the new field maintains the data locality
      optimizations from
      commit 9115e8cd ("net: reorganize struct sock for better data
      locality")
      
      Note that all the instances of the sk_stamp accesses
      are either through the ioctl or the syscall recvmsg.
      Signed-off-by: default avatarDeepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3a0ed3e9
  3. 29 Dec, 2018 2 commits
  4. 24 Dec, 2018 1 commit
    • Peter Oskolkov's avatar
      net: dccp: fix kernel crash on module load · c92c81df
      Peter Oskolkov authored
      Patch eedbbb0d "net: dccp: initialize (addr,port) ..."
      added calling to inet_hashinfo2_init() from dccp_init().
      
      However, inet_hashinfo2_init() is marked as __init(), and
      thus the kernel panics when dccp is loaded as module. Removing
      __init() tag from inet_hashinfo2_init() is not feasible because
      it calls into __init functions in mm.
      
      This patch adds inet_hashinfo2_init_mod() function that can
      be called after the init phase is done; changes dccp_init() to
      call the new function; un-marks inet_hashinfo2_init() as
      exported.
      
      Fixes: eedbbb0d ("net: dccp: initialize (addr,port) ...")
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Signed-off-by: default avatarPeter Oskolkov <posk@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c92c81df
  5. 21 Dec, 2018 1 commit
  6. 20 Dec, 2018 5 commits
    • Florian Westphal's avatar
      netfilter: netns: shrink netns_ct struct · 8527f9df
      Florian Westphal authored
      remove the obsolete sysctl anchors and move auto_assign_helper_warned
      to avoid/cover a hole.  Reduces size by 40 bytes on 64 bit.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      8527f9df
    • Florian Westphal's avatar
      netfilter: conntrack: remove empty pernet fini stubs · fc3893fd
      Florian Westphal authored
      after moving sysctl handling into single place, the init functions
      can't fail anymore and some of the fini functions are empty.
      
      Remove them and change return type to void.
      This also simplifies error unwinding in conntrack module init path.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      fc3893fd
    • Florian Westphal's avatar
      netfilter: conntrack: un-export seq_print_acct · 4b216e21
      Florian Westphal authored
      Only one caller, just place it where its needed.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      4b216e21
    • Florian Westphal's avatar
      netfilter: conntrack: udp: only extend timeout to stream mode after 2s · d535c8a6
      Florian Westphal authored
      Currently DNS resolvers that send both A and AAAA queries from same source port
      can trigger stream mode prematurely, which results in non-early-evictable conntrack entry
      for three minutes, even though DNS requests are done in a few milliseconds.
      
      Add a two second grace period where we continue to use the ordinary
      30-second default timeout.  Its enough for DNS request/response traffic,
      even if two request/reply packets are involved.
      
      ASSURED is still set, else conntrack (and thus a possible
      NAT mapping ...) gets zapped too in case conntrack table runs full.
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      d535c8a6
    • John Fastabend's avatar
      bpf: sk_msg, sock{map|hash} redirect through ULP · 0608c69c
      John Fastabend authored
      A sockmap program that redirects through a kTLS ULP enabled socket
      will not work correctly because the ULP layer is skipped. This
      fixes the behavior to call through the ULP layer on redirect to
      ensure any operations required on the data stream at the ULP layer
      continue to be applied.
      
      To do this we add an internal flag MSG_SENDPAGE_NOPOLICY to avoid
      calling the BPF layer on a redirected message. This is
      required to avoid calling the BPF layer multiple times (possibly
      recursively) which is not the current/expected behavior without
      ULPs. In the future we may add a redirect flag if users _do_
      want the policy applied again but this would need to work for both
      ULP and non-ULP sockets and be opt-in to avoid breaking existing
      programs.
      
      Also to avoid polluting the flag space with an internal flag we
      reuse the flag space overlapping MSG_SENDPAGE_NOPOLICY with
      MSG_WAITFORONE. Here WAITFORONE is specific to recv path and
      SENDPAGE_NOPOLICY is only used for sendpage hooks. The last thing
      to verify is user space API is masked correctly to ensure the flag
      can not be set by user. (Note this needs to be true regardless
      because we have internal flags already in-use that user space
      should not be able to set). But for completeness we have two UAPI
      paths into sendpage, sendfile and splice.
      
      In the sendfile case the function do_sendfile() zero's flags,
      
      ./fs/read_write.c:
       static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
      		   	    size_t count, loff_t max)
       {
         ...
         fl = 0;
      #if 0
         /*
          * We need to debate whether we can enable this or not. The
          * man page documents EAGAIN return for the output at least,
          * and the application is arguably buggy if it doesn't expect
          * EAGAIN on a non-blocking file descriptor.
          */
          if (in.file->f_flags & O_NONBLOCK)
      	fl = SPLICE_F_NONBLOCK;
      #endif
          file_start_write(out.file);
          retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl);
       }
      
      In the splice case the pipe_to_sendpage "actor" is used which
      masks flags with SPLICE_F_MORE.
      
      ./fs/splice.c:
       static int pipe_to_sendpage(struct pipe_inode_info *pipe,
      			    struct pipe_buffer *buf, struct splice_desc *sd)
       {
         ...
         more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
         ...
       }
      
      Confirming what we expect that internal flags  are in fact internal
      to socket side.
      
      Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling")
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      0608c69c
  7. 19 Dec, 2018 9 commits
  8. 18 Dec, 2018 5 commits
  9. 17 Dec, 2018 8 commits
  10. 16 Dec, 2018 2 commits
  11. 15 Dec, 2018 3 commits
  12. 14 Dec, 2018 1 commit