1. 11 Jan, 2019 1 commit
  2. 06 Jan, 2019 3 commits
    • Eric Biggers's avatar
      fscrypt: add Adiantum support · 8094c3ce
      Eric Biggers authored
      Add support for the Adiantum encryption mode to fscrypt.  Adiantum is a
      tweakable, length-preserving encryption mode with security provably
      reducible to that of XChaCha12 and AES-256, subject to a security bound.
      It's also a true wide-block mode, unlike XTS.  See the paper
      "Adiantum: length-preserving encryption for entry-level processors"
      (https://eprint.iacr.org/2018/720.pdf) for more details.  Also see
      commit 059c2a4d ("crypto: adiantum - add Adiantum support").
      On sufficiently long messages, Adiantum's bottlenecks are XChaCha12 and
      the NH hash function.  These algorithms are fast even on processors
      without dedicated crypto instructions.  Adiantum makes it feasible to
      enable storage encryption on low-end mobile devices that lack AES
      instructions; currently such devices are unencrypted.  On ARM Cortex-A7,
      on 4096-byte messages Adiantum encryption is about 4 times faster than
      AES-256-XTS encryption; decryption is about 5 times faster.
      In fscrypt, Adiantum is suitable for encrypting both file contents and
      names.  With filenames, it fixes a known weakness: when two filenames in
      a directory share a common prefix of >= 16 bytes, with CTS-CBC their
      encrypted filenames share a common prefix too, leaking information.
      Adiantum does not have this problem.
      Since Adiantum also accepts long tweaks (IVs), it's also safe to use the
      master key directly for Adiantum encryption rather than deriving
      per-file keys, provided that the per-file nonce is included in the IVs
      and the master key isn't used for any other encryption mode.  This
      configuration saves memory and improves performance.  A new fscrypt
      policy flag is added to allow users to opt-in to this configuration.
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
    • Masahiro Yamada's avatar
      arch: remove stale comments "UAPI Header export list" · d4ce5458
      Masahiro Yamada authored
      These comments are leftovers of commit fcc8487d ("uapi: export all
      headers under uapi directories").
      Prior to that commit, exported headers must be explicitly added to
      header-y. Now, all headers under the uapi/ directories are exported.
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
    • Masahiro Yamada's avatar
      jump_label: move 'asm goto' support test to Kconfig · e9666d10
      Masahiro Yamada authored
      Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
      The jump label is controlled by HAVE_JUMP_LABEL, which is defined
      like this:
        #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
        # define HAVE_JUMP_LABEL
      We can improve this by testing 'asm goto' support in Kconfig, then
      make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
      Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
      match to the real kernel capability.
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Tested-by: default avatarSedat Dilek <sedat.dilek@gmail.com>
  3. 04 Jan, 2019 21 commits
  4. 03 Jan, 2019 4 commits
    • Daniel Borkmann's avatar
      bpf: prevent out of bounds speculation on pointer arithmetic · 979d63d5
      Daniel Borkmann authored
      Jann reported that the original commit back in b2157399
      ("bpf: prevent out-of-bounds speculation") was not sufficient
      to stop CPU from speculating out of bounds memory access:
      While b2157399 only focussed on masking array map access
      for unprivileged users for tail calls and data access such
      that the user provided index gets sanitized from BPF program
      and syscall side, there is still a more generic form affected
      from BPF programs that applies to most maps that hold user
      data in relation to dynamic map access when dealing with
      unknown scalars or "slow" known scalars as access offset, for
        - Load a map value pointer into R6
        - Load an index into R7
        - Do a slow computation (e.g. with a memory dependency) that
          loads a limit into R8 (e.g. load the limit from a map for
          high latency, then mask it to make the verifier happy)
        - Exit if R7 >= R8 (mispredicted branch)
        - Load R0 = R6[R7]
        - Load R0 = R6[R0]
      For unknown scalars there are two options in the BPF verifier
      where we could derive knowledge from in order to guarantee
      safe access to the memory: i) While </>/<=/>= variants won't
      allow to derive any lower or upper bounds from the unknown
      scalar where it would be safe to add it to the map value
      pointer, it is possible through ==/!= test however. ii) another
      option is to transform the unknown scalar into a known scalar,
      for example, through ALU ops combination such as R &= <imm>
      followed by R |= <imm> or any similar combination where the
      original information from the unknown scalar would be destroyed
      entirely leaving R with a constant. The initial slow load still
      precedes the latter ALU ops on that register, so the CPU
      executes speculatively from that point. Once we have the known
      scalar, any compare operation would work then. A third option
      only involving registers with known scalars could be crafted
      as described in [0] where a CPU port (e.g. Slow Int unit)
      would be filled with many dependent computations such that
      the subsequent condition depending on its outcome has to wait
      for evaluation on its execution port and thereby executing
      speculatively if the speculated code can be scheduled on a
      different execution port, or any other form of mistraining
      as described in [1], for example. Given this is not limited
      to only unknown scalars, not only map but also stack access
      is affected since both is accessible for unprivileged users
      and could potentially be used for out of bounds access under
      In order to prevent any of these cases, the verifier is now
      sanitizing pointer arithmetic on the offset such that any
      out of bounds speculation would be masked in a way where the
      pointer arithmetic result in the destination register will
      stay unchanged, meaning offset masked into zero similar as
      in array_index_nospec() case. With regards to implementation,
      there are three options that were considered: i) new insn
      for sanitation, ii) push/pop insn and sanitation as inlined
      BPF, iii) reuse of ax register and sanitation as inlined BPF.
      Option i) has the downside that we end up using from reserved
      bits in the opcode space, but also that we would require
      each JIT to emit masking as native arch opcodes meaning
      mitigation would have slow adoption till everyone implements
      it eventually which is counter-productive. Option ii) and iii)
      have both in common that a temporary register is needed in
      order to implement the sanitation as inlined BPF since we
      are not allowed to modify the source register. While a push /
      pop insn in ii) would be useful to have in any case, it
      requires once again that every JIT needs to implement it
      first. While possible, amount of changes needed would also
      be unsuitable for a -stable patch. Therefore, the path which
      has fewer changes, less BPF instructions for the mitigation
      and does not require anything to be changed in the JITs is
      option iii) which this work is pursuing. The ax register is
      already mapped to a register in all JITs (modulo arm32 where
      it's mapped to stack as various other BPF registers there)
      and used in constant blinding for JITs-only so far. It can
      be reused for verifier rewrites under certain constraints.
      The interpreter's tmp "register" has therefore been remapped
      into extending the register set with hidden ax register and
      reusing that for a number of instructions that needed the
      prior temporary variable internally (e.g. div, mod). This
      allows for zero increase in stack space usage in the interpreter,
      and enables (restricted) generic use in rewrites otherwise as
      long as such a patchlet does not make use of these instructions.
      The sanitation mask is dynamic and relative to the offset the
      map value or stack pointer currently holds.
      There are various cases that need to be taken under consideration
      for the masking, e.g. such operation could look as follows:
      ptr += val or val += ptr or ptr -= val. Thus, the value to be
      sanitized could reside either in source or in destination
      register, and the limit is different depending on whether
      the ALU op is addition or subtraction and depending on the
      current known and bounded offset. The limit is derived as
      follows: limit := max_value_size - (smin_value + off). For
      subtraction: limit := umax_value + off. This holds because
      we do not allow any pointer arithmetic that would
      temporarily go out of bounds or would have an unknown
      value with mixed signed bounds where it is unclear at
      verification time whether the actual runtime value would
      be either negative or positive. For example, we have a
      derived map pointer value with constant offset and bounded
      one, so limit based on smin_value works because the verifier
      requires that statically analyzed arithmetic on the pointer
      must be in bounds, and thus it checks if resulting
      smin_value + off and umax_value + off is still within map
      value bounds at time of arithmetic in addition to time of
      access. Similarly, for the case of stack access we derive
      the limit as follows: MAX_BPF_STACK + off for subtraction
      and -off for the case of addition where off := ptr_reg->off +
      ptr_reg->var_off.value. Subtraction is a special case for
      the masking which can be in form of ptr += -val, ptr -= -val,
      or ptr -= val. In the first two cases where we know that
      the value is negative, we need to temporarily negate the
      value in order to do the sanitation on a positive value
      where we later swap the ALU op, and restore original source
      register if the value was in source.
      The sanitation of pointer arithmetic alone is still not fully
      sufficient as is, since a scenario like the following could
      happen ...
        PTR += 0x1000 (e.g. K-based imm)
        PTR += 0x1000
      ... which under speculation could end up as ...
        PTR += 0x1000
        PTR -= 0 [ truncated by mitigation ]
        PTR += 0x1000
        PTR -= 0 [ truncated by mitigation ]
      ... and therefore still access out of bounds. To prevent such
      case, the verifier is also analyzing safety for potential out
      of bounds access under speculative execution. Meaning, it is
      also simulating pointer access under truncation. We therefore
      "branch off" and push the current verification state after the
      ALU operation with known 0 to the verification stack for later
      analysis. Given the current path analysis succeeded it is
      likely that the one under speculation can be pruned. In any
      case, it is also subject to existing complexity limits and
      therefore anything beyond this point will be rejected. In
      terms of pruning, it needs to be ensured that the verification
      state from speculative execution simulation must never prune
      a non-speculative execution path, therefore, we mark verifier
      state accordingly at the time of push_stack(). If verifier
      detects out of bounds access under speculative execution from
      one of the possible paths that includes a truncation, it will
      reject such program.
      Given we mask every reg-based pointer arithmetic for
      unprivileged programs, we've been looking into how it could
      affect real-world programs in terms of size increase. As the
      majority of programs are targeted for privileged-only use
      case, we've unconditionally enabled masking (with its alu
      restrictions on top of it) for privileged programs for the
      sake of testing in order to check i) whether they get rejected
      in its current form, and ii) by how much the number of
      instructions and size will increase. We've tested this by
      using Katran, Cilium and test_l4lb from the kernel selftests.
      For Katran we've evaluated balancer_kern.o, Cilium bpf_lxc.o
      and an older test object bpf_lxc_opt_-DUNKNOWN.o and l4lb
      we've used test_l4lb.o as well as test_l4lb_noinline.o. We
      found that none of the programs got rejected by the verifier
      with this change, and that impact is rather minimal to none.
      balancer_kern.o had 13,904 bytes (1,738 insns) xlated and
      7,797 bytes JITed before and after the change. Most complex
      program in bpf_lxc.o had 30,544 bytes (3,817 insns) xlated
      and 18,538 bytes JITed before and after and none of the other
      tail call programs in bpf_lxc.o had any changes either. For
      the older bpf_lxc_opt_-DUNKNOWN.o object we found a small
      increase from 20,616 bytes (2,576 insns) and 12,536 bytes JITed
      before to 20,664 bytes (2,582 insns) and 12,558 bytes JITed
      after the change. Other programs from that object file had
      similar small increase. Both test_l4lb.o had no change and
      remained at 6,544 bytes (817 insns) xlated and 3,401 bytes
      JITed and for test_l4lb_noinline.o constant at 5,080 bytes
      (634 insns) xlated and 3,313 bytes JITed. This can be explained
      in that LLVM typically optimizes stack based pointer arithmetic
      by using K-based operations and that use of dynamic map access
      is not overly frequent. However, in future we may decide to
      optimize the algorithm further under known guarantees from
      branch and value speculation. Latter seems also unclear in
      terms of prediction heuristics that today's CPUs apply as well
      as whether there could be collisions in e.g. the predictor's
      Value History/Pattern Table for triggering out of bounds access,
      thus masking is performed unconditionally at this point but could
      be subject to relaxation later on. We were generally also
      brainstorming various other approaches for mitigation, but the
      blocker was always lack of available registers at runtime and/or
      overhead for runtime tracking of limits belonging to a specific
      pointer. Thus, we found this to be minimally intrusive under
      given constraints.
      With that in place, a simple example with sanitized access on
      unprivileged load at post-verification time looks as follows:
        # bpftool prog dump xlated id 282
        28: (79) r1 = *(u64 *)(r7 +0)
        29: (79) r2 = *(u64 *)(r7 +8)
        30: (57) r1 &= 15
        31: (79) r3 = *(u64 *)(r0 +4608)
        32: (57) r3 &= 1
        33: (47) r3 |= 1
        34: (2d) if r2 > r3 goto pc+19
        35: (b4) (u32) r11 = (u32) 20479  |
        36: (1f) r11 -= r2                | Dynamic sanitation for pointer
        37: (4f) r11 |= r2                | arithmetic with registers
        38: (87) r11 = -r11               | containing bounded or known
        39: (c7) r11 s>>= 63              | scalars in order to prevent
        40: (5f) r11 &= r2                | out of bounds speculation.
        41: (0f) r4 += r11                |
        42: (71) r4 = *(u8 *)(r4 +0)
        43: (6f) r4 <<= r1
      For the case where the scalar sits in the destination register
      as opposed to the source register, the following code is emitted
      for the above example:
        16: (b4) (u32) r11 = (u32) 20479
        17: (1f) r11 -= r2
        18: (4f) r11 |= r2
        19: (87) r11 = -r11
        20: (c7) r11 s>>= 63
        21: (5f) r2 &= r11
        22: (0f) r2 += r0
        23: (61) r0 = *(u32 *)(r2 +0)
      JIT blinding example with non-conflicting use of r10:
         d5:	je     0x0000000000000106    _
         d7:	mov    0x0(%rax),%edi       |
         da:	mov    $0xf153246,%r10d     | Index load from map value and
         e0:	xor    $0xf153259,%r10      | (const blinded) mask with 0x1f.
         e7:	and    %r10,%rdi            |_
         ea:	mov    $0x2f,%r10d          |
         f0:	sub    %rdi,%r10            | Sanitized addition. Both use r10
         f3:	or     %rdi,%r10            | but do not interfere with each
         f6:	neg    %r10                 | other. (Neither do these instructions
         f9:	sar    $0x3f,%r10           | interfere with the use of ax as temp
         fd:	and    %r10,%rdi            | in interpreter.)
        100:	add    %rax,%rdi            |_
        103:	mov    0x0(%rdi),%eax
      Tested that it fixes Jann's reproducer, and also checked that test_verifier
      and test_progs suite with interpreter, JIT and JIT with hardening enabled
      on x86-64 and arm64 runs successfully.
        [0] Speculose: Analyzing the Security Implications of Speculative
            Execution in CPUs, Giorgi Maisuradze and Christian Rossow,
        [1] A Systematic Evaluation of Transient Execution Attacks and
            Defenses, Claudio Canella, Jo Van Bulck, Michael Schwarz,
            Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens,
            Dmitry Evtyushkin, Daniel Gruss,
      Fixes: b2157399 ("bpf: prevent out-of-bounds speculation")
      Reported-by: Jann Horn's avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
    • Daniel Borkmann's avatar
      bpf: enable access to ax register also from verifier rewrite · 9b73bfdd
      Daniel Borkmann authored
      Right now we are using BPF ax register in JIT for constant blinding as
      well as in interpreter as temporary variable. Verifier will not be able
      to use it simply because its use will get overridden from the former in
      bpf_jit_blind_insn(). However, it can be made to work in that blinding
      will be skipped if there is prior use in either source or destination
      register on the instruction. Taking constraints of ax into account, the
      verifier is then open to use it in rewrites under some constraints. Note,
      ax register already has mappings in every eBPF JIT.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
    • Daniel Borkmann's avatar
      bpf: move tmp variable into ax register in interpreter · 144cd91c
      Daniel Borkmann authored
      This change moves the on-stack 64 bit tmp variable in ___bpf_prog_run()
      into the hidden ax register. The latter is currently only used in JITs
      for constant blinding as a temporary scratch register, meaning the BPF
      interpreter will never see the use of ax. Therefore it is safe to use
      it for the cases where tmp has been used earlier. This is needed to later
      on allow restricted hidden use of ax in both interpreter and JITs.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
    • Daniel Borkmann's avatar
      bpf: move {prev_,}insn_idx into verifier env · c08435ec
      Daniel Borkmann authored
      Move prev_insn_idx and insn_idx from the do_check() function into
      the verifier environment, so they can be read inside the various
      helper functions for handling the instructions. It's easier to put
      this into the environment rather than changing all call-sites only
      to pass it along. insn_idx is useful in particular since this later
      on allows to hold state in env->insn_aux_data[env->insn_idx].
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
  5. 02 Jan, 2019 9 commits
  6. 01 Jan, 2019 2 commits
    • Willem de Bruijn's avatar
      ip: validate header length on virtual device xmit · cb9f1b78
      Willem de Bruijn authored
      KMSAN detected read beyond end of buffer in vti and sit devices when
      passing truncated packets with PF_PACKET. The issue affects additional
      ip tunnel devices.
      Extend commit 76c0ddd8 ("ip6_tunnel: be careful when accessing the
      inner header") and commit ccfec9e5 ("ip_tunnel: be careful when
      accessing the inner header").
      Move the check to a separate helper and call at the start of each
      ndo_start_xmit function in net/ipv4 and net/ipv6.
      Minor changes:
      - convert dev_kfree_skb to kfree_skb on error path,
        as dev_kfree_skb calls consume_skb which is not for error paths.
      - use pskb_network_may_pull even though that is pedantic here,
        as the same as pskb_may_pull for devices without llheaders.
      - do not cache ipv6 hdrs if used only once
        (unsafe across pskb_may_pull, was more relevant to earlier patch)
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Cong Wang's avatar
      ptr_ring: wrap back ->producer in __ptr_ring_swap_queue() · aff6db45
      Cong Wang authored
      __ptr_ring_swap_queue() tries to move pointers from the old
      ring to the new one, but it forgets to check if ->producer
      is beyond the new size at the end of the operation. This leads
      to an out-of-bound access in __ptr_ring_produce() as reported
      by syzbot.
      Reported-by: syzbot+8993c0fa96d57c399735@syzkaller.appspotmail.com
      Fixes: 5d49de53 ("ptr_ring: resize support")
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>