  1. Mar 22, 2022
  2. Nov 06, 2021
    • kfence: default to dynamic branch instead of static keys mode · 4f612ed3
      Marco Elver authored
      We have observed that on very large machines with newer CPUs, the static
      key/branch switching delay is on the order of milliseconds.  This is due
      to the required broadcast IPIs, which simply do not scale well to
      hundreds of CPUs (cores).  If done too frequently, this can adversely
      affect tail latencies of various workloads.
      
      One workaround is to increase the sample interval to several seconds,
      while decreasing sampled allocation coverage, but the problem still
      exists and could still increase tail latencies.
      
      As already noted in the Kconfig help text, there are trade-offs: at
      lower sample intervals the dynamic branch results in better performance;
      however, at very large sample intervals, the static keys mode can result
      in better performance -- careful benchmarking is recommended.
      
      Our initial benchmarking showed that with large enough sample intervals
      and workloads stressing the allocator, the static keys mode was slightly
      better.  Evaluating and observing the possible system-wide side-effects
      of the static-key-switching induced broadcast IPIs, however, was a blind
      spot (in particular on large machines with 100s of cores).
      
      Therefore, a major downside of the static keys mode is, unfortunately,
      that it is hard to predict performance on new system architectures and
      topologies, and hard to draw conclusions about the performance of new
      workloads from a limited set of benchmarks.
      
      Most distributions will simply select the defaults, while targeting a
      large variety of different workloads and system architectures.  As such,
      the better default is CONFIG_KFENCE_STATIC_KEYS=n, and re-enabling it is
      only recommended after careful evaluation.
      
      For reference, on x86-64 the condition in kfence_alloc() generates
      exactly 2 instructions in the kmem_cache_alloc() fast-path:
      
       | ...
       | cmpl   $0x0,0x1a8021c(%rip)  # ffffffff82d560d0 <kfence_allocation_gate>
       | je     ffffffff812d6003      <kmem_cache_alloc+0x243>
       | ...
      
      which, given kfence_allocation_gate is infrequently modified, should be
      well predicted by most CPUs.
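
      The dynamic branch described above can be modelled in userspace C as a
      single load of a rarely written counter followed by a branch. This is
      only a sketch: allocation_gate stands in for kfence_allocation_gate,
      and should_sample(), sample_slow_path() and open_gate() are
      hypothetical names, not kernel functions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for kfence_allocation_gate: 0 means the gate is open. */
static atomic_int allocation_gate = 1;

/* Hypothetical slow path: only the caller that finds the gate at 0 wins. */
static bool sample_slow_path(void)
{
    /* atomic_fetch_add returns the previous value of the gate. */
    return atomic_fetch_add(&allocation_gate, 1) == 0;
}

/* Fast-path check: one load and one branch that is almost never taken. */
static inline bool should_sample(void)
{
    if (atomic_load_explicit(&allocation_gate, memory_order_relaxed) != 0)
        return false; /* common case: gate closed */
    return sample_slow_path();
}

/* What a periodic timer would do in this model: reopen the gate. */
static void open_gate(void)
{
    atomic_store(&allocation_gate, 0);
}
```

      Because the gate is written only when it is opened and when a sample is
      claimed, the comparison sees the same value on nearly every allocation,
      which is what makes the branch easy to predict.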
      
      Link: https://lkml.kernel.org/r/20211019102524.2807208-2-elver@google.com
      
      
      Signed-off-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. May 05, 2021
    • kfence: await for allocation using wait_event · 407f1d8c
      Marco Elver authored
      Patch series "kfence: optimize timer scheduling", v2.
      
      We have observed that mostly-idle systems with KFENCE enabled wake up
      otherwise idle CPUs, preventing them from entering a lower power state.
      Debugging revealed that KFENCE spends too much active time in
      toggle_allocation_gate().
      
      While the first version of KFENCE was using all the right bits to be
      scheduling optimal, and thus power efficient, by simply using wait_event()
      + wake_up(), that code was unfortunately removed.
      
      As KFENCE was exposed to various different configs and tests, the
      scheduling optimal code slowly disappeared.  First because of hung task
      warnings, and finally because of deadlocks when an allocation is made by
      timer code with debug objects enabled.  Clearly, the "fixes" were not too
      friendly for devices that want to be power efficient.
      
      Therefore, let's try a little harder to fix the hung task and deadlock
      problems that we have with wait_event() + wake_up(), while remaining as
      scheduling friendly and power efficient as possible.
      
      Crucially, we need to defer the wake_up() to an irq_work, avoiding any
      potential for deadlock.
      
      The result with this series is that on the devices where we observed a
      power regression, power usage returns to baseline levels.
      
      This patch (of 3):
      
      On mostly-idle systems, we have observed that toggle_allocation_gate() is
      a cause of frequent wake-ups, preventing an otherwise idle CPU from going
      into a lower power state.
      
      A late change in KFENCE's development, due to a potential deadlock [1],
      required changing the scheduling-friendly wait_event_timeout() and
      wake_up() to an open-coded wait-loop using schedule_timeout().

      [1] https://lkml.kernel.org/r/000000000000c0645805b7f982e4@google.com
      
      To avoid unnecessary wake-ups, switch to using wait_event_timeout().
      
      Unfortunately, we still cannot use a version with direct wake_up() in
      __kfence_alloc() due to the same potential for deadlock as in [1].
      Instead, add a level of indirection via an irq_work that is scheduled if
      we determine that the kfence_timer requires a wake_up().
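
      The indirection described above can be illustrated with a small
      single-threaded model in C. It is a sketch of the idea only, not the
      kernel code: wake_pending stands in for a queued irq_work, and all
      names here are hypothetical.

```c
#include <stdbool.h>

static bool timer_waiting; /* the timer task is sleeping in its wait */
static bool wake_pending;  /* stand-in for a queued irq_work */
static int wakeups;        /* number of wake-ups actually delivered */

/* Allocation path: never wakes the timer directly, only queues work. */
static void alloc_path_notify(void)
{
    if (timer_waiting)
        wake_pending = true;
}

/* Runs later, from a context where waking the timer is safe. */
static void run_deferred_work(void)
{
    if (wake_pending) {
        wake_pending = false;
        timer_waiting = false;
        wakeups++;
    }
}
```

      The point of the split is that the allocation path only records that a
      wake-up is needed; the wake-up itself happens later, outside whatever
      locks the allocating code may hold.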
      
      Link: https://lkml.kernel.org/r/20210421105132.3965998-1-elver@google.com
      Link: https://lkml.kernel.org/r/20210421105132.3965998-2-elver@google.com
      
      
      Fixes: 0ce20dd8 ("mm: add Kernel Electric-Fence infrastructure")
      Signed-off-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. Feb 26, 2021
    • kfence: add test suite · bc8fbc5f
      Marco Elver authored
      Add KFENCE test suite, testing various error detection scenarios. Makes
      use of KUnit for test organization. Since KFENCE's interface to obtain
      error reports is via the console, the test verifies that KFENCE outputs
      expected reports to the console.
      
      [elver@google.com: fix typo in test]
        Link: https://lkml.kernel.org/r/X9lHQExmHGvETxY4@elver.google.com
      [elver@google.com: show access type in report]
        Link: https://lkml.kernel.org/r/20210111091544.3287013-2-elver@google.com
      
      Link: https://lkml.kernel.org/r/20201103175841.3495947-9-elver@google.com
      
      
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: Marco Elver <elver@google.com>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Co-developed-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Jann Horn <jannh@google.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joern Engel <joern@purestorage.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: SeongJae Park <sjpark@amazon.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kfence, Documentation: add KFENCE documentation · 10efe55f
      Marco Elver authored
      Add KFENCE documentation in dev-tools/kfence.rst, and add to index.
      
      [elver@google.com: add missing copyright header to documentation]
        Link: https://lkml.kernel.org/r/20210118092159.145934-4-elver@google.com
      
      Link: https://lkml.kernel.org/r/20201103175841.3495947-8-elver@google.com
      
      
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: Marco Elver <elver@google.com>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Co-developed-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Jann Horn <jannh@google.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joern Engel <joern@purestorage.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: SeongJae Park <sjpark@amazon.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kfence, kasan: make KFENCE compatible with KASAN · 2b830526
      Alexander Potapenko authored
      Make KFENCE compatible with KASAN. Currently this helps test KFENCE
      itself, where KASAN can catch potential corruptions to KFENCE state, or
      other corruptions that may be a result of freepointer corruptions in the
      main allocators.
      
      [akpm@linux-foundation.org: merge fixup]
      [andreyknvl@google.com: untag addresses for KFENCE]
        Link: https://lkml.kernel.org/r/9dc196006921b191d25d10f6e611316db7da2efc.1611946152.git.andreyknvl@google.com
      
      Link: https://lkml.kernel.org/r/20201103175841.3495947-7-elver@google.com
      
      
      Signed-off-by: Marco Elver <elver@google.com>
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Jann Horn <jannh@google.com>
      Co-developed-by: Marco Elver <elver@google.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joern Engel <joern@purestorage.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: SeongJae Park <sjpark@amazon.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: add Kernel Electric-Fence infrastructure · 0ce20dd8
      Alexander Potapenko authored
      Patch series "KFENCE: A low-overhead sampling-based memory safety error detector", v7.
      
      This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
      low-overhead sampling-based memory safety error detector of heap
      use-after-free, invalid-free, and out-of-bounds access errors.  This
      series enables KFENCE for the x86 and arm64 architectures, and adds
      KFENCE hooks to the SLAB and SLUB allocators.
      
      KFENCE is designed to be enabled in production kernels, and has near
      zero performance overhead. Compared to KASAN, KFENCE trades precision
      for performance. The main motivation behind KFENCE's design is that with
      enough total uptime KFENCE will detect bugs in code paths not typically
      exercised by non-production test workloads. One way to quickly achieve a
      large enough total uptime is to deploy the tool across a large fleet of
      machines.
      
      KFENCE objects each reside on a dedicated page, at either the left or
      right page boundaries. The pages to the left and right of the object
      page are "guard pages", whose attributes are changed to a protected
      state, and cause page faults on any attempted access to them. Such page
      faults are then intercepted by KFENCE, which handles the fault
      gracefully by reporting a memory access error.
      
      Guarded allocations are set up based on a sample interval (can be set
      via kfence.sample_interval). After expiration of the sample interval,
      the next allocation through the main allocator (SLAB or SLUB) returns a
      guarded allocation from the KFENCE object pool. At this point, the timer
      is reset, and the next allocation is set up after the expiration of the
      interval.
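
      The sampling behaviour described above can be sketched as a small
      time-driven model in C. This is an illustration under simplifying
      assumptions (time is passed in explicitly, and there is no real timer);
      struct sampler and sampler_on_alloc() are hypothetical names.

```c
#include <stdbool.h>

struct sampler {
    unsigned long interval_ms;  /* stand-in for kfence.sample_interval */
    unsigned long next_open_ms; /* time at which the interval expires */
};

/* Returns true if the allocation made at now_ms would be a guarded one. */
static bool sampler_on_alloc(struct sampler *s, unsigned long now_ms)
{
    if (now_ms < s->next_open_ms)
        return false; /* interval not yet expired: normal allocation */

    /* First allocation after expiry is guarded; the timer is reset here. */
    s->next_open_ms = now_ms + s->interval_ms;
    return true;
}
```

      In this model at most one allocation per interval is guarded, no matter
      how many allocations happen, which is why the sampling rate is bounded
      by the interval rather than by the allocation rate.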
      
      To enable/disable a KFENCE allocation through the main allocator's
      fast-path without overhead, KFENCE relies on static branches via the
      static keys infrastructure. The static branch is toggled to redirect the
      allocation to KFENCE.
      
      The KFENCE memory pool is of fixed size, and if the pool is exhausted no
      further KFENCE allocations occur. The default config is conservative
      with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB
      pages).
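
      The 2 MiB figure is consistent with reserving two pages per object (one
      object page and one guard page) plus one extra pair of pages. The
      formula below is an assumption chosen to reproduce that figure, and the
      names are hypothetical.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096UL /* assumes 4 KiB pages */

/* Two pages per object, plus one additional pair of pages for the pool. */
static size_t kfence_pool_bytes(size_t num_objects)
{
    return (num_objects + 1) * 2 * PAGE_SIZE_BYTES;
}
```

      With 255 objects this gives 256 * 2 * 4096 bytes, which is 2097152
      bytes, or 2 MiB.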
      
      We have verified by running synthetic benchmarks (sysbench I/O,
      hackbench) and production server-workload benchmarks that a kernel with
      KFENCE (using sample intervals 100-500ms) is performance-neutral
      compared to a non-KFENCE baseline kernel.
      
      KFENCE is inspired by GWP-ASan [1], a userspace tool with similar
      properties. The name "KFENCE" is a homage to the Electric Fence Malloc
      Debugger [2].
      
      For more details, see Documentation/dev-tools/kfence.rst added in the
      series -- also viewable here:
      
      	https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tools/kfence.rst
      
      [1] http://llvm.org/docs/GwpAsan.html
      [2] https://linux.die.net/man/3/efence
      
      This patch (of 9):
      
      This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
      low-overhead sampling-based memory safety error detector of heap
      use-after-free, invalid-free, and out-of-bounds access errors.
      
      KFENCE is designed to be enabled in production kernels, and has near
      zero performance overhead. Compared to KASAN, KFENCE trades precision
      for performance. The main motivation behind KFENCE's design is that with
      enough total uptime KFENCE will detect bugs in code paths not typically
      exercised by non-production test workloads. One way to quickly achieve a
      large enough total uptime is to deploy the tool across a large fleet of
      machines.
      
      KFENCE objects each reside on a dedicated page, at either the left or
      right page boundaries. The pages to the left and right of the object
      page are "guard pages", whose attributes are changed to a protected
      state, and cause page faults on any attempted access to them. Such page
      faults are then intercepted by KFENCE, which handles the fault
      gracefully by reporting a memory access error. To detect out-of-bounds
      writes to memory within the object's page itself, KFENCE also uses
      pattern-based redzones. The following figure illustrates the page
      layout:
      
        ---+-----------+-----------+-----------+-----------+-----------+---
           | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
           | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
           | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
           | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
           | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
           | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
        ---+-----------+-----------+-----------+-----------+-----------+---
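
      The object page in the figure can be modelled in userspace C as follows.
      This is a simplified sketch: it ignores alignment requirements, uses a
      fixed fill byte as a hypothetical redzone pattern, and assumes the
      object is no larger than a page. None of the names are kernel
      functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096u
#define REDZONE_BYTE 0xAAu /* hypothetical fill pattern */

/* Offset of the object in its page: at the left or the right boundary. */
static size_t object_offset(size_t size, bool right)
{
    return right ? PAGE_BYTES - size : 0;
}

/* Fill everything in the page outside [off, off + size) with the pattern. */
static void set_redzone(uint8_t *page, size_t off, size_t size)
{
    memset(page, REDZONE_BYTE, off);
    memset(page + off + size, REDZONE_BYTE, PAGE_BYTES - off - size);
}

/* Returns true if no byte outside the object has been overwritten. */
static bool redzone_intact(const uint8_t *page, size_t off, size_t size)
{
    for (size_t i = 0; i < PAGE_BYTES; i++) {
        bool in_object = i >= off && i < off + size;

        if (!in_object && page[i] != REDZONE_BYTE)
            return false;
    }
    return true;
}
```

      An access past the page boundary next to the object hits a guard page
      and faults immediately, while a write on the other side of the object
      stays within the page and is only noticed when the pattern is checked.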
      
      Guarded allocations are set up based on a sample interval (can be set
      via kfence.sample_interval). After expiration of the sample interval, a
      guarded allocation from the KFENCE object pool is returned to the main
      allocator (SLAB or SLUB). At this point, the timer is reset, and the
      next allocation is set up after the expiration of the interval.
      
      To enable/disable a KFENCE allocation through the main allocator's
      fast-path without overhead, KFENCE relies on static branches via the
      static keys infrastructure. The static branch is toggled to redirect the
      allocation to KFENCE. To date, we have verified by running synthetic
      benchmarks (sysbench I/O, hackbench) that a kernel compiled with KFENCE
      is performance-neutral compared to the non-KFENCE baseline.
      
      For more details, see Documentation/dev-tools/kfence.rst (added later in
      the series).
      
      [elver@google.com: fix parameter description for kfence_object_start()]
        Link: https://lkml.kernel.org/r/20201106092149.GA2851373@elver.google.com
      [elver@google.com: avoid stalling work queue task without allocations]
        Link: https://lkml.kernel.org/r/CADYN=9J0DQhizAGB0-jz4HOBBh+05kMBXb4c0cXMS7Qi5NAJiw@mail.gmail.com
        Link: https://lkml.kernel.org/r/20201110135320.3309507-1-elver@google.com
      [elver@google.com: fix potential deadlock due to wake_up()]
        Link: https://lkml.kernel.org/r/000000000000c0645805b7f982e4@google.com
        Link: https://lkml.kernel.org/r/20210104130749.1768991-1-elver@google.com
      [elver@google.com: add option to use KFENCE without static keys]
        Link: https://lkml.kernel.org/r/20210111091544.3287013-1-elver@google.com
      [elver@google.com: add missing copyright and description headers]
        Link: https://lkml.kernel.org/r/20210118092159.145934-1-elver@google.com
      
      Link: https://lkml.kernel.org/r/20201103175841.3495947-2-elver@google.com
      
      
      Signed-off-by: Marco Elver <elver@google.com>
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: SeongJae Park <sjpark@amazon.de>
      Co-developed-by: Marco Elver <elver@google.com>
      Reviewed-by: Jann Horn <jannh@google.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Joern Engel <joern@purestorage.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>