  1. Oct 28, 2023
  2. Oct 19, 2023
  3. Oct 18, 2023
  4. Oct 16, 2023
    • lib: add light-weight queuing mechanism. · de9e82c3
      NeilBrown authored
      
       lwq is a FIFO singly-linked queue that only requires a spinlock
       for dequeueing, which happens in process context.  Enqueueing is atomic
       with no spinlock and can happen in any context.
      
      This is particularly useful when work items are queued from BH or IRQ
      context, and when they are handled one at a time by dedicated threads.
      
      Avoiding any locking when enqueueing means there is no need to disable
      BH or interrupts, which is generally best avoided (particularly when
      there are any RT tasks on the machine).
      
      This solution is superior to using "list_head" links because we need
      half as many pointers in the data structures, and because list_head
      lists would need locking to add items to the queue.
      
      This solution is superior to a bespoke solution as all locking and
      container_of casting is integrated, so the interface is simple.
      
       Despite the similar name, this solution meets a distinctly different
       need from kfifo.  kfifo provides a fixed-size circular buffer to which
       data can be added at one end and removed at the other, and does not
       provide any locking.  lwq has no size limit and works with
       data structures (objects) rather than raw data (bytes).
      
      A unit test for basic functionality, which runs at boot time, is included.
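       
       As a rough usage sketch (the lwq_init()/lwq_enqueue()/lwq_dequeue()
       names below follow the new interface, but the exact signatures should
       be taken from include/linux/lwq.h; the surrounding code is only
       illustrative):
       
           struct my_work {
                   struct lwq_node node;
                   int payload;
           };
       
           static struct lwq my_queue;     /* lwq_init(&my_queue) at setup time */
       
           /* Producer: any context, no lock taken, no BH/IRQ disabling needed. */
           static void submit_work(struct my_work *w)
           {
                   lwq_enqueue(&w->node, &my_queue);
           }
       
           /* Consumer: a dedicated thread, process context only. */
           static void handle_pending(void)
           {
                   struct my_work *w;
       
                   while ((w = lwq_dequeue(&my_queue, struct my_work, node)))
                           process(w);     /* process() is hypothetical */
           }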
      
       Signed-off-by: NeilBrown <neilb@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: David Gow <davidgow@google.com>
      Cc: linux-kernel@vger.kernel.org
      Message-Id: <20230911111333.4d1a872330e924a00acb905b@linux-foundation.org>
       Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      de9e82c3
  5. Oct 15, 2023
    • lib/bitmap: split-out string-related operations to a separate files · aae06fc1
      Yury Norov authored
      
       lib/bitmap.c and the corresponding include/linux/bitmap.h are intended to
       hold functions related to operations on bitmaps, like bitmap_shift or
       bitmap_set. Historically, some string-related operations like
       bitmap_parse also reside in lib/bitmap.c.
      
       As the subsystem has evolved, string-related bitmap operations have become
       a significant part of the file. Because they are quite different in nature
       from the other bitmap functions, it is worth splitting them out into
       separate source and header files.
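       
       For reference, the string-related operations being split out are of
       this kind (bitmap_parse() is an existing interface; the snippet is
       only illustrative):
       
           DECLARE_BITMAP(mask, 64);
           int err = bitmap_parse("deadbeef", strlen("deadbeef"), mask, 64);
           /* on success, the bits set in "mask" reflect the hex string */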
      
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      CC: Rasmus Villemoes <linux@rasmusvillemoes.dk>
       Signed-off-by: Yury Norov <yury.norov@gmail.com>
      aae06fc1
  6. Sep 09, 2023
  7. Aug 15, 2023
    • list: Introduce CONFIG_LIST_HARDENED · aebc7b0d
      Marco Elver authored
      Numerous production kernel configs (see [1, 2]) are choosing to enable
      CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
      configs [3]. The motivation behind this is that the option can be used
      as a security hardening feature (e.g. CVE-2019-2215 and CVE-2019-2025
      are mitigated by the option [4]).
      
       The feature was never designed with performance in mind, yet common
       list manipulation happens across hot paths all over the kernel.
      
      Introduce CONFIG_LIST_HARDENED, which performs list pointer checking
      inline, and only upon list corruption calls the reporting slow path.
      
      To generate optimal machine code with CONFIG_LIST_HARDENED:
      
        1. Elide checking for pointer values which upon dereference would
           result in an immediate access fault (i.e. minimal hardening
           checks).  The trade-off is lower-quality error reports.
      
        2. Use the __preserve_most function attribute (available with Clang,
           but not yet with GCC) to minimize the code footprint for calling
           the reporting slow path. As a result, function size of callers is
           reduced by avoiding saving registers before calling the rarely
           called reporting slow path.
      
           Note that all TUs in lib/Makefile already disable function tracing,
           including list_debug.c, and __preserve_most's implied notrace has
           no effect in this case.
      
        3. Because the inline checks are a subset of the full set of checks in
           __list_*_valid_or_report(), always return false if the inline
           checks failed.  This avoids redundant compare and conditional
           branch right after return from the slow path.
      
      As a side-effect of the checks being inline, if the compiler can prove
      some condition to always be true, it can completely elide some checks.
      
      Since DEBUG_LIST is functionally a superset of LIST_HARDENED, the
      Kconfig variables are changed to reflect that: DEBUG_LIST selects
      LIST_HARDENED, whereas LIST_HARDENED itself has no dependency on
      DEBUG_LIST.
      
      Running netperf with CONFIG_LIST_HARDENED (using a Clang compiler with
      "preserve_most") shows throughput improvements, in my case of ~7% on
      average (up to 20-30% on some test cases).
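       
       The shape of the resulting fast path is roughly the following (a
       simplified sketch, not the exact code; only __list_add_valid_or_report()
       is part of this series, the rest is illustrative):
       
           static __always_inline bool __list_add_valid(struct list_head *new,
                                                        struct list_head *prev,
                                                        struct list_head *next)
           {
                   bool ret = true;
       
                   if (!IS_ENABLED(CONFIG_DEBUG_LIST)) {
                           /*
                            * Minimal hardening checks, done inline: anything
                            * else would fault on dereference anyway.
                            */
                           if (likely(next->prev == prev && prev->next == next &&
                                      new != prev && new != next))
                                   return true;
                           ret = false;
                   }
                   /* Rarely taken reporting slow path (__preserve_most). */
                   ret &= __list_add_valid_or_report(new, prev, next);
                   return ret;
           }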
      
      Link: https://r.android.com/1266735 [1]
      Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config [2]
      Link: https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings [3]
       Link: https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html [4]
       Signed-off-by: Marco Elver <elver@google.com>
      Link: https://lore.kernel.org/r/20230811151847.1594958-3-elver@google.com
      
      
       Signed-off-by: Kees Cook <keescook@chromium.org>
      aebc7b0d
  8. Aug 11, 2023
  9. Jul 18, 2023
    • lib/bitmap: workaround const_eval test build failure · 2356d198
      Yury Norov authored
      When building with Clang, and when KASAN and GCOV_PROFILE_ALL are both
      enabled, the test fails to build [1]:
      
      >> lib/test_bitmap.c:920:2: error: call to '__compiletime_assert_239' declared with 'error' attribute: BUILD_BUG_ON failed: !__builtin_constant_p(res)
                 BUILD_BUG_ON(!__builtin_constant_p(res));
                 ^
         include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON'
                 BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
                 ^
         include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
         #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                             ^
         include/linux/compiler_types.h:352:2: note: expanded from macro 'compiletime_assert'
                 _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
                 ^
         include/linux/compiler_types.h:340:2: note: expanded from macro '_compiletime_assert'
                 __compiletime_assert(condition, msg, prefix, suffix)
                 ^
         include/linux/compiler_types.h:333:4: note: expanded from macro '__compiletime_assert'
                                 prefix ## suffix();                             \
                                 ^
         <scratch space>:185:1: note: expanded from here
         __compiletime_assert_239
      
       Originally the failure was attributed to s390, which now appears to be
       wrong. The issue is not related to the bitmap code itself, but it does
       break the build for this configuration.
       
       Disabling the const_eval test under that config could hide other
       bugs. Instead, work around it by disabling GCOV for test_bitmap until
       the compiler is fixed.
      
      [1] https://github.com/ClangBuiltLinux/linux/issues/1874
      
      
      
       Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202307171254.yFcH97ej-lkp@intel.com/
      
      
      Fixes: dc34d503 ("lib: test_bitmap: add compile-time optimization/evaluations assertions")
       Co-developed-by: Nathan Chancellor <nathan@kernel.org>
       Signed-off-by: Nathan Chancellor <nathan@kernel.org>
       Signed-off-by: Yury Norov <yury.norov@gmail.com>
       Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
       Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
      2356d198
  10. Jun 09, 2023
  11. May 25, 2023
    • x86/csum: Improve performance of `csum_partial` · 688eb819
      Noah Goldstein authored
       1) Add a special case for len == 40 as that is the hottest value. This
          nets a ~8-9% latency improvement and a ~30% throughput improvement
          in the len == 40 case.
      
      2) Use multiple accumulators in the 64-byte loop. This dramatically
         improves ILP and results in up to a 40% latency/throughput
         improvement (better for more iterations).
      
      Results from benchmarking on Icelake. Times measured with rdtsc()
       len   lat_new   lat_old      r    tput_new  tput_old      r
         8      3.58      3.47  1.032        3.58      3.51  1.021
        16      4.14      4.02  1.028        3.96      3.78  1.046
        24      4.99      5.03  0.992        4.23      4.03  1.050
        32      5.09      5.08  1.001        4.68      4.47  1.048
        40      5.57      6.08  0.916        3.05      4.43  0.690
        48      6.65      6.63  1.003        4.97      4.69  1.059
        56      7.74      7.72  1.003        5.22      4.95  1.055
        64      6.65      7.22  0.921        6.38      6.42  0.994
        96      9.43      9.96  0.946        7.46      7.54  0.990
       128      9.39     12.15  0.773        8.90      8.79  1.012
       200     12.65     18.08  0.699       11.63     11.60  1.002
       272     15.82     23.37  0.677       14.43     14.35  1.005
       440     24.12     36.43  0.662       21.57     22.69  0.951
       952     46.20     74.01  0.624       42.98     53.12  0.809
      1024     47.12     78.24  0.602       46.36     58.83  0.788
      1552     72.01    117.30  0.614       71.92     96.78  0.743
      2048     93.07    153.25  0.607       93.28    137.20  0.680
      2600    114.73    194.30  0.590      114.28    179.32  0.637
      3608    156.34    268.41  0.582      154.97    254.02  0.610
      4096    175.01    304.03  0.576      175.89    292.08  0.602
      
       There is no such thing as a free lunch, however, and the special case
       for len == 40 does add overhead to the len != 40 cases. This seems to
       amount to ~5% in throughput and slightly less in latency.
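       
       In portable C terms, the multiple-accumulator idea from point (2)
       above looks roughly like this (an illustrative sketch only; the real
       csum_partial uses x86-64 ADC chains):
       
           /* End-around-carry add: fold the carry of a 64-bit add back in. */
           static inline u64 addc(u64 acc, u64 val)
           {
                   acc += val;
                   if (acc < val)
                           acc++;
                   return acc;
           }
       
           /* Two independent carry chains per 64-byte block improve ILP. */
           static u64 sum_blocks(const u64 *p, size_t nblocks)
           {
                   u64 a = 0, b = 0;
       
                   for (; nblocks--; p += 8) {
                           a = addc(a, p[0]); a = addc(a, p[1]);
                           a = addc(a, p[2]); a = addc(a, p[3]);
                           b = addc(b, p[4]); b = addc(b, p[5]);
                           b = addc(b, p[6]); b = addc(b, p[7]);
                   }
                   return addc(a, b);
           }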
      
      Testing:
      Part of this change is a new kunit test. The tests check all
      alignment X length pairs in [0, 64) X [0, 512).
      There are three cases.
           1) Precomputed random inputs/seed. The expected results were
              generated using the generic implementation (which is assumed to be
              non-buggy).
           2) An input of all 1s. The goal of this test is to catch any case
              where a carry is missed.
           3) An input that never carries. The goal of this test is to catch
              any case of a carry being generated incorrectly.
      
      More exhaustive tests that test all alignment X length pairs in
      [0, 8192) X [0, 8192] on random data are also available here:
      https://github.com/goldsteinn/csum-reproduction
      
      
      
       The repository also has the code for reproducing the above benchmark
       numbers.
      
       Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
       Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Link: https://lore.kernel.org/all/20230511011002.935690-1-goldstein.w.n%40gmail.com
      688eb819
  12. May 16, 2023
  13. Mar 28, 2023
    • atomics: Provide rcuref - scalable reference counting · ee1ee6db
      Thomas Gleixner authored
      
       atomic_t based reference counting, including refcount_t, uses
       atomic_inc_not_zero() for acquiring a reference. atomic_inc_not_zero() is
       implemented with an atomic_try_cmpxchg() loop. High contention on the
       reference count leads to retry loops and scales badly. There is nothing to
       improve in this implementation as the semantics have to be preserved.
      
      Provide rcuref as a scalable alternative solution which is suitable for RCU
      managed objects. Similar to refcount_t it comes with overflow and underflow
      detection and mitigation.
      
      rcuref treats the underlying atomic_t as an unsigned integer and partitions
      this space into zones:
      
        0x00000000 - 0x7FFFFFFF	valid zone (1 .. (INT_MAX + 1) references)
        0x80000000 - 0xBFFFFFFF	saturation zone
        0xC0000000 - 0xFFFFFFFE	dead zone
        0xFFFFFFFF   			no reference
      
      rcuref_get() unconditionally increments the reference count with
      atomic_add_negative_relaxed(). rcuref_put() unconditionally decrements the
      reference count with atomic_add_negative_release().
      
      This unconditional increment avoids the inc_not_zero() problem, but
      requires a more complex implementation on the put() side when the count
      drops from 0 to -1.
      
       When this transition is detected, an attempt is made to mark the reference
       count dead, by setting it to the midpoint of the dead zone with a single
       atomic_cmpxchg_release() operation. This operation can fail due to a
       concurrent rcuref_get() elevating the reference count from -1 to 0 again.
      
      If the unconditional increment in rcuref_get() hits a reference count which
      is marked dead (or saturated) it will detect it after the fact and bring
      back the reference count to the midpoint of the respective zone. The zones
      provide enough tolerance which makes it practically impossible to escape
      from a zone.
      
       The racy implementation of rcuref_put() requires protecting rcuref_put()
       against a grace period ending, in order to prevent a subtle use-after-free.
       As RCU is the only mechanism which can provide that protection, it
       is not possible to fully replace the atomic_inc_not_zero() based
       implementation of refcount_t with this scheme.
       
       The final drop is slightly more expensive than the atomic_dec_return()
       counterpart, but that is not the case this is optimized for. The
       optimization is for the high-frequency get()/put() pairs and their
       scalability.
      
      The performance of an uncontended rcuref_get()/put() pair where the put()
      is not dropping the last reference is still on par with the plain atomic
      operations, while at the same time providing overflow and underflow
      detection and mitigation.
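       
       In use this looks roughly like the following (a sketch: rcuref_get()
       and rcuref_put() are the interfaces described above; rcuref_t,
       rcuref_init() and the boolean return conventions are assumptions that
       should be checked against include/linux/rcuref.h):
       
           struct obj {
                   rcuref_t        ref;
                   struct rcu_head rcu;
                   /* ... */
           };
       
           /* Lookup side: the object was found under rcu_read_lock(). */
           static struct obj *obj_tryget(struct obj *o)
           {
                   /* Unconditional increment; assumed to fail only if the
                    * count was already marked dead. */
                   return rcuref_get(&o->ref) ? o : NULL;
           }
       
           static void obj_put(struct obj *o)
           {
                   /* Assumed to return true once the last reference is gone
                    * and the count has been marked dead. */
                   if (rcuref_put(&o->ref))
                           kfree_rcu(o, rcu);
           }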
      
      The performance of rcuref compared to plain atomic_inc_not_zero() and
      atomic_dec_return() based reference counting under contention:
      
        -  Micro benchmark: All CPUs running an increment/decrement loop on an
          elevated reference count, which means the 0 to -1 transition never
          happens.
      
          The performance gain depends on microarchitecture and the number of
          CPUs and has been observed in the range of 1.3X to 4.7X
      
       - Conversion of dst_entry::__refcnt to rcuref and testing with the
          localhost memtier/memcached benchmark. That benchmark shows the
          reference count contention prominently.
      
          The performance gain depends on microarchitecture and the number of
          CPUs and has been observed in the range of 1.1X to 2.6X over the
          previous fix for the false sharing issue vs. struct
          dst_entry::__refcnt.
      
          When memtier is run over a real 1Gb network connection, there is a
          small gain on top of the false sharing fix. The two changes combined
          result in a 2%-5% total gain for that networked test.
      
       Reported-by: Wangyang Guo <wangyang.guo@intel.com>
       Reported-by: Arjan Van De Ven <arjan.van.de.ven@intel.com>
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230323102800.158429195@linutronix.de
      ee1ee6db
  14. Mar 19, 2023
    • dyndbg: cleanup dynamic usage in ib_srp.c · 7ce93729
      Jason Baron authored
      
       Currently, in dynamic_debug.h we only provide
       DEFINE_DYNAMIC_DEBUG_METADATA() and DYNAMIC_DEBUG_BRANCH()
       definitions if CONFIG_DYNAMIC_DEBUG_CORE is enabled. Thus, drivers
       such as infiniband srp (see: drivers/infiniband/ulp/srp/ib_srp.c)
       must provide their own definitions for !CONFIG_DYNAMIC_DEBUG_CORE.
       
       Thus, let's move this !CONFIG_DYNAMIC_DEBUG_CORE case into dynamic_debug.h.
       However, the dynamic debug interfaces should really only be defined
       if CONFIG_DYNAMIC_DEBUG is set, or if CONFIG_DYNAMIC_DEBUG_CORE is set
       along with DYNAMIC_DEBUG_MODULE (see:
       Documentation/admin-guide/dynamic-debug-howto.rst). Thus, the
       undefined case becomes: !(CONFIG_DYNAMIC_DEBUG ||
       (CONFIG_DYNAMIC_DEBUG_CORE && DYNAMIC_DEBUG_MODULE)).
       With those changes in place, we can remove the !CONFIG_DYNAMIC_DEBUG_CORE
       case from ib_srp.c.
      
       This change was prompted by a build breakage in ib_srp.c stemming
       from the unconditional inclusion of dynamic_debug.h in module.h, due
       to commit 7deabd67 ("dyndbg: use the module notifier callbacks").
       In that case, if we have CONFIG_DYNAMIC_DEBUG_CORE=y and
       CONFIG_DYNAMIC_DEBUG=n then the definitions for
       DEFINE_DYNAMIC_DEBUG_METADATA() and DYNAMIC_DEBUG_BRANCH() are defined
       once in ib_srp.c and then again in dynamic_debug.h. This had been
       working prior to the above referenced commit because dynamic_debug.h
       was only pulled into ib_srp.c conditionally via printk.h if
       CONFIG_DYNAMIC_DEBUG was set.
      
       Also, the exported functions in lib/dynamic_debug.c itself may
       not have a prototype if CONFIG_DYNAMIC_DEBUG=n and
       CONFIG_DYNAMIC_DEBUG_CORE=y. This would trigger the -Wmissing-prototypes
       warning.
      
       The exported functions are behind this guard (include/linux/dynamic_debug.h):
       
       #if defined(CONFIG_DYNAMIC_DEBUG) || \
           (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
      
       Thus, by adding -DDYNAMIC_DEBUG_MODULE to lib/Makefile we
       can ensure that the exported functions have a prototype in all cases,
       since lib/dynamic_debug.c is built whenever
       CONFIG_DYNAMIC_DEBUG_CORE=y.
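       
       For reference, the pattern that ib_srp.c had been open-coding a
       fallback for looks roughly like this (a sketch only; the exact macro
       arguments should be checked against include/linux/dynamic_debug.h):
       
           /* Descriptor the dyndbg core can enable/disable at runtime. */
           DEFINE_DYNAMIC_DEBUG_METADATA(ddm_example, "example knob");
       
           static void example(void)
           {
                   /* Cheap branch; only true when enabled via dyndbg. */
                   if (DYNAMIC_DEBUG_BRANCH(ddm_example))
                           pr_debug("example knob is enabled\n");
           }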
      
      Fixes: 7deabd67 ("dyndbg: use the module notifier callbacks")
       Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/oe-kbuild-all/202303071444.sIbZTDCy-lkp@intel.com/
      
      
       Signed-off-by: Jason Baron <jbaron@akamai.com>
       [mcgrof: adjust commit log, and remove urldefense from URL]
       Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      7ce93729
  15. Feb 27, 2023
  16. Feb 08, 2023
  17. Feb 03, 2023
  18. Jan 17, 2023
  19. Dec 02, 2022
  20. Nov 23, 2022
    • kunit/fortify: Validate __alloc_size attribute results · 9124a264
      Kees Cook authored
      
      Validate the effect of the __alloc_size attribute on allocators. If the
      compiler doesn't support __builtin_dynamic_object_size(), skip the
      associated tests.
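       
       The property being validated boils down to the following (a simplified
       sketch; kmalloc() carries an __alloc_size() annotation, and
       __builtin_dynamic_object_size() is the compiler builtin the dynamic
       tests depend on):
       
           static size_t reported_alloc_size(size_t n)
           {
                   void *p = kmalloc(n, GFP_KERNEL);
                   /* With bdos support, the compiler tracks the runtime size
                    * from the __alloc_size(1) annotation on kmalloc(). */
                   size_t seen = p ? __builtin_dynamic_object_size(p, 0) : 0;
       
                   kfree(p);
                   return seen;    /* expected to equal n when supported */
           }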
      
      (For GCC, just remove the "--make_options" line below...)
      
       $ ./tools/testing/kunit/kunit.py run --arch x86_64 \
               --kconfig_add CONFIG_FORTIFY_SOURCE=y \
               --make_options LLVM=1 \
               fortify
      ...
      [15:16:30] ================== fortify (10 subtests) ===================
      [15:16:30] [PASSED] known_sizes_test
      [15:16:30] [PASSED] control_flow_split_test
      [15:16:30] [PASSED] alloc_size_kmalloc_const_test
      [15:16:30] [PASSED] alloc_size_kmalloc_dynamic_test
      [15:16:30] [PASSED] alloc_size_vmalloc_const_test
      [15:16:30] [PASSED] alloc_size_vmalloc_dynamic_test
      [15:16:30] [PASSED] alloc_size_kvmalloc_const_test
      [15:16:30] [PASSED] alloc_size_kvmalloc_dynamic_test
      [15:16:30] [PASSED] alloc_size_devm_kmalloc_const_test
      [15:16:30] [PASSED] alloc_size_devm_kmalloc_dynamic_test
      [15:16:30] ===================== [PASSED] fortify =====================
      [15:16:30] ============================================================
      [15:16:30] Testing complete. Ran 10 tests: passed: 10
      [15:16:31] Elapsed time: 8.348s total, 0.002s configuring, 6.923s building, 1.075s running
      
       For GCC prior to version 12, the dynamic tests will be skipped:
      
      [15:18:59] ================== fortify (10 subtests) ===================
      [15:18:59] [PASSED] known_sizes_test
      [15:18:59] [PASSED] control_flow_split_test
      [15:18:59] [PASSED] alloc_size_kmalloc_const_test
      [15:18:59] [SKIPPED] alloc_size_kmalloc_dynamic_test
      [15:18:59] [PASSED] alloc_size_vmalloc_const_test
      [15:18:59] [SKIPPED] alloc_size_vmalloc_dynamic_test
      [15:18:59] [PASSED] alloc_size_kvmalloc_const_test
      [15:18:59] [SKIPPED] alloc_size_kvmalloc_dynamic_test
      [15:18:59] [PASSED] alloc_size_devm_kmalloc_const_test
      [15:18:59] [SKIPPED] alloc_size_devm_kmalloc_dynamic_test
      [15:18:59] ===================== [PASSED] fortify =====================
      [15:18:59] ============================================================
      [15:18:59] Testing complete. Ran 10 tests: passed: 6, skipped: 4
      [15:18:59] Elapsed time: 11.965s total, 0.002s configuring, 10.540s building, 1.068s running
      
      Cc: David Gow <davidgow@google.com>
      Cc: linux-hardening@vger.kernel.org
       Signed-off-by: Kees Cook <keescook@chromium.org>
      9124a264
  21. Nov 08, 2022
  22. Nov 02, 2022
    • overflow: Introduce overflows_type() and castable_to_type() · 4b21d25b
      Kees Cook authored
      Implement a robust overflows_type() macro to test if a variable or
      constant value would overflow another variable or type. This can be
      used as a constant expression for static_assert() (which requires a
      constant expression[1][2]) when used on constant values. This must be
      constructed manually, since __builtin_add_overflow() does not produce
      a constant expression[3].
      
       Additionally add castable_to_type(), similar to __same_type(), but for
       checking whether a constant value can be cast to a given type without
       overflowing it.
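       
       For example (illustrative uses based on the semantics above; both
       macros are constant expressions when given constant values, so they
       work in static_assert()):
       
           static_assert(!overflows_type(127, s8));    /* 127 fits in s8 */
           static_assert(overflows_type(128, s8));     /* 128 does not fit */
           static_assert(castable_to_type(255U, u8));  /* cast preserves value */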
      
      Add unit tests for overflows_type(), __same_type(), and castable_to_type()
      to the existing KUnit "overflow" test:
      
      [16:03:33] ================== overflow (21 subtests) ==================
      ...
      [16:03:33] [PASSED] overflows_type_test
      [16:03:33] [PASSED] same_type_test
      [16:03:33] [PASSED] castable_to_type_test
      [16:03:33] ==================== [PASSED] overflow =====================
      [16:03:33] ============================================================
      [16:03:33] Testing complete. Ran 21 tests: passed: 21
      [16:03:33] Elapsed time: 24.022s total, 0.002s configuring, 22.598s building, 0.767s running
      
      [1] https://en.cppreference.com/w/c/language/_Static_assert
      [2] C11 standard (ISO/IEC 9899:2011): 6.7.10 Static assertions
      [3] https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
      
      
           6.56 Built-in Functions to Perform Arithmetic with Overflow Checking
           Built-in Function: bool __builtin_add_overflow (type1 a, type2 b, type3 *res)
      
      Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Tom Rix <trix@redhat.com>
      Cc: Daniel Latypov <dlatypov@google.com>
      Cc: Vitor Massaru Iha <vitor@massaru.org>
      Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: linux-hardening@vger.kernel.org
      Cc: llvm@lists.linux.dev
       Co-developed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
       Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
       Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20221024201125.1416422-1-gwan-gyeong.mun@intel.com
      4b21d25b
  23. Nov 01, 2022
  24. Oct 03, 2022
    • kmsan: disable instrumentation of unsupported common kernel code · 79dbd006
      Alexander Potapenko authored
       The EFI stub cannot be linked with the KMSAN runtime, so we disable
       instrumentation for it.
      
      Instrumenting kcov, stackdepot or lockdep leads to infinite recursion
      caused by instrumentation hooks calling instrumented code again.
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-13-glider@google.com
      
      
       Signed-off-by: Alexander Potapenko <glider@google.com>
       Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      79dbd006
    • kasan: move tests to mm/kasan/ · f7e01ab8
      Andrey Konovalov authored
      Move KASAN tests to mm/kasan/ to keep the test code alongside the
      implementation.
      
      Link: https://lkml.kernel.org/r/676398f0aeecd47d2f8e3369ea0e95563f641a36.1662416260.git.andreyknvl@google.com
      
      
       Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
       Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Marco Elver <elver@google.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      f7e01ab8
  25. Sep 27, 2022
    • Maple Tree: add new data structure · 54a611b6
      Liam R. Howlett authored
      Patch series "Introducing the Maple Tree"
      
      The maple tree is an RCU-safe range based B-tree designed to use modern
      processor cache efficiently.  There are a number of places in the kernel
      that a non-overlapping range-based tree would be beneficial, especially
      one with a simple interface.  If you use an rbtree with other data
      structures to improve performance or an interval tree to track
      non-overlapping ranges, then this is for you.
      
      The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
      nodes.  With the increased branching factor, it is significantly shorter
      than the rbtree so it has fewer cache misses.  The removal of the linked
      list between subsequent entries also reduces the cache misses and the need
      to pull in the previous and next VMA during many tree alterations.
      
      The first user that is covered in this patch set is the vm_area_struct,
      where three data structures are replaced by the maple tree: the augmented
      rbtree, the vma cache, and the linked list of VMAs in the mm_struct.  The
      long term goal is to reduce or remove the mmap_lock contention.
      
      The plan is to get to the point where we use the maple tree in RCU mode.
      Readers will not block for writers.  A single write operation will be
      allowed at a time.  A reader re-walks if stale data is encountered.  VMAs
      would be RCU enabled and this mode would be entered once multiple tasks
      are using the mm_struct.
      
       Davidlohr said
      
      : Yes I like the maple tree, and at this stage I don't think we can ask for
      : more from this series wrt the MM - albeit there seems to still be some
      : folks reporting breakage.  Fundamentally I see Liam's work to (re)move
      : complexity out of the MM (not to say that the actual maple tree is not
       : complex) by consolidating the three complementary data structures very
      : much worth it considering performance does not take a hit.  This was very
      : much a turn off with the range locking approach, which worst case scenario
      : incurred in prohibitive overhead.  Also as Liam and Matthew have
      : mentioned, RCU opens up a lot of nice performance opportunities, and in
      : addition academia[1] has shown outstanding scalability of address spaces
      : with the foundation of replacing the locked rbtree with RCU aware trees.
      
       Similar work has been discovered in the academic press:
       
       	https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf
       
       Sheer coincidence.  We designed our tree with the intention of solving the
       hardest problem first.  Upon settling on a b-tree variant and a rough
       outline, we researched range-based b-trees and RCU b-trees and did find
       that article.  So it was nice to find reassurance that we were on the
       right path, but our design choice of using ranges made that paper unusable
       for us.
      
      This patch (of 70):
      
       There are additional BUG_ON() calls added within the tree, most of which
       are in debug code.  These will be replaced with WARN_ON() calls in the
       future.  There are also additional BUG_ON() calls within the code which
       will be reduced in number at a later date.  These exist to catch
       things such as out-of-range accesses which would crash anyway.
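       
       A minimal usage sketch of that simple interface (the mtree_* calls
       below are the basic entry points as I understand them; the exact
       signatures should be taken from include/linux/maple_tree.h):
       
           static DEFINE_MTREE(mt);
       
           static int example(void *thing)
           {
                   int err;
       
                   /* Associate one entry with the range [0x1000, 0x1fff]. */
                   err = mtree_store_range(&mt, 0x1000, 0x1fff, thing, GFP_KERNEL);
                   if (err)
                           return err;
       
                   /* Any index inside the range returns the same entry. */
                   WARN_ON(mtree_load(&mt, 0x1234) != thing);
       
                   /* Remove the entry covering index 0x1000 again. */
                   mtree_erase(&mt, 0x1000);
                   return 0;
           }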
      
      Link: https://lkml.kernel.org/r/20220906194824.2110408-1-Liam.Howlett@oracle.com
      Link: https://lkml.kernel.org/r/20220906194824.2110408-2-Liam.Howlett@oracle.com
      
      
       Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
       Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
       Tested-by: David Howells <dhowells@redhat.com>
       Tested-by: Sven Schnelle <svens@linux.ibm.com>
       Tested-by: Yu Zhao <yuzhao@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Will Deacon <will@kernel.org>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      54a611b6
  26. Sep 07, 2022
  27. Aug 31, 2022
  28. Aug 24, 2022
  29. Aug 19, 2022
  30. Aug 15, 2022
  31. Aug 02, 2022
  32. Aug 01, 2022
    • lib/nodemask: inline next_node_in() and node_random() · 36d4b36b
      Yury Norov authored
      
       The functions are thin wrappers around the find_bit engine, and
       keeping them in a C file prevents the compiler from applying the
       small_const_nbits() optimization, which should take place on all systems
       where MAX_NUMNODES is less than BITS_PER_LONG (the default is 16 for me).
       
       Moving them to a header file doesn't blow up the kernel size:
       add/remove: 1/2 grow/shrink: 9/5 up/down: 968/-88 (880)
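       
       The optimization in question is the compile-time shortcut for
       single-word bitmaps, roughly (a sketch of the idea, not the exact
       macro):
       
           /* When nbits is a compile-time constant that fits in one long,
            * the find_bit wrappers can reduce to a couple of inline word
            * operations instead of a call into the find_bit engine. */
           #define small_const_nbits_sketch(nbits) \
                   (__builtin_constant_p(nbits) && (nbits) <= BITS_PER_LONG && (nbits) > 0)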
      
      CC: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Michael Ellerman <mpe@ellerman.id.au>
      CC: Paul Mackerras <paulus@samba.org>
      CC: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      CC: Stephen Rothwell <sfr@canb.auug.org.au>
      CC: linuxppc-dev@lists.ozlabs.org
       Signed-off-by: Yury Norov <yury.norov@gmail.com>
      36d4b36b
  33. Jul 18, 2022