  1. Sep 20, 2024
    • lib/sbitmap: define swap_lock as raw_spinlock_t · 65f666c6
      Ming Lei authored
      
      When called from sbitmap_queue_get(), sbitmap_deferred_clear() may run
      with preemption disabled. On a PREEMPT_RT kernel, spin_lock() can sleep,
      so this can trigger a "BUG: sleeping function called from invalid
      context" warning.

      Fix it by replacing the spinlock with a raw_spin_lock.
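
      A minimal sketch of the idea (the field and function names follow
      lib/sbitmap.c, but the bodies are simplified and not the actual code):

        #include <linux/spinlock.h>

        struct word_sketch {
                unsigned long cleared;
                raw_spinlock_t swap_lock;       /* was: spinlock_t swap_lock; */
        };

        /* swap_lock is set up elsewhere with raw_spin_lock_init(). */
        static bool deferred_clear_sketch(struct word_sketch *map)
        {
                bool ret;

                /* raw_spin_lock() never sleeps, even on PREEMPT_RT, so it
                 * is safe in preempt-disabled context. */
                raw_spin_lock(&map->swap_lock);
                ret = map->cleared != 0;
                raw_spin_unlock(&map->swap_lock);
                return ret;
        }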
      
      Cc: Yang Yang <yang.yang@vivo.com>
      Fixes: 72d04bdc ("sbitmap: fix io hung due to race on sbitmap_word::cleared")
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Yang Yang <yang.yang@vivo.com>
      Link: https://lore.kernel.org/r/20240919021709.511329-1-ming.lei@redhat.com
      
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      65f666c6
    • kbuild: generate offset range data for builtin modules · 5f5e7344
      Kris Van Hees authored
      
      Create a file, modules.builtin.ranges, that can be used to find where
      built-in modules are located by their addresses. This will be useful
      for tracing tools to determine which functions belong to which
      built-in modules.
      
      The offset range data for builtin modules is generated using:
       - modules.builtin: associates object files with module names
       - vmlinux.map: provides load order of sections and offset of first member
          per section
       - vmlinux.o.map: provides offset of object file content per section
       - .*.cmd: build cmd file with KBUILD_MODFILE
      
      The generated data will look like:
      
      .text 00000000-00000000 = _text
      .text 0000baf0-0000cb10 amd_uncore
      .text 0009bd10-0009c8e0 iosf_mbi
      ...
      .text 00b9f080-00ba011a intel_skl_int3472_discrete
      .text 00ba0120-00ba03c0 intel_skl_int3472_discrete intel_skl_int3472_tps68470
      .text 00ba03c0-00ba08d6 intel_skl_int3472_tps68470
      ...
      .data 00000000-00000000 = _sdata
      .data 0000f020-0000f680 amd_uncore
      
      For each ELF section, it lists the offset of the first symbol.  This can
      be used to determine the base address of the section at runtime.
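
      For example, using the sample data above: if the anchor symbol _text
      resolves to, say, 0xffffffff81000000 at runtime (an illustrative
      address), the .text base is that address minus the recorded 00000000
      offset, and amd_uncore's .text code then occupies
      0xffffffff81000000 + 0xbaf0 up to 0xffffffff81000000 + 0xcb10.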
      
      Next, it lists (in strict ascending order) offset ranges in that section
      that cover the symbols of one or more builtin modules.  Multiple ranges
      can apply to a single module, and ranges can be shared between modules.
      
      The CONFIG_BUILTIN_MODULE_RANGES option controls whether offset range data
      is generated for kernel modules that are built into the kernel image.
      
      How it works:
      
       1. The modules.builtin file is parsed to obtain a list of built-in
          module names and their associated object names (the .ko file that
          the module would be in if it were a loadable module, hereafter
          referred to as <kmodfile>).  This object name can be used to
          identify objects in the kernel compile because any C or assembler
          code that ends up in a built-in module will have the option
          -DKBUILD_MODFILE=<kmodfile> present in its build command, and those
          can be found in the .<obj>.cmd file in the kernel build tree.
      
          If an object is part of multiple modules, they will all be listed
          in the KBUILD_MODFILE option argument.
      
          This allows us to conclusively determine whether an object in the
          kernel build belongs to any modules, and to which ones.
      
       2. The vmlinux.map is parsed next to determine the base address of each
          top level section so that all addresses into the section can be
          turned into offsets.  This makes it possible to handle sections
          getting loaded at different addresses at system boot.
      
          We also determine an 'anchor' symbol at the beginning of each
          section to make it possible to calculate the true base address of
          a section at runtime (i.e. symbol address - symbol offset).
      
          We collect start addresses of sections that are included in the top
          level section.  This is used when vmlinux is linked using vmlinux.o,
          because in that case, we need to look at the vmlinux.o linker map to
          know what object a symbol is found in.
      
          And finally, we process each symbol that is listed in vmlinux.map
          (or vmlinux.o.map) based on the following structure:
      
          vmlinux linked from vmlinux.a:
      
            vmlinux.map:
              <top level section>
                <included section>  -- might be the same as the top level section
                  <object>          -- built-in association known
                    <symbol>        -- belongs to module(s) object belongs to
                    ...
      
          vmlinux linked from vmlinux.o:
      
            vmlinux.map:
              <top level section>
                <included section>  -- might be the same as the top level section
                  vmlinux.o         -- need to use vmlinux.o.map
                    <symbol>        -- ignored
                    ...
      
            vmlinux.o.map:
              <section>
                  <object>          -- built-in association known
                    <symbol>        -- belongs to module(s) object belongs to
                    ...
      
       3. As sections, objects, and symbols are processed, offset ranges are
          constructed in a straightforward way (see the sketch after this
          list):

            - If the symbol belongs to one or more built-in modules:
                - If we were working on the same module(s), extend the range
                  to include this object
                - If we were working on a different module (or modules),
                  close that range and start a new one
            - If the symbol does not belong to any built-in modules:
                - If we were working on a module(s) range, close that range
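
      A rough C sketch of that state machine (types and names here are
      illustrative; the actual implementation is a build-time script, not
      kernel code):

        #include <stdbool.h>
        #include <stdio.h>
        #include <string.h>

        struct range_state {
                bool open;              /* a module range is currently open */
                unsigned long start;    /* offset where the open range began */
                char mods[256];         /* module name(s) covering the range */
        };

        static void close_range(struct range_state *st, unsigned long end)
        {
                if (st->open)
                        printf("%08lx-%08lx %s\n", st->start, end, st->mods);
                st->open = false;
        }

        /* Called once per symbol, in ascending offset order within a
         * section; mods is NULL for symbols not in any built-in module. */
        static void process_symbol(struct range_state *st, unsigned long off,
                                   const char *mods)
        {
                if (!mods) {
                        close_range(st, off);   /* non-module symbol: end range */
                        return;
                }
                if (st->open && strcmp(st->mods, mods) == 0)
                        return;                 /* same module(s): keep extending */
                close_range(st, off);           /* different module(s): close... */
                st->open = true;                /* ...and start a new range */
                st->start = off;
                strncpy(st->mods, mods, sizeof(st->mods) - 1);
                st->mods[sizeof(st->mods) - 1] = '\0';
        }

      Initialise with struct range_state st = { 0 }, and close any still-open
      range when the end of a section is reached.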
      
      Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
      Reviewed-by: Nick Alcock <nick.alcock@oracle.com>
      Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
      Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Tested-by: Sam James <sam@gentoo.org>
      Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
      Tested-by: Sami Tolvanen <samitolvanen@google.com>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      5f5e7344
  2. Sep 13, 2024
    • random: vDSO: minimize and simplify header includes · 7f053812
      Christophe Leroy authored and Jason A. Donenfeld committed
      
      Depending on the architecture, building a 32-bit vDSO on a 64-bit kernel
      is problematic when some system headers are included.
      
      Minimise the number of included headers by moving needed items, such as
      __{get,put}_unaligned_t, into dedicated common headers, and in general
      use more specific headers, similar to what was done in commit
      8165b57b ("linux/const.h: Extract common header for vDSO") and
      commit 8c59ab83 ("lib/vdso: Enable common headers").
      
      On some architectures this results in missing PAGE_SIZE, as was
      described by commit 8b3843ae ("vdso/datapage: Quick fix - use
      asm/page-def.h for ARM64"), so define this if necessary, in the same way
      as done prior by commit cffaefd1 ("vdso: Use CONFIG_PAGE_SHIFT in
      vdso/datapage.h").
      
      Removing linux/time64.h leads to missing 'struct timespec64' in
      x86's asm/pvclock.h. Add a forward declaration of that struct in
      that file.
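
      The timespec64 part of the change boils down to a forward declaration;
      a sketch of the pattern (the prototype shown is hypothetical, not the
      actual pvclock interface):

        /* A forward declaration suffices when a header only passes the
         * struct around by pointer, keeping linux/time64.h out of the
         * 32-bit vDSO include graph. */
        struct timespec64;

        void example_read_time(struct timespec64 *ts);  /* hypothetical */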
      
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      7f053812
    • random: vDSO: avoid call to out of line memset() · b7bad082
      Christophe Leroy authored and Jason A. Donenfeld committed
      
      With the current implementation, __cvdso_getrandom_data() calls
      memset() on certain architectures, which is unexpected in the VDSO.
      
      Rather than providing a memset(), simply rewrite opaque data
      initialization to avoid memset().
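
      A minimal sketch of the pattern, with an illustrative struct (not the
      actual getrandom state layout):

        /* Initialise members with plain stores the compiler can emit
         * inline, rather than an aggregate initialiser or memset() call
         * that may lower to an out-of-line memset unavailable in the vDSO. */
        struct opaque_sketch {
                unsigned long pos;
                unsigned long generation;
        };

        static void init_opaque(struct opaque_sketch *s)
        {
                s->pos = 0;             /* member-by-member, no memset() */
                s->generation = 0;
        }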
      
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      b7bad082
    • random: vDSO: add missing c-getrandom-y in Makefile · 81723e3a
      Christophe Leroy authored and Jason A. Donenfeld committed
      
      Same as for the gettimeofday CVDSO implementation, add c-getrandom-y to
      ease the inclusion of lib/vdso/getrandom.c in architectures' VDSO
      builds.
      
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      81723e3a
    • random: vDSO: don't use 64-bit atomics on 32-bit architectures · 81c68960
      Christophe Leroy authored and Jason A. Donenfeld committed
      
      Performing SMP atomic operations on u64 fails on powerpc32:
      
          CC      drivers/char/random.o
        In file included from <command-line>:
        drivers/char/random.c: In function 'crng_reseed':
        ././include/linux/compiler_types.h:510:45: error: call to '__compiletime_assert_391' declared with attribute error: Need native word sized stores/loads for atomicity.
          510 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
              |                                             ^
        ././include/linux/compiler_types.h:491:25: note: in definition of macro '__compiletime_assert'
          491 |                         prefix ## suffix();                             \
              |                         ^~~~~~
        ././include/linux/compiler_types.h:510:9: note: in expansion of macro '_compiletime_assert'
          510 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
              |         ^~~~~~~~~~~~~~~~~~~
        ././include/linux/compiler_types.h:513:9: note: in expansion of macro 'compiletime_assert'
          513 |         compiletime_assert(__native_word(t),                            \
              |         ^~~~~~~~~~~~~~~~~~
        ./arch/powerpc/include/asm/barrier.h:74:9: note: in expansion of macro 'compiletime_assert_atomic_type'
           74 |         compiletime_assert_atomic_type(*p);                             \
              |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        ./include/asm-generic/barrier.h:172:55: note: in expansion of macro '__smp_store_release'
          172 | #define smp_store_release(p, v) do { kcsan_release(); __smp_store_release(p, v); } while (0)
              |                                                       ^~~~~~~~~~~~~~~~~~~
        drivers/char/random.c:286:9: note: in expansion of macro 'smp_store_release'
          286 |         smp_store_release(&__arch_get_k_vdso_rng_data()->generation, next_gen + 1);
              |         ^~~~~~~~~~~~~~~~~
      
      The kernel-side generation counter in the random driver is handled as an
      unsigned long, not as a u64, in base_crng and struct crng.

      On the vDSO side, however, it needs to be a u64, not just an unsigned
      long, in order to support a 32-bit vDSO atop a 64-bit kernel.

      Since the kernel-side value is an unsigned long, and hence a 32-bit
      value on 32-bit architectures, just cast it to unsigned long for the
      smp_store_release().  A side effect is that on big endian architectures
      the store will be performed in the upper 32 bits.  That is not an issue
      on its own, because the vDSO side doesn't interpret the value; it only
      checks for differences.  Just make sure that the vDSO side checks the
      full 64 bits.  For that, the local current_generation has to be u64 as
      well.
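
      A sketch of the shape of the fix, with simplified names (the data
      layout shown is illustrative, not the actual vdso_rng_data):

        /* Kernel side: the counter is an unsigned long, so do a native
         * word-sized release store.  On 32-bit big endian this writes the
         * upper 32 bits of the u64 slot, which is fine because readers
         * only compare for change and never interpret the value. */
        smp_store_release((unsigned long *)&vdso_rng_data->generation,
                          next_gen + 1);

        /* vDSO side: track and compare the full 64 bits, whatever the
         * architecture's word size. */
        u64 current_generation = READ_ONCE(vdso_rng_data->generation);
        if (current_generation != state->generation) {
                /* ... generation changed, reseed ... */
        }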
      
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      81c68960
  3. Sep 12, 2024
    • lib/math: Add int_pow test suite · 7fcc9b53
      Luis Felipe Hernandez authored
      
      Add a test suite for int_pow(), the integer-based power function that
      performs integer exponentiation.
      
      The test suite is designed to verify that the implementation of int_pow
      correctly computes the power of a given base raised to a given exponent.
      
      The tests check various scenarios and edge cases to ensure the accuracy
      and reliability of the exponentiation function.
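
      The function under test computes base^exp on 64-bit integers by
      repeated squaring; a standalone sketch of the behaviour being
      verified, with example cases chosen here rather than taken from the
      actual suite:

        #include <assert.h>
        #include <stdint.h>

        /* Exponentiation by squaring, mirroring int_pow() semantics. */
        static uint64_t int_pow_sketch(uint64_t base, unsigned int exp)
        {
                uint64_t result = 1;

                while (exp) {
                        if (exp & 1)
                                result *= base;
                        exp >>= 1;
                        base *= base;
                }
                return result;
        }

        int main(void)
        {
                assert(int_pow_sketch(2, 10) == 1024);  /* typical case */
                assert(int_pow_sketch(7, 0) == 1);      /* x^0 == 1 */
                assert(int_pow_sketch(0, 5) == 0);      /* 0^x == 0 */
                assert(int_pow_sketch(1, 64) == 1);     /* base 1 identity */
                return 0;
        }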
      
      Updated commit with test information at commit time: Shuah Khan
      
      Signed-off-by: Luis Felipe Hernandez <luis.hernandez093@gmail.com>
      Reviewed-by: David Gow <davidgow@google.com>
      Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
      7fcc9b53
    • mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios · db0aa2e9
      David Howells authored
      
      Define a data structure, struct folio_queue, to represent a sequence of
      folios and a kernel-internal I/O iterator type, ITER_FOLIOQ, to allow a
      list of folio_queue structures to be used to provide a buffer to
      iov_iter-taking functions, such as sendmsg and recvmsg.
      
      The folio_queue structure looks like:
      
      	struct folio_queue {
      		struct folio_batch	vec;
      		u8			orders[PAGEVEC_SIZE];
      		struct folio_queue	*next;
      		struct folio_queue	*prev;
      		unsigned long		marks;
      		unsigned long		marks2;
      	};
      
      It does not use a list_head so that next and/or prev can be set to NULL at
      the ends of the list, allowing iov_iter-handling routines to determine that
      they *are* the ends without needing to store a head pointer in the iov_iter
      struct.
      
      A folio_batch struct is used to hold the folio pointers, which allows the
      batch to be passed to batch-handling functions.  Two mark bits are
      available per slot.  The intention is to use at least one of them to mark
      folios that need putting, but that might not ultimately be necessary.
      Accessor functions are used to access the slots and do the masking, and
      an additional accessor function indicates the size of the array.
      
      The order of each folio is also stored in the structure to avoid the need
      for iov_iter_advance() and iov_iter_revert() to have to query each folio to
      find its size.
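
      For example, a slot's byte size becomes a simple shift on the stored
      order, along the lines of (helper name illustrative; the actual
      accessors may differ):

        /* Derive a slot's size from the recorded folio order, so iterators
         * never have to dereference the folio itself. */
        static inline size_t folioq_slot_size_sketch(const struct folio_queue *fq,
                                                     unsigned int slot)
        {
                return PAGE_SIZE << fq->orders[slot];
        }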
      
      With careful barriering, this can be used as an extending buffer, with new
      folios inserted and new folio_queue structs added without the need for a
      lock.  Further, provided we always keep at least one struct in the buffer,
      we can also remove consumed folios and consumed structs from the head end
      as well, without the need for locks.
      
      [Questions/thoughts]
      
       (1) To manage this, I need a head pointer, a tail pointer, a tail slot
           number (assuming insertion happens at the tail end and the next
           pointers point from head to tail).  Should I put these into a struct
           of their own, say "folio_queue_head" or "rolling_buffer"?
      
           I will end up with two of these in netfs_io_request eventually, one
           keeping track of the pagecache I'm dealing with for buffered I/O and
           the other to hold a bounce buffer when we need one.
      
       (2) Should I make the slots {folio,off,len} or bio_vec?
      
       (3) This is intended to replace ITER_XARRAY eventually.  Using an xarray
           in I/O iteration requires the taking of the RCU read lock, doing
           copying under the RCU read lock, walking the xarray (which may change
           under us), handling retries and dealing with special values.
      
           The advantage of ITER_XARRAY is that when we're dealing with the
           pagecache directly, we don't need any allocation - but if we're doing
           encrypted comms, there's a good chance we'd be using a bounce buffer
           anyway.
      
           This will require afs, erofs, cifs, orangefs and fscache to be
           converted to not use this.  afs still uses it for dirs and symlinks;
           some of erofs usages should be easy to change, but there's one which
           won't be so easy; ceph's use via fscache can be fixed by porting ceph
           to netfslib; cifs is using xarray as a bounce buffer - that can be
           moved to use sheaves instead; and orangefs has a similar problem to
           erofs - maybe orangefs could use netfslib?
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Matthew Wilcox <willy@infradead.org>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: Steve French <sfrench@samba.org>
      cc: Ilya Dryomov <idryomov@gmail.com>
      cc: Gao Xiang <xiang@kernel.org>
      cc: Mike Marshall <hubcap@omnibond.com>
      cc: netfs@lists.linux.dev
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
      cc: linux-afs@lists.infradead.org
      cc: linux-cifs@vger.kernel.org
      cc: ceph-devel@vger.kernel.org
      cc: linux-erofs@lists.ozlabs.org
      cc: devel@lists.orangefs.org
      Link: https://lore.kernel.org/r/20240814203850.2240469-13-dhowells@redhat.com/
      
       # v2
      Signed-off-by: Christian Brauner <brauner@kernel.org>
      db0aa2e9