  1. Apr 08, 2020
  2. Jan 13, 2020
    • md/raid6: fix algorithm choice under larger PAGE_SIZE · f591df3c
      Zhengyuan Liu authored
      
      There are several algorithms available for raid6 to generate the xor and
      syndrome parity, including the basic int1, int2 ... int32 variants and
      SIMD-optimized implementations like SSE and NEON. To test and choose the
      best algorithm at the initial stage, we need to provide enough disk data
      to feed the algorithms. However, the number of disks we provide depends
      on the page size and the gfmul table, as seen below:
      
          const int disks = (65536/PAGE_SIZE) + 2;
      
      So with a 64K PAGE_SIZE there is only one data disk plus two parity
      disks, and as a result the chosen algorithm is not reliable. For example,
      on my arm64 machine with 64K pages enabled, it chooses intx32 as the best
      one, although the NEON implementation is better.
      
      This patch fixes the problem by defining a constant raid6 disk count that
      supports arbitrary page sizes.
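      
      As a rough sketch of the idea (the macro and helper names here are
      illustrative, not necessarily those of the final patch), the benchmark
      can pin the disk count and give each test disk one page, whatever the
      page size is:
      
          /* Sketch only: a fixed disk count for the init-time benchmark. */
          #define RAID6_TEST_DISKS 8  /* 6 data disks + P + Q */
          
          static void raid6_benchmark_setup(void **dptrs, char *buf)
          {
                  int i;
          
                  /* one page of test data per disk, independent of PAGE_SIZE */
                  for (i = 0; i < RAID6_TEST_DISKS; i++)
                          dptrs[i] = buf + i * PAGE_SIZE;
          }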
      
      Suggested-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
      Signed-off-by: Song Liu <songliubraving@fb.com>
  3. May 24, 2019
  4. Dec 20, 2018
    • lib/raid6: add option to skip algo benchmarking · be85f93a
      Daniel Verkamp authored
      
      This is helpful for systems where fast startup time is important.
      It is especially nice to avoid benchmarking RAID functions that are
      never used (for example, BTRFS selects RAID6_PQ even if the parity RAID
      mode is not in use).
      
      This saves 250+ milliseconds of boot time on modern x86 and ARM systems
      with a dozen or more available implementations.
      
      The new option defaults to 'y' to match the previous behavior of always
      benchmarking on init.
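      
      A hedged sketch of how such a skip can work once the algorithm table is
      sorted fastest-first (see the following entry): the loop shape below is
      an assumption for illustration, though raid6_algos and the ->valid()
      hook are real lib/raid6 interfaces.
      
          /* Sketch: with benchmarking disabled, take the first
           * implementation whose ->valid() check passes; the table is
           * sorted in rough performance order. */
          static const struct raid6_calls *raid6_pick_first_valid(void)
          {
                  const struct raid6_calls *const *algo;
          
                  for (algo = raid6_algos; *algo; algo++)
                          if (!(*algo)->valid || (*algo)->valid())
                                  return *algo;  /* fastest valid candidate */
                  return NULL;
          }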
      
      Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • lib/raid6: sort algos in rough performance order · 0437de4f
      Daniel Verkamp authored
      
      Sort the list of RAID6 algorithms in roughly decreasing order of
      expected performance: newer instruction sets first (within each
      architecture) and wider unrollings first.
      
      This doesn't make any difference right now, since all functions are
      benchmarked; a follow-up change will make use of this by optionally
      choosing the first valid function rather than testing all of them.
      
      The Itanium raid6_intx{16,32} entries are also moved down to be near the
      other raid6_intx entries for clarity.
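      
      An abridged, illustrative excerpt of what such an ordering looks like
      (the entry names follow lib/raid6 conventions, but this exact list is a
      sketch, not the patched table):
      
          const struct raid6_calls *const raid6_algos[] = {
          #if defined(__x86_64__)
                  &raid6_avx2x4,          /* newer ISA, wider unroll first */
                  &raid6_avx2x2,
                  &raid6_sse2x4,
                  &raid6_sse2x2,
          #endif
                  &raid6_intx8,           /* portable fallbacks last */
                  &raid6_intx4,
                  &raid6_intx2,
                  &raid6_intx1,
                  NULL
          };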
      
      Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
      Signed-off-by: Shaohua Li <shli@fb.com>
  5. Mar 26, 2018
  6. Mar 20, 2018
    • lib/raid6/altivec: Add vpermxor implementation for raid6 Q syndrome · 751ba79c
      Matt Brown authored
      
      This patch uses the vpermxor instruction to optimise raid6 Q-syndrome
      generation. The instruction was made available with POWER8, ISA version
      2.07; it performs the work of a vperm and a vxor in a single
      instruction. This has been tested for correctness on a ppc64le vm with a
      basic RAID6 setup containing 5 drives.
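      
      For context, the inner step these routines accelerate is the GF(2^8)
      multiply-by-{02} that folds each data disk into Q. A scalar sketch of
      that arithmetic (standard RAID-6 math over the 0x11d polynomial, not
      code from this patch):
      
          #include <stdint.h>
          
          /* Multiply by {02} in GF(2^8) with the RAID-6 polynomial 0x11d. */
          static inline uint8_t gf_mul2(uint8_t v)
          {
                  return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
          }
          
          /* One byte lane of Q over n data disks, evaluated Horner-style:
           * Q = g^(n-1)·D[n-1] ^ ... ^ g^0·D[0], with g = {02}. */
          static uint8_t gen_q_lane(const uint8_t *d, int n)
          {
                  uint8_t q = 0;
                  int i;
          
                  for (i = n - 1; i >= 0; i--)
                          q = gf_mul2(q) ^ d[i];
                  return q;
          }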
      
      The performance benchmarks are from the raid6test in the
      /lib/raid6/test directory. These results are from an IBM Firestone
      machine with ppc64le architecture. The benchmark results show a 35%
      speed increase over the best existing algorithm for powerpc (altivec).
      The raid6test has also been run on a big-endian ppc64 vm to ensure it
      also works for big-endian architectures.
      
      Performance benchmarks:
        raid6: altivecx4 gen() 18773 MB/s
        raid6: altivecx8 gen() 19438 MB/s
      
        raid6: vpermxor4 gen() 25112 MB/s
        raid6: vpermxor8 gen() 26279 MB/s
      
      Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
      Reviewed-by: Daniel Axtens <dja@axtens.net>
      [mpe: Add VPERMXOR macro so we can build with old binutils]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  7. Aug 09, 2017
  8. Sep 21, 2016
  9. Sep 01, 2016
  10. Aug 29, 2016
    • RAID/s390: add SIMD implementation for raid6 gen/xor · 474fd6e8
      Martin Schwidefsky authored
      
      Using vector registers is considerably faster (the gen() numbers below
      are roughly 3.7x the best integer implementation):
      
      raid6: vx128x8  gen() 19705 MB/s
      raid6: vx128x8  xor() 11886 MB/s
      raid6: using algorithm vx128x8 gen() 19705 MB/s
      raid6: .... xor() 11886 MB/s, rmw enabled
      
      vs the software algorithms:
      
      raid6: int64x1  gen()  3018 MB/s
      raid6: int64x1  xor()  1429 MB/s
      raid6: int64x2  gen()  4661 MB/s
      raid6: int64x2  xor()  3143 MB/s
      raid6: int64x4  gen()  5392 MB/s
      raid6: int64x4  xor()  3509 MB/s
      raid6: int64x8  gen()  4441 MB/s
      raid6: int64x8  xor()  3207 MB/s
      raid6: using algorithm int64x4 gen() 5392 MB/s
      raid6: .... xor() 3509 MB/s, rmw enabled
      
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  11. Apr 21, 2015
    • md/raid6 algorithms: delta syndrome functions · fe5cbc6e
      Markus Stockhausen authored
      
      v3: s-o-b comment, explanation of performance and decision for
      the start/stop implementation
      
      Implementing rmw functionality for RAID6 requires optimized syndrome
      calculation. Up to now we could only generate a complete syndrome; the
      target P/Q pages are always overwritten. With this patch we provide a
      framework for in-place P/Q modification. For now, simply fill those
      function pointers with NULL values.
      
      xor_syndrome() has two additional parameters: start & stop. These
      indicate the first and last page that change during an rmw run, which
      makes it possible to avoid several unnecessary loops and speeds up the
      calculation. The caller needs to implement the following logic to make
      the functions work (a sketch follows the list):
      
      1) xor_syndrome(disks, start, stop, ...): "Remove" all data of source
      blocks inside P/Q between (and including) start and stop.
      
      2) modify any block with start <= block <= stop
      
      3) xor_syndrome(disks, start, stop, ...): "Reinsert" all data of
      source blocks into P/Q between (and including) start and stop.
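      
      A hedged sketch of that caller sequence (the helper and its callback are
      invented for illustration; raid6_call and the xor_syndrome() parameters
      follow this patch):
      
          /* ptrs[] holds the data pages plus P and Q; unchanged blocks
           * inside [start, stop] must point at the zero page, as explained
           * below. */
          static void rmw_update(int disks, int start, int stop, void **ptrs,
                                 void (*modify)(void **, int, int))
          {
                  /* 1) "remove" the old contents of start..stop from P/Q */
                  raid6_call.xor_syndrome(disks, start, stop, PAGE_SIZE, ptrs);
          
                  /* 2) modify any block with start <= block <= stop */
                  modify(ptrs, start, stop);
          
                  /* 3) "reinsert" the new contents of start..stop into P/Q */
                  raid6_call.xor_syndrome(disks, start, stop, PAGE_SIZE, ptrs);
          }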
      
      Pages between start and stop that won't be changed should be filled with
      a pointer to the kernel zero page. The reasons for not taking NULL pages
      are:
      
      1) The algorithms cross the whole source data line by line, so this
      avoids additional branches.
      
      2) A NULL page would avoid calculating the XOR P parity but would still
      need calculation steps for the Q parity. Depending on the algorithm
      unrolling, that might only be a difference of 2 instructions per loop.
      
      The benchmark numbers of the gen_syndrome() functions are displayed in
      the kernel log. Do the same for the xor_syndrome() functions. This will
      help to analyze performance problems and give a rough estimate of how
      well the algorithm works. The choice of the fastest algorithm will still
      depend on the gen_syndrome() performance.
      
      With the start/stop page implementation the speed can vary a lot in real
      life. E.g. a change of page 0 & page 15 on a stripe will be harder to
      compute than the case where page 0 & page 1 are the XOR candidates. To
      avoid being too enthusiastic about the expected speeds, we run a
      worst-case test that simulates a change on the upper half of the stripe.
      So we do:
      
      1) calculation of P/Q for the upper pages
      
      2) continuation of Q for the lower (empty) pages
      
      Signed-off-by: Markus Stockhausen <stockhausen@collogia.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
  12. Feb 03, 2015
  13. Oct 14, 2014
  14. Aug 27, 2013
    • RAID: add tilegx SIMD implementation of raid6 · ae77cbc1
      Ken Steele authored
      This change adds TILE-Gx SIMD instructions to the software raid (md),
      modeling the Altivec implementation. This is only for syndrome
      generation; there is more that could be done to improve recovery, as in
      the recent Intel SSE3 recovery implementation.
      
      The code unrolls 8 times; this turns out to be the best on tilegx
      hardware among the set 1, 2, 4, 8 or 16. The code reads one cache-line
      of data from each disk, stores P and Q, then goes to the next
      cache-line.
      
      The test code in sys/linux/lib/raid6/test reports a 2008 MB/s data read
      rate for syndrome generation using 18 disks (16 data and 2 parity). It
      was 1512 MB/s before this SIMD optimization. This is running on 1 core
      with all the data in cache.
      
      This is based on the paper The Mathematics of RAID-6
      (http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf).
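      
      For reference, the syndromes defined in that paper over data disks
      D_0 .. D_(n-1) in GF(2^8), with generator g = {02} and "+" denoting
      bytewise XOR:
      
          P = D_0 + D_1 + ... + D_(n-1)
          Q = g^0·D_0 + g^1·D_1 + ... + g^(n-1)·D_(n-1)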
      
      Signed-off-by: Ken Steele <ken@tilera.com>
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  15. Jul 08, 2013
  16. Dec 13, 2012
  17. May 22, 2012
  18. Oct 31, 2011
  19. Aug 11, 2010
  20. Aug 10, 2010
  21. Oct 29, 2009
  22. Mar 31, 2009
    • md/raid6: move raid6 data processing to raid6_pq.ko · f701d589
      Dan Williams authored
      
      Move the raid6 data processing routines into a standalone module
      (raid6_pq) to prepare them to be called from async_tx wrappers and other
      non-md drivers/modules.  This precludes a circular dependency of raid456
      needing the async modules for data processing while those modules in
      turn depend on raid456 for the base level synchronous raid6 routines.
      
      To support this move:
      1/ The exportable definitions in raid6.h move to include/linux/raid/pq.h
      2/ The raid6_call, recovery calls, and table symbols are exported
      3/ Extra #ifdef __KERNEL__ statements to enable the userspace raid6test
         to compile (see the sketch after this list)
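      
      A sketch of the pattern from point 3 (the exact includes and typedefs
      are illustrative, not quoted from the header):
      
          #ifdef __KERNEL__
          #include <linux/types.h>
          #else   /* userspace raid6test build */
          #include <stdint.h>
          #include <stddef.h>
          typedef uint8_t  u8;    /* map kernel type names onto stdint */
          typedef uint64_t u64;
          #endif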
      
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: fix typo in FSF address · 93ed05e2
      Atsushi SAKAI authored
      
      Hello,
      
      I found a typo, Bosto"m", in the FSF address, and I have been checking
      around the Linux source code. This is the only place which uses
      Bosto"m" (not Boston).
      
      Signed-off-by: Atsushi SAKAI <sakaia@jp.fujitsu.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  23. Apr 28, 2008
  24. Oct 29, 2007
  25. Jun 23, 2006
  26. Sep 17, 2005
  27. Apr 16, 2005
    • Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!