- Apr 08, 2020
-
-
Jason A. Donenfeld authored
Now that the kernel specifies binutils 2.23 as the minimum version, we can remove ifdefs for AVX2 and ADX throughout. Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com> Acked-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Nick Desaulniers <ndesaulniers@google.com> Signed-off-by:
Masahiro Yamada <masahiroy@kernel.org>
-
Masahiro Yamada authored
CONFIG_AS_SSSE3 was introduced by commit 75aaf4c3 ("x86/raid6: correctly check for assembler capabilities"). We raise the minimal supported binutils version from time to time. The last bump was commit 1fb12b35 ("kbuild: Raise the minimum required binutils version to 2.21"). I confirmed the code in $(call as-instr,...) can be assembled by the binutils 2.21 assembler and also by the LLVM integrated assembler. Remove CONFIG_AS_SSSE3, which is always defined. I added #ifdef CONFIG_X86 to lib/raid6/algos.c to avoid link errors on non-x86 architectures. lib/raid6/algos.c is built not only for the kernel but also for testing the library code from userspace. I added -DCONFIG_X86 to lib/raid6/test/Makefile to cater to this use case. Signed-off-by:
Masahiro Yamada <masahiroy@kernel.org> Reviewed-by:
Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by:
Nick Desaulniers <ndesaulniers@google.com> Acked-by:
Ingo Molnar <mingo@kernel.org>
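
Both entries above drop the same kind of guard. As a minimal userspace sketch (with illustrative names, not the kernel's actual tables): once the minimum binutils version guarantees an assembler capability, kbuild defines the corresponding CONFIG_AS_* symbol unconditionally, and conditional registration like the following can simply lose its ifdef.

    /* Sketch: registration guarded by a kbuild-detected assembler
     * capability (names illustrative). With binutils >= 2.23 the
     * symbol is always defined, so the guard becomes dead weight. */
    #include <stdio.h>

    struct algo { const char *name; };

    static const struct algo ssse3_algo = { "ssse3" };
    static const struct algo int_algo   = { "int" };

    static const struct algo *algos[] = {
    #ifdef CONFIG_AS_SSSE3   /* before: only if the assembler supported it */
        &ssse3_algo,
    #endif
        &int_algo,
        NULL,
    };

    int main(void)
    {
        for (const struct algo **a = algos; *a; a++)
            printf("registered: %s\n", (*a)->name);
        return 0;
    }

Compiled with -DCONFIG_AS_SSSE3 the table gains the SIMD entry; the point of these commits is that the define is now always true, so the conditional adds nothing.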
-
- Jan 13, 2020
-
-
Zhengyuan Liu authored
There are several algorithms available for raid6 to generate xor and syndrome parity, including the basic int1, int2 ... int32 and SIMD-optimized implementations like sse and neon. To test and choose the best algorithm at the initial stage, we need to provide enough disk data to feed the algorithms. However, the number of disks we provide depends on the page size and the gfmul table, as seen below: const int disks = (65536/PAGE_SIZE) + 2; So with a 64K PAGE_SIZE there is only one data disk plus two parity disks, and as a result the chosen algorithm is not reliable. For example, on my arm64 machine with 64K pages enabled, it will choose intx32 as the best one, although the NEON implementation is better. This patch fixes the problem by defining a constant raid6 disk count to support arbitrary page sizes. Suggested-by:
H. Peter Anvin <hpa@zytor.com> Signed-off-by:
Zhengyuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by:
Song Liu <songliubraving@fb.com>
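
The arithmetic behind the problem, as a small sketch; the fixed count of 8 here is an assumption for illustration, and the kernel's actual constant name and value may differ.

    /* Before: the benchmark's disk count was derived from PAGE_SIZE,
     * because the 64 KiB gfmul table was spread across the data disks. */
    #include <stdio.h>

    int main(void)
    {
        int disks_4k  = 65536 / 4096  + 2;  /* 18 disks: 16 data + 2 parity */
        int disks_64k = 65536 / 65536 + 2;  /*  3 disks: only 1 data disk   */

        /* After (sketch): a constant count, independent of page size.
         * Assumed value; the kernel's definition may differ. */
        enum { RAID6_TEST_DISKS = 8 };

        printf("4K pages: %d, 64K pages: %d, fixed: %d\n",
               disks_4k, disks_64k, RAID6_TEST_DISKS);
        return 0;
    }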
-
- May 24, 2019
-
-
Thomas Gleixner authored
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation inc 53 temple place ste 330 boston ma 02111 1307 usa either version 2 of the license or at your option any later version incorporated herein by reference extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 13 file(s). Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Reviewed-by:
Allison Randal <allison@lohutok.net> Reviewed-by:
Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520170858.645641371@linutronix.de Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Dec 20, 2018
-
-
Daniel Verkamp authored
This is helpful for systems where fast startup time is important. It is especially nice to avoid benchmarking RAID functions that are never used (for example, BTRFS selects RAID6_PQ even if the parity RAID mode is not in use). This saves 250+ milliseconds of boot time on modern x86 and ARM systems with a dozen or more available implementations. The new option defaults to 'y' to match the previous behavior of always benchmarking on init. Signed-off-by:
Daniel Verkamp <dverkamp@chromium.org> Signed-off-by:
Shaohua Li <shli@fb.com>
-
Daniel Verkamp authored
Sort the list of RAID6 algorithms in roughly decreasing order of expected performance: newer instruction sets first (within each architecture) and wider unrollings first. This doesn't make any difference right now, since all functions are benchmarked; a follow-up change will make use of this by optionally choosing the first valid function rather than testing all of them. The Itanium raid6_intx{16,32} entries are also moved down to be near the other raid6_intx entries for clarity. Signed-off-by:
Daniel Verkamp <dverkamp@chromium.org> Signed-off-by:
Shaohua Li <shli@fb.com>
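
What the ordering enables, as a hedged sketch with hypothetical table entries and a simplified raid6_calls shape: with the table sorted fastest-expected-first, init can either benchmark every valid implementation or simply take the first one whose validity check passes.

    /* Sketch, not the kernel's exact code: selecting from an ordered
     * algorithm table. bench() is a hypothetical stand-in for the
     * boot-time benchmark. */
    #include <stdbool.h>
    #include <stdio.h>

    struct raid6_calls {
        const char *name;
        bool (*valid)(void);   /* does this CPU support it?            */
        long (*bench)(void);   /* hypothetical: measured speed in MB/s */
    };

    static bool yes(void)  { return true;  }
    static bool no(void)   { return false; }
    static long fast(void) { return 19438; }
    static long slow(void) { return 5392;  }

    /* fastest expected first, per this commit's ordering */
    static const struct raid6_calls avx2  = { "avx2",    no,  fast };
    static const struct raid6_calls sse2  = { "sse2",    yes, fast };
    static const struct raid6_calls int64 = { "int64x4", yes, slow };
    static const struct raid6_calls *algos[] = { &avx2, &sse2, &int64, NULL };

    static const struct raid6_calls *pick(bool benchmark_all)
    {
        const struct raid6_calls *best = NULL;
        long best_perf = 0;

        for (const struct raid6_calls **a = algos; *a; a++) {
            if (!(*a)->valid())
                continue;
            if (!benchmark_all)
                return *a;          /* ordered table: first valid wins */
            long perf = (*a)->bench();
            if (perf > best_perf) {
                best_perf = perf;
                best = *a;
            }
        }
        return best;
    }

    int main(void)
    {
        printf("benchmarked choice: %s\n", pick(true)->name);
        printf("first-valid choice: %s\n", pick(false)->name);
        return 0;
    }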
-
- Mar 26, 2018
-
-
Arnd Bergmann authored
The Tile architecture is getting removed, so we no longer need this either. Acked-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Arnd Bergmann <arnd@arndb.de>
-
- Mar 20, 2018
-
-
Matt Brown authored
This patch uses the vpermxor instruction to optimise the raid6 Q syndrome. This instruction was made available with POWER8, ISA version 2.07. It allows for both vperm and vxor instructions to be done in a single instruction. This has been tested for correctness on a ppc64le vm with a basic RAID6 setup containing 5 drives. The performance benchmarks are from the raid6test in the /lib/raid6/test directory. These results are from an IBM Firestone machine with ppc64le architecture. The benchmark results show a 35% speed increase over the best existing algorithm for powerpc (altivec). The raid6test has also been run on a big-endian ppc64 vm to ensure it also works for big-endian architectures. Performance benchmarks:

    raid6: altivecx4 gen()  18773 MB/s
    raid6: altivecx8 gen()  19438 MB/s
    raid6: vpermxor4 gen()  25112 MB/s
    raid6: vpermxor8 gen()  26279 MB/s

Signed-off-by:
Matt Brown <matthew.brown.dev@gmail.com> Reviewed-by:
Daniel Axtens <dja@axtens.net> [mpe: Add VPERMXOR macro so we can build with old binutils] Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au>
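
The speedup comes from fusing a table permute with an XOR. The trick underneath, shared by vperm-based code on Power and pshufb-based code on x86, is multiplying every byte by a GF(2^8) constant via two 16-entry nibble lookup tables; here is a byte-at-a-time C sketch of that idea (the real code keeps the tables in vector registers and handles 16 bytes per instruction).

    /* Nibble-table GF(2^8) multiply: since GF multiplication is linear
     * over XOR, c*b = lo[b & 0xf] ^ hi[b >> 4]. A SIMD permute does the
     * lookups 16 bytes at a time; vpermxor adds the XOR for free. */
    #include <stdint.h>
    #include <stdio.h>

    /* multiply in GF(2^8) with the RAID-6 polynomial 0x11d */
    static uint8_t gfmul(uint8_t a, uint8_t b)
    {
        uint8_t p = 0;
        while (b) {
            if (b & 1)
                p ^= a;
            a = (uint8_t)(a << 1) ^ ((a & 0x80) ? 0x1d : 0);
            b >>= 1;
        }
        return p;
    }

    int main(void)
    {
        uint8_t c = 2;                /* the Q-syndrome generator constant */
        uint8_t lo[16], hi[16];

        for (int i = 0; i < 16; i++) {
            lo[i] = gfmul((uint8_t)i, c);        /* low-nibble product  */
            hi[i] = gfmul((uint8_t)(i << 4), c); /* high-nibble product */
        }

        uint8_t b = 0xd7;
        uint8_t prod = lo[b & 0xf] ^ hi[b >> 4]; /* permute + xor */
        printf("0x%02x * %u = 0x%02x (direct: 0x%02x)\n",
               b, c, prod, gfmul(b, c));
        return 0;
    }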
-
- Aug 09, 2017
-
-
Ard Biesheuvel authored
Provide a NEON accelerated implementation of the recovery algorithm, which supersedes the default byte-by-byte one. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com>
-
- Sep 21, 2016
-
-
Gayatri Kammela authored
Optimize RAID6 recovery functions to take advantage of the 512-bit ZMM integer instructions introduced in AVX512. The AVX512-optimized recovery functions are simply based on recov_avx2.c written by Jim Kukunas. This patch was tested and benchmarked before submission on hardware that has the AVX512 flags to support such instructions. Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by:
Megha Dey <megha.dey@linux.intel.com> Signed-off-by:
Gayatri Kammela <gayatri.kammela@intel.com> Reviewed-by:
Fenghua Yu <fenghua.yu@intel.com> Signed-off-by:
Shaohua Li <shli@fb.com>
-
Gayatri Kammela authored
Optimize RAID6 gen_syndrome functions to take advantage of the 512-bit ZMM integer instructions introduced in AVX512. The AVX512-optimized gen_syndrome functions are simply based on avx2.c written by Yuanhan Liu and sse2.c written by hpa. The patch was tested and benchmarked before submission on hardware that has the AVX512 flags to support such instructions. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by:
Megha Dey <megha.dey@linux.intel.com> Signed-off-by:
Gayatri Kammela <gayatri.kammela@intel.com> Reviewed-by:
Fenghua Yu <fenghua.yu@intel.com> Signed-off-by:
Shaohua Li <shli@fb.com>
-
- Sep 01, 2016
-
-
Martin Schwidefsky authored
The XC instruction can be used to improve the speed of the raid6 recovery. The loops now operate on blocks of 256 bytes. Signed-off-by:
Martin Schwidefsky <schwidefsky@de.ibm.com>
-
- Aug 29, 2016
-
-
Martin Schwidefsky authored
Using vector registers is slightly faster:

    raid6: vx128x8 gen() 19705 MB/s
    raid6: vx128x8 xor() 11886 MB/s
    raid6: using algorithm vx128x8 gen() 19705 MB/s
    raid6: .... xor() 11886 MB/s, rmw enabled

vs the software algorithms:

    raid6: int64x1 gen()  3018 MB/s
    raid6: int64x1 xor()  1429 MB/s
    raid6: int64x2 gen()  4661 MB/s
    raid6: int64x2 xor()  3143 MB/s
    raid6: int64x4 gen()  5392 MB/s
    raid6: int64x4 xor()  3509 MB/s
    raid6: int64x8 gen()  4441 MB/s
    raid6: int64x8 xor()  3207 MB/s
    raid6: using algorithm int64x4 gen() 5392 MB/s
    raid6: .... xor() 3509 MB/s, rmw enabled

Signed-off-by:
Martin Schwidefsky <schwidefsky@de.ibm.com>
-
- Apr 21, 2015
-
-
Markus Stockhausen authored
v3: s-o-b comment, explanation of performance, and the decision for the start/stop implementation.

Implementing rmw functionality for RAID6 requires optimized syndrome calculation. Up to now we can only generate a complete syndrome; the target P/Q pages are always overwritten. With this patch we provide a framework for in-place P/Q modification. In the first step, simply fill those functions with NULL values.

xor_syndrome() has two additional parameters: start & stop. These indicate the first and last page that change during an rmw run. That makes it possible to avoid several unnecessary loops and speeds up calculation. The caller needs to implement the following logic to make the functions work:

1) xor_syndrome(disks, start, stop, ...): "Remove" all data of the source blocks inside P/Q between (and including) start and stop.

2) Modify any block with start <= block <= stop.

3) xor_syndrome(disks, start, stop, ...): "Reinsert" all data of the source blocks into P/Q between (and including) start and stop.

Pages between start and stop that won't be changed should be filled with a pointer to the kernel zero page. The reasons for not taking NULL pages are:

1) The algorithms cross the whole source data line by line, which avoids additional branches.

2) Having a NULL page would avoid calculating the XOR P parity but would still need calculation steps for the Q parity. Depending on the algorithm's unrolling, that might be only a difference of 2 instructions per loop.

The benchmark numbers of the gen_syndrome() functions are displayed in the kernel log. Do the same for the xor_syndrome() functions. This will help to analyze performance problems and give a rough estimate of how well the algorithm works. The choice of the fastest algorithm will still depend on the gen_syndrome() performance.

With the start/stop page implementation the speed can vary a lot in real life, e.g. a change of page 0 & page 15 on a stripe will be harder to compute than the case where page 0 & page 1 are the XOR candidates. To avoid being too enthusiastic about the expected speeds, we run a worst-case test that simulates a change on the upper half of the stripe. So we do:

1) calculation of P/Q for the upper pages

2) continuation of Q for the lower (empty) pages

Signed-off-by:
Markus Stockhausen <stockhausen@collogia.de> Signed-off-by:
NeilBrown <neilb@suse.de>
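
A compilable sketch of the caller logic described above, with stub bodies so it builds on its own; modify_blocks() is a hypothetical stand-in for the caller's update step.

    #include <stddef.h>
    #include <stdio.h>

    static void xor_syndrome(int disks, int start, int stop,
                             size_t bytes, void **ptrs)
    {
        /* real code: xor the data of blocks start..stop out of / into
         * P/Q; unchanged pages in the range must point at the zero page */
        (void)disks; (void)bytes; (void)ptrs;
        printf("xor_syndrome over blocks %d..%d\n", start, stop);
    }

    static void modify_blocks(int start, int stop, void **ptrs)
    {
        (void)start; (void)stop; (void)ptrs;  /* hypothetical update step */
    }

    static void rmw_update(int disks, int start, int stop,
                           size_t bytes, void **ptrs)
    {
        xor_syndrome(disks, start, stop, bytes, ptrs);  /* 1) "remove"   */
        modify_blocks(start, stop, ptrs);               /* 2) modify     */
        xor_syndrome(disks, start, stop, bytes, ptrs);  /* 3) "reinsert" */
    }

    int main(void)
    {
        void *ptrs[8] = { 0 };
        rmw_update(8, 2, 5, 4096, ptrs);
        return 0;
    }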
-
- Feb 03, 2015
-
-
Jan Beulich authored
Just like for AVX2 (which simply needs an #if -> #ifdef conversion), SSSE3 assembler support should be checked for before using it. Signed-off-by:
Jan Beulich <jbeulich@suse.com> Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Acked-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
NeilBrown <neilb@suse.de>
-
- Oct 14, 2014
-
-
Anton Blanchard authored
Signed-off-by:
Anton Blanchard <anton@samba.org> Signed-off-by:
NeilBrown <neilb@suse.de>
-
- Aug 27, 2013
-
-
Ken Steele authored
This change adds TILE-Gx SIMD instructions to the software raid (md), modeling the Altivec implementation. This is only for syndrome generation; there is more that could be done to improve recovery, as in the recent Intel SSSE3 recovery implementation. The code unrolls 8 times; this turns out to be the best on tilegx hardware among the set 1, 2, 4, 8 or 16. The code reads one cache line of data from each disk, stores P and Q, then goes to the next cache line. The test code in sys/linux/lib/raid6/test reports a 2008 MB/s data read rate for syndrome generation using 18 disks (16 data and 2 parity). It was 1512 MB/s before these SIMD optimizations. This is running on 1 core with all the data in cache. This is based on the paper The Mathematics of RAID-6 (http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf ). Signed-off-by:
Ken Steele <ken@tilera.com> Signed-off-by:
Chris Metcalf <cmetcalf@tilera.com> Signed-off-by:
NeilBrown <neilb@suse.de>
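
For reference, the math from the cited paper in byte-at-a-time C: P is the plain XOR of the data blocks, and Q is the GF(2^8) sum of g^i * D_i with generator g = 2 and polynomial 0x11d, evaluated here by Horner's rule. The SIMD implementations compute exactly this, a vector register at a time, unrolled.

    #include <stdint.h>
    #include <stdio.h>

    static uint8_t gf2_mul2(uint8_t v)  /* multiply by the generator g = 2 */
    {
        return (uint8_t)(v << 1) ^ ((v & 0x80) ? 0x1d : 0);
    }

    static void gen_syndrome(int data_disks, int bytes,
                             uint8_t **dptr, uint8_t *p, uint8_t *q)
    {
        for (int i = 0; i < bytes; i++) {
            uint8_t wp = 0, wq = 0;
            /* Horner's rule: highest-numbered disk first */
            for (int d = data_disks - 1; d >= 0; d--) {
                wq = gf2_mul2(wq) ^ dptr[d][i];
                wp ^= dptr[d][i];
            }
            p[i] = wp;
            q[i] = wq;
        }
    }

    int main(void)
    {
        uint8_t d0[4] = {1, 2, 3, 4}, d1[4] = {5, 6, 7, 8};
        uint8_t *dptr[2] = {d0, d1};
        uint8_t p[4], q[4];

        gen_syndrome(2, 4, dptr, p, q);
        printf("P[0]=0x%02x Q[0]=0x%02x\n", p[0], q[0]); /* d0^d1, d0^2*d1 */
        return 0;
    }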
-
- Jul 08, 2013
-
-
Ard Biesheuvel authored
Rebased/reworked a patch contributed by Rob Herring that uses NEON intrinsics to perform the RAID-6 syndrome calculations. It uses the existing unroll.awk code to generate several unrolled versions of which the best performing one is selected at boot time. Signed-off-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by:
Nicolas Pitre <nico@linaro.org> Cc: hpa@linux.intel.com
-
- Dec 13, 2012
-
-
Yuanhan Liu authored
Add AVX2 optimized gen_syndrome functions, which are simply based on sse2.c written by hpa. Signed-off-by:
Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by:
H. Peter Anvin <hpa@zytor.com> Signed-off-by:
Jim Kukunas <james.t.kukunas@linux.intel.com> Signed-off-by:
NeilBrown <neilb@suse.de>
-
Jim Kukunas authored
Optimize RAID6 recovery functions to take advantage of the 256-bit YMM integer instructions introduced in AVX2. The patch was tested and benchmarked before submission. However, the hardware has not yet been released, so benchmark numbers cannot be reported. Acked-by:
"H. Peter Anvin" <hpa@zytor.com> Signed-off-by:
Jim Kukunas <james.t.kukunas@linux.intel.com> Signed-off-by:
NeilBrown <neilb@suse.de>
-
- May 22, 2012
-
-
Jim Kukunas authored
Reorders functions in raid6_algos as well as the preference check to reduce the number of functions tested on initialization. Also, creates symmetry between choosing the gen_syndrome functions and choosing the recovery functions. Signed-off-by:
Jim Kukunas <james.t.kukunas@linux.intel.com> Signed-off-by:
NeilBrown <neilb@suse.de>
-
Jim Kukunas authored
Add SSSE3 optimized recovery functions, as well as a system for selecting the most appropriate recovery functions to use. Originally-by:
H. Peter Anvin <hpa@zytor.com> Signed-off-by:
Jim Kukunas <james.t.kukunas@linux.intel.com> Signed-off-by:
NeilBrown <neilb@suse.de>
-
Jim Kukunas authored
<linux/module.h> drags in headers which are not visible to userspace, thus breaking the build for the test program. Signed-off-by:
Jim Kukunas <james.t.kukunas@linux.intel.com> Signed-off-by:
NeilBrown <neilb@suse.de>
-
- Oct 31, 2011
-
-
Paul Gortmaker authored
A pending cleanup will mean that module.h won't be implicitly included everywhere anymore. Make sure the modular drivers in the md dir are actually calling out for <module.h> explicitly in advance. Signed-off-by:
Paul Gortmaker <paul.gortmaker@windriver.com>
-
- Aug 11, 2010
-
-
NeilBrown authored
Rename raid6/raid6x86.h to raid6/x86.h and modify some comments. Signed-off-by:
NeilBrown <neilb@suse.de>
-
NeilBrown authored
Some bit-rot needs to be cleaned out. Signed-off-by:
NeilBrown <neilb@suse.de>
-
- Aug 10, 2010
-
-
David Woodhouse authored
Linus asks 'why "raid6" twice?'. No reason. Signed-off-by:
David Woodhouse <David.Woodhouse@intel.com>
-
- Oct 29, 2009
-
-
David Woodhouse authored
We'll want to use these in btrfs too. Signed-off-by:
David Woodhouse <David.Woodhouse@intel.com>
-
- Mar 31, 2009
-
-
Dan Williams authored
Move the raid6 data processing routines into a standalone module (raid6_pq) to prepare them to be called from async_tx wrappers and other non-md drivers/modules. This precludes a circular dependency of raid456 needing the async modules for data processing while those modules in turn depend on raid456 for the base level synchronous raid6 routines. To support this move:

1/ The exportable definitions in raid6.h move to include/linux/raid/pq.h

2/ The raid6_call, recovery calls, and table symbols are exported

3/ Extra #ifdef __KERNEL__ statements to enable the userspace raid6test to compile

Signed-off-by:
Dan Williams <dan.j.williams@intel.com> Signed-off-by:
NeilBrown <neilb@suse.de>
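
Item 3/ points at the idiom that keeps one source file building both in the kernel and in userspace for raid6test; a minimal sketch of that idiom follows, with the particular macro chosen here purely for illustration.

    /* Sketch of the #ifdef __KERNEL__ dual-build idiom; which macros
     * lib/raid6 actually wraps this way may differ. */
    #ifdef __KERNEL__
    #include <linux/kernel.h>
    #define PRINT(fmt, ...) printk(KERN_INFO fmt, ##__VA_ARGS__)
    #else
    #include <stdio.h>
    #define PRINT(fmt, ...) printf(fmt, ##__VA_ARGS__)
    #endif

    void raid6_report(const char *name, long mbps)
    {
        PRINT("raid6: %s gen() %ld MB/s\n", name, mbps);
    }

    #ifndef __KERNEL__
    int main(void) { raid6_report("int64x4", 5392); return 0; }
    #endif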
-
Atsushi SAKAI authored
Hello, I found a typo, Bosto"m", in the FSF address, and I am checking around the Linux source code. Here is the only place which uses Bosto"m" (not Boston). Signed-off-by:
Atsushi SAKAI <sakaia@jp.fujitsu.com> Signed-off-by:
NeilBrown <neilb@suse.de>
-
- Apr 28, 2008
-
-
Julia Lawall authored
The functions time_before, time_before_eq, time_after, and time_after_eq are more robust for comparing jiffies against other values. A simplified version of the semantic patch making this change is as follows (http://www.emn.fr/x-info/coccinelle/ ):

    // <smpl>
    @ change_compare_np @
    expression E;
    @@
    (
    - jiffies <= E
    + time_before_eq(jiffies,E)
    |
    - jiffies >= E
    + time_after_eq(jiffies,E)
    |
    - jiffies < E
    + time_before(jiffies,E)
    |
    - jiffies > E
    + time_after(jiffies,E)
    )

    @ include depends on change_compare_np @
    @@
    #include <linux/jiffies.h>

    @ no_include depends on !include && change_compare_np @
    @@
      #include <linux/...>
    + #include <linux/jiffies.h>
    // </smpl>

[akpm@linux-foundation.org: coding-style fixes] Signed-off-by:
Julia Lawall <julia@diku.dk> Cc: Neil Brown <neilb@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
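
The reason these macros are more robust: jiffies is an unsigned counter that wraps, so a direct comparison inverts its answer across the wrap, while comparing the signed difference stays correct. A userspace sketch of the same idiom (the kernel's real macros add type checking on top):

    #include <stdio.h>

    typedef unsigned long jiffies_t;

    #define time_after(a, b)  ((long)((b) - (a)) < 0)
    #define time_before(a, b) time_after(b, a)

    int main(void)
    {
        jiffies_t timeout = (jiffies_t)-5;  /* set just before the wrap */
        jiffies_t now     = 10;             /* counter has wrapped      */

        printf("direct now > timeout: %d (wrong)\n", now > timeout);
        printf("time_after(now, timeout): %d (right)\n",
               time_after(now, timeout));
        return 0;
    }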
-
- Oct 29, 2007
-
-
Al Viro authored
Don't undef __i386__/__x86_64__ in uml anymore; make sure that the (few) places that required adjusting the ifdefs got those adjustments. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jun 23, 2006
-
-
Adrian Bunk authored
This patch fixes a NULL dereference spotted by the Coverity checker. Signed-off-by:
Adrian Bunk <bunk@stusta.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Sep 17, 2005
-
-
H Peter Anvin authored
This patch fixes a signedness bug with RAID6 for Altivec, and makes the Altivec code testable in userspace. Signed-off-by:
H. Peter Anvin <hpa@zytor.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Apr 16, 2005
-
-
Linus Torvalds authored
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
-