LoongArch: LSX and LASX SIMD implementation.
Add LoongArch SIMD support, with optimizations for both LSX (128-bit) and LASX (256-bit).
Benchmark results follow; "before" is upstream/master 47d3fbe3.
LSX build: ./autogen.sh --disable-lasx && make -j4
LASX build: ./autogen.sh && make -j4
For example, the largest improvement is in add_n_8:
./test/lowlevel-blt-bench add_n_8
before: add_n_8 = L1: 186.07 L2: 200.18 M:198.43 ( 1.41%) HT:161.37 VT:156.22 R:156.65 RT:103.67 ( 654Kops/s)
LSX: add_n_8 = L1:13782.81 L2:21067.23 M:14209.75 ( 98.95%) HT:1712.74 VT:3345.05 R:1661.89 RT:469.35 (2054Kops/s)
LASX: add_n_8 = L1:13034.63 L2:19725.46 M:16530.90 (117.71%) HT:1104.39 VT:2264.26 R:1077.33 RT:442.79 (2020Kops/s)
./test/lowlevel-blt-bench all, 10 iterations:
2.5 GHz LoongArch 3A5000, Linux, 64-bit, MEAN:
Metric | LSX | LASX |
---|---|---|
L1 | +336.97% | +488.91% |
L2 | +340.57% | +484.78% |
M | +307.29% | +420.48% |
HT | +214.05% | +225.17% |
VT | +201.28% | +208.94% |
R | +202.48% | +213.19% |
RT | +146.14% | +140.95% |