  1. Jan 30, 2013
  2. Jan 29, 2013
    • Søren Sandmann Pedersen
      Change default GPGKEY to 3892336E, which is soren.sandmann@gmail.com · afde8629
      Søren Sandmann Pedersen authored
      The old one belongs to the email address sandmann@daimi.au.dk, which
      doesn't work anymore.
      
      Also use gpg to get the name and address for the "(Signed by ...)"
      line since that works more reliably for me than using git.
    • Ben Avison
      Improve L1 and L2 benchmark tests for caches that don't use allocate-on-write · 69a7a9b6
      Ben Avison authored
      In particular this affects single-core ARMs (e.g. ARM11, Cortex-A8), which
      are usually configured this way. For other CPUs, this should only add a
      constant time, which will be cancelled out by the EXCLUDE_OVERHEAD runs.
      
      The problems were caused by cachelines becoming permanently evicted from
      the cache, because the code that was intended to pull them back in again on
      each iteration assumed too long a cache line (for the L1 test) or failed to
      read memory beyond the first pixel row (for the L2 test). Also, the reloading
      of the source buffer was unnecessary.
      
      These issues were identified by Siarhei in this post:
      http://lists.freedesktop.org/archives/pixman/2013-January/002543.html
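To make the two fixes concrete, here is a minimal sketch of a cache "reload" pass over a 2D buffer: it reads one byte per cache line and visits every row. The function `touch_buffer`, its parameters, and the assumption that each row starts on a cache-line boundary are illustrative only; this is not the lowlevel-blt-bench code. If `line_size` is larger than the real cache line, some lines are never touched (the L1 problem); if only the first row were walked, the rest of the buffer would stay evicted (the L2 problem).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: read one byte from every cache line of a
 * width_bytes x height buffer so that it is pulled back into the cache.
 * Assumes every row starts on a cache-line boundary.
 * line_size must not exceed the real cache line size, and every row
 * (not just row 0) has to be visited.
 */
static uint32_t
touch_buffer (const uint8_t *buf, size_t stride, size_t width_bytes,
              size_t height, size_t line_size)
{
    uint32_t sum = 0;   /* accumulate so the reads are not optimized away */

    assert (buf != NULL && line_size > 0);
    for (size_t y = 0; y < height; y++)
    {
        const uint8_t *row = buf + y * stride;

        for (size_t x = 0; x < width_bytes; x += line_size)
            sum += row[x];
    }
    return sum;
}
```

Returning the sum keeps the compiler from discarding loads whose values are otherwise unused.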
    • Søren Sandmann Pedersen
      pixman-combine-float.c: Use IS_ZERO() in clip_color() and set_sat() · 1fa67f49
      Søren Sandmann Pedersen authored
      The clip_color() function has some checks to avoid division by zero,
      but they are done by comparing the value to 4 * FLT_EPSILON, where a
      better choice is the IS_ZERO() macro that compares to +/- FLT_MIN.
      
      In set_sat(), the check is that *max > *min before dividing by
      *max - *min, but that has the potential problem that interactions
      between GCC optimizations and 80-bit x87 registers could mean that
      (*max > *min) is true in 80 bits while (*max - *min) is 0 in 32 bits,
      so the division by zero is not prevented. Using IS_ZERO() here as
      well prevents this.
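A minimal sketch of the guarded division, assuming an IS_ZERO() definition that tests against +/- FLT_MIN as the commit message describes (the macro below is written for this example and may differ in detail from pixman's). The hypothetical `rescale()` helper tests the same difference it divides by, instead of comparing max and min separately, so the check and the division work on one value.

```c
#include <assert.h>
#include <float.h>

/* Treat anything strictly between -FLT_MIN and +FLT_MIN as zero. */
#define IS_ZERO(f) (-FLT_MIN < (f) && (f) < FLT_MIN)

/*
 * Hypothetical helper modelled on the set_sat() situation: map v from
 * [min, max] to [0, sat].  The divisor itself is tested, rather than a
 * separate (max > min) comparison that could be evaluated at a
 * different precision than the subtraction.
 */
static float
rescale (float v, float min, float max, float sat)
{
    float range = max - min;

    assert (sat >= 0.0f);
    if (IS_ZERO (range))
        return 0.0f;
    return (v - min) * sat / range;
}
```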
    • Ben Avison
      ARMv6: Replacement add_8_8, over_8888_8888, over_8888_n_8888 and over_n_8_8888 routines · 7e53e586
      Ben Avison authored and Siarhei Siamashka committed
      Improved by adding preloads, combining writes and using the SEL
      instruction.
      
      add_8_8
      
          Before          After
          Mean   StdDev   Mean   StdDev  Confidence  Change
      L1  62.1   0.2      543.4  12.4    100.0%      +774.9%
      L2  38.7   0.4      116.8  1.7     100.0%      +201.8%
      M   40.0   0.1      110.1  0.5     100.0%      +175.3%
      HT  30.9   0.2      43.4   0.5     100.0%      +40.4%
      VT  30.6   0.3      39.2   0.5     100.0%      +28.0%
      R   21.3   0.2      35.4   0.4     100.0%      +66.6%
      RT  8.6    0.2      10.2   0.3     100.0%      +19.4%
      
      over_8888_8888
      
          Before          After
          Mean   StdDev   Mean   StdDev  Confidence  Change
      L1  32.3   0.1      38.0   0.2     100.0%      +17.7%
      L2  15.9   0.4      30.6   0.5     100.0%      +92.8%
      M   13.3   0.0      25.6   0.0     100.0%      +92.9%
      HT  10.5   0.1      15.5   0.1     100.0%      +47.1%
      VT  10.4   0.1      14.6   0.1     100.0%      +40.8%
      R   10.3   0.1      15.8   0.1     100.0%      +53.3%
      RT  6.0    0.1      7.6    0.1     100.0%      +25.9%
      
      over_8888_n_8888
      
          Before          After
          Mean   StdDev   Mean   StdDev  Confidence  Change
      L1  17.6   0.1      21.0   0.1     100.0%      +19.2%
      L2  11.2   0.2      19.2   0.1     100.0%      +71.2%
      M   10.2   0.0      19.6   0.0     100.0%      +92.6%
      HT  8.4    0.0      11.9   0.1     100.0%      +41.7%
      VT  8.3    0.0      11.3   0.1     100.0%      +36.4%
      R   8.3    0.0      11.8   0.1     100.0%      +43.1%
      RT  5.1    0.1      6.2    0.1     100.0%      +21.3%
      
      over_n_8_8888
      
          Before          After
          Mean   StdDev   Mean   StdDev  Confidence  Change
      L1  17.5   0.1      22.8   0.8     100.0%      +30.1%
      L2  14.2   0.3      21.7   0.2     100.0%      +52.6%
      M   12.0   0.0      22.3   0.0     100.0%      +84.8%
      HT  10.5   0.1      14.1   0.1     100.0%      +34.5%
      VT  10.0   0.1      13.5   0.1     100.0%      +35.3%
      R   9.4    0.0      12.9   0.2     100.0%      +37.7%
      RT  5.5    0.1      6.5    0.2     100.0%      +19.2%
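The Change column in these tables appears to be the relative difference of the two means, (after - before) / before * 100; the commit does not state the formula, so treat this as an assumption. A small helper for spot-checking a row follows. Recomputing from the rounded means in the table can differ slightly from the published figure (add_8_8 L1 gives about +775.0% against the listed +774.9%), presumably because the published values were computed from unrounded data.

```c
#include <assert.h>

/* Relative change between two benchmark means, in percent. */
static double
percent_change (double before, double after)
{
    assert (before != 0.0);
    return (after - before) / before * 100.0;
}
```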
    • Ben Avison
      ARMv6: New conversion routines · f87dfd6f
      Ben Avison authored and Siarhei Siamashka committed
      There was no previous attempt at accelerating these specifically for
      ARMv6.
      
      src_x888_8888
      
          Before          After
          Mean   StdDev   Mean   StdDev  Confidence  Change
      L1  96.7   0.5      270.4  2.6     100.0%      +179.5%
      L2  44.6   2.7      110.6  9.7     100.0%      +148.0%
      M   26.9   0.1      87.6   0.5     100.0%      +226.1%
      HT  19.3   0.2      37.5   0.4     100.0%      +93.7%
      VT  18.6   0.1      33.7   0.4     100.0%      +81.6%
      R   18.4   0.1      32.2   0.3     100.0%      +75.2%
      RT  9.2    0.2      12.1   0.3     100.0%      +31.4%
      
      src_0565_8888
      
          Before          After
          Mean   StdDev   Mean   StdDev  Confidence  Change
      L1  37.0   0.3      66.9   0.2     100.0%      +80.8%
      L2  30.3   0.2      55.9   0.3     100.0%      +84.4%
      M   25.9   0.0      62.3   0.2     100.0%      +140.3%
      HT  15.2   0.1      33.1   0.3     100.0%      +116.9%
      VT  15.1   0.1      30.7   0.3     100.0%      +103.6%
      R   14.2   0.1      27.6   0.3     100.0%      +94.0%
      RT  6.0    0.1      11.2   0.3     100.0%      +87.2%
    • Ben Avison
      ARMv6: New blit routines · a0f59f3b
      Ben Avison authored and Siarhei Siamashka committed
      These are usable either as various composite operations, or via the
      top-level function pixman_blt() which now does some blitting for the
      first time on an ARMv6 platform (previously it just returned FALSE).
      
      src_8888_8888
      
          Before          After
          Mean   StdDev   Mean   StdDev  Confidence  Change
      L1  414.5  9.4      445.8  3.6     100.0%      +7.6%
      L2  93.3   20.7     114.5  12.9    100.0%      +22.7%
      M   57.0   0.2      89.2   0.5     100.0%      +56.4%
      HT  28.7   0.3      39.6   0.4     100.0%      +37.9%
      VT  25.5   0.2      35.3   0.4     100.0%      +38.4%
      R   20.1   0.1      33.8   0.3     100.0%      +67.8%
      RT  7.8    0.2      12.7   0.4     100.0%      +62.7%
      
      src_0565_0565
      
          Before          After
          Mean   StdDev   Mean   StdDev  Confidence  Change
      L1  397.4  6.1      412.5  5.2     100.0%      +3.8%
      L2  143.2  10.9     141.9  6.5     68.9%       -0.9%  (insignificant)
      M   90.7   0.4      133.5  0.7     100.0%      +47.1%
      HT  38.6   0.3      53.7   0.7     100.0%      +39.0%
      VT  33.0   0.3      47.3   0.6     100.0%      +43.3%
      R   25.7   0.2      42.1   0.5     100.0%      +64.1%
      RT  8.0    0.2      13.3   0.3     100.0%      +65.6%
      
      src_8_8
      
          Before          After
          Mean   StdDev   Mean   StdDev  Confidence  Change
      L1  716.5  9.8      768.2  20.4    100.0%      +7.2%
      L2  246.2  12.7     260.5  8.8     100.0%      +5.8%
      M   146.8  0.7      227.9  0.7     100.0%      +55.2%
      HT  44.9   0.6      62.1   1.0     100.0%      +38.2%
      VT  35.6   0.4      53.4   0.7     100.0%      +50.0%
      R   29.7   0.3      48.2   0.6     100.0%      +62.2%
      RT  8.6    0.2      12.9   0.4     100.0%      +49.3%
    • Ben Avison
      ARMv6: New fill routines · 3cff56c5
      Ben Avison authored and Siarhei Siamashka committed
      Note that this also effectively accelerates the src_n_8888, src_n_0565
      and src_n_8 composite types, because of the fast paths in
      pixman-fast-path.c implemented by fast_composite_solid_fill(), which
      end up dispatching to these platform-specific fill routines.
      
      src_n_8888
      
          Before          After
          Mean   StdDev   Mean   StdDev  Confidence  Change
      L1  157.3  1.1      574.2  8.7     100.0%      +265.0%
      L2  94.2   0.5      364.8  4.2     100.0%      +287.3%
      M   92.7   0.4      358.7  1.1     100.0%      +287.1%
      HT  68.5   0.9      133.6  4.0     100.0%      +95.2%
      VT  61.3   0.8      111.8  2.6     100.0%      +82.4%
      R   61.1   0.9      108.7  2.8     100.0%      +78.1%
      RT  24.6   1.0      28.6   1.6     100.0%      +16.0%
      
      src_n_0565
      
          Before          After
          Mean   StdDev   Mean   StdDev  Confidence  Change
      L1  157.4  1.0      983.1  38.5    100.0%      +524.6%
      L2  93.6   0.5      696.0  14.3    100.0%      +643.4%
      M   92.7   0.4      680.5  1.0     100.0%      +634.0%
      HT  68.3   0.9      160.3  6.6     100.0%      +134.6%
      VT  61.1   0.8      130.1  3.4     100.0%      +112.9%
      R   61.0   0.8      125.4  4.1     100.0%      +105.7%
      RT  24.9   1.3      29.5   1.5     100.0%      +18.2%
      
      src_n_8
      
          Before          After
          Mean   StdDev   Mean   StdDev  Confidence  Change
      L1  154.7  1.0      1324.4 48.5    100.0%      +756.3%
      L2  92.4   0.4      1178.4 10.9    100.0%      +1175.6%
      M   92.9   0.4      1275.7 2.1     100.0%      +1273.5%
      HT  68.2   1.0      169.8  5.5     100.0%      +149.0%
      VT  61.2   1.0      138.5  3.6     100.0%      +126.3%
      R   61.3   0.9      130.1  3.8     100.0%      +112.4%
      RT  25.5   1.3      29.2   1.9     100.0%      +14.6%
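As a rough illustration of combining writes in a fill routine, here is a plain C sketch of a 16 bpp fill that stores two pixels per 32-bit write. `fill_row_16` is a hypothetical name, and this is not the ARMv6 assembly from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical 16 bpp solid fill.  The color is replicated into both
 * halves of a 32-bit word, so byte order does not matter, and after
 * reaching 4-byte alignment each store writes two pixels.
 */
static void
fill_row_16 (uint16_t *dst, size_t width, uint16_t color)
{
    uint32_t pair = ((uint32_t) color << 16) | color;

    assert (dst != NULL || width == 0);

    /* Write one pixel first if dst is not 4-byte aligned. */
    if (width > 0 && ((uintptr_t) dst & 3) != 0)
    {
        *dst++ = color;
        width--;
    }
    /* Two pixels per 32-bit store; memcpy avoids strict-aliasing issues. */
    for (; width >= 2; width -= 2, dst += 2)
        memcpy (dst, &pair, sizeof pair);

    /* Possible trailing pixel. */
    if (width > 0)
        *dst = color;
}
```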
    • Ben Avison
      ARMv6: Lay the groundwork for later patches in the series · 2e173326
      Ben Avison authored and Siarhei Siamashka committed
      Move the entire contents of pixman-arm-simd-asm.S to a new file;
      ultimately this will only retain the scaled operations, so it is
      named pixman-arm-simd-asm-scaled.S. Add a new header file,
      pixman-arm-simd-asm.h, containing the macros that are the basis of
      all the new ARMv6 implementations. At this point in the series,
      nothing uses them yet, and the library should be binary-identical.
  3. Jan 28, 2013
  4. Jan 27, 2013
    • Siarhei Siamashka
      Use pixman_transform_point_31_16() from pixman_transform_point() · ed399925
      Siarhei Siamashka authored
      The old functions pixman_transform_point() and pixman_transform_point_3d()
      are now just wrappers for pixman_transform_point_31_16() and
      pixman_transform_point_31_16_3d(). Eventually their uses should be
      completely eliminated from the pixman code and replaced with their
      extended-range counterparts. This is needed in order to correctly
      handle any matrices and parameters that may come to pixman from the
      code responsible for the XRender implementation.
    • Siarhei Siamashka
      test: Added matrix-test for testing projective transform accuracy · 5a78d74c
      Siarhei Siamashka authored
      This test uses the __float128 data type, when it is available, to
      implement a "perfect" reference implementation. The output of
      pixman_transform_point_31_16() and
      pixman_transform_point_31_16_affine() is compared with the
      reference implementation to make sure that rounding errors are
      confined to the least significant bit.
      
      Platforms and compilers that do not support the __float128
      data type can rely on a crc32 checksum of the pseudorandom
      transform results.
    • Siarhei Siamashka
      configure.ac: Added detection for __float128 support · 09600ae7
      Siarhei Siamashka authored
      GCC supports a 128-bit floating point data type on some platforms
      (including but not limited to x86 and x86-64). This may be useful for
      tests that need perfectly accurate reference implementations of certain
      algorithms.
    • Siarhei Siamashka
      Add higher precision "pixman_transform_point_*" functions · c3deb833
      Siarhei Siamashka authored
      The following new functions are added:
      
      pixman_transform_point_31_16_3d() -
          Calculates the product of a matrix and a vector.
      
      pixman_transform_point_31_16() -
          Calculates the product of a matrix and a vector, then converts
          the resulting homogeneous vector [x, y, z] to its Cartesian
          form [x', y', 1], where x' = x / z and y' = y / z.
      
      pixman_transform_point_31_16_affine() -
          A faster sibling of the other two functions, which assumes affine
          transformation, where the bottom row of the matrix is [0, 0, 1] and
          the last element of the input vector is set to 1.
      
      These functions transform a point with 31.16 fixed point coordinates from
      the destination space to a point with 48.16 fixed point coordinates in
      the source space.
      
      The results are accurate and the rounding errors may only show up in
      the least significant bit. No overflows are possible for the affine
      transformations as long as the input data is provided in 31.16 format.
      In the case of projective transformations, some output values may not be
      representable in 48.16 fixed point format. In that case the results are
      clamped to the maximum or minimum 48.16 value (so that the caller can at
      least handle NONE and PAD repeats correctly).
    • Siarhei Siamashka
      Faster fetch for the C variant of r5g6b5 src/dest iterator · a47ed2c3
      Siarhei Siamashka authored
      Processing two pixels at once is used to reduce the number of
      arithmetic operations.
      
      The speedup relative to the generic fetch_scanline_r5g6b5() from
      "pixman-access.c" (pixman was compiled with gcc 4.7.2):
      
          MIPS 74K        480MHz  :  20.32 MPix/s ->  26.47 MPix/s
          ARM11           700MHz  :  34.95 MPix/s ->  38.22 MPix/s
          ARM Cortex-A8  1000MHz  :  87.44 MPix/s -> 100.92 MPix/s
          ARM Cortex-A9  1700MHz  : 150.95 MPix/s -> 158.13 MPix/s
          ARM Cortex-A15 1700MHz  : 148.91 MPix/s -> 155.42 MPix/s
          IBM Cell PPU   3200MHz  :  75.29 MPix/s ->  98.33 MPix/s
          Intel Core i7  2800MHz  : 257.02 MPix/s -> 376.93 MPix/s
      
      That's the performance for C code (SIMD and assembly optimizations
      are disabled via PIXMAN_DISABLE environment variable).
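Here is a sketch of one way to process two pixels at once. It assumes the trick is to pack two r5g6b5 pixels into the 16-bit halves of a 32-bit word so that each shift-and-mask step converts both. The function names are hypothetical and the code is not taken from the patch. The 5- and 6-bit channels are widened to 8 bits by replicating their top bits into the low bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widen two 5-bit values (at bits 0 and 16) to 8 bits each. */
static uint32_t
expand5_x2 (uint32_t v)
{
    return (v << 3) | ((v >> 2) & 0x00070007);
}

/* Widen two 6-bit values (at bits 0 and 16) to 8 bits each. */
static uint32_t
expand6_x2 (uint32_t v)
{
    return (v << 2) | ((v >> 4) & 0x00030003);
}

/* Hypothetical r5g6b5 -> a8r8g8b8 fetch, two pixels per iteration. */
static void
fetch_r5g6b5_x2 (const uint16_t *src, uint32_t *dst, size_t width)
{
    assert (width == 0 || (src != NULL && dst != NULL));
    for (size_t i = 0; i < width; i += 2)
    {
        /* For an odd trailing pixel, pair it with itself. */
        uint32_t hi = (i + 1 < width) ? src[i + 1] : src[i];
        uint32_t p = src[i] | (hi << 16);
        uint32_t r = expand5_x2 ((p >> 11) & 0x001f001f);
        uint32_t g = expand6_x2 ((p >> 5) & 0x003f003f);
        uint32_t b = expand5_x2 (p & 0x001f001f);

        /* Low lanes hold pixel i, high lanes hold pixel i + 1. */
        dst[i] = 0xff000000 | ((r & 0xff) << 16) | ((g & 0xff) << 8) | (b & 0xff);
        if (i + 1 < width)
            dst[i + 1] = 0xff000000 | (r & 0x00ff0000)
                       | ((g >> 8) & 0xff00) | (b >> 16);
    }
}
```

Building `p` from two separate 16-bit reads keeps the sketch independent of byte order and alignment. A tuned version would more likely use a single 32-bit load.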
    • Siarhei Siamashka
      Faster write-back for the C variant of r5g6b5 dest iterator · e66fd5cc
      Siarhei Siamashka authored
      Unrolling loops improves performance, so just use it here.
      
      Also, GCC can't properly optimize this code for RISC processors: it does
      not keep the 0x1F001F constant in a register. Because this constant is
      too large to be represented as an immediate operand in instructions,
      GCC inserts some redundant arithmetic. This problem can be worked around
      by explicitly using a variable for the 0x1F001F constant and initializing
      it with a read from another, volatile variable. GCC is then forced to
      allocate a register for it, because it no longer sees it as a constant.
      
      The speedup relative to the generic store_scanline_r5g6b5() from
      "pixman-access.c" (pixman was compiled with gcc 4.7.2):
      
          MIPS 74K        480MHz  :  33.22 MPix/s ->  43.42 MPix/s
          ARM11           700MHz  :  50.16 MPix/s ->  78.23 MPix/s
          ARM Cortex-A8  1000MHz  : 117.75 MPix/s -> 196.34 MPix/s
          ARM Cortex-A9  1700MHz  : 177.04 MPix/s -> 320.32 MPix/s
          ARM Cortex-A15 1700MHz  : 231.44 MPix/s -> 261.64 MPix/s
          IBM Cell PPU   3200MHz  : 130.25 MPix/s -> 145.61 MPix/s
          Intel Core i7  2800MHz  : 502.21 MPix/s -> 721.73 MPix/s
      
      That's the performance for C code (SIMD and assembly optimizations
      are disabled via PIXMAN_DISABLE environment variable).
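A sketch of the write-back side that combines the two ideas above: a loop unrolled by two, and a 0x1F001F mask that is loaded from a volatile variable so the compiler keeps it in a register. The helper names are hypothetical. The packing step, in which one mask extracts red and blue together, is an illustration rather than the exact code of the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A volatile read hides the value from constant propagation. */
static volatile uint32_t mask_1f001f_source = 0x1F001F;

/* Pack one a8r8g8b8 pixel into r5g6b5 using the 0x1F001F mask. */
static inline uint16_t
to_0565 (uint32_t s, uint32_t mask_1f001f)
{
    uint32_t a = (s >> 3) & mask_1f001f;  /* blue at bits 0-4, red at 16-20 */
    uint32_t b = s & 0xFC00;              /* top 6 bits of green */

    a |= a >> 5;                          /* copy red down to bits 11-15 */
    a |= b >> 5;                          /* green lands at bits 5-10 */
    return (uint16_t) a;                  /* drop the leftover red at 16-20 */
}

/* Hypothetical write-back loop, unrolled by two. */
static void
store_r5g6b5 (uint16_t *dst, const uint32_t *src, size_t width)
{
    uint32_t mask = mask_1f001f_source;   /* one volatile read per call */
    size_t i = 0;

    assert (width == 0 || (src != NULL && dst != NULL));
    for (; i + 2 <= width; i += 2)
    {
        dst[i] = to_0565 (src[i], mask);
        dst[i + 1] = to_0565 (src[i + 1], mask);
    }
    if (i < width)
        dst[i] = to_0565 (src[i], mask);
}
```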
    • Siarhei Siamashka
      Added C variants of r5g6b5 fetch/write-back iterators · a9f66694
      Siarhei Siamashka authored
      Adding specialized iterators for r5g6b5 color format allows us to work
      on fine tuning performance of r5g6b5 fetch/write-back operations in the
      pixman general "fetch -> combine -> store" pipeline.
      
      These iterators also make the "src_x888_0565" fast path redundant, so it
      can be removed.
    • Chris Wilson
      Always return a valid function from lookup_combiner() · a59f081d
      Chris Wilson authored
      
      We should always have at least a C combiner available, so we never
      expect the search to fail. If it does, emit an error and return a
      dummy function.
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
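A minimal sketch of the "never return NULL" lookup pattern, with hypothetical types and names. It logs with fprintf() where the commits use pixman's internal _pixman_log_error(). The same shape applies to the lookup_composite() change that follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical combiner signature and table entry. */
typedef void (*combine_func_t) (unsigned *dest, const unsigned *src, int width);

typedef struct
{
    int            op;
    combine_func_t func;
} combiner_entry_t;

/* Does nothing; returned only when the lookup unexpectedly fails. */
static void
dummy_combine (unsigned *dest, const unsigned *src, int width)
{
    (void) dest;
    (void) src;
    (void) width;
}

/*
 * Search the table.  If nothing matches (which should not happen while
 * a C fallback exists), log an error and return a harmless function
 * instead of NULL, so callers never need a NULL check.
 */
static combine_func_t
lookup_combiner_sketch (const combiner_entry_t *table, size_t n, int op)
{
    assert (table != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (table[i].op == op && table[i].func != NULL)
            return table[i].func;
    }
    fprintf (stderr, "lookup_combiner_sketch: no combiner for op %d\n", op);
    return dummy_combine;
}
```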
    • Chris Wilson
      Always return a valid function from lookup_composite() · 52023091
      Chris Wilson authored
      
      We never expect to fail to find the appropriate function, as
      general_composite_rect should always match. So if we somehow fall through
      the search, emit a _pixman_log_error() and return a dummy function.
      
      Note that we remove some conditionals and a level of indentation, hence
      the large amount of code movement. This also reveals that in a few places
      we are duplicating stack variables that can be eliminated later.
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    • Chris Wilson
      sse2: Add fast paths for bilinear source with a solid mask · b283c864
      Chris Wilson authored
      
      Based on the existing sse2_8888_n_8888 nearest scaling routines.
      
      fishbowl on an i5-2500: 60.9s -> 56.9s
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    • Chris Wilson
      sse2: Add a fast path for add_n_8_8888 · d00ce409
      Chris Wilson authored
      
      This path is being exercised by compositing of trapezoids for clipmasks, for
      instance as used in the firefox-asteroids cairo-trace.
      
      IVB i7-3720qm ./tests/lowlevel-blt-bench add_n_8_8888:
      
      reference memcpy speed = 14846.7MB/s (3711.7MP/s for 32bpp fills)
      
      before: L1: 681.10  L2: 735.14  M:701.44 ( 28.35%)  HT:283.32  VT:213.23  R:208.93  RT: 77.89 ( 793Kops/s)
      
      after:  L1: 992.91  L2:1017.33  M:982.58 ( 39.88%)  HT:458.93  VT:332.32  R:326.13  RT:136.66 (1287Kops/s)
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    • Chris Wilson
      sse2: Add a fast path for add_n_8888 · 7ced3bee
      Chris Wilson authored
      
      This path is being exercised by inplace compositing of trapezoids, for
      instance as used in the firefox-asteroids cairo-trace.
      
      IVB i7-3720qm ./tests/lowlevel-blt-bench add_n_8888:
      
      reference memcpy speed = 14918.3MB/s (3729.6MP/s for 32bpp fills)
      
      before: L1:1752.44  L2:2259.48  M:2215.73 ( 58.80%)  HT:589.49   VT:404.04   R:424.69  RT:134.68 (1182Kops/s)
      
      after:  L1:3931.21  L2:6132.78  M:3440.17 ( 92.24%)  HT:1337.70  VT:1357.64  R:1270.27  RT:359.78 (2161Kops/s)
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
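For reference, ADD computes src + dst per channel, clamped at 0xff. Here is a plain C sketch of add_n_8888 under that definition. It uses a hypothetical SWAR helper that handles two channels per 32-bit operation by giving each channel a 16-bit lane, so a carry out of a channel lands in that lane's bit 8. An SSE2 version would typically use a saturating byte add such as `_mm_adds_epu8` on four pixels at once. This is not the code added by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-byte saturating add of two a8r8g8b8 values. */
static uint32_t
add_un8x4_sat (uint32_t x, uint32_t y)
{
    uint32_t rb = (x & 0x00ff00ff) + (y & 0x00ff00ff);
    uint32_t ag = ((x >> 8) & 0x00ff00ff) + ((y >> 8) & 0x00ff00ff);

    /* A lane whose sum exceeded 0xff has its bit 8 set: turn that
     * carry into 0xff in the lane's low byte. */
    rb |= 0x01000100 - ((rb >> 8) & 0x00010001);
    ag |= 0x01000100 - ((ag >> 8) & 0x00010001);
    return (rb & 0x00ff00ff) | ((ag & 0x00ff00ff) << 8);
}

/* Scalar sketch of ADD with a solid source: dst = min(dst + src, 0xff). */
static void
add_n_8888_sketch (uint32_t *dst, size_t width, uint32_t src)
{
    assert (width == 0 || dst != NULL);
    for (size_t i = 0; i < width; i++)
        dst[i] = add_un8x4_sat (dst[i], src);
}
```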
  5. Jan 25, 2013
    • Jeff Muizelaar
      Add a version of bilinear_interpolation for precision <=4 · b7f523e3
      Jeff Muizelaar authored
      Having 4 or fewer bits means we can do two components at
      a time in a single 32 bit register.
      
      Here are the results for firefox-fishtank on a Pandaboard with
      4.6.3 and PIXMAN_DISABLE="arm-neon"
      
      Before:
      [ # ]  backend                         test   min(s) median(s) stddev. count
      [  0]    image           t-firefox-fishtank    7.841    7.910   0.70%    6/6
      
      After:
      [ # ]  backend                         test   min(s) median(s) stddev. count
      [  0]    image           t-firefox-fishtank    6.951    6.995   1.11%    6/6
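Why four bits suffice, as a sketch. With 4-bit distances, the four bilinear weights sum to 16 * 16 = 256. An 8-bit channel times its weight is therefore at most 0xff * 0x100 = 0xff00, and the weighted sum of one channel fits in 16 bits. Two channels can share a 32-bit register, one per 16-bit lane. The function below is a hypothetical plain C illustration of that argument, not the code from the commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bilinear interpolation of four a8r8g8b8 pixels with 4-bit weights
 * (distx, disty in 0..15, in units of 1/16 pixel).
 */
static uint32_t
bilinear_4bit (uint32_t tl, uint32_t tr, uint32_t bl, uint32_t br,
               int distx, int disty)
{
    uint32_t wtl, wtr, wbl, wbr, rb, ag;

    assert (distx >= 0 && distx < 16 && disty >= 0 && disty < 16);
    wtl = (uint32_t) ((16 - distx) * (16 - disty));
    wtr = (uint32_t) (distx * (16 - disty));
    wbl = (uint32_t) ((16 - distx) * disty);
    wbr = (uint32_t) (distx * disty);          /* wtl+wtr+wbl+wbr == 256 */

    /* Red (bits 16-23) and blue (bits 0-7) in one register... */
    rb = (tl & 0x00ff00ff) * wtl + (tr & 0x00ff00ff) * wtr
       + (bl & 0x00ff00ff) * wbl + (br & 0x00ff00ff) * wbr;
    /* ...alpha and green in another. */
    ag = ((tl >> 8) & 0x00ff00ff) * wtl + ((tr >> 8) & 0x00ff00ff) * wtr
       + ((bl >> 8) & 0x00ff00ff) * wbl + ((br >> 8) & 0x00ff00ff) * wbr;

    /* Each 16-bit lane holds a weighted sum scaled by 256; keep the
     * high byte of every lane. */
    return ((rb >> 8) & 0x00ff00ff) | (ag & 0xff00ff00);
}
```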
    • Ben Avison
      Tweaks to lowlevel-blt-bench · 24e83cae
      Ben Avison authored
      This adds two extra tests, src_n_8 and src_8_8, which I have been
      using to benchmark my ARMv6 changes.
      
      I'd also like to propose that it requires an exact test name as the
      executable's argument, as achieved by this strstr to strcmp change.
      Without this, it is impossible to only benchmark (for example)
      add_8_8, add_n_8 or src_n_8, due to those also being substrings of
      many other test names.
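The difference between the two matching rules in a few lines of C. The helper `test_selected` is hypothetical and is not taken from lowlevel-blt-bench. With substring matching, the argument "add_n_8" also selects "add_n_8888" and "add_n_8_8888". With exact matching, it selects only the test of that name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static bool
test_selected (const char *test_name, const char *arg, bool exact)
{
    assert (test_name != NULL && arg != NULL);
    if (exact)
        return strcmp (test_name, arg) == 0;   /* proposed behaviour */
    return strstr (test_name, arg) != NULL;    /* previous behaviour */
}
```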
  6. Jan 23, 2013
  7. Jan 22, 2013
    • Nemanja Lukic
      MIPS: DSPr2: Added more fast-paths: · 2c657747
      Nemanja Lukic authored
       - over_reverse_n_8888
       - in_n_8_8
      
      Performance numbers before/after on MIPS-74kc @ 1GHz:
      
      lowlevel-blt-bench results
      
      Reference (before):
              over_reverse_n_8888 =  L1:  19.42  L2:  19.07  M: 15.38 ( 40.80%)  HT: 13.35  VT: 13.10  R: 12.92  RT:  8.27 (  49Kops/s)
                         in_n_8_8 =  L1:  21.20  L2:  22.86  M: 21.42 ( 14.21%)  HT: 15.97  VT: 15.69  R: 15.47  RT:  8.00 (  48Kops/s)
      
      Optimized:
              over_reverse_n_8888 =  L1:  60.09  L2:  47.87  M: 28.65 ( 76.02%)  HT: 23.58  VT: 22.51  R: 21.99  RT: 12.28 (  60Kops/s)
                         in_n_8_8 =  L1:  89.38  L2:  86.07  M: 65.48 ( 43.44%)  HT: 44.64  VT: 41.50  R: 40.77  RT: 16.94 (  66Kops/s)
    • Nemanja Lukic
      MIPS: DSPr2: Added more fast-paths for REVERSE operation: · a67b0e24
      Nemanja Lukic authored
       - out_reverse_8_0565
       - out_reverse_8_8888
      
      Performance numbers before/after on MIPS-74kc @ 1GHz:
      
      lowlevel-blt-bench results
      
      Reference (before):
              out_reverse_8_0565 =  L1:  14.29  L2:  13.58  M: 12.14 ( 24.16%)  HT:  9.23  VT:  9.12  R:  8.84  RT:  4.75 (  36Kops/s)
              out_reverse_8_8888 =  L1:  27.46  L2:  23.24  M: 17.41 ( 57.73%)  HT: 12.61  VT: 12.47  R: 11.79  RT:  5.86 (  41Kops/s)
      
      Optimized:
              out_reverse_8_0565 =  L1:  28.24  L2:  25.64  M: 20.63 ( 41.05%)  HT: 16.69  VT: 16.14  R: 15.50  RT:  8.69 (  52Kops/s)
              out_reverse_8_8888 =  L1:  52.78  L2:  41.44  M: 23.50 ( 77.94%)  HT: 18.79  VT: 18.16  R: 16.90  RT:  9.11 (  53Kops/s)
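Scalar reference versions of two of the new fast paths can help when checking the DSPr2 assembly. This sketch assumes the usual Porter-Duff definitions: IN multiplies the destination by source alpha times mask, and OUT_REVERSE multiplies it by 1 - source alpha. It also uses a common division-free formula for a rounded x * y / 255. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounded x * y / 255 for 8-bit x and y. */
static uint8_t
mul_un8 (uint8_t x, uint8_t y)
{
    uint32_t t = (uint32_t) x * y + 0x80;
    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* in_n_8_8: solid source alpha, a8 mask, a8 destination. */
static void
in_n_8_8_ref (uint8_t *dst, const uint8_t *mask, size_t width, uint8_t src_alpha)
{
    assert (width == 0 || (dst != NULL && mask != NULL));
    for (size_t i = 0; i < width; i++)
        dst[i] = mul_un8 (dst[i], mul_un8 (src_alpha, mask[i]));
}

/* out_reverse_8_8888: scale every destination channel by 1 - src alpha. */
static void
out_reverse_8_8888_ref (uint32_t *dst, const uint8_t *src, size_t width)
{
    assert (width == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < width; i++)
    {
        uint8_t inv = (uint8_t) (255 - src[i]);
        uint32_t d = dst[i], r = 0;

        for (int shift = 0; shift < 32; shift += 8)
            r |= (uint32_t) mul_un8 ((uint8_t) (d >> shift), inv) << shift;
        dst[i] = r;
    }
}
```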
  8. Jan 06, 2013
    • Søren Sandmann Pedersen
      pixman-filter.c: Cope with NULL returns from malloc() · 35cc9655
      Søren Sandmann Pedersen authored
      v2: Don't return a pointer to uninitialized memory when the allocation
      of horz and vert fails, but allocation of params doesn't.
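A minimal sketch of the allocation pattern the v2 note describes: if any of several allocations fails, free the ones that succeeded and return NULL, so the caller never receives uninitialized memory. The variable names mirror the commit message (horz, vert, params). The function itself, its parameters and the placeholder values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static double *
create_params_sketch (size_t n_horz, size_t n_vert)
{
    double *horz, *vert, *params;

    assert (n_horz > 0 && n_vert > 0);     /* malloc (0) may return NULL */
    horz = malloc (n_horz * sizeof *horz);
    vert = malloc (n_vert * sizeof *vert);
    params = malloc ((n_horz + n_vert) * sizeof *params);

    if (!horz || !vert || !params)
    {
        free (horz);                       /* free (NULL) is a no-op */
        free (vert);
        free (params);
        return NULL;
    }

    for (size_t i = 0; i < n_horz; i++)
        horz[i] = 1.0;                     /* placeholder filter values */
    for (size_t i = 0; i < n_vert; i++)
        vert[i] = 2.0;

    /* Only fully initialized data is handed back. */
    for (size_t i = 0; i < n_horz; i++)
        params[i] = horz[i];
    for (size_t i = 0; i < n_vert; i++)
        params[n_horz + i] = vert[i];

    free (horz);
    free (vert);
    return params;
}
```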
    • Søren Sandmann Pedersen
      Handle solid images in the noop iterator · 58526cfc
      Søren Sandmann Pedersen authored
      The noop src iterator already has code to handle solid images, but
      that code never actually runs currently because it is not possible for
      an image to have both a format code of PIXMAN_solid and a flag of
      FAST_PATH_BITS_IMAGE.
      
      If these two were to be set at the same time, the
      fast_composite_tiled_repeat() fast path would trigger for solid images
      (because it triggers for PIXMAN_any formats, which includes
      PIXMAN_solid), but for solid images we can usually do better than that
      fast path.
      
      So this patch removes _pixman_solid_fill_iter_init() and instead
      handles such images (along with repeating 1x1 bits images without an
      alpha map) in pixman-noop.c.
      
      When a 1x1R image is involved in the general composite path, before
      this patch, it would hit this code in repeat() in pixman-inlines.h:
      
              while (*c >= size)
                  *c -= size;
              while (*c < 0)
                  *c += size;
      
      and those loops could run for a huge number of iterations (proportional
      to the composite width). For such cases, the performance improvement
      is really big:
      
      ./test/lowlevel-blt-bench -n add_n_8888:
      
      Before:
      
          add_n_8888 =  L1:   3.86  L2:   3.78  M:  1.40 (  0.06%)  HT:  1.43  VT:  1.41  R:  1.41  RT:  1.38 (  19Kops/s)
      
      After:
      
          add_n_8888 =  L1:1236.86  L2:2468.49  M:1097.88 ( 49.04%)  HT:476.49  VT:429.05  R:417.04  RT:155.12 ( 817Kops/s)
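The quoted loops are slow because they need about |c| / size iterations to bring c back into range, and for a 1x1 image (size == 1) that grows with the coordinate itself. A constant-time alternative uses the remainder operator, sketched below under a hypothetical name. The patch takes a different route: it avoids this path for such images by handling them in pixman-noop.c.

```c
#include <assert.h>

/* NORMAL repeat: wrap coordinate c into [0, size) in constant time. */
static int
repeat_normal (int c, int size)
{
    int r;

    assert (size > 0);
    r = c % size;          /* since C99, r has the sign of c */
    if (r < 0)
        r += size;
    return r;
}
```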
  9. Jan 03, 2013
  10. Dec 20, 2012