Commits · 2714b5d201525e176429c0c030b8376a32b4f6c7 · Richard Henderson / pixman

Apr 30, 2013

Pre-release version bump to 0.29.4 · 2714b5d2
Søren Sandmann Pedersen authored 11 years ago

Tag: pixman-0.29.4

2714b5d2
pixman/refactor: Delete this file · 7fc2654a
Søren Sandmann Pedersen authored 11 years ago
```
Essentially all of it is obsolete by now.
```
7fc2654a

MIPS: DSPr2: Added rpixbuf fast path. · cb928a77

Nemanja Lukic authored 11 years ago

Performance numbers before/after on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Referent (before):
       rpixbuf =  L1:  14.63  L2:  13.55  M:  9.91 ( 79.53%)  HT:  8.47  VT:  8.32  R:  8.17  RT:  4.90 (  33Kops/s)

Optimized:
       rpixbuf =  L1:  45.69  L2:  37.30  M: 17.24 (138.31%)  HT: 15.66  VT: 14.88  R: 13.97  RT:  8.38 (  44Kops/s)

cb928a77

MIPS: DSPr2: Added pixbuf fast path. · c6a6fbdc

Nemanja Lukic authored 11 years ago

Performance numbers before/after on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Referent (before):
        pixbuf =  L1:  18.18  L2:  16.47  M: 13.36 (107.27%)  HT: 10.16  VT: 10.07  R:  9.84  RT:  5.54 (  35Kops/s)

Optimized:
        pixbuf =  L1:  43.54  L2:  36.02  M: 17.08 (137.09%)  HT: 15.58  VT: 14.85  R: 13.87  RT:  8.38 (  44Kops/s)

c6a6fbdc

test: add "pixbuf" and "rpixbuf" to lowlevel-blt-bench · f69335d5

Nemanja Lukic authored 11 years ago

Add the necessary support to the lowlevel-blt benchmark for benchmarking the
pixbuf and rpixbuf fast paths. The bench_composite function now checks for the
string "pixbuf" in the test name and, if it is found, uses the same bits for
the src and mask images.

f69335d5

test: add "src_0888_8888_rev" and "src_0888_0565_rev" to lowlevel-blt-bench · 3dc9e382
Nemanja Lukic authored 11 years ago

3dc9e382

MIPS: DSPr2: Fix for bug in in_n_8 routine. · 44174ce5

Nemanja Lukic authored 11 years ago

The rounding logic was not implemented correctly.
Instead of the rounding version of the 8-bit shift, logical shifts were used.
Also, the code used unnecessary multiplications, which could be avoided by packing
4 destination (a8) pixels into one 32-bit register. There were also unnecessary
spills to the stack. The code has been rewritten to address these issues.

The bug was revealed by increasing number of the iterations in blitters-test.

Performance numbers on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Referent (before):
        in_n_8 =  L1:  21.20  L2:  22.86  M: 21.42 ( 14.21%)  HT: 15.97  VT: 15.69  R: 15.47  RT:  8.00 (  48Kops/s)

Optimized (first implementation, with bug):
        in_n_8 =  L1:  89.38  L2:  86.07  M: 65.48 ( 43.44%)  HT: 44.64  VT: 41.50  R: 40.77  RT: 16.94 (  66Kops/s)

Optimized (with bug fix, and code revisited):
        in_n_8 =  L1: 102.33  L2:  95.65  M: 70.54 ( 46.84%)  HT: 48.35  VT: 45.06  R: 43.20  RT: 17.60 (  66Kops/s)

44174ce5

MIPS: DSPr2: Added src_0565_8888 nearest neighbor fast path. · 5858f09d

Nemanja Lukic authored 11 years ago

Performance numbers before/after on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Referent (before):
         src_0565_8888 =  L1:  20.70  L2:  19.22  M: 12.50 ( 49.79%)  HT: 10.45  VT: 10.18  R:  9.99  RT:  5.31 (  31Kops/s)

Optimized:
         src_0565_8888 =  L1:  62.98  L2:  53.44  M: 23.07 ( 91.87%)  HT: 19.85  VT: 19.15  R: 17.70  RT:  9.68 (  43Kops/s)

5858f09d

MIPS: DSPr2: Added over_8888_0565 nearest neighbor fast path. · 311d55b6

Nemanja Lukic authored 11 years ago

Performance numbers before/after on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Referent (before):
        over_8888_0565 =  L1:  13.22  L2:  12.02  M:  9.77 ( 38.92%)  HT:  8.58  VT:  8.35  R:  8.38  RT:  5.78 (  35Kops/s)

Optimized:
        over_8888_0565 =  L1:  26.20  L2:  22.97  M: 15.92 ( 63.40%)  HT: 13.33  VT: 13.13  R: 12.72  RT:  7.65 (  39Kops/s)

311d55b6

MIPS: DSPr2: Added over_8888_8888 nearest neighbor fast path. · bd487ee3

Nemanja Lukic authored 11 years ago

Performance numbers before/after on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Referent (before):
        over_8888_8888 =  L1:  19.47  L2:  16.30  M: 11.24 ( 59.69%)  HT:  9.54  VT:  9.29  R:  9.47  RT:  6.24 (  37Kops/s)

Optimized:
        over_8888_8888 =  L1:  43.67  L2:  33.30  M: 16.32 ( 86.65%)  HT: 14.10  VT: 13.78  R: 12.96  RT:  7.85 (  39Kops/s)

bd487ee3

MIPS: DSPr2: Fix bug in over_n_8888_8888_ca/over_n_8888_0565_ca routines · 66def909

Nemanja Lukic authored 11 years ago

After the new PRNG (pseudorandom number generator) was introduced, a bug in two
DSPr2 routines was revealed. It manifested as wrong calculations in the composite
and glyph tests, which caused make check to fail for the MIPS DSPr2 optimizations.

Bug was in the calculation of the:
*dst = over (src, *dst) when ma == 0xffffffff

In this case src was not negated and shifted right by 24 bits, it was only
negated. When implementing this routine in the first place, I misplaced those
shifts, which allowed me to combine code for over operation and:
    UN8x4_MUL_UN8x4 (s, ma);
    UN8x4_MUL_UN8 (ma, srca);
    ma = ~ma;
    UN8x4_MUL_UN8x4_ADD_UN8x4 (d, ma, s);
So I decided to rewrite that piece of code from scratch. I changed logic, so
now assembly code mimics code from pixman-fast-path.c but processes two pixels
at a time. This code should be easier to debug and maintain.
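
For reference, the role of the shift can be illustrated with a minimal per-channel sketch of the over operation. This is not the pixman macro or the DSPr2 assembly; `mul_un8` and `over_sketch` are hypothetical names, and the code handles one pixel at a time rather than two. The inverse source alpha must be both negated and shifted right by 24 bits before it is used as a multiplier:

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 8-bit values as fractions of 255, with rounding. */
static uint32_t
mul_un8 (uint32_t a, uint32_t b)
{
    uint32_t t;

    assert (a <= 0xff && b <= 0xff);
    t = a * b + 0x80;
    return (t + (t >> 8)) >> 8;
}

/* dst = src + dst * (255 - alpha (src)) / 255, one channel at a time.
 * Hypothetical helper for illustration only. */
static uint32_t
over_sketch (uint32_t src, uint32_t dst)
{
    uint32_t ia = ~src >> 24;   /* negate AND shift: inverse source alpha */
    uint32_t result = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = (dst >> shift) & 0xff;
        uint32_t c = s + mul_un8 (d, ia);

        if (c > 0xff)
            c = 0xff;           /* saturate the channel */
        result |= c << shift;
    }
    return result;
}
```

With a fully opaque source the destination contributes nothing, and with a fully transparent source the destination is returned unchanged.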

The bug was revealed in commit b31a6962. Errors were detected by composite
and glyph tests.

66def909

Apr 28, 2013

sse2: faster bilinear interpolation (get rid of XOR instruction) · d768558c

Siarhei Siamashka authored 12 years ago

The old code was calculating horizontal weights for right pixels
in the following way (for simplicity assume 8-bit interpolation
precision):

  Start with "x = vx" and do increment "x += ux" after each pixel.
  In this case right pixel weight for interpolation can be calculated
  as "((x >> 8) ^ 0xFF) + 1", which is the same as "256 - (x >> 8)".

The new code instead:

  Starts with "x = -(vx + 1)", performs increment "x += -ux" after
  each pixel and calculates right weights as just "(x >> 8) + 1",
  eliminating the need for XOR operation in the inner loop.
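
The equivalence of the two schemes can be checked with a small scalar sketch. It assumes, as in the simplified description above, a 16-bit fractional coordinate with 8-bit interpolation precision; the function names are hypothetical and this is not the SSE2 code itself:

```c
#include <assert.h>
#include <stdint.h>

/* Old scheme: x starts at vx and advances by ux. */
static unsigned
old_weight (uint16_t x)
{
    return ((unsigned) (x >> 8) ^ 0xFF) + 1;    /* == 256 - (x >> 8) */
}

/* New scheme: nx starts at -(vx + 1) and advances by -ux. */
static unsigned
new_weight (uint16_t nx)
{
    return (unsigned) (nx >> 8) + 1;            /* no XOR needed */
}

/* Walks n pixels with both schemes and counts disagreements. */
static int
count_weight_mismatches (uint16_t vx, uint16_t ux, int n)
{
    uint16_t x = vx;
    uint16_t nx = (uint16_t) -(vx + 1);
    int i, mismatches = 0;

    assert (n >= 0);
    for (i = 0; i < n; i++)
    {
        if (old_weight (x) != new_weight (nx))
            mismatches++;
        x = (uint16_t) (x + ux);
        nx = (uint16_t) (nx - ux);
    }
    return mismatches;
}
```

The sketch works because -(x + 1) equals ~x in 16-bit two's complement arithmetic, so the negated counter already carries the inverted bits that the XOR used to produce.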

So we have one instruction less on the critical path. Benchmarks
with "lowlevel-blt-bench -b src_8888_8888" using GCC 4.7.2 on
x86-64 system and default optimizations:

Intel Core i7 860 (2.8GHz):
    before: src_8888_8888 =  L1: 291.37  L2: 288.58  M:285.38
    after:  src_8888_8888 =  L1: 319.66  L2: 316.47  M:312.06

Intel Core2 T7300 (2GHz):
    before: src_8888_8888 =  L1: 121.95  L2: 118.38  M:118.52
    after:  src_8888_8888 =  L1: 128.82  L2: 125.12  M:124.88

Intel Atom N450 (1.67GHz):
    before: src_8888_8888 =  L1:  64.25  L2:  62.37  M: 61.80
    after:  src_8888_8888 =  L1:  64.23  L2:  62.37  M: 61.82

Inspired by the "sse2_bilinear_interpolation" function (single
pixel interpolation) from:
    http://lists.freedesktop.org/archives/pixman/2013-January/002575.html

d768558c

test: larger 0xFF/0x00 filled clusters in random images for blitters-test · 59109f32

Siarhei Siamashka authored 12 years ago

Current blitters-test program had difficulties detecting a bug in
over_n_8888_8888_ca implementation for MIPS DSPr2:

http://lists.freedesktop.org/archives/pixman/2013-March/002645.html

In order to hit the buggy code path, two consecutive mask values had
to be equal to 0xFFFFFFFF because of loop unrolling. The current
blitters-test generates random images in such a way that each byte
has 25% probability for having 0xFF value. Hence each 32-bit mask
value has ~0.4% probability for 0xFFFFFFFF. Because we are testing
many compositing operations with many pixels, encountering at least
one 0xFFFFFFFF mask value reasonably fast is not a problem. If a
bug related to a 0xFFFFFFFF mask value is artificially introduced into the
generic C function over_n_8888_8888_ca, it gets detected on iteration
675591 of blitters-test (out of 2000000).

However two consecutive 0xFFFFFFFF mask values are much less likely
to be generated, so the bug was missed by blitters-test.
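
The probabilities involved can be worked out with a short sketch. It assumes the bytes are independent, and `prob_all_ff` is a hypothetical helper, not part of blitters-test:

```c
#include <assert.h>

/* Probability that n_bytes independent bytes all equal 0xFF,
 * when each byte is 0xFF with probability p_byte. */
static double
prob_all_ff (double p_byte, int n_bytes)
{
    double p = 1.0;
    int i;

    assert (n_bytes >= 0);
    for (i = 0; i < n_bytes; i++)
        p *= p_byte;
    return p;
}
```

Under that assumption `prob_all_ff (0.25, 4)` is 1/256 (about 0.39%) for a single 32-bit mask value, while `prob_all_ff (0.25, 8)` is 1/65536 (about 0.0015%) for a given pair of consecutive mask values.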

This patch addresses the problem by also randomly setting the 32-bit
values in images to either 0xFFFFFFFF or 0x00000000 (also with 25%
probability). This produces larger clusters of consecutive 0x00
or 0xFF bytes in images, which unrolled or SIMD optimized code may
handle through special shortcuts.

59109f32

Apr 27, 2013
- Trivial spelling fixes in comments · a99147d1
  Stefan Weil authored 11 years ago
  
  They were found by codespell. Signed-off-by: Stefan Weil <sw@weilnetz.de>
  a99147d1
Apr 08, 2013
- Check for missing sqrtf() as, e.g., for Solaris 9 · 9d0bb103
  Peter Breitenlohner authored 11 years ago
  
  Signed-off-by: Peter Breitenlohner <peb@mppmu.mpg.de>
  9d0bb103
Mar 16, 2013

Improve precision of calculations in pixman-gradient-walker.c · d8ac35af

Søren Sandmann Pedersen authored 12 years ago

The computations in pixman-gradient-walker.c currently take place at
very limited 8 bit precision which results in quite visible artefacts
in gradients. An example is the one produced by demos/linear-gradient
which currently looks like this:

    http://i.imgur.com/kQbX8nd.png

With the changes in this commit, the gradient looks like this:

    http://i.imgur.com/nUlyuKI.png

The images are also available here:

    http://people.freedesktop.org/~sandmann/gradients/before.png
    http://people.freedesktop.org/~sandmann/gradients/after.png

This patch computes pixels using floating point, but uses a faster
algorithm, which makes up for the loss of performance.

== Theory:

In both the new and the old algorithm, the various gradient
implementations compute a parameter x that indicates how far along the
gradient the current scanline is. The current algorithm has a cache of
the two color stops surrounding the last parameter; those are used in
a SIMD-within-register fashion in this way:

    t1 = walker->left_rb * idist + walker->right_rb * dist;

where dist and idist are the distances to the left and right color
stops respectively normalized to the distance between the left and
right stops. The normalization (which involves a division) is captured
in another cached variable "stepper". The cached values are recomputed
whenever the parameter moves in between two different stops (called
"reset" in the implementation).

Because idist and dist are computed in 8 bits only, a lot of
information is lost, which is quite visible as the image linked above
shows.

The new algorithm caches more information in the following way. When
interpolating between stops, the formula to be used is this:

     t = ((x - left) / (right - left));

     result = lc * (1 - t) + rc * t;

where

    - x is the parameter as computed by the main gradient code,
    - left is the position of the left color stop,
    - right is the position of the right color stop
    - lc is the color of the left color stop
    - rc is the color of the right color stop

That formula can also be written like this:

    result
      = lc * (1 - t) + rc * t;
      = lc + (rc - lc) * t
      = lc + (rc - lc) * ((x - left) / (right - left))
      = (rc - lc) / (right - left) * x +
      	       lc - (left * (rc - lc)) / (right - left)
      = s * x + b

where

    s = (rc - lc) / (right - left)

and

    b = lc - left * (rc - lc) / (right - left)
      = (lc * (right - left) - left * (rc - lc)) / (right - left)
      = (lc * right - rc * left) / (right - left)

To summarize, setting w = (right - left):

    s = (rc - lc) / w
    b = (lc * right - rc * left) / w

    r = s * x + b

Since s and b only depend on the two active stops, both can be cached
so that the computation only needs to do one multiplication and one
addition per pixel (followed by premultiplication of the alpha
channel). That is, seven multiplications in total, which is the same
number as the old SIMD-within-register implementation had.
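
A minimal sketch of the cached computation for a single channel follows. The type and function names are hypothetical and do not correspond to the fields in pixman-gradient-walker.c; the sketch also assumes right != left, whereas real code has to handle coincident stops:

```c
#include <assert.h>

typedef struct
{
    float s;    /* slope:  (rc - lc) / w */
    float b;    /* offset: (lc * right - rc * left) / w */
} channel_coeffs_t;

/* Recomputed only when the parameter moves between different stops. */
static channel_coeffs_t
compute_coeffs (float left, float right, float lc, float rc)
{
    channel_coeffs_t c;
    float w_rec;

    assert (right != left);
    w_rec = 1.0f / (right - left);      /* one reciprocal, then multiply */
    c.s = (rc - lc) * w_rec;
    c.b = (lc * right - rc * left) * w_rec;
    return c;
}

/* Per-pixel work: one multiplication and one addition. */
static float
eval_channel (channel_coeffs_t c, float x)
{
    return c.s * x + c.b;
}
```

For stops at 0.25 and 0.75 with channel values 0 and 1, this gives s = 2 and b = -0.5, so the channel evaluates to 0 at the left stop, 0.5 halfway and 1 at the right stop.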

== Implementation notes:

The new formula described above is implemented in single precision
floating point, and the eight divisions necessary to compute the
cached values are done by multiplication with the reciprocal of the
distance between the color stops.

The alpha values used in the cached computation are scaled by 255.0,
whereas the RGB values are kept in the [0, 1] interval. This ensures
that after premultiplication, all values will be in the [0, 255]
interval.

This scaling is done by first dividing all the channels by
257, and then later on dividing the r, g, b channels by 255. It would
be more natural to do all this scaling in only one place, but
inexplicably, that results in a (substantial) slowdown on Sandy Bridge
with GCC v 4.7.

== Performance impact (median of three runs of radial-perf-test):

   == Intel Sandy Bridge, Core i3 @ 1.2GHz

   Before: 0.014553
   After:  0.014410
   Change: 1.0% faster

   == AMD Barcelona @ 1.2 GHz

   Before: 0.021735
   After:  0.021328
   Change: 1.9% faster

I.e., slightly faster, though conceivably there could be a negative
impact on machines with a bigger difference between integer and
floating point performance.

V2:

- Use 's' and 'b' in the variable names instead of 'm' and 'd'. This
  way they match the explanation above

- Move variable declarations to the top of the function

- Remove unused stepper field

- Some formatting fixes

- Don't pointlessly include pixman-combine32.h

- Don't offset x for each pixel; go back to offsetting left_x and
  right_x at reset time. The offsets cancel out in the formula above,
  so there is no impact on the calculations.

d8ac35af

Mar 12, 2013

Move the IS_ZERO() to pixman-private.h and rename to FLOAT_IS_ZERO() · a1c2331e
Søren Sandmann Pedersen authored 12 years ago
```
Some upcoming changes to pixman-gradient-walker.c will need this
macro.
```
a1c2331e

test: Add radial-perf-test, a microbenchmark for radial gradients · 2c953e57

Søren Sandmann Pedersen authored 12 years ago

This benchmark renders one of the radial gradients used in the
swfdec-youtube cairo trace 500 times and reports the average time it
took.

V2: Update .gitignore

2c953e57

demos: Add linear-gradient demo program · 460faaa4

Søren Sandmann Pedersen authored 12 years ago

This program displays a linear gradient from blue to yellow. Due to
limited precision in pixman-gradient-walker.c, it currently has some
ugly artefacts that give it a 'brushed metal' appearance.

V2: Update .gitignore

460faaa4

Mar 08, 2013
- Remove unused macro · aaae3d8e
  Behdad Esfahbod authored 12 years ago
  
  aaae3d8e
Feb 27, 2013

MIPS: DSPr2: Added more fast-paths for SRC operation: · 5feda20f

Nemanja Lukic authored 12 years ago

 - src_0888_8888_rev
 - src_0888_0565_rev

Performance numbers before/after on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Referent (before):
        src_0888_8888_rev =  L1:  51.88  L2:  42.00  M: 19.04 ( 88.50%)  HT: 15.27  VT: 14.62  R: 14.13  RT:  7.12 (  45Kops/s)
        src_0888_0565_rev =  L1:  31.96  L2:  30.90  M: 22.60 ( 75.03%)  HT: 15.32  VT: 15.11  R: 14.49  RT:  6.64 (  43Kops/s)

Optimized:
        src_0888_8888_rev =  L1: 222.73  L2: 113.70  M: 20.97 ( 97.35%)  HT: 18.31  VT: 17.14  R: 16.71  RT:  9.74 (  54Kops/s)
        src_0888_0565_rev =  L1: 100.37  L2:  74.27  M: 29.43 ( 97.63%)  HT: 22.92  VT: 21.59  R: 20.52  RT: 10.56 (  56Kops/s)

5feda20f

MIPS: DSPr2: Added more fast-paths for OVER operation: · 43914d68

Nemanja Lukic authored 12 years ago

 - over_8888_0565
 - over_n_8_8

Performance numbers before/after on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Referent (before):
        over_8888_0565 =  L1:  14.30  L2:  13.22  M: 10.43 ( 41.56%)  HT: 12.51  VT: 12.95  R: 11.82  RT:  7.34 (  49Kops/s)
            over_n_8_8 =  L1:  12.77  L2:  16.93  M: 15.03 ( 29.94%)  HT: 10.78  VT: 10.72  R: 10.29  RT:  4.92 (  33Kops/s)

Optimized:
        over_8888_0565 =  L1:  26.03  L2:  22.92  M: 15.68 ( 62.43%)  HT: 16.19  VT: 16.27  R: 14.93  RT:  8.60 (  52Kops/s)
            over_n_8_8 =  L1:  62.00  L2:  55.17  M: 40.29 ( 80.23%)  HT: 26.77  VT: 25.64  R: 24.13  RT: 10.01 (  47Kops/s)

43914d68

Feb 15, 2013

gtk-utils.c: Use cairo in show_image() rather than GdkPixbuf · 2156fb51

Søren Sandmann Pedersen authored 12 years ago

GdkPixbufs are not premultiplied, so when using them to display pixman
images, there are some unnecessary conversions going on: First the image
is converted to non-premultiplied, and then GdkPixbuf premultiplies
before sending the result to the X server. These conversions may cause
the displayed image to not be exactly identical to the original.

This patch just uses a cairo image surface instead, which avoids these
conversions.

Also make the comment about sRGB a little more concise.

2156fb51

Feb 13, 2013

Fix to lowlevel-blt-bench · 5e207f82

Ben Avison authored 12 years ago

The source, mask and destination buffers are initialised to 0xCC just after
they are allocated. Between each benchmark, there is a pair of memcpys,
from the destination buffer to the source buffer and back again (there are
no explanatory comments, but presumably this is an effort to flush the
caches). However, it has an unintended consequence, which is to change the
contents of the buffers on entry to subsequent benchmarks. This means it is
not a fair test: for example, with over_n_8888 (featured in the following
patches) it reports L2 and even M tests as being faster than the L1 test,
because after the L1 test, the source buffer is filled with fully opaque
pixels, for which over_n_8888 has a shortcut.

The fix here is simply to reverse the order of the memcpys, so src and
destination are both filled with 0xCC on entry to all tests.
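
The effect of the copy order can be illustrated with a small simulation. It is a sketch under assumptions, not the lowlevel-blt-bench code: the buffer size, the helper name and the stand-in "benchmark" (which simply overwrites the destination with 0xFF) are all hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64

/* Returns 1 if both buffers still hold 0xCC after the pair of copies. */
static int
buffers_pristine_after_copies (int reversed_order)
{
    unsigned char src[BUF_SIZE], dst[BUF_SIZE];
    size_t i;

    memset (src, 0xCC, BUF_SIZE);
    memset (dst, 0xCC, BUF_SIZE);

    /* Stand-in for a benchmark that leaves opaque pixels in dst. */
    memset (dst, 0xFF, BUF_SIZE);

    if (reversed_order)
    {
        memcpy (dst, src, BUF_SIZE);    /* fixed order: src -> dst first */
        memcpy (src, dst, BUF_SIZE);
    }
    else
    {
        memcpy (src, dst, BUF_SIZE);    /* old order: dst -> src first */
        memcpy (dst, src, BUF_SIZE);
    }

    for (i = 0; i < BUF_SIZE; i++)
        if (src[i] != 0xCC || dst[i] != 0xCC)
            return 0;
    return 1;
}
```

With the old order the benchmark's output leaks into the source buffer; with the reversed order both buffers are back at 0xCC, provided the benchmark itself does not write to the source.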

5e207f82

sse2: Use uintptr_t in type casts from pointer to integral value · d26f922d

Stefan Weil authored 12 years ago


Some recent code added new type casts from pointer to unsigned long.
These type casts result in compiler warnings for systems like
MinGW-w64 (64 bit Windows) where sizeof(unsigned long) != sizeof(void *).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
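
As an illustration of the kind of cast involved, here is a minimal sketch of pointer alignment arithmetic using uintptr_t. The helpers are hypothetical and are not taken from pixman-sse2.c:

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if p is aligned to a 16-byte boundary. */
static int
is_aligned_16 (const void *p)
{
    /* uintptr_t is wide enough for a pointer; unsigned long is not
     * on LLP64 systems such as 64-bit Windows. */
    return ((uintptr_t) p & 15) == 0;
}

/* Rounds p up to the next 16-byte boundary. */
static void *
align_up_16 (void *p)
{
    void *result = (void *) (((uintptr_t) p + 15) & ~(uintptr_t) 15);

    assert (is_aligned_16 (result));
    return result;
}
```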

d26f922d

lookup_composite: Don't update cache in case of error · dc80eb09

Søren Sandmann Pedersen authored 12 years ago

If we fail to find a composite function, don't update the fast path
cache with the dummy compositing function.

Also make the error message state that the bug is likely caused by
issues with thread local storage.

dc80eb09

Turn on error logging at all times · 4dced81c

Søren Sandmann Pedersen authored 12 years ago

While releasing 0.29.2 the distcheck run produced a number of error
messages that had to be fixed in 349015e1.
These were not caught earlier, so evidently nobody had actually run pixman with
debugging turned on. It's not the first time this has happened, see
5b0563f3 for example.

So this patch makes the return_if_fail() macros use unlikely() around
the expressions and then turns on error logging at all times. The
performance hit should be negligible since we were already evaluating the
expressions.

The place where DEBUG actually does cause a performance hit is in the
region selfcheck code, and that will still only be enabled in
development snapshots.

4dced81c

pixman-compiler.h: Add unlikely() macro · f4c9492c

Søren Sandmann Pedersen authored 12 years ago

When compiling with GCC this macro expands to __builtin_expect((expr), 0).
On other compilers, it just expands to (expr).
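
A minimal sketch of such a macro and of how a caller might use it. The macro body follows the description above; `checked_div` is a hypothetical example function, not part of pixman:

```c
#include <assert.h>
#include <stddef.h>

#if defined (__GNUC__)
#  define unlikely(expr) __builtin_expect ((expr), 0)
#else
#  define unlikely(expr) (expr)
#endif

/* Hypothetical caller: division that reports failure instead of trapping.
 * Overflow of INT_MIN / -1 is not handled in this sketch. */
static int
checked_div (int num, int den, int *out)
{
    assert (out != NULL);

    if (unlikely (den == 0))
        return 0;           /* rare error path */

    *out = num / den;
    return 1;
}
```

The macro only hints to the compiler which branch is expected; the value of the expression is unchanged.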

f4c9492c

utils.c: Increase acceptable deviation to 0.0064 in pixel_checker_t · 5ebb5ac3

Søren Sandmann Pedersen authored 12 years ago

The check-formats program reveals that the 8 bit pipeline cannot meet
the current 0.004 acceptable deviation specified in utils.c, so we
have to increase it. Some of the failing pixels were captured in
pixel-test, which with this commit now passes.

== a4r4g4b4 DISJOINT_XOR a8r8g8b8 ==

The DISJOINT_XOR operator applied to an a4r4g4b4 source pixel of
0xd0c0 and a destination pixel of 0x5300ea00 results in the exact
value:

    fa = (1 - da) / sa = (1 - 0x53 / 255.0) / (0xd / 15.0) = 0.7782
    fb = (1 - sa) / da = (1 - 0xd / 15.0) / (0x53 / 255.0) = 0.4096

    r = fa * (0xc / 15.0) + fb * (0xea / 255.0) = 0.99853

But when computing in 8 bits, we get:

    fa8 = ((255 - 0x53) * 255 + 0xdd / 2) / 0xdd = 0xc6
    fb8 = ((255 - 0xdd) * 255 + 0x53 / 2) / 0x53 = 0x68

    r8 = (fa8 * 0xcc + 127) / 255 + (fb8 * 0xea + 127) / 255 = 0xfd

and

    0xfd / 255.0 = 0.9921568627450981

for a deviation of 0.00637118610187, which we then have to consider
acceptable given the current implementation.

By switching to computing the result with

   r = (fa * s + fb * d + 127) / 255

rather than

   r = (fa * s + 127) / 255 + (fb * d + 127) / 255

the deviation would be only 0.00244961747442, so at some point it may
be worth doing either this, or switching to floating point for
operators that involve divisions.

Note that the conversion from 4 bits to 8 bits does not cause any
error in this case because both rounding and bit replication produce
an exact result when the number of from-bits divides the number of
to-bits.
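
The 8-bit arithmetic in this section can be reproduced with a short sketch. The helper names are hypothetical; the rounding division adds half the divisor before dividing, and no clamping to 255 is done here:

```c
#include <assert.h>

/* (255 - a) / b as an 8-bit fraction, rounded to nearest. */
static unsigned
inv_div_un8 (unsigned a, unsigned b)
{
    assert (b != 0 && a <= 255 && b <= 255);
    return ((255 - a) * 255 + b / 2) / b;
}

/* Rounds each product separately. */
static unsigned
blend_split (unsigned fa, unsigned s, unsigned fb, unsigned d)
{
    return (fa * s + 127) / 255 + (fb * d + 127) / 255;
}

/* Rounds once, after summing both products. */
static unsigned
blend_joint (unsigned fa, unsigned s, unsigned fb, unsigned d)
{
    return (fa * s + fb * d + 127) / 255;
}
```

For the pixel pair above, `inv_div_un8 (0x53, 0xdd)` is 0xc6 and `inv_div_un8 (0xdd, 0x53)` is 0x68. `blend_split (0xc6, 0xcc, 0x68, 0xea)` gives 0xfd, while `blend_joint` with the same arguments gives 0xfe, which is where the smaller deviation comes from.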

== a8r8g8b8 OVER r5g6b5 ==

When OVER compositing the a8r8g8b8 pixel 0x0f00c300 with the x14r6g6b6
pixel 0x03c0, the true floating point value of the resulting green
channel is:

   0xc3 / 255.0 + (1.0 - 0x0f / 255.0) * (0x0f / 63.0) = 0.9887955

but when compositing 8 bit values, where the 6-bit green channel is
converted to 8 bit through bit replication, the 8-bit result is:

   0xc3 + ((255 - 0x0f) * 0x3c + 127) / 255 = 251

which corresponds to a real value of 0.984314. The difference from the
true value is 0.004482 which is bigger than the acceptable deviation
of 0.004. So, if we were to compute all the CONJOINT/DISJOINT
operators in floating point, or otherwise make them more accurate, the
acceptable deviation could be set at 0.0045.

If we were doing the 6-bit conversion with rounding:

   (x / 63.0 * 255.0 + 0.5)

instead of bit replication, the deviation in this particular case
would be only 0.0005, so we may want to consider this at some
point.
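
The two 6-bit to 8-bit conversions can be compared with a small sketch for the green channel. The function names are hypothetical and the code is not taken from pixman:

```c
#include <assert.h>

/* 6-bit to 8-bit by bit replication: abcdef -> abcdefab. */
static unsigned
expand6_replicate (unsigned x)
{
    assert (x <= 63);
    return (x << 2) | (x >> 4);
}

/* 6-bit to 8-bit by rounding x / 63 * 255 to the nearest integer. */
static unsigned
expand6_round (unsigned x)
{
    assert (x <= 63);
    return (unsigned) (x / 63.0 * 255.0 + 0.5);
}

/* OVER for one 8-bit channel: src + (255 - src_alpha) * dst / 255. */
static unsigned
over_channel8 (unsigned src, unsigned src_alpha, unsigned dst)
{
    return src + ((255 - src_alpha) * dst + 127) / 255;
}
```

For the 6-bit green value 0x0f, bit replication yields 0x3c and an OVER result of 251, whereas rounding yields 61 and an OVER result of 252, which is closer to the true value of 0.9887955 * 255.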

5ebb5ac3

test: Add new pixel-test regression test · f2ba7fe1

Søren Sandmann Pedersen authored 12 years ago

This test program contains a table of individual operator/pixel
combinations. For each pixel combination, images of various sizes are
filled with the pixels and then composited. The result is then
verified against the output of do_composite(). If the result doesn't
match, detailed error information is printed.

The initial 14 pixel combinations currently all fail.

f2ba7fe1

a1-trap-test: Add tests for operator_name() and format_name() · 67816367

Søren Sandmann Pedersen authored 12 years ago

The check-formats.c test depends on the exact format of the strings
returned from these functions, so add a test here.

a1-trap-test isn't the ideal place, but it seems like overkill to add
a new test just for these trivial checks.

67816367

test: Add new check-formats utility · d1434d11

Søren Sandmann Pedersen authored 12 years ago

Given an operator and two formats, this program will composite and
check all pixels where the red and blue channels are 0. That is, if
the two formats are a8r8g8b8 and a4r4g4b4, all source pixels matching
the mask

    0xff00ff00

are composited with the given operator against all destination pixels
matching the mask

    0xf0f0

and the result is then verified against the do_composite() function
that was moved to utils.c earlier.

This program reveals that a number of operators and format
combinations are not computed to within the precision currently
accepted by pixel_checker_t. For example:

    check-formats over a8r8g8b8 r5g6b5 | grep failed | wc -l
    30

reveals that there are 30 pixel combinations where OVER produces
insufficiently precise results for the a8r8g8b8 and r5g6b5 formats.

d1434d11

utils.[ch]: Add pixel_checker_get_masks() · 1820131f
Søren Sandmann Pedersen authored 12 years ago
```
This function returns the a, r, g, and b masks corresponding to the
pixel checker's format.
```
1820131f
test/utils.[ch]: Add pixel_checker_convert_pixel_to_color() · 5eb61f72
Søren Sandmann Pedersen authored 12 years ago
```
This function takes a pixel in the format corresponding to the pixel
checker, and converts to a color_t.
```
5eb61f72
test: Move do_composite() function from composite.c to utils.c · 3ae717f7
Søren Sandmann Pedersen authored 12 years ago
```
So that it can be used in other tests.
```
3ae717f7

Jan 30, 2013

Post-release version bump to 0.29.3 · 958bd334
Søren Sandmann Pedersen authored 12 years ago

958bd334
Pre-release version bump to 0.29.2 · a56707e2
Søren Sandmann Pedersen authored 12 years ago

Tag: pixman-0.29.2

a56707e2

stresstest: Ensure that the rasterizer is only given alpha formats · 349015e1

Søren Sandmann Pedersen authored 12 years ago

In c2cb303d, return_if_fail()s were added to
prevent the trapezoid rasterizers from being called with non-alpha
formats. However, stress-test actually does call the rasterizers with
non-alpha formats, but because _pixman_log_error() is disabled in
versions with an odd minor number, the errors never materialized.

Fix this by changing the argument to random format to an enum of three
values DONT_CARE, PREFER_ALPHA, or REQUIRE_ALPHA, and then in the
switch that calls the trapezoid rasterizers, pass the appropriate
value for the function in question.

349015e1

Jan 29, 2013

Change default GPGKEY to 3892336E, which is soren.sandmann@gmail.com · afde8629

Søren Sandmann Pedersen authored 12 years ago

The old one belongs to the email address sandmann@daimi.au.dk, which
doesn't work anymore.

Also use gpg to get the name and address for the "(Signed by ...)"
line since that works more reliably for me than using git.

afde8629

Improve L1 and L2 benchmark tests for caches that don't use allocate-on-write · 69a7a9b6

Ben Avison authored 12 years ago

In particular this affects single-core ARMs (e.g. ARM11, Cortex-A8), which
are usually configured this way. For other CPUs, this should only add a
constant time, which will be cancelled out by the EXCLUDE_OVERHEAD runs.

The problems were caused by cachelines becoming permanently evicted from
the cache, because the code that was intended to pull them back in again on
each iteration assumed too long a cache line (for the L1 test) or failed to
read memory beyond the first pixel row (for the L2 test). Also, the reloading
of the source buffer was unnecessary.

These issues were identified by Siarhei in this post:
http://lists.freedesktop.org/archives/pixman/2013-January/002543.html

69a7a9b6
