Skip to content
  • Connor Abbott's avatar
    spirv: Rewrite determinant calculation · fe5d94f0
    Connor Abbott authored
    The old calculation for mat3 was clever, but it turns out that a
    straightforward application of subdeterminants similar to how mat4 is
    handled is more efficient: on a scalar architecture with some sort of
    combined multiply+add instruction with a negate modifier (both fairly
    common), the new determinant is 9 instructions vs. 15 for the old one,
    and without the multiply-add it's 14 instructions vs. 18 for the old
    one.  When used as a routine for inverse() the savings are compounded,
    because we now use the same method as used to compute the adjucate
    matrix and so CSE can combine most of the calculations with the adjucate
    matrix ones.
    
    Once mat3 and mat4 use the same method for computing determinants, we
    can combine them into a single recursive function. I also pulled up the
    mat_subdet() function because it was doing basically what we need, so
    it's now shared between determinant and inverse. This shrinks the
    implementation significantly, as can be seen from the diffstat.
    
    The real reason I want to change this, though, is that it fixes
    dEQP-VK.glsl.builtin.precision_fp16_storage16b.inverse.compute.mat3 with
    turnip. Qualcomm uses round-to-zero for 16-bit frcp, which combined with
    some inaccuracy in the old method of calculating the determinant led us
    to fail. Qualcomm's driver uses something like the new method to
    calculate the determinant in the inverse. We could argue that Mesa's
    method should be allowed, because round-to-zero for floating-point
    division is within spec and there are no precision guarantees given for
    determinant() or inverse(). However we might as well use the more
    efficient method.
    fe5d94f0