Skip to content

r300: add some synchronization for KIL (dEQP discard fix)

Pavel Ondračka requested to merge ondracka/mesa:r300_discard_control_flow into main

This is a fix for dEQP-GLES2.functional.shaders.discard.dynamic_loop_always. The draft status is there because I have no clue why it actually works...

I've been looking at the remaining dEQP failures and the dEQP-GLES2.functional.shaders.discard.dynamic_loop_always is one of them.

[fragment shader]
varying mediump vec4 v_color;
varying mediump vec4 v_coords;
uniform sampler2D    ut_brick;
uniform mediump int  ui_one;
uniform mediump int  ui_two;

void main (void)
{
    gl_FragColor = v_color;
    for (int i = 0; i < ui_two; i++)
    {
        if (i > 0)
            discard;
    }
}

Looking at the generated code doesn't show anything wrong

Fragment Program: after 'register allocation'
# Radeon Compiler Program
  0: 
     MAD temp[1].x, src0.0__, src0.1__, src0.0__
  1: BGNLOOP;
  2:   src0.xyz = temp[1], src1.xyz = const[0]
       MAD aluresult, src0.x__, src0.111, -src1.x__
        [aluresult = (result >= 0)]
  3:   IF aluresult.x___;
  4:     BRK;
  5:   ENDIF;
  6:   src0.xyz = temp[1]
       CMP temp[1].y, src0._0_, src0._1_, -src0._x_
  7:   src0.xyz = temp[1]
       MAD temp[2].xyz, -src0.yyy, src0.111, src0.000
       MAD temp[2].w, -src0.y, src0.1, src0.0
  8:   BEGIN_TEX;
  9:   KIL temp[2];
 10:   src0.xyz = temp[1]
       MAD temp[1].x, src0.x__, src0.1__, src0.1__
 11: ENDLOOP;
 12: src0.xyz = input[0], src0.w = input[0] SEM_WAIT
     MAD color[0].xyz, src0.xyz, src0.111, src0.000
     MAD color[0].w, src0.w, src0.1, src0.0

however there is a random corruption in some pixels of the resulting image. While playing with RADEON_DEBUG=noopt and also different mesa versions, the test passes under some circumstances, however when I was looking at the differences, in fact there are should be no functionality change, when there is less optimizations and the instructions are shuffled a bit is passes, but the functionality (is/should be) the same for passing and failing circumstances.

I thought this could be some sort of synchronization issue and adding texture semaphore wait to flowcontrol instructions indeed fixes the test. The patch does it in less invasive way, the wait is only set for the first fc instruction after the KIL. Why it actually fixes things is unclear to me though. Note that the color[0] write already have the SEM_WAIT set (we have to always set it at the last instruction), so the pixel should get 100% killed before and the output should not get written.

Note that we already set ALU_WAIT for all flow control instructions so it already waits for ALU instructions, TEX instructions can't influence control flow directly (we can only do it through the aluresult register from ALU instructions), however the text semaphore is also already properly set for the ALU instructions that need the TEX results. In this regards the TEXKIL is the only TEX instruction that has really no synchronization, so I think it makes sense to add it there as well. The docs are unfortunately quiet about this, there is just a short paragraph in section 8.3.1 Synchronization of instruction streams of the R5xx docs, but nothing relevant.

@gawin thoughts?

Edited by Pavel Ondračka

Merge request reports