    intel/compiler/fs: Implement ddy without using align16 for Gen11+ · 2134ea38
    Matt Turner authored
    Align16 is no more. We previously generated an align16 ADD instruction
    to calculate DDY:
    
       add(16) g25<1>F  -g23<4>.xyxyF   g23<4>.zwzwF   { align16 1H };
    
    Without align16, we now implement it as:
    
       add(4) g25<1>F   -g23<0,2,1>F    g23.2<0,2,1>F  { align1 1N };
       add(4) g25.4<1>F -g23.4<0,2,1>F  g23.6<0,2,1>F  { align1 1N };
       add(4) g26<1>F   -g24<0,2,1>F    g24.2<0,2,1>F  { align1 1N };
       add(4) g26.4<1>F -g24.4<0,2,1>F  g24.6<0,2,1>F  { align1 1N };
    
    where only the first two instructions are needed in SIMD8 mode.
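To illustrate why the four add(4) instructions produce the same values as the align16 form, here is a minimal host-side sketch. It is not the generator code: `Grf16`, `region_index`, `ddy_align16`, and `ddy_align1` are hypothetical names, and the model assumes g23/g24 are laid out as 16 consecutive floats with one 2x2 subspan per group of four.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model: 16 consecutive floats standing in for g23 and g24
// (8 floats per register), one 2x2 subspan per group of four elements.
using Grf16 = std::array<float, 16>;

// Element read by execution channel `ch` of an align1 region
// <vstride,width,hstride> whose first element is `start`.
std::size_t region_index(std::size_t start, std::size_t vstride,
                         std::size_t width, std::size_t hstride,
                         std::size_t ch) {
    assert(width > 0);
    return start + (ch / width) * vstride + (ch % width) * hstride;
}

// Align16 form: for each group of four, dst.xyzw = -src.xyxy + src.zwzw.
Grf16 ddy_align16(const Grf16 &src) {
    static const std::size_t xyxy[4] = {0, 1, 0, 1};
    static const std::size_t zwzw[4] = {2, 3, 2, 3};
    Grf16 dst{};
    for (std::size_t base = 0; base < 16; base += 4)
        for (std::size_t c = 0; c < 4; ++c)
            dst[base + c] = -src[base + xyxy[c]] + src[base + zwzw[c]];
    return dst;
}

// Align1 form: one add(4) per subspan, both sources using region <0,2,1>.
// src0 starts at the subspan base, src1 two elements later (e.g. g23.2).
Grf16 ddy_align1(const Grf16 &src) {
    Grf16 dst{};
    for (std::size_t base = 0; base < 16; base += 4)
        for (std::size_t ch = 0; ch < 4; ++ch) {
            const float src0 = src[region_index(base, 0, 2, 1, ch)];
            const float src1 = src[region_index(base + 2, 0, 2, 1, ch)];
            dst[base + ch] = -src0 + src1;
        }
    return dst;
}
```

With a vertical stride of 0 and a width of 2, channels 0..3 read elements 0, 1, 0, 1 of the region, which is the same selection the `.xyxy` swizzle made; starting the second source two elements later reproduces `.zwzw`.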
    
    Note: an earlier version of the patch implemented this in two
    instructions in SIMD16:
    
       add(8) g25<2>F   -g23<4,2,0>F    g23.2<4,2,0>F  { align1 1N };
       add(8) g25.1<2>F -g23.1<4,2,0>F  g23.3<4,2,0>F  { align1 1N };
    
    but I realized that the channel enable bits would not be correct: with a
    destination stride of 2, execution channel N no longer writes pixel N. If
    we knew we were under uniform control flow, we could emit only those two
    instructions.
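The channel-enable problem can be sketched with a small host-side model. This is an illustration under stated assumptions, not hardware-accurate code: `ExecMask`, `ddy_per_subspan`, and `ddy_two_add8` are hypothetical names, one mask bit is assumed per pixel, and both add(8) instructions are assumed to be gated by mask bits 0..7, as the `{ align1 1N }` annotation in the listing suggests.

```cpp
#include <array>
#include <bitset>
#include <cstddef>

using Grf16 = std::array<float, 16>;
using ExecMask = std::bitset<16>;  // assumed: bit p enables pixel p

// Reference behaviour: pixel p is written only if mask bit p is set.
Grf16 ddy_per_subspan(const Grf16 &src, Grf16 dst, const ExecMask &mask) {
    for (std::size_t base = 0; base < 16; base += 4)
        for (std::size_t ch = 0; ch < 4; ++ch) {
            const std::size_t p = base + ch;
            if (mask[p])
                dst[p] = -src[base + ch % 2] + src[base + 2 + ch % 2];
        }
    return dst;
}

// Two add(8) instructions with destination stride 2 and source region
// <4,2,0>. Execution channel ch writes pixel half + 2*ch, but is gated by
// mask bit ch (assumption: both instructions use mask bits 0..7).
Grf16 ddy_two_add8(const Grf16 &src, Grf16 dst, const ExecMask &mask) {
    for (std::size_t half = 0; half < 2; ++half)
        for (std::size_t ch = 0; ch < 8; ++ch) {
            if (!mask[ch])
                continue;
            const std::size_t s = half + (ch / 2) * 4;  // <4,2,0>: hstride 0
            dst[half + 2 * ch] = -src[s] + src[s + 2];
        }
    return dst;
}
```

Under this model the two variants agree when every channel is enabled, which matches the uniform-control-flow remark. With only pixels 0..7 enabled, the two-instruction variant also overwrites pixels 8..15, because the bit that gates a write no longer belongs to the pixel being written.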

    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
brw_fs_generator.cpp 81.3 KB