freedreno/a6xx: Fix mh31 intermittent faults

Rob Clark requested to merge robclark/mesa:fd/mh31-intermittent-faults into main

For a while now I've noticed intermittent iova faults in mh31. They are "harmless" in the sense that they don't cause any corruption, and if you weren't paying attention to dmesg you wouldn't even notice them.

Until now it had been kind of low priority, since we didn't really have a good way to debug it (a full cmdstream capture of something like mh31 is too much, and I never managed to find a simpler reproducer). But with the patchset for improved iova fault logging and devcoredumps on faults, I now have a good way to dig deeper into what is going on.

The fault always seems to come from a push-const load whose source address is in the last 8 dwords of a buffer, with the fault address at the start of the next page:

fault-info:
  - ttbr0=000000008f3b1000
  - iova=0000000114117000
  - dir=READ
  - type=TRANSLATION
  - source=HLSQ

... snip ...

                                group_id: 9
                                count: 16
                                addr: 0000000103e6db60
                                flags: 0
                                enable_mask: 0x7
0000000103e6db60:                                       0000: 70328003 01224000 14116e60 00000001 70328003 01224004 14107080 00000001
0000000103e6db80:                                       0020: 70348003 01324000 14116fe0 00000001 70348003 01324004 0be69240 00000001
                                                opcode: CP_LOAD_STATE6_GEOM (32) (4 dwords)
                                                        { DST_OFF = 0 | STATE_TYPE = ST6_CONSTANTS | STATE_SRC = SS6_INDIRECT | STATE_BLOCK = SB6_VS_SHADER | NUM_UNIT = 4 }
                                                        { EXT_SRC_ADDR = 0x14116e60 }
                                                        { EXT_SRC_ADDR_HI = 0x1 }
0000000103e6db60:                                               0000: 70328003 01224000 14116e60 00000001
                                                opcode: CP_LOAD_STATE6_GEOM (32) (4 dwords)
                                                        { DST_OFF = 4 | STATE_TYPE = ST6_CONSTANTS | STATE_SRC = SS6_INDIRECT | STATE_BLOCK = SB6_VS_SHADER | NUM_UNIT = 4 }
                                                        { EXT_SRC_ADDR = 0x14107080 }
                                                        { EXT_SRC_ADDR_HI = 0x1 }
0000000103e6db70:                                               0000: 70328003 01224004 14107080 00000001
                                                opcode: CP_LOAD_STATE6_FRAG (34) (4 dwords)
                                                        { DST_OFF = 0 | STATE_TYPE = ST6_CONSTANTS | STATE_SRC = SS6_INDIRECT | STATE_BLOCK = SB6_FS_SHADER | NUM_UNIT = 4 }
>>>>>                                                   { EXT_SRC_ADDR = 0x14116fe0 }
                                                        { EXT_SRC_ADDR_HI = 0x1 }
0000000103e6db80:                                               0000: 70348003 01324000 14116fe0 00000001
                                                opcode: CP_LOAD_STATE6_FRAG (34) (4 dwords)
                                                        { DST_OFF = 4 | STATE_TYPE = ST6_CONSTANTS | STATE_SRC = SS6_INDIRECT | STATE_BLOCK = SB6_FS_SHADER | NUM_UNIT = 4 }
                                                        { EXT_SRC_ADDR = 0xbe69240 }
                                                        { EXT_SRC_ADDR_HI = 0x1 }
0000000103e6db90:                                               0000: 70348003 01324004 0be69240 00000001
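
To make the arithmetic in the dump explicit, here is a rough sketch of the check I did by hand. It assumes 4 KiB pages and that each NUM_UNIT for ST6_CONSTANTS is one vec4 (16 bytes); the helper names are made up for illustration and are not driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions for this sketch, not taken from the driver: */
#define SKETCH_PAGE_SIZE 4096u /* 4 KiB GPU pages */
#define SKETCH_VEC4_SIZE 16u   /* one constant unit = 4 dwords */

/* Bytes left between iova and the end of the page that contains it. */
static uint64_t
sketch_bytes_to_page_end(uint64_t iova)
{
   return SKETCH_PAGE_SIZE - (iova & (SKETCH_PAGE_SIZE - 1));
}

/* First address of the page following the one that contains iova. */
static uint64_t
sketch_next_page(uint64_t iova)
{
   return (iova & ~(uint64_t)(SKETCH_PAGE_SIZE - 1)) + SKETCH_PAGE_SIZE;
}

/* True if reading num_unit vec4s starting at iova runs past the page end. */
static bool
sketch_load_crosses_page(uint64_t iova, uint32_t num_unit)
{
   return (uint64_t)num_unit * SKETCH_VEC4_SIZE > sketch_bytes_to_page_end(iova);
}
```

For the flagged load (EXT_SRC_ADDR_HI:EXT_SRC_ADDR = 0x114116fe0, NUM_UNIT = 4) this gives 32 bytes, i.e. 8 dwords, left in the page, a 64-byte read that crosses the boundary, and a next-page address of 0x114117000, which matches the iova in fault-info. The two VS loads in the same group stay well inside their pages.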

I haven't managed to reproduce this yet with something simpler, like computerator, although that could be down to a difference between SDS and CP_LOAD_STATE directly in IB2.

Changing the constbuf alignment to 64 bytes avoids the possibility that fewer than 16 dwords remain when constants are uploaded, and reliably fixes the iova faults.
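
A small sketch of why the alignment matters, again assuming 4 KiB pages: for a power-of-two alignment, the smallest tail that can remain between an aligned offset and the end of the page is the alignment itself. The helpers below are hypothetical and only illustrate the reasoning; they are not the code in this MR.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Round offset up to a power-of-two alignment (in bytes). */
static uint32_t
sketch_align_up(uint32_t offset, uint32_t alignment)
{
   return (offset + alignment - 1) & ~(alignment - 1);
}

/* Smallest number of bytes left in a page after any aligned offset. */
static uint32_t
sketch_min_tail_bytes(uint32_t alignment)
{
   uint32_t min_tail = SKETCH_PAGE_SIZE;

   for (uint32_t off = 0; off < SKETCH_PAGE_SIZE; off += alignment) {
      uint32_t tail = SKETCH_PAGE_SIZE - off;
      if (tail < min_tail)
         min_tail = tail;
   }

   return min_tail;
}
```

With a 64-byte alignment the minimum tail is 64 bytes, which is 16 dwords. The page offset of the faulting load, 0xfe0, is 32-byte aligned but not 64-byte aligned, and rounding it up to 64 would give 0x1000.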