drm_buddy clear page tracking sometimes returns non-zeroed pages
Hi,
there have been user reports of several vkd3d-proton titles exhibiting issues on launch, ranging from misrenders to GPU hangs, for example:
mesa/mesa#11964 (closed)
https://github.com/HansKristian-Work/vkd3d-proton/issues/2223
Debugging has shown that these issues stem from "zeroed" memory containing 0xFFFFFFFF instead of 0x0. vkd3d-proton has a driconf that enables radv_zero_vram, which causes RADV to allocate all memory with AMDGPU_GEM_CREATE_VRAM_CLEARED. Since the kernel is responsible for honoring that flag, buffers containing 0xFFFFFFFF point to a kernel bug.
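For anyone who wants to check a buffer by hand, the following is a minimal sketch of the kind of scan that exposes the problem. It assumes you already have a CPU-visible mapping of a buffer that was allocated with AMDGPU_GEM_CREATE_VRAM_CLEARED; how that mapping is obtained is not shown. The helper name first_nonzero_dword is hypothetical and does not come from RADV, vkd3d-proton, or the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of RADV, vkd3d-proton, or the kernel).
 * Scans `count` dwords of a CPU-visible mapping and returns the index of
 * the first non-zero dword, or `count` if every dword is zero.
 * The mapping itself is assumed to be provided by the caller.
 */
static size_t first_nonzero_dword(const volatile uint32_t *map, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (map[i] != 0)
            return i;
    }
    return count;
}
```

On a freshly allocated buffer that was requested as cleared, a return value other than count would correspond to the behaviour described above, for example a dword reading 0xFFFFFFFF.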
This bug only started appearing with commit 96950929eb232038 ("drm/buddy: Implement tracking clear page feature") and commit a68c7eaa7a8ffde ("drm/amdgpu: Enable clear page functionality"). A kernel built at a68c7eaa7a8ffde exhibits the issue, while one built at commit 105aa4c65b76 (the parent commit of 96950929eb232038) does not.
The issue reproduces on a PRIME system with a 6650 XT and a Raphael iGPU. However, it does not seem to reproduce everywhere: I tested on a 7900XTX and on a 7600S/680M PRIME system and was unable to reproduce it on either machine. The user in the vkd3d-proton issue reports that it happened on their 7900GRE.
/cc @arunpravin24 as the author of the clear page functionality.