Vulkan context lost when an imported PRIME buffer is used as a framebuffer
I have an AMD iGPU and an AMD dGPU. Only the connectors on the dGPU are used.
I've been testing the following setup:
- Use the iGPU as the main render device in the Wayland compositor.
- Run a Vulkan application with DRI_PRIME=1.
- Fullscreen that application and then unfullscreen it.
Upon unfullscreening, the compositor's Vulkan context on the render device (the iGPU) is lost.
When the application is fullscreen, the compositor sends it dmabuf feedback containing a scanout tranche. The application uses this to successfully allocate a buffer for direct scanout:
[2024-02-18T21:06:05.822Z DEBUG jay::backends::metal::video] Enabling direct scanout on HDMI-A-1
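To make the feedback step concrete, here is a toy model of how a client picks a tranche from linux-dmabuf feedback. It is a sketch under stated assumptions, not the Mesa or jay code: the `Tranche` class, `pick_tranche`, the device labels and the single XRGB8888/linear format pair are all hypothetical example values (the real protocol sends a `dev_t` and indices into a format table). It only shows that tranches arrive in decreasing order of preference, so a client that can satisfy the scanout tranche ends up allocating for the scanout device (here the dGPU).

```python
from dataclasses import dataclass

SCANOUT_FLAG = 1  # value of the "scanout" tranche flag in linux-dmabuf feedback


def fourcc(code: str) -> int:
    """Pack a 4-character DRM format code (e.g. "XR24") into its integer value."""
    a, b, c, d = (ord(ch) for ch in code)
    return a | (b << 8) | (c << 16) | (d << 24)


DRM_FORMAT_XRGB8888 = fourcc("XR24")
DRM_FORMAT_MOD_LINEAR = 0


@dataclass(frozen=True)
class Tranche:
    target_device: str  # hypothetical label; the protocol sends a dev_t
    flags: int
    formats: frozenset  # set of (fourcc, modifier) pairs


def pick_tranche(tranches, client_supported):
    """Return (tranche, usable pairs) for the first tranche the client can satisfy.

    Tranches are sent in decreasing order of preference, so the first match wins.
    """
    for tranche in tranches:
        usable = tranche.formats & client_supported
        if usable:
            return tranche, usable
    return None, frozenset()


# Hypothetical feedback while the window is fullscreen on a dGPU connector.
pair = (DRM_FORMAT_XRGB8888, DRM_FORMAT_MOD_LINEAR)
feedback = [
    Tranche("dGPU (HDMI-A-1)", SCANOUT_FLAG, frozenset({pair})),
    Tranche("iGPU (render)", 0, frozenset({pair})),
]
client = frozenset({pair})

chosen, _ = pick_tranche(feedback, client)
print(chosen.target_device, bool(chosen.flags & SCANOUT_FLAG))
```

In this model nothing stops the compositor from also importing the same buffer on its render device (the iGPU), which is the situation described next.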
However, at this point the buffer (allocated on the dGPU due to DRI_PRIME=1) has already been imported into the iGPU. It seems likely that either ADDFB2 (on the dGPU) should fail in this case, or this situation should be handled transparently when the buffer is later used on the iGPU.
When fullscreen is exited, the compositor samples the same texture on the iGPU. At that point the buffer is likely still being scanned out, and the sampling causes the context to be lost:
[2024-02-18T21:06:06.952Z DEBUG jay::backends::metal::video] Disabling direct scanout on HDMI-A-1
[2024-02-18T21:06:06.952Z INFO jay::gfx_apis::vulkan::instance] VULKAN: vkQueueSubmit() failed (VK_ERROR_DEVICE_LOST) (../mesa-23.3.5/src/amd/vulkan/radv_queue.c:1718)
The kernel log contains the following messages at around the same time:
amdgpu 0000:14:00.0: [drm:drm_ioctl] comm="jay" pid=1381136, dev=0xe281, auth=0, AMDGPU_CS
[drm:amdgpu_cs_ioctl [amdgpu]] Failed to process the buffer list -22!
amdgpu 0000:14:00.0: [drm:drm_ioctl] comm="jay", pid=1381136, ret=-22
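The `-22` / `ret=-22` in these lines is a negated errno value, as is usual for kernel ioctl returns. A small helper (hypothetical, for illustration only) decodes it on Linux:

```python
import errno
import os


def decode_kernel_ret(ret: int) -> str:
    """Map a negative kernel return code such as -22 to its errno name and message."""
    code = -ret
    return f"{errno.errorcode[code]}: {os.strerror(code)}"


print(decode_kernel_ret(-22))  # on Linux: EINVAL: Invalid argument
```

So the AMDGPU_CS submission is rejected with EINVAL while the buffer list is being processed, which matches the VK_ERROR_DEVICE_LOST reported by RADV.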
System information
- OS: Arch
- GPU:
  - Navi 23 [Radeon RX 6600/6600 XT/6600M] [1002:73ff] (rev c7) (dGPU)
  - Raphael [1002:164e] (rev c9) (iGPU)
- Kernel version: 6.7.3
- Mesa version: 23.3.5