
radv: On GFX11, set DB_Z_INFO.NUM_SAMPLES to MSAA_EXPOSED_SAMPLES without Z/S

This case is a new addition in GFX11 (earlier architectures used other registers for the same purposes), and according to PAL, when no depth/stencil attachment is bound, it must be set to the number of coverage samples (the number of SampleMask bits — which is MSAA_EXPOSED_SAMPLES):

https://github.com/GPUOpen-Drivers/pal/blob/4640888b579bc9b0951c586b08a4552f71780d0d/src/core/hw/gfxip/gfx9/gfx9UniversalCmdBuffer.cpp#L6978

Without this change, the maximum of the depth/stencil and color sample counts is used. If there are no depth/stencil or color attachments (target-independent rasterization), the Depth Block therefore assumes 1 coverage sample. As a result, Primitive Ordered Pixel Shading (Fragment Shader Interlock) doesn't work correctly and fails the 4xAA fragment shader interlock CTS tests, and occlusion queries apparently don't count the correct number of samples (according to the "Sample Counting" section of the Vulkan specification, "the occlusion query sample counter increments by one for each sample with a coverage value of 1…").
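To make the difference concrete, here is a minimal sketch of the sample count selection for the case where no depth/stencil attachment is bound. It is a simplified model, not the radv code: the function names are hypothetical, and the log2 encoding of the register field is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the "no depth/stencil attachment bound" case only.
 * All names here are hypothetical; this is not the actual radv code. */

/* Before: the maximum attachment sample count is used. With no depth/stencil
 * bound, only color attachments contribute, and with no attachments at all
 * the result falls back to 1 sample. */
static uint32_t samples_without_zs_before(const uint32_t *color_samples, size_t color_count)
{
   uint32_t samples = 1;
   for (size_t i = 0; i < color_count; i++) {
      if (color_samples[i] > samples)
         samples = color_samples[i];
   }
   return samples;
}

/* After: the coverage sample count (the number of SampleMask bits) is used. */
static uint32_t samples_without_zs_after(uint32_t rasterization_samples)
{
   return rasterization_samples;
}

/* Assumption of this sketch: the register field stores log2 of the count. */
static uint32_t encode_num_samples(uint32_t samples)
{
   assert(samples != 0 && (samples & (samples - 1)) == 0); /* power of two */
   uint32_t log2_samples = 0;
   while (samples > 1) {
      samples >>= 1;
      log2_samples++;
   }
   return log2_samples;
}
```

With no attachments and 4x rasterization, the "before" path yields 1 sample while the "after" path yields 4. With a single 4x color attachment, both paths agree.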

This is also a part of !22250 (merged) (fragment shader interlock), but since it may help other parts of the pipeline too, it may be beneficial to merge this change separately.

Note that if any color or depth attachment is used in the subpass, the rasterization sample count must be max(color samples, depth samples) with VK_AMD_mixed_attachment_samples (see VUID-VkGraphicsPipelineCreateInfo-subpass-01505 in https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkGraphicsPipelineCreateInfo.html, as well as the stricter VUID-VkGraphicsPipelineCreateInfo-renderPass-06854 without the extension). So if there's any color attachment, the behavior will be the same as before.

The logic is slightly messy when secondary command buffers are involved, however. The condition can't be exactly the same as in primary command buffers: the image view is specified at playback time, not at recording time, so iview is always null in them. I'm using the closest approximation to tell whether there's likely no depth buffer: the depth/stencil attachment format being UNDEFINED. I'm not sure how accurate it is, though, so if you have better ideas (apparently you can bind a null depth/stencil image explicitly in the framebuffer even if the render pass has a depth/stencil attachment?), please post them in the comments. Still, this is better than having no handling at all, as was the case previously.
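The check described above can be sketched roughly as follows. The struct, enum and function names are hypothetical stand-ins rather than the radv or Vulkan definitions, and the single `ds_format` field is a simplification made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the actual radv or Vulkan definitions. */
enum sketch_format {
   SKETCH_FORMAT_UNDEFINED = 0,
   SKETCH_FORMAT_D32_SFLOAT,
};

struct sketch_render_state {
   bool is_secondary;            /* recording a secondary command buffer */
   const void *ds_iview;         /* depth/stencil image view; NULL in secondaries */
   enum sketch_format ds_format; /* depth/stencil attachment format */
};

/* Returns true if there is (likely) no depth/stencil buffer bound. */
static bool likely_no_depth_stencil(const struct sketch_render_state *state)
{
   assert(state != NULL);
   if (!state->is_secondary) {
      /* Primary: the image view is known at recording time, so the check is exact. */
      return state->ds_iview == NULL;
   }
   /* Secondary: the image view is unknown until playback, so approximate
    * with the attachment format. */
   return state->ds_format == SKETCH_FORMAT_UNDEFINED;
}
```

The two branches can disagree: a null image view combined with a defined format counts as "no depth/stencil" in a primary command buffer but not in a secondary one. That is the case the question about explicitly binding a null depth/stencil image is concerned with.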

It may also be useful to check if something similar needs to be applied to the radeonsi driver, where sctx->framebuffer.log_samples is currently used in this case, though I'm not familiar with OpenGL multisampling rules, so I'm a bit scared to enter it currently.

Unfortunately, the occlusion query and VRS interactions involved in this merge request don't seem to be covered by the CTS. The dEQP-VK.query_pool.occlusion_query.* tests (in the CTS version as of April 1, 2023) pass regardless of this change. The dEQP-VK.fragment_shading_rate.* tests take a long time, but I'll post the results when I have them. In !22250 (merged), however, half of the dEQP-VK.fragment_shader_interlock.* tests fail without this change (if running the tests with !22250 (merged), note that it already includes this commit).

Edited by Triang3l (Vitaliy Kuzmin)
