Skip to content

r600: Make SFN and asm callable outside Gallium (in Vulkan)

This merge request makes it possible to build and use SFN outside Gallium — specifically in the Vulkan driver in the future. While it still doesn’t separate those parts into a common library in the amd directory, it eliminates all link-time dependencies on Gallium, as well as some (but very far from all) header dependencies. On my branch, these changes are sufficient for the Vulkan driver to be able to invoke the shader compiler and obtain the shader binary (if those parts are also built as a separate static library).

Though this merge request mostly doesn’t introduce any functional changes, nothing in the upstream requires this, and no methods of enforcing that Gallium independence is maintained in the future are created in it, the idea of this merge request is to make rebasing the Vulkan branch easier in the future, since lately various changes have been made to the parts modified here, and this has already been causing issues — during one of the rebases, I missed the change related to vertex buffer strides because I moved r600_create_vertex_fetch_shader to a different file. As the work continues, there will be lots of changes to the lowerings of course (most importantly for resource binding, passing of various internal constants), but I expect the overall structure of the SFN calls to stay mostly the same, so it will be easy to rebase those details if the high-level interface changes are merged early.

I intentionally want the Vulkan driver to be separate from the Gallium R600 driver and Gallium overall, just like the Vulkan drivers for other GPUs in Mesa — not to try to get the Lavapipe frontend to work on top of R600g — for many reasons. I already have a system for state management that takes into account that the radeon kernel driver accepts only one indirect buffer per submission, that has its own fine-grained, close to metal “atoms”, and tries to convert both pipeline state objects and dynamic state to registers as directly as possible, so I don’t want the CPU performance impact of both Gallium and R600g here (which in my gaming experience seems prohibitively high compared to what I remember from playing games on Windows in the late 2000s on an even slower CPU, although I haven’t profiled it yet so I’m not sure if R600g is to blame). In addition, I see some very questionable likely temporary bug workarounds such as flushing the indirect buffer before any compute dispatches, the driver appearing to be very unstable with virtual memory, and I want to avoid introducing those issues in the first place. Plus, while R600g offers pretty good OpenGL support on R6xx/R7xx, there’s no way even basic Vulkan can work on those GPUs as it strictly requires Direct3D 11 compute functionality (most importantly RATs), and I have no plans to make a VC4-like minefield of a subset implementation for R6xx/R7xx as that’d be even more pointless than my idea of a Vulkan driver for R8xx/R9xx itself, so I want to avoid the maintenance costs of having R6xx/R7xx paths in the Vulkan code.

I have tried to remove some of the dependencies on the Gallium header files, though I quickly realized that it was going to require a huge amount of work and that it was still too early as that would also involve adding binding and constant buffer lowerings, so that’s largely out of scope of this merge request. One of the header dependencies — pipe_stream_output_info — was even kept intentionally for now for a reason outside R600g: the Nine and D3D10UMD frontends don’t seem to be setting up nir_xfb_info currently (maybe they’re even broken on RadeonSI here now), so it’s possible that pipe_stream_output_info will be used for a while even in the Vulkan driver itself unless some unification in this area is done across the Mesa infrastructure before I get to transform feedback.

However, with this change, SFN doesn’t have any link-time dependencies on Gallium functions and variables anymore. That was something that couldn’t have been avoided in a clean way, because my Vulkan driver is placed in the amd directory, but since RadeonSI and R600g depend on it, it’s processed before Gallium — including before its auxiliary parts. Thankfully, there were just two functions that needed to be removed.

In this merge request, two kinds of commits are included:

  • Various small changes moving things around:
    • Placing large parts of r600_shader_from_nir that interact purely with NIR and SFN, but not with Gallium, in separate functions.
    • Separation of r600_shader into largely hardware-related parts (r600_shader_common.h) and Gallium-specific fields (r600_shader.h).
    • Moving the tiny remaining Gallium dependencies from r600_asm and r600_isa to other files.
    • Some minor changes, such as making it possible to build SFN without having the R600g directory in the include search paths.
  • A large commit not just moving the call to tgsi_get_gl_varying_semantic (that requires linking to the Gallium auxiliary library), but completely removing the use of TGSI semantics from SFN and most of R600g, in favor of what’s offered by the common compiler/shader_enums.h file: gl_varying_slot, gl_system_value, and gl_frag_result. Those mostly correspond 1:1 to TGSI semantics, so the conversion of them into the TGSI_SEMANTIC enumeration is redundant.

Most of these changes don’t introduce any changes to the behavior of the driver. The switch from TGSI_SEMANTIC to gl_varying_slot and friends though is slightly more than a purely stylistic change. The hardware semantic IDs are simply gl_varying_slot plus 1 now, but those are just arbitrary 8-bit names if I understand correctly, so that shouldn’t change any logic.

However, I removed the ps_prim_id_input/prim_id_out field and its override_spi_sid in vertex shader outputs. As far as I understand, the only place where the primitive ID SPI semantic is configured is SFN — and there it’s always calculated as usual for TGSI_SEMANTIC_PRIMID 0, so there’s no way it can be different from that currently. So I just erased it (because I don’t use SPI semantic IDs within SFN itself anymore, only calculating them to fill the r600_shader), with the assumption that the SPI SID for the primitive ID is always calculated as normal by SFN for VARYING_SLOT_PRIMITIVE_ID.

Additionally, I cleaned up how the SPI_VS_OUT_ID registers are configured, which define how export parameter IDs are mapped to SPI semantic IDs. Previously, there were two places that calculated the export parameter IDs:

  • r600/evergreen_update_vs_state, which considers an output a parameter (and increments the ID) if it has a SPI semantic ID.
  • Shader::scan_shader in SFN, that considers an output a parameter if set_is_param was called for it.

The sequences of parameter indices must match exactly between the shader’s export instructions and SPI_VS_OUT_ID. The previous code, however, made some implicit assumptions about certain outputs having both spi_sid and is_param, and the other ones not having any of the two. But that seemed pretty dangerous to me, and now because the switch from tgsi_semantic to gl_varying_slot might have introduced some changes to which outputs get a SPI SID and which don’t (though most likely that was unaffected in reality), I thought it would’ve been better to remove the two separate conditions and instead centralize the assignment of the export parameter IDs: now Shader::scan_shader calculates the export_param field, and both export instruction emission (with an assertion) and SPI_VS_OUT_ID explicitly use that index.

There may be some similar implicit assumptions on the pixel shader side, but that’s more complicated (including having differences between R6xx/R7xx and R8xx/R9xx, and I don’t have any devices from the former generation for testing), so that part was not in the scope of this merge request, however.

I haven’t been able to test this change properly so far unfortunately, but I made an attempt, and though the results were horrible, they were the same regardless of whether these changes were applied or not, so I think they shouldn’t introduce any regressions. My development PC currently has 3 GB of RAM (I’ll upgrade it in a week), and building Piglit on multiple cores always triggered the OOM killer at some point, so apparently some of the tests weren’t built, and for many I was getting a “permission denied” error when Piglit attempted to execute them. So there were lots of failures when I tried running the command that Gert Wollny usually runs that can be attributed to issues with the test suite itself rather than the driver. At one point I got a GPU hang, however, both with and without the changes, so that’s where the testing ended for me. Nevertheless, regardless of whether this merge request is applied, I had 502 failures and a few crashes on the Radeon HD 6850 (R8xx Barts) before that hang. Interestingly, Mesa 23.1.4 from my distribution’s repository didn’t trigger that hang, and produced a lot fewer failures if I recall correctly.

With the switch away from TGSI semantics, there were also changes to how inputs and outputs are printed in the disassembly and parsed in the tests. I updated the SFN tests accordingly and ran all of them, all passed successfully.

Merge request reports