mesa issues
https://gitlab.freedesktop.org/mesa/mesa/-/issues
Updated: 2023-06-01T09:35:15Z

Issue #7225: removing tgsi to llvm compiler
https://gitlab.freedesktop.org/mesa/mesa/-/issues/7225
Updated: 2023-06-01T09:35:15Z
Author: Yonggang Luo

rationale:
anholt lygstate: as far as I know, the only blocker is that `lower_int_to_float` causes array indexing to be floats, which `gallivm_nir` can't understand. But I think we should be able to pretty easily handle that by looking at `options->no_integers` in the `gallivm_nir` frontend.

Issue #6307: nir_to_tgsi: Skip FLRs before passing to ARL, detect and emit ARR
https://gitlab.freedesktop.org/mesa/mesa/-/issues/6307
Updated: 2022-04-12T00:57:07Z
Author: Emma Anholt (emma@anholt.net)

For drivers using NTT without `CAP_INTEGERS`, we should detect when the `ARL` argument is an SSA `ffloor` and skip the `ffloor` (`ARL` already floors its operand), and detect when the `ARL` argument is an SSA `fround_even` and emit an `ARR` of the source.

Issue #6096: nir_to_tgsi: Fold comparisons
https://gitlab.freedesktop.org/mesa/mesa/-/issues/6096
Updated: 2022-03-03T20:09:21Z
Author: Emma Anholt (emma@anholt.net)

Right now if you have `if a != 0 { ... }` in your shader, we'll get a TGSI `SNE b, a, 0; IF b` sequence instead of `IF a`.
Fix:
- At `nir_if` emit time, look at the SSA source of `b` and fold comparisons into the `IF` when possible.
- Before ntt's RA, do a DCE pass on ALU instructions writing a dst that's never read.
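To make the two steps concrete, here is a minimal sketch over a hypothetical toy instruction list (plain Python, not ntt's real data structures; `Inst`, `fold_if_comparisons` and `dce` are illustrative names). It assumes SSA-like temps, as ntt sees them before RA, and that TGSI's `IF` branches when its source is nonzero, which is why only `SNE x, 0` is folded here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Union

Operand = Union[str, float]  # temp name or float immediate


@dataclass
class Inst:
    op: str                 # TGSI-style opcode, e.g. "SNE", "IF", "ENDIF"
    dst: Optional[str]      # destination temp; None for control flow
    srcs: List[Operand] = field(default_factory=list)


def fold_if_comparisons(prog: List[Inst]) -> List[Inst]:
    """Rewrite `SNE b, a, 0; IF b` as `IF a` (toy model).

    Assumes each temp is written exactly once (SSA-like), as it is
    before ntt's register allocation.
    """
    defs = {i.dst: i for i in prog if i.dst is not None}
    for inst in prog:
        if inst.op != "IF" or not isinstance(inst.srcs[0], str):
            continue
        cond = defs.get(inst.srcs[0])
        # IF already tests "src != 0", so comparing against 0 first is redundant.
        if cond is not None and cond.op == "SNE" and cond.srcs[1] == 0.0:
            inst.srcs = [cond.srcs[0]]
    return prog


def dce(prog: List[Inst], live_out: Set[str]) -> List[Inst]:
    """Repeatedly drop instructions whose dst is never read and is not an output."""
    while True:
        read = set(live_out)
        read.update(s for i in prog for s in i.srcs if isinstance(s, str))
        kept = [i for i in prog if i.dst is None or i.dst in read]
        if len(kept) == len(prog):
            return kept
        prog = kept
```

The DCE step is what makes the fold pay off: once `IF` reads `a` directly, the `SNE` has no remaining readers and is dropped, unless something else (for example a shader output) still reads `b`.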
Similar folding would also be useful for `bcsel`, `fcsel` and `discard_if`.

Issue #6089: Eliminate more redundant operations on vectored platforms
https://gitlab.freedesktop.org/mesa/mesa/-/issues/6089
Updated: 2022-03-01T02:27:08Z
Author: Ian Romanick

(NOTE: I only tagged TGSI because I think this enhancement would most likely help platforms using NIR-to-TGSI.)
While trying to investigate a solution for #6038, I noticed this NIR in the output of fs-temp-array-mat4-index-col-row-wr.shader_test on my R430.
```
vec3 32 ssa_10 = load_const (0x3f800000, 0x40000000, 0x40000000) = (1.000000, 2.000000, 2.000000)
vec3 1 ssa_11 = flt ssa_9.xxx, ssa_10
...
vec4 32 ssa_16 = load_const (0x40000000, 0x40000000, 0x40000000, 0x40000000) = (2.000000, 2.000000, 2.000000, 2.000000)
vec4 1 ssa_17 = flt ssa_9.xxxx, ssa_16
vec4 32 ssa_18 = bcsel ssa_17, ssa_1, ssa_8
...
vec4 32 ssa_22 = bcsel ssa_11.xxxx, ssa_1, ssa_12
...
vec4 32 ssa_28 = bcsel ssa_11.zzzz, ssa_22, ssa_18
```
Ideally, this should get reduced to
```
vec2 32 ssa_10 = load_const (0x3f800000, 0x40000000) = (1.000000, 2.000000)
vec2 1 ssa_11 = flt ssa_9.xx, ssa_10
...
vec4 32 ssa_22 = bcsel ssa_11.xxxx, ssa_1, ssa_12
...
vec4 32 ssa_28 = bcsel ssa_11.yyyy, ssa_22, ssa_8
```
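Assuming the elided `...` lines don't redefine any of these values, one way to sanity-check that this reduction preserves behavior is a small model (hypothetical Python, not Mesa code) that evaluates both listings for several values of `ssa_9.x`, with opaque stand-ins for `ssa_1`, `ssa_8` and `ssa_12`.

```python
# Toy model of the NIR ALU ops above: vectors are tuples, booleans are bools.
def flt(a, b):
    return tuple(x < y for x, y in zip(a, b))


def bcsel(cond, a, b):
    # Per-component select, like NIR's bcsel.
    return tuple(ai if c else bi for c, ai, bi in zip(cond, a, b))


def swz(v, s):
    return tuple(v["xyzw".index(c)] for c in s)


def original(x, ssa_1, ssa_8, ssa_12):
    ssa_9 = (x,)  # only ssa_9.x is read
    ssa_11 = flt(swz(ssa_9, "xxx"), (1.0, 2.0, 2.0))
    ssa_17 = flt(swz(ssa_9, "xxxx"), (2.0, 2.0, 2.0, 2.0))
    ssa_18 = bcsel(ssa_17, ssa_1, ssa_8)
    ssa_22 = bcsel(swz(ssa_11, "xxxx"), ssa_1, ssa_12)
    return bcsel(swz(ssa_11, "zzzz"), ssa_22, ssa_18)  # ssa_28


def reduced(x, ssa_1, ssa_8, ssa_12):
    ssa_9 = (x,)
    ssa_11 = flt(swz(ssa_9, "xx"), (1.0, 2.0))
    ssa_22 = bcsel(swz(ssa_11, "xxxx"), ssa_1, ssa_12)
    return bcsel(swz(ssa_11, "yyyy"), ssa_22, ssa_8)  # ssa_28
```

The key observation is that `ssa_18` is only selected when `x < 2.0` is false, and in that case it can only evaluate to `ssa_8`, so the second comparison and its `bcsel` are redundant.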
I _think_ this can be achieved without too much difficulty. It seems like adding a pass that tries to narrow vector operations would be the most important thing. That would perform a first reduction to
```
vec2 32 ssa_10 = load_const (0x3f800000, 0x40000000) = (1.000000, 2.000000)
vec2 1 ssa_11 = flt ssa_9.xx, ssa_10
...
vec1 32 ssa_16 = load_const (0x40000000) = (2.000000)
vec1 1 ssa_17 = flt ssa_9.x, ssa_16
vec4 32 ssa_18 = bcsel ssa_17.xxxx, ssa_1, ssa_8
...
vec4 32 ssa_22 = bcsel ssa_11.xxxx, ssa_1, ssa_12
...
vec4 32 ssa_28 = bcsel ssa_11.yyyy, ssa_22, ssa_18
```
A good first step of this would probably be to just narrow constants. Then it should be easy to detect redundant channel operations in something like
```
vec2 32 ssa_10 = load_const (0x3f800000, 0x40000000) = (1.000000, 2.000000)
vec3 1 ssa_11 = flt ssa_9.xxx, ssa_10.xyy
...
```
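The constant-narrowing step could look roughly like the following hypothetical helper (the name and interface are illustrative, not an existing NIR function). It compares raw bit patterns, as printed in the NIR dump, rather than float values, so `0.0` and `-0.0` stay distinct. Applied to the original `ssa_10` and its implicit `xyz` use, it yields the `vec2` constant and `xyy` swizzle shown above.

```python
SWIZZLE = "xyzw"


def narrow_const(bits, swizzles):
    """Narrow a load_const to its distinct, actually-read channels.

    bits:     raw component bit patterns, e.g. (0x3f800000, 0x40000000, 0x40000000)
    swizzles: how each user reads the constant, e.g. ["xyz"]
    Returns the narrowed components and the rewritten swizzles.
    """
    used = sorted({SWIZZLE.index(c) for s in swizzles for c in s})
    unique, remap = [], {}
    for i in used:
        if bits[i] not in unique:
            unique.append(bits[i])
        remap[i] = unique.index(bits[i])  # old channel -> new channel
    new_swizzles = ["".join(SWIZZLE[remap[SWIZZLE.index(c)]] for c in s)
                    for s in swizzles]
    return tuple(unique), new_swizzles
```

The same helper collapses `ssa_16` (four copies of `0x40000000`) to a single channel read as `xxxx`, which is the scalar `ssa_16` in the first-reduction listing.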
Then, possibly with some enhancements, `nir_opt_vectorize` (and another run of the narrowing pass) could reduce it to
```
vec2 32 ssa_10 = load_const (0x3f800000, 0x40000000) = (1.000000, 2.000000)
vec2 1 ssa_11 = flt ssa_9.xx, ssa_10
...
vec4 32 ssa_18 = bcsel ssa_11.yyyy, ssa_1, ssa_8
...
vec4 32 ssa_22 = bcsel ssa_11.xxxx, ssa_1, ssa_12
...
vec4 32 ssa_28 = bcsel ssa_11.yyyy, ssa_22, ssa_18
```
Finally, an obvious algebraic optimization would take care of the rest.
```
# In the innermost bcsel, 'a' must be false.
(('bcsel', a, b, ('bcsel', c, ('bcsel', a, d, e), 'f')),
('bcsel', a, b, ('bcsel', c, e , 'f'))),
```
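Because `bcsel` selects per component, a rule like this can be checked by brute force over every combination of its boolean conditions, treating the data operands as opaque. The sketch below is plain Python, independent of `nir_algebraic`. It also checks the shorter two-level form `bcsel(a, b, bcsel(a, d, e))` → `bcsel(a, b, e)`, which appears to be the shape `ssa_28` and `ssa_18` take in the listing above (`a` = `ssa_11.yyyy`).

```python
from itertools import product


def bcsel(cond, x, y):
    return x if cond else y


def nested_rule_holds():
    # bcsel(a, b, bcsel(c, bcsel(a, d, e), f)) -> bcsel(a, b, bcsel(c, e, f))
    return all(
        bcsel(a, "b", bcsel(c, bcsel(a, "d", "e"), "f"))
        == bcsel(a, "b", bcsel(c, "e", "f"))
        for a, c in product([False, True], repeat=2)
    )


def two_level_rule_holds():
    # bcsel(a, b, bcsel(a, d, e)) -> bcsel(a, b, e)
    return all(
        bcsel(a, "b", bcsel(a, "d", "e")) == bcsel(a, "b", "e")
        for a in (False, True)
    )
```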
It might also simplify things to have a pass that tries to detect scalar constants that are subsets of existing vector constants. That would allow an intermediate step that converts
```
vec2 32 ssa_10 = load_const (0x3f800000, 0x40000000) = (1.000000, 2.000000)
vec2 1 ssa_11 = flt ssa_9.xx, ssa_10
...
vec1 32 ssa_16 = load_const (0x40000000) = (2.000000)
vec1 1 ssa_17 = flt ssa_9.x, ssa_16
vec4 32 ssa_18 = bcsel ssa_17.xxxx, ssa_1, ssa_8
...
vec4 32 ssa_22 = bcsel ssa_11.xxxx, ssa_1, ssa_12
...
vec4 32 ssa_28 = bcsel ssa_11.yyyy, ssa_22, ssa_18
```
into
```
vec2 32 ssa_10 = load_const (0x3f800000, 0x40000000) = (1.000000, 2.000000)
vec2 1 ssa_11 = flt ssa_9.xx, ssa_10
...
vec1 1 ssa_17 = flt ssa_9.x, ssa_10.y
vec4 32 ssa_18 = bcsel ssa_17.xxxx, ssa_1, ssa_8
...
vec4 32 ssa_22 = bcsel ssa_11.xxxx, ssa_1, ssa_12
...
vec4 32 ssa_28 = bcsel ssa_11.yyyy, ssa_22, ssa_18
```
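The lookup at the heart of this step might be sketched as below. This is hypothetical Python keyed on raw bit patterns; `find_in_vector_consts` and its dictionary argument are illustrative, not NIR API. For `ssa_16` (`0x40000000`) it finds `ssa_10.y`, which is the rewrite shown above. A real pass would also need matching bit sizes and would have to check that the vector constant's definition dominates the scalar's uses; the sketch ignores both.

```python
SWIZZLE = "xyzw"


def find_in_vector_consts(scalar_bits, vector_consts):
    """Return (ssa_name, component) of an existing vector constant holding
    scalar_bits, or None. vector_consts maps SSA names to component bit patterns."""
    for name, comps in vector_consts.items():
        if len(comps) < 2:
            continue  # only reuse components of genuine vectors
        for i, bits in enumerate(comps):
            if bits == scalar_bits:
                return name, SWIZZLE[i]
    return None
```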
It's tempting to try a vector CSE pass here, but I suspect that would be more work. It might also have other benefits.
- [ ] Implement general pass to eliminate redundant channels from vector constants.
- [ ] Modify the previous pass to eliminate redundant channels from vector operations.
- [ ] Enhance either CSE or constant propagation, if necessary, to replace scalar or small vector constants with swizzled components of larger vector constants. This might already "just work."
- [ ] Enhance `nir_opt_vectorize` or CSE to replace scalar or small vector ALU operations with swizzled components of larger vector ALU operations.

Issue #6022: move nir_opt_shrink_stores() (and nir_opt_shrink_vectors()) out of optimization loop
https://gitlab.freedesktop.org/mesa/mesa/-/issues/6022
Updated: 2022-02-11T19:24:52Z
Author: Daniel Schürmann

With !14480 being merged, there is no need to execute `nir_opt_shrink_stores()` in the optimization loop anymore. With `nir_opt_shrink_vectors()`, I'm not entirely sure whether repeated execution has an effect; that probably depends on the backend, whether the architecture is scalar, and how often the optimization loop is called, but it should be worth trying.
- `nir_opt_shrink_stores()` should probably be called before the optimization loop.
- `nir_opt_shrink_vectors()` should probably be called after the optimization loop.
Please also note !12468, which increases the runtime of `nir_opt_shrink_vectors()`.

Issue #4099: clang compiler errors for d3d12 & tgsi_ureg.c
https://gitlab.freedesktop.org/mesa/mesa/-/issues/4099
Updated: 2023-03-15T18:13:08Z
Author: Mārtiņš Možeiko

When building with clang-cl on Windows I get two errors:
```
../mesa.src/src/gallium/auxiliary/tgsi/tgsi_ureg.c(481,1): error: conflicting types for 'ureg_DECL_output_masked'
ureg_DECL_output_masked(struct ureg_program *ureg,
^
..\mesa.src\src\gallium\auxiliary\tgsi/tgsi_ureg.h(261,1): note: previous declaration is here
ureg_DECL_output_masked(struct ureg_program *,
^
```
and
```
../mesa.src/src/gallium/winsys/d3d12/wgl/d3d12_wgl_winsys.c(34,1): error: conflicting types for 'd3d12_wgl_create_screen'
d3d12_wgl_create_screen(struct sw_winsys *winsys, HDC hDC)
^
..\mesa.src\src\gallium\winsys\d3d12\wgl/d3d12_wgl_public.h(39,1): note: previous declaration is here
d3d12_wgl_create_screen(struct sw_winsys *winsys,
```
The first one is fixed by changing the type of the `name` argument of `ureg_DECL_output_masked` in `src/gallium/auxiliary/tgsi/tgsi_ureg.c` from the incorrect `unsigned` to the correct `enum tgsi_semantic`, matching the declaration in `tgsi_ureg.h`. (Under the MSVC ABI that clang-cl follows, an enum's compatible type is `int`, not `unsigned`, so the two prototypes conflict.)
The second one is fixed by correcting a typo in the struct forward declaration in `src/gallium/winsys/d3d12/wgl/d3d12_wgl_public.h`: `struct stw_winsys;` should be `struct sw_winsys;`.