mesa issues
https://gitlab.freedesktop.org/mesa/mesa/-/issues
2024-02-12T21:22:05Z
https://gitlab.freedesktop.org/mesa/mesa/-/issues/10565
Intel/fs: improve CSE for sampler messages
2024-02-12T21:22:05Z
Lionel Landwerlin
Intel/fs: improve CSE for sampler messages
Here is a pattern we see a lot with Anv when the sampler used is not in the sampler binding table. We have to load the sampler handle into the header at component 3:
```
( 6) (W) mov (8|M0) r28.0<1>:ud r0.0<8;8,1>:ud
( 13) (W) mov (1|M0) r28.3<1>:ud r5.2<0;1,0>:ud {I@4}
( 31) send.smpl (8|M0) r123 r28 r32:2 a0.1 0x024A00FC {ExBSO,A@1,$1} // wr:1h+2, rd:4; simd8 sample using sampler index 0
... (later in the same block)
( 11) (W) mov (8|M0) r30.0<1>:ud r0.0<8;8,1>:ud
( 25) (W) mov (1|M0) r30.3<1>:ud r5.2<0;1,0>:ud {I@2}
( 42) send.smpl (8|M0) r11 r30 r31:3 a0.1 0x024A20FC {ExBSO,A@1,$2} // wr:1h+3, rd:4; simd8 sample override LOD using sampler index 0
```
We're essentially rebuilding the same header twice for no reason. `r28` & `r30` are not used for anything else.
CSE has a condition that prevents it from working with partial writes: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/intel/compiler/brw_fs_cse.cpp?ref_type=heads#L257
I think this is the root of the problem here.
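To illustrate the idea, here is a toy sketch (not Mesa code) of grouping registers by the full sequence of writes that builds them, so that a header made of one full `mov` plus one partial `mov` can still be recognised as a duplicate. The tuple-based instruction format and the name `find_duplicate_headers` are hypothetical, and the sketch assumes the source registers are not modified between the two sequences, which a real pass would have to check.

```python
from collections import defaultdict

# Toy instruction: (opcode, dst_reg, dst_component, source).
# dst_component is None for a full-register write, or an index for a partial write.
def find_duplicate_headers(insts):
    """Group destination registers whose whole write sequence is identical.

    Assumption: sources keep the same value between the sequences.
    """
    writes = defaultdict(list)
    for opcode, dst, comp, src in insts:
        writes[dst].append((opcode, comp, src))

    by_key = defaultdict(list)
    for dst, seq in writes.items():
        by_key[tuple(seq)].append(dst)

    return [regs for regs in by_key.values() if len(regs) > 1]


insts = [
    ("mov", "r28", None, "r0"),    # copy the whole header from r0
    ("mov", "r28", 3, "r5.2"),     # partial write: sampler handle at component 3
    ("mov", "r30", None, "r0"),
    ("mov", "r30", 3, "r5.2"),
]
duplicates = find_duplicate_headers(insts)
print(duplicates)
```

With the pattern from the dump above, `r28` and `r30` end up in the same group, which is the case the second header build could reuse.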
https://gitlab.freedesktop.org/mesa/mesa/-/issues/10035
intel/compiler: add function support
2023-10-24T04:14:48Z
Dave Airlie
intel/compiler: add function support
For proper rusticl support we need function calling support in the backends via the nir driver_functions option.
This is just a tracker issue to make sure it's on the project's radar.
I can find a little bit of info in:
https://github.com/intel/intel-graphics-compiler/blob/master/documentation/visa/3_execution_model.md
but someone might know whether there's an Intel-defined calling convention that Mesa could be compatible with. I'm not sure how strong the need for compatibility is, but it might be good to just copy it.
https://gitlab.freedesktop.org/mesa/mesa/-/issues/9999
Anv: starfield create compute pipeline times out.
2023-10-19T19:10:06Z
Michael Mestnik
Anv: starfield create compute pipeline times out.
Conclusion: wine/vkd3d is not giving `fs_reg_alloc::assign_regs` enough time. Making this faster or having wine give more time are the solutions that could be pursued. Any other suggestions, or should this be closed?
Requires https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512
```bash
export ENABLE_VK_LAYER_VALVE_cheako_shader_capture_1=1
export VK_LOADER_DEBUG=error,warn,layer
export VKD3D_FEATURE_LEVEL=12_1
export VKD3D_SHADER_MODEL=6_6
export VK_INSTANCE_LAYERS="${VK_INSTANCE_LAYERS}${VK_INSTANCE_LAYERS:+:}VK_LAYER_LUNARG_api_dump"
```
System specs: #9814
Edit: Forgot about this:
```text
cheako@mx1:~$ cat .drirc
<?xml version="1.0" standalone="yes"?>
<driconf>
<device driver="anv">
<application name="Starfield" executable="Starfield.exe">
<!-- option name="force_vk_vendor" value="0x1002" /-->
<option name="shader_spilling_rate" value="15" />
</application>
</device>
</driconf>
```
[shaders.tar.xz](/uploads/74914e295caf3194e63aebb8a19b82c6/shaders.tar.xz)
This should be enough to recreate, I'll next try and turn this into an application.
Edit: forgot to add stderr [log.err](/uploads/a35b199fe67c6193d61a48d9341eb445/log.err)
```rust
// Copyright (C) 2023 Michael Mestnik <cheako@mikemestnik.net>
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
// You should have received a copy of the GNU General Public License
// along with this program. If not, see <https://www.gnu.org/licenses/>.
use core::slice::SlicePattern;
use std::fs::File;
#[allow(unused_imports)]
use std::hash::{Hash, Hasher};
use std::io::Write;
use std::sync::atomic::Ordering;
use std::sync::{mpsc, Arc};
use std::{panic::catch_unwind, sync::atomic::AtomicUsize};
use std::{slice, thread};
use ash::vk::{
self, AllocationCallbacks, ComputePipelineCreateInfo, Device, Pipeline, PipelineCache,
PipelineShaderStageCreateFlags, ShaderModule, ShaderModuleCreateFlags, ShaderStageFlags,
};
static CTR: AtomicUsize = AtomicUsize::new(0);
struct ShaderData(
Arc<[u32]>,
ShaderModuleCreateFlags,
ShaderModule,
ShaderStageFlags,
PipelineShaderStageCreateFlags,
Box<str>,
bool,
);
impl std::fmt::Debug for ShaderData {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
// let mut a = std::collections::hash_map::DefaultHasher::new();
f.debug_struct("ShaderData")
.field("module_create_flags", &self.1.as_raw())
.field("stage_flags", &self.3.as_raw())
.field("stage_create_flags", &self.4.as_raw())
.field("name", &self.5)
.field("bool", &self.6)
.finish()
}
}
#[derive(Debug)]
struct MyDumper {
ctr: usize,
create_infos: Box<[(ShaderData, vk::PipelineLayout)]>,
}
fn process_stage_create_info(
stage_create_info: &vk::PipelineShaderStageCreateInfo,
) -> Option<ShaderData> {
super::shader_module::get_spirv(&stage_create_info.module).map(|(spirv, shader_flags)| {
ShaderData(
spirv,
shader_flags,
stage_create_info.module,
stage_create_info.stage,
stage_create_info.flags,
unsafe { std::ffi::CStr::from_ptr(stage_create_info.p_name) }
.to_str()
.unwrap()
.into(),
unsafe { stage_create_info.p_specialization_info.as_ref() }.is_some(),
)
})
}
pub(crate) unsafe extern "system" fn create_compute_pipelines(
device: Device,
pipeline_cache: PipelineCache,
create_info_count: u32,
p_create_infos: *const ComputePipelineCreateInfo,
p_allocator: *const AllocationCallbacks,
p_pipelines: *mut Pipeline,
) -> vk::Result {
let result = catch_unwind(|| {
let create_infos = unsafe { slice::from_raw_parts(p_create_infos, create_info_count as _) };
let (tx, rx) = mpsc::channel();
let ctr = CTR.fetch_add(1, Ordering::SeqCst);
let my_data = MyDumper {
ctr,
create_infos: create_infos
.iter()
.flat_map(|create_info| {
if !create_info.p_next.is_null() {
dbg!("ComputePipelineCreateInfo has next".len());
}
Some(create_info.stage)
.iter()
.filter_map(process_stage_create_info)
.map(|x| (x, create_info.layout))
.take_while(|_| create_info.p_next.is_null())
.next()
})
.collect(),
};
let _ = thread::spawn(move || {
if rx.recv_timeout(std::time::Duration::from_secs(15))
== Err(mpsc::RecvTimeoutError::Timeout)
{
let _ = dbg!(&my_data);
let ctr = my_data.ctr;
my_data.create_infos.iter().enumerate().for_each(|x| {
let mut f = File::create(format!("/tmp/{}-{}.bin", &ctr, x.0)).unwrap();
let data =
x.1 .0
.0
.iter()
.copied()
.flat_map(u32::to_ne_bytes)
.collect::<Box<_>>();
let _ = f.write_all(data.as_slice());
});
};
});
let result = unsafe {
super::DEVICE
.read()
.unwrap()
.get(&device)
.unwrap()
.device
.create_compute_pipelines(pipeline_cache, create_infos, p_allocator.as_ref())
};
let _ = tx.send(());
let (x, ret) = match result.map(|x| (x, vk::Result::SUCCESS)) {
Ok(x) => x,
Err(x) => x,
};
for (i, pipeline) in x.into_iter().take(create_infos.len()).enumerate() {
unsafe { *p_pipelines.add(i) = pipeline }
}
ret
});
result.unwrap()
}
```
https://gitlab.freedesktop.org/mesa/mesa/-/issues/9960
[Intel][Vulkan][Gen12] Vulkan compute shader is 3x slower than the same OpenCL kernel
2023-10-09T14:25:24Z
Shao Jiawei
[Intel][Vulkan][Gen12] Vulkan compute shader is 3x slower than the same OpenCL kernel
Hi Intel Mesa team,
This issue comes from Google: https://bugs.chromium.org/p/tint/issues/detail?id=2049
When porting an OpenCL kernel to an exactly equivalent Vulkan compute shader, we find that the Vulkan compute shader is about 3x slower than the OpenCL kernel on the Linux Intel Mesa driver. Could you help us investigate whether the Mesa ISA code generator can be improved for better Vulkan compute shader performance?
Steps to reproduce:
1. Download and extract [Vulkan_OpenCL.zip](/uploads/b39719eeb52750cf176bb59906a227f3/Vulkan_OpenCL.zip).
2. Run the OpenCL application `HelloOpenCL` in OpenCLTest/Build/ (or build it with CMake at OpenCLTest/)
3. Run the Vulkan application `VulkanTest` in VulkanTest/ (or build it by executing ./compile.sh at VulkanTest/)
You can see the Vulkan application runs 3x slower than the OpenCL one.
Platform information:
- OS: Ubuntu 23.04.1 (Kernel version: 6.2.0-32-generic)
- GPU: Intel(R) Xe Graphics (TGL GT2) (device ID: 0x9A49)
- Mesa driver version: 23.0.4
After comparing the ISA generated by the OpenCL driver and by the Mesa driver, we find that the Mesa driver generates much worse ISA than the OpenCL one. The Mesa compiler reports that it is using the `lifo` register scheduler, which is the worst of the 4 options (`top-down`, `non-lifo`, `none`, `lifo`), so the resulting ISA is not that efficient.
Below is part of the ISA of the Vulkan compute shader on the Mesa driver:
```
// There is write-read dependency among the `send` instruction and the following `mad` instructions on g9-g16 registers.
send(16) g9UD g69UD nullUD 0x048050fe 0x00000000
dp data 1 MsgDesc: (untyped surface read, Surface = 254, SIMD16, Mask = 0x0)
mlen 2 ex_mlen 0 rlen 8 { align1 1H @1 $9 };
sync nop(1) null<0,1,0>UB { align1 WE_all 1N $9.dst };
mad(16) g69<1>F g123<8,8,1>F g3<8,8,1>F g9<1,1,1>F { align1 1H @4 $7.dst compacted };
mad(16) g71<1>F g125<8,8,1>F g3<8,8,1>F g11<1,1,1>F { align1 1H @4 $9.dst compacted };
mad(16) g73<1>F g65<8,8,1>F g3<8,8,1>F g13<1,1,1>F { align1 1H @4 $9.dst compacted };
mad(16) g75<1>F g67<8,8,1>F g3<8,8,1>F g15<1,1,1>F { align1 1H @4 $9.dst compacted };
```
Below is the part of the ISA of the OpenCL kernel on the Linux Intel OpenCL driver
```
// load two vec4s into one register
(W&f0.0.any16h) send.dc0 (8|M0) r8 r2 null 0x0 0x021802FE {@7,$13} // wr:1h+0, rd:1; oword block read x2
...
// then, later down, because the initial load is a preload
...
// then use 1 float from r8 in each of 8 `mad` instructions
mad (16|M0) acc0.0<1>:f r72.0<8;1>:f r12.0<8;1>:f r8.0<0>:f {Compacted,$11.dst}
mad (16|M0) r2.0<1>:f r58.0<8;1>:f r12.0<8;1>:f r8.1<0>:f {Compacted,$14.src}
mad (16|M0) r46.0<1>:f r56.0<8;1>:f r12.0<8;1>:f r8.2<0>:f {Compacted}
mad (16|M0) r32.0<1>:f r54.0<8;1>:f r12.0<8;1>:f r8.3<0>:f {Compacted,$15.src}
...
mad (16|M0) acc0.0<1>:f acc0.0<8;1>:f r14.0<8;1>:f r8.4<0>:f
mad (16|M0) r72.0<1>:f r2.0<8;1>:f r14.0<8;1>:f r8.5<0>:f {Compacted}
mad (16|M0) r74.0<1>:f r46.0<8;1>:f r14.0<8;1>:f r8.6<0>:f {Compacted}
mad (16|M0) r76.0<1>:f r32.0<8;1>:f r14.0<8;1>:f r8.7<0>:f {Compacted}
```
The ISA for `HelloOpenCL` is `OpenCLTest/HelloOpenCL.isa` and the ISA for `VulkanTest` is `VulkanTest/VulkanTest.isa`.
https://gitlab.freedesktop.org/mesa/mesa/-/issues/9769
anv: dEQP-VK.graphicsfuzz.spv-stable-maze-flatten-copy-composite and dEQP-VK.graphicsfuzz.spv-stable-pillars-volatile-nontemporal-store very slow
2023-09-28T22:54:38Z
Matt Turner
anv: dEQP-VK.graphicsfuzz.spv-stable-maze-flatten-copy-composite and dEQP-VK.graphicsfuzz.spv-stable-pillars-volatile-nontemporal-store very slow
These two tests
- `dEQP-VK.graphicsfuzz.spv-stable-maze-flatten-copy-composite`
- `dEQP-VK.graphicsfuzz.spv-stable-pillars-volatile-nontemporal-store`
take 15+ seconds on an unloaded system. When run in parallel with other tests, they time out. `perf` attributes the time to `ra_allocate`, and `INTEL_DEBUG=fs` confirms that each test contains a shader with a huge amount of spilling.
## `dEQP-VK.graphicsfuzz.spv-stable-maze-flatten-copy-composite`
### `perf report`
```
# Overhead Command Shared Object Symbol
# ........ .............. .................... .......................................
#
78.47% deqp-vk libvulkan_intel.so [.] ra_allocate
6.43% deqp-vk libvulkan_intel.so [.] ra_reset_node_interference
4.06% deqp-vk libvulkan_intel.so [.] fs_reg_alloc::spill_reg
1.50% deqp-vk libvulkan_intel.so [.] backend_instruction::insert_before
1.27% deqp-vk libvulkan_intel.so [.] ra_add_node_interference
1.20% deqp-vk libvulkan_intel.so [.] ra_get_best_spill_node
0.81% deqp-vk libvulkan_intel.so [.] set_search
0.45% deqp-vk libvulkan_intel.so [.] ra_add_node_adjacency
```
### Shaders
```
Native code for unnamed fragment shader (null) (src_hash 0x73ffc2a4) (sha1 6ded761b90df448f3fc5e7cd12dfa3fcfe9ad472)
SIMD8 shader: 29840 instructions. 2 loops. 2561812 cycles. 1531:1936 spills:fills, 1 sends, scheduled with mode top-down. Promoted 0 constants. Compacted 477440 to 450064 bytes (6%)
```
## `dEQP-VK.graphicsfuzz.spv-stable-pillars-volatile-nontemporal-store`
### `perf report`
```
# Overhead Command Shared Object Symbol
# ........ .............. .................... .......................................
#
74.57% deqp-vk libvulkan_intel.so [.] ra_allocate
7.48% deqp-vk libvulkan_intel.so [.] ra_reset_node_interference
6.45% deqp-vk libvulkan_intel.so [.] backend_instruction::insert_before
2.93% deqp-vk libvulkan_intel.so [.] fs_reg_alloc::spill_reg
1.61% deqp-vk libvulkan_intel.so [.] ra_get_best_spill_node
1.00% deqp-vk libvulkan_intel.so [.] ra_add_node_interference
0.72% deqp-vk libvulkan_intel.so [.] set_search
```
### Shaders
```
Native code for unnamed fragment shader (null) (src_hash 0x310cd128) (sha1 225f71b11e1dd90cdba95667618f84441b7b1dc3)
SIMD8 shader: 34592 instructions. 1 loops. 2825222 cycles. 1835:2346 spills:fills, 3 sends, scheduled with mode top-down. Promoted 0 constants. Compacted 553472 to 524944 bytes (5%)
```
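For comparing shaders like these two, the spill and fill counts can be pulled out of the stats line mechanically. The following is a small sketch that assumes the line format is exactly as printed above; `STATS_RE` and `parse_stats` are names made up for this example, not part of any Mesa tooling.

```python
import re

# Matches the leading part of the stats line as it appears in the dumps above.
STATS_RE = re.compile(
    r"SIMD(?P<simd>\d+) shader: (?P<instructions>\d+) instructions\. "
    r"(?P<loops>\d+) loops\. (?P<cycles>\d+) cycles\. "
    r"(?P<spills>\d+):(?P<fills>\d+) spills:fills"
)


def parse_stats(line):
    """Return the numeric fields of a stats line as a dict, or None if it does not match."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


line = (
    "SIMD8 shader: 29840 instructions. 2 loops. 2561812 cycles. "
    "1531:1936 spills:fills, 1 sends, scheduled with mode top-down."
)
stats = parse_stats(line)
print(stats["spills"], stats["fills"])
```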
Google bug: b/298123643
https://gitlab.freedesktop.org/mesa/mesa/-/issues/9732
intel/compiler: Need a pass to merge CFG blocks
2023-09-08T21:48:52Z
Ian Romanick
intel/compiler: Need a pass to merge CFG blocks
While working on something unrelated, I stumbled on a case where reconstructing the CFG right before scheduling produced a different schedule. One affected shader was `kerbal-space-program/1017.shader_test`. Below is a diff of the two shaders.
```
@@ -340,16 +340,15 @@
mul(8) g6<1>F g17<8,8,1>F g4.7<0,1,0>F { align1 1Q compacted };
mul(8) g7<1>F g18<8,8,1>F g4.7<0,1,0>F { align1 1Q compacted };
END B0 ->B1
- START B1 <-B0 <-B2 (140 cycles)
+ START B2 <-B1 <-B3 (260 cycles)
LABEL1:
mov(1) g63<1>D 1077936128D { align1 WE_all 1N };
-mov(1) g63.1<1>D 1073741824D { align1 WE_all 1N };
END B1 ->B2 ->B4
- START B2 <-B1 <-B3 (240 cycles)
cmp.ge.f0.0(8) null<1>D g3<8,8,1>D 60D { align1 1Q compacted };
+mov(1) g63.1<1>D 1073741824D { align1 WE_all 1N };
(+f0.0) break(8) JIP: LABEL0 UIP: LABEL0 { align1 1Q };
- END B2 ->B1 ->B4 ->B3 ->B3
+ END B2 ->B1 ->B4 ->B3
START B3 <-B2 (10180 cycles)
mul(8) g22<1>D g3<8,8,1>D 12W { align1 1Q };
add(8) g3<1>D g3<8,8,1>D 1D { align1 1Q compacted };
```
In the before shader, this segment
```
START B1 <-B0 <-B2 (140 cycles)
LABEL1:
mov(1) g63<1>D 1077936128D { align1 WE_all 1N };
mov(1) g63.1<1>D 1073741824D { align1 WE_all 1N };
END B1 ->B2 ->B4
START B2 <-B1 <-B3 (240 cycles)
cmp.ge.f0.0(8) null<1>D g3<8,8,1>D 60D { align1 1Q compacted };
(+f0.0) break(8) JIP: LABEL0 UIP: LABEL0 { align1 1Q };
END B2 ->B1 ->B4 ->B3 ->B3
```
had two blocks that are a single block in the reconstructed CFG. I believe `dead_control_flow_eliminate` should be enhanced to fuse these two blocks.
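As a rough illustration of what fusing blocks means, here is the textbook criterion on a toy CFG: block A is merged with B when B is A's only successor and A is B's only predecessor. This is a sketch on plain dictionaries, not the `dead_control_flow_eliminate` pass, and the example graph is a simplified, hypothetical loop rather than the exact edge lists printed above (which contain extra edges that a real pass would have to deal with first).

```python
def merge_straight_line_blocks(blocks, succs):
    """Fuse A and B whenever A's only successor is B and B's only predecessor is A.

    blocks: dict mapping block name -> list of instructions
    succs:  dict mapping block name -> list of successor block names
    Returns new (blocks, succs) dicts; the inputs are not modified.
    """
    blocks = {name: list(insts) for name, insts in blocks.items()}
    succs = {name: list(targets) for name, targets in succs.items()}

    changed = True
    while changed:
        changed = False
        # Recompute predecessors from scratch after every merge.
        preds = {name: [] for name in blocks}
        for a, targets in succs.items():
            for b in targets:
                preds[b].append(a)

        for a in list(blocks):
            targets = succs[a]
            if len(targets) == 1:
                b = targets[0]
                if b != a and preds[b] == [a]:
                    blocks[a].extend(blocks.pop(b))
                    succs[a] = succs.pop(b)
                    changed = True
                    break
    return blocks, succs


# Hypothetical loop: B1 is the loop header, B3 jumps back to it, B4 is the exit.
blocks = {"B0": ["mul"], "B1": ["mov", "mov"], "B2": ["cmp", "break"], "B3": ["add"], "B4": []}
succs = {"B0": ["B1"], "B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B1"], "B4": []}
merged_blocks, merged_succs = merge_straight_line_blocks(blocks, succs)
print(sorted(merged_blocks))
```

In this toy graph only B1 and B2 are fused; B0 is left alone because the loop header has a second predecessor through the back edge.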
@kwg
https://gitlab.freedesktop.org/mesa/mesa/-/issues/9716
Don't NIH lower_fquantize2f16
2023-10-09T16:31:34Z
Alyssa Rosenzweig
Don't NIH lower_fquantize2f16
The following drivers have fquantize2f16 lowerings equivalent to what we have in common as `lower_fquantize2f16`. They should switch to the common version.
Drivers that use some other lowering are omitted as switching could regress codegen quality for them.
- [x] broadcom (!24924)
- [ ] ac/llvm
- [x] gallivm (!24988)
- [ ] intel/fs (!25552)
- [ ] intel/vec4 (!25552)
https://gitlab.freedesktop.org/mesa/mesa/-/issues/9714
intel: Scheduler issues with piano trace
2023-08-29T06:05:44Z
Ian Romanick
intel: Scheduler issues with piano trace
Per my notes in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7698#note_2059913, there is some issue with scheduling and the piano trace. Making changes to register allocation (which affects scheduling) or to the scheduler mode causes the trace to render differently. That should not happen.
I was able to effect changes in the rendering output across !7698 on both Zink and Iris. Changing the scheduler mode was only tested on Iris.
To get the trace, use `replayer.py` from piglit:
```
replayer.py download --download-url https://s3.freedesktop.org/mesa-tracie-public/ --db-path /tmp/replayer-db/ gputest/pixmark-piano-v2.trace
```
Then use `apitrace` to replay it. The `dump-images` and `diff-images` modes are useful.
https://gitlab.freedesktop.org/mesa/mesa/-/issues/9288
anv: compiler much slower than radv
2023-06-30T18:04:30Z
Emma Anholt
emma@anholt.net
anv: compiler much slower than radv
One of ChromeOS's big concerns for Steam is that Intel's fossilize step is much slower than on other boards with comparable CPUs. To compare using some stuff I had on hand, I ran the following on my workstation with release builds, using shader-db and https://gitlab.freedesktop.org/anholt/shaders/-/tree/fossils together, with fossilize-replay hacked to ignore renderpass compatibility (so we don't skip my angle/gfxbench fossils on radv), and with the Intel drm-shim in place or not:
```
# time INTEL_STUB_GPU_PLATFORM=tgl MESA_SHADER_CACHE_DISABLE=1 mesagl irisgl releasegl fossilize-replay --enable-pipeline-stats anv.log fossils/**/*.foz fossils/closed/**/*.foz
Fossilize INFO: name: Intel(R) Xe Graphics (TGL GT2)
[...]
110.82s user 3.45s system 2862% cpu 3.991 total
time MESA_SHADER_CACHE_DISABLE=1 fossilize-replay --enable-pipeline-stats radv.log fossils/**/*.foz fossils/closed/**/*.foz
Fossilize INFO: name: AMD Radeon RX Vega (RADV VEGA10)
[...]
33.85s user 2.25s system 2363% cpu 1.528 total
```
**radv completes in 38% of the wall time / 30% of the CPU time of anv.**
I can share this closed shader-db with intel devs that might be looking into it.
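For reference, the percentages above follow directly from the two `time` lines. The sketch below assumes "CPU time" means the `user` figure only; the second ratio also folds in the `system` figure and comes out slightly higher.

```python
# Figures copied from the two `time` outputs above (seconds).
anv = {"user": 110.82, "system": 3.45, "wall": 3.991}
radv = {"user": 33.85, "system": 2.25, "wall": 1.528}

wall_ratio = radv["wall"] / anv["wall"]
cpu_ratio = radv["user"] / anv["user"]
# Alternative reading of "CPU time": user + system.
cpu_ratio_with_system = (radv["user"] + radv["system"]) / (anv["user"] + anv["system"])

print(f"wall: {wall_ratio:.1%}")
print(f"cpu (user only): {cpu_ratio:.1%}")
print(f"cpu (user + system): {cpu_ratio_with_system:.1%}")
```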
<details><summary>intel debugoptimized perf report -g with NIR validation disabled</summary>
```
+ 92.08% 0.00% fossilize-repla fossilize-replay [.] ThreadedReplayer::worker_thread
+ 91.41% 0.00% fossilize-repla fossilize-replay [.] ThreadedReplayer::run_creation_work_item
+ 81.51% 0.00% fossilize-repla fossilize-replay [.] ThreadedReplayer::run_creation_work_item_graphics_iteration
+ 81.47% 0.01% fossilize-repla libvulkan_intel.so [.] anv_graphics_pipeline_create
+ 81.37% 0.01% fossilize-repla libvulkan_intel.so [.] anv_graphics_pipeline_compile.constprop.0
+ 81.32% 0.00% fossilize-repla libvulkan_intel.so [.] anv_CreateGraphicsPipelines
+ 47.45% 0.01% fossilize-repla libvulkan_intel.so [.] brw_compile_fs
+ 44.17% 0.01% fossilize-repla libvulkan_intel.so [.] fs_visitor::optimize
+ 43.21% 0.00% fossilize-repla libvulkan_intel.so [.] fs_visitor::run_fs
+ 18.26% 0.01% fossilize-repla libvulkan_intel.so [.] fs_visitor::allocate_registers
+ 15.59% 0.06% fossilize-repla libvulkan_intel.so [.] brw_nir_optimize
+ 14.53% 0.00% fossilize-repla libvulkan_intel.so [.] brw_compile_vs
+ 11.98% 10.11% fossilize-repla libvulkan_intel.so [.] fs_inst::size_read
+ 11.38% 0.00% fossilize-repla libvulkan_intel.so [.] fs_visitor::run_vs
+ 10.61% 0.00% fossilize-repla libvulkan_intel.so [.] anv_pipeline_nir_preprocess.isra.0
+ 10.51% 0.44% fossilize-repla libvulkan_intel.so [.] brw::fs_live_variables::fs_live_variables
+ 10.45% 0.01% fossilize-repla libvulkan_intel.so [.] brw_preprocess_nir
+ 10.41% 0.00% fossilize-repla libvulkan_intel.so [.] anv_pipeline_compile_cs
+ 9.88% 0.00% fossilize-repla fossilize-replay [.] ThreadedReplayer::run_creation_work_item_compute_iteration
+ 9.87% 0.00% fossilize-repla libvulkan_intel.so [.] anv_CreateComputePipelines
+ 9.50% 0.00% fossilize-repla libvulkan_intel.so [.] brw_compile_cs
+ 9.26% 3.64% fossilize-repla libvulkan_intel.so [.] brw::fs_live_variables::setup_def_use
+ 8.90% 0.00% fossilize-repla libvulkan_intel.so [.] fs_visitor::run_cs
+ 8.80% 0.29% fossilize-repla libvulkan_intel.so [.] fs_visitor::opt_copy_propagation
+ 8.48% 3.19% fossilize-repla libvulkan_intel.so [.] fs_visitor::validate
+ 7.80% 0.00% fossilize-repla libvulkan_intel.so [.] fs_visitor::assign_regs
+ 7.69% 0.00% fossilize-repla libvulkan_intel.so [.] fs_visitor::schedule_instructions
+ 7.58% 0.01% fossilize-repla libvulkan_intel.so [.] instruction_scheduler::run
+ 7.46% 5.79% fossilize-repla libvulkan_intel.so [.] fs_visitor::register_coalesce
+ 6.95% 0.03% fossilize-repla libvulkan_intel.so [.] fs_reg_alloc::assign_regs
+ 6.82% 1.21% fossilize-repla libvulkan_intel.so [.] fs_visitor::dead_code_eliminate
+ 6.48% 0.00% fossilize-repla libvulkan_intel.so [.] fs_visitor::opt_cse
+ 6.44% 5.44% fossilize-repla libvulkan_intel.so [.] fs_visitor::opt_copy_propagation_local
+ 5.19% 0.01% fossilize-repla libvulkan_intel.so [.] brw_postprocess_nir
+ 4.80% 0.59% fossilize-repla libvulkan_intel.so [.] nir_algebraic_impl
+ 4.77% 2.11% fossilize-repla libvulkan_intel.so [.] regs_read
+ 4.59% 0.01% fossilize-repla libvulkan_intel.so [.] nir_opt_algebraic
</details>
https://gitlab.freedesktop.org/mesa/mesa/-/issues/9090
Enable support for integer MAD
2023-06-01T00:04:30Z
Ian Romanick
Enable support for integer MAD
Since Ice Lake, Intel GPUs have supported integer `MAD`, but there is no support for this in the Intel compiler.
- [ ] Add new `nir_op_imad_32x16p32` opcode. Perhaps this opcode should also be `_intel`?
- [ ] Either modify `brw_nir_opt_peephole_ffma.c` or create a new pass to merge `nir_iadd` and `nir_op_imul_32x16`. Require that there be no source modifiers. Initially require that none of the operands be constants.
- [ ] On Gfx12, allow the addend to be constant if the value can fit in 16 bits. ~~Aside from modifying the peephole pass, `brw_combine_constants` will need to be modified to know that mixed-size integer types are allowed on Gfx12.5. Currently all mixing is forbidden, but it's really only `F` and `HF` mixing that was removed from the hardware.~~ (see !23262)
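The merge step in the list above could look roughly like the following on a toy expression tree. This is only a sketch of the rewrite rule: the `Const`, `Var`, and `Op` classes and `fuse_imad` are hypothetical and not NIR data structures. The toy IR has no source modifiers, so only the "no constant operands" restriction is modelled, and a real pass would also need to check that the multiply has no other users.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Op:
    name: str
    srcs: tuple


def fuse_imad(expr):
    """Rewrite iadd(imul_32x16(a, b), c) into imad_32x16p32(a, b, c), bottom-up."""
    if not isinstance(expr, Op):
        return expr
    srcs = tuple(fuse_imad(src) for src in expr.srcs)
    if expr.name == "iadd" and len(srcs) == 2:
        # iadd is commutative, so the multiply may be either source.
        for mul, addend in (srcs, srcs[::-1]):
            if isinstance(mul, Op) and mul.name == "imul_32x16":
                operands = mul.srcs + (addend,)
                # Initial restriction from the list above: no constant operands.
                if not any(isinstance(op, Const) for op in operands):
                    return Op("imad_32x16p32", operands)
    return Op(expr.name, srcs)


mul = Op("imul_32x16", (Var("a"), Var("b")))
fused = fuse_imad(Op("iadd", (Var("c"), mul)))
unfused = fuse_imad(Op("iadd", (mul, Const(4))))
print(fused.name, unfused.name)
```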
https://gitlab.freedesktop.org/mesa/mesa/-/issues/8590
intel/fs: dead code eliminate unable to remove unused instructions
2023-03-22T23:45:52Z
Lionel Landwerlin
intel/fs: dead code eliminate unable to remove unused instructions
Filing this to get some feedback from @currojerez
Here is some intermediate intel/fs IR:
```
{ 89} 164: do(8) (null):UD,
...
{ 94} 773: undef(8) vgrf803<0>:UD,
{ 94} 774: mov(8) vgrf803+0.0<0>:UD, 64u NoMask
{ 95} 775: unaligned_oword_block_read_logical(8) vgrf802:UD, 254u(null):UD, vgrf803+0.0<0>:UD(null):UD(null):UD, 8u(null):UD NoMask
{ 95} 776: add(8) vgrf803+0.0<0>:UD, vgrf803+0.0<0>:UD, 32u NoMask
* after this point vgrf803 is not used anymore *
....
{ 89} 806: while(8) (null):UD,
```
Dead code elimination is unable to remove the ADD instruction, and I don't really understand why.
Maybe it is a problem with the live range analysis?
This appears to increase register pressure for no reason, since the variable is kept alive throughout the loop.
I'm running with patches from https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351
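One way this kind of result can arise, offered as a hypothesis rather than a diagnosis of this shader, is when the write at the top of the loop is not treated as a full definition: the value written by the ADD then stays live across the back edge. The toy liveness pass below shows the effect on a single-block loop. The instruction tuples, the `store` standing in for the elided uses of `vgrf802`, and the function names are all made up for this example.

```python
def live_in(body, live_out, partial_writes):
    """Backward liveness scan over one block; partial writes do not kill their destination."""
    live = set(live_out)
    for idx in range(len(body) - 1, -1, -1):
        _, dst, srcs = body[idx]
        if dst is not None and idx not in partial_writes:
            live.discard(dst)
        live.update(srcs)
    return live


def dead_in_loop(body, partial_writes=frozenset()):
    """Return indices of instructions whose result is never read.

    The block loops back to itself, so its live-out set equals its live-in set.
    """
    live_out = set()
    while True:
        new_live = live_in(body, live_out, partial_writes)
        if new_live == live_out:
            break
        live_out = new_live

    dead = []
    live = set(live_out)
    for idx in range(len(body) - 1, -1, -1):
        _, dst, srcs = body[idx]
        if dst is not None and dst not in live:
            dead.append(idx)
            continue
        if dst is not None and idx not in partial_writes:
            live.discard(dst)
        live.update(srcs)
    return sorted(dead)


# (opcode, destination, sources); indices 0..3 form the loop body.
body = [
    ("mov", "vgrf803", []),
    ("block_read", "vgrf802", ["vgrf803"]),
    ("add", "vgrf803", ["vgrf803"]),
    ("store", None, ["vgrf802"]),  # hypothetical stand-in for the elided uses of vgrf802
]
full = dead_in_loop(body)
partial = dead_in_loop(body, partial_writes={0})
print(full, partial)
```

When the `mov` counts as a full write, the `add` at index 2 is reported dead; when it is flagged as partial, nothing is removed because `vgrf803` is live around the whole loop.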
https://gitlab.freedesktop.org/mesa/mesa/-/issues/6946
nine: intel-whl: ERROR: src0 is null
2024-03-23T08:55:06Z
David Heidelberg
nine: intel-whl: ERROR: src0 is null
- job: https://gitlab.freedesktop.org/okias/mesa/-/jobs/25986259
- tests: https://github.com/iXit/nine-tests
```
...
2022-07-27 22:49:39.458298: succ get_rt_readback Failed to get surface desc, hr 0.
2022-07-27 22:49:39.458305: NIR (SSA form) for vertex shader:
2022-07-27 22:49:39.458314: shader: MESA_SHADER_VERTEX
2022-07-27 22:49:39.458322: source_sha1: {0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000}
2022-07-27 22:49:39.458330: name: TTN
2022-07-27 22:49:39.458337: inputs: 3
2022-07-27 22:49:39.458345: outputs: 4
2022-07-27 22:49:39.458352: uniforms: 0
2022-07-27 22:49:39.458359: ubos: 1
2022-07-27 22:49:39.458366: shared: 0
2022-07-27 22:49:39.458373: ray queries: 0
2022-07-27 22:49:39.458380: decl_var shader_in INTERP_MODE_FLAT vec4 in_0 (VERT_ATTRIB_GENERIC0.xyzw, 15, 0)
2022-07-27 22:49:39.458387: decl_var shader_in INTERP_MODE_FLAT vec4 in_1 (VERT_ATTRIB_GENERIC1.xyzw, 16, 0)
2022-07-27 22:49:39.458395: decl_var shader_in INTERP_MODE_FLAT vec4 in_2 (VERT_ATTRIB_GENERIC2.xyzw, 17, 0)
2022-07-27 22:49:39.458402: decl_var shader_out INTERP_MODE_FLAT vec4 out_0 (VARYING_SLOT_POS.xyzw, 0, 0)
2022-07-27 22:49:39.458409: decl_var shader_out INTERP_MODE_FLAT vec4 out_1 (VARYING_SLOT_COL0.xyzw, 1, 0)
2022-07-27 22:49:39.458417: decl_var shader_out INTERP_MODE_FLAT vec4 out_2 (VARYING_SLOT_COL1.xyzw, 2, 0)
2022-07-27 22:49:39.458424: decl_var shader_out INTERP_MODE_FLAT vec4 out_3 (VARYING_SLOT_VAR16.xyzw, 48, 0)
2022-07-27 22:49:39.458431: decl_var uniform INTERP_MODE_NONE vec4[4] uniform_0 (0, 0, 0)
2022-07-27 22:49:39.458438: decl_var ubo INTERP_MODE_NONE vec4[4] uniform_0@0 (0, 0, 0)
2022-07-27 22:49:39.458446: decl_function main (0 params)
2022-07-27 22:49:39.458453: impl main {
2022-07-27 22:49:39.458460: block block_0:
2022-07-27 22:49:39.458468: /* preds: */
2022-07-27 22:49:39.458475: vec4 32 con ssa_0 = undefined
2022-07-27 22:49:39.458482: vec1 32 con ssa_1 = load_const (0x00000000 = 0.000000)
2022-07-27 22:49:39.458490: vec4 32 div ssa_2 = intrinsic load_input (ssa_1) (base=0, component=0, dest_type=float32 /*160*/, io location=15 slots=1 /*143*/)
2022-07-27 22:49:39.458498: vec4 32 con ssa_3 = intrinsic load_ubo (ssa_1, ssa_1) (access=0, align_mul=1073741824, align_offset=0, range_base=0, range=16)
2022-07-27 22:49:39.458506: vec1 32 div ssa_4 = fmul ssa_2.x, ssa_3.x
2022-07-27 22:49:39.458513: vec1 32 div ssa_5 = fmul ssa_2.x, ssa_3.y
2022-07-27 22:49:39.458520: vec1 32 div ssa_6 = fmul ssa_2.x, ssa_3.z
2022-07-27 22:49:39.458528: vec1 32 div ssa_7 = fmul ssa_2.x, ssa_3.w
2022-07-27 22:49:39.458535: vec1 32 con ssa_8 = load_const (0x00000010 = 0.000000)
2022-07-27 22:49:39.458543: vec4 32 con ssa_9 = intrinsic load_ubo (ssa_1, ssa_8) (access=0, align_mul=1073741824, align_offset=16, range_base=16, range=16)
2022-07-27 22:49:39.458550: vec1 32 div ssa_10 = ffma ssa_2.y, ssa_9.x, ssa_4
2022-07-27 22:49:39.458557: vec1 32 div ssa_11 = ffma ssa_2.y, ssa_9.y, ssa_5
2022-07-27 22:49:39.458564: vec1 32 div ssa_12 = ffma ssa_2.y, ssa_9.z, ssa_6
2022-07-27 22:49:39.458571: vec1 32 div ssa_13 = ffma ssa_2.y, ssa_9.w, ssa_7
2022-07-27 22:49:39.458578: vec1 32 con ssa_14 = load_const (0x00000020 = 0.000000)
2022-07-27 22:49:39.458585: vec4 32 con ssa_15 = intrinsic load_ubo (ssa_1, ssa_14) (access=0, align_mul=1073741824, align_offset=32, range_base=32, range=16)
2022-07-27 22:49:39.458592: vec1 32 div ssa_16 = ffma ssa_2.z, ssa_15.x, ssa_10
2022-07-27 22:49:39.458599: vec1 32 div ssa_17 = ffma ssa_2.z, ssa_15.y, ssa_11
2022-07-27 22:49:39.458666: vec1 32 div ssa_18 = ffma ssa_2.z, ssa_15.z, ssa_12
2022-07-27 22:49:39.458678: vec1 32 div ssa_19 = ffma ssa_2.z, ssa_15.w, ssa_13
2022-07-27 22:49:39.458685: vec1 32 con ssa_20 = load_const (0x00000030 = 0.000000)
2022-07-27 22:49:39.458692: vec4 32 con ssa_21 = intrinsic load_ubo (ssa_1, ssa_20) (access=0, align_mul=1073741824, align_offset=48, range_base=48, range=16)
2022-07-27 22:49:39.458699: vec1 32 div ssa_22 = ffma ssa_2.w, ssa_21.x, ssa_16
2022-07-27 22:49:39.458727: vec1 32 div ssa_23 = ffma ssa_2.w, ssa_21.y, ssa_17
2022-07-27 22:49:39.458734: vec1 32 div ssa_24 = ffma ssa_2.w, ssa_21.z, ssa_18
2022-07-27 22:49:39.458742: vec1 32 div ssa_25 = ffma ssa_2.w, ssa_21.w, ssa_19
2022-07-27 22:49:39.458750: vec4 32 div ssa_26 = intrinsic load_input (ssa_1) (base=1, component=0, dest_type=float32 /*160*/, io location=16 slots=1 /*144*/)
2022-07-27 22:49:39.458757: vec1 32 div ssa_27 = fsat ssa_26.x
2022-07-27 22:49:39.458764: vec1 32 div ssa_28 = fsat ssa_26.y
2022-07-27 22:49:39.458771: vec1 32 div ssa_29 = fsat ssa_26.z
2022-07-27 22:49:39.458778: vec1 32 div ssa_30 = fsat ssa_26.w
2022-07-27 22:49:39.458785: vec4 32 div ssa_31 = intrinsic load_input (ssa_1) (base=2, component=0, dest_type=float32 /*160*/, io location=17 slots=1 /*145*/)
2022-07-27 22:49:39.458792: vec1 32 div ssa_32 = fsat ssa_31.x
2022-07-27 22:49:39.458799: vec1 32 div ssa_33 = fsat ssa_31.y
2022-07-27 22:49:39.458807: vec1 32 div ssa_34 = fsat ssa_31.z
2022-07-27 22:49:39.458814: vec1 32 div ssa_35 = fsat ssa_31.w
2022-07-27 22:49:39.458821: vec4 32 div ssa_36 = vec4 ssa_22, ssa_23, ssa_24, ssa_25
2022-07-27 22:49:39.458830: intrinsic store_output (ssa_36, ssa_1) (base=0, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=0 slots=1 /*128*/, xfb() /*0*/, xfb2() /*0*/) /* out_0 */
2022-07-27 22:49:39.458839: vec4 32 div ssa_37 = vec4 ssa_27, ssa_28, ssa_29, ssa_30
2022-07-27 22:49:39.458846: intrinsic store_output (ssa_37, ssa_1) (base=1, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=1 slots=1 /*129*/, xfb() /*0*/, xfb2() /*0*/) /* out_1 */
2022-07-27 22:49:39.458853: vec4 32 div ssa_38 = vec4 ssa_32, ssa_33, ssa_34, ssa_35
2022-07-27 22:49:39.458861: intrinsic store_output (ssa_38, ssa_1) (base=2, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=2 slots=1 /*130*/, xfb() /*0*/, xfb2() /*0*/) /* out_2 */
2022-07-27 22:49:39.458868: vec4 32 div ssa_39 = vec4 ssa_31.w, ssa_0.y, ssa_0.z, ssa_0.w
2022-07-27 22:49:39.458876: intrinsic store_output (ssa_39, ssa_1) (base=48, wrmask=x /*1*/, component=0, src_type=float32 /*160*/, io location=48 slots=1 /*176*/, xfb() /*0*/, xfb2() /*0*/) /* out_3 */
2022-07-27 22:49:39.458885: /* succs: block_1 */
2022-07-27 22:49:39.458893: block block_1:
2022-07-27 22:49:39.458900: }
2022-07-27 22:49:39.458908: NIR (final form) for vertex shader:
2022-07-27 22:49:39.458916: shader: MESA_SHADER_VERTEX
2022-07-27 22:49:39.458924: source_sha1: {0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000}
2022-07-27 22:49:39.458931: name: TTN
2022-07-27 22:49:39.458939: inputs: 3
2022-07-27 22:49:39.458946: outputs: 4
2022-07-27 22:49:39.458954: uniforms: 0
2022-07-27 22:49:39.458962: ubos: 1
2022-07-27 22:49:39.458970: shared: 0
2022-07-27 22:49:39.458989: ray queries: 0
2022-07-27 22:49:39.458997: decl_var shader_in INTERP_MODE_FLAT vec4 in_0 (VERT_ATTRIB_GENERIC0.xyzw, 15, 0)
2022-07-27 22:49:39.459006: decl_var shader_in INTERP_MODE_FLAT vec4 in_1 (VERT_ATTRIB_GENERIC1.xyzw, 16, 0)
2022-07-27 22:49:39.459015: decl_var shader_in INTERP_MODE_FLAT vec4 in_2 (VERT_ATTRIB_GENERIC2.xyzw, 17, 0)
2022-07-27 22:49:39.459024: decl_var shader_out INTERP_MODE_FLAT vec4 out_0 (VARYING_SLOT_POS.xyzw, 0, 0)
2022-07-27 22:49:39.459033: decl_var shader_out INTERP_MODE_FLAT vec4 out_1 (VARYING_SLOT_COL0.xyzw, 1, 0)
2022-07-27 22:49:39.459067: decl_var shader_out INTERP_MODE_FLAT vec4 out_2 (VARYING_SLOT_COL1.xyzw, 2, 0)
2022-07-27 22:49:39.459078: decl_var shader_out INTERP_MODE_FLAT vec4 out_3 (VARYING_SLOT_VAR16.xyzw, 48, 0)
2022-07-27 22:49:39.459086: decl_var uniform INTERP_MODE_NONE vec4[4] uniform_0 (0, 0, 0)
2022-07-27 22:49:39.459094: decl_var ubo INTERP_MODE_NONE vec4[4] uniform_0@0 (0, 0, 0)
2022-07-27 22:49:39.459103: decl_function main (0 params)
2022-07-27 22:49:39.459111: impl main {
2022-07-27 22:49:39.459120: block block_0:
2022-07-27 22:49:39.459128: /* preds: */
2022-07-27 22:49:39.459137: vec4 32 con ssa_0 = undefined
2022-07-27 22:49:39.459146: vec1 32 con ssa_1 = load_const (0x00000000 = 0.000000)
2022-07-27 22:49:39.459154: vec4 32 div ssa_2 = intrinsic load_input (ssa_1) (base=0, component=0, dest_type=float32 /*160*/, io location=15 slots=1 /*143*/)
2022-07-27 22:49:39.459163: vec4 32 con ssa_3 = intrinsic load_ubo (ssa_1, ssa_1) (access=0, align_mul=1073741824, align_offset=0, range_base=0, range=16)
2022-07-27 22:49:39.459173: vec1 32 div ssa_4 = fmul ssa_2.x, ssa_3.x
2022-07-27 22:49:39.459203: vec1 32 div ssa_5 = fmul ssa_2.x, ssa_3.y
2022-07-27 22:49:39.459217: vec1 32 div ssa_6 = fmul ssa_2.x, ssa_3.z
2022-07-27 22:49:44.714250: vec1 32 div ssa_7 = fmul ssa_2.x, ssa_3.w
2022-07-27 22:49:44.714387: vec1 32 con ssa_8 = load_const (0x00000010 = 0.000000)
2022-07-27 22:49:44.714404: vec4 32 con ssa_9 = intrinsic load_ubo (ssa_1, ssa_8) (access=0, align_mul=1073741824, align_offset=16, range_base=16, range=16)
2022-07-27 22:49:44.714426: vec1 32 div ssa_10 = ffma ssa_2.y, ssa_9.x, ssa_4
2022-07-27 22:49:44.714436: vec1 32 div ssa_11 = ffma ssa_2.y, ssa_9.y, ssa_5
2022-07-27 22:49:44.714444: vec1 32 div ssa_12 = ffma ssa_2.y, ssa_9.z, ssa_6
2022-07-27 22:49:44.714453: vec1 32 div ssa_13 = ffma ssa_2.y, ssa_9.w, ssa_7
2022-07-27 22:49:44.714461: vec1 32 con ssa_14 = load_const (0x00000020 = 0.000000)
2022-07-27 22:49:44.714470: vec4 32 con ssa_15 = intrinsic load_ubo (ssa_1, ssa_14) (access=0, align_mul=1073741824, align_offset=32, range_base=32, range=16)
2022-07-27 22:49:44.714480: vec1 32 div ssa_16 = ffma ssa_2.z, ssa_15.x, ssa_10
2022-07-27 22:49:44.714489: vec1 32 div ssa_17 = ffma ssa_2.z, ssa_15.y, ssa_11
2022-07-27 22:49:44.714500: vec1 32 div ssa_18 = ffma ssa_2.z, ssa_15.z, ssa_12
2022-07-27 22:49:44.714510: vec1 32 div ssa_19 = ffma ssa_2.z, ssa_15.w, ssa_13
2022-07-27 22:49:44.714518: vec1 32 con ssa_20 = load_const (0x00000030 = 0.000000)
2022-07-27 22:49:44.714527: vec4 32 con ssa_21 = intrinsic load_ubo (ssa_1, ssa_20) (access=0, align_mul=1073741824, align_offset=48, range_base=48, range=16)
2022-07-27 22:49:44.714537: vec1 32 div ssa_22 = ffma ssa_2.w, ssa_21.x, ssa_16
2022-07-27 22:49:44.714545: vec1 32 div ssa_23 = ffma ssa_2.w, ssa_21.y, ssa_17
2022-07-27 22:49:44.714554: vec1 32 div ssa_24 = ffma ssa_2.w, ssa_21.z, ssa_18
2022-07-27 22:49:44.714562: vec1 32 div ssa_25 = ffma ssa_2.w, ssa_21.w, ssa_19
2022-07-27 22:49:44.714571: vec4 32 div ssa_26 = intrinsic load_input (ssa_1) (base=1, component=0, dest_type=float32 /*160*/, io location=16 slots=1 /*144*/)
2022-07-27 22:49:44.714580: vec1 32 div ssa_27 = fsat ssa_26.x
2022-07-27 22:49:44.714589: vec1 32 div ssa_28 = fsat ssa_26.y
2022-07-27 22:49:44.714599: vec1 32 div ssa_29 = fsat ssa_26.z
2022-07-27 22:49:44.714608: vec1 32 div ssa_30 = fsat ssa_26.w
2022-07-27 22:49:44.714616: vec4 32 div ssa_31 = intrinsic load_input (ssa_1) (base=2, component=0, dest_type=float32 /*160*/, io location=17 slots=1 /*145*/)
2022-07-27 22:49:44.714625: vec1 32 div ssa_32 = fsat ssa_31.x
2022-07-27 22:49:44.714634: vec1 32 div ssa_33 = fsat ssa_31.y
2022-07-27 22:49:44.714643: vec1 32 div ssa_34 = fsat ssa_31.z
2022-07-27 22:49:44.714652: vec1 32 div ssa_35 = fsat ssa_31.w
2022-07-27 22:49:44.714661: vec4 32 div ssa_36 = vec4 ssa_22, ssa_23, ssa_24, ssa_25
2022-07-27 22:49:44.714696: intrinsic store_output (ssa_36, ssa_1) (base=0, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=0 slots=1 /*128*/, xfb() /*0*/, xfb2() /*0*/) /* out_0 */
2022-07-27 22:49:44.714706: vec4 32 div ssa_37 = vec4 ssa_27, ssa_28, ssa_29, ssa_30
2022-07-27 22:49:44.714714: intrinsic store_output (ssa_37, ssa_1) (base=1, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=1 slots=1 /*129*/, xfb() /*0*/, xfb2() /*0*/) /* out_1 */
2022-07-27 22:49:44.714723: vec4 32 div ssa_38 = vec4 ssa_32, ssa_33, ssa_34, ssa_35
2022-07-27 22:49:44.714730: intrinsic store_output (ssa_38, ssa_1) (base=2, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=2 slots=1 /*130*/, xfb() /*0*/, xfb2() /*0*/) /* out_2 */
2022-07-27 22:49:44.714740: vec4 32 div ssa_39 = vec4 ssa_31.w, ssa_0.y, ssa_0.z, ssa_0.w
2022-07-27 22:49:44.714748: intrinsic store_output (ssa_39, ssa_1) (base=48, wrmask=x /*1*/, component=0, src_type=float32 /*160*/, io location=48 slots=1 /*176*/, xfb() /*0*/, xfb2() /*0*/) /* out_3 */
2022-07-27 22:49:44.714757: /* succs: block_1 */
2022-07-27 22:49:44.714766: block block_1:
2022-07-27 22:49:44.714774: }
2022-07-27 22:49:44.714782: VS Output VUE map (23 slots, SSO)
2022-07-27 22:49:44.714791: [0] VARYING_SLOT_PSIZ
2022-07-27 22:49:44.714799: [1] VARYING_SLOT_POS
2022-07-27 22:49:44.714808: [2] VARYING_SLOT_CLIP_DIST0
2022-07-27 22:49:44.714817: [3] VARYING_SLOT_CLIP_DIST1
2022-07-27 22:49:44.714825: [4] VARYING_SLOT_COL0
2022-07-27 22:49:44.714834: [5] VARYING_SLOT_COL1
2022-07-27 22:49:44.714843: [6] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.714851: [7] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.714859: [8] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.714867: [9] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.714876: [10] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.714883: [11] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.714891: [12] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.714899: [13] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.714908: [14] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.714917: [15] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.714926: [16] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.714934: [17] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.714944: [18] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.714952: [19] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.714960: [20] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.714969: [21] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.714978: [22] VARYING_SLOT_VAR16
2022-07-27 22:49:44.714986: Native code for unnamed vertex shader TTN (sha1 48a99faba555e0b3c35cae172e890c310d24424e)
2022-07-27 22:49:44.714996: SIMD8 shader: 29 instructions. 0 loops. 114 cycles. 0:0 spills:fills, 3 sends, scheduled with mode top-down. Promoted 0 constants. Compacted 464 to 352 bytes (24%)
2022-07-27 22:49:44.715005: START B0 (114 cycles)
2022-07-27 22:49:44.715013: mul(8) g24<1>F g4<8,8,1>F g2<0,1,0>F { align1 1Q compacted };
2022-07-27 22:49:44.715022: mul(8) g25<1>F g4<8,8,1>F g2.1<0,1,0>F { align1 1Q compacted };
2022-07-27 22:49:44.715031: mul(8) g26<1>F g4<8,8,1>F g2.2<0,1,0>F { align1 1Q compacted };
2022-07-27 22:49:44.715040: mul(8) g27<1>F g4<8,8,1>F g2.3<0,1,0>F { align1 1Q compacted };
2022-07-27 22:49:44.715049: mov.sat(8) g16<1>F g8<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.715057: mov.sat(8) g17<1>F g9<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.715163: mov.sat(8) g18<1>F g10<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.715175: mov.sat(8) g19<1>F g11<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.715206: mov.sat(8) g20<1>F g12<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.715224: mov.sat(8) g21<1>F g13<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.715234: mov.sat(8) g22<1>F g14<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.715242: mov.sat(8) g23<1>F g15<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.715251: mov(8) g126<1>UD g1<8,8,1>UD { align1 WE_all 1Q compacted };
2022-07-27 22:49:44.715261: mad(8) g28<1>F g24<4,4,1>F g2.4<0,1,0>F g5<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.715269: mad(8) g29<1>F g25<4,4,1>F g2.5<0,1,0>F g5<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.715277: mad(8) g30<1>F g26<4,4,1>F g2.6<0,1,0>F g5<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.715286: mad(8) g31<1>F g27<4,4,1>F g2.7<0,1,0>F g5<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.715294: mad(8) g32<1>F g28<4,4,1>F g3.0<0,1,0>F g6<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.715302: mad(8) g33<1>F g29<4,4,1>F g3.1<0,1,0>F g6<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.715310: mad(8) g34<1>F g30<4,4,1>F g3.2<0,1,0>F g6<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.715318: mad(8) g35<1>F g31<4,4,1>F g3.3<0,1,0>F g6<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.715326: mad(8) g52<1>F g32<4,4,1>F g3.4<0,1,0>F g7<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.715334: mad(8) g53<1>F g33<4,4,1>F g3.5<0,1,0>F g7<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.715342: mad(8) g54<1>F g34<4,4,1>F g3.6<0,1,0>F g7<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.715352: mad(8) g55<1>F g35<4,4,1>F g3.7<0,1,0>F g7<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.715360: sends(8) nullUD g1UD g52UD 0x02080017 0x00000100
2022-07-27 22:49:44.715368: urb MsgDesc: offset 1 SIMD8 write mlen 1 ex_mlen 4 rlen 0 { align1 1Q };
2022-07-27 22:49:44.715377: sends(8) nullUD g1UD g16UD 0x02080047 0x00000200
2022-07-27 22:49:44.715385: urb MsgDesc: offset 4 SIMD8 write mlen 1 ex_mlen 8 rlen 0 { align1 1Q };
2022-07-27 22:49:44.715393: mov(8) g122<1>F g15<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.715400: sends(8) nullUD g126UD g122UD 0x02080167 0x00000100
2022-07-27 22:49:44.715409: urb MsgDesc: offset 22 SIMD8 write mlen 1 ex_mlen 4 rlen 0 { align1 1Q EOT };
2022-07-27 22:49:44.715419: END B0
2022-07-27 22:49:44.715428: NIR (SSA form) for fragment shader:
2022-07-27 22:49:44.715437: shader: MESA_SHADER_FRAGMENT
2022-07-27 22:49:44.715446: source_sha1: {0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000}
2022-07-27 22:49:44.715454: name: TTN
2022-07-27 22:49:44.715463: inputs: 2
2022-07-27 22:49:44.715470: outputs: 1
2022-07-27 22:49:44.715479: uniforms: 0
2022-07-27 22:49:44.715488: ubos: 1
2022-07-27 22:49:44.715497: shared: 0
2022-07-27 22:49:44.715506: ray queries: 0
2022-07-27 22:49:44.715514: decl_var shader_in INTERP_MODE_SMOOTH vec4 in_0 (VARYING_SLOT_COL0.xyzw, 1, 0)
2022-07-27 22:49:44.715524: decl_var shader_in INTERP_MODE_SMOOTH vec4 in_1 (VARYING_SLOT_VAR16.xyzw, 48, 0)
2022-07-27 22:49:44.715534: decl_var shader_out INTERP_MODE_FLAT vec4 out_0 (FRAG_RESULT_DATA0.xyzw, 8, 0)
2022-07-27 22:49:44.715543: decl_var uniform INTERP_MODE_NONE vec4 uniform_21 (21, 21, 0)
2022-07-27 22:49:44.715553: decl_var ubo INTERP_MODE_NONE vec4[22] uniform_0 (0, 0, 0)
2022-07-27 22:49:44.715563: decl_function main (0 params)
2022-07-27 22:49:44.715580: impl main {
2022-07-27 22:49:44.715589: block block_0:
2022-07-27 22:49:44.715599: /* preds: */
2022-07-27 22:49:44.715608: vec2 32 div ssa_0 = intrinsic load_barycentric_pixel () (interp_mode=1)
2022-07-27 22:49:44.715617: vec1 32 con ssa_1 = load_const (0x00000000 = 0.000000)
2022-07-27 22:49:44.715627: vec4 32 div ssa_2 = intrinsic load_interpolated_input (ssa_0, ssa_1) (base=1, component=0, dest_type=float32 /*160*/, io location=1 slots=1 /*129*/) /* in_0 */
2022-07-27 22:49:44.715637: vec4 32 div ssa_3 = intrinsic load_interpolated_input (ssa_0, ssa_1) (base=48, component=0, dest_type=float32 /*160*/, io location=48 slots=1 /*176*/) /* in_1 */
2022-07-27 22:49:44.715646: vec1 32 con ssa_4 = load_const (0x00000150 = 0.000000)
2022-07-27 22:49:44.715702: vec1 32 con ssa_5 = load_const (0x00000001 = 0.000000)
2022-07-27 22:49:44.715716: vec4 32 con ssa_6 = intrinsic load_ubo (ssa_5, ssa_4) (access=0, align_mul=1073741824, align_offset=336, range_base=336, range=16)
2022-07-27 22:49:44.715726: vec1 32 div ssa_7 = flrp ssa_6.x, ssa_2.x, ssa_3.x
2022-07-27 22:49:44.715736: vec1 32 div ssa_8 = flrp ssa_6.y, ssa_2.y, ssa_3.x
2022-07-27 22:49:44.715745: vec1 32 div ssa_9 = flrp ssa_6.z, ssa_2.z, ssa_3.x
2022-07-27 22:49:44.715755: vec4 32 div ssa_10 = vec4 ssa_7, ssa_8, ssa_9, ssa_2.w
2022-07-27 22:49:44.715763: intrinsic store_output (ssa_10, ssa_1) (base=8, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=4 slots=1 /*132*/, xfb() /*0*/, xfb2() /*0*/) /* out_0 */
2022-07-27 22:49:44.715773: /* succs: block_1 */
2022-07-27 22:49:44.715783: block block_1:
2022-07-27 22:49:44.715791: }
2022-07-27 22:49:44.715799: NIR (final form) for fragment shader:
2022-07-27 22:49:44.715807: shader: MESA_SHADER_FRAGMENT
2022-07-27 22:49:44.715815: source_sha1: {0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000}
2022-07-27 22:49:44.715824: name: TTN
2022-07-27 22:49:44.715833: inputs: 2
2022-07-27 22:49:44.715842: outputs: 1
2022-07-27 22:49:44.715851: uniforms: 0
2022-07-27 22:49:44.715859: ubos: 1
2022-07-27 22:49:44.715869: shared: 0
2022-07-27 22:49:44.715877: ray queries: 0
2022-07-27 22:49:44.715912: decl_var shader_in INTERP_MODE_SMOOTH vec4 in_0 (VARYING_SLOT_COL0.xyzw, 1, 0)
2022-07-27 22:49:44.715922: decl_var shader_in INTERP_MODE_SMOOTH vec4 in_1 (VARYING_SLOT_VAR16.xyzw, 48, 0)
2022-07-27 22:49:44.715931: decl_var shader_out INTERP_MODE_FLAT vec4 out_0 (FRAG_RESULT_DATA0.xyzw, 8, 0)
2022-07-27 22:49:44.715939: decl_var uniform INTERP_MODE_NONE vec4 uniform_21 (21, 21, 0)
2022-07-27 22:49:44.715948: decl_var ubo INTERP_MODE_NONE vec4[22] uniform_0 (0, 0, 0)
2022-07-27 22:49:44.715957: decl_function main (0 params)
2022-07-27 22:49:44.715966: impl main {
2022-07-27 22:49:44.715974: block block_0:
2022-07-27 22:49:44.715983: /* preds: */
2022-07-27 22:49:44.715991: vec2 32 div ssa_0 = intrinsic load_barycentric_pixel () (interp_mode=1)
2022-07-27 22:49:44.716000: vec1 32 con ssa_1 = load_const (0x00000000 = 0.000000)
2022-07-27 22:49:44.716009: vec4 32 div ssa_2 = intrinsic load_interpolated_input (ssa_0, ssa_1) (base=1, component=0, dest_type=float32 /*160*/, io location=1 slots=1 /*129*/) /* in_0 */
2022-07-27 22:49:44.716018: vec4 32 div ssa_3 = intrinsic load_interpolated_input (ssa_0, ssa_1) (base=48, component=0, dest_type=float32 /*160*/, io location=48 slots=1 /*176*/) /* in_1 */
2022-07-27 22:49:44.716027: vec1 32 con ssa_4 = load_const (0x00000150 = 0.000000)
2022-07-27 22:49:44.716036: vec1 32 con ssa_5 = load_const (0x00000001 = 0.000000)
2022-07-27 22:49:44.716045: vec4 32 con ssa_6 = intrinsic load_ubo (ssa_5, ssa_4) (access=0, align_mul=1073741824, align_offset=336, range_base=336, range=16)
2022-07-27 22:49:44.716053: vec1 32 div ssa_7 = flrp ssa_6.x, ssa_2.x, ssa_3.x
2022-07-27 22:49:44.716062: vec1 32 div ssa_8 = flrp ssa_6.y, ssa_2.y, ssa_3.x
2022-07-27 22:49:44.716079: vec1 32 div ssa_9 = flrp ssa_6.z, ssa_2.z, ssa_3.x
2022-07-27 22:49:44.716088: vec4 32 div ssa_10 = vec4 ssa_7, ssa_8, ssa_9, ssa_2.w
2022-07-27 22:49:44.716096: intrinsic store_output (ssa_10, ssa_1) (base=8, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=4 slots=1 /*132*/, xfb() /*0*/, xfb2() /*0*/) /* out_0 */
2022-07-27 22:49:44.716106: /* succs: block_1 */
2022-07-27 22:49:44.716114: block block_1:
2022-07-27 22:49:44.716123: }
2022-07-27 22:49:44.716131: Native code for unnamed fragment shader TTN (sha1 b1f8d995d2d32a1c0ec7911c237f4d1ca4bbf163)
2022-07-27 22:49:44.716161: SIMD8 shader: 9 instructions. 0 loops. 58 cycles. 0:0 spills:fills, 1 sends, scheduled with mode top-down. Promoted 0 constants. Compacted 144 to 112 bytes (22%)
2022-07-27 22:49:44.716170: START B0 (58 cycles)
2022-07-27 22:49:44.716178: pln(8) g11<1>F g5<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.716188: pln(8) g9<1>F g5.4<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.716196: pln(8) g13<1>F g6<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.716206: pln(8) g126<1>F g6.4<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.716215: pln(8) g8<1>F g7<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.716223: lrp(8) g123<1>F g8<4,4,1>F g11<4,4,1>F g4.4<0,1,0>F { align16 1Q };
2022-07-27 22:49:44.716232: lrp(8) g124<1>F g8<4,4,1>F g9<4,4,1>F g4.5<0,1,0>F { align16 1Q };
2022-07-27 22:49:44.716240: lrp(8) g125<1>F g8<4,4,1>F g13<4,4,1>F g4.6<0,1,0>F { align16 1Q };
2022-07-27 22:49:44.716249: sendc(8) null<1>UW g123<0,1,0>UD 0x88031400
2022-07-27 22:49:44.716257: render MsgDesc: RT write SIMD8 LastRT Surface = 0 mlen 4 rlen 0 { align1 1Q EOT };
2022-07-27 22:49:44.716266: END B0
2022-07-27 22:49:44.716306: Native code for unnamed fragment shader TTN (sha1 19443fec2f6c2cd45fa7bb5453e75244bc717290)
2022-07-27 22:49:44.716316: SIMD16 shader: 9 instructions. 0 loops. 96 cycles. 0:0 spills:fills, 1 sends, scheduled with mode top-down. Promoted 0 constants. Compacted 144 to 112 bytes (22%)
2022-07-27 22:49:44.716325: START B0 (96 cycles)
2022-07-27 22:49:44.716335: pln(16) g10<1>F g7<0,1,0>F g2<8,8,1>F { align1 1H compacted };
2022-07-27 22:49:44.716344: pln(16) g12<1>F g7.4<0,1,0>F g2<8,8,1>F { align1 1H compacted };
2022-07-27 22:49:44.716352: pln(16) g14<1>F g8<0,1,0>F g2<8,8,1>F { align1 1H compacted };
2022-07-27 22:49:44.716361: pln(16) g125<1>F g8.4<0,1,0>F g2<8,8,1>F { align1 1H compacted };
2022-07-27 22:49:44.716370: pln(16) g16<1>F g9<0,1,0>F g2<8,8,1>F { align1 1H compacted };
2022-07-27 22:49:44.716379: lrp(16) g119<1>F g16<4,4,1>F g10<4,4,1>F g6.4<0,1,0>F { align16 1H };
2022-07-27 22:49:44.716387: lrp(16) g121<1>F g16<4,4,1>F g12<4,4,1>F g6.5<0,1,0>F { align16 1H };
2022-07-27 22:49:44.716397: lrp(16) g123<1>F g16<4,4,1>F g14<4,4,1>F g6.6<0,1,0>F { align16 1H };
2022-07-27 22:49:44.716405: sendc(16) null<1>UW g119<0,1,0>UD 0x90031000
2022-07-27 22:49:44.716414: render MsgDesc: RT write SIMD16 LastRT Surface = 0 mlen 8 rlen 0 { align1 1H EOT };
2022-07-27 22:49:44.716422: END B0
2022-07-27 22:49:44.716431: NIR (SSA form) for vertex shader:
2022-07-27 22:49:44.716440: shader: MESA_SHADER_VERTEX
2022-07-27 22:49:44.716449: source_sha1: {0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000}
2022-07-27 22:49:44.716458: name: TTN
2022-07-27 22:49:44.716467: inputs: 3
2022-07-27 22:49:44.716483: outputs: 4
2022-07-27 22:49:44.716491: uniforms: 0
2022-07-27 22:49:44.716500: ubos: 1
2022-07-27 22:49:44.716508: shared: 0
2022-07-27 22:49:44.716516: ray queries: 0
2022-07-27 22:49:44.716525: decl_var shader_in INTERP_MODE_FLAT vec4 in_0 (VERT_ATTRIB_GENERIC0.xyzw, 15, 0)
2022-07-27 22:49:44.716534: decl_var shader_in INTERP_MODE_FLAT vec4 in_1 (VERT_ATTRIB_GENERIC1.xyzw, 16, 0)
2022-07-27 22:49:44.716543: decl_var shader_in INTERP_MODE_FLAT vec4 in_2 (VERT_ATTRIB_GENERIC2.xyzw, 17, 0)
2022-07-27 22:49:44.716552: decl_var shader_out INTERP_MODE_FLAT vec4 out_0 (VARYING_SLOT_POS.xyzw, 0, 0)
2022-07-27 22:49:44.716561: decl_var shader_out INTERP_MODE_FLAT vec4 out_1 (VARYING_SLOT_COL0.xyzw, 1, 0)
2022-07-27 22:49:44.716570: decl_var shader_out INTERP_MODE_FLAT vec4 out_2 (VARYING_SLOT_COL1.xyzw, 2, 0)
2022-07-27 22:49:44.716580: decl_var shader_out INTERP_MODE_FLAT vec4 out_3 (VARYING_SLOT_VAR16.xyzw, 48, 0)
2022-07-27 22:49:44.716589: decl_var uniform INTERP_MODE_NONE vec4[8] uniform_0 (0, 0, 0)
2022-07-27 22:49:44.716598: decl_var uniform INTERP_MODE_NONE vec4 uniform_28 (28, 28, 0)
2022-07-27 22:49:44.716607: decl_var ubo INTERP_MODE_NONE vec4[29] uniform_0@0 (0, 0, 0)
2022-07-27 22:49:44.716616: decl_function main (0 params)
2022-07-27 22:49:44.716625: impl main {
2022-07-27 22:49:44.716635: block block_0:
2022-07-27 22:49:44.716643: /* preds: */
2022-07-27 22:49:44.716652: vec4 32 con ssa_0 = undefined
2022-07-27 22:49:44.716661: vec1 32 con ssa_1 = load_const (0x00000000 = 0.000000)
2022-07-27 22:49:44.716670: vec4 32 div ssa_2 = intrinsic load_input (ssa_1) (base=0, component=0, dest_type=float32 /*160*/, io location=15 slots=1 /*143*/)
2022-07-27 22:49:44.716680: vec4 32 con ssa_3 = intrinsic load_ubo (ssa_1, ssa_1) (access=0, align_mul=1073741824, align_offset=0, range_base=0, range=16)
2022-07-27 22:49:44.716690: vec1 32 div ssa_4 = fmul ssa_2.x, ssa_3.x
2022-07-27 22:49:44.716699: vec1 32 div ssa_5 = fmul ssa_2.x, ssa_3.y
2022-07-27 22:49:44.716708: vec1 32 div ssa_6 = fmul ssa_2.x, ssa_3.z
2022-07-27 22:49:44.716717: vec1 32 div ssa_7 = fmul ssa_2.x, ssa_3.w
2022-07-27 22:49:44.716726: vec1 32 con ssa_8 = load_const (0x00000010 = 0.000000)
2022-07-27 22:49:44.716736: vec4 32 con ssa_9 = intrinsic load_ubo (ssa_1, ssa_8) (access=0, align_mul=1073741824, align_offset=16, range_base=16, range=16)
2022-07-27 22:49:44.716744: vec1 32 div ssa_10 = ffma ssa_2.y, ssa_9.x, ssa_4
2022-07-27 22:49:44.716753: vec1 32 div ssa_11 = ffma ssa_2.y, ssa_9.y, ssa_5
2022-07-27 22:49:44.716762: vec1 32 div ssa_12 = ffma ssa_2.y, ssa_9.z, ssa_6
2022-07-27 22:49:44.716769: vec1 32 div ssa_13 = ffma ssa_2.y, ssa_9.w, ssa_7
2022-07-27 22:49:44.716777: vec1 32 con ssa_14 = load_const (0x00000020 = 0.000000)
2022-07-27 22:49:44.716784: vec4 32 con ssa_15 = intrinsic load_ubo (ssa_1, ssa_14) (access=0, align_mul=1073741824, align_offset=32, range_base=32, range=16)
2022-07-27 22:49:44.716792: vec1 32 div ssa_16 = ffma ssa_2.z, ssa_15.x, ssa_10
2022-07-27 22:49:44.716800: vec1 32 div ssa_17 = ffma ssa_2.z, ssa_15.y, ssa_11
2022-07-27 22:49:44.716808: vec1 32 div ssa_18 = ffma ssa_2.z, ssa_15.z, ssa_12
2022-07-27 22:49:44.716816: vec1 32 div ssa_19 = ffma ssa_2.z, ssa_15.w, ssa_13
2022-07-27 22:49:44.716824: vec1 32 con ssa_20 = load_const (0x00000030 = 0.000000)
2022-07-27 22:49:44.716831: vec4 32 con ssa_21 = intrinsic load_ubo (ssa_1, ssa_20) (access=0, align_mul=1073741824, align_offset=48, range_base=48, range=16)
2022-07-27 22:49:44.716839: vec1 32 div ssa_22 = ffma ssa_2.w, ssa_21.x, ssa_16
2022-07-27 22:49:44.716846: vec1 32 div ssa_23 = ffma ssa_2.w, ssa_21.y, ssa_17
2022-07-27 22:49:44.716853: vec1 32 div ssa_24 = ffma ssa_2.w, ssa_21.z, ssa_18
2022-07-27 22:49:44.716860: vec1 32 div ssa_25 = ffma ssa_2.w, ssa_21.w, ssa_19
2022-07-27 22:49:44.716867: vec1 32 con ssa_26 = load_const (0x00000040 = 0.000000)
2022-07-27 22:49:44.716881: vec4 32 con ssa_27 = intrinsic load_ubo (ssa_1, ssa_26) (access=0, align_mul=1073741824, align_offset=64, range_base=64, range=16)
2022-07-27 22:49:44.716891: vec1 32 div ssa_28 = fmul ssa_2.x, ssa_27.z
2022-07-27 22:49:44.716900: vec1 32 con ssa_29 = load_const (0x00000050 = 0.000000)
2022-07-27 22:49:44.716909: vec4 32 con ssa_30 = intrinsic load_ubo (ssa_1, ssa_29) (access=0, align_mul=1073741824, align_offset=80, range_base=80, range=16)
2022-07-27 22:49:44.716918: vec1 32 div ssa_31 = ffma ssa_2.y, ssa_30.z, ssa_28
2022-07-27 22:49:44.716927: vec1 32 con ssa_32 = load_const (0x00000060 = 0.000000)
2022-07-27 22:49:44.716936: vec4 32 con ssa_33 = intrinsic load_ubo (ssa_1, ssa_32) (access=0, align_mul=1073741824, align_offset=96, range_base=96, range=16)
2022-07-27 22:49:44.716945: vec1 32 div ssa_34 = ffma ssa_2.z, ssa_33.z, ssa_31
2022-07-27 22:49:44.716953: vec1 32 con ssa_35 = load_const (0x00000070 = 0.000000)
2022-07-27 22:49:44.716960: vec4 32 con ssa_36 = intrinsic load_ubo (ssa_1, ssa_35) (access=0, align_mul=1073741824, align_offset=112, range_base=112, range=16)
2022-07-27 22:49:44.716969: vec1 32 div ssa_37 = ffma ssa_2.w, ssa_36.z, ssa_34
2022-07-27 22:49:44.716977: vec4 32 div ssa_38 = intrinsic load_input (ssa_1) (base=1, component=0, dest_type=float32 /*160*/, io location=16 slots=1 /*144*/)
2022-07-27 22:49:44.716985: vec1 32 div ssa_39 = fsat ssa_38.x
2022-07-27 22:49:44.716994: vec1 32 div ssa_40 = fsat ssa_38.y
2022-07-27 22:49:44.717002: vec1 32 div ssa_41 = fsat ssa_38.z
2022-07-27 22:49:44.717009: vec1 32 div ssa_42 = fsat ssa_38.w
2022-07-27 22:49:44.717018: vec4 32 div ssa_43 = intrinsic load_input (ssa_1) (base=2, component=0, dest_type=float32 /*160*/, io location=17 slots=1 /*145*/)
2022-07-27 22:49:44.717027: vec1 32 div ssa_44 = fsat ssa_43.x
2022-07-27 22:49:44.717035: vec1 32 div ssa_45 = fsat ssa_43.y
2022-07-27 22:49:44.717043: vec1 32 div ssa_46 = fsat ssa_43.z
2022-07-27 22:49:44.717051: vec1 32 div ssa_47 = fsat ssa_43.w
2022-07-27 22:49:44.717059: vec1 32 div ssa_48 = fabs ssa_37
2022-07-27 22:49:44.717067: vec1 32 con ssa_49 = load_const (0x000001c0 = 0.000000)
2022-07-27 22:49:44.717075: vec4 32 con ssa_50 = intrinsic load_ubo (ssa_1, ssa_49) (access=0, align_mul=1073741824, align_offset=448, range_base=448, range=16)
2022-07-27 22:49:44.717083: vec1 32 div ssa_51 = fneg ssa_48
2022-07-27 22:49:44.717092: vec1 32 div ssa_52 = fadd ssa_50.x, ssa_51
2022-07-27 22:49:44.717100: vec1 32 div ssa_53 = fmul ssa_52, ssa_50.y
2022-07-27 22:49:44.717109: vec1 32 div ssa_54 = fsat ssa_53
2022-07-27 22:49:44.717117: vec4 32 div ssa_55 = vec4 ssa_22, ssa_23, ssa_24, ssa_25
2022-07-27 22:49:44.717126: intrinsic store_output (ssa_55, ssa_1) (base=0, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=0 slots=1 /*128*/, xfb() /*0*/, xfb2() /*0*/) /* out_0 */
2022-07-27 22:49:44.717135: vec4 32 div ssa_56 = vec4 ssa_39, ssa_40, ssa_41, ssa_42
2022-07-27 22:49:44.717143: intrinsic store_output (ssa_56, ssa_1) (base=1, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=1 slots=1 /*129*/, xfb() /*0*/, xfb2() /*0*/) /* out_1 */
2022-07-27 22:49:44.717152: vec4 32 div ssa_57 = vec4 ssa_44, ssa_45, ssa_46, ssa_47
2022-07-27 22:49:44.717160: intrinsic store_output (ssa_57, ssa_1) (base=2, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=2 slots=1 /*130*/, xfb() /*0*/, xfb2() /*0*/) /* out_2 */
2022-07-27 22:49:44.717169: vec4 32 div ssa_58 = vec4 ssa_54, ssa_0.y, ssa_0.z, ssa_0.w
2022-07-27 22:49:44.717177: intrinsic store_output (ssa_58, ssa_1) (base=48, wrmask=x /*1*/, component=0, src_type=float32 /*160*/, io location=48 slots=1 /*176*/, xfb() /*0*/, xfb2() /*0*/) /* out_3 */
2022-07-27 22:49:44.717185: /* succs: block_1 */
2022-07-27 22:49:44.717194: block block_1:
2022-07-27 22:49:44.717203: }
2022-07-27 22:49:44.717243: NIR (final form) for vertex shader:
2022-07-27 22:49:44.717252: shader: MESA_SHADER_VERTEX
2022-07-27 22:49:44.717260: source_sha1: {0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000}
2022-07-27 22:49:44.717269: name: TTN
2022-07-27 22:49:44.717278: inputs: 3
2022-07-27 22:49:44.717286: outputs: 4
2022-07-27 22:49:44.717295: uniforms: 0
2022-07-27 22:49:44.717303: ubos: 1
2022-07-27 22:49:44.717311: shared: 0
2022-07-27 22:49:44.717319: ray queries: 0
2022-07-27 22:49:44.717327: decl_var shader_in INTERP_MODE_FLAT vec4 in_0 (VERT_ATTRIB_GENERIC0.xyzw, 15, 0)
2022-07-27 22:49:44.717335: decl_var shader_in INTERP_MODE_FLAT vec4 in_1 (VERT_ATTRIB_GENERIC1.xyzw, 16, 0)
2022-07-27 22:49:44.717343: decl_var shader_in INTERP_MODE_FLAT vec4 in_2 (VERT_ATTRIB_GENERIC2.xyzw, 17, 0)
2022-07-27 22:49:44.717351: decl_var shader_out INTERP_MODE_FLAT vec4 out_0 (VARYING_SLOT_POS.xyzw, 0, 0)
2022-07-27 22:49:44.717359: decl_var shader_out INTERP_MODE_FLAT vec4 out_1 (VARYING_SLOT_COL0.xyzw, 1, 0)
2022-07-27 22:49:44.717367: decl_var shader_out INTERP_MODE_FLAT vec4 out_2 (VARYING_SLOT_COL1.xyzw, 2, 0)
2022-07-27 22:49:44.717375: decl_var shader_out INTERP_MODE_FLAT vec4 out_3 (VARYING_SLOT_VAR16.xyzw, 48, 0)
2022-07-27 22:49:44.717383: decl_var uniform INTERP_MODE_NONE vec4[8] uniform_0 (0, 0, 0)
2022-07-27 22:49:44.717391: decl_var uniform INTERP_MODE_NONE vec4 uniform_28 (28, 28, 0)
2022-07-27 22:49:44.717399: decl_var ubo INTERP_MODE_NONE vec4[29] uniform_0@0 (0, 0, 0)
2022-07-27 22:49:44.717407: decl_function main (0 params)
2022-07-27 22:49:44.717414: impl main {
2022-07-27 22:49:44.717422: block block_0:
2022-07-27 22:49:44.717430: /* preds: */
2022-07-27 22:49:44.717437: vec4 32 con ssa_0 = undefined
2022-07-27 22:49:44.717445: vec1 32 con ssa_1 = load_const (0x00000000 = 0.000000)
2022-07-27 22:49:44.717453: vec4 32 div ssa_2 = intrinsic load_input (ssa_1) (base=0, component=0, dest_type=float32 /*160*/, io location=15 slots=1 /*143*/)
2022-07-27 22:49:44.717461: vec4 32 con ssa_3 = intrinsic load_ubo (ssa_1, ssa_1) (access=0, align_mul=1073741824, align_offset=0, range_base=0, range=16)
2022-07-27 22:49:44.717468: vec1 32 div ssa_4 = fmul ssa_2.x, ssa_3.x
2022-07-27 22:49:44.717477: vec1 32 div ssa_5 = fmul ssa_2.x, ssa_3.y
2022-07-27 22:49:44.717484: vec1 32 div ssa_6 = fmul ssa_2.x, ssa_3.z
2022-07-27 22:49:44.717492: vec1 32 div ssa_7 = fmul ssa_2.x, ssa_3.w
2022-07-27 22:49:44.717500: vec1 32 con ssa_8 = load_const (0x00000010 = 0.000000)
2022-07-27 22:49:44.717507: vec4 32 con ssa_9 = intrinsic load_ubo (ssa_1, ssa_8) (access=0, align_mul=1073741824, align_offset=16, range_base=16, range=16)
2022-07-27 22:49:44.717515: vec1 32 div ssa_10 = ffma ssa_2.y, ssa_9.x, ssa_4
2022-07-27 22:49:44.717523: vec1 32 div ssa_11 = ffma ssa_2.y, ssa_9.y, ssa_5
2022-07-27 22:49:44.717531: vec1 32 div ssa_12 = ffma ssa_2.y, ssa_9.z, ssa_6
2022-07-27 22:49:44.717539: vec1 32 div ssa_13 = ffma ssa_2.y, ssa_9.w, ssa_7
2022-07-27 22:49:44.717546: vec1 32 con ssa_14 = load_const (0x00000020 = 0.000000)
2022-07-27 22:49:44.717554: vec4 32 con ssa_15 = intrinsic load_ubo (ssa_1, ssa_14) (access=0, align_mul=1073741824, align_offset=32, range_base=32, range=16)
2022-07-27 22:49:44.717562: vec1 32 div ssa_16 = ffma ssa_2.z, ssa_15.x, ssa_10
2022-07-27 22:49:44.717569: vec1 32 div ssa_17 = ffma ssa_2.z, ssa_15.y, ssa_11
2022-07-27 22:49:44.717576: vec1 32 div ssa_18 = ffma ssa_2.z, ssa_15.z, ssa_12
2022-07-27 22:49:44.717584: vec1 32 div ssa_19 = ffma ssa_2.z, ssa_15.w, ssa_13
2022-07-27 22:49:44.717592: vec1 32 con ssa_20 = load_const (0x00000030 = 0.000000)
2022-07-27 22:49:44.717600: vec4 32 con ssa_21 = intrinsic load_ubo (ssa_1, ssa_20) (access=0, align_mul=1073741824, align_offset=48, range_base=48, range=16)
2022-07-27 22:49:44.717608: vec1 32 div ssa_22 = ffma ssa_2.w, ssa_21.x, ssa_16
2022-07-27 22:49:44.717655: vec1 32 div ssa_23 = ffma ssa_2.w, ssa_21.y, ssa_17
2022-07-27 22:49:44.717664: vec1 32 div ssa_24 = ffma ssa_2.w, ssa_21.z, ssa_18
2022-07-27 22:49:44.717671: vec1 32 div ssa_25 = ffma ssa_2.w, ssa_21.w, ssa_19
2022-07-27 22:49:44.717678: vec1 32 con ssa_26 = load_const (0x00000040 = 0.000000)
2022-07-27 22:49:44.717685: vec4 32 con ssa_27 = intrinsic load_ubo (ssa_1, ssa_26) (access=0, align_mul=1073741824, align_offset=64, range_base=64, range=16)
2022-07-27 22:49:44.717692: vec1 32 div ssa_28 = fmul ssa_2.x, ssa_27.z
2022-07-27 22:49:44.717699: vec1 32 con ssa_29 = load_const (0x00000050 = 0.000000)
2022-07-27 22:49:44.717705: vec4 32 con ssa_30 = intrinsic load_ubo (ssa_1, ssa_29) (access=0, align_mul=1073741824, align_offset=80, range_base=80, range=16)
2022-07-27 22:49:44.717713: vec1 32 div ssa_31 = ffma ssa_2.y, ssa_30.z, ssa_28
2022-07-27 22:49:44.717720: vec1 32 con ssa_32 = load_const (0x00000060 = 0.000000)
2022-07-27 22:49:44.717728: vec4 32 con ssa_33 = intrinsic load_ubo (ssa_1, ssa_32) (access=0, align_mul=1073741824, align_offset=96, range_base=96, range=16)
2022-07-27 22:49:44.717735: vec1 32 div ssa_34 = ffma ssa_2.z, ssa_33.z, ssa_31
2022-07-27 22:49:44.717742: vec1 32 con ssa_35 = load_const (0x00000070 = 0.000000)
2022-07-27 22:49:44.717749: vec4 32 con ssa_36 = intrinsic load_ubo (ssa_1, ssa_35) (access=0, align_mul=1073741824, align_offset=112, range_base=112, range=16)
2022-07-27 22:49:44.717757: vec1 32 div ssa_37 = ffma ssa_2.w, ssa_36.z, ssa_34
2022-07-27 22:49:44.717764: vec4 32 div ssa_38 = intrinsic load_input (ssa_1) (base=1, component=0, dest_type=float32 /*160*/, io location=16 slots=1 /*144*/)
2022-07-27 22:49:44.717772: vec1 32 div ssa_39 = fsat ssa_38.x
2022-07-27 22:49:44.717779: vec1 32 div ssa_40 = fsat ssa_38.y
2022-07-27 22:49:44.717786: vec1 32 div ssa_41 = fsat ssa_38.z
2022-07-27 22:49:44.717793: vec1 32 div ssa_42 = fsat ssa_38.w
2022-07-27 22:49:44.717800: vec4 32 div ssa_43 = intrinsic load_input (ssa_1) (base=2, component=0, dest_type=float32 /*160*/, io location=17 slots=1 /*145*/)
2022-07-27 22:49:44.717807: vec1 32 div ssa_44 = fsat ssa_43.x
2022-07-27 22:49:44.717815: vec1 32 div ssa_45 = fsat ssa_43.y
2022-07-27 22:49:44.717822: vec1 32 div ssa_46 = fsat ssa_43.z
2022-07-27 22:49:44.717828: vec1 32 div ssa_47 = fsat ssa_43.w
2022-07-27 22:49:44.717835: vec1 32 div ssa_48 = fabs ssa_37
2022-07-27 22:49:44.717842: vec1 32 con ssa_49 = load_const (0x000001c0 = 0.000000)
2022-07-27 22:49:44.717850: vec4 32 con ssa_50 = intrinsic load_ubo (ssa_1, ssa_49) (access=0, align_mul=1073741824, align_offset=448, range_base=448, range=16)
2022-07-27 22:49:44.717858: vec1 32 div ssa_51 = fneg ssa_48
2022-07-27 22:49:44.717865: vec1 32 div ssa_52 = fadd ssa_50.x, ssa_51
2022-07-27 22:49:44.717873: vec1 32 div ssa_53 = fmul ssa_52, ssa_50.y
2022-07-27 22:49:44.717881: vec1 32 div ssa_54 = fsat ssa_53
2022-07-27 22:49:44.717888: vec4 32 div ssa_55 = vec4 ssa_22, ssa_23, ssa_24, ssa_25
2022-07-27 22:49:44.717895: intrinsic store_output (ssa_55, ssa_1) (base=0, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=0 slots=1 /*128*/, xfb() /*0*/, xfb2() /*0*/) /* out_0 */
2022-07-27 22:49:44.717905: vec4 32 div ssa_56 = vec4 ssa_39, ssa_40, ssa_41, ssa_42
2022-07-27 22:49:44.717913: intrinsic store_output (ssa_56, ssa_1) (base=1, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=1 slots=1 /*129*/, xfb() /*0*/, xfb2() /*0*/) /* out_1 */
2022-07-27 22:49:44.717920: vec4 32 div ssa_57 = vec4 ssa_44, ssa_45, ssa_46, ssa_47
2022-07-27 22:49:44.717929: intrinsic store_output (ssa_57, ssa_1) (base=2, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=2 slots=1 /*130*/, xfb() /*0*/, xfb2() /*0*/) /* out_2 */
2022-07-27 22:49:44.717937: vec4 32 div ssa_58 = vec4 ssa_54, ssa_0.y, ssa_0.z, ssa_0.w
2022-07-27 22:49:44.717944: intrinsic store_output (ssa_58, ssa_1) (base=48, wrmask=x /*1*/, component=0, src_type=float32 /*160*/, io location=48 slots=1 /*176*/, xfb() /*0*/, xfb2() /*0*/) /* out_3 */
2022-07-27 22:49:44.717987: /* succs: block_1 */
2022-07-27 22:49:44.717997: block block_1:
2022-07-27 22:49:44.718004: }
2022-07-27 22:49:44.718011: VS Output VUE map (23 slots, SSO)
2022-07-27 22:49:44.718018: [0] VARYING_SLOT_PSIZ
2022-07-27 22:49:44.718025: [1] VARYING_SLOT_POS
2022-07-27 22:49:44.718033: [2] VARYING_SLOT_CLIP_DIST0
2022-07-27 22:49:44.718040: [3] VARYING_SLOT_CLIP_DIST1
2022-07-27 22:49:44.718047: [4] VARYING_SLOT_COL0
2022-07-27 22:49:44.718054: [5] VARYING_SLOT_COL1
2022-07-27 22:49:44.718061: [6] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.718067: [7] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.718075: [8] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.718084: [9] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.718092: [10] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.718100: [11] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.718106: [12] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.718113: [13] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.718120: [14] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.718128: [15] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.718135: [16] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.718143: [17] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.718150: [18] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.718158: [19] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.718166: [20] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.718173: [21] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.718180: [22] VARYING_SLOT_VAR16
2022-07-27 22:49:44.718187: Native code for unnamed vertex shader TTN (sha1 447eb0bb1111eb6ee202eb90e872aa08ec4e3291)
2022-07-27 22:49:44.718194: SIMD8 shader: 34 instructions. 0 loops. 130 cycles. 0:0 spills:fills, 3 sends, scheduled with mode top-down. Promoted 0 constants. Compacted 544 to 432 bytes (21%)
2022-07-27 22:49:44.718202: START B0 (130 cycles)
2022-07-27 22:49:44.718210: mul(8) g27<1>F g7<8,8,1>F g2<0,1,0>F { align1 1Q compacted };
2022-07-27 22:49:44.718218: mul(8) g28<1>F g7<8,8,1>F g2.1<0,1,0>F { align1 1Q compacted };
2022-07-27 22:49:44.718225: mul(8) g29<1>F g7<8,8,1>F g2.2<0,1,0>F { align1 1Q compacted };
2022-07-27 22:49:44.718233: mul(8) g30<1>F g7<8,8,1>F g2.3<0,1,0>F { align1 1Q compacted };
2022-07-27 22:49:44.718241: mul(8) g43<1>F g7<8,8,1>F g4.2<0,1,0>F { align1 1Q compacted };
2022-07-27 22:49:44.718248: mov.sat(8) g19<1>F g11<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.718255: mov.sat(8) g20<1>F g12<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.718262: mov.sat(8) g21<1>F g13<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.718270: mov.sat(8) g22<1>F g14<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.718278: mov.sat(8) g23<1>F g15<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.718287: mov.sat(8) g24<1>F g16<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.718295: mov.sat(8) g25<1>F g17<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.718304: mov.sat(8) g26<1>F g18<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.718312: mov(8) g126<1>UD g1<8,8,1>UD { align1 WE_all 1Q compacted };
2022-07-27 22:49:44.718321: mad(8) g31<1>F g27<4,4,1>F g2.4<0,1,0>F g8<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.718329: mad(8) g32<1>F g28<4,4,1>F g2.5<0,1,0>F g8<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.718338: mad(8) g33<1>F g29<4,4,1>F g2.6<0,1,0>F g8<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.718354: mad(8) g34<1>F g30<4,4,1>F g2.7<0,1,0>F g8<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.718364: mad(8) g44<1>F g43<4,4,1>F g4.6<0,1,0>F g8<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.718373: mad(8) g35<1>F g31<4,4,1>F g3.0<0,1,0>F g9<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.718381: mad(8) g36<1>F g32<4,4,1>F g3.1<0,1,0>F g9<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.718388: mad(8) g37<1>F g33<4,4,1>F g3.2<0,1,0>F g9<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.718397: mad(8) g38<1>F g34<4,4,1>F g3.3<0,1,0>F g9<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.718406: mad(8) g45<1>F g44<4,4,1>F g5.2<0,1,0>F g9<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.718415: mad(8) g60<1>F g35<4,4,1>F g3.4<0,1,0>F g10<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.718424: mad(8) g61<1>F g36<4,4,1>F g3.5<0,1,0>F g10<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.718433: mad(8) g62<1>F g37<4,4,1>F g3.6<0,1,0>F g10<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.718441: mad(8) g63<1>F g38<4,4,1>F g3.7<0,1,0>F g10<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.718450: mad(8) g46<1>F g45<4,4,1>F g5.6<0,1,0>F g10<4,4,1>F { align16 1Q };
2022-07-27 22:49:44.718458: add(8) g55<1>F g6<0,1,0>F -(abs)g46<8,8,1>F { align1 1Q };
2022-07-27 22:49:44.718466: mul.sat(8) g122<1>F g55<8,8,1>F g6.1<0,1,0>F { align1 1Q compacted };
2022-07-27 22:49:44.718474: sends(8) nullUD g1UD g60UD 0x02080017 0x00000100
2022-07-27 22:49:44.718482: urb MsgDesc: offset 1 SIMD8 write mlen 1 ex_mlen 4 rlen 0 { align1 1Q };
2022-07-27 22:49:44.718491: sends(8) nullUD g1UD g19UD 0x02080047 0x00000200
2022-07-27 22:49:44.718500: urb MsgDesc: offset 4 SIMD8 write mlen 1 ex_mlen 8 rlen 0 { align1 1Q };
2022-07-27 22:49:44.718507: sends(8) nullUD g126UD g122UD 0x02080167 0x00000100
2022-07-27 22:49:44.718516: urb MsgDesc: offset 22 SIMD8 write mlen 1 ex_mlen 4 rlen 0 { align1 1Q EOT };
2022-07-27 22:49:44.718524: END B0
2022-07-27 22:49:44.718533: NIR (SSA form) for vertex shader:
2022-07-27 22:49:44.718540: shader: MESA_SHADER_VERTEX
2022-07-27 22:49:44.718549: source_sha1: {0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000}
2022-07-27 22:49:44.718558: name: TTN
2022-07-27 22:49:44.718568: inputs: 3
2022-07-27 22:49:44.718576: outputs: 4
2022-07-27 22:49:44.718585: uniforms: 0
2022-07-27 22:49:44.718593: shared: 0
2022-07-27 22:49:44.718601: ray queries: 0
2022-07-27 22:49:44.718609: decl_var shader_in INTERP_MODE_FLAT vec4 in_0 (VERT_ATTRIB_GENERIC0.xyzw, 15, 0)
2022-07-27 22:49:44.718618: decl_var shader_in INTERP_MODE_FLAT vec4 in_1 (VERT_ATTRIB_GENERIC1.xyzw, 16, 0)
2022-07-27 22:49:44.718626: decl_var shader_in INTERP_MODE_FLAT vec4 in_2 (VERT_ATTRIB_GENERIC2.xyzw, 17, 0)
2022-07-27 22:49:44.718634: decl_var shader_out INTERP_MODE_FLAT vec4 out_0 (VARYING_SLOT_POS.xyzw, 0, 0)
2022-07-27 22:49:44.718641: decl_var shader_out INTERP_MODE_FLAT vec4 out_1 (VARYING_SLOT_COL0.xyzw, 1, 0)
2022-07-27 22:49:44.718649: decl_var shader_out INTERP_MODE_FLAT vec4 out_2 (VARYING_SLOT_COL1.xyzw, 2, 0)
2022-07-27 22:49:44.718657: decl_var shader_out INTERP_MODE_FLAT vec4 out_3 (VARYING_SLOT_VAR16.xyzw, 48, 0)
2022-07-27 22:49:44.718665: decl_function main (0 params)
2022-07-27 22:49:44.718674: impl main {
2022-07-27 22:49:44.718683: block block_0:
2022-07-27 22:49:44.718691: /* preds: */
2022-07-27 22:49:44.718724: vec4 32 con ssa_0 = undefined
2022-07-27 22:49:44.718735: vec1 32 con ssa_1 = load_const (0x00000000 = 0.000000)
2022-07-27 22:49:44.718743: vec4 32 div ssa_2 = intrinsic load_input (ssa_1) (base=0, component=0, dest_type=float32 /*160*/, io location=15 slots=1 /*143*/)
2022-07-27 22:49:44.718752: vec4 32 div ssa_3 = intrinsic load_input (ssa_1) (base=1, component=0, dest_type=float32 /*160*/, io location=16 slots=1 /*144*/)
2022-07-27 22:49:44.718761: vec1 32 div ssa_4 = fsat ssa_3.x
2022-07-27 22:49:44.718768: vec1 32 div ssa_5 = fsat ssa_3.y
2022-07-27 22:49:44.718776: vec1 32 div ssa_6 = fsat ssa_3.z
2022-07-27 22:49:44.718783: vec1 32 div ssa_7 = fsat ssa_3.w
2022-07-27 22:49:44.718790: vec4 32 div ssa_8 = intrinsic load_input (ssa_1) (base=2, component=0, dest_type=float32 /*160*/, io location=17 slots=1 /*145*/)
2022-07-27 22:49:44.718798: vec1 32 div ssa_9 = fsat ssa_8.x
2022-07-27 22:49:44.718805: vec1 32 div ssa_10 = fsat ssa_8.y
2022-07-27 22:49:44.718812: vec1 32 div ssa_11 = fsat ssa_8.z
2022-07-27 22:49:44.718819: vec1 32 div ssa_12 = fsat ssa_8.w
2022-07-27 22:49:44.718826: intrinsic store_output (ssa_2, ssa_1) (base=0, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=0 slots=1 /*128*/, xfb() /*0*/, xfb2() /*0*/) /* out_0 */
2022-07-27 22:49:44.718834: vec4 32 div ssa_13 = vec4 ssa_4, ssa_5, ssa_6, ssa_7
2022-07-27 22:49:44.718841: intrinsic store_output (ssa_13, ssa_1) (base=1, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=1 slots=1 /*129*/, xfb() /*0*/, xfb2() /*0*/) /* out_1 */
2022-07-27 22:49:44.718848: vec4 32 div ssa_14 = vec4 ssa_9, ssa_10, ssa_11, ssa_12
2022-07-27 22:49:44.718855: intrinsic store_output (ssa_14, ssa_1) (base=2, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=2 slots=1 /*130*/, xfb() /*0*/, xfb2() /*0*/) /* out_2 */
2022-07-27 22:49:44.718862: vec4 32 div ssa_15 = vec4 ssa_8.w, ssa_0.y, ssa_0.z, ssa_0.w
2022-07-27 22:49:44.718869: intrinsic store_output (ssa_15, ssa_1) (base=48, wrmask=x /*1*/, component=0, src_type=float32 /*160*/, io location=48 slots=1 /*176*/, xfb() /*0*/, xfb2() /*0*/) /* out_3 */
2022-07-27 22:49:44.718876: /* succs: block_1 */
2022-07-27 22:49:44.718884: block block_1:
2022-07-27 22:49:44.718891: }
2022-07-27 22:49:44.718900: NIR (final form) for vertex shader:
2022-07-27 22:49:44.718908: shader: MESA_SHADER_VERTEX
2022-07-27 22:49:44.718916: source_sha1: {0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000}
2022-07-27 22:49:44.718925: name: TTN
2022-07-27 22:49:44.718933: inputs: 3
2022-07-27 22:49:44.718941: outputs: 4
2022-07-27 22:49:44.718949: uniforms: 0
2022-07-27 22:49:44.718958: shared: 0
2022-07-27 22:49:44.718966: ray queries: 0
2022-07-27 22:49:44.718974: decl_var shader_in INTERP_MODE_FLAT vec4 in_0 (VERT_ATTRIB_GENERIC0.xyzw, 15, 0)
2022-07-27 22:49:44.718983: decl_var shader_in INTERP_MODE_FLAT vec4 in_1 (VERT_ATTRIB_GENERIC1.xyzw, 16, 0)
2022-07-27 22:49:44.718991: decl_var shader_in INTERP_MODE_FLAT vec4 in_2 (VERT_ATTRIB_GENERIC2.xyzw, 17, 0)
2022-07-27 22:49:44.718999: decl_var shader_out INTERP_MODE_FLAT vec4 out_0 (VARYING_SLOT_POS.xyzw, 0, 0)
2022-07-27 22:49:44.719007: decl_var shader_out INTERP_MODE_FLAT vec4 out_1 (VARYING_SLOT_COL0.xyzw, 1, 0)
2022-07-27 22:49:44.719015: decl_var shader_out INTERP_MODE_FLAT vec4 out_2 (VARYING_SLOT_COL1.xyzw, 2, 0)
2022-07-27 22:49:44.719024: decl_var shader_out INTERP_MODE_FLAT vec4 out_3 (VARYING_SLOT_VAR16.xyzw, 48, 0)
2022-07-27 22:49:44.719032: decl_function main (0 params)
2022-07-27 22:49:44.719041: impl main {
2022-07-27 22:49:44.719049: block block_0:
2022-07-27 22:49:44.719057: /* preds: */
2022-07-27 22:49:44.719065: vec4 32 con ssa_0 = undefined
2022-07-27 22:49:44.719074: vec1 32 con ssa_1 = load_const (0x00000000 = 0.000000)
2022-07-27 22:49:44.719081: vec4 32 div ssa_2 = intrinsic load_input (ssa_1) (base=0, component=0, dest_type=float32 /*160*/, io location=15 slots=1 /*143*/)
2022-07-27 22:49:44.719098: vec4 32 div ssa_3 = intrinsic load_input (ssa_1) (base=1, component=0, dest_type=float32 /*160*/, io location=16 slots=1 /*144*/)
2022-07-27 22:49:44.719106: vec1 32 div ssa_4 = fsat ssa_3.x
2022-07-27 22:49:44.719114: vec1 32 div ssa_5 = fsat ssa_3.y
2022-07-27 22:49:44.719123: vec1 32 div ssa_6 = fsat ssa_3.z
2022-07-27 22:49:44.719130: vec1 32 div ssa_7 = fsat ssa_3.w
2022-07-27 22:49:44.719138: vec4 32 div ssa_8 = intrinsic load_input (ssa_1) (base=2, component=0, dest_type=float32 /*160*/, io location=17 slots=1 /*145*/)
2022-07-27 22:49:44.719146: vec1 32 div ssa_9 = fsat ssa_8.x
2022-07-27 22:49:44.719154: vec1 32 div ssa_10 = fsat ssa_8.y
2022-07-27 22:49:44.719162: vec1 32 div ssa_11 = fsat ssa_8.z
2022-07-27 22:49:44.719171: vec1 32 div ssa_12 = fsat ssa_8.w
2022-07-27 22:49:44.719200: intrinsic store_output (ssa_2, ssa_1) (base=0, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=0 slots=1 /*128*/, xfb() /*0*/, xfb2() /*0*/) /* out_0 */
2022-07-27 22:49:44.719211: vec4 32 div ssa_13 = vec4 ssa_4, ssa_5, ssa_6, ssa_7
2022-07-27 22:49:44.719220: intrinsic store_output (ssa_13, ssa_1) (base=1, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=1 slots=1 /*129*/, xfb() /*0*/, xfb2() /*0*/) /* out_1 */
2022-07-27 22:49:44.719229: vec4 32 div ssa_14 = vec4 ssa_9, ssa_10, ssa_11, ssa_12
2022-07-27 22:49:44.719237: intrinsic store_output (ssa_14, ssa_1) (base=2, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=2 slots=1 /*130*/, xfb() /*0*/, xfb2() /*0*/) /* out_2 */
2022-07-27 22:49:44.719245: vec4 32 div ssa_15 = vec4 ssa_8.w, ssa_0.y, ssa_0.z, ssa_0.w
2022-07-27 22:49:44.719254: intrinsic store_output (ssa_15, ssa_1) (base=48, wrmask=x /*1*/, component=0, src_type=float32 /*160*/, io location=48 slots=1 /*176*/, xfb() /*0*/, xfb2() /*0*/) /* out_3 */
2022-07-27 22:49:44.719263: /* succs: block_1 */
2022-07-27 22:49:44.719271: block block_1:
2022-07-27 22:49:44.719279: }
2022-07-27 22:49:44.719288: VS Output VUE map (23 slots, SSO)
2022-07-27 22:49:44.719297: [0] VARYING_SLOT_PSIZ
2022-07-27 22:49:44.719306: [1] VARYING_SLOT_POS
2022-07-27 22:49:44.719315: [2] VARYING_SLOT_CLIP_DIST0
2022-07-27 22:49:44.719323: [3] VARYING_SLOT_CLIP_DIST1
2022-07-27 22:49:44.719331: [4] VARYING_SLOT_COL0
2022-07-27 22:49:44.719339: [5] VARYING_SLOT_COL1
2022-07-27 22:49:44.719348: [6] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.719356: [7] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.719364: [8] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.719373: [9] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.719382: [10] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.719390: [11] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.719399: [12] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.719408: [13] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.719416: [14] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.719425: [15] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.719433: [16] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.719442: [17] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.719450: [18] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.719458: [19] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.719466: [20] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.719474: [21] BRW_VARYING_SLOT_PAD
2022-07-27 22:49:44.719483: [22] VARYING_SLOT_VAR16
2022-07-27 22:49:44.719491: Native code for unnamed vertex shader TTN (sha1 ffb696f9ce71a921f6078357aeddcf510fa5d013)
2022-07-27 22:49:44.719500: SIMD8 shader: 17 instructions. 0 loops. 80 cycles. 0:0 spills:fills, 3 sends, scheduled with mode top-down. Promoted 0 constants. Compacted 272 to 160 bytes (41%)
2022-07-27 22:49:44.719508: START B0 (80 cycles)
2022-07-27 22:49:44.719517: mov.sat(8) g14<1>F g6<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.719548: mov.sat(8) g15<1>F g7<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.719558: mov.sat(8) g16<1>F g8<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.719566: mov.sat(8) g17<1>F g9<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.719574: mov.sat(8) g18<1>F g10<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.719582: mov.sat(8) g19<1>F g11<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.719590: mov.sat(8) g20<1>F g12<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.719597: mov.sat(8) g21<1>F g13<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.719604: mov(8) g27<1>F g2<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.719612: mov(8) g28<1>F g3<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.719621: mov(8) g29<1>F g4<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.719629: mov(8) g30<1>F g5<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.719637: mov(8) g126<1>UD g1<8,8,1>UD { align1 WE_all 1Q compacted };
2022-07-27 22:49:44.719646: sends(8) nullUD g1UD g27UD 0x02080017 0x00000100
2022-07-27 22:49:44.719654: urb MsgDesc: offset 1 SIMD8 write mlen 1 ex_mlen 4 rlen 0 { align1 1Q };
2022-07-27 22:49:44.719663: sends(8) nullUD g1UD g14UD 0x02080047 0x00000200
2022-07-27 22:49:44.719670: urb MsgDesc: offset 4 SIMD8 write mlen 1 ex_mlen 8 rlen 0 { align1 1Q };
2022-07-27 22:49:44.719679: mov(8) g122<1>F g13<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.719686: sends(8) nullUD g126UD g122UD 0x02080167 0x00000100
2022-07-27 22:49:44.719694: urb MsgDesc: offset 22 SIMD8 write mlen 1 ex_mlen 4 rlen 0 { align1 1Q EOT };
2022-07-27 22:49:44.719703: END B0
2022-07-27 22:49:44.719711: NIR (SSA form) for fragment shader:
2022-07-27 22:49:44.719719: shader: MESA_SHADER_FRAGMENT
2022-07-27 22:49:44.719727: source_sha1: {0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000}
2022-07-27 22:49:44.719735: name: TTN
2022-07-27 22:49:44.719743: inputs: 1
2022-07-27 22:49:44.719752: outputs: 1
2022-07-27 22:49:44.719760: uniforms: 0
2022-07-27 22:49:44.719768: ubos: 1
2022-07-27 22:49:44.719776: shared: 0
2022-07-27 22:49:44.719785: ray queries: 0
2022-07-27 22:49:44.719793: decl_var shader_in INTERP_MODE_SMOOTH vec4 in_0 (VARYING_SLOT_COL0.xyzw, 1, 0)
2022-07-27 22:49:44.719801: decl_var shader_out INTERP_MODE_FLAT vec4 out_0 (FRAG_RESULT_DATA0.xyzw, 8, 0)
2022-07-27 22:49:44.719809: decl_var uniform INTERP_MODE_NONE vec4[2] uniform_21 (21, 21, 0)
2022-07-27 22:49:44.719818: decl_var ubo INTERP_MODE_NONE vec4[23] uniform_0 (0, 0, 0)
2022-07-27 22:49:44.719826: decl_function main (0 params)
2022-07-27 22:49:44.719835: impl main {
2022-07-27 22:49:44.719843: block block_0:
2022-07-27 22:49:44.719851: /* preds: */
2022-07-27 22:49:44.719860: vec2 32 div ssa_0 = intrinsic load_barycentric_pixel () (interp_mode=1)
2022-07-27 22:49:44.719868: vec1 32 con ssa_1 = load_const (0x00000000 = 0.000000)
2022-07-27 22:49:44.719877: vec4 32 div ssa_2 = intrinsic load_interpolated_input (ssa_0, ssa_1) (base=1, component=0, dest_type=float32 /*160*/, io location=1 slots=1 /*129*/) /* in_0 */
2022-07-27 22:49:44.719886: vec4 32 div ssa_3 = intrinsic load_frag_coord () ()
2022-07-27 22:49:44.719908: vec1 32 con ssa_4 = load_const (0x00000160 = 0.000000)
2022-07-27 22:49:44.719918: vec1 32 con ssa_5 = load_const (0x00000001 = 0.000000)
2022-07-27 22:49:44.719926: vec4 32 con ssa_6 = intrinsic load_ubo (ssa_5, ssa_4) (access=0, align_mul=1073741824, align_offset=352, range_base=352, range=16)
2022-07-27 22:49:44.719935: vec1 32 div ssa_7 = fneg ssa_3.z
2022-07-27 22:49:44.719943: vec1 32 div ssa_8 = fadd ssa_6.x, ssa_7
2022-07-27 22:49:44.719951: vec1 32 div ssa_9 = fmul ssa_8, ssa_6.y
2022-07-27 22:49:44.719959: vec1 32 div ssa_10 = fsat ssa_9
2022-07-27 22:49:44.719966: vec1 32 con ssa_11 = load_const (0x00000150 = 0.000000)
2022-07-27 22:49:44.719974: vec4 32 con ssa_12 = intrinsic load_ubo (ssa_5, ssa_11) (access=0, align_mul=1073741824, align_offset=336, range_base=336, range=16)
2022-07-27 22:49:44.719982: vec1 32 div ssa_13 = flrp ssa_12.x, ssa_2.x, ssa_10
2022-07-27 22:49:44.719990: vec1 32 div ssa_14 = flrp ssa_12.y, ssa_2.y, ssa_10
2022-07-27 22:49:44.719998: vec1 32 div ssa_15 = flrp ssa_12.z, ssa_2.z, ssa_10
2022-07-27 22:49:44.720006: vec4 32 div ssa_16 = vec4 ssa_13, ssa_14, ssa_15, ssa_2.w
2022-07-27 22:49:44.720014: intrinsic store_output (ssa_16, ssa_1) (base=8, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=4 slots=1 /*132*/, xfb() /*0*/, xfb2() /*0*/) /* out_0 */
2022-07-27 22:49:44.720022: /* succs: block_1 */
2022-07-27 22:49:44.720031: block block_1:
2022-07-27 22:49:44.720039: }
2022-07-27 22:49:44.720047: NIR (final form) for fragment shader:
2022-07-27 22:49:44.720056: shader: MESA_SHADER_FRAGMENT
2022-07-27 22:49:44.720064: source_sha1: {0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000}
2022-07-27 22:49:44.720073: name: TTN
2022-07-27 22:49:44.720082: inputs: 1
2022-07-27 22:49:44.720090: outputs: 1
2022-07-27 22:49:44.720099: uniforms: 0
2022-07-27 22:49:44.720107: ubos: 1
2022-07-27 22:49:44.720115: shared: 0
2022-07-27 22:49:44.720123: ray queries: 0
2022-07-27 22:49:44.720132: decl_var shader_in INTERP_MODE_SMOOTH vec4 in_0 (VARYING_SLOT_COL0.xyzw, 1, 0)
2022-07-27 22:49:44.720140: decl_var shader_out INTERP_MODE_FLAT vec4 out_0 (FRAG_RESULT_DATA0.xyzw, 8, 0)
2022-07-27 22:49:44.720149: decl_var uniform INTERP_MODE_NONE vec4[2] uniform_21 (21, 21, 0)
2022-07-27 22:49:44.720157: decl_var ubo INTERP_MODE_NONE vec4[23] uniform_0 (0, 0, 0)
2022-07-27 22:49:44.720166: decl_function main (0 params)
2022-07-27 22:49:44.720174: impl main {
2022-07-27 22:49:44.720183: block block_0:
2022-07-27 22:49:44.720192: /* preds: */
2022-07-27 22:49:44.720200: vec2 32 div ssa_0 = intrinsic load_barycentric_pixel () (interp_mode=1)
2022-07-27 22:49:44.720208: vec1 32 con ssa_1 = load_const (0x00000000 = 0.000000)
2022-07-27 22:49:44.720217: vec4 32 div ssa_2 = intrinsic load_interpolated_input (ssa_0, ssa_1) (base=1, component=0, dest_type=float32 /*160*/, io location=1 slots=1 /*129*/) /* in_0 */
2022-07-27 22:49:44.720224: vec4 32 div ssa_3 = intrinsic load_frag_coord () ()
2022-07-27 22:49:44.720232: vec1 32 con ssa_4 = load_const (0x00000160 = 0.000000)
2022-07-27 22:49:44.720240: vec1 32 con ssa_5 = load_const (0x00000001 = 0.000000)
2022-07-27 22:49:44.720248: vec4 32 con ssa_6 = intrinsic load_ubo (ssa_5, ssa_4) (access=0, align_mul=1073741824, align_offset=352, range_base=352, range=16)
2022-07-27 22:49:44.720257: vec1 32 div ssa_7 = fneg ssa_3.z
2022-07-27 22:49:44.720265: vec1 32 div ssa_8 = fadd ssa_6.x, ssa_7
2022-07-27 22:49:44.720288: vec1 32 div ssa_9 = fmul ssa_8, ssa_6.y
2022-07-27 22:49:44.720296: vec1 32 div ssa_10 = fsat ssa_9
2022-07-27 22:49:44.720304: vec1 32 con ssa_11 = load_const (0x00000150 = 0.000000)
2022-07-27 22:49:44.720312: vec4 32 con ssa_12 = intrinsic load_ubo (ssa_5, ssa_11) (access=0, align_mul=1073741824, align_offset=336, range_base=336, range=16)
2022-07-27 22:49:44.720319: vec1 32 div ssa_13 = flrp ssa_12.x, ssa_2.x, ssa_10
2022-07-27 22:49:44.720350: vec1 32 div ssa_14 = flrp ssa_12.y, ssa_2.y, ssa_10
2022-07-27 22:49:44.720360: vec1 32 div ssa_15 = flrp ssa_12.z, ssa_2.z, ssa_10
2022-07-27 22:49:44.720368: vec4 32 div ssa_16 = vec4 ssa_13, ssa_14, ssa_15, ssa_2.w
2022-07-27 22:49:44.720376: intrinsic store_output (ssa_16, ssa_1) (base=8, wrmask=xyzw /*15*/, component=0, src_type=float32 /*160*/, io location=4 slots=1 /*132*/, xfb() /*0*/, xfb2() /*0*/) /* out_0 */
2022-07-27 22:49:44.720384: /* succs: block_1 */
2022-07-27 22:49:44.720393: block block_1:
2022-07-27 22:49:44.720401: }
2022-07-27 22:49:44.720410: Native code for unnamed fragment shader TTN (sha1 79cb4fc7ba74b8716c799ae02334d0968987c41c)
2022-07-27 22:49:44.720418: SIMD8 shader: 11 instructions. 0 loops. 74 cycles. 0:0 spills:fills, 1 sends, scheduled with mode top-down. Promoted 0 constants. Compacted 176 to 128 bytes (27%)
2022-07-27 22:49:44.720427: START B0 (74 cycles)
2022-07-27 22:49:44.720435: pln(8) g11<1>F g6<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.720443: pln(8) g9<1>F g6.4<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.720452: pln(8) g13<1>F g7<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.720461: pln(8) g126<1>F g7.4<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.720470: mov(8) g8<1>F null<8,8,1>F { align1 1Q };
2022-07-27 22:49:44.720478: ERROR: src0 is null
2022-07-27 22:49:44.720487: add(8) g2<1>F g5<0,1,0>F -g8<8,8,1>F { align1 1Q compacted };
2022-07-27 22:49:44.720496: mul.sat(8) g3<1>F g2<8,8,1>F g5.1<0,1,0>F { align1 1Q compacted };
2022-07-27 22:49:44.720505: lrp(8) g123<1>F g3<4,4,1>F g11<4,4,1>F g4.4<0,1,0>F { align16 1Q };
2022-07-27 22:49:44.720513: lrp(8) g124<1>F g3<4,4,1>F g9<4,4,1>F g4.5<0,1,0>F { align16 1Q };
2022-07-27 22:49:44.720521: lrp(8) g125<1>F g3<4,4,1>F g13<4,4,1>F g4.6<0,1,0>F { align16 1Q };
2022-07-27 22:49:44.720530: sendc(8) null<1>UW g123<0,1,0>UD 0x88031400
2022-07-27 22:49:44.720539: render MsgDesc: RT write SIMD8 LastRT Surface = 0 mlen 4 rlen 0 { align1 1Q EOT };
2022-07-27 22:49:44.720547: END B0
2022-07-27 22:49:44.720555: NineTests: ../src/intel/compiler/brw_fs_generator.cpp:2620: int fs_generator::generate_code(const cfg_t*, int, shader_stats, const brw::performance&, brw_compile_stats*): Assertion `validated' failed.
2022-07-27 22:49:44.720565: ./NineTests.sh: line 3: 293 Aborted INTEL_DEBUG=shaders ./NineTests
```
https://gitlab.freedesktop.org/mesa/mesa/-/issues/6810
opt_uniform_atomics causing problems on RT shaders & OpenCL kernels
2022-07-07T06:54:16Z
Lionel Landwerlin
The pass is disabled for RT shaders here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16104
We're also having issues with Gfx7/7.5.
Somewhat related, disabling it was found to affect this FarCry rendering issue for me: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6420#note_1405082
For @kwg, that didn't completely fix it. It seems we might have a backend issue somewhere.
https://gitlab.freedesktop.org/mesa/mesa/-/issues/5182
intel: Lower txf_ms to txf_ms+txf_ms_mcs in NIR
2021-08-05T15:09:33Z
Faith Ekstrand
Long ago, I plumbed `nir_texop_txf_ms_mcs_intel` and `nir_tex_src_mcs_intel` into NIR so BLORP could do manual MCS management. Since we have these, we may as well add a `lower_txf_ms_to_mcs` bit to `nir_lower_tex` and do the lowering there. Then we could delete the automatic MCS handling from both back-ends. Likely we'd need to add `nir_texop_txf_ms_mcs_intel` and `nir_tex_src_mcs_intel` support to the vec4 back-end but that doesn't seem hard.
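To illustrate the shape of the proposed lowering, here is a minimal sketch over a toy instruction list. It deliberately does not use the real NIR API: `toy_op`, `toy_instr`, and `toy_lower_txf_ms` are hypothetical names, the opcodes only stand in for `txf_ms` and `nir_texop_txf_ms_mcs_intel`, and coordinates and sample indices are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Toy opcodes standing in for txf_ms and nir_texop_txf_ms_mcs_intel.
 * This is NOT the NIR data model, only an illustration. */
enum toy_op { TOY_TXF_MS, TOY_TXF_MS_MCS, TOY_OTHER };

struct toy_instr {
   enum toy_op op;
   int texture; /* texture index */
   int dest;    /* value number written by the instruction */
   int mcs_src; /* value number of the MCS source, or -1 if absent */
};

/* Copy in[0..count) to out, inserting an explicit MCS fetch in front of
 * every TOY_TXF_MS that has no MCS source yet. `out` must have room for
 * 2 * count entries; `next_value` is the first unused value number.
 * Returns the number of instructions written. */
static size_t
toy_lower_txf_ms(const struct toy_instr *in, size_t count,
                 struct toy_instr *out, int next_value)
{
   assert(in != NULL && out != NULL);

   size_t n = 0;
   for (size_t i = 0; i < count; i++) {
      struct toy_instr instr = in[i];
      if (instr.op == TOY_TXF_MS && instr.mcs_src < 0) {
         /* Fetch the MCS data for the same texture first... */
         struct toy_instr mcs = { TOY_TXF_MS_MCS, instr.texture,
                                  next_value, -1 };
         out[n++] = mcs;
         /* ...then feed it to the multisample fetch as an extra source. */
         instr.mcs_src = next_value++;
      }
      out[n++] = instr;
   }
   return n;
}
```

With the MCS fetch explicit in the IR like this, a back-end would only need to consume the extra source rather than generate the fetch itself, which is the simplification the issue is after.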
https://gitlab.freedesktop.org/mesa/mesa/-/issues/4174
ANV: GPU hang with e1m1 GRVK demo
2021-01-31T22:26:12Z
Clément Guérin
* mesa 34e3e1649
* linux 5.10.10
* GPU: Intel(R) HD Graphics 530 (SKL GT2)
Steps to reproduce:
* Download [GRVK 0.2.0](https://github.com/libcg/grvk/releases/download/0.2.0/grvk-0.2.0.zip) and extract it
* Run the e1m1 demo using `wine hello.exe shaders/e1m1_ps.bin` (needs recent wine for `VK_EXT_extended_dynamic_state`)
Expected result:
* Demo should run
Observed result:
* The driver detects a GPU hang, because of excessive frame time (~700ms). All other demos run fine, the only difference being the fragment shader.
@llandwerlin
https://gitlab.freedesktop.org/mesa/mesa/-/issues/3380
intel/fs: Make the constant data part of the shader.
2020-08-20T16:02:24Z
Faith Ekstrand
Instead of having the constant data be a bit that hangs off the NIR, it would be really convenient if the back-end compiler would just stuff it in the end of the program for us. We could add a couple of new fields to `brw_stage_prog_data` for `const_data_offset` and `const_data_size`. It wouldn't simplify state setup at all, because we still have to set a UBO up for it, but it would simplify shader caching quite a bit. Every extra bit of data with its own size that's hanging off the side of the shader is a pain to cache.
@mattst88, @kwg, thoughts?
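A rough sketch of what "stuffing it in the end of the program" could look like. Only the field names `const_data_offset` and `const_data_size` come from the proposal above; `prog_data_sketch`, `append_const_data`, and the 64-byte alignment are assumptions made for illustration and are not existing Mesa code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the two fields proposed for
 * brw_stage_prog_data. */
struct prog_data_sketch {
   uint32_t const_data_offset; /* byte offset of the constants in the blob */
   uint32_t const_data_size;   /* size of the constants in bytes */
};

#define CONST_DATA_ALIGN 64u /* assumed alignment, not from the issue */

static uint32_t
align_u32(uint32_t v, uint32_t a)
{
   /* `a` must be a power of two. */
   return (v + a - 1) & ~(a - 1);
}

/* Build one blob holding the shader code followed by its constant data,
 * and record where the constants landed. The caller frees the result.
 * Returns NULL on allocation failure. */
static void *
append_const_data(const void *code, uint32_t code_size,
                  const void *const_data, uint32_t const_size,
                  struct prog_data_sketch *prog_data, uint32_t *blob_size)
{
   assert(prog_data != NULL && blob_size != NULL);

   uint32_t offset = align_u32(code_size, CONST_DATA_ALIGN);
   uint32_t total = offset + const_size;

   /* calloc zero-fills the padding between code and constants. */
   uint8_t *blob = calloc(1, total ? total : 1);
   if (blob == NULL)
      return NULL;

   if (code_size)
      memcpy(blob, code, code_size);
   if (const_size)
      memcpy(blob + offset, const_data, const_size);

   prog_data->const_data_offset = offset;
   prog_data->const_data_size = const_size;
   *blob_size = total;
   return blob;
}
```

The point for caching is that the cache then stores a single blob plus two integers, instead of a second variable-sized allocation alongside the shader.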
https://gitlab.freedesktop.org/mesa/mesa/-/issues/3083
intel: Optimize compute workgroup sizes
2023-08-10T12:42:13Z
Faith Ekstrand
There are many cases when a client may choose a workgroup size which is non-optimal for our hardware:
1. It doesn't care about the local group size so it sets 1x1x1 and just uses the global group size. This means each shader thread will only be doing 1 unit of work rather than 8, 16, or 32.
2. It chooses a large local group size which fits in a slice but doesn't fill the whole slice. This can happen often because we have non-power-of-two numbers of EUs per slice.
There are a number of possible optimizations here:
1. For shaders which don't require barriers or SLM (shared variables), we can make the local workgroup size be 8, 16, or 32 depending on how we compile the shader. We then adjust all the various workgroup IDs we generate in the shader to make it look like it's running at the client's requested size.
2. For 1x1x1 local workgroups which use SLM and/or barriers, we can move the SLM to normal local variables because there is only one invocation.
3. For shaders where the entire local workgroup fits in a single SIMD8, SIMD16, or SIMD32 invocation, we can delete all barrier instructions.
4. For small local workgroup sizes which use barriers or SLM, we can put multiple local workgroups into a single SIMD8, SIMD16, or SIMD32 workgroup. We just have to be careful with SLM to ensure that each workgroup gets its own SLM space. Likely, this means dividing up the SLM and doing an offset in the shader.
I'm not sure how all this works with variable workgroup sizes. Likely, only 1 and 4 work in that case.
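To make options 1, 3, and 4 a bit more concrete, here is a sketch of the ID remapping for a flattened (1D) dispatch. It assumes the client's local size evenly divides the SIMD width, so each hardware thread packs a whole number of client workgroups. All names here (`remapped_invocation`, `remap_invocation`, `hw_group_count`, `can_drop_barriers`) are hypothetical and only illustrate the arithmetic; a real implementation would emit the equivalent instructions in the shader and handle 3D IDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What one SIMD lane should see after remapping. */
struct remapped_invocation {
   bool     active;       /* false for padding lanes past the end of the dispatch */
   uint32_t workgroup_id; /* flattened client-visible workgroup ID */
   uint32_t local_index;  /* flattened client-visible local invocation index */
   uint32_t slm_offset;   /* byte offset of this client workgroup's SLM slice */
};

/* Options 1 and 4: pack several client workgroups into one SIMD thread. */
static struct remapped_invocation
remap_invocation(uint32_t hw_group_id, uint32_t lane, uint32_t simd_width,
                 uint32_t client_local_size, uint32_t client_num_groups,
                 uint32_t slm_bytes_per_group)
{
   assert(client_local_size > 0 && simd_width % client_local_size == 0);
   assert(lane < simd_width);

   uint32_t groups_per_hw = simd_width / client_local_size;
   uint32_t slot = lane / client_local_size; /* which packed workgroup */

   struct remapped_invocation r;
   r.workgroup_id = hw_group_id * groups_per_hw + slot;
   r.local_index = lane % client_local_size;
   r.slm_offset = slot * slm_bytes_per_group;
   r.active = r.workgroup_id < client_num_groups;
   return r;
}

/* Number of hardware workgroups needed to cover the client's dispatch. */
static uint32_t
hw_group_count(uint32_t client_num_groups, uint32_t simd_width,
               uint32_t client_local_size)
{
   uint32_t groups_per_hw = simd_width / client_local_size;
   return (client_num_groups + groups_per_hw - 1) / groups_per_hw;
}

/* Option 3: barriers are redundant if the whole workgroup is one SIMD thread. */
static bool
can_drop_barriers(uint32_t local_size, uint32_t simd_width)
{
   return local_size <= simd_width;
}
```

For the 1x1x1 case (`client_local_size == 1`), every lane becomes its own client workgroup with `local_index == 0`, and lanes past `client_num_groups` have to be disabled.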
https://gitlab.freedesktop.org/mesa/mesa/-/issues/3055
[icl] Intermittent hangs with simd16 dual source blending
2020-06-17T22:26:03Z
Danylo Piliaiev
[icl] Intermittent hangs with simd16 dual source blending
This is a continuation of https://gitlab.freedesktop.org/mesa/mesa/-/issues/2183, which was closed by the workaround in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5037; the underlying issue is still unknown.
An apitrace that can reproduce the hang: [glamor_dual_src_gen11_hang.trace](https://gitlab.freedesktop.org/mesa/mesa/uploads/bedd1ec4bd1b87b8ea3a57ced89717d5/glamor_dual_src_gen11_hang.trace) (it may take more than one run for the trace to hang)
Error state: [gen11-aa-hang.tar.gz](/uploads/ecea82403a48104630496cca870aa27a/gen11-aa-hang.tar.gz)
How to reproduce without the trace (you need to revert https://gitlab.freedesktop.org/mesa/mesa/-/commit/296c04d78c9840f83e7fcaf9b45a4cee96752348 first):
1) Compile reproducer [main.c](/uploads/7fcc59c1081f55cc943e6e594cbb4833/main.c)
2) Install `Xephyr` and `xsettingsd`
3) Create xsettingsd config `~/.config/xsettingsd/xsettingsd.conf`:
```
Gtk/FontName "Noto Sans, Regular 10"
Xft/Hinting 1
Xft/HintStyle "hintfull"
Xft/Antialias 1
Xft/RGBA "rgb"
```
4) `Xephyr -glamor -screen 1024x800 -reset :2`
5) `DISPLAY=:2 xsettingsd &`
6) `DISPLAY=:2 ./main` (Using the reproducer)
https://gitlab.freedesktop.org/mesa/mesa/-/issues/2998
ANV: Pathological performance cliff with compute shader
2020-05-19T16:41:54Z
Hans-Kristian Arntzen
ANV: Pathological performance cliff with compute shader
I have a benchmark which runs 2-3x faster on the Windows driver on the same hardware.
Tested on Intel UHD 620 on Mesa 20.0.7.
To run benchmark on Linux:
```
git clone https://github.com/Themaister/parallel-rdp
cd parallel-rdp
git checkout 2b0ff05bfb49ef9eb02a7ade331fd91331ecc72c
git submodule update --init --recursive
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release --parallel
./rdp-bench
```
My end result is ~0.1 GPixels/s.
With a similar build process on Windows, I get ~0.32 GPixels/s.
One caveat is that the Windows build assumes `PARALLEL_RDP_SMALL_TYPES=0`, since 32-bit arithmetic was significantly faster than 8/16-bit arithmetic. To run Windows with 8/16-bit arithmetic as well, use the env var mentioned above; in that configuration I observed ~0.22 GPixels/s.
Attached is a Fossilize archive with the shaders compiled for the benchmark: [repro.foz](/uploads/e54790d8a6b6f1692fdbb89841e143be/repro.foz).
The pipeline `0771c744744c4da4` is the likely culprit as it's SIMD8 with a ton of spilling. Unfortunately, the Windows driver does not support pipeline executable properties, so I cannot inspect compilation results.
https://gitlab.freedesktop.org/mesa/mesa/-/issues/2861
Write unit tests for brw_fs_register_coalesce
2020-06-08T23:23:08Z
Faith Ekstrand
Write unit tests for brw_fs_register_coalesce
We have unit tests for the vec4 register coalesce pass. It'd be good to write some for the FS one as well. In particular, we should unit-test #2820.