Bitwise and with constant 31 removed on width argument to BitFieldSExtract, causing incorrect result on RADV ACO
Description
In the attached RenderDoc capture, signed-bfe-width-mask_capture.rdc, a bitwise-and of a scalar integer with 31 is removed from a shader, causing extra bits to be set in the width argument of a bitfield extract operation. This causes incorrect results on an AMD RX 480 using Mesa commit 7447c158.
Steps to reproduce
- Open attached capture
- On the Resource Inspector tab, double click buffer 130
- From the bar at the top of the tab, click View Contents
- Each component should have value 0xFFFFFF9F after event ID 6, but some are 0x9F.
System information
inxi -GSC -xx
output:
System:
Host: hincker Kernel: 6.2.15-300.fc38.x86_64 arch: x86_64 bits: 64
compiler: gcc v: 2.39-9.fc38 Desktop: GNOME v: 44.1 tk: GTK v: 3.24.38
wm: gnome-shell dm: 1: GDM 2: LightDM note: stopped 3: SDDM note: stopped
Distro: Fedora release 38 (Thirty Eight)
CPU:
Info: quad core model: Intel Core i7-6700K bits: 64 type: MT MCP
arch: Skylake-S rev: 3 cache: L1: 256 KiB L2: 1024 KiB L3: 8 MiB
Speed (MHz): avg: 4000 min/max: 800/4200 cores: 1: 4000 2: 4000 3: 4000
4: 4000 5: 4000 6: 4000 7: 4000 8: 4000 bogomips: 63999
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
Device-1: Intel HD Graphics 530 vendor: Gigabyte driver: i915 v: kernel
arch: Gen-9 ports: active: DP-1 empty: HDMI-A-1,HDMI-A-2,HDMI-A-3
bus-ID: 00:02.0 chip-ID: 8086:1912
Device-2: AMD Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]
driver: amdgpu v: kernel arch: GCN-4 pcie: speed: 8 GT/s lanes: 16 ports:
active: HDMI-A-4 empty: DP-2,DP-3,DP-4 bus-ID: 01:00.0 chip-ID: 1002:67df
temp: 52.0 C
Device-3: Logitech Webcam C270 driver: snd-usb-audio,uvcvideo type: USB
rev: 2.0 speed: 480 Mb/s lanes: 1 bus-ID: 1-3:2 chip-ID: 046d:0825
Display: wayland server: X.org v: 1.20.14 with: Xwayland v: 22.1.9
compositor: gnome-shell driver: X: loaded: modesetting unloaded: fbdev,vesa
alternate: amdgpu dri: radeonsi,iris gpu: amdgpu,i915 display-ID: 0
Monitor-1: DP-1 model: JT178x1-3 res: 1280x1024 dpi: 96 diag: 433mm (17")
Monitor-2: HDMI-A-4 model: Asus VX24A res: 2560x1440 dpi: 123
diag: 604mm (23.8")
API: OpenGL v: 4.6 Mesa 23.2.0-devel (git-7447c15894) renderer: AMD
Radeon RX 480 Graphics (polaris10 LLVM 16.0.4 DRM 3.49
6.2.15-300.fc38.x86_64) direct-render: Yes
Further information
The code used to produce the capture is here: signed_bfe_width_mask.tar.xz.
The issue does not reproduce with the LLVM backend (with RADV_DEBUG=llvm). When the bug is reproducing, variables ACO_DEBUG, RADV_DEBUG, and RADV_PERFTEST are not set.
In the capture, the affected operation happens in the compute shader in event ID 6. From the KHR_pipeline_executable_properties disassembly, it looks like the masking with 0x1f has been removed from the optimized NIR shader. If the code is modified to also mask an offset not known at compile time with 0x1f before passing it to BitFieldSExtract, that masking disappears from the NIR assembly as well, but similar masking appears in the ACO IR and the final assembly.
Section 13.1 of the GCN3 instruction set architecture documentation says that bits 16 to 22 of S1 are the width for s_bfe_i32. This is 7 bits, not 5. Setting either of the two extra bits, which were supposed to be masked out by the & 31, appears to cause more bits than intended to be used, preventing the expected sign extension in s_bfe_i32.
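To illustrate the suspected mechanism, here is a rough Python model of s_bfe_i32. It is only a sketch: the field layout (offset in S1[4:0], width in S1[22:16]) comes from the ISA documentation, but the handling of a width of 32 or more (returning the shifted value with no sign extension from the intended bit) is an assumption made to match the observed 0x9F, not confirmed hardware behavior. The names s_bfe_i32_model and pack_s1, and the input values 0x9F and 40, are hypothetical and chosen for illustration; they are not taken from the capture.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def pack_s1(offset, width):
    """Pack offset into S1[4:0] and width into S1[22:16] (7-bit field)."""
    return (offset & 0x1F) | ((width & 0x7F) << 16)


def s_bfe_i32_model(s0, s1):
    """Hypothetical model of s_bfe_i32; returns the result as an unsigned 32-bit pattern."""
    offset = s1 & 0x1F          # S1[4:0]
    width = (s1 >> 16) & 0x7F   # S1[22:16]: 7 bits, so values up to 127 are encodable
    if width == 0:
        return 0
    shifted = to_signed32(s0) >> offset  # arithmetic shift of the signed source
    if width >= 32:
        # ASSUMPTION: an oversized width keeps every remaining bit, so the
        # intended sign bit (e.g. bit 7) is never replicated upwards.
        return shifted & MASK32
    field = shifted & ((1 << width) - 1)
    if field & (1 << (width - 1)):       # sign-extend from the top bit of the field
        field -= 1 << width
    return field & MASK32


data = 0x9F       # illustrative input with bit 7 set
raw_width = 40    # 8 with extra bit 5 set; 40 & 31 == 8

masked = s_bfe_i32_model(data, pack_s1(0, raw_width & 31))
unmasked = s_bfe_i32_model(data, pack_s1(0, raw_width))
print(hex(masked), hex(unmasked))
```

Under these assumptions, the masked call yields the expected 0xFFFFFF9F, while the unmasked call yields 0x9F, which matches the incorrect components seen in buffer 130.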