Relaxed atomic loads in while loops being optimized away
Describe the issue
It appears that while loops containing a relaxed atomic memory load are being optimized away. Here are a couple of Amber tests that show the problem: https://gist.github.com/reeselevine/935febcc7a8c4c192c234c54522f0cb0
The infinite loop test should hang, since the flag is never updated, and on an Nvidia Quadro RTX 4000 it does, as expected. However, on the Intel GPU (UHD Graphics 630), the test passes immediately.
To confirm that this is not simply an optimization removing an infinite loop, I wrote a second example that mimics message passing: one thread updates data and then sets a flag, while the other spins on the flag and then loads the data. This test should always pass, but on the Intel GPU the Amber test fails.
Interestingly, if the commented-out fence in the message-passing Amber test (line 98) is uncommented, the test passes. The test also passes if the load on line 96 is strengthened to an acquire. Therefore, this issue seems to affect only relaxed memory accesses.
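For readers without the gist open, here is a rough CPU-side sketch of the message-passing pattern in C++. It is not the shader from the Amber test: the function name `run_message_passing`, the variable names, and the value 42 are all made up for illustration, and I am assuming a release store on the producer side. It corresponds to the variant with the fence enabled, since C++ only guarantees the data value when the fence is present. The point is that the flag load inside the loop is relaxed, yet it still has to be re-executed on every iteration and cannot be hoisted out of the loop.

```cpp
#include <atomic>
#include <thread>

// Hypothetical CPU analogue of the message-passing test (illustrative names only).
int run_message_passing() {
    std::atomic<int> data{0};
    std::atomic<int> flag{0};
    int observed = -1;

    // Producer: write the data, then publish it by setting the flag.
    std::thread producer([&] {
        data.store(42, std::memory_order_relaxed);
        flag.store(1, std::memory_order_release);  // assumed release store
    });

    // Consumer: spin on the flag, then read the data.
    std::thread consumer([&] {
        // Relaxed atomic load: the compiler must still reload it each iteration.
        while (flag.load(std::memory_order_relaxed) == 0) {
            std::this_thread::yield();
        }
        // Acquire fence: orders the data load after the flag observation.
        std::atomic_thread_fence(std::memory_order_acquire);
        observed = data.load(std::memory_order_relaxed);
    });

    producer.join();
    consumer.join();
    return observed;  // read after join, so no race on `observed`
}
```

Calling `run_message_passing()` should return the value the producer wrote once the consumer leaves the loop. If the loop were dropped, the consumer could read `data` before the producer had written it, which matches the failure seen on the Intel GPU.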
System information
System: Host: toucan Kernel: 5.8.0-43-generic x86_64 bits: 64 compiler: N/A Console: tty 0 dm: GDM3
Distro: Ubuntu 20.04.2 LTS (Focal Fossa)
CPU: Topology: 8-Core model: Intel Core i7-9700K bits: 64 type: MCP arch: Kaby Lake rev: D L2 cache: 12.0 MiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 57600
Speed: 800 MHz min/max: 800/4900 MHz Core speeds (MHz): 1: 800 2: 800 3: 800 4: 800 5: 800 6: 800 7: 800 8: 800
Graphics: Device-1: Intel UHD Graphics 630 vendor: Dell driver: i915 v: kernel bus ID: 00:02.0 chip ID: 8086:3e98
Device-2: NVIDIA TU104GL [Quadro RTX 4000] vendor: Dell driver: nvidia v: 460.32.03 bus ID: 01:00.0
chip ID: 10de:1eb1
Display: server: X.org 1.20.9 driver: modesetting,nvidia unloaded: fbdev,nouveau,vesa compositor: gnome-shell
tty: 204x50
Message: Advanced graphics data unavailable in console. Try -G --display
Tagging @tyler-utah as well