Bas Nieuwenhuizen · a00efa08
--- a/RDNA2-intersection-performance.md
+++ b/RDNA2-intersection-performance.md
@@ -20,7 +20,7 @@ This would suggest the obvious improvement of surrounding the instruction with a

 TODO: test mixed workloads.

-# Few lanes enabled.
+# Few lanes enabled

 With one lane enabled we only get ~14 Ginsns/sec, which is a far cry from the 169 Ginsns/sec one would expect with perfect scaling. In fact, below 12 lanes enabled you don't get any improvement. Even worse, it is 12 lanes per half in wave64 (unless a half has 0 lanes enabled, in which case that half can be disabled). So a mask of `exec_lo = 0x1` and `exec_hi = 0x1` only gives you ~7 Ginsns/sec.