Changes

Bas Nieuwenhuizen · d9b6be6d
--- a/Raytracing.md
+++ b/Raytracing.md
@@ -117,3 +117,12 @@ So if besides we the first child we put a node "retest parent from child offset
 Of course if we don't cull any boxes this way then this can be a net negative in number of box nodes processed.


+### VALU budget
+
+Per CU per cycle the GPU can process 1 BVH node (aka 1 lane of BVH) and 64 lanes of VALU instructions.
+
+With 100% of lanes enabled (and assuming memory latency does not limit throughput), that means we get optimal performance with up to 64 VALU instructions per iteration on average.
+
+However, if fewer lanes are active, our budget for optimal perf shrinks with them: assuming about 67% of lanes are active, we have a budget of ~43 VALU instructions/iteration.
+
+(Latency + max in-flight memory instructions might make the RT instructions the bottleneck though, in which case we can have a higher budget, but let's hope that doesn't happen, shall we?)
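
The budget arithmetic in the added section can be reproduced with a small sketch. It assumes the simple model the text describes: the BVH unit retires 1 node per cycle per CU, a VALU instruction occupies 64 lanes for a cycle whether or not they are all active, and memory latency is not the limiter. The function name `valu_budget` and its parameters are hypothetical, chosen only for illustration.

```python
def valu_budget(active_lane_fraction: float,
                valu_lanes_per_cycle: int = 64,
                bvh_nodes_per_cycle: int = 1) -> float:
    """Estimate VALU instructions per traversal iteration that keep pace with the BVH unit.

    Assumed model (not a hardware spec): each active lane consumes one BVH node
    per iteration, so an iteration takes (active lanes / bvh_nodes_per_cycle)
    cycles on the BVH unit, and one VALU instruction can issue per cycle.
    """
    if not 0.0 < active_lane_fraction <= 1.0:
        raise ValueError("active_lane_fraction must be in (0, 1]")
    active_lanes = valu_lanes_per_cycle * active_lane_fraction
    return active_lanes / bvh_nodes_per_cycle


if __name__ == "__main__":
    for fraction in (1.0, 0.67, 0.5):
        print(f"{fraction:.0%} active lanes: ~{valu_budget(fraction):.0f} VALU instructions/iteration")
```

With all lanes active this gives the 64-instruction budget, and at 67% it gives roughly 43, matching the figures in the text. If the RT instructions turn out to be latency-bound instead, this model underestimates the available budget.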