Of course, if we don't cull any boxes this way, this can be a net negative in the number of box nodes processed.
### VALU budget

Per CU, per cycle, the GPU can process 1 BVH node (i.e. 1 lane of a BVH instruction) and 64 lanes of VALU instructions.
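
With 100% of lanes active (and assuming memory latency does not limit throughput), that means we get optimal performance for up to 64 VALU instructions per iteration on average: a fully active wave keeps the BVH unit busy for one cycle per lane, which is exactly the time the VALUs need to run 64 wave-wide instructions.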
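
As a worked check of the 64 figure, take one traversal iteration of a wave with $W$ lanes, all active, and use only the two per-CU throughput numbers above (1 BVH lane/cycle, 64 VALU lanes/cycle), ignoring memory effects. The symbols $T_{\mathrm{BVH}}$ (time the BVH unit spends on the iteration) and $T_{\mathrm{VALU}}(N)$ (time for $N$ wave-wide VALU instructions) are introduced here for illustration:

```math
\begin{aligned}
T_{\mathrm{BVH}} &= \frac{W}{1}\ \text{cycles} \\
T_{\mathrm{VALU}}(N) &= N \cdot \frac{W}{64}\ \text{cycles} \\
T_{\mathrm{VALU}}(N) \le T_{\mathrm{BVH}} &\iff \frac{N W}{64} \le W \iff N \le 64
\end{aligned}
```

The wave width $W$ cancels, so under this model the 64-instruction budget holds for both wave32 and wave64.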
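
However, if fewer lanes are active, the budget for optimal performance shrinks: the BVH unit only spends cycles on active lanes, while each VALU instruction still costs the full wave. Assuming about 67% of lanes are active, we have a budget of ~43 VALU instructions per iteration (0.67 × 64 ≈ 43).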
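
A minimal sketch of the same arithmetic for a partially active wave. It assumes, as above, 1 BVH lane per cycle and 64 VALU lanes per cycle per CU, that a VALU instruction costs the full wave width whether or not its lanes are active, and that the BVH unit only spends cycles on active lanes. The function name `valu_budget` is hypothetical:

```python
BVH_LANES_PER_CYCLE = 1    # BVH nodes processed per CU per cycle (from the text)
VALU_LANES_PER_CYCLE = 64  # VALU lanes executed per CU per cycle (from the text)


def valu_budget(active_fraction: float, wave_size: int = 64) -> float:
    """VALU instructions per iteration that fit in the time the BVH unit
    spends on one traversal iteration of a single wave (compute-only model)."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be in [0, 1]")
    # Assumption: only active lanes consume BVH-unit cycles.
    bvh_cycles = active_fraction * wave_size / BVH_LANES_PER_CYCLE
    # Every VALU instruction occupies all lanes of the wave, active or not.
    valu_cycles_per_instr = wave_size / VALU_LANES_PER_CYCLE
    return bvh_cycles / valu_cycles_per_instr


for frac in (1.0, 0.67):
    for w in (32, 64):
        print(f"{frac:.0%} active, wave{w}: ~{valu_budget(frac, w):.1f} VALU instrs/iteration")
```

Both wave sizes come out at 64 with all lanes active and at ~42.9 (the ~43 above) with 67% active, since the wave width cancels in the ratio.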

(Latency plus the maximum number of in-flight memory instructions might end up limiting the throughput of the RT instructions instead, in which case we can afford a higher budget, but let's hope that doesn't happen, shall we?)
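
If the RT instructions do end up limited by memory latency and the in-flight limit, an iteration lasts longer than the BVH unit alone would need, and more VALU work can hide in that time. A rough sketch of that effect, extending the same throughput model: `mem_cycles_per_iteration` is a hypothetical parameter standing in for the amortized CU cycles per wave iteration imposed by the memory system (the text gives no figure for it), and the model assumes VALU and RT work overlap perfectly:

```python
BVH_LANES_PER_CYCLE = 1    # BVH nodes processed per CU per cycle (from the text)
VALU_LANES_PER_CYCLE = 64  # VALU lanes executed per CU per cycle (from the text)


def valu_budget_memory_bound(active_fraction: float,
                             mem_cycles_per_iteration: float,
                             wave_size: int = 64) -> float:
    """VALU instructions per iteration when an iteration lasts
    max(BVH-unit time, memory-imposed time).

    mem_cycles_per_iteration is a hypothetical, amortized number of CU cycles
    per wave iteration imposed by memory latency and the in-flight limit.
    """
    bvh_cycles = active_fraction * wave_size / BVH_LANES_PER_CYCLE
    # Whichever limiter is slower sets the length of an iteration.
    iteration_cycles = max(bvh_cycles, mem_cycles_per_iteration)
    # Each VALU instruction occupies the whole wave, active lanes or not.
    valu_cycles_per_instr = wave_size / VALU_LANES_PER_CYCLE
    return iteration_cycles / valu_cycles_per_instr


# Not memory bound: same ~43 as the compute-only estimate.
print(valu_budget_memory_bound(0.67, mem_cycles_per_iteration=0))
# Hypothetical memory bound of 80 cycles per iteration: the budget grows to 80.
print(valu_budget_memory_bound(0.67, mem_cycles_per_iteration=80))
```

Taking the `max` reflects the hope expressed above: as long as the memory system is faster than the BVH unit, it does not change the budget at all.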