@@ -117,3 +117,12 @@ So if besides we the first child we put a node "retest parent from child offset
Of course, if we don't cull any boxes this way, then this can be a net negative in the number of box nodes processed.
### VALU budget
Per CU per cycle the GPU can process 1 BVH node (aka 1 lane of BVH) and 64 lanes of VALU instructions.
With 100% of lanes active (and assuming memory latency does not limit throughput), that means we get optimal performance for up to 64 VALU instructions per iteration on average.
However, if the number of active lanes is lower, then our budget for optimal perf becomes lower too, so assuming we get about 67% active lanes we have a budget of ~43 VALU instructions/iteration.
(Latency and the maximum number of in-flight memory instructions might make the RT instructions the bottleneck, though, in which case we can have a higher budget, but let's hope that doesn't happen, shall we?)
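
The budget arithmetic above can be written down as a small sketch. It assumes the simple throughput model from this section only: 1 BVH node and 64 VALU lanes per CU per cycle, a 64-lane wave, and no memory-latency limits. The function name `valu_budget` and its parameters are made up for illustration and are not part of any driver code.

```python
def valu_budget(active_fraction, valu_lanes_per_cycle=64, bvh_nodes_per_cycle=1):
    """Estimate VALU instructions per traversal iteration that stay 'free'.

    Assumed model (from the text, not measured):
      - the BVH unit processes `bvh_nodes_per_cycle` nodes per CU per cycle,
      - one VALU instruction occupies all `valu_lanes_per_cycle` lanes for one
        cycle, whether or not every lane is active,
      - each active lane issues one BVH node test per iteration.
    """
    if not 0.0 < active_fraction <= 1.0:
        raise ValueError("active_fraction must be in (0, 1]")

    # Number of lanes that actually issue a BVH node test this iteration.
    active_lanes = valu_lanes_per_cycle * active_fraction

    # Cycles the BVH unit needs to serve those lanes.
    bvh_cycles = active_lanes / bvh_nodes_per_cycle

    # Each VALU instruction costs one cycle for the whole wave, so the number
    # of VALU instructions that fit in the same time equals the BVH cycles.
    return bvh_cycles


if __name__ == "__main__":
    for fraction in (1.0, 0.67, 0.5):
        print(f"{fraction:.0%} active lanes: ~{valu_budget(fraction):.0f} VALU instructions/iteration")
```

Under this model the budget scales linearly with lane occupancy, which is why divergence in the traversal loop directly shrinks the room for extra VALU work.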