A good solution might be a combination of 1/2 + 4: keep a small local part of the stack for frequent operations and regularly push/pop big chunks to/from VMEM. This will need some work to avoid significant divergence.
(for non-inline we might just do LDS + VMEM since that is way more efficient and we have 32 dwords/lane)
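A rough CPU-side sketch of what that could look like. Everything here (names, sizes, the spill heuristic) is made up for illustration; the real thing would live in the traversal shader and spill to a per-lane scratch buffer in VMEM:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LOCAL_STACK_SIZE 16 /* entries kept in fast local storage (regs/LDS) */
#define SPILL_CHUNK      8  /* entries moved to/from VMEM in one batch       */

typedef struct {
   uint32_t local_stack[LOCAL_STACK_SIZE]; /* small, hot part of the stack   */
   uint32_t local_count;                   /* entries currently held locally */
   uint32_t *vmem_stack;                   /* large per-lane spill buffer    */
   uint32_t vmem_count;                    /* entries spilled to VMEM        */
} traversal_stack;

static void stack_push(traversal_stack *s, uint32_t node)
{
   if (s->local_count == LOCAL_STACK_SIZE) {
      /* Local part is full: spill the oldest SPILL_CHUNK entries to VMEM so
       * the hot push/pop loop keeps hitting local storage. */
      for (uint32_t i = 0; i < SPILL_CHUNK; i++)
         s->vmem_stack[s->vmem_count++] = s->local_stack[i];
      memmove(s->local_stack, s->local_stack + SPILL_CHUNK,
              (LOCAL_STACK_SIZE - SPILL_CHUNK) * sizeof(uint32_t));
      s->local_count -= SPILL_CHUNK;
   }
   s->local_stack[s->local_count++] = node;
}

static bool stack_pop(traversal_stack *s, uint32_t *node)
{
   if (s->local_count == 0) {
      if (s->vmem_count == 0)
         return false; /* stack fully empty: traversal is done */
      /* Local part is empty: refill a chunk of entries from VMEM. */
      uint32_t n = s->vmem_count < SPILL_CHUNK ? s->vmem_count : SPILL_CHUNK;
      for (uint32_t i = 0; i < n; i++)
         s->local_stack[i] = s->vmem_stack[s->vmem_count - n + i];
      s->vmem_count -= n;
      s->local_count = n;
   }
   *node = s->local_stack[--s->local_count];
   return true;
}
```

The point being that the amortized VMEM traffic is one chunked store per SPILL_CHUNK pushes past the local capacity and one chunked load per SPILL_CHUNK pops below it, while the common case stays entirely local.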
Stackless traversal is possible, but (a) we might not be able to fit a parent pointer into the fp16 box nodes without significant overhead, (b) it would need 1 load per level (instead of 1 load + 1 store per M levels, M = 4 or 8?, in the combined solution above), and (c) it needs a bunch of logic to figure out the next child, which would probably involve doing the intersection test again, which leads us to ...
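For reference, this is roughly what the parent-pointer scheme looks like. The node layout and helper below are hypothetical, and the intersection/leaf handling is stubbed out; it is only meant to show where the extra per-level loads in (b) and the re-testing in (c) come from:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical node layout for illustration only: a BVH4 node that also
 * stores a parent index. Whether a parent pointer actually fits next to
 * the fp16 child boxes without bloating the node is exactly concern (a). */
typedef struct {
   uint32_t parent;      /* index of the parent node, UINT32_MAX for root */
   uint32_t children[4]; /* child node indices, UINT32_MAX if unused      */
   /* ... fp16 child bounding boxes would go here ...                     */
} bvh_node;

/* Find the next child of `node` to visit after `prev_child` (UINT32_MAX
 * means "start from the first child"). A real implementation would redo
 * the ray/box intersection tests here to skip missed children and keep
 * the original traversal order -- that re-testing is concern (c). */
static uint32_t next_child(const bvh_node *nodes, uint32_t node,
                           uint32_t prev_child)
{
   const bvh_node *n = &nodes[node];
   bool past_prev = (prev_child == UINT32_MAX);
   for (uint32_t i = 0; i < 4; i++) {
      uint32_t c = n->children[i];
      if (c == UINT32_MAX)
         continue;
      if (past_prev)
         return c; /* real code: only if the ray actually hits this child */
      if (c == prev_child)
         past_prev = true;
   }
   return UINT32_MAX;
}

/* Stackless walk: descend while there is an unvisited child, otherwise pop
 * by following the parent pointer -- one node load per level, concern (b).
 * Leaf/triangle handling is omitted. */
static void traverse_stackless(const bvh_node *nodes, uint32_t root)
{
   uint32_t cur = root;
   uint32_t prev = UINT32_MAX;

   while (cur != UINT32_MAX) {
      uint32_t next = next_child(nodes, cur, prev);
      if (next != UINT32_MAX) {
         prev = UINT32_MAX;        /* descend, start at the first child */
         cur = next;
      } else {
         prev = cur;               /* remember where we came from       */
         cur = nodes[cur].parent;  /* pop via the parent pointer        */
      }
   }
}
```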