This adds a mechanism to log points in the cmdstream (in particular the toplevel IB1 cmdstream) with GPU timestamps. The timestamps are recorded from the cmdstream itself, then collected later and re-associated with the queued log msgs.
This gives us a way to profile the breakdown of how much time is spent in various parts of a batch.
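
For illustration, here is a rough sketch of the queue-then-reassociate idea. It is not the code in this series: all of the names (log_chunk, log_queue, log_elapsed_ticks) are hypothetical, and a plain array stands in for the buffer that the GPU would write timestamps into.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LOG_POINTS 64   /* arbitrary size picked for this sketch */

/* One queued log msg, remembered CPU-side while the cmdstream is built. */
struct log_point {
   const char *msg;
   uint32_t ts_slot;   /* slot the GPU timestamp for this point lands in */
};

struct log_chunk {
   struct log_point points[MAX_LOG_POINTS];
   uint64_t timestamps[MAX_LOG_POINTS];   /* stand-in for the GPU-written buffer */
   uint32_t count;
};

/* Queue a msg and reserve a timestamp slot for it.  A real driver would
 * also emit a packet into the cmdstream asking the GPU to write its
 * timestamp counter into that slot; that part is omitted here. */
static uint32_t
log_queue(struct log_chunk *chunk, const char *msg)
{
   assert(chunk->count < MAX_LOG_POINTS);
   uint32_t slot = chunk->count++;
   chunk->points[slot].msg = msg;
   chunk->points[slot].ts_slot = slot;
   return slot;
}

/* Once the batch has completed, pair point i with the point after it to
 * get the raw tick count spent in that part of the batch. */
static uint64_t
log_elapsed_ticks(const struct log_chunk *chunk, uint32_t i)
{
   assert(i + 1 < chunk->count);
   uint64_t start = chunk->timestamps[chunk->points[i].ts_slot];
   uint64_t end = chunk->timestamps[chunk->points[i + 1].ts_slot];
   return end - start;
}
```

The msgs and the timestamps live in separate places until collection time, so queuing a log point costs almost nothing on the CPU and nothing has to wait on the GPU.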
Note: I've correlated the timestamps to fps in some known cases, and it looks like our conversion of ts to ns is correct (ie. a6xx is still logging a count of the 19.2MHz clk). I've also added a few more useful tracepoints, so I'm going to call this ready-to-go.
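
As a sanity check on the arithmetic, a minimal sketch of the ts to ns conversion for a 19.2MHz counter. The helper names are made up for this example and are not the ones used in the driver; the only fact taken from above is the 19.2MHz clock rate.

```c
#include <stdint.h>

/* 1e9 ns/s divided by 19.2e6 ticks/s reduces to exactly 625/12 ns per
 * tick (about 52.08ns), so the conversion can stay in integer math.
 * The multiply only overflows 64 bits after decades worth of ticks. */
static uint64_t
ticks_to_ns(uint64_t ticks)
{
   return ticks * 625 / 12;
}

/* Frame rate implied by the tick count of a single frame. */
static double
fps_from_frame_ticks(uint64_t ticks)
{
   return 1e9 / (double)ticks_to_ns(ticks);
}
```

For example, a frame that spans 320000 ticks works out to roughly 16.67ms, ie. about 60fps, which is the kind of known case the timestamps can be checked against.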