Skip to content

util: GPU tracepoints

Rob Clark requested to merge robclark/mesa:wip/utrace into master

This MR adds a tracepoint mechanism loosely inspired by the kernel's tracepoint mechanism, with a few notable differences:

  1. GPU timestamps! A driver provided callback is used to emit timestamps to a buffer. At a later point in time (when stalling to wait for the GPU is not required), the timestamps are re-united with the trace payload. This makes the trace mechanism suitable for profiling.
  2. Instead of a systemwide trace ringbuffer, buffering of un-retired tracepoints is split into two stages. Traces are emitted to a u_trace instance, and at a later time flushed to a u_trace_context instance. This avoids the requirement that commandstream containing tracepoints is emitted in the same order as it is generated.
  3. Unlike kernel tracepoints, u_trace tracepoints are defined in py from which header and src files are generated. Since we already have a build dependency on python+mako, this gives more flexibility than clunky preprocessor macro magic.

The mechanism for capturing timestamps on the GPU is based on freedreno's fd_log() mechanism. But unlike fd_log() the trace payload is binary. The intention is to (waves hands) add a perfetto backend as an alternative to the current printf backend, to ship this data out to an external tool (AGI?) so data can be visualized in a nice timeline view like:

003_blit_bottom_track

Still a bit WIP. I haven't finished converting all the fd_log() tracepoints and removing fd_log(). There are a few rough edges. And notably for tiler GPUs there isn't yet support for traces in cmdstream that is executed per-tile (ie. so we can use this to instrument tile passes, but not individual draws within a tile-pass). I'm not entirely sure how we'd want to visualize that in the first place. For per-draw instrumentation on a tiler, we probably want something that accumulates elapsed time rather than absolute timestamps (and at that point, it could simply re-use elapsed-time queries).

Another open question is how we could make this something that could be re-used by vk drivers. Perhaps with a bit of an internal shim, this could be (waves hands again) built twice, once using gallium APIs and a second time using vk APIs?

Currently, with the freedreno tracepoints that I've added so far, GALLIUM_GPU_TRACE=1 gets you output like:

+----- NS -----+ +-- Δ --+  +----- MSG -----
0000003067732720        +0: flush_batch: 0x17119720: cleared=5, gmem_reason=3, num_draws=2
0000003067732720        +0: framebuffer: 800x600x0@1, nr_cbufs: 1
0000003067732720        +0: surface: 800x600@0, fmt=b8g8r8a8_unorm
0000003067732720        +0: surface: 800x600@0, fmt=z24x8_unorm
0000003067732720        +0: render_gmem: 1x4 bins of 800x160
0000003067732720        +0: start_state_restore
0000003067734904     +2184: end_state_restore
0000003067735164      +260: start_prologue
0000003067755496    +20332: end_prologue
0000003067757732     +2236: start_binning_ib
0000003067848836    +91104: end_binning_ib
0000003067854244     +5408: start_vsc_overflow_test
0000003067862876     +8632: end_vsc_overflow_test
0000003067866204     +3328: start_tile: bin_h=160, yoff=0, bin_w=800, xoff=0
0000003067867660     +1456: start_clear_restore: fast_cleared: 0x5
0000003067875148     +7488: end_clear_restore
0000003067875148        +0: start_draw_ib
0000003067898652    +23504: end_draw_ib
0000003067898860      +208: start_resolve
0000003067940200    +41340: end_resolve
0000003067940200        +0: start_tile: bin_h=160, yoff=160, bin_w=800, xoff=0
0000003067942436     +2236: start_clear_restore: fast_cleared: 0x5
0000003067948884     +6448: end_clear_restore
0000003067948884        +0: start_draw_ib
0000003067977328    +28444: end_draw_ib
0000003067977588      +260: start_resolve
0000003068020332    +42744: end_resolve
0000003068020332        +0: start_tile: bin_h=160, yoff=320, bin_w=800, xoff=0
0000003068022568     +2236: start_clear_restore: fast_cleared: 0x5
0000003068029068     +6500: end_clear_restore
0000003068029068        +0: start_draw_ib
0000003068085436    +56368: end_draw_ib
0000003068085696      +260: start_resolve
0000003068128700    +43004: end_resolve
0000003068128700        +0: start_tile: bin_h=120, yoff=480, bin_w=800, xoff=0
0000003068130832     +2132: start_clear_restore: fast_cleared: 0x5
0000003068138684     +7852: end_clear_restore
0000003068138684        +0: start_draw_ib
0000003068184808    +46124: end_draw_ib
0000003068185068      +260: start_resolve
0000003068217152    +32084: end_resolve
ELAPSED: 484432 ns
END OF FRAME 362

(for a single renderpass)

Edited by Rob Clark

Merge request reports