Strange slow-fast performance latching between gfxbench3 (and 4) test runs
@tursulin
Submitted by Tvrtko Ursulin Assigned to Chris Wilson @ickle
Link to original bug (#103314)
Description
I have observed a strange phenomena where benchmarks scores can vary, by a differing degree, between benchmark runs. By looking at what is happening with the system I have noticed the fast case only ever uses the RCS engine, while the slow case uses BCS at 100% and RCS at a lower percentage (depends on the benchmark exactly how much).
I have already chatted with Kenneth (and some other people) about this and apparently this is somewhat known and caused by different BO upload paths. It was supposed to be alleviated in the master branch but for me the results are exactly the same (bad) as what I originally discovered.
The most obvious example is the gl_driver test:
Slow run, 17.5% RCS busy, 100% BCS busy:
========================================
root@e31:~/benchmarks/gfxbench3_desktop# INTEL_DEBUG=perf ~/bin/run-mesa ~/mesa ./gfxbench-driver.sh
Running following GfxBench 3.x test-cases:
- gl_driver
In following resolutions:
- 1920x1080
Fullscreened:
- 1
On/offscreen:
- on
COMMAND: build/linux/gfxbench_Release/mainapp/mainapp -w 1920 -ow 1920 -h 1080 -oh 1080 -t gl_driver -fullscreen 1
ATTENTION: default value of option vblank_mode overridden by environment.
Scanning index buffer to compute index buffer bounds. Use glDrawRangeElements() to avoid this.
CPU mapping a busy "statebuffer" BO stalled and took 0.017 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.035 ms.
CPU mapping a busy "streamed data" BO stalled and took 0.012 ms.
CPU mapping a busy "statebuffer" BO stalled and took 0.010 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.022 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.013 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.025 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.038 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.039 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.018 ms.
#Name, FPS, Score, Unit, Width, Height, GL version:
GLB30_gl_driver, 34.2, 1026.0, frames, 1920, 1080, 3.0 Mesa 17.3.0-devel (git-31fb7bbe0b)
Results are in:
- gfxbench-result-fullscreen-1.csv
Full output logs are in:
- gfxbench-result-fullscreen-1.txt
Fast run, 75% RCS busy, 0% BCS busy:
=====================================
root@e31:~/benchmarks/gfxbench3_desktop# INTEL_DEBUG=perf ~/bin/run-mesa ~/mesa ./gfxbench-driver.sh
Running following GfxBench 3.x test-cases:
- gl_driver
In following resolutions:
- 1920x1080
Fullscreened:
- 1
On/offscreen:
- on
COMMAND: build/linux/gfxbench_Release/mainapp/mainapp -w 1920 -ow 1920 -h 1080 -oh 1080 -t gl_driver -fullscreen 1
ATTENTION: default value of option vblank_mode overridden by environment.
Scanning index buffer to compute index buffer bounds. Use glDrawRangeElements() to avoid this.
CPU mapping a busy "batchbuffer" BO stalled and took 0.010 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.020 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.021 ms.
CPU mapping a busy "streamed data" BO stalled and took 0.019 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.015 ms.
CPU mapping a busy "statebuffer" BO stalled and took 0.017 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.037 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.036 ms.
CPU mapping a busy "statebuffer" BO stalled and took 0.021 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.029 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.019 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.017 ms.
CPU mapping a busy "streamed data" BO stalled and took 0.019 ms.
CPU mapping a busy "statebuffer" BO stalled and took 0.016 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.023 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.011 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.014 ms.
CPU mapping a busy "statebuffer" BO stalled and took 0.015 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.011 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.013 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.015 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.027 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.010 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.015 ms.
CPU mapping a busy "statebuffer" BO stalled and took 0.011 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.031 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.038 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.023 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.026 ms.
#Name, FPS, Score, Unit, Width, Height, GL version:
GLB30_gl_driver, 101.4, 3042.0, frames, 1920, 1080, 3.0 Mesa 17.3.0-devel (git-31fb7bbe0b)
Results are in:
- gfxbench-result-fullscreen-1.csv
Full output logs are in:
- gfxbench-result-fullscreen-1.txt
Other tests show a smaller margin between slow and fast modes, but the crux of the problem is still there.
I think this is quite a problem for benchmarking, and unless people will say something along the lines of "Don't use gfxbench3/4 ever because A B C", I think it would be good to get to the bottom of this and having something which behaves somewhat more predictably.
I also wonder if it is possible that something in the kernel (i915) is causing Mesa to behave like this? Because I am pretty sure I used these benchmarks before, but I don't remember noticing this issue before. 3x difference in gl_driver certainly looks like something one would have thought people would have noticed before if it was old behaviour?