drm/xe/guc: Use doorbells for submission if possible
We have 256 doorbells (on most platforms) that we can allocate to bypass using the H2G channel for submission. This will avoid contention on the CT mutex.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested by Jason on IRC. I'm not sure if this will work, but I'm getting RFC code out for this implementation as there was a lot of confusion on the #xe channel when I asked about doorbells. I will try to test this on TGL / DG1 / DG2 tomorrow and see if I can get it working.
Attempt to resolve: #196
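To make the "if possible" in the title concrete, here is a toy Python model of the policy described above: hand out one of the 256 doorbell IDs while any are free, and fall back to the H2G channel otherwise. This is only a sketch, not the driver code. `DoorbellPool` and `pick_submission_path` are hypothetical names, and the fallback-on-exhaustion behaviour is an assumption read from the title rather than taken from the patch.

```python
class DoorbellPool:
    """Toy model of a fixed pool of doorbell IDs (hypothetical, not driver code)."""

    def __init__(self, count=256):
        # 256 matches the doorbell count quoted in the description.
        self.free = set(range(count))

    def alloc(self):
        # Return a free doorbell ID, or None when every doorbell is in use.
        return self.free.pop() if self.free else None

    def release(self, db_id):
        # Return a doorbell ID to the pool so another engine can use it.
        self.free.add(db_id)


def pick_submission_path(pool):
    """Prefer a doorbell; assume a fallback to H2G when none is available."""
    db_id = pool.alloc()
    if db_id is None:
        return ("h2g", None)
    return ("doorbell", db_id)


# 300 engines competing for 256 doorbells: the overflow uses H2G.
pool = DoorbellPool(count=256)
paths = [pick_submission_path(pool) for _ in range(300)]
doorbell_users = sum(1 for kind, _ in paths if kind == "doorbell")
h2g_users = sum(1 for kind, _ in paths if kind == "h2g")
print(doorbell_users, h2g_users)
```

In this model only the engines that fail to get a doorbell still go through the H2G path, which is where the CT mutex contention mentioned above would remain.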
Activity
Build successful! - Build URL: https://jenkins-xe.lgci.intel.com/job/xe/345/
- Test URL: http://intel-gfx-ci-public.igk.intel.com:8080/job/BAT-xe/240/
- Test Logs URL (wait for tests to finish first): http://intel-gfx-ci-public.igk.intel.com/archive/results/IGT/xe-mr-288-a226b168d608af05bb6cdf43be60fcf96519048b/
added 4 commits
Build successful! - Build URL: https://jenkins-xe.lgci.intel.com/job/xe/358/
- Test URL: http://intel-gfx-ci-public.igk.intel.com:8080/job/BAT-xe/253/
- Test Logs URL (wait for tests to finish first): http://intel-gfx-ci-public.igk.intel.com/archive/results/IGT/xe-mr-288-49da6b9289c8d9fd7ef545ed1371f3bd28c27f7d/
The latest version is tested and working on TGL / DG1. I can try the other platforms before merging.
Running xe_exec_threads w/ 245 user engines + 8k execs seems to be slightly faster with doorbells:
Running with doorbells:
root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.828s)
root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.775s)
root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.892s)
root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.763s)
root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.840s)
root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.829s)
root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.815s)
root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.877s)
root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.833s)
root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.827s)
root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.784s)
Running without doorbells:
root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.848s)
root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.927s)
root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.960s)
root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.966s)
root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.975s)
root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.956s)
root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.921s)
root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.940s)
root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.945s)
root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
Starting subtest: threads-basic
Subtest threads-basic: SUCCESS (0.797s)
Based on these results, runs w/ doorbells averaged .824s vs. .923s w/o doorbells, i.e. roughly 11% faster per run. This seems to indicate there is a benefit to using doorbells. Based on that, I'd say let's get this merged, as the performance improvement seems to justify adding this complexity to the code.
Thoughts?
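For anyone who wants to check the averages, they can be recomputed from the subtest times in the logs above. A small Python sketch follows; the variable names are mine and the timings are copied by hand from the runs listed in this comment.

```python
from statistics import mean

# Subtest wall times in seconds, copied from the xe_exec_threads runs above.
with_doorbells = [0.828, 0.775, 0.892, 0.763, 0.840, 0.829,
                  0.815, 0.877, 0.833, 0.827, 0.784]
without_doorbells = [0.848, 0.927, 0.960, 0.966, 0.975,
                     0.956, 0.921, 0.940, 0.945, 0.797]

avg_with = mean(with_doorbells)
avg_without = mean(without_doorbells)

# Relative reduction in time per run when doorbells are enabled.
reduction = (avg_without - avg_with) / avg_without

print(f"with doorbells:    {avg_with:.3f}s over {len(with_doorbells)} runs")
print(f"without doorbells: {avg_without:.3f}s over {len(without_doorbells)} runs")
print(f"reduction:         {reduction:.1%}")
```

The sample is small (11 and 10 runs on a single TGL machine), so the result is indicative rather than conclusive.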
Edited by Matthew Brost
Build successful! - Build URL: https://jenkins-xe.lgci.intel.com/job/xe/359/
- Test URL: http://intel-gfx-ci-public.igk.intel.com:8080/job/BAT-xe/254/
- Test Logs URL (wait for tests to finish first): http://intel-gfx-ci-public.igk.intel.com/archive/results/IGT/xe-mr-288-49da6b9289c8d9fd7ef545ed1371f3bd28c27f7d/
Build failed! - Build URL: https://jenkins-xe.lgci.intel.com/job/xe/443/
mentioned in merge request !305