drm/xe/guc: Use doorbells for submission if possible

Closed Matthew Brost requested to merge (removed):doorbells into xe

We have 256 doorbells (on most platforms) that we can allocate to bypass using the H2G channel for submission. This will avoid contention on the CT mutex.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>

Suggested by Jason on IRC. Not sure if this will work, but I am getting RFC code out for this implementation as there was a lot of confusion on the #xe channel when I asked about doorbells. Will try to test this on TGL / DG1 / DG2 tomorrow and see if I can get it working.

Attempt to resolve: #196
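
The allocation scheme described above (a fixed pool of 256 doorbells, with the H2G channel as the fallback when none is free) can be sketched roughly as follows. This is a minimal userspace illustration, not the code in this merge request: `doorbell_pool`, `doorbell_alloc`, `doorbell_release`, `pick_submit_path`, and `DOORBELL_INVALID` are hypothetical names, and a real driver would also need locking and the actual GuC doorbell registration, neither of which is shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_DOORBELLS    256   /* figure quoted in the description */
#define DOORBELL_INVALID (-1)  /* hypothetical "no doorbell" marker */

/* Hypothetical allocator state: one bit per doorbell ID. */
struct doorbell_pool {
	uint64_t used[NUM_DOORBELLS / 64];
};

enum submit_path { SUBMIT_VIA_DOORBELL, SUBMIT_VIA_H2G };

static void doorbell_pool_init(struct doorbell_pool *pool)
{
	memset(pool->used, 0, sizeof(pool->used));
}

/* Return the lowest free doorbell ID, or DOORBELL_INVALID if all are taken. */
static int doorbell_alloc(struct doorbell_pool *pool)
{
	for (int id = 0; id < NUM_DOORBELLS; id++) {
		uint64_t bit = UINT64_C(1) << (id % 64);

		if (!(pool->used[id / 64] & bit)) {
			pool->used[id / 64] |= bit;
			return id;
		}
	}
	return DOORBELL_INVALID;
}

/* Give a previously allocated doorbell ID back to the pool. */
static void doorbell_release(struct doorbell_pool *pool, int id)
{
	assert(id >= 0 && id < NUM_DOORBELLS);
	pool->used[id / 64] &= ~(UINT64_C(1) << (id % 64));
}

/* Engines without a doorbell keep using the H2G channel. */
static enum submit_path pick_submit_path(int doorbell_id)
{
	return doorbell_id == DOORBELL_INVALID ? SUBMIT_VIA_H2G
					       : SUBMIT_VIA_DOORBELL;
}
```

In this sketch an engine would call `doorbell_alloc()` once at creation time and store the result; every later submission checks `pick_submit_path()` and only takes the H2G path (and with it the CT mutex) when no doorbell was available.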

Activity

  • Matthew Brost added 4 commits

    • 8c301f4e - drm/xe/guc: Read HXG fields from DW1 of G2H response
    • 7f911481 - drm/xe/guc: Return the lower part of blocking H2G message
    • 4a0f9a35 - drm/xe/guc: Use doorbells for submission if possible
    • 49da6b92 - drm/xe/guc: Print doorbell ID in GuC engine debugfs entry

  • Author Developer

    The latest version is tested and working on TGL / DG1. I can try the other platforms before merging.

    Running xe_exec_threads w/ 245 user engines + 8k execs seems to be slightly faster with doorbells:

    Running with doorbells:

    root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.828s)
    root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.775s)
    root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.892s)
    root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.763s)
    root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.840s)
    root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.829s)
    root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.815s)
    root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.877s)
    root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.833s)
    root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.827s)
    root@DUT025-TGLU:mbrost# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.784s)

    Running without doorbells:

    root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.848s)
    root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.927s)
    root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.960s)
    root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.966s)
    root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.975s)
    root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.956s)
    root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.921s)
    root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.940s)
    root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.945s)
    root@DUT025-TGLU:igt-gpu-tools# xe_exec_threads --r threads-basic
    IGT-Version: 1.26-ge26de4b2 (x86_64) (Linux: 6.1.0-rc1-xe+ x86_64)
    Starting subtest: threads-basic
    Subtest threads-basic: SUCCESS (0.797s)

    Based on these results, runs w/ doorbells averaged .824s vs. .923s w/o doorbells. This indicates there is a benefit to using doorbells. Based on that, I'd say let's get this merged, as the performance improvement seems to justify the added complexity in the code.
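
    For reference, the two averages can be reproduced from the timings logged above with a small sketch like the one below. It is not part of the patch; `mean_seconds` and `ARRAY_LEN` are illustrative names, and the arrays simply copy the `SUCCESS (...)` values from the two sets of runs.

    ```c
    #include <assert.h>
    #include <stddef.h>

    #define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

    /* Arithmetic mean of n timings in seconds; n must be non-zero. */
    static double mean_seconds(const double *t, size_t n)
    {
    	double sum = 0.0;

    	assert(n > 0);
    	for (size_t i = 0; i < n; i++)
    		sum += t[i];
    	return sum / (double)n;
    }

    /* threads-basic timings copied from the runs with doorbells. */
    static const double with_doorbells[] = {
    	0.828, 0.775, 0.892, 0.763, 0.840, 0.829,
    	0.815, 0.877, 0.833, 0.827, 0.784,
    };

    /* threads-basic timings copied from the runs without doorbells. */
    static const double without_doorbells[] = {
    	0.848, 0.927, 0.960, 0.966, 0.975,
    	0.956, 0.921, 0.940, 0.945, 0.797,
    };
    ```

    Calling `mean_seconds()` on each array gives about 0.824s over the 11 runs with doorbells and about 0.9235s over the 10 runs without, i.e. a reduction of roughly 10.8% in wall-clock time for this subtest.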

    Thoughts?

    Edited by Matthew Brost
  • Matthew Brost changed title from RFC: drm/xe/guc: Use doorbells for submission if possible to drm/xe/guc: Use doorbells for submission if possible

  • closed

  • Author Developer

    Did not mean to delete the source for this one, will repost shortly.

  • Matthew Brost mentioned in merge request !305
