Mesa / mesa / Merge requests / !1789

radv: force unroll loops which access temporary arrays via induction

Status: Closed. Rhys Perry requested to merge pendingchaos/mesa:radv_force_unroll into master on Aug 28, 2019.

This has a very large impact on at least one X4 Foundations shader:

SGPRS: 40 -> 88 (120.00 %)
VGPRS: 168 -> 32 (-80.95 %)
Scratch size: 1012 -> 0 (-100.00 %) dwords per thread
Code Size: 15360 -> 4864 (-68.33 %) bytes
Max Waves: 1 -> 8 (700.00 %)

and has been reported to fix a large performance regression: https://github.com/daniel-schuermann/mesa/issues/120#issuecomment-523166738
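
The jump in Max Waves is consistent with the drop in VGPRs. As a rough sketch, assuming a GCN-style SIMD with 256 VGPRs per lane, allocation in blocks of 4 registers, and a cap of 10 waves per SIMD: these hardware numbers and the helper names are assumptions for illustration, not taken from this merge request, and limits from SGPRs, LDS and scratch are ignored.

```c
/* Hypothetical helper: VGPR-limited wave count on a GCN-style SIMD.
 * Assumed hardware parameters (not from the merge request):
 * 256 VGPRs per lane, allocated in blocks of 4, at most 10 waves. */
static unsigned max_waves_from_vgprs(unsigned vgprs)
{
    const unsigned vgpr_budget = 256, granularity = 4, wave_cap = 10;

    if (vgprs == 0)
        return wave_cap;

    /* Round the request up to the allocation granularity. */
    unsigned allocated = (vgprs + granularity - 1) / granularity * granularity;
    unsigned waves = vgpr_budget / allocated;

    return waves < wave_cap ? waves : wave_cap;
}

/* Relative change in percent, in the style of the statistics above. */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Under these assumptions, 168 VGPRs allow one wave and 32 VGPRs allow eight, which matches the reported 1 -> 8, and `percent_change(1, 8)` gives the 700 % shown in the table.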

Looking at force_unroll_array_accesses, this was already done, but only if max_trip_count matched the array's size. That prevents unrolling when the loop is split into multiple loops, each processing only a subset of the array (probably to try to convince the compiler to unroll the loop). Even if there was only a single loop, it wouldn't be unrolled because the (large) array was initialized in its body.
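
To make the difference concrete, here is a simplified model of the two conditions. The struct and both function names are hypothetical and only mirror the description above (old: trip count must equal the array size; new: any temporary array indexed by the induction variable). This is not the actual NIR code, and the exact condition in the patch may differ.

```c
#include <stdbool.h>

/* Hypothetical summary of one array access found inside a loop. */
struct array_access {
    bool is_temporary;        /* array is a shader-local temporary */
    bool index_is_induction;  /* indexed by the loop's induction variable */
    unsigned array_length;    /* number of elements in the array */
};

/* Old behaviour as described: only force unrolling when the loop
 * walks the whole array, i.e. the trip count equals its length. */
static bool force_unroll_old(const struct array_access *a,
                             unsigned max_trip_count)
{
    return a->index_is_induction && a->array_length == max_trip_count;
}

/* Proposed behaviour as described by the title: force unrolling for
 * any temporary array accessed via the induction variable. */
static bool force_unroll_new(const struct array_access *a)
{
    return a->is_temporary && a->index_is_induction;
}
```

For a 64-element temporary array processed by four loops of 16 iterations each, `force_unroll_old` rejects every loop because 16 != 64, while `force_unroll_new` accepts them.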

pipeline-db changes:

Totals from affected shaders:
SGPRS: 136 -> 96 (-29.41 %)
VGPRS: 88 -> 168 (90.91 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 7100 -> 7968 (12.23 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 11 -> 6 (-45.45 %)
Wait states: 0 -> 0 (0.00 %)

This causes LLVM to use far more VGPRs in a couple of shaders, halving their max_waves and slightly reducing performance in F1 2017 and Dota 2.

EDIT: I don't think I made this clear: the array was large enough to be lowered to scratch. Since glslang creates SPIR-V that initializes the array in each loop iteration, there are a large number of scratch stores per iteration.
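
A C analogue of the pattern may help. This is an illustrative sketch, not the X4 Foundations shader: the function names and the tiny array size are assumptions, and the real array was much larger.

```c
#define N 4 /* illustrative size; the real array was far larger */

/* Rolled form: tmp[] is re-initialized on every iteration and then
 * read at the dynamic index i. On a GPU, a large array indexed this
 * way has to live in indexable memory (scratch), so each iteration
 * pays for all of the initializing stores. */
static float sum_rolled(const float in[N])
{
    float sum = 0.0f;
    for (int i = 0; i < N; i++) {
        float tmp[N] = { in[0], in[1], in[2], in[3] };
        sum += tmp[i];
    }
    return sum;
}

/* Unrolled form: every index is a compile-time constant, so the
 * temporary array and its stores can be eliminated entirely. */
static float sum_unrolled(const float in[N])
{
    return in[0] + in[1] + in[2] + in[3];
}
```

Both functions compute the same result. The difference is that only the unrolled form lets the compiler replace the array with plain registers.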

Edited Aug 28, 2019 by Rhys Perry
Source branch: radv_force_unroll