  • #5853

Closed
Open
Created Jan 11, 2022 by Ian Romanick@idrOwner

nir: Merge consecutive store_scratch

I have observed some shaders in shader-db that have code like:

void main ()
{
  movieTaps[0] = vec3(0.2165, 0.125, 1.0);
  // repeat with different constant vec3 for [1..58].
  movieTaps[59] = vec3(0.8668, 0.2513, 0.907);
  // ...
}

And NIR merrily generates the obvious, horrifying code. Worse, because these are vec3, each component is written with its own scalar store_scratch:

        vec1 32 ssa_0 = load_const (0x00000000 = 0.000000)
        vec1 32 ssa_1 = load_const (0x3f800000 = 1.000000)
        ...
        vec1 32 ssa_10 = load_const (0x3e5db22d = 0.216500)
        intrinsic store_scratch (ssa_10, ssa_0) (align_mul=256, align_offset=0, wrmask=x /*1*/)
        vec1 32 ssa_11 = load_const (0x3e000000 = 0.125000)
        vec1 32 ssa_12 = load_const (0x00000004 = 0.000000)
        intrinsic store_scratch (ssa_11, ssa_12) (align_mul=256, align_offset=0, wrmask=x /*1*/)
        vec1 32 ssa_13 = load_const (0x00000008 = 0.000000)
        intrinsic store_scratch (ssa_1, ssa_13) (align_mul=256, align_offset=0, wrmask=x /*1*/)

The Intel compiler then faithfully emits equally horrifying machine code for each of these stores. As a side note, I only noticed this because those writes get misidentified as 180 spills.
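The 180 figure follows from the shape of the data: 60 array elements × 3 components = 180 scalar stores. The minimal C sketch below models that store stream. `scalar_store_offsets` is a hypothetical helper, not part of NIR. It also assumes the vec3 elements are tightly packed (12-byte stride). The dump above only shows element 0 at offsets 0, 4 and 8, so the real element stride may differ.

```c
#include <assert.h>
#include <stddef.h>

#define COMPONENT_SIZE 4u  /* bytes per 32-bit float component */

/* Hypothetical model: record the byte offset of every scalar store_scratch
 * produced when num_elements vecN values are written one component at a time.
 * ASSUMPTION: elements are tightly packed, i.e. stride = num_components * 4
 * bytes. The NIR dump only shows element 0 (offsets 0, 4, 8).
 * Returns the number of stores recorded. */
static size_t scalar_store_offsets(unsigned num_elements, unsigned num_components,
                                   unsigned *offsets, size_t max_offsets)
{
    const unsigned stride = num_components * COMPONENT_SIZE;
    size_t n = 0;

    for (unsigned i = 0; i < num_elements; i++) {
        for (unsigned c = 0; c < num_components; c++) {
            assert(n < max_offsets);
            /* one store_scratch per component */
            offsets[n++] = i * stride + c * COMPONENT_SIZE;
        }
    }
    return n;
}
```

Calling it with 60 elements of 3 components returns 180 offsets, the same count as the spills reported. Under the packing assumption, the last store lands at byte offset 716.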

A pass already exists that merges UBO loads, as well as SSBO loads and stores, to consecutive locations. It should be extended to handle scratch loads and stores too.
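The core idea of such a merge can be sketched in a few lines of C. This is a minimal illustration, not the existing Mesa pass. The `struct store` type and `merge_consecutive_stores` function are hypothetical. The sketch assumes the input stores are scalar 32-bit writes, sorted by offset with no duplicates, and that runs are merged into at most a vec4.

```c
#include <assert.h>
#include <stddef.h>

#define COMPONENT_SIZE     4u  /* bytes per 32-bit component */
#define MAX_VEC_COMPONENTS 4u  /* ASSUMPTION: merge into at most a vec4 */

/* Hypothetical description of one scratch store: byte offset plus the
 * number of 32-bit components written starting at that offset. */
struct store {
    unsigned offset;
    unsigned num_components;
};

/* Coalesce scalar stores whose offsets are adjacent (each exactly one
 * component after the end of the current run) into wider stores.
 * ASSUMPTIONS: input is sorted by offset, every input store is scalar, and
 * nothing between the stores reads or writes the same memory. Alignment is
 * not checked. Returns the number of merged stores written to out[]. */
static size_t merge_consecutive_stores(const struct store *in, size_t n,
                                       struct store *out, size_t max_out)
{
    size_t m = 0;

    for (size_t i = 0; i < n; i++) {
        assert(in[i].num_components == 1);
        struct store *prev = m > 0 ? &out[m - 1] : NULL;

        if (prev != NULL &&
            prev->num_components < MAX_VEC_COMPONENTS &&
            in[i].offset == prev->offset + prev->num_components * COMPONENT_SIZE) {
            prev->num_components++;   /* extend the current run */
        } else {
            assert(m < max_out);
            out[m++] = in[i];         /* start a new run */
        }
    }
    return m;
}
```

If the 60 vec3 elements are tightly packed, the 180 scalar stores form one contiguous 720-byte run, so this sketch would collapse them into 45 vec4 stores. A real pass would also have to respect each store's align_mul/align_offset and check for intervening aliasing loads and stores; the sketch deliberately ignores both.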
