Commits · 11.2-branchpoint · Jordan Williams / mesa

Feb 22, 2016

nouveau: update the Makefile.sources list · 4cd5e5b4

Emil Velikov authored 9 years ago


Reflect the nv50->g80 change and the new gm107_texture header.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

4cd5e5b4

Feb 21, 2016

radeonsi: implement binary shaders & shader cache in memory (v2) · ff360a52

Marek Olšák authored 9 years ago


v2: handle _mesa_hash_table_insert failure
    other cosmetic changes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

ff360a52

gallium/radeon: remove unused radeon_shader_binary_free_* functions · 1132910e
Marek Olšák authored 9 years ago
```
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
```
1132910e

radeonsi: make radeon_shader_reloc name string fixed-sized · 50ac2612

Marek Olšák authored 9 years ago


This will simplify implementations of binary shaders.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

50ac2612

radeonsi: move some struct si_shader members to new struct si_shader_info · 1fe73d55
Marek Olšák authored 9 years ago
```
This will be part of shader binaries.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
```
1fe73d55

radeonsi: use smaller types for some si_shader members · 10fa269f

Marek Olšák authored 9 years ago


in order to decrease the shader size for a shader cache.

v2: add & use SI_MAX_VS_OUTPUTS

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

10fa269f

radeonsi: enable compiling one variant per shader · 9aaf28da

Marek Olšák authored 9 years ago


Shader stats from VERDE:

Default scheduler:

Totals:
SGPRS: 491272 -> 488672 (-0.53 %)
VGPRS: 289980 -> 311093 (7.28 %)
Code Size: 11091656 -> 11219948 (1.16 %) bytes
LDS: 97 -> 97 (0.00 %) blocks
Scratch: 1732608 -> 2246656 (29.67 %) bytes per wave
Max Waves: 78063 -> 77352 (-0.91 %)
Wait states: 0 -> 0 (0.00 %)

Looking at some of the worst regressions, I get:
- The VGPR increase seems to be caused by the fact that if PS has used less
  than 16 VGPRs, now it will always use 16 VGPRs and sometimes even 20.
  However, the wave count remains at 10 if VGPRs <= 24, so no harm there.
- The scratch increase seems to be caused by SGPR spilling.
  The unnecessary SGPR spilling has been an ongoing issue with the compiler
  and it's completely fixable by rematerializing s_loads or reordering
  instructions.

SI scheduler:

Totals:
SGPRS: 374848 -> 374576 (-0.07 %)
VGPRS: 284456 -> 307515 (8.11 %)
Code Size: 11433068 -> 11535452 (0.90 %) bytes
LDS: 97 -> 97 (0.00 %) blocks
Scratch: 509952 -> 522240 (2.41 %) bytes per wave
Max Waves: 79456 -> 78217 (-1.56 %)
Wait states: 0 -> 0 (0.00 %)

VGPRs - same story as before. The SI scheduler doesn't spill SGPRs so much
and generally spills way less than the default scheduler.
(522240 vs 2246656 scratch bytes per wave)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

9aaf28da
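
The percentage columns in the totals above follow directly from the before/after values. The following is a minimal sketch, not taken from the commit or from any Mesa tooling: `pct_change` and `max_waves` are hypothetical helper names, and the wave model (256 VGPRs per SIMD lane, allocation in multiples of 4, a cap of 10 waves) is an assumption about SI-class hardware that the commit message does not state.

```python
def pct_change(before: int, after: int) -> float:
    """Relative change from before to after, in percent, rounded to 2 places."""
    return round((after - before) / before * 100.0, 2)


def max_waves(vgprs: int, vgpr_budget: int = 256, granule: int = 4, cap: int = 10) -> int:
    """Waves per SIMD under the assumed model described in the lead-in."""
    # Round the VGPR count up to the assumed allocation granularity.
    allocated = max(granule, -(-vgprs // granule) * granule)
    return min(cap, vgpr_budget // allocated)


# Totals quoted in the commit message (default scheduler).
print(pct_change(491272, 488672))    # SGPRS
print(pct_change(289980, 311093))    # VGPRS
print(pct_change(1732608, 2246656))  # Scratch

# Under the assumed model, 16, 20 and 24 VGPRs all stay at the cap,
# which would be consistent with the "no harm" remark in the message.
print([max_waves(v) for v in (16, 20, 24, 28)])
```

If the assumed model holds, the wave count only drops once a shader needs more than 24 VGPRs, so raising small pixel shaders to 16 or 20 VGPRs would not reduce occupancy.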

radeonsi: print full shader name before disassembly · 754cf171
Marek Olšák authored 9 years ago
```
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
```
754cf171

radeonsi: compile non-GS middle parts of shaders immediately if enabled · 3c98e0b3

Marek Olšák authored 9 years ago


Still disabled.

Only prologs & epilogs are compiled in draw calls, but each variant of those
is compiled only once per process.

VS is always compiled as hw VS.
TES is always compiled as hw VS.

LS and ES stages are always compiled on demand.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

3c98e0b3

radeonsi: rework polygon stippling for PS prolog · e038f8fd
Marek Olšák authored 9 years ago
```
Don't use the pstipple module.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
```
e038f8fd

radeonsi: add PS prolog · 4636d9be
Marek Olšák authored 9 years ago
```
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
```
4636d9be

radeonsi: add PS epilog · e79bb746
Marek Olšák authored 9 years ago
```
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
```
e79bb746

radeonsi: add TCS epilog · eb10919b
Marek Olšák authored 9 years ago
```
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
```
eb10919b

radeonsi: add VS epilog · e1b21696

Marek Olšák authored 9 years ago


It only exports the primitive ID.
Also used by TES when it's compiled as VS.

The VS input location of the primitive ID input is v2.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

e1b21696

radeonsi: add VS prolog · 70de433d

Marek Olšák authored 9 years ago


This is disabled with use_monolithic_shaders = true.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

70de433d

radeonsi: first bits for non-monolithic shaders · 19a92886
Marek Olšák authored 9 years ago
```
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
```
19a92886

radeonsi: add code for dumping all shader parts together (v2) · 0303886b
Marek Olšák authored 9 years ago
```
v2: unify some code into si_get_shader_binary_size

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
```
0303886b

radeonsi: add code for combining and uploading shaders from 3 shader parts · 17eb99d8
Marek Olšák authored 9 years ago
```
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
```
17eb99d8

radeonsi: fail compilation if non-GS non-CS shaders have rodata · 9d5bf1a3
Marek Olšák authored 9 years ago
```
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
```
9d5bf1a3

radeonsi: separate 2 pieces of code from create_function · 09408764
Marek Olšák authored 9 years ago
```
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
```
09408764

radeonsi: add samplemask parameter to si_export_mrt_color · 29275922
Marek Olšák authored 9 years ago
```
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
```
29275922

radeonsi: add start_instance parameter to get_instance_index_for_fetch · e6aea08b
Marek Olšák authored 9 years ago
```
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
```
e6aea08b

radeonsi: separate out shader key bits for prologs & epilogs · dc274561
Marek Olšák authored 9 years ago
```
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
```
dc274561

radeonsi: compute how many input VGPRs fragment shaders have · d995d483
Marek Olšák authored 9 years ago
```
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
```
d995d483

radeonsi: compute how many input SGPRs and VGPRs shaders have · fe1b6ede

Marek Olšák authored 9 years ago

Prologs (shader binaries inserted before the API shader binary) need to
know this, so that they won't change the input registers unintentionally.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

fe1b6ede

gallium/radeon: add basic code for setting shader return values · 36202182
Marek Olšák authored 9 years ago
```
LLVMBuildInsertValue will be used on return_value.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
```
36202182

nvc0: enable compute shaders on Fermi · 3c9ed201

Samuel Pitoiset authored 9 years ago


Kepler compute support is really different than Fermi and it's not
ready yet.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

3c9ed201

nv50/ir: add atomics support on shared memory for Fermi · 14a810e9

Samuel Pitoiset authored 9 years ago


Changes from v3:
 - move the previous OP_SELP change to the previous commit

Changes from v2:
 - make sure the op is OP_SELP when emitting the predicate and add one
   assert
 - use bld.getSSA() for mkOp2()
 - add cross edge between tryLockAndSetBB and joinBB

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>

14a810e9

nv50/ir: make OP_SELP a compare instruction · e0371e63

Samuel Pitoiset authored 9 years ago


This OP_SELP insn will be used to handle compare and swap subops.

Changes from v2:
 - fix logic for GK110+

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

e0371e63

nv50/ir: add lock/unlock subops for load/store · 0c930557

Samuel Pitoiset authored 9 years ago


Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

0c930557

nv50/ir: use s[] addr space for shared buffers · 45e85e16

Samuel Pitoiset authored 9 years ago


Shared memory address space (FILE_MEMORY_SHARED) must be used instead
of global memory when a shared memory area is declared.

Changes from v2:
 - oops, do not remove TGSI_FILE_BUFFER in a switch in
   nv50_ir_from_tgsi.cpp

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

45e85e16

nvc0: reduce likelihood of collision for real buffers on Fermi · 80fc67fb

Samuel Pitoiset authored 9 years ago


Reduce likelihood of collision with real buffers by placing the
hole at the top of the 4G area. This fixes some indirect draw+compute
tests with large buffers.

Suggested by Ilia Mirkin.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

80fc67fb

nvc0: invalidate compute state when switching pipe contexts · 807901b6

Samuel Pitoiset authored 9 years ago


Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

807901b6

nvc0: add support for indirect compute on Fermi · c6293877

Samuel Pitoiset authored 9 years ago


When indirect compute is used, the size of the grid (in blocks) is
stored as three integers inside a buffer. This requires a macro to
set up GRIDDIM_YX and GRIDDIM_Z.

Changes from v2:
 - do not launch the grid if the number of groups for a dimension is 0

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

c6293877
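
The indirect path described above (three integers in a buffer, turned into GRIDDIM_YX and GRIDDIM_Z, with no launch when a dimension is 0) can be illustrated on the CPU side. This is only a sketch under stated assumptions: in the driver the work is done by a macro on the GPU, and `read_indirect_grid` and `griddim_words` are hypothetical names. The little-endian 32-bit layout and the `(y << 16) | x` packing are also assumptions for illustration, with x and y taken to fit in 16 bits.

```python
import struct


def read_indirect_grid(buf: bytes, offset: int = 0):
    """Read the (x, y, z) group counts stored as three 32-bit integers."""
    # Assumption: little-endian unsigned 32-bit values, tightly packed.
    return struct.unpack_from("<3I", buf, offset)


def griddim_words(x: int, y: int, z: int):
    """Return (GRIDDIM_YX, GRIDDIM_Z), or None when the grid must not launch.

    The (y << 16) | x packing is an assumption made for this sketch.
    """
    if x == 0 or y == 0 or z == 0:
        return None  # mirrors the v2 change: no launch if a dimension is 0
    return ((y << 16) | x, z)


# Example: an indirect buffer describing an 8 x 4 x 1 grid of blocks.
buf = struct.pack("<3I", 8, 4, 1)
print(griddim_words(*read_indirect_grid(buf)))
```

Returning `None` for an empty grid keeps the zero-size check in one place, so a caller can skip the launch without inspecting each dimension itself.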

nvc0: bind textures/samplers for compute on Fermi · fa7333a7

Samuel Pitoiset authored 9 years ago


Textures and samplers don't seem to be aliased between COMPUTE and 3D.

Changes from v2:
 - refactor the code to share (almost) the same logic between 3d and
   compute

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

fa7333a7

nvc0: bind shader buffers for compute on Fermi · 917a5ff6

Samuel Pitoiset authored 9 years ago


This is loosely based on 3D. Shader buffers are bound on c15 (the
driver constbuf) at offset 0x200.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

917a5ff6

nvc0: bind driver constbuf for compute on Fermi · a9b70a86

Samuel Pitoiset authored 9 years ago


Changes from v3:
 - add new validation state for COMPUTE driver constbuf

Changes from v2:
 - always bind the driver consts even if user params come in via clover

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

a9b70a86

nvc0: add a new validation state for 3D driver constbuf · 52765262

Samuel Pitoiset authored 9 years ago


This will be used to invalidate 3D driver constbuf when using COMPUTE
and vice-versa. This is needed because this CB contains a bunch of
useful information like the addrs of shader buffers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

52765262

nvc0: bind constant buffers for compute on Fermi · 57d42510

Samuel Pitoiset authored 9 years ago


Loosely based on 3D.

Changes from v3:
 - invalidate COMPUTE CBs after validating 3D CBs because they are
   aliased

Changes from v2:
 - get rid of the 's' param to nvc0_cb_bo_push() because it doesn't
   matter to upload constbufs for compute using the 3d chan

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

57d42510

nvc0: allocate an area for compute user constbufs · 53f92bb7

Samuel Pitoiset authored 9 years ago


For compute shaders, we might need to upload uniforms.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

53f92bb7
