
vulkan/STACK_ARRAY(): save some space and use the stack more often

Paulo Zanoni requested to merge pzanoni/mesa:stack-array-save-space into main

Two extra commits to STACK_ARRAY:

  • The first one stops wasting stack memory: we no longer define an array we won't use at all (when we go with malloc()) or an array with more elements than we need.
  • The second one changes the threshold from a number of elements to a number of bytes, allowing us to make saner decisions between the stack and the heap.
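
The two changes can be sketched together as follows. This is a minimal, self-contained approximation, not the code from the commits: the 1024-byte value of `STACK_ARRAY_SIZE_B`, the body of `STACK_ARRAY_FINISH`, and the `sum_of_squares` helper are assumptions made for illustration.

```c
#include <alloca.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed threshold in bytes; the value used by the real commits may differ. */
#define STACK_ARRAY_SIZE_B 1024

/* Decide by byte size, then allocate exactly what is needed: alloca() for
 * small requests, malloc() otherwise. No fixed-size array is declared. */
#define STACK_ARRAY(type, name, size) \
   const size_t _##name##_alloc_size = (size) * sizeof(type); \
   const bool _##name##_stack_array_on_stack = \
      _##name##_alloc_size <= STACK_ARRAY_SIZE_B; \
   type *const name = _##name##_stack_array_on_stack ? \
                        (type *)alloca(_##name##_alloc_size) : \
                        (type *)malloc(_##name##_alloc_size)

/* Only heap allocations need to be released (assumed implementation). */
#define STACK_ARRAY_FINISH(name) \
   do { if (!_##name##_stack_array_on_stack) free(name); } while (0)

/* Hypothetical caller: returns 0^2 + 1^2 + ... + (count-1)^2 and reports
 * through *used_stack which allocation path was taken. */
static uint64_t
sum_of_squares(size_t count, bool *used_stack)
{
   STACK_ARRAY(uint64_t, values, count);
   *used_stack = _values_stack_array_on_stack;
   if (values == NULL)
      return 0; /* malloc() failed */

   uint64_t sum = 0;
   for (size_t i = 0; i < count; i++)
      values[i] = (uint64_t)i * i;
   for (size_t i = 0; i < count; i++)
      sum += values[i];

   STACK_ARRAY_FINISH(values);
   return sum;
}
```

With these assumed values, a call with 4 elements needs 32 bytes and stays on the stack, while a call with 1000 elements needs 8000 bytes and falls back to malloc().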

The motivation is that we use STACK_ARRAY a lot, and even more on Anv when using TR-TT. Here's a little breakdown from running a small trace of Assassin's Creed: Valhalla with the following patch:

diff --git a/src/vulkan/util/vk_util.h b/src/vulkan/util/vk_util.h
index 26978dbe1e8..fbd57164aa3 100644
--- a/src/vulkan/util/vk_util.h
+++ b/src/vulkan/util/vk_util.h
@@ -363,6 +363,9 @@ vk_spec_info_to_nir_spirv(const VkSpecializationInfo *spec_info,
    const size_t _##name##_alloc_size = (size) * sizeof(type); \
    const bool _##name##_stack_array_on_stack = \
       _##name##_alloc_size <= STACK_ARRAY_SIZE_B; \
+   fprintf(stderr, "=== alloc_size:%8llu elements:%4llu (%s)\n", \
+           (long long unsigned) _##name##_alloc_size, \
+           (long long unsigned) (size), __func__); \
    type *const name = _##name##_stack_array_on_stack ? \
                         (type *)alloca(_##name##_alloc_size) : \
                         (type *)malloc(_##name##_alloc_size)

The results being:

$ cat traci.log | grep === | grep -v trtt | sort | uniq -c | sort -n | tail -n 20
     20 === alloc_size:      40 elements:   1 (vk_common_GetPhysicalDeviceQueueFamilyProperties)
   1241 === alloc_size:       0 elements:   0 (vk_common_QueueSubmit)
   1241 === alloc_size:      24 elements:   1 (vk_queue_wait_before_present)
   1241 === alloc_size:       4 elements:   1 (wsi_common_queue_present)
   1241 === alloc_size:      64 elements:   1 (vk_common_QueueSubmit)
   2482 === alloc_size:      24 elements:   1 (vk_common_QueueSubmit)
   2482 === alloc_size:      48 elements:   1 (vk_common_QueueSubmit)
   3291 === alloc_size:      80 elements:   2 (vk_common_GetPhysicalDeviceSparseImageFormatProperties)
   3718 === alloc_size:      24 elements:   1 (vk_common_WaitForFences)
   6582 === alloc_size:      64 elements:   1 (vk_common_GetImageSparseMemoryRequirements)
  20108 === alloc_size:      48 elements:   1 (vk_common_QueueBindSparse)
  41310 === alloc_size:      24 elements:   1 (vk_common_WaitSemaphores)
  79878 === alloc_size:       4 elements:   1 (vk_drm_syncobj_wait_many)
  79878 === alloc_size:       8 elements:   1 (vk_drm_syncobj_wait_many)
$ cat traci.log | grep === | grep trtt | sort | uniq -c | sort -n | tail -n 20
    441 === alloc_size:     768 elements:  48 (anv_sparse_bind_trtt)
    473 === alloc_size:     512 elements: 128 (gfx12_write_trtt_entries)
    480 === alloc_size:      96 elements:   6 (anv_sparse_bind_trtt)
    522 === alloc_size:     640 elements: 160 (gfx12_write_trtt_entries)
    524 === alloc_size:     128 elements:   8 (anv_sparse_bind_trtt)
    569 === alloc_size:     320 elements:  80 (gfx12_write_trtt_entries)
    654 === alloc_size:     112 elements:   7 (anv_sparse_bind_trtt)
    859 === alloc_size:     144 elements:   9 (anv_sparse_bind_trtt)
   1485 === alloc_size:    1664 elements: 416 (gfx12_write_trtt_entries)
   1493 === alloc_size:     896 elements: 224 (gfx12_write_trtt_entries)
   1510 === alloc_size:     192 elements:  12 (anv_sparse_bind_trtt)
   1517 === alloc_size:     384 elements:  24 (anv_sparse_bind_trtt)
   1887 === alloc_size:     384 elements:  96 (gfx12_write_trtt_entries)
   1894 === alloc_size:      64 elements:   4 (anv_sparse_bind_trtt)
   3190 === alloc_size:     256 elements:  64 (gfx12_write_trtt_entries)
   3204 === alloc_size:      32 elements:   2 (anv_sparse_bind_trtt)
   3217 === alloc_size:      80 elements:   5 (anv_sparse_bind_trtt)
   3736 === alloc_size:     192 elements:  48 (gfx12_write_trtt_entries)
   3756 === alloc_size:      16 elements:   1 (anv_sparse_bind_trtt)
  10470 === alloc_size:      48 elements:   3 (anv_sparse_bind_trtt)
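
To illustrate why a byte threshold changes the outcome for these calls, the sketch below compares an element-count policy with a byte-count policy on a few rows from the log. Both threshold values are assumptions for illustration (8 elements and 1024 bytes), and the helper names are hypothetical; the element counts and sizes are copied from the output above.

```c
#include <stdbool.h>
#include <stddef.h>

#define OLD_MAX_ELEMENTS 8    /* assumed element-count threshold */
#define NEW_MAX_BYTES    1024 /* assumed byte-count threshold */

/* Element-count policy: the element size is ignored entirely. */
static bool
fits_old_policy(size_t elements)
{
   return elements <= OLD_MAX_ELEMENTS;
}

/* Byte-count policy: what matters is the total allocation size. */
static bool
fits_new_policy(size_t elements, size_t elem_size)
{
   return elements * elem_size <= NEW_MAX_BYTES;
}
```

Under these assumptions, the 48-element, 192-byte request from gfx12_write_trtt_entries would go to the heap under the element-count policy but fit on the stack under the byte-count policy, while the 416-element, 1664-byte request would use the heap either way.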
