Shader compilation memory leaks
Memory leaks are observed with a simple Vulkan test that uses the Intel blorp framework. Example:
Indirect leak of 800 byte(s) in 2 object(s) allocated from:
#0 0x4a040d in malloc (/home/cstout/vulkan-tests/wait-and-signal-same-sem/vkcopy+0x4a040d)
#1 0x7f51a6590e7c in ralloc_size /home/cstout/mesa/build/../src/util/ralloc.c:133:18
#2 0x7f51a6590f05 in rzalloc_size /home/cstout/mesa/build/../src/util/ralloc.c:166:16
#3 0x7f51a6750006 in nir_alu_instr_create /home/cstout/mesa/build/../src/compiler/nir/nir.c:521:7
#4 0x7f51a653426b in brw_nir_opt_peephole_ffma_block /home/cstout/mesa/build/../src/intel/compiler/brw_nir_opt_peephole_ffma.c:238:29
#5 0x7f51a653426b in brw_nir_opt_peephole_ffma_impl /home/cstout/mesa/build/../src/intel/compiler/brw_nir_opt_peephole_ffma.c:277:19
#6 0x7f51a653426b in brw_nir_opt_peephole_ffma /home/cstout/mesa/build/../src/intel/compiler/brw_nir_opt_peephole_ffma.c:297:22
#7 0x7f51a65174b7 in brw_postprocess_nir /home/cstout/mesa/build/../src/intel/compiler/brw_nir.c:1140:7
#8 0x7f51a64c7800 in brw_compile_fs /home/cstout/mesa/build/../src/intel/compiler/brw_fs.cpp:9235:4
#9 0x7f51a65b52d5 in blorp_compile_fs /home/cstout/mesa/build/../src/intel/blorp/blorp.c:231:11
#10 0x7f51a65c2bed in brw_blorp_get_blit_kernel /home/cstout/mesa/build/../src/intel/blorp/blorp_blit.c:1502:14
#11 0x7f51a65c518e in try_blorp_blit /home/cstout/mesa/build/../src/intel/blorp/blorp_blit.c:2115:9
#12 0x7f51a65c518e in do_blorp_blit /home/cstout/mesa/build/../src/intel/blorp/blorp_blit.c:2276:10
#13 0x7f51a65c69a3 in blorp_copy /home/cstout/mesa/build/../src/intel/blorp/blorp_blit.c:2772:4
#14 0x7f51a65c6ea9 in do_buffer_copy /home/cstout/mesa/build/../src/intel/blorp/blorp_blit.c:2845:4
#15 0x7f51a65c70a2 in blorp_buffer_copy /home/cstout/mesa/build/../src/intel/blorp/blorp_blit.c:2883:7
#16 0x7f51a633dbf1 in copy_buffer /home/cstout/mesa/build/../src/intel/vulkan/anv_blorp.c:829:4
#17 0x7f51a633dbf1 in anv_CmdCopyBuffer2KHR /home/cstout/mesa/build/../src/intel/vulkan/anv_blorp.c:844:7
#18 0x7f51a648fa20 in vk_common_CmdCopyBuffer /home/cstout/mesa/build/../src/vulkan/util/vk_cmd_copy.c:67:4
#19 0x4ee7f7 in vk::DispatchLoaderStatic::vkCmdCopyBuffer(VkCommandBuffer_T*, VkBuffer_T*, VkBuffer_T*, unsigned int, VkBufferCopy const*) const (/home/cstout/vulkan-tests/wait-and-signal-same-sem/vkcopy+0x4ee7f7)
#20 0x4d96ff in VkCopyTest::InitBuffers(unsigned int) (/home/cstout/vulkan-tests/wait-and-signal-same-sem/vkcopy+0x4d96ff)
#21 0x4d3675 in VkCopyTest::Initialize() (/home/cstout/vulkan-tests/wait-and-signal-same-sem/vkcopy+0x4d3675)
#22 0x4dc89b in main (/home/cstout/vulkan-tests/wait-and-signal-same-sem/vkcopy+0x4dc89b)
#23 0x7f51aa32bcb1 in __libc_start_main csu/../csu/libc-start.c:314:16
Looking through recent history, I see this change, which seems suspect:
commit 5f992802f5130352e903218cf3541e429b87cae2
Author: Eric Anholt <eric@anholt.net>
Date: Mon Oct 26 11:28:33 2020 -0700
nir/builder: Drop the mem_ctx arg from nir_builder_init_simple_shader().
So I tried the following workaround and it removes the leak:
diff --git a/src/intel/blorp/blorp_blit.c b/src/intel/blorp/blorp_blit.c
index 281803a190b..78300a749c5 100644
--- a/src/intel/blorp/blorp_blit.c
+++ b/src/intel/blorp/blorp_blit.c
@@ -1501,6 +1501,8 @@ brw_blorp_get_blit_kernel(struct blorp_batch *batch,
program = blorp_compile_fs(blorp, mem_ctx, nir, &wm_key, false,
&prog_data);
+ ralloc_adopt(mem_ctx, nir);
+ ralloc_steal(mem_ctx, nir);
bool result =
blorp->upload_shader(batch, MESA_SHADER_FRAGMENT,
It seems that calling ralloc_adopt early only moves the children that exist at that point; children the shader allocates later are not parented under mem_ctx.
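To illustrate the parenting behaviour described above, here is a minimal sketch of a toy hierarchical allocator. It is not Mesa's ralloc: the names toy_alloc, toy_steal, toy_adopt and toy_free are hypothetical and only model the documented idea that ralloc_adopt reparents the children a context has at the time of the call, while ralloc_steal reparents the context itself. My reading, which is an assumption, is that the steal is the part of the workaround that catches allocations made after an early adopt.

```c
#include <stdlib.h>

/* Toy model of a parent/child allocation tree. NOT Mesa's ralloc. */
struct node {
   struct node *parent;
   struct node *first_child;
   struct node *next_sibling;
};

/* Remove n from its parent's child list, if it has a parent. */
static void unlink_node(struct node *n)
{
   if (!n->parent)
      return;
   struct node **p = &n->parent->first_child;
   while (*p != n)
      p = &(*p)->next_sibling;
   *p = n->next_sibling;
   n->parent = NULL;
   n->next_sibling = NULL;
}

/* Allocate a node under parent (parent may be NULL for a root). */
struct node *toy_alloc(struct node *parent)
{
   struct node *n = calloc(1, sizeof(*n));
   if (!n)
      abort();
   n->parent = parent;
   if (parent) {
      n->next_sibling = parent->first_child;
      parent->first_child = n;
   }
   return n;
}

/* Models ralloc_steal: reparent n itself (and thus its whole subtree). */
void toy_steal(struct node *new_parent, struct node *n)
{
   unlink_node(n);
   n->parent = new_parent;
   n->next_sibling = new_parent->first_child;
   new_parent->first_child = n;
}

/* Models ralloc_adopt: move the children old_ctx has right now. */
void toy_adopt(struct node *new_ctx, struct node *old_ctx)
{
   while (old_ctx->first_child)
      toy_steal(new_ctx, old_ctx->first_child);
}

/* Free n and everything still parented under it. */
void toy_free(struct node *n)
{
   unlink_node(n);
   while (n->first_child)
      toy_free(n->first_child);
   free(n);
}

/* Returns 1 if a child allocated after the adopt stays under the shader. */
int late_child_stays_with_shader(void)
{
   struct node *mem_ctx = toy_alloc(NULL);
   struct node *shader = toy_alloc(NULL);  /* unparented, like the shader */
   struct node *early = toy_alloc(shader);
   toy_adopt(mem_ctx, shader);             /* moves only 'early' */
   struct node *late = toy_alloc(shader);  /* e.g. created by a later pass */
   int ok = early->parent == mem_ctx && late->parent == shader;
   toy_free(mem_ctx);                      /* frees 'early', not 'late' */
   toy_free(shader);                       /* skipping this would leak 'late' */
   return ok;
}

/* Returns 1 if stealing the shader puts the late child under mem_ctx. */
int late_child_reachable_after_steal(void)
{
   struct node *mem_ctx = toy_alloc(NULL);
   struct node *shader = toy_alloc(NULL);
   struct node *late = toy_alloc(shader);
   toy_steal(mem_ctx, shader);             /* shader is now a child of mem_ctx */
   int ok = late->parent == shader && shader->parent == mem_ctx;
   toy_free(mem_ctx);                      /* frees shader and 'late' too */
   return ok;
}
```

In this model, freeing mem_ctx after an early adopt leaves the late child alive unless the shader itself is freed or stolen, which matches the shape of the leak reported above.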
Here is a test that demonstrates the issue, verified today against ToT Mesa with -fno-omit-frame-pointer added:
https://github.com/cdotstout/vulkan-tests/tree/main/wait-and-signal-same-sem
Apply the following patch and build with -fsanitize=address:
diff --git a/wait-and-signal-same-sem/vkcopy.cc b/wait-and-signal-same-sem/vkcopy.cc
index 75d6209..90b6e32 100644
--- a/wait-and-signal-same-sem/vkcopy.cc
+++ b/wait-and-signal-same-sem/vkcopy.cc
@@ -85,7 +85,7 @@ bool VkCopyTest::Initialize() {
return false;
}
- ctx_ = VulkanContext::Builder{}.set_validation_layers_enabled(true).Unique();
+ ctx_ = VulkanContext::Builder{}.set_validation_layers_enabled(false).Unique();
if (!ctx_) {
RTN_MSG(false, "Failed to initialize Vulkan.\n");
@@ -387,7 +387,7 @@ int main() {
fflush(stdout);
for (uint32_t iter = 0; iter < kIterations; iter++) {
- if (!app.Exec(iter > 0)) {
+ if (!app.Exec(false)) {
RTN_MSG(EXIT_FAILURE, "Exec failed.\n");
}
}
Full sanitizer results are attached: mesa-leaks.txt