Mesa 22.3.0 SEGFAULT in nir shader creation for r600 cards on FreeBSD
GPU: Several radeon r600 cards, e.g. FirePro V4800
Mesa version: 22.3.0
Xserver version: 21.1.4
Compiled with base compiler, clang 13.0.0 / clang 15.0.0
LLVM lib version 15.0.6
Description
We have several cases of Xorg and Wayland crashing immediately at startup after an update from Mesa 22.2.3 to 22.3.0, see the FreeBSD bug report.
So far this only affects Radeon r600-based graphics cards. Intel graphics and the newer AMD cards work as before.
The Xorg log does not contain anything useful apart from SEGFAULT.
Regression
All these setups used to work with Mesa 22.2.3, and crash immediately with Mesa 22.3.0.
As of commit 73db82c8, the problem still remains.
I did a bisect of the changes and for me the first commit to crash was 7662a5e9.
There is also an earlier commit which breaks rendering (black screen), dfbb4b38. That may be an unrelated issue though.
You're right, I haven't gone deep enough to see the raw pointer.
This is probably the most interesting part of the log:
==1994== Use of uninitialised value of size 8
==1994==    at 0x7891544: bool std::__1::__tree_is_left_child<std::__1::__tree_node_base<void*>*>(std::__1::__tree_node_base<void*>*) (__tree:83)
==1994==    by 0x78914C4: std::__1::__tree_end_node<std::__1::__tree_node_base<void*>*>* std::__1::__tree_next_iter<std::__1::__tree_end_node<std::__1::__tree_node_base<void*>*>*, std::__1::__tree_node_base<void*>*>(std::__1::__tree_node_base<void*>*) (__tree:186)
==1994==    by 0x78F8C55: std::__1::__tree_const_iterator<r600::Instr*, std::__1::__tree_node<r600::Instr*, void*>*, long>::operator++() (__tree:925)
==1994==    by 0x796ECFE: r600::CopyPropFwdVisitor::visit(r600::AluInstr*) (sfn_optimizer.cpp:363)
==1994==    by 0x7901D4C: r600::AluInstr::accept(r600::InstrVisitor&) (sfn_instr_alu.cpp:180)
==1994==    by 0x796EEB4: r600::CopyPropFwdVisitor::visit(r600::Block*) (sfn_optimizer.cpp:602)
==1994==    by 0x78F5F2D: r600::Block::accept(r600::InstrVisitor&) (sfn_instr.cpp:328)
==1994==    by 0x796CB92: r600::copy_propagation_fwd(r600::Shader&) (sfn_optimizer.cpp:304)
==1994==    by 0x796C81F: r600::optimize(r600::Shader&) (sfn_optimizer.cpp:59)
==1994==    by 0x78E6C99: r600_shader_from_nir (sfn_nir.cpp:988)
==1994==    by 0x77F9E31: r600_pipe_shader_create (r600_shader.c:231)
==1994==    by 0x7838ABA: r600_shader_select (r600_state_common.c:959)
==1994==
==1994== Invalid read of size 8
==1994==    at 0x7891544: bool std::__1::__tree_is_left_child<std::__1::__tree_node_base<void*>*>(std::__1::__tree_node_base<void*>*) (__tree:83)
==1994==    by 0x78914C4: std::__1::__tree_end_node<std::__1::__tree_node_base<void*>*>* std::__1::__tree_next_iter<std::__1::__tree_end_node<std::__1::__tree_node_base<void*>*>*, std::__1::__tree_node_base<void*>*>(std::__1::__tree_node_base<void*>*) (__tree:186)
==1994==    by 0x78F8C55: std::__1::__tree_const_iterator<r600::Instr*, std::__1::__tree_node<r600::Instr*, void*>*, long>::operator++() (__tree:925)
==1994==    by 0x796ECFE: r600::CopyPropFwdVisitor::visit(r600::AluInstr*) (sfn_optimizer.cpp:363)
==1994==    by 0x7901D4C: r600::AluInstr::accept(r600::InstrVisitor&) (sfn_instr_alu.cpp:180)
==1994==    by 0x796EEB4: r600::CopyPropFwdVisitor::visit(r600::Block*) (sfn_optimizer.cpp:602)
==1994==    by 0x78F5F2D: r600::Block::accept(r600::InstrVisitor&) (sfn_instr.cpp:328)
==1994==    by 0x796CB92: r600::copy_propagation_fwd(r600::Shader&) (sfn_optimizer.cpp:304)
==1994==    by 0x796C81F: r600::optimize(r600::Shader&) (sfn_optimizer.cpp:59)
==1994==    by 0x78E6C99: r600_shader_from_nir (sfn_nir.cpp:988)
==1994==    by 0x77F9E31: r600_pipe_shader_create (r600_shader.c:231)
==1994==    by 0x7838ABA: r600_shader_select (r600_state_common.c:959)
==1994==  Address 0xffffffff is not stack'd, malloc'd or (recently) free'd
==1994==
==1994==
==1994== Process terminating with default action of signal 6 (SIGABRT): dumping core
==1994==    at 0x4B7133A: thr_kill (in /lib/libc.so.7)
==1994==    by 0x4AE9C73: raise (in /lib/libc.so.7)
==1994==    by 0x4B9B108: abort (in /lib/libc.so.7)
==1994==    by 0x45C68B: OsAbort (utils.c:1352)
==1994==    by 0x466D05: AbortServer (log.c:879)
==1994==    by 0x464826: FatalError (log.c:1017)
==1994==    by 0x458FA2: OsSigHandler (osinit.c:156)
==1994==    by 0x4A2258D: ??? (in /lib/libthr.so.3)
==1994==    by 0x4A21B3E: ??? (in /lib/libthr.so.3)
==1994==    by 0x3819C467: ??? (in /usr/local/libexec/valgrind/memcheck-amd64-freebsd)
==1994==    by 0x78914C4: std::__1::__tree_end_node<std::__1::__tree_node_base<void*>*>* std::__1::__tree_next_iter<std::__1::__tree_end_node<std::__1::__tree_node_base<void*>*>*, std::__1::__tree_node_base<void*>*>(std::__1::__tree_node_base<void*>*) (__tree:186)
==1994==    by 0x78F8C55: std::__1::__tree_const_iterator<r600::Instr*, std::__1::__tree_node<r600::Instr*, void*>*, long>::operator++() (__tree:925)
(It's christmas time so answers may be super slow.)
I agree, the uninitialized read is very interesting. dest->uses() is a std::set container, which is implemented as a red-black tree in LLVM libc++. It's a doubly linked tree (pointers for parent, left child, right child). It makes heavy use of pointer casting between base nodes and end nodes of different sizes, so it's no fun to debug.
From the core file it's clear that the set iterator trips over a malformed node in the red-black tree, which is probably the uninitialized read here.
The problem lies in the fact that I was using a range-based for loop over uses, but upon successful copy propagation this set was changed, which seems to invalidate the iterators when using libc++ (but not with libstdc++). !20394 (merged) implements a workaround.
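For readers unfamiliar with this pitfall, here is a minimal sketch of the general pattern. It is not the Mesa code: UseSet and remove_even_uses are hypothetical names, and a set of ints stands in for the set of instruction pointers. Per the C++ standard, std::set::erase invalidates only iterators to the erased element, and using an invalidated iterator is undefined behaviour, so it may appear to work with one standard library and crash with another. The sketch advances the iterator before the current element is erased.

```cpp
#include <set>

// Hypothetical stand-in for a register's use set; not the Mesa type.
using UseSet = std::set<int>;

// Erases every even value while walking the set and returns how many
// elements were removed.
inline int remove_even_uses(UseSet& uses) {
    int removed = 0;
    auto it = uses.begin();
    while (it != uses.end()) {
        auto current = it;
        ++it;                     // step first, while `current` is still valid
        if (*current % 2 == 0) {
            uses.erase(current);  // invalidates `current` only, not `it`
            ++removed;
        }
    }
    return removed;
}
```

Note that this pattern only protects against erasing the current element. If the loop body can remove other elements, including the one the pre-advanced iterator points at, iterating over a copy of the set is the more robust choice.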
Subtle nuances these iterator invalidations - I can confirm that this fixes the crash in my case. Others will be testing now. The black screen is also gone with Mesa 22.3.1.
src/gallium/drivers/r600/sfn/sfn_optimizer.cpp:378
373 auto ii = dest->uses().begin();
374 auto ie = dest->uses().end();
375
376 while(ii != ie) {
377 auto i = *ii;
378 ++ii;
Core was generated by `xonotic-sdl'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000078715b82d24 in std::__1::__tree_is_left_child<std::__1::__tree_node_base<void*>*> (__x=0x787ffffffff) at /usr/include/c++/v1/__tree:83
83 return __x == __x->__parent_->__left_;
[Current thread is 1 (process 195492)]
(gdb) p __x->__parent_
$1 = (std::__1::__tree_node_base<void*>::__parent_pointer) 0x0
(gdb) bt
#0  0x0000078715b82d24 in std::__1::__tree_is_left_child<std::__1::__tree_node_base<void*>*> (__x=0x787ffffffff) at /usr/include/c++/v1/__tree:83
#1  std::__1::__tree_next_iter<std::__1::__tree_end_node<std::__1::__tree_node_base<void*>*>*, std::__1::__tree_node_base<void*>*> (__x=0x787ffffffff) at /usr/include/c++/v1/__tree:186
#2  std::__1::__tree_const_iterator<r600::Instr*, std::__1::__tree_node<r600::Instr*, void*>*, long>::operator++ (this=<optimized out>) at /usr/include/c++/v1/__tree:925
#3  r600::CopyPropFwdVisitor::visit (this=0x72a715d874a8, instr=<optimized out>) at ../src/gallium/drivers/r600/sfn/sfn_optimizer.cpp:378
#4  0x0000078715b836c4 in r600::CopyPropFwdVisitor::visit (this=0x72a715d874a8, instr=<optimized out>) at ../src/gallium/drivers/r600/sfn/sfn_optimizer.cpp:631
#5  0x0000078715b81ae4 in r600::copy_propagation_fwd (shader=...) at ../src/gallium/drivers/r600/sfn/sfn_optimizer.cpp:304
#6  0x0000078715b8190c in r600::optimize (shader=...) at ../src/gallium/drivers/r600/sfn/sfn_optimizer.cpp:59
#7  0x0000078715b4592f in r600_shader_from_nir (rctx=0x787686c9000, pipeshader=0x787c4742000, key=0x72a715d87848) at ../src/gallium/drivers/r600/sfn/sfn_nir.cpp:999
#8  0x0000078715ab3400 in r600_pipe_shader_create (ctx=0x787686c9000, shader=0x787c4742000, key=...) at ../src/gallium/drivers/r600/r600_shader.c:231
#9  0x0000078715aed054 in r600_shader_select (ctx=0x787fa2987e0, sel=0x787dd352350, dirty=0x72a715d878ff, precompile=<optimized out>) at ../src/gallium/drivers/r600/r600_state_common.c:967
#10 0x0000078715af45a0 in r600_create_shader_state (ctx=0x787686c9000, state=<optimized out>, pipe_shader_type=<optimized out>) at ../src/gallium/drivers/r600/r600_state_common.c:1071
#11 0x000007871557d0cf in st_create_nir_shader (st=<optimized out>, state=0x72a715d879b8) at ../src/mesa/state_tracker/st_program.c:551
#12 0x000007871557ddb9 in st_create_fp_variant (st=0x787552e5000, fp=0x7877e1a4630, key=0x72a715d87d00) at ../src/mesa/state_tracker/st_program.c:1071
#13 st_get_fp_variant (st=0x787552e5000, fp=0x7877e1a4630, key=0x72a715d87d00) at ../src/mesa/state_tracker/st_program.c:1116
#14 0x000007871557e67d in st_precompile_shader_variant (st=0x787552e5000, prog=0x7877e1a4630) at ../src/mesa/state_tracker/st_program.c:1303
#15 st_finalize_program (st=0x787552e5000, prog=0x7877e1a4630) at ../src/mesa/state_tracker/st_program.c:1365
#16 0x000007871556ba01 in st_link_nir (ctx=0x787b4b3d000, shader_program=0x787d36d37b0) at ../src/mesa/state_tracker/st_glsl_to_nir.cpp:956
#17 0x000007871556a018 in link_shader (ctx=0x787b4b3d000, prog=0x787d36d37b0) at ../src/mesa/state_tracker/st_glsl_to_ir.cpp:91
#18 st_link_shader (ctx=0x787b4b3d000, prog=0x787d36d37b0) at ../src/mesa/state_tracker/st_glsl_to_ir.cpp:106
#19 0x000007871552d483 in _mesa_glsl_link_shader (ctx=0x787b4b3d000, prog=0x787d36d37b0) at ../src/mesa/program/link_program.cpp:91
#20 0x0000078715498f35 in link_program (shProg=0x787d36d37b0, no_error=<error reading variable: Cannot access memory at address 0x0>, ctx=<optimized out>) at ../src/mesa/main/shaderapi.c:1332
#21 link_program_error (ctx=0x787b4b3d000, shProg=0x787d36d37b0) at ../src/mesa/main/shaderapi.c:1443
I'd appreciate it if you would open a new bug and possibly link to this closed bug.
The graphics card reported by Edd Barrett is a Radeon HD 7470, code-named CAICOS (TeraScale 2). I've tested Xonotic on the Radeon HD 6850 (Barts, also TeraScale 2) on Linux, and I don't see the issue. It could be that Edd used different settings so that different shaders are used, or that it is indeed something that has to do with libc++ vs. libstdc++.
I can reproduce the crash with Mesa 23.1.9 but not with Mesa 22.3.7, on a Mobility Radeon HD 3650 (RV635). I'll continue to look at it, but it is difficult to bisect as I need various patches to get Mesa to build. If I make any progress I'll open another issue.
I've tested Mesa 24.0.0-devel (git-8e03c18914). I also tried to test on RV710, but there xonotic simply kills the whole system and results in an immediate reboot.