Segfault: missing file descriptor when using a TURN server
The following is based on current libnice master (804a4c09) used together with janus-gateway 0.10.16, and was previously reported there (see here).
When I enable the use of a TURN server in janus-gateway, I can reliably reproduce the following segfault in libnice:
(process:1074): libnice-DEBUG: 19:49:26.111: Agent 0x55982e113980: inbound STUN packet for 1/1 (stream/component) from [172.31.2.63]:64095 (96 octets) :
(process:1074): libnice-stun-DEBUG: 19:49:26.111: STUN demux: OK!
(process:1074): libnice-stun-DEBUG: 19:49:26.111: Comparing username/ufrag of len 9 and 4, equal=0
(process:1074): libnice-stun-DEBUG: 19:49:26.111: username: 0x61512f463a4a4d7866
(process:1074): libnice-stun-DEBUG: 19:49:26.111: ufrag: 0x61512f46
(process:1074): libnice-stun-DEBUG: 19:49:26.111: Found valid username, returning password: 'X....u'
(process:1074): libnice-stun-DEBUG: 19:49:26.111: Message HMAC-SHA1 fingerprint:
(process:1074): libnice-stun-DEBUG: 19:49:26.111: key : 0x5858....
(process:1074): libnice-stun-DEBUG: 19:49:26.111: expected: 0xc536955676f160a9802092a3a7bd858c3bb4d752
(process:1074): libnice-stun-DEBUG: 19:49:26.111: received: 0xc536955676f160a9802092a3a7bd858c3bb4d752
(process:1074): libnice-stun-DEBUG: 19:49:26.111: STUN auth: OK!
(process:1074): libnice-stun-DEBUG: 19:49:26.111: STUN unknown: 0 mandatory attribute(s)!
(process:1074): libnice-stun-DEBUG: 19:49:26.111: STUN Reply (buffer size = 1300)...
(process:1074): libnice-stun-DEBUG: 19:49:26.111: Message HMAC-SHA1 message integrity:
(process:1074): libnice-stun-DEBUG: 19:49:26.111: key : 0x5858.....
(process:1074): libnice-stun-DEBUG: 19:49:26.111: sent : 0x52ee7f8d3819393c8a906addba52fc31423222d8
(process:1074): libnice-stun-DEBUG: 19:49:26.111: Message HMAC-SHA1 fingerprint: 0x8796f9c3
(process:1074): libnice-stun-DEBUG: 19:49:26.111: All done (response size: 80)
(process:1074): libnice-DEBUG: 19:49:26.111: Agent 0x55982e113980 : STUN-CC RESP to '172.31.2.63:64095', socket=4294967295, len=80, cand=0x55982e180c90 (c-id:1), use-cand =0, transactionId=2112a44259352b2b646a4266584a7646
Thread 42 "hloop 429041650" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 1118]
0x00007f2da5e53561 in socket_send_message (to=0x7f2da3bca480, message=0x7f2da3bc97c0, reliable=0, sock=<optimized out>) at ../socket/udp-turn.c:785
785 socket_send_message (NiceSocket *sock, const NiceAddress *to,
(gdb) backtrace
#0 0x00007f2da5e53561 in socket_send_message (to=0x7f2da3bca480, message=0x7f2da3bc97c0, reliable=0, sock=<optimized out>) at ../socket/udp-turn.c:785
#1 0x00007f2da5e53df0 in socket_send_messages (sock=0x55982e3156e0, to=0x7f2da3bca480, messages=<optimized out>, n_messages=1) at ../socket/udp-turn.c:990
#2 0x00007f2da5e4d7be in nice_socket_send (sock=sock@entry=0x55982e3156e0, to=to@entry=0x7f2da3bca480, len=len@entry=80, buf=buf@entry=0x7f2da3bc9ea0 "\001\001") at ../socket/socket.c:226
#3 0x00007f2da5e37747 in agent_socket_send (sock=sock@entry=0x55982e3156e0, addr=addr@entry=0x7f2da3bca480, len=len@entry=80, buf=0x7f2da3bc9ea0 "\001\001") at ../agent/agent.c:7012
#4 0x00007f2da5e412ed in priv_reply_to_conn_check (use_candidate=<optimized out>, msg=0x7f2da3bc99f0, rbuf_len=80, sockptr=0x55982e3156e0, toaddr=0x7f2da3bca480, rcand=<optimized out>, lcand=0x55982e2920b0, component=0x55982e170640, stream=0x55982e1dd6c0, agent=0x55982e113980) at ../agent/conncheck.c:3244
#5 0x00007f2da5e412ed in conn_check_handle_inbound_stun (agent=agent@entry=0x55982e113980, stream=stream@entry=0x55982e1dd6c0, component=component@entry=0x55982e170640, nicesock=0x55982e3156e0, from=0x7f2da3bca480, buf=buf@entry=0x55982e3a47c0 "", len=96) at ../agent/conncheck.c:4785
#6 0x00007f2da5e35791 in agent_recv_message_unlocked (agent=agent@entry=0x55982e113980, stream=stream@entry=0x55982e1dd6c0, component=component@entry=0x55982e170640, nicesock=<optimized out>, message=message@entry=0x7f2da3bca540) at ../agent/agent.c:4430
#7 0x00007f2da5e35def in component_io_cb (gsocket=<optimized out>, condition=<optimized out>, user_data=0x55982e3626c0) at ../agent/agent.c:5753
#8 0x00007f2da5d162f4 in () at /usr/lib/libgio-2.0.so.0
#9 0x00007f2da5b88703 in g_main_context_dispatch () at /usr/lib/libglib-2.0.so.0
#10 0x00007f2da5b8896b in () at /usr/lib/libglib-2.0.so.0
#11 0x00007f2da5b88cb1 in g_main_loop_run () at /usr/lib/libglib-2.0.so.0
#12 0x000055982cea6786 in janus_ice_handle_thread (data=0x55982e202aa0) at ice.c:1165
#13 0x00007f2da5ba5df3 in () at /usr/lib/libglib-2.0.so.0
#14 0x00007f2da5ed471e in () at /lib/ld-musl-x86_64.so.1
#15 0x0000000000000000 in ()
(gdb)
The reported socket value (socket=4294967295) looks suspicious, as it equals 0xFFFFFFFF, i.e. -1 read as an unsigned 32-bit integer. I also found two STUN-CC REQ messages with this socket value:
(process:1074): libnice-stun-DEBUG: 19:49:26.042: STUN demux: OK!
(process:1074): libnice-stun-DEBUG: 19:49:26.042: STUN unknown: 0 mandatory attribute(s)!
(process:1074): libnice-stun-DEBUG: 19:49:26.042: STUN error message received (code: 401)
(process:1074): libnice-DEBUG: 19:49:26.042: Agent 0x55982e113340 : stun_turn_process/disc for 0x55982e464840 res 2.
(process:1074): libnice-DEBUG: 19:49:26.042: agent_recv_message_unlocked: Valid STUN packet received.
(process:1074): libnice-DEBUG: 19:49:26.046: Agent 0x55982e113340: inbound STUN packet for 1/1 (stream/component) from [<...>]:7869 (108 octets) :
(process:1074): libnice-stun-DEBUG: 19:49:26.046: STUN demux: OK!
(process:1074): libnice-DEBUG: 19:49:26.046: Agent 0x55982e113340 : Valid STUN response for which we don't have a request, ignoring
(process:1074): libnice-DEBUG: 19:49:26.046: agent_recv_message_unlocked: Valid STUN packet received.
(process:1074): libnice-DEBUG: 19:49:26.054: Agent 0x55982e113980 : pair 0x55982e1393a0 state IN_PROGRESS (priv_conn_check_initiate)
(process:1074): libnice-DEBUG: 19:49:26.054: Agent 0x55982e113980 : STUN-CC REQ [172.30.150.100]:52897 --> [172.31.2.63]:64095, socket=4294967295, pair=0x55982e1393a0 (c-id:1), tie=3521818653514815344, username='<...>' (9), password='<...>' (24), prio=1e2000ff, controlling.
(process:1074): libnice-DEBUG: 19:49:26.054: Agent 0x55982e113980 : conn_check_send: set cand_use=1 (aggressive nomination).
(process:1074): libnice-stun-DEBUG: 19:49:26.054: Message HMAC-SHA1 message integrity:
(process:1074): libnice-stun-DEBUG: 19:49:26.054: key : 0x31654e50482b774733753453654772456571724f45517574
(process:1074): libnice-stun-DEBUG: 19:49:26.054: sent : 0x3ce16101a03de707d023c83976db923d67b4fab2
(process:1074): libnice-stun-DEBUG: 19:49:26.054: Message HMAC-SHA1 fingerprint: 0x45854073
(process:1074): libnice-DEBUG: 19:49:26.054: Agent 0x55982e113980: conncheck created 92 - 0x55982e13a9a8
(process:1074): libnice-DEBUG: 19:49:26.054: Agent 0x55982e113980 : timer set to 500ms, waiting+in_progress=6
The devs over at janus-gateway suggested that this is more likely a libnice problem, since the 0xffffffff socket value only appears when the socket does not have a file descriptor (see here).
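For reference, here is a minimal sketch of why a missing descriptor would show up as 4294967295 in the log. It assumes the debug line uses -1 as a "no file descriptor" placeholder and prints it with an unsigned format; I have not confirmed this in the libnice source. The helper names `fd_as_logged` and `looks_like_missing_fd` are made up for illustration and are not libnice functions. It also assumes a platform where `unsigned int` is 32 bits wide.

```c
#include <limits.h>

/* Hypothetical helper: models what happens when a signed descriptor
 * value is passed to a "%u"-style debug format. The conversion to
 * unsigned wraps -1 around to UINT_MAX. */
unsigned int fd_as_logged(int fd)
{
    return (unsigned int) fd;
}

/* Hypothetical helper: returns 1 if a logged socket value corresponds
 * to the -1 placeholder (i.e. no underlying file descriptor), else 0. */
int looks_like_missing_fd(unsigned int logged)
{
    return logged == UINT_MAX;
}
```

With a 32-bit `unsigned int`, `fd_as_logged(-1)` yields 4294967295, which matches the socket= value in the STUN-CC REQ and STUN-CC RESP lines above, while an ordinary descriptor such as 12 is passed through unchanged.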
That's as far as I was able to debug this issue. Please let me know what you need.