Repeatable hang/crash after 5681 hours
Describe your issue
We use GStreamer in a long-running embedded device to receive and play audio streams. Our pipeline consistently crashes after running for 5681 hours, and we now have 4 separate logged instances of this. The exact failure message is inconsistent: we have two cases of "realloc(): invalid pointer," one case of "free(): invalid pointer," and one case of "corrupted double-linked list."
Expected Behavior
We expect our GStreamer pipeline to keep working indefinitely and not crash or hang.
Observed Behavior
We observe a crash shortly after 5681 hours of run time. When run without G_DEBUG=fatal_warnings, the pipeline silently hangs instead, which is worse than the crash because we cannot automatically recover from a hang.
Setup
- Operating System: Buildroot Linux 5.0.0
- Device: Embedded armv7l
- GStreamer Version: 1.16.0
Command line:
GST_DEBUG_NO_COLOR=1 G_DEBUG=fatal_warnings GST_DEBUG=4 gst-launch-1.0 -mv udpsrc port=5555 ! rawaudioparse use-sink-caps=false format=pcm pcm-format=s16le sample-rate=22050 num-channels=2 ! queue ! audioconvert ! audioresample ! alsasink
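For anyone trying to exercise this pipeline on a bench, here is a minimal sketch of a test sender that produces audio in the format the command line above expects (S16LE, 22050 Hz, 2 channels, UDP port 5555). It is not our production sender. The 440 Hz tone, the 256-frame packet size, the target host, and the function names `make_chunk` and `send_tone` are all assumptions made up for this example.

```python
import math
import socket
import struct
import time

SAMPLE_RATE = 22050      # matches sample-rate=22050 in the pipeline
CHANNELS = 2             # matches num-channels=2
FRAMES_PER_PACKET = 256  # assumption: 256 frames * 4 bytes = 1024-byte payloads


def make_chunk(start_frame, frames=FRAMES_PER_PACKET, freq_hz=440.0):
    """Return `frames` frames of an interleaved S16LE sine tone."""
    samples = []
    for n in range(start_frame, start_frame + frames):
        value = int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
        samples.extend([value] * CHANNELS)  # same sample on every channel
    # "<" = little-endian, "h" = signed 16-bit, i.e. s16le
    return struct.pack("<%dh" % len(samples), *samples)


def send_tone(host="127.0.0.1", port=5555, seconds=1.0):
    """Send roughly `seconds` of tone to host:port, paced near real time."""
    total_frames = int(seconds * SAMPLE_RATE)
    packet_interval = FRAMES_PER_PACKET / SAMPLE_RATE
    sent_packets = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for start in range(0, total_frames, FRAMES_PER_PACKET):
            sock.sendto(make_chunk(start), (host, port))
            sent_packets += 1
            time.sleep(packet_interval)  # crude pacing, not a precise clock
    return sent_packets
```

Calling `send_tone("127.0.0.1", 5555, seconds=5.0)` on the same machine as the pipeline should produce a few seconds of tone. This only helps confirm the pipeline is working; it does not shorten the 5681 hours needed to hit the failure.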
Steps to reproduce the bug
Given that this error takes roughly 236 days to appear, it is hard to give more specific reproduction steps than our particular embedded use case. The devices that experience this issue have relatively diverse usage patterns (ranging from a few distinct streams per day to hundreds per day), so the only common factor seems to be the 5681 hours. The crash seems to occur on the first stream attempted after that ~5681-hour mark is hit. In particular, the run times in the 4 crash reports I have are as follows: 236 days, 17 hours, 21 minutes; 236 days, 18 hours, 41 minutes; 237 days, 6 hours, 48 minutes; and 236 days, 18 hours, 4 minutes.
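To make the clustering easier to see, here is a small sketch that converts the four reported run times into hours. The durations are copied from the reports above; the names `CRASH_UPTIMES` and `to_hours` are just for this example.

```python
from datetime import timedelta

# Run times at crash, copied from the four crash reports.
CRASH_UPTIMES = [
    timedelta(days=236, hours=17, minutes=21),
    timedelta(days=236, hours=18, minutes=41),
    timedelta(days=237, hours=6, minutes=48),
    timedelta(days=236, hours=18, minutes=4),
]


def to_hours(uptime):
    """Convert a timedelta to fractional hours."""
    return uptime.total_seconds() / 3600


hours = sorted(to_hours(u) for u in CRASH_UPTIMES)
spread = hours[-1] - hours[0]

for h in hours:
    print(f"{h:.2f} h")
print(f"spread between earliest and latest crash: {spread:.2f} h")
```

The earliest crash works out to about 5681.35 hours and the latest to about 5694.8 hours, so all four fall within roughly 13.5 hours of each other. That fits the observation that the failure happens on the first stream attempted after the ~5681-hour mark rather than at an exact instant.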
How reproducible is the bug?
It seems to be consistent, but we're not totally sure, as we don't have perfect statistics on all of our devices and their uptimes.
Screenshots if relevant
Solutions you have tried
Related non-duplicate issues
Additional Information
I've attached the debug logs that we were able to capture. We only captured at debug level 4 due to the long-running nature of the issue and space constraints.