info: Code size and complexity increase
With the recent addition of a category threshold check before debug statements are actually evaluated, code size and complexity have increased everywhere.
Previously a debug line would:
- Just check whether the level of that debug statement was equal to, or below, _gst_debug_min (i.e. the most verbose level activated in any category)
- _gst_debug_min would end up in a locally stored variable in any compiled function (i.e. it is only loaded once)
- The check is small and efficient (a simple compare/jmp): 5 instruction bytes (on amd64) and one branch
Since !403 (merged), the following also happens:
- Load the category (extra load)
- Call gst_debug_category_get_threshold() (which is never inlined)
- That call does an atomic load \o/
- Finally come back and check the level
- That results in an increase of 30 instruction bytes (on amd64), an extra function call, and yet another branch
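The difference between the two checks can be sketched as a standalone C model. Everything in it is hypothetical and simplified: DebugLevel, DebugCategory, debug_min, log_impl() and expensive_arg() only stand in for GstDebugLevel, GstDebugCategory, _gst_debug_min, the logging function and a costly log argument; they are not the real GStreamer macros. It is also assumed, as a simplification, that log_impl() does the per-category filtering itself, which is why the old check still evaluates arguments for statements that are dropped later.

```c
#include <stdatomic.h>

/* Hypothetical, simplified models of GstDebugLevel, GstDebugCategory and
 * _gst_debug_min; these are NOT the real GStreamer definitions. */
typedef enum {
  LEVEL_NONE = 0, LEVEL_ERROR, LEVEL_WARNING, LEVEL_FIXME,
  LEVEL_INFO, LEVEL_DEBUG, LEVEL_LOG, LEVEL_TRACE
} DebugLevel;

typedef struct {
  atomic_int threshold;
} DebugCategory;

/* Most verbose level enabled in any category (models _gst_debug_min). */
static DebugLevel debug_min = LEVEL_NONE;

/* Counters used to compare the two checks. */
static int arg_evals = 0;    /* times the expensive argument was computed */
static int log_calls = 0;    /* times the logging function was entered */
static int log_printed = 0;  /* times a message passed the category filter */

/* Hypothetical stand-in for an expensive-to-compute log argument. */
static int
expensive_arg (void)
{
  arg_evals++;
  return 42;
}

/* Out-of-line getter doing an atomic load, modelled on
 * gst_debug_category_get_threshold(). noinline is a GCC/Clang extension. */
__attribute__((noinline)) static DebugLevel
category_get_threshold (DebugCategory *cat)
{
  return (DebugLevel) atomic_load (&cat->threshold);
}

/* Assumed simplification: the final per-category filtering happens here,
 * after the caller has already evaluated the arguments. */
static void
log_impl (DebugCategory *cat, DebugLevel level, int arg)
{
  (void) arg;
  log_calls++;
  if (level > category_get_threshold (cat))
    return;
  log_printed++;
}

/* Before !403: a single compare/branch against the global minimum. */
#define LOG_OLD(cat, level, arg) do {                  \
    if ((level) <= debug_min)                          \
      log_impl ((cat), (level), (arg));                \
  } while (0)

/* After !403: additionally load the category, call the getter (atomic
 * load) and branch again, before any argument is evaluated. */
#define LOG_NEW(cat, level, arg) do {                  \
    if ((level) <= debug_min &&                        \
        (level) <= category_get_threshold (cat))       \
      log_impl ((cat), (level), (arg));                \
  } while (0)
```

With debug_min at LEVEL_LOG (because some other category enables it) and a category at LEVEL_WARNING, LOG_OLD(&cat, LEVEL_DEBUG, expensive_arg()) still computes the argument and enters log_impl(), while LOG_NEW skips both; the price is the extra call and branch on every statement, including the ones that do get printed.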
This increases both:
- The size of all code (risk of no longer being able to keep most of the code in the CPU cache)
- The number of branches (risk of overloading the CPU's branch predictor)
I tried locally replacing the call to gst_debug_category_get_threshold() with a direct g_atomic_int_get(), which reduces the code size slightly but still results in extra branches.
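That experiment can be sketched roughly as follows, again with hypothetical simplified types rather than the real GStreamer ones; C11 atomic_load() plays the role that g_atomic_int_get() plays in GLib. The function call disappears, but the two comparisons typically still compile to two conditional branches.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical simplified types; NOT the real GStreamer definitions. */
typedef enum {
  LEVEL_NONE = 0, LEVEL_ERROR, LEVEL_WARNING, LEVEL_FIXME,
  LEVEL_INFO, LEVEL_DEBUG, LEVEL_LOG, LEVEL_TRACE
} DebugLevel;

typedef struct {
  atomic_int threshold;
} DebugCategory;

/* Models _gst_debug_min: the most verbose level enabled anywhere. */
static DebugLevel debug_min = LEVEL_NONE;

/* The getter call is replaced by an inline atomic load. This saves the
 * call, but the && still usually yields two compare/branch pairs. */
static inline bool
debug_enabled (DebugCategory *cat, DebugLevel level)
{
  return level <= debug_min &&
      level <= (DebugLevel) atomic_load (&cat->threshold);
}
```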
Proposal
- Revert !403 (merged)
- For debug statements where evaluating the arguments is potentially expensive, add guards to make sure they are only evaluated if that particular category's threshold enables the statement's level.
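In a simplified, hypothetical model, the proposal could look like the sketch below: ordinary statements keep only the cheap global check, and only code doing expensive work pays for an explicit category threshold check, done once up front in the same spirit as the qtdemux_dump.c example. DebugCategory, LOG(), log_impl() and dump_items() are illustrative names, not GStreamer API, and log_impl() is assumed to do the final per-category filtering.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical simplified types; NOT the real GStreamer definitions. */
typedef enum {
  LEVEL_NONE = 0, LEVEL_ERROR, LEVEL_WARNING, LEVEL_FIXME,
  LEVEL_INFO, LEVEL_DEBUG, LEVEL_LOG, LEVEL_TRACE
} DebugLevel;

typedef struct {
  atomic_int threshold;
} DebugCategory;

static DebugLevel debug_min = LEVEL_NONE;  /* models _gst_debug_min */
static int log_calls = 0;                  /* messages that passed filtering */
static int items_visited = 0;              /* work done by the dump */

static DebugLevel
category_get_threshold (DebugCategory *cat)
{
  return (DebugLevel) atomic_load (&cat->threshold);
}

/* Assumed: the logging function does the final per-category filtering. */
static void
log_impl (DebugCategory *cat, DebugLevel level, long value)
{
  (void) value;
  if (level > category_get_threshold (cat))
    return;
  log_calls++;
}

/* Ordinary statements keep the cheap pre-!403 check (one branch). */
#define LOG(cat, level, value) do {                    \
    if ((level) <= debug_min)                          \
      log_impl ((cat), (level), (value));              \
  } while (0)

/* Hypothetical expensive dump: the category threshold is checked once,
 * so the whole traversal is skipped when LOG output would be dropped. */
static void
dump_items (DebugCategory *cat, const long *items, size_t n)
{
  if (category_get_threshold (cat) < LEVEL_LOG)
    return;
  for (size_t i = 0; i < n; i++) {
    items_visited++;
    LOG (cat, LEVEL_LOG, items[i] * items[i]);
  }
}
```

The per-statement cost stays at one compare/branch everywhere, and the extra load and call are only paid at the few sites that explicitly opt in.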
A good example is how debugging is handled in gst-plugins-good/gst/isomp4/qtdemux_dump.c:
```c
#ifndef GST_DISABLE_GST_DEBUG
  /* Only traverse/dump if we know it will be outputted in the end */
  if (qtdemux_debug->threshold < GST_LEVEL_LOG)
    return TRUE;

  g_node_traverse (node, G_PRE_ORDER, G_TRAVERSE_ALL, -1,
      qtdemux_node_dump_foreach, qtdemux);
#endif
```