info: Code size and complexity increase
With the latest addition of checking the category threshold before actually evaluating debug statements, code size and complexity have increased everywhere.
Previously a debug line would:
- Just check if the level of that debug statement was equal to, or below, _gst_debug_min (i.e. the worst level activated)
- This would end up being a locally stored variable in any compiled function (i.e. only loaded once)
- The check is small/efficient (a simple compare/jmp): 5 instruction bytes (on amd64) and one branch
Since !403 (merged) the following happens in addition:
- Load the category (extra load)
- Call gst_debug_category_get_threshold() (which is never inlined)
- That call does an atomic load \o/
- Finally come back and check the level
- That results in an increase of 30 instruction bytes (on amd64), an extra function call and yet-another-branch
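The difference between the two checks can be modelled roughly as follows. This is a simplified, self-contained sketch in C11, not the actual GStreamer macros: `Category`, `debug_min`, `category_get_threshold()`, `should_log_old()` and `should_log_new()` are hypothetical stand-ins, and the order of the two comparisons in the new check is an assumption.

```c
#include <stdatomic.h>

/* Simplified stand-ins, not the real GStreamer definitions. */
typedef enum {
  LEVEL_NONE = 0, LEVEL_ERROR, LEVEL_WARNING, LEVEL_INFO, LEVEL_DEBUG, LEVEL_LOG
} Level;

typedef struct {
  atomic_int threshold;         /* per-category threshold */
} Category;

/* Models _gst_debug_min: a single global that the compiler can keep
 * in a local/register within a function. */
static int debug_min = LEVEL_NONE;

/* Models an out-of-line getter: a real call plus an atomic load. */
int
category_get_threshold (Category * cat)
{
  return atomic_load (&cat->threshold);
}

/* Old behaviour: one compare and one branch per debug statement. */
static inline int
should_log_old (Level level)
{
  return (int) level <= debug_min;
}

/* New behaviour (assumed ordering): the global compare, then a category
 * load, a function call, an atomic load and a second compare/branch. */
static inline int
should_log_new (Category * cat, Level level)
{
  return (int) level <= debug_min
      && (int) level <= category_get_threshold (cat);
}
```

With `debug_min` at `LEVEL_LOG` and a category threshold of `LEVEL_WARNING`, `should_log_old (LEVEL_DEBUG)` passes while `should_log_new (&cat, LEVEL_DEBUG)` does not. That is the intended filtering, paid for with the extra call and branch at every debug statement.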
This is increasing both:
- The size of all code (risk of not being able to load most code in cache)
- The number of branches (risk of overloading branch prediction in cpu)
I have tried locally replacing the call to gst_debug_category_get_threshold() with a direct atomic_int_get, which reduces the code size slightly but still results in extra branches.
Proposal:
- Revert !403 (merged)
- For debug statements where evaluating arguments is potentially expensive, add guards to make sure they are only called if that particular category threshold is exceeded.
A good example is how the debugging is handled in
    #ifndef GST_DISABLE_GST_DEBUG
      /* Only traverse/dump if we know it will be outputted in the end */
      if (qtdemux_debug->threshold < GST_LEVEL_LOG)
        return TRUE;

      g_node_traverse (node, G_PRE_ORDER, G_TRAVERSE_ALL, -1,
          qtdemux_node_dump_foreach, qtdemux);
    #endif
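The same guard idea can be sketched outside of GStreamer. The following is a minimal illustration in C11 under stated assumptions: `DemoCategory`, the `DEMO_LEVEL_*` values, `expensive_summary()` and `log_summary()` are all hypothetical names invented for this example, and the "expensive" work is just a loop standing in for something like a tree traversal.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative level values, not taken from GStreamer headers. */
enum { DEMO_LEVEL_DEBUG = 5, DEMO_LEVEL_LOG = 6 };

typedef struct {
  atomic_int threshold;         /* per-category threshold */
} DemoCategory;

/* Counts how often the expensive path actually ran. */
static int expensive_calls = 0;

/* Stand-in for costly argument evaluation (e.g. dumping a tree). */
static long
expensive_summary (const int *values, size_t n)
{
  long sum = 0;

  expensive_calls++;
  for (size_t i = 0; i < n; i++)
    sum += values[i];
  return sum;
}

/* Explicit guard: only do the expensive work when this category would
 * actually output it. Returns 1 if computed, 0 if skipped. */
int
log_summary (DemoCategory * cat, const int *values, size_t n, long *out)
{
  if (atomic_load (&cat->threshold) < DEMO_LEVEL_LOG)
    return 0;

  *out = expensive_summary (values, n);
  return 1;
}
```

The point of this design is that the per-category cost is paid only at the few call sites that have expensive arguments, rather than at every debug statement.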