Use less TLS
Refer to this thread and this handy guide to TLS models. From what I can tell, the default "TLS" path, at least on Linux, ends up using statically allocated TLS slots (the initial-exec or local-exec model), and static TLS is a severely limited resource. The way we're using this TLS block could probably use the local-dynamic model instead, and/or could simply fall back to the pthreads API.
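A rough sketch of the two alternatives, assuming C with GCC or Clang. Every name here (`tls_counter`, `bump_tls_counter`, `counter_key`, `bump_key_counter`) is hypothetical and does not come from our code. The first option keeps compiler-managed TLS but asks for the local-dynamic model through the `tls_model` attribute. The same effect can be had globally with `-ftls-model=local-dynamic`. This only changes anything when the code is built into a shared object. In a main executable the linker is free to relax the access to local-exec. The second option avoids compiler TLS altogether and uses a `pthread_key_t`. The per-thread value lives on the heap and is freed by the key's destructor when the thread exits.

```c
#include <pthread.h>
#include <stdlib.h>

/* Option 1 (hypothetical variable): compiler-managed TLS, but request the
 * local-dynamic model so that, inside a shared object, the slot comes from
 * the module's dynamically allocated TLS block rather than static TLS. */
static __thread int tls_counter __attribute__((tls_model("local-dynamic")));

int bump_tls_counter(void) {
    return ++tls_counter;
}

/* Option 2 (hypothetical key): no compiler TLS at all, only a pthread key. */
static pthread_key_t counter_key;
static pthread_once_t counter_key_once = PTHREAD_ONCE_INIT;
static int counter_key_err; /* result of pthread_key_create */

static void make_counter_key(void) {
    /* free() is called on each thread's non-NULL value when it exits. */
    counter_key_err = pthread_key_create(&counter_key, free);
}

/* Returns the incremented per-thread count, or -1 on failure. */
int bump_key_counter(void) {
    if (pthread_once(&counter_key_once, make_counter_key) != 0 ||
        counter_key_err != 0)
        return -1;

    int *slot = pthread_getspecific(counter_key);
    if (slot == NULL) {
        /* First use on this thread: allocate a zeroed slot. */
        slot = calloc(1, sizeof *slot);
        if (slot == NULL)
            return -1;
        if (pthread_setspecific(counter_key, slot) != 0) {
            free(slot);
            return -1;
        }
    }
    return ++*slot;
}
```

The key-based version uses no static TLS at all. The trade-off is a function call plus a pointer load on every access, and one heap allocation per thread.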