dlist: store all dlists in a contiguous memory block

This reduces cache misses in execute_list for apps that use lots of
small dlists, like viewperf.

This is only done for small dlists (those fitting in one block), because
doing it for larger ones wouldn't bring any benefit.

For instance, in vp13/snx test 10, the percentage of cache-miss events
in _mesa_glthread_execute_list/execute_list goes down from 17%/10% to
4%/3%.

Storing "struct gl_display_list" in an array would also remove a source
of cache misses, since they are currently malloc-ed individually.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <mesa/mesa!11493>