aco: implement 8/16-bit shared, global and scratch loads/stores
This implements 8/16-bit shared, global and scratch loads/stores and attempts to create various helpers for performing loads/stores. It also includes some fixes.