aco: Some tessellation-related fixes and improvements
This MR contains a few fixes and improvements which are relevant to how tessellation shaders are compiled.
Pre-requisite: !4193 (merged) is needed because I rely on collecting the necessary information in
- We can now truly merge the LSHS shaders when the numbers of LS and HS invocations are the same. These shaders are no longer "cut" in the middle: we can schedule across the two halves, and the VS outputs are passed to the TCS in registers.
- LDS usage is reduced when cross-invocation inputs are not used.
- Some loads and stores are combined.
- A few other minor improvements, such as a more optimal sequence at the beginning of every merged shader.
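
To illustrate the conditions behind the first two points, here is a minimal sketch. It is not the code in this MR: `TessInfo`, its fields, and both helper functions are hypothetical names, and combining the two conditions in `inputs_need_lds` is my assumption about how they interact.

```cpp
#include <cstdint>

// Hypothetical summary of a merged LS+HS shader. These names are
// illustrative only and are not taken from the ACO source.
struct TessInfo {
    uint32_t ls_invocations;            // LS (vertex shader) invocations per patch
    uint32_t hs_invocations;            // HS (TCS) invocations per patch
    bool reads_cross_invocation_inputs; // TCS reads an input written by another invocation
};

// The two halves can be treated as one shader when both stages run the
// same number of invocations.
constexpr bool can_fully_merge(const TessInfo& t) {
    return t.ls_invocations == t.hs_invocations;
}

// Assumption: TCS inputs only need to go through LDS when the shader is not
// fully merged, or when an invocation reads another invocation's input.
// Otherwise the VS outputs can stay in registers.
constexpr bool inputs_need_lds(const TessInfo& t) {
    return !can_fully_merge(t) || t.reads_cross_invocation_inputs;
}
```

For example, a patch with 3 LS and 3 HS invocations and no cross-invocation reads would keep its inputs in registers under this sketch, while the same patch with cross-invocation reads would still need LDS.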