6900xt: Thermal Throttling due to Mem Temperatures Despite Having Fan Speed Headroom
As per my previous ticket (#1580 (closed)), I am using a 6900xt together with a 6800xt in the same system. The top slot has the 6900xt fitted, while my bottom slot has a 6800xt slotted. Interestingly enough, the top 6900xt is throttling despite there being plenty of headroom on the fan speed.
I have tested three different scenarios:
- Gaming on the 6900xt with the 6800xt being idle: Boost clocks are a bit lower than when running the 6900xt card solo (due to worsened airflow), but it's still able to boost to roughly 2.2-2.3 GHz. Temperatures are high (as expected), but within AMD's specification. The hotspot temperature is at 110C. The memory temperature is in the low 90s. Fan speeds are at 1900-2000 RPM (out of a maximum of 3300 RPM).
- Mining on the 6900xt with the 6800xt being idle: Hotspot is way cooler at roughly 97C, but memory is at 100C (which is the maximum rated temperature for these dies). It's not throttling, but boost clocks are much lower at 2-2.1 GHz.
- Mining on the 6900xt while gaming on the 6800xt: Hotspot is still at roughly 97C. Memory is still at 100C (which again, is the maximum rated temperature). Fan speed is at exactly HALF of maximum (1650 RPM). Core clock is heavily throttled at 750-950 MHz.
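For anyone wanting to reproduce these readings, here is a minimal sketch of how the numbers could be collected. It assumes the amdgpu hwmon sysfs layout (`temp*_label`, `temp*_input` in millidegrees Celsius, `fan1_input`, `fan1_max`); the function name `read_hwmon` is my own, and the hwmon directory path differs per system, so treat this as an illustration rather than a tested tool.

```python
from pathlib import Path


def read_hwmon(hwmon_dir):
    """Return labelled temperatures (deg C) and fan readings for one hwmon directory.

    Assumes amdgpu-style files: temp*_label / temp*_input (millidegrees C),
    fan1_input (RPM) and fan1_max (RPM).
    """
    root = Path(hwmon_dir)
    readings = {}
    for label_file in sorted(root.glob("temp*_label")):
        label = label_file.read_text().strip()  # e.g. "edge", "junction", "mem"
        input_file = root / label_file.name.replace("_label", "_input")
        readings[label] = int(input_file.read_text()) / 1000.0  # millidegrees -> deg C
    rpm = int((root / "fan1_input").read_text())
    rpm_max = int((root / "fan1_max").read_text())
    readings["fan_rpm"] = rpm
    readings["fan_fraction"] = rpm / rpm_max  # 1650 / 3300 gives 0.5
    return readings
```

Calling `read_hwmon` on the card's hwmon directory (somewhere under `/sys/class/drm/`, exact path depends on the system) once per second while the workload runs should show whether the fan fraction stays flat while the `mem` reading sits at its limit.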
Now don't get me wrong: Throttling is way better than actually overheating and possibly damaging the card. However, as far as I am aware, throttling should be a last resort. It doesn't make much sense that the GPU chooses to throttle while there is still a lot of fan speed headroom left. I could work around this with custom fan profiles of course, but I am of the opinion that the automatic fan control should be able to do a better job here.
What I think is happening is this (not 100% sure though!): Fan speed is tied to core temperatures, so as long as core temperatures are fine, the fan will not ramp up higher. For gaming workloads this works fine, since the core is usually the hotter component. However, mining puts a lot more stress on the RAM modules. Thankfully the throttling behavior DOES take RAM temperatures into account, but this means that in this particular environment with this particular workload, the fan speed is held constant at 50%, while temperature of the RAM is controlled by throttling. The fan speed responding to either core temperature OR RAM temperature (depending on which needs cooling the most) would be a solution if my theory is correct.
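To make the proposal concrete, here is a rough sketch of the "whichever needs cooling the most" policy. This is not how the driver or firmware actually works (I don't know that); the `Sensor` type, the ramp start temperatures, and the 30% fan floor are all made-up values for illustration. Only the 100C memory limit and the 110C hotspot figure come from the observations above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensor:
    name: str
    temp_c: float    # current reading
    start_c: float   # hypothetical: temperature where the fan starts ramping
    limit_c: float   # temperature where the fan should already be at 100%


def demand(sensor):
    """Fraction of the fan range this sensor asks for, clamped to [0, 1]."""
    frac = (sensor.temp_c - sensor.start_c) / (sensor.limit_c - sensor.start_c)
    return min(1.0, max(0.0, frac))


def fan_fraction(sensors, floor=0.3):
    """Drive the fan from the sensor with the highest demand (floor is assumed)."""
    worst = max(demand(s) for s in sensors)
    return floor + (1.0 - floor) * worst


# Mining scenario from this report: hotspot around 97C, memory at 100C.
hotspot = Sensor("junction", 97.0, start_c=80.0, limit_c=110.0)
memory = Sensor("mem", 100.0, start_c=80.0, limit_c=100.0)

print("core only:  ", fan_fraction([hotspot]))
print("core or mem:", fan_fraction([hotspot, memory]))
```

With only the hotspot considered, the fan settles well below maximum; once memory is included, the fan is asked for its full range before throttling would have to step in. That is the behavior I would expect from the automatic fan control.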
Hardware description:
- CPU: 5900x
- First GPU: 6900xt
- Second GPU: 6800xt Midnight Black
- System Memory: 48 GB 3200 MHz CL16 RAM
- Display(s): Dual 2560x1440 144 Hz Monitors
- Type of Display Connection: 2x DisplayPort
System information:
- Distro name and Version: Arch Linux
- Kernel version: 5.12.1 (patched with https://patchwork.freedesktop.org/patch/434089/?series=90262&rev=1)
- AMD package version: No package, using Mesa 21.1.1