6800xt and 6900xt in Dual GPU setups: Broken Automatic Fan Control Leading to Overheating
I am experiencing a very weird issue with my 6800xt's fan curve, and I am not sure what is causing it. The card is completely fine when it is alone in my case. However, when I add a second GPU (5700xt) below it for compute work, the 6800xt is a bit starved for air. I expected the fans to have to work harder to keep temperatures in check, but since the fan sits at 40-50% when the second card is not installed, I assumed there was enough headroom to cope with somewhat worse airflow. I started by testing gaming with the second card (5700xt) completely idle. That should add barely any heat, since these cards consume very little power when idle, but it does reduce the amount of air the 6800xt can pull in.
It turns out, however, that the fan curve on my 6800xt behaves very strangely at higher RPM. The fan seems to hit a wall at ~1650 RPM and will not go any higher. According to the values reported by "sensors", though, the maximum RPM for these cards is 3300, so there should be a lot of headroom left for the fans to spin faster. What happens is the following:
- I start gaming and the temperature slowly rises
- The fan starts to climb to higher RPM values, which compensates for the increase in temperature as expected
- At 1650 RPM (exactly half of the maximum rated RPM) the fan refuses to spin any faster
- Temperature slowly starts to creep upwards as 1650 RPM is a tiny bit too little to keep the temperature in check. Mind you, it's a VERY slow climb in temperature taking up to 5 minutes
- At some point the temperature climbs too high (about 93-95C on Tjunction, which isn't out of the ordinary according to multiple reviews), and then something extremely weird happens: Instead of the fan climbing higher to compensate (as expected) or remaining stuck at 1650 RPM (as it was previously), the fan speed drops like a stone to ~630 RPM.
- Temperature rapidly starts increasing, hitting 110C on Tjunction
- I didn't wait to see what happened next, as I quickly closed my game to prevent anything bad from happening.
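To make the sequence above easier to capture in a log, here is a minimal Python sketch that reads fan speed and junction temperature directly from the hwmon sysfs files, which I believe is where lm_sensors gets its values. It assumes the amdgpu driver exposes `fan1_input`, `fan1_max`, `pwm1`, and a `temp*_label` file reading `junction`; the function names are made up for illustration and are not part of any existing tool.

```python
import glob
import os
import time


def read_int(path):
    """Return the integer stored in a sysfs file, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def find_amdgpu_hwmon(root="/sys/class/hwmon"):
    """Return every hwmon directory whose 'name' file reads 'amdgpu'."""
    found = []
    for name_file in sorted(glob.glob(os.path.join(root, "hwmon*", "name"))):
        try:
            with open(name_file) as f:
                if f.read().strip() == "amdgpu":
                    found.append(os.path.dirname(name_file))
        except OSError:
            continue
    return found


def junction_input(hwmon_dir):
    """Locate the temp*_input file whose label is 'junction' (assumed label)."""
    for label_file in sorted(glob.glob(os.path.join(hwmon_dir, "temp*_label"))):
        try:
            with open(label_file) as f:
                if f.read().strip() == "junction":
                    return label_file[: -len("_label")] + "_input"
        except OSError:
            continue
    return None


def read_sample(hwmon_dir):
    """Read one snapshot of fan speed, PWM duty and junction temperature."""
    tj_path = junction_input(hwmon_dir)
    tj_milli = read_int(tj_path) if tj_path else None
    return {
        "fan_rpm": read_int(os.path.join(hwmon_dir, "fan1_input")),
        "fan_max_rpm": read_int(os.path.join(hwmon_dir, "fan1_max")),
        "pwm": read_int(os.path.join(hwmon_dir, "pwm1")),
        # hwmon temperatures are reported in millidegrees Celsius
        "junction_c": tj_milli / 1000 if tj_milli is not None else None,
    }


def log_samples(hwmon_dir, count, interval_s=1.0):
    """Print `count` timestamped snapshots, one every `interval_s` seconds."""
    for _ in range(count):
        print(time.strftime("%H:%M:%S"), read_sample(hwmon_dir), flush=True)
        time.sleep(interval_s)
```

With two cards installed, `find_amdgpu_hwmon()` should return two directories, so the 6800xt has to be picked out by hand (for example by comparing `fan1_max`). Calling `log_samples(path, 600)` while gaming would then record whether the fan stalls near 1650 RPM and when it drops to ~630 RPM.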
Mind you, I do not have any third party tools running (or even installed!) that are able to control the fan speeds or clock speeds on these cards. It's pure stock at the moment. The only thing I am using to view temperatures is psensor, which reads its values from lm_sensors. Furthermore, the fan curve/speed below 1650 RPM is completely normal, spinning up and down as required based on temperatures. It is only when the fan needs to go above 1650 RPM that this extremely weird behavior starts to happen.
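One way to double-check that the card really is under stock automatic fan control is to read `pwm1_enable` from the card's hwmon directory. The sketch below is only a rough check: the value mapping (0 = no fan speed control, 1 = manual, 2 = automatic) is my understanding of the amdgpu hwmon documentation and should be treated as an assumption, and `fan_control_mode` is a hypothetical helper name.

```python
import os

# Assumed meaning of pwm1_enable values for amdgpu (based on the kernel's
# amdgpu hwmon documentation; verify against your kernel version).
PWM_ENABLE_MODES = {
    0: "no fan speed control",
    1: "manual",
    2: "automatic",
}


def fan_control_mode(hwmon_dir):
    """Return a human-readable fan control mode for one hwmon directory."""
    path = os.path.join(hwmon_dir, "pwm1_enable")
    try:
        with open(path) as f:
            raw = int(f.read().strip())
    except (OSError, ValueError):
        return "unknown (pwm1_enable not readable)"
    return PWM_ENABLE_MODES.get(raw, f"unknown value {raw}")
```

If this reports "automatic" both before and after the fan drops to ~630 RPM, that would support the claim that nothing in userspace is overriding the fan.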
Hardware description:
- CPU: 5900x
- GPU: 6800xt Midnight Black
- Second GPU: Red Devil 5700xt
- System Memory: 48GB 3200MHz CL16 RAM
- Display(s): Dual 2560x1440 144Hz Monitors
- Type of Display Connection: Dual DisplayPort
System information:
- Distro name and Version: Arch Linux
- Kernel version: 5.11.16
- AMD package version: No package, using Mesa 21.0.3
How to reproduce the issue:
- See above