Really bad performance on Thunderbolt 3 eGPUs with certain games
Brief summary of the problem:
In certain games, an AMD graphics card used through any Thunderbolt 3 eGPU enclosure (e.g. Razer Core X) performs worse than the integrated Intel GPU, especially under Vulkan. I tried the following Vulkan drivers, and all of them are affected in the same way:
- AMDVLK (including the release that came out today)
- Mesa RADV
- AMDGPU PRO
The symptoms include:
- Low, fluctuating clock speeds and GPU usage (below 50%). Even old graphics applications don't run properly, and usage is so low that the card never gets warm enough to turn the fan on.
- Lowering the resolution to 800x600 makes no difference: Cyberpunk 2077, for example, never goes above 30 fps no matter which settings are used. Far Cry 5 tops out at 40 fps even at the lowest settings, and most areas of Killing Floor 2 run badly, at 30 fps.
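To put numbers on the low GPU usage, a minimal sketch like the one below could poll amdgpu's gpu_busy_percent sysfs file while a game is running. It assumes the eGPU is DRM card1 (the card index is a guess and differs between systems), and the helper names are hypothetical.

```python
import time
from pathlib import Path
from statistics import mean

# Assumption: the eGPU is DRM card1; check /sys/class/drm/ to find the RX 7800 XT.
BUSY_FILE = Path("/sys/class/drm/card1/device/gpu_busy_percent")


def summarize(readings):
    """Return (min, mean, max) of a list of busy-percent readings."""
    return min(readings), mean(readings), max(readings)


def sample_gpu_busy(busy_file=BUSY_FILE, samples=20, interval_s=0.5):
    """Poll amdgpu's gpu_busy_percent and summarize the readings."""
    readings = []
    for i in range(samples):
        readings.append(int(Path(busy_file).read_text().strip()))
        if i + 1 < samples:  # no need to sleep after the last sample
            time.sleep(interval_s)
    return summarize(readings)
```

Calling print(sample_gpu_busy()) in a terminal while the game runs gives the minimum, mean and maximum busy percentage over about ten seconds.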
This issue is fairly well known (just search Reddit, Google, etc.) and seems to affect only certain newer AMD graphics cards. According to others I spoke with who have the same issue, it starts with the RDNA architecture, which started to support ReBAR, although I can't confirm this myself since I own an RX 7800 XT and can't test other cards. Apparently an RX 580 is not affected, but I can't test one personally. NVIDIA cards are also not affected.
At first I thought it was a PCI Express link speed issue, and the link was indeed running at PCIe 1.0 x4 (2.5 GT/s). This is an already known issue that can be worked around by adding amdgpu.pcie_gen_cap=0x40000 as a kernel option. Since that didn't make any difference, I ran some OpenCL compute benchmarks to confirm that the link was indeed running at the correct speed (8.0 GT/s for PCIe 3.0 x4). However, forcing the correct speed made no difference at ALL in the game benchmarks I tried. Forcing PCIe 1.0 x4 actually made one benchmark run slightly better.
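For reference, 0x40000 is a bitmask. Assuming the bit layout defined in the kernel's amd_pcie.h header (worth double-checking against the running kernel), a small decoder shows that it only sets the "platform supports PCIe Gen3" bit. The function name is hypothetical.

```python
# Bit layout assumed from the kernel's drivers/gpu/drm/amd/include/amd_pcie.h:
#   bits 16-20 = speeds the platform (chipset/bridge) supports, Gen1..Gen5
#   bits 0-4   = speeds the ASIC supports, Gen1..Gen5
PLATFORM_SHIFT = 16
MAX_GEN = 5


def decode_pcie_gen_cap(mask):
    """Split a pcie_gen_cap value into platform and ASIC PCIe generations."""
    platform = [g for g in range(1, MAX_GEN + 1)
                if mask & (1 << (PLATFORM_SHIFT + g - 1))]
    asic = [g for g in range(1, MAX_GEN + 1) if mask & (1 << (g - 1))]
    return {"platform": platform, "asic": asic}


print(decode_pcie_gen_cap(0x40000))  # {'platform': [3], 'asic': []}
```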
Throughout all the other tests I also double-checked the pp_dpm_pcie speeds, which showed:
0: 8.0GT/s, x4 78Mhz *
1: 8.0GT/s, x4 156Mhz *
2: 8.0GT/s, x4 623Mhz *
which is correct for PCIe 3.0 x4.
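To compare the two link speeds mentioned above, the sketch below parses pp_dpm_pcie text like the output shown and computes the theoretical one-direction link bandwidth after line encoding (8b/10b for PCIe 1.0/2.0, 128b/130b for PCIe 3.0 to 5.0). That works out to 1.0 GB/s for PCIe 1.0 x4 and about 3.94 GB/s for PCIe 3.0 x4. The parser only assumes the line format shown in this report; the names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches lines such as "0: 8.0GT/s, x4 78Mhz *"; the clock column and the
# trailing "*" are optional.
LEVEL_RE = re.compile(r"^\s*(\d+):\s*([\d.]+)GT/s,\s*x(\d+)(?:\s+(\d+)Mhz)?\s*(\*)?\s*$")


@dataclass
class PcieLevel:
    index: int
    gts: float                 # transfer rate per lane, GT/s
    lanes: int
    clock_mhz: Optional[int]   # clock column as printed by sysfs
    marked: bool               # True if the line ends with "*"


def parse_pp_dpm_pcie(text: str) -> List[PcieLevel]:
    levels = []
    for line in text.splitlines():
        m = LEVEL_RE.match(line)
        if m:
            idx, gts, lanes, mhz, star = m.groups()
            levels.append(PcieLevel(int(idx), float(gts), int(lanes),
                                    int(mhz) if mhz else None, star is not None))
    return levels


def link_bandwidth_gbs(gts: float, lanes: int) -> float:
    """Theoretical one-direction throughput in GB/s (PCIe 1.0 to 5.0 only)."""
    efficiency = 0.8 if gts <= 5.0 else 128 / 130  # 8b/10b vs 128b/130b
    return gts * efficiency * lanes / 8


sample = """0: 8.0GT/s, x4 78Mhz *
1: 8.0GT/s, x4 156Mhz *
2: 8.0GT/s, x4 623Mhz *"""

for level in parse_pp_dpm_pcie(sample):
    print(level.index, f"{link_bandwidth_gbs(level.gts, level.lanes):.2f} GB/s")
```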
Another curious thing is that OpenGL does not seem to be affected at all: forcing a game to use WineD3D with OpenGL, for example, makes full use (99%) of the GPU and performs better than any Vulkan backend I have tried so far. Forcing WineD3D with Vulkan brings the issue back, exactly as with DXVK.
This is also currently being tracked in Mesa, but I wasn't sure whether it should be reported here too; I'm doing so because other Vulkan drivers are affected in the same way.
It has also been suggested that performance may improve by disabling the ReBAR optimizations in RADV with the following flags:
RADV_PERFTEST=nosam
or RADV_PERFTEST=dmashaders
in newer Mesa releases. However, on my system with an RX 7800 XT this made no difference at all; as mentioned, the only thing that improved performance across the various games I tried was using OpenGL as the WineD3D backend.
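Because RADV_PERFTEST is read from the environment of each process, one way to compare the flags fairly is to launch the same benchmark once per setting. A minimal sketch, assuming a benchmark command of your choice (the command and helper names here are hypothetical):

```python
import os
import subprocess

# Flag values from this report; "" means RADV_PERFTEST unset (baseline).
PERFTEST_VARIANTS = ["", "nosam", "dmashaders"]


def perftest_env(value, base=None):
    """Copy the environment and set (or clear) RADV_PERFTEST."""
    env = dict(os.environ if base is None else base)
    if value:
        env["RADV_PERFTEST"] = value
    else:
        env.pop("RADV_PERFTEST", None)
    return env


def run_variants(cmd, variants=PERFTEST_VARIANTS, runner=subprocess.run):
    """Run cmd once per RADV_PERFTEST value; return exit codes by label."""
    results = {}
    for value in variants:
        label = value or "baseline"
        results[label] = runner(cmd, env=perftest_env(value)).returncode
    return results
```

For example, run_variants(["./my_benchmark"]) runs a placeholder benchmark three times; the frame rates still have to be read from the benchmark's own output.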
The Intel iGPU and the discrete NVIDIA card were turned off during all the tests.
Hardware description:
- Asus TUF F15 FX516PM
- CPU: 11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz
- GPU: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:747e] (rev c8), a.k.a. AMD Radeon RX 7800 XT
- System Memory: 24 GB
- Display(s): 2560x1440 169.83 Hz - Gigabyte M27Q
- Type of Display Connection: DP
System information:
- Distro name and Versions tried: Arch Linux + Ubuntu 23.04
- Kernel versions: 6.5.5-arch1-1 (on Arch), linux-next-20230920 (on Ubuntu); I also tried a number of other kernels with the same result
- AMD official driver version: Radeon™ Software for Linux® version 23.20 when the AMDGPU-PRO driver was used.
How to reproduce the issue:
- Use an affected AMD card (here an RX 7800 XT) in a Thunderbolt 3 eGPU enclosure
- Compare the performance of any Vulkan application, with any Vulkan driver, against the same card installed in a desktop, or against Windows using the same eGPU setup.
Screenshots/video files
Forcing PCI Express 1.0 speeds even improved the result.
Out of curiosity I tried the newest vk\queue that came out a few days ago, with no difference. In those benchmarks I tried different RADV flags.
A test with OpenGL on Killing Floor 2