Long hangs in buddy allocator with DG2/A380 without Resizable BAR since 6.9
I've been seeing long stutters and hangs (anywhere from ~1s to 5min) since installing kernel 6.9. These typically happen in games that keep a lot of data in video memory (mostly on or just after loading screens, but sometimes randomly).
This seems consistent with some sort of thrashing, and perf shows the kernel is spending all of its time in the buddy allocator (typically drm_buddy_alloc_blocks/__alloc_range_bias). My machine doesn't support resizable BAR, so only 256MiB (out of the 6G) of video memory is accessible from the CPU.
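For anyone wanting to check whether their own machine has the same small-BAR setup, here is a rough sketch that reads the BAR sizes from sysfs. It assumes the standard `/sys/bus/pci/devices/<address>/resource` layout (one "start end flags" line per region, in hex). The helper names are my own, and the flags values in the example string are only illustrative; the addresses are the ones from the lspci output for the A380.

```python
from pathlib import Path


def bar_sizes(resource_text):
    """Return the size in bytes of each populated region listed in the
    contents of a sysfs PCI 'resource' file."""
    sizes = []
    for line in resource_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that isn't "start end flags"
        start, end, _flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0:
            continue  # unused slot (e.g. upper half of a 64-bit BAR)
        sizes.append(end - start + 1)
    return sizes


def read_bar_sizes(pci_address):
    """Hypothetical convenience wrapper: read the sizes for one device,
    e.g. read_bar_sizes("0000:03:00.0") (PCI domain 0000 is assumed)."""
    path = Path("/sys/bus/pci/devices") / pci_address / "resource"
    return bar_sizes(path.read_text())


# Example input shaped like the A380's regions in the lspci output above
# (flags column is made up for illustration).
EXAMPLE = (
    "0x00000000f6000000 0x00000000f6ffffff 0x0000000000140204\n"
    "0x0000000000000000 0x0000000000000000 0x0000000000000000\n"
    "0x00000000e0000000 0x00000000efffffff 0x000000000014220c\n"
)

if __name__ == "__main__":
    for size in bar_sizes(EXAMPLE):
        print(f"{size // (1024 * 1024)} MiB")
```

A 256 MiB prefetchable region on a card with several GiB of VRAM suggests resizable BAR is not in effect, which is the configuration this report is about.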
I've tried this:
- On the openSUSE 'vanilla' 6.9 kernels (which should be unmodified upstream kernels)
- The latest torvalds/master kernel (6.10-rc1+), built locally.
- With two games: Age of Empires 2: Definitive Edition (under both DXVK and wined3d, which both show the issue), and Psychonauts 2 (native Linux UE4/Vulkan).
- The system also has an integrated HSW GPU, which doesn't seem to show the issue, but is really too slow in games to test reliably.
Running with xe instead of i915 solves the issue: the games do not stutter at all.
System info:
uname -a: Linux sparky 6.10.0-rc1-sulix+ #1 SMP PREEMPT_DYNAMIC Sun Jun 2 12:04:56 AWST 2024 x86_64 x86_64 x86_64 GNU/Linux
lspci -vnn -d :*:0300:
00:02.0 VGA compatible controller [0300]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller [8086:0412] (rev 06) (prog-if 00 [VGA controller])
Subsystem: Gigabyte Technology Co., Ltd Device [1458:d000]
Flags: bus master, fast devsel, latency 0, IRQ 33
Memory at f7400000 (64-bit, non-prefetchable) [size=4M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
I/O ports at f000 [size=64]
Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [d0] Power Management version 2
Capabilities: [a4] PCI Advanced Features
Kernel driver in use: i915
Kernel modules: i915
03:00.0 VGA compatible controller [0300]: Intel Corporation DG2 [Arc A380] [8086:56a5] (rev 05) (prog-if 00 [VGA controller])
Subsystem: Device [172f:3941]
Flags: bus master, fast devsel, latency 0, IRQ 36
Memory at f6000000 (64-bit, non-prefetchable) [size=16M]
Memory at e0000000 (64-bit, prefetchable) [size=256M]
Expansion ROM at f7000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Capabilities: [d0] Power Management version 3
Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
Capabilities: [420] Physical Resizable BAR
Capabilities: [400] Latency Tolerance Reporting
Kernel driver in use: i915
Kernel modules: i915, xe
Motherboard (from dmidecode): Gigabyte Technology Co., Ltd. Z87X-UD5H
KDE's system info:
Operating System: openSUSE Tumbleweed 20240524
KDE Plasma Version: 6.0.4
KDE Frameworks Version: 6.2.0
Qt Version: 6.7.0
Kernel Version: 6.10.0-rc1-sulix+ (64-bit)
Graphics Platform: Wayland
Processors: 8 × Intel® Core™ i7-4770K CPU @ 3.50GHz
Memory: 31.0 GiB of RAM
Graphics Processor: Mesa Intel® Arc
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: Z87X-UD5H
Full system journal with drm.debug=0xe since boot (running first Psychonauts 2, then AoE2) is included, save for a bunch of "[drm:drm_mode_addfb2] [FB:288]" messages which were spamming multiple times per second. kwin notes that atomic commits are failing during the hangs due to -ENOMEM (which it's reporting in Japanese, sorry, as 'メモリを確保できません', i.e. "Cannot allocate memory"). The games always recovered from the hangs. A few of the i915 messages are from the HSW integrated GPU, so sorry for the spam. I've also booted with KUnit enabled: the drm_buddy_alloc tests pass.
The 'attach' button on GitLab isn't working, so I've uploaded the log here: https://davidgow.net/stuff/sparky_journal_20240603.log