Regression since kernel 6.10: the W7900 GPU hangs frequently with MES timeout errors. From 6.10.9 onward, the machine hard-crashes and repeats "MES ring buffer is full" in dmesg. The 6.9 kernel was not affected. Subsequent regressions have rendered the machine unusable for desktop use.
Hardware description:
CPU: Xeon Max 9480
GPU: AMD Radeon Pro W7900
System Memory: 512GB
Display(s): BenQ, Asus ProArt
Type of Display Connection: DisplayPort
System information:
Distro name and Version: Arch
Kernel version: 6.10.9
Custom kernel: no
AMD official driver version: no
How to reproduce the issue:
Use the W7900 under kernel 6.10, open Chromium, and browse sites with lots of accelerated video (the Reddit main page always triggers this):
[16511.864387] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[16511.864393] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
Update to 6.10.9, which contains the MES overflow patch. Instead of merely hanging, the entire system locks up with "MES ring buffer is full." and requires a hard reboot.
I couldn't reproduce the "MES ring buffer is full" error on 6.10.10, but I only opened a video-laden site and scrolled rapidly up and down in Chromium; I didn't actually use the machine for any length of time.
Log files (for system lockups / game freezes / crashes)
dmesg errors inline
Will try 6.10.9 again and capture the exact error message here as well. Some other pieces of info: Firefox triggers this, but only on new windows, whereas Chrome/Chromium seem to trigger it often, I assume while hardware video decoding is occurring. The machine and W7900 work fine when used headlessly; the error does not occur then.
Errors still occur on 6.10.10, but I wasn't able to reproduce the "MES ring buffer is full" message, though I didn't try very hard; I just opened a site with videos and scrolled around really fast. Notably, the display would hang for 5s more often than the dmesg output below was emitted: approximately 3-5 hangs per dmesg entry.
[ 175.223625] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 175.223632] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 212.748665] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 212.748674] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 234.878885] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 234.878891] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 235.948777] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 235.948783] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 250.390843] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 250.390857] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 253.465162] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 253.465168] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 255.401294] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 255.401310] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 255.402364] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 255.402368] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 275.598964] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 275.598971] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 275.600400] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 275.600405] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 278.675143] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 278.675150] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
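For anyone comparing kernels, it may help to count these messages rather than eyeball them. The following is only a rough sketch, assuming the dmesg output has been saved as text with one message per line; the function name `count_mes_errors` and the category labels are made up for illustration, and the patterns are simply the message strings quoted in this report.

```python
import re
from collections import Counter

# Patterns taken from the messages quoted in this report.
# The category names on the left are hypothetical labels.
PATTERNS = {
    "mes_timeout": re.compile(r"MES failed to respond to msg="),
    "reg_wait_failed": re.compile(r"\*ERROR\* failed to reg_write_reg_wait"),
    "ring_full": re.compile(r"MES ring buffer is full"),
}


def count_mes_errors(dmesg_text: str) -> Counter:
    """Count how many lines match each known MES-related message."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Two message pairs copied from the log above.
SAMPLE = """\
[ 175.223625] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 175.223632] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 212.748665] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 212.748674] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
"""

if __name__ == "__main__":
    for name, n in count_mes_errors(SAMPLE).items():
        print(f"{name}: {n}")
```

Running it against a saved log from each kernel version would give comparable numbers for the timeout and ring-buffer messages, though it says nothing about hangs that produce no dmesg output.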
I got the patch applied, and it seems to have vastly reduced the issue. I did manage to get one occurrence:
[ 180.524963] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 180.524976] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
But I don't think it was while I had the browser open, possibly while shutting down the window manager. I didn't encounter any multi-second lockups anymore at least.
Running that now. The machine doesn't lock up, which 6.10.9 definitely did frequently, but it still stutters frequently on websites with lots of content/media while scrolling.
[ 65.651210] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 65.651218] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 130.558017] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 130.558119] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 130.558024] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 130.558123] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 130.575273] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 130.575273] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 130.575273] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 130.575273] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 130.575274] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 130.575274] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 130.575273] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 130.575274] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 130.575287] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 130.575289] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 130.575284] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 130.575285] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 130.575288] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 130.575290] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 130.575286] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 130.575297] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 132.761640] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 132.761646] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 132.761966] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 132.761970] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
[ 161.879177] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 161.879183] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
kernel: [drm:0xffffffffc084a98d] *ERROR* failed to reg_write_reg_wait
kernel: amdgpu 0000:0b:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
kernel: amdgpu 0000:0b:00.0: amdgpu: GPU Recovery Failed: -19
kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset end with ret = -19
kernel: amdgpu 0000:0b:00.0: amdgpu: device lost from bus!
kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
kernel: amdgpu 0000:0b:00.0: amdgpu: ring sdma0 timeout, signaled seq=210646, emitted seq=210648
kernel: amdgpu 0000:0b:00.0: amdgpu: GPU Recovery Failed: -19
kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset end with ret = -19
kernel: amdgpu 0000:0b:00.0: amdgpu: device lost from bus!
kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
kernel: amdgpu 0000:0b:00.0: amdgpu: ring sdma1 timeout, signaled seq=100357, emitted seq=100359
kernel: amdgpu 0000:0b:00.0: amdgpu: GPU Recovery Failed: -19
kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset end with ret = -19
kernel: amdgpu 0000:0b:00.0: amdgpu: device lost from bus!
kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
kernel: amdgpu 0000:0b:00.0: amdgpu: Process information: process FPSAimTrainer-W pid 22585 thread dxvk-submit pid 22694
kernel: amdgpu 0000:0b:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=11547191, emitted seq=11547193
kernel: amdgpu 0000:0b:00.0: [drm] *ERROR* [CRTC:79:crtc-0] flip_done timed out
I probably have a similar problem on an AMD 7900 GRE with the 6.11-rc7 kernel. Games freeze after a while with a black screen. I use Wayland with wlroots git + sway git, in a Vulkan session.
My video card died after the patch above, with a short circuit on the PCI-E lines. Could this error lead to a video card failure? The video card is new, less than one month old. There were no high loads on the card; only Kovaak was running.
So I have spent more time with the patch applied (MES timeout and fix for overfull ring buffer). The machine still hangs up frequently for a second or two (perhaps 2.1) with a frozen cursor at times, and continues to print the same set of errors. Note it does not include any SDMA ring timeouts as seen in some similar examples, just the register writes to MISC timing out. Is it possible there is an invalid command being sent to the card and ignored?
This still makes the W7900 unusable as of 6.11.2-arch1. "MES ring buffer is full" messages spam the console, and the machine has to be forcibly shut down. The problem seems to get worse the longer the machine is up, until the machine is basically frozen 90% of the time.
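To check whether the error rate really climbs with uptime, one option is to bucket the dmesg timestamps into fixed windows. This is a minimal sketch under a few assumptions: the log is plain dmesg text with `[seconds.micros]` timestamps (journal output without timestamps, like some snippets in this thread, would not work), and the names `errors_per_window`, `needle` and `window_s` are invented for this example.

```python
import re
from collections import Counter

# Matches a dmesg line of the form "[ 123.456789] message text".
TIMESTAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def errors_per_window(dmesg_text, needle="MES failed to respond", window_s=60.0):
    """Return {window_index: count} for lines containing `needle`.

    window_index is the uptime in seconds divided by window_s, rounded down,
    so index 2 with window_s=60 covers uptime 120s up to (not including) 180s.
    """
    buckets = Counter()
    for line in dmesg_text.splitlines():
        match = TIMESTAMP_RE.match(line)
        if match and needle in match.group(2):
            uptime = float(match.group(1))
            buckets[int(uptime // window_s)] += 1
    return dict(sorted(buckets.items()))


# Timeout lines copied from an earlier log in this thread.
SAMPLE = """\
[ 65.651210] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 130.558017] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 130.558119] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 132.761640] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
[ 161.879177] amdgpu 0000:96:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
"""

if __name__ == "__main__":
    for window, count in errors_per_window(SAMPLE).items():
        print(f"minute {window}: {count} timeouts")
```

A steadily rising count per window over a long session would support the "gets worse with uptime" observation; a flat count would suggest the freezes depend more on workload than on uptime.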
I can confirm that I just hit the same errors on 6.11.2.
This happened after launching a game.
Snippet:
kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
systemd[2408]: Started Konsole - Terminal.
kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
xdg-desktop-portal-kde[2729]: xdp-kde-settings: Namespace "org.gnome.desktop.interface" is not supported
kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
kernel: amdgpu 0000:c1:00.0: amdgpu: MES failed to respond to msg=MISC (WAIT_REG_MEM)
kernel: [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] *ERROR* failed to reg_write_reg_wait
kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
kernel: clocksource: Long readout interval, skipping watchdog check: cs_nsec: 2393107995 wd_nsec: 2393123008
kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
kernel: [drm] Fence fallback timer expired on ring sdma0
kernel: sched: RT throttling activated
kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
kernel: clocksource: Long readout interval, skipping watchdog check: cs_nsec: 6664097183 wd_nsec: 6664139435
kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
kernel: [drm] Fence fallback timer expired on ring sdma0
kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
kernel: [drm] Fence fallback timer expired on ring sdma0
kernel: [drm] Fence fallback timer expired on ring sdma0
kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0
kernel: [drm] Fence fallback timer expired on ring sdma0
kernel: amdgpu 0000:c1:00.0: amdgpu: MES ring buffer is full.
kernel: [drm] Fence fallback timer expired on ring gfx_0.0.0