RX 6600 driver reset while browsing with Firefox
System information
- OS: EndeavourOS
- GPU: VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] [1002:73ff] (rev c7)
- Kernel version: 6.7.4-arch1-1
- Mesa version: OpenGL version string: 4.6 (Compatibility Profile) Mesa 23.3.5-arch1.1
- Xserver version (if applicable): 1.21.1.11
- Desktop manager and compositor: KDE/X11
Describe the issue
Very rarely (twice in the past month), the driver resets itself while I am browsing with Firefox. It starts with the desktop environment freezing (the cursor can still move at this point), and then the driver resets. Since the issue isn't reliably reproducible and seems to happen randomly, I'll list what both driver resets had in common:
- Both times I had several YouTube tabs open. Although I wasn't actually playing any video at the time, I still suspect that video might have something to do with this.
- Both times the audio player Audacious was running (I don't know whether this could be related, but I want to mention it).
- The Telegram web app was open in a tab. I noticed that this app sometimes loads videos/GIFs quite aggressively.
- The screen had previously been in sleep mode (KDE setting).
- I also noticed that when I'm gaming with a music video playing in the background in Firefox, tabbing out freezes the video for a second or two. I'm not sure whether this is related.
However, I can't reproduce it, so all the context I can give at this point is what I was doing during these resets and what they had in common. I still want to report the issue in case someone else is affected by it, since that might help with the investigation.
Regression
The issue appeared out of nowhere. It first occurred a week after upgrading to Mesa 23.3.3-1, which was the same week Arch switched from dbus-daemon to dbus-broker; other than that, no new applications have been installed since. Because it happens so randomly, it's hard to pin down which Mesa driver or kernel version started it. On Windows I ran all kinds of benchmarks and VRAM tests using OCCT and couldn't reproduce any issue. Games also work fine so far.
Log files as attachment
The only relevant output I can provide seems to be the following kernel log:
Feb 12 17:39:00.356653 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:6 pasid:32770, for process firefox pid 1683 thread firefox:cs0 pid 1687)
Feb 12 17:39:00.356924 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: in page starting at address 0x0000800300120000 from client 0x1b (UTCL2)
Feb 12 17:39:00.357064 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00601031
Feb 12 17:39:00.357194 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
Feb 12 17:39:00.357323 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: MORE_FAULTS: 0x1
Feb 12 17:39:00.357448 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: WALKER_ERROR: 0x0
Feb 12 17:39:00.357577 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: PERMISSION_FAULTS: 0x3
Feb 12 17:39:00.357736 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: MAPPING_ERROR: 0x0
Feb 12 17:39:00.357897 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: RW: 0x0
Feb 12 17:39:00.358062 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:6 pasid:32770, for process firefox pid 1683 thread firefox:cs0 pid 1687)
Feb 12 17:39:00.358220 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: in page starting at address 0x0000800300128000 from client 0x1b (UTCL2)
Feb 12 17:39:00.358378 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Feb 12 17:39:00.358537 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
Feb 12 17:39:00.358695 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: MORE_FAULTS: 0x0
Feb 12 17:39:00.358856 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: WALKER_ERROR: 0x0
Feb 12 17:39:00.359012 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Feb 12 17:39:00.359170 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: MAPPING_ERROR: 0x0
Feb 12 17:39:00.359326 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: RW: 0x0
Feb 12 17:39:10.386384 Linux kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=9749804, emitted seq=9749806
Feb 12 17:39:10.386453 Linux kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox pid 1683 thread firefox:cs0 pid 1687
Feb 12 17:39:10.386468 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
Feb 12 17:39:10.576396 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: MODE1 reset
Feb 12 17:39:10.576609 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: GPU mode1 reset
Feb 12 17:39:10.576782 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: GPU smu mode1 reset
Feb 12 17:39:11.096407 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: GPU reset succeeded, trying to resume
Feb 12 17:39:11.096714 Linux kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
Feb 12 17:39:11.096739 Linux kernel: [drm] VRAM is lost due to GPU reset!
Feb 12 17:39:11.096756 Linux kernel: [drm] PSP is resuming...
Feb 12 17:39:11.176395 Linux kernel: [drm] reserve 0xa00000 from 0x81fd000000 for PSP TMR
Feb 12 17:39:11.283062 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: RAS: optional ras ta ucode is not available
Feb 12 17:39:11.299728 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Feb 12 17:39:11.299898 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is resuming...
Feb 12 17:39:11.300031 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw program = 0, version = 0x003b2f00 (59.47.0)
Feb 12 17:39:11.300157 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU driver if version not matched
Feb 12 17:39:11.300284 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: use vbios provided pptable
Feb 12 17:39:11.349712 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: SMU is resumed successfully!
Feb 12 17:39:11.353101 Linux kernel: [drm] DMUB hardware initialized: version=0x02020020
Feb 12 17:39:11.473049 Linux kernel: [drm] kiq ring mec 2 pipe 1 q 0
Feb 12 17:39:11.476378 Linux kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
Feb 12 17:39:11.476393 Linux kernel: [drm] JPEG decode initialized successfully.
Feb 12 17:39:11.476405 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Feb 12 17:39:11.476571 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Feb 12 17:39:11.476703 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Feb 12 17:39:11.476829 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Feb 12 17:39:11.476954 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Feb 12 17:39:11.477077 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Feb 12 17:39:11.477200 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Feb 12 17:39:11.477323 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Feb 12 17:39:11.477446 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Feb 12 17:39:11.477570 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
Feb 12 17:39:11.477720 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Feb 12 17:39:11.477878 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
Feb 12 17:39:11.478036 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
Feb 12 17:39:11.478195 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
Feb 12 17:39:11.478355 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
Feb 12 17:39:11.478510 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
Feb 12 17:39:11.478666 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: recover vram bo from shadow start
Feb 12 17:39:11.488453 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: recover vram bo from shadow done
Feb 12 17:39:11.488603 Linux kernel: amdgpu 0000:2d:00.0: amdgpu: GPU reset(2) succeeded!
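For anyone triaging this: the raw GCVM_L2_PROTECTION_FAULT_STATUS word in the log can be split into the fields the driver prints right below it. The Python sketch below does that. The bit positions are an assumption, chosen because they reproduce the driver's own decoding of 0x00601031 (MORE_FAULTS 0x1, PERMISSION_FAULTS 0x3, client 0x8/TCP, and vmid 6 from the page-fault line). They are not taken from AMD documentation. The helper names `decode_fault_status`, `statuses_in` and `CLIENT_NAMES` are hypothetical.

```python
import re

# ASSUMED bit layout (name, shift, width) for GCVM_L2_PROTECTION_FAULT_STATUS
# on Navi 23; inferred by matching the driver's decoded output in the log above.
FIELDS = [
    ("MORE_FAULTS", 0, 1),
    ("WALKER_ERROR", 1, 3),
    ("PERMISSION_FAULTS", 4, 4),
    ("MAPPING_ERROR", 8, 1),
    ("CID", 9, 7),
    ("RW", 16, 1),
    ("VMID", 20, 4),
]

# Only the two client IDs that actually appear in this log.
CLIENT_NAMES = {0x0: "CB/DB", 0x8: "TCP"}

STATUS_RE = re.compile(r"GCVM_L2_PROTECTION_FAULT_STATUS:(0x[0-9a-fA-F]+)")


def decode_fault_status(status: int) -> dict[str, int]:
    """Split a raw status word into its (assumed) bitfields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, shift, width in FIELDS
    }


def statuses_in(log_text: str) -> list[int]:
    """Collect every raw fault status value found in a block of kernel log text."""
    return [int(raw, 16) for raw in STATUS_RE.findall(log_text)]


if __name__ == "__main__":
    # Two status lines copied from the Feb 12 log above.
    excerpt = (
        "amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00601031\n"
        "amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000\n"
    )
    for status in statuses_in(excerpt):
        fields = decode_fault_status(status)
        client = CLIENT_NAMES.get(fields["CID"], "unknown")
        summary = ", ".join(f"{k}=0x{v:x}" for k, v in fields.items())
        print(f"0x{status:08x}: {summary} (client {client})")
```

Applied to the excerpt, the first status word decodes to a permission fault (PERMISSION_FAULTS=0x3) raised by the TCP client in VMID 6, which matches the driver's own printout. The second status word is all zeros.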