*ERROR* IB test failed on gfx (-110) occasionally appears on my P14s Gen2 AMD when waking up from s0ix. The chance is ~1:1.
When this happens, the display flickers (interestingly, a GNOME screenshot only captures a transparent rectangle).
When this happens, I have to reboot the laptop to fully recover.
This never happens with s3 sleep.
Hardware description:
CPU: 5850u
GPU: Cezanne
System information:
Distro name and Version: Fedora 35
Kernel version: 5.15.6
DE Version: GNOME 41.1 (Wayland)
How to reproduce the issue:
echo s2idle | sudo tee -a /sys/power/mem_sleep
Close the lid
Open the lid
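Before closing the lid, it can be worth confirming which suspend variant is actually selected: the kernel marks the active entry in /sys/power/mem_sleep with square brackets. A minimal sketch of reading it back; the helper name active_sleep_mode is made up for illustration and is not part of the original report:

```shell
# Hypothetical helper: print the bracketed (active) entry of a
# mem_sleep-style string such as "s2idle [deep]".
active_sleep_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# On a real system (guarded so the snippet is harmless elsewhere):
if [ -r /sys/power/mem_sleep ]; then
    active_sleep_mode "$(cat /sys/power/mem_sleep)"
fi
```

If this does not print s2idle after the echo step, the following suspend would not have used s0ix.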
Attached files:
Screenshots/video files
[ TODO: For rendering errors, attach screenshots of the problem and (if
possible) of how it should look. For freezes, it may be useful to provide a
screenshot of the affected game scene. Prefer screenshots over videos. ]
Log files (for system lockups / game freezes / crashes)
Alright. I actually have no clue what all the errors are about tbh.
Can you potentially try with the newer firmware at linux-firmware.git 0x101001c and see if it persists? This was uploaded as commit fbdf20e
This didn't hit the Fedora repos yet, right? If there isn't an easy way to build an rpm out of it, I am afraid we have to wait until it is in the F35 repo...
F35 has linux-firmware 20211027, but the referenced commit landed after that upstream.
FWIW, you could manually put the files from upstream Git into /lib/firmware/amdgpu/ and re-generate /boot/initramfs-$(uname -r).img. If something goes wrong, in the worst case you can boot another kernel and reinstall the Fedora linux-firmware package, which should re-generate the initrd as well.
/lib/firmware/amdgpu/ seems to contain compressed .bin.xz files, while the upstream files are raw .bin files. Do I still have to transform the files somehow before placing them there?
In the meantime, I got another s0ix resume failure, again showing the *ERROR* IB test failed on gfx (-110) warning, but this time without the *ERROR* Error waiting for DMUB idle: status=3 warning.
Good question. I guess the safest option would be to compress the new files with xz as well.
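A rough sketch of what compressing the upstream files could look like. The helper name and the example paths are assumptions for illustration. The CRC32 option is used because the kernel's built-in XZ decoder, as far as I know, does not accept xz's default CRC64 integrity check; nothing in this thread confirms whether that matters for this particular setup.

```shell
# Hypothetical helper: compress every *.bin in SRC into DST as *.bin.xz.
# CRC32 is requested explicitly instead of xz's default CRC64 check.
compress_firmware() {
    src=$1
    dst=$2
    for f in "$src"/*.bin; do
        [ -e "$f" ] || continue
        xz --check=crc32 --stdout "$f" > "$dst/$(basename "$f").xz"
    done
}

# Assumed usage as root (paths are illustrative only):
#   compress_firmware ./linux-firmware/amdgpu /lib/firmware/amdgpu
#   dracut --force "/boot/initramfs-$(uname -r).img" "$(uname -r)"
```

Keeping a copy of the original .bin.xz files before overwriting them makes it easier to roll back if the new initramfs does not boot.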
Turns out, that didn't work (the kernel didn't boot, showed the Lenovo logo forever).
Went back to an older kernel then, which still booted. However, reinstalling linux-firmware and regenerating the initramfs still didn't give me a booting kernel. I then reinstalled kernel-*, which gave me a booting kernel again.
Do you know how to use the Fedora developer tools? If so, you might be able to generate an updated rpm and test it that way. In brief, you should fedpkg clone linux-firmware and slip in the updated linux-firmware tarball created by make dist or equivalent.
Thanks for the idea, but I unfortunately don't have any experience.
I have cloned the existing linux-firmware packaging spec now and this builds just fine. But tbh, I have no idea how to force it to use for example the linux-firmware git directly or a local tar.gz.
If you are having trouble updating your initrd, you can just load the driver after the system has booted. Put the new firmware in /lib/firmware/amdgpu, then append modprobe.blacklist=amdgpu 3 to the kernel command line in GRUB. Once the system has booted to a console, run the following:
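The command block from this comment is not preserved in this copy of the thread. Going by the later suggestion in this thread to run sudo modprobe amdgpu and then look at dmesg, it was presumably something along these lines. This is a guess, not the original commands, and the function name is made up:

```shell
# Assumption: the system was booted to a text console with
# modprobe.blacklist=amdgpu on the kernel command line, and this runs
# as root. Loads the driver, then shows its most recent kernel messages.
load_amdgpu_and_show_log() {
    modprobe amdgpu
    dmesg | grep -i amdgpu | tail -n 40
}
```

Calling load_amdgpu_and_show_log from the console (or over SSH) loads the driver with the firmware currently in /lib/firmware/amdgpu and prints the last amdgpu-related log lines.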
Björn Daase changed title from ERROR IB test failed on gfx (-110) occasionally on P14s Gen2 AMD when waking up from s0ix to ERROR IB test failed on gfx (-110) on P14s Gen2 AMD when waking up from s0ix
FYI, I don't think it fixes the underlying issue you see ("Error waiting for DMUB idle: status=3"); it just might help you recover. As for the underlying issue, this is actually a regression, right? Do you think you could bisect back to what caused it?
Hmm, unfortunately none of my custom-built kernels (even an unmodified 5.15.7 using the Fedora kernel.spec, but also the one with your suggested patch) boot past fb0: switching to amdgpudrmfb from efi vga. I have no idea what I am doing wrong here...
May I ask how you are building your kernel? For official sources, you should use the kernel-ark GitLab repository, run make dist-srpm, and then rebuild the resulting srpm. This should give you pretty much the same kernel as the official Fedora one. You will need to disable Secure Boot, or jump through a great many hoops.
How long have you waited? E.g. if some needed firmware file is missing from the initrd, there's a 3 minute timeout IIRC.
A general troubleshooting trick is to boot with modprobe.blacklist=amdgpu on the kernel command line, then after boot-up log in via SSH and run sudo modprobe amdgpu. That should allow getting more information from dmesg at least.
~8 hours, I gave it some time overnight. But it was still hanging there in the morning.
I might be able to try this at the weekend. But I am still wondering what I am doing wrong, because it should be quite a well-tested scenario...
Do you still have modprobe.blacklist=amdgpu in your GRUB command line? On Fedora, the boot line of a newly installed kernel is based on that of the most recent installed kernel. Maybe your self-built kernels ended up inheriting some changes you made.
You can verify this by looking at the files in /boot/loader/entries.
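As a sketch of that check, the entries can be searched for the blacklist option directly. The helper name is hypothetical, and the pattern assumes the option appears literally in the entry files rather than being pulled in through a variable:

```shell
# Hypothetical helper: list boot loader entries whose contents mention
# a modprobe.blacklist option that includes amdgpu. The directory
# defaults to the location named above; reading it may require root.
entries_blacklisting_amdgpu() {
    dir=${1:-/boot/loader/entries}
    grep -l 'modprobe\.blacklist=[^ ]*amdgpu' "$dir"/*.conf 2>/dev/null
}
```

Any file name it prints belongs to a boot entry that would still keep amdgpu from loading automatically at boot.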
FWIW I used to see [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3 in kernel logs with a 6600XT and kernel 5.15.10-arch1-1; the GPU wouldn't come back from sleep (screen black) and the GPU sensors appeared to be stuck.
Until yesterday, I thought the situation was as follows. On the P14s Gen2a there are three configurations available:
"Windows 10" mode selected in BIOS, "s2idle" selected in /sys/power/mem_sleep
"Linux" mode selected in BIOS, "s2idle" selected in /sys/power/mem_sleep
"Linux" mode selected in BIOS, "deep" selected in /sys/power/mem_sleep
It seemed to me that the first and second cases (so "s2idle", no matter which BIOS mode is selected) should behave the same.
It turns out, however, that it DOES make a difference whether it's "Windows 10" or "Linux" mode selected in BIOS. I was testing with "Linux" all the time and got the crashes seen in this thread. However, when selecting "Windows 10", the laptop suspends and resumes just fine.
What do you guys think is happening here? Is this an AMD or Lenovo issue? It's really weird that this behaves differently even though you can't see a difference when using the OS.
Re-opening this issue as the patch series to drop s2idle from AMD systems when configured to S3 was rejected. I've sent up https://patchwork.freedesktop.org/patch/469143/ as another idea to try out here.