[NVE4] freezes: HUB_INIT timed out
Submitted by: wolf480@interia.pl
Assigned to: Nouveau Project
Description
Created attachment 117518
dmesg log from freeze with runpm=0
I have a Medion X7827 laptop with GK104 GPU in an Optimus setup, running:
Linux 4.1.2 x86_64
Mesa 10.6.2
Xorg 1.17.2
I've been experiencing two kinds of freezes:
- a total freeze (no ping, no SysRq; only a hard reset helps) shortly after Xorg starts, if nouveau is loaded without runpm=0
- a recoverable freeze (SysRq+K worked) when exiting Xorg, if nouveau is loaded with runpm=0
On the #nouveau IRC channel I was told to try the hack-gk106m branch of this repository: http://... , with runpm=0.
At first I thought it helped, but then I noticed that the freezes still happen randomly.
When runpm=0 is set, the freeze happens about 60% of the time. I've tested both the in-tree nouveau.ko and one built from the hack-gk106m branch, and the chance appears to be the same for both.
When the freeze happens, there's either a "HUB_INIT timed out" message or "grctx template channel unload timeout" message in dmesg.
When a freeze is going to happen, the error message first shows up at nouveau module load time, and then again when Xorg starts. Full logs are in the attachments.
I made mmiotraces of the nouveau.ko from the hack-gk106m branch (I can repeat them with the in-tree nouveau.ko if necessary), with runpm=0, for each of the following cases:
- the driver loading successfully
- the driver loading with HUB_INIT timeout error
- the driver loading with grctx timeout error
The traces and corresponding dmesg logs are in the attachments. I have more traces, but included only one per case.
I did not try to start Xorg and trigger the freeze during the mmiotraces, because:
a) I believe the problem happens at nouveau load time, when it tries to initialize the GPU.
b) The traces, compressed with `xz -9`, barely fit within Bugzilla's maximum attachment size; if they were longer, I doubt I could make them fit.
I hope these traces will be useful and help figure out why it sometimes works and sometimes doesn't, and how to make it work every time.
Let me know if there's anything more I could do to help you figure this out.
**Attachment 117518**, "dmesg log from freeze with runpm=0":
dmesg.log