The system randomly freezes and doesn't react to anything afterwards. Not even the magic SysRq keys can reboot the system.
Processor model is Intel Core i7-5500U with the integrated GPU.
Kernel version is 4.0.0-rc1, which is required to even get X / gdm working with the system.
I've attached the kernel log messages, which show an instance of this problem.
Please request any information needed and I'll happily provide it.
Short version
---
Adding 'intel_iommu=igfx_off' to the kernel command line helped.
Long version
---
I've tried many things to resolve this issue, from kernel reconfiguration to installing mesa, libdrm, and the intel drivers from the latest repository masters, none of which helped. I then reverted to the most recent releases of the packages.
From an older forum entry I found that this could be related to virtualization technologies and memory remapping, so I added the following argument to my kernel command line: 'intel_iommu=igfx_off'
Ever since (about 10-15 hours of very active usage) I haven't had a single freeze.
I still think this is not normal behavior, since turning off iommu for the GPU can't be the right or necessary thing to do.
Some additional info: my BIOS (3rd gen X1 Carbon) apparently marks x2apic as broken. I booted a number of times with intremap=no_x2apic_optout on the kernel command line, and saw what steveej mentioned: a hard freeze.
The system did have the foresight to save the dmesg into the EFI pstore. I have those logs if they are useful.
After removing no_x2apic_optout, the kernel "Enabled IRQ remapping in xapic mode", and under xapic the kernel was sometimes able to recover/reset the chip to an ok-enough state that I could save dmesg and grab the GPU dump from /sys/class/drm/card0/error.
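For anyone else who gets a recoverable hang: a minimal Python sketch for copying the GPU dump out of sysfs before the next reboot loses it. The function name `save_gpu_error` and the output file naming are my own, not something the driver provides; only the sysfs path comes from this report. Reading the file usually needs root.

```python
import time
from pathlib import Path


def save_gpu_error(src="/sys/class/drm/card0/error", dest_dir="."):
    """Copy the i915 error state to a timestamped file.

    Returns the destination path, or None if the source file is missing.
    The helper name and file naming scheme are illustrative assumptions.
    """
    src_path = Path(src)
    if not src_path.is_file():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"i915-error-{stamp}.txt"
    # Read the whole file in one go; sysfs files do not report a useful size.
    dest.write_bytes(src_path.read_bytes())
    return dest


if __name__ == "__main__":
    saved = save_gpu_error()
    print("saved to", saved if saved else "nothing (no error file found)")
```

The dump can be large, so compressing it before attaching it to a bug is usually a good idea.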
I found the same problem as in comment 4. If I disable VT-d in the BIOS, the crashes disappear. But then I get random segmentation faults from GCC if I try to compile QtWebKit (N.B.: I run Gentoo and compile all packages myself). Hence, I have two options:
(1) Disable VT-d for daily work such that i915 does not crash
(2) Enable VT-d and only boot into text console mode if I need to compile QtWebKit
In my case this issue (googling for the opcode hanging the GPU led me to this bug) was solved by disabling the EFI framebuffer in the kernel configuration.
If the devs want, I can open a second bug to request that the Intel GFX drivers take over from early framebuffers (for example EFI or VGA) to prevent my particular issue.
Looking at the first two dumps, this looks like it might be a simple driver bug. The driver forgets to use the DMA API and wrongly just hands a physical address to the device. The device does DMA to that invalid address, takes a well-deserved fault, and is subsequently unhappy.
The faulting addresses do not look like addresses which would be given out as virtual DMA addresses by the DMA API. Such addresses would typically start at 0xfffff000 and grow downwards.
I reported bug #90091 initially, which was marked as duplicate of this one.
I've tried to reproduce with 4.2.2, and it still happens. Two dmesgs are attached, the second one with i915.enable_execlists=0. In the latter case I only have the DMAR fault but not the GPU hang.
Attachment 118605, "kernel log on 4.2.2": dmesg.txt
I have the same problem, and Google Chrome (as in the binary release, not Chromium) seems to trigger it. Disabling 3D rendering in X has helped, as now I get a crash and hang every couple of weeks instead of around once a day.
Here's my uname output, in case it helps:
Linux laurana 4.0.5-gentoo #5 SMP Tue Sep 22 09:45:32 CEST 2015 x86_64 Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz GenuineIntel GNU/Linux
I'll try the 'intel_iommu=igfx_off' kernel option and see if that improves matters. Failing that, I'll disable VT-d, unless it is required by Docker.
If I can provide any helpful info, please let me know and I'll be happy to do so.
I have a Dell XPS 13 9350 (2016, Intel Core i7-6560U) running Fedora 23 (GNOME), currently on kernel 4.4.5-300.fc23.x86_64.
As soon as I open Chrome or Firefox and, say, open YouTube, the whole laptop hangs after a few seconds and is totally unresponsive. I need to hold the power button for a few seconds to power off the device.
I was able to capture a log once, and it is attached.
I added "intel_iommu=igfx_off" to /etc/default/grub and regenerated the GRUB config (grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg), but it does not solve the problem here.
If any more info is needed, I would be glad to provide it. I hope I will be able to use my new notebook without issues.
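When the workaround seems to have no effect, it is worth confirming that the option actually reached the running kernel (a regenerated config written to the wrong path is easy to miss). A minimal Python sketch, assuming the option is passed as `intel_iommu=...` with comma-separated values; the helper name `igfx_off_active` is made up for this example, and quoted parameters containing spaces are not handled.

```python
from pathlib import Path


def igfx_off_active(cmdline: str) -> bool:
    """Return True if any intel_iommu= token on the command line lists igfx_off."""
    for token in cmdline.split():
        name, _, value = token.partition("=")
        # intel_iommu takes a comma-separated list, e.g. "on,igfx_off".
        if name == "intel_iommu" and "igfx_off" in value.split(","):
            return True
    return False


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if proc.exists():
        print("igfx_off active:", igfx_off_active(proc.read_text()))
```

If this prints False after a reboot, the problem is in the bootloader configuration rather than in the workaround itself.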
Same behavior here. I've tried disabling other framebuffer devices such as EFI, but so far the only thing that has worked for me is disabling the IOMMU for the i915 GPU, which is not a clean approach.
I've attached my dumps and hope they help. I'll try Arch Linux anyway to see if it has the same problem.
It is still unfixed as of now, more than one year after it was reported. It is crucial: it makes the laptop unusable for ANY kind of video task, since with igfx_off it works really slowly. Will such a critical bug finally be assigned to someone? It was reported so many times and affected so many users... Or at least any thoughts, like where to look in this damn DMAR code for this bug?
This is actually fixed for me in 4.7. I have a Thinkpad X1 Carbon with an Intel i7-5600U, and haven't seen any issues when enabling the IOMMU over the last month or so (I don't remember if things also worked under 4.6, but I was running 4.7 RCs for a bit).
I built and tried 4.7.2 last night, and enabling VT-d still causes the freezes, so I don't think it is fixed yet. When the system was able to recover from a freeze, the kernel log contained the following:
[ 1936.694513] [drm] stuck on render ring
[ 1936.694899] [drm] GPU HANG: ecode 8:0:0x85dffffb, in X [3356], reason: Engine(s) hung, action: reset
[ 1936.696494] drm/i915: Resetting chip after gpu hang
And the last one that killed it, requiring the power button to be held to turn the machine off, had a different ecode:
[ 1944.706379] [drm] stuck on render ring
[ 1944.706694] [drm] GPU HANG: ecode 8:0:0xbf9fffff, reason: Engine(s) hung, action: reset
[ 1944.708378] drm/i915: Resetting chip after gpu hang
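Since the ecode differs between the recoverable hang and the fatal one, it can help to pull these fields out of a longer dmesg and compare them. Below is a minimal Python sketch that extracts the ecode, the process (when the line names one), the reason and the action from "GPU HANG" lines. The regular expression is written against the line format seen in this report only; other kernel versions may print it differently, and the function name `parse_gpu_hangs` is hypothetical.

```python
import re

# Pattern derived from the log lines in this report; not an official format spec.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[0-9]+:[0-9]+:0x[0-9a-fA-F]+)"
    r"(?:, in (?P<process>\S+) \[(?P<pid>[0-9]+)\])?"
    r", reason: (?P<reason>.*?), action: (?P<action>\w+)"
)

SAMPLE = """\
[ 1936.694513] [drm] stuck on render ring
[ 1936.694899] [drm] GPU HANG: ecode 8:0:0x85dffffb, in X [3356], reason: Engine(s) hung, action: reset
[ 1944.706694] [drm] GPU HANG: ecode 8:0:0xbf9fffff, reason: Engine(s) hung, action: reset
"""


def parse_gpu_hangs(log_text):
    """Return one dict per GPU HANG line; process and pid are None if absent."""
    hangs = []
    for line in log_text.splitlines():
        match = HANG_RE.search(line)
        if match is None:
            continue
        info = match.groupdict()
        if info["pid"] is not None:
            info["pid"] = int(info["pid"])
        hangs.append(info)
    return hangs


if __name__ == "__main__":
    for hang in parse_gpu_hangs(SAMPLE):
        print(hang["ecode"], hang["process"], hang["reason"])
```

Feeding it the output of `dmesg` from several boots makes it easy to see which ecodes recur across reports.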
First of all, sorry, I forgot to add the comment to the file.
I think I am also hitting this bug. The system is a Lenovo ThinkPad X250 with an Intel(R) Core(TM) i5-5200U and HD Graphics 5500.
Since I started using the IOMMU I have been getting hangs and also some X restarts. This is what dmesg says after the fact:
[ 1409.513438] DMAR: DRHD: handling fault status reg 3
[ 1409.513451] DMAR: [DMA Read] Request device [00:02.0] fault addr f5995000 [fault reason 05] PTE Write access is not set
[ 1409.513468] DMAR: DRHD: handling fault status reg 3
[ 1409.513471] DMAR: [DMA Write] Request device [00:02.0] fault addr f5968000 [fault reason 23] Unknown
[ 1418.830396] [drm] GPU HANG: ecode 8:0:0x85dffffb, in kwin_x11 [2191], reason: Hang on render ring, action: reset
[ 1418.830407] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 1418.830409] [drm] Please file a new bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 1418.830410] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 1418.830412] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 1418.830413] [drm] GPU crash dump saved to /sys/class/drm/card0/error
I will upload said crash dump here. If I can provide any more debug logs or should test something, I would be glad to help.
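To compare DMAR faults across reports (which device faulted, read or write, at which address), the fault lines can be parsed mechanically. A minimal Python sketch follows, with the pattern taken from the two fault lines in the log above; it is an assumption that other kernels print the same layout, and the name `parse_dmar_faults` is made up. The fault reason is kept as the raw string because its number base is not stated in the log.

```python
import re

# Pattern derived from the DMAR lines quoted in this report.
FAULT_RE = re.compile(
    r"DMAR: \[DMA (?P<direction>Read|Write)\] "
    r"Request device \[(?P<device>[0-9a-fA-F:.]+)\] "
    r"fault addr (?P<addr>[0-9a-fA-F]+) "
    r"\[fault reason (?P<reason>[0-9a-fA-F]+)\] (?P<text>.*)"
)

SAMPLE = """\
[ 1409.513438] DMAR: DRHD: handling fault status reg 3
[ 1409.513451] DMAR: [DMA Read] Request device [00:02.0] fault addr f5995000 [fault reason 05] PTE Write access is not set
[ 1409.513468] DMAR: DRHD: handling fault status reg 3
[ 1409.513471] DMAR: [DMA Write] Request device [00:02.0] fault addr f5968000 [fault reason 23] Unknown
"""


def parse_dmar_faults(log_text):
    """Return one dict per DMAR fault line, with the address as an integer."""
    faults = []
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match is None:
            continue
        fault = match.groupdict()
        fault["addr"] = int(fault["addr"], 16)  # printed in hex without 0x
        fault["text"] = fault["text"].strip()
        faults.append(fault)
    return faults


if __name__ == "__main__":
    for fault in parse_dmar_faults(SAMPLE):
        print(fault["device"], fault["direction"], hex(fault["addr"]), fault["text"])
```

Device 00:02.0 is the one named in the faults above; listing the addresses side by side makes it easier to check them against the earlier comment about what DMA API addresses usually look like.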
*** Bug 98309 has been marked as a duplicate of this bug. ***
I would be careful not to mix gen8/gen9 reports for the moment, not until we
have the root cause.
Well, I pointed out earlier that one of the causes seems to be a conflict between the UEFI framebuffer driver and the Intel one, most likely because of a race condition or both trying to access the same hardware at the same time.
For me disabling the EFI framebuffer solved the issue so far so maybe other reporters may want to test and see if that solves the issue for them too.
I've neither had efifb nor fbsimple enabled on my XPS 15 9550, but I didn't get rid of this problem until I added "intel_iommu=igfx_off" to my bootargs.
Interesting, it seems there is more than one instance of this bug, then. Do you have any other FB or driver that interacts with the Intel card other than Intel's modesetting one? The VGA console could be one such driver.
I failed to mention that I didn't come across this bug until trying some drm-intel-nightly based on 4.9.0, but efifb was never involved.
intel_iommu=igfx_off also solved a long-standing bug with suspend on my machine, bug 97211.
But, I don't know if what I'm experiencing is relevant here, since I didn't see this bug appear until just recently with drm-intel-nightly (and fixed again by using intel_iommu=igfx_off)
I guess we both are experiencing different bugs with similar symptoms.
In my case at least, the UEFI display driver clashes with the Intel one resulting in the IOMMU violations. This seems to be some kind of firmware bug where the firmware isn't playing along with the MMU settings and Intel's driver.
In yours, the cause may come from somewhere else, hopefully the devs can provide more guidance on what is triggering your case, but if sleep is involved chances are that firmware is somehow part of the issue.