[RADEON:KMS::EDID] i2c bit banging + preempt kernel -> i2c failure (random XRandR failures)
Submitted by Arno Schuring
Assigned to Default DRI bug account
Description
A few weeks ago, X started displaying weird errors during normal use. These range from simple output resets to application crashes due to X BadAlloc responses. In detail, this is what I see happening:
- output blink. Usually the screen goes dark for about a second. When the screen returns, I see either:
  - display position getting cancelled (xrandr --pos 0x0) on my left screen.
    When this happens I can usually get it back to the correct position via the xrandr command line tool, but sometimes that triggers a WM crash
  - display rotation getting cancelled (xrandr --rotate normal) on the right screen. When this happens:
    - quodlibet (python-gtk app) crashing with BadAlloc (serial 255 error_code 11 request_code 53 minor_code 0). It will keep crashing like this until I restart X
    - whatever other app I had running on the right screen has disappeared as well
    - window manager (e17) crashing. It can recover successfully, though
    - when I re-enable screen rotation, the display gets completely garbled
- quodlibet (python-gtk app) crashing with BadAlloc (serial 255 error_code 11 request_code 53 minor_code 0). It will keep crashing like this until I restart X
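For readers who want to translate the numeric fields of that X error, here is a minimal Python sketch. The two lookup tables hold only a few entries copied from the core X11 protocol numbering (X11/X.h and X11/Xproto.h), and the function name `decode_x_error` is made up for this illustration.

```python
import re

# A few core X11 protocol numbers (from X11/X.h and X11/Xproto.h).
# Deliberately incomplete: only enough to decode the error quoted above.
X_ERROR_CODES = {
    1: "BadRequest",
    2: "BadValue",
    3: "BadWindow",
    4: "BadPixmap",
    9: "BadDrawable",
    11: "BadAlloc",
}
X_REQUEST_CODES = {
    1: "X_CreateWindow",
    53: "X_CreatePixmap",
    54: "X_FreePixmap",
    55: "X_CreateGC",
}

# Matches the field layout printed by the default Xlib/GDK error text.
PATTERN = re.compile(
    r"serial (\d+) error_code (\d+) request_code (\d+) minor_code (\d+)"
)


def decode_x_error(message):
    """Return the decoded fields of an X error line, or None if absent."""
    match = PATTERN.search(message)
    if match is None:
        return None
    serial, error, request, minor = (int(g) for g in match.groups())
    return {
        "serial": serial,
        "error": X_ERROR_CODES.get(error, "unknown (%d)" % error),
        "request": X_REQUEST_CODES.get(request, "unknown (%d)" % request),
        "minor": minor,
    }


if __name__ == "__main__":
    line = "BadAlloc (serial 255 error_code 11 request_code 53 minor_code 0)"
    print(decode_x_error(line))
```

Decoded this way, request_code 53 is X_CreatePixmap, which is the same request named in the firefox abort in the xsession-errors excerpt, so both applications appear to be failing on pixmap allocation.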
I first saw this with 2.6.35-rc3 (I was running 2.6.34-rc6 before that). With 2.6.35.1 it happens a lot less frequently (down from once a day to twice a week).
From the latest crash, I can give the following info from xsession-errors:
(E17 init)
E17 INIT: XINERAMA CHOSEN: [1], 1080x1920+1280+0
E17 INIT: XINERAMA CHOSEN: [0], 1280x1024+0+896
(output blinking starts)
E17 INIT: XINERAMA SCREEN: [0], 1280x1024+0+0
E17 INIT: XINERAMA CHOSEN: [153045780], 0x153045764+0+153046580
(e17 crash, try to recover)
E17 INIT: XINERAMA CHOSEN: [1], 1920x1080+1280+0
E17 INIT: XINERAMA CHOSEN: [0], 1280x1024+0+0
(rotation lost on [1], position lost on [0])
E17 INIT: XINERAMA CHOSEN: [1], 1080x1920+1280+0
E17 INIT: XINERAMA CHOSEN: [141138596], 0x141138580+0+141139396
E17 INIT: XINERAMA CHOSEN: [0], 1280x1024+0+896
(restoring rotation confuses E17)
###!!! ABORT: X_CreatePixmap: BadAlloc (insufficient resources for operation);
6 requests ago: file nsX11ErrorHandler.cpp, line 182
UNKNOWN [/usr/lib/xulrunner-1.9.2/libxul.so +0x001CA781]
(firefox crashing)
Now, the reason for the E17 crash looks like a use-after-free bug. Question is: who is freeing the Xinerama structures, and why are they freed at all? I never had these problems before, and when using stock 2.6.32 (from Debian) I don't have these problems either, so I'm ruling out hardware failure.
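To pick the corrupted entries out of a longer xsession-errors file, a small Python filter along these lines could help. It is only a sketch: the regular expression is inferred from the handful of lines quoted above, and the thresholds `MAX_SCREENS` and `MAX_COORD` are assumptions chosen for illustration, not values taken from E17 or Xinerama.

```python
import re

# Line layout inferred from the excerpt: "[index], WIDTHxHEIGHT+X+Y".
LINE = re.compile(
    r"E17 INIT: XINERAMA (?:CHOSEN|SCREEN): "
    r"\[(\d+)\], (\d+)x(\d+)\+(\d+)\+(\d+)"
)

MAX_SCREENS = 16    # assumption: no real setup here has more heads
MAX_COORD = 32767   # assumption: X11 coordinates fit in 16 bits


def suspicious_entries(log_text):
    """Return the Xinerama lines whose values cannot describe a real screen."""
    bad = []
    for line in log_text.splitlines():
        match = LINE.search(line)
        if match is None:
            continue
        index, width, height, x, y = (int(g) for g in match.groups())
        implausible = (
            index >= MAX_SCREENS
            or width == 0
            or height == 0
            or max(width, height, x, y) > MAX_COORD
        )
        if implausible:
            bad.append(line.strip())
    return bad


if __name__ == "__main__":
    sample = """\
E17 INIT: XINERAMA CHOSEN: [1], 1080x1920+1280+0
E17 INIT: XINERAMA CHOSEN: [0], 1280x1024+0+896
E17 INIT: XINERAMA SCREEN: [0], 1280x1024+0+0
E17 INIT: XINERAMA CHOSEN: [153045780], 0x153045764+0+153046580
"""
    for entry in suspicious_entries(sample):
        print(entry)
```

On the excerpt above this flags only the entries with the huge index and zero width, which look like pointer-sized garbage rather than real geometry. That is consistent with (but does not prove) the use-after-free theory.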
I can see the following lines logged by the kernel (KMS enabled on radeon 9550):
[12045.280155] [drm:radeon_dvi_detect] *ERROR* DVI-I-1: probed a monitor but no|invalid EDID
[13500.386132] [drm:radeon_vga_detect] *ERROR* VGA-1: probed a monitor but no|invalid EDID
[13797.568760] [drm:radeon_dvi_detect] *ERROR* DVI-I-1: probed a monitor but no|invalid EDID
And I believe these are harmless (and reported in #27708). I can see these messages being reported for as long as my kernel logs go back. But when the crashes occur, they are followed by these lines:
[35753.097508] i2c i2c-1: sendbytes: NAK bailout.
[35762.752866] i2c i2c-1: sendbytes: NAK bailout.
[35772.912023] i2c i2c-1: readbytes: ack/nak timeout
Which I believe are not so harmless. Or they could be a red herring, I'm not qualified to tell.
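To check whether the i2c errors really line up with the crashes, the kernel log can be reduced to just the two kinds of messages quoted above and the i2c ones grouped into bursts. The Python below is a rough sketch under some assumptions: the matching strings are taken from the quoted lines only, the helper names are invented here, and the 60-second grouping window is an arbitrary choice.

```python
import re

# "[seconds.micros] message", as printed with kernel timestamps enabled.
ENTRY = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def classify(message):
    """Label a kernel message as 'edid', 'i2c', or None (not of interest)."""
    if "no|invalid EDID" in message:
        return "edid"
    if message.startswith("i2c ") and (
        "NAK bailout" in message or "ack/nak timeout" in message
    ):
        return "i2c"
    return None


def parse_events(log_text):
    """Return (timestamp, kind, message) tuples for the relevant lines."""
    events = []
    for line in log_text.splitlines():
        match = ENTRY.match(line.strip())
        if match is None:
            continue
        kind = classify(match.group(2))
        if kind is not None:
            events.append((float(match.group(1)), kind, match.group(2)))
    return events


def i2c_bursts(events, window=60.0):
    """Group i2c timestamps that follow each other within `window` seconds.

    The 60 s default is an assumption for illustration only.
    """
    bursts = []
    for timestamp, kind, _message in events:
        if kind != "i2c":
            continue
        if bursts and timestamp - bursts[-1][-1] <= window:
            bursts[-1].append(timestamp)
        else:
            bursts.append([timestamp])
    return bursts


if __name__ == "__main__":
    import sys

    for burst in i2c_bursts(parse_events(sys.stdin.read())):
        print("i2c burst: %d errors starting at %.3f" % (len(burst), burst[0]))
```

Fed with the output of dmesg on standard input, each reported burst start time can then be compared against the time of an observed output blink.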
Since about the same time, I've also been seeing a lot of lines logged by g-s-d, like
gdk_pixbuf_format_get_name: assertion `format != NULL' failed
But I do not believe these are relevant.