xfree86: drm device access denied to armsoc driver due to systemd-logind (patch drafted)
Submitted by Alexei Colin
Assigned to Xorg Project Team
Created attachment 111383 Patch (commit 1 of 2): factor out for reuse a function to lookup device by bus ID
Symptom: running startx (or xinit) as root from a virtual console fails to start the X server with the following error:
(EE) ARMSOC(0): ERROR: Cannot set the DRM interface version.
(EE) Screen(s) found, but none have a usable configuration.
Fatal server error:
(EE) no screens found(EE)
This happens with xf86-video-armsoc (with both Arch packages, xf86-video-armsoc-git 1:219.f16b5c8-2 and xf86-video-armsoc-odroid 237.3bdf799-1) on Linux 18.104.22.168-3-ARCH armv7h, on a Samsung Exynos 4412 (Hardkernel Odroid U3 board). However, the problem is not strictly platform-specific (see below).
The problem: kernel DRM debug output (drm.debug=1 on the kernel command line) shows that the set-version ioctl (nr=0x07) issued by the armsoc driver on the DRM device fails with EACCES (-13); see the attached kernel log. This happens because the armsoc driver's file descriptor for the device does not have the DRM 'master' privilege. The server opens /dev/dri/card0 through systemd-logind, and the systemd-logind process makes the server's file descriptor the master via ioctl nr=0x1e. Later, the armsoc driver opens the device itself, but its file descriptor does not become master, because the server's descriptor is still open and still holds master.
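The master rule behind this failure can be illustrated with a small userspace model. This is a hypothetical sketch, not kernel or libdrm code: it assumes, as described above, that a fresh open becomes master only when no descriptor holds master, that closing the master descriptor (or dropping master) frees it, and that the set-version ioctl is refused with EACCES for non-master descriptors. All names (dev_open, dev_set_version, and so on) are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy, userspace-only model of the DRM "master" rules described above.
 * Hypothetical: not kernel or libdrm code. */
#define MAX_FDS 8

struct drm_dev_model {
    bool open[MAX_FDS]; /* which descriptor slots are open */
    int master;         /* descriptor holding master, or -1 */
};

static void dev_init(struct drm_dev_model *d)
{
    for (int fd = 0; fd < MAX_FDS; fd++)
        d->open[fd] = false;
    d->master = -1;
}

/* open(): the new descriptor becomes master only if nobody holds it. */
static int dev_open(struct drm_dev_model *d)
{
    for (int fd = 0; fd < MAX_FDS; fd++) {
        if (!d->open[fd]) {
            d->open[fd] = true;
            if (d->master < 0)
                d->master = fd;
            return fd;
        }
    }
    return -EMFILE; /* no free slot in this toy model */
}

/* close(): closing the master descriptor gives up master. */
static void dev_close(struct drm_dev_model *d, int fd)
{
    assert(fd >= 0 && fd < MAX_FDS && d->open[fd]);
    d->open[fd] = false;
    if (d->master == fd)
        d->master = -1;
}

/* Drop master while keeping the descriptor open, e.g. when
 * systemd-logind pauses the device. */
static void dev_drop_master(struct drm_dev_model *d, int fd)
{
    assert(fd >= 0 && fd < MAX_FDS && d->open[fd]);
    if (d->master == fd)
        d->master = -1;
}

/* Set-version ioctl (nr=0x07): refused for non-master descriptors. */
static int dev_set_version(const struct drm_dev_model *d, int fd)
{
    assert(fd >= 0 && fd < MAX_FDS && d->open[fd]);
    return d->master == fd ? 0 : -EACCES;
}
```

Opening the device once (standing in for the server's logind-provided descriptor) and then again (armsoc's own open) makes dev_set_version on the second descriptor return -EACCES. Calling dev_close or dev_drop_master on the first descriptor before the second open lets the second one succeed, which corresponds to releasing the server's descriptor early and to the VT-switch side effect described in note (2).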
During bus probing, the server opens the device via systemd-logind regardless of whether the driver SUPPORTS_SERVER_FD, i.e. whether the driver, instead of opening the device file itself, uses the descriptor passed in an attribute initialized by the server. For a driver without SERVER_FD support, the server releases the descriptor through systemd-logind before the driver probes further, but only if that driver implements 'platformProbe'. The armsoc driver does not implement 'platformProbe' and relies on the fallback to 'Probe', a path on which this release logic is never executed.
In summary, this problem seems to affect all drivers that, like armsoc, (1) do not declare SUPPORTS_SERVER_FD and (2) implement only the older 'Probe' method rather than 'platformProbe'.
Potential solutions (both were tested and make startx succeed): (A) As a workaround, build the server with './configure --disable-systemd-logind'. (B) The attached patchset moves the descriptor release logic slightly up the stack, so that it applies both to drivers with 'platformProbe' and to drivers with only 'Probe', instead of only to the former. Tested against recent xorg/xserver master 3b5be33f.
Details on patch:
Call flow before the patch:
  platformProbeDev [if drv->platformProbe exists]
    doPlatformProbe
      systemd_logind_release_fd [if not SUPPORTS_SERVER_FD]
      drv->platformProbe
  drv->Probe [if drv->platformProbe does not exist]
Call flow after the patch:
  systemd_logind_release_fd [if not SUPPORTS_SERVER_FD]
  platformProbeDev [if drv->platformProbe exists]
    doPlatformProbe
      drv->platformProbe
  drv->Probe [if drv->platformProbe does not exist]
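The difference between the two call flows can be checked with a tiny model. This is a hypothetical sketch, not the xserver code: the types and functions below are invented, and it only records whether systemd_logind_release_fd would have run before the driver's probe hook, under the assumption (from the analysis above) that a driver opening the device itself can become master only after the server's descriptor is released.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two call flows; names are illustrative. */
enum probe_hook { HOOK_PLATFORM_PROBE, HOOK_PROBE };

struct driver_model {
    bool supports_server_fd; /* driver declares SUPPORTS_SERVER_FD */
    bool has_platform_probe; /* drv->platformProbe is implemented */
};

struct probe_trace {
    enum probe_hook hook; /* which driver hook ran */
    bool fd_released;     /* systemd_logind_release_fd ran before it */
};

/* Before the patch: the release only happens inside doPlatformProbe. */
static struct probe_trace probe_unpatched(struct driver_model drv)
{
    struct probe_trace t = { HOOK_PROBE, false };
    if (drv.has_platform_probe) {
        if (!drv.supports_server_fd)
            t.fd_released = true;
        t.hook = HOOK_PLATFORM_PROBE;
    }
    /* else: fallback to drv->Probe with the server fd still open */
    return t;
}

/* After the patch: the release happens before either hook is chosen. */
static struct probe_trace probe_patched(struct driver_model drv)
{
    struct probe_trace t = { HOOK_PROBE, false };
    if (!drv.supports_server_fd)
        t.fd_released = true;
    if (drv.has_platform_probe)
        t.hook = HOOK_PLATFORM_PROBE;
    return t;
}

/* A SERVER_FD driver reuses the server's (master) descriptor; any other
 * driver opens the device itself and needs the server's fd released. */
static bool driver_can_be_master(struct driver_model drv, struct probe_trace t)
{
    assert(t.hook == HOOK_PROBE || drv.has_platform_probe);
    return drv.supports_server_fd || t.fd_released;
}
```

Evaluating driver_can_be_master for all four combinations of the two flags reproduces the summary above: under the unpatched order, only a driver that neither supports SERVER_FD nor implements platformProbe (the armsoc case) is left without master, while the patched order fixes that combination without changing which hook runs.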
Attached is an Arch PKGBUILD for building the package and testing the patch (derived from https://projects.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/xorg-server): makepkg -si PKGBUILD --asroot
(1) This problem is not fatal when the server is started by a display manager (LightDM): in that case the request for the file descriptor from systemd-logind never succeeds in the first place, because associating the server with a logind session fails. The X server log shows: '(EE) systemd-logind: failed to get session: PID 8464 does not belong to any known session'.
(2) This problem happens not to manifest (as severely) if the server is started on a virtual console other than the one of the current session in which startx is run, e.g. when running 'startx -- :0 vt3' on vt2. This is a side effect: the VT switch event triggers systemd-logind to 'pause' the DRM device, which involves dropping master, just in time before the armsoc driver needs it. The server does start, but it ends up outside of an active logind session (the vt2 session goes from 'active' to 'online' state, and no other session is created). As a result, input devices don't work. Running the X server on a non-controlling VT seems to run into separate issues (Bug 81932); this bug report is concerned with running it on the controlling VT.
(3) This problem also affects libdrm: the /dev/dri/card0 device is wrongly determined to be in use, because drmGetBusid returns a non-empty value, so the device is skipped and the server fails to start. See the attached debug output from 'export LIBGL_DEBUG=verbose; startx'. To debug the original issue, libdrm was patched to override this 'in use' decision; that patch is not necessary once the underlying issue is fixed in the server.
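The 'in use' decision in note (3) can be sketched as follows. This is a hypothetical model based only on the description above, not the actual libdrm code: each candidate card carries the result of its bus ID query, a non-empty result marks it as claimed, and the invented ignore_busid flag stands in for the local debugging override.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the "in use" test: a card whose bus ID query
 * (drmGetBusid in the real code) returns a non-empty string is treated
 * as claimed by another client and skipped. */
struct card_candidate {
    const char *node;  /* e.g. "/dev/dri/card0" */
    const char *busid; /* result of the bus ID query; NULL if none */
};

/* Returns the index of the first card considered free, or -1.
 * ignore_busid models the debugging override mentioned above. */
static int pick_card(const struct card_candidate *cards, size_t n,
                     bool ignore_busid)
{
    assert(cards != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const char *id = cards[i].busid;
        bool in_use = id != NULL && id[0] != '\0';
        if (!in_use || ignore_busid)
            return (int)i;
    }
    return -1;
}
```

With a single card0 whose bus ID is already set, pick_card returns -1 (no usable device, as in the failing startup) unless the override is enabled, in which case card0 is selected.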