Pathological device polling on kmsro -> libdrm codepath causes app startup delays
Downstream bug: https://github.com/leifliddy/asahi-fedora-builder/issues/21
On screen creation, kmsro tries opening every possible render device with drmOpenWithType(), passing a name, worst case 6 times before finding the right driver and succeeding. This goes on to call drmOpenByName(). That then tries calling drmOpenMinor(i, 1, type) for 16 possible render minor nodes, so 96 times. Since the create argument is 1, that tails into drmOpenDevice(), which in UDEV builds attempts to stat the directory and then the device 50 times with a usleep(20) in between, for a grand total of 9600 stat calls and 4800 usleep calls and a cumulative 96ms of sleep time.
Of course, that assumes the kernel actually waits an accurate 20µs for each sleep, which is not going to happen. We're seeing easily 140µs per call, and since the libdrm code does not check the actual clock to implement the timeout but rather just uses a fixed iteration count, that brings the total dead time to 0.7 seconds. I believe with the wrong kernel/timer config this could easily be 1ms per call, and now we're talking 5 seconds of startup time.
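For anyone who wants to check the numbers above, here is a small sketch of the arithmetic. The constants are taken from the description in this report; the names (`worst_case_cost`, `struct poll_cost`, the enum values) are made up for illustration and do not exist in libdrm or Mesa.

```c
#include <stdio.h>

/* Counts taken from the report; all names here are illustrative only. */
enum {
    DRIVER_NAMES_TRIED = 6,  /* worst-case drmOpenWithType() attempts from kmsro */
    MINORS_PER_NAME    = 16, /* render minors probed per driver name */
    RETRIES_PER_OPEN   = 50, /* iterations of the udev wait loop per open */
    STATS_PER_RETRY    = 2   /* one stat for the directory, one for the node */
};

struct poll_cost {
    long opens;    /* drmOpenMinor() calls */
    long stats;    /* stat() calls */
    long sleeps;   /* usleep() calls */
    long sleep_us; /* total time spent sleeping, in microseconds */
};

/* Worst-case cost when no device matches, given the real time per sleep. */
static struct poll_cost worst_case_cost(long us_per_sleep)
{
    struct poll_cost c;

    c.opens    = (long)DRIVER_NAMES_TRIED * MINORS_PER_NAME;
    c.sleeps   = c.opens * RETRIES_PER_OPEN;
    c.stats    = c.sleeps * STATS_PER_RETRY;
    c.sleep_us = c.sleeps * us_per_sleep;
    return c;
}

/* Print one line of the table for a given per-sleep duration. */
static void print_cost(long us_per_sleep)
{
    struct poll_cost c = worst_case_cost(us_per_sleep);

    printf("%4ld us/sleep: %ld opens, %ld stats, %ld sleeps, %ld us asleep\n",
           us_per_sleep, c.opens, c.stats, c.sleeps, c.sleep_us);
}
```

Calling `print_cost()` with 20, 140 and 1000 covers the three cases discussed: the nominal 96ms, the roughly 0.7 seconds we actually observe, and the roughly 5 seconds a coarse timer configuration would give.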
There's a few places this could be improved, but I think right off the bat, that udev poll loop is a pretty bad code smell. I don't know what exact situation it's trying to solve, but AIUI DRM devices are supposed to probe synchronously and that should happen on module load (we already ran into trouble with gdm racing us when our display controller driver was doing it asynchronously, so I assume other drivers do it the same way or more things would be broken). That leaves just waiting for udev to settle events after loading the module, and that can probably be better accomplished with the equivalent of udevadm settle once after actually taking a module load action, instead of pathologically polling every time we try to open a nonexistent driver's device.
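Separately from the settle idea, if some form of wait loop has to stay, the fixed iteration count could at least be replaced by a real deadline so that sleep overruns do not multiply the total wait. The following is only a sketch of that idea, not libdrm code: `wait_for_node()` and `now_ns()` are hypothetical helpers, and the 20µs nap simply mirrors the existing usleep(20).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper, not part of libdrm: monotonic time in nanoseconds. */
static long long now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical helper, not part of libdrm: wait until `path` can be
 * stat()ed or until `timeout_us` microseconds have elapsed on the
 * monotonic clock. Returns true if the path showed up in time.
 *
 * Because the loop checks the clock rather than counting iterations,
 * the total wait is bounded by roughly timeout_us plus one sleep,
 * no matter how badly each individual sleep overruns.
 */
static bool wait_for_node(const char *path, long timeout_us)
{
    const long long deadline = now_ns() + (long long)timeout_us * 1000LL;
    const struct timespec nap = { .tv_sec = 0, .tv_nsec = 20 * 1000 };
    struct stat st;

    for (;;) {
        if (stat(path, &st) == 0)
            return true;
        if (now_ns() >= deadline)
            return false;
        nanosleep(&nap, NULL);
    }
}
```

With a 1000µs budget, a missing node would cost about 1ms per open attempt regardless of timer granularity. That is the same as the nominal 50 x 20µs today, so this only removes the overrun multiplier; it does not fix the underlying problem of polling for devices that will never appear.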