Problems related to the lack of GPU preemption
#92 (comment 1379786) pointed out a denial-of-service attack on compositors that use dma-bufs. However, I think there is actually a broader class of problems: the lack of effective GPU preemption means that a rogue client can hog the GPU for an unpredictable amount of time. This is a nasty problem for any GPU use with untrusted tenants, not just Wayland.
There are two solutions I can think of:
- Partition the GPU (either in software or hardware) such that each tenant has access to a disjoint subset of GPU resources. That tenant can lock up the resources it has been assigned, but is not able to interfere with other tenants.
- Ensure that on-GPU computations can be involuntarily preempted in a small and bounded amount of time.
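To make the difference between the approaches concrete, here is a toy model that compares when a short "victim" job finishes when it shares a GPU with a long-running "rogue" job. This is only an illustrative sketch under heavy simplifying assumptions: the GPU is modeled as a single engine measured in abstract time units, context-switch cost is ignored, each tenant submits exactly one job, and "partitioning" is modeled as giving each tenant an equal 1/`slices` share of throughput. The function names are hypothetical and do not correspond to any real driver or scheduler API.

```python
from collections import deque


def finish_times_fifo(jobs):
    """No preemption: each job runs to completion in submission order."""
    t = 0
    done = {}
    for name, work in jobs:
        t += work
        done[name] = t
    return done


def finish_times_timesliced(jobs, quantum):
    """Involuntary preemption: the scheduler forcibly switches jobs
    after at most `quantum` time units (switch overhead ignored)."""
    queue = deque(jobs)
    t = 0
    done = {}
    while queue:
        name, remaining = queue.popleft()
        ran = min(quantum, remaining)
        t += ran
        if remaining > ran:
            queue.append((name, remaining - ran))  # preempted, requeued
        else:
            done[name] = t
    return done


def finish_times_partitioned(jobs, slices):
    """Partitioning: each tenant owns a disjoint 1/slices share of the
    GPU, so jobs never wait on each other but each runs more slowly."""
    return {name: work * slices for name, work in jobs}


if __name__ == "__main__":
    jobs = [("rogue", 1_000_000), ("victim", 2)]
    print(finish_times_fifo(jobs)["victim"])
    print(finish_times_timesliced(jobs, quantum=1)["victim"])
    print(finish_times_partitioned(jobs, slices=2)["victim"])
```

Without preemption, the victim's latency grows with however much work the rogue client submits (here 1000002 units). With bounded preemption, it is limited to roughly the quantum times the number of runnable clients, plus its own work (4 units). With partitioning, it is independent of the rogue client entirely, at the cost of the victim only ever getting its own share of the hardware (also 4 units here, because its share is half the GPU).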
In either case, enforcement must be done by the kernel driver and/or the compositor, and it must hold even if the other parts of the system are malicious. While a local DoS by a userspace process is typically not considered very serious, the same problems arise with any form of GPU virtualization that works on ordinary hardware with FLOSS drivers (read: does not rely on hardware SR-IOV).