Draft: Define library and interfaces for GPU offloading

Kyle Brenneman requested to merge kbrenneman/libglvnd:app-profile-library into master

This is an experimental interface for configuring GPU offloading for individual applications through libglvnd. It generalizes the DRI_PRIME and __NV_PRIME_RENDER_OFFLOAD_PROVIDER environment variables used by Mesa and the NVIDIA driver, respectively.

Note that this is still a work in progress. With this MR, I'm focusing on the libglvnd and driver interfaces. I've got a placeholder function for reading config files, and I've been experimenting with formats, but the details of that are still open.

Whatever form that configuration takes, part of my goal is that the same configuration should apply to GLX, EGL, and Vulkan. That is, a user should be able to configure an application without having to care about which API the application uses.

App Profile Library

To start with, there's a new library to read the configuration for the current process. Since the same configuration should apply to each API, this library is shared between libGLX, libEGL, and a Vulkan layer.

The public interface for the app profile library is based around returning attributes. That should make it easy to add new attributes later without breaking ABI compatibility. It also keeps the internal details opaque, so the other libraries don't have to care how the configuration is defined.

The most important attribute is a list of zero or more devices for offloading. Each device is represented by a (vendor_name, device_uuid) pair. The vendor_name part is the same vendor name used by libGLX, or the EGL_DRIVER_NAME_EXT string from EGL_EXT_device_persistent_id. The device_uuid is, as you'd expect, the device UUID, which corresponds to EGL_DEVICE_UUID_EXT, VkPhysicalDeviceIDProperties::deviceUUID, or GL_DEVICE_UUID_EXT.

Note that we don't use a driver UUID, because the configuration is supposed to be persistent, and the driver UUID by definition changes between driver versions. If two drivers support the same physical device and use the same device UUID for it, then the vendor_name string keeps things unambiguous.
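To make the (vendor_name, device_uuid) pair concrete, here is a minimal sketch of how one profile entry might be held in memory and compared. The ProfileDevice type and profile_device_equal helper are hypothetical names for illustration, not part of this MR; the 16-byte UUID length matches VK_UUID_SIZE.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16 /* same length as VK_UUID_SIZE */

/* Hypothetical in-memory form of one profile entry. */
typedef struct {
    const char *vendor_name;        /* e.g. the EGL_DRIVER_NAME_EXT string */
    uint8_t device_uuid[UUID_SIZE]; /* raw device UUID bytes */
} ProfileDevice;

/* Two entries name the same device only if both parts match, so two
 * drivers that report the same device UUID stay distinguishable. */
static bool profile_device_equal(const ProfileDevice *a, const ProfileDevice *b)
{
    return strcmp(a->vendor_name, b->vendor_name) == 0
        && memcmp(a->device_uuid, b->device_uuid, UUID_SIZE) == 0;
}
```

Comparing both fields is what lets the vendor name disambiguate two drivers that expose the same physical device.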

Each API library is then responsible for translating that name and UUID into something API-specific. More details on that below.

Internally, this library just uses environment variables to define the profile data. There's a placeholder function for where it would load a config file, but that's just a no-op right now.

Even after we define a config file, I expect the environment variables would still be available as a way to override it. Using the environment variables would make testing all this stuff a lot easier, and it would be a clean way for a desktop environment to implement a "Run on such-and-such device" command.
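As a rough illustration of an environment-variable override, the sketch below parses one device from a string of the form "vendor:32-hex-digits". Both the string format and the variable name passed to load_device_from_env are assumptions made up for this example; the MR does not define them here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define UUID_SIZE 16

typedef struct {
    char vendor_name[64];
    uint8_t device_uuid[UUID_SIZE];
} ProfileDevice;

static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse "vendor:<32 hex digits>" (a hypothetical format).
 * Returns false on malformed input and leaves *out untouched. */
static bool parse_device_spec(const char *spec, ProfileDevice *out)
{
    uint8_t uuid[UUID_SIZE];
    const char *colon = strchr(spec, ':');
    if (colon == NULL) return false;

    size_t name_len = (size_t)(colon - spec);
    if (name_len == 0 || name_len >= sizeof(out->vendor_name)) return false;

    const char *hex = colon + 1;
    if (strlen(hex) != 2 * UUID_SIZE) return false;
    for (size_t i = 0; i < UUID_SIZE; i++) {
        int hi = hex_value(hex[2 * i]);
        int lo = hex_value(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) return false;
        uuid[i] = (uint8_t)(hi * 16 + lo);
    }

    memcpy(out->vendor_name, spec, name_len);
    out->vendor_name[name_len] = '\0';
    memcpy(out->device_uuid, uuid, UUID_SIZE);
    return true;
}

/* Environment override: false when the variable is unset or malformed,
 * in which case a caller would fall through to the config file. */
static bool load_device_from_env(const char *var_name, ProfileDevice *out)
{
    const char *value = getenv(var_name);
    return value != NULL && parse_device_spec(value, out);
}
```

A "Run on such-and-such device" command could then just set the variable before launching the application.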

GLX Interface

For GLX, I added a new initOffloadVendor function to the vendor library interface, which takes a Display pointer and the device UUID, and returns a set of screen numbers.

When libglvnd first initializes a display, it looks up a (vendor_name, device_uuid) pair from the app profile library. It will load a vendor library based on vendor_name, and then it calls that vendor's initOffloadVendor function.

If the vendor can support offloading for at least one screen, then it returns the set of screens that it supports. Libglvnd will then use that vendor for those screens. For all other screens, libglvnd will use its current vendor selection logic.

If libglvnd can't load that vendor, or if the vendor doesn't support offloading for that display, then libglvnd will move on to the next device from the profile and try again.
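The try-each-device fallback described above can be sketched as follows. Everything here is a stand-in: the Vendor struct, the init_offload callback signature, and the two stub vendors are hypothetical, and the real initOffloadVendor takes a Display pointer rather than a screen count.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SCREENS 8

typedef struct {
    const char *vendor_name;
    const char *device_uuid;   /* kept as a string here for brevity */
} ProfileDevice;

/* Stand-in for a loaded GLX vendor library. */
typedef struct {
    const char *name;
    /* Marks the screens it can offload and returns how many it marked. */
    int (*init_offload)(const char *device_uuid, int screen_count,
                        bool supported[MAX_SCREENS]);
} Vendor;

/* Stub vendors used only to exercise the loop. */
static int never_offloads(const char *uuid, int screen_count,
                          bool supported[MAX_SCREENS])
{
    (void)uuid; (void)screen_count; (void)supported;
    return 0;
}

static int offloads_screen_zero(const char *uuid, int screen_count,
                                bool supported[MAX_SCREENS])
{
    (void)uuid;
    if (screen_count < 1) return 0;
    supported[0] = true;
    return 1;
}

static const Vendor *find_vendor(const Vendor *vendors, size_t count,
                                 const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(vendors[i].name, name) == 0) return &vendors[i];
    }
    return NULL; /* models "libglvnd can't load that vendor" */
}

/* Walk the profile in order; the first vendor that supports at least one
 * screen wins. Returns NULL when every screen should use the existing
 * vendor selection logic. */
static const Vendor *select_offload_vendor(const ProfileDevice *devices,
                                           size_t device_count,
                                           const Vendor *vendors,
                                           size_t vendor_count,
                                           int screen_count,
                                           bool supported[MAX_SCREENS])
{
    for (size_t i = 0; i < device_count; i++) {
        const Vendor *vendor =
            find_vendor(vendors, vendor_count, devices[i].vendor_name);
        if (vendor == NULL) continue;
        memset(supported, 0, MAX_SCREENS * sizeof(bool));
        if (vendor->init_offload(devices[i].device_uuid, screen_count,
                                 supported) > 0) {
            return vendor;
        }
    }
    memset(supported, 0, MAX_SCREENS * sizeof(bool));
    return NULL;
}
```

Screens left unmarked in supported[] are the ones that would go through the normal per-screen vendor selection.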

To clarify, this interface is per-display, not per-screen. That way, a vendor library can still initialize a whole display all at once, including whatever it might need to do to set up offloading.

EGL Interface

For EGL, a vendor library has to provide a vendor name string through its __EGLapiImports::getVendorString callback, which libglvnd matches against the vendor name from the profile.

The vendor name should be the same string as you'd get from EGL_DRIVER_NAME_EXT. It's a separate query for two reasons, though. The first is that providing a non-NULL vendor name is how a vendor library indicates that it supports offloading at all, since a vendor could support EGL_EXT_device_persistent_id without supporting offloading.

The second reason is that libglvnd can check if the vendor name matches before it has to call the vendor's eglQueryDevicesEXT function, which could be very expensive (powering on devices, probing hardware, etc.).

Anyway, in eglGetPlatformDisplay, libglvnd will look for a vendor with a name that matches the profile. It will then get a list of EGLDeviceEXT handles from that vendor, and it will look for one where the EGL_DEVICE_UUID_EXT value matches the UUID from the profile.

If it finds a matching device, then it will append an EGL_DEVICE_EXT attribute when it calls the vendor's eglGetPlatformDisplay. The vendor is then responsible for checking whether it can actually support offloading onto that device.
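Appending the device attribute amounts to copying the caller's attribute list and adding one more key/value pair before the terminator. The sketch below is self-contained and so uses local stand-ins: Attrib for EGLAttrib, and ATTR_NONE / ATTR_DEVICE_EXT carrying the values of EGL_NONE and EGL_DEVICE_EXT. The helper name append_device_attrib is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

typedef intptr_t Attrib;          /* stands in for EGLAttrib */
#define ATTR_NONE       0x3038    /* value of EGL_NONE */
#define ATTR_DEVICE_EXT 0x322C    /* value of EGL_DEVICE_EXT */

/* Returns a malloc'd copy of attribs with (ATTR_DEVICE_EXT, device)
 * appended, or NULL if allocation fails. attribs may be NULL, which is
 * treated as an empty list. The caller frees the result. */
static Attrib *append_device_attrib(const Attrib *attribs, Attrib device)
{
    size_t count = 0;
    if (attribs != NULL) {
        /* Attributes come in key/value pairs, terminated by ATTR_NONE. */
        while (attribs[count] != ATTR_NONE) count += 2;
    }

    Attrib *out = malloc((count + 3) * sizeof(*out));
    if (out == NULL) return NULL;

    for (size_t i = 0; i < count; i++) out[i] = attribs[i];
    out[count] = ATTR_DEVICE_EXT;
    out[count + 1] = device;
    out[count + 2] = ATTR_NONE;
    return out;
}
```

Copying rather than modifying in place leaves the application's original list intact for the fallback path.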

If the vendor's eglGetPlatformDisplay returns EGL_NO_DISPLAY, then libglvnd will try again with the next device from the profile. If no devices work, then it'll fall back to calling each vendor's eglGetPlatformDisplay without the EGL_DEVICE_EXT attribute, like it does now.
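Put together, the EGL lookup could look roughly like the sketch below, which checks the cheap vendor name first and only then enumerates devices. All types are hypothetical stand-ins: query_devices models the vendor's eglQueryDevicesEXT, and the usable flag models whether the vendor's eglGetPlatformDisplay would accept the device.

```c
#include <stddef.h>
#include <string.h>

#define UUID_SIZE 16

typedef struct {
    const char *vendor_name;
    unsigned char device_uuid[UUID_SIZE];
} ProfileDevice;

typedef struct {
    unsigned char uuid[UUID_SIZE];
    int usable;   /* models the vendor accepting this device for offloading */
} Device;

/* Stand-in for an EGL vendor library. */
typedef struct {
    const char *name;      /* NULL: vendor does not support offloading */
    const Device *devices;
    size_t device_count;
    int enumerations;      /* counts the potentially expensive queries */
} Vendor;

/* Stand-in for the vendor's eglQueryDevicesEXT. */
static const Device *query_devices(Vendor *v, size_t *count)
{
    v->enumerations++;
    *count = v->device_count;
    return v->devices;
}

/* Returns the device to offload onto, or NULL when the caller should fall
 * back to the existing display lookup without a device attribute. */
static const Device *pick_offload_device(const ProfileDevice *profile,
                                         size_t profile_count,
                                         Vendor *vendors,
                                         size_t vendor_count)
{
    for (size_t p = 0; p < profile_count; p++) {
        for (size_t v = 0; v < vendor_count; v++) {
            Vendor *vendor = &vendors[v];
            /* Name check first, so unrelated vendors are never probed. */
            if (vendor->name == NULL ||
                strcmp(vendor->name, profile[p].vendor_name) != 0) {
                continue;
            }
            size_t count = 0;
            const Device *devs = query_devices(vendor, &count);
            for (size_t d = 0; d < count; d++) {
                if (memcmp(devs[d].uuid, profile[p].device_uuid,
                           UUID_SIZE) == 0 && devs[d].usable) {
                    return &devs[d];
                }
            }
        }
    }
    return NULL;
}
```

The enumerations counter is only there to show that a vendor whose name never matches is never asked to enumerate its devices.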

This is based on the proposed EGL_EXT_explicit_device extension, with the added requirement that the vendor has to check in eglGetPlatformDisplay (not in eglInitialize) whether it can actually use that device with that native display.

Vulkan Interface

For Vulkan, I've written up a layer that can either sort the VkPhysicalDevice handles to put the devices from the app profile first, or filter them so that the application only sees the devices from the profile. There's an attribute in the app profile that tells it whether to use sorting or filtering.
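The difference between sorting and filtering can be shown with a stable partition over the enumerated devices. This is a sketch under assumptions: Candidate, OffloadMode, and apply_profile are hypothetical names, the integer handle stands in for a VkPhysicalDevice, and devices within each group simply keep their enumeration order.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { MODE_SORT, MODE_FILTER } OffloadMode; /* hypothetical names */

typedef struct {
    int handle;        /* stands in for a VkPhysicalDevice */
    bool in_profile;   /* true if the app profile lists this device */
} Candidate;

/* Moves profile devices to the front of devs[], preserving relative order
 * within each group. Returns how many entries the application should see:
 * all of them when sorting, only the profile devices when filtering. */
static size_t apply_profile(Candidate *devs, size_t count, OffloadMode mode)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (!devs[i].in_profile) continue;
        /* Shift the non-profile run right by one and drop this device
         * into the gap, which keeps the partition stable. */
        Candidate c = devs[i];
        for (size_t j = i; j > kept; j--) devs[j] = devs[j - 1];
        devs[kept++] = c;
    }
    return mode == MODE_FILTER ? kept : count;
}
```

With filtering, an application that just picks the first enumerated device and one that scans the whole list both end up on a profile device; with sorting, only the former is steered.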

Vulkan doesn't provide a vendor name as a string, or at least, not a string that matches EGL or GLX. It does have a VkDriverId enum, though, so the layer currently just hard-codes a string for some of those enum values. It's not ideal, but it's something that works without any new extensions.
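A hard-coded mapping of that kind might look like the sketch below. The local DriverId enum mirrors the numeric values of three VkDriverId entries so the example compiles without the Vulkan headers; the vendor strings chosen for each entry are assumptions for illustration, not the table the layer actually uses.

```c
#include <stddef.h>

/* Local mirror of a few VkDriverId values. */
typedef enum {
    DRIVER_MESA_RADV = 3,              /* VK_DRIVER_ID_MESA_RADV */
    DRIVER_NVIDIA_PROPRIETARY = 4,     /* VK_DRIVER_ID_NVIDIA_PROPRIETARY */
    DRIVER_INTEL_OPEN_SOURCE_MESA = 6  /* VK_DRIVER_ID_INTEL_OPEN_SOURCE_MESA */
} DriverId;

/* Returns the vendor name to compare against the profile, or NULL when
 * the driver ID has no known mapping (such a device never matches). */
static const char *vendor_name_for_driver(int driver_id)
{
    switch (driver_id) {
    case DRIVER_MESA_RADV:
    case DRIVER_INTEL_OPEN_SOURCE_MESA:
        return "mesa";    /* assumed vendor string */
    case DRIVER_NVIDIA_PROPRIETARY:
        return "nvidia";  /* assumed vendor string */
    default:
        return NULL;
    }
}
```

Returning NULL for unknown IDs means a driver missing from the table is simply never selected by a profile, rather than being matched by accident.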

Edited by Kyle Brenneman