Draft: Config file format for GPU offloading

mentioned in merge request !224

added 118 commits

e905215f...5024e579 - 96 commits from branch glvnd:master
a157b862 - Added a basic app profile interface.
e664cd0d - app_profile: Add functions to return arbitrary attribute values.
8c60a27b - app_profile: Add attributes for GPU offloading
6f576599 - app_profile: Get a device list from an environment variable
8a7e54c1 - app_profile: Add an attribute for device filtering
f87b9388 - EGL: Define a GPU offload interface.
6fe0c57b - EGL: Implement the GPU offload interface.
229f47de - GLX: Define a GPU offload interface.
a7fdd97d - GLX: Implement the GPU offload interface.
b53ed707 - vulkanconfig: Add a stub Vulkan layer.
68e686f0 - vulkanconfig: Move the instance map functions to their own header.
a65fa292 - vulkanconfig: Load an app profile.
14ca1f25 - vulkanconfig: Implement device sorting based on profile.
1e68dfb4 - app_profile: Check in a TOML parser.
281f5e7f - app_profile: Move the GLVNDProfileRec definition into app_profile_internal.h.
7f5d69f2 - app_profile: Add functions to look up process data.
bb298573 - app_profile: Add functions to scan the config directories.
e96cc3e1 - app_profile: Add a function for reading TOML files.
415411d5 - app_profile: Implement app profile parsing.
70681dbd - app_profile: Implement device aliases.
082e7f3a - app_profile: Added a file describing the config file format.
f557db37 - Update profile_format.md.

marked this merge request as draft

changed title from WIP: Config file format for GPU offloading to Draft: Config file format for GPU offloading

changed the description

Updated to match the new revision of !224. The config file parsing code itself hasn't actually changed, and the format is the same, except that the device ID string has to include a valid UUID instead of an arbitrary vendor-defined string.
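This is a minimal C sketch of what checking for "a valid UUID" could involve. It parses only the UUID portion of a device ID and assumes the UUID is written in the canonical 8-4-4-4-12 hex form. How the UUID is combined with the rest of the device ID string is defined in profile_format.md, which isn't reproduced here, so the sketch leaves that part out. `parse_uuid` and `hex_value` are hypothetical names, not libglvnd functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical helper: parse a UUID written in the canonical 8-4-4-4-12
 * form (36 characters) into 16 bytes. Returns false for anything else.
 */
bool parse_uuid(const char *str, uint8_t out[16])
{
    size_t byte = 0;
    size_t i = 0;

    while (byte < 16) {
        /* Hyphens are required at these fixed character offsets. */
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (str[i] != '-') return false;
            i++;
            continue;
        }
        int hi = hex_value(str[i]);
        if (hi < 0) return false;   /* also catches an early '\0' */
        int lo = hex_value(str[i + 1]);
        if (lo < 0) return false;
        out[byte++] = (uint8_t)((hi << 4) | lo);
        i += 2;
    }
    /* Reject anything after the 36th character. */
    return str[i] == '\0';
}
```

If malformed UUIDs are rejected while parsing, a typo in a profile can be reported instead of silently matching no device.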

mentioned in issue #229

Just finished looking at the config file example:

For Wine programs, being able to match by cwd and cmdline could also be very useful. comm is the name of the .exe file, but truncated to 15 characters. cmdline contains all arguments after the wine executable; that is, it starts with the path to the .exe file, either absolute or relative to cwd (assuming cwd hasn't changed in the meantime), so at the very least it contains the full name of the .exe file.

Oh, I had never noticed that -- Wine adjusts things so that /proc/self/cmdline is just the Windows command line, so element zero is the .exe file. I had thought that it left the wine executable as element zero.

In that case, for wine, it would be pretty easy to add a rule to look at cmdline[0], or maybe generalize it to look at a specified index or list of indices.
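This minimal C sketch shows how a cmdline rule could reach cmdline[0], or any other index. /proc/<pid>/cmdline holds the arguments as NUL-separated strings with a final NUL after the last one, so selecting an argument means skipping past NUL bytes. `read_cmdline`, `cmdline_arg`, and `path_basename` are hypothetical helpers, not the code from the patch set. `path_basename` splits on both / and \ because a Wine process might report either a Unix or a Windows path; that is an assumption.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file (e.g. /proc/self/cmdline) into a malloc'd buffer.
 * /proc files report size 0, so read until EOF. Caller frees the result. */
char *read_cmdline(const char *path, size_t *len_out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;
    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    if (buf == NULL) { fclose(f); return NULL; }
    while ((n = fread(buf + len, 1, cap - len, f)) > 0) {
        len += n;
        if (len == cap) {               /* buffer full: grow and keep reading */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) { free(buf); fclose(f); return NULL; }
            buf = tmp;
            cap *= 2;
        }
    }
    fclose(f);
    *len_out = len;
    return buf;
}

/* Return argument `index` from a NUL-separated cmdline buffer,
 * or NULL if there are fewer than index + 1 arguments. */
const char *cmdline_arg(const char *buf, size_t len, size_t index)
{
    size_t pos = 0;
    for (size_t i = 0; i < index; i++) {
        const char *nul = memchr(buf + pos, '\0', len - pos);
        if (nul == NULL) return NULL;
        pos = (size_t)(nul - buf) + 1;  /* start of the next argument */
        if (pos >= len) return NULL;
    }
    if (pos >= len) return NULL;
    /* The selected argument must itself be NUL-terminated in the buffer. */
    if (memchr(buf + pos, '\0', len - pos) == NULL) return NULL;
    return buf + pos;
}

/* Return the part after the last '/' or '\\' so both path styles work. */
const char *path_basename(const char *path)
{
    const char *base = path;
    for (const char *p = path; *p != '\0'; p++) {
        if (*p == '/' || *p == '\\') base = p + 1;
    }
    return base;
}
```

After checking for NULL, `path_basename(cmdline_arg(buf, len, 0))` would give the bare .exe name without the 15-character truncation that comm has.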

added 5 commits

c42b9f8f - app_profile: Fix the autotools build
65d079b4 - app_profile: Update the tomlc99 version.
88b5364e - app_profile: Read /proc/self/cmdline
a172590e - app_profile: Add a match rule for /proc/self/cmdline.
23124614 - app_profile: Allow case-insensitive matching for cmdline


Okay, I added a match rule that looks at /proc/self/cmdline.

I set it up so that you can specify the argument index to match against, which does unfortunately make it a bit more complicated. Rather than a simple array, it uses sub-tables. TOML's array-of-tables syntax still works pretty cleanly for nested arrays like that, though.
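After the TOML sub-tables are loaded, evaluating them could look roughly like this C sketch. It assumes that each sub-table supplies an argument index, an expected value, and a case-insensitivity flag (in the spirit of the "case-insensitive matching for cmdline" commit), and that every sub-table in a rule must match. The struct layout, field names, and all-must-match semantics are assumptions for illustration, not the format defined in profile_format.md. Case folding here is plain ASCII `tolower`.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory form of one cmdline match sub-table. */
struct cmdline_rule {
    size_t index;        /* which argument to compare against */
    const char *value;   /* expected argument text */
    bool ignore_case;    /* compare case-insensitively */
};

/* Compare two strings, optionally ignoring ASCII case. */
static bool str_equal(const char *a, const char *b, bool ignore_case)
{
    while (*a != '\0' && *b != '\0') {
        unsigned char ca = (unsigned char)*a, cb = (unsigned char)*b;
        if (ignore_case) {
            ca = (unsigned char)tolower(ca);
            cb = (unsigned char)tolower(cb);
        }
        if (ca != cb) return false;
        a++;
        b++;
    }
    return *a == *b;  /* equal only if both strings ended together */
}

/* Return true if every rule matches its argument. A rule whose index
 * is past the end of argv never matches. */
bool cmdline_rules_match(const char *const *argv, size_t argc,
                         const struct cmdline_rule *rules, size_t nrules)
{
    for (size_t i = 0; i < nrules; i++) {
        if (rules[i].index >= argc) return false;
        if (!str_equal(argv[rules[i].index], rules[i].value,
                       rules[i].ignore_case))
            return false;
    }
    return true;
}
```

Case-insensitive comparison mainly matters for Windows programs, where "Foo.EXE" and "foo.exe" usually name the same file.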

I haven't tested cmdline yet, only general behavior with glxinfo:

__NV_PRIME_RENDER_OFFLOAD=1 must still be set manually for NVIDIA GPUs (the code only seems to do the equivalent of __GLX_VENDOR_LIBRARY_NAME=nvidia).
vendor->glxvc->initOffloadVendor is NULL even though the device supports offloading (if I alter the function to ignore whether vendor->glxvc->initOffloadVendor exists and to ignore its output, then it works).

As far as I can tell, it's not currently possible to set just the vendor name; the device UUID must always be provided as well?

!224 (which is included in this change) adds a driver interface for EGL and GLX. For offloading to work, we'd need to finalize that interface and check in the libglvnd changes, and then after that, the drivers would need to implement it.

Before I check in the libglvnd changes, I'm hoping to get feedback from at least someone on the Mesa side of things to confirm whether or not the interface would work. The whole goal here is to have a vendor-agnostic configuration interface.

As far as configuration, though: Yes, as currently defined, you do have to specify both a vendor name and a device UUID. In theory, we could adjust things to take only a vendor name, but then you'd likely get different behavior for GLX and Vulkan: For GLX, the driver would have to select a device internally, and for Vulkan, the new Vulkan layer library would have to select one.

That sort of thing is what the alias stuff is for, though: You could define an "nvidia" alias, and then have some other program that runs on startup or login that selects a specific NVIDIA device to use. Then, you'd be able to just put "nvidia" into a profile, and all three APIs (EGL, GLX, and Vulkan) would get the same device.
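This minimal C sketch shows how a profile's device name could be resolved through aliases. It assumes an alias may point either at another alias (so a generic "performance" name could map to "nvidia", which in turn maps to a concrete device) or at a concrete device ID, and it caps the chain length to catch loops. Whether libglvnd's aliases can actually chain is an assumption. `struct device_alias`, `resolve_device_alias`, and `MAX_ALIAS_DEPTH` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias entry: a name such as "nvidia" and what it expands to. */
struct device_alias {
    const char *name;
    const char *target;  /* another alias name, or a concrete device ID */
};

#define MAX_ALIAS_DEPTH 8  /* guards against loops like a -> b -> a */

/*
 * Follow aliases starting from `name` until reaching a string that is not
 * itself an alias, and return it. Returns NULL if the chain is too deep,
 * which the caller would treat as a configuration error.
 */
const char *resolve_device_alias(const struct device_alias *aliases,
                                 size_t count, const char *name)
{
    for (int depth = 0; depth < MAX_ALIAS_DEPTH; depth++) {
        const char *next = NULL;
        for (size_t i = 0; i < count; i++) {
            if (strcmp(aliases[i].name, name) == 0) {
                next = aliases[i].target;
                break;
            }
        }
        if (next == NULL) return name;  /* not an alias: a concrete device */
        name = next;
    }
    return NULL;
}
```

If EGL, GLX, and Vulkan each resolve the same name through the same table, they all end up on the same device, which is the point of the indirection.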

> !224 (which is included in this change) adds a driver interface for EGL and GLX. For offloading to work, we'd need to finalize that interface and check in the libglvnd changes, and then after that, the drivers would need to implement it.

I only just now realized that initOffloadVendor is something new introduced by this patch set.

> Before I check in the libglvnd changes, I'm hoping to get feedback from at least someone on the Mesa side of things to confirm whether or not the interface would work. The whole goal here is to have a vendor-agnostic configuration interface.

Do you know if a Mesa dev is already aware of this?

> As far as configuration, though: Yes, as currently defined, you do have to specify both a vendor name and a device UUID. In theory, we could adjust things to take only a vendor name, but then you'd likely get different behavior for GLX and Vulkan: For GLX, the driver would have to select a device internally, and for Vulkan, the new Vulkan layer library would have to select one.

> That sort of thing is what the alias stuff is for, though: You could define an "nvidia" alias, and then have some other program that runs on startup or login that selects a specific NVIDIA device to use. Then, you'd be able to just put "nvidia" into a profile, and all three APIs would get the same device.

Yes, I read that part in the documentation. It makes things more complicated, because you first have to scan the system for GPUs, but it also avoids ambiguity, since AMD also supports offloading and Intel also has discrete GPUs.

If you know of any way I can help speed things up, let me know. I'm tasked with creating a GUI for TUXEDO_OS for easy GPU offload handling, and this MR would be the perfect backend for it.

For example, would having a prototype of the GUI ready help get the attention of the Mesa devs? Or is that irrelevant at this early stage?

Yeah, I spent more time than I'd care to think about trying to figure out a sane way of dealing with multi-vendor configurations beyond simple cases like a single integrated and a single discrete GPU. That ultimately led to the alias feature -- it lets profiles select a generic "performance" rule, but it still separates the GPU offloading mechanism from the device selection policy, and provides a way for a user or a distro to define whatever policy they see fit.
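On the policy side, a startup or login program could choose a target for a generic "performance" alias with something as simple as the C function below, which prefers the first discrete GPU and otherwise falls back to the first GPU found. How GPUs are enumerated and how the result is written out as an alias definition are left out. `struct gpu_info` and `pick_performance_gpu` are hypothetical, and this is only one possible policy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one enumerated GPU. */
struct gpu_info {
    const char *device_id;  /* whatever string the profile format uses */
    bool discrete;          /* true for a dGPU, false for an iGPU */
};

/*
 * One possible "performance" policy: the first discrete GPU wins,
 * otherwise fall back to the first GPU of any kind.
 * Returns NULL if no GPUs were found.
 */
const char *pick_performance_gpu(const struct gpu_info *gpus, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (gpus[i].discrete) return gpus[i].device_id;
    }
    return count > 0 ? gpus[0].device_id : NULL;
}
```

A different policy (for example, preferring the GPU the desktop runs on) would only change this function, not the offloading mechanism in libglvnd or the drivers.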

Anyway, if you want something to test, the Vulkan side should work without any extra driver support -- it just relies on existing Vulkan interfaces and fudges the VkPhysicalDevice ordering to try to make an application choose the right device.
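The reordering trick can be sketched in C as a stable partition: devices with the preferred UUID move to the front while everything else keeps its original order. In a real layer, the array would come from vkEnumeratePhysicalDevices and each UUID from VkPhysicalDeviceIDProperties::deviceUUID (queried through vkGetPhysicalDeviceProperties2). Here a stand-in struct is used so the sketch compiles without Vulkan headers. `struct phys_device` and `move_preferred_first` are hypothetical; this is not the layer's actual code.

```c
#include <stddef.h>
#include <string.h>

#define UUID_SIZE 16  /* same size as VK_UUID_SIZE */

/* Stand-in for a VkPhysicalDevice handle plus its queried deviceUUID. */
struct phys_device {
    void *handle;
    unsigned char uuid[UUID_SIZE];
};

/*
 * Stable-partition the list so devices whose UUID equals `preferred` come
 * first, keeping the relative order of all other devices.
 * Returns how many matching devices were moved to the front.
 */
size_t move_preferred_first(struct phys_device *devs, size_t count,
                            const unsigned char preferred[UUID_SIZE])
{
    size_t front = 0;
    for (size_t i = 0; i < count; i++) {
        if (memcmp(devs[i].uuid, preferred, UUID_SIZE) == 0) {
            /* Shift devs[front..i-1] right by one, then put the match at front. */
            struct phys_device match = devs[i];
            memmove(&devs[front + 1], &devs[front],
                    (i - front) * sizeof(*devs));
            devs[front] = match;
            front++;
        }
    }
    return front;
}
```

This only helps applications that take the first suitable device; an application that ranks devices by its own criteria can still pick a different one.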

I don't know if having a GUI (or even a mock-up) for a config file editor would be helpful or not -- the config file format is deliberately opaque to the rest of libglvnd (and by extension, the drivers). I actually expected that the vendor interface in !224 would get checked in, and then the config file format would get nailed down sometime later.

vulkaninfo does not reorder its list, but I'm not sure whether that's the right place to look.

I would expect vulkaninfo to show the ordering. It's quite possible that I messed something up in the Vulkan layer, though. Most of the testing I've done so far was for the new libGlvndAppProfile.so library itself.

Just wanna bump this. Did you hear back from the Mesa folks?

No, it's been pretty quiet. I would like to get this moving again, though.

I wonder if we could somehow bump it...

Did you ever write to one of the Mesa mailing lists? https://docs.mesa3d.org/lists.html

I haven't yet, though !224 would be the more relevant change for Mesa. By design, the details of the config file are opaque to the vendor libraries.

Something that came up when discussing another bug is that I think this could be used for something like the inverse of the usual GPU offloading arrangement.

With X11 and Wayland, the client-side EGL driver can tell which device the display server is running on using DRI3Open or wl_drm. By default, the dGPU's driver would skip a native display if the server is running on the iGPU.

But, if an application calls eglGetDisplay(NULL) or the eglGetPlatformDisplay equivalent with EGL_EXT_platform_device or EGL_MESA_platform_surfaceless, then there is no display server involved, so the driver can't make any such distinction. As a result, the dGPU driver would respond to that call, possibly waking up the dGPU to do so.

To avoid that, something (possibly a startup script for a desktop environment) could generate a config file with a default profile that specifies whatever device the desktop is running on. Then, any application that calls eglGetDisplay(NULL) would end up with that device.
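The decision described in the last few paragraphs can be condensed into a small C sketch. It models only the default case, with no explicit offload request. For a native X11 or Wayland display, the driver compares its device with the server's. With no display server, it can only defer to a default profile, if one exists. `enum display_kind`, `driver_should_respond`, and the use of plain strings as device identifiers are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of how the application got its display. */
enum display_kind {
    DISPLAY_NATIVE,     /* X11/Wayland: server device known via DRI3Open or wl_drm */
    DISPLAY_NO_SERVER   /* eglGetDisplay(NULL) or its eglGetPlatformDisplay equivalents */
};

/*
 * Decide whether the driver for `my_device` should respond.
 * `server_device` must be non-NULL for native displays and is ignored
 * otherwise; `default_device` comes from a default profile, or is NULL.
 */
bool driver_should_respond(enum display_kind kind, const char *my_device,
                           const char *server_device,
                           const char *default_device)
{
    if (kind == DISPLAY_NATIVE)
        return strcmp(my_device, server_device) == 0;
    /* No display server to compare against: use the default profile if any. */
    if (default_device != NULL)
        return strcmp(my_device, default_device) == 0;
    return true;  /* without a profile, any driver (including the dGPU's) responds */
}
```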

To do that, we'd need to make sure that any application-specific configurations take priority over that, which would be tricky to do using only the directory search order. Also, you'd want to put that in a per-session (rather than per-user) directory, and I don't know of any standard place for such a thing.
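One way to get this priority without relying on the directory search order is to rank matching profiles by specificity first and by directory second, as in this C sketch. The `app_specific` flag and `dir_rank` field are hypothetical stand-ins for however a profile's specificity and source directory would actually be recorded. `pick_profile` is likewise illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one profile that matched the current process. */
struct matched_profile {
    const char *name;
    bool app_specific;  /* false for a catch-all default profile */
    int dir_rank;       /* position of its directory in the search order; lower wins */
};

/*
 * Any application-specific profile beats a default profile regardless of
 * directory; the search order only breaks ties between profiles of the
 * same kind. Returns NULL if nothing matched.
 */
const struct matched_profile *pick_profile(const struct matched_profile *p,
                                           size_t n)
{
    const struct matched_profile *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (best == NULL ||
            (p[i].app_specific && !best->app_specific) ||
            (p[i].app_specific == best->app_specific &&
             p[i].dir_rank < best->dir_rank))
            best = &p[i];
    }
    return best;
}
```

With this ordering, a session-wide default written by a desktop startup script would never shadow an application-specific profile, no matter which directory either one lives in.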

We'd also need to be able to limit the eglQueryDevicesEXT calls that libglvnd makes internally, to avoid unnecessarily waking up any GPUs. It would be pretty easy to add a name for each driver like we have with GLX, which would be enough to limit the eglQueryDevicesEXT call to that driver. For any finer granularity than that, though, we'd need a new query of some sort.
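Limiting the query by driver name could look like the following C sketch. Any vendor whose name doesn't match is skipped, so that driver is never called. It assumes each EGL vendor gets a name, as proposed above. `struct egl_vendor` and `query_vendor_devices` are hypothetical, and so is the `times_queried` counter, which exists only to make the skip observable. The eglQueryDevicesEXT call itself is replaced by a stored device count.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-vendor record; real libglvnd vendor structs differ. */
struct egl_vendor {
    const char *name;   /* a GLX-style vendor name, e.g. "mesa" or "nvidia" */
    int device_count;   /* what this vendor's eglQueryDevicesEXT would report */
    int times_queried;  /* instrumentation for the sketch only */
};

/* Stand-in for calling into the vendor's eglQueryDevicesEXT. */
static int query_vendor_devices(struct egl_vendor *v)
{
    v->times_queried++;
    return v->device_count;
}

/*
 * Query only vendors whose name equals `wanted`, so other drivers are never
 * asked (and never wake their GPUs). NULL for `wanted` queries every vendor.
 * Returns the total number of devices reported.
 */
int query_devices_filtered(struct egl_vendor *vendors, size_t count,
                           const char *wanted)
{
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        if (wanted != NULL && strcmp(vendors[i].name, wanted) != 0)
            continue;  /* skip without calling into this driver at all */
        total += query_vendor_devices(&vendors[i]);
    }
    return total;
}
```

As noted above, this only narrows the query to a single driver. Choosing among several GPUs handled by the same driver would still need a new query.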

For reference: https://lists.freedesktop.org/archives/mesa-dev/2024-March/226174.html
