For the driver interface (i.e., this MR), I'm still hoping to get some feedback from someone on the Mesa side. After all, if NVIDIA is the only driver that implements it, then that rather defeats the purpose of having a vendor-neutral interface.
Note that GLX is the only part that actually requires changes to the vendor library interface. EGL requires a couple of extensions which would need to be implemented. For Vulkan, it's implemented in a separate layer, so no driver changes would be necessary.
For EGL, it just uses EGL_EXT_device_persistent_id and EGL_EXT_explicit_device. Mesa already implements EGL_EXT_explicit_device, but would need to add EGL_EXT_device_persistent_id.
The __EGL_VENDOR_OFFLOAD_NAME query could be made optional, since that's really just a performance optimization -- calling eglQueryDevicesEXT can be really expensive if it causes a driver to power on its GPUs. That said, adding the __EGL_VENDOR_OFFLOAD_NAME query would take all of two lines of code in Mesa -- one line if we change it to use a switch statement instead of an if/else sequence.
So, I think the main challenge in Mesa would be to find the device UUID before display init (rather than after you've created a GL context). For EGL, we'd need that to implement EGL_EXT_device_persistent_id, and for GLX, we'd need to plumb that UUID through loader_get_user_preferred_fd. In both cases, that UUID would still need to match GL_DEVICE_UUID_EXT and VkPhysicalDeviceIDProperties::deviceUUID.
Alas, I ran those use cases with X11 at the time. Initially just X11 without a reboot -- same result. Then, after rebooting and checking for updates, I reran my examples. No change. Whatever the cause, it is underneath those top layers (I guess). I'd simply be satisfied not to need EGL; if I could turn that off or side-step it, I could carry on.
For me the killer is the core dumps from different EGL tooling, e.g.
$ eglinfo -B
core dumps every time, on both Wayland and X11, with the message "corrupted size vs. prev_size" ?!
$ eglinfo -B
GBM platform:
EGL API version: 1.5
EGL vendor string: NVIDIA
EGL version string: 1.5
EGL client APIs: OpenGL_ES OpenGL
OpenGL core profile vendor: NVIDIA Corporation
OpenGL core profile renderer: NVIDIA GeForce RTX 2080/PCIe/SSE2
OpenGL core profile version: 4.6.0 NVIDIA 535.129.03
OpenGL core profile shading language version: 4.60 NVIDIA
OpenGL compatibility profile vendor: NVIDIA Corporation
OpenGL compatibility profile renderer: NVIDIA GeForce RTX 2080/PCIe/SSE2
OpenGL compatibility profile version: 4.6.0 NVIDIA 535.129.03
OpenGL compatibility profile shading language version: 4.60 NVIDIA
OpenGL ES profile vendor: NVIDIA Corporation
OpenGL ES profile renderer: NVIDIA GeForce RTX 2080/PCIe/SSE2
OpenGL ES profile version: OpenGL ES 3.2 NVIDIA 535.129.03
OpenGL ES profile shading language version: OpenGL ES GLSL ES 3.20
corrupted size vs. prev_size
Aborted (core dumped)
Getting the same behavior with Nouveau is expected -- if it's not able to load or use the NVIDIA client-side driver for some reason, then it'll end up with Mesa, which is the same place it would be with Nouveau.
One thing that could be worth trying is running a normal X11 session instead of Wayland. As I noted above, the NVIDIA driver doesn't support EGL on XWayland yet.
Thank you @kbrenneman ... Yes, that makes sense. Something driver-related, then, that barfs on the EGL side (gulp!). I am also getting this same problem when using a native Rust library on Ubuntu 23.10 (Wayland AND X11).
It rather seems that the problem lies outside Flatpak/Flatseal's context. To me at least -- though I don't know what I'm talking about; I'm just muddling through, trying to make sense of stuff.
Just wondering, then: since upgrading from Kubuntu 23.04 to Kubuntu 23.10, the NVIDIA driver I'm using was NOT changed. The thing that changed was 23.04 ==> 23.10. Everything always worked on Wayland under 23.04 (as far as I know), and Flatseal worked fantastically. Now, though, I have multiple Flatpak apps and one or two native apps locking up with this same EGL message.
$ flatpak list | grep nvidia
nvidia-535-129-03 org.freedesktop.Platform.GL.nvidia-535-129-03 1.4 system
The currently in-use system driver is: nvidia-535 (metapackage)
Would it perhaps help to go back to the Nouveau driver, just to see if that works around the issue? I think I did try that, with no change; but I'm not sure, so it may be worthwhile giving it another go.
Something that came up when discussing another bug is that I think this could be used for something like the inverse of the usual GPU offloading arrangement.
With X11 and Wayland, the client-side EGL driver can tell which device the display server is running on using DRI3Open or wl_drm. By default, the dGPU's driver would skip a native display if the server is running on the iGPU.
But, if an application calls eglGetDisplay(NULL) or the eglGetPlatformDisplay equivalent with EGL_EXT_platform_device or EGL_MESA_platform_surfaceless, then there is no display server involved, so the driver can't make any such distinction. As a result, the dGPU driver would respond to that call, possibly waking up the dGPU to do so.
To avoid that, something (possibly a startup script for a desktop environment) could generate a config file with a default profile that specifies whatever device the desktop is running on. Then, any application that calls eglGetDisplay(NULL) would end up with that device.
To do that, we'd need to make sure that any application-specific configurations take priority over that, which would be tricky to do using only the directory search order. Also, you'd want to put that in a per-session (rather than per-user) directory, and I don't know of any standard place for such a thing.
We'd also need to be able to limit the eglQueryDevicesEXT calls that libglvnd makes internally, to avoid unnecessarily waking up any GPUs. It would be pretty easy to add a name for each driver like we have with GLX, which would be enough to limit the eglQueryDevicesEXT call to that driver. For any finer granularity than that, though, we'd need a new query of some sort.
I haven't yet, though !224 would be the more relevant change for Mesa. By design, the details of the config file are opaque to the vendor libraries.
I don't have a Fedora system handy at the moment, but I just tested on a minimal Ubuntu installation, which has /usr/bin/python3, but not /usr/bin/python or /usr/bin/python2, and it works as expected: The configure script finds /usr/bin/python3, and the makefiles all explicitly run /usr/bin/python3 to run any Python scripts.
@paulthomas100199 -- Can you attach the config.log file from when you see the build fail?
Also, if you manually tell it where to find the Python interpreter, does that help? You can do that by adding PYTHON=/usr/bin/python3 as an argument to configure.
That tells me that automake isn't finding the Python executable for some reason, or else isn't using it.
If python doesn't exist in $PATH, then the AM_PATH_PYTHON call should find /usr/bin/python3, and then anything that uses Python should explicitly run it as /usr/bin/python3 <script>.py.
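To illustrate the fallback, here is a rough sketch of what that search amounts to. Note this is only an approximation -- AM_PATH_PYTHON's actual candidate list and ordering vary by automake version:

```shell
# Walk candidate interpreter names in order and report the first one on PATH,
# roughly how AM_PATH_PYTHON falls back when "python" is absent
find_python() {
    for py in python python3 python2; do
        if command -v "$py" >/dev/null 2>&1; then
            command -v "$py"   # print the full path, e.g. /usr/bin/python3
            return 0
        fi
    done
    echo "no python interpreter found" >&2
    return 1
}
find_python
```

On a system like the minimal Ubuntu install described above, this prints the path to python3 even though plain python doesn't exist.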
Do you have the NVIDIA package installed in flatpak?
You can check that with flatpak list. On my system, for example:
$ flatpak list | grep nvidia
nvidia-535-129-03 org.freedesktop.Platform.GL.nvidia-535-129-03 1.4 flathub user
nvidia-535-129-03 org.freedesktop.Platform.GL32.nvidia-535-129-03 1.4 flathub user
Flatpak will map the Wayland and X11 sockets through to the container along with any necessary device nodes, and the runtime package will include libglvnd and Mesa. However, the NVIDIA driver has a separate extension package. Also, since the user-space and client-side NVIDIA libraries (in the container) have to match the kernel-space and server-side libraries (in the host system), flatpak has separate packages for each NVIDIA driver version.
So, if you don't have the flatpak extension for whichever version of the NVIDIA driver you have in the host, then the container won't be able to use the NVIDIA driver. When that happens, libglvnd will move on to try Mesa, which in turn will try to run with its software renderer.
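As a concrete illustration of that version matching: the flatpak extension name just encodes the host driver version with the dots replaced by dashes (this helper is hypothetical, and the version string is only an example -- use whatever your host actually reports, e.g. via nvidia-smi):

```shell
# Hypothetical helper: build the flatpak GL extension name that should match
# a given host NVIDIA driver version (dots become dashes in the name)
driver_to_extension() {
    echo "org.freedesktop.Platform.GL.nvidia-$(echo "$1" | tr . -)"
}

# Example host driver version, as reported by e.g.:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
driver_to_extension "535.129.03"
# prints: org.freedesktop.Platform.GL.nvidia-535-129-03
```

That's the ID you'd pass to flatpak install if the extension matching your host driver turns out to be missing.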
Also, note that the NVIDIA driver does not support EGL with XWayland yet. Using EGL_EXT_platform_wayland (i.e., talking to Wayland directly) should work fine, but EGL_KHR_platform_x11 currently only works with a regular Xorg server.
I wonder if we could somehow bump it...
Did you ever write to one of the mesa mailings lists? https://docs.mesa3d.org/lists.html
I was compiling from the spec file on Fedora, using autotools. We can see that there is a patch in Fedora's spec file that changes python to python3.
Hmmm ... Flatseal is an app under flatpak, and flatpak encapsulates or containerises each application -- it was unexpected to see this error in a containerised context. Yes/no? I don't think it is my hardware -- it might also be something to do with the NVIDIA driver in use:
I'm happy to attach logs, console output, run experiments, or such. I have no idea whether I can apply Valgrind from user space for this error/context. I am only running a desktop app -- I doubt I have much in the way of debug info. I was using this exercise to "learn" Rust; there is probably potential to get a stack trace in a future library patch release(????). The flatpak situation seems to point to my environment -- it might be a good thing if there were a diagnostic flatpak module we could all load to check on this kind of thing. Wooo wooo!
Which build system are you using, autotools or Meson?
Either one is supposed to find and use the correct Python executable, regardless of whether it's installed as python, python2, or python3. If that's not working, then we should figure out why.
Currently, /usr/bin/python is provided by a separate package in Fedora. If that package is not installed, the compilation will fail.
[root@fedora ~]# cat /etc/fedora-release
Fedora release 38 (Thirty Eight)
[root@fedora ~]# which python
/usr/bin/which: no python in (/root/.local/bin:/root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin)
[root@fedora ~]# repoquery -q -f "/usr/bin/python"
python-unversioned-command-0:3.11.2-1.fc38.noarch
python-unversioned-command-0:3.11.6-1.fc38.noarch
[root@fedora ~]# repoquery -q -l python-unversioned-command
/usr/bin/python
/usr/share/man/man1/python.1.gz
[root@fedora ~]# repoquery -q -i python-unversioned-command
Name         : python-unversioned-command
Version      : 3.11.2
Release      : 1.fc38
Architecture : noarch
Size         : 11 k
Source       : python3.11-3.11.2-1.fc38.src.rpm
Repository   : fedora
Summary      : The "python" command that runs Python 3
URL          : https://www.python.org/
License      : Python-2.0.1
Description  : This package contains /usr/bin/python - the "python"
             : command that runs Python 3.
Name         : python-unversioned-command
Version      : 3.11.6
Release      : 1.fc38
Architecture : noarch
Size         : 12 k
Source       : python3.11-3.11.6-1.fc38.src.rpm
Repository   : updates
Summary      : The "python" command that runs Python 3
URL          : https://www.python.org/
License      : Python-2.0.1
Description  : This package contains /usr/bin/python - the "python"
             : command that runs Python 3.
[root@fedora ~]# echo $PATH
/root/.local/bin:/root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
The Python scripts are written such that they'll work with Python 2 or 3.
@paulthomas100199 -- Is there some place you've found where the scripts don't work with Python 2, or where python isn't available in $PATH?
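As an aside, the usual way such scripts stay compatible with both interpreters is to stick to syntax both accept, e.g. the __future__ print import (I'm not claiming libglvnd's scripts use exactly this idiom -- it's just the common pattern). This smoke-tests it with whichever python3 is on PATH:

```shell
# Run a 2-and-3 compatible snippet; under Python 2 the __future__ import makes
# print a function, and under Python 3 it is a harmless no-op
python3 -c 'from __future__ import print_function
print("works on 2 and 3")'
```

The same snippet run under a Python 2 interpreter would print the same thing, which is the whole point.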
modify python to python3
Hi! I was debugging a related issue and this was very helpful.
Since mesa 23.3, zink is queried before swrast, so EGL_EXT_platform_xcb applications will go through it. Unfortunately, also since mesa 23.3, zink crashes on top of the NVIDIA Vulkan driver. The result is that applications using xcb + EGL now just crash... (namely: vlc, xwaylandvideobridge)
Hi! Sorry if this question is solely vendor-specific or should be created somewhere else!
I am exploring an X11 OpenGL app that would use only xcb and EGL, without any dependency on Xlib or GLX. I went with EGL because it seemingly works on both Wayland and X11, and I don't want more dependencies than I absolutely need. I found an example of an Xlib + EGL app (https://gist.github.com/mmozeiko/911347b5e3d998621295794e0ba334c4) and ported it to xcb + EGL (https://gist.github.com/valignatev/60fdd91fefabd131a0e53fe2e3ef0ec7), where I used eglGetPlatformDisplayEXT with EGL_PLATFORM_XCB_EXT.
It works on my machine (with an integrated Intel GPU), but people with NVIDIA report that the Xlib version works well while the xcb version falls back to software rasterization and produces a few warnings, such as:
libEGL warning: DRI3: Screen seems not DRI3 capable
libEGL warning: DRI2: failed to authenticate
Their eglinfo advertises EGL_EXT_platform_xcb in the client extensions.
So my question is: did I do something wrong while porting from Xlib to xcb, or is this a case where NVIDIA doesn't handle EGL_EXT_platform_xcb correctly in their vendor implementation? I've tried using Xlib only to get the EGLDisplay, then getting the xcb connection and using that for everything else, and that works as well, but I'd keep it as my last resort; I'd really rather do pure xcb if possible.
Do you know where I should be looking to fix it (or to find out why NVIDIA falls back to swrast)? And, again, I'm sorry if this is the wrong place for such questions; let me know where I should direct it if that's the case.