Delay start-up until a graphics device is present

Hi @pq, I'm new to the Weston project and I'm looking for an issue to start with. Can I work on this feature? Could you give me some pointers if you have the time?

Sure! No-one else has mentioned working on this.

The linked GDM issue has some pointers forward. It has been so long since I filed this issue that I have forgotten all the details.

The fundamental issue still exists: if Weston gets started early enough, the DRM device might not be there yet, or it might be still initializing. Currently Weston assumes that the DRM device it will use is fully initialized, so Weston would fail to start if started too early.

What Weston probably should do is block in the DRM-backend initialization until the DRM device is ready. Maybe it could also use a configurable timeout to fail gracefully if the seat never gets a DRM device.

This task seems to be about talking to logind (launcher-logind.c) to figure out if and when the seat has a DRM device. But finding the right DRM device is done in drm.c from udev independently. So logind should be waited for first, and then drm.c should be able to find the DRM device via udev automatically.

drm_backend_create() is the entry point to the DRM-backend.

Hi @pq,

I'm having some trouble testing my code. It looks like launcher_logind_take_control() is not being called: I added a printf with fflush there and nothing was printed, but a printf in drm_backend_create() does show up. A friend suggested that this happens because Weston was running inside Xfce, so it does not need any code to deal with logind and the function is never called. Do you think that's the case?

About the solution itself, I'm thinking of putting a wait loop after launcher_logind_take_control() and changing the property_changed function to also read the CanGraphical property and update a new can_graphical variable inside the launcher_logind struct. Does this make sense to you?

If you simply run weston in some terminal window, then Weston will pick the X11- or Wayland-backend, as appropriate, and not the DRM-backend. You will see what it picks in the log output. These nested backends (wayland, x11) have no use for logind, because the parent display server is already in control of input and output (the seat).

However, since your prints show that DRM-backend really is used, then something else is going on.

For DRM-backend, always run from a VT where no other display server is active. This means not from any kind of terminal window, and not via ssh. Run it as a normal user, not as root. If logind service is available, it will be used.

Your proposed solution might make sense, but I can't really say for now. I have no insight into this matter at all at the moment.

Thanks @pq, your solution worked fine. Sorry for the delay, I have been very busy these days.

I could see the CanGraphical property changing when unloading/loading the i915 driver and then running the gdbus introspect --system --dest org.freedesktop.login1 --object-path /org/freedesktop/login1/seat/seat0 command. But I still can't pick up this change in the code. This week I will investigate further why this is happening.

Thanks for your help

For reference, the patch in wlroots: https://github.com/swaywm/wlroots/pull/1895

Thanks a lot, this will be very helpful!

Sorry, change of plans because of https://lists.freedesktop.org/archives/systemd-devel/2020-April/044245.html :

Let's not use CanGraphical. Apparently it is more recommended to just watch the kinds of devices Weston wants with udev instead.

cc @emersion @jadahl since your projects had plans using CanGraphical.

Ohh, OK. I'm not sure how to do that though. I will try my best and ask you if I can't move forward.

The fundamental idea would be the same: block in the DRM-backend init until we have the device.

How to actually do that would probably involve the udev monitor that the DRM-backend already sets up anyway, so the code exists, but you would have to do the same separately because you cannot spin the main event loop to service the udev monitor. You'd have a sort of mini-event-loop that dispatches udev events until it finds the DRM device we need, then stop the mini-event-loop and shut the temporary monitor down. Or something like that.
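To make the mini-event-loop idea a bit more concrete, here is a minimal sketch in plain C. It is not Weston code. It assumes the fd would come from udev_monitor_get_fd() on the temporary monitor; wait_for_device(), device_event_fn and stand_in_handler() are hypothetical names. The stand-in handler just reads bytes from a pipe, where a real handler would call udev_monitor_receive_device() and check whether the device is the one we need.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical callback: consume one pending event from fd and report
 * whether it announced the device we are waiting for. */
typedef bool (*device_event_fn)(int fd, void *data);

static int64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Dispatch events on fd until handle() returns true or timeout_ms passes.
 * Returns 0 on success, -1 with errno set (ETIMEDOUT on timeout). */
static int wait_for_device(int fd, int timeout_ms,
			   device_event_fn handle, void *data)
{
	int64_t deadline = now_ms() + timeout_ms;

	assert(handle != NULL);

	for (;;) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		int64_t remaining = deadline - now_ms();
		int ret;

		if (remaining <= 0) {
			errno = ETIMEDOUT;
			return -1;
		}

		ret = poll(&pfd, 1, (int)remaining);
		if (ret < 0 && errno == EINTR)
			continue;	/* interrupted by a signal, retry */
		if (ret < 0)
			return -1;
		if (ret == 0)
			continue;	/* the check at the loop top reports the timeout */

		if (pfd.revents & POLLIN) {
			if (handle(fd, data))
				return 0;
		} else {
			errno = EIO;	/* POLLERR, POLLHUP or POLLNVAL */
			return -1;
		}
	}
}

/* Stand-in for a udev handler (assumption): reads one byte and treats
 * '1' as "this is the DRM device we need". */
static bool stand_in_handler(int fd, void *data)
{
	char byte;

	(void)data;
	return read(fd, &byte, 1) == 1 && byte == '1';
}
```

Events that do not match are consumed and the loop keeps waiting against the same deadline, so unrelated hotplug events do not extend the timeout.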

However, I think a more future-proof approach would be to make the DRM-backend initialization much more dynamic:

Always start with Pixman-renderer, and switch later to GL-renderer if possible. This allows it to...
Initialize the DRM-backend without any DRM device, resulting in "just" no heads existing. The compositor will essentially run headless and normal at this point.
Once udev indicates a DRM device appearing (or already existing) and we are not already using any DRM device, take the DRM device into use.
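As a rough illustration of the last step only, the "take the first usable device, ignore the rest" rule could look like the sketch below. All names here (struct backend_state, backend_device_appeared()) are hypothetical and not part of Weston; the is_kms flag stands for whatever check the backend applies to the device, and the renderer switching is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical backend state: starts with no device, i.e. no heads. */
struct backend_state {
	bool has_device;
	char devnode[64];
};

/* Called when a DRM device appears or is found to already exist.
 * Returns true if the backend took the device into use. */
static bool backend_device_appeared(struct backend_state *b,
				    const char *devnode, bool is_kms)
{
	assert(b != NULL && devnode != NULL);

	/* Already using a device, or this one cannot drive outputs. */
	if (b->has_device || !is_kms)
		return false;

	/* Refuse names that do not fit rather than truncate them. */
	if (strlen(devnode) >= sizeof b->devnode)
		return false;

	strcpy(b->devnode, devnode);
	b->has_device = true;
	return true;
}
```

A zero-initialized struct backend_state corresponds to the "running headless" phase; the first call that returns true marks the point where heads could start to exist.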

The latter approach is probably what we want in the long term, but it likely has some open design questions and may be more work to implement. The former approach is more of a mechanical programming exercise.

Either way would be fine by me.

I would like to implement the future-proof approach, but I don't know how much time this will take. If I notice that I'm struggling too much to implement this approach, I will fall back to the first approach and try the future-proof one afterwards.

Thanks again pq for your explanation of how things should proceed!

Hi @pq, I have a draft implementation at the following link: Igortorrente/weston@ca5f42b9.

I want to implement a more robust solution in the future, but for now I followed the first approach: blocking the back-end until a card becomes available or the timeout occurs.

Even though I now understand the Weston back-end a little better, I'm still not sure how to use Pixman instead of DRM, nor how to switch to DRM afterwards. These things are still a mystery to me.

There are other questions about the implementation that are not clear to me, like:

What to do when another graphics card is inserted/removed?
Should I keep track of any udev name change and each available card?

I think these are my main doubts for now. Thanks @pq

DRM-backend has a field use_pixman in struct weston_drm_backend_config that makes it initialize either Pixman-renderer or GL-renderer. This would be changed so that Pixman-renderer is always initialized on start-up, and if not use_pixman then after a delay call switch_to_gl_renderer(). I'm not sure this is a safe change, because it makes Weston ready to accept clients before GL-renderer is initialized, meaning that clients that connect early won't be using the GPU. This may need more thinking. But, if there is no DRM device around, we cannot initialize GL-renderer anyway.

Blocking in the DRM-backend init OTOH is safe, because it postpones Weston becoming "ready".

In either case, there is no need to do anything about any other DRM device that might appear or disappear. Weston cannot do anything with them anyway, yet. It might be good to keep on waiting if the first DRM device to appear does not actually have KMS support as identified by drm_device_is_kms().

Your patch seems reasonable, although I didn't read it too carefully. We need to be careful with calling wl_event_loop_dispatch(), because previously that has signified the end of initialization. Some components register idle callbacks to know when they can start running (e.g. clients), and idle callbacks are dispatched from the dispatch. OTOH, it would be good to process signal events in case something is stuck and someone wants to quit Weston. If the backend is the first module to initialize, then no other component should have had a chance to register idle callbacks, so we should be safe in that case. Otherwise, we should probably not touch the wl_event_loop and just poll udev without it until a device appears or a timeout is up.
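For the variant that does not touch wl_event_loop at all, a plain wait-and-probe loop might look roughly like this. It is only a sketch: probe_until(), probe_fn and probe_countdown() are hypothetical names, and the stand-in probe merely counts calls, where a real one would ask udev whether a suitable DRM device exists yet.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical probe: returns true once a usable device is found. */
typedef bool (*probe_fn)(void *data);

static int64_t monotonic_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Call probe() every interval_ms until it succeeds or timeout_ms passes.
 * Returns true if the probe succeeded, false on timeout. */
static bool probe_until(probe_fn probe, void *data,
			int interval_ms, int timeout_ms)
{
	int64_t deadline = monotonic_ms() + timeout_ms;

	assert(probe != NULL && interval_ms > 0);

	for (;;) {
		int64_t remaining, nap;
		struct timespec ts;

		if (probe(data))
			return true;

		remaining = deadline - monotonic_ms();
		if (remaining <= 0)
			return false;

		/* Never sleep past the deadline. */
		nap = remaining < interval_ms ? remaining : interval_ms;
		ts.tv_sec = nap / 1000;
		ts.tv_nsec = (nap % 1000) * 1000000L;
		nanosleep(&ts, NULL);	/* an early wake-up just re-probes */
	}
}

/* Stand-in probe (assumption): succeeds once the counter reaches zero. */
static bool probe_countdown(void *data)
{
	int *calls_left = data;

	return --(*calls_left) <= 0;
}
```

The probe always runs at least once before any sleeping, so a device that is already present at start-up adds no delay.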

Thanks, I will try to use use_pixman and switch_to_gl_renderer() and see if it works.

The waitlist is a good idea! I will try to implement it!

About the possible wl_event_loop_dispatch() problem, I'm thinking of leaving it the way it is, and changing it if any problems come up.

In the current state of the draft, in the case of a timeout because of a missing GPU, do you think it is better to fall back to Pixman or just let Weston fail?

Another question: do you think it is better to open a merge request now and send other requests in the future, or to keep working on my fork and only send the request when everything is done? I mean with the future-proof version.

Let Weston fail. Currently Weston won't start without a DRM device either, and keeping existing behavior is less surprising to users.

I believe that merge requests should be opened as soon as you are sure the changes you put into it are useful. When I implement big features, I work in my own branch and at some point I realize some of the patches are ready, they won't change anymore. Those patches are usually some sort of clean-up or refactoring of existing code to better support the new thing. So I put the patches that are ready into an MR to get them merged. That way when I need to rebase the next time, I have less work to do.

When you open an MR, it needs to be self-standing, which means that if it is just preparation for something else, it needs to explain what it is preparing for and why. And of course every step of a patch series must not obviously regress anything.

If the question here is about whether we should have the intermediate solution for device waiting or not, we need to weigh the need for the intermediate solution against the likelihood and speed of reaching the full solution. In this case, I don't remember anyone asking for this feature (people have probably worked around it if they hit it), so the need for the intermediate solution seems to be low. I don't mind either way, so I think you can decide on that. Maybe it's more rewarding for you to do the simpler thing first, get it merged, and then look at the more complicated thing?

Ok, so I will let the DRM fail.

About the MR, I think that in its current state my change is complete, in the sense that it does what it should do (even though not in the best way). And I don't think it will make it hard (at least not harder than it is now) to implement a more robust solution. IMHO it will be a good addition to Weston, and if you disagree on any point please tell me.

There may still be some issues and uncovered edge cases, but as far as I have tested, it is working. I will do more tests today to see if any problem shows up.

Thanks @pq

Hi @pq, I tested my change in three scenarios:

No GPU ever becomes available
One or more GPUs are available at start (I tested with one and two GPU drivers)
One or more GPUs become available after the initial probe and before the timeout

I launched some programs like weston-editor, weston-flower, and weston-simple-egl, to see if everything is working as expected.

The only strange thing occurs in the third scenario. In this case I get this message in the terminal: Unknown parameter: ?2004. It is strange because it does not happen in the other scenarios.

I made a brief investigation, and it looks like it is related to the handling of terminal parameters. But I don't see how this could be related to my change; at least I did not find any wl_event or anything related.

Do you think this could be caused by my change?

I tested a version of my patch with a simple wait-and-probe loop, and the issue is gone. As you said, we need to use wl_event_loop_dispatch carefully, and this is probably the root of the message that I receive in my terminal.

So I will remove the wl_event part.

Hi @pq, I sent an MR!

I will continue to work on a better solution and will update my progress here.

Yeah, the unknown parameter message is in terminal.c, handle_term_parameter(). I have no idea what it means or what might cause it.

Thanks for the MR! Unfortunately I'm fairly tied up now and my next main attention is color management and HDR, so it may or may not take a good while before I get to your MR. If someone else reviews and merges it, that's fine with me.

Ok, no problem. I will be busy with the improvement anyway.

mentioned in merge request !415
