Framebuffer corruption when a fb which is not being scanned out gets removed
@jwrdegoede
Submitted by Hans de Goede; assigned to Default DRI bug account
Link to original bug (#111588)
Description
This is a weird issue that I noticed while working on this plymouth fix:
plymouth/plymouth!59 (merged)
On boot, to ensure a smooth handover of the framebuffer from plymouth to gdm, the following happens:
- plymouth starts, does an addfb, becomes master and sets the fb as the one the crtc should scan out
- gdm starts and tells plymouth to "deactivate"; plymouth drops master (but does not exit)
- gdm becomes master and installs its own fb for scanout; the fb being scanned out is now owned by gdm
- gdm tells plymouth it may quit now
- plymouth exits without calling rmfb or closing the /dev/dri/card0 fd, relying on the kernel to clean up
- all is well
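The handover above relies on two DRM master rules: only one client can be master at a time, and only the master may change what the crtc scans out. Below is a hypothetical toy model in C, not kernel or libdrm code; `kms_model`, `set_master()`, `drop_master()`, `set_crtc()`, `simulate_handover()` and the fb ids 10 and 20 are made up for illustration, and the real ioctls (DRM_IOCTL_SET_MASTER, DRM_IOCTL_MODE_SETCRTC and friends) involve more state and error codes than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model (NOT kernel code) of the two DRM rules the
 * plymouth -> gdm handover relies on: at most one client is master,
 * and only the master may change which fb the crtc scans out. */

enum client { NONE = 0, PLYMOUTH = 1, GDM = 2 };

struct kms_model {
    enum client master;    /* current DRM master, NONE if nobody */
    unsigned scanout_fb;   /* fb id the crtc scans out, 0 if none */
    enum client fb_owner;  /* client that installed scanout_fb */
};

/* Becoming master fails while another client still holds it. */
static bool set_master(struct kms_model *k, enum client c)
{
    if (k->master != NONE && k->master != c)
        return false;
    k->master = c;
    return true;
}

static void drop_master(struct kms_model *k, enum client c)
{
    if (k->master == c)
        k->master = NONE;
}

/* Only the current master may point the crtc at a new fb. */
static bool set_crtc(struct kms_model *k, enum client c, unsigned fb_id)
{
    assert(fb_id != 0);
    if (k->master != c)
        return false;
    k->scanout_fb = fb_id;
    k->fb_owner = c;
    return true;
}

/* The first three handover steps; fb ids 10 and 20 are arbitrary. */
static struct kms_model simulate_handover(bool *gdm_early, bool *ply_late)
{
    struct kms_model k = { NONE, 0, NONE };

    set_master(&k, PLYMOUTH);               /* plymouth becomes master */
    set_crtc(&k, PLYMOUTH, 10);             /* ...and scans out its fb */
    *gdm_early = set_master(&k, GDM);       /* too early: refused */

    drop_master(&k, PLYMOUTH);              /* plymouth "deactivates" */
    set_master(&k, GDM);                    /* gdm becomes master */
    set_crtc(&k, GDM, 20);                  /* ...and installs its fb */

    *ply_late = set_crtc(&k, PLYMOUTH, 10); /* plymouth: refused */
    return k;
}
```

In this model, once plymouth has dropped master it can no longer touch the crtc, so after gdm installs its fb, plymouth's fb is just an idle object that nothing scans out.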
The fix from the above merge request makes plymouth actually clean up after itself; this is necessary to avoid issues with hot-unplug (see the plymouth MR for details). My first attempt simply made plymouth always do the cleanup, both on hot-unplug and on exit, as that was the most straightforward approach.
This changes the sequence to:
1-4) Same as above
5) Plymouth internally calls ply_renderer_buffer_free() from src/plugins/renderers/drm/plugin.c, which does:
    drmModeRmFB(...);
    munmap (buffer->map_address, buffer->map_size);
    destroy_dumb_buffer_request.handle = buffer->handle;
    drmIoctl (fd, DRM_IOCTL_MODE_DESTROY_DUMB, &destroy_dumb_buffer_request);
This is followed by calling close() on the fd.
6) Plymouth exits
7) Steps 5 and/or 6 cause the gdm framebuffer to be all messed up; it looks like a wrong pitch or tiling setting
Note that when step 5 is executed, plymouth is no longer master and the fb being removed is no longer being scanned out, so this really should not be able to influence the current KMS state, yet it does.
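The expectation in the note above can be written down as a small model. This is a hypothetical sketch in C under simplified assumptions, not the kernel's implementation: an fb can only be removed by the client that created it, removing an fb turns the crtc off only if that crtc is scanning it out, and closing an fd implicitly removes every fb the fd still owns. The names (`struct kms`, `add_fb()`, `rm_fb()`, `close_client()`, `simulate_explicit_cleanup()`) and the pitch values are made up, and munmap and the dumb-buffer destroy are left out. Under these rules step 5 cannot change gdm's scanout, so the corruption seen with amdgpu suggests the real driver deviates from this model somewhere; the model does not say where.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of fb lifetime rules, NOT the kernel's code.
 * fb ids double as indices into fbs[]; id 0 means "no fb". */

#define MAX_FBS 8

struct fb {
    bool alive;
    int owner;          /* client that did the addfb */
    unsigned pitch;     /* bytes per scanline */
};

struct kms {
    struct fb fbs[MAX_FBS];
    unsigned scanout_fb;    /* fb the crtc scans out, 0 = crtc off */
};

struct result {
    bool own_rm_ok;         /* plymouth removing its own fb */
    bool foreign_rm_ok;     /* plymouth trying to remove gdm's fb */
    unsigned scanout_fb;    /* what the crtc scans out afterwards */
    unsigned scanout_pitch;
};

static unsigned add_fb(struct kms *k, int owner, unsigned pitch)
{
    assert(pitch != 0);
    for (unsigned id = 1; id < MAX_FBS; id++) {
        if (!k->fbs[id].alive) {
            k->fbs[id] = (struct fb){ true, owner, pitch };
            return id;
        }
    }
    return 0; /* table full */
}

/* rmfb: refused for fbs owned by another client; only turns the crtc
 * off if the removed fb is the one currently being scanned out. */
static bool rm_fb(struct kms *k, int client, unsigned id)
{
    if (id == 0 || id >= MAX_FBS || !k->fbs[id].alive ||
        k->fbs[id].owner != client)
        return false;
    k->fbs[id].alive = false;
    if (k->scanout_fb == id)
        k->scanout_fb = 0;
    return true;
}

/* close(fd): implicit rmfb of every fb the client still owns. */
static void close_client(struct kms *k, int client)
{
    for (unsigned id = 1; id < MAX_FBS; id++)
        if (k->fbs[id].alive && k->fbs[id].owner == client)
            rm_fb(k, client, id);
}

/* Steps 1-6 with explicit cleanup; master checks are omitted and the
 * pitches (7680 and 8192 bytes) are arbitrary example values. */
static struct result simulate_explicit_cleanup(void)
{
    enum { PLYMOUTH = 1, GDM = 2 };
    struct kms k = { 0 };
    struct result r;

    unsigned ply_fb = add_fb(&k, PLYMOUTH, 7680);
    k.scanout_fb = ply_fb;                          /* plymouth scans out */

    unsigned gdm_fb = add_fb(&k, GDM, 8192);
    k.scanout_fb = gdm_fb;                          /* gdm takes over */

    r.own_rm_ok = rm_fb(&k, PLYMOUTH, ply_fb);      /* step 5: rmfb */
    r.foreign_rm_ok = rm_fb(&k, PLYMOUTH, gdm_fb);  /* refused */
    close_client(&k, PLYMOUTH);                     /* step 5: close() */

    r.scanout_fb = k.scanout_fb;
    r.scanout_pitch = k.scanout_fb ? k.fbs[k.scanout_fb].pitch : 0;
    return r;
}
```

In this model the crtc ends up still scanning out gdm's fb with gdm's pitch, which is exactly the "all is well" outcome the report expected but did not get.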
This is 100% reproducible for me with Fedora 30 + master plymouth + 5.3.0-rc7 on an R7 250E (SAPPHIRE Ultimate Radeon R7 250) using the amdgpu driver. I know that the default is to use the radeon driver with the R7 250E, but I was deliberately using the amdgpu driver to reproduce: https://bugzilla.redhat.com/show_bug.cgi?id=1490490
For now I've modified the plymouth fix to only call ply_renderer_buffer_free() + close() on hot-unplug and to still leave cleanup to the kernel on exit, but it would be nice to get to the bottom of this.