snow gremlins invading systemd and the keyboard
Submitted by Ray Strode
Assigned to Ray Strode @halfline
Link to original bug (#105404)
Description
`<mstahl>` halfline: i've got an F27 plymouthd that has been eating 100% cpu for hours, strace shows it's calling epoll_pwait and nothing else, is that a known issue?
`<halfline>` mstahl: no
`<halfline>` mstahl: is it constantly waking up from epoll_pwait ?
`<halfline>` can you look at what fd is making it wake up?
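One way to answer the "what fd" question once strace or gdb has given a descriptor number is to resolve it through `/proc`. The snippet below is a minimal sketch, not anything plymouth ships; `describe_fd` is a hypothetical helper name, and it assumes a Linux system where `/proc/<pid>/fd/<n>` is a symlink to the open file.

```python
import os

def describe_fd(pid: int, fd: int) -> str:
    """Return what file descriptor `fd` of process `pid` points at.

    Relies on Linux's /proc/<pid>/fd/<fd> symlinks; reading another
    user's process generally requires root.
    """
    return os.readlink(f"/proc/{pid}/fd/{fd}")

if __name__ == "__main__":
    # Demonstrate on our own process with a descriptor we know.
    fd = os.open("/dev/null", os.O_RDONLY)
    try:
        print(describe_fd(os.getpid(), fd))
    finally:
        os.close(fd)
```

For a tty the result would be a device path such as `/dev/tty1`; for pipes and sockets the kernel reports a `pipe:[inode]` or `socket:[inode]` style name instead.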
`<halfline>` this is on a booted machine ?
`<mstahl>` halfline: it's my build pc, i usually just ssh into it, turning on the monitor it still shows the fedora logo, no gdm, so the boot is stuck there
`<halfline>` so maybe something prevented boot from finishing ?
`<halfline>` systemctl status ?
`<mstahl>` halfline: food time, i'll be back in 15 minutes
`<halfline>` oh i'll be gone
`<halfline>` it's almost food time for me too :-)
`<mstahl>` halfline: i've finally installed enough debuginfo that i see plymouth source in gdb... ... ... it always has 1 event and that goes into "on_tty_input" in ply-terminal.c
`<halfline>` mstahl: weird
`<halfline>` does the machine have a keyboard attached ?
`<halfline>` are you able to see with gdb what is getting input to the tty ?
`<mstahl>` halfline: so it used to have a keyboard attached while booting, but not any more
`<mstahl>` halfline: it's a switchable usb-hub, that's connected to my laptop and to the pc
`<halfline>` i wonder if it has a stuck key
`<mstahl>` is it possible that a key was pressed while disconnecting it and then it never goes "unpressed"?
`<mstahl>` lsusb doesn't show anything other than hubs
`<halfline>` i wouldn't expect that kind of behavior, modulo a kernel bug
`<halfline>` can you set a break point on on_keyboard_input ?
`<halfline>` and then (gdb) print keyboard_input ?
`<mstahl>` on_keyboard_input breakpoint never hits
`<mstahl>` on_tty_input does not do anything btw because terminal->input_closures is null
`<mstahl>` err i mean ply_list_get_first_node (terminal->input_closures) is null
`<mstahl>` is it possible that plymouth doesn't actually read from the fd because of the on_tty_input returning early & that causes the loop?
`<halfline>` lemme check the code
`<mstahl>` read is being called from ply_terminal_session_on_new_data anyway
`<halfline>` different terminal
`<halfline>` that's /dev/console
`<mstahl>` okay, that's the only thing that hits a breakpoint on "read"
`<halfline>` yea so there should be a "ply_keyboard_t" object that reads from the terminal
`<halfline>` it should set up an input closure
`<halfline>` since that didn't happen, you're exactly right, nothing is reading from tty
`<mstahl>` maybe it did happen & there is some handling for a "keyboard removed" event & that didn't clean up everything?
`<halfline>` of course no idea why the keyboard object didn't get created
`<halfline>` could be
`<halfline>` could be some sort of race where if ply_keyboard_stop_watching_for_input is called right when a key is pressed
`<halfline>` boom
`<mstahl>` oh, i should mention that i typed in a passphrase during boot, though that was probably into something launched from the initrd, not sure what the relation of this plymouthd to that is...
`<halfline>` plymouth does handle reading the passphrase at boot
`<halfline>` so it could be related
`<halfline>` maybe you typed your password, hit enter, boot finished, and you put the keyboard down
`<halfline>` right as you put it down a key got pressed and boot finished at the same time
`<halfline>` and the race happened
`<halfline>` i'll file a bug
`<halfline>` and investigate if this is a possible explanation later
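The busy loop diagnosed above (an fd that stays readable because the handler returns without reading) is easy to reproduce in isolation. This is a minimal sketch that uses a pipe in place of the tty and Python's `select.epoll` in place of plymouth's event loop; `count_wakeups` is a hypothetical name and none of this is plymouth code. With level-triggered epoll, which is the default, every poll reports the fd again until the pending byte is consumed.

```python
import os
import select

def count_wakeups(drain: bool, iterations: int = 5) -> int:
    """Count how many of `iterations` polls report the pipe as readable."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    ep = select.epoll()
    ep.register(read_fd, select.EPOLLIN)  # level-triggered by default
    os.write(write_fd, b"x")              # one pending "keypress"
    wakeups = 0
    try:
        for _ in range(iterations):
            if not ep.poll(0.1):          # timeout in seconds
                continue
            wakeups += 1
            if drain:
                os.read(read_fd, 4096)    # consuming the data clears readiness
            # Otherwise return without reading, like a handler that has
            # no input closures: the fd is still readable on the next poll.
    finally:
        ep.close()
        os.close(read_fd)
        os.close(write_fd)
    return wakeups

if __name__ == "__main__":
    print("without reading:", count_wakeups(drain=False))
    print("with reading:   ", count_wakeups(drain=True))
```

Without the read, every poll returns immediately, which matches the 100% cpu and the strace output showing nothing but epoll_pwait. With the read, only the first poll wakes up.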
`<halfline>` mstahl: one thing i wonder
`<halfline>` why didn't plymouth quit ?
`<halfline>` i mean i get that the main loop is waking up in a loop
`<halfline>` but it should still be processing other events
`<halfline>` so when the display manager told it to quit, why didn't it quit? that's weird
`<halfline>` does running "sudo plymouth quit" make it quit ?
`<mstahl>` hmmm no idea, what sort of events would that be?
`<halfline>` events from the display manager asking it to quit
`<halfline>` like i wonder if there are two problems
`<halfline>` maybe boot isn't finishing too
`<halfline>` what does systemctl status say ?
`<mstahl>` i've got this: root 8359 0.0 0.0 335840 6608 ? Ssl 10:08 0:00 /usr/sbin/gdm
`<halfline>` do you have an X server too ?
`<halfline>` or is gdm frozen too ?
`<mstahl>` systemctl status doesn't say anything, it hangs
`<halfline>` really ?
`<halfline>` nice
`<mstahl>` Failed to read server status: Connection timed out
`<mstahl>` well that is unfortunate :)
`<halfline>` are you running it as root ?
`<halfline>` if not, does running it as root help ?
`<mstahl>` oh, no ... but that hangs, too
`<halfline>` nice
`<halfline>` so either systemd is fucked too or selinux is preventing systemctl from talking to systemd or something
`<halfline>` I THINK I WOULD REBOOT AT THIS POINT
`<mstahl>` the funny thing is, i've been using it for 9 hours with ssh -X with no issue :)
`<halfline>` no surge of zombie processes ?
`<halfline>` i guess systemd is working on some level then
`<halfline>` if it's reaping children okay
`<mstahl>` ps aux | grep Z shows nothing
`<mstahl>` there's no X process, pstree shows gdm having a 'plymouth' child and 2*[{gdm}]
`<halfline>` okay so "plymouth quit" is getting called but never returning ?
`<mstahl>` root 8379 0.0 0.0 16364 1144 ? S 10:08 0:00 /bin/plymouth deactivate
`<halfline>` if you run it now (as root) does it just hang ?
`<mstahl>` "plymouth deactivate" does hang
`<mstahl>` eh, maybe i should detach gdb
`<mstahl>` still hangs
`<halfline>` ah is gdb stopping because of sigpipe ?
`<halfline>` oh
`<mstahl>` no i just didn't continue
`<halfline>` what about plymouth quit ?
`<halfline>` so one thing is
`<mstahl>` error: unexpectedly disconnected from boot status daemon
`<halfline>` heh plymouthd crashed i guess
`<halfline>` nice
`<mstahl>` oh no more plymouthd process
`<halfline>` did an X server start now?
`<mstahl>` ah, on the monitor i see the gdm login now
`<halfline>` does systemctl status work now ?
`<mstahl>` yes, that works too now!
`<mstahl>` how was that dependent on plymouthd?
`<mstahl>` State: degraded
`<halfline>` well systemd does send some info to plymouthd
`<mstahl>` Failed: 1 units
`<halfline>` little surprised it makes systemctl status hang
`<mstahl>` abrt-notification[453608]: Process 425 (plymouthd) crashed in ??()
`<mstahl>` ^ in ??, after i installed all that debuginfo :)
`<mstahl>` oh, theres a proper backtrace further up
`<halfline>` can you attach it to the bug ?
`<mstahl>` assert fail in #4 0x00007f2965921d2a ply_boot_splash_become_idle (libply-splash-core
`<mstahl>` sure, will do...
`<halfline>` that assertion failure sort of makes sense
`<halfline>` since there was an in-progress deactivate going on
`<halfline>` (i mean obviously a bug, but at least not a head scratching one)
`<halfline>` mstahl: you're just using the stock splash right?
`<halfline>` no custom hot dog based themes etc ?
`<mstahl>` halfline: no customizations, F27 defaults
`<mstahl>` plymouth hasn't logged anything unusual, not until it crashes
`<halfline>` plymouth doesn't really give log messages unless plymouth.debug is on the kernel command line
`<halfline>` maybe it should do it unconditionally and just discard the results every boot unless plymouth.debug is on kernel command line
`<halfline>` would make fixing hard to reproduce problems easier
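The "always record, usually discard" idea can be sketched as a bounded in-memory buffer that is only written out when debugging was requested. This is a hypothetical illustration, not plymouth's logging code: `BufferedLog`, `debug_requested`, and the capacity are made-up names and values, and treating a `plymouth.debug=...` form as enabled is an assumption.

```python
from collections import deque

class BufferedLog:
    """Keep only the most recent `capacity` messages in memory."""

    def __init__(self, capacity: int = 1000):
        self._lines = deque(maxlen=capacity)  # oldest entries fall off

    def log(self, message: str) -> None:
        self._lines.append(message)

    def dump(self) -> list:
        return list(self._lines)

def debug_requested(cmdline: str) -> bool:
    """True if the kernel command line string asks for plymouth debugging."""
    return any(
        arg == "plymouth.debug" or arg.startswith("plymouth.debug=")
        for arg in cmdline.split()
    )

if __name__ == "__main__":
    log = BufferedLog(capacity=2)
    for msg in ("starting", "deactivating", "quitting"):
        log.log(msg)
    # On a real system the string would come from /proc/cmdline.
    if debug_requested("quiet rhgb plymouth.debug"):
        print("\n".join(log.dump()))
```

Because the buffer is always filled, the same data could also be written out from a crash or assertion handler, which is the case where it would have helped here.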
`<halfline>` alright that's as far as i'm going to take this today, i'm going to get back to what i was working on before
`<halfline>` weird bug, but there are enough clues, hopefully i'll be able to deduce what's going on
`<halfline>` like we know it was deactivating, so we know the keyboard objects were getting deactivated
`<halfline>` and i can certainly fix the infinite loop problem too
`<halfline>` so we can come at it from multiple angles
`<mstahl>` snow gremlins, lol