# ###########################################################################
# ##################### kdumpst: pstore + kdump tooling #####################
# ###########################################################################
#
#
# This is the kdumpst infrastructure; the goal is to collect data whenever
# a kernel crash/panic is detected. There is a lightweight collection, that
# only grabs dmesg, and a more complete setting to grab the whole (compressed)
# vmcore. It supports both pstore (for the lightweight collection) and kdump
# for both collecting dmesg or even the full vmcore. In kdump "mode", both
# initcpio and dracut initramfs images are supported. The focus is Arch Linux
# (and spin-off distros), but should work in most systemd-based distros.
#
#
# ############################ HOW-TO USE IT ##############################
#
# 1. Install the package with pacman if not available in your system; to check
# if it's already installed look the pacman installed package list. Also, be
# sure to properly enable/load the systemd service, since the services on Arch
# Linux do not autoload - for that, just run 'systemctl enable --now kdumpst-init.service'.
#
# 2. In a crash event, the dmesg log is collected, and by default this happens
# via the pstore mechanism, i.e., no crashkernel memory needs to be reserved
# and no GRUB change is required. If 'lsmod' shows "ramoops", then pstore is
# likely in use (check dmesg for "ramoops" to be sure). Some extra files are
# collected besides dmesg, like dmidecode output and "/etc/os-release".
#
# 3. It might be necessary to reserve a bit of memory for pstore in the general
# case, if not pre-reserved due to kernel alignment or through the device-tree;
# check the output of "grep buffer /proc/iomem" - if empty or too small buffer,
# one could save PSTORE_MEM_AMOUNT bytes (see the config file) from kernel use
# with the "mem=" parameter (requires bootloader configuration).
#
# 4. The logs are stored in a ZIP file in the folder at "$MOUNT_FOLDER/logs"
# (see the config file); this file is named as: "kdumpst-TIMESTAMP.zip",
# where TIMESTAMP is the current timestamp (UTC timezone).
#
# 5. (IMPORTANT) Please, test the infrastructure in order to see if a dummy
# crash log is collected before using it to try debugging complex issues.
# In order to do that, login to a shell and execute, as root user:
# 'echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger'
#
# This action will trigger a dummy crash and reboot the system; check if
# there is a ZIP file with the crash logs in the directory described in (3).
#
# 6. Various tunings are available at "/usr/share/kdumpst.d/*" files; for
# example, the users can choose kdump instead of pstore (USE_PSTORE_RAM),
# and if using Kdump, collect the full vmcore (FULL_COREDUMP) or not.
# The vmcore is not stored in the ZIP file, but it's saved in the folder
# "$MOUNT_FOLDER/crash".
# NOTICE that, if kdump is used instead of pstore (either per user's choice
# or due to some failure in pstore), a reboot is necessary before kdump is
# usable, in order to effectively reserve crashkernel memory.
#
# 7. Error and succeeding messages are sent to systemd journal, so running
# 'journalctl -b | grep kdumpst' would hopefully bring some information.
#
#
# ############################## DETAILS ##################################
# CAVEATS / INSTRUCTIONS
# ###########################################################################
# (a) We automatically edit GRUB config in case pstore fails or if the user's
# choice is to use kdump. But it requires one reboot in order the crashkernel
# memory is effectively reserved by kernel.
#
# In case Kdump is used, the crashkernel necessary memory was empirically
# determined; setting 192M wasn't enough always, so 256M seems good enough.
# This amount might change in future kernel versions, requiring tests using
# the approach suggested in the step (5) above.
#
#
# TODOs
# ###########################################################################
# * The package currently doesn't uninstall the dracut/initcpio hooks, this
# is something to be implemented soon, either in the install script or as an
# option of kdumpst-load script.
#
# * We should explore /etc/grub.d/ instead of messing with the general grub
# config file directly to add the "crashkernel" kernel parameter.
#
# * Would be interesting to have a clean-up mechanism, to keep up to N most
# recent ZIP log files, instead of keeping all of them forever.
#
# * Pstore ramoops back-end has some limitations that we're discussing with
# the kernel community - right now we can only collect ONE dmesg and its
# size is truncated on "record_size" bytes, not allowing a file split like
# efi-pstore; thankfully we can still save a 2MiB dmesg, which seems enough.
#

Guilherme G. Piccoli
authored
The gitlab issue #19 was first reported by Saurabh, regarding GPU drivers getting included on the kdump initrd. We've managed to improve the code a bit, but the solution we proposed there was both wrong (for initcpio) *and* incomplete. The user Ethanell reopened the issue, mentioning that even after the patch that allegedly fixed the problem, they were getting amdgpu driver included in the kdump initrd. That was the "wrong" portion of the fix: in my naivety, I thought we could just remove the modules in the kdump install hook. But...happens that initcpio works with a lazy approach: some install hooks execute add_module(), and that signals to the later effective inclusion of the modules by mkinitcpio; so in other words, the kdump install hook was deleting no modules at all. Special thanks to Foxboron and noclaf for the discussion about that on IRC, which was very enlightening. Now, the proper approach is simple: just to skip hooks that add the GPU drivers, like plymouth and kms. Also took the opportunity to include the microcode hook in the ban list, since we don't want kernel updating CPU ucode in case of the microcode packages were updated (would be rare, but why not save a bit of space dropping that?). Finally, I've removed the GPU drivers delete code from the initcpio kdump install hook, since it was a nop in the end. That's it, right? No. Still there's the part two of my statement above: the prior approach was wrong, but not only, it was also *incomplete*. That is due to cases like Ethanell's: users that manually include GPU drivers in their initramfs images, for things like crypto or splash screen. It's also hard or better saying, *impossible*, to guess all hooks that add GPU drivers to the initramfs. So to cover manual inclusions or other hooks, we also add hereby extra kdump command-line flags to avoid the DRM and some GFX drivers probe during the kdump kernel boot. In order to not grow the drivers ban list too much, my attempt was to add common in-tree GPU drivers, but also block the DRM init call. Hopefully this time this is enough to disable GFX on kdump! Fixes: 4767b809 ("initramfs: Fix the removal of GPU drivers from the minimal initrd") Reported-by:Ethanell <(@flifloo)> Signed-off-by:
Guilherme G. Piccoli <gpiccoli@igalia.com>