  1. Jan 12, 2024
    • kexec: do syscore_shutdown() in kernel_kexec · 7bb94380
      James Gowans authored
      syscore_shutdown() runs driver and module callbacks to get the system into
      a state where it can be correctly shut down.  In commit 6f389a8f ("PM
      / reboot: call syscore_shutdown() after disable_nonboot_cpus()")
      syscore_shutdown() was removed from kernel_restart_prepare() and hence got
      (incorrectly?) removed from the kexec flow.  This was innocuous until
      commit 6735150b ("KVM: Use syscore_ops instead of reboot_notifier to
      hook restart/shutdown") changed the way that KVM registered its shutdown
      callbacks, switching from reboot notifiers to syscore_ops.shutdown.  As
      syscore_shutdown() is missing from kexec, KVM's shutdown hook is not run
      and virtualisation is left enabled on the boot CPU which results in triple
      faults when switching to the new kernel on Intel x86 VT-x with VMXE
      enabled.
      
      Fix this by adding syscore_shutdown() to the kexec sequence.  In terms of
      where to add it, it is being added after migrating the kexec task to the
      boot CPU, but before APs are shut down.  It is not totally clear if this
      is the best place: in commit 6f389a8f ("PM / reboot: call
      syscore_shutdown() after disable_nonboot_cpus()") it is stated that
      "syscore_ops operations should be carried with one CPU on-line and
      interrupts disabled." APs are only offlined later in machine_shutdown(),
      so this syscore_shutdown() is being run while APs are still online.  This
      seems to be the correct place as it matches where syscore_shutdown() is
      run in the reboot and halt flows - they also run it before APs are shut
      down.  The assumption is that the commit message in commit 6f389a8f
      ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") is
      no longer valid.
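
      The ordering described above can be modelled in userspace.  This is a
      minimal sketch, not the kernel code: every name ending in _model is
      hypothetical, and the reverse-order walk reflects my understanding of
      syscore_shutdown() rather than anything stated in this commit message.

```c
#include <stddef.h>

/* Userspace stand-in for a syscore_ops entry; only the shutdown hook is kept. */
struct syscore_ops_model {
	const char *name;
	void (*shutdown)(void);
};

static int virt_enabled = 1;	/* stands in for virtualisation left on on the boot CPU */
static int mce_enabled = 1;

static void kvm_shutdown_model(void) { virt_enabled = 0; }
static void mce_shutdown_model(void) { mce_enabled = 0; }

static struct syscore_ops_model registered[] = {
	{ "mce", mce_shutdown_model },
	{ "kvm", kvm_shutdown_model },
	{ "no-hook", NULL },		/* entries without a shutdown hook are skipped */
};

/* Call every registered shutdown hook, last registered first. */
static void syscore_shutdown_model(void)
{
	size_t n = sizeof(registered) / sizeof(registered[0]);

	while (n-- > 0)
		if (registered[n].shutdown)
			registered[n].shutdown();
}

/* Model of the kexec path; returns 1 if virtualisation is off before the jump. */
static int kexec_model(int call_syscore_shutdown)
{
	virt_enabled = 1;
	mce_enabled = 1;
	/* the task would be migrated to the boot CPU here */
	if (call_syscore_shutdown)
		syscore_shutdown_model();
	/* APs would only be offlined after this point */
	return !virt_enabled;
}
```

      Calling kexec_model(0) models the broken flow (the hook never runs),
      while kexec_model(1) models the flow with this fix applied.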
      
      KVM has been discussed here as it is what broke loudly by not having
      syscore_shutdown() in kexec, but this change impacts more than just KVM;
      all drivers/modules which register a syscore_ops.shutdown callback will
      now be invoked in the kexec flow.  Looking at some of them like x86 MCE it
      is probably more correct to also shut these down during kexec. 
      Maintainers of all drivers which use syscore_ops.shutdown are added on CC
      for visibility.  They are:
      
      arch/powerpc/platforms/cell/spu_base.c  .shutdown = spu_shutdown,
      arch/x86/kernel/cpu/mce/core.c          .shutdown = mce_syscore_shutdown,
      arch/x86/kernel/i8259.c                 .shutdown = i8259A_shutdown,
      drivers/irqchip/irq-i8259.c             .shutdown = i8259A_shutdown,
      drivers/irqchip/irq-sun6i-r.c           .shutdown = sun6i_r_intc_shutdown,
      drivers/leds/trigger/ledtrig-cpu.c      .shutdown = ledtrig_cpu_syscore_shutdown,
      drivers/power/reset/sc27xx-poweroff.c   .shutdown = sc27xx_poweroff_shutdown,
      kernel/irq/generic-chip.c               .shutdown = irq_gc_shutdown,
      virt/kvm/kvm_main.c                     .shutdown = kvm_shutdown,
      
      This has been tested by doing a kexec on x86_64 and aarch64.
      
      Link: https://lkml.kernel.org/r/20231213064004.2419447-1-jgowans@amazon.com
      
      
      Fixes: 6735150b ("KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown")
      Signed-off-by: James Gowans <jgowans@amazon.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Chen-Yu Tsai <wens@csie.org>
      Cc: Jernej Skrabec <jernej.skrabec@gmail.com>
      Cc: Samuel Holland <samuel@sholland.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Sebastian Reichel <sre@kernel.org>
      Cc: Orson Zhai <orsonzhai@gmail.com>
      Cc: Alexander Graf <graf@amazon.de>
      Cc: Jan H. Schoenherr <jschoenh@amazon.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      7bb94380
  2. Dec 29, 2023
    • kexec_core: fix the assignment to kimage->control_page · 2861b377
      Yuntao Wang authored
      image->control_page represents the starting address for allocating the
      next control page, while hole_end represents the address of the last valid
      byte of the currently allocated control page.
      
      This bug actually does not affect the correctness of allocating control
      pages, because image->control_page is currently only used in
      kimage_alloc_crash_control_pages(), and this function, when allocating
      control pages, will first align image->control_page up to the nearest
      `(1 << order) << PAGE_SHIFT` boundary, then use this value as the
      starting address of the next control page.  This ensures that the newly
      allocated control page will use the correct starting address and not
      overlap with previously allocated control pages.
      
      Although it does not affect the correctness of the final result, it is
      better for us to set image->control_page to the correct value, in case
      it might be used elsewhere in the future, potentially causing errors.
      
      Therefore, after successfully allocating a control page,
      image->control_page should be updated to `hole_end + 1`, rather than
      hole_end.
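
      The arithmetic can be checked with a small userspace sketch.  It assumes
      4 KiB pages and uses hypothetical names (struct hole,
      alloc_control_pages_model); it only mirrors the align-then-advance logic
      described above, not the real allocator.

```c
#include <stdint.h>

#define PAGE_SHIFT_MODEL 12	/* assumption: 4 KiB pages */

struct hole {
	uint64_t start;
	uint64_t end;		/* address of the last valid byte (inclusive) */
};

/* Round x up to a power-of-two boundary. */
static uint64_t align_up_model(uint64_t x, uint64_t boundary)
{
	return (x + boundary - 1) & ~(boundary - 1);
}

/*
 * Allocate (1 << order) pages starting at the next aligned address and
 * advance *control_page to the first byte after the allocation.
 */
static struct hole alloc_control_pages_model(uint64_t *control_page,
					     unsigned int order)
{
	uint64_t size = (UINT64_C(1) << order) << PAGE_SHIFT_MODEL;
	struct hole h;

	h.start = align_up_model(*control_page, size);
	h.end = h.start + size - 1;
	*control_page = h.end + 1;	/* the corrected assignment */
	return h;
}
```

      Because the next allocation aligns the start address up anyway, storing
      hole_end instead of hole_end + 1 yields the same next start address,
      which is why the old value did not cause overlapping allocations.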
      
      Link: https://lkml.kernel.org/r/20231221042308.11076-1-ytcoode@gmail.com
      
      
      Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      2861b377
    • kexec: modify the meaning of the end parameter in kimage_is_destination_range() · 816d334a
      Yuntao Wang authored
      The end parameter received by kimage_is_destination_range() should be the
      last valid byte address of the target memory segment plus 1.  However, in
      the locate_mem_hole_bottom_up() and locate_mem_hole_top_down() functions,
      the corresponding value passed to kimage_is_destination_range() is the
      last valid byte address of the target memory segment, which is 1 less.
      
      There are two ways to fix this bug.  We can either correct the logic of
      the locate_mem_hole_bottom_up() and locate_mem_hole_top_down() functions,
      or we can fix kimage_is_destination_range() by making the end parameter
      represent the last valid byte address of the target memory segment.  Here,
      we choose the second approach.
      
      Due to the modification to kimage_is_destination_range(), we also need to
      adjust its callers, such as kimage_alloc_normal_control_pages() and
      kimage_alloc_page().
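
      With an inclusive end, the overlap test can be sketched as follows.
      This is a userspace illustration under that assumption; struct
      segment_model and is_destination_range_model are hypothetical names,
      not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

struct segment_model {
	uint64_t mem;		/* first byte of the segment */
	uint64_t memsz;		/* size in bytes, assumed non-zero */
};

/*
 * Return 1 if [start, end] overlaps any segment, 0 otherwise.
 * Both start and end are inclusive byte addresses.
 */
static int is_destination_range_model(const struct segment_model *segs,
				      size_t nr, uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < nr; i++) {
		uint64_t mstart = segs[i].mem;
		uint64_t mend = mstart + segs[i].memsz - 1;	/* last valid byte */

		if (end >= mstart && start <= mend)
			return 1;
	}
	return 0;
}
```

      A caller that holds an exclusive end (last byte plus 1) would pass
      end - 1 under this convention.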
      
      Link: https://lkml.kernel.org/r/20231217033528.303333-2-ytcoode@gmail.com
      
      
      Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      816d334a
  3. Dec 20, 2023
    • kexec: use ALIGN macro instead of open-coding it · db6b6fb7
      Yuntao Wang authored
      Use ALIGN macro instead of open-coding it to improve code readability.
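
      For readers unfamiliar with the idiom, here is a hedged userspace
      sketch of what "open-coded" alignment looks like next to a macro.
      ALIGN_MODEL is a hypothetical stand-in, valid only for power-of-two
      alignments; it is not the kernel's ALIGN definition.

```c
#include <stdint.h>

/* Round x up to a multiple of a, where a is a power of two. */
#define ALIGN_MODEL(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* The open-coded form: correct, but the intent is buried in bit twiddling. */
static uint64_t align_open_coded(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* The same computation with the intent named at the call site. */
static uint64_t align_with_macro(uint64_t x, uint64_t a)
{
	return ALIGN_MODEL(x, a);
}
```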
      
      Link: https://lkml.kernel.org/r/20231212142706.25149-1-ytcoode@gmail.com
      
      
      Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      db6b6fb7
    • kexec_file: add kexec_file flag to control debug printing · cbc2fe9d
      Baoquan He authored
      Patch series "kexec_file: print out debugging message if required", v4.
      
      Currently, specifying '-d' on the kexec command line prints a lot of
      debugging information about kexec/kdump loading with the kexec_load interface.
      
      However, kexec_file_load prints nothing even when '-d' is specified.
      This makes it very inconvenient to debug or analyze kexec/kdump loading
      when something goes wrong with kexec/kdump itself, or when a developer
      wants to check the kexec/kdump loading.
      
      In this patchset, a kexec_file flag, KEXEC_FILE_DEBUG, is added and checked
      in code.  If it's passed in, debugging messages of kexec_file code will be
      printed out and can be seen from console and dmesg.  Otherwise, the
      debugging messages are printed as before, through pr_debug().
      
      Note:
      =====
      1) The code in the kexec-tools utility also needs to be changed to support
      passing KEXEC_FILE_DEBUG to the kernel when 'kexec -s -d' is specified.
      The patch link is here:
      [PATCH] kexec_file: add kexec_file flag to support debug printing
      http://lists.infradead.org/pipermail/kexec/2023-November/028505.html
      
      2) s390 also has kexec_file code, but I am not sure what debugging
      information is necessary there, so that is left to the s390 developers.
      
      Test:
      =====
      Testing was done in v1 on x86_64 and arm64.  For v4, it was tested on
      x86_64 again.  On x86_64, the printed messages look like this:
      --------------------------------------------------------------
      kexec measurement buffer for the loaded kernel at 0x207fffe000.
      Loaded purgatory at 0x207fff9000
      Loaded boot_param, command line and misc at 0x207fff3000 bufsz=0x1180 memsz=0x1180
      Loaded 64bit kernel at 0x207c000000 bufsz=0xc88200 memsz=0x3c4a000
      Loaded initrd at 0x2079e79000 bufsz=0x2186280 memsz=0x2186280
      Final command line is: root=/dev/mapper/fedora_intel--knightslanding--lb--02-root ro
      rd.lvm.lv=fedora_intel-knightslanding-lb-02/root console=ttyS0,115200N81 crashkernel=256M
      E820 memmap:
      0000000000000000-000000000009a3ff (1)
      000000000009a400-000000000009ffff (2)
      00000000000e0000-00000000000fffff (2)
      0000000000100000-000000006ff83fff (1)
      000000006ff84000-000000007ac50fff (2)
      ......
      000000207fff6150-000000207fff615f (128)
      000000207fff6160-000000207fff714f (1)
      000000207fff7150-000000207fff715f (128)
      000000207fff7160-000000207fff814f (1)
      000000207fff8150-000000207fff815f (128)
      000000207fff8160-000000207fffffff (1)
      nr_segments = 5
      segment[0]: buf=0x000000004e5ece74 bufsz=0x211 mem=0x207fffe000 memsz=0x1000
      segment[1]: buf=0x000000009e871498 bufsz=0x4000 mem=0x207fff9000 memsz=0x5000
      segment[2]: buf=0x00000000d879f1fe bufsz=0x1180 mem=0x207fff3000 memsz=0x2000
      segment[3]: buf=0x000000001101cd86 bufsz=0xc88200 mem=0x207c000000 memsz=0x3c4a000
      segment[4]: buf=0x00000000c6e38ac7 bufsz=0x2186280 mem=0x2079e79000 memsz=0x2187000
      kexec_file_load: type:0, start:0x207fff91a0 head:0x109e004002 flags:0x8
      ---------------------------------------------------------------------------
      
      
      This patch (of 7):
      
      When specifying 'kexec -c -d', the kexec_load interface prints loading
      information, e.g. the regions where kernel/initrd/purgatory/cmdline are
      put, the memmap passed to the 2nd kernel as system RAM ranges, and
      all contents of struct kexec_segment.  These are very helpful for
      analyzing or locating what went wrong when kexec/kdump itself fails.
      The debugging printing for the kexec_load interface is done in the
      user space utility kexec-tools.
      
      With the kexec_file_load interface, however, 'kexec -s -d' prints nothing,
      because kexec_file code is mostly implemented in kernel space and the
      debugging printing functionality is missing.  This is inconvenient when
      debugging kexec/kdump loading and jumping with the kexec_file_load interface.
      
      Now add KEXEC_FILE_DEBUG to kexec_file flag to control the debugging
      message printing.  And add global variable kexec_file_dbg_print and macro
      kexec_dprintk() to facilitate the printing.
      
      This is a preparation; later, kexec_dprintk() will be used to replace the
      existing pr_debug().  Once 'kexec -s -d' is specified, it will print out
      kexec/kdump loading information.  If '-d' is not specified, it falls back
      to pr_debug().
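
      The switch between the two print paths can be sketched in userspace.
      Everything suffixed _model is hypothetical: the pr_info/pr_debug
      stand-ins just count calls, and the flag value 0x8 is an assumption
      chosen to match the "flags:0x8" shown in the sample output above.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

#define KEXEC_FILE_DEBUG_MODEL 0x8UL	/* assumed value, see lead-in */

static bool kexec_file_dbg_print_model;
static int info_lines;		/* messages that reached the console */
static int debug_lines;		/* messages routed to the quiet debug path */

/* Stand-in for pr_info(): always printed. */
static void pr_info_model(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vprintf(fmt, ap);
	va_end(ap);
	info_lines++;
}

/* Stand-in for pr_debug(): silent here, as when dynamic debug is off. */
static void pr_debug_model(const char *fmt, ...)
{
	(void)fmt;
	debug_lines++;
}

/* Print loudly when the debug flag was passed, otherwise use the debug path. */
#define kexec_dprintk_model(...)				\
	do {							\
		if (kexec_file_dbg_print_model)			\
			pr_info_model(__VA_ARGS__);		\
		else						\
			pr_debug_model(__VA_ARGS__);		\
	} while (0)

/* Model of the syscall entry: latch the flag, then emit one message. */
static void kexec_file_load_model(unsigned long flags)
{
	kexec_file_dbg_print_model = (flags & KEXEC_FILE_DEBUG_MODEL) != 0;
	kexec_dprintk_model("nr_segments = %d\n", 5);
}
```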
      
      Link: https://lkml.kernel.org/r/20231213055747.61826-1-bhe@redhat.com
      Link: https://lkml.kernel.org/r/20231213055747.61826-2-bhe@redhat.com
      
      
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Conor Dooley <conor@kernel.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      cbc2fe9d
  4. Dec 11, 2023
  5. Oct 04, 2023
    • crash_core: move crashk_*res definition into crash_core.c · b631b95d
      Baoquan He authored
      Both crashk_res and crashk_low_res are used to mark the reserved
      crashkernel regions in the iomem_resource tree.  Later, the generic
      crashkernel reservation will be added into crash_core.c.  So move the
      crashk_res and crashk_low_res definitions into crash_core.c to avoid a
      compile error if CONFIG_CRASH_CORE is enabled while CONFIG_KEXEC_CORE is unset.
      
      Meanwhile, include <asm/crash_core.h> in <linux/crash_core.h> if generic
      reservation is needed.  In that case, <asm/crash_core.h> needs to be added
      by the ARCH.  In asm/crash_core.h, the ARCH can provide its own macro
      definitions to override macros in <linux/crash_core.h> if needed.  Wrap the
      include in a CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION ifdef to
      avoid compile errors on other ARCHes which don't use the generic
      reservation yet.
      
      Link: https://lkml.kernel.org/r/20230914033142.676708-6-bhe@redhat.com
      
      
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Jiahao <chenjiahao16@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      b631b95d
  6. Aug 24, 2023
    • crash: add generic infrastructure for crash hotplug support · 24726275
      Eric DeVolder authored
      To support crash hotplug, a mechanism is needed to update the crash
      elfcorehdr upon CPU or memory changes (e.g. hot un/plug or off/onlining).
      The crash elfcorehdr describes the CPUs and memory to be written into the
      vmcore.
      
      To track CPU changes, callbacks are registered with the cpuhp mechanism
      via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN).  The crash hotplug
      elfcorehdr update has no explicit ordering requirement (relative to other
      cpuhp states), so meets the criteria for utilizing CPUHP_BP_PREPARE_DYN. 
      CPUHP_BP_PREPARE_DYN is a dynamic state and avoids the need to introduce a
      new state for crash hotplug.  Also, CPUHP_BP_PREPARE_DYN is the last state
      in the PREPARE group, just prior to the STARTING group, which is very
      close to the CPU starting up in a plug/online situation, or stopping in an
      unplug/offline situation.  This minimizes the window of time during an
      actual plug/online or unplug/offline situation in which the elfcorehdr
      would be inaccurate.  Note that for a CPU being unplugged or offlined, the
      CPU will still be present in the list of CPUs generated by
      crash_prepare_elf64_headers().  However, there is no need to explicitly
      omit the CPU, see justification in 'crash: change
      crash_prepare_elf64_headers() to for_each_possible_cpu()'.
      
      To track memory changes, a notifier is registered to capture the memblock
      MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().
      
      The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event()
      which performs needed tasks and then dispatches the event to the
      architecture specific arch_crash_handle_hotplug_event() to update the
      elfcorehdr with the current state of CPUs and memory.  During the process,
      the kexec_lock is held.
      
      Link: https://lkml.kernel.org/r/20230814214446.6659-3-eric.devolder@oracle.com
      
      
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Acked-by: Hari Bathini <hbathini@linux.ibm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Akhil Raj <lf32.dev@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Weißschuh <linux@weissschuh.net>
      Cc: Valentin Schneider <vschneid@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      24726275
    • crash: move a few code bits to setup support of crash hotplug · 6f991cc3
      Eric DeVolder authored
      Patch series "crash: Kernel handling of CPU and memory hot un/plug", v28.
      
      Once the kdump service is loaded, if changes to CPUs or memory occur,
      either by hot un/plug or off/onlining, the crash elfcorehdr must also be
      updated.
      
      The elfcorehdr describes to kdump the CPUs and memory in the system, and
      any inaccuracies can result in a vmcore with missing CPU context or memory
      regions.
      
      The current solution utilizes udev to initiate an unload-then-reload of
      the kdump image (e.g. kernel, initrd, boot_params, purgatory and
      elfcorehdr) by the userspace kexec utility.  In the original post I
      outlined the significant performance problems related to offloading this
      activity to userspace.
      
      This patchset introduces a generic crash handler that registers with the
      CPU and memory notifiers.  Upon CPU or memory changes, from either hot
      un/plug or off/onlining, this generic handler is invoked and performs
      important housekeeping, for example obtaining the appropriate lock, and
      then invokes an architecture specific handler to do the appropriate
      elfcorehdr update.
      
      Note the description in patch 'crash: change crash_prepare_elf64_headers()
      to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that
      enables further optimizations related to CPU plug/unplug/online/offline
      performance of elfcorehdr updates.
      
      In the case of x86_64, the arch specific handler generates a new
      elfcorehdr, and overwrites the old one in memory; thus no involvement with
      userspace is needed.
      
      To realize the benefits/test this patchset, one must make a couple
      of minor changes to userspace:
      
       - Prevent udev from updating kdump crash kernel on hot un/plug changes.
         Add the following as the first lines to the RHEL udev rule file
         /usr/lib/udev/rules.d/98-kexec.rules:
      
         # The kernel updates the crash elfcorehdr for CPU and memory changes
         SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
         SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
      
         With this changeset applied, the two rules evaluate to false for
         CPU and memory change events and thus skip the userspace
         unload-then-reload of kdump.
      
       - Change to the kexec_file_load for loading the kdump kernel:
         Eg. on RHEL: in /usr/bin/kdumpctl, change to:
          standard_kexec_args="-p -d -s"
         which adds the -s to select kexec_file_load() syscall.
      
      This kernel patchset also supports kexec_load() with a modified kexec
      userspace utility.  A working changeset to the kexec userspace utility is
      posted to the kexec-tools mailing list here:
      
       http://lists.infradead.org/pipermail/kexec/2023-May/027049.html
      
      To use the kexec-tools patch, apply, build and install kexec-tools, then
      change the kdumpctl's standard_kexec_args to replace the -s with
      --hotplug.  The removal of -s reverts to the kexec_load syscall and the
      addition of --hotplug invokes the changes put forth in the kexec-tools
      patch.
      
      
      This patch (of 8):
      
      The crash hotplug support leans on the work for the kexec_file_load()
      syscall.  To also support the kexec_load() syscall, a few bits of code
      need to be moved outside of CONFIG_KEXEC_FILE.  As such, these bits are
      moved out of kexec_file.c and into a common location, crash_core.c.
      
      In addition, struct crash_mem and crash_notes were moved to new locales so
      that PROC_KCORE, which sets CRASH_CORE alone, builds correctly.
      
      No functionality change intended.
      
      Link: https://lkml.kernel.org/r/20230814214446.6659-1-eric.devolder@oracle.com
      Link: https://lkml.kernel.org/r/20230814214446.6659-2-eric.devolder@oracle.com
      
      
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Acked-by: Hari Bathini <hbathini@linux.ibm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Akhil Raj <lf32.dev@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Weißschuh <linux@weissschuh.net>
      Cc: Valentin Schneider <vschneid@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6f991cc3
  7. Jun 10, 2023
  8. Feb 03, 2023
  9. Feb 01, 2023
  10. Nov 18, 2022
  11. Sep 12, 2022
    • kexec: replace kmap() with kmap_local_page() · 948084f0
      Fabio M. De Francesco authored
      kmap() is being deprecated in favor of kmap_local_page().
      
      There are two main problems with kmap(): (1) It comes with an overhead as
      mapping space is restricted and protected by a global lock for
      synchronization and (2) it also requires global TLB invalidation when the
      kmap's pool wraps and it might block when the mapping space is fully
      utilized until a slot becomes available.
      
      With kmap_local_page() the mappings are per thread, CPU local, can take
      page faults, and can be called from any context (including interrupts). 
      It is faster than kmap() in kernels with HIGHMEM enabled.  Furthermore,
      the tasks can be preempted and, when they are scheduled to run again, the
      kernel virtual addresses are restored and are still valid.
      
      Since its use in kexec_core.c is safe everywhere, it should be preferred.
      
      Therefore, replace kmap() with kmap_local_page() in kexec_core.c.
      
      Tested on a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with
      HIGHMEM64GB enabled.
      
      Link: https://lkml.kernel.org/r/20220821182519.9483-1-fmdefrancesco@gmail.com
      
      
      Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
      Suggested-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      948084f0
    • panic, kexec: make __crash_kexec() NMI safe · 05c62574
      Valentin Schneider authored
      Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
      panic() doesn't work.  The cause of that lies in the PREEMPT_RT definition
      of mutex_trylock():
      
      	if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task()))
      		return 0;
      
      This prevents an nmi_panic() from executing the main body of
      __crash_kexec() which does the actual kexec into the kdump kernel.  The
      warning and return are explained by:
      
        6ce47fd9 ("rtmutex: Warn if trylock is called from hard/softirq context")
        [...]
        The reasons for this are:
      
            1) There is a potential deadlock in the slowpath
      
            2) Another cpu which blocks on the rtmutex will boost the task
      	 which allegedly locked the rtmutex, but that cannot work
      	 because the hard/softirq context borrows the task context.
      
      Furthermore, grabbing the lock isn't NMI safe, so do away with kexec_mutex
      and replace it with an atomic variable.  This is somewhat overzealous as
      *some* callsites could keep using a mutex (e.g.  the sysfs-facing ones
      like crash_shrink_memory()), but this has the benefit of involving a
      single unified lock and preventing any future NMI-related surprises.
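
      A lock built on a single atomic variable can be sketched with C11
      atomics.  This is a userspace analogue under stated assumptions: the
      names kexec_lock_model, kexec_trylock_model and kexec_unlock_model are
      hypothetical, and the acquire/release orderings are my choice for the
      sketch, not something this commit message specifies.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* 0 means unlocked, 1 means locked. */
static atomic_int kexec_lock_model;

/*
 * Try to take the lock without ever sleeping or spinning, which is what
 * makes the pattern usable from NMI-like contexts.
 */
static bool kexec_trylock_model(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&kexec_lock_model,
						       &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Release the lock; only the holder should call this. */
static void kexec_unlock_model(void)
{
	atomic_store_explicit(&kexec_lock_model, 0, memory_order_release);
}
```

      Unlike a mutex, there is no owner tracking and no slow path, so a
      failed attempt simply reports failure and the caller decides what to do.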
      
      Tested by triggering NMI panics via:
      
        $ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi
        $ echo 1 > /proc/sys/kernel/unknown_nmi_panic
        $ echo 1 > /proc/sys/kernel/panic
      
        $ ipmitool power diag
      
      Link: https://lkml.kernel.org/r/20220630223258.4144112-3-vschneid@redhat.com
      
      
      Fixes: 6ce47fd9 ("rtmutex: Warn if trylock is called from hard/softirq context")
      Signed-off-by: Valentin Schneider <vschneid@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Juri Lelli <jlelli@redhat.com>
      Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      05c62574
    • kexec: turn all kexec_mutex acquisitions into trylocks · 7bb5da0d
      Valentin Schneider authored
      Patch series "kexec, panic: Making crash_kexec() NMI safe", v4.
      
      
      This patch (of 2):
      
      Most acquisitions of kexec_mutex are done via mutex_trylock() - those were
      a direct "translation" from:
      
        8c5a1cf0 ("kexec: use a mutex for locking rather than xchg()")
      
      there have however been two additions since then that use mutex_lock():
      crash_get_memory_size() and crash_shrink_memory().
      
      A later commit will replace said mutex with an atomic variable, and
      locking operations will become atomic_cmpxchg().  Rather than having those
      mutex_lock() become while (atomic_cmpxchg(&lock, 0, 1)), turn them into
      trylocks that can return -EBUSY on acquisition failure.
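
      The caller-side pattern can be sketched as below.  It is a userspace
      model with hypothetical names (lock_model, crash_size_model,
      crash_get_memory_size_model); the signed return type is an assumption
      made so that a size and a negative errno can share one return value.

```c
#include <errno.h>
#include <stdatomic.h>

static atomic_int lock_model;			/* 0 = free, 1 = held */
static long crash_size_model = 256L << 20;	/* pretend 256 MiB are reserved */

/*
 * Return the reserved size, or -EBUSY if the lock is currently held.
 * A blocking version would loop on the compare-exchange instead.
 */
static long crash_get_memory_size_model(void)
{
	int expected = 0;
	long size;

	if (!atomic_compare_exchange_strong(&lock_model, &expected, 1))
		return -EBUSY;

	size = crash_size_model;
	atomic_store(&lock_model, 0);
	return size;
}
```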
      
      This does halve the printable size of the crash kernel, but that's still
      neighbouring 2G for 32bit kernels which should be ample enough.
      
      Link: https://lkml.kernel.org/r/20220630223258.4144112-1-vschneid@redhat.com
      Link: https://lkml.kernel.org/r/20220630223258.4144112-2-vschneid@redhat.com
      
      
      Signed-off-by: Valentin Schneider <vschneid@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Juri Lelli <jlelli@redhat.com>
      Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      7bb5da0d
  12. Jul 15, 2022
  13. Apr 29, 2022
  14. Apr 25, 2022
  15. Apr 14, 2022
  16. Dec 13, 2021
  17. Jul 26, 2021
    • printk: remove safe buffers · 93d102f0
      John Ogness authored
      
      With @logbuf_lock removed, the high level printk functions for
      storing messages are lockless. Messages can be stored from any
      context, so there is no need for the NMI and safe buffers anymore.
      Remove the NMI and safe buffers.
      
      Although the safe buffers are removed, the NMI and safe context
      tracking is still in place. In these contexts, store the message
      immediately but still use irq_work to defer the console printing.
      
      Since printk recursion tracking is in place, safe context tracking
      for most of printk is not needed. Remove it. Only safe context
      tracking relating to the console and console_owner locks is left
      in place. This is because the console and console_owner locks are
      needed for the actual printing.
      
      Signed-off-by: John Ogness <john.ogness@linutronix.de>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Link: https://lore.kernel.org/r/20210715193359.25946-4-john.ogness@linutronix.de
      93d102f0
  18. Jul 01, 2021
  19. May 07, 2021
  20. Jan 25, 2021
  21. Jan 06, 2021
    • elf_prstatus: collect the common part (everything before pr_reg) into a struct · f2485a2d
      Al Viro authored
      
      Preparation for doing i386 compat elf_prstatus sanely - rather than duplicating
      the beginning of compat_elf_prstatus, take these fields into a separate
      structure (compat_elf_prstatus_common), so that it can be reused.  Due to
      the incestuous relationship between binfmt_elf.c and compat_binfmt_elf.c we
      need the same shape change done to native struct elf_prstatus, gathering the
      fields prior to pr_reg into a new structure (struct elf_prstatus_common).
      
      Fortunately, offset of pr_reg is always a multiple of 16 with no padding
      right before it, so it's possible to turn all the stuff prior to it into
      a single member without disturbing the layout.
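
      The layout argument can be checked with offsetof on a simplified
      model.  The structs below are hypothetical and much smaller than the
      real elf_prstatus; they only illustrate that gathering leading fields
      into one nested member leaves the later offsets unchanged when no
      padding is introduced before pr_reg.

```c
#include <stddef.h>

/* Flat layout: every field is a direct member. */
struct prstatus_flat_model {
	int si_signo, si_code, si_errno;
	short pr_cursig;
	unsigned long pr_sigpend, pr_sighold;
	int pr_pid, pr_ppid, pr_pgrp, pr_sid;
	long pr_times[8];
	unsigned long pr_reg[4];
};

/* The fields before pr_reg, gathered into one reusable struct. */
struct prstatus_common_model {
	int si_signo, si_code, si_errno;
	short pr_cursig;
	unsigned long pr_sigpend, pr_sighold;
	int pr_pid, pr_ppid, pr_pgrp, pr_sid;
	long pr_times[8];
};

/* Nested layout: the common part becomes a single member. */
struct prstatus_nested_model {
	struct prstatus_common_model common;
	unsigned long pr_reg[4];
};

/* Return 1 if pr_reg sits at the same offset and the total size is equal. */
static int layout_preserved_model(void)
{
	return offsetof(struct prstatus_flat_model, pr_reg) ==
	       offsetof(struct prstatus_nested_model, pr_reg) &&
	       sizeof(struct prstatus_flat_model) ==
	       sizeof(struct prstatus_nested_model);
}
```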
      
      [build fix from Geert Uytterhoeven folded in]
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      f2485a2d
  22. Nov 20, 2020
    • crypto: sha - split sha.h into sha1.h and sha2.h · a24d22b2
      Eric Biggers authored
      
      Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2,
      and <crypto/sha3.h> contains declarations for SHA-3.
      
      This organization is inconsistent, but more importantly SHA-1 is no
      longer considered to be cryptographically secure.  So to the extent
      possible, SHA-1 shouldn't be grouped together with any of the other SHA
      versions, and usage of it should be phased out.
      
      Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and
      <crypto/sha2.h>, and make everyone explicitly specify whether they want
      the declarations for SHA-1, SHA-2, or both.
      
      This avoids making the SHA-1 declarations visible to files that don't
      want anything to do with SHA-1.  It also prepares for potentially moving
      sha1.h into a new insecure/ or dangerous/ directory.
      
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      a24d22b2
  23. Oct 16, 2020
  24. Sep 10, 2020
  25. Jan 08, 2020
  26. Sep 26, 2019
  27. Jun 19, 2019