  1. Dec 29, 2023
  2. Dec 20, 2023
    • kexec_file: print out debugging message if required · a85ee18c
      Baoquan He authored
      When '-d' is specified for the kexec_file_load interface, the loaded
      locations of kernel/initrd/cmdline etc. can be printed out to help
      debugging.
      
      Here, replace pr_debug() with the newly added kexec_dprintk() in the
      kexec_file loading related code.
      
      Also print out the type/start/head of kimage and the flags to help
      debugging.
      
      Link: https://lkml.kernel.org/r/20231213055747.61826-3-bhe@redhat.com
      
      
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Conor Dooley <conor@kernel.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kexec_file: add kexec_file flag to control debug printing · cbc2fe9d
      Baoquan He authored
      Patch series "kexec_file: print out debugging message if required", v4.
      
      Currently, specifying '-d' on the kexec command prints a lot of debugging
      information about kexec/kdump loading with the kexec_load interface.
      
      However, kexec_file_load prints nothing even when '-d' is specified.
      This makes it very inconvenient to debug or analyze kexec/kdump loading
      when something goes wrong with kexec/kdump itself or when a developer
      wants to check the kexec/kdump loading.
      
      In this patchset, a kexec_file flag, KEXEC_FILE_DEBUG, is added and checked
      in code.  If it's passed in, debugging messages of the kexec_file code are
      printed out and can be seen from console and dmesg.  Otherwise, the
      debugging messages are printed as before, via pr_debug().
      
      Note:
      =====
      1) The code in the kexec-tools utility also needs to be changed to support
      passing KEXEC_FILE_DEBUG to the kernel when 'kexec -s -d' is specified.
      The patch link is here:
      [PATCH] kexec_file: add kexec_file flag to support debug printing
      http://lists.infradead.org/pipermail/kexec/2023-November/028505.html
      
      2) s390 also has kexec_file code, but I am not sure what debugging
      information is necessary there, so that is left to the s390 developers.
      
      Test:
      =====
      Testing was done in v1 on x86_64 and arm64.  For v4, it was tested on
      x86_64 again.  On x86_64, the printed messages look like this:
      --------------------------------------------------------------
      kexec measurement buffer for the loaded kernel at 0x207fffe000.
      Loaded purgatory at 0x207fff9000
      Loaded boot_param, command line and misc at 0x207fff3000 bufsz=0x1180 memsz=0x1180
      Loaded 64bit kernel at 0x207c000000 bufsz=0xc88200 memsz=0x3c4a000
      Loaded initrd at 0x2079e79000 bufsz=0x2186280 memsz=0x2186280
      Final command line is: root=/dev/mapper/fedora_intel--knightslanding--lb--02-root ro
      rd.lvm.lv=fedora_intel-knightslanding-lb-02/root console=ttyS0,115200N81 crashkernel=256M
      E820 memmap:
      0000000000000000-000000000009a3ff (1)
      000000000009a400-000000000009ffff (2)
      00000000000e0000-00000000000fffff (2)
      0000000000100000-000000006ff83fff (1)
      000000006ff84000-000000007ac50fff (2)
      ......
      000000207fff6150-000000207fff615f (128)
      000000207fff6160-000000207fff714f (1)
      000000207fff7150-000000207fff715f (128)
      000000207fff7160-000000207fff814f (1)
      000000207fff8150-000000207fff815f (128)
      000000207fff8160-000000207fffffff (1)
      nr_segments = 5
      segment[0]: buf=0x000000004e5ece74 bufsz=0x211 mem=0x207fffe000 memsz=0x1000
      segment[1]: buf=0x000000009e871498 bufsz=0x4000 mem=0x207fff9000 memsz=0x5000
      segment[2]: buf=0x00000000d879f1fe bufsz=0x1180 mem=0x207fff3000 memsz=0x2000
      segment[3]: buf=0x000000001101cd86 bufsz=0xc88200 mem=0x207c000000 memsz=0x3c4a000
      segment[4]: buf=0x00000000c6e38ac7 bufsz=0x2186280 mem=0x2079e79000 memsz=0x2187000
      kexec_file_load: type:0, start:0x207fff91a0 head:0x109e004002 flags:0x8
      ---------------------------------------------------------------------------
      
      
      This patch (of 7):
      
      When specifying 'kexec -c -d', the kexec_load interface prints loading
      information, e.g. the regions where kernel/initrd/purgatory/cmdline are
      put, the memmap passed to the 2nd kernel as system RAM ranges, and
      all contents of struct kexec_segment.  These are very helpful for
      analyzing or locating what went wrong when kexec/kdump itself fails.
      The debugging printing for the kexec_load interface is done in the
      user space utility kexec-tools.
      
      With the kexec_file_load interface, however, 'kexec -s -d' prints
      nothing, because the kexec_file code is mostly implemented in kernel
      space and the debugging printing functionality is missing.  This is
      inconvenient when debugging kexec/kdump loading and jumping with the
      kexec_file_load interface.
      
      Now add KEXEC_FILE_DEBUG to the kexec_file flags to control the debugging
      message printing, and add the global variable kexec_file_dbg_print and the
      macro kexec_dprintk() to facilitate the printing.
      
      This is a preparation; later, kexec_dprintk() will be used to replace the
      existing pr_debug().  Once 'kexec -s -d' is specified, it will print out
      kexec/kdump loading information.  If '-d' is not specified, it falls back
      to pr_debug().
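
The flag-controlled printing described above can be sketched in userspace C. This is only an illustration under stated assumptions: the value 0x8 for KEXEC_FILE_DEBUG is taken from the `flags:0x8` in the sample output, and `log_info()`, `log_debug()`, `set_debug_from_flags()` and the two counters are hypothetical stand-ins for pr_info(), pr_debug() and the kernel plumbing. The in-kernel macro may well differ.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed value, consistent with "flags:0x8" in the sample output. */
#define KEXEC_FILE_DEBUG 0x00000008UL

/* Global switch named in the commit message. */
static bool kexec_file_dbg_print;

/* Hypothetical counters so the two paths can be told apart. */
static int info_count;
static int debug_count;

/* Stand-in for pr_info(): always prints. */
static void log_info(const char *fmt, ...)
{
	va_list ap;

	assert(fmt != NULL);
	va_start(ap, fmt);
	vprintf(fmt, ap);
	va_end(ap);
	info_count++;
}

/* Stand-in for pr_debug(): silent here, as if debug output were disabled. */
static void log_debug(const char *fmt, ...)
{
	(void)fmt;
	debug_count++;
}

/* Print unconditionally when debugging was requested, else fall back. */
#define kexec_dprintk(...)				\
	do {						\
		if (kexec_file_dbg_print)		\
			log_info(__VA_ARGS__);		\
		else					\
			log_debug(__VA_ARGS__);		\
	} while (0)

/* Hypothetical helper: latch the debug bit from the load flags. */
static void set_debug_from_flags(unsigned long flags)
{
	kexec_file_dbg_print = (flags & KEXEC_FILE_DEBUG) != 0;
}
```

With the debug bit set, a call such as `kexec_dprintk("Loaded purgatory at 0x%llx\n", addr)` takes the always-printing path; without it, the same call site takes the quiet fallback path.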
      
      Link: https://lkml.kernel.org/r/20231213055747.61826-1-bhe@redhat.com
      Link: https://lkml.kernel.org/r/20231213055747.61826-2-bhe@redhat.com
      
      
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Conor Dooley <conor@kernel.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  3. Dec 11, 2023
    • kexec_file: load kernel at top of system RAM if required · b3ba2341
      Baoquan He authored
      Patch series "kexec_file: Load kernel at top of system RAM if required".
      
      Justification:
      ==============
      
      The kexec_load interface has been doing top-down searching and loading of
      kernel/initrd/purgatory etc. to prepare for kexec reboot.  The benefits
      are that it avoids consuming and fragmenting limited low memory, which
      satisfies DMA buffer allocations and requests for big chunks of contiguous
      memory during system init; and it avoids interfering with BIOS/FW reserved
      or occupied areas, or areas occupied by corner case handling/workarounds/
      quirks during system init.  Note that the top-down searching and loading
      of the kexec-ed kernel is done in user space utility code.
      
      For kexec_file loading, even if kexec_buf.top_down is 'true', it's simply
      ignored.  It calls walk_system_ram_res() directly to go through all
      resources of System RAM bottom up to find an available memory region,
      then calls locate_mem_hole_callback() to allocate memory in that
      memory region from top to bottom.  This is unexpected and inconsistent
      with kexec_load.
      
      Implementation
      ===============
      
      In patch 1, introduce a new function walk_system_ram_res_rev(), a
      variant of walk_system_ram_res() that walks through the list of all
      System RAM resources in reverse order, i.e., from higher to lower.
      
      In patch 2, check whether kexec_buf.top_down is 'true' in
      kexec_walk_resources(); if so, call walk_system_ram_res_rev() to find a
      memory region of system RAM from top to bottom to load kernel/initrd etc.
      
      Background information:
      =======================
      
      I tried this in the past in a different way; please see the link below.
      In that post, I tried to adjust the struct sibling linking code and
      replace the singly linked list with list_head so that
      walk_system_ram_res_rev() could be implemented in a much easier way.
      In the end, that attempt failed.
      https://lore.kernel.org/all/20180718024944.577-4-bhe@redhat.com/
      
      This time, I picked up the patch from AKASHI Takahiro's old post and made
      some changes to use it as the current patch 1:
      https://lists.infradead.org/pipermail/linux-arm-kernel/2017-September/531456.html
      
      
      This patch (of 2):
      
      The kexec_load interface has been doing top-down searching and loading of
      kernel/initrd/purgatory etc. to prepare for kexec reboot.  The benefits
      are that it avoids consuming and fragmenting limited low memory, which
      satisfies DMA buffer allocations and requests for big chunks of contiguous
      memory during system init; and it avoids interfering with BIOS/FW reserved
      or occupied areas, or areas occupied by corner case handling/workarounds/
      quirks during system init.  Note that the top-down searching and loading
      of the kexec-ed kernel is done in user space utility code.
      
      For kexec_file loading, even if kexec_buf.top_down is 'true', it's simply
      ignored.  It calls walk_system_ram_res() directly to go through all
      resources of System RAM bottom up to find an available memory region,
      then calls locate_mem_hole_callback() to allocate memory in that
      memory region from top to bottom.  This is unexpected and inconsistent
      with kexec_load.
      
      Here, check whether kexec_buf.top_down is 'true' in kexec_walk_resources();
      if so, call the newly added walk_system_ram_res_rev() to find a memory
      region of system RAM from top to bottom to load kernel/initrd etc.
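
The difference between walking System RAM bottom-up and top-down can be sketched in plain C. This is a simplified model, not the kernel code: `struct ram_region`, `place_in_region()` and `find_hole()` are hypothetical names, the region array is assumed to be sorted by ascending address, and the many exclusions the real walker applies are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One System RAM range, both ends inclusive. */
struct ram_region {
	uint64_t start;
	uint64_t end;
};

/* Try to fit `size` bytes inside one region; `align` is a power of two. */
static bool place_in_region(const struct ram_region *r, uint64_t size,
			    uint64_t align, bool top_down, uint64_t *out)
{
	uint64_t start;

	if (size == 0 || r->end - r->start < size - 1)
		return false;

	if (top_down)
		start = (r->end - size + 1) & ~(align - 1);	/* round down */
	else
		start = (r->start + align - 1) & ~(align - 1);	/* round up */

	if (start < r->start || start > r->end || r->end - start < size - 1)
		return false;

	*out = start;
	return true;
}

/* Visit regions highest-first when top_down is set, lowest-first otherwise. */
static bool find_hole(const struct ram_region *regions, size_t nr,
		      uint64_t size, uint64_t align, bool top_down,
		      uint64_t *out)
{
	assert(align != 0 && (align & (align - 1)) == 0);

	for (size_t i = 0; i < nr; i++) {
		const struct ram_region *r =
			&regions[top_down ? nr - 1 - i : i];

		if (place_in_region(r, size, align, top_down, out))
			return true;
	}
	return false;
}
```

With three regions ending at 0x9ffff, 0x6fffffff and 0x207fffffff, a page-aligned 0x5000-byte request lands at 0x207fffb000 when `top_down` is set and at the start of the lowest region otherwise.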
      
      Link: https://lkml.kernel.org/r/20231114091658.228030-1-bhe@redhat.com
      Link: https://lkml.kernel.org/r/20231114091658.228030-3-bhe@redhat.com
      
      
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  4. Aug 24, 2023
    • kexec: exclude elfcorehdr from the segment digest · f7cc804a
      Eric DeVolder authored
      When a crash kernel is loaded via the kexec_file_load() syscall, the
      kernel places the various segments (i.e. crash kernel, crash initrd,
      boot_params, elfcorehdr, purgatory, etc.) in memory.  For those
      architectures that utilize purgatory, a hash digest of the segments is
      calculated for integrity checking.  The digest is embedded into the
      purgatory image prior to placing in memory.
      
      Updates to the elfcorehdr in response to CPU and memory changes would
      cause the purgatory integrity checking to fail (at crash time, and no
      vmcore created).  Therefore, the elfcorehdr segment is explicitly excluded
      from the purgatory digest, enabling updates to the elfcorehdr while also
      avoiding the need to recompute the hash digest and reload purgatory.
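
The effect of leaving one segment out of the digest can be sketched as follows. This is an illustration only: FNV-1a is used as a simple non-cryptographic stand-in for the real hash, and `struct segment`, its `is_elfcorehdr` marker and `digest_segments()` are hypothetical names that do not come from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct segment {
	const unsigned char *buf;
	size_t bufsz;
	bool is_elfcorehdr;	/* hypothetical marker for this sketch */
};

/* FNV-1a, a simple non-cryptographic stand-in for the real hash. */
static uint64_t fnv1a_update(uint64_t h, const unsigned char *p, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Digest every segment except the one that may be rewritten later. */
static uint64_t digest_segments(const struct segment *segs, size_t nr)
{
	uint64_t h = 0xcbf29ce484222325ULL;

	assert(segs != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (segs[i].is_elfcorehdr)
			continue;	/* excluded: updates must not break the check */
		h = fnv1a_update(h, segs[i].buf, segs[i].bufsz);
	}
	return h;
}
```

Rewriting the excluded segment leaves the digest unchanged, while modifying any other segment still changes it, which is the property the integrity check relies on.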
      
      Link: https://lkml.kernel.org/r/20230814214446.6659-4-eric.devolder@oracle.com
      
      
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Suggested-by: Baoquan He <bhe@redhat.com>
      Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Acked-by: Hari Bathini <hbathini@linux.ibm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Akhil Raj <lf32.dev@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Weißschuh <linux@weissschuh.net>
      Cc: Valentin Schneider <vschneid@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • crash: move a few code bits to setup support of crash hotplug · 6f991cc3
      Eric DeVolder authored
      Patch series "crash: Kernel handling of CPU and memory hot un/plug", v28.
      
      Once the kdump service is loaded, if changes to CPUs or memory occur,
      either by hot un/plug or off/onlining, the crash elfcorehdr must also be
      updated.
      
      The elfcorehdr describes to kdump the CPUs and memory in the system, and
      any inaccuracies can result in a vmcore with missing CPU context or memory
      regions.
      
      The current solution utilizes udev to initiate an unload-then-reload of
      the kdump image (eg.  kernel, initrd, boot_params, purgatory and
      elfcorehdr) by the userspace kexec utility.  In the original post I
      outlined the significant performance problems related to offloading this
      activity to userspace.
      
      This patchset introduces a generic crash handler that registers with the
      CPU and memory notifiers.  Upon CPU or memory changes, from either hot
      un/plug or off/onlining, this generic handler is invoked and performs
      important housekeeping, for example obtaining the appropriate lock, and
      then invokes an architecture specific handler to do the appropriate
      elfcorehdr update.
      
      Note the description in patch 'crash: change crash_prepare_elf64_headers()
      to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that
      enables further optimizations related to CPU plug/unplug/online/offline
      performance of elfcorehdr updates.
      
      In the case of x86_64, the arch specific handler generates a new
      elfcorehdr, and overwrites the old one in memory; thus no involvement with
      userspace needed.
      
      To realize the benefits/test this patchset, one must make a couple
      of minor changes to userspace:
      
       - Prevent udev from updating kdump crash kernel on hot un/plug changes.
         Add the following as the first lines to the RHEL udev rule file
         /usr/lib/udev/rules.d/98-kexec.rules:
      
         # The kernel updates the crash elfcorehdr for CPU and memory changes
         SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
         SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
      
         With this changeset applied, the two rules evaluate to false for
         CPU and memory change events and thus skip the userspace
         unload-then-reload of kdump.
      
       - Change to the kexec_file_load for loading the kdump kernel:
         Eg. on RHEL: in /usr/bin/kdumpctl, change to:
          standard_kexec_args="-p -d -s"
         which adds the -s to select kexec_file_load() syscall.
      
      This kernel patchset also supports kexec_load() with a modified kexec
      userspace utility.  A working changeset to the kexec userspace utility is
      posted to the kexec-tools mailing list here:
      
       http://lists.infradead.org/pipermail/kexec/2023-May/027049.html
      
      To use the kexec-tools patch, apply, build and install kexec-tools, then
      change the kdumpctl's standard_kexec_args to replace the -s with
      --hotplug.  The removal of -s reverts to the kexec_load syscall and the
      addition of --hotplug invokes the changes put forth in the kexec-tools
      patch.
      
      
      This patch (of 8):
      
      The crash hotplug support leans on the work for the kexec_file_load()
      syscall.  To also support the kexec_load() syscall, a few bits of code
      need to be moved outside of CONFIG_KEXEC_FILE.  As such, these bits are
      moved out of kexec_file.c and into a common location, crash_core.c.
      
      In addition, struct crash_mem and crash_notes were moved to new locales so
      that PROC_KCORE, which sets CRASH_CORE alone, builds correctly.
      
      No functionality change intended.
      
      Link: https://lkml.kernel.org/r/20230814214446.6659-1-eric.devolder@oracle.com
      Link: https://lkml.kernel.org/r/20230814214446.6659-2-eric.devolder@oracle.com
      
      
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Acked-by: Hari Bathini <hbathini@linux.ibm.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Akhil Raj <lf32.dev@gmail.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Weißschuh <linux@weissschuh.net>
      Cc: Valentin Schneider <vschneid@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  5. Aug 18, 2023
  6. Aug 07, 2023
  7. Jun 12, 2023
    • kexec: support purgatories with .text.hot sections · 8652d44f
      Ricardo Ribalda Delgado authored
      Patch series "kexec: Fix kexec_file_load for llvm16 with PGO", v7.
      
      When upgrading llvm I realised that kexec stopped working on my test
      platform.
      
      The reason seems to be that, due to PGO, there are multiple .text sections
      in the purgatory, and kexec does not support that.
      
      
      This patch (of 4):
      
      Clang16 links the purgatory text in two sections when PGO is in use:
      
        [ 1] .text             PROGBITS         0000000000000000  00000040
             00000000000011a1  0000000000000000  AX       0     0     16
        [ 2] .rela.text        RELA             0000000000000000  00003498
             0000000000000648  0000000000000018   I      24     1     8
        ...
        [17] .text.hot.        PROGBITS         0000000000000000  00003220
             000000000000020b  0000000000000000  AX       0     0     1
        [18] .rela.text.hot.   RELA             0000000000000000  00004428
             0000000000000078  0000000000000018   I      24    17     8
      
      Both of them have their range [sh_addr ... sh_addr+sh_size] covering the
      area pointed to by `e_entry`.
      
      As a result, image->start is calculated twice, once for .text and
      once more for .text.hot.  The second calculation leaves image->start
      at a random location.
      
      Because of this, the system crashes immediately after:
      
      kexec_core: Starting new kernel
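
The double calculation can be reproduced with a small model. This is a sketch under assumptions: `struct section`, `relocate_entry()` and the load addresses are hypothetical, and the `guard` shown is just one possible way to stop the second adjustment. It is not claimed to be the change made by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct section {
	uint64_t sh_addr;	/* address in the object file */
	uint64_t sh_size;
	uint64_t load_addr;	/* where the section ended up (made up here) */
};

/*
 * Translate e_entry into a runtime address. Every section whose range
 * contains e_entry adjusts the result, so two overlapping sections
 * adjust it twice unless the guard is enabled.
 */
static uint64_t relocate_entry(const struct section *secs, size_t nr,
			       uint64_t e_entry, bool guard)
{
	uint64_t start = e_entry;

	assert(secs != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		const struct section *s = &secs[i];

		if (e_entry < s->sh_addr || e_entry >= s->sh_addr + s->sh_size)
			continue;
		/* With the guard, only the first match may move the entry. */
		if (guard && start != e_entry)
			continue;
		start = start - s->sh_addr + s->load_addr;
	}
	return start;
}
```

With two sections that both start at sh_addr 0 (sizes 0x11a1 and 0x20b, as in the listing above) and an entry offset of 0x40, the unguarded loop adds both load addresses and yields an address that belongs to neither section.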
      
      Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-0-b05c520b7296@chromium.org
      Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-1-b05c520b7296@chromium.org
      
      
      Fixes: 93045705 ("kernel/kexec_file.c: split up __kexec_load_puragory")
      Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
      Reviewed-by: Ross Zwisler <zwisler@google.com>
      Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Reviewed-by: Philipp Rudo <prudo@redhat.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Palmer Dabbelt <palmer@rivosinc.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Simon Horman <horms@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Rix <trix@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  8. Jun 10, 2023
  9. Apr 08, 2023
  10. Feb 03, 2023
  11. Nov 18, 2022
  12. Sep 12, 2022
    • panic, kexec: make __crash_kexec() NMI safe · 05c62574
      Valentin Schneider authored
      Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
      panic() doesn't work.  The cause of that lies in the PREEMPT_RT definition
      of mutex_trylock():
      
      	if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task()))
      		return 0;
      
      This prevents an nmi_panic() from executing the main body of
      __crash_kexec() which does the actual kexec into the kdump kernel.  The
      warning and return are explained by:
      
        6ce47fd9 ("rtmutex: Warn if trylock is called from hard/softirq context")
        [...]
        The reasons for this are:
      
            1) There is a potential deadlock in the slowpath
      
            2) Another cpu which blocks on the rtmutex will boost the task
      	 which allegedly locked the rtmutex, but that cannot work
      	 because the hard/softirq context borrows the task context.
      
      Furthermore, grabbing the lock isn't NMI safe, so do away with kexec_mutex
      and replace it with an atomic variable.  This is somewhat overzealous as
      *some* callsites could keep using a mutex (e.g.  the sysfs-facing ones
      like crash_shrink_memory()), but this has the benefit of involving a
      single unified lock and preventing any future NMI-related surprises.
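
Replacing a mutex with an atomic variable used as a try-lock can be sketched with C11 atomics. This is a userspace illustration, not the kernel implementation: the `_sketch` names are hypothetical and `<stdatomic.h>` stands in for the kernel's atomic API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = free, 1 = held. Never sleeps, so it is usable from any context. */
static atomic_int kexec_lock_sketch;

/* Take the lock only if it is free; never block or spin. */
static bool kexec_trylock_sketch(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&kexec_lock_sketch,
						       &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Release the lock; the caller must currently hold it. */
static void kexec_unlock_sketch(void)
{
	assert(atomic_load_explicit(&kexec_lock_sketch,
				    memory_order_relaxed) == 1);
	atomic_store_explicit(&kexec_lock_sketch, 0, memory_order_release);
}
```

A try-lock like this never sleeps and has no owner to boost, which is why it sidesteps the two rtmutex concerns quoted above. The trade-off is that contended callers have to fail or retry instead of waiting.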
      
      Tested by triggering NMI panics via:
      
        $ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi
        $ echo 1 > /proc/sys/kernel/unknown_nmi_panic
        $ echo 1 > /proc/sys/kernel/panic
      
        $ ipmitool power diag
      
      Link: https://lkml.kernel.org/r/20220630223258.4144112-3-vschneid@redhat.com
      
      
      Fixes: 6ce47fd9 ("rtmutex: Warn if trylock is called from hard/softirq context")
      Signed-off-by: Valentin Schneider <vschneid@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Juri Lelli <jlelli@redhat.com>
      Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  13. Jul 15, 2022
  14. Jul 13, 2022
  15. Jun 17, 2022
  16. May 27, 2022
  17. May 19, 2022
    • kexec_file: Fix kexec_file.c build error for riscv platform · 4853f68d
      Liao Chang authored
      
      When CONFIG_KEXEC_FILE is set for the riscv platform, the compilation of
      kernel/kexec_file.c generates a build error:
      
      kernel/kexec_file.c: In function 'crash_prepare_elf64_headers':
      ./arch/riscv/include/asm/page.h:110:71: error: request for member 'virt_addr' in something not a structure or union
        110 |  ((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < kernel_map.virt_addr))
            |                                                                       ^
      ./arch/riscv/include/asm/page.h:131:2: note: in expansion of macro 'is_linear_mapping'
        131 |  is_linear_mapping(_x) ?       \
            |  ^~~~~~~~~~~~~~~~~
      ./arch/riscv/include/asm/page.h:140:31: note: in expansion of macro '__va_to_pa_nodebug'
        140 | #define __phys_addr_symbol(x) __va_to_pa_nodebug(x)
            |                               ^~~~~~~~~~~~~~~~~~
      ./arch/riscv/include/asm/page.h:143:24: note: in expansion of macro '__phys_addr_symbol'
        143 | #define __pa_symbol(x) __phys_addr_symbol(RELOC_HIDE((unsigned long)(x), 0))
            |                        ^~~~~~~~~~~~~~~~~~
      kernel/kexec_file.c:1327:36: note: in expansion of macro '__pa_symbol'
       1327 |   phdr->p_offset = phdr->p_paddr = __pa_symbol(_text);
      
      This occurs because the "kernel_map" referenced in the macro
      is_linear_mapping() is supposed to be the struct kernel_mapping object
      defined in arch/riscv/mm/init.c, but the 2nd argument of
      crash_prepare_elf64_headers() has the same symbol name, so in the
      expansion of is_linear_mapping() inside crash_prepare_elf64_headers(),
      "kernel_map" actually refers to the local variable.
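
The name capture can be reproduced in a few lines. In this sketch the shadowing local has the same struct type, so the code compiles and silently reads the wrong object; in the riscv case the local had a different type, which is why the build failed instead. The macro `below_kernel_map()`, both functions and the parameter name `need_kernel_map` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

struct kernel_mapping {
	unsigned long virt_addr;
};

/* File-scope object the macro is meant to refer to. */
static struct kernel_mapping kernel_map = { .virt_addr = 0x1000 };

/* The macro names `kernel_map` textually, so scoping decides what it sees. */
#define below_kernel_map(x) ((x) < kernel_map.virt_addr)

/* A local with the same name captures the macro's reference. */
static bool check_shadowed(unsigned long x)
{
	struct kernel_mapping kernel_map = { .virt_addr = 0 };

	return below_kernel_map(x);	/* compares against the local */
}

/* With a differently named parameter the macro sees the global again. */
static bool check_renamed(unsigned long x, int need_kernel_map)
{
	(void)need_kernel_map;
	assert(kernel_map.virt_addr == 0x1000);
	return below_kernel_map(x);	/* compares against the global */
}
```

Giving the local a name the macro does not mention is the least invasive way out; changing the macro to take the mapping as an argument would be a wider change.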
      
      Signed-off-by: Liao Chang <liaochang1@huawei.com>
      Link: https://lore.kernel.org/r/20220408100914.150110-2-lizhengyu3@huawei.com
      
      
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
  18. Nov 06, 2021
    • memblock: add MEMBLOCK_DRIVER_MANAGED to mimic IORESOURCE_SYSRAM_DRIVER_MANAGED · f7892d8e
      David Hildenbrand authored
      Let's add a flag that corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED,
      indicating that we're dealing with a memory region that is never
      indicated in the firmware-provided memory map, but always detected and
      added by a driver.
      
      Similar to MEMBLOCK_HOTPLUG, most infrastructure has to treat such
      memory regions like ordinary MEMBLOCK_NONE memory regions -- for
      example, when selecting memory regions to add to the vmcore for dumping
      in the crashkernel via for_each_mem_range().
      
      However, especially kexec_file is not supposed to select such memblocks
      via for_each_free_mem_range() / for_each_free_mem_range_reverse() to
      place kexec images, similar to how we handle
      IORESOURCE_SYSRAM_DRIVER_MANAGED without CONFIG_ARCH_KEEP_MEMBLOCK.
      
      We'll make sure that memory hotplug code sets the flag where applicable
      (IORESOURCE_SYSRAM_DRIVER_MANAGED) next.  This prepares architectures
      that need CONFIG_ARCH_KEEP_MEMBLOCK, such as arm64, for virtio-mem
      support.
      
      Note that kexec *must not* indicate this memory to the second kernel and
      *must not* place kexec-images on this memory.  Let's add a comment to
      kexec_walk_memblock(), documenting how we handle MEMBLOCK_DRIVER_MANAGED
      now just like using IORESOURCE_SYSRAM_DRIVER_MANAGED in
      locate_mem_hole_callback() for kexec_walk_resources().
      
      Also note that MEMBLOCK_HOTPLUG cannot be reused due to different
      semantics:
      	MEMBLOCK_HOTPLUG: memory is indicated as "System RAM" in the
      	firmware-provided memory map and added to the system early during
      	boot; kexec *has to* indicate this memory to the second kernel and
      	can place kexec-images on this memory. After memory hotunplug,
      	kexec has to be re-armed. We mostly ignore this flag when
      	"movable_node" is not set on the kernel command line, because
      	then we're told to not care about hotunpluggability of such
      	memory regions.
      
      	MEMBLOCK_DRIVER_MANAGED: memory is not indicated as "System RAM" in
      	the firmware-provided memory map; this memory is always detected
      	and added to the system by a driver; memory might not actually be
      	physically hotunpluggable. kexec *must not* indicate this memory to
      	the second kernel and *must not* place kexec-images on this memory.
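
The rules spelled out above can be condensed into three predicates. This is a sketch only: the bit values and the helper names are hypothetical and chosen for illustration, not copied from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values for this sketch only. */
enum memblock_flags_sketch {
	MEMBLOCK_NONE		= 0x0,
	MEMBLOCK_HOTPLUG	= 0x1,
	MEMBLOCK_DRIVER_MANAGED	= 0x8,
};

/* kexec images must not be placed on driver-managed memory. */
static bool kexec_may_place_image(unsigned int flags)
{
	return !(flags & MEMBLOCK_DRIVER_MANAGED);
}

/* Driver-managed memory must not be indicated to the second kernel. */
static bool kexec_indicate_to_second_kernel(unsigned int flags)
{
	return !(flags & MEMBLOCK_DRIVER_MANAGED);
}

/* For dumping, such regions are treated like ordinary memory. */
static bool vmcore_should_include(unsigned int flags)
{
	(void)flags;
	return true;
}
```

MEMBLOCK_HOTPLUG regions pass all three checks, which reflects why that flag could not simply be reused for driver-managed memory.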
      
      Link: https://lkml.kernel.org/r/20211004093605.5830-5-david@redhat.com
      
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Jianyong Wu <Jianyong.Wu@arm.com>
      Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Shahab Vahedi <shahab@synopsys.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. May 07, 2021
  20. Feb 10, 2021
  21. Nov 20, 2020
    • crypto: sha - split sha.h into sha1.h and sha2.h · a24d22b2
      Eric Biggers authored
      
      Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2,
      and <crypto/sha3.h> contains declarations for SHA-3.
      
      This organization is inconsistent, but more importantly SHA-1 is no
      longer considered to be cryptographically secure.  So to the extent
      possible, SHA-1 shouldn't be grouped together with any of the other SHA
      versions, and usage of it should be phased out.
      
      Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and
      <crypto/sha2.h>, and make everyone explicitly specify whether they want
      the declarations for SHA-1, SHA-2, or both.
      
      This avoids making the SHA-1 declarations visible to files that don't
      want anything to do with SHA-1.  It also prepares for potentially moving
      sha1.h into a new insecure/ or dangerous/ directory.
      
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  22. Oct 16, 2020
    • kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED · 7cf603d1
      David Hildenbrand authored
      
      IORESOURCE_MEM_DRIVER_MANAGED currently uses an unused PnP bit, which is
      always set to 0 by hardware.  This is far from beautiful (and confusing),
      and the bit only applies to SYSRAM.  So let's move it out of the
      bus-specific (PnP) defined bits.
      
      We'll add another SYSRAM specific bit soon.  If we ever need more bits for
      other purposes, we can steal some from "desc", or reshuffle/regroup what
      we have.
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Anton Blanchard <anton@ozlabs.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Julien Grall <julien@xen.org>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Leonardo Bras <leobras.c@gmail.com>
      Cc: Libor Pechacek <lpechacek@suse.cz>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Nathan Lynch <nathanl@linux.ibm.com>
      Cc: "Oliver O'Halloran" <oohall@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pingfan Liu <kernelfans@gmail.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Roger Pau Monné <roger.pau@citrix.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Link: https://lkml.kernel.org/r/20200911103459.10306-3-david@redhat.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7cf603d1
  23. Oct 05, 2020
  24. Aug 06, 2020
    • kexec_file: Correctly output debugging information for the PT_LOAD ELF header · 475f63ae
      Lianbo Jiang authored and Ingo Molnar committed
      
      Currently, when we enable the debugging switch to debug kexec_file,
      we always get the following incorrect results:
      
        kexec_file: Crash PT_LOAD elf header. phdr=00000000c988639b vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=51 p_offset=0x0
        kexec_file: Crash PT_LOAD elf header. phdr=000000003cca69a0 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=52 p_offset=0x0
        kexec_file: Crash PT_LOAD elf header. phdr=00000000c584cb9f vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=53 p_offset=0x0
        kexec_file: Crash PT_LOAD elf header. phdr=00000000cf85d57f vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=54 p_offset=0x0
        kexec_file: Crash PT_LOAD elf header. phdr=00000000a4a8f847 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=55 p_offset=0x0
        kexec_file: Crash PT_LOAD elf header. phdr=00000000272ec49f vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=56 p_offset=0x0
        kexec_file: Crash PT_LOAD elf header. phdr=00000000ea0b65de vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=57 p_offset=0x0
        kexec_file: Crash PT_LOAD elf header. phdr=000000001f5e490c vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=58 p_offset=0x0
        kexec_file: Crash PT_LOAD elf header. phdr=00000000dfe4109e vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=59 p_offset=0x0
        kexec_file: Crash PT_LOAD elf header. phdr=00000000480ed2b6 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=60 p_offset=0x0
        kexec_file: Crash PT_LOAD elf header. phdr=0000000080b65151 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=61 p_offset=0x0
        kexec_file: Crash PT_LOAD elf header. phdr=0000000024e31c5e vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=62 p_offset=0x0
        kexec_file: Crash PT_LOAD elf header. phdr=00000000332e0385 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=63 p_offset=0x0
        kexec_file: Crash PT_LOAD elf header. phdr=000000002754d5da vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=64 p_offset=0x0
        kexec_file: Crash PT_LOAD elf header. phdr=00000000783320dd vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=65 p_offset=0x0
        kexec_file: Crash PT_LOAD elf header. phdr=0000000076fe5b64 vaddr=0x0, paddr=0x0, sz=0x0 e_phnum=66 p_offset=0x0
      
      The reason is that the kernel always prints the values of the next
      PT_LOAD instead of the current PT_LOAD. Change it to ensure that we can
      get the correct debugging information.
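
      A minimal userspace sketch of the ordering issue described above, not
      the kernel code itself: struct demo_phdr and fill_and_log() are
      hypothetical stand-ins. The point is only that the debug print has to
      read the header before the pointer is advanced to the next, still
      zeroed, entry.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for Elf64_Phdr. */
struct demo_phdr {
    uint64_t p_vaddr;
    uint64_t p_paddr;
    uint64_t p_filesz;
};

/*
 * Fill n program headers and log each one.
 * Returns how many logged headers had a non-zero size.
 */
int fill_and_log(struct demo_phdr *phdr, const uint64_t *starts,
                 const uint64_t *sizes, int n)
{
    int nonzero = 0;

    for (int i = 0; i < n; i++) {
        phdr->p_vaddr = starts[i];
        phdr->p_paddr = starts[i];
        phdr->p_filesz = sizes[i];

        /* Log first: phdr still points at the header just filled in. */
        printf("PT_LOAD: vaddr=0x%" PRIx64 " paddr=0x%" PRIx64
               " sz=0x%" PRIx64 "\n",
               phdr->p_vaddr, phdr->p_paddr, phdr->p_filesz);
        if (phdr->p_filesz != 0)
            nonzero++;

        /* Advance only after logging; doing this earlier would print
         * the next (still empty) header, i.e. all zeroes. */
        phdr++;
    }
    return nonzero;
}
```

      If the increment were moved above the printf() call, every logged
      line would show zeroes, which matches the symptom in the output
      quoted above.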
      
      [ mingo: Amended changelog, capitalized "ELF". ]
      
      Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Dave Young <dyoung@redhat.com>
      Link: https://lore.kernel.org/r/20200804044933.1973-4-lijiang@redhat.com
      475f63ae
    • kexec: Improve & fix crash_exclude_mem_range() to handle overlapping ranges · a2e9a95d
      Lianbo Jiang authored and Ingo Molnar committed
      
      The crash_exclude_mem_range() function can only handle one memory region at a time.

      It fails when the passed-in area covers several memory regions: it
      excludes only the first region and then returns, leaving the later
      regions unhandled.

      E.g. on a NEC system with two usable RAM regions inside the low 1M:
      
        ...
        BIOS-e820: [mem 0x0000000000000000-0x000000000003efff] usable
        BIOS-e820: [mem 0x000000000003f000-0x000000000003ffff] reserved
        BIOS-e820: [mem 0x0000000000040000-0x000000000009ffff] usable
      
      It will only exclude the memory region [0, 0x3efff], the memory region
      [0x40000, 0x9ffff] will still be added into /proc/vmcore, which may cause
      the following failure when dumping vmcore:
      
       ioremap on RAM at 0x0000000000040000 - 0x0000000000040fff
       WARNING: CPU: 0 PID: 665 at arch/x86/mm/ioremap.c:186 __ioremap_caller+0x2c7/0x2e0
       ...
       RIP: 0010:__ioremap_caller+0x2c7/0x2e0
       ...
       cp: error reading '/proc/vmcore': Cannot allocate memory
       kdump: saving vmcore failed
      
      In order to fix this bug, let's extend the crash_exclude_mem_range()
      to handle the overlapping ranges.
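
      The idea can be sketched in userspace C as follows. This is an
      illustration under stated assumptions, not the kernel's
      implementation: struct demo_range, DEMO_MAX_RANGES and
      demo_exclude_range() are hypothetical names, ranges use inclusive
      end addresses, and the caller's array is assumed to have room for
      DEMO_MAX_RANGES entries.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_RANGES 16

/* Hypothetical memory range with an inclusive end address. */
struct demo_range {
    unsigned long long start;
    unsigned long long end;
};

/*
 * Remove [mstart, mend] from every range in ranges[0..*nr), not just the
 * first one that overlaps. ranges must have room for DEMO_MAX_RANGES
 * entries. Returns 0 on success, -1 on bad input or if a split would
 * overflow the array.
 */
int demo_exclude_range(struct demo_range *ranges, size_t *nr,
                       unsigned long long mstart, unsigned long long mend)
{
    struct demo_range out[DEMO_MAX_RANGES];
    size_t n = 0;

    if (*nr > DEMO_MAX_RANGES || mstart > mend)
        return -1;

    for (size_t i = 0; i < *nr; i++) {
        unsigned long long start = ranges[i].start;
        unsigned long long end = ranges[i].end;

        /* No overlap: keep the region unchanged. */
        if (mend < start || mstart > end) {
            if (n == DEMO_MAX_RANGES)
                return -1;
            out[n++] = ranges[i];
            continue;
        }

        /* Keep whatever lies below the excluded area. */
        if (start < mstart) {
            if (n == DEMO_MAX_RANGES)
                return -1;
            out[n].start = start;
            out[n].end = mstart - 1;
            n++;
        }

        /* Keep whatever lies above the excluded area. */
        if (end > mend) {
            if (n == DEMO_MAX_RANGES)
                return -1;
            out[n].start = mend + 1;
            out[n].end = end;
            n++;
        }
        /* A region fully inside [mstart, mend] is simply dropped. */
    }

    memcpy(ranges, out, n * sizeof(out[0]));
    *nr = n;
    return 0;
}
```

      With the e820 layout quoted above, excluding the low 1M
      ([0x0, 0xfffff]) removes both usable regions in a single call,
      because the loop keeps going after the first overlapping region.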
      
      [ mingo: Amended the changelog. ]
      
      Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Dave Young <dyoung@redhat.com>
      Link: https://lore.kernel.org/r/20200804044933.1973-3-lijiang@redhat.com
      a2e9a95d
  25. Jul 29, 2020
  26. Jul 20, 2020
    • ima: Support additional conditionals in the KEXEC_CMDLINE hook function · 4834177e
      Tyler Hicks authored
      
      Take the properties of the kexec kernel's inode and the current task
      ownership into consideration when matching a KEXEC_CMDLINE operation to
      the rules in the IMA policy. This allows for some uniformity when
      writing IMA policy rules for KEXEC_KERNEL_CHECK, KEXEC_INITRAMFS_CHECK,
      and KEXEC_CMDLINE operations.
      
      Prior to this patch, it was not possible to write a set of rules like
      this:
      
       dont_measure func=KEXEC_KERNEL_CHECK obj_type=foo_t
       dont_measure func=KEXEC_INITRAMFS_CHECK obj_type=foo_t
       dont_measure func=KEXEC_CMDLINE obj_type=foo_t
       measure func=KEXEC_KERNEL_CHECK
       measure func=KEXEC_INITRAMFS_CHECK
       measure func=KEXEC_CMDLINE
      
      The inode information associated with the kernel being loaded by a
      kexec_file_load(2) syscall can now be included in the decision to
      measure or not.

      Additionally, the uid, euid, and subj_* conditionals can also now be
      used in KEXEC_CMDLINE rules. There was no technical reason as to why
      those conditionals weren't being considered previously other than
      ima_match_rules() didn't have a valid inode to use so it immediately
      bailed out for KEXEC_CMDLINE operations rather than going through the
      full list of conditional comparisons.
      
      Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: kexec@lists.infradead.org
      Reviewed-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
      Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
      4834177e
  27. Jun 26, 2020
    • kexec: do not verify the signature without the lockdown or mandatory signature · fd7af71b
      Lianbo Jiang authored
      Signature verification is an important security feature that protects
      the system from being attacked with a kernel of unknown origin.  Kexec
      rebooting is a way to replace the running kernel, and hence needs to be
      secured carefully.
      
      In the current code of handling signature verification of kexec kernel,
      the logic is very twisted.  It mixes signature verification, IMA
      signature appraising and kexec lockdown.
      
      If KEXEC_SIG_FORCE is not set and the kexec kernel image lacks any one
      of a signature, supported crypto, or a key, this is not treated as an
      error unless kexec lockdown is in effect.  IMA is considered another
      kind of signature appraisal method.
      
      If kexec kernel image has signature/crypto/key, it has to go through the
      signature verification and pass.  Otherwise it's seen as verification
      failure, and won't be loaded.
      
      So a kexec kernel image with an unqualified signature is treated as
      even worse than one without any signature at all, which seems very
      unreasonable.  E.g., if people load an unsigned kernel or a kernel
      signed with an expired key, which one is more dangerous?
      
      So, here, let's simplify the logic to improve code readability.  If
      KEXEC_SIG_FORCE or kexec lockdown is enabled, signature verification
      is mandatory.  Otherwise, the requirement is lifted for any kernel
      image.
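
      The simplified rule can be sketched as a small decision function.
      This is a hedged illustration only: demo_kexec_sig_decision() and its
      parameters are hypothetical, and it deliberately leaves out the IMA
      appraisal path mentioned earlier in this message.

```c
#include <stdbool.h>

/* Hypothetical result codes for this sketch. */
enum demo_result {
    DEMO_LOAD_OK = 0,
    DEMO_LOAD_REJECTED = -1
};

/*
 * verify_ret:  0 if signature verification passed, non-zero otherwise
 *              (missing signature, unsupported crypto, bad or expired key).
 * sig_forced:  models KEXEC_SIG_FORCE being enabled.
 * locked_down: models kexec lockdown being in effect.
 */
int demo_kexec_sig_decision(int verify_ret, bool sig_forced, bool locked_down)
{
    /* A verified image is always acceptable. */
    if (verify_ret == 0)
        return DEMO_LOAD_OK;

    /* Verification failed: only fatal when it is mandatory. */
    if (sig_forced || locked_down)
        return DEMO_LOAD_REJECTED;

    /* Not mandatory: unsigned and badly signed images are treated alike. */
    return DEMO_LOAD_OK;
}
```

      Under this rule an image with a bad signature is no longer treated
      more harshly than an unsigned one; both are accepted unless
      verification is mandatory.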
      
      Link: http://lkml.kernel.org/r/20200602045952.27487-1-lijiang@redhat.com
      
      
      Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
      Reviewed-by: Jiri Bohac <jbohac@suse.cz>
      Acked-by: Dave Young <dyoung@redhat.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Matthew Garrett <mjg59@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fd7af71b
  28. Jun 05, 2020
  29. Jan 08, 2020
  30. Nov 01, 2019
    • kexec: Fix pointer-to-int-cast warnings · f973cce0
      Helge Deller authored
      
      Fix two pointer-to-int-cast warnings when compiling for the 32-bit parisc
      platform:
      
      kernel/kexec_file.c: In function ‘crash_prepare_elf64_headers’:
      kernel/kexec_file.c:1307:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
        phdr->p_vaddr = (Elf64_Addr)_text;
                        ^
      kernel/kexec_file.c:1324:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
        phdr->p_vaddr = (unsigned long long) __va(mstart);
                        ^
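
      One common way to silence this class of warning is to convert the
      pointer to a pointer-sized integer first and only then widen it to 64
      bits. The sketch below shows that pattern in portable userspace C;
      demo_elf64_addr and demo_ptr_to_elf64_addr() are hypothetical names,
      and the message above does not state which exact cast the patch uses.

```c
#include <stdint.h>

/* Hypothetical stand-in for Elf64_Addr: always 64 bits wide. */
typedef uint64_t demo_elf64_addr;

/*
 * Convert a pointer to a 64-bit ELF address without a
 * -Wpointer-to-int-cast warning on 32-bit targets.
 */
demo_elf64_addr demo_ptr_to_elf64_addr(const void *p)
{
    /* Step 1: pointer -> integer of the same width (uintptr_t).
     * Step 2: widen that integer to 64 bits. */
    return (demo_elf64_addr)(uintptr_t)p;
}
```

      Casting a 32-bit pointer directly to a 64-bit integer type is what
      triggers the warning; the intermediate pointer-sized cast makes the
      widening explicit.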
      
      Signed-off-by: Helge Deller <deller@gmx.de>
      f973cce0
  31. Aug 20, 2019
    • kexec: Allow kexec_file() with appropriate IMA policy when locked down · 29d3c1c8
      Matthew Garrett authored
      
      Systems in lockdown mode should block the kexec of untrusted kernels.
      For x86 and ARM we can ensure that a kernel is trustworthy by validating
      a PE signature, but this isn't possible on other architectures. On those
      platforms we can use IMA digital signatures instead. Add a function to
      determine whether IMA has verified or will verify signatures for a given
      event type, and if so permit kexec_file() even if the kernel is otherwise
      locked down.
      This is restricted to cases where CONFIG_INTEGRITY_TRUSTED_KEYRING is set
      in order to prevent an attacker from loading additional keys at runtime.
      
      Signed-off-by: Matthew Garrett <mjg59@google.com>
      Acked-by: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
      Cc: linux-integrity@vger.kernel.org
      Signed-off-by: James Morris <jmorris@namei.org>
      29d3c1c8