  1. Dec 17, 2022
  2. Dec 08, 2022
  3. Dec 07, 2022
  4. Dec 06, 2022
  5. Dec 02, 2022
  6. Dec 01, 2022
  7. Nov 30, 2022
    • mm: introduce arch_has_hw_nonleaf_pmd_young() · 4aaf269c
      Juergen Gross authored
      When running as a Xen PV guest, commit eed9a328 ("mm: x86: add
      CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG") can cause a protection violation in
      pmdp_test_and_clear_young():
      
       BUG: unable to handle page fault for address: ffff8880083374d0
       #PF: supervisor write access in kernel mode
       #PF: error_code(0x0003) - permissions violation
       PGD 3026067 P4D 3026067 PUD 3027067 PMD 7fee5067 PTE 8010000008337065
       Oops: 0003 [#1] PREEMPT SMP NOPTI
       CPU: 7 PID: 158 Comm: kswapd0 Not tainted 6.1.0-rc5-20221118-doflr+ #1
       RIP: e030:pmdp_test_and_clear_young+0x25/0x40
      
      This happens because the Xen hypervisor can't emulate direct writes to
      page table entries other than PTEs.
      
      This can easily be fixed by introducing arch_has_hw_nonleaf_pmd_young()
      similar to arch_has_hw_pte_young() and test that instead of
      CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG.
      
      Link: https://lkml.kernel.org/r/20221123064510.16225-1-jgross@suse.com
      
      
      Fixes: eed9a328 ("mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG")
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
      Acked-by: Yu Zhao <yuzhao@google.com>
      Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
      Acked-by: David Hildenbrand <david@redhat.com>	[core changes]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: add dummy pmd_young() for architectures not having it · 6617da8f
      Juergen Gross authored
      In order to avoid #ifdeffery add a dummy pmd_young() implementation as a
      fallback.  This is required for the later patch "mm: introduce
      arch_has_hw_nonleaf_pmd_young()".
      
      Link: https://lkml.kernel.org/r/fd3ac3cd-7349-6bbd-890a-71a9454ca0b3@suse.com
      
      
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Acked-by: Yu Zhao <yuzhao@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Sander Eikelenboom <linux@eikelenboom.it>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • KVM: x86: fix uninitialized variable use on KVM_REQ_TRIPLE_FAULT · e542baf3
      Paolo Bonzini authored
      
      If a triple fault was fixed by kvm_x86_ops.nested_ops->triple_fault (by
      turning it into a vmexit), there is no need to leave vcpu_enter_guest().
      Any vcpu->requests will be caught later before the actual vmentry,
      and in fact vcpu_enter_guest() was not initializing the "r" variable.
      Depending on the compiler's whims, this could cause the
      x86_64/triple_fault_event_test test to fail.
      
      Cc: Maxim Levitsky <mlevitsk@redhat.com>
      Fixes: 92e7d5c8 ("KVM: x86: allow L1 to not intercept triple fault")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • riscv: kexec: Fixup crash_smp_send_stop without multi cores · 9b932aad
      Guo Ren authored
      
      The current crash_smp_send_stop() is the same as the generic one in
      kernel/panic and misses the per-CPU crash_save_cpu() calls. This patch
      is inspired by 78fd584c ("arm64: kdump: implement
      machine_crash_shutdown()") and adds the same mechanism for riscv.
      
      Before this patch, test result:
      crash> help -r
      CPU 0: [OFFLINE]
      
      CPU 1:
      epc : ffffffff80009ff0 ra : ffffffff800b789a sp : ff2000001098bb40
       gp : ffffffff815fca60 tp : ff60000004680000 t0 : 6666666666663c5b
       t1 : 0000000000000000 t2 : 666666666666663c s0 : ff2000001098bc90
       s1 : ffffffff81600798 a0 : ff2000001098bb48 a1 : 0000000000000000
       a2 : 0000000000000000 a3 : 0000000000000001 a4 : 0000000000000000
       a5 : ff60000004690800 a6 : 0000000000000000 a7 : 0000000000000000
       s2 : ff2000001098bb48 s3 : ffffffff81093ec8 s4 : ffffffff816004ac
       s5 : 0000000000000000 s6 : 0000000000000007 s7 : ffffffff80e7f720
       s8 : 00fffffffffff3f0 s9 : 0000000000000007 s10: 00aaaaaaaab98700
       s11: 0000000000000001 t3 : ffffffff819a8097 t4 : ffffffff819a8097
       t5 : ffffffff819a8098 t6 : ff2000001098b9a8
      
      CPU 2: [OFFLINE]
      
      CPU 3: [OFFLINE]
      
      After this patch, test result:
      crash> help -r
      CPU 0:
      epc : ffffffff80003f34 ra : ffffffff808caa7c sp : ffffffff81403eb0
       gp : ffffffff815fcb48 tp : ffffffff81413400 t0 : 0000000000000000
       t1 : 0000000000000000 t2 : 0000000000000000 s0 : ffffffff81403ec0
       s1 : 0000000000000000 a0 : 0000000000000000 a1 : 0000000000000000
       a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
       a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
       s2 : ffffffff816001c8 s3 : ffffffff81600370 s4 : ffffffff80c32e18
       s5 : ffffffff819d3018 s6 : ffffffff810e2110 s7 : 0000000000000000
       s8 : 0000000000000000 s9 : 0000000080039eac s10: 0000000000000000
       s11: 0000000000000000 t3 : 0000000000000000 t4 : 0000000000000000
       t5 : 0000000000000000 t6 : 0000000000000000
      
      CPU 1:
      epc : ffffffff80003f34 ra : ffffffff808caa7c sp : ff2000000068bf30
       gp : ffffffff815fcb48 tp : ff6000000240d400 t0 : 0000000000000000
       t1 : 0000000000000000 t2 : 0000000000000000 s0 : ff2000000068bf40
       s1 : 0000000000000001 a0 : 0000000000000000 a1 : 0000000000000000
       a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
       a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
       s2 : ffffffff816001c8 s3 : ffffffff81600370 s4 : ffffffff80c32e18
       s5 : ffffffff819d3018 s6 : ffffffff810e2110 s7 : 0000000000000000
       s8 : 0000000000000000 s9 : 0000000080039ea8 s10: 0000000000000000
       s11: 0000000000000000 t3 : 0000000000000000 t4 : 0000000000000000
       t5 : 0000000000000000 t6 : 0000000000000000
      
      CPU 2:
      epc : ffffffff80003f34 ra : ffffffff808caa7c sp : ff20000000693f30
       gp : ffffffff815fcb48 tp : ff6000000240e900 t0 : 0000000000000000
       t1 : 0000000000000000 t2 : 0000000000000000 s0 : ff20000000693f40
       s1 : 0000000000000002 a0 : 0000000000000000 a1 : 0000000000000000
       a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
       a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
       s2 : ffffffff816001c8 s3 : ffffffff81600370 s4 : ffffffff80c32e18
       s5 : ffffffff819d3018 s6 : ffffffff810e2110 s7 : 0000000000000000
       s8 : 0000000000000000 s9 : 0000000080039eb0 s10: 0000000000000000
       s11: 0000000000000000 t3 : 0000000000000000 t4 : 0000000000000000
       t5 : 0000000000000000 t6 : 0000000000000000
      
      CPU 3:
      epc : ffffffff8000a1e4 ra : ffffffff800b7bba sp : ff200000109bbb40
       gp : ffffffff815fcb48 tp : ff6000000373aa00 t0 : 6666666666663c5b
       t1 : 0000000000000000 t2 : 666666666666663c s0 : ff200000109bbc90
       s1 : ffffffff816007a0 a0 : ff200000109bbb48 a1 : 0000000000000000
       a2 : 0000000000000000 a3 : 0000000000000001 a4 : 0000000000000000
       a5 : ff60000002c61c00 a6 : 0000000000000000 a7 : 0000000000000000
       s2 : ff200000109bbb48 s3 : ffffffff810941a8 s4 : ffffffff816004b4
       s5 : 0000000000000000 s6 : 0000000000000007 s7 : ffffffff80e7f7a0
       s8 : 00fffffffffff3f0 s9 : 0000000000000007 s10: 00aaaaaaaab98700
       s11: 0000000000000001 t3 : ffffffff819a8097 t4 : ffffffff819a8097
       t5 : ffffffff819a8098 t6 : ff200000109bb9a8
      
      Fixes: ad943893 ("RISC-V: Fixup schedule out issue in machine_crash_shutdown()")
      Reviewed-by: Xianting Tian <xianting.tian@linux.alibaba.com>
      Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
      Signed-off-by: Guo Ren <guoren@kernel.org>
      Cc: Nick Kossifidis <mick@ics.forth.gr>
      Link: https://lore.kernel.org/r/20221020141603.2856206-3-guoren@kernel.org
      
      
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    • riscv: kexec: Fixup irq controller broken in kexec crash path · b17d19a5
      Guo Ren authored
      
      If a crash happens on cpu3 while all interrupts are bound to cpu0, the
      bad irq routing leaves the crash kernel unable to receive any irq,
      because the crash kernel does not clean up all harts' PLIC enable bits
      in the enable registers. This patch is similar to 9141a003 ("ARM:
      7316/1: kexec: EOI active and mask all interrupts in kexec crash path")
      and 78fd584c ("arm64: kdump: implement machine_crash_shutdown()");
      PowerPC also has the same mechanism.
      
      Fixes: fba8a867 ("RISC-V: Add kexec support")
      Signed-off-by: default avatarGuo Ren <guoren@linux.alibaba.com>
      Signed-off-by: default avatarGuo Ren <guoren@kernel.org>
      Reviewed-by: default avatarXianting Tian <xianting.tian@linux.alibaba.com>
      Cc: Nick Kossifidis <mick@ics.forth.gr>
      Cc: Palmer Dabbelt <palmer@rivosinc.com>
      Link: https://lore.kernel.org/r/20221020141603.2856206-2-guoren@kernel.org
      
      
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    • riscv: mm: Proper page permissions after initmem free · 6fdd5d2f
      Björn Töpel authored
      
      64-bit RISC-V kernels have the kernel image mapped separately to alias
      the linear map. The linear map and the kernel image map are documented
      as "direct mapping" and "kernel" respectively in [1].
      
      At image load time, the linear map corresponding to the kernel image
      is set to PAGE_READ permission, and the kernel image map is set to
      PAGE_READ|PAGE_EXEC.
      
      When the initmem is freed, the pages in the linear map should be
      restored to PAGE_READ|PAGE_WRITE, whereas the corresponding pages in
      the kernel image map should be restored to PAGE_READ, by removing the
      PAGE_EXEC permission.
      
      This is not the case. For 64-bit kernels, only the linear map is
      restored to its proper page permissions at initmem free, and not the
      kernel image map.
      
      In practice this means the kernel can potentially jump to dead __init
      code and start executing invalid instructions, without getting an
      exception.
      
      Restore the freed initmem properly, by also setting the kernel image
      map to the correct permissions.
      
      [1] Documentation/riscv/vm-layout.rst
      
      Fixes: e5c35fa0 ("riscv: Map the kernel with correct permissions the first time")
      Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
      Reviewed-by: Alexandre Ghiti <alex@ghiti.fr>
      Tested-by: Alexandre Ghiti <alex@ghiti.fr>
      Link: https://lore.kernel.org/r/20221115090641.258476-1-bjorn@kernel.org
      
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    • riscv: vdso: fix section overlapping under some conditions · 74f6bb55
      Jisheng Zhang authored
      lkp reported a build error; I tried the config and can reproduce the
      build error below:
      
        VDSOLD  arch/riscv/kernel/vdso/vdso.so.dbg
      ld.lld: error: section .note file range overlaps with .text
      >>> .note range is [0x7C8, 0x803]
      >>> .text range is [0x800, 0x1993]
      
      ld.lld: error: section .text file range overlaps with .dynamic
      >>> .text range is [0x800, 0x1993]
      >>> .dynamic range is [0x808, 0x937]
      
      ld.lld: error: section .note virtual address range overlaps with .text
      >>> .note range is [0x7C8, 0x803]
      >>> .text range is [0x800, 0x1993]
      
      Fix it by setting DISABLE_BRANCH_PROFILING, which disables branch
      tracing for the vdso and thus avoids the useless
      _ftrace_annotated_branch and _ftrace_branch sections. We could also
      fix it by removing the hardcoded .text begin address, but that is
      another story and should go into another patch.
      
      Link: https://lore.kernel.org/lkml/202210122123.Cc4FPShJ-lkp@intel.com/#r
      
      
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
      Link: https://lore.kernel.org/r/20221102170254.1925-1-jszhang@kernel.org
      
      
      Fixes: ad5d1122 ("riscv: use vDSO common flow to reduce the latency of the time-related functions")
      Cc: stable@vger.kernel.org
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    • riscv: fix race when vmap stack overflow · 7e186433
      Jisheng Zhang authored
      
      Currently, when detecting vmap stack overflow, riscv first switches to
      the so-called shadow stack, then uses this shadow stack to call
      get_overflow_stack() to get the overflow stack. However, there is a
      race here if two or more harts use the same shadow stack at the same
      time.
      
      To solve this race, we introduce the spin_shadow_stack atomic
      variable, which is atomically swapped between its own address and 0:
      when the variable is set, the shadow_stack is being used; when the
      variable is cleared, the shadow_stack is free.
      
      Fixes: 31da94c2 ("riscv: add VMAP_STACK overflow detection")
      Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
      Suggested-by: Guo Ren <guoren@kernel.org>
      Reviewed-by: Guo Ren <guoren@kernel.org>
      Link: https://lore.kernel.org/r/20221030124517.2370-1-jszhang@kernel.org
      
      
      [Palmer: Add AQ to the swap, and also some comments.]
      Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
  8. Nov 29, 2022
  9. Nov 28, 2022
  10. Nov 26, 2022
  11. Nov 25, 2022
  12. Nov 24, 2022
    • KVM: s390: vsie: Fix the initialization of the epoch extension (epdx) field · 0dd4cdcc
      Thomas Huth authored
      We recently experienced some weird huge time jumps in nested guests
      when rebooting them in certain cases. After adding some debug code to
      the epoch handling in vsie.c (thanks to David Hildenbrand for the
      idea!), it was obvious that the "epdx" field (the multi-epoch
      extension) did not get set to 0xff in case the "epoch" field was
      negative.
      It seems the code fails to copy the value of the epdx field from the
      guest to the shadow control block. Doing so makes the weird time
      jumps disappear in our scenarios.
      
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=2140899
      
      
      Fixes: 8fa1696e ("KVM: s390: Multiple Epoch Facility support")
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
      Cc: stable@vger.kernel.org # 4.19+
      Link: https://lore.kernel.org/r/20221123090833.292938-1-thuth@redhat.com
      
      
      Message-Id: <20221123090833.292938-1-thuth@redhat.com>
      Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
    • s390/crashdump: fix TOD programmable field size · f44e07a8
      Heiko Carstens authored
      
      The size of the TOD programmable field was incorrectly increased from
      four to eight bytes with commit 1a2c5840 ("s390/dump: cleanup CPU
      save area handling").
      This leads to an elf notes section NT_S390_TODPREG which has a size of
      eight instead of four bytes in case of kdump. Even worse, the contents
      are incorrect: the note is supposed to contain only the TOD
      programmable field, but in fact contains a mix of the TOD programmable
      field (upper 32 bits) and parts of the CPU timer register (lower 32
      bits).
      
      Fix this by simply changing the size of the todpreg field within the
      save area structure. This will implicitly also fix the size of the
      corresponding elf notes sections.
      
      This also gets rid of this compile time warning:
      
      in function ‘fortify_memcpy_chk’,
          inlined from ‘save_area_add_regs’ at arch/s390/kernel/crash_dump.c:99:2:
      ./include/linux/fortify-string.h:413:25: error: call to ‘__read_overflow2_field’
         declared with attribute warning: detected read beyond size of field
         (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
        413 |                         __read_overflow2_field(q_size_field, size);
            |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Fixes: 1a2c5840 ("s390/dump: cleanup CPU save area handling")
      Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
    • powerpc/bpf/32: Fix Oops on tail call tests · 89d21e25
      Christophe Leroy authored
      
      test_bpf tail call tests end up as:
      
        test_bpf: #0 Tail call leaf jited:1 85 PASS
        test_bpf: #1 Tail call 2 jited:1 111 PASS
        test_bpf: #2 Tail call 3 jited:1 145 PASS
        test_bpf: #3 Tail call 4 jited:1 170 PASS
        test_bpf: #4 Tail call load/store leaf jited:1 190 PASS
        test_bpf: #5 Tail call load/store jited:1
        BUG: Unable to handle kernel data access on write at 0xf1b4e000
        Faulting instruction address: 0xbe86b710
        Oops: Kernel access of bad area, sig: 11 [#1]
        BE PAGE_SIZE=4K MMU=Hash PowerMac
        Modules linked in: test_bpf(+)
        CPU: 0 PID: 97 Comm: insmod Not tainted 6.1.0-rc4+ #195
        Hardware name: PowerMac3,1 750CL 0x87210 PowerMac
        NIP:  be86b710 LR: be857e88 CTR: be86b704
        REGS: f1b4df20 TRAP: 0300   Not tainted  (6.1.0-rc4+)
        MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 28008242  XER: 00000000
        DAR: f1b4e000 DSISR: 42000000
        GPR00: 00000001 f1b4dfe0 c11d2280 00000000 00000000 00000000 00000002 00000000
        GPR08: f1b4e000 be86b704 f1b4e000 00000000 00000000 100d816a f2440000 fe73baa8
        GPR16: f2458000 00000000 c1941ae4 f1fe2248 00000045 c0de0000 f2458030 00000000
        GPR24: 000003e8 0000000f f2458000 f1b4dc90 3e584b46 00000000 f24466a0 c1941a00
        NIP [be86b710] 0xbe86b710
        LR [be857e88] __run_one+0xec/0x264 [test_bpf]
        Call Trace:
        [f1b4dfe0] [00000002] 0x2 (unreliable)
        Instruction dump:
        XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
        XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
        ---[ end trace 0000000000000000 ]---
      
      This is an attempt to write above the stack. The problem is
      encountered with the tests added by commit 38608ee7 ("bpf, tests: Add
      load store test case for tail call").
      
      This happens because the tail call is done to a BPF prog with a
      different stack_depth. Currently, the stack is kept as is when the
      caller tail calls its callee. But at exit, the callee restores the
      stack based on its own properties. Therefore here, at each run, r1 is
      erroneously increased by 32 - 16 = 16 bytes.
      
      This was done that way in order to pass the tail call count from
      caller to callee through the stack. As powerpc32 doesn't have a red
      zone in the stack, it was necessary to maintain the stack as is for
      the tail call. But it was not anticipated that the BPF frame size
      could be different.
      
      Let's take a new approach. Use register r4 to carry the tail call
      count during the tail call, and save it into the stack at function
      entry if required. This means the input parameter must be in r3,
      which is more correct as it is a 32-bit parameter; tail calls then
      better match normal BPF function entry. The downside is that we move
      that input parameter back and forth between r3 and r4. That can be
      optimised later.
      
      Doing that also has the advantage of maximising the common parts between
      tail calls and a normal function exit.
      
      With the fix, tail call tests are now successful:
      
        test_bpf: #0 Tail call leaf jited:1 53 PASS
        test_bpf: #1 Tail call 2 jited:1 115 PASS
        test_bpf: #2 Tail call 3 jited:1 154 PASS
        test_bpf: #3 Tail call 4 jited:1 165 PASS
        test_bpf: #4 Tail call load/store leaf jited:1 101 PASS
        test_bpf: #5 Tail call load/store jited:1 141 PASS
        test_bpf: #6 Tail call error path, max count reached jited:1 994 PASS
        test_bpf: #7 Tail call count preserved across function calls jited:1 140975 PASS
        test_bpf: #8 Tail call error path, NULL target jited:1 110 PASS
        test_bpf: #9 Tail call error path, index out of range jited:1 69 PASS
        test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]
      
      Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Fixes: 51c66ad8 ("powerpc/bpf: Implement extended BPF on PPC32")
      Cc: stable@vger.kernel.org
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/757acccb7fbfc78efa42dcf3c974b46678198905.1669278887.git.christophe.leroy@csgroup.eu
    • ARM: at91: fix build for SAMA5D3 w/o L2 cache · 6a3fc8c3
      Peter Rosin authored
      
      The L2 cache is present on the newer SAMA5D2 and SAMA5D4 families, but
      apparently not on the older SAMA5D3.
      
      Solves a build-time regression with the following symptom:
      
      sama5.c:(.init.text+0x48): undefined reference to `outer_cache'
      
      Fixes: 3b5a7ca7 ("ARM: at91: setup outer cache .write_sec() callback if needed")
      Signed-off-by: Peter Rosin <peda@axentia.se>
      [claudiu.beznea: delete "At least not always." from commit description]
      Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
      Link: https://lore.kernel.org/r/b7f8dacc-5e1f-0eb2-188e-3ad9a9f7613d@axentia.se
      Link: https://lore.kernel.org/r/b7f8dacc-5e1f-0eb2-188e-3ad9a9f7613d@axentia.se
    • kbuild: fix "cat: .version: No such file or directory" · 083cad78
      Masahiro Yamada authored
      
      Since commit 2df8220c ("kbuild: build init/built-in.a just once"),
      the .version file is not touched at all when KBUILD_BUILD_VERSION is
      given.
      
      If KBUILD_BUILD_VERSION is specified and the .version file is missing
      (for example right after 'make mrproper'), "No such file or directory"
      is shown. Even if the .version file exists, it is irrelevant to the
      version of the current build.
      
        $ make -j$(nproc) KBUILD_BUILD_VERSION=100 mrproper defconfig all
          [ snip ]
          BUILD   arch/x86/boot/bzImage
        cat: .version: No such file or directory
        Kernel: arch/x86/boot/bzImage is ready  (#)
      
      Show KBUILD_BUILD_VERSION if it is given.
      
      Fixes: 2df8220c ("kbuild: build init/built-in.a just once")
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
  13. Nov 23, 2022
    • KVM: x86/xen: Only do in-kernel acceleration of hypercalls for guest CPL0 · c2b8cdfa
      David Woodhouse authored
      
      There are almost no hypercalls which are valid from CPL > 0, and definitely
      none which are handled by the kernel.
      
      Fixes: 2fd6df2f ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
      Reported-by: Michal Luczaj <mhal@rbox.co>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Reviewed-by: default avatarSean Christopherson <seanjc@google.com>
      Cc: stable@kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/xen: Validate port number in SCHEDOP_poll · 4ea9439f
      David Woodhouse authored
      
      We shouldn't allow guests to poll on arbitrary port numbers off the end
      of the event channel table.
      
      Fixes: 1a65105a ("KVM: x86/xen: handle PV spinlocks slowpath")
      [dwmw2: my bug though; the original version did check the validity as a
       side-effect of an idr_find() which I ripped out in refactoring.]
      Reported-by: Michal Luczaj <mhal@rbox.co>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Cc: stable@kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Fix race condition in direct_page_fault · 47b0c2e4
      Kazuki Takiguchi authored
      
      make_mmu_pages_available() must be called with mmu_lock held for write.
      However, if the TDP MMU is used, it will be called with mmu_lock held for
      read.
      This function does nothing unless shadow pages are used, so there is no
      race unless nested TDP is used.
      Since nested TDP uses shadow pages, old shadow pages may be zapped by this
      function even when the TDP MMU is enabled.
      Since shadow pages are never allocated by kvm_tdp_mmu_map(), a race
      condition can be avoided by not calling make_mmu_pages_available() if the
      TDP MMU is currently in use.
      
      I encountered this when repeatedly starting and stopping nested VMs.
      It can be artificially caused by allocating a large number of nested TDP
      SPTEs.
      
      For example, the following BUG and general protection fault are caused in
      the host kernel.
      
      pte_list_remove: 00000000cd54fc10 many->many
      ------------[ cut here ]------------
      kernel BUG at arch/x86/kvm/mmu/mmu.c:963!
      invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      RIP: 0010:pte_list_remove.cold+0x16/0x48 [kvm]
      Call Trace:
       <TASK>
       drop_spte+0xe0/0x180 [kvm]
       mmu_page_zap_pte+0x4f/0x140 [kvm]
       __kvm_mmu_prepare_zap_page+0x62/0x3e0 [kvm]
       kvm_mmu_zap_oldest_mmu_pages+0x7d/0xf0 [kvm]
       direct_page_fault+0x3cb/0x9b0 [kvm]
       kvm_tdp_page_fault+0x2c/0xa0 [kvm]
       kvm_mmu_page_fault+0x207/0x930 [kvm]
       npf_interception+0x47/0xb0 [kvm_amd]
       svm_invoke_exit_handler+0x13c/0x1a0 [kvm_amd]
       svm_handle_exit+0xfc/0x2c0 [kvm_amd]
       kvm_arch_vcpu_ioctl_run+0xa79/0x1780 [kvm]
       kvm_vcpu_ioctl+0x29b/0x6f0 [kvm]
       __x64_sys_ioctl+0x95/0xd0
       do_syscall_64+0x5c/0x90
      
      general protection fault, probably for non-canonical address
      0xdead000000000122: 0000 [#1] PREEMPT SMP NOPTI
      RIP: 0010:kvm_mmu_commit_zap_page.part.0+0x4b/0xe0 [kvm]
      Call Trace:
       <TASK>
       kvm_mmu_zap_oldest_mmu_pages+0xae/0xf0 [kvm]
       direct_page_fault+0x3cb/0x9b0 [kvm]
       kvm_tdp_page_fault+0x2c/0xa0 [kvm]
       kvm_mmu_page_fault+0x207/0x930 [kvm]
       npf_interception+0x47/0xb0 [kvm_amd]
      
      CVE: CVE-2022-45869
      Fixes: a2855afc ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU")
      Signed-off-by: Kazuki Takiguchi <takiguchi.kazuki171@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  14. Nov 22, 2022
  15. Nov 21, 2022
    • x86/pm: Add enumeration check before spec MSRs save/restore setup · 50bcceb7
      Pawan Gupta authored
      
      pm_save_spec_msr() keeps a list of all the MSRs which _might_ need
      to be saved and restored at hibernate and resume. However, it has
      zero awareness of CPU support for these MSRs. It mostly works by
      unconditionally attempting to manipulate these MSRs and relying on
      rdmsrl_safe() being able to handle a #GP on CPUs where the support is
      unavailable.
      
      However, it's possible for reads (RDMSR) to be supported for a given MSR
      while writes (WRMSR) are not. In this case, msr_build_context() sees
      a successful read (RDMSR) and marks the MSR as valid. Then, later, a
      write (WRMSR) fails, producing a nasty (but harmless) error message.
      This causes restore_processor_state() to try and restore it, but writing
      this MSR is not allowed on the Intel Atom N2600 leading to:
      
        unchecked MSR access error: WRMSR to 0x122 (tried to write 0x0000000000000002) \
           at rIP: 0xffffffff8b07a574 (native_write_msr+0x4/0x20)
        Call Trace:
         <TASK>
         restore_processor_state
         x86_acpi_suspend_lowlevel
         acpi_suspend_enter
         suspend_devices_and_enter
         pm_suspend.cold
         state_store
         kernfs_fop_write_iter
         vfs_write
         ksys_write
         do_syscall_64
         ? do_syscall_64
         ? up_read
         ? lock_is_held_type
         ? asm_exc_page_fault
         ? lockdep_hardirqs_on
         entry_SYSCALL_64_after_hwframe
      
      To fix this, add the corresponding X86_FEATURE bit for each MSR.  Avoid
      trying to manipulate the MSR when the feature bit is clear. This
      required adding an X86_FEATURE bit for MSRs that do not already have
      one,
      but it's a small price to pay.
      
        [ bp: Move struct msr_enumeration inside the only function that uses it. ]
      
      Fixes: 73924ec4 ("x86/pm: Save the MSR validity status at context setup")
      Reported-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: <stable@kernel.org>
      Link: https://lore.kernel.org/r/c24db75d69df6e66c0465e13676ad3f2837a2ed8.1668539735.git.pawan.kumar.gupta@linux.intel.com
    • x86/tsx: Add a feature bit for TSX control MSR support · aaa65d17
      Pawan Gupta authored
      
      Support for the TSX control MSR is enumerated in MSR_IA32_ARCH_CAPABILITIES.
      This is different from how other CPU features are enumerated i.e. via
      CPUID. Currently, a call to tsx_ctrl_is_supported() is required for
      enumerating the feature. In the absence of a feature bit for TSX control,
      any code that relies on checking feature bits directly will not work.
      
      In preparation for adding a feature bit check in MSR save/restore
      during suspend/resume, set a new feature bit X86_FEATURE_TSX_CTRL when
      MSR_IA32_TSX_CTRL is present. Also make tsx_ctrl_is_supported() use the
      new feature bit to avoid any overhead of reading the MSR.
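
       A minimal sketch of the caching pattern: read the enumerating MSR once
       at boot, latch the result into a software feature bit, and have the
       support check test only that bit. Bit 7 of IA32_ARCH_CAPABILITIES
       enumerates IA32_TSX_CTRL; everything else (the cap word, helper names)
       is an illustrative stand-in for the kernel's cpufeature machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 7 of IA32_ARCH_CAPABILITIES: MSR_IA32_TSX_CTRL is present. */
#define ARCH_CAP_TSX_CTRL_MSR (1ULL << 7)

/* Simulated software capability word; the kernel sets a synthetic
 * X86_FEATURE_TSX_CTRL bit instead. */
static uint64_t cpu_caps;
#define SIM_FEATURE_TSX_CTRL (1ULL << 0)

/* Called once at boot with the value read from IA32_ARCH_CAPABILITIES:
 * cache the enumeration so later checks need no MSR read. */
static void tsx_detect(uint64_t arch_capabilities)
{
    if (arch_capabilities & ARCH_CAP_TSX_CTRL_MSR)
        cpu_caps |= SIM_FEATURE_TSX_CTRL;
}

/* Feature-bit test only: no rdmsr overhead on the query path. */
static bool tsx_ctrl_supported(void)
{
    return cpu_caps & SIM_FEATURE_TSX_CTRL;
}
```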
      
        [ bp: Remove tsx_ctrl_is_supported(), add room for two more feature
          bits in word 11 which are coming up in the next merge window. ]
      
       Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
       Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
       Signed-off-by: Borislav Petkov <bp@suse.de>
       Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: <stable@kernel.org>
      Link: https://lore.kernel.org/r/de619764e1d98afbb7a5fa58424f1278ede37b45.1668539735.git.pawan.kumar.gupta@linux.intel.com
      aaa65d17
     • LoongArch: Fix unsigned comparison with less than zero · b96e74bb
      KaiLong Wang authored
      
      Eliminate the following coccicheck warning:
      
      ./arch/loongarch/kernel/unwind_prologue.c:84:5-13: WARNING: Unsigned
      expression compared with zero: frame_ra < 0
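
       The bug class in miniature: with an unsigned type, "x < 0" is always
       false, so the sanity check can never fire. The fix is a signed type
       for the offset; the function names and the use of long below are
       illustrative, not the exact unwind_prologue.c code.

```c
#include <assert.h>
#include <stdbool.h>

/* Signed offset: the negative-value check is meaningful. */
static bool ra_offset_invalid(long frame_ra)
{
    return frame_ra < 0;
}

/* The warned-about pattern: unsigned operand makes the comparison dead
 * code (coccicheck and -Wtype-limits both flag this). */
static bool ra_offset_invalid_buggy(unsigned long frame_ra)
{
    return frame_ra < 0;  /* always false */
}
```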
      
       Signed-off-by: KaiLong Wang <wangkailong@jari.cn>
       Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      b96e74bb
     • LoongArch: Set _PAGE_DIRTY only if _PAGE_MODIFIED is set in {pmd,pte}_mkwrite() · 54e6cd42
      Huacai Chen authored
      
       Set _PAGE_DIRTY only if _PAGE_MODIFIED is set in {pmd,pte}_mkwrite().
       Otherwise, _PAGE_DIRTY silences the TLB modify exception and leaves
       software no chance to mark a pmd/pte dirty (_PAGE_MODIFIED).
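
       A user-space sketch of the fixed mkwrite logic, assuming illustrative
       bit values (the real definitions live in LoongArch's pgtable headers):
       the hardware dirty bit is re-set only when software has already
       recorded a write, so a clean page still takes the modify exception.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions, not the real LoongArch values. */
#define _PAGE_WRITE    (1u << 0)  /* page is writable */
#define _PAGE_DIRTY    (1u << 1)  /* hardware dirty: suppresses TLB modify exception */
#define _PAGE_MODIFIED (1u << 2)  /* software dirty: a write really happened */

/* Fixed pte_mkwrite(): set _PAGE_DIRTY only when _PAGE_MODIFIED is
 * already set, keeping the modify exception armed for clean pages. */
static uint32_t pte_mkwrite_fixed(uint32_t pte)
{
    pte |= _PAGE_WRITE;
    if (pte & _PAGE_MODIFIED)
        pte |= _PAGE_DIRTY;
    return pte;
}
```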
      
       Reviewed-by: Guo Ren <guoren@kernel.org>
       Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      54e6cd42
     • LoongArch: Set _PAGE_DIRTY only if _PAGE_WRITE is set in {pmd,pte}_mkdirty() · bf2f34a5
      Huacai Chen authored
      
       Currently {pmd,pte}_mkdirty() set the _PAGE_DIRTY bit unconditionally,
       which causes random segmentation faults after commit 0ccf7f16 ("mm/thp:
       carry over dirty bit when thp splits on pmd").
       
       The reason is: on fork(), the parent process uses pmd_wrprotect() to
       clear the huge page's _PAGE_WRITE and _PAGE_DIRTY (for COW); then
       pte_mkdirty() sets _PAGE_DIRTY as well as _PAGE_MODIFIED while
       splitting dirty huge pages; once _PAGE_DIRTY is set there is no TLB
       modify exception, so the COW mechanism fails; and finally memory
       corruption occurs between the parent and child processes.
       
       So, set _PAGE_DIRTY only when _PAGE_WRITE is set in
       {pmd,pte}_mkdirty().
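
       The mirror of the mkwrite fix, again as a user-space sketch with
       illustrative bit values: mkdirty always records the software dirty
       bit, but sets the hardware dirty bit only on writable pages, so a
       write-protected COW page still faults.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions, not the real LoongArch values. */
#define _PAGE_WRITE    (1u << 0)  /* page is writable */
#define _PAGE_DIRTY    (1u << 1)  /* hardware dirty: suppresses TLB modify exception */
#define _PAGE_MODIFIED (1u << 2)  /* software dirty: a write really happened */

/* Fixed pte_mkdirty(): _PAGE_DIRTY only when _PAGE_WRITE is set, so a
 * wrprotected (COW) page keeps faulting and the COW copy still happens. */
static uint32_t pte_mkdirty_fixed(uint32_t pte)
{
    pte |= _PAGE_MODIFIED;        /* always record the software dirty bit */
    if (pte & _PAGE_WRITE)
        pte |= _PAGE_DIRTY;       /* hardware dirty only on writable pages */
    return pte;
}
```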
      
      Cc: stable@vger.kernel.org
      Cc: Peter Xu <peterx@redhat.com>
       Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      bf2f34a5
     • LoongArch: Clear FPU/SIMD thread info flags for kernel thread · e428e961
      Huacai Chen authored
      
       If a kernel thread is created by a user thread, it may carry the
       FPU/SIMD thread info flags (TIF_USEDFPU, TIF_USEDSIMD, etc.). It will
       then be considered an FPU owner, and the kernel will try to save its
       FPU/SIMD context, causing errors such as:
      
      [   41.518931] do_fpu invoked from kernel context![#1]:
      [   41.523933] CPU: 1 PID: 395 Comm: iou-wrk-394 Not tainted 6.1.0-rc5+ #217
      [   41.530757] Hardware name: Loongson Loongson-3A5000-7A1000-1w-CRB/Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.pre-beta8 08/18/2022
      [   41.544064] $ 0   : 0000000000000000 90000000011e9468 9000000106c7c000 9000000106c7fcf0
      [   41.552101] $ 4   : 9000000106305d40 9000000106689800 9000000106c7fd08 0000000003995818
      [   41.560138] $ 8   : 0000000000000001 90000000009a72e4 0000000000000020 fffffffffffffffc
      [   41.568174] $12   : 0000000000000000 0000000000000000 0000000000000020 00000009aab7e130
      [   41.576211] $16   : 00000000000001ff 0000000000000407 0000000000000001 0000000000000000
      [   41.584247] $20   : 0000000000000000 0000000000000001 9000000106c7fd70 90000001002f0400
      [   41.592284] $24   : 0000000000000000 900000000178f740 90000000011e9834 90000001063057c0
      [   41.600320] $28   : 0000000000000000 0000000000000001 9000000006826b40 9000000106305140
      [   41.608356] era   : 9000000000228848 _save_fp+0x0/0xd8
      [   41.613542] ra    : 90000000011e9468 __schedule+0x568/0x8d0
      [   41.619160] CSR crmd: 000000b0
      [   41.619163] CSR prmd: 00000000
      [   41.622359] CSR euen: 00000000
      [   41.625558] CSR ecfg: 00071c1c
      [   41.628756] CSR estat: 000f0000
      [   41.635239] ExcCode : f (SubCode 0)
      [   41.638783] PrId  : 0014c010 (Loongson-64bit)
      [   41.643191] Modules linked in: acpi_ipmi vfat fat ipmi_si ipmi_devintf cfg80211 ipmi_msghandler rfkill fuse efivarfs
      [   41.653734] Process iou-wrk-394 (pid: 395, threadinfo=0000000004ebe913, task=00000000636fa1be)
      [   41.662375] Stack : 00000000ffff0875 9000000006800ec0 9000000006800ec0 90000000002d57e0
      [   41.670412]         0000000000000001 0000000000000000 9000000106535880 0000000000000001
      [   41.678450]         9000000105291800 0000000000000000 9000000105291838 900000000178e000
      [   41.686487]         9000000106c7fd90 9000000106305140 0000000000000001 90000000011e9834
      [   41.694523]         00000000ffff0875 90000000011f034c 9000000105291838 9000000105291830
      [   41.702561]         0000000000000000 9000000006801440 00000000ffff0875 90000000002d48c0
      [   41.710597]         9000000128800001 9000000106305140 9000000105291838 9000000105291838
      [   41.718634]         9000000105291830 9000000107811740 9000000105291848 90000000009bf1e0
      [   41.726672]         9000000105291830 9000000107811748 2d6b72772d756f69 0000000000343933
      [   41.734708]         0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [   41.742745]         ...
      [   41.745252] Call Trace:
      [   42.197868] [<9000000000228848>] _save_fp+0x0/0xd8
      [   42.205214] [<90000000011ed468>] __schedule+0x568/0x8d0
      [   42.210485] [<90000000011ed834>] schedule+0x64/0xd4
      [   42.215411] [<90000000011f434c>] schedule_timeout+0x88/0x188
      [   42.221115] [<90000000009c36d0>] io_wqe_worker+0x184/0x350
      [   42.226645] [<9000000000221cf0>] ret_from_kernel_thread+0xc/0x9c
      
       This can be easily triggered by the LTP testcase syscalls/io_uring02,
       and it can also be easily fixed by clearing the FPU/SIMD thread info
       flags for kernel threads in copy_thread().
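
       The shape of the fix, as a self-contained sketch: the child inherits
       the parent's thread-info flags, and copy_thread() masks out the
       FPU/SIMD ownership flags when creating a kernel thread. The struct
       and helper below are illustrative stand-ins for the kernel's
       task_struct/thread_info handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values, not the real LoongArch TIF_* numbers. */
#define TIF_USEDFPU  (1u << 0)
#define TIF_USEDSIMD (1u << 1)

struct task_sim {
    uint32_t ti_flags;
};

/* Sketch of the copy_thread() fix: a kernel thread inherits the parent's
 * flags, so drop FPU/SIMD ownership there; otherwise the scheduler would
 * try to save FPU state the thread never owned (the do_fpu oops above). */
static void copy_thread_flags(struct task_sim *child, uint32_t parent_flags,
                              bool is_kernel_thread)
{
    child->ti_flags = parent_flags;
    if (is_kernel_thread)
        child->ti_flags &= ~(TIF_USEDFPU | TIF_USEDSIMD);
}
```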
      
      Cc: stable@vger.kernel.org
       Reported-by: Qi Hu <huqi@loongson.cn>
       Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      e428e961
     • LoongArch: SMP: Change prefix from loongson3 to loongson · c56ab8e8
      Huacai Chen authored
      
      SMP operations can be shared by Loongson-2 series and Loongson-3 series,
      so we change the prefix from loongson3 to loongson for all functions and
      data structures.
      
       Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      c56ab8e8
     • LoongArch: Combine acpi_boot_table_init() and acpi_boot_init() · 538eafc6
      Huacai Chen authored
      
      Combine acpi_boot_table_init() and acpi_boot_init() since they are very
      simple, and we don't need to check the return value of acpi_boot_init().
      
       Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      538eafc6
     • LoongArch: Makefile: Use "grep -E" instead of "egrep" · 83f638bc
      Tiezhu Yang authored
      
       The latest version of grep claims that egrep is now obsolescent, so
       the build contains warnings that look like:
       	egrep: warning: egrep is obsolescent; using grep -E
       
       Fix this up by changing the LoongArch Makefile to use "grep -E"
       instead.
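
       A minimal before/after of the substitution; the pattern used here is
       illustrative, not the exact expression from the LoongArch Makefile.

```shell
# Before: egrep 'vmlinu.' ...
#   emits "egrep: warning: egrep is obsolescent; using grep -E" on new grep.
# After: identical matches, no warning.
printf 'vmlinux\nREADME\n' | grep -E 'vmlinu.'
```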
      
       Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
       Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      83f638bc
  16. Nov 19, 2022