1. 10 Oct, 2018 1 commit
    • x86/boot: Add ACPI RSDP address to setup_header · ae7e1238
      Juergen Gross authored
      Xen PVH guests receive the address of the RSDP table from Xen. In order
      to support booting a Xen PVH guest via Grub2 using the standard x86
      boot entry, we need a way for Grub2 to pass the RSDP address to the
      kernel.
      
      For this purpose expand struct setup_header to hold the physical
      address of the RSDP table. A value of zero means no address was
      specified and the table has to be located the legacy way (by
      searching through low memory and the EBDA).
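      
      A minimal sketch of the layout change, assuming the new field is
      appended at the end of struct setup_header (surrounding fields
      elided):
      
          struct setup_header {
                  /* ... existing fields, valid through protocol 2.13 ... */
                  __u64   acpi_rsdp_addr;  /* physical address of the RSDP
                                            * table; 0 = not provided, use
                                            * the legacy low-memory/EBDA
                                            * search */
          } __attribute__((packed));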
      
      While documenting the new setup_header layout and protocol version
      2.14, add the missing documentation of protocol version 2.13.
      
      There are Grub2 versions in several distros with a downstream patch
      violating the boot protocol by writing past the end of setup_header.
      This requires another update of the boot protocol to enable the kernel
      to distinguish between a specified RSDP address and one filled with
      garbage by such a broken Grub2.
      
      From protocol 2.14 on, Grub2 will write the protocol version it
      supports (but never a higher value than the one supported by the
      kernel), ORed with 0x8000, to the version field of setup_header.
      This lets the kernel know up to which field Grub2 has written valid
      information; all fields beyond that must be assumed to be clobbered.
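      
      A hedged sketch of the kernel-side check this convention enables (the
      flag constant and the minimal stand-in struct are illustrative, not
      the actual kernel code):
      
          #include <stdbool.h>
          #include <stdint.h>
      
          #define VERSION_WRITTEN 0x8000   /* illustrative: bit ORed in by Grub2 */
      
          struct setup_header_sketch {     /* minimal stand-in for setup_header */
                  uint16_t version;        /* boot protocol version field */
                  uint64_t acpi_rsdp_addr; /* new field, 0 = not supplied */
          };
      
          /* Trust acpi_rsdp_addr only if the loader flagged the version field,
           * i.e. it speaks protocol 2.14+ and wrote the fields up to there. */
          static bool rsdp_addr_valid(const struct setup_header_sketch *hdr)
          {
                  uint16_t ver = hdr->version;
      
                  if (!(ver & VERSION_WRITTEN))
                          return false;                      /* legacy loader */
                  return (ver & ~VERSION_WRITTEN) >= 0x020e; /* 2.14 or later */
          }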
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: boris.ostrovsky@oracle.com
      Cc: bp@alien8.de
      Cc: corbet@lwn.net
      Cc: linux-doc@vger.kernel.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/20181010061456.22238-3-jgross@suse.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 02 Oct, 2018 1 commit
    • x86/tsc: Fix UV TSC initialization · 2647c43c
      Mike Travis authored
      The recent rework of the TSC calibration code introduced a regression
      on UV systems: it added a call to tsc_early_init(), which initializes
      the TSC ADJUST values before acpi_boot_table_init(). On UV systems,
      acpi_boot_table_init() is a necessary step because it calls
      uv_system_init(), which informs tsc_sanitize_first_cpu() that the
      kernel runs on a platform with async TSC resets, as documented in
      commit 341102c3 ("x86/tsc: Add option that TSC on Socket 0 being
      non-zero is valid").
      
      Fix it by skipping the early TSC initialization on UV systems and
      letting the TSC init checks take place later, in tsc_init().
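      
      A minimal sketch of the fix, assuming a helper along the lines of
      is_early_uv_system() that can detect UV before the ACPI tables are
      parsed:
      
          void __init tsc_early_init(void)
          {
                  /* UV needs acpi_boot_table_init() -> uv_system_init() to
                   * mark async TSC resets as valid first; bail out here and
                   * let tsc_init() run the checks later. */
                  if (!boot_cpu_has(X86_FEATURE_TSC) || is_early_uv_system())
                          return;
      
                  /* ... normal early calibration continues here ... */
          }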
      
      Fixes: cf7a63ef ("x86/tsc: Calibrate tsc only once")
      Suggested-by: Hedi Berriche <hedi.berriche@hpe.com>
      Signed-off-by: Mike Travis <mike.travis@hpe.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Russ Anderson <rja@hpe.com>
      Reviewed-by: Dimitri Sivanich <sivanich@hpe.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Russ Anderson <russ.anderson@hpe.com>
      Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: Xiaoming Gao <gxm.linux.kernel@gmail.com>
      Cc: Rajvi Jingar <rajvi.jingar@intel.com>
      Link: https://lkml.kernel.org/r/20181002180144.923579706@stormcage.americas.sgi.com
  3. 15 Sep, 2018 3 commits
    • x86/kvm: Use __bss_decrypted attribute in shared variables · 6a1cac56
      Brijesh Singh authored
      The recent removal of the memblock dependency from kvmclock caused a SEV
      guest regression because the wall_clock and hv_clock_boot variables are
      no longer mapped decrypted when SEV is active.
      
      Use the __bss_decrypted attribute to put the static wall_clock and
      hv_clock_boot in the .bss..decrypted section so that they are mapped
      decrypted during boot.
      
      During the preparatory stage of CPU hotplug, the per-cpu pvclock data
      pointer is assigned either an element of the static array or
      dynamically allocated memory. The static array is now mapped
      decrypted, but the dynamically allocated memory is not. When SEV is
      active, however, this memory range must be mapped decrypted as well.
      
      Add a function which is called after the page allocator is up and
      allocates memory for the pvclock data pointers of all possible CPUs.
      Map this memory range as decrypted when SEV is active.
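      
      A hedged sketch of both halves (the array size, helper name and exact
      allocation flow are assumptions based on the description above):
      
          /* Static boot-time pvclock data, shared with the hypervisor:
           * lives in .bss..decrypted so it is decrypted under SEV. */
          static struct pvclock_vsyscall_time_info
                  hv_clock_boot[HVC_BOOT_ARRAY_SIZE] __bss_decrypted;
          static struct pvclock_wall_clock wall_clock __bss_decrypted;
      
          /* Called once the page allocator is up: back the CPUs that do not
           * fit in the static array with dynamically allocated memory and
           * map that range decrypted when SEV is active. */
          static void __init kvmclock_init_mem(void)
          {
                  unsigned long ncpus;
                  unsigned int order;
                  struct page *p;
      
                  if (HVC_BOOT_ARRAY_SIZE >= num_possible_cpus())
                          return;  /* the static array covers every CPU */
      
                  ncpus = num_possible_cpus() - HVC_BOOT_ARRAY_SIZE;
                  order = get_order(ncpus *
                                    sizeof(struct pvclock_vsyscall_time_info));
                  p = alloc_pages(GFP_KERNEL, order);
                  if (!p)
                          return;
      
                  if (sev_active())  /* share the range with the hypervisor */
                          set_memory_decrypted((unsigned long)page_address(p),
                                               1UL << order);
                  /* stash page_address(p) for the hotplug prepare stage */
          }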
      
      Fixes: 368a540e ("x86/kvmclock: Remove memblock dependency")
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: kvm@vger.kernel.org
      Link: https://lkml.kernel.org/r/1536932759-12905-3-git-send-email-brijesh.singh@amd.com
    • x86/mm: Add .bss..decrypted section to hold shared variables · b3f0907c
      Brijesh Singh authored
      kvmclock defines a few static variables which are shared with the
      hypervisor during kvmclock initialization.
      
      When SEV is active, memory is encrypted with a guest-specific key, and
      if the guest OS wants to share the memory region with the hypervisor
      then it must clear the C-bit before sharing it.
      
      Currently, we use kernel_physical_mapping_init() to split large pages
      before clearing the C-bit on shared pages. But it fails when called from
      the kvmclock initialization (mainly because the memblock allocator is
      not ready that early during boot).
      
      Add a __bss_decrypted attribute which can be used when defining such
      shared variables. Variables defined this way are placed in the
      .bss..decrypted section, which is mapped with C=0 early during boot.
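      
      The attribute itself boils down to a one-line section annotation; a
      sketch of the definition and a typical use:
      
          #define __bss_decrypted __attribute__((__section__(".bss..decrypted")))
      
          /* example: shared with the hypervisor, mapped with C=0 */
          static unsigned long shared_with_hypervisor __bss_decrypted;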
      
      The .bss..decrypted section occupies a sizeable chunk of memory that
      is unused when memory encryption is not active; free it in that case.
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: kvm@vger.kernel.org
      Link: https://lkml.kernel.org/r/1536932759-12905-2-git-send-email-brijesh.singh@amd.com
    • x86/APM: Fix build warning when PROC_FS is not enabled · 002b87d2
      Randy Dunlap authored
      Fix build warning in apm_32.c when CONFIG_PROC_FS is not enabled:
      
      ../arch/x86/kernel/apm_32.c:1643:12: warning: 'proc_apm_show' defined but not used [-Wunused-function]
       static int proc_apm_show(struct seq_file *m, void *v)
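      
      The fix amounts to guarding the function with the same condition as
      its only caller; a sketch:
      
          #ifdef CONFIG_PROC_FS
          static int proc_apm_show(struct seq_file *m, void *v)
          {
                  /* ... unchanged body ... */
          }
          #endif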
      
      Fixes: 3f3942ac ("proc: introduce proc_create_single{,_data}")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Jiri Kosina <jikos@kernel.org>
      Link: https://lkml.kernel.org/r/be39ac12-44c2-4715-247f-4dcc3c525b8b@infradead.org
      
  4. 20 Aug, 2018 2 commits
    • x86/memory_failure: Introduce {set, clear}_mce_nospec() · 284ce401
      Dan Williams authored
      Currently memory_failure() returns zero if the error was handled. On
      that result mce_unmap_kpfn() is called to zap the page out of the
      kernel linear mapping to prevent speculative fetches of potentially
      poisoned memory. However, in the case of dax-mapped devmap pages the
      page may be in active permanent use by the device driver, so it
      cannot be unmapped from the kernel.
      
      Instead of marking the page not present, marking the page UC should
      be sufficient for preventing poison from being prefetched into the
      cache. Convert mce_unmap_kpfn() to set_mce_nospec(), remapping the
      page as UC to hide it from speculative accesses.
      
      Given that persistent memory errors can be cleared by the driver,
      include a facility to restore the page to cacheable operation,
      clear_mce_nospec().
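      
      A hedged sketch of the pair built on the set_memory_*() API (the real
      implementation also works on a decoy virtual address so the poisoned
      physical address itself is never referenced; that detail is elided):
      
          /* Map the poisoned page uncacheable so its contents cannot be
           * speculatively prefetched into the cache. */
          static inline int set_mce_nospec(unsigned long pfn)
          {
                  return set_memory_uc((unsigned long)pfn_to_kaddr(pfn), 1);
          }
      
          /* Restore the default write-back mapping once the driver has
           * cleared the error, e.g. by rewriting a bad pmem block. */
          static inline int clear_mce_nospec(unsigned long pfn)
          {
                  return set_memory_wb((unsigned long)pfn_to_kaddr(pfn), 1);
          }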
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: <linux-edac@vger.kernel.org>
      Cc: <x86@kernel.org>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
    • x86/process: Re-export start_thread() · dc76803e
      Rian Hunter authored
      The consolidation of the start_thread() functions removed the export
      unintentionally. This breaks binfmt handlers built as a module.
      
      Add it back.
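      
      The fix is a one-line export after the consolidated definition; a
      sketch of the x86-64 variant:
      
          void start_thread(struct pt_regs *regs, unsigned long new_ip,
                            unsigned long new_sp)
          {
                  start_thread_common(regs, new_ip, new_sp,
                                      __USER_CS, __USER_DS, 0);
          }
          EXPORT_SYMBOL_GPL(start_thread);  /* restore the lost export */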
      
      Fixes: e634d8fc ("x86-64: merge the standard and compat start_thread() functions")
      Signed-off-by: Rian Hunter <rian@alum.mit.edu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Dmitry Safonov <dima@arista.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180819230854.7275-1-rian@alum.mit.edu
  5. 07 Aug, 2018 2 commits
    • x86/paravirt: Fix spectre-v2 mitigations for paravirt guests · 5800dc5c
      Peter Zijlstra authored
      Nadav reported that on guests we're failing to rewrite the indirect
      calls to CALLEE_SAVE paravirt functions. In particular the
      pv_queued_spin_unlock() call is left unpatched and that is all over the
      place. This obviously wrecks Spectre-v2 mitigation (for paravirt
      guests) which relies on not actually having indirect calls around.
      
      The reason is an incorrect clobber test in paravirt_patch_call():
      this function rewrites an indirect call with a direct call to the
      _SAME_ function, so there is no way the clobbers can differ.
      
      Therefore remove this clobber check. Also add WARNs for the other
      patch failure case (not enough room for the instruction), which I've
      not seen trigger in my (limited) testing.
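      
      A hedged sketch of paravirt_patch_call() after the change (clobber
      test gone, short patch sites now warn; struct branch as used by the
      existing patching code):
      
          unsigned paravirt_patch_call(void *insnbuf, const void *target,
                                       unsigned long addr, unsigned len)
          {
                  struct branch *b = insnbuf;
                  unsigned long delta = (unsigned long)target - (addr + 5);
      
                  if (len < 5) {
                          /* be loud instead of silently leaving the
                           * indirect call in place */
                          WARN_ONCE(1, "Failing to patch indirect CALL in %ps\n",
                                    (void *)addr);
                          return len;  /* call too long for patch site */
                  }
      
                  b->opcode = 0xe8;    /* direct CALL rel32 */
                  b->delta = delta;
                  BUILD_BUG_ON(sizeof(*b) != 5);
      
                  return 5;
          }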
      
      Three live kernel image disassemblies for lock_sock_nested (as a small
      function that illustrates the problem nicely). PRE is the current
      situation for guests, POST is with this patch applied and NATIVE is with
      or without the patch for !guests.
      
      PRE:
      
      (gdb) disassemble lock_sock_nested
      Dump of assembler code for function lock_sock_nested:
         0xffffffff817be970 <+0>:     push   %rbp
         0xffffffff817be971 <+1>:     mov    %rdi,%rbp
         0xffffffff817be974 <+4>:     push   %rbx
         0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
         0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
         0xffffffff817be981 <+17>:    mov    %rbx,%rdi
         0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
         0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
         0xffffffff817be98f <+31>:    test   %eax,%eax
         0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
         0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
         0xffffffff817be99d <+45>:    mov    %rbx,%rdi
         0xffffffff817be9a0 <+48>:    callq  *0xffffffff822299e8
         0xffffffff817be9a7 <+55>:    pop    %rbx
         0xffffffff817be9a8 <+56>:    pop    %rbp
         0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
         0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
         0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063ae0 <__local_bh_enable_ip>
         0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
         0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
         0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
      End of assembler dump.
      
      POST:
      
      (gdb) disassemble lock_sock_nested
      Dump of assembler code for function lock_sock_nested:
         0xffffffff817be970 <+0>:     push   %rbp
         0xffffffff817be971 <+1>:     mov    %rdi,%rbp
         0xffffffff817be974 <+4>:     push   %rbx
         0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
         0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
         0xffffffff817be981 <+17>:    mov    %rbx,%rdi
         0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
         0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
         0xffffffff817be98f <+31>:    test   %eax,%eax
         0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
         0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
         0xffffffff817be99d <+45>:    mov    %rbx,%rdi
         0xffffffff817be9a0 <+48>:    callq  0xffffffff810a0c20 <__raw_callee_save___pv_queued_spin_unlock>
         0xffffffff817be9a5 <+53>:    xchg   %ax,%ax
         0xffffffff817be9a7 <+55>:    pop    %rbx
         0xffffffff817be9a8 <+56>:    pop    %rbp
         0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
         0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
         0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063aa0 <__local_bh_enable_ip>
         0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
         0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
         0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
      End of assembler dump.
      
      NATIVE:
      
      (gdb) disassemble lock_sock_nested
      Dump of assembler code for function lock_sock_nested:
         0xffffffff817be970 <+0>:     push   %rbp
         0xffffffff817be971 <+1>:     mov    %rdi,%rbp
         0xffffffff817be974 <+4>:     push   %rbx
         0xffffffff817be975 <+5>:     lea    0x88(%rbp),%rbx
         0xffffffff817be97c <+12>:    callq  0xffffffff819f7160 <_cond_resched>
         0xffffffff817be981 <+17>:    mov    %rbx,%rdi
         0xffffffff817be984 <+20>:    callq  0xffffffff819fbb00 <_raw_spin_lock_bh>
         0xffffffff817be989 <+25>:    mov    0x8c(%rbp),%eax
         0xffffffff817be98f <+31>:    test   %eax,%eax
         0xffffffff817be991 <+33>:    jne    0xffffffff817be9ba <lock_sock_nested+74>
         0xffffffff817be993 <+35>:    movl   $0x1,0x8c(%rbp)
         0xffffffff817be99d <+45>:    mov    %rbx,%rdi
         0xffffffff817be9a0 <+48>:    movb   $0x0,(%rdi)
         0xffffffff817be9a3 <+51>:    nopl   0x0(%rax)
         0xffffffff817be9a7 <+55>:    pop    %rbx
         0xffffffff817be9a8 <+56>:    pop    %rbp
         0xffffffff817be9a9 <+57>:    mov    $0x200,%esi
         0xffffffff817be9ae <+62>:    mov    $0xffffffff817be993,%rdi
         0xffffffff817be9b5 <+69>:    jmpq   0xffffffff81063ae0 <__local_bh_enable_ip>
         0xffffffff817be9ba <+74>:    mov    %rbp,%rdi
         0xffffffff817be9bd <+77>:    callq  0xffffffff817be8c0 <__lock_sock>
         0xffffffff817be9c2 <+82>:    jmp    0xffffffff817be993 <lock_sock_nested+35>
      End of assembler dump.
      
      
      Fixes: 63f70270 ("[PATCH] i386: PARAVIRT: add common patching machinery")
      Fixes: 3010a066 ("x86/paravirt, objtool: Annotate indirect calls")
      Reported-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: stable@vger.kernel.org
    • cpu/hotplug: Fix SMT supported evaluation · bc2d8d26
      Thomas Gleixner authored
      Josh reported that the late SMT evaluation in cpu_smt_state_init()
      sets cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case 'nosmt' was
      supplied on the kernel command line, as it cannot differentiate
      between SMT disabled by BIOS and SMT soft-disabled via 'nosmt'. That
      wrecks the state and makes the sysfs interface unusable.
      
      Rework this so that the availability of SMT is determined in
      cpu_smt_allowed() during bringup of the non-boot CPUs: if a newly
      booted CPU is not a 'primary' thread, set the local
      cpu_smt_available marker, and evaluate it explicitly right after the
      initial SMP bringup has finished, as in the sketch below.
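      
      A hedged sketch of the reworked flow (the marker follows the
      description above; the late evaluation hook name is illustrative):
      
          static bool cpu_smt_available __read_mostly;
      
          /* Called while bringing up secondary CPUs. */
          static inline bool cpu_smt_allowed(unsigned int cpu)
          {
                  if (cpu_smt_control == CPU_SMT_ENABLED)
                          return true;
      
                  if (topology_is_primary_thread(cpu))
                          return true;
      
                  /* A non-primary sibling exists, so the BIOS did not
                   * disable SMT; remember that for the evaluation right
                   * after the initial SMP bringup. */
                  cpu_smt_available = true;
                  return false;
          }
      
          /* Runs once the initial SMP bringup has finished. */
          void __init cpu_smt_check_topology(void)
          {
                  /* No sibling ever showed up: SMT is off in the BIOS. */
                  if (!cpu_smt_available)
                          cpu_smt_control = CPU_SMT_NOT_SUPPORTED;
          }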
      
      SMT evaluation on x86 is a trainwreck, as the firmware has all the
      information _before_ booting the kernel, but there is no interface
      to query it.
      
      Fixes: 73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>