  1. Dec 23, 2022
    • KVM: nVMX: Properly expose ENABLE_USR_WAIT_PAUSE control to L1 · 31de69f4
      Sean Christopherson authored
      
      Set ENABLE_USR_WAIT_PAUSE in KVM's supported VMX MSR configuration if the
      feature is supported in hardware and enabled in KVM's base, non-nested
      configuration, i.e. expose ENABLE_USR_WAIT_PAUSE to L1 if it's supported.
      This fixes a bug where saving/restoring, i.e. migrating, a vCPU will fail
      if WAITPKG (the associated CPUID feature) is enabled for the vCPU, and
      obviously allows L1 to enable the feature for L2.
      
      KVM already effectively exposes ENABLE_USR_WAIT_PAUSE to L1 by stuffing
      the allowed-1 control in a vCPU's virtual MSR_IA32_VMX_PROCBASED_CTLS2 when
      updating secondary controls in response to KVM_SET_CPUID(2), but (a) that
      depends on flawed code (KVM shouldn't touch VMX MSRs in response to CPUID
      updates) and (b) runs afoul of vmx_restore_control_msr()'s restriction
      that the guest value must be a strict subset of the supported host value.
      
      Although no past commit explicitly enabled nested support for WAITPKG,
      doing so is safe and functionally correct from an architectural
      perspective as no additional KVM support is needed to virtualize TPAUSE,
      UMONITOR, and UMWAIT for L2 relative to L1, and KVM already forwards
      VM-Exits to L1 as necessary (commit bf653b78, "KVM: vmx: Introduce
      handle_unexpected_vmexit and handle WAITPKG vmexit").
      
      Note, KVM always keeps the host's MSR_IA32_UMWAIT_CONTROL resident in
      hardware, i.e. always runs both L1 and L2 with the host's power management
      settings for TPAUSE and UMWAIT.  See commit bf09fb6c ("KVM: VMX: Stop
      context switching MSR_IA32_UMWAIT_CONTROL") for more details.
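
      The supported-value/restore-check interaction described above can be sketched
      in a simplified, non-KVM model. The helper names and the 64-bit allowed-0
      (low half) / allowed-1 (high half) layout below are illustrative only; bit 26
      is ENABLE_USR_WAIT_PAUSE per the SDM:

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <stdbool.h>

      /* Illustrative: secondary proc-based control, bit 26 per the Intel SDM. */
      #define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE (1u << 26)

      /* Hypothetical helper mirroring the fix: advertise the control in the
       * L1-visible capability MSR only if KVM's base configuration enabled it. */
      static uint64_t nested_procbased_ctls2(uint32_t kvm_enabled_ctls)
      {
          uint32_t allowed1 = 0;

          if (kvm_enabled_ctls & SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE)
              allowed1 |= SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;
          return (uint64_t)allowed1 << 32;  /* allowed-1 bits in the high half */
      }

      /* Model of vmx_restore_control_msr()'s restriction: the value userspace
       * restores must be a strict subset of what KVM advertises as supported. */
      static bool restore_ok(uint64_t host_msr, uint64_t guest_msr)
      {
          uint32_t host_allowed1  = (uint32_t)(host_msr >> 32);
          uint32_t guest_allowed1 = (uint32_t)(guest_msr >> 32);

          return (guest_allowed1 & ~host_allowed1) == 0;
      }
      ```

      In this model, a guest value with ENABLE_USR_WAIT_PAUSE set fails the
      restore check unless the host MSR advertises the bit too, which is the
      migration failure the patch fixes.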
      
      Fixes: e69e72fa ("KVM: x86: Add support for user wait instructions")
      Cc: stable@vger.kernel.org
      Reported-by: Aaron Lewis <aaronlewis@google.com>
      Reported-by: Yu Zhang <yu.c.zhang@linux.intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Message-Id: <20221213062306.667649-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Document that ignoring memory failures for VMCLEAR is deliberate · 057b1875
      Sean Christopherson authored
      
      Explicitly drop the result of kvm_vcpu_write_guest() when writing the
      "launch state" as part of VMCLEAR emulation, and add a comment to call
      out that KVM's behavior is architecturally valid.  Intel's pseudocode
      effectively says that VMCLEAR is a nop if the target VMCS address isn't
      in memory, e.g. if the address points at MMIO.
      
      Add a FIXME to call out that suppressing failures on __copy_to_user() is
      wrong, as memory (a memslot) does exist in that case.  Punt the issue to
      the future as open coding kvm_vcpu_write_guest() just to make sure the
      guest dies with -EFAULT isn't worth the extra complexity.  The flaw will
      need to be addressed if KVM ever does something intelligent on uaccess
      failures, e.g. to support post-copy demand paging, but in that case KVM
      will need a more thorough overhaul, i.e. VMCLEAR shouldn't need to open
      code a core KVM helper.
      
      No functional change intended.
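
      The behavior being documented can be sketched with hypothetical stand-ins
      (these are not KVM's actual types or helpers):

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <stdbool.h>

      /* Illustrative stand-in for a guest-memory write that fails when the
       * target address isn't backed by memory, e.g. points at MMIO. */
      static int write_guest(uint64_t gpa, uint8_t val, bool is_mmio)
      {
          (void)gpa;
          (void)val;
          return is_mmio ? -14 : 0;  /* -EFAULT-like failure for MMIO */
      }

      /* Per Intel's pseudocode, VMCLEAR is effectively a nop if the launch
       * state can't be written, so the emulation deliberately discards the
       * result instead of killing the vCPU. */
      static int emulate_vmclear(uint64_t vmcs_gpa, bool is_mmio)
      {
          (void)write_guest(vmcs_gpa, /*LAUNCH_STATE_CLEAR=*/0, is_mmio);
          return 0;  /* always succeeds from the guest's perspective */
      }
      ```

      The explicit `(void)` cast is the usual idiom for telling both readers and
      static analyzers (like Coverity here) that ignoring the return value is
      intentional.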
      
      Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
      Addresses-Coverity-ID: 1527765 ("Error handling issues")
      Fixes: 587d7e72 ("kvm: nVMX: VMCLEAR should not cause the vCPU to shut down")
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20221220154224.526568-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Sanity check inputs to kvm_handle_memory_failure() · 77b1908e
      Sean Christopherson authored
      
      Add a sanity check in kvm_handle_memory_failure() to assert that a valid
      x86_exception structure is provided if the memory "failure" wants to
      propagate a fault into the guest.  If a memory failure happens during a
      direct guest physical memory access, e.g. for nested VMX, KVM hardcodes
      the failure to X86EMUL_IO_NEEDED and doesn't provide an exception pointer
      (because the exception struct would just be filled with garbage).
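
      The invariant being asserted can be modeled in a few lines (the enum values
      and helper below are illustrative, not KVM's actual definitions):

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdbool.h>

      /* Illustrative stand-ins for the x86 emulation return codes. */
      enum { X86EMUL_IO_NEEDED = 1, X86EMUL_PROPAGATE_FAULT = 2 };

      struct x86_exception { int vector; };

      /* Mirrors the idea of the sanity check: a "failure" that wants to
       * propagate a fault into the guest must come with a valid exception
       * struct describing that fault; anything else is a caller bug. */
      static bool memory_failure_args_valid(int r, const struct x86_exception *e)
      {
          return !(r == X86EMUL_PROPAGATE_FAULT && e == NULL);
      }
      ```

      Direct GPA accesses pass X86EMUL_IO_NEEDED with a NULL exception pointer,
      which this check deliberately permits.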
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20221220153427.514032-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Simplify kvm_apic_hw_enabled · 3c649918
      Peng Hao authored
      
      kvm_apic_hw_enabled() only needs to return bool, there is no place
      to use the return value of MSR_IA32_APICBASE_ENABLE.
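
      The shape of the simplification is roughly the following (a minimal sketch;
      the struct and helper are stand-ins for KVM's actual lapic code):

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <stdbool.h>

      #define MSR_IA32_APICBASE_ENABLE (1ULL << 11)

      struct kvm_lapic { uint64_t base; };

      /* Before the change, the helper returned the raw masked u64
       * (base & MSR_IA32_APICBASE_ENABLE), whose non-zero value no caller
       * ever used. Returning bool says what callers actually want. */
      static bool apic_hw_enabled(const struct kvm_lapic *apic)
      {
          return apic->base & MSR_IA32_APICBASE_ENABLE;
      }
      ```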
      
      Signed-off-by: Peng Hao <flyingpeng@tencent.com>
      Message-Id: <CAPm50aJ=BLXNWT11+j36Dd6d7nz2JmOBk4u7o_NPQ0N61ODu1g@mail.gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: hyper-v: Fix 'using uninitialized value' Coverity warning · 8b9e13d2
      Vitaly Kuznetsov authored
      
      In kvm_hv_flush_tlb(), 'data_offset' and 'consumed_xmm_halves' variables
      are used in a mutually exclusive way: in 'hc->fast' we count in 'XMM
      halves' and increase 'data_offset' otherwise. Coverity discovered that in
      to cause any issues as the only user of 'data_offset'/'consumed_xmm_halves'
      data is kvm_hv_get_tlb_flush_entries() -> kvm_hv_get_hc_data() which also
      takes into account 'hc->fast' but is still worth fixing.
      
      To make things explicit, put 'data_offset' and 'consumed_xmm_halves' into
      'struct kvm_hv_hcall' as a union and use them at call sites. This allows
      removing the explicit 'data_offset'/'consumed_xmm_halves' parameters from
      kvm_hv_get_hc_data()/kvm_get_sparse_vp_set()/kvm_hv_get_tlb_flush_entries()
      helpers.
      
      Note: 'struct kvm_hv_hcall' is allocated on stack in kvm_hv_hypercall() and
      is not zeroed, consumers are supposed to initialize the appropriate field
      if needed.
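
      The union approach can be sketched as follows (field names follow the
      commit message; the struct layout and helper are a simplified model, not
      the actual kernel definitions):

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <stdbool.h>

      /* The two bookkeeping fields are mutually exclusive: 'fast' hypercalls
       * count consumed XMM halves, slow ones track an offset into the input
       * page in guest memory. An anonymous union makes that explicit. */
      struct hv_hcall {
          bool fast;
          union {
              uint16_t data_offset;          /* !fast: guest-memory offset  */
              uint8_t  consumed_xmm_halves;  /* fast: XMM halves consumed   */
          };
      };

      static void consume(struct hv_hcall *hc, uint16_t bytes)
      {
          if (hc->fast)
              hc->consumed_xmm_halves += bytes / 8;  /* 8 bytes per half */
          else
              hc->data_offset += bytes;
      }
      ```

      Overlapping the fields also makes the "increment both unconditionally"
      bug Coverity flagged impossible to reintroduce, since only one field
      exists at a time.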
      
      Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
      Addresses-Coverity-ID: 1527764 ("Uninitialized variables")
      Fixes: 26097086 ("KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20221208102700.959630-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: ioapic: Fix level-triggered EOI and userspace I/OAPIC reconfigure race · fceb3a36
      Adamos Ttofari authored
      
      When scanning userspace I/OAPIC entries, intercept EOI for level-triggered
      IRQs if the current vCPU has a pending and/or in-service IRQ for the
      vector in its local APIC, even if the vCPU doesn't match the new entry's
      destination.  This fixes a race between userspace I/OAPIC reconfiguration
      and IRQ delivery that results in the vector's bit being left set in the
      remote IRR due to the eventual EOI not being forwarded to the userspace
      I/OAPIC.
      
      Commit 0fc5a36d ("KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC
      reconfigure race") fixed the in-kernel IOAPIC, but not the userspace
      IOAPIC configuration, which has a similar race.
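
      The widened scan condition can be reduced to a small predicate (a sketch
      of the decision only; the parameters abstract away the real IRR/ISR and
      destination-matching logic):

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* After the fix: intercept EOI for a level-triggered vector not only
       * when the vCPU matches the entry's destination, but also when this
       * vCPU already has the vector pending (IRR) or in service (ISR), so
       * the eventual EOI still reaches the userspace I/OAPIC and clears the
       * remote IRR. */
      static bool want_eoi_intercept(bool level_triggered, bool dest_matches,
                                     bool irr_or_isr_set)
      {
          return level_triggered && (dest_matches || irr_or_isr_set);
      }
      ```

      The third operand is the case the race left uncovered: the entry was
      retargeted after the IRQ was already delivered to this vCPU.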
      
      Fixes: 0fc5a36d ("KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race")
      Signed-off-by: Adamos Ttofari <attofari@amazon.de>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20221208094415.12723-1-attofari@amazon.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/pmu: Prevent zero period event from being repeatedly released · 55c590ad
      Like Xu authored
      
      The current vPMU can reuse the same pmc->perf_event for the same
      hardware event via pmc_pause/resume_counter(), but this optimization
      does not apply to a portion of the TSX events (e.g., "event=0x3c,in_tx=1,
      in_tx_cp=1"), where event->attr.sample_period is legally zero at creation,
      thus making the perf call to perf_event_period() meaningless (no need to
      adjust sample period in this case), and instead causing such reusable
      perf_events to be repeatedly released and created.
      
      Avoid releasing zero sample_period events by checking is_sampling_event()
      to follow the previous enable/disable optimization.
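
      The reuse decision can be sketched like this (a simplified model: the
      helper names mirror perf's is_sampling_event() semantics, but the reuse
      logic is condensed for illustration):

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <stdbool.h>

      /* Mirrors perf's is_sampling_event(): a zero sample_period denotes a
       * pure counting event, so there is no period to (re)program. */
      static bool is_sampling(uint64_t sample_period)
      {
          return sample_period != 0;
      }

      /* Condensed reuse decision: only sampling events need their period
       * adjusted; treating a zero-period (counting) event as needing an
       * adjustment is what caused the needless release/recreate cycle. */
      static bool can_reuse_event(uint64_t old_period, uint64_t new_period)
      {
          if (!is_sampling(new_period))
              return true;  /* counting event: nothing to adjust, keep it */
          return old_period == new_period;
      }
      ```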
      
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20221207071506.15733-2-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>