  Dec 23, 2022
    • KVM: x86/mmu: Don't attempt to map leaf if target TDP MMU SPTE is frozen · f5d16bb9
      Sean Christopherson authored
      
      Hoist the is_removed_spte() check above the "level == goal_level" check
      when walking SPTEs during a TDP MMU page fault to avoid attempting to map
      a leaf entry if said entry is frozen by a different task/vCPU.
      
        ------------[ cut here ]------------
        WARNING: CPU: 3 PID: 939 at arch/x86/kvm/mmu/tdp_mmu.c:653 kvm_tdp_mmu_map+0x269/0x4b0
        Modules linked in: kvm_intel
        CPU: 3 PID: 939 Comm: nx_huge_pages_t Not tainted 6.1.0-rc4+ #67
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:kvm_tdp_mmu_map+0x269/0x4b0
        RSP: 0018:ffffc9000068fba8 EFLAGS: 00010246
        RAX: 00000000000005a0 RBX: ffffc9000068fcc0 RCX: 0000000000000005
        RDX: ffff88810741f000 RSI: ffff888107f04600 RDI: ffffc900006a3000
        RBP: 060000010b000bf3 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 000ffffffffff000 R12: 0000000000000005
        R13: ffff888113670000 R14: ffff888107464958 R15: 0000000000000000
        FS:  00007f01c942c740(0000) GS:ffff888277cc0000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 0000000117013006 CR4: 0000000000172ea0
        Call Trace:
         <TASK>
         kvm_tdp_page_fault+0x10c/0x130
         kvm_mmu_page_fault+0x103/0x680
         vmx_handle_exit+0x132/0x5a0 [kvm_intel]
         vcpu_enter_guest+0x60c/0x16f0
         kvm_arch_vcpu_ioctl_run+0x1e2/0x9d0
         kvm_vcpu_ioctl+0x271/0x660
         __x64_sys_ioctl+0x80/0xb0
         do_syscall_64+0x2b/0x50
         entry_SYSCALL_64_after_hwframe+0x46/0xb0
         </TASK>
        ---[ end trace 0000000000000000 ]---
      
      Fixes: 63d28a25 ("KVM: x86/mmu: simplify kvm_tdp_mmu_map flow when guest has to retry")
      Cc: Robert Hoo <robert.hu@linux.intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Robert Hoo <robert.hu@linux.intel.com>
      Message-Id: <20221213033030.83345-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
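
      The essence of the fix, as a standalone C sketch; the walk, types and
      REMOVED_SPTE value below are simplified stand-ins for the real
      tdp_mmu.c code, not the actual implementation:

        #include <stdbool.h>
        #include <stdint.h>

        #define REMOVED_SPTE 0x5a0ULL  /* hypothetical frozen-SPTE marker */

        static bool is_removed_spte(uint64_t spte)
        {
                return spte == REMOVED_SPTE;
        }

        /* Returns true if the page fault should be retried. */
        static bool handle_spte(uint64_t old_spte, int level, int goal_level)
        {
                /*
                 * Check for a frozen SPTE *before* the target-level check:
                 * a different task/vCPU may have frozen a leaf entry, and
                 * attempting to map it would trip the WARN shown above.
                 */
                if (is_removed_spte(old_spte))
                        return true;    /* give up and retry the fault */

                if (level == goal_level)
                        return false;   /* safe to install the leaf SPTE */

                /* Not at the target level yet; continue the walk. */
                return false;
        }
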
    • KVM: nVMX: Don't stuff secondary execution control if it's not supported · a0860d68
      Sean Christopherson authored
      
      When stuffing the allowed secondary execution controls for nested VMX in
      response to CPUID updates, don't set the allowed-1 bit for a feature that
      isn't supported by KVM, i.e. isn't allowed by the canonical vmcs_config.
      
      WARN if KVM attempts to manipulate a feature that isn't supported.  All
      features that are currently stuffed are always advertised to L1 for
      nested VMX if they are supported in KVM's base configuration, and no
      additional features should ever be added to the CPUID-induced stuffing
      (updating VMX MSRs in response to CPUID updates is a long-standing KVM
      flaw that is slowly being fixed).
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20221213062306.667649-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
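
      A minimal sketch of the rule this commit enforces; the variable names
      below stand in for KVM's vmcs_config and the vCPU's virtual VMX MSR
      state and are illustrative only:

        #include <stdint.h>
        #include <stdio.h>

        static uint32_t base_secondary_ctrl;     /* KVM's supported config */
        static uint64_t nested_procbased_ctls2;  /* allowed-1 bits in high half */

        static void stuff_secondary_ctrl(uint32_t control)
        {
                /* Never advertise a control that KVM itself doesn't support. */
                if (!(base_secondary_ctrl & control)) {
                        fprintf(stderr, "WARN: unsupported control %#x\n", control);
                        return;
                }
                nested_procbased_ctls2 |= (uint64_t)control << 32;
        }
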
    • KVM: nVMX: Properly expose ENABLE_USR_WAIT_PAUSE control to L1 · 31de69f4
      Sean Christopherson authored
      
      Set ENABLE_USR_WAIT_PAUSE in KVM's supported VMX MSR configuration if the
      feature is supported in hardware and enabled in KVM's base, non-nested
      configuration, i.e. expose ENABLE_USR_WAIT_PAUSE to L1 if it's supported.
      This fixes a bug where saving/restoring, i.e. migrating, a vCPU will fail
      if WAITPKG (the associated CPUID feature) is enabled for the vCPU, and
      obviously allows L1 to enable the feature for L2.
      
      KVM already effectively exposes ENABLE_USR_WAIT_PAUSE to L1 by stuffing
      the allowed-1 control in a vCPU's virtual MSR_IA32_VMX_PROCBASED_CTLS2 when
      updating secondary controls in response to KVM_SET_CPUID(2), but (a) that
      depends on flawed code (KVM shouldn't touch VMX MSRs in response to CPUID
      updates) and (b) runs afoul of vmx_restore_control_msr()'s restriction
      that the guest value must be a strict subset of the supported host value.
      
      Although no past commit explicitly enabled nested support for WAITPKG,
      doing so is safe and functionally correct from an architectural
      perspective as no additional KVM support is needed to virtualize TPAUSE,
      UMONITOR, and UMWAIT for L2 relative to L1, and KVM already forwards
      VM-Exits to L1 as necessary (commit bf653b78, "KVM: vmx: Introduce
      handle_unexpected_vmexit and handle WAITPKG vmexit").
      
      Note, KVM always keeps the host's MSR_IA32_UMWAIT_CONTROL resident in
      hardware, i.e. always runs both L1 and L2 with the host's power management
      settings for TPAUSE and UMWAIT.  See commit bf09fb6c ("KVM: VMX: Stop
      context switching MSR_IA32_UMWAIT_CONTROL") for more details.
      
      Fixes: e69e72fa ("KVM: x86: Add support for user wait instructions")
      Cc: stable@vger.kernel.org
      Reported-by: Aaron Lewis <aaronlewis@google.com>
      Reported-by: Yu Zhang <yu.c.zhang@linux.intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Message-Id: <20221213062306.667649-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
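
      A sketch of the strict-subset check in vmx_restore_control_msr() that
      rejected the restored value; the helper below is an illustrative
      reduction, not the kernel function:

        #include <errno.h>
        #include <stdint.h>

        /*
         * The allowed-1 bits restored by userspace must be a subset of the
         * supported host value. Before this fix, ENABLE_USR_WAIT_PAUSE was
         * stuffed into the vCPU's virtual MSR but absent from the supported
         * value, so migration failed here.
         */
        static int restore_allowed1_bits(uint64_t supported, uint64_t guest)
        {
                if (guest & ~supported)
                        return -EINVAL;
                return 0;
        }
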
    • KVM: nVMX: Document that ignoring memory failures for VMCLEAR is deliberate · 057b1875
      Sean Christopherson authored
      
      Explicitly drop the result of kvm_vcpu_write_guest() when writing the
      "launch state" as part of VMCLEAR emulation, and add a comment to call
      out that KVM's behavior is architecturally valid.  Intel's pseudocode
      effectively says that VMCLEAR is a nop if the target VMCS address isn't
      in memory, e.g. if the address points at MMIO.
      
      Add a FIXME to call out that suppressing failures on __copy_to_user() is
      wrong, as memory (a memslot) does exist in that case.  Punt the issue to
      the future as open coding kvm_vcpu_write_guest() just to make sure the
      guest dies with -EFAULT isn't worth the extra complexity.  The flaw will
      need to be addressed if KVM ever does something intelligent on uaccess
      failures, e.g. to support post-copy demand paging, but in that case KVM
      will need a more thorough overhaul, i.e. VMCLEAR shouldn't need to open
      code a core KVM helper.
      
      No functional change intended.
      
      Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
      Addresses-Coverity-ID: 1527765 ("Error handling issues")
      Fixes: 587d7e72 ("kvm: nVMX: VMCLEAR should not cause the vCPU to shut down")
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20221220154224.526568-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
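
      The documented pattern, as a standalone sketch; kvm_vcpu_write_guest()
      is stubbed and the launch-state layout is simplified:

        #include <stdint.h>

        /* Stub standing in for the real helper; pretend the target address
         * isn't backed by memory. */
        static int kvm_vcpu_write_guest(uint64_t gpa, const void *data,
                                        unsigned long len)
        {
                return -1;
        }

        static void vmclear_set_launch_state(uint64_t vmptr)
        {
                uint8_t launch_state = 0;

                /*
                 * Deliberately ignore the result: per Intel's pseudocode,
                 * VMCLEAR is a nop if the VMCS address isn't in memory,
                 * e.g. if it points at MMIO.  FIXME (per the commit): a
                 * __copy_to_user() failure on an existing memslot should
                 * not be suppressed.
                 */
                (void)kvm_vcpu_write_guest(vmptr, &launch_state,
                                           sizeof(launch_state));
        }
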
    • KVM: x86: Sanity check inputs to kvm_handle_memory_failure() · 77b1908e
      Sean Christopherson authored
      
      Add a sanity check in kvm_handle_memory_failure() to assert that a valid
      x86_exception structure is provided if the memory "failure" wants to
      propagate a fault into the guest.  If a memory failure happens during a
      direct guest physical memory access, e.g. for nested VMX, KVM hardcodes
      the failure to X86EMUL_IO_NEEDED and doesn't provide an exception pointer
      (because the exception struct would just be filled with garbage).
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20221220153427.514032-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
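
      The shape of the added check, sketched with illustrative return codes
      and a reduced x86_exception:

        #include <assert.h>
        #include <stddef.h>

        #define X86EMUL_PROPAGATE_FAULT 1  /* illustrative value */
        #define X86EMUL_IO_NEEDED       2  /* illustrative value */

        struct x86_exception { int vector; };

        static int handle_memory_failure(int ret, struct x86_exception *e)
        {
                /*
                 * Propagating a fault into the guest requires a valid
                 * exception struct; direct GPA accesses (e.g. nested VMX)
                 * pass NULL and hardcode X86EMUL_IO_NEEDED instead.
                 */
                if (ret == X86EMUL_PROPAGATE_FAULT)
                        assert(e != NULL);  /* stand-in for a kernel WARN */

                return ret;
        }
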
    • KVM: x86: Simplify kvm_apic_hw_enabled · 3c649918
      Peng Hao authored
      
      kvm_apic_hw_enabled() only needs to return bool; no caller uses the
      raw MSR_IA32_APICBASE_ENABLE mask value it previously returned.
      
      Signed-off-by: Peng Hao <flyingpeng@tencent.com>
      Message-Id: <CAPm50aJ=BLXNWT11+j36Dd6d7nz2JmOBk4u7o_NPQ0N61ODu1g@mail.gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
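
      The change, sketched standalone (MSR_IA32_APICBASE_ENABLE is bit 11 of
      IA32_APIC_BASE):

        #include <stdbool.h>
        #include <stdint.h>

        #define MSR_IA32_APICBASE_ENABLE (1ULL << 11)

        /*
         * Previously the function returned 'base & MSR_IA32_APICBASE_ENABLE'
         * as a wide integer; every caller only tests the result for
         * zero/non-zero, so bool is the honest return type.
         */
        static bool apic_hw_enabled(uint64_t apic_base)
        {
                return apic_base & MSR_IA32_APICBASE_ENABLE;
        }
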
    • KVM: x86: hyper-v: Fix 'using uninitialized value' Coverity warning · 8b9e13d2
      Vitaly Kuznetsov authored
      
      In kvm_hv_flush_tlb(), the 'data_offset' and 'consumed_xmm_halves'
      variables are used in a mutually exclusive way: for 'hc->fast' the
      count is in 'XMM halves', and 'data_offset' is increased otherwise.
      Coverity discovered that in one case both variables are incremented
      unconditionally. This doesn't seem to cause any issues, as the only
      consumer of the 'data_offset'/'consumed_xmm_halves' data is
      kvm_hv_get_tlb_flush_entries() -> kvm_hv_get_hc_data(), which also
      takes 'hc->fast' into account, but it is still worth fixing.

      To make things explicit, put 'data_offset' and 'consumed_xmm_halves'
      into 'struct kvm_hv_hcall' as a union and use them at the call sites.
      This allows removing the explicit 'data_offset'/'consumed_xmm_halves'
      parameters from the
      kvm_hv_get_hc_data()/kvm_get_sparse_vp_set()/kvm_hv_get_tlb_flush_entries()
      helpers.
      
      Note: 'struct kvm_hv_hcall' is allocated on stack in kvm_hv_hypercall() and
      is not zeroed, consumers are supposed to initialize the appropriate field
      if needed.
      
      Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
      Addresses-Coverity-ID: 1527764 ("Uninitialized variables")
      Fixes: 26097086 ("KVM: x86: hyper-v: Handle HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST{,EX} calls gently")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20221208102700.959630-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
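
      A reduced sketch of the layout change; 'struct kvm_hv_hcall' is cut
      down to the two fields involved and consume() is illustrative:

        #include <stdbool.h>

        struct kvm_hv_hcall {
                bool fast;
                /* Mutually exclusive cursors share storage; the struct is
                 * stack-allocated and not zeroed, so consumers initialize
                 * whichever field they use. */
                union {
                        int data_offset;          /* !fast: guest memory offset */
                        int consumed_xmm_halves;  /* fast: XMM halves consumed */
                };
        };

        static void consume(struct kvm_hv_hcall *hc, int entries)
        {
                /* Exactly one of the two cursors advances per call. */
                if (hc->fast)
                        hc->consumed_xmm_halves += entries;
                else
                        hc->data_offset += entries * 8;
        }
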
    • KVM: x86: ioapic: Fix level-triggered EOI and userspace I/OAPIC reconfigure race · fceb3a36
      Adamos Ttofari authored
      
      When scanning userspace I/OAPIC entries, intercept EOI for level-triggered
      IRQs if the current vCPU has a pending and/or in-service IRQ for the
      vector in its local APIC, even if the vCPU doesn't match the new entry's
      destination.  This fixes a race between userspace I/OAPIC reconfiguration
      and IRQ delivery that results in the vector's bit being left set in the
      remote IRR due to the eventual EOI not being forwarded to the userspace
      I/OAPIC.
      
      Commit 0fc5a36d ("KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC
      reconfigure race") fixed the in-kernel IOAPIC, but not the userspace
      IOAPIC configuration, which has a similar race.
      
      Fixes: 0fc5a36d ("KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race")
      
      Signed-off-by: Adamos Ttofari <attofari@amazon.de>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20221208094415.12723-1-attofari@amazon.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
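
      The widened intercept condition, sketched with illustrative boolean
      inputs in place of the real APIC state lookups:

        #include <stdbool.h>

        /*
         * Intercept the EOI for a level-triggered entry if this vCPU is the
         * destination *or* already has the vector pending/in-service in its
         * local APIC; the latter covers the race where the entry's
         * destination changed after the IRQ was delivered.
         */
        static bool want_eoi_intercept(bool level_triggered, bool is_destination,
                                       bool vector_pending_or_in_service)
        {
                if (!level_triggered)
                        return false;
                return is_destination || vector_pending_or_in_service;
        }
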
    • KVM: x86/pmu: Prevent zero period event from being repeatedly released · 55c590ad
      Like Xu authored
      
      The current vPMU can reuse the same pmc->perf_event for the same
      hardware event via pmc_pause/resume_counter(), but this optimization
      does not apply to a portion of the TSX events (e.g., "event=0x3c,in_tx=1,
      in_tx_cp=1"), where event->attr.sample_period is legally zero at
      creation.  For such events the perf call to perf_event_period() is
      meaningless, as there is no sample period to adjust, and instead causes
      the reusable perf_events to be repeatedly released and recreated.

      Avoid releasing zero sample_period events by checking is_sampling_event(),
      preserving the existing pause/resume optimization.
      
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20221207071506.15733-2-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
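
      The added guard, sketched standalone; resume_counter() is a simplified
      stand-in for KVM's pause/resume path and perf's API:

        #include <stdbool.h>
        #include <stdint.h>

        struct perf_event { uint64_t sample_period; };

        /* Mirrors perf's rule: counting events have a zero sample period. */
        static bool is_sampling_event(const struct perf_event *event)
        {
                return event->sample_period != 0;
        }

        static bool resume_counter(struct perf_event *event, uint64_t period)
        {
                /*
                 * Only sampling events have a period to reprogram; calling
                 * perf_event_period() with a zero period fails and used to
                 * force the reusable event to be released and recreated.
                 */
                if (is_sampling_event(event))
                        event->sample_period = period;  /* stand-in for perf_event_period() */
                return true;  /* the event can be reused */
        }
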