  Nov 18, 2021
    • KVM: Disallow user memslot with size that exceeds "unsigned long" · 6b285a55
      Sean Christopherson authored
      
      Reject userspace memslots whose size exceeds the storage capacity of an
      "unsigned long".  KVM's uAPI takes the size as u64 to support large slots
      on 64-bit hosts, but does not account for the size being truncated on
      32-bit hosts in various flows.  The access_ok() check on the userspace
      virtual address in particular casts the size to "unsigned long" and will
      check the wrong number of bytes.
      
      KVM doesn't actually support slots whose size doesn't fit in an "unsigned
      long", e.g. KVM's internal kvm_memory_slot.npages is an "unsigned long",
      not a "u64", and misc arch specific code follows that behavior.
      
      Fixes: fa3d315a ("KVM: Validate userspace_addr of memslot when registered")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <20211104002531.1176691-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
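      
      The shape of the fix, as a minimal sketch against __kvm_set_memory_region()
      (the placement among the existing sanity checks is an assumption, not the
      verbatim upstream diff):
      
        static int __kvm_set_memory_region(struct kvm *kvm,
                                           const struct kvm_userspace_memory_region *mem)
        {
                /* ... existing id/flags/alignment validation ... */
      
                /*
                 * Reject sizes that truncate when cast to "unsigned long",
                 * e.g. before access_ok() checks the HVA on 32-bit hosts.
                 */
                if (mem->memory_size != (unsigned long)mem->memory_size)
                        return -EINVAL;
      
                /* ... */
        }
      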
    • KVM: Ensure local memslot copies operate on up-to-date arch-specific data · bda44d84
      Sean Christopherson authored
      
      When modifying memslots, snapshot the "old" memslot and copy it to the
      "new" memslot's arch data after (re)acquiring slots_arch_lock.  x86 can
      change a memslot's arch data while memslot updates are in progress so
      long as it holds slots_arch_lock, thus snapshotting a memslot without
      holding the lock can result in the consumption of stale data.
      
      Fixes: b10a038e ("KVM: mmu: Add slots_arch_lock for memslot arch fields")
      Cc: stable@vger.kernel.org
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211104002531.1176691-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
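      
      A hedged sketch of the intended ordering, condensed from the description
      above (the surrounding update flow and variable names are assumptions):
      
        mutex_lock(&kvm->slots_arch_lock);
      
        /*
         * Snapshot the "old" slot only after (re)acquiring slots_arch_lock;
         * x86 may change a slot's arch data at any time while holding the
         * lock, so an earlier snapshot could carry stale arch state.
         */
        old = *id_to_memslot(__kvm_memslots(kvm, as_id), new->id);
        memcpy(&new->arch, &old.arch, sizeof(old.arch));
      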
    • KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache · 357a18ad
      David Woodhouse authored
      
      In commit 7e2175eb ("KVM: x86: Fix recording of guest steal time /
      preempted status") I removed the only user of these functions because
      it was basically impossible to use them safely.
      
      There are two stages to the GFN->PFN mapping; first through the KVM
      memslots to a userspace HVA and then through the page tables to
      translate that HVA to an underlying PFN. Invalidations of the former
      were being handled correctly, but no attempt was made to use the MMU
      notifiers to invalidate the cache when the HVA->PFN mapping changed.
      
      As a prelude to reinventing the gfn_to_pfn_cache with more usable
      semantics, rip it out entirely and untangle the implementation of
      the unsafe kvm_vcpu_map()/kvm_vcpu_unmap() functions from it.
      
      All current users of kvm_vcpu_map() also look broken right now, and
      will be dealt with separately. They broadly fall into two classes:
      
      * Those which map, access the data and immediately unmap. This is
        mostly gratuitous and could just as well use the existing user
        HVA, and could probably benefit from a gfn_to_hva_cache as they
        do so (see the sketch below).
      
      * Those which keep the mapping around for a longer time, perhaps
        even using the PFN directly from the guest. These will need to
        be converted to the new gfn_to_pfn_cache and then kvm_vcpu_map()
        can be removed too.
      
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20211115165030.7422-8-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
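      
      For the first class of users, a hedged sketch of the gfn_to_hva_cache
      pattern the commit points at (gpa and val are illustrative placeholders):
      
        struct gfn_to_hva_cache ghc;
        u64 val;
      
        /* Initialize the cache once for the guest physical address. */
        if (kvm_gfn_to_hva_cache_init(kvm, &ghc, gpa, sizeof(val)))
                return -EFAULT;
      
        /* Accesses then go through the cached HVA; no map/unmap dance. */
        if (kvm_read_guest_cached(kvm, &ghc, &val, sizeof(val)))
                return -EFAULT;
      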
  Sep 22, 2021
    • KVM: Use cpumask_available() to check for NULL cpumask when kicking vCPUs · 0bbc2ca8
      Sean Christopherson authored
      
      Check for a NULL cpumask_var_t when kicking multiple vCPUs via
      cpumask_available(), which performs a !NULL check if and only if cpumasks
      are configured to be allocated off-stack.  This is a meaningless
      optimization, e.g. avoids a TEST+Jcc and TEST+CMOV on x86, but more
      importantly helps document that the NULL check is necessary even though
      all callers pass in a local variable.
      
      No functional change intended.
      
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210827092516.1027264-3-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
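      
      A hedged sketch of the check, condensed from the kvm_make_vcpus_request_mask()
      style of loop (surrounding context elided):
      
        /*
         * cpumask_available() performs a !NULL check only when
         * CONFIG_CPUMASK_OFFSTACK=y; with on-stack cpumasks it is
         * constant true.  Either way it documents that NULL is legal.
         */
        if (cpu != -1 && cpu != me && cpumask_available(tmp))
                __cpumask_set_cpu(cpu, tmp);
      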
    • KVM: Clean up benign vcpu->cpu data races when kicking vCPUs · 85b64045
      Sean Christopherson authored
      
      Fix a benign data race reported by syzbot+KCSAN[*] by ensuring vcpu->cpu
      is read exactly once, and by ensuring the vCPU is booted from guest mode
      if kvm_arch_vcpu_should_kick() returns true.  Fix a similar race in
      kvm_make_vcpus_request_mask() by ensuring the vCPU is interrupted if
      kvm_request_needs_ipi() returns true.
      
      Reading vcpu->cpu before vcpu->mode (via kvm_arch_vcpu_should_kick() or
      kvm_request_needs_ipi()) means the target vCPU could get migrated (change
      vcpu->cpu) and enter !OUTSIDE_GUEST_MODE between reading vcpu->cpu and
      reading vcpu->mode.  If that happens, the kick/IPI will be sent to the
      old pCPU, not the new pCPU that is now running the vCPU or reading SPTEs.
      
      Although failing to kick the vCPU is not exactly ideal, practically
      speaking it cannot cause a functional issue unless there is also a bug in
      the caller, and any such bug would exist regardless of kvm_vcpu_kick()'s
      behavior.
      
      The purpose of sending an IPI is purely to get a vCPU into the host (or
      out of reading SPTEs) so that the vCPU can recognize a change in state,
      e.g. a KVM_REQ_* request.  If the vCPU's handling of the state change is
      required for correctness, KVM must ensure either the vCPU sees the change
      before entering the guest, or that the sender sees the vCPU as running in
      guest mode.  All architectures handle this by (a) sending the request
      before calling kvm_vcpu_kick() and (b) checking for requests _after_
      setting vcpu->mode.
      
      x86's READING_SHADOW_PAGE_TABLES has similar requirements; KVM needs to
      ensure it kicks and waits for vCPUs that started reading SPTEs _before_
      MMU changes were finalized, but any vCPU that starts reading after MMU
      changes were finalized will see the new state and can continue on
      uninterrupted.
      
      For uses of kvm_vcpu_kick() that are not paired with a KVM_REQ_*, e.g.
      x86's kvm_arch_sync_dirty_log(), the order of the kick must not be relied
      upon for functional correctness, e.g. in the dirty log case, userspace
      cannot assume it has a 100% complete log if vCPUs are still running.
      
      All that said, eliminate the benign race since the cost of doing so is an
      "extra" atomic cmpxchg() in the case where the target vCPU is loaded by
      the current pCPU or is not loaded at all.  I.e. the kick will be skipped
      due to kvm_vcpu_exiting_guest_mode() seeing a compatible vcpu->mode as
      opposed to the kick being skipped because of the cpu checks.
      
      Keep the "cpu != me" checks even though they appear useless/impossible at
      first glance.  x86 processes guest IPI writes in a fast path that runs in
      IN_GUEST_MODE, i.e. can call kvm_vcpu_kick() from IN_GUEST_MODE.  And
      calling kvm_vm_bugged()->kvm_make_vcpus_request_mask() from IN_GUEST_MODE or
      READING_SHADOW_PAGE_TABLES is perfectly reasonable.
      
      Note, a race with the cpu_online() check in kvm_vcpu_kick() likely
      persists, e.g. the vCPU could exit guest mode and get offlined between
      the cpu_online() check and the sending of smp_send_reschedule().  But,
      the online check appears to exist only to avoid a WARN in x86's
      native_smp_send_reschedule() that fires if the target CPU is not online.
      The reschedule WARN exists because CPU offlining takes the CPU out of the
      scheduling pool, i.e. the WARN is intended to detect the case where the
      kernel attempts to schedule a task on an offline CPU.  The actual sending
      of the IPI is a non-issue as at worst it will simply be dropped on the
      floor.  In other words, KVM's usurping of the reschedule IPI could
      theoretically trigger a WARN if the stars align, but there will be no
      loss of functionality.
      
      [*] https://syzkaller.appspot.com/bug?extid=cd4154e502f43f10808a
      
      Cc: Venkatesh Srinivas <venkateshs@google.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Fixes: 97222cc8 ("KVM: Emulate local APIC in kernel")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210827092516.1027264-2-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
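      
      A hedged sketch of the resulting kick logic (condensed from the description;
      not the verbatim diff):
      
        void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
        {
                int me, cpu;
      
                if (kvm_vcpu_wake_up(vcpu))
                        return;
      
                me = get_cpu();
                if (kvm_arch_vcpu_should_kick(vcpu)) {
                        /* Read vcpu->cpu exactly once, after vcpu->mode. */
                        cpu = READ_ONCE(vcpu->cpu);
                        if (cpu != me && (unsigned int)cpu < nr_cpu_ids &&
                            cpu_online(cpu))
                                smp_send_reschedule(cpu);
                }
                put_cpu();
        }
      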
    • KVM: do not shrink halt_poll_ns below grow_start · ae232ea4
      Sergey Senozhatsky authored
      
      grow_halt_poll_ns() ignores values between 0 and halt_poll_ns_grow_start
      (10000 by default). However, when we shrink halt_poll_ns we may end up
      way below halt_poll_ns_grow_start, with halt_poll_ns values that don't
      make much sense, e.g. 1, 9, or 19.
      
      VCPU1 trace (halt_poll_ns_shrink equals 2):
      
      VCPU1 grow 10000
      VCPU1 shrink 5000
      VCPU1 shrink 2500
      VCPU1 shrink 1250
      VCPU1 shrink 625
      VCPU1 shrink 312
      VCPU1 shrink 156
      VCPU1 shrink 78
      VCPU1 shrink 39
      VCPU1 shrink 19
      VCPU1 shrink 9
      VCPU1 shrink 4
      
      Mirror what grow_halt_poll_ns() does and set halt_poll_ns to 0 as soon
      as the shrunk halt_poll_ns value falls below halt_poll_ns_grow_start.
      
      Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
      Message-Id: <20210902031100.252080-1-senozhatsky@chromium.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
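      
      A hedged sketch of the corresponding shrink_halt_poll_ns() change
      (condensed; the trace call mirrors the existing grow path):
      
        static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu)
        {
                unsigned int old, val, shrink, grow_start;
      
                old = val = vcpu->halt_poll_ns;
                shrink = READ_ONCE(halt_poll_ns_shrink);
                grow_start = READ_ONCE(halt_poll_ns_grow_start);
                if (shrink == 0)
                        val = 0;
                else
                        val /= shrink;
      
                /* Mirror grow_halt_poll_ns(): snap to 0 below grow_start. */
                if (val < grow_start)
                        val = 0;
      
                vcpu->halt_poll_ns = val;
                trace_kvm_halt_poll_ns_shrink(vcpu->vcpu_id, val, old);
        }
      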
  Aug 06, 2021
    • KVM: Cache the last used slot index per vCPU · fe22ed82
      David Matlack authored
      
      The memslot for a given gfn is looked up multiple times during page
      fault handling. Avoid binary searching for it multiple times by caching
      the most recently used slot. There is an existing VM-wide last_used_slot
      but that does not work well for cases where vCPUs are accessing memory
      in different slots (see performance data below).
      
      Another benefit of caching the most recently used slot (versus looking
      up the slot once and passing around a pointer) is speeding up memslot
      lookups *across* faults and during spte prefetching.
      
      To measure the performance of this change I ran dirty_log_perf_test with
      64 vCPUs and 64 memslots and measured "Populate memory time" and
      "Iteration 2 dirty memory time".  Tests were ran with eptad=N to force
      dirty logging to use fast_page_fault so its performance could be
      measured.
      
      Config     | Metric                        | Before | After
      ---------- | ----------------------------- | ------ | ------
      tdp_mmu=Y  | Populate memory time          | 6.76s  | 5.47s
      tdp_mmu=Y  | Iteration 2 dirty memory time | 2.83s  | 0.31s
      tdp_mmu=N  | Populate memory time          | 20.4s  | 18.7s
      tdp_mmu=N  | Iteration 2 dirty memory time | 2.65s  | 0.30s
      
      The "Iteration 2 dirty memory time" results are especially compelling
      because they are equivalent to running the same test with a single
      memslot. In other words, fast_page_fault performance no longer scales
      with the number of memslots.
      
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210804222844.1419481-4-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
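      
      A hedged sketch of the per-vCPU fast path layered over the binary search
      (the helper name and the flat memslots[] layout reflect kvm_main.c of this
      era; treat the details as assumptions):
      
        static inline struct kvm_memory_slot *
        try_get_memslot(struct kvm_memslots *slots, int slot_index, gfn_t gfn)
        {
                struct kvm_memory_slot *slot;
      
                if (slot_index < 0 || slot_index >= slots->used_slots)
                        return NULL;
      
                slot = &slots->memslots[slot_index];
                if (gfn >= slot->base_gfn &&
                    gfn < slot->base_gfn + slot->npages)
                        return slot;    /* hit: skip the binary search */
      
                return NULL;
        }
      
      On a miss, the caller falls back to the binary search and refreshes the
      vCPU's cached index.
      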
    • KVM: Rename lru_slot to last_used_slot · 87689270
      David Matlack authored
      
      lru_slot is used to keep track of the index of the most-recently used
      memslot. The correct acronym would be "mru", but that is not a widely
      used one, so call it last_used_slot, which is a bit more obvious.
      
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210804222844.1419481-2-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  Aug 04, 2021
    • KVM: Do not leak memory for duplicate debugfs directories · 85cd39af
      Paolo Bonzini authored
      
      KVM creates a debugfs directory for each VM in order to store statistics
      about the virtual machine.  The directory name is built from the process
      pid and a VM fd.  While generally unique, it is possible to keep a
      file descriptor alive in a way that causes duplicate directories, which
      manifests as these messages:
      
        [  471.846235] debugfs: Directory '20245-4' with parent 'kvm' already present!
      
      Even though this should not happen in practice, it is more or less
      expected in the case of KVM for testcases that call KVM_CREATE_VM and
      close the resulting file descriptor repeatedly and in parallel.
      
      When this happens, debugfs_create_dir() returns an error but
      kvm_create_vm_debugfs() goes on to allocate stat data structs which are
      later leaked.  The slow memory leak was spotted by syzkaller, where it
      caused OOM reports.
      
      Since the issue only affects debugfs, do a lookup before calling
      debugfs_create_dir, so that the message is downgraded and rate-limited.
      While at it, ensure kvm->debugfs_dentry is NULL rather than an error
      if it is not created.  This fixes kvm_destroy_vm_debugfs, which was not
      checking IS_ERR_OR_NULL correctly.
      
      Cc: stable@vger.kernel.org
      Fixes: 536a6f88 ("KVM: Create debugfs dir and stat files for each VM")
      Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
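      
      A hedged sketch of the lookup-before-create flow in kvm_create_vm_debugfs()
      (error handling around it is elided):
      
        dent = debugfs_lookup(dir_name, kvm_debugfs_dir);
        if (dent) {
                /* Duplicate: warn quietly instead of leaking stat data. */
                pr_warn_ratelimited("KVM: debugfs: duplicate directory %s\n",
                                    dir_name);
                dput(dent);
                return 0;
        }
        dent = debugfs_create_dir(dir_name, kvm_debugfs_dir);
        /* Leave kvm->debugfs_dentry NULL, not an error pointer, on failure. */
        if (IS_ERR(dent))
                return 0;
        kvm->debugfs_dentry = dent;
      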
  Aug 03, 2021
    • KVM: Don't take mmu_lock for range invalidation unless necessary · 071064f1
      Paolo Bonzini authored
      
      Avoid taking mmu_lock for .invalidate_range_{start,end}() notifications
      that are unrelated to KVM.  This is possible now that memslot updates are
      blocked from range_start() to range_end(); that ensures that lock elision
      happens in both or none, and therefore that mmu_notifier_count updates
      (which must occur while holding mmu_lock for write) are always paired
      across start->end.
      
      Based on patches originally written by Ben Gardon.
      
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
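      
      A hedged sketch of the elision shape inside the hva-range handler (names
      follow kvm_main.c of this period; the control flow is condensed):
      
        bool locked = false;
      
        kvm_for_each_memslot(slot, slots) {
                /* ... skip slots that don't intersect [start, end) ... */
      
                if (!locked) {
                        locked = true;
                        KVM_MMU_LOCK(kvm);
                        /* start(): on_lock() bumps mmu_notifier_count. */
                        if (!IS_KVM_NULL_FN(range->on_lock))
                                range->on_lock(kvm, range->start, range->end);
                }
                ret |= range->handler(kvm, &gfn_range);
        }
      
        if (locked)
                KVM_MMU_UNLOCK(kvm);
      
      If no memslot intersects the range, mmu_lock is never taken; because
      memslots cannot change between start() and end(), the end() side elides
      it in exactly the same cases, keeping mmu_notifier_count balanced.
      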
    • KVM: Block memslot updates across range_start() and range_end() · 52ac8b35
      Paolo Bonzini authored
      
      We would like to avoid taking mmu_lock for .invalidate_range_{start,end}()
      notifications that are unrelated to KVM.  Because mmu_notifier_count
      must be modified while holding mmu_lock for write, and must always
      be paired across start->end to stay balanced, lock elision must
      happen in both or none.  Therefore, in preparation for this change,
      this patch prevents memslot updates across range_start() and range_end().
      
      Note, technically flag-only memslot updates could be allowed in parallel,
      but stalling a memslot update for a relatively short amount of time is
      not a scalability issue, and this is all more than complex enough.
      
      A long note on the locking: a previous version of the patch used an rwsem
      to block the memslot update while the MMU notifiers run, but this resulted
      in the following deadlock involving the pseudo-lock tagged as
      "mmu_notifier_invalidate_range_start".
      
         ======================================================
         WARNING: possible circular locking dependency detected
         5.12.0-rc3+ #6 Tainted: G           OE
         ------------------------------------------------------
         qemu-system-x86/3069 is trying to acquire lock:
         ffffffff9c775ca0 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at: __mmu_notifier_invalidate_range_end+0x5/0x190
      
         but task is already holding lock:
         ffffaff7410a9160 (&kvm->mmu_notifier_slots_lock){.+.+}-{3:3}, at: kvm_mmu_notifier_invalidate_range_start+0x36d/0x4f0 [kvm]
      
         which lock already depends on the new lock.
      
      This corresponds to the following MMU notifier logic:
      
          invalidate_range_start
            take pseudo lock
            down_read()           (*)
            release pseudo lock
          invalidate_range_end
            take pseudo lock      (**)
            up_read()
            release pseudo lock
      
      At point (*) we take the mmu_notifier_slots_lock inside the pseudo lock;
      at point (**) we take the pseudo lock inside the mmu_notifier_slots_lock.
      
      This could cause a deadlock (ignoring for a second that the pseudo lock
      is not a lock):
      
      - invalidate_range_start waits on down_read(), because the rwsem is
      held by install_new_memslots
      
      - install_new_memslots waits on down_write(), because the rwsem is
      held till (another) invalidate_range_end finishes
      
      - invalidate_range_end waits on the pseudo lock, held by
      invalidate_range_start.
      
      Removing the fairness of the rwsem breaks the cycle (in lockdep terms,
      it would change the *shared* rwsem readers into *shared recursive*
      readers), so open-code the wait using a readers count and a
      spinlock.  This also allows handling blockable and non-blockable
      critical sections in the same way.
      
      Losing the rwsem fairness does theoretically allow MMU notifiers to
      block install_new_memslots forever.  Note that mm/mmu_notifier.c's own
      retry scheme in mmu_interval_read_begin also uses wait/wake_up
      and is likewise not fair.
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
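      
      A hedged sketch of the open-coded wait (the counter and rcuwait names
      follow the description above but should be read as assumptions):
      
        /* invalidate_range_start(): enter the notifier-active section. */
        spin_lock(&kvm->mn_invalidate_lock);
        kvm->mn_active_invalidate_count++;
        spin_unlock(&kvm->mn_invalidate_lock);
      
        /* invalidate_range_end(): leave it, waking a blocked update. */
        spin_lock(&kvm->mn_invalidate_lock);
        wake = (--kvm->mn_active_invalidate_count == 0);
        spin_unlock(&kvm->mn_invalidate_lock);
        if (wake)
                rcuwait_wake_up(&kvm->mn_memslots_update_rcuwait);
      
        /* install_new_memslots(): stall until no notifier is active. */
        spin_lock(&kvm->mn_invalidate_lock);
        prepare_to_rcuwait(&kvm->mn_memslots_update_rcuwait);
        while (kvm->mn_active_invalidate_count) {
                set_current_state(TASK_UNINTERRUPTIBLE);
                spin_unlock(&kvm->mn_invalidate_lock);
                schedule();
                spin_lock(&kvm->mn_invalidate_lock);
        }
        finish_rcuwait(&kvm->mn_memslots_update_rcuwait);
        spin_unlock(&kvm->mn_invalidate_lock);
      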
  Jul 27, 2021
    • KVM: add missing compat KVM_CLEAR_DIRTY_LOG · 8750f9bb
      Paolo Bonzini authored
      
      The arguments to the KVM_CLEAR_DIRTY_LOG ioctl include a pointer,
      therefore it needs a compat ioctl implementation.  Otherwise,
      32-bit userspace fails to invoke it on 64-bit kernels; for x86
      it might work fine by chance if the padding is zero, but not
      on big-endian architectures.
      
      Reported-by: Thomas Sattler
      Cc: stable@vger.kernel.org
      Fixes: 2a31b9db ("kvm: introduce manual dirty log reprotect")
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
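      
      A hedged sketch of the compat path (the struct mirrors the 64-bit
      kvm_clear_dirty_log layout; treat it as illustrative, not the exact diff):
      
        struct compat_kvm_clear_dirty_log {
                __u32 slot;
                __u32 num_pages;
                __u64 first_page;
                union {
                        compat_uptr_t dirty_bitmap; /* one bit per page */
                        __u64 padding2;
                };
        };
      
        /* In kvm_vm_compat_ioctl(): */
        case KVM_CLEAR_DIRTY_LOG: {
                struct compat_kvm_clear_dirty_log compat_log;
                struct kvm_clear_dirty_log log;
      
                if (copy_from_user(&compat_log, (void __user *)arg,
                                   sizeof(compat_log)))
                        return -EFAULT;
                log.slot         = compat_log.slot;
                log.num_pages    = compat_log.num_pages;
                log.first_page   = compat_log.first_page;
                /* Widen the 32-bit user pointer to a native one. */
                log.dirty_bitmap = compat_ptr(compat_log.dirty_bitmap);
                return kvm_vm_ioctl_clear_dirty_log(kvm, &log);
        }
      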
    • KVM: use cpu_relax when halt polling · 74775654
      Li RongQing authored
      
      SMT siblings share caches and other hardware, and busy halt polling
      will degrade the performance of a sibling that is doing real work.
      
      Sean Christopherson suggested as below:
      
      "Rather than disallowing halt-polling entirely, on x86 it should be
      sufficient to simply have the hardware thread yield to its sibling(s)
      via PAUSE.  It probably won't get back all performance, but I would
      expect it to be close.
      This compiles on all KVM architectures, and AFAICT the intended usage
      of cpu_relax() is identical for all architectures."
      
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Li RongQing <lirongqing@baidu.com>
      Message-Id: <20210727111247.55510-1-lirongqing@baidu.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
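      
      A hedged sketch of where the hint lands in the halt-polling loop
      (condensed from kvm_vcpu_block() of this era):
      
        ktime_t cur, stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
      
        do {
                if (kvm_vcpu_check_block(vcpu) < 0)
                        break;          /* wakeup condition met */
                cpu_relax();            /* PAUSE on x86: yield to SMT sibling */
                cur = ktime_get();
        } while (single_task_running() && ktime_before(cur, stop));
      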
  Jul 14, 2021
    • KVM: mmio: Fix use-after-free Read in kvm_vm_ioctl_unregister_coalesced_mmio · 23fa2e46
      Kefeng Wang authored
      
      BUG: KASAN: use-after-free in kvm_vm_ioctl_unregister_coalesced_mmio+0x7c/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:183
      Read of size 8 at addr ffff0000c03a2500 by task syz-executor083/4269
      
      CPU: 5 PID: 4269 Comm: syz-executor083 Not tainted 5.10.0 #7
      Hardware name: linux,dummy-virt (DT)
      Call trace:
       dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132
       show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x110/0x164 lib/dump_stack.c:118
       print_address_description+0x78/0x5c8 mm/kasan/report.c:385
       __kasan_report mm/kasan/report.c:545 [inline]
       kasan_report+0x148/0x1e4 mm/kasan/report.c:562
       check_memory_region_inline mm/kasan/generic.c:183 [inline]
       __asan_load8+0xb4/0xbc mm/kasan/generic.c:252
       kvm_vm_ioctl_unregister_coalesced_mmio+0x7c/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:183
       kvm_vm_ioctl+0xe30/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3755
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
       do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220
       el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
       el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
       el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
      
      Allocated by task 4269:
       stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
       kasan_save_stack mm/kasan/common.c:48 [inline]
       kasan_set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461
       kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475
       kmem_cache_alloc_trace include/linux/slab.h:450 [inline]
       kmalloc include/linux/slab.h:552 [inline]
       kzalloc include/linux/slab.h:664 [inline]
       kvm_vm_ioctl_register_coalesced_mmio+0x78/0x1cc arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:146
       kvm_vm_ioctl+0x7e8/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3746
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
       do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220
       el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
       el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
       el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
      
      Freed by task 4269:
       stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
       kasan_save_stack mm/kasan/common.c:48 [inline]
       kasan_set_track+0x38/0x6c mm/kasan/common.c:56
       kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355
       __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422
       kasan_slab_free+0x10/0x1c mm/kasan/common.c:431
       slab_free_hook mm/slub.c:1544 [inline]
       slab_free_freelist_hook mm/slub.c:1577 [inline]
       slab_free mm/slub.c:3142 [inline]
       kfree+0x104/0x38c mm/slub.c:4124
       coalesced_mmio_destructor+0x94/0xa4 arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:102
       kvm_iodevice_destructor include/kvm/iodev.h:61 [inline]
       kvm_io_bus_unregister_dev+0x248/0x280 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:4374
       kvm_vm_ioctl_unregister_coalesced_mmio+0x158/0x1ec arch/arm64/kvm/../../../virt/kvm/coalesced_mmio.c:186
       kvm_vm_ioctl+0xe30/0x14c4 arch/arm64/kvm/../../../virt/kvm/kvm_main.c:3755
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __arm64_sys_ioctl+0xf88/0x131c fs/ioctl.c:739
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
       do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:220
       el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
       el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
       el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
      
      If kvm_io_bus_unregister_dev() returns -ENOMEM, it has already called
      kvm_iodevice_destructor() internally to remove 'struct
      kvm_coalesced_mmio_dev *dev' from the list and free it, but the caller
      then calls kvm_iodevice_destructor() again, leading to the
      use-after-free above.
      
      Check the return value of kvm_io_bus_unregister_dev() and only call
      kvm_iodevice_destructor() if the return value is 0.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: kvm@vger.kernel.org
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Message-Id: <20210626070304.143456-1-wangkefeng.wang@huawei.com>
      Cc: stable@vger.kernel.org
      Fixes: 5d3c4c79 ("KVM: Stop looking for coalesced MMIO zones if the bus is destroyed", 2021-04-20)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
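      
      A hedged sketch of the fixed unregister loop, following the commit
      message's rule of only destroying the device when unregister succeeds
      (locking and the zone comparison are condensed):
      
        list_for_each_entry_safe(dev, tmp, &kvm->coalesced_zones, list) {
                if (coalesced_mmio_in_range(dev, zone->addr, zone->size)) {
                        r = kvm_io_bus_unregister_dev(kvm,
                                        zone->pio ? KVM_PIO_BUS : KVM_MMIO_BUS,
                                        &dev->dev);
                        /*
                         * On -ENOMEM the bus teardown already freed the
                         * device; destroying it again is a use-after-free.
                         */
                        if (!r)
                                kvm_iodevice_destructor(&dev->dev);
                }
        }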