- Jan 08, 2024
-
-
Paolo Bonzini authored
KVM_GENERIC_MEMORY_ATTRIBUTES requires the generic MMU notifier code, because it uses kvm_mmu_invalidate_begin/end. However, it would not work with a bespoke implementation of MMU notifiers that does not use KVM_GENERIC_MMU_NOTIFIER, because most likely it would not synchronize correctly on invalidation. So the right thing to do is to note the problematic configuration if the architecture does not itself select KVM_GENERIC_MMU_NOTIFIER, not to enable it blindly. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
CONFIG_HAVE_KVM is currently used by some architectures to either enable the KVM config proper, or to enable host-side code that is not part of the KVM module. However, CONFIG_KVM's "select" statement in virt/kvm/Kconfig corresponds to a third meaning, namely to enable common Kconfigs required by all architectures that support KVM. These three meanings can be replaced respectively by an architecture-specific Kconfig, by IS_ENABLED(CONFIG_KVM), or by a new Kconfig symbol that is in turn selected by the architecture-specific "config KVM". Start by introducing such a new Kconfig symbol, CONFIG_KVM_COMMON. Unlike CONFIG_HAVE_KVM, it is selected by CONFIG_KVM, not by architecture code, and it brings in all dependencies of common KVM code. In particular, INTERVAL_TREE was missing in loongarch and riscv, so that is another thing that is fixed. Fixes: 8132d887 ("KVM: remove CONFIG_HAVE_KVM_EVENTFD", 2023-12-08) Reported-by:
Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/all/44907c6b-c5bd-4e4a-a921-e4d3825539d8@infradead.org/ Reviewed-by:
Andrew Jones <ajones@ventanamicro.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Dec 12, 2023
-
-
Marc Zyngier authored
Instead of having a comment indicating the need to hold slots_lock when calling kvm_io_bus_register_dev(), make it explicit with a lockdep assertion. Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-6-maz@kernel.org Signed-off-by:
Oliver Upton <oliver.upton@linux.dev>
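As a rough sketch (not the verbatim patch), the change amounts to replacing the comment with a lockdep check at the top of the registration helper:

  int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
                              int len, struct kvm_io_device *dev)
  {
          /* Callers must hold slots_lock; lockdep now enforces what the comment used to say. */
          lockdep_assert_held(&kvm->slots_lock);

          /* ... existing registration logic unchanged ... */
  }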
-
- Dec 08, 2023
-
-
Paolo Bonzini authored
Keep all #ifdef CONFIG_HAVE_KVM_IRQCHIP parts of eventfd.c together, and compile out the irqfds field of struct kvm if the symbol is not defined. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
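A minimal sketch of the struct kvm side of this change; the member names follow the existing irqfd bookkeeping, but the exact fields may differ:

  struct kvm {
          /* ... */
  #ifdef CONFIG_HAVE_KVM_IRQCHIP
          struct {
                  spinlock_t        lock;
                  struct list_head  items;          /* registered irqfds */
                  struct list_head  resampler_list;
                  struct mutex      resampler_lock;
          } irqfds;                                 /* compiled out without an irqchip */
  #endif
          /* ... */
  };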
-
Paolo Bonzini authored
The deprecated interfaces were removed 15 years ago. KVM's device assignment was deprecated in 4.2 and removed 6.5 years ago; the only interest might be in compiling ancient versions of QEMU, but QEMU has been using its own imported copy of the kernel headers since June 2011. So again we go into archaeology territory; just remove the cruft. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
All platforms with a kernel irqchip have support for irqfd. Unify the two configuration items so that userspace can expect to use irqfd to inject interrupts into the irqchip. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
virt/kvm/eventfd.c is compiled unconditionally, meaning that the ioeventfds member of struct kvm is accessed unconditionally. CONFIG_HAVE_KVM_EVENTFD therefore must always be defined for KVM common code to compile successfully, so the symbol serves no purpose; remove it. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
With migration disabled, one function becomes unused: virt/kvm/guest_memfd.c:262:12: error: 'kvm_gmem_migrate_folio' defined but not used [-Werror=unused-function] 262 | static int kvm_gmem_migrate_folio(struct address_space *mapping, | ^~~~~~~~~~~~~~~~~~~~~~ Remove the #ifdef around the reference so that fallback_migrate_folio() is never used. The gmem implementation of the hook is trivial; since the gmem mapping is unmovable, the pages should not be migrated anyway. Fixes: a7800aa8 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory") Reported-by:
Arnd Bergmann <arnd@arndb.de> Suggested-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
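Conceptually, the fix keeps the .migrate_folio reference unconditional, roughly as below (a sketch, not the exact hunk; surrounding members elided):

  static const struct address_space_operations kvm_gmem_aops = {
          .dirty_folio    = noop_dirty_folio,
          /* Always referenced, so the function is never "defined but not used". */
          .migrate_folio  = kvm_gmem_migrate_folio,
          /* ... */
  };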
-
- Dec 01, 2023
-
-
Sean Christopherson authored
Revert KVM's misguided attempt to "fix" a use-after-module-unload bug that was actually due to failure to flush a workqueue, not a lack of module refcounting. Pinning the KVM module until kvm_vm_destroy() doesn't prevent use-after-free due to the module being unloaded, as userspace can invoke delete_module() the instant the last reference to KVM is put, i.e. can cause all KVM code to be unmapped while KVM is actively executing said code. Generally speaking, the many instances of module_put(THIS_MODULE) notwithstanding, outside of a few special paths, a module can never safely put the last reference to itself without creating deadlock, i.e. something external to the module *must* put the last reference. In other words, having VMs grab a reference to the KVM module is futile, pointless, and as evidenced by the now-reverted commit 70375c2d ("Revert "KVM: set owner of cpu and vm file operations""), actively dangerous. This reverts commit 405294f2 and commit 5f6de5cb. Fixes: 405294f2 ("KVM: Unconditionally get a ref to /dev/kvm module when creating a VM") Fixes: 5f6de5cb ("KVM: Prevent module exit until all VMs are freed") Link: https://lore.kernel.org/r/20231018204624.1905300-4-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com>
-
Sean Christopherson authored
Set .owner for all KVM-owned file types so that the KVM module is pinned until any files with callbacks back into KVM are completely freed. Using "struct kvm" as a proxy for the module, i.e. keeping KVM-the-module alive while there are active VMs, doesn't provide full protection. Userspace can invoke delete_module() the instant the last reference to KVM is put. If KVM itself puts the last reference, e.g. via kvm_destroy_vm(), then it's possible for KVM to be preempted and deleted/unloaded before KVM fully exits, e.g. when the task running kvm_destroy_vm() is scheduled back in, it will jump to a code page that is no longer mapped. Note, file types that can call into sub-module code, e.g. kvm-intel.ko or kvm-amd.ko on x86, must use the module pointer passed to kvm_init(), not THIS_MODULE (which points at kvm.ko). KVM assumes that if /dev/kvm is reachable, e.g. VMs are active, then the vendor module is loaded. To reduce the probability of forgetting to set .owner entirely, use THIS_MODULE for stats files where KVM does not call back into vendor code. This reverts commit 70375c2d, and fixes several other file types that have been buggy since their introduction. Fixes: 70375c2d ("Revert "KVM: set owner of cpu and vm file operations"") Fixes: 3bcd0662 ("KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file") Reported-by:
Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/all/20231010003746.GN800259@ZenIV Link: https://lore.kernel.org/r/20231018204624.1905300-2-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com>
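The shape of the fix in kvm_init() is roughly as follows; a sketch only, with the fops names assumed from KVM's common code, and the stats-only file types keeping THIS_MODULE:

  int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
  {
          /* ... */
          /*
           * Pin the vendor module (e.g. kvm-intel.ko / kvm-amd.ko on x86), not
           * kvm.ko, for file types whose callbacks can reach vendor code.
           */
          kvm_chardev_ops.owner = module;
          kvm_vm_fops.owner     = module;
          kvm_vcpu_fops.owner   = module;
          kvm_device_fops.owner = module;
          /* ... */
  }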
-
Philipp Stanner authored
kvm_main.c utilizes vmemdup_user() and array_size() to copy a userspace array. Currently, this does not check for an overflow. Use the new wrapper vmemdup_array_user() to copy the array more safely. Note, KVM explicitly checks the number of entries before duplicating the array, i.e. adding the overflow check should be a glorified nop. Suggested-by:
Dave Airlie <airlied@redhat.com> Signed-off-by:
Philipp Stanner <pstanner@redhat.com> Link: https://lore.kernel.org/r/20231102181526.43279-4-pstanner@redhat.com [sean: call out that KVM pre-checks the number of entries] Signed-off-by:
Sean Christopherson <seanjc@google.com>
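The conversion is essentially the following (a sketch; variable names are illustrative):

  /* Before: the n * size multiplication is open-coded via array_size(). */
  entries = vmemdup_user(urouting->entries,
                         array_size(sizeof(*entries), routing.nr));

  /* After: vmemdup_array_user() performs the overflow-checked multiplication itself. */
  entries = vmemdup_array_user(urouting->entries, routing.nr, sizeof(*entries));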
-
- Nov 30, 2023
-
-
Wei Wang authored
KVM_CAP_DEVICE_CTRL allows userspace to check if the kvm_device framework (e.g. KVM_CREATE_DEVICE) is supported by KVM. Move KVM_CAP_DEVICE_CTRL to the generic check for two reasons: 1) it already supports arch agnostic usages (i.e. KVM_DEV_TYPE_VFIO). For example, a userspace VFIO implementation may need to create KVM_DEV_TYPE_VFIO on x86, riscv, or arm etc. It is simpler to have it checked in the generic code than in each arch's code. 2) KVM_CREATE_DEVICE has been added to the generic code. Link: https://lore.kernel.org/all/20221215115207.14784-1-wei.w.wang@intel.com Signed-off-by:
Wei Wang <wei.w.wang@intel.com> Reviewed-by:
Sean Christopherson <seanjc@google.com> Acked-by: Anup Patel <anup@brainfault.org> (riscv) Reviewed-by:
Oliver Upton <oliver.upton@linux.dev> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Link: https://lore.kernel.org/r/20230315101606.10636-1-wei.w.wang@intel.com Signed-off-by:
Sean Christopherson <seanjc@google.com>
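In effect the capability moves into the arch-agnostic switch, along these lines (a sketch; the helper name is assumed from KVM's common ioctl code):

  static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
  {
          switch (arg) {
          /* ... */
          case KVM_CAP_DEVICE_CTRL:       /* previously answered by each arch */
                  return 1;
          /* ... */
          }
  }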
-
- Nov 28, 2023
-
-
Christian Brauner authored
Ever since the eventfd type was introduced back in 2007 in commit e1ad7468 ("signal/timer/event: eventfd core") the eventfd_signal() function only ever passed 1 as a value for @n. There's no point in keeping that additional argument. Link: https://lore.kernel.org/r/20231122-vfs-eventfd-signal-v2-2-bd549b14ce0c@kernel.org Acked-by:
Xu Yilun <yilun.xu@intel.com> Acked-by: Andrew Donnellan <ajd@linux.ibm.com> # ocxl Acked-by: Eric Farman <farman@linux.ibm.com> # s390 Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Christian Brauner <brauner@kernel.org>
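For callers such as KVM's eventfd code the change boils down to the following (sketch; ctx stands for any eventfd context):

  /* Before: the count to add was always 1 anyway. */
  eventfd_signal(ctx, 1);

  /* After: the argument is gone; the counter is still incremented by one. */
  eventfd_signal(ctx);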
-
- Nov 14, 2023
-
-
Sean Christopherson authored
Add a new x86 VM type, KVM_X86_SW_PROTECTED_VM, to serve as a development and testing vehicle for Confidential (CoCo) VMs, and potentially to even become a "real" product in the distant future, e.g. a la pKVM. The private memory support in KVM x86 is aimed at AMD's SEV-SNP and Intel's TDX, but those technologies are extremely complex (understatement), difficult to debug, don't support running as nested guests, and require hardware that isn't universally accessible. I.e. relying on SEV-SNP or TDX for maintaining guest private memory isn't a realistic option. At the very least, KVM_X86_SW_PROTECTED_VM will enable a variety of selftests for guest_memfd and private memory support without requiring unique hardware. Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20231027182217.3615211-24-seanjc@google.com> Reviewed-by:
Fuad Tabba <tabba@google.com> Tested-by:
Fuad Tabba <tabba@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
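From userspace, selecting the new type is just the machine-type argument to VM creation; an illustrative sketch, assuming userspace has already checked the supported-types bitmap (KVM_CAP_VM_TYPES):

  /* kvm_fd is an open file descriptor of /dev/kvm. */
  int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, KVM_X86_SW_PROTECTED_VM);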
-
Sean Christopherson authored
Let x86 track the number of address spaces on a per-VM basis so that KVM can disallow SMM memslots for confidential VMs. Confidential VMs are fundamentally incompatible with emulating SMM, which as the name suggests requires being able to read and write guest memory and register state. Disallowing SMM will simplify support for guest private memory, as KVM will not need to worry about tracking memory attributes for multiple address spaces (SMM is the only "non-default" address space across all architectures). Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
Fuad Tabba <tabba@google.com> Tested-by:
Fuad Tabba <tabba@google.com> Message-Id: <20231027182217.3615211-23-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Introduce an ioctl(), KVM_CREATE_GUEST_MEMFD, to allow creating file-based memory that is tied to a specific KVM virtual machine and whose primary purpose is to serve guest memory. A guest-first memory subsystem allows for optimizations and enhancements that are kludgy or outright infeasible to implement/support in a generic memory subsystem. With guest_memfd, guest protections and mapping sizes are fully decoupled from host userspace mappings. E.g. KVM currently doesn't support mapping memory as writable in the guest without it also being writable in host userspace, as KVM's ABI uses VMA protections to define the allowed guest protections. Userspace can fudge this by establishing two mappings, a writable mapping for the guest and a readable one for itself, but that’s suboptimal on multiple fronts. Similarly, KVM currently requires the guest mapping size to be a strict subset of the host userspace mapping size, e.g. KVM doesn’t support creating a 1GiB guest mapping unless userspace also has a 1GiB guest mapping. Decoupling the mapping sizes would allow userspace to precisely map only what is needed without impacting guest performance, e.g. to harden against unintentional accesses to guest memory. Decoupling guest and userspace mappings may also allow for a cleaner alternative to high-granularity mappings for HugeTLB, which has reached a bit of an impasse and is unlikely to ever be merged. A guest-first memory subsystem also provides a clearer line of sight to things like a dedicated memory pool (for slice-of-hardware VMs) and elimination of "struct page" (for offload setups where userspace _never_ needs to mmap() guest memory). More immediately, being able to map memory into KVM guests without mapping said memory into the host is critical for Confidential VMs (CoCo VMs), the initial use case for guest_memfd. While AMD's SEV and Intel's TDX prevent untrusted software from reading guest private data by encrypting guest memory with a key that isn't usable by the untrusted host, projects such as Protected KVM (pKVM) provide confidentiality and integrity *without* relying on memory encryption. And with SEV-SNP and TDX, accessing guest private memory can be fatal to the host, i.e. KVM must prevent host userspace from accessing guest memory irrespective of hardware behavior. Attempt #1 to support CoCo VMs was to add a VMA flag to mark memory as being mappable only by KVM (or a similarly enlightened kernel subsystem). That approach was abandoned largely due to it needing to play games with PROT_NONE to prevent userspace from accessing guest memory. Attempt #2 was to usurp PG_hwpoison to prevent the host from mapping guest private memory into userspace, but that approach failed to meet several requirements for software-based CoCo VMs, e.g. pKVM, as the kernel wouldn't easily be able to enforce a 1:1 page:guest association, let alone a 1:1 pfn:gfn mapping. And using PG_hwpoison does not work for memory that isn't backed by 'struct page', e.g. if devices gain support for exposing encrypted memory regions to guests. Attempt #3 was to extend the memfd() syscall and wrap shmem to provide dedicated file-based guest memory. That approach made it as far as v10 before feedback from Hugh Dickins and Christian Brauner (and others) led to its demise. Hugh's objection was that piggybacking shmem made no sense for KVM's use case as KVM didn't actually *want* the features provided by shmem. I.e. 
KVM was using memfd() and shmem to avoid having to manage memory directly, not because memfd() and shmem were the optimal solution, e.g. things like read/write/mmap in shmem were dead weight. Christian pointed out flaws with implementing a partial overlay (wrapping only _some_ of shmem), e.g. poking at inode_operations or super_operations would show shmem stuff, but address_space_operations and file_operations would show KVM's overlay. Paraphrashing heavily, Christian suggested KVM stop being lazy and create a proper API. Link: https://lore.kernel.org/all/20201020061859.18385-1-kirill.shutemov@linux.intel.com Link: https://lore.kernel.org/all/20210416154106.23721-1-kirill.shutemov@linux.intel.com Link: https://lore.kernel.org/all/20210824005248.200037-1-seanjc@google.com Link: https://lore.kernel.org/all/20211111141352.26311-1-chao.p.peng@linux.intel.com Link: https://lore.kernel.org/all/20221202061347.1070246-1-chao.p.peng@linux.intel.com Link: https://lore.kernel.org/all/ff5c5b97-acdf-9745-ebe5-c6609dd6322e@google.com Link: https://lore.kernel.org/all/20230418-anfallen-irdisch-6993a61be10b@brauner Link: https://lore.kernel.org/all/ZEM5Zq8oo+xnApW9@google.com Link: https://lore.kernel.org/linux-mm/20230306191944.GA15773@monkey Link: https://lore.kernel.org/linux-mm/ZII1p8ZHlHaQ3dDl@casper.infradead.org Cc: Fuad Tabba <tabba@google.com> Cc: Vishal Annapurve <vannapurve@google.com> Cc: Ackerley Tng <ackerleytng@google.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Maciej Szmigiero <mail@maciej.szmigiero.name> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Quentin Perret <qperret@google.com> Cc: Michael Roth <michael.roth@amd.com> Cc: Wang <wei.w.wang@intel.com> Cc: Liam Merwick <liam.merwick@oracle.com> Cc: Isaku Yamahata <isaku.yamahata@gmail.com> Co-developed-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Co-developed-by:
Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by:
Yu Zhang <yu.c.zhang@linux.intel.com> Co-developed-by:
Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by:
Chao Peng <chao.p.peng@linux.intel.com> Co-developed-by:
Ackerley Tng <ackerleytng@google.com> Signed-off-by:
Ackerley Tng <ackerleytng@google.com> Co-developed-by:
Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by:
Isaku Yamahata <isaku.yamahata@intel.com> Co-developed-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Co-developed-by:
Michael Roth <michael.roth@amd.com> Signed-off-by:
Michael Roth <michael.roth@amd.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20231027182217.3615211-17-seanjc@google.com> Reviewed-by:
Fuad Tabba <tabba@google.com> Tested-by:
Fuad Tabba <tabba@google.com> Reviewed-by:
Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
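A minimal userspace sketch of the new ioctl (size and flags are illustrative):

  struct kvm_create_guest_memfd gmem = {
          .size  = 512ULL << 20,  /* bytes of guest-only memory, page-aligned */
          .flags = 0,
  };
  /* Returns a new fd bound to this VM; the memory has no host userspace mapping. */
  int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);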
-
- Nov 13, 2023
-
-
Chao Peng authored
In confidential computing usages, whether a page is private or shared is necessary information for KVM to perform operations like page fault handling, page zapping etc. There are other potential use cases for per-page memory attributes, e.g. to make memory read-only (or no-exec, or exec-only, etc.) without having to modify memslots. Introduce the KVM_SET_MEMORY_ATTRIBUTES ioctl, advertised by KVM_CAP_MEMORY_ATTRIBUTES, to allow userspace to set the per-page memory attributes to a guest memory range. Use an xarray to store the per-page attributes internally, with a naive, not fully optimized implementation, i.e. prioritize correctness over performance for the initial implementation. Use bit 3 for the PRIVATE attribute so that KVM can use bits 0-2 for RWX attributes/protections in the future, e.g. to give userspace fine-grained control over read, write, and execute protections for guest memory. Provide arch hooks for handling attribute changes before and after common code sets the new attributes, e.g. x86 will use the "pre" hook to zap all relevant mappings, and the "post" hook to track whether or not hugepages can be used to map the range. To simplify the implementation wrap the entire sequence with kvm_mmu_invalidate_{begin,end}() even though the operation isn't strictly guaranteed to be an invalidation. For the initial use case, x86 *will* always invalidate memory, and preventing arch code from creating new mappings while the attributes are in flux makes it much easier to reason about the correctness of consuming attributes. It's possible that future usages may not require an invalidation, e.g. if KVM ends up supporting RWX protections and userspace grants _more_ protections, but again opt for simplicity and punt optimizations to if/when they are needed. Suggested-by:
Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/all/Y2WB48kD0J4VGynX@google.com Cc: Fuad Tabba <tabba@google.com> Cc: Xu Yilun <yilun.xu@intel.com> Cc: Mickaël Salaün <mic@digikod.net> Signed-off-by:
Chao Peng <chao.p.peng@linux.intel.com> Co-developed-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20231027182217.3615211-14-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
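Userspace usage is roughly the following (a sketch; gpa and size are illustrative and must be page-aligned):

  struct kvm_memory_attributes attrs = {
          .address    = gpa,
          .size       = size,
          .attributes = KVM_MEMORY_ATTRIBUTE_PRIVATE,  /* bit 3; bits 0-2 reserved for RWX */
  };
  ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);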
-
Sean Christopherson authored
Drop the .on_unlock() mmu_notifier hook now that it's no longer used for notifying arch code that memory has been reclaimed. Adding .on_unlock() and invoking it *after* dropping mmu_lock was a terrible idea, as doing so resulted in .on_lock() and .on_unlock() having divergent and asymmetric behavior, and set future developers up for failure, i.e. all but asked for bugs where KVM relied on using .on_unlock() to try to run a callback while holding mmu_lock. Opportunistically add a lockdep assertion in kvm_mmu_invalidate_end() to guard against future bugs of this nature. Reported-by:
Isaku Yamahata <isaku.yamahata@intel.com> Link: https://lore.kernel.org/all/20230802203119.GB2021422@ls.amr.corp.intel.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
Fuad Tabba <tabba@google.com> Tested-by:
Fuad Tabba <tabba@google.com> Message-Id: <20231027182217.3615211-12-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Handle AMD SEV's kvm_arch_guest_memory_reclaimed() hook by having __kvm_handle_hva_range() return whether or not an overlapping memslot was found, i.e. mmu_lock was acquired. Using the .on_unlock() hook works, but kvm_arch_guest_memory_reclaimed() needs to run after dropping mmu_lock, which makes .on_lock() and .on_unlock() asymmetrical. Use a small struct to return the tuple of the notifier-specific return, plus whether or not overlap was found. Because the iteration helpers are __always_inlined, practically speaking, the struct will never actually be returned from a function call (not to mention the size of the struct will be two bytes in practice). Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
Fuad Tabba <tabba@google.com> Tested-by:
Fuad Tabba <tabba@google.com> Message-Id: <20231027182217.3615211-11-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Introduce a "version 2" of KVM_SET_USER_MEMORY_REGION so that additional information can be supplied without setting userspace up to fail. The padding in the new kvm_userspace_memory_region2 structure will be used to pass a file descriptor in addition to the userspace_addr, i.e. allow userspace to point at a file descriptor and map memory into a guest that is NOT mapped into host userspace. Alternatively, KVM could simply add "struct kvm_userspace_memory_region2" without a new ioctl(), but as Paolo pointed out, adding a new ioctl() makes detection of bad flags a bit more robust, e.g. if the new fd field is guarded only by a flag and not a new ioctl(), then a userspace bug (setting a "bad" flag) would generate out-of-bounds access instead of an -EINVAL error. Cc: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Fuad Tabba <tabba@google.com> Tested-by:
Fuad Tabba <tabba@google.com> Message-Id: <20231027182217.3615211-9-seanjc@google.com> Acked-by:
Kai Huang <kai.huang@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
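The new ioctl mirrors the old one for now; a sketch of the userspace side, where the reserved padding is what later in the series carries the guest_memfd fd:

  struct kvm_userspace_memory_region2 region = {
          .slot            = 0,
          .flags           = 0,
          .guest_phys_addr = gpa,
          .memory_size     = size,
          .userspace_addr  = (__u64)hva,
          /* remaining fields are padding, reserved for future use (e.g. a guest_memfd fd) */
  };
  ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);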
-
Sean Christopherson authored
Convert KVM_ARCH_WANT_MMU_NOTIFIER into a Kconfig and select it where appropriate to effectively maintain existing behavior. Using a proper Kconfig will simplify building more functionality on top of KVM's mmu_notifier infrastructure. Add a forward declaration of kvm_gfn_range to kvm_types.h so that including arch/powerpc/include/asm/kvm_ppc.h's with CONFIG_KVM=n doesn't generate warnings due to kvm_gfn_range being undeclared. PPC defines hooks for PR vs. HV without guarding them via #ifdeffery, e.g. bool (*unmap_gfn_range)(struct kvm *kvm, struct kvm_gfn_range *range); bool (*age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range); bool (*test_age_gfn)(struct kvm *kvm, struct kvm_gfn_range *range); bool (*set_spte_gfn)(struct kvm *kvm, struct kvm_gfn_range *range); Alternatively, PPC could forward declare kvm_gfn_range, but there's no good reason not to define it in common KVM. Acked-by:
Anup Patel <anup@brainfault.org> Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
Fuad Tabba <tabba@google.com> Tested-by:
Fuad Tabba <tabba@google.com> Message-Id: <20231027182217.3615211-8-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Add an assertion that there are no in-progress MMU invalidations when a VM is being destroyed, with the exception of the scenario where KVM unregisters its MMU notifier between an .invalidate_range_start() call and the corresponding .invalidate_range_end(). KVM can't detect unpaired calls from the mmu_notifier due to the above exception waiver, but the assertion can detect KVM bugs, e.g. such as the bug that *almost* escaped initial guest_memfd development. Link: https://lore.kernel.org/all/e397d30c-c6af-e68f-d18e-b4e3739c5389@linux.intel.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
Fuad Tabba <tabba@google.com> Tested-by:
Fuad Tabba <tabba@google.com> Message-Id: <20231027182217.3615211-5-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Chao Peng authored
Currently in the mmu_notifier invalidate path, the hva range is recorded and then checked against by mmu_invalidate_retry_hva() in the page fault handling path. However, for the soon-to-be-introduced private memory, a page fault may not have an hva associated, so checking the gfn (gpa) makes more sense. For existing hva-based shared memory, gfn is expected to also work. The only downside is that when aliasing multiple gfns to a single hva, the current algorithm of checking multiple ranges could result in a much larger range being rejected. Such aliasing should be uncommon, so the impact is expected to be small. Suggested-by:
Sean Christopherson <seanjc@google.com> Cc: Xu Yilun <yilun.xu@intel.com> Signed-off-by:
Chao Peng <chao.p.peng@linux.intel.com> Reviewed-by:
Fuad Tabba <tabba@google.com> Tested-by:
Fuad Tabba <tabba@google.com> [sean: convert vmx_set_apic_access_page_addr() to gfn-based API] Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
Xu Yilun <yilun.xu@linux.intel.com> Message-Id: <20231027182217.3615211-4-seanjc@google.com> Reviewed-by:
Kai Huang <kai.huang@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Move the assertion on the in-progress invalidation count from the primary MMU's notifier path to KVM's common notification path, i.e. assert that the count doesn't go negative even when the invalidation is coming from KVM itself. Opportunistically convert the assertion to a KVM_BUG_ON(), i.e. kill only the affected VM, not the entire kernel. A corrupted count is fatal to the VM, e.g. the non-zero (negative) count will cause mmu_invalidate_retry() to block any and all attempts to install new mappings. But it's far from guaranteed that an end() without a start() is fatal or even problematic to anything other than the target VM, e.g. the underlying bug could simply be a duplicate call to end(). And it's much more likely that a missed invalidation, i.e. a potential use-after-free, would manifest as no notification whatsoever, not an end() without a start(). Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
Fuad Tabba <tabba@google.com> Tested-by:
Fuad Tabba <tabba@google.com> Message-Id: <20231027182217.3615211-3-seanjc@google.com> Reviewed-by:
Kai Huang <kai.huang@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Rework and rename "struct kvm_hva_range" into "kvm_mmu_notifier_range" so that the structure can be used to handle notifications that operate on gfn context, i.e. that aren't tied to a host virtual address. Rename the handler typedef too (arguably it should always have been gfn_handler_t). Practically speaking, this is a nop for 64-bit kernels as the only meaningful change is to store start+end as u64s instead of unsigned longs. Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Fuad Tabba <tabba@google.com> Tested-by:
Fuad Tabba <tabba@google.com> Message-Id: <20231027182217.3615211-2-seanjc@google.com> Reviewed-by:
Kai Huang <kai.huang@intel.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Aug 21, 2023
-
-
David Hildenbrand authored
KVM is *the* case we know that really wants to honor NUMA hinting faults. As we want to stop setting FOLL_HONOR_NUMA_FAULT implicitly, set FOLL_HONOR_NUMA_FAULT whenever we might obtain pages on behalf of a VCPU to map them into a secondary MMU, and add a comment why. Do that unconditionally in hva_to_pfn_slow() when calling get_user_pages_unlocked(). kvmppc_book3s_instantiate_page(), hva_to_pfn_fast() and gfn_to_page_many_atomic() are similarly used to map pages into a secondary MMU. However, FOLL_WRITE and get_user_page_fast_only() always implicitly honor NUMA hinting faults -- as documented for FOLL_HONOR_NUMA_FAULT -- so we can limit this change to a single location for now. Don't set it in check_user_page_hwpoison(), where we really only want to check if the mapped page is HW-poisoned. We won't set it for other KVM users of get_user_pages()/pin_user_pages():
  * arch/powerpc/kvm/book3s_64_mmu_hv.c: not used to map pages into a secondary MMU.
  * arch/powerpc/kvm/e500_mmu.c: only used on shared TLB pages with userspace
  * arch/s390/kvm/*: s390x only supports a single NUMA node either way
  * arch/x86/kvm/svm/sev.c: not used to map pages into a secondary MMU.
This is a preparation for making FOLL_HONOR_NUMA_FAULT no longer implicitly be set by get_user_pages() and friends. Link: https://lkml.kernel.org/r/20230803143208.383663-4-david@redhat.com Signed-off-by:
David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: liubo <liubo254@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
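The hva_to_pfn_slow() change is essentially the following sketch; the surrounding signature and locals are assumed, only the flags line is the point:

  static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
                             bool interruptible, bool *writable, kvm_pfn_t *pfn)
  {
          /*
           * These pages are mapped into a secondary (guest) MMU, so honor NUMA
           * hinting faults when obtaining them.
           */
          unsigned int flags = FOLL_HWPOISON | FOLL_HONOR_NUMA_FAULT;
          /* ... */
          npages = get_user_pages_unlocked(addr, 1, &page, flags);
          /* ... */
  }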
-
- Aug 17, 2023
-
-
Sean Christopherson authored
Wrap kvm_{gfn,hva}_range.pte in a union so that future notifier events can pass event specific information up and down the stack without needing to constantly expand and churn the APIs. Lockless aging of SPTEs will pass around a bitmap, and support for memory attributes will pass around the new attributes for the range. Add a "KVM_NO_ARG" placeholder to simplify handling events without an argument (creating a dummy union variable is mildly annoying). Opportunistically drop explicit zero-initialization of the "pte" field, as omitting the field (now a union) has the same effect. Cc: Yu Zhao <yuzhao@google.com> Link: https://lore.kernel.org/all/CAOUHufagkd2Jk3_HrVoFFptRXM=hX2CV8f+M-dka-hJU4bP8kw@mail.gmail.com Reviewed-by:
Oliver Upton <oliver.upton@linux.dev> Acked-by:
Yu Zhao <yuzhao@google.com> Link: https://lore.kernel.org/r/20230729004144.1054885-1-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com>
-
David Matlack authored
Move kvm_arch_flush_remote_tlbs_memslot() to common code and drop "arch_" from the name. kvm_arch_flush_remote_tlbs_memslot() is just a range-based TLB invalidation where the range is defined by the memslot. Now that kvm_flush_remote_tlbs_range() can be called from common code we can just use that and drop a bunch of duplicate code from the arch directories. Note this adds a lockdep assertion for slots_lock being held when calling kvm_flush_remote_tlbs_memslot(), which was previously only asserted on x86. MIPS has calls to kvm_flush_remote_tlbs_memslot(), but they all hold the slots_lock, so the lockdep assertion continues to hold true. Also drop the CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT ifdef gating kvm_flush_remote_tlbs_memslot(), since it is no longer necessary. Signed-off-by:
David Matlack <dmatlack@google.com> Signed-off-by:
Raghavendra Rao Ananta <rananta@google.com> Reviewed-by:
Gavin Shan <gshan@redhat.com> Reviewed-by:
Shaoqin Huang <shahuang@redhat.com> Acked-by:
Anup Patel <anup@brainfault.org> Acked-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230811045127.3308641-7-rananta@google.com
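The common implementation ends up being a thin wrapper, roughly:

  void kvm_flush_remote_tlbs_memslot(struct kvm *kvm,
                                     const struct kvm_memory_slot *memslot)
  {
          /* All current callers hold slots_lock, so make that assumption explicit. */
          lockdep_assert_held(&kvm->slots_lock);
          kvm_flush_remote_tlbs_range(kvm, memslot->base_gfn, memslot->npages);
  }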
-
David Matlack authored
Make kvm_flush_remote_tlbs_range() visible in common code and create a default implementation that just invalidates the whole TLB. This paves the way for several future features/cleanups: - Introduction of range-based TLBI on ARM. - Eliminating kvm_arch_flush_remote_tlbs_memslot() - Moving the KVM/x86 TDP MMU to common code. No functional change intended. Signed-off-by:
David Matlack <dmatlack@google.com> Signed-off-by:
Raghavendra Rao Ananta <rananta@google.com> Reviewed-by:
Gavin Shan <gshan@redhat.com> Reviewed-by:
Shaoqin Huang <shahuang@redhat.com> Reviewed-by:
Anup Patel <anup@brainfault.org> Acked-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230811045127.3308641-6-rananta@google.com
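The default common implementation looks roughly like this; architectures with range-based invalidation override kvm_arch_flush_remote_tlbs_range() to return success:

  void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64 nr_pages)
  {
          if (!kvm_arch_flush_remote_tlbs_range(kvm, gfn, nr_pages))
                  return;

          /* No (working) range-based flush: fall back to a full remote TLB flush. */
          kvm_flush_remote_tlbs(kvm);
  }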
-
Raghavendra Rao Ananta authored
kvm_arch_flush_remote_tlbs() or CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL are two mechanisms to solve the same problem, allowing architecture-specific code to provide a non-IPI implementation of remote TLB flushing. Dropping CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL allows KVM to standardize all architectures on kvm_arch_flush_remote_tlbs() instead of maintaining two mechanisms. Signed-off-by:
Raghavendra Rao Ananta <rananta@google.com> Reviewed-by:
Shaoqin Huang <shahuang@redhat.com> Reviewed-by:
Gavin Shan <gshan@redhat.com> Reviewed-by:
Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230811045127.3308641-5-rananta@google.com
-
David Matlack authored
Rename kvm_arch_flush_remote_tlb() and the associated macro __KVM_HAVE_ARCH_FLUSH_REMOTE_TLB to kvm_arch_flush_remote_tlbs() and __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS respectively. Making the name plural matches kvm_flush_remote_tlbs() and makes it more clear that this function can affect more than one remote TLB. No functional change intended. Signed-off-by:
David Matlack <dmatlack@google.com> Signed-off-by:
Raghavendra Rao Ananta <rananta@google.com> Reviewed-by:
Gavin Shan <gshan@redhat.com> Reviewed-by:
Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by:
Shaoqin Huang <shahuang@redhat.com> Acked-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230811045127.3308641-2-rananta@google.com
-
- Aug 03, 2023
-
-
Dmitry Torokhov authored
Stop taking the kv->lock mutex in kvm_vfio_update_coherency() and instead call it with this mutex held: the callers of the function usually already have it taken (and released) before calling kvm_vfio_update_coherency(). This avoids bouncing the lock up and down. The exception is kvm_vfio_release() where we do not take the lock, but it is being executed when the very last reference to kvm_device is being dropped, so there are no concerns about concurrency. Suggested-by:
Alex Williamson <alex.williamson@redhat.com> Reviewed-by:
Alex Williamson <alex.williamson@redhat.com> Signed-off-by:
Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by:
Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20230714224538.404793-2-dmitry.torokhov@gmail.com Signed-off-by:
Alex Williamson <alex.williamson@redhat.com>
-
Dmitry Torokhov authored
kvm_vfio_group_add() creates kvg instance, links it to kv->group_list, and calls kvm_vfio_file_set_kvm() with kvg->file as an argument after dropping kv->lock. If we race group addition and deletion calls, kvg instance may get freed by the time we get around to calling kvm_vfio_file_set_kvm(). Previous iterations of the code did not reference kvg->file outside of the critical section, but used a temporary variable. Still, they had similar problem of the file reference being owned by kvg structure and potential for kvm_vfio_group_del() dropping it before kvm_vfio_group_add() had a chance to complete. Fix this by moving call to kvm_vfio_file_set_kvm() under the protection of kv->lock. We already call it while holding the same lock when vfio group is being deleted, so it should be safe here as well. Fixes: 2fc1bec1 ("kvm: set/clear kvm to/from vfio_group when group add/delete") Reviewed-by:
Alex Williamson <alex.williamson@redhat.com> Signed-off-by:
Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by:
Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20230714224538.404793-1-dmitry.torokhov@gmail.com Signed-off-by:
Alex Williamson <alex.williamson@redhat.com>
-
- Jul 29, 2023
-
-
Sean Christopherson authored
Grab a reference to KVM prior to installing VM and vCPU stats file descriptors to ensure the underlying VM and vCPU objects are not freed until the last reference to any and all stats fds are dropped. Note, the stats paths manually invoke fd_install() and so don't need to grab a reference before creating the file. Fixes: ce55c049 ("KVM: stats: Support binary stats retrieval for a VCPU") Fixes: fcfe1bae ("KVM: stats: Support binary stats retrieval for a VM") Reported-by:
Zheng Zhang <zheng.zhang@email.ucr.edu> Closes: https://lore.kernel.org/all/CAC_GQSr3xzZaeZt85k_RCBd5kfiOve8qXo7a81Cq53LuVQ5r=Q@mail.gmail.com Cc: stable@vger.kernel.org Cc: Kees Cook <keescook@chromium.org> Signed-off-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Kees Cook <keescook@chromium.org> Message-Id: <20230711230131.648752-2-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Jul 25, 2023
-
-
Yi Liu authored
This defines KVM_DEV_VFIO_FILE* and aliases it with KVM_DEV_VFIO_GROUP*, so old userspace that uses KVM_DEV_VFIO_GROUP* continues to work as well. Reviewed-by:
Jason Gunthorpe <jgg@nvidia.com> Reviewed-by:
Kevin Tian <kevin.tian@intel.com> Tested-by:
Terrence Xu <terrence.xu@intel.com> Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Tested-by:
Matthew Rosato <mjrosato@linux.ibm.com> Tested-by:
Yanting Jiang <yanting.jiang@intel.com> Tested-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by:
Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by:
Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-6-yi.l.liu@intel.com Signed-off-by:
Alex Williamson <alex.williamson@redhat.com>
-
Yi Liu authored
This renames kvm_vfio_group related helpers to prepare for accepting vfio device fd. No functional change is intended. Reviewed-by:
Kevin Tian <kevin.tian@intel.com> Reviewed-by:
Eric Auger <eric.auger@redhat.com> Reviewed-by:
Jason Gunthorpe <jgg@nvidia.com> Tested-by:
Terrence Xu <terrence.xu@intel.com> Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Tested-by:
Matthew Rosato <mjrosato@linux.ibm.com> Tested-by:
Yanting Jiang <yanting.jiang@intel.com> Tested-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by:
Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by:
Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-5-yi.l.liu@intel.com Signed-off-by:
Alex Williamson <alex.williamson@redhat.com>
-
Yi Liu authored
This prepares for making the below kAPIs accept both group files and device files instead of only the vfio group file. bool vfio_file_enforced_coherent(struct file *file); void vfio_file_set_kvm(struct file *file, struct kvm *kvm); Reviewed-by:
Kevin Tian <kevin.tian@intel.com> Reviewed-by:
Eric Auger <eric.auger@redhat.com> Reviewed-by:
Jason Gunthorpe <jgg@nvidia.com> Tested-by:
Terrence Xu <terrence.xu@intel.com> Tested-by:
Nicolin Chen <nicolinc@nvidia.com> Tested-by:
Matthew Rosato <mjrosato@linux.ibm.com> Tested-by:
Yanting Jiang <yanting.jiang@intel.com> Tested-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Tested-by:
Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by:
Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230718135551.6592-3-yi.l.liu@intel.com Signed-off-by:
Alex Williamson <alex.williamson@redhat.com>
-
- Jun 22, 2023
-
-
Gavin Shan authored
We run into a guest hang in the edk2 firmware when KSM is kept running on the host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash device (TYPE_PFLASH_CFI01) during the operation of sector erasing or buffered write. The status is returned by reading the memory region of the pflash device and the read request should have been forwarded to QEMU and emulated by it. Unfortunately, the read request is covered by an illegal stage2 mapping when the guest hang issue occurs. The read request is completed with QEMU bypassed and the wrong status is fetched. The edk2 firmware runs into an infinite loop with the wrong status. The illegal stage2 mapping is populated due to same page sharing by KSM at (C) even though the associated memory slot has been marked as invalid at (B) when the memory slot is requested to be deleted. It's notable that the active and inactive memory slots can't be swapped when we're in the middle of kvm_mmu_notifier_change_pte() because kvm->mn_active_invalidate_count is elevated, and kvm_swap_active_memslots() will busy loop until it reaches zero again. Besides, the swapping from the active to the inactive memory slots is also avoided by holding &kvm->srcu in __kvm_handle_hva_range(), corresponding to synchronize_srcu_expedited() in kvm_swap_active_memslots().

  CPU-A                                  CPU-B
  -----                                  -----
                                         ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
                                         kvm_vm_ioctl_set_memory_region
                                         kvm_set_memory_region
                                         __kvm_set_memory_region
                                         kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
                                           kvm_invalidate_memslot
                                             kvm_copy_memslot
                                             kvm_replace_memslot
                                             kvm_swap_active_memslots        (A)
                                             kvm_arch_flush_shadow_memslot   (B)
  same page sharing by KSM
  kvm_mmu_notifier_invalidate_range_start
        :
  kvm_mmu_notifier_change_pte
    kvm_handle_hva_range
      __kvm_handle_hva_range
        kvm_set_spte_gfn                                                      (C)
        :
  kvm_mmu_notifier_invalidate_range_end

Fix the issue by skipping the invalid memory slot at (C) to avoid the illegal stage2 mapping so that the read request for the pflash's status is forwarded to QEMU and emulated by it. In this way, the correct pflash status can be returned from QEMU to break the infinite loop in the edk2 firmware. We tried a git-bisect and the first problematic commit is cd4c7183 ("KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With this, clean_dcache_guest_page() is called after the memory slots are iterated in kvm_mmu_notifier_change_pte(); before this commit, clean_dcache_guest_page() was called before the iteration over the memory slots. This change literally enlarges the race window between kvm_mmu_notifier_change_pte() and memory slot removal so that we're able to reproduce the issue in a practical test case. However, the issue exists since commit d5d8184d ("KVM: ARM: Memory virtualization setup"). Cc: stable@vger.kernel.org # v3.9+ Fixes: d5d8184d ("KVM: ARM: Memory virtualization setup") Reported-by:
Shuai Hu <hshuai@redhat.com> Reported-by:
Zhenyu Zhang <zhenyzha@redhat.com> Signed-off-by:
Gavin Shan <gshan@redhat.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Reviewed-by:
Oliver Upton <oliver.upton@linux.dev> Reviewed-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Sean Christopherson <seanjc@google.com> Reviewed-by:
Shaoqin Huang <shahuang@redhat.com> Message-Id: <20230615054259.14911-1-gshan@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Jun 19, 2023
-
-
Ryan Roberts authored
Convert all instances of direct pte_t* dereferencing to instead use the ptep_get() helper. This means that by default, the accesses change from a C dereference to a READ_ONCE(). This is technically the correct thing to do since where pgtables are modified by HW (for access/dirty) they are volatile and therefore we should always ensure READ_ONCE() semantics. But more importantly, by always using the helper, it can be overridden by the architecture to fully encapsulate the contents of the pte. Arch code is deliberately not converted, as the arch code knows best. It is intended that arch code (arm64) will override the default with its own implementation that can (e.g.) hide certain bits from the core code, or determine young/dirty status by mixing in state from another source. Conversion was done using Coccinelle:

  ----
  // $ make coccicheck \
  //          COCCI=ptepget.cocci \
  //          SPFLAGS="--include-headers" \
  //          MODE=patch

  virtual patch

  @ depends on patch @
  pte_t *v;
  @@

  - *v
  + ptep_get(v)
  ----

Then reviewed and hand-edited to avoid multiple unnecessary calls to ptep_get(), instead opting to store the result of a single call in a variable, where it is correct to do so. This aims to negate any cost of READ_ONCE() and will benefit arch-overrides that may be more complex. Included is a fix for an issue in an earlier version of this patch that was pointed out by kernel test robot. The issue arose because config MMU=n elides definition of the ptep helper functions, including ptep_get(). HUGETLB_PAGE=n configs still define a simple huge_ptep_clear_flush() for linking purposes, which dereferences the ptep. So when both configs are disabled, this caused a build error because ptep_get() is not defined. Fix by continuing to do a direct dereference when MMU=n. This is safe because for this config the arch code cannot be trying to virtualize the ptes because none of the ptep helpers are defined. Link: https://lkml.kernel.org/r/20230612151545.3317766-4-ryan.roberts@arm.com Reported-by:
kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202305120142.yXsNEo6H-lkp@intel.com/ Signed-off-by:
Ryan Roberts <ryan.roberts@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: SeongJae Park <sj@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
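The per-call-site conversion is mechanical (sketch):

  /* Before: a plain C dereference of the page-table entry pointer. */
  pte_t pte = *ptep;

  /* After: ptep_get() defaults to READ_ONCE() and can be overridden by the arch. */
  pte_t pte = ptep_get(ptep);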
-
- Jun 13, 2023
-
-
Wei Wang authored
Simplify kvm_deassign_ioeventfd_idx to use list_for_each_entry as the loop just ends at the entry that's found and deleted. Note, coalesced_mmio_ops and ioeventfd_ops are the only instances of kvm_io_device_ops that implement a destructor; all other callers of kvm_io_bus_unregister_dev() are unaffected by this change. Suggested-by:
Michal Luczaj <mhal@rbox.co> Signed-off-by:
Wei Wang <wei.w.wang@intel.com> Reviewed-by:
Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230207123713.3905-3-wei.w.wang@intel.com [sean: call out that only select users implement a destructor] Signed-off-by:
Sean Christopherson <seanjc@google.com>
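A sketch of the simplified loop; the matching checks and surrounding declarations are elided, and _safe iteration is unnecessary because the loop breaks right after unregistering the matching entry:

  list_for_each_entry(p, &kvm->ioeventfds, list) {
          /* ... skip entries that don't match bus/addr/len/datamatch ... */
          kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
          ret = 0;
          break;  /* 'p' is freed by the unregister path's destructor, so stop iterating */
  }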
-