- Nov 02, 2022
In some cases crosvm needs a way to query the cache flags to communicate them to the guest kernel for guest userspace mapping.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/504453/
Link: https://lore.kernel.org/r/20220923173307.2429872-1-robdclark@gmail.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
- Oct 23, 2022
Linus Torvalds authored
Commit bfca3dd3 ("kernel/utsname_sysctl.c: print kernel arch") added a new entry to the uts_kern_table[] array, but didn't update the UTS_PROC_xyz enumerators of older entries, breaking anything that used them. Admittedly, that is not many cases: it's really just the two uses of uts_proc_notify() in kernel/sys.c. But apparently systemd-journald actually uses this to detect hostname changes.

Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Fixes: bfca3dd3 ("kernel/utsname_sysctl.c: print kernel arch")
Link: https://lore.kernel.org/lkml/0c2b92a6-0f25-9538-178f-eee3b06da23f@secunet.com/
Link: https://linux-regtracking.leemhuis.info/regzbot/regression/0c2b92a6-0f25-9538-178f-eee3b06da23f@secunet.com/
Cc: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
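The bug pattern here is an enum whose values double as indices into a table. Inserting a table entry without updating the enum silently shifts every later index. Below is a minimal C sketch of one defensive pattern. It assumes a simplified table: the enumerators follow the UTS_PROC_xyz naming from the commit, but the table contents and the `uts_name_for` helper are hypothetical. Designated initializers tie each slot to its enumerator, so a newly added entry cannot renumber the older ones.

```c
#include <stddef.h>

/* Hypothetical enumerators modeled on the UTS_PROC_xyz pattern. */
enum uts_proc {
	UTS_PROC_ARCH,          /* the newly added entry */
	UTS_PROC_OSTYPE,
	UTS_PROC_OSRELEASE,
	UTS_PROC_VERSION,
	UTS_PROC_HOSTNAME,
	UTS_PROC_DOMAINNAME,
	UTS_PROC_COUNT
};

/* Each slot is bound to its enumerator explicitly, so reordering or
 * inserting entries cannot make an enumerator point at the wrong name. */
static const char *const uts_table[UTS_PROC_COUNT] = {
	[UTS_PROC_ARCH]       = "arch",
	[UTS_PROC_OSTYPE]     = "ostype",
	[UTS_PROC_OSRELEASE]  = "osrelease",
	[UTS_PROC_VERSION]    = "version",
	[UTS_PROC_HOSTNAME]   = "hostname",
	[UTS_PROC_DOMAINNAME] = "domainname",
};

/* Hypothetical helper: map an enumerator to its table entry. */
static const char *uts_name_for(enum uts_proc p)
{
	return (unsigned int)p < (unsigned int)UTS_PROC_COUNT ? uts_table[p] : NULL;
}
```

A plain positional initializer would compile just as well. The designated form only makes the enum/table coupling explicit and reviewable.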
- Oct 22, 2022
Pavel Begunkov authored
We need an efficient way in io_uring to check whether a socket supports zerocopy with msghdr-provided ubuf_info. Add a new flag to the flags field of struct socket.

Cc: <stable@vger.kernel.org> # 6.0
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/3dafafab822b1c66308bb58a0ac738b1e3f53f74.1666346426.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Alexander Graf authored
We will introduce the first architecture-specific compat VM ioctl in the next patch. Add all the necessary boilerplate to allow architectures to override compat VM ioctls when necessary.

Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20221017184541.2658-2-graf@amazon.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Oct 21, 2022
Ard Biesheuvel authored
Commit bbc6d2c6 ("efi: vars: Switch to new wrapper layer") refactored the efivars layer so that the 'business logic' related to which UEFI variables affect the boot flow in which way could be moved out of it, and into the efivarfs driver. This inadvertently broke setting variables on firmware implementations that lack the QueryVariableInfo() boot service. We no longer tolerate an EFI_UNSUPPORTED result from check_var_size() when calling efivar_entry_set_get_size(), which now ends up calling check_var_size() a second time. If QueryVariableInfo() is missing, we support writes of up to 64k, so let's move that logic into check_var_size() and drop the redundant call.

Cc: <stable@vger.kernel.org> # v6.0
Fixes: bbc6d2c6 ("efi: vars: Switch to new wrapper layer")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
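The fix folds the "QueryVariableInfo() is missing" case into the size check itself, so callers never have to tolerate an "unsupported" result. Below is a minimal C sketch of that control flow. Only the 64k fallback limit comes from the commit message. The status codes, the `query_info_fn` hook, `check_var_size_sketch`, and the `fake_query_1k` example firmware are all hypothetical and do not mirror the kernel's efivars code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes standing in for efi_status_t values. */
typedef enum { STATUS_OK, STATUS_UNSUPPORTED, STATUS_OUT_OF_RESOURCES } status_t;

/* Hypothetical firmware hook reporting remaining variable storage;
 * a NULL hook models firmware without QueryVariableInfo(). */
typedef status_t (*query_info_fn)(uint64_t *remaining);

#define FALLBACK_MAX_VAR_SIZE (64u * 1024u) /* 64k, per the commit message */

/* Sketch: the size check handles the unsupported case on its own. */
static status_t check_var_size_sketch(query_info_fn query, size_t size)
{
	uint64_t remaining = 0;
	status_t st = query ? query(&remaining) : STATUS_UNSUPPORTED;

	if (st == STATUS_UNSUPPORTED)	/* no QueryVariableInfo(): allow up to 64k */
		return size <= FALLBACK_MAX_VAR_SIZE ? STATUS_OK
						     : STATUS_OUT_OF_RESOURCES;
	if (st != STATUS_OK)
		return st;
	return size <= remaining ? STATUS_OK : STATUS_OUT_OF_RESOURCES;
}

/* Example firmware (hypothetical) reporting 1 KiB of free storage. */
static status_t fake_query_1k(uint64_t *remaining)
{
	*remaining = 1024;
	return STATUS_OK;
}
```

Because the fallback lives in one place, a caller that checks the size twice gets the same answer both times.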
Lu Baolu authored
Add a gfp parameter to iommu_alloc_resv_region() so that callers can specify the memory allocation behavior. This makes iommu_alloc_resv_region() usable in critical contexts as well.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20220927053109.4053662-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
- Oct 20, 2022
Peter Zijlstra authored
Different function signatures mean they need to be different functions; otherwise CFI gets upset. As triggered by the ftrace boot tests:

    [] CFI failure at ftrace_return_to_handler+0xac/0x16c (target: ftrace_stub+0x0/0x14; expected type: 0x0a5d5347)

Fixes: 3c516f89 ("x86: Add support for CONFIG_CFI_CLANG")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lkml.kernel.org/r/Y06dg4e1xF6JTdQq@hirez.programming.kicks-ass.net
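Under clang CFI, an indirect call is checked against the type of the function pointer it goes through. One no-op stub therefore cannot stand in for callbacks of different types. Below is a minimal C sketch of the resulting "one stub per signature" pattern. The callback types, stub names, and hook variables are hypothetical, not the actual ftrace symbols.

```c
/* Two hypothetical callback types with different signatures. */
typedef void (*trace_fn)(unsigned long ip, unsigned long parent_ip);
typedef unsigned long (*return_fn)(void);

/* One no-op stub per signature. Casting a single stub to both pointer
 * types and calling through the mismatched one is undefined behavior
 * in C, and it is exactly what a CFI type check rejects at runtime. */
static void stub_trace(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
}

static unsigned long stub_return(void)
{
	return 0;
}

/* Each hook defaults to a stub whose type matches the hook exactly. */
static trace_fn trace_hook = stub_trace;
static return_fn return_hook = stub_return;
```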
Steven Price authored
The __le32 and __le64 types aren't portable and are not available on FreeBSD (which uses the same uAPI). Instead of attempting to always output little endian, just use native endianness in the dumps. Tools can detect the endianness in use by looking at the 'magic' field, but equally we don't expect big-endian to be used with Mali (there are no known implementations out there).

Bug: mesa/mesa#7252
Fixes: 730c2bf4 ("drm/panfrost: Add support for devcoredump")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221017104602.142992-3-steven.price@arm.com
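The commit relies on readers detecting the dump's byte order from its 'magic' field. Below is a minimal C sketch of how a tool might do that. The magic value `DUMP_MAGIC` is a hypothetical placeholder, not the constant from the panfrost uAPI header, and `classify_magic` is an illustrative helper.

```c
#include <stdint.h>

/* Hypothetical placeholder; the real magic lives in the uAPI header. */
#define DUMP_MAGIC 0x50414E46u

enum dump_endianness { DUMP_NATIVE, DUMP_SWAPPED, DUMP_INVALID };

/* Reverse the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
	       ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* An exact match means the dump was written in this host's byte order.
 * A byte-swapped match means it came from an opposite-endian machine,
 * so every multi-byte field must be swapped when read. */
static enum dump_endianness classify_magic(uint32_t raw_magic)
{
	if (raw_magic == DUMP_MAGIC)
		return DUMP_NATIVE;
	if (raw_magic == bswap32(DUMP_MAGIC))
		return DUMP_SWAPPED;
	return DUMP_INVALID;
}
```

This only works because the chosen magic is not a byte palindrome; a value whose swapped form equals itself could not distinguish the two cases.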
Steven Price authored
The two structs internal to struct panfrost_dump_object_header were named, but sadly that is incompatible with C++, causing an error: "an anonymous union may only have public non-static data members". However, nothing refers to struct pan_reg_hdr and struct pan_bomap_hdr, and there's no need to export these definitions, so let's drop them. This fixes the C++ build error with the minimum change in userspace API.

Reported-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Fixes: 730c2bf4 ("drm/panfrost: Add support for devcoredump")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221017104602.142992-2-steven.price@arm.com
Zack Rusin authored
The fb_base field in struct drm_mode_config has been unused for a long time. Some drivers set it and some don't, leading to a very confusing state where the variable can't be relied upon, because there's no indication as to which drivers set it and which don't. The only usage of fb_base is internal to two drivers. Instead of trying to force it into all the drivers to get it into a coherent state, remove it completely.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Thomas Zimmermann <tzimemrmann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221019024401.394617-1-zack@kde.org
- Oct 19, 2022
Jakub Kicinski authored
Address a bunch of kdoc warnings:

    include/net/genetlink.h:81: warning: Function parameter or member 'module' not described in 'genl_family'
    include/net/genetlink.h:243: warning: expecting prototype for struct genl_info. Prototype was for struct genl_dumpit_info instead
    include/net/genetlink.h:419: warning: Function parameter or member 'net' not described in 'genlmsg_unicast'
    include/net/genetlink.h:438: warning: expecting prototype for gennlmsg_data(). Prototype was for genlmsg_data() instead
    include/net/genetlink.h:244: warning: Function parameter or member 'op' not described in 'genl_dumpit_info'

Link: https://lore.kernel.org/r/20221018231310.1040482-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Christian König authored
Setting this flag on a scheduler fence prevents pipelining of jobs depending on this fence. In other words, we always insert a full CPU round trip before dependent jobs are pushed to the pipeline.

Signed-off-by: Christian König <christian.koenig@amd.com>
Bug: drm/amd#2113 (comment 1579296)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221014081553.114899-1-christian.koenig@amd.com
- Oct 18, 2022
When we call connect() for a UDP socket in a reuseport group, we have to update sk->sk_reuseport_cb->has_conns to 1. Otherwise, the kernel could wrongly select an unconnected socket for packets sent to the connected socket.

However, the current way of setting has_conns is racy and can trigger that problem. reuseport_has_conns() changes has_conns under rcu_read_lock(), which turns the RCU reader into an updater. An updater must perform the update under the updater's lock, reuseport_lock, but the current code does not take it.

Because of this, the race below can leave has_conns unset, resulting in the wrong socket being selected. To avoid the race, let's split the reader and the updater with proper locking.

     cpu1                                         cpu2
    +----+                                       +----+

    __ip[46]_datagram_connect()                  reuseport_grow()
    .                                            .
    |- reuseport_has_conns(sk, true)             |- more_reuse = __reuseport_alloc(more_socks_size)
    |  .                                         |
    |  |- rcu_read_lock()
    |  |- reuse = rcu_dereference(sk->sk_reuseport_cb)
    |  |
    |  |                                         |  /* reuse->has_conns == 0 here */
    |  |                                         |- more_reuse->has_conns = reuse->has_conns
    |  |- reuse->has_conns = 1                   |  /* more_reuse->has_conns SHOULD BE 1 HERE */
    |  |                                         |
    |  |                                         |- rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb,
    |  |                                         |                     more_reuse)
    |  `- rcu_read_unlock()                      `- kfree_rcu(reuse, rcu)
    |
    |- sk->sk_state = TCP_ESTABLISHED

Note that the likely(reuse) in reuseport_has_conns_set() is always true, but we put the test there for ease of review. [0]

For the record, sk_reuseport_cb is usually changed under lock_sock(). The only exception is the reuseport_grow() and TCP reqsk migration case:

  1) A TCP listener is shutdown() and moved into the latter part of reuse->socks[] to migrate reqsk.
  2) A new listen() overflows reuse->socks[] and calls reuseport_grow().
  3) reuse->max_socks overflows u16 with the new listener.
  4) reuseport_grow() pops the old shutdown()ed listener from the array and updates its sk->sk_reuseport_cb to NULL without lock_sock().

A shutdown()ed TCP socket's sk->sk_reuseport_cb can therefore be changed without lock_sock(). However, reuseport_has_conns_set() is called only for UDP under lock_sock(), so likely(reuse) can never be false in reuseport_has_conns_set().

[0]: https://lore.kernel.org/netdev/CANn89iLja=eQHbsM_Ta2sQF0tOGU8vAGrh_izRuuHjuO1ouUag@mail.gmail.com/

Fixes: acdcecc6 ("udp: correct reuseport selection with connected sockets")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20221014182625.89913-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
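The core of the fix is that the write to has_conns and the grow path's copy of has_conns must be serialized by the same lock. Below is a minimal userspace model of that idea using a pthread mutex in place of reuseport_lock. The struct and function names are hypothetical. For simplicity the reader here also takes the lock, whereas the kernel's read side uses RCU.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-simplified stand-in for struct sock_reuseport. */
struct reuse_group {
	bool has_conns;
};

/* Plays the role of reuseport_lock. */
static pthread_mutex_t group_lock = PTHREAD_MUTEX_INITIALIZER;

/* Reader (locked here for simplicity; the kernel reads under RCU). */
static bool group_has_conns(struct reuse_group *const *slot)
{
	bool ret;

	pthread_mutex_lock(&group_lock);
	ret = *slot && (*slot)->has_conns;
	pthread_mutex_unlock(&group_lock);
	return ret;
}

/* Updater: setting the flag under the lock means it cannot interleave
 * with a grow that copies has_conns into a replacement group. */
static void group_has_conns_set(struct reuse_group **slot)
{
	pthread_mutex_lock(&group_lock);
	if (*slot)
		(*slot)->has_conns = true;
	pthread_mutex_unlock(&group_lock);
}

/* Grow: copy state and publish the new group under the same lock, so a
 * concurrent set lands either before the copy or on the new group. */
static void group_grow(struct reuse_group **slot, struct reuse_group *more)
{
	pthread_mutex_lock(&group_lock);
	more->has_conns = (*slot)->has_conns;
	*slot = more;
	pthread_mutex_unlock(&group_lock);
}
```

Both orderings (set then grow, grow then set) leave the flag on the group that is currently published, which is the property the unlocked version lost.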
- Oct 17, 2022
Dmitry Osipenko authored
The internal dma-buf lock isn't needed anymore, because the updated locking specification states that dma-buf reservations must be locked by importers. The internal data is therefore already protected by the reservation lock. Remove the obsolete internal lock.

Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221017172229.42269-22-dmitry.osipenko@collabora.com
Dmitry Osipenko authored
The new common dma-buf locking convention will require buffer importers to hold the reservation lock around mapping operations. Make the DRM GEM core take the lock around the vmapping operations. Update DRM drivers to use the locked functions in the cases where the DRM core now holds the lock. This patch prepares the DRM core and drivers for the common dynamic dma-buf locking convention.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221017172229.42269-4-dmitry.osipenko@collabora.com
Dmitry Osipenko authored
Add unlocked variants of dma_buf_map/unmap_attachment() that will be used by drivers that don't take the reservation lock explicitly.

Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221017172229.42269-3-dmitry.osipenko@collabora.com
Dmitry Osipenko authored
Add unlocked variants of dma_buf_vmap/vunmap() that will be used by drivers that don't take the reservation lock explicitly.

Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221017172229.42269-2-dmitry.osipenko@collabora.com
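The "unlocked variant" pattern described in the dma-buf commits above pairs each function with a wrapper. The base function expects the caller to hold the reservation lock. The wrapper takes that lock itself for callers that don't. Below is a minimal userspace sketch of that pattern. All names are hypothetical, a pthread mutex stands in for the reservation lock, and this is not the dma-buf implementation.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical buffer object; resv_lock stands in for the reservation lock. */
struct buffer {
	pthread_mutex_t resv_lock;
	int vmap_count;
	void *vaddr;
	char storage[64];
};

static void buffer_init(struct buffer *b)
{
	pthread_mutex_init(&b->resv_lock, NULL);
	b->vmap_count = 0;
	b->vaddr = NULL;
}

/* Locked variants: the caller must already hold resv_lock. */
static void *buffer_vmap(struct buffer *b)
{
	if (b->vmap_count++ == 0)
		b->vaddr = b->storage;	/* first user establishes the mapping */
	return b->vaddr;
}

static void buffer_vunmap(struct buffer *b)
{
	if (--b->vmap_count == 0)
		b->vaddr = NULL;	/* last user tears it down */
}

/* Unlocked variants: for callers that do not hold the lock. */
static void *buffer_vmap_unlocked(struct buffer *b)
{
	void *p;

	pthread_mutex_lock(&b->resv_lock);
	p = buffer_vmap(b);
	pthread_mutex_unlock(&b->resv_lock);
	return p;
}

static void buffer_vunmap_unlocked(struct buffer *b)
{
	pthread_mutex_lock(&b->resv_lock);
	buffer_vunmap(b);
	pthread_mutex_unlock(&b->resv_lock);
}
```

Keeping a single locked implementation means the lock-holding and lock-taking call paths cannot drift apart.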
Peter Zijlstra authored
Marco reported:

Due to the implementation of how SIGTRAPs are delivered if perf_event_attr::sigtrap is set, we've noticed 3 issues:

  1. Missing SIGTRAP due to a race with event_sched_out() (more details below).

  2. Hardware PMU events being disabled due to returning 1 from perf_event_overflow(). The only way to re-enable the event is for user space to first "properly" disable the event and then re-enable it.

  3. The inability to automatically disable an event after a specified number of overflows via PERF_EVENT_IOC_REFRESH.

The worst of the 3 issues is problem (1), which occurs when a pending_disable is "consumed" by a racing event_sched_out(), observed as follows:

    CPU0                            |       CPU1
    --------------------------------+---------------------------
    __perf_event_overflow()         |
     perf_event_disable_inatomic()  |
      pending_disable = CPU0        | ...
                                    | _perf_event_enable()
                                    |  event_function_call()
                                    |   task_function_call()
                                    |    /* sends IPI to CPU0 */
    <IPI>                           | ...
     __perf_event_enable()          +---------------------------
      ctx_resched()
       task_ctx_sched_out()
        ctx_sched_out()
         group_sched_out()
          event_sched_out()
           pending_disable = -1
    </IPI>
    <IRQ-work>
     perf_pending_event()
      perf_pending_event_disable()
       /* Fails to send SIGTRAP because no pending_disable! */
    </IRQ-work>

In the above case, not only is that particular SIGTRAP missed, but also all future SIGTRAPs, because 'event_limit' is not reset back to 1.

To fix, rework pending delivery of SIGTRAP via IRQ-work by introducing a separate 'pending_sigtrap', no longer using 'event_limit' and 'pending_disable' for its delivery.

Additionally, and unlike Marco's proposed patch:

 - recognise that pending_disable effectively duplicates oncpu for the case where it is set. As such, change the irq_work handler to use ->oncpu to target the event and use pending_* as boolean toggles.

 - observe that SIGTRAP targets the ctx->task, so the context switch optimization that carries contexts between tasks is invalid. If the irq_work were delayed enough to hit after a context switch, the SIGTRAP would be delivered to the wrong task.

 - observe that if the event gets scheduled out (rotation/migration/context-switch/...), the irq-work would be insufficient to deliver the SIGTRAP when the event gets scheduled back in (the irq-work might still be pending on the old CPU). Therefore, have event_sched_out() convert the pending sigtrap into a task_work which will deliver the signal at return_to_user.

Fixes: 97ba62b2 ("perf: Add support for SIGTRAP on perf events")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Debugged-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Marco Elver <elver@google.com>
Debugged-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Marco Elver <elver@google.com>
- Oct 16, 2022
Tetsuo Handa authored
This reverts commit 78e5a339 ("cpumask: fix checking valid cpu range").

syzbot is hitting the WARN_ON_ONCE(cpu >= nr_cpumask_bits) warning at cpu_max_bits_warn() [1], because commit 78e5a339 ("cpumask: fix checking valid cpu range") is broken. That patch obviously hits WARN_ON_ONCE() when e.g. reading /proc/cpuinfo, because passing "cpu + 1" instead of "cpu" will trivially hit the cpu == nr_cpumask_bits condition.

Although syzbot found this problem in linux-next.git on 2022/09/27 [2], it was not fixed immediately. As a result, the patch was sent to linux.git before its author recognized the problem, and syzbot has been failing to test changes in linux.git since 2022/10/10 [3].

Andrew Jones proposed a fix for the x86 and riscv architectures [4], but [2] and [5] indicate that the affected locations are not limited to arch code. The longer it takes to find and fix the affected locations, the less tested (and the harder to bisect and fix) the kernel will be before release.

We should have inspected and fixed basically all cpumask users before applying that patch. We should not crash kernels in order to ask existing cpumask users to update their code, even if limited to the CONFIG_DEBUG_PER_CPU_MAPS=y case.

Link: https://syzkaller.appspot.com/bug?extid=d0fd2bf0dd6da72496dd [1]
Link: https://syzkaller.appspot.com/bug?extid=21da700f3c9f0bc40150 [2]
Link: https://syzkaller.appspot.com/bug?extid=51a652e2d24d53e75734 [3]
Link: https://lkml.kernel.org/r/20221014155845.1986223-1-ajones@ventanamicro.com [4]
Link: https://syzkaller.appspot.com/bug?extid=4d46c43d81c3bd155060 [5]
Reported-by: Andrew Jones <ajones@ventanamicro.com>
Reported-by: <syzbot+d0fd2bf0dd6da72496dd@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
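Why does checking "cpu + 1" fire so easily? An iterator such as cpumask_next(n, ...) searches from n + 1, and the loop eventually calls it with n equal to the last CPU in the mask. Below is a simplified userspace model of that iteration. The bit count, `mask_next`, the warning counter, and `count_warnings` are all hypothetical stand-ins, not the kernel's cpumask code.

```c
#include <stdbool.h>

#define NR_BITS 8u /* hypothetical stand-in for nr_cpumask_bits */

static unsigned int warn_count;

/* Stand-in for cpu_max_bits_warn(): count instead of WARN_ON_ONCE(). */
static void max_bits_warn(unsigned int cpu)
{
	if (cpu >= NR_BITS)
		warn_count++;
}

/* Next set bit strictly after n (n == -1 starts the walk), or NR_BITS.
 * check_plus_one selects whether n + 1 or n is validated. */
static unsigned int mask_next(int n, unsigned long mask, bool check_plus_one)
{
	unsigned int bit;

	if (check_plus_one)
		max_bits_warn((unsigned int)(n + 1));	/* reverted behaviour */
	else if (n != -1)
		max_bits_warn((unsigned int)n);		/* restored behaviour */

	for (bit = (unsigned int)(n + 1); bit < NR_BITS; bit++)
		if (mask & (1UL << bit))
			return bit;
	return NR_BITS;
}

/* Walk every set bit, as a for_each-style loop would, and count warnings. */
static unsigned int count_warnings(unsigned long mask, bool check_plus_one)
{
	unsigned int cpu;

	warn_count = 0;
	for (cpu = mask_next(-1, mask, check_plus_one); cpu < NR_BITS;
	     cpu = mask_next((int)cpu, mask, check_plus_one))
		;
	return warn_count;
}
```

In this model, checking n + 1 warns exactly when the highest possible bit is set. On a fully populated mask, that is every complete walk.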
- Oct 15, 2022
After commit d6a71648 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator"), SLAB passes large (> PAGE_SIZE * 2) requests to the buddy allocator, as SLUB does.

SLAB has been using kmalloc caches to allocate the freelist_idx_t array for off-slab caches. But after that commit, freelist_size can be bigger than KMALLOC_MAX_CACHE_SIZE.

Instead of using a pointer to a kmalloc cache, use kmalloc_node() and only check whether the kmalloc cache is off-slab during calculate_slab_order(). If freelist_size > KMALLOC_MAX_CACHE_SIZE, no looping condition occurs, because the freelist_idx_t array is then allocated directly from the buddy allocator.

Link: https://lore.kernel.org/all/20221014205818.GA1428667@roeck-us.net/
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes: d6a71648 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator")
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Shenwei Wang authored
Commit 744d23c7 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") requires the MAC driver to explicitly tell the PHY driver who is managing PM; otherwise a warning is shown during the resume stage. Add a boolean property to the phylink_config structure so that the MAC driver can use it to tell the PHY driver whether it wants to manage PM.

Fixes: 744d23c7 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski authored
This reverts commit 854701ba. There are more violations around, which lead to:

    WARNING: CPU: 2 PID: 1 at include/linux/cpumask.h:110 __netif_set_xps_queue+0x14e/0x770

Let's back this out and retry with a larger cleanup in -next.

Fixes: 854701ba ("net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}")
Link: https://lore.kernel.org/all/20221014030459.3272206-2-guoren@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Oct 14, 2022
The build fails with older versions of binutils, such as 2.27.0:

    arch/arm/mach-at91/pm_suspend.S: Assembler messages:
    arch/arm/mach-at91/pm_suspend.S:1086: Error: garbage following instruction -- `ldr tmp1,=0x00020010UL'

Use the UL() macro to fix the issue in the assembly file.

Fixes: 4fd36e45 ("ARM: at91: pm: add plla disable/enable support for sam9x60")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20221012030635.13140-1-wangkefeng.wang@huawei.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
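The UL() macro helps because the "UL" suffix is pasted onto the literal only when compiling C. In assembly (where __ASSEMBLY__ is defined) the macro expands to the bare number, which older assemblers accept. Below is a sketch of that pattern, written to mirror the kernel's const.h helpers as best I recall them; treat the exact definitions as an assumption. `PLLA_ACR_DEFAULT` is a hypothetical constant shared by C and assembly sources.

```c
/* Suffix-pasting helpers: in assembly the suffix is dropped entirely. */
#ifdef __ASSEMBLY__
#define _AC(X, Y)	X
#else
#define __AC(X, Y)	(X##Y)
#define _AC(X, Y)	__AC(X, Y)
#endif

#define _UL(x)		(_AC(x, UL))
#define UL(x)		(_UL(x))

/* Hypothetical register default usable from both C and .S files:
 * C sees ((0x00020010UL)), the assembler sees ((0x00020010)). */
#define PLLA_ACR_DEFAULT	UL(0x00020010)
```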
Christian Marangi authored
The switch sends autocast MIB counters in little-endian. This is problematic for big-endian systems, as the values need to be converted. Fix this by converting each MIB value to CPU byte order.

Fixes: 5c957c7c ("net: dsa: qca8k: add support for mib autocast in Ethernet packet")
Tested-by: Pawel Dembicki <paweldembicki@gmail.com>
Tested-by: Lech Perczak <lech.perczak@gmail.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
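Converting little-endian wire data to CPU byte order is what le32_to_cpu() does in the kernel. Below is a portable userspace sketch of the same conversion. It assumes the payload is a run of consecutive 32-bit little-endian counters, and `load_le32` and `mib_to_cpu` are hypothetical helpers, not the qca8k driver's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit value from little-endian bytes. Building it byte
 * by byte gives the same host value on little- and big-endian CPUs. */
static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: convert n little-endian 32-bit MIB counters
 * from a packet payload into host-order values. */
static void mib_to_cpu(const uint8_t *payload, uint32_t *out, size_t n)
{
	for (size_t i = 0; i < n; i++)
		out[i] = load_le32(payload + 4 * i);
}
```

Reading each field through a byte-wise decoder avoids both unaligned loads and any dependence on the host's byte order.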
Christian Marangi authored
The header and the data of the skb for inband mgmt must be in little-endian. This is problematic for big-endian systems, as the mgmt header is written in CPU byte order. Fix this by converting each value of the mgmt header and data to little-endian. Likewise, convert the mgmt header and data sent by the switch to CPU byte order.

Fixes: 5950c7c0 ("net: dsa: qca8k: add support for mgmt read/write in Ethernet packet")
Tested-by: Pawel Dembicki <paweldembicki@gmail.com>
Tested-by: Lech Perczak <lech.perczak@gmail.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Lech Perczak <lech.perczak@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Oct 13, 2022
Ashish Kalra authored
Change num_ghes from int to unsigned int, preventing an overflow that causes the subsequent vmalloc() to fail.

The overflow happens in ghes_estatus_pool_init() when len is calculated by the statement below, because both multiplication operands are signed int:

    len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);

The following call trace is observed because of this bug:

    [ 9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1
    [ 9.317131] Call Trace:
    [ 9.317134] <TASK>
    [ 9.317137] dump_stack_lvl+0x49/0x5f
    [ 9.317145] dump_stack+0x10/0x12
    [ 9.317146] warn_alloc.cold+0x7b/0xdf
    [ 9.317150] ? __device_attach+0x16a/0x1b0
    [ 9.317155] __vmalloc_node_range+0x702/0x740
    [ 9.317160] ? device_add+0x17f/0x920
    [ 9.317164] ? dev_set_name+0x53/0x70
    [ 9.317166] ? platform_device_add+0xf9/0x240
    [ 9.317168] __vmalloc_node+0x49/0x50
    [ 9.317170] ? ghes_estatus_pool_init+0x43/0xa0
    [ 9.317176] vmalloc+0x21/0x30
    [ 9.317177] ghes_estatus_pool_init+0x43/0xa0
    [ 9.317179] acpi_hest_init+0x129/0x19c
    [ 9.317185] acpi_init+0x434/0x4a4
    [ 9.317188] ? acpi_sleep_proc_init+0x2a/0x2a
    [ 9.317190] do_one_initcall+0x48/0x200
    [ 9.317195] kernel_init_freeable+0x221/0x284
    [ 9.317200] ? rest_init+0xe0/0xe0
    [ 9.317204] kernel_init+0x1a/0x130
    [ 9.317205] ret_from_fork+0x22/0x30
    [ 9.317208] </TASK>

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
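Here is the mechanics behind the huge vmalloc size. If the int * int product exceeds INT_MAX, it overflows. That is undefined behavior in C, and in practice it yields a negative value. Adding that value to an unsigned long produces a number close to 2^64, consistent with the size in the trace. Below is a minimal C sketch of the arithmetic. It assumes the per-source size is 65536 (64 KiB). `product_wide`, `int_product_overflows`, and `pool_len` are hypothetical helpers that show the boundary without executing the undefined overflow itself.

```c
#include <limits.h>
#include <stdint.h>

/* Assumed value of the per-source preallocation size (64 KiB). */
#define PREALLOC_MAX_SIZE 65536

/* Exact product computed in 64 bits, for reference. */
static int64_t product_wide(int64_t num_ghes)
{
	return num_ghes * PREALLOC_MAX_SIZE;
}

/* Nonzero if the original int * int expression would overflow. */
static int int_product_overflows(int num_ghes)
{
	return product_wide(num_ghes) > INT_MAX;
}

/* The fixed form: with an unsigned int operand the multiplication is
 * done in unsigned arithmetic, so no negative value can be converted
 * into an enormous unsigned long length. (Products beyond UINT_MAX
 * would still wrap; unsigned only removes the sign problem.) */
static unsigned long pool_len(unsigned long base, unsigned int num_ghes)
{
	unsigned long len = base;

	len += (num_ghes * PREALLOC_MAX_SIZE);
	return len;
}
```

With these assumptions the signed form breaks at 32768 error sources, since 32768 * 65536 = 2^31, one more than INT_MAX.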
Greentime Hu authored
Since the composable cache may be an L3 cache if there is an L2 cache, we should use its original name, composable cache, to prevent confusion.

Some new lines were generated due to adding the compatible "sifive,ccache0" to the ID table and to indentation requirements.

The sifive L2 has been renamed to sifive CCACHE, so the EDAC driver needs to apply the change as well.

Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Zong Li <zong.li@sifive.com>
Co-developed-by: Zong Li <zong.li@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20220913061817.22564-3-zong.li@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Javier Martinez Canillas authored
Provide a default CRTC state check handler for CRTCs that only have one primary plane attached. Some drivers, such as simpledrm and ssd130x, duplicate this logic in their helpers. Factor out this common code into a CRTC helper and make drivers use it.

Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20221011165136.469750-5-javierm@redhat.com
Alistair Popple authored
Device drivers can use the migrate_vma family of functions to migrate existing private anonymous mappings to device private pages. These pages are backed by memory on the device, with drivers being responsible for copying data to and from device memory.

Device private pages are freed via the pgmap->page_free() callback when they are unmapped and their refcount drops to zero. Alternatively, they may be freed indirectly via migration back to CPU memory in response to a pgmap->migrate_to_ram() callback, which is called whenever the CPU accesses an address mapped to a device private page.

In other words, drivers cannot control the lifetime of data allocated on the devices and must wait until these pages are freed from userspace. This causes issues when memory needs to be reclaimed on the device, either because the device is going away due to a ->release() callback or because another user needs to use the memory.

Drivers could use the existing migrate_vma functions to migrate data off the device. However, this would require them to track the mappings of each page, which is both complicated and not always possible. Instead, drivers need to be able to migrate device pages directly so they can free up device memory.

To allow that, this patch introduces the migrate_device family of functions. They are functionally similar to migrate_vma but skip the initial lookup based on mapping.

Link: https://lkml.kernel.org/r/868116aab70b0c8ee467d62498bb2cf0ef907295.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Alistair Popple authored
Since 27674ef6 ("mm: remove the extra ZONE_DEVICE struct page refcount"), device private pages no longer have an extra reference count when the page is in use. However, before handing them back to the owning device driver, we add an extra reference count such that free pages have a reference count of one.

This makes it difficult to tell whether a page is free, because both free and in-use pages will have a non-zero refcount. Instead, we should return pages to the driver's page allocator with a zero reference count. Kernel code can then safely use kernel functions such as get_page_unless_zero().

Link: https://lkml.kernel.org/r/cf70cf6f8c0bdb8aaebdbfb0d790aea4c683c3c6.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
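The refcount convention matters because "take a reference unless the count is zero" only works if free pages really sit at zero. Below is a minimal C11 sketch of that primitive using a compare-and-swap loop. The `fake_page` struct and `get_unless_zero` are hypothetical stand-ins for the idea behind get_page_unless_zero(), not the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical page stand-in: only the reference count matters here. */
struct fake_page {
	atomic_int refcount;
};

/* Take a reference only if the page is in use (count != 0). With free
 * pages held at zero this cleanly refuses them; if free pages sat at
 * one instead, this test could not tell free from in-use. */
static bool get_unless_zero(struct fake_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}
```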
Alistair Popple authored
Patch series "Fix several device private page reference counting issues", v2

This series aims to fix a number of page reference counting issues in drivers dealing with device private ZONE_DEVICE pages. These result in use-after-free type bugs. They arise either from accessing a struct page that no longer exists because it has been removed, or from accessing fields within a struct page that are no longer valid because the page has been freed.

During normal usage it is unlikely these will cause any problems. However, without these fixes it is possible to crash the kernel from userspace. These crashes can be triggered either by unloading the kernel module or by unbinding the device from the driver before a userspace task exits. In modules such as Nouveau, some of these issues can also be triggered by explicitly closing the device file descriptor before the task exits and then accessing device private memory.

This involves some minor changes to both PowerPC and AMD GPU code. Unfortunately I lack the hardware to test either of those, so any help there would be appreciated. The changes mimic what is done for both Nouveau and hmm-tests though, so I doubt they will cause problems.

This patch (of 8):

When the CPU tries to access a device private page, the migrate_to_ram() callback associated with the pgmap for the page is called. However, no reference is taken on the faulting page. Therefore a concurrent migration of the device private page can free the page and possibly the underlying pgmap. This results in a race which can crash the kernel due to the migrate_to_ram() function pointer becoming invalid. It also means drivers can't reliably read the zone_device_data field, because the page may have been freed with memunmap_pages().

Close the race by getting a reference on the page while holding the ptl to ensure it has not been freed. Unfortunately, the elevated reference count will cause the migration required to handle the fault to fail. To avoid this failure, pass the faulting page into the migrate_vma functions. Then, if an elevated reference count is found, it can be checked to see whether it is expected.

[mpe@ellerman.id.au: fix build]
Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au
Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@nvidia.com
Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.1664366292.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Xin Hao authored
Rename sz_damon_region() to damon_sz_region() and move it to "include/linux/damon.h", because it can be used in many places.

Link: https://lkml.kernel.org/r/20220927001946.85375-1-xhao@linux.alibaba.com
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Suggested-by: SeongJae Park <sj@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
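The renamed helper just computes a region's size from its address range. Below is a minimal self-contained sketch. It assumes a region wraps a half-open [start, end) address range. The struct definitions are trimmed stand-ins, not the full types from include/linux/damon.h.

```c
/* Trimmed stand-ins for DAMON's address-range and region types. */
struct damon_addr_range {
	unsigned long start;
	unsigned long end;
};

struct damon_region {
	struct damon_addr_range ar;
};

/* Size of a region in bytes, for the half-open range [start, end). */
static inline unsigned long damon_sz_region(const struct damon_region *r)
{
	return r->ar.end - r->ar.start;
}
```

Sharing one inline helper from the header avoids each DAMON user open-coding `end - start`.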
Kuniyuki Iwashima authored
Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were able to clean them up by calling inet6_destroy_sock() during the IPv6 -> IPv4 conversion by IPV6_ADDRFORM. However, commit 03485f2a ("udpv6: Add lockless sendmsg() support") added a lockless memory allocation path, which could cause a memory leak:

      setsockopt(IPV6_ADDRFORM)                 sendmsg()
      +-----------------------+                 +-------+
      - do_ipv6_setsockopt(sk, ...)             - udpv6_sendmsg(sk, ...)
        - sockopt_lock_sock(sk)                   ^._ called via udpv6_prot
          - lock_sock(sk)                             before WRITE_ONCE()
        - WRITE_ONCE(sk->sk_prot, &tcp_prot)
        - inet6_destroy_sock()                    - if (!corkreq)
        - sockopt_release_sock(sk)                  - ip6_make_skb(sk, ...)
          - release_sock(sk)                          ^._ lockless fast path for
                                                          the non-corking case

                                                      - __ip6_append_data(sk, ...)
                                                        - ipv6_local_rxpmtu(sk, ...)
                                                          - xchg(&np->rxpmtu, skb)
                                                            ^._ rxpmtu is never freed.

                                                    - goto out_no_dst;

                                                  - lock_sock(sk)

For now, rxpmtu is the only such case. So as not to miss future changes or a bug similar to the one fixed in commit e2732600 ("net: ping6: Fix memleak in ipv6_renew_options()."), let's set a new function as the IPv6 sk->sk_destruct() and call inet6_cleanup_sock() there. Since the conversion does not change sk->sk_destruct(), we can guarantee that IPv6 resources are eventually cleaned up.

We can now remove all inet6_destroy_sock() calls from the IPv6 protocol-specific ->destroy() functions. However, such changes are invasive to backport, so they can be posted later as a follow-up for net-next.

Fixes: 03485f2a ("udpv6: Add lockless sendmsg() support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima authored
Commit 4b340ae2 ("IPv6: Complete IPV6_DONTFRAG support") forgot to free inet6_sk(sk)->rxpmtu when converting an IPv6 socket into IPv4 with IPV6_ADDRFORM. After the conversion, sk_prot is changed to udp_prot and ->destroy() never cleans rxpmtu up, resulting in a memory leak.

This is due to the discrepancy between inet6_destroy_sock() and IPV6_ADDRFORM, so let's call inet6_destroy_sock() from IPV6_ADDRFORM to remove the difference.

However, this is not enough for now, because rxpmtu can be changed without lock_sock() since commit 03485f2a ("udpv6: Add lockless sendmsg() support"). We will fix this case in the following patch.

Note that we will rename inet6_destroy_sock() to inet6_cleanup_sock() and remove the unnecessary inet6_destroy_sock() calls in sk_prot->destroy() in the future.

Fixes: 4b340ae2 ("IPv6: Complete IPV6_DONTFRAG support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- Oct 12, 2022
-
-
Alexey Dobriyan authored
Link: https://lkml.kernel.org/r/Y0WuE3Riv4iy5Jx8@localhost.localdomain
Fixes: 7964cf8c ("mm: remove vmacache")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Pavel Begunkov authored
Notifications were killed, but a couple of fields and struct declarations were left behind; remove them.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8df8877d677be5a2b43afd936d600e60105ea960.1664849941.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Pavel Begunkov authored
Instead of putting io_uring's registered files in unix_gc(), we want that to be done by io_uring itself. The trick here is to consider io_uring registered files for cycle detection but not to actually put them down. Because io_uring can't register other ring instances, this will remove all refs to the ring file, triggering the ->release path and cleanup with io_ring_ctx_free().

Cc: stable@vger.kernel.org
Fixes: 6b06314c ("io_uring: add file set registration")
Reported-and-tested-by: David Bouman <dbouman03@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
[axboe: add kerneldoc comment to skb, fold in skb leak fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Uwe Kleine-König authored
To simplify debugging which process touches a watchdog and when, add tracing events for .start(), .set_timeout(), .ping() and .stop().
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20221008174602.3972859-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
-
Baolin Wang authored
On some architectures (like ARM64), CONT-PTE/PMD size hugetlb is supported, which means that not only PMD/PUD size hugetlb (2M and 1G) but also CONT-PTE/PMD size hugetlb (64K and 32M) is available when a 4K page size is specified.

So when looking up a CONT-PTE size hugetlb page by follow_page(), it will use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size hugetlb in follow_page_pte(). However, this pte entry lock is incorrect for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get the correct lock, which is mm->page_table_lock.

That means the pte entry of the CONT-PTE size hugetlb is unstable in follow_page_pte() under the current pte lock: the entry can still be migrated or poisoned concurrently, which can cause potential races even though the caller holds the 'pte lock'. For example, suppose thread A is trying to look up a CONT-PTE size hugetlb page by the move_pages() syscall under the lock, while another thread B migrates the same CONT-PTE hugetlb page at the same time. Thread A then gets an incorrect page, and if thread A also wants to do page migration, a data inconsistency error occurs.

Moreover, we have the same issue for CONT-PMD size hugetlb in follow_huge_pmd().

To fix the above issues, rename follow_huge_pmd() to follow_huge_pmd_pte() to handle PMD and PTE level size hugetlb, and use huge_pte_lock() to get the correct pte entry lock so that the pte entry is stable.

Mike said:

Support for CONT_PMD/_PTE was added with bb9dd3df ("arm64: hugetlb: refactor find_num_contig()"). Patch series "Support for contiguous pte hugepages", v4. However, I do not believe these code paths were executed until migration support was added with 5480280d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages"). I would go with 5480280d for the Fixes: target.

Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.1662017562.git.baolin.wang@linux.alibaba.com
Fixes: 5480280d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages")
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Tiezhu Yang authored
The has_signal argument of arch_do_signal_or_restart() was removed in commit 8ba62d37 ("task_work: Call tracehook_notify_signal from get_signal on all architectures"), so remove the related comment.
Link: https://lkml.kernel.org/r/1662090106-5545-1-git-send-email-yangtiezhu@loongson.cn
Fixes: 8ba62d37 ("task_work: Call tracehook_notify_signal from get_signal on all architectures")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-