- Dec 27, 2022
-
-
Mathieu Desnoyers authored
In a scenario where kcalloc() fails to allocate memory, the futex_waitv system call immediately returns -ENOMEM without invoking destroy_hrtimer_on_stack(). When CONFIG_DEBUG_OBJECTS_TIMERS=y, this results in leaking a timer debug object. Fixes: bf69bad3 ("futex: Implement sys_futex_waitv()") Signed-off-by:
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Davidlohr Bueso <dave@stgolabs.net> Cc: stable@vger.kernel.org Cc: stable@vger.kernel.org # v5.16+ Link: https://lore.kernel.org/r/20221214222008.200393-1-mathieu.desnoyers@efficios.com
-
Namhyung Kim authored
It passes the attr struct to the security_perf_event_open() but it's not initialized yet. Fixes: da97e184 ("perf_event: Add support for LSM and SELinux checks") Signed-off-by:
Namhyung Kim <namhyung@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Joel Fernandes (Google) <joel@joelfernandes.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221220223140.4020470-1-namhyung@kernel.org
-
Peter Zijlstra authored
The syscall error path has a use-after-free; put_pmu_ctx() will reference ctx, therefore we must ensure ctx is destroyed after pmu_ctx is. Fixes: bd275681 ("perf: Rewrite core context handling") Reported-by:
<syzbot+b8e8c01c8ade4fe6e48f@syzkaller.appspotmail.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by:
Chengming Zhou <zhouchengming@bytedance.com> Link: https://lkml.kernel.org/r/Y6B3xEgkbmFUCeni@hirez.programming.kicks-ass.net
-
Chengming Zhou authored
We encounter perf warnings when using cgroup events like: cd /sys/fs/cgroup mkdir test perf stat -e cycles -a -G test Which then triggers: WARNING: CPU: 0 PID: 690 at kernel/events/core.c:849 perf_cgroup_switch+0xb2/0xc0 Call Trace: <TASK> __schedule+0x4ae/0x9f0 ? _raw_spin_unlock_irqrestore+0x23/0x40 ? __cond_resched+0x18/0x20 preempt_schedule_common+0x2d/0x70 __cond_resched+0x18/0x20 wait_for_completion+0x2f/0x160 ? cpu_stop_queue_work+0x9e/0x130 affine_move_task+0x18a/0x4f0 WARNING: CPU: 0 PID: 690 at kernel/events/core.c:829 ctx_sched_in+0x1cf/0x1e0 Call Trace: <TASK> ? ctx_sched_out+0xb7/0x1b0 perf_cgroup_switch+0x88/0xc0 __schedule+0x4ae/0x9f0 ? _raw_spin_unlock_irqrestore+0x23/0x40 ? __cond_resched+0x18/0x20 preempt_schedule_common+0x2d/0x70 __cond_resched+0x18/0x20 wait_for_completion+0x2f/0x160 ? cpu_stop_queue_work+0x9e/0x130 affine_move_task+0x18a/0x4f0 The above two warnings are not complete here since I remove other unimportant information. The problem is caused by the perf cgroup events tracking: CPU0 CPU1 perf_event_open() perf_event_alloc() account_event() account_event_cpu() atomic_inc(perf_cgroup_events) __perf_event_task_sched_out() if (atomic_read(perf_cgroup_events)) perf_cgroup_switch() // kernel/events/core.c:849 WARN_ON_ONCE(cpuctx->ctx.nr_cgroups == 0) if (READ_ONCE(cpuctx->cgrp) == cgrp) // false return perf_ctx_lock() ctx_sched_out() cpuctx->cgrp = cgrp ctx_sched_in() perf_cgroup_set_timestamp() // kernel/events/core.c:829 WARN_ON_ONCE(!ctx->nr_cgroups) perf_ctx_unlock() perf_install_in_context() cpu_function_call() __perf_install_in_context() add_event_to_ctx() list_add_event() perf_cgroup_event_enable() ctx->nr_cgroups++ cpuctx->cgrp = X We can see from above that we wrongly use percpu atomic perf_cgroup_events to check if we need to perf_cgroup_switch(), which should only be used when we know this CPU has cgroup events enabled. The commit bd275681 ("perf: Rewrite core context handling") change to have only one context per-CPU, so we can just use cpuctx->cgrp to check if this CPU has cgroup events enabled. So percpu atomic perf_cgroup_events is not needed. Fixes: bd275681 ("perf: Rewrite core context handling") Signed-off-by:
Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by:
Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lkml.kernel.org/r/20221207124023.66252-1-zhouchengming@bytedance.com
-
Ravi Bangoria authored
inherit_event() returns NULL only when it finds orphaned events otherwise it returns either valid child_event pointer or an error pointer. Follow the same when it fails to find pmu_ctx. Fixes: bd275681 ("perf: Rewrite core context handling") Reported-by:
Dan Carpenter <error27@gmail.com> Signed-off-by:
Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221118051539.820-1-ravi.bangoria@amd.com
-
- Dec 23, 2022
-
-
Sami Tolvanen authored
When CFI_CLANG and KASAN are both enabled, LLVM doesn't generate a CFI type hash for asan.module_ctor functions in translation units where CFI is disabled, which leads to a CFI failure during boot when do_ctors calls the affected constructors: CFI failure at do_basic_setup+0x64/0x90 (target: asan.module_ctor+0x0/0x28; expected type: 0xa540670c) Specifically, this happens because CFI is disabled for kernel/cfi.c. There's no reason to keep CFI disabled here anymore, so fix the failure by not filtering out CC_FLAGS_CFI for the file. Note that https://reviews.llvm.org/rG3b14862f0a96 fixed the issue where LLVM didn't emit CFI type hashes for any sanitizer constructors, but now type hashes are emitted correctly for TUs that use CFI. Link: https://github.com/ClangBuiltLinux/linux/issues/1742 Fixes: 89245600 ("cfi: Switch to -fsanitize=kcfi") Reported-by:
Mark Rutland <mark.rutland@arm.com> Signed-off-by:
Sami Tolvanen <samitolvanen@google.com> Signed-off-by:
Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221222225747.3538676-1-samitolvanen@google.com
-
- Dec 21, 2022
-
-
Rickard x Andersson authored
In GCC version 12.1 a checksum field was added. This patch fixes a kernel crash occurring during boot when using gcov-kernel with GCC version 12.2. The crash occurred on a system running on i.MX6SX. Link: https://lkml.kernel.org/r/20221220102318.3418501-1-rickaran@axis.com Fixes: 977ef30a ("gcov: support GCC 12.1 and newer compilers") Signed-off-by:
Rickard x Andersson <rickaran@axis.com> Reviewed-by:
Peter Oberparleiter <oberpar@linux.ibm.com> Tested-by:
Peter Oberparleiter <oberpar@linux.ibm.com> Reviewed-by:
Martin Liska <mliska@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Christoph Hellwig authored
While not quite as bogus as for the dma-coherent allocations that were fixed earlier, GFP_COMP for these allocations has no benefits for the dma-direct case, and can't be supported at all by dma dma-iommu backend which splits up allocations into smaller orders. Due to an oversight in ffcb7545 that flag stopped being cleared for all dma allocations, but only got rejected for coherent ones, so fix up these callers to not allow __GFP_COMP as well after the sound code has been fixed to not ask for it. Fixes: ffcb7545 ("dma-mapping: reject __GFP_COMP in dma_alloc_attrs") Reported-by:
Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reported-by:
Kai Vehmanen <kai.vehmanen@linux.intel.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Takashi Iwai <tiwai@suse.de> Tested-by:
Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Tested-by:
Kai Vehmanen <kai.vehmanen@linux.intel.com>
-
- Dec 20, 2022
-
-
Alessandro Carminati authored
It makes sense to move the important monitor structure into rodata to prevent accidental structure modification. Link: https://lkml.kernel.org/r/20221122173648.4732-1-acarmina@redhat.com Signed-off-by:
Alessandro Carminati <acarmina@redhat.com> Acked-by:
Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org>
-
- Dec 18, 2022
-
-
Paul E. McKenney authored
The rcu_poll_gp_seq_end() and rcu_poll_gp_seq_end_unlocked() both check that interrupts are enabled, as they normally should be when waiting for an RCU grace period. Except that it is legal to wait for grace periods during early boot, before interrupts have been enabled for the first time, and polling for grace periods is required to work during this time. This can result in false-positive lockdep splats in the presence of boot-time-initiated tracing. This commit therefore conditions those interrupts-enabled checks on rcu_scheduler_active having advanced past RCU_SCHEDULER_INACTIVE, by which time interrupts have been enabled. Reported-by:
Steven Rostedt <rostedt@goodmis.org> Signed-off-by:
Paul E. McKenney <paulmck@kernel.org> Tested-by:
Steven Rostedt (Google) <rostedt@goodmis.org>
-
- Dec 16, 2022
-
-
Kees Cook authored
Use a temporary variable to take full advantage of READ_ONCE() behavior. Without this, the report (and even the test) might be out of sync with the initial test. Reported-by:
Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/Y5x7GXeluFmZ8E0E@hirez.programming.kicks-ass.net Fixes: 9fc9e278 ("panic: Introduce warn_limit") Fixes: d4ccd54d ("exit: Put an upper limit on how often we can oops") Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Jann Horn <jannh@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Petr Mladek <pmladek@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Marco Elver <elver@google.com> Cc: tangmeng <tangmeng@uniontech.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by:
Kees Cook <keescook@chromium.org>
-
Thomas Gleixner authored
On architectures such as s390 that do not use irq domains for MSI, returning 0 as the maximum MSI index is a bit counter-productive, as it indicates that no MSI can be allocated. Bad idea. Instead, return the maximum we're willing to support in the MSI backing store (MSI_XA_DOMAIN_SIZE), and let the arch code do its usual thing. Thanks to Matthew Rosato for fixing the fix. Reported-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> [maz: commit message] Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/87fsdgzpqs.ffs@tglx
-
Marc Zyngier authored
For architectures such as s390 and powerpc that do not use irq domains for MSIs, dev->msi.domain is always NULL, so the per-device, per-bus MSI domain is also guaranteed to be NULL. So checking one without checking the other is bound to result in a splat, followed by a memory leak as we don't free the MSI descriptors. Add the missing check. Reported-by:
Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/e570e70d-19bc-101b-0481-ff9a3cab3504@linux.ibm.com
-
- Dec 15, 2022
-
-
Peter Zijlstra authored
On architectures where the PTE/PMD is larger than the native word size (i386-PAE for example), READ_ONCE() can do the wrong thing. Use pmdp_get_lockless() just like we use ptep_get_lockless(). Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221022114424.906110403%40infradead.org
-
Peter Zijlstra authored
Because endlessly repeating: set_memory_ro() set_memory_x() is getting tedious. Suggested-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/Y1jek64pXOsougmz@hirez.programming.kicks-ass.net
-
Peter Zijlstra authored
Instead of duplicating init_mm, allocate a fresh mm. The advantage is that mm_alloc() has much simpler dependencies. Additionally it makes more conceptual sense, init_mm has no (and must not have) user state to duplicate. Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221025201057.816175235@infradead.org
-
Peter Zijlstra authored
In order to allow using mm_alloc() much earlier, move initializing mm_cachep into mm_init(). Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221025201057.751153381@infradead.org
-
Toke Høiland-Jørgensen authored
The bpf_prog_map_compatible() check makes sure that BPF program types are not mixed inside BPF map types that can contain programs (tail call maps, cpumaps and devmaps). It does this by setting the fields of the map->owner struct to the values of the first program being checked against, and rejecting any subsequent programs if the values don't match. One of the values being set in the map owner struct is the program type, and since the code did not resolve the prog type for fext programs, the map owner type would be set to PROG_TYPE_EXT and subsequent loading of programs of the target type into the map would fail. This bug is seen in particular for XDP programs that are loaded as PROG_TYPE_EXT using libxdp; these cannot insert programs into devmaps and cpumaps because the check fails as described above. Fix the bug by resolving the fext program type to its target program type as elsewhere in the verifier. v3: - Add Yonghong's ACK Fixes: f45d5b6c ("bpf: generalise tail call map compatibility check") Acked-by:
Yonghong Song <yhs@fb.com> Signed-off-by:
Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20221214230254.790066-1-toke@redhat.com Signed-off-by:
Martin KaFai Lau <martin.lau@kernel.org>
-
Masami Hiramatsu (Google) authored
Since uprobe's argument must contain the user-space data, that should not be converted to kernel symbols. Reject if user specifies these types on uprobe events. e.g. /sys/kernel/debug/tracing # echo 'p /bin/sh:10 %ax:symbol' >> uprobe_events sh: write error: Invalid argument /sys/kernel/debug/tracing # echo 'p /bin/sh:10 %ax:symstr' >> uprobe_events sh: write error: Invalid argument /sys/kernel/debug/tracing # cat error_log [ 1783.134883] trace_uprobe: error: Unknown type is specified Command: p /bin/sh:10 %ax:symbol ^ [ 1792.201120] trace_uprobe: error: Unknown type is specified Command: p /bin/sh:10 %ax:symstr ^ Link: https://lore.kernel.org/all/166679931679.1528100.15540755370726009882.stgit@devnote3/ Signed-off-by:
Masami Hiramatsu (Google) <mhiramat@kernel.org>
-
- Dec 14, 2022
-
-
Masami Hiramatsu (Google) authored
Add 'symstr' type for storing the kernel symbol as a string data instead of the symbol address. This allows us to filter the events by wildcard symbol name. e.g. # echo 'e:wqfunc workqueue.workqueue_execute_start symname=$function:symstr' >> dynamic_events # cat events/eprobes/wqfunc/format name: wqfunc ID: 2110 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:__data_loc char[] symname; offset:8; size:4; signed:1; print fmt: " symname=\"%s\"", __get_str(symname) Note that there is already 'symbol' type which just change the print format (so it still stores the symbol address in the tracing ring buffer.) On the other hand, 'symstr' type stores the actual "symbol+offset/size" data as a string. Link: https://lore.kernel.org/all/166679930847.1528100.4124308529180235965.stgit@devnote3/ Signed-off-by:
Masami Hiramatsu (Google) <mhiramat@kernel.org>
-
wuqiang authored
Default value of maxactive is set as num_possible_cpus() for nonpreemptable systems. For a 2-core system, only 2 kretprobe instances would be allocated in default, then these 2 instances for execve kretprobe are very likely to be used up with a pipelined command. Here's the testcase: a shell script was added to crontab, and the content of the script is: #!/bin/sh do_something_magic `tr -dc a-z < /dev/urandom | head -c 10` cron will trigger a series of program executions (4 times every hour). Then events loss would be noticed normally after 3-4 hours of testings. The issue is caused by a burst of series of execve requests. The best number of kretprobe instances could be different case by case, and should be user's duty to determine, but num_possible_cpus() as the default value is inadequate especially for systems with small number of cpus. This patch enables the logic for preemption as default, thus increases the minimum of maxactive to 10 for nonpreemptable systems. Link: https://lore.kernel.org/all/20221110081502.492289-1-wuqiang.matt@bytedance.com/ Signed-off-by:
wuqiang <wuqiang.matt@bytedance.com> Reviewed-by:
Solar Designer <solar@openwall.com> Acked-by:
Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by:
Masami Hiramatsu (Google) <mhiramat@kernel.org>
-
Jiri Olsa authored
Hao Sun reported crash in dispatcher image [1]. Currently we don't have any sync between bpf_dispatcher_update and bpf_dispatcher_xdp_func, so following race is possible: cpu 0: cpu 1: bpf_prog_run_xdp ... bpf_dispatcher_xdp_func in image at offset 0x0 bpf_dispatcher_update update image at offset 0x800 bpf_dispatcher_update update image at offset 0x0 in image at offset 0x0 -> crash Fixing this by synchronizing dispatcher image update (which is done in bpf_dispatcher_update function) with bpf_dispatcher_xdp_func that reads and execute the dispatcher image. Calling synchronize_rcu after updating and installing new image ensures that readers leave old image before it's changed in the next dispatcher update. The update itself is locked with dispatcher's mutex. The bpf_prog_run_xdp is called under local_bh_disable and synchronize_rcu will wait for it to leave [2]. [1] https://lore.kernel.org/bpf/Y5SFho7ZYXr9ifRn@krava/T/#m00c29ece654bc9f332a17df493bbca33e702896c [2] https://lore.kernel.org/bpf/0B62D35A-E695-4B7A-A0D4-774767544C1A@gmail.com/T/#mff43e2c003ae99f4a38f353c7969be4c7162e877 Reported-by:
Hao Sun <sunhao.th@gmail.com> Signed-off-by:
Jiri Olsa <jolsa@kernel.org> Acked-by:
Yonghong Song <yhs@fb.com> Acked-by:
Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20221214123542.1389719-1-jolsa@kernel.org Signed-off-by:
Martin KaFai Lau <martin.lau@kernel.org>
-
Milan Landaverde authored
In [0], we added the ability to bpf_prog_attach LSM programs to cgroups, but in our validation to make sure the prog is meant to be attached to BPF_LSM_CGROUP, we return too early if the check fails. This results in lack of decrementing prog's refcnt (through bpf_prog_put) leaving the LSM program alive past the point of the expected lifecycle. This fix allows for the decrement to take place. [0] https://lore.kernel.org/all/20220628174314.1216643-4-sdf@google.com/ Fixes: 69fd337a ("bpf: per-cgroup lsm flavor") Signed-off-by:
Milan Landaverde <milan@mdaverde.com> Signed-off-by:
Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Reviewed-by:
Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20221213175714.31963-1-milan@mdaverde.com
-
Guilherme G. Piccoli authored
Currently the tracing dump_on_oops feature is implemented through separate notifiers, one for die/oops and the other for panic; given they have the same functionality, let's unify them. Also improve the function comment and change the priority of the notifier to make it execute earlier, avoiding showing useless trace data (like the callback names for the other notifiers); finally, we also removed an unnecessary header inclusion. Link: https://lkml.kernel.org/r/20220819221731.480795-7-gpiccoli@igalia.com Cc: Petr Mladek <pmladek@suse.com> Cc: Sergei Shtylyov <sergei.shtylyov@gmail.com> Signed-off-by:
Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org>
-
Guilherme G. Piccoli authored
The function match_records() may take a while due to a large number of string comparisons, so when in PREEMPT_VOLUNTARY kernels we could face RCU stalls due to that. Add a cond_resched() to prevent that. Link: https://lkml.kernel.org/r/20221115204847.593616-1-gpiccoli@igalia.com Cc: Mark Rutland <mark.rutland@arm.com> Suggested-by:
Steven Rostedt <rostedt@goodmis.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> # from RCU CPU stall warning perspective Signed-off-by:
Guilherme G. Piccoli <gpiccoli@igalia.com> Acked-by:
Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org>
-
Steven Rostedt (Google) authored
If a trigger filter on the kernel command line fails to apply (due to syntax error), it will be freed. The freeing will call tracepoint_synchronize_unregister(), but this is not needed during early boot up, and will even trigger a lockdep splat. Avoid calling the synchronization function when system_state is SYSTEM_BOOTING. Link: https://lore.kernel.org/linux-trace-kernel/20221213172429.7774f4ba@gandalf.local.home Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by:
Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org>
-
Nathan Chancellor authored
When building arm64 allmodconfig + ThinLTO with clang and a proposed modpost update to account for -ffuncton-sections, the following warning appears: WARNING: modpost: vmlinux.o: section mismatch in reference: padata_work_init (section: .text.padata_work_init) -> padata_mt_helper (section: .init.text) WARNING: modpost: vmlinux.o: section mismatch in reference: padata_work_init (section: .text.padata_work_init) -> padata_mt_helper (section: .init.text) LLVM has optimized padata_work_init() to include the address of padata_mt_helper() directly because it inlined the other call to padata_work_init() with padata_parallel_worker(), meaning the remaining uses of padata_work_init() use padata_mt_helper() as the work_fn argument. This optimization causes modpost to complain since padata_work_init() is not __init, whereas padata_mt_helper() is. Since padata_work_init() is only called from __init code when padata_mt_helper() is passed as the work_fn argument, mark padata_work_init() as __ref, which makes it clear to modpost that this scenario is okay. Suggested-by:
Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by:
Nathan Chancellor <nathan@kernel.org> Acked-by:
Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by:
Masahiro Yamada <masahiroy@kernel.org>
-
- Dec 13, 2022
-
-
Steven Rostedt (Google) authored
It is annoying that the filter parsing of triggers do not show up in the error_log. Trying to figure out what is incorrect in the input is difficult when it fails for a typo. Have the errors of filter parsing show up in error_log as well. Link: https://lore.kernel.org/linux-trace-kernel/20221213095602.083fa9fd@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org>
-
Zhen Lei authored
The implementation of function klp_match_callback() is identical to the partial implementation of function klp_find_callback(). So call function klp_match_callback() in function klp_find_callback() instead of the duplicated code. Signed-off-by:
Zhen Lei <thunder.leizhen@huawei.com> Acked-by:
Song Liu <song@kernel.org> Reviewed-by:
Petr Mladek <pmladek@suse.com> Suggested-by:
Petr Mladek <pmladek@suse.com> Signed-off-by:
Luis Chamberlain <mcgrof@kernel.org>
-
- Dec 12, 2022
-
-
Mel Gorman authored
Jan Kara reported the following bug triggering on 6.0.5-rt14 running dbench on XFS on arm64. kernel BUG at fs/inode.c:625! Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP CPU: 11 PID: 6611 Comm: dbench Tainted: G E 6.0.0-rt14-rt+ #1 pc : clear_inode+0xa0/0xc0 lr : clear_inode+0x38/0xc0 Call trace: clear_inode+0xa0/0xc0 evict+0x160/0x180 iput+0x154/0x240 do_unlinkat+0x184/0x300 __arm64_sys_unlinkat+0x48/0xc0 el0_svc_common.constprop.4+0xe4/0x2c0 do_el0_svc+0xac/0x100 el0_svc+0x78/0x200 el0t_64_sync_handler+0x9c/0xc0 el0t_64_sync+0x19c/0x1a0 It also affects 6.1-rc7-rt5 and affects a preempt-rt fork of 5.14 so this is likely a bug that existed forever and only became visible when ARM support was added to preempt-rt. The same problem does not occur on x86-64 and he also reported that converting sb->s_inode_wblist_lock to raw_spinlock_t makes the problem disappear indicating that the RT spinlock variant is the problem. Which in turn means that RT mutexes on ARM64 and any other weakly ordered architecture are affected by this independent of RT. Will Deacon observed: "I'd be more inclined to be suspicious of the slowpath tbh, as we need to make sure that we have acquire semantics on all paths where the lock can be taken. Looking at the rtmutex code, this really isn't obvious to me -- for example, try_to_take_rt_mutex() appears to be able to return via the 'takeit' label without acquire semantics and it looks like we might be relying on the caller's subsequent _unlock_ of the wait_lock for ordering, but that will give us release semantics which aren't correct." Sebastian Andrzej Siewior prototyped a fix that does work based on that comment but it was a little bit overkill and added some fences that should not be necessary. The lock owner is updated with an IRQ-safe raw spinlock held, but the spin_unlock does not provide acquire semantics which are needed when acquiring a mutex. Adds the necessary acquire semantics for lock owner updates in the slow path acquisition and the waiter bit logic. It successfully completed 10 iterations of the dbench workload while the vanilla kernel fails on the first iteration. [ bigeasy@linutronix.de: Initial prototype fix ] Fixes: 700318d1 ("locking/rtmutex: Use acquire/release semantics") Fixes: 23f78d4a ("[PATCH] pi-futex: rt mutex core") Reported-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Mel Gorman <mgorman@techsingularity.net> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221202100223.6mevpbl7i6x5udfd@techsingularity.net
-
Yang Jihong authored
print_trace_line may overflow seq_file buffer. If the event is not consumed, the while loop keeps peeking this event, causing a infinite loop. Link: https://lkml.kernel.org/r/20221129113009.182425-1-yangjihong1@huawei.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: stable@vger.kernel.org Fixes: 088b1e42 ("ftrace: pipe fixes") Signed-off-by:
Yang Jihong <yangjihong1@huawei.com> Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org>
-
Gavrilov Ilia authored
The 'padding' field of the 'rchan_buf' structure is an array of 'size_t' elements, but the memory is allocated for an array of 'size_t *' elements. Found by Linux Verification Center (linuxtesting.org) with SVACE. Link: https://lkml.kernel.org/r/20221129092002.3538384-1-Ilia.Gavrilov@infotecs.ru Fixes: b86ff981 ("[PATCH] relay: migrate from relayfs to a generic relay API") Signed-off-by:
Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: wuchi <wuchi.zero@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Anders Roxell authored
Building kcsan_test with structleak plugin enabled makes the stack frame size to grow. kernel/kcsan/kcsan_test.c:704:1: error: the frame size of 3296 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] Turn off the structleak plugin checks for kcsan_test. Link: https://lkml.kernel.org/r/20221128104358.2660634-1-anders.roxell@linaro.org Signed-off-by:
Anders Roxell <anders.roxell@linaro.org> Suggested-by:
Arnd Bergmann <arnd@arndb.de> Acked-by:
Marco Elver <elver@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Gow <davidgow@google.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Xu Panda authored
The implementation of strscpy() is more robust and safer. That's now the recommended way to copy NUL terminated strings. Link: https://lkml.kernel.org/r/202211220853259244666@zte.com.cn Signed-off-by:
Xu Panda <xu.panda@zte.com.cn> Signed-off-by:
Yang Yang <yang.yang29@zte.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: wuchi <wuchi.zero@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Alexander Potapenko authored
Lockdep and KMSAN used to play badly together, causing deadlocks when KMSAN instrumentation of lockdep.c called lockdep functions recursively. Looks like this is no more the case, and a kernel can run (yet slower) with both KMSAN and lockdep enabled. This patch should fix false positives on wq_head->lock->dep_map, which KMSAN used to consider uninitialized because of lockdep.c not being instrumented. Link: https://lore.kernel.org/lkml/Y3b9AAEKp2Vr3e6O@sol.localdomain/ Link: https://lkml.kernel.org/r/20221128094541.2645890-1-glider@google.com Signed-off-by:
Alexander Potapenko <glider@google.com> Reported-by:
Eric Biggers <ebiggers@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Marco Elver <elver@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Dec 10, 2022
-
-
Eduard Zingerman authored
An update for verifier.c:states_equal()/regsafe() to use check_ids() for active spin lock comparisons. This fixes the issue reported by Kumar Kartikeya Dwivedi in [1] using technique suggested by Edward Cree. W/o this commit the verifier might be tricked to accept the following program working with a map containing spin locks: 0: r9 = map_lookup_elem(...) ; Returns PTR_TO_MAP_VALUE_OR_NULL id=1. 1: r8 = map_lookup_elem(...) ; Returns PTR_TO_MAP_VALUE_OR_NULL id=2. 2: if r9 == 0 goto exit ; r9 -> PTR_TO_MAP_VALUE. 3: if r8 == 0 goto exit ; r8 -> PTR_TO_MAP_VALUE. 4: r7 = ktime_get_ns() ; Unbound SCALAR_VALUE. 5: r6 = ktime_get_ns() ; Unbound SCALAR_VALUE. 6: bpf_spin_lock(r8) ; active_lock.id == 2. 7: if r6 > r7 goto +1 ; No new information about the state ; is derived from this check, thus ; produced verifier states differ only ; in 'insn_idx'. 8: r9 = r8 ; Optionally make r9.id == r8.id. --- checkpoint --- ; Assume is_state_visisted() creates a ; checkpoint here. 9: bpf_spin_unlock(r9) ; (a,b) active_lock.id == 2. ; (a) r9.id == 2, (b) r9.id == 1. 10: exit(0) Consider two verification paths: (a) 0-10 (b) 0-7,9-10 The path (a) is verified first. If checkpoint is created at (8) the (b) would assume that (8) is safe because regsafe() does not compare register ids for registers of type PTR_TO_MAP_VALUE. [1] https://lore.kernel.org/bpf/20221111202719.982118-1-memxor@gmail.com/ Reported-by:
Kumar Kartikeya Dwivedi <memxor@gmail.com> Suggested-by:
Edward Cree <ecree.xilinx@gmail.com> Signed-off-by:
Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20221209135733.28851-6-eddyz87@gmail.com Signed-off-by:
Alexei Starovoitov <ast@kernel.org>
-
Eduard Zingerman authored
verifier.c:states_equal() must maintain register ID mapping across all function frames. Otherwise the following example might be erroneously marked as safe: main: fp[-24] = map_lookup_elem(...) ; frame[0].fp[-24].id == 1 fp[-32] = map_lookup_elem(...) ; frame[0].fp[-32].id == 2 r1 = &fp[-24] r2 = &fp[-32] call foo() r0 = 0 exit foo: 0: r9 = r1 1: r8 = r2 2: r7 = ktime_get_ns() 3: r6 = ktime_get_ns() 4: if (r6 > r7) goto skip_assign 5: r9 = r8 skip_assign: ; <--- checkpoint 6: r9 = *r9 ; (a) frame[1].r9.id == 2 ; (b) frame[1].r9.id == 1 7: if r9 == 0 goto exit: ; mark_ptr_or_null_regs() transfers != 0 info ; for all regs sharing ID: ; (a) r9 != 0 => &frame[0].fp[-32] != 0 ; (b) r9 != 0 => &frame[0].fp[-24] != 0 8: r8 = *r8 ; (a) r8 == &frame[0].fp[-32] ; (b) r8 == &frame[0].fp[-32] 9: r0 = *r8 ; (a) safe ; (b) unsafe exit: 10: exit While processing call to foo() verifier considers the following execution paths: (a) 0-10 (b) 0-4,6-10 (There is also path 0-7,10 but it is not interesting for the issue at hand. (a) is verified first.) Suppose that checkpoint is created at (6) when path (a) is verified, next path (b) is verified and (6) is reached. If states_equal() maintains separate 'idmap' for each frame the mapping at (6) for frame[1] would be empty and regsafe(r9)::check_ids() would add a pair 2->1 and return true, which is an error. If states_equal() maintains single 'idmap' for all frames the mapping at (6) would be { 1->1, 2->2 } and regsafe(r9)::check_ids() would return false when trying to add a pair 2->1. This issue was suggested in the following discussion: https://lore.kernel.org/bpf/CAEf4BzbFB5g4oUfyxk9rHy-PJSLQ3h8q9mV=rVoXfr_JVm8+1Q@mail.gmail.com/ Suggested-by:
Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by:
Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20221209135733.28851-4-eddyz87@gmail.com Signed-off-by:
Alexei Starovoitov <ast@kernel.org>
-
Eduard Zingerman authored
The verifier.c:regsafe() has the following shortcut: equal = memcmp(rold, rcur, offsetof(struct bpf_reg_state, parent)) == 0; ... if (equal) return true; Which is executed regardless old register type. This is incorrect for register types that might have an ID checked by check_ids(), namely: - PTR_TO_MAP_KEY - PTR_TO_MAP_VALUE - PTR_TO_PACKET_META - PTR_TO_PACKET The following pattern could be used to exploit this: 0: r9 = map_lookup_elem(...) ; Returns PTR_TO_MAP_VALUE_OR_NULL id=1. 1: r8 = map_lookup_elem(...) ; Returns PTR_TO_MAP_VALUE_OR_NULL id=2. 2: r7 = ktime_get_ns() ; Unbound SCALAR_VALUE. 3: r6 = ktime_get_ns() ; Unbound SCALAR_VALUE. 4: if r6 > r7 goto +1 ; No new information about the state ; is derived from this check, thus ; produced verifier states differ only ; in 'insn_idx'. 5: r9 = r8 ; Optionally make r9.id == r8.id. --- checkpoint --- ; Assume is_state_visisted() creates a ; checkpoint here. 6: if r9 == 0 goto <exit> ; Nullness info is propagated to all ; registers with matching ID. 7: r1 = *(u64 *) r8 ; Not always safe. Verifier first visits path 1-7 where r8 is verified to be not null at (6). Later the jump from 4 to 6 is examined. The checkpoint for (6) looks as follows: R8_rD=map_value_or_null(id=2,off=0,ks=4,vs=8,imm=0) R9_rwD=map_value_or_null(id=2,off=0,ks=4,vs=8,imm=0) R10=fp0 The current state is: R0=... R6=... R7=... fp-8=... R8=map_value_or_null(id=2,off=0,ks=4,vs=8,imm=0) R9=map_value_or_null(id=1,off=0,ks=4,vs=8,imm=0) R10=fp0 Note that R8 states are byte-to-byte identical, so regsafe() would exit early and skip call to check_ids(), thus ID mapping 2->2 will not be added to 'idmap'. Next, states for R9 are compared: these are not identical and check_ids() is executed, but 'idmap' is empty, so check_ids() adds mapping 2->1 to 'idmap' and returns success. This commit pushes the 'equal' down to register types that don't need check_ids(). Signed-off-by:
Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20221209135733.28851-2-eddyz87@gmail.com Signed-off-by:
Alexei Starovoitov <ast@kernel.org>
-
Daniel Bristot de Oliveira authored
The osnoise workload runs with preemption and IRQs enabled in such a way as to allow all sorts of noise to disturb osnoise's execution. hwlat tracer has a similar workload but works with irq disabled, allowing only NMIs and the hardware to generate noise. While thinking about adding an options file to hwlat tracer to allow the system to panic, and other features I was thinking to add, like having a tracepoint at each noise detection, it came to my mind that is easier to make osnoise and also do hardware latency detection than making hwlat "feature compatible" with osnoise. Other points are: - osnoise already has an independent cpu file. - osnoise has a more intuitive interface, e.g., runtime/period vs. window/width (and people often need help remembering what it is). - osnoise: tracepoints - osnoise stop options - osnoise options file itself Moreover, the user-space side (in rtla) is simplified by reusing the existing osnoise code. Finally, people have been asking me about using osnoise for hw latency detection, and I have to explain that it was sufficient but not necessary. These options make it sufficient and necessary. Adding a Suggested-by Clark, as he often asked me about this possibility. Link: https://lkml.kernel.org/r/d9c6c19135497054986900f94c8e47410b15316a.1670623111.git.bristot@kernel.org Cc: Suggested-by: Clark Williams <williams@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by:
Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org>
-
Daniel Bristot de Oliveira authored
Often the latency observed in a CPU is not caused by the work being done in the CPU itself, but by work done on another CPU that causes the hardware to stall all CPUs. In this case, it is interesting to know what is happening on ALL CPUs, and the best way to do this is via crash dump analysis. Add the PANIC_ON_STOP option to osnoise/timerlat tracers. The default behavior is having this option off. When enabled by the user, the system will panic after hitting a stop tracing condition. This option was motivated by a real scenario that Juri Lelli and I were debugging. Link: https://lkml.kernel.org/r/249ce4287c6725543e6db845a6e0df621dc67db5.1670623111.git.bristot@kernel.org Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by:
Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org>
-