- Jan 24, 2024
Frederic Weisbecker authored
When the CPU goes idle for the last time during the CPU down hotplug process, RCU reports a final quiescent state for the current CPU. If this quiescent state propagates up to the top, some tasks may then be woken up to complete the grace period: the main grace period kthread and/or the expedited main workqueue (or kworker).

If those kthreads have a SCHED_FIFO policy, the wakeup can indirectly arm the RT bandwidth timer on the local offline CPU. Since this happens after hrtimers have been migrated at the CPUHP_AP_HRTIMERS_DYING stage, the timer gets ignored. Therefore, if the RCU kthreads are waiting for RT bandwidth to become available, they may never actually be scheduled.

This triggers TREE03 rcutorture hangs:

    rcu: INFO: rcu_preempt self-detected stall on CPU
    rcu: 4-...!: (1 GPs behind) idle=9874/1/0x4000000000000000 softirq=0/0 fqs=20 rcuc=21071 jiffies(starved)
    rcu: (t=21035 jiffies g=938281 q=40787 ncpus=6)
    rcu: rcu_preempt kthread starved for 20964 jiffies! g938281 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
    rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
    rcu: RCU grace-period kthread stack dump:
    task:rcu_preempt state:R running task stack:14896 pid:14 tgid:14 ppid:2 flags:0x00004000
    Call Trace:
    <TASK>
    __schedule+0x2eb/0xa80
    schedule+0x1f/0x90
    schedule_timeout+0x163/0x270
    ? __pfx_process_timeout+0x10/0x10
    rcu_gp_fqs_loop+0x37c/0x5b0
    ? __pfx_rcu_gp_kthread+0x10/0x10
    rcu_gp_kthread+0x17c/0x200
    kthread+0xde/0x110
    ? __pfx_kthread+0x10/0x10
    ret_from_fork+0x2b/0x40
    ? __pfx_kthread+0x10/0x10
    ret_from_fork_asm+0x1b/0x30
    </TASK>

The situation can't be solved by just unpinning the timer. The hrtimer infrastructure and the nohz heuristics involved in finding the best remote target for an unpinned timer would then also need to handle enqueues from an offline CPU in the most horrendous way.

So fix this on the RCU side instead, and defer the wakeup to an online CPU if it's too late for the local one.

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Fixes: 5c0930cc ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
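The deferral idea can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel change. The names struct cpu_model, hrtimers_migrated, pick_online_cpu() and gp_kthread_wake() are all hypothetical. The deferred_wakeups counter stands in for whatever cross-CPU mechanism actually delivers the wakeup.

```c
#include <stdbool.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count for the model */

/* Hypothetical per-CPU state; none of these names come from the kernel. */
struct cpu_model {
	bool online;		/* CPU is still online */
	bool hrtimers_migrated;	/* hrtimers already pushed away (HRTIMERS_DYING done) */
	int deferred_wakeups;	/* wakeups other CPUs asked this CPU to perform */
};

static struct cpu_model cpus[NR_CPUS_MODEL];

/* Return the first online CPU other than @self, or -1 if there is none. */
static int pick_online_cpu(int self)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (cpu != self && cpus[cpu].online)
			return cpu;
	return -1;
}

/*
 * Wake the grace-period kthread on behalf of CPU @self. If @self is past
 * the hrtimer migration step, a timer armed locally would be ignored, so
 * hand the wakeup to an online CPU instead. Returns the CPU that performs
 * the wakeup, or -1 if no CPU is available.
 */
static int gp_kthread_wake(int self)
{
	int target;

	if (!cpus[self].hrtimers_migrated)
		return self;			/* normal case: wake locally */

	target = pick_online_cpu(self);
	if (target >= 0)
		cpus[target].deferred_wakeups++;	/* stand-in for a cross-CPU request */
	return target;
}
```

In this model, CPU 1 plays the outgoing CPU reporting its final quiescent state, and its wakeup lands on CPU 0, which is still online.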
-
- Dec 13, 2023
Zqiang authored
If an rcutorture test scenario creates an fqs_task kthread, it will periodically invoke rcu_force_quiescent_state() in order to start force-quiescent-state (FQS) operations. However, an FQS operation will be started even if there is no RCU grace period in progress. Although testing FQS-operation startup when there is no grace period in progress is necessary, it need not happen all that often. This commit therefore causes rcu_force_quiescent_state() to take an early exit if there is no grace period in progress.

Note that there will still be attempts to start an FQS scan in the absence of a grace period, because the grace period might end right after the rcu_force_quiescent_state() function's check. In actual testing, this happens about once every ten minutes, which should provide adequate testing.

Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
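The early exit can be sketched in userspace using RCU's sequence-number encoding. In that encoding, the low two bits of a grace-period sequence number hold its state, and they are zero when no grace period is in progress. The names gp_seq, fqs_started, gp_in_progress() and force_quiescent_state() are hypothetical stand-ins. The real rcu_force_quiescent_state() does considerably more.

```c
#include <stdbool.h>

/* Low SEQ_CTR_SHIFT bits hold the grace-period state; the rest count GPs. */
#define SEQ_CTR_SHIFT	2
#define SEQ_STATE_MASK	((1UL << SEQ_CTR_SHIFT) - 1)

static unsigned long gp_seq;		/* hypothetical global GP sequence */
static unsigned long fqs_started;	/* FQS scans actually started */

/* A non-zero state field means a grace period is in progress. */
static bool gp_in_progress(void)
{
	return (gp_seq & SEQ_STATE_MASK) != 0;
}

/* Hypothetical stand-in for the FQS request path. */
static void force_quiescent_state(void)
{
	if (!gp_in_progress())
		return;		/* early exit: no grace period to force */
	fqs_started++;		/* otherwise start an FQS scan */
}
```

The check and the scan are not atomic with respect to the end of a grace period. That is why, as noted above, a scan can still occasionally start when no grace period is in progress.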
-
- Dec 11, 2023
Frederic Weisbecker authored
If an SRCU barrier is queued while callbacks are running, and a new callback invoker for the same sdp were to run concurrently, the SRCU barrier might execute too early. As this requirement is non-obvious, make sure to keep a record of it.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
-
Frederic Weisbecker authored
At grace-period start, if no callback is to be enqueued, there is nothing to accelerate and therefore no need to advance the callbacks either. Spare these needless operations in this case.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
-
Frederic Weisbecker authored
Callback advancing on SRCU must be performed in two specific places:

1) At enqueue time, in order to make room for the acceleration of the new callback.

2) At invocation time, in order to move the callbacks that are ready to invoke.

Any other callback-advancing callsite is needless. Remove the remaining one in srcu_gp_start().

Co-developed-by: Yong He <zhuangel570@gmail.com>
Signed-off-by: Yong He <zhuangel570@gmail.com>
Co-developed-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Co-developed-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
-
Paul E. McKenney authored
The RCU CPU stall notifiers can be useful for dumping state when tracking down delicate forward-progress bugs where NUMA effects cause cache lines to be delivered to a given CPU regularly, but always in a state that prevents that CPU from making forward progress. These bugs can be detected by the RCU CPU stall-warning mechanism, but in some cases, the stall-warning printk()s disrupt the forward-progress bug before any useful state can be obtained.

Unfortunately, the notifier mechanism added by commit 5b404fda ("rcu: Add RCU CPU stall notifier") can make matters worse if used at all carelessly. For example, if the stall warning was caused by a lock not being released, then any attempt to acquire that lock in the notifier will hang. This will not only prevent the notifier from producing any useful output, but will also prevent the stall-warning message from ever appearing.

This commit therefore hides this new RCU CPU stall notifier mechanism under a new RCU_CPU_STALL_NOTIFIER Kconfig option that depends on both DEBUG_KERNEL and RCU_EXPERT. In addition, the rcupdate.rcu_cpu_stall_notifiers=1 kernel boot parameter must also be specified. The RCU_CPU_STALL_NOTIFIER Kconfig option's help text contains a warning and explains the dangers of careless use, recommending lockless notifier code. In addition, a WARN() is triggered each time that an attempt is made to register a stall-warning notifier in kernels built with CONFIG_RCU_CPU_STALL_NOTIFIER=y.

This combination of measures will keep use of this mechanism confined to debug kernels and away from routine deployments.

[ paulmck: Apply Dan Carpenter feedback. ]

Fixes: 5b404fda ("rcu: Add RCU CPU stall notifier")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
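The layering of safeguards can be modeled as follows. This is a hypothetical sketch with several assumptions:

- The flag names, register_stall_notifier() and the -1 error return are invented for illustration.
- warn_count stands in for the WARN() splat.
- The boot parameter is modeled as gating registration itself; the real check could sit elsewhere.

```c
#include <stdbool.h>

static bool config_stall_notifier;	/* models CONFIG_RCU_CPU_STALL_NOTIFIER=y */
static bool boot_param_enabled;		/* models rcupdate.rcu_cpu_stall_notifiers=1 */
static int warn_count;			/* stands in for WARN() splats */
static int nr_registered;		/* notifiers accepted so far */

/* Returns 0 on success, -1 when registration is refused (assumed code). */
static int register_stall_notifier(void)
{
	if (!config_stall_notifier)
		return -1;		/* mechanism not built into this kernel */
	warn_count++;			/* every attempt warns in such kernels */
	if (!boot_param_enabled)
		return -1;		/* not enabled on the kernel command line */
	nr_registered++;
	return 0;
}
```

Only a kernel that has both the Kconfig option and the boot parameter accepts a notifier, and even then each attempt is loudly flagged.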
-
Paul E. McKenney authored
The task_struct structure's ->rcu_tasks_idle_cpu field can be concurrently read and written from the RCU Tasks grace-period kthread and from the CPU on which the task_struct structure's task is running. This commit therefore marks the accesses appropriately.

Reported-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
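"Marking" such accesses usually means wrapping them in READ_ONCE() and WRITE_ONCE(). The sketch below uses userspace approximations of those macros built on volatile casts (GNU __typeof__ extension). These stop the compiler from fusing, eliding or re-fetching the access. struct task_model and the -1 "not idle" convention are hypothetical.

```c
/* Userspace approximations of the kernel macros (GNU __typeof__). */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical stand-in for the relevant part of task_struct. */
struct task_model {
	int rcu_tasks_idle_cpu;	/* CPU the task idles on, -1 if not idle */
};

/* Writer side: the task's own CPU records where it is idling. */
static void enter_idle(struct task_model *t, int cpu)
{
	WRITE_ONCE(t->rcu_tasks_idle_cpu, cpu);
}

static void exit_idle(struct task_model *t)
{
	WRITE_ONCE(t->rcu_tasks_idle_cpu, -1);
}

/* Reader side: the grace-period kthread samples the field concurrently. */
static int sample_idle_cpu(struct task_model *t)
{
	return READ_ONCE(t->rcu_tasks_idle_cpu);
}
```

Every access to the shared field, on both sides, goes through a marked accessor, so no plain load or store races with the other side.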
-
- Nov 23, 2023
Zqiang authored
For rcutorture tests on RCU implementations that support force-quiescent-state operations and that set the fqs_duration module parameter greater than zero, the fqs_task kthread will be created. However, if the fqs_holdoff module parameter is not set, then its default value of zero will cause fqs_task to enter a long-term busy loop until stopped by kthread_stop(). This commit therefore adds an fqs_holdoff check before the fqs_task is created, making sure that whenever the fqs_task is created, fqs_holdoff will be greater than zero.

Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
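The invariant can be expressed as a single predicate: create the kthread only if it will actually sleep between FQS bursts. In this hypothetical sketch:

- struct fqs_params and should_create_fqs_task() are invented names.
- have_fqs models "this RCU flavor supports FQS operations".
- It is not stated here whether the real fix disables the task or substitutes a default holdoff.

```c
#include <stdbool.h>

/* Hypothetical model of the two module parameters. */
struct fqs_params {
	int fqs_duration;	/* > 0 requests FQS bursts */
	int fqs_holdoff;	/* pause between bursts; 0 would mean busy-looping */
};

/* Spawn fqs_task only when it will actually sleep between bursts. */
static bool should_create_fqs_task(const struct fqs_params *p, bool have_fqs)
{
	return have_fqs && p->fqs_duration > 0 && p->fqs_holdoff > 0;
}
```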
-
- Nov 01, 2023
Frederic Weisbecker authored
The commit:

    cff9b233 ("kernel/sched: Modify initial boot task idle setup")

has changed the semantics of what is to be considered an idle task in such a way that the idle task of an offline CPU may not carry the PF_IDLE flag anymore.

However, RCU-tasks-trace tests the opposite assertion, still assuming that idle tasks carry the PF_IDLE flag during their whole lifecycle.

Remove this assumption to avoid spurious warnings, but keep the initial test verifying that the idle task is the current task on any offline CPU.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes: cff9b233 ("kernel/sched: Modify initial boot task idle setup")
Suggested-by: Joel Fernandes <joel@joelfernandes.org>
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
-
Frederic Weisbecker authored
The commit:

    cff9b233 ("kernel/sched: Modify initial boot task idle setup")

has changed the semantics of what is to be considered an idle task in such a way that CPU boot code preceding the actual idle loop is excluded from it.

This has, however, introduced new potential RCU-tasks stalls when either:

1) A grace period is started before init/0 had a chance to set PF_IDLE, keeping it stuck in the holdout list until the idle task schedules, if it ever does.

2) A grace period is started when some possible CPUs have never been online, keeping their idle tasks stuck in the holdout list until those CPUs boot up, if they ever do.

3) Similar to 1), but with secondary CPUs: a grace period is started concurrently with secondary CPU booting, putting its idle task in the holdout list because PF_IDLE isn't yet observed on it. It then stays stuck in the holdout list until that CPU schedules, if it ever does. The effect is mitigated here by the hotplug AP thread that must run to bring the CPU up.

Fix this by handling the new semantics of PF_IDLE, keeping in mind that it may or may not be set on an idle task. Take advantage of that to strengthen the coverage of an RCU-tasks quiescent state within an idle task, excluding the CPU boot code from it. Only the code running within the idle loop is now a quiescent state, along with offline CPUs.

Fixes: cff9b233 ("kernel/sched: Modify initial boot task idle setup")
Suggested-by: Joel Fernandes <joel@joelfernandes.org>
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
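The resulting rule for idle tasks can be captured in a few lines: an idle task counts as quiescent only inside the idle loop proper, or when its CPU is offline. This is a hypothetical model, not the RCU-tasks code. struct idle_task_model and idle_task_quiescent() are invented names. rcu_cpu_online_model[] stands in for RCU core's view of which CPUs are online.

```c
#include <stdbool.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count */

/* Stand-in for RCU core's view of which CPUs are online. */
static bool rcu_cpu_online_model[NR_CPUS_MODEL];

struct idle_task_model {
	int cpu;	/* CPU this idle task belongs to */
	bool pf_idle;	/* PF_IDLE: set only once the idle loop proper runs */
};

/* May this idle task be treated as an RCU-tasks quiescent state? */
static bool idle_task_quiescent(const struct idle_task_model *t)
{
	if (!rcu_cpu_online_model[t->cpu])
		return true;	/* offline CPU, including one never booted */
	return t->pf_idle;	/* boot code before the idle loop is not quiescent */
}
```

An idle task whose CPU was never brought online no longer holds up a grace period. An idle task still running boot code on an online CPU is waited on.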
-
Frederic Weisbecker authored
Export the RCU point of view as to when a CPU is considered offline (i.e., when RCU considers a CPU to be sufficiently far down the hotplug process that it can no longer have any RCU read-side critical section). This will be used by RCU-tasks, whose view of an offline CPU should reasonably match that of RCU core.

Fixes: cff9b233 ("kernel/sched: Modify initial boot task idle setup")
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
-
Peter Zijlstra authored
Commit 851a723e ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()") added a kfree() call to free any user-provided affinity mask, if present. It was later changed to use kfree_rcu() in commit 9a5418bc ("sched/core: Use kfree_rcu() in do_set_cpus_allowed()") to avoid a circular locking dependency problem.

It turns out that even kfree_rcu() isn't safe for avoiding circular locking problems. As reported by the kernel test robot, the following circular locking dependency now exists:

    &rdp->nocb_lock --> rcu_node_0 --> &rq->__lock

Solve this by breaking the rcu_node_0 --> &rq->__lock chain, moving the resched_cpu() call out from under the rcu_node lock.

[peterz: heavily borrowed from Waiman's Changelog]
[paulmck: applied Z qiang feedback]

Fixes: 851a723e ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/oe-lkp/202310302207.a25f1a30-oliver.sang@intel.com
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
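The fix follows a common pattern: make the decision while holding the lock, but perform the action that takes another lock only after dropping it. The single-threaded model below only illustrates the ordering. node_lock_held stands in for holding the rcu_node lock, and resched_target() stands in for resched_cpu(), which takes the runqueue lock. All names here are hypothetical.

```c
#include <stdbool.h>

static bool node_lock_held;		/* models holding the rcu_node lock */
static int resched_calls;
static bool resched_under_node_lock;	/* records the forbidden nesting */

static void node_lock(void)   { node_lock_held = true; }
static void node_unlock(void) { node_lock_held = false; }

/* Stand-in for resched_cpu(): would take the rq lock internally. */
static void resched_target(int cpu)
{
	(void)cpu;
	if (node_lock_held)
		resched_under_node_lock = true;	/* would create node -> rq */
	resched_calls++;
}

/* Decide under the lock, act after dropping it. */
static void check_cpu(int cpu, bool cpu_needs_resched)
{
	bool resched;

	node_lock();
	resched = cpu_needs_resched;	/* decision made from locked state */
	node_unlock();

	if (resched)
		resched_target(cpu);	/* no rcu_node -> rq ordering created */
}
```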
-
- Oct 13, 2023
Frederic Weisbecker authored
Acceleration in SRCU happens at enqueue time for each new callback. This operation is expected not to fail, and therefore any similar attempt from other places shouldn't find any remaining callbacks to accelerate.

Moreover, accelerations performed beyond enqueue time are error prone, because rcu_seq_snap() may then return the snapshot for a new grace period that is not going to be started.

Remove these dangerous and needless accelerations, and instead introduce assertions that report unaccelerated callbacks leaking beyond enqueue time.

Co-developed-by: Yong He <alexyonghe@tencent.com>
Signed-off-by: Yong He <alexyonghe@tencent.com>
Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Reviewed-by: Like Xu <likexu@tencent.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
-
- Oct 10, 2023
Frederic Weisbecker authored
SRCU callback acceleration might fail if the preceding callback advance also fails. This can happen when the following steps occur:

1) The RCU_WAIT_TAIL segment has callbacks (say for gp_num 8) and the RCU_NEXT_READY_TAIL segment also has callbacks (say for gp_num 12).

2) The grace period for RCU_WAIT_TAIL is observed as started but not yet completed, so rcu_seq_current() returns 4 + SRCU_STATE_SCAN1 = 5.

3) This value is passed to rcu_segcblist_advance(), which can't move any segment forward and fails.

4) srcu_gp_start_if_needed() still proceeds with callback acceleration. But then the call to rcu_seq_snap() observes the grace period for the RCU_WAIT_TAIL segment (gp_num 8) as completed and the subsequent one for the RCU_NEXT_READY_TAIL segment as started (i.e., 8 + SRCU_STATE_SCAN1 = 9), so it returns a snapshot of the next grace period, which is 16.

5) The value of 16 is passed to rcu_segcblist_accelerate(), but the freshly enqueued callback in RCU_NEXT_TAIL can't move to RCU_NEXT_READY_TAIL, which already has callbacks for a previous grace period (gp_num = 12). So acceleration fails.

6) Note that in all these steps, srcu_invoke_callbacks() hasn't had a chance to run.

Then a very bad outcome may follow:

7) Some other CPU races and starts grace period number 16 before the CPU handling the previous steps had a chance to. Therefore srcu_gp_start() isn't called on the latter sdp to fix the acceleration leak from the previous steps with a new pair of calls to advance/accelerate.

8) Grace period 16 completes and srcu_invoke_callbacks() is finally called. All the callbacks from previous grace periods (8 and 12) are correctly advanced and executed, but callbacks in RCU_NEXT_READY_TAIL still remain. Then rcu_segcblist_accelerate() is called with a snapshot of 20.

9) Since nothing started grace period number 20, callbacks stay unhandled.

This has been reported under a real workload:

    [3144162.608392] INFO: task kworker/136:12:252684 blocked for more than 122 seconds.
    [3144162.615986] Tainted: G O K 5.4.203-1-tlinux4-0011.1 #1
    [3144162.623053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [3144162.631162] kworker/136:12 D 0 252684 2 0x90004000
    [3144162.631189] Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
    [3144162.631192] Call Trace:
    [3144162.631202] __schedule+0x2ee/0x660
    [3144162.631206] schedule+0x33/0xa0
    [3144162.631209] schedule_timeout+0x1c4/0x340
    [3144162.631214] ? update_load_avg+0x82/0x660
    [3144162.631217] ? raw_spin_rq_lock_nested+0x1f/0x30
    [3144162.631218] wait_for_completion+0x119/0x180
    [3144162.631220] ? wake_up_q+0x80/0x80
    [3144162.631224] __synchronize_srcu.part.19+0x81/0xb0
    [3144162.631226] ? __bpf_trace_rcu_utilization+0x10/0x10
    [3144162.631227] synchronize_srcu+0x5f/0xc0
    [3144162.631236] irqfd_shutdown+0x3c/0xb0 [kvm]
    [3144162.631239] ? __schedule+0x2f6/0x660
    [3144162.631243] process_one_work+0x19a/0x3a0
    [3144162.631244] worker_thread+0x37/0x3a0
    [3144162.631247] kthread+0x117/0x140
    [3144162.631247] ? process_one_work+0x3a0/0x3a0
    [3144162.631248] ? __kthread_cancel_work+0x40/0x40
    [3144162.631250] ret_from_fork+0x1f/0x30

Fix this by taking the snapshot for acceleration _before_ the read of the current grace period number.

The only side effect of this solution is that callback advancing then happens _after_ the full barrier in rcu_seq_snap(). This is not a problem, because that barrier only cares about:

1) Ordering accesses of the update side before call_srcu() so they don't bleed.

2) Seeing all the accesses prior to the grace period of the current gp_num.

The only things callback advancing needs to be ordered against are carried by snp locking.

Reported-by: Yong He <alexyonghe@tencent.com>
Co-developed-by: Yong He <alexyonghe@tencent.com>
Signed-off-by: Yong He <alexyonghe@tencent.com>
Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>
Link: http://lore.kernel.org/CANZk6aR+CqZaqmMWrC2eRRPY12qAZnDZLwLnHZbNi=xXMB401g@mail.gmail.com
Fixes: da915ad5 ("srcu: Parallelize callback handling")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
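The arithmetic in steps 2 to 5 can be checked with a small userspace model:

- seq_snap() reproduces the formula used by the kernel's rcu_seq_snap(). There are two low-order state bits, so the counter advances by 4 per grace period.
- struct seglist and accelerate() are a simplified, hypothetical version of rcu_segcblist_accelerate(). It counts callbacks instead of linking them and ignores counter wraparound.

```c
#include <stdbool.h>

#define SEQ_CTR_SHIFT	2
#define SEQ_STATE_MASK	((1UL << SEQ_CTR_SHIFT) - 1)

/* Same formula as rcu_seq_snap(): the sequence value at which a grace
 * period starting after this point will have completed. */
static unsigned long seq_snap(unsigned long cur)
{
	return (cur + 2 * SEQ_STATE_MASK + 1) & ~SEQ_STATE_MASK;
}

enum { SEG_DONE, SEG_WAIT, SEG_NEXT_READY, SEG_NEXT, NR_SEGS };

/* Simplified segmented callback list: counts only, no wraparound. */
struct seglist {
	int count[NR_SEGS];		/* callbacks in each segment */
	unsigned long gp[NR_SEGS];	/* GP number each pending segment waits for */
};

/* Simplified rcu_segcblist_accelerate(): assign callbacks queued after the
 * last segment that finishes before @s to grace period @s. Returns false
 * when there is nothing to move or no segment is free to hold them. */
static bool accelerate(struct seglist *l, unsigned long s)
{
	int i, j, rest = 0;

	/* Find the last non-empty segment whose GP ends before @s. */
	for (i = SEG_NEXT_READY; i > SEG_DONE; i--)
		if (l->count[i] && l->gp[i] < s)
			break;
	for (j = i + 1; j < NR_SEGS; j++)
		rest += l->count[j];
	if (rest == 0 || ++i >= SEG_NEXT)
		return false;
	for (j = i + 1; j < NR_SEGS; j++) {	/* merge later segments into i */
		l->count[i] += l->count[j];
		l->count[j] = 0;
	}
	for (j = i; j < SEG_NEXT; j++)
		l->gp[j] = s;
	return true;
}
```

Start with one callback each in the WAIT (GP 8), NEXT_READY (GP 12) and NEXT segments:

- A snapshot taken at 5, before the grace period ending at 8 is seen as complete, gives 12. The new callback then joins the existing group for 12.
- A snapshot taken at 9 gives 16. The callback is left behind in NEXT, which matches step 5.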
-
- Oct 04, 2023
Frederic Weisbecker authored
rcu_report_dead() and rcutree_migrate_callbacks() have their headers in rcupdate.h, while they are pure rcutree calls, like the other CPU-hotplug functions.

Also, rcu_cpu_starting() and rcu_report_dead() follow different naming conventions even though they mirror each other's effects.

Fix the headers and propose a naming that relates both functions and aligns with the prefix of other rcutree CPU-hotplug functions.

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
-
Frederic Weisbecker authored
Among the three CPU-hotplug teardown RCU callbacks, two of them exit early if CONFIG_HOTPLUG_CPU=n, and the third doesn't. In any case, all of them have an implementation when CONFIG_HOTPLUG_CPU=n.

Align instead with the common way to deal with CPU-hotplug teardown callbacks and provide a proper stub when they are not supported.

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
-
Qi Zheng authored
Use new APIs to dynamically allocate the rcu-kfree shrinker.

Link: https://lkml.kernel.org/r/20230911094444.68966-17-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Chandan Babu R <chandan.babu@oracle.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Chuck Lever <cel@kernel.org>
Cc: Coly Li <colyli@suse.de>
Cc: Dai Ngo <Dai.Ngo@oracle.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Marijn Suijten <marijn.suijten@somainline.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nadav Amit <namit@vmware.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Olga Kornievskaia <kolga@netapp.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sean Paul <sean@poorly.run>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Song Liu <song@kernel.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Qi Zheng authored
Use new APIs to dynamically allocate the rcu-lazy shrinker.

Link: https://lkml.kernel.org/r/20230911094444.68966-16-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Chandan Babu R <chandan.babu@oracle.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Chuck Lever <cel@kernel.org>
Cc: Coly Li <colyli@suse.de>
Cc: Dai Ngo <Dai.Ngo@oracle.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Marijn Suijten <marijn.suijten@somainline.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Olga Kornievskaia <kolga@netapp.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sean Paul <sean@poorly.run>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Song Liu <song@kernel.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Frederic Weisbecker authored
rcu_report_dead() has to be called locally by the CPU that is going to exit the RCU state machine. Passing a cpu argument here is error-prone and leaves open the possibility of a racy remote call.

Use local access instead.

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
-
Frederic Weisbecker authored
rcu_report_dead() is the last RCU word from the CPU down through the hotplug path. It is called in the idle loop right before the CPU shuts down for good. Because it removes the CPU from the grace-period state machine and reports an ultimate quiescent state if necessary, no further use of RCU is allowed. Therefore, IRQs are expected to be disabled upon calling this function and not to be re-enabled until the CPU shuts down.

Remove the IRQ disabling from that function and instead verify that it is actually called with IRQs disabled, as expected at that special point in the idle path.

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
-
Frederic Weisbecker authored
This makes the code more readable.

Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
-
Catalin Marinas authored
Since the actual slab freeing is deferred when calling kvfree_rcu(), so is the kmemleak_free() callback informing kmemleak of the object deletion. From the perspective of the kvfree_rcu() caller, the object is freed and it may remove any references to it. Since kmemleak does not scan the RCU internal data storing the pointer, it will report such objects as leaks during the grace period.

Tell kmemleak to ignore such objects on the kvfree_call_rcu() path. Note that the tiny RCU implementation does not have such an issue, since the objects can be tracked from the rcu_ctrlblk structure.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://lore.kernel.org/all/F903A825-F05F-4B77-A2B5-7356282FBA2C@apple.com/
Cc: <stable@vger.kernel.org>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
-
- Sep 26, 2023
Denis Arefev authored
The value of the bitwise expression 1 << (cpu - sdp->mynode->grplo) is subject to overflow due to a failure to cast operands to a larger data type before performing the bitwise operation.

The maximum result of this subtraction is defined by the RCU_FANOUT_LEAF Kconfig option, which on 64-bit systems defaults to 16 (resulting in a maximum shift of 15), but which can be set as high as 64 (resulting in a maximum shift of 63). A value of 31 can result in sign extension, resulting in 0xffffffff80000000 instead of the desired 0x80000000. A value of 32 or greater triggers undefined behavior per the C standard.

This bug has not been known to cause issues because almost all kernels take the default CONFIG_RCU_FANOUT_LEAF=16. Furthermore, as long as a given compiler gives a deterministic non-zero result for 1<<N for N>=32, the code correctly invokes all SRCU callbacks, albeit wasting CPU time along the way.

This commit therefore substitutes the correct 1UL for the buggy 1.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Denis Arefev <arefev@swemel.ru>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: David Laight <David.Laight@aculab.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
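A minimal sketch of the corrected computation, assuming a 64-bit unsigned long as on the 64-bit systems discussed above. leaf_cpu_bit() and leaf_group_mask() are hypothetical helpers, not functions from the SRCU code.

```c
/* Bit for @cpu within its leaf node's range, as in 1UL << (cpu - grplo).
 * The unsigned long constant keeps shift counts up to 63 well defined on
 * 64-bit systems and avoids the sign extension seen with a plain int 1. */
static unsigned long leaf_cpu_bit(int cpu, int grplo)
{
	return 1UL << (cpu - grplo);
}

/* Mask covering every CPU of a leaf group spanning grplo..grphi. */
static unsigned long leaf_group_mask(int grplo, int grphi)
{
	unsigned long mask = 0;

	for (int cpu = grplo; cpu <= grphi; cpu++)
		mask |= leaf_cpu_bit(cpu, grplo);
	return mask;
}
```

With the default fanout of 16, a leaf covering CPUs 16 to 31 yields the mask 0xffff. A shift count of 31 yields 0x80000000 with no sign extension.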
-
- Sep 24, 2023
Zqiang authored
Currently, maxcpu is set by traversing the online CPUs. However, suppose rcutorture.onoff_holdoff is set to zero and onoff_interval is set to a non-zero value. Then some CPUs with larger CPU IDs may already have been taken offline when maxcpu is set. Such CPUs can then never be offloaded or de-offloaded, even after they come back online.

This commit therefore uses for_each_possible_cpu() instead of for_each_online_cpu() in rcu_nocb_toggle(). This can result in rcutorture attempting to (de-)offload CPUs that have never been online, but the (de-)offload code handles this.

Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
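The difference between the two iterations can be shown with plain boolean arrays standing in for the kernel's cpumasks. highest_cpu() and the masks used with it are hypothetical. In the kernel, for_each_online_cpu() and for_each_possible_cpu() walk the corresponding cpumasks.

```c
#include <stdbool.h>

#define NR_POSSIBLE 8	/* hypothetical size of the CPU masks */

/* Highest CPU number set in @mask, or -1 if the mask is empty. */
static int highest_cpu(const bool mask[NR_POSSIBLE])
{
	int maxcpu = -1;

	for (int cpu = 0; cpu < NR_POSSIBLE; cpu++)
		if (mask[cpu])
			maxcpu = cpu;
	return maxcpu;
}
```

Suppose CPUs 0 to 5 are possible but CPUs 4 and 5 happen to be offline when maxcpu is computed. Scanning the online mask gives 3, which permanently excludes CPUs 4 and 5. Scanning the possible mask gives 5.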
-
Joel Fernandes (Google) authored
In the past, spinning on schedule_timeout* with a wait of 1 jiffy has hung the kernel. See for example d52d3a2b ("torture: Fix hang during kthread shutdown phase").

This issue recently recurred in torture's stutter code. The result is that the function instantly returns and never goes to sleep, preempting whatever might otherwise make useful forward progress.

To prevent future issues, apply the commit-d52d3a2b fix throughout rcutorture, moving from a 1-jiffy wait to a 50-millisecond wait.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
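For a sense of scale, the following sketch converts the new 50-millisecond wait into jiffies for a few HZ values. ms_to_jiffies() is a hypothetical helper that rounds up. The kernel's own msecs_to_jiffies() is more elaborate and is not reproduced here.

```c
/* Hypothetical helper: convert milliseconds to jiffies at a given HZ,
 * rounding up so that a non-zero wait never becomes zero jiffies. */
static unsigned long ms_to_jiffies(unsigned long ms, unsigned long hz)
{
	return (ms * hz + 999) / 1000;
}
```

By this arithmetic, the 50 ms wait spans 5 jiffies at HZ=100, 13 at HZ=250 and 50 at HZ=1000, rather than the single jiffy that previously allowed the spin.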
-
Paul E. McKenney authored
The rcutorture_sched_setaffinity() function is needed by locktorture, so move its declaration from rcu.h to torture.h and rename it to the more generic torture_sched_setaffinity() name. Please note that use of this function is still restricted to torture tests, and of those, currently only rcutorture and locktorture.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
-
Arnd Bergmann authored
The prototype for torture_sched_setaffinity() will be moved to a different header, which will need to be included from update.c to avoid this W=1 warning:

    kernel/rcu/update.c:529:6: error: no previous prototype for 'torture_sched_setaffinity' [-Werror=missing-prototypes]
      529 | long torture_sched_setaffinity(pid_t pid, const struct cpumask *in_mask)

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
-
- Sep 13, 2023
-
-
Paul E. McKenney authored
When using rcutorture as a module, there are a number of conditions that can abort the modprobe operation, for example, when attempting to run both RCU CPU stall warning tests and forward-progress tests. This can cause rcu_torture_cleanup() to be invoked on the unwind path out of rcu_torture_init(), which means that rcu_gp_slow_unregister() is invoked without a matching rcu_gp_slow_register(). This causes a splat because rcu_gp_slow_unregister() is passed rcu_fwd_cb_nodelay, which does not match the still-NULL registered pointer. This commit therefore forgives a mismatch involving a NULL pointer, thus avoiding this false-positive splat.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
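The forgiving check can be sketched as follows. This is a simplified user-space model rather than the kernel's exact condition. gp_slow_suppress, gp_slow_register(), and gp_slow_unregister() are hypothetical stand-ins, and an int pointer replaces the kernel's registered pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the registered suppression pointer. */
static int *gp_slow_suppress;

static void gp_slow_register(int *p)
{
	gp_slow_suppress = p;
}

/*
 * Hypothetical unregister: clear the registration and return true if the
 * mismatch check would warn. A mismatch is forgiven when nothing was
 * registered (NULL), as on an init-failure unwind path.
 */
static bool gp_slow_unregister(int *p)
{
	bool warn = p && gp_slow_suppress && p != gp_slow_suppress;

	gp_slow_suppress = NULL;
	return warn;
}
```

Unregistering with nothing registered stays silent. A genuine mismatch between two non-NULL pointers still warns.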
-
Zhen Lei authored
When a structure containing an RCU callback rhp is (incorrectly) freed and reallocated after rhp is passed to call_rcu(), it is not unusual for rhp->func to be set to NULL. This defeats the debugging prints used by __call_rcu_common() in kernels built with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, which expect to identify the offending code using the identity of this function. And in kernels built without CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, things are even worse, as can be seen from this splat:
    Unable to handle kernel NULL pointer dereference at virtual address 0
    ...
    PC is at 0x0
    LR is at rcu_do_batch+0x1c0/0x3b8
    ...
    (rcu_do_batch) from (rcu_core+0x1d4/0x284)
    (rcu_core) from (__do_softirq+0x24c/0x344)
    (__do_softirq) from (__irq_exit_rcu+0x64/0x108)
    (__irq_exit_rcu) from (irq_exit+0x8/0x10)
    (irq_exit) from (__handle_domain_irq+0x74/0x9c)
    (__handle_domain_irq) from (gic_handle_irq+0x8c/0x98)
    (gic_handle_irq) from (__irq_svc+0x5c/0x94)
    (__irq_svc) from (arch_cpu_idle+0x20/0x3c)
    (arch_cpu_idle) from (default_idle_call+0x4c/0x78)
    (default_idle_call) from (do_idle+0xf8/0x150)
    (do_idle) from (cpu_startup_entry+0x18/0x20)
    (cpu_startup_entry) from (0xc01530)
This commit therefore adds calls to mem_dump_obj(rhp) to output some information, for example:
    slab kmalloc-256 start ffff410c45019900 pointer offset 0 size 256
This provides the rough size of the memory block and the offset of the rcu_head structure, which provides at least a few clues to help locate the problem. If the problem is reproducible, additional slab debugging can be enabled, for example, CONFIG_DEBUG_SLAB=y, which can provide significantly more information.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
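The idea of reporting where a damaged callback head sits, instead of jumping to address 0, can be sketched in user space. All types and functions here are hypothetical. The kernel's mem_dump_obj() finds the enclosing block through slab metadata, whereas this sketch simply receives the block and its size from the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct rcu_head. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Hypothetical object that embeds its callback head at a non-zero offset. */
struct my_obj {
	int key;
	long payload[4];
	struct cb_head rh;
};

static int cb_calls;

static void count_cb(struct cb_head *head)
{
	(void)head;
	cb_calls++;
}

/*
 * Invoke one callback. If ->func is NULL, print the head's offset within its
 * enclosing block and the block size (similar in spirit to mem_dump_obj()'s
 * "pointer offset" and "size") and return -1 instead of calling through NULL.
 */
static int invoke_one(struct cb_head *rhp, const void *block, size_t block_size)
{
	if (!rhp->func) {
		size_t off = (size_t)((const char *)rhp - (const char *)block);

		fprintf(stderr, "NULL callback: pointer offset %zu size %zu\n",
			off, block_size);
		return -1;
	}
	rhp->func(rhp);
	return 0;
}
```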
-
Paul E. McKenney authored
When running a series of stress tests all making heavy use of RCU, it is all too possible to OOM the system when the prior test's RCU callbacks don't get invoked until after the subsequent test starts. One way of handling this is just a timed wait, but this fails when a given CPU has so many callbacks queued that they take longer to invoke than allowed for by that timed wait. This commit therefore adds an rcutree.do_rcu_barrier module parameter that is accessible from sysfs. Writing one of the many synonyms for boolean "true" will cause an rcu_barrier() to be invoked, but will guarantee that no more than one rcu_barrier() will be invoked per sixteenth of a second via this mechanism. The flip side is that a given request might wait a second or three longer than absolutely necessary, but only when there are multiple uses of rcutree.do_rcu_barrier within a one-second time interval. This commit unnecessarily serializes the rcu_barrier() machinery, given that serialization is already provided by procfs. This has the advantage of allowing throttled rcu_barrier() from other sources within the kernel.
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
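The "at most one rcu_barrier() per sixteenth of a second" rule can be modeled as a simple rate limiter. This minimal sketch covers only the decision of whether a request starts a new barrier. In the mechanism described above, a throttled request would instead wait for a later barrier, which the sketch does not model. The struct, the function, and the microsecond clock are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BARRIER_INTERVAL_US 62500u /* one sixteenth of a second */

/* Hypothetical throttle state, not the kernel's. */
struct barrier_throttle {
	bool ran_before;
	uint64_t last_us;      /* time the last barrier stand-in was started */
	unsigned int barriers; /* number of barriers actually started */
};

/*
 * Return true if this request starts a new barrier, or false if one was
 * started less than BARRIER_INTERVAL_US ago (the request is throttled).
 */
static bool request_barrier(struct barrier_throttle *t, uint64_t now_us)
{
	if (t->ran_before && now_us - t->last_us < BARRIER_INTERVAL_US)
		return false;
	t->ran_before = true;
	t->last_us = now_us;
	t->barriers++;
	return true;
}
```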
-
Joel Fernandes (Google) authored
The return keyword is not needed here.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
-
Joel Fernandes (Google) authored
The current error handling in init_srcu_struct_fields() is a bit inconsistent. If init_srcu_struct_nodes() fails, the function either returns -ENOMEM or 0 depending on whether ssp->sda_is_static is true or false. This can make init_srcu_struct_fields() return 0 even if memory allocation failed! Simplify the error handling by always returning -ENOMEM if either init_srcu_struct_nodes() or the per-CPU allocation fails. This makes the control flow easier to follow and avoids the inconsistent return values. Add goto labels to avoid duplicating the error cleanup code.
Link: https://lore.kernel.org/r/20230404003508.GA254019@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
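The goto-label pattern described above can be sketched in user space. This is not the actual init_srcu_struct_fields() code. The struct, the function, and the fail_* flags, which simulate allocation failures, are hypothetical. The point shown is that every failure path returns -ENOMEM whatever the value of sda_is_static, and that cleanup is shared through labels.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the fields of struct srcu_struct used here. */
struct srcu_sketch {
	bool sda_is_static; /* per-CPU data provided statically? */
	int *sda;           /* stands in for the per-CPU allocation */
	int *nodes;         /* stands in for the srcu_node tree */
};

/* Return 0 on success or -ENOMEM on any allocation failure. */
static int init_fields_sketch(struct srcu_sketch *s, bool fail_sda, bool fail_nodes)
{
	s->nodes = NULL;
	if (!s->sda_is_static) {
		s->sda = fail_sda ? NULL : malloc(sizeof(*s->sda));
		if (!s->sda)
			goto err_out;
	}
	s->nodes = fail_nodes ? NULL : malloc(sizeof(*s->nodes));
	if (!s->nodes)
		goto err_free_sda;
	return 0;

err_free_sda:
	if (!s->sda_is_static) { /* only free what this function allocated */
		free(s->sda);
		s->sda = NULL;
	}
err_out:
	return -ENOMEM;
}
```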
-
- Sep 11, 2023
-
-
Paul E. McKenney authored
The refscale.verbose_batched and refscale.lookup_instances module parameters are omitted from the ref_scale_print_module_parms() beginning-of-test output. This commit therefore adds them.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
-
Paul E. McKenney authored
This commit fixes a misplaced data re-read in the typesafe code. The reason that this was not noticed is that this is a performance test with no writers, so a mismatch could not occur.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
-
Jiapeng Chong authored
The rcu_tasks_lazy_ms variable is not used outside the file tasks.h, so this commit marks it static:
    kernel/rcu/tasks.h:1085:5: warning: symbol 'rcu_tasks_lazy_ms' was not declared. Should it be static?
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6086
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
-
Paul E. McKenney authored
The rcu_tasks_need_gpcb() function samples ->percpu_dequeue_lim as part of the condition clause of a "for" loop, which is a bit confusing. This commit therefore hoists this sampling out of the loop and uses the loaded value in the condition clause.
So why does this work in the face of a concurrent switch from single-CPU queueing to per-CPU queueing?
o The call_rcu_tasks_generic() that makes the change has already enqueued its callback, which means that all of the other CPUs' callback queues are empty.
o For the call_rcu_tasks_generic() that first notices the switch to per-CPU queues, the smp_store_release() used to update ->percpu_enqueue_lim pairs with the full barrier in raw_spin_trylock_rcu_node() that is between the READ_ONCE(rtp->percpu_enqueue_shift) and the rcu_segcblist_enqueue() that enqueues the callback.
o Because this CPU's queue is empty (unless it happens to be the original single queue, in which case there is no need for synchronization), this call_rcu_tasks_generic() will do an irq_work_queue() to schedule a handler for the needed rcuwait_wake_up() call. This call will be ordered after the first call_rcu_tasks_generic() function's change to ->percpu_dequeue_lim.
o This rcuwait_wake_up() will happen either before or after the set_current_state() in rcuwait_wait_event(). If it happens before, the "condition" argument's call to rcu_tasks_need_gpcb() will be ordered after the original change, and all callbacks on all CPUs will be visible. Otherwise, if it happens after, then the grace-period kthread's state will be set back to running, which will result in a later call to rcuwait_wait_event() and thus to rcu_tasks_need_gpcb(), which will again see the change.
So it all works out.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
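Hoisting the limit load out of the loop condition can be sketched with C11 atomics. This user-space model is single-threaded. dequeue_lim, queue_len[], count_pending(), and switch_to_percpu() are hypothetical stand-ins for ->percpu_dequeue_lim and the per-CPU queues. The acquire/release pairing is chosen for illustration, and the sketch does not claim it matches the kernel's exact primitives.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_QUEUES_SKETCH 4 /* hypothetical number of per-CPU callback queues */

/* Hypothetical stand-ins for ->percpu_dequeue_lim and the per-CPU queues. */
static atomic_int dequeue_lim = 1;
static int queue_len[NR_QUEUES_SKETCH];

/*
 * Count pending callbacks. The limit is loaded once, before the loop,
 * instead of being re-read in the loop's condition clause on every pass.
 */
static int count_pending(void)
{
	int lim = atomic_load_explicit(&dequeue_lim, memory_order_acquire);
	int total = 0;

	for (int cpu = 0; cpu < lim; cpu++)
		total += queue_len[cpu];
	return total;
}

/* Updater side: switch from single-queue to per-CPU queueing. */
static void switch_to_percpu(void)
{
	atomic_store_explicit(&dequeue_lim, NR_QUEUES_SKETCH, memory_order_release);
}
```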
-
Paul E. McKenney authored
Currently, rcu_tasks_initiate_self_tests() prints a message and then initiates self tests on up to three different RCU Tasks flavors. If one of the flavors has a grace-period hang, it is not easy to work out which of the three hung. This commit therefore prints a message prior to each individual test.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
-
Joel Fernandes (Google) authored
There are instances where rcu_cpu_stall_reset() is called when jiffies did not get a chance to update for a long time. Before jiffies is updated, the CPU stall detector can go off, triggering false positives where a just-started grace period appears to be ages old. In the past, we disabled stall detection in rcu_cpu_stall_reset(); however, this was changed [1]. This is resulting in false positives in the KGDB use case [2]. Fix this by deferring the update of jiffies to the third run of the FQS loop. This is more robust: even if rcu_cpu_stall_reset() is called just before jiffies is read, we would end up pushing out the jiffies read by 3 more FQS loops. Meanwhile the CPU stall detection will be delayed and we will not get any false positives.
[1] https://lore.kernel.org/all/20210521155624.174524-2-senozhatsky@chromium.org/
[2] https://lore.kernel.org/all/20230814020045.51950-2-chenhuacai@loongson.cn/
Tested with the rcutorture.cpu_stall option as well to verify stall behavior with and without the patch.
Tested-by: Huacai Chen <chenhuacai@loongson.cn>
Reported-by: Binbin Zhou <zhoubinbin@loongson.cn>
Closes: https://lore.kernel.org/all/20230814020045.51950-2-chenhuacai@loongson.cn/
Suggested-by: Paul McKenney <paulmck@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: a80be428 ("rcu: Do not disable GP stall detection in rcu_cpu_stall_reset()")
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
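The deferral can be modeled as a countdown of FQS passes. In this minimal sketch, a reset suppresses stall checks, and the stall deadline is recomputed from the current time only on the third FQS pass after the reset. The struct, all field names, and the three functions are hypothetical and do not reproduce the kernel's code. Jiffies wraparound is ignored.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stall-detector state; names are illustrative only. */
struct stall_state {
	unsigned long jiffies_stall; /* time at which a stall may be reported */
	int nr_fqs_left;             /* FQS passes until the deadline is re-armed */
};

/* Hypothetical analogue of a stall reset: suppress checks for three FQS passes. */
static void stall_reset(struct stall_state *s)
{
	s->nr_fqs_left = 3;
	s->jiffies_stall = ULONG_MAX;
}

/* Called once per FQS-loop pass; the third pass re-arms the deadline. */
static void fqs_pass(struct stall_state *s, unsigned long now, unsigned long timeout)
{
	if (s->nr_fqs_left == 1)
		s->jiffies_stall = now + timeout;
	if (s->nr_fqs_left > 0)
		s->nr_fqs_left--;
}

/* No stall can be reported while the countdown is still running. */
static bool stall_detected(const struct stall_state *s, unsigned long now)
{
	if (s->nr_fqs_left > 0)
		return false;
	return now >= s->jiffies_stall;
}
```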
-
Paul E. McKenney authored
This commit registers an RCU CPU stall notifier when testing RCU CPU stalls. The notifier logs a message similar to the following:
    rcu_torture_stall_nf: v=1, duration=21001.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
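A user-space imitation of such a callback might look like the following. Several details are assumptions made for illustration: that the stall duration arrives through the void * argument, the buffer parameters, and SKETCH_NOTIFY_OK. A kernel notifier would log the message rather than format it into a caller-supplied buffer. The buffer is used here only so that the output can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_NOTIFY_OK 1 /* hypothetical stand-in for a notifier return code */

/*
 * Hypothetical callback: v is the notifier event value, and ptr is assumed
 * to carry the stall duration cast to a pointer.
 */
static int stall_nf_sketch(char *buf, size_t len, unsigned long v, void *ptr)
{
	snprintf(buf, len, "rcu_torture_stall_nf: v=%lu, duration=%lu.",
		 v, (unsigned long)ptr);
	return SKETCH_NOTIFY_OK;
}
```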
-
Paul E. McKenney authored
It is sometimes helpful to have a way for the subsystem causing the stall to dump its state when an RCU CPU stall occurs. This commit therefore bases rcu_stall_chain_notifier_register() and rcu_stall_chain_notifier_unregister() on atomic notifiers in order to provide this functionality.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
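The register/unregister/call shape of a notifier chain can be sketched as a singly linked list of callbacks. This is a single-threaded illustration only. It omits the locking, the RCU-protected traversal, and the priority ordering that the kernel's atomic notifier chains provide. struct nb_sketch and all functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, single-threaded stand-in for a notifier block. */
struct nb_sketch {
	int (*call)(struct nb_sketch *nb, unsigned long v, void *ptr);
	struct nb_sketch *next;
};

static struct nb_sketch *stall_chain; /* head of the hypothetical chain */

static void chain_register(struct nb_sketch *nb)
{
	nb->next = stall_chain;
	stall_chain = nb;
}

/* Return 0 on success, or -1 if nb was not on the chain. */
static int chain_unregister(struct nb_sketch *nb)
{
	for (struct nb_sketch **pp = &stall_chain; *pp; pp = &(*pp)->next) {
		if (*pp == nb) {
			*pp = nb->next;
			return 0;
		}
	}
	return -1;
}

/* Invoke every registered callback; return how many were called. */
static int chain_call(unsigned long v, void *ptr)
{
	int n = 0;

	for (struct nb_sketch *nb = stall_chain; nb; nb = nb->next) {
		nb->call(nb, v, ptr);
		n++;
	}
	return n;
}

static unsigned long last_duration;

/* Example callback: remember the duration passed through ptr. */
static int record_duration(struct nb_sketch *nb, unsigned long v, void *ptr)
{
	(void)nb;
	(void)v;
	last_duration = (unsigned long)ptr;
	return 0;
}
```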
-