  1. Jan 24, 2024
      rcu: Defer RCU kthreads wakeup when CPU is dying · e787644c
      Frederic Weisbecker authored
      
      When the CPU goes idle for the last time during the CPU down hotplug
      process, RCU reports a final quiescent state for the current CPU. If
      this quiescent state propagates up to the top, some tasks may then be
      woken up to complete the grace period: the main grace period kthread
      and/or the expedited main workqueue (or kworker).
      
      If those kthreads have a SCHED_FIFO policy, the wakeup can indirectly
      arm the RT bandwidth timer on the local offline CPU. Since this happens
      after hrtimers have been migrated at the CPUHP_AP_HRTIMERS_DYING stage,
      the timer gets ignored. Therefore, if the RCU kthreads are waiting for
      RT bandwidth to be available, they may never actually be scheduled.
      
      This triggers TREE03 rcutorture hangs:
      
      	 rcu: INFO: rcu_preempt self-detected stall on CPU
      	 rcu:     4-...!: (1 GPs behind) idle=9874/1/0x4000000000000000 softirq=0/0 fqs=20 rcuc=21071 jiffies(starved)
      	 rcu:     (t=21035 jiffies g=938281 q=40787 ncpus=6)
      	 rcu: rcu_preempt kthread starved for 20964 jiffies! g938281 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
      	 rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
      	 rcu: RCU grace-period kthread stack dump:
      	 task:rcu_preempt     state:R  running task     stack:14896 pid:14    tgid:14    ppid:2      flags:0x00004000
      	 Call Trace:
      	  <TASK>
      	  __schedule+0x2eb/0xa80
      	  schedule+0x1f/0x90
      	  schedule_timeout+0x163/0x270
      	  ? __pfx_process_timeout+0x10/0x10
      	  rcu_gp_fqs_loop+0x37c/0x5b0
      	  ? __pfx_rcu_gp_kthread+0x10/0x10
      	  rcu_gp_kthread+0x17c/0x200
      	  kthread+0xde/0x110
      	  ? __pfx_kthread+0x10/0x10
      	  ret_from_fork+0x2b/0x40
      	  ? __pfx_kthread+0x10/0x10
      	  ret_from_fork_asm+0x1b/0x30
      	  </TASK>
      
      The situation can't be solved with just unpinning the timer. The hrtimer
      infrastructure and the nohz heuristics involved in finding the best
      remote target for an unpinned timer would then also need to handle
      enqueues from an offline CPU in the most horrendous way.
      
      So fix this on the RCU side instead and defer the wake up to an online
      CPU if it's too late for the local one.
      
      Reported-by: Paul E. McKenney <paulmck@kernel.org>
      Fixes: 5c0930cc ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
      e787644c
  2. Dec 13, 2023
      rcu: Force quiescent states only for ongoing grace period · dee39c0c
      Zqiang authored
      
      If an rcutorture test scenario creates an fqs_task kthread, it will
      periodically invoke rcu_force_quiescent_state() in order to start
      force-quiescent-state (FQS) operations.  However, an FQS operation
      will be started even if there is no RCU grace period in progress.
      Although testing FQS operations startup when there is no grace period in
      progress is necessary, it need not happen all that often.  This commit
      therefore causes rcu_force_quiescent_state() to take an early exit
      if there is no grace period in progress.
      
      Note that there will still be attempts to start an FQS scan in the
      absence of a grace period because the grace period might end right
      after the rcu_force_quiescent_state() function's check.  In actual
      testing, this happens about once every ten minutes, which should
      provide adequate testing.
      
      Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
      dee39c0c
  3. Dec 11, 2023
  4. Nov 23, 2023
  5. Nov 01, 2023
  6. Oct 13, 2023
  7. Oct 10, 2023
      srcu: Fix callbacks acceleration mishandling · 4a8e65b0
      Frederic Weisbecker authored
      
      SRCU callback acceleration might fail if the preceding callback
      advance also fails. This can happen when the following steps are met:
      
      1) The RCU_WAIT_TAIL segment has callbacks (say for gp_num 8) and the
         RCU_NEXT_READY_TAIL also has callbacks (say for gp_num 12).
      
      2) The grace period for RCU_WAIT_TAIL is observed as started but not yet
         completed so rcu_seq_current() returns 4 + SRCU_STATE_SCAN1 = 5.
      
      3) This value is passed to rcu_segcblist_advance() which can't move
         any segment forward and fails.
      
      4) srcu_gp_start_if_needed() still proceeds with callback acceleration.
         But then the call to rcu_seq_snap() observes the grace period for the
         RCU_WAIT_TAIL segment (gp_num 8) as completed and the subsequent one
         for the RCU_NEXT_READY_TAIL segment as started
         (ie: 8 + SRCU_STATE_SCAN1 = 9) so it returns a snapshot of the
         next grace period, which is 16.
      
      5) The value of 16 is passed to rcu_segcblist_accelerate() but the
         freshly enqueued callback in RCU_NEXT_TAIL can't move to
         RCU_NEXT_READY_TAIL which already has callbacks for a previous grace
         period (gp_num = 12). So acceleration fails.
      
      6) Note that in all these steps, srcu_invoke_callbacks() hadn't had a
         chance to run.
      
      A very bad outcome may then occur if the following happens:
      
      7) Some other CPU races and starts the grace period number 16 before the
         CPU handling previous steps had a chance. Therefore srcu_gp_start()
         isn't called on the latter sdp to fix the acceleration leak from
         previous steps with a new pair of call to advance/accelerate.
      
      8) The grace period 16 completes and srcu_invoke_callbacks() is finally
         called. All the callbacks from previous grace periods (8 and 12) are
         correctly advanced and executed but callbacks in RCU_NEXT_READY_TAIL
         still remain. Then rcu_segcblist_accelerate() is called with a
         snapshot of 20.
      
      9) Since nothing started the grace period number 20, callbacks stay
         unhandled.
      
      This has been reported under real load:
      
      	[3144162.608392] INFO: task kworker/136:12:252684 blocked for more
      	than 122 seconds.
      	[3144162.615986]       Tainted: G           O  K   5.4.203-1-tlinux4-0011.1 #1
      	[3144162.623053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
      	disables this message.
      	[3144162.631162] kworker/136:12  D    0 252684      2 0x90004000
      	[3144162.631189] Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
      	[3144162.631192] Call Trace:
      	[3144162.631202]  __schedule+0x2ee/0x660
      	[3144162.631206]  schedule+0x33/0xa0
      	[3144162.631209]  schedule_timeout+0x1c4/0x340
      	[3144162.631214]  ? update_load_avg+0x82/0x660
      	[3144162.631217]  ? raw_spin_rq_lock_nested+0x1f/0x30
      	[3144162.631218]  wait_for_completion+0x119/0x180
      	[3144162.631220]  ? wake_up_q+0x80/0x80
      	[3144162.631224]  __synchronize_srcu.part.19+0x81/0xb0
      	[3144162.631226]  ? __bpf_trace_rcu_utilization+0x10/0x10
      	[3144162.631227]  synchronize_srcu+0x5f/0xc0
      	[3144162.631236]  irqfd_shutdown+0x3c/0xb0 [kvm]
      	[3144162.631239]  ? __schedule+0x2f6/0x660
      	[3144162.631243]  process_one_work+0x19a/0x3a0
      	[3144162.631244]  worker_thread+0x37/0x3a0
      	[3144162.631247]  kthread+0x117/0x140
      	[3144162.631247]  ? process_one_work+0x3a0/0x3a0
      	[3144162.631248]  ? __kthread_cancel_work+0x40/0x40
      	[3144162.631250]  ret_from_fork+0x1f/0x30
      
      Fix this by taking the snapshot for acceleration _before_ the read
      of the current grace period number.
      
      The only side effect of this solution is that callback advancing then
      happens _after_ the full barrier in rcu_seq_snap(). This is not a
      problem because that barrier only cares about:
      
      1) Ordering accesses of the update side before call_srcu() so they
         don't bleed.
      2) Making sure the grace period of the current gp_num sees all prior
         accesses.
      
      The only things callback advancing needs to be ordered against are
      carried by snp locking.
      
      Reported-by: Yong He <alexyonghe@tencent.com>
      Co-developed-by: Yong He <alexyonghe@tencent.com>
      Signed-off-by: Yong He <alexyonghe@tencent.com>
      Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Co-developed-by: Neeraj upadhyay <Neeraj.Upadhyay@amd.com>
      Signed-off-by: Neeraj upadhyay <Neeraj.Upadhyay@amd.com>
      Link: http://lore.kernel.org/CANZk6aR+CqZaqmMWrC2eRRPY12qAZnDZLwLnHZbNi=xXMB401g@mail.gmail.com
      
      
      Fixes: da915ad5 ("srcu: Parallelize callback handling")
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      4a8e65b0
  8. Oct 04, 2023
      rcu: Standardize explicit CPU-hotplug calls · 448e9f34
      Frederic Weisbecker authored
      
      rcu_report_dead() and rcutree_migrate_callbacks() have their headers in
      rcupdate.h while those are pure rcutree calls, like the other CPU-hotplug
      functions.
      
      Also, rcu_cpu_starting() and rcu_report_dead() have different naming
      conventions even though they mirror each other's effects.
      
      Fix the headers and propose a naming that relates both functions and
      aligns with the prefix of other rcutree CPU-hotplug functions.
      
      Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      448e9f34
      rcu: Conditionally build CPU-hotplug teardown callbacks · 2cb1f6e9
      Frederic Weisbecker authored
      
      Among the three CPU-hotplug teardown RCU callbacks, two of them early
      exit if CONFIG_HOTPLUG_CPU=n, and one is left unchanged. In any case
      all of them have an implementation when CONFIG_HOTPLUG_CPU=n.
      
      Align instead with the common way to deal with CPU-hotplug teardown
      callbacks and provide a proper stub when they are not supported.
      
      Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      2cb1f6e9
      rcu: dynamically allocate the rcu-kfree shrinker · 21e0b932
      Qi Zheng authored
      Use new APIs to dynamically allocate the rcu-kfree shrinker.
      
      Link: https://lkml.kernel.org/r/20230911094444.68966-17-zhengqi.arch@bytedance.com
      
      
      Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Anna Schumaker <anna@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Carlos Llamas <cmllamas@google.com>
      Cc: Chandan Babu R <chandan.babu@oracle.com>
      Cc: Chao Yu <chao@kernel.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: Chuck Lever <cel@kernel.org>
      Cc: Coly Li <colyli@suse.de>
      Cc: Dai Ngo <Dai.Ngo@oracle.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Darrick J. Wong" <djwong@kernel.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Airlie <airlied@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Sterba <dsterba@suse.com>
      Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
      Cc: Gao Xiang <hsiangkao@linux.alibaba.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Kirill Tkhai <tkhai@ya.ru>
      Cc: Marijn Suijten <marijn.suijten@somainline.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
      Cc: Olga Kornievskaia <kolga@netapp.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rob Clark <robdclark@gmail.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Sean Paul <sean@poorly.run>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
      Cc: Tom Talpey <tom@talpey.com>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Cc: Yue Hu <huyue2@coolpad.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      21e0b932
      rcu: dynamically allocate the rcu-lazy shrinker · 2fbacff0
      Qi Zheng authored
      Use new APIs to dynamically allocate the rcu-lazy shrinker.
      
      Link: https://lkml.kernel.org/r/20230911094444.68966-16-zhengqi.arch@bytedance.com
      
      
      Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Acked-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Anna Schumaker <anna@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Carlos Llamas <cmllamas@google.com>
      Cc: Chandan Babu R <chandan.babu@oracle.com>
      Cc: Chao Yu <chao@kernel.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: Chuck Lever <cel@kernel.org>
      Cc: Coly Li <colyli@suse.de>
      Cc: Dai Ngo <Dai.Ngo@oracle.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Darrick J. Wong" <djwong@kernel.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Airlie <airlied@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Sterba <dsterba@suse.com>
      Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
      Cc: Gao Xiang <hsiangkao@linux.alibaba.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Kirill Tkhai <tkhai@ya.ru>
      Cc: Marijn Suijten <marijn.suijten@somainline.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Mike Snitzer <snitzer@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
      Cc: Olga Kornievskaia <kolga@netapp.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rob Clark <robdclark@gmail.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Sean Paul <sean@poorly.run>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Song Liu <song@kernel.org>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Steven Price <steven.price@arm.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
      Cc: Tom Talpey <tom@talpey.com>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Cc: Yue Hu <huyue2@coolpad.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      2fbacff0
      rcu: Assume rcu_report_dead() is always called locally · c964c1f5
      Frederic Weisbecker authored
      
      rcu_report_dead() has to be called locally by the CPU that is going to
      exit the RCU state machine. Passing a cpu argument here is error-prone
      and leaves the possibility for a racy remote call.
      
      Use local access instead.
      
      Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      c964c1f5
      rcu: Assume IRQS disabled from rcu_report_dead() · 358662a9
      Frederic Weisbecker authored
      
      rcu_report_dead() is the last RCU word from the CPU down through the
      hotplug path. It is called in the idle loop right before the CPU shuts
      down for good. Because it removes the CPU from the grace period state
      machine and reports an ultimate quiescent state if necessary, no further
      use of RCU is allowed. Therefore it is expected that IRQs are disabled
      upon calling this function and are not to be re-enabled again until the
      CPU shuts down.
      
      Remove the IRQs disablement from that function and verify instead that
      it is actually called with IRQs disabled as it is expected at that
      special point in the idle path.
      
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      358662a9
      rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects · 5f98fd03
      Catalin Marinas authored
      
      The actual slab freeing is deferred when calling kvfree_rcu(), and so
      is the kmemleak_free() callback informing kmemleak of the object
      deletion. From the perspective of the kvfree_rcu() caller, the object is
      freed and it may remove any references to it. Since kmemleak does not
      scan RCU internal data storing the pointer, it will report such objects
      as leaks during the grace period.
      
      Tell kmemleak to ignore such objects on the kvfree_call_rcu() path. Note
      that the Tiny RCU implementation does not have this issue since the
      objects can be tracked from the rcu_ctrlblk structure.
      
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Closes: https://lore.kernel.org/all/F903A825-F05F-4B77-A2B5-7356282FBA2C@apple.com/
      
      
      Cc: <stable@vger.kernel.org>
      Tested-by: Christoph Paasch <cpaasch@apple.com>
      Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      5f98fd03
  9. Sep 26, 2023
      srcu: Fix srcu_struct node grpmask overflow on 64-bit systems · d8d5b7bf
      Denis Arefev authored
      
      The value of a bitwise expression 1 << (cpu - sdp->mynode->grplo)
      is subject to overflow due to a failure to cast operands to a larger
      data type before performing the bitwise operation.
      
      The maximum result of this subtraction is defined by the RCU_FANOUT_LEAF
      Kconfig option, which on 64-bit systems defaults to 16 (resulting in a
      maximum shift of 15), but which can be set up as high as 64 (resulting
      in a maximum shift of 63).  A shift of 31 can result in sign extension,
      producing 0xffffffff80000000 instead of the desired 0x80000000.
      A value of 32 or greater triggers undefined behavior per the C standard.
      
      This bug has not been known to cause issues because almost all kernels
      take the default CONFIG_RCU_FANOUT_LEAF=16.  Furthermore, as long as a
      given compiler gives a deterministic non-zero result for 1<<N for N>=32,
      the code correctly invokes all SRCU callbacks, albeit wasting CPU time
      along the way.
      
      This commit therefore substitutes the correct 1UL for the buggy 1.
      
      Found by Linux Verification Center (linuxtesting.org) with SVACE.
      
      Signed-off-by: Denis Arefev <arefev@swemel.ru>
      Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: David Laight <David.Laight@aculab.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      d8d5b7bf
  10. Sep 24, 2023
  11. Sep 13, 2023
      rcu: Eliminate rcu_gp_slow_unregister() false positive · 0ae9942f
      Paul E. McKenney authored
      
      When using rcutorture as a module, there are a number of conditions that
      can abort the modprobe operation, for example, when attempting to run
      both RCU CPU stall warning tests and forward-progress tests.  This can
      cause rcu_torture_cleanup() to be invoked on the unwind path out of
      rcu_torture_init(), which will mean that rcu_gp_slow_unregister()
      is invoked without a matching rcu_gp_slow_register().  This will cause
      a splat because rcu_gp_slow_unregister() is passed rcu_fwd_cb_nodelay,
      which does not match a NULL pointer.
      
      This commit therefore forgives a mismatch involving a NULL pointer, thus
      avoiding this false-positive splat.
      
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      0ae9942f
      rcu: Dump memory object info if callback function is invalid · 2cbc482d
      Zhen Lei authored
      
      When a structure containing an RCU callback rhp is (incorrectly) freed
      and reallocated after rhp is passed to call_rcu(), it is not unusual for
      rhp->func to be set to NULL. This defeats the debugging prints used by
      __call_rcu_common() in kernels built with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y,
      which expect to identify the offending code using the identity of this
      function.
      
      And in kernels built without CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, things
      are even worse, as can be seen from this splat:
      
      Unable to handle kernel NULL pointer dereference at virtual address 0
      ... ...
      PC is at 0x0
      LR is at rcu_do_batch+0x1c0/0x3b8
      ... ...
       (rcu_do_batch) from (rcu_core+0x1d4/0x284)
       (rcu_core) from (__do_softirq+0x24c/0x344)
       (__do_softirq) from (__irq_exit_rcu+0x64/0x108)
       (__irq_exit_rcu) from (irq_exit+0x8/0x10)
       (irq_exit) from (__handle_domain_irq+0x74/0x9c)
       (__handle_domain_irq) from (gic_handle_irq+0x8c/0x98)
       (gic_handle_irq) from (__irq_svc+0x5c/0x94)
       (__irq_svc) from (arch_cpu_idle+0x20/0x3c)
       (arch_cpu_idle) from (default_idle_call+0x4c/0x78)
       (default_idle_call) from (do_idle+0xf8/0x150)
       (do_idle) from (cpu_startup_entry+0x18/0x20)
       (cpu_startup_entry) from (0xc01530)
      
      This commit therefore adds calls to mem_dump_obj(rhp) to output some
      information, for example:
      
        slab kmalloc-256 start ffff410c45019900 pointer offset 0 size 256
      
      This provides the rough size of the memory block and the offset of the
      rcu_head structure, which at least provides a few clues to help locate
      the problem. If the problem is reproducible, additional slab
      debugging can be enabled, for example, CONFIG_DEBUG_SLAB=y, which can
      provide significantly more information.
      
      Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      2cbc482d
      rcu: Add sysfs to provide throttled access to rcu_barrier() · 16128b1f
      Paul E. McKenney authored
      
      When running a series of stress tests all making heavy use of RCU,
      it is all too possible to OOM the system when the prior test's RCU
      callbacks don't get invoked until after the subsequent test starts.
      One way of handling this is just a timed wait, but this fails when a
      given CPU has so many callbacks queued that they take longer to invoke
      than allowed for by that timed wait.
      
      This commit therefore adds an rcutree.do_rcu_barrier module parameter that
      is accessible from sysfs.  Writing one of the many synonyms for boolean
      "true" will cause an rcu_barrier() to be invoked, but will guarantee that
      no more than one rcu_barrier() will be invoked per sixteenth of a second
      via this mechanism.  The flip side is that a given request might wait a
      second or three longer than absolutely necessary, but only when there are
      multiple uses of rcutree.do_rcu_barrier within a one-second time interval.
      
      This commit unnecessarily serializes the rcu_barrier() machinery, given
      that serialization is already provided by procfs.  This has the advantage
      of allowing throttled rcu_barrier() from other sources within the kernel.
      
      Reported-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      16128b1f
      4502138a
      srcu: Fix error handling in init_srcu_struct_fields() · f0a31b26
      Joel Fernandes (Google) authored
      The current error handling in init_srcu_struct_fields() is a bit
      inconsistent.  If init_srcu_struct_nodes() fails, the function either
      returns -ENOMEM or 0 depending on whether ssp->sda_is_static is true or
      false. This can make init_srcu_struct_fields() return 0 even if memory
      allocation failed!
      
      Simplify the error handling by always returning -ENOMEM if either
      init_srcu_struct_nodes() or the per-CPU allocation fails. This makes the
      control flow easier to follow and avoids the inconsistent return values.
      
      Add goto labels to avoid duplicating the error cleanup code.
      
      Link: https://lore.kernel.org/r/20230404003508.GA254019@google.com
      
      
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      f0a31b26
  12. Sep 11, 2023