  2. Jul 11, 2023
    • kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures · 487d69e7
      Christophe Leroy authored
      
      [ Upstream commit 353e7300 ]
      
      Activating KCSAN on a 32-bit architecture leads to the following
      link-time failure:
      
          LD      .tmp_vmlinux.kallsyms1
        powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_load':
        kernel/kcsan/core.c:1273: undefined reference to `__atomic_load_8'
        powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_store':
        kernel/kcsan/core.c:1273: undefined reference to `__atomic_store_8'
        powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_exchange':
        kernel/kcsan/core.c:1273: undefined reference to `__atomic_exchange_8'
        powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_add':
        kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_add_8'
        powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_sub':
        kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_sub_8'
        powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_and':
        kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_and_8'
        powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_or':
        kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_or_8'
        powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_xor':
        kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_xor_8'
        powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_nand':
        kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_nand_8'
        powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_compare_exchange_strong':
        kernel/kcsan/core.c:1273: undefined reference to `__atomic_compare_exchange_8'
        powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_compare_exchange_weak':
        kernel/kcsan/core.c:1273: undefined reference to `__atomic_compare_exchange_8'
        powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_compare_exchange_val':
        kernel/kcsan/core.c:1273: undefined reference to `__atomic_compare_exchange_8'
      
      32-bit architectures don't have 64-bit atomic builtins. Only
      include DEFINE_TSAN_ATOMIC_OPS(64) on 64-bit architectures.
      
      Fixes: 0f8ad5f2 ("kcsan: Add support for atomic builtins")
      Suggested-by: Marco Elver <elver@google.com>
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Reviewed-by: Marco Elver <elver@google.com>
      Acked-by: Marco Elver <elver@google.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/d9c6afc28d0855240171a4e0ad9ffcdb9d07fceb.1683892665.git.christophe.leroy@csgroup.eu
      
      
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • kexec: fix a memory leak in crash_shrink_memory() · 45fa5b03
      Zhen Lei authored
      [ Upstream commit 1cba6c43 ]
      
      Patch series "kexec: enable kexec_crash_size to support two crash kernel
      regions".
      
      When crashkernel=X fails to reserve region under 4G, it will fall back to
      reserve region above 4G and a region of the default size will also be
      reserved under 4G.  Unfortunately, /sys/kernel/kexec_crash_size only
      supports one crash kernel region now, the user cannot sense the low memory
      reserved by reading /sys/kernel/kexec_crash_size.  Also, low memory cannot
      be freed by writing this file.
      
      For example:
      resource_size(crashk_res) = 512M
      resource_size(crashk_low_res) = 256M
      
      The result of 'cat /sys/kernel/kexec_crash_size' is 512M, but it should be
      768M.  When we execute 'echo 0 > /sys/kernel/kexec_crash_size', the size
      of crashk_res becomes 0 and resource_size(crashk_low_res) is still 256 MB,
      which is incorrect.
      
      Since crashk_res manages the high-address memory and crashk_low_res
      manages the low-address memory, crashk_low_res is shrunk only after
      crashk_res has been completely shrunk.  And when there is only one crash
      kernel region, crashk_res is always the one used.  Therefore, if
      crashk_res is completely shrunk while crashk_low_res still exists, swap
      them.
      
      This patch (of 6):
      
      If the value of parameter 'new_size' is in the half-open interval
      (crashk_res.end - KEXEC_CRASH_MEM_ALIGN + 1, crashk_res.end], the
      calculated ram_res is:
      
      	ram_res->start = crashk_res.end + 1
      	ram_res->end   = crashk_res.end
      
      The operation of insert_resource() fails, and ram_res is not added to
      iomem_resource.  As a result, the memory of the control block ram_res is
      leaked.
      
      In fact, on all architectures, the start address and size of crashk_res
      are already aligned by KEXEC_CRASH_MEM_ALIGN.  Therefore, we do not need
      to round up crashk_res.start again.  Instead, we should round up
      'new_size' in advance.
      
      Link: https://lkml.kernel.org/r/20230527123439.772-1-thunder.leizhen@huawei.com
      Link: https://lkml.kernel.org/r/20230527123439.772-2-thunder.leizhen@huawei.com
      
      
      Fixes: 6480e5a0 ("kdump: add missing RAM resource in crash_shrink_memory()")
      Fixes: 06a7f711 ("kexec: premit reduction of the reserved memory size")
      Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Cong Wang <amwang@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • watchdog/perf: more properly prevent false positives with turbo modes · 8e640900
      Douglas Anderson authored
      [ Upstream commit 4379e59f ]
      
      Currently, in the watchdog_overflow_callback() we first check to see if
      the watchdog had been touched and _then_ we handle the workaround for
      turbo mode.  This order should be reversed.
      
      Specifically, "touching" the hardlockup detector's watchdog should avoid
      lockups being detected for one period that should be roughly the same
      regardless of whether we're running turbo or not.  That means that we
      should do the extra accounting for turbo _before_ we look at (and clear)
      the global indicating that we've been touched.
      
      NOTE: this fix is made based on code inspection.  I am not aware of any
      reports where the old code would have generated false positives.  That
      being said, this order seems more correct and also makes it easier down
      the line to share code with the "buddy" hardlockup detector.
      
      Link: https://lkml.kernel.org/r/20230519101840.v5.2.I843b0d1de3e096ba111a179f3adb16d576bef5c7@changeid
      
      
      Fixes: 7edaeb68 ("kernel/watchdog: Prevent false positives with turbo modes")
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen-Yu Tsai <wens@csie.org>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Colin Cross <ccross@android.com>
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Guenter Roeck <groeck@chromium.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
      Cc: Matthias Kaehlcke <mka@chromium.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Pingfan Liu <kernelfans@gmail.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
      Cc: Ricardo Neri <ricardo.neri@intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Stephen Boyd <swboyd@chromium.org>
      Cc: Sumit Garg <sumit.garg@linaro.org>
      Cc: Tzung-Bi Shih <tzungbi@chromium.org>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf: Fix memleak due to fentry attach failure · f72c67d1
      Yafang Shao authored
      
      [ Upstream commit 108598c3 ]
      
      If it fails to attach fentry, the allocated bpf trampoline image will be
      left in the system. That can be verified by checking /proc/kallsyms.
      
      This memleak can be verified by a simple bpf program as follows:
      
        SEC("fentry/trap_init")
        int fentry_run()
        {
            return 0;
        }
      
      It will fail to attach trap_init because this function is freed after
      kernel init, and then we can find the trampoline image is left in the
      system by checking /proc/kallsyms.
      
        $ tail /proc/kallsyms
        ffffffffc0613000 t bpf_trampoline_6442453466_1  [bpf]
        ffffffffc06c3000 t bpf_trampoline_6442453466_1  [bpf]
      
        $ bpftool btf dump file /sys/kernel/btf/vmlinux | grep "FUNC 'trap_init'"
        [2522] FUNC 'trap_init' type_id=119 linkage=static
      
        $ echo $((6442453466 & 0x7fffffff))
        2522
      
      Note that there are two leftover bpf trampoline images; that is because
      libbpf will fall back to a raw tracepoint if -EINVAL is returned.
      
      Fixes: e21aa341 ("bpf: Fix fexit trampoline.")
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <song@kernel.org>
      Cc: Jiri Olsa <olsajiri@gmail.com>
      Link: https://lore.kernel.org/bpf/20230515130849.57502-2-laoar.shao@gmail.com
      
      
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf: Remove bpf trampoline selector · 69056193
      Yafang Shao authored
      
      [ Upstream commit 47e79cbe ]
      
      After commit e21aa341 ("bpf: Fix fexit trampoline."), the selector only
      indicates how many times the bpf trampoline image has been updated, and
      it is displayed in the trampoline ksym name. After the trampoline is
      freed, the selector starts from 0 again. So the selector is a useless
      value to the user. We can remove it.
      
      If the user wants to check whether the bpf trampoline image has been
      updated or not, the user can compare the address. Each time the
      trampoline image is updated, the address changes accordingly. Jiri also
      pointed out another issue: perf is still using the old name
      "bpf_trampoline_%lu", so this change also fixes that issue in perf.
      
      Fixes: e21aa341 ("bpf: Fix fexit trampoline.")
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <song@kernel.org>
      Cc: Jiri Olsa <olsajiri@gmail.com>
      Link: https://lore.kernel.org/bpf/ZFvOOlrmHiY9AgXE@krava
      Link: https://lore.kernel.org/bpf/20230515130849.57502-3-laoar.shao@gmail.com
      
      
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen · 1b4a82c2
      Stanislav Fomichev authored
      
      [ Upstream commit 29ebbba7 ]
      
      With the way the hooks are implemented right now, we have a special
      condition: an optval larger than PAGE_SIZE will expose only the first 4k
      to BPF; any modifications to the optval are ignored. If the BPF program
      doesn't handle this condition by resetting optlen to 0,
      userspace will get EFAULT.
      
      The intention of the EFAULT was to make it apparent to the
      developers that the program is doing something wrong.
      However, this inadvertently might affect production workloads
      with the BPF programs that are not too careful (i.e., returning EFAULT
      for perfectly valid setsockopt/getsockopt calls).
      
      Let's try to minimize the chance of a BPF program screwing up userspace
      by ignoring the output of those BPF programs (instead of returning
      EFAULT to userspace). Log those cases with pr_info_once() to
      help with figuring out what's going wrong.
      
      Fixes: 0d01da6a ("bpf: implement getsockopt and setsockopt hooks")
      Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/20230511170456.1759459-2-sdf@google.com
      
      
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle() · 83fbbb46
      Hao Jia authored
      
      [ Upstream commit ebb83d84 ]
      
      After commit 8ad075c2 ("sched: Async unthrottling for cfs
      bandwidth"), we may update the rq clock multiple times in the loop of
      __cfsb_csd_unthrottle().
      
      A prior (although less common) instance of this problem exists in
      unthrottle_offline_cfs_rqs().
      
      Cure both by ensuring update_rq_clock() is called before the loop and
      setting RQCF_ACT_SKIP during the loop, to suppress further updates.
      The alternative would be pulling update_rq_clock() out of
      unthrottle_cfs_rq(), but that gives an even bigger mess.
      
      Fixes: 8ad075c2 ("sched: Async unthrottling for cfs bandwidth")
      Reviewed-By: Ben Segall <bsegall@google.com>
      Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Link: https://lkml.kernel.org/r/20230613082012.49615-4-jiahao.os@bytedance.com
      
      
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale · 1dd7547c
      Qiuxu Zhuo authored
      
      [ Upstream commit 23fc8df2 ]
      
      Running the 'kfree_rcu_test' test case [1] results in a splat [2].
      The root cause is that the kfree_scale_thread thread(s) continue running
      after the rcuscale module is unloaded.  This commit fixes that issue by
      invoking kfree_scale_cleanup() from rcu_scale_cleanup() when removing
      the rcuscale module.
      
      [1] modprobe rcuscale kfree_rcu_test=1
          // After some time
          rmmod rcuscale
          rmmod torture
      
      [2] BUG: unable to handle page fault for address: ffffffffc0601a87
          #PF: supervisor instruction fetch in kernel mode
          #PF: error_code(0x0010) - not-present page
          PGD 11de4f067 P4D 11de4f067 PUD 11de51067 PMD 112f4d067 PTE 0
          Oops: 0010 [#1] PREEMPT SMP NOPTI
          CPU: 1 PID: 1798 Comm: kfree_scale_thr Not tainted 6.3.0-rc1-rcu+ #1
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
          RIP: 0010:0xffffffffc0601a87
          Code: Unable to access opcode bytes at 0xffffffffc0601a5d.
          RSP: 0018:ffffb25bc2e57e18 EFLAGS: 00010297
          RAX: 0000000000000000 RBX: ffffffffc061f0b6 RCX: 0000000000000000
          RDX: 0000000000000000 RSI: ffffffff962fd0de RDI: ffffffff962fd0de
          RBP: ffffb25bc2e57ea8 R08: 0000000000000000 R09: 0000000000000000
          R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
          R13: 0000000000000000 R14: 000000000000000a R15: 00000000001c1dbe
          FS:  0000000000000000(0000) GS:ffff921fa2200000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: ffffffffc0601a5d CR3: 000000011de4c006 CR4: 0000000000370ee0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
          Call Trace:
           <TASK>
           ? kvfree_call_rcu+0xf0/0x3a0
           ? kthread+0xf3/0x120
           ? kthread_complete_and_exit+0x20/0x20
           ? ret_from_fork+0x1f/0x30
           </TASK>
          Modules linked in: rfkill sunrpc ... [last unloaded: torture]
          CR2: ffffffffc0601a87
          ---[ end trace 0000000000000000 ]---
      
      Fixes: e6e78b00 ("rcuperf: Add kfree_rcu() performance Tests")
      Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup() · bd5e3278
      Qiuxu Zhuo authored
      
      [ Upstream commit bf5ddd73 ]
      
      This code-movement-only commit moves the rcu_scale_cleanup() and
      rcu_scale_shutdown() functions to follow kfree_scale_cleanup().
      This code movement is in preparation for a bug-fix patch that invokes
      kfree_scale_cleanup() from rcu_scale_cleanup().
      
      Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Stable-dep-of: 23fc8df2 ("rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • rcuscale: Move shutdown from wait_event() to wait_event_idle() · 9ca6bb80
      Paul E. McKenney authored
      
      [ Upstream commit ef1ef3d4 ]
      
      The rcu_scale_shutdown() and kfree_scale_shutdown() kthreads/functions
      use wait_event() to wait for the rcuscale test to complete.  However,
      each updater thread in such a test waits for at least 100 grace periods.
      If each grace period takes more than 1.2 seconds, which is long, but
      not insanely so, this can trigger the hung-task timeout, which defaults
      to 120 seconds.
      
      This commit therefore replaces those wait_event() calls with calls to
      wait_event_idle(), which do not trigger the hung-task timeout.
      
      Reported-by: kernel test robot <yujie.liu@intel.com>
      Reported-by: Liam Howlett <liam.howlett@oracle.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Tested-by: Yujie Liu <yujie.liu@intel.com>
      Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
      Stable-dep-of: 23fc8df2 ("rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs · 9a1d4933
      Paul E. McKenney authored
      
      [ Upstream commit 401b0de3 ]
      
      The rcu_tasks_invoke_cbs() function relies on queue_work_on() to silently
      fall back to WORK_CPU_UNBOUND when the specified CPU is offline.  However,
      the queue_work_on() function's silent fallback mechanism relies on that
      CPU having been online at some time in the past.  When queue_work_on()
      is passed a CPU that has never been online, workqueue lockups ensue,
      which can be bad for your kernel's general health and well-being.
      
      This commit therefore checks whether a given CPU has ever been online,
      and, if not, substitutes WORK_CPU_UNBOUND in the subsequent call to
      queue_work_on().  Why not simply omit the queue_work_on() call entirely?
      Because this function is flooding callback-invocation notifications
      to all CPUs, and must deal with possibilities that include a sparse
      cpu_possible_mask.
      
      This commit also moves the setting of the rcu_data structure's
      ->beenonline field to rcu_cpu_starting(), which executes on the
      incoming CPU before that CPU has ever enabled interrupts.  This ensures
      that the required workqueues are present.  In addition, because the
      incoming CPU has not yet enabled its interrupts, there cannot yet have
      been any softirq handlers running on this CPU, which means that the
      WARN_ON_ONCE(!rdp->beenonline) within the RCU_SOFTIRQ handler cannot
      have triggered yet.
      
      Fixes: d363f833 ("rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs() invocations")
      Reported-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • rcu: Make rcu_cpu_starting() rely on interrupts being disabled · a1a3bbd8
      Paul E. McKenney authored
      
      [ Upstream commit 15d44dfa ]
      
      Currently, rcu_cpu_starting() is written so that it might be invoked
      with interrupts enabled.  However, it is always called when interrupts
      are disabled, either by rcu_init(), notify_cpu_starting(), or from a
      call point prior to the call to notify_cpu_starting().
      
      But why bother requiring that interrupts be disabled?  The purpose is
      to allow the rcu_data structure's ->beenonline flag to be set after all
      early processing has completed for the incoming CPU, thus allowing this
      flag to be used to determine when workqueues have been set up for the
      incoming CPU, while still allowing this flag to be used as a diagnostic
      within rcu_core().
      
      This commit therefore makes rcu_cpu_starting() rely on interrupts being
      disabled.
      
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Stable-dep-of: 401b0de3 ("rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tick/rcu: Fix bogus ratelimit condition · 14c05764
      Wen Yang authored
      
      [ Upstream commit a7e282c7 ]
      
      The ratelimit logic in report_idle_softirq() is broken because the
      exit condition is always true:
      
      	static int ratelimit;
      
      	if (ratelimit < 10)
      		return false;  ---> always returns here
      
      	ratelimit++;           ---> no chance to run
      
      Make it check for >= 10 instead.
      
      Fixes: 0345691b ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle")
      Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/tencent_5AAA3EEAB42095C9B7740BE62FBF9A67E007@qq.com
      
      
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • posix-timers: Prevent RT livelock in itimer_delete() · f9bd298e
      Thomas Gleixner authored
      
      [ Upstream commit 9d9e5220 ]
      
      itimer_delete() has a retry loop when the timer is concurrently expired. On
      non-RT kernels this just spin-waits until the timer callback has completed,
      except for posix CPU timers which have HAVE_POSIX_CPU_TIMERS_TASK_WORK
      enabled.
      
      In that case, and on RT kernels, the existing task could livelock when
      preempting the task which does the timer delivery.
      
      Replace spin_unlock() with an invocation of timer_wait_running() to handle
      it the same way as the other retry loops in the posix timer code.
      
      Fixes: ec8f954a ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Link: https://lore.kernel.org/r/87v8g7c50d.ffs@tglx
      
      
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  4. Jun 21, 2023
    • kexec: support purgatories with .text.hot sections · cb163861
      Ricardo Ribalda Delgado authored
      commit 8652d44f upstream.
      
      Patch series "kexec: Fix kexec_file_load for llvm16 with PGO", v7.
      
      When upreving llvm I realised that kexec stopped working on my test
      platform.
      
      The reason seems to be that due to PGO there are multiple .text sections
      in the purgatory, and kexec does not support that.
      
      
      This patch (of 4):
      
      Clang16 links the purgatory text in two sections when PGO is in use:
      
        [ 1] .text             PROGBITS         0000000000000000  00000040
             00000000000011a1  0000000000000000  AX       0     0     16
        [ 2] .rela.text        RELA             0000000000000000  00003498
             0000000000000648  0000000000000018   I      24     1     8
        ...
        [17] .text.hot.        PROGBITS         0000000000000000  00003220
             000000000000020b  0000000000000000  AX       0     0     1
        [18] .rela.text.hot.   RELA             0000000000000000  00004428
             0000000000000078  0000000000000018   I      24    17     8
      
      And `e_entry` falls within the [sh_addr ... sh_addr+sh_size] range of
      both of them.
      
      This causes image->start to be calculated twice, once for .text and
      again for .text.hot. The second calculation leaves image->start
      at a random location.
      
      Because of this, the system crashes immediately after:
      
      kexec_core: Starting new kernel
      
      Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-0-b05c520b7296@chromium.org
      Link: https://lkml.kernel.org/r/20230321-kexec_clang16-v7-1-b05c520b7296@chromium.org
      
      
      Fixes: 93045705 ("kernel/kexec_file.c: split up __kexec_load_puragory")
      Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
      Reviewed-by: Ross Zwisler <zwisler@google.com>
      Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Reviewed-by: Philipp Rudo <prudo@redhat.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Palmer Dabbelt <palmer@rivosinc.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Simon Horman <horms@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Rix <trix@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • cgroup: fix missing cpus_read_{lock,unlock}() in cgroup_transfer_tasks() · 8064c5a5
      Qi Zheng authored
      
      [ Upstream commit ab1de7ea ]
      
      The commit 4f7e7236 ("cgroup: Fix threadgroup_rwsem <-> cpus_read_lock()
      deadlock") fixed the deadlock between cgroup_threadgroup_rwsem and
      cpus_read_lock() by introducing cgroup_attach_{lock,unlock}() and removing
      cpus_read_{lock,unlock}() from cpuset_attach(). But cgroup_transfer_tasks()
      was missed and not handled, which will cause the following warning:
      
       WARNING: CPU: 0 PID: 589 at kernel/cpu.c:526 lockdep_assert_cpus_held+0x32/0x40
       CPU: 0 PID: 589 Comm: kworker/1:4 Not tainted 6.4.0-rc2-next-20230517 #50
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
       Workqueue: events cpuset_hotplug_workfn
       RIP: 0010:lockdep_assert_cpus_held+0x32/0x40
       <...>
       Call Trace:
        <TASK>
        cpuset_attach+0x40/0x240
        cgroup_migrate_execute+0x452/0x5e0
        ? _raw_spin_unlock_irq+0x28/0x40
        cgroup_transfer_tasks+0x1f3/0x360
        ? find_held_lock+0x32/0x90
        ? cpuset_hotplug_workfn+0xc81/0xed0
        cpuset_hotplug_workfn+0xcb1/0xed0
        ? process_one_work+0x248/0x5b0
        process_one_work+0x2b9/0x5b0
        worker_thread+0x56/0x3b0
        ? process_one_work+0x5b0/0x5b0
        kthread+0xf1/0x120
        ? kthread_complete_and_exit+0x20/0x20
        ret_from_fork+0x1f/0x30
        </TASK>
      
      So just use the cgroup_attach_{lock,unlock}() helper to fix it.
      
      Reported-by: Zhao Gongyi <zhaogongyi@bytedance.com>
      Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
      Acked-by: Muchun Song <songmuchun@bytedance.com>
      Fixes: 05c7b7a9 ("cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug")
      Cc: stable@vger.kernel.org # v5.17+
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • cgroup: always put cset in cgroup_css_set_put_fork · e70e4638
      John Sperbeck authored
      
      [ Upstream commit 2bd11033 ]
      
      A successful call to cgroup_css_set_fork() will always have taken
      a ref on kargs->cset (regardless of CLONE_INTO_CGROUP), so always
      do a corresponding put in cgroup_css_set_put_fork().
      
      Without this, a cset and its contained css structures will be
      leaked for some fork failures.  The following script reproduces
      the leak for a fork failure due to exceeding pids.max in the
      pids controller.  A similar thing can happen if we jump to the
      bad_fork_cancel_cgroup label in copy_process().
      
      [ -z "$1" ] && echo "Usage $0 pids-root" && exit 1
      PID_ROOT=$1
      CGROUP=$PID_ROOT/foo
      
      [ -e $CGROUP ] && rmdir $CGROUP
      mkdir $CGROUP
      echo 5 > $CGROUP/pids.max
      echo $$ > $CGROUP/cgroup.procs
      
      fork_bomb()
      {
      	set -e
      	for i in $(seq 10); do
      		/bin/sleep 3600 &
      	done
      }
      
      (fork_bomb) &
      wait
      echo $$ > $PID_ROOT/cgroup.procs
      kill $(cat $CGROUP/cgroup.procs)
      rmdir $CGROUP
      
      Fixes: ef2c41cf ("clone3: allow spawning processes into cgroups")
      Cc: stable@vger.kernel.org # v5.7+
      Signed-off-by: John Sperbeck <jsperbeck@google.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • cgroup: bpf: use cgroup_lock()/cgroup_unlock() wrappers · fdf31f33
      Kamalesh Babulal authored
      
      [ Upstream commit 4cdb91b0 ]
      
      Replace mutex_[un]lock() with cgroup_[un]lock() wrappers to stay
      consistent across cgroup core and other subsystem code, while
      operating on the cgroup_mutex.
      
      Signed-off-by: Kamalesh Babulal <kamalesh.babulal@oracle.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Stable-dep-of: 2bd11033 ("cgroup: always put cset in cgroup_css_set_put_fork")
      Signed-off-by: Sasha Levin <sashal@kernel.org>