Jan 19, 2024
      futex: Prevent the reuse of stale pi_state · e626cb02
      Sebastian Andrzej Siewior authored
      
Jiri Slaby reported a futex state inconsistency resulting in -EINVAL during
a lock operation for a PI futex. It requires that the lock operation is
interrupted by a timeout or signal:
      
        T1 Owns the futex in user space.
      
        T2 Tries to acquire the futex in kernel (futex_lock_pi()). Allocates a
           pi_state and attaches itself to it.
      
        T2 Times out and removes its rt_waiter from the rt_mutex. Drops the
           rtmutex lock and tries to acquire the hash bucket lock to remove
           the futex_q. The lock is contended and T2 schedules out.
      
        T1 Unlocks the futex (futex_unlock_pi()). Finds a futex_q but no
           rt_waiter. Unlocks the futex (do_uncontended) and makes it available
           to user space.
      
        T3 Acquires the futex in user space.
      
        T4 Tries to acquire the futex in kernel (futex_lock_pi()). Finds the
           existing futex_q of T2 and tries to attach itself to the existing
           pi_state.  This (attach_to_pi_state()) fails with -EINVAL because uval
           contains the TID of T3 but pi_state points to T1.
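
  The -EINVAL comes from the owner consistency check in attach_to_pi_state().
  A paraphrased sketch of that check (not the verbatim kernel source):

	/*
	 * uval was reread from the futex word and carries T3's TID, while
	 * the still-queued pi_state names T1 as owner.
	 */
	pid = uval & FUTEX_TID_MASK;
	if (pid != task_pid_vnr(pi_state->owner))
		return -EINVAL;	/* futex word and pi_state disagree */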
      
      It's incorrect to unlock the futex and make it available for user space to
      acquire as long as there is still an existing state attached to it in the
      kernel.
      
T1 cannot hand over the futex to T2 because T2 already gave up, started
to clean up, and is blocked on the hash bucket lock, so T2's futex_q with
the pi_state pointing to T1 is still queued.
      
T1 observes the futex_q, but ignores it as there is no waiter on the
corresponding rt_mutex, and takes the uncontended path, which allows the
subsequent caller of futex_lock_pi() (T4) to observe that stale state.
      
To prevent this, the unlock path must dequeue all futex_q entries which
point to the same pi_state when there is no waiter on the rt_mutex. This
obviously requires making the dequeue in the locking path conditional to
prevent a double dequeue. With that it's guaranteed that user space cannot
observe an uncontended futex which still has kernel state attached.
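
A heavily paraphrased sketch of the shape of the fix in futex_unlock_pi()
(not the verbatim diff; rt_waiter stands for the top waiter of the
pi_state's rt_mutex, and the hash bucket lock is held):

	top_waiter = futex_top_waiter(hb, &key);
	if (top_waiter && !rt_waiter) {
		/*
		 * The queued waiter already gave up on the rt_mutex, so
		 * dequeue its futex_q here; user space must never see an
		 * uncontended futex with kernel state still attached.
		 */
		__futex_unqueue(top_waiter);
		goto do_uncontended;
	}

The dequeue in the locking path then has to check whether the futex_q is
still queued before removing it again.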
      
      Fixes: fbeb558b ("futex/pi: Fix recursive rt_mutex waiter state")
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Slaby <jirislaby@kernel.org>
      Link: https://lore.kernel.org/r/20240118115451.0TkD_ZhB@linutronix.de
      Closes: https://lore.kernel.org/all/4611bcf2-44d0-4c34-9b84-17406f881003@kernel.org
Dec 02, 2022
      futex: Resend potentially swallowed owner death notification · 90d75889
      Alexey Izbyshev authored
      
      Commit ca16d5be ("futex: Prevent robust futex exit race") addressed
      two cases when tasks waiting on a robust non-PI futex remained blocked
      despite the futex not being owned anymore:
      
      * if the owner died after writing zero to the futex word, but before
        waking up a waiter
      
      * if a task waiting on the futex was woken up, but died before updating
        the futex word (effectively swallowing the notification without acting
        on it)
      
In the second case, the task could be woken up either by the previous
owner (after the futex word was reset to zero) or by the kernel (after
the OWNER_DIED bit was set and the TID part of the futex word was reset
to zero) if the previous owner died without resetting the futex.
      
      Because the referenced commit wakes up a potential waiter only if the
      whole futex word is zero, the latter subcase remains unaddressed.
      
      Fix this by looking only at the TID part of the futex when deciding
      whether a wake up is needed.
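
A minimal userspace illustration of the difference, using the futex word
bit layout from the uapi header (demonstration code, not the kernel fix
itself):

	#include <stdio.h>
	#include <stdint.h>
	#include <linux/futex.h>	/* FUTEX_TID_MASK, FUTEX_OWNER_DIED */

	int main(void)
	{
		/* The kernel set OWNER_DIED and cleared the TID part. */
		uint32_t uval = FUTEX_OWNER_DIED;

		/* Old test: wake only if the whole word is zero -> missed. */
		printf("whole word zero: %d\n", uval == 0);

		/* Fixed test: only the TID part matters -> wakeup sent. */
		printf("TID part zero:   %d\n", (uval & FUTEX_TID_MASK) == 0);
		return 0;
	}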
      
      Fixes: ca16d5be ("futex: Prevent robust futex exit race")
Signed-off-by: Alexey Izbyshev <izbyshev@ispras.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20221111215439.248185-1-izbyshev@ispras.ru
Sep 07, 2022
      freezer,sched: Rewrite core freezer logic · f5d39b02
      Peter Zijlstra authored
      
      Rewrite the core freezer to behave better wrt thawing and be simpler
      in general.
      
By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is
ensured that frozen tasks stay frozen until thawed and don't randomly wake
up early, as is currently possible.
      
      As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up
      two PF_flags (yay!).
      
Specifically, the current scheme works a little like:

	freezer_do_not_count();	/* tell the freezer to treat this task as frozen */
	schedule();
	freezer_count();	/* freeze here if a freeze was requested meanwhile */

And either the task is blocked, or it lands in try_to_freeze()
through freezer_count(). Now, when it is blocked, the freezer
considers it frozen and continues.
      
      However, on thawing, once pm_freezing is cleared, freezer_count()
      stops working, and any random/spurious wakeup will let a task run
      before its time.
      
That is, thawing tries to thaw things in an explicit order: kernel
threads and workqueues first, then bringing SMP back, then userspace,
etc. However, due to the above mentioned races it is entirely possible
for userspace tasks to thaw (by accident) before SMP is back.
      
This can be a fatal problem on asymmetric ISA architectures (e.g. ARMv9)
where the userspace task requires a special CPU to run.
      
As said, replace this with a special task state TASK_FROZEN and add
the following state transitions:
      
      	TASK_FREEZABLE	-> TASK_FROZEN
      	__TASK_STOPPED	-> TASK_FROZEN
      	__TASK_TRACED	-> TASK_FROZEN
      
      The new TASK_FREEZABLE can be set on any state part of TASK_NORMAL
      (IOW. TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE) -- any such state
      is already required to deal with spurious wakeups and the freezer
      causes one such when thawing the task (since the original state is
      lost).
      
      The special __TASK_{STOPPED,TRACED} states *can* be restored since
      their canonical state is in ->jobctl.
      
      With this, frozen tasks need an explicit TASK_FROZEN wakeup and are
      free of undue (early / spurious) wakeups.
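
As a rough before/after sketch of what this means at a freezable sleep
site (illustrative, not a verbatim hunk from this patch):

	/* Before: freezer bookkeeping brackets the sleep. */
	freezer_do_not_count();
	schedule();
	freezer_count();

	/* After: freezability is part of the sleep state itself, so the
	 * freezer can move a sleeping task to TASK_FROZEN and wake it
	 * with an explicit TASK_FROZEN wakeup when thawing.
	 */
	set_current_state(TASK_INTERRUPTIBLE | TASK_FREEZABLE);
	schedule();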
      
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Link: https://lore.kernel.org/r/20220822114649.055452969@infradead.org