7710e2cc
binder: switch alloc->mutex to spinlock_t
Carlos Llamas authored
    
The alloc->mutex is a highly contended lock that causes performance
issues on Android devices. When a low-priority task acquires this lock
and then sleeps, it can take a long time for the task to be scheduled
again and finish its work. This delays the other tasks that are also
waiting on the mutex.
    
    The problem gets worse when there is memory pressure in the system,
    because this increases the contention on the alloc->mutex while the
    shrinker reclaims binder pages.
    
    Switching to a spinlock helps to keep the waiters running and avoids the
    overhead of waking up tasks. This significantly improves the transaction
    latency when the problematic scenario occurs.
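
For illustration only, here is a minimal sketch of what this kind of
mutex-to-spinlock conversion typically looks like in kernel C. The struct
and helper names below are hypothetical and are not taken from the patch;
the point is that the lock field and the lock/unlock helpers change type,
which is only legal once every critical section is guaranteed not to sleep.

  #include <linux/spinlock.h>

  /* Hypothetical example, not the actual binder_alloc definition. */
  struct binder_alloc_example {
          spinlock_t lock;                /* was: struct mutex lock; */
          /* ... allocator bookkeeping ... */
  };

  static inline void example_alloc_lock(struct binder_alloc_example *alloc)
  {
          spin_lock(&alloc->lock);        /* was: mutex_lock(&alloc->lock); */
  }

  static inline void example_alloc_unlock(struct binder_alloc_example *alloc)
  {
          spin_unlock(&alloc->lock);      /* was: mutex_unlock(&alloc->lock); */
  }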
    
    The performance impact of this patchset was measured by stress-testing
    the binder alloc contention. In this test, several clients of different
    priorities send thousands of transactions of different sizes to a single
server. In parallel, pages get reclaimed using the shrinker's debugfs.
    
The test was run on a Pixel 8, a Pixel 6 and a QEMU machine. The results
    were similar on all three devices:
    
    after:
      | sched  | prio | average | max       | min     |
      |--------+------+---------+-----------+---------|
      | fifo   |   99 | 0.135ms |   1.197ms | 0.022ms |
      | fifo   |   01 | 0.136ms |   5.232ms | 0.018ms |
      | other  |  -20 | 0.180ms |   7.403ms | 0.019ms |
      | other  |   19 | 0.241ms |  58.094ms | 0.018ms |
    
    before:
      | sched  | prio | average | max       | min     |
      |--------+------+---------+-----------+---------|
      | fifo   |   99 | 0.350ms | 248.730ms | 0.020ms |
      | fifo   |   01 | 0.357ms | 248.817ms | 0.024ms |
      | other  |  -20 | 0.399ms | 249.906ms | 0.020ms |
      | other  |   19 | 0.477ms | 297.756ms | 0.022ms |
    
    The key metrics above are the average and max latencies (wall time).
    These improvements should roughly translate to p95-p99 latencies on real
    workloads. The response time is up to 200x faster in these scenarios and
    there is no penalty in the regular path.
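Concretely, the worst-case fifo prio 99 latency improves from 248.730ms
before to 1.197ms after, i.e. 248.730 / 1.197 ≈ 208x.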
    
Note that it is only possible to convert this lock after a series of
changes made by previous patches. These mainly include refactoring the
critical sections that might sleep (might_sleep()) and changing the
locking order with respect to mmap_lock, among others.
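
To illustrate why that earlier refactoring matters, here is a hypothetical
sketch (names invented for the example, not code from the series): anything
that can sleep, such as allocating a page or taking mmap_lock, has to happen
before the spinlock is acquired, leaving only non-sleeping bookkeeping inside
the critical section.

  #include <linux/gfp.h>
  #include <linux/spinlock.h>

  /* Hypothetical example reusing the struct sketched above. */
  static int example_install_page(struct binder_alloc_example *alloc)
  {
          struct page *page;

          /* May sleep, so it must run outside the spinlocked section. */
          page = alloc_page(GFP_KERNEL | __GFP_ZERO);
          if (!page)
                  return -ENOMEM;

          spin_lock(&alloc->lock);
          /* Non-sleeping bookkeeping only while the lock is held. */
          /* ... record the page in the allocator's tracking structures ... */
          spin_unlock(&alloc->lock);

          return 0;
  }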
    
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
    Link: https://lore.kernel.org/r/20231201172212.1813387-29-cmllamas@google.com
    
    
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>