llvmpipe: sampler matrix cache is slow because of mutex lock: 650% speedup by using RCU (!30267)

Aleksi Sapon requested to merge DDoSQc/mesa:lp-sample-matrix-rcu into main Jul 19, 2024

Overview

get_sample_function in lp_texture_handle is very slow because every cache lookup takes a simple_mtx lock. The cost is paid only during the first pipeline execution, when the sampling functions are first JIT-compiled, but it is still far too much overhead.

I've implemented an RCU-like scheme that removes the lock from reads of the sample function cache, since it is a read-mostly hash table. Updates still happen under the lock, but a reader only needs to load an atomic pointer, which an updating thread swaps to point at a newer version of the table. Old tables are freed only when the cache is cleared and no readers can possibly remain, so a table that might still be in use is never deleted.
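The read-mostly scheme described above can be sketched in C11 roughly as follows. This is a minimal illustration under stated assumptions, not the code in this merge request:

- The names cache_lookup, cache_insert, cache_clear, struct table and struct cache are hypothetical.
- The uint32_t key, the sample_fn type and the linear-probing table are also hypothetical.
- pthread_mutex_t stands in for Mesa's simple_mtx.

Readers do a single acquire load of the current table pointer and never lock. A writer copies the table under the lock, adds the new entry, publishes the copy with a release store, and retires the old copy instead of freeing it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a JIT'ed sample function pointer. */
typedef int (*sample_fn)(int);

struct entry {
   uint32_t key; /* 0 marks an empty slot, so key 0 is reserved */
   sample_fn fn;
};

/* Immutable once published: readers never see a snapshot change. */
struct table {
   uint32_t cap;               /* power of two, kept at most half full */
   uint32_t count;
   struct table *retired_next; /* links superseded snapshots */
   struct entry e[];
};

struct cache {
   _Atomic(struct table *) current; /* the only field readers touch */
   pthread_mutex_t lock;            /* serializes writers (simple_mtx stand-in) */
   struct table *retired;           /* old snapshots, freed only on clear */
};

/* Linear probing; terminates because a table is never more than half full. */
static struct entry *table_slot(struct table *t, uint32_t key) {
   uint32_t i = key & (t->cap - 1);
   while (t->e[i].key != 0 && t->e[i].key != key)
      i = (i + 1) & (t->cap - 1);
   return &t->e[i];
}

void cache_init(struct cache *c) {
   atomic_init(&c->current, NULL);
   pthread_mutex_init(&c->lock, NULL);
   c->retired = NULL;
}

/* Reader fast path: one acquire load and a probe, no lock. NULL on miss. */
sample_fn cache_lookup(struct cache *c, uint32_t key) {
   struct table *t = atomic_load_explicit(&c->current, memory_order_acquire);
   return t ? table_slot(t, key)->fn : NULL;
}

/* Writer slow path, called after a miss with a freshly compiled fn.
 * Returns the cached fn, which is another thread's if it won the race. */
sample_fn cache_insert(struct cache *c, uint32_t key, sample_fn fn) {
   assert(key != 0 && fn != NULL);
   pthread_mutex_lock(&c->lock);
   /* Writers are serialized by the lock, so a relaxed load is enough here. */
   struct table *old = atomic_load_explicit(&c->current, memory_order_relaxed);
   sample_fn existing = old ? table_slot(old, key)->fn : NULL;
   if (existing) {
      pthread_mutex_unlock(&c->lock);
      return existing;
   }
   /* Copy-on-write: build a private snapshot with the new entry, then publish it. */
   uint32_t count = old ? old->count : 0, cap = old ? old->cap : 8;
   while ((count + 1) * 2 > cap)
      cap *= 2;
   struct table *next = calloc(1, sizeof(*next) + cap * sizeof(struct entry));
   assert(next); /* allocation failure handling omitted in this sketch */
   next->cap = cap;
   next->count = count + 1;
   for (uint32_t i = 0; old && i < old->cap; i++)
      if (old->e[i].key)
         *table_slot(next, old->e[i].key) = old->e[i];
   *table_slot(next, key) = (struct entry){key, fn};
   atomic_store_explicit(&c->current, next, memory_order_release);
   if (old) { /* readers may still hold it: retire it instead of freeing */
      old->retired_next = c->retired;
      c->retired = old;
   }
   pthread_mutex_unlock(&c->lock);
   return fn;
}

/* Frees every snapshot. The caller must guarantee that no readers remain. */
void cache_clear(struct cache *c) {
   pthread_mutex_lock(&c->lock);
   free(atomic_exchange(&c->current, NULL));
   while (c->retired) {
      struct table *n = c->retired->retired_next;
      free(c->retired);
      c->retired = n;
   }
   pthread_mutex_unlock(&c->lock);
}
```

A caller would first try cache_lookup and, only on a miss, compile the function and pass it to cache_insert. Because this sketch compiles outside the lock, two threads that miss on the same key may both compile. In that case cache_insert keeps the first result, and the losing caller would have to discard its own copy. Compiling while holding the lock avoids that duplicate work at the cost of blocking other writers. Deferring every free to cache_clear also means memory grows with each insert until the cache is cleared.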

Test environment

MacBook Pro with M3 Pro, macOS Sonoma 14.5

OpenUSD on the feature-hgi-vulkan branch, with some public (but not yet merged) changes for macOS and Lavapipe support.

Test case

Enable dome lighting in USDView and wait for the change to take effect.

Results

Profiled symbol: libvulkan_lvp.dylib`get_sample_function (sampling rate: 997 Hz)

simple_mtx: 752,888 samples, ~130 s real time, ~66% CPU
RCU: 8,794 samples, ~20 s real time, ~95% CPU

Real-time speedup: 6.5x
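The headline number follows directly from the wall-clock times in the results. The remaining lines are a rough conversion of sample counts into sampled time. They assume each sample stands for one 1/997 s tick on one thread, summed across threads. That assumption would explain why the simple_mtx figure exceeds the wall-clock time.

```math
\begin{aligned}
\text{real-time speedup} &= \frac{130\ \text{s}}{20\ \text{s}} = 6.5 \\
\text{simple\_mtx sampled time} &\approx \frac{752{,}888}{997\ \text{Hz}} \approx 755\ \text{s} \\
\text{RCU sampled time} &\approx \frac{8{,}794}{997\ \text{Hz}} \approx 8.8\ \text{s} \\
\text{sample-count ratio} &= \frac{752{,}888}{8{,}794} \approx 85.6
\end{aligned}
```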

PS: I also tried a rwlock; it is actually slower, because it has even more locking overhead!

Edited Jul 22, 2024 by Aleksi Sapon
