util: Add a util_sparse_array data structure
A thread-safe automatically growing sparse array data structure
This data structure has the following very nice properties:
-
Accessing an element is basically constant time. Technically, it's O(log_b n) where the base b is the node size and n is the maximum index. However, node sizes are expected to be fairly large and the index is a uint64_t so, if your node size is 256, it's O(8).
-
The data stored in the array is never moved in memory. Instead, the data structure only ever grows and new nodes are added as-needed. This means it's safe to store a pointer to something stored in the sparse array without worrying about a
realloc
invalidating it. -
The data structure is thread-safe. No guarantees are made about the data stored in the sparse array but it is safe to call
util_sparse_array_get(arr, idx)
from as many threads as you'd like and we guarantee that two calls toutil_sparse_array_get(arr, idx)
with the same array and index will always return the same pointer regardless contention between threads. -
The data structure is lock-free. All manipulations of the tree are done by a careful use of atomics to maintain thread safety and no locks are ever taken other than those taken implicitly by
calloc()
. If no allocation is required,util_sparse_array_get(arr, idx)
does a simple walk over the tree should be efficient even in the case where many threads are accessing the sparse array at once.
At the moment, this MR is just the data structure so we can discuss its merits, get some code review, and possibly some suggestions for how to better unit-test it. I see this being useful for a variety of things in core mesa and the Intel drivers:
- Replace the hash tables currently used for GL name -> object lookup. Should be faster and doesn't require locking like the hash table does.
- Adding per-context data to BOs in iris as an array indexed by context id
- Tracking all BOs in ANV as an array indexed by GEM handle
- Tracking an array of CPU-side surface state information in ANV
- and more?
As far as performance goes, I did a bit of benchmarking by adding the following to the unit test:
#define BENCH_ITER (1 << 27)
static void
bench_sparse_array(void)
{
struct util_sparse_array arr;
util_sparse_array_init(&arr, sizeof(uint32_t), (1 << 10));
for (unsigned i = 0; i < BENCH_ITER; i++) {
unsigned idx = rand() % (1 << 14);
uint32_t *elem = util_sparse_array_get(&arr, idx);
(*elem)++;
}
uint32_t *elem = util_sparse_array_get(&arr, 0);
assert(*elem != UINT32_MAX);
}
#include "util/set.h"
static void
bench_hash_table(void)
{
struct set *set = _mesa_pointer_set_create(NULL);
_mesa_set_resize(set, (1 << 15));
for (unsigned i = 0; i < BENCH_ITER; i++) {
unsigned idx = rand() % (1 << 14);
_mesa_set_add(set, (void *)(uintptr_t)(100 + idx));
}
void *entry = _mesa_set_search(set, (void *)(uintptr_t)50);
assert(entry == NULL);
}
The end result was
bench_sparse_array: 1.52s user 0.00s system 99% cpu 1.530 total
bench_hash_table: 11.27s user 0.00s system 99% cpu 11.313 total
So, yeah.... it's faster than hash tables...