Deadlock in anv_timelines_wait()
After experiencing frequent deadlocks in the Intel Vulkan library, I compiled the latest 20.06 version with debug info. Here is the deadlock trace I see in gdb:
#0 __lll_lock_wait (futex=futex@entry=0x5555fb845bd0, private=0) at lowlevellock.c:52
#1 0x00007f27cc8ea0a3 in __GI___pthread_mutex_lock (mutex=0x5555fb845bd0) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007f26de2096f3 in anv_timelines_wait (device=0x5555fb844240, timelines=0x7f2600000c50, serials=0x7f2600000c70, n_timelines=2,
wait_all=false, abs_timeout_ns=18446744073709551615) at ../src/intel/vulkan/anv_queue.c:2234
After examining the code, it seems that anv_timelines_wait() tries to lock the same mutex twice. Note that pthread_cond_timedwait() returns with device->mutex held again, and the code never unlocks it before the while loop comes back around to pthread_mutex_lock():
   if (!wait_all && n_timelines > 1) {
      while (1) {
         VkResult result;
         pthread_mutex_lock(&device->mutex);
         for (uint32_t i = 0; i < n_timelines; i++) {
            result =
               anv_timeline_wait_locked(device, timelines[i], serials[i], 0);
            if (result != VK_TIMEOUT)
               break;
         }
         if (result != VK_TIMEOUT ||
             anv_gettime_ns() >= abs_timeout_ns) {
            pthread_mutex_unlock(&device->mutex);
            return result;
         }
         /* If none of them are ready do a short wait so we don't completely
          * spin while holding the lock. The 10us is completely arbitrary.
          */
         uint64_t abs_short_wait_ns =
            anv_get_absolute_timeout(
               MIN2((anv_gettime_ns() - abs_timeout_ns) / 10, 10 * 1000));
         struct timespec abstime = {
            .tv_sec = abs_short_wait_ns / NSEC_PER_SEC,
            .tv_nsec = abs_short_wait_ns % NSEC_PER_SEC,
         };
         ASSERTED int ret;
         ret = pthread_cond_timedwait(&device->queue_submit,
                                      &device->mutex, &abstime);
         assert(ret != EINVAL);
      }
   }
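The relock can be reproduced outside the driver. Below is a minimal standalone sketch, not Mesa code: relock_after_timedwait() and its unlock_first parameter are hypothetical names made up for this illustration. It uses an error-checking mutex so that the second pthread_mutex_lock() reports EDEADLK instead of hanging the way a default mutex can.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper, not part of Mesa. It locks a mutex, waits on a
 * condition variable until the wait times out, then locks the mutex again
 * the way the next loop iteration in the quoted code would.
 * If unlock_first is non-zero, the mutex is released before the relock.
 * Returns the result of the second pthread_mutex_lock() call. */
int relock_after_timedwait(int unlock_first)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    pthread_cond_t cond;

    /* An error-checking mutex returns EDEADLK when its owner locks it
     * again, so the demo does not block. */
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    pthread_cond_init(&cond, NULL);

    pthread_mutex_lock(&mutex);

    /* Absolute deadline roughly 1 ms from now (CLOCK_REALTIME is the
     * default clock for condition variables). */
    struct timespec abstime;
    clock_gettime(CLOCK_REALTIME, &abstime);
    abstime.tv_nsec += 1000000L;
    if (abstime.tv_nsec >= 1000000000L) {
        abstime.tv_sec += 1;
        abstime.tv_nsec -= 1000000000L;
    }

    /* Nobody signals cond; repeat on spurious wakeups until the wait
     * times out or fails. */
    int ret;
    do {
        ret = pthread_cond_timedwait(&cond, &mutex, &abstime);
    } while (ret == 0);

    /* pthread_cond_timedwait() has re-acquired the mutex at this point. */
    if (unlock_first)
        pthread_mutex_unlock(&mutex);

    int relock = pthread_mutex_lock(&mutex);

    /* In both cases the mutex is now held exactly once. */
    pthread_mutex_unlock(&mutex);
    pthread_cond_destroy(&cond);
    pthread_mutex_destroy(&mutex);
    return relock;
}
```

Calling relock_after_timedwait(0) should return EDEADLK, which corresponds to the hang in frame #1 of the trace, while relock_after_timedwait(1) should return 0. This suggests that either unlocking device->mutex at the end of each loop iteration or taking the lock once before the while loop would avoid the deadlock; I have not checked which approach upstream would prefer.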
Edited by alex fishman