userptr deadlock in Vulkan CTS: mmu_notifier vs workers vs struct_mutex
@majanes
Submitted by Mark Janes
Assigned to Chris Wilson @ickle
Link to original bug (#108456)
Description
Vulkan CTS 1.1.2 causes machines to hang intermittently. dmesg on a hung machine showed the following output:
[78782.499488] deqp-vk D 0 19687 19392 0x00000000
[78782.499488] Call Trace:
[78782.499489] ? __schedule+0x291/0x880
[78782.499490] schedule+0x28/0x80
[78782.499491] schedule_preempt_disabled+0xa/0x10
[78782.499492] __mutex_lock.isra.1+0x1a0/0x4e0
[78782.499496] ? drm_gem_handle_create+0x40/0x40 [drm]
[78782.499505] ? i915_gem_close_object+0x3a/0x160 [i915]
[78782.499514] i915_gem_close_object+0x3a/0x160 [i915]
[78782.499519] ? drm_gem_handle_create+0x40/0x40 [drm]
[78782.499523] drm_gem_object_release_handle+0x2c/0x90 [drm]
[78782.499527] drm_gem_handle_delete+0x57/0x80 [drm]
[78782.499531] drm_ioctl_kernel+0x59/0xb0 [drm]
[78782.499535] drm_ioctl+0x2cb/0x380 [drm]
[78782.499539] ? drm_gem_handle_create+0x40/0x40 [drm]
[78782.499541] do_vfs_ioctl+0xa1/0x610
[78782.499542] ? do_munmap+0x32e/0x440
[78782.499543] SyS_ioctl+0x74/0x80
[78782.499544] entry_SYSCALL_64_fastpath+0x24/0x87
[78782.499545] RIP: 0033:0x7fd6e249af07
[78782.499545] RSP: 002b:00007ffe035cb448 EFLAGS: 00000246
[78782.499546] INFO: task kworker/7:2:19952 blocked for more than 120 seconds.
[78782.499564] Tainted: G U 4.15.0 #4
[78782.499576] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[78782.499595] kworker/7:2 D 0 19952 2 0x80000000
[78782.499600] Workqueue: events drm_fb_helper_dirty_work [drm_kms_helper]
This error is easier to reproduce on gen8 systems. Often, the hung machines reported VK_ERROR_OUT_OF_DEVICE_MEMORY for tests like:
dEQP-VK.pipeline.render_to_image.core.cube_array.huge.width_height_layers.r8g8b8a8_unorm_d16_unorm
Machines may complete a test run, then hang during a subsequent CI run.
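Because the hang can surface only on a later CI run, it may help to scan dmesg for hung-task reports after each run. The following is a minimal sketch, not part of the original report: the function name `find_hung_tasks` is hypothetical, and the pattern assumes the hung-task message format shown in the log above ("INFO: task <comm>:<pid> blocked for more than <N> seconds.").

```python
import re

# Assumption: the kernel hung-task detector prints lines shaped like the one
# in the log above, e.g.
#   [78782.499546] INFO: task kworker/7:2:19952 blocked for more than 120 seconds.
# The task name may itself contain ':' (kworker/7:2), so the greedy match on
# the name leaves only the final ':<digits>' field to be taken as the PID.
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(dmesg_text):
    """Return a list of (comm, pid, seconds) for each hung-task report."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK.finditer(dmesg_text)
    ]
```

A CI job could feed the captured dmesg text to `find_hung_tasks` and flag the machine for a reboot when the returned list is non-empty.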