NULL pointer dereference in ttm_pool_shrink
Brief summary of the problem:
It appears that @ckoenig's new TTM allocator can sometimes lead to a kernel panic. After a quick analysis, it seems like some buffers have
Hardware description:
- CPU: AMD Ryzen 7 PRO 4750U
- GPU: AMD Ryzen 7 PRO 4750U
- System Memory: 16 GB
- Display(s): 2 x 1080p
- Type of Display Connection: eDP + HDMI
System information:
- Distro name and Version: Arch Linux
- Custom kernel: 5.10-rc2 (drm-misc-next)
- AMD package version: "No package"
How to reproduce the issue:
- Deploy the current drm-misc-next tree
- Start Rise of the Tomb Raider's benchmark
- Wait for it to crash
Here is the problematic piece of code:
static unsigned int ttm_pool_shrink(void)
{
	struct ttm_pool_type *pt;
	unsigned int num_freed;
	struct page *p;

	spin_lock(&shrinker_lock);
	pt = list_first_entry(&shrinker_list, typeof(*pt), shrinker_list);

	p = ttm_pool_type_take(pt);
	if (p) {
		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
		num_freed = 1 << pt->order;
	} else {
		num_freed = 0;
	}

	list_move_tail(&pt->shrinker_list, &shrinker_list);
	spin_unlock(&shrinker_lock);

	return num_freed;
}
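For some buffers, pt->pool is NULL, which leads to the kernel panic. Changing the condition from if (p) to if (pt->pool && p) prevents the crash, but quite likely leads to memory leaks: the page has already been taken out of the pool type by ttm_pool_type_take() and is then never freed.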
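To illustrate why I suspect the guard leaks, here is a minimal user-space sketch of the guarded path. It is not the kernel code: mock_page, mock_pool, mock_pool_type, mock_take() and mock_shrink_guarded() are hypothetical stand-ins, and "freeing" a page is modelled by setting a flag. The only assumption carried over from TTM is that taking a page removes it from the cache before the free is attempted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the TTM types. */
struct mock_page { int freed; };
struct mock_pool { int id; };

struct mock_pool_type {
	struct mock_pool *pool;      /* may be NULL, as observed in this report */
	unsigned int order;
	struct mock_page *pages[4];  /* cached pages, used as a small stack */
	size_t count;
};

/* Pop one cached page, or return NULL when the cache is empty. */
static struct mock_page *mock_take(struct mock_pool_type *pt)
{
	assert(pt != NULL);
	return pt->count ? pt->pages[--pt->count] : NULL;
}

/*
 * Guarded variant of the shrink step: the free is skipped when
 * pt->pool is NULL. *leaked counts pages that left the cache
 * without being released.
 */
static unsigned int mock_shrink_guarded(struct mock_pool_type *pt,
					unsigned int *leaked)
{
	struct mock_page *p = mock_take(pt);

	if (pt->pool && p) {
		p->freed = 1;            /* stands in for the real free */
		return 1u << pt->order;
	}

	if (p)
		(*leaked)++;             /* taken from the cache, never freed */

	return 0;
}
```

With pool set to NULL and one cached page, mock_shrink_guarded() returns 0, the cache count drops to 0, and the leak counter goes up by one, which is the behaviour I would expect from the workaround in the real shrinker.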
I'll spend more time tomorrow, bisecting the problem (just to be sure).
Edited by Martin Roukala