lima_bo leak in kernel - page allocation failure
I'm observing a memory leak with the latest drm-misc, where the lima_bo objects allocated in lima_bo_create() never get deallocated.
I can reproduce this by running gbm-surface repeatedly and watching the "Unevictable" count in /proc/meminfo grow with every run.
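In case it's useful, this is the trivial check I run between gbm-surface invocations to get the numbers below; it just dumps the Unevictable and Shmem lines from /proc/meminfo (nothing lima-specific, purely for reference):

/* Trivial /proc/meminfo watcher; the file and field names are the
 * standard ones, only the selection of lines is mine. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[256];
    FILE *f = fopen("/proc/meminfo", "r");

    if (!f) {
        perror("/proc/meminfo");
        return 1;
    }

    while (fgets(line, sizeof(line), f)) {
        /* print only the counters that keep growing across runs */
        if (!strncmp(line, "Unevictable:", 12) ||
            !strncmp(line, "Shmem:", 6))
            fputs(line, stdout);
    }

    fclose(f);
    return 0;
}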
At some point the system runs out of memory and the allocations start failing like this:
[ 1870.317242] vertex-program-: page allocation failure: order:0, mode:0x4(GFP_DMA32), nodemask=(null)
[ 1870.326522] CPU: 2 PID: 711 Comm: vertex-program- Not tainted 5.1.0-rc5 #1
[ 1870.333402] Hardware name: Allwinner sun8i Family
[ 1870.338153] [<c011e704>] (unwind_backtrace) from [<c0116520>] (show_stack+0x2c/0x64)
[ 1870.345905] [<c0116520>] (show_stack) from [<c0e6b5a4>] (dump_stack+0x1b8/0x228)
[ 1870.353309] [<c0e6b5a4>] (dump_stack) from [<c02dd220>] (warn_alloc+0x180/0x33c)
[ 1870.360712] [<c02dd220>] (warn_alloc) from [<c02ded8c>] (__alloc_pages_nodemask+0x1814/0x1f3c)
[ 1870.369330] [<c02ded8c>] (__alloc_pages_nodemask) from [<c0307be0>] (shmem_getpage_gfp.constprop.1+0x218/0x1564)
[ 1870.379507] [<c0307be0>] (shmem_getpage_gfp.constprop.1) from [<c030cfa0>] (shmem_read_mapping_page_gfp+0x64/0x138)
[ 1870.389945] [<c030cfa0>] (shmem_read_mapping_page_gfp) from [<c0890c44>] (drm_gem_get_pages+0x118/0x464)
[ 1870.399467] [<c0890c44>] (drm_gem_get_pages) from [<bf0552f8>] (lima_bo_create+0x254/0x4bc [lima])
[ 1870.408461] [<bf0552f8>] (lima_bo_create [lima]) from [<bf05094c>] (lima_gem_create_handle+0x48/0xec [lima])
[ 1870.418315] [<bf05094c>] (lima_gem_create_handle [lima]) from [<bf049b78>] (lima_ioctl_gem_create+0x5c/0xec [lima])
[ 1870.428763] [<bf049b78>] (lima_ioctl_gem_create [lima]) from [<c089580c>] (drm_ioctl_kernel+0x144/0x218)
[ 1870.438246] [<c089580c>] (drm_ioctl_kernel) from [<c089607c>] (drm_ioctl+0x404/0x834)
[ 1870.446081] [<c089607c>] (drm_ioctl) from [<c03caed4>] (do_vfs_ioctl+0xc4/0x1b04)
[ 1870.453567] [<c03caed4>] (do_vfs_ioctl) from [<c03cc97c>] (ksys_ioctl+0x68/0xf8)
[ 1870.460966] [<c03cc97c>] (ksys_ioctl) from [<c03cca2c>] (sys_ioctl+0x20/0x40)
[ 1870.468105] [<c03cca2c>] (sys_ioctl) from [<c0101000>] (ret_fast_syscall+0x0/0x54)
[ 1870.475669] Exception stack(0xc5b57fa8 to 0xc5b57ff0)
[ 1870.480724] 7fa0: b6523c98 00000005 00000005 c0106441 beffec40 00000007
[ 1870.488901] 7fc0: b6523c98 00000005 c0106441 00000036 00008000 00059cf0 000309e0 b6522808
[ 1870.497072] 7fe0: b6b65090 beffec24 b6b4e818 b698347c
[ 1870.502196] Mem-Info:
[ 1870.504582] active_anon:13152 inactive_anon:12163 isolated_anon:0
active_file:254 inactive_file:142 isolated_file:0
unevictable:75198 dirty:0 writeback:0 unstable:0
slab_reclaimable:1790 slab_unreclaimable:3469
mapped:2638 shmem:85157 pagetables:123 bounce:0
free:16546 free_pcp:70 free_cma:14544
[ 1870.538154] Node 0 active_anon:52608kB inactive_anon:48652kB active_file:1016kB inactive_file:568kB unevictable:300792kB isolated(anon):0kB isolated(file):0kB mapped:10552kB dirty:0kB writeback:0kB shmem:340628kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1870.561640] Normal free:66184kB min:8164kB low:8792kB high:9420kB active_anon:52584kB inactive_anon:48632kB active_file:940kB inactive_file:568kB unevictable:300792kB writepending:0kB present:524288kB managed:498384kB mlocked:0kB kernel_stack:696kB pagetables:492kB bounce:0kB free_pcp:280kB local_pcp:156kB free_cma:58176kB
[ 1870.590213] lowmem_reserve[]: 0 0 0
[ 1870.593753] Normal: 1084*4kB (UMEC) 517*8kB (UMC) 379*16kB (UMC) 180*32kB (C) 137*64kB (C) 94*128kB (C) 52*256kB (C) 13*512kB (C) 5*1024kB (C) 0*2048kB 0*4096kB = 66184kB
[ 1870.609038] 88058 total pagecache pages
[ 1870.612919] 2483 pages in swap cache
[ 1870.616497] Swap cache stats: add 33420, delete 30935, find 3489/5676
[ 1870.622963] Free swap = 172028kB
[ 1870.626286] Total swap = 262140kB
[ 1870.629600] 131072 pages RAM
[ 1870.632481] 0 pages HighMem/MovableOnly
[ 1870.636352] 6476 pages reserved
[ 1870.639504] 16384 pages cma reserved
I was able to bisect it to this commit in drm-misc-next:
5918045c4ed4 drm/scheduler: rework job destruction
So maybe something in our scheduler usage needs to be updated so that those lima_bo objects get freed properly after that change.
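For illustration only, this is roughly the shape of the cleanup path I would expect on the driver side: the last references to a finished job's BOs should be dropped from the scheduler's .free_job callback, so if the reworked scheduler skips free_job on some path, every submitted BO (and its shmem pages) would be leaked, which would match the Unevictable/Shmem growth above. The names my_task/my_free_job below are made up, not the actual lima code; my understanding is that drivers are also expected to call drm_sched_job_cleanup() in free_job after the rework, but please correct me if I got that wrong.

#include <linux/kernel.h>
#include <linux/slab.h>
#include <drm/drm_gem.h>
#include <drm/gpu_scheduler.h>

/* Hypothetical driver job; lima's real struct differs. */
struct my_task {
    struct drm_sched_job base;
    struct drm_gem_object **bos;
    int num_bos;
};

/* Called by the scheduler once it is done with the job; this is where
 * the BO references taken at submit time are supposed to be dropped. */
static void my_free_job(struct drm_sched_job *job)
{
    struct my_task *task = container_of(job, struct my_task, base);
    int i;

    for (i = 0; i < task->num_bos; i++)
        drm_gem_object_put_unlocked(task->bos[i]);

    /* As far as I can tell, the rework expects the base job to be
     * cleaned up here as well. */
    drm_sched_job_cleanup(job);

    kfree(task->bos);
    kfree(task);
}

static const struct drm_sched_backend_ops my_sched_ops = {
    /* .run_job / .timedout_job omitted, not relevant to the leak */
    .free_job = my_free_job,
};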