anv: binding table pool leak or excessive caching
I noticed this while running dEQP, and it can trigger OOM on some low-RAM devices.
Below is an example. The chosen test can be any that involves command submission, so that the device binding/bindless tables are pinned. This is the normal case for that single test run: the total BO size is 4.2Mb, including the pinned state pools.
INTEL_DEBUG=submit ./deqp-vk --deqp-log-filename=/tmp/dEQP-Log.qpa --deqp-case=dEQP-VK.api.buffer_view.access.suballocation.buffer_view_memory_test_complete
Batch offset=0x0 len=0x0 on queue 0 (aperture: 4.2Mb, 0.0Mb VRAM only)
BO: addr=0xfffffffefffff000-0xfffffffeffffffff size= 4KB handle=00007 capture=1 vram_only=0 name=workaround
BO: addr=0xfffffffeffefb000-0xfffffffeffffcfff size= 1032KB handle=00011 capture=0 vram_only=0 name=user
BO: addr=0xfffffffeffdfb000-0xfffffffeffefafff size= 1024KB handle=00012 capture=0 vram_only=0 name=user
BO: addr=0xfffffffeffdf9000-0xfffffffeffdf9fff size= 4KB handle=00014 capture=0 vram_only=0 name=descriptors
BO: addr=0xfffffffeffdfa000-0xfffffffeffdfafff size= 4KB handle=00013 capture=0 vram_only=0 name=user
BO: addr=0x00000000c0000000-0x00000000c003ffff size= 256KB handle=00002 capture=1 vram_only=0 name=dynamic pool
BO: addr=0x0000000200000000-0x000000020003ffff size= 256KB handle=00003 capture=1 vram_only=0 name=instruction pool
BO: addr=0x0000000140000000-0x000000014000ffff size= 64KB handle=00004 capture=1 vram_only=0 name=internal surface state pool
BO: addr=0x00000001c0000000-0x00000001c000ffff size= 64KB handle=00005 capture=1 vram_only=0 name=bindless surface state pool
BO: addr=0x0000000100000000-0x00000001000fffff size= 1024KB handle=00006 capture=1 vram_only=0 name=binding table pool
BO: addr=0xfffffffeffdf8000-0xfffffffeffdf8fff size= 4KB handle=00015 capture=0 vram_only=0 name=user
BO: addr=0x00000000c0040000-0x00000000c007ffff size= 256KB handle=00010 capture=1 vram_only=0 name=dynamic pool
BO: addr=0x0000000000200000-0x000000000023ffff size= 256KB handle=00001 capture=1 vram_only=0 name=general pool
BO: addr=0xfffffffeffdf6000-0xfffffffeffdf7fff size= 8KB handle=00016 capture=1 vram_only=0 name=batch
However, if you run the same test right after the dEQP object_management test group using this caselist.txt, you consistently get a total BO size of more than 1GB:
INTEL_DEBUG=submit ./deqp-vk --deqp-log-filename=/tmp/dEQP-Log.qpa --deqp-caselist-file=./caselist.txt
Batch offset=0x0 len=0x0 on queue 0 (aperture: 1034.5Mb, 0.0Mb VRAM only)
BO: addr=0xfffffffefffff000-0xfffffffeffffffff size= 4KB handle=00007 capture=1 vram_only=0 name=workaround
BO: addr=0xfffffffef7efb000-0xfffffffef7ffcfff size= 1032KB handle=16418 capture=0 vram_only=0 name=user
BO: addr=0xfffffffef7dfb000-0xfffffffef7efafff size= 1024KB handle=16419 capture=0 vram_only=0 name=user
BO: addr=0xfffffffef7df9000-0xfffffffef7df9fff size= 4KB handle=16421 capture=0 vram_only=0 name=descriptors
BO: addr=0xfffffffef7dfa000-0xfffffffef7dfafff size= 4KB handle=16420 capture=0 vram_only=0 name=user
BO: addr=0x00000000c0000000-0x00000000c003ffff size= 256KB handle=00002 capture=1 vram_only=0 name=dynamic pool
BO: addr=0x0000000200000000-0x000000020003ffff size= 256KB handle=00003 capture=1 vram_only=0 name=instruction pool
BO: addr=0x0000000140000000-0x000000014000ffff size= 64KB handle=00004 capture=1 vram_only=0 name=internal surface state pool
BO: addr=0x00000001c0000000-0x00000001c000ffff size= 64KB handle=00005 capture=1 vram_only=0 name=bindless surface state pool
BO: addr=0x0000000100000000-0x00000001000fffff size= 1024KB handle=00006 capture=1 vram_only=0 name=binding table pool
BO: addr=0xfffffffef7df8000-0xfffffffef7df8fff size= 4KB handle=16422 capture=0 vram_only=0 name=user
BO: addr=0x00000001c0010000-0x00000001c001ffff size= 64KB handle=00016 capture=1 vram_only=0 name=bindless surface state pool
BO: addr=0x00000001c0020000-0x00000001c003ffff size= 128KB handle=00017 capture=1 vram_only=0 name=bindless surface state pool
BO: addr=0x00000001c0040000-0x00000001c007ffff size= 256KB handle=00018 capture=1 vram_only=0 name=bindless surface state pool
BO: addr=0x00000001c0080000-0x00000001c00fffff size= 512KB handle=00019 capture=1 vram_only=0 name=bindless surface state pool
BO: addr=0x00000001c0100000-0x00000001c01fffff size= 1024KB handle=00020 capture=1 vram_only=0 name=bindless surface state pool
BO: addr=0x00000001c0200000-0x00000001c03fffff size= 2048KB handle=00021 capture=1 vram_only=0 name=bindless surface state pool
BO: addr=0x0000000140010000-0x000000014001ffff size= 64KB handle=00024 capture=1 vram_only=0 name=internal surface state pool
BO: addr=0x0000000140020000-0x000000014003ffff size= 128KB handle=00025 capture=1 vram_only=0 name=internal surface state pool
BO: addr=0x0000000140040000-0x000000014007ffff size= 256KB handle=00026 capture=1 vram_only=0 name=internal surface state pool
BO: addr=0x0000000140080000-0x00000001400fffff size= 512KB handle=00027 capture=1 vram_only=0 name=internal surface state pool
BO: addr=0x0000000140100000-0x00000001401fffff size= 1024KB handle=00028 capture=1 vram_only=0 name=internal surface state pool
BO: addr=0x00000000c0040000-0x00000000c007ffff size= 256KB handle=00010 capture=1 vram_only=0 name=dynamic pool
BO: addr=0x00000000c0080000-0x00000000c00fffff size= 512KB handle=00015 capture=1 vram_only=0 name=dynamic pool
BO: addr=0x00000000c0100000-0x00000000c01fffff size= 1024KB handle=00022 capture=1 vram_only=0 name=dynamic pool
BO: addr=0x0000000000200000-0x000000000023ffff size= 256KB handle=00001 capture=1 vram_only=0 name=general pool
BO: addr=0x0000000100100000-0x00000001001fffff size= 1024KB handle=00041 capture=1 vram_only=0 name=binding table pool
BO: addr=0x0000000100200000-0x00000001003fffff size= 2048KB handle=00058 capture=1 vram_only=0 name=binding table pool
BO: addr=0x0000000100400000-0x00000001007fffff size= 4096KB handle=00091 capture=1 vram_only=0 name=binding table pool
BO: addr=0x0000000100800000-0x0000000100ffffff size= 8192KB handle=00156 capture=1 vram_only=0 name=binding table pool
BO: addr=0x0000000101000000-0x0000000101ffffff size= 16384KB handle=00285 capture=1 vram_only=0 name=binding table pool
BO: addr=0x0000000102000000-0x0000000103ffffff size= 32768KB handle=00542 capture=1 vram_only=0 name=binding table pool
BO: addr=0x0000000104000000-0x0000000107ffffff size= 65536KB handle=01055 capture=1 vram_only=0 name=binding table pool
BO: addr=0x0000000108000000-0x000000010fffffff size= 131072KB handle=02080 capture=1 vram_only=0 name=binding table pool
BO: addr=0x0000000110000000-0x000000011fffffff size= 262144KB handle=04129 capture=1 vram_only=0 name=binding table pool
BO: addr=0x0000000120000000-0x000000013fffffff size= 524288KB handle=08226 capture=1 vram_only=0 name=binding table pool
BO: addr=0xfffffffeffff5000-0xfffffffeffff6fff size= 8KB handle=00014 capture=1 vram_only=0 name=batch
The next binding table pool block would be 1GB, and in some combinations of test runs that 1GB block gets allocated as well. A 2GB total makes things worse on low-RAM devices. Those state pool BOs seem to be backed by memory lazily, but execbuffer (EB) forces each entire BO to be allocated, causing a spike in system memory load and likely triggering OOM.
Is this intended behavior, or is it due to a leak?