GM20B pmu timeout
[Posting here as well per Thorsten's suggestion, see LKML 1]
Hello,
Somewhere between 6.11-rc4 and 6.11-rc5 the following error message is displayed when trying to initialize a nvc0_screen on the Tegra X1's GM20B:
[ 34.431210] nouveau 57000000.gpu: pmu:hpq: timeout waiting for queue ready [ 34.438145] nouveau 57000000.gpu: gr: init failed, -110 nvc0_screen_create:1075 - Error allocating PGRAPH context for M2MF: -110 failed to create GPU screen
If we then try a second time we get a more detailed error message:
[ 27.432391] ------------[ cut here ]------------ [ 27.437019] nouveau 57000000.gpu: timeout [ 27.441083] WARNING: CPU: 2 PID: 307 at drivers/gpu/drm/nouveau/nvkm/engine/gr/gf100.c:840 gf100_gr_fecs_bind_pointer+0x140/0x158 [nouveau] [ 27.453897] Modules linked in: nouveau drm_ttm_helper ttm backlight gpu_sched i2c_algo_bit drm_gpuvm drm_exec efivarfs [ 27.464592] CPU: 2 UID: 0 PID: 307 Comm: loadjpeg Not tainted 6.11.0-rc4+ #1 (closed) [ 27.471628] Hardware name: nvidia NVIDIA P2371-2180/NVIDIA P2371-2180, BIOS 2024.10-rc5-00018-g56b47b8b6a09 10/01/2024 [ 27.482303] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 27.489251] pc : gf100_gr_fecs_bind_pointer+0x140/0x158 [nouveau] [ 27.495535] lr : gf100_gr_fecs_bind_pointer+0x140/0x158 [nouveau] [ 27.501794] sp : ffffffc082473810 [ 27.505100] x29: ffffffc082473840 x28: ffffff80c56fe500 x27: ffffff80c6f3be40 [ 27.512227] x26: 00000000804001ea x25: 0000000000000001 x24: 0000000000000000 [ 27.519351] x23: ffffff80c5516808 x22: ffffffc079d08350 x21: ffffff80c16bae40 [ 27.526476] x20: 0000000000409800 x19: ffffff80c5516808 x18: ffffffffffffffff [ 27.533599] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000006 [ 27.540724] x14: ffffffc0817defc8 x13: 74756f656d697420 x12: 3a7570672e303030 [ 27.547848] x11: ffffffc0817defc8 x10: 00000000000003f1 x9 : ffffffc081836fc8 [ 27.554972] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : 0000000000000001 [ 27.562096] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 [ 27.569218] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff80d578c600 [ 27.576341] Call trace: [ 27.578780] gf100_gr_fecs_bind_pointer+0x140/0x158 [nouveau] [ 27.584698] gf100_grctx_generate+0x54c/0x6f4 [nouveau] [ 27.590093] gf100_gr_chan_new+0x3f8/0x430 [nouveau] [ 27.595223] nvkm_gr_cclass_new+0x34/0x48 [nouveau] [ 27.600269] nvkm_cgrp_ectx_get+0x134/0x224 [nouveau] [ 27.605485] nvkm_cgrp_vctx_get+0x11c/0x300 [nouveau] [ 27.610704] nvkm_chan_cctx_get+0x144/0x25c [nouveau] [ 27.615920] nvkm_uchan_object_new+0xd8/0x1e0 [nouveau] [ 27.621311] nvkm_ioctl_new+0x14c/0x24c [nouveau] [ 27.626167] nvkm_ioctl+0xd0/0x280 [nouveau] [ 27.630590] nvkm_client_ioctl+0x10/0x1c [nouveau] [ 27.635551] nvif_client_ioctl+0x20/0x2c [nouveau] [ 27.640493] usif_ioctl+0x294/0x420 [nouveau] [ 27.645021] nouveau_drm_ioctl+0xb0/0xe0 [nouveau] [ 27.649982] __arm64_sys_ioctl+0xac/0xf0 [ 27.653900] invoke_syscall+0x48/0x104 [ 27.657645] el0_svc_common.constprop.0+0x40/0xe0 [ 27.662341] do_el0_svc+0x1c/0x28 [ 27.665650] el0_svc+0x3c/0x108 [ 27.668787] el0t_64_sync_handler+0x120/0x12c [ 27.673133] el0t_64_sync+0x190/0x194 [ 27.676789] ---[ end trace 0000000000000000 ]--- [ 27.681937] nouveau 57000000.gpu: gr: failed to construct context [ 27.688126] nouveau 57000000.gpu: fifo:000000:0002:[loadjpeg[307]] ectx 0[gr]: -110 [ 27.695786] nouveau 57000000.gpu: fifo:000000:0002:0002:[loadjpeg[307]] vctx 0[gr]: -110 nvc0_screen_create:1075 - Error allocating PGRAPH context for M2MF: -110 failed to create GPU screen
but I am not sure if this is connected to the fact that the first attempt failed or not.
When trying to bissect the issue the "bad" commit I obtained was 9b340aeb. However, checking out this commit and compiling the kernel leads to a different error where we have a boot regression:
[ 19.146693] nouveau 57000000.gpu: Adding to iommu group 3 [ 19.155581] nouveau 57000000.gpu: NVIDIA GM20B (12b000a1) [ 19.161025] nouveau 57000000.gpu: imem: using IOMMU [ 22.451833] ------------[ cut here ]------------ [ 22.456460] nouveau 57000000.gpu: timeout [ 22.460508] WARNING: CPU: 0 PID: 201 at drivers/gpu/drm/nouveau/nvkm/falcon/gm200.c:231 gm200_flcn_fw_boot+0x2a4/0x428 [nouveau] [ 22.472384] Modules linked in: nouveau(+) drm_ttm_helper ttm backlight gpu_sched i2c_algo_bit drm_gpuvm drm_exec efivarfs [ 22.483342] CPU: 0 UID: 0 PID: 201 Comm: (udev-worker) Not tainted 6.11.0-rc1+ #4 [ 22.490811] Hardware name: nvidia NVIDIA P2371-2180/NVIDIA P2371-2180, BIOS 2024.10-rc5-00018-g56b47b8b6a09 10/01/2024 [ 22.501485] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 22.508434] pc : gm200_flcn_fw_boot+0x2a4/0x428 [nouveau] [ 22.514063] lr : gm200_flcn_fw_boot+0x2a4/0x428 [nouveau] [ 22.519656] sp : ffffffc0822fb3e0 [ 22.522961] x29: ffffffc0822fb410 x28: ffffff80c7bf0008 x27: ffffff80d5625208 [ 22.530088] x26: 0000000000000001 x25: 0000000000000010 x24: 0000000000000000 [ 22.537213] x23: ffffff80c4e920b8 x22: 0000000000000000 x21: 0000000000000000 [ 22.544336] x20: 0000000000000010 x19: ffffff80c4e920b8 x18: ffffffffffffffff [ 22.551460] x17: 000000000000d000 x16: 0000000000000000 x15: 0000000000000006 [ 22.558585] x14: ffffffc08181efa8 x13: 74756f656d697420 x12: 3a7570672e303030 [ 22.565709] x11: ffffffc08181efa8 x10: 00000000000003fd x9 : ffffffc081876fa8 [ 22.572834] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : 0000000000000001 [ 22.579958] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 [ 22.587083] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff80c2f98000 [ 22.594208] Call trace: [ 22.596647] gm200_flcn_fw_boot+0x2a4/0x428 [nouveau] [ 22.601904] nvkm_falcon_fw_boot+0x1b4/0x598 [nouveau] [ 22.607237] nvkm_acr_hsfw_boot+0x78/0xa4 [nouveau] [ 22.612309] gm200_acr_init+0x18/0x24 [nouveau] [ 22.617034] nvkm_acr_load+0x7c/0x18c [nouveau] [ 22.621760] nvkm_acr_init+0x54/0x70 [nouveau] [ 22.626400] nvkm_subdev_init_+0x5c/0x12c [nouveau] [ 22.631471] nvkm_subdev_init+0x60/0xa0 [nouveau] [ 22.636370] nvkm_device_init+0x160/0x2a0 [nouveau] [ 22.641448] nvkm_udevice_init+0x60/0xa0 [nouveau] [ 22.646439] nvkm_object_init+0x48/0x1c0 [nouveau] [ 22.651426] nvkm_ioctl_new+0x164/0x24c [nouveau] [ 22.656323] nvkm_ioctl+0xd0/0x280 [nouveau] [ 22.660787] nvkm_client_ioctl+0x10/0x1c [nouveau] [ 22.665784] nvif_object_ctor+0xe8/0x1b8 [nouveau] [ 22.670769] nvif_device_ctor+0x28/0x78 [nouveau] [ 22.675667] nouveau_cli_init+0x154/0x5e0 [nouveau] [ 22.680749] nouveau_drm_device_init+0x84/0x2e0 [nouveau] [ 22.686352] nouveau_platform_device_create+0x90/0xe0 [nouveau] [ 22.692476] nouveau_platform_probe+0x40/0xc0 [nouveau] [ 22.697904] platform_probe+0x68/0xd8 [ 22.701564] really_probe+0xbc/0x2c0 [ 22.705133] __driver_probe_device+0x78/0x120 [ 22.709480] driver_probe_device+0x3c/0x160 [ 22.713654] __driver_attach+0x90/0x1a0 [ 22.717481] bus_for_each_dev+0x78/0xd8 [ 22.721309] driver_attach+0x24/0x30 [ 22.724875] bus_add_driver+0xe4/0x208 [ 22.728615] driver_register+0x68/0x124 [ 22.732443] __platform_driver_register+0x28/0x40 [ 22.737137] nouveau_drm_init+0x90/0x1000 [nouveau] [ 22.742217] do_one_initcall+0x44/0x230 [ 22.746047] do_init_module+0x5c/0x220 [ 22.749788] load_module+0x748/0x87c [ 22.753355] init_module_from_file+0x88/0xcc [ 22.757617] __arm64_sys_finit_module+0x164/0x328 [ 22.762310] invoke_syscall+0x48/0x104 [ 22.766054] el0_svc_common+0xc8/0xe8 [ 22.769710] do_el0_svc+0x20/0x34 [ 22.773017] el0_svc+0x3c/0x108 [ 22.776155] el0t_64_sync_handler+0x120/0x12c [ 22.780502] el0t_64_sync+0x190/0x194 [ 22.784156] ---[ end trace 0000000000000000 ]--- [ 22.788838] nouveau 57000000.gpu: pmu(acr): mbox 00000001 00000000 [ 22.795033] nouveau 57000000.gpu: pmu(acr):load: boot failed: -110 [ 22.801235] nouveau 57000000.gpu: acr: init failed, -110 [ 22.806858] nouveau 57000000.gpu: init failed with -110 [ 22.812084] nouveau: DRM-master:00000000:00000080: init failed with -110 [ 22.818793] nouveau 57000000.gpu: DRM-master: Device allocation failed: -110 [ 22.826368] ------------[ cut here ]------------ [ 22.830980] WARNING: CPU: 2 PID: 201 at drivers/gpu/drm/nouveau/nvkm/subdev/mmu/base.c:239 nvkm_mmu_dtor+0xac/0xc0 [nouveau] [ 22.842573] Modules linked in: nouveau(+) drm_ttm_helper ttm backlight gpu_sched i2c_algo_bit drm_gpuvm drm_exec efivarfs [ 22.853529] CPU: 2 UID: 0 PID: 201 Comm: (udev-worker) Tainted: G W 6.11.0-rc1+ #4 [ 22.862475] Tainted: [W]=WARN [ 22.865433] Hardware name: nvidia NVIDIA P2371-2180/NVIDIA P2371-2180, BIOS 2024.10-rc5-00018-g56b47b8b6a09 10/01/2024 [ 22.876107] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 22.883055] pc : nvkm_mmu_dtor+0xac/0xc0 [nouveau] [ 22.888063] lr : nvkm_mmu_dtor+0x24/0xc0 [nouveau] [ 22.893057] sp : ffffffc0822fb7f0 [ 22.896362] x29: ffffffc0822fb7f0 x28: 0000000000000000 x27: ffffffc079c69a18 [ 22.903488] x26: ffffffc079c69d38 x25: ffffffc081892ce8 x24: ffffff80d5624e00 [ 22.910613] x23: ffffff80d5624e08 x22: dead000000000122 x21: dead000000000100 [ 22.917737] x20: ffffff80d5624f50 x19: ffffff80c4e07500 x18: ffffffffffffffff [ 22.924861] x17: 0000000000001000 x16: 0000000000000000 x15: 0000000000000000 [ 22.931985] x14: 0000000000000000 x13: dead000000000122 x12: 0000000000000001 [ 22.939109] x11: 0000000080000000 x10: 0000000000000000 x9 : 0000000000000001 [ 22.946233] x8 : 00000000000007e0 x7 : 0000000000000000 x6 : 0000000000000239 [ 22.953357] x5 : 000000000010000c x4 : dead000000000122 x3 : ffffff80c2fa5b38 [ 22.960481] x2 : ffffff80d519a320 x1 : ffffff80d519a2d0 x0 : ffffff80d519a2c0 [ 22.967604] Call trace: [ 22.970042] nvkm_mmu_dtor+0xac/0xc0 [nouveau] [ 22.974690] nvkm_subdev_del+0x6c/0xf8 [nouveau] [ 22.979504] nvkm_device_del+0x78/0x120 [nouveau] [ 22.984410] nouveau_platform_device_create+0x54/0xe0 [nouveau] [ 22.990534] nouveau_platform_probe+0x40/0xc0 [nouveau] [ 22.995966] platform_probe+0x68/0xd8 [ 22.999624] really_probe+0xbc/0x2c0 [ 23.003192] __driver_probe_device+0x78/0x120 [ 23.007540] driver_probe_device+0x3c/0x160 [ 23.011714] __driver_attach+0x90/0x1a0 [ 23.015542] bus_for_each_dev+0x78/0xd8 [ 23.019369] driver_attach+0x24/0x30 [ 23.022937] bus_add_driver+0xe4/0x208 [ 23.026676] driver_register+0x68/0x124 [ 23.030503] __platform_driver_register+0x28/0x40 [ 23.035197] nouveau_drm_init+0x90/0x1000 [nouveau] [ 23.040274] do_one_initcall+0x44/0x230 [ 23.044103] do_init_module+0x5c/0x220 [ 23.047844] load_module+0x748/0x87c [ 23.051412] init_module_from_file+0x88/0xcc [ 23.055672] __arm64_sys_finit_module+0x164/0x328 [ 23.060367] invoke_syscall+0x48/0x104 [ 23.064110] el0_svc_common+0xc8/0xe8 [ 23.067765] do_el0_svc+0x20/0x34 [ 23.071073] el0_svc+0x3c/0x108 [ 23.074206] el0t_64_sync_handler+0x120/0x12c [ 23.078553] el0t_64_sync+0x190/0x194 [ 23.082206] ---[ end trace 0000000000000000 ]--- [ 23.087065] nouveau 57000000.gpu: imem: instobj LRU not empty! [ 23.092906] nouveau 57000000.gpu: imem: instobj vmap area not empty! 0x40000 bytes still mapped [ 23.101958] nvkm: mm not clean! [ 23.105095] nvkm: node list: [ 23.107994] nvkm: 00000000 00000074 0 [ 23.111750] nvkm: 00400074 00000040 1 [ 23.115496] nvkm: 000000b4 003fff4c 0 [ 23.119248] nvkm: free list: [ 23.122128] nvkm: 00000000 00000074 0 [ 23.125880] nvkm: 000000b4 003fff4c 0 [ 23.129643] nouveau 57000000.gpu: probe with driver nouveau failed with error -110
so I am not sure that this is the actual commit that introduces the breakage. I have also tried to manually checkout some commits to see where the problem could be but unfortunately nothing came out of it.
Best regards, Diogo