Kernel panic due to NULL ringbuffer vaddr dereference in 5.4.28
While rebooting my laptop a few times, I hit this panic on a plain 5.4.28 kernel:
BUG: kernel NULL pointer dereference, address: 0000000000000b60
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP
CPU: 15 PID: 572 Comm: Xorg Tainted: G U 5.4.28-00007-g64bb42e80256-dirty #2
Hardware name: Dell Inc. Precision 5540/0FMYX6, BIOS 1.5.0 12/25/2019
RIP: 0010:gen8_emit_flush+0x28/0x60
Code: 40 00 0f 1f 44 00 00 55 89 f5 be 04 00 00 00 53 48 89 fb e8 1a 83 00 00 48 3d 00 f0 ff ff 77 1d 83 e5 01 ba 02 40 20 13 75 16 <89> 10 48 c7 40 04 04>
RSP: 0018:ffff97e580eb3a40 EFLAGS: 00010202
RAX: 0000000000000b60 RBX: ffff8fcfd0b86580 RCX: 0000000013244082
RDX: 0000000013244002 RSI: 00000000000000a0 RDI: ffff8fcfd0b86580
RBP: 0000000000000001 R08: 00000000000000b0 R09: 0000000000000002
R10: 0000000000000400 R11: ffff8fcfc250c3c0 R12: ffff8fcfd0d06c00
R13: ffff8fcfd4548cc0 R14: ffff8fcfd38dc500 R15: ffff8fcfac4bb3c0
FS: 00007f37f0548dc0(0000) GS:ffff8fcfdc3c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000b60 CR3: 00000008461e6002 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
execlists_request_alloc+0x45/0x130
__i915_request_create+0x217/0x280
i915_request_create+0x71/0xc0
i915_gem_do_execbuffer+0x905/0x14f0
i915_gem_execbuffer2_ioctl+0x1df/0x3d0
? i915_gem_execbuffer_ioctl+0x2f0/0x2f0
drm_ioctl_kernel+0xb2/0x100
drm_ioctl+0x209/0x360
? i915_gem_execbuffer_ioctl+0x2f0/0x2f0
do_vfs_ioctl+0x43f/0x6c0
ksys_ioctl+0x5e/0x90
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x4e/0x140
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f37f13912eb
Code: 0f 1e fa 48 8b 05 a5 8b 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73>
RSP: 002b:00007ffdefeccd58 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000055f524c56280 RCX: 00007f37f13912eb
RDX: 00007ffdefeccd90 RSI: 0000000040406469 RDI: 000000000000000d
RBP: 000000000000000d R08: 0000000000000002 R09: 0000000000000001
R10: 0000000000007fff R11: 0000000000003246 R12: 0000000000000038
R13: 00007f37ec79e000 R14: 00007ffdefeccd90 R15: 00007f37ec79e488
CR2: 0000000000000b60
---[ end trace fddd27adcbe816b9 ]---
RIP: 0010:gen8_emit_flush+0x28/0x60
Code: 40 00 0f 1f 44 00 00 55 89 f5 be 04 00 00 00 53 48 89 fb e8 1a 83 00 00 48 3d 00 f0 ff ff 77 1d 83 e5 01 ba 02 40 20 13 75 16 <89> 10 48 c7 40 04 04>
RSP: 0018:ffff97e580eb3a40 EFLAGS: 00010202
RAX: 0000000000000b60 RBX: ffff8fcfd0b86580 RCX: 0000000013244082
RDX: 0000000013244002 RSI: 00000000000000a0 RDI: ffff8fcfd0b86580
RBP: 0000000000000001 R08: 00000000000000b0 R09: 0000000000000002
R10: 0000000000000400 R11: ffff8fcfc250c3c0 R12: ffff8fcfd0d06c00
R13: ffff8fcfd4548cc0 R14: ffff8fcfd38dc500 R15: ffff8fcfac4bb3c0
FS: 00007f37f0548dc0(0000) GS:ffff8fcfdc3c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000b60 CR3: 00000008461e6002 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
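For readers unfamiliar with the #PF lines at the top of the oops, here is a minimal sketch of how the low bits of an x86 page-fault error code are usually read. The struct and the helper name pf_decode are made up for illustration and are not kernel code. Under the standard bit layout, 0x0002 decodes to a supervisor-mode write to a not-present page, which matches what the log prints.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the low bits of an x86 page-fault error code. */
struct pf_info {
    bool protection;  /* bit 0: 1 = protection violation, 0 = page not present */
    bool write;       /* bit 1: 1 = write access, 0 = read access */
    bool user;        /* bit 2: 1 = fault taken in user mode, 0 = supervisor */
    bool instr_fetch; /* bit 4: 1 = fault caused by an instruction fetch */
};

/* pf_decode is a hypothetical helper name, not a kernel function. */
static struct pf_info pf_decode(uint32_t error_code)
{
    struct pf_info info = {
        .protection  = (error_code & (1u << 0)) != 0,
        .write       = (error_code & (1u << 1)) != 0,
        .user        = (error_code & (1u << 2)) != 0,
        .instr_fetch = (error_code & (1u << 4)) != 0,
    };
    return info;
}
```

Calling pf_decode(0x0002) sets only the write flag: the page was not present, the access was a write, and the CPU was in supervisor mode.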
I traced the faulting instruction in GDB:
(gdb) list *gen8_emit_flush+0x28
0xffffffff817ee868 is in gen8_emit_flush (drivers/gpu/drm/i915/gt/intel_lrc.c:2769).
2764 cmd |= MI_INVALIDATE_TLB;
2765 if (request->engine->class == VIDEO_DECODE_CLASS)
2766 cmd |= MI_INVALIDATE_BSD;
2767 }
2768
2769 *cs++ = cmd;
2770 *cs++ = I915_GEM_HWS_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT;
2771 *cs++ = 0; /* upper addr */
2772 *cs++ = 0; /* value */
2773 intel_ring_advance(request, cs);
It looks like cs held a bogus value (0x0000000000000b60, the same address shown in RAX and CR2) that was returned from intel_ring_begin(). The pointer that intel_ring_begin() returns is cs = ring->vaddr + ring->emit, which means ring->vaddr was NULL and cs was equal to ring->emit.
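The arithmetic behind that conclusion can be reproduced in userspace. This is only a sketch: struct sketch_ring, sketch_ring_begin() and sketch_is_err() are hypothetical stand-ins, not the i915 definitions, and the emit value of 0xb60 is taken from the faulting address on the assumption that vaddr was NULL. It also shows why an IS_ERR()-style check on the returned pointer would not catch this case, assuming the caller guards the result that way: 0xb60 is nowhere near the error-pointer range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095UL

/* Hypothetical stand-in for the two ring fields named in the report. */
struct sketch_ring {
    void *vaddr;   /* CPU mapping of the ring buffer; NULL if never mapped */
    uint32_t emit; /* byte offset of the next write within the ring */
};

/*
 * Mirrors the expression cs = ring->vaddr + ring->emit. Integer
 * arithmetic is used so that the NULL case is well defined here.
 */
static uint32_t *sketch_ring_begin(const struct sketch_ring *ring)
{
    return (uint32_t *)((uintptr_t)ring->vaddr + ring->emit);
}

/*
 * Range test in the style of the kernel's IS_ERR(): only the top
 * 4095 addresses are treated as encoded errno values.
 */
static bool sketch_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}
```

With vaddr set to NULL and emit set to 0xb60, sketch_ring_begin() returns a pointer whose value is exactly 0xb60, and sketch_is_err() reports it as a valid pointer, so the first store through cs is what faults.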
Edited by Mahesh Meena