[BISECTED] Kernel warning on v5.10-rc1 and amd-staging-drm-next
Problem Description
I see the kernel warning and backtrace below after logging in when running v5.10-rc1 or amd-staging-drm-next. This does not occur when running mainline v5.9.x or earlier releases.
Warning Message
[ 16.845918] ------------[ cut here ]------------
[ 16.845921] amdgpu 0000:01:00.0: drm_WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev))
[ 16.845981] WARNING: CPU: 1 PID: 972 at drivers/gpu/drm/drm_vblank.c:722 drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x31b/0x330 [drm]
[ 16.845982] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace nfs_ssc fscache xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nft_objref nf_conntrack_tftp nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c tun iptable_mangle iptable_raw bridge stp iptable_security llc ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter nct6775 hwmon_vid sunrpc vfat fat intel_rapl_msr iTCO_wdt at24 iTCO_vendor_support mei_hdcp pktcdvd intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi irqbypass rapl intel_cstate snd_hda_intel snd_intel_dspcfg snd_usb_audio intel_uncore snd_hda_codec
[ 16.846022] snd_usbmidi_lib snd_hda_core snd_rawmidi snd_hwdep i2c_i801 i2c_smbus xpad snd_seq lpc_ich ff_memless joydev snd_seq_device snd_pcm snd_timer snd mei_me soundcore mei zram ip_tables amdgpu i915 crct10dif_pclmul crc32_pclmul crc32c_intel iommu_v2 gpu_sched ghash_clmulni_intel ttm intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops tg3 drm video fuse i2c_dev
[ 16.846047] CPU: 1 PID: 972 Comm: Xorg Tainted: G T 5.10.0-rc1 #31
[ 16.846047] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme4, BIOS P2.90 07/11/2013
[ 16.846059] RIP: 0010:drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x31b/0x330 [drm]
[ 16.846061] Code: 48 85 db 75 03 48 8b 1f 44 88 5d a8 e8 ae 3d 59 e3 48 c7 c1 a0 71 14 c0 48 89 da 48 c7 c7 4d 2c 14 c0 48 89 c6 e8 b0 e6 98 e3 <0f> 0b 44 0f b6 5d a8 e9 08 ff ff ff e8 b4 bd 9d e3 0f 1f 40 00 0f
[ 16.846062] RSP: 0000:ffffa6fc4181f7c0 EFLAGS: 00010082
[ 16.846063] RAX: 0000000000000000 RBX: ffff8ae300d7fd90 RCX: 0000000000000003
[ 16.846064] RDX: 0000000080000003 RSI: ffffffffa43ad5f4 RDI: 00000000ffffffff
[ 16.846064] RBP: ffffa6fc4181f830 R08: 0000000000000000 R09: c0000000ffffdfff
[ 16.846065] R10: ffffa6fc4181f5e8 R11: ffffa6fc4181f5e0 R12: ffffa6fc4181f878
[ 16.846066] R13: 0000000000000000 R14: ffff8ae3009b5800 R15: ffff8ae30c2cb1d8
[ 16.846067] FS: 00007fd1d7fafa80(0000) GS:ffff8ae817280000(0000) knlGS:0000000000000000
[ 16.846067] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.846068] CR2: 00007f382ab36024 CR3: 0000000108eb0003 CR4: 00000000001706e0
[ 16.846069] Call Trace:
[ 16.846158] ? amdgpu_display_crtc_page_flip_target+0x470/0x4f0 [amdgpu]
[ 16.846219] ? dm_read_reg_func+0x36/0xa0 [amdgpu]
[ 16.846234] drm_get_last_vbltimestamp+0xa9/0xc0 [drm]
[ 16.846244] drm_reset_vblank_timestamp+0x59/0xd0 [drm]
[ 16.846253] drm_crtc_vblank_on+0x77/0x120 [drm]
[ 16.846315] manage_dm_interrupts+0x38/0x70 [amdgpu]
[ 16.846378] amdgpu_dm_atomic_commit_tail+0xba7/0x2350 [amdgpu]
[ 16.846381] ? free_one_page+0x5c4/0x600
[ 16.846397] commit_tail+0x94/0x120 [drm_kms_helper]
[ 16.846403] drm_atomic_helper_commit+0x10e/0x140 [drm_kms_helper]
[ 16.846409] drm_atomic_helper_set_config+0x72/0xc0 [drm_kms_helper]
[ 16.846420] drm_mode_setcrtc+0x1e1/0x6f0 [drm]
[ 16.846431] ? drm_mode_getcrtc+0x180/0x180 [drm]
[ 16.846440] drm_ioctl_kernel+0xa8/0xf0 [drm]
[ 16.846449] drm_ioctl+0x210/0x3d0 [drm]
[ 16.846458] ? drm_mode_getcrtc+0x180/0x180 [drm]
[ 16.846500] amdgpu_drm_ioctl+0x45/0x80 [amdgpu]
[ 16.846503] __x64_sys_ioctl+0x8b/0xc0
[ 16.846507] do_syscall_64+0x33/0x40
[ 16.846509] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 16.846510] RIP: 0033:0x7fd1d852158b
[ 16.846512] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd c8 0c 00 f7 d8 64 89 01 48
[ 16.846513] RSP: 002b:00007ffd2a767d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 16.846515] RAX: ffffffffffffffda RBX: 00007ffd2a767d60 RCX: 00007fd1d852158b
[ 16.846515] RDX: 00007ffd2a767d60 RSI: 00000000c06864a2 RDI: 000000000000000f
[ 16.846516] RBP: 00000000c06864a2 R08: 0000000000000000 R09: 0000557d3ea62f10
[ 16.846516] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 16.846517] R13: 000000000000000f R14: 0000557d3ea62f10 R15: 0000000000000000
[ 16.846519] CPU: 1 PID: 972 Comm: Xorg Tainted: G T 5.10.0-rc1 #31
[ 16.846520] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme4, BIOS P2.90 07/11/2013
[ 16.846520] Call Trace:
[ 16.846523] dump_stack+0x57/0x6a
[ 16.846527] __warn.cold.13+0xe/0x3d
[ 16.846537] ? drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x31b/0x330 [drm]
[ 16.846539] report_bug+0xc0/0xf0
[ 16.846541] handle_bug+0x44/0x80
[ 16.846543] exc_invalid_op+0x13/0x60
[ 16.846544] asm_exc_invalid_op+0x12/0x20
[ 16.846554] RIP: 0010:drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x31b/0x330 [drm]
[ 16.846555] Code: 48 85 db 75 03 48 8b 1f 44 88 5d a8 e8 ae 3d 59 e3 48 c7 c1 a0 71 14 c0 48 89 da 48 c7 c7 4d 2c 14 c0 48 89 c6 e8 b0 e6 98 e3 <0f> 0b 44 0f b6 5d a8 e9 08 ff ff ff e8 b4 bd 9d e3 0f 1f 40 00 0f
[ 16.846555] RSP: 0000:ffffa6fc4181f7c0 EFLAGS: 00010082
[ 16.846556] RAX: 0000000000000000 RBX: ffff8ae300d7fd90 RCX: 0000000000000003
[ 16.846557] RDX: 0000000080000003 RSI: ffffffffa43ad5f4 RDI: 00000000ffffffff
[ 16.846558] RBP: ffffa6fc4181f830 R08: 0000000000000000 R09: c0000000ffffdfff
[ 16.846558] R10: ffffa6fc4181f5e8 R11: ffffa6fc4181f5e0 R12: ffffa6fc4181f878
[ 16.846559] R13: 0000000000000000 R14: ffff8ae3009b5800 R15: ffff8ae30c2cb1d8
[ 16.846604] ? amdgpu_display_crtc_page_flip_target+0x470/0x4f0 [amdgpu]
[ 16.846663] ? dm_read_reg_func+0x36/0xa0 [amdgpu]
[ 16.846673] drm_get_last_vbltimestamp+0xa9/0xc0 [drm]
[ 16.846683] drm_reset_vblank_timestamp+0x59/0xd0 [drm]
[ 16.846693] drm_crtc_vblank_on+0x77/0x120 [drm]
[ 16.846756] manage_dm_interrupts+0x38/0x70 [amdgpu]
[ 16.846818] amdgpu_dm_atomic_commit_tail+0xba7/0x2350 [amdgpu]
[ 16.846820] ? free_one_page+0x5c4/0x600
[ 16.846833] commit_tail+0x94/0x120 [drm_kms_helper]
[ 16.846839] drm_atomic_helper_commit+0x10e/0x140 [drm_kms_helper]
[ 16.846845] drm_atomic_helper_set_config+0x72/0xc0 [drm_kms_helper]
[ 16.846854] drm_mode_setcrtc+0x1e1/0x6f0 [drm]
[ 16.846865] ? drm_mode_getcrtc+0x180/0x180 [drm]
[ 16.846874] drm_ioctl_kernel+0xa8/0xf0 [drm]
[ 16.846883] drm_ioctl+0x210/0x3d0 [drm]
[ 16.846892] ? drm_mode_getcrtc+0x180/0x180 [drm]
[ 16.846933] amdgpu_drm_ioctl+0x45/0x80 [amdgpu]
[ 16.846935] __x64_sys_ioctl+0x8b/0xc0
[ 16.846938] do_syscall_64+0x33/0x40
[ 16.846939] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 16.846940] RIP: 0033:0x7fd1d852158b
[ 16.846941] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd c8 0c 00 f7 d8 64 89 01 48
[ 16.846942] RSP: 002b:00007ffd2a767d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 16.846943] RAX: ffffffffffffffda RBX: 00007ffd2a767d60 RCX: 00007fd1d852158b
[ 16.846944] RDX: 00007ffd2a767d60 RSI: 00000000c06864a2 RDI: 000000000000000f
[ 16.846945] RBP: 00000000c06864a2 R08: 0000000000000000 R09: 0000557d3ea62f10
[ 16.846945] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 16.846946] R13: 000000000000000f R14: 0000557d3ea62f10 R15: 0000000000000000
[ 16.846948] ---[ end trace 4095d4a0729959b6 ]---
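For anyone checking their own logs for the same warning, here is a minimal sketch that pulls the source location and function name out of a `WARNING:` line inside a "cut here" / "end trace" block. The function name `find_warnings` and the `SAMPLE` excerpt are illustrative assumptions, not part of any kernel tooling; the regular expressions are only written against the log format shown above.

```python
import re

# Strips the leading "[ 16.845918] " style timestamp that dmesg prints.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")

# Matches lines such as:
#   WARNING: CPU: 1 PID: 972 at drivers/gpu/drm/drm_vblank.c:722 func+0x31b/0x330 [drm]
WARNING = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<symbol>[^\s+]+)\+"
)


def find_warnings(log_text):
    """Return one dict per WARNING line found between 'cut here' and 'end trace'."""
    results = []
    inside = False
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw.strip())
        if "[ cut here ]" in line:
            inside = True
        elif line.startswith("---[ end trace"):
            inside = False
        elif inside:
            match = WARNING.search(line)
            if match:
                info = match.groupdict()
                # Convert the numeric fields so callers can compare them directly.
                for key in ("cpu", "pid", "line"):
                    info[key] = int(info[key])
                results.append(info)
    return results


# Hypothetical excerpt: three lines copied from the log in this report.
SAMPLE = """\
[ 16.845918] ------------[ cut here ]------------
[ 16.845981] WARNING: CPU: 1 PID: 972 at drivers/gpu/drm/drm_vblank.c:722 drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x31b/0x330 [drm]
[ 16.846948] ---[ end trace 4095d4a0729959b6 ]---
"""

if __name__ == "__main__":
    for warning in find_warnings(SAMPLE):
        print(warning["file"], warning["line"], warning["symbol"])
```

Feeding it the full output of `dmesg` instead of `SAMPLE` should work the same way, as long as the timestamps use the default format.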
Commit responsible (from amd-staging-drm-next)
042198ce873591568208c29fe22cfa1f549566b7 is the first bad commit
commit 042198ce873591568208c29fe22cfa1f549566b7
Author: Aurabindo Pillai <aurabindo.pillai@amd.com>
Date: Fri Sep 11 15:10:11 2020 -0400
drm/amd/display: Move disable interrupt into commit tail
[Why&How]
Since there is no need for accessing crtc state in the interrupt
handler, interrupts need not be disabled well in advance, and
can be moved to commit_tail where it should be.
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
This corresponds to commit 6d90a208cfff94c519caaecbc5da3af3929bf374 in Linus' tree, which landed in v5.10-rc1.
I tested this by reverting the corresponding commit in both mainline and amd-staging-drm-next, and verified that the kernel warning no longer appears with the revert applied.
Bisection log (from amd-staging-drm-next)
git bisect start
# good: [6cde47f093833e9620969be4dbd4db5b7772d925] drm/amdgpu: add member to store vm fault interrupt masks
git bisect good 6cde47f093833e9620969be4dbd4db5b7772d925
# bad: [c91b8da53d45a852621847e689d441a494766251] drm/amdgpu: fix perms of gfx_v10_0.c
git bisect bad c91b8da53d45a852621847e689d441a494766251
# good: [6780c6d60cbc2a5b788db6f94356b0a865d4f046] drm/amdgpu: drop experimental flag for amd-staging-drm-next
git bisect good 6780c6d60cbc2a5b788db6f94356b0a865d4f046
# bad: [e6b73564c2dc66272908eed51ba0beab55444ae0] drm/amd/display: dc/clk_mgr: make function static
git bisect bad e6b73564c2dc66272908eed51ba0beab55444ae0
# bad: [d3a5327ca00251adf2a37e9911a3b76b36a7ae0b] drm/amd/display: Return the number of bytes parsed than allocated
git bisect bad d3a5327ca00251adf2a37e9911a3b76b36a7ae0b
# bad: [042198ce873591568208c29fe22cfa1f549566b7] drm/amd/display: Move disable interrupt into commit tail
git bisect bad 042198ce873591568208c29fe22cfa1f549566b7
# good: [6c957f7730757b5588e42f5afb09131387e69b00] drm/amdkfd: fix a memory leak issue
git bisect good 6c957f7730757b5588e42f5afb09131387e69b00
# good: [ac0c4c24ea1ce90407fe9c4daeb3b88ca7e3c091] drm/amdgpu: Minor checkpatch fix
git bisect good ac0c4c24ea1ce90407fe9c4daeb3b88ca7e3c091
# good: [e7db128f3bf5f53776508f7510034a5d3fb26556] drm/amdgpu: add ta DTM/HDCP print in amdgpu_firmware_info for apu
git bisect good e7db128f3bf5f53776508f7510034a5d3fb26556
# good: [83a505578cc051f7c1c59f1365457067b49ab0ec] drm/amdgpu: Update RAS init handling
git bisect good 83a505578cc051f7c1c59f1365457067b49ab0ec
# good: [ec02a4a8d069bd53d7082ff708cc6dc35392c828] drm/amd/display: Refactor to prevent crtc state access in DM IRQ handler
git bisect good ec02a4a8d069bd53d7082ff708cc6dc35392c828
# first bad commit: [042198ce873591568208c29fe22cfa1f549566b7] drm/amd/display: Move disable interrupt into commit tail
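To double-check a bisection log like the one above, the following sketch tallies the good and bad verdicts and extracts the first bad commit. It reads only the `# good:`, `# bad:` and `# first bad commit:` comment lines that `git bisect log` writes; the function name `summarize_bisect_log` and the shortened `SAMPLE_LOG` are assumptions made for illustration. Saving the log to a file and running `git bisect replay` on it in a checkout of the same tree is the usual way to reproduce the bisection state itself.

```python
import re

# Matches the comment lines written by `git bisect log`, for example:
#   # bad: [042198ce...] drm/amd/display: Move disable interrupt into commit tail
ENTRY = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]{40})\] (.*)$")


def summarize_bisect_log(text):
    """Collect good/bad commit hashes and the first bad commit from a bisect log."""
    good, bad, first_bad = [], [], None
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue  # skip "git bisect ..." command lines and anything else
        verdict, sha, subject = match.groups()
        if verdict == "good":
            good.append(sha)
        elif verdict == "bad":
            bad.append(sha)
        else:
            first_bad = (sha, subject)
    return {"good": good, "bad": bad, "first_bad": first_bad}


# Hypothetical shortened log: a few entries copied from the log in this report.
SAMPLE_LOG = """\
git bisect start
# good: [6cde47f093833e9620969be4dbd4db5b7772d925] drm/amdgpu: add member to store vm fault interrupt masks
git bisect good 6cde47f093833e9620969be4dbd4db5b7772d925
# bad: [c91b8da53d45a852621847e689d441a494766251] drm/amdgpu: fix perms of gfx_v10_0.c
git bisect bad c91b8da53d45a852621847e689d441a494766251
# bad: [042198ce873591568208c29fe22cfa1f549566b7] drm/amd/display: Move disable interrupt into commit tail
git bisect bad 042198ce873591568208c29fe22cfa1f549566b7
# good: [ec02a4a8d069bd53d7082ff708cc6dc35392c828] drm/amd/display: Refactor to prevent crtc state access in DM IRQ handler
git bisect good ec02a4a8d069bd53d7082ff708cc6dc35392c828
# first bad commit: [042198ce873591568208c29fe22cfa1f549566b7] drm/amd/display: Move disable interrupt into commit tail
"""

if __name__ == "__main__":
    summary = summarize_bisect_log(SAMPLE_LOG)
    print(len(summary["good"]), "good,", len(summary["bad"]), "bad")
    print("first bad:", summary["first_bad"])
```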
System Information
GPU: AMD RX 560 (POLARIS11)
OS: Fedora 33
Kernels: 5.10-rc1 and amd-staging-drm-next
Mesa: 20.2.1