[Bisected] GPU hangs and display artifacts on 5.18-rc3 on Intel GM45
Platform: Dell Latitude E6500
Chipset: Intel GM45
Steps to reproduce: happens soon after boot on kernel 5.18-rc3. Can be triggered quickly by opening a terminal and running "dmesg" a few times. Sometimes it only happens after several minutes of GUI activity, though.
System architecture: amd64
Linux distribution: Ubuntu 20.04 fully updated
Xorg, Gnome Flashback
Display connector: LVDS
Bisected down to
commit b5cfe6f7a6e1 ("drm/i915: Remove short-term pins from execbuf, v6.")
Reverting this commit on top of v5.18-rc3 fixes the issue.
Photos of artifacts on the screen:
Dmesg after this problem occurred, on a hand-compiled 5.18-rc3: dmesg_5.18.0-rc3.txt
Dmesg on 5.18-rc3 with this commit reverted: dmesg_5.18-rc3_reverted_patch.txt
System information: xrandr__verbose.txt lspci_vvvnn.txt (obtained on an earlier kernel).
Activity
- Mateusz Jończyk changed the description
- Author
CCing @mlankhorst, as he is the commit author.
- ravi teja added Community GPU hang platform: GM45 labels
- Author
Also happens on drm-tip as of today. Attaching dmesg.txt and the contents of /sys/class/drm/card0/error after the hang: drm_error.txt
The hang happens at:
[ 63.110186] mjonczyk-laptop kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 4:1:9ffdfeff, in Xorg [2307]
- Reporter
Marking it as critical.
- LAKSHMINARAYANA VUDUM added severity::critical label
- LAKSHMINARAYANA VUDUM added priority::highest label
- LAKSHMINARAYANA VUDUM removed priority::highest label
- Developer
Short-term pinning was removed because it shouldn't matter. However, it could be that we handle the batch buffer differently.
Hence, the patch below reverts everything except the reloc_iomap() part. Does it still fail?
- Reporter
@matjon Can you try the patch shared by @mlankhorst?
- Author
Sure, compiling.
- Author
Still happens with the patch applied. dmesg_5.18.0-rc5-00001-gfb7a0a8c4a40.txt
- Developer
This is not a patch you should test; it just shows the part of the patch that is broken for you on GM45.
However, here is something you can test on top of your favorite broken tree: in gem/i915_gem_execbuffer.c there is a "#define DBG_FORCE_RELOC 0". Can you change the 0 to 1? And what happens if you change it to 2?
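For readers following along: the knob appears to select between the two ways execbuf writes relocations into the batch. As we read gem/i915_gem_execbuffer.c, 1 forces the CPU (kmap) path and 2 forces the GTT (iomap) path that goes through reloc_iomap(). The sketch below is a simplified, standalone model of that choice, not the driver's code. The struct and function names are hypothetical, the real knob is a compile-time #define rather than a parameter, and the default heuristic (use the CPU path when the platform has an LLC or the object is CPU-cached or dirty) is an assumption based on our reading of the driver.

```c
#include <stdbool.h>

/* Values of the DBG_FORCE_RELOC debug knob, as modelled here. */
enum {
	FORCE_CPU_RELOC = 1,	/* always write relocations through a CPU kmap */
	FORCE_GTT_RELOC,	/* always write relocations through the GGTT iomap */
};

/* Hypothetical bundle of the inputs this model needs. */
struct reloc_inputs {
	bool has_struct_page;	/* object is backed by struct pages */
	bool has_llc;		/* platform has a shared last-level cache */
	bool cache_dirty;	/* CPU caches may hold dirty lines for it */
	bool cached;		/* object cache level is not "none" */
};

/*
 * Returns true if the CPU (kmap) path would be used, false for the GTT
 * (iomap) path. "force" stands in for the DBG_FORCE_RELOC value.
 */
static bool model_use_cpu_reloc(int force, const struct reloc_inputs *in)
{
	if (!in->has_struct_page)
		return false;	/* nothing to kmap: only the GTT path is possible */
	if (force == FORCE_CPU_RELOC)
		return true;
	if (force == FORCE_GTT_RELOC)
		return false;
	/* Default heuristic (assumption): CPU writes are safe enough when
	 * there is an LLC or the object already lives in the CPU domain. */
	return in->has_llc || in->cache_dirty || in->cached;
}
```

In this model, a non-LLC part such as GM45 with a clean, uncached object takes the GTT path under the default setting, which is consistent with the later reports that DBG_FORCE_RELOC=1 hides the problem and DBG_FORCE_RELOC=2 makes it easier to reproduce.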
- Author
I have tested modified values of DBG_FORCE_RELOC on a vanilla 5.18-rc5 kernel.
With DBG_FORCE_RELOC=1, everything appears stable; I played SuperTuxKart for a while with no problems. dmesg_DBG_FORCE_RELOC_1.txt
With DBG_FORCE_RELOC=2, I cannot do a GUI login: gdm (I think) crashes on something like a blur effect after choosing the username. dmesg_DBG_FORCE_RELOC_2.txt drm_error_DBG_FORCE_RELOC_2.txt
- Developer
Yeah, DBG_FORCE_RELOC=2 forces the bad path, which makes reproducing easier.
A small attempt at fixing it: patch
- Author
Nope, this patch does not help unfortunately. The symptoms with this patch are as before, whether with DBG_FORCE_RELOC=0 or with DBG_FORCE_RELOC=2.
- Developer
What about this?
- Developer
- Developer
Thanks, I'll give that patch a try later today.
- Developer
> What about this?
That seems to fix things for me, thank you.
- Developer
@jwrdegoede One more thing to try? This copies the parts that i915_gem_object_ggtt_pin_ww() was doing but that we stopped doing after the changes:
- Author
Referring to the patch from 8 hours ago: #5806 (comment 1373187)
It fixes the issue for me and works with both DBG_FORCE_RELOC=0 and DBG_FORCE_RELOC=2. I played SuperTuxKart on both variants; the FPS was a bit low, but it was playable with no issues.
The laptop did hang once while shutting down (out of several tries), but this looks unrelated: it was probably caused by something in userspace and is not connected with graphics.
Edit: tested on vanilla 5.18-rc6.
- Developer
Yeah, the first patch just brute-forces the CPU fallback path. The last patch should hopefully fix the issue.
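The "CPU fallback path" is what happens when reloc_iomap() cannot provide a GGTT mapping for the batch page: the caller then maps the page through the CPU instead. Below is a minimal model of that fallback order, assuming (from our reading of the 5.18 code, not confirmed in this thread) that a NULL return from reloc_iomap() triggers the kmap path. The function and stub names are hypothetical, and the real code additionally caches the last mapped page and handles error pointers separately.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature for a helper that maps one page of the batch. */
typedef void *(*map_page_fn)(unsigned long page);

/*
 * Model of the fallback order: unless the relocation cache is already in
 * kmap mode, try the GGTT iomap first; if that yields NULL, fall back to
 * a CPU kmap of the page.
 */
static void *model_reloc_vaddr(bool kmap_mode, unsigned long page,
			       map_page_fn try_iomap, map_page_fn do_kmap)
{
	void *vaddr = NULL;

	if (!kmap_mode)
		vaddr = try_iomap(page);
	if (!vaddr)
		vaddr = do_kmap(page);
	return vaddr;
}

/* Stubs standing in for the two mapping paths. */
static char iomap_page[4096], kmap_page[4096];

static void *iomap_declines(unsigned long page) { (void)page; return NULL; }
static void *iomap_maps(unsigned long page) { (void)page; return iomap_page; }
static void *kmap_maps(unsigned long page) { (void)page; return kmap_page; }
```

Under this model, an iomap step that returns NULL for a GGTT-bound batch, as the "return NULL" diff in this thread does, sends every such relocation down the CPU path.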
- Developer
> One more thing to try? This copies parts that i915_gem_object_ggtt_pin_ww was doing, but we didn't do after changes:

I'm afraid that that patch does not work. Instead of misrendering, gdm now crashes, and I get the following backtraces in dmesg:
[ 35.331015] BUG: kernel NULL pointer dereference, address: 00000000000000c5
[ 35.331029] #PF: supervisor read access in kernel mode
[ 35.331035] #PF: error_code(0x0000) - not-present page
[ 35.331039] PGD 0 P4D 0
[ 35.331048] Oops: 0000 [#1] PREEMPT SMP PTI
[ 35.331055] CPU: 3 PID: 841 Comm: gnome-shell Tainted: G C E 5.18.0-rc5+ #31
[ 35.331061] Hardware name: ilife S806/BYT-PA03C, BIOS H1D_S806_206 10/29/2014
[ 35.331065] RIP: 0010:i915_gem_object_prepare_write+0xa93/0x6280 [i915]
[ 35.331136] Code: 54 24 28 31 c9 48 8b 7c 24 20 4c 8b 4c 24 30 48 89 c6 2e e8 ff 15 db d5 48 8b 54 24 28 e9 6b fd ff ff f0 41 ff 87 10 01 00 00 <48> 83 3c 25 c5 00 00 00 00 74 0d f6 85 f0 03 00 00 7f 0f 84 c3 00
[ 35.331141] RSP: 0018:ffffb71ac15eb730 EFLAGS: 00010202
[ 35.331147] RAX: 0000000000006800 RBX: ffffb71ac15eba60 RCX: ffffb71ac15eb6cc
[ 35.331152] RDX: 0000000000000002 RSI: ffff8e3887dd32b8 RDI: ffff8e3887dd32a0
[ 35.331156] RBP: ffff8e3887dd3000 R08: 0000000000000000 R09: 0000000000000001
[ 35.331160] R10: ffff8e3884da3180 R11: 00000000000183ac R12: 0000000000000000
[ 35.331164] R13: 0000000000000034 R14: 000000007ffe0060 R15: ffff8e3898495380
[ 35.331169] FS: 00007f92e231ad80(0000) GS:ffff8e38fb780000(0000) knlGS:0000000000000000
[ 35.331174] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 35.331178] CR2: 00000000000000c5 CR3: 00000000184fa000 CR4: 00000000001006e0
[ 35.331183] Call Trace:
[ 35.331189] <TASK>
[ 35.331201] i915_gem_object_prepare_write+0xdd0/0x6280 [i915]
[ 35.331267] ? lock_acquire+0xad/0x290
[ 35.331276] ? krc_this_cpu_lock+0x33/0x40
[ 35.331285] ? lock_is_held_type+0xa6/0x120
[ 35.331296] ? mark_held_locks+0x49/0x70
[ 35.331303] ? lockdep_hardirqs_on_prepare+0xd9/0x180
[ 35.331308] ? _raw_spin_unlock_irqrestore+0x30/0x50
[ 35.331316] ? _raw_spin_unlock_irqrestore+0x30/0x50
[ 35.331322] ? i915_sw_fence_await_reservation+0x2c0/0x300 [i915]
[ 35.331375] ? i915_vma_pin_ww+0x569/0xb50 [i915]
[ 35.331443] ? i915_gem_object_prepare_write+0x1ae6/0x6280 [i915]
[ 35.331504] ? i915_gem_object_prepare_write+0x18bf/0x6280 [i915]
[ 35.331566] i915_gem_object_prepare_write+0x40d4/0x6280 [i915]
[ 35.331637] ? lock_is_held_type+0xa6/0x120
[ 35.331655] ? __lock_acquire+0x3a2/0x1f90
[ 35.331666] ? lock_acquire+0xad/0x290
[ 35.331672] ? lock_is_held_type+0xa6/0x120
[ 35.331685] i915_gem_execbuffer2_ioctl+0x115/0x5c0 [i915]
[ 35.331747] ? i915_gem_object_prepare_write+0x6280/0x6280 [i915]
[ 35.331808] drm_ioctl_kernel+0xa1/0x150
[ 35.331818] drm_ioctl+0x21c/0x410
[ 35.331825] ? i915_gem_object_prepare_write+0x6280/0x6280 [i915]
[ 35.331890] ? __fget_files+0xd2/0x170
[ 35.331902] __x64_sys_ioctl+0x8d/0xc0
[ 35.331911] do_syscall_64+0x5b/0x80
[ 35.331919] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 35.331926] RIP: 0033:0x7f92e87db37b
[ 35.331933] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7d 2a 0f 00 f7 d8 64 89 01 48
[ 35.331939] RSP: 002b:00007ffc0d95dd58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 35.331945] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f92e87db37b
[ 35.331950] RDX: 00007ffc0d95dd90 RSI: 0000000040406469 RDI: 000000000000000f
[ 35.331954] RBP: 00007ffc0d95de50 R08: 00007f92e0036000 R09: 00000001008de000
[ 35.331958] R10: 00007ffc0d95dc90 R11: 0000000000000246 R12: 000055c435edef98
[ 35.331962] R13: 00000000000000dc R14: 00007ffc0d95dd90 R15: 000000000000000f
[ 35.331973] </TASK>
[ 35.331976] Modules linked in: qrtr bnep snd_ctl_led iTCO_wdt phy_tusb1210 mei_pxp dwc3 mei_hdcp intel_pmc_bxt iTCO_vendor_support udc_core ulpi snd_soc_sst_bytcr_rt5640 gpio_keys intel_rapl_msr intel_soc_dts_thermal intel_soc_dts_iosf intel_powerclamp coretemp kvm_intel vfat fat brcmfmac kvm brcmutil irqbypass intel_cstate cfg80211 pcspkr joydev axp288_adc axp20x_pek extcon_axp288 axp288_fuel_gauge axp288_charger mei_txe mei lpc_ich dwc3_pci wmi_bmof snd_sof_acpi_intel_byt snd_sof_acpi snd_sof_intel_atom snd_sof_xtensa_dsp snd_sof snd_sof_utils ledtrig_audio snd_intel_sst_acpi snd_hdmi_lpe_audio snd_intel_sst_core snd_soc_sst_atom_hifi2_platform snd_soc_rt5670 snd_soc_acpi_intel_match snd_soc_rt5651 snd_soc_rt5645 snd_soc_acpi snd_soc_rt5640 snd_intel_dspcfg snd_intel_sdw_acpi snd_soc_rl6231 snd_soc_core int3401_thermal processor_thermal_device snd_compress ac97_bus snd_pcm_dmaengine soc_button_array processor_thermal_rfim bmc150_accel_spi dw_dmac snd_seq processor_thermal_mbox
[ 35.332085] regmap_spi dptf_power int3406_thermal processor_thermal_rapl snd_seq_device int3400_thermal intel_rapl_common int3403_thermal hci_uart acpi_thermal_rel btqca btrtl btbcm int340x_thermal_zone bmc150_accel_i2c btintel snd_pcm bmc150_accel_core silead bluetooth intel_int0002_vgpio atomisp_gc0310(C) industrialio_triggered_buffer atomisp_ov2680(C) atomisp_gmin_platform(C) acpi_pad videodev cm32181 kfifo_buf snd_timer industrialio mc snd ecdh_generic soundcore rfkill zram ip_tables i915(E) mmc_block crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm_buddy drm_dp_helper ttm video wmi sdhci_acpi sdhci mmc_core pwm_lpss_platform pwm_lpss i2c_dev fuse
[ 35.332175] CR2: 00000000000000c5
[ 35.332233] ---[ end trace 0000000000000000 ]---
[ 35.332239] RIP: 0010:i915_gem_object_prepare_write+0xa93/0x6280 [i915]
[ 35.332303] Code: 54 24 28 31 c9 48 8b 7c 24 20 4c 8b 4c 24 30 48 89 c6 2e e8 ff 15 db d5 48 8b 54 24 28 e9 6b fd ff ff f0 41 ff 87 10 01 00 00 <48> 83 3c 25 c5 00 00 00 00 74 0d f6 85 f0 03 00 00 7f 0f 84 c3 00
[ 35.332309] RSP: 0018:ffffb71ac15eb730 EFLAGS: 00010202
[ 35.332315] RAX: 0000000000006800 RBX: ffffb71ac15eba60 RCX: ffffb71ac15eb6cc
[ 35.332320] RDX: 0000000000000002 RSI: ffff8e3887dd32b8 RDI: ffff8e3887dd32a0
[ 35.332324] RBP: ffff8e3887dd3000 R08: 0000000000000000 R09: 0000000000000001
[ 35.332328] R10: ffff8e3884da3180 R11: 00000000000183ac R12: 0000000000000000
[ 35.332333] R13: 0000000000000034 R14: 000000007ffe0060 R15: ffff8e3898495380
[ 35.332337] FS: 00007f92e231ad80(0000) GS:ffff8e38fb780000(0000) knlGS:0000000000000000
[ 35.332343] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 35.332348] CR2: 00000000000000c5 CR3: 00000000184fa000 CR4: 00000000001006e0
[ 122.785023] fbcon: Taking over console
[ 122.793566] Console: switching to colour frame buffer device 150x120
[ 125.072328] BUG: unable to handle page fault for address: ffffb71ac15ebb60
[ 125.072350] #PF: supervisor read access in kernel mode
[ 125.072362] #PF: error_code(0x0000) - not-present page
[ 125.072371] PGD 1000067 P4D 1000067 PUD 11e3067 PMD 66b7067 PTE 0
[ 125.072401] Oops: 0000 [#2] PREEMPT SMP PTI
[ 125.072415] CPU: 2 PID: 941 Comm: Xorg Tainted: G D C E 5.18.0-rc5+ #31
[ 125.072429] Hardware name: ilife S806/BYT-PA03C, BIOS H1D_S806_206 10/29/2014
[ 125.072438] RIP: 0010:__ww_mutex_lock.constprop.0+0x95b/0xfb0
[ 125.072462] Code: c2 e9 3f ff ff ff 4d 39 e9 0f 85 d4 fe ff ff f6 c2 02 0f 85 2a ff ff ff 48 83 c9 02 e9 99 fe ff ff 48 85 c0 74 30 48 8b 53 08 <48> 2b 50 08 48 85 d2 7e 23 8b 15 1e 59 5d 02 85 d2 75 0b 48 83 7b
[ 125.072475] RSP: 0018:ffffb71ac19177a0 EFLAGS: 00010282
[ 125.072488] RAX: ffffb71ac15ebb58 RBX: ffffb71ac1917a68 RCX: 0000000000000001
[ 125.072499] RDX: 000000000000001a RSI: ffff8e3884da3180 RDI: ffffb71ac19177e0
[ 125.072509] RBP: ffffb71ac1917840 R08: ffff8e388aa04988 R09: 0000000000000000
[ 125.072519] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8e38984d0000
[ 125.072528] R13: ffffb71ac19177e0 R14: ffff8e388aa04938 R15: ffff8e388aa04988
[ 125.072539] FS: 00007eff9890bf00(0000) GS:ffff8e38fb700000(0000) knlGS:0000000000000000
[ 125.072551] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 125.072561] CR2: ffffb71ac15ebb60 CR3: 000000000682e000 CR4: 00000000001006e0
[ 125.072572] Call Trace:
[ 125.072583] <TASK>
[ 125.072596] ? rcu_read_lock_sched_held+0x10/0x70
[ 125.072618] ? __intel_context_do_pin_ww+0x2f1/0x810 [i915]
[ 125.072758] ? cpumask_next+0x1f/0x30
[ 125.072781] ? ww_mutex_lock_interruptible+0x38/0xa0
[ 125.072797] ww_mutex_lock_interruptible+0x38/0xa0
[ 125.072816] __intel_context_do_pin_ww+0x2f1/0x810 [i915]
[ 125.072954] i915_gem_object_prepare_write+0x1906/0x6280 [i915]
[ 125.073104] i915_gem_object_prepare_write+0x3f09/0x6280 [i915]
[ 125.073267] ? lock_acquire+0x23f/0x290
[ 125.073282] ? rcu_read_lock_sched_held+0x10/0x70
[ 125.073297] ? rcu_read_lock_sched_held+0x10/0x70
[ 125.073311] ? rcu_read_lock_sched_held+0x10/0x70
[ 125.073327] ? rcu_read_lock_sched_held+0x10/0x70
[ 125.073353] ? lock_acquire+0x23f/0x290
[ 125.073368] ? lock_release+0x1d4/0x2a0
[ 125.073381] ? rcu_read_lock_sched_held+0x10/0x70
[ 125.073396] ? rcu_read_lock_sched_held+0x10/0x70
[ 125.073414] ? vsnprintf+0x397/0x600
[ 125.073432] ? rcu_read_lock_sched_held+0x10/0x70
[ 125.073452] ? rcu_read_lock_sched_held+0x10/0x70
[ 125.073467] ? lock_acquire+0x23f/0x290
[ 125.073482] ? rcu_read_lock_sched_held+0x10/0x70
[ 125.073510] i915_gem_execbuffer2_ioctl+0x115/0x5c0 [i915]
[ 125.073657] ? i915_gem_object_prepare_write+0x6280/0x6280 [i915]
[ 125.073802] drm_ioctl_kernel+0xa1/0x150
[ 125.073823] drm_ioctl+0x21c/0x410
[ 125.073841] ? i915_gem_object_prepare_write+0x6280/0x6280 [i915]
[ 125.073994] ? __fget_files+0xd2/0x170
[ 125.074016] __x64_sys_ioctl+0x8d/0xc0
[ 125.074036] do_syscall_64+0x5b/0x80
[ 125.074058] ? rcu_read_lock_sched_held+0x10/0x70
[ 125.074072] ? lock_release+0x1d4/0x2a0
[ 125.074091] ? up_read+0x17/0x20
[ 125.074103] ? do_user_addr_fault+0x1ea/0x6a0
[ 125.074123] ? trace_hardirqs_off+0xc/0xc0
[ 125.074140] ? exc_page_fault+0xc1/0x280
[ 125.074156] ? rcu_read_lock_sched_held+0x10/0x70
[ 125.074171] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 125.074187] RIP: 0033:0x7eff991a137b
[ 125.074202] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7d 2a 0f 00 f7 d8 64 89 01 48
[ 125.074215] RSP: 002b:00007ffcff7a0c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 125.074230] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007eff991a137b
[ 125.074240] RDX: 00007ffcff7a0cb0 RSI: 0000000040406469 RDI: 0000000000000011
[ 125.074250] RBP: 00007ffcff7a0d70 R08: 00007eff98830000 R09: 000000010000d000
[ 125.074260] R10: 00007ffcff7a0bb0 R11: 0000000000000246 R12: 00005640a9901f48
[ 125.074270] R13: 00000000000000dc R14: 00007ffcff7a0cb0 R15: 0000000000000011
[ 125.074296] </TASK>
[ 125.074303] Modules linked in: qrtr bnep snd_ctl_led iTCO_wdt phy_tusb1210 mei_pxp dwc3 mei_hdcp intel_pmc_bxt iTCO_vendor_support udc_core ulpi snd_soc_sst_bytcr_rt5640 gpio_keys intel_rapl_msr intel_soc_dts_thermal intel_soc_dts_iosf intel_powerclamp coretemp kvm_intel vfat fat brcmfmac kvm brcmutil irqbypass intel_cstate cfg80211 pcspkr joydev axp288_adc axp20x_pek extcon_axp288 axp288_fuel_gauge axp288_charger mei_txe mei lpc_ich dwc3_pci wmi_bmof snd_sof_acpi_intel_byt snd_sof_acpi snd_sof_intel_atom snd_sof_xtensa_dsp snd_sof snd_sof_utils ledtrig_audio snd_intel_sst_acpi snd_hdmi_lpe_audio snd_intel_sst_core snd_soc_sst_atom_hifi2_platform snd_soc_rt5670 snd_soc_acpi_intel_match snd_soc_rt5651 snd_soc_rt5645 snd_soc_acpi snd_soc_rt5640 snd_intel_dspcfg snd_intel_sdw_acpi snd_soc_rl6231 snd_soc_core int3401_thermal processor_thermal_device snd_compress ac97_bus snd_pcm_dmaengine soc_button_array processor_thermal_rfim bmc150_accel_spi dw_dmac snd_seq processor_thermal_mbox
[ 125.074545] regmap_spi dptf_power int3406_thermal processor_thermal_rapl snd_seq_device int3400_thermal intel_rapl_common int3403_thermal hci_uart acpi_thermal_rel btqca btrtl btbcm int340x_thermal_zone bmc150_accel_i2c btintel snd_pcm bmc150_accel_core silead bluetooth intel_int0002_vgpio atomisp_gc0310(C) industrialio_triggered_buffer atomisp_ov2680(C) atomisp_gmin_platform(C) acpi_pad videodev cm32181 kfifo_buf snd_timer industrialio mc snd ecdh_generic soundcore rfkill zram ip_tables i915(E) mmc_block crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm_buddy drm_dp_helper ttm video wmi sdhci_acpi sdhci mmc_core pwm_lpss_platform pwm_lpss i2c_dev fuse
[ 125.074751] CR2: ffffb71ac15ebb60
[ 125.074762] ---[ end trace 0000000000000000 ]---
[ 125.074772] RIP: 0010:i915_gem_object_prepare_write+0xa93/0x6280 [i915]
[ 125.074922] Code: 54 24 28 31 c9 48 8b 7c 24 20 4c 8b 4c 24 30 48 89 c6 2e e8 ff 15 db d5 48 8b 54 24 28 e9 6b fd ff ff f0 41 ff 87 10 01 00 00 <48> 83 3c 25 c5 00 00 00 00 74 0d f6 85 f0 03 00 00 7f 0f 84 c3 00
[ 125.074934] RSP: 0018:ffffb71ac15eb730 EFLAGS: 00010202
[ 125.074947] RAX: 0000000000006800 RBX: ffffb71ac15eba60 RCX: ffffb71ac15eb6cc
[ 125.074957] RDX: 0000000000000002 RSI: ffff8e3887dd32b8 RDI: ffff8e3887dd32a0
[ 125.074967] RBP: ffff8e3887dd3000 R08: 0000000000000000 R09: 0000000000000001
[ 125.074976] R10: ffff8e3884da3180 R11: 00000000000183ac R12: 0000000000000000
[ 125.074986] R13: 0000000000000034 R14: 000000007ffe0060 R15: ffff8e3898495380
[ 125.074996] FS: 00007eff9890bf00(0000) GS:ffff8e38fb700000(0000) knlGS:0000000000000000
[ 125.075008] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 125.075019] CR2: ffffb71ac15ebb60 CR3: 000000000682e000 CR4: 00000000001006e0
[ 125.075030] note: Xorg[941] exited with preempt_count 2
[ 125.075105] ------------[ cut here ]------------
[ 125.075118] do not call blocking ops when !TASK_RUNNING; state=1 set at [<0000000002b2ebbe>] __ww_mutex_lock.constprop.0+0x685/0xfb0
[ 125.075153] WARNING: CPU: 2 PID: 941 at kernel/sched/core.c:9662 __might_sleep+0x5a/0x60
[ 125.075175] Modules linked in: qrtr bnep snd_ctl_led iTCO_wdt phy_tusb1210 mei_pxp dwc3 mei_hdcp intel_pmc_bxt iTCO_vendor_support udc_core ulpi snd_soc_sst_bytcr_rt5640 gpio_keys intel_rapl_msr intel_soc_dts_thermal intel_soc_dts_iosf intel_powerclamp coretemp kvm_intel vfat fat brcmfmac kvm brcmutil irqbypass intel_cstate cfg80211 pcspkr joydev axp288_adc axp20x_pek extcon_axp288 axp288_fuel_gauge axp288_charger mei_txe mei lpc_ich dwc3_pci wmi_bmof snd_sof_acpi_intel_byt snd_sof_acpi snd_sof_intel_atom snd_sof_xtensa_dsp snd_sof snd_sof_utils ledtrig_audio snd_intel_sst_acpi snd_hdmi_lpe_audio snd_intel_sst_core snd_soc_sst_atom_hifi2_platform snd_soc_rt5670 snd_soc_acpi_intel_match snd_soc_rt5651 snd_soc_rt5645 snd_soc_acpi snd_soc_rt5640 snd_intel_dspcfg snd_intel_sdw_acpi snd_soc_rl6231 snd_soc_core int3401_thermal processor_thermal_device snd_compress ac97_bus snd_pcm_dmaengine soc_button_array processor_thermal_rfim bmc150_accel_spi dw_dmac snd_seq processor_thermal_mbox
[ 125.075467] regmap_spi dptf_power int3406_thermal processor_thermal_rapl snd_seq_device int3400_thermal intel_rapl_common int3403_thermal hci_uart acpi_thermal_rel btqca btrtl btbcm int340x_thermal_zone bmc150_accel_i2c btintel snd_pcm bmc150_accel_core silead bluetooth intel_int0002_vgpio atomisp_gc0310(C) industrialio_triggered_buffer atomisp_ov2680(C) atomisp_gmin_platform(C) acpi_pad videodev cm32181 kfifo_buf snd_timer industrialio mc snd ecdh_generic soundcore rfkill zram ip_tables i915(E) mmc_block crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm_buddy drm_dp_helper ttm video wmi sdhci_acpi sdhci mmc_core pwm_lpss_platform pwm_lpss i2c_dev fuse
[ 125.075676] CPU: 2 PID: 941 Comm: Xorg Tainted: G D C E 5.18.0-rc5+ #31
[ 125.075689] Hardware name: ilife S806/BYT-PA03C, BIOS H1D_S806_206 10/29/2014
[ 125.075698] RIP: 0010:__might_sleep+0x5a/0x60
[ 125.075714] Code: ee 48 89 df 31 d2 5b 5d e9 73 fe ff ff 48 8b 90 28 2d 00 00 48 c7 c7 a0 16 82 96 c6 05 24 be f2 01 01 48 89 d1 e8 6a 04 cb 00 <0f> 0b eb d1 66 90 0f 1f 44 00 00 41 54 41 89 f4 55 48 89 d5 8b 15
[ 125.075727] RSP: 0018:ffffb71ac1917ec0 EFLAGS: 00010282
[ 125.075740] RAX: 0000000000000078 RBX: ffffffff9681cdee RCX: 0000000000000027
[ 125.075750] RDX: ffff8e38fb720928 RSI: 0000000000000001 RDI: ffff8e38fb720920
[ 125.075760] RBP: 0000000000000031 R08: 0000000000000000 R09: ffffb71ac1917cf0
[ 125.075770] R10: 0000000000000003 R11: ffffffff96d61f08 R12: 0000000000000046
[ 125.075779] R13: 0000000000000000 R14: 0000000000000009 R15: 0000000000000000
[ 125.075788] FS: 00007eff9890bf00(0000) GS:ffff8e38fb700000(0000) knlGS:0000000000000000
[ 125.075800] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 125.075810] CR2: ffffb71ac15ebb60 CR3: 000000000682e000 CR4: 00000000001006e0
[ 125.075821] Call Trace:
[ 125.075830] <TASK>
[ 125.075842] exit_signals+0x1a/0x2f0
[ 125.075860] do_exit+0x14b/0xbc0
[ 125.075882] make_task_dead+0x51/0x60
[ 125.075897] rewind_stack_and_make_dead+0x17/0x17
[ 125.075914] RIP: 0033:0x7eff991a137b
[ 125.075928] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7d 2a 0f 00 f7 d8 64 89 01 48
[ 125.075941] RSP: 002b:00007ffcff7a0c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 125.075956] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007eff991a137b
[ 125.075966] RDX: 00007ffcff7a0cb0 RSI: 0000000040406469 RDI: 0000000000000011
[ 125.075976] RBP: 00007ffcff7a0d70 R08: 00007eff98830000 R09: 000000010000d000
[ 125.075985] R10: 00007ffcff7a0bb0 R11: 0000000000000246 R12: 00005640a9901f48
[ 125.075995] R13: 00000000000000dc R14: 00007ffcff7a0cb0 R15: 0000000000000011
[ 125.076019] </TASK>
[ 125.076027] irq event stamp: 0
[ 125.076034] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 125.076047] hardirqs last disabled at (0): [<ffffffff950e4a87>] copy_process+0x9d7/0x1dc0
[ 125.076065] softirqs last enabled at (0): [<ffffffff950e4a87>] copy_process+0x9d7/0x1dc0
[ 125.076080] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 125.076092] ---[ end trace 0000000000000000 ]---
- Developer
I just double-checked, and when going back to this patch:
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 498b458fd784..c02cd548c455 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1260,18 +1260,15 @@ static void *reloc_iomap(struct i915_vma *batch,
 	 * VMA from the object list because we no longer pin.
 	 *
 	 * Only attempt to pin the batch buffer to ggtt if the current batch
-	 * is not inside ggtt, or the batch buffer is not misplaced.
+	 * is not inside ggtt.
 	 */
-	if (!i915_is_ggtt(batch->vm)) {
-		vma = i915_gem_object_ggtt_pin_ww(obj, &eb->ww, NULL, 0, 0,
-						  PIN_MAPPABLE |
-						  PIN_NONBLOCK /* NOWARN */ |
-						  PIN_NOEVICT);
-	} else if (i915_vma_is_map_and_fenceable(batch)) {
-		__i915_vma_pin(batch);
-		vma = batch;
-	}
+	if (i915_is_ggtt(batch->vm))
+		return NULL;
 
+	vma = i915_gem_object_ggtt_pin_ww(obj, &eb->ww, NULL, 0, 0,
+					  PIN_MAPPABLE |
+					  PIN_NONBLOCK /* NOWARN */ |
+					  PIN_NOEVICT);
 	if (vma == ERR_PTR(-EDEADLK))
 		return vma;
I see no backtraces in dmesg, so the backtraces are caused by the new patch.
- Developer
Can you try the last patch but without the i915_vma_revoke_fence call?
- Developer
That fixes the backtraces, but now we are back to things getting mis-rendered again.
Off-topic: I'm going AFK now to go to the gym. I'll be available to run more tests in about 90 minutes.
- Developer
BTW, so that we are on the same page, here is the diff I used for my latest test, which showed the mis-rendering again:
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index d42f437149c9..0ddbfc363e0d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1259,7 +1259,9 @@ static void *reloc_iomap(struct i915_vma *batch,
 						  PIN_NOEVICT);
 	} else if (i915_vma_is_map_and_fenceable(batch)) {
 		__i915_vma_pin(batch);
-		vma = batch;
+
+		err = i915_vma_wait_for_bind(batch);
+		vma = err ? ERR_PTR(err) : batch;
 	}
 
 	if (vma == ERR_PTR(-EDEADLK))
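The "err ? ERR_PTR(err) : batch" line relies on the kernel's error-pointer convention: a negative errno is carried in the pointer return value itself, which is why the next check can compare "vma == ERR_PTR(-EDEADLK)" directly. Below is a userspace re-implementation of the include/linux/err.h helpers, for illustration only; pick_vma() is a hypothetical name that mirrors the patched line.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace copies of the include/linux/err.h helpers (illustration only). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }

/* Error pointers occupy the top MAX_ERRNO values of the address space,
 * where no valid object can live. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical helper mirroring "vma = err ? ERR_PTR(err) : batch;". */
static void *pick_vma(int err, void *batch)
{
	return err ? ERR_PTR(err) : batch;
}
```

Because the errno lives in the pointer, "vma == ERR_PTR(-EDEADLK)" is a plain pointer comparison that singles out the deadlock-backoff case, while IS_ERR() is the general test for any error.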
- Developer
One more attempt.
- Developer
> In the previous patch with the return NULL, what happens if you take out the "if (i915_is_ggtt(batch->vm)) return NULL"?
Taking it out results in:
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index d42f437149c9..5eb8a8cb1fa5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1245,23 +1245,10 @@ static void *reloc_iomap(struct i915_vma *batch,
 	if (err)
 		return ERR_PTR(err);
 
-	/*
-	 * i915_gem_object_ggtt_pin_ww may attempt to remove the batch
-	 * VMA from the object list because we no longer pin.
-	 *
-	 * Only attempt to pin the batch buffer to ggtt if the current batch
-	 * is not inside ggtt, or the batch buffer is not misplaced.
-	 */
-	if (!i915_is_ggtt(batch->vm)) {
-		vma = i915_gem_object_ggtt_pin_ww(obj, &eb->ww, NULL, 0, 0,
-						  PIN_MAPPABLE |
-						  PIN_NONBLOCK /* NOWARN */ |
-						  PIN_NOEVICT);
-	} else if (i915_vma_is_map_and_fenceable(batch)) {
-		__i915_vma_pin(batch);
-		vma = batch;
-	}
-
+	vma = i915_gem_object_ggtt_pin_ww(obj, &eb->ww, NULL, 0, 0,
+					  PIN_MAPPABLE |
+					  PIN_NONBLOCK /* NOWARN */ |
+					  PIN_NOEVICT);
 	if (vma == ERR_PTR(-EDEADLK))
 		return vma;
And with this change everything seems to work as it should.
- Developer
One more attempt:
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 498b458fd784..919d01082909 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1262,14 +1262,12 @@ static void *reloc_iomap(struct i915_vma *batch,
 	 * Only attempt to pin the batch buffer to ggtt if the current batch
 	 * is not inside ggtt, or the batch buffer is not misplaced.
 	 */
-	if (!i915_is_ggtt(batch->vm)) {
+	if (!i915_is_ggtt(batch->vm) ||
+	    !i915_vma_misplaced(batch, 0, 0, PIN_MAPPABLE)) {
 		vma = i915_gem_object_ggtt_pin_ww(obj, &eb->ww, NULL, 0, 0,
 						  PIN_MAPPABLE |
 						  PIN_NONBLOCK /* NOWARN */ |
 						  PIN_NOEVICT);
-	} else if (i915_vma_is_map_and_fenceable(batch)) {
-		__i915_vma_pin(batch);
-		vma = batch;
 	}
 
 	if (vma == ERR_PTR(-EDEADLK))
With this patch everything also seems to work as it should.
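To compare the reloc_iomap() variants tried in this thread, here is a hedged C model that reduces each one to a decision over two inputs: whether the batch vma is already in the GGTT, and whether it is map-and-fenceable. The enum and function names are hypothetical, the model ignores locking, eviction and the -EDEADLK handling, and it simplifies i915_vma_misplaced() to "not map-and-fenceable", whereas the real helper checks more constraints.

```c
#include <stdbool.h>

/* How reloc_iomap() obtains a GGTT vma for the batch, in this model. */
enum batch_path {
	PATH_PIN_WW,	/* goes through i915_gem_object_ggtt_pin_ww() */
	PATH_RAW_PIN,	/* __i915_vma_pin(batch), batch vma reused as-is */
	PATH_NONE,	/* no branch taken; vma keeps its initial value */
};

/* Original branching (the "-" lines in the diffs above). */
static enum batch_path original_path(bool in_ggtt, bool map_and_fenceable)
{
	if (!in_ggtt)
		return PATH_PIN_WW;
	if (map_and_fenceable)
		return PATH_RAW_PIN;
	return PATH_NONE;
}

/* Variant that always calls i915_gem_object_ggtt_pin_ww(). */
static enum batch_path always_pin_path(bool in_ggtt, bool map_and_fenceable)
{
	(void)in_ggtt;
	(void)map_and_fenceable;
	return PATH_PIN_WW;
}

/*
 * Variant that calls i915_gem_object_ggtt_pin_ww() unless the batch is in
 * the GGTT and misplaced for PIN_MAPPABLE. Assumption: "misplaced" is
 * reduced to "not map_and_fenceable" for this PIN_MAPPABLE-only check.
 */
static enum batch_path misplaced_check_path(bool in_ggtt, bool map_and_fenceable)
{
	bool misplaced = !map_and_fenceable;

	if (!in_ggtt || !misplaced)
		return PATH_PIN_WW;
	return PATH_NONE;
}
```

In this model, the one input on which both variants reported as working differ from the original is a batch that is already GGTT-bound and mappable: the original pins it directly with __i915_vma_pin(), while both working variants route it through i915_gem_object_ggtt_pin_ww(). The model only shows where the paths diverge; it does not explain why the direct pin misbehaves on GM45.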
- Developer
Weird! Oh well, let's see if CI is still happy with that change.
- Author
I have tested the patch you sent to the mailing list ("[PATCH] drm/i915: Use i915_gem_object_ggtt_pin_ww for reloc_iomap"), and with it the GPU works correctly. While I was resizing the terminal window many times in a row and playing with it, the terminal contents once went blank (I don't think I pressed a key combination to clear it). This seems to be unrelated, however.
Tested-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
- Author
Thank you for fixing this issue.
- Reporter
@matjon Thanks for the confirmation. Closing this issue.
- Suresh closed
- Reporter
@jani.saarinen Thanks for pointing out to me that the patch is not merged yet; re-opening the bug until the patch lands in the upstream branch.
- Suresh reopened
- Developer
@mlankhorst, it would be nice if we could get this fixed before 5.18 is released. What is the status of getting your patch for this merged?
https://patchwork.freedesktop.org/patch/485889/
- Developer
Pushed to gt-next; it should be picked up by the drm-intel maintainers for inclusion.
commit 451374eef622fca6f00eeeda89aaccb45a30a149 (HEAD -> drm-intel-gt-next)
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date:   Wed May 11 13:52:19 2022 +0200

    drm/i915: Use i915_gem_object_ggtt_pin_ww for reloc_iomap
- Maarten Lankhorst closed
- Dave Airlie mentioned in commit airlied/drm-testing-ci@7b1d6924
- Author
Slated for release in Linux 5.18 as commit 7b1d6924f27b: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b1d6924f27ba24b9e47abb9bd53d0bbc430a835
- Mario R. mentioned in issue #5953 (closed)