Skip to content
  • Lyude Paul's avatar
    Revert "drm/dp_mst: Skip validating ports during destruction, just ref" · 9765635b
    Lyude Paul authored
    This reverts commit:
    
    c54c7374 ("drm/dp_mst: Skip validating ports during destruction, just ref")
    
    ugh.
    
    In drm_dp_destroy_connector_work(), we have a pretty good chance of
    freeing the actual struct drm_dp_mst_port. However, after destroying
    things we send a hotplug through (*mgr->cbs->hotplug)(mgr) which is
    where the problems start.
    
    For i915, this calls all the way down to the fbcon probing helpers,
    which start trying to access the port in a modeset.
    
    [   45.062001] ==================================================================
    [   45.062112] BUG: KASAN: use-after-free in ex_handler_refcount+0x146/0x180
    [   45.062196] Write of size 4 at addr ffff8882b4b70968 by task kworker/3:1/53
    
    [   45.062325] CPU: 3 PID: 53 Comm: kworker/3:1 Kdump: loaded Tainted: G           O      4.20.0-rc4Lyude-Test+ #3
    
    
    [   45.062442] Hardware name: LENOVO 20BWS1KY00/20BWS1KY00, BIOS JBET71WW (1.35 ) 09/14/2018
    [   45.062554] Workqueue: events drm_dp_destroy_connector_work [drm_kms_helper]
    [   45.062641] Call Trace:
    [   45.062685]  dump_stack+0xbd/0x15a
    [   45.062735]  ? dump_stack_print_info.cold.0+0x1b/0x1b
    [   45.062801]  ? printk+0x9f/0xc5
    [   45.062847]  ? kmsg_dump_rewind_nolock+0xe4/0xe4
    [   45.062909]  ? ex_handler_refcount+0x146/0x180
    [   45.062970]  print_address_description+0x71/0x239
    [   45.063036]  ? ex_handler_refcount+0x146/0x180
    [   45.063095]  kasan_report.cold.5+0x242/0x30b
    [   45.063155]  __asan_report_store4_noabort+0x1c/0x20
    [   45.063313]  ex_handler_refcount+0x146/0x180
    [   45.063371]  ? ex_handler_clear_fs+0xb0/0xb0
    [   45.063428]  fixup_exception+0x98/0xd7
    [   45.063484]  ? raw_notifier_call_chain+0x20/0x20
    [   45.063548]  do_trap+0x6d/0x210
    [   45.063605]  ? _GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
    [   45.063732]  do_error_trap+0xc0/0x170
    [   45.063802]  ? _GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
    [   45.063929]  do_invalid_op+0x3b/0x50
    [   45.063997]  ? _GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
    [   45.064103]  invalid_op+0x14/0x20
    [   45.064162] RIP: 0010:_GLOBAL__sub_I_65535_1_drm_dp_aux_unregister_devnode+0x2f/0x1c6 [drm_kms_helper]
    [   45.064274] Code: 00 48 c7 c7 80 fe 53 a0 48 89 e5 e8 5b 6f 26 e1 5d c3 48 8d 0e 0f 0b 48 8d 0b 0f 0b 48 8d 0f 0f 0b 48 8d 0f 0f 0b 49 8d 4d 00 <0f> 0b 49 8d 0e 0f 0b 48 8d 08 0f 0b 49 8d 4d 00 0f 0b 48 8d 0b 0f
    [   45.064569] RSP: 0018:ffff8882b789ee10 EFLAGS: 00010282
    [   45.064637] RAX: ffff8882af47ae70 RBX: ffff8882af47aa60 RCX: ffff8882b4b70968
    [   45.064723] RDX: ffff8882af47ae70 RSI: 0000000000000008 RDI: ffff8882b788bdb8
    [   45.064808] RBP: ffff8882b789ee28 R08: ffffed1056f13db4 R09: ffffed1056f13db3
    [   45.064894] R10: ffffed1056f13db3 R11: ffff8882b789ed9f R12: ffff8882af47ad28
    [   45.064980] R13: ffff8882b4b70968 R14: ffff8882acd86728 R15: ffff8882b4b75dc8
    [   45.065084]  drm_dp_mst_reset_vcpi_slots+0x12/0x80 [drm_kms_helper]
    [   45.065225]  intel_mst_disable_dp+0xda/0x180 [i915]
    [   45.065361]  intel_encoders_disable.isra.107+0x197/0x310 [i915]
    [   45.065498]  haswell_crtc_disable+0xbe/0x400 [i915]
    [   45.065622]  ? i9xx_disable_plane+0x1c0/0x3e0 [i915]
    [   45.065750]  intel_atomic_commit_tail+0x74e/0x3e60 [i915]
    [   45.065884]  ? intel_pre_plane_update+0xbc0/0xbc0 [i915]
    [   45.065968]  ? drm_atomic_helper_swap_state+0x88b/0x1d90 [drm_kms_helper]
    [   45.066054]  ? kasan_check_write+0x14/0x20
    [   45.066165]  ? i915_gem_track_fb+0x13a/0x330 [i915]
    [   45.066277]  ? i915_sw_fence_complete+0xe9/0x140 [i915]
    [   45.066406]  ? __i915_sw_fence_complete+0xc50/0xc50 [i915]
    [   45.066540]  intel_atomic_commit+0x72e/0xef0 [i915]
    [   45.066635]  ? drm_dev_dbg+0x200/0x200 [drm]
    [   45.066764]  ? intel_atomic_commit_tail+0x3e60/0x3e60 [i915]
    [   45.066898]  ? intel_atomic_commit_tail+0x3e60/0x3e60 [i915]
    [   45.067001]  drm_atomic_commit+0xc4/0xf0 [drm]
    [   45.067074]  restore_fbdev_mode_atomic+0x562/0x780 [drm_kms_helper]
    [   45.067166]  ? drm_fb_helper_debug_leave+0x690/0x690 [drm_kms_helper]
    [   45.067249]  ? kasan_check_read+0x11/0x20
    [   45.067324]  restore_fbdev_mode+0x127/0x4b0 [drm_kms_helper]
    [   45.067364]  ? kasan_check_read+0x11/0x20
    [   45.067406]  drm_fb_helper_restore_fbdev_mode_unlocked+0x164/0x200 [drm_kms_helper]
    [   45.067462]  ? drm_fb_helper_hotplug_event+0x30/0x30 [drm_kms_helper]
    [   45.067508]  ? kasan_check_write+0x14/0x20
    [   45.070360]  ? mutex_unlock+0x22/0x40
    [   45.073748]  drm_fb_helper_set_par+0xb2/0xf0 [drm_kms_helper]
    [   45.075846]  drm_fb_helper_hotplug_event.part.33+0x1cd/0x290 [drm_kms_helper]
    [   45.078088]  drm_fb_helper_hotplug_event+0x1c/0x30 [drm_kms_helper]
    [   45.082614]  intel_fbdev_output_poll_changed+0x9f/0x140 [i915]
    [   45.087069]  drm_kms_helper_hotplug_event+0x67/0x90 [drm_kms_helper]
    [   45.089319]  intel_dp_mst_hotplug+0x37/0x50 [i915]
    [   45.091496]  drm_dp_destroy_connector_work+0x510/0x6f0 [drm_kms_helper]
    [   45.093675]  ? drm_dp_update_payload_part1+0x1220/0x1220 [drm_kms_helper]
    [   45.095851]  ? kasan_check_write+0x14/0x20
    [   45.098473]  ? kasan_check_read+0x11/0x20
    [   45.101155]  ? strscpy+0x17c/0x530
    [   45.103808]  ? __switch_to_asm+0x34/0x70
    [   45.106456]  ? syscall_return_via_sysret+0xf/0x7f
    [   45.109711]  ? read_word_at_a_time+0x20/0x20
    [   45.113138]  ? __switch_to_asm+0x40/0x70
    [   45.116529]  ? __switch_to_asm+0x34/0x70
    [   45.119891]  ? __switch_to_asm+0x40/0x70
    [   45.123224]  ? __switch_to_asm+0x34/0x70
    [   45.126540]  ? __switch_to_asm+0x34/0x70
    [   45.129824]  process_one_work+0x88d/0x15d0
    [   45.133172]  ? pool_mayday_timeout+0x850/0x850
    [   45.136459]  ? pci_mmcfg_check_reserved+0x110/0x128
    [   45.139739]  ? wake_q_add+0xb0/0xb0
    [   45.143010]  ? check_preempt_wakeup+0x652/0x1050
    [   45.146304]  ? worker_enter_idle+0x29e/0x740
    [   45.149589]  ? __schedule+0x1ec0/0x1ec0
    [   45.152937]  ? kasan_check_read+0x11/0x20
    [   45.156179]  ? _raw_spin_lock_irq+0xa3/0x130
    [   45.159382]  ? _raw_read_unlock_irqrestore+0x30/0x30
    [   45.162542]  ? kasan_check_write+0x14/0x20
    [   45.165657]  worker_thread+0x1a5/0x1470
    [   45.168725]  ? set_load_weight+0x2e0/0x2e0
    [   45.171755]  ? process_one_work+0x15d0/0x15d0
    [   45.174806]  ? __switch_to_asm+0x34/0x70
    [   45.177645]  ? __switch_to_asm+0x40/0x70
    [   45.180323]  ? __switch_to_asm+0x34/0x70
    [   45.182936]  ? __switch_to_asm+0x40/0x70
    [   45.185539]  ? __switch_to_asm+0x34/0x70
    [   45.188100]  ? __switch_to_asm+0x40/0x70
    [   45.190628]  ? __schedule+0x7d4/0x1ec0
    [   45.193143]  ? save_stack+0xa9/0xd0
    [   45.195632]  ? kasan_check_write+0x10/0x20
    [   45.198162]  ? kasan_kmalloc+0xc4/0xe0
    [   45.200609]  ? kmem_cache_alloc_trace+0xdd/0x190
    [   45.203046]  ? kthread+0x9f/0x3b0
    [   45.205470]  ? ret_from_fork+0x35/0x40
    [   45.207876]  ? unwind_next_frame+0x43/0x50
    [   45.210273]  ? __save_stack_trace+0x82/0x100
    [   45.212658]  ? deactivate_slab.isra.67+0x3d4/0x580
    [   45.215026]  ? default_wake_function+0x35/0x50
    [   45.217399]  ? kasan_check_read+0x11/0x20
    [   45.219825]  ? _raw_spin_lock_irqsave+0xae/0x140
    [   45.222174]  ? __lock_text_start+0x8/0x8
    [   45.224521]  ? replenish_dl_entity.cold.62+0x4f/0x4f
    [   45.226868]  ? __kthread_parkme+0x87/0xf0
    [   45.229200]  kthread+0x2f7/0x3b0
    [   45.231557]  ? process_one_work+0x15d0/0x15d0
    [   45.233923]  ? kthread_park+0x120/0x120
    [   45.236249]  ret_from_fork+0x35/0x40
    
    [   45.240875] Allocated by task 242:
    [   45.243136]  save_stack+0x43/0xd0
    [   45.245385]  kasan_kmalloc+0xc4/0xe0
    [   45.247597]  kmem_cache_alloc_trace+0xdd/0x190
    [   45.249793]  drm_dp_add_port+0x1e0/0x2170 [drm_kms_helper]
    [   45.252000]  drm_dp_send_link_address+0x4a7/0x740 [drm_kms_helper]
    [   45.254389]  drm_dp_check_and_send_link_address+0x1a7/0x210 [drm_kms_helper]
    [   45.256803]  drm_dp_mst_link_probe_work+0x6f/0xb0 [drm_kms_helper]
    [   45.259200]  process_one_work+0x88d/0x15d0
    [   45.261597]  worker_thread+0x1a5/0x1470
    [   45.264038]  kthread+0x2f7/0x3b0
    [   45.266371]  ret_from_fork+0x35/0x40
    
    [   45.270937] Freed by task 53:
    [   45.273170]  save_stack+0x43/0xd0
    [   45.275382]  __kasan_slab_free+0x139/0x190
    [   45.277604]  kasan_slab_free+0xe/0x10
    [   45.279826]  kfree+0x99/0x1b0
    [   45.282044]  drm_dp_free_mst_port+0x4a/0x60 [drm_kms_helper]
    [   45.284330]  drm_dp_destroy_connector_work+0x43e/0x6f0 [drm_kms_helper]
    [   45.286660]  process_one_work+0x88d/0x15d0
    [   45.288934]  worker_thread+0x1a5/0x1470
    [   45.291231]  kthread+0x2f7/0x3b0
    [   45.293547]  ret_from_fork+0x35/0x40
    
    [   45.298206] The buggy address belongs to the object at ffff8882b4b70968
                    which belongs to the cache kmalloc-2k of size 2048
    [   45.303047] The buggy address is located 0 bytes inside of
                    2048-byte region [ffff8882b4b70968, ffff8882b4b71168)
    [   45.308010] The buggy address belongs to the page:
    [   45.310477] page:ffffea000ad2dc00 count:1 mapcount:0 mapping:ffff8882c080cf40 index:0x0 compound_mapcount: 0
    [   45.313051] flags: 0x8000000000010200(slab|head)
    [   45.315635] raw: 8000000000010200 ffffea000aac2808 ffffea000abe8608 ffff8882c080cf40
    [   45.318300] raw: 0000000000000000 00000000000d000d 00000001ffffffff 0000000000000000
    [   45.320966] page dumped because: kasan: bad access detected
    
    [   45.326312] Memory state around the buggy address:
    [   45.329085]  ffff8882b4b70800: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [   45.331845]  ffff8882b4b70880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [   45.334584] >ffff8882b4b70900: fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb
    [   45.337302]                                                           ^
    [   45.340061]  ffff8882b4b70980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [   45.342910]  ffff8882b4b70a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
    [   45.345748] ==================================================================
    
    So, this definitely isn't a fix that we want. This being said; there's
    no real easy fix for this problem because of some of the catch-22's of
    the MST helpers current design. For starters; we always need to validate
    a port with drm_dp_get_validated_port_ref(), but validation relies on
    the lifetime of the port in the actual topology. So once the port is
    gone, it can't be validated again.
    
    If we were to try to make the payload helpers not use port validation,
    then we'd cause another problem: if the port isn't validated, it could
    be freed and we'd just start causing more KASAN issues. There are
    already hacks that attempt to workaround this in
    drm_dp_mst_destroy_connector_work() by re-initializing the kref so that
    it can be used again and it's memory can be freed once the VCPI helpers
    finish removing the port's respective payloads. But none of these really
    do anything helpful since the port still can't be validated since it's
    gone from the topology. Also, that workaround is immensely confusing to
    read through.
    
    What really needs to be done in order to fix this is to teach DRM how to
    track the lifetime of the structs for MST ports and branch devices
    separately from their lifetime in the actual topology. Simply put; this
    means having two different krefs-one that removes the port/branch device
    from the topology, and one that finally calls kfree(). This would let us
    simplify things, since we'd now be able to keep ports around without
    having to keep them in the topology at the same time, which is exactly
    what we need in order to teach our VCPI helpers to only validate ports
    when it's actually necessary without running the risk of trying to use
    unallocated memory.
    
    Such a fix is on it's way, but for now let's play it safe and just
    revert this. If this bug has been around for well over a year, we can
    wait a little while to get an actual proper fix here.
    
    Signed-off-by: Lyude Paul's avatarLyude Paul <lyude@redhat.com>
    Fixes: c54c7374
    
     ("drm/dp_mst: Skip validating ports during destruction, just ref")
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Cc: Sean Paul <sean@poorly.run>
    Cc: Jerry Zuo <Jerry.Zuo@amd.com>
    Cc: Harry Wentland <Harry.Wentland@amd.com>
    Cc: stable@vger.kernel.org # v4.6+
    Acked-by: default avatarSean Paul <sean@poorly.run>
    Link: https://patchwork.freedesktop.org/patch/msgid/20181128210005.24434-1-lyude@redhat.com
    9765635b