AMDGPU oops on 5.14.8 riscv64 during module initialization
Linux 5.14.8 on riscv64 (HiFive Unmatched)
Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 550 640SP / RX 560/560X] (rev ff)
cc @equan, who recently touched this function
[ 34.854760] Oops [#1]
[ 34.856265] Modules linked in: amdgpu(+) mfd_core gpu_sched i2c_algo_bit drm_ttm_helper ttm drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm drm_panel_orientation_quirks ofpart spi_nor mtd mscc macb phylink efivarfs xhci_pci xhci_hcd nvme nvme_core hwmon loop usb_storage usbcore usb_common
[ 34.886245] CPU: 1 PID: 1038 Comm: modprobe Not tainted 5.14.8-1-edge #2-Alpine
[ 34.893523] Hardware name: SiFive HiFive Unmatched A00 (DT)
[ 34.899082] epc : __memcpy+0x3c/0xf8
[ 34.902640] ra : amdgpu_uvd_suspend+0x15a/0x186 [amdgpu]
[ 34.920534] epc : ffffffff8031fd7c ra : ffffffff013177b0 sp : ffffffd0042d3870
[ 34.927739] gp : ffffffff80f3b2c0 tp : ffffffe0827e8ac0 t0 : ffffffe086a00000
[ 34.934948] t1 : ffffffff80f8ef70 t2 : 000000000000b05c s0 : ffffffd0042d3900
[ 34.942157] s1 : ffffffe0856e0000 a0 : ffffffe086600000 a1 : ffffffd115654000
[ 34.949367] a2 : 00000000002c2000 a3 : ffffffd115916000 a4 : 00000000002c2000
[ 34.956576] a5 : 0000000000000000 a6 : 0000000200000022 a7 : ffffffff80e0a370
[ 34.963795] s2 : ffffffe0856ef000 s3 : ffffffe0856f0000 s4 : 000000000000f000
[ 34.971003] s5 : 0000000000000910 s6 : 00000000ffffffff s7 : 0000000000001000
[ 34.978212] s8 : ffffffe0856ef000 s9 : 0000000000000000 s10: 00000000002c2000
[ 34.985421] s11: ffffffd115654000 t3 : 0000000000000030 t4 : 0000000000000000
[ 34.992630] t5 : 0000000002c186a0 t6 : ffffffe086600000
[ 34.997923] status: 0000000200000120 badaddr: ffffffd115654000 cause: 000000000000000d
[ 35.005834] [<ffffffff8031fd7c>] __memcpy+0x3c/0xf8
[ 35.010694] [<ffffffff0131a9b8>] uvd_v6_0_sw_fini+0x1c/0x8c [amdgpu]
[ 35.024313] [<ffffffff0125cbca>] amdgpu_device_fini_sw+0xc8/0x2c0 [amdgpu]
[ 35.037523] [<ffffffff01261ece>] amdgpu_driver_release_kms+0x16/0x28 [amdgpu]
[ 35.050630] [<ffffffff01124cbe>] devm_drm_dev_init_release+0x40/0x6c [drm]
[ 35.057627] [<ffffffff803a759e>] devm_action_release+0xe/0x16
[ 35.063355] [<ffffffff803a8714>] devres_release_all+0x88/0xce
[ 35.069087] [<ffffffff803a41ca>] really_probe.part.0+0xc2/0x224
[ 35.074994] [<ffffffff803a439c>] __driver_probe_device+0x70/0xde
[ 35.080987] [<ffffffff803a4438>] driver_probe_device+0x2e/0xf6
[ 35.086806] [<ffffffff803a4a50>] __driver_attach+0x88/0x142
[ 35.092365] [<ffffffff803a2690>] bus_for_each_dev+0x52/0x8c
[ 35.097924] [<ffffffff803a3bbe>] driver_attach+0x1a/0x22
[ 35.103222] [<ffffffff803a36de>] bus_add_driver+0xca/0x17c
[ 35.108694] [<ffffffff803a507a>] driver_register+0x48/0xd8
[ 35.114166] [<ffffffff80337402>] __pci_register_driver+0x40/0x48
[ 35.120161] [<ffffffff01760082>] amdgpu_init+0x82/0x1000 [amdgpu]
[ 35.133177] [<ffffffff800020ea>] do_one_initcall+0x3e/0x168
[ 35.138727] [<ffffffff80077f06>] do_init_module+0x46/0x208
[ 35.144199] [<ffffffff80079c64>] load_module+0x1af4/0x2050
[ 35.149670] [<ffffffff8007a37a>] __do_sys_finit_module+0x8a/0xb6
[ 35.155663] [<ffffffff8007a3ca>] sys_finit_module+0x10/0x18
[ 35.161222] [<ffffffff8000304e>] ret_from_syscall+0x0/0x2
[ 35.166703] ---[ end trace 799a431749fc53bc ]---
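For anyone reading the register dump above, here is a minimal sketch that decodes it. It assumes that a0/a1/a2 still hold memcpy's destination, source and length at the faulting instruction (s11 and s10 carry the same source and length values, which suggests they do), and it takes the exception codes from the RISC-V privileged spec. The `REGS` table and the `describe` helper are illustrative names, not anything from the kernel.

```python
# Register values copied from the oops above (5.14.8, CPU 1).
REGS = {
    "a0": 0xFFFFFFE086600000,       # assumed: memcpy destination
    "a1": 0xFFFFFFD115654000,       # assumed: memcpy source
    "a2": 0x00000000002C2000,       # assumed: memcpy length in bytes
    "a3": 0xFFFFFFD115916000,
    "badaddr": 0xFFFFFFD115654000,  # faulting virtual address
    "cause": 0xD,                   # scause, interrupt bit clear
}

# Page-fault exception codes from the RISC-V privileged spec.
SCAUSE = {
    0xC: "instruction page fault",
    0xD: "load page fault",
    0xF: "store/AMO page fault",
}


def describe(regs):
    """Return (fault kind, offset of badaddr into the source, copy length)."""
    kind = SCAUSE.get(regs["cause"], "unknown")
    offset = regs["badaddr"] - regs["a1"]
    return kind, offset, regs["a2"]


kind, offset, length = describe(REGS)
print(f"{kind}: badaddr is src+{offset:#x}, copy length {length:#x} ({length // 1024} KiB)")
# a3 equals a1 + a2 here, which is consistent with it being an end-of-source pointer.
print("a3 == a1 + a2:", REGS["a3"] == REGS["a1"] + REGS["a2"])
```

Read this way, the fault is a load from the very first byte of the source buffer passed by amdgpu_uvd_suspend, so the source mapping itself looks invalid rather than the copy running off the end.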
- Owner
Is this a regression?
- Author
Working on finding out, will update when I know. It's a bit of an endeavor to build and test kernels for this system.
- Owner
Please attach the full dmesg output as well.
- Author
Yep. Here's the complete log:
Loading Linux edge ... Loading initial ramdisk ... EFI stub: Booting Linux Kernel... EFI stub: Using DTB from configuration table EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path EFI stub: Exiting boot services and installing virtual address map... [ 0.000000] Linux version 5.14.8-1-edge (sircmpwn@taiga) (gcc (Alpine 10.3.1_git20210625) 10.3.1 20210625, GNU ld (GNU Binutils) 2.35.2) #2-Alpine SMP PREEMPT Thu, 30 Sep 2021 16:44:59 +0000 [ 0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000 [ 0.000000] Machine model: SiFive HiFive Unmatched A00 [ 0.000000] efi: EFI v2.80 by Das U-Boot [ 0.000000] efi: RTPROP=0xfe727040 SMBIOS=0xfe723000 MEMRESERVE=0xdc8ae040 [ 0.000000] OF: fdt: Ignoring memory block 0x80000000 - 0x80040000 [ 0.000000] OF: fdt: Ignoring memory range 0x80040000 - 0x80200000 [ 0.000000] Zone ranges: [ 0.000000] DMA32 [mem 0x0000000080200000-0x00000000ffffffff] [ 0.000000] Normal [mem 0x0000000100000000-0x000000047fffffff] [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x0000000080200000-0x00000000fe6f1fff] [ 0.000000] node 0: [mem 0x00000000fe6f2000-0x00000000fe6f3fff] [ 0.000000] node 0: [mem 0x00000000fe6f4000-0x00000000fe71afff] [ 0.000000] node 0: [mem 0x00000000fe71b000-0x00000000fe721fff] [ 0.000000] node 0: [mem 0x00000000fe722000-0x00000000fe722fff] [ 0.000000] node 0: [mem 0x00000000fe723000-0x00000000fe723fff] [ 0.000000] node 0: [mem 0x00000000fe724000-0x00000000fe725fff] [ 0.000000] node 0: [mem 0x00000000fe726000-0x00000000fe729fff] [ 0.000000] node 0: [mem 0x00000000fe72a000-0x00000000fe72afff] [ 0.000000] node 0: [mem 0x00000000fe72b000-0x00000000fe72ffff] [ 0.000000] node 0: [mem 0x00000000fe730000-0x00000000fe730fff] [ 0.000000] node 0: [mem 0x00000000fe731000-0x00000000fe731fff] [ 0.000000] node 0: [mem 0x00000000fe732000-0x00000000fe732fff] [ 0.000000] node 0: [mem 0x00000000fe733000-0x00000000fe733fff] [ 0.000000] node 0: [mem 
0x00000000fe734000-0x00000000fe735fff] [ 0.000000] node 0: [mem 0x00000000fe736000-0x00000000fe737fff] [ 0.000000] node 0: [mem 0x00000000fe738000-0x00000000fe738fff] [ 0.000000] node 0: [mem 0x00000000fe739000-0x00000000fe739fff] [ 0.000000] node 0: [mem 0x00000000fe73a000-0x00000000fe73bfff] [ 0.000000] node 0: [mem 0x00000000fe73c000-0x00000000fe73cfff] [ 0.000000] node 0: [mem 0x00000000fe73d000-0x00000000fff65fff] [ 0.000000] node 0: [mem 0x00000000fff66000-0x00000000fff66fff] [ 0.000000] node 0: [mem 0x00000000fff67000-0x000000047fffffff] [ 0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x000000047fffffff] [ 0.000000] SBI specification v0.2 detected [ 0.000000] SBI implementation ID=0x1 Version=0x9 [ 0.000000] SBI TIME extension detected [ 0.000000] SBI IPI extension detected [ 0.000000] SBI RFENCE extension detected [ 0.000000] SBI v0.2 HSM extension detected [ 0.000000] CPU with hartid=0 is not available [ 0.000000] CPU with hartid=0 is not available [ 0.000000] riscv: ISA extensions acdfim [ 0.000000] riscv: ELF capabilities acdfim [ 0.000000] percpu: Embedded 20 pages/cpu s41240 r8192 d32488 u81920 [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 4136455 [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-edge root=UUID=3b64e102-46c4-49c6-a66b-cc731583c976 ro modules=sd-mod,usb-storage,ext4 rootfstype=ext4 [ 0.000000] Unknown command line parameters: BOOT_IMAGE=/boot/vmlinuz-edge modules=sd-mod,usb-storage,ext4 [ 0.000000] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes, linear) [ 0.000000] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear) [ 0.000000] Sorting __ex_table... 
[ 0.000000] mem auto-init: stack:off, heap alloc:on, heap free:off [ 0.000000] software IO TLB: mapped [mem 0x00000000fa6f2000-0x00000000fe6f2000] (64MB) [ 0.000000] Memory: 16399252K/16775168K available (5465K kernel code, 5261K rwdata, 2048K rodata, 2144K init, 438K bss, 375916K reserved, 0K cma-reserved) [ 0.000000] random: get_random_u64 called from cache_random_seq_create+0x70/0x154 with crng_init=0 [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 [ 0.000000] rcu: Preemptible hierarchical RCU implementation. [ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=32 to nr_cpu_ids=4. [ 0.000000] Trampoline variant of Tasks RCU enabled. [ 0.000000] Tracing variant of Tasks RCU enabled. [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. [ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4 [ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 [ 0.000000] CPU with hartid=0 is not available [ 0.000000] riscv-intc: unable to find hart id for /cpus/cpu@0/interrupt-controller [ 0.000000] riscv-intc: 64 local interrupts mapped [ 0.000000] plic: interrupt-controller@c000000: mapped 69 interrupts with 4 handlers for 9 contexts. [ 0.000000] riscv_timer_init_dt: Registering clocksource cpuid [0] hartid [3] [ 0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x1d854df40, max_idle_ns: 3526361616960 ns [ 0.000002] sched_clock: 64 bits at 1000kHz, resolution 1000ns, wraps every 2199023255500ns [ 0.000286] Console: colour dummy device 80x25 [ 0.001725] printk: console [tty0] enabled [ 0.001806] Calibrating delay loop (skipped), value calculated using timer frequency.. 
2.00 BogoMIPS (lpj=4000) [ 0.001870] pid_max: default: 32768 minimum: 301 [ 0.001983] LSM: Security Framework initializing [ 0.002689] Mount-cache hash table entries: 32768 (order: 6, 262144 bytes, linear) [ 0.003418] Mountpoint-cache hash table entries: 32768 (order: 6, 262144 bytes, linear) [ 0.006014] ASID allocator disabled [ 0.006164] rcu: Hierarchical SRCU implementation. [ 0.006534] Remapping and enabling EFI services. [ 0.006997] smp: Bringing up secondary CPUs ... [ 0.009532] smp: Brought up 1 node, 4 CPUs [ 0.012273] devtmpfs: initialized [ 0.015258] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns [ 0.015525] futex hash table entries: 1024 (order: 4, 65536 bytes, linear) [ 0.016336] NET: Registered PF_NETLINK/PF_ROUTE protocol family [ 0.016551] audit: initializing netlink subsys (disabled) [ 0.016896] audit: type=2000 audit(0.016:1): state=initialized audit_enabled=0 res=1 [ 0.029545] iommu: Default domain type: Translated [ 0.029853] vgaarb: loaded [ 0.030148] SCSI subsystem initialized [ 0.030457] Registered efivars operations [ 0.031154] clocksource: Switched to clocksource riscv_clocksource [ 0.031836] VFS: Disk quotas dquot_6.6.0 [ 0.032064] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes) [ 0.038946] NET: Registered PF_INET protocol family [ 0.046114] IP idents hash table entries: 262144 (order: 9, 2097152 bytes, linear) [ 0.062221] tcp_listen_portaddr_hash hash table entries: 8192 (order: 5, 131072 bytes, linear) [ 0.066140] TCP established hash table entries: 131072 (order: 8, 1048576 bytes, linear) [ 0.072097] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes, linear) [ 0.074116] TCP: Hash tables configured (established 131072 bind 65536) [ 0.076837] MPTCP token hash table entries: 16384 (order: 6, 393216 bytes, linear) [ 0.078335] UDP hash table entries: 8192 (order: 6, 262144 bytes, linear) [ 0.079699] UDP-Lite hash table entries: 8192 (order: 6, 262144 bytes, linear) [ 
0.080367] NET: Registered PF_UNIX/PF_LOCAL protocol family [ 0.080424] PCI: CLS 0 bytes, default 64 [ 0.081161] Unpacking initramfs... [ 0.100752] Initialise system trusted keyrings [ 0.101160] workingset: timestamp_bits=62 max_order=22 bucket_order=0 [ 0.115293] NET: Registered PF_ALG protocol family [ 0.115368] Key type asymmetric registered [ 0.115392] Asymmetric key parser 'x509' registered [ 0.115502] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252) [ 0.160353] fu740-pcie e00000000.pcie: host bridge /soc/pcie@e00000000 ranges: [ 0.160502] fu740-pcie e00000000.pcie: IO 0x0060080000..0x006008ffff -> 0x0060080000 [ 0.160571] fu740-pcie e00000000.pcie: MEM 0x0060090000..0x0070ffffff -> 0x0060090000 [ 0.160620] fu740-pcie e00000000.pcie: MEM 0x2000000000..0x3fffffffff -> 0x2000000000 [ 0.268155] fu740-pcie e00000000.pcie: invalid resource [ 0.268235] fu740-pcie e00000000.pcie: iATU unroll: enabled [ 0.268261] fu740-pcie e00000000.pcie: Detected iATU regions: 8 outbound, 8 inbound [ 0.368386] fu740-pcie e00000000.pcie: Link up [ 0.368721] fu740-pcie e00000000.pcie: PCI host bridge to bus 0000:00 [ 0.368761] pci_bus 0000:00: root bus resource [bus 00-ff] [ 0.368798] pci_bus 0000:00: root bus resource [io 0x0000-0xffff] (bus address [0x60080000-0x6008ffff]) [ 0.368833] pci_bus 0000:00: root bus resource [mem 0x60090000-0x70ffffff] [ 0.368865] pci_bus 0000:00: root bus resource [mem 0x2000000000-0x3fffffffff pref] [ 0.368947] pci 0000:00:00.0: [f15e:0000] type 01 class 0x060400 [ 0.368986] pci 0000:00:00.0: reg 0x10: [mem 0x00000000-0x000fffff] [ 0.369021] pci 0000:00:00.0: reg 0x38: [mem 0x00000000-0x0000ffff pref] [ 0.369104] pci 0000:00:00.0: supports D1 [ 0.369127] pci 0000:00:00.0: PME# supported from D0 D1 D3hot [ 0.370163] pci 0000:01:00.0: [1b21:2824] type 01 class 0x060400 [ 0.370266] pci 0000:01:00.0: enabling Extended Tags [ 0.370410] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold [ 0.381091] Freeing initrd memory: 5828K [ 
0.383875] pci 0000:01:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring [ 0.384234] pci 0000:02:00.0: [1b21:2824] type 01 class 0x060400 [ 0.384334] pci 0000:02:00.0: enabling Extended Tags [ 0.384489] pci 0000:02:00.0: PME# supported from D0 D3hot D3cold [ 0.384880] pci 0000:02:02.0: [1b21:2824] type 01 class 0x060400 [ 0.384975] pci 0000:02:02.0: enabling Extended Tags [ 0.385110] pci 0000:02:02.0: PME# supported from D0 D3hot D3cold [ 0.385365] pci 0000:02:03.0: [1b21:2824] type 01 class 0x060400 [ 0.385456] pci 0000:02:03.0: enabling Extended Tags [ 0.385600] pci 0000:02:03.0: PME# supported from D0 D3hot D3cold [ 0.385844] pci 0000:02:04.0: [1b21:2824] type 01 class 0x060400 [ 0.385938] pci 0000:02:04.0: enabling Extended Tags [ 0.386071] pci 0000:02:04.0: PME# supported from D0 D3hot D3cold [ 0.386466] pci 0000:02:08.0: [1b21:2824] type 01 class 0x060400 [ 0.386558] pci 0000:02:08.0: enabling Extended Tags [ 0.386688] pci 0000:02:08.0: PME# supported from D0 D3hot D3cold [ 0.387396] pci 0000:02:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring [ 0.387450] pci 0000:02:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring [ 0.387497] pci 0000:02:03.0: bridge configuration invalid ([bus 00-00]), reconfiguring [ 0.387544] pci 0000:02:04.0: bridge configuration invalid ([bus 00-00]), reconfiguring [ 0.387590] pci 0000:02:08.0: bridge configuration invalid ([bus 00-00]), reconfiguring [ 0.388331] pci_bus 0000:03: busn_res: [bus 03-ff] end is updated to 03 [ 0.388545] pci 0000:04:00.0: [1b21:1142] type 00 class 0x0c0330 [ 0.388613] pci 0000:04:00.0: reg 0x10: [mem 0x00000000-0x00007fff 64bit] [ 0.388827] pci 0000:04:00.0: PME# supported from D3cold [ 0.399725] pci_bus 0000:04: busn_res: [bus 04-ff] end is updated to 04 [ 0.400410] pci_bus 0000:05: busn_res: [bus 05-ff] end is updated to 05 [ 0.400618] pci 0000:06:00.0: [15b7:5002] type 00 class 0x010802 [ 0.400678] pci 0000:06:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit] [ 
0.400745] pci 0000:06:00.0: reg 0x20: [mem 0x00000000-0x000000ff 64bit] [ 0.411770] pci_bus 0000:06: busn_res: [bus 06-ff] end is updated to 06 [ 0.411995] pci 0000:07:00.0: [1002:67ff] type 00 class 0x030000 [ 0.412058] pci 0000:07:00.0: reg 0x10: [mem 0x00000000-0x0fffffff 64bit pref] [ 0.412118] pci 0000:07:00.0: reg 0x18: [mem 0x00000000-0x001fffff 64bit pref] [ 0.412166] pci 0000:07:00.0: reg 0x20: initial BAR value 0x00000000 invalid [ 0.412191] pci 0000:07:00.0: reg 0x20: [io size 0x0100] [ 0.412224] pci 0000:07:00.0: reg 0x24: [mem 0x00000000-0x0003ffff] [ 0.412264] pci 0000:07:00.0: reg 0x30: [mem 0x00000000-0x0001ffff pref] [ 0.412302] pci 0000:07:00.0: enabling Extended Tags [ 0.412529] pci 0000:07:00.0: supports D1 D2 [ 0.412549] pci 0000:07:00.0: PME# supported from D1 D2 D3hot D3cold [ 0.412756] pci 0000:07:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none [ 0.413012] pci 0000:07:00.1: [1002:aae0] type 00 class 0x040300 [ 0.413076] pci 0000:07:00.1: reg 0x10: [mem 0x00000000-0x00003fff 64bit] [ 0.413175] pci 0000:07:00.1: enabling Extended Tags [ 0.413328] pci 0000:07:00.1: supports D1 D2 [ 0.423746] pci_bus 0000:07: busn_res: [bus 07-ff] end is updated to 07 [ 0.423791] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 07 [ 0.423828] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 07 [ 0.423905] pci 0000:00:00.0: BAR 9: assigned [mem 0x2000000000-0x2017ffffff 64bit pref] [ 0.423941] pci 0000:00:00.0: BAR 0: assigned [mem 0x60100000-0x601fffff] [ 0.423970] pci 0000:00:00.0: BAR 8: assigned [mem 0x60200000-0x604fffff] [ 0.423997] pci 0000:00:00.0: BAR 6: assigned [mem 0x60090000-0x6009ffff pref] [ 0.424035] pci 0000:00:00.0: BAR 7: assigned [io 0x0000-0x0fff] [ 0.424064] pci 0000:01:00.0: BAR 9: assigned [mem 0x2000000000-0x2017ffffff 64bit pref] [ 0.424097] pci 0000:01:00.0: BAR 8: assigned [mem 0x60200000-0x604fffff] [ 0.424123] pci 0000:01:00.0: BAR 7: assigned [io 0x0000-0x0fff] [ 0.424151] pci 0000:02:08.0: BAR 9: 
assigned [mem 0x2000000000-0x2017ffffff 64bit pref] [ 0.424187] pci 0000:02:02.0: BAR 8: assigned [mem 0x60200000-0x602fffff] [ 0.424216] pci 0000:02:04.0: BAR 8: assigned [mem 0x60300000-0x603fffff] [ 0.424244] pci 0000:02:08.0: BAR 8: assigned [mem 0x60400000-0x604fffff] [ 0.424271] pci 0000:02:08.0: BAR 7: assigned [io 0x0000-0x0fff] [ 0.424298] pci 0000:02:00.0: PCI bridge to [bus 03] [ 0.424338] pci 0000:04:00.0: BAR 0: assigned [mem 0x60200000-0x60207fff 64bit] [ 0.424393] pci 0000:02:02.0: PCI bridge to [bus 04] [ 0.424419] pci 0000:02:02.0: bridge window [mem 0x60200000-0x602fffff] [ 0.424454] pci 0000:02:03.0: PCI bridge to [bus 05] [ 0.424491] pci 0000:06:00.0: BAR 0: assigned [mem 0x60300000-0x60303fff 64bit] [ 0.424544] pci 0000:06:00.0: BAR 4: assigned [mem 0x60304000-0x603040ff 64bit] [ 0.424595] pci 0000:02:04.0: PCI bridge to [bus 06] [ 0.424621] pci 0000:02:04.0: bridge window [mem 0x60300000-0x603fffff] [ 0.424659] pci 0000:07:00.0: BAR 0: assigned [mem 0x2000000000-0x200fffffff 64bit pref] [ 0.424709] pci 0000:07:00.0: BAR 2: assigned [mem 0x2010000000-0x20101fffff 64bit pref] [ 0.424758] pci 0000:07:00.0: BAR 5: assigned [mem 0x60400000-0x6043ffff] [ 0.424789] pci 0000:07:00.0: BAR 6: assigned [mem 0x60440000-0x6045ffff pref] [ 0.424827] pci 0000:07:00.1: BAR 0: assigned [mem 0x60460000-0x60463fff 64bit] [ 0.424883] pci 0000:07:00.0: BAR 4: assigned [io 0x0000-0x00ff] [ 0.424912] pci 0000:02:08.0: PCI bridge to [bus 07] [ 0.424935] pci 0000:02:08.0: bridge window [io 0x0000-0x0fff] [ 0.424964] pci 0000:02:08.0: bridge window [mem 0x60400000-0x604fffff] [ 0.424991] pci 0000:02:08.0: bridge window [mem 0x2000000000-0x2017ffffff 64bit pref] [ 0.425028] pci 0000:01:00.0: PCI bridge to [bus 02-07] [ 0.425055] pci 0000:01:00.0: bridge window [io 0x0000-0x0fff] [ 0.425083] pci 0000:01:00.0: bridge window [mem 0x60200000-0x604fffff] [ 0.425110] pci 0000:01:00.0: bridge window [mem 0x2000000000-0x2017ffffff 64bit pref] [ 0.425147] pci 0000:00:00.0: PCI 
bridge to [bus 01-07] [ 0.425172] pci 0000:00:00.0: bridge window [io 0x0000-0x0fff] [ 0.425196] pci 0000:00:00.0: bridge window [mem 0x60200000-0x604fffff] [ 0.425221] pci 0000:00:00.0: bridge window [mem 0x2000000000-0x2017ffffff 64bit pref] [ 0.425568] pcieport 0000:00:00.0: PME: Signaling with IRQ 46 [ 0.425796] pcieport 0000:01:00.0: enabling device (0000 -> 0003) [ 0.426330] pcieport 0000:02:02.0: enabling device (0000 -> 0002) [ 0.426954] pcieport 0000:02:04.0: enabling device (0000 -> 0002) [ 0.427352] pcieport 0000:02:08.0: enabling device (0000 -> 0003) [ 0.427658] pci 0000:04:00.0: enabling device (0000 -> 0002) [ 0.427854] pci 0000:07:00.1: D0 power state depends on 0000:07:00.0 [ 0.428443] L2CACHE: DataError @ 0x00000001.223167F0 [ 0.428567] L2CACHE: No. of Banks in the cache: 4 [ 0.428592] L2CACHE: No. of ways per bank: 16 [ 0.428613] L2CACHE: Sets per bank: 512 [ 0.428634] L2CACHE: Bytes per cache block: 64 [ 0.428657] L2CACHE: Index of the largest way enabled: 15 [ 0.430606] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled [ 0.432110] 10010000.serial: ttySIF0 at MMIO 0x10010000 (irq = 1, base_baud = 115200) is a SiFive UART v0 [ 1.939139] printk: console [ttySIF0] enabled [ 1.943830] 10011000.serial: ttySIF1 at MMIO 0x10011000 (irq = 2, base_baud = 115200) is a SiFive UART v0 [ 1.954904] sifive_spi 10040000.spi: mapped; irq=4, cs=1 [ 1.960006] sifive_spi 10050000.spi: mapped; irq=6, cs=1 [ 1.965724] libphy: Fixed MDIO Bus: probed [ 1.995049] mmc_spi spi0.0: SD/MMC host mmc0, no DMA, no WP, no poweroff, cd polling [ 2.003035] NET: Registered PF_INET6 protocol family [ 2.008562] Segment Routing with IPv6 [ 2.011600] Key type dns_resolver registered [ 2.015833] registered taskstats version 1 [ 2.019825] Loading compiled-in X.509 certificates [ 2.028221] Loaded X.509 cert 'Build time autogenerated kernel key: 1b87df832ec5233f727ed6a58e6ed94dcde52301' [ 2.037715] Key type ._fscrypt registered [ 2.041412] Key type .fscrypt registered [ 2.045328] 
Key type fscrypt-provisioning registered [ 2.057422] mmc0: host does not support reading read-only switch, assuming write-enable [ 2.064739] mmc0: new SDHC card on SPI [ 2.070787] Freeing unused kernel image (initmem) memory: 2144K [ 2.078031] mmcblk0: mmc0:0000 SA32G 28.8 GiB [ 2.082148] Run /init as init process [ 2.118265] mmcblk0: p1 p2 [ 2.129609] Alpine Init 3.5.0-r1 Alpine I[ 2.133149] Loading boot drivers... nit 3.5.0-r1 * Loading boot drivers: [ 2.251083] usbcore: registered new interface driver usbfs [ 2.255927] usbcore: registered new interface driver hub [ 2.261195] usbcore: registered new device driver usb [ 2.276193] usbcore: registered new interface driver usb-storage [ 2.287964] loop: module loaded [ 2.291171] Loading boot drivers: ok. ok. [ 2.297128] Mounting root... * Mounting root: [ 2.384303] nvme nvme0: pci function 0000:06:00.0 [ 2.388332] nvme 0000:06:00.0: enabling device (0000 -> 0002) [ 2.408210] nvme nvme0: 4/0/0 default/read/poll queues [ 2.423985] nvme0n1: p1 p2 p3 [ 2.477095] xhci_hcd 0000:04:00.0: xHCI Host Controller [ 2.481625] xhci_hcd 0000:04:00.0: new USB bus registered, assigned bus number 1 [ 2.629724] xhci_hcd 0000:04:00.0: hcc params 0x0200e080 hci version 0x100 quirks 0x0000000010800410 [ 2.638622] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.14 [ 2.646374] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [ 2.653572] usb usb1: Product: xHCI Host Controller [ 2.658429] usb usb1: Manufacturer: Linux 5.14.8-1-edge xhci-hcd [ 2.664423] usb usb1: SerialNumber: 0000:04:00.0 [ 2.669516] hub 1-0:1.0: USB hub found [ 2.672803] hub 1-0:1.0: 2 ports detected [ 2.677204] xhci_hcd 0000:04:00.0: xHCI Host Controller [ 2.681980] xhci_hcd 0000:04:00.0: new USB bus registered, assigned bus number 2 [ 2.689369] xhci_hcd 0000:04:00.0: Host supports USB 3.0 SuperSpeed [ 2.696393] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM. 
[ 2.703933] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.14 [ 2.711988] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [ 2.719191] usb usb2: Product: xHCI Host Controller [ 2.724047] usb usb2: Manufacturer: Linux 5.14.8-1-edge xhci-hcd [ 2.730042] usb usb2: SerialNumber: 0000:04:00.0 [ 2.735027] hub 2-0:1.0: USB hub found [ 2.738413] hub 2-0:1.0: 2 ports detected [ 3.003182] usb 1-2: new high-speed USB device number 2 using xhci_hcd [ 3.457292] usb 1-2: New USB device found, idVendor=174c, idProduct=2074, bcdDevice= 0.01 [ 3.464762] usb 1-2: New USB device strings: Mfr=2, Product=3, SerialNumber=1 [ 3.471852] usb 1-2: Product: AS2107 [ 3.475411] usb 1-2: Manufacturer: ASMedia [ 3.479493] usb 1-2: SerialNumber: USB2.0 Hub [ 3.553234] hub 1-2:1.0: USB hub found [ 3.556487] hub 1-2:1.0: 4 ports detected [ 3.583481] random: fast init done [ 3.604962] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd [ 3.634834] usb 2-2: New USB device found, idVendor=174c, idProduct=3074, bcdDevice= 0.01 [ 3.642300] usb 2-2: New USB device strings: Mfr=2, Product=3, SerialNumber=1 [ 3.649511] usb 2-2: Product: AS2107 [ 3.652985] usb 2-2: Manufacturer: ASMedia [ 3.657283] usb 2-2: SerialNumber: USB2.0 Hub [ 3.749202] hub 2-2:1.0: USB hub found [ 3.752879] hub 2-2:1.0: 4 ports detected [ 4.258588] EXT4-fs (nvme0n1p3): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none. [ 4.268455] Mounting root: ok. ok. OpenRC 0.44.5.e64faccf77 is starting up Linux 5.14.8-1-edge (riscv64) * /proc is already mounted * Mounting /run ... * /run/openrc: creating directory * /run/lock: creating directory * /run/lock: correcting owner * Caching service dependencies ... [ ok ] * Caching service dependencies ... [ ok ] * Clock skew detected with `(null)' * Adjusting mtime of `/run/openrc/deptree' to Thu Sep 23 08:31:44 2021 * WARNING: clock skew detected! * Remounting devtmpfs on /dev ... [ ok ] * Mounting /dev/mqueue ... 
[ ok ] * Mounting security filesystem ... [ ok ] * Mounting efivarfs filesystem ... [ ok ] * Starting busybox mdev ... [ ok ] * Loading hardware drivers ...[ 36.569387] Oops [#1] [ 36.570910] Modules linked in: amdgpu(+) mfd_core gpu_sched i2c_algo_bit drm_ttm_helper ttm drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm drm_panel_orientation_quirks ofpart spi_nor mtd mscc macb phylink efivarfs xhci_pci xhci_hcd nvme nvme_core hwmon loop usb_storage usbcore usb_common [ 36.600892] CPU: 3 PID: 1037 Comm: modprobe Not tainted 5.14.8-1-edge #2-Alpine [ 36.608158] Hardware name: SiFive HiFive Unmatched A00 (DT) [ 36.613713] epc : __memcpy+0x3c/0xf8 [ 36.617271] ra : amdgpu_uvd_suspend+0x15a/0x186 [amdgpu] [ 36.634834] epc : ffffffff8031fd7c ra : ffffffff013017b0 sp : ffffffd0042e3870 [ 36.642035] gp : ffffffff80f3b2c0 tp : ffffffe082ad0000 t0 : ffffffe086600000 [ 36.649244] t1 : ffffffff80f8ef80 t2 : 000000000000b5e6 s0 : ffffffd0042e3900 [ 36.656453] s1 : ffffffe0826a0000 a0 : ffffffe086200000 a1 : ffffffd115354000 [ 36.663663] a2 : 00000000002c2000 a3 : ffffffd115616000 a4 : 00000000002c2000 [ 36.670872] a5 : 0000000000000000 a6 : 0000000200000022 a7 : ffffffff80e0a370 [ 36.678081] s2 : ffffffe0826af000 s3 : ffffffe0826b0000 s4 : 000000000000f000 [ 36.685290] s5 : 0000000000000910 s6 : 00000000ffffffff s7 : 0000000000001000 [ 36.692499] s8 : ffffffe0826af000 s9 : 0000000000000000 s10: 00000000002c2000 [ 36.699709] s11: ffffffd115354000 t3 : 0000000000000030 t4 : 0000000000000001 [ 36.706918] t5 : 0000000002d7b1ef t6 : ffffffe086200000 [ 36.712211] status: 0000000200000120 badaddr: ffffffd115354000 cause: 000000000000000d [ 36.720122] [<ffffffff8031fd7c>] __memcpy+0x3c/0xf8 [ 36.724983] [<ffffffff013049b8>] uvd_v6_0_sw_fini+0x1c/0x8c [amdgpu] [ 36.738397] [<ffffffff01246bca>] amdgpu_device_fini_sw+0xc8/0x2c0 [amdgpu] [ 36.751320] [<ffffffff0124bece>] amdgpu_driver_release_kms+0x16/0x28 [amdgpu] [ 36.764128] 
[<ffffffff0112bcbe>] devm_drm_dev_init_release+0x40/0x6c [drm] [ 36.771151] [<ffffffff803a759e>] devm_action_release+0xe/0x16 [ 36.776879] [<ffffffff803a8714>] devres_release_all+0x88/0xce [ 36.782611] [<ffffffff803a41ca>] really_probe.part.0+0xc2/0x224 [ 36.788517] [<ffffffff803a439c>] __driver_probe_device+0x70/0xde [ 36.794511] [<ffffffff803a4438>] driver_probe_device+0x2e/0xf6 [ 36.800330] [<ffffffff803a4a50>] __driver_attach+0x88/0x142 [ 36.805889] [<ffffffff803a2690>] bus_for_each_dev+0x52/0x8c [ 36.811448] [<ffffffff803a3bbe>] driver_attach+0x1a/0x22 [ 36.816746] [<ffffffff803a36de>] bus_add_driver+0xca/0x17c [ 36.822219] [<ffffffff803a507a>] driver_register+0x48/0xd8 [ 36.827691] [<ffffffff80337402>] __pci_register_driver+0x40/0x48 [ 36.833685] [<ffffffff0174a082>] amdgpu_init+0x82/0x1000 [amdgpu] [ 36.846415] [<ffffffff800020ea>] do_one_initcall+0x3e/0x168 [ 36.851963] [<ffffffff80077f06>] do_init_module+0x46/0x208 [ 36.857435] [<ffffffff80079c64>] load_module+0x1af4/0x2050 [ 36.862907] [<ffffffff8007a37a>] __do_sys_finit_module+0x8a/0xb6 [ 36.868901] [<ffffffff8007a3ca>] sys_finit_module+0x10/0x18 [ 36.874459] [<ffffffff8000304e>] ret_from_syscall+0x0/0x2 [ 36.879935] ---[ end trace a2b10b06f90ff5d2 ]---
- Author
Got my dev environment set up from torvalds/linux at 02d5e016800d082058b3d3b7c3ede136cdc6ddcb (5.15-rc3-135), and I can still reproduce the oops there. Next I'll try 5.14.0 to see whether the problem persists, and if it does, I'll get down to debugging.
- Author
Still setting up my debugging environment, but I did get something more useful:
[ 86.608872] amdgpu 0000:07:00.0: enabling device (0000 -> 0003)
[ 86.614121] [drm] initializing kernel modesetting (POLARIS11 0x1002:0x67FF 0x1002:0x0B04 0xFF).
[ 86.622791] amdgpu 0000:07:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 86.630906] [drm] register mmio base: 0x60400000
[ 86.635462] [drm] register mmio size: 262144
[ 86.639719] [drm] PCIE atomic ops is not supported
[ 86.644510] [drm] add ip block number 0 <vi_common>
[ 86.649357] [drm] add ip block number 1 <gmc_v8_0>
[ 86.654133] [drm] add ip block number 2 <tonga_ih>
[ 86.658909] [drm] add ip block number 3 <gfx_v8_0>
[ 86.663686] [drm] add ip block number 4 <sdma_v3_0>
[ 86.668554] [drm] add ip block number 5 <powerplay>
[ 86.673416] [drm] add ip block number 6 <dm>
[ 86.677670] [drm] add ip block number 7 <uvd_v6_0>
[ 86.682448] [drm] add ip block number 8 <vce_v3_0>
[ 86.905716] amdgpu 0000:07:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 86.911252] amdgpu: ATOM BIOS: xxx-xxx-xxx
[ 86.915457] [drm] UVD is enabled in VM mode
[ 86.919490] [drm] UVD ENC is enabled in VM mode
[ 86.924006] [drm] VCE enabled in VM mode
[ 86.927927] [drm] GPU posting now...
[ 87.049835] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[ 87.060652] amdgpu 0000:07:00.0: BAR 2: releasing [mem 0x2010000000-0x20101fffff 64bit pref]
[ 87.068378] amdgpu 0000:07:00.0: BAR 0: releasing [mem 0x2000000000-0x200fffffff 64bit pref]
[ 87.076829] pcieport 0000:02:08.0: BAR 9: releasing [mem 0x2000000000-0x2017ffffff 64bit pref]
[ 87.085388] pcieport 0000:01:00.0: BAR 9: releasing [mem 0x2000000000-0x2017ffffff 64bit pref]
[ 87.093982] pcieport 0000:00:00.0: BAR 9: releasing [mem 0x2000000000-0x2017ffffff 64bit pref]
[ 87.102621] pcieport 0000:00:00.0: BAR 9: assigned [mem 0x2000000000-0x20bfffffff 64bit pref]
[ 87.111096] pcieport 0000:01:00.0: BAR 9: assigned [mem 0x2000000000-0x20bfffffff 64bit pref]
[ 87.119607] pcieport 0000:02:08.0: BAR 9: assigned [mem 0x2000000000-0x20bfffffff 64bit pref]
[ 87.128123] amdgpu 0000:07:00.0: BAR 0: assigned [mem 0x2000000000-0x207fffffff 64bit pref]
[ 87.136472] amdgpu 0000:07:00.0: BAR 2: assigned [mem 0x2080000000-0x20801fffff 64bit pref]
[ 87.144809] pcieport 0000:00:00.0: PCI bridge to [bus 01-07]
[ 87.150433] pcieport 0000:00:00.0: bridge window [io 0x0000-0x0fff]
[ 87.156946] pcieport 0000:00:00.0: bridge window [mem 0x60200000-0x604fffff]
[ 87.164163] pcieport 0000:00:00.0: bridge window [mem 0x2000000000-0x20bfffffff 64bit pref]
[ 87.172681] pcieport 0000:01:00.0: PCI bridge to [bus 02-07]
[ 87.178315] pcieport 0000:01:00.0: bridge window [io 0x0000-0x0fff]
[ 87.184831] pcieport 0000:01:00.0: bridge window [mem 0x60200000-0x604fffff]
[ 87.192045] pcieport 0000:01:00.0: bridge window [mem 0x2000000000-0x20bfffffff 64bit pref]
[ 87.200560] pcieport 0000:02:08.0: PCI bridge to [bus 07]
[ 87.205934] pcieport 0000:02:08.0: bridge window [io 0x0000-0x0fff]
[ 87.212452] pcieport 0000:02:08.0: bridge window [mem 0x60400000-0x604fffff]
[ 87.219664] pcieport 0000:02:08.0: bridge window [mem 0x2000000000-0x20bfffffff 64bit pref]
[ 87.228195] amdgpu 0000:07:00.0: amdgpu: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
[ 87.237730] amdgpu 0000:07:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 87.246070] [drm] Detected VRAM RAM=2048M, BAR=2048M
[ 87.251010] [drm] RAM width 128bits GDDR5
[ 87.273792] [drm] amdgpu: 2048M of VRAM memory ready
[ 87.278068] [drm] amdgpu: 3072M of GTT memory ready.
[ 87.283010] [drm] GART: num cpu pages 65536, num gpu pages 65536
[ 87.289913] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 87.298961] [drm] Chained IB support enabled!
[ 87.321334] amdgpu: hwmgr_sw_init smu backed is polaris10_smu
[ 87.330502] [drm] Found UVD firmware Version: 1.130 Family ID: 16
[ 87.346101] [drm] Found VCE firmware Version: 53.26 Binary ID: 3
[ 87.618327] amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
[ 88.096268] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -110
[ 88.572912] amdgpu 0000:07:00.0: amdgpu: amdgpu_device_ip_init failed
[ 88.579315] amdgpu 0000:07:00.0: amdgpu: Fatal error during GPU init
[ 88.585667] amdgpu 0000:07:00.0: amdgpu: amdgpu: finishing device.
[ 88.608780] amdgpu: probe of 0000:07:00.0 failed with error -110
[ 88.629404] Unable to handle kernel paging request at virtual address ffffffd115454000
[ 88.636742] Oops [#1]
[ 88.638852] Modules linked in: amdgpu(+) mfd_core gpu_sched i2c_algo_bit drm_ttm_helper ttm drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm drm_panel_orientation_quirks tcp_bbr nls_utf8 nls_cp437 vfat fat af_packet tun evdev snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore mscc macb phylink efivarfs xhci_pci xhci_hcd nvme nvme_core hwmon loop usb_storage usbcore usb_common
[ 88.682424] CPU: 3 PID: 1881 Comm: modprobe Not tainted 5.15.0-rc3dev+ #4
[ 88.689144] Hardware name: SiFive HiFive Unmatched A00 (DT)
[ 88.694704] epc : __memcpy+0x3c/0xf8
[ 88.698263] ra : amdgpu_uvd_suspend+0x15a/0x186 [amdgpu]
[ 89.173482] epc : ffffffff80330e28 ra : ffffffff025a46ba sp : ffffffd004263870
[ 89.180685] gp : ffffffff80f3d220 tp : ffffffe080d6e300 t0 : ffffffe09fe00000
[ 89.187893] t1 : 00000000007fffff t2 : 0000000000000000 s0 : ffffffd004263900
[ 89.195102] s1 : ffffffe089d20000 a0 : ffffffe09fa00000 a1 : ffffffd115454000
[ 89.202312] a2 : 00000000002c2000 a3 : ffffffd115716000 a4 : 00000000002c2000
[ 89.209520] a5 : 0000000000000000 a6 : ffffffff80f2c700 a7 : 0000000000000000
[ 89.216730] s2 : ffffffe089d2f000 s3 : ffffffe089d30000 s4 : 000000000000f000
[ 89.223939] s5 : 0000000000000928 s6 : 00000000ffffffff s7 : 0000000000001000
[ 89.231148] s8 : ffffffe089d2f000 s9 : 0000000000000000 s10: 00000000002c2000
[ 89.238357] s11: ffffffd115454000 t3 : ffffffffffffffff t4 : ffffffff80f8ed20
[ 89.245567] t5 : 0000000000000000 t6 : ffffffe09fa00000
[ 89.250859] status: 0000000200000120 badaddr: ffffffd115454000 cause: 000000000000000d
[ 89.258771] [<ffffffff80330e28>] __memcpy+0x3c/0xf8
[ 89.263633] [<ffffffff025a7996>] uvd_v6_0_sw_fini+0x1c/0x8c [amdgpu]
[ 89.738324] [<ffffffff024e6dd4>] 
amdgpu_device_fini_sw+0xd2/0x2c2 [amdgpu] [ 90.213565] [<ffffffff024ebff0>] amdgpu_driver_release_kms+0x16/0x28 [amdgpu] [ 90.689044] [<ffffffff01cf9b6c>] devm_drm_dev_init_release+0x40/0x6c [drm] [ 90.712649] [<ffffffff803b9ad2>] devm_action_release+0xe/0x16 [ 90.718369] [<ffffffff803bac48>] devres_release_all+0x88/0xce [ 90.724102] [<ffffffff803b6724>] really_probe.part.0+0xc2/0x22c [ 90.730008] [<ffffffff803b68fe>] __driver_probe_device+0x70/0xde [ 90.736001] [<ffffffff803b699a>] driver_probe_device+0x2e/0xf6 [ 90.741821] [<ffffffff803b6fb6>] __driver_attach+0x88/0x142 [ 90.747379] [<ffffffff803b4be6>] bus_for_each_dev+0x52/0x8c [ 90.752938] [<ffffffff803b6114>] driver_attach+0x1a/0x22 [ 90.758236] [<ffffffff803b5c34>] bus_add_driver+0xca/0x17c [ 90.763708] [<ffffffff803b75e0>] driver_register+0x48/0xd8 [ 90.769180] [<ffffffff803486f4>] __pci_register_driver+0x40/0x48 [ 90.775177] [<ffffffff041ca082>] amdgpu_init+0x82/0x1000 [amdgpu] [ 91.248795] [<ffffffff800020ea>] do_one_initcall+0x3e/0x168 [ 91.254342] [<ffffffff8007eae4>] do_init_module+0x46/0x220 [ 91.259813] [<ffffffff800808d8>] load_module+0x1b72/0x20ce [ 91.265284] [<ffffffff80081002>] __do_sys_finit_module+0x8a/0xb6 [ 91.271278] [<ffffffff8008103e>] sys_finit_module+0x10/0x18 [ 91.276837] [<ffffffff80003052>] ret_from_syscall+0x0/0x2 [ 91.282381] ---[ end trace 78337b2d79eb1a8e ]---
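For anyone reading the register dump in the trace above, here is a rough Python sketch (not part of any kernel tooling) that pulls a few values out of the 5.15.0-rc3 oops and cross-checks them. The `SCAUSE` table is a small subset of the exception codes in the RISC-V privileged spec; `parse_regs` and `DUMP` are names made up for this illustration. Reading `a0`/`a1`/`a2` as the destination, source, and length of the copy is an assumption based on the standard calling convention for `memcpy`.

```python
import re

# Three register lines copied from the second oops in this report.
DUMP = """
[ 89.195102] s1 : ffffffe089d20000 a0 : ffffffe09fa00000 a1 : ffffffd115454000
[ 89.202312] a2 : 00000000002c2000 a3 : ffffffd115716000 a4 : 00000000002c2000
[ 89.250859] status: 0000000200000120 badaddr: ffffffd115454000 cause: 000000000000000d
"""

# Subset of synchronous exception codes from the RISC-V privileged spec.
SCAUSE = {
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}


def parse_regs(text):
    """Return {name: int} for every 'name: <16 hex digits>' pair in the dump."""
    pairs = re.findall(r"(\w+)\s*:\s*([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


regs = parse_regs(DUMP)
print("cause   :", regs["cause"], "->", SCAUSE[regs["cause"]])
print("badaddr == a1 (assumed memcpy source):", regs["badaddr"] == regs["a1"])
print("a1 + a2 == a3:", regs["a1"] + regs["a2"] == regs["a3"])
print("copy length: %#x bytes" % regs["a2"])
```

If that reading is right, the fault is a load from the very first byte of the source buffer handed to `__memcpy` by `amdgpu_uvd_suspend`, during teardown after the failed probe.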
- Owner
This is the relevant error:
[ 87.618327] amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
[ 88.096268] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -110
Is the PCIe bus on this platform cache coherent with the CPU? What the test does is write commands to a ring buffer in system memory and trigger the GPU to start consuming them. The GPU consumes the commands, which tell it to write a specific value to a register. The driver then checks the register to make sure the GPU processed the commands and updated the register. If the PCIe bus is not cache coherent, the snoop of the CPU cache from the GPU might not work and the GPU will fetch garbage if the data has not hit memory yet. I'm not too familiar with RISC-V hardware, but I know on some ARM platforms, the PCIe bus is not cache coherent if the platform vendor has not included the necessary IPs in their ARM design. RISC-V might be similar.
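To make the failure mode concrete, here is a toy Python model of that handshake. It is only a sketch under loud assumptions: `ToyGpu`, `ring_test`, the `"WRITE_SCRATCH"` command, and the magic values are all invented for illustration and are not the amdgpu code; the CPU cache and DRAM are modeled as plain dictionaries. The only value taken from the log is the -110 timeout.

```python
ETIMEDOUT = 110            # matches the -110 reported in the log
MAGIC = 0xDEADBEEF         # arbitrary value the GPU is told to write (assumption)
SCRATCH_INIT = 0xCAFEDEAD  # arbitrary initial register value (assumption)


class ToyGpu:
    """Hypothetical device that fetches ring commands over a modeled bus."""

    def __init__(self, dram, cpu_cache, coherent):
        self.dram = dram
        self.cpu_cache = cpu_cache
        self.coherent = coherent
        self.scratch = SCRATCH_INIT

    def fetch(self, addr):
        # A coherent bus snoops the CPU cache; a non-coherent one sees only DRAM.
        if self.coherent and addr in self.cpu_cache:
            return self.cpu_cache[addr]
        return self.dram.get(addr, 0)

    def doorbell(self, ring_addr):
        # Consume one two-word command from the ring.
        op = self.fetch(ring_addr)
        value = self.fetch(ring_addr + 1)
        if op == "WRITE_SCRATCH":
            self.scratch = value


def ring_test(coherent, flush_before_doorbell, max_polls=100):
    """Return 0 if the scratch register is updated, -ETIMEDOUT otherwise."""
    dram, cpu_cache = {}, {}
    gpu = ToyGpu(dram, cpu_cache, coherent)

    # The CPU writes the command; the stores land in its cache first.
    cpu_cache[0] = "WRITE_SCRATCH"
    cpu_cache[1] = MAGIC
    if flush_before_doorbell:
        dram.update(cpu_cache)  # write the dirty lines back to memory

    gpu.doorbell(0)

    # Poll the register until it changes or the polling budget runs out.
    for _ in range(max_polls):
        if gpu.scratch == MAGIC:
            return 0
    return -ETIMEDOUT


print(ring_test(coherent=True, flush_before_doorbell=False))
print(ring_test(coherent=False, flush_before_doorbell=False))
print(ring_test(coherent=False, flush_before_doorbell=True))
```

In this model only the non-coherent, unflushed case times out, which is the same shape as the `ring gfx test failed (-110)` message. It says nothing about which case the FU740 is actually in.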
- Author
Thanks for the tips. Lots of ideas I'm unfamiliar with there, so I'll get back to you after some research.
The manual for this platform is here, and may offer some insights:
https://sifive.cdn.prismic.io/sifive/de1491e5-077c-461d-9605-e8a0ce57337d_fu740-c000-manual-v1p3.pdf
It claims to be an "IO coherent/one-way coherent master into the L2 cache of the system".
If this turns out to be the case, is there some kind of remediation path that makes sense to investigate?
Drew pointed this out on IRC. Just from a quick look at gfx_v8_0_ring_test_ring() (which IIUC is the test that's blowing up here), this is the sort of thing that could be a result of us not enforcing the various PCIe ordering constraints (things like posted writes). I don't know nearly enough about amdgpu or PCIe to really put a stake in the ground here, but this pattern of "write to a PCIe device to start a command, then start a timer to see that the command finishes on time" can get messed up if we're silently ignoring things like posted writes (or non-posted writes? I always forget which is which).
It's entirely possible this is just some simple screw-up in arch/riscv or whatever, though -- I bet there's a lot of those ;).
- Developer
This issue hasn't had any activity since 2021-10-01. The AMD driver stack changes rapidly and contains lots of shared code across products so it's possible that it has already been fixed. Please upgrade to a current stable kernel and userspace stack and try again. If you still experience this issue with the latest driver stack, please capture relevant logging and open a new issue referring back to this one.
- Mario Limonciello closed