Xe driver OOPS on aarch64 with Intel Arc 750
I understand that making the driver work on non-x86 architectures is probably not a goal at the moment, but I still decided to try compiling it. I also understand that running this card with the Xe driver is not officially supported either, but I hope this will still help make these cards usable on a wider range of systems in the future.
Currently, if you try to run an Intel Arc 750 on an aarch64 system (Ampere Altra in my case), you get the following OOPS (kernel 6.9-rc6):
[ 57.555741] xe 0004:04:00.0: [drm] Using GuC firmware from i915/dg2_guc_70.bin version 70.20.0
[ 57.581832] xe 0004:04:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[ 57.581848] Unable to handle kernel paging request at virtual address ffffffffc08003cc
[ 57.589768] Mem abort info:
[ 57.592561] ESR = 0x0000000096000006
[ 57.596305] EC = 0x25: DABT (current EL), IL = 32 bits
[ 57.601615] SET = 0, FnV = 0
[ 57.604663] EA = 0, S1PTW = 0
[ 57.607794] FSC = 0x06: level 2 translation fault
[ 57.612668] Data abort info:
[ 57.615538] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
[ 57.621026] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 57.626075] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 57.631384] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000801c6b7d000
[ 57.638083] [ffffffffc08003cc] pgd=1800081fffcc6003, p4d=00000801c760a003, pud=00000801c760b003, pmd=0000000000000000
[ 57.648696] Internal error: Oops: 0000000096000006 [#1] SMP
[ 57.654259] Modules linked in: xe(+) snd_seq_dummy snd_hrtimer snd_seq snd_seq_device qrtr snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core aes_ce_blk aes_ce_cipher polyval_ce snd_hwdep polyval_generic snd_pcm ghash_ce gf128mul snd_timer acpi_ipmi snd sha2_ce soundcore sha256_arm64 ipmi_ssif sbsa_gwdt sha1_ce arm_spe_pmu ipmi_devintf binfmt_misc arm_cmn ipmi_msghandler xgene_hwmon nls_ascii nls_cp437 vfat fat arm_dsu_pmu cppc_cpufreq joydev acpi_tad evdev dm_mod loop efi_pstore dax configfs nfnetlink efivarfs ip_tables x_tables autofs4 ext4 crc32c_generic crc16 cdc_ether mbcache usbnet jbd2 mii hid_generic usbhid hid drm_gpuvm drm_exec drm_buddy gpu_sched nvme video drm_suballoc_helper drm_ttm_helper ttm cec nvme_core rc_core ixgbe t10_pi drm_display_helper xhci_pci xfrm_algo mdio_devres xhci_hcd drm_kms_helper of_mdio crc64_rocksoft fixed_phy crc64 fwnode_mdio crc_t10dif drm usbcore igb crct10dif_generic libphy crct10dif_ce crct10dif_common mdio usb_common i2c_algo_bit i2c_designware_platform
[ 57.654393] i2c_designware_core [last unloaded: xe]
[ 57.748068] CPU: 0 PID: 7 Comm: kworker/0:0 Tainted: G U 6.9.0-rc6+ #4
[ 57.755975] Hardware name: ALTRAD8UD-1L2T/ALTRAD8UD-1L2T, BIOS 2.05 04/12/2024
[ 57.763272] Workqueue: events work_for_cpu_fn
[ 57.767623] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 57.774575] pc : logic_inb+0xa0/0xe0
[ 57.778145] lr : intel_vga_reset_io_mem+0x38/0x68 [xe]
[ 57.783547] sp : ffff8000802dbaa0
[ 57.786850] x29: ffff8000802dbaa0 x28: ffffa9bdd90e9928 x27: 0000000000000000
[ 57.793978] x26: 0000000000000001 x25: 0000000000000002 x24: ffff07ff82cfb0c8
[ 57.801105] x23: ffffa9bdd90f6c38 x22: ffff07ffa8901000 x21: 0000000000000000
[ 57.808232] x20: 0000000000000000 x19: ffff07ff82cfb000 x18: ffffffffffffffff
[ 57.815358] x17: 2c6d656d2b6f693d x16: ffffa9be0b0be3c8 x15: 6c6f203a6465676e
[ 57.822484] x14: 616863207365646f x13: 656e6f6e3d736e77 x12: ffff081eef6e0000
[ 57.829610] x11: 0000000000000001 x10: 0000000000000001 x9 : ffffa9be0aa2702c
[ 57.836737] x8 : c0000000ffffbfff x7 : ffffa9be0ca1c120 x6 : 00000000000000ff
[ 57.843863] x5 : ffffa9be0b659f10 x4 : 000000000000000a x3 : 0000000000000000
[ 57.850989] x2 : 0000000000ffbffe x1 : 00000000000003cc x0 : ffffffffc08003cc
[ 57.858116] Call trace:
[ 57.860552] logic_inb+0xa0/0xe0
[ 57.863772] hsw_power_well_enable+0x198/0x288 [xe]
[ 57.868900] intel_power_well_enable+0x74/0x98 [xe]
[ 57.874019] intel_power_well_get+0x2c/0x40 [xe]
[ 57.878873] __intel_display_power_get_domain.part.0+0x7c/0xd0 [xe]
[ 57.885375] intel_display_power_get+0x5c/0x98 [xe]
[ 57.890487] intel_power_domains_init_hw+0x64/0x320 [xe]
[ 57.896031] intel_display_driver_probe_noirq+0xa0/0x1f8 [xe]
[ 57.902008] xe_display_init_noirq+0x58/0x90 [xe]
[ 57.906945] xe_device_probe+0x248/0x4e8 [xe]
[ 57.911538] xe_pci_probe+0x5d8/0x918 [xe]
[ 57.915866] local_pci_probe+0x48/0xb8
[ 57.919610] work_for_cpu_fn+0x24/0x40
[ 57.923349] process_one_work+0x18c/0x400
[ 57.927350] worker_thread+0x204/0x420
[ 57.931090] kthread+0xe8/0xf8
[ 57.934135] ret_from_fork+0x10/0x20
[ 57.937705] Code: d65f03c0 929fffe0 f2b81000 8b000020 (39400000)
[ 57.943787] ---[ end trace 0000000000000000 ]---
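As far as I understand, the root cause is that the driver eventually tries to initialise VGA. I'm not sure this should be done in the first place, but on the display power well enable path (hsw_power_well_enable in the call trace) it calls intel_vga_reset_io_mem() from drm/i915/display/intel_vga.c. The oops matches that: lr points into intel_vga_reset_io_mem, the faulting access goes through logic_inb, and the port number in x1 is 0x3cc, i.e. VGA_MIS_R, which that function reads with inb().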
On ARM (except for one specific, outdated platform), VGA_CONSOLE cannot even be enabled, so all the work done here to keep it working is futile.
The following patch makes sure that, when VGA_CONSOLE is not set, the driver does not touch those registers at all:
diff --git a/drivers/gpu/drm/i915/display/intel_vga.c b/drivers/gpu/drm/i915/display/intel_vga.c
index 4b98833bf..80f743f4c 100644
--- a/drivers/gpu/drm/i915/display/intel_vga.c
+++ b/drivers/gpu/drm/i915/display/intel_vga.c
@@ -80,6 +80,7 @@ void intel_vga_redisable(struct drm_i915_private *i915)
void intel_vga_reset_io_mem(struct drm_i915_private *i915)
{
+#if defined(CONFIG_VGA_CONSOLE)
struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
/*
@@ -95,6 +96,7 @@ void intel_vga_reset_io_mem(struct drm_i915_private *i915)
vga_get_uninterruptible(pdev, VGA_RSRC_LEGACY_IO);
outb(inb(VGA_MIS_R), VGA_MIS_W);
vga_put(pdev, VGA_RSRC_LEGACY_IO);
+#endif
}
int intel_vga_register(struct drm_i915_private *i915)
--
2.43.0
With the patch applied, modesetting starts working on aarch64, and I was even able to get a picture out of the card (2D only, but that is a story for another bug).
Unfortunately, I can't fork the repository and send a PR, but I think this is a simple fix that would make it possible to at least try the card out.