Disabling the IOMMU is probably the best option. Disabling S/G display support means all display buffers must be in carve out memory, which may be really limited on some platforms.
You have to disable the IOMMU in the BIOS or via the kernel command line (iommu=pt or iommu=off). As for scatter/gather display, that allows the GPU to use buffers in system memory for display rather than just carve out (vram). On systems with limited carve out, relying on carve out alone limits the size and number of displays which can be supported. There is no direct way to disable S/G display support. It only works with DC, so disabling DC (amdgpu.dc=0) will disable S/G display support.
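To confirm which of these options actually reached the running kernel, one approach is to look for the exact option in the contents of /proc/cmdline. The following is a minimal sketch of that check in C; the helper name `cmdline_has_token` is hypothetical and not part of any kernel or driver API, and it only matches whole whitespace-delimited options.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if `token` appears as a whole,
 * whitespace-delimited option in `cmdline` (e.g. the text read from
 * /proc/cmdline). A whole-word match avoids treating "amd_iommu=off"
 * as if it were "iommu=off".
 */
static bool cmdline_has_token(const char *cmdline, const char *token)
{
    size_t len = strlen(token);
    const char *p = cmdline;

    if (len == 0)
        return false;

    while (*p) {
        const char *start;

        /* Skip the separators between options. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;

        /* Find the end of the current option. */
        start = p;
        while (*p && *p != ' ' && *p != '\t' && *p != '\n')
            p++;

        if ((size_t)(p - start) == len && strncmp(start, token, len) == 0)
            return true;
    }
    return false;
}
```

For example, after reading /proc/cmdline into a buffer, `cmdline_has_token(buf, "iommu=pt")` or `cmdline_has_token(buf, "amdgpu.dc=0")` reports whether that option was passed at boot.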
With the bigger vram size, the display buffers end up in vram rather than system ram so it doesn't matter if S/G display is enabled or not. We only use S/G display when there is not enough vram to support the requested display surfaces. Windows doesn't use the IOMMU, that's why it works regardless of the vram size.
If IVMD_FLAG_EXCL_RANGE is not set, set_device_exclusion_range() is never called.
With the AMD IOMMU, device accesses that target addresses in the exclusion range (set_device_exclusion_range()) are neither translated nor access checked if the EX bit in the Device Table is set for the device, which means the addresses in the exclusion range are identity-mapped. The IVRS (I/O Virtualization Reporting Structure) ACPI table should also define the corresponding IVMD entry with the exclusion range. So set_device_exclusion_range() is not used to exclude a PCI device from the IOMMU. When the IOMMU is enabled, all target addresses are translated by the IOMMU except those in the exclusion range configured by set_device_exclusion_range().
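The behaviour described here can be sketched as a small standalone model in C. This is not the kernel's code: every name prefixed with `model_` or `MODEL_` is hypothetical, and the flag value is only illustrative. It shows the two conditions from the discussion: the exclusion range is only set up when the IVMD entry carries the exclusion-range flag, and an access bypasses translation only when the device's EX bit is set and the address falls inside that range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for IVMD_FLAG_EXCL_RANGE; the value is an assumption. */
#define MODEL_IVMD_FLAG_EXCL_RANGE 0x08

/* Hypothetical model of an exclusion range taken from an IVMD entry. */
struct model_excl_range {
    uint64_t base;
    uint64_t length;
};

/* Hypothetical model of the per-device state relevant here. */
struct model_device {
    bool ex_bit; /* EX bit in the device's Device Table entry */
};

/* The exclusion range is only configured if the IVMD entry has the flag set. */
static bool model_excl_range_applies(uint8_t ivmd_flags)
{
    return (ivmd_flags & MODEL_IVMD_FLAG_EXCL_RANGE) != 0;
}

/*
 * Returns true if the access goes through IOMMU translation, false if it
 * is passed through untranslated (identity-mapped) because the EX bit is
 * set and the address lies inside the exclusion range.
 */
static bool model_access_is_translated(const struct model_device *dev,
                                       const struct model_excl_range *range,
                                       uint64_t addr)
{
    bool in_range = range->length != 0 &&
                    addr >= range->base &&
                    addr - range->base < range->length;

    return !(dev->ex_bit && in_range);
}
```

In this model only addresses inside the range bypass translation, and only for devices with the EX bit set; every other access from the same device is still translated, which is why the exclusion range does not remove a whole PCI device from the IOMMU.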
BTW, may I have the acpi table and lspci log to check?
Yes, I know that set_device_exclusion_range() can't exclude one PCI device from the IOMMU, but I guess it's still possible to achieve that if we can calculate the range and use a modified version of set_device_exclusion_range() that ignores the flag.