Draft: clover: implement CLOVER_DEVICE_ENABLE like AMD APP's GPU_DEVICE_ORDINAL

Thomas Debesse requested to merge illwieckz/mesa:clover_device_enable into main

AMD APP (ROCm, PAL, Orca, fglrx) implements the GPU_DEVICE_ORDINAL environment variable as a comma-separated list of OpenCL device id numbers.

This implements the same feature for Clover using the CLOVER_DEVICE_ENABLE environment variable.

Example:

CLOVER_DEVICE_ENABLE='0,3' clinfo --list
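
To illustrate how a comma-separated list of device id numbers can act as a filter, here is a minimal C++ sketch. It is not the code from this merge request: the helper names `parse_device_list` and `enabled_devices` are hypothetical, and the handling of an unset variable (every device stays enabled) and of malformed tokens (ignored) are assumptions made for the example.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turn "0,3" into the set {0, 3}.
std::set<unsigned> parse_device_list(const std::string &value) {
   std::set<unsigned> ids;
   std::istringstream stream(value);
   std::string token;
   while (std::getline(stream, token, ',')) {
      // Assumption: tokens that are not short plain decimal numbers are ignored.
      if (token.empty() || token.size() > 9 ||
          token.find_first_not_of("0123456789") != std::string::npos)
         continue;
      ids.insert(static_cast<unsigned>(std::stoul(token)));
   }
   return ids;
}

// Hypothetical helper: return the device ids that stay visible.
// `value` is the raw content of the environment variable, or nullptr
// when the variable is unset.
std::vector<unsigned> enabled_devices(unsigned device_count, const char *value) {
   // Assumption: an unset variable means "no filtering".
   const bool filter = value != nullptr;
   const std::set<unsigned> ids =
      filter ? parse_device_list(value) : std::set<unsigned>();

   std::vector<unsigned> result;
   for (unsigned id = 0; id < device_count; ++id)
      if (!filter || ids.count(id))
         result.push_back(id);
   return result;
}
```

A caller could pass `std::getenv("CLOVER_DEVICE_ENABLE")` as `value`; with the example above and four devices, only ids 0 and 3 would remain.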

It is not named GPU_DEVICE_ORDINAL because the environment variable must be specific to the OpenCL platform; otherwise one cannot disable a device in one platform while enabling it in another.

As an example, both ROCm and PAL attempt to support the Hawaii GPU, but ROCm is faulty and wrecks the kernel. The user can use GPU_DEVICE_ORDINAL to prevent ROCm from driving the Hawaii GPU and avoid wrecking the kernel, but since all AMD platforms use the same environment variable name, this also prevents PAL from driving the Hawaii GPU. So one cannot host a Hawaii GPU and a Vega GPU on the same system by enabling Vega and disabling Hawaii in ROCm while enabling Hawaii in PAL, because GPU_DEVICE_ORDINAL disables Hawaii in both platforms.

The platform-specific variable name is meant to avoid reproducing that mistake.

So the variable is named CLOVER_DEVICE_ENABLE, which leaves room to implement the same feature in rusticl with a variable named RUSTICL_DEVICE_ENABLE. That way we can enable GPU A in Clover and disable it in rusticl, while disabling GPU B in Clover and enabling it in rusticl.
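
The benefit of one variable per platform can be sketched as follows. This is only an illustration under stated assumptions: the environment is modeled as a `std::map` instead of the process environment, the helpers `device_enable_variable` and `platform_exposes` are hypothetical, and an unset variable is assumed to leave every device enabled.

```cpp
#include <map>
#include <sstream>
#include <string>

// Modeled environment (assumption: a plain map, for illustration only).
using Environment = std::map<std::string, std::string>;

// Hypothetical helper: build the per-platform variable name,
// e.g. "CLOVER" gives "CLOVER_DEVICE_ENABLE".
std::string device_enable_variable(const std::string &platform_prefix) {
   return platform_prefix + "_DEVICE_ENABLE";
}

// Hypothetical helper: does this platform expose the device with this id?
bool platform_exposes(const Environment &env,
                      const std::string &platform_prefix, unsigned id) {
   const auto it = env.find(device_enable_variable(platform_prefix));
   if (it == env.end())
      return true; // Assumption: unset variable, nothing is filtered.

   // Look for the id among the comma-separated entries.
   std::istringstream stream(it->second);
   std::string token;
   while (std::getline(stream, token, ','))
      if (token == std::to_string(id))
         return true;
   return false;
}
```

With `CLOVER_DEVICE_ENABLE` set to `0` and `RUSTICL_DEVICE_ENABLE` set to `1`, device 0 is exposed only by Clover and device 1 only by rusticl, which a single shared variable could not express.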

The variable name doesn't mention any device type (like GPU) on purpose: it is meant to enable or disable any device of a given platform, whatever the device type.

The proposed variable name does not use the word ORDINAL because the format may be extended in the future to list hardware addresses or other identifiers that are more predictable. Implementing the feature with id numbers seems to be good enough for now.

Edited by Thomas Debesse
