[DG2] Feature Request: Expose Thermal and Fan Information of DG2 (Intel Arc A750 GPU) in User Space.
Currently, only limited information is exposed to user space through the i915-pci-0300 (lm_sensors) interface:
i915-pci-0300
Adapter: PCI adapter
in0: 0.00 V # This is broken
power1: N/A (max = 190.00 W) # This works in xpu-smi but not here for some reason
energy1: 39.15 kJ
It would be better if GPU temperature and fan speed (RPM) monitoring were also exposed, since both are available on Windows.
xpu-smi also fails to provide this information:
+-----------------------------+--------------------------------------------------------------------+
| Device ID | 0 |
+-----------------------------+--------------------------------------------------------------------+
| GPU Utilization (%) | N/A |
| EU Array Active (%) | N/A |
| EU Array Stall (%) | N/A |
| EU Array Idle (%) | N/A |
| | |
| Compute Engine Util (%) | N/A |
| Render Engine Util (%) | Engine 0: 0 |
| Media Engine Util (%) | N/A |
| Decoder Engine Util (%) | Engine 0: 0, Engine 1: 0 |
| Encoder Engine Util (%) | Engine 0: 0, Engine 1: 0 |
| Copy Engine Util (%) | Engine 0: 0 |
| Media EM Engine Util (%) | Engine 0: 0, Engine 1: 0 |
| 3D Engine Util (%) | N/A |
+-----------------------------+--------------------------------------------------------------------+
| Reset | N/A |
| Programming Errors | N/A |
| Driver Errors | N/A |
| Cache Errors Correctable | N/A |
| Cache Errors Uncorrectable | N/A |
| Mem Errors Correctable | N/A |
| Mem Errors Uncorrectable | N/A |
+-----------------------------+--------------------------------------------------------------------+
| GPU Power (W) | 37 |
| GPU Frequency (MHz) | 600 |
| Media Engine Freq (MHz) | N/A |
| GPU Core Temperature (C) | N/A |
| GPU Memory Temperature (C) | N/A |
| GPU Memory Read (kB/s) | N/A |
| GPU Memory Write (kB/s) | N/A |
| GPU Memory Bandwidth (%) | N/A |
| GPU Memory Used (MiB) | 447 |
| GPU Memory Util (%) | 6 |
| Xe Link Throughput (kB/s) | N/A |
+-----------------------------+--------------------------------------------------------------------+
Exposing this would be a big help in monitoring the GPU and identifying situations where it is overheating or thermal throttling. Windows already provides this information, but Linux does not.
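
To make the gap easy to check on other machines, here is a minimal Python sketch that lists what the i915 hwmon device exposes. It relies on the standard Linux hwmon sysfs layout (/sys/class/hwmon/hwmonN/name plus tempN_input, fanN_input, inN_input, powerN_input and energyN_input attributes) and assumes the chip name is "i915", as the sensors output above suggests. The function name read_hwmon and its parameters are only illustrative, not part of any existing tool.

```python
import re
from pathlib import Path

# Divisors that convert raw hwmon sysfs integers to the units `sensors` prints.
UNITS = {
    "temp": (1000.0, "C"),         # millidegrees Celsius
    "fan": (1.0, "RPM"),           # revolutions per minute
    "in": (1000.0, "V"),           # millivolts
    "power": (1_000_000.0, "W"),   # microwatts
    "energy": (1_000_000.0, "J"),  # microjoules
}


def read_hwmon(chip="i915", root="/sys/class/hwmon"):
    """Return {attribute: (value, unit)} for every readable *_input of `chip`.

    `chip` and `root` are assumptions for illustration; adjust them if the
    device registers under a different hwmon name.
    """
    readings = {}
    for dev in sorted(Path(root).glob("hwmon*")):
        try:
            if (dev / "name").read_text().strip() != chip:
                continue
        except OSError:
            continue  # no name file, or it is not readable
        for attr in sorted(dev.glob("*_input")):
            match = re.fullmatch(r"([a-z]+)\d+_input", attr.name)
            if not match or match.group(1) not in UNITS:
                continue
            divisor, unit = UNITS[match.group(1)]
            try:
                readings[attr.name] = (int(attr.read_text()) / divisor, unit)
            except (OSError, ValueError):
                continue  # attribute exists but returns no usable data
    return readings


if __name__ == "__main__":
    found = read_hwmon()
    for name, (value, unit) in found.items():
        print(f"{name}: {value:g} {unit}")
    # Report the attribute families this request asks for, if they are missing.
    for kind in ("temp", "fan"):
        if not any(name.startswith(kind) for name in found):
            print(f"no {kind}*_input attribute exposed")
```

On the setup described here, I would expect this to report only in0 and energy1 and to flag temp and fan as missing. If the driver started exposing temperature and fan attributes, they should show up without any change to the script.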