lockup in dce_i2c_submit_command_hw
Brief summary of the problem:
During initialization, ddcutil probes each (plausible) /dev/i2c device that has an EDID to see whether it supports DDC/CI. In the first phase, the EDID is obtained, either from sysfs or by reading slave address 0x50, and each device with an EDID is poked on slave address 0x37 to see if it is responsive. In the second phase, DDC communication is attempted on those devices for which 0x37 is responsive. Each phase can be parallelized for performance.
Every so often the user interface locks up if both phases are parallelized. The lockup does not occur if the first phase is not parallelized. The UI is unresponsive to keyboard and mouse input (including even e.g. ALT-CTL-F2). However, it is still possible to ssh into the system and save the dmesg contents. As shown in the dmesg output, the lockup occurs within i2cdev_ioctl_rdwr, more specifically within dce_i2c_submit_command.
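For readers unfamiliar with the i2c-dev interface, the first-phase probe described above can be sketched roughly as follows. This is not ddcutil's actual code: the names probe_bus and has_edid_header are hypothetical, the EDID is always read over I2C rather than from sysfs, and a one-byte read is assumed as the "poke" on 0x37, which may differ from what ddcutil really does.

```c
#include <fcntl.h>
#include <linux/i2c.h>
#include <linux/i2c-dev.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define EDID_ADDR 0x50  /* slave address that serves the EDID */
#define DDC_ADDR  0x37  /* slave address used for DDC/CI */

/* Returns 1 if buf starts with the fixed 8-byte EDID header, else 0. */
int has_edid_header(const uint8_t *buf)
{
    static const uint8_t header[8] =
        { 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00 };
    return memcmp(buf, header, sizeof header) == 0;
}

/* Hypothetical phase-1 probe of one bus, e.g. "/dev/i2c-10".
 * Returns -1 if the device cannot be opened,
 *          0 if no EDID was read,
 *          1 if an EDID was read but 0x37 did not respond,
 *          2 if an EDID was read and 0x37 responded. */
int probe_bus(const char *devpath)
{
    int fd = open(devpath, O_RDWR);
    if (fd < 0)
        return -1;

    /* Write EDID offset 0, then read the 128-byte base block. */
    uint8_t offset = 0;
    uint8_t edid[128] = { 0 };
    struct i2c_msg edid_msgs[2] = {
        { .addr = EDID_ADDR, .flags = 0,        .len = 1,           .buf = &offset },
        { .addr = EDID_ADDR, .flags = I2C_M_RD, .len = sizeof edid, .buf = edid    },
    };
    struct i2c_rdwr_ioctl_data edid_xfer = { .msgs = edid_msgs, .nmsgs = 2 };

    int result = 0;
    if (ioctl(fd, I2C_RDWR, &edid_xfer) >= 0 && has_edid_header(edid)) {
        result = 1;

        /* Assumed poke: a one-byte read from the DDC/CI address. */
        uint8_t byte = 0;
        struct i2c_msg poke = {
            .addr = DDC_ADDR, .flags = I2C_M_RD, .len = 1, .buf = &byte
        };
        struct i2c_rdwr_ioctl_data poke_xfer = { .msgs = &poke, .nmsgs = 1 };
        if (ioctl(fd, I2C_RDWR, &poke_xfer) >= 0)
            result = 2;
    }

    close(fd);
    return result;
}
```

Calling probe_bus() on each candidate bus from a separate thread would approximate the parallelized first phase; each call ends up in the kernel through the I2C_RDWR ioctl of i2c-dev.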
Hardware description:
- CPU: Intel I7-12700K
- GPU: Navi 10 [Radeon Pro W5700]
- System Memory: 32GB
- Display(s): Samsung U32H750, Dell U3011, NEC P241W, HP Z22i
- Type of Display Connection: DP
System information:
- Distro name and Version: Fedora 40
- Kernel version: Linux banner 6.9.5-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Jun 16 15:47:09 UTC 2024 x86_64 GNU/Linux
- Custom kernel: N/A
- AMD official driver version: as distributed with kernel
- ddcutil: built from branch 2.1.5-dev on GitHub
How to reproduce the issue:
Script that reproduces the bug: loop_ddcutil
Comments:
- The script calls the "ddcutil detect" command in a 999-iteration loop.
- The lockup commonly occurs within a few tens of iterations.
- Use of the "detect" command is somewhat arbitrary. The lockup occurs during initialization, when ddcutil probes for /dev/i2c buses with a DDC-enabled monitor. This probing occurs with most commands.
- Option "--i2c-init-async-min 2" causes the first phase to be parallelized. Because of the bug, parallelization of the first phase is disabled by default.
- On each line, the thread id is shown in brackets.
Attached files:
- Script to reproduce failure: loop_ddcutil
- Source code for the relevant functions: extracted_source_code
Log files:
- Dmesg log: dmesg_output
- syslog: syslog_output
Comments on syslog:
- The lockup occurs on the 17th execution of ddcutil.
- The syslog traces the low-level functions that write to and read from /dev/i2c using i2c-dev's ioctl interface.
- For each execution, the initial portion of the output reports the phase 1 probing. On the final execution, the last four lines are from the second phase, showing that ddcutil attempts to send a DDC request packet to each /dev/i2c device with a display (i2c-10,11,12,14). None of the i2c_ioctl_writer() calls return.
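To make the second-phase write concrete, here is a minimal sketch of forming a DDC/CI request packet and writing it to slave address 0x37 through i2c-dev. It is an illustration only: build_get_vcp_request and write_ddc_request are hypothetical names, not ddcutil functions, and the report does not say which request ddcutil sends at this point, so a Get VCP Feature request is assumed.

```c
#include <fcntl.h>
#include <linux/i2c.h>
#include <linux/i2c-dev.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define DDC_ADDR 0x37  /* 7-bit DDC/CI slave address */

/* Fill out[] with a DDC/CI Get VCP Feature request for vcp_code.
 * Returns the packet length in bytes. */
size_t build_get_vcp_request(uint8_t vcp_code, uint8_t out[5])
{
    out[0] = 0x51;       /* source address (host) */
    out[1] = 0x80 | 2;   /* length byte: 2 data bytes follow */
    out[2] = 0x01;       /* Get VCP Feature opcode */
    out[3] = vcp_code;
    /* Checksum: XOR of the 8-bit destination address (0x37 << 1 = 0x6e)
     * with every byte written so far. */
    out[4] = (uint8_t)((DDC_ADDR << 1) ^ out[0] ^ out[1] ^ out[2] ^ out[3]);
    return 5;
}

/* Hypothetical stand-in for the low-level write: send one packet to 0x37
 * with a single-message I2C_RDWR ioctl.
 * Returns 0 on success, -1 if the open or the ioctl fails. */
int write_ddc_request(const char *devpath, const uint8_t *pkt, size_t len)
{
    int fd = open(devpath, O_RDWR);
    if (fd < 0)
        return -1;

    struct i2c_msg msg = {
        .addr  = DDC_ADDR,
        .flags = 0,                 /* write */
        .len   = (uint16_t)len,
        .buf   = (uint8_t *)pkt,
    };
    struct i2c_rdwr_ioctl_data xfer = { .msgs = &msg, .nmsgs = 1 };

    /* In the failure reported here, a call like this one never returns. */
    int rc = ioctl(fd, I2C_RDWR, &xfer);
    close(fd);
    return rc < 0 ? -1 : 0;
}
```

In the failing run, one such write would be in flight per display bus (i2c-10, 11, 12, 14), each from its own thread.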