Crash / freeze when switching between GUI desktop and tty on Artix, Arch and Devuan
Switching to a tty console results in an empty black screen perhaps 25% of the time, although successes and failures can each come in runs. The freeze can usually be exited by pressing Fn + F10 (KB backlight) to cycle the keyboard backlight through off, mid and full, then pressing Fn + Insert (sleep) and, once the machine has gone to sleep, pressing the power button to wake it. It then returns to the expected state, showing the tty. This happens with the current Zen and LTS kernels, but it did not happen with an older kernel. I tried updating the BIOS to the latest version and disabling all the performance options and virtualization support in the BIOS, so that the CPU ran at 100% all the time with one core and one thread, with no change. It is worse with the linux-git kernel, where it happens almost every time.
First I tried the kernels available in the Artix archive. The problem started between these two versions:
Bad: linux-5.16.1.artix1-1-x86_64.pkg.tar.zst 16-Jan-2022 13:55
Good: linux-5.15.12.artix1-1-x86_64.pkg.tar.zst 30-Dec-2021 12:03
Then I tried the Arch archive:
Bad: linux-5.16.arch1-1-x86_64.pkg.tar.zst
Good: linux-5.15.13.arch1-1-x86_64.pkg.tar.zst
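
Taken together, the two archive results narrow the window a little further than either does alone. The following is only a small sketch of that reasoning, using the four filenames listed above; the helper names `kernel_version` and `regression_window` are made up for illustration, and the filename pattern is an assumption based on these four names only.

```python
import re

# Filenames copied from the Artix and Arch archive results above.
GOOD = [
    "linux-5.15.12.artix1-1-x86_64.pkg.tar.zst",
    "linux-5.15.13.arch1-1-x86_64.pkg.tar.zst",
]
BAD = [
    "linux-5.16.1.artix1-1-x86_64.pkg.tar.zst",
    "linux-5.16.arch1-1-x86_64.pkg.tar.zst",
]

# Assumed naming scheme: linux-<major>.<minor>[.<patch>].<distro><n>-<pkgrel>-...
_PKG_RE = re.compile(r"linux-(\d+)\.(\d+)(?:\.(\d+))?\.(?:artix|arch)\d+-")


def kernel_version(pkg_name):
    """Return (major, minor, patch); a missing patch level counts as 0."""
    match = _PKG_RE.match(pkg_name)
    if match is None:
        raise ValueError(f"unrecognised kernel package name: {pkg_name}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def regression_window(good, bad):
    """Return (newest known-good version, oldest known-bad version)."""
    last_good = max(kernel_version(p) for p in good)
    first_bad = min(kernel_version(p) for p in bad)
    if last_good >= first_bad:
        raise ValueError("good and bad results overlap; no clean window")
    return last_good, first_bad


print(regression_window(GOOD, BAD))  # ((5, 15, 13), (5, 16, 0))
```

So the change appears somewhere between 5.15.13 and 5.16. Note that stable point releases such as 5.15.13 live on their own branch, so a bisect of the mainline tree would normally run between the v5.15 and v5.16 tags rather than from 5.15.13.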
Then I bisected the kernel, and this commit seemed to be where the problem started:
https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux/+/8e07757bece6e81b0b0910358ebceca3032bc6c7%5E%21/#F0
commit 8e07757bece6e81b0b0910358ebceca3032bc6c7 (HEAD)
Author: Shyam Prasad N <sprasad@microsoft.com>
Date: Mon Jul 19 10:03:38 2021 +0000
cifs: do not negotiate session if session already exists
Reverting the patch in linux-git did not fix the issue, but it did change (and improve) the behaviour. There seem to have been a lot of changes in this section of code, including to the function in question, and the file has even moved to fs/smb/client/connect.c from its previous location. Unmodified linux-git would never recover by pressing sleep and required a forced power off. With the patch reverted, sleep then resume almost always worked, but switching to a tty still often went to a black screen. On recovery this message can sometimes be seen on the tty and in syslog: __common_interrupt: 0.55 No irq handler for vector
I also found a possible variation of the problem on a Dell M4400: when switching from a tty back to the desktop, the desktop crashes back to the login tty. As I have startx and autologin set up, it then restarts automatically, although any previously open apps are forcibly terminated in the crash. This happens only about one in 50 times, so it is not so obvious. On a Dell M4500, however, switching tty works normally. There was a similar bug (fixed now) in December 2021 relating to xorg and xorg-server: https://bbs.archlinux.org/viewtopic.php?id=272327 The M4400 crash does not seem to happen when I use either a 5.15.12 or 5.16.1 Artix kernel, so on that machine it was not triggered by the exact same commit.
I added a printk to the commit revert to try to find out when this code was getting called, as I had not found any sign of it using trace-cmd when switching to a tty. But the printk output never appears in dmesg or syslog, even after switching to a tty, whether it freezes or not. So it looks like this section of code is not used. I don't use any samba things, and I also ran pacman -Rs samba caja-share to remove the samba package and related items, which has not changed the situation either. My guess from this is that some kind of buffer overflow / memory corruption / stack smashing situation elsewhere could be causing this part of the kernel code to be loaded, even though it is not used in the normal course of events.
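
One further cross-check on whether the CIFS code is in play at all is to see whether the cifs module is even loaded when the freeze happens. This is only a rough sketch: `is_module_loaded` is a made-up helper, the sample text is invented for illustration rather than taken from the affected machine, and it assumes cifs is built as a module (code built into the kernel image does not appear in /proc/modules).

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text.

    Each line starts with the module name, followed by size, reference
    count, dependants, state and load address.
    """
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def is_module_loaded(name, proc_modules_text):
    """True if a module called `name` is listed in the given text."""
    return name in loaded_modules(proc_modules_text)


def read_proc_modules(path="/proc/modules"):
    """Read the live module list (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Invented sample in the /proc/modules layout, for illustration only.
SAMPLE = (
    "i915 3440640 25 - Live 0x0000000000000000\n"
    "iwlwifi 458752 1 iwlmvm, Live 0x0000000000000000\n"
)

print(is_module_loaded("cifs", SAMPLE))  # False
```

On the affected machine this would be called as `is_module_loaded("cifs", read_proc_modules())`, once before switching and once after recovering from a freeze. If it reports False both times, that would be consistent with the printk never firing.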
I installed the GNOME desktop (which uses Wayland) and gdm. Starting the MATE desktop from gdm still has the problem, so it isn't xinitrc. GNOME using Wayland is not affected, while GNOME using Xorg is, so it seems to be an Xorg bug, or at least Xorg related. Then I tried downgrading various xorg components to see if older versions behaved differently, but they didn't, and beyond a certain point it became difficult on an otherwise updated system due to changes in libxcvt and the switch in Artix away from eudev. I went back as far as these versions with no change in behaviour:
xf86-input-evdev-2.10.6-2.1-x86_64.pkg.tar.zst
xf86-video-fbdev-0.5.0-2.1-x86_64.pkg.tar.zst
xorg-server-xephyr-1.20.13-3-x86_64.pkg.tar.zst
xf86-input-libinput-1.2.0-1-x86_64.pkg.tar.zst
'xf86-video-intel-1 2.99.917+916+g31486f40-1.1-x86_64.pkg.tar.zst'
xorg-server-xnest-1.20.13-3-x86_64.pkg.tar.zst
xf86-input-synaptics-1.9.1-2-x86_64.pkg.tar.zst
xorg-server-1.20.13-3-x86_64.pkg.tar.zst
xorg-server-xvfb-1.20.13-3-x86_64.pkg.tar.zst
xf86-input-vmmouse-13.1.0-5.1-x86_64.pkg.tar.zst
xorg-server-common-1.20.13-3-x86_64.pkg.tar.zst
xf86-video-dummy-0.3.8-4.1-x86_64.pkg.tar.zst
xorg-server-devel-1.20.13-3-x86_64.pkg.tar.zst
Arch Linux (with systemd) has exactly the same problem. Devuan Daedalus is slightly different: you can switch to another tty, but cannot then switch back to tty7 and the GUI display, either with Ctrl+Alt+F7 or with chvt. Trying to do so makes the cursor on the tty you are on freeze and become unresponsive. You can, however, switch back to the tty you were on, say tty2, and continue as you were; it is just impossible to return to tty7 once you have left it. As on Artix, GNOME on Wayland works normally but GNOME on Xorg exhibits the bug. Switching to tty1, where GDM starts up, is still possible though.
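
For anyone trying to reproduce this, a loop along the following lines could drive the switching repeatedly and count how often chvt fails to return. It is a sketch under assumptions: the function names are made up, it assumes chvt (from the kbd package) is installed and that the script runs with enough privileges to switch VTs, and it assumes a failed switch shows up as chvt hanging or exiting non-zero. It therefore may not detect the Artix/Arch case, where the switch may well complete but the screen stays black.

```python
import subprocess


def switch_vt(target, timeout=5.0, run=subprocess.run):
    """Try one switch with chvt; return 'ok', 'timeout' or 'failed'."""
    try:
        result = run(["chvt", str(target)], timeout=timeout)
    except subprocess.TimeoutExpired:
        # chvt did not return in time, e.g. the switch never completed.
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


def stress_vt_switch(gui_vt, text_vt, rounds, switch=switch_vt):
    """Switch text_vt -> gui_vt `rounds` times and tally the outcomes."""
    counts = {"ok": 0, "timeout": 0, "failed": 0}
    for _ in range(rounds):
        for target in (text_vt, gui_vt):
            counts[switch(target)] += 1
    return counts
```

With the desktop on tty7 as on Devuan, `stress_vt_switch(gui_vt=7, text_vt=2, rounds=20)` would attempt 40 switches in total. The `switch` and `run` parameters exist only so the loop can be exercised without touching the real console.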
This is on a Dell Latitude E7470 Ultrabook. Hardware details:
System:
  Kernel: 5.8.14-artix1-1 arch: x86_64 bits: 64 compiler: gcc v: 10.2.0
  Desktop: MATE v: 1.27.0 Distro: Artix Linux base: Arch Linux
Machine:
  Type: Laptop System: Dell product: Latitude E7470 v: N/A serial:
  Mobo: Dell model: 0T6HHJ v: A00 serial: UEFI: Dell v: 1.36.3 date: 09/18/2022
Battery:
  ID-1: BAT0 charge: 30.4 Wh (100.0%) condition: 30.4/55.0 Wh (55.2%) volts: 8.6 min: 7.6
  model: LGC-LGC3.65 DELL 242WD6C status: full
CPU:
  Info: dual core model: Intel Core i7-6600U bits: 64 type: MT MCP arch: Skylake rev: 3
  cache: L1: 128 KiB L2: 512 KiB L3: 4 MiB
  Speed (MHz): avg: 2645 high: 2882 min/max: 400/3400
  cores: 1: 2808 2: 2882 3: 2550 4: 2340 bogomips: 22408
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3
Graphics:
  Device-1: Intel Skylake GT2 [HD Graphics 520] vendor: Dell Latitude E7470 driver: i915 v: kernel arch: Gen-9 bus-ID: 00:02.0
  Display: server: X.Org v: 21.1.8 with: Xwayland v: 23.1.2 driver: X: loaded: intel unloaded: fbdev,modesetting dri: i965 gpu: i915 resolution: 1920x1080~60Hz
  API: OpenGL Message: Unable to show GL data. Required tool glxinfo missing.
Network:
  Device-1: Intel Wireless 8260 driver: iwlwifi v: kernel bus-ID: 01:00.0
  IF: wlan0 state: down mac:
Drives:
  Local Storage: total: 238.47 GiB used: 71.3 GiB (29.9%)
  ID-1: /dev/nvme0n1 vendor: Western Digital model: PC SN720 SDAPNTW-256G-1016 size: 238.47 GiB temp: 40.9 C
Partition:
  ID-1: / size: 120 GiB used: 71.07 GiB (59.2%) fs: btrfs dev: /dev/nvme0n1p6
  ID-2: /boot size: 500 MiB used: 228.7 MiB (45.7%) fs: btrfs dev: /dev/nvme0n1p2
  ID-3: /boot/efi size: 299.4 MiB used: 292 KiB (0.1%) fs: vfat dev: /dev/nvme0n1p1
  ID-4: swap-1 size: 16 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/nvme0n1p3
Info:
  Processes: 199 Uptime: 11m Memory: available: 7.67 GiB used: 680.6 MiB (8.7%)
  Init: OpenRC runlevel: default Compilers: gcc: 13.1.1 Packages: 707
  Shell: Bash v: 5.1.16 inxi: 3.3.27
Attachments: oldkernelswitchingttynoproblemXorg.0.log.gz, example-of-ttyblackscreen-Xorg.0.log.gz, commitrevertpatch.diff.gz