serial: reopen when fails to re-acquire after disconnection

Greetings!

Over the past week I've been experiencing problems with a Telit HE910-D modem failing to reconnect after an LCP termination request is received over PPP. I believe the specific modem model is not relevant here; what matters is that it uses PPP over a ttyACM interface. Still, I include the model for completeness.

  • ModemManager version: 1.8.2
  • NetworkManager version: 1.16.0
  • pppd version: 2.4.7

Relevant NetworkManager log entries for the LCP termination request can be found in this file: nm_log.txt

Relevant log entries showing the problem from the ModemManager perspective can be found in this file: mm_log.txt

As can be seen in the NM logs, the termination request is followed by the shutdown of the pppd process, which is supposed to restore the settings of the ttyACM0 port. Meanwhile, MM is monitoring the connection status (and signal quality) through the secondary port (ttyACM3) via the "CGACT?" AT command.

For some reason, after the disconnection MM is unable to re-acquire the lock on the ttyACM0 port, which eventually leads to the forced_close flag being set. After that, MM never reuses this port, and the only solution is to restart MM.

Note that the "(ttyACM0): could not re-acquire serial port lock: (5) Input/output error" log entry appears before pppd has properly shut down.
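To illustrate the kind of check that this monitoring performs, here is a minimal sketch that scans a "CGACT?" response for the state of one PDP context. It assumes the standard `+CGACT: <cid>,<state>` response lines; `cgact_state_for_cid` is a hypothetical helper written for this description and is not ModemManager's actual parser.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a ModemManager function): look up the state of
 * one PDP context in a "CGACT?" response made of "+CGACT: <cid>,<state>" lines.
 * Returns 1 if the context is active, 0 if inactive, -1 if the cid is absent. */
int cgact_state_for_cid(const char *response, int wanted_cid)
{
    const char *p = response;

    while ((p = strstr(p, "+CGACT:")) != NULL) {
        int cid, state;

        if (sscanf(p, "+CGACT: %d,%d", &cid, &state) == 2 && cid == wanted_cid)
            return state ? 1 : 0;
        p += strlen("+CGACT:"); /* skip this match and keep scanning */
    }
    return -1;
}
```

A result of 0 for the context used by the PPP session is what would make MM treat the connection as lost, independently of any notification from NM.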

While looking for a solution I found a number of similar cases, some already patched in NM, involving a possible race condition between MM and pppd over control of the ttyACM port (@aleksm is likely to remember those issues). Some links/mailing list threads:

However, after applying the patch from the last link (which is the only one my NM version didn't have), the problem still persisted.

Looking further, I found that the MM disconnection was being triggered not by the NM notification, but by the connection status monitoring done through the CGACT command.

After some exploring I reached the port_connected function, which is connected to the MM_PORT_CONNECTED signal. That signal is emitted at the end of the chain of calls that follows the loss of the PDP context reported by CGACT, and port_connected then immediately tries to re-acquire the lock, regardless of what pppd is doing. I happened to notice the // FIXME describing the exact situation I had found myself in, which was kind of funny.

My first attempt was to implement the FIXME's suggestion of trying again later: I had a task call port_connected again up to 5 times, at 1 s intervals, to re-acquire the lock once pppd had asynchronously restored the port.
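That retry attempt can be sketched roughly as follows. This is a simplified, blocking illustration and not the code I actually tested: inside MM the retry would be scheduled from the main loop instead of sleeping, and `retry_lock`, `try_lock_fn` and `fake_try_lock` are hypothetical names made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns 0 when the lock was acquired. */
typedef int (*try_lock_fn)(void *user_data);

/* Call try_lock up to max_attempts times, waiting delay_ms between attempts.
 * Returns the attempt number that succeeded, or -1 if every attempt failed. */
int retry_lock(try_lock_fn try_lock, void *user_data,
               int max_attempts, long delay_ms)
{
    struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (try_lock(user_data) == 0)
            return attempt;
        if (attempt < max_attempts)
            nanosleep(&delay, NULL); /* give pppd time to release the port */
    }
    return -1;
}

/* Stand-in for the real lock attempt: fails until the counter reaches 0. */
int fake_try_lock(void *user_data)
{
    int *remaining_failures = user_data;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return -1;
    }
    return 0;
}
```

With the values from my attempt this would be called as `retry_lock(cb, data, 5, 1000)`. In my case every attempt failed, which is what led me to look at the port settings themselves.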

I don't know why it didn't work; some insights would be appreciated. I did notice that even after pppd had fully shut down, the port settings were never restored (the line discipline was still PPP and the CLOCAL flag was not set), so setting TIOCEXCL failed again and again.

I reached the final solution of simply reopening the port after noticing that a graceful disconnection on my side (a simple nmcli c down ) does not go through re-acquisition of the port, but simply closes and reopens it. I then changed port_connected() to reopen the port if the ioctl fails and we're disconnected. With that, the problem was gone.
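For anyone who wants to reproduce that observation, a minimal sketch of the check follows. It assumes Linux, where the PPP line discipline is N_PPP, and it assumes that "restored" means a non-PPP line discipline with CLOCAL set; both function names are hypothetical and not part of MM.

```c
#include <sys/ioctl.h>
#include <termios.h>

#ifndef N_PPP
#define N_PPP 3 /* PPP line discipline number on Linux */
#endif
#ifndef N_TTY
#define N_TTY 0 /* default line discipline on Linux */
#endif

/* Pure check (assumption: "restored" means not PPP and CLOCAL set). */
int settings_restored(int ldisc, tcflag_t cflag)
{
    return ldisc != N_PPP && (cflag & CLOCAL) != 0;
}

/* Query an open tty descriptor.
 * Returns 1 if restored, 0 if not, -1 if the port cannot be queried. */
int port_settings_restored(int fd)
{
    int ldisc;
    struct termios tio;

    if (ioctl(fd, TIOCGETD, &ldisc) < 0 || tcgetattr(fd, &tio) < 0)
        return -1;
    return settings_restored(ldisc, tio.c_cflag);
}
```

In my case the equivalent of this check kept reporting "not restored" long after pppd had exited.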
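The idea behind the patch can be sketched in plain C as below. This is an illustration of the logic only, not the patch itself: the real change lives in MM's serial port code and uses its own open/close helpers, while `reacquire_or_reopen`, the result enum and the open flags here are assumptions made for the example.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

enum reacquire_result { LOCK_ACQUIRED, PORT_REOPENED, REACQUIRE_FAILED };

/* Try to re-take exclusive access with TIOCEXCL. If that fails while we are
 * already disconnected, close the descriptor and open the device again.
 * On PORT_REOPENED the caller still has to reconfigure the new descriptor. */
enum reacquire_result reacquire_or_reopen(int *fd, const char *path,
                                          int still_connected)
{
    if (ioctl(*fd, TIOCEXCL) == 0)
        return LOCK_ACQUIRED;

    /* While connected, leave the failure to the existing error path. */
    if (still_connected)
        return REACQUIRE_FAILED;

    close(*fd);
    *fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
    return *fd >= 0 ? PORT_REOPENED : REACQUIRE_FAILED;
}
```

Reopening gives up on the old descriptor and whatever state pppd left behind, which is the same thing that already happens on a graceful disconnection.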

I apologize for the long text, but since the patch is really short I felt it needed a detailed explanation.

Also, I found and solved this on MM version 1.8.2, but I'm submitting it against master since there haven't been any apparent fixes there and I'm unfamiliar with MM's contribution guidelines.

Insights are appreciated, as I have no idea whether this could be harmful in some other situation.

Regards, Alex
