libslirp sends RST to app in response to arriving FIN when containerized socket is shutdown() with SHUT_WR

When podman runs a rootless container over slirp4netns and an application in that container acts as a TCP client, the following sequence fails: the client calls shutdown() with SHUT_WR so that it no longer sends data but can still receive it, and the server later closes its side of the connection by sending a FIN. Instead of seeing that FIN as an orderly end-of-stream, the application in the container receives a RST.

This is easily reproduced in upstream slirp4netns outside of podman.

To reproduce, create a small Python client that calls shutdown() with SHUT_WR, which sends a FIN, but continues to read from the socket:

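import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("myserver.example.com", 9999))
sock.shutdown(socket.SHUT_WR)  # half-close: sends a FIN, reading is still allowed
while True:
    data = sock.recv(1024)
    if not data:  # empty read: the server closed cleanly with a FIN
        break
    print(data)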

On the server system, create a server application that blindly sends data:

import time
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("0.0.0.0", 9999))
sock.listen(128)
while True:
    client, address = sock.accept()
    # Byte strings keep the script working on both Python 2 and Python 3.
    client.send(b"Accepted.. sending 3 pulses and closing the socket...")
    for i in range(2):
        client.send(("Pulse %d" % i).encode("ascii"))
        time.sleep(5)
    client.close()  # sends the FIN that the containerized client sees as a RST
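
As a baseline, the half-close behaviour that the client relies on can be checked on loopback, with no slirp4netns in the path. The following is a minimal Python 3 sketch, not part of the original reproducer: the helper names (serve_once, half_close_read, run_demo) are made up for illustration, and the two payloads simply mirror the pulses sent by the server above. On a plain kernel TCP stack the read loop is expected to end with an empty read once the server's FIN arrives, rather than with a connection reset.

```python
import socket
import threading


def serve_once(listener, chunks):
    """Accept one connection, send each chunk, then close (which sends a FIN)."""
    conn, _ = listener.accept()
    with conn:
        for chunk in chunks:
            conn.sendall(chunk)


def half_close_read(address):
    """Connect, half-close the write side, and read until the peer's FIN."""
    received = []
    with socket.create_connection(address) as sock:
        sock.shutdown(socket.SHUT_WR)  # our FIN; the read side stays open
        while True:
            data = sock.recv(1024)
            if not data:  # b"" means the peer's FIN arrived
                break
            received.append(data)
    return b"".join(received)


def run_demo():
    """Run the server in a thread and return everything the client read."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    chunks = [b"Pulse 0", b"Pulse 1"]
    server = threading.Thread(target=serve_once, args=(listener, chunks))
    server.start()
    try:
        return half_close_read(listener.getsockname())
    finally:
        server.join()
        listener.close()


if __name__ == "__main__":
    print(run_demo())
```

If the same read loop raises a connection reset only when run behind slirp4netns, that points at the userspace stack rather than at the client or server code.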

After the applications are in place, create a network namespace independent of the root namespace and run slirp4netns on it.

Make the namespace as a non-root user:

$ unshare --user --map-root-user --net --mount

Get the PID of the bash process in the namespace:

$ echo $$
15307

In a separate terminal, run slirp4netns to set up the userspace TCP/IP stack for that PID and create the tap device:

$ slirp4netns --configure --mtu=65520 --disable-host-loopback 15307 tap0
sent tapfd=5 for tap0
received tapfd=5
Starting slirp
* MTU:             65520
* Network:         10.0.2.0
* Netmask:         255.255.255.0
* Gateway:         10.0.2.2
* DNS:             10.0.2.3
* Recommended IP:  10.0.2.100

Start the server application on the server system:

root@myserver # python server.py

Perform packet captures to observe traffic inside of the slirp network namespace, and in the root namespace on the container host:

# tcpdump -i eth0 port 9999 -w /tmp/host.pcap

# nsenter -t 15307 -n tcpdump -i tap0 port 9999 -w /tmp/slirp.pcap

Inside the namespace, as the non-root user, run the client application and observe the exception raised when the TCP connection is reset:

# python client.py 
Accepted.. sending 3 pulses and closing the socket...
Pulse 0
Pulse 1
Traceback (most recent call last):
  File "client.py", line 7, in <module>
	data = sock.recv(1024)
socket.error: [Errno 104] Connection reset by peer

Inspect the packet captures. The host-side capture shows the FIN from the server arriving (frame 12):

$ tshark -te -r /tmp/host.pcap
	1 1571428959.091848  10.3.117.71 → 10.10.93.144 TCP 60 43108 → 9999 [SYN] Seq=0 Win=64680 Len=0 MSS=1320 SACK_PERM=1 TSval=784810176 TSecr=0 WS=128
	2 1571428959.160007 10.10.93.144 → 10.3.117.71  TCP 60 9999 → 43108 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1320 SACK_PERM=1 TSval=268662887 TSecr=784810176 WS=128
	3 1571428959.160151  10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [ACK] Seq=1 Ack=1 Win=64768 Len=0 TSval=784810245 TSecr=268662887
	4 1571428959.160657  10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [FIN, ACK] Seq=1 Ack=1 Win=64768 Len=0 TSval=784810245 TSecr=268662887
	5 1571428959.228354 10.10.93.144 → 10.3.117.71  TCP 105 9999 → 43108 [PSH, ACK] Seq=1 Ack=1 Win=29056 Len=53 TSval=268662955 TSecr=784810245
	6 1571428959.228440  10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [ACK] Seq=2 Ack=54 Win=64768 Len=0 TSval=784810313 TSecr=268662955
	7 1571428959.228489 10.10.93.144 → 10.3.117.71  TCP 52 9999 → 43108 [ACK] Seq=54 Ack=2 Win=29056 Len=0 TSval=268662956 TSecr=784810245
	8 1571428959.296303 10.10.93.144 → 10.3.117.71  TCP 59 9999 → 43108 [PSH, ACK] Seq=54 Ack=2 Win=29056 Len=7 TSval=268663023 TSecr=784810313
	9 1571428959.296401  10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [ACK] Seq=2 Ack=61 Win=64768 Len=0 TSval=784810381 TSecr=268663023
   10 1571428964.233675 10.10.93.144 → 10.3.117.71  TCP 59 9999 → 43108 [PSH, ACK] Seq=61 Ack=2 Win=29056 Len=7 TSval=268667961 TSecr=784810381
   11 1571428964.233781  10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [ACK] Seq=2 Ack=68 Win=64768 Len=0 TSval=784815318 TSecr=268667961
   12 1571428969.238628 10.10.93.144 → 10.3.117.71  TCP 52 9999 → 43108 [FIN, ACK] Seq=68 Ack=2 Win=29056 Len=0 TSval=268672966 TSecr=784815318
   13 1571428969.238662  10.3.117.71 → 10.10.93.144 TCP 52 43108 → 9999 [ACK] Seq=2 Ack=69 Win=64768 Len=0 TSval=784820323 TSecr=268672966

But in the slirp packet capture, frame 12 is a RST instead, and the connection is reset:

$ tshark -te -r /tmp/slirp.pcap
	1 1571428959.090986   10.0.2.100 → 10.10.93.144 TCP 74 51074 → 9999 [SYN] Seq=0 Win=65480 Len=0 MSS=65480 SACK_PERM=1 TSval=2681814203 TSecr=0 WS=128
	2 1571428959.160236 10.10.93.144 → 10.0.2.100   TCP 58 9999 → 51074 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=65480
	3 1571428959.160324   10.0.2.100 → 10.10.93.144 TCP 54 51074 → 9999 [ACK] Seq=1 Ack=1 Win=65480 Len=0
	4 1571428959.160532   10.0.2.100 → 10.10.93.144 TCP 54 51074 → 9999 [FIN, ACK] Seq=1 Ack=1 Win=65480 Len=0
	5 1571428959.160709 10.10.93.144 → 10.0.2.100   TCP 54 9999 → 51074 [ACK] Seq=1 Ack=2 Win=65535 Len=0
	6 1571428959.228560 10.10.93.144 → 10.0.2.100   TCP 107 9999 → 51074 [PSH, ACK] Seq=1 Ack=2 Win=65535 Len=53
	7 1571428959.228619   10.0.2.100 → 10.10.93.144 TCP 54 51074 → 9999 [ACK] Seq=2 Ack=54 Win=65427 Len=0
	8 1571428959.296468 10.10.93.144 → 10.0.2.100   TCP 61 9999 → 51074 [PSH, ACK] Seq=54 Ack=2 Win=65535 Len=7
	9 1571428959.296524   10.0.2.100 → 10.10.93.144 TCP 54 51074 → 9999 [ACK] Seq=2 Ack=61 Win=65420 Len=0
   10 1571428964.233847 10.10.93.144 → 10.0.2.100   TCP 61 9999 → 51074 [PSH, ACK] Seq=61 Ack=2 Win=65535 Len=7
   11 1571428964.233903   10.0.2.100 → 10.10.93.144 TCP 54 51074 → 9999 [ACK] Seq=2 Ack=68 Win=65413 Len=0
   12 1571428969.238721 10.10.93.144 → 10.0.2.100   TCP 54 9999 → 51074 [RST, ACK] Seq=68 Ack=2 Win=65535 Len=0
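
The difference between the two captures comes down to the flags on the last segment sent from port 9999. As a rough illustration, the Python 3 sketch below pulls the flag sets out of tshark summary lines like the ones shown above. The regular expression and the flags_from_port helper are assumptions made for this example and only handle the one-line summary format that appears in this report.

```python
import re

# Matches the "srcport → dstport [FLAGS]" part of a tshark summary line,
# e.g. "9999 → 51074 [RST, ACK]". Some locales print "->" instead of "→".
SEGMENT = re.compile(r"(\d+)\s+(?:→|->)\s+(\d+)\s+\[([A-Z, ]+)\]")


def flags_from_port(lines, src_port):
    """Return the TCP flag sets of segments sent from src_port, in order."""
    result = []
    for line in lines:
        match = SEGMENT.search(line)
        if match and int(match.group(1)) == src_port:
            result.append(set(match.group(3).split(", ")))
    return result


# Frame 12 from each capture above (TCP timestamp options trimmed).
HOST_FRAME_12 = ("12 1571428969.238628 10.10.93.144 → 10.3.117.71  TCP 52 "
                 "9999 → 43108 [FIN, ACK] Seq=68 Ack=2 Win=29056 Len=0")
SLIRP_FRAME_12 = ("12 1571428969.238721 10.10.93.144 → 10.0.2.100   TCP 54 "
                  "9999 → 51074 [RST, ACK] Seq=68 Ack=2 Win=65535 Len=0")

if __name__ == "__main__":
    print(flags_from_port([HOST_FRAME_12], 9999)[-1])
    print(flags_from_port([SLIRP_FRAME_12], 9999)[-1])
```

Feeding it the full output of `tshark -te -r` for each capture and comparing the last entry per file shows whether the server's FIN survived the trip through slirp4netns.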

Either slirp4netns tears the connection down too early because of the client's earlier shutdown(), so that the client receives a RST, or there is some other problem in how a FIN is handled after shutdown() has been called.

While the client application receives all of the data in this scenario, the RST raises an exception in client applications that are otherwise shutting down gracefully.

Giuseppe Scrivano has authored a patch that resolves this scenario, and additional debugging steps are present in Red Hat bug bz1763454. He'll follow up on this issue with a PR shortly.
