Handling TCP Keepalive


TCP keepalives are useful when one end of a connection disappears without closing it, for example when a NAT device or firewall resets or forcibly drops the connection. The following Python code enables keepalive on a socket: after 60 seconds with no data activity, the kernel sends a keepalive probe. If the probe goes unanswered, the kernel keeps probing every 15 seconds, and after 4 unanswered probes it drops the connection, so the next recv() call fails with "[Errno 110] Connection timed out". The code has been tested with Python 2.7.2 on Ubuntu 12.04.

import sys
import socket
import traceback
import time

def do_work():

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # connect() and each recv() raise socket.timeout if nothing happens within 5 seconds
    sock.settimeout(5.0)

    # check and turn on TCP Keepalive
    x = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
    if x == 0:
        print 'Socket Keepalive off, turning on'
        # setsockopt() returns None, so there is no status value to print
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        # idle seconds before the first probe; overrides sysctl net.ipv4.tcp_keepalive_time
        sock.setsockopt(socket.SOL_TCP, socket.TCP_KEEPIDLE, 60)
        # unanswered probes before the connection is dropped; overrides sysctl net.ipv4.tcp_keepalive_probes
        sock.setsockopt(socket.SOL_TCP, socket.TCP_KEEPCNT, 4)
        # seconds between probes; overrides sysctl net.ipv4.tcp_keepalive_intvl
        sock.setsockopt(socket.SOL_TCP, socket.TCP_KEEPINTVL, 15)
    else:
        print 'Socket Keepalive already on'

    try:
        sock.connect(('192.168.0.120', 8001))
 
    except socket.error:
        print 'Socket connect failed!'
        traceback.print_exc()
        return

    print 'Socket connect worked!'
    while True:
        try:
            # read up to 10 bytes; raises socket.timeout if none arrive within 5 seconds
            req = sock.recv(10)

        except socket.timeout:
            print 'Socket timeout, loop and try recv() again'
            continue

        except socket.error:
            # any other socket error, e.g. [Errno 110] once keepalive gives up on the peer
            traceback.print_exc()
            print 'Other Socket err, exit and try creating socket again'
            break

        if req == '':
            # connection closed by peer, exit loop
            print 'Connection closed by peer'
            break

        print 'Received', req

    try:
        sock.close()
    except socket.error:
        pass


if __name__ == '__main__':
    do_work()

# references
# http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/
# http://www.digi.com/wiki/developer/index.php/Handling_Socket_Error_and_Keepalive
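With the values used above, and assuming the Linux semantics described in the TCP-Keepalive-HOWTO, the worst-case time from the last packet to the connection being dropped is idle + count × interval. The small helper below (keepalive_detection_time is a hypothetical name, not part of any library) only spells out that arithmetic. On Linux, keepalive runs only while there is no unacknowledged outgoing data; a connection with data in flight is governed by retransmission timeouts instead.

```python
def keepalive_detection_time(idle, interval, count):
    """Worst-case seconds from the last packet until the peer is declared dead.

    The first probe goes out after `idle` seconds of silence, then one probe
    every `interval` seconds; the connection is dropped one interval after
    the `count`-th unanswered probe.
    With 60/15/4: probes at 60, 75, 90 and 105 s, connection dropped at 120 s.
    """
    return idle + count * interval


print(keepalive_detection_time(60, 15, 4))  # prints 120
```

With the Linux defaults from tcp(7) (7200 s idle, 75 s interval, 9 probes), the same formula gives 7875 seconds, a little over two hours, which is why the script overrides them per socket.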

To test, start a TCP listener on a different PC using netcat (the command is nc on a Mac):

netcat -l 192.168.0.120 8001
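If netcat is not available on the second machine, a few lines of Python can stand in for it. This is a minimal sketch, not part of the original test setup: make_listener and serve_one are hypothetical helper names, and unlike netcat it does not relay typed input. It sends one greeting so the client prints "Received", then stays idle so keepalive probes are the only traffic.

```python
import socket


def make_listener(host='0.0.0.0', port=8001):
    """Bind and listen on (host, port); port 0 lets the OS pick a free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def serve_one(srv, greeting=b'hello'):
    """Accept one client, send a greeting, then wait until the client goes away.

    Returns the client's address once recv() reports a close or an error.
    """
    conn, addr = srv.accept()
    try:
        conn.sendall(greeting)
        while True:
            try:
                data = conn.recv(4096)
            except socket.error:
                break  # connection reset or timed out
            if not data:
                break  # orderly close by the client
    finally:
        conn.close()
    return addr
```

On the listener machine, `print(serve_one(make_listener()))` blocks until the client connects and later goes away.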

Now disable the network connection, enable a firewall, or power off the router to see keepalive in action. Wireshark highlights keepalive packets when TCP sequence number analysis is enabled.

You’ll see the following messages when the connection times out (with the settings above, roughly two minutes after the last activity):

Traceback (most recent call last):
  File "socket_test.py", line 39, in do_work
    req = sock.recv(10)
error: [Errno 110] Connection timed out
Other Socket err, exit and try creating socket again
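The [Errno 110] in this traceback is ETIMEDOUT, which is how Linux reports that the keepalive probes went unanswered; an expiry of the 5-second settimeout() instead raises socket.timeout with no errno. The helper below is a sketch (classify_recv_error is a hypothetical name) that tells the two apart by checking errno first. That order matters if the script is ported to Python 3.10 or later: there socket.timeout is an alias of TimeoutError, and an OSError carrying ETIMEDOUT is raised as TimeoutError, so the `except socket.timeout` branch in the loop above would treat the keepalive failure as an ordinary idle timeout.

```python
import errno
import socket


def classify_recv_error(exc):
    """Sort an exception raised by recv() into 'dead', 'idle' or 'other'.

    'dead'  - errno ETIMEDOUT: the kernel gave up on the peer
              (keepalive probes or retransmissions went unanswered).
    'idle'  - settimeout() expired with no data; the connection may be fine.
    'other' - any other socket error (reset, unreachable, ...).
    """
    # Check errno first: on Python 3.10+ an ETIMEDOUT error is also an
    # instance of socket.timeout.
    if getattr(exc, 'errno', None) == errno.ETIMEDOUT:
        return 'dead'
    if isinstance(exc, socket.timeout):
        return 'idle'
    return 'other'
```

Inside the recv() loop, a single `except socket.error as e:` clause could call this helper, `continue` on 'idle', and break out to reconnect on 'dead' or 'other'.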

On Mac and Windows you can enable keepalive, but you cannot set TCP_KEEPIDLE and the other tuning parameters this way. With Python 2.7.2 from MacPorts, the script fails with the following error:

Traceback (most recent call last):
  File "socket_test.py", line 65, in <module>
    do_work()
  File "socket_test.py", line 19, in do_work
    sock.setsockopt(socket.SOL_TCP, socket.TCP_KEEPIDLE, 60)
AttributeError: 'module' object has no attribute 'TCP_KEEPIDLE'
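One way to keep the same script running on those platforms is to look the tuning constants up with getattr() and skip the ones this Python build does not define. This is a sketch under assumptions: enable_keepalive is a hypothetical helper; TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are the Linux names used above; socket.TCP_KEEPALIVE is the macOS idle-time option, which only recent Python versions expose. On Windows, sock.ioctl(socket.SIO_KEEPALIVE_VALS, ...) is the usual alternative and is not shown here. Options that are missing simply stay at the system defaults.

```python
import socket


def enable_keepalive(sock, idle=60, interval=15, count=4):
    """Turn on TCP keepalive and apply whichever tuning options exist here.

    Returns the names of the options that were set; anything missing stays
    at the operating system's default.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = ['SO_KEEPALIVE']
    # Linux names; getattr() returns None where this Python build lacks them.
    for name, value in (('TCP_KEEPIDLE', idle),
                        ('TCP_KEEPINTVL', interval),
                        ('TCP_KEEPCNT', count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied.append(name)
    # macOS spells the idle-time option TCP_KEEPALIVE (recent Pythons only).
    if 'TCP_KEEPIDLE' not in applied and hasattr(socket, 'TCP_KEEPALIVE'):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle)
        applied.append('TCP_KEEPALIVE')
    return applied


if __name__ == '__main__':
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(enable_keepalive(s))
    s.close()
```

Calling this before connect() replaces the getsockopt()/setsockopt() block in do_work(), and the returned list shows which settings actually took effect on the current platform.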

3 thoughts on “Handling TCP Keepalive”

  1. That is an interesting mechanism; I will try it. But if you use the code below, it will defeat the issue “TCP peer is being quiet – or if the TCP socket has gone away” as quoted on the reference link: zero data is caught by the socket.timeout exception, isn’t it?

            if req == '':
                # connection closed by peer, exit loop
                print 'Connection closed by peer'
                break
    
  2. You mean that if there isn’t any data, socket.timeout is raised? Yes, but we loop back and check for data again. If no exception is raised and yet there is no data, that is when we have a problem, most probably due to TCP Keepalive.

  3. Yes, my bad, you’re right about the code. I did some tests: assuming we don’t have TCP keepalive, if there is no data (meaning no activity, as opposed to receiving 0 bytes), socket.timeout is raised, so I maintain a counter of timeouts and close the socket accordingly. Indeed, a 0-byte read happens only when the remote peer makes a proper call to close(), so your code is correct to use req == '' to detect a close from the remote peer. In addition, I probably need the TCP keepalive mechanism, because my router doing NAT seems to send a reconnect to my server socket after only a few seconds, or the reconnect is coming from the client application; I am unsure which. Do you think TCP keepalive packets can keep the socket connection alive through the NAT router and stop it from continuously sending reconnects?
