Close socket as if the remote closed the connection

If a process is hung because of the network, for example a client or server program blocked in a read that never returns or times out, you can use

lsof and gdb

to shut down the network socket, simulating the other (remote) end closing it, so that the process can continue operating normally.

In our case there are a couple of nrpe processes hung in a read from a network socket:

[root@srv ~]# lsof -n -p 9948
COMMAND  PID USER   FD   TYPE             DEVICE SIZE/OFF    NODE NAME
nrpe    9948 nrpe  cwd    DIR                9,2     4096       2 /
nrpe    9948 nrpe  rtd    DIR                9,2     4096       2 /
nrpe    9948 nrpe  txt    REG                9,2    69960 1053396 /usr/sbin/nrpe
nrpe    9948 nrpe  mem    REG                9,2    62184 1053312 /usr/lib64/libnss_files-2.17.so
nrpe    9948 nrpe  mem    REG                9,2   402384 1051943 /usr/lib64/libpcre.so.1.2.0
nrpe    9948 nrpe  mem    REG                9,2   155784 1057231 /usr/lib64/libselinux.so.1
nrpe    9948 nrpe  mem    REG                9,2   144792 1051919 /usr/lib64/libpthread-2.17.so
nrpe    9948 nrpe  mem    REG                9,2   106848 1053314 /usr/lib64/libresolv-2.17.so
nrpe    9948 nrpe  mem    REG                9,2    15688 1051678 /usr/lib64/libkeyutils.so.1.5
nrpe    9948 nrpe  mem    REG                9,2    58728 1051843 /usr/lib64/libkrb5support.so.0.1
nrpe    9948 nrpe  mem    REG                9,2    90664 1051808 /usr/lib64/libz.so.1.2.7
nrpe    9948 nrpe  mem    REG                9,2    19776 1053308 /usr/lib64/libdl-2.17.so
nrpe    9948 nrpe  mem    REG                9,2   210840 1051701 /usr/lib64/libk5crypto.so.3.1
nrpe    9948 nrpe  mem    REG                9,2    15920 1051682 /usr/lib64/libcom_err.so.2.1
nrpe    9948 nrpe  mem    REG                9,2   963576 1051755 /usr/lib64/libkrb5.so.3.3
nrpe    9948 nrpe  mem    REG                9,2   320408 1051956 /usr/lib64/libgssapi_krb5.so.2.2
nrpe    9948 nrpe  mem    REG                9,2  2173512 1051792 /usr/lib64/libc-2.17.so
nrpe    9948 nrpe  mem    REG                9,2    42520 1051997 /usr/lib64/libwrap.so.0.7.6
nrpe    9948 nrpe  mem    REG                9,2   117680 1053310 /usr/lib64/libnsl-2.17.so
nrpe    9948 nrpe  mem    REG                9,2  2512832 1051648 /usr/lib64/libcrypto.so.1.0.2k
nrpe    9948 nrpe  mem    REG                9,2   470360 1051690 /usr/lib64/libssl.so.1.0.2k
nrpe    9948 nrpe  mem    REG                9,2   164240 1049135 /usr/lib64/ld-2.17.so
nrpe    9948 nrpe    0r   CHR                1,3      0t0    1028 /dev/null
nrpe    9948 nrpe    1w   CHR                1,3      0t0    1028 /dev/null
nrpe    9948 nrpe    2w   CHR                1,3      0t0    1028 /dev/null
nrpe    9948 nrpe    3u  unix 0xffff961d48d37000      0t0   19091 socket
nrpe    9948 nrpe    6u  IPv4          261850576      0t0     TCP 10.10.10.10:5666->10.10.10.254:39056 (ESTABLISHED)
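
When a process has many open files, picking the TCP sockets out of a listing like the one above can be scripted. The following is only a sketch: the function name tcp_socket_fds is made up for illustration, and it assumes the default lsof column layout shown above and a command name without spaces.

```python
import re


def tcp_socket_fds(lsof_output):
    """Return (fd, name) pairs for TCP sockets found in default lsof output."""
    found = []
    for line in lsof_output.splitlines():
        fields = line.split()
        # Assumed columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME ...
        if len(fields) < 9:
            continue
        if fields[4] not in ("IPv4", "IPv6") or fields[7] != "TCP":
            continue
        # FD looks like "6u": digits plus an access mode. Entries such as
        # cwd, txt or mem have no leading digits and are skipped.
        match = re.match(r"\d+", fields[3])
        if match:
            found.append((int(match.group()), fields[8]))
    return found


# A few lines copied from the listing above.
SAMPLE = """\
COMMAND  PID USER   FD   TYPE             DEVICE SIZE/OFF    NODE NAME
nrpe    9948 nrpe    2w   CHR                1,3      0t0    1028 /dev/null
nrpe    9948 nrpe    3u  unix 0xffff961d48d37000      0t0   19091 socket
nrpe    9948 nrpe    6u  IPv4          261850576      0t0     TCP 10.10.10.10:5666->10.10.10.254:39056 (ESTABLISHED)
"""

print(tcp_socket_fds(SAMPLE))
# [(6, '10.10.10.10:5666->10.10.10.254:39056')]
```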

As you can see, the FD column shows the file descriptor number of each open file (here the network socket is FD 6; the trailing u means it is open for reading and writing). You can use that number with

gdb

to simulate closing the network socket as if the remote end had closed it, but from the same machine.

[root@srv ~]# gdb -p 9948
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 9948
Reading symbols from /usr/sbin/nrpe...Reading symbols from /usr/sbin/nrpe...(no debugging symbols found)...done.
(no debugging symbols found)...done.
....
....
Loaded symbols for /lib64/libpcre.so.1
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
0x00007f91e8295c70 in __read_nocancel () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install nrpe-3.2.0-6.el7.x86_64
(gdb) call shutdown(6, 0)
$1 = 0
(gdb) quit
A debugging session is active.

        Inferior 1 [process 9948] will be detached.

Quit anyway? (y or n) Y
Detaching from program: /usr/sbin/nrpe, process 9948
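
The interactive session above could also be driven non-interactively, which helps when several processes are stuck. This is a rough sketch, not a tested tool: the helper names are hypothetical, it relies on gdb's -p, -batch and -ex options, and it adds an (int) cast because some gdb versions refuse to call a function whose return type they cannot determine without debug symbols.

```python
import subprocess


def build_gdb_command(pid, fd):
    """Build the gdb command line that calls shutdown(fd, 0) in process pid."""
    # 0 is SHUT_RD on Linux. The (int) cast is an assumption about newer gdb
    # versions; the session above worked without it on gdb 7.6.1.
    return ["gdb", "-p", str(pid), "-batch",
            "-ex", "call (int)shutdown({}, 0)".format(fd)]


def shutdown_fd_in_process(pid, fd):
    """Run gdb in batch mode; it detaches from the process when it exits."""
    return subprocess.run(build_gdb_command(pid, fd),
                          capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Only prints the command for the PID and FD used in this post.
    print(" ".join(build_gdb_command(9948, 6)))
```

As with the interactive session, this needs to run as root or as the owner of the target process.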

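Just call

call shutdown(FileDescriptorID, 0)

and quit gdb. The second argument, 0, is SHUT_RD on Linux, which shuts down the receiving side of the socket; the returned value 0 means the call succeeded. In our case the FileDescriptorID is 6, so we executed

call shutdown(6, 0)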

The read blocked on that socket then returns as if the remote end had closed the connection, so the nrpe process can continue its execution.
Of course, in your case you may have to pick one specific network connection among many others; lsof is the tool for identifying the connection and the right file descriptor number to use in gdb.
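
To see why the hung read wakes up, the effect can be reproduced without gdb. The sketch below is only an illustration under assumptions: it uses a loopback TCP connection in place of the nrpe connection, a thread in place of the hung process, and a made-up function name. It has Linux behaviour in mind; other systems may treat shutdown on the read side differently.

```python
import socket
import threading
import time


def recv_after_local_shutdown(wait=0.2, timeout=5.0):
    """Block in recv() on a TCP socket, then shut down its read side locally."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    # The "remote" end: it connects and then never sends anything.
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    result = {}

    def reader():
        # Blocks like the hung process, because the peer stays silent.
        result["data"] = conn.recv(1024)

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    time.sleep(wait)                # give the reader time to block
    conn.shutdown(socket.SHUT_RD)   # same call as shutdown(fd, 0) in gdb
    thread.join(timeout)
    for sock in (conn, client, listener):
        sock.close()
    return result.get("data")


if __name__ == "__main__":
    print(recv_after_local_shutdown())
```

On Linux the blocked recv() is expected to return an empty bytes object, which is the same end-of-file indication a program sees when the remote end closes the connection.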
