If you have a process that is hung because of the network, for example a client or server program stuck waiting on a read from a socket, you can use lsof and gdb to shut down the network socket, simulating the other (remote) end closing it, so that the process continues operating normally.
In our case, a couple of nrpe processes were hung in a read from a network socket:
[root@srv ~]# lsof -n -p 9948
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
nrpe 9948 nrpe cwd DIR 9,2 4096 2 /
nrpe 9948 nrpe rtd DIR 9,2 4096 2 /
nrpe 9948 nrpe txt REG 9,2 69960 1053396 /usr/sbin/nrpe
nrpe 9948 nrpe mem REG 9,2 62184 1053312 /usr/lib64/libnss_files-2.17.so
nrpe 9948 nrpe mem REG 9,2 402384 1051943 /usr/lib64/libpcre.so.1.2.0
nrpe 9948 nrpe mem REG 9,2 155784 1057231 /usr/lib64/libselinux.so.1
nrpe 9948 nrpe mem REG 9,2 144792 1051919 /usr/lib64/libpthread-2.17.so
nrpe 9948 nrpe mem REG 9,2 106848 1053314 /usr/lib64/libresolv-2.17.so
nrpe 9948 nrpe mem REG 9,2 15688 1051678 /usr/lib64/libkeyutils.so.1.5
nrpe 9948 nrpe mem REG 9,2 58728 1051843 /usr/lib64/libkrb5support.so.0.1
nrpe 9948 nrpe mem REG 9,2 90664 1051808 /usr/lib64/libz.so.1.2.7
nrpe 9948 nrpe mem REG 9,2 19776 1053308 /usr/lib64/libdl-2.17.so
nrpe 9948 nrpe mem REG 9,2 210840 1051701 /usr/lib64/libk5crypto.so.3.1
nrpe 9948 nrpe mem REG 9,2 15920 1051682 /usr/lib64/libcom_err.so.2.1
nrpe 9948 nrpe mem REG 9,2 963576 1051755 /usr/lib64/libkrb5.so.3.3
nrpe 9948 nrpe mem REG 9,2 320408 1051956 /usr/lib64/libgssapi_krb5.so.2.2
nrpe 9948 nrpe mem REG 9,2 2173512 1051792 /usr/lib64/libc-2.17.so
nrpe 9948 nrpe mem REG 9,2 42520 1051997 /usr/lib64/libwrap.so.0.7.6
nrpe 9948 nrpe mem REG 9,2 117680 1053310 /usr/lib64/libnsl-2.17.so
nrpe 9948 nrpe mem REG 9,2 2512832 1051648 /usr/lib64/libcrypto.so.1.0.2k
nrpe 9948 nrpe mem REG 9,2 470360 1051690 /usr/lib64/libssl.so.1.0.2k
nrpe 9948 nrpe mem REG 9,2 164240 1049135 /usr/lib64/ld-2.17.so
nrpe 9948 nrpe 0r CHR 1,3 0t0 1028 /dev/null
nrpe 9948 nrpe 1w CHR 1,3 0t0 1028 /dev/null
nrpe 9948 nrpe 2w CHR 1,3 0t0 1028 /dev/null
nrpe 9948 nrpe 3u unix 0xffff961d48d37000 0t0 19091 socket
nrpe 9948 nrpe 6u IPv4 261850576 0t0 TCP 10.10.10.10:5666->10.10.10.254:39056 (ESTABLISHED)
As you can see, the FD column shows the file descriptor number of each open file (for the network connection here it is 6, from the entry 6u). You can use that number with gdb to shut the network socket down from the same machine, as if the remote end had closed it.
[root@srv ~]# gdb -p 9948
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 9948
Reading symbols from /usr/sbin/nrpe...Reading symbols from /usr/sbin/nrpe...(no debugging symbols found)...done.
(no debugging symbols found)...done.
....
....
Loaded symbols for /lib64/libpcre.so.1
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
0x00007f91e8295c70 in __read_nocancel () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install nrpe-3.2.0-6.el7.x86_64
(gdb) call shutdown(6, 0)
$1 = 0
(gdb) quit
A debugging session is active.

        Inferior 1 [process 9948] will be detached.

Quit anyway? (y or n) Y
Detaching from program: /usr/sbin/nrpe, process 9948
At the (gdb) prompt, just run
call shutdown(FileDescriptorID, 0)
and quit gdb. In our case the FileDescriptorID is 6, so we executed
call shutdown(6, 0)
The second argument, 0, is SHUT_RD on Linux: it shuts down the receiving side of the socket, so the blocked read returns as if the remote end had closed the connection, and the nrpe process can continue its execution.
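To see why the call unblocks the process, here is a minimal Python sketch of the same mechanism, assuming Linux. It is not what nrpe or gdb do internally: a local socket pair stands in for the TCP connection, and a thread named `blocked_reader` (a made-up name for this illustration) stands in for the hung process.

```python
import socket
import threading
import time

# A connected local socket pair stands in for the real TCP connection
# (assumption for illustration; the original case is a TCP socket).
local_end, remote_end = socket.socketpair()

result = {}

def blocked_reader():
    # Blocks here like the hung process: the peer never sends anything.
    result["data"] = local_end.recv(1024)

reader = threading.Thread(target=blocked_reader)
reader.start()
time.sleep(0.2)  # give the thread time to block in recv()

# Equivalent of gdb's `call shutdown(6, 0)`: how=0 is SHUT_RD on Linux.
local_end.shutdown(socket.SHUT_RD)
reader.join(timeout=5)

still_blocked = reader.is_alive()
print("SHUT_RD =", socket.SHUT_RD)
print("reader still blocked:", still_blocked)
print("recv returned:", result.get("data"))

local_end.close()
remote_end.close()
```

On Linux the blocked `recv` should return an empty result, which is the same end-of-stream signal a program sees when the remote side closes the connection. Other operating systems may treat a read-side shutdown differently.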
Of course, in your case you may have to look for one specific network connection among many others, but lsof is the tool to use to identify the connection and the right file descriptor number to pass to gdb.
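When there are many connections, picking out the descriptor by eye gets tedious. The sketch below shows one way to do it in Python, assuming the default `lsof -n -p PID` column layout shown above. The helper names `tcp_socket_fds` and `gdb_shutdown_command` are made up for this example. The second helper only builds a non-interactive gdb command line (using gdb's `-batch` and `-ex` options) and does not run it.

```python
import re

def tcp_socket_fds(lsof_output):
    """Return (fd, name) pairs for TCP sockets in default `lsof -n -p PID` output."""
    found = []
    for line in lsof_output.splitlines():
        fields = line.split()
        # Assumed columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME...
        if len(fields) < 9 or fields[7] != "TCP":
            continue
        match = re.match(r"(\d+)", fields[3])  # "6u" -> 6
        if match:
            found.append((int(match.group(1)), " ".join(fields[8:])))
    return found

def gdb_shutdown_command(pid, fd):
    """Build (but do not run) a gdb command that shuts down the read side of fd."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", f"call shutdown({fd}, 0)"]

# Two lines taken from the lsof output above.
sample = """\
nrpe 9948 nrpe 3u unix 0xffff961d48d37000 0t0 19091 socket
nrpe 9948 nrpe 6u IPv4 261850576 0t0 TCP 10.10.10.10:5666->10.10.10.254:39056 (ESTABLISHED)
"""

sockets = tcp_socket_fds(sample)
command = gdb_shutdown_command(9948, sockets[0][0])
print(sockets)
print(command)
```

Check the selected connection before running the resulting command as root, since shutting down the wrong descriptor affects a live process. Newer gdb versions may refuse to call a function without debug info unless the call is cast, for example `call (int)shutdown(6, 0)`.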