We have a master-slave setup of Redis servers, and after some time the master server began to refuse connections with:
Error: Connection reset by peer
Looking at the Redis server's log in “/var/log/redis/redis-server.log” (the Ubuntu default location):
redis-server.log-13447:M 17 Jan 15:28:58.719 # Error registering fd event for the new client: Numerical result out of range (fd=24099)
redis-server.log-13447:M 17 Jan 15:28:58.729 # Error registering fd event for the new client: Numerical result out of range (fd=24099)
redis-server.log-13447:M 17 Jan 15:28:58.779 # Error registering fd event for the new client: Numerical result out of range (fd=24099)
redis-server.log-13447:M 17 Jan 15:28:59.723 # Error registering fd event for the new client: Numerical result out of range (fd=24099)
redis-server.log-13447:M 17 Jan 15:28:59.731 # Error registering fd event for the new client: Numerical result out of range (fd=24099)
redis-server.log-13447:M 17 Jan 15:28:59.782 # Error registering fd event for the new client: Numerical result out of range (fd=24099)
redis-server.log-13447:M 17 Jan 15:29:00.725 # Error registering fd event for the new client: Numerical result out of range (fd=24099)
redis-server.log-13447:M 17 Jan 15:29:00.732 # Error registering fd event for the new client: Numerical result out of range (fd=24099)
redis-server.log-13447:M 17 Jan 15:29:00.784 # Error registering fd event for the new client: Numerical result out of range (fd=24099)
It looked like the Redis server process had run out of file descriptors, but why?
Here is why:
srv-redis1 # lsof -n|grep redis|grep FIFO|wc -l
96264
srv-redis1 # netstat -anp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:6379 0.0.0.0:* LISTEN 13447/redis-server
.....
.....
.....
redis-ser 13447 redis 51w FIFO 0,10 0t0 403873809 pipe
redis-ser 13447 redis 52r FIFO 0,10 0t0 403866724 pipe
redis-ser 13447 redis 53w FIFO 0,10 0t0 403866724 pipe
redis-ser 13447 redis 54r FIFO 0,10 0t0 403868523 pipe
redis-ser 13447 redis 55w FIFO 0,10 0t0 403868523 pipe
redis-ser 13447 redis 56w FIFO 0,10 0t0 403870163 pipe
......
......
Almost 100 000 FIFO pipes?
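To keep an eye on this without piping the whole lsof output through grep, the same numbers can be read straight from /proc. What follows is a minimal sketch, not a tested tool. It assumes a Linux host (as with this Ubuntu setup) and permission to read the process's /proc entries (root or the redis user). The function name fd_report is hypothetical. It counts all open descriptors of one PID, counts how many of them are pipes, and prints the soft "Max open files" limit from /proc/PID/limits.

```shell
#!/usr/bin/env bash
# Minimal sketch (Linux only): summarize the open file descriptors of a process
# by reading /proc/PID/fd and /proc/PID/limits. Needs permission to read the
# target's /proc entries (root or the same user, e.g. redis).
# fd_report is a hypothetical helper name, not part of Redis or lsof.

fd_report() {
  local pid=$1 total=0 pipes=0 fd target soft
  for fd in /proc/"$pid"/fd/*; do
    # Skip descriptors closed while iterating (or an unmatched glob).
    target=$(readlink "$fd") || continue
    total=$((total + 1))
    # Anonymous pipes (the FIFO rows in lsof) show up as "pipe:[inode]".
    case $target in
      pipe:*) pipes=$((pipes + 1)) ;;
    esac
  done
  # The 4th field of the "Max open files" line is the soft limit.
  soft=$(awk '/^Max open files/ {print $4}' /proc/"$pid"/limits)
  echo "pid=$pid total=$total pipes=$pipes soft_limit=$soft"
}
```

For the master above, you would run it as `fd_report 13447`, using the PID shown by netstat. If the pipe count keeps climbing between runs while the number of clients stays flat, that would suggest the same pipe leak rather than legitimate load.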
The only solution is to restart the server, and it is probably a good idea to upgrade to the latest version!
srv-redis1 redis # systemctl restart redis-server.service
Our version was already the latest in the 4.x branch (4.0.11-1chl1~xenial1) at the time!
Local and remote connections to the master were impossible. Interestingly, our software on the servers that were already connected kept working fine, but any server that was restarted could no longer connect to the master! Here is what you might see:
srv ~ # redis-cli -h 10.10.10.10 -p 6379
10.10.10.10:6379> INFO
Error: Connection reset by peer
10.10.10.10:6379>
and even on localhost, on the master itself:
srv-redis1 redis # redis-cli
127.0.0.1:6379> INFO
Error: Server closed the connection
127.0.0.1:6379> INFO
Error: Connection reset by peer
127.0.0.1:6379>
Increase the “Number of File Descriptors”
You can increase the file descriptor limit for the Redis server in the systemd service file “/lib/systemd/system/redis-server.service”:
.....
LimitNOFILE=100000
.....
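Files under /lib/systemd/system belong to the package and can be replaced when it is upgraded. That matters here because upgrading Redis is recommended above. A systemd drop-in under /etc/systemd/system/redis-server.service.d/ overrides just this one setting and survives package upgrades. Running `systemctl edit redis-server.service` creates such an override interactively. The snippet below is a minimal sketch of the same idea as a script. The function name write_nofile_override and the file name limits.conf are hypothetical. The directory is a parameter so the function can be tried in a scratch location first.

```shell
#!/usr/bin/env bash
# Minimal sketch: write a systemd drop-in that overrides LimitNOFILE for a unit,
# instead of editing the packaged unit file under /lib/systemd/system.
# On the Redis master the directory would be
# /etc/systemd/system/redis-server.service.d (any *.conf file name works;
# "limits.conf" and the function name are hypothetical).

write_nofile_override() {
  local dir=$1 limit=$2
  # This sketch only accepts a plain decimal number.
  case $limit in
    ''|*[!0-9]*) echo "limit must be a number" >&2; return 1 ;;
  esac
  mkdir -p "$dir" || return 1
  # Write the [Service] section with the new limit.
  cat > "$dir/limits.conf" <<EOF
[Service]
LimitNOFILE=$limit
EOF
}
```

On the server, the next steps would be `systemctl daemon-reload` and `systemctl restart redis-server.service`. Running `systemctl cat redis-server.service` shows the drop-in merged with the packaged unit. The "Max open files" line in /proc/PID/limits of the new process should then show the raised value.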
But you will probably hit the problem again. It is most likely a bug very similar to this one: https://github.com/antirez/redis/issues/2857