At peak hours, deleting files can kill your server: traffic can easily drop to a fraction of normal if the nginx cache manager starts deleting files!
The server is perfectly normal, but suddenly it gets loaded and all nginx processes are in D (“Disk sleep”) state.
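To confirm the symptom, here is a minimal sketch that lists nginx processes currently in D state by reading /proc on Linux. The function names parse_stat_line and nginx_in_disk_sleep are made up for this illustration, and it assumes the process name reported by the kernel is exactly "nginx".

```python
import os


def parse_stat_line(line):
    """Parse one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def nginx_in_disk_sleep(proc_root="/proc"):
    """Return the PIDs of nginx processes that are in D (disk sleep) state."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile or the line was unreadable
        # Assumption: the kernel-reported name of the processes is "nginx".
        if comm == "nginx" and state == "D":
            blocked.append(pid)
    return blocked


if __name__ == "__main__":
    for pid in nginx_in_disk_sleep():
        print(pid)
```

If most or all nginx PIDs show up here during the slowdown, the processes are blocked waiting on disk IO, which matches the picture described above.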
What could it be? What is going on with your proxy server?
Probably the cache is full!
Unfortunately, there is no way to check live how full the cache is. Only an upgrade or restart of the nginx process will trigger the nginx cache loader to check all the cache files, and it will write the cache size to the error log on exit. Be careful, though: cache loading is also an IO-intensive operation (it stats all the cache files, and there could be millions of images).
If you are sure the cache manager is to blame for the IO on your server (probably using this method: Check whether nginx cache manager is deleting files at the moment), you can stop it almost immediately!
Just increase the nginx cache size drastically: add a zero to the maximum cache size.
Of course, you should have enough free space until you resolve the problem, for example with more servers, manual deletion during off-peak hours, tuning your cache deletion, or any other solution.
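If you still need a rough number from outside nginx, a sketch like the following can sum the file sizes under the cache directory. This is not something nginx provides. The helper name cache_usage is hypothetical, the result may differ somewhat from nginx's own accounting, and the walk stats every file, so it has the same IO cost warned about above. Run it off-peak only.

```python
import os


def cache_usage(cache_dir):
    """Return (total_bytes, file_count) for all files under cache_dir."""
    total = 0
    count = 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            try:
                # lstat reads the file's metadata; this is the IO-heavy part.
                total += os.lstat(os.path.join(root, name)).st_size
                count += 1
            except OSError:
                continue  # the file was removed while we were walking
    return total, count


if __name__ == "__main__":
    # /mnt/cache is the example path used in this article.
    size, files = cache_usage("/mnt/cache")
    print(f"{size / 1024 ** 3:.1f} GiB in {files} files")
```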
Search for something like:
proxy_cache_path /mnt/cache levels=1:2 keys_zone=CACHESTATICS:900m inactive=710h max_size=400g;
And add a zero to the max_size number, like:
proxy_cache_path /mnt/cache levels=1:2 keys_zone=CACHESTATICS:900m inactive=710h max_size=4000g;
The max size will increase from 400G to 4000G (4T)!
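The same edit can be scripted if you have several servers to fix in a hurry. This is only a sketch: add_zero_to_max_size is a hypothetical helper that appends a zero to the number in the first max_size= parameter of a line and leaves everything else untouched.

```python
import re

# Matches max_size=<digits><optional unit>; nginx size units are k, m and g.
_MAX_SIZE = re.compile(r"(max_size=)(\d+)([kKmMgG]?)")


def add_zero_to_max_size(line):
    """Multiply the max_size value in a proxy_cache_path line by ten."""
    return _MAX_SIZE.sub(
        lambda m: f"{m.group(1)}{m.group(2)}0{m.group(3)}",
        line,
        count=1,
    )


if __name__ == "__main__":
    before = ("proxy_cache_path /mnt/cache levels=1:2 "
              "keys_zone=CACHESTATICS:900m inactive=710h max_size=400g;")
    print(add_zero_to_max_size(before))
```

After changing the configuration, test it with nginx -t and apply it with nginx -s reload. A reload is enough, no restart is needed.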
This will effectively stop the file deletion, and the nginx cache manager will sleep for a long time before it is invoked again to delete files. This could be a life-saving operation for your server at peak!
Here is a real graph from one of our servers: the cache manager started deleting files from the cache and the traffic dropped by 99%!
SCREENSHOT 1) The nginx cache manager just started to delete files from the cache and this operation just killed our server completely.
You can see almost zero bandwidth! The problem was resolved when we reloaded nginx with a bigger cache max_size value. The nginx cache manager immediately went to sleep and there was no more IO for deleting files. The load of the server returned to normal!
SCREENSHOT 2) Hard drives were saturated and the disk maxed the IO time to 10 ms.
Despite the higher READ and WRITE IOPS, there was 95-99% less traffic.
Afterwards, you can tune the values for deleting files from the cache (see: Tune nginx proxy cache – control the cache manager how to delete cached files).