Tune nginx proxy cache – control the cache manager how to delete cached files

In most cases you’ll never want to modify the default settings for deleting cache items with proxy_cache_path directives. The problem is in a peak the file deleting could impact your server performance and even it could kill your server leaving it unresponsive for a period of time. You cannot instruct nginx with a schedule job for deletion cached items or ban the deletion when the server is busy or loaded. The manager just traces each zone for used cache capacity versus the maximum allowed size and if the used capacity is near or bigger than the maximum allowed size (max_size) the manager process triggers deletion with the default values – the nginx manager will try to delete at least 100 files (up to 200 milliseconds) and then it will sleep for 50 milliseconds then again it will try deleting 100 files. So your file system could receive at least 1000 files per second to delete!

This could lead your server to almost unresponsive state in the peaks.

And it could be perfectly OK in off-peaks, but there is no way how to tell nginx cache manager there is a plenty free space despite you reach the cache limit so at the moment it is not the best time to delete the cache!

You can tune three parameters per cache directory (manual here: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_path):

  • manager_files – not more than this number of files to delete in one iteration. The default value is 100.
  • manager_threshold – limit the delete iteration time. The default value is 200 milliseconds and you must use nginx time syntax concatenated to the number you want, for example if you want 500 milliseconds you must use “500 ms”.
  • manager_sleep – how much time to sleep the manager before executing another delete iteration. The default value is 50 milliseconds and here you must use nginx time syntax concatenated to the number you want, for example if you want 500 milliseconds you must use “500 ms”.
        proxy_cache_path /mnt/cache levels=1:2 keys_zone=CACHESTATICS:900m inactive=710h max_size=4000g manager_files=2 manager_sleep=200ms manager_threshold=500ms;

The cache manager will delete not more than 2 files for up to 500 milliseconds and it will sleep 200 milliseconds before another delete iteration.

The best option for loaded servers

The best option for loaded servers with full cache is to balance the free space – delete small amount of files at once to be sure your server will not get loaded even the free space decreases at the peaks (so more files are cached than the nginx manager could delete – you are aware of this and the free space should be enough), but during the off peak (which normally is several times longer than the peak) the nginx manager could catch up with the deleting and it should free up some space (cached files are lesser than the deleted ones). Of course, you should tune this according to your situation.
The main idea is to delete in small amounts of files to not saturate your disks it could take longer to recover the free space, but it will not load your server in peaks. You should consider two things:

  1. Free space – enough free space and to be sure the free space is enough for the peaks, when the cache could grow above the threshold.
  2. Number of deletions per iteration – you should experiment with this. Fist you should be away how many files are added for a period of time, which includes one peak and one off-peak and then to balance the number in such a way that after the period the cache is not above the maximum size. Probably the best is to start with a 24 hours period, which includes at least one peak.

As you can see the example above only 2 files are good enough for an iteration for our case. Taking into account the 200ms sleep between the files’ deletions 10 files at most should be deleted per second. In our case it is not enough for the peak, but for the off-peak, which is 20 hours every 24 hours, is good enough to get into the maximum size limit of the cache.

Here you can learn how to verify your nginx is deleting cache files and the impact of the default settings on a busy server in a peak: how to disable effectively the deleting (purging) files from nginx proxy_cache (nginx cache manager process) Our loaded server just stopped serving files and the bandwidth decreased with 99% because nginx cache manager suddenly started deleting cached files.

how to disable effectively the deleting (purging) files from nginx proxy_cache (nginx cache manager process)

In peaks deleting files could kill your server and easily the traffic could degraded multiple times than normal if the nginx cache manager start deleting files!

The server is perfectly normal but suddenly it just get loaded and all nginx processes are in D (“Disk sleep”) state.

What could it be? What is going on with your proxy server?

Probably the cache is full!

Unfortunately there is no way to check how much is filled the cache live – just an upgrade or restart of the nginx process will trigger nginx cache loader to check all the cache files and will write the cache size on exit in the error log – but be careful the cache loading is also IO intensive operation – stats all the cache files and they could be millions images).

If you are sure the cache manager is to blame for the IO of your server (probably using this method – Check whether nginx cache manager is deleting files at the moment), you can stop it almost immediately!

Just increase the nginx cache drastically – add zero to the maximum cache size

Of course, you should have enough free space till you resolve the problem – for example more servers or manual deletion on peak-off or tune your cache deletion or any other solution….
Search for something like

        proxy_cache_path /mnt/cache levels=1:2 keys_zone=CACHESTATICS:900m inactive=710h max_size=400g

And add zero to the max_size number like:

        proxy_cache_path /mnt/cache levels=1:2 keys_zone=CACHESTATICS:900m inactive=710h max_size=4000g

The max size will increase from 400G to 4000G (4T)!
This will effectively stop the files deleting and the nginx cache manager will have slept for long time before invoking again to delete files. This could be life saving operation for your server at peak!

Here is a real graph from one of our servers – the cache manager started deleting files from the cache and the traffic dropped 99%!!!

SCREENSHOT 1) The nginx cache manager just started to delete files from the cache and this operation just killed our server completely.

You can see almost zero bandwidth! The problem was resolved when we reloaded nginx with a bigger cache max_size value. The nginx manager immediately went to sleep and no IO for deleting files. The load of the server returned to normal!

main menu
nginx cache manager start deleting files

SCREENSHOT 2) Hard drives were saturated and the disk maxed the IO time to 10 ms.

Despite the bigger READ and WRITE IOPS there was 95-99% less traffic.

main menu
Disk IO Time when cache manager is working

Then you can tune the values for deleting files from the cache – Tune nginx proxy cache – control the cache manager how to delete cached files.

Check whether nginx cache manager is deleting files at the moment

Here is a tip for the webmasters (or system admins) to discover whether the nginx using proxy_cache to cache files is deleting files at the moment! There situation where you may need to know if the loaded of a static media server is caused by the deletion of the cache manager or by the read or seek operations when serving the static files. The deletion is really slow and IO intensive operation, which could greatly impact the performance and traffic of the server.
Find the process nginx’s “cache manager process” and strace it:

[root@srv ~]# ps axuf|grep nginx
root     31582  0.0  0.0 2906768 25108 ?       Ss   Feb15   0:01 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx    16008  1.9  1.3 2941188 440224 ?      S    16:39   1:33  \_ nginx: worker process
nginx    16009  1.5  1.2 2941188 398836 ?      S    16:39   1:12  \_ nginx: worker process
nginx    16010  0.5  0.7 2941984 239064 ?      S    16:39   0:26  \_ nginx: worker process
nginx    16011  0.7  0.9 2941984 299356 ?      D    16:39   0:35  \_ nginx: worker process
nginx    16012  1.2  1.1 2941188 389540 ?      D    16:39   1:01  \_ nginx: worker process
nginx    16013  2.3  1.5 2941188 487324 ?      D    16:39   1:55  \_ nginx: worker process
nginx    16014  0.0  0.6 2906772 224004 ?      S    16:39   0:01  \_ nginx: cache manager process
[root@srv ~]# strace -f -p 16014
strace: Process 16014 attached
gettid()                                = 16014
write(31, "2019/02/25 18:00:31 [info] 16014"..., 89) = 89
epoll_wait(36, [], 512, 5406)           = 0
unlink("/mnt/cache/0/39/c8ccbbc06d16debb1c8d58ceb6f99390") = 0
unlink("/mnt/cache/0/78/118924d7bf70e20fa8f790c6f9e7c780") = 0
unlink("/mnt/cache/3/ce/fab074cc670e6a80114dcbc398a63ce3") = 0
unlink("/mnt/cache/5/48/0b4e162dd7be8244815721fb7d68e485") = 0
unlink("/mnt/cache/5/56/e5eb4b38c7c8d209d0aabaf79ac02565") = 0
unlink("/mnt/cache/e/c6/207b432fa77375e4eefcaf52db250c6e") = 0
unlink("/mnt/cache/4/6d/ac0db27a03dabc79d869068db1b516d4") = 0
unlink("/mnt/cache/9/e8/91625c6e60de8e5425c4135c7dfb2e89") = 0
unlink("/mnt/cache/b/3c/f3c53000cf0cb20d55d8c09df8a733cb") = 0
unlink("/mnt/cache/f/f7/6f06423cd411b45816969fe020903f7f") = 0
unlink("/mnt/cache/f/50/c9b8ab72821a6e9bcb9c8d4b790dc50f") = 0
unlink("/mnt/cache/6/1f/74b0f1fdf1ac30db6af7793dc15671f6") = 0
unlink("/mnt/cache/0/83/caf199c1b99d438f96caec71bf2ea830") = 0
unlink("/mnt/cache/4/3d/c90f8fbbba4aaf407e386641dc2203d4") = 0
unlink("/mnt/cache/4/ad/d23cf8598020141b2bcec46d2b5cbad4") = 0
unlink("/mnt/cache/d/47/05973bc310503f36c67b7c1c24c8247d") = 0
unlink("/mnt/cache/f/11/e4fcbde8533d89105ab41f22c55e211f") = 0
unlink("/mnt/cache/2/06/29066a58e4116d24266026b4ed1e3062") = 0
epoll_wait(32, [], 512, 50)             = 0
unlink("/mnt/cache/4/6b/9a104ebdf70d00137a88d4584b2bb6b4") = 0
unlink("/mnt/cache/e/95/6d176447f57f21769d86a8f0b2a8b95e") = 0
unlink("/mnt/cache/b/b2/2f6f51163c65ae1fc06a913d6de1ab2b") = 0
unlink("/mnt/cache/a/24/2b058045a23b69de7a4442c9e6fce24a") = 0
unlink("/mnt/cache/7/60/00833e0b236ca8472f5be8227d645607") = 0
unlink("/mnt/cache/a/08/bf00eea300eff97dc4fffa61daaca08a") = 0
unlink("/mnt/cache/2/48/a291d8aca2b6f4f9471686eabe9b2482") = 0
unlink("/mnt/cache/0/e3/2d631adbc3bfdf8e44a51fa5453eee30") = 0
unlink("/mnt/cache/1/3b/08eef7c86c5ece9b5279b304dd86e3b1") = 0
unlink("/mnt/cache/b/a4/03213e4a8a1e8fb17ae698e54e70fa4b") = 0
unlink("/mnt/cache/b/a3/77f1b11811a9cda0ae93c498769f7a3b") = 0
unlink("/mnt/cache/4/01/1d50fac60681ae3263c8875775d20014") = 0
unlink("/mnt/cache/c/94/e71b96cbc65b248bd8e4540cbd69294c") = 0
unlink("/mnt/cache/1/59/99ec58e865b97e217835dd84f5f48591") = 0
unlink("/mnt/cache/4/b8/6a64825ce555b8f2440f051a7f7bcb84") = 0
unlink("/mnt/cache/7/51/fe2acbb895427ed8e406ce7e79d61517") = 0

You can tune the file removing from the cache with manager_files, manager_threshold and manager_sleep arguments of the proxy_cache_path.
If you came here searching information on the topic probably you should check out these articles, too: how to disable effectively the deleting (purging) files from nginx proxy_cache (nginx cache manager process) and Tune nginx proxy cache – control the cache manager how to delete cached files