nginx proxy cache and expires directive – pass-through the origin cache control

Proxying static content sometimes requires to modify the expire directive on the proxy server, but sometimes it may just need to pass the origin expire directive. What if the “expires” is defined in the server section and ones need to pass through the value from the origin by the proxy server?

Simply switch off the expires by “expires off”. It will disable an earlier definition in the block it is used and the proxy answer to the client will include the origin header for the cache control.

It won’t disable the cache control meaning to add no-cache in proxy answer to the client. So if it is used “expires” in a block and a pass-through from the origin is required, just make a location block with “expires off”:

        server {
                listen          10.10.10.10:443 ssl http2;
                server_name     srv1.example.com;

                ssl_certificate  /etc/ssl/nginx/srv1.example.com.chain.crt;
                ssl_certificate_key /etc/ssl/nginx/srv1.example.com.key;

                resolver 8.8.8.8;

                client_max_body_size 12m;

                expires         -1;
                root            /mnt/storage/web/root;
                access_log      /mnt/storage/web/logs/srv1.example.com;
                error_log       /mnt/storage/web/logs/srv1.example.com warn;

                location / {
                        #proxy
                        proxy_buffer_size   128k;
                        proxy_buffers   4 256k;
                        proxy_busy_buffers_size   256k;
                        proxy_buffering off;
                        proxy_read_timeout 600;
                        proxy_send_timeout 600;
                        proxy_store off;
                        proxy_cache off;
                        proxy_redirect off;
                        proxy_no_cache $cookie_nocache $arg_nocache $arg_comment;
                        proxy_no_cache $http_pragma $http_authorization;
                        proxy_cache_bypass $cookie_nocache $arg_nocache $arg_comment;
                        proxy_cache_bypass $http_pragma $http_authorization;

                        proxy_pass https://https_backend;
                        proxy_http_version 1.1;
                        proxy_set_header Connection "";
                        proxy_set_header Host $host;
                        proxy_set_header        X-Real-IP-EXAMPLE       $remote_addr;
                        proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
                }

                location ~* \.jpeg {
                        expires off;
                        #proxy
                        proxy_buffer_size   128k;
                        proxy_buffers   4 256k;
                        proxy_busy_buffers_size   256k;
                        proxy_buffering off;
                        proxy_read_timeout 600;
                        proxy_send_timeout 600;
                        proxy_store off;
                        proxy_cache off;
                        proxy_redirect off;
                        proxy_no_cache $cookie_nocache $arg_nocache $arg_comment;
                        proxy_no_cache $http_pragma $http_authorization;
                        proxy_cache_bypass $cookie_nocache $arg_nocache $arg_comment;
                        proxy_cache_bypass $http_pragma $http_authorization;
                        proxy_ignore_headers "Expires" "Cache-Control";


                        proxy_pass https://https_backend;
                        proxy_http_version 1.1;
                        proxy_set_header Connection "";
                        proxy_set_header Host $host;
                        proxy_set_header        X-Real-IP-EXAMPLE       $remote_addr;
                        proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
                }

                 location ~* \.(jpg|gif|png|css|mcss|js|mjs|woff|woff2)$ {
                        expires 30d;
                        #proxy
                        proxy_buffer_size   128k;
                        proxy_buffers   4 256k;
                        proxy_busy_buffers_size   256k;
                        proxy_buffering off;
                        proxy_read_timeout 600;
                        proxy_send_timeout 600;
                        proxy_store off;
                        proxy_cache off;
                        proxy_redirect off;
                        proxy_no_cache $cookie_nocache $arg_nocache $arg_comment;
                        proxy_no_cache $http_pragma $http_authorization;
                        proxy_cache_bypass $cookie_nocache $arg_nocache $arg_comment;
                        proxy_cache_bypass $http_pragma $http_authorization;

                        proxy_pass http://http_backend;
                        proxy_http_version 1.1;
                        proxy_set_header Connection "";
                        proxy_set_header Host $host;
                        proxy_set_header        X-Real-IP-EXAMPLE       $remote_addr;
                        proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
                }
        }

The above sample configuration defines 3 location blocks for 3 different “expires” cases:

  1. Uses the server block “expires -1” – proxy for dynamic code such as proxy site application code. No caching.
  2. Uses the location block .jpeg “expires off” – the proxy will pass-through the value from the origin server. Any value from the origin.
  3. Uses the location block .(gif|png|css|mcss|js|mjs|woff|woff2) “expires 30d” – it will set +30 days expire header no matter of the origin value. Origin server value is replaced by the today +30 days in the future cache cache header.

nginx and proxy_cache not growing cache despite max_size is bigger – shared memory zone to blame

One of our big Nginx cache servers has recently been upgraded to have 70T storage, which is pretty good storage for a proxy. And in a hurry to configure the big storage we changed only the “max_size” option of proxy_cache_path directive! After a week in production, the proxy reached 23T and it just stopped growing with no apparent reason! Space and Nginx max_size were OK 75T total space and 70T for the proxy cache, but no Nginx had not added more space after reaching 23T occupied space for two days, which was impossible because all files were kept for 5 years and 200G per day were generated. No errors in the logs and we even use “virtual host traffic status module” – Live status information like used space and more for nginx proxy cache, but still no clue why it did not grow above this threshold of 23T! And it began to remove cached objects!

proxy_cache_path /mnt/cache levels=1:2 keys_zone=STATIC:900m inactive=42600h max_size=70000g;

It appeared we exhausted the shared memory zone limit for the zone! And Nginx cache just stopped growing.

According to the Nginx manual “One megabyte (of shared memory zone), a zone can store about 8 thousand keys“. Apparently, after 23T of files, we have passed 7 200 000 keys and exhausted the limit we configured in the proxy_cache_path line!

The solution is really simple just increase the limit for the shared memory zone for the zone.

proxy_cache_path /mnt/cache levels=1:2 keys_zone=STATIC:4000m inactive=42600h max_size=70000g;

In the past, with small cache (15T) it was enough to have 900Mbytes for the cache’s shared memory. Now we set it to 4000 Mbytes to be able to store approximately 32 000 000 keys. We have 23T and 900M of shared memory for keys (for our setup, your setup may differ a lot!!!) and setting it to 4000M, which is more than 4 times bigger than before it will probably be enough for the rest free storage to be used at full extent.
Be careful this operation will trigger the “Nginx cache loader” to load the cache index and may produce IO during this operation!

Nginx shared memory zone size

Nginx workers use shared mappings – mmap, which is different from the SYSV and POSIX shared memory (so you cannot use ipcs tool to check for shared memory). You should check how many memory currently the process is using. Here is how you can get the size of the shared memory zone occupied by the Nginx processes and as you can see each Nginx worker is around 900M of column “RSS” (Resident Set Size):

[root@srv ~]# ps -o rss,pid,comm,user,cmd -C nginx
  RSS   PID COMMAND         USER     CMD
904888 3979 nginx           nginx    nginx: worker process
905116 3980 nginx           nginx    nginx: worker process
904828 3981 nginx           nginx    nginx: worker process
905176 3982 nginx           nginx    nginx: worker process
905196 3983 nginx           nginx    nginx: worker process
905008 3984 nginx           nginx    nginx: worker process
904908 3985 nginx           nginx    nginx: worker process
905372 3986 nginx           nginx    nginx: worker process
905088 3987 nginx           nginx    nginx: worker process
902688 3988 nginx           nginx    nginx: worker process
904932 3989 nginx           nginx    nginx: worker process
905032 3990 nginx           nginx    nginx: worker process
26452  3991 nginx           nginx    nginx: cache manager process
33928  8148 nginx           root     nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf

For a single Nginx process:

[root@srv ~]# cat /proc/3981/status |grep RssShmem
RssShmem:         894240 kB

You can check the occupied inodes of your file system with df to get approximately how many files you have:

[root@srv ~]# df -i
Filesystem         Inodes   IUsed      IFree IUse% Mounted on
devtmpfs         16452656     715   16451941   1% /dev
tmpfs            16455999       1   16455998   1% /dev/shm
tmpfs            16455999    1153   16454846   1% /run
tmpfs            16455999      17   16455982   1% /sys/fs/cgroup
/dev/md1          2076704   39687    2037017   2% /
/dev/md3       1214685184 6897020 1207788164   1% /mnt/cache
tmpfs            16455999       5   16455994   1% /run/user/0

inodes around 6 897 020 and not growing for days. This number is very close to the maximum keys, which 900M key shared memory zone may store!

Two days after changing the key shared memory zone limit to 4000Mbytes:

proxy_cache_path /mnt/cache levels=1:2 keys_zone=STATIC:4000m inactive=42600h max_size=70000g;

The Nginx workers passed 900Mbytes RSS (Resident Set Size) and it reached 1Gbyte. The occupied cached sized grew with 1T and continued to grow.

[root@srv ~]# ps -o rss,pid,comm,user,cmd -C nginx
  RSS   PID COMMAND         USER     CMD
52256  8148 nginx           root     nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
1005624 16899 nginx         nginx    nginx: worker process
1005948 16900 nginx         nginx    nginx: worker process
1005936 16901 nginx         nginx    nginx: worker process
1005912 16902 nginx         nginx    nginx: worker process
1005832 16903 nginx         nginx    nginx: worker process
1005836 16904 nginx         nginx    nginx: worker process
1005868 16905 nginx         nginx    nginx: worker process
1005932 16906 nginx         nginx    nginx: worker process
1005796 16907 nginx         nginx    nginx: worker process
1005980 16908 nginx         nginx    nginx: worker process
1005848 16909 nginx         nginx    nginx: worker process
1005888 16910 nginx         nginx    nginx: worker process
26328 16911 nginx           nginx    nginx: cache manager process

The occupied inodes also increased to 7 484 291, which means the cache added around 700 000 new files.

[root@srv ~]# df -i
Filesystem         Inodes   IUsed      IFree IUse% Mounted on
devtmpfs         16452656     715   16451941    1% /dev
tmpfs            16455999       1   16455998    1% /dev/shm
tmpfs            16455999    1153   16454846    1% /run
tmpfs            16455999      17   16455982    1% /sys/fs/cgroup
/dev/md1          2076704   39690    2037014    2% /
/dev/md3       1214685184 7484582 1207200602    1% /mnt/cache
tmpfs            16455999       5   16455994    1% /run/user/0

nginx remote logging to UDP rsyslog server (CentOS 7)

This article will present you all the configuration needed to remotely save access logs of an Nginx web server. All the configuration from the client and server sides is included. The client and the server use CentOS 7 Linux distribution and it the configuration could be used under different Linux distribution. Probably only Selinux rules are kind of specific to the CentOS 7 and the firewalld rules are specific for those who use it as a firewall replacing the iptables. Here is the summary of what to expect:

  • Client-side – nginx configuration
  • Server-side – rsyslog configuration to accept UDP connections
  • Server-side – selinux and firewall configuration

STEP 1) Client-side – the Nginx configuration.

Nginx configuration is pretty simple just a single line with the log template and the IP (and port if not default 514) of the rsyslog server. For the record, this is the official documentation https://nginx.org/en/docs/syslog.html. In addition it worth mentioning there could be multiple access_log directives in a single section to log simultaneously on different targets (and the templates may be different or the same). So you can set the access log output of a section locally and remotely.
Nginx configuration (probably /etc/nginx/nginx.conf or whatever is the organization of your Nginx configuration files.)

server {
     .....
     access_log      /var/log/nginx/example.com_access.log main;
     access_log      syslog:server=10.10.10.2:514,facility=local7,tag=nginx,severity=info main3;
     .....
}

The “main” and “main3” are just names of the logging templates defined earlier (you may check rsyslog remote logging – prevent local messages to appear to see an interesting Nginx logging template).
The error log also could be remotely logged:

error_log syslog:server=10.10.10.3 debug;

STEP 2) Server-side – rsyslog configuration to accept UDP connections.

Of course, if you have not installed the rsyslog it’s high time you installed it with (for CentOS 7):

yum install -y rsyslog

To enable rsyslog to listen for UDP connections your rsyslog configuration file (/etc/rsyslog.conf) must include the following:

$ModLoad imudp
$UDPServerRun 514

Most of the Linux distributions have these two lines commented so you just need to uncomment them by removing the “#” from the beginning of the lines. If the lines are missing just add them under section “MODULES” (it should be near the first lines of the rsyslog configuration file).
Change the 514 with the number you like for the UDP listening port.
Write the client’s incoming lines of information to a different location and prevent merging with the local log messages – rsyslog remote logging – prevent local messages to appear. Include as a first rule under the rules’ section starting with “RULES” of the rsyslog configuration file (/etc/rsyslog.conf):

# Remote logging
$template HostIPtemp,"/mnt/logging/%FROMHOST-IP%.log"
if ($fromhost-ip != "127.0.0.1" ) then ?HostIPtemp
& stop

Logs only of remote hosts are going to be saved under /mnt/logging/.log.
Keep on reading!

nginx proxy cache – log the upstream response server, time, cache status, connect time and more in nginx access logs

The Nginx upstream module exposes embedded variables, which we can use to log them in the Nginx access log files.
Some of the variables are really interesting and could be of great use to the system administrators and in general to tune your systems (content delivery network?). For example, you can log

  • $upstream_cache_status – the cache status of the object the server served. For each URI you will have in the logs if the item is from the cache (HIT) or the Nginx used an upstream server to get the item (MISS)
  • $upstream_response_time – the time Nginx proxy needed to get the resource from the upstream server
  • $upstream_addr – the Nginx upstream server used for the requested URI in the logs.
  • $upstream_connect_time – the connect time to the specific

And many more you may check the documentation at the bottom with heading “Embedded Variables” – http://nginx.org/en/docs/http/ngx_http_upstream_module.html

For example, in peak hours, you can see how the time to get the resource from the upstream servers changes.

And you can substruct the time from the time your server served the URI to the client.

Of course, you can use this with any upstream case not only with proxy cache! These variables may be used with application backend servers like PHP (FastCGI) application servers and more. In a single log in the access log file, there could be information not only for the URI but for the time spent to generate the request in the application server.

Example

Logging in JSON format (JSON is just for the example, you can use the default string):

        log_format main3 escape=json '{'
                '"remote_addr":"$remote_addr",'
                '"time_iso8601":"$time_iso8601",'
                '"request_uri":"$request_uri",'
                '"request_length":"$request_length",'
                '"request_method":"$request_method",'
                '"request_time":"$request_time",'
                '"server_port":"$server_port",'
                '"server_protocol":"$server_protocol",'
                '"ssl_protocol":"$ssl_protocol",'
                '"status":"$status",'
                '"bytes_sent":"$bytes_sent",'
                '"http_referer":"$http_referer",'
                '"http_user_agent":"$http_user_agent",'
                '"upstream_response_time":"$upstream_response_time",'
                '"upstream_addr":"$upstream_addr",'
                '"upstream_connect_time":"$upstream_connect_time",'
                '"upstream_cache_status":"$upstream_cache_status",'
                '"tcpinfo_rtt":"$tcpinfo_rtt",'
                '"tcpinfo_rttvar":"$tcpinfo_rttvar"'
                '}';

We included the variables we needed, but there are a lot more, check out the Nginx documentation for more.
Just add the above snippet to your Nginx configuration and activate it with the access_log directive:

access_log      /var/log/nginx/example.com-json.log main3;

“main3” is the name of the format and it could be anything you like.

And the logs look like:

{"remote_addr":"10.10.10.10","time_iso8601":"2019-09-12T13:36:33+00:00","request_uri":"/i/example/bc/bcda7f798ea1c75f18838bc3f0ffbd1c_200.jpg","request_length":"412","request_method":"GET","request_time":"0.325","server_port":"8801","server_protocol":"HTTP/1.1","ssl_protocol":"","status":"404","bytes_sent":"332","http_referer":"https://example.com/test_1","http_user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.0.2 Safari/602.3.12","upstream_response_time":"0.324","upstream_addr":"10.10.10.2","upstream_connect_time":"0.077","upstream_cache_status":"MISS","tcpinfo_rtt":"45614","tcpinfo_rttvar":"22807"}
{"remote_addr":"10.10.10.10","time_iso8601":"2019-09-12T13:36:33+00:00","request_uri":"/i/example/2d/2df5f3dfe1754b3b4ba8ac66159c0384_200.jpg","request_length":"412","request_method":"GET","request_time":"0.242","server_port":"8801","server_protocol":"HTTP/1.1","ssl_protocol":"","status":"404","bytes_sent":"332","http_referer":"https://example.com/test_1","http_user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.0.2 Safari/602.3.12","upstream_response_time":"0.242","upstream_addr":"10.10.10.2","upstream_connect_time":"0.000","upstream_cache_status":"MISS","tcpinfo_rtt":"46187","tcpinfo_rttvar":"23093"}
{"remote_addr":"10.10.10.10","time_iso8601":"2019-09-12T13:36:41+00:00","request_uri":"/flv/example/test_1.ts?st=E05FMg-DSIAgRfVhbadUWQ&e=1568381799&sopt=pdlfwefdfsr","request_length":"357","request_method":"GET","request_time":"0.960","server_port":"8801","server_protocol":"HTTP/1.0","ssl_protocol":"","status":"200","bytes_sent":"3988358","http_referer":"","http_user_agent":"Lavf53.32.100","upstream_response_time":"0.959","upstream_addr":"10.10.10.2","upstream_connect_time":"0.000","upstream_cache_status":"MISS","tcpinfo_rtt":"46320","tcpinfo_rttvar":"91"}
{"remote_addr":"10.10.10.10","time_iso8601":"2019-09-12T14:09:34+00:00","request_uri":"/flv/example/aee001dce114c88874b306bc73c2d482_1.ts?range=564-1804987","request_length":"562","request_method":"GET","request_time":"0.613","server_port":"8801","server_protocol":"HTTP/1.0","ssl_protocol":"","status":"200","bytes_sent":"5318082","http_referer":"","http_user_agent":"AppleCoreMedia/1.0.0.16E227 (iPad; U; CPU OS 12_2 like Mac OS X; en_us)","upstream_response_time":"","upstream_addr":"","upstream_connect_time":"","upstream_cache_status":"HIT","tcpinfo_rtt":"45322","tcpinfo_rttvar":"295"}

It’s easy to print them beatiful in the console with the “jq” tool

[root@srv logging]# tail -f 10.10.10.10.log|awk 'BEGIN {FS="{"} {print "{"$2}'|jq "."
{
  "remote_addr": "10.10.10.10",
  "time_iso8601": "2019-09-12T13:36:33+00:00",
  "request_uri": "/i/example/bc/bcda7f798ea1c75f18838bc3f0ffbd1c_200.jpg",
  "request_length": "412",
  "request_method": "GET",
  "request_time": "0.325",
  "server_port": "8801",
  "server_protocol": "HTTP/1.1",
  "ssl_protocol": "",
  "status": "404",
  "bytes_sent": "332",
  "http_referer": "https://example.com/test_1",
  "http_user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.0.2 Safari/602.3.12",
  "upstream_response_time": "0.324",
  "upstream_addr": "10.10.10.2",
  "upstream_connect_time": "0.077",
  "upstream_cache_status": "MISS",
  "tcpinfo_rtt": "45614",
  "tcpinfo_rttvar": "22807"
}
{
  "remote_addr": "10.10.10.10",
  "time_iso8601": "2019-09-12T13:36:33+00:00",
  "request_uri": "/i/example/2d/2df5f3dfe1754b3b4ba8ac66159c0384_200.jpg",
  "request_length": "412",
  "request_method": "GET",
  "request_time": "0.242",
  "server_port": "8801",
  "server_protocol": "HTTP/1.1",
  "ssl_protocol": "",
  "status": "404",
  "bytes_sent": "332",
  "http_referer": "https://example.com/test_1",
  "http_user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.0.2 Safari/602.3.12",
  "upstream_response_time": "0.242",
  "upstream_addr": "10.10.10.2",
  "upstream_connect_time": "0.000",
  "upstream_cache_status": "MISS",
  "tcpinfo_rtt": "46187",
  "tcpinfo_rttvar": "23093"
}
{
  "remote_addr": "10.10.10.10",
  "time_iso8601": "2019-09-12T13:36:41+00:00",
  "request_uri": "/flv/example/test_1.ts?st=E05FMg-DSIAgRfVhbadUWQ&e=1568381799&sopt=pdlfwefdfsr",
  "request_length": "357",
  "request_method": "GET",
  "request_time": "0.960",
  "server_port": "8801",
  "server_protocol": "HTTP/1.0",
  "ssl_protocol": "",
  "status": "200",
  "bytes_sent": "3988358",
  "http_referer": "",
  "http_user_agent": "Lavf53.32.100",
  "upstream_response_time": "0.959",
  "upstream_addr": "10.10.10.2",
  "upstream_connect_time": "0.000",
  "upstream_cache_status": "MISS",
  "tcpinfo_rtt": "46320",
  "tcpinfo_rttvar": "91"
}
{
  "remote_addr": "10.10.10.10",
  "time_iso8601": "2019-09-12T14:09:34+00:00",
  "request_uri": "/flv/example/aee001dce114c88874b306bc73c2d482_1.ts?range=564-1804987",
  "request_length": "562",
  "request_method": "GET",
  "request_time": "0.613",
  "server_port": "8801",
  "server_protocol": "HTTP/1.0",
  "ssl_protocol": "",
  "status": "200",
  "bytes_sent": "5318082",
  "http_referer": "",
  "http_user_agent": "AppleCoreMedia/1.0.0.16E227 (iPad; U; CPU OS 12_2 like Mac OS X; en_us)",
  "upstream_response_time": "",
  "upstream_addr": "",
  "upstream_connect_time": "",
  "upstream_cache_status": "HIT",
  "tcpinfo_rtt": "45322",
  "tcpinfo_rttvar": "295"
}

3 misses and 1 hit, the hit 3 of the upstream variables we used are blank, because the server took the item from the cache.

Live status information like used space and more for nginx proxy cache

Using the Nginx virtual host traffic status module you can have extended live information for your proxy cache module and the proxy cache upstream servers. We have covered the topic of how to install the module here – Install Nginx virtual host traffic status module – traffic information in Nginx and more per server block and upstreams and this article is just to show you what information you could have using the module with proxy cache (and the upstream servers) module.
In general, there is no live information about Nginx proxy cache. Of course, by the space it is occupied in the disk you can guess how much space is taken by your Nginx cache (or when you restart or upgrade the Nginx it would reinitialize the cache and when finished the numbers would be written in the error log). With this module “Nginx virtual host traffic status module” – https://github.com/vozlt/nginx-module-vts you would have additional status information page containing information for the proxy cache module (we included only for the proxy cache here, for more look at the other article mentioned above), too:

Per upstream server

  • state – up, down, backup server and so on.
  • Response Time – the time the server responded last time. You can use this to see how far away is your server and to detect problems with your upstreams connectivity.
  • Weight – the weight of the server in the group. It’s from the configuration file.
  • Max Fails – the max fail attempts before it blacklists the upstream for “Fail Timeout” time. It’s from the configuration file.
  • Fail Timeout – the time, which the server will be blacklisted and the time the all fails (from Max Fails) must occur. It’s from the configuration file.
  • RequestsTotal – from the start of the server, Req/s – Requests per second at the moment of loading the extended status page, Time.
  • ResponsesTotal and split by error codes – 1xx, 2xx, 3xx, 4xx, 5xx.
  • TrafficSent – total sent from the start of the server), Rcvd – total received from the start of the server, Sent/s – sent per second at the moment, Rcvd/s – received per second at the moment.

Per cache – i.e. key zone name

  • SizeCapacity – the capacity from the configuration file, Used – used space lively updated! With this you can have access to the used space of your zones with only loading a page – the extended status page of this module!
  • TrafficSent – total sent from the start of the server, Rcvd – total received from the start of the server, Sent/s – sent per second at the moment, Rcvd/s – received per second at the moment.
  • CacheMiss, Bypass, Expired, Stale, Updating, Revalidated, Hit, Scarce, Total – they all are self-explanatory and all counters are from the start of the server.

You can compute the effectiveness of your cache for a period of time. For example, you can make different graphs based on this data for long periods and for different short periods like in peaks of off-peaks. We might have an article on the subject.

SCREENSHOT 1) Cache with three cache zones and two upstream servers – main and backup

As you can see our biggest zone has 2.92T occupied and it is 100% of the available space, so probably the cache manager is deleteing at the moment. The hits are 24551772 and the total is 28023927 so the ratio in percentages is 87.6%! 87.6% of the hits of this zone is servered by the server without the need of touching the upstream servers. In the first cache zone we have more aggressive time expiring, so there were 21% requests, which were updated.

main menu
Cache with three cache zones and two upstream servers – main and backup

Keep on reading!

Install Nginx virtual host traffic status module – traffic information in nginx and more per server block and upstreams

This article is going to show how to compile and install the Nginx module – ngx_http_vhost_traffic_status.

The module gathers traffic information per the server blocks and upstream servers and shows information for Nginx proxy cache like used space.

In addition, the module shows the type of the Response – 1xx, 2xx, 3xx, 4xx, 5xx and total. So when if problems occur in a server block or an upstream server
This module nginx-module-vts offers really extended status information for your Nginx.
Here is one the status page of our web servers with 18 virtual hosts:

The status page shows all virtual hosts in section “Server zones” and all upstream servers for the FastCGI PHP backend servers.

Traffic, requests, and status codes are available. All data is available in JSON, too.

main menu
Traffic information in Nginx and more per server block and upstreams

Server zones information

  • Requests – Total, Requests/s, Time
  • Responses – 1xx, 2xx, 3xx, 4xx, 5xx, Total
  • Traffic – Sent, Received, Sent/s, Received/s
  • Cache – Miss, Bypass, Expired, Stale, Updating, Revalidated, Hit, Scarce, Total

In addition to the information above there are State, Response Time, Weight, MaxFails and FileTimeout for all the upstream servers. And for the Nginx proxy cache there are Size, Capacity (live information!) and all information above per zone – there is an additional article here Live status information like used space and more for nginx proxy cache.
Keep on reading!

nginx – remove “no live upstreams” for your backup connections

In this article, we discuss the use of Nginx upstream module for HTTP and CGI (FastCGI) requests.

Using Nginx upstream module is essential for scaling application backend, but there might be a few catch-ups. One of them is related to what will happen when a server fails?
There is a proxy_next_upstream directive (or for FastCGI – fastcgi_next_upstream), which instructs the upstream module what is a fail – is it a connection error or timeout or HTTP 500 returned by the upstream server or an ordinary HTTP 404 returned by the upstream server is also a fail. So when a failure is identified by the Nginx upstream module the upstream module will look for the next upstream server to handle the request. These directives instruct Nginx upstream module what is a failure then to handle the next upstream server if available.
The default values are too conservative (and probably it is better to be like that):

proxy_next_upstream error timeout;

And available options are:

proxy_next_upstream error | timeout | invalid_header | http_500 | http_502 | http_503 | http_504 | http_403 | http_404 | http_429 | non_idempotent | off ...;

They are the same for fastcgi_next_upstream, too.

Imagine you what to protect yourself from HTTP 500? Or even HTTP 403 and 404? It’s normal to include them in your configuration. But here is the catch-up:
What if an image is really missing (HTTP 404) on all your upstream backend servers? Or there is a syntax error in an application file (like a simple PHP file)? All your upstreams servers will return 404 or 500 (in case of application error) and all of them will be blacklisted for at least 20 seconds. Remember 404 or 500 is a failure and we need next upstream server and if all return failure for this particular request Nginx will return to the client there is a problem and will mark the server as down (unavailable for a period of time).

SO because of a single file (a problem in a single file), all the following requests will be denied with “502 Bad Gateway” or “500 Internal Server Error”, even your servers are healthy!

Just a tiny miss like a missing image could be misinterpreted as your upstream servers have problems, so they must be blocked! Even if you put a “backup” directive in the upstream server line!

The solution is to include one or more of your upstream servers with disabled failure count (fail_timeout=0s) as a backup server. So this server will be always available when all normal servers got blacklisted! And you are not going to receive any more “no live upstreams” and returning an error to your clients.
Here is a working configuration (it is the same for HTTP and FastCGI setups):

upstream backend {
     server   127.0.0.1:8000;
     server   10.10.10.10:9000 fail_timeout=20s;
     server   127.0.0.1:8000 backup fail_timeout=0s max_fails=1000;
}

And be careful what you add in proxy_next_upstream (or fastcgi_next_upstream). In general, HTTP 403 and 403 are not for this directive!

A real-world example (FastCGI)

One of our project using an Nginx with the upstream module to scale the PHP application backend began to serve only HTTP 502 to all clients! In the PHP logs, there was a rear syntax error on a single file (not in the main part of the site), but Nginx was answering to all requests, no matter of the URI with 502. What had happened? The two backend application servers had returned with 500 (because of this error) and were blacklisted for 20 seconds! And all the following requests were not served by the upstream backends because there were “no live upstreams”:

upstream backend-php {
        server   127.0.0.1:8000;
        server   178.63.22.46:9000 fail_timeout=20s;
        server   127.0.0.1:8000 backup;
}

with

fastcgi_next_upstream error timeout invalid_header http_500 http_503 http_429 non_idempotent;

Even we have a backup, it was also blacklisted. In fact, a scanner was scanning all of our PHP files and one URL returned syntax error, then all of the upstream servers were blacklisted and we experienced an effective DoS because of misconfiguration. After we changed our configuration to:

upstream backend-php {
        server   127.0.0.1:8000;
        server   10.10.10.10:9000 fail_timeout=20s;
        server   127.0.0.1:8000 backup fail_timeout=0s max_fails=1000;
}

Everything returned to normal. There was still this syntax error but did not stop all other valid URLs to be served. Of course, we fixed the broken file and stopped the scammer from scanning our site.

A real-world example 2 (HTTP)

In our proxy static cache servers in remote locations, we experienced periodically “no live upstreams” and our clients received “502 Bad Gateway” on-peak hours! The problem was we have too aggressive proxy connect, read and send timeout, but because we were serving a live TV we needed them. And on-peak if a single connection just huck-up for 5-10 seconds our upstream servers were blacklisted for 20 seconds! Using proxy_cache_lock could worsen the situation! Then we changed our configuration to have a backup upstream server, which effectively would not be blacklisted and lowered the proxy_cache_lock to be sure if a single connection failed for some reason all other might succeed in bringing the file to the cache!

proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504 http_403 http_404;
proxy_connect_timeout 2s;
proxy_read_timeout 5s;
proxy_send_timeout 5s;
proxy_cache_lock on;
proxy_cache_lock_timeout 20s;
proxy_cache_lock_age 10s;

with upstream configuration:

upstream backend_http {
        server 10.10.10.10 fail_timeout=20s;
        server 10.10.10.11 backup fail_timeout=0s max_fails=1000;
        keepalive 16;
}

ansible – restart a (nginx) service only if it is running and the configuration is ok

Another ansible quick tip showing how to restart a program properly. We want to restart the program or the service only if it is running (because some system on executing restart may start the service even it is in the stopped state).
Here is what the ansible playbook do:

  1. Check if the program is running.
  2. Check the configuration of the program. Do not restart a program or service if it cannot start after a stop command because of bad configuration file(s).
  3. Restart the service (the program) only if the above two are true.

If you are a newbie in ansible you can check this article – First ansible use – install and execute a single command or multiple tasks in a playbook There you can see how to create your inventory file (and configure sudo if you remotely log in with unprivileged user) used herein the example.

Ansible YAML file

For our example we use the nginx webserver in the ansible playbook. Put the following code in a file and then execute ansible-playbook:

---
- hosts: all
  tasks:
            
    - name: Test for running nginx
      shell: ps axuf|grep 'nginx'|grep -v "grep" | tr -d "\n" | cat
      register: test_running_nginx
      changed_when: False
      tags: restart-nginx
      
    - name: First check the configuration
      shell: /usr/sbin/nginx -t
      register: test_nginx_config
      when: test_running_nginx.stdout != ""
      changed_when: False
      ignore_errors: True
      tags: restart-nginx
          
    - name: Restart nginx
      service: name=nginx state=restarted
      when: test_running_nginx.stdout != "" and test_nginx_config.rc == 0
      tags: restart-nginx

Here is how to run the above ansible playbook

myuser@srv ~ $ ansible-playbook -l srv2 -i ./inventory.ini ./playbook-example.yml -b

PLAY [all] *****************************************************************************************************************************************************************

TASK [Gathering Facts] *****************************************************************************************************************************************************
ok: [srv2]

TASK [Test for running nginx] **********************************************************************************************************************************************
ok: [srv2]

TASK [First check the configuration] ***************************************************************************************************************************************
ok: [srv2]

TASK [Restart nginx] *******************************************************************************************************************************************************
changed: [srv2]

PLAY RECAP *****************************************************************************************************************************************************************
srv2                       : ok=4    changed=1    unreachable=0    failed=0   

Here we add to the command line “-b”, which will escalate to root if it is needed (using sudo) because the remote connection is done with unprivileged user “myuser”. You can skip this option if you described the remote connection with the root user in the inventory file (or a system user, which has permissions to restart services).
Keep on reading!

nginx with php fpm (fastcgi) and the warning – an upstream response is buffered to a temporary file /var/cache/nginx/fastcgi_temp

As the web grows and the technology advances the page size of the web sites also grows or just some times you might want to output a big chunk of data from your application server – PHP-FPM (but it could be any of another ruby, python, C, Django and more), for example.
Here is a fast configuration tip (note this is not the proxy-related warning!):

The default nginx buffers per CGI connection are too small

Here is what to do in your nginx configuration file:
First, look for a line “include /etc/nginx/fastcgi_params;” or similar and add or edit if they exist after this line:

        fastcgi_buffer_size 16k;
        fastcgi_buffers 32 16k;

Check out more for the buffers here http://nginx.org/en/docs/http/ngx_http_fastcgi_module.html#fastcgi_buffers
The warning should stop if it does not stop you can try raising them. It could consume more memory but could lower the IO usage of your disks and improve the performance of your site or whatever backend works!

Here is the warning in our nginx error logs. We got this warning when using php-fpm and the php output size was 325965 bytes (~320K).

2019/04/04 09:56:05 [warn] 24451#24451: *44269838 an upstream response is buffered to a temporary file /var/cache/nginx/fastcgi_temp/0/12/0019966120 while reading upstream, client: 10.10.10.10, server: srv17.srv.en, request: "GET /api/20140102/product HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "srv17.srv.en"
2019/04/04 09:56:07 [warn] 24451#24451: *44269849 an upstream response is buffered to a temporary file /var/cache/nginx/fastcgi_temp/2/12/0019966122 while reading upstream, client: 10.10.10.11, server: srv17.srv.en, request: "GET /api/20140102/product HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "srv17.srv.en"
2019/04/04 09:56:09 [warn] 24450#24450: *44269856 an upstream response is buffered to a temporary file /var/cache/nginx/fastcgi_temp/7/12/0019966127 while reading upstream, client: 10.10.10.12, server: srv17.srv.en, request: "GET /api/20140102/product HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "srv17.srv.en"

Tune nginx proxy cache – control the cache manager how to delete cached files

In most cases you’ll never want to modify the default settings for deleting cache items with proxy_cache_path directives. The problem is in a peak the file deleting could impact your server performance and even it could kill your server leaving it unresponsive for a period of time. You cannot instruct nginx with a schedule job for deletion cached items or ban the deletion when the server is busy or loaded. The manager just traces each zone for used cache capacity versus the maximum allowed size and if the used capacity is near or bigger than the maximum allowed size (max_size) the manager process triggers deletion with the default values – the nginx manager will try to delete at least 100 files (up to 200 milliseconds) and then it will sleep for 50 milliseconds then again it will try deleting 100 files. So your file system could receive at least 1000 files per second to delete!

This could lead your server to almost unresponsive state in the peaks.

And it could be perfectly OK in off-peaks, but there is no way how to tell nginx cache manager there is a plenty free space despite you reach the cache limit so at the moment it is not the best time to delete the cache!

You can tune three parameters per cache directory (manual here: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_path):

  • manager_files – not more than this number of files to delete in one iteration. The default value is 100.
  • manager_threshold – limit the delete iteration time. The default value is 200 milliseconds and you must use nginx time syntax concatenated to the number you want, for example if you want 500 milliseconds you must use “500 ms”.
  • manager_sleep – how much time to sleep the manager before executing another delete iteration. The default value is 50 milliseconds and here you must use nginx time syntax concatenated to the number you want, for example if you want 500 milliseconds you must use “500 ms”.
        proxy_cache_path /mnt/cache levels=1:2 keys_zone=CACHESTATICS:900m inactive=710h max_size=4000g manager_files=2 manager_sleep=200ms manager_threshold=500ms;

The cache manager will delete not more than 2 files for up to 500 milliseconds and it will sleep 200 milliseconds before another delete iteration.

The best option for loaded servers

The best option for loaded servers with full cache is to balance the free space – delete small amount of files at once to be sure your server will not get loaded even the free space decreases at the peaks (so more files are cached than the nginx manager could delete – you are aware of this and the free space should be enough), but during the off peak (which normally is several times longer than the peak) the nginx manager could catch up with the deleting and it should free up some space (cached files are lesser than the deleted ones). Of course, you should tune this according to your situation.
The main idea is to delete in small amounts of files to not saturate your disks it could take longer to recover the free space, but it will not load your server in peaks. You should consider two things:

  1. Free space – enough free space and to be sure the free space is enough for the peaks, when the cache could grow above the threshold.
  2. Number of deletions per iteration – you should experiment with this. Fist you should be away how many files are added for a period of time, which includes one peak and one off-peak and then to balance the number in such a way that after the period the cache is not above the maximum size. Probably the best is to start with a 24 hours period, which includes at least one peak.

As you can see the example above only 2 files are good enough for an iteration for our case. Taking into account the 200ms sleep between the files’ deletions 10 files at most should be deleted per second. In our case it is not enough for the peak, but for the off-peak, which is 20 hours every 24 hours, is good enough to get into the maximum size limit of the cache.

Here you can learn how to verify your nginx is deleting cache files and the impact of the default settings on a busy server in a peak: how to disable effectively the deleting (purging) files from nginx proxy_cache (nginx cache manager process) Our loaded server just stopped serving files and the bandwidth decreased with 99% because nginx cache manager suddenly started deleting cached files.