nginx remote logging to UDP rsyslog server (CentOS 7)

This article presents all the configuration needed to save the access logs of an Nginx web server remotely. Both the client-side and the server-side configuration are included. The client and the server run the CentOS 7 Linux distribution, but the configuration can be used under other Linux distributions. Probably only the SELinux rules are specific to CentOS 7, and the firewalld rules apply to those who use firewalld as a replacement for iptables. Here is the summary of what to expect:

  • Client-side – nginx configuration
  • Server-side – rsyslog configuration to accept UDP connections
  • Server-side – selinux and firewall configuration

The JSON-formatted logs may then be sent to an Elasticsearch server, for example. Here is how to do it – send access logs in json to Elasticsearch using rsyslog

STEP 1) Client-side – the Nginx configuration.

The Nginx configuration is pretty simple – just a single line with the log template and the IP (and port, if not the default 514) of the rsyslog server. For the record, this is the official documentation: https://nginx.org/en/docs/syslog.html. In addition, it is worth mentioning that there can be multiple access_log directives in a single section to log simultaneously to different targets (and the templates may be the same or different). So you can send the access log output of a section both locally and remotely.
Nginx configuration (probably /etc/nginx/nginx.conf, or wherever your Nginx configuration files live):

server {
     .....
     # local access log using the "main" template
     access_log      /var/log/nginx/example.com_access.log main;
     # remote logging over UDP syslog to the rsyslog server 10.10.10.2 using the "main3" template
     access_log      syslog:server=10.10.10.2:514,facility=local7,tag=nginx,severity=info main3;
     .....
}

The “main” and “main3” are just names of logging templates defined earlier (you may check rsyslog remote logging – prevent local messages to appear for an interesting Nginx logging template).
The error log can also be logged remotely:

error_log syslog:server=10.10.10.3 debug;
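
To quickly check that the client really emits the UDP syslog datagrams, you can sniff on the Nginx host with tcpdump (a minimal check, assuming the server IP 10.10.10.2 and port 514 from the example above):

# capture the outgoing syslog datagrams on the Nginx host; stop with Ctrl+C
tcpdump -ni any udp and dst host 10.10.10.2 and dst port 514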

STEP 2) Server-side – rsyslog configuration to accept UDP connections.

Of course, if you have not installed rsyslog yet, it’s high time you installed it with (for CentOS 7):

yum install -y rsyslog

To enable rsyslog to listen for UDP connections your rsyslog configuration file (/etc/rsyslog.conf) must include the following:

$ModLoad imudp
$UDPServerRun 514

Most Linux distributions ship with these two lines commented out, so you just need to uncomment them by removing the “#” from the beginning of the lines. If the lines are missing, add them under the “MODULES” section (it should be near the first lines of the rsyslog configuration file).
Replace 514 with whatever UDP listening port you prefer.
To write the clients’ incoming log lines to a different location and prevent them from merging with the local log messages (see rsyslog remote logging – prevent local messages to appear), include the following as the first rule under the “RULES” section of the rsyslog configuration file (/etc/rsyslog.conf):

# Remote logging
$template HostIPtemp,"/mnt/logging/%FROMHOST-IP%.log"
if ($fromhost-ip != "127.0.0.1" ) then ?HostIPtemp
& stop

Only logs from remote hosts are going to be saved under /mnt/logging/, one file per client named after its IP (for example /mnt/logging/10.10.10.10.log).
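
After editing /etc/rsyslog.conf, restart the service and optionally send a test message from the client with logger (a minimal sketch; the server IP 10.10.10.2 and port 514 match the examples above):

# on the rsyslog server: apply the new configuration
systemctl restart rsyslog
# on the client: send a test message over UDP to the remote rsyslog server
logger -n 10.10.10.2 -P 514 --udp -t nginx "remote logging test"
# the message should then appear on the server in /mnt/logging/<client-IP>.log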
Keep on reading!

nginx proxy cache – log the upstream response server, time, cache status, connect time and more in nginx access logs

The Nginx upstream module exposes embedded variables, which we can log in the Nginx access log files.
Some of the variables are really interesting and can be of great use to system administrators and, in general, for tuning your systems (a content delivery network?). For example, you can log:

  • $upstream_cache_status – the cache status of the object the server served. For each URI the logs show whether the item came from the cache (HIT) or Nginx used an upstream server to get it (MISS)
  • $upstream_response_time – the time the Nginx proxy needed to get the resource from the upstream server
  • $upstream_addr – the Nginx upstream server used for the requested URI in the logs
  • $upstream_connect_time – the time spent establishing a connection with the upstream server

And there are many more – check the documentation under the heading “Embedded Variables” at the bottom of http://nginx.org/en/docs/http/ngx_http_upstream_module.html

For example, you can see how the time to get a resource from the upstream servers changes in peak hours.

And you can subtract the upstream response time from the total time your server needed to serve the URI to the client, to see how much time was spent in Nginx itself (see the jq sketch after the example logs below).

Of course, you can use this with any upstream setup, not only with the proxy cache! These variables may be used with application backend servers like PHP (FastCGI) application servers and more. A single line in the access log file can then carry information not only about the URI but also about the time spent generating the response in the application server.

Example

Logging in JSON format (JSON is just for the example; you can use the default string format):

        log_format main3 escape=json '{'
                '"remote_addr":"$remote_addr",'
                '"time_iso8601":"$time_iso8601",'
                '"request_uri":"$request_uri",'
                '"request_length":"$request_length",'
                '"request_method":"$request_method",'
                '"request_time":"$request_time",'
                '"server_port":"$server_port",'
                '"server_protocol":"$server_protocol",'
                '"ssl_protocol":"$ssl_protocol",'
                '"status":"$status",'
                '"bytes_sent":"$bytes_sent",'
                '"http_referer":"$http_referer",'
                '"http_user_agent":"$http_user_agent",'
                '"upstream_response_time":"$upstream_response_time",'
                '"upstream_addr":"$upstream_addr",'
                '"upstream_connect_time":"$upstream_connect_time",'
                '"upstream_cache_status":"$upstream_cache_status",'
                '"tcpinfo_rtt":"$tcpinfo_rtt",'
                '"tcpinfo_rttvar":"$tcpinfo_rttvar"'
                '}';

We included the variables we needed, but there are a lot more – check out the Nginx documentation.
Just add the above snippet to your Nginx configuration and activate it with the access_log directive:

access_log      /var/log/nginx/example.com-json.log main3;

“main3” is the name of the format and it could be anything you like.

And the logs look like:

{"remote_addr":"10.10.10.10","time_iso8601":"2019-09-12T13:36:33+00:00","request_uri":"/i/example/bc/bcda7f798ea1c75f18838bc3f0ffbd1c_200.jpg","request_length":"412","request_method":"GET","request_time":"0.325","server_port":"8801","server_protocol":"HTTP/1.1","ssl_protocol":"","status":"404","bytes_sent":"332","http_referer":"https://example.com/test_1","http_user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.0.2 Safari/602.3.12","upstream_response_time":"0.324","upstream_addr":"10.10.10.2","upstream_connect_time":"0.077","upstream_cache_status":"MISS","tcpinfo_rtt":"45614","tcpinfo_rttvar":"22807"}
{"remote_addr":"10.10.10.10","time_iso8601":"2019-09-12T13:36:33+00:00","request_uri":"/i/example/2d/2df5f3dfe1754b3b4ba8ac66159c0384_200.jpg","request_length":"412","request_method":"GET","request_time":"0.242","server_port":"8801","server_protocol":"HTTP/1.1","ssl_protocol":"","status":"404","bytes_sent":"332","http_referer":"https://example.com/test_1","http_user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.0.2 Safari/602.3.12","upstream_response_time":"0.242","upstream_addr":"10.10.10.2","upstream_connect_time":"0.000","upstream_cache_status":"MISS","tcpinfo_rtt":"46187","tcpinfo_rttvar":"23093"}
{"remote_addr":"10.10.10.10","time_iso8601":"2019-09-12T13:36:41+00:00","request_uri":"/flv/example/test_1.ts?st=E05FMg-DSIAgRfVhbadUWQ&e=1568381799&sopt=pdlfwefdfsr","request_length":"357","request_method":"GET","request_time":"0.960","server_port":"8801","server_protocol":"HTTP/1.0","ssl_protocol":"","status":"200","bytes_sent":"3988358","http_referer":"","http_user_agent":"Lavf53.32.100","upstream_response_time":"0.959","upstream_addr":"10.10.10.2","upstream_connect_time":"0.000","upstream_cache_status":"MISS","tcpinfo_rtt":"46320","tcpinfo_rttvar":"91"}
{"remote_addr":"10.10.10.10","time_iso8601":"2019-09-12T14:09:34+00:00","request_uri":"/flv/example/aee001dce114c88874b306bc73c2d482_1.ts?range=564-1804987","request_length":"562","request_method":"GET","request_time":"0.613","server_port":"8801","server_protocol":"HTTP/1.0","ssl_protocol":"","status":"200","bytes_sent":"5318082","http_referer":"","http_user_agent":"AppleCoreMedia/1.0.0.16E227 (iPad; U; CPU OS 12_2 like Mac OS X; en_us)","upstream_response_time":"","upstream_addr":"","upstream_connect_time":"","upstream_cache_status":"HIT","tcpinfo_rtt":"45322","tcpinfo_rttvar":"295"}

It’s easy to pretty-print them in the console with the “jq” tool (the awk part strips the syslog prefix in front of the JSON):

[root@srv logging]# tail -f 10.10.10.10.log|awk 'BEGIN {FS="{"} {print "{"$2}'|jq "."
{
  "remote_addr": "10.10.10.10",
  "time_iso8601": "2019-09-12T13:36:33+00:00",
  "request_uri": "/i/example/bc/bcda7f798ea1c75f18838bc3f0ffbd1c_200.jpg",
  "request_length": "412",
  "request_method": "GET",
  "request_time": "0.325",
  "server_port": "8801",
  "server_protocol": "HTTP/1.1",
  "ssl_protocol": "",
  "status": "404",
  "bytes_sent": "332",
  "http_referer": "https://example.com/test_1",
  "http_user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.0.2 Safari/602.3.12",
  "upstream_response_time": "0.324",
  "upstream_addr": "10.10.10.2",
  "upstream_connect_time": "0.077",
  "upstream_cache_status": "MISS",
  "tcpinfo_rtt": "45614",
  "tcpinfo_rttvar": "22807"
}
{
  "remote_addr": "10.10.10.10",
  "time_iso8601": "2019-09-12T13:36:33+00:00",
  "request_uri": "/i/example/2d/2df5f3dfe1754b3b4ba8ac66159c0384_200.jpg",
  "request_length": "412",
  "request_method": "GET",
  "request_time": "0.242",
  "server_port": "8801",
  "server_protocol": "HTTP/1.1",
  "ssl_protocol": "",
  "status": "404",
  "bytes_sent": "332",
  "http_referer": "https://example.com/test_1",
  "http_user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.0.2 Safari/602.3.12",
  "upstream_response_time": "0.242",
  "upstream_addr": "10.10.10.2",
  "upstream_connect_time": "0.000",
  "upstream_cache_status": "MISS",
  "tcpinfo_rtt": "46187",
  "tcpinfo_rttvar": "23093"
}
{
  "remote_addr": "10.10.10.10",
  "time_iso8601": "2019-09-12T13:36:41+00:00",
  "request_uri": "/flv/example/test_1.ts?st=E05FMg-DSIAgRfVhbadUWQ&e=1568381799&sopt=pdlfwefdfsr",
  "request_length": "357",
  "request_method": "GET",
  "request_time": "0.960",
  "server_port": "8801",
  "server_protocol": "HTTP/1.0",
  "ssl_protocol": "",
  "status": "200",
  "bytes_sent": "3988358",
  "http_referer": "",
  "http_user_agent": "Lavf53.32.100",
  "upstream_response_time": "0.959",
  "upstream_addr": "10.10.10.2",
  "upstream_connect_time": "0.000",
  "upstream_cache_status": "MISS",
  "tcpinfo_rtt": "46320",
  "tcpinfo_rttvar": "91"
}
{
  "remote_addr": "10.10.10.10",
  "time_iso8601": "2019-09-12T14:09:34+00:00",
  "request_uri": "/flv/example/aee001dce114c88874b306bc73c2d482_1.ts?range=564-1804987",
  "request_length": "562",
  "request_method": "GET",
  "request_time": "0.613",
  "server_port": "8801",
  "server_protocol": "HTTP/1.0",
  "ssl_protocol": "",
  "status": "200",
  "bytes_sent": "5318082",
  "http_referer": "",
  "http_user_agent": "AppleCoreMedia/1.0.0.16E227 (iPad; U; CPU OS 12_2 like Mac OS X; en_us)",
  "upstream_response_time": "",
  "upstream_addr": "",
  "upstream_connect_time": "",
  "upstream_cache_status": "HIT",
  "tcpinfo_rtt": "45322",
  "tcpinfo_rttvar": "295"
}

Three misses and one hit – for the hit, three of the upstream variables we used are blank, because the server took the item from its own cache and never contacted an upstream.
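
As mentioned earlier, you can subtract the upstream response time from the total request time to see how much time was spent inside Nginx itself. Here is a quick jq sketch over the per-IP files from the rsyslog setup (the file name is just an example; the awk part strips the syslog prefix as in the command above, cache hits are skipped because their upstream fields are empty, and the simplest case of one upstream per request is assumed):

awk 'BEGIN {FS="{"} {print "{"$2}' 10.10.10.10.log | \
  jq -r 'select(.upstream_response_time != "") |
    "\(.request_uri) nginx-only: \((.request_time|tonumber) - (.upstream_response_time|tonumber))"'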

Live status information like used space and more for nginx proxy cache

Using the Nginx virtual host traffic status module you can have extended live information about your proxy cache and the proxy cache upstream servers. We have covered how to install the module here – Install Nginx virtual host traffic status module – traffic information in Nginx and more per server block and upstreams – and this article just shows what information the module gives you for the proxy cache (and its upstream servers).
In general, there is no live information about the Nginx proxy cache. Of course, by the space it occupies on the disk you can guess how much space your Nginx cache takes (or when you restart or upgrade Nginx it reinitializes the cache and, when finished, writes the numbers to the error log). With the “Nginx virtual host traffic status module” – https://github.com/vozlt/nginx-module-vts – you get an additional status information page that also contains information about the proxy cache (here we include only the proxy cache part; for more, look at the other article mentioned above):

Per upstream server

  • State – up, down, backup server and so on.
  • Response Time – the time the server responded last time. You can use this to see how far away your server is and to detect problems with your upstream connectivity.
  • Weight – the weight of the server in the group. It comes from the configuration file.
  • Max Fails – the maximum failed attempts before the upstream is blacklisted for the “Fail Timeout” time. It comes from the configuration file.
  • Fail Timeout – the time for which the server will be blacklisted and the time window in which all the fails (from Max Fails) must occur. It comes from the configuration file.
  • Requests – Total (from the start of the server), Req/s (requests per second at the moment of loading the extended status page), Time.
  • Responses – Total, split by status code class – 1xx, 2xx, 3xx, 4xx, 5xx.
  • Traffic – Sent (total sent from the start of the server), Rcvd (total received from the start of the server), Sent/s (sent per second at the moment), Rcvd/s (received per second at the moment).

Per cache – i.e. key zone name

  • Size – Capacity (the capacity from the configuration file) and Used (used space, updated live!). With this you can see the used space of your zones just by loading a page – the extended status page of this module!
  • Traffic – Sent (total sent from the start of the server), Rcvd (total received from the start of the server), Sent/s (sent per second at the moment), Rcvd/s (received per second at the moment).
  • Cache – Miss, Bypass, Expired, Stale, Updating, Revalidated, Hit, Scarce, Total – they are all self-explanatory and all counters are from the start of the server.

You can compute the effectiveness of your cache for a period of time. For example, you can build graphs based on this data for long periods and for different short periods, like peaks and off-peaks. We might have an article on the subject.

SCREENSHOT 1) Cache with three cache zones and two upstream servers – main and backup

As you can see, our biggest zone has 2.92T occupied, which is 100% of the available space, so the cache manager is probably deleting at the moment. The hits are 24551772 and the total is 28023927, so the ratio is 87.6% – 87.6% of the requests to this zone are served by the server without touching the upstream servers. In the first cache zone we have more aggressive expiration times, so about 21% of the requests were updated.
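
For example, the hit ratio above can be reproduced directly from the Hit and Total counters shown on the status page:

# 24551772 hits out of 28023927 total requests for the zone
echo "scale=1; 24551772 * 100 / 28023927" | bc
# 87.6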


Keep on reading!

Install Nginx virtual host traffic status module – traffic information in nginx and more per server block and upstreams

This article is going to show how to compile and install the Nginx module – ngx_http_vhost_traffic_status.
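
In short, the build boils down to adding the module at nginx configure time. Here is a minimal sketch (the nginx version 1.16.1 and the paths are just examples – reuse the ./configure options of your current build):

# fetch the module and the nginx sources
git clone https://github.com/vozlt/nginx-module-vts.git
wget https://nginx.org/download/nginx-1.16.1.tar.gz && tar xf nginx-1.16.1.tar.gz
cd nginx-1.16.1
# compile the module statically into nginx (append your existing ./configure options)
./configure --add-module=../nginx-module-vts
make && make install
# then enable it in nginx.conf: "vhost_traffic_status_zone;" in the http {} block and
# "vhost_traffic_status_display;" in a location {} block to expose the status page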

The module gathers traffic information per server block and per upstream server, and shows information about the Nginx proxy cache, like used space.

In addition, the module shows the type of the response – 1xx, 2xx, 3xx, 4xx, 5xx and total – so you can easily spot when problems occur in a server block or an upstream server.
This module, nginx-module-vts, offers really extended status information for your Nginx.
Here is the status page of one of our web servers with 18 virtual hosts:

The status page shows all virtual hosts in section “Server zones” and all upstream servers for the FastCGI PHP backend servers.

Traffic, requests, and status codes are available. All data is available in JSON, too.


Server zones information

  • Requests – Total, Requests/s, Time
  • Responses – 1xx, 2xx, 3xx, 4xx, 5xx, Total
  • Traffic – Sent, Received, Sent/s, Received/s
  • Cache – Miss, Bypass, Expired, Stale, Updating, Revalidated, Hit, Scarce, Total

In addition to the information above there are State, Response Time, Weight, Max Fails and Fail Timeout for all the upstream servers. And for the Nginx proxy cache there are Size and Capacity (live information!) plus all the information above per zone – there is an additional article here: Live status information like used space and more for nginx proxy cache.
Keep on reading!

nginx – remove “no live upstreams” for your backup connections

In this article, we discuss the use of the Nginx upstream module for HTTP and CGI (FastCGI) requests.

Using the Nginx upstream module is essential for scaling an application backend, but there are a few catches. One of them is related to what happens when a server fails.
There is a proxy_next_upstream directive (or, for FastCGI, fastcgi_next_upstream), which tells the upstream module what counts as a failure – a connection error or timeout, an HTTP 500 returned by the upstream server, or even an ordinary HTTP 404 returned by the upstream server. When a failure is identified, the upstream module looks for the next upstream server to handle the request.
The default values are conservative (and it is probably better that way):

proxy_next_upstream error timeout;

And available options are:

proxy_next_upstream error | timeout | invalid_header | http_500 | http_502 | http_503 | http_504 | http_403 | http_404 | http_429 | non_idempotent | off ...;

They are the same for fastcgi_next_upstream, too.

Imagine you want to protect yourself from HTTP 500? Or even HTTP 403 and 404? It seems natural to include them in your configuration. But here is the catch:
What if an image is really missing (HTTP 404) on all your upstream backend servers? Or there is a syntax error in an application file (like a simple PHP file)? All your upstream servers will return 404 or 500 (in case of an application error) and all of them will be blacklisted for at least 20 seconds. Remember, a 404 or 500 counts as a failure, so Nginx moves on to the next upstream server, and if all of them return a failure for this particular request, Nginx returns an error to the client and marks the servers as down (unavailable for a period of time).

So, because of a problem in a single file, all the following requests will be answered with “502 Bad Gateway” or “500 Internal Server Error”, even though your servers are healthy!

A tiny miss like a missing image could be misinterpreted as your upstream servers having problems, so they get blocked – even if you put a “backup” directive on one of the upstream server lines!

The solution is to include one (or more) of your upstream servers as a backup server with the failure counting effectively disabled (fail_timeout=0s). This server will always be available when all normal servers get blacklisted, so you are no longer going to see “no live upstreams” and return errors to your clients.
Here is a working configuration (it is the same for HTTP and FastCGI setups):

upstream backend {
     server   127.0.0.1:8000;
     server   10.10.10.10:9000 fail_timeout=20s;
     # the same local server again as backup; fail_timeout=0s means it is effectively never blacklisted
     server   127.0.0.1:8000 backup fail_timeout=0s max_fails=1000;
}

And be careful what you add to proxy_next_upstream (or fastcgi_next_upstream). In general, HTTP 403 and 404 do not belong in this directive!

A real-world example (FastCGI)

One of our projects using Nginx with the upstream module to scale the PHP application backend began to serve only HTTP 502 to all clients! In the PHP logs there was a syntax error in a single file (not in the main part of the site), but Nginx was answering all requests, no matter the URI, with 502. What had happened? The two backend application servers had returned 500 (because of this error) and were blacklisted for 20 seconds! And all the following requests were not served by the upstream backends because there were “no live upstreams”:

upstream backend-php {
        server   127.0.0.1:8000;
        server   178.63.22.46:9000 fail_timeout=20s;
        server   127.0.0.1:8000 backup;
}

with

fastcgi_next_upstream error timeout invalid_header http_500 http_503 http_429 non_idempotent;

Even though we had a backup, it was also blacklisted. In fact, a scanner was going through all of our PHP files, one URL returned a syntax error, then all of the upstream servers were blacklisted and we experienced an effective DoS because of this misconfiguration. After we changed our configuration to:

upstream backend-php {
        server   127.0.0.1:8000;
        server   10.10.10.10:9000 fail_timeout=20s;
        server   127.0.0.1:8000 backup fail_timeout=0s max_fails=1000;
}

Everything returned to normal. The syntax error was still there, but it no longer stopped all other valid URLs from being served. Of course, we fixed the broken file and stopped the scanner from crawling our site.

A real-world example 2 (HTTP)

On our static proxy cache servers in remote locations we periodically experienced “no live upstreams” and our clients received “502 Bad Gateway” in peak hours! The problem was that we had too aggressive proxy connect, read and send timeouts, but because we were serving live TV we needed them. And in a peak, if a single connection just hung for 5-10 seconds, our upstream servers were blacklisted for 20 seconds! Using proxy_cache_lock could worsen the situation! So we changed our configuration to have a backup upstream server, which effectively cannot be blacklisted, and lowered the proxy_cache_lock timeouts to make sure that, if a single connection failed for some reason, another one could still bring the file into the cache:

proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504 http_403 http_404;
proxy_connect_timeout 2s;
proxy_read_timeout 5s;
proxy_send_timeout 5s;
proxy_cache_lock on;
proxy_cache_lock_timeout 20s;
proxy_cache_lock_age 10s;

with upstream configuration:

upstream backend_http {
        server 10.10.10.10 fail_timeout=20s;
        server 10.10.10.11 backup fail_timeout=0s max_fails=1000;
        keepalive 16;
}

ansible – restart a (nginx) service only if it is running and the configuration is ok

Another quick ansible tip showing how to restart a program properly. We want to restart the program or service only if it is running (because on some systems executing a restart may start the service even if it is in the stopped state).
Here is what the ansible playbook does:

  1. Check if the program is running.
  2. Check the configuration of the program. Do not restart a program or service if it cannot start after a stop command because of bad configuration file(s).
  3. Restart the service (the program) only if the above two are true.

If you are a newbie in ansible you can check this article – First ansible use – install and execute a single command or multiple tasks in a playbook. There you can see how to create the inventory file (and configure sudo if you log in remotely with an unprivileged user) used here in the example.

Ansible YAML file

For our example we use the nginx webserver in the ansible playbook. Put the following code in a file and then execute ansible-playbook:

---
- hosts: all
  tasks:
            
    - name: Test for running nginx
      shell: ps axuf|grep 'nginx'|grep -v "grep" | tr -d "\n" | cat
      register: test_running_nginx
      changed_when: False
      tags: restart-nginx
      
    - name: First check the configuration
      shell: /usr/sbin/nginx -t
      register: test_nginx_config
      when: test_running_nginx.stdout != ""
      changed_when: False
      ignore_errors: True
      tags: restart-nginx
          
    - name: Restart nginx
      service: name=nginx state=restarted
      when: test_running_nginx.stdout != "" and test_nginx_config.rc == 0
      tags: restart-nginx

Here is how to run the above ansible playbook

myuser@srv ~ $ ansible-playbook -l srv2 -i ./inventory.ini ./playbook-example.yml -b

PLAY [all] *****************************************************************************************************************************************************************

TASK [Gathering Facts] *****************************************************************************************************************************************************
ok: [srv2]

TASK [Test for running nginx] **********************************************************************************************************************************************
ok: [srv2]

TASK [First check the configuration] ***************************************************************************************************************************************
ok: [srv2]

TASK [Restart nginx] *******************************************************************************************************************************************************
changed: [srv2]

PLAY RECAP *****************************************************************************************************************************************************************
srv2                       : ok=4    changed=1    unreachable=0    failed=0   

Here we add “-b” to the command line, which will escalate to root if needed (using sudo), because the remote connection is made with the unprivileged user “myuser”. You can skip this option if you described the remote connection with the root user in the inventory file (or with a system user that has permissions to restart services).
Keep on reading!

ansible – insert after only if the pattern exists and the new insert is not there

Here is a quick ansible tip for system administrators about the ansible lineinfile module. Imagine you want to insert a line after a word (or a predefined marker in your configuration file), but you want to insert the line ONLY if the word exists!
It can be done with the lineinfile module, but there is a limitation. The module will insert after the first occurrence of your marker or at the end of the file. Here is what the manual says: “If specified regular expression has no matches, EOF will be used instead.” And what if you want to insert an additional line into a structured configuration file? It would corrupt your configuration file, so we need something else!
Not only that – imagine you already inserted the line in a previous playbook run? It would be unwanted to add the line again and again, each time the playbook is run. So here we propose the following solution:

  1. Test for the existence of the file you want to insert text into.
  2. Test for the existence of the marker (aka tag) in the file.
  3. Test for the existence of the line we want to insert.
  4. Insert the line after the marker (aka tag) only if all of the above three conditions are true.

Here we use three ansible modules – stat, shell and lineinfile – plus variables and conditional checks.
If you are a newbie in ansible you can check this article – First ansible use – install and execute a single command or multiple tasks in a playbook. There you can see how to create the inventory file (and configure sudo if you log in remotely with an unprivileged user) used here in the example:

Ansible YAML file

---
- hosts: all
  tasks:
        - name: Test for nginx-config
          stat:
            path: /etc/nginx/nginx.conf
          register: test_exist_nginx_config
          tags: cors-insert-include
      
        - name: Test for \#FIRST-SRV-LOCATION tag
          shell: grep '#FIRST-SRV-LOCATION' /etc/nginx/nginx.conf | tr -d "\n" | cat
          register: test_first_srv_location
          when: test_exist_nginx_config.stat.exists
          changed_when: False
          tags: cors-insert-include

        - name: Test for cors-locations.loc inserted already
          shell: grep "cors-locations.loc" /etc/nginx/nginx.conf | tr -d "\n" | cat
          register: test_cors_locations_loc
          when: test_exist_nginx_config.stat.exists
          changed_when: False
          tags: cors-insert-include
          
        - name: Insert the includes after \#FIRST-SRV-LOCATION
          lineinfile:
            path: /etc/nginx/nginx.conf
            insertafter: '#FIRST-SRV-LOCATION'
            line: '                include /etc/nginx/conf.d/cors-locations.loc;'
            state: present
          when: test_exist_nginx_config.stat.exists and test_first_srv_location.stdout != "" and test_cors_locations_loc.stdout == ""
          tags: cors-insert-include

We want to insert a new include line after our predefined tag “#FIRST-SRV-LOCATION” in the nginx webserver’s main configuration file.

Here is how to run the above ansible playbook
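
The invocation is the same as in the previous playbook example (the playbook file name below is just an example):

myuser@srv ~ $ ansible-playbook -l srv2 -i ./inventory.ini ./playbook-cors-include.yml -b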

Keep on reading!

Really bad performance when going from Write-Back to Write-Through in an LSI controller

Ever wondered what the impact of the write-through mode of an LSI controller is on a real-world streaming server? Wonder no more!

you can get several times slower IO with the write-through mode than if your controller were using the write-back mode of the cache

And it could happen at any moment, because when the battery of the LSI controller is charging and you have set “No Write Cache if Bad BBU”, write-through kicks in. Of course, you can schedule the battery charge/discharge (learn) cycle, but in general it will happen and it will hurt your IO performance a lot!

In simple words, with write-through a write operation is successful only when the controller confirms the write operation on all disks, no matter that the data is already in the cache.

This mode puts pressure on the disks, and write-through is a known destroyer of hard disks! You can read a lot of administrators’ feedback on the Internet about disks that crashed while using write-through mode (sometimes several simultaneously on one machine, losing all your data even with the redundancy of RAID setups like RAID1, RAID5, RAID6, RAID10 and so on).

srv ~ # sudo megacli -ldinfo -lall -aall
                                     
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :system
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 13.781 TB
Sector Size         : 512
Mirror Data         : 13.781 TB
State               : Optimal
Strip Size          : 128 KB
Number Of Drives per span:2
Span Depth          : 6
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAdaptive, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Bad Blocks Exist: No
Is VD Cached: Yes
Cache Cade Type : Read Only

Exit Code: 0x00

As you can see, our default cache policy is WriteBack with “No Write Cache if Bad BBU”, but the current policy is WriteThrough – the BBU is not bad, it is just charging!
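
To confirm that a battery charge/learn cycle is what pushed the controller into write-through, you can also query the BBU state (the exact MegaCli option spelling may differ slightly between versions):

# show the battery state, charge level and whether a learn cycle is in progress
sudo megacli -AdpBbuCmd -GetBbuStatus -aAll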
Keep on reading!

nginx with php fpm (fastcgi) and the warning – an upstream response is buffered to a temporary file /var/cache/nginx/fastcgi_temp

As the web grows and technology advances, the page size of web sites grows too, or sometimes you just want to output a big chunk of data from your application server – PHP-FPM in this case (but it could be any other: Ruby, Python, C, Django and more).
Here is a quick configuration tip (note this is not the proxy-related warning!):

The default nginx buffers per CGI connection are too small

Here is what to do in your nginx configuration file:
First, look for a line like “include /etc/nginx/fastcgi_params;” and add the following after it (or edit the values if the directives already exist):

        fastcgi_buffer_size 16k;
        fastcgi_buffers 32 16k;
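
After adding the buffers, test the configuration and reload Nginx (standard procedure on CentOS 7 with systemd):

nginx -t && systemctl reload nginx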

Check out more about the buffers here: http://nginx.org/en/docs/http/ngx_http_fastcgi_module.html#fastcgi_buffers
The warning should stop; if it does not, you can try raising the values further. It may consume more memory, but it can lower the IO usage of your disks and improve the performance of your site or whatever backend works behind Nginx!

Here is the warning in our nginx error logs. We got it when using php-fpm and the PHP output size was 325965 bytes (~320K).

2019/04/04 09:56:05 [warn] 24451#24451: *44269838 an upstream response is buffered to a temporary file /var/cache/nginx/fastcgi_temp/0/12/0019966120 while reading upstream, client: 10.10.10.10, server: srv17.srv.en, request: "GET /api/20140102/product HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "srv17.srv.en"
2019/04/04 09:56:07 [warn] 24451#24451: *44269849 an upstream response is buffered to a temporary file /var/cache/nginx/fastcgi_temp/2/12/0019966122 while reading upstream, client: 10.10.10.11, server: srv17.srv.en, request: "GET /api/20140102/product HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "srv17.srv.en"
2019/04/04 09:56:09 [warn] 24450#24450: *44269856 an upstream response is buffered to a temporary file /var/cache/nginx/fastcgi_temp/7/12/0019966127 while reading upstream, client: 10.10.10.12, server: srv17.srv.en, request: "GET /api/20140102/product HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "srv17.srv.en"

Tune nginx proxy cache – control the cache manager how to delete cached files

In most cases you’ll never want to modify the default settings for deleting cache items in the proxy_cache_path directive. The problem is that during a peak the file deletion could impact your server performance and even leave the server unresponsive for a period of time. You cannot give nginx a schedule for deleting cached items or forbid the deletion while the server is busy or loaded. The cache manager simply tracks, for each zone, the used cache capacity versus the maximum allowed size (max_size), and if the used capacity gets near or above the maximum, the manager process triggers deletion with the default values – it will try to delete up to 100 files per iteration (spending up to 200 milliseconds), then sleep for 50 milliseconds, then try to delete another 100 files. So your file system could receive a thousand or more file deletions per second!

This could bring your server to an almost unresponsive state during peaks.

And it could be perfectly fine during off-peak hours, but there is no way to tell the nginx cache manager that there is plenty of free space even though the cache limit has been reached, so right now is not the best moment to delete cached files!

You can tune three parameters per cache directory (manual here: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_path):

  • manager_files – delete no more than this number of files in one iteration. The default value is 100.
  • manager_threshold – limit the duration of one delete iteration. The default value is 200 milliseconds; use the nginx time syntax appended to the number, for example “500ms” for 500 milliseconds.
  • manager_sleep – how long the manager sleeps before executing another delete iteration. The default value is 50 milliseconds; again, use the nginx time syntax, for example “500ms” for 500 milliseconds.

proxy_cache_path /mnt/cache levels=1:2 keys_zone=CACHESTATICS:900m inactive=710h max_size=4000g manager_files=2 manager_sleep=200ms manager_threshold=500ms;

With these values the cache manager will delete no more than 2 files per iteration (each iteration limited to 500 milliseconds) and it will sleep 200 milliseconds before the next delete iteration.

The best option for loaded servers

The best option for loaded servers with a full cache is to balance the free space – delete a small number of files per iteration to be sure your server does not get overloaded, even if the free space shrinks during the peaks (more files are cached than the nginx manager can delete – you are aware of this and the free space should be enough), while during the off-peak hours (which are normally several times longer than the peak) the nginx manager can catch up with the deleting and free up some space (fewer files are cached than deleted). Of course, you should tune this according to your situation.
The main idea is to delete files in small batches so you do not saturate your disks. It may take longer to recover the free space, but it will not load your server during peaks. You should consider two things:

  1. Free space – enough free space, and be sure it is enough for the peaks, when the cache can grow above the threshold.
  2. Number of deletions per iteration – you should experiment with this. First, you should be aware of how many files are added over a period that includes one peak and one off-peak, and then balance the number so that after this period the cache is not above the maximum size. Probably the best is to start with a 24-hour period that includes at least one peak.

As you can see in the example above, only 2 files per iteration is good enough for our case. Taking into account the 200ms sleep between iterations, at most about 10 files per second are deleted. In our case this is not enough during the peak, but during the off-peak, which is 20 hours out of every 24, it is enough to get back under the maximum size limit of the cache.
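
A simple way to keep an eye on whether the cache manager keeps up is to watch the used space of the cache against the configured max_size (assuming the cache lives on its own mount point, as the /mnt/cache path in the example above suggests):

# refresh the used space of the cache mount every minute
watch -n 60 df -h /mnt/cache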

Here you can learn how to verify your nginx is deleting cache files and what impact the default settings have on a busy server during a peak: how to disable effectively the deleting (purging) files from nginx proxy_cache (nginx cache manager process). Our loaded server just stopped serving files and the bandwidth dropped by 99% because the nginx cache manager suddenly started deleting cached files.