Data too large, data for [] would be [] which is larger than the limit of

Rsyslog writing to Elasticsearch could lead to an error for some of the records and missing to save them in the backend:

{ ... { "error": { "root_cause": [ { "type": "circuit_breaking_exception", 
"reason": "[parent] Data too large, data for [<http_request>] would be [1008813778\/962mb], which is larger than the limit of [986061209\/940.3mb], 
real usage: [1008812248\/962mb], new bytes reserved: [1530\/1.4kb], usages [request=0\/0b, fielddata=317\/317b, in_flight_requests=1530\/1.4kb, accounting=178301893\/170mb]",
"bytes_wanted": 1008813778, "bytes_limit": 986061209, "durability": "PERMANENT" }], 
"type": "circuit_breaking_exception", "reason": "[parent] Data too large, data for [<http_request>] would be [1008813778\/962mb], which is larger than the limit of [986061209\/940.3mb], 
real usage: [1008812248\/962mb], new bytes reserved: [1530\/1.4kb], usages [request=0\/0b, fielddata=317\/317b, in_flight_requests=1530\/1.4kb, accounting=178301893\/170mb]",
"bytes_wanted": 1008813778, "bytes_limit": 986061209, "durability": "PERMANENT" }, "status": 429 } }

Unfortunately, such writes are not saved in the Elasticsearch and the data has been lost.

The problem here is that the Java VM has reached the maximum allowed memory and more memory should be allowed to be used by the Java Virtual Machine.

Find the Java VM option for the Elasticsearchjvm.options. In CentOS 7 the file is located in /etc/elasticsearch/jvm.options and set more memory with the variables “-Xms[SIZE]g -Xmx[SIZE]g”, such as:

.....
-Xms4g
-Xmx4g
.....

|grep -v grep
This will allow 4G “maximum size of total heap space” to be used by the Java Virtual Machine. By default, it is 1G (-Xms1g -Xmx1g). It is a good idea to set it half of the server’s memory. Save and restart the Elasticsearch service as usual:

systemctl restart elasticsearch

You should see the variable in the command line with ps command:

[root@loganalyzer ~]# ps axuf|grep elasticsearch
elastic+   592 10.8 34.4 168638848 5493156 ?   Ssl  00:56   4:23 /usr/share/elasticsearch/jdk/bin/java -Des.networkaddress.cache.ttl=60
-Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 
-Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false 
-Dlog4j2.disable.jmx=true -Djava.locale.providers=COMPAT 
-Xms4g -Xmx4g 
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
-Djava.io.tmpdir=/tmp/elasticsearch-16851535740012150929 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/elasticsearch 
-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log elasticsearch
-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m 
-XX:MaxDirectMemorySize=2147483648 -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/etc/elasticsearch 
-Des.distribution.flavor=default -Des.distribution.type=rpm -Des.bundled_jdk=true 
-cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid --quiet
elastic+   690  0.0  0.0  70448  4516 ?        Sl   00:56   0:00  \_ /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller

The environment variable ES_JAVA_OPTS could be used, too.

ES_JAVA_OPTS="-Xms4g -Xmx4g" ./bin/elasticsearch 

send access logs in json to Elasticsearch using rsyslog

Here is a simple example of how to send well-formatted JSON access logs directly to the Elasticsearch server.

It is as simple as Nginx (it could be any webserver) sends the access logs using UDP to the rsyslog server, which then sends well-formatted JSON data to the Elasticsearch server.

No other server program like logstash is used. The data is transformed in rsyslog and it is passed through a couple of modules to ensure the JSON is valid and Elasticsearch would not complain (and missing logs entry!).
Objectives:

  1. Nginx to send access logs using UDP to the rsyslog server.
  2. rsyslog server to accept UDP messages.
  3. rsyslog server transforms the web-server access logs from the Nginx server to JSON.
  4. rsyslog server sends the validated JSON to the Elasticsearch server.

The configuration and the commands are tested on CentOS 7, CentOS 8 and Ubuntu 18 LTS (just replace yum with apt).

STEP 1) Nginx to send access logs using UDP to the rsyslog server.

It is simple enough to send Nginx’ access logs to a UDP server (local or remote) there are two articles here: nginx remote logging to UDP rsyslog server (CentOS 7) and syslog – UDP local to rsyslog and send remote with TCP and compression. For simplicity, Nginx will send to the remote rsyslog server using UDP.
Instruct the Nginx to send access logs using UDP to the remote rsyslog server.
Define a new access log format in http serction:

        log_format mainJSON escape=json '@cee: {'
                '"vhost":"$server_name",'
                '"remote_addr":"$remote_addr",'
                '"time_iso8601":"$time_iso8601",'
                '"request_uri":"$request_uri",'
                '"request_length":"$request_length",'
                '"request_method":"$request_method",'
                '"request_time":"$request_time",'
                '"server_port":"$server_port",'
                '"server_protocol":"$server_protocol",'
                '"ssl_protocol":"$ssl_protocol",'
                '"status":"$status",'
                '"bytes_sent":"$bytes_sent",'
                '"http_referer":"$http_referer",'
                '"http_user_agent":"$http_user_agent",'
                '"upstream_response_time":"$upstream_response_time",'
                '"upstream_addr":"$upstream_addr",'
                '"upstream_connect_time":"$upstream_connect_time",'
                '"upstream_cache_status":"$upstream_cache_status",'
                '"tcpinfo_rtt":"$tcpinfo_rtt",'
                '"tcpinfo_rttvar":"$tcpinfo_rttvar"'
                '}';

It is a valid JSON object, but sometimes in user agent or referer contain non-standard and not valid characters, so it breaks the JSON format, which may lead to problems in Elasticsearch (read ahead).

In a server section of Nginx configuration file /etc/nginx/nginx.conf:

server {
     .....
     access_log      /var/log/nginx/example.com_access.log main;
     access_log      syslog:server=10.10.10.2:514,facility=local7,tag=nginx,severity=info mainJSON;
     .....
}

Keep on reading!

syslog – UDP local to rsyslog and send remote with TCP and compression

This article is to show how to log Nginx’s access logs locally using UDP to the local rsyslog daemon, which will send the logs to a remote rsyslog server using TCP and compression. In general, logs could generate a lot of traffic and using UDP over distant locations could result in packet loss respectively logs’ lines loss. The idea here is to log messages locally using UDP (non-blocking way) to a local Syslog server, which will send the stream to a remote central Syslog server using TCP connections to be sure no packets are lost. In addition, we are going to enable local caching (if the remote server is temporary unreachable) and compression between the two Syslog servers.
Our goal is to use

  • UDP for our client program (Nginx in the case) for non-blocking log writes.
  • TCP between our local machine and the remote syslog server – to be sure not to lose messages on bad connectivity.
  • local caching for our client machine – not to lose messages if the remote syslog is temporary unreachable.
  • compression between the local machine and the remote syslog server.

The configuration and the commands are tested on CentOS 7, CentOS 8 and Ubuntu 18 LTS. Check out UDP remote logging here – nginx remote logging to UDP rsyslog server (CentOS 7).

STEP 1) Configure client’s local rsyslog to accept UDP log messages only if the messages’ tags are “nginx”

Couple of things should be enabled in the local client-size rsyslog daemon:

  • rsyslog to accept UDP messages. Uncomment or add the following under section “Modules” (probably the first section in the file?) in /etc/rsyslog.conf
    $ModLoad imudp
    $UDPServerRun 514
    

    or

    module(load="imudp")
    input(type="imudp" port="514")
    

    The first is the old syntax, which is still supported and the second is the new syntax. For simplicity, all of the following configuration will be using the new syntax, because the old one is depricated.

  • Add a rule to catch the tag containing “nginx” and execute action to forward the messages to the remote server
    if ($syslogtag == 'nginx:') then {
    action(type="omfwd" target="10.10.10.10" port="10514" protocol="tcp" compression.Mode="single" ZipLevel="9"
    queue.filename="forwarding" queue.spoolDirectory="/var/log" queue.size="1000000" queue.type="LinkedList" queue.maxFileSize="1g" queue.SaveOnShutdown="on"
    action.resumeRetryCount="-1")
    & stop
    }
    
  • The options are almost self-explanatory, the important ones are there is no retry limit count of reconnecting to the server, there is in-disk cache of maximum 1G if the remote server is unavailable and the compression per message is turned on. More on actionshttps://www.rsyslog.com/doc/v8-stable/configuration/actions.html, the forward modulehttps://www.rsyslog.com/doc/v8-stable/configuration/modules/omfwd.html and the queuehttps://www.rsyslog.com/doc/v8-stable/rainerscript/queue_parameters.html

And restart the rsyslog:

systemctl restart rsyslog

Keep on reading!

nginx remote logging to UDP rsyslog server (CentOS 7)

This article will present to you all the configuration needed to remotely save access logs of an Nginx web server. All the configuration from the client and server sides is included. The client and the server use CentOS 7 Linux distribution and the configuration could be used under different Linux distribution. Probably only Selinux rules are kind of specific to the CentOS 7 and the firewalld rules are specific for those who use it as a firewall replacing the iptables. Here is the summary of what to expect:

  • Client-side – nginx configuration
  • Server-side – rsyslog configuration to accept UDP connections
  • Server-side – selinux and firewall configuration

The JSON formatted logs may be sent to a Elasticsearch server, for example. Here is how to do it – send access logs in json to Elasticsearch using rsyslog

STEP 1) Client-side – the Nginx configuration.

Nginx configuration is pretty simple just a single line with the log template and the IP (and port if not default 514) of the rsyslog server. For the record, this is the official documentation https://nginx.org/en/docs/syslog.html. In addition it worth mentioning there could be multiple access_log directives in a single section to log simultaneously on different targets (and the templates may be different or the same). So you can set the access log output of a section locally and remotely.
Nginx configuration (probably /etc/nginx/nginx.conf or whatever is the organization of your Nginx configuration files.)

server {
     .....
     access_log      /var/log/nginx/example.com_access.log main;
     access_log      syslog:server=10.10.10.2:514,facility=local7,tag=nginx,severity=info main3;
     .....
}

The “main” and “main3” are just names of the logging templates defined earlier (you may check rsyslog remote logging – prevent local messages to appear to see an interesting Nginx logging template).
The error log also could be remotely logged:

error_log syslog:server=10.10.10.3 debug;

STEP 2) Server-side – rsyslog configuration to accept UDP connections.

Of course, if you have not installed the rsyslog it’s high time you installed it with (for CentOS 7):

yum install -y rsyslog

To enable rsyslog to listen for UDP connections your rsyslog configuration file (/etc/rsyslog.conf) must include the following:

$ModLoad imudp
$UDPServerRun 514

Most of the Linux distributions have these two lines commented so you just need to uncomment them by removing the “#” from the beginning of the lines. If the lines are missing just add them under section “MODULES” (it should be near the first lines of the rsyslog configuration file).
Change the 514 with the number you like for the UDP listening port.
Write the client’s incoming lines of information to a different location and prevent merging with the local log messages – rsyslog remote logging – prevent local messages to appear. Include as a first rule under the rules’ section starting with “RULES” of the rsyslog configuration file (/etc/rsyslog.conf):

# Remote logging
$template HostIPtemp,"/mnt/logging/%FROMHOST-IP%.log"
if ($fromhost-ip != "127.0.0.1" ) then ?HostIPtemp
& stop

Logs only of remote hosts are going to be saved under /mnt/logging/.log.
Keep on reading!

rsyslog remote logging – prevent local messages to appear

A tip for those who have a remote user server for their log files. When you set up a remote server you probably don’t want local messages to appear in the logging directory (directories) and here is how you can archive it:
Above all the rules in the configuration file “/etc/rsyslog.conf” (or where it is in your system) you include “if” statement for the local server like this:

# Remote logging
$template HostIPtemp,"/mnt/logging/%FROMHOST-IP%.log"
if ($fromhost-ip != "127.0.0.1" ) then ?HostIPtemp
& stop

The name of the template is “HostIPtemp” and the starting part of the path “/mnt/logging/” may be anything you like.
All the remote messages will be redirected to the template and all the rules after won’t be applied to them because we use the “stop instruction”.

That’s why this rule must be above all rules in the whole rule configuration. Above all rules – probably you will find a commented line with “#### RULES ####”

The above configuration will have the following directory structure:

[root@srv ~]# ls -altr /mnt/logging/
total 2792
drwxr-xr-x. 7 root root    4096 12 Sep 10,05 ..
drwxr-xr-x. 2 root root    4096 12 Sep 13,01 .
-rw-------. 1 root root 2844525 12 Sep 13,01 10.10.10.10.log
-rw-------. 1 root root 1245633 12 Sep 13,01 10.10.10.11.log
-rw-------. 1 root root 9722578 12 Sep 13,01 10.10.10.12.log
-rw-------. 1 root root 1127231 12 Sep 13,01 10.10.10.13.log