Data too large, data for [] would be [] which is larger than the limit of

Rsyslog writing to Elasticsearch could lead to an error for some of the records and missing to save them in the backend:

{ ... { "error": { "root_cause": [ { "type": "circuit_breaking_exception", 
"reason": "[parent] Data too large, data for [<http_request>] would be [1008813778\/962mb], which is larger than the limit of [986061209\/940.3mb], 
real usage: [1008812248\/962mb], new bytes reserved: [1530\/1.4kb], usages [request=0\/0b, fielddata=317\/317b, in_flight_requests=1530\/1.4kb, accounting=178301893\/170mb]",
"bytes_wanted": 1008813778, "bytes_limit": 986061209, "durability": "PERMANENT" }], 
"type": "circuit_breaking_exception", "reason": "[parent] Data too large, data for [<http_request>] would be [1008813778\/962mb], which is larger than the limit of [986061209\/940.3mb], 
real usage: [1008812248\/962mb], new bytes reserved: [1530\/1.4kb], usages [request=0\/0b, fielddata=317\/317b, in_flight_requests=1530\/1.4kb, accounting=178301893\/170mb]",
"bytes_wanted": 1008813778, "bytes_limit": 986061209, "durability": "PERMANENT" }, "status": 429 } }

Unfortunately, such writes are not saved in the Elasticsearch and the data has been lost.

The problem here is that the Java VM has reached the maximum allowed memory and more memory should be allowed to be used by the Java Virtual Machine.

Find the Java VM option for the Elasticsearchjvm.options. In CentOS 7 the file is located in /etc/elasticsearch/jvm.options and set more memory with the variables “-Xms[SIZE]g -Xmx[SIZE]g”, such as:

.....
-Xms4g
-Xmx4g
.....

|grep -v grep
This will allow 4G “maximum size of total heap space” to be used by the Java Virtual Machine. By default, it is 1G (-Xms1g -Xmx1g). It is a good idea to set it half of the server’s memory. Save and restart the Elasticsearch service as usual:

systemctl restart elasticsearch

You should see the variable in the command line with ps command:

[root@loganalyzer ~]# ps axuf|grep elasticsearch
elastic+   592 10.8 34.4 168638848 5493156 ?   Ssl  00:56   4:23 /usr/share/elasticsearch/jdk/bin/java -Des.networkaddress.cache.ttl=60
-Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 
-Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false 
-Dlog4j2.disable.jmx=true -Djava.locale.providers=COMPAT 
-Xms4g -Xmx4g 
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
-Djava.io.tmpdir=/tmp/elasticsearch-16851535740012150929 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/elasticsearch 
-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log elasticsearch
-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m 
-XX:MaxDirectMemorySize=2147483648 -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/etc/elasticsearch 
-Des.distribution.flavor=default -Des.distribution.type=rpm -Des.bundled_jdk=true 
-cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid --quiet
elastic+   690  0.0  0.0  70448  4516 ?        Sl   00:56   0:00  \_ /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller

The environment variable ES_JAVA_OPTS could be used, too.

ES_JAVA_OPTS="-Xms4g -Xmx4g" ./bin/elasticsearch 

Kibana server is not ready yet – and Waiting for that migration to complete in the logs

Now, living in the cloud and big data there is a time when the admin may need to save all their logs in a central place! Elasticsearch and Kibana look good for the job! And after months of hassle-free work of the Elasticsearch and Kibana, Elasticsearch just stopped working and after a restart and upgrade (Elasticsearch and Kibana) Kibana showed an error message:

Kibana server is not ready yet

And if you have tried the STOP/START of Kibana and Elasticsearch and Kibana would still show the above message here is what you should do:

  1. Check if the two services are running! Kibana and Elasticsearch, if some of them is missing start it.
  2. Search for the logs and especially Elasticsearch logs. The first place to check is systemd logs with journalctl program (systemctl status also will point out the problem showing last lines of the logs).
  3. Look for the last lines and if they include

    Another Kibana instance appears to be migrating the index

    This article is probably the right place to solve the issue and start your setup successful.

STEP 1) Running services and analyzing the logs.

If kibana and elastisearch use systemd system to operate it is easy to access the logs with systemctl and journalctl
Check whether the kibana and elasticsearch are running with:

[root@loganalyzer ~]# ps ax|grep elasticsearch|grep -v grep
  258 ?        Ssl  836:31 /usr/share/elasticsearch/jdk/bin/java -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.locale.providers=COMPAT -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.io.tmpdir=/tmp/elasticsearch-13303119363353782625 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/elasticsearch -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m -XX:MaxDirectMemorySize=536870912 -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/etc/elasticsearch -Des.distribution.flavor=default -Des.distribution.type=rpm -Des.bundled_jdk=true -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid --quiet
  360 ?        Sl     0:00 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller
[root@loganalyzer ~]# ps ax|grep kibana|grep -v grep
 1284 ?        Ssl    4:32 /usr/share/kibana/bin/../node/bin/node /usr/share/kibana/bin/../src/cli -c /etc/kibana/kibana.yml

If one of the two services are missing you must start it! And second, each service should have only one instance (i.e. process)!
And check the kibana logs with journalctl

[root@loganalyzer ~]# journalctl -u kibana.service
.....
.....
Apr 24 23:09:31 loganalyzer kibana[1219]: {"type":"log","@timestamp":"2020-04-24T23:09:31Z","tags":["info","plugins","bfetch"],"pid":1219,"message":"Setting up plugin"}
Apr 24 23:09:31 loganalyzer kibana[1219]: {"type":"log","@timestamp":"2020-04-24T23:09:31Z","tags":["info","savedobjects-service"],"pid":1219,"message":"Waiting until all Elasticsearch nodes
 are compatible with Kibana before starting saved objects migrations..."}
Apr 24 23:09:31 loganalyzer kibana[1219]: {"type":"log","@timestamp":"2020-04-24T23:09:31Z","tags":["info","savedobjects-service"],"pid":1219,"message":"Starting saved objects migrations"}
Apr 24 23:09:31 loganalyzer kibana[1219]: {"type":"log","@timestamp":"2020-04-24T23:09:31Z","tags":["info","savedobjects-service"],"pid":1219,"message":"Creating index .kibana_task_manager_2
."}
Apr 24 23:09:31 loganalyzer kibana[1219]: {"type":"log","@timestamp":"2020-04-24T23:09:31Z","tags":["warning","savedobjects-service"],"pid":1219,"message":"Unable to connect to Elasticsearch
. Error: [resource_already_exists_exception] index [.kibana_task_manager_2/O070AunfSyG6hwd6_pqqRA] already exists, with { index_uuid=\"O070AunfSyG6hwd6_pqqRA\" & index=\".kibana_task_manager
_2\" }"}
Apr 24 23:09:31 loganalyzer kibana[1219]: {"type":"log","@timestamp":"2020-04-24T23:09:31Z","tags":["warning","savedobjects-service"],"pid":1219,"message":"Another Kibana instance appears to
 be migrating the index. Waiting for that migration to complete. If no other Kibana instance is attempting migrations, you can get past this message by deleting index .kibana_task_manager_2 
and restarting Kibana."}

The systemctl status may be used, too. The error and the index are shown in the last lines of the status output – look below.

The problem here is there was a migration of index .kibana_task_manager_2, but it was abandoned because of unknown reason and now we should delete it to be able to use our kibana service. The index name might be with another name but it is the same problem.

STEP 2) Delete kibana index

Delete the kibana index in elasticsearch backend using curl and HTTP/HTTPS request such as:

[root@loganalyzer kibana]# curl -XDELETE http://192.168.0.2:9200/.kibana_task_manager_2
{"acknowledged":true}

Keep on reading!

send access logs in json to Elasticsearch using rsyslog

Here is a simple example of how to send well-formatted JSON access logs directly to the Elasticsearch server.

It is as simple as Nginx (it could be any webserver) sends the access logs using UDP to the rsyslog server, which then sends well-formatted JSON data to the Elasticsearch server.

No other server program like logstash is used. The data is transformed in rsyslog and it is passed through a couple of modules to ensure the JSON is valid and Elasticsearch would not complain (and missing logs entry!).
Objectives:

  1. Nginx to send access logs using UDP to the rsyslog server.
  2. rsyslog server to accept UDP messages.
  3. rsyslog server transforms the web-server access logs from the Nginx server to JSON.
  4. rsyslog server sends the validated JSON to the Elasticsearch server.

The configuration and the commands are tested on CentOS 7, CentOS 8 and Ubuntu 18 LTS (just replace yum with apt).

STEP 1) Nginx to send access logs using UDP to the rsyslog server.

It is simple enough to send Nginx’ access logs to a UDP server (local or remote) there are two articles here: nginx remote logging to UDP rsyslog server (CentOS 7) and syslog – UDP local to rsyslog and send remote with TCP and compression. For simplicity, Nginx will send to the remote rsyslog server using UDP.
Instruct the Nginx to send access logs using UDP to the remote rsyslog server.
Define a new access log format in http serction:

        log_format mainJSON escape=json '@cee: {'
                '"vhost":"$server_name",'
                '"remote_addr":"$remote_addr",'
                '"time_iso8601":"$time_iso8601",'
                '"request_uri":"$request_uri",'
                '"request_length":"$request_length",'
                '"request_method":"$request_method",'
                '"request_time":"$request_time",'
                '"server_port":"$server_port",'
                '"server_protocol":"$server_protocol",'
                '"ssl_protocol":"$ssl_protocol",'
                '"status":"$status",'
                '"bytes_sent":"$bytes_sent",'
                '"http_referer":"$http_referer",'
                '"http_user_agent":"$http_user_agent",'
                '"upstream_response_time":"$upstream_response_time",'
                '"upstream_addr":"$upstream_addr",'
                '"upstream_connect_time":"$upstream_connect_time",'
                '"upstream_cache_status":"$upstream_cache_status",'
                '"tcpinfo_rtt":"$tcpinfo_rtt",'
                '"tcpinfo_rttvar":"$tcpinfo_rttvar"'
                '}';

It is a valid JSON object, but sometimes in user agent or referer contain non-standard and not valid characters, so it breaks the JSON format, which may lead to problems in Elasticsearch (read ahead).

In a server section of Nginx configuration file /etc/nginx/nginx.conf:

server {
     .....
     access_log      /var/log/nginx/example.com_access.log main;
     access_log      syslog:server=10.10.10.2:514,facility=local7,tag=nginx,severity=info mainJSON;
     .....
}

Keep on reading!