Send access logs in JSON to Elasticsearch using rsyslog

Here is a simple example of how to send well-formatted JSON access logs directly to the Elasticsearch server.

The setup is simple: Nginx (or any other web server) sends its access logs over UDP to the rsyslog server, which then sends well-formatted JSON data to the Elasticsearch server.

No other server program, such as Logstash, is used. The data is transformed in rsyslog and passed through a couple of modules to ensure that the JSON is valid, so that Elasticsearch does not reject it (which would mean missing log entries!).

Objectives:

  1. Nginx to send access logs using UDP to the rsyslog server.
  2. rsyslog server to accept UDP messages.
  3. rsyslog server transforms the web-server access logs from the Nginx server to JSON.
  4. rsyslog server sends the validated JSON to the Elasticsearch server.

The configuration and the commands were tested on CentOS 7, CentOS 8 and Ubuntu 18.04 LTS (on Ubuntu, replace yum with apt).

STEP 1) Nginx to send access logs using UDP to the rsyslog server.

Sending Nginx's access logs to a UDP server (local or remote) is simple enough, and there are two related articles here: “nginx remote logging to UDP rsyslog server (CentOS 7)” and “syslog – UDP local to rsyslog and send remote with TCP and compression”. For simplicity, Nginx will send directly to the remote rsyslog server using UDP.
Instruct Nginx to send access logs using UDP to the remote rsyslog server.
Define a new access log format in the http section:

        log_format mainJSON escape=json '@cee: {'
                '"vhost":"$server_name",'
                '"remote_addr":"$remote_addr",'
                '"time_iso8601":"$time_iso8601",'
                '"request_uri":"$request_uri",'
                '"request_length":"$request_length",'
                '"request_method":"$request_method",'
                '"request_time":"$request_time",'
                '"server_port":"$server_port",'
                '"server_protocol":"$server_protocol",'
                '"ssl_protocol":"$ssl_protocol",'
                '"status":"$status",'
                '"bytes_sent":"$bytes_sent",'
                '"http_referer":"$http_referer",'
                '"http_user_agent":"$http_user_agent",'
                '"upstream_response_time":"$upstream_response_time",'
                '"upstream_addr":"$upstream_addr",'
                '"upstream_connect_time":"$upstream_connect_time",'
                '"upstream_cache_status":"$upstream_cache_status",'
                '"tcpinfo_rtt":"$tcpinfo_rtt",'
                '"tcpinfo_rttvar":"$tcpinfo_rttvar"'
                '}';

This produces a valid JSON object, but the user agent or referer sometimes contains non-standard or invalid characters (typically byte sequences that are not valid UTF-8, which escape=json passes through unchanged). These break the JSON format, which may lead to problems in Elasticsearch (read on).
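
To illustrate the problem, here is a minimal Python sketch that approximates how escape=json treats a variable's raw bytes. It assumes nginx escapes " and \ with a backslash, turns control characters (below 0x20) into escape sequences, and copies every other byte unchanged, including bytes above 0x7F; the function name nginx_json_escape is hypothetical. A user agent carrying a stray Latin-1 byte therefore still yields a line that a strict JSON parser refuses, because it is not valid UTF-8.

```python
import json


def nginx_json_escape(value: bytes) -> bytes:
    """Approximate (assumed) nginx escape=json handling of a variable's raw bytes."""
    short = {0x08: b"\\b", 0x09: b"\\t", 0x0A: b"\\n", 0x0C: b"\\f", 0x0D: b"\\r"}
    out = bytearray()
    for byte in value:
        if byte in (0x22, 0x5C):          # '"' and '\' get a backslash prefix
            out += b"\\" + bytes([byte])
        elif byte < 0x20:                 # control characters become escape sequences
            out += short.get(byte, b"\\u%04x" % byte)
        else:                             # everything else, including >= 0x80, is copied
            out.append(byte)
    return bytes(out)


# A user agent with quotes and a stray Latin-1 byte (0xE9), i.e. invalid UTF-8.
user_agent = b'Mozilla "test" caf\xe9'
line = b'{"http_user_agent":"' + nginx_json_escape(user_agent) + b'"}'

try:
    json.loads(line)                      # json.loads(bytes) decodes this input as UTF-8
except UnicodeDecodeError as err:
    print("rejected, not valid UTF-8:", err.reason)
```

The quotes are escaped correctly; only the invalid byte survives, which is exactly what mmutf8fix is for later in this article.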

In a server section of the Nginx configuration file /etc/nginx/nginx.conf:

server {
     .....
     access_log      /var/log/nginx/example.com_access.log main;
     access_log      syslog:server=10.10.10.2:514,facility=local7,tag=nginx,severity=info mainJSON;
     .....
}


Multiple access_log directives are allowed in the same context, so the local log file is kept alongside the syslog output.
Change the IP (10.10.10.2, the IP of the remote UDP rsyslog server) and the port (514) to the proper values and reload Nginx.
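
To test the rsyslog side without waiting for real traffic, a datagram resembling what Nginx sends can be built and sent by hand. This is a minimal Python sketch under stated assumptions: it uses a classic RFC 3164-style header (<PRI>TIMESTAMP HOSTNAME TAG: MSG), which may differ in detail from Nginx's own output, and the PRI value follows the standard syslog rule facility * 8 + severity (local7 = 23, info = 6, so 190). The function names and the hostname web1 are hypothetical.

```python
import socket
from datetime import datetime
from typing import Optional

FACILITY_LOCAL7 = 23  # syslog facility code of local7
SEVERITY_INFO = 6     # syslog severity code of info


def build_syslog_line(payload: str, hostname: str, tag: str = "nginx",
                      when: Optional[datetime] = None) -> bytes:
    """Build an RFC 3164-style datagram: <PRI>TIMESTAMP HOSTNAME TAG: MSG."""
    when = when or datetime.now()
    pri = FACILITY_LOCAL7 * 8 + SEVERITY_INFO                 # 23 * 8 + 6 = 190
    timestamp = f"{when:%b} {when.day:2d} {when:%H:%M:%S}"    # day is space-padded
    return f"<{pri}>{timestamp} {hostname} {tag}: {payload}".encode("utf-8")


def send_udp(datagram: bytes, server: str = "10.10.10.2", port: int = 514) -> None:
    """Send one datagram to the rsyslog server; UDP gives no delivery confirmation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram, (server, port))


# Example, run on the web server:
# send_udp(build_syslog_line('@cee: {"vhost":"srv2.example.com","status":"200"}', "web1"))
```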

STEP 2) rsyslog server to accept UDP messages and load the needed modules.

Nginx sends UDP messages, which rsyslog must accept. Here is the basic rsyslog configuration (/etc/rsyslog.conf); uncomment or add the following lines:

$ModLoad imudp
$UDPServerRun 514

module(load="mmjsonparse")          # for parsing CEE-enhanced syslog messages
module(load="omelasticsearch")      # for outputting to Elasticsearch
module(load="mmutf8fix")            # for fixing invalid UTF-8 sequences

Alternatively, use the new syntax (which is preferred):

module(load="imudp")
input(type="imudp" port="514")

module(load="mmjsonparse")          # for parsing CEE-enhanced syslog messages
module(load="omelasticsearch")      # for outputting to Elasticsearch
module(load="mmutf8fix")            # for fixing invalid UTF-8 sequences
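
While debugging, it can help to confirm that Nginx's datagrams reach the server at all before rsyslog processes them. The following Python sketch is a throwaway stand-in for the imudp input, not part of the setup; the function names are hypothetical. Binding to port 514 requires root and only works while rsyslog is not listening on that port (otherwise pick another port for the test).

```python
import socket
from typing import List


def open_listener(host: str = "0.0.0.0", port: int = 514) -> socket.socket:
    """Bind a UDP socket, playing the role of input(type="imudp" port="514")."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive(sock: socket.socket, count: int = 1) -> List[bytes]:
    """Block until `count` datagrams arrive and return their raw bytes."""
    return [sock.recvfrom(65535)[0] for _ in range(count)]


# Example (as root, with rsyslog stopped so that port 514 is free):
#   with open_listener() as sock:
#       for datagram in receive(sock, count=5):
#           print(datagram.decode("utf-8", errors="replace"))
```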

The modules mmjsonparse, omelasticsearch and mmutf8fix are used in the next sections.
They may not be installed by default, as on CentOS 7/8. Here is how to install them on CentOS 7:

yum install -y rsyslog-mmjsonparse rsyslog-elasticsearch rsyslog-mmutf8fix

If the server has a firewall, add a rule to accept UDP messages on port 514 (for more details, see the articles linked in STEP 1):

firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="10.10.10.200" port protocol="udp" port="514" accept'
firewall-cmd --reload

10.10.10.200 is the IP of the client, i.e. the web-server IP.
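
The rich rule itself contains double quotes, so on the command line it has to be wrapped in single quotes, or passed without a shell at all. As a small Python sketch (the helper name firewall_rich_rule_cmd is hypothetical), the command can be built as an argument list, which sidesteps shell quoting entirely, and shlex.join prints the equivalent, safely quoted shell command:

```python
import shlex
from typing import List


def firewall_rich_rule_cmd(client_ip: str, port: int = 514, proto: str = "udp") -> List[str]:
    """Build the firewall-cmd argv that lets `client_ip` reach rsyslog over UDP."""
    rule = (f'rule family="ipv4" source address="{client_ip}" '
            f'port protocol="{proto}" port="{port}" accept')
    return ["firewall-cmd", "--permanent", "--zone=public", f"--add-rich-rule={rule}"]


cmd = firewall_rich_rule_cmd("10.10.10.200")
print(shlex.join(cmd))  # the same command, quoted for a POSIX shell
# To apply it (as root): subprocess.run(cmd, check=True), then run firewall-cmd --reload
```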

STEP 3) rsyslog server transforms the web-server access logs from the Nginx server to JSON.

rsyslog parses the JSON and transforms it. Define a template (insert it in the RULES section of /etc/rsyslog.conf):

template(name="json-syslog" type="list" option.jsonf="on") {
        property(outname="@timestamp" name="timereported" dateFormat="rfc3339" format="jsonf")
        property(outname="host" name="hostname" format="jsonf")
        property(outname="severity" name="syslogseverity" caseConversion="upper" format="jsonf" datatype="number")
        property(outname="facility" name="syslogfacility" format="jsonf" datatype="number")
        property(outname="syslog-tag" name="syslogtag" format="jsonf")
        property(outname="source" name="app-name" format="jsonf" onEmpty="null")
        property(outname="vhost" name="$!vhost" format="jsonf" datatype="string" onEmpty="null")
        property(outname="remote_addr" name="$!remote_addr" format="jsonf" datatype="string" onEmpty="null")
        property(outname="time_iso8601" name="$!time_iso8601" format="jsonf" datatype="string" onEmpty="null")
        property(outname="request_uri" name="$!request_uri" format="jsonf" datatype="string" onEmpty="null")
        property(outname="request_length" name="$!request_length" format="jsonf" datatype="number" onEmpty="null")
        property(outname="request_method" name="$!request_method" format="jsonf" datatype="string" onEmpty="null")
        property(outname="request_time" name="$!request_time" format="jsonf" datatype="number" onEmpty="null")
        property(outname="server_port" name="$!server_port" format="jsonf" datatype="number" onEmpty="null")
        property(outname="server_protocol" name="$!server_protocol" format="jsonf" datatype="string" onEmpty="null")
        property(outname="ssl_protocol" name="$!ssl_protocol" format="jsonf" datatype="string" onEmpty="null")
        property(outname="status" name="$!status" format="jsonf" datatype="number" onEmpty="null")
        property(outname="bytes_sent" name="$!bytes_sent" format="jsonf" datatype="number" onEmpty="null")
        property(outname="http_referer" name="$!http_referer" format="jsonf" datatype="string" onEmpty="null")
        property(outname="http_user_agent" name="$!http_user_agent" format="jsonf" datatype="string" onEmpty="null")
        property(outname="upstream_response_time" name="$!upstream_response_time" format="jsonf" datatype="string" onEmpty="null")
        property(outname="upstream_addr" name="$!upstream_addr" format="jsonf" datatype="string" onEmpty="null")
        property(outname="upstream_connect_time" name="$!upstream_connect_time" format="jsonf" datatype="string" onEmpty="null")
        property(outname="upstream_cache_status" name="$!upstream_cache_status" format="jsonf" datatype="string" onEmpty="null")
        property(outname="tcpinfo_rtt" name="$!tcpinfo_rtt" format="jsonf" datatype="number" onEmpty="null")
        property(outname="tcpinfo_rttvar" name="$!tcpinfo_rttvar" format="jsonf" datatype="number" onEmpty="null")
}
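
The following Python sketch mimics, for a subset of the fields, what the json-syslog template is meant to do with the values parsed by mmjsonparse: string properties are quoted and escaped (format="jsonf"), datatype="number" values are written without quotes, and empty values become null (onEmpty="null"). It illustrates the template's effect and is not rsyslog's implementation; the field list and the function name render_json_syslog are hypothetical, and copying number values verbatim is an assumption about rsyslog's behaviour.

```python
import json

# (name, datatype) pairs: a hypothetical subset of the json-syslog template
FIELDS = [
    ("vhost", "string"),
    ("remote_addr", "string"),
    ("request_length", "number"),
    ("request_time", "number"),
    ("status", "number"),
    ("bytes_sent", "number"),
    ("upstream_addr", "string"),
    ("upstream_cache_status", "string"),
]


def render_json_syslog(parsed: dict) -> str:
    """Render parsed Nginx fields roughly the way the json-syslog template does."""
    parts = []
    for name, datatype in FIELDS:
        value = str(parsed.get(name, ""))
        if value == "":
            rendered = "null"             # onEmpty="null"
        elif datatype == "number":
            rendered = value              # datatype="number": copied without quotes
        else:
            rendered = json.dumps(value)  # format="jsonf": quoted and escaped string
        parts.append(f"{json.dumps(name)}:{rendered}")
    return "{" + ",".join(parts) + "}"


print(render_json_syslog({"vhost": "srv2.example.com", "status": "206",
                          "request_time": "0.000", "upstream_addr": ""}))
```

Because number values are written verbatim in this sketch, a non-numeric value in a field typed as number would make the output invalid JSON. That is presumably why the upstream_* fields, which Nginx fills with comma-separated lists when several upstreams are tried, are kept as strings in the template.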

This template is used to transform each UDP message from the Nginx access logs. The mmutf8fix module analyzes each UDP access log message and fixes invalid (non-UTF-8) characters. Using it is mandatory, because otherwise the Elasticsearch server may reject some messages.

Then the mmjsonparse module parses each message, so the template above can build a valid JSON object: option.jsonf="on" makes the template produce a JSON object, and each property statement adds one name/value pair to it. The values come from the message's original JSON object, already parsed by mmjsonparse (the $!name properties).

Put the actions inside an IF that selects only messages from remote IPs, not local ones, because the local syslog messages should not be sent to Elasticsearch:

.....
if ($fromhost-ip != "127.0.0.1" ) then {
    action(type="mmutf8fix")
    action(type="mmjsonparse")
}
.....

The IF will include one more action shown in the next section.
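
Here is a minimal Python sketch of what the two actions do to each message, handy for experimenting with problematic log lines offline. It is an approximation, not rsyslog code: invalid UTF-8 is replaced with one space per offending byte (mmutf8fix's default replacement character is a space, but Python's decoder decides where each invalid sequence ends, which may not match mmutf8fix exactly), and the text after the @cee: cookie is parsed as a JSON object. The helper names and the error-handler name space_replace are hypothetical.

```python
import codecs
import json
from typing import Optional


def _replace_with_space(err: UnicodeDecodeError):
    # One space per invalid byte, in the spirit of mmutf8fix's default replacement char.
    return " " * (err.end - err.start), err.end


codecs.register_error("space_replace", _replace_with_space)


def utf8fix(raw: bytes) -> str:
    """Decode as UTF-8, replacing invalid byte sequences with spaces (like mmutf8fix)."""
    return raw.decode("utf-8", errors="space_replace")


def parse_cee(msg: str) -> Optional[dict]:
    """Parse the JSON object after the @cee: cookie; None means parsing failed."""
    body = msg.lstrip()
    if not body.startswith("@cee:"):
        return None
    try:
        parsed = json.loads(body[len("@cee:"):])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


raw = b' @cee: {"http_user_agent":"caf\xe9 browser","status":"200"}'
print(parse_cee(utf8fix(raw)))
```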

STEP 4) rsyslog server sends the validated JSON to the Elasticsearch server.

Use an action of type “omelasticsearch” (the Elasticsearch output module) together with the template above to send the transformed JSON object to the Elasticsearch server. Here is the final, full IF rule, placed at the top of the RULES section in /etc/rsyslog.conf (right after the template):

if ($fromhost-ip != "127.0.0.1" ) then {
    action(type="mmutf8fix")
    action(type="mmjsonparse")
    # searchIndex is a fixed index name here; with dynSearchIndex="on" it would
    # be interpreted as the name of a template instead
    action(type="omelasticsearch" server="10.10.10.10" serverport="9200"
           template="json-syslog" searchIndex="rsyslog-index"
           errorfile="/var/log/omelasticsearch.log")
    stop
}

Set “server” to the IP of your Elasticsearch server (10.10.10.10 here). An error file is configured to track errors: every message sent is a valid UTF-8 JSON object, and if Elasticsearch still rejects a message, the error is written to “/var/log/omelasticsearch.log”. It is a good idea to check this file frequently.
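
A quick way to confirm that documents are arriving is Elasticsearch's _count API. The Python sketch below assumes the server answers plain HTTP on 10.10.10.10:9200 without authentication (the action above sets no credentials) and that the index is named rsyslog-index; adjust both if yours differ. The function names are hypothetical.

```python
import json
import urllib.request


def count_url(server: str = "10.10.10.10", port: int = 9200,
              index: str = "rsyslog-index") -> str:
    """URL of the Elasticsearch _count API for the given index."""
    return f"http://{server}:{port}/{index}/_count"


def parse_count(body: bytes) -> int:
    """Extract the document count from a _count response body."""
    return int(json.loads(body)["count"])


def count_documents(server: str = "10.10.10.10", port: int = 9200,
                    index: str = "rsyslog-index") -> int:
    """Ask Elasticsearch how many documents the index currently holds."""
    with urllib.request.urlopen(count_url(server, port, index), timeout=5) as resp:
        return parse_count(resp.read())


# Example: print(count_documents())
```

Running count_documents() twice, a little while apart, should show the number growing while Nginx is serving requests.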

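An example of the JSON object sent by Nginx for a single request (shown on several lines for readability; Nginx sends it as one line):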

@cee: {
"vhost":"srv2.example.com",
"remote_addr":"51.234.23.12",
"time_iso8601":"2020-02-22T01:53:26+00:00",
"request_uri":"/example_uri/path",
"request_length":"233",
"request_method":"GET",
"request_time":"0.000",
"server_port":"443",
"server_protocol":"HTTP/2.0",
"ssl_protocol":"TLSv1.2",
"status":"206",
"bytes_sent":"1327441",
"http_referer":"https://example.com/search/whatsearch",
"http_user_agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36",
"upstream_response_time":"",
"upstream_addr":"",
"upstream_connect_time":"",
"upstream_cache_status":"HIT",
"tcpinfo_rtt":"17618",
"tcpinfo_rttvar":"780"
}
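
All values arrive as JSON strings; the datatype settings in the json-syslog template turn some of them into numbers. The keys, however, must match the $!name references in the template exactly: a key misspelled on either side would presumably just yield null because of onEmpty="null", which is easy to miss. The following Python sketch (hypothetical helper names) compares the keys of one captured @cee: message with the names used in the template text.

```python
import json
import re
from typing import Set

CEE_COOKIE = "@cee:"


def payload_keys(cee_message: str) -> Set[str]:
    """Keys of the JSON object that Nginx sends after the @cee: cookie."""
    body = cee_message.strip()
    if body.startswith(CEE_COOKIE):
        body = body[len(CEE_COOKIE):]
    return set(json.loads(body))


def template_keys(template_text: str) -> Set[str]:
    """Field names referenced as name="$!field" in the rsyslog template."""
    return set(re.findall(r'name="\$!(\w+)"', template_text))


def report(cee_message: str, template_text: str) -> None:
    """Print keys present on only one side of the Nginx/rsyslog contract."""
    sent, used = payload_keys(cee_message), template_keys(template_text)
    print("sent by Nginx but unused by the template:", sorted(sent - used))
    print("used by the template but not sent by Nginx:", sorted(used - sent))
```

Pass it one captured access log message (or the example above) and the template copied from /etc/rsyslog.conf; both lists should be empty.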
