nfs | Any IT here? Help Me!

LXC with SELinux and NFS share result in kernel: SELinux: inode_doinit_use_xattr: getxattr returned 2 for dev=0:43 ino=

After staring a new LXC container, the syslog program (Syslog-ng) began to throw thousands of errors with this kind of message:

Dec  1 10:50:36 srv kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 2 for dev=0:43 ino=-6977140995289226736
Dec  1 10:50:36 srv kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 2 for dev=0:43 ino=-6551465724643968476
Dec  1 10:50:36 srv kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 2 for dev=0:43 ino=-5980833553552494142
Dec  1 10:50:36 srv kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 2 for dev=0:43 ino=-8820947409424952637
Dec  1 10:50:36 srv kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 2 for dev=0:43 ino=-8270463809263745561
Dec  1 10:50:36 srv kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 2 for dev=0:43 ino=-7923279144252216900
Dec  1 10:50:36 srv kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 2 for dev=0:43 ino=-6181977668994943343
Dec  1 10:50:36 srv kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 2 for dev=0:43 ino=-7585065875445167421
Dec  1 10:50:36 srv kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 2 for dev=0:43 ino=-7923279144252216900
Dec  1 10:50:36 srv kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 2 for dev=0:43 ino=-5826517164673898101
Dec  1 10:50:36 srv kernel: SELinux: inode_doinit_use_xattr:  getxattr returned 2 for dev=0:43 ino=-7585065875445167421
Dec  1 11:01:01 h3 rsyslogd[1147]: imjournal: 3871493 messages lost due to rate-limiting (20000 allowed within 600 seconds)

These messages were logged in thousands. The same time, the NFS statistics showed a strange peak of using getattr. Something was calling getattr thousands times per second. Despite there were no SELinux blocks in audit.log as the dmesg suggested the SELinux might be blamed.
The LXC container is an application container, which has mound bind directory from the host server. The very same directory is an local NFS share (using NFS-Ganesha) of a GlusterFS volume and the PHP files are situated there.

main menu — kernel SELinux inode_doinit_use_xattr getxattr returned 2 nfsstat getattr graph

So the LXC container reads the PHP files from this NFS share. There were no issues to access the files and the application LXC worked just fine.
The problem disappeared when the NFS share was remounted with SELinux permissions using the context word:

node3:/VOL1 /mnt/nfs/VOL1 nfs defaults,hard,noexec,nosuid,_netdev,fsc,noatime,context="system_u:object_r:httpd_sys_rw_content_t:s0" 0 0

All the files are of SELinux label httpd_sys_rw_content_t and after restarting the LXC container there were no SELinux lines in the dmesg and the syslog logs. The administrator should configure the right SELinux permissions to the LXC bound directories. More on why SELinux sometimes does not report on blocks in the audit.log here – Selinux permission denied and no log in audit.log.

Migrate from NFS Kernel Server to NFS-Ganesha server under CentOS Stream 9

This article is to show how to migrate from the NFS kernel server to the NFS-Ganesha server under CentOS Stream 9. The most important thing for migrating from one program to another program is how much downtime will be and what is expected to be done by the clients. In this case, what the clients are needed to do when NFS-Ganesha is used for the server?

Here are the main points when migrating from NFS Kernel Server to the NFS-Ganesha:

The nfs-tuils and nfs-ganesha packages and in general, the two software, are perfectly fine installed on the same system. There are no conflicts when NFS Kernel Server and the NFS-Ganesha server are installed at the same time on the same system.
The clients, do not need to do anything, except remount the NFS mounts.
It should be installed a new community repository by installing the centos-release-nfs-ganesha5 package. The Special Interest Groups (SIG) maintains the repository and the group is within the CentOS community

For installation of NFS-Ganesha and a detailed information check out the older article on the subject – Simple export of a ext4 directory with NFS Ganesha 3.5 server in CentOS 8 with SELinux enforcing, Simple export of a ext4 directory with NFS Ganesha 3.5 server in CentOS 8 without SELinux and Create and export a GlusterFS volume with NFS-Ganesha in CentOS 8

Prerequisite – NFS Kernel Configuration

NFS Kernel Server is installed with nfs-utils packages (and its dependencies) and it has the following simple configuration:

[root@srv ~]# cat /etc/exports
/mnt/storage           192.168.0.0/24(rw,sync,no_root_squash,no_subtree_check)

And here are the NFS services on the system:

[root@srv ~]# systemctl |grep nfs
  proc-fs-nfsd.mount                                         loaded active mounted   NFSD configuration filesystem
  var-lib-nfs-rpc_pipefs.mount                               loaded active mounted   RPC Pipe File System
  nfs-idmapd.service                                         loaded active running   NFSv4 ID-name mapping service
  nfs-mountd.service                                         loaded active running   NFS Mount Daemon
  nfs-server.service                                         loaded active exited    NFS server and services
  nfsdcld.service                                            loaded active running   NFSv4 Client Tracking Daemon
  nfs-client.target                                          loaded active active    NFS client services

The server’s firewall has been tuned for the NFS kernel server, so no need to edit anything in the firewall for the NFS-Ganesha server.
Keep on reading!

Software and technical details of Fedora Server 35 including cockpit screenshots

This article is for those of you who do not want to install a whole new operating system only to discover some technical details about the default installation like disk layout, packages included, software versions, and so on. Here we are going to review in several sections what is like to have a default installation of Fedora Server 35 using a real not virtual machine!
The kernel is 5.14.10 it detects successfully the Threadripper 1950X AMD and the system is stable (we booted in UEFI mode).
The installation procedure uses default options for all installation setups – Minimal network installation of Fedora 35 Server.
Installed packages are 604 occupying 1.7G space:. Note, this is Fedora Server Install, not minimal install. The server install includes the web console – cockpit version 254.

[root@srv ~]# dnf list installed|wc -l
604
[root@srv ~]# df -h /
Filesystem                      Size  Used Avail Use% Mounted on
/dev/mapper/fedora_fedora-root   15G  1.4G   14G  10% /

Keep on reading!

Pages: 1234

Caching NFS files with cachefilesd

A great tool for caching a network filesystem like NFS mounts is cachefilesd! It is easy to use it and a good deal of stats can be retrieved from the tool. More on how it works here https://www.kernel.org/doc/Documentation/filesystems/caching/fscache.txt

Here are quick steps to cache an NFS mounts (it works with NFS-Ganesha servers, too):

Install the daemon tool cachefilesd
Check the configuration file /etc/cachefilesd.conf. In most cases, no need to edit the file! Just check the disk limits if they are good.
Start the cachefilesd daemon.
Mount the network directories with “fsc” option. Umount and mount them all if they’ve been already mounted. The fsc is mandatory option to enable file cacheing of a network mount.
Check stats to see if the file cching is working properly.

The example below is under CentOS 8, but it is almost the same in most Linux distributions.

STEP 1) Install the daemon tool cachefilesd

This is straight forward, just install it with the package manager:

[root@srv ~]# dnf install cachefilesd
Last metadata expiration check: 2:33:44 ago on Tue 08 Dec 2020 07:18:01 AM UTC.
Dependencies resolved.
=============================================================================================================================================================================================
 Package                                        Architecture                              Version                                            Repository                                 Size
=============================================================================================================================================================================================
Installing:
 cachefilesd                                    x86_64                                    0.10.10-4.el8                                      BaseOS                                     43 k

Transaction Summary
=============================================================================================================================================================================================
Install  1 Package

Total download size: 43 k
Installed size: 71 k
Is this ok [y/N]: y
Downloading Packages:
cachefilesd-0.10.10-4.el8.x86_64.rpm                                                                                                                         3.1 MB/s |  43 kB     00:00    
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                                        2.8 MB/s |  43 kB     00:00     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                                                                                                     1/1 
  Installing       : cachefilesd-0.10.10-4.el8.x86_64                                                                                                                                    1/1 
  Running scriptlet: cachefilesd-0.10.10-4.el8.x86_64                                                                                                                                    1/1 
  Verifying        : cachefilesd-0.10.10-4.el8.x86_64                                                                                                                                    1/1 

Installed:
  cachefilesd-0.10.10-4.el8.x86_64                                                                                                                                                           

Complete!

STEP 2) Check the configuration file and tune for your system.

In most cases, the defaults in /etc/cachefilesd.conf are good to start with:

dir /var/cache/fscache
tag mycache
brun 10%
bcull 7%
bstop 3%
frun 10%
fcull 7%
fstop 3%

# Assuming you're using SELinux with the default security policy included in
# this package
secctx system_u:system_r:cachefiles_kernel_t:s0

The directory where the cache will reside and the lines with the percentages are for disk space limitation. “brun 10%” means cache can runs freely till the disk space drops below 10%. “bcull 7%” – culling the cache when the free space drops below “7%” and more in the man page (or https://linux.die.net/man/5/cachefilesd.conf).
So if one maintains disk free space below 10% the configuration file should be edited.

STEP 3) Start the cachefilesd daemon.

And enable on boot to start automatically.

[root@srv ~]# systemctl start cachefilesd
[root@srv ~]# systemctl enable cachefilesd
Created symlink /etc/systemd/system/multi-user.target.wants/cachefilesd.service → /usr/lib/systemd/system/cachefilesd.service.
[root@srv ~]# systemctl status cachefilesd
● cachefilesd.service - Local network file caching management daemon
   Loaded: loaded (/usr/lib/systemd/system/cachefilesd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-12-08 10:01:24 UTC; 11s ago
 Main PID: 29786 (cachefilesd)
    Tasks: 1 (limit: 408616)
   Memory: 2.5M
   CGroup: /system.slice/cachefilesd.service
           └─29786 /usr/sbin/cachefilesd -n -f /etc/cachefilesd.conf

Dec 08 10:01:24 srv systemd[1]: Starting Local network file caching management daemon...
Dec 08 10:01:24 srv systemd[1]: Started Local network file caching management daemon.
Dec 08 10:01:24 srv cachefilesd[29786]: About to bind cache
Dec 08 10:01:24 srv cachefilesd[29786]: Bound cache
Dec 08 10:01:24 srv cachefilesd[29786]: Daemon Started

The status command shows the daemon cachefilesd is running. But does it cache?

STEP 4) Mount the network filesystems with option fsc

To make cachefilesd cache a network mount the option fsc must be included in the mount options. Remount may not work correctly, so to be sure a full umount/mount should be executed. Here is an example /etc/fstab file:

192.168.0.1:/mnt/storage /mnt/storage  nfs defaults,hard,intr,noexec,nosuid,_netdev,fsc,vers=4 0 0

And then mount with simple command:

mount /mnt/storage

Check whether the mounts if the FS cache is used. FSC must be “yes”.

[root@srv ~]# cat /proc/fs/nfsfs/volumes
NV SERVER   PORT DEV          FSID                              FSC
v4 c0a80001  801 0:41         d4098a2af096148:ec7560388cbe5b83  yes

There is a proc file for cache statistics:

[root@srv ~]# cat /proc/fs/fscache/stats
FS-Cache statistics
Cookies: idx=49 dat=4385599 spc=0
Objects: alc=43666 nal=0 avl=43666 ded=36002
ChkAux : non=0 ok=12289 upd=0 obs=761
Pages  : mrk=24915179 unc=24492585
Acquire: n=4385648 nul=0 noc=0 ok=4385648 nbf=0 oom=0
Lookups: n=43666 neg=31372 pos=12294 crt=31372 tmo=0
Invals : n=1 run=1
Updates: n=0 nul=0 run=1
Relinqs: n=4377930 nul=0 wcr=0 rtr=0
AttrChg: n=0 ok=0 nbf=0 oom=0 run=0
Allocs : n=0 ok=0 wt=0 nbf=0 int=0
Allocs : ops=0 owt=0 abt=0
Retrvls: n=751549 ok=716860 wt=21436 nod=34689 nbf=0 int=0 oom=0
Retrvls: ops=751549 owt=9158 abt=0
Stores : n=550412 ok=550412 agn=0 nbf=0 oom=0
Stores : ops=33238 run=583650 pgs=550412 rxd=550412 olm=0
VmScan : nos=23963352 gon=0 bsy=0 can=0 wt=0
Ops    : pend=9160 run=784788 enq=26874960 can=0 rej=0
Ops    : ini=1301962 dfr=265 rel=1301962 gc=265
CacheOp: alo=0 luo=0 luc=0 gro=0
CacheOp: inv=0 upo=0 dro=0 pto=0 atc=0 syn=0
CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0
CacheEv: nsp=761 stl=0 rtr=0 cul=0

And here is the cache directory filled with files. If there are no files, the FS cache is not used, probably the mount is not mounted with FSC! Umount and mount the mounts again.

[root@srv ~]# find /var/cache/fscache|head -n 20
/var/cache/fscache
/var/cache/fscache/graveyard
/var/cache/fscache/cache
/var/cache/fscache/cache/@4a
/var/cache/fscache/cache/@4a/I03nfs
/var/cache/fscache/cache/@4a/I03nfs/@84
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7XMW1n8cXgyl60
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7XZG2n84qfxl60
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7XQM2n8IQ4El60
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7XZO2n8Ms7kl60
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7XvQ2n88dKgl60
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7XV40o8cREC6b0
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7X3Y1n8MHO_l60
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7Xhkwn80f23zb0
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7XEQ2n8IuAll60
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7XoO2n8A7Bll60
[root@srv ~]# du -d 1 -h /var/cache/fscache
4.0K    /var/cache/fscache/graveyard
3.8G    /var/cache/fscache/cache
3.8G    /var/cache/fscache

There are 3.8G in the cache.

Create and export a GlusterFS volume with NFS-Ganesha in CentOS 8

GlusterFS built-in NFS server supports only NFS version 3. GlusterFS offers NFS exports using NFS-Ganesha, which supports NFS version 3 and 4 protocols.
NFS-Ganesha server is a user-mode file sharing server, which offers a GlusterFS plugin to export GlusterFS volumes. In the following article, the NSF-Ganesha and GlusterFS are installed and a simple GlusterFS volume is created and then exported through NFS 3 and 4 version protocols.
The version of the software in this article:

CentOS Stream release 8 (25.04.2021)
GlusterFS 8.4
NFS-Ganesha 3.5

STEP 1) Install GlusterFS.

dnf install -y centos-release-gluster
dnf install -y glusterfs-server

The first line will installs a new repository under the SIG management – https://wiki.centos.org/SpecialInterestGroup/Storage. The second line installs the GlusterFS server.

STEP 2) Install NFS-Ganesha.

dnf install -y centos-release-nfs-ganesha30
dnf install -y nfs-ganesha nfs-ganesha-gluster

The first line again installs a new repository under the SIG management and the second line installs the NFS-Ganesha server with Gluster plugin.

STEP 3) Create GlusterFS volume

Start the GlusterFS server and create a simple 3 replicas volume with:
Start the GlusterFS on all the three nodes and enable the GlusterFS communication between the three nodes using firewall-cmd utility. So execute the following commands:

systemctl start glusterd
firewall-cmd --permanent --new-zone=glusternodes
firewall-cmd --permanent --zone=glusternodes --add-source=192.168.0.200
firewall-cmd --permanent --zone=glusternodes --add-source=192.168.0.201
firewall-cmd --permanent --zone=glusternodes --add-source=192.168.0.202
firewall-cmd --permanent --zone=glusternodes --add-service=glusterfs
firewall-cmd --reload

On the first node create the GlusterFS volume. First, add the glnode2 and glnode3 to the cluster.

gluster peer probe glnode2
gluster peer probe glnode3
gluster volume create VOL1 replica 3 transport tcp glnode1:/mnt/storage/gluster/brick glnode2:/mnt/storage/gluster/brick glnode3:/mnt/storage/gluster/brick
gluster volume start VOL1

Keep on reading!

Review of netdata graphs – system overview, cpu, memory, disks and nfs

This is a review of the netdata graphs. Here you can see what you can expect to have when you install netdata (version 1.10) in you server.
As you can see many of the graphs have detailed explanations and some of them have hits what to monitor and pay attention to.

CHART 1) System Overview and grapsh which gather statistics from all parts of the system like CPU, load, disk, ram, swap, network, processes, idlejitter, interrups, softirqs, softnet, entropy, ipc semaphores, uptime.

This is a fst view of the resources of the system and it presents summarized statistics, not detailed! For example you can expect to have the total CPU usage not per core or processor and so on.

CHART 2) CPU and Load

1) Total CPU utilization, netdata Quotation: “Total CPU utilization (all cores). 100% here means there is no CPU idle time at all. You can get per core usage at the CPUs section and per application usage at the Applications Monitoring section. Keep an eye on iowait. If it is constantly high, your disks are a bottleneck and they slow your system down. Another important metric worth monitoring, is softirq. A constantly high percentage of softirq may indicate network driver issues.” and 2) System Load Average – netdata Quotation: “Current system load, i.e. the number of processes using CPU or waiting for system resources (usually CPU and disk). The 3 metrics refer to 1, 5 and 15 minute averages. Linux calculates this once every 5 seconds. Netdata reads them from /proc/loadavg.””

CHART 3) Disk

1) Total Disk I/O for all disks from /proc/vmstat. You can easily match how much of the read/written data is from/to disks. 2) Memory paged form/to disk.

CHART 4) RAM

1) Read from /proc/meminfo. It shows the total RAM and how much is free, used, cached and in buffers. Together with swap graph this is like “free” linux command in the browser. 2) Read from /proc/meminfo. It shows total, free and used swap memory. 3) Swap I/O – Read from /proc/vmstat. More interesting than the previous one, because here you can get aware how often is used your swap device. In fact if you have ins and outs here even a couple of them you probably need more physical RAM or you have misconfigured a service or a application, which could be identified by graphs in Applications->mem or User->mem – which shows the applications’ and users’ ram usage.

CHART 5) All network traffic on all interfaces – no virtual ones included, but it includes IPv4 and IPv6 traffic.

CHART 6) Processes

1) Read /proc/stat. It appears the Running are “processes in the CPU” and Blocked are in Disk sleep. netdata Quotation: “System processes, read from /proc/stat. Running are the processes in the CPU. Blocked are processes that are willing to enter the CPU, but they cannot, e.g. because they wait for disk activity.” 2) The number of new processes created per second. 3) All system processes – the total number for the given time.

CHART 7) Context Switches and idle

1) Context Switches – how many times the CPU is switching from one process, thread or task to another. 2) netdata Quotation: “idle jitter is calculated by netdata. A thread is spawned that requests to sleep for a few microseconds. When the system wakes it up, it measures how many microseconds have passed. The difference between the requested and the actual duration of the sleep, is the idle jitter. This number is useful in real-time environments, where CPU jitter can affect the quality of the service (like VoIP media gateways).”

CHART 8) Interrupts and softirqs

1) Total number of CPU interrupts, 2) System interrupts – hardware interrupts – which part of your hardware system is doing the interrups – you could identify a hardware abuser. 3) CPU softirqs in detail, read from /proc/softirqs – you could identify a software abuser – a service or a processes

CHART 9) softnet and entropy

1) netdata Quotation: “Statistics for CPUs SoftIRQs related to network receive work. Break down per CPU core can be found at CPU / softnet statistics. processed states the number of packets processed, dropped is the number packets dropped because the network device backlog was full (to fix them on Linux use sysctl to increase net.core.netdev_max_backlog), squeezed is the number of packets dropped because the network device budget ran out (to fix them on Linux use sysctl to increase net.core.netdev_budget).” 2) netdata Quotation: “Entropy, is a pool of random numbers (/dev/random) that is mainly used in cryptography. If the pool of entropy gets empty, processes requiring random numbers may run a lot slower (it depends on the interface each program uses), waiting for the pool to be replenished. Ideally a system with high entropy demands should have a hardware device for that purpose (TPM is one such device). There are also several software-only options you may install, like haveged, although these are generally useful only in servers.”

CHART 10) IPC Semaphores and Uptime

1) The total ipc semaphores used in the system 3) uptime of the system

CHART 11) CPU

Utilization by core/logical processor. You can see how much percentage of the CPU is spent in user, system, iowait (probably disk operations!) and softirq (mainly network, but could be also a program with many threads with a lot context switching between them). Here you can see the first Core utilization graph has softirq of 6.0 and the other have none – this is due to the network card is using only the first core/processor (more to follow on the subject).

CHART 12) Interrupts

Interrupts by core/logical processor. Hardware interrups – enp3s0_28 (the network card), NMI, LOC, PMI, IWI, RES, CAL, TLB and so on. You can see the network interrupts are processed only by the first core/processor. You can change this by setting cpu affinity and to split across all CPU – in most cases you do not need this, because using one core/processor the latency is better, but on a busy server easily could reach 100% busy of the first core and the network packets processing will get in troubles.

CHART 13) softirqs

Software interrupts – TIMER, NET_TX, NET_RX, TASKLET, SCHED, RCU – network, context switches synchronization and so on.

CHART 14) softnet

Quotation netdata: “Statistics for per CPUs core SoftIRQs related to network receive work. Total for all CPU cores can be found at System / softnet statistics. processed states the number of packets processed, dropped is the number packets dropped because the network device backlog was full (to fix them on Linux use sysctl to increase net.core.netdev_max_backlog), squeezed is the number of packets dropped because the network device budget ran out (to fix them on Linux use sysctl to increase net.core.netdev_budget).” You can see how much SoftIRQs related to network receive each CPU. As you can see again the network is processed by the first core/processor.

CHART 15) throttling and cpufreq

1) The throttling of the CPU cores if any and 2) cpu frequency changes. If your server is in idle probably you can see more often to get to lower frequency on some cores/processors.

CHART 16) C-state residency for each core/processor.

CHART 17) Memory

1) Total available RAM for applications, 2) Commited Memory is the all the memory allocated by processes and 3) page faults – Quotation netdata: “A page fault is a type of interrupt, called trap, raised by computer hardware when a running program accesses a memory page that is mapped into the virtual address space, but not actually loaded into main memory. If the page is loaded in memory at the time the fault is generated, but is not marked in the memory management unit as being loaded in memory, then it is called a minor or soft page fault. A major page fault is generated when the system needs to load the memory page from disk or swap memory.”

CHART 18) Kernel and Swap memory

1) Quotation netdata: “Dirty is the amount of memory waiting to be written to disk. Writeback is how much memory is actively being written to disk.” – you can tune kernel to how much dirty memory to hold. 2) Memory used by kernel – netdata Quotation: “The total amount of memory being used by the kernel. Slab is the amount of memory used by the kernel to cache data structures for its own use. KernelStack is the amount of memory allocated for each task done by the kernel. PageTables is the amount of memory dedicated to the lowest level of page tables (A page table is used to turn a virtual address into a physical memory address). VmallocUsed is the amount of memory being used as virtual address space.” 3) slab – netdata Quotation: “Reclaimable is the amount of memory which the kernel can reuse. Unreclaimable can not be reused even when the kernel is lacking memory.”

CHART 19) Hugepages

netdata Quotation: “Hugepages is a feature that allows the kernel to utilize the multiple page size capabilities of modern hardware architectures. The kernel creates multiple pages of virtual memory, mapped from both physical RAM and swap. There is a mechanism in the CPU architecture called “Translation Lookaside Buffers” (TLB) to manage the mapping of virtual memory pages to actual physical memory addresses. The TLB is a limited hardware resource, so utilizing a large amount of physical memory with the default page size consumes the TLB and adds processing overhead. By utilizing Huge Pages, the kernel is able to create pages of much larger sizes, each page consuming a single resource in the TLB. Huge Pages are pinned to physical RAM and cannot be swapped/paged out.”

CHART 20) deduper (ksm)

You can save some RAM with this feature. netdata Quotation: “Kernel Same-page Merging (KSM) performance monitoring, read from several files in /sys/kernel/mm/ksm/. KSM is a memory-saving de-duplication feature in the Linux kernel (since version 2.6.32). The KSM daemon ksmd periodically scans those areas of user memory which have been registered with it, looking for pages of identical content which can be replaced by a single write-protected page (which is automatically copied if a process later wants to update its content). KSM was originally developed for use with KVM (where it was known as Kernel Shared Memory), to fit more virtual machines into physical memory, by sharing the data common between them. But it can be useful to any application which generates many instances of the same data.”

CHART 21) Charts with the performance of the disks and disk devices like raids – charts for every device in the system. Most important charts here are the disk utilization where you can see how busy is your device!

1) The disk I/O Bandwidth – Amount of data transferred to and from disk – “md2”. 2) Disk Completed I/O operations – netdata Quotation: “Completed disk I/O operations. Keep in mind the number of operations requested might be higher, since the system is able to merge adjacent to each other (see merged operations chart).”

CHART 22) Disk I/O

1) The average I/O Operations size of device “md2”, 2) Disk space utilization of device “md2” and 3) inodes usage of device “md2”.

CHART 23) Disk I/O of md0

1) Disk I/O Bandwidth, 2) Disk Completed I/O Operations, 3) The average I/O Operations

CHART 24) Disk I/O of sda

1) Disk I/O Bandwidth, 2) Disk Completed I/O Operations, 3) Disk current I/O Operations

CHART 25) Disk I/O of sda 2

1) Backlog – netdata Quotation: “Backlog is an indication of the duration of pending disk operations. On every I/O event the system is multiplying the time spent doing I/O since the last update of this field with the number of pending operations. While not accurate, this metric can provide an indication of the expected completion time of the operations in progress.”, 2) Disk Utilization Time – one of the most important charts, you can see if you disk is saturated, netdata Quotation: “Disk Utilization measures the amount of time the disk was busy with something. This is not related to its performance. 100% means that the system always had an outstanding operation on the disk. Keep in mind that depending on the underlying technology of the disk, 100% here may or may not be an indication of congestion.” 3) Average Completed I/O Operation Time 4) Average Completed I/O Operation Time

CHART 26) Disk I/O of sda 3

1) netdata Quotation: “The average service time for completed I/O operations. This metric is calculated using the total busy time of the disk and the number of completed operations. If the disk is able to execute multiple parallel operations the reporting average service time will be misleading.” 2) netdata Quotation: “The number of merged disk operations. The system is able to merge adjacent I/O operations, for example two 4KB reads can become one 8KB read before given to disk.” 3) netdata Quotation: “The sum of the duration of all completed I/O operations. This number can exceed the interval if the disk is able to execute I/O operations in parallel.”

CHART 27) Performance statistics for a NFS client working on the system.

1) RPC – calls per second, 2) What kind of RPC calls and how many of them.