Caching NFS files with cachefilesd

cachefilesd is a great tool for caching network filesystems such as NFS mounts. It is easy to use, and it exposes a good deal of statistics.

Here are the quick steps to cache NFS mounts (it works with NFS-Ganesha servers, too):

  1. Install the cachefilesd daemon.
  2. Check the configuration file /etc/cachefilesd.conf. In most cases there is no need to edit it; just check that the disk limits suit your system.
  3. Start the cachefilesd daemon.
  4. Mount the network directories with the “fsc” option. If they are already mounted, unmount and mount them again. The fsc option is mandatory to enable file caching of a network mount.
  5. Check the stats to see whether file caching is working properly.

The example below uses CentOS 8, but the steps are almost the same on most Linux distributions.

STEP 1) Install the daemon tool cachefilesd

This is straightforward, just install it with the package manager:

[root@srv ~]# dnf install cachefilesd
Last metadata expiration check: 2:33:44 ago on Tue 08 Dec 2020 07:18:01 AM UTC.
Dependencies resolved.
 Package                                        Architecture                              Version                                            Repository                                 Size
 cachefilesd                                    x86_64                                    0.10.10-4.el8                                      BaseOS                                     43 k

Transaction Summary
Install  1 Package

Total download size: 43 k
Installed size: 71 k
Is this ok [y/N]: y
Downloading Packages:
cachefilesd-0.10.10-4.el8.x86_64.rpm                                                                                                                         3.1 MB/s |  43 kB     00:00    
Total                                                                                                                                                        2.8 MB/s |  43 kB     00:00     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                                                                                                     1/1 
  Installing       : cachefilesd-0.10.10-4.el8.x86_64                                                                                                                                    1/1 
  Running scriptlet: cachefilesd-0.10.10-4.el8.x86_64                                                                                                                                    1/1 
  Verifying        : cachefilesd-0.10.10-4.el8.x86_64                                                                                                                                    1/1 



STEP 2) Check the configuration file and tune for your system.

In most cases, the defaults in /etc/cachefilesd.conf are good to start with:

dir /var/cache/fscache
tag mycache
brun 10%
bcull 7%
bstop 3%
frun 10%
fcull 7%
fstop 3%

# Assuming you're using SELinux with the default security policy included in
# this package
secctx system_u:system_r:cachefiles_kernel_t:s0

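The dir line sets the directory where the cache resides, and the lines with percentages limit disk space usage. “brun 10%” means the cache can run freely as long as the free disk space stays above 10%. “bcull 7%” means culling of the cache starts when the free space drops below 7%, and “bstop 3%” means no new cache allocations are made when it drops below 3%. The frun, fcull and fstop lines apply the same kind of limits to free files (inodes) instead of blocks. More details are in the man page of cachefilesd.conf.
So if your disk normally has less than 10% free space, the configuration file should be edited.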
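The three block limits can be read as a small decision rule. The sketch below is only an illustration: the function name cache_state is hypothetical, and it takes the free-space percentage and the three limits as plain integers. It assumes the behaviour described in the cachefilesd.conf man page (culling off above brun, culling on below bcull, allocation stopped below bstop).

```shell
# cache_state FREE BRUN BCULL BSTOP
# Hypothetical helper: classify the cache state for a free-space percentage.
cache_state() {
    free=$1; brun=$2; bcull=$3; bstop=$4
    if [ "$free" -lt "$bstop" ]; then
        echo "stopped"      # no new cache allocations until space is freed
    elif [ "$free" -lt "$bcull" ]; then
        echo "culling"      # old cache objects are being discarded
    elif [ "$free" -le "$brun" ]; then
        echo "unchanged"    # between bcull and brun: culling keeps its previous state
    else
        echo "running"      # culling is off, the cache grows freely
    fi
}

cache_state 25 10 7 3   # prints: running
cache_state 5 10 7 3    # prints: culling
```

With the default limits, a disk that usually sits at 8% free space would never get back above brun, which is why the limits should be lowered on such systems.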

STEP 3) Start the cachefilesd daemon.

Start the daemon and enable it so that it starts automatically on boot.

[root@srv ~]# systemctl start cachefilesd
[root@srv ~]# systemctl enable cachefilesd
Created symlink /etc/systemd/system/ → /usr/lib/systemd/system/cachefilesd.service.
[root@srv ~]# systemctl status cachefilesd
● cachefilesd.service - Local network file caching management daemon
   Loaded: loaded (/usr/lib/systemd/system/cachefilesd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-12-08 10:01:24 UTC; 11s ago
 Main PID: 29786 (cachefilesd)
    Tasks: 1 (limit: 408616)
   Memory: 2.5M
   CGroup: /system.slice/cachefilesd.service
           └─29786 /usr/sbin/cachefilesd -n -f /etc/cachefilesd.conf

Dec 08 10:01:24 srv systemd[1]: Starting Local network file caching management daemon...
Dec 08 10:01:24 srv systemd[1]: Started Local network file caching management daemon.
Dec 08 10:01:24 srv cachefilesd[29786]: About to bind cache
Dec 08 10:01:24 srv cachefilesd[29786]: Bound cache
Dec 08 10:01:24 srv cachefilesd[29786]: Daemon Started

The status command shows the daemon cachefilesd is running. But does it cache?

STEP 4) Mount the network filesystems with option fsc

To make cachefilesd cache a network mount, the fsc option must be included in the mount options. A remount may not work correctly, so to be sure, do a full umount followed by a mount. Here is an example /etc/fstab line (shown without the leading server:/export field):
/mnt/storage  nfs defaults,hard,intr,noexec,nosuid,_netdev,fsc,vers=4 0 0
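Since a missing fsc option silently disables caching, it can help to check the option list before mounting. This is a minimal sketch; has_fsc is a hypothetical helper name, and the option string is the one from the fstab example.

```shell
# has_fsc OPTIONS
# Hypothetical helper: succeed if the comma-separated option list contains "fsc".
has_fsc() {
    case ",$1," in
        *,fsc,*) return 0 ;;
        *)       return 1 ;;
    esac
}

opts="defaults,hard,intr,noexec,nosuid,_netdev,fsc,vers=4"
if has_fsc "$opts"; then
    echo "fsc is set"
else
    echo "fsc is missing"
fi
```

Wrapping the list in commas makes the match exact, so an option such as nofsc is not mistaken for fsc.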

Then mount it with a simple command:

mount /mnt/storage

Check whether the mounts use the FS cache. The FSC column must be “yes”.

[root@srv ~]# cat /proc/fs/nfsfs/volumes
NV SERVER   PORT DEV          FSID                              FSC
v4 c0a80001  801 0:41         d4098a2af096148:ec7560388cbe5b83  yes
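With many mounts, the listing can be filtered for volumes that are not cached. The sketch below assumes the column layout shown above (FSC is the last column); the function name nfs_volumes_without_fsc is hypothetical.

```shell
# nfs_volumes_without_fsc
# Hypothetical helper: read a /proc/fs/nfsfs/volumes listing on stdin and
# print DEV and FSID of every volume whose last column (FSC) is not "yes".
nfs_volumes_without_fsc() {
    awk 'NR > 1 && NF > 0 && $NF != "yes" { print $4, $5 }'
}

# On a live system (assumes the proc file exists):
#   nfs_volumes_without_fsc < /proc/fs/nfsfs/volumes
```

An empty result means every listed volume has FSC set to “yes”.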

There is a proc file for cache statistics:

[root@srv ~]# cat /proc/fs/fscache/stats
FS-Cache statistics
Cookies: idx=49 dat=4385599 spc=0
Objects: alc=43666 nal=0 avl=43666 ded=36002
ChkAux : non=0 ok=12289 upd=0 obs=761
Pages  : mrk=24915179 unc=24492585
Acquire: n=4385648 nul=0 noc=0 ok=4385648 nbf=0 oom=0
Lookups: n=43666 neg=31372 pos=12294 crt=31372 tmo=0
Invals : n=1 run=1
Updates: n=0 nul=0 run=1
Relinqs: n=4377930 nul=0 wcr=0 rtr=0
AttrChg: n=0 ok=0 nbf=0 oom=0 run=0
Allocs : n=0 ok=0 wt=0 nbf=0 int=0
Allocs : ops=0 owt=0 abt=0
Retrvls: n=751549 ok=716860 wt=21436 nod=34689 nbf=0 int=0 oom=0
Retrvls: ops=751549 owt=9158 abt=0
Stores : n=550412 ok=550412 agn=0 nbf=0 oom=0
Stores : ops=33238 run=583650 pgs=550412 rxd=550412 olm=0
VmScan : nos=23963352 gon=0 bsy=0 can=0 wt=0
Ops    : pend=9160 run=784788 enq=26874960 can=0 rej=0
Ops    : ini=1301962 dfr=265 rel=1301962 gc=265
CacheOp: alo=0 luo=0 luc=0 gro=0
CacheOp: inv=0 upo=0 dro=0 pto=0 atc=0 syn=0
CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0
CacheEv: nsp=761 stl=0 rtr=0 cul=0
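A quick way to judge whether the cache is effective is to turn a pair of counters into a percentage, for example successful retrievals (ok) against all retrieval requests (n) on the first Retrvls line. The sketch below is an assumption-laden illustration: fscache_ratio is a hypothetical helper, and it relies on the key=value layout of the stats output shown above, which can differ between kernel versions.

```shell
# fscache_ratio PREFIX NUM_KEY DEN_KEY
# Hypothetical helper: read FS-Cache stats on stdin, take the first line that
# starts with PREFIX, and print NUM_KEY/DEN_KEY as a percentage.
fscache_ratio() {
    awk -v prefix="$1" -v num="$2" -v den="$3" '
        index($0, prefix) == 1 && !done {
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == num) n = kv[2]
                if (kv[1] == den) d = kv[2]
            }
            if (d > 0) printf "%.1f\n", 100 * n / d
            done = 1
        }'
}

# On a live system (assumes the proc file exists):
#   fscache_ratio Retrvls ok n < /proc/fs/fscache/stats
```

For the numbers above, the Retrvls line gives 716860 of 751549, so about 95.4% of the retrieval requests were served from the cache.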

And here is the cache directory filled with files. If there are no files, the FS cache is not in use, most likely because the mount was not mounted with fsc. Unmount and mount the mounts again.

[root@srv ~]# find /var/cache/fscache|head -n 20
[root@srv ~]# du -d 1 -h /var/cache/fscache
4.0K    /var/cache/fscache/graveyard
3.8G    /var/cache/fscache/cache
3.8G    /var/cache/fscache

There are 3.8G in the cache.

Create and export a GlusterFS volume with NFS-Ganesha in CentOS 8

The GlusterFS built-in NFS server supports only NFS version 3. For NFS version 4, GlusterFS offers NFS exports through NFS-Ganesha, which supports both the NFS version 3 and 4 protocols.
NFS-Ganesha is a user-mode file sharing server that offers a GlusterFS plugin to export GlusterFS volumes. In this article, NFS-Ganesha and GlusterFS are installed, and a simple GlusterFS volume is created and then exported through the NFS version 3 and 4 protocols.
The software versions used in this article:

  • CentOS Stream release 8 (25.04.2021)
  • GlusterFS 8.4
  • NFS-Ganesha 3.5

STEP 1) Install GlusterFS.

dnf install -y centos-release-gluster
dnf install -y glusterfs-server

The first line installs a new repository managed by the CentOS Storage SIG. The second line installs the GlusterFS server.

STEP 2) Install NFS-Ganesha.

dnf install -y centos-release-nfs-ganesha30
dnf install -y nfs-ganesha nfs-ganesha-gluster

The first line again installs a new repository managed by the SIG, and the second line installs the NFS-Ganesha server with the Gluster plugin.

STEP 3) Create GlusterFS volume

The goal is a simple volume with 3 replicas. First, start GlusterFS on all three nodes and allow GlusterFS communication between them with the firewall-cmd utility. Each --add-source= option takes the address of one of the nodes. Execute the following commands on every node:

systemctl start glusterd
firewall-cmd --permanent --new-zone=glusternodes
firewall-cmd --permanent --zone=glusternodes --add-source=
firewall-cmd --permanent --zone=glusternodes --add-source=
firewall-cmd --permanent --zone=glusternodes --add-source=
firewall-cmd --permanent --zone=glusternodes --add-service=glusterfs
firewall-cmd --reload

Create the GlusterFS volume on the first node. First, add glnode2 and glnode3 to the cluster, then create and start the volume.

gluster peer probe glnode2
gluster peer probe glnode3
gluster volume create VOL1 replica 3 transport tcp glnode1:/mnt/storage/gluster/brick glnode2:/mnt/storage/gluster/brick glnode3:/mnt/storage/gluster/brick
gluster volume start VOL1
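The brick arguments follow one pattern, node:path, so the list can be generated instead of typed by hand. This is only a sketch: brick_list is a hypothetical helper, and the final line prints the gluster command for review instead of running it.

```shell
# brick_list BRICK_PATH NODE...
# Hypothetical helper: print "node:path" for every node, space-separated.
brick_list() {
    path=$1
    shift
    out=""
    for node in "$@"; do
        out="${out:+$out }$node:$path"
    done
    echo "$out"
}

# Print the full command so it can be checked before it is executed.
echo gluster volume create VOL1 replica 3 transport tcp \
    $(brick_list /mnt/storage/gluster/brick glnode1 glnode2 glnode3)
```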


Review of netdata graphs – system overview, cpu, memory, disks and nfs

This is a review of the netdata graphs. Here you can see what to expect when you install netdata (version 1.10) on your server.
As you can see, many of the graphs have detailed explanations and some of them have hints on what to monitor and pay attention to.

CHART 1) System Overview and graphs which gather statistics from all parts of the system like CPU, load, disk, ram, swap, network, processes, idlejitter, interrupts, softirqs, softnet, entropy, ipc semaphores, uptime.

This is a fast view of the resources of the system and it presents summarized statistics, not detailed ones! For example, you can expect to see the total CPU usage, not the usage per core or processor, and so on.

System Overview

CHART 2) CPU and Load

1) Total CPU utilization, netdata Quotation: “Total CPU utilization (all cores). 100% here means there is no CPU idle time at all. You can get per core usage at the CPUs section and per application usage at the Applications Monitoring section. Keep an eye on iowait. If it is constantly high, your disks are a bottleneck and they slow your system down. Another important metric worth monitoring, is softirq. A constantly high percentage of softirq may indicate network driver issues.” 2) System Load Average, netdata Quotation: “Current system load, i.e. the number of processes using CPU or waiting for system resources (usually CPU and disk). The 3 metrics refer to 1, 5 and 15 minute averages. Linux calculates this once every 5 seconds. Netdata reads them from /proc/loadavg.”

CPU and Load

CHART 3) Disk

1) Total Disk I/O for all disks from /proc/vmstat. You can easily see how much data is read from and written to the disks. 2) Memory paged from/to disk.

Disk I/O and Memory Paged from/to disk


CHART 4) Memory and swap

1) System RAM, read from /proc/meminfo. It shows the total RAM and how much of it is free, used, cached and in buffers. Together with the swap graph, this is like the “free” Linux command in the browser. 2) System swap, read from /proc/meminfo. It shows total, free and used swap memory. 3) Swap I/O, read from /proc/vmstat. This one is more interesting than the previous one, because it shows how often your swap device is used. In fact, if you see ins and outs here, even a couple of them, you probably need more physical RAM, or you have a misconfigured service or application, which can be identified with the graphs in Applications->mem or User->mem, which show the applications’ and users’ RAM usage.
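The swap I/O graph is built from two counters in /proc/vmstat, pswpin and pswpout, which count pages swapped in and out since boot. A minimal sketch for looking at them without netdata follows; swap_io_pages is a hypothetical helper name.

```shell
# swap_io_pages
# Hypothetical helper: read /proc/vmstat-style text on stdin and print the
# cumulative swap-in and swap-out page counters.
swap_io_pages() {
    awk '$1 == "pswpin" || $1 == "pswpout" { print $1 "=" $2 }'
}

# On a live system (assumes the proc file exists):
#   swap_io_pages < /proc/vmstat
```

Running it twice a few seconds apart and comparing the values shows whether the system is swapping right now.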

System memory and System swap memory

CHART 5) All network traffic on all interfaces – no virtual ones included, but it includes IPv4 and IPv6 traffic.

Physical Network Interfaces Aggregated Bandwidth

CHART 6) Processes

1) Read from /proc/stat. Running are the “processes in the CPU” and Blocked are the processes in disk sleep. netdata Quotation: “System processes, read from /proc/stat. Running are the processes in the CPU. Blocked are processes that are willing to enter the CPU, but they cannot, e.g. because they wait for disk activity.” 2) The number of new processes created per second. 3) All system processes – the total number for the given time.

System processes

CHART 7) Context Switches and idle

1) Context Switches – how many times the CPU is switching from one process, thread or task to another. 2) netdata Quotation: “idle jitter is calculated by netdata. A thread is spawned that requests to sleep for a few microseconds. When the system wakes it up, it measures how many microseconds have passed. The difference between the requested and the actual duration of the sleep, is the idle jitter. This number is useful in real-time environments, where CPU jitter can affect the quality of the service (like VoIP media gateways).”

Context Switches and idlejitter

CHART 8) Interrupts and softirqs

1) Total number of CPU interrupts, 2) System interrupts, i.e. hardware interrupts: you can see which part of your hardware is generating the interrupts and identify a hardware abuser. 3) CPU softirqs in detail, read from /proc/softirqs: you can identify a software abuser, such as a service or a process.

Interrupts and softirqs

CHART 9) softnet and entropy

1) netdata Quotation: “Statistics for CPUs SoftIRQs related to network receive work. Break down per CPU core can be found at CPU / softnet statistics. processed states the number of packets processed, dropped is the number packets dropped because the network device backlog was full (to fix them on Linux use sysctl to increase net.core.netdev_max_backlog), squeezed is the number of packets dropped because the network device budget ran out (to fix them on Linux use sysctl to increase net.core.netdev_budget).” 2) netdata Quotation: “Entropy, is a pool of random numbers (/dev/random) that is mainly used in cryptography. If the pool of entropy gets empty, processes requiring random numbers may run a lot slower (it depends on the interface each program uses), waiting for the pool to be replenished. Ideally a system with high entropy demands should have a hardware device for that purpose (TPM is one such device). There are also several software-only options you may install, like haveged, although these are generally useful only in servers.”
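The softnet counters in the quotation come from /proc/net/softnet_stat, where every row holds hexadecimal columns and the first three are processed, dropped and squeezed. The sketch below converts them to decimal. softnet_summary is a hypothetical helper, and it labels rows by position, which is an assumption: a row number does not always equal the CPU number.

```shell
# softnet_summary
# Hypothetical helper: read /proc/net/softnet_stat-style text on stdin and
# print the first three hexadecimal columns of every row in decimal.
softnet_summary() {
    row=0
    while read -r processed dropped squeezed rest; do
        printf 'row%d processed=%d dropped=%d squeezed=%d\n' \
            "$row" "0x$processed" "0x$dropped" "0x$squeezed"
        row=$((row + 1))
    done
}

# On a live system (assumes the proc file exists):
#   softnet_summary < /proc/net/softnet_stat
# Current values of the two tunables named in the quotation:
#   sysctl net.core.netdev_max_backlog net.core.netdev_budget
```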

softnet and entropy

CHART 10) IPC Semaphores and Uptime

1) The total number of IPC semaphores used in the system 2) Uptime of the system

ipc semaphores and uptime


CHART 11) CPUs

Utilization by core/logical processor. You can see what percentage of the CPU time is spent in user, system, iowait (probably disk operations!) and softirq (mainly network, but it could also be a program with many threads and a lot of context switching between them). Here the utilization graph of the first core shows a softirq of 6.0 and the others show none, because the network card is using only the first core/processor (more to follow on the subject).

CHART 12) Interrupts

Interrupts by core/logical processor. Hardware interrupts: enp3s0_28 (the network card), NMI, LOC, PMI, IWI, RES, CAL, TLB and so on. You can see that the network interrupts are processed only by the first core/processor. You can change this by setting the CPU affinity to split them across all CPUs. In most cases you do not need this, because latency is better when one core/processor is used, but on a busy server the first core can easily reach 100% utilization and network packet processing will get into trouble.

CHART 13) softirqs

Software interrupts – TIMER, NET_TX, NET_RX, TASKLET, SCHED, RCU – network, context switches synchronization and so on.

CHART 14) softnet

Quotation netdata: “Statistics for per CPUs core SoftIRQs related to network receive work. Total for all CPU cores can be found at System / softnet statistics. processed states the number of packets processed, dropped is the number packets dropped because the network device backlog was full (to fix them on Linux use sysctl to increase net.core.netdev_max_backlog), squeezed is the number of packets dropped because the network device budget ran out (to fix them on Linux use sysctl to increase net.core.netdev_budget).” You can see how many SoftIRQs related to network receive work each CPU handles. As you can see, the network is again processed by the first core/processor.

CHART 15) throttling and cpufreq

1) The throttling of the CPU cores, if any, and 2) CPU frequency changes. If your server is idle, you will probably see some cores/processors drop to a lower frequency more often.

Throttling and cpufreq

CHART 16) C-state residency for each core/processor.

CHART 17) Memory

1) Total available RAM for applications, 2) Committed Memory is all the memory allocated by processes and 3) page faults – Quotation netdata: “A page fault is a type of interrupt, called trap, raised by computer hardware when a running program accesses a memory page that is mapped into the virtual address space, but not actually loaded into main memory. If the page is loaded in memory at the time the fault is generated, but is not marked in the memory management unit as being loaded in memory, then it is called a minor or soft page fault. A major page fault is generated when the system needs to load the memory page from disk or swap memory.”

CHART 18) Kernel and Swap memory

1) Quotation netdata: “Dirty is the amount of memory waiting to be written to disk. Writeback is how much memory is actively being written to disk.” – you can tune how much dirty memory the kernel holds. 2) Memory used by kernel – netdata Quotation: “The total amount of memory being used by the kernel. Slab is the amount of memory used by the kernel to cache data structures for its own use. KernelStack is the amount of memory allocated for each task done by the kernel. PageTables is the amount of memory dedicated to the lowest level of page tables (A page table is used to turn a virtual address into a physical memory address). VmallocUsed is the amount of memory being used as virtual address space.” 3) slab – netdata Quotation: “Reclaimable is the amount of memory which the kernel can reuse. Unreclaimable can not be reused even when the kernel is lacking memory.”

CHART 19) Hugepages

netdata Quotation: “Hugepages is a feature that allows the kernel to utilize the multiple page size capabilities of modern hardware architectures. The kernel creates multiple pages of virtual memory, mapped from both physical RAM and swap. There is a mechanism in the CPU architecture called “Translation Lookaside Buffers” (TLB) to manage the mapping of virtual memory pages to actual physical memory addresses. The TLB is a limited hardware resource, so utilizing a large amount of physical memory with the default page size consumes the TLB and adds processing overhead. By utilizing Huge Pages, the kernel is able to create pages of much larger sizes, each page consuming a single resource in the TLB. Huge Pages are pinned to physical RAM and cannot be swapped/paged out.”

CHART 20) deduper (ksm)

You can save some RAM with this feature. netdata Quotation: “Kernel Same-page Merging (KSM) performance monitoring, read from several files in /sys/kernel/mm/ksm/. KSM is a memory-saving de-duplication feature in the Linux kernel (since version 2.6.32). The KSM daemon ksmd periodically scans those areas of user memory which have been registered with it, looking for pages of identical content which can be replaced by a single write-protected page (which is automatically copied if a process later wants to update its content). KSM was originally developed for use with KVM (where it was known as Kernel Shared Memory), to fit more virtual machines into physical memory, by sharing the data common between them. But it can be useful to any application which generates many instances of the same data.”

deduper (ksm)

CHART 21) Charts with the performance of the disks and disk devices like RAID arrays, with charts for every device in the system. The most important charts here are the disk utilization ones, where you can see how busy your device is!

1) The disk I/O Bandwidth – Amount of data transferred to and from disk – “md2”. 2) Disk Completed I/O operations – netdata Quotation: “Completed disk I/O operations. Keep in mind the number of operations requested might be higher, since the system is able to merge adjacent to each other (see merged operations chart).”

CHART 22) Disk I/O

1) The average I/O Operations size of device “md2”, 2) Disk space utilization of device “md2” and 3) inodes usage of device “md2”.

CHART 23) Disk I/O of md0

1) Disk I/O Bandwidth, 2) Disk Completed I/O Operations, 3) The average I/O Operations

Disk statistics for device md0

CHART 24) Disk I/O of sda

1) Disk I/O Bandwidth, 2) Disk Completed I/O Operations, 3) Disk current I/O Operations

Disk statistics for device sda

CHART 25) Disk I/O of sda 2

1) Backlog – netdata Quotation: “Backlog is an indication of the duration of pending disk operations. On every I/O event the system is multiplying the time spent doing I/O since the last update of this field with the number of pending operations. While not accurate, this metric can provide an indication of the expected completion time of the operations in progress.”, 2) Disk Utilization Time – one of the most important charts, where you can see whether your disk is saturated, netdata Quotation: “Disk Utilization measures the amount of time the disk was busy with something. This is not related to its performance. 100% means that the system always had an outstanding operation on the disk. Keep in mind that depending on the underlying technology of the disk, 100% here may or may not be an indication of congestion.” 3) Average Completed I/O Operation Time 4) Average Completed I/O Operation Time

Disk statistics for device sda – 2

CHART 26) Disk I/O of sda 3

1) netdata Quotation: “The average service time for completed I/O operations. This metric is calculated using the total busy time of the disk and the number of completed operations. If the disk is able to execute multiple parallel operations the reporting average service time will be misleading.” 2) netdata Quotation: “The number of merged disk operations. The system is able to merge adjacent I/O operations, for example two 4KB reads can become one 8KB read before given to disk.” 3) netdata Quotation: “The sum of the duration of all completed I/O operations. This number can exceed the interval if the disk is able to execute I/O operations in parallel.”

Disk statistics for device sda – 3

CHART 27) Performance statistics for an NFS client working on the system.

1) RPC – calls per second, 2) What kind of RPC calls and how many of them.

NFS Client