Delete an Offline RAID6 virtual drive and create a new one with AVAGO storcli

Offline virtual device means it cannot be used because the missing or bad or failed disks are more than the fault tolerance it is offering. In this case, there is a RAID 6 on a AVAGO MegaRAID 3018 controller with 2 x RAID6 virtual drives with 6 disks each. One of the virtual drives misses 3 of the 6 disks in the group, so this virtual drive is in Offline state and it cannot be repaired. Three new disks are put to replace the failed disks. Here is what command to issue with the AVAGO command-line utility storcli under CentOS 7 to delete and then create a healthy new RAID 6 virtual drive:

  1. Delete the Offline virtual drive.
  2. Create a new RAID 6 virtual drive with 6 disks.
  3. Initialize the newly create virtual drive to make it consistent.

On each step, it is included additional show storcli commands to better preset what happens in reality and how the controller reflects the changes.
The initial state of the whole configuration is shown below:

[root@srv ~]# /opt/MegaRAID/storcli/storcli64 /c0 show
Generating detailed summary of the adapter, it may take a while to complete.

CLI Version = 007.0709.0000.0000 Aug 14, 2018
Operating system = Linux 3.10.0-957.1.3.el7.x86_64
Controller = 0
Status = Success
Description = None

Product Name = AVAGO 3108 MegaRAID
Serial Number = FW-AC5CMJEAARBWA
SAS Address =  500304802426b600
PCI Address = 00:01:00:00
System Time = 09/20/2022, 14:09:12
Mfg. Date = 00/00/00
Controller Time = 09/20/2022, 14:09:08
FW Package Build = 24.21.0-0028
BIOS Version = 6.36.00.2_4.19.08.00_0x06180202
FW Version = 4.680.00-8290
Driver Name = megaraid_sas
Driver Version = 07.705.02.00-rh1
Current Personality = RAID-Mode 
Vendor Id = 0x1000
Device Id = 0x5D
SubVendor Id = 0x15D9
SubDevice Id = 0x809
Host Interface = PCI-E
Device Interface = SAS-12G
Bus Number = 1
Device Number = 0
Function Number = 0
Drive Groups = 2

TOPOLOGY :
========

----------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type  State BT      Size PDC  PI SED DS3  FSpace TR 
----------------------------------------------------------------------------
 0 -   -   -        -   RAID6 OfLn  N  43.654 TB dflt N  N   dflt N      N  
 0 0   -   -        -   RAID6 Dgrd  N  43.654 TB dflt N  N   dflt N      N  
 0 0   0   -        -   DRIVE Msng  -  10.913 TB -    -  -   -    -      N  
 0 0   1   8:1      13  DRIVE Onln  N  10.913 TB dflt N  N   dflt -      N  
 0 0   2   8:2      10  DRIVE Onln  N  10.913 TB dflt N  N   dflt -      N  
 0 0   3   -        -   DRIVE Msng  -  10.913 TB -    -  -   -    -      N  
 0 0   4   8:4      11  DRIVE Onln  N  10.913 TB dflt N  N   dflt -      N  
 0 0   5   -        -   DRIVE Msng  -  10.913 TB -    -  -   -    -      N  
 1 -   -   -        -   RAID6 Optl  N  43.654 TB dflt N  N   dflt N      N  
 1 0   -   -        -   RAID6 Optl  N  43.654 TB dflt N  N   dflt N      N  
 1 0   0   8:6      20  DRIVE Onln  N  10.913 TB dflt N  N   dflt -      N  
 1 0   1   8:7      19  DRIVE Onln  N  12.732 TB dflt N  N   dflt -      N  
 1 0   2   8:8      18  DRIVE Onln  N  10.913 TB dflt N  N   dflt -      N  
 1 0   3   8:9      15  DRIVE Onln  N  10.913 TB dflt N  N   dflt -      N  
 1 0   4   8:10     12  DRIVE Onln  N  10.913 TB dflt N  N   dflt -      N  
 1 0   5   8:11     14  DRIVE Onln  N  10.913 TB dflt N  N   dflt -      N  
----------------------------------------------------------------------------

DG=Disk Group Index|Arr=Array Index|Row=Row Index|EID=Enclosure Device ID
DID=Device ID|Type=Drive Type|Onln=Online|Rbld=Rebuild|Dgrd=Degraded
Pdgd=Partially degraded|Offln=Offline|BT=Background Task Active
PDC=PD Cache|PI=Protection Info|SED=Self Encrypting Drive|Frgn=Foreign
DS3=Dimmer Switch 3|dflt=Default|Msng=Missing|FSpace=Free Space Present
TR=Transport Ready

Virtual Drives = 2

VD LIST :
=======

------------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC      Size Name     
------------------------------------------------------------------
0/0   RAID6 OfLn  RW     No      RAWBD -   ON  43.654 TB storage1 
1/1   RAID6 Optl  RW     Yes     RAWBD -   ON  43.654 TB storage2 
------------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|
Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency

Physical Drives = 12

PD LIST :
=======

---------------------------------------------------------------------------------
EID:Slt DID State DG      Size Intf Med SED PI SeSz Model                Sp Type 
---------------------------------------------------------------------------------
8:0       9 UGood -  12.732 TB SATA HDD N   N  512B ST14000NM001G-2KJ103 D  -    
8:1      13 Onln  0  10.913 TB SATA HDD N   N  512B ST12000NM0007-2A1101 U  -    
8:2      10 Onln  0  10.913 TB SATA HDD N   N  512B ST12000NM0007-2A1101 U  -    
8:3      17 UGood -  12.732 TB SATA HDD N   N  512B ST14000NM001G-2KJ103 D  -    
8:4      11 Onln  0  10.913 TB SATA HDD N   N  512B ST12000NM001G-2MV103 U  -    
8:5      16 UGood -  12.732 TB SATA HDD N   N  512B ST14000NM001G-2KJ103 D  -    
8:6      20 Onln  1  10.913 TB SATA HDD N   N  512B ST12000NM0007-2A1101 U  -    
8:7      19 Onln  1  12.732 TB SATA HDD N   N  512B ST14000NM001G-2KJ103 U  -    
8:8      18 Onln  1  10.913 TB SATA HDD N   N  512B ST12000NM0007-2A1101 U  -    
8:9      15 Onln  1  10.913 TB SATA HDD N   N  512B ST12000NM0007-2A1101 U  -    
8:10     12 Onln  1  10.913 TB SATA HDD N   N  512B ST12000NM0007-2A1101 U  -    
8:11     14 Onln  1  10.913 TB SATA HDD N   N  512B ST12000NM0007-2A1101 U  -    
---------------------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded


Cachevault_Info :
===============

------------------------------------
Model  State   Temp Mode MfgDate    
------------------------------------
CVPM02 Optimal 28C  -    2018/01/11 
------------------------------------

The show storcli command for the first virtual drive “/c0/v0” is also possible:

[root@srv ~]# /opt/MegaRAID/storcli/storcli64 /c0/v0 show all
CLI Version = 007.0709.0000.0000 Aug 14, 2018
Operating system = Linux 3.10.0-957.1.3.el7.x86_64
Controller = 0
Status = Success
Description = None


/c0/v0 :
======

------------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC      Size Name     
------------------------------------------------------------------
0/0   RAID6 OfLn  RW     No      RAWBD -   ON  43.654 TB storage1 
------------------------------------------------------------------

Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|Dgrd=Degraded
Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|
Consist=Consistent|R=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
Check Consistency


PDs for VD 0 :
============

---------------------------------------------------------------------------------
EID:Slt DID State DG      Size Intf Med SED PI SeSz Model                Sp Type 
---------------------------------------------------------------------------------
8:1      13 Onln   0 10.913 TB SATA HDD N   N  512B ST12000NM0007-2A1101 U  -    
8:2      10 Onln   0 10.913 TB SATA HDD N   N  512B ST12000NM0007-2A1101 U  -    
8:4      11 Onln   0 10.913 TB SATA HDD N   N  512B ST12000NM001G-2MV103 U  -    
---------------------------------------------------------------------------------

EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info
SeSz-Sector Size|Sp-Spun|U-Up|D-Down/PowerSave|T-Transition|F-Foreign
UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded


VD0 Properties :
==============
Strip Size = 1.0 MB
Number of Blocks = 93746888704
VD has Emulated PD = Yes
Span Depth = 1
Number of Drives Per Span = 6
Write Cache(initial setting) = WriteBack
Disk Cache Policy = Disk's Default
Encryption = None
Data Protection = Disabled
Active Operations = None
Exposed to OS = Yes
OS Drive Name = N/A
Creation Date = 19-12-2018
Creation Time = 06:11:08 AM
Emulation type = default
Cachebypass size = Cachebypass-64k
Cachebypass Mode = Cachebypass Intelligent
Is LD Ready for OS Requests = Yes
SCSI NAA Id = 600304802426b60023ac9d7c0a7a305b
SCSI Unmap = No

Keep on reading!

Copying partition table from one disk to another with older sfdisk under CentOS 7

Older version of sfdisk may still be used for msdos partition tables.
To copy the partition table from one disk to another using sfdisk a temporary file should be used to store the data for the partition table.

Here is an example of how to copy the msdos partition table from disk sda to disk sdb! Two simple commands

  1. Dump the source (sda) partition table to a temporary file.
  2. Redirect the standard input of the sfdisk utility with the above temporary file.

A copying partition table is really useful when recovering from a drive failure in a Linux software raid. Sometimes it is difficult or just easier to create the exact layout as the source mirror disk!

mdadm --add /dev/md1 /dev/sdb2
mdadm: /dev/sdb2 not large enough to join array

Errors such as the above are easily resolved with just two commands. The new versions of disk programs align the partitions, which may be a problem for a software RAID to join in a partition.

STEP 1) Dump the source partition table.

The source partition table is from /dev/sda:

[root@srv ~]# sfdisk -d /dev/sda > part_table_sda
sfdisk: Warning: extended partition does not start at a cylinder boundary.
DOS and Linux will interpret the contents differently.

There is a warning about not aligned partition, which may cause problems when creating from scratch, so copying the partition table is the best option in such cases.

Here is what the temporary file part_table_sda with the partition table information contains:

[root@srv ~]# cat part_table_sda 
# partition table of /dev/sda
unit: sectors

/dev/sda1 : start=     2048, size= 67045376, Id=fd, bootable
/dev/sda2 : start= 67110912, size=  1048576, Id=fd
/dev/sda3 : start= 68159488, size= 62883840, Id=fd
/dev/sda4 : start=131043328, size=845729792, Id= f
/dev/sda5 : start=131076096, size=845434880, Id=fd

Keep on reading!

Caching NFS files with cachefilesd

A great tool for caching a network filesystem like NFS mounts is cachefilesd! It is easy to use it and a good deal of stats can be retrieved from the tool. More on how it works here https://www.kernel.org/doc/Documentation/filesystems/caching/fscache.txt

Here are quick steps to cache an NFS mounts (it works with NFS-Ganesha servers, too):

  1. Install the daemon tool cachefilesd
  2. Check the configuration file /etc/cachefilesd.conf. In most cases, no need to edit the file! Just check the disk limits if they are good.
  3. Start the cachefilesd daemon.
  4. Mount the network directories with “fsc” option. Umount and mount them all if they’ve been already mounted. The fsc is mandatory option to enable file cacheing of a network mount.
  5. Check stats to see if the file cching is working properly.

The example below is under CentOS 8, but it is almost the same in most Linux distributions.

STEP 1) Install the daemon tool cachefilesd

This is straight forward, just install it with the package manager:

[root@srv ~]# dnf install cachefilesd
Last metadata expiration check: 2:33:44 ago on Tue 08 Dec 2020 07:18:01 AM UTC.
Dependencies resolved.
=============================================================================================================================================================================================
 Package                                        Architecture                              Version                                            Repository                                 Size
=============================================================================================================================================================================================
Installing:
 cachefilesd                                    x86_64                                    0.10.10-4.el8                                      BaseOS                                     43 k

Transaction Summary
=============================================================================================================================================================================================
Install  1 Package

Total download size: 43 k
Installed size: 71 k
Is this ok [y/N]: y
Downloading Packages:
cachefilesd-0.10.10-4.el8.x86_64.rpm                                                                                                                         3.1 MB/s |  43 kB     00:00    
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                                        2.8 MB/s |  43 kB     00:00     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                                                                                                     1/1 
  Installing       : cachefilesd-0.10.10-4.el8.x86_64                                                                                                                                    1/1 
  Running scriptlet: cachefilesd-0.10.10-4.el8.x86_64                                                                                                                                    1/1 
  Verifying        : cachefilesd-0.10.10-4.el8.x86_64                                                                                                                                    1/1 

Installed:
  cachefilesd-0.10.10-4.el8.x86_64                                                                                                                                                           

Complete!

STEP 2) Check the configuration file and tune for your system.

In most cases, the defaults in /etc/cachefilesd.conf are good to start with:

dir /var/cache/fscache
tag mycache
brun 10%
bcull 7%
bstop 3%
frun 10%
fcull 7%
fstop 3%

# Assuming you're using SELinux with the default security policy included in
# this package
secctx system_u:system_r:cachefiles_kernel_t:s0

The directory where the cache will reside and the lines with the percentages are for disk space limitation. “brun 10%” means cache can runs freely till the disk space drops below 10%. “bcull 7%” – culling the cache when the free space drops below “7%” and more in the man page (or https://linux.die.net/man/5/cachefilesd.conf).
So if one maintains disk free space below 10% the configuration file should be edited.

STEP 3) Start the cachefilesd daemon.

And enable on boot to start automatically.

[root@srv ~]# systemctl start cachefilesd
[root@srv ~]# systemctl enable cachefilesd
Created symlink /etc/systemd/system/multi-user.target.wants/cachefilesd.service → /usr/lib/systemd/system/cachefilesd.service.
[root@srv ~]# systemctl status cachefilesd
● cachefilesd.service - Local network file caching management daemon
   Loaded: loaded (/usr/lib/systemd/system/cachefilesd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-12-08 10:01:24 UTC; 11s ago
 Main PID: 29786 (cachefilesd)
    Tasks: 1 (limit: 408616)
   Memory: 2.5M
   CGroup: /system.slice/cachefilesd.service
           └─29786 /usr/sbin/cachefilesd -n -f /etc/cachefilesd.conf

Dec 08 10:01:24 srv systemd[1]: Starting Local network file caching management daemon...
Dec 08 10:01:24 srv systemd[1]: Started Local network file caching management daemon.
Dec 08 10:01:24 srv cachefilesd[29786]: About to bind cache
Dec 08 10:01:24 srv cachefilesd[29786]: Bound cache
Dec 08 10:01:24 srv cachefilesd[29786]: Daemon Started

The status command shows the daemon cachefilesd is running. But does it cache?

STEP 4) Mount the network filesystems with option fsc

To make cachefilesd cache a network mount the option fsc must be included in the mount options. Remount may not work correctly, so to be sure a full umount/mount should be executed. Here is an example /etc/fstab file:

192.168.0.1:/mnt/storage /mnt/storage  nfs defaults,hard,intr,noexec,nosuid,_netdev,fsc,vers=4 0 0

And then mount with simple command:

mount /mnt/storage

Check whether the mounts if the FS cache is used. FSC must be “yes”.

[root@srv ~]# cat /proc/fs/nfsfs/volumes
NV SERVER   PORT DEV          FSID                              FSC
v4 c0a80001  801 0:41         d4098a2af096148:ec7560388cbe5b83  yes

There is a proc file for cache statistics:

[root@srv ~]# cat /proc/fs/fscache/stats
FS-Cache statistics
Cookies: idx=49 dat=4385599 spc=0
Objects: alc=43666 nal=0 avl=43666 ded=36002
ChkAux : non=0 ok=12289 upd=0 obs=761
Pages  : mrk=24915179 unc=24492585
Acquire: n=4385648 nul=0 noc=0 ok=4385648 nbf=0 oom=0
Lookups: n=43666 neg=31372 pos=12294 crt=31372 tmo=0
Invals : n=1 run=1
Updates: n=0 nul=0 run=1
Relinqs: n=4377930 nul=0 wcr=0 rtr=0
AttrChg: n=0 ok=0 nbf=0 oom=0 run=0
Allocs : n=0 ok=0 wt=0 nbf=0 int=0
Allocs : ops=0 owt=0 abt=0
Retrvls: n=751549 ok=716860 wt=21436 nod=34689 nbf=0 int=0 oom=0
Retrvls: ops=751549 owt=9158 abt=0
Stores : n=550412 ok=550412 agn=0 nbf=0 oom=0
Stores : ops=33238 run=583650 pgs=550412 rxd=550412 olm=0
VmScan : nos=23963352 gon=0 bsy=0 can=0 wt=0
Ops    : pend=9160 run=784788 enq=26874960 can=0 rej=0
Ops    : ini=1301962 dfr=265 rel=1301962 gc=265
CacheOp: alo=0 luo=0 luc=0 gro=0
CacheOp: inv=0 upo=0 dro=0 pto=0 atc=0 syn=0
CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0
CacheEv: nsp=761 stl=0 rtr=0 cul=0

And here is the cache directory filled with files. If there are no files, the FS cache is not used, probably the mount is not mounted with FSC! Umount and mount the mounts again.

[root@srv ~]# find /var/cache/fscache|head -n 20
/var/cache/fscache
/var/cache/fscache/graveyard
/var/cache/fscache/cache
/var/cache/fscache/cache/@4a
/var/cache/fscache/cache/@4a/I03nfs
/var/cache/fscache/cache/@4a/I03nfs/@84
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7XMW1n8cXgyl60
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7XZG2n84qfxl60
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7XQM2n8IQ4El60
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7XZO2n8Ms7kl60
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7XvQ2n88dKgl60
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7XV40o8cREC6b0
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7X3Y1n8MHO_l60
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7Xhkwn80f23zb0
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7XEQ2n8IuAll60
/var/cache/fscache/cache/@4a/I03nfs/@84/Jc0010800840000cG0400/@68/J1100000000000Q0goaWH946iIn7oUMELrd80000000040000g00Kb000wFe000jt000oG3000000040000g000000000/@8b/EA0g00sg0200n8000000ixBMHyy9gdcUm_O8ewl7XoO2n8A7Bll60
[root@srv ~]# du -d 1 -h /var/cache/fscache
4.0K    /var/cache/fscache/graveyard
3.8G    /var/cache/fscache/cache
3.8G    /var/cache/fscache

There are 3.8G in the cache.

libelf was not found in the pkg-config search path

Building from source under CentOS the user may stumble on some compilation errors and most of them are for missing -devel packages. Here is such example with not so easy to find the name of a missing library:

[/tmp/netdata-libbpf-El77Ld/libbpf-0.0.9_netdata-1/src]# env CFLAGS=-fPIC CXXFLAGS= LDFLAGS= BUILD_STATIC_ONLY=y OBJDIR=build DESTDIR=.. make install 
Package libelf was not found in the pkg-config search path.
Perhaps you should add the directory containing `libelf.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libelf' found
mkdir -p build/staticobjs
cc -I. -I../include -I../include/uapi -DCOMPAT_NEED_REALLOCARRAY -fPIC -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64   -c bpf.c -o build/staticobjs/bpf.o
cc -I. -I../include -I../include/uapi -DCOMPAT_NEED_REALLOCARRAY -fPIC -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64   -c btf.c -o build/staticobjs/btf.o
btf.c:17:18: fatal error: gelf.h: No such file or directory
 #include <gelf.h>
                  ^
compilation terminated.
make: *** [build/staticobjs/btf.o] Error 1
 FAILED   

The missing development library file is with the name: elfutils-libelf-devel. Installing the package with yum or dnf will resolve the above error:

yum install -y elfutils-libelf-devel

Or for CentOS 8 and newer Fedora versions:

dnf install -y elfutils-libelf-devel

removing the default kernel in CentOS 8 – remove elrepo kernel

Removing the default kernel aka the loaded kernel in CentOS 8 maybe challenging because the package is protected and cannot be removed by the yum or dnf.
Here is the case: an elrepo kernel-ml loaded and the dnf prints it cannot remove the package, because it is protected:

[root@srv ~]# dnf remove kernel-ml kernel-ml-core kernel-ml-modules
Error: 
 Problem: The operation would result in removing the following protected packages: kernel-ml-core
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
[root@srv ~]# uname -a
Linux srv.localhost 5.10.4-1.el8.elrepo.x86_64 #1 SMP Tue Dec 29 11:04:23 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@srv ~]# grubby --default-kernel
/boot/vmlinuz-5.10.4-1.el8.elrepo.x86_64

The system is booted up with the kernel we are trying to remove, which is impossible.

The solution is to set a new default kernel and load it. Then dnf will be able to remove the first kernel.

For CentOS 7, just use the yum instead of dnf command.
Using grubby is really easy and straightforward:

STEP 1) List all installed and available to boot kernels

[root@srv ~]# grubby --info=ALL |grep ^kernel
kernel="/boot/vmlinuz-5.10.4-1.el8.elrepo.x86_64"
kernel="/boot/vmlinuz-4.18.0-259.el8.x86_64"
kernel="/boot/vmlinuz-4.18.0-257.el8.x86_64"
kernel="/boot/vmlinuz-0-rescue-45e12f0814fd4947b99cbdcb88950361"

STEP 2) Select the kernel to load the next time

[root@srv ~]# grubby --set-default "/boot/vmlinuz-4.18.0-259.el8.x86_64"
The default is /boot/loader/entries/45e12f0814fd4947b99cbdcb88950361-4.18.0-259.el8.x86_64.conf with index 1 and kernel /boot/vmlinuz-4.18.0-259.el8.x86_64

Keep on reading!

gitlab in podman cannot create unix sockets in glusterfs because of SELinux

Installing gitlab-ee (and gitlab-ce) under CentOS 7 with enabled SELinux (i.e. enforcing mode) looped endlessly the container in restarting the installation process! There were multiple errors for missing sockets in the podman logs of the gitlab container. Here are some of the errors:
Missing postgresql unix socket in “/var/opt/gitlab/postgresql”:

Recipe: gitlab::database_migrations
  * bash[migrate gitlab-rails database] action run
    [execute] rake aborted!
              PG::ConnectionBad: could not connect to server: No such file or directory
                Is the server running locally and accepting
                connections on Unix domain socket "/var/opt/gitlab/postgresql/.s.PGSQL.5432"?
              /opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/db.rake:53:in `block (3 levels) in <top (required)>'
              /opt/gitlab/embedded/bin/bundle:23:in `load'
              /opt/gitlab/embedded/bin/bundle:23:in `<main>'
              Tasks: TOP => gitlab:db:configure
              (See full trace by running task with --trace)
    
    
    Error executing action `run` on resource 'bash[migrate gitlab-rails database]'
.....
.....
Running handlers:
There was an error running gitlab-ctl reconfigure:

bash[migrate gitlab-rails database] (gitlab::database_migrations line 55) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of "bash"  "/tmp/chef-script20200915-35-lemic5" ----
STDOUT: rake aborted!
PG::ConnectionBad: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/opt/gitlab/postgresql/.s.PGSQL.5432"?
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/db.rake:53:in `block (3 levels) in <top (required)>'
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `<main>'
Tasks: TOP => gitlab:db:configure
(See full trace by running task with --trace)
STDERR: 
---- End output of "bash"  "/tmp/chef-script20200915-35-lemic5" ----
Ran "bash"  "/tmp/chef-script20200915-35-lemic5" returned 1

Missing redis socket in

Running handlers:
There was an error running gitlab-ctl reconfigure:

redis_service[redis] (redis::enable line 19) had an error: RuntimeError: ruby_block[warn pending redis restart] (/opt/gitlab/embedded/cookbooks/cache/cookbooks/redis/resources/service.rb line 65) had an error: RuntimeError: Execution of the command `/opt/gitlab/embedded/bin/redis-cli -s /var/opt/gitlab/redis/redis.socket INFO` failed with a non-zero exit code (1)
stdout: 
stderr: Could not connect to Redis at /var/opt/gitlab/redis/redis.socket: No such file or directory

It should be noted that the /var/opt/gitlab directory has been mapped in /mnt/storage/podman/gitlab/data. GlusterFS is used for /mnt/storage, so the gitlab files resides on a GlusterFS volume.

ERROR 1) Cannot create unix socket.

Checking the /var/log/audit/audit.log reveiled the problem immediately:
Keep on reading!

collectd nginx plugin: curl_easy_perform failed because of selinux

Enabling the Nginx plugin for collectd under CentOS (or any other system using SELinux) might be confusing for a newbie. Most sources on the Internet would just install collectd-nginx:

yum install -y collectd-nginx

and configure it in the nginx.conf and collectd.conf. Still, the statistics might not work as expected, the collectd may not be able to gather statistics from the Nginx.

SELinux may prevent collectd (plugin) daemon to connect to Nginx and gather statistics from the Nginx stats page.

Checking the collectd log and it reports a problem:
Keep on reading!

Dracut boot failed with missing device – exit and continue normal booting!

This issue deserves a much more article, in fact, a straightforward tip:

You may be able to continue a normal boot only by typing “exit” and hitting enter in the “Dracut” console.

Most of the time this Dracut console entering is caused because the system administrator of the server/machine added, replaced or deleted a RAID or similar device and forgot to update the configuration (grub2 probably). And in most of these cases, the raid is not critical for machine normal boot from the root partition, but it may be critical for the services lately. Booting in normal mode, even without some devices, is the main goal because under the normal mode it easier to repair the system.
Check out the two articles on the topic (especially the first one):

SCREENSHOT 1) Just type “exit” and hit enter.

It’s worth noting that if you executed some commands in the console and/or mounted devices to test they are with healthy file system or for whatever reason you did it, the boot process may not continue after typeing exit and probablly a reboot is required. The server will go once more in this mode and then just typing will work.

main menu
type exit

Keep on reading!

CentOS 7 – Your kernel headers for kernel cannot be found at – missing kernel-devel

Getting the following error may be deceiving:

Error! echo
Your kernel headers for kernel 3.10.0-1062.1.1.el7.x86_64 cannot be found at
/lib/modules/3.10.0-1062.1.1.el7.x86_64/build or /lib/modules/3.10.0-1062.1.1.el7.x86_64/source.

Because you may have already installed the kernel-headers package for the current kernel and still to get the same error. So what is missing?

In fact, the kernel headers for compiling a kernel module is in kernel-devel package.

[root@localhost ~]# yum install kernel-devel
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.wwfx.net
 * extras: mirror.wwfx.net
 * updates: mirror.wwfx.net
Resolving Dependencies
--> Running transaction check
---> Package kernel-devel.x86_64 0:3.10.0-1062.1.1.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

============================================================================================================================================================================
 Package                                   Arch                                Version                                           Repository                            Size
============================================================================================================================================================================
Installing:
 kernel-devel                              x86_64                              3.10.0-1062.1.1.el7                               updates                               18 M

Transaction Summary
============================================================================================================================================================================
Install  1 Package

Total download size: 18 M
Installed size: 38 M
Is this ok [y/d/N]: y
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
kernel-devel-3.10.0-1062.1.1.el7.x86_64.rpm                                                                                                          |  18 MB  00:00:02     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : kernel-devel-3.10.0-1062.1.1.el7.x86_64                                                                                                                  1/1 
  Verifying  : kernel-devel-3.10.0-1062.1.1.el7.x86_64                                                                                                                  1/1 

Installed:
  kernel-devel.x86_64 0:3.10.0-1062.1.1.el7                                                                                                                                 

Complete!

If you have used other Linux distribution the “kernel headers”/”linux headers” package just means what it is named. In the CentOS 7 world there are two packages:

[root@localhost ~]# yum info kernel-devel
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.wwfx.net
 * extras: mirror.wwfx.net
 * updates: mirror.wwfx.net
Installed Packages
Name        : kernel-devel
Arch        : x86_64
Version     : 3.10.0
Release     : 1062.1.1.el7
Size        : 38 M
Repo        : installed
From repo   : updates
Summary     : Development package for building kernel modules to match the kernel
URL         : http://www.kernel.org/
License     : GPLv2
Description : This package provides kernel headers and makefiles sufficient to build modules
            : against the kernel package.

[root@localhost ~]# yum info kernel-headers
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.wwfx.net
 * epel: mirrors.neterra.net
 * extras: mirror.wwfx.net
 * updates: mirror.wwfx.net
Installed Packages
Name        : kernel-headers
Arch        : x86_64
Version     : 3.10.0
Release     : 1062.1.1.el7
Size        : 3.7 M
Repo        : installed
From repo   : updates
Summary     : Header files for the Linux kernel for use by glibc
URL         : http://www.kernel.org/
License     : GPLv2
Description : Kernel-headers includes the C header files that specify the interface
            : between the Linux kernel and userspace libraries and programs.  The
            : header files define structures and constants that are needed for
            : building most standard programs and are also needed for rebuilding the
            : glibc package.

CentOS 7 – Dependency Resolution – Error – Requires: dkms – missing epel repository

Quick note for those not familiar with the CentOS 7 peculiarity and especially the repository peculiarity.
Receiving the follwoing error:

--> Finished Dependency Resolution
Error: Package: 3:kmod-nvidia-latest-dkms-418.87.00-2.el7.x86_64 (cuda)
           Requires: dkms
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

It means you need a package (or meta-package, which might pull multiple packages and dependencies offering a big framework, for example), which could not be found in the existing repositories. In this very case, we need the DKMS (Dynamic Kernel Module Support) – https://en.wikipedia.org/wiki/Dynamic_Kernel_Module_Support

The DKMS is offered in epel repository and it could not be found in the CentOS 7 official repositories. Just add the epel repository.

[root@localhost ~]# yum install -y epel-release
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.daticum.com
 * extras: mirrors.daticum.com
 * updates: mirrors.daticum.com
Resolving Dependencies
--> Running transaction check
---> Package epel-release.noarch 0:7-11 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

============================================================================================================================================================================
 Package                                       Arch                                    Version                                Repository                               Size
============================================================================================================================================================================
Installing:
 epel-release                                  noarch                                  7-11                                   extras                                   15 k

Transaction Summary
============================================================================================================================================================================
Install  1 Package

Total download size: 15 k
Installed size: 24 k
Downloading packages:
epel-release-7-11.noarch.rpm                                                                                                                         |  15 kB  00:00:00     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : epel-release-7-11.noarch                                                                                                                                 1/1 
  Verifying  : epel-release-7-11.noarch                                                                                                                                 1/1 

Installed:
  epel-release.noarch 0:7-11                                                                                                                                                

Complete!

And rerun your first install yum line. Now you won’t receive the DKMS error.