SSD cache device to a software RAID5 using LVM2

Continuing our series on LVM2 plus a cache device:

  1. SSD cache device to a hard disk drive using LVM – a single hard drive with an SSD drive used as its cache device.
  2. SSD cache device to a software RAID using LVM2 – a software mirror across two devices with an additional SSD cache device over the mirror.

And now we show you how to set up a software RAID5 with an NVMe SSD cache device using LVM2.

The goal:
Caching a RAID5 array consisting of three 8 TB hard drives with a single 1 TB NVMe SSD drive. Caching reads, i.e. the write-through mode is enabled.
Our setup:

  • 1 NVMe SSD disk, Samsung 1 TB. It will be used as a write-through cache device (you may use write-back, too, if you do not care about the data in case the cache device fails)!
  • 3 hard disk drives of 8 TB each, grouped in RAID5 for redundancy (the exact device names are checked just below).
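
A quick way to confirm which block devices you are about to use before touching them (the names /dev/sd[a-c] and /dev/nvme0n1 are the ones from this setup and may differ on your machine):

# list all block devices with their sizes, types and models
lsblk -o NAME,SIZE,TYPE,MODEL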

STEP 1) Install lvm2 and enable the lvm2 service

This is the only step that differs between Linux distributions. We include three of them:
Ubuntu 16+:

sudo apt update && sudo apt upgrade -y
sudo apt install lvm2 -y
sudo systemctl enable lvm2-lvmetad
sudo systemctl start lvm2-lvmetad

CentOS 7:

yum update
yum install -y lvm2
systemctl enable lvm2-lvmetad
systemctl start lvm2-lvmetad

Gentoo:

emerge --sync
emerge -v sys-fs/lvm2
/etc/init.d/lvm start
rc-update add lvm default
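
Regardless of the distribution, you can verify that LVM2 and the dm-cache kernel target are available before continuing (a quick check, assuming a kernel with the dm_cache module built; it is usually auto-loaded on demand):

# print the installed LVM2 version
lvm version
# load the dm-cache target used by LVM2 cache volumes
modprobe dm_cache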

STEP 2) Add the four partitions to LVM2.

Three partitions from the hard drives and one from the NVMe SSD (the cache device). We have set up a partition on the NVMe SSD device that occupies 100% of the space (but you may use only 90% of the space for better SSD endurance and, in many cases, better performance).
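
If the NVMe partition does not exist yet, it can be created with parted. A minimal sketch assuming an empty disk (the hard drive partitions in this setup were created by the OS installer):

# create a GPT label and a single partition spanning the whole NVMe device
# (use 90% instead of 100% if you want to leave spare area for endurance)
parted -s /dev/nvme0n1 mklabel gpt mkpart primary 0% 100%
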
The devices are “/dev/sda5”, “/dev/sdb5”, “/dev/sdc5” (in case you wonder why we use /dev/sd[X]5: the first 4 partitions are occupied by the GRUB, boot, swap and root partitions of our CentOS 7 Linux installation) and “/dev/nvme0n1p1”:

[root@srv ~]# parted /dev/sda
GNU Parted 3.2
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p                                                                
Model: ATA HGST HUH721008AL (scsi)
Disk /dev/sda: 8002GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: pmbr_boot

Number  Start   End     Size    File system  Name     Flags
 4      1049kB  2097kB  1049kB                        bios_grub
 1      2097kB  34.4GB  34.4GB                        raid
 2      34.4GB  34.9GB  537MB                         raid
 3      34.9GB  88.6GB  53.7GB                        raid
 5      88.6GB  8002GB  7913GB               primary  raid

(parted) q                                                                
[root@srv ~]# parted /dev/sdb
GNU Parted 3.2
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p                                                                
Model: ATA HGST HUH721008AL (scsi)
Disk /dev/sdb: 8002GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: pmbr_boot

Number  Start   End     Size    File system  Name     Flags
 4      1049kB  2097kB  1049kB                        bios_grub
 1      2097kB  34.4GB  34.4GB                        raid
 2      34.4GB  34.9GB  537MB                         raid
 3      34.9GB  88.6GB  53.7GB                        raid
 5      88.6GB  8002GB  7913GB               primary  raid

(parted) q                                                                
[root@srv ~]# parted /dev/sdc
GNU Parted 3.2
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p                                                                
Model: ATA HGST HUH721008AL (scsi)
Disk /dev/sdc: 8002GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: pmbr_boot

Number  Start   End     Size    File system  Name     Flags
 4      1049kB  2097kB  1049kB  xfs                   bios_grub
 1      2097kB  34.4GB  34.4GB                        raid
 2      34.4GB  34.9GB  537MB                         raid
 3      34.9GB  88.6GB  53.7GB                        raid
 5      88.6GB  8002GB  7913GB               primary  raid

(parted) q                                                                  
[root@srv ~]# parted /dev/nvme0n1
GNU Parted 3.2
Using /dev/nvme0n1
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p                                                                
Model: NVMe Device (nvme)
Disk /dev/nvme0n1: 1024GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name     Flags
 1      1049kB  1024GB  1024GB               primary

(parted) q

Add the partitions to LVM2 (as physical volumes) and create an LVM Volume Group.

[root@srv ~]# pvcreate /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/nvme0n1p1
  Physical volume "/dev/sda5" successfully created.
  Physical volume "/dev/sdb5" successfully created.
  Physical volume "/dev/sdc5" successfully created.
  Physical volume "/dev/nvme0n1p1" successfully created.
[root@srv ~]# pvdisplay
  "/dev/nvme0n1p1" is a new physical volume of "<953.87 GiB"
  --- NEW Physical volume ---
  PV Name               /dev/nvme0n1p1
  VG Name               
  PV Size               <953.87 GiB
  Allocatable           NO
  PE Size               0   
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               MrMhqj-Tggr-ajtS-HrkV-QHJ1-LxAd-hUUNve
   
  "/dev/sda5" is a new physical volume of "<7.20 TiB"
  --- NEW Physical volume ---
  PV Name               /dev/sda5
  VG Name               
  PV Size               <7.20 TiB
  Allocatable           NO
  PE Size               0   
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               Csm0xo-YyrA-ATPo-13Ut-Nvra-C3IE-vuRA13
   
  "/dev/sdb5" is a new physical volume of "<7.20 TiB"
  --- NEW Physical volume ---
  PV Name               /dev/sdb5
  VG Name               
  PV Size               <7.20 TiB
  Allocatable           NO
  PE Size               0   
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               GeKH1N-5mn6-HF84-d7KS-6kQO-47Dm-UdwGPB
   
  "/dev/sdc5" is a new physical volume of "<7.20 TiB"
  --- NEW Physical volume ---
  PV Name               /dev/sdc5
  VG Name               
  PV Size               <7.20 TiB
  Allocatable           NO
  PE Size               0   
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               myJVyA-FHiZ-Nxqg-lzma-63Az-4a4V-Orz2Rx
   

You may add all the devices (aka partitions) in one line with pvcreate. The pvdisplay will display meta information for the physical volumes (the partitions we’ve just added).
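
For a more compact overview of the same information you may also use pvs (a short sketch):

# one-line-per-PV summary: name, VG, size and free space
pvs /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/nvme0n1p1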

And then create the LVM Volume Group device. The four physical volumes must be in the same group.

[root@logs ~]# vgcreate VG_storage /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/nvme0n1p1
  Volume group "VG_storage" successfully created
[root@logs ~]# vgdisplay
  --- Volume group ---
  VG Name               VG_storage
  System ID             
  Format                lvm2
  Metadata Areas        4
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                4
  Act PV                4
  VG Size               22.52 TiB
  PE Size               4.00 MiB
  Total PE              5903990
  Alloc PE / Size       0 / 0   
  Free  PE / Size       5903990 / 22.52 TiB
  VG UUID               PSqwF4-3WvJ-0EEX-Lb2x-MiAG-25Q0-p2a7Ap

Successfully created, and you may verify it with vgdisplay.
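
A shorter summary of the volume group is available with vgs, too:

# compact summary: number of PVs/LVs, size and free space of the VG
vgs VG_storage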

STEP 3) Create the RAID5 device.

First, create the RAID5 device using the three slow hard disk drives and their partitions “/dev/sda5”, “/dev/sdb5” and “/dev/sdc5”. Because we want to use all the available space of our slow disks in one logical storage device, we use “100%FREE”. The name of the logical volume is “lv_slow”, hinting that it consists of slow disks.

[root@srv ~]# lvcreate --type raid5 -l 100%FREE -I 512 -n lv_slow VG_storage /dev/sda5 /dev/sdb5 /dev/sdc5 
  Logical volume "lv_slow" created.
[root@srv ~]# lvdisplay 
  --- Logical volume ---
  LV Path                /dev/VG_storage/lv_slow
  LV Name                lv_slow
  VG Name                VG_storage
  LV UUID                5gdDBR-1h7N-WA6j-20Dn-IUQR-Ry61-dcQdoG
  LV Write Access        read/write
  LV Creation host, time logs.example.com, 2019-11-23 09:35:40 +0000
  LV Status              available
  # open                 0
  LV Size                14.39 TiB
  Current LE             3773198
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     6144
  Block device           253:6

The “-I 512” sets the RAID5 stripe size to 512 KiB.
And lvdisplay will show meta information for the successfully created logical volume. Because it is a RAID5, the usable space is “three disks minus one”, i.e. 14.39 TiB out of the roughly 21.6 TiB provided by the three hard disk partitions (the 22.52 TiB VG size above also includes the NVMe SSD).
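
You can double-check the segment type, the number of stripes and the stripe size of the new volume with standard lvs report fields (a sketch):

# segtype should report raid5 and stripesize the 512.00k we set with -I 512
lvs -a -o lv_name,segtype,stripes,stripesize,lv_size VG_storage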

STEP 4) Create the cache pool logical device and then convert the slow logical volume to use the newly created cache pool logical device.

First, create the cache pool logical volume with the name “lv_cache” (to show it is a fast SSD device). Again, we use 100% of the available space on the physical volume (100% of the partition we’ve used).

[root@CentOS-82-64-minimal ~]# lvcreate --type cache-pool -l 100%FREE -c 1M --cachemode writethrough -n lv_cache VG_storage /dev/nvme0n1p1
  Logical volume "lv_cache" created.
[root@CentOS-82-64-minimal ~]# lvdisplay 
  --- Logical volume ---
  LV Path                /dev/VG_storage/lv_slow
  LV Name                lv_slow
  VG Name                VG_storage
  LV UUID                5gdDBR-1h7N-WA6j-20Dn-IUQR-Ry61-dcQdoG
  LV Write Access        read/write
  LV Creation host, time logs.example.com, 2019-11-23 09:35:40 +0000
  LV Status              available
  # open                 0
  LV Size                14.39 TiB
  Current LE             3773198
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     6144
  Block device           253:6
   
  --- Logical volume ---
  LV Path                /dev/VG_storage/lv_cache
  LV Name                lv_cache
  VG Name                VG_storage
  LV UUID                m3h1Gq-8Yd7-WrAd-KqkJ-ljlM-z1zB-7J0Pqi
  LV Write Access        read/write
  LV Creation host, time logs.example.com, 2019-11-23 09:40:40 +0000
  LV Pool metadata       lv_cache_cmeta
  LV Pool data           lv_cache_cdata
  LV Status              NOT available
  LV Size                953.77 GiB
  Current LE             244166
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

Verify with “lvdisplay” that the cache pool is created. We set two important parameters: the cache chunk size (“-c 1M”) and the write-through cache mode. Write-through saves your data from a cache device failure. If the data is not so important (like in a proxy cache server), you may want to replace “writethrough” with “writeback” in the above command.
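
Should you change your mind later, the cache mode can also be switched on the already cached volume without rebuilding anything (a sketch; run it only after the cache is attached with the lvconvert below):

# switch the cached LV to write-back mode (less safe) ...
lvchange --cachemode writeback VG_storage/lv_slow
# ... and back to write-through at any time
lvchange --cachemode writethrough VG_storage/lv_slow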

And now do the conversion – the slow device (logical volume lv_slow) will get the cache device (logical volume lv_cache) attached:

[root@srv ~]# lvconvert --type cache --cachemode writethrough --cachepool VG_storage/lv_cache VG_storage/lv_slow
Do you want wipe existing metadata of cache pool VG_storage/lv_cache? [y/n]: y
  Logical volume VG_storage/lv_slow is now cached.
[root@srv ~]# lvdisplay 
  --- Logical volume ---
  LV Path                /dev/VG_storage/lv_slow
  LV Name                lv_slow
  VG Name                VG_storage
  LV UUID                5gdDBR-1h7N-WA6j-20Dn-IUQR-Ry61-dcQdoG
  LV Write Access        read/write
  LV Creation host, time logs.example.com, 2019-11-23 09:35:40 +0000
  LV Cache pool name     lv_cache
  LV Cache origin name   lv_slow_corig
  LV Status              available
  # open                 0
  LV Size                14.39 TiB
  Cache used blocks      0.01%
  Cache metadata blocks  16.06%
  Cache dirty blocks     0.00%
  Cache read hits/misses 0 / 48
  Cache wrt hits/misses  0 / 0
  Cache demotions        0
  Cache promotions       3
  Current LE             3773198
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:6

Note there is only one logical volume device with the name “lv_slow”, but you can still see there is an additional logical device “inside” the lv_slow device – “lv_cache”. The properties (chunk size and write-through mode) we set earlier when creating lv_cache are preserved for the new cached lv_slow device. If you use write-back on creation, the command warns that the write-back mode breaks the data redundancy of the RAID5! Be careful with such setups – if write-back is enabled and there is a problem with the cache device (the SSD), you might lose all your data!
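
If you later need to detach the cache (for example, to replace a failing SSD), lvconvert can do it without touching the data on the RAID5 (a sketch):

# detach the cache but keep the cache pool LV for later reuse
lvconvert --splitcache VG_storage/lv_slow
# or remove the cache pool completely
lvconvert --uncache VG_storage/lv_slow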

STEP 5) Format and use the volume

Format it and do not forget to include it in /etc/fstab to mount it automatically on boot.

[root@srv ~]# mkfs.ext4 /dev/VG_storage/lv_slow
mke2fs 1.44.3 (10-July-2018)
Discarding device blocks: done                            
Creating filesystem with 3863754752 4k blocks and 482971648 inodes
Filesystem UUID: cbf0e33c-8b89-4b7b-b7dd-1a9429db3987
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
        102400000, 214990848, 512000000, 550731776, 644972544, 1934917632, 
        2560000000, 3855122432

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done
[root@srv ~]# blkid |grep lv_slow
/dev/mapper/VG_storage-lv_slow_corig_rimage_0: UUID="cbf0e33c-8b89-4b7b-b7dd-1a9429db3987" TYPE="ext4"
/dev/mapper/VG_storage-lv_slow: UUID="cbf0e33c-8b89-4b7b-b7dd-1a9429db3987" TYPE="ext4"

And add it to the /etc/fstab:

UUID=cbf0e33c-8b89-4b7b-b7dd-1a9429db3987 /mnt/storage ext4 defaults,discard,noatime 1 3
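
Create the mount point directory first if it does not exist yet:

mkdir -p /mnt/storage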

And then just execute the mount command with “/mnt/storage” and you are ready to use your RAID5 with an SSD cache device:

[root@static ~]# mount /mnt/storage
[root@logs ~]# df -h
Filesystem                      Size  Used Avail Use% Mounted on
devtmpfs                         32G     0   32G   0% /dev
tmpfs                            32G     0   32G   0% /dev/shm
tmpfs                            32G  804K   32G   1% /run
tmpfs                            32G     0   32G   0% /sys/fs/cgroup
/dev/md2                         49G  1.4G   46G   3% /
/dev/md1                        487M   98M  364M  22% /boot
/dev/mapper/VG_storage-lv_slow   15T   21M   15T   1% /mnt/storage
tmpfs                           6.3G     0  6.3G   0% /run/user/0

Additional LVM information with lvs

After about a day the sync is finished (RAID5 needs an initial resync) and the column Cpy%Sync shows 100.00.

[root@logs ~]# lvs -a
  LV                       VG         Attr       LSize   Pool       Origin          Data%  Meta%  Move Log Cpy%Sync Convert
  [lv_cache]               VG_storage Cwi---C--- 953.77g                            2.40   16.07           0.00            
  [lv_cache_cdata]         VG_storage Cwi-ao---- 953.77g                                                                   
  [lv_cache_cmeta]         VG_storage ewi-ao----  48.00m                                                                   
  lv_slow                  VG_storage Cwi-aoC---  14.39t [lv_cache] [lv_slow_corig] 2.40   16.07           0.00            
  [lv_slow_corig]          VG_storage rwi-aoC---  14.39t                                                   100.00          
  [lv_slow_corig_rimage_0] VG_storage iwi-aor---  <7.20t                                                                   
  [lv_slow_corig_rimage_1] VG_storage iwi-aor---  <7.20t                                                                   
  [lv_slow_corig_rimage_2] VG_storage iwi-aor---  <7.20t                                                                   
  [lv_slow_corig_rmeta_0]  VG_storage ewi-aor---   4.00m                                                                   
  [lv_slow_corig_rmeta_1]  VG_storage ewi-aor---   4.00m                                                                   
  [lv_slow_corig_rmeta_2]  VG_storage ewi-aor---   4.00m                                                                   
  [lvol0_pmspare]          VG_storage ewi-------  48.00m
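
To keep an eye on how effective the cache is over time, the hit/miss counters shown by lvdisplay can be checked periodically (a simple sketch; dmsetup gives the raw dm-cache status line if you prefer it):

# cache usage, dirty blocks and read/write hit/miss counters of the cached LV
lvdisplay VG_storage/lv_slow | grep -i cache
# the raw device-mapper cache status for the same volume
dmsetup status VG_storage-lv_slow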
