Continuing our series on LVM2 with a cache device:
- Single hard disk with an SSD cache device – an SSD used as a cache device for a single hard disk drive using LVM2.
- Mirrored LVM2 device with an SSD cache device – an LVM2 software mirror across two hard drives with an additional SSD cache device on top of the mirror.
And now we show you how to build a software RAID5 with an NVMe SSD cache using LVM2.
The goal:
Cache a RAID5 array consisting of three 8T hard drives with a single 1T NVMe SSD drive. Only reads are cached, i.e. the write-through mode is enabled.
Our setup:
- 1 NVMe SSD disk, Samsung 1T. It will be used as a write-through cache device (you may use write-back instead if you do not care about the data in case the cache device fails)!
- 3 hard disk drives of 8T each, grouped in RAID5 for redundancy (a quick device check is shown below).
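Before starting, it is a good idea to confirm the kernel sees all four devices. A minimal check (the device names below match our setup; yours may differ):
# list the block devices with their sizes, rotational flag and model
# ROTA=1 marks a rotational (hard) disk, ROTA=0 marks the SSD
lsblk -d -o NAME,SIZE,ROTA,MODEL /dev/sda /dev/sdb /dev/sdc /dev/nvme0n1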
STEP 1) Install lvm2 and enable the lvm2 service
Only this step differs between Linux distributions. We include three of them:
Ubuntu 16+:
sudo apt update && sudo apt upgrade -y
sudo apt install lvm2 -y
sudo systemctl enable lvm2-lvmetad
sudo systemctl start lvm2-lvmetad
CentOS 7:
yum update
yum install -y lvm2
systemctl enable lvm2-lvmetad
systemctl start lvm2-lvmetad
Gentoo:
emerge --sync
emerge -v sys-fs/lvm2
/etc/init.d/lvm start
rc-update add lvm default
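Regardless of the distribution, you can quickly confirm the LVM tools are installed and responding:
# print the installed LVM version
lvm version
# list existing physical volumes (empty output is fine on a fresh setup)
pvs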
STEP 2) Add the four partitions to LVM2.
Three partitions from the hard drives and one from the NVMe SSD (the cache device). We have set up a partition on the NVMe SSD device that occupies 100% of the space (but you may use only 90% of the space for better SSD endurance and, in many cases, better performance).
The devices are “/dev/sda5”, “/dev/sdb5”, “/dev/sdc5” (the first four partitions are occupied by the grub, boot, swap and root partitions of our CentOS 7 Linux installation, if you wonder why we use /dev/sd[X]5) and “/dev/nvme0n1p1”:
[root@srv ~]# parted /dev/sda
GNU Parted 3.2
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Model: ATA HGST HUH721008AL (scsi)
Disk /dev/sda: 8002GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: pmbr_boot

Number  Start   End     Size    File system  Name     Flags
 4      1049kB  2097kB  1049kB                        bios_grub
 1      2097kB  34.4GB  34.4GB                        raid
 2      34.4GB  34.9GB  537MB                         raid
 3      34.9GB  88.6GB  53.7GB                        raid
 5      88.6GB  8002GB  7913GB               primary  raid

(parted) q
[root@srv ~]# parted /dev/sdb
GNU Parted 3.2
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Model: ATA HGST HUH721008AL (scsi)
Disk /dev/sdb: 8002GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: pmbr_boot

Number  Start   End     Size    File system  Name     Flags
 4      1049kB  2097kB  1049kB                        bios_grub
 1      2097kB  34.4GB  34.4GB                        raid
 2      34.4GB  34.9GB  537MB                         raid
 3      34.9GB  88.6GB  53.7GB                        raid
 5      88.6GB  8002GB  7913GB               primary  raid

(parted) q
[root@srv ~]# parted /dev/sdc
GNU Parted 3.2
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Model: ATA HGST HUH721008AL (scsi)
Disk /dev/sdc: 8002GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: pmbr_boot

Number  Start   End     Size    File system  Name     Flags
 4      1049kB  2097kB  1049kB  xfs                   bios_grub
 1      2097kB  34.4GB  34.4GB                        raid
 2      34.4GB  34.9GB  537MB                         raid
 3      34.9GB  88.6GB  53.7GB                        raid
 5      88.6GB  8002GB  7913GB               primary  raid

(parted) q
[root@srv ~]# parted /dev/nvme0n1
GNU Parted 3.2
Using /dev/nvme0n1
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Model: NVMe Device (nvme)
Disk /dev/nvme0n1: 1024GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name     Flags
 1      1049kB  1024GB  1024GB               primary

(parted) q
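If the partition on the NVMe device has not been created yet, here is a minimal sketch with parted (assumptions: a new GPT label and a single partition spanning the whole device, as in the listing above; you may set the end to 90% instead of 100% to leave spare area for SSD endurance):
# WARNING: creating a new label destroys any existing data on the device
parted -s /dev/nvme0n1 mklabel gpt
# one partition named "primary" over the whole device
parted -s /dev/nvme0n1 mkpart primary 1MiB 100%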
Add the partitions to LVM2 (as physical volumes) and create an LVM Volume Group.
[root@srv ~]# pvcreate /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/nvme0n1p1
  Physical volume "/dev/sda5" successfully created.
  Physical volume "/dev/sdb5" successfully created.
  Physical volume "/dev/sdc5" successfully created.
  Physical volume "/dev/nvme0n1p1" successfully created.
[root@srv ~]# pvdisplay
  "/dev/nvme0n1p1" is a new physical volume of "<953.87 GiB"
  --- NEW Physical volume ---
  PV Name               /dev/nvme0n1p1
  VG Name               
  PV Size               <953.87 GiB
  Allocatable           NO
  PE Size               0
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               MrMhqj-Tggr-ajtS-HrkV-QHJ1-LxAd-hUUNve

  "/dev/sda5" is a new physical volume of "<7.20 TiB"
  --- NEW Physical volume ---
  PV Name               /dev/sda5
  VG Name               
  PV Size               <7.20 TiB
  Allocatable           NO
  PE Size               0
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               Csm0xo-YyrA-ATPo-13Ut-Nvra-C3IE-vuRA13

  "/dev/sdb5" is a new physical volume of "<7.20 TiB"
  --- NEW Physical volume ---
  PV Name               /dev/sdb5
  VG Name               
  PV Size               <7.20 TiB
  Allocatable           NO
  PE Size               0
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               GeKH1N-5mn6-HF84-d7KS-6kQO-47Dm-UdwGPB

  "/dev/sdc5" is a new physical volume of "<7.20 TiB"
  --- NEW Physical volume ---
  PV Name               /dev/sdc5
  VG Name               
  PV Size               <7.20 TiB
  Allocatable           NO
  PE Size               0
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               myJVyA-FHiZ-Nxqg-lzma-63Az-4a4V-Orz2Rx
You may add all the devices (i.e. the partitions) in one line with pvcreate. pvdisplay shows meta information for the physical volumes (the partitions we’ve just added).
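A more compact view of the same information is available with pvs (the field names below are standard pvs reporting fields):
# short summary of the physical volumes, their sizes and volume group membership
pvs -o pv_name,pv_size,vg_name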
And then create the LVM Volume Group device. The four physical volumes must be in the same group.
[root@logs ~]# vgcreate VG_storage /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/nvme0n1p1
  Volume group "VG_storage" successfully created
[root@logs ~]# vgdisplay
  --- Volume group ---
  VG Name               VG_storage
  System ID             
  Format                lvm2
  Metadata Areas        4
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                4
  Act PV                4
  VG Size               22.52 TiB
  PE Size               4.00 MiB
  Total PE              5903990
  Alloc PE / Size       0 / 0
  Free  PE / Size       5903990 / 22.52 TiB
  VG UUID               PSqwF4-3WvJ-0EEX-Lb2x-MiAG-25Q0-p2a7Ap
Successfully created, and you may verify it with vgdisplay.
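The shorter vgs gives a one-line summary of the new volume group:
# number of PVs and LVs, total size and free space of the volume group
vgs VG_storage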
STEP 3) Create the RAID5 device.
First, create the RAID5 device using the three slow hard disk drives and their partitions “/dev/sda5”, “/dev/sdb5” and “/dev/sdc5”. We want to use all the available space on our slow disks in one logical storage device, so we use “100%FREE”. The name of the logical device is “lv_slow”, hinting that it consists of slow disks.
[root@srv ~]# lvcreate --type raid5 -l 100%FREE -I 512 -n lv_slow VG_storage /dev/sda5 /dev/sdb5 /dev/sdc5
  Logical volume "lv_slow" created.
[root@srv ~]# lvdisplay
  --- Logical volume ---
  LV Path                /dev/VG_storage/lv_slow
  LV Name                lv_slow
  VG Name                VG_storage
  LV UUID                5gdDBR-1h7N-WA6j-20Dn-IUQR-Ry61-dcQdoG
  LV Write Access        read/write
  LV Creation host, time logs.example.com, 2019-11-23 09:35:40 +0000
  LV Status              available
  # open                 0
  LV Size                14.39 TiB
  Current LE             3773198
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     6144
  Block device           253:6
The “-I 512” option sets the RAID5 stripe (chunk) size to 512 KiB.
lvdisplay shows meta information for the successfully created logical volume. Because it is a RAID5, the usable space is “three disks minus one”, i.e. 14.39 TiB (out of 22.52 TiB total).
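To confirm the segment type and the stripe (chunk) size that were actually applied, lvs can report them directly (a sketch; field names may differ slightly between LVM versions):
# show segment type, number of stripes and stripe size of the logical volumes
lvs -a -o lv_name,segtype,stripes,stripe_size VG_storage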
STEP 4) Create the cache pool logical volume and then convert the slow logical volume to use the newly created cache pool.
First, create the cache pool logical volume with the name “lv_cache” (to show it is the fast SSD device). Again, we use 100% of the available space on the physical volume (100% of the partition we’ve added).
[root@CentOS-82-64-minimal ~]# lvcreate --type cache-pool -l 100%FREE -c 1M --cachemode writethrough -n lv_cache VG_storage /dev/nvme0n1p1
  Logical volume "lv_cache" created.
[root@CentOS-82-64-minimal ~]# lvdisplay
  --- Logical volume ---
  LV Path                /dev/VG_storage/lv_slow
  LV Name                lv_slow
  VG Name                VG_storage
  LV UUID                5gdDBR-1h7N-WA6j-20Dn-IUQR-Ry61-dcQdoG
  LV Write Access        read/write
  LV Creation host, time logs.example.com, 2019-11-23 09:35:40 +0000
  LV Status              available
  # open                 0
  LV Size                14.39 TiB
  Current LE             3773198
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     6144
  Block device           253:6

  --- Logical volume ---
  LV Path                /dev/VG_storage/lv_cache
  LV Name                lv_cache
  VG Name                VG_storage
  LV UUID                m3h1Gq-8Yd7-WrAd-KqkJ-ljlM-z1zB-7J0Pqi
  LV Write Access        read/write
  LV Creation host, time logs.example.com, 2019-11-23 09:40:40 +0000
  LV Pool metadata       lv_cache_cmeta
  LV Pool data           lv_cache_cdata
  LV Status              NOT available
  LV Size                953.77 GiB
  Current LE             244166
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
Verify with “lvdisplay” that the cache pool is created. We set two important parameters: the chunk size (“-c 1M”) and the cache mode (“--cachemode writethrough”). Write-through saves your data from a cache device failure. If the data is not so important (like on a proxy cache server), you may replace “writethrough” with “writeback” in the above command.
And now attach the cache – convert the slow device (logical volume lv_slow) so it uses the cache device (logical volume lv_cache):
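To double-check which cache mode the pool received, recent LVM versions can report it with lvs (a sketch):
# display the cache mode of the newly created cache pool
lvs -o lv_name,cache_mode VG_storage/lv_cache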
[root@srv ~]# lvconvert --type cache --cachemode writethrough --cachepool VG_storage/lv_cache VG_storage/lv_slow
Do you want wipe existing metadata of cache pool VG_storage/lv_cache? [y/n]: y
  Logical volume VG_storage/lv_slow is now cached.
[root@srv ~]# lvdisplay
  --- Logical volume ---
  LV Path                /dev/VG_storage/lv_slow
  LV Name                lv_slow
  VG Name                VG_storage
  LV UUID                5gdDBR-1h7N-WA6j-20Dn-IUQR-Ry61-dcQdoG
  LV Write Access        read/write
  LV Creation host, time logs.example.com, 2019-11-23 09:35:40 +0000
  LV Cache pool name     lv_cache
  LV Cache origin name   lv_slow_corig
  LV Status              available
  # open                 0
  LV Size                14.39 TiB
  Cache used blocks      0.01%
  Cache metadata blocks  16.06%
  Cache dirty blocks     0.00%
  Cache read hits/misses 0 / 48
  Cache wrt hits/misses  0 / 0
  Cache demotions        0
  Cache promotions       3
  Current LE             3773198
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:6
Note there is only one logical volume device with the name “lv_slow”, but you can still see there is an additional logical device “inside” the lv_slow device – “lv_cache”. The properties (chunk size and write-through mode) we set earlier when creating lv_cache are preserved for the new cached lv_slow device. If you use write-back on creation, the command warns that the write-back mode breaks the data redundancy of the RAID5! Be careful with such setups – if write-back is enabled and there is a problem with the cache device (the SSD), you might lose all your data!
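If you later need to change the cache mode or replace the SSD, the cache can be adjusted or detached on the fly with lvconvert (a sketch using the volume names from this guide; with write-through the data on lv_slow stays consistent):
# switch the cache mode of the already cached volume
lvconvert --cachemode writeback VG_storage/lv_slow
lvconvert --cachemode writethrough VG_storage/lv_slow
# detach the cache but keep the cache pool LV for later re-attachment
lvconvert --splitcache VG_storage/lv_slow
# or detach the cache and delete the cache pool LV completely
lvconvert --uncache VG_storage/lv_slow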
STEP 5) Format and use the volume
Format it and do not forget to include it in /etc/fstab so it is mounted automatically on boot.
[root@srv ~]# mkfs.ext4 /dev/VG_storage/lv_slow
mke2fs 1.44.3 (10-July-2018)
Discarding device blocks: done
Creating filesystem with 3863754752 4k blocks and 482971648 inodes
Filesystem UUID: cbf0e33c-8b89-4b7b-b7dd-1a9429db3987
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
	102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
	2560000000, 3855122432

Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done

[root@srv ~]# blkid |grep lv_slow
/dev/mapper/VG_storage-lv_slow_corig_rimage_0: UUID="cbf0e33c-8b89-4b7b-b7dd-1a9429db3987" TYPE="ext4"
/dev/mapper/VG_storage-lv_slow: UUID="cbf0e33c-8b89-4b7b-b7dd-1a9429db3987" TYPE="ext4"
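The mount point must exist before mounting; create it first (we use /mnt/storage as in the fstab line below):
# create the directory used as a mount point
mkdir -p /mnt/storage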
And add it to the /etc/fstab:
UUID=cbf0e33c-8b89-4b7b-b7dd-1a9429db3987 /mnt/storage ext4 defaults,discard,noatime 1 3
Then just execute the mount command with “/mnt/storage” and you are ready to use your RAID5 with the SSD cache device:
[root@static ~]# mount /mnt/storage
[root@logs ~]# df -h
Filesystem                      Size  Used Avail Use% Mounted on
devtmpfs                         32G     0   32G   0% /dev
tmpfs                            32G     0   32G   0% /dev/shm
tmpfs                            32G  804K   32G   1% /run
tmpfs                            32G     0   32G   0% /sys/fs/cgroup
/dev/md2                         49G  1.4G   46G   3% /
/dev/md1                        487M   98M  364M  22% /boot
/dev/mapper/VG_storage-lv_slow   15T   21M   15T   1% /mnt/storage
tmpfs                           6.3G     0  6.3G   0% /run/user/0
Additional LVM information with lvs
After about a day the initial sync is finished (RAID5 needs an initial resync) and the Cpy%Sync column shows 100.00.
[root@logs ~]# lvs -a
  LV                       VG         Attr       LSize   Pool       Origin           Data%  Meta%  Move Log Cpy%Sync Convert
  [lv_cache]               VG_storage Cwi---C--- 953.77g                             2.40   16.07           0.00
  [lv_cache_cdata]         VG_storage Cwi-ao---- 953.77g
  [lv_cache_cmeta]         VG_storage ewi-ao----  48.00m
  lv_slow                  VG_storage Cwi-aoC---  14.39t [lv_cache] [lv_slow_corig]  2.40   16.07           0.00
  [lv_slow_corig]          VG_storage rwi-aoC---  14.39t                                                    100.00
  [lv_slow_corig_rimage_0] VG_storage iwi-aor---  <7.20t
  [lv_slow_corig_rimage_1] VG_storage iwi-aor---  <7.20t
  [lv_slow_corig_rimage_2] VG_storage iwi-aor---  <7.20t
  [lv_slow_corig_rmeta_0]  VG_storage ewi-aor---   4.00m
  [lv_slow_corig_rmeta_1]  VG_storage ewi-aor---   4.00m
  [lv_slow_corig_rmeta_2]  VG_storage ewi-aor---   4.00m
  [lvol0_pmspare]          VG_storage ewi-------  48.00m
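You can follow the initial RAID5 synchronization progress with a simpler report (a sketch; the sync_percent field corresponds to the Cpy%Sync column above):
# refresh the sync progress every 60 seconds
watch -n 60 "lvs -a -o lv_name,sync_percent VG_storage"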