Following the article SSD cache device to a hard disk drive using LVM, the initial setup is:
- A single slow 2T hard disk, so no redundancy. If it fails, all data is gone.
- An SSD cache device in front of the slow hard disk above.
And here is how to handle a slow device failure! In this setup the data is lost, because there is no redundancy when a single device holds the data. The data is not valuable here, because this is a cache server. This article shows what to expect when the slow device fails and how to replace it.
The original slow device is missing and has been replaced by a new one, and the partitions are as follows:
- /dev/sda4 – the slow device; this is the new replacement device.
- /dev/sdb5 – the SSD cache device; it is still in the LVM2 Volume Group and the Logical Volume.
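Before touching LVM, it is worth confirming that the kernel sees the new disk and its partitions. A minimal read-only check (the device names are the ones from this setup):

# List the block devices with sizes, types and mount points to confirm
# the new disk and its partitions are visible; lsblk changes nothing.
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT /dev/sda /dev/sdb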
The Physical Volume of the missing device is marked as “[unknown]”. pvdisplay still shows metadata for the missing device. vgdisplay and lvdisplay show information for the Volume Group and the Logical Volume, but the Logical Volume is in “NOT available” status, so it cannot be used. This is expected when only the cache device is present.
Slow drive failure and the LVM2 status
[root@srv ~]# ssm list
------------------------------------------------------
Device       Free        Used         Total  Pool
------------------------------------------------------
/dev/sda                            2.00 TB
/dev/sda1    0.00 KB     50.00 GB   50.03 GB  md
/dev/sda2    0.00 KB     15.75 GB   15.76 GB  md
/dev/sda3    0.00 KB   1023.00 MB    1.00 GB  md
/dev/sda4                           1.93 TB
/dev/sdb                          894.25 GB
/dev/sdb1    0.00 KB     50.00 GB   50.03 GB  md
/dev/sdb2    0.00 KB     15.75 GB   15.76 GB  md
/dev/sdb3    0.00 KB   1023.00 MB    1.00 GB  md
/dev/sdb4                           1.00 KB
/dev/sdb5    0.00 KB    675.00 GB  675.00 GB  VG_storage1
/dev/sdb6                         152.46 GB
[unknown]    0.00 KB      1.93 TB    1.93 TB  VG_storage1
------------------------------------------------------
-----------------------------------------------------
Pool         Type  Devices     Free     Used    Total
-----------------------------------------------------
VG_storage1  lvm   2        0.00 KB  2.59 TB  2.59 TB
-----------------------------------------------------
------------------------------------------------------------------------------
Volume      Pool  Volume size  FS    FS size     Free       Type   Mount point
------------------------------------------------------------------------------
/dev/md125  md    50.00 GB     ext4  50.00 GB    44.41 GB   raid1  /
/dev/md126  md    1023.00 MB   ext4  1023.00 MB  788.84 MB  raid1  /boot
/dev/md127  md    15.75 GB                                  raid1
/dev/sdb6         152.46 GB    ext4  152.46 GB   145.77 GB
------------------------------------------------------------------------------
----------------------------------------------------------------------------------
Snapshot                      Origin               Pool         Volume size  Type
----------------------------------------------------------------------------------
/dev/VG_storage1/lv_storage1  [lv_storage1_corig]  VG_storage1  1.93 TB      cache
----------------------------------------------------------------------------------
[root@srv ~]# pvdisplay
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  --- Physical volume ---
  PV Name               /dev/sdb5
  VG Name               VG_storage1
  PV Size               675.00 GiB / not usable 4.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              172799
  Free PE               0
  Allocated PE          172799
  PV UUID               oLn3hh-ROFU-WSW8-0m8P-YLWY-Akoz-nCxh96

  --- Physical volume ---
  PV Name               [unknown]
  VG Name               VG_storage1
  PV Size               1.93 TiB / not usable 4.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              507188
  Free PE               0
  Allocated PE          507188
  PV UUID               IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd

[root@srv ~]# pvscan
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  PV /dev/sdb5   VG VG_storage1   lvm2 [<675.00 GiB / 0    free]
  PV [unknown]   VG VG_storage1   lvm2 [1.93 TiB / 0    free]
  Total: 2 [2.59 TiB] / in use: 2 [2.59 TiB] / in no VG: 0 [0   ]
[root@srv ~]# vgdisplay
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  --- Volume group ---
  VG Name               VG_storage1
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  10
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                1
  VG Size               2.59 TiB
  PE Size               4.00 MiB
  Total PE              679987
  Alloc PE / Size       679987 / 2.59 TiB
  Free  PE / Size       0 / 0
  VG UUID               eZ2ZIb-jcDl-kPLj-oFwJ-LLuN-VxLD-rVJclP

[root@srv ~]# vgscan
  Reading volume groups from cache.
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  Found volume group "VG_storage1" using metadata type lvm2
[root@srv ~]# lvdisplay
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  --- Logical volume ---
  LV Path                /dev/VG_storage1/lv_storage1
  LV Name                lv_storage1
  VG Name                VG_storage1
  LV UUID                NFWlWF-VmSO-HVr4-72RW-YY82-1ax2-92cI6P
  LV Write Access        read/write
  LV Creation host, time srv.example.com, 2019-10-25 17:18:41 +0000
  LV Cache pool name     lv_cache
  LV Cache origin name   lv_storage1_corig
  LV Status              NOT available
  LV Size                1.93 TiB
  Current LE             507188
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

[root@srv ~]# lvscan
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  inactive          '/dev/VG_storage1/lv_storage1' [1.93 TiB] inherit
[root@srv ~]# lvs
lvs     lvscan
[root@srv ~]# lvs -a
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  LV                  VG          Attr       LSize   Pool       Origin              Data%  Meta%  Move Log Cpy%Sync Convert
  [lv_cache]          VG_storage1 Cwi---C--- 674.90g
  [lv_cache_cdata]    VG_storage1 Cwi------- 674.90g
  [lv_cache_cmeta]    VG_storage1 ewi-------  48.00m
  lv_storage1         VG_storage1 Cwi---C-p-   1.93t [lv_cache] [lv_storage1_corig]
  [lv_storage1_corig] VG_storage1 owi---C-p-   1.93t
  [lvol0_pmspare]     VG_storage1 ewi-------  48.00m
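To see exactly which Physical Volumes back each (hidden) Logical Volume, lvs can append a Devices column; a read-only check along these lines shows the missing PV as [unknown]:

# Map every (sub-)logical volume to its backing physical volumes;
# the missing slow device shows up as [unknown] in the Devices column.
lvs -a -o +devices VG_storage1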
Adding a new device to the Physical Volumes is possible.
The new slow device, the partition /dev/sda4, is added successfully:
[root@srv ~]# pvcreate /dev/sda4
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  Physical volume "/dev/sda4" successfully created.
[root@srv ~]# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sdb5
  VG Name               VG_storage1
  PV Size               675.00 GiB / not usable 4.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              172799
  Free PE               172799
  Allocated PE          0
  PV UUID               oLn3hh-ROFU-WSW8-0m8P-YLWY-Akoz-nCxh96

  --- Physical volume ---
  PV Name               [unknown]
  VG Name               VG_storage1
  PV Size               1.93 TiB / not usable 4.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              507188
  Free PE               0
  Allocated PE          507188
  PV UUID               IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd

  --- Physical volume ---
  PV Name               /dev/sda4
  VG Name               VG_storage1
  PV Size               1.93 TiB / not usable <5.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              507188
  Free PE               507188
  Allocated PE          0
  PV UUID               wyhdu9-dEgh-8YPx-sw32-mSAw-fL7w-IBFe2d
The [unknown] device is still there.
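For a more compact view, pvs with the PV UUID column makes it easy to match the [unknown] entry against the UUID from the warnings; a read-only check:

# One line per physical volume, including its UUID - handy for
# matching the [unknown] PV against the UUID in the warnings.
pvs -o +pv_uuid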
Extending the current Volume Group is possible.
Extend the current Volume Group with the replacement device.
[root@srv ~]# vgextend VG_storage1 /dev/sda4
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Volume group "VG_storage1" successfully extended
[root@srv ~]# vgdisplay
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  --- Volume group ---
  VG Name               VG_storage1
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  12
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                3
  Act PV                2
  VG Size               <4.53 TiB
  PE Size               4.00 MiB
  Total PE              1187175
  Alloc PE / Size       679987 / 2.59 TiB
  Free  PE / Size       507188 / 1.93 TiB
  VG UUID               eZ2ZIb-jcDl-kPLj-oFwJ-LLuN-VxLD-rVJclP
The Volume Group size increases by the size of the newly added Physical Volume (/dev/sda4), despite the original faulty slow device still being missing.
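A compact confirmation is possible with vgs (read-only), where the PV count, size, and free space should reflect the newly added device:

# Summary of the volume group - #PV, VSize and VFree should now
# include the newly added /dev/sda4.
vgs VG_storage1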
Extending the Logical Volume is not possible while there are missing devices.
[root@srv ~]# lvextend VG_storage1/lv_storage1 /dev/sda4
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  Cannot change VG VG_storage1 while PVs are missing.
  Consider vgreduce --removemissing.
  Cannot process volume group VG_storage1
Reduce the size of the Volume Group by removing the missing device.
Using vgreduce, the missing device is removed from the Volume Group, and in this case the Logical Volume is removed as well, because there is no other backing device for the data (as there would be with a mirror). The Physical Volume marked as “[unknown]” is removed, too. The --force option is needed (probably because there is unclean cache data on the cache device)!
You may want to test the command before the actual execution using --test. The first command below runs with --test, which instructs vgreduce to report the actions it would take without actually writing them to the devices (test mode!).
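Since vgreduce --removemissing --force is destructive, it is also prudent to save a copy of the current LVM metadata first; a minimal sketch (the output file name is just an example):

# Write a text backup of the VG metadata to a file of our choosing;
# LVM also keeps automatic copies under /etc/lvm/backup and /etc/lvm/archive.
vgcfgbackup -f /root/VG_storage1-before-removemissing.vg VG_storage1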
[root@srv ~]# vgreduce --removemissing --test --force VG_storage1
  TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  WARNING: Removing partial LV VG_storage1/lv_storage1.
  Failed to active cache locally VG_storage1/lv_storage1.
  Failed to uncache VG_storage1/lv_storage1.
[root@srv ~]# vgreduce --removemissing --force VG_storage1
  WARNING: Device for PV IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd not found or rejected by a filter.
  Couldn't find device with uuid IfF78Q-KV3N-GH94-6wvU-23ku-jC20-S8tcBd.
  WARNING: Removing partial LV VG_storage1/lv_storage1.
  Flushing 0 blocks for cache VG_storage1/lv_storage1.
  Logical volume "lv_cache" successfully removed
  Logical volume "lv_storage1" successfully removed
  Wrote out consistent volume group VG_storage1.
The second command removes the Logical Volume and reduces the Volume Group by removing and deleting the [unknown] Physical Volume. No missing PVs are reported after the second command. Here is the status of all the devices:
[root@srv ~]# lvdisplay
[root@srv ~]# lvscan
[root@srv ~]# vgdisplay
  --- Volume group ---
  VG Name               VG_storage1
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  19
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               2.59 TiB
  PE Size               4.00 MiB
  Total PE              679987
  Alloc PE / Size       0 / 0
  Free  PE / Size       679987 / 2.59 TiB
  VG UUID               eZ2ZIb-jcDl-kPLj-oFwJ-LLuN-VxLD-rVJclP

[root@srv ~]# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sdb5
  VG Name               VG_storage1
  PV Size               675.00 GiB / not usable 4.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              172799
  Free PE               172799
  Allocated PE          0
  PV UUID               oLn3hh-ROFU-WSW8-0m8P-YLWY-Akoz-nCxh96

  --- Physical volume ---
  PV Name               /dev/sda4
  VG Name               VG_storage1
  PV Size               1.93 TiB / not usable <5.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              507188
  Free PE               507188
  Allocated PE          0
  PV UUID               wyhdu9-dEgh-8YPx-sw32-mSAw-fL7w-IBFe2d
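Note that LVM keeps automatic metadata archives, so the state of the Volume Group before the vgreduce is still recorded and can be listed (read-only):

# List the metadata archives kept under /etc/lvm/archive for this VG -
# useful to review what the volume group looked like before vgreduce.
vgcfgrestore --list VG_storage1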
Create the LVM2 cached devices again.
Create a cache pool device first and then use it in the create command for the Logical Volume. For details, you may want to read the original article – SSD cache device to a hard disk drive using LVM.
[root@srv ~]# lvcreate --type cache-pool -l 100%FREE -n lv_cache VG_storage1 /dev/sdb5
  Using 2.75 MiB chunk size instead of default 64.00 KiB, so cache pool has less than 1000000 chunks.
  Logical volume "lv_cache" created.
[root@srv ~]# lvdisplay
  --- Logical volume ---
  LV Path                /dev/VG_storage1/lv_cache
  LV Name                lv_cache
  VG Name                VG_storage1
  LV UUID                YdhMVE-R8FM-sx4v-SPc9-rLRz-90s8-u1sF00
  LV Write Access        read/write
  LV Creation host, time srv.example.com, 2020-04-11 22:50:14 +0000
  LV Pool metadata       lv_cache_cmeta
  LV Pool data           lv_cache_cdata
  LV Status              NOT available
  LV Size                674.90 GiB
  Current LE             172775
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

[root@srv ~]# lvcreate --type cache -l 100%FREE -n lv_storage1 --cachepool lv_cache VG_storage1 --cachemode writeback
  Logical volume "lv_storage1" created.
[root@srv ~]# lvdisplay
  --- Logical volume ---
  LV Path                /dev/VG_storage1/lv_storage1
  LV Name                lv_storage1
  VG Name                VG_storage1
  LV UUID                xtfvHV-Sbva-TOAq-gHqo-ALSq-cpXF-xwKP82
  LV Write Access        read/write
  LV Creation host, time srv.example.com, 2020-04-11 22:51:29 +0000
  LV Cache pool name     lv_cache
  LV Cache origin name   lv_storage1_corig
  LV Status              available
  # open                 0
  LV Size                1.93 TiB
  Cache used blocks      0.01%
  Cache metadata blocks  4.19%
  Cache dirty blocks     0.00%
  Cache read hits/misses 0 / 47
  Cache wrt hits/misses  0 / 0
  Cache demotions        0
  Cache promotions       2
  Current LE             507188
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0

[root@srv ~]# lvs -a
  LV                  VG          Attr       LSize   Pool       Origin              Data%  Meta%  Move Log Cpy%Sync Convert
  [lv_cache]          VG_storage1 Cwi---C--- 674.90g                                0.01   4.19            0.00
  [lv_cache_cdata]    VG_storage1 Cwi-ao---- 674.90g
  [lv_cache_cmeta]    VG_storage1 ewi-ao----  48.00m
  lv_storage1         VG_storage1 Cwi-a-C---   1.93t [lv_cache] [lv_storage1_corig] 0.01   4.19            0.00
  [lv_storage1_corig] VG_storage1 owi-aoC---   1.93t
  [lvol0_pmspare]     VG_storage1 ewi-------  48.00m
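To verify the cache relationship after recreating the volumes, the pool and origin columns can be queried explicitly; a read-only check along these lines:

# Confirm lv_storage1 is cached: its Pool column should point to
# [lv_cache] and its Origin to [lv_storage1_corig].
lvs -a -o name,attr,size,pool_lv,origin,devices VG_storage1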
The cached device is ready for use. Just format it, mount it, and use it…
[root@srv ~]# mkfs.ext4 -m 0 /dev/VG_storage1/lv_storage1
mke2fs 1.42.9 (28-Dec-2013)
Discarding device blocks: done
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=704 blocks, Stripe width=704 blocks
129843200 inodes, 519360512 blocks
0 blocks (0.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2667577344
15850 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
	102400000, 214990848, 512000000

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

[root@srv ~]# mount /dev/VG_storage1/lv_storage1 /mnt/storage1
[root@srv ~]# df -h
Filesystem                           Size  Used Avail Use% Mounted on
devtmpfs                              16G     0   16G   0% /dev
tmpfs                                 16G     0   16G   0% /dev/shm
tmpfs                                 16G  8.9M   16G   1% /run
tmpfs                                 16G     0   16G   0% /sys/fs/cgroup
/dev/md126                            50G  2.2G   45G   5% /
/dev/md125                           991M  151M  773M  17% /boot
/dev/mapper/VG_storage1-lv_storage1  2.0T   82M  2.0T   1% /storage1
tmpfs                                3.2G     0  3.2G   0% /run/user/0
[root@srv ~]# lvdisplay
  --- Logical volume ---
  LV Path                /dev/VG_storage1/lv_storage1
  LV Name                lv_storage1
  VG Name                VG_storage1
  LV UUID                xtfvHV-Sbva-TOAq-gHqo-ALSq-cpXF-xwKP82
  LV Write Access        read/write
  LV Creation host, time srv.example.com, 2020-04-11 22:51:29 +0000
  LV Cache pool name     lv_cache
  LV Cache origin name   lv_storage1_corig
  LV Status              available
  # open                 1
  LV Size                1.93 TiB
  Cache used blocks      2.64%
  Cache metadata blocks  6.23%
  Cache dirty blocks     0.00%
  Cache read hits/misses 1240 / 93
  Cache wrt hits/misses  44315 / 51907
  Cache demotions        0
  Cache promotions       1085
  Current LE             507188
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3
Of course, it is a good idea to reboot the server to verify that everything comes up properly after a restart, though it is not mandatory.
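To make the mount persistent across reboots, an /etc/fstab entry is needed; a minimal sketch, assuming the /mnt/storage1 mount point from the mount command above:

# Append an fstab entry so the cached volume is mounted on boot,
# then let mount -a verify the entry parses and mounts cleanly.
echo '/dev/VG_storage1/lv_storage1  /mnt/storage1  ext4  defaults  0 0' >> /etc/fstab
mount -a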