Replacing a disk may sometimes be challenging, especially with software RAID. If your software RAID1 went inactive, this article might be for you!
After booting from a LiveCD or a rescue PXE system, all RAID devices may come up inactive despite the loaded personalities. We have a similar article on the subject – Recovering MD array and mdadm: Cannot get array info for /dev/md0
livecd ~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear] [multipath]
md125 : inactive sdb3[1](S)
      1047552 blocks super 1.2

md126 : inactive sdb1[1](S)
      52427776 blocks super 1.2

md127 : inactive sdb2[1](S)
      16515072 blocks super 1.2

unused devices: <none>
The personalities are loaded, which means the kernel modules have been loaded successfully – “[raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear] [multipath]”. Still, something went wrong: the devices’ personalities are unrecognized and the arrays are in inactive state.
A device in inactive state cannot recover, and disks cannot be added to it:
livecd ~ # mdadm --add /dev/md125 /dev/sda3
mdadm: Cannot get array info for /dev/md125
In general, to recover a RAID in inactive state:
- Check whether the kernel modules are loaded. If the RAID setup uses RAID1, the “Personalities” line in /proc/mdstat should include it as “[raid1]”.
- Try to start the device with “mdadm --run”.
- Add the missing disk to the RAID device with “mdadm --add” once the status of the RAID device changes to “active (auto-read-only)” or just “active”.
- Wait for the RAID device to recover.
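The four steps above can be sketched as a small shell script. Note this is only a sketch: the device and partition names (/dev/md125, /dev/sda1 and so on) are the ones used throughout this article and must be adjusted to your own layout. By default the script only prints the commands; remove the DRYRUN line to actually execute them (as root):

```shell
#!/bin/sh
# Sketch of the recovery flow above. Device names are examples from
# this article; change them to match your own /proc/mdstat.
DRYRUN=1   # remove this line to actually execute the commands

run() {
    # print each command, and execute it only when DRYRUN is unset
    echo "+ $*"
    [ -n "$DRYRUN" ] || "$@"
}

run modprobe raid1                      # step 1: load the module if missing
for md in /dev/md125 /dev/md126 /dev/md127; do
    run mdadm --run "$md"               # step 2: start the inactive arrays
done
run mdadm --add /dev/md126 /dev/sda1    # step 3: add the new disk's partitions
run mdadm --add /dev/md127 /dev/sda2
run mdadm --add /dev/md125 /dev/sda3
run mdadm --wait /dev/md125 /dev/md126 /dev/md127   # step 4: wait for recovery
```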
Here are the steps, along with the RAID status changes along the way:
STEP 1) Check if the kernel modules are loaded.
Just cat the /proc/mdstat and search for the “Personalities” line:
livecd ~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear] [multipath]
md125 : inactive sdb3[1](S)
      1047552 blocks super 1.2

md126 : inactive sdb1[1](S)
      52427776 blocks super 1.2

md127 : inactive sdb2[1](S)
      16515072 blocks super 1.2

unused devices: <none>
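The inactive array names can also be pulled out of /proc/mdstat with a small awk filter, which is handy when scripting the later “mdadm --run” calls. A sketch: the sample variable below holds shortened lines from the output above, and on a live system you would pipe `cat /proc/mdstat` in instead.

```shell
# List arrays whose status field is "inactive". The sample text is taken
# from the /proc/mdstat output shown above; replace it with
# `cat /proc/mdstat` on a real system.
mdstat='md125 : inactive sdb3[1](S)
md126 : inactive sdb1[1](S)
md127 : inactive sdb2[1](S)'
inactive=$(printf '%s\n' "$mdstat" | awk '$3 == "inactive" {print $1}')
echo "$inactive"
```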
The above example shows that all software RAID modules are loaded successfully. If one is missing, it is simple to load. For example, to load the RAID1 module execute:
modprobe raid1
If you do not know the type of the inactive RAID, you can always check the metadata of one of the partitions mentioned in /proc/mdstat:
livecd ~ # mdadm -E /dev/sdb3
/dev/sdb3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 474d4d5b:4d995cb5:a51a8287:28fb4f1a
           Name : srv.example.com:boot
  Creation Time : Fri Oct 25 12:28:25 2019
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 2095104 (1023.00 MiB 1072.69 MB)
     Array Size : 1047552 (1023.00 MiB 1072.69 MB)
    Data Offset : 4096 sectors
   Super Offset : 8 sectors
   Unused Space : before=4016 sectors, after=0 sectors
          State : clean
    Device UUID : afd7785e:4f987f6d:0e66b02a:43071feb

Internal Bitmap : 8 sectors from superblock
    Update Time : Fri Apr 10 19:00:59 2020
  Bad Block Log : 512 entries available at offset 16 sectors
       Checksum : 79709c4e - correct
         Events : 47

    Device Role : Active device 1
    Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
The Raid Level is raid1 and the State is clean (which means this is not the faulty disk/partition/device). So if the kernel RAID1 module is missing, just load it.
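The “Raid Level” field can also be extracted programmatically, so a script can load the matching module without a human reading the output. A sketch, using a shortened copy of the examination above as sample input; on a live system, pipe `mdadm -E /dev/sdb3` in directly:

```shell
# Grab the value of the "Raid Level : ..." line from `mdadm -E` output.
sample='     Raid Level : raid1
   Raid Devices : 2'
level=$(printf '%s\n' "$sample" | awk -F' : ' '/Raid Level/ {print $2}')
echo "$level"          # raid1
# modprobe "$level"    # would load the matching kernel module (as root)
```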
STEP 2) Try to run the device with “mdadm --run”
Run the array:
livecd ~ # mdadm --run /dev/md126
mdadm: started array /dev/md/srv.example.com:root
livecd ~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear] [multipath]
md125 : inactive sdb3[1](S)
      1047552 blocks super 1.2

md126 : active (auto-read-only) raid1 sdb1[1]
      52427776 blocks super 1.2 [2/1] [_U]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : inactive sdb2[1](S)
      16515072 blocks super 1.2

unused devices: <none>
The RAID device md126 has been identified and its state has changed to active (it is “active (auto-read-only)”, which means it will switch to plain “active” on the first write, e.g. when mounted; “mdadm --readwrite /dev/md126” switches it manually). One disk is missing – “[_U]”.
Execute the run command for the other two RAID devices:
livecd ~ # mdadm --run /dev/md125
mdadm: started array /dev/md/srv.example.com:boot
livecd ~ # mdadm --run /dev/md127
mdadm: started array /dev/md/srv.example.com:swap
livecd ~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear] [multipath]
md125 : active (auto-read-only) raid1 sdb3[1]
      1047552 blocks super 1.2 [2/1] [_U]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active (auto-read-only) raid1 sdb1[1]
      52427776 blocks super 1.2 [2/1] [_U]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active (auto-read-only) raid1 sdb2[1]
      16515072 blocks super 1.2 [2/1] [_U]

unused devices: <none>
STEP 3) Add the missing device to the RAID device
If you have not done it already, copy the partition layout to the new disk (follow this step only if you have not done it yet). The first command replicates the partition table of /dev/sdb onto /dev/sda, and the second one randomizes the disk and partition GUIDs of the new disk so they do not clash with the old one:

sgdisk /dev/sdb -R /dev/sda
sgdisk -G /dev/sda
Add the missing partitions to the RAID devices and wait for the recovery to finish.
livecd ~ # mdadm --add /dev/md126 /dev/sda1
mdadm: added /dev/sda1
livecd ~ # mdadm --add /dev/md125 /dev/sda3
mdadm: added /dev/sda3
livecd ~ # mdadm --add /dev/md127 /dev/sda2
mdadm: added /dev/sda2
livecd ~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear] [multipath]
md125 : active raid1 sda3[2] sdb3[1]
      1047552 blocks super 1.2 [2/1] [_U]
        resync=DELAYED
      bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid1 sda1[2] sdb1[1]
      52427776 blocks super 1.2 [2/1] [_U]
      [=>...................]  recovery =  6.0% (3170496/52427776) finish=6.2min speed=132104K/sec
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid1 sda2[2] sdb2[1]
      16515072 blocks super 1.2 [2/1] [_U]
        resync=DELAYED

unused devices: <none>
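The progress can be watched live with “watch cat /proc/mdstat”, or the percentage can be extracted with a one-liner, e.g. for a monitoring script. A sketch, using the recovery line from the output above as sample input; on a live system you would read /proc/mdstat instead:

```shell
# Extract the recovery percentage from a /proc/mdstat progress line.
# The sample line is copied from the output above.
line='      [=>...................]  recovery =  6.0% (3170496/52427776) finish=6.2min speed=132104K/sec'
pct=$(printf '%s\n' "$line" | sed -n 's/.*recovery = *\([0-9.]*\)%.*/\1/p')
echo "$pct"    # 6.0
```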
Reinstall GRUB?
In most cases, a GRUB installation should be performed before restarting the server. Here is how to do it (BIOS/Legacy mode):
livecd ~ # mkdir /mnt/recover/
livecd ~ # mount /dev/md126 /mnt/recover/
livecd ~ # mount -o bind /dev /mnt/recover/dev
livecd ~ # mount -o bind /proc /mnt/recover/proc
livecd ~ # mount -o bind /sys /mnt/recover/sys
livecd ~ # chroot /mnt/recover/
[root@livecd (srv) /]# . /etc/profile
[root@livecd (srv) /]# grub2-install /dev/sda
Installing for i386-pc platform.
grub2-install: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
grub2-install: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
Installation finished. No error reported.
[root@livecd (srv) /]# nano /etc/fstab
[root@livecd (srv) /]# exit
livecd ~ # umount /mnt/recover/dev
livecd ~ # umount /mnt/recover/proc/
livecd ~ # umount /mnt/recover/sys
livecd ~ # umount /mnt/recover
livecd ~ # reboot