Replacing a disk may sometimes be challenging, especially with software RAID. If your software RAID1 has gone inactive, this article might be for you!
After booting from a LiveCD or a rescue PXE system, all RAID devices may show up as inactive despite the loaded personalities. We have a similar article on the subject – Recovering MD array and mdadm: Cannot get array info for /dev/md0
livecd ~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear] [multipath]
md125 : inactive sdb3[1](S)
1047552 blocks super 1.2
md126 : inactive sdb1[1](S)
52427776 blocks super 1.2
md127 : inactive sdb2[1](S)
16515072 blocks super 1.2
unused devices: <none>
The personalities are loaded – “[raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear] [multipath]” – which means the kernel modules loaded successfully. Still, something went wrong: the devices’ personalities are unrecognized and the devices are in inactive state.
A device in inactive state cannot be recovered and no disks can be added to it:
livecd ~ # mdadm --add /dev/md125 /dev/sda3
mdadm: Cannot get array info for /dev/md125
In general, to recover a RAID in inactive state:
- Check if the kernel modules are loaded. If the RAID setup uses RAID1, the “Personalities” line in /proc/mdstat should include it as “[raid1]”.
- Try to run the device with “mdadm --run”.
- Add the missing device to the RAID device with “mdadm --add” once the status of the RAID device goes to “active (auto-read-only)” or just “active”.
- Wait for the RAID device to recover.
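The list above can be condensed into a small shell sketch. Note that recover_md is a hypothetical helper name, not an mdadm command; MDADM is set to an echo prefix so the sketch is a dry run that only prints the commands – set MDADM=mdadm and run as root to execute them for real.

```shell
# Condensed recovery flow for one inactive RAID1 array (a sketch).
# MDADM="echo mdadm" makes this a dry run that only prints the commands;
# set MDADM=mdadm (as root, with the module loaded) to execute them.
MDADM="echo mdadm"

recover_md() {
    md=$1    # the inactive array, e.g. /dev/md126
    new=$2   # the partition on the replacement disk, e.g. /dev/sda1
    $MDADM --run "$md"          # try to start the inactive array
    $MDADM --add "$md" "$new"   # add the missing member
    $MDADM --wait "$md"         # block until the resync has finished
}

recover_md /dev/md126 /dev/sda1
```

“mdadm --wait” simply blocks until any running resync or recovery on the array completes, which is handy in scripts.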
Here are the steps and the RAID status and its changes:
STEP 1) Check if the kernel modules are loaded.
Just cat /proc/mdstat and look for the “Personalities” line:
livecd ~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear] [multipath]
md125 : inactive sdb3[1](S)
1047552 blocks super 1.2
md126 : inactive sdb1[1](S)
52427776 blocks super 1.2
md127 : inactive sdb2[1](S)
16515072 blocks super 1.2
unused devices: <none>
The above example shows that all software RAID modules are loaded successfully. If one is missing, it is simple to load. For example, to load the RAID1 module execute:
modprobe raid1
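To check whether a given personality is already registered before calling modprobe, grep the “Personalities” line. A minimal sketch – the sample line is inlined here so the pipeline is self-contained; on a real system read /proc/mdstat itself:

```shell
# Sample "Personalities" line (on a live system use:
#   grep ^Personalities /proc/mdstat).
mdstat='Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear] [multipath]'

if printf '%s\n' "$mdstat" | grep -q '\[raid1\]'; then
    echo "raid1 personality already loaded"
else
    echo "raid1 missing - run: modprobe raid1"   # modprobe needs root
fi
```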
If you do not know the type of the inactive RAID, you can always check the metadata of one of the partitions mentioned in /proc/mdstat:
livecd ~ # mdadm -E /dev/sdb3
/dev/sdb3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 474d4d5b:4d995cb5:a51a8287:28fb4f1a
Name : srv.example.com:boot
Creation Time : Fri Oct 25 12:28:25 2019
Raid Level : raid1
Raid Devices : 2
Avail Dev Size : 2095104 (1023.00 MiB 1072.69 MB)
Array Size : 1047552 (1023.00 MiB 1072.69 MB)
Data Offset : 4096 sectors
Super Offset : 8 sectors
Unused Space : before=4016 sectors, after=0 sectors
State : clean
Device UUID : afd7785e:4f987f6d:0e66b02a:43071feb
Internal Bitmap : 8 sectors from superblock
Update Time : Fri Apr 10 19:00:59 2020
Bad Block Log : 512 entries available at offset 16 sectors
Checksum : 79709c4e - correct
Events : 47
Device Role : Active device 1
Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
The Raid Level is raid1 and the device is in clean state (which means this is not the faulty disk/partition/device). So if the kernel RAID1 module is missing, just load it.
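The “Raid Level” field can also be extracted programmatically and fed straight to modprobe, since the module name matches the level. A sketch, with the mdadm -E output shortened to the one relevant line so the pipeline is self-contained:

```shell
# On a live system replace the printf with: mdadm -E /dev/sdb3
level=$(printf '%s\n' '     Raid Level : raid1' \
        | awk -F' : ' '/Raid Level/ {gsub(/ /, "", $2); print $2}')
echo "$level"
# modprobe "$level"   # uncomment on the real system (needs root)
```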
STEP 2) Try to run the device with “mdadm --run”
Run the array:
livecd ~ # mdadm --run /dev/md126
mdadm: started array /dev/md/srv.example.com:root
livecd ~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear] [multipath]
md125 : inactive sdb3[1](S)
1047552 blocks super 1.2
md126 : active (auto-read-only) raid1 sdb1[1]
52427776 blocks super 1.2 [2/1] [_U]
bitmap: 0/1 pages [0KB], 65536KB chunk
md127 : inactive sdb2[1](S)
16515072 blocks super 1.2
unused devices: <none>
The RAID device md126 has been identified and its state has changed to active (it is “active (auto-read-only)”, which means it will turn to plain active on the first write, e.g. when mounted). One disk is missing – “[_U]”.
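The “[2/1] [_U]” counters can be checked programmatically too – a sketch that parses a status line (the sample is the md126 line above; on a live box read /proc/mdstat instead):

```shell
# "[2/1]" means 1 of 2 members present; "[_U]"/"[U_]" shows which slot
# is missing, "[UU]" means the mirror is complete.
status='52427776 blocks super 1.2 [2/1] [_U]'
case $status in
    *'[UU]'*)          echo "array complete" ;;
    *'[_U]'*|*'[U_]'*) echo "one member missing - mdadm --add needed" ;;
esac
```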
Execute the run command for the other two RAID devices:
livecd ~ # mdadm --run /dev/md125
mdadm: started array /dev/md/srv.example.com:boot
livecd ~ # mdadm --run /dev/md127
mdadm: started array /dev/md/srv.example.com:swap
livecd ~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear] [multipath]
md125 : active (auto-read-only) raid1 sdb3[1]
1047552 blocks super 1.2 [2/1] [_U]
bitmap: 0/1 pages [0KB], 65536KB chunk
md126 : active (auto-read-only) raid1 sdb1[1]
52427776 blocks super 1.2 [2/1] [_U]
bitmap: 0/1 pages [0KB], 65536KB chunk
md127 : active (auto-read-only) raid1 sdb2[1]
16515072 blocks super 1.2 [2/1] [_U]
unused devices: <none>
STEP 3) Add the missing device to the RAID device
If you have not already copied the partition layout to the new disk, do it now (skip this if you have already done it):
sgdisk /dev/sdb -R /dev/sda
sgdisk -G /dev/sda
The first command replicates the partition table of /dev/sdb onto the new disk /dev/sda, and the second randomizes the GUIDs of the copy.
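Before adding partitions, it may be worth confirming the two disks really ended up with identical layouts. A sketch of such a check – sample lsblk output is inlined so the pipeline is self-contained; on a real system substitute the two variables with real `lsblk -no NAME,SIZE,TYPE` calls:

```shell
# Sample partition listings for the old (sdb) and new (sda) disk,
# inlined for illustration (sizes are this article's example layout).
old='sdb1 50G part
sdb2 15.8G part
sdb3 1G part'
new='sda1 50G part
sda2 15.8G part
sda3 1G part'

# Mask the disk name so only partition numbers and sizes are compared.
if diff <(printf '%s\n' "$old" | sed 's/^sdb/sdX/') \
        <(printf '%s\n' "$new" | sed 's/^sda/sdX/') >/dev/null; then
    echo "partition layouts match"
else
    echo "layout mismatch - rerun sgdisk -R"
fi
```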
Add the missing partitions to the RAID devices and wait for the recovery:
livecd ~ # mdadm --add /dev/md126 /dev/sda1
mdadm: added /dev/sda1
livecd ~ # mdadm --add /dev/md125 /dev/sda3
mdadm: added /dev/sda3
livecd ~ # mdadm --add /dev/md127 /dev/sda2
mdadm: added /dev/sda2
livecd ~ # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] [linear] [multipath]
md125 : active raid1 sda3[2] sdb3[1]
1047552 blocks super 1.2 [2/1] [_U]
resync=DELAYED
bitmap: 0/1 pages [0KB], 65536KB chunk
md126 : active raid1 sda1[2] sdb1[1]
52427776 blocks super 1.2 [2/1] [_U]
[=>...................] recovery = 6.0% (3170496/52427776) finish=6.2min speed=132104K/sec
bitmap: 0/1 pages [0KB], 65536KB chunk
md127 : active raid1 sda2[2] sdb2[1]
16515072 blocks super 1.2 [2/1] [_U]
resync=DELAYED
unused devices: <none>
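The recovery progress can be pulled out of /proc/mdstat programmatically, e.g. for a status script. A sketch with the recovery line from the output above inlined (on a live system pipe `cat /proc/mdstat` instead):

```shell
# The recovery line from the /proc/mdstat output above; after whitespace
# splitting, field 4 is the percentage and field 6 the estimated finish.
line='      [=>...................]  recovery =  6.0% (3170496/52427776)  finish=6.2min speed=132104K/sec'
printf '%s\n' "$line" | awk '/recovery/ {print $4, $6}'
```

Alternatively, `mdadm --wait /dev/md126` simply blocks until the resync is done.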
Reinstall GRUB?
In most cases a GRUB installation should be performed before restarting the server. Here is how to do it in BIOS Legacy mode:
livecd ~ # mkdir /mnt/recover/
livecd ~ # mount /dev/md126 /mnt/recover/
livecd ~ # mount -o bind /dev /mnt/recover/dev
livecd ~ # mount -o bind /proc /mnt/recover/proc
livecd ~ # mount -o bind /sys /mnt/recover/sys
livecd ~ # chroot /mnt/recover/
[root@livecd (srv) /]# . /etc/profile
[root@livecd (srv) /]# grub2-install /dev/sda
Installing for i386-pc platform.
grub2-install: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
grub2-install: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image..
Installation finished. No error reported.
[root@livecd (srv) /]# nano /etc/fstab
[root@livecd (srv) /]# exit
livecd ~ # umount /mnt/recover/dev
livecd ~ # umount /mnt/recover/proc/
livecd ~ # umount /mnt/recover/sys
livecd ~ # umount /mnt/recover
livecd ~ # reboot