Recovering MD array and mdadm: Cannot get array info for /dev/md0

What a case! A long story short one of our disks got a bad disk in a software RAID1 setup and when we tried replacing the disk in a recovery Linux console we got the strange error of an MD device:

mdadm: Cannot get array info for /dev/md125

And ccording to the /proc/mdstat the device was there and mdadm -E reported the array was “clean”.

root@631019 ~ # mdadm --add /dev/md125 /dev/sdb2
mdadm: Cannot get array info for /dev/md125

root@631019 ~ # cat /proc/mdstat                                                        🙁
Personalities : [raid0] [linear] [multipath] [raid1] [raid6] [raid5] [raid4] [raid10] 
md122 : inactive sda4[0](S)
      33520640 blocks super 1.2
       
md123 : inactive sda5[0](S)
      1914583040 blocks super 1.2
       
md124 : inactive sda3[0](S)
      4189184 blocks super 1.2
       
md125 : inactive sda2[0](S)
      1048512 blocks
       
unused devices: <none>

root@631019 ~ # mdadm -E /dev/sda2                                                      🙁
/dev/sda2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : aff708ee:16669ffb:1a120e13:7e9185ae
  Creation Time : Thu Mar 14 15:10:21 2019
     Raid Level : raid1
  Used Dev Size : 1048512 (1023.94 MiB 1073.68 MB)
     Array Size : 1048512 (1023.94 MiB 1073.68 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 126

    Update Time : Thu Jul 11 10:22:17 2019
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : c1ee0a10 - correct
         Events : 103


      Number   Major   Minor   RaidDevice State
this     0       8        2        0      active sync   /dev/sda2

   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2

The important piece of information here is that the RAID1 is in an inactive state, which is really strange! It is perfectly normal to be started with one disk missing (the raid as you can see consists from 2 disks) and in read-only mode before mounting it. But here it is in an inactive state! The output of /proc/mdstat shows a sign of inappropriate assembly of all those arrays probably during the boot of the rescue Linux system – missing information or old version of mdadm utility or some other configuration loaded! In such states – inactive and as you see no information about the type of the arrays it is normal mdadm to report error it could not get current array info. The key word here is CURRENT despite mdadm misses it in the error output:

root@631019 ~ # mdadm --add /dev/md125 /dev/sdb2
mdadm: Cannot get array info for /dev/md125

Because in fact mdadm tries adding a disk in the currently loaded configuration, not the real one in your disks!

The solution

  1. Remove ALL current configuration by issuing multiple stop commands with mdadm, no inactive raids or any raids should be reported in “/proc/mdstat”.
  2. Remove (or better rename) mdadm configuration files in /etc/mdadm.conf (in some Linux distributions is /etc/mdadm/mdadm.conf).
  3. Rescan for MD devices with mdadm. The mdadm will load the configuration from your disks.
  4. Add the missing partitions to your software raid devices.

Keep on reading!

Centos 7 Server hangs up on boot after deleting a software raid (mdadm device)

We have a CentOS 7 server with a simple two hard drives setup in RAID1 of total 4 devices for boot, root, swap and storage. The storage device (/dev/md5) was removed and recreated with RAID0 for better performance, because the server was promoted as only cache server. Then the server was restarted and it never went up.
On IPMI KVM it just started loading the kernel and hanged up after several seconds without any additional information:

The kernel loads the mdadm devices and do not continue and the device md5 is missing.

main menu
CentOS 7 kernel loading the mdadm RAID devices

To boot successfully you must remove the missing device

On the Grub 2 menu press “e” and you’ll get this screen. Here you can edit all lines if you need. You must remove the last rd.md.uuid in our case or the one you deleted. Remove it and press Ctrl+x to load the kernel.

main menu
Grub 2 edit

There are two options you can do:

  • OPTION 1) Remove rd.md.uuid option of your old mdadm device
  • OPTION 2) Replace the ID in rd.md.uuid= with the new ID of the mdadm device.

Each of these two options could be used to solve the booting problem. Edit /etc/default/grub and replace or remove rd.md.uuid and generate the grub.conf.
You can find old mdadm ID in /etc/mdadm.conf (if you have not replace it there).

[root@srv ~]# cat /etc/mdadm.conf 
ARRAY /dev/md2 level=raid1 num-devices=2 metadata=0.90 UUID=9c08f218:cd5c0f8f:d96bc0d1:57b77e99
ARRAY /dev/md3 level=raid1 num-devices=2 metadata=1.2 name=2035110:swap UUID=1f74a2e0:757bfb9f:9c860e50:325f37cb
ARRAY /dev/md4 level=raid1 num-devices=2 metadata=1.2 name=2035110:root UUID=29bf4aa8:b7dae21a:45f4c188:baea4c13
ARRAY /dev/md5 level=raid1 num-devices=2 metadata=1.2 name=2035110:storage1 UUID=e6eb2590:b767be36:c76bb869:45ff0c3c
[root@srv ~]# mdadm --detail --scan
ARRAY /dev/md2 metadata=0.90 UUID=9c08f218:cd5c0f8f:d96bc0d1:57b77e99
ARRAY /dev/md3 metadata=1.2 name=2035110:swap UUID=1f74a2e0:757bfb9f:9c860e50:325f37cb
ARRAY /dev/md4 metadata=1.2 name=2035110:root UUID=29bf4aa8:b7dae21a:45f4c188:baea4c13
ARRAY /dev/md/5 metadata=1.2 name=s2035110:5 UUID=901074eb:16ba7c5b:0af69934:e9444102
[root@srv ~]# mdadm --detail --scan > /etc/mdadm.conf 

Here is our old /etc/default/grub:

[root@srv ~]# cat /etc/default/grub 
GRUB_TIMEOUT=1
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --speed=115200"
GRUB_CMDLINE_LINUX="rd.md.uuid=9c08f218:cd5c0f8f:d96bc0d1:57b77e99 rd.md.uuid=1f74a2e0:757bfb9f:9c860e50:325f37cb rd.md.uuid=29bf4aa8:b7dae21a:45f4c188:baea4c13 rd.md.uuid=e6eb2590:b767be36:c76bb869:45ff0c3c console=tty0 crashkernel=auto console=ttyS0,115200 net.ifnames=1"
GRUB_DISABLE_RECOVERY="true"

Here we edit our /boot/grub2/grub.cfg, replace the old uuid and generate grub.cfg (legacy BIOS):

[root@srv ~]# cat /etc/default/grub 
GRUB_TIMEOUT=1
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --speed=115200"
GRUB_CMDLINE_LINUX="rd.md.uuid=9c08f218:cd5c0f8f:d96bc0d1:57b77e99 rd.md.uuid=1f74a2e0:757bfb9f:9c860e50:325f37cb rd.md.uuid=29bf4aa8:b7dae21a:45f4c188:baea4c13 rd.md.uuid=901074eb:16ba7c5b:0af69934:e9444102 console=tty0 crashkernel=auto console=ttyS0,115200 net.ifnames=1"
[root@srv ~]# grub2-mkconfig -o /boot/grub2/grub.cfg 
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-957.5.1.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-957.5.1.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-05cb8c7b39fe0f70e3ce97e5beab809d
Found initrd image: /boot/initramfs-0-rescue-05cb8c7b39fe0f70e3ce97e5beab809d.img
done
[root@srv ~]# reboot

Use this for UEFI BIOS boot:
First check if /boot and /boot/efi are mounted and if not you must mount them with:

mount /boot
mount /boot/efi

Generate the grub.cfg

grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg

Bonus

In fact when the original device was removed and added a new one we formatted it as usual. But it was not possible to mount it, you just execute mount

/dev/md5 /mnt/stor1

no error, but no mount could be found, the device was not mounted and when you execute

umount /mnt/stor1

The OS told the “/mnt/stor1” was not mounted. Several more tries were made unsuccessfully to mount the “/dev/md5”, then the restart was performed and the server never went up.
Suppose the systemd just did not allow to mount the device because of the boot parameters rd.md.uuid!