CentOS 7 dracut-initqueue timeout and could not boot – warning /dev/disk/by-id/md-uuid- does not exist

Let’s say you change your software RAID layout (create, delete or modify an array), reboot the system, and the server does not start normally. On the remote video console (KVM) you see that the boot process reports a missing device and drops you into the dracut console. Your system is in “Emergency mode”.

The warning:

dracut-initqueue[504]: Warning: dracut-initqueue timeout - starting timeout scripts
dracut-initqueue[504]: Warning: dracut-initqueue timeout - starting timeout scripts
dracut-initqueue[504]: Warning: dracut-initqueue timeout - starting timeout scripts
....
....
dracut-initqueue[504]: Warning: could not boot.
dracut-initqueue[504]: Warning: /dev/disk/by-id/md-uuid-2fdc509e:8dd05ed3:c2350cb4:ea5a620d does not exist
      Starting Dracut Emergency Shell...
Warning: /dev/disk/by-id/md-uuid-2fdc509e:8dd05ed3:c2350cb4:ea5a620d does not exist

Generating "/run/initramfs/rdsosreport.txt"


Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.


dracut:/#

SCREENSHOT 1) The boot process reports multiple dracut-initqueue timeout warnings, because a drive cannot be found.

Warning: dracut-initqueue timeout – starting timeout scripts


CentOS 7 server hangs on boot after deleting a software RAID (mdadm device)

We have a CentOS 7 server with a simple setup: two hard drives in RAID1, with 4 md devices in total for boot, root, swap and storage. The storage device (/dev/md5) was removed and recreated as RAID0 for better performance, because the server was repurposed as a cache-only server. Then the server was restarted and it never came back up.
On the IPMI KVM it started loading the kernel and hung after several seconds without any additional information:

The kernel assembles the mdadm devices and does not continue, because the device md5 is missing.

CentOS 7 kernel loading the mdadm RAID devices

To boot successfully you must remove the missing device

In the GRUB 2 menu press “e” and you’ll get this screen, where you can edit any line you need. Remove the rd.md.uuid entry of the array you deleted (the last one in our case), then press Ctrl+x to boot the kernel. This edit applies to the current boot only.

Grub 2 edit

There are two ways to make the fix permanent:

  • OPTION 1) Remove the rd.md.uuid option of your old mdadm device.
  • OPTION 2) Replace the ID in rd.md.uuid= with the new ID of the mdadm device.

Either option solves the boot problem. Edit /etc/default/grub, replace or remove the rd.md.uuid entry, and regenerate grub.cfg.
You can find the old mdadm ID in /etc/mdadm.conf (if you have not replaced it there yet).

[root@srv ~]# cat /etc/mdadm.conf 
ARRAY /dev/md2 level=raid1 num-devices=2 metadata=0.90 UUID=9c08f218:cd5c0f8f:d96bc0d1:57b77e99
ARRAY /dev/md3 level=raid1 num-devices=2 metadata=1.2 name=2035110:swap UUID=1f74a2e0:757bfb9f:9c860e50:325f37cb
ARRAY /dev/md4 level=raid1 num-devices=2 metadata=1.2 name=2035110:root UUID=29bf4aa8:b7dae21a:45f4c188:baea4c13
ARRAY /dev/md5 level=raid1 num-devices=2 metadata=1.2 name=2035110:storage1 UUID=e6eb2590:b767be36:c76bb869:45ff0c3c
[root@srv ~]# mdadm --detail --scan
ARRAY /dev/md2 metadata=0.90 UUID=9c08f218:cd5c0f8f:d96bc0d1:57b77e99
ARRAY /dev/md3 metadata=1.2 name=2035110:swap UUID=1f74a2e0:757bfb9f:9c860e50:325f37cb
ARRAY /dev/md4 metadata=1.2 name=2035110:root UUID=29bf4aa8:b7dae21a:45f4c188:baea4c13
ARRAY /dev/md/5 metadata=1.2 name=s2035110:5 UUID=901074eb:16ba7c5b:0af69934:e9444102
[root@srv ~]# mdadm --detail --scan > /etc/mdadm.conf 
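
Before rebooting, it can help to confirm that every rd.md.uuid value in /etc/default/grub still matches an assembled array. The following is only a minimal sketch and not part of the original procedure: the function name stale_md_uuids is hypothetical, and it assumes a grep that supports -o (GNU grep on CentOS 7 does).

```shell
# stale_md_uuids CMDLINE SCAN
# Print every rd.md.uuid= value found in CMDLINE that does not
# appear as UUID=... in SCAN (the output of `mdadm --detail --scan`).
stale_md_uuids() {
    printf '%s\n' "$1" |
        grep -o 'rd\.md\.uuid=[0-9a-fA-F:]*' |
        sed 's/^rd\.md\.uuid=//' |
        while read -r uuid; do
            # Keep only the UUIDs that no assembled array reports.
            printf '%s\n' "$2" | grep -qF "UUID=$uuid" || printf '%s\n' "$uuid"
        done
}

# Assumed usage on the repaired system (run as root):
#   stale_md_uuids "$(grep '^GRUB_CMDLINE_LINUX=' /etc/default/grub)" "$(mdadm --detail --scan)"
```

Given the old GRUB_CMDLINE_LINUX and the mdadm --detail --scan output from this article, it prints e6eb2590:b767be36:c76bb869:45ff0c3c, the UUID of the deleted RAID1 storage array.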

Here is our old /etc/default/grub:

[root@srv ~]# cat /etc/default/grub 
GRUB_TIMEOUT=1
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --speed=115200"
GRUB_CMDLINE_LINUX="rd.md.uuid=9c08f218:cd5c0f8f:d96bc0d1:57b77e99 rd.md.uuid=1f74a2e0:757bfb9f:9c860e50:325f37cb rd.md.uuid=29bf4aa8:b7dae21a:45f4c188:baea4c13 rd.md.uuid=e6eb2590:b767be36:c76bb869:45ff0c3c console=tty0 crashkernel=auto console=ttyS0,115200 net.ifnames=1"
GRUB_DISABLE_RECOVERY="true"

Here we edit /etc/default/grub, replace the old UUID with the new one and regenerate grub.cfg (legacy BIOS):

[root@srv ~]# cat /etc/default/grub 
GRUB_TIMEOUT=1
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --speed=115200"
GRUB_CMDLINE_LINUX="rd.md.uuid=9c08f218:cd5c0f8f:d96bc0d1:57b77e99 rd.md.uuid=1f74a2e0:757bfb9f:9c860e50:325f37cb rd.md.uuid=29bf4aa8:b7dae21a:45f4c188:baea4c13 rd.md.uuid=901074eb:16ba7c5b:0af69934:e9444102 console=tty0 crashkernel=auto console=ttyS0,115200 net.ifnames=1"
[root@srv ~]# grub2-mkconfig -o /boot/grub2/grub.cfg 
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-957.5.1.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-957.5.1.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-05cb8c7b39fe0f70e3ce97e5beab809d
Found initrd image: /boot/initramfs-0-rescue-05cb8c7b39fe0f70e3ce97e5beab809d.img
done
[root@srv ~]# reboot

Use this for UEFI boot:
First check whether /boot and /boot/efi are mounted; if not, mount them with:

mount /boot
mount /boot/efi

Then generate the grub.cfg:

grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
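
If you are not sure which of the two grub.cfg paths applies, note that the kernel exposes /sys/firmware/efi only when the system was booted via UEFI. A minimal sketch under that assumption follows; grub_cfg_path is a hypothetical helper name, and its optional argument exists only so the function can be tried against a test directory instead of the real /sys.

```shell
# grub_cfg_path [SYSFS_ROOT]
# Print the grub.cfg location used in this article for the current boot mode.
# SYSFS_ROOT defaults to /sys; the parameter is an assumption for testing only.
grub_cfg_path() {
    if [ -d "${1:-/sys}/firmware/efi" ]; then
        echo /boot/efi/EFI/centos/grub.cfg   # booted via UEFI
    else
        echo /boot/grub2/grub.cfg            # booted via legacy BIOS
    fi
}

# Assumed usage (run as root):
#   grub2-mkconfig -o "$(grub_cfg_path)"
```

The check reflects how the running system was booted, so in a rescue environment it describes the rescue media's boot mode, not necessarily that of the installed OS.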

Bonus

In fact, when the original device was removed and a new one was added, we formatted it as usual. But it was not possible to mount it. You just execute

mount /dev/md5 /mnt/stor1

and get no error, but no mount can be found: the device is not mounted, and when you execute

umount /mnt/stor1

the OS reports that “/mnt/stor1” is not mounted. Several more unsuccessful attempts were made to mount “/dev/md5”, then the server was restarted and it never came back up.
We suppose systemd simply did not allow the device to be mounted because of the rd.md.uuid boot parameters!
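
Because mount printed no error here, it is worth checking the result explicitly rather than trusting the silence. A minimal sketch that looks the mount point up in /proc/mounts; is_mounted is a hypothetical helper name, and the optional second argument (an alternative mounts table) is an assumption added only for testing.

```shell
# is_mounted MOUNTPOINT [MOUNTS_FILE]
# Succeed (exit 0) if MOUNTPOINT appears in the second column of the
# mounts table, fail (exit 1) otherwise. MOUNTS_FILE defaults to /proc/mounts.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Assumed usage:
#   mount /dev/md5 /mnt/stor1
#   is_mounted /mnt/stor1 || echo "/mnt/stor1 is NOT mounted" >&2
```

Mount points that contain spaces are stored escaped in /proc/mounts (as \040), so this simple comparison only suits plain paths like the one used here.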

grub2: grub-install: error: disk mduuid not found even after the partition has bios_grub on

This tutorial is for all of us who have done everything by the book with parted and still receive an error when installing grub2 to the boot sector!

srv@local ~ # grub2-install /dev/sda
Installing for i386-pc platform.
grub2-install: error: disk `mduuid/51b39c2b565a6629d9efc9b3c39b44ff' not found. 

The solution is simple enough:

set the bios_grub flag on AGAIN, even if parted reports it as ON

So you had a problem with your disks, booted a rescue CD/DVD/USB to reinstall grub, used parted from the rescue system, then chrooted into the OS you wanted to repair and executed

grub2-install or grub-install

and you get the error above? But why, when parted reports this:

srv@local ~ # parted /dev/sda
GNU Parted 3.2
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p                                                                
Model: DELL PERC H700 (scsi)
Disk /dev/sda: 480GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system     Name     Flags
 1      1049kB  2097kB  1049kB                  primary  bios_grub
 2      2097kB  4096MB  4094MB  linux-swap(v1)  primary  raid
 3      4096MB  24.0GB  19.9GB  ext4            primary
 4      24.0GB  480GB   456GB   ext4            primary

(parted)

And still you get the error:

grub2-install: error: disk `mduuid/51b39c2b565a6629d9efc9b3c39b44ff' not found. 

Why this happens is unknown to us, but the solution was simple: just SET the flag again. In our case we ran parted inside the chrooted environment, from the distro we wanted to repair, and then ran grub-install from the same distro (also in the chrooted environment). At first we used parted from the rescue distro, but apparently there are some issues between the versions, even though both parted programs (the one from the chrooted environment and the one from the rescue distro) reported bios_grub as set ON.
It is possible to get this error after using

sgdisk

to duplicate the partition table of a disk (BTW ALWAYS use “-G, --randomize-guids” with sgdisk after you duplicate the partition table of a disk, or you’ll get into BIG trouble!).
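
The sgdisk remark above can be sketched as follows. This is an illustration rather than the exact commands we ran: the helper names run and clone_gpt and the DRY_RUN variable are hypothetical, and by default the script only prints the commands. Note that with sgdisk -R (--replicate) the first argument is the TARGET disk, whose partition table gets overwritten.

```shell
# run CMD...: print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        echo "$@"
    fi
}

# clone_gpt SOURCE TARGET
# Copy SOURCE's partition table onto TARGET (overwriting TARGET's table),
# then give TARGET new random disk and partition GUIDs.
clone_gpt() {
    run sgdisk -R "$2" "$1"   # -R/--replicate: the option argument is the TARGET
    run sgdisk -G "$2"        # -G/--randomize-guids on the fresh copy
}

# Dry run: only prints the two sgdisk commands for these example disks.
clone_gpt /dev/sda /dev/sdb
```

Run it with DRY_RUN=0 only after double-checking which disk is the source and which is the target.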
So to write down the solution (in fact it is more of a workaround):

srv@local ~ # parted /dev/sda
GNU Parted 3.2
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) set 1 bios_grub on
(parted) q
srv@local ~ # parted /dev/sdb
GNU Parted 3.2
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) set 1 bios_grub on
(parted) q
srv@local ~ # grub2-install /dev/sda
Installing for i386-pc platform.
Installation finished. No error reported.
srv@local ~ # grub2-install /dev/sdb
Installing for i386-pc platform.
Installation finished. No error reported.
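
The same workaround can be scripted for both disks with parted's script mode (-s). This is a sketch under assumptions: reinstall_grub and DRY_RUN are hypothetical names, partition 1 is assumed to be the BIOS boot partition as in the listing above, and the commands are meant to run inside the chrooted environment. By default nothing is executed; the commands are only printed.

```shell
# reinstall_grub DISK...
# For each disk: set the bios_grub flag on partition 1 again, then reinstall grub2.
# Commands are printed unless DRY_RUN=0 is set.
reinstall_grub() {
    for disk in "$@"; do
        for cmd in "parted -s $disk set 1 bios_grub on" "grub2-install $disk"; do
            if [ "${DRY_RUN:-1}" = 0 ]; then
                $cmd            # unquoted on purpose: split into command and arguments
            else
                echo "$cmd"
            fi
        done
    done
}

# Dry run for the two disks used in this article.
reinstall_grub /dev/sda /dev/sdb
```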

* If you are using UEFI boot you will probably need more options for the grub installation

Something like this for the grub2 installation (it is specific to your distro: find the EFI directory under /boot and use the right path, nothing special!):

grub-install --recheck --target=x86_64-efi --efi-directory=/boot/efi/ /dev/sda