CentOS 7 dracut-initqueue timeout and could not boot – warning /dev/disk/by-id/md-uuid- does not exist

Let’s say you update your software raid layout – create, delete or modify your software raid and reboot the system and your server does not start normally. After loading your remote video console (KVM) you see the boot process reports for a missing device and you are under console (dracut console). Your system is in “Emergency mode”.

The warning:

dracut-initqueue[504]: Warning: dracut-initqueue timeout - starting timeout scripts
dracut-initqueue[504]: Warning: dracut-initqueue timeout - starting timeout scripts
dracut-initqueue[504]: Warning: dracut-initqueue timeout - starting timeout scripts
....
....
dracut-initqueue[504]: Warning: could not boot.
dracut-initqueue[504]: Warning: /dev/disk/by-id/md-uuid-2fdc509e:8dd05ed3:c2350cb4:ea5a620d does not exist
      Starting Dracut Emergency Shell...
Warning: /dev/disk/by-id/md-uuid-2fdc509e:8dd05ed3:c2350cb4:ea5a620d does not exist

Generating "/run/initramfs/rdsosreport.txt"


Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.


dracut:/#

SCREENSHOT 1) The boot process reports mutiple warning messages of dracut-initqueue timeout, because a drive cannot be found.

main menu
Warning: dracut-initqueue timeout – starting timeout scripts

Keep on reading!

Failed to start Security Audit Service, Authorization Manager and Login Service

A power outrage caused one of our servers to shut down unexpectedly and after it had been powered up the server did not show up. The server was unreachable and apparently, the network did not bring up the interfaces.
Loading the IPMI KVM Console and rebooting the server there were three errors on the screen during the boot up of the CentOS 7:

[FAILED] Failed to start Security Audit Service.
See 'systemctl status auditd.service' for details.
....
....
[FAILED] Failed to start Authorization Manager.
See 'systemctl status polkit.service' for details.
....
....
[FAILED] Failed to start Login Service.
See 'systemctl status systemd-logind.service' for details.

And after the above last line, the system stopped loading.
The disks are clean, but there was no login service, so you cannot log in to the server through the keyboard and the monitor! There was no network as mentioned above, which meant no logging at all in the server. You might not know, but if auditd service is enabled you probably use Selinux!

STEP 1) Failed to start the three important services – Security Audit Service, Authorization Manager and Login Service.

So we ended up with unability to log in our server.

main menu

Not sure what exactly caused this problem (seems strange a perfectly working Selinux enabled CentOS 7 server to have miss-labeled files in the root only because of an unexpected shutdown), but to be able to fix the issue and bring back your server to life

you need a rescue CD/USB/DVD/PXE Server to boot from and mount the disks and relabel your root file system.

STEP 1) Boot from a rescue CD/USB/DVD/PXE Server.

In our case, we used the IPMI KVM Console and mounted a Gentoo ISO disk and then booted from it to have a bash shell in our system. Our root resides on software RAID 1, so cat the /proc/mdstat and mount your root file system somewhere (/mnt/gentoo is there by default…)

STEP 2) Booted in our rescue Gentoo CD and mount your root file system.

main menu
Rescue Gentoo CD

STEP 2) create a file “.autorelabel” in the mounting point of your root file system.

So in our case, we mounted our CentOS 7 root file system in /mnt/gentoo and you must create a file with patch “/mnt/gentoo/.autorelabel”. umount and reboot. And a few minutes later your server will be back from the dead. A quick and handful advice – edit your /etc/fstab to mount only the root file system by commenting out all other big storage mounts – of course, if it is possible. We have big storage with millions of files in /mnt/storage-01 and we put the “#” to comment out the line with it – we do not want to wait for relabeling this file system, because the problem apparently is in our root file system! If it is possible (it is highly recommended) to relabel only the root file system in such situations to be able to regain shell control over your server fast.

Bonus – booted in rescue but no logs

OK, we booted to the rescue and tried to see what was the error (with journalctl in chrooted /mnt/gentoo), which did not allow auditd, polkit and systemd-logind to fail to start, but it appeared by default the systemd logs are not persistent on the disk in CentOS 7, so when you reboot in rescue you do not have systemd logs from the last boot! As a piece of additional advice here you may consider enabling persistent systemd logs!

using portage eix for the first time – cannot open database file

Installing “app-portage/eix” in Gentoo to manage your portage updates you might encounter this error, when trying to use “eix” for the first time:

Writing database file /var/cache/eix/portage.eix...
cannot open database file /var/cache/eix/portage.eix for writing (mode = 'wb')

The chances are missing directory “/var/cache/eix/” or the user:group of the “/var/cache/eix/” is root:root, which is NOT right.

The user:group must be “portage:portage”.

So the solution is really simple:

mkdir -p /var/cache/eix
chown portage:portage /var/cache/eix

Output – the errors you might get

Using the eix-sync failed with:

root@srv1 ~ # eix-sync 
 * eix-cache does not exist
 * Running eix-update
Reading Portage settings...
Building database (/var/cache/eix/portage.eix)...
[0] "gentoo" /usr/portage/ (cache: metadata-md5-or-flat)
     Reading category 167|167 (100) Finished             
[1] "myportage" /usr/local/myportage (cache: parse|ebuild*#metadata-md5#metadata-flat#assign)
     Reading category 167|167 (100) Finished    
Applying masks...
Calculating hash tables...
Writing database file /var/cache/eix/portage.eix...
cannot open database file /var/cache/eix/portage.eix for writing (mode = 'wb')
 * eix-update failed
 * Time statistics:
     6 seconds for initial eix-update
     6 seconds total

Using the “eix-update” failed, too.

root@srv ~ # eix-update 
Reading Portage settings...
Building database (/var/cache/eix/portage.eix)...
[0] "gentoo" /usr/portage/ (cache: metadata-md5-or-flat)
     Reading category 167|167 (100) Finished             
[1] "myportage" /usr/local/myportage (cache: parse|ebuild*#metadata-md5#metadata-flat#assign)
     Reading category 167|167 (100) Finished    
Applying masks...
Calculating hash tables...
Writing database file /var/cache/eix/portage.eix...
cannot open database file /var/cache/eix/portage.eix for writing (mode = 'wb')

Output 2 – Successful update with eix

root@srv ~ # eix-update 
Reading Portage settings...
Building database (/var/cache/eix/portage.eix)...
[0] "gentoo" /usr/portage/ (cache: metadata-md5-or-flat)
     Reading category 167|167 (100) Finished             
[1] "myportage" /usr/local/myportage (cache: parse|ebuild*#metadata-md5#metadata-flat#assign)
     Reading category 167|167 (100) Finished    
Applying masks...
Calculating hash tables...
Writing database file /var/cache/eix/portage.eix...
Database contains 19544 packages in 167 categories