LSI MegaRAID 2108 freezes with abort command and all processes hang up in disk sleep

It happened to one of our old LSI MegaRAID 2108 controllers (AOC-USAS2LP-H8iR (smc2108) with 36 disk, 32x2T and 4x8T) to freeze and most of the processes hang up with Disk sleep. The server was up, the network was working, but no login could be successful. A hard reset was executed with the IPMI KVM. The server started up, the MegaRAID controller booted with a warning that it was shutdown unexpectedly so there could be possible loss of data and to accept it by pressing any key or “C” to boot in the WebBIOS of the controller.

To summarize it up: the LSI controller hangs up when is in the following modes:

  1. Background Initialization
  2. Check Consistency

Aborting and disabling the modes above let out controller to work till replacement. If you experience any kind of strange disk hangs or freezes you can try our solution here! Check below to see how to do it yourself.

Keep on reading!

systemd service freezes in activating (start-post) status – mysqld or other services

We’ve experienced this with the MySQL server under CentOS 7, but you can have this state with other services!
After updating our MySQL we tried to start it up, but the service got this strange state after “systemctl start” returned:

[root@mysql2 ~]# systemctl start mysqld
Job for mysqld.service failed because a timeout was exceeded. See "systemctl status mysqld.service" and "journalctl -xe" for details.

The timeout is big it’s something like 5 to 10 minutes and so it is typical (do not do it!) to type “ctrl+c” and you end up without this message and a strange state of the mysql:

[root@mysql2 ~]# systemctl status mysqld
● mysqld.service - MySQL Community Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: activating (start-post) since Fri 2018-11-09 09:00:55 UTC; 6min ago
  Process: 8333 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)
  Process: 8321 ExecStartPre=/usr/bin/mysql-systemd-start pre (code=exited, status=0/SUCCESS)
 Main PID: 8333 (code=exited, status=0/SUCCESS);         : 8334 (mysql-systemd-s)
   CGroup: /user.slice/user-0.slice/session-2395.scope/system.slice/mysqld.service
           └─control
             ├─ 8334 /bin/bash /usr/bin/mysql-systemd-start post
             └─10152 sleep 1

Nov 09 09:00:55 mysql2.mytv.bg systemd[1]: Starting MySQL Community Server...
Nov 09 09:00:56 mysql2.mytv.bg mysqld_safe[8333]: 181109 09:00:56 mysqld_safe Logging to '/var/log/mysqld.log'.
Nov 09 09:00:56 mysql2.mytv.bg mysqld_safe[8333]: 181109 09:00:56 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql

Meanwhile with “pstree”:

[root@mysql2 ~]# pstree
systemd─┬─agetty
        ├─crond
        ├─dbus-daemon
        ├─mysql-systemd-s───sleep
        ├─rsyslogd───2*[{rsyslogd}]
        ├─sshd─┬─sshd───bash───systemctl─┬─systemctl
        │      │                         └─systemd-tty-ask
        │      └─sshd───bash───pstree
        ├─systemd-journal
        └─systemd-logind

So as you can see no mysqld process! Apparently systemctl had tried to start MySQL server process and it failed.
So the first thing to do was to check the MySQL logs. In our case it was a obsolete option in my.cnf:

2018-11-09 09:10:57 11384 [ERROR] /usr/sbin/mysqld: unknown variable 'default-character-set=utf8'
2018-11-09 09:10:57 11384 [ERROR] Aborting

The interesting part is that

the service got “Active: activating (start-post)” and when you fix the problem you cannot “systemctl start mysqld” it just start to wait for the current timeout.

In fact this state means “I’m trying to start the service…” and it is in an endless loop to start the service and if you the service has a big start timeout like 5-10 minutes you must wait for the next iteration of the loop to start the service successfully (if you fixed the problem!). And if you want not to wait you must execute first stop to the service and then start – you’ll not wait for any timeout and you can check immediately if the service was started successfully:

[root@mysql2 ~]# systemctl status mysqld
● mysqld.service - MySQL Community Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: activating (start-post) since Fri 2018-11-09 09:20:56 UTC; 2min 50s ago
  Process: 13208 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)
  Process: 13196 ExecStartPre=/usr/bin/mysql-systemd-start pre (code=exited, status=0/SUCCESS)
 Main PID: 13208 (code=exited, status=0/SUCCESS);         : 13209 (mysql-systemd-s)
   CGroup: /user.slice/user-0.slice/session-2395.scope/system.slice/mysqld.service
           └─control
             ├─13209 /bin/bash /usr/bin/mysql-systemd-start post
             └─14357 sleep 1

Nov 09 09:20:56 mysql2.mytv.bg systemd[1]: Starting MySQL Community Server...
Nov 09 09:20:56 mysql2.mytv.bg mysqld_safe[13208]: 181109 09:20:56 mysqld_safe Logging to '/var/log/mysqld.log'.
Nov 09 09:20:56 mysql2.mytv.bg mysqld_safe[13208]: 181109 09:20:56 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
[root@mysql2 ~]# systemctl start mysqld
Job for mysqld.service failed because a timeout was exceeded. See "systemctl status mysqld.service" and "journalctl -xe" for details.
[root@mysql2 ~]# systemctl status mysqld
● mysqld.service - MySQL Community Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2018-11-09 09:30:59 UTC; 2s ago
  Process: 15656 ExecStartPost=/usr/bin/mysql-systemd-start post (code=exited, status=0/SUCCESS)
  Process: 15643 ExecStartPre=/usr/bin/mysql-systemd-start pre (code=exited, status=0/SUCCESS)
 Main PID: 15655 (mysqld_safe)
   CGroup: /user.slice/user-0.slice/session-2395.scope/system.slice/mysqld.service
           ├─15655 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
           └─16243 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mysqld.log --open-files-limit=10000...

Nov 09 09:30:56 mysql2.mytv.bg systemd[1]: Starting MySQL Community Server...
Nov 09 09:30:57 mysql2.mytv.bg mysqld_safe[15655]: 181109 09:30:57 mysqld_safe Logging to '/var/log/mysqld.log'.
Nov 09 09:30:57 mysql2.mytv.bg mysqld_safe[15655]: 181109 09:30:57 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
Nov 09 09:30:59 mysql2.mytv.bg systemd[1]: Started MySQL Community Server.

As you can see we even received error again that the service cannot be started and immediately after that the service status is in normal “active (running)” state! And we waited for around 10 minutes! You can see the times in the logs above.
So to summarize it up:

If you have a service in “activating (start-post)” the service cannot be started because of an error, check and fix the problem and then issue “stop and start”:

[root@mysql2 ~]# systemctl start mysqld
Job for mysqld.service failed because a timeout was exceeded. See "systemctl status mysqld.service" and "journalctl -xe" for details.
[root@mysql2 ~]# systemctl status mysqld
● mysqld.service - MySQL Community Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: activating (start-post) since Fri 2018-11-09 10:05:20 UTC; 2min 17s ago
  Process: 23601 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)
  Process: 23589 ExecStartPre=/usr/bin/mysql-systemd-start pre (code=exited, status=0/SUCCESS)
 Main PID: 23601 (code=exited, status=0/SUCCESS);         : 23602 (mysql-systemd-s)
   CGroup: /user.slice/user-0.slice/session-2395.scope/system.slice/mysqld.service
           └─control
             ├─23602 /bin/bash /usr/bin/mysql-systemd-start post
             └─24646 sleep 1

Nov 09 10:05:20 mysql2.mytv.bg systemd[1]: Starting MySQL Community Server...
Nov 09 10:05:21 mysql2.mytv.bg mysqld_safe[23601]: 181109 10:05:21 mysqld_safe Logging to '/var/log/mysqld.log'.
Nov 09 10:05:21 mysql2.mytv.bg mysqld_safe[23601]: 181109 10:05:21 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
[root@mysql2 ~]# systemctl stop mysqld
[root@mysql2 ~]# systemctl status mysqld
● mysqld.service - MySQL Community Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2018-11-09 10:07:52 UTC; 4s ago
  Process: 23602 ExecStartPost=/usr/bin/mysql-systemd-start post (code=killed, signal=TERM)
  Process: 23601 ExecStart=/usr/bin/mysqld_safe --basedir=/usr (code=exited, status=0/SUCCESS)
  Process: 23589 ExecStartPre=/usr/bin/mysql-systemd-start pre (code=exited, status=0/SUCCESS)
 Main PID: 23601 (code=exited, status=0/SUCCESS)

Nov 09 10:05:20 mysql2.mytv.bg systemd[1]: Starting MySQL Community Server...
Nov 09 10:05:21 mysql2.mytv.bg mysqld_safe[23601]: 181109 10:05:21 mysqld_safe Logging to '/var/log/mysqld.log'.
Nov 09 10:05:21 mysql2.mytv.bg mysqld_safe[23601]: 181109 10:05:21 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
Nov 09 10:07:52 mysql2.mytv.bg systemd[1]: Stopped MySQL Community Server.
[root@mysql2 ~]# systemctl start mysqld
[root@mysql2 ~]# systemctl status mysqld
● mysqld.service - MySQL Community Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2018-11-09 10:08:06 UTC; 3s ago
  Process: 24711 ExecStartPost=/usr/bin/mysql-systemd-start post (code=exited, status=0/SUCCESS)
  Process: 24698 ExecStartPre=/usr/bin/mysql-systemd-start pre (code=exited, status=0/SUCCESS)
 Main PID: 24710 (mysqld_safe)
   CGroup: /user.slice/user-0.slice/session-2395.scope/system.slice/mysqld.service
           ├─24710 /bin/sh /usr/bin/mysqld_safe --basedir=/usr
           └─25298 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mysqld.log --open-files-limit=10000...

Nov 09 10:08:04 mysql2.mytv.bg systemd[1]: Starting MySQL Community Server...
Nov 09 10:08:04 mysql2.mytv.bg mysqld_safe[24710]: 181109 10:08:04 mysqld_safe Logging to '/var/log/mysqld.log'.
Nov 09 10:08:04 mysql2.mytv.bg mysqld_safe[24710]: 181109 10:08:04 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
Nov 09 10:08:06 mysql2.mytv.bg systemd[1]: Started MySQL Community Server.