rsync and selinux – opendir failed: Permission denied

SELinux can sometimes interfere with your setup. Say you have configured your rsync daemon, but you still get a permission error when running rsync to copy files:

rsync: opendir "/." (in backup2) failed: Permission denied (13)

Apparently, the rsync client connects to the server and finds the section named “backup2”, but permission is still denied even though you explicitly set the uid and gid in the section to root (uid=0 and gid=0)!

The most common reason is that

SELinux denies the rsync process permission to open the directory exported by the path in your rsync configuration file.

By default, SELinux denies the rsync daemon access to files and directories on your system unless they are labeled for it. In most cases, here is what can help:

setsebool -P rsync_export_all_ro=1

rsync_export_all_ro lets the rsync daemon export any files and directories read-only, so requests like the one above will not be denied.
The capital “-P” makes the setting persistent across reboots.
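
Before flipping the boolean, it can help to confirm that SELinux is really the cause. The sketch below is one way to do that, assuming the standard SELinux userland tools (getenforce, getsebool, ausearch) are installed and that auditd is logging denials. The function name check_rsync_selinux is ours and is not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: show the SELinux state that matters for an rsync export.
check_rsync_selinux() {
    if ! command -v getsebool >/dev/null 2>&1; then
        echo "SELinux tools not found; SELinux is probably not in use here"
        return 0
    fi
    # Prints Enforcing, Permissive or Disabled
    getenforce || true
    # Current value of the boolean used in this article
    getsebool rsync_export_all_ro || true
    # Recent AVC denials logged for the rsync process (needs root and auditd)
    if command -v ausearch >/dev/null 2>&1; then
        ausearch -m avc -c rsync -ts recent 2>/dev/null \
            || echo "no recent rsync denials found (or audit log not readable)"
    fi
}

check_rsync_selinux
```

If the boolean shows "off" and the audit log lists denials for rsync, the setsebool command above is the likely fix.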

PHP missing xml extension – Fatal error: Uncaught Error: Class DOMDocument not found

We upgraded one of our servers and at first did not notice this error. Soon the logs began to fill with it:

 Fatal error: Uncaught Error: Class 'DOMDocument' not found in /var/www/htdocs/site1/wp-content/plugins/all-in-one-seo-pack/aioseop_class.php:4501 Stack trace: #0 /var/www/htdocs/site1/wp-content/plugins/all-in-one-seo-pack/aioseop_class.php(4056): All_in_One_SEO_Pack->get_prev_next_links(Object(WP_Post)) #1 /var/www/htdocs/site1/wp-includes/class-wp-hook.php(286): All_in_One_SEO_Pack->wp_head('') #2 /var/www/htdocs/site1/wp-includes/class-wp-hook.php(310): WP_Hook->apply_filters(NULL, Array) #3 /var/www/htdocs/site1/wp-includes/plugin.php(465): WP_Hook->do_action(Array) #4 /var/www/htdocs/site1/wp-includes/general-template.php(2668): do_action('wp_head') #5 /var/www/htdocs/site1/wp-content/themes/twentysixteen/header.php(21): wp_head() #6 /var/www/htdocs/site1/wp-includes/template.php(704): require_once('/var/www/h...') #7 /var/www/htdocs/site1/wp-includes/template.php(653): load_template('/var/www/h...', true) #8 /var/www/htdocs/site1/wp-incl in /var/www/htdocs/site1/wp-content/plugins/all-in-one-seo-pack/aioseop_class.php on line 4501

Apparently, we did not install all the PHP extensions we had before, and one of our WordPress plugins (All in One SEO Pack) began to throw this nasty error and produce blank pages where the error would be shown!

The fix is easy enough: just install PHP-XML.

CentOS 7

  • default PHP:
    yum -y install php-xml
    
  • Using ius-release repo:
    yum -y install php72u-xml
    

Ubuntu 16/17/18

sudo apt install -y php-xml

Gentoo

Just add “xml” to your current USE flags in /etc/portage/make.conf and emerge (be sure “xml”, not “-xml”, appears in the USE list of the emerge output!):

srv1 ~ # emerge -va --nodeps php

These are the packages that would be merged, in order:

[ebuild   R    ] dev-lang/php-7.2.10:7.2::gentoo  USE="acl apache2 bcmath berkdb bzip2 calendar cgi cli ctype curl exif fileinfo filter fpm ftp gd gdbm hash iconv ipv6 json mhash mysql mysqli nls opcache pcntl pdo phar posix readline session sharedmem simplexml snmp soap sockets sqlite ssl tokenizer truetype unicode xml xmlreader xmlrpc xmlwriter zip zlib -argon2 -cdb -cjk -coverage -debug -embed -enchant -firebird -flatfile -gmp -imap -inifile -intl -iodbc -kerberos -ldap -ldap-sasl -libedit -libressl -lmdb -mssql -oci8-instant-client -odbc -phpdbg -postgres -qdbm -recode (-selinux) -session-mm -sodium -spell -systemd -sysvipc -test -threads -tidy -tokyocabinet -wddx -webp -xpm -xslt -zip-encryption" 0 KiB
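
After installing the package on any of the distributions above, a quick check from the command line can confirm that the class is available. This is a minimal sketch, assuming the php CLI is installed; the function name check_php_dom is hypothetical. Keep in mind that the CLI and php-fpm can load different configuration files, and that php-fpm has to be restarted before it picks up a newly installed extension.

```shell
#!/bin/sh
# Hypothetical helper: report whether the PHP CLI can see the DOM classes.
check_php_dom() {
    if ! command -v php >/dev/null 2>&1; then
        echo "php CLI not found"
        return 0
    fi
    # class_exists() is true only when the dom extension is loaded
    php -r 'echo class_exists("DOMDocument") ? "DOMDocument: available\n" : "DOMDocument: MISSING\n";'
}

check_php_dom
```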

Bonus

Here is what our WordPress page looks like with the error:

 <!DOCTYPE html>
<html lang="en-US" class="no-js">
<head>
	<meta charset="UTF-8">
	<meta name="viewport" content="width=device-width, initial-scale=1">
	<link rel="profile" href="http://gmpg.org/xfn/11">
		<link rel="pingback" href="https://ahelpme.com/xmlrpc.php">
		<script>(function(html){html.className = html.className.replace(/\bno-js\b/,'js')})(document.documentElement);</script>
<title>How to compile xmr-stak (2.4.5) under CentOS 7 for CPU mining cryptocurrencies | Any IT here? Help Me! - Part 2</title>

<!-- All in One SEO Pack 2.12 by Michael Torbert of Semper Fi Web Design[387,517] -->
<br />
<b>Fatal error</b>:  Uncaught Error: Class 'DOMDocument' not found in /var/www/htdocs/site1/wp-content/plugins/all-in-one-seo-pack/aioseop_class.php:4501
Stack trace:
#0 /var/www/htdocs/site1/wp-content/plugins/all-in-one-seo-pack/aioseop_class.php(4056): All_in_One_SEO_Pack-&gt;get_prev_next_links(Object(WP_Post))
#1 /var/www/htdocs/site1/wp-includes/class-wp-hook.php(286): All_in_One_SEO_Pack-&gt;wp_head('')
#2 /var/www/htdocs/site1/wp-includes/class-wp-hook.php(310): WP_Hook-&gt;apply_filters(NULL, Array)
#3 /var/www/htdocs/site1/wp-includes/plugin.php(465): WP_Hook-&gt;do_action(Array)
#4 /var/www/htdocs/site1/wp-includes/general-template.php(2668): do_action('wp_head')
#5 /var/www/htdocs/site1/wp-content/themes/twentysixteen/header.php(21): wp_head()
#6 /var/www/htdocs/site1/wp-includes/template.php(704): require_once('/var/www/h...')
#7 /var/www/htdocs/site1/wp-includes/template.php(653): load_template('/var/www/h...', true)
#8 /var/www/htdocs/site1/wp-incl in <b>/var/www/htdocs/site1/wp-content/plugins/all-in-one-seo-pack/aioseop_class.php</b> on line <b>4501</b><br />

And here is what you can expect in your web server logs (nginx + php-fpm setup):

2019/02/23 03:36:14 [error] 27441#27441: *56837 FastCGI sent in stderr: "PHP message: PHP Fatal error:  Uncaught Error: Class 'DOMDocument' not found in /var/www/htdocs/site1/root/wp-content/plugins/all-in-one-seo-pack/aioseop_class.php:4501
Stack trace:
#0 /var/www/htdocs/site1/root/wp-content/plugins/all-in-one-seo-pack/aioseop_class.php(4056): All_in_One_SEO_Pack->get_prev_next_links(Object(WP_Post))
#1 /var/www/htdocs/site1/root/wp-includes/class-wp-hook.php(286): All_in_One_SEO_Pack->wp_head('')
#2 /var/www/htdocs/site1/root/wp-includes/class-wp-hook.php(310): WP_Hook->apply_filters(NULL, Array)
#3 /var/www/htdocs/site1/root/wp-includes/plugin.php(465): WP_Hook->do_action(Array)
#4 /var/www/htdocs/site1/root/wp-includes/general-template.php(2668): do_action('wp_head')
#5 /var/www/htdocs/site1/root/wp-content/themes/twentysixteen/header.php(21): wp_head()
#6 /var/www/htdocs/site1/root/wp-includes/template.php(704): require_once('/var/www/h...')
#7 /var/www/htdocs/site1/root/wp-includes/template.php(653): load_template('/var/www/h...', true)
#8 /" while reading response header from upstream, client: 66.249.79.133, server: ahelpme.com, request: "GET /linux/fedora/review-of-freshly-installed-fedora-29-kde-plasma-desktop-kde-gui/2/ HTTP/1.1", upstream: "fastcgi://192.168.0.12:9000", host: "ahelpme.com"

Also watch for HTTP 405 codes showing up in your access logs:

10.10.10.2 - - [22/Feb/2019:05:20:34 +0000] "GET /xmlrpc.php HTTP/1.1" 405 53 "-" "Mozilla/5.0 (X11; Linux i686; rv:2.0.1) Gecko/20100101 Firefox/4.0.1" "-"
10.10.10.2 - - [22/Feb/2019:05:20:38 +0000] "GET /xmlrpc.php HTTP/1.1" 405 53 "-" "Mozilla/5.0 (X11; Linux i686; rv:2.0.1) Gecko/20100101 Firefox/4.0.1" "-"

The impact of enabling MySQL sync_binlog – really high disk IO

If you enable this feature in MySQL, you could

increase your disk IO time and writes by 8-10 times.

Generally, this feature can save your replication scheme if a power failure or an OS crash occurs, because it guarantees that no transaction is lost from the binary log. When it is enabled, the binary log is synchronized to disk before transactions are committed. You can check the manual here: https://dev.mysql.com/doc/refman/5.7/en/replication-options-binary-log.html#sysvar_sync_binlog. It also says there can be a great impact on disk writes, but how great?
So here are two setups:

SETUP 1) 2 x 3T hard drives TOSHIBA DT01ACA300 in software RAID1

The impact of setting sync_binlog=1 is 8-10 times the IO time and IO writes. Here are images of several hours with sync_binlog=1, after which we disabled it online:

SCREENSHOT 1) Enable the binary log synchronization with sync_binlog=1.

As you can see, the increase in disk IO time and disk write IOPS is significant: somewhere between 5 and 6 times more! The load is no more than 1.5x the normal, but note that the server is off-peak and has plenty of RAM (32G). It is still a lot of extra load once some other IO appears.

Set MySQL sync_binlog=1 in a software raid of two hard drives

SCREENSHOT 2) Disabling the binary log synchronization with sync_binlog=0.

The decrease in disk IO time and disk write IOPS is significant: somewhere between 5 and 6 times less! Everything is back to normal.

Set MySQL sync_binlog=0 in a software raid of two hard drives.

We disabled it online with:

SET GLOBAL sync_binlog=0;

SETUP 2) 2 x 960G SSD SAMSUNG SM863 in software RAID1

The impact of setting sync_binlog=1 is also 8-10 times the IO time and IO writes. Here are images of several hours with sync_binlog=1, after which we disabled it online:

SCREENSHOT 3) Enable the binary log synchronization with sync_binlog=1.

As you can see, the increase in disk IO time and disk write IOPS is significant: somewhere between 8 and 10 times more! The load did not increase, but note that the server is off-peak and has plenty of RAM (192G). It is still a lot of extra load once some other IO appears.

Set MySQL sync_binlog=1 in a software raid of two enterprise SSDs

SCREENSHOT 4) Disabling the binary log synchronization with sync_binlog=0.

The decrease in disk IO time and disk write IOPS is significant: 8 to 10 times less, or even more! Everything is back to normal.

Set MySQL sync_binlog=0 in a software raid of two enterprise SSDs

SCREENSHOT 5) Enable the binary log synchronization with sync_binlog=1.

Here only the period of the graphs is longer. As you can see, the increase in disk IO time and disk write IOPS is significant: somewhere between 8 and 10 times more! The load did not increase, but note that the server is off-peak and has plenty of RAM (192G). It is still a lot of extra load once some other IO appears.

Set MySQL sync_binlog=1 in a software raid of two enterprise SSDs (big period)

SCREENSHOT 6) Disabling the binary log synchronization with sync_binlog=0.

Here only the period of the graphs is longer. The decrease in disk IO time and disk write IOPS is significant: 8 to 10 times less, or even more! Everything is back to normal.

Set MySQL sync_binlog=0 in a software raid of two enterprise SSDs (big period)

BONUS – MySQL changed the default value from 0 (disabled) to 1 (enabled) from 5.7 (in fact MySQL >= 5.7.7).

SO BE CAREFUL now when upgrading from older versions like MySQL 5.1, 5.5, 5.6 – you would probably need to disable it in the MySQL configuration file my.cnf.

Upgrade MySQL 5.6 to 5.7 – what problems to expect with an old my.cnf configuration file

Finally, we do not have any more MySQL 5.6 servers. We upgraded the last part of our system from MySQL 5.6 to 5.7. In our opinion, this is one of the major upgrades as far as the MySQL configuration file my.cnf is concerned: multiple deprecated directives are removed in 5.7, so when upgrading you should remove them before starting the new version if you want a running MySQL server instance.
Keep in mind that our my.cnf files are old: they were created with MySQL 5.0 and have been edited at every upgrade to a new version (5.0 to 5.1, 5.1 to 5.5 and 5.5 to 5.6) and whenever we needed a specific optimization for our workload. This covers only our configuration; there are surely more deprecated or removed variables in the new version. Here is a good starting point – https://dev.mysql.com/doc/refman/5.7/en/upgrading-from-previous-series.html This article is not about how to upgrade your old MySQL 5.6 to the new MySQL 5.7; it shows what problems you might have after the upgrade.
There are two parts to this article:

  1. Removed variables, which were perfectly OK in the old version 5.6
  2. Changed default values of variables, which greatly impact the IO or the SQL execution

The error messages are included, too.

PART 1) Removed variables.

Some MySQL variables first get deprecated and are then removed in later versions (some are just renamed), and if they are present in the my.cnf configuration file your server will not start up at all. The MySQL log shows that the server starts, then throws an error about an “unknown variable” and begins a shutdown procedure. So you end up without a database server, and it is important to remove such variables from the configuration or to find the new name of a renamed one.

2019-02-26T09:50:12.612950Z 0 [ERROR] unknown variable 'key_buffer=512M'
2019-02-26T09:50:36.361870Z 0 [ERROR] unknown variable 'thread_concurrency=6'
2019-02-26T09:51:17.658546Z 0 [ERROR] unknown variable 'thread_cache=10'
2019-02-26T09:51:32.473210Z 0 [ERROR] unknown variable 'innodb_additional_mem_pool_size=256M'

All four

key_buffer, thread_concurrency, thread_cache, innodb_additional_mem_pool_size

MySQL variables were removed, and your server won’t start up if they are present in the configuration. The “key_buffer” has been renamed to “key_buffer_size”, so replace it with key_buffer_size in your my.cnf. It is important to replace it, because commenting it out would activate the default value, in this case a key_buffer_size of 8M, which is pretty low (in fact almost all default values of the MySQL variables are really low, which is a problem and a topic of discussion in many forums).
The “thread_cache” was also renamed long ago to “thread_cache_size”, so replace it with thread_cache_size.
thread_concurrency and innodb_additional_mem_pool_size were deprecated long ago: they first stopped doing anything, and in this version the variables were removed. As you can see, old configuration files can carry many old names along through the years.

The important thing here is that you must rename the ones that got renamed and remove the ones that got removed, because your server is not going to start up with them in the configuration.
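
To find such leftovers before the upgrade, the configuration file can be scanned for the four names from the error log above. This is only a sketch: the function name find_removed_vars is hypothetical, the list covers just the variables from our configuration, and it assumes the names are written with underscores (MySQL also accepts dashes in option files, which this pattern would miss).

```shell
#!/bin/sh
# Hypothetical helper: list lines in a my.cnf-style file that still set one of
# the four variables that stopped working for us in MySQL 5.7.
find_removed_vars() {
    # Match the exact name followed by "=" or end of line, so that
    # key_buffer_size and thread_cache_size are NOT reported.
    grep -nE '^[[:space:]]*(key_buffer|thread_concurrency|thread_cache|innodb_additional_mem_pool_size)[[:space:]]*(=|$)' "$1"
}

# Usage (the path is an assumption, adjust it to your system):
#   find_removed_vars /etc/my.cnf
```

Each reported line is prefixed with its line number, so it is easy to jump to it and rename or remove the variable.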

PART 2) Changed default value

Some default values of MySQL variables were changed, and if you have not set them in your my.cnf configuration you might be really surprised how big an impact they have on the IO or even on the behavior of the SQL statements.

2.1) Our first MySQL variable is

sync_binlog

– the default value was “0” (deactivated synchronization) and now it is “1” (bin log synchronization). This could greatly impact the performance of your MySQL database server with like 8-10 times more writes and IO disk wait time (really!!!) – you can see it here: (coming soon). So if you haven’t used this variable before you should put it in your my.cnf configuration for sure (in [mysqld] section):

sync_binlog=0

You do not need to restart the server: just put it in the my.cnf configuration file, then open a mysql root console and execute:

SET GLOBAL sync_binlog=0;

It can be changed live.

2.2) And the second example is

sql_mode

– the default value was “NO_ENGINE_SUBSTITUTION” and now it is “ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION”, which is a pretty substantial difference. You can easily lose INSERTs and UPDATEs because a much stricter mode is activated by default.
For example, with an INSERT, if you do not set a value for a field whose column does not have a default value (yes, it is wrong, but it was OK before), your insert won’t be executed and you’ll get an error (or just a FALSE after execution of your query, as with PHP PDO). Here is the MySQL explanation:

A value is missing when a new row to be inserted does not contain a value for a non-NULL column that has no explicit DEFAULT clause in its definition.

And more in https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sql-mode-strict
So if you haven’t used this variable before you should put it in your my.cnf configuration for sure (in [mysqld] section):

sql_mode=NO_ENGINE_SUBSTITUTION

You do not need to restart the server: just put it in the my.cnf configuration file, then open a mysql root console and execute:

SET GLOBAL sql_mode='NO_ENGINE_SUBSTITUTION';
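
To see which values the running server actually uses, both variables can be queried from the shell. This is a minimal sketch, assuming the mysql command-line client is installed and that credentials come from ~/.my.cnf or socket authentication; the function name show_changed_defaults is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the running values of sync_binlog and sql_mode.
show_changed_defaults() {
    if ! command -v mysql >/dev/null 2>&1; then
        echo "mysql client not found"
        return 0
    fi
    # Credentials are assumed to come from ~/.my.cnf or socket authentication.
    mysql -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('sync_binlog', 'sql_mode');" \
        || echo "could not query the MySQL server"
}

show_changed_defaults
```

Note that SET GLOBAL changes sql_mode only for connections opened afterwards; sessions that are already connected keep their current mode.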

Update Supermicro BMC/IPMI Firmware – under Linux console

Here you will see our log of upgrading the Supermicro IPMI firmware under the Linux console, using the CLI tool included in the firmware package for your IPMI unit.
If your server has a built-in IPMI unit on the motherboard, there will be firmware for it next to the BIOS firmware on the Supermicro site. Go to the page of your Supermicro motherboard, and on the left you will find the BIOS and IPMI firmware links. The IPMI firmware package has Windows, DOS and Linux executables to flash the firmware from the console.
So here we flash new firmware to our motherboard, an X10SLM+-F.

On the left, under “Links & Resources”, click on “BMC/IPMI Firmware” to download the latest IPMI firmware for your motherboard.

Motherboard X10SLM+-F page on the Supermicro site

Upload the downloaded file to your server.

STEP 1) Unpack the firmware file downloaded from Supermicro site.

Here we include the verbose output of “unzip” so you can see what files are included. The files we use here are 2.07/linux/x64/AlUpdate and REDFISH_X10_372.bin.

[root@srv ~]# ls -altr
total 25904
drwxr-xr-x. 94 root root    81920  3 Feb 17,42 ..
drwxr-xr-x.  2 root root     4096  3 Feb 17,43 .
-rw-r--r--.  1 root root 26432121  3 Feb 17,43 REDFISH_X10_372.zip
[root@srv ~]# mkdir REDFISH_X10_372
[root@srv ~]# cd REDFISH_X10_372/
[root@srv ~/REDFISH_X10_372]# unzip ../REDFISH_X10_372.zip 
Archive:  ../REDFISH_X10_372.zip
  inflating: Redfish_Ref_Guide_2.0.pdf  
   creating: 2.07/
   creating: 2.07/dos/
  inflating: 2.07/dos/AdUpdate.exe   
   creating: 2.07/linux/
   creating: 2.07/linux/x32/
  inflating: 2.07/linux/x32/AlUpdate  
   creating: 2.07/linux/x64/
  inflating: 2.07/linux/x64/AlUpdate  
  inflating: 2.07/ReleaseNote.txt    
   creating: 2.07/windows/
   creating: 2.07/windows/x32/
  inflating: 2.07/windows/x32/AwUpdate.exe  
  inflating: 2.07/windows/x32/phymem32.sys  
  inflating: 2.07/windows/x32/pmdll32.dll  
  inflating: 2.07/windows/x32/superbmc32.sys  
  inflating: 2.07/windows/x32/superdll_ssm32.dll  
   creating: 2.07/windows/x64/
  inflating: 2.07/windows/x64/AwUpdate.exe  
  inflating: 2.07/windows/x64/phymem64.sys  
  inflating: 2.07/windows/x64/pmdll64.dll  
  inflating: 2.07/windows/x64/superbmc.sys  
  inflating: 2.07/windows/x64/superdll_ssm64.dll  
  inflating: IPMI Firmware Update_NEW.doc  
  inflating: REDFISH_X10_372.bin

There are 5 versions of the flash utility: Linux 32-bit and 64-bit, Windows 32-bit and 64-bit, and a DOS version.

STEP 2) Flash the BMC/IPMI firmware.

Here we choose not to preserve the configuration, because some old settings might be incompatible with the new firmware. This is not mandatory; in fact, we also tested preserving the old configuration and had no problems afterwards.
We change almost nothing in the IPMI configuration except the admin password and the network settings, and when flashing from the OS you can reconfigure it after the flashing process: your server is up and running, and you can use “ipmitool” to configure the IPMI module.
The whole process took about 15 minutes.

[root@srv ~/REDFISH_X10_372]# 2.07/linux/x64/AlUpdate -f REDFISH_X10_372.bin -r n
sh: cls: command not found
*****************************************************************************
* ATEN Technology, Inc.                                                     *
*****************************************************************************
* FUNCTION   :  IPMI FIRMWARE UPDATE UTILITY                                *
* VERSION    :  2.07                                                        *
* BUILD DATE :  Jul 13 2016                                                 *
* USAGE      :                                                              *
*             (1)Update FIRMWARE : AlUpdate -f filename.bin [OPTION]        *
*             (2)Dump FIRMWARE   : AlUpdate -d filename                     *
*             (3)Restore CONFIG  : AlUpdate -c -f filename.bin              *
*             (4)Backup CONFIG   : AlUpdate -c -d filename.bin              *
*****************************************************************************
* OPTION                                                                    *
*   -i the IPMI channel, currently, kcs and lan are supported               *
* LAN channel specific arguments                                            *
*   -h remote BMC address and RMCP+ port, (default port is 623)             *
*   -u IPMI user name                                                       *
*   -p IPMI password correlated to IPMI user name                           *
*   -r Preserve Configuration (default is Preserve)                         *
*      n:No Preserve, reset to factory default settings                     *
*      y:Preserve, keep all of the settings                                 *
*   -c IPMI configuration backup/restore                                    *
*      -f [restore.bin] Restore configurations                              *
*      -d [backup.bin] Backup configurations                                *
*****************************************************************************
* EXAMPLE                                                                   *
*   we like to upgrade firmware through KCS channel                         *
*   AlUpdate -f fwuperade.bin -i kcs -r y                                   *
*   AlUpdate -d fwdump.bin -i kcs -r y                                      *
*                                                                           *
*   we like to restore/backup IPMI config through KCS channel               *
*   AlUpdate -c -f restore.bin -i kcs -r y                                  *
*   AlUpdate -c -d backup.bin -i kcs -r y                                   *
*                                                                           *
*   we like to upgrade firmware through LAN channel with                    *
*   - BMC IP address 10.11.12.13 port 623                                   *
*   - IPMI username is usr                                                  *
*   - Password for alice is pwd                                             *
*   - Preserve Configuration                                                *
*   AlUpdate -f fw.bin -i lan -h 10.11.12.13 623 -u usr -p pwd -r y         *
*   AlUpdate -d fwdump.bin -i lan -h 10.11.12.13 623 -u usr -p pwd -r y     *
*                                                                           *
*   we like to restore/backup IPMI config through LAN channel with          *
*   - BMC IP address 10.11.12.13 port 623                                   *
*   - IPMI username is usr                                                  *
*   - Password for alice is pwd                                             *
*   - Preserve Configuration                                                *
*   AlUpdate -c -f fw.bin -i lan -h 10.11.12.13 623 -u usr -p pwd           *
*   AlUpdate -c -d fwdump.bin -i lan -h 10.11.12.13 623 -u usr -p pwd       *
*****************************************************************************

2.07/linux/x64/AlUpdate -f REDFISH_X10_372.bin -r n 
Try open dev ipmi0....
Check if this file is valid................
If the FW update fails,PLEASE TRY AGAIN
Load part 0   126008 bytes, [Ok]                       
Load part 1 14635008 bytes, [Ok]                       
Load part 2  1537585 bytes, [Ok]                       
Load part 3  8081440 bytes, [Ok]                       
Load part 4   262144 bytes, [Ok]                       



                 If the FW update fails. PLEASE WAIT 5 MINS AND REMOVE THE AC...
new firmware is updating...100%
Update Complete,Please wait for BMC reboot, about 1 min                       
[root@srv ~/REDFISH_X10_372]# 

All the lines starting with “Load part” show progress percentages like:

Load part 1 14635008 bytes,  4137K bytes   29%

And the line starting with “new firmware is updating…” also shows progress like:

new firmware is updating...28%

In dmesg you can see your IPMI module resets:

[root@conv1 ~]# dmesg
[1954154.242383] usb 3-7: USB disconnect, device number 2
[1954154.242385] usb 3-7.1: USB disconnect, device number 3
[1954185.337154] usb 3-7: new high-speed USB device number 4 using xhci_hcd
[1954185.501356] usb 3-7: New USB device found, idVendor=0557, idProduct=7000
[1954185.501358] usb 3-7: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[1954185.501879] hub 3-7:1.0: USB hub found
[1954185.501923] hub 3-7:1.0: 4 ports detected
[1954185.899168] usb 3-7.1: new low-speed USB device number 5 using xhci_hcd
[1954185.999375] usb 3-7.1: New USB device found, idVendor=0557, idProduct=2419
[1954185.999376] usb 3-7.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[1954186.000708] input: HID 0557:2419 as /devices/pci0000:00/0000:00:14.0/usb3/3-7/3-7.1/3-7.1:1.0/input/input10
[1954186.051346] hid-generic 0003:0557:2419.0003: input,hidraw0: USB HID v1.00 Keyboard [HID 0557:2419] on usb-0000:00:14.0-7.1/input0
[1954186.052050] input: HID 0557:2419 as /devices/pci0000:00/0000:00:14.0/usb3/3-7/3-7.1/3-7.1:1.1/input/input11
[1954186.052423] hid-generic 0003:0557:2419.0004: input,hidraw1: USB HID v1.00 Mouse [HID 0557:2419] on usb-0000:00:14.0-7.1/input1
[1954199.668503] usb 3-7.1: USB disconnect, device number 5
[1954201.450533] usb 3-7.1: new low-speed USB device number 6 using xhci_hcd
[1954201.550755] usb 3-7.1: New USB device found, idVendor=0557, idProduct=2419
[1954201.550756] usb 3-7.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[1954201.552044] input: HID 0557:2419 as /devices/pci0000:00/0000:00:14.0/usb3/3-7/3-7.1/3-7.1:1.0/input/input12
[1954201.602658] hid-generic 0003:0557:2419.0005: input,hidraw0: USB HID v1.00 Keyboard [HID 0557:2419] on usb-0000:00:14.0-7.1/input0
[1954201.603372] input: HID 0557:2419 as /devices/pci0000:00/0000:00:14.0/usb3/3-7/3-7.1/3-7.1:1.1/input/input13
[1954201.603729] hid-generic 0003:0557:2419.0006: input,hidraw1: USB HID v1.00 Mouse [HID 0557:2419] on usb-0000:00:14.0-7.1/input1
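
If the configuration was reset during the flash (-r n), the network settings have to be entered again, which can be done from the running OS with ipmitool. The sketch below only prints the commands so they can be reviewed before being run as root. The function name, channel 1 and the addresses are illustrative assumptions, not values from our server.

```shell
#!/bin/sh
# Hypothetical helper: print the ipmitool commands that set a static address
# on a LAN channel. Nothing is executed here; review the output first.
ipmi_static_lan_cmds() {
    channel=$1; ip=$2; netmask=$3; gateway=$4
    echo "ipmitool lan set $channel ipsrc static"
    echo "ipmitool lan set $channel ipaddr $ip"
    echo "ipmitool lan set $channel netmask $netmask"
    echo "ipmitool lan set $channel defgw ipaddr $gateway"
    # Show the resulting LAN configuration for a final check
    echo "ipmitool lan print $channel"
}

# Example values only (channel 1 and the addresses are assumptions)
ipmi_static_lan_cmds 1 192.0.2.10 255.255.255.0 192.0.2.1
```

After the BMC has rebooted, "ipmitool mc info" prints the firmware revision, which is a simple way to confirm that the new version is running.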

Update supermicro X10SLH-F firmware BIOS under Linux with the SUM cli

As you can see our product is:

product: X10SLH-F/X10SLM+-F

The same string is in our KVM IPMI: “Product Name: X10SLH-F/X10SLM+-F” and in the BIOS, but if you go to the Supermicro site you will find that

  • X10SLH-F has C226 chipset (supports video in the CPU)
  • X10SLM+-F has C224 chipset

and because we use the video in the CPU, we know our motherboard is X10SLH-F, so we downloaded the BIOS firmware for it. You can also check your chipset with the lshw command.

STEP 1) Download and unpack the SUM (Supermicro Update Manager) and the BIOS zip file

Unpack the SUM (Supermicro Update Manager); you can find detailed information about SUM here – Update supermicro server’s firmware BIOS under linux with the SUM cli

[root@srv1 ~]# tar xzvf sum_2.0.0_Linux_x86_64_20171108.tar.gz 
sum_2.0.0_Linux_x86_64/
sum_2.0.0_Linux_x86_64/ReleaseNote.txt
sum_2.0.0_Linux_x86_64/sum
sum_2.0.0_Linux_x86_64/ExternalData/
sum_2.0.0_Linux_x86_64/ExternalData/VENID.txt
sum_2.0.0_Linux_x86_64/ExternalData/SMCIPID.txt
sum_2.0.0_Linux_x86_64/driver/
sum_2.0.0_Linux_x86_64/driver/RHL4_x86_64/
sum_2.0.0_Linux_x86_64/driver/RHL4_x86_64/sum_bios.ko
sum_2.0.0_Linux_x86_64/driver/RHL6_x86_64/
sum_2.0.0_Linux_x86_64/driver/RHL6_x86_64/sum_bios.ko
sum_2.0.0_Linux_x86_64/driver/RHL5_x86_64/
sum_2.0.0_Linux_x86_64/driver/RHL5_x86_64/sum_bios.ko
sum_2.0.0_Linux_x86_64/driver/RHL7_x86_64/
sum_2.0.0_Linux_x86_64/driver/RHL7_x86_64/sum_bios.ko
sum_2.0.0_Linux_x86_64/SUM_UserGuide.pdf
[root@srv1 ~]# unzip x10slh8_510.zip
Archive:  x10slh8_510.zip
   creating: x10slh8.510/
  inflating: x10slh8.510/AFUDOSU.SMC  
  inflating: x10slh8.510/ami.bat     
  inflating: x10slh8.510/Readme for AMI BIOS.txt  
  inflating: x10slh8.510/x10slh8.510  
[root@srv1 ~]# cd sum_2.0.0_Linux_x86_64
sum_2.0.0_Linux_x86_64/                 sum_2.0.0_Linux_x86_64_20171108.tar.gz  
[root@conv1 ~]# cd sum_2.0.0_Linux_x86_64

STEP 2) Flash the BIOS file with sum cli.

Here you can see what to expect flashing the BIOS firmware.

[root@srv1 sum_2.0.0_Linux_x86_64]# ./sum -c UpdateBios --file ../x10slh8.510/x10slh8.510 
Supermicro Update Manager (for UEFI BIOS) 2.0.0 (2017/11/08) (x86_64)
Copyright©2017 Super Micro Computer, Inc. All rights reserved
Reading BIOS flash ..................... (100%)
Checking BIOS ID ...
Writing BIOS flash ..................... (100%)
Verifying BIOS flash ................... (100%)
Checking ME Firmware ...
Putting ME data to BIOS ................ (100%)
Writing ME region in BIOS flash ...
 - Update success for /FDT!!
 - Updated Recovery Loader to OPRx
 - Updated FPT, MFSB, FTPR and MFS
 - ME Entire Image done
WARNING:Must power cycle or restart the system for the changes to take effect!
[root@srv1 sum_2.0.0_Linux_x86_64]# reboot

During the BIOS flashing your console may seem unresponsive for several minutes, but that is OK: the flash process takes about 10 minutes. Then reboot and wait for several automatic resets of your system. After that, when your system reaches the OS boot, reboot again, reset your BIOS to the optimized defaults, and then tune it as it was before.

In some rare cases you could receive “Critical Error” – “FDT is different.” In that case you should reboot and repeat the procedure; more information here – Update supermicro server’s firmware BIOS under linux with the SUM cli

Bonus

Some commands to find the exact information for the server motherboard.

[root@srv1 ~]# lshw|grep -A 14 "core$"
  *-core
       description: Motherboard
       product: X10SLH-F/X10SLM+-F
       vendor: Supermicro
       physical id: 0
       version: 1.01
       serial: ZM1111111111
       slot: To be filled by O.E.M.
     *-firmware
          description: BIOS
          vendor: American Megatrends Inc.
          physical id: 0
          version: 3.0a
          date: 12/17/2015
          size: 64KiB
[root@srv1 ~]# lspci |grep -i c226
00:1f.0 ISA bridge: Intel Corporation C226 Series Chipset Family Server Advanced SKU LPC Controller (rev 05)
conv2 ~ # lspci -vvv|grep -i c226
00:1f.0 ISA bridge: Intel Corporation C226 Series Chipset Family Server Advanced SKU LPC Controller (rev 05)
        Subsystem: Super Micro Computer Inc C226 Series Chipset Family Server Advanced SKU LPC Controller

Tune nginx proxy cache – control the cache manager how to delete cached files

In most cases you will never want to modify the default settings for deleting cache items set with the proxy_cache_path directive. The problem is that during a peak the file deletion can hurt your server performance, and it can even leave your server unresponsive for a period of time. You cannot give nginx a schedule for deleting cached items or forbid the deletion while the server is busy or loaded. The cache manager simply tracks each zone's used cache capacity against the maximum allowed size (max_size), and when the used capacity is near or above the maximum, the manager process starts deleting with the default values: it deletes up to 100 files in one iteration (taking up to 200 milliseconds), sleeps for 50 milliseconds, and then deletes up to 100 files again. So your file system could receive well over 1000 file deletions per second (up to about 2000 if each batch completes almost instantly)!

This could leave your server in an almost unresponsive state during the peaks.

It could be perfectly OK during off-peaks, but there is no way to tell the nginx cache manager that there is plenty of free space even though the cache limit has been reached, so right now is not the best time to delete from the cache!

You can tune three parameters per cache directory (manual here: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_path):

  • manager_files – no more than this number of files is deleted in one iteration. The default value is 100.
  • manager_threshold – limits the duration of one delete iteration. The default value is 200 milliseconds. You must append an nginx time unit to the number with no space in between, for example “500ms” for 500 milliseconds.
  • manager_sleep – how long the manager sleeps before the next delete iteration. The default value is 50 milliseconds. The same nginx time syntax applies, for example “500ms” for 500 milliseconds.

        proxy_cache_path /mnt/cache levels=1:2 keys_zone=CACHESTATICS:900m inactive=710h max_size=4000g manager_files=2 manager_sleep=200ms manager_threshold=500ms;

With this configuration the cache manager deletes no more than 2 files per iteration, each iteration is limited to 500 milliseconds, and the manager sleeps 200 milliseconds before the next delete iteration.
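
As a rough sanity check, the upper bound of the deletion rate follows from manager_files and manager_sleep alone, assuming each iteration itself takes almost no time (real iterations take longer, so the actual rate is lower). A minimal sketch; the function name max_deletes_per_second is only illustrative:

```shell
# Upper bound of files deleted per second by the cache manager,
# assuming every delete iteration itself takes (almost) no time.
# $1 = manager_files, $2 = manager_sleep in milliseconds
max_deletes_per_second() {
    echo $(( $1 * 1000 / $2 ))
}

max_deletes_per_second 100 50    # nginx defaults
max_deletes_per_second 2 200     # the tuned example above
```

With the defaults this prints 2000, and with the tuned values it prints 10.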

The best option for loaded servers

The best option for loaded servers with a full cache is to balance the free space: delete a small number of files at a time, so the deletion never loads your server, even if the free space decreases during the peaks (more files are cached than the nginx manager can delete, but you are aware of this and have reserved enough free space). During the off-peak, which is normally several times longer than the peak, the nginx manager can catch up with the deleting and free up some space, because fewer files are cached than deleted. Of course, you should tune this according to your situation.
The main idea is to delete files in small batches so your disks are not saturated. It may take longer to recover the free space, but the deletion will not load your server in peaks. You should consider two things:

  1. Free space – keep enough free space to cover the peaks, when the cache could grow above the max_size threshold.
  2. Number of deletions per iteration – you should experiment with this. First you should find out how many files are added over a period of time that includes one peak and one off-peak, and then balance the number in such a way that at the end of the period the cache is not above the maximum size. It is probably best to start with a 24-hour period, which includes at least one peak.

As you can see in the example above, only 2 files per iteration are good enough in our case. Taking into account the 200ms sleep between the iterations, at most 10 files are deleted per second. In our case this is not enough for the peak, but the off-peak, which is 20 hours out of every 24, is long enough to bring the cache back under its maximum size.
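
The per-second figure extends to a whole off-peak window. A minimal sketch under the best-case assumption that the delete iterations themselves take no time; max_deletes_in_window is a hypothetical helper, not an nginx tool:

```shell
# Best-case number of files the cache manager can delete in a time window.
# $1 = manager_files, $2 = manager_sleep in ms, $3 = window length in hours
max_deletes_in_window() {
    echo $(( $1 * 1000 / $2 * 3600 * $3 ))
}

# 2 files per iteration, 200ms sleep, 20 hours of off-peak
max_deletes_in_window 2 200 20
```

This prints 720000. If more files than that are added to the cache over the 24 hours, the cache will keep growing with these settings.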

Here you can learn how to verify that your nginx is deleting cache files and see the impact of the default settings on a busy server in a peak: how to disable effectively the deleting (purging) files from nginx proxy_cache (nginx cache manager process). Our loaded server just stopped serving files and the bandwidth decreased by 99% because the nginx cache manager suddenly started deleting cached files.

how to disable effectively the deleting (purging) files from nginx proxy_cache (nginx cache manager process)

In peaks, deleting files could kill your server, and the traffic could easily drop to a fraction of the normal level if the nginx cache manager starts deleting files!

The server is perfectly normal, but suddenly it gets loaded and all nginx processes are in the D (“Disk sleep”) state.

What could it be? What is going on with your proxy server?

Probably the cache is full!

Unfortunately there is no way to check live how full the cache is. Only an upgrade or restart of the nginx process will trigger the nginx cache loader to check all the cache files and write the cache size to the error log on exit. But be careful: the cache loading is also an IO-intensive operation, because it stats all the cache files, and there could be millions of images.
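
If the cache lives on its own file system (an assumption here, suggested by the /mnt/cache path), the usage of that file system is a cheap approximation of how full the cache is, with no need to stat every file. A minimal sketch using df; fs_used_percent is an illustrative name:

```shell
# Print the used percentage (digits only) of the file system holding a path.
# Only meaningful as a cache fill level when the cache has a dedicated mount.
fs_used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

fs_used_percent /    # replace / with your cache path, e.g. /mnt/cache
```

This says nothing about what nginx itself has accounted for the zone; it only shows how close the disk is to full.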

If you are sure the cache manager is to blame for the IO on your server (probably using this method: Check whether nginx cache manager is deleting files at the moment), you can stop it almost immediately!

Just increase the nginx cache drastically – add zero to the maximum cache size

Of course, you should have enough free space until you resolve the problem, for example with more servers, manual deletion in the off-peak, tuning your cache deletion, or any other solution.
Search for something like

        proxy_cache_path /mnt/cache levels=1:2 keys_zone=CACHESTATICS:900m inactive=710h max_size=400g

And add a zero to the max_size number, like:

        proxy_cache_path /mnt/cache levels=1:2 keys_zone=CACHESTATICS:900m inactive=710h max_size=4000g

The max size will increase from 400G to 4000G (4T)!
This will effectively stop the file deletion, and the nginx cache manager will sleep for a long time before it is invoked to delete files again. This could be a life-saving operation for your server at peak!
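
The edit can also be scripted. A minimal sketch, assuming the proxy_cache_path directive sits on a single line: the hypothetical helper add_zero_to_max_size reads a configuration on standard input and prints it with a zero appended to the first max_size number on each line. Write the result to your real configuration only after reviewing it.

```shell
# Append a zero to the number in the first max_size=... on each line
# (stdin -> stdout); the unit suffix such as "g" is left untouched.
add_zero_to_max_size() {
    sed -E 's/max_size=[0-9]+/&0/'
}

echo 'proxy_cache_path /mnt/cache levels=1:2 keys_zone=CACHESTATICS:900m inactive=710h max_size=400g;' \
    | add_zero_to_max_size

# After changing the real configuration, test and reload it:
#   nginx -t && nginx -s reload
```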

Here is a real graph from one of our servers: the cache manager started deleting files from the cache and the traffic dropped by 99%!

SCREENSHOT 1) The nginx cache manager just started to delete files from the cache and this operation just killed our server completely.

You can see almost zero bandwidth! The problem was resolved when we reloaded nginx with a bigger cache max_size value. The nginx manager immediately went to sleep and there was no more IO for deleting files. The load of the server returned to normal!

nginx cache manager start deleting files

SCREENSHOT 2) Hard drives were saturated and the disk maxed the IO time to 10 ms.

Despite the bigger READ and WRITE IOPS there was 95-99% less traffic.

Disk IO Time when cache manager is working

Then you can tune the values for deleting files from the cache – Tune nginx proxy cache – control the cache manager how to delete cached files.

Check whether nginx cache manager is deleting files at the moment

Here is a tip for webmasters (or system admins) to discover whether nginx, using proxy_cache to cache files, is deleting files at the moment! There are situations where you need to know whether the load of a static media server is caused by the cache manager's deletions or by the read and seek operations of serving the static files. Deletion is a really slow and IO-intensive operation, which can greatly impact the performance and traffic of the server.
Find the nginx “cache manager process” and strace it:

[root@srv ~]# ps axuf|grep nginx
root     31582  0.0  0.0 2906768 25108 ?       Ss   Feb15   0:01 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx    16008  1.9  1.3 2941188 440224 ?      S    16:39   1:33  \_ nginx: worker process
nginx    16009  1.5  1.2 2941188 398836 ?      S    16:39   1:12  \_ nginx: worker process
nginx    16010  0.5  0.7 2941984 239064 ?      S    16:39   0:26  \_ nginx: worker process
nginx    16011  0.7  0.9 2941984 299356 ?      D    16:39   0:35  \_ nginx: worker process
nginx    16012  1.2  1.1 2941188 389540 ?      D    16:39   1:01  \_ nginx: worker process
nginx    16013  2.3  1.5 2941188 487324 ?      D    16:39   1:55  \_ nginx: worker process
nginx    16014  0.0  0.6 2906772 224004 ?      S    16:39   0:01  \_ nginx: cache manager process
[root@srv ~]# strace -f -p 16014
strace: Process 16014 attached
gettid()                                = 16014
write(31, "2019/02/25 18:00:31 [info] 16014"..., 89) = 89
epoll_wait(36, [], 512, 5406)           = 0
unlink("/mnt/cache/0/39/c8ccbbc06d16debb1c8d58ceb6f99390") = 0
unlink("/mnt/cache/0/78/118924d7bf70e20fa8f790c6f9e7c780") = 0
unlink("/mnt/cache/3/ce/fab074cc670e6a80114dcbc398a63ce3") = 0
unlink("/mnt/cache/5/48/0b4e162dd7be8244815721fb7d68e485") = 0
unlink("/mnt/cache/5/56/e5eb4b38c7c8d209d0aabaf79ac02565") = 0
unlink("/mnt/cache/e/c6/207b432fa77375e4eefcaf52db250c6e") = 0
unlink("/mnt/cache/4/6d/ac0db27a03dabc79d869068db1b516d4") = 0
unlink("/mnt/cache/9/e8/91625c6e60de8e5425c4135c7dfb2e89") = 0
unlink("/mnt/cache/b/3c/f3c53000cf0cb20d55d8c09df8a733cb") = 0
unlink("/mnt/cache/f/f7/6f06423cd411b45816969fe020903f7f") = 0
unlink("/mnt/cache/f/50/c9b8ab72821a6e9bcb9c8d4b790dc50f") = 0
unlink("/mnt/cache/6/1f/74b0f1fdf1ac30db6af7793dc15671f6") = 0
unlink("/mnt/cache/0/83/caf199c1b99d438f96caec71bf2ea830") = 0
unlink("/mnt/cache/4/3d/c90f8fbbba4aaf407e386641dc2203d4") = 0
unlink("/mnt/cache/4/ad/d23cf8598020141b2bcec46d2b5cbad4") = 0
unlink("/mnt/cache/d/47/05973bc310503f36c67b7c1c24c8247d") = 0
unlink("/mnt/cache/f/11/e4fcbde8533d89105ab41f22c55e211f") = 0
unlink("/mnt/cache/2/06/29066a58e4116d24266026b4ed1e3062") = 0
epoll_wait(32, [], 512, 50)             = 0
unlink("/mnt/cache/4/6b/9a104ebdf70d00137a88d4584b2bb6b4") = 0
unlink("/mnt/cache/e/95/6d176447f57f21769d86a8f0b2a8b95e") = 0
unlink("/mnt/cache/b/b2/2f6f51163c65ae1fc06a913d6de1ab2b") = 0
unlink("/mnt/cache/a/24/2b058045a23b69de7a4442c9e6fce24a") = 0
unlink("/mnt/cache/7/60/00833e0b236ca8472f5be8227d645607") = 0
unlink("/mnt/cache/a/08/bf00eea300eff97dc4fffa61daaca08a") = 0
unlink("/mnt/cache/2/48/a291d8aca2b6f4f9471686eabe9b2482") = 0
unlink("/mnt/cache/0/e3/2d631adbc3bfdf8e44a51fa5453eee30") = 0
unlink("/mnt/cache/1/3b/08eef7c86c5ece9b5279b304dd86e3b1") = 0
unlink("/mnt/cache/b/a4/03213e4a8a1e8fb17ae698e54e70fa4b") = 0
unlink("/mnt/cache/b/a3/77f1b11811a9cda0ae93c498769f7a3b") = 0
unlink("/mnt/cache/4/01/1d50fac60681ae3263c8875775d20014") = 0
unlink("/mnt/cache/c/94/e71b96cbc65b248bd8e4540cbd69294c") = 0
unlink("/mnt/cache/1/59/99ec58e865b97e217835dd84f5f48591") = 0
unlink("/mnt/cache/4/b8/6a64825ce555b8f2440f051a7f7bcb84") = 0
unlink("/mnt/cache/7/51/fe2acbb895427ed8e406ce7e79d61517") = 0
.....
.....
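
To skip reading the whole ps listing and the noise of other syscalls, the two steps can be scripted. A minimal sketch: both helper names are illustrative, and the strace filter in the usage comment assumes your platform deletes files through the unlink or unlinkat syscall.

```shell
# Print the PID of the nginx cache manager from `ps axuf` output on stdin.
# The [ ] in the pattern keeps this awk process from matching itself.
cache_manager_pid() {
    awk '/nginx: cache manager[ ]process/ { print $2 }'
}

# Count the unlink()/unlinkat() calls in strace output on stdin.
count_unlinks() {
    grep -c -E '^(\[pid +[0-9]+\] )?unlink(at)?\('
}

# Usage on a live server (as root), sampling for 10 seconds:
#   pid=$(ps axuf | cache_manager_pid)
#   timeout 10 strace -f -e trace=unlink,unlinkat -p "$pid" 2>&1 | count_unlinks
```

A count that stays at zero over several samples suggests the cache manager is idle and the load comes from somewhere else.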

You can tune the file removal from the cache with the manager_files, manager_threshold and manager_sleep parameters of proxy_cache_path.
If you came here searching for information on the topic, you should probably check out these articles, too: how to disable effectively the deleting (purging) files from nginx proxy_cache (nginx cache manager process) and Tune nginx proxy cache – control the cache manager how to delete cached files

Centos 7 Server hangs up on boot after deleting a software raid (mdadm device)

We have a CentOS 7 server with a simple setup of two hard drives in RAID1, with a total of 4 devices for boot, root, swap and storage. The storage device (/dev/md5) was removed and recreated as RAID0 for better performance, because the server was repurposed as a cache-only server. Then the server was restarted and it never came up.
On the IPMI KVM it just started loading the kernel and hung after several seconds without any additional information:

The kernel loads the mdadm devices and does not continue, and the device md5 is missing.

CentOS 7 kernel loading the mdadm RAID devices

To boot successfully you must remove the missing device

On the Grub 2 menu press “e” and you'll get this screen. Here you can edit all lines if you need to. You must remove the last rd.md.uuid in our case, or whichever one belongs to the device you deleted. Remove it and press Ctrl+x to load the kernel.

Grub 2 edit

There are two options you can do:

  • OPTION 1) Remove rd.md.uuid option of your old mdadm device
  • OPTION 2) Replace the ID in rd.md.uuid= with the new ID of the mdadm device.

Either of these two options solves the booting problem. Edit /etc/default/grub, replace or remove the rd.md.uuid and generate the grub.cfg.
You can find the old mdadm ID in /etc/mdadm.conf (if you have not replaced it there).

[root@srv ~]# cat /etc/mdadm.conf 
ARRAY /dev/md2 level=raid1 num-devices=2 metadata=0.90 UUID=9c08f218:cd5c0f8f:d96bc0d1:57b77e99
ARRAY /dev/md3 level=raid1 num-devices=2 metadata=1.2 name=2035110:swap UUID=1f74a2e0:757bfb9f:9c860e50:325f37cb
ARRAY /dev/md4 level=raid1 num-devices=2 metadata=1.2 name=2035110:root UUID=29bf4aa8:b7dae21a:45f4c188:baea4c13
ARRAY /dev/md5 level=raid1 num-devices=2 metadata=1.2 name=2035110:storage1 UUID=e6eb2590:b767be36:c76bb869:45ff0c3c
[root@srv ~]# mdadm --detail --scan
ARRAY /dev/md2 metadata=0.90 UUID=9c08f218:cd5c0f8f:d96bc0d1:57b77e99
ARRAY /dev/md3 metadata=1.2 name=2035110:swap UUID=1f74a2e0:757bfb9f:9c860e50:325f37cb
ARRAY /dev/md4 metadata=1.2 name=2035110:root UUID=29bf4aa8:b7dae21a:45f4c188:baea4c13
ARRAY /dev/md/5 metadata=1.2 name=s2035110:5 UUID=901074eb:16ba7c5b:0af69934:e9444102
[root@srv ~]# mdadm --detail --scan > /etc/mdadm.conf 
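
To see which rd.md.uuid entries no longer match an existing array, the kernel parameters can be compared with the output of mdadm --detail --scan. A minimal sketch; stale_md_uuids is a hypothetical helper that takes two files, so the scan output has to be saved to a file first (the /tmp/md-scan.txt path is only an example):

```shell
# Print every rd.md.uuid value found in file $1 (e.g. /etc/default/grub)
# that does not appear as UUID=... in file $2 (saved `mdadm --detail --scan`).
stale_md_uuids() {
    grep -o 'rd\.md\.uuid=[0-9a-f:]*' "$1" | cut -d= -f2 |
    while read -r uuid; do
        grep -qF "UUID=$uuid" "$2" || echo "$uuid"
    done
}

# Usage (as root):
#   mdadm --detail --scan > /tmp/md-scan.txt
#   stale_md_uuids /etc/default/grub /tmp/md-scan.txt
```

Every UUID it prints is a candidate for removal or replacement in the kernel parameters.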

Here is our old /etc/default/grub:

[root@srv ~]# cat /etc/default/grub 
GRUB_TIMEOUT=1
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --speed=115200"
GRUB_CMDLINE_LINUX="rd.md.uuid=9c08f218:cd5c0f8f:d96bc0d1:57b77e99 rd.md.uuid=1f74a2e0:757bfb9f:9c860e50:325f37cb rd.md.uuid=29bf4aa8:b7dae21a:45f4c188:baea4c13 rd.md.uuid=e6eb2590:b767be36:c76bb869:45ff0c3c console=tty0 crashkernel=auto console=ttyS0,115200 net.ifnames=1"
GRUB_DISABLE_RECOVERY="true"

Here we edit our /etc/default/grub, replace the old uuid and generate grub.cfg (legacy BIOS):

[root@srv ~]# cat /etc/default/grub 
GRUB_TIMEOUT=1
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --speed=115200"
GRUB_CMDLINE_LINUX="rd.md.uuid=9c08f218:cd5c0f8f:d96bc0d1:57b77e99 rd.md.uuid=1f74a2e0:757bfb9f:9c860e50:325f37cb rd.md.uuid=29bf4aa8:b7dae21a:45f4c188:baea4c13 rd.md.uuid=901074eb:16ba7c5b:0af69934:e9444102 console=tty0 crashkernel=auto console=ttyS0,115200 net.ifnames=1"
[root@srv ~]# grub2-mkconfig -o /boot/grub2/grub.cfg 
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-957.5.1.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-957.5.1.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-05cb8c7b39fe0f70e3ce97e5beab809d
Found initrd image: /boot/initramfs-0-rescue-05cb8c7b39fe0f70e3ce97e5beab809d.img
done
[root@srv ~]# reboot
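
The replacement itself is a one-line substitution. A minimal sketch; replace_md_uuid is an illustrative helper that works on standard input, so nothing is changed until you redirect the result yourself:

```shell
# Replace one rd.md.uuid value with another (stdin -> stdout).
# $1 = old UUID, $2 = new UUID
replace_md_uuid() {
    sed "s/rd\.md\.uuid=$1/rd.md.uuid=$2/"
}

echo 'rd.md.uuid=e6eb2590:b767be36:c76bb869:45ff0c3c console=tty0' \
    | replace_md_uuid e6eb2590:b767be36:c76bb869:45ff0c3c 901074eb:16ba7c5b:0af69934:e9444102

# To apply it for real, keep a backup and regenerate grub.cfg afterwards:
#   cp /etc/default/grub /etc/default/grub.bak
#   replace_md_uuid OLD NEW < /etc/default/grub.bak > /etc/default/grub
```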

Use this for UEFI boot:
First check whether /boot and /boot/efi are mounted, and if not, mount them with:

mount /boot
mount /boot/efi

Generate the grub.cfg

grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
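
Which of the two grub.cfg paths applies can be detected at run time, because Linux exposes /sys/firmware/efi only when the system was booted through UEFI. A minimal sketch; grub_cfg_path is an illustrative name and its optional argument exists only to make the check testable:

```shell
# Print the grub.cfg location for the current boot mode on CentOS 7.
# $1 (optional) = directory whose existence signals a UEFI boot
grub_cfg_path() {
    if [ -d "${1:-/sys/firmware/efi}" ]; then
        echo /boot/efi/EFI/centos/grub.cfg
    else
        echo /boot/grub2/grub.cfg
    fi
}

# Usage (as root):
#   grub2-mkconfig -o "$(grub_cfg_path)"
```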

Bonus

In fact, when the original device was removed and a new one was added, we formatted it as usual. But it was not possible to mount it. You just execute

mount /dev/md5 /mnt/stor1

and there is no error, but no mount can be found either: the device was not mounted, and when you execute

umount /mnt/stor1

The OS said that “/mnt/stor1” was not mounted. Several more unsuccessful attempts were made to mount “/dev/md5”, then the restart was performed and the server never came up.
We suppose systemd just did not allow the device to be mounted because of the rd.md.uuid boot parameters!
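
Since mount reported no error, it is worth verifying the result explicitly instead of trusting the exit code. A minimal sketch that looks the mount point up in /proc/mounts; is_mounted is an illustrative helper and its optional second argument exists only for testing:

```shell
# Succeed if $1 is listed as a mount point in /proc/mounts (or in file $2).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Usage:
#   mount /dev/md5 /mnt/stor1
#   is_mounted /mnt/stor1 || echo "/mnt/stor1 is NOT mounted"
```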