Upgrade MySQL 5.6 to 5.7 what problems to expect with old my.cnf configuration file

Finally, we do not have any more MySQL 5.6 servers. We upgraded our last part of the system with MySQL 5.6 to 5.7. In our opinion, this upgrade is one of the major referred to MySQL configuration file my.cnf – multiple deprecated directives are removed in this new 5.7 version so when upgrading you should remove them before restarting or starting the new version if you want to have running MySQL server instance.
Keep in mind our my.cnf are old, they are created with MySQL 5.0 and they are edited in every upgrade to a new version (5.0 to 5.1, 5.1 to 5.5, and 5.5 to 5.6), and when we needed a specific optimization for our workload. And this is only for our configuration, there surely are more deprecated/removed variables in the new version. Here is a good starting point – https://dev.mysql.com/doc/refman/5.7/en/upgrading-from-previous-series.html This article is not how to upgrade your old MySQL 5.6 to the new MySQL 5.7 it shows what problems you might have after you upgrade MySQL 5.6 to the new MySQL 5.7.
There are two parts of this article:

  1. Removed variables, which were perfectly OK in the old version 5.6
  2. Changed default value of variables, which impact greatly the IO or the the SQL execution

The error messages are included, too.

PART 1) Removed variables.

Some MySQL variables first get deprecated and then removed in later versions (some are just renamed) and if they are contained in the my.cnf configuration file your server will not start up at all. The MySQL log shows that the server starts and then throws an error about “unknown variable” and starts a shutdown procedure. So you end up without a database server and it is important to remove them from the configuration or find the new name of a renamed one.

2019-02-26T09:50:12.612950Z 0 [ERROR] unknown variable 'key_buffer=512M'
2019-02-26T09:50:36.361870Z 0 [ERROR] unknown variable 'thread_concurrency=6'
2019-02-26T09:51:17.658546Z 0 [ERROR] unknown variable 'thread_cache=10'
2019-02-26T09:51:32.473210Z 0 [ERROR] unknown variable 'innodb_additional_mem_pool_size=256M'

All four

key_buffer, thread_concurrency, thread_cache, innodb_additional_mem_pool_size

MySQL variables were removed and your server won’t start up if they are contained in the configuration. The “key_buffer” has been renamed to “key_buffer_size so replace it with key_buffer_size in your my.cnf. It’s important to replace it, because commenting it out would activate the default value and in this case 8M key_buffer_size, which is pretty low (in fact almost all default values of the MySQL variables are really low and it is a problem and a topic of discussions in many forums).
The “thread_cache” also renamed long ago to “thread_cache_size“, so replace it with thread_cache_size.
thread_concurrency and innodb_additional_mem_pool_size were removed long ago they first stopped doing anything and with this version, they removed the variables. As you can see old configuration files could carry on many old names over the years.

The important thing here is you must rename the ones, which got renamed and remove the ones, which got removed, because your server is not going to start up with them in the configuration.

PART 2) Changed default value

Some default values of MySQL variables got changed and if you have not included it in the my.cnf configuration you might be really surprised how big an impact they have on the IO or even on the behavior of the SQL statements.

2.1) Our first MySQL variables is

sync_binlog

– the default value was “0” (deactivated synchronization) and now it is “1” (bin log synchronization). This could greatly impact the performance of your MySQL database server with like 8-10 times more writes and IO disk wait time (really!!!) – you can see it here: (coming soon). So if you haven’t used this variable before you should put it in your my.cnf configuration for sure (in [mysqld] section):

sync_binlog=0

do not need to restart the server, just put in the my.cnf configuration file and open a mysql root console and execute:

SET GLOBAL sync_binlog=0;

it can be live changed.

2.2) And the second example is

sql_mode

– the default value was “NO_ENGINE_SUBSTITUTION” and now it is “ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION”, which is a pretty substantial difference. You can lose INSERTs and UPDATEs easily because a much strict mode is activated by default.
For example, with an INSERT if you do not set the value to a field, which column does not have a default value (yes, it is wrong, but it was OK before), your insert won’t be executed and you’ll get an error (or just a FALSE after execution of your query like with PHP PDO). Here is the MySQL explanation:

A value is missing when a new row to be inserted does not contain a value for a non-NULL column that has no explicit DEFAULT clause in its definition.

And more in https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sql-mode-strict
So if you haven’t used this variable before you should put it in your my.cnf configuration for sure (in [mysqld] section):

sql_mode=NO_ENGINE_SUBSTITUTION

do not need to restart the server, just put in the my.cnf configuration file and open a MySQL root console and execute:

SET GLOBAL sql_mode='NO_ENGINE_SUBSTITUTION'

Update Supermicro BMC/IPMI Firmware – under Linux console

Here you will see our log of upgrading the Supermicro IPMI firmware with the cli tool included in the firmware package for your IPMI unit under Linux console.
If your server has built-in IPMI unit in the motherboard there will be a firmware for it next to the BIOS firmware in the Supermicro site. You go to the page of your Supermicro page and on the left part you have also the BIOS and IPMI firmware links. The IPMI firmware package has a Windows/DOS and Linux executable files to flash the firmware under the console.
So here we flash a new firmware to our motherboard is X10SLM+-F.

Here you can see left “Links & Resources” and click on ” BMC/IPMI Firmware” to download the latest IPMI firmware for your motherboard.

main menu
Motherboard X10SLM+-F page in Supermicro site

Upload the downloaded file in your server.

STEP 1) Unpack the firmware file downloaded from Supermicro site.

Here we include the verbose output of “tar” so you can see what files are included. The files we use here are highlighted.

[root@srv ~]# ls -altr
total 25904
drwxr-xr-x. 94 root root    81920  3 Feb 17,42 ..
drwxr-xr-x.  2 root root     4096  3 Feb 17,43 .
-rw-r--r--.  1 root root 26432121  3 Feb 17,43 REDFISH_X10_372.zip
[root@srv ~]# mkdir REDFISH_X10_372
[root@srv ~]# cd REDFISH_X10_372/
[root@srv ~/REDFISH_X10_372]# unzip ../REDFISH_X10_372.zip 
Archive:  ../REDFISH_X10_372.zip
  inflating: Redfish_Ref_Guide_2.0.pdf  
   creating: 2.07/
   creating: 2.07/dos/
  inflating: 2.07/dos/AdUpdate.exe   
   creating: 2.07/linux/
   creating: 2.07/linux/x32/
  inflating: 2.07/linux/x32/AlUpdate  
   creating: 2.07/linux/x64/
  inflating: 2.07/linux/x64/AlUpdate  
  inflating: 2.07/ReleaseNote.txt    
   creating: 2.07/windows/
   creating: 2.07/windows/x32/
  inflating: 2.07/windows/x32/AwUpdate.exe  
  inflating: 2.07/windows/x32/phymem32.sys  
  inflating: 2.07/windows/x32/pmdll32.dll  
  inflating: 2.07/windows/x32/superbmc32.sys  
  inflating: 2.07/windows/x32/superdll_ssm32.dll  
   creating: 2.07/windows/x64/
  inflating: 2.07/windows/x64/AwUpdate.exe  
  inflating: 2.07/windows/x64/phymem64.sys  
  inflating: 2.07/windows/x64/pmdll64.dll  
  inflating: 2.07/windows/x64/superbmc.sys  
  inflating: 2.07/windows/x64/superdll_ssm64.dll  
  inflating: IPMI Firmware Update_NEW.doc  
  inflating: REDFISH_X10_372.bin

There are 5 version of the flash utility Linux 32bit and 64bit, Windows 32bit and 64bit and a dos version.

STEP 2) Flash the BCM/IPMI firmware.

We choose here not to preserve configuration, because some old features might be incompatible with the new one. It is not mandatory to do it in fact we also tested with “to preserve” the old configuration and we have no problems afterwards.
We do not change almost anything in the IPMI configuration except admin password and the network settings and when flashing under the OS you have the ability to reconfigure it after the flashing process. Your server is up and running and you can use “ipmitool” to configure the IPMI module.
The whole process took about 15 minutes.

[root@srv ~/REDFISH_X10_372]# 2.07/linux/x64/AlUpdate -f REDFISH_X10_372.bin -r n
sh: cls: command not found
*****************************************************************************
* ATEN Technology, Inc.                                                     *
*****************************************************************************
* FUNCTION   :  IPMI FIRMWARE UPDATE UTILITY                                *
* VERSION    :  2.07                                                        *
* BUILD DATE :  Jul 13 2016                                                 *
* USAGE      :                                                              *
*             (1)Update FIRMWARE : AlUpdate -f filename.bin [OPTION]        *
*             (2)Dump FIRMWARE   : AlUpdate -d filename                     *
*             (3)Restore CONFIG  : AlUpdate -c -f filename.bin              *
*             (4)Backup CONFIG   : AlUpdate -c -d filename.bin              *
*****************************************************************************
* OPTION                                                                    *
*   -i the IPMI channel, currently, kcs and lan are supported               *
* LAN channel specific arguments                                            *
*   -h remote BMC address and RMCP+ port, (default port is 623)             *
*   -u IPMI user name                                                       *
*   -p IPMI password correlated to IPMI user name                           *
*   -r Preserve Configuration (default is Preserve)                         *
*      n:No Preserve, reset to factory default settings                     *
*      y:Preserve, keep all of the settings                                 *
*   -c IPMI configuration backup/restore                                    *
*      -f [restore.bin] Restore configurations                              *
*      -d [backup.bin] Backup configurations                                *
*****************************************************************************
* EXAMPLE                                                                   *
*   we like to upgrade firmware through KCS channel                         *
*   AlUpdate -f fwuperade.bin -i kcs -r y                                   *
*   AlUpdate -d fwdump.bin -i kcs -r y                                      *
*                                                                           *
*   we like to restore/backup IPMI config through KCS channel               *
*   AlUpdate -c -f restore.bin -i kcs -r y                                  *
*   AlUpdate -c -d backup.bin -i kcs -r y                                   *
*                                                                           *
*   we like to upgrade firmware through LAN channel with                    *
*   - BMC IP address 10.11.12.13 port 623                                   *
*   - IPMI username is usr                                                  *
*   - Password for alice is pwd                                             *
*   - Preserve Configuration                                                *
*   AlUpdate -f fw.bin -i lan -h 10.11.12.13 623 -u usr -p pwd -r y         *
*   AlUpdate -d fwdump.bin -i lan -h 10.11.12.13 623 -u usr -p pwd -r y     *
*                                                                           *
*   we like to restore/backup IPMI config through LAN channel with          *
*   - BMC IP address 10.11.12.13 port 623                                   *
*   - IPMI username is usr                                                  *
*   - Password for alice is pwd                                             *
*   - Preserve Configuration                                                *
*   AlUpdate -c -f fw.bin -i lan -h 10.11.12.13 623 -u usr -p pwd           *
*   AlUpdate -c -d fwdump.bin -i lan -h 10.11.12.13 623 -u usr -p pwd       *
*****************************************************************************

2.07/linux/x64/AlUpdate -f REDFISH_X10_372.bin -r n 
Try open dev ipmi0....
Check if this file is valid................
If the FW update fails,PLEASE TRY AGAIN
Load part 0   126008 bytes, [Ok]                       
Load part 1 14635008 bytes, [Ok]                       
Load part 2  1537585 bytes, [Ok]                       
Load part 3  8081440 bytes, [Ok]                       
Load part 4   262144 bytes, [Ok]                       



                 If the FW update fails. PLEASE WAIT 5 MINS AND REMOVE THE AC...
new firmware is updating...100%
Update Complete,Please wait for BMC reboot, about 1 min                       
[root@srv ~/REDFISH_X10_372]# 

All the lines starting with “Load part” will shows progress percentages like:

Load part 1 14635008 bytes,  4137K bytes   29%"

And the line starting with “new firmware is updating…” also shows like:

new firmware is updating...28%

In dmesg you can see your IPMI module resets:

[root@conv1 ~]# dmesg
[1954154.242383] usb 3-7: USB disconnect, device number 2
[1954154.242385] usb 3-7.1: USB disconnect, device number 3
[1954185.337154] usb 3-7: new high-speed USB device number 4 using xhci_hcd
[1954185.501356] usb 3-7: New USB device found, idVendor=0557, idProduct=7000
[1954185.501358] usb 3-7: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[1954185.501879] hub 3-7:1.0: USB hub found
[1954185.501923] hub 3-7:1.0: 4 ports detected
[1954185.899168] usb 3-7.1: new low-speed USB device number 5 using xhci_hcd
[1954185.999375] usb 3-7.1: New USB device found, idVendor=0557, idProduct=2419
[1954185.999376] usb 3-7.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[1954186.000708] input: HID 0557:2419 as /devices/pci0000:00/0000:00:14.0/usb3/3-7/3-7.1/3-7.1:1.0/input/input10
[1954186.051346] hid-generic 0003:0557:2419.0003: input,hidraw0: USB HID v1.00 Keyboard [HID 0557:2419] on usb-0000:00:14.0-7.1/input0
[1954186.052050] input: HID 0557:2419 as /devices/pci0000:00/0000:00:14.0/usb3/3-7/3-7.1/3-7.1:1.1/input/input11
[1954186.052423] hid-generic 0003:0557:2419.0004: input,hidraw1: USB HID v1.00 Mouse [HID 0557:2419] on usb-0000:00:14.0-7.1/input1
[1954199.668503] usb 3-7.1: USB disconnect, device number 5
[1954201.450533] usb 3-7.1: new low-speed USB device number 6 using xhci_hcd
[1954201.550755] usb 3-7.1: New USB device found, idVendor=0557, idProduct=2419
[1954201.550756] usb 3-7.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[1954201.552044] input: HID 0557:2419 as /devices/pci0000:00/0000:00:14.0/usb3/3-7/3-7.1/3-7.1:1.0/input/input12
[1954201.602658] hid-generic 0003:0557:2419.0005: input,hidraw0: USB HID v1.00 Keyboard [HID 0557:2419] on usb-0000:00:14.0-7.1/input0
[1954201.603372] input: HID 0557:2419 as /devices/pci0000:00/0000:00:14.0/usb3/3-7/3-7.1/3-7.1:1.1/input/input13
[1954201.603729] hid-generic 0003:0557:2419.0006: input,hidraw1: USB HID v1.00 Mouse [HID 0557:2419] on usb-0000:00:14.0-7.1/input1

Update supermicro X10SLH-F firmware BIOS under Linux with the SUM cli

As you can see our product is:

product: X10SLH-F/X10SLM+-F

The same string is in our KVM IPMI: “Product Name: X10SLH-F/X10SLM+-F” and in the BIOS, but if you go the supermicro site you will find that

  • X10SLH-F has C226 chipset (supports video in the CPU)
  • X10SLM+-F has C224 chipset

and because we use the video in the CPU we know our motherboard is X10SLH-F and we downloaded the BIOS firmware for it. You also could check your chipset with lshw command.

STEP 1) Download and unpack the SUM (Supermicro Update Manager) and the BIOS zip file

Unpack the SUM (Supermicro Update Manager), here you can find a detail information about SUM – Update supermicro server’s firmware BIOS under linux with the SUM cli

[root@srv1 ~]# tar xzvf sum_2.0.0_Linux_x86_64_20171108.tar.gz 
sum_2.0.0_Linux_x86_64/
sum_2.0.0_Linux_x86_64/ReleaseNote.txt
sum_2.0.0_Linux_x86_64/sum
sum_2.0.0_Linux_x86_64/ExternalData/
sum_2.0.0_Linux_x86_64/ExternalData/VENID.txt
sum_2.0.0_Linux_x86_64/ExternalData/SMCIPID.txt
sum_2.0.0_Linux_x86_64/driver/
sum_2.0.0_Linux_x86_64/driver/RHL4_x86_64/
sum_2.0.0_Linux_x86_64/driver/RHL4_x86_64/sum_bios.ko
sum_2.0.0_Linux_x86_64/driver/RHL6_x86_64/
sum_2.0.0_Linux_x86_64/driver/RHL6_x86_64/sum_bios.ko
sum_2.0.0_Linux_x86_64/driver/RHL5_x86_64/
sum_2.0.0_Linux_x86_64/driver/RHL5_x86_64/sum_bios.ko
sum_2.0.0_Linux_x86_64/driver/RHL7_x86_64/
sum_2.0.0_Linux_x86_64/driver/RHL7_x86_64/sum_bios.ko
sum_2.0.0_Linux_x86_64/SUM_UserGuide.pdf
[root@srv1 ~]# unzip x10slh8_510.zip
Archive:  x10slh8_510.zip
   creating: x10slh8.510/
  inflating: x10slh8.510/AFUDOSU.SMC  
  inflating: x10slh8.510/ami.bat     
  inflating: x10slh8.510/Readme for AMI BIOS.txt  
  inflating: x10slh8.510/x10slh8.510  
[root@srv1 ~]# cd sum_2.0.0_Linux_x86_64
sum_2.0.0_Linux_x86_64/                 sum_2.0.0_Linux_x86_64_20171108.tar.gz  
[root@conv1 ~]# cd sum_2.0.0_Linux_x86_64

STEP 2) Flash the BIOS file with sum cli.

Here you can see what to expect flashing the BIOS firmware.

[root@srv1 sum_2.0.0_Linux_x86_64]# ./sum -c UpdateBios --file ../x10slh8.510/x10slh8.510 
Supermicro Update Manager (for UEFI BIOS) 2.0.0 (2017/11/08) (x86_64)
Copyright©2017 Super Micro Computer, Inc. All rights reserved
Reading BIOS flash ..................... (100%)
Checking BIOS ID ...
Writing BIOS flash ..................... (100%)
Verifying BIOS flash ................... (100%)
Checking ME Firmware ...
Putting ME data to BIOS ................ (100%)
Writing ME region in BIOS flash ...
 - Update success for /FDT!!
 - Updated Recovery Loader to OPRx
 - Updated FPT, MFSB, FTPR and MFS
 - ME Entire Image done
WARNING:Must power cycle or restart the system for the changes to take effect!
[root@srv1 sum_2.0.0_Linux_x86_64]# reboot

During the BIOS flashing your console could have seemed unresponsive for several minutes, but it is OK, the flash process is about 10 minutes. Then reboot and wait for several automatic resets of your system and after that when your system reaches the OS boot you should reboot again and reset your BIOS to the optimized defaults and then you can tune it as it was before.

In some rear cases you could receive “Critical Error” – “FDT is different.” you should reboot and repeat the procedure, more information here – Update supermicro server’s firmware BIOS under linux with the SUM cli

Bonus

Some commands to find the exact information for the server motherboard.

[root@srv1 ~]# lshw|grep -A 14 "core$"
  *-core
       description: Motherboard
       product: X10SLH-F/X10SLM+-F
       vendor: Supermicro
       physical id: 0
       version: 1.01
       serial: ZM1111111111
       slot: To be filled by O.E.M.
     *-firmware
          description: BIOS
          vendor: American Megatrends Inc.
          physical id: 0
          version: 3.0a
          date: 12/17/2015
          size: 64KiB
[root@srv1 ~]# lspci |grep -i c226
00:1f.0 ISA bridge: Intel Corporation C226 Series Chipset Family Server Advanced SKU LPC Controller (rev 05)
conv2 ~ # lspci -vvv|grep -i c226
00:1f.0 ISA bridge: Intel Corporation C226 Series Chipset Family Server Advanced SKU LPC Controller (rev 05)
        Subsystem: Super Micro Computer Inc C226 Series Chipset Family Server Advanced SKU LPC Controller

Tune nginx proxy cache – control the cache manager how to delete cached files

In most cases you’ll never want to modify the default settings for deleting cache items with proxy_cache_path directives. The problem is in a peak the file deleting could impact your server performance and even it could kill your server leaving it unresponsive for a period of time. You cannot instruct nginx with a schedule job for deletion cached items or ban the deletion when the server is busy or loaded. The manager just traces each zone for used cache capacity versus the maximum allowed size and if the used capacity is near or bigger than the maximum allowed size (max_size) the manager process triggers deletion with the default values – the nginx manager will try to delete at least 100 files (up to 200 milliseconds) and then it will sleep for 50 milliseconds then again it will try deleting 100 files. So your file system could receive at least 1000 files per second to delete!

This could lead your server to almost unresponsive state in the peaks.

And it could be perfectly OK in off-peaks, but there is no way how to tell nginx cache manager there is a plenty free space despite you reach the cache limit so at the moment it is not the best time to delete the cache!

You can tune three parameters per cache directory (manual here: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_path):

  • manager_files – not more than this number of files to delete in one iteration. The default value is 100.
  • manager_threshold – limit the delete iteration time. The default value is 200 milliseconds and you must use nginx time syntax concatenated to the number you want, for example if you want 500 milliseconds you must use “500 ms”.
  • manager_sleep – how much time to sleep the manager before executing another delete iteration. The default value is 50 milliseconds and here you must use nginx time syntax concatenated to the number you want, for example if you want 500 milliseconds you must use “500 ms”.
        proxy_cache_path /mnt/cache levels=1:2 keys_zone=CACHESTATICS:900m inactive=710h max_size=4000g manager_files=2 manager_sleep=200ms manager_threshold=500ms;

The cache manager will delete not more than 2 files for up to 500 milliseconds and it will sleep 200 milliseconds before another delete iteration.

The best option for loaded servers

The best option for loaded servers with full cache is to balance the free space – delete small amount of files at once to be sure your server will not get loaded even the free space decreases at the peaks (so more files are cached than the nginx manager could delete – you are aware of this and the free space should be enough), but during the off peak (which normally is several times longer than the peak) the nginx manager could catch up with the deleting and it should free up some space (cached files are lesser than the deleted ones). Of course, you should tune this according to your situation.
The main idea is to delete in small amounts of files to not saturate your disks it could take longer to recover the free space, but it will not load your server in peaks. You should consider two things:

  1. Free space – enough free space and to be sure the free space is enough for the peaks, when the cache could grow above the threshold.
  2. Number of deletions per iteration – you should experiment with this. Fist you should be away how many files are added for a period of time, which includes one peak and one off-peak and then to balance the number in such a way that after the period the cache is not above the maximum size. Probably the best is to start with a 24 hours period, which includes at least one peak.

As you can see the example above only 2 files are good enough for an iteration for our case. Taking into account the 200ms sleep between the files’ deletions 10 files at most should be deleted per second. In our case it is not enough for the peak, but for the off-peak, which is 20 hours every 24 hours, is good enough to get into the maximum size limit of the cache.

Here you can learn how to verify your nginx is deleting cache files and the impact of the default settings on a busy server in a peak: how to disable effectively the deleting (purging) files from nginx proxy_cache (nginx cache manager process) Our loaded server just stopped serving files and the bandwidth decreased with 99% because nginx cache manager suddenly started deleting cached files.

how to disable effectively the deleting (purging) files from nginx proxy_cache (nginx cache manager process)

In peaks deleting files could kill your server and easily the traffic could degraded multiple times than normal if the nginx cache manager start deleting files!

The server is perfectly normal but suddenly it just get loaded and all nginx processes are in D (“Disk sleep”) state.

What could it be? What is going on with your proxy server?

Probably the cache is full!

Unfortunately there is no way to check how much is filled the cache live – just an upgrade or restart of the nginx process will trigger nginx cache loader to check all the cache files and will write the cache size on exit in the error log – but be careful the cache loading is also IO intensive operation – stats all the cache files and they could be millions images).

If you are sure the cache manager is to blame for the IO of your server (probably using this method – Check whether nginx cache manager is deleting files at the moment), you can stop it almost immediately!

Just increase the nginx cache drastically – add zero to the maximum cache size

Of course, you should have enough free space till you resolve the problem – for example more servers or manual deletion on peak-off or tune your cache deletion or any other solution….
Search for something like

        proxy_cache_path /mnt/cache levels=1:2 keys_zone=CACHESTATICS:900m inactive=710h max_size=400g

And add zero to the max_size number like:

        proxy_cache_path /mnt/cache levels=1:2 keys_zone=CACHESTATICS:900m inactive=710h max_size=4000g

The max size will increase from 400G to 4000G (4T)!
This will effectively stop the files deleting and the nginx cache manager will have slept for long time before invoking again to delete files. This could be life saving operation for your server at peak!

Here is a real graph from one of our servers – the cache manager started deleting files from the cache and the traffic dropped 99%!!!

SCREENSHOT 1) The nginx cache manager just started to delete files from the cache and this operation just killed our server completely.

You can see almost zero bandwidth! The problem was resolved when we reloaded nginx with a bigger cache max_size value. The nginx manager immediately went to sleep and no IO for deleting files. The load of the server returned to normal!

main menu
nginx cache manager start deleting files

SCREENSHOT 2) Hard drives were saturated and the disk maxed the IO time to 10 ms.

Despite the bigger READ and WRITE IOPS there was 95-99% less traffic.

main menu
Disk IO Time when cache manager is working

Then you can tune the values for deleting files from the cache – Tune nginx proxy cache – control the cache manager how to delete cached files.

Check whether nginx cache manager is deleting files at the moment

Here is a tip for the webmasters (or system admins) to discover whether the nginx using proxy_cache to cache files is deleting files at the moment! There situation where you may need to know if the loaded of a static media server is caused by the deletion of the cache manager or by the read or seek operations when serving the static files. The deletion is really slow and IO intensive operation, which could greatly impact the performance and traffic of the server.
Find the process nginx’s “cache manager process” and strace it:

[root@srv ~]# ps axuf|grep nginx
root     31582  0.0  0.0 2906768 25108 ?       Ss   Feb15   0:01 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx    16008  1.9  1.3 2941188 440224 ?      S    16:39   1:33  \_ nginx: worker process
nginx    16009  1.5  1.2 2941188 398836 ?      S    16:39   1:12  \_ nginx: worker process
nginx    16010  0.5  0.7 2941984 239064 ?      S    16:39   0:26  \_ nginx: worker process
nginx    16011  0.7  0.9 2941984 299356 ?      D    16:39   0:35  \_ nginx: worker process
nginx    16012  1.2  1.1 2941188 389540 ?      D    16:39   1:01  \_ nginx: worker process
nginx    16013  2.3  1.5 2941188 487324 ?      D    16:39   1:55  \_ nginx: worker process
nginx    16014  0.0  0.6 2906772 224004 ?      S    16:39   0:01  \_ nginx: cache manager process
[root@srv ~]# strace -f -p 16014
strace: Process 16014 attached
gettid()                                = 16014
write(31, "2019/02/25 18:00:31 [info] 16014"..., 89) = 89
epoll_wait(36, [], 512, 5406)           = 0
unlink("/mnt/cache/0/39/c8ccbbc06d16debb1c8d58ceb6f99390") = 0
unlink("/mnt/cache/0/78/118924d7bf70e20fa8f790c6f9e7c780") = 0
unlink("/mnt/cache/3/ce/fab074cc670e6a80114dcbc398a63ce3") = 0
unlink("/mnt/cache/5/48/0b4e162dd7be8244815721fb7d68e485") = 0
unlink("/mnt/cache/5/56/e5eb4b38c7c8d209d0aabaf79ac02565") = 0
unlink("/mnt/cache/e/c6/207b432fa77375e4eefcaf52db250c6e") = 0
unlink("/mnt/cache/4/6d/ac0db27a03dabc79d869068db1b516d4") = 0
unlink("/mnt/cache/9/e8/91625c6e60de8e5425c4135c7dfb2e89") = 0
unlink("/mnt/cache/b/3c/f3c53000cf0cb20d55d8c09df8a733cb") = 0
unlink("/mnt/cache/f/f7/6f06423cd411b45816969fe020903f7f") = 0
unlink("/mnt/cache/f/50/c9b8ab72821a6e9bcb9c8d4b790dc50f") = 0
unlink("/mnt/cache/6/1f/74b0f1fdf1ac30db6af7793dc15671f6") = 0
unlink("/mnt/cache/0/83/caf199c1b99d438f96caec71bf2ea830") = 0
unlink("/mnt/cache/4/3d/c90f8fbbba4aaf407e386641dc2203d4") = 0
unlink("/mnt/cache/4/ad/d23cf8598020141b2bcec46d2b5cbad4") = 0
unlink("/mnt/cache/d/47/05973bc310503f36c67b7c1c24c8247d") = 0
unlink("/mnt/cache/f/11/e4fcbde8533d89105ab41f22c55e211f") = 0
unlink("/mnt/cache/2/06/29066a58e4116d24266026b4ed1e3062") = 0
epoll_wait(32, [], 512, 50)             = 0
unlink("/mnt/cache/4/6b/9a104ebdf70d00137a88d4584b2bb6b4") = 0
unlink("/mnt/cache/e/95/6d176447f57f21769d86a8f0b2a8b95e") = 0
unlink("/mnt/cache/b/b2/2f6f51163c65ae1fc06a913d6de1ab2b") = 0
unlink("/mnt/cache/a/24/2b058045a23b69de7a4442c9e6fce24a") = 0
unlink("/mnt/cache/7/60/00833e0b236ca8472f5be8227d645607") = 0
unlink("/mnt/cache/a/08/bf00eea300eff97dc4fffa61daaca08a") = 0
unlink("/mnt/cache/2/48/a291d8aca2b6f4f9471686eabe9b2482") = 0
unlink("/mnt/cache/0/e3/2d631adbc3bfdf8e44a51fa5453eee30") = 0
unlink("/mnt/cache/1/3b/08eef7c86c5ece9b5279b304dd86e3b1") = 0
unlink("/mnt/cache/b/a4/03213e4a8a1e8fb17ae698e54e70fa4b") = 0
unlink("/mnt/cache/b/a3/77f1b11811a9cda0ae93c498769f7a3b") = 0
unlink("/mnt/cache/4/01/1d50fac60681ae3263c8875775d20014") = 0
unlink("/mnt/cache/c/94/e71b96cbc65b248bd8e4540cbd69294c") = 0
unlink("/mnt/cache/1/59/99ec58e865b97e217835dd84f5f48591") = 0
unlink("/mnt/cache/4/b8/6a64825ce555b8f2440f051a7f7bcb84") = 0
unlink("/mnt/cache/7/51/fe2acbb895427ed8e406ce7e79d61517") = 0
.....
.....

You can tune the file removing from the cache with manager_files, manager_threshold and manager_sleep arguments of the proxy_cache_path.
If you came here searching information on the topic probably you should check out these articles, too: how to disable effectively the deleting (purging) files from nginx proxy_cache (nginx cache manager process) and Tune nginx proxy cache – control the cache manager how to delete cached files

Centos 7 Server hangs up on boot after deleting a software raid (mdadm device)

We have a CentOS 7 server with a simple two hard drives setup in RAID1 of total 4 devices for boot, root, swap and storage. The storage device (/dev/md5) was removed and recreated with RAID0 for better performance, because the server was promoted as only cache server. Then the server was restarted and it never went up.
On IPMI KVM it just started loading the kernel and hanged up after several seconds without any additional information:

The kernel loads the mdadm devices and do not continue and the device md5 is missing.

main menu
CentOS 7 kernel loading the mdadm RAID devices

To boot successfully you must remove the missing device

On the Grub 2 menu press “e” and you’ll get this screen. Here you can edit all lines if you need. You must remove the last rd.md.uuid in our case or the one you deleted. Remove it and press Ctrl+x to load the kernel.

main menu
Grub 2 edit

There are two options you can do:

  • OPTION 1) Remove rd.md.uuid option of your old mdadm device
  • OPTION 2) Replace the ID in rd.md.uuid= with the new ID of the mdadm device.

Each of these two options could be used to solve the booting problem. Edit /etc/default/grub and replace or remove rd.md.uuid and generate the grub.conf.
You can find old mdadm ID in /etc/mdadm.conf (if you have not replace it there).

[root@srv ~]# cat /etc/mdadm.conf 
ARRAY /dev/md2 level=raid1 num-devices=2 metadata=0.90 UUID=9c08f218:cd5c0f8f:d96bc0d1:57b77e99
ARRAY /dev/md3 level=raid1 num-devices=2 metadata=1.2 name=2035110:swap UUID=1f74a2e0:757bfb9f:9c860e50:325f37cb
ARRAY /dev/md4 level=raid1 num-devices=2 metadata=1.2 name=2035110:root UUID=29bf4aa8:b7dae21a:45f4c188:baea4c13
ARRAY /dev/md5 level=raid1 num-devices=2 metadata=1.2 name=2035110:storage1 UUID=e6eb2590:b767be36:c76bb869:45ff0c3c
[root@srv ~]# mdadm --detail --scan
ARRAY /dev/md2 metadata=0.90 UUID=9c08f218:cd5c0f8f:d96bc0d1:57b77e99
ARRAY /dev/md3 metadata=1.2 name=2035110:swap UUID=1f74a2e0:757bfb9f:9c860e50:325f37cb
ARRAY /dev/md4 metadata=1.2 name=2035110:root UUID=29bf4aa8:b7dae21a:45f4c188:baea4c13
ARRAY /dev/md/5 metadata=1.2 name=s2035110:5 UUID=901074eb:16ba7c5b:0af69934:e9444102
[root@srv ~]# mdadm --detail --scan > /etc/mdadm.conf 

Here is our old /etc/default/grub:

[root@srv ~]# cat /etc/default/grub 
GRUB_TIMEOUT=1
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --speed=115200"
GRUB_CMDLINE_LINUX="rd.md.uuid=9c08f218:cd5c0f8f:d96bc0d1:57b77e99 rd.md.uuid=1f74a2e0:757bfb9f:9c860e50:325f37cb rd.md.uuid=29bf4aa8:b7dae21a:45f4c188:baea4c13 rd.md.uuid=e6eb2590:b767be36:c76bb869:45ff0c3c console=tty0 crashkernel=auto console=ttyS0,115200 net.ifnames=1"
GRUB_DISABLE_RECOVERY="true"

Here we edit our /boot/grub2/grub.cfg, replace the old uuid and generate grub.cfg (legacy BIOS):

[root@srv ~]# cat /etc/default/grub 
GRUB_TIMEOUT=1
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --speed=115200"
GRUB_CMDLINE_LINUX="rd.md.uuid=9c08f218:cd5c0f8f:d96bc0d1:57b77e99 rd.md.uuid=1f74a2e0:757bfb9f:9c860e50:325f37cb rd.md.uuid=29bf4aa8:b7dae21a:45f4c188:baea4c13 rd.md.uuid=901074eb:16ba7c5b:0af69934:e9444102 console=tty0 crashkernel=auto console=ttyS0,115200 net.ifnames=1"
[root@srv ~]# grub2-mkconfig -o /boot/grub2/grub.cfg 
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-957.5.1.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-957.5.1.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-05cb8c7b39fe0f70e3ce97e5beab809d
Found initrd image: /boot/initramfs-0-rescue-05cb8c7b39fe0f70e3ce97e5beab809d.img
done
[root@srv ~]# reboot

Use this for UEFI BIOS boot:
First check if /boot and /boot/efi are mounted and if not you must mount them with:

mount /boot
mount /boot/efi

Generate the grub.cfg

grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg

Bonus

In fact when the original device was removed and added a new one we formatted it as usual. But it was not possible to mount it, you just execute mount

/dev/md5 /mnt/stor1

no error, but no mount could be found, the device was not mounted and when you execute

umount /mnt/stor1

The OS told the “/mnt/stor1” was not mounted. Several more tries were made unsuccessfully to mount the “/dev/md5”, then the restart was performed and the server never went up.
Suppose the systemd just did not allow to mount the device because of the boot parameters rd.md.uuid!

mysql – Error ‘Your password does not satisfy the current policy requirements’ or zero length mysql password

We got this error when granting permissions for one of our new slave server (it could be for an ordinary MySQL server, too):

Error 'Your password does not satisfy the current policy requirements' on query. Default database: ''. Query: 'GRANT REPLICATION SLAVE ON *.* TO 'reusr'@'127.0.01''

It appeared that MySQL has activated by default a password checking plugin and our password in the GRANT (or SET PASSWORD) option didn’t meet the requirements.
So here is what you can do:

OPTION 1) Lower the password policy level

Check the policy level and lower it if it is MEDIUM or HIGH (they are there options LOW=0, MEDIUM=1 the default and HIGH=2). The policy level controls how to check and what is involved in the complexity algorithm for the passwords. More details here – https://dev.mysql.com/doc/refman/5.7/en/validate-password-options-variables.html#sysvar_validate_password_policy. Here is what you have:

[myuser@mysql1 ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 13
....
....
mysql> SHOW VARIABLES LIKE 'validate%';
+--------------------------------------+--------+
| Variable_name                        | Value  |
+--------------------------------------+--------+
| validate_password_check_user_name    | OFF    |
| validate_password_dictionary_file    |        |
| validate_password_length             | 8      |
| validate_password_mixed_case_count   | 1      |
| validate_password_number_count       | 1      |
| validate_password_policy             | MEDIUM |
| validate_password_special_char_count | 1      |
+--------------------------------------+--------+
7 rows in set (0.00 sec)

So set the validate_password_policy=0 and try again your query:

mysql> set global validate_password_policy=0;
Query OK, 0 rows affected (0.00 sec)

If you still get the error your password is lower than the validate_password_length (=8 by default) so you need to change it at last to 8 characters. But what if you what zero password (or with 1,2,3 characters)? Setting validate_password_length to 0 won’t work, because there is a hard limit to 4, so you cannot set it to 0 event the set query is not reporting error when using 0 with validate_password_length.

You should uninstall the plugin.

OPTION 2) Uninstall the MySQL Validation Plugin

You can uninstall the validation plugin on-the-fly in a working server without restarting or reloading and then you can set whatever password you like.
Here is how to do it:

[myuser@mysql1 ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 13
....
....
mysql> UNINSTALL PLUGIN validate_password;
Query OK, 0 rows affected (0.03 sec)

mysql> SHOW VARIABLES LIKE 'validate%';
Empty set (0.01 sec)

As you can see no “validate_password” variables are available anymore! Now set your password.
But there is a catch, if you have started the server with “–validate-password=FORCE_PLUS_PERMANENT” (you can check it with “ps axuf|grep mysqld” in the command line) you won’t be able to uninstall the plugin live even with the root MySQL user. So at the end if you do not have root permissions to restart the MySQL service without this option it might be better to change your password or skip the query if it is received by the slave in the MySQL replication bin log.
You can install the plugin again with:

[myuser@mysql1 ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 13
....
....
mysql> INSTALL PLUGIN validate_password SONAME 'validate_password.so';
Query OK, 0 rows affected (0.00 sec)

And it will be available over restarts, too, because it is registered in “mysql.plugin” table.

mysql – Error ‘Column count of mysql.user is wrong. Expected 45, found 43. The table is probably corrupted’ on query.

If you

upgraded your MySQL server (from 5.6 to 5.7 or above)

or

imported a MySQL dump SQL file from older version

than your current server you may encounter when granting permissions to a user:

Error 'Column count of mysql.user is wrong. Expected 45, found 43. The table is probably corrupted' on query. Default database: ''. Query: 'GRANT REPLICATION SLAVE ON *.* TO 'replusr'@'144.76.156.182''

Do not panic probably it is not corrupted just continue reading.

There is the simple fix, just

execute mysql_upgrade

It will automatically detect what to upgrade and it will upgrade it:

[myuser@mysql1 ~]# screen -R upgrade
[myuser@mysql1 ~]# mysql_upgrade 
Checking if update is needed.
Checking server version.
Running queries to upgrade MySQL server.
Checking system database.
mysql.columns_priv                                 OK
mysql.db                                           OK
mysql.engine_cost                                  OK
mysql.event                                        OK
mysql.func                                         OK
mysql.general_log                                  OK
mysql.gtid_executed                                OK
mysql.help_category                                OK
mysql.help_keyword                                 OK
mysql.help_relation                                OK
mysql.help_topic                                   OK
mysql.host                                         OK
mysql.innodb_index_stats                           OK
mysql.innodb_table_stats                           OK
mysql.ndb_binlog_index                             OK
mysql.plugin                                       OK
mysql.proc                                         OK
mysql.procs_priv                                   OK
mysql.proxies_priv                                 OK
mysql.server_cost                                  OK
mysql.servers                                      OK
mysql.slave_master_info                            OK
mysql.slave_relay_log_info                         OK
mysql.slave_worker_info                            OK
mysql.slow_log                                     OK
mysql.tables_priv                                  OK
mysql.time_zone                                    OK
mysql.time_zone_leap_second                        OK
mysql.time_zone_name                               OK
mysql.time_zone_transition                         OK
mysql.time_zone_transition_type                    OK
mysql.user                                         OK
The sys schema is already up to date (version 1.5.1).
Found 0 sys functions, but expected 22. Re-installing the sys schema.
Upgrading the sys schema.
Checking databases.
phpmyadmin.pma__bookmark                           OK
phpmyadmin.pma__central_columns                    OK
phpmyadmin.pma__column_info                        OK
phpmyadmin.pma__designer_settings                  OK
phpmyadmin.pma__export_templates                   OK
phpmyadmin.pma__favorite                           OK
phpmyadmin.pma__history                            OK
phpmyadmin.pma__navigationhiding                   OK
phpmyadmin.pma__pdf_pages                          OK
phpmyadmin.pma__recent                             OK
phpmyadmin.pma__relation                           OK
phpmyadmin.pma__savedsearches                      OK
phpmyadmin.pma__table_coords                       OK
phpmyadmin.pma__table_info                         OK
phpmyadmin.pma__table_uiprefs                      OK
phpmyadmin.pma__tracking                           OK
phpmyadmin.pma__userconfig                         OK
phpmyadmin.pma__usergroups                         OK
phpmyadmin.pma__users                              OK
sys.sys_config                                     OK
db1.access                                         OK
db1.users                                          OK
db1.objects                                        OK
db1.isp                                            OK
db1.desc                                           OK
Upgrade process completed successfully.
Checking if update is needed.

It works when the server is up and running and it is a good idea to execute the command in a screen.
It does not need to be logged as root, but mysql_upgrade does need to have the root MySQL password. In the example above it did not asked for password, because we have it in ~/.my.cnf file.

Just to note you might upgraded a long before this error to appear!

If you do not use a certain functionality you could live up happily with the old mysql.user scheme (and all old mysql.* tables). In our case we upgraded one of our slaves and several days after when a grant command on the master was issued the replication just stopped with this error! Of course, if someone were used the command in our slave the error would have appeared there sooner.
We also had case where old MySQL SQL dump file (5.6) was imported in a newer MySQL server 5.7 and there had been no issues for weeks till the GRANT command.

perror

Th error code is 1805.

[myuser@mysql1 ~]# perror 1805
MySQL error code 1805 (ER_COL_COUNT_DOESNT_MATCH_CORRUPTED_V2): Column count of %s.%s is wrong. Expected %d, found %d. The table is probably corrupted

Supermicro server cannot enter BIOS with F2, DEL or other when UEFI mode OS is installed

If you happen to have a supermicro server (X10SLH-F) and install Linux in UEFI mode in our case CentOS 7 and you want to enter the BIOS you’ll be surprised that you cannot with the keys provided in the very same BIOS boot screen – F2, DEL. The F11 and F12 also does not work for menu selection and network boot!

Even if you manage to press the DEL key and you see on the screen “Entering BIOS setup…” – the server WON’T enter BIOS, but will continue with the UEFI BIOS boot drive!

So what to do? Ammm break temporary your system by removing (renaming or moving) the EFI directory in your efi boot partition, resetting your server and holding pressed DEL key (again) on all start up screens of the server. When the UEFI BIOS boot entry is not valid any more and there are no other boot devices (and probably because we pressed DEL key) we were able to enter in the BIOS without remote hands on the collocation side or any other intervention on the server.

[root@srv ~]# mv /boot/efi/EFI/ /boot/efi/EFI_org
[root@srv ~]# reboot

This is the path in CentOS 7 and our standard partition layout:

[root@srv ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3         26G  4.5G    20G  19% /
devtmpfs         7.8G     0   7.8G   0% /dev
tmpfs            7.8G     0   7.8G   0% /dev/shm
tmpfs            7.8G  8.5M   7.8G   1% /run
tmpfs            7.8G     0   7.8G   0% /sys/fs/cgroup
/dev/sda2        976M   98M   812M  11% /boot
/dev/sda1        200M  9.8M   191M   5% /boot/efi
tmpfs            1.6G     0   1.6G   0% /run/user/0

DO NOT forget to remove all other (virtual) CD/DVD ROM Devices and temporary disable your network PXE Server (if you have any in the network)

Because it when the UEFI BIOS cannot find the EFI file saved in the UEFI BIOS BOOT drive it might follow the boot order before entering the BIOS!

Enter the bios by remote console on our X9 boards with UEFI bios

Apparently there is an issue with X8 and X9 supermicro boards in UEFI mode BIOS: https://www.supermicro.com/support/faqs/faq.cfm?faq=14029
So for someone it could be useful pressing and holding “ESC” + “-” or F4 to enter the UEFI BIOS, but we could not make it because of the IPMI KVM we used to manage the server.