make the Gluster daemon resolve the proper hostnames of your peers

This is a useful tip for GlusterFS nodes. When you add a peer to a Gluster cluster you may use its hostname (or IP). The Gluster daemon on the added server, however, only sees the IP that contacted it and tries to resolve a hostname from that IP with a reverse (PTR) lookup. If the cluster already has multiple peers, several such lookups happen.
Here is a simple example. The cluster will have two peers (srv1.example.com and srv2.example.com).
Add the peer srv2.example.com to your cluster from srv1.example.com (at this point the cluster consists only of the local Gluster daemon):

[root@srv1 ~]# gluster peer probe srv2.example.com
peer probe: success.
[root@srv1 ~]# gluster peer status
Number of Peers: 1

Hostname: srv2.example.com
Uuid: 8322b61c-a94d-491b-afc9-9f10eb8e8b92
State: Peer in Cluster (Connected)

Now check the status of the cluster on the second server, srv2.example.com. The second server uses the PTR domain of the first server:

[root@srv2 ~]# gluster peer status
Number of Peers: 1

Hostname: static.123.123.123.123.clients.your-server.de
Uuid: 3d273834-eca6-4997-871f-1a282ca90fb0
State: Peer in Cluster (Connected)

You see the hostname is the generic, provider-assigned name static.123.123.123.123.clients.your-server.de, the PTR record of the IP of srv1.example.com. You may have problems in the future if you leave it like that, and it is a really uninformative name to have in your cluster’s configuration. Changing a peer hostname in an existing cluster is really difficult and dangerous, so the option is to change the PTR of the servers’ IPs. If you cannot do it, or it is too slow to do it, you can just use the “/etc/hosts” file!

Use “/etc/hosts” to make the Gluster daemon resolve the proper hostnames of your peers!

Edit “/etc/hosts” on the second (peer) server, and preferably on the first one, too. Add the line below and do not remove the other lines if they exist. Replace the IP and hostname with those of your first server.

123.123.123.123 srv1.example.com

Then add the peer to the cluster from the first server (gluster peer probe, as above) and check again on the second server:

[root@srv2 ~]# gluster peer status
Number of Peers: 1

Hostname: srv1.example.com
Uuid: 3d273834-eca6-4997-871f-1a282ca90fb0
State: Peer in Cluster (Connected)

And on the first server:

[root@srv1 ~]# gluster peer status
Number of Peers: 1

Hostname: srv2.example.com
Uuid: 8322b61c-a94d-491b-afc9-9f10eb8e8b92
State: Peer in Cluster (Connected)

Now the two servers have the right hostnames for their peers, and these hostnames will be used in the Gluster configuration saved on the servers.

In fact, it is a good idea to add all your cluster peers to “/etc/hosts” on all servers:

123.123.123.123 srv1.example.com
124.124.124.124 srv2.example.com
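
Before probing a peer it may be worth verifying that the entries are really there. The following is a minimal sketch, not part of Gluster: the function name hosts_has_entry is hypothetical, and it works on a temporary copy of the two example lines instead of the real “/etc/hosts”.

```shell
# Return success if the hosts-format file $1 maps IP $2 to hostname $3.
hosts_has_entry() {
    awk -v ip="$2" -v name="$3" '
        /^[[:space:]]*#/ { next }      # skip comment lines
        $1 == ip {
            for (i = 2; i <= NF; i++) if ($i == name) found = 1
        }
        END { exit !found }
    ' "$1"
}

# Example with a temporary file holding the two entries from above.
tmp_hosts=$(mktemp)
printf '%s\n' '123.123.123.123 srv1.example.com' \
              '124.124.124.124 srv2.example.com' > "$tmp_hosts"

if hosts_has_entry "$tmp_hosts" 123.123.123.123 srv1.example.com; then
    echo "srv1.example.com entry present"
fi
rm -f "$tmp_hosts"
```

On a real server you would pass /etc/hosts as the first argument and run the check on every peer.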

List all your files (and directories) with file size over FTP without ls -R (recursive)

A great piece of software is

lftp – sophisticated file transfer program

This little console tool can ease your life significantly with its many enhancements over a plain FTP client. This tip is for those who want to list all the files in a directory or in the entire FTP account, but whose FTP server does not offer an ls command with recursive abilities. Normally the only option is to go through all the directories manually and fetch the listing of each one, but this can be done automatically by

lftp using its built-in command “find”. If you add the “-l” argument, the output is like “ls -al”: file or directory, file permissions, user and group, file size, date and file name are shown on a single line for each file.

Just execute the command with proper credentials and the starting directory of your choice. The command output could even be piped to another command.
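
A minimal sketch of such a call follows. The host, user, password and start directory are placeholders (assumptions, replace them with your own), and ftp_list_all is just a hypothetical wrapper name. It relies only on the lftp options “-u” (credentials), “-e” (commands to execute) and the “find -l” command described above.

```shell
# Hypothetical host, credentials and start directory; replace with your own.
FTP_HOST="ftp.example.com"
FTP_USER="myuser"
FTP_PASS="mypass"
START_DIR="/"

# Recursively list START_DIR in "ls -al"-like format, then quit lftp.
ftp_list_all() {
    lftp -u "$FTP_USER,$FTP_PASS" -e "find -l $START_DIR; bye" "$FTP_HOST"
}

# Example usage (needs lftp installed and a reachable FTP server):
#   ftp_list_all > listing.txt
#   ftp_list_all | sort -k5 -n | tail    # e.g. pipe it to another command
```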

nginx with php fpm (fastcgi) and the warning – an upstream response is buffered to a temporary file /var/cache/nginx/fastcgi_temp

As the web grows and technology advances, the page size of web sites also grows, or sometimes you just want to output a big chunk of data from your application server: PHP-FPM in our example (but it could be any other backend written in Ruby, Python (Django), C and more).
Here is a quick configuration tip (note this is not the proxy-related warning!):

The default nginx buffers per FastCGI connection are too small

Here is what to do in your nginx configuration file:
First, look for the line “include /etc/nginx/fastcgi_params;” or similar and, after this line, add the following directives (or edit them if they already exist):

        fastcgi_buffer_size 16k;
        fastcgi_buffers 32 16k;

Read more about the buffers here: http://nginx.org/en/docs/http/ngx_http_fastcgi_module.html#fastcgi_buffers
The warning should stop. If it does not, you can try raising the values. Bigger buffers consume more memory, but they can lower the IO usage of your disks and improve the performance of your site or whatever backend you run!

Here is the warning from our nginx error logs. We got it when using php-fpm and the PHP output size was 325965 bytes (~320K).

2019/04/04 09:56:05 [warn] 24451#24451: *44269838 an upstream response is buffered to a temporary file /var/cache/nginx/fastcgi_temp/0/12/0019966120 while reading upstream, client: 10.10.10.10, server: srv17.srv.en, request: "GET /api/20140102/product HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "srv17.srv.en"
2019/04/04 09:56:07 [warn] 24451#24451: *44269849 an upstream response is buffered to a temporary file /var/cache/nginx/fastcgi_temp/2/12/0019966122 while reading upstream, client: 10.10.10.11, server: srv17.srv.en, request: "GET /api/20140102/product HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "srv17.srv.en"
2019/04/04 09:56:09 [warn] 24450#24450: *44269856 an upstream response is buffered to a temporary file /var/cache/nginx/fastcgi_temp/7/12/0019966127 while reading upstream, client: 10.10.10.12, server: srv17.srv.en, request: "GET /api/20140102/product HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "srv17.srv.en"

Unpack centos 7 initramfs file with and without dracut skipcpio

In CentOS 7 the initramfs consists of two concatenated cpio archives: a small uncompressed one (with the CPU microcode) followed by the gzipped main one. If you want to check which files, and probably configuration files, are included you can unpack it, but you should use

the dracut tool skipcpio

/usr/lib/dracut/skipcpio <initramfs-file> | zcat | cpio -id --no-absolute-filenames

The following is the output on a CentOS 7 server:

[root@srv ~]# mkdir initramfs-unpacked
[root@srv ~]# cd initramfs-unpacked/
[root@srv initramfs-unpacked]# /usr/lib/dracut/skipcpio /boot/initramfs-3.10.0-957.10.1.el7.x86_64.img | zcat | cpio -id --no-absolute-filenames
164026 blocks
[root@srv initramfs-unpacked]# ls -al
total 52
drwxr-xr-x. 12 root root 4096  1 Apr 11,48 .
dr-xr-x---.  5 root root 4096  1 Apr 11,48 ..
lrwxrwxrwx.  1 root root    7  1 Apr 11,48 bin -> usr/bin
drwxr-xr-x.  2 root root 4096  1 Apr 11,48 dev
drwxr-xr-x.  9 root root 4096  1 Apr 11,48 etc
lrwxrwxrwx.  1 root root   23  1 Apr 11,48 init -> usr/lib/systemd/systemd
lrwxrwxrwx.  1 root root    7  1 Apr 11,48 lib -> usr/lib
lrwxrwxrwx.  1 root root    9  1 Apr 11,48 lib64 -> usr/lib64
drwxr-xr-x.  2 root root 4096  1 Apr 11,48 proc
drwxr-xr-x.  2 root root 4096  1 Apr 11,48 root
drwxr-xr-x.  2 root root 4096  1 Apr 11,48 run
lrwxrwxrwx.  1 root root    8  1 Apr 11,48 sbin -> usr/sbin
-rwxr-xr-x.  1 root root 3117  1 Apr 11,48 shutdown
drwxr-xr-x.  2 root root 4096  1 Apr 11,48 sys
drwxr-xr-x.  2 root root 4096  1 Apr 11,48 sysroot
drwxr-xr-x.  2 root root 4096  1 Apr 11,48 tmp
drwxr-xr-x.  7 root root 4096  1 Apr 11,48 usr
drwxr-xr-x.  3 root root 4096  1 Apr 11,48 var
[root@srv initramfs-unpacked]# ls -al /boot/
total 114812
dr-xr-xr-x.  6 root root     4096 30 Mar  2,36 .
dr-xr-xr-x. 19 root root     4096 30 Mar  2,37 ..
-rw-r--r--.  1 root root   151923 18 Mar 15,10 config-3.10.0-957.10.1.el7.x86_64
drwxr-xr-x.  3 root root     4096 28 Jan 20,52 efi
drwxr-xr-x.  2 root root     4096 30 Mar  2,29 grub
drwx------.  5 root root     4096 29 Mar 13,50 grub2
-rw-------.  1 root root 44256471 28 Jan 20,57 initramfs-0-rescue-05cb8c7b39fe0f70e3ce97e5beab809d.img
-rw-------.  1 root root 44821343 29 Mar 13,50 initramfs-3.10.0-957.10.1.el7.x86_64.img
-rw-------.  1 root root 10982937 30 Mar  2,36 initramfs-3.10.0-957.10.1.el7.x86_64kdump.img
drwx------.  2 root root    16384 29 Mar 13,46 lost+found
-rw-r--r--.  1 root root   314087 18 Mar 15,10 symvers-3.10.0-957.10.1.el7.x86_64.gz
-rw-------.  1 root root  3544363 18 Mar 15,10 System.map-3.10.0-957.10.1.el7.x86_64
-rwxr-xr-x.  1 root root  6639808 28 Jan 20,57 vmlinuz-0-rescue-05cb8c7b39fe0f70e3ce97e5beab809d
-rwxr-xr-x.  1 root root  6643904 18 Mar 15,10 vmlinuz-3.10.0-957.10.1.el7.x86_64
-rw-r--r--.  1 root root      171 18 Mar 15,10 .vmlinuz-3.10.0-957.10.1.el7.x86_64.hmac

You can see the init is handled by systemd!

Not using dracut skipcpio

early_cpio – dracut puts this file at the beginning of the CentOS 7 initramfs, in a small cpio archive that contains the CPU microcode.
You can check the initramfs with the “file” command. If it shows “ASCII cpio archive (SVR4 with no CRC)”, there is microcode prepended to the initramfs file.

And here is how to do it without the dracut skipcpio tool:

  1. Unpack the original initramfs with cpio and write down the number of blocks reported.
  2. Use dd to skip that number of 512-byte blocks from the beginning of the file.
  3. Uncompress (and unpack) the file created by dd. This is the real initramfs file.

Here is how you can do it:

[root@srv ~]# file /boot/initramfs-3.10.0-957.10.1.el7.x86_64.img
/boot/initramfs-3.10.0-957.10.1.el7.x86_64.img: ASCII cpio archive (SVR4 with no CRC)
[root@srv ~]# mkdir initramfs-unpacked-3
[root@srv ~]# cd initramfs-unpacked-3
[root@srv initramfs-unpacked-3]# cat /boot/initramfs-3.10.0-957.10.1.el7.x86_64.img | cpio -idmv
.
early_cpio
kernel
kernel/x86
kernel/x86/microcode
kernel/x86/microcode/AuthenticAMD.bin
kernel/x86/microcode/GenuineIntel.bin
3412 blocks
[root@srv initramfs-unpacked-3]# dd if=/boot/initramfs-3.10.0-957.10.1.el7.x86_64.img of=initramfs-tmp.img bs=512 skip=3412
84129+1 records in
84129+1 records out
43074399 bytes (43 MB) copied, 0.191311 s, 225 MB/s
[root@srv initramfs-unpacked-3]# ls
early_cpio  initramfs-tmp.img  kernel
[root@srv initramfs-unpacked-3]# file initramfs-tmp.img 
initramfs-tmp.img: gzip compressed data, from Unix, last modified: Fri Mar 29 13:49:41 2019, max compression
[root@srv initramfs-unpacked-3]# zcat ./initramfs-tmp.img | cpio -idm
164026 blocks
[root@srv initramfs-unpacked-3]# ls -al
total 42128
drwxr-xr-x. 13 root root     4096 Apr  1 12:38 .
dr-xr-x---. 10 root root     4096 Apr  1 12:38 ..
lrwxrwxrwx.  1 root root        7 Apr  1 12:38 bin -> usr/bin
drwxr-xr-x.  2 root root     4096 Apr  1 12:38 dev
-rw-r--r--.  1 root root        2 Mar 29 13:49 early_cpio
drwxr-xr-x.  9 root root     4096 Apr  1 12:38 etc
lrwxrwxrwx.  1 root root       23 Apr  1 12:38 init -> usr/lib/systemd/systemd
-rw-r--r--.  1 root root 43074399 Apr  1 12:35 initramfs-tmp.img
drwxr-xr-x.  3 root root     4096 Mar 29 13:49 kernel
lrwxrwxrwx.  1 root root        7 Apr  1 12:38 lib -> usr/lib
lrwxrwxrwx.  1 root root        9 Apr  1 12:38 lib64 -> usr/lib64
drwxr-xr-x.  2 root root     4096 Mar 29 13:49 proc
drwxr-xr-x.  2 root root     4096 Mar 29 13:49 root
drwxr-xr-x.  2 root root     4096 Mar 29 13:49 run
lrwxrwxrwx.  1 root root        8 Apr  1 12:38 sbin -> usr/sbin
-rwxr-xr-x.  1 root root     3117 Nov  2 17:40 shutdown
drwxr-xr-x.  2 root root     4096 Mar 29 13:49 sys
drwxr-xr-x.  2 root root     4096 Mar 29 13:49 sysroot
drwxr-xr-x.  2 root root     4096 Mar 29 13:49 tmp
drwxr-xr-x.  7 root root     4096 Apr  1 12:38 usr
drwxr-xr-x.  3 root root     4096 Apr  1 12:38 var
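
The numbers in the session above can be cross-checked with a little arithmetic. This sketch only reuses values printed above (the block count from cpio and the image size from “ls -al /boot/”) and assumes the usual 512-byte cpio block size, which is also the bs value passed to dd.

```shell
block_size=512         # cpio reports sizes in 512-byte blocks
early_blocks=3412      # "3412 blocks" printed by the first cpio run
image_bytes=44821343   # size of the initramfs image in /boot

offset_bytes=$(( early_blocks * block_size ))
main_bytes=$(( image_bytes - offset_bytes ))

echo "main archive starts at byte $offset_bytes"
echo "main archive size: $main_bytes bytes"
```

The result, 43074399 bytes, matches the size dd reported for initramfs-tmp.img.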

Tune nginx proxy cache – control the cache manager how to delete cached files

In most cases you will never want to modify the default settings of the proxy_cache_path directive for deleting cache items. The problem is that during a peak the file deletion could impact your server performance, and it could even leave the server unresponsive for a period of time. You cannot give nginx a schedule for deleting cached items, nor forbid the deletion when the server is busy or loaded. The cache manager just checks the used cache capacity of each zone against the maximum allowed size (max_size), and if the used capacity is near or above it, the manager process starts deleting with the default values: it deletes up to 100 files in one iteration (which may last up to 200 milliseconds), sleeps for 50 milliseconds and then starts another iteration of up to 100 files. So when the deletions themselves are fast, your file system could receive well over 1000 file deletions per second (up to about 2000, which is 100 files every 50 milliseconds)!

This could leave your server almost unresponsive during the peaks.

It could be perfectly OK in the off-peaks, but there is no way to tell the nginx cache manager that there is plenty of free space even though the cache limit is reached, so right now is not the best time to delete from the cache!

You can tune three parameters per cache directory (manual here: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_path):

  • manager_files – the maximum number of files to delete in one iteration. The default value is 100.
  • manager_threshold – the time limit of one delete iteration. The default value is 200 milliseconds. The value uses the nginx time syntax with the unit appended directly to the number, for example “500ms” for 500 milliseconds.
  • manager_sleep – how long the manager sleeps before executing another delete iteration. The default value is 50 milliseconds, and the same nginx time syntax applies, for example “500ms” for 500 milliseconds.

Here is an example:

        proxy_cache_path /mnt/cache levels=1:2 keys_zone=CACHESTATICS:900m inactive=710h max_size=4000g manager_files=2 manager_sleep=200ms manager_threshold=500ms;

The cache manager will delete no more than 2 files per iteration, an iteration may take up to 500 milliseconds, and the manager will sleep 200 milliseconds before the next delete iteration.

The best option for loaded servers

The best option for loaded servers with a full cache is to balance the free space: delete a small number of files at once, so the deletion never loads your server. The free space will decrease during the peaks (more files are cached than the nginx manager can delete; you are aware of this and the free space should be enough for it), but during the off-peak (which normally is several times longer than the peak) the nginx manager can catch up with the deleting and free up some space (fewer files are cached than deleted). Of course, you should tune this according to your situation.
The main idea is to delete files in small batches in order not to saturate your disks. It could take longer to recover the free space, but it will not load your server in peaks. You should consider two things:

  1. Free space – have enough free space, and be sure it is enough for the peaks, when the cache could grow above the threshold.
  2. Number of deletions per iteration – you should experiment with this. First you should be aware how many files are added over a period of time that includes one peak and one off-peak, and then balance the number in such a way that at the end of the period the cache is not above the maximum size. Probably the best is to start with a 24-hour period, which includes at least one peak.

As you can see in the example above, only 2 files per iteration are good enough in our case. Taking into account the 200ms sleep between the iterations, 10 files at most are deleted per second. In our case this is not enough for the peak, but over the off-peak, which is 20 hours out of every 24, it is enough to get back under the maximum size limit of the cache.
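
The “10 files at most per second” figure can be reproduced with a small calculation. This is a simplified upper bound, assuming one batch of manager_files per manager_sleep pause and deletions that take no time themselves. The function name max_deletes_per_second is hypothetical.

```shell
# Upper bound of deletions per second for given manager_files and
# manager_sleep (in milliseconds), ignoring the time the deletions take.
max_deletes_per_second() {
    files=$1      # manager_files
    sleep_ms=$2   # manager_sleep in milliseconds
    echo $(( files * 1000 / sleep_ms ))
}

max_deletes_per_second 2 200     # tuned example above
max_deletes_per_second 100 50    # nginx defaults
```

The first call prints 10 and the second 2000, which shows how much gentler the tuned settings are than the defaults. On real disks the actual rate is lower, because every delete takes time.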

Here you can learn how to verify that your nginx is deleting cache files, and see the impact of the default settings on a busy server in a peak: how to disable effectively the deleting (purging) files from nginx proxy_cache (nginx cache manager process). Our loaded server just stopped serving files and the bandwidth decreased by 99% because the nginx cache manager suddenly started deleting cached files.

how to disable effectively the deleting (purging) files from nginx proxy_cache (nginx cache manager process)

In peaks, deleting files could kill your server, and the traffic could easily drop to a fraction of the normal if the nginx cache manager starts deleting files!

The server is perfectly normal, but suddenly it gets loaded and all nginx processes are in D (“Disk sleep”) state.

What could it be? What is going on with your proxy server?

Probably the cache is full!

Unfortunately there is no way to check live how full the cache is. Only an upgrade or a restart of the nginx process will trigger the nginx cache loader, which checks all the cache files and writes the cache size on exit in the error log. But be careful, the cache loading is also an IO-intensive operation: it stats all the cache files, and they could be millions of images.

If you are sure the cache manager is to blame for the IO load of your server (probably using this method: Check whether nginx cache manager is deleting files at the moment), you can stop it almost immediately!

Just increase the nginx cache size drastically – add a zero to the maximum cache size

Of course, you should have enough free space till you resolve the problem, for example with more servers, manual deletion in the off-peak, tuning of your cache deletion or any other solution.
Search for something like

        proxy_cache_path /mnt/cache levels=1:2 keys_zone=CACHESTATICS:900m inactive=710h max_size=400g

And add a zero to the max_size number like:

        proxy_cache_path /mnt/cache levels=1:2 keys_zone=CACHESTATICS:900m inactive=710h max_size=4000g

The max size will increase from 400G to 4000G (4T)!
Once nginx is reloaded with the new value, this will effectively stop the file deletion, and the nginx cache manager will sleep for a long time before it starts deleting files again. This could be a life-saving operation for your server at peak!

Here is a real graph from one of our servers: the cache manager started deleting files from the cache and the traffic dropped by 99%!

SCREENSHOT 1) The nginx cache manager just started to delete files from the cache and this operation just killed our server completely.

You can see almost zero bandwidth! The problem was resolved when we reloaded nginx with a bigger cache max_size value. The nginx manager immediately went to sleep and there was no more IO for deleting files. The load of the server returned to normal!

nginx cache manager start deleting files

SCREENSHOT 2) Hard drives were saturated and the disk maxed the IO time to 10 ms.

Despite the bigger READ and WRITE IOPS there was 95-99% less traffic.

Disk IO Time when cache manager is working

Then you can tune the values for deleting files from the cache – Tune nginx proxy cache – control the cache manager how to delete cached files.

Check whether nginx cache manager is deleting files at the moment

Here is a tip for webmasters (or system admins) to discover whether an nginx that uses proxy_cache to cache files is deleting files at the moment! There are situations where you may need to know if the load of a static media server is caused by the deletions of the cache manager or by the read or seek operations when serving the static files. Deletion is a really slow and IO-intensive operation, which could greatly impact the performance and traffic of the server.
Find the nginx “cache manager process” and strace it:

[root@srv ~]# ps axuf|grep nginx
root     31582  0.0  0.0 2906768 25108 ?       Ss   Feb15   0:01 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx    16008  1.9  1.3 2941188 440224 ?      S    16:39   1:33  \_ nginx: worker process
nginx    16009  1.5  1.2 2941188 398836 ?      S    16:39   1:12  \_ nginx: worker process
nginx    16010  0.5  0.7 2941984 239064 ?      S    16:39   0:26  \_ nginx: worker process
nginx    16011  0.7  0.9 2941984 299356 ?      D    16:39   0:35  \_ nginx: worker process
nginx    16012  1.2  1.1 2941188 389540 ?      D    16:39   1:01  \_ nginx: worker process
nginx    16013  2.3  1.5 2941188 487324 ?      D    16:39   1:55  \_ nginx: worker process
nginx    16014  0.0  0.6 2906772 224004 ?      S    16:39   0:01  \_ nginx: cache manager process
[root@srv ~]# strace -f -p 16014
strace: Process 16014 attached
gettid()                                = 16014
write(31, "2019/02/25 18:00:31 [info] 16014"..., 89) = 89
epoll_wait(36, [], 512, 5406)           = 0
unlink("/mnt/cache/0/39/c8ccbbc06d16debb1c8d58ceb6f99390") = 0
unlink("/mnt/cache/0/78/118924d7bf70e20fa8f790c6f9e7c780") = 0
unlink("/mnt/cache/3/ce/fab074cc670e6a80114dcbc398a63ce3") = 0
unlink("/mnt/cache/5/48/0b4e162dd7be8244815721fb7d68e485") = 0
unlink("/mnt/cache/5/56/e5eb4b38c7c8d209d0aabaf79ac02565") = 0
unlink("/mnt/cache/e/c6/207b432fa77375e4eefcaf52db250c6e") = 0
unlink("/mnt/cache/4/6d/ac0db27a03dabc79d869068db1b516d4") = 0
unlink("/mnt/cache/9/e8/91625c6e60de8e5425c4135c7dfb2e89") = 0
unlink("/mnt/cache/b/3c/f3c53000cf0cb20d55d8c09df8a733cb") = 0
unlink("/mnt/cache/f/f7/6f06423cd411b45816969fe020903f7f") = 0
unlink("/mnt/cache/f/50/c9b8ab72821a6e9bcb9c8d4b790dc50f") = 0
unlink("/mnt/cache/6/1f/74b0f1fdf1ac30db6af7793dc15671f6") = 0
unlink("/mnt/cache/0/83/caf199c1b99d438f96caec71bf2ea830") = 0
unlink("/mnt/cache/4/3d/c90f8fbbba4aaf407e386641dc2203d4") = 0
unlink("/mnt/cache/4/ad/d23cf8598020141b2bcec46d2b5cbad4") = 0
unlink("/mnt/cache/d/47/05973bc310503f36c67b7c1c24c8247d") = 0
unlink("/mnt/cache/f/11/e4fcbde8533d89105ab41f22c55e211f") = 0
unlink("/mnt/cache/2/06/29066a58e4116d24266026b4ed1e3062") = 0
epoll_wait(32, [], 512, 50)             = 0
unlink("/mnt/cache/4/6b/9a104ebdf70d00137a88d4584b2bb6b4") = 0
unlink("/mnt/cache/e/95/6d176447f57f21769d86a8f0b2a8b95e") = 0
unlink("/mnt/cache/b/b2/2f6f51163c65ae1fc06a913d6de1ab2b") = 0
unlink("/mnt/cache/a/24/2b058045a23b69de7a4442c9e6fce24a") = 0
unlink("/mnt/cache/7/60/00833e0b236ca8472f5be8227d645607") = 0
unlink("/mnt/cache/a/08/bf00eea300eff97dc4fffa61daaca08a") = 0
unlink("/mnt/cache/2/48/a291d8aca2b6f4f9471686eabe9b2482") = 0
unlink("/mnt/cache/0/e3/2d631adbc3bfdf8e44a51fa5453eee30") = 0
unlink("/mnt/cache/1/3b/08eef7c86c5ece9b5279b304dd86e3b1") = 0
unlink("/mnt/cache/b/a4/03213e4a8a1e8fb17ae698e54e70fa4b") = 0
unlink("/mnt/cache/b/a3/77f1b11811a9cda0ae93c498769f7a3b") = 0
unlink("/mnt/cache/4/01/1d50fac60681ae3263c8875775d20014") = 0
unlink("/mnt/cache/c/94/e71b96cbc65b248bd8e4540cbd69294c") = 0
unlink("/mnt/cache/1/59/99ec58e865b97e217835dd84f5f48591") = 0
unlink("/mnt/cache/4/b8/6a64825ce555b8f2440f051a7f7bcb84") = 0
unlink("/mnt/cache/7/51/fe2acbb895427ed8e406ce7e79d61517") = 0
.....
.....
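
If you want a number instead of a scrolling screen, you can count the unlink calls. The sketch below is only an illustration: count_unlinks is a hypothetical helper that counts matching lines on its standard input, and it is fed here with a small sample in the same format as the strace output above.

```shell
# Count the unlink() calls in strace output read from standard input.
count_unlinks() {
    grep -c 'unlink('
}

# Small sample in the format shown above.
printf '%s\n' \
    'epoll_wait(36, [], 512, 5406)           = 0' \
    'unlink("/mnt/cache/0/39/c8ccbbc06d16debb1c8d58ceb6f99390") = 0' \
    'unlink("/mnt/cache/0/78/118924d7bf70e20fa8f790c6f9e7c780") = 0' \
    | count_unlinks

# Live usage (assumption: 16014 is the PID of your cache manager process;
# strace writes its trace to stderr, hence the redirection):
#   timeout 10 strace -p 16014 2>&1 | count_unlinks
```

For the sample it prints 2. On a live server, the count over a fixed interval gives a rough idea of how aggressively the manager is deleting.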

You can tune the removal of files from the cache with the manager_files, manager_threshold and manager_sleep parameters of proxy_cache_path.
If you came here searching for information on the topic, you should probably check out these articles, too: how to disable effectively the deleting (purging) files from nginx proxy_cache (nginx cache manager process) and Tune nginx proxy cache – control the cache manager how to delete cached files

mysql – Error ‘Your password does not satisfy the current policy requirements’ or zero length mysql password

We got this error when granting permissions for one of our new slave servers (it could happen on an ordinary MySQL server, too):

Error 'Your password does not satisfy the current policy requirements' on query. Default database: ''. Query: 'GRANT REPLICATION SLAVE ON *.* TO 'reusr'@'127.0.01''

It turned out that MySQL had a password validation plugin activated by default and the password in our GRANT (or SET PASSWORD) statement didn’t meet the requirements.
So here is what you can do:

OPTION 1) Lower the password policy level

Check the policy level and lower it if it is MEDIUM or STRONG (there are three options: LOW=0, MEDIUM=1, which is the default, and STRONG=2). The policy level controls which checks are involved in the password complexity test. More details here: https://dev.mysql.com/doc/refman/5.7/en/validate-password-options-variables.html#sysvar_validate_password_policy. Here is what you have:

[myuser@mysql1 ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 13
....
....
mysql> SHOW VARIABLES LIKE 'validate%';
+--------------------------------------+--------+
| Variable_name                        | Value  |
+--------------------------------------+--------+
| validate_password_check_user_name    | OFF    |
| validate_password_dictionary_file    |        |
| validate_password_length             | 8      |
| validate_password_mixed_case_count   | 1      |
| validate_password_number_count       | 1      |
| validate_password_policy             | MEDIUM |
| validate_password_special_char_count | 1      |
+--------------------------------------+--------+
7 rows in set (0.00 sec)

So set validate_password_policy=0 and try your query again:

mysql> set global validate_password_policy=0;
Query OK, 0 rows affected (0.00 sec)

If you still get the error, your password is shorter than validate_password_length (8 by default), so you need to make it at least 8 characters long. But what if you want an empty password (or one with 1, 2 or 3 characters)? Setting validate_password_length to 0 won’t work, because there is a hard limit of 4: you cannot set it to 0, even though the SET query does not report an error when using 0 with validate_password_length.
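
As far as the MySQL 5.7 manual describes it, the limit of 4 is not a fixed constant. It is derived from three of the other variables shown in the SHOW VARIABLES output above. A small sketch of that calculation with the default values:

```shell
# Default values from the SHOW VARIABLES output above.
number_count=1        # validate_password_number_count
special_char_count=1  # validate_password_special_char_count
mixed_case_count=1    # validate_password_mixed_case_count

# Documented lower bound of validate_password_length.
min_length=$(( number_count + special_char_count + 2 * mixed_case_count ))
echo "effective minimum of validate_password_length: $min_length"
```

With the defaults this prints 4, which matches the limit described above.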

You should uninstall the plugin.

OPTION 2) Uninstall the MySQL Validation Plugin

You can uninstall the validation plugin on the fly on a running server, without restarting or reloading it, and then you can set whatever password you like.
Here is how to do it:

[myuser@mysql1 ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 13
....
....
mysql> UNINSTALL PLUGIN validate_password;
Query OK, 0 rows affected (0.03 sec)

mysql> SHOW VARIABLES LIKE 'validate%';
Empty set (0.01 sec)

As you can see, no “validate_password” variables are available anymore! Now set your password.
But there is a catch: if the server was started with “--validate-password=FORCE_PLUS_PERMANENT” (you can check it with “ps axuf|grep mysqld” on the command line) you won’t be able to uninstall the plugin live, even as the root MySQL user. So in the end, if you do not have root permissions to restart the MySQL service without this option, it might be better to change your password, or to skip the query if it is received by the slave through the MySQL replication bin log.
You can install the plugin again with:

[myuser@mysql1 ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 13
....
....
mysql> INSTALL PLUGIN validate_password SONAME 'validate_password.so';
Query OK, 0 rows affected (0.00 sec)

And it will be available across restarts, too, because it is registered in the “mysql.plugin” table.

mysql – Error ‘Column count of mysql.user is wrong. Expected 45, found 43. The table is probably corrupted’ on query.

If you

upgraded your MySQL server (from 5.6 to 5.7 or above)

or

imported a MySQL dump SQL file from older version

than your current server, you may encounter this error when granting permissions to a user:

Error 'Column count of mysql.user is wrong. Expected 45, found 43. The table is probably corrupted' on query. Default database: ''. Query: 'GRANT REPLICATION SLAVE ON *.* TO 'replusr'@'144.76.156.182''

Do not panic, it is probably not corrupted. Just continue reading.

There is a simple fix, just

execute mysql_upgrade

It will automatically detect what to upgrade and it will upgrade it:

[myuser@mysql1 ~]# screen -R upgrade
[myuser@mysql1 ~]# mysql_upgrade 
Checking if update is needed.
Checking server version.
Running queries to upgrade MySQL server.
Checking system database.
mysql.columns_priv                                 OK
mysql.db                                           OK
mysql.engine_cost                                  OK
mysql.event                                        OK
mysql.func                                         OK
mysql.general_log                                  OK
mysql.gtid_executed                                OK
mysql.help_category                                OK
mysql.help_keyword                                 OK
mysql.help_relation                                OK
mysql.help_topic                                   OK
mysql.host                                         OK
mysql.innodb_index_stats                           OK
mysql.innodb_table_stats                           OK
mysql.ndb_binlog_index                             OK
mysql.plugin                                       OK
mysql.proc                                         OK
mysql.procs_priv                                   OK
mysql.proxies_priv                                 OK
mysql.server_cost                                  OK
mysql.servers                                      OK
mysql.slave_master_info                            OK
mysql.slave_relay_log_info                         OK
mysql.slave_worker_info                            OK
mysql.slow_log                                     OK
mysql.tables_priv                                  OK
mysql.time_zone                                    OK
mysql.time_zone_leap_second                        OK
mysql.time_zone_name                               OK
mysql.time_zone_transition                         OK
mysql.time_zone_transition_type                    OK
mysql.user                                         OK
The sys schema is already up to date (version 1.5.1).
Found 0 sys functions, but expected 22. Re-installing the sys schema.
Upgrading the sys schema.
Checking databases.
phpmyadmin.pma__bookmark                           OK
phpmyadmin.pma__central_columns                    OK
phpmyadmin.pma__column_info                        OK
phpmyadmin.pma__designer_settings                  OK
phpmyadmin.pma__export_templates                   OK
phpmyadmin.pma__favorite                           OK
phpmyadmin.pma__history                            OK
phpmyadmin.pma__navigationhiding                   OK
phpmyadmin.pma__pdf_pages                          OK
phpmyadmin.pma__recent                             OK
phpmyadmin.pma__relation                           OK
phpmyadmin.pma__savedsearches                      OK
phpmyadmin.pma__table_coords                       OK
phpmyadmin.pma__table_info                         OK
phpmyadmin.pma__table_uiprefs                      OK
phpmyadmin.pma__tracking                           OK
phpmyadmin.pma__userconfig                         OK
phpmyadmin.pma__usergroups                         OK
phpmyadmin.pma__users                              OK
sys.sys_config                                     OK
db1.access                                         OK
db1.users                                          OK
db1.objects                                        OK
db1.isp                                            OK
db1.desc                                           OK
Upgrade process completed successfully.
Checking if update is needed.

It works while the server is up and running, and it is a good idea to execute the command in a screen session.
You do not need to be logged in as the system root user, but mysql_upgrade does need the root MySQL password. In the example above it did not ask for a password, because we have it in the ~/.my.cnf file.

Just to note, you might have upgraded long before this error appears!

If you do not use certain functionality you could live happily with the old mysql.user schema (and all the old mysql.* tables). In our case we upgraded one of our slaves, and several days later, when a GRANT command was issued on the master, the replication just stopped with this error! Of course, if someone had used the command on our slave, the error would have appeared there sooner.
We also had a case where an old MySQL SQL dump file (5.6) was imported into a newer MySQL 5.7 server and there were no issues for weeks, till the GRANT command.

perror

The error code is 1805.

[myuser@mysql1 ~]# perror 1805
MySQL error code 1805 (ER_COL_COUNT_DOESNT_MATCH_CORRUPTED_V2): Column count of %s.%s is wrong. Expected %d, found %d. The table is probably corrupted