Kernel building failure – unable to initialize decompress status for section .debug_info

Upgrading the Gentoo system may lead to some glitches especially if not emerge the world slot very often!
In fact, leaving older packages living with the new one using slots may also lead to glitches of the same type! Here is one example, where leaving old packages may prevent the user to build some packages or the Linux kernel itself.

.....
.....
  x86_64-pc-linux-gnu-ld.bfd -r -m elf_x86_64 --build-id=sha1  -T scripts/module.lds -o lib/slub_kunit.ko lib/slub_kunit.o lib/slub_kunit.mod.o;  true
  x86_64-pc-linux-gnu-ld.bfd -r -m elf_x86_64 --build-id=sha1  -T scripts/module.lds -o lib/ts_bm.ko lib/ts_bm.o lib/ts_bm.mod.o;  true
  x86_64-pc-linux-gnu-ld.bfd -r -m elf_x86_64 --build-id=sha1  -T scripts/module.lds -o lib/ts_fsm.ko lib/ts_fsm.o lib/ts_fsm.mod.o;  true
  x86_64-pc-linux-gnu-ld.bfd -r -m elf_x86_64 --build-id=sha1  -T scripts/module.lds -o lib/ts_kmp.ko lib/ts_kmp.o lib/ts_kmp.mod.o;  true
  x86_64-pc-linux-gnu-ld.bfd -r -m elf_x86_64 --build-id=sha1  -T scripts/module.lds -o mm/kfence/kfence_test.ko mm/kfence/kfence_test.o mm/kfence/kfence_test.mod.o;  true
  x86_64-pc-linux-gnu-ld.bfd -r -m elf_x86_64 --build-id=sha1  -T scripts/module.lds -o net/6lowpan/6lowpan.ko net/6lowpan/6lowpan.o net/6lowpan/6lowpan.mod.o;  true
  x86_64-pc-linux-gnu-ld.bfd -r -m elf_x86_64 --build-id=sha1  -T scripts/module.lds -o net/6lowpan/nhc_dest.ko net/6lowpan/nhc_dest.o net/6lowpan/nhc_dest.mod.o;  true
  x86_64-pc-linux-gnu-ld.bfd -r -m elf_x86_64 --build-id=sha1  -T scripts/module.lds -o net/6lowpan/nhc_fragment.ko net/6lowpan/nhc_fragment.o net/6lowpan/nhc_fragment.mod.o;  true
  x86_64-pc-linux-gnu-ld.bfd -r -m elf_x86_64 --build-id=sha1  -T scripts/module.lds -o net/6lowpan/nhc_ghc_ext_dest.ko net/6lowpan/nhc_ghc_ext_dest.o net/6lowpan/nhc_ghc_ext_dest.mod.o;  true
x86_64-pc-linux-gnu-ld.bfd: mm/kfence/kfence_test.o: unable to initialize decompress status for section .debug_info
x86_64-pc-linux-gnu-ld.bfd: mm/kfence/kfence_test.o: unable to initialize decompress status for section .debug_info
x86_64-pc-linux-gnu-ld.bfd: mm/kfence/kfence_test.o: unable to initialize decompress status for section .debug_info
x86_64-pc-linux-gnu-ld.bfd: mm/kfence/kfence_test.o: unable to initialize decompress status for section .debug_info
mm/kfence/kfence_test.o: file not recognized: file format not recognized
  x86_64-pc-linux-gnu-ld.bfd -r -m elf_x86_64 --build-id=sha1  -T scripts/module.lds -o net/6lowpan/nhc_ghc_ext_frag.ko net/6lowpan/nhc_ghc_ext_frag.o net/6lowpan/nhc_ghc_ext_frag.mod.o;  true
make[3]: *** [/var/tmp/portage/sys-kernel/gentoo-kernel-5.15.5/work/linux-5.15/scripts/Makefile.modfinal:59: mm/kfence/kfence_test.ko] Error 1
make[3]: *** Waiting for unfinished jobs....
  x86_64-pc-linux-gnu-ld.bfd -r -m elf_x86_64 --build-id=sha1  -T scripts/module.lds -o net/6lowpan/nhc_ghc_ext_hop.ko net/6lowpan/nhc_ghc_ext_hop.o net/6lowpan/nhc_ghc_ext_hop.mod.o;  true
  x86_64-pc-linux-gnu-ld.bfd -r -m elf_x86_64 --build-id=sha1  -T scripts/module.lds -o net/6lowpan/nhc_ghc_ext_route.ko net/6lowpan/nhc_ghc_ext_route.o net/6lowpan/nhc_ghc_ext_route.mod.o;  true
  x86_64-pc-linux-gnu-ld.bfd -r -m elf_x86_64 --build-id=sha1  -T scripts/module.lds -o net/6lowpan/nhc_ghc_icmpv6.ko net/6lowpan/nhc_ghc_icmpv6.o net/6lowpan/nhc_ghc_icmpv6.mod.o;  true
  x86_64-pc-linux-gnu-ld.bfd -r -m elf_x86_64 --build-id=sha1  -T scripts/module.lds -o net/6lowpan/nhc_ghc_udp.ko net/6lowpan/nhc_ghc_udp.o net/6lowpan/nhc_ghc_udp.mod.o;  true
make[2]: *** [/var/tmp/portage/sys-kernel/gentoo-kernel-5.15.5/work/linux-5.15/scripts/Makefile.modpost:140: __modpost] Error 2
make[1]: *** [/var/tmp/portage/sys-kernel/gentoo-kernel-5.15.5/work/linux-5.15/Makefile:1783: modules] Error 2
make[1]: Leaving directory '/var/tmp/portage/sys-kernel/gentoo-kernel-5.15.5/work/build'
make: *** [Makefile:219: __sub-make] Error 2
 * ERROR: sys-kernel/gentoo-kernel-5.15.5::gentoo failed (compile phase):
 *   emake failed
 * 
 * If you need support, post the output of `emerge --info '=sys-kernel/gentoo-kernel-5.15.5::gentoo'`,
 * the complete build log and the output of `emerge -pqv '=sys-kernel/gentoo-kernel-5.15.5::gentoo'`.
 * The complete build log is located at '/var/tmp/portage/sys-kernel/gentoo-kernel-5.15.5/temp/build.log'.
 * The ebuild environment file is located at '/var/tmp/portage/sys-kernel/gentoo-kernel-5.15.5/temp/environment'.
 * Working directory: '/var/tmp/portage/sys-kernel/gentoo-kernel-5.15.5/work/linux-5.15'
 * S: '/var/tmp/portage/sys-kernel/gentoo-kernel-5.15.5/work/linux-5.15'

>>> Failed to emerge sys-kernel/gentoo-kernel-5.15.5, Log file:

>>>  '/var/tmp/portage/sys-kernel/gentoo-kernel-5.15.5/temp/build.log'

 * Messages for package sys-kernel/gentoo-kernel-5.15.5:

 * ERROR: sys-kernel/gentoo-kernel-5.15.5::gentoo failed (compile phase):
 *   emake failed
 * 
 * If you need support, post the output of `emerge --info '=sys-kernel/gentoo-kernel-5.15.5::gentoo'`,
 * the complete build log and the output of `emerge -pqv '=sys-kernel/gentoo-kernel-5.15.5::gentoo'`.
 * The complete build log is located at '/var/tmp/portage/sys-kernel/gentoo-kernel-5.15.5/temp/build.log'.
 * The ebuild environment file is located at '/var/tmp/portage/sys-kernel/gentoo-kernel-5.15.5/temp/environment'.
 * Working directory: '/var/tmp/portage/sys-kernel/gentoo-kernel-5.15.5/work/linux-5.15'
 * S: '/var/tmp/portage/sys-kernel/gentoo-kernel-5.15.5/work/linux-5.15'

The problem appears to be leaving multiple old sys-devel/binutils:

 sys-devel/binutils
    selected: 2.31.1-r3 2.32-r1 2.34 2.35.2  
   protected: none 
     omitted: 2.37_p1-r1

Removing the old versions of sys-devel/binutils 2.31.1-r3 2.32-r1 2.34 2.35.2 and leaving only the latest one solves the problem with the above error “unable to initialize decompress status for section .debug_info”.

emerge -vaC "<sys-devel/binutils-2.37_p1-r1"

Of course, an older version of sys-devel/binutils or a buggy one could lead to such an error! Update the sys-devel/binutils or change the version if the above error is hit.

Elasticsearch failed to set password apm_system error in initial setup

A relatively typical error when installing a single node Elastic Elasticsearch software is when the passwords are set:

[root@loganalyzer elasticsearch]# ./bin/elasticsearch-setup-passwords -v auto

Initiating the setup of passwords for reserved users elastic,apm_system,kibana,kibana_system,logstash_system,beats_system,remote_monitoring_user.
The passwords will be randomly generated and printed to the console.
Please confirm that you would like to continue [y/N]y



Connection failure to: http://192.168.0.4:9200/_security/user/apm_system/_password?pretty failed: Read timed out


ERROR: Failed to set password for user [apm_system].

Such error may prevent the initial password setting of several important passwords and compromise the Elasticsearch software security model. Even including the

discovery.type: single-node

in the /etc/elasticsearch/elasticsearch.yml would lead to such error. The missing option in the configuration /etc/elasticsearch/elasticsearch.yml is:

discovery.seed_hosts: ["node-1"]

By default, this option is commented out and it should be set on initial installation, though it is not required when starting the elasticsearch node (with no security model enabled)!
This is an array with all the servers’ hostnames in the cluster setup. In single-node mode, this option (discovery.seed_hosts) should be set only to the hostname of the single node like in this case “node-1”. This is the hostname of the server. The user must include the user’s current server hostname, not this example name “node-1”!

Setting the right hostname for discovery.seed_hosts in /etc/elasticsearch/elasticsearch.yml would let the user to set all password with the Elasticsearch tool elasticsearch-setup-passwords

The error may occur in a cluster setup with multiple servers, too, if the hosts are not filled in this option – discovery.seed_hosts.
Here is what to expect when executing elasticsearch-setup-passwords (even with some RED indexes):

[root@loganalyzer ~]# cd /usr/share/elasticsearch/
[root@loganalyzer elasticsearch]# ./bin/elasticsearch-setup-passwords -v auto

Your cluster health is currently RED.
This means that some cluster data is unavailable and your cluster is not fully functional.

It is recommended that you resolve the issues with your cluster before running elasticsearch-setup-passwords.
It is very likely that the password changes will fail when run against an unhealthy cluster.

Do you want to continue with the password setup process [y/N]y

Initiating the setup of passwords for reserved users elastic,apm_system,kibana,kibana_system,logstash_system,beats_system,remote_monitoring_user.
The passwords will be randomly generated and printed to the console.
Please confirm that you would like to continue [y/N]y


Changed password for user apm_system
PASSWORD apm_system = judakai2Wai9Saiph8ah

Changed password for user kibana_system
PASSWORD kibana_system = eisiadit3CieG4Requie

Changed password for user kibana
PASSWORD kibana = bi3NohquohLoonaizei1

Changed password for user logstash_system
PASSWORD logstash_system = AhC2kue5eeR4eK1LeeZa

Changed password for user beats_system
PASSWORD beats_system = reeyu8ooj8Eebee5ni2c

Changed password for user remote_monitoring_user
PASSWORD remote_monitoring_user = aeshahx9Ohkoph3rai6a

Changed password for user elastic
PASSWORD elastic = beiPhei4xu5iXailocei

No errors and the password are set successfully.

logrotate stopped rotating a log file

logrotate under CentOS 7 had worked perfectly till one day when a web server log stopped being rotated! All the other logs were rotated successfully. Apparently, there was no reason for this, and the logrotate status file was pointing to the very web server log file was rotated successfully during the night, but it wasn’t really!

The problem appeared to be the presence of an uncompressed file, despite the compressed file of the same file being generated.

Apparently, the logrotate or the compressing process was interrupted and the uncompressed file was left in place. After that, every time rotation happened it just stopped on this step for this log file. So the solution is to remove the compressed file and then the normal rotation will continue as it should be.

The logrotate status file is /var/lib/logrotate/logrotate.status and it is just a text file with the time of the last rotation for each log file:

"/var/log/nginx/error.log" 2021-11-2-9:33:26
"/var/log/nginx/media.static.log" 2021-11-2-9:33:26
"/var/log/php-fpm/www.access.log" 2021-11-2-9:33:26
"/var/log/yum.log" 2021-11-2-9:31:49
"/var/log/httpd/*log" 2021-11-2-9:0:0
"/var/log/wtmp" 2021-11-2-9:33:26
"/var/log/chrony/*.log" 2021-9-25-3:0:0
....

But the file has been one big file for the last 30 days:

[root@srv nginx]# ls -altr |tail -n 15
-rw-r-----.  1 nginx adm         1199 Oct  2 02:42 media.static.error.log-20211002.gz
-rw-r-----.  1 nginx adm         1919 Oct  2 02:54 error.log-20211002.gz
-rw-r-----.  1 nginx adm     39174314 Oct  2 03:16 media.static.log-20211002.gz
-rw-r-----.  1 nginx adm       307922 Oct  2 03:16 access.log-20211002.gz
-rw-r-----.  1 nginx adm         1461 Oct  3 02:16 media.static.error.log-20211003.gz
-rw-r-----.  1 nginx adm         1892 Oct  3 02:42 error.log-20211003.gz
-rw-r-----.  1 nginx adm       518079 Oct  3 03:38 access.log-20211003.gz
-rw-r-----.  1 nginx adm   1666744662 Oct  3 03:38 media.static.log-20211003
drwxr-xr-x.  2 root  root       20480 Oct  4 03:31 .
-rw-r-----.  1 nginx adm     23642112 Oct  4 03:31 media.static.log-20211003.gz
-rw-r-----.  1 nginx adm       633643 Nov  2 06:29 media.static.error.log
-rw-r-----.  1 nginx adm       680493 Nov  2 09:30 error.log
drwxr-xr-x. 16 root  root        4096 Nov  2 09:31 ..
-rw-r-----.  1 nginx adm    279561983 Nov  2 09:43 access.log
-rw-r-----.  1 nginx adm  29415439068 Nov  2 09:43 media.static.log

Debug the problem

Executing the logrotate manually will just report that the log files does not need rotation based on the time in the logrotate status file is /var/lib/logrotate/logrotate.status and in the status file it says the file has been rotated on “2021-11-2-9:33:26”!
But executing with force option reveals the real problem:
There is an uncompressed file and a compressed one with the same only with the suffix .gz.
So the uncompressed log file was wrong left in place after compression (or probably the compression or rotation process was interrupted). After the event (probably interruption of the rotating process) every day the logrotate outputted error when rotating this very same log file, but it had been written in the logrorate status file as such the rotating was successful with the latest date and time.

[root@srv nginx]# logrotate -vf /etc/logrotate.conf 
....
rotating log /var/log/nginx/media.static.log, log->rotateCount is 1000
dateext suffix '-20211102'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
compressing log with: /bin/gzip
set default create context to system_u:object_r:httpd_log_t:s0
error: error creating output file /var/log/nginx/media.static.log-20211003.gz: File exists

rotating pattern: /var/log/php-fpm/*log  forced from command line (4 rotations)
empty log files are not rotated, old logs are removed
....

The solution

First, remove the compressed file force logrotate to rotate the logs.

[root@srv nginx]# rm media.static.log-20211003.gz
[root@srv nginx]# logrotate -vf /etc/logrotate.conf
reading config file /etc/logrotate.conf
including /etc/logrotate.d
reading config file chrony
reading config file nginx
reading config file php-fpm
....
rotating log /var/log/nginx/media.static.log, log->rotateCount is 1000
dateext suffix '-20211003'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
compressing log with: /bin/gzip
set default create context to system_u:object_r:httpd_log_t:s0
destination /var/log/nginx/media.static.log-20211003 already exists, skipping rotation
....

No error, just skipping the rotation (the renaming of the file) and compressing it.

Changing the queueing discipline – Error: Cannot delete qdisc with handle of zero.

If trying to delete a qdisc root to set no rules and no queuing algorithm probably for the purpose to add a new one a replace command is the right option!
An error saying there are no existing rules:

[root@srv ~]# tc qdisc del dev ens1f0 root
Error: Cannot delete qdisc with handle of zero.

Changing the qdisc with another. The example below replaces the MQ (multiqueues) with FQ (Fair Queue policy):

[root@srv ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq state UP group default qlen 1000
    link/ether b4:96:54:9f:10:71 brd ff:ff:ff:ff:ff:ff
    inet 84.16.231.24/26 brd 84.16.231.63 scope global noprefixroute ens1f0
       valid_lft forever preferred_lft forever
    inet6 fe80::b696:91ff:fe8e:860/64 scope link 
       valid_lft forever preferred_lft forever
[root@srv ~]# tc qdisc replace dev ens1f0 root fq
[root@srv ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether b4:96:54:9f:10:71 brd ff:ff:ff:ff:ff:ff
    inet 84.16.231.24/26 brd 84.16.231.63 scope global noprefixroute ens1f0
       valid_lft forever preferred_lft forever
    inet6 fe80::b696:91ff:fe8e:860/64 scope link 
       valid_lft forever preferred_lft forever

The replace command is used,but the add command has the same effect in this case. del cannot remove the queueing discipline from a physical network device. noqueue might be only special devices like loopback (i.e. localhost), bonding network devices and so on.

In CentOS, for additional kernel modules use kernel-modules-extra package if some qdiscs are missing.

Gentoo ERROR: Python module pytevent of version 0.10.2 not found, and bundling disabled

Emerging the sys-libs/ldb-2.3.0-r1 package may fail with an error for a missing Python mode, despite the sys-libs/tevent with a python USE flag is presented in the system:

Checking for system tevent (>=0.10.2)                                                           : yes 
ERROR: Python module pytevent of version 0.10.2 not found, and bundling disabled
 * ERROR: sys-libs/ldb-2.3.0-r1::gentoo failed (configure phase):
 *   configure failed
 * 
 * Call stack:

Indeed, the tevent (>=0.10.2) is found, but not the Python module! And the checking pase of the setup fails.
First, check whether the USE of sys-libs/tevent has python and the right version PYTHON_SINGLE_TARGET=”python3_8″ is used (the Python version may vary here):

root@srv # emerge -pv =tevent-0.10.2::gentoo

[ebuild   R    ] sys-libs/tevent-0.10.2::gentoo  USE="python" ABI_X86="32 (64) (-x32)" PYTHON_SINGLE_TARGET="python3_8 -python3_9" 0 KiB

Total: 1 package (1 reinstall), Size of downloads: 0 KiB

This system uses Python 3.8 and the library sys-libs/tevent was built with this USE flag.

The problem here is the tevent is installed under a its own directory: /usr/lib64/tevent. Using the ldconig utility the problem quickly has been resolved. Just add a file /etc/ld.so.conf.d/tevent_ldb.conf with the path to the library and then regenerate the ldconfig:

root@srv # cat /etc/ld.so.conf.d/tevent_ldb.conf
/usr/lib64/tevent
root@srv # ldconfig
root@srv # 

Do not forget to run the “ldconfig“, because tevent library won’t be added to the LD cache. Emerging the sys-libs/ldb and then samba was successful after this quick workaround! There is a Gentoo bug reported, but the problem still exists – https://bugs.gentoo.org/590026

Zookeeper cluster delete all in a subtree – Failed to delete some node(s) in the subtree!

Deleting a node with a subtree in a Zookeeper is pretty easy and it could be done even with the Zookeeper cli from the command-line.

Deleting a node, which has a subtree could be done with the command “deleteall” but the command should be executing on the leader in the Zookeeper cluster!

If the user receives the following error:

root@zk01:/apache-zookeeper-3.7.0-bin# ./bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 25] deleteall /clickhouse/statdata/collectd_metrics/replicas/replica_2
Failed to delete some node(s) in the subtree!

First, check if the Zookeeper node is the leader, and second, execute the deleteall command on the path, which should be deleted. Using the zkServer.sh in the Zookeeper bin directory to check whether the Zookeeper node is the leader one:

root@zk03:/apache-zookeeper-3.7.0-bin# ./bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader
root@zk03:/apache-zookeeper-3.7.0-bin# ./bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 0] deleteall /clickhouse/statdata/collectd_metrics/replicas/replica_2
[zk: localhost:2181(CONNECTED) 1] 

No error, when the command deleteall is executed in the leader. No additional output either.

Deleting with the “delete” command will result in an error, too:

[zk: localhost:2181(CONNECTED) 0] delete /clickhouse/statdata/collectd_metrics/replicas/replica_2
Node not empty: /clickhouse/statdata/collectd_metrics/replicas/replica_2

Zookeeper cli output

The follower’s output with delete, deleteall and list command ls:

root@zk01:/apache-zookeeper-3.7.0-bin# ./bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
root@zk01:/apache-zookeeper-3.7.0-bin# ./bin/zkCli.sh
Connecting to localhost:2181
2021-09-21 04:24:34,369 [myid:] - INFO  [main:Environment@98] - Client environment:zookeeper.version=3.7.0-e3704b390a6697bfdf4b0bef79e3da7a4f6bac4b, built on 2021-03-17 09:46 UTC
2021-09-21 04:24:34,373 [myid:] - INFO  [main:Environment@98] - Client environment:host.name=zk01
2021-09-21 04:24:34,373 [myid:] - INFO  [main:Environment@98] - Client environment:java.version=11.0.12
2021-09-21 04:24:34,375 [myid:] - INFO  [main:Environment@98] - Client environment:java.vendor=Oracle Corporation
2021-09-21 04:24:34,376 [myid:] - INFO  [main:Environment@98] - Client environment:java.home=/usr/local/openjdk-11
2021-09-21 04:24:34,376 [myid:] - INFO  [main:Environment@98] - Client environment:java.class.path=/apache-zookeeper-3.7.0-bin/bin/../zookeeper-server/target/classes:/apache-zookeeper-3.7.0-bin/bin/../build/classes:/apache-zookeeper-3.7.0-bin/bin/../zookeeper-server/target/lib/*.jar:/apache-zookeeper-3.7.0-bin/bin/../build/lib/*.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/zookeeper-prometheus-metrics-3.7.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/zookeeper-jute-3.7.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/zookeeper-3.7.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/snappy-java-1.1.7.7.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/slf4j-log4j12-1.7.30.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/slf4j-api-1.7.30.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/simpleclient_servlet-0.9.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/simpleclient_hotspot-0.9.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/simpleclient_common-0.9.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/simpleclient-0.9.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-transport-native-unix-common-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-transport-native-epoll-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-transport-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-resolver-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-handler-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-common-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-codec-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-buffer-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/metrics-core-4.1.12.1.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/log4j-1.2.17.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jline-2.14.6.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-util-ajax-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-util-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-servlet-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-server-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-security-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-io-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-http-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/javax.servlet-api-3.1.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jackson-databind-2.10.5.1.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jackson-core-2.10.5.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jackson-annotations-2.10.5.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/commons-cli-1.4.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/audience-annotations-0.12.0.jar:/apache-zookeeper-3.7.0-bin/bin/../zookeeper-*.jar:/apache-zookeeper-3.7.0-bin/bin/../zookeeper-server/src/main/resources/lib/*.jar:/conf:
2021-09-21 04:24:34,376 [myid:] - INFO  [main:Environment@98] - Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
2021-09-21 04:24:34,376 [myid:] - INFO  [main:Environment@98] - Client environment:java.io.tmpdir=/tmp
2021-09-21 04:24:34,376 [myid:] - INFO  [main:Environment@98] - Client environment:java.compiler=<NA>
2021-09-21 04:24:34,377 [myid:] - INFO  [main:Environment@98] - Client environment:os.name=Linux
2021-09-21 04:24:34,377 [myid:] - INFO  [main:Environment@98] - Client environment:os.arch=amd64
2021-09-21 04:24:34,377 [myid:] - INFO  [main:Environment@98] - Client environment:os.version=5.4.0-42-generic
2021-09-21 04:24:34,377 [myid:] - INFO  [main:Environment@98] - Client environment:user.name=root
2021-09-21 04:24:34,377 [myid:] - INFO  [main:Environment@98] - Client environment:user.home=/root
2021-09-21 04:24:34,377 [myid:] - INFO  [main:Environment@98] - Client environment:user.dir=/apache-zookeeper-3.7.0-bin
2021-09-21 04:24:34,378 [myid:] - INFO  [main:Environment@98] - Client environment:os.memory.free=55MB
2021-09-21 04:24:34,380 [myid:] - INFO  [main:Environment@98] - Client environment:os.memory.max=256MB
2021-09-21 04:24:34,380 [myid:] - INFO  [main:Environment@98] - Client environment:os.memory.total=64MB
2021-09-21 04:24:34,385 [myid:] - INFO  [main:ZooKeeper@637] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@7946e1f4
2021-09-21 04:24:34,390 [myid:] - INFO  [main:X509Util@77] - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2021-09-21 04:24:34,397 [myid:] - INFO  [main:ClientCnxnSocket@239] - jute.maxbuffer value is 1048575 Bytes
2021-09-21 04:24:34,409 [myid:] - INFO  [main:ClientCnxn@1726] - zookeeper.request.timeout value is 0. feature enabled=false
Welcome to ZooKeeper!
2021-09-21 04:24:34,439 [myid:localhost:2181] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1171] - Opening socket connection to server localhost/127.0.0.1:2181.
2021-09-21 04:24:34,441 [myid:localhost:2181] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1173] - SASL config status: Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2021-09-21 04:24:34,454 [myid:localhost:2181] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1005] - Socket connection established, initiating session, client: /127.0.0.1:40846, server: localhost/127.0.0.1:2181
2021-09-21 04:24:34,467 [myid:localhost:2181] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1438] - Session establishment complete on server localhost/127.0.0.1:2181, session id = 0x1006ad505c30007, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] delete /clickhouse/statdata/collectd_metrics/replicas/replica_2
Node not empty: /clickhouse/statdata/collectd_metrics/replicas/replica_2
[zk: localhost:2181(CONNECTED) 1] deleteall /clickhouse/statdata/collectd_metrics/replicas/replica_2
Failed to delete some node(s) in the subtree!
[zk: localhost:2181(CONNECTED) 2] ls /clickhouse/statdata/collectd_metrics/replicas/replica_2/parts
[2021_5155978_5164111_2367, 2021_5155978_5164112_2368, 2021_5155978_5164113_2369, 2021_5155978_5164114_2370, 2021_5155978_5164115_2371, 2021_5155978_5164116_2372, 2021_5155978_5164117_2373]

Deleting the node subtree in the Zookeeper leader:

root@zk03:/apache-zookeeper-3.7.0-bin# ./bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader
root@zk03:/apache-zookeeper-3.7.0-bin# ./bin/zkCli.sh 
Connecting to localhost:2181
2021-09-21 04:41:55,397 [myid:] - INFO  [main:Environment@98] - Client environment:zookeeper.version=3.7.0-e3704b390a6697bfdf4b0bef79e3da7a4f6bac4b, built on 2021-03-17 09:46 UTC
2021-09-21 04:41:55,401 [myid:] - INFO  [main:Environment@98] - Client environment:host.name=zk03
2021-09-21 04:41:55,401 [myid:] - INFO  [main:Environment@98] - Client environment:java.version=11.0.12
2021-09-21 04:41:55,404 [myid:] - INFO  [main:Environment@98] - Client environment:java.vendor=Oracle Corporation
2021-09-21 04:41:55,404 [myid:] - INFO  [main:Environment@98] - Client environment:java.home=/usr/local/openjdk-11
2021-09-21 04:41:55,404 [myid:] - INFO  [main:Environment@98] - Client environment:java.class.path=/apache-zookeeper-3.7.0-bin/bin/../zookeeper-server/target/classes:/apache-zookeeper-3.7.0-bin/bin/../build/classes:/apache-zookeeper-3.7.0-bin/bin/../zookeeper-server/target/lib/*.jar:/apache-zookeeper-3.7.0-bin/bin/../build/lib/*.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/zookeeper-prometheus-metrics-3.7.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/zookeeper-jute-3.7.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/zookeeper-3.7.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/snappy-java-1.1.7.7.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/slf4j-log4j12-1.7.30.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/slf4j-api-1.7.30.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/simpleclient_servlet-0.9.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/simpleclient_hotspot-0.9.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/simpleclient_common-0.9.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/simpleclient-0.9.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-transport-native-unix-common-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-transport-native-epoll-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-transport-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-resolver-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-handler-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-common-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-codec-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/netty-buffer-4.1.59.Final.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/metrics-core-4.1.12.1.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/log4j-1.2.17.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jline-2.14.6.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-util-ajax-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-util-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-servlet-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-server-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-security-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-io-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jetty-http-9.4.38.v20210224.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/javax.servlet-api-3.1.0.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jackson-databind-2.10.5.1.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jackson-core-2.10.5.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/jackson-annotations-2.10.5.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/commons-cli-1.4.jar:/apache-zookeeper-3.7.0-bin/bin/../lib/audience-annotations-0.12.0.jar:/apache-zookeeper-3.7.0-bin/bin/../zookeeper-*.jar:/apache-zookeeper-3.7.0-bin/bin/../zookeeper-server/src/main/resources/lib/*.jar:/conf:
2021-09-21 04:41:55,405 [myid:] - INFO  [main:Environment@98] - Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
2021-09-21 04:41:55,405 [myid:] - INFO  [main:Environment@98] - Client environment:java.io.tmpdir=/tmp
2021-09-21 04:41:55,405 [myid:] - INFO  [main:Environment@98] - Client environment:java.compiler=<NA>
2021-09-21 04:41:55,405 [myid:] - INFO  [main:Environment@98] - Client environment:os.name=Linux
2021-09-21 04:41:55,405 [myid:] - INFO  [main:Environment@98] - Client environment:os.arch=amd64
2021-09-21 04:41:55,405 [myid:] - INFO  [main:Environment@98] - Client environment:os.version=5.4.0-42-generic
2021-09-21 04:41:55,406 [myid:] - INFO  [main:Environment@98] - Client environment:user.name=root
2021-09-21 04:41:55,406 [myid:] - INFO  [main:Environment@98] - Client environment:user.home=/root
2021-09-21 04:41:55,406 [myid:] - INFO  [main:Environment@98] - Client environment:user.dir=/apache-zookeeper-3.7.0-bin
2021-09-21 04:41:55,406 [myid:] - INFO  [main:Environment@98] - Client environment:os.memory.free=56MB
2021-09-21 04:41:55,408 [myid:] - INFO  [main:Environment@98] - Client environment:os.memory.max=256MB
2021-09-21 04:41:55,408 [myid:] - INFO  [main:Environment@98] - Client environment:os.memory.total=64MB
2021-09-21 04:41:55,414 [myid:] - INFO  [main:ZooKeeper@637] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@7946e1f4
2021-09-21 04:41:55,419 [myid:] - INFO  [main:X509Util@77] - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2021-09-21 04:41:55,427 [myid:] - INFO  [main:ClientCnxnSocket@239] - jute.maxbuffer value is 1048575 Bytes
2021-09-21 04:41:55,439 [myid:] - INFO  [main:ClientCnxn@1726] - zookeeper.request.timeout value is 0. feature enabled=false
Welcome to ZooKeeper!
2021-09-21 04:41:55,473 [myid:localhost:2181] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1171] - Opening socket connection to server localhost/127.0.0.1:2181.
2021-09-21 04:41:55,477 [myid:localhost:2181] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1173] - SASL config status: Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2021-09-21 04:41:55,496 [myid:localhost:2181] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1005] - Socket connection established, initiating session, client: /127.0.0.1:41234, server: localhost/127.0.0.1:2181
2021-09-21 04:41:55,513 [myid:localhost:2181] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1438] - Session establishment complete on server localhost/127.0.0.1:2181, session id = 0x3006ad4fdbc0002, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] deleteall /clickhouse/statdata/collectd_metrics/replicas/replica_2
[zk: localhost:2181(CONNECTED) 1] 2021-09-21 04:42:53,061 [myid:] - ERROR [main:ServiceUtils@42] - Exiting JVM with code 0
root@zk03:/apache-zookeeper-3.7.0-bin#

Unable to continue upgrading an old cacti 0.8.8 to the latest version 1.2.18

Upgrading an old instance of cacti monitoring software may become a challenge, because of multiple new recommendations and requirements for the latest version 1.2.x.
There are a couple of b recommendations like memory limit and maximum execution time and multiple plugin requirements, which if not fulfilled the setup cannot continue. Second, there are the MySQL recommendations and there is an option innodb_file_format, which in general, is recommended to be Barracuda, but by default, in older version of MySQL use Antelope!

Upgrading from CACTI version 0.8.8 is successful to CACTI version 1.2.3, but then the upgrade process just began restarting and failed to upgrade to the final target CACTI version 1.2.18 because of the old MySQL InnoDB table format – Antelope.

Despite the Barracuda is just recommended and the upgrade process continues through the steps of the setup wizard, it just suddenly stops and returns to the welcome install screen.
Setting the option innodb_file_format resolves the problem and the upgrade setup finishes successfully the upgrade from CACTI version 0.8.8 (apparently with an intermediate upgrade to CACTI version 1.2.3) to CACTI version 1.2.18.

innodb_file_format=barracuda

Probably, this option will be a mandatory MySQL option for upgrading to a newer CACTI version after 1.2.3.

Several screenshots of recommendations and requirements for upgrading to CACTI 1.2.28

Keep on reading!

alter table error 1215 (HY000): Cannot add foreign key constraint

Adding a foreign key may result in the following error:

mysql> ALTER TABLE `table1` ADD CONSTRAINT FK_table2_id FOREIGN KEY (`table2_id`) REFERENCES table2(`ID`);
ERROR 1215 (HY000): Cannot add foreign key constraint

This error can occur because in couple of cases and it is not very informative, but basically, it says there is some kind of compatibility issue between the table1.table_id and table2.ID. And the three most common problems, besides the tables and columns exists, are:

  1. One of the two table are not Innodb.
    SHOW CREATE TABLE `table1`;
    
  2. There is a value (or values) in table1.table_id, which does not exists in table2.ID.
  3. SELECT `table1`.`table2_id`, `table2`.`ID` FROM `table1` LEFT JOIN `table2` ON `table1`.`table2_id`=`table2`.`ID`;
    

    Execute a left join to check whether there are NULL values in the table1.table2_id. If NULL values exist a record with the same missing IDs should be inserted in table2.ID. this kind of check may be stopped temporarily with:

    SET FOREIGN_KEY_CHECKS=0;
    

    Change it back to 1 after the ALTER clause.

  4. The type and the attributes of the columns are different. The attributes such as UNSINGED may cause a problem, too!
    ALTER TABLE `table1` CHANGE `table2_id` `table2_id` INT(11) UNSIGNED NOT NULL; 
    

    In this case, the column types are the same, but the attributes are different and the MySQL server throws the error! Change the signed int to unsigned with the above command. And after the table1.table2_id and table2.ID are of the same type and they have the same attributes, the alter command will be executed successfully.

Here are the initial structure of above tables:

CREATE TABLE `table1` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `table2_id` int(11) NOT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=220 DEFAULT CHARSET=utf8
CREATE TABLE `table2` (
  `ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `col1` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=642 DEFAULT CHARSET=utf8 

And after a successful ALTER statement, the table1 will looks like:

ALTER TABLE `table1` ADD CONSTRAINT `FK_table2_id` FOREIGN KEY (`table2_id`) REFERENCES `table1`(`ID`);
Query OK, 216 rows affected (0.92 sec)
Records: 216  Duplicates: 0  Warnings: 0

CREATE TABLE `table1` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `table2_id` int(11) NOT NULL,
  PRIMARY KEY (`ID`),
  KEY `FK_table2_id` (`table1_id`),
  CONSTRAINT `FK_table2_id` FOREIGN KEY (`table2_id`) REFERENCES `table2` (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=220 DEFAULT CHARSET=utf8

Gentoo emerge virtualbox- Mesa / GLU: Mesa not found at, Mesa headers not found

Emerging the package app-emulation/virtualbox the following error occurs:

Checking for Mesa / GLU: 
  Mesa not found at -L/usr/X11R6/lib -L/usr/X11R6/lib64 -L/usr/local/lib -lXext -lX11 -lGL -I/usr/local/include or Mesa headers not found
  Check the file /var/tmp/portage/app-emulation/virtualbox-6.1.18/work/VirtualBox-6.1.18/configure.log for detailed error information.
Check /var/tmp/portage/app-emulation/virtualbox-6.1.18/work/VirtualBox-6.1.18/configure.log for details
 * ERROR: app-emulation/virtualbox-6.1.18::gentoo failed (configure phase):
 *   (no error message)
 * 
 * Call stack:
 *     ebuild.sh, line  125:  Called src_configure
 *   environment, line 5504:  Called doecho './configure' '--with-gcc=x86_64-pc-linux-gnu-gcc' '--with-g++=x86_64-pc-linux-gnu-g++' '--disable-dbus' '--disable-kmods' '--disable-alsa' '--disable-docs' '--disable-devmapper' '--disable-pulse' '--disable-python' '--enable-webservice' '--enable-vnc'
 *   environment, line 1538:  Called die
 * The specific snippet of code:
 *       "$@" || die

The configure script reports the mesa is missing, but the package media-libs/mesa is installed. Reinstalling does not fix the problem.
Farther investigation in the logs by checking the configure.log reveals the real problem:

srv ~ # tail -n 16 /var/tmp/portage/app-emulation/virtualbox-6.1.18/work/VirtualBox-6.1.18/configure.log
***** Checking Mesa / GLU *****
compiling the following source file:
#include <cstdio>
#include <X11/Xlib.h>
#include <GL/glx.h>
#include <GL/glu.h>
extern "C" int main(void)
{
  return 0;
}
using the following command line:
x86_64-pc-linux-gnu-g++  -fPIC -g -O -Wall -o /var/tmp/portage/app-emulation/virtualbox-6.1.18/work/VirtualBox-6.1.18/.tmp_out /var/tmp/portage/app-emulation/virtualbox-6.1.18/work/VirtualBox-6.1.18/.tmp_src.cc "-L/usr/X11R6/lib -L/usr/X11R6/lib64 -L/usr/local/lib -lXext -lX11 -lGL -I/usr/local/include"
/var/tmp/portage/app-emulation/virtualbox-6.1.18/work/VirtualBox-6.1.18/.tmp_src.cc:4:10: fatal error: GL/glu.h: No such file or directory
    4 | #include <GL/glu.h>
      |          ^~~~~~~~~~
compilation terminated.

The glu part of mesa is missing. In Gentoo, the glu (https://gitlab.freedesktop.org/mesa/glu) is not included in the media-libs/mesa and it is a separate package media-libs/glu.

The solution is to emerge media-libs/glu and then the app-emulation/virtualbox.

emerge -v media-libs/glu

Another Linux distribution may include glu in the main mesa package.

Here, the conclusion is to always check the configure.log, because it reports the exact error and not to trust the generic output of the configure script.

Stopping the glusterfs volume releases disk sleep process hangs

A quick tip for GlusterFS volume. There are multiple possible reasons for a Linux process to hang in “Disk Sleep” state, which even the KILL -9 cannot interrupt:

  • a bug in GlusterFS
  • just bad options turn on online
  • other device relying on a GlusterFS, which is unavailable.
[17294588.184470] INFO: task gdisk:12505 blocked for more than 120 seconds.
[17294588.184538] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17294588.184628] gdisk           D ffff8ce01fb9acc0     0 12505  26866 0x00000080
[17294588.184780] Call Trace:
[17294588.184844]  [<ffffffffbaed3d81>] ? __wake_up_common_lock+0x91/0xc0
[17294588.184910]  [<ffffffffbb585da9>] schedule+0x29/0x70
[17294588.184974]  [<ffffffffbb5838b1>] schedule_timeout+0x221/0x2d0
[17294588.185037]  [<ffffffffbaed3dc3>] ? __wake_up+0x13/0x20
[17294588.185102]  [<ffffffffc0a05d2e>] ? loop_make_request+0x12e/0x210 [loop]
[17294588.185169]  [<ffffffffbaf06d32>] ? ktime_get_ts64+0x52/0xf0
[17294588.185232]  [<ffffffffbb58549d>] io_schedule_timeout+0xad/0x130
[17294588.185304]  [<ffffffffbb5863dd>] wait_for_completion_io+0xfd/0x140
[17294588.185369]  [<ffffffffbaedb990>] ? wake_up_state+0x20/0x20
[17294588.185468]  [<ffffffffbb157e64>] blkdev_issue_flush+0xb4/0x110
[17294588.185533]  [<ffffffffbb08d335>] blkdev_fsync+0x35/0x50
[17294588.185598]  [<ffffffffbb082f57>] do_fsync+0x67/0xb0
[17294588.185671]  [<ffffffffbb083240>] SyS_fsync+0x10/0x20
[17294588.185734]  [<ffffffffbb592ed2>] system_call_fastpath+0x25/0x2a
[17294708.187598] INFO: task gdisk:12505 blocked for more than 120 seconds.
[17294708.187664] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17294708.187753] gdisk           D ffff8ce01fb9acc0     0 12505  26866 0x00000080
[17294708.187905] Call Trace:
[17294708.187968]  [<ffffffffbaed3d81>] ? __wake_up_common_lock+0x91/0xc0
[17294708.188033]  [<ffffffffbb585da9>] schedule+0x29/0x70
[17294708.188096]  [<ffffffffbb5838b1>] schedule_timeout+0x221/0x2d0
[17294708.188159]  [<ffffffffbaed3dc3>] ? __wake_up+0x13/0x20
[17294708.188223]  [<ffffffffc0a05d2e>] ? loop_make_request+0x12e/0x210 [loop]
[17294708.188289]  [<ffffffffbaf06d32>] ? ktime_get_ts64+0x52/0xf0
[17294708.188352]  [<ffffffffbb58549d>] io_schedule_timeout+0xad/0x130
[17294708.188416]  [<ffffffffbb5863dd>] wait_for_completion_io+0xfd/0x140
[17294708.188480]  [<ffffffffbaedb990>] ? wake_up_state+0x20/0x20
[17294708.188545]  [<ffffffffbb157e64>] blkdev_issue_flush+0xb4/0x110
[17294708.188624]  [<ffffffffbb08d335>] blkdev_fsync+0x35/0x50
[17294708.188690]  [<ffffffffbb082f57>] do_fsync+0x67/0xb0
[17294708.188754]  [<ffffffffbb083240>] SyS_fsync+0x10/0x20
[17294708.188828]  [<ffffffffbb592ed2>] system_call_fastpath+0x25/0x2a

The above example of dmesg log shows the gdisk process stuck in “Disk Sleep” state, because of a loop device from a file on an unavailable GlusterFS volume! Kill -9 won’t help, the process will remain in this bad state and even a restart would be difficult to perform!

[root@srv1 ~]# gluster volume stop VOL2 
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: VOL2: success
[root@srv1 ~]# gluster volume start VOL2 
volume start: VOL2: success

The solution is to stop the GlusterFS Volume and all the blocked processes on bad devices such as above would be released. The processes will carry on executing or will end their execution after issuing a stop command to the volume. No problem to start the GlusterFS volume immediately after the stop!
NOTE: executing STOP command would affect all servers using this volume. The volume becomes inaccessible for all!