Gentoo clang-15: error: does not contain a GCC installation

main menu
emerge firefox

Trying to build a package resulted in a building failure because a Clang could not find GNU GCC installation as the error shows.

Executing just x86_64-pc-linux-gnu-clang-15 the same error.

[root@srv ~]# x86_64-pc-linux-gnu-clang-15 
clang-15: error: '/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0' does not contain a GCC installation
clang-15: error: no input files

Apparently, because the latest upgrade of GNU GCC went from 12.2.0 to 12.2.1_p20221008 and the directory changed to /usr/lib/gcc/x86_64-pc-linux-gnu/12.
The Clang binaries read several configuration files and one of them was not updated when the GCC had been upgraded. The configuration file /etc/clang/gentoo-gcc-install.cfg has the wrong path, because the gcc-config version was an old one or the configuration file is generated only on GCC major version, not in the minor. But the last upgrade didn’t modified the Clang configuration file /etc/clang/gentoo-gcc-install.cfg

To resolve this issue, the user may edit the file manually or just use gcc-config to revert to the older and then to the new GCC version at once. List the currently installed GCC versions in the system and choose one.

[root@srv ~]# gcc-config -l
 [1] x86_64-pc-linux-gnu-8.2.0
 [2] x86_64-pc-linux-gnu-8.3.0
 [3] x86_64-pc-linux-gnu-9.2.0
 [4] x86_64-pc-linux-gnu-10.3.0
 [5] x86_64-pc-linux-gnu-11.3.0
 [6] x86_64-pc-linux-gnu-12 *
[root@srv ~]# gcc-config x86_64-pc-linux-gnu-11.3.0
 * Switching native-compiler to x86_64-pc-linux-gnu-11.3.0 ...
>>> Regenerating /etc/ld.so.cache...                                                                  [ ok ]
 * If you intend to use the gcc from the new profile in an already
 * running shell, please remember to do:

 *   . /etc/profile

[root@srv ~]#  gcc-config x86_64-pc-linux-gnu-12
 * Switching native-compiler to x86_64-pc-linux-gnu-12 ...
>>> Regenerating /etc/ld.so.cache...                                                                  [ ok ]
 * If you intend to use the gcc from the new profile in an already
 * running shell, please remember to do:

 *   . /etc/profile
[root@srv ~]# . /etc/profile

Check the sys-devel/gcc-config for available upgrades and do them if there are!
More Gentoo tips here.

lxc_attach_run_shell: 1333 Permission denied – failed to exec shell

An annoying error when using the LXC container tools like lxc-attach, which is really simple to fix.

[root@srv ~]# lxc-attach -n db-cluster-3
lxc_container: attach.c: lxc_attach_run_shell: 1333 Permission denied - failed to exec shell
[root@srv ~]#

This error just reports the bash shell in the container cannot be started and the SELinux audit file adds some errors, too:

type=AVC msg=audit(1665745824.682:24229): avc:  denied  { entrypoint } for  pid=20646 comm="lxc-attach" path="/usr/bin/bash" dev="md3" ino=111806476 scontext=system_u:system_r:unconfined_service_t:s0 tcontext=unconfined_u:object_r:var_log_t:s0 tclass=file
type=SYSCALL msg=audit(1665745824.682:24229): arch=c000003e syscall=59 success=no exit=-13 a0=24412c6 a1=7ffe87c07170 a2=2443870 a3=7ffe87c08c60 items=0 ppid=20644 pid=20646 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts12 ses=3304 comm="lxc-attach" exe="/usr/bin/lxc-attach" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=PROCTITLE msg=audit(1665745824.682:24229): proctitle=6C78632D617474616368002D6E0064622D636C75737465722D33
type=AVC msg=audit(1665745824.682:24230): avc:  denied  { entrypoint } for  pid=20646 comm="lxc-attach" path="/usr/bin/bash" dev="md3" ino=111806476 scontext=system_u:system_r:unconfined_service_t:s0 tcontext=unconfined_u:object_r:var_log_t:s0 tclass=file
type=SYSCALL msg=audit(1665745824.682:24230): arch=c000003e syscall=59 success=no exit=-13 a0=7f08b5e579a0 a1=7ffe87c07170 a2=2443870 a3=7ffe87c08c60 items=0 ppid=20644 pid=20646 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts12 ses=3304 comm="lxc-attach" exe="/usr/bin/lxc-attach" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=PROCTITLE msg=audit(1665745824.682:24230): proctitle=6C78632D617474616368002D6E0064622D636C75737465722D33

So clearly, the problem is in SELinux, and turn it off temporarily with

setenforce 0

Turning off the SELinux is not the right thing! There are two aspects to the problem:

  • Missing SELinux rules, which are installed with a special package container-selinux
  • Wrong SELinux permissions for the LXC container’s root directory. In most cases, the user just changes the default /var/lib/lxc/[container] to something new and the LXC works, but it breaks some LXC parts.

Installing container-selinux is easy:

dnf install -y container-selinux

Or the old yum:

yum install -y container-selinux

Then check the SELinux attributes with:

[root@srv ~]# ls -altrZ /mnt/storage/servers/mycontainer/
drwxr-xr-x. root root unconfined_u:object_r:var_log_t:s0 ..
-rw-r--r--. root root unconfined_u:object_r:var_log_t:s0 config
drwxrwx---. root root unconfined_u:object_r:var_log_t:s0 .
drwxr-xr-x. root root unconfined_u:object_r:var_log_t:s0 rootfs

The problem is var_log_t, which is an SELinux file context and it should be container_var_lib_t. Stop the container and fix the permissions. If the default directory (/var/lib/lxc) were used, it would not have this problem. Adding the SELinux file context definition to the new directory is mandatory when changing the directory root of a container:

[root@srv ~]# semanage fcontext -a -t container_var_lib_t '/mnt/storage/servers/mycontainer(/.*)?'
[root@srv ~]# restorecon -Rv /mnt/storage/servers/mycontainer/
restorecon reset /mnt/storage/servers/mycontainer context unconfined_u:object_r:var_log_t:s0->unconfined_u:object_r:container_var_lib_t:s0
.....
.....
restorecon reset /mnt/storage/servers/mycontainer/config context unconfined_u:object_r:var_log_t:s0->unconfined_u:object_r:container_var_lib_t:s0

All files permissions under /mnt/storage/servers/mycontainer/ should be fixed with the restorecon. Start the LXC container and try to attach it with lxc-attach. Now, there should not be any errors:

[root@srv ~]# lxc-attach -n mycontainer
[root@mycontainer ~]#

The files’ context is the right one – container_var_lib_t:

[root@srv ~]# ls -altrZ /mnt/storage/servers/mycontainer/
drwxr-xr-x. root root unconfined_u:object_r:var_log_t:s0 ..
-rw-r--r--. root root unconfined_u:object_r:container_var_lib_t:s0 config
drwxrwx---. root root unconfined_u:object_r:container_var_lib_t:s0 .
drwxr-xr-x. root root unconfined_u:object_r:container_var_lib_t:s0 rootfs

More on LXC containershttps://ahelpme.com/category/software/lxc/.

Recover from Unable to fetch live group_replication member data from any server in cluster

main menu
MySQL Router Unable to fetch live group_replication

After multiple networking connectivity issues between MySQL InnoDB Cluster nodes, the cluster may break and the MySQL Router begins to log the following messages:

2022-10-11 15:20:48 metadata_cache ERROR [7f2d619fe640] Unable to fetch live group_replication member data from any server in cluster 'mycluster1'
2022-10-11 15:20:49 metadata_cache WARNING [7f2d619fe640] Member db-cluster-2:3306 (05b6c7c7-f285-11ec-adfc-00163e0b38ff) defined in metadata not found in actual Group Replication
2022-10-11 15:20:49 metadata_cache WARNING [7f2d619fe640] db-cluster-1:3306 is not part of quorum for cluster 'mycluster1'
2022-10-11 15:20:49 metadata_cache WARNING [7f2d619fe640] Member db-cluster-1:3306 (8bf2c25f-90ae-11ec-93d1-00163e20a401) defined in metadata not found in actual Group Replication
2022-10-11 15:20:49 metadata_cache WARNING [7f2d619fe640] db-cluster-3:3306 is not part of quorum for cluster 'mycluster1'
2022-10-11 15:20:49 metadata_cache WARNING [7f2d619fe640] Member db-cluster-3:3306 (99856952-90ae-11ec-9a5f-fafd8f1acc17) defined in metadata not found in actual Group Replication
2022-10-11 15:20:49 metadata_cache WARNING [7f2d619fe640] db-cluster-2:3306 is not part of quorum for cluster 'mycluster1'

And in MySQL nodes there are also the errors of unable to connect to 33061:

2022-10-11T15:16:25.728393Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to peer node db-cluster-1:33061 when joining a group. My local port is: 33061.'
2022-10-11T15:16:25.728714Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to peer node db-cluster-3:33061 when joining a group. My local port is: 33061.'
2022-10-11T15:16:25.729195Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to peer node db-cluster-1:33061 when joining a group. My local port is: 33061.'
2022-10-11T15:16:25.729569Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to peer node db-cluster-3:33061 when joining a group. My local port is: 33061.'
2022-10-11T15:16:25.730154Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to peer node db-cluster-1:33061 when joining a group. My local port is: 33061.'
2022-10-11T15:16:25.730474Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to peer node db-cluster-3:33061 when joining a group. My local port is: 33061.'
2022-10-11T15:16:25.730485Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error connecting to all peers. Member join failed. Local port: 33061'
2022-10-11T15:16:25.782015Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 33061'

When a MySQL Cluster node node cannot connect to the 33061 (or 3306X), it may be a signal for a firewall issue or the group replication has not started on this node, which is the case here!

The group replication is not working and the cluster must be recovered. All nodes wait to join an exiting cluster, which is not available. It has not stared yet and it would not start alone even the administrator may restart all the nodes.
Keep on reading!

Delete Glusterfs volume when a peer is down – failed: Some of the peers are down

Deleting GlusterFS volumes may fail with an error, pointing out some of the peers are down, i.e. they are disconnected. Even all the volume’s peers of the volume the user is trying to delete are available, still the error appears and it is not possible to delete the volume.
That’s because GlusterFS by design stores the volume configuration spread to all peers – no matter they host a brick/arbiter of the volume or not. If a peer is a part of a GlusterFS setup, it is mandatory to be available and online in the peer status, to be able to delete a volume.
If the user still wants to delete the volume:

  1. * Force remove the brink, which was hosted on the detached peer. If any!
  2. Detach the disconnected peer from the peers
  3. Delete the volume

Here are real examples with and without a brick on the unavailable peer.
The initial volumes and peers configuration:

[root@srv1 ~]# gluster volume info
 
Volume Name: VOL1
Type: Replicate
Volume ID: 02ff2995-7307-4f3d-aa24-862edda7ce81
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ng1:/mnt/storage1/glusterfs/brick1
Brick2: ng3:/mnt/storage1/glusterfs/brick1
Brick3: ng1:/mnt/storage1/glusterfs/arbiter1 (arbiter)
Options Reconfigured:
features.scrub: Active
features.bitrot: on
cluster.self-heal-daemon: enable
storage.linux-io_uring: off
client.event-threads: 4
performance.cache-max-file-size: 50MB
performance.parallel-readdir: on
network.inode-lru-limit: 200000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
performance.cache-size: 2048MB
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
 
Volume Name: VOL2
Type: Replicate
Volume ID: fc2e82e4-2576-4bb1-b9bf-c6b2aff10ef0
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ng1:/mnt/storage1/glusterfs/brick2
Brick2: ng2:/mnt/storage1/glusterfs/brick2
Brick3: ng1:/mnt/storage1/glusterfs/arbiter2 (arbiter)
Options Reconfigured:
features.scrub: Active
features.bitrot: on
cluster.self-heal-daemon: enable
storage.linux-io_uring: off
performance.parallel-readdir: on
network.compression: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
features.cache-invalidation: on

[root@srv ~]# gluster peer status
Number of Peers: 2

Hostname: ng1
Uuid: 7953514b-b52c-4a5c-be03-763c3e24eb4e
State: Peer in Cluster (Connected)

Hostname: ng3
Uuid: 3d273834-eca6-4997-871f-1a282ca90fb0
State: Peer in Cluster (Disconnected)

Delete a GlusterFS volume – all bricks and bricks’ peers are available, but another peer is not.

First, the error, when the disconnected peer is still in peer status list.

[root@srv ~]# gluster volume stop VOL2
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: VOL2: success
[root@srv ~]# gluster volume delete VOL2
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
volume delete: VOL2: failed: Some of the peers are down

Keep on reading!

Removing of kwayland-server and kwayland-server” is soft blocking kde-plasma/kwin-5.25.2

A big change for Plasma KDE happened two months ago – a “Merge kwayland-server into kwin“.
So after KDE Plasma 5.25, there is no kwayland-server any more (respectively no kwayland-server with version 5.25 and no package in Gentoo) and it may block a Gentoo update with the following error:

mydesktop root # emerge -va --verbose-conflicts --verbose --backtrack=300 $(qlist -IC|grep -i kde)
......
......
[ebuild     U  ] dev-util/kdevelop-php-22.04.2:5::gentoo [21.12.3:5::gentoo] USE="handbook -debug -test" 1,057 KiB
[ebuild     U  ] kde-apps/umbrello-22.04.2:5::gentoo [21.12.3:5::gentoo] USE="handbook php -debug -test" 5,544 KiB
[ebuild     U  ] kde-apps/kross-interpreters-22.04.2:5::gentoo [21.12.3:5::gentoo] USE="-debug" 149 KiB
[blocks B      ] kde-plasma/kwayland-server ("kde-plasma/kwayland-server" is soft blocking kde-plasma/kwin-5.25.2)

Total: 340 packages (329 upgrades, 5 new, 6 reinstalls), Size of downloads: 1,001,699 KiB
Conflict: 1 block (1 unsatisfied)

 * Error: The above package list contains packages which cannot be
 * installed at the same time on the same system.

  (kde-plasma/kwayland-server-5.24.5-r1:5/5::gentoo, ebuild scheduled for merge) pulled in by
    kde-plasma/kwayland-server
    kde-plasma/kwayland-server:5::gentoo required by @selected 
    kde-plasma/kwayland-server required by @selected 

  (kde-plasma/kwin-5.25.2:5/5::gentoo, ebuild scheduled for merge) pulled in by
    >=kde-plasma/kwin-5.25.2:5 required by (kde-plasma/plasma-desktop-5.25.2:5/5::gentoo, ebuild scheduled for merge) USE="handbook ibus kaccounts scim semantic-desktop -debug -emoji -telemetry -test" ABI_X86="(64)"
    >=kde-plasma/kwin-5.25.2:5[lock] required by (kde-plasma/plasma-meta-5.25.2:5/5::gentoo, ebuild scheduled for merge) USE="accessibility bluetooth browser-integration crash-handler crypt desktop-portal display-manager elogind gtk handbook kwallet legacy-systray networkmanager pulseaudio sddm smart wallpapers -colord -discover (-firewall) -grub -plymouth -sdk -systemd -thunderbolt" ABI_X86="(64)"
    >=kde-plasma/kwin-5.25.2:5 required by (kde-plasma/libkworkspace-5.25.2:5/5::gentoo, ebuild scheduled for merge) USE="-debug -test" ABI_X86="(64)"
    >=kde-plasma/kwin-5.25.2:5 required by (kde-plasma/plasma-workspace-5.25.2:5/5::gentoo, ebuild scheduled for merge) USE="calendar fontconfig geolocation handbook policykit semantic-desktop -appstream -debug -gps -screencast -telemetry -test" ABI_X86="(64)"

emerge could not continue with the upgrade to KDE Platform 5.25.2.

main menu
emerge error

kwayland-server is pulled by selected, but the last version of the package is from 5.24 release, which should immediately signal that there is something wrong with it, because the emerge command shows the latest KDE Plasma version to be 5.25 (with the exact version 5.25.2).

Solution – deselect/remove kde-plasma/kwayland-server

The solution is simple, just deselect it from the world slot to be sure it won’t be pulled again in the future. Remove the package manually if the error still persists, but only deselecting should work. Of course, it should not be selected in the command-line with emerge, neither. In general, such package won’t be available any more.
Always keep eye on the pulled versions and the versions you are trying to install, most of the time the problem is obvious and from a single “wrong/bad” package, which may generate e great deal of erroneous and frightening dependencies output.

mydesktop root # emerge --deselect kwayland-server
>>> Removing kde-plasma/kwayland-server from "world" favorites file...
>>> Removing kde-plasma/kwayland-server:5::gentoo from "world" favorites file...

And now the emerge command is OK and no problem with the dependencies and blocks:

mydesktop root # emerge -va --verbose-conflicts --verbose --backtrack=300 $(qlist -IC|grep -i kde|grep -v kwayland-server)
......
......
[ebuild  N     ] kde-plasma/kwin-5.25.2:5::gentoo  USE="accessibility (caps) handbook lock multimedia -debug -gles2-only -plasma -screencast -test" 6,468 KiB
[uninstall     ] kde-plasma/kwayland-server-5.24.3:5::gentoo  USE="-debug -doc -test" 
[blocks b      ] kde-plasma/kwayland-server ("kde-plasma/kwayland-server" is soft blocking kde-plasma/kwin-5.25.2)
[ebuild     U  ] kde-plasma/libkworkspace-5.25.2:5::gentoo [5.24.3:5::gentoo] USE="-debug -test" 0 KiB
......
......
[ebuild     U  ] kde-apps/akregator-22.04.2:5::gentoo [21.12.3:5::gentoo] USE="handbook -debug -speech -telemetry -test" 2,209 KiB

Total: 339 packages (328 upgrades, 5 new, 6 reinstalls, 1 uninstall), Size of downloads: 1,001,483 KiB
Conflict: 1 block (all satisfied)

More on Gentoo blocking – Gentoo update tips when updating packages with blocks and masked files

Building python 3.10.4 and possibly undefined macro: AC_MSG_ERROR

Emerging the new python 3.10 in Gentoo may lead to the following error, despite all the dependencies installed. This error might also occur in any other Linux distro! During the configure stage the autoconf tool outputs error:

root@srv ~ # cat /var/tmp/portage/dev-lang/python-3.10.5/temp/autoconf.out 
***** autoconf *****
***** PWD: /var/tmp/portage/dev-lang/python-3.10.5/work/Python-3.10.5
***** autoconf --force

configure.ac:59: warning: The macro `AC_CONFIG_HEADER' is obsolete.
configure.ac:59: You should run autoupdate.
./lib/autoconf/status.m4:719: AC_CONFIG_HEADER is expanded from...
configure.ac:59: the top level
configure.ac:911: warning: AC_LINK_IFELSE was called before AC_USE_SYSTEM_EXTENSIONS
./lib/autoconf/specific.m4:364: AC_USE_SYSTEM_EXTENSIONS is expanded from...
configure.ac:911: the top level
configure.ac:2214: warning: The macro `AC_HEADER_STDC' is obsolete.
configure.ac:2214: You should run autoupdate.
./lib/autoconf/headers.m4:704: AC_HEADER_STDC is expanded from...
configure.ac:2214: the top level
configure.ac:4250: warning: The macro `AC_HEADER_TIME' is obsolete.
configure.ac:4250: You should run autoupdate.
./lib/autoconf/headers.m4:743: AC_HEADER_TIME is expanded from...
configure.ac:4250: the top level
configure.ac:18: error: possibly undefined macro: AC_MSG_ERROR
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation

It appears a dependency is missing! First, build the package sys-devel/autoconf-archive and then the building of python-3.10.5 will finish successfully.

root@srv ~ # emerge -va autoconf-archive

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild  N     ] sys-devel/autoconf-archive-2022.02.11::gentoo  660 KiB

Total: 1 package (1 new), Size of downloads: 660 KiB

Would you like to merge these packages? [Yes/No] yes

Emerging the dev-lang/python-3.10.5::gentoo outputs the error and the building process stops. The error output in emerge command is so informative. The actual error is in the /var/tmp/portage/dev-lang/python-3.10.5/temp/autoconf.out.
Keep on reading!

Recovery of MySQL 8 Cluster instance after server crash and corrupted data in log event

There is a MySQL 8 Cluster InnoDB of three servers and one of the server crashed with a bad RAM. The same setup is described here – Install and deploy MySQL 8 InnoDB Cluster with 3 nodes under CentOS 8 and MySQL Router for HA. The failed server got restarted without clean shutdown and after booting up the MySQL Cluster node tried to recover automatically, but the recover process failed and the node left the group of the three server:

2022-05-31T04:00:00.322469Z 24 [ERROR] [MY-011620] [Repl] Plugin group_replication reported: 'Fatal error during the incremental recovery process of Group Replication. The server will leave the group.'
2022-05-31T04:00:00.322489Z 24 [Warning] [MY-011645] [Repl] Plugin group_replication reported: 'Skipping leave operation: concurrent attempt to leave the group is on-going.'
2022-05-31T04:00:00.322500Z 24 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'
2022-05-31T04:00:03.448475Z 0 [System] [MY-011504] [Repl] Plugin group_replication reported: 'Group membership changed: This member has left the group.'

The recovery process proposed here follows these steps

  1. Connect with mysqlsh (MySQL Shell) to a MySQL instance, which is currently a part of the cluster group. The member, which left the group is not part any more, though the MySQL Cluster status shows it is part of the cluster topology, but with error.
  2. Remove the bad instance from the MySQL Cluster with removeInstance
  3. Add the instance with addInstance and the recovery process will kick in. The type of the recovery process will be chosen by the setup if not specified. In this case, the setup chooses the Incremental state recovery over (full) clone mode.
  4. Initiate the cluster rescan operation to recovery the group replication and the MySQL Cluster.

mysql

Summery of the recovery process

  • The recovery process was successful.
  • The distributed recovery with Incremental state recovery has finished for 24 hours for 200Mbyte database, which is really strange and the speed was really bad. The instance uses ordinary disks, not SSDs and a 1Gbps network.
  • No need to change or manage the MySQL Router in any of the steps or the recovery stages. It handled the situation from the very beginning by removing the bad instance and then adding it again only after the recovery process had finished successfully.
  • MySQL Shell should be connected to an healthy instance currently a part of the Cluster.

In the console output logs all commands and important lines are highlighted.

STEP 1) Remove the bad instance from the cluster.

The status of the cluster with the bad instance.

[root@db-cluster-3 ~]# mysqlsh
MySQL Shell 8.0.28

Copyright (c) 2016, 2022, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.
 MySQL  JS > \connect clusteradmin@db-cluster-1
Creating a session to 'clusteradmin@db-cluster-1'
Fetching schema names for autocompletion... Press ^C to stop.
Closing old connection...
Your MySQL connection id is 39806649 (X protocol)
Server version: 8.0.28 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.
 MySQL  db-cluster-1:33060+ ssl  JS > var cluster = dba.getCluster()
 MySQL  db-cluster-1:33060+ ssl  JS > cluster.status()
{
    "clusterName": "mycluster1", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "db-cluster-1:3306", 
        "ssl": "REQUIRED", 
        "status": "OK_NO_TOLERANCE", 
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active.", 
        "topology": {
            "db-cluster-1:3306": {
                "address": "db-cluster-1:3306", 
                "memberRole": "PRIMARY", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "replicationLag": null, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.28"
            }, 
            "db-cluster-2:3306": {
                "address": "db-cluster-2:3306", 
                "memberRole": "SECONDARY", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "replicationLag": null, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.28"
            }, 
            "db-cluster-3:3306": {
                "address": "db-cluster-3:3306", 
                "instanceErrors": [
                    "ERROR: group_replication has stopped with an error."
                ], 
                "memberRole": "SECONDARY", 
                "memberState": "ERROR", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "role": "HA", 
                "status": "(MISSING)", 
                "version": "8.0.28"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "db-cluster-1:3306"
}

Keep on reading!

lxc and interface lo does not exist in virtualized server

Virtualizing a real server with an LXC container is pretty easy – do a rsync and run it. Sometimes there are some glitches when starting the LXC container for the first time. Such errors like the following – no networking available at the start, but when attached to the started container it seems to have the network interfaces with no IPs. Even, though it is possible to set the IPs manually the init scripts do not work.

[root@srv ~]# lxc-start -F -n n7763.node-int.info
lxc-start: live300.mytv.bg: start.c: proc_pidfd_open: 1607 Function not implemented - Failed to send signal through pidfd
INIT: version 2.88 booting

   OpenRC 0.12.4 is starting up Gentoo Linux (x86_64) [LXC]

 * /proc is already mounted
 * Mounting /run ... * /run/openrc: creating directory
 * /run/lock: creating directory
 * /run/lock: correcting owner
 * Caching service dependencies ... [ ok ]
 * setting up tmpfiles.d entries for /dev ... [ ok ]
 * Creating user login records ... [ ok ]
 * Wiping /tmp directory ... [ ok ]
 * Bringing up network interface lo ...RTNETLINK answers: File exists
 [ ok ]
 * Updating /etc/mtab ... [ ok ]
 * Bringing up interface lo
 *   ERROR: interface lo does not exist
 *   Ensure that you have loaded the correct kernel module for your hardware
 * ERROR: net.lo failed to start
 * setting up tmpfiles.d entries ... [ ok ]
INIT: Entering runlevel: 3
 * Loading iptables state and starting firewall ... [ ok ]
 * Bringing up interface lo
 *   ERROR: interface lo does not exist
 *   Ensure that you have loaded the correct kernel module for your hardware
 * ERROR: net.lo failed to start
 * Bringing up interface eth0
 *   ERROR: interface eth0 does not exist
 *   Ensure that you have loaded the correct kernel module for your hardware
 * ERROR: net.eth0 failed to start

And it appeared that the old /dev was still in place, which messed up with virtualization and the init scripts.
The solution is simple just

  1. remove the existing /dev
  2. create a new empty one

And the LXC container of the real server will start with a network as usual.

So when virtualizing a real server into LXC container after doing RSYNC of the storage, it is mandatory to create an empty /dev, /proc, and /sys directories!

More on the LXC containers – Run LXC CentOS 8 container with bridged network under CentOS 8.

git status and bus error on SSD – fix READ errors by recovering part of the file

SSD and Linux encryption may not be the best idea, especially without the TRIM (allow-discards) option (or never executed fstrim?). Nevertheless, this error may occur not only on an SSD device, but just where there is a corrupted file system or device.
In our case, the SSD has some read errors. Apparently, some files or some parts of files could not be read by the git command:

[myuser@dekstop kernel]# git status -v
Bus error 84/115708)

In the case of SSD bad reads, the only working solution is to find and overwrite the problem file(s) or remove the file(s) and recreate them. A more sophisticated solution is to dump the file with dd and skip errors option enabled to another location and then overwrite the old file with the new one. So only the corrupted area of the file will be lost, which in most cases is just one or two sectors, i.e. one or two 512 bytes of data.

STEP 1) Find the bad files with the find command.

Use find Linux command and read all the files with the cat Linux command, so a bad sector will output an input/output error on READ. On write errors won’t be generated, but the sector will be automatically moved to a healthy one (the bad sector is marked and never used more).

[myuser@dekstop kernel]#  find -type f -exec cat {} > /dev/null \;
cat: ./servers/logo_description.txt: Input/output error

If multiple files are found repeat the procedure with each file.

STEP 2) Copy the healthy portion of the file.

The easiest way to remove the error is just to delete the file (or overwrite it), but if the healthy portion of the file is desirable the dd utility may be used to recover it:
Keep on reading!

Debug options for LXC and lxc-start when lxc container could not start

Setup and running LXC container is really easy, but sometimes it is unclear why the LXC container could not start. Most of the time, there is a generic error, which says nothing for the real reason:

root@srv ~ # lxc-start -n test-lxc
lxc-start: test-lxc: lxccontainer.c: wait_on_daemonized_start: 867 Received container state "ABORTING" instead of "RUNNING"
lxc-start: test-lxc: tools/lxc_start.c: main: 306 The container failed to start
lxc-start: test-lxc: tools/lxc_start.c: main: 309 To get more details, run the container in foreground mode
lxc-start: test-lxc: tools/lxc_start.c: main: 311 Additional information can be obtained by setting the --logfile and --logpriority options

No specific reason why the LXC container test-lxc can not be started and the lxc-start command failed. There is just an offer to use the logging options and here is how the administrator of the box may do it by including the following lxc-start options:

-l DEBUG –logfile=test-lxc.log –logpriority=9

Here is a real-world example of an old kernel trying to run LXC 4.0
Keep on reading!