Recover from Unable to fetch live group_replication member data from any server in cluster

Author:

main menu
MySQL Router Unable to fetch live group_replication

After multiple networking connectivity issues between MySQL InnoDB Cluster nodes, the cluster may break and the MySQL Router begins to log the following messages:

2022-10-11 15:20:48 metadata_cache ERROR [7f2d619fe640] Unable to fetch live group_replication member data from any server in cluster 'mycluster1'
2022-10-11 15:20:49 metadata_cache WARNING [7f2d619fe640] Member db-cluster-2:3306 (05b6c7c7-f285-11ec-adfc-00163e0b38ff) defined in metadata not found in actual Group Replication
2022-10-11 15:20:49 metadata_cache WARNING [7f2d619fe640] db-cluster-1:3306 is not part of quorum for cluster 'mycluster1'
2022-10-11 15:20:49 metadata_cache WARNING [7f2d619fe640] Member db-cluster-1:3306 (8bf2c25f-90ae-11ec-93d1-00163e20a401) defined in metadata not found in actual Group Replication
2022-10-11 15:20:49 metadata_cache WARNING [7f2d619fe640] db-cluster-3:3306 is not part of quorum for cluster 'mycluster1'
2022-10-11 15:20:49 metadata_cache WARNING [7f2d619fe640] Member db-cluster-3:3306 (99856952-90ae-11ec-9a5f-fafd8f1acc17) defined in metadata not found in actual Group Replication
2022-10-11 15:20:49 metadata_cache WARNING [7f2d619fe640] db-cluster-2:3306 is not part of quorum for cluster 'mycluster1'

And in MySQL nodes there are also the errors of unable to connect to 33061:

2022-10-11T15:16:25.728393Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to peer node db-cluster-1:33061 when joining a group. My local port is: 33061.'
2022-10-11T15:16:25.728714Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to peer node db-cluster-3:33061 when joining a group. My local port is: 33061.'
2022-10-11T15:16:25.729195Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to peer node db-cluster-1:33061 when joining a group. My local port is: 33061.'
2022-10-11T15:16:25.729569Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to peer node db-cluster-3:33061 when joining a group. My local port is: 33061.'
2022-10-11T15:16:25.730154Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to peer node db-cluster-1:33061 when joining a group. My local port is: 33061.'
2022-10-11T15:16:25.730474Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to peer node db-cluster-3:33061 when joining a group. My local port is: 33061.'
2022-10-11T15:16:25.730485Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error connecting to all peers. Member join failed. Local port: 33061'
2022-10-11T15:16:25.782015Z 0 [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 33061'

When a MySQL Cluster node node cannot connect to the 33061 (or 3306X), it may be a signal for a firewall issue or the group replication has not started on this node, which is the case here!

The group replication is not working and the cluster must be recovered. All nodes wait to join an exiting cluster, which is not available. It has not stared yet and it would not start alone even the administrator may restart all the nodes.

To recover the cluster find the last read-write (primary) node and issue a MySQL InnoDB Cluster recovery command from an outrage:

[root@db-cluster-2 ~]# mysqlsh
MySQL Shell 8.0.29

Copyright (c) 2016, 2022, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.
 MySQL  JS > \connect clusteradmin@db-cluster-2
Creating a session to 'clusteradmin@db-cluster-2'
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 1518 (X protocol)
Server version: 8.0.28 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.
 MySQL  db-cluster-2:33060+ ssl  JS > var cluster = dba.rebootClusterFromCompleteOutage()
Restoring the cluster 'mycluster1' from complete outage...

The instance 'db-cluster-1:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y

The instance 'db-cluster-3:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y

* Waiting for seed instance to become ONLINE...
db-cluster-2:3306 was restored.
Rejoining 'db-cluster-1:3306' to the cluster.
Rejoining instance 'db-cluster-1:3306' to cluster 'mycluster1'...

The instance 'db-cluster-1:3306' was successfully rejoined to the cluster.

Rejoining 'db-cluster-3:3306' to the cluster.
Rejoining instance 'db-cluster-3:3306' to cluster 'mycluster1'...

The instance 'db-cluster-3:3306' was successfully rejoined to the cluster.

The cluster was successfully rebooted.

 MySQL  db-cluster-2:33060+ ssl  JS > cluster.status()
{
    "clusterName": "mycluster1", 
    "defaultReplicaSet": {
        "name": "default", 
        "primary": "db-cluster-2:3306", 
        "ssl": "REQUIRED", 
        "status": "OK", 
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", 
        "topology": {
            "db-cluster-1:3306": {
                "address": "db-cluster-1:3306", 
                "memberRole": "SECONDARY", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "replicationLag": null, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.28"
            }, 
            "db-cluster-2:3306": {
                "address": "db-cluster-2:3306", 
                "memberRole": "PRIMARY", 
                "mode": "R/W", 
                "readReplicas": {}, 
                "replicationLag": null, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.28"
            }, 
            "db-cluster-3:3306": {
                "address": "db-cluster-3:3306", 
                "memberRole": "SECONDARY", 
                "mode": "R/O", 
                "readReplicas": {}, 
                "replicationLag": null, 
                "role": "HA", 
                "status": "ONLINE", 
                "version": "8.0.28"
            }
        }, 
        "topologyMode": "Single-Primary"
    }, 
    "groupInformationSourceMember": "db-cluster-2:3306"
}

The MySQL Shell output above is

Use rebootClusterFromCompleteOutage on the MySQL node, which was the last writable. More on the subject – https://dev.mysql.com/doc/mysql-shell/8.0/en/reboot-outage.html
And here are the MySQL InnoDB Cluster logs of the primary server:

2022-10-11T15:20:57.480449Z 1535 [System] [MY-013587] [Repl] Plugin group_replication reported: 'Plugin 'group_replication' is starting.'
2022-10-11T15:20:57.482668Z 1578 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2022-10-11T15:20:58.630276Z 0 [System] [MY-013471] [Repl] Plugin group_replication reported: 'Distributed recovery will transfer data using: Incremental recovery from a group donor'
2022-10-11T15:20:59.630857Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to db-cluster-2:3306 on view 16655016576298330:1.'
2022-10-11T15:20:59.630943Z 0 [System] [MY-011490] [Repl] Plugin group_replication reported: 'This server was declared online within the replication group.'
2022-10-11T15:20:59.630996Z 0 [System] [MY-011507] [Repl] Plugin group_replication reported: 'A new primary with address db-cluster-2:3306 was elected. The new primary will execute all previous group transactions before allowing writes.'
2022-10-11T15:20:59.631727Z 1586 [System] [MY-013731] [Repl] Plugin group_replication reported: 'The member action "mysql_disable_super_read_only_if_primary" for event "AFTER_PRIMARY_ELECTION" with priority "1" will be run.'
2022-10-11T15:20:59.631808Z 1586 [System] [MY-011566] [Repl] Plugin group_replication reported: 'Setting super_read_only=OFF.'
2022-10-11T15:20:59.631838Z 1586 [System] [MY-013731] [Repl] Plugin group_replication reported: 'The member action "mysql_start_failover_channels_if_primary" for event "AFTER_PRIMARY_ELECTION" with priority "10" will be run.'
2022-10-11T15:20:59.632039Z 1603 [System] [MY-011510] [Repl] Plugin group_replication reported: 'This server is working as primary member.'
2022-10-11T15:21:00.736531Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to db-cluster-2:3306, db-cluster-1:3306 on view 16655016576298330:2.'
2022-10-11T15:21:03.831355Z 0 [System] [MY-011492] [Repl] Plugin group_replication reported: 'The member with address db-cluster-1:3306 was declared online within the replication group.'
2022-10-11T15:21:05.213035Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to db-cluster-2:3306, db-cluster-1:3306, db-cluster-3:3306 on view 16655016576298330:3.'
2022-10-11T15:21:21.243823Z 0 [System] [MY-011492] [Repl] Plugin group_replication reported: 'The member with address db-cluster-3:3306 was declared online within the replication group.'

Immediately after recovering the MySQL Router also began serving the read-write sockets:

2022-10-11 15:21:00 metadata_cache INFO [7f2d619fe640] Potential changes detected in cluster 'mycluster1' after metadata refresh
2022-10-11 15:21:00 metadata_cache INFO [7f2d619fe640] Metadata for cluster 'mycluster1' has 3 member(s), single-primary: (view_id=0)
2022-10-11 15:21:00 metadata_cache INFO [7f2d619fe640]     db-cluster-1:3306 / 33060 - mode=n/a 
2022-10-11 15:21:00 metadata_cache INFO [7f2d619fe640]     db-cluster-3:3306 / 33060 - mode=n/a 
2022-10-11 15:21:00 metadata_cache INFO [7f2d619fe640]     db-cluster-2:3306 / 33060 - mode=RW 
2022-10-11 15:21:00 routing INFO [7f2d619fe640] Start accepting connections for routing routing:mycluster1_x_rw listening on 6448 and named socket /var/lib/mysqlrouter/mysqlx.sock
2022-10-11 15:21:00 routing INFO [7f2d619fe640] Start accepting connections for routing routing:mycluster1_x_ro listening on 6449 and named socket /var/lib/mysqlrouter/mysqlxro.sock
2022-10-11 15:21:00 routing INFO [7f2d619fe640] Start accepting connections for routing routing:mycluster1_rw listening on 6446 and named socket /var/lib/mysqlrouter/mysql.sock
2022-10-11 15:21:00 routing INFO [7f2d619fe640] Start accepting connections for routing routing:mycluster1_ro listening on 6447 and named socket /var/lib/mysqlrouter/mysqlro.sock
2022-10-11 15:21:00 metadata_cache WARNING [7f2d619fe640] Member db-cluster-2:3306 (05b6c7c7-f285-11ec-adfc-00163e0b38ff) defined in metadata not found in actual Group Replication
2022-10-11 15:21:00 metadata_cache WARNING [7f2d619fe640] db-cluster-1:3306 is not part of quorum for cluster 'mycluster1'
2022-10-11 15:21:00 metadata_cache WARNING [7f2d619fe640] Member db-cluster-1:3306 (8bf2c25f-90ae-11ec-93d1-00163e20a401) defined in metadata not found in actual Group Replication
2022-10-11 15:21:00 metadata_cache WARNING [7f2d619fe640] db-cluster-3:3306 is not part of quorum for cluster 'mycluster1'
2022-10-11 15:21:00 metadata_cache WARNING [7f2d619fe640] Member db-cluster-3:3306 (99856952-90ae-11ec-9a5f-fafd8f1acc17) defined in metadata not found in actual Group Replication
2022-10-11 15:21:04 metadata_cache INFO [7f2d619fe640] Potential changes detected in cluster 'mycluster1' after metadata refresh
2022-10-11 15:21:04 metadata_cache INFO [7f2d619fe640] Metadata for cluster 'mycluster1' has 3 member(s), single-primary: (view_id=0)
2022-10-11 15:21:04 metadata_cache INFO [7f2d619fe640]     db-cluster-1:3306 / 33060 - mode=RO 
2022-10-11 15:21:04 metadata_cache INFO [7f2d619fe640]     db-cluster-3:3306 / 33060 - mode=n/a 
2022-10-11 15:21:04 metadata_cache INFO [7f2d619fe640]     db-cluster-2:3306 / 33060 - mode=RW 
2022-10-11 15:21:06 metadata_cache INFO [7f2d619fe640] Potential changes detected in cluster 'mycluster1' after metadata refresh
2022-10-11 15:21:06 metadata_cache INFO [7f2d619fe640] Metadata for cluster 'mycluster1' has 3 member(s), single-primary: (view_id=0)
2022-10-11 15:21:06 metadata_cache INFO [7f2d619fe640]     db-cluster-1:3306 / 33060 - mode=RO 
2022-10-11 15:21:06 metadata_cache INFO [7f2d619fe640]     db-cluster-3:3306 / 33060 - mode=n/a 
2022-10-11 15:21:06 metadata_cache INFO [7f2d619fe640]     db-cluster-2:3306 / 33060 - mode=RW 
2022-10-11 15:21:21 metadata_cache INFO [7f2d619fe640] Potential changes detected in cluster 'mycluster1' after metadata refresh
2022-10-11 15:21:21 metadata_cache INFO [7f2d619fe640] Metadata for cluster 'mycluster1' has 3 member(s), single-primary: (view_id=0)
2022-10-11 15:21:21 metadata_cache INFO [7f2d619fe640]     db-cluster-1:3306 / 33060 - mode=RO 
2022-10-11 15:21:21 metadata_cache INFO [7f2d619fe640]     db-cluster-3:3306 / 33060 - mode=RO 
2022-10-11 15:21:21 metadata_cache INFO [7f2d619fe640]     db-cluster-2:3306 / 33060 - mode=RW

First, it was detected the primary node and the MySQL Router began accepting connections

Bonus) Why the cluster got in failed state

After several connectivity issues, the primary server in the cluster got disconnected for a long time and the MySQL Router reported that

2022-10-11 12:29:30 metadata_cache INFO [7f2d619fe640] Connected with metadata server running on db-cluster-1:3306
2022-10-11 12:29:30 metadata_cache INFO [7f2d619fe640] Potential changes detected in cluster 'mycluster1' after metadata refresh
2022-10-11 12:29:30 metadata_cache INFO [7f2d619fe640] Metadata for cluster 'mycluster1' has 3 member(s), single-primary: (view_id=0)
2022-10-11 12:29:30 metadata_cache INFO [7f2d619fe640]     db-cluster-1:3306 / 33060 - mode=RO 
2022-10-11 12:29:30 metadata_cache INFO [7f2d619fe640]     db-cluster-3:3306 / 33060 - mode=RO 
2022-10-11 12:29:30 metadata_cache INFO [7f2d619fe640]     db-cluster-2:3306 / 33060 - mode=n/a 
2022-10-11 12:29:30 routing INFO [7f2d619fe640] Stop accepting connections for routing routing:mycluster1_x_rw listening on 6448 and named socket /var/lib/mysqlrouter/mysqlx.sock
2022-10-11 12:29:30 routing INFO [7f2d619fe640] Stop accepting connections for routing routing:mycluster1_rw listening on 6446 and named socket /var/lib/mysqlrouter/mysql.sock

The other two nodes continued to be in READ-ONLY, i.e. secondary mode, because there were also connectivity issues between the two servers. After restarting the mysqld service in all three servers, MySQL Router began reporting:
Unable to fetch live group_replication member data from any server in cluster ‘mycluster1’.

Check out how to recover a single instance in a MySQL InnoDB Cluster here – Recovery of MySQL 8 Cluster instance after server crash and corrupted data in log event.

Leave a Reply

Your email address will not be published. Required fields are marked *