Suddenly your slave server reset without a clean shutdown and when it came up again you saw the error of this kind:
2016-02-26 10:41:50 876 [ERROR] Slave SQL: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 1594 2016-02-26 10:41:50 876 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.005014' position 152146793
So what we know, our master server is OK, our slave server was reset by unknown issue, so the problem is only in out slave logs. Mysql sever shows the status with:
mysql> SHOW SLAVE STATUS\G;
There are multiple lines of information, but the most important in our situation is these two lines:
Relay_Master_Log_File: mysql-bin.005014 Exec_Master_Log_Pos: 152146793
This is the place where the slave server stopped at (as you can see from the logs above, newer versions of MySQL print these two values in the log, but older versions do not print them in the log, so check them with the above command!).
The slave server stopped at file mysql-bin.005014 and position 152146793 and could not continue, because its files are corrupted. We can reset the position issuing a CHANGE MASTER command, which will clean up the relay logs and the slave will start the replication from this position – no data will be lost. Before issuing the following commands save the relay log files, they can be useful if you have later errors. Here is the command:
STOP SLAVE; CHANGE MASTER TO MASTER_HOST='1.1.1.1', MASTER_USER='replusr', MASTER_LOG_FILE='mysql-bin.005014', MASTER_LOG_POS=152146793; START SLAVE;
There three commands above
- Stop the replication in the slave, because the replication is still running and the slave is logging the binary log received from the master
- Change master command to reset the logs with the right position
- Start the replication in the slave
The replication must continue without errors!
In some cases after we issue the above commands and the replication starts it immediately stops with error of
Duplicate entry
.
Last_SQL_Errno: 1062 Last_SQL_Error: Error 'Duplicate entry '3918722' for key 'PRIMARY'' on query. Default database: 'testdb'. Query: 'INSERT INTO `testtable` (`tabid`, `tabip`, `stat`, `ins`) VALUES ('83908', '2591777309', '1', NOW())'
So we did everything right, but our replication is again broken? The problem is that when there is such reset, it could happen the autoincrement of the table is reserved,but not used, because the server was reset just in the middle of the insert operation or it could be inserted properly, but the server was reset in the middle of updating the replication metadata! So you have two options:
- Change the autoincrement value of the table if there is no record with ID of the duplicate entry, just select it:
SELECT * FROM testdb WHERE id=[ID_FROM_THE_ERROR]
If there is no ID with such value, change the autoicrement of the table with
ALTER TABLE tbl AUTO_INCREMENT = [ID_FROM_THE_ERROR];
- Skip the duplicate entry query with
STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;
or parallel replication use
STOP SLAVE; START SLAVE UNTIL sql_after_mts_gaps; SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;
You could trace the problem reading the relay logs at the position it stopped.
Often there is an issue with the last recorded position, so you should examine why you have duplicate entry. Check if the entry is inserted and if it is, just skip it! But then if you hit again a duplicate entry or another error, you should reinitialize the slave dumping the replicated databases from the master!
Here is the full log of status command, when there is a problem with the corrupted mysql relay logs:
mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 1.1.1.1 Master_User: repluser Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.005014 Read_Master_Log_Pos: 246696051 Relay_Log_File: mysqld-relay-bin.009911 Relay_Log_Pos: 152146956 Relay_Master_Log_File: mysql-bin.005014 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1594 Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Skip_Counter: 0 Exec_Master_Log_Pos: 152146793 Relay_Log_Space: 246698113 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1594 Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Replicate_Ignore_Server_Ids: Master_Server_Id: 2 Master_UUID: ce8a6c29-cf8e-11e5-9d39-000000000001 Master_Info_File: /var/lib/mysql/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: 180226 11:54:51 Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: Auto_Position: 0 1 row in set (0.00 sec)
One thought on “mysql slave reset and fixing relay log read failure”