mysql slave requested master to start replication from position greater than the file size

If you happen to reset your master mysql server without shutting down the mysql process (probably because of your super collocation cut the power!!!) you could have a slave server, which have the following error:

Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from position > file size; the first event 'mysql-bin.005173' at 644642725, the last event read from './mysql-bin.005173' at 4, the last byte read from './mysql-bin.005173' at 4.'

The slave server wants to read a position in the master, which does not exist probably because it was not committed to the file.

It is fairly easy (and in most cases safe) to just correct the position and restart the replication! Take the Master_Log_File and Read_Master_Log_Pos on the slave with:

[root@mysql1 ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 50497
Server version: 5.6.38-log MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 10.10.10.10
                  Master_User: replusr
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.005173
          Read_Master_Log_Pos: 644642725
               Relay_Log_File: mysqld-relay-bin.000479
                Relay_Log_Pos: 644642888
        Relay_Master_Log_File: mysql-bin.005173
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 644642725
              Relay_Log_Space: 644643109
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from position > file size; the first event 'mysql-bin.005173' at 644642725, the last event read from './mysql-bin.005173' at 4, the last byte read from './mysql-bin.005173' at 4.'
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
                  Master_UUID: ce8a6c29-cf8e-11e5-9d39-000000000001
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 180525 16:27:22
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

As you can see the highlighted variables are important, that is the file and the position you should check in your master mysql server. So go to your master datadir (or the binlog directory, which could be changed with “binlog-dir”) and check whether there is Master_Log_File: “mysql-bin.005173” with position Read_Master_Log_Pos: “644642725”

srv@local-master ~ # cd /var/lib/mysql/
srv@local-master mysql # ls -altr mysql-bin.005173
-rw-rw---- 1 mysql mysql 644639026 24 may 12,22 mysql-bin.005173
srv@local-master mysql # mysqlbinlog mysql-bin.005173|tail -n 15
# at 644638722
#180524 12:03:31 server id 2  end_log_pos 644638930 CRC32 0xfeabe1ab    Query   thread_id=88263388      exec_time=0     error_code=0
SET TIMESTAMP=1527152611/*!*/;
UPDATE `group_desc` SET `id` = '153357',`name` = 'Test Group Name',`tags` = '|Test Group |' WHERE  `id` = '153357'
/*!*/;
# at 644638930
#180524 12:03:31 server id 2  end_log_pos 644639026 CRC32 0x5ca1b693    Query   thread_id=88263388      exec_time=0     error_code=0
SET TIMESTAMP=1527152611/*!*/;
COMMIT
/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
srv@local-master ~ # 

And yes, there is no position Read_Master_Log_Pos: “644642725” as the slave requested! The position number starts with

# at

for example

# at 644638930

As you can see from the bash commands above we got the last 15 lines of our binlog mysql file and the last position was 644638930. Here is what is going on: slave requested to continue from master’s position at 644642725, but master has last position 644638930:

644642725 > 644638930

To fix it just use the next binlog file and position 1 and your slave will continue normally. Let’s say there is a possibility your master missed to write the last commands to the master’s binlog because of the reset and in this situation your slave could be out of sync and in this case you should recover your slave from a full mysql dump and import in the slave. But in most cases it is fairly safe to continue.

[root@mysql1 ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 50506
Server version: 5.6.38-log MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> STOP SLAVE;
Query OK, 0 rows affected (0.12 sec)

mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.005174',MASTER_LOG_POS=1;
Query OK, 0 rows affected (0.47 sec)

mysql> START SLAVE;
Query OK, 0 rows affected (0.04 sec)

And if everything is OK your slave will continue with no errors (just with a big delay – behind your master):

mysql> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.10.10.10
                  Master_User: replusr
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.005175
          Read_Master_Log_Pos: 328685690
               Relay_Log_File: mysqld-relay-bin.000003
                Relay_Log_Pos: 36999
        Relay_Master_Log_File: mysql-bin.005175
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 36836
              Relay_Log_Space: 334719008
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 103216
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
                  Master_UUID: ce8a6c29-cf8e-11e5-9d39-000000000001
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Repair by sorting
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

But if you get:

Duplicate entry

Last_SQL_Errno: 1062
Last_SQL_Error: Error 'Duplicate entry '422987' for key 'PRIMARY'' on query. Default database: 'mydb'. Query: 'INSERT INTO `spec_cookie` (`userid`, `userip`, `cookie`, `added`) VALUES ('96201', '2591115382', 'f3b81be45a484c652d38a2c70f8c44c30d4d04d1293918c9071e052ffd9c76f7', NOW())'

You might get into troubles if you continue, be careful!!! Examine the query, select the data in the slave and in the master, if they are equal you can skip it the error with:

[srv@local-slave ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 50506
Server version: 5.6.38-log MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> STOP SLAVE;
Query OK, 0 rows affected (0.09 sec)

mysql> SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
Query OK, 0 rows affected (0.00 sec)

mysql> START SLAVE;
Query OK, 0 rows affected (0.12 sec)

mysql>

If they are not equal you can change them manually and continue again. More about Duplicate entry you can check here (similar mysql binlog problems): mysql slave reset and fixing relay log read failure

Speeding up the receiving of mysql binlog when replication is configured

Have you ever set up a mysql replication between two host with more than 100ms latency between. Let’s say one of your server is in US and the other is in Europe. The latency between the servers could reach more than 100ms and your MySQL replication (or galera cluster, or master-master configuration) which could easily turn into problems when you have really busy servers. In our case we have really busy servers with more than 7G binlog of every minute, which means 7G/minute transfer between the servers with a latency of 100ms and it could get worse your connectivity just hiccup for a minute or two and you would have a replication behind with more than 20 binlog files, which could take up to an hour to advance. There is a good chance to get the binlog files a lot faster – between 5-8 times faster

just compress them!

You can enable compression between your servers so they will transfer the binlog in a compressed format, the compression algo is really fast and almost no CPU consuming you’ll not observe any penalties (even if you enable it between servers on “the same switch”) and your traffic will decrease with almost 5-8 times it depends on your queries but in general it is around such values! So you now 1G binlog could be transferred 5-8 times faster using only 150Mbytes of your connection!
Here is how to enable this option

  1. Enter MySQL console
  2. set the compression to true
  3. verify the compression is set
  4. STOP and then START the slave – the compression won’t happen between the servers if you DO NOT DO this!!!
srv@local ~ # mysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 55856
Server version: 10.1.30-MariaDB-1~xenial mariadb.org binary distribution

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> set global slave_compressed_protocol=1;
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> show global variables like 'slave_compressed_protocol';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| slave_compressed_protocol | ON    |
+---------------------------+-------+
1 row in set (0.00 sec)

MariaDB [(none)]> STOP SLAVE;
Query OK, 0 rows affected (0.15 sec)

MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.01 sec)

Then check your network:

main menu
Network Statistics

main menu
Network Packets

mysql slave reset and fixing relay log read failure

Suddenly your slave server reset without a clean shutdown and when it came up again you saw the error of this kind:

2016-02-26 10:41:50 876 [ERROR] Slave SQL: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 1594
2016-02-26 10:41:50 876 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.005014' position 152146793

So what we know, our master server is OK, our slave server was reset by unknown issue, so the problem is only in out slave logs. Mysql sever shows the status with:

mysql> SHOW SLAVE STATUS\G;

There are multiple lines of information, but the most important in our situation is these two lines:

Relay_Master_Log_File: mysql-bin.005014
  Exec_Master_Log_Pos: 152146793

This is the place where the slave server stopped at (as you can see from the logs above, newer versions of MySQL print these two values in the log, but older versions do not print them in the log, so check them with the above command!).
The slave server stopped at file mysql-bin.005014 and position 152146793 and could not continue, because its files are corrupted. We can reset the position issuing a CHANGE MASTER command, which will clean up the relay logs and the slave will start the replication from this position – no data will be lost. Before issuing the following commands save the relay log files, they can be useful if you have later errors. Here is the command:

STOP SLAVE;              
CHANGE MASTER TO
         MASTER_HOST='1.1.1.1',
         MASTER_USER='replusr',
         MASTER_LOG_FILE='mysql-bin.005014',
         MASTER_LOG_POS=152146793;
START SLAVE;

There three commands above

  • Stop the replication in the slave, because the replication is still running and the slave is logging the binary log received from the master
  • Change master command to reset the logs with the right position
  • Start the replication in the slave

The replication must continue without errors!

In some cases after we issue the above commands and the replication starts it immediately stops with error of

Duplicate entry

.

Last_SQL_Errno: 1062
Last_SQL_Error: Error 'Duplicate entry '3918722' for key 'PRIMARY'' on query. Default database: 'testdb'. Query: 'INSERT INTO `testtable` (`tabid`, `tabip`, `stat`, `ins`) VALUES ('83908', '2591777309', '1', NOW())'

So we did everything right, but our replication is again broken? The problem is that when there is such reset, it could happen the autoincrement of the table is reserved,but not used, because the server was reset just in the middle of the insert operation or it could be inserted properly, but the server was reset in the middle of updating the replication metadata! So you have two options:

  • Change the autoincrement value of the table if there is no record with ID of the duplicate entry, just select it:
    SELECT * FROM testdb WHERE id=[ID_FROM_THE_ERROR]
    

    If there is no ID with such value, change the autoicrement of the table with

    ALTER TABLE tbl AUTO_INCREMENT = [ID_FROM_THE_ERROR];
    
  • Skip the duplicate entry query with
    STOP SLAVE;
    SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
    START SLAVE;
    

    or parallel replication use

    STOP SLAVE;
    START SLAVE UNTIL sql_after_mts_gaps;
    SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
    START SLAVE;
    

You could trace the problem reading the relay logs at the position it stopped.
Often there is an issue with the last recorded position, so you should examine why you have duplicate entry. Check if the entry is inserted and if it is, just skip it! But then if you hit again a duplicate entry or another error, you should reinitialize the slave dumping the replicated databases from the master!

Here is the full log of status command, when there is a problem with the corrupted mysql relay logs:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 1.1.1.1
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.005014
          Read_Master_Log_Pos: 246696051
               Relay_Log_File: mysqld-relay-bin.009911
                Relay_Log_Pos: 152146956
        Relay_Master_Log_File: mysql-bin.005014
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1594
                   Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 152146793
              Relay_Log_Space: 246698113
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1594
               Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
                  Master_UUID: ce8a6c29-cf8e-11e5-9d39-000000000001
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: 
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 180226 11:54:51
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)