mysql 主从错误处理:unblock with ‘mysqladmin flush-hosts‘ 和 Slave has more GTIDs than the master has

昨天一次插入 10W 条记录、卡死了怎么办,发现从服务器状态错误,因为中途曾经几次 kill process,现在数据没有同步,状态也不对了!

20230818 又一次处理 unblock with ‘mysqladmin flush-hosts’

之前怎么会处理的那么复杂?

  1. 许久不查看 replica,今天查看的时候发现错误了

             Last_IO_Error: Error reconnecting to source 'root@192.168.0.201:3306'. This was attempt 2077/86400, with a delay of 60 seconds between attempts. Message: Host '192.168.0.195' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'
    
  2. 处理方法
    提示信息中已经告知处理方法 : unblock with ‘mysqladmin flush-hosts’
    我还傻乎乎转了一圈

    1). 在 master
    mysql> flush hosts;
    Query OK, 0 rows affected, 1 warning (0.08 sec)
    2). 在 replica 确认一下能否连接上 master
    
    $ mysql -h192.168.0.XXX -uroot -p
    
    3). 在 replica ,无需重新启动 replica
    直接查看
    mysql> show replica status \G;
    
    OK
    

处理 unblock with ‘mysqladmin flush-hosts’

  1. 错误信息:

    Last_IO_Error: Error connecting to source ‘repl@192.168.0.101:3306’. This was attempt 1/86400, with a delay of 60 seconds between attempts. Message: Host ‘192.168.0.103’ is blocked because of many connection errors; unblock with ‘mysqladmin flush-hosts’

     mysql> show replica status\G;
     *************************** 1. row ***************************
                  Replica_IO_State: Connecting to source
                       Source_Host: 192.168.0.101
                       Source_User: repl
                       Source_Port: 3306
                     Connect_Retry: 60
                   Source_Log_File: binlog.000001
               Read_Source_Log_Pos: 30115639
                    Relay_Log_File: ubuntu2004-103-relay-bin.000004
                     Relay_Log_Pos: 30111997
             Relay_Source_Log_File: binlog.000001
                Replica_IO_Running: Connecting
               Replica_SQL_Running: Yes
               ......
               Last_IO_Errno: 1129
                     Last_IO_Error: Error connecting to source 'repl@192.168.0.101:3306'. This was attempt 1/86400, with a delay of 60 seconds between attempts. Message: Host '192.168.0.103' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'
               ......
    
  2. 按照网上搜索的办法

     mysql> set global max_connect_errors = 1000;
     Query OK, 0 rows affected (0.00 sec)
     
     mysql> show global variables like '%max_connect_errors';
     +--------------------+-------+
     | Variable_name      | Value |
     +--------------------+-------+
     | max_connect_errors | 1000  |
     +--------------------+-------+
     1 row in set (0.01 sec)
    

    没有任何作用!

  3. 按照错误信息给出的方法

     $ mysqladmin -uroot -pPassword@123 flush-hosts
    

    这次报错不一样了
    Last_IO_Errno: 13114
    Last_IO_Error: Got fatal error 1236 from source when reading data from binary log: ‘Slave has more GTIDs than the master has, using the master’s SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been’

     mysql> show replica status\G;
     *************************** 1. row ***************************
                  Replica_IO_State: 
                       Source_Host: 192.168.0.101
                       Source_User: root
                       Source_Port: 3306
                     Connect_Retry: 60
                   Source_Log_File: 
               Read_Source_Log_Pos: 4
                    Relay_Log_File: ubuntu2004-103-relay-bin.000001
                     Relay_Log_Pos: 4
             Relay_Source_Log_File: 
                Replica_IO_Running: No
               Replica_SQL_Running: Yes
                   Replicate_Do_DB: 
               Replicate_Ignore_DB: 
                Replicate_Do_Table: 
            Replicate_Ignore_Table: 
           Replicate_Wild_Do_Table: 
       Replicate_Wild_Ignore_Table: 
                        Last_Errno: 0
                        Last_Error: 
                      Skip_Counter: 0
               Exec_Source_Log_Pos: 0
                   Relay_Log_Space: 157
                   ......
                                   Last_IO_Errno: 13114
                     Last_IO_Error: Got fatal error 1236 from source when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been'
               ......
    
  4. 网上搜来的参考文章,虽然还不是很明白,但是,解决了问题
    GTID主从复制与运维教程
    我这个情况应该属于主库提前删除了二进制日志和位置

     主服务器
     mysql> show master status;
     +---------------+----------+--------------+------------------+-------------------+
     | File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
     +---------------+----------+--------------+------------------+-------------------+
     | binlog.000001 |      157 |              |                  |                   |
     +---------------+----------+--------------+------------------+-------------------+
     1 row in set (0.00 sec)
     
     从服务器
     mysql> show master logs;
     +---------------+-----------+-----------+
     | Log_name      | File_size | Encrypted |
     +---------------+-----------+-----------+
     | binlog.000001 |       504 | No        |
     | binlog.000002 |       180 | No        |
     | binlog.000003 |       483 | No        |
     | binlog.000004 |       220 | No        |
     | binlog.000005 | 185690786 | No        |
     | binlog.000006 |       220 | No        |
     | binlog.000007 |  30195447 | No        |
     | binlog.000008 |      3300 | No        |
     | binlog.000009 |       253 | No        |
     +---------------+-----------+-----------+
     9 rows in set (0.00 sec)
    

    为什么有这么大差别?不得而知,后续再处理

    按照参考文章的做法

     mysql> stop replica;
     Query OK, 0 rows affected (0.04 sec)
     
     mysql> CHANGE REPLICATION SOURCE TO
         ->  SOURCE_HOST = '192.168.0.101',
         ->  SOURCE_PORT = 3306,
         ->  SOURCE_USER = 'root',
         ->  SOURCE_PASSWORD = 'Password@123',
         -> MASTER_LOG_FILE = 'binlog.000001',
         -> MASTER_LOG_POS= 157,
         ->  SOURCE_AUTO_POSITION = 0;
     Query OK, 0 rows affected, 4 warnings (0.24 sec)
     
     mysql> start replica;
     Query OK, 0 rows affected (0.11 sec)
    

    再次查看,OK 了!

     mysql> show replica status\G;
     *************************** 1. row ***************************
                  Replica_IO_State: Waiting for source to send event
                       Source_Host: 192.168.0.101
                       Source_User: root
                       Source_Port: 3306
                     Connect_Retry: 60
                   Source_Log_File: binlog.000001
               Read_Source_Log_Pos: 157
                    Relay_Log_File: ubuntu2004-103-relay-bin.000002
                     Relay_Log_Pos: 323
             Relay_Source_Log_File: binlog.000001
                Replica_IO_Running: Yes
               Replica_SQL_Running: Yes
               ......
    

    重新来过

处理 Slave has more GTIDs than the master has, using the master’s SERVER_UUID

  1. 按照参考文章也 reset 一个,这次 reset master; 也在从服务器上进行的

     mysql> stop replica;
     Query OK, 0 rows affected (0.03 sec)
     
     mysql> reset replica;
     Query OK, 0 rows affected (0.29 sec)
     
     mysql> reset master;
     Query OK, 0 rows affected (0.21 sec)
     
     mysql> CHANGE REPLICATION SOURCE TO
         ->  SOURCE_HOST = '192.168.0.101',
         ->  SOURCE_PORT = 3306,
         ->  SOURCE_USER = 'root',
         ->  SOURCE_PASSWORD = 'Password@123',
         ->  SOURCE_AUTO_POSITION = 1;
     Query OK, 0 rows affected, 2 warnings (0.37 sec)
     
     mysql> show replica status\G;
     
     OK
    
  2. 测试一下

     mysql> use test_20230414;
     
     mysql> truncate t;
     
     mysql> INSERT INTO `test1` (`name`, `age`, `indb`, `outdb`) VALUES ('wzh26', '25', '2021-04-01', '2023-04-25') ;
     Query OK, 1 row affected (0.06 sec)
    

    在从服务器上查看,同步 OK

  3. 在主服务器上再次执行昨天的存储过程

    只插入 1000 条记录

     mysql> call proc_insert_person(1000);
     Query OK, 0 rows affected (0.18 sec)
     
     mysql> show binlog events;
     +---------------+--------+----------------+-----------+-------------+-------------------------------------------------------------------+
     | Log_name      | Pos    | Event_type     | Server_id | End_log_pos | Info                                                              |
     +---------------+--------+----------------+-----------+-------------+-------------------------------------------------------------------+
     ......
     binlog.000001 |   3021 | Table_map      |       101 |        3091 | table_id: 198 (test_20230414.test1)                               |
     | binlog.000001 |   3091 | Write_rows     |       101 |        3143 | table_id: 198 flags: STMT_END_F                                   |
     | binlog.000001 |   3143 | Xid            |       101 |        3174 | COMMIT /* xid=1755757 */                                          |
     | binlog.000001 |   3174 | Gtid           |       101 |        3254 | SET @@SESSION.GTID_NEXT= 'de30036d-c49f-11ed-9a42-00155d5aa68e:7' |
     | binlog.000001 |   3254 | Query          |       101 |        3338 | BEGIN                                                             |
     | binlog.000001 |   3338 | Table_map      |       101 |        3407 | table_id: 218 (test_20230414.person)                              |
     | binlog.000001 |   3407 | Write_rows     |       101 |        3452 | table_id: 218 flags: STMT_END_F                                   |
     | binlog.000001 |   3452 | Table_map      |       101 |        3521 | table_id: 218 (test_20230414.person)                              |
     | binlog.000001 |   3521 | Write_rows     |       101 |        3566 | table_id: 218 flags: STMT_END_F                                   |
     |
     ......
     | binlog.000001 | 119067 | Write_rows     |       101 |      119114 | table_id: 218 flags: STMT_END_F                                   |
     | binlog.000001 | 119114 | Table_map      |       101 |      119183 | table_id: 218 (test_20230414.person)                              |
     | binlog.000001 | 119183 | Write_rows     |       101 |      119231 | table_id: 218 flags: STMT_END_F                                   |
     | binlog.000001 | 119231 | Xid            |       101 |      119262 | COMMIT /* xid=1756386 */   
    

    最后一条是 COMMIT ,原来二进制日志是这么记录的

    在从服务器上也查看一下

     mysql> show binlog events;
     
     和主服务器上一样的!
    

    现在主从同步是可以了,但是,因为 kill process ,造成的不同步,并没有恢复,也应该没有办法恢复!