处理mysql主从不同步问题

Z时代
2024-01-10
分类：综合

database

问题描述：发现主库操作数据从库没有变动问题，可能原因是从库重启导致的无法同步问题。

排查思路：

1、查看主从复制状态

发现从库的IO和SQL进程都是no(正常状态应该是yes)

注意：mysql replication中slave机器上有两个关键进程，死一个都不行，一个是slave_sql_running,一个是slave_io_running，一个负责与主机的IO通信，一个负责自己的slave mysql进程。

2、解决办法如下：

>stop slave; ##停止同步

> SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE; ##设置counter为1,启动同步

>show slave statusG; ##查看同步状态

3、发现SQL进程还是No

提示信息显示如下：

Last_Errno: 1396

Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction "fab1fc64-d0f2-11ec-a2a6-000c2950bca1:9" at master log mybinlog.000001, end_log_pos 1966. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.

4、根据上面的提示，查询到的异常数据出现在opp_starck表中

#select * from performance_schema.replication_applier_status_by_workerG; 确定事务发生在表opp_strack上，定位在表上，再去排查是哪张表

可以参考：https://blog.csdn.net/memory6364/article/details/86152717

5、主从数据恢复一致后需要在slave上跳过报错的事务，在从库中执行

使用GTID跳过错误的方法：找到错误的GTID跳过（通过exec_master_log_pos去binlog里找GTID，或者则通过监控表replication_applier_status_by_worker找到GTID，也可以通过excured_gtid_set算GTID），这里使用监控来找到错误的GTID。找到GTID之后，跳过错误的步骤：