MySQL Group Replication 学习(部署篇)

  MySQL5.7版本出来有很长时间了,之前也装了玩了下,大概了解了部分功能和特性,但没有系统的学习和测试其主要新增功能,最近也特意抽出时间想去多了解了解,学习学习5.7的主要新特性,这里主要是针对其最突出的功能之一MySQL Group Replication(MGR 组复制)学习下,针对其部署过程做个简要记录。
一、环境准备

名称    版本    备注
操作系统    RHEL6.5_X86_64    
数据库    5.7.18-15   Percona二进制版本
复制节点  10.26.7.129 node1
    10.26.7.142 node2
    10.26.7.166 node3

二、MGR详细部署步骤
1、MYSQL5.7安装
本次学习实验采用的是Percona-Server-5.7.18-15-Linux.x86_64.ssl101二进制版本,具体安装过程略
2、节点1my.cnf参数配置(主写节点)
#replicate
server-id=1001
skip-slave-start = false
read-only = false
expire_logs_days = 2
max_binlog_size = 1G
max_binlog_cache_size = 2G
log-bin = /home/mysql/mysql-bin
log-bin-index = /home/mysql/bin-index
binlog_format = row

log-slave-updates = 1
sync_binlog = 1
log-slow-slave-statements = 1
max-relay-log-size = 1G
relay-log = /home/mysql/mysql-relay
relay-log-index = /home/mysql/relay-index
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
#group replication
transaction_write_set_extraction  =XXHASH64         
loose-group_replication_group_name  =”2faa74b8-66bb-11e7-ab7e-b8ca3a6a61ec”  #务必以uuid形式配置
loose-group_replication_start_on_boot  =off
loose-group_replication_local_address  =”10.26.7.129:24001″                                    #不同节点配置不同节点本身的IP地址和端口,区分MYSQL自身的3306端口
loose-group_replication_group_seeds  =”10.26.7.129:24001,10.26.7.142:24001,10.26.7.166:24001″
loose-group_replication_bootstrap_group  =off
3、创建复制账户(主写节点)
set sql_log_bin=0;
create user rpl_user@’%';
grant replication slave on *.* to rpl_user@’%’  identified by ‘rpl_pass';
flush privileges;
set sql_log_bin=1;
change master to master_user=’rpl_user’,master_password=’rpl_pass’ for channel ‘group_replication_recovery';
4、安装组复制插件并启动组复制(主写节点)
安装插件
install plugin group_replication soname ‘group_replication.so';
检查插件是否正确安装
show plugins
+—————————–+———-+——————–+———————-+———+
| Name                        | Status   | Type               | Library              | License |
+—————————–+———-+——————–+———————-+———+
| group_replication           | ACTIVE   | GROUP REPLICATION  | group_replication.so | GPL     |
+—————————–+———-+——————–+———————-+———+
启动组复制
set global group_replication_bootstrap_group=ON;
start group_replication;
set global group_replication_bootstrap_group=OFF;
检查组复制成员及状态
select * from performance_schema.replication_group_members;
+—————————+————————————–+————-+————-+————–+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+—————————+————————————–+————-+————-+————–+
| group_replication_applier | 2faa74b8-66bb-11e7-ab7e-b8ca3a6a61ec | node1       |        3306 | ONLINE       |
5、添加组复制成员实例node2 和node3
***添加节点前,务必做DNS解析,如果没有配置DNS解析服务器,需要在每个节点配置hosts解析  /etc/hosts***
10.26.7.166 node3
10.26.7.142 node2
10.26.7.129 node1

***节点my.cnf参数文件server-id和loose-group_replication_local_address =”node2:24001需要分别更改”***

node2
set sql_log_bin=0;
create user rpl_user@’%';
grant replication slave on *.* to rpl_user@’%’  identified by ‘rpl_pass';
flush privileges;

set sql_log_bin=1;
change master to master_user=’rpl_user’,master_password=’rpl_pass’ for channel ‘group_replication_recovery';

install plugin group_replication soname ‘group_replication.so';
show plugins

set global group_replication_allow_local_disjoint_gtids_join=ON;
start group_replication;

node3同样执行上述命令

然后检查组复制信息
select * from performance_schema.replication_group_members;     
+—————————+————————————–+————-+————-+————–+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+—————————+————————————–+————-+————-+————–+
| group_replication_applier | 2faa74b8-66bb-11e7-ab7e-b8ca3a6a61ec | node1       |        3306 | ONLINE       |
| group_replication_applier | 35e38786-66bb-11e7-bcc3-b8ca3a6a61a4 | node2       |        3306 | ONLINE       |
| group_replication_applier | 3bbedb1e-66bb-11e7-8fc0-b8ca3a6a7c48 | node3       |        3306 | ONLINE       |
+—————————+————————————–+————-+————-+————–+

测试组复制是否正常:
(root:localhost:Sat Jul 15 13:26:33 2017)[(none)]>create database dbtest;
Query OK, 1 row affected (0.01 sec)
(root:localhost:Sat Jul 15 13:26:40 2017)[(none)]>use dbtest;
Database changed
(root:localhost:Sat Jul 15 13:26:45 2017)[dbtest]>create table t1(id int primary key);
Query OK, 0 rows affected (0.01 sec)
(root:localhost:Sat Jul 15 13:26:54 2017)[dbtest]>insert into t1 values(1);
Query OK, 1 row affected (0.00 sec)
node2和node3执行查询
(root:localhost:Sat Jul 15 12:57:32 2017)[db01]>use dbtest;
Database changed
(root:localhost:Sat Jul 15 13:27:26 2017)[dbtest]>select * from t1;
+—-+
| id |
+—-+
|  1 |
+—-+
1 row in set (0.00 sec)
三、错误问题以及汇总:
1、错误案例01
错误信息:2017-07-15T01:36:06.929941Z 4 [ERROR] Plugin group_replication reported: ‘The group name ‘group-replication-test’ is not a valid UUID’
错误原因:loose-group_replication_group_name参数没有按照UUID格式指定,被认为设置该参数无效
解决方案:更改loose-group_replication_group_name参数值为,loose-group_replication_group_name  =”2faa74b8-66bb-11e7-ab7e-b8ca3a6a61ec”
2、错误案例02
错误信息:
2017-07-15T01:29:27.271909Z 0 [Warning] unknown variable ‘loose-group_replication_group_name=group-replication-test’
2017-07-15T01:29:27.271926Z 0 [Warning] unknown variable ‘loose-group_replication_start_on_boot=off’
2017-07-15T01:29:27.271930Z 0 [Warning] unknown variable ‘loose-group_replication_local_address=10.26.7.129:3306′
2017-07-15T01:29:27.271935Z 0 [Warning] unknown variable ‘loose-group_replication_group_seeds=10.26.7.129:3306,10.26.7.142:3306,10.26.7.166:3306′
2017-07-15T01:29:27.271939Z 0 [Warning] unknown variable ‘loose-group_replication_bootstrap_group=off’
错误原因:因为先设置了这些参数,而没有装group_replication插件,导致数据库实例无法识别这些参数
解决方案:安装group replication插件,install plugin group_replication soname ‘group_replication.so'; (uninstall plugin group_replication 卸载,show plugins查看)
3、错误案例03
错误信息:
2017-07-15T01:54:54.447829Z 0 [Note] Plugin group_replication reported: ‘Unable to bind to 0.0.0.0:3306 (socket=60, errno=98)!’
2017-07-15T01:54:54.447948Z 0 [ERROR] Plugin group_replication reported: ‘Unable to announce tcp port 3306. Port already in use?’
2017-07-15T01:54:54.448101Z 0 [ERROR] Plugin group_replication reported: ‘[GCS] Error joining the group while waiting for the network layer to become ready.’
错误原因:配置的组复制监听端口和MYSQL实例端口冲突
解决方案:调整下面参数
loose-group_replication_local_address 
=”10.26.7.129:24001″                                   
#不同节点配置不同节点本身的IP地址和端口,区分MYSQL自身的3306端口

loose-group_replication_group_seeds  =”10.26.7.129:24001,10.26.7.142:24001,10.26.7.166:24001″
4、错误案例04
错误信息:
2017-07-15T04:20:01.249529Z 21 [ERROR] Slave I/O for channel ‘group_replication_recovery': error connecting to master ‘rpl_user@node2:3306′ – retry-time: 60  retries: 1, Error_code: 2005
错误原因:没有配置DNS解析或者hosts解析,节点无法连接其他数据库
解决方案:配置hosts解析,每个节点/etc/hosts添加如下内容
10.26.7.166 node3
10.26.7.142 node2
10.26.7.129 node1
5、错误案例05
错误信息
2017-07-15T03:42:45.395407Z 288 [ERROR] Slave SQL for channel ‘group_replication_recovery': Error ‘Can’t create database ‘db01′; database exists’ on query. Default database: ‘db01′. Query: ‘create database db01′, Error_code: 1007
2017-07-15T03:42:45.395472Z 288 [Warning] Slave: Can’t create database ‘db01′; database exists Error_code: 1007
2017-07-15T03:42:45.395503Z 288 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with “SLAVE START”. We stopped at log ‘mysql-bin.000005′ position 434
错误原因:这个错误是由于节点由于各种原因退出mgr组,后面又加入了mgr组,但之前存在的数据依旧存在
解决方案:删除要加入组复制节点存在的数据库即可,但其他节点不是主写节点,需要先调整参数set global super_read_only=0;然后执行drop database db01;再重新加入组
set global group_replication_allow_local_disjoint_gtids_join=ON;
start group_replication;
6、错误案例06
错误信息:
2017-07-15T03:44:09.982428Z 18 [ERROR] Slave SQL for channel ‘group_replication_recovery': Error ‘Can’t create database ‘db01′; database exists’ on query. Default database: ‘db01′. Query: ‘create database db01′, Error_code: 1007
2017-07-15T03:44:09.982493Z 18 [Warning] Slave: Can’t create database ‘db01′; database exists Error_code: 1007
2017-07-15T03:44:09.982522Z 18 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with “SLAVE START”. We stopped at log ‘mysql-bin.000005′ position 434
错误原因:同上错误案例05
解决方案:同上错误案例05
7、错误案例07
错误信息:
2017-07-15T03:49:10.370846Z 0 [ERROR] Plugin group_replication reported: ‘This member has more executed transactions than those present in the group. Local transactions: 2faa74b8-66bb-11e7-ab7e-b8ca3a6a61ec:1-4,35e38786-66bb-11e7-bcc3-b8ca3a6a61a4:1 > Group transactions: 2faa74b8-66bb-11e7-ab7e-b8ca3a6a61ec:1-16′
错误原因:同上错误案例05,在从库执行了多余的事务
解决方案:同上错误案例05,直接重新加入节点即可
set global group_replication_allow_local_disjoint_gtids_join=ON;
start group_replication;
8、错误案例08
错误信息
ERROR 3098 (HY000): The table does not comply with the requirements by an external plugin.
错误原因:由于主节点创建了表t1,但没有指定主键(此时表结构可以复制到各节点,一旦插入数据DML操作即会报错)
解决方案:为表增加主键,然后做DML操作(MGR需要各表都有主键)
alter table t1 add primary key(id);
insert into t1 values(1),(2);    
9、错误案例09
错误信息:
mysqldump -R -E –triggers –single-transaction  –master-data=2 -B db01 >db01.sql
Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database. If you don’t want to restore GTIDs, pass –set-gtid-purged=OFF. To make a complete dump, pass –all-databases –triggers –routines –events.
mysqldump: Couldn’t execute ‘SAVEPOINT sp': The MySQL server is running with the –transaction-write-set-extraction!=OFF option so it cannot execute this statement (1290)
错误原因:mgr不支持mysqldump的事务一致性备份,因为其不支持savepoint
解决方案:通过xtrabackup或者不加–single-transaction备份
10、错误案例10
错误信息:
create table t2 as select * from t1;
ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE … SELECT.
错误原因:配置MGR,开启了GTID,所有GTID不支持的操作,MGR也不支持
解决方案:使用create table t2 like t1; insert into t2 select * from t;分开两个事务执行

PS:如果您想和业内技术大牛交流的话,请加qq群(527933790)或者关注微信公众 号(AskHarries),谢谢!

转载请注明原文出处:Harries Blog™ » MySQL Group Replication 学习(部署篇)

赞 (0)

分享到:更多 ()

评论 0

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址