1.环境配置

本实验环境共有四个节点, 其角色分配如下(实验机器均为centos 7.x

机器名称IP配置服务角色
manager172.16.90.211manager控制器
master172.16.90.212数据库主服务器
slave1172.16.90.213数据库从服务器
slave2172.16.90.214数据库从服务器

为了方便我们后期的操作,我们在各节点的/etc/hosts文件配置内容中添加如下内容:

172.16.90.211 node1
172.16.90.212 node2
172.16.90.213 node3
172.16.90.214 node4

1)初始主节点 master 的配置

在master上添加以下信息

#vim /etc/my.cnf
[mysqld]
server_id = 1					#复制集群中的各节点的id均必须唯一
skip_name_resolve   #关闭名称解析
gtid-mode = on					#启用gtid类型
enforce-gtid-consistency = true #强制GTID的一致性
log-slave-updates = 1			#slave更新是否记入日志
log-bin = mysql-bin				#开启二进制日志
relay-log = relay-log			#开启中继日志

[mysql]
prompt = (\\u@\\h) [\d] >\\
no_auto_rehash

2)所有 slave 节点依赖的配置

修改两个 slave 的数据库配置文件,两台机器都做如下操作:

#vim /etc/my.cnf
[mysqld]
server_id = 2和3					#复制集群中的各节点的id均必须唯一
skip_name_resolve   #关闭名称解析
gtid-mode = on					#启用gtid类型
enforce-gtid-consistency = true #强制GTID的一致性
log-slave-updates = 1			#slave更新是否记入日志
log-bin = mysql-bin				#开启二进制日志
relay-log = relay-log			#开启中继日志
read_only = ON						#启用只读属性
relay_log_purge = 0					#是否自动清空不再需要中继日志

[mysql]
prompt = (\\u@\\h) [\d] >\\      #连接上之后会有提示符
no_auto_rehash

之后三台机器都重启mysqld  # systemctl restart mysqld

3)配置一主多从复制架构

master 节点上:  

(root@localhost) [(none)]>create user rep@'%' identified with mysql_native_paassword by 'MySQL@123';
(root@localhost) [(none)]>grant replication slave,replication client on *.* to rep@'%';

两个slave 节点上:

(root@localhost) [(none)]>change master to
    -> master_user='rep',
    -> master_password='MySQL@123',
    -> master_host='172.16.90.212',
    -> master_auto_position=1;

(root@localhost) [(none)]>start slave;
Query OK, 0 rows affected (0.00 sec)

(root@localhost) [(none)]>show slave status \G
*************************** 1. row ***************************
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

2.安装配置MHA

1)下载MHA    

下载两个rpm包    

mha4mysql-manager-0.58-0.el7.centos.noarch.rpm

mha4mysql-node-0.58-0.el7.centos.noarch.rpm

node4安装 yum install mha*.rpm

node1 node2 node3 安装 mha4mysql-node-0.58-0.el7.centos.noarch.rpm

2)准备 ssh 互通环境

[root@all ~]# ssh-keygen -f ~/.ssh/id_rsa -P '' -q
[root@all ~]# ssh-copy-id node1

所有节点都生成密钥对,然后拷贝到管理节点,最后管理节点的authorized_keys分发到其他节点
[root@node1 ~]# cat ~/.ssh/authorized_keys 
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDN0CAV4pyxzBTev1ooGEp8DCr9RjQYvXwMga79zpLqdWu3sqWu8s5+QM6ylZ127cU66f5aYM/aNW6fujViI40ILOzD1GNGsWGcFCmGd0Do5hCs/rNf+4+FvanCOWGTkr/px15akaLI3YApX4e7qjPwVO2UEEnEXGB/9cGMzN000Nl8EZMq6S+DWZ4W2uap0K+xyznHy1SadjV4bDAaQtAWks8HYWSSQLWHn20LCc66FMVXeanZOd3wc4KgI2tA/3tfNbSnYUhX6+Yu3D8FAqdBdgD0QfKfROSDcwf3qZA9VQsYn6FZ//IFYx95KI/pDMmUD0Fe7ARlhv/Mc0dw9gGf root@node3.kongd.com
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC0ujR4l0Y1XsBRtTCKxViXDs7kF7ephdaW0ovMFMLsFq6WsffrW3aiJyTvDIN6+7WSuuCVsVuxT1GLyWzYgnYnAa6bEYRuBK39idGiVCSdTJ/i3wY0XCiFXSmL2UiW4nBQQQUQd/xtu/PAHFNJ1WUT+DNxUz5pwfP6hYTN5xJ0YDqS2nmAkrP/9Elgb75yGOKIvCpyCZPWzGsAclmr6aw2WngAHoBtKgMP3mYOdTMaF/xQLK7W0n7QB0jaY9culCPKfFkxcHEiDJRxyHPz0SIFvGXxoJ4fHXO41p8nBipIXXOM3QMO8P8s6BM1g+MW9uvEgVi4Gk15jXtyYMnZNbL7 root@node1
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCvRsXlBbUuZDzoeBJ4qYkH+avwDjH3qkgUbHjKcxXolvtScShy4rH2xFdCVcwCltb8HeV1P3O1+f5HCHiCzNxQgrDW/pFmCJmtLojVVijZvpYrtRWlRUZtUOnau438q+WgxJED7RJc2J6Sah+3pxVGuK5AcQhX94VVbl45ZvOvqQd9AoYtyJwepp+B6BcscMPr6xuJhtgQPIPBFzwiqyQ7MWITirZqQvlK0rYw30VYuo3znh1PnmH8BHIm1Z0RPueCNZLnzTyu3+ZrUODcEqipWVemhntGMkadABEiBikmlqJpNPdqlg8uzdmwN4zoP4z05sEptfcN4kNg0/ejy9Tf root@node2.kongd.com
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCyf81Si+QgwEXektvtqxloYI46QtTe3xxRv4cU6hy3AbqJobgk3+d9mrxUkQxtvUA+BfPTQOBAMXYCVJIOSDMMZnkjbHRN3rNsTzoGZ91V8L5rVDwOPjSZC+GZB+8p8EHOtd7ofYcnzp/rBGoBmNo2AC3HAyZp5ZRmxE4/jayVy6XgYVPGSuSitBqWRiKxNLJfbYgpHdXjUhWVA02u9iR9RD+4i7Qy0hYaf3Cv5hEiZwUxnxbmIuVGDuoJAIOEUSK5g9hdcOKxCTuz3mA02bocVKRouIVIK12PQo/t9n8Q5MEQgGVMGQrz0eSxLzLYwJdJuq5pN3K5uYFuslUnDXRJ root@node4.kongd.com

[root@node1 ~]# scp ~/.ssh/authorized_keys node2:~/.ssh/
root@node2's password: 
authorized_keys                                                                                                    100% 1598     3.4MB/s   00:00    
[root@node1 ~]# scp ~/.ssh/authorized_keys node3:~/.ssh/
root@node3's password: 
authorized_keys                                                                                                    100% 1598     2.9MB/s   00:00    
[root@node1 ~]# scp ~/.ssh/authorized_keys node4:~/.ssh/
root@node4's password: 
authorized_keys                                                                                                    100% 1598     2.9MB/s   00:00   

验证:所有节点验证
[root@node4 ~]# for i in node{1..4};do ssh $i hostname;done
node1
node2
node3
node4

 

3) 初始化 MHA ,进行配置

  Manager 节点需要为每个监控的 master/slave 集群提供一个专用的配置文件,而所有的 master/slave 集群也可共享全局配置。全局配置文件默认为/etc/masterha_default.cnf,其为可选配置。如果仅监控一组 master/slave 集群,也可直接通过 application 的配置来提供各服务器的默认配置信息。而每个 application 的配置文件路径为自定义。具体操作见下一步骤。

在 master 上进行授权

在所有 Mysql 节点授权拥有管理权限的用户可在本地网络中由其他节点上远程访问。 当然, 此时仅需要且只能在 master 节点运行类似如下 SQL 语句即可。

(root@localhost) [(none)]> create user mhaadmin@'%' identified with mysql_native_pasword by 'Mha@1234';
(root@localhost) [(none)]> grant all on *.* to mhaadmin@'%';
Query OK, 0 rows affected, 1 warning (0.00 sec)

定义 MHA 管理配置文件

为MHA专门创建一个管理用户, 方便以后使用, 在mysql的主节点上, 三个节点自动同步:

创建配置文件目录
# mkdir /etc/mha
创建日志目录
# mkdir -p /var/log/mha/app1
# vim /etc/mha/app1.cnf

配置文件内容如下;

# vim /etc/mha/app1.cnf
[server default] 			//适用于server1,2,3个server的配置
user=mhaadmin 				//mha管理用户
password=Mha@1234			//mha管理密码
manager_workdir=/var/log/mha/app1 		//mha_master自己的工作路径
manager_log=/var/log/mha/app1/manager.log 	// mha_master自己的日志文件
ssh_user=root 				// 基于ssh的密钥认证
repl_user=rep				//数据库用户名
repl_password=MySQL@123	//数据库密码
ping_interval=1 			//ping间隔时长
[server1] 					//节点2
hostname=172.16.90.212	 	//节点2主机地址
ssh_port=22 				//节点2的ssh端口
candidate_master=1 			//将来可不可以成为master候选节点/主节点
[server2]
hostname=172.16.90.213
ssh_port=22
candidate_master=1
[server3]
hostname=172.16.90.214
ssh_port=22
candidate_master=1

3.对四个节点进行检测

1)检测各节点间 ssh 互信通信配置是否 ok   我们在 Manager 机器上输入下述命令来检测:

[root@node1 ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Wed Nov 24 11:36:56 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Nov 24 11:36:56 2021 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Wed Nov 24 11:36:56 2021 - [info] Reading server configuration from /etc/mha/app1.cnf..
Wed Nov 24 11:36:56 2021 - [info] Starting SSH connection tests..
Wed Nov 24 11:36:57 2021 - [debug] 
Wed Nov 24 11:36:56 2021 - [debug]  Connecting via SSH from root@172.16.90.212(172.16.90.212:22) to root@172.16.90.213(172.16.90.213:22)..
Wed Nov 24 11:36:56 2021 - [debug]   ok.
Wed Nov 24 11:36:56 2021 - [debug]  Connecting via SSH from root@172.16.90.212(172.16.90.212:22) to root@172.16.90.214(172.16.90.214:22)..
Wed Nov 24 11:36:56 2021 - [debug]   ok.
Wed Nov 24 11:36:57 2021 - [debug] 
Wed Nov 24 11:36:56 2021 - [debug]  Connecting via SSH from root@172.16.90.213(172.16.90.213:22) to root@172.16.90.212(172.16.90.212:22)..
Wed Nov 24 11:36:56 2021 - [debug]   ok.
Wed Nov 24 11:36:56 2021 - [debug]  Connecting via SSH from root@172.16.90.213(172.16.90.213:22) to root@172.16.90.214(172.16.90.214:22)..
Wed Nov 24 11:36:57 2021 - [debug]   ok.
Wed Nov 24 11:36:58 2021 - [debug] 
Wed Nov 24 11:36:57 2021 - [debug]  Connecting via SSH from root@172.16.90.214(172.16.90.214:22) to root@172.16.90.212(172.16.90.212:22)..
Wed Nov 24 11:36:57 2021 - [debug]   ok.
Wed Nov 24 11:36:57 2021 - [debug]  Connecting via SSH from root@172.16.90.214(172.16.90.214:22) to root@172.16.90.213(172.16.90.213:22)..
Wed Nov 24 11:36:57 2021 - [debug]   ok.
Wed Nov 24 11:36:58 2021 - [info] All SSH connection tests passed successfully.

2)检查管理的MySQL复制集群的连接配置参数是否OK

[root@node1 ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
...
Wed Nov 24 11:37:55 2021 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

4.启动 MHA

我们在 manager 节点上执行以下命令来启动 MHA:

[root@node1 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
[1] 11457

启动成功以后,我们来查看一下 master 节点的状态:

[root@node1 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:11457) is running(0:PING_OK), master:172.16.90.212

查看监控日志命令如下:

[root@node1 ~]# tail -f /var/log/mha/app1/manager.log
172.16.90.212(172.16.90.212:3306) (current master)
 +--172.16.90.213(172.16.90.213:3306)
 +--172.16.90.214(172.16.90.214:3306)

Wed Nov 24 11:45:30 2021 - [warning] master_ip_failover_script is not defined.
Wed Nov 24 11:45:30 2021 - [warning] shutdown_script is not defined.
Wed Nov 24 11:45:30 2021 - [info] Set master ping interval 1 seconds.
Wed Nov 24 11:45:30 2021 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Wed Nov 24 11:45:30 2021 - [info] Starting ping health check on 172.16.90.212(172.16.90.212:3306)..
Wed Nov 24 11:45:30 2021 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

上面启动太麻烦了,可以用脚本来启动

[root@node1 ~]# vim /etc/init.d/masterha_managerd
#!/bin/bash
# chkconfig: 35 80 20
# description: MHA management script.

STARTEXEC="/usr/bin/masterha_manager --conf"
STOPEXEC="/usr/bin/masterha_stop --conf"
CONF="/etc/mha/app1.cnf"
process_count=`ps -ef |grep -w masterha_manager|grep -v grep|wc -l`
PARAMS="--ignore_last_failover"

case "$1" in
  start)
      if [ $process_count -gt 1 ]
      then
              echo "masterha_manager exists, process is already running"
      else
              echo "Starting Masterha Manager"
              $STARTEXEC $CONF $PARAMS < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
      fi
      ;;
  stop)
      if [ $process_count -eq 0 ]
      then
              echo "Masterha Manager does not exist, process is not running"
      else
              echo "Stopping ..."
              $STOPEXEC $CONF
              while(true)
              do
                  process_count=`ps -ef |grep -w masterha_manager|grep -v grep|wc -l`
                  if [ $process_count -gt 0 ]
                      then
                           sleep 1
                      else
                           break
                  fi
              done
              echo "Master Manager stopped"
      fi
      ;;
  *)
      echo "Please use start or stop as first argument"
      ;;
esac
[root@node1 ~]# chmod +x /etc/init.d/masterha_managerd
[root@node1 ~]# chkconfig --add masterha_managerd
[root@node1 ~]# chkconfig masterha_managerd on

测试服务脚本:

[root@node1 ~]# systemctl start masterha_managerd
[root@node1 ~]# systemctl status masterha_managerd
● masterha_managerd.service - SYSV: MHA management script.
   Loaded: loaded (/etc/rc.d/init.d/masterha_managerd; bad; vendor preset: disabled)
   Active: active (running) since Wed 2021-11-24 15:22:55 CST; 9s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 25727 ExecStart=/etc/rc.d/init.d/masterha_managerd start (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/masterha_managerd.service
           └─25733 perl /usr/bin/masterha_manager --conf /etc/mha/app1.cnf --ignore_last_failover

Nov 24 15:22:54 node1.kongd.com systemd[1]: Starting SYSV: MHA management script....
Nov 24 15:22:55 node1.kongd.com masterha_managerd[25727]: Starting Masterha Manager
Nov 24 15:22:55 node1.kongd.com systemd[1]: Started SYSV: MHA management script..
[root@node1 ~]# ps -ef | grep -w masterha_manager
root      25733      1  0 15:22 ?        00:00:00 perl /usr/bin/masterha_manager --conf /etc/mha/app1.cnf --ignore_last_failover
root      25800   1169  0 15:23 pts/0    00:00:00 grep --color=auto -w masterha_manager

[root@node1 ~]# systemctl stop masterha_managerd
[root@node1 ~]# ps -ef | grep -w masterha_manager
root      25856   1169  0 15:24 pts/0    00:00:00 grep --color=auto -w masterha_manager

5.配置VIP   

vip配置可以采用两种方式,一种通过keepalived的方式管理虚拟ip的浮动;另外一种通过脚本方式启动虚拟ip的方式 (即不需要keepalived或者heartbeat类似的软件).

为了防止脑裂发生,推荐生产环境采用脚本的方式来管理虚拟ip,而不是使用keepalived来完成。

1、编写脚本

[root@node1 ~]# vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
​
use strict;
use warnings FATAL => 'all';
​
use Getopt::Long;
​
my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);
​
my $vip = '172.16.90.210/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
​
GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);
​
exit &main();
​
sub main {
​
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
​
    if ( $command eq "stop" || $command eq "stopssh" ) {
​
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
​
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}
​
sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
     return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
​
sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
​
[root@node1 ~]# chmod +x /usr/local/bin/master_ip_failover 

2、更改manager配置文件

[root@node1 ~]# vim /etc/mha/app1.cnf 
[server default]
添加:
master_ip_failover_script=/usr/local/bin/master_ip_failover

3、主库上,手工生成第一个vip地址

[root@node2 ~]# ifconfig eth0:1 172.16.90.210/24
注意:第一次需要在主库上手动配置vip
[root@node2 ~]# ifconfig -a |grep -A 2 "eth0:1"
eth0:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.90.210  netmask 255.255.255.0  broadcast 172.16.90.255
        ether 28:6e:d4:89:3b:3e  txqueuelen 1000  (Ethernet)

4、重启MHA

[root@node1 ~]# systemctl restart masterha_managerd

6.测试 MHA 故障转移

在 master 节点关闭 mysql 服务,模拟主节点数据崩溃

[root@node2 ~]# systemctl stop mysqld.service

在 manger 节点查看日志

[root@node1 ~]# tail -1 /var/log/mha/app1/manager.log 
Master failover to 172.16.90.213(172.16.90.213:3306) completed successfully.

检查VIP

[root@node3 ~]# ifconfig -a |grep -A 2 "eth0:1"
eth0:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.90.210  netmask 255.255.255.0  broadcast 172.16.90.255
        ether 28:6e:d4:89:3b:3e  txqueuelen 1000  (Ethernet)

表示 manager 检测到172.16.90.212节点故障, 而后自动执行故障转移, 将172.16.90.213提升为主节点。   注意,故障转移完成后, manager将会自动停止, 此时使用 masterha_check_status 命令检测将会遇到错误提示, 如下所示:

[root@node1 ~]# masterha_check_status -conf=/etc/mha/app1.cnf
app1 is stopped(2:NOT_RUNNING).

 

7.提供新的从节点以修复复制集群

      原有 master 节点故障后,需要重新准备好一个新的 MySQL 节点。基于来自于master 节点的备份恢复数据后,将其配置为新的 master 的从节点即可。注意,新加入的节点如果为新增节点,其 IP 地址要配置为原来 master 节点的 IP,否则,还需要修改 mha.cnf 中相应的 ip 地址。随后再次启动 manager ,并再次检测其状态。

[root@node2 ~]# systemctl start mysql.service

(root@localhost) [(none)]> change master to
    -> master_user='rep',
    -> master_password='MySQL@123',
    -> master_host='172.16.90.213',
    -> master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.00 sec)

(root@localhost) [(none)]> start slave;
Query OK, 0 rows affected (0.01 sec)

(root@localhost) [(none)]> show slave status \G
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

8.新节点提供后再次执行检查操作

添加配置:
[root@node1 ~]# vim /etc/mha/app1.cnf
[server1]
hostname=172.16.90.212
ssh_port=22
candidate_master=1

[root@node1 ~]# systemctl restart masterha_managerd
[root@node1 ~]# masterha_check_status -conf=/etc/mha/app1.cnf
app1 (pid:26445) is running(0:PING_OK), master:172.16.90.213

 

Logo

开放原子开发者工作坊旨在鼓励更多人参与开源活动,与志同道合的开发者们相互交流开发经验、分享开发心得、获取前沿技术趋势。工作坊有多种形式的开发者活动,如meetup、训练营等,主打技术交流,干货满满,真诚地邀请各位开发者共同参与!

更多推荐