MySQL之高可用组件MHA

1 简介

MHA(Master High Availability)是由日本人yoshinorim开发的一款成熟且开源的MySQL高可用程序,它实现了MySQL主从环境下MASTER宕机后能够自动进行单次故障转移的功能,其本身由perl语言编写,安装方便,使用简单

MHA Master可以单独部署在一台独立的机器上管理多个master-slave集群,也可以部署在一台slave节点上

MHA Manager主要运行一些工具,比如masterha_manager工具实现自动监控MySQL Master和实现master故障切换,其它工具实现手动实现master故障切换、在线master转移、连接检查等

MHA Node部署在所有运行MySQL的服务器上,无论是master还是slave。主要作用有三个

Ⅰ、保存二进制日志 如果能够访问故障master,会拷贝master的二进制日志

II、应用差异中继日志 从拥有最新数据的slave上生成差异中继日志,然后应用差异日志

III、清除中继日志 在不停止SQL线程的情况下删除中继日志

MHA工作原理

  • 从宕机崩溃的Master保存二进制日志事件(binlog event);
  • 识别含有最新更新的Slave;
  • 应用差异的中继日志(relay log)到其他Slave;
  • 应用从Master保存的二进制日志事件;
  • 提升一个Slave为新的Master;
  • 使其他的Slave连接新的Master进行复制;
  • MHA数据补偿

    • 当SSH能连接,从库对比主库GTID 或者position号,立即将二进制日志保存至各个从节点并且应用(save_binary_logs )
    • 当SSH不能连接, 对比从库之间的relaylog的差异(apply_diff_relay_logs)

    2 环境部署

    节点
    ip
    主节点 10.243.95.3
    从节点1 10.243.95.4
    从节点2 10.243.95.5

    数据库复制一主两从架构,MHA node部署在每个节点上,MHA manager部署在从节点2上

    2.1 建立SSH互信

    #各节点生成rsa密钥
    ssh-keygen -t rsa
    #发送密钥至三台服务器实现互信
    ssh-copy-id [email protected]
    ssh-copy-id [email protected]
    ssh-copy-id [email protected]

    2.2 搭建主从复制

    具体内容见上篇主从复制,注意mha57以下版本不支持GTID事务,需避雷,按需选择下面的部署方法

    2.3 安装依赖包

    2.3.1 MHA-0.57-el6版本

    #1.所有节点安装以下依赖包
    yum -y install perl-DBD-MySQL perl-ExtUtils-MakeMaker perl-CPAN
    yum install -y perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
    #2.所有节点安装mha-node包
    rpm -ivh  mha4mysql-node-0.57-0.el6.noarch.rpm
    #3.管理节点安装mha-manage包
    rpm -ivh  mha4mysql-manager-0.57-0.el6.noarch.rpm

    2.3.2 MHA-0.55-el6版本

    #1.解压mha包
    unzip -u mha_pack.zip
    cd mha_pack/mha_pack/rhel6/mha2
    #2.安装依赖
    rpm -ivh perl-DBI-1.609-4.el6.x86_64.rpm
    rpm -ivh perl-DBD-MySQL-4.013-3.el6.x86_64.rpm
    rpm -ivh mha4mysql-node-0.54-0.el6.noarch.rpm
    rpm -ivh perl-DBI-1.609-4.el6.x86_64.rpm
    rpm -ivh perl-DBD-MySQL-4.013-3.el6.x86_64.rpm
    rpm -ivh mha4mysql-node-0.54-0.el6.noarch.rpm
    rpm -ivh perl-Config-Tiny-2.12-7.1.el6.noarch.rpm
    rpm -ivh perl-TimeDate-1.16-11.1.el6.noarch.rpm
    rpm -ivh perl-MailTools-2.04-4.el6.noarch.rpm
    rpm -ivh perl-Email-Date-Format-1.002-5.el6.noarch.rpm
    rpm -ivh perl-MIME-Types-1.28-2.el6.noarch.rpm
    rpm -ivh perl-MIME-Lite-3.027-2.el6.noarch.rpm
    rpm -ivh perl-Mail-Sender-0.8.16-3.el6.noarch.rpm
    rpm -ivh perl-Mail-Sendmail-0.79-12.el6.noarch.rpm
    rpm -ivh perl-Params-Validate-0.92-3.el6.x86_64.rpm
    rpm -ivh perl-Log-Dispatch-2.27-1.el6.noarch.rpm
    rpm -ivh perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm
    rpm -ivh perl-Time-HiRes-1.9721-127.el6.x86_64.rpm
    rpm -ivh mha4mysql-manager-0.55-0.el6.noarch.rpm

    2.4 配置文件

    新建mha配置文件:vi /etc/masterha/app1.cnf

    [server default]
    # mysql root user and password
    user=root
    password=123456

    # mysql log locations
    #master_binlog_dir=/mysql/data

    # user to use SSH around the cluster
    ssh_user=root

    # repl user and password
    repl_user=repl
    repl_password=123456

    # working directory on the manager
    manager_log=/root/mha/log/manager.log
    manager_workdir=/root/mha/log
    #remote_workdir=/mysql/mha/log

    # MHA ping interval
    ping_interval=5

    ## scripts to support MHA functions
    ## secondary_check_script= /usr/bin/masterha_secondary_check -s remote_host1 -s remote_host2
    master_ip_failover_script= /root/mha/scripts/master_ip_failover
    ## shutdown_script= /usr/local/masterha/scripts/power_manager
    ## report_script= /usr/local/masterha/scripts/send_report
    master_ip_online_change_script= /root/mha/scripts/master_ip_online_change

    # masters
    [server1]
    hostname=10.243.95.3
    port=3306
    candidate_master=1
    master_binlog_dir=/opt/mysql/binlog
    remote_workdir=/root/mha/log

    [server2]
    hostname=10.243.95.4
    port=3306
    candidate_master=1
    check_repl_delay=0
    master_binlog_dir=/opt/mysql/binlog
    remote_workdir=/root/mha/log

    [server3]
    hostname=10.243.95.5
    port=3306 
    candidate_master=1
    check_repl_delay=0
    master_binlog_dir=/opt/mysql/binlog
    remote_workdir=/root/mha/log

    2.5 切换脚本

    1)创建一些目录

    #建立mha的文件夹
    mkdir -p /root/mha/conf
    mkdir -p /root/mha/log 
    mkdir -p /root/mha/scripts
    chmod -R 777 /root/mha/scripts

    2)新建脚本

    • 在线切换脚本
    • vi /root/mha/scripts/master_ip_online_change
    • #!/usr/bin/env perl

      #  Copyright (C) 2011 DeNA Co.,Ltd.
      #
      #  This program is free software; you can redistribute it and/or modify
      #  it under the terms of the GNU General Public License as published by
      #  the Free Software Foundation; either version 2 of the License, or
      #  (at your option) any later version.
      #
      #  This program is distributed in the hope that it will be useful,
      #  but WITHOUT ANY WARRANTY; without even the implied warranty of
      #  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
      #  GNU General Public License for more details.
      #
      #  You should have received a copy of the GNU General Public License
      #   along with this program; if not, write to the Free Software
      #  Foundation, Inc.,
      #  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

      ## Note: This is a sample script and is not complete. Modify the script based on your environment.

      use strict;
      use warnings FATAL => ‘all’;

      use Getopt::Long;
      use MHA::DBHelper;
      use MHA::NodeUtil;
      use Time::HiRes qw( sleep gettimeofday tv_interval );
      use Data::Dumper;

      my $_tstart;
      my $_running_interval = 0.1;
      my (
        $command, $ssh_user,         $orig_master_host, $orig_master_ip,
        $orig_master_port, $orig_master_user, $orig_master_password,
        $new_master_host,  $new_master_ip,    $new_master_port,
        $new_master_user,  $new_master_password
      );

      my $vip = ‘192.168.90.30’;  # writer Virtual IP
      my $gw = `route|grep default|awk ‘{print $2}’`;
      chomp($gw);
      my $interface = ‘eth0’;  # network interface
      my $ssh_start_vip = “ip addr add $vip/32 dev $interface”;
      my $ssh_ping = “arping -U -I $interface -s $vip $gw -c 5”;
      my $ssh_stop_vip = “ip addr del $vip/32 dev $interface”;

      GetOptions(
        ‘command=s’              => $command,
        ‘ssh_user=s’                => $ssh_user,
        ‘orig_master_host=s’     => $orig_master_host,
        ‘orig_master_ip=s’       => $orig_master_ip,
        ‘orig_master_port=i’     => $orig_master_port,
        ‘orig_master_user=s’     => $orig_master_user,
        ‘orig_master_password=s’ => $orig_master_password,
        ‘new_master_host=s’      => $new_master_host,
        ‘new_master_ip=s’        => $new_master_ip,
        ‘new_master_port=i’      => $new_master_port,
        ‘new_master_user=s’      => $new_master_user,
        ‘new_master_password=s’  => $new_master_password,
      );

      $ssh_user = ‘root’ unless ($ssh_user);

      exit &main();

      sub current_time_us {
        my ( $sec, $microsec ) = gettimeofday();
        my $curdate = localtime($sec);
        return $curdate . ” ” . sprintf( “%06d”, $microsec );
      }

      sub sleep_until {
        my $elapsed = tv_interval($_tstart);
        if ( $_running_interval > $elapsed ) {
          sleep( $_running_interval – $elapsed );
        }
      }

      sub get_threads_util {
        my $dbh                    = shift;
        my $my_connection_id       = shift;
        my $running_time_threshold = shift;
        my $type                   = shift;
        $running_time_threshold = 0 unless ($running_time_threshold);
        $type                   = 0 unless ($type);
        my @threads;

        my $sth = $dbh->prepare(“SHOW PROCESSLIST”);
        $sth->execute();

        while ( my $ref = $sth->fetchrow_hashref() ) {
          my $id         = $ref->{Id};
          my $user       = $ref->{User};
          my $host       = $ref->{Host};
          my $command    = $ref->{Command};
          my $state      = $ref->{State};
          my $query_time = $ref->{Time};
          my $info       = $ref->{Info};
          $info =~ s/^s*(.*?)s*$/$1/ if defined($info);
          next if ( $my_connection_id == $id );
          next if ( defined($query_time) && $query_time < $running_time_threshold );
          next if ( defined($command)    && $command eq “Binlog Dump” );
          next if ( defined($user)       && $user eq “system user” );
          next
            if ( defined($command)
            && $command eq “Sleep”
            && defined($query_time)
            && $query_time >= 1 );

          if ( $type >= 1 ) {
            next if ( defined($command) && $command eq “Sleep” );
            next if ( defined($command) && $command eq “Connect” );
          }

          if ( $type >= 2 ) {
            next if ( defined($info) && $info =~ m/^select/i );
            next if ( defined($info) && $info =~ m/^show/i );
          }

          push @threads, $ref;
        }
        return @threads;
      }

      sub main {
        if ( $command eq “stop” ) {
          ## Gracefully killing connections on the current master
          # 1. Set read_only= 1 on the new master
          # 2. Set read_only= 1 on the current master
          # 3. Kill current queries
          # 4. Deegister original master’s ip to the catalog database
          # * Any database access failure will result in script die.
          my $exit_code = 1;
          eval {
            ## Setting read_only=1 on the new master (to avoid accident)
            my $new_master_handler = new MHA::DBHelper();

            # args: hostname, port, user, password, raise_error(die_on_error)_or_not
            $new_master_handler->connect( $new_master_ip, $new_master_port,
              $new_master_user, $new_master_password, 1 );
            print current_time_us() . ” Set read_only on the new master.. “;
            $new_master_handler->enable_read_only();
            if ( $new_master_handler->is_read_only() ) {
              print “ok.n”;
            }
            else {
              die “Failed!n”;
            }
            $new_master_handler->disconnect();

            # Connecting to the orig master, die if any database error happens
            my $orig_master_handler = new MHA::DBHelper();
            $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
              $orig_master_user, $orig_master_password, 1 );

            ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
            $orig_master_handler->disable_log_bin_local();

            ## Waiting for N * 100 milliseconds so that current connections can exit
            my $time_until_read_only = 15;
            $_tstart = [gettimeofday];
            my @threads = get_threads_util( $orig_master_handler->{dbh},
              $orig_master_handler->{connection_id} );
            while ( $time_until_read_only > 0 && $#threads >= 0 ) {
              if ( $time_until_read_only % 5 == 0 ) {
                printf
      “%s Waiting all running %d threads are disconnected.. (max %d milliseconds)n”,
                  current_time_us(), $#threads + 1, $time_until_read_only * 100;
                if ( $#threads < 5 ) {
                  print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . “n”
                    foreach (@threads);
                }
              }
              sleep_until();
              $_tstart = [gettimeofday];
              $time_until_read_only–;
              @threads = get_threads_util( $orig_master_handler->{dbh},
                $orig_master_handler->{connection_id} );
            }

            ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
            print current_time_us() . ” Set read_only=1 on the orig master.. “;
            $orig_master_handler->enable_read_only();
            if ( $orig_master_handler->is_read_only() ) {
              print “ok.n”;
            }
            else {
              die “Failed!n”;
            }

            ## Waiting for M * 100 milliseconds so that current update queries can complete
            my $time_until_kill_threads = 5;
            @threads = get_threads_util( $orig_master_handler->{dbh},
              $orig_master_handler->{connection_id} );
            while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
              if ( $time_until_kill_threads % 5 == 0 ) {
                printf
      “%s Waiting all running %d queries are disconnected.. (max %d milliseconds)n”,
                  current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
                if ( $#threads < 5 ) {
                  print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . “n”
                    foreach (@threads);
                }
              }
              sleep_until();
              $_tstart = [gettimeofday];
              $time_until_kill_threads–;
              @threads = get_threads_util( $orig_master_handler->{dbh},
                $orig_master_handler->{connection_id} );
            }

            ## Terminating all threads
            print current_time_us() . ” Killing all application threads..n”;
            $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
            print current_time_us() . ” done.n”;
            $orig_master_handler->enable_log_bin_local();
            $orig_master_handler->disconnect();

            ## Deregister original master’s VIP
            print “Disabling the VIP on old master: $orig_master_host n”;
            &stop_vip();
            $exit_code = 10;
            
            ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
            $exit_code = 0;
          };
          if ($@) {
            warn “Got Error: $@n”;
            exit $exit_code;
          }
          exit $exit_code;
        }
        elsif ( $command eq “start” ) {
          # 1. Register new master’s ip to the catalog database

      # We don’t return error even though activating updatable accounts/ip failed so that we don’t interrupt slaves’ recovery.
      # If exit code is 0 or 10, MHA does not abort
          my $exit_code = 10;
          eval {
            ## Update master ip on the catalog database, etc
            print “Enabling the VIP – $vip on the new master – $new_master_host n”;
            &start_vip();
            
            $exit_code = 0;
          };
          if ($@) {
            warn “Got Error: $@n”;
            exit $exit_code;
          }
          exit $exit_code;
        }
        elsif ( $command eq “status” ) {

          # do nothing
          exit 0;
        }
        else {
          &usage();
          exit 1;
        }
      }

      # A simple system call that enable the VIP on the new master 
      sub start_vip() {
          `ssh $ssh_user@$new_master_host ” $ssh_start_vip “`;
          `ssh $ssh_user@$new_master_host ” $ssh_ping “`;
      }

      # A simple system call that disable the VIP on the old_master
      sub stop_vip() {
          `ssh $ssh_user@$orig_master_host ” $ssh_stop_vip “`;
      }

      sub usage {
        print
      “Usage: master_ip_online_change –command=start|stop|status –orig_master_host=host –orig_master_ip=ip –orig_master_port=port –new_master_host=host –new_master_ip=ip –new_master_port=portn”;
        die;
      }

      注:若提示用户无repl权限且确认已添加,检查mysql.user表下的repl用户是否存在其它host存在repl权限为N的行,因为mha检查到任意行存在N即返回失败

      2.6.3 故障自动切换验证

      1、开启三个数据库节点实例,确保主从复制状态正常(执行2.6.2操作)

      2、后台打开监控程序,并持续监控日志

    • 4、完成切换

      故障切换内容:

      • 检查到master不通,尝试继续ping检查,达到阈值5触发自动切换
      • 若配置文件未指定候选节点,根据日志量选举新的master主节点
      • 若主节点未宕机,移除原master节点的vip
      • 其他节点与新的master建立复制关系,主从复制同步中
      • 新master节点启动vip,完成切换
      2.6.4 手动切换

      1)主节点已宕机

    • 2)主节点未宕机
    0

    评论0

    请先
    显示验证码
    没有账号?注册  忘记密码?