Notes on Building a Keepalived+Nginx High-Availability Cluster

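Preface

Keepalived is built on VRRP (Virtual Router Redundancy Protocol), a protocol for router high availability: several routers that provide the same function are grouped into one router group.

How it works: a Keepalived cluster has one MASTER and one or more BACKUP nodes. The MASTER holds a Virtual IP (VIP) that serves external traffic, and it sends heartbeat messages by multicast. When a BACKUP stops receiving VRRP packets, it considers the MASTER down, and a BACKUP is elected as the new MASTER according to VRRP priority. When the original MASTER recovers, the BACKUP releases the IP resources and services it took over during the failure and returns to its standby role. This keeps the router highly available.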
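The election step can be pictured with a small sketch. This is only an illustration of "highest priority wins", not how Keepalived implements it: the helper name elect_master and the node names are hypothetical, and ties are not handled.

```shell
#!/bin/bash
# Hypothetical helper: read "name priority" lines on stdin and print the
# name with the highest priority. Simplification: ties are not handled.
elect_master() {
    sort -k2,2nr | head -n 1 | awk '{print $1}'
}

# Two surviving BACKUP nodes after the MASTER stops sending heartbeats
printf '%s\n' 'backup1 99' 'backup2 90' | elect_master   # prints: backup1
```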

Environment

  • Operating system: CentOS 7 (Minimal Install)
# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
  • Demo environment
VIP          IP           Hostname
10.10.0.10   10.10.0.11   master
10.10.0.10   10.10.0.12   backup

Deployment

Switch the server's yum repository

# mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
# curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
# yum makecache
# yum -y update

Install Keepalived

Install with yum

Keepalived can be installed directly with yum. Run the following on both the master and the backup server:

# yum -y install keepalived

Compile and install from source

Reference: official Keepalived documentation

Install the dependency libraries

# yum -y install openssl-devel libnl3-devel ipset-devel iptables-devel file-devel net-snmp-devel glib2-devel json-c-devel pcre2-devel libnftnl-devel libmnl-devel

Download Keepalived

# wget https://github.com/acassen/keepalived/archive/v2.0.18.tar.gz

Extract Keepalived

# tar -zxvf v2.0.18.tar.gz
# cd keepalived-2.0.18

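Start the installation

# ./build_setup
./build_setup: line 3: aclocal: command not found
./build_setup: line 4: autoheader: command not found
./build_setup: line 5: automake: command not found
./build_setup: line 6: autoreconf: command not found

If you see the errors above, install the autotools packages (aclocal ships with automake; autoheader and autoreconf ship with autoconf), then run ./build_setup again

# yum -y install autoconf automake

Continue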

# ./configure
# make && make install

Finally, copy the configuration files to the system default paths

# mkdir /etc/keepalived
# cp ./keepalived/etc/keepalived/keepalived.conf /etc/keepalived/
# cp ./keepalived/etc/init.d/keepalived /etc/init.d/
# cp ./keepalived/etc/sysconfig/keepalived /etc/sysconfig/

In /usr/lib/systemd/system/keepalived.service, change the value of PIDFile to /var/run/keepalived.pid
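The PIDFile change can also be scripted instead of edited by hand. The following is a minimal sketch assuming GNU sed; the helper name set_pidfile is hypothetical, and the unit file path is the one given above.

```shell
#!/bin/bash
# Hypothetical helper: rewrite the PIDFile= line of a systemd unit file.
# $1 = unit file path, $2 = desired PID file path
set_pidfile() {
    sed -i "s|^PIDFile=.*|PIDFile=$2|" "$1"
}

# Intended use on each node (as root), followed by a reload of unit files:
#   set_pidfile /usr/lib/systemd/system/keepalived.service /var/run/keepalived.pid
#   systemctl daemon-reload
```

systemd only picks up the edited unit file after systemctl daemon-reload, so that step should not be skipped.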

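Configure Keepalived

Keepalived provides two modes

  • Preemptive: the MASTER and BACKUP nodes are configured with different state values. When the MASTER goes down, the BACKUP takes over the MASTER's VIP and services. After the MASTER recovers, it takes the VIP and services back, and the BACKUP returns to standby.
  • Non-preemptive: state is set to BACKUP on both nodes, and both add nopreempt inside the vrrp_instance block, meaning they do not fight over the VIP. Both nodes start in the BACKUP state; after exchanging multicast messages, one of them is elected MASTER by priority. Because both are configured with nopreempt, a MASTER that recovers from a failure does not take the VIP back, which avoids the service delay that a VIP switch can cause.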
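To double-check that a node really is set up for non-preemptive mode, a quick grep over its configuration file is enough. A rough sketch: the helper name is_nonpreemptive is hypothetical, and it only looks for uncommented "state BACKUP" and "nopreempt" lines anywhere in the file rather than parsing individual vrrp_instance blocks.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the file contains both an uncommented
# "state BACKUP" line and an uncommented "nopreempt" line.
# Simplification: the whole file is scanned, not one vrrp_instance block.
is_nonpreemptive() {
    grep -Eq '^[[:space:]]*state[[:space:]]+BACKUP' "$1" &&
        grep -Eq '^[[:space:]]*nopreempt' "$1"
}

# Example, to be run on both nodes:
#   is_nonpreemptive /etc/keepalived/keepalived.conf && echo "non-preemptive"
```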

MASTER node

First, confirm the network interface and IP

# ip addr show | grep inet
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
    inet 10.10.0.11/8 brd 10.255.255.255 scope global noprefixroute ens192
    inet6 fd08:815:48b2::e91/128 scope global noprefixroute
    inet6 fd08:815:48b2:0:d419:f3f5:85de:b72/64 scope global noprefixroute
    inet6 fe80::49a2:321d:8cf6:651a/64 scope link noprefixroute

As shown, this machine uses the ens192 interface with IP 10.10.0.11. Next, edit the Keepalived configuration file

# vim /etc/keepalived/keepalived.conf

The configuration is as follows:

! Configuration File for keepalived

global_defs {
   # Email recipients
   notification_email {
     acassen@firewall.loc
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   # Email sender
   notification_email_from Alexandre.Cassen@firewall.loc
   # SMTP server address
   smtp_server 192.168.200.1
   smtp_connect_timeout 30
   # ID that identifies this node, usually the hostname
   router_id akiya01
   vrrp_skip_check_adv_addr
   vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}
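# VRRP instance; the backup node must use the same instance name
vrrp_instance VI_1 {
    # Role of this node: "MASTER" marks the primary server, "BACKUP" the standby server
    state MASTER
    # Network interface; set it to the interface in use here, "ens192"
    interface ens192
    # Virtual router ID, a number that identifies this VRRP instance
    # MASTER and BACKUP in the same vrrp_instance must use the same value
    virtual_router_id 51
    # Priority; the larger the number, the higher the priority (1-255)
    # In the same vrrp_instance, the MASTER's priority must be greater than the BACKUP's
    priority 100
    # Interval of the sync check between MASTER and BACKUP, in seconds
    advert_int 1
    # Authentication type and password
    authentication {
        # Authentication type, either PASS or AH
        auth_type PASS
        # Password; MASTER and BACKUP in the same vrrp_instance must use the same password to communicate
        auth_pass akiya
    }
    # Whether to send email notifications on failure
    #smtp_alert
    # Disable preemption
    # By default, when the MASTER goes down the BACKUP is promoted to MASTER and takes over its work
    # When the MASTER recovers, the promoted BACKUP drops back to BACKUP and hands the work back to the original MASTER
    # With nopreempt set, a MASTER that recovers from a failure does not take the service back.
    #nopreempt
    # Virtual IP; must be identical on both nodes. Multiple VIPs are allowed, one per line
    virtual_ipaddress {
        # VIP 10.10.0.10/8, bound to interface ens192, label ha:net; same on MASTER and BACKUP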
        10.10.0.10/8 dev ens192 label ha:net
    }
}

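BACKUP node

The BACKUP configuration is essentially the same as the MASTER's, with only a few changes

  • state is BACKUP
  • interface is the network interface name; confirm it against the actual machine
  • virtual_router_id must match the MASTER (51 in this example)
  • priority must be lower than the MASTER's

Edit the Keepalived configuration on the BACKUP node as follows: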

! Configuration File for keepalived
...
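vrrp_instance VI_1 {
    # Role of this node; BACKUP marks this host as the standby node
    state BACKUP
    # Confirm the network interface name
    interface ens192
    # MASTER and BACKUP in the same vrrp_instance must use the same value
    virtual_router_id 51
    # Priority, lower than the MASTER's
    priority 99
    advert_int 1
    authentication {
        auth_type PASS
        # Must be the same password as on the MASTER
        auth_pass akiya
    }
    # Virtual IP; must be identical on both nodes. Multiple VIPs are allowed, one per line
    virtual_ipaddress {
        # VIP 10.10.0.10/8, bound to interface ens192, label ha:net; same on MASTER and BACKUP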
        10.10.0.10/8 dev ens192 label ha:net
    }
}

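Start the services

With the MASTER and BACKUP nodes configured, we can start and test the service

Add firewall rules

VRRP uses the multicast address 224.0.0.18, so the firewall has to allow this traffic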

# firewall-cmd --direct --permanent --add-rule ipv4 filter INPUT 0 --in-interface ens192 --destination 224.0.0.18 --protocol vrrp -j ACCEPT
# firewall-cmd --direct --permanent --add-rule ipv4 filter OUTPUT 0 --out-interface ens192 --destination 224.0.0.18 --protocol vrrp -j ACCEPT
# firewall-cmd --reload

View the rules

# firewall-cmd --direct --get-rules ipv4 filter INPUT
0 --in-interface ens192 --destination 224.0.0.18 --protocol vrrp -j ACCEPT
# firewall-cmd --direct --get-rules ipv4 filter OUTPUT
0 --out-interface ens192 --destination 224.0.0.18 --protocol vrrp -j ACCEPT

Start Keepalived

Start Keepalived and enable it at boot

# systemctl start keepalived
# systemctl enable keepalived

Checking the IPs on the MASTER node again, we can see that a new one has been added

# ip addr show | grep inet
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
    inet 10.10.0.11/8 brd 10.255.255.255 scope global noprefixroute ens192
    inet 10.10.0.10/32 scope global ha:net
    inet6 fd08:815:48b2::e91/128 scope global noprefixroute
    inet6 fd08:815:48b2:0:d419:f3f5:85de:b72/64 scope global noprefixroute
    inet6 fe80::49a2:321d:8cf6:651a/64 scope link noprefixroute

The same check on the BACKUP node gives

# ip addr show | grep inet
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
    inet 10.10.0.12/8 brd 10.255.255.255 scope global noprefixroute ens192
    inet6 fd08:815:48b2::1ca/128 scope global noprefixroute
    inet6 fd08:815:48b2:0:b840:33aa:f6de:253b/64 scope global noprefixroute
    inet6 fe80::a96d:fe89:d95:3dfd/64 scope link noprefixroute

Test Keepalived

Install the tcpdump tool

# yum -y install tcpdump

Run the following command on the MASTER node

# tcpdump -i ens192 vrrp -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
10:10:24.193943 IP 10.10.0.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
10:10:25.194972 IP 10.10.0.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
10:10:26.196009 IP 10.10.0.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
10:10:27.197038 IP 10.10.0.11 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype simple, intvl 1s, length 20
...

If Keepalived is stopped on the MASTER, the MASTER no longer sends these packets, and the VIP moves over to the BACKUP accordingly.
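During a failover test it helps to check on each node whether it currently holds the VIP. A small sketch: the helper name has_vip is hypothetical, and it simply searches the output of ip for the address.

```shell
#!/bin/bash
# Hypothetical helper: read `ip -4 addr show` output on stdin and succeed
# if the given address is configured.
# $1 = VIP to look for
has_vip() {
    grep -Fq "inet $1/"
}

# Example, to be run on MASTER and BACKUP before and after stopping Keepalived:
#   ip -4 addr show dev ens192 | has_vip 10.10.0.10 && echo "this node holds the VIP"
```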

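Configure logging

By default Keepalived writes its log to the system log /var/log/messages. Because the system log holds many other messages, searching it is tedious.

We can split the Keepalived log out into its own file, which requires changing where the log is written.

  1. Modify the Keepalived options
# vim /etc/sysconfig/keepalived

Edit it as follows: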

# Options for keepalived. See `keepalived --help' output and keepalived(8) and
# keepalived.conf(5) man pages for a list of all options. Here are the most
# common ones :
#
# --vrrp               -P    Only run with VRRP subsystem.
# --check              -C    Only run with Health-checker subsystem.
# --dont-release-vrrp  -V    Dont remove VRRP VIPs & VROUTEs on daemon stop.
# --dont-release-ipvs  -I    Dont remove IPVS topology on daemon stop.
# --dump-conf          -d    Dump the configuration data.
# --log-detail         -D    Detailed log messages.
# --log-facility       -S    0-7 Set local syslog facility (default=LOG_DAEMON)
#

KEEPALIVED_OPTIONS="-D"

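Change KEEPALIVED_OPTIONS="-D" to KEEPALIVED_OPTIONS="-D -d -S 0". The -S option specifies the syslog facility; 0 corresponds to the local0 facility used in the rsyslog rule in the next step.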

  2. Edit /etc/rsyslog.conf and append the following at the end
...
local0.*            /var/log/keepalived.log
  3. Restart the logging service
# systemctl restart rsyslog
  4. Restart Keepalived
# systemctl restart keepalived
  5. View the log
# ls -lh /var/log/keepalived.log
-rw-------. 1 root root 14K Sep 30 13:22 /var/log/keepalived.log
# head -n 10 /var/log/keepalived.log
Sep 30 13:22:52 master Keepalived[30707]: Starting Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Sep 30 13:22:52 master Keepalived[30707]: Opening file '/etc/keepalived/keepalived.conf'.
Sep 30 13:22:52 master Keepalived[30708]: Starting Healthcheck child process, pid=30709
Sep 30 13:22:52 master Keepalived[30708]: Starting VRRP child process, pid=30710
Sep 30 13:22:52 master Keepalived_healthcheckers[30709]: Initializing ipvs
Sep 30 13:22:52 master Keepalived_healthcheckers[30709]: Opening file '/etc/keepalived/keepalived.conf'.
Sep 30 13:22:52 master Keepalived_healthcheckers[30709]: ------< Global definitions >------
Sep 30 13:22:52 master Keepalived_healthcheckers[30709]: Router ID = ha01
Sep 30 13:22:52 master Keepalived_healthcheckers[30709]: Smtp server = 192.168.200.1
Sep 30 13:22:52 master Keepalived_healthcheckers[30709]: Smtp server port = 25
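One thing to watch for: if /etc/rsyslog.conf still has a catch-all rule such as *.info writing to /var/log/messages (as the stock CentOS 7 file typically does), Keepalived messages will likely keep appearing there as well. The sketch below appends local0.none to that rule. It assumes GNU sed and that the rule sits on a single uncommented line; the helper name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: add ";local0.none" to the selector of the
# uncommented rule that writes to /var/log/messages, unless already there.
# $1 = path to rsyslog.conf
exclude_local0_from_messages() {
    sed -i -E '/^[^#[:space:]][^[:space:]]*[[:space:]]+\/var\/log\/messages/{
        /local0\.none/!s/^([^[:space:]]+)/\1;local0.none/
    }' "$1"
}

# Intended use (as root), followed by a restart of rsyslog:
#   exclude_local0_from_messages /etc/rsyslog.conf
#   systemctl restart rsyslog
```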

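Keepalived+Nginx

In practice, the business service may stop while Keepalived keeps running, so the VIP points at a node with no service behind it. A watchdog script is needed to handle this case; Nginx is used as the example below.

Install Nginx

  • Add the Nginx repository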
# rpm -ivh http://nginx.org/packages/centos/7/noarch/RPMS/nginx-release-centos-7-0.el7.ngx.noarch.rpm
  • Install Nginx with yum
# yum -y install nginx
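  • Start, stop, restart, and enable Nginx at boot
# systemctl start nginx   # start the Nginx service
# systemctl stop nginx    # stop the Nginx service
# systemctl restart nginx # restart the Nginx service
# systemctl enable nginx  # start the Nginx service at boot
# nginx -t                # check the configuration file for errors
# nginx -s reload         # gracefully reload the configuration
  • Check that Nginx started successfully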
# curl -i localhost

Create the Nginx check script

On both the master and the backup server, create nginx_check.sh in the /etc/keepalived directory with the following content:

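#!/bin/bash
# author:akiya
# Count the running nginx processes
A=$(ps -C nginx --no-header | wc -l)
if [ "$A" -eq 0 ]; then
    # Nginx is down: try to start it again
    systemctl start nginx
    sleep 2
    # Still down: stop Keepalived so the VIP moves to the backup node
    if [ "$(ps -C nginx --no-header | wc -l)" -eq 0 ]; then
        systemctl stop keepalived
    fi
fi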

Make the script executable

# chmod +x /etc/keepalived/nginx_check.sh

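Keepalived runs this script periodically to check the state of the Nginx service. If Nginx has stopped, the script tries to start it again; if that fails, it stops the Keepalived service so that the VIP moves to the backup node.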
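An alternative design is to leave Keepalived running and let the script report failure only through its exit status, so that Keepalived's own priority adjustment decides who holds the VIP. The following is a rough sketch of that variant, not the script used in this article; the function name service_alive is hypothetical.

```shell
#!/bin/bash
# Hypothetical variant of nginx_check.sh: it never stops Keepalived itself
# and only reports health through its exit status.
service_alive() {
    # $1 = process name; succeeds if at least one such process is running
    [ "$(ps -C "$1" --no-headers 2>/dev/null | wc -l)" -gt 0 ]
}

# A real check script would end with the following line, so that the
# script's exit status is the result of the check:
#   service_alive nginx
```

With this variant, failover depends entirely on the weight configured for the script being large enough to push the MASTER's priority below the BACKUP's.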

Modify the Keepalived configuration

Add the check script configuration to /etc/keepalived/keepalived.conf

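global_defs {
   ...
}
...
# Keepalived runs the script periodically and adjusts the priority of the vrrp_instance according to the result
# If the script exits with 0 and weight is greater than 0, the priority is increased accordingly.
# If the script exits with a non-zero status and weight is less than 0, the priority is decreased accordingly.
# In all other cases the priority stays at the configured value, i.e. the priority set in the configuration file.
vrrp_script chk_nginx {
       script "/etc/keepalived/nginx_check.sh"
       interval 2  # check the state of Nginx every 2 seconds
       weight -20  # on failure, lower this node's priority by 20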
}
vrrp_instance VI_1 {
    ...
    virtual_ipaddress {
        10.10.0.10/8 dev ens192 label ha:net
    }
    track_script {
        # Nginx liveness check script
        chk_nginx
    }
}
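The priority rules in the comments above are easy to check with a little arithmetic. The sketch below only restates those rules; the function name effective_priority is hypothetical, and the numbers are the ones used in this article (MASTER 100, BACKUP 99, weight -20).

```shell
#!/bin/bash
# Hypothetical helper restating the weight rules from the config comments.
# $1 = configured priority, $2 = weight, $3 = exit status of the check script
effective_priority() {
    local prio=$1 weight=$2 status=$3
    if [ "$status" -eq 0 ] && [ "$weight" -gt 0 ]; then
        echo $((prio + weight))      # check passed, positive weight: raise
    elif [ "$status" -ne 0 ] && [ "$weight" -lt 0 ]; then
        echo $((prio + weight))      # check failed, negative weight: lower
    else
        echo "$prio"                 # otherwise keep the configured priority
    fi
}

effective_priority 100 -20 1   # MASTER with a failing check: prints 80
effective_priority 100 -20 0   # MASTER with a passing check: prints 100
```

With a failing check the MASTER drops to 80, which is below the BACKUP's 99, so the BACKUP can take over the VIP.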

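Troubleshooting

Unable to access script

The version installed with yum is 1.3.5. After adding a vrrp_script block to the configuration file, starting the service produced the error Unable to access script. Some research showed that this problem is mentioned in the project's Git issues and has been fixed in newer versions.

Some of the related error logs: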

Sep 30 14:25:42 master Keepalived_vrrp[30930]:     chk_nginx no match, ignoring...
Sep 30 14:26:04 master Keepalived_vrrp[30944]:     nginx_check no match, ignoring...
Sep 30 14:44:18 master Keepalived_vrrp[30980]: Unable to access script `/etc/keepalived/nginx_check.sh`
Sep 30 14:44:18 master Keepalived_vrrp[30980]: Disabling track script chk_nginx since not found

If you install with yum, you can check the package information before installing

# yum info keepalived

default user...

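After compiling and installing (version 2.0.18), adding the Nginx check script, and starting the Keepalived service, the log shows default user 'keepalived_script' for script execution does not exist - please create.

Solution: add the user or group that runs the check script to the configuration file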

! Configuration File for keepalived

global_defs {
...
   script_user root
   enable_script_security
}
...