I. Preparation
Heartbeat 3.0.6:
    # wget -O heartbeat-3.0.6.tar.bz2 http://hg.linux-ha.org/heartbeat-STABLE_3_0/archive/958e11be8686.tar.bz2
Cluster Glue 1.0.12:
    # wget -O cluster-glue-1.0.12.tar.bz2 http://hg.linux-ha.org/glue/archive/0a7add1d9996.tar.bz2
Resource Agents 3.9.6:
    # wget -O resource-agents-3.9.6.tar.gz https://github.com/ClusterLabs/resource-agents/archive/v3.9.6.tar.gz
Install the build dependencies, create the cluster user and group, and install httpd as the test service:
    # yum install gcc gcc-c++ autoconf automake libtool glib2-devel libxml2-devel bzip2 bzip2-devel e2fsprogs-devel libxslt-devel libtool-ltdl-devel asciidoc
    # groupadd haclient
    # useradd -g haclient hacluster
    # yum install httpd
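Before starting the builds it can help to confirm that the tools are actually on PATH. The following is only a sketch: missing_cmds is a hypothetical helper, and the list of tool names is an assumption about what the packages above provide.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on PATH.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Tool names assumed to come from the packages installed above
# (a2x is shipped with asciidoc).
missing_cmds gcc g++ autoconf automake libtool bzip2 a2x
```

No output means every listed tool was found; any name printed points at a package that still needs to be installed.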
II. Compiling Cluster Glue
    # tar -jxvf cluster-glue-1.0.12.tar.bz2
    # cd Reusable-Cluster-Components-glue--0a7add1d9996/
    # ./autogen.sh
    ## Note: on a 32-bit system drop the "64", i.e. use LIBS='/lib/libuuid.so.1'
    # ./configure --prefix=/usr/local/heartbeat --with-daemon-user=hacluster --with-daemon-group=haclient --enable-fatal-warnings=no LIBS='/lib64/libuuid.so.1'
    # make
    # make install
Build error 1:
    Making all in libltdl
    gmake[1]: Entering directory `/root/Reusable-Cluster-Components-glue--0a7add1d9996/libltdl'
    gmake[1]: *** No rule to make target `all'.  Stop.
    gmake[1]: Leaving directory `/root/Reusable-Cluster-Components-glue--0a7add1d9996/libltdl'
    make: *** [all-recursive] Error 1
Fix:
    # yum install libtool-ltdl-devel
Build error 2:
    collect2: error: ld returned 1 exit status
    gmake[2]: *** [ipctest] Error 1
    gmake[2]: Leaving directory `/root/Reusable-Cluster-Components-glue--0a7add1d9996/lib/clplumbing'
    gmake[1]: *** [all-recursive] Error 1
    gmake[1]: Leaving directory `/root/Reusable-Cluster-Components-glue--0a7add1d9996/lib'
    make: *** [all-recursive] Error 1
Fix (re-run configure with the libuuid path passed through LIBS):
    # ./configure --prefix=/usr/local/heartbeat --with-daemon-user=hacluster --with-daemon-group=haclient --enable-fatal-warnings=no LIBS='/lib64/libuuid.so.1'
Note: on a 32-bit system, change LIBS to LIBS='/lib/libuuid.so.1'.
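The 32-bit/64-bit choice above can be made from the output of uname -m. This is a minimal sketch: uuid_lib_for_arch is a hypothetical helper, and it only covers the two cases this article mentions (x86_64 and 32-bit x86).

```shell
#!/bin/sh
# Map a machine name (as printed by `uname -m`) to the libuuid path
# this article passes to configure through LIBS.
uuid_lib_for_arch() {
    case "$1" in
        x86_64) echo /lib64/libuuid.so.1 ;;
        i?86)   echo /lib/libuuid.so.1 ;;
        *)      echo "unhandled architecture: $1" >&2; return 1 ;;
    esac
}

# Possible use (not run here):
#   UUID_LIB=$(uuid_lib_for_arch "$(uname -m)") && ./configure ... LIBS="$UUID_LIB"
```

Any other architecture is rejected rather than guessed, so the path has to be set by hand in that case.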
Build error 3:
    gmake[2]: a2x: command not found
    gmake[2]: *** [hb_report.8] Error 127
    gmake[2]: Leaving directory `/root/Reusable-Cluster-Components-glue--0a7add1d9996/doc'
    gmake[1]: *** [all-recursive] Error 1
    gmake[1]: Leaving directory `/root/Reusable-Cluster-Components-glue--0a7add1d9996/doc'
    make: *** [all-recursive] Error 1
Fix:
    # yum install asciidoc
III. Compiling Resource Agents
    # tar -zxvf resource-agents-3.9.6.tar.gz
    # cd resource-agents-3.9.6
    # ./autogen.sh
    # ./configure --prefix=/usr/local/heartbeat --with-daemon-user=hacluster --with-daemon-group=haclient --enable-fatal-warnings=no LIBS='/lib64/libuuid.so.1'
    # make
    # make install
IV. Compiling Heartbeat
    # tar -jxvf heartbeat-3.0.6.tar.bz2
    # cd Heartbeat-3-0-958e11be8686/
    # ./bootstrap
    # export CFLAGS="$CFLAGS -I/usr/local/heartbeat/include -L/usr/local/heartbeat/lib"
    # ./configure --prefix=/usr/local/heartbeat --with-daemon-user=hacluster --with-daemon-group=haclient --enable-fatal-warnings=no LIBS='/lib64/libuuid.so.1'
    # make
    # make install
After installation, copy the sample configuration files, register the service and link the plugins:
    # cp doc/{ha.cf,haresources,authkeys} /usr/local/heartbeat/etc/ha.d/
    # chkconfig --add heartbeat
    # chkconfig heartbeat on
    # chmod 600 /usr/local/heartbeat/etc/ha.d/authkeys
    # mkdir -pv /usr/local/heartbeat/usr/lib/ocf/lib/heartbeat/
    # cp /usr/lib/ocf/lib/heartbeat/ocf-* /usr/local/heartbeat/usr/lib/ocf/lib/heartbeat/
    # ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/RAExec/* /usr/local/heartbeat/lib/heartbeat/plugins/RAExec/
    # ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/* /usr/local/heartbeat/lib/heartbeat/plugins/
Build error 1:
    checking heartbeat/glue_config.h usability... no
    checking heartbeat/glue_config.h presence... no
    checking for heartbeat/glue_config.h... no
    configure: error: in `/root/Heartbeat-3-0-958e11be8686':
    configure: error: Core development headers were not found
    See `config.log' for more details
Fix (point the compiler at the Cluster Glue headers and libraries installed under /usr/local/heartbeat before running configure):
    # export CFLAGS="$CFLAGS -I/usr/local/heartbeat/include -L/usr/local/heartbeat/lib"
Build error 2:
    In file included from ../include/lha_internal.h:41:0,
                     from uuid_parse.c:25:
    /usr/local/heartbeat/include/heartbeat/glue_config.h:105:0: error: "HA_HBCONF_DIR" redefined [-Werror]
     #define HA_HBCONF_DIR "/usr/local/heartbeat/etc/ha.d/"
     ^
    In file included from ../include/lha_internal.h:38:0,
                     from uuid_parse.c:25:
    ../include/config.h:390:0: note: this is the location of the previous definition
     #define HA_HBCONF_DIR "/usr/local/heartbeat/etc/ha.d"
     ^
    uuid_parse.c:36:26: fatal error: replace_uuid.h: No such file or directory
     #include <replace_uuid.h>
                              ^
    cc1: all warnings being treated as errors
    compilation terminated.
    gmake[1]: *** [uuid_parse.lo] Error 1
    gmake[1]: Leaving directory `/root/Heartbeat-3-0-958e11be8686/replace'
    make: *** [all-recursive] Error 1
Fix (re-run configure with --enable-fatal-warnings=no and LIBS set):
    # ./configure --prefix=/usr/local/heartbeat --with-daemon-user=hacluster --with-daemon-group=haclient --enable-fatal-warnings=no LIBS='/lib64/libuuid.so.1'
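V. Configuring Heartbeat
Configuring Heartbeat mainly involves three files: ha.cf, haresources and authkeys. ha.cf is the main configuration file, haresources lists the services that Heartbeat manages, and authkeys specifies how Heartbeat authenticates its peers.
1. Configuring ha.cf: the main configuration file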
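    # grep '^[^#]' /usr/local/heartbeat/etc/ha.d/ha.cf
    debugfile /var/log/ha-debug    ## file that receives heartbeat debug messages
    logfile /var/log/ha-log        ## file that receives heartbeat log messages
    logfacility local0             ## syslog facility used for heartbeat logging
    keepalive 2                    ## heartbeat (probe) interval: 2 seconds
    deadtime 30                    ## if the standby node receives no heartbeat from the primary for 30 seconds, it takes over the primary's resources
    warntime 10                    ## if the standby node receives no heartbeat from the primary for 10 seconds, a warning is logged, but no failover happens
    initdead 120                   ## grace period after a boot or reboot; at least twice deadtime
    udpport 694                    ## UDP port used for broadcast/unicast heartbeats
    bcast eno16777736 # Linux      ## send heartbeats by broadcast on NIC eno16777736
    #mcast eth0 225.0.0.1 694 1 0  ## UDP multicast heartbeats on NIC eth0, typically used with more than one standby node. bcast, ucast and mcast (broadcast, unicast, multicast) are the heartbeat transports; pick one
    #ucast eno16777736 192.168.10.133    ## UDP unicast heartbeats on NIC eno16777736; the IP address is that of the peer node
    auto_failback on               ## whether services move back automatically once the primary recovers
    #watchdog /dev/watchdog        ## optional; lets Heartbeat monitor the health of the system
    node node1                     ## primary node name; must match the output of uname -n
    node node2                     ## standby node name
    ping 192.168.10.1              ## ping node: the gateway is pinged to check connectivity; used only to test the network
    respawn hacluster /usr/local/heartbeat/libexec/heartbeat/ipfail    ## optional; process started and stopped together with heartbeat
    #apiauth ipfail gid=haclient uid=hacluster    ## user and group allowed to run ipfail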
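The timing directives depend on one another, so a quick sanity check may be useful before starting the daemon. The sketch below uses a hypothetical helper, check_ha_timing. The rule that initdead is at least twice deadtime comes from the comments above; the ordering keepalive < warntime < deadtime is my own assumption about sensible values, not something Heartbeat is stated to enforce.

```shell
#!/bin/sh
# Check the relations between the ha.cf timing values (all in seconds).
# Usage: check_ha_timing KEEPALIVE WARNTIME DEADTIME INITDEAD
check_ha_timing() {
    keepalive=$1 warntime=$2 deadtime=$3 initdead=$4
    # Assumption: warn after at least one missed beat, and before
    # the peer is declared dead.
    [ "$keepalive" -lt "$warntime" ] || { echo "warntime must exceed keepalive"; return 1; }
    [ "$warntime" -lt "$deadtime" ] || { echo "deadtime must exceed warntime"; return 1; }
    # From the article: initdead is at least twice deadtime.
    [ "$initdead" -ge $((deadtime * 2)) ] || { echo "initdead must be >= 2 * deadtime"; return 1; }
    echo "timing ok"
}

check_ha_timing 2 10 30 120   # the values used in this article; prints "timing ok"
```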
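Notes:
① watchdog /dev/watchdog: optional; lets Heartbeat monitor the health of the system through the watchdog device. The feature requires the "softdog" kernel module, which provides the actual device; if the kernel does not have the module, enable it and rebuild the kernel. Then load the module with "insmod softdog", run "grep misc /proc/devices" and "cat /proc/misc |grep watchdog" to look up the device numbers, and finally create the device file with "mknod /dev/watchdog c 10 130".
② respawn hacluster /usr/local/heartbeat/libexec/heartbeat/ipfail: optional; names a process that is started and stopped together with heartbeat. Such processes are usually plugins integrated with heartbeat, and they are restarted automatically if they fail. The ipfail process detects and handles network failures and relies on the ping directive to name the ping node used for connectivity checks; hacluster is the user the ipfail process runs as.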
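For the watchdog note, the minor number passed to mknod can be read from /proc/misc instead of being typed by hand. A small sketch, assuming the usual "<minor> <name>" layout of that file; watchdog_minor is a hypothetical helper and the sample input is made up for illustration.

```shell
#!/bin/sh
# Print the minor number registered for "watchdog" in /proc/misc-style
# input, where each line has the form "<minor> <name>".
watchdog_minor() {
    awk '$2 == "watchdog" { print $1 }'
}

# Sample input; on a real system use: watchdog_minor < /proc/misc
printf '%s\n' '130 watchdog' '236 device-mapper' | watchdog_minor
```

With the sample input this prints 130, the minor number used in the mknod command above.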
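2. Configuring haresources: the resource file
The haresources file specifies the primary node of the two-node system, the cluster IP, the netmask, the broadcast address and the cluster services to start. Each line may name one or more resource scripts; resources are separated by spaces, and a script's arguments by two colons. The haresources file must be exactly the same on the primary and the standby node.
The general format is:
    node-name network <resource-group>
node-name is the hostname of the primary node and must match a node name given in ha.cf. network sets the cluster IP address, the netmask, the network device and so on. resource-group lists the services that Heartbeat is to manage (that is, the services Heartbeat starts and stops).
Note: the IP address given here is the address on which the cluster serves its clients.
To have a service managed, wrap it in a script that is started and stopped with start/stop and place it in /etc/init.d/ or /etc/ha.d/resource.d/ (with the --prefix used in this article, the latter is /usr/local/heartbeat/etc/ha.d/resource.d/). Heartbeat looks the script up by name in these directories and runs it to start or stop the service.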
    # grep -v "#" /usr/local/heartbeat/etc/ha.d/haresources
    node1 IPaddr::192.168.10.222/24/eno16777736
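node1 is the primary node of the HA cluster, and IPaddr is a resource script that ships with heartbeat. Heartbeat first runs /usr/local/heartbeat/etc/ha.d/resource.d/IPaddr 192.168.10.222/24/eno16777736 start, which brings up a virtual address 192.168.10.222 with netmask 255.255.255.0 on the named network interface. This is the address on which the cluster serves its clients.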
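To make the "script::arguments" layout concrete, the resource entry can be taken apart with plain shell parameter expansion. This is only an illustration of the syntax: parse_ipaddr_resource is a hypothetical helper, not how Heartbeat itself parses the file.

```shell
#!/bin/sh
# Split "script::ip/prefix/interface" into its parts,
# printing one "key=value" pair per line.
parse_ipaddr_resource() {
    entry=$1
    script=${entry%%::*}   # text before the first "::"
    args=${entry#*::}      # text after the first "::"
    ip=${args%%/*}         # up to the first "/"
    rest=${args#*/}        # "prefix/interface"
    prefix=${rest%%/*}
    iface=${rest#*/}
    printf 'script=%s\nip=%s\nprefix=%s\niface=%s\n' "$script" "$ip" "$prefix" "$iface"
}

parse_ipaddr_resource 'IPaddr::192.168.10.222/24/eno16777736'
```

For the entry used in this article the output is script=IPaddr, ip=192.168.10.222, prefix=24 and iface=eno16777736, one per line.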
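3. Configuring authkeys: the heartbeat authentication key file
    # grep -v "#" /usr/local/heartbeat/etc/ha.d/authkeys
    auth 2
    2 sha1 HI!
Note: the number after auth is an index and may be any value, but the second line must begin with that same index, followed by the authentication method (three are supported: crc, md5 and sha1) and finally a key of your own choosing.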
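A fixed key such as "HI!" is fine for a test, but a random one is easy to produce. The sketch below assumes /dev/urandom and od are available; make_authkeys is a hypothetical helper, and the assumption that crc takes no key is mine, not taken from this article.

```shell
#!/bin/sh
# Print an authkeys body: "auth N" followed by "N method [key]".
make_authkeys() {
    index=$1 method=$2 key=$3
    case "$method" in
        crc)      printf 'auth %s\n%s crc\n' "$index" "$index" ;;   # assumption: crc takes no key
        md5|sha1) printf 'auth %s\n%s %s %s\n' "$index" "$index" "$method" "$key" ;;
        *)        echo "unsupported method: $method" >&2; return 1 ;;
    esac
}

# 16 random bytes as 32 hex characters.
key=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
make_authkeys 2 sha1 "$key"
```

The output can be redirected into authkeys; the file still needs mode 600, as set in section IV.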
VI. Setting up mutual SSH trust between the two nodes (optional) and copying the files to the standby node
HA-01 (192.168.10.132):
    # ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
    # ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.10.133
HA-02 (192.168.10.133):
    # ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
    # ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.10.132
Copy the configuration files to the standby node:
    # scp /usr/local/heartbeat/etc/ha.d/* root@192.168.10.133:/usr/local/heartbeat/etc/ha.d/
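Since haresources has to be identical on both nodes, it may be worth comparing the copies after the scp. A minimal sketch: matches_local is a hypothetical helper that compares standard input with a local file using cmp, and the demo uses a temporary file so it runs without a second node.

```shell
#!/bin/sh
# Succeed when the data on standard input is identical to the local file.
matches_local() {
    cmp -s - "$1"
}

# Self-contained demo with a temporary file.
tmp=$(mktemp)
printf 'node1 IPaddr::192.168.10.222/24/eno16777736\n' > "$tmp"
printf 'node1 IPaddr::192.168.10.222/24/eno16777736\n' | matches_local "$tmp" && echo "identical"
rm -f "$tmp"

# Possible use against the standby node (not run here):
#   f=/usr/local/heartbeat/etc/ha.d/haresources
#   ssh root@192.168.10.133 cat "$f" | matches_local "$f" || echo "haresources differs"
```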
VII. Testing
    # systemctl start httpd
    # /etc/init.d/heartbeat start    ## start heartbeat
    # setenforce 0
    # systemctl stop firewalld
Check the log:
    # tail /var/log/ha-log
    Oct 26 10:07:18 node1 heartbeat: [2063]: ERROR: Illegal directive [ucast] in /usr/local/heartbeat/etc/ha.d//ha.cf
    Oct 26 10:07:18 node1 heartbeat: [2063]: ERROR: Illegal directive [ping] in /usr/local/heartbeat/etc/ha.d//ha.cf
    Oct 26 10:07:18 node1 heartbeat: [2063]: ERROR: Client child command [/usr/lib/heartbeat/ipfail] is not executable
    Oct 26 10:07:18 node1 heartbeat: [2063]: ERROR: Heartbeat not started: configuration error.
    Oct 26 10:07:18 node1 heartbeat: [2063]: ERROR: Configuration error, heartbeat not started.
Fix:
Correct the ipfail path in ha.cf:
    respawn hacluster /usr/local/heartbeat/libexec/heartbeat/ipfail
Create the plugin symlinks:
    # ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/RAExec/* /usr/local/heartbeat/lib/heartbeat/plugins/RAExec/
    # ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/* /usr/local/heartbeat/lib/heartbeat/plugins/
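While working through problems like these, it is handy to show only the ERROR and WARN lines of ha-log. A small sketch with a hypothetical helper, ha_log_problems; the sample lines are copied from the logs in this article.

```shell
#!/bin/sh
# Keep only ERROR and WARN lines from ha-log style input.
ha_log_problems() {
    grep -E '(ERROR|WARN):'
}

# Sample input; on a real system use: ha_log_problems < /var/log/ha-log
printf '%s\n' \
    'Oct 26 10:07:18 node1 heartbeat: [2063]: ERROR: Illegal directive [ping] in /usr/local/heartbeat/etc/ha.d//ha.cf' \
    'Oct 26 19:25:37 node1 heartbeat: [1783]: info: glib: ping heartbeat started.' \
    | ha_log_problems
```

Only the first sample line is printed. Note that grep exits with status 1 when nothing matches, which for this filter means the log is clean.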
Check the log again:
    # tail /var/log/ha-log
    Oct 26 13:11:46 node1 heartbeat: [9744]: info: remote resource transition completed.
    Oct 26 13:11:46 node1 heartbeat: [9744]: info: node1 wants to go standby [foreign]
    Oct 26 13:11:46 node1 heartbeat: [9744]: info: standby: node2 can take our foreign resources
    Oct 26 13:11:46 node1 heartbeat: [11892]: info: give up foreign HA resources (standby).
    Oct 26 13:11:46 node1 heartbeat: [11892]: info: foreign HA resource release completed (standby).
    Oct 26 13:11:46 node1 heartbeat: [9744]: info: Local standby process completed [foreign].
    Oct 26 13:11:47 node1 heartbeat: [9744]: WARN: 1 lost packet(s) for [node2] [11:13]
    Oct 26 13:11:47 node1 heartbeat: [9744]: info: remote resource transition completed.
    Oct 26 13:11:47 node1 heartbeat: [9744]: info: No pkts missing from node2!
    Oct 26 13:11:47 node1 heartbeat: [9744]: info: Other node completed standby takeover of foreign resources.
Fix:
    # vi /usr/local/heartbeat/etc/ha.d/haresources
    node1 IPaddr::192.168.10.222/24/eno16777736
Note: the resource entry in haresources needs the IPaddr:: prefix.
Problem:
    # tail /var/log/ha-log
    Oct 26 17:01:55 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (425 messages in queue)
    Oct 26 17:01:56 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (426 messages in queue)
    Oct 26 17:01:57 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (427 messages in queue)
    Oct 26 17:01:57 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (428 messages in queue)
    Oct 26 17:01:58 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (429 messages in queue)
    Oct 26 17:01:59 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (430 messages in queue)
    Oct 26 17:01:59 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (431 messages in queue)
    Oct 26 17:02:00 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (432 messages in queue)
    Oct 26 17:02:01 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (433 messages in queue)
    Oct 26 17:02:01 node1 heartbeat: [1755]: WARN: Message hist queue is filling up (434 messages in queue)
Fix: the firewall on node2 had not been stopped. Running systemctl stop firewalld on node2 resolved the problem.
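A narrower alternative to stopping firewalld would be to open only the heartbeat port (udpport 694 in ha.cf), for example with firewall-cmd --permanent --add-port=694/udp followed by firewall-cmd --reload. I have not tested that on this setup, so treat it as an assumption. The sketch below only checks whether a port shows up in a port list such as the output of firewall-cmd --list-ports; port_listed is a hypothetical helper.

```shell
#!/bin/sh
# Succeed when PORT/PROTO appears in a whitespace-separated port list
# read from standard input.
port_listed() {
    want=$1
    for p in $(cat); do
        [ "$p" = "$want" ] && return 0
    done
    return 1
}

# Sample input; on a real system use: firewall-cmd --list-ports | port_listed 694/udp
echo '80/tcp 694/udp' | port_listed 694/udp && echo "694/udp is open"
```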
Problem:
    # tail /var/log/ha-log
    IPaddr(IPaddr_192.168.10.222)[6854]:2015/10/26_17:20:58 ERROR: Setup problem: couldn't find command: ifconfig
    /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.10.222)[6828]:2015/10/26_17:20:58 ERROR: Program is not installed
Fix: run yum install net-tools, which provides the ifconfig command.
Restart heartbeat and check the log again:
    # systemctl restart heartbeat
    # tail /var/log/ha-log
    Oct 26 19:25:36 node1 heartbeat: [1783]: info: Configuration validated. Starting heartbeat 3.0.6
    Oct 26 19:25:37 node1 heartbeat: [1783]: info: heartbeat: version 3.0.6
    Oct 26 19:25:37 node1 heartbeat: [1783]: info: Heartbeat generation: 1445827146
    Oct 26 19:25:37 node1 heartbeat: [1783]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eno16777736
    Oct 26 19:25:37 node1 heartbeat: [1783]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eno16777736 - Status: 1
    Oct 26 19:25:37 node1 heartbeat: [1783]: info: glib: ping heartbeat started.
    Oct 26 19:25:37 node1 heartbeat: [1783]: info: Local status now set to: 'up'
    Oct 26 19:25:37 node1 heartbeat: [1783]: info: Link 192.168.10.1:192.168.10.1 up.
    Oct 26 19:25:37 node1 heartbeat: [1783]: info: Status update for node 192.168.10.1: status ping
    Oct 26 19:25:37 node1 heartbeat: [1783]: info: Link node1:eno16777736 up.
Check the address with the ifconfig command.
Then enter the address in a browser.
Take node1 down and check whether the IP fails over to node2.
node1:
    # systemctl stop heartbeat
node2:
    # tail /var/log/ha-log
    mach_down(default)[1937]:2015/10/26_20:03:58 info: Taking over resource group IPaddr::192.168.10.222/24/eno16777736
    ResourceManager(default)[1964]:2015/10/26_20:03:58 info: Acquiring resource group: node1 IPaddr::192.168.10.222/24/eno16777736
    /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.10.222)[1992]:2015/10/26_20:03:58 INFO: Resource is stopped
    ResourceManager(default)[1964]:2015/10/26_20:03:58 info: Running /usr/local/heartbeat/etc/ha.d//resource.d/IPaddr 192.168.10.222/24/eno16777736 start
    IPaddr(IPaddr_192.168.10.222)[2083]:2015/10/26_20:03:58 INFO: Using calculated netmask for 192.168.10.222: 255.255.255.0
    IPaddr(IPaddr_192.168.10.222)[2083]:2015/10/26_20:03:58 INFO: eval ifconfig eno16777736:0 192.168.10.222 netmask 255.255.255.0 broadcast 192.168.10.255
    /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.10.222)[2057]:2015/10/26_20:03:58 INFO: Success
    mach_down(default)[1937]:2015/10/26_20:03:58 info: /usr/local/heartbeat/share/heartbeat/mach_down: nice_failback: foreign resources acquired
    mach_down(default)[1937]:2015/10/26_20:03:58 info: mach_down takeover complete for node node1.
    Oct 26 20:03:58 node2 heartbeat: [1711]: info: mach_down takeover complete.
    mach_down(default)[1937]:2015/10/26_20:03:58 info: mach_down takeover complete for node node1.
    Oct 26 20:03:58 node2 heartbeat: [1711]: info: mach_down takeover complete.
    Oct 26 20:04:29 node2 heartbeat: [1711]: WARN: node node1: is dead
    Oct 26 20:04:29 node2 heartbeat: [1711]: info: Dead node node1 gave up resources.
    Oct 26 20:04:29 node2 heartbeat: [1711]: info: Link node1:eno16777736 dead.
    Oct 26 20:04:29 node2 ipfail: [1737]: info: Status update: Node node1 now has status dead
    Oct 26 20:04:29 node2 ipfail: [1737]: info: NS: We are still alive!
    Oct 26 20:04:29 node2 ipfail: [1737]: info: Link Status update: Link node1/eno16777736 now has status dead
    Oct 26 20:04:30 node2 ipfail: [1737]: info: Asking other side for ping node count.
    Oct 26 20:04:30 node2 ipfail: [1737]: info: Checking remote count of ping nodes.
Use the ifconfig command to check whether the IP has moved to node2.
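The same check can be scripted. The sketch below reads the one-line-per-address output of ip -o -4 addr show rather than ifconfig (my substitution, since that format is easier to parse); has_vip is a hypothetical helper and the sample lines are made up to resemble node2 after the takeover.

```shell
#!/bin/sh
# Succeed when the given IPv4 address appears in `ip -o -4 addr show`
# style input, where field 4 has the form "address/prefix".
has_vip() {
    vip=$1
    awk -v vip="$vip" '{ split($4, a, "/"); if (a[1] == vip) found = 1 } END { exit !found }'
}

# Sample input; on a real system use: ip -o -4 addr show | has_vip 192.168.10.222
printf '%s\n' \
    '2: eno16777736    inet 192.168.10.133/24 brd 192.168.10.255 scope global eno16777736' \
    '2: eno16777736    inet 192.168.10.222/24 brd 192.168.10.255 scope global secondary eno16777736:0' \
    | has_vip 192.168.10.222 && echo "VIP is on this node"
```

Run on both nodes, exactly one of them should report the virtual IP at any time.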
The IP has moved to node2; enter the address in a browser again.
All done!
Appendix: Heartbeat official website: