Deploying watchdog on ipfail-plugin for Heartbeat

The kernel uses watchdog to handle a hung system. Watchdog is simply a kernel module that checks a timer to determine whether the system is alive. Watchdog can reboot the system if it thinks it is hung, which makes it quite useful for detecting a server hang.
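Before configuring Heartbeat, it may help to confirm that a watchdog device actually exists on the server. The following is a minimal sketch, not part of Heartbeat itself: the function name is_char_device and the variable WATCHDOG_DEV are hypothetical names chosen for this example, and the hint about softdog assumes you want the kernel's software watchdog module rather than a hardware one.

```shell
#!/bin/sh
# Return success if the given path exists and is a character device.
is_char_device() {
    [ -c "$1" ]
}

# WATCHDOG_DEV is a hypothetical variable used only in this sketch.
WATCHDOG_DEV="${WATCHDOG_DEV:-/dev/watchdog}"

if is_char_device "$WATCHDOG_DEV"; then
    echo "watchdog device present: $WATCHDOG_DEV"
else
    # softdog is the kernel's software watchdog module; loading it needs root.
    echo "no watchdog device at $WATCHDOG_DEV; try: modprobe softdog"
fi
```

The check only tests for the device node and never opens it, since opening /dev/watchdog can start the watchdog timer.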

To activate watchdog, enable the watchdog option in the Heartbeat configuration, for example:

respawn clusteruser /usr/lib/heartbeat/ipfail
#ping_group pingtarget
watchdog /dev/watchdog
auto_failback off

When you enable the watchdog option in your /etc/ha.d/ file, Heartbeat will write to the /dev/watchdog file at an interval equal to the deadtime timer. If Heartbeat fails to update the watchdog device, watchdog will initiate a kernel panic once the watchdog timeout period has expired.
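To double-check that the watchdog directive is active and not commented out, a small script along these lines could be used. This is only a sketch: the function name watchdog_device_from_conf and the variable HA_CONF are hypothetical, and the default path /etc/ha.d/ha.cf is an assumption about where your Heartbeat configuration lives.

```shell
#!/bin/sh
# Print the device named by the first active "watchdog" directive in a
# Heartbeat-style config file. Commented lines such as "#watchdog ..."
# are skipped because their first field is not exactly "watchdog".
watchdog_device_from_conf() {
    awk '$1 == "watchdog" { print $2; exit }' "$1"
}

# HA_CONF is a hypothetical variable; the default path is an assumption.
HA_CONF="${HA_CONF:-/etc/ha.d/ha.cf}"

if [ -r "$HA_CONF" ]; then
    dev=$(watchdog_device_from_conf "$HA_CONF")
    if [ -n "$dev" ]; then
        echo "watchdog enabled on $dev"
    else
        echo "no active watchdog directive in $HA_CONF"
    fi
fi
```

Set HA_CONF to point at a different file if your configuration is stored elsewhere.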

Configure the kernel to reboot on a kernel panic

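To force the kernel to reboot instead of just hanging when there is a kernel panic, you have to modify the boot arguments passed to the kernel. This can be done in /etc/grub.conf. The panic=60 argument in the example below tells the kernel to reboot 60 seconds after a panic.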

default=0
title Fedora (
root (hd0,0)
kernel /boot/vmlinuz-2.6xxxxx.i686.PAE ro root=LABEL=/ panic=60
initrd /boot/initrd-2.6.xxxxx.i686.PAE.img
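After rebooting with the new boot arguments, you may want to verify that the panic timeout was picked up. The sketch below assumes a Linux system with /proc mounted; the function name panic_timeout_from_cmdline is hypothetical and exists only for this example.

```shell
#!/bin/sh
# Print the value of the last panic=N token in a kernel command line string.
# Prints an empty line if no such token is present.
panic_timeout_from_cmdline() {
    timeout=""
    # $1 is left unquoted on purpose so the shell splits it into words.
    for arg in $1; do
        case "$arg" in
            panic=*) timeout=${arg#panic=} ;;
        esac
    done
    echo "$timeout"
}

# Value requested on the boot command line, if readable.
if [ -r /proc/cmdline ]; then
    echo "boot argument panic=$(panic_timeout_from_cmdline "$(cat /proc/cmdline)")"
fi

# Value the running kernel is actually using (0 means it will not reboot).
if [ -r /proc/sys/kernel/panic ]; then
    echo "kernel.panic is $(cat /proc/sys/kernel/panic)"
fi
```

If both values show 60, the kernel should reboot 60 seconds after a panic.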

Alternatively, if you are using LILO, you can add the equivalent line to lilo.conf:

append="panic=60"

Remember to rerun lilo afterwards so that the change takes effect:

# lilo -v