Deploying watchdog on ipfail-plugin for Heartbeat

The kernel uses watchdog to handle a hung system. Watchdog is simply a kernel module that checks a timer to determine whether the system is alive, and it can reboot the system if it thinks the system is hung. This makes watchdog quite useful for detecting a server hang.

To activate watchdog, add the watchdog line to your Heartbeat configuration alongside the ipfail settings:

respawn clusteruser /usr/lib/heartbeat/ipfail
#ping_group pingtarget
watchdog /dev/watchdog
auto_failback off
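It is easy to leave the watchdog line commented out by accident, as ping_group is in the snippet above. The following is a minimal sketch of a check for that, assuming the configuration is plain text with one directive per line and that lines starting with # are comments. The function name watchdog_device and the SAMPLE string are hypothetical and are not part of Heartbeat.

```python
# Sketch only: a small reader for Heartbeat-style config text.
# Assumption: one directive per line, lines beginning with '#' are comments.

SAMPLE = """\
respawn clusteruser /usr/lib/heartbeat/ipfail
#ping_group pingtarget
watchdog /dev/watchdog
auto_failback off
"""


def watchdog_device(config_text):
    """Return the device named by an active `watchdog` directive, or None."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out directives
        parts = line.split()
        if parts[0] == "watchdog" and len(parts) >= 2:
            return parts[1]
    return None


if __name__ == "__main__":
    print(watchdog_device(SAMPLE))
```

To check a real file, read its contents into a string and pass that string to the function.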

When you enable the watchdog option in your configuration file under /etc/ha.d/, Heartbeat will write to the /dev/watchdog file at an interval equal to the deadtime timer. If Heartbeat fails to update the watchdog device, watchdog will initiate a kernel panic once the watchdog timeout period has expired.
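The timing relationship can be illustrated with a toy model. This is only a simulation of the idea, not the kernel module or Heartbeat's code. The class SimulatedWatchdog, the function first_expiry, and the 30 second write interval and 60 second timeout used in the example are all assumptions chosen for illustration.

```python
# Sketch only: a pure-Python model of a watchdog timer driven by a simulated clock.

class SimulatedWatchdog:
    """Toy watchdog: fires when no write arrives within timeout_s seconds."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_write = 0

    def write(self, now):
        """Record a write to the (simulated) watchdog device at time `now`."""
        self.last_write = now

    def expired(self, now):
        """True once more than timeout_s seconds have passed since the last write."""
        return now - self.last_write > self.timeout_s


def first_expiry(write_times, timeout_s, horizon_s):
    """Return the first whole second at which the watchdog fires, or None."""
    dog = SimulatedWatchdog(timeout_s)
    writes = set(write_times)
    for now in range(horizon_s + 1):
        if now in writes:
            dog.write(now)
        if dog.expired(now):
            return now
    return None


if __name__ == "__main__":
    # Healthy case: a write every 30 seconds (hypothetical interval), never fires.
    print(first_expiry(range(0, 301, 30), timeout_s=60, horizon_s=300))
    # Hung case: writes stop after t=60, so the timer runs out later.
    print(first_expiry([0, 30, 60], timeout_s=60, horizon_s=300))
```

In the model, as long as the writes keep arriving more often than the timeout, the timer never expires. Once they stop, expiry follows one timeout period after the last write.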

Configure the kernel to reboot on a kernel panic

To force the kernel to reboot instead of just hanging when there is a kernel panic, you have to modify the boot arguments passed to the kernel. The panic=60 argument tells the kernel to reboot 60 seconds after a panic. This can be done in /etc/grub.conf:

default=0
title Fedora (
root (hd0,0)
kernel /boot/vmlinuz-2.6xxxxx.i686.PAE ro root=LABEL=/ panic=60
initrd /boot/initrd-2.6.xxxxx.i686.PAE.img

Alternatively, if you are using lilo.conf, you can add the following line:

append="panic=60"

Remember to run lilo afterwards so that the change is written to the boot loader:

# lilo -v
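After the next reboot, you can confirm that the argument actually reached the kernel by looking at /proc/cmdline. The sketch below assumes a Linux system and simply scans the command line for a panic= token. The function name panic_timeout is hypothetical.

```python
# Sketch only: extract the panic= value from a kernel command line string.

def panic_timeout(cmdline):
    """Return the integer value of panic=N from a kernel command line, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "panic" and sep:
            try:
                return int(value)
            except ValueError:
                return None  # panic= present but not a number
    return None


if __name__ == "__main__":
    # /proc/cmdline holds the arguments the running kernel was booted with.
    with open("/proc/cmdline") as f:
        print(panic_timeout(f.read()))
```

The same value is also exposed by the kernel in /proc/sys/kernel/panic, which offers a second place to check.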
