The kernel uses a watchdog to handle a hung system. The watchdog is simply a kernel module that checks a timer to determine whether the system is alive, and it can reboot the system if it thinks the system is hung. This makes it quite useful for detecting a hung server.
To activate the watchdog, add the watchdog directive to your /etc/ha.d/ha.cf file:
respawn clusteruser /usr/lib/heartbeat/ipfail
ping 172.16.1.254 172.16.1.253
#ping_group pingtarget 172.16.1.254 172.16.1.253
watchdog /dev/watchdog
auto_failback off
When you enable the watchdog option in your /etc/ha.d/ha.cf file, Heartbeat writes to the /dev/watchdog file at an interval equal to the deadtime timer. If Heartbeat fails to update the watchdog device, the watchdog will initiate a kernel panic once the watchdog timeout period has expired.
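Before relying on the watchdog directive, it can help to confirm that the device node exists. The following is a minimal sketch, not part of Heartbeat: the function name watchdog_present is made up for illustration, and the hint about the softdog module assumes your kernel ships the software watchdog driver. The function only tests the type of the node and never opens it, because opening a watchdog device can start its countdown.

```shell
# Report whether a path looks like a usable watchdog device node.
# The default path matches the "watchdog /dev/watchdog" directive in ha.cf.
# The node is only tested, never opened, so the watchdog timer is not started.
watchdog_present() {
    dev="${1:-/dev/watchdog}"
    if [ -c "$dev" ]; then
        echo "present: $dev"
        return 0
    fi
    # Assumption: the software watchdog driver (softdog) is available.
    echo "missing: $dev (try 'modprobe softdog' as root)"
    return 1
}
```

Calling watchdog_present with no argument checks /dev/watchdog. Pass another path if your watchdog device lives elsewhere.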
Configure the kernel to reboot on a kernel panic
To force the kernel to reboot instead of just hanging when it panics, you have to modify the boot arguments passed to the kernel. The panic=60 argument tells the kernel to reboot 60 seconds after a panic. This can be done in /etc/grub.conf:
default=0
timeout=0
splashimage=(hd0,0)/boot/grub/splash.xpm.gz
hiddenmenu
title Fedora (2.6.29.4-167.fc11.i686.PAE)
        root (hd0,0)
        kernel /boot/vmlinuz-2.6xxxxx.i686.PAE ro root=LABEL=/ panic=60
        initrd /boot/initrd-2.6.xxxxx.i686.PAE.img
Alternatively, if you are using LILO, you can add the following line to lilo.conf:
append="panic=60"
Remember to run lilo afterwards so that the change is written to the boot loader:
# lilo -v
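After the next reboot, you may want to confirm that the panic argument actually reached the kernel. The snippet below is a small sketch for that purpose: panic_timeout is a hypothetical helper name, and it simply scans a kernel command line for a panic= argument.

```shell
# Print the value of the panic= argument found in a kernel command line.
# Prints nothing if the argument is absent. If panic= appears more than once,
# every value is printed, one per line.
panic_timeout() {
    for arg in $1; do
        case "$arg" in
            panic=*) echo "${arg#panic=}" ;;
        esac
    done
}

# Check the command line the running kernel was booted with, when available.
if [ -r /proc/cmdline ]; then
    panic_timeout "$(cat /proc/cmdline)"
fi
```

With the boot arguments shown earlier, the helper should print 60. The value currently in effect can also be read from /proc/sys/kernel/panic.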