A Brief Introduction to Linux Virtual Server (LVS) Clusters

According to the project website, Linux Virtual Server (LVS) is a highly scalable and highly available server built on a cluster of real servers. The architecture of the server cluster is fully transparent to end users, who interact with the cluster system as if it were a single high-performance virtual server.

Much of the information here is taken from the book “The Linux Enterprise Cluster” by Karl Kopper and from the project website.

This diagram, taken from the project website, explains the purpose of the project very clearly.

The Linux Virtual Server (LVS) Director accepts all incoming requests from client computers for services and decides which of the cluster nodes will reply to the client. Some naming conventions used by the LVS community:

  1. Real Server refers to a node inside an LVS cluster
  2. Client Computer refers to a computer outside the LVS cluster
  3. Virtual IP (VIP) address refers to the IP address the Director uses to offer services to client computers. A single LVS Director can have multiple VIPs offering different services to client computers. This is the only IP address a client computer needs to know to access the cluster.
  4. Real IP (RIP) address refers to an IP address used on a cluster node. Only the LVS Director needs to know the IP addresses of these nodes.
  5. Director IP (DIP) address refers to the IP address the LVS Director uses to connect to the RIP network. As requests arrive from client computers, they are forwarded to the cluster nodes over this network. The VIP and DIP can be on the same NIC.
  6. Client IP (CIP) address refers to the IP address of the client PC.

 

A. Types of LVS Clusters

The types of LVS clusters are usually described by the forwarding method the LVS Director uses to relay incoming requests to the nodes inside the cluster:

  1.  Network Address Translation (LVS-NAT)
  2. Direct routing (LVS-DR)
  3. IP Tunnelling (LVS-TUN)

According to the book “The Linux Enterprise Cluster” by Karl Kopper, the best forwarding method to use with a Linux Enterprise Cluster is LVS-DR. The easiest to build is LVS-NAT. LVS-TUN is generally not used for mission-critical applications and is mentioned for the sake of completeness.

  

A1. LVS-NAT (Network Address Translation)

 

In an LVS-NAT setup, the Director uses the Linux kernel’s ability (from the kernel’s network address translation code) to translate IP addresses and ports as packets pass through the kernel.

From the diagram above, the client sends a request, which arrives at the Director on its VIP. The Director forwards the request to the RIP of a cluster node. The cluster node replies via its RIP to the Director. The Director then rewrites the cluster node’s RIP into the VIP owned by the Director and returns the reply to the client.

Some basic notes on LVS-NAT (a configuration sketch with ipvsadm follows this list):

  1. The Director intercepts all communication between the clients and the cluster nodes
  2. The cluster nodes use the Director’s DIP as their default gateway for reply packets to the client computers
  3. The Director can remap network port numbers
  4. Any operating system can be used inside the cluster
  5. The network or the Director may become a bottleneck. It is mentioned that a 400 MHz processor can saturate a 100 Mbps connection
  6. It may be difficult to administer the cluster nodes, as the administrator must enter them via the Director. Of course, you can do a bit of network and firewall configuration to circumvent this limitation
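
As a concrete illustration (not from the book), here is a minimal sketch of configuring LVS-NAT on the Director with ipvsadm; the VIP 10.0.0.1 and the RIPs 192.168.10.2-3 are made-up addresses used purely for illustration.

# enable packet forwarding on the Director so replies can be rewritten on the way back
echo 1 > /proc/sys/net/ipv4/ip_forward

# define the virtual service on the VIP, port 80, with round-robin scheduling
ipvsadm -A -t 10.0.0.1:80 -s rr

# add two real servers using masquerading (-m), i.e. the LVS-NAT forwarding method
ipvsadm -a -t 10.0.0.1:80 -r 192.168.10.2:80 -m
ipvsadm -a -t 10.0.0.1:80 -r 192.168.10.3:80 -m

Remember that with LVS-NAT the real servers must use the DIP as their default gateway, otherwise their replies will bypass the Director and never be translated back to the VIP.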

 

A2 LVS-DR (Direct Routing)


In an LVS-DR setup, the Director forwards all incoming requests to the nodes inside the cluster, but the nodes inside the cluster send their replies directly back to the client computers.

From the diagram, the client sends a request, which arrives at the Director on its VIP. The Director forwards the request to the RIP of a cluster node. The cluster node replies directly to the client, and the reply packet uses the VIP as its source IP address. The client is fooled into thinking it is talking to a single computer (the Director).

Some basic properties of LVS-DR:

  1. The cluster nodes must be on the same network segment as the Director
  2. The Director intercepts inbound communication between the clients and the real servers, but not outbound communication
  3. The cluster nodes do not use the Director as the default gateway for reply packets to the clients
  4. The Director cannot remap network port numbers
  5. Most operating systems can be used on the real servers inside the cluster. However, the operating system must be capable of configuring the network interface so that it does not reply to ARP broadcasts for the VIP (see the sketch after this list)
  6. If the Director fails, the cluster nodes become distributed servers, each with its own IP address. You can “save” the situation by using round-robin DNS to hand out the RIP addresses of the cluster nodes, or by asking users to connect to a cluster node directly
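
A minimal sketch of the two halves of an LVS-DR setup: the Director side with ipvsadm, and the ARP suppression a Linux real server typically needs so that it can hold the VIP without answering ARP for it. The addresses are placeholders, and the arp_ignore/arp_announce sysctls assume a 2.6 kernel.

# on the Director: virtual service on the VIP, real servers reached by direct routing (-g)
ipvsadm -A -t 172.16.1.100:80 -s rr
ipvsadm -a -t 172.16.1.100:80 -r 172.16.1.11:80 -g
ipvsadm -a -t 172.16.1.100:80 -r 172.16.1.12:80 -g

# on each Linux real server: bind the VIP to the loopback and stop it replying to ARP for it
ip addr add 172.16.1.100/32 dev lo
echo 1 > /proc/sys/net/ipv4/conf/lo/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/lo/arp_announce
echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce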

 

A3 LVS-TUN (IP Tunnelling)

 

IP tunnelling can be used to forward packets from one subnet or virtual LAN (VLAN) to another subnet or VLAN, even when the packets must pass through another network. The LVS-TUN forwarding method builds on the IP tunnelling capability that is part of the Linux kernel and allows you to place the cluster nodes on a cluster network that is not on the same network segment as the Director.

LVS-TUN enhances the LVS-DR method of packet forwarding by encapsulating inbound requests for cluster services from the client computers so that they can be forwarded to cluster nodes that are not on the same physical network segment as the Director. This is done by encapsulating one packet inside another packet.

Basic Properties of LVS-TUN

  1. The cluster nodes do not need to be on the same physical network segment as the Director
  2. The RIP addresses must not be private IP addresses
  3. Return packets must not go through the Director
  4. The Director cannot remap network port numbers
  5. Only operating systems that support the IP tunnelling protocol can be used as servers inside the cluster (see the sketch after this list)
  6. LVS-TUN is less reliable than LVS-DR, as anything that breaks the connection between the Director and the cluster nodes will drop all client connections
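
A minimal sketch of an LVS-TUN setup, assuming routable (non-private) RIPs and Linux real servers with the ipip module; the addresses below are documentation placeholders only.

# on the Director: real servers reached through IP tunnelling (-i)
ipvsadm -A -t 203.0.113.10:80 -s rr
ipvsadm -a -t 203.0.113.10:80 -r 198.51.100.21:80 -i
ipvsadm -a -t 203.0.113.10:80 -r 198.51.100.22:80 -i

# on each real server: load the ipip module and bind the VIP to the tunnel device
modprobe ipip
ip addr add 203.0.113.10/32 dev tunl0
ip link set tunl0 up

As with LVS-DR, the real servers also need to be stopped from answering ARP for the VIP.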

For more information on LVS scheduling methods, see the Linux Virtual Server Scheduling Methods Blog entry.

Deploying watchdog on ipfail-plugin for Heartbeat

The kernel uses watchdog to handle a hung system. Watchdog is simply a kernel module that checks a timer to determine whether the system is alive, and it can reboot the system if it thinks the system is hung. Watchdog is quite useful for detecting a server hang situation.

To activate watchdog, add the watchdog directive to your /etc/ha.d/ha.cf:

respawn clusteruser /usr/lib/heartbeat/ipfail
ping 172.16.1.254     172.16.1.253
#ping_group pingtarget 172.16.1.254 172.16.1.253
watchdog /dev/watchdog
auto_failback off

When you enable the watchdog option in your /etc/ha.d/ha.cf file, Heartbeat will write to the /dev/watchdog device at an interval equal to the deadtime timer. If Heartbeat fails to update the watchdog device, watchdog will initiate a kernel panic once the watchdog timeout period has expired.
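
If your server has no hardware watchdog card, /dev/watchdog is normally provided by the softdog kernel module; a minimal sketch, assuming CentOS, to load it now and at every boot:

# load the software watchdog module now (this creates /dev/watchdog)
modprobe softdog

# load it again automatically at boot time
echo "modprobe softdog" >> /etc/rc.modules
chmod +x /etc/rc.modules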

Configure the kernel to reboot when there is a kernel panic

To force the kernel to reboot instead of just hanging when there is a kernel panic, you have to modify the boot arguments passed to the kernel. This can be done in /etc/grub.conf:

default=0
timeout=0
splashimage=(hd0,0)/boot/grub/splash.xpm.gz
hiddenmenu
title Fedora (2.6.29.4-167.fc11.i686.PAE)
root (hd0,0)
kernel /boot/vmlinuz-2.6xxxxx.i686.PAE ro root=LABEL=/ panic=60
initrd /boot/initrd-2.6.xxxxx.i686.PAE.img

Alternatively, if you are using lilo.conf, you can add the following line

append="panic=60"

Remember to do a

# lilo -v
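
If you prefer not to touch the boot loader, the same panic timeout can also be set at run time through the kernel.panic sysctl; a minimal sketch:

# reboot 60 seconds after a kernel panic (takes effect immediately)
echo 60 > /proc/sys/kernel/panic

# make the setting persistent across reboots
echo "kernel.panic = 60" >> /etc/sysctl.conf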

Deploying ipfail plug-in for HeartBeat

This is a continuation of the Blog Entry Deploying a Highly Available Cluster (Heartbeat) on CentOS. In this Blog Entry, we are looking at the ipfail plug-in that comes with the Heartbeat package.

The purpose of the ipfail plug-in is to allow you to specify one or more ping servers in the Heartbeat configuration file. If the master server fails to see one of the ping servers while the slave server can still ping it, the slave will take over ownership of the resources, as it assumes there is a network communication issue between the master server and the clients even though the master server may not actually be down.

To use ipfail, you must first decide which device on the network both Heartbeat Servers must ping at all times. Enter the information in /etc/ha.d/ha.cf.

respawn clusteruser /usr/lib/heartbeat/ipfail
ping 172.16.1.254     172.16.1.253
#ping_group pingtarget 172.16.1.254 172.16.1.253
auto_failback off
  1. The first line above tells Heartbeat to start the ipfail program on both the master and slave servers and to respawn it if it stops, running it as the clusteruser user created during the installation
  2. The second line specifies one or more ping servers that the Heartbeat servers must ping to ensure they still have a connection to the network. Make sure you use ping servers on both interfaces. With “ping”, each IP address listed is an independent ping node; connectivity to each is equally important, and a reply from any one of them counts on its own.
  3. A ping_group is considered by Heartbeat to be a single cluster node (the group name). The ability to communicate with any of the group members means that the group is reachable.
  4. For watchdog integration, see the Blog Entry Deploying watchdog on ipfail-plugin for Heartbeat

Deploying a Highly Available Cluster (Heartbeat) on CentOS

Here the Heartbeat program is configured to work over a separate physical connection between the 2 servers (over the private switch). The separate connection between the 2 servers can be either a serial cable or another Ethernet network connection via a cross-over cable or a mini switch.

Do note that it is recommended to use at least 2 separate physical connections to eliminate a single point of failure. As such, both of your servers will need 2 physical NICs to allow this design with 2 separate physical connections.

Step 1: Configuring the IP addresses
Firstly, do note that RFC 1918 defines the following IP address ranges for private networks:

10.0.0.0 to 10.255.255.255 (10/8 prefix)
172.16.0.0 to 172.31.255.255 (172.16/12 prefix)
192.168.0.0 to 192.168.255.255 (192.168/16 prefix)

For the master node, I will be configuring the IP addresses as follows:

192.168.1.2 for eth0 (private LAN for the private physical path for the heartbeat)
172.16.1.2 for eth1 (existing corporate LAN for the other physical path for the heartbeat)

For the slave node, I will be configuring the IP addresses as follows:

192.168.1.3 for eth0 (private LAN for the private physical path for the heartbeat)
172.16.1.3 for eth1 (existing corporate LAN for the other physical path for the heartbeat)

For the virtual IP address, I will be configuring the IP address as follows:

172.16.1.1 (for Virtual Address)
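
For reference, a minimal sketch of what the master node's interface files could look like on CentOS; the file names follow the addressing above, and the /24 netmasks are an assumption for this example.

# /etc/sysconfig/network-scripts/ifcfg-eth0 (private heartbeat LAN)
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.1.2
NETMASK=255.255.255.0
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth1 (existing corporate LAN)
DEVICE=eth1
BOOTPROTO=static
IPADDR=172.16.1.2
NETMASK=255.255.255.0
ONBOOT=yes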

Verify the names of the master and slave nodes:

uname -n
(n01 for master node; n02 for slave node)

Step 2: Install and configure the Heartbeat

# yum install heartbeat

Now we have to configure 3 files on both nodes. They are:

authkeys
ha.cf
haresources

Copy the sample files to the /etc/ha.d directory

cp /usr/share/doc/heartbeat-2.1.2/authkeys /etc/ha.d/
cp /usr/share/doc/heartbeat-2.1.2/ha.cf /etc/ha.d/
cp /usr/share/doc/heartbeat-2.1.2/haresources /etc/ha.d/

Step 3: Configuring the /etc/ha.d/ha.cf

logfile /var/log/ha-log
logfacility local0
warntime 5
keepalive 2
deadtime 15
initdead 60
bcast eth0 eth1
udpport 694
auto_failback off
node n01 n02
  1. keepalive – specifies how many seconds there should be between heartbeats
  2. deadtime – specifies how many seconds the backup will wait without receiving a heartbeat from the primary server before taking over
  3. initdead – specifies that after the Heartbeat daemon starts, it should wait 60 seconds before starting any resources on the primary server
  4. warntime – specifies how long to wait before issuing a warning that a late heartbeat may mean a peer node is dead
  5. node n01 n02 – the node names, as reported by uname -n
  6. auto_failback off – if the master server fails, the slave server will hold the resources and will not return control to the master server when it is brought back online. With auto_failback on, once the master server is brought back online, the slave server returns the resources to the master server

Step 4: Configure the /etc/ha.d/authkeys

Edit and uncomment the lines inside /etc/ha.d/authkeys so that they look like this:

auth 1
1 sha1 password
  1. 1 – is a simple key index, starting with 1
  2. sha1 – the signature algorithm being used. You may use either md5 or sha1
  3. password – the password you create. Make sure it is the same on both systems

Change the permission of the authkeys file

chmod 600 /etc/ha.d/authkeys
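
The password itself can be any string shared by both nodes; a quick sketch of one convenient way to generate a random one to paste into /etc/ha.d/authkeys:

# generate a random shared secret (copy the same value into authkeys on both nodes)
dd if=/dev/urandom bs=512 count=1 2>/dev/null | md5sum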

Step 5: Configure /etc/ha.d/haresources

The /etc/ha.d/haresources file contains the names of the resources the master server should own. Resource names are usually scripts found in the /etc/init.d or /etc/ha.d/resource.d directory. If you stop the Heartbeat daemon, the resource daemon (for example httpd) will be stopped as well.

n01 172.16.1.1 httpd

(where n01 is the master server, 172.16.1.1 is the virtual IP address, and httpd is the resource daemon handled by Heartbeat)
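
If you prefer to be explicit about how the virtual IP is brought up, the haresources line can call the IPaddr resource script with a netmask and interface; a sketch, assuming the VIP lives on eth1 with a /24 netmask:

n01 IPaddr::172.16.1.1/24/eth1 httpd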

Step 6: Install the Heartbeat and configure the backup server 

Install Heartbeat according to Step 2. Next, copy all the configuration files to the slave server:

scp -r /etc/ha.d/ root@n02:/etc/

Copy the configuration of the resource daemon that Heartbeat is running from the master server to the slave server. For this example, we will assume httpd:

scp /etc/httpd/conf/httpd.conf root@n02:/etc/httpd/conf/
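
Since Heartbeat now starts and stops the resource daemon, it should not also be started by init on either node; a sketch of the usual arrangement on CentOS:

# on both nodes: let Heartbeat, not init, control the resource daemon
chkconfig httpd off

# but make sure Heartbeat itself starts at boot
chkconfig heartbeat on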

Step 7: Start and test the heartbeat on master (n01) and slave nodes (n02)

/etc/init.d/heartbeat start
http://172.16.1.1/ (Virtual IP Address)

Stop the master node heartbeat and type http://172.16.1.1/

/etc/init.d/heartbeat stop
http://172.16.1.1/ (Virtual IP Address)

(n02 should now hold the daemon)
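
You can also confirm the failover from the command line on n02; a quick sketch, assuming the VIP and httpd are defined in haresources as above:

# on n02: the virtual IP should now be bound to the corporate LAN interface
ip addr show eth1 | grep 172.16.1.1

# and the resource daemon should be running
/etc/init.d/httpd status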

Further reading:

  1. Deploying ipfail plug-in for HeartBeat (Linux Cluster)

For more information, do look at

  1. Heartbeat User’s Guide 3.0
  2. Configuring A High Availability Cluster (Heartbeat) On CentOS
  3. Linux-HA