Infiniband versus Ethernet myths and misconceptions

This whitepaper from Chelsio, Eight Myths about InfiniBand (WP 09-10), is a good writeup of eight myths and misconceptions about InfiniBand. Here is a summary with my inputs on selected myths…

Opinion 1: InfiniBand has lower latency than Ethernet

InfiniBand vendors usually advertise latency figures from specialized micro-benchmarks with two servers in a back-to-back configuration. In an HPC production environment, application-level latency is what matters. InfiniBand's lack of congestion management and adaptive routing results in interconnect hot spots, unlike iWARP over Ethernet, which achieves reliability and congestion control via TCP.

Opinion 2: QDR‐IB has higher bandwidth than 10GbE

This is interesting. QDR InfiniBand uses 8b/10b encoding, so 40 Gbps InfiniBand is effectively 32 Gbps. However, due to the limitations of PCIe Gen 2, you will hit a maximum of about 26 Gbps; if you are using PCIe Gen 1, you will hit a maximum of about 13 Gbps. Do read another article, from Margalla Communications: High-speed Remote Direct Memory Access (RDMA) Networking for HPC. Remember that the Chelsio adapter comes as a 2 x 10GbE card, so you can trunk the two ports together to come nearer to the ~26 Gbps practical maximum of InfiniBand. Wait till 40GbE comes onto the market; it will be very challenging for InfiniBand.
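A quick back-of-the-envelope check of the numbers above. The rates come from the paragraph itself; the arithmetic only accounts for 8b/10b encoding, not PCIe or protocol overhead (which is what pulls the practical figure down to ~26 Gbps):

```shell
# 8b/10b encoding carries 8 data bits in every 10 bits on the wire,
# so usable bandwidth = signalling rate * 8/10.
qdr_signal=40                          # QDR InfiniBand signalling rate, Gbps
qdr_data=$(( qdr_signal * 8 / 10 ))    # effective data rate after encoding
echo "QDR IB effective: ${qdr_data} Gbps"

trunk=$(( 2 * 10 ))                    # two trunked 10GbE ports
echo "2 x 10GbE trunk:  ${trunk} Gbps"
```

So even before PCIe limits, the encoding alone brings "40 Gbps" down to 32 Gbps, while a trunked dual-port 10GbE card offers 20 Gbps.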

Opinion 3: IB switches scale better than 10GbE

Because an InfiniBand switch fabric lacks congestion management, it is susceptible to hot spots in large-scale clusters, unlike iWARP over Ethernet. I think we should also take into account the coming very-low-latency ASIC switches (see my blog entry "Watch out Infiniband! Low Latency Ethernet Switch Chips are closing the gap"); larger cut-through switches, like Arista's ultra-low-latency cut-through 72-port switch with Fulcrum chipsets, are in the pipeline. Purdue University's 1300-node cluster uses Chelsio iWARP 10GbE cards.

Basic Kickstart on CentOS 6 with DHCP-Less environment (Part 2)


9. Put the CentOS CD into the client and reboot. When you get to the initial CentOS 6 menu selection screen, press "Esc" to go to a boot prompt.

boot: linux ks=http://192.168.1.1/kickstart/base.cfg ksdevice=eth0 text asknetwork

(where 192.168.1.1 is the web server that hosts the kickstart file)

10. The system should install automatically, creating an exact copy with the same configuration and passwords.

11. To make it easier to modify the kickstart file, such as adding and removing packages, you may want to install the Kickstart Configurator:

# yum install system-config-kickstart

For more information, you can also read

Basic Kickstart on CentOS 6 with DHCP-Less environment (Part 1)

For this tutorial, we assume we wish to automate the installation of a CentOS server using Kickstart.

Step 1: Setup of CentOS Server and looking at Anaconda Config File

  1. Just follow the typical setup of a clean CentOS server: insert the DVD and follow the easy installation steps.
  2. Once the installation is complete, you will notice that Anaconda, the Red Hat installation system, saves a copy of the configuration file created from your installation choices to /root/anaconda-ks.cfg
  3. You can use this file /root/anaconda-ks.cfg to create another identical machine
  4. You can also use it to restore a machine to the original OS and then restore the backup on top.

Step 2: Setting up an On-Demand Kickstart Server

    1. Create a folder “kickstarts” in the root of the web server (/var/www/html/kickstarts)
    2. Copy /root/anaconda-ks.cfg to /var/www/html/kickstarts
    3. Rename anaconda-ks.cfg to base.cfg (or any name you wish)
    4. Change the permissions so that the file can be read by the web server
      # chmod 744 base.cfg
    5. Tweak base.cfg to suit the target machine
    6. Edit base.cfg and uncomment the partitioning lines:
      clearpart --linux --drives=sda
      part /boot --fstype ext3 --size=100 --ondisk=sda
      part pv.3 --size=0 --grow --ondisk=sda
      volgroup VolGroup00 --pesize=32768 pv.3
      logvol / --fstype ext3 --name=LogVol00 --vgname=VolGroup00 --size=1024 --grow
      logvol swap --fstype swap --name=LogVol01 --vgname=VolGroup00 --size=1000 --grow --maxsize=5952
    7. Edit base.cfg as follows if you are using static IP addresses:
      network --device eth0 --bootproto static
      --ip 192.168.1.2 --netmask 255.255.255.0
      --gateway 192.168.1.1
      --nameserver 192.168.1.100,192.168.1.101
      --hostname mylinux.homelinux.org
    8. Change the Installation Method.
      On the 2nd line of base.cfg, change the original “cdrom” to the web install

      url --url=http://url-to-web-install-server/CentOS-6.8
      ....
      ....
      ....
      repo --name="CentOS" --baseurl="http://url-to-web-install-server/CentOS-6.8"
    9. If you wish to disable SELinux (especially if you are setting up a cluster):
      selinux --disabled
    10. See Basic Kickstart on CentOS 6 with DHCP-Less environment (Part 2) for the rest of the tutorial…
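The edits in steps 8 and 9 can also be scripted. Here is a sketch using GNU sed against a minimal stand-in base.cfg created on the spot (the real file will have many more lines, and the URL is a placeholder for your install server):

```shell
# Create a tiny stand-in for base.cfg so the sed commands can be demonstrated.
cat > base.cfg <<'EOF'
install
cdrom
selinux --enforcing
EOF

# Step 8: switch the installation method from CD-ROM to a web install.
sed -i 's|^cdrom$|url --url=http://192.168.1.1/CentOS-6.8|' base.cfg

# Step 9: disable SELinux.
sed -i 's|^selinux .*$|selinux --disabled|' base.cfg

cat base.cfg
```

This is handy when you maintain several variants of the kickstart file (one per cluster role, for example) generated from a common base.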

What about pNFS?

Taken from Scale your file system with Parallel NFS (IBM developerWorks)

NFS is a mature technology that forms the mainstay of file serving for many Linux boxes. However, one area where NFS is found wanting is high-performance computing (HPC), where I/O is highly intensive and demanding.

The above diagram shows the Linux NFS server exporting its physical hard disks to the NFS clients. Mounting an NFS file system is transparent to the client: once mounted, applications simply read and write files, subject to access control, oblivious to the machinations required to persist the data. A pretty neat solution.

NFS is quite capable, as evidenced by its widespread use as Network Attached Storage (NAS). It runs over both Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) and is (relatively) easy to administer. Furthermore, NFS version 4, the most recent, ratified version of the standard, improves security, furthers interoperability between Windows and UNIX®-like systems, and provides better exclusivity through lock leases. (NFSv4 was ratified in 2003.) NFS infrastructure is also inexpensive, because it typically runs well on common Ethernet hardware. NFS suits most problem domains.

However, one domain not traditionally well served by NFS is high-performance computing (HPC), where data files are very large, sometimes huge, and the number of NFS clients can reach into the thousands. (Think of a compute cluster or grid composed of thousands of commodity computing nodes.) Here, NFS is a liability, because the limits of the NFS server—be it bandwidth, storage capacity, or processor speed—throttle the overall performance of the computation. NFS is a bottleneck.

The next revision of NFS, version 4.1, includes an extension called Parallel NFS (pNFS) that combines the advantages of stock NFS with the massive transfer rates proffered by parallelized input and output (I/O). Using pNFS, file systems are shared from server to clients as before, but data does not pass through the NFS server. Instead, client systems and the data storage system connect directly, providing numerous parallelized, high-speed data paths for massive data transfers. After a bit of initialization and handshaking, the pNFS server is left “out of the loop,” and it no longer hinders transfer rates.


Like NFS, the pNFS server exports file systems and retains and maintains the canonical metadata describing each and every file in the data store. As with NFS, a pNFS client—here a node in a cluster—mounts the server’s exported file systems. Like NFS, each node treats the file system as if it were local and physically attached. Changes to metadata propagate through the network back to the pNFS server. Unlike NFS, however, a Read or Write of data managed with pNFS is a direct operation between a node and the storage system itself, pictured at the bottom in Figure 2. The pNFS server is removed from data transactions, giving pNFS a definite performance advantage.

Thus, pNFS retains all the niceties and conveniences of NFS and improves performance and scalability. The number of clients can be expanded to provide more computing power, while the size of the storage system can expand with little impact on client configuration. All you need to do is keep the pNFS catalog and storage system in sync.


The inner workings of pNFS


The pNFS protocol transfers file metadata (formally known as a layout) between the pNFS server and a client node. You can think of a layout as a map, describing how a file is distributed across the data store, such as how it is striped across multiple spindles. Additionally, a layout contains permissions and other file attributes. With metadata captured in a layout and persisted in the pNFS server, the storage system simply performs I/O.
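As a toy illustration of what a layout encodes, here is a sketch (my own, not from the article) that maps a file byte offset to a stripe and a data server, assuming simple round-robin striping. The stripe size and server count are made-up parameters; real layouts are richer and storage-protocol-specific:

```shell
stripe_size=65536      # hypothetical stripe size in bytes
num_servers=4          # hypothetical number of data servers
offset=200000          # byte offset the client wants to read

stripe_index=$(( offset / stripe_size ))      # which stripe holds this byte
server=$(( stripe_index % num_servers ))      # round-robin stripe placement
offset_in_stripe=$(( offset % stripe_size ))  # where the byte sits in the stripe
echo "byte ${offset} -> stripe ${stripe_index} on data server ${server} (offset ${offset_in_stripe})"
```

The point is that once a client holds the layout, this mapping is pure local arithmetic: the client can go straight to the right data server without asking the metadata server again.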

The storage access protocol specifies how a client accesses data from the data store. As you might guess, each storage access protocol defines its own form of layout, because the access protocol and the organization of the data must be concordant.

The control protocol synchronizes state between the metadata server and the data servers. Synchronization, such as reorganizing files on media, is hidden from clients. Further, the control protocol is not specified in NFSv4.1; it can take many forms, allowing vendors the flexibility to compete on performance, cost, and features.

In broad strokes, a pNFS transaction proceeds as follows:

  1. The client requests a layout for the file at hand.
  2. The client obtains access rights by opening the file on the metadata server.
  3. When authorized and given the layout, the client is free to access information from the data servers directly. Access proceeds according to the storage access protocol required for the type of store. (More on this below.)
  4. If the client alters the file, the client’s instance of the layout is duly modified, and all modifications are committed back to the metadata server.
  5. When the client no longer needs the file, it commits any remaining changes, returns its copy of the layout to the metadata server, and closes the file.

More specifically, a Read operation is a series of protocol operations:

  1. The client sends a LOOKUP+OPEN request to the pNFS server. The server returns a file handle and state information.
  2. The client requests a layout from the server through the LAYOUTGET command. The server returns the file layout.
  3. The client issues a READ request to the storage devices, which initiates multiple Read operations in parallel.
  4. When the client is finished reading, it expresses the end of the operation with LAYOUTRETURN.
  5. If the layout shared with clients is ever obsolete because of separate activity, the server issues a CB_LAYOUTRECALL to indicate that the layout is no longer valid and must be purged and/or refetched.

A Write operation is similar, except that the client must issue a LAYOUTCOMMIT before LAYOUTRETURN to “publish” the changes to the file to the pNFS server.
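To make the fan-out in step 3 of the Read sequence concrete, here is a toy shell sketch (mine, not part of the protocol) in which three "data servers" are just local files: one READ is issued to each in parallel, then the stripes are reassembled in layout order:

```shell
# Fake three data servers, each holding one stripe of the file.
for ds in 0 1 2; do
  printf 'stripe-%d;' "$ds" > "ds_${ds}.dat"
done

# Issue the per-server READs in parallel, as a pNFS client would.
for ds in 0 1 2; do
  cat "ds_${ds}.dat" > "read_${ds}.out" &
done
wait   # all parallel reads complete

# Reassemble the stripes in the order the layout dictates.
cat read_0.out read_1.out read_2.out > file.out
cat file.out
```

The aggregate bandwidth is the sum over the data servers, which is exactly why removing the NFS server from the data path pays off at scale.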

Layouts can be cached in each client, further enhancing speed, and a client can voluntarily relinquish a layout to the server if it’s no longer of use. A server can also restrict the byte range of a Write layout to avoid quota limits or to reduce allocation overhead, among other reasons.

To prevent stale caches, the metadata server recalls layouts that have become inaccurate. Following a recall, every affected client must cease I/O and either fetch the layout anew or access the file through plain NFS. Recalls are mandatory before the server attempts any file administration, such as migration or re-striping.

Articles on Troubleshooting Performance Related Problems for vSphere 4.1

The hugely popular Performance Troubleshooting for VMware vSphere 4 guide has now been updated for vSphere 4.1. This document provides a step-by-step approach for troubleshooting the most common performance problems in vSphere-based virtual environments. The steps discussed in the document use performance data and charts readily available in the vSphere Client, along with esxtop, to aid the troubleshooting flows. Each performance troubleshooting flow has two parts:

  1. How to identify the problem using specific performance counters.
  2. Possible causes of the problem and solutions to solve it.

……………………

For more information, see

  1. Troubleshooting Performance Related Problems in vSphere 4.1 Environments (pdf)
  2. Performance & VMmark (VMware Community)