Installing NFS4 on CentOS 5 and 6

For more information on NFSv4 and the differences between NFSv3 and NFSv4, do look at A brief look at the difference between NFSv3 and NFSv4 below.

This tutorial is a guide on how to install NFSv4 on CentOS 5 and 6.

Step 1: Installing the packages

# yum install nfs-utils nfs4-acl-tools portmap

Some facts about the tools above as given from yum info.

nfs-utils –  The nfs-utils package provides a daemon for the kernel NFS server and related tools, which provides a much higher level of performance than the traditional Linux NFS server used by most users.

This package also contains the showmount program. Showmount queries the mount daemon on a remote host for information about the NFS (Network File System) server on the remote host. For example, showmount can display the clients which are mounted on that host. This package also contains the mount.nfs and umount.nfs programs.
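
For example, once the server has been configured in Step 2 below, you could check its export list from a client with showmount (the address 192.168.1.1 simply follows the example used later in this tutorial):

# showmount -e 192.168.1.1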

nfs4-acl-tools – This package contains command-line and GUI ACL utilities for the Linux NFSv4 client.
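
For example, the ACL of a file on an NFSv4 mount can be read and modified with nfs4_getfacl and nfs4_setfacl; the file path and the principal alice@mydomain.com below are placeholders only.

# nfs4_getfacl /home/testfile
# nfs4_setfacl -a A::alice@mydomain.com:RW /home/testfile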

portmap – The portmapper program is a security tool which prevents theft of NIS (YP), NFS and other sensitive information via the portmapper. A portmapper manages RPC connections, which are used by protocols like NFS and NIS.

The portmap package should be installed on any machine which acts as a server for protocols using RPC.
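
To check what is currently registered with the portmapper, rpcinfo lists the RPC programs and the ports they are bound to:

# rpcinfo -p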

Step 2: Export the File System from the NFS Server by editing /etc/exports (similar to NFSv3 except for the inclusion of fsid=0)

/home           192.168.1.0/24(rw,no_root_squash,sync,no_subtree_check,fsid=0)
/install        192.168.1.0/24(rw,no_root_squash,sync,no_subtree_check,fsid=1)

The fsid=0 and fsid=1 options provide a number used to identify the filesystem. This number must be different for all the filesystems in /etc/exports that use the fsid option, and one directory can be exported with each fsid option. The option is otherwise only necessary for exporting filesystems that reside on a block device with a minor number above 255; for NFSv4, fsid=0 additionally marks that export as the root of the pseudo filesystem which clients mount (see Step 3).

Export the file system

# exportfs -av

Start the NFS service

# service nfs start

If you are also supporting NFSv3, you have to start portmap, as NFSv3 requires it (on CentOS 6 the portmap service has been replaced by rpcbind). NFSv4 itself does not need to interact with the rpcbind[1], rpc.lockd, and rpc.statd daemons. For a more in-depth understanding, see Fedora Chapter 9. Network File System (NFS) – How it works.

# service portmap restart
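
To make sure the services come up again after a reboot, you can also enable them with chkconfig (drop the portmap line if you are serving NFSv4 only; on CentOS 6 the equivalent service is rpcbind):

# chkconfig nfs on
# chkconfig portmap on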

Step 3: Mounting on the Client

# mount -t nfs4 192.168.1.1:/ /home
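
If you want the mount to persist across reboots, a matching /etc/fstab entry on the client might look like the sketch below; the server address and mount point follow the example above, and the options are just a reasonable default.

192.168.1.1:/    /home    nfs4    defaults    0 0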

For other information, see:

  1. NFS4 Client unable to mount Server NFS4 file
  2. A brief look at the difference between NFSv3 and NFSv4

A brief look at the difference between NFSv3 and NFSv4

There are a few interesting differences between NFSv3 and NFSv4. A side-by-side comparison of NFSv3 and NFSv4 is quite hard to find, so the information here is referenced from the NFS Version 4 Open Source Project.

From a file system perspective, the main differences fall into the following areas.

Export Management

  1. In NFSv3, the client must rely on an auxiliary protocol, the MOUNT protocol, to request the list of the server’s exports and to obtain the root filehandle of a given export. Once the root filehandle is obtained, it is fed into the NFS protocol proper.
  2. NFSv4 instead uses a virtual file system to present the server’s exports and their associated root filehandles to the client.
  3. NFSv4 defines a special operation to retrieve the root filehandle, and the NFS server presents the appearance to the client that each export is just a directory in the pseudofs (see the commands after this list for an illustration).
  4. The NFSv4 pseudo file system is meant to provide maximum flexibility: export pathnames on the server can be changed transparently to clients.
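
As a rough illustration of points 1 and 3, reusing the server address from the tutorial above: an NFSv3 client first queries the export list through the MOUNT protocol, whereas an NFSv4 client simply mounts the pseudofs root and sees each export as a directory.

# showmount -e 192.168.1.1
# mount -t nfs4 192.168.1.1:/ /mnt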

State

  1. NFSv3 is stateless. In other words, if the server reboots, the clients can pick up where they left off; no state has been lost.
  2. NFSv3 is typically used with NLM, an auxiliary protocol for file locking. NLM is stateful in that the server’s LOCKD keeps track of locks.
  3. In NFSv4, locking operations are part of the protocol.
  4. NFSv4 servers keep track of open files and delegations.

Blocking Locks

  1. NFSv3 relies on NLM. Basically, the client process is put to “sleep”; when a callback is received from the server, the client process is granted the lock.
  2. In NFSv4, the client is also put to sleep, but it polls the server periodically for the lock.
  3. The benefit of this mechanism is that it requires only one-way reachability from client to server, but it may be less efficient.

PBS (Portable Batch System) Commands on Torque

Here are some PBS directives that you can use in your customised PBS templates and scripts; a complete example job script is given after the individual directives.

Note:

# Remarks:
#  A line beginning with # is a comment;
#  A line beginning with #PBS is a PBS directive;
#  Directives are case sensitive.

Job Name (Default)

#PBS -N jobname

Specifies the number of nodes (nodes=N) and the number of processors per node (ppn=M) that the job should use

#PBS -l nodes=2:ppn=8

Specifies the maximum amount of physical memory used by any process in the job.

#PBS -l pmem=4gb

Specifies maximum walltime (real time, not CPU time)

#PBS -l walltime=24:00:00

Queue Name (If default is used, there is no need to specify)

#PBS -q fastqueue

Group account (for example, g12345) to be charged

#PBS -W group_list=g12345

Put both normal output and error output into the same output file.

#PBS -j oe

Send me an email when the job begins, ends or aborts

#PBS -m bea
#PBS -M mymail@mydomain.com

Export all my environment variables to the job

#PBS -V

Rerun this job if it fails

#PBS -r y
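
Putting the directives together, a minimal job script might look like the sketch below. The job name, queue, group, mail address and the mpirun command line are all placeholders; adapt them to whatever your cluster actually provides.

#!/bin/bash
#PBS -N myjob
#PBS -l nodes=2:ppn=8
#PBS -l walltime=24:00:00
#PBS -q fastqueue
#PBS -W group_list=g12345
#PBS -j oe
#PBS -m bea
#PBS -M mymail@mydomain.com

cd $PBS_O_WORKDIR
mpirun -np 16 ./my_mpi_program

Submit the script with qsub myjob.pbs and check on it with qstat.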

Predefined Environment Variables for OpenPBS qsub

The following environment variables reflect the environment in which the user ran qsub:

  1. PBS_O_HOST – The host where you ran the qsub command.
  2. PBS_O_LOGNAME – Your user ID where you ran qsub.
  3. PBS_O_HOME – Your home directory where you ran qsub.
  4. PBS_O_WORKDIR – The working directory where you ran qsub.

The following variables reflect the environment in which the job is executing; an example of using them inside a job script follows the list.

  1. PBS_ENVIRONMENT – Set to PBS_BATCH to indicate the job is a batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive job.
  2. PBS_O_QUEUE – The original queue you submitted to.
  3. PBS_QUEUE – The queue the job is executing from.
  4. PBS_JOBNAME – The job’s name.
  5. PBS_NODEFILE – The name of the file containing the list of nodes assigned to the job.
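
Inside a job script these variables are typically used along the following lines (the echo is purely illustrative):

cd $PBS_O_WORKDIR
NPROCS=$(wc -l < $PBS_NODEFILE)
echo "Job $PBS_JOBNAME running on $NPROCS processors from queue $PBS_QUEUE"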

iWARP, RDMA and TOE

Remote Direct Memory Access (RDMA) allows data to be transferred over a network from the memory of one computer to the memory of another without CPU intervention. There are 2 types of RDMA hardware: InfiniBand and RDMA over IP (iWARP). The OpenFabrics Enterprise Distribution (OFED) stack provides a common interface to both types of RDMA hardware.
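
On a machine with the OFED stack installed, a quick way to check whether an RDMA-capable adapter (InfiniBand or iWARP) is visible is ibv_devinfo, which ships with the libibverbs utilities:

# ibv_devinfo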

High-bandwidth networks like 10G Ethernet allow high transfer rates, but TCP/IP alone cannot make use of the entire 10G bandwidth due to data copying, packet processing and interrupt handling on the CPUs at each end of the TCP/IP connection. In a traditional TCP/IP network stack, an interrupt occurs for every packet sent or received, data is copied at least once in each host computer’s memory (between user space and the kernel’s TCP/IP buffers), and the CPU is responsible for processing multiple nested packet headers for all protocol levels in all incoming and outgoing packets.

Cards with iWARP and TCP Offload Engine (TOE) capabilities, such as Chelsio adapters, enable the entire iWARP, TCP/IP and IP protocol processing to be offloaded from the main CPU onto the iWARP/TOE card, achieving throughput close to the full capacity of 10G Ethernet.

RDMA based communication
(Taken from TCP Bypass Overview by Informix Solution (June 2011) Pg 11)

  1. Removes the CPU as the bottleneck by using user-space to user-space remote copy – after memory registration.
  2. The HCA is responsible for the virtual-to-physical and physical-to-virtual address mapping.
  3. Shared keys are exchanged for access rights and current ownership.
  4. Memory has to be registered to lock it into RAM and initialise the HCA TLB.
  5. An RDMA read uses no CPU cycles after registration on the donor side (see the bandwidth test below).
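
A simple way to see point 5 in practice is the perftest suite that ships with OFED (assuming the perftest package is installed; the address 192.168.1.1 below is just a placeholder for the donor host). The RDMA read bandwidth test places almost no CPU load on the donor side once memory is registered.

On the donor host:

# ib_read_bw

On the client:

# ib_read_bw 192.168.1.1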