NFS mount errors with “clnt_create: RPC: Unknown host” for CentOS 6

When attempting to mount an NFS share on CentOS 6, my mount fails with

clnt_create: RPC: Unknown host

Diagnostic:

If we do a more thorough diagnostic, the issue becomes clear:

# showmount -e  
clnt_create: RPC: Unknown host  
# showmount -e localhost  
Export list for localhost:  
/export/my_data *

Resolution:

Taken from the Red Hat site:

Implement forward and reverse lookups (A records and PTR records) in DNS and have the system point towards the DNS servers. Implement this for both IPv4 and IPv6. If you are unable to resolve the DNS issues, change the /etc/hosts file as follows:

Change from

::1          localhost localhost.localdomain localhost6 localhost6.localdomain6

To

::1          machine_hostname localhost localhost.localdomain localhost6 localhost6.localdomain6

Restart the NFS service, check the output of showmount -e localhost and showmount -e, and attempt to mount the share.

# service nfs restart  
# showmount -e localhost  
# showmount -e
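
As a quick sanity check before retrying the mount, confirm that the machine's hostname now resolves. getent goes through /etc/nsswitch.conf, so it will pick up the /etc/hosts entry as well as DNS, while host queries DNS directly (host is part of the bind-utils package):

# getent hosts $(hostname)
# host $(hostname)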

Installing NFS4 on CentOS 5 and 6

For more information on NFSv4 and the differences between NFSv3 and NFSv4, do look at A brief look at the difference between NFSv3 and NFSv4.

This tutorial is a guide on how to install NFSv4 on CentOS 5 and 6

Step 1: Installing the packages

# yum install nfs-utils nfs4-acl-tools portmap

Some facts about the tools above, as given by yum info.

nfs-utils –  The nfs-utils package provides a daemon for the kernel NFS server and related tools, which provides a much higher level of performance than the traditional Linux NFS server used by most users.

This package also contains the showmount program.  Showmount queries the mount daemon on a remote host for information about the NFS (Network File System) server on the remote host. For example, showmount can display the clients which are mounted on that host. This package also contains the mount.nfs and umount.nfs programs.

nfs4-acl-tools – This package contains command line and GUI ACL utilities for the Linux NFSv4 client.

portmap – The portmapper program is a security tool which prevents theft of NIS (YP), NFS and other sensitive information via the portmapper. A portmapper manages RPC connections, which are used by protocols like NFS and NIS.

The portmap package should be installed on any machine which acts as a server for protocols using RPC.
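
Not covered in the original steps, but you will usually also want these services to start at boot. A minimal sketch with chkconfig (on CentOS 6 the portmap service is replaced by rpcbind, so substitute that in the second line):

# chkconfig nfs on
# chkconfig portmap on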

Step 2: Export the File System from the NFS Server (similar to NFSv3 except for the inclusion of fsid=0)

/home           192.168.1.0/24(rw,no_root_squash,sync,no_subtree_check,fsid=0)
/install        192.168.1.0/24(rw,no_root_squash,sync,no_subtree_check,fsid=1)

The fsid=0 and fsid=1 options provide a number used to identify the filesystem. This number must be different for all the filesystems in /etc/exports that use the fsid option. The option is only necessary for exporting filesystems that reside on a block device with a minor number above 255, and only one directory can be exported with each fsid value. For NFSv4, fsid=0 additionally marks the root of the pseudo file system that clients mount.

Export the file systems

# exportfs -av

Start the NFS service

# service nfs start

If you are supporting NFSv3, you also have to start portmap, as NFSv3 requires it. NFSv4, by contrast, does not need to interact with the rpcbind[1], rpc.lockd, and rpc.statd daemons. See Fedora Chapter 9. Network File System (NFS) – How it works for a more in-depth understanding.

# service portmap restart

Step 3: Mounting on the Client

# mount -t nfs4 192.168.1.1:/ /home
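
If the mount should survive a reboot, a possible /etc/fstab entry for the same server and mount point would be (the rw,hard,intr options are common defaults rather than something this guide prescribes):

192.168.1.1:/   /home   nfs4   rw,hard,intr   0   0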

For other information, see:

  1. NFS4 Client unable to mount Server NFS4 file
  2. A brief look at the difference between NFSv3 and NFSv4

A brief look at the difference between NFSv3 and NFSv4

There are a few interesting differences between NFSv3 and NFSv4. A direct comparison of NFSv3 and NFSv4 is quite hard to find, so the information below is referenced from the NFS Version 4 Open Source Project.

From a file system perspective, the main differences are:

Export Management

  1. In NFSv3, the client must rely on an auxiliary protocol, the mount protocol, to request the list of the server’s exports and obtain the root filehandle of a given export. That filehandle is then fed into the NFS protocol proper.
  2. NFSv4 instead uses a virtual file system to present the server’s exports and their associated root filehandles to the client.
  3. NFSv4 defines a special operation to retrieve the root filehandle, and the NFS server presents each export to the client as if it were just a directory in the pseudo file system (see the mount example after this list).
  4. The NFSv4 pseudo file system is meant to provide maximum flexibility: export pathnames on the server can be changed transparently to clients.
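
To make the difference concrete, here is how the two mount styles look (the server address and paths are just placeholders reused from the examples elsewhere in this post).

NFSv3 client, where the mount protocol is used to obtain the root filehandle of the named export:

# mount -t nfs 192.168.1.1:/export/my_data /mnt/data

NFSv4 client, where the path is interpreted relative to the pseudo file system root (the fsid=0 export):

# mount -t nfs4 192.168.1.1:/ /mnt/data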

State

  1. NFSv3 is stateless. In other words, if the server reboots, the clients can pick up where they left off; no state has been lost.
  2. NFSv3 is typically used with NLM, an auxiliary protocol for file locking. NLM is stateful in that the server’s lockd keeps track of locks.
  3. In NFSv4, locking operations are part of the protocol itself.
  4. NFSv4 servers keep track of open files and delegations.

Blocking Locks

  1. NFSv3 relies on NLM. Basically, the client process is put to “sleep”, and when a callback is received from the server, the client process is granted the lock.
  2. In NFSv4, the client is also put to sleep, but it polls the server periodically for the lock.
  3. The benefit of this mechanism is that it only requires one-way reachability from client to server, but it may be less efficient.

Network File System (NFS) in High Performance Networks (White Papers)

The article “Network File System (NFS) in High Performance Networks” by Carnegie Mellon is a very interesting article about NFS performance. Do take a look. Here is a summary of their findings:

  1. For point-to-point throughput, IP over InfiniBand (Connected Mode) is comparable to native InfiniBand.
  2. When the disk is a bottleneck, NFS benefits from neither IPoIB nor RDMA.
  3. When a disk is not a bottleneck, NFS benefits significantly from both IPoIB and RDMA. RDMA is better than IPoIB by ~20%
  4. As the number of concurrent read operations increases, aggregate throughputs achieved for both IPoIB and RDMA significantly improve with no disadvantage for IPoIB

Tuning NFSD Server Daemon for Performance

Do note that the NFSD daemon plays an important role in performance tuning. Here are some tips:

  1. Number of instances of the NFSD server daemon. By default, the number of instances of NFSD = 8. In Optimizing NFS Performance, the author recommends that system admins use at the very least one daemon per processor, but four to eight per processor may be a better rule of thumb. To modify the number of nfsd threads, you can edit RPCNFSDCOUNT in the NFS startup script (/etc/rc.d/init.d/nfs on RHEL, Fedora or CentOS); see the sketch after this list.
  2. If you want to determine the number of nfsd threads yourself, you can look at the detailed NFS statistics provided by the Linux kernel at /proc/net/rpc/nfsd.
  3. A sample of /proc/net/rpc/nfsd

    rc 0 47750055 170015423
    fh 39 0 0 0 0
    io 376475178 3831903891
    th 8 18573687 48505.610 3718.131 2831.176 0.000 1813.483 1468.532 1399.593 1551.349 0.000 12224.473
    ra 16 122635704 971110 83992 77018 15770 11434 1655 550 882 407 518440
    net 217768755 0 217768891 1072
    rpc 217765688 0 0 0 0
    proc2 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    proc3 22 3 24906977 238795 7255551 10595346 837 124313278 42671631 2419345 5043 5865 0 2399297 5130 2560 1593 48707 133600 34910 3 0 2650721
    proc4 2 0 0


  4. To analyse some of the output parameters, I’ll be drawing most of the information below from an excellent article, “Understanding Linux nfsd statistics”. A brief summary is as follows:
    rc reports the stats for the NFS reply cache. The three numbers are cache hits, cache misses, and “nocache”, which is presumably requests that bypassed the cache.
    io reports the overall I/O counters. The 2 numbers are bytes read and bytes written.
    th reports the nfsd thread utilization. The first number is the number of nfsd threads configured, the second is the number of times all threads have been needed, and the remaining ten numbers are a histogram showing, in seconds, how long thread usage was within each 10% band of the maximum.
    ra reports the read-ahead cache. The first number is the read-ahead cache size. The next 10 numbers are the number of times an entry was found less than 10%, less than 20%, …, less than 100% of the way into the read-ahead cache. The last number on this line is the number of times an entry was not found in the cache.
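
A minimal sketch of raising the thread count as suggested in point 1 (the value 16 is only an example for an 8-core server, and /etc/sysconfig/nfs is an assumption; the RHEL/CentOS init script picks up RPCNFSDCOUNT from that file if it is set there, otherwise edit the init script directly):

# echo "RPCNFSDCOUNT=16" >> /etc/sysconfig/nfs
# service nfs restart
# grep th /proc/net/rpc/nfsd

After the restart, the first number on the th line should reflect the new thread count.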

Tuning NFS Server exports file for performance

When tuning the NFS server exports file (/etc/exports) for performance, these 2 options are, as far as I know, the most important:

  1. async: The default export behavior for both NFS Version 2 and Version 3 protocols, used by exportfs, is “asynchronous”. According to Optimizing NFS Performance, this default permits the server to reply to client requests as soon as it has processed the request and handed it off to the local file system, without waiting for the data to be written to stable storage. This is indicated by the async option in the server’s export list. It yields better performance at the cost of possible data corruption if the server reboots while still holding unwritten data and/or metadata in its caches. This possible data corruption is not detectable at the time of occurrence, since the async option instructs the server to lie to the client, telling the client that all data has indeed been written to stable storage, regardless of the protocol used.
  2. no_subtree_check: For nfs-utils version 1.0.x and above, to speed up transfers, disable subtree checking, especially if you are exporting a large directory, for example:
/tmp *(rw,async,no_subtree_check)
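
After editing /etc/exports, the changes can be applied without restarting the NFS service; exportfs -r re-exports all entries and -v shows the options actually in effect:

# exportfs -ra
# exportfs -v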

For other good material on /etc/exports, do check out
http://linux.die.net/man/5/exports

Dealing with Overflow of Fragmented Packets

Most of the information written in this blog can be found in NFS for Clusters and Optimizing NFS Performance.

One method to check for fragmented-packet issues on the NFS server is to look at the IP ReasmFails counter in the file /proc/net/snmp:

# head -2 /proc/net/snmp | cut -d' ' -f17
ReasmFails
2

ReasmFails represents the number of fragment reassembly failures. If ReasmFails goes up too quickly during heavy file activity, the system may be having issues.

According to Optimizing NFS Performance, if the network topology is too complex, fragment routes may differ, and the fragments may not all arrive at the server for reassembly. Once the number of unprocessed, fragmented packets reaches the number of bytes specified by ipfrag_high_thresh, the NFS server kernel will simply start throwing away fragmented packets until the number of incomplete packets reaches the number specified by ipfrag_low_thresh.

You can reduce the number of lost packets on the server by increasing the buffer size for fragmented packets.

$ echo 524288 > /proc/sys/net/ipv4/ipfrag_low_thresh
$ echo 524288 > /proc/sys/net/ipv4/ipfrag_high_thresh

which is doubling the defaults
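
These echo settings do not survive a reboot. To make them persistent (assuming a standard sysctl setup), the same values can go into /etc/sysctl.conf and be reloaded with sysctl -p:

net.ipv4.ipfrag_low_thresh = 524288
net.ipv4.ipfrag_high_thresh = 524288

# sysctl -p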

Testing for Saturated Network for NFS

I’ve taken most of this information from the article “NFS for Clusters” and “Linux NFS and Automounter Administration” by Erez Zadok

Profiling Write Operations on NFS

$ time dd if=/dev/zero of=testfile bs=4k count=16384
16384+0 records in
16384+0 records out
67108864 bytes (67 MB) copied, 0.518172 s, 130 MB/s
real    0m0.529s
user    0m0.016s
sys    0m0.500s

time = time a simple command or give resource usage
dd = convert and copy a file
if = read from FILE instead of stdin
of = write to FILE instead of stdout
bs = read and write BYTES bytes at a time
count = copy only BLOCKS input blocks

According to Wikipedia /dev/zero is a special file that provides as many null characters (ASCII NUL, 0x00) as are read from it. One of the typical uses is to provide a character stream for overwriting information. Another might be to generate a clean file of a certain size. Like /dev/null, /dev/zero acts as a source and sink for data. All writes to /dev/zero succeed with no other effects (the same as for /dev/null, although /dev/null is the more commonly used data sink); all reads on /dev/zero return as many NULs as characters requested.
 

Profiling Read Operations on NFS

When profiling reads instead of writes, call umount and mount first to flush the caches, or the read may be served from the client cache and appear almost instantaneous, giving a false impression of read speed.

$ cd /
$ umount /mnt/shareddrive
$ mount /mnt/shareddrive
$ cd /mnt/shareddrive
$ dd if=testfile of=/dev/null bs=4k count=16384

Here, after unmounting and remounting the NFS share, the testfile that exists on the shared drive is read and written to /dev/null.
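
For symmetry with the write test, you may also want to wrap the read in time (same file name and dd parameters as in the earlier example):

$ time dd if=testfile of=/dev/null bs=4k count=16384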

According to the article “NFS for Clusters”, if more than 3% of calls are retransmitted, then there are problems with the network or NFS server.
Look for NFS failures on a shared disk server with

$ nfsstat -s
or
$ nfsstat -o rpc
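
The retransmission counters themselves are kept on the client side, so a complementary check is to run nfsstat with the client and RPC flags on an NFS client and compare retrans against calls; more than about 3% is the threshold mentioned above:

$ nfsstat -c -r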

Network design considerations for NFS

Network design is an important consideration for NFS. Note the following:

  1. If possible, dedicate a network to isolate the NFS traffic.
  2. Trunk (bond) multiple network links to improve network connections (will write a blog entry later).
  3. If you have the budget, you can consider a high-quality NAS which uses NFS accelerator components such as non-volatile RAM to commit NFS write operations as soon as possible, giving performance close to async with the reliability of sync.