Taxonomy of File Systems (Part 1)

This writeup is a condensed excerpt of the presentation “How to Build a Petabyte Sized Storage System” by Dr. Ray Paden, given at LISA’09. The information is important for administrators deciding among file systems. For the full details, see the original presentation.

1. Conventional I/O

  1. Used generally for “Local File Systems”
  2. Supports the POSIX I/O model
  3. Limited form of parallelism
    • Disk level parallelism possible via striping
    • Intra-Node process parallelism (within the node)
  4. Journaled, extent-based semantics
    • Journaling (AKA logging): information about operations performed on the file system metadata is logged as atomic transactions. In the event of a system failure, the file system is restored to a consistent state by replaying the log for the appropriate transactions.
  5. Caching is done via virtual memory, which is slow
  6. Examples: ext3, NTFS, ReiserFS
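
The journaling idea in item 4 can be illustrated with a small sketch. This is not how ext3, NTFS, or ReiserFS implement their journals; it is a toy in-memory model under simplifying assumptions, and every name in it (`JournaledMetadata`, `log_transaction`, `replay`) is hypothetical. It shows only the core rule: metadata changes are grouped into transactions, and recovery applies a transaction only if its commit record made it into the log.

```python
class JournaledMetadata:
    """Toy model of a metadata journal (hypothetical, in-memory only)."""

    def __init__(self):
        self.journal = []    # append-only log; stands in for the on-disk journal
        self.metadata = {}   # stands in for on-disk metadata (path -> attributes)

    def log_transaction(self, txid, ops, committed=True):
        """Append each operation of a transaction, then its commit record.

        Passing committed=False simulates a crash before the commit
        record was written.
        """
        for key, value in ops:
            self.journal.append({"tx": txid, "op": "set", "key": key, "value": value})
        if committed:
            self.journal.append({"tx": txid, "op": "commit"})

    def replay(self):
        """Recovery: apply, in log order, only operations of committed transactions."""
        committed = {rec["tx"] for rec in self.journal if rec["op"] == "commit"}
        for rec in self.journal:
            if rec["op"] == "set" and rec["tx"] in committed:
                self.metadata[rec["key"]] = rec["value"]
        return self.metadata


fs = JournaledMetadata()
# Transaction 1: create a file and update its directory, atomically.
fs.log_transaction(1, [("/dir/a", {"size": 10}), ("/dir", {"entries": ["a"]})])
# Transaction 2: the "crash" happens before its commit record is logged.
fs.log_transaction(2, [("/dir/b", {"size": 5})], committed=False)

recovered = fs.replay()
print(sorted(recovered))
```

After replay, the half-finished transaction 2 leaves no trace, so the metadata is consistent: either all of a transaction's changes are visible or none are.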

2. Networked File Systems

  1. Disk access from remote nodes via network access
    • Generally based on TCP/IP over Ethernet
    • Useful for on-line interactive access (e.g. home directories)
  2. NFS is ubiquitous in UNIX/Linux environments
    • Does not provide a genuinely parallel model of I/O
      • Not cache coherent
      • Parallel writes require the O_SYNC open flag and the noac mount option to be safe
    • Poorer performance for HPC jobs, especially parallel I/O
      • write: only 90 MB/s on a system capable of 400 MB/s (4 tasks)
      • read: only 381 MB/s on a system capable of 40 MB/s [sic] (16 tasks)
    • Uses the POSIX I/O API, but not its semantics
    • Traditional NFS is limited by “single server” bottleneck
    • NFS is not designed for parallel file access; with restrictions placed on file access and/or a non-parallel file server, its performance may still be good enough.
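
To make the O_SYNC point above concrete, here is a minimal sketch of how one task might write its own region of a shared file with synchronous writes. It uses Python's standard `os.open`, `os.pwrite`, and the `os.O_SYNC` flag (available on Unix-like systems). The helper name `write_region_sync`, the record size, and the temporary directory are assumptions for illustration; in practice the path would sit on an NFS mount, and the `noac` option is set at mount time on the client, not in application code.

```python
import os
import tempfile


def write_region_sync(path, offset, data):
    """Write data at a fixed offset, returning the number of bytes written.

    O_SYNC asks the OS to complete each write only after the data has been
    pushed to stable storage (on an NFS client, to the server).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
    try:
        return os.pwrite(fd, data, offset)
    finally:
        os.close(fd)


# Demo on a local temporary file (stand-in for a file on an NFS mount).
RECORD = 4  # hypothetical fixed record size in bytes
shared = os.path.join(tempfile.mkdtemp(), "shared.dat")

# Each "task" writes only its own non-overlapping region: task i owns
# bytes [i * RECORD, (i + 1) * RECORD).
write_region_sync(shared, 1 * RECORD, b"BBBB")  # task 1
write_region_sync(shared, 0 * RECORD, b"AAAA")  # task 0

with open(shared, "rb") as f:
    contents = f.read()
print(contents)
```

Keeping each task in a disjoint byte range is a design choice of this sketch, not something NFS enforces. Synchronous writes also cost throughput, which is consistent with the poor parallel I/O rates listed above.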

3. Network Attached Storage (AKA: Appliances)

  1. Appliance Concept
    • Focused on CIFS and/or NFS protocols
    • Integrated HW/SW storage product
      • Integrate servers, storage controllers, disks, networks, file system, protocol all into single product
      • Not intended for high performance storage
      • “black box” design
    • Provides an NFS server and/or CIFS/Samba solution
      • Server-based product; they do not improve client access or operation
      • Generally based on Ethernet LANs
    • Examples:
      • NetApp, Scale-out File System (SoFS)
