Pawsey Supercomputing Centre Deploys 130 PB of Multi-Tier Storage with Ceph

This article is taken from HPCWire's "Pawsey Supercomputing Centre to Deploy 130PB of Multi-Tier Storage".

The system has been designed to be both cost-effective and scalable.

To maximise value, Pawsey has invested in Ceph, software for building storage systems out of generic hardware, and has built the online storage infrastructure around Ceph in-house. As more servers are added, the online object storage becomes more stable, resilient, and even faster.

“That’s how we were able to build a 60 PB system on this budget,” explains Gray.

“An important part of this long-term storage upgrade was to demonstrate how it can be done in a financially scalable way.  In a world of mega-science projects like the Square Kilometre Array, we need to develop more cost-effective ways of providing massive storage.”

Pawsey Supercomputing Centre to Deploy 130PB of Multi-Tier Storage – HPCWire
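
To make the scaling approach concrete, the commands below are a minimal, generic sketch of how a Ceph object-storage pool might be created and grown on commodity servers using the cephadm orchestrator. They are illustrative only and are not Pawsey's actual configuration; the pool name object-data and the placement-group counts are assumptions.

% ceph status
% ceph orch apply osd --all-available-devices
% ceph osd pool create object-data 128 128 erasure
% ceph osd pool application enable object-data rgw

Because Ceph's CRUSH algorithm redistributes data as new OSDs join, adding servers grows capacity, resilience, and aggregate throughput at the same time, which is the property described in the quote above.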

Rapid Growth in HPC Storage

This article is taken from NextPlatform's "On-Prem No Longer Centre Stage for Broader HPC Storage".

AI/ML, more sophisticated analytics, and larger-scale HPC problems all bode well for the on-prem storage market in high performance computing (HPC) and are an even bigger boon for cloud storage vendors.

Nossokoff points to several shifts in the storage industry and among the top supercomputing sites, particularly in the U.S., that reflect changing priorities in storage technologies, especially the mixed file problems AI/ML introduces into the traditional HPC storage hierarchy. “We’re seeing a focus on raw sequential large block performance in terms of TB/s, high-throughput metadata and random small-block IOPS performance, cost-effective capacity for increasingly large datasets in all HPC workloads, and work to add intelligent placement of data so it’s where it needs to be.”
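
As a rough, hands-on way to see those two performance regimes, the fio jobs below compare large-block sequential bandwidth with small-block random IOPS on a mounted filesystem. The mount point /mnt/hpcfs is a placeholder, and the sizes, job counts, and queue depths are arbitrary assumptions, so treat this as a sketch rather than a formal benchmark.

% fio --name=seqread --directory=/mnt/hpcfs --rw=read --bs=1M --size=10G --numjobs=8 --ioengine=libaio --direct=1 --group_reporting
% fio --name=randread --directory=/mnt/hpcfs --rw=randread --bs=4k --size=2G --numjobs=8 --iodepth=32 --ioengine=libaio --direct=1 --group_reporting

The first report is typically dominated by throughput in GB/s and the second by IOPS, mirroring the split between classic large-file HPC IO and the many-small-file access patterns AI/ML training tends to generate.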

In addition to keeping pace with the storage tweaks needed to suit AI/ML as well as traditional HPC, there have been shifts in the vendor ecosystem this year. These will likely have an impact on what some of the largest HPC sites do over the coming years as they build and deploy their first exascale machines. Persistent memory is becoming more common, and companies like Samsung are moving from NVMe to CXL, an indication of where that technology might fit in the future HPC storage and memory stack. Companies like Vast Data, once seen as an up-and-coming player in the on-prem storage hardware space for HPC, have transformed into software companies, Nossokoff says.

On-Prem No Longer Centre Stage for Broader HPC Storage – NextPlatform

Storage Performance Basics for Deep Learning

This is an interesting write-up by James Mauro of NVIDIA on Storage Performance Basics for Deep Learning.

“The complexity of the workloads plus the volume of data required to feed deep-learning training creates a challenging performance environment. Deep learning workloads cut across a broad array of data sources (images, binary data, etc.), imposing different disk IO load attributes, depending on the model and a myriad of parameters and variables.”
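
One simple way to get a first look at the disk IO load a training job actually generates, before reaching for more specialised tools, is to watch system-level IO statistics while the job runs. The five-second interval below is just an example; both commands come from the standard sysstat package.

% iostat -xm 5
% pidstat -d 5

Comparing the request sizes and queue depths reported there against the model's data-loading pattern gives a quick indication of whether the workload is dominated by large sequential reads or by many small random reads, which is exactly the distinction the post explores.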

For further reading, take a look at https://developer.nvidia.com/blog/storage-performance-basics-for-deep-learning/

RapidFile Toolkit 1.0

RapidFile Toolkit 1.0 (formerly PureTools) provides fast client-side alternatives to common Linux commands such as ls, du, find, chown, chmod, rm and cp, optimized for the high level of concurrency supported by FlashBlade NFS.

For CentOS/RHEL

% sudo rpm -U rapidfile-1.0.0-beta.5/rapidfile-1.0.0-beta.5-Linux.rpm

Examples:

Disk Usage:

% pdu -sh /scratch/user1

Copy Files:

% pcp -r -p -u /scratch/user1/ /backup/user1/

Remove Files:

% prm -rv /scratch/user1/

Change Ownership:

% pchown -Rv user1:usergroup /scratch/user1

Change Permissions:

% pchmod -Rv 755 /scratch/user1

References:

  1. RapidFile Toolkit for FlashBlade (PureTools)