I have a question about deep learning that I wanted to find an answer to: which system, network, and protocol features are essential to speed up training and/or inference? It may not be necessary to apply the same level of requirements to training as to inference, and vice versa. I received the following information during an NVIDIA presentation:
Training:
- Scalability requires ultra-fast networking
- Same hardware needs as HPC
- Extreme network bandwidth
- RDMA
- SHARP (Mellanox Scalable Hierarchical Aggregation and Reduction Protocol)
- GPUDirect (https://developer.nvidia.com/gpudirect)
- Fast Access Storage
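To make the SHARP point concrete: its key idea is performing the gradient reduction *inside* the switch hierarchy instead of shuttling full gradients between every pair of workers. Here is a minimal pure-Python sketch of that hierarchical sum-then-broadcast pattern (my own illustration of the concept, not NVIDIA's code; real systems do this in network hardware via NCCL/MPI collectives):

```python
def tree_allreduce(worker_grads):
    """Hierarchically sum gradient vectors (the reduction SHARP
    offloads into the switch fabric), then broadcast the result.
    worker_grads: list of equal-length lists, one per worker."""
    grads = [list(g) for g in worker_grads]
    # Pairwise reduction up the tree: the number of participants
    # halves each round, as it would at each switch level.
    while len(grads) > 1:
        merged = []
        for i in range(0, len(grads) - 1, 2):
            merged.append([a + b for a, b in zip(grads[i], grads[i + 1])])
        if len(grads) % 2:  # odd worker out passes through unchanged
            merged.append(grads[-1])
        grads = merged
    # Broadcast: every worker receives the fully reduced gradient.
    return [list(grads[0]) for _ in worker_grads]

# Example: 4 workers each contribute a 3-element gradient.
out = tree_allreduce([[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]])
# Each worker ends up with the elementwise sum [4, 8, 12].
```

The benefit is that each gradient crosses each network link roughly once per tree level rather than once per peer, which is why this class of collective scales to large clusters.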
Inference:
- Highly Transactional
- Ultra-low Latency
- Instant Network Response
- RDMA
- PeerDirect, GPUDirect
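To illustrate the "highly transactional / ultra-low latency" point: inference serving is usually judged on tail latency (e.g. p99), not just average throughput, which is why RDMA-class networking matters even at modest bandwidth. A small sketch of measuring tail latency for a request loop (the handler below is a hypothetical stand-in for a model forward pass):

```python
import time
import statistics

def measure_latency_ms(handler, n_requests=1000):
    """Time each call to `handler` and report (mean, p99) latency in
    milliseconds -- the tail is what transactional serving lives by."""
    samples = []
    for _ in range(n_requests):
        start = time.perf_counter()
        handler()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p99 = samples[int(0.99 * (len(samples) - 1))]
    return statistics.mean(samples), p99

# Hypothetical stand-in for an inference call.
mean_ms, p99_ms = measure_latency_ms(lambda: sum(range(1000)))
```

A single slow hop in the network shows up directly in that p99 figure, which is why the inference column emphasizes instant network response over raw bandwidth.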